Even before I tried it, I realized this must be right as soon as I read your reply. Great job, and thanks, Andy!
> str(z)
`data.frame': 235 obs. of 2 variables:
 $ CLAIMNUM : Factor w/ 1907 levels "0","10000001849",..: 1083 1083 1083 1582 1582 1084 1681 1681 1391 1391 ...
 $ SIU.SAVED: int 475 3000 3000 0 0 4352 0 0 4500 3000 ...

So I have another general question: how can I avoid this when I do the matching? In my case, CLAIMNUM does not have to be a factor, and I think I can use as.character() on it to de-factor it. But I'd also like to know how to do it while keeping it as a factor. BTW, how do you drop those levels? :)

weiwei

On 6/21/05, Liaw, Andy <[EMAIL PROTECTED]> wrote:
> What does str(z) say? I suspect the second column is a factor, which, after
> the subsetting, has some empty levels. If so, just drop those levels.
>
> Andy
>
> > From: Weiwei Shi
> >
> > hi
> > I tried all the methods suggested above:
> > ave and rowsum with the "with" function work for my situation. I think
> > the problem might not be due to tapply.
> > My data z comes from
> > z <- y[y[[1]] %in% x[[2]], c(1, 9)]
> >
> > where z is supposed to have no entries for those non-matched
> > between x and y.
> >
> > However, when I run tapply, the result also includes those
> > non-matched entries. I used the is.na function to remove those entries
> > from z first and then used tapply again, but the result is the same: those
> > NAs and those non-matched results are still there. That's what I mean
> > by "it doesn't work".
> >
> > Is there something I missed here so that z "implicitly" has some
> > "trace" back to the y dataset?
> >
> > thanks,
> >
> > On 6/20/05, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> > > On 6/20/05, Weiwei Shi <[EMAIL PROTECTED]> wrote:
> > > > hi,
> > > > I have another question on tapply.
> > > > I have a dataset z like this:
> > > > 5540 389100307391 2600
> > > > 5541 389100307391 2600
> > > > 5542 389100307391 2600
> > > > 5543 389100307391 2600
> > > > 5544 389100307391 2600
> > > > 5546 381300302513 NA
> > > > 5547 387000307470 NA
> > > > 5548 387000307470 NA
> > > > 5549 387000307470 NA
> > > > 5550 387000307470 NA
> > > > 5551 387000307470 NA
> > > > 5552 387000307470 NA
> > > >
> > > > I want to sum column 3 by column 2.
> > > > I tried to remove the NAs by calling:
> > > > tapply(z[[3]], z[[2]], sum, na.rm = TRUE)
> > > > but it does not work.
> > > >
> > > > Then I used
> > > > z1 <- z[!is.na(z[[3]]), ]
> > > > and repeated it, but it still doesn't work.
> > > >
> > > > Please help.
> > > >
> > >
> > > Depending on what you want, you may be able to use rowsum:
> > >
> > > - display only groups that have at least one non-NA, with the sum
> > > being the sum of the non-NAs:
> > >
> > > with(na.omit(z), rowsum(V3, V2))
> > >
> > > - display all groups, with the sum being NA if any member is NA:
> > >
> > > rowsum(z$V3, z$V2)
> > >
> >
> >
> > --
> > Weiwei Shi, Ph.D
> >
> > "Did you always know?"
> > "No, I did not. But I believed..."
> > ---Matrix III
> >
> > ______________________________________________
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >
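For anyone finding this thread later, here is a minimal sketch of what Andy describes, run on a made-up toy data set. The claim numbers, amounts, and the `keep` vector are hypothetical stand-ins for the thread's x and y, which were never posted. It shows that row subsetting keeps unused factor levels, that tapply() reports those empty levels as NA, and two ways around it: droplevels() (or factor()) to keep CLAIMNUM a factor, and as.character() to de-factor it. Note that as.integer() on a factor returns the internal level codes rather than the labels, and claim numbers such as 10000001849 exceed .Machine$integer.max, so converting them to integer would give NA in any case. droplevels() only appeared in R 2.12.0; on older versions factor(z$CLAIMNUM) does the same job.

```r
# Toy data standing in for the thread's y (hypothetical values; the real data is not shown).
y <- data.frame(
  CLAIMNUM  = factor(c("101", "101", "202", "303", "303", "404")),
  SIU.SAVED = c(475L, 3000L, 2600L, 4352L, NA, 4500L)
)
keep <- c("101", "303")  # hypothetical stand-in for x[[2]]

# Row subsetting keeps every level of the original factor, matched or not.
z <- y[y$CLAIMNUM %in% keep, ]
print(levels(z$CLAIMNUM))   # "101" "202" "303" "404"

# tapply() builds one cell per factor level, so the empty levels
# "202" and "404" come back as NA even though z has no rows for them.
before <- tapply(z$SIU.SAVED, z$CLAIMNUM, sum, na.rm = TRUE)
print(before)

# Option 1: keep CLAIMNUM a factor but drop the unused levels.
# droplevels() needs R >= 2.12.0; factor(z$CLAIMNUM) works on older versions.
z$CLAIMNUM <- droplevels(z$CLAIMNUM)
after <- tapply(z$SIU.SAVED, z$CLAIMNUM, sum, na.rm = TRUE)
print(after)                # 101 -> 3475, 303 -> 4352

# Option 2: de-factor with as.character(), not as.integer():
# as.integer() on a factor returns the internal level codes, not the labels.
print(as.integer(y$CLAIMNUM))   # 1 1 2 3 3 4
claim_chr <- as.character(y$CLAIMNUM)

# rowsum() groups by the values that actually occur, not by levels(),
# which is why it never showed the unmatched claims; na.omit() drops the NA row first.
by_claim <- with(na.omit(z), rowsum(SIU.SAVED, CLAIMNUM))
print(by_claim)
```

droplevels() also accepts a whole data frame, in which case it drops unused levels from every factor column at once.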