[R] Interaction between aggregate() and length()
Folks,

I've been running into an odd situation that occurs when I use the length() function with aggregate(), but not with either one separately. Together, the result looks correct, but the output column is given an unexpected name: 'if (stringsAsFactors) factor(x) else x' instead of just 'x'.

# Numbers work ok
tt <- data.frame(idx=c(1,1,1,1,1,1,2,2,2,2,2,2)
                ,n=c(1,3,5,7,5,5,2,4,8,16,4,4)
                ,t=c(1,3,5,7,5,5,2,4,8,16,4,4)
                ,stringsAsFactors=FALSE)
aggregate(tt$t, list('idx'=tt$idx), length)
aggregate(as.factor(tt$t), list('idx'=tt$idx), length)

# Character data doesn't work right unless I convert the data to factors.
tt <- data.frame(idx=c(1,1,1,1,1,1,2,2,2,2,2,2)
                ,n=c('1','3','5','7','5','5','2','4','8','16','4','4')
                ,t=c('1','3','5','7','5','5','2','4','8','16','4','4')
                ,stringsAsFactors=FALSE)
aggregate(tt$t, list('idx'=tt$idx), length)
aggregate(as.factor(tt$t), list('idx'=tt$idx), length)

Any idea what is going on here? For the record, this also happens with the modalvalue() function defined at http://wiki.r-project.org/rwiki/doku.php?id=tips:stats-basic:modalvalue (which also relies on length()).

As a side note, this began as an attempt to determine sample size, for which I've defined a function

count <- function(x) { length(na.omit(x)) }

No doubt there's a built-in function to do just that, but as a newbie I've yet to find it.

Thank you for your help,
cur

--
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
[EMAIL PROTECTED] 541/754-4638

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
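On the side note about counting non-missing values: one common idiom is sum(!is.na(x)), which should give the same number as length(na.omit(x)) for a plain vector. The sketch below uses a small made-up data frame (not the data from the message above) and a hypothetical output column name, n, purely for illustration.

```r
# Count non-missing values; intended as an equivalent of length(na.omit(x))
count <- function(x) sum(!is.na(x))

# Made-up example data with some missing values (illustrative only)
dd <- data.frame(idx = c(1, 1, 1, 2, 2, 2),
                 t   = c('1', NA, '5', '2', '4', NA),
                 stringsAsFactors = FALSE)

# Passing a named list gives the output column an explicit name ('n')
res <- aggregate(list(n = dd$t), list(idx = dd$idx), count)
print(res)
```

Each group here has three rows, one of which is NA, so the count per group should be 2.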
Re: [R] Interaction between aggregate() and length()
One option is to use this:

aggregate(list(t=tt$t), list(idx=tt$idx), length)

On Thu, Aug 28, 2008 at 4:36 PM, [EMAIL PROTECTED] wrote:

> Folks, I've been running into an odd situation that occurs when I use
> length() function with aggregate(), but not with either one separately.
> [...]
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
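For readers who want to try the suggestion, here is a minimal sketch using the character columns from the original post (the unused n column is left out). Because the first argument is a named list, the output column takes its name from the list element rather than from whatever aggregate() derives internally.

```r
# Character data from the original post (the 'n' column is omitted)
tt <- data.frame(idx = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                 t   = c('1', '3', '5', '7', '5', '5',
                         '2', '4', '8', '16', '4', '4'),
                 stringsAsFactors = FALSE)

# Named list as the first argument: the result column is called 't'
res <- aggregate(list(t = tt$t), list(idx = tt$idx), length)
print(names(res))
print(res)
```

Both groups contain six rows, so length() should report 6 for each idx.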
Re: [R] Interaction between aggregate() and length()
That's a great workaround, as I can eliminate renaming the results column from 'x' to whatever. Thanks for the quick tip, Henrique.

On the other hand, I'm still stumped as to why aggregate() would name an output column 'if (stringsAsFactors) factor(x) else x'. That sort of behaviour seems to contradict the principle of least astonishment.

Enjoy your days,
cur

Henrique Dallazuanna [EMAIL PROTECTED] wrote on 08/28/2008 01:52:03 PM:

> One option is use this:
> aggregate(list(t=tt$t), list(idx=tt$idx), length)
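The thread does not answer the "why", so the following is only a plausible mechanism, not a confirmed diagnosis. If a conversion routine names its column by deparsing the expression it was called with, and a wrapper hands it 'if (stringsAsFactors) factor(x) else x' rather than a bare 'x', the column would end up with exactly that text as its name. The sketch below reproduces the effect with two hypothetical helpers, name_from_call() and wrap_chars(); neither is R's actual source code.

```r
# Hypothetical helper: names its single column after the expression
# that was supplied as its argument (via deparse(substitute(x)))
name_from_call <- function(x) {
  nm <- paste(deparse(substitute(x)), collapse = " ")
  out <- data.frame(x, stringsAsFactors = FALSE)
  names(out) <- nm
  out
}

# Hypothetical wrapper: passes a whole expression, not a plain symbol,
# so the deparsed "name" is the expression text
wrap_chars <- function(x, stringsAsFactors = FALSE) {
  name_from_call(if (stringsAsFactors) factor(x) else x)
}

print(names(name_from_call(c('a', 'b'))))  # name is the deparsed call
print(names(wrap_chars(c('a', 'b'))))      # name is the deparsed if/else
```

If something like this is what happens inside the character-vector conversion, it would also fit the observation that converting to a factor first avoids the odd name, since a factor would presumably take a different conversion path.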