The fact that the column names of your aggregate result contain multiple 
numbers suggests that something may have gone wrong when reading your data in 
from the file. Have you had a look at your data.frame 'all'? Are BAR, X, etc. 
numeric? Judging from the 'c. etc' names, they may not be.
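
A quick way to check is something like the sketch below. The data frame 'dat' and its values are made up for illustration (I made BAR text on purpose); substitute your own 'all'.

```r
# Toy stand-in for your data (made-up values); BAR is deliberately text
dat <- data.frame(FOO = c("a", "b", "a"),
                  BAR = c("1.5", "2.5", "oops"),
                  X   = c(1, 2, 3),
                  stringsAsFactors = FALSE)

str(dat)                          # prints the type of every column
is_num <- sapply(dat, is.numeric) # one TRUE/FALSE per column
is_num

# Convert a column that should be numeric; entries that cannot be
# parsed as numbers become NA (the warning is suppressed here)
dat$BAR <- suppressWarnings(as.numeric(dat$BAR))
```

If a column that should be numeric shows up as character or factor, the NA entries after the conversion tell you which rows to look at in the csv file.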


 So, how do I aggregate the data frame?

aggregate accepts either a data.frame or a vector as its first argument 
(actually anything that can be coerced into a data.frame). In the case of a 
data.frame it applies the aggregation function to each column. So your first 
aggregate call should be OK (except that your input might be wrong, see above). 
However, you didn't use named arguments in your list(), so R generates names 
for you. Hence the strange names.
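
For example, naming the list elements could look roughly like this. The data frame 'dat' and the column name 'XoverY' are my own inventions for the example; the column names FOO, BAR, QUUX, X and Y are taken from your post.

```r
# Made-up data with the column names from your post
dat <- data.frame(FOO  = c("a", "b", "a", "b"),
                  BAR  = c(1, 2, 3, 4),
                  QUUX = c(10, 20, 30, 40),
                  X    = c(2, 4, 6, 8),
                  Y    = c(1, 2, 3, 4))

# Name every element of the list; those names become the column names
byFOO <- aggregate(list(BAR = dat$BAR, QUUX = dat$QUUX, XoverY = dat$X / dat$Y),
                   by  = list(FOO = dat$FOO),
                   FUN = mean)

names(byFOO)
```

The same goes for the 'by' list: an unnamed element there ends up as "Group.1".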

aggregate returns a data.frame. So if you want to combine the results of more 
than one aggregate call, you can use merge:

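# one row per FOO; the named list makes the result column "Count" instead of "x"
Count <- aggregate(list(Count = all$FOO), by = list(FOO = all$FOO), FUN = length)
# join on the shared key column; merge needs both data frames
byFOO <- merge(byFOO, Count, by = "FOO")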

If you want to have a vector you could use tapply.
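
A small sketch of the tapply route, again with a made-up data frame 'dat' (the column 'Value' is from your post). Strictly speaking tapply returns a named one-dimensional array, which you can index by group name.

```r
# Made-up data
dat <- data.frame(FOO = c("a", "b", "a", "b"), Value = c(1, 2, 3, 4))

counts <- tapply(dat$Value, dat$FOO, length)  # number of rows per FOO
means  <- tapply(dat$Value, dat$FOO, mean)    # mean of Value per FOO

# The results are named by group, so they can be looked up by the key
# column of an aggregate result and stored as ordinary columns
byFOO <- aggregate(list(Total = dat$Value), by = list(FOO = dat$FOO), FUN = sum)
byFOO$Count <- as.vector(counts[as.character(byFOO$FOO)])
byFOO$Mean  <- as.vector(means[as.character(byFOO$FOO)])
```

Looking the values up by name rather than by position means you don't have to rely on both results being in the same order.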

 How do I rename a column?

?names

e.g.
names(all)<- c("column1" , "column2", ...)
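
To rename just one column you can index into names(). A sketch with a made-up data frame 'dat'; the new names "Count" and "Group" are only examples:

```r
# Made-up data frame with the default names aggregate tends to produce
dat <- data.frame(Group.1 = c("a", "b"), x = c(1, 2))

# Rename by matching the old name ...
names(dat)[names(dat) == "x"] <- "Count"

# ... or by position
names(dat)[1] <- "Group"

names(dat)
```

Matching on the old name is a bit safer than using the position, since it keeps working if the column order changes.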

 How do I check that two vectors are the same?

?all

all(vector1 == vector2)

but first have a look at:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
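
To illustrate the point of that FAQ entry, here is a small sketch (the vectors are made up). identical() tests for exact equality in one call, and all.equal() compares numbers up to a small tolerance:

```r
v1 <- c(0.1 + 0.2, 1)
v2 <- c(0.3, 1)

exact_elementwise <- all(v1 == v2)            # FALSE: 0.1 + 0.2 is not exactly 0.3
exact_object      <- identical(v1, v2)        # FALSE for the same reason
near_equal        <- isTRUE(all.equal(v1, v2)) # TRUE: equal up to a tolerance

# For character vectors exact comparison is what you want
same_keys <- identical(c("a", "b"), c("a", "b"))  # TRUE
```

Wrap all.equal() in isTRUE(), because when the vectors differ it returns a description of the difference rather than FALSE.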


HTH,
Jan







On 02/15/2011 12:42 AM, Sam Steingold wrote:
Hi,

I am trying to aggregate some data and I am confused by the results.
I load a data frame "all" from a csv file, and then I do:
(FOO,BAR,X,Y come from the header line in the csv file,
BTW, how do I rename a column?)

byFOO<- aggregate(list(all$BAR,all$QUUX,all$X/all$Y),
                      by = list(FOO=all$FOO),
                      FUN = mean);

I expect a data frame with 4 columns: FOO,BAR,QUUX and X/Y with all FOO
being different (they are character strings, do I need a special
incantation to turn them into factors?)
what I get is indeed a data frame but with names

[1] "FOO"
[2] "c.1.78e.11..4.38e.09..1.461e.11..4.3186e.10..1.1181e.10..5.5389e.10.."
[3] "c.33879300..3713870..190963000..7042170..4590010..91569200..12108200.."
[4] "c.1.37087599544937..1.72690992018244..1.82034830430797..1.70338983050847.."

why? how do I fix the column names?

then I am trying to add to that same frame byFOO some other columns:

byFOO$Count<- aggregate(all$FOO, by = list(all$FOO), FUN = length);
byFOO$Mean<- aggregate(all$Value, by = list(all$FOO), FUN = mean);
byFOO$Total<- aggregate(all$Value, by = list(all$FOO), FUN = sum);

however, byFOO$Count et al are not columns in byFOO with the appropriate
names ("Count"&c) but data frames with columns "Group.1" and "x".
Luckily, at least it appears that byFOO$Count$Group.1 is the same as
byFOO$FOO, as they should be, although I don't see any function which
would check that two vectors are the same ("==" returns a vector which I
have to manually inspect for presence of "FALSE").

So, how do I aggregate the data frame?
How do I rename a column?
How do I check that two vectors are the same?

thanks a lot!

PS. I have not used R for a few years, so please be gentle...
PPS. Please do not tell me to RTFM - I did. At least tell me what to
search for.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
