Ahhh, I see my error, thanks to Steve and others who mailed me off list.

Perhaps reading a little too quickly, I misinterpreted the help for ddply, in particular its second argument:

>    ".variables: variables to split data frame by, as quoted variables, a
>            formula or character vector"

I assumed that I could use this argument to select entire columns (i.e. the *variables* that comprise my data.frame). Instead, it names the variables whose values are used to split the rows into groups.
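
A minimal sketch of the three forms the help text lists for .variables, assuming plyr is installed. The data frame `toy` and its columns `g` and `v` are made up for illustration and do not come from this thread. In each form the argument names the variable whose values define the row groups:

```r
library(plyr)

# Hypothetical toy data: a grouping column g and a value column v.
toy <- data.frame(g = c("x", "x", "y"), v = c(1, 2, 4))

# The same grouping written three ways.
by_quoted    <- ddply(toy, .(g), summarise, total = sum(v))  # quoted variables
by_formula   <- ddply(toy, ~ g,  summarise, total = sum(v))  # formula
by_character <- ddply(toy, "g",  summarise, total = sum(v))  # character vector
```

Each call should return one row per distinct value of g, not one result per column.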

Thx again,
- S.


On Apr 7, 2010, at 6:13 PM, Steve Lianoglou wrote:

Howdy,

I'm no plyr master, but here's my 2 cents ...

On Wed, Apr 7, 2010 at 5:15 PM, Stuart Andrews <stu.andr...@gmail.com> wrote:
Hi,

I am confused by results from:

ddply(aa, names(aa), colwise(sum))

I thought ddply was just calling colwise(sum)() on each column. However,
ddply() returns a 13 x 5 result!

The general result I expected is similar to that of apply(), or of using
colwise(sum)() alone. Shouldn't ddply() produce the same?

I'm not sure exactly what is happening, but I wouldn't expect ddply
to produce the same as the example you gave, since the second arg to
ddply determines how the aa data.frame should be split (row-wise)
before the colwise(...) doohickey is called.

I'm not sure, but what are you trying to get at by row-wise splitting
`aa` by c('a', 'b', 'c', 'd', 'e') [i.e. names(aa)]?
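
For what it's worth, splitting by every column gives one piece per distinct row, which would account for a result with fewer rows than `aa`. Below is a base-R sketch of that idea, assuming the same construction of `aa` as in the original post. The names `groups`, `pieces` and `sums` are introduced here for illustration, and this is only an analogue of the split/apply/combine steps, not plyr's actual implementation:

```r
set.seed(1234)
aa <- as.data.frame(matrix(rnorm(100) > 0.3, nrow = 20))
names(aa) <- c("a", "b", "c", "d", "e")

# Split: one group per distinct combination of values across all five columns.
groups <- interaction(aa, drop = TRUE)
pieces <- split(aa, groups)

# Apply: column sums within each piece. Combine: stack them into a matrix.
sums <- t(sapply(pieces, colSums))

nlevels(groups)   # number of pieces
nrow(unique(aa))  # number of distinct rows in aa; the two should agree
```

Because every row in a piece is identical, each entry of `sums` is either zero or the size of that piece, which is the pattern visible in the ddply output.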


Thanks in advance for your help,
- Stuart Andrews


set.seed(1234)
aa = as.data.frame(matrix(rnorm(100)>0.3,nrow=20))
names(aa) = c('a','b','c','d','e')
head(aa)
      a     b     c     d     e
1 FALSE FALSE FALSE  TRUE  TRUE
2  TRUE  TRUE FALSE  TRUE FALSE
3  TRUE  TRUE FALSE  TRUE  TRUE
4  TRUE FALSE FALSE  TRUE FALSE
5  TRUE FALSE FALSE  TRUE FALSE
6 FALSE FALSE FALSE FALSE  TRUE

ddply(aa, names(aa), colwise(sum))
   a b c d e
1  0 0 0 0 0
2  0 0 0 0 2
3  0 0 0 4 0
4  0 0 0 1 1
5  0 0 1 0 0
6  0 0 2 0 2
7  0 0 1 1 0
8  0 2 0 0 0
9  0 1 0 0 1
10 1 0 0 0 0
11 2 0 0 0 2
12 1 0 0 1 0
13 1 0 0 1 1

apply(as.matrix(aa),2,sum)
a b c d e
5 3 4 8 9

colwise(sum)(aa)
  a b c d e
1 5 3 4 8 9


... Isn't ddply() just doing something like this for each column??

colwise(sum)(aa[, 1, drop = FALSE])
  a
1 5

That's what colwise does for each column of the data.frame it's
given. ddply does the split-by-row/apply/merge magic on the data
frame, so it hands colwise smaller chunks of `aa` to work on, one at
a time.

So, to summarize, I think you just need to figure out the correct 2nd
arg to ddply for your specific problem.
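
To make the distinction concrete, here is a small sketch contrasting column selection with row grouping, assuming plyr is installed. The data frame `flags` is hypothetical (smaller than `aa` so the sums can be checked by hand), and the anonymous function passed to ddply is just one way to keep the grouping column out of the sums:

```r
library(plyr)

# Hypothetical logical data, not taken from the thread.
flags <- data.frame(a = c(TRUE, TRUE, FALSE, FALSE),
                    b = c(TRUE, FALSE, TRUE, TRUE),
                    c = c(FALSE, FALSE, TRUE, FALSE))

# Column selection: subset the columns first, then sum each one. No ddply needed.
picked <- colwise(sum)(flags[, c("a", "b"), drop = FALSE])

# Row grouping: split the rows by the value of a, then sum b and c within each piece.
by_a <- ddply(flags, "a", function(piece) colwise(sum)(piece[c("b", "c")]))
```

`picked` should hold one total per selected column, while `by_a` should hold one row of totals for each value of a.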

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
