On Wed, 27 Aug 2008, Giuseppe Paleologo wrote:

I have two questions for the group. One is very concrete, and is dangerously
close to a "please do my homework" posting. The second follows from the
first one but is more general. I would welcome the advice of experienced R
users.

As for the first one: I have a data frame with two variables

X  Y
A,   chris
D,   chris
B,   chris
B,   chris
C,   andrew
E,   andrew
C,   andrew
B,   beth
D,   chris
D,   beth
C,   beth
D,   beth
D,   beth
A,   andrew
A,   andrew
A,   andrew
C,   chris
B,   beth
D,   chris
E,   andrew
D,   chris
D,   beth
D,   chris
A,   andrew
A,   chris
C,   chris
A,   chris
B,   chris
C,   beth
A,   chris

I would like to produce a table with one row for every level of the factor
X and multiple columns, filled with the levels of the factor Y that are
observed jointly with that level of X. Hence:

X   Z1  Z2  Z3
A,  andrew,  chris
B,  beth,  chris
C,  andrew,  beth,  chris
D,  chris,  beth
E,  andrew

A solution would be to do something like

temp = tapply(Y, X, function(a) levels(a[, drop=TRUE]))

        lapply( split(Y,X), unique )

or

        lapply( split(Y,X), function(x) as.character(unique(x)))

HTH,

Chuck



and then putting the output in an appropriately sized data frame. The issue
I have with this is that it is inelegant and rather slow for my typical data
set (~200K rows). So I was wondering if a more efficient, nicer solution
exists.
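
For the last step, padding the per-level results into a rectangular data frame, here is a minimal sketch rather than a definitive answer. It assumes the two variables live in a data frame that I call `df` (a hypothetical name), uses only a small subset of the rows listed above as toy data, and names the output columns Z1, Z2, ... as in the desired table. It relies on `lengths()`, so it needs a reasonably recent R. Values appear in order of first occurrence, not in level order.

```r
# Toy data: a small subset of the X/Y pairs from the post (illustrative only).
df <- data.frame(
  X = c("A", "D", "B", "C", "E", "B", "A", "C", "D", "C"),
  Y = c("chris", "chris", "chris", "andrew", "andrew",
        "beth", "andrew", "beth", "beth", "chris"),
  stringsAsFactors = TRUE
)

# One character vector of distinct Y values per level of X.
groups <- lapply(split(df$Y, df$X), function(y) as.character(unique(y)))

# Indexing past the end of a vector yields NA, which pads every
# group to the width of the longest one.
width  <- max(lengths(groups))
padded <- do.call(rbind, lapply(groups, function(g) g[seq_len(width)]))
colnames(padded) <- paste0("Z", seq_len(width))

# Assemble the final table: one row per level of X, NA where a level
# of X has fewer distinct Y values than the widest group.
result <- data.frame(X = names(groups), padded,
                     row.names = NULL, stringsAsFactors = FALSE)
print(result)
```

On a large input it may help to reduce the data to distinct (X, Y) pairs first, for example with `unique(df)`, so that each group passed to the function is already small. I have not timed this on a 200K-row data set.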

This leads me to a second question. Maybe out of laziness, maybe because R
is good enough, I tend to do all my local data manipulations in R. This
includes de-duping records, joining tables, and grouping observations. I do
this also for larger data sets (say, dense tables with 100M+ elements). Is
this current practice among R users? If so, is there a tutorial, or an R
view on it?  If not, what do you use?
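
To make the three operations concrete, here is a small base-R sketch of de-duping, joining, and grouping. The tables `orders` and `owners` and their columns are hypothetical, invented only for illustration. This shows one common way to do each step in base R, with no claim about how it scales to the 100M+ element tables mentioned above.

```r
# Hypothetical toy tables, for illustration only.
orders <- data.frame(id = c(1, 2, 2, 3), amount = c(10, 20, 20, 5))
owners <- data.frame(id = c(1, 2, 3), who = c("andrew", "beth", "chris"))

# De-duping: drop rows that are exact duplicates of an earlier row.
deduped <- unique(orders)

# Joining: inner join of the two tables on the shared key column.
joined <- merge(deduped, owners, by = "id")

# Grouping: total amount per owner.
totals <- aggregate(amount ~ who, data = joined, FUN = sum)
print(totals)
```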

Thanks in advance,

-gappy


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
