On Dec 28, 2010, at 9:23 PM, Entropi ntrp wrote:
Hi,
I have been examining large data and need to do simple linear regression
on data that is grouped by the values of a particular attribute. For
instance, consider three columns: ID, x, y. I need to regress x on y for
each distinct value of ID. Specifically, for the data below, which has
4 distinct values of ID (76, 111, 121, 168), I should invoke linear
regression 4 times, once per ID. The challenge is that the ID vector has
around 20000 entries, so the regression must be run automatically for
each distinct value of ID.
ID    x      y
76    36476  15.8
76    36493  66.9
76    36579  65.6
111   35465  10.3
111   35756  4.8
121   38183  16
121   38184  15
121   38254  9.6
121   38255  7
168   37727  21.9
168   37739  29.7
168   37746  97.4
Let's say that is a data frame named "indat". Try:
lapply(split(indat, as.factor(indat$ID)),
       function(df) { lm(y ~ x, data = df) })
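As a minimal sketch of the above, assuming the posted values are read into a data frame named indat with columns ID, x and y (the names fits and coefs are only illustrative), the per-ID coefficients can be gathered into one matrix:

```r
# Sample data from the question, rebuilt as a data frame (assumed layout).
indat <- data.frame(
  ID = c(76, 76, 76, 111, 111, 121, 121, 121, 121, 168, 168, 168),
  x  = c(36476, 36493, 36579, 35465, 35756, 38183, 38184, 38254, 38255,
         37727, 37739, 37746),
  y  = c(15.8, 66.9, 65.6, 10.3, 4.8, 16, 15, 9.6, 7, 21.9, 29.7, 97.4)
)

# One lm fit per distinct ID; the list names are the ID values.
fits <- lapply(split(indat, indat$ID), function(df) lm(y ~ x, data = df))

# Collect intercept and slope into a matrix with one row per ID.
coefs <- t(sapply(fits, coef))
coefs
```

Each row of coefs is labelled by its ID. ID 111 has only two observations, so its fitted line passes through both points exactly and leaves no residual degrees of freedom.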
I was wondering whether there is an easy way to group data based on the
values of ID in R so that linear regression can be done easily for each
group determined by each value of ID. Or, is the only way to construct
loops with 'for' or 'while' in which a matrix is generated for each
distinct value of ID that stores corresponding values of x and y by
screening the entire ID vector?
Thanks in advance,
Yasin
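An explicit loop is possible too, although it is not the only way. The following is only a sketch, assuming the same data frame layout (ID, x, y); the names ids, grp and slopes are illustrative. It subsets the rows for each ID with logical indexing, so no separate matrix per ID has to be built by hand:

```r
# Sample data from the question, rebuilt as a data frame (assumed layout).
indat <- data.frame(
  ID = c(76, 76, 76, 111, 111, 121, 121, 121, 121, 168, 168, 168),
  x  = c(36476, 36493, 36579, 35465, 35756, 38183, 38184, 38254, 38255,
         37727, 37739, 37746),
  y  = c(15.8, 66.9, 65.6, 10.3, 4.8, 16, 15, 9.6, 7, 21.9, 29.7, 97.4)
)

ids <- unique(indat$ID)

# Pre-allocate one slope per distinct ID, named by the ID value.
slopes <- numeric(length(ids))
names(slopes) <- ids

for (i in seq_along(ids)) {
  # Rows belonging to the current ID.
  grp <- indat[indat$ID == ids[i], ]
  # Fit y ~ x on that subset and keep the slope.
  slopes[i] <- coef(lm(y ~ x, data = grp))["x"]
}
slopes
```

The loop fits the same model to the same subsets as the split/lapply call, so it mainly trades brevity for explicitness.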
--
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help