On Dec 28, 2010, at 9:23 PM, Entropi ntrp wrote:

Hi,
I have been examining a large data set and need to run a simple linear
regression on data that is grouped by the values of a particular
attribute. For instance, consider three columns: ID, x, and y; I need to
regress x on y for each distinct value of ID. Specifically, the data below
has 4 distinct values of ID (76, 111, 121, 168), so I should invoke linear
regression 4 times, once per ID. The challenge is that the length of the
ID vector is around 20000, so the regression must be done automatically
for each distinct value of ID.

 ID      x     y
 76  36476  15.8
 76  36493  66.9
 76  36579  65.6
111  35465  10.3
111  35756   4.8
121  38183  16
121  38184  15
121  38254   9.6
121  38255   7
168  37727  21.9
168  37739  29.7
168  37746  97.4

Let's say that is a data frame named "indat". Try:

lapply(split(indat, as.factor(indat$ID)), function(df) {lm(y ~ x, data=df)} )
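A minimal sketch of how that call could be run end to end on the sample posted above, assuming the data are read into a data frame with columns ID, x and y. The names `fits` and `coefs`, and the column labels "intercept" and "slope", are illustrative choices only and are not part of the original suggestion.

```r
# Rebuild the posted sample; the name `indat` follows the suggestion above.
indat <- data.frame(
  ID = c(76, 76, 76, 111, 111, 121, 121, 121, 121, 168, 168, 168),
  x  = c(36476, 36493, 36579, 35465, 35756, 38183, 38184, 38254, 38255,
         37727, 37739, 37746),
  y  = c(15.8, 66.9, 65.6, 10.3, 4.8, 16, 15, 9.6, 7, 21.9, 29.7, 97.4)
)

# One lm() fit per distinct ID; split() returns a list named by ID value.
fits <- lapply(split(indat, indat$ID), function(df) lm(y ~ x, data = df))

# Collect the intercept and slope of every fit into one matrix,
# one row per ID (hypothetical column labels for readability).
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")
print(coefs)
```

Each element of `fits` is an ordinary lm object, so `summary(fits[["76"]])` or `coef(fits[["121"]])` should work as usual. Note that ID 111 has only two rows in the sample, so its line passes through both points exactly and leaves no residual degrees of freedom.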

I was wondering whether there is an easy way in R to group data by the
values of ID, so that linear regression can be run for each group. Or is
the only way to construct 'for' or 'while' loops in which a matrix is
generated for each distinct value of ID, storing the corresponding values
of x and y by scanning the entire ID vector?

Thanks in advance,

Yasin

--

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
