[R] help with linear model

2009-10-26 Thread Eleni Christodoulou
Dear list,

I have been trying for a week to fit a simple linear model to my data. I
have looked through previous posts but haven't found anything relevant to
my problem. I guess it is something simple... I just cannot see it.
I have the following data frame, named data, which is a subset of a
microarray experiment. The columns are the samples and the rows are the
probes. I bound a first row, called norm, which represents the
estimated output. I want to create a linear model which shows the
relationship between the gene expressions (rows) and the output (norm).

 *data*
            GSM276723.CEL GSM276724.CEL GSM276725.CEL GSM276726.CEL
norm             0.897000          0.59      0.683000      0.949000
206427_s_at      5.387205      6.036506      8.824783     10.864122
205338_s_at      6.454779     13.143095      6.123212     12.726562
209848_s_at      6.703062      7.783330     12.175654      9.339651
205694_at        5.894131      5.794516     12.876555     11.534664
201909_at       12.616538     12.913255     12.275182     12.767743
208894_at       13.049286      9.317874     12.873516     13.527182
216512_s_at      6.324789     12.783791      6.216932     12.013404
205337_at        6.175940     12.158796      6.117519     12.041078
201850_at        6.633013      6.465900      6.535434      7.749985
210982_s_at     12.444791      8.597388     12.197696     12.963449
            GSM276727.CEL GSM276728.CEL GSM276729.CEL GSM276731.CEL
norm             0.302000      0.597000          0.27          0.53
206427_s_at      5.690357      8.014055     13.034753      5.493977
205338_s_at      5.757048      7.706341     13.258410      5.562588
209848_s_at      6.461028      7.036515     13.633649      5.874098
205694_at        5.519552      5.297107      6.498811      5.146150
201909_at       12.814454     11.592632      6.594229      6.650796
208894_at       13.835359     13.028096      5.839909      6.045578
216512_s_at      6.033096      7.273650     12.669054      5.946932
205337_at        5.879028      7.381713     12.633829      5.379559
201850_at        9.684397      6.560014      8.523229      6.573052
210982_s_at     13.342729     12.470517      5.903681      5.658115
            GSM276732.CEL GSM276735.CEL GSM276736.CEL GSM276737.CEL
norm              0.43400      0.647000      0.113000          1.00
206427_s_at      12.80257      5.645002      6.519554     13.572480
205338_s_at      13.38057      5.804107     11.090690     14.024922
209848_s_at      13.27718      6.490851      9.784199     14.101162
205694_at        11.37717      5.802105      7.944963     14.060492
201909_at        13.24126     12.263899     12.578315      6.443491
208894_at        12.29916      7.563361      9.971493      7.094214
216512_s_at      13.00303      5.905789     10.512761     13.647573
205337_at        12.63560      5.430138     10.707242     13.020312
201850_at        12.71874      6.275480      6.987962     12.354580
210982_s_at      11.53559      7.225199      9.322706      6.617615
            GSM276738.CEL GSM276739.CEL GSM276740.CEL GSM276742.CEL
norm              0.35700      0.967000      0.823000          1.00
206427_s_at      13.33764     13.607918     13.190551     12.387189
205338_s_at      13.65492     12.812950     12.237476     12.912605
209848_s_at      13.48525     13.435389     13.851347     12.540495
205694_at         7.70928     10.045331     13.391456     11.103841
201909_at        12.47093     11.937344      6.631023      7.160071
208894_at        12.20508      8.892181      6.478889      5.927860
216512_s_at      13.42313     12.151691     11.620552     12.341763
205337_at        12.67544     12.036528     11.641203     12.275845
201850_at        11.85481     13.172666     12.964316     12.156142
210982_s_at      11.49940      8.380404      6.121762      5.921634
            GSM276743.CEL GSM276744.CEL GSM276745.CEL GSM276747.CEL
norm             0.899000      0.927000      0.754000      0.437000
206427_s_at     12.665097     12.604673     11.446630     13.000295
205338_s_at     13.261141     12.448096     13.185698     12.510952
209848_s_at     13.396711     13.882529     13.040600     12.984137
205694_at       10.888474      7.094063      8.630120     12.321685
201909_at       12.100560      6.666787     12.330600      6.572282
208894_at        7.741437      8.348155     10.106442      6.009902
216512_s_at     12.830373     11.504074     12.300163     11.525958
205337_at       12.264569     11.676281     11.940917     11.618351
201850_at       11.055564     12.202366      7.327056     12.853055
210982_s_at      7.285289      8.129298      9.577032      5.924993
            GSM276748.CEL GSM276752.CEL GSM276754.CEL GSM276756.CEL
norm             0.321000          0.62      0.155000      0.946000
206427_s_at      9.081283     11.446978      8.191261     13.192507
205338_s_at     13.737773     13.698520     12.983830     10.948681
209848_s_at     13.234025     12.956672     10.644642

Re: [R] help with linear model

2009-10-26 Thread Ista Zahn
I'm not familiar with microarray data, so I hope I'm not off base here.

Data frames are structured so that variables appear in the columns and
cases in the rows. From your formula it looks like you're trying to
fit a model using rows as variables and columns as cases. There is
probably a way to do this, but it might be easier to just flip your
data. One way to do this is

dataNew <- as.data.frame(t(data))
row.names(dataNew) <- names(data)
# variable names should start with a letter; "norm" (the first row) is kept as is
names(dataNew) <- c("norm", paste("I", row.names(data)[-1], sep=""))

(Note that naming your data "data" is not good practice.)

Now you should be able to run your model as before (prefixing I to
the variable names to match the new naming scheme):

m1 = lm(norm ~ I206427_s_at + I205338_s_at + I209848_s_at + I205694_at
+ I201909_at + I208894_at + I216512_s_at + I205337_at + I201850_at +
I210982_s_at, data=dataNew)

Hope it helps,
Ista
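
For readers who want to try the transpose-then-fit approach without the original data, here is a minimal sketch. It assumes a small synthetic data frame (the name expr and the random values are made up for illustration; only three of the probe names from the post are reused), so the fitted coefficients mean nothing in themselves.

```r
# Synthetic stand-in for the posted data: rows are "norm" plus three probes,
# columns are samples. The values are random, NOT the real measurements.
set.seed(1)
expr <- data.frame(matrix(runif(4 * 8, min = 5, max = 14), nrow = 4))
row.names(expr) <- c("norm", "206427_s_at", "205338_s_at", "209848_s_at")
names(expr) <- paste("GSM", 1:8, ".CEL", sep = "")

# Transpose so that samples become rows (cases) and probes become columns.
exprT <- as.data.frame(t(expr))

# Prefix the probe names with "I" so they start with a letter; keep "norm".
names(exprT) <- c("norm", paste("I", row.names(expr)[-1], sep = ""))

# Fit the model with the renamed probe columns as predictors.
m1 <- lm(norm ~ I206427_s_at + I205338_s_at + I209848_s_at, data = exprT)
print(coef(m1))
```

With the real data, the number of samples should comfortably exceed the number of probes in the formula, otherwise some coefficients cannot be estimated.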

Re: [R] help with linear model

2009-10-26 Thread Eleni Christodoulou
Thank you all for your replies. I had tried transposing my data before, but
I did not mention it because I was getting the same error. In the
present case, though, it worked because I put
lm1=lm(norm~., data=t(data))
instead of
lm1=lm(fm1, data=t(data))
where fm1=norm~cols...
I actually didn't know that there was such a difference between norm~cols
and norm~.
I wonder why...

Thank you all again!
Best,
Eleni
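
One possible explanation for the difference, sketched on made-up data (the data frame d and its values are hypothetical): the "." in a formula stands for every column of the data other than the response, so the probe names never have to be typed out. In a hand-written formula, names that start with a digit have to be wrapped in backquotes, otherwise R does not parse them as variable names.

```r
# Made-up data with two probe-style column names that start with a digit.
set.seed(2)
d <- data.frame(norm = runif(8), a = rnorm(8), b = rnorm(8))
names(d)[2:3] <- c("206427_s_at", "205338_s_at")

# "." expands to all columns of d other than the response.
fit_dot <- lm(norm ~ ., data = d)

# The explicit formula needs backquotes around the non-syntactic names.
fit_explicit <- lm(norm ~ `206427_s_at` + `205338_s_at`, data = d)

# Both calls describe the same model, so the coefficients should agree.
print(all.equal(unname(coef(fit_dot)), unname(coef(fit_explicit))))
```

Whether this was the actual cause in the original session cannot be told from the thread, since the definition of fm1 is not shown in full.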


Re: [R] help with linear model

2009-10-26 Thread Petr PIKAL
r-help-boun...@r-project.org wrote on 26.10.2009 11:31:26:

> Thank you all for your replies. I had tried transposing my data before,
> but I did not mention it because I was getting the same error. In the
> present case, though, it worked because I put
> lm1=lm(norm~., data=t(data))
> instead of
> lm1=lm(fm1, data=t(data))
> where fm1=norm~cols...

There should not be any difference. I suspect that your formula definition
has superfluous commas, and/or that t(data) changes the names: you expect a
name such as 206427_s_at, but that cannot be a valid name.

Look at

head(t(data))

to see how the names are changed. You need to adjust your formula to match
those names.

Regards
Petr
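
The name check suggested above can be tried on a toy example. This is only a sketch: the vector probe_ids is a hypothetical stand-in for the row names of the posted data, and make.names() is used to show which names R regards as syntactically valid.

```r
# Hypothetical stand-in for the row names of the posted data frame.
probe_ids <- c("norm", "206427_s_at", "205694_at")

# make.names() returns syntactically valid versions of the names;
# a name that starts with a digit gets an "X" prefix.
valid <- make.names(probe_ids)
print(valid)

# Names that differ from their valid form must be renamed, or backquoted,
# before they can be used in a hand-written formula.
needs_fix <- probe_ids[probe_ids != valid]
print(needs_fix)
```

Comparing the names actually present in the transposed data with the names used in the formula is usually the quickest way to find a mismatch.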


