I am not sure that you really want to do a separate regression for each
column of X against the same y.  This does not make much sense.

Why do you think multiple linear regression is not possible just because X'X
is not invertible?  You have 2 main options here:

1.  Obtain the minimum-norm solution via the SVD (i.e. the Moore-Penrose
pseudoinverse).  This solution minimizes ||y - Xb|| and, among all such
minimizers, has the smallest ||b||.
2.  Obtain a regularized solution, such as ridge regression, as Vito
suggested.
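Option (2) can be sketched with lm.ridge() from MASS.  (A sketch only: the
penalty lambda = 1 below is an arbitrary illustration; in practice you would
choose it by cross-validation or by comparing the GCV values that lm.ridge
reports across a grid of lambdas.)

```r
library(MASS)

set.seed(1)
X <- matrix(rnorm(1000), 10, 100)  # 10 observations, 100 predictors
y <- rnorm(10)

# ridge adds lambda to the diagonal of X'X, so the system is solvable
# even though X'X itself is singular when p > n
fit <- lm.ridge(y ~ X, lambda = 1)
b_ridge <- coef(fit)  # intercept plus 100 shrunken coefficients
```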

You can do (1) as follows:

        require(MASS)

        soln <- ginv(X) %*% y

Here is an example:

        X <- matrix(rnorm(1000), 10, 100)  # matrix with rank = 10

        b <- rep(1, 100)

        y <- crossprod(t(X), b)  # i.e., y = X %*% b

        soln <- c(ginv(X) %*% y)  # fits y exactly, but will not be close to b
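A quick check on that example (re-run here with a fixed seed so it is
self-contained) shows what "not close to b" means: the ginv() solution
reproduces y exactly, but being the minimum-norm solution its norm is
strictly smaller than that of the b used to generate the data:

```r
library(MASS)

set.seed(1)
X <- matrix(rnorm(1000), 10, 100)  # rank 10: 10 obs, 100 predictors
b <- rep(1, 100)
y <- X %*% b

soln <- c(ginv(X) %*% y)  # minimum-norm least-squares solution

# the fit is exact, since y lies in the column space of X
max(abs(X %*% soln - y))  # essentially 0 (numerical noise)

# but among all exact solutions, soln has the smallest norm,
# so it is strictly smaller than sqrt(sum(b^2)) = 10
sqrt(sum(soln^2))
```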
 
Hope this helps,
Ravi.


-------------------------------------------------------------------------------

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvarad...@jhmi.edu

Webpage:
http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

 

-------------------------------------------------------------------------------


-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Alex Roy
Sent: Tuesday, July 14, 2009 11:29 AM
To: Vito Muggeo (UniPa)
Cc: r-help@r-project.org
Subject: Re: [R] Linear Regression Problem

Dear Vito,
                Thanks for your comments. But I want to do simple linear
regression, not multiple linear regression. Multiple linear regression is not
possible here because the number of variables is much larger than the number
of samples (X is ill-conditioned; the inverse of X^T X does not exist!). I
just want to take one predictor variable at a time, regress y on it, and
store the regression coefficients, p-values and R^2 values. The loop goes up
to 40,000 predictors.

Alex
On Tue, Jul 14, 2009 at 5:18 PM, Vito Muggeo (UniPa)
<vito.mug...@unipa.it>wrote:

> dear Alex,
> I think your problem with a large number of predictors and a 
> relatively small number of subjects may be faced via some 
> regularization approach (ridge or lasso regression..)
>
> hope this helps you,
> vito
>
> Alex Roy ha scritto:
>
>>  Dear All,
>>                 I have a matrix, say X (100 x 40,000), and a vector,
>> say y (100 x 1). I want to perform linear regression. I have scaled the
>> X matrix using scale() to get mean zero and s.d. 1, but I still get
>> very high values for the regression coefficients. If I scale the X
>> matrix, the regression coefficients should behave like correlation
>> coefficients and should not exceed 1. Am I right? I do not know what
>> is going wrong.
>> Thanks for your help.
>> Alex
>>
>>
>> *Code:*
>>
>> UniBeta <- sapply(1:dim(X)[2], function(k)
>>   summary(lm(y ~ X[, k]))$coefficients[2, 1])
>>
>> pval <- sapply(1:dim(X)[2], function(l)
>>   summary(lm(y ~ X[, l]))$coefficients[2, 4])
>>
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>>
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> --
> ====================================
> Vito M.R. Muggeo
> Dip.to Sc Statist e Matem `Vianelli'
> Universit` di Palermo
> viale delle Scienze, edificio 13
> 90128 Palermo - ITALY
> tel: 091 6626240
> fax: 091 485726/485612
> http://dssm.unipa.it/vmuggeo
> ====================================
>


