Dear Dr. Ravi Varadhan,
Thanks for your comments. Here, variables (p) are in columns and samples (n)
are in rows, and I want to find the variables significantly associated with
the response (y).
The reason I said multiple linear regression (MLR) is not possible: classical
MLR was developed under the assumption that the number of samples (n) exceeds
the number of variables (p) and that the predictors are uncorrelated. So I do
not consider penalization/shrinkage/regularization methods to be traditional
regression methods such as MLR.
I completely agree with the solutions you suggested; I could even add some
other techniques such as the elastic net, partial least squares, principal
component regression, or machine-learning methods such as support vector
regression or random forest regression to get the job done.
But for my purposes I want to use a univariate method (simple linear
regression).
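For that purpose, a univariate screening loop that also keeps the p-value and R^2 of each simple regression could look like the sketch below (toy data only; `X` and `y` here are stand-ins for the real 100 x 40,000 matrix and response):

```r
## Univariate screening sketch: one simple linear regression per predictor.
set.seed(1)
X <- matrix(rnorm(100 * 50), 100, 50)  # toy: 50 predictors instead of 40,000
y <- rnorm(100)

res <- t(sapply(seq_len(ncol(X)), function(k) {
  fit <- summary(lm(y ~ X[, k]))
  c(beta = fit$coefficients[2, 1],   # slope estimate
    pval = fit$coefficients[2, 4],   # p-value of the slope
    r2   = fit$r.squared)            # R^2 of the univariate fit
}))
head(res)  # one row per predictor: beta, pval, r2
```

With 40,000 predictors this runs in a few seconds, but remember that the p-values would need multiple-testing correction (e.g. `p.adjust`).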

Regards

Alex
On Tue, Jul 14, 2009 at 5:48 PM, Ravi Varadhan <rvarad...@jhmi.edu> wrote:

> I am not sure that you really want to do a separate regression for each
> column of X, with the same y.  This does not make much sense.
>
> Why do you think multiple linear regression is not possible just because
> X'X
> is not invertible?  You have 2 main options here:
>
> 1.  Obtain a minimum-norm solution using the SVD (i.e., the Moore-Penrose
> inverse). This solution has the smallest ||b|| among all b that minimize
> ||y - Xb||.
> 2.  Obtain a regularized solution such as the ridge-regression, as Vito
> suggested.
>
> You can do (1) as follows:
>
>        require(MASS)
>
>        soln <- ginv(X) %*% y
>
> Here is an example:
>
>        X <- matrix(rnorm(1000), 10, 100)  # matrix with rank = 10
>
>        b <- rep(1, 100)
>
>        y <- crossprod(t(X), b)  # i.e., X %*% b
>
>        soln <- c(ginv(X) %*% y)  # minimum-norm solution; will not be close to b
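Option (2), the regularized solution, can be sketched with `MASS::lm.ridge` (the `lambda` value below is illustrative only and would need tuning, e.g. by cross-validation):

```r
## Ridge-regression sketch for the p > n case, using MASS::lm.ridge.
require(MASS)
set.seed(1)
X <- matrix(rnorm(1000), 10, 100)  # 10 samples, 100 predictors (rank 10)
b <- rep(1, 100)
y <- c(X %*% b)

fit <- lm.ridge(y ~ X, lambda = 1)  # lambda > 0 makes the problem well-posed
length(coef(fit))                   # intercept plus 100 shrunken coefficients
```

Unlike the minimum-norm solution, the ridge coefficients depend on `lambda`; `select(fit)` over a grid of lambdas is one simple way to pick it.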
>
> Hope this helps,
> Ravi.
>
>
>
> -----------------------------------------------------------------------------------
>
> Ravi Varadhan, Ph.D.
>
> Assistant Professor, The Center on Aging and Health
>
> Division of Geriatric Medicine and Gerontology
>
> Johns Hopkins University
>
> Ph: (410) 502-2619
>
> Fax: (410) 614-9625
>
> Email: rvarad...@jhmi.edu
>
> Webpage:
>
> http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html
>
>
>
>
> -----------------------------------------------------------------------------------
>
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Alex Roy
> Sent: Tuesday, July 14, 2009 11:29 AM
> To: Vito Muggeo (UniPa)
> Cc: r-help@r-project.org
> Subject: Re: [R] Linear Regression Problem
>
>  Dear Vito,
>                Thanks for your comments. But I want to do simple linear
> regression, not multiple linear regression. Multiple linear regression is
> not possible here, as the number of variables is much larger than the
> number of samples (X is ill-conditioned; the inverse of X^T X does not
> exist!). I just want to take one predictor variable at a time, regress y
> on it, and store the regression coefficients, p-values, and R^2 values.
> The loop goes up to 40,000 predictors.
>
> Alex
> On Tue, Jul 14, 2009 at 5:18 PM, Vito Muggeo (UniPa)
> <vito.mug...@unipa.it> wrote:
>
> > dear Alex,
> > I think your problem, with a large number of predictors and a
> > relatively small number of subjects, may be addressed via some
> > regularization approach (ridge or lasso regression..)
> >
> > hope this helps you,
> > vito
> >
> > Alex Roy ha scritto:
> >
> >>  Dear All,
> >>                 I have a matrix, say X (100 x 40,000), and a vector,
> >> say y (100 x 1). I want to perform linear regression. I have scaled
> >> the X matrix using scale() to get mean zero and s.d. 1, but I still
> >> get very high values for the regression coefficients. If I scale the
> >> X matrix, the regression coefficients should behave like correlation
> >> coefficients and should not be more than 1. Am I right? I do not know
> >> what is going wrong.
> >> Thanks for your help.
> >> Alex
> >>
> >>
> >> Code:
> >>
> >> UniBeta <- sapply(1:dim(X)[2], function(k)
> >>   summary(lm(y ~ X[, k]))$coefficients[2, 1])
> >>
> >> pval <- sapply(1:dim(X)[2], function(l)
> >>   summary(lm(y ~ X[, l]))$coefficients[2, 4])
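On the scaling question above: the slope of a simple regression equals the Pearson correlation only when both the predictor and the response are standardized, which is why scaling X alone still allows coefficients larger than 1. A quick sketch with hypothetical toy data:

```r
## Slope of standardized-on-standardized regression equals cor(x, y).
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)

b <- coef(lm(scale(y) ~ scale(x)))[2]  # slope with both variables scaled
all.equal(unname(b), cor(x, y))        # TRUE
```

If only `x` is scaled, the slope is `cor(x, y) * sd(y)`, so its magnitude is bounded by `sd(y)`, not by 1.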
> >>
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> > --
> > ====================================
> > Vito M.R. Muggeo
> > Dip.to Sc Statist e Matem `Vianelli'
> > Università di Palermo
> > viale delle Scienze, edificio 13
> > 90128 Palermo - ITALY
> > tel: 091 6626240
> > fax: 091 485726/485612
> > http://dssm.unipa.it/vmuggeo
> > ====================================
> >
>
>
>
>


