Re: [R] Regression using age and Duration of disease as a continous factors

Steve Lianoglou Tue, 21 Jul 2009 11:26:46 -0700

it looks like the expert
individuals just come to poke fun at the expense of those of us who have no background in
statistics.

This isn't really a fair statement ... I'd simply suggest to be mindful of what you ask. It was as if you couldn't be bothered to take the time to fully describe your problem (how was anybody supposed to deduce what you explained below from your original email??), but wanted other people to take their time to understand what you want and do your work for you.

When you look at it that way, it's not a big surprise that you received some of the answers you received. Lastly, I'm not sure how true this is through and through, or how relevant it is to *this particular scenario*, but when people post to a somehow-professional list such as this one, I'd think it's generally frowned upon to use some bizarre alias instead of a real name (my 2 cents, there).


In any event, perhaps we can all move on.

As a disclaimer, anything I say from here on out should be taken with a grain of salt:

I have 8 proteins and two groups, with 840 samples in control and 1140 samples in disease, further stratified by sex, draw age, and duration of disease. All these groups and subgroups are making it very confusing as to how to do the regression in R. The purpose is to show the changes in the levels of these proteins as the disease progresses, or changes in their levels
with respect to progression in age, effect of gender, and SNPs for these
proteins. It is a pretty big dataset.

I'd start by trying to create some clever graphics to see if you can eyeball any trends, and to judge whether you can get some juice out of further downstream analysis.
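As a rough illustration of what "eyeballing" could start from before any plotting, here is a minimal sketch that tabulates the mean level of one protein per (group, sex, age band) stratum. It is plain Python rather than R, the records are made-up toy values, and the names `records`, `age_band` and `stratified_means` are hypothetical, so treat it as a sketch of the idea and not as the way to do it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (group, sex, age_at_draw, protein_level).
# The values are invented for illustration only.
records = [
    ("control", "F", 34, 1.10),
    ("control", "M", 61, 1.30),
    ("control", "F", 58, 1.25),
    ("disease", "M", 37, 1.60),
    ("disease", "M", 65, 2.10),
    ("disease", "M", 70, 2.30),
]


def age_band(age, width=20):
    """Bin an age into a band label such as '20-39'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def stratified_means(rows):
    """Mean protein level for each (group, sex, age band) stratum."""
    buckets = defaultdict(list)
    for group, sex, age, level in rows:
        buckets[(group, sex, age_band(age))].append(level)
    return {key: mean(vals) for key, vals in buckets.items()}


summary = stratified_means(records)
for key in sorted(summary):
    print(key, round(summary[key], 3))
```

A table like this, repeated per protein, is the sort of thing one would then turn into box plots or scatter plots against age and duration of disease.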

Anyway, I don't think there is a simple answer you can get from an email, and I'm a bit surprised that your statistician mentor doesn't have at least some idea of where to start. It sounds like you want to build some predictive model that uses the values in your predictor variables to predict some real valued expression of your protein(s) -- and the problem is that there is no guarantee that you can do this with the data you have anyway (repeat after me: "research is fun").

That being said, one (overly) simple approach (there is no grouping/subgrouping here) is to use glmnet to try lasso or elastic net regression, using all the factors you mention as predictor variables for the 8 different output vectors, which would be the individual expression of your proteins (so -- that's 8 different models you're trying to learn).

The hope is that the lasso will nuke some of the predictors (by setting their coefficients to 0) and help you find "the most important" factors that influence the protein expression ... in all likelihood, this probably won't work ... and if this is the type of answer you are looking to get, I'm not sure you will get anything satisfactory (repeat after me: "research is fun").
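To make the "nuke some of the predictors" part concrete, here is a minimal sketch of the lasso fitted by cyclic coordinate descent. This is not glmnet and not its API; it is a hand-rolled, pure-Python illustration on simulated data, and every name in it (`soft_threshold`, `lasso_cd`, `centre`, the penalty `lam`) is an assumption made for the example. Only two of the five simulated predictors actually drive the response.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything within [-gamma, gamma] becomes 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimises (1/(2n)) * sum((y - X b)^2) + lam * sum(|b|) without an
    intercept, so the columns of X and y are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that ignores column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def centre(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Simulated data (an assumption for illustration): 200 samples, 5 predictors,
# of which only predictors 0 and 2 influence the response.
random.seed(0)
n, p = 200, 5
cols = [centre([random.gauss(0, 1) for _ in range(n)]) for _ in range(p)]
X = [[cols[j][i] for j in range(p)] for i in range(n)]
y = centre([2.0 * X[i][0] - 1.5 * X[i][2] + random.gauss(0, 0.1) for i in range(n)])

beta = lasso_cd(X, y, lam=0.2)
print([round(b, 3) for b in beta])
```

With a penalty of this size the pure-noise predictors should come out as exactly zero while the two real effects survive in shrunken form. For the actual problem you would repeat a fit like this once per protein (8 models), and in practice pick the penalty by cross-validation rather than fixing it by hand.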

I am not here to ask someone to do my data analysis, but to get an
understanding of the process as well as a proper direction to look for the analysis. after all I do have to explain all these things to my boss as
well.

I'm not an expert, but there is no canned process to do this ... and like I said, there is no guarantee you can do this ... I mean, does it make sense to set up your problem in this way and expect a reasonable outcome (biologically speaking)? Do you have to somehow take into account how these 8 proteins are interacting w/ each other? Many questions to answer ...

Anyway ... I'm not sure there's any real value in this email, but I've got my own fish to fry so time to move on ...


-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
