it looks like the experts
individuals just come to poke fun at our expense who has no background of
statistics.

This isn't really a fair statement ... I'd simply suggest being mindful of how you ask. It was as if you couldn't be bothered to take the time to fully describe your problem (how was anybody supposed to deduce what you explained below from your original email??), but wanted other people to take their time to understand what you want and do your work for you.

When you look at it that way, it's not a big surprise that you received some of the answers you received. Lastly, I'm not sure how universally true this is, or how relevant it is to *this particular scenario*, but when people post to a somewhat professional list such as this one, I'd think it's generally frowned upon to use some bizarre alias instead of a real name (my 2 cents, there).

In any event, perhaps we can all move on.

As a disclaimer, anything I say from here on out should be taken with a grain of salt:

I have 8 proteins and I have two groups with 840 samples in control and 1140 samples in diseases further stratified by sex, draw age, duration of disease. all these groups and sub groups is making the thing very confusing as how to do the regression in R. the purpose is to show the changes in the levels of these proteins as the disease progress or changes in their levels
with respect to progression in age, effect of gender, SNPs for these
proteins, it is a pretty big dataset.

I'd start by trying to create some clever graphics to see if you can eyeball any trends, and to judge whether there is any juice to be had from further downstream analysis.
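As a rough sketch of the kind of first-look plots I mean (the data below is simulated, and every column name is made up, since I haven't seen your actual data):

```r
# Simulated stand-in for the real data; all column names are hypothetical.
set.seed(1)
n_control <- 840
n_disease <- 1140
n <- n_control + n_disease

d <- data.frame(
  group    = factor(rep(c("control", "disease"), c(n_control, n_disease))),
  sex      = factor(sample(c("F", "M"), n, replace = TRUE)),
  age      = round(runif(n, 20, 80)),
  protein1 = rlnorm(n)  # fake, right-skewed "expression" values
)
d$log_protein1 <- log(d$protein1)  # log scale tames the skew

# Distribution of one protein for each group/sex combination.
boxplot(log_protein1 ~ group + sex, data = d,
        xlab = "group.sex", ylab = "log(protein1)")

# Protein level against age, one panel per group/sex combination,
# with a smoother drawn through each panel.
coplot(log_protein1 ~ age | group * sex, data = d, panel = panel.smooth)
```

You'd repeat this for each of the 8 proteins, and swap age for duration of disease, etc.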

Anyway, I don't think there is a simple answer you can get from an email, and I'm a bit surprised that your statistician mentor doesn't have at least some idea of where to start. It sounds like you want to build some predictive model that uses the values in your predictor variables to predict some real valued expression of your protein(s) -- and the problem is that there is no guarantee that you can do this with the data you have anyway (repeat after me: "research is fun").

That being said, one (overly) simple approach (there is no grouping/subgrouping here) is to use glmnet to try lasso or elastic net regression, using all the factors you mention as predictor variables for the 8 different output vectors, which would be the individual expression of your proteins (so -- that's 8 different models you're trying to learn).

The hope is that the lasso will nuke some of the predictors (by setting their coefficients to 0) and help you find "the most important" factors that influence the protein expression ... in all likelihood, this probably won't work ... and if this is the type of answer you are looking to get, I'm not sure you will get anything satisfactory (repeat after me: "research is fun").
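A minimal sketch of that idea, again on simulated data with made-up column names (I'm assuming the glmnet package is installed; cross-validating to pick lambda via cv.glmnet is my choice here, not the only way to do it):

```r
library(glmnet)

set.seed(42)
n <- 300  # small simulated stand-in; all column names are hypothetical

d <- data.frame(
  disease  = factor(sample(c("control", "disease"), n, replace = TRUE)),
  sex      = factor(sample(c("F", "M"), n, replace = TRUE)),
  age      = runif(n, 20, 80),
  duration = runif(n, 0, 15),
  snp1     = sample(0:2, n, replace = TRUE)  # genotype coded as allele count
)

# Eight fake protein levels; only protein1 is made to depend on age.
Y <- matrix(rnorm(n * 8), n, 8,
            dimnames = list(NULL, paste0("protein", 1:8)))
Y[, 1] <- Y[, 1] + 0.05 * d$age

# glmnet wants a numeric matrix: expand factors into dummy columns and
# drop the intercept column (glmnet fits its own intercept).
X <- model.matrix(~ ., data = d)[, -1]

# One cross-validated lasso fit per protein (alpha = 1 is the lasso;
# a value between 0 and 1 gives elastic net).
fits <- lapply(colnames(Y), function(p) cv.glmnet(X, Y[, p], alpha = 1))
names(fits) <- colnames(Y)

# Coefficients at the more conservative lambda; predictors the lasso
# has "nuked" show up as exact zeros.
coefs <- sapply(fits, function(f) as.matrix(coef(f, s = "lambda.1se"))[, 1])
print(round(coefs, 3))
```

Each column of coefs is one protein's model, so you can scan across rows to see whether a predictor survives for several proteins or for none.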

I am not here to ask someone to do my data analysis, but to get an
understanding of the process as well as a proper direction to look for the analysis. after all I do have to explain all these things to my boss as
well.

I'm not an expert, but there is no canned process to do this ... and like I said, there is no guarantee you can do this ... I mean, does it make sense to set up your problem in this way and expect a reasonable outcome (biologically speaking)? Do you have to somehow take into account how these 8 proteins are interacting w/ each other? Many questions to answer ...

Anyway ... I'm not sure there's any real value in this email, but I've got my own fish to fry so time to move on ...

-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
