Have you looked into Bioconductor? It has its own mailing list, and the Bioconductor project includes many packages designed for genetic analysis.
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of DB1984
> Sent: Friday, February 04, 2011 4:21 PM
> To: r-help@r-project.org
> Subject: Re: [R] Finding non-normal distributions per row of data frame?
>
>
> Greg, Dennis - thanks for your input. I really appreciate the feedback, as it is not easy to source.
>
> In terms of the data: I've described it as 20 columns, which is the smallest dataset, but this can run to 320 columns, so in some cases there is likely to be enough power to detect non-normality. That said, a better solution would be useful.
>
> As a first approximation, I looked at the mean/median ratio as an indicator of simple skew in the data, which suggested that most of the data was normally distributed. I took the 'nuggets' to be those rows with a mean/median ratio in the top or bottom 1% of the data. This was a small group; overall, the data appear relatively normally distributed within rows.
>
> The aim is really to find those nuggets with significantly non-normal distributions. My hope was to take the tails of the p-values from Shapiro-Wilk, or some similar test, and find them enriched with nuggets. This may not be an appropriately robust approach, but is there a better option?
>
> One idea was to sort the data in each row and perform a linear regression. For normal distributions I expect the intercept to be close to the mean. Using (intercept - mean) and the p-values for the fit of the regression would be another way to filter out the nuggets in the dataset.
>
> If it helps, the nuggets I am expecting are either 80% grouped around the mean with 20% forming a one-sided tail, or an approximately bimodal distribution.
>
> As I'd imagine is obvious, I don't have an ideal solution for finding these nuggets, so coming up with the R code to do so is harder still. If anybody has insight into this sort of problem and can point me toward further reading, that would be helpful. If there is a ready-made solution, even better!
>
> As I said, thanks for your time with this...
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3261203.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
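The per-row screens described in the quoted message (mean/median ratio, Shapiro-Wilk p-values, and "sort each row and regress") can be sketched in R roughly as follows. This is a minimal sketch under several assumptions, not the poster's actual code: the data are taken to be a numeric matrix with one row per feature; the example data are simulated and hypothetical, with two planted rows shaped like the described nuggets; "sort and regress" is interpreted as regressing the sorted row values on expected normal quantiles, as in a normal QQ plot; and the helper name row_screen is hypothetical.

```r
# Sketch only: simulated, hypothetical data stand in for the real data frame.
set.seed(1)
n_rows <- 100
n_cols <- 20
m <- matrix(rnorm(n_rows * n_cols, mean = 10, sd = 1), nrow = n_rows)

# Plant two rows shaped like the expected "nuggets":
# row 1: ~80% near the centre with a ~20% one-sided tail
m[1, ] <- c(rnorm(16, mean = 10, sd = 0.5), rnorm(4, mean = 16, sd = 0.5))
# row 2: approximately bimodal
m[2, ] <- c(rnorm(10, mean = 7, sd = 0.5), rnorm(10, mean = 13, sd = 0.5))

# Hypothetical helper: several per-row screens for one numeric vector x
row_screen <- function(x) {
  q <- qnorm(ppoints(length(x)))   # expected normal quantiles (QQ-plot x-axis)
  fit <- lm(sort(x) ~ q)           # regress sorted values on those quantiles
  c(mean_median = mean(x) / median(x),          # crude skew indicator
    shapiro_p   = shapiro.test(x)$p.value,      # Shapiro-Wilk p-value
    qq_r2       = summary(fit)$r.squared,       # straightness of the QQ plot
    intercept_minus_mean = unname(coef(fit)[1]) - mean(x))
}

# apply() over rows returns one column per row, so transpose back
res <- as.data.frame(t(apply(m, 1, row_screen)))

# Adjust the per-row p-values for multiple testing (Benjamini-Hochberg)
res$shapiro_fdr <- p.adjust(res$shapiro_p, method = "BH")

# Rows ordered from most to least evidence against normality
head(res[order(res$shapiro_p), ])
```

One consequence of the QQ-regression interpretation: the expected normal quantiles are symmetric and average to zero, so the least-squares intercept equals the row mean exactly, and (intercept - mean) is essentially zero for every row. The R-squared of the fit (how straight the QQ plot is) is the more informative quantity here; it is closely related to the Shapiro-Francia statistic. The BH adjustment is one common way to handle testing many rows at once, and with only 20 columns per row the power of any of these tests will be limited.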