Re: Some "elementary" issues -

kchittur Wed, 24 Nov 1999 08:17:08 -0800
On Tue, 23 Nov 1999, Donald F. Burrill wrote:
> Preliminary response:
> On 22 Nov 1999 [EMAIL PROTECTED] wrote:
> > I am looking for a way to characterize a set of data - each set consists
> > of many thousands of data points spanning a wide range over three
> > sometimes four orders of magnitudes. 
>       This leads one to wonder whether the original variables, or their
> logarithms, would be the more appropriate metric for descriptive purposes. 
> What do the distributions look like?

Will try to answer this - 

A typical data set is such that if the maximum value is 30,000
most (90 to 95%) of values are less than 10 percent of this max -
i.e. most values are "small" - (I am looking at cDNA microarray 
data - this data tells us of the relative levels of different
messenger RNA in cells - so while there are a few mRNAs present
at high levels, most of the messages are at small levels)
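
To put numbers on "most values are small", here is a minimal Python sketch. It does not use my real data: the intensities are simulated from a log-normal distribution, and both that shape and its parameters (4.0, 1.5) are assumptions picked only to mimic a long right tail.

```python
import math
import random

rng = random.Random(1)
# Simulated stand-in for one membrane: 5000 log-normal intensities.
# The distribution and its parameters are assumptions, not measured values.
intensities = [rng.lognormvariate(4.0, 1.5) for _ in range(5000)]

top = max(intensities)
# Fraction of spots whose intensity is under 10 percent of the maximum.
frac_small = sum(v < 0.1 * top for v in intensities) / len(intensities)
# Width of the range, expressed in orders of magnitude (powers of ten).
decades = math.log10(top / min(intensities))

print(f"fraction below 10% of max: {frac_small:.3f}")
print(f"range spans about {decades:.1f} orders of magnitude")
```

With real data, the list of simulated values would simply be replaced by the measured intensities from one experiment.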

>       This would imply that the only characteristics of interest are 
> the mean and s.d.  Is that really so?  (If the 10 distributions are all 
> (approximately) Gaussian (aka "normal"), these are all you need;  but if 
> they are not, rather more descriptive information is probably needed. 
> As remarked above, the fact that your range extends over several orders 
> of magnitude seems to suggest that the distributions probably are NOT 
> Gaussian. 

I am sure there is a straightforward way to see if a given 
distribution behaves "normally"??  I know the equation for the
normal distribution - i.e. given sigma and mean, I can draw
the Bell Curve - I am trying to understand something called
the QQ plot(?) to check for "normality".
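
As far as I understand it, a QQ plot pairs the sorted, standardized data with the quantiles a normal distribution would give at the same ranks; if the data are close to normal the points fall near a straight line. Below is a rough Python sketch of that idea using only the standard library. The function names qq_points and qq_correlation are my own, the data are simulated (log-normal, an assumption), and the correlation of the QQ points is used here only as a crude numerical stand-in for "looks like a straight line", not as a formal test.

```python
import math
import random
from statistics import NormalDist

def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot."""
    xs = sorted(values)
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n avoid the infinite quantiles at 0 and 1.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    observed = [(x - mean) / sd for x in xs]
    return theoretical, observed

def qq_correlation(values):
    """Correlation of the QQ points; close to 1 suggests roughly normal data."""
    t, o = qq_points(values)
    n = len(t)
    mt, mo = sum(t) / n, sum(o) / n
    cov = sum((a - mt) * (b - mo) for a, b in zip(t, o))
    var_t = sum((a - mt) ** 2 for a in t)
    var_o = sum((b - mo) ** 2 for b in o)
    return cov / math.sqrt(var_t * var_o)

rng = random.Random(0)
# Simulated intensities (assumed log-normal), then their base-10 logarithms.
raw = [rng.lognormvariate(4.0, 1.5) for _ in range(5000)]
logged = [math.log10(x) for x in raw]

r_raw = qq_correlation(raw)
r_log = qq_correlation(logged)
print(f"QQ correlation, raw values: {r_raw:.3f}")
print(f"QQ correlation, log10 values: {r_log:.3f}")
```

Plotting the theoretical quantiles against the observed ones gives the QQ plot itself; for skewed data like this the raw values bend sharply away from the line while the logarithms stay close to it.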

> On reflection, I see that I've been assuming that each of your 10 data 
> sets contains univariate values whose distribution is of interest;  but 
> your description is also consistent with having multivariate values, or 
> (what is not quite the same thing) having a clutch of subsets, each of 
> which is of interest for its own distribution (or parameters thereof). 
> If those several orders of magnitude arise from several systematic 
> differences within each data set, then a less simple-minded approach than 
> what I've outlined above would surely be called for.
> 
The difference in magnitudes of the mRNA levels _may be_ systematic -

Let me explain a bit more here - 

A given experiment consists of approximately 5000 data points of 
different intensities - intensities vary from background values
of between 7 and 14 up to about 30,000 or so - the intensities are
proportional to the amount of messenger RNA (mostly, there are 
exceptions) - these 5000 or so points are obtained by 
throwing a bunch of messenger RNA against a membrane that 
contains 5000 or so genes - replicate experiments are done with
different membranes, different mRNA preparations - if we look at 
a given spot on the membrane, the intensity varies from 
experiment to experiment - the question is how do we 
characterize these variations?  - these variations can be due
to different membrane lots, different RNAs, different people doing
the experiment (!) etc, etc - 
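
One possible way to describe the spot-to-spot variation, sketched in Python: for each spot, summarize the replicate intensities by their mean, their coefficient of variation, and the standard deviation of their base-10 logarithms. The spot names and numbers below are invented purely for illustration (they are not measurements), and spot_summary is a name I made up.

```python
import math
from statistics import mean, stdev

# Hypothetical intensities: one entry per spot, one value per replicate experiment.
# These numbers are invented for illustration, not real measurements.
replicates = {
    "spot_A": [12.0, 9.0, 14.0],
    "spot_B": [250.0, 310.0, 190.0],
    "spot_C": [28000.0, 30000.0, 21000.0],
}

def spot_summary(intensities):
    """Summarize replicate-to-replicate variation for a single spot."""
    logs = [math.log10(v) for v in intensities]
    m = mean(intensities)
    return {
        "mean": m,
        "cv": stdev(intensities) / m,  # relative spread on the raw scale
        "log10_sd": stdev(logs),       # spread on the log scale
    }

summary = {spot: spot_summary(vals) for spot, vals in replicates.items()}
for spot, s in summary.items():
    print(f"{spot}: mean={s['mean']:.1f} cv={s['cv']:.2f} log10_sd={s['log10_sd']:.3f}")
```

The reason for looking at the coefficient of variation or the log-scale spread rather than the raw standard deviation is that they can be compared between a background-level spot and a very bright one. This does not separate out the sources of variation (membrane lot, RNA preparation, person) - that would need the experiments labelled by those factors.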

I apologize if this is long-winded - but short of a real 
explanation -