Correlated random numbers

1999-11-16 Thread Rich Strauss

I have a problem that I had initially thought would be straightforward (but
then, what is?).  For a Monte Carlo-type simulation study, I want to be
able to generate sets of pseudorandom numbers having correlations equal
to (or differing only randomly from) a target correlation matrix that I
specify up front, based on postulated relationships among variables.  This
is very easy to do using the classic method of Kaiser & Dickman (1962), as
long as the target correlation matrix is positive definite (PD) (ie, has
all positive eigenvalues).  If not, the algorithm (programmed in Matlab)
returns complex numbers, which are not satisfactory for my purposes.
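
For concreteness, here's a minimal Matlab sketch of the Kaiser-Dickman
idea as I've implemented it (the example matrix is made up, and my actual
code differs in details):

  R = [1.0 0.8 0.3; 0.8 1.0 0.3; 0.3 0.3 1.0];  % made-up target matrix
  p = size(R,1);  n = 1000;
  [V,D] = eig(R);       % spectral decomposition: R = V*D*V'
  F = V * sqrt(D);      % a "square root" of R, since F*F' = R
  X = randn(p,n);       % uncorrelated standard normal deviates
  Y = F * X;            % rows of Y have correlations approximating R
  corrcoef(Y')          % check: should differ from R only randomly

The complex numbers arise at the sqrt(D) step whenever R has a negative
eigenvalue.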

So, for a non-PD target correlation matrix, I decided to find the PD matrix
that is "closest" to the target matrix in some sense.  Somewhere in the
past I had gotten the idea that, for a correlation matrix to be PD, all of
the pairwise correlations must be internally consistent with respect to all
of their partial correlations.  So I wrote another function that
iteratively and minimally adjusts all correlations until each lies within
the range implied by all of the corresponding partial correlations.  To my
surprise, the resulting matrix is still not positive definite, which means
that my idea about positive-definiteness is wrong.  Or at least
that this kind of internal consistency is necessary but not sufficient.

So my question is: in what way should I be adjusting pairwise correlations
so as to find the PD matrix that is "closest" to the target?  After a
reasonably thorough literature search and perusal of texts on linear
algebra and related topics, I've failed to find any literature relevant to
this problem.  Any suggestions on how to proceed, or citations that I've missed?

Thanks.
Rich Strauss



Dr Richard E Strauss
Biological Sciences  
Texas Tech University   
Lubbock TX 79409-3131

Email: [EMAIL PROTECTED]
Phone: 806-742-2719
Fax: 806-742-2963 




Re: Correlated random numbers

1999-11-17 Thread Rich Strauss

Thanks for the various comments I've gotten (most sent directly to me) on
my problem with random sampling from correlation matrices.  

For those who've requested, here's a little bit of background information.
I'm interested in a biological phenomenon known as morphological
integration, and I work on skeletal development in vertebrates (mostly
fishes).  Animals become regionally compartmentalized during development,
such that some suites of bones (those of the jaw, for example, or of the
forelimb) become more tightly integrated with one another (ie, more highly
correlated in their sizes and shapes) than they are with bones in other
suites, although all are correlated at some level.  This can be modeled as
a time-dependent correlation matrix, in which the correlations change with
age or size, with increasing within-suite correlations and decreasing
among-suite correlations.  (Actually, we use covariances rather than
correlations because the scaling is important, but the principles are the
same.)

I'm interested in modeling this for several reasons.  First, several
different quantitative measures of morphological integration (indices) are
in use in the literature, and I'm interested in their (largely unknown)
distributional properties.  Second, morphological integration relates to
several other biological aspects of development, such as fluctuating
bilateral asymmetry, allometric gradients, and metamorphosis, all of which
can also be modeled with time-dependent covariances.

So, what I want to be able to do is to postulate a set of "target"
correlation matrices, varying such things as the numbers of character
(=variable) suites, the numbers of characters per suite, and the strengths
of the within-suite and among-suite correlations, and for each of these to
generate samples of potential "morphologies".  Although most such matrices
will be similar to those observed for real organisms and thus very well
behaved, I occasionally want to gradually push the envelope to extreme
conditions, and that's when I bump into statistically incompatible or
ill-conditioned sets of correlations.  It seemed reasonable to me in such
cases to step back to the "closest" correlation matrix that is internally
consistent, which is where my problem arose.
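
As a toy illustration of the kind of target matrix I mean (all numbers
purely hypothetical; extreme combinations of the two correlation levels
are what break positive-definiteness):

  rw = 0.8;  ra = 0.2;       % within- and among-suite correlations
  p1 = 3;  p2 = 3;  p = p1 + p2;
  R = ra * ones(p);          % among-suite baseline
  R(1:p1, 1:p1) = rw;        % suite 1 block
  R(p1+1:p, p1+1:p) = rw;    % suite 2 block
  R(1:p+1:end) = 1;          % unit diagonal (linear indexing)
  min(eig(R))                % < 0 signals a non-PD target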

Several people have suggested to me the following numerical solution: get
the eigenvectors and eigenvalues, set the negative eigenvalues to zero
(there's generally only one that's negative) and proportionately adjust the
others to maintain the same sum (total variance), and reconstruct the
correlation matrix.  I've tried it, and so far it seems to work very well
in practice.  However, Rich Ulrich has raised the spectre of "nearly
invalid" results, and so what I plan to do is to begin with a
well-conditioned correlation matrix and gradually change it until it
becomes non-positive-definite, and check whether
the adjustment is consistent with the changes I made in the matrix leading
up to the ill-conditioning.
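
In Matlab terms, my transcription of the suggestion looks like this (a
sketch of the verbal recipe, not a published algorithm; the final
restandardization is my own addition, needed because the reconstructed
diagonal drifts away from 1):

  % Given a symmetric target R with unit diagonal but min(eig(R)) < 0:
  p = size(R,1);
  [V,D] = eig(R);
  d = diag(D);
  d(d < 0) = 0;              % set negative eigenvalues to zero
  d = d * (p / sum(d));      % rescale so the eigenvalues again sum to p
  Radj = V * diag(d) * V';   % reconstruct
  Radj = (Radj + Radj') / 2; % clean up rounding asymmetry
  s = sqrt(diag(Radj));
  Radj = Radj ./ (s * s');   % restandardize to an exact unit diagonal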

So if anyone has any further thoughts on this, or if you're interested in
the results, please let me know.  And thanks again for the help I've gotten
so far.

Rich Strauss

At 12:00 PM 11/17/99 -0500, you wrote:
>On 16 Nov 1999 13:29:31 -0800, Rich Strauss ([EMAIL PROTECTED]) wrote:
>
>> I have a problem that I had initially thought would be straightforward (but
>> then, what is?).  For a Monte Carlo-type simulation study, I want to be
>> able to generate sets of pseudorandom numbers having correlations equal
>> to (or differing only randomly from) a target correlation matrix that I
>> specify up front, based on postulated relationships among variables.  This
>> is very easy to do using the classic method of Kaiser & Dickman (1962), as
>> long as the target correlation matrix is positive definite (PD) (ie, has
>> all positive eigenvalues).  If not, the algorithm (programmed in Matlab)
>> returns complex numbers, which are not satisfactory for my purposes.
>> 
>> So, for a non-PD target correlation matrix, I decided to find the PD matrix
>> that is "closest" to the target matrix in some sense....
>
>Slow down;  stop;  back up.
>
>You don't say what your Monte Carlo is for, and why you are putting in
>a variety of correlations, but you don't seem to be taking this "bad
>conditioning"   seriously enough.  -- Look at it this way:  If you set
>yourself up with a matrix that is the next-closest thing to an invalid
>correlation matrix, you are going to get the next-closest thing to
>invalid results -- In this case, it seems that you are planning to do
>it  without ever measuring or recording just how close you are to the
>limit, because you are just (blindly) approximating some target.

Re: Disadvantage of Non-parametric vs. Parametric Test

1999-12-08 Thread Rich Strauss

At 12:04 PM 12/8/99 -0500, Rich Ulrich wrote:

-- snip -- 
>Similarly, bootstrapping is a method of "robust variance estimation"
>but it does not change the metric like a power transformation does, or
>abandon the metric like a rank-order transformation does.  If it were
>proper  terminology to say randomization is nonparametric, you would
>probably want to say bootstrapping is nonparametric, too.  (I think
>some people have done so; but it is not widespread.)

In my fields of interest (ecology and evolutionary biology), it is becoming
increasingly common to refer to two "kinds" of bootstrapping: nonparametric
bootstrapping, in which replicate samples are drawn randomly with
replacement from the original sample; and parametric bootstrapping, in
which samples are drawn randomly from a (usually normal) distribution
having the same mean and variance as the original sample.  The former is
bootstrapping in the traditional sense, of course, while the latter is a
form of Monte Carlo simulation.  Unfortunately, the new terminology seems
to be spreading rapidly.
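
In code, the distinction amounts to no more than this (a toy Matlab
illustration):

  x = randn(50,1) + 10;                  % original sample
  n = length(x);  B = 1000;
  npboot = zeros(B,1);  pboot = zeros(B,1);
  for b = 1:B
    xs = x(ceil(n * rand(n,1)));         % nonparametric: resample x itself
    npboot(b) = mean(xs);
    xp = mean(x) + std(x) * randn(n,1);  % "parametric": draw from fitted normal
    pboot(b) = mean(xp);
  end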

Rich Strauss










Re: ANOVA with proportions

1999-12-14 Thread Rich Strauss

At 12:52 PM 12/14/99 -0800, Dale Berger wrote:
>Just a reminder that transformations can be used on proportions as a dv
>to reduce the skew, important if some values approach 0 or 1.  These
>include arcsine, probit, and logit.  Each needs special treatment when
>p=0 or p=1.  Cohen and Cohen (2nd ed. of Applied MR/C) has a section on
>transformations for proportions (pp. 265-270).

I'll just add the usual caveat that hasn't yet been mentioned in these
responses about proportions: the transformations, use of the binomial, and
comment about proportions just being means all assume that the data really
are proportions, not ratios -- that is, that the denominator is fixed among
all values, not variable.  The problem is that many people use the terms
interchangeably, talking about proportions or percentages when they're
actually dealing with ratios.
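
For what it's worth, when the data really are proportions, the transforms
Dale lists are one-liners in Matlab (a sketch; p = 0 or 1 still needs the
special treatment he mentions):

  p  = [0.05 0.20 0.50 0.80 0.95]';  % true proportions (fixed denominator)
  a  = asin(sqrt(p));                % arcsine (angular) transform
  lg = log(p ./ (1 - p));            % logit transform
  pr = norminv(p);                   % probit transform (Statistics Toolbox)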

Rich Strauss







Re: Ocean Waves: Stationary Random Theory

2000-01-21 Thread Rich Strauss

At 10:53 AM 1/21/00 +, you wrote:
>In the 1983 Guinness Book of World Records under OCEANS, the following
>appears concerning the heights of waves:
>"It has been calculated on the statistics of the Stationary Random theory
>that one wave in more than 300,000 may exceed the average by a factor of 4."
>
>What is a reference on Stationary Random statistical theory?
>What assumptions are involved in modeling random interactions of waves?
>What is the sampling distribution for the heights of "random" waves?

For the latter question, you might check out the literature on extreme
value theory, such as Castillo's book "Extreme value theory in engineering"
(1988 I believe, but I don't know the publisher).  There's a good but
scattered geomorphometry literature on the occurrence of extreme
earthquakes, floods, waves, etc.
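
For what it's worth, the quoted figure is consistent with the classical
narrow-band model in which individual wave heights are Rayleigh
distributed (my assumption, since the Guinness entry doesn't say):

  % Under a Rayleigh model, P(height > k * mean height) = exp(-pi*k^2/4).
  k = 4;
  pexc = exp(-pi * k^2 / 4)          % about 3.5e-6
  1 / pexc                           % about one wave in 287,000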

Rich Strauss









Re: cluster analysis in one-dimensional "circular" space

2000-04-17 Thread Rich Strauss

Since clustering methods begin with pairwise distances among observations,
why not measure these distances as minimum arc-lengths along the
best-fitting circle (or min chord lengths, or min angular deviations with
respect to the centroid, etc)?  This is how geographic distances are
measured (in 2 dimensions, rather than one) and clustered, and also how
distances are measured among observations in Kendall's shape spaces (e.g.,
Procrustes distances), so there's a well-established literature.
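
In Matlab, for example, something like the following (a sketch, assuming
the data are angles in radians and the Statistics Toolbox is available
for the clustering step):

  theta = 2*pi*rand(20,1);               % toy angular data, in radians
  n = length(theta);
  D = zeros(n);
  for i = 1:n-1
    for j = i+1:n
      d = abs(theta(i) - theta(j));
      D(i,j) = min(d, 2*pi - d);         % minimum arc length (unit circle)
      D(j,i) = D(i,j);
    end
  end
  Z = linkage(squareform(D), 'average'); % cluster on the circular distances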

Rich Strauss

At 05:32 PM 4/14/00 +0200, you wrote:
>Hi everybody.
>I face the problem of clustering one-dimensional data that can range in a
>circular way. Does anybody knows the best way to solve this problem with no
>aid of an additional variable ? Using a well-suitable trigonometric
>transform ? Using an ad-hoc metric ?
>Thanks.
>
>Carl








Re: differences between groups/treatments ?

2000-06-22 Thread Rich Strauss

At 04:31 PM 6/22/00 +, Gene Gallagher wrote:
>This pattern was described in an obit about two-three years ago in the
>NY Times.  A statistician's obit noted that he'd found a flaw in the
>Israeli air force's training program.  Apparently, the Israeli air force
>was punishing the worst performers in a test because this usually
>produced a better performance in subsequent tests and was supposedly
>much more effective than positive reinforcement.  They'd found that
>positive reinforcement of the best performers often resulted in a poorer
>performance on the next test.  This now-deceased statistician pointed
>out the confounding effect of regression to the mean on this assessment
>of negative and positive reinforcement.  The effectiveness of negative
>reinforcement (punishment) could be nothing more than a chance effect.

A few years ago the journal "Statistical Methods in Medical Research"
published an issue on regression to the mean (vol 6, no 2, 1997).  It
included the following five papers:

Stigler, S.M.  Regression towards the mean, historically considered
  (pp. 103-114)

Chuang-Stein, C., Tong, D.M.  The impact and implication of regression to
  the mean on the design and analysis of medical investigations (pp. 115-128)

Lin, H., Hughes, M.  Adjusting for regression toward the mean when
  variables are normally distributed (pp. 129-146)

Chesher, A.  Non-normal variation and regression to the mean (pp. 147-166)

Copas, J.  Using regression models for prediction: shrinkage and
  regression to the mean (pp. 167-183)
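
Incidentally, the air-force anecdote above is easy to reproduce by
simulation (a toy Matlab sketch, assuming performance = stable skill +
independent noise):

  n = 10000;
  skill = randn(n,1);                % stable component
  test1 = skill + randn(n,1);
  test2 = skill + randn(n,1);        % no feedback of any kind between tests
  worst = test1 < prctile(test1,10); % "punished" bottom decile on test 1
  best  = test1 > prctile(test1,90); % "praised" top decile on test 1
  mean(test2(worst) - test1(worst))  % positive: improvement, by chance alone
  mean(test2(best)  - test1(best))   % negative: decline, by chance alone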

Rich Strauss











Re: Adjusting a Correlation Matrix

2000-07-06 Thread Rich Strauss

At 03:46 PM 7/6/00 +, Christian A. Walter wrote:
>Does anyone know if there is a structured way to adjust a negative
>definite matrix such that it becomes semi-definite, while "minimizing"
>the induced changes to the matrix?
>
>Cheers,
>Christian

I posed a similar question to edstat last fall.  I was specifically
concerned with non-positive-definite correlation matrices.  Several people
suggested to me the following numerical solution: get the eigenvectors and
eigenvalues, set the negative eigenvalues to zero (there's generally only
one that's negative) and proportionately adjust the others to maintain the
same sum (total variance), and reconstruct the correlation matrix.  This
seems to work very well in practice.  I've also done some simulations,
beginning with a well-conditioned correlation matrix and gradually changing
it until it becomes slightly ill-conditioned.  The eigen procedure
successfully 'corrects' the matrix.
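
A stripped-down version of that check (assuming the eigenvalue recipe has
been wrapped in a function; nearestpd is a hypothetical name, not a
built-in):

  R = [1 .5 .5; .5 1 .5; .5 .5 1];   % start well-conditioned
  for r = 0.55:0.05:0.95
    R(1,2) = r;   R(2,1) = r;        % strengthen one correlation...
    R(1,3) = -r;  R(3,1) = -r;       % ...while forcing an incompatible one
    if min(eig(R)) < 0
      Radj = nearestpd(R);           % hypothetical clip-and-rescale function
      disp([r, min(eig(R)), min(eig(Radj))])
    end
  end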

Rich Strauss











Re: Question

2000-10-30 Thread Rich Strauss

Reference:  Lande, R.  1977.  On comparing coefficients of variation.
Systematic Zoology 26:214-217.

Simplest approach: since the squared CV is approximately equal to the
variance of the log-transformed data for CV < 30% or so, compare the
squared CVs with an F-test or equivalent.  Or, compare the variances of the
original log-transformed data using Levene's test or equivalent.
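
In Matlab terms, for two independent samples, the simplest approach looks
something like this (a sketch with made-up data; your same-data situation
may call for a paired variant):

  x = 20 + 4*randn(50,1);  y = 20 + 6*randn(60,1);      % made-up samples
  vx = var(log(x));  vy = var(log(y));  % approx squared CVs when CV < 30%
  F = vx / vy;
  df1 = length(x) - 1;  df2 = length(y) - 1;
  pval = 2 * min(fcdf(F,df1,df2), 1 - fcdf(F,df1,df2))  % two-tailed F test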

Rich Strauss

At 07:56 AM 10/30/00 -0500, you wrote:
>Hi!
>
>My question is on a test to compare CVs.  The CVs are computed using the
>same data but two different variance methods and I have to compare them.
>Been told there is no real test and as of yet have not checked the Current
>Index of Stat books but wondered if someone in the group has had this
>problem.  Someone suggested that take one of the Cvs and make it the
>population CV and do 95% C.I. around that.  Any suggestions??  Thanks.









Problem on the probability of death

2001-01-16 Thread Rich Strauss

I have what seems to be a straightforward question involving a conditional
probability, but I must be missing something because I can't quite get a
handle on it.  Let's say I have treatment and control groups with
individuals preassigned to each, with T individuals in the treatment group
and C in the control group.  I observe mortality after some period of time,
with t of T dying in the treatment group and c of C in the control group.
I would like a measure of the probability of death due to the treatment,
over and above (in some sense) the probability of death in the control group.

I know that P(x of T) is hypergeometric, assuming that the probabilities of
death for treatment and control are identical, so I know how to determine
whether (t of T) is significantly greater than (c of C).  And I've just
verified that this probability is the same as the chi-square probability
for the 2 x 2 contingency table.  But how do I measure this effect?  As a
simple difference between the probabilities for the two groups?

I initially guessed that the value I wanted was just P(death | treatment),
but of course this turns out to be just the ratio t/T, which contains no
information about the control group.  I'm sure this must be commonly done,
as, for example, in estimating the additional probability of death at a
particular age due to smoking, but I've scanned the resources (texts,
personnel, etc.) I have available and can't find the relevant information.
Can someone point me in the right direction?
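
To fix ideas, here are the obvious candidate measures in Matlab notation
(counts made up; which of these, if any, is appropriate is really my
question):

  t = 12;  T = 50;  c = 4;  C = 50;  % made-up counts
  pt = t/T;  pc = c/C;
  rd  = pt - pc                      % risk difference (excess risk)
  rr  = pt / pc                      % relative risk
  afe = (pt - pc) / pt               % attributable fraction among the treated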

Thanks in advance.

Rich Strauss









Nonrandomness of binary matrices

2001-07-23 Thread Rich Strauss

Say I have a binary data matrix for which both the rows (observations) and
columns (variables) are completely permutable.  (In practice, about 5-20% of
the cells will contain 1's, and the remainder will contain 0's.)   Assume
that the expected probability of a cell containing a '1' is identical for
all cells in the matrix.  I'd like to be able to test this assumption by
measuring (and testing the significance of) the degree of 'nonrandomness'
of the 1's in the matrix.

If the rows and columns were fixed in sequence, then this would be an easy
problem involving spatial statistics, but the permutability seems to really
complicate things.  I think that I can test the rows or columns separately
by comparing the row or column totals against a corresponding binomial
distribution using a goodness-of-fit test, but I can't get a handle on how
to do this for the entire matrix.  I'd really appreciate ideas about this.
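
For the rows alone, here is the sort of thing I have in mind (a sketch;
categories with tiny expected counts would need pooling before the
chi-square could be trusted):

  X = rand(30,10) < 0.15;              % stand-in for the real binary matrix
  [n, p] = size(X);
  phat = mean(X(:));                   % overall probability of a 1
  rowtot = sum(X, 2);                  % each ~ Binomial(p, phat) under H0
  expf = n * binopdf(0:p, p, phat);    % expected counts of totals 0..p
  obsf = histc(rowtot, 0:p)';          % observed counts of totals 0..p
  chi2 = sum((obsf - expf).^2 ./ expf) % goodness-of-fit statistic
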
Thanks in advance.

Rich Strauss






Re: Nonrandomness of binary matrices

2001-07-25 Thread Rich Strauss

Thanks to Rich Ulrich for the suggestion below -- that was the direction I
was heading, but there seem to be difficulties.  The general problem is
that I have a standard [nxp] data matrix, but (skipping over the scientific
details) some of the values are "special", typically 5-20% of them, and I
want to know whether their distribution within the matrix is structured in
some way.  In particular, they might be concentrated in particular rows or
columns, but beyond that I have no notion of "nonrandom".  I'm hoping that
they're uniformly randomly distributed (or rather, not significantly
different from random) because then I can basically ignore the fact that
they're special, for the scientific problem at hand.

I'd like to have two things: a nicely behaved index of "nonrandomness"
(perhaps a test statistic, rescaled to an interval 0-1?) and a significance
test.  So I recoded the matrix as binary, with the special values coded as
1s.  I presumed that the null marginal distributions would be binomial
rather than Poisson because the frequency of occurrence is so high, but
either way I could test that.  And if I measured the deviations of marginal
totals from expected (as a chi-square statistic, perhaps, or a mean squared
deviation) that would provide both an index and a goodness-of-fit
significance test for the entire matrix.

But the problem is: what if the row totals and column totals are not
independent?  I've done a few 2-way chi-square contingency tests on these
matrices (using randomized null distributions, of course, since the
matrices are binary), and some of the results are statistically
significant.  Doesn't this mean that I can't simply accumulate the row and
column totals for a goodness-of-fit test, since they're not always
independent?  And even if I did the goodness-of-fit tests for rows and
columns independently, how do I combine the p-values to get a single level
of significance for the entire matrix, if the tests are not independent?

I have the feeling that I'm missing something obvious here but I can't
quite get a handle on it, and this little problem is holding up the
analysis of the results from a much larger study.  I've talked to
statisticians on campus, with little progress, so basically I'm begging for
help.
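
The one way forward I can see is to give up on the distribution theory
entirely and build the null distribution by Monte Carlo (a sketch; the
index here is just a crude placeholder):

  X = rand(30,10) < 0.15;                 % stand-in for the real matrix
  [n, p] = size(X);  m = sum(X(:));
  obs = var(sum(X,2)) + var(sum(X,1));    % placeholder nonrandomness index
  B = 2000;  nulldist = zeros(B,1);
  for b = 1:B
    M = zeros(n,p);
    idx = randperm(n*p);
    M(idx(1:m)) = 1;                      % scatter the m ones at random
    nulldist(b) = var(sum(M,2)) + var(sum(M,1));
  end
  pval = mean(nulldist >= obs)            % one-sided Monte Carlo p-value

Because the randomized matrices carry whatever row/column dependence the
null implies, no independence assumption seems to be needed for the
p-value.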

Rich Strauss

At 10:47 AM 7/25/01 -0400, you wrote:
>On 23 Jul 2001 14:22:58 -0700, [EMAIL PROTECTED] (Rich Strauss)
>wrote:
>
>> Say I have a binary data matrix for which both the rows (observations) and
>> columns (variables) are completely permutable.  (In practice, about 5-20% of
>> the cells will contain 1's, and the remainder will contain 0's.)   Assume
>> that the expected probability of a cell containing a '1' is identical for
>> all cells in the matrix.  I'd like to be able to test this assumption by
>> measuring (and testing the significance of) the degree of 'nonrandomness'
>> of the 1's in the matrix.
>> 
>> If the rows and columns were fixed in sequence, then this would be an easy
>> problem involving spatial statistics, but the permutability seems to really
>> complicate things.  I think that I can test the rows or columns separately
>> by comparing the row or column totals against a corresponding binomial
>> distribution using a goodness-of-fit test, but I can't get a handle on how
>> to do this for the entire matrix.  I'd really appreciate ideas about this.
>> Thanks in advance.
>
>I'm not sure that I grasp what you are after, but - an idea.
>
>If they are completely permutable, then "permute":
>sort them by decreasing counts for row and for column.
>This puts me in mind of certain alternatives to "random."
>
>The set of counts on a margin should be ... Poisson?
>The table can be drawn into quadrants or smaller sections, 
>so that the number of 1s in each can be tabulated, to make
>ordinary contingency tables.
>
>-- 
>Rich Ulrich, [EMAIL PROTECTED]
>http://www.pitt.edu/~wpilib/index.html
> 





Re: diff in proportions

2001-11-16 Thread Rich Strauss

At 05:12 PM 11/16/01 +, you wrote:
>>On Thu, 15 Nov 2001, Jerry Dallal wrote:
>>> But, if the null hypothesis is that the means are the same, why
>>> isn't(aren't) the sample variance(s) calculated about a pooled
>>> estimate of the common mean?

I've just done some quick simulations in Matlab, constructing randomized
null distributions of the t-statistic under both scenarios: (1) sample
variances based on sample means vs. (2) variances about the pooled mean.
Assuming I've done everything correctly, the result is that the null
distribution of the t-statistic in the second case consistently
approximates the theoretical t-distribution more closely than that of the
first case.  This seems to be true regardless of sample sizes and of
whether the two sample sizes are identical or different.  This result
implies that the t-statistic should indeed be calculated about a pooled
estimate of the common mean, as Jerry Dallal suggested.

I could pass on the details of my simulation if anyone is interested, but
mostly I'd appreciate it if someone could repeat this simulation
independently of mine to see whether it holds up.
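
For the record, here is the skeleton of the simulation (my actual runs
varied the sample sizes and the number of replicates):

  n1 = 10;  n2 = 15;  B = 10000;
  t1 = zeros(B,1);  t2 = zeros(B,1);
  se = sqrt(1/n1 + 1/n2);
  for b = 1:B
    x = randn(n1,1);  y = randn(n2,1);  % H0 true by construction
    % (1) variances about the separate sample means
    s2 = ((n1-1)*var(x) + (n2-1)*var(y)) / (n1+n2-2);
    t1(b) = (mean(x) - mean(y)) / (sqrt(s2) * se);
    % (2) variances about the pooled (common) mean
    g = mean([x; y]);
    s2p = (sum((x-g).^2) + sum((y-g).^2)) / (n1+n2-2);
    t2(b) = (mean(x) - mean(y)) / (sqrt(s2p) * se);
  end
  % Compare the empirical distributions of t1, t2 against t(n1+n2-2).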

Rich Strauss






Fwd: Re: diff in proportions

2001-11-17 Thread Rich Strauss

This is true.  I simulated the null distributions, those obtained when the
null hypothesis is true, which is what the central t-distribution
represents.  I didn't look at the sampling distributions for different
effect sizes.

>Date: Sat, 17 Nov 2001 00:19:06 -0600
>From: jim clark <[EMAIL PROTECTED]>
>Subject: Re: diff in proportions
>To: [EMAIL PROTECTED]
>Organization: The University of Winnipeg
>
>Hi
>
>On 16 Nov 2001, Rich Strauss wrote:
>> I've just done some quick simulations in Matlab, constructing randomized
>> null distributions of the t-statistic under both scenarios: (1) sample
>> variances based on sample means vs. (2) variances about the pooled mean.
>> Assuming I've done everything correctly, the result is that the null
>> distribution of the t-statistic in the second case consistently
>> approximates the theoretical t-distribution more closely than that of the
>> first case.  This seems to be true regardless of sample sizes and of
>> whether the two sample sizes are identical or different.  This result
>> implies that the t-statistic should indeed be calculated about a pooled
>> estimate of the common mean, as Jerry Dallal suggested.
>> 
>> I could pass on the details of my simulation if anyone is interested, but
>> mostly I'd appreciate it if someone could repeat this simulation
>> independently of mine to see whether it holds up.
>
>This simply cannot be generally true.  It probably only applies
>when the null is in fact true, which may be the case for your
>simulations.  To appreciate the illogical nature of this
>recommendation, consider creating a real difference of x between
>your population means, then 2x, then 3x, and so on.  By the
>common mean approach, you are treating the variability between
>groups as though it were noise (i.e., a component in your
>estimate of sigma^2, the variance about the null-hypothesis of
>a common mean).  It is critical to keep in mind that the null
>hypothesis is in fact just that, a hypothesis that may or may
>not be correct.  Computing the within-group variance about the
>group means is the correct way to estimate sigma^2, however,
>irrespective of whether the Ho about the means is true or not.
>
>Best wishes
>Jim
>
>
>James M. Clark (204) 786-9757
>Department of Psychology   (204) 774-4134 Fax
>University of Winnipeg 4L05D
>Winnipeg, Manitoba  R3B 2E9[EMAIL PROTECTED]
>CANADA http://www.uwinnipeg.ca/~clark
>





Re: N.Y. Times: Statistics, a Tool for Life, Is Getting Short Shrift

2001-11-28 Thread Rich Strauss

>If the trend continues nationwide, this newspaper could someday report
>that an apparently alarming cluster of cancer cases has arisen in an
>innocuous normal distribution, and students will be able to explain to
>their parents what that means.

The reporting of cancer clusters already happens on a regular basis,
including in the NYTimes.  An excellent article on "The Cancer-Cluster
Myth" by Atul Gawande was published in The New Yorker, 8 Feb 99.  It was
reprinted in "The Best American Science and Nature Writing" last year
(2000, Houghton Mifflin).

===
Richard E. Strauss  (806) 742-2719
Biological Sciences (806) 742-2963 Fax
Texas Tech University   [EMAIL PROTECTED]
Lubbock, TX  79409-3131 
http://www.biol.ttu.edu/Faculty/FacPages/Strauss/Strauss.html
===






Re: N.Y. Times: Statistics, a Tool for Life, Is Getting Short Shrift

2001-11-29 Thread Rich Strauss

This has nothing to do with normal distributions, as Robert Dawson noted
yesterday.  The article I cited makes no mention of normal distributions,
and I didn't mean to imply that it did.

Rich Strauss

At 04:29 AM 11/29/01 +, Jerry Dallal wrote:
>Rich Strauss <[EMAIL PROTECTED]> wrote:
>:>If the trend continues nationwide, this newspaper could someday report
>:>that an apparently alarming cluster of cancer cases has arisen in an
>:>innocuous normal distribution, and students will be able to explain to
>:>their parents what that means.
>
>: The reporting of cancer clusters already happens on a regular basis,
>: including in the NYTimes.  An excellent article on "The Cancer-Cluster
>: Myth" by Atul Gawande was published in The New Yorker, 8 Feb 99.  It was
>: reprinted in "The Best American Science and Nature Writing" last year
>: (2000, Houghton Mifflin).
>
>I'd be happy if *anyone* could explain to me what "an apparently 
>alarming cluster of cancer cases has arisen in an innocuous normal 
>distribution" means!  I *think* there's an unfortunate use of the word 
>"normal" here, but I can't be sure.



