Re: regressive question
Thanks to everyone who answered my question. The various reservations about such a test were spot on, and helpful. My own reservations arose because, I think, it is not at all clear what the null would be in this case. Are you testing mu = beta_0 (so using the null model with fixed mean) or beta_0 = mu (so using the regression model with potentially variable mean)?

Alan

--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel: +61 03 9903 2102  Fax: +61 03 9903 2007

=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
Re: A regressive question
What if all the right-hand side variables have mean close to zero? The intercept will be close to the sample mean even if the model is significant.

On 15 May 2001, Alan McLean wrote:

> Hi to all,
>
> The usual test for a simple linear regression model is to test whether
> the slope coefficient is zero or not. However, if the slope is very
> close to zero, the intercept will be very close to the dependent
> variable mean, which suggests that a test could be based on the
> difference between the estimated intercept and the sample mean.
>
> Does anybody know of a test of this sort?
>
> Regards,
> Alan
Re: A regressive question
If the mean of the predictor X is zero, the intercept is equal to the mean of the dependent variable Y, however steep or shallow the slope may be. And as Jim pointed out, the standard error of a predicted value depends on its distance from the mean of X (being larger the farther away it is from the mean, the confidence band being described by a hyperbola).

It would seem to follow that a test such as Alan asks about would be unusable if the mean of X is too close to 0, and would be (too?) insensitive if the mean of X is too far from 0. An intermediate region, where a test of intercept vs. mean Y might be useful, might perhaps be defined in terms of the coefficient of variation of X (or perhaps its reciprocal, if the mean of X were in danger of actually BEING zero). One rather suspects that any such test would be less powerful than the usual test of the hypothesis that the true slope is zero, which might be an interesting proposition (for someone else!) to pursue.
-- Don.

On Wed, 16 May 2001, Alan McLean wrote:

> The usual test for a simple linear regression model is to test whether
> the slope coefficient is zero or not. However, if the slope is very
> close to zero, the intercept will be very close to the dependent
> variable mean, which suggests that a test could be based on the
> difference between the estimated intercept and the sample mean.
>
> Does anybody know of a test of this sort?

Donald F. Burrill                              [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,         [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264                    603-535-2597
184 Nashua Road, Bedford, NH 03110             603-472-3742
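Don's point about the hyperbolic confidence band can be checked numerically. The sketch below (plain Python; the data and numbers are invented for illustration, not from the thread) computes the standard error of the fitted mean response at the mean of X and at X = 0, using the usual formula se(x0) = s * sqrt(1/n + (x0 - xbar)^2 / Sxx):

```python
import math, random

random.seed(1)
n = 30
x = [random.uniform(5, 15) for _ in range(n)]           # mean of X far from 0
y = [2.0 + 0.1 * xi + random.gauss(0, 1) for xi in x]   # toy linear relation

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar
s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

def se_fit(x0):
    """Standard error of the estimated mean response at x0."""
    return math.sqrt(s2 * (1 / n + (x0 - xbar) ** 2 / sxx))

# The band is narrowest at xbar and widens hyperbolically toward X = 0,
# so a test of "intercept vs. mean of Y" loses sensitivity as xbar moves
# away from 0 -- the behaviour Don describes.
print(se_fit(xbar), se_fit(0.0))
```

With the mean of X around 10, the standard error at X = 0 is several times the standard error at the mean, which is what makes the proposed intercept test so sensitive to where X happens to be centred.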
Re: A regressive question
Hi

On 15 May 2001, Alan McLean wrote:

> The usual test for a simple linear regression model is to test whether
> the slope coefficient is zero or not. However, if the slope is very
> close to zero, the intercept will be very close to the dependent
> variable mean, which suggests that a test could be based on the
> difference between the estimated intercept and the sample mean.

Would this not depend on the scale being used? If the predictor was some scale on which the normal range of values was quite large (e.g., GRE scores?), then the value at 0 might be some distance from the mean of Y even given a very shallow slope. So the test would somehow have to adjust for this; that is, the standard error of the difference from the mean of Y would have to vary as a function of the distance of 0 from the mean of X. And presumably the test should produce results equivalent to the normal test of the slope.

It would be interesting to see if there is such a test. Could it be related to the equations for the confidence interval for predicted Y given X? There are separate formulas for individual and group predictions, and the widths do vary with distance from the mean of X.

Best wishes
Jim

James M. Clark                    (204) 786-9757
Department of Psychology          (204) 774-4134 Fax
University of Winnipeg            4L05D
Winnipeg, Manitoba R3B 2E9        [EMAIL PROTECTED]
CANADA                            http://www.uwinnipeg.ca/~clark
A regressive question
Hi to all,

The usual test for a simple linear regression model is to test whether the slope coefficient is zero or not. However, if the slope is very close to zero, the intercept will be very close to the dependent variable mean, which suggests that a test could be based on the difference between the estimated intercept and the sample mean.

Does anybody know of a test of this sort?

Regards,
Alan

--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel: +61 03 9903 2102  Fax: +61 03 9903 2007
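For what it's worth, the estimated intercept in simple OLS is ybar - b*xbar by construction, so a test of "intercept equals the sample mean of Y" is really a test of b*xbar = 0. A quick check in plain Python (toy numbers, not from the thread):

```python
# OLS by hand for y = a + b*x: the normal equations give
#   b = Sxy / Sxx  and  a = ybar - b * xbar,
# so (a - ybar) = -b * xbar identically.  Testing "intercept == mean of y"
# is therefore testing b * xbar == 0, which collapses to the usual slope
# test whenever xbar != 0 (and is vacuous when xbar == 0).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.0, 2.3, 2.2, 2.5]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = sxy / sxx
a = ybar - b * xbar

assert abs((a - ybar) + b * xbar) < 1e-12   # identity holds exactly
```

This is why (as the later replies observe) the null hypothesis of such a test is hard to pin down: the intercept-vs-mean difference carries no information beyond the slope and the location of X.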
Re: Question
On 11 May 2001 07:34:38 -0700, [EMAIL PROTECTED] (Magill, Brett) wrote:

> Don and Dennis,
>
> Thanks for your comments, I have some points and further questions on the
> issue below.
>
> For both Dennis and Don: I think the option of aggregating the information
> is a viable one.

I would call it "unavoidable" rather than just "viable." The data that you show is basically aggregated already; there's just one item per person.

> Yet, I cannot help but think there is some way to do this
> taking into account the fact that there is variation within organizations.
> I mean, if I have an organizational salary mean of .70 (70%) with a very tiny
[ snip, rest ]

I agree, you can use the information concerning within-variation. I think it is totally proper to insist on using it, in order to validate the conclusions, to whatever degree is possible. You might be able to turn that 'validation' around to incorporate it into the initial test; but I think the role as "validation" is easier to see by itself, first.

Here's a simple example where the 'variance' is Poisson.

(Ex.) A town experiences some crime at a rate that declines steadily, from 20,000 incidents to 19,900 incidents, over a 5-year period. The linear trend fitted to the several points is "highly significant" by a regression test. Do you believe it?

(Answer) What I would believe is: No, there is no trend, but it is probably true that someone is fudging the numbers. The *observed variation* in the yearly totals is far too small to have arisen by chance. And the most obvious sources of error would work in the opposite direction. [That is, if there were only a few criminals responsible for many crimes each, and the number-of-criminals is what was subject to Poisson variation, THEN the number-of-crimes should be even more variable.]

In your present case, I think you can estimate on the basis of your factory (aggregate) data, and then you figure what you can about how consistent those numbers are with the un-aggregated data, in terms of means or variances.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
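Rich's crime-rate example can be made concrete with a Poisson index-of-dispersion check. The five yearly counts below are invented to match his description (a perfectly linear fall from 20,000 to 19,900); everything else is the standard dispersion statistic:

```python
# Five yearly crime counts declining perfectly linearly (hypothetical).
counts = [20000, 19975, 19950, 19925, 19900]

n = len(counts)
mean = sum(counts) / n
# Index-of-dispersion statistic: sum (x - xbar)^2 / xbar, approximately
# chi-square with n-1 df if the counts really are independent Poisson.
stat = sum((c - mean) ** 2 for c in counts) / mean

# For Poisson counts near 20,000 the year-to-year sd should be roughly
# sqrt(20000) ~ 141, yet these totals wobble by at most 50.  The statistic
# lands near 0.313, far below the 5% *lower* critical value of
# chi-square(4) (~0.711): the series is implausibly smooth.  Too little
# variation is itself evidence against the data -- Rich's point exactly.
print(round(stat, 3))  # → 0.313
```

The same dispersion check, applied within organizations, is one concrete way to carry out the "validation" role Rich describes for the un-aggregated data.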
RE: Question
Don and Dennis,

Thanks for your comments. I have some points and further questions on the issue below.

For both Dennis and Don: I think the option of aggregating the information is a viable one. Yet, I cannot help but think there is some way to do this taking into account the fact that there is variation within organizations. I mean, if I have an organizational salary mean of .70 (70%) with a very tiny s.d., it is different than a mean of .70 with a large s.d. There should be some way to account for this. In addition, the problems with aggregation are well documented, and I believe in general they suggest that aggregated results overestimate relationships.

Don: I suggested that the problem was not a traditional multilevel problem. Perhaps I am wrong, but here is where I thought the difference was. Typically, say in a classroom problem, I want to assess the effect of classroom characteristics (student/teacher ratio, teacher experience, etc.), which are constant within classrooms, on, say, student performance, which varies within classroom across individuals. The difference between this and the problem I presented is that the OUTCOME is a contextual variable. That is, rather than individual-level variation, the outcome varies only at the organizational level. Perhaps this can be modeled with MLMs, but it is certainly different than the typical problem.

With regard to independence, I am talking about the independence of the X2's. That is, X2-1 is not independent of X2-2, and X2-4 is not independent of X2-5. This is because these cases come from the same organization. So, if we simply regressed Y ~ X2, not accounting for X1 in the model, this causes problems for ANOVA and regression, the GLM family more generally. The lack of independence here is exactly the reason for repeated measures and MLM more generally, no?
Perhaps I am making too much of the issue, but the data structure is one that I have not encountered before, and I found it something of an interesting and challenging problem; I am just hoping I might learn something along the way. Would appreciate any comments on my comments above.

Oh, and just so there is no confusion: the data below I constructed. It reflects the structure of the data and the nature of the relationship, but I generated this data set. In addition, the real thing does include variables such as tenure, previous experience, etc. that are also used as covariates at the individual level. Of course, this also means that these would need to be aggregated as well if that approach is taken.

Best

> ID  X1  X2    Y
> 1   1   0.70  0.40
> 2   1   0.80  0.40
> 3   1   0.65  0.40
> 4   2   1.20  0.25
> 5   2   1.10  0.25
> 6   3   0.90  0.30
> 7   4   0.50  0.50
> 8   4   0.60  0.50
> 9   4   0.70  0.50
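One way to keep some of the within-organization information after aggregating is to weight each organization's mean by the number of employees observed in it. The sketch below (plain Python, weighted least squares done by hand on the constructed data from the post) shows the idea; the choice of weights-equal-to-n is my assumption for illustration, not something proposed in the thread:

```python
# Constructed data from the post: (org, pct_of_market_salary, turnover).
rows = [(1, 0.70, 0.40), (1, 0.80, 0.40), (1, 0.65, 0.40),
        (2, 1.20, 0.25), (2, 1.10, 0.25),
        (3, 0.90, 0.30),
        (4, 0.50, 0.50), (4, 0.60, 0.50), (4, 0.70, 0.50)]

# Aggregate to one record per organization: mean salary, turnover, size.
orgs = {}
for org, x2, y in rows:
    orgs.setdefault(org, []).append((x2, y))

xs, ys, ws = [], [], []
for org, vals in sorted(orgs.items()):
    xs.append(sum(v[0] for v in vals) / len(vals))  # org mean of X2
    ys.append(vals[0][1])       # turnover is constant within org
    ws.append(len(vals))        # weight = number of employees sampled

# Weighted least squares slope and intercept.
W = sum(ws)
xw = sum(w * x for w, x in zip(ws, xs)) / W
yw = sum(w * y for w, y in zip(ws, ys)) / W
b = sum(w * (x - xw) * (y - yw) for w, x, y in zip(ws, xs, ys)) \
    / sum(w * (x - xw) ** 2 for w, x in zip(ws, xs))
a = yw - b * xw

# Organizations measured more often count for more; the slope is
# negative, as in Dennis's unweighted aggregate analysis.
print(b)
```

A natural refinement, in the spirit of Brett's point about the s.d., would be weights inversely proportional to each organization's within-org variance of X2 rather than raw counts.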
Re: Question
On Thu, 10 May 2001, Magill, Brett wrote, inter alia:

> How should these data be analyzed? The difficulty is that the data
> are cross level. Not the traditional multi-level model however.

Hi, Brett. I don't understand this statement. Looks to me like an obvious place to apply multilevel (aka "hierarchical") modelling. (Have you read Harvey Goldstein's text on the method?) You have persons within organizations (just as, in educational applications of ML models, one has pupils within schools for a two-level model, and pupils within schools within districts for a three-level model), and apparently want to carry out some estimation or other analysis while taking into account the (possible) covariances between levels.

If you want a simpler method than ML modelling, the method Dennis proposed at least lets you see some aggregate effects. (This does, however, put me in mind of a paper of (I think) Brian Joiner's whose temporary working title was "To aggregate is to aggravate" -- though it was published under another title.) ;-)

Along the lines of Dennis' suggestion, you could plot Y vs X2 (or X2 vs Y) directly, which would give you the visual effect Dennis showed while at the same time showing the scatter in the X2 dimension around the organization average. For larger data sets with more organizations in them (so that perhaps several organizations would have the same (or at any rate indistinguishable, at the resolution of the plotting device used) turnover rate), you could generate a letter-plot (MINITAB command: LPLOT), using the organization ID in X1 as a labelling variable.

Brett's original post presented this data structure:

> A colleague has a data set with a structure like the one below:
>
> ID  X1  X2    Y
> 1   1   0.70  0.40
> 2   1   0.80  0.40
> 3   1   0.65  0.40
> 4   2   1.20  0.25
> 5   2   1.10  0.25
> 6   3   0.90  0.30
> 7   4   0.50  0.50
> 8   4   0.60  0.50
> 9   4   0.70  0.50
>
> Where X1 is the organization. X2 is the percent of market salary an
> employee within the organization is paid -- i.e. ID 1 makes 70% of the
> market salary for their position and the local economy. And Y is the
> annual overall turnover rate in the organization, so it is constant
> across individuals within the organization. There are different
> numbers of employee salaries measured within each organization. The
> goal is to assess the relationship between employee salary (as percent
> of market salary for their position and location) and overall
> organizational turnover rates.
>
> How should these data be analyzed? The difficulty is that the data are
> cross level. Not the traditional multi-level model however. That
> there is no variance across individuals within an organization on the
> outcome is problematic. Of course, so is aggregating the individual
> results. How can this be modeled both preserving the fact that there is
> variance within organizations and between organizations?

As I understand it (as implied above), this is exactly the kind of structure for which multilevel methods were invented.

> I suggested that this was a repeated measures problem, with repeated
> measurements within the organization, my colleague argued it was not.

This strikes me as a possible approach (repeated measures can be treated as a special case of multilevel modelling). But most software that I know of that would handle repeated-measures ANOVA would tend to insist that there be equal numbers of levels of the repeated-measures factor throughout the design, and this appears not to be the case (your sample data, at any rate, have different numbers of individuals in the several organizations).

> Can this be modeled appropriately with traditional regression models at
> the individual level? That is, ignoring X1 and regressing Y ~ X2.

That was, after a fashion, what Dennis illustrated. In a formal regression analysis, I should think it unnecessary to ignore X1; although it would doubtless be necessary to recode it into a series of indicator-variable dichotomies, or something equivalent.

> It seems to me that this violates the assumption of independence.

Not altogether clear. By "this" do you mean regression analysis? Or, perhaps, the particular analysis you suggested, ignoring X1? Or...? And what "assumption of independence" are you referring to? (At any rate, what such assumption would not be violated in other formal analyses, e.g. repeated-measures ANOVA?)

> Certainly, the percent of market salary that an employee is paid is
> correlated between employees within an organization (taking into
> account things like tenure, previous experience, etc.).

Well, would the desired model take such things into account? (If not, why not? If so, where is the problem that I rather vaguely sense lurking between the lines here?)
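The indicator-variable idea is worth checking before committing to it, because it runs into a degeneracy here: Y is constant within each organization, so organization indicators alone reproduce Y exactly, leaving X2 nothing to explain at the individual level. A plain-Python illustration on Brett's constructed data (this reading of the degeneracy is my own, not a claim from the thread):

```python
from collections import defaultdict

# Constructed data from the thread: (org, pct_of_market_salary, turnover).
rows = [(1, 0.70, 0.40), (1, 0.80, 0.40), (1, 0.65, 0.40),
        (2, 1.20, 0.25), (2, 1.10, 0.25),
        (3, 0.90, 0.30),
        (4, 0.50, 0.50), (4, 0.60, 0.50), (4, 0.70, 0.50)]

# "Fit" the org-indicator model by hand: with one dummy per organization,
# the least-squares prediction for every case is its org mean of Y.
ys = defaultdict(list)
for org, _, y in rows:
    ys[org].append(y)
fitted = {org: sum(v) / len(v) for org, v in ys.items()}

# Y never varies within an organization, so the indicators fit perfectly:
# every residual is (numerically) zero, and adding X2 to the model cannot
# improve the fit at all.
residuals = [y - fitted[org] for org, _, y in rows]
print(max(abs(r) for r in residuals))
```

This is exactly why the outcome being "contextual" matters: once X1 is coded in, the individual-level regression has no residual variance left for X2, which pushes the analysis back toward either aggregation or a multilevel formulation.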
Re: Question
this is not unlike having scores for students in a class ... one score for each student and ... the age of the teacher of THOSE students ... for a class ... scores will vary but, age for the teacher remains the same ... but the age might be different in ANother class with a different teacher ... in a sense, the age is like a mean just like your turnover rate ... and you want to know the relationship between student scores and teachers' ages

something has to give

i think you have to reduce the data points on X2 ... find the mean within organization 1 ... on X2 ... then have .4 next to it ... second data pair would be mean on X2 for organization 2 .. with .25 ... etc. so, in this case ... you have 4 values on X2 and 4 values on Y ... so, what is the relationship between those?? look at the following:

Row   C7     C8
1     0.72   0.40
2     1.15   0.25
3     0.90   0.30
4     0.60   0.50

MTB > plot c8 c7

[character-mode scatterplot of C8 (turnover) against C7 (mean % of market salary) omitted; the four points fall on a clear downward trend]

Correlations: C7, C8
Pearson correlation of C7 and C8 = -0.957
P-Value = 0.043

there might be a better way to do it but ... looks like a pretty clear case of the greater the % of market the organization pays ... the less is their turnover rate

At 06:05 PM 5/10/01 -0400, Magill, Brett wrote:

> A colleague has a data set with a structure like the one below:
>
> ID  X1  X2    Y
> 1   1   0.70  0.40
> 2   1   0.80  0.40
> 3   1   0.65  0.40
> 4   2   1.20  0.25
> 5   2   1.10  0.25
> 6   3   0.90  0.30
> 7   4   0.50  0.50
> 8   4   0.60  0.50
> 9   4   0.70  0.50
>
> Where X1 is the organization. X2 is the percent of market salary an
> employee within the organization is paid -- i.e. ID 1 makes 70% of the
> market salary for their position and the local economy. And Y is the
> annual overall turnover rate in the organization, so it is constant
> across individuals within the organization. There are different numbers
> of employee salaries measured within each organization. The goal is to
> assess the relationship between employee salary (as percent of market
> salary for their position and location) and overall organizational
> turnover rates.
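dennis's aggregate correlation is easy to reproduce. The sketch below (plain Python, Pearson r computed by hand) aggregates the raw rows to organization means and lands next to the MINITAB value of -0.957; the tiny difference comes from MINITAB's run using means rounded to two decimals:

```python
import math
from collections import defaultdict

# Raw constructed data: (org, pct_of_market_salary, turnover).
rows = [(1, 0.70, 0.40), (1, 0.80, 0.40), (1, 0.65, 0.40),
        (2, 1.20, 0.25), (2, 1.10, 0.25),
        (3, 0.90, 0.30),
        (4, 0.50, 0.50), (4, 0.60, 0.50), (4, 0.70, 0.50)]

byorg = defaultdict(list)
for org, x2, y in rows:
    byorg[org].append((x2, y))

xs = [sum(v[0] for v in vals) / len(vals) for vals in byorg.values()]
ys = [vals[0][1] for vals in byorg.values()]   # turnover, one per org

def pearson(a, b):
    """Pearson correlation computed from first principles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    return num / den

print(round(pearson(xs, ys), 2))  # → -0.96
```

With only 4 aggregated points, though, the p-value of 0.043 rests on very little data, which is part of why the thread keeps circling back to using the within-organization information as well.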
Question
A colleague has a data set with a structure like the one below:

ID  X1  X2    Y
1   1   0.70  0.40
2   1   0.80  0.40
3   1   0.65  0.40
4   2   1.20  0.25
5   2   1.10  0.25
6   3   0.90  0.30
7   4   0.50  0.50
8   4   0.60  0.50
9   4   0.70  0.50

Where X1 is the organization. X2 is the percent of market salary an employee within the organization is paid -- i.e. ID 1 makes 70% of the market salary for their position and the local economy. And Y is the annual overall turnover rate in the organization, so it is constant across individuals within the organization. There are different numbers of employee salaries measured within each organization. The goal is to assess the relationship between employee salary (as percent of market salary for their position and location) and overall organizational turnover rates.

How should these data be analyzed? The difficulty is that the data are cross level. Not the traditional multi-level model however. That there is no variance across individuals within an organization on the outcome is problematic. Of course, so is aggregating the individual results. How can this be modeled both preserving the fact that there is variance within organizations and between organizations?

I suggested that this was a repeated measures problem, with repeated measurements within the organization; my colleague argued it was not. Can this be modeled appropriately with traditional regression models at the individual level? That is, ignoring X1 and regressing Y ~ X2. It seems to me that this violates the assumption of independence. Certainly, the percent of market salary that an employee is paid is correlated between employees within an organization (taking into account things like tenure, previous experience, etc.).

Thanks
Re: Question: Assumptions for Statistical Clustering (ie. Euclidean distance based)
On Sun, 22 Apr 2001 16:23:46 GMT, Robert Ehrlich <[EMAIL PROTECTED]> wrote:

> Clustering has a lot of associated problems. The first is that of cluster
> validity--most algorithms define the existence of as many clusters as the user
> demands. A very important problem is homogeneity of variance. So a Z
> transformation is not a bad idea whether or not the variables are normal.

Unless you want the 0-1 variable to count as 10% as potent as the variable scored 0-10. The classical default analysis does let you WEIGHT the variables, by using arbitrary scaling. (Years ago, it was typical, shoddy documentation of the standard default, that they didn't warn the tyro. Has it improved? Has the default changed?)

> Quasi-normality is about all you have to assume--the absence of intersample
> polymodality and the approximation of the mean and the mode. However, to my
> knowledge, there is no satisfying "theory" associated with cluster analysis--only
> rules of thumb.

[ snip, original question ]

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: Question: Assumptions for Statistical Clustering (ie. Euclidean
Robert Ehrlich wrote:

> to my knowledge, there is no satisfying "theory"
> associated with cluster analysis--only rules of thumb.

The underlying theory is classification theory; see Jardine & Sibson, Sokal & Sneath, or The Classification Society Bulletin.
Re: Question: Assumptions for Statistical Clustering (ie. Euclidean
Clustering has a lot of associated problems. The first is that of cluster validity--most algorithms define the existence of as many clusters as the user demands. A very important problem is homogeneity of variance. So a Z transformation is not a bad idea whether or not the variables are normal. Quasi-normality is about all you have to assume--the absence of intersample polymodality and the approximation of the mean and the mode. However, to my knowledge, there is no satisfying "theory" associated with cluster analysis--only rules of thumb.

Beng Hai Chea wrote:

> Here is a statistical issue that I have been pondering for a few days now,
> and I am hoping someone can shed some light or even help set me straight.
>
> Would like to know if we need to assume multivariate normality for the data
> whenever we use Euclidean distance based clustering?
>
> Or is it good to have but not necessary?
>
> The argument I used was that since we need to standardize the raw data for
> this type of clustering, we need to assume normality or at least try to
> make sure that the data is normally distributed.
>
> Would like to hear the opinions from this mailing list.
>
> Thanks in advance!
> Beng Hai
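The weighting issue Rich raises is easy to see with Euclidean distances directly. In the sketch below (plain Python, invented two-feature data), one variable ranges roughly 0-1 and the other roughly 0-1000; before Z-scoring, the large-scale variable completely dominates the distance, which is the implicit (and usually unwanted) weighting of the "standard default":

```python
import math, statistics

# Two features on wildly different scales (hypothetical data).
small = [0.1, 0.9, 0.2, 0.8, 0.3]             # range ~0-1
large = [120.0, 130.0, 980.0, 990.0, 500.0]   # range ~0-1000

def zscore(v):
    """Standardize to mean 0, sd 1 (the Z transformation)."""
    m, s = statistics.mean(v), statistics.stdev(v)
    return [(x - m) / s for x in v]

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

raw = list(zip(small, large))
std = list(zip(zscore(small), zscore(large)))

# Point 0 vs point 1: similar on 'large', opposite ends of 'small'.
# Point 0 vs point 2: similar on 'small', opposite ends of 'large'.
d_raw_01, d_raw_02 = dist(raw[0], raw[1]), dist(raw[0], raw[2])
d_std_01, d_std_02 = dist(std[0], std[1]), dist(std[0], std[2])

# Raw: the 0-1000 variable swamps everything, so points 0 and 1 look
# close no matter what 'small' says.  After Z-scoring, both variables
# get a comparable vote in the distance.
assert d_raw_01 < d_raw_02
print(d_std_01, d_std_02)
```

Note that none of this involves normality: the Z transformation equalizes variances regardless of the distributions, which is why it is "not a bad idea whether or not the variables are normal."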
Re: Regression toward the Mean - search question
> A few weeks ago, I believe on this list, a quick discussion of Galton's
> regression to the mean popped up. I downloaded some of Galton's data,
> generated my own, and found some ways to express the effect in ways my
> non-statistician education friends might understand. Still working on
> that part.
>
> In addition, there was a reference to a wonderful article, which I read,
> and which explained the whole thing in excellent terms and clarity for
> me. The author is clearly an expert on the subject of detecting change
> in things. He (I think) even listed people who had fallen into the
> regression toward the mean fallacy, including himself.
>
> Problem: Now of course I really want that article again, and its
> reference. I cannot find it on my hard drive. Maybe I didn't download
> it - it was large. But I can't find the reference to it, either. Bummer!
>
> Can anyone figure out who and what article I'm referring to, and
> re-point me to it?
>
> Very much obliged to you all,
> Jay

Trochim's page has a nice description of the problem but with few historical references:
http://trochim.human.cornell.edu/kb/regrmean.htm

Campbell, D. T. and D. A. Kenny. 1999. A primer on regression artifacts. Guilford Press. [This book is devoted almost entirely to regression to the mean and what to do about it.]

Stigler, S. M. 1999. Statistics on the table. Harvard University Press. [Stigler has several essays on the discovery of RTM under the heading "Galtonian Ideas". He also presents a sobering case study of poor Horace Secrist, whose 1933 magnum opus in econometrics is a classic RTM artifact.]

Eugene Gallagher
ECOS UMASS/Boston
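For readers who, like Jay, want a demonstration that non-statisticians can follow: regression to the mean falls out of a few lines of simulation. Draw correlated test/retest scores, select the top group on the first test, and watch their retest mean fall back, with no real change in anyone's underlying level. All numbers below are invented:

```python
import random

random.seed(42)
rho = 0.7          # assumed test-retest correlation
n = 10_000

# Bivariate-normal test/retest pairs with correlation rho.
pairs = []
for _ in range(n):
    t = random.gauss(0, 1)                          # shared "true score"
    e1, e2 = random.gauss(0, 1), random.gauss(0, 1)
    x1 = rho ** 0.5 * t + (1 - rho) ** 0.5 * e1     # first test
    x2 = rho ** 0.5 * t + (1 - rho) ** 0.5 * e2     # retest
    pairs.append((x1, x2))

# Select the top decile on the first test and compare group means.
pairs.sort(key=lambda p: p[0], reverse=True)
top = pairs[: n // 10]
m1 = sum(p[0] for p in top) / len(top)
m2 = sum(p[1] for p in top) / len(top)

# For bivariate normals E[x2 | x1] = rho * x1, so the selected group's
# retest mean shrinks toward 0 by a factor of about rho -- regression
# toward the mean, purely from selection plus imperfect correlation.
print(round(m1, 2), round(m2, 2))
```

This is exactly the trap in the Secrist case study: firms selected for being extreme look more "mediocre" later even when nothing has changed.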
Regression toward the Mean - search question
Dear Everyone,

I feel singularly stupid. My filing system has collapsed, if it ever was structured.

A few weeks ago, I believe on this list, a quick discussion of Galton's regression to the mean popped up. I downloaded some of Galton's data, generated my own, and found some ways to express the effect in ways my non-statistician education friends might understand. Still working on that part.

In addition, there was a reference to a wonderful article, which I read, and which explained the whole thing in excellent terms and clarity for me. The author is clearly an expert on the subject of detecting change in things. He (I think) even listed people who had fallen into the regression toward the mean fallacy, including himself.

Problem: Now of course I really want that article again, and its reference. I cannot find it on my hard drive. Maybe I didn't download it - it was large. But I can't find the reference to it, either. Bummer!

Can anyone figure out who and what article I'm referring to, and re-point me to it?

Very much obliged to you all,
Jay

--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
North Green Bay Road
Racine, WI 53404-1216
USA

Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?
Re: Fw: statistics question
In article <003101c0bea9$31b26820$[EMAIL PROTECTED]>, <[EMAIL PROTECTED]> wrote:

> Hi,
> The below question was on my Doctorate Comprehensives in
> Education at the University of North Florida.
> Would one of you learned scholars pop me back with possible appropriate answers.
> Carmen Cummings
>
> An educational researcher was interested in developing a predictive scheme
> to forecast success in an elementary statistics course at a local university.
> He developed an instrument with a range of scores from 0 to 50. He
> administered this to 50 incoming freshmen signed up for the elementary
> statistics course, before the class started. At the end of the semester he
> obtained each of the 50 students' final averages.
>
> Describe an appropriate design to collect data to test the hypothesis.

What design? The data are already collected, assuming that the data match the scores on the prediction instrument and the final result of the student. What hypothesis? The hypotheses and the assumptions come from the user of statistics alone; the learned scholars, as statisticians, should only try to extract these from the user, and to point out which assumptions are important and which are of little importance. For example, normality is usually of secondary importance, and is usually quite false, while the assumptions about the structure are of major importance.

--
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
Re: Paired t test Question
Andy,

With a sample size of 4 (n = 4), you need to get hold of the StatXact software developed by Cyrus Mehta.

My one cent-
Luke

"Andrew L." <[EMAIL PROTECTED]> wrote in message news:oEYy6.4479$[EMAIL PROTECTED]...

> I am analysing some data and want to administer a paired t test. Although I
> can perform the test, I am not totally familiar with the t-test. Can anyone
> tell me whether the test relies on having a large number of samples, or
> whether I can still obtain an accurate answer from n=4 (n = number of
> participants).
>
> Also, does anyone know what the F stands for - I think it means F-test.
> What is this test designed to show?
>
> I will be grateful for any help
>
> Thanks
>
> Andy
Re: Fw: statistics question
I reformatted this. Quoting a letter from Carmen Cummings to himself, on 6 Apr 2001 08:48:38 -0700, [EMAIL PROTECTED] wrote:

> The below question was on my Doctorate Comprehensives in
> Education at the University of North Florida.
>
> Would one of you learned scholars pop me back with
> possible appropriate answers.

==== the question

An educational researcher was interested in developing a predictive scheme to forecast success in an elementary statistics course at a local university. He developed an instrument with a range of scores from 0 to 50. He administered this to 50 incoming freshmen signed up for the elementary statistics course, before the class started. At the end of the semester he obtained each of the 50 students' final averages. Describe an appropriate design to collect data to test the hypothesis.

==== end of cite

I hope the time of the Comprehensives is past. Anyway, this might be better suited for facetious answers than serious ones.

The "appropriate design" in the strong sense: Consult with a statistician IN ORDER TO "develop an instrument". Who decided only a single dimension should be of interest? (How else does one interpret a score with a "range" from 0 to 50?) Consult with a statistician BEFORE administering something to -- selected? unselected? -- freshmen; and consult (perhaps) in order to develop particular hypotheses worth testing.

I mean, the kids scoring over 700 on Math SATs will ace the course, and the kids under 400 will have trouble. Generalizing, of course. If "final average" (as suggested) is the criterion, instead of "learning." But you don't need a new study to tell you those results.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Fw: statistics question
Hi,

The below question was on my Doctorate Comprehensives in Education at the University of North Florida. Would one of you learned scholars pop me back with possible appropriate answers.

Carmen Cummings

- Original Message -
From: "Carmen Cummings" <[EMAIL PROTECTED]>
To: "David Cummings" <[EMAIL PROTECTED]>
Sent: Thursday, April 05, 2001 4:38 PM
Subject: statistics question

An educational researcher was interested in developing a predictive scheme to forecast success in an elementary statistics course at a local university. He developed an instrument with a range of scores from 0 to 50. He administered this to 50 incoming freshmen signed up for the elementary statistics course, before the class started. At the end of the semester he obtained each of the 50 students' final averages.

Describe an appropriate design to collect data to test the hypothesis.
Re: Paired t test Question
"Andrew L." wrote:
>
> I am analysing some data and want to administer a paired t test. Although I
> can perform the test, I am not totally familiar with the t-test. Can anyone
> tell me whether the test relies on having a large number of samples, or
> whether I can still get an accurate answer from n=4 (n = number of
> participants).
>
> Also, does anyone know what the F stands for - I think it means F-test.
> What is this test designed to show?

I think you should definitely get a basic introductory book on statistics and brush up on your statistical knowledge. As to your specific questions: the accuracy of your results doesn't really depend on the sample size, but the precision does. Your comparison of the means (You do want to compare means, don't you? You didn't actually say that...) will not be very precise with just 4 samples. F may stand for an F-test, and it may stand for a lot of other things; I don't normally associate an F-test with a paired t-test.

So I would advise, based upon your questions: don't just mechanically crank a paired t-test through whatever software you have. Sit down with someone who knows statistics and explain your entire problem to him or her, and find out whether a paired t-test is the right thing to do, and how a sample size of 4 affects your comparison of the means.

-- 
Paige Miller
Eastman Kodak Company
[EMAIL PROTECTED]

"It's nothing until I call it!" -- Bill Klem, NL Umpire
"Those black-eyed peas tasted all right to me" -- Dixie Chicks
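[To see the precision point concretely, here is a minimal Python sketch of the paired t statistic on invented before/after scores for n = 4 participants; all numbers are hypothetical.]

```python
from statistics import mean, stdev

# Hypothetical before/after scores for n = 4 participants
before = [12.0, 15.0, 11.0, 14.0]
after = [14.0, 16.0, 13.0, 15.0]

diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
df = n - 1                                   # only 3 degrees of freedom
t = mean(diffs) / (stdev(diffs) / n ** 0.5)  # paired t statistic
```

With df = 3 the two-tailed 5% critical value is about 3.18, so the mean difference must be large relative to its standard error before so small a sample can detect anything.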
Paired t test Question
I am analysing some data and want to administer a paired t test. Although I can perform the test, I am not totally familiar with the t-test. Can anyone tell me whether the test relies on having a large number of samples, or whether I can still get an accurate answer from n=4 (n = number of participants).

Also, does anyone know what the F stands for - I think it means F-test. What is this test designed to show?

I will be grateful for any help.

Thanks
Andy
Re: Easy question
Thanks for your comment. The message was accidentally sent from my wife's news account. I didn't take the measurements simultaneously, but that is not my major concern. My concern is: I did a regression of mean(WT) against mean(AT). Is this good enough? Can I get more out of the data?

I've been trying to get QVF (the quasilikelihood estimation model from stat.tamu.edu) and some multivariate delta SAS macro to work. They seem too complicated for such a simple situation. Is there a simpler way?

Thanks again for your help.

Cheers,
Wenjing Dai ([EMAIL PROTECTED])
Department of Computer Science, University of Illinois

"Donald Burrill" <[EMAIL PROTECTED]> wrote in message [EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
> On Fri, 9 Mar 2001, Wei Xiao wrote:
>
> > Suppose I went to 10 lakes. I want to measure the relation between
> > water temperature (WT) and air temperature (AT). So I can do a
> > regression with these 10 points. However, to be sure, I took 3 AT's
> > and 3 WT's at each lake. Now any particular AT is not correlated
> > with WT.
>
> How can that be? Did you not take each AT and WT at the same time and
> in the same place? (Not necessarily at the same time, or in the same
> place, as the other pairs of (WT,AT); in fact, preferably the
> measurements should have been made at different (time, place) if what
> you were trying to do was to get a measure of the variability in WT
> and AT at each lake.)
>
> If you claim they're not correlated because all six values were taken
> more or less simultaneously at the same place, and they were not taken
> in (WT,AT) pairs, then the three WT values are not independent
> observations, nor are the three AT values, but within each of THESE
> triplets the values are correlated in an unknown, and possibly
> unknowable, way. Then all you can do is take the easy way out:
> take the average of the three WT values as the WT for that lake,
> and similarly for the three AT values.
>
> > Instead, they kind of have error in both the X and Y axis.
>
> This remark is not helpful. If you only had one value of (WT,AT) at
> each lake, those values would surely have measurement error in both
> measurements.
>
> > Can somebody show me a better way to analyze this?
> > I prefer talking in SAS or SAS macro.
>
> Sorry, not one of my languages.
>
> > Here is a hypothetical data sheet.
> >
> > Lake,  WT, AT
> > Lake1, 10, 15
> > Lake1, 11, 14
> > Lake1, 12, 13
> > ...
> >
> > Notice there is no relation between the WT and AT readings.
> > I can record this way too:
> >
> > Lake,  WT, AT
> > Lake1, 10, 13
> > Lake1, 11, 14
> > Lake1, 12, 15
> > ...
>
> It is not at all clear why you can legitimately shuffle these values
> around with respect to each other: unless either (a) all 6 values are
> recorded simultaneously in the same place; or (b) you took all 6
> values at 6 different times and places, so that there really is no
> empirical connection between any particular AT and any particular WT.
> Either case would seem to me to represent faulty experimental
> procedure... to put it politely.
>  -- DFB.
>  --
> Donald F. Burrill                                [EMAIL PROTECTED]
> 348 Hyde Hall, Plymouth State College,           [EMAIL PROTECTED]
> MSC #29, Plymouth, NH 03264                      (603) 535-2597
> 184 Nashua Road, Bedford, NH 03110               (603) 471-7128
Re: Easy question
On Fri, 9 Mar 2001, Wei Xiao wrote:

> Suppose I went to 10 lakes. I want to measure the relation between
> water temperature (WT) and air temperature (AT). So I can do a
> regression with these 10 points like this:
>
>  AT |       *
>     |     *
>     |   *
>     |__*______ WT
>
> However, to be sure, I took 3 AT's and 3 WT's at each lake. Now any
> particular AT is not correlated with WT.

How can that be? Did you not take each AT and WT at the same time and in the same place? (Not necessarily at the same time, or in the same place, as the other pairs of (WT,AT); in fact, preferably the measurements should have been made at different (time, place) if what you were trying to do was to get a measure of the variability in WT and AT at each lake.)

If you claim they're not correlated because all six values were taken more or less simultaneously at the same place, and they were not taken in (WT,AT) pairs, then the three WT values are not independent observations, nor are the three AT values, but within each of THESE triplets the values are correlated in an unknown, and possibly unknowable, way. Then all you can do is take the easy way out: take the average of the three WT values as the WT for that lake, and similarly for the three AT values.

> Instead, they kind of have error in both the X and Y axis.

This remark is not helpful. If you only had one value of (WT,AT) at each lake, those values would surely have measurement error in both measurements.

> Can somebody show me a better way to analyze this?
> I prefer talking in SAS or SAS macro.

Sorry, not one of my languages.

> Here is a hypothetical data sheet.
>
> Lake,  WT, AT
> Lake1, 10, 15
> Lake1, 11, 14
> Lake1, 12, 13
> ...
>
> Notice there is no relation between the WT and AT readings.
> I can record this way too:
>
> Lake,  WT, AT
> Lake1, 10, 13
> Lake1, 11, 14
> Lake1, 12, 15
> ...

It is not at all clear why you can legitimately shuffle these values around with respect to each other: unless either (a) all 6 values are recorded simultaneously in the same place; or (b) you took all 6 values at 6 different times and places, so that there really is no empirical connection between any particular AT and any particular WT. Either case would seem to me to represent faulty experimental procedure... to put it politely.
 -- DFB.
 --
Donald F. Burrill                                 [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,            [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264                       (603) 535-2597
184 Nashua Road, Bedford, NH 03110                (603) 471-7128
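[The "easy way out" described above -- average each lake's triplicates, then regress one point per lake -- takes only a few lines. A minimal Python sketch; the readings are hypothetical.]

```python
from statistics import mean

# Hypothetical triplicate (WT, AT) readings at each of three lakes
readings = {
    "Lake1": ([10, 11, 12], [15, 14, 13]),
    "Lake2": ([14, 15, 16], [18, 17, 19]),
    "Lake3": ([20, 21, 22], [24, 25, 23]),
}

# Collapse to one (mean WT, mean AT) point per lake
x = [mean(wt) for wt, at in readings.values()]
y = [mean(at) for wt, at in readings.values()]

# Ordinary least-squares fit of mean AT on mean WT
xbar, ybar = mean(x), mean(y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sxy / sxx
intercept = ybar - slope * xbar
```

This treats the lake means as the observations, which is exactly the averaging approach recommended when the within-lake triplets are not paired.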
Easy question
Hi folks,

I have this problem at hand: Suppose I went to 10 lakes. I want to measure the relation between water temperature (WT) and air temperature (AT). So I can do a regression with these 10 points like this:

 AT |       *
    |     *
    |   *
    |__*______ WT

However, to be sure, I took 3 AT's and 3 WT's at each lake. Now any particular AT is not correlated with WT. Instead, they kind of have error in both the X and Y axis. Can somebody show me a better way to analyze this? I prefer talking in SAS or SAS macro.

Here is a hypothetical data sheet.

Lake,  WT, AT
Lake1, 10, 15
Lake1, 11, 14
Lake1, 12, 13
...

Notice there is no relation between the WT and AT readings. I can record this way too:

Lake,  WT, AT
Lake1, 10, 13
Lake1, 11, 14
Lake1, 12, 15
...

Thanks in advance.

Best regards,
W
Re: Trend analysis question: follow-up
On 5 Mar 2001 16:41:22 -0800, [EMAIL PROTECTED] (Donald Burrill) wrote:

> On Mon, 5 Mar 2001, Philip Cozzolino wrote in part:
>
> > Yeah, I don't know why I didn't think to compute my eta-squared on the
> > significant trends. As I said, trend analysis is new to me (psych grad
> > student) and I just got startled by the results.
> >
> > The "significant" 4th and 5th order trends only account for 1% of the
> > variance each, so I guess that should tell me something. The linear
> > trend accounts for 44% and the quadratic accounts for 35% more, so 79%
> > of the original 82% omnibus F (this is all practice data).
> >
> > I guess, if I am now interpreting this correctly, the quadratic trend
> > is the best solution.
>
> Well, now, THAT depends in part on what the spectrum of candidate
> solutions is, doesn't it? For all that what you have is "practice
> data", I cannot resist asking: Are the linear & quadratic components
> both positive, and is the overall relationship monotonically
> increasing? Then, would the context have an interesting
> interpretation if the relationship were exponential? Does plotting
[ snip, rest ]

"Interesting interpretation" is important. In this example, the interest (probably) lies mainly with the variance explained by the linear and quadratic trends. It's hard for me to be highly interested in an order-5 polynomial, and sometimes even a quadratic seems unnecessarily awkward. What you want is the convenient, natural explanation.

If the "baseline" is far different from what follows, that will induce a bunch of high-order terms if you insist on modeling all the periods in one repeated measures ANOVA. A sensible interpretation in that case might be to describe the "shock effect" and separately describe what happened later.

Example: the start of psychotropic medications has a huge, immediate, "normalizing" effect on some aspects of the sleep of depressed patients (sleep latency, REM latency, REM time, etc.). Various changes *after* the initial jolt can be described as no change, continued improvement, or return toward the initial baseline. In real life, linear trends worked fine for describing the on-meds followup observation nights (with - not accidentally - increasing intervals between them).

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: Trend analysis question: follow-up
On Mon, 5 Mar 2001, Philip Cozzolino wrote in part:

> Yeah, I don't know why I didn't think to compute my eta-squared on the
> significant trends. As I said, trend analysis is new to me (psych grad
> student) and I just got startled by the results.
>
> The "significant" 4th and 5th order trends only account for 1% of the
> variance each, so I guess that should tell me something. The linear
> trend accounts for 44% and the quadratic accounts for 35% more, so 79%
> of the original 82% omnibus F (this is all practice data).
>
> I guess, if I am now interpreting this correctly, the quadratic trend
> is the best solution.

Well, now, THAT depends in part on what the spectrum of candidate solutions is, doesn't it? For all that what you have is "practice data", I cannot resist asking: Are the linear & quadratic components both positive, and is the overall relationship monotonically increasing? Then, would the context have an interesting interpretation if the relationship were exponential? Does plotting log(Y) against X look approximately linear? If so, especially if your six values of X are points in time, Y can be described as exhibiting exponential growth over the period observed, and there is a constant doubling time (if Y is increasing) or half-life (if Y is decreasing).

The formal equation for exponential growth in Y (with X = time) is Y = a*exp(b*X), and the doubling time is log(2)/b (using the natural logarithm); if b is negative, Y is exhibiting exponential decay and this quantity is its half-life.

In the intermediate course (ANOVA and MLR), I used to use some old data on the mass of chick embryos to illustrate a period of exponential growth: 11 time points, 1 day apart, and a very nice exponential fit. A polynomial fit required a quartic equation.
 -- Don.
 --
Donald F. Burrill                                 [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,            [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264                       (603) 535-2597
Department of Mathematics, Boston University      [EMAIL PROTECTED]
111 Cummington Street, room 261, Boston, MA 02215 (617) 353-5288
184 Nashua Road, Bedford, NH 03110                (603) 471-7128
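[The log(Y)-against-X check described above can be sketched in a few lines of Python. The series below is manufactured to be exactly exponential (a = 2, b = 0.35 are arbitrary choices), so the fit recovers b and the doubling time log(2)/b.]

```python
import math
from statistics import mean

# Manufactured exponential series: Y = 2 * exp(0.35 * X), X = 0..5
xs = list(range(6))
ys = [2 * math.exp(0.35 * x) for x in xs]

# Least-squares slope of log(Y) on X estimates b in Y = a*exp(b*X)
logs = [math.log(y) for y in ys]
xbar, lbar = mean(xs), mean(logs)
b = sum((x - xbar) * (v - lbar) for x, v in zip(xs, logs)) \
    / sum((x - xbar) ** 2 for x in xs)
doubling_time = math.log(2) / b  # constant doubling time for growth
```

With real data the log(Y)-vs-X plot would only be approximately linear, and the residuals from this fit are the natural diagnostic.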
Re: basic stats question
In article <52jo6.114$[EMAIL PROTECTED]>, Milo Schield <[EMAIL PROTECTED]> wrote:

> But what does this (in)dependence really mean?
> Can it change on conditioning?
> This seems related to Simpson's paradox.
> In any event, it seems that independence can be conditional.
> Is this so? If so, where is this discussed in more detail?

Why does it have to be discussed in more detail? Conditional probability is probability.
-- 
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765) 494-6054 FAX: (765) 494-0558
Re: basic stats question
In article <[EMAIL PROTECTED]>, Richard A. Beldin <[EMAIL PROTECTED]> wrote:

> You missed the point, Herman. I don't assert that these are independent
> random variables. I claim that introducing students to the concept of
> independent sample spaces from which we construct a cartesian product
> sample space will make it easier for them to understand independent
> events and random variables when we define them later.

I believe that this will not do what is expected, and might even make it worse. When we introduce sample spaces, we do not, and should not, introduce the probabilities at that time. If we did, we could not have inference; also, I believe that we need to get across the idea that there is no "right" sample space for a problem, but merely adequate representations; the point in a sample space can represent the result of the experiment under consideration, but we might have more. Otherwise, how can we consider the number of successes to be a real-valued random variable?

Sample spaces can be Cartesian products without the coordinates being independent; whenever we have a bivariate classification, we have a Cartesian product, whether or not there is independence. We do not want students to consider race and lactose intolerance to be independent. Presenting oversimplified special cases seems to make it harder for people to understand.

I deliberately postpone all considerations of symmetry or equally likely, as the students (and also those using probability and statistics) have a major tendency to impose this when it is very definitely not the case. The "principle of insufficient reason" contributed to the demise of Bayesian statistics in the 19th century, and I see it going strong now.
-- 
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765) 494-6054 FAX: (765) 494-0558
Re: Trend analysis question
"Philip Cozzolino" <[EMAIL PROTECTED]> wrote in message [EMAIL PROTECTED]">news:[EMAIL PROTECTED]...

> However, after the cubic non-significant finding, the 4th and 5th order
> trends are significant.
>
> Intuitively, it seems that if there is no cubic trend of significance,
> there will not be any higher order trend, but this is relatively new to
> me.

Hi Philip. In a trend analysis, each test is orthogonal to (independent of) the other tests, so the results reported are quite reasonable. Admittedly, in my experience at least, it's a little unusual to have 4 out of the 5 trends significant, but such a finding does not indicate any problem with the analysis. Are there equal intervals between the six levels of your factor?

Robert
Re: Trend analysis question - Thanks
Thanks Donald and Karl for your responses...

Yeah, I don't know why I didn't think to compute my eta-squared on the significant trends. As I said, trend analysis is new to me (psych grad student) and I just got startled by the results.

The "significant" 4th and 5th order trends only account for 1% of the variance each, so I guess that should tell me something. The linear trend accounts for 44% and the quadratic accounts for 35% more, so 79% of the original 82% omnibus F (this is all practice data).

I guess, if I am now interpreting this correctly, the quadratic trend is the best solution.

Thanks again for your help,
-Philip
---
"If we knew what we were doing, it wouldn't be called research, would it?" -Albert Einstein

in article [EMAIL PROTECTED], Philip Cozzolino at [EMAIL PROTECTED] wrote on 3/3/01 7:23 PM:

> Hi,
>
> I have a question on how to interpret a specific trend analysis summary
> table. The IV has 6 levels, so I had SPSS run the analysis checking up
> to the 5th order trend.
>
> There is a significant linear and quadratic trend, but not cubic.
>
> However, after the cubic non-significant finding, the 4th and 5th order
> trends are significant.
>
> Intuitively, it seems that if there is no cubic trend of significance,
> there will not be any higher order trend, but this is relatively new to
> me.
>
> Any help is greatly appreciated.
> -Philip
Re: Trend analysis question
Philip has been unfortunate enough to get significance on his 4th and 5th order trends, and is hoping that nonsignificance of the 3rd order trend means the higher order trends are spurious. Sorry, no. Consider a perfect quadratic relationship -- there will be absolutely no linear component.

I wonder if one should even test for trends of an order that one could not interpret. They will always be present in some magnitude, and, given sufficient sample size, will be "significant." It might help to compute eta-squared (divide the trend SS by the total SS) and then use that statistic to decide whether you can dismiss the "significant trend" as trivial in magnitude -- I have generally been able to do so when encountering such higher order trends that defy interpretation but meet our criterion of statistical significance.

++ Karl L. Wuensch, Department of Psychology, East Carolina University, Greenville NC 27858-4353
Voice: 252-328-4102 Fax: 252-328-6283
[EMAIL PROTECTED]
http://core.ecu.edu/psyc/wuenschk/klw.htm
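[The eta-squared computation Karl describes is just each trend's SS divided by the total SS. A minimal Python sketch, with hypothetical sums of squares chosen to mimic the percentages Philip reported (44% linear, 35% quadratic, about 1% for the higher orders):]

```python
# Hypothetical sums of squares from a trend-analysis summary table
ss = {"linear": 440.0, "quadratic": 350.0, "cubic": 8.0,
      "quartic": 10.0, "quintic": 10.0, "error": 182.0}

ss_total = sum(ss.values())
eta_sq = {name: value / ss_total          # trend SS / total SS
          for name, value in ss.items() if name != "error"}
```

A "significant" quartic explaining 1% of the variance can then be dismissed as trivial in magnitude even though it passes the significance test.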
Re: Trend analysis question
On Sun, 4 Mar 2001, Philip Cozzolino wrote in part:

> However, after the cubic non-significant finding, the 4th and 5th
> order trends are significant.
>
> Intuitively, it seems that if there is no cubic trend of significance,
> there will not be any higher order trend, but this is relatively new
> to me.

Your intuition is, in this case, incorrect. The five trends are mutually independent in the sense that any combination of them may be operating. (I am for the moment accepting the implied premise that a power function of the IV is a reasonable function to try to fit to your data. In most instances I know of, this is not "really" the case, and the power function is more usefully thought of as an approximation to whatever the "real" functionality is.) This may be seen by considering the following relationships between Y and X (think of them as DV and IV if you wish):

I. [sketch: a symmetric parabolic pattern of Y against X -- Y varies only through the quadratic component]

II. [sketch: an S-shaped cubic pattern of Y against X -- linear and quadratic components near zero]

In I. above, the linear trend is approximately zero, and the quadratic component of X accounts for nearly all the variation in Y. A "rule" that claimed "If the linear trend is insignificant there can be no significant quadratic trend" is clearly false in this case. In II. above, both the linear and quadratic components of trend are virtually zero -- certainly insignificant -- and the cubic component accounts for nearly all the variation in Y. Similar situations can be imagined, where only the quartic, or only the quintic, or only the linear, quadratic, and quartic, or any other arbitrary combination of the basic trends are significant, and other components are not.

If you are carrying out your trend analysis by using orthogonal polynomials (as you probably should be), try constructing the model derived from your linear + quadratic fit only, and plot those as predicted values against X; then construct the model derived from linear + quadratic + quartic + quintic, and plot those predicted values against X. You may find it illuminating also to plot the residuals in each case against X, especially if you force the same vertical scale on the two sets of residuals.

I note in passing that you haven't stated how much of the variance of Y is accounted for by each of the significant components, nor how much residual variance there is after each component is entered. That also might be illuminating.
 -- DFB.
 --
Donald F. Burrill                                 [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,            [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264                       (603) 535-2597
Department of Mathematics, Boston University      [EMAIL PROTECTED]
111 Cummington Street, room 261, Boston, MA 02215 (617) 353-5288
184 Nashua Road, Bedford, NH 03110                (603) 471-7128
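[The point of pattern I above -- a pure quadratic contributes nothing to the linear trend -- can be verified numerically with the standard orthogonal polynomial contrast coefficients for six equally spaced levels. The cell means here are hypothetical, chosen to lie exactly on a symmetric parabola.]

```python
# Standard orthogonal polynomial contrast coefficients, k = 6 levels
linear_c = [-5, -3, -1, 1, 3, 5]
quadratic_c = [5, -1, -4, -4, -1, 5]

# Hypothetical cell means on a symmetric (pure quadratic) curve
means = [(x - 2.5) ** 2 for x in range(6)]  # 6.25, 2.25, 0.25, 0.25, 2.25, 6.25

lin = sum(c * m for c, m in zip(linear_c, means))      # exactly zero
quad = sum(c * m for c, m in zip(quadratic_c, means))  # clearly nonzero
```

The linear contrast is zero even though the quadratic contrast is large, so nonsignificance at one order says nothing about the orders above it.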
Re: basic stats question
But what does this (in)dependence really mean? Can it change on conditioning?

Suppose that we take into account a plausible confounder: defective equipment. Suppose blacks are more likely to have defective equipment (a broken light, etc.). Suppose we find that the percentage who are black among those stopped for defective equipment is the same as the percentage who are black among those having defective equipment. Now we have independence at one level and non-independence at another.

This seems related to Simpson's paradox. In any event, it seems that independence can be conditional. Is this so? If so, where is this discussed in more detail?

"Lise DeShea" <[EMAIL PROTECTED]> wrote in message [EMAIL PROTECTED]">news:[EMAIL PROTECTED]...

> Re probability/independence, I've found that the most effective way to
> communicate this concept to my students (College of Education, not
> heavily math-oriented) is the following:
>
> Then you can move to an example of racial profiling. Out of all the
> people in your city who drive, what proportion are African-American?
> [p(African-American).] Now, GIVEN that you look only at drivers who
> are pulled over, what proportion of these people are African-American?
> [p(African-American|pulled over).] If being black and being pulled
> over are independent events, then the probabilities should be equal.
>
> You can illustrate this graphically by drawing a large box to
> represent all the drivers, then mark the proportion representing
> African-American drivers. Then draw a smaller box representing the
> people being pulled over, with a proportion of the box marked to
> represent the African-American drivers who are pulled over. If the
> proportions of each box are equal, then the events are independent.
>
> So now, I would welcome comments from the more
> mathematically/statistically rigorous list members among us!
>
> ~~~
> Lise DeShea, Ph.D.
> Assistant Professor
> Educational and Counseling Psychology Department
> University of Kentucky
> 245 Dickey Hall
> Lexington KY 40506
> Email: [EMAIL PROTECTED]
> Phone: (859) 257-9884
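[Lise's box picture amounts to comparing p(African-American) with p(African-American | pulled over). A minimal Python sketch with invented counts, constructed so that the two proportions happen to agree:]

```python
# Invented counts for illustration only
total_drivers = 10000
black_drivers = 2000
pulled_over = 500
black_pulled_over = 100   # 20% of stops, matching 20% of drivers

p_black = black_drivers / total_drivers               # p(A)
p_black_given_stop = black_pulled_over / pulled_over  # p(A | B)

# Independence of "black" and "pulled over" means the two agree
independent = abs(p_black - p_black_given_stop) < 1e-12
```

With real data the two proportions would be sample estimates, so one would test their difference rather than compare it to exact zero.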
Trend analysis question
Hi,

I have a question on how to interpret a specific trend analysis summary table. The IV has 6 levels, so I had SPSS run the analysis checking up to the 5th order trend.

There is a significant linear and quadratic trend, but not cubic. However, after the cubic non-significant finding, the 4th and 5th order trends are significant.

Intuitively, it seems that if there is no cubic trend of significance, there will not be any higher order trend, but this is relatively new to me.

Any help is greatly appreciated.
-Philip
-- 
"Leave the gun. Take the cannolis." --Peter Clemenza, "The Godfather"
Re: basic stats question
You missed the point, Herman. I don't assert that these are independent random variables. I claim that introducing students to the concept of independent sample spaces from which we construct a cartesian product sample space will make it easier for them to understand independent events and random variables when we define them later.
-- 
Richard A. Beldin, Professional Statistician (retired)
BELDIN Consulting Services
Re: basic stats question
In article <[EMAIL PROTECTED]>, Richard A. Beldin <[EMAIL PROTECTED]> wrote:

> The suits and ranks of cards in a bridge deck certainly can be presented
> as independent sample spaces which we use as components of a cartesian
> product. Whether one does so or not is a matter of choice. I am on
> record as favoring the presentation as the cartesian product. Even the
> sample mean and variance can be seen this way; in fact, every vector
> valued random variable can be cast in the form of a random vector from a
> cartesian product.

This is the case for ONE card. Now suppose that one takes a sample without replacement; it still is the case that the suit of one card and the rank of another are independent, but it is not the case that the number of cards of a given suit and the number of cards of a given rank are independent.

> My point is that if we introduce independence as an attribute of sample
> spaces which we proceed to study as one, we can better motivate the idea
> of independent random variables and independent events.

How about this one, I believe due to Mandel? Take a sample from a trivariate independent normal distribution. Then each pair of correlations is independent, but the three correlations cannot be.

Or this one, which leads to an easy derivation of the Wishart distribution, and generation of Wishart matrices? Let the sum of squares and cross products from a sample of size n from a p-dimensional normal distribution with mean 0 and covariance matrix I be written as AA', with A 0 above the main diagonal. Then if n >= p (the changes for n < p are minor), the elements of A are all independent: the square of the i-th diagonal element is chi-square with n-i+1 degrees of freedom, and the below-diagonal elements are standard normal.
Re: basic stats question
The suits and ranks of cards in a bridge deck certainly can be presented as independent sample spaces which we use as components of a cartesian product. Whether one does so or not is a matter of choice. I am on record as favoring the presentation as the cartesian product. Even the sample mean and variance can be seen this way; in fact, every vector valued random variable can be cast in the form of a random vector from a cartesian product.

My point is that if we introduce independence as an attribute of sample spaces which we proceed to study as one, we can better motivate the idea of independent random variables and independent events.
-- 
Richard A. Beldin, Professional Statistician (retired)
BELDIN Consulting Services
Re: basic stats question
> I think that introducing the word "independent" as a descriptor of
> sample spaces and then carrying it on to the events in the product
> space is much less likely to generate the confusion due to the common
> informal descriptions "Independent events don't have anything to do
> with each other" and "Mutually exclusive events can't happen together."

I like Dick's idea a lot. To me, part of the problem is that textbooks fail to distinguish independence as a mathematical construct from independence as a modeling construct. Too many intro books put their expository effort into the mathematical definition, and then get obfuscatorily circular when it comes to the examples. Mathematicians *assume* independence, statisticians look at the data, and textbooks fail to recognize the difference. Dick's approach gives a nice way, in an elementary setting, to help students recognize situations where an assumption of independence is likely to stand up to empirical scrutiny.

I agree, too, Dick, that this should help with mutually exclusive vs. independent.

George Cobb

George W. Cobb
Mount Holyoke College
South Hadley, MA 01075
413-538-2401
Re: basic stats question
In article <[EMAIL PROTECTED]>, Richard A. Beldin <[EMAIL PROTECTED]> wrote: >I have long thought that the usual textbook discussion of independence >is misleading. In the first place, the most common situation where we >encounter independent random variables is with a cartesian product of >two independent sample spaces. Example: I toss a die and a coin. I have >reasonable assumptions about the distributions of events in either case >and I wish to discuss joint events. I have tried in vain to find natural >examples of independent random variables in a sample space not >constructed as a cartesian product. >I think that introducing the word "independent" as a descriptor of >sample spaces and then carrying it on to the events in the product space >is much less likely to generate the confusion due to the common informal >description "Independent events don't have anything to do with each >other" and "Mutually exclusive events can't happen together." >Comments? The usual definition of "independence" is a computational convenience, but an atrocious definition. A far better way to do it, which conveys the essence, is to use conditional probability. Random variables, or more generally partitions, are independent if, given any information about some of them, the conditional probability of any event formed from the others is the same as the unconditional probability. This is the way it is used. As for a "natural" example not coming from a Cartesian product, consider drawing a hand from an ordinary deck of cards. On another newsgroup, someone asked for a proof that the number of aces and the number of spades was uncorrelated; they are not independent. The proof I posted used that for the i-th and j-th cards dealt, the rank of the i-th card and the suit of the j-th are independent. 
For i=j, this can be looked upon as a product space, but not for i and j different. There are other examples. The independence of the sample mean and sample variance in a sample from a normal distribution is certainly an important example. The independence of the various sample variances in an ANOVA model is another. The independence for each t of X(t) and X'(t) in a stationary differentiable Gaussian process is another. This is thrown together off the cuff. There are lots of others. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
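Rubin's bridge example can be checked exactly with hypergeometric counting. A sketch (my own construction, not from the post): the probability that a 13-card hand contains no aces changes once you condition on the hand containing no spades, so the two counts are dependent even though they are uncorrelated.

```python
from fractions import Fraction
from math import comb

# Unconditionally: a 13-card hand avoids all 4 aces.
p_no_aces = Fraction(comb(48, 13), comb(52, 13))

# Conditioned on "no spades": all 13 cards come from the 39 non-spades,
# which contain only 3 aces.
p_no_aces_no_spades = Fraction(comb(36, 13), comb(39, 13))

print(float(p_no_aces))            # about 0.304
print(float(p_no_aces_no_spades))  # about 0.284
assert p_no_aces != p_no_aces_no_spades   # so the two counts are dependent
```

The mismatch between the conditional and unconditional probabilities is all that is needed to refute independence; showing zero correlation takes a separate (and longer) counting argument.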
Re: basic stats question
Re probability/independence, I've found that the most effective way to communicate this concept to my students (College of Education, not heavily math-oriented) is the following: Consider the student population of your university. Perhaps there is a fairly equal split of males and females in the student body. Now put a condition on the student body: look only at those majoring in, say, psychology. Do you find the same proportion of male students among psych majors as in the entire student body? If gender and psych major are independent, then the probability of a randomly chosen person at the university being male should equal the probability of a randomly chosen psych major being male. That is, p(male) = p(male|psych major), read as "the probability of male, given that you're looking at psych majors." Then you can move to an example of racial profiling. Out of all the people in your city who drive, what proportion are African-American? [p(African-American).] Now, GIVEN that you look only at drivers who are pulled over, what proportion of these people are African-American? [p(African-American|pulled over).] If being black and being pulled over are independent events, then the probabilities should be equal. You can illustrate this graphically by drawing a large box to represent all the drivers, then marking the proportion representing African-American drivers. Then draw a smaller box representing the people pulled over, with a proportion of the box marked to represent the African-American drivers who are pulled over. If the proportions of each box are equal, then the events are independent. So now, I would welcome comments from the more mathematically/statistically rigorous list members among us! ~~~ Lise DeShea, Ph.D. Assistant Professor Educational and Counseling Psychology Department University of Kentucky 245 Dickey Hall Lexington KY 40506 Email: [EMAIL PROTECTED] Phone: (859) 257-9884
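The p(male) = p(male|psych major) check is easy to demonstrate with a toy cross-tabulation. The counts below are invented purely for illustration:

```python
# Invented counts for illustration only: 1000 students cross-classified
# by gender and major.
counts = {
    ("male", "psych"): 30, ("male", "other"): 470,
    ("female", "psych"): 70, ("female", "other"): 430,
}
total = sum(counts.values())
p_male = sum(v for (g, m), v in counts.items() if g == "male") / total
psych_total = sum(v for (g, m), v in counts.items() if m == "psych")
p_male_given_psych = counts[("male", "psych")] / psych_total

# 0.5 overall vs 0.3 among psych majors: gender and major are NOT
# independent in this made-up student body.
print(p_male, p_male_given_psych)
```

With these numbers the marginal and conditional proportions disagree, so the events fail the independence check; equal proportions would pass it.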
Re: Satterthwaite-newbie question
Wow. I'm impressed with this group's thoughtful responses both privately and on the server. Yes, Hayes calls this the Behrens-Fisher problem too. I was always taught to use equal n's, so that the homogeneity-of-variance assumption was not as big an issue (the t-test post alluded to this too). Since I'm working with a clinical sample, I'm stuck. Just to give more info: n1 = 6; n2 = 8. I started computing multiple t-tests just to see how things changed when the n's were kept constant. Of course I knew in advance which was the one I wanted to use. The SD's are quite different for some of the comparisons, since one group is impaired and one is generally normal.

Satterthwaite weighted df (I hope I got this coding correct -- see Hayes p. 328):

  a = SEM1^2
  b = SEM2^2
  c = a^2/(n1+1)
  d = b^2/(n2+1)
  df = [(a+b)^2/(c+d)] - 2

I checked the SPSS algorithms web site you gave, and all the formulas for t-tests and t-statistics used only one term for n (I did find Satterthwaite listed in appendix 2, so I might try redoing this with SPSS), so I used Minitab (someone else suggested this package) after trying the calculations by hand (Excel). Here are the SEM's for means 1 and 2. It looks like the df decreases as the difference (diff) between the SEM's goes up. I also added the SEM's (sum) just to see if there was a relationship to overall variability. It looks like it's working well for me. Thanks everyone! Allyson

Here are my calculations with Minitab:

  SEM1  SEM2   diff   sum   df
    51    73    -22   124   11
    39   114    -75   153    8
    42    23     19    65    8
    17    20     -3    37   11
    21   180   -159   201    7
    52    36     16    88    9

Rich Ulrich wrote in message <[EMAIL PROTECTED]>... >On Wed, 28 Feb 2001 08:26:30 -0500, Christopher Tong ><[EMAIL PROTECTED]> wrote: > >> On Tue, 27 Feb 2001, Allyson Rosen wrote: >> >> > I need to compare two means with unequal n's. 
Hayes (1994) suggests using a >> > formula by Satterthwaite, 1946. I'm about to write up the paper and I can't >> > find the full reference ANYWHERE in the book or in any databases or in my >> > books. Is this an obscure test and should I be using another? >> >> Perhaps it refers to: >> >> F. E. Sattherwaite, 1946: An approximate distribution of estimates of >> variance components. Biometrics Bulletin, 2, 110-114. >> >> According to Casella & Berger (1990, pp. 287-9), "this approximation >> is quite good, and is still widely used today." However, it still may >> not be valid for your specific analysis: I suggest reading the >> discussion in Casella & Berger ("Statistical Inference", Duxbury Press, >> 1990). There are more commonly used methods for comparing means with >> unequal n available, and you should make sure that they can't be used >> in your problem before resorting to Sattherwaite. > >I don't have access to Casella & Berger, but I am curious about what >they recommend or suggest. Compare means with Student's t-test or >logistic regression; or Satterthwaite t if you can't avoid it if both >means and variances are different enough, and you wouldn't rather do >some transformation (for example, to ranks: then test Ranks). And >there's randomization and bootstrap. Anything else? > >Yesterday (so it should still be on your server), there was a post >with comments about the t-tests. > from the header >From: [EMAIL PROTECTED] (Jay Warner) >Newsgroups: sci.stat.edu >Subject: Re: two sample t > > >There are *additional* methods for comparing, but the one that is >*more common* is probably the Student's t, which ignores the >inequality. > >Any intro-stat-book with the t-test is likely to have one or another >version of the Satterthwaite t. The SPSS website includes algorithms >for what that stat-package uses, under t-test, for "unequal >variances." 
I find it almost impossible to find the algorithms by >navigating the site, so here is an address -- >http://www.spss.com/tech/stat/Algorithms.htm > >-- >Rich Ulrich, [EMAIL PROTECTED] >http://www.pitt.edu/~wpilib/index.html
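Allyson's transcription of the Hays df formula is easy to put into code. A sketch (pure Python; note that which SEM goes with which group size is my guess, since the post lists them separately -- pairing SEM1 = 51 with the n = 8 group reproduces the first df of 11 in her table):

```python
def hays_df(sem1, n1, sem2, n2):
    """Approximate df for the unequal-variance t test, coded from the
    formula in the post above (Hays' form: n+1 denominators, then -2)."""
    a, b = sem1 ** 2, sem2 ** 2
    c, d = a ** 2 / (n1 + 1), b ** 2 / (n2 + 1)
    return (a + b) ** 2 / (c + d) - 2

# First row of the table: SEM1 = 51, SEM2 = 73.  Assigning SEM1 to the
# n = 8 group (my assumption) gives df of about 11.08, i.e. the reported 11.
print(round(hays_df(51, 8, 73, 6)))
```

The remaining rows will not necessarily match to the integer, since the SEMs in the table look rounded.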
Re: Satterthwaite-newbie question
First, forgive me for mis-spelling Satterthwaite in my previous post. On Wed, 28 Feb 2001, Rich Ulrich wrote: > I don't have access to Casella & Berger, but I am curious about what > they recommend or suggest. Compare means with Student's t-test or > logistic regression; or Satterthwaite t if you can't avoid it if both > means and variances are different enough, and you wouldn't rather do > some transformation (for example, to ranks: then test Ranks). And > there's randomization and bootstrap. Anything else? Casella & Berger basically say that unknown, unequal variance is a hard problem but Satterthwaite is a good approximation. They call this the Behrens-Fisher problem and give references (e.g., Kendall & Stuart).
Re: Satterthwaite-newbie question
On Wed, 28 Feb 2001 08:26:30 -0500, Christopher Tong <[EMAIL PROTECTED]> wrote: > On Tue, 27 Feb 2001, Allyson Rosen wrote: > > > I need to compare two means with unequal n's. Hayes (1994) suggests using a > > formula by Satterthwaite, 1946. I'm about to write up the paper and I can't > > find the full reference ANYWHERE in the book or in any databases or in my > > books. Is this an obscure test and should I be using another? > > Perhaps it refers to: > > F. E. Sattherwaite, 1946: An approximate distribution of estimates of > variance components. Biometrics Bulletin, 2, 110-114. > > According to Casella & Berger (1990, pp. 287-9), "this approximation > is quite good, and is still widely used today." However, it still may > not be valid for your specific analysis: I suggest reading the > discussion in Casella & Berger ("Statistical Inference", Duxbury Press, > 1990). There are more commonly used methods for comparing means with > unequal n available, and you should make sure that they can't be used > in your problem before resorting to Sattherwaite. I don't have access to Casella & Berger, but I am curious about what they recommend or suggest. Compare means with Student's t-test or logistic regression; or Satterthwaite t if you can't avoid it if both means and variances are different enough, and you wouldn't rather do some transformation (for example, to ranks: then test Ranks). And there's randomization and bootstrap. Anything else? Yesterday (so it should still be on your server), there was a post with comments about the t-tests. from the header From: [EMAIL PROTECTED] (Jay Warner) Newsgroups: sci.stat.edu Subject: Re: two sample t There are *additional* methods for comparing, but the one that is *more common* is probably the Student's t, which ignores the inequality. Any intro-stat-book with the t-test is likely to have one or another version of the Satterthwaite t. 
The SPSS website includes algorithms for what that stat-package uses, under t-test, for "unequal variances." I find it almost impossible to find the algorithms by navigating the site, so here is an address -- http://www.spss.com/tech/stat/Algorithms.htm -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
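For comparison, the form most packages label "unequal variances" uses n-1 denominators in the Satterthwaite df and no -2 correction. A self-contained sketch (the data are made up for illustration):

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Unequal-variance t statistic with the usual Satterthwaite df
    (n - 1 denominators, no -2 correction)."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2   # squared SEMs
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Made-up data with unequal n's and unequal spread.
t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2))
```

The df comes out fractional; in practice it is referred to the t distribution directly or rounded down conservatively.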
Re: basic stats question
In article <[EMAIL PROTECTED]>, Richard A. Beldin <[EMAIL PROTECTED]> wrote: >... I have tried in vain to find natural >examples of independent random variables in a sample space not >constructed as a cartesian product. An important example theoretically is the independence of the sample mean and the sample variance of a data set consisting of points drawn independently from a Gaussian distribution. Now, you might be able to view this in terms of a Cartesian product, but it's not obvious that that's a natural view. >I think that introducing the word "independent" as a descriptor of >sample spaces and then carrying it on to the events in the product space >is much less likely to generate the confusion due to the common informal >description "Independent events don't have anything to do with each >other" and "Mutually exclusive events can't happen together." I think this would be a bad idea. Events can be independent without being constructed to be independent in this way. As a definition, "Independent events don't have anything to do with each other" is dangerous because it leads one to think that independence is a property of events as physical phenomena. For instance, one might decide that the event of a person having a harmless variant of gene A is independent of the event of their having a harmless variant of gene B, on the grounds that the mechanisms for the two genes mutating are such that there's no reason for them to mutate together. But if the genes are linked, and the context is a sample of people from some community founded not too long ago by a small number of people, the events of the two variants occurring in a person may not be independent, even though they would be independent if the context were a sample of people from the whole world. Here, independence is not a property of the people, or of the genes, but of what is considered to be the sample space for whatever problem is being tackled. 
Regarding "Mutually exclusive events can't happen together", this is not an adequate definition if some non-null events have zero probability. I think that independence is not something that can be explained in ANY simple way. Multiple explanations and multiple examples are needed. Radford Neal Radford M. Neal [EMAIL PROTECTED] Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED] University of Toronto http://www.cs.utoronto.ca/~radford = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Satterthwaite-newbie question
On Tue, 27 Feb 2001, Allyson Rosen wrote: > I need to compare two means with unequal n's. Hayes (1994) suggests using a > formula by Satterthwaite, 1946. I'm about to write up the paper and I can't > find the full reference ANYWHERE in the book or in any databases or in my > books. Is this an obscure test and should I be using another? Perhaps it refers to: F. E. Sattherwaite, 1946: An approximate distribution of estimates of variance components. Biometrics Bulletin, 2, 110-114. According to Casella & Berger (1990, pp. 287-9), "this approximation is quite good, and is still widely used today." However, it still may not be valid for your specific analysis: I suggest reading the discussion in Casella & Berger ("Statistical Inference", Duxbury Press, 1990). There are more commonly used methods for comparing means with unequal n available, and you should make sure that they can't be used in your problem before resorting to Sattherwaite.
Satterthwaite-newbie question
I need to compare two means with unequal n's. Hayes (1994) suggests using a formula by Satterthwaite, 1946. I'm about to write up the paper and I can't find the full reference ANYWHERE in the book or in any databases or in my books. Is this an obscure test and should I be using another? Thanks, Allyson
Re: basic stats question
Richard A. Beldin wrote: > I have long thought that the usual textbook discussion of independence > is misleading. In the first place, the most common situation where we > encounter independent random variables is with a cartesian product of > two independent sample spaces. Example: I toss a die and a coin. I have > reasonable assumptions about the distributions of events in either case > and I wish to discuss joint events. I have tried in vain to find natural > examples of independent random variables in a sample space not > constructed as a cartesian product. > > I think that introducing the word "independent" as a descriptor of > sample spaces and then carrying it on to the events in the product space > is much less likely to generate the confusion due to the common informal > description "Independent events don't have anything to do with each > other" and "Mutually exclusive events can't happen together." > > Comments? 1) It is conceivable that a plant making blue and red 'thingies' on the same production line would discover that the probability that the next thingie is internally flawed (in the cast portion) is independent of the probability that it is blue. BTW - 'Thingies' are so commonly used by everyone that it is not necessary to describe them in detail. :) 2) There are many terms, concepts, and definitions in the 'textbook' that have no exact match in reality. Common expressions include 'There is no such thing as random,' 'There is no such thing as Normal (distribution),' and my own contribution, 'There is no such thing as a dichotomy this side of a theological discussion.' The abstract definitions are just that - theoretical ideals. Down here in the mud of reality, we recognize this, and try to decide if the theory is reasonably close to what is happening. A couple of confirmation trials help, too. 
If the internal casting flaws are generated at an early point, and the paint is added later, depending on the orders received, then I would assert that independence was likely. If the paint is added to castings made on different dies or production machines, as a color code, then I would suspect independence was unlikely. 3) Presenting 'independence' as axes in a cartesian coordinate system is extremely handy, especially for discussing orthogonal arrays and designed experiments, etc. The presentation, however, does not make them independent. One has to check the physical system behavior to assure that. 4) I may have shot far wider than your intended mark, in which case, sorry for the interruption. Jay -- Jay Warner Principal Scientist Warner Consulting, Inc. North Green Bay Road Racine, WI 53404-1216 USA Ph: (262) 634-9100 FAX: (262) 681-1133 email: [EMAIL PROTECTED] web: http://www.a2q.com The A2Q Method (tm) -- What do you want to improve today?
Re: basic stats question
I have long thought that the usual textbook discussion of independence is misleading. In the first place, the most common situation where we encounter independent random variables is with a cartesian product of two independent sample spaces. Example: I toss a die and a coin. I have reasonable assumptions about the distributions of events in either case and I wish to discuss joint events. I have tried in vain to find natural examples of independent random variables in a sample space not constructed as a cartesian product. I think that introducing the word "independent" as a descriptor of sample spaces and then carrying it on to the events in the product space is much less likely to generate the confusion due to the common informal description "Independent events don't have anything to do with each other" and "Mutually exclusive events can't happen together." Comments? -- Richard A. Beldin, Professional Statistician (retired)
Re: Sample size question
On 23 Feb 2001 12:08:45 -0800, [EMAIL PROTECTED] (Scheltema, Karen) wrote: > I tried the site but received errors trying to download it. It couldn't > find the FTP site. Has anyone else been able to access it? As of a few minutes ago, it downloaded fine for me when I clicked on it with Internet Explorer. The .zip file expanded okay. I used right-click (I just learned that last week) in order to download the .pdf version of the help. [ ... ] < Earlier Q and Answer > "Can anyone point me to software for estimating ANCOVA or regression sample sizes based on effect size?" > > Look here: > > http://www.interchg.ubc.ca/steiger/r2.htm Hmm. Placing limits on R^2. I haven't read the accompanying documentation. On the general principle that you can't compute power if you don't know what power you are looking for, I suggest reading the relevant chapters in Jacob Cohen's book (1988+ edition). -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
RE: Sample size question
>I tried the site but received errors trying to download it. It couldn't >find the FTP site. Has anyone else been able to access it? > >Karen Scheltema >Statistician >HealthEast >Research and Education >1700 University Ave W >St. Paul, MN 55104 >(651) 232-5212 fax (651) 641-0683 >[EMAIL PROTECTED] > >> -Original Message- >> From:Chuck Cleland [SMTP:[EMAIL PROTECTED]] >> Sent:Friday, February 23, 2001 11:04 AM >> To: [EMAIL PROTECTED] >> Subject: Re: Sample size question >> >> "Scheltema, Karen" wrote: >> > Can anyone point me to software for estimating ANCOVA or regression >> sample >> > sizes based on effect size? >> >> Look here: >> >> http://www.interchg.ubc.ca/steiger/r2.htm >> >> Chuck > Karen, I just looked, and was able to access the site and download the files. Dan Nordlund
RE: Sample size question
I tried the site but received errors trying to download it. It couldn't find the FTP site. Has anyone else been able to access it? Karen Scheltema Statistician HealthEast Research and Education 1700 University Ave W St. Paul, MN 55104 (651) 232-5212 fax (651) 641-0683 [EMAIL PROTECTED] > -Original Message- > From: Chuck Cleland [SMTP:[EMAIL PROTECTED]] > Sent: Friday, February 23, 2001 11:04 AM > To: [EMAIL PROTECTED] > Subject: Re: Sample size question > > "Scheltema, Karen" wrote: > > Can anyone point me to software for estimating ANCOVA or regression > sample > > sizes based on effect size? > > Look here: > > http://www.interchg.ubc.ca/steiger/r2.htm > > Chuck
Re: Sample size question
"Scheltema, Karen" wrote: > Can anyone point me to software for estimating ANCOVA or regression sample > sizes based on effect size? Look here: http://www.interchg.ubc.ca/steiger/r2.htm Chuck -<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>- Chuck Cleland Institute for the Study of Child Development UMDNJ--Robert Wood Johnson Medical School 97 Paterson Street New Brunswick, NJ 08903 phone: (732) 235-7699 fax: (732) 235-6189 http://www2.umdnj.edu/iscdweb/ http://members.nbci.com/cmcleland/ -<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>-<>- = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Sample size question
You can use Sample Power from SPSS (a.k.a. Power and Precision) or PASS 2000 from NCSS. For more info, please visit: http://www.spss.com http://www.ncss.com http://seamonkey.ed.asu.edu/~alex/teaching/WBI/power_es.html --- "Regression to the mean" is not always true. After 30, my weight never regresses to the mean. Chong-ho (Alex) Yu, Ph.D., MCSE, CNE Academic Research Professional/Manager Educational Data Communication, Assessment, Research and Evaluation Farmer 418 Arizona State University Tempe AZ 85287-0611 Email: [EMAIL PROTECTED] URL: http://seamonkey.ed.asu.edu/~alex/
RE: Sample size question
Thanks! This was exactly what I was looking for! Karen Scheltema Statistician HealthEast Research and Education 1700 University Ave W St. Paul, MN 55104 (651) 232-5212 fax (651) 641-0683 [EMAIL PROTECTED] > -Original Message- > From: Magill, Brett [SMTP:[EMAIL PROTECTED]] > Sent: Friday, February 23, 2001 9:53 AM > To: 'Scheltema, Karen'; [EMAIL PROTECTED] > Subject: RE: Sample size question > > G*Power is a power analysis package that is freely available. You can > download it at: > > http://www.psychologie.uni-trier.de:8000/projects/gpower.html > > You can calculate a sample size for a given effect size, alpha level, and > power value. > > > -Original Message- > From: Scheltema, Karen [mailto:[EMAIL PROTECTED]] > Sent: Friday, February 23, 2001 10:07 AM > To: [EMAIL PROTECTED] > Subject: Sample size question > > > Can anyone point me to software for estimating ANCOVA or regression sample > sizes based on effect size? > > Karen Scheltema > Statistician > HealthEast > Research and Education > 1700 University Ave W > St. Paul, MN 55104 > (651) 232-5212 fax (651) 641-0683 > [EMAIL PROTECTED]
RE: Sample size question
G*Power is a power analysis package that is freely available. You can download it at: http://www.psychologie.uni-trier.de:8000/projects/gpower.html You can calculate a sample size for a given effect size, alpha level, and power value. -Original Message- From: Scheltema, Karen [mailto:[EMAIL PROTECTED]] Sent: Friday, February 23, 2001 10:07 AM To: [EMAIL PROTECTED] Subject: Sample size question Can anyone point me to software for estimating ANCOVA or regression sample sizes based on effect size? Karen Scheltema Statistician HealthEast Research and Education 1700 University Ave W St. Paul, MN 55104 (651) 232-5212 fax (651) 641-0683 [EMAIL PROTECTED]
Sample size question
Can anyone point me to software for estimating ANCOVA or regression sample sizes based on effect size? Karen Scheltema Statistician HealthEast Research and Education 1700 University Ave W St. Paul, MN 55104 (651) 232-5212 fax (651) 641-0683 [EMAIL PROTECTED]
Re: statistics question
In article <95nuk5$8df$[EMAIL PROTECTED]>, [EMAIL PROTECTED] wrote: > Thanks very much for your helpful response. > 1) My factors are continuous. I have multiple responses. Some are > continuous and some are categorical. I need to optimize my responses. > The main region that they are interested in is for A between 35 and 95 > and for B between 900 and 1750. > In addition they want to run a couple of points outside of this region, > as there is reason to believe that it will optimize the response. > These are B=2000 and A is any pt between 65 and 95, say 80. > Also, they want to run the combination A=35 and B=1650. > Also, would like to include A=90 and B=1650. > Also, would like to include A=105 and B close to 1325. > These points are not totally fixed. If I can get close to them, that will > work. > Everything else looks flexible. I'll be able to run the experiment > 21 times. I can include replications. Will replication on some runs and > not on others destroy orthogonality? > > I'm not sure how to set this up. > I appreciate your help very much. > SH. Lee > In article <[EMAIL PROTECTED]>, > [EMAIL PROTECTED] wrote: > > Flash response: > > > > 1)Are the levels fixed by some characteristic of the process? they > > look continuous, and you could do much better if they were, and you > > could select different intermediate levels. > > > > 2)the number of levels can be what you want of it. Some good > > response surface designs use 5 levels. some use more. > > > > 3)Factor B levels are equally spaced, which is good. Factor A > > levels are not evenly spaced. A full factorial will not give you a > > 'clean' design - Without doing the math, I don't believe it will be > > orthogonal, even if you did do all the combinations. > > > > 4)what are you going to do with the results of this experiment? > If > > you wish to build a model of the system behavior, then a full > factorial > > type approach is a waste of your effort, time, and experimental runs. 
> > > > 5)Suggest you look at a Response model, with maybe 3-5 levels in > > both factors, but using a proper RSM type design. If you do it > > properly, you can avoid a single 'corner' point and recover it > > mathematically. > > > > 6)I'd also ask if you have hard reason to believe that a RSM type > > model, which will get you quadratic terms in a model, is in fact > worth > > doing (financial/your time costs) the first time out? If little > prior > > information is available, it would probably be better to do a > simpler, > > 2-level factorial first, if at all possible. Doing this will teach > you > > a great deal [that you probably don't already know]. Your choice > here, > > but remember - most people overestimate their knowledge level :) > > > > 7)You haven't discussed the response yet. Please spend some time > > thinking about that, too. > > > > More later, if this helps at all. Let me know. > > > > Jay > > > > [EMAIL PROTECTED] wrote: > > > > > Hi, > > > > > > I have two factors A and B and I want to run a DOE to study my > response. > > > My factor B is at 3 levels; (900, 1450 and 2000) , my factor A is > at 4 > > > levels 35, 65, 80 and 105. > > > First of all is it right to have one factor at 4 levels. I have > > > encountered situations where the factors are either at 2 levels or 3 > > > levels.? > > > This will require me to have 12 runs for a full factorial, right? > > > Also, I do not want to run only the level 35 of factor A with the > level > > > 900 of factor B. If I remove the combination 35, 1450 and 35, 2000; > > > I'll have only 10 runs and the resulting design space will not be > > > orthogonal. How do I tackle this problem? > > > Is there a different design that you would suggest. > > > Thanks for your help. 
> > > SH Lee > > > > > > > > > Sent via Deja.com > > > http://www.deja.com/ > > > > > > > > > = > > > Instructions for joining and leaving this list and remarks about > > > the problem of INAPPROPRIATE MESSAGES are available at > > > http://jse.stat.ncsu.edu/ > > > = > > > > > > > > > > > > > -- > > Jay Warner > > Principal Scientist > > Warner Consulting, Inc. > > North Green Bay Road > > Racine, WI 53404-1216 > > USA > > > > Ph: (262) 634-9100 > > FAX:(262) 681-1133 > > email: [EMAIL PROTECTED] > > web:http://www.a2q.com > > > > The A2Q Method (tm) -- What do you want to improve today? > > > > = > > Instructions for joining and leaving this list and remarks about > > the problem of INAPPROPRIATE MESSAGES are available at > > http://jse.stat.ncsu.edu/ > > = > > > > Sent via Deja.com > http://www.deja.com/ > Sent via Deja.com http://www.deja.com/ ==
Re: statistics question
Thanks very much for your helpful response.

1) My factors are continuous. I have multiple responses; some are continuous and some are categorical. I need to optimize my responses. The main region they are interested in is A between 35 and 95 and B between 900 and 1750. In addition, they want to run a couple of points outside this region, as there is reason to believe they will optimize the response. These are B=2000 with A at any point between 65 and 95, say 80. They also want to run the combination A=35 and B=1650, and would like to include A=90 with B=1650, and A=105 with B close to 1325. These points are not totally fixed; if I can get close to them, that will work. Everything else is flexible. I will be able to run the experiment 21 times, and I can include replications. Will replicating some runs and not others destroy orthogonality?

I'm not sure how to set this up. I appreciate your help very much.

SH. Lee

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] wrote:
> Flash response:
>
> 1) Are the levels fixed by some characteristic of the process? They
> look continuous, and you could do much better if they were, and you
> could select different intermediate levels.
>
> 2) The number of levels can be whatever you want. Some good response
> surface designs use 5 levels; some use more.
>
> 3) Factor B levels are equally spaced, which is good. Factor A levels
> are not evenly spaced. A full factorial will not give you a 'clean'
> design - without doing the math, I don't believe it will be
> orthogonal, even if you did do all the combinations.
>
> 4) What are you going to do with the results of this experiment? If
> you wish to build a model of the system behavior, then a full
> factorial type approach is a waste of your effort, time, and
> experimental runs.
>
> 5) Suggest you look at a response surface model, with maybe 3-5 levels
> in both factors, but using a proper RSM type design. If you do it
> properly, you can avoid a single 'corner' point and recover it
> mathematically.
>
> 6) I'd also ask if you have hard reason to believe that an RSM type
> model, which will get you quadratic terms in the model, is in fact
> worth doing (financial/your time costs) the first time out? If little
> prior information is available, it would probably be better to do a
> simpler, 2-level factorial first, if at all possible. Doing this will
> teach you a great deal [that you probably don't already know]. Your
> choice here, but remember - most people overestimate their knowledge
> level :)
>
> 7) You haven't discussed the response yet. Please spend some time
> thinking about that, too.
>
> More later, if this helps at all. Let me know.
>
> Jay
>
> [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > I have two factors A and B and I want to run a DOE to study my
> > response. My factor B is at 3 levels (900, 1450 and 2000); my
> > factor A is at 4 levels: 35, 65, 80 and 105.
> > First of all, is it right to have one factor at 4 levels? I have
> > encountered situations where the factors are either at 2 levels or
> > 3 levels.
> > This will require me to have 12 runs for a full factorial, right?
> > Also, I do not want to run only the level 35 of factor A with the
> > level 900 of factor B. If I remove the combinations 35, 1450 and
> > 35, 2000, I'll have only 10 runs and the resulting design space
> > will not be orthogonal. How do I tackle this problem?
> > Is there a different design that you would suggest?
> > Thanks for your help.
> > SH Lee
>
> --
> Jay Warner
> Principal Scientist
> Warner Consulting, Inc.
> North Green Bay Road
> Racine, WI 53404-1216
> USA
>
> Ph: (262) 634-9100
> FAX: (262) 681-1133
> email: [EMAIL PROTECTED]
> web: http://www.a2q.com
>
> The A2Q Method (tm) -- What do you want to improve today?

=
Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/
=
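[Editor's note: the replication question above can be checked numerically. The sketch below is not from the thread; it uses a small coded 2x2 factorial to show that replicating only *some* runs makes the mean-centered factor columns correlated, while replicating every run equally preserves orthogonality.]

```python
import numpy as np

# Coded 2x2 full factorial: the centered factor columns are orthogonal.
base = np.array([[-1, -1],
                 [-1,  1],
                 [ 1, -1],
                 [ 1,  1]], dtype=float)

def column_cross_product(design):
    """Cross product of the two mean-centered factor columns.

    Zero means the two factor columns are orthogonal in this design.
    """
    centered = design - design.mean(axis=0)
    return float(centered[:, 0] @ centered[:, 1])

balanced = column_cross_product(base)            # 0.0: orthogonal

# Replicate only the (+1, +1) run: the columns are no longer balanced,
# and the cross product becomes nonzero, so A and B are confounded.
lopsided = np.vstack([base, [1.0, 1.0]])
unbalanced = column_cross_product(lopsided)      # nonzero

# Replicating EVERY run the same number of times keeps orthogonality.
evenly_replicated = column_cross_product(np.vstack([base, base]))
print(balanced, unbalanced, evenly_replicated)
```

So the short answer to the question appears to be yes: unequal replication generally destroys orthogonality, whereas replicating the whole design does not.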
Re: statistics question
You've had a good "flash response" from Jay Warner. Other short answers are embedded in the original query below:

On Sat, 3 Feb 2001 [EMAIL PROTECTED] wrote:

> I have two factors A and B and I want to run a DOE to study my
> response. My factor B is at 3 levels (900, 1450 and 2000); my factor
> A is at 4 levels: 35, 65, 80 and 105.
> First of all is it right to have one factor at 4 levels.

"Right" I don't know about. There's nothing _wrong_ with it. If it is logically required by the problem, or useful for design reasons of one sort or another, it is certainly defensible.

> This will require me to have 12 runs for a full factorial, right?

Your arithmetic is correct.

> Also, I want to run the level 35 of factor A only with the level 900
> of factor B. If I remove the combinations 35, 1450 and 35, 2000,
> I'll have only 10 runs and the resulting design space will not be
> orthogonal.

True.

> How do I tackle this problem?
> Is there a different design that you would suggest?

Depends on what you're carrying out this experiment for, and why it makes sense to omit those two design points. But one way to approach the problem is to treat the data as a one-way design with 10 levels, and model the detailed questions you want to ask via assorted contrasts. Of course, the contrasts will probably not be orthogonal; but having found out some (preliminary?) things about the situation in this run, you can then more intelligently design a subsequent run, perhaps with fewer than 10 combinations, or with a design of the sort Jay suggested.
-- DFB.
--
Donald F. Burrill                                  [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,             [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264                        (603) 535-2597
Department of Mathematics, Boston University       [EMAIL PROTECTED]
111 Cummington Street, room 261, Boston, MA 02215  (617) 353-5288
184 Nashua Road, Bedford, NH 03110                 (603) 471-7128
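[Editor's note: the one-way-with-contrasts approach suggested above can be sketched in a few lines of numpy. The cell means, replicate counts, error variance, and the particular contrast below are all hypothetical, chosen only to show the mechanics: a contrast's coefficients sum to zero, and its standard error pools the within-cell error variance over the cell sample sizes.]

```python
import numpy as np

# Hypothetical cell means and replicate counts for the 10 retained
# (A, B) combinations, listed in some fixed order.
cell_means = np.array([5.2, 6.1, 5.9, 6.4, 7.0, 6.8, 7.3, 7.1, 6.9, 7.5])
cell_ns    = np.array([2, 2, 2, 2, 2, 2, 2, 2, 2, 3])
mse        = 0.40      # pooled within-cell error variance (assumed known here)

# A contrast asks one focused question, e.g. "first cell vs. the
# average of the last three". Its coefficients must sum to zero.
c = np.array([1, 0, 0, 0, 0, 0, 0, -1/3, -1/3, -1/3])
assert np.isclose(c.sum(), 0.0)

estimate = float(c @ cell_means)                 # estimated contrast value
se = float(np.sqrt(mse * np.sum(c**2 / cell_ns)))  # its standard error
t_stat = estimate / se   # compare with t on sum(cell_ns) - 10 error df
print(estimate, se, t_stat)
```

With real data the within-cell MSE would come from the one-way ANOVA rather than being assumed, but the contrast arithmetic is the same.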
Re: statistics question
Flash response:

1) Are the levels fixed by some characteristic of the process? They look continuous, and you could do much better if they were, and you could select different intermediate levels.

2) The number of levels can be whatever you want. Some good response surface designs use 5 levels; some use more.

3) Factor B levels are equally spaced, which is good. Factor A levels are not evenly spaced. A full factorial will not give you a 'clean' design - without doing the math, I don't believe it will be orthogonal, even if you did do all the combinations.

4) What are you going to do with the results of this experiment? If you wish to build a model of the system behavior, then a full factorial type approach is a waste of your effort, time, and experimental runs.

5) Suggest you look at a response surface model, with maybe 3-5 levels in both factors, but using a proper RSM type design. If you do it properly, you can avoid a single 'corner' point and recover it mathematically.

6) I'd also ask if you have hard reason to believe that an RSM type model, which will get you quadratic terms in the model, is in fact worth doing (financial/your time costs) the first time out? If little prior information is available, it would probably be better to do a simpler, 2-level factorial first, if at all possible. Doing this will teach you a great deal [that you probably don't already know]. Your choice here, but remember - most people overestimate their knowledge level :)

7) You haven't discussed the response yet. Please spend some time thinking about that, too.

More later, if this helps at all. Let me know.

Jay

[EMAIL PROTECTED] wrote:

> Hi,
>
> I have two factors A and B and I want to run a DOE to study my
> response. My factor B is at 3 levels (900, 1450 and 2000); my factor
> A is at 4 levels: 35, 65, 80 and 105.
> First of all, is it right to have one factor at 4 levels? I have
> encountered situations where the factors are either at 2 levels or 3
> levels.
> This will require me to have 12 runs for a full factorial, right?
> Also, I do not want to run only the level 35 of factor A with the
> level 900 of factor B. If I remove the combinations 35, 1450 and
> 35, 2000, I'll have only 10 runs and the resulting design space will
> not be orthogonal. How do I tackle this problem?
> Is there a different design that you would suggest?
> Thanks for your help.
> SH Lee

--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
North Green Bay Road
Racine, WI 53404-1216
USA

Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?
statistics question
Hi,

I have two factors A and B and I want to run a DOE to study my response. My factor B is at 3 levels (900, 1450 and 2000); my factor A is at 4 levels: 35, 65, 80 and 105.
First of all, is it right to have one factor at 4 levels? I have encountered situations where the factors are either at 2 levels or 3 levels.
This will require me to have 12 runs for a full factorial, right?
Also, I do not want to run only the level 35 of factor A with the level 900 of factor B. If I remove the combinations 35, 1450 and 35, 2000, I'll have only 10 runs and the resulting design space will not be orthogonal. How do I tackle this problem?
Is there a different design that you would suggest?
Thanks for your help.

SH Lee
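[Editor's note: the orthogonality claim in the question is easy to verify. The sketch below builds the 12-run full factorial from the stated levels, drops the two combinations the poster wants to avoid, and computes the cross product of the mean-centered A and B columns for each design.]

```python
import numpy as np
from itertools import product

A_levels = [35, 65, 80, 105]
B_levels = [900, 1450, 2000]

full = list(product(A_levels, B_levels))
assert len(full) == 12                    # full factorial: 4 x 3 = 12 runs

# Drop the two combinations the poster wants to avoid.
reduced = [run for run in full if run not in [(35, 1450), (35, 2000)]]
assert len(reduced) == 10

def ab_cross_product(runs):
    """Cross product of the mean-centered A and B columns.

    Zero means the A and B effects can be estimated independently.
    """
    X = np.array(runs, dtype=float)
    Xc = X - X.mean(axis=0)
    return float(Xc[:, 0] @ Xc[:, 1])

print(ab_cross_product(full))     # ~0: the full factorial is orthogonal
print(ab_cross_product(reduced))  # nonzero: the 10-run design is not
```

This confirms the poster's statement: the 10-run design is no longer orthogonal, which is what motivates the alternatives (contrasts, or an RSM-type design) suggested in the replies.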
updated statistics question
Hi,

I have two factors A and B and I want to run a DOE to study my response. My factor B is at 3 levels (900, 1450 and 2000); my factor A is at 4 levels: 35, 65, 80 and 105.
First of all, is it right to have one factor at 4 levels? I have encountered situations where the factors are either at 2 levels or 3 levels.
This will require me to have 12 runs for a full factorial, right?
Also, I want to run only the level 35 of factor A with the level 900 of factor B. If I remove the combinations 35, 1450 and 35, 2000, I'll have only 10 runs and the resulting design space will not be orthogonal. How do I tackle this problem?
Is there a different design that you would suggest?
Thanks for your help.

SH Lee