New Statistics Website

2000-06-20 Thread secondmoment

I am hoping to get some feedback regarding an
applied statistics and analytics website I
created to bring together academics and industry
analysts.  It is designed to showcase academic
research, create discussion in the field of
applied statistics, and provide links to other
useful statistics sites as well as employment
opportunities.  The site is called Second Moment
and is at
. I would appreciate
any feedback and suggestions on how I could
improve the site.  Thanks.




===
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===



Re: Rates and proportions

2000-06-20 Thread Alan McLean

One might also ask what is meant by the 'population escape rate' in this
context. Is the data not population data?

Alan

Dale Berger wrote:
> 
> Hi Don et al.,
> 
> If we observe one escape out of 1250 inmates, why can't we reliably rule out
> zero as the population escape rate?  The normal approximation to the
> binomial may not be appropriate here.
> 
> Dale Berger

> 
> > "Unreliable" or "useless"?  Well, the basic graininess in a rate
> > is one escapee more (or less) than was reported.  A rate of .08 per 100
> > is about 1 out of 1250.  If the data on which the rate was based were 1
> > escapee out of 1250 inmates, one cannot _reliably_ tell the rate from
> > zero.  If the data were 13 escapees out of 16,200 inmates, one would have
> > more faith in the rate, at least insofar as representing a small value
> > different from (not equal to!) zero.  Unfortunately, the rate itself
> > does not tell one how grainy the data were.
> >
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel:  +61 03 9903 2102    Fax: +61 03 9903 2007





Re: Rates and proportions

2000-06-20 Thread Dale Berger

Hi Don et al.,

If we observe one escape out of 1250 inmates, why can't we reliably rule out
zero as the population escape rate?  The normal approximation to the
binomial may not be appropriate here.
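A quick numerical aside (a sketch, not from the thread): the exact binomial (Clopper-Pearson) interval for 1 escape in 1250 inmates shows both points at once.  Zero is excluded -- an escape did happen -- yet the interval is so wide that the rate is indistinguishable from "almost zero":

```python
# Exact binomial (Clopper-Pearson) 95% interval for 1 escape observed
# among 1250 inmates.  Zero is excluded, but the interval spans more
# than two orders of magnitude -- which is the graininess point.
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided interval, found by bisection on the tail probabilities."""
    def solve(too_small, lo=0.0, hi=1.0):
        for _ in range(100):               # bisection on [0, 1]
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if too_small(mid) else (lo, mid)
        return (lo + hi) / 2
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p) < alpha / 2)
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(1, 1250)
print(f"95% CI per 100 inmates: [{100 * lo:.4f}, {100 * hi:.3f}]")
```

With these numbers the interval per 100 inmates comes out to roughly [0.002, 0.45] -- wide enough that comparing a .44 to a .08 between years says very little.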

Dale Berger
Professor and Dean, Psychology
Claremont Graduate University
123 East Eighth Street
Claremont, CA  91711

FAX: 909-621-8905
Phone: 909-621-8084
http://www.cgu.edu/faculty/bergerd.html

- Original Message -
From: Donald Burrill <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, June 20, 2000 2:49 PM
Subject: Re: Rates and proportions


> On Tue, 20 Jun 2000 [EMAIL PROTECTED] wrote:
>
> > Hello, I "inherited" the reporting system for our escapes and have some
> > questions about how data has been reported in the past.
< ... >

> "Unreliable" or "useless"?  Well, the basic graininess in a rate
> is one escapee more (or less) than was reported.  A rate of .08 per 100
> is about 1 out of 1250.  If the data on which the rate was based were 1
> escapee out of 1250 inmates, one cannot _reliably_ tell the rate from
> zero.  If the data were 13 escapees out of 16,200 inmates, one would have
> more faith in the rate, at least insofar as representing a small value
> different from (not equal to!) zero.  Unfortunately, the rate itself
> does not tell one how grainy the data were.
>







Re: power analysis for the log-rank test to prove equivalence

2000-06-20 Thread Jerry Hintze

> Does anybody know how to calculate the sample size needed to prove
> EQUIVALENCE, not difference, of two treatments concerning survival data
> (log-rank test, Cox regression)?
>
> Thanks Bernd
>
If you were willing to use proportions or means, you could use our program
PASS (at www.ncss.com) to solve your problem. You can try it out for free
for 30 days.

You might want to use two one-sided confidence limits. This is the approach
of Farrington and Manning, which seems to be popular. This kind of thing
shows up in Statistics in Medicine.
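For what it's worth, the two-one-sided-tests idea can be sketched for proportions with the usual normal approximation (this is NOT the Farrington-Manning score method, and every number below is purely illustrative):

```python
# Back-of-envelope sample size for an equivalence (two one-sided tests)
# comparison of two proportions, via the normal approximation.
# Illustrative only -- not a substitute for the score method.
from math import ceil
from statistics import NormalDist

def n_per_group_equiv(p, margin, alpha=0.05, power=0.80):
    """n per arm to conclude |p1 - p2| < margin when in truth p1 = p2 = p."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha), z(1 - (1 - power) / 2)
    return ceil((za + zb) ** 2 * 2 * p * (1 - p) / margin ** 2)

# e.g. ~70% survival expected in both arms, equivalence margin of 10 points
print(n_per_group_equiv(p=0.70, margin=0.10))
```

Note how fast the margin drives the answer: halving it quadruples n, which is why equivalence studies tend to be large.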

Regards,

Jerry Hintze, NCSS








time series assumptions

2000-06-20 Thread Barbara Lehman

I am attempting to help a friend analyze interrupted time series data,
and am hoping someone on the list might have advice about how to
proceed.  The data, which are police reports of hate crimes over a three
year period, seem less than ideal.  The series is organized into weekly
frequency data (a total of 156 data points).  Approximately 35 of the
weeks had some sort of hate crime incident (and one week had as many as
75), but the remaining points are all 0 (i.e., there were no hate crimes
that week).

First, I am wondering about the amount of variability required to
reliably identify an ARIMA model for the series.  In trying to model the
series, I was amazed (and a bit frightened) by how easily all the
variability was captured by either one autoregressive or one moving
average term.  In playing with fake data, it became clear that it is
possible to produce autocorrelations for any series that is not constant,
even one with almost no variation.  I'm wondering if there are any
assumptions (or rules of thumb) about how much variation is necessary to
use ARIMA.
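The observation is easy to reproduce mechanically (invented incident counts below, arranged like the 156-week series described above): the sample ACF is computable for any non-constant series, however sparse, and an ARIMA fit will happily absorb whatever it finds.

```python
# Sample autocorrelation of a mostly-zero weekly count series
# (invented data, loosely shaped like the hate-crime series).
def acf(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    m = sum(series) / n
    c0 = sum((x - m) ** 2 for x in series)
    ck = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return ck / c0

weeks = [0] * 156                      # 156 weeks, almost all zero
for t, count in [(30, 2), (31, 3), (90, 1), (120, 75), (121, 4)]:
    weeks[t] = count

print([round(acf(weeks, k), 3) for k in (1, 2, 3)])
```

Whether coefficients like these mean anything for a series that is 120-odd zeros is exactly the question being asked.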

Second, the weeks with hate crimes are not distributed evenly over time,
so the mean and standard deviation of the series are not stable.
However, the series does not appear to need to be differenced, maybe
because although both the mean and the standard deviation increase
considerably over time, most data points are still 0.  How much of a
problem is this lack of stability of the mean and standard deviation?
Is there a diagnostic I should be using to test homogeneity?

It might be easiest for people to reply directly to me at
[EMAIL PROTECTED] and I will post a summary of helpful rules of
thumb/diagnostics.  Thank you in advance for any advice/feedback!

Barbara Lehman
Claremont Graduate University







Re: Rates and proportions

2000-06-20 Thread Donald Burrill

On Tue, 20 Jun 2000 [EMAIL PROTECTED] wrote:

> Hello, I "inherited" the reporting system for our escapes and have some
> questions about how data has been reported in the past.
> 
> First, I have a question about the formula used to calculate escape 
> rates which is (escapes)/(average daily population - escapes).  Then 
> this is reported as a rate per 100 inmates.  Isn't this actually a ratio 
> of escapees to non-escapees? 

Right.  AKA the odds (before multiplying by 100).
 One might take the reciprocal (e.g., 1/0.0008 = 1,250, from your .08 per 
100 below) as representing the odds AGAINST escaping (1250:1), rather 
than the odds IN FAVOR OF escaping (0.08 chance in 100, or 1:1250).

> Maybe I'm just picking at semantics, let me know.  I thought that the 
> formula for rates was (a/(a+b)) * k where the numerator is included in 
> the denominator. 
Right again.
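The two formulas can be put side by side (a sketch with invented helper names).  For rare events the inherited odds-style figure and the true rate barely differ; the distinction only bites when the event is common:

```python
# escapes/(population - escapes) vs escapes/population, both per 100.
def rate_per_100(escapes, population):
    return 100 * escapes / population              # a/(a+b) * k

def odds_per_100(escapes, population):
    return 100 * escapes / (population - escapes)  # a/b * k, the inherited formula

print(round(rate_per_100(1, 1250), 4), round(odds_per_100(1, 1250), 4))      # 0.08 0.0801
print(round(rate_per_100(500, 1250), 1), round(odds_per_100(500, 1250), 1))  # 40.0 66.7
```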

> Then I also have a rule of thumb question.  At what point is a rate
> considered unreliable or a useless piece of information?  My example 
> again and remember that it uses the "formula" I first presented above. 
> The previous reports show rates of .44 per 100 or .08 per 100, etc.  
> Of course I find this comical because I imagine that .44 means an 
> escapee with only a torso, legs and head and .08 as an escapee with 
> only the torso! 
Mmm.  Only the left shin, I would have thought...
But this is no more comical than expressions like 0.44% (do you remember 
the old Ivory Soap ads, claiming that Ivory was 99.44% pure?  Only they 
wrote it as a fraction, 44/100.)
"Unreliable" or "useless"?  Well, the basic graininess in a rate 
is one escapee more (or less) than was reported.  A rate of .08 per 100 
is about 1 out of 1250.  If the data on which the rate was based were 1 
escapee out of 1250 inmates, one cannot _reliably_ tell the rate from 
zero.  If the data were 13 escapees out of 16,200 inmates, one would have 
more faith in the rate, at least insofar as representing a small value 
different from (not equal to!) zero.  Unfortunately, the rate itself 
does not tell one how grainy the data were.

> But, many folks around here take those numbers to indicate that the 
> escape "rate" has decreased substantially! 

Well, in fairness, .08 is only 20% -- that is, 1/5 -- of 0.44.  Dividing 
one's number of escapees by 5 might well reflect substantial success, in 
some terms.  But part of the point, as Dennis has mentioned, is whether 
the comparison is between the same institution at two different times 
(then one could suppose the "average daily population" to be, if not 
essentially constant, at least comparable), or between two different 
institutions with very different sizes of population.

> I have seen CDC tables with a caveat regarding
> small rates and will pull those as evidence for my argument.
> 
> So here's a real life problem for my colleagues out there.  I am going 
> through all the statistics books in my office and have started to 
> search for references to present my case.  I'm not kidding because I 
> was told that this is the way it has always been calculated so don't 
> mess with tradition. 
Sounds depressingly realistic.

> If anyone has any references, suggestions, openings for positions, 
> cites [ Sites?  -- dfb ] to search, etc. I would really appreciate it.  
> Many thanks in advance, Fran


 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 603-535-2597
 184 Nashua Road, Bedford, NH 03110  603-471-7128  






Re: Comments about my syllabus

2000-06-20 Thread Warren

It does seem ambitious for any survey course.
And why not teach something useful...do you prohibit them from having a
calculator?  What's the difference between a calculator and a computer?  I remember
learning how to take square roots, approximately...do you require them to do
that?  An awful waste of a semester if they never learn how to work with real
data sets.

Repeating is good.  I repeat...repeating is good.  :)

One teaches as one was taught.  Hard to break that pattern.  For example, how
many of us spend class time teaching how to get "p-values" from the tables.
Why?  Textbooks still have the variance formulae in two forms...one for
computation.  Why?

Warren May
(I'm still learning to teach, too)

Paul R Swank wrote:

> I found your syllabus to be very ambitious for undergraduates. Is this
> their first stat course?
>
> At 07:34 AM 6/18/00 -0400, SM wrote:
> >Howdy,
> >I am not a subscriber of this listserv, but was invited to post by E.
> >Jacquelin Dietz, editor of THE JOURNAL OF STATISTICS EDUCATION.
> >
> >I am a social worker (MSW with a Ph.D. in Sociology) and I teach two
> >sections of statistics (to social work and criminal justice majors) at a
> >small college in rural North Carolina.  I've completed seven statistics
> >courses on the Ph.D. level.  However, my Ph.D. experience with statistics
> >courses may not have prepared me to teach this course to social work
> >majors.
> >
> >I have shared my syllabus with my social work colleagues, but they have
> >less of a background in teaching statistics than I do! I am interested
> >in sharing my syllabus with others who teach statistics and get
> >feedback.
> >
> >Two issues that may not be clear on the syllabus:
> >
> >1) I prohibit students from using a computer until they have solved the
> >equation by hand first.  I have discovered that students do much better
> >on exams when they have done the math.  For example, I can ask non math
> >questions on an exam, and students do better.  They seem to have a
> >deeper understanding.  Have you experienced this?
> >
> >2) Students seem to understand basic statistical concepts when I repeat
> >the explanation 3 to 5 times in different ways. I use links on my
> >syllabus, lecture, films (AGAINST ALL ODDS), the text, and supplemental
> >readings.
> >
> >My syllabus can be found at
> >http://www.uncp.edu/home/marson/360_summer.html .  I would appreciate
> >your guidance, but try not to hurt my feelings!
> >
> >Cordially,
> >
> >Steve
> >
> >Stephen M. Marson, Ph.D., ACSW
> >Professor/Director, Social Work Program
> >UNC-P
> >
> >
> >
> >
> >
> >
> 
> Paul R. Swank, PhD.
> Professor & Advanced Quantitative Methodologist
> UT-Houston School of Nursing
> Center for Nursing Research
> Phone (713)500-2031
> Fax (713) 500-2033
>






Re: how to calculate the variance of x/y

2000-06-20 Thread bill knight

Robert Dawson wrote:
> 
> dz asked:
> > Hi, anybody knows how to caculate the variance of x/y? where x and y are
> two
> > independent variables with normal dis n(a1,b1) and
> > n(a2,b2) respectively.
> 
> Yes. And no.
> 
> Sitting down?
> 
> Trick question! It is always undefined, as is the mean!  Moreover, this
> holds for any continuous distributions such that the density of the
> denominator at y=0 is nonzero.
> 
> Imagine just the region y = (- epsilon, epsilon). Let A be the minimum
> of the density of Y  over that interval; for small enough epsilon this is
> >0.
> 
> Now, the mean, E(X/Y), is defined to be \integral \integral x/y f_x f_y
> dx dy evaluated over the whole (x,y) plane. But if you try to evaluate this
> over regions
> 
> (0,infinity) x (epsilon/2, epsilon)
> 
> you get   \integral x f_x dx   \integral from epsilon/2 to epsilon  1/y
> f(y) dy.
> 
> The first factor is always positive (or 0 iff X is always negative); the
> second
> 
> is bounded below by A (ln(epsilon) -ln(epsilon/2)) = A ln 2.
> 
> So each strip from epsilon to epsilon/2, epsilon/2 to epsilon/4, etc,
> makes a contribution greater than A ln 2 \integral x f_x dx; thus the
> integral diverges.
> If X is always negative, the same argument holds, but integrating over
> (-infinity,0).
> 
> Now, this seems daft. Intuitively, if the heights of husbands and wives
> were independent (they aren't, but let that pass - independence is not
> really the issue here) and normally distributed, you would surely expect
> the mean of the ratio to exist? Yes, but. If they were truly normally
> distributed, even in the far tails, you would expect one person in a few
> quadrillion or something to have negative height, or a height of a few
> microns. It is those hypothetical microscopic people who would provide a
> scattering of astronomically large ratios.
> 
> So, what do you do? Well, there are limiting techniques such as Cauchy
> principal values, but that is a highly abstract and formal technique that
> you had better understand _very_ well before trying it on a real world
> problem.  Alternatively, you will find that if the mean is several standard
> deviations away from 0, you can trim anywhere from 1% to (say) 0.1%
> of the extreme values from your distributions and the trimmed means you get
> will not vary within that range. That "plateau" before the weirdness starts
> is a good practical value to use instead. For any reasonable-sized sample,
> the sampling distribution of the mean will gather around that value.
> 
> If the mean of Y is close to 0, you cannot do that - you have something
> like a Cauchy distribution that is genuinely heavy-tailed even for practical
> purposes. In a case like that, the sampling distribution of the mean of a
> small sample will not converge to anything, and you had better use the
> median instead. The question is: is Y~0 a definite possibility for the sort
> of sample sizes you envisage?
> 
> -Robert Dawson
---
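The "plateau" suggestion in the quoted message can be simulated directly (invented height-like distributions; the denominator here sits safely far from zero, so the trimmed means settle onto a flat plateau):

```python
# Trimmed means of ratios of two comfortably-positive normals
# (numbers invented; the denominator mean is ~24 SDs above zero).
import random
random.seed(1)

ratios = sorted(random.gauss(170, 8) / random.gauss(165, 7) for _ in range(100_000))

def trimmed_mean(xs, frac):
    """Mean after dropping a fraction `frac` of points from each tail."""
    k = int(len(xs) * frac)
    cut = xs[k:len(xs) - k] if k else xs
    return sum(cut) / len(cut)

for frac in (0.01, 0.005, 0.001):
    print(frac, round(trimmed_mean(ratios, frac), 4))
```

If the denominator's mean were near zero, these trimmed means would NOT agree across trimming fractions -- that disagreement is the warning sign the quoted message describes.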

If I may add a few comments ---  
 william knight
 [EMAIL PROTECTED]

(1)  Since the ratio, X/Y, is wanted, I doubt that X and Y
 are normally distributed.  One seldom takes ratios 
 when X and Y may occasionally be negative (See Robert 
 Dawson's heights example.)  

(2)  Is that ratio really a good idea?  Ratios can be nasty:
 Even if Y cannot actually be zero, a small Y inflates the
 ratio badly.  Consider this data:
 Case    1     2      3      4      5
   X     1     5     1/4     5     1/5
   Y     1     4     1/5    1/5     5
  X/Y    1    1.25   1.25    25    0.04
 I question the meaning of the average of these ratios
 (about 6), let alone the meaning of the variance.

(3)  I am puzzled why the original question didn't ask about
 the expected ratio as well as the variance.   
 
(4)  Possible approaches

 (4.1) Transform the data to logarithms.  Means and variances
 follow from usual linear formulae.  I deduce there are no
 zeroes since the original question concerned ratios which
 would fail with division by zero.

 (4.2) Look at X/(X+Y)
 This avoids small denominator problems.  The variance 
 (and average which, surprisingly was not asked about) can
 be approximated by methods found under "ratio estimates"
 in any textbook on the statistical methods of survey 
 sampling.
   
(5)  It would help to know what X and Y are:  Heights?  
 Number of Clintonia borealis plants in a quadrat?
 X=length of fibula, Y=length of tibia?
 Strength of treated vs untreated concrete specimens?
 (None of these can be negative, and hence none of 
 these can be normal.)  Indeed, it is dangerous to 
 answer the question without knowing this.
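Working point (2)'s table out exactly, and applying the X/(X+Y) re-expression from point (4.2) to the same five cases (a sketch using exact fractions):

```python
# bill knight's five cases, computed with exact fractions: one small
# denominator (case 4) supplies almost the whole average ratio, while
# X/(X+Y) stays bounded in (0, 1) and cannot be dominated that way.
from fractions import Fraction as F

X = [F(1), F(5), F(1, 4), F(5), F(1, 5)]
Y = [F(1), F(4), F(1, 5), F(1, 5), F(5)]

ratios = [x / y for x, y in zip(X, Y)]      # 1, 1.25, 1.25, 25, 0.04
mean = sum(ratios) / len(ratios)
print(float(mean))                          # 5.708 -- case 4 alone contributes 5 of it

props = [x / (x + y) for x, y in zip(X, Y)]
print([round(float(p), 3) for p in props])  # [0.5, 0.556, 0.556, 0.962, 0.038]
```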



Re: differences between groups/treatments ?

2000-06-20 Thread Donald Burrill

On Tue, 20 Jun 2000, Rich Ulrich wrote:

> On 19 Jun 2000 18:01:28 -0700, [EMAIL PROTECTED] (Dónal Murtagh) wrote:
> 
>  < ... > 
> > Firstly, thank you for your comments. Am I right in saying that the two
> > (equivalent) options I have are:
> 
> These are not quite equivalent options since the first one really
> stinks --

Sorry, Rich, I must take issue with you.  If the first option really 
stinks, so does the second:  they are, in fact, equivalent, as Donal 
describes the second (with dichotomies for X1 and X2).

> If you are considering drawing conclusions about causation,

This is a fair enough warning, I suppose;  but I don't recall reading 
anything in the original post that implied that it was desired to show 
causation.  (Can't think of anything that expressly denied it either; 
but I still think you're reading it into, rather than out of, the 
problem.) 

> you need *random assignment* and the two Groups of performance are the
> furthest thing from random.

For that matter, had it been specified that the treatments were assigned 
at random?  In any case, I'd be interested in knowing how you would 
propose that performance might be assigned at random.  ;-)

> Let's see:  the simple notion of regression-to-the-mean  says that the 
> Best performers should fall back, the Worst performers should improve;
> that's a weird main-effect, which should wreak havoc with interpreting 
> other effects. 
> Or:  If the Pre is powerful enough to measure potential, then a
> continued-growth model says that Best performers should improve more,
> even given no treatments.  

Ummm...  I think you have to postulate that the POST is powerful enough, 
unless you're assuming that the Pre and Post measures are identical 
(which they may be, of course; though that introduces other measurement 
issues).

> For simple change-scores (and ANOVA interactions) from dichotomous
> groups, you assume that neither of those possibilities are true, if
> you want to be able to interpret them.

Only if you want to be able to interpret them SIMPLY.

> The Regression model at least places the contrasts into the realm 
> of comparing the regression lines. 
Yes, provided one is modelling 
the pretest as a continuum rather than as a coded dichotomy, as Donal 
described it.

> Your fundamental knowledge of what is happening will probably come 
> from examining and comparing the scatterplots, pre-post, for the two 
> treatments.  (Another thing to note from the picture:  Are there 
> ceiling/basement effects on the performance test?)

Good advice.  I concur.

>  - Treating it as a continuum is better by a lot, even if you were
> sure that the Performance scale
> was close to the ANOVA-analytic ideal, a normal distribution.

Did you mean the ERRORS (or residuals) in the Performance scale, perhaps?
-- Don.
 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 603-535-2597
 184 Nashua Road, Bedford, NH 03110  603-471-7128  






Re: differences between groups/treatments ?

2000-06-20 Thread Donald Burrill

On Tue, 20 Jun 2000, Murtagh wrote:

> Firstly, thank you for your comments. Am I right in saying that the two
> (equivalent) options I have are:
> 
> 1.ANOVA
> 
> Yijk = mu + Ai + Bj + ABij + Eijk
> 
> Ai:   a fixed factor representing the treatments (2 levels)
> Bj:   a fixed factor representing prior performance (2 levels)
> ABij: an interaction between Ai and Bj
> Yijk: the score of the kth child who received treatment i and is from 
>   group j 
> Eijk: random error
> 
> I suspect that this model is inappropriate, as the Eijk term will represent
> between subjects (children) variation, which is not usually included in the
> estimate of random error.

I do not understand this comment.  What source(s) of random error exist 
in this design APART from variation between subjects within cells?  
Between-subjects variation (as residuals from the model) defines the 
standard error-variance term against which the variability in the 
systematic effects is tested.
 
> 2.MLR
> 
> Y = Bo + B1*X1 + B2*X2 + B3*X3 + E
> 
> X1:   prior performance (0 => weak, 1 => strong)
> X2:   treatment (0 => treatment A, 1 => treatment B)
> X3:   treatment*prior performance
-- hence with the coding shown for X1 and X2,  1 => 
strong prior performance and treatment B, 0 => all other conditions.

And  E = Eijk of the ANOVA model.  B1 is a straightforward function 
(depending on the coding of X1, of course) of the Bj in the ANOVA model, 
B2 of the Ai (and depends on the coding of X2), and B3 of Ai, Bj, and 
ABij. 
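That correspondence can be checked numerically (a sketch with invented scores): with 0/1 dummies and the product term X3 = X1*X2, the regression coefficients are exact re-expressions of the four cell means of the 2x2 ANOVA.

```python
# (prior performance, treatment) -> post-test scores (invented data)
cells = {
    (0, 0): [52, 48, 50], (0, 1): [58, 62, 60],
    (1, 0): [70, 74, 72], (1, 1): [88, 92, 90],
}
m = {cell: sum(v) / len(v) for cell, v in cells.items()}

b0 = m[(0, 0)]                                        # reference cell
b1 = m[(1, 0)] - m[(0, 0)]                            # prior-performance effect
b2 = m[(0, 1)] - m[(0, 0)]                            # treatment effect
b3 = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]    # interaction

# the saturated regression reproduces every cell mean exactly
for (x1, x2), cell_mean in m.items():
    assert b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2 == cell_mean
print(b0, b1, b2, b3)   # 50.0 22.0 10.0 8.0
```

This is why the two "options" in the thread are equivalent: same cell means, same residuals, just different parameterizations.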
 
> I appreciate that prior performance is probably better considered as a
> continuum, rather than a dichotomy.

_I_ would consider it so.  In fact, the first thing I'd do is ask for 
scatterplots of post-performance vs. pre-performance for all the cells 
in the design I was considering.  (In what you've described, that's two 
cells.)  THEN decide whether it appeared to make better sense to divide 
the continuum into two (or more) pieces, or to model it AS a continuum, 
possibly with non-linear functions.

> >> 1.  If there are children of different sexes, you may be able to 
> >> consider a three-way design, although I suspect it would be 
> >> unbalanced, which (I also suspect!) may induce serious difficulties 
> >> for you.

> You mean that there would not be the same numbers in each group? 

Yes.

> I can't see why this should cause problems, but then that's probably 
> due to my relative ignorance of linear models!

Doesn't cause problems in one-way designs.  But in 2-way designs (let 
alone 3-way, 4-way, ...) unequal  n's  induce association of some kind 
between the design factors.  People who do multiple regression don't have 
much problem with this, it's their normal situation;  but people who try 
to do formal ANOVA design-of-experiments (and are therefore accustomed to 
the notion that the factors are mutually independent (and therefore are 
orthogonal)) are sometimes boggled by (1) the fact that the sums of 
squares for the several sources of variation do not simply add to the 
total sum of squares about the grand mean, or (2) the fact that the 
sums of squares reported depend on the order in which the factors are 
considered.  And many of the standard packages for doing multi-factor 
ANOVA use algorithms that require the design to be balanced. 
 (A GLM -- general linear model -- program does not usually have such 
constraints, and may even produce output patterned after the form of a 
standard balanced ANOVA, but one needs to be aware of (1) and (2) above.) 

> >> 2.  Your Performance information you have chosen to dichotomize,
> >> although it is presumably (quasi-)continuous to start with.  You 
> >> might find out something useful by treating it as a continuous 
> >> predictor rather than as a dichotomy:  in effect carrying out an 
> >> analysis of covariance with pre-treatment reading score as the 
> >> covariate, whether you used an "Analysis of Covariance" program or 
> >> a "Multiple Regression" program or a "General Linear Model" (GLM) 
> >> program to do the arithmetic. 
> 
> Presumably, this could achieved by simply using the pre-treatment score 
> itself (rather than 0 or 1) for the value of X1 in the suggested MLR 
> model above?
Right. 
 And if the pre-post relationship should turn out to be detectably 
nonlinear, you can substitute some candidate nonlinear function(s) of X1 
and see if that helps.

There may be nonlinearity to be EXPECTED:  in the nature of a reading 
test, there is a highest possible score (all items right, e.g.) and a 
lowest possible score (no items right, e.g.).  Students who perform well 
pre-treatment cannot have change scores that would put them above the 
highest possible score at post-treatment;  so it would not be surprising 
if (a) change correlates negatively with pre-treatment, (b) post scores 
were censored at the maximum (and negatively skewed), (c) pre scores were 
censored at the minimum

Re: Rates and proportions

2000-06-20 Thread dennis roberts


>Reword these as per 10,000? That way you have "whole people" while
>preserving the differences among the rates.

this might ease the problem but, does not eliminate it (though makes more
sense than a base of 100) ... for, what if the value comes out to be ...
.04423? i guess it depends on how many tend to be IN each prison ...
perhaps ... as that should be the yardstick to use ... if you are comparing
escapee rates across institutions ...

but, maybe the base should more be a function of the type of comparison
being done ... across institutions might (logically) call for a smaller
base ... across states might call for a larger base ... 

for example ... in small towns in a state ... to talk about the escapee
rate in local jails as out of 10,000  seems totally unrealistic ... it
might give you a nice "whole" number but, it seems rather meaningless as it
might take 40 years to accumulate that many prisoners

there will always be some "roundoff" ... of course, the bigger the base ...
10,000 versus 1,000 or 100 ... the less importance it will have

>
>Disclaimer: I am in NO WAY an expert.
>
>
>Jill Binker
>Fathom Dynamic Statistics Software
>KCP Technologies, an affiliate of Key College Publishing and
>Key Curriculum Press
>1150 65th St
>Emeryville, CA  94608
>1-800-995-MATH (6284)
>[EMAIL PROTECTED]
>http://www.keypress.com
>__
>
>

==
dennis roberts, penn state university
educational psychology, 8148632401
http://roberts.ed.psu.edu/users/droberts/droberts.htm





Re: Rates and proportions

2000-06-20 Thread Jill Binker

At 12:55 PM -0400 6/20/00, dennis roberts wrote:
>At 11:10 AM 6/20/00 -0500, [EMAIL PROTECTED] wrote:
>>Then I also have a rule of thumb question.  At what point is a rate
>>considered unreliable or a useless piece of information?  My example again
>>and remember that it uses the "formula" I first presented above.  The
>>previous reports show rates of .44 per 100 or .08 per 100, etc.  Of course I
>>find this comical because I imagine that .44 means an escapee with only a
>>torso, legs and head and .08 as an escapee with only the torso!  But, many
>>folks around here take those numbers to indicate that the escape "rate" has
>>decreased substantially!  I have seen CDC tables with a caveat regarding
>>small rates and will pull those as evidence for my argument.
>
>
>well, just like the mean on a 50 item test might be 29.84 ... which no
>person could actually obtain AS a score ... you have to take summary values
>like these with a grain of salt ... for reporting purposes ... it would
>seem to me to make more sense to say ... 30 items ...
>
>for escapee rates ... in either the case of .44 per 100 or .08 per 100 ...
>you don't want to round to 0 ... and report that since ... it suggest NO
>escapees ... but, saying about 1 per 100 seems not correct either ...
>though, i would prefer saying "about 1" to saying .44 or .08 ... 1 gives a
>more UNDERstandable idea of what is happening ...

Reword these as per 10,000? That way you have "whole people" while
preserving the differences among the rates.
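The rescaling suggestion, sketched numerically with the rates quoted in the thread: multiplying up to a larger base keeps the rates distinguishable while yielding whole "people".

```python
# Rescale rates quoted per 100 to per 10,000.  The ordering and relative
# sizes are preserved, but each value becomes a whole number of escapees.
rates_per_100 = [0.44, 0.08, 0.99]
rates_per_10000 = [round(r * 100) for r in rates_per_100]  # per 100 -> per 10,000
print(rates_per_10000)  # [44, 8, 99]
```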

Disclaimer: I am in NO WAY an expert.


Jill Binker
Fathom Dynamic Statistics Software
KCP Technologies, an affiliate of Key College Publishing and
Key Curriculum Press
1150 65th St
Emeryville, CA  94608
1-800-995-MATH (6284)
[EMAIL PROTECTED]
http://www.keypress.com
__






Re: Rates and proportions

2000-06-20 Thread dennis roberts

At 11:10 AM 6/20/00 -0500, [EMAIL PROTECTED] wrote:
>Then I also have a rule of thumb question.  At what point is a rate
>considered unreliable or a useless piece of information?  My example again
>and remember that it uses the "formula" I first presented above.  The
>previous reports show rates of .44 per 100 or .08 per 100, etc.  Of course I
>find this comical because I imagine that .44 means an escapee with only a
>torso, legs and head and .08 as an escapee with only the torso!  But, many
>folks around here take those numbers to indicate that the escape "rate" has
>decreased substantially!  I have seen CDC tables with a caveat regarding
>small rates and will pull those as evidence for my argument.


well, just like the mean on a 50 item test might be 29.84 ... which no 
person could actually obtain AS a score ... you have to take summary values 
like these with a grain of salt ... for reporting purposes it would 
seem to me to make more sense to say ... 30 items ...

for escapee rates ... in either the case of .44 per 100 or .08 per 100 ... 
you don't want to round to 0 and report that, since it suggests NO 
escapees ... but saying about 1 per 100 seems not correct either ... 
though i would prefer saying "about 1" to saying .44 or .08 ... 1 gives a 
more UNDERstandable idea of what is happening ...

of course, what if you are comparing rates across different prisons ... 
where one is .44 ... another is .08 ... and another is .99 ... to call them 
all about 1 seems not quite fair either

the best rule of thumb, such as it is: take them all with a grain 
of salt

using statistics ... and understanding what each might provide (ie, what IS 
the mean anyway) ... are not the same thing ...

Dennis Roberts, EdPsy, Penn State University
208 Cedar Bldg., University Park PA 16802
Email: [EMAIL PROTECTED], AC 814-863-2401, FAX 814-863-1002
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
FRAMES: http://roberts.ed.psu.edu/users/droberts/drframe.htm






Rates and proportions

2000-06-20 Thread fran . ferrari

Hello, I "inherited" the reporting system for our escapes and have some
questions about how data has been reported in the past.

First, I have a question about the formula used to calculate escape rates,
which is (escapes)/(average daily population - escapes).  This is then
reported as a rate per 100 inmates.  Isn't this actually a ratio of escapees
to non-escapees?  Maybe I'm just picking at semantics; let me know.  I
thought that the formula for a rate was (a/(a+b)) * k, where the numerator is
included in the denominator.
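The distinction can be sketched in code: with escapes subtracted from the denominator, the inherited formula gives a ratio of escapees to non-escapees (odds-like), not a proportion-based rate. For counts this small the two are nearly identical, which may be why the difference has gone unnoticed (a sketch; the counts below are hypothetical):

```python
def reported_ratio(escapes, adp, k=100):
    """The inherited formula: escapees per k *non*-escapees (a ratio/odds)."""
    return escapes / (adp - escapes) * k

def rate(escapes, adp, k=100):
    """Textbook rate a/(a+b) * k: numerator included in the denominator."""
    return escapes / adp * k

# Hypothetical counts: 13 escapes, average daily population 16,200.
print(reported_ratio(13, 16200))  # ~0.0803 per 100
print(rate(13, 16200))            # ~0.0802 per 100
```

The ratio is always slightly larger than the rate, and the gap grows as escapes become a bigger share of the population.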

Then I also have a rule-of-thumb question.  At what point is a rate
considered unreliable or a useless piece of information?  Take my example
again, and remember that it uses the "formula" I first presented above.  The
previous reports show rates of .44 per 100 or .08 per 100, etc.  Of course I
find this comical, because I imagine that .44 means an escapee with only a
torso, legs and head, and .08 an escapee with only the torso!  But many
folks around here take those numbers to indicate that the escape "rate" has
decreased substantially!  I have seen CDC tables with a caveat regarding
small rates and will pull those as evidence for my argument.
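One way to answer "when is a small rate unreliable" is to attach a confidence interval to it. The normal approximation is poor at counts this small, but a Wilson score interval behaves sensibly (a stdlib-only sketch; the one-escape-in-1,250 figure echoes numbers discussed in this thread): even a single escape yields a rate distinguishable from zero, though with very wide relative uncertainty.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score 95% CI for a binomial proportion; unlike the normal
    approximation, it stays sensible for very small counts."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = wilson_interval(1, 1250)  # one escape out of 1,250 inmates
print(f"rate per 100: {100 * 1 / 1250:.3f}, 95% CI ({100 * lo:.3f}, {100 * hi:.3f})")
```

Here the interval runs from roughly 0.014 to 0.45 per 100: the rate is not zero, but it is known only to within a factor of about thirty, which is a concrete way to caveat it in a report.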

So here's a real-life problem for my colleagues out there.  I am going
through all the statistics books in my office and have started to search for
references to present my case.  I'm not kidding: I was told that this
is the way it has always been calculated, so don't mess with tradition.  If
anyone has any references, suggestions, openings for positions, cites to
search, etc., I would really appreciate it.  Many thanks in advance, Fran



Fran Ferrari, Researcher
Data Analysis & Statistics
Oklahoma Dept. of Corrections
50 NW 23rd Street
Oklahoma City, OK  73105
PH:  405-522-4964
FX:  405-522-4961







How Long do you have to Live (fwd)

2000-06-20 Thread Bob Hayden


- Forwarded message from Zina Taran -

Just in case you've got too much time:

 http://life-expectancy.longtolive.com/Life-Expectancy.asp

What caught my eye was the concluding statement, "Statistically you should
die on  ".

- End of forwarded message from Zina Taran -

I went there but it just crashed my browser.  I tried with and without
the final file name.  
 

  _
 | |Robert W. Hayden
 | |  Work: Department of Mathematics
/  |Plymouth State College MSC#29
   |   |Plymouth, New Hampshire 03264  USA
   | * |fax (603) 535-2943
  /|  Home: 82 River Street (use this in the summer)
 | )Ashland, NH 03217
 L_/(603) 968-9914 (use this year-round)
Map of New[EMAIL PROTECTED] (works year-round)
Hampshire http://mathpc04.plymouth.edu (works year-round)





Re: differences between groups/treatments ?

2000-06-20 Thread Rich Ulrich



On 19 Jun 2000 18:01:28 -0700, [EMAIL PROTECTED] (Dónal Murtagh) wrote:

 < ... > 
> Firstly, thank you for your comments. Am I right in saying that the two
> (equivalent) options I have are:

These are not quite equivalent options, since the first one really
stinks.  If you are considering drawing conclusions about causation,
you need *random assignment*, and the two Groups of performance are the
furthest thing from random.

Let's see:  the simple notion of regression-to-the-mean  says that the
Best performers should fall back, the Worst performers should improve;
that's a weird main-effect, which should wreak havoc with interpreting
other effects.
Or:  If the Pre is powerful enough to measure potential, then a
continued-growth model says that Best performers should improve more,
even given no treatments.  
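The regression-to-the-mean point is easy to demonstrate by simulation: with no treatment anywhere, groups formed from extreme pre scores show "change" in opposite directions, purely because selection was on a noisy measure (a sketch with made-up score distributions):

```python
import random

random.seed(1)

# No treatment at all: pre and post are the same true ability plus
# independent measurement noise of equal size.
n = 2000
true = [random.gauss(100, 10) for _ in range(n)]
pre = [t + random.gauss(0, 10) for t in true]
post = [t + random.gauss(0, 10) for t in true]

# Form "weak" and "strong" groups from the extreme quartiles of pre.
ranked = sorted(range(n), key=lambda i: pre[i])
low, high = ranked[: n // 4], ranked[-n // 4:]

def mean_change(idx):
    return sum(post[i] - pre[i] for i in idx) / len(idx)

# The weak group "improves" and the strong group "declines" -- an artifact
# of selecting on a noisy pre measure, not of any intervention.
print(f"bottom quartile change: {mean_change(low):+.2f}")
print(f"top quartile change:    {mean_change(high):+.2f}")
```

This is exactly the spurious main effect described above: any real treatment contrast would be confounded with it in the dichotomized design.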

For simple change-scores (and ANOVA interactions) from dichotomous
groups, you assume that neither of those possibilities is true, if
you want to be able to interpret them.

The Regression model at least places the contrasts into the realm 
of comparing the regression lines.  Your fundamental knowledge 
of what is happening will probably come from examining and comparing
the scatterplots, pre-post, for the two treatments.  (Another thing to
note from the picture:  Are there ceiling/basement effects on the
performance test?)

> 1.ANOVA
> 
> Yijk = mu + Ai + Bj + ABij + Eijk
> 
> Ai:   a fixed factor representing the treatments (2 levels)
> Bj:   a fixed factor representing prior performance (2 levels)
> ABij: an interaction between Ai and Bj
> Yijk: the score of the kth child who received treatment i and is from group j
> Eijk: random error
> 
> I suspect that this model is inappropriate, as the Eijk term will represent
> between subjects (children) variation, which is not usually included in the
> estimate of random error.
> 
> 2.MLR
> 
> Y = B0 + B1*X1 + B2*X2 + B3*X3 + E
> 
> X1:   prior performance (0 => weak, 1 => strong)
> X2:   treatment (0 => treatment A, 1 => treatment B)
> X3:   treatment*prior performance
> 
> I appreciate that prior performance is probably better considered as a
> continuum, rather than a dichotomy.
> 

 - Treating it as a continuum is better by a lot, even if you were
sure that the Performance scale was close to the ANOVA-analytic ideal,
a normal distribution.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





How Long do you have to Live

2000-06-20 Thread Zina Taran

Just in case you've got too much time:

 http://life-expectancy.longtolive.com/Life-Expectancy.asp

What caught my eye was the concluding statement, "Statistically you should
die on  ".

