Tolerance intervals (calculation of tolerance factors)

2000-03-31 Thread lg106

I am looking for an algorithm to compute the tolerance factors used in
constructing normal tolerance limits. I have the article 'Tables of
Tolerance-Limit Factors for Normal Distributions' by Alfred Weissberg and
Glenn H. Beatty (1960). This contains tables of r(N,P) and u(f,y);
multiplying these together gives the tolerance factor K. I believe the
non-central t-distribution and the chi-square distribution are used in
calculating these. Does anybody have an algorithm (in any language) to do
the calculations, or the formulas themselves?
Many thanks
Dr L Green.


Sent via Deja.com http://www.deja.com/
Before you buy.
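
A minimal sketch of one way to compute these factors with SciPy's
noncentral-t and chi-square quantile functions (the two-sided case uses
Howe's approximation rather than the exact Weissberg-Beatty tabulation, and
the values in the comments are approximate):

from math import sqrt
from scipy.stats import norm, nct, chi2

def k_one_sided(n, coverage, confidence):
    """Exact one-sided normal tolerance factor via the noncentral t."""
    delta = norm.ppf(coverage) * sqrt(n)              # noncentrality parameter
    return nct.ppf(confidence, df=n - 1, nc=delta) / sqrt(n)

def k_two_sided(n, coverage, confidence):
    """Two-sided factor via Howe's chi-square approximation, which mirrors
    the r(N,P) * u(f,y) factorisation of the tables."""
    f = n - 1
    z = norm.ppf((1 + coverage) / 2)
    return z * sqrt(f * (1 + 1 / n) / chi2.ppf(1 - confidence, f))

print(k_one_sided(10, 0.90, 0.95))    # roughly 2.35
print(k_two_sided(10, 0.90, 0.95))    # roughly 2.84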


===
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===



Re: Combining 2x2 tables

2000-03-31 Thread Bruce Weaver

On Thu, 30 Mar 2000, JohnPeters wrote:

> Hi,
> I was wondering if someone could help me.  I am interested in combining
> 2x2 tables from multiple studies.  The test used is McNemar's
> chi-square.  I have the raw data from each of these studies.  What is the
> proper correction that should be used when combining the results?
> Thanks!!!


Meta-analysis is a common way to combine information from 2x2 tables, but
I'm not sure how you would do this with McNemar's chi-square as your
measure of "effect size" for each table.  It might be possible if you
are willing to use something else. 

It's Friday afternoon, and this is off the top of my head, but here goes 
anyway.  I wonder if you could write the tables this way:

                   Change
                Yes      No
   Before  -     a        b
           +     c        d


Cell a:  change from - to +
Cell b:  no change, - before and after
Cell c:  change from + to -
Cell d:  no change, + before and after

Suppose we're talking about change in opinion after hearing a political
speech.  The odds ratio for this table would give you the odds of changing
from a negative to a positive opinion over the odds of changing from
positive to negative.  If you're the speaker, you're hoping for an odds
ratio greater than 1 (i.e., greater change in those who were negative
before the speech).  If the amount of change is similar in both groups,
the odds ratio will be about 1.

If this is a legitimate way to analyze the data for one such table, and I 
can't see why not, then you could pool the tables meta-analytically with 
ln(OR) as your measure of effect size.  Here's a paper that describes how 
to go about it:

Fleiss, JL. (1993). The statistical basis of meta-analysis. Statistical 
Methods in Medical Research, 2, 121-145.
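
A minimal sketch of that pooling step (fixed-effect, inverse-variance
weighting of ln(OR), along the lines Fleiss describes; the tables and
counts below are hypothetical):

import numpy as np

# Hypothetical before/after tables, one per study, laid out as
# [[a, b], [c, d]] in the notation above (a = - to +, c = + to -, etc.).
tables = [np.array([[30, 70], [12, 88]]),
          np.array([[22, 45], [10, 60]]),
          np.array([[15, 40], [14, 42]])]

log_or, weights = [], []
for t in tables:
    a, b = t[0]
    c, d = t[1]
    lor = np.log((a * d) / (b * c))           # ln(odds ratio) for this study
    var = 1 / a + 1 / b + 1 / c + 1 / d       # Woolf's variance of ln(OR)
    log_or.append(lor)
    weights.append(1 / var)

log_or, weights = np.array(log_or), np.array(weights)
pooled = np.sum(weights * log_or) / np.sum(weights)   # pooled ln(OR)
se = 1 / np.sqrt(np.sum(weights))
ci = np.exp([pooled - 1.96 * se, pooled + 1.96 * se])
print(np.exp(pooled), ci)                     # pooled OR and its 95% CI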

There are also free programs available for performing this kind of 
meta-analysis.  I have links to some in the statistics section of my 
homepage.

Hope this helps. Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/






Re: testing a coin flipper

2000-03-31 Thread A. G. McDowell

Here is a somewhat DIY approach. Comments?

In article <[EMAIL PROTECTED]>, Bob Parks
<[EMAIL PROTECTED]> writes
>Consider the following problem (which has a real world
>problem behind it)
>
>You have 100 coins, each of which has a different
>probability of heads (assume that you know that
>probability, or at worst can estimate it).
>
>Each coin is labeled.  You ask one person (or machine
>if you will) to flip each coin a different number of times,
>and you record the number of heads.
>
>Assume that the (known/estimated) probability of heads
>is between .01 and .20, and the number of flips for
>each coin is between 4 and 40.
So there are only about 41 possible different results (# of heads seen)
for each individual coin, and it is possible to calculate the
probability of each of those results under the null
hypothesis: the observed count for coin i is Binomial(n_i, p_know_i),
where n_i is its number of flips.
>
>The question is how to test that the person/machine
>doing the flipping is flipping 'randomly/fairly'.  That is,
>the person/machine might not flip 'randomly/fairly/...'
>and you want to test that hypothesis.
>
>One can easily state the null hypothesis as
>
>  p_hat_i = p_know_i  for i=1 to 100
>
>where p_hat_i is the observed # heads / # flips for each i.
>
>Since each coin has a different probability of heads,
>you can not directly aggregate.
>
But here I assume that, for each coin, you can attach some sort of
'score' to each of its 41 possible results. This might be (observed -
expected)^2/expected, or -log(prob observed | null hypothesis), or
something that reflects your desired alternative hypothesis more
closely: e.g. if you are looking for a consistent bias to heads you
might include the sign of the deviation in the score, or if you are
looking for a trend effect you might set scores for a coin according to
its position in your list of 100 coins.

I also assume that the final statistic is produced by summing the
individual scores. The remaining question is how to estimate the
significance of the result.

Chances are, your scores are small floating point numbers. Shift, scale
and round them to convert them all to integers of reasonable size - say
in the range 0,1,2,... 1000. The total score is then in the range
0..40,000 or so. It isn't quite as powerful a statistic as the original
one, but it is susceptible to exact calculation. The distribution of an
integer valued score can be represented by an array of floating point
numbers: the probabilities that the score is equal to 0, 1, 2, ...,
40,000. What is more, the distribution of an independent sum of two such
scores is computed by simply convolving the two distributions. Even
without the FFT, convolving arrays of 1000 and 40,000 floats looks
doable on a modern machine. In fact, it's easier than that because only
41 of the 1000 floats in the smaller of the two arrays to be convolved
at each stage are non-zero. Repeat this process 100 times and you've got
the exact distribution of your final (integer-valued) score.
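
Here is a minimal sketch of that convolution idea in Python/NumPy (the
probabilities, flip counts, and the squared-deviation score are
illustrative choices, not part of the original description):

import numpy as np
from scipy.stats import binom

# Hypothetical setup: known P(heads) and flip counts for the 100 coins.
rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.20, size=100)        # known heads probabilities
n = rng.integers(4, 41, size=100)            # flips per coin (4..40)

def integer_scores(p_i, n_i, scale=1000):
    """Integer score for each possible head count k = 0..n_i; here a scaled,
    rounded (observed - expected)^2 / expected, mapped to 0..scale."""
    k = np.arange(n_i + 1)
    e = n_i * p_i
    raw = (k - e) ** 2 / e
    return k, np.rint(raw / raw.max() * scale).astype(int)

# Exact null distribution of the total score, built by repeated convolution.
total_pmf = np.array([1.0])                  # before any coin: P(total = 0) = 1
for p_i, n_i in zip(p, n):
    k, s = integer_scores(p_i, n_i)
    probs = binom.pmf(k, n_i, p_i)           # null probability of each head count
    coin_pmf = np.zeros(s.max() + 1)
    np.add.at(coin_pmf, s, probs)            # pmf of this coin's integer score
    total_pmf = np.convolve(total_pmf, coin_pmf)

def p_value(heads):
    """P(total score >= observed) for a list of 100 observed head counts."""
    obs = sum(int(integer_scores(p_i, n_i)[1][h])
              for h, p_i, n_i in zip(heads, p, n))
    return total_pmf[obs:].sum()
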
-- 
A. G. McDowell





6 NJ short courses & seminars

2000-03-31 Thread Barron, Alfred [PRI]


 Springtime for Statistics (April-May-June) 
Six New Jersey Area announcements

[1]  Logistic Regression Short Course
[2]  Clinical Trials Short Course
[3]  Multiple Comparison & Exact Inference Short Courses
[4]  Bates' Nonlinear Regression Short Course
[5]  ICSA Symposium 
[6]  NJ Chapter, ASA Spring Symposium 
[7]  announcement of conscience

===( Announcement #1: Short Course )===

The New Jersey and New York City Metro Chapters Present: 

An American Statistical Association Short Course,
   An Introduction to Logistic Regression

Stanley Lemeshow, Ph.D.

   FRIDAY April 7, 2000
9:00 A.M.-1:00 P.M.

 Course Outline: 

* The Logistic Regression Model (Chap 1 and 2)
* Estimating the Coefficients in the Logistic Model (Chap 1 and 2)
* Assessing Model Performance (Chap 5)

Text: Hosmer, D. W., & Lemeshow, S. (1989).  
Applied Logistic Regression. New York: Wiley.
Handout will be provided.  
Text is available from John Wiley Publishers

Dr. Lemeshow is Director of the Ohio State University Biostatistics 
Program and Professor of Biostatistics in the School of Public Health and 
the Department of Statistics.  He has 25 years of experience in research 
and teaching in biomedical applications and is internationally recognized 
for his contributions to the fields of logistic regression, sample survey 
methods, and survival analysis.  He is a Fellow of the American 
Statistical Association and co-author of four recent texts in applied 
statistical methods: Applied Logistic Regression, Applied Survival 
Analysis, Sampling of Populations, and Adequacy of Sample Size.

Location:  Montclair State University, Upper Montclair, NJ
 Richardson Hall, RI-106

Time:           9:00 A.M. to 1:00 P.M.
                8:30 A.M.  Registration and Continental Breakfast

Registration:   $85 Chapter members, $95 Non-members, $50 Students
                Fee includes handout, continental breakfast and box lunch
Reg. Deadline:  March 31, 2000
Directions:     Visit the Montclair web site for directions & public transit:
                http://www.montclair.edu/welcome/directions.html
Information:    Cynthia Scherer, [EMAIL PROTECTED]
                [212] 733-4085


  Registration Form
  An Introduction to Logistic Regression
 Stanley Lemeshow, Ph.D. 
  Friday April 7, 2000
=

Name:   

Organization:  ___

Business
Address: ___

  ___

Phone:___

Email ___
  
Registration Deadline: Friday, March 31, 2000

ASA Chapter Member  $85        Non-Member  $95

Full-Time Students  $50

Payment enclosed.  ($15 additional fee to register on site.)

Checks should be made out to: New York Metro ASA Chapter.  Mail this 
registration form and your check to:

Marcia Levenstein
Pfizer Pharmaceuticals 
235 E. 42nd Street MS 205-8-24
New York, New York 10017
Fax: 212-309-4346

===( Announcement # 2: Presentation )===

Covance, The Princeton-Trenton and New Jersey Chapters 
of the American Statistical Association present

 Dr. Gordon Lan, Ph.D.
"The Use of Conditional Power in Interim Analyses of Clinical Trials."

   28 April 2000
   3:00 - 5:00pm

   Covance, Inc.
   206 Carnegie Center,
  Princeton, NJ

Please R.S.V.P. and fax to Covance at (609) 514-0971 
by Tuesday, 25 April 2000.

Dr. Lan is a Senior Technical Advisor at Pfizer Central Research, 
Groton, Connecticut.  His tenure at Pfizer since 1995 follows an 
academic career, including the appointments of Professor of Statistics 
at George Washington University, and Mathematical Statistician at 
the National Heart, Lung and Blood Institute of the National Institutes 
of Health.  

Directions

From the New York - Northern New Jersey area:
Take the New Jersey Turnpike South to Exit 9.  Follow the signs for Route 18
North and immediately watch for signs for Route 1 South.  Proceed on Route 1
South for approximately 17 miles.  Take the Alexander Road East exit (toward
Princeton Junction) and cross over Route 1 (the Princeton Hyatt Hotel will be
on your right).

Re: Combining 2x2 tables

2000-03-31 Thread Rich Ulrich

On Thu, 30 Mar 2000 11:22:32 -0500, JohnPeters <[EMAIL PROTECTED]>
wrote:

> I was wondering if someone could help me.  I am interested in combining
> 2x2 tables from multiple studies.  The test used is McNemar's
> chi-square.  I have the raw data from each of these studies.  What is the
> proper correction that should be used when combining the results?

That test is an approximation to the exact binomial test - the test that
the two kinds of discordant pairs (changes in each direction) each occur
50% of the time.  Does that tell you enough so you can figure out what to do?
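
In code form, that exact binomial version looks like this (a sketch; the
counts are hypothetical, with b and c the two kinds of discordant pairs
from one table, and scipy.stats.binomtest requires a recent SciPy):

from scipy.stats import binomtest

b, c = 14, 5                 # hypothetical discordant counts from one 2x2 table
exact = binomtest(b, n=b + c, p=0.5, alternative='two-sided')
print(exact.pvalue)          # exact McNemar p-value: do the changes split 50/50?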

I don't know what you are looking for as a 'correction', but there are
several different ways to combine results from multiple studies.  Do
you elect to combine the p-levels or to combine some measure of
the effect size?  And do you weight studies equally, according to N,
or according to the precision of the result (if that differs from N)?

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Kruskal-Wallis & equal variances

2000-03-31 Thread Rich Ulrich

 -  I can address a couple of concrete points -

On Sat, 25 Mar 2000 15:22:43 GMT, Gene Gallagher
<[EMAIL PROTECTED]> wrote:

 < snip > 
> The real problem that we often see is a dataset composed of lots of zeros
> with a few positive values. From the literature, especially Hollander &
> Wolfe, I know that a high percentage of ties poses problems for procedures
> based on ranks (even with the ties procedures). In that case, I thought that
> random permutation tests would provide a better alternative. However, as I
> mentioned in my original post, Manly is cautious about inferences based on
> random permutations when there are large differences in the variances among
> populations and the sample sizes are small. When you have lots of zeros in the
> dataset, the problem of ties is confounded with the problems of unequal
> variances.

To take the last thing first: I don't worry about unequal variances
when I have data consisting of just a few integer values.  There won't
be "outliers" in the important sense when extremes don't exist.  And
you can generate your own simple examples of dichotomies, with unequal
N and unequal proportions, in order to see that the "pooled estimate
of the variance" works better, giving more accurate p-levels, than
the Satterthwaite version ("using separate variance estimates, for
unequal variances").  And the pooled-test p-levels are pretty good in
the absolute sense, too.


Agresti has a fine example of mostly-zero, mostly tied, in both of his
books on categorical data analysis.  Right now, I am pulling this from
memory:  I think he shows three ways to score, so that categories --

 a) are scored arbitrarily 0,1,2,3, with a moderately good test;
 b) are scored according to the underlying measure, "number of
drinks", as 0,1,3,8 -- or some such -- with a more powerful test;
 c) are scored by average rank, as with the usual "nonparametric
test": after linear massaging, these are equivalent to arbitrary
scores of 0,1, 1.2, 1.3 -- or some such, and the test (which is now
practically equivalent to 0 versus other) no longer rejects at 5% when
applied to his survey example.
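
As a sketch of how the choice of scores plays out, here is Mantel's
linear-by-linear association statistic, M^2 = (N-1)*r^2, applied to a
made-up, mostly-zero 2x4 table (the counts and the exact score values are
illustrative assumptions, not Agresti's data):

import numpy as np
from scipy.stats import chi2

# Hypothetical 2 x 4 table: two groups, ordered categories dominated by zero.
table = np.array([[480, 15,  6, 2],
                  [455, 28, 12, 5]])

def linear_by_linear(table, col_scores):
    """Mantel's statistic M^2 = (N - 1) * r^2, where r is the correlation
    between row scores and column scores over the N observations."""
    n = table.sum()
    row_scores = np.array([0.0, 1.0])
    r_idx, c_idx = np.indices(table.shape)
    x = row_scores[r_idx].ravel()
    y = np.asarray(col_scores, dtype=float)[c_idx].ravel()
    w = table.ravel() / n                       # cell proportions as weights
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    r = cov / np.sqrt(np.sum(w * (x - mx) ** 2) * np.sum(w * (y - my) ** 2))
    m2 = (n - 1) * r ** 2
    return m2, chi2.sf(m2, df=1)                # statistic and its p-value

for scores in ([0, 1, 2, 3],          # (a) arbitrary equally spaced scores
               [0, 1, 3, 8],          # (b) scores tied to the underlying measure
               [0, 1, 1.2, 1.3]):     # (c) roughly what rank scores collapse to
    print(scores, linear_by_linear(table, scores))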

Consider it this way.  With most of the "expectation" depending on the
huge category of zeroes, what you get by scoring the other categories
are essentially the *weights* for contrasting those categories with 0.
If there were several different tests, how important would you want to
consider those groups, relatively?  Since there is a rapidly
diminishing N (if I remember the example right), even the scoring
"0,1,2,3"  gives most weight to the 0/1 comparison.   Obviously, the
test after forcing a transform to ranks is even weaker in its weight
for the higher groups.

That is not all that I have hoped to say, but that may be all that I
get to, this time around.
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: testing a coin flipper

2000-03-31 Thread David A. Heiser


- Original Message -
From: Bob Parks <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, March 30, 2000 6:44 AM
Subject: testing a coin flipper


> Consider the following problem (which has a real world
> problem behind it)
>
> You have 100 coins, each of which has a different
> probability of heads (assume that you know that
> probability, or at worst can estimate it).
>
> Each coin is labeled.  You ask one person (or machine
> if you will) to flip each coin a different number of times,
> and you record the number of heads.

..
Incidentally, I found that William Feller in chapter III (vol I) of his
classic book "An Introduction to Probability Theory and its Applications",
covers coin flipping nicely.

The sequence is treated as a random walk (heads is +1 and tails is -1). The
probability of a sign reversal is low, indicating long intervals between
successive crossings of the axis. His Theorem 1 (page 84) states that the
probability that up to epoch 2n+1 (i.e., in 2n+1 flips) there occur exactly
r changes of sign equals twice the probability that the partial sum equals
2r+1 at epoch 2n+1. (This involves counting the paths that reach 2r+1 in
2n+1 steps.)

His table on page 85 gives the probability of zero sign reversals in 99
trials as 0.1592, which is surprisingly high.
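
As a quick numerical check of that figure, here is a minimal sketch in
Python (the SciPy call is an assumption of this note; the theorem itself is
as stated above):

from scipy.stats import binom

def prob_r_sign_changes(r, trials):
    """Feller's result (as stated above) for a fair coin: the probability of
    exactly r changes of sign in an odd number of trials 2n+1 equals
    2 * P(S = 2r+1), where S = 2*heads - trials is the random-walk position,
    so S = 2r+1 corresponds to heads = (trials + 2r + 1) / 2."""
    heads = (trials + 2 * r + 1) // 2
    return 2 * binom.pmf(heads, trials, 0.5)

print(round(prob_r_sign_changes(0, 99), 4))   # about 0.1592, matching the table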

DAHeiser






Re: Out of sample prediction

2000-03-31 Thread Steve McGrew

On 31 Mar 2000 06:41:38 GMT, [EMAIL PROTECTED] (Victor Aina) wrote:
>I've got 2 non-overlapping periods. Data is
>available for period one (the first period).
>The intention is to predict observations that
>will be coming in period 2.
>
>Now, suppose extra information is available for
>period 2. In particular, suppose it is known that
>the values of observations in the 1st half of
>period two will increase, and thereafter level off.
>
>My question is: what options are available for
>capturing such behavior in a regression model?
>And what are the caveats and/or pitfalls?

I'm biased towards using genetic algorithms, so the following
advice is GA-oriented.

Compose a function with adjustable coefficients that, with a
suitable choice of coefficient values, can be made to fit the known
data.  Then add a term to it that meets the requirements of the
"extra information" without messing up the ability of the first
function to fit the data.  Evolve the resulting function's coefficients
for optimal fit to the data.
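
A minimal sketch of that idea in Python, using SciPy's differential
evolution as a stand-in evolutionary optimizer (the model form, parameter
bounds, and data below are illustrative assumptions, not from the original
advice):

import numpy as np
from scipy.optimize import differential_evolution

# Hypothetical period-1 data: a noisy upward trend observed at t = 0..49.
rng = np.random.default_rng(1)
t1 = np.arange(50)
y1 = 2.0 + 0.05 * t1 + rng.normal(0, 0.2, t1.size)

def model(t, params):
    a, b, c, k, t0 = params
    # a + b*t is the part fitted to the period-1 data; the saturating term
    # c*(1 - exp(-k*(t - t0))) switches on after t0 and then levels off,
    # encoding the "rise in the first half of period 2, then flatten" prior.
    ramp = np.where(t > t0, 1.0 - np.exp(-k * (t - t0)), 0.0)
    return a + b * t + c * ramp

def sse(params):
    # Only period-1 data enter the fit, so c, k, t0 remain essentially free
    # to express the extra information without disturbing the fit.
    return np.sum((y1 - model(t1, params)) ** 2)

bounds = [(-5, 5), (-1, 1), (0, 10), (0.01, 1.0), (50, 75)]   # a, b, c, k, t0
fit = differential_evolution(sse, bounds, seed=0)

t2 = np.arange(50, 100)            # period-2 time points
forecast = model(t2, fit.x)        # prediction incorporating the prior shape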

You can download a demo version of Generator, an easy-to-use
GA which works with Excel spreadsheets, from http://www.iea.com/~nli.
I'd be glad to advise you on how to set up a spreadsheet for your
purpose.

Steve






Re: "Kolmogorov-Smirnov" vs "Chi Square"

2000-03-31 Thread Herman Rubin

In article <8c0ctq$kol$[EMAIL PROTECTED]>,
CD Madewell  <[EMAIL PROTECTED]> wrote:
>I wonder if the writer of the original question really wanted a
>thesis or just a simple answer on how to look at a data set and decide
>which of the two tests (he mentioned) to use.  If he wanted a discussion
>of which test was more powerful, etc., he should have included that in
>his question.  Although I was wrong to say it was the "main point", my
>answer does serve to offer a down-to-earth method of decision making.

How should one decide which type of test to use EXCEPT by
looking at its power?  Statistics is not a collection of
mantras to appease the gods.

In making a decision, one has to consider all the consequences
in all states of nature.  The use of an "easy" test because
one has been taught it, without considering the consequences,
is wrong.  The writer of the original question asked for the
reasons to use one or the other.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Honors Projects

2000-03-31 Thread Alphonse

I need ideas about undergraduate honors projects in
statistics - what is the practice at various
colleges/universities, what should the emphasis be, etc.?
Thanks.



* Sent from AltaVista http://www.altavista.com Where you can also find related Web 
Pages, Images, Audios, Videos, News, and Shopping.  Smart is Beautiful

