Tolerance intervals (calculation of tolerance factors)
I am looking for an algorithm to compute the tolerance factors used in constructing normal tolerance limits. I have the article 'Tables of Tolerance-Limit Factors for Normal Distributions' by Alfred Weissberg and Glenn H. Beatty (1960). It contains tables of r(N,P) and u(f,y); multiplying these together gives the tolerance factor K. I believe the non-central t-distribution and the chi-square distribution are used to calculate these. Does anybody have an algorithm (in any language) available to do the calculations, or the formulas themselves? Many thanks, Dr L Green. Sent via Deja.com http://www.deja.com/ Before you buy. === This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ ===
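For what it's worth, here is a rough Python sketch of the usual approximation (the Wald-Wolfowitz/Howe style product of a normal-quantile factor and a chi-square-quantile factor, which I believe is the same r * u factorization the tables are built on). This is my own sketch, not the exact Weissberg-Beatty computation: the chi-square quantile below uses the Wilson-Hilferty approximation so everything runs on the standard library, and you should expect agreement with the published tables only to about two decimals.

```python
import math
from statistics import NormalDist

def chi2_lower_quantile(alpha, f):
    """Wilson-Hilferty approximation to the chi-square quantile with f d.f."""
    z = NormalDist().inv_cdf(alpha)
    h = 2.0 / (9.0 * f)
    return f * (1.0 - h + z * math.sqrt(h)) ** 3

def tolerance_factor(n, p, gamma):
    """Approximate two-sided normal tolerance factor K = r(N,P) * u(f,gamma):
    coverage p, confidence gamma, sample size n (Howe-style approximation)."""
    f = n - 1
    r = NormalDist().inv_cdf((1.0 + p) / 2.0) * math.sqrt(1.0 + 1.0 / n)
    u = math.sqrt(f / chi2_lower_quantile(1.0 - gamma, f))
    return r * u

print(round(tolerance_factor(10, 0.95, 0.95), 2))  # close to the tabulated K = 3.379
```

An exact calculation (especially for one-sided limits) would go through the non-central t-distribution instead, as the original question suggests.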
Re: Combining 2x2 tables
On Thu, 30 Mar 2000, JohnPeters wrote:
> Hi,
> I was wondering if someone could help me. I am interested in combining
> 2x2 tables from multiple studies. The test used is the McNemar's
> chi-sq. I have the raw data from each of these studies. What is the
> proper correction that should be used when combining the results.
> Thanks!!!

Meta-analysis is a common way to combine information from 2x2 tables, but I'm not sure how you would do this with McNemar's chi-square as your measure of "effect size" for each table. It might be possible if you are willing to use something else. It's Friday afternoon, and this is off the top of my head, but here goes anyway. I wonder if you could write the tables this way:

              Change
             Yes   No
  Before -    a     b
         +    c     d

  Cell a: change from - to +
  Cell b: no change, - before and after
  Cell c: change from + to -
  Cell d: no change, + before and after

Suppose we're talking about change in opinion after hearing a political speech. The odds ratio for this table would give you the odds of changing from a negative to a positive opinion over the odds of changing from positive to negative. If you're the speaker, you're hoping for an odds ratio greater than 1 (i.e., greater change in those who were negative before the speech). If the amount of change is similar in both groups, the odds ratio will be about 1. If this is a legitimate way to analyze the data for one such table, and I can't see why not, then you could pool the tables meta-analytically with ln(OR) as your measure of effect size. Here's a paper that describes how to go about it: Fleiss, JL. (1993). The statistical basis of meta-analysis. Statistical Methods in Medical Research, 2, 121-145. There are also free programs available for performing this kind of meta-analysis. I have links to some in the statistics section of my homepage. Hope this helps. Bruce -- Bruce Weaver [EMAIL PROTECTED] http://www.angelfire.com/wv/bwhomedir/
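A quick sketch of the pooling step being suggested, in Python. The counts below are hypothetical, and the weighting is the usual fixed-effect inverse-variance scheme for ln(OR), taking var(ln(a/c)) ≈ 1/a + 1/c for the two change cells:

```python
import math

# hypothetical change-cell counts from three studies:
# a = changed - to +, c = changed + to -
studies = [(30, 12), (25, 15), (40, 18)]

def pooled_log_odds_ratio(studies):
    """Fixed-effect inverse-variance pooling of ln(OR) across studies."""
    wsum = wlsum = 0.0
    for a, c in studies:
        log_or = math.log(a / c)
        var = 1.0 / a + 1.0 / c      # approximate variance of ln(a/c)
        wsum += 1.0 / var
        wlsum += log_or / var
    pooled = wlsum / wsum
    se = math.sqrt(1.0 / wsum)
    return pooled, se

pooled, se = pooled_log_odds_ratio(studies)
# pooled OR with an approximate 95% confidence interval
print(math.exp(pooled), math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
```

The Fleiss (1993) paper cited above covers the weighting choices (and random-effects alternatives) properly.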
Re: testing a coin flipper
Here is a somewhat DIY approach. Comments?

In article <[EMAIL PROTECTED]>, Bob Parks <[EMAIL PROTECTED]> writes
>Consider the following problem (which has a real world
>problem behind it)
>
>You have 100 coins, each of which has a different
>probability of heads (assume that you know that
>probability or worse can estimate it).
>
>Each coin is labeled. You ask one person (or machine
>if you will) to flip each coin a different number of times,
>and you record the number of heads.
>
>Assume that the (known/estimated) probability of heads
>is between .01 and .20, and the number of flips for
>each coin is between 4 and 40.

So there are at most 41 possible different results (# of heads seen) for each individual coin, and it is possible to calculate the probability of each of those results under the null hypothesis: # heads ~ Binomial(n_i, p_know_i), where n_i is that coin's number of flips.

>The question is how to test that the person/machine
>doing the flipping is flipping 'randomly/fairly'. That is,
>the person/machine might not flip 'randomly/fairly/...'
>and you want to test that hypothesis.
>
>One can easily state the null hypothesis as
>
> p_hat_i = p_know_i for i=1 to 100
>
>where p_hat_i is the observed # heads / # flips for each i.
>
>Since each coin has a different probability of heads,
>you can not directly aggregate.

But here I assume that, for each coin, you can attach some sort of 'score' to each of its possible results. This might be (observed - expected)^2/expected, or -log(prob observed | null hypothesis), or something that reflects your desired alternative hypothesis more closely: e.g. if you are looking for a consistent bias to heads you might include the sign of the deviation in the score, or if you are looking for a trend effect you might set scores for a coin according to its position in your list of 100 coins. I also assume that the final statistic is produced by summing the individual scores.
The remaining question is how to estimate the significance of the result. Chances are, your scores are small floating point numbers. Shift, scale and round them to convert them all to integers of reasonable size - say in the range 0, 1, 2, ... 1000. The total score over 100 coins is then in the range 0..100,000 or so. It isn't quite as powerful a statistic as the original one, but it is susceptible to exact calculation. The distribution of an integer valued score can be represented by an array of floating point numbers: the probabilities that the score is equal to 0, 1, 2, and so on. What is more, the distribution of an independent sum of two such scores is computed by simply convolving the two distributions. Even without the FFT, convolving arrays of 1000 and 40,000 floats looks doable on a modern machine. In fact, it's easier than that, because only 41 of the 1000 floats in the smaller of the two arrays to be convolved at each stage are non-zero. Repeat this process 100 times and you've got the exact distribution of your final (integer-valued) score. -- A. G. McDowell
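A minimal Python sketch of the convolution scheme above, with a made-up handful of coins and the simplest possible integer score (the raw number of heads), just to show the mechanics:

```python
from math import comb

def binom_pmf(n, p):
    """Null distribution of a coin's score: P(k heads in n flips)."""
    return [comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    """Distribution of the sum of two independent integer-valued scores."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        if pa:                        # skip zero entries, as noted above
            for j, pb in enumerate(b):
                out[i + j] += pa * pb
    return out

# toy subset of coins: (probability of heads, number of flips)
coins = [(0.05, 10), (0.10, 20), (0.20, 40)]
total = [1.0]
for p, n in coins:
    total = convolve(total, binom_pmf(n, p))
# total[k] is now the exact null probability that the summed score equals k
```

With a real score (e.g. rounded -log probabilities), you would replace `binom_pmf` by an array indexed by the integer score, with the binomial probabilities placed at the appropriate positions; the convolution loop is unchanged.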
6 NJ short courses & seminars
Springtime for Statistics (April-May-June). Six New Jersey area announcements: [1] Logistic Regression Short Course [2] Clinical Trials Short Course [3] Multiple Comparison & Exact Inference Short Courses [4] Bates' Nonlinear Regression Short Course [5] ICSA Symposium [6] NJ Chapter, ASA Spring Symposium [7] announcement of conscience

===( Announcement #1: Short Course )===

The New Jersey and New York City Metro Chapters Present: An American Statistical Association Short Course, An Introduction to Logistic Regression. Stanley Lemeshow, Ph.D. FRIDAY April 7, 2000, 9:00 A.M.-1:00 P.M.

Course Outline: * The Logistic Regression Model (Chap 1 and 2) * Estimating the Coefficients in the Logistic Model (Chap 1 and 2) * Assessing Model Performance (Chap 5)

Text: Hosmer, D. W., & Lemeshow, S. (1989). Applied Logistic Regression. New York: Wiley. Handout will be provided. Text is available from John Wiley Publishers.

Dr. Lemeshow is Director of the Ohio State University Biostatistics Program and Professor of Biostatistics in the School of Public Health and Department of Statistics. He has 25 years' experience in research and teaching in biomedical applications; he is an internationally recognized statistician for his contributions to the fields of logistic regression, sample survey methods, and survival analysis. He is a Fellow of the American Statistical Association and co-author of 4 recent texts in applied statistical methods: Applied Logistic Regression, Applied Survival Analysis, Sampling of Populations, and Adequacy of Sample Size.

Location: Montclair State University, Upper Montclair, NJ, Richardson Hall, RI-106
Time: 9:00 A.M. to 1:00 P.M. (8:30 A.M. Registration and Continental Breakfast)
Registration: $85 Chapter members, $95 Non-members, $50 Students. Fee includes handout, continental breakfast and box lunch.
Registration Deadline: March 31, 2000
Directions: visit the Montclair web site for directions & public transportation: http://www.montclair.edu/welcome/directions.html
Information: Cynthia Scherer, [EMAIL PROTECTED] [212] 733-4085

Registration Form: An Introduction to Logistic Regression, Stanley Lemeshow, Ph.D., Friday April 7, 2000
Name: ___ Organization: ___ Business Address: ___ ___ Phone: ___ Email: ___
Registration Deadline: Friday, March 31, 2000. ASA Chapter Member $85, Non-Member $95, Full-Time Students $50. Payment enclosed. $15 additional fee to register on site. Checks should be made out to: New York Metro ASA Chapter. Mail this Registration form and your check to: Marcia Levenstein, Pfizer Pharmaceuticals, 235 E. 42nd Street MS 205-8-24, New York, New York 10017. Fax 212-309-4346

===( Announcement #2: Presentation )===

Covance, The Princeton-Trenton and New Jersey Chapters of the American Statistical Association present Dr. Gordon Lan, Ph.D., "The Use of Conditional Power in Interim Analyses of Clinical Trials." 28 April 2000, 3:00-5:00pm, Covance, Inc., 206 Carnegie Center, Princeton, NJ. Please R.S.V.P. and fax to Covance at (609) 514-0971 by Tuesday, 25 April 2000.

Dr. Lan is a Senior Technical Advisor at Pfizer Central Research, Groton, Connecticut. His tenure at Pfizer since 1995 follows an academic career, including the appointments of Professor of Statistics at George Washington University, and Mathematical Statistician at the National Heart, Lung and Blood Institute of the National Institutes of Health.

Directions from the New York - Northern New Jersey area: Take the New Jersey Turnpike South to Exit 9. Follow the signs for Route 18 North and immediately watch for signs for Route 1 South. Proceed on Route 1 South for approximately 17 miles. Take the Alexander Road East exit (toward Princeton Junction) and cross over Route 1 (the Princeton Hyatt Hotel will be on your right)
Re: Combining 2x2 tables
On Thu, 30 Mar 2000 11:22:32 -0500, JohnPeters <[EMAIL PROTECTED]> wrote: > I was wondering if someone could help me. I am interested in combining > 2x2 tables from multiple studies. The test used is the McNemar's > chi-sq. I have the raw data from each of these studies. What is the > proper correction that should be used when combining the results. That test is an approximation for the binomial - the test for 50% in each group. Does that tell you enough so you can figure what to do? I don't know what you are looking for as a 'correction' but there are several different ways to combine results from multiple studies. Do you elect to combine the p-levels or combine some measure of the effect-size, and do you weight studies equally, or according to N, or according to precision of result? - if that is different from N. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
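To make the binomial connection concrete, here is a small sketch (mine, not Rich's) of the exact form of McNemar's test: a two-sided binomial test of 50% on the two discordant-pair counts, which is what the chi-square statistic approximates.

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar test: under H0 the split of the b + c
    discordant pairs between the two kinds is Binomial(b + c, 0.5)."""
    n = b + c
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) * 0.5 ** n
    return min(2.0 * tail, 1.0)    # double one tail, capped at 1

print(mcnemar_exact(1, 9))   # about 0.021: a 1-vs-9 split rejects the 50/50 null
```

With small discordant counts this exact version is preferable to the chi-square approximation.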
Re: Kruskal-Wallis & equal variances
- I can address a couple of concrete points - On Sat, 25 Mar 2000 15:22:43 GMT, Gene Gallagher <[EMAIL PROTECTED]> wrote: < snip > > The real problem that we often see is a dataset composed of lots of zeros > with a few positive values. From the literature, especially Hollander & > Wolfe, I know that a high percentage of ties poses problems for procedures > based on ranks (even with the ties procedures). In that case, I thought that > random permutation tests would provide a better alternative. However, as I > mentioned in my original post, Manly is cautious about inferences based on > random permutations when there are large differences in the variances among > populations and the sample sizes are small. When you have lots of zeros in the > dataset, the problem of ties is confounded with the problems of unequal > variances. To take the last thing first: I don't worry about unequal variances when I have data consisting of just a few integer values. There won't be "outliers" in the important sense when extremes don't exist. And you can generate your own simple examples of dichotomies, with unequal N and unequal proportions, in order to see that the "pooled estimate of the variance" works better, giving more accurate p-levels, than the Satterthwaite version ("using separate variance estimates, for unequal variances"). And the pooled-test p-levels are pretty good in the absolute sense, too. Agresti has a fine example of mostly-zero, mostly tied, in both of his books on categorical data analysis.
Right now, I am pulling this from memory: I think he shows three ways to score, so that categories -- a) are scored arbitrarily 0, 1, 2, 3, with a moderately good test; b) are scored according to the underlying measure, "number of drinks", as 0, 1, 3, 8 -- or some such -- with a more powerful test; c) are scored by average rank, as with the usual "nonparametric test": after linear massaging, these are equivalent to arbitrary scores of 0, 1, 1.2, 1.3 -- or some such, and the test (which is now practically equivalent to 0 versus other) no longer rejects at 5% when applied to his survey example. Consider it this way. With most of the "expectation" depending on the huge category of zeroes, what you get by scoring the other categories are essentially the *weights* for contrasting those categories with 0. If there were several different tests, how important would you want to consider those groups, relatively? Since there is a rapidly diminishing N (if I remember the example right), even the scoring "0,1,2,3" gives most weight to the 0/1 comparison. Obviously, the test after forcing a transform to ranks is even weaker in its weight for the higher groups. That is not all that I have hoped to say, but that may be all that I get to, this time around. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
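A toy illustration of the point about scores (the counts are hypothetical, not Agresti's, and the statistic below is just a pooled-variance two-sample z for the difference in mean score), showing how the choice of category scores reweights the same data:

```python
import math

# hypothetical two-group ordinal counts per category 0..3, mostly zeros
group1 = [70, 20, 7, 3]
group2 = [85, 11, 3, 1]

def mean_score_z(g1, g2, scores):
    """Two-sample z for the difference in mean score, pooled variance."""
    n1, n2 = sum(g1), sum(g2)
    m1 = sum(s * c for s, c in zip(scores, g1)) / n1
    m2 = sum(s * c for s, c in zip(scores, g2)) / n2
    n = n1 + n2
    grand = sum(s * (x + y) for s, (x, y) in zip(scores, zip(g1, g2))) / n
    var = sum((s - grand) ** 2 * (x + y)
              for s, (x, y) in zip(scores, zip(g1, g2))) / (n - 1)
    return (m1 - m2) / math.sqrt(var * (1.0 / n1 + 1.0 / n2))

# the three scorings described above: arbitrary, dose-like, rank-like
for scores in ([0, 1, 2, 3], [0, 1, 3, 8], [0, 1.0, 1.2, 1.3]):
    print(scores, round(mean_score_z(group1, group2, scores), 2))
```

The rank-like scores squeeze the upper categories together, so the comparison collapses toward 0-versus-other, exactly as described.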
Re: testing a coin flipper
- Original Message - From: Bob Parks <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, March 30, 2000 6:44 AM Subject: testing a coin flipper > Consider the following problem (which has a real world > problem behind it) > > You have 100 coins, each of which has a different > probability of heads (assume that you know that > probability or worse can estimate it). > > Each coin is labeled. You ask one person (or machine > if you will) to flip each coin a different number of times, > and you record the number of heads. [...] Incidentally, I found that William Feller in chapter III (vol I) of his classic book "An Introduction to Probability Theory and its Applications" covers coin flipping nicely. The sequence is treated as a random walk. The probability of a sign reversal (i.e. heads is +1 and tails is -1) is low, indicating long intervals between successive crossings of the axis. His Theorem 1 (page 84) states that the probability that up to epoch 2n+1 there occur exactly r changes of sign equals twice the probability that the partial sum after 2n+1 trials equals 2r+1. (It involves the number of paths reaching 2r+1 in 2n+1 trials.) His table on page 85 gives the probability of zero sign reversals in 99 trials as 0.1592, which is surprisingly high. DAHeiser
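A quick Monte Carlo check of that table entry - just a sketch, simulating 20,000 fair +-1 walks of 99 steps and counting how often the partial sums never change sign:

```python
import random

def sign_changes(n):
    """Count sign changes of a simple +-1 random walk over n steps.
    (A +-1 walk can only change sign by passing through zero, so tracking
    the last non-zero sign matches Feller's definition of a change of sign.)"""
    s, last_sign, changes = 0, 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        if s != 0:
            sign = 1 if s > 0 else -1
            if last_sign and sign != last_sign:
                changes += 1
            last_sign = sign
    return changes

random.seed(1)
trials = 20000
zero = sum(sign_changes(99) == 0 for _ in range(trials)) / trials
print(zero)   # roughly 0.16, in line with Feller's 0.1592
```

The simulation makes the "surprisingly high" point vivid: about one walk in six never crosses the axis at all in 99 flips.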
Re: Out of sample prediction
On 31 Mar 2000 06:41:38 GMT, [EMAIL PROTECTED] (Victor Aina) wrote: >I've got 2 non-overlapping periods. Data is >available for period one (the first period). >The intention is to predict observations that >will be coming in period 2. > >Now, suppose an extra information is available for >period 2. In particular, suppose it is known that >the values of observations in the 1st half of >period two will increase, and thereafter level off. > >My question is what options are available for >capturing (in a regression model) such problem? >And what are the caveats and/or pitfalls? I'm biased towards using genetic algorithms, so the following advice is GA-oriented. Compose a function with adjustable coefficients that, with suitable choice of coefficient values, can be made to fit the known data. Add a function to it that meets the requirements of the "extra information", without messing up the ability of the first function to fit the data. Evolve the resulting function for optimal fit to the data. You can download a demo version of Generator, an easy-to-use GA which works with Excel spreadsheets, from http://www.iea.com/~nli. I'd be glad to advise you on how to set up a spreadsheet for your purpose. Steve
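For illustration, a bare-bones version of this recipe in plain Python (made-up data; a logistic term supplies the rise-then-level-off shape the "extra information" describes, and a crude (1+1) evolution strategy stands in for a full GA like Generator):

```python
import math, random

random.seed(0)
# toy series shaped like the question: rises through the first half, then levels off
t = list(range(20))
y = [2 + 5 / (1 + math.exp(-(ti - 8))) + random.gauss(0, 0.1) for ti in t]

def sse(params):
    """Sum of squared errors of a level-plus-logistic model a + b/(1+e^(-c(t-t0)))."""
    a, b, c, t0 = params
    return sum((yi - (a + b / (1 + math.exp(-c * (ti - t0))))) ** 2
               for ti, yi in zip(t, y))

# crude (1+1) evolution strategy: mutate the coefficients, keep improvements
best = [0.0, 1.0, 1.0, 10.0]
best_err = sse(best)
for _ in range(5000):
    cand = [p + random.gauss(0, 0.2) for p in best]
    err = sse(cand)
    if err < best_err:
        best, best_err = cand, err
```

The caveat from the question applies in full: the leveling-off is imposed by the chosen functional form, so the out-of-sample forecast is only as good as that structural assumption.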
Re: "Kolmogorov-Smirnov" vs "Chi Square"
In article <8c0ctq$kol$[EMAIL PROTECTED]>, CD Madewell <[EMAIL PROTECTED]> wrote: >I wonder if the writer of the original question really wanted a >thesis or just a simple answer in how to look at a data set and decide >which of the two test (he mentioned) to use. If he wanted a discussion >on which test was more powerful and etc. he should have included that in >his question. Although I was wrong to say it was the "main point", my >answer does serve to offer a down to Earth method of decision making. How should one decide which type of test to use EXCEPT by looking at its power? Statistics is not a collection of mantras to appease the gods. In making a decision, one has to consider all the consequences in all states of nature. The use of an "easy" test because one has been taught it, without considering the consequences, is wrong. The writer of the original question asked for the reasons to use one or the other. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
Honors Projects
I need ideas about undergraduate honors projects in statistics - what is the practice at various colleges/universities, what should the emphasis be, etc. Thanks. * Sent from AltaVista http://www.altavista.com