Re: Degrees of Freedom
On Thu, 27 Apr 2000, GEORGE PERKINS wrote: "I got a call the other day from a high school science teacher asking about the following: She is testing different brands of yogurt for acid neutralization by acidophilus bacteria."

O.K. To start with we have some unspecified number b of brands of yogurt; it follows that either we want to average them all together, so as to ignore any systematic differences that may exist between brands, or we want to keep them explicitly separate, so that we can detect (or at any rate attempt to detect) systematic differences between brands. If b > 2, t-tests are already to be discarded in favor of analysis of variance (ANOVA).

"Her students have measured the pH of yogurt, then poured in a known amount of acid and began measuring pH at intervals of 1 minute for 5 minutes. She has six replicates for each of the types of yogurt for a total of 12 time series."

If there are only 12 time series, then it appears b = 2. Yes? Now the manipulation and measuring seem to have been carried out by some (also unspecified number of) students. Are the six replicates associated with six students, each of whom carried out one replicate? Or is the procedure followed rather messier than that? And if the several students aren't equivalent to the replicates, in what precisely do the replicates consist?

"She wants to test if the mean concentration of acid is different in the two groups by taking the initial pH value - final pH value for each replicate, getting a total of six differences per group, then finding a mean of differences for each set."

Only initial vs. final? What was the point of the 1-minute-apart administration of acid and measurement of pH, if one is going to ignore the time-series information altogether?

"Finally, she wants to take the means from each set of differences and do a hypothesis test mu1 = mu2 using a t-test, but can't figure out the degrees of freedom of the test, and frankly I am not quite sure either."

Why? That is, why a t-test?
Because that's the only form of analysis she knows how to do? The situation clearly calls for a repeated-measures ANOVA; and I'd bet that if she actually does treat it as a t-test (comparing Brand B with Brand X, I'd guess?), which could be equivalent to the formal test of one of the main effects in the proper ANOVA, she won't correctly calculate the sampling variance of the two means. If it be the case that she doesn't know how to do ANOVA, point her gently in the direction of Bruning & Kintz, Computational Handbook of Statistics, which must be in a 4th or 5th edition by now. Marvellous cookbook -- leads the naive (or for that matter not so naive) reader through the necessary arithmetic step by step [rather as though one were writing a computer program for the computer between one's ears] for a _wide_ variety of formal analyses, and supplies references for those who want to pursue the matter further.

"Her idea is to take 12-2 degrees but others have said it should be 6-1 degrees. I wonder if others out there can shed light on three issues:"

Well, let's see. If I've sorted this out aright, she has six replicates (r = 6) of time-series measurements (t = 6) on each of two brands of yogurt (b = 2). Looks like 72 measurements all together. Presumably the six time points are conceptually or logically equivalent for all 12 time series, so replicates (R) are crossed with time (T), and they are necessarily nested within brand (B); we have therefore a formal design of the form R(B)xT -- a repeated-measures design. The formal ANOVA table will have the following lines:

   Source               df   Error term
   Brand                 1   R(B)
   Replicates(Brand)    10   ---
   Time                  5   TR(B)
   Brand x Time          5   TR(B)
   Time x Repl(Brand)   50   ---
   TOTAL                71

(Another name for "Error term" is "denominator mean square".) If she decides to discard all the data in the time series except for the first and last measurements, then there's only 1 d.f. for Time, and only 1 d.f. for the Time-by-Brand interaction, and 10 d.f. for TR(B).
"1) Is the t-test approach she is using on solid statistical footing, and if so how many degrees of freedom are to be used for the t-test?"

Well, _I'd_ use ANOVA myself. Error d.f. for Brand are 10.

"2) If the t-test approach is not legitimate, what type of statistical test can be used to test the mu1=mu2 hypothesis? (keep in mind that these are high school students)"

Discussed at length above. Get Bruning & Kintz.

"3) Is there a 'better' way to proceed with the analysis in the future for these types of experiments?"

Yes.

If you want to answer could you please forward the response to my
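A quick way to check the degrees of freedom in the table above is to compute them directly from the design sizes. A minimal sketch (variable names are mine, not from the post):

```python
# Degrees of freedom for the R(B)xT repeated-measures design:
# b = brands, r = replicates per brand, t = time points (values from the post).
b, r, t = 2, 6, 6

df = {
    "Brand":              b - 1,
    "Replicates(Brand)":  b * (r - 1),
    "Time":               t - 1,
    "Brand x Time":       (b - 1) * (t - 1),
    "Time x Repl(Brand)": b * (r - 1) * (t - 1),
}
df["TOTAL"] = b * r * t - 1            # 72 measurements, 71 total d.f.

for source, d in df.items():
    print(f"{source:20s} {d:3d}")
```

The component lines sum to the total, which is the usual sanity check on any ANOVA table.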
Re: Is Bootstrapping Appropriate?
Date: Wed, 26 Apr 2000 22:05:01 -0400
From: Zubin [EMAIL PROTECTED]

"Yes, I believe your methodology will work. However, you should sample a window of data rather than a single data point when you calculate your re-sampling statistics. I am not sure on the window size, though."

1. Based on the 1-sec 1/e decorrelation time, or on another characteristic time T based on thresholding the autocorrelation at another level, combine the values in the window (t-T/2, t+T/2) to create a value associated with the random draw that picked sample x(t).

2. Treat this value as if it were a random draw of independent measurements?

If this is what you meant, I don't think that defeats the correlation problem if the draws are performed with replacement. If the draws are performed without replacement, wouldn't you need to exclude the whole window of length T? ... I don't think that would yield many values before you ran out of allowable windows.

To defeat that problem maybe I should just subsample uniformly at, say, one pulse every T seconds to obtain M ~ 20*T samples with N ~ 26/T independent measurements per sample. Then perform M tests. For T ~ 2 sec (i.e., 2 decorrelation times), I get M ~ 40 tests with ~13 measurements per test. I then get M ~ 40 test probabilities to average.

For the purposes of argument, let's say the procedure in the above paragraph is acceptable. Then would a resampling based on randomly picking one measurement from each of the 13 windows be better? That way I would get ~40^13 possible combinations instead of 40. Is the last paragraph what you were suggesting?

Thanks,
Greg

Gregory E. Heath  [EMAIL PROTECTED]     The views expressed here are
M.I.T. Lincoln Lab   (781) 981-2815        not necessarily shared by
Lexington, MA        (781) 981-0908 (FAX)  M.I.T./LL or its sponsors
02420-9185, USA

Greg Heath [EMAIL PROTECTED] wrote in message news:Pine.SOL.3.91.1000426203238.20192C-10@miles... Can you help or lead me to the appropriate reference?
I have 526 radar measurements evenly sampled over 26.25 sec (i.e., pulse repetition frequency = 20 points per second).

mean = 0.0
stdv = 1.2
t0 = 1 sec (1/e decorrelation time from the autocorrelation function)

I want to test the null hypothesis that these correlated measurements could have been drawn from a zero-mean Gaussian distribution. However I don't believe I have enough independent measurements. Will bootstrapping help? i.e., --- SNIP ---

=== This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ ===
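The subsampling scheme discussed above (one pulse every T seconds, giving M ~ 40 offsets of ~13 roughly independent measurements each) can be sketched as follows. This is only an illustration on simulated AR(1) data standing in for the radar series; all names are mine, not from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a correlated series shaped like the one described: 20 samples/sec
# for 26.25 sec with a ~1-sec 1/e decorrelation time (an AR(1) stand-in).
fs, duration, t0 = 20, 26.25, 1.0
n = int(fs * duration) + 1                  # 526 points
phi = np.exp(-1.0 / (fs * t0))              # AR(1) coefficient with 1/e time ~ t0
x = np.empty(n)
x[0] = rng.standard_normal()
for i in range(1, n):
    x[i] = phi * x[i - 1] + np.sqrt(1 - phi**2) * rng.standard_normal()

# Subsample one pulse every T seconds so the kept values are ~independent.
T = 2.0                                     # ~2 decorrelation times
step = int(fs * T)                          # 40 pulses between kept samples
subsamples = [x[offset::step] for offset in range(step)]

# Each of the 40 offsets gives ~13 nearly independent measurements, to which
# the Gaussian/zero-mean test of choice could then be applied.
print(len(subsamples), [len(s) for s in subsamples[:3]])
```

Because 526 is not a multiple of 40, a few offsets get 14 points rather than 13, which matches the "~13 measurements per test" estimate in the post.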
Re: Is Bootstrapping Appropriate?
Date: Fri, 28 Apr 2000 00:00:45 GMT
From: [EMAIL PROTECTED]

"1. Randomly draw, with replacement, 526 measurements."

You are only justified in resampling in this way if you know that all your observations are iid. I didn't quite follow your problem, but it sounds as if the iid assumption is not justified. If your observations were iid then there would probably be much easier ways of testing your hypothesis than the bootstrap. Could you please explain your problem again... How many observations of the time series (1...T) do you have (i.e., what is T)? How many variables are in each observation of the time series (20?)? What are you testing about this time series?

One variable, 20 measurements per second, 26.25 seconds (526 measurements). The 1/e decorrelation time estimated from the autocorrelation function is ~1 second. Therefore, I will get independent measurements approximately every T0 seconds (probably ~2 <= T0 <= ~4 sec). Could these correlated measurements have come from a Gaussian distribution? Please see my responses to the other replies.

Greg

Hope this helps.

Gregory E. Heath  [EMAIL PROTECTED]     The views expressed here are
M.I.T. Lincoln Lab   (781) 981-2815        not necessarily shared by
Lexington, MA        (781) 981-0908 (FAX)  M.I.T./LL or its sponsors
02420-9185, USA
Re: Q: error on RMS, __please__ help.
I am sorry for the confusion. English is not my native language and sometimes I am not precise enough. What I meant by the term error was the statistical error of a measurement. I am interested in the statistical relevance of the measurement (a confidence interval such that the measured value is correct with a probability of 68.3%, i.e., the probability that the measured value is within 1 sigma of the real value). And for sure, with 100 measurements I cannot measure the _real_ distribution and thus not measure the real rms. But I can estimate the rms, and then I should give a number for how good this estimation is.

Rich Ulrich wrote: On Thu, 27 Apr 2000 14:43:08 +0200, Selim Issever [EMAIL PROTECTED] wrote: "Dear all, I measure a physical quantity about 100 times. I am not interested in the mean value but the spread (the RMS) of this quantity. I can calculate the RMS easily, but I also need the error on the RMS. Could you give me a hint how to calculate the error on the rms?"

From your description, there is no reason to think that there has to be any "error" at all. You have a set of measures. They are somewhat spread, for real, physical reasons. The dispersion looks like Gaussian, but it would not have to be that shape. (How were the points selected? Why were they selected?) If you want to describe the spread of that set of measures by the RMS, you may do so -- though it might be more useful, it seems to me, to describe the extremes and the conditions that produced them. Why do you think there may be error in the measurements, and how would you detect it if there were?

Maybe I should add that the spread is not due to the measurement, but real. A good example would be a metal bar, which expands and shrinks due to stochastic temperature effects. The value I would be interested in is the _length_variation_ and an _error_estimation_ for this value. The distribution of the quantity I am looking at could be approximated by a Gaussian (just in case it eases the discussion).
At least it looks like a Gaussian when I histogram it.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html

-- Selim Issever | DESY-F15 | Notkestr. 85 | 22603 Hamburg/Germany | Tel: 040 8998-2843 | Fax: 040 8998-4033 | [EMAIL PROTECTED] | http://www.physik.uni-dortmund.de/~issevers
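For roughly Gaussian data, the standard error of the sample standard deviation s from n measurements is approximately s / sqrt(2(n-1)), which is the kind of "error on the rms" the poster is asking for. A minimal sketch (simulated data standing in for the 100 measurements; names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=0.3, size=100)   # 100 measurements of the quantity

n = len(x)
s = x.std(ddof=1)                  # estimated spread (sample standard deviation)
se_s = s / np.sqrt(2 * (n - 1))    # approx. standard error of s (Gaussian case)

# The 1-sigma (~68.3%) interval for the true spread:
print(f"spread = {s:.3f} +/- {se_s:.3f}")
```

The approximation assumes the underlying distribution is close to Gaussian, which the poster says is the case here.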
Re: Question about kappa
I think I would consider using generalizability theory for this problem. Shavelson and Webb have a good book out on the subject, published by Sage.

On Thu, 27 Apr 2000, Robert McGrath wrote: "I am looking for a formula for kappa that applies to very special circumstances: 1) Two raters rated each event, but the raters varied across events. 2) The study involved 100 subjects, each of whom generated approx. 17 events, so multiple events were generated by the same subject. I know Fleiss has developed a formula for kappa that allows for multiple sets of raters, but is there a formula that is appropriate for the circumstance I have described? Thanks for your help! Bob"

Robert McGrath, Ph.D.
School of Psychology T110A
Fairleigh Dickinson University, Teaneck NJ 07666
voice: 201-692-2445  fax: 201-692-2304

----- Original Message -----
From: "Bob Wheeler" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, April 27, 2000 3:15 PM
Subject: Sample size and distributions programs

I have uploaded two programs that some may find of use: (1) Tables. A Windows program written quite a few years ago. It treats 42 distributions extensively, including plots and technical documentation. (2) SSize. A sample size program for the Palm devices. It treats linear models for several distributions: normal, binomial, Poisson, and chi-squared; ANOVA, t-tests, logistic, etc. There is fairly extensive documentation in pdf format. This is a new program, so there are undoubtedly bugs. I would greatly appreciate hearing about them. They are at http://www.bobwheeler.com/stat/

-- Bob Wheeler --- (Reply to: [EMAIL PROTECTED]) ECHIP, Inc.
Statistical Software
I need to find a statistical software package. Most of my statistical work has been done using Microsoft Excel. This has worked out fine; however, I need to find a more heavy-duty package, but nothing overwhelming. I perform some simple statistical work but would like to begin to use a more powerful package. Any suggestions would be great. Thanks
RE: Data Mining blooper and Related Subjects
I respectfully disagree with Michael Wyatt. I come from an academic background and now work outside of academia, except for the occasional course here or there. I too report to a manager or managers, depending on the circumstances. But my experiences have not been the same as his. I am constantly urged to use all my skills as a statistician and a research methodologist by "my managers." (Horrid!!!)

Henry M. Silvert, PhD
Research Statistician, The Conference Board
845 3rd Avenue, New York, NY 10022
Tel.: (212) 339-0438  Fax: (212) 836-3825

-----Original Message-----
From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
Sent: Friday, April 28, 2000 7:52 AM
To: [EMAIL PROTECTED]
Subject: Re: Data Mining blooper and Related Subjects

...And it extends even further. Many of us who toil in areas outside of academia have our work and productivity "supervised" by managers or directors who have little or no training in statistics beyond a survey course. They receive the flashy brochures and read the ads that promise analytical software that will provide significant information, without the bother of formulating one of those fancy-shmancy hypotheses. The higher-ups come to view data mining, decision support, outcomes analysis, etc. as requiring no more skill than the ability to use a PC. I call it "The Myth of the Statistical Meat Grinder": the push of a button or two will generate the answer to all corporate questions, plus a few neat-o graphs for the board of directors' packets.

Michael T. Wyatt, Ph.D.
(Embittered) Healthcare Analyst, Quality Improvement Dept.
DCH Regional Medical Center, Tuscaloosa, AL

On Wed, 26 Apr 2000 11:38:28 -0400, dennis roberts [EMAIL PROTECTED] writes: At 07:57 AM 4/26/00 -0500, Herman Rubin wrote: "It does not surprise me one bit. The typical statistics course teaches statistical methods and pronouncements, with no attempt to achieve understanding." [snip of more] this is something i happen to agree with herman about ...
but, it is a much broader problem than can be attributed to what happens in one course ... it is an attitude about what higher education is all about ... and what the goals are for it. 'going to college' ... be it undergraduate level or graduate level ... has become a much more hit and miss experience; residence has little meaning ... that is being tailored more and more to the convenience of students ... and to what is 'user' friendly (or it won't SELL). studying principles in disciplines is hard work ... NOT user friendly ... so, less and less is being required in the way of diligent study.

take graduate school for example ... there was a time, was there not ... when doctoral students were REALLY expected to be responsible for their dissertations AND were expected to be the experts in that particular area of inquiry ... AND to be competent enough to have done the work him/herself ... and to UNDERSTAND it ... ie, BE ABLE TO DEFEND ALL OF IT.

but, what i have noticed over many years is that dissertations are becoming more of a committee effort ... yes, the student MAY have had the idea (though not necessarily) but, from there ... he/she gets help with the design ... has someone else do the analysis (because he/she did not take any/sufficient work in analytic methods to understand what is going on) ... gets help in writing and editing ... and even gets help in terms of what their results MEAN ... gives new meaning to the term: "cooperative learning"
Re: Is Bootstrapping Appropriate?
In article Pine.SOL.3.91.1000428033622.20399C-10@miles, Greg Heath [EMAIL PROTECTED] wrote:

"Date: Fri, 28 Apr 2000 00:00:45 GMT From: [EMAIL PROTECTED] ... One variable, 20 measurements per second, 26.25 seconds (526 measurements). The 1/e decorrelation time estimated from the autocorrelation function is ~1 second. Therefore, I will get independent measurements approximately every T0 seconds (probably ~2 <= T0 <= ~4 sec). Could these correlated measurements have come from a Gaussian distribution? Please see my responses to the other replies."

Bootstrapping is totally inappropriate. However, there are other simpler simulation methods of obtaining the significance level, using any test statistic you wish to use, assuming you are willing to use the particular value of the correlation coefficient and you are using a scale-invariant test. The variance will not affect your test in this problem. BTW, this method is the one used for obtaining significance levels for the Kolmogorov-Smirnov test when parameters are estimated.

Construct samples according to the null hypothesis. The samples should be independent; the dependence within each sample should follow the model. Then use the empirical distribution to determine the significance of your data set.

-- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette, IN 47907-1399
[EMAIL PROTECTED]  Phone: (765) 494-6054  FAX: (765) 494-0558
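Rubin's recipe (simulate independent samples under the null, with the dependence model inside each sample, then read the significance off the empirical distribution) can be sketched as follows. This is only an illustration, assuming an AR(1) correlation model and a simple scale-invariant statistic; all names and choices here are mine, not his:

```python
import numpy as np

rng = np.random.default_rng(2)

def ar1_sample(n, phi, rng):
    """One zero-mean Gaussian AR(1) series with lag-1 correlation phi."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for i in range(1, n):
        x[i] = phi * x[i - 1] + np.sqrt(1 - phi**2) * rng.standard_normal()
    return x

def stat(x):
    """A scale-invariant test statistic: |mean| / sample standard deviation."""
    return abs(x.mean()) / x.std(ddof=1)

n, phi = 526, np.exp(-1 / 20)      # 20 samples/sec, ~1-sec 1/e decorrelation time
null_stats = np.array([stat(ar1_sample(n, phi, rng)) for _ in range(500)])

observed = stat(ar1_sample(n, phi, rng))   # stand-in for the real data's statistic
p_value = (null_stats >= observed).mean()  # significance from the empirical null
print(round(p_value, 3))
```

Because the statistic is scale-invariant, the unknown variance drops out, which is exactly why Rubin says the variance will not affect the test.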
RE: Blackjack problem
Paul Bernhardt writes: "True, but card counters abound. Last month's (April 2000) Discover Magazine had an article on gambling and mentioned a newly developed card-counting strategy that you don't need to be a genius to execute effectively. I have a buddy who has placed in a Vegas Blackjack tournament. He counts cards, using his foot position to keep track of the aces (very important, as they are needed for Blackjacks). There are no casino cameras to monitor your foot position (yet), so you can get away with it."

I have a friend who is a professional blackjack player, and from what I understand it is a bit more complicated than that. You have to change your betting behavior substantially when the deck is loaded with aces and face cards. If you don't change your betting behavior on the basis of the card count, how could you gain any advantage? You can walk away from a table when the deck has very few aces and face cards, but that doesn't help you as much as increasing your bets when the deck is in your favor. It is this change in betting behavior that tips off the casinos.

I have a running joke here at the hospital about how we need to put some money in the budget for these research grant proposals for some applied probability research at Vegas.

Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
STATS - Steve's Attempt to Teach Statistics: http://www.cmh.edu/stats
Re: Blackjack problem
On 27 Apr 2000 13:50:24 -0700, [EMAIL PROTECTED] (Donald F. Burrill) wrote:

[ ... ] "(3) It is true for Blackjack, unlike nearly all other Las Vegas-type games, that a variable strategy on the part of the Player can change the statistical advantage to the Player's side. It should not surprise you that only a certain type of variable strategy, among a very large number of possible strategies, can have this effect; and that Players showing evidence of pursuing such a strategy very rapidly become persona non grata at the gaming tables."

Make that something about Players "SUCCESSFULLY pursuing such a strategy" being the ones who are asked to leave. (And they remember your face, and pass around your picture.)

Card-counting brought a lot of new suckers into the casinos, so I read. Some of them couldn't pay enough attention in any case, some couldn't pay enough attention once they had a few drinks, and some kept trying even when the house implemented multiple decks and frequent shuffling.

Also, in addition to pure odds, there is that aspect of "going broke." If a game is fair, remember, the winner in the long run is the player with deeper pockets at the start.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
RE: Blackjack problem
Clip from earlier message... "The Player may choose to play exactly the same rules as the Dealer is REQUIRED to play; or the Player may choose some of the other options. Since the Player has more choices or options in play than does the Dealer, why does the Dealer have the statistical advantage? It seems to me the Player would have the advantage."

Doesn't the law of large numbers figure in here somewhere too?

1. The probability of winning with the house strategy is known a priori and it is optimal (as someone else pointed out).
2. An individual playing with this same strategy may win or lose more or less in the short run.
3. With the volume of games the house plays, the empirical probability will approach the a priori probability in the long run -- to the house's advantage.

Simplistic and poorly articulated I am sure, but I think it captures the essence of the mechanism at work here.
Re: Blackjack problem
"The Player may choose to play exactly the same rules as the Dealer is REQUIRED to play; or the Player may choose some of the other options. Since the Player has more choices or options in play than does the Dealer, why does the Dealer have the statistical advantage? It seems to me the Player would have the advantage.

Doesn't the law of large numbers figure in here somewhere too?

1. The probability of winning with the house strategy is known a priori and it is optimal (as someone else pointed out).
2. An individual playing with this same strategy may win or lose more or less in the short run.
3. With the volume of games the house plays, the empirical probability will approach the a priori probability in the long run -- to the house's advantage.

Simplistic and poorly articulated I am sure, but I think it captures the essence of the mechanism at work here."

No. *If*, as originally suggested, the game were symmetric except for certain possibly useful choices that the player could make and the dealer could not, the expected winnings of the player would be positive, and the expected winnings of the house negative, on each individual game [assuming intelligent play]. Now, E(sum(X_i)) = sum(E(X_i)) regardless of distribution or even joint distribution. So the house would lose in the long run *because* it lost in the short run, not despite that. What is going on is that the rules of the game are not as was supposed. Ties in which both hands are under 22 are [with some exceptions? help me!] no-win-no-lose, but if both player and dealer bust, the house wins.

-Robert
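Robert's point, that the both-bust rule alone hands the edge to the house even when both sides play the identical strategy, can be checked with a toy simulation. This is a deliberately crude model (hands drawn as random card sums, everyone standing on 17; all of it my own simplification, not real blackjack):

```python
import random

def hand_total(rng):
    """Toy hand: draw 'cards' worth 2-11 until reaching 17 or more."""
    total = 0
    while total < 17:
        total += rng.randint(2, 11)
    return total

def play(rng):
    """Both sides play the identical fixed rule; the only asymmetry is that
    the house wins when BOTH bust. Returns +1 player win, -1 loss, 0 push."""
    player, dealer = hand_total(rng), hand_total(rng)
    if player > 21:
        return -1          # player busts: house wins even if the dealer busts too
    if dealer > 21:
        return +1
    return (player > dealer) - (player < dealer)

rng = random.Random(3)
n = 200_000
mean = sum(play(rng) for _ in range(n)) / n
print(round(mean, 3))      # negative: the both-bust rule alone creates the edge
```

Working out the symmetry: if q is the bust probability, the non-bust comparisons and the one-side-bust cases cancel in expectation, leaving an expected value of -q^2 for the player, exactly the "house wins double busts" asymmetry.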
Re: Process Capability / Specification Limits
Ed,

Was the spec written with an understanding of the measurement resolution? Why not ask whoever wrote the spec? I have been following numerous discussions through other sources about design, GD&T, and metrology. Miscommunication is a major problem. Statistics won't help you decide what the person who wrote the spec meant.

Jeff Falk

[EMAIL PROTECTED] wrote: "I'm doing a process capability study. The spec is 1.1 +/- .1. What is the argument against using limits of .95 and 1.24? The idea is any measurement within this window would round within the actual spec. If the characteristic must be measured with a resolution of .05, does this change the argument? Any help to settle this argument is greatly appreciated. Ed"
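The rounding argument in the question can be made concrete: the widened limits accept measurements that are genuinely outside the spec but happen to round into it. A toy check (spec limits from the post; the helper name is mine):

```python
# Spec is 1.1 +/- 0.1, i.e. [1.0, 1.2]. The proposal is to accept any
# measurement that would ROUND (to one decimal) into the spec, which
# silently widens the acceptance window beyond the spec itself.
lsl, usl = 1.0, 1.2

def rounds_into_spec(x, decimals=1):
    return lsl <= round(x, decimals) <= usl

# Values just outside the spec that nevertheless round into it:
print(rounds_into_spec(0.96))   # 0.96 rounds to 1.0 -> accepted
print(rounds_into_spec(1.24))   # 1.24 rounds to 1.2 -> accepted
print(rounds_into_spec(0.94))   # 0.94 rounds to 0.9 -> rejected
```

Whether that widening is legitimate is exactly the question of what the spec writer intended the limits to mean, which is Jeff's point about asking them.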
no correlation assumption among X's in MLR
Besides independent normal errors with mean zero and constant variance, some (many?) econometrics textbooks do make the assumption that the independent variables are uncorrelated. For example, see Gujarati, Damodar (1988), _Basic Econometrics_, 2nd edition, McGraw-Hill, p. 166.

Mark Eakin
Associate Professor
Information Systems and Management Sciences Department
University of Texas at Arlington
[EMAIL PROTECTED] or [EMAIL PROTECTED]
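One reason the assumption gets discussed at all is that correlation among the X's inflates the sampling variance of the OLS coefficient estimates, even though the estimates stay unbiased. A minimal sketch on simulated data (all names and numbers are mine, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

def coef_se(rho):
    """OLS standard error of the x1 coefficient when corr(x1, x2) = rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.standard_normal(n)
    A = np.column_stack([np.ones(n), X])           # design matrix with intercept
    beta, res, *_ = np.linalg.lstsq(A, y, rcond=None)
    sigma2 = res[0] / (n - 3)                      # residual variance estimate
    return np.sqrt(sigma2 * np.linalg.inv(A.T @ A)[1, 1])

se_low, se_high = coef_se(0.0), coef_se(0.95)
print(se_low < se_high)    # correlated regressors inflate the standard error
```

With corr = 0.95 the variance inflation factor is 1/(1 - 0.95^2), roughly 10, so the standard error is about three times larger than in the uncorrelated case.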
Re: Data Mining blooper and Related Subjects
I have been following the discussion on the Data Mining blooper for a while. Being a first-year graduate student in statistics, my comments on this issue might sound premature. Nevertheless, I would put forward my observations. What I have learnt so far from my interaction with statisticians in academics as well as in industry is the following: 1) Many statisticians still feel that "Data Mining" as a discipline should be left to the people in computer science. Of course, I don't agree with this statement at all. If you read the paper "Data Mining and Statistics" by Dr. J. Friedman, you would realize how statisticians have neglected this emerging field over the last few years. 2) There are few statistics graduate programs which emphasize "Data Mining" research. Of course, there are a few, like Carnegie Mellon. But overall, we are yet to give it the much-needed attention. I think now is the time when we have to decide: "Do we accept DATA MINING as a part of statistics, or do we keep neglecting this field as before?" I am sure there would be a few statistics students like me who feel that Data Mining is very much a part of statistics. Thanks, Debasmit -- Debasmit Mohanty, Graduate Student - Statistics http://bama.ua.edu/~mohan001/ -- Date: Wed, 26 Apr 2000 11:38:28 -0400 From: dennis roberts [EMAIL PROTECTED] Subject: At 07:57 AM 4/26/00 -0500, Herman Rubin wrote: It does not surprise me one bit. The typical statistics course teaches statistical methods and pronouncements, with no attempt to achieve understanding. snip of more this is something i happen to agree with herman about ... but, it is a much broader problem than can be attributed to what happens in one course it is an attitude about what higher education is all about ... and what the goals are for it 'going to college' ... be it undergraduate level or graduate level ... has become a much more hit and miss experience, residence has little meaning ...
that is being tailored more and more to the convenience of students ... and to what is 'user' friendly (or it won't SELL). studying principles in disciplines is hard work ... NOT user friendly ... so, less and less is being required in the way of diligent study. take graduate school for example ... there was a time, was there not ... where doctoral students were REALLY expected to be responsible for their dissertations AND were expected to be the experts in that particular area of inquiry ... AND to be competent enough to have done the work him/herself ... and to UNDERSTAND it ... i.e., BE ABLE TO DEFEND ALL OF IT but, what i have noticed over many years is that dissertations are becoming more of a committee effort ... yes, the student MAY have had the idea (though not necessarily) but, from there ... he/she gets help with the design ... has someone else do the analysis (because he/she did not take any/sufficient work in analytic methods to understand what is going on) ... gets help in writing and editing ... and, even gets help in terms of what their results MEAN ... gives new meaning to the term: "cooperative learning"
Unbalanced Nested (Hierarchical) Design
I have an UNBALANCED nested (also called hierarchical) design with Factor A being fixed and Factor B (within A) random. So my ANOVA has the line entries (for source): A, B(A), Error (or within cell), and Total. I am looking for the expected mean squares and approaches for computing confidence intervals on the mean for different levels of A. Any help or reference will be highly appreciated. Arvind Shah, Univ of South Alabama
Re: truncated Binomial
In article 8e7etv$msp$[EMAIL PROTECTED], [EMAIL PROTECTED] wrote: Hi, Could anybody tell me how to write the density of the binomial distribution when x=0 is not observed? Will the MLE of p be different from X-bar in the case of a truncated binomial? How about the variance and the bias of this estimator? The probability distribution will be the conditional distribution. Y = X-bar is still a sufficient statistic, but unless Y = n, it will not be the MLE. In fact, if Y = 1, the MLE is 0. The MLE satisfies (1 - q^n)*Y = np, where q = 1 - p. The asymptotic mean and variance can be computed in the usual manner for regular problems, but the actual mean and variance are not simple. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765) 494-6054 FAX: (765) 494-0558
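Rubin's likelihood equation can be solved numerically. A minimal sketch (the function name is hypothetical; standard library only), assuming a zero-truncated Binomial(n, p) sample whose mean Y-bar lies strictly between 1 and n so that a root exists in (0, 1):

```python
def truncated_binomial_mle(ybar, n):
    """Solve (1 - (1-p)**n) * ybar = n * p for p by bisection.

    ybar is the sample mean of a zero-truncated Binomial(n, p) sample;
    a root in (0, 1) exists when 1 < ybar < n.
    """
    f = lambda p: (1.0 - (1.0 - p) ** n) * ybar - n * p
    lo, hi = 1e-12, 1.0 - 1e-12   # f(lo) > 0 and f(hi) < 0 when 1 < ybar < n
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:   # sign change in [lo, mid]: root is there
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Check: with n = 5, p = 0.4, the zero-truncated mean is n*p / (1 - (1-p)**n)
n, p = 5, 0.4
ybar = n * p / (1 - (1 - p) ** n)
print(truncated_binomial_mle(ybar, n))  # recovers p = 0.4 (approximately)
```

This simply inverts the truncated-mean relation E[X | X > 0] = np / (1 - q^n) at the sample mean, which is Rubin's equation rearranged.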
Re: no correlation assumption among X's in MLR
At 11:09 AM 4/28/00 -0500, EAKIN MARK E wrote: Besides independent normal errors with mean zero and constant variance, some (many?) econometric text books do make the assumption that the independent variables are uncorrelated. For example see Gujarti, Damodar (1988), _Basic Econometrics 2nd edition_, McGraw Hill, p. 166 first, this would only possibly apply in the inferential situation, using r to estimate rho ... but has nothing to do with the correlation between X and Y (r) in the data set at hand and what assumptions are made about the correlation coefficient ... and secondly, independent variables are either correlated with each other (non 0) or not ... thus, only for some specific application ... such as ... how can we maximize the multiple R between a set of predictors AND a criterion ... would such a statement make sense ... and there it is not even an assumption ... just a limiting case for R sure, for some specific econometric model based on some theory ... one might assume this to SIMPLIFY THE MODEL but ... that has nothing to do with independent variables per se
Re: Statistical Software
On Fri, 28 Apr 2000 [EMAIL PROTECTED] wrote: I need to find a statistical software package. Most of my statistical work has been done using Microsoft Excel. This has worked out fine; however, I need to find a more heavy-duty package, but nothing overwhelming. I perform some simple statistical work but would like to begin to use a more powerful package. Any suggestion would be great. I like Minitab, myself. One virtue may be that it behaves in some ways like a spreadsheet, and the data are stored (and displayed, if desired) in what Minitab calls a "worksheet", which looks very much like the database display of a spreadsheet package. Its command language is straightforward and easy to learn, although these days Minitab Inc. seems to be downplaying that particular advantage in favor of menu-driven controls. If you are a student, I believe there is a special deal available from Minitab; perhaps one of my colleagues whose knowledge is more immediate than mine will care to comment. -- DFB. Donald F. Burrill [EMAIL PROTECTED] 348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED] MSC #29, Plymouth, NH 03264 603-535-2597 184 Nashua Road, Bedford, NH 03110 603-471-7128
Re: Data Mining blooper and Related Subjects (fwd)
- Forwarded message from Debasmit Mohanty - I think now is the time when we have to decide: "Do we accept DATA MINING as a part of statistics, or do we keep neglecting this field as before?" I am sure there would be a few statistics students like me who feel that Data Mining is very much a part of statistics. - End of forwarded message from Debasmit Mohanty - It may be a disagreement over words. Much of the work Tukey et al. did in the 60s, called exploratory data analysis, had to do with looking at data and trying to detect patterns. However, if you sift through data you will find many "patterns" that are just flukes of chance. How do you avoid taking these seriously? This was a criticism directed at Tukey then, and even more so at what goes on today under the name of "Data Mining". But I have a sense that Tukey had a much deeper awareness of the underlying statistical issues than most of the miners have!-) Robert W. Hayden, Department of Mathematics, Plymouth State College MSC#29, Plymouth, New Hampshire 03264 USA; Rural Route 1, Box 10, Ashland, NH 03217-9702; (603) 968-9914 (home); [EMAIL PROTECTED]; fax (603) 535-2943 (work)
Re: no correlation assumption among X's in MLR
On Fri, 28 Apr 2000, EAKIN MARK E wrote: Besides independent normal errors with mean zero and constant variance, some (many?) econometric text books do make the assumption that the independent variables are uncorrelated. For example see Gujarti, Damodar (1988), _Basic Econometrics 2nd edition_, McGraw Hill, p. 166 One is always at liberty to make additional assumptions, especially if there are some useful purposes to be served thereby. The assumption that predictors are uncorrelated would be such an additional assumption. It is not necessary for any known purpose in MLR qua MLR; it may be necessary (though frankly I can't think why, but then I'm not an econometrician), or perhaps useful, in some econometric models. I _am_ curious, though: If one is in the midst of a real-world problem of the kind that Professor Gujarti would wish to address, and the real predictors one has ARE correlated, what does one do? Throw up one's hands in despair and wail, "It can't be done!"? -- DFB. Donald F. Burrill [EMAIL PROTECTED] 348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED] MSC #29, Plymouth, NH 03264 603-535-2597 184 Nashua Road, Bedford, NH 03110 603-471-7128
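Burrill's point is easy to demonstrate by simulation: ordinary least squares recovers the coefficients perfectly well when the predictors are correlated, so long as they are not perfectly collinear. A minimal sketch (all data and coefficient values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two deliberately correlated predictors (correlation about 0.8)
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)

# True model: y = 1 + 2*x1 - 3*x2 + noise
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

# Ordinary least squares via the design matrix [1, x1, x2]
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [1, 2, -3] despite the correlated predictors
```

The only price paid for the correlation is inflated standard errors on the individual slopes, not bias; no "uncorrelated X's" assumption is needed for the estimates themselves.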
Re: Unbalanced Nested (Hierarchical) Design
On Fri, 28 Apr 2000, Arvind Shah wrote: I have an UNBALANCED nested (also called hierarchical) design with Factor A being fixed and Factor B (within A) random. So my ANOVA has the line entries (for source): A, B(A), Error (or within cell), and total. I am looking for the expected mean squares and approaches for computing confidence intervals on the mean for different levels of A. Any help or reference will be highly appreciated. When you write "unbalanced", do you mean only that the number of cases within each cell is not equal in all cells, or do you mean the more serious problem that the number of levels of Factor B differs between levels of A? If the former, perhaps the simplest approach would be an unweighted means analysis (which really means "equally weighted", not "UNweighted"!), for which the expected mean squares would be pretty much what they'd be for a balanced design (especially if the unbalancing is not really severe). Confidence intervals on the means for different levels of A might want to vary according to the number of cases in each level; confidence intervals on the _differences_ between means would be more difficult. Alternatively, cast the entire problem into multiple regression format, using indicator variables of one kind or another to represent the several levels of A and of B. -- DFB. Donald F. Burrill [EMAIL PROTECTED] 348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED] MSC #29, Plymouth, NH 03264 603-535-2597 184 Nashua Road, Bedford, NH 03110 603-471-7128
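For reference, in the balanced version of this design (a fixed levels of A, b random levels of B nested within each A, n replicates per cell) the standard expected mean squares, which the unweighted-means analysis approximates in the mildly unbalanced case, are:

```latex
E[\mathit{MS}_E] = \sigma^2 \\
E[\mathit{MS}_{B(A)}] = \sigma^2 + n\,\sigma_B^2 \\
E[\mathit{MS}_A] = \sigma^2 + n\,\sigma_B^2 + \frac{bn\sum_{i=1}^{a}\alpha_i^2}{a-1}
```

So A is tested against MS_B(A), and a confidence interval for the mean of a given level of A uses MS_B(A)/(bn) as the variance estimate, with the a(b-1) degrees of freedom of MS_B(A); in the unbalanced case these serve only as approximations, as Burrill notes.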
Re: Statistical Software
see http://www.e-academy.com ... for lots of software ... including minitab at 'rental' prices ... At 02:04 PM 4/28/00 -0400, Donald F. Burrill wrote: On Fri, 28 Apr 2000 [EMAIL PROTECTED] wrote: I need to find a statistical software package. Most of my statistical work has been done using Microsoft Excel. This has worked out fine; however, I need to find a more heavy-duty package, but nothing overwhelming. I perform some simple statistical work but would like to begin to use a more powerful package. Any suggestion would be great. I like Minitab, myself. footnote to don's comment re: command language ... not only does minitab downplay it ... and has been for several releases now ... they almost don't even acknowledge that it exists ... it does! == dennis roberts, penn state university educational psychology, 814-863-2401 http://roberts.ed.psu.edu/users/droberts/droberts.htm
Grad Student needs guidance
I am a graduate student in an engineering program which emphasizes statistical methods for process improvement and/or product development. I have found that I love applying statistical methods to process/product development and testing. I would not even mind a company that is developing software applying statistics to process/product development. In what types of jobs could I actually apply these concepts? In other words, what should I be searching for as far as job titles? Your help would be greatly appreciated. -- Charles Madewell Implementation of Technology, Process/Product Development; Statistical Design and Analysis of Experiments, Regression Analysis Model Building.
What is the logarithmic distribution? (many questions)
General question: I've seen two descriptions of the "logarithmic distribution". One is related to the frequency of digits, called Benford's law (digit 1 occurs more frequently than 2, 2 than 3, etc.), whose explanation is that it is the result of a mixture of distributions. The other description is a two-page passage, "The logarithmic distribution", in Kendall and Stuart (1977, The Advanced Theory of Statistics, Vol. 1, 4th edition, pp. 139-140), attributing the derivation to Fisher (1943). Are these concepts of the logarithmic distribution the same or not? Second question I would like to ask: Kendall and Stuart give an example of a distribution of the logarithmic type from Fisher (1943), "distribution of butterflies in Malaya, with theoretical frequencies given by the logarithmic distribution":

No. of species   Theoretical frequency   Observed frequency
      1                135.05                  118
      2                 67.33                   74
      3                 44.75                   44
      4                 33.46                   24
      5                 26.69                   29
      6                 22.17                   22
      7                 18.95                   20
    etc ...

From what I've understood, the theoretical frequency was generated by -(q^r) / (r * ln(1-q)), in which r is the No. of species and q is the probability of the presence of an attribute. How was, or how can, the fit be realized? With thanks in advance, Vincent Vinh-Hung
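The pmf Kendall and Stuart give is easy to work with numerically. A sketch (the value of q below is illustrative, not Fisher's fitted value); in practice the fit is typically realized by equating the sample mean to the distribution mean, -q / ((1-q) ln(1-q)), and solving for q, then multiplying the pmf by the observed total to get expected frequencies:

```python
import math

def log_series_pmf(r, q):
    """Logarithmic-series probability: P(R = r) = -q**r / (r * ln(1-q)),
    for r = 1, 2, ... and 0 < q < 1."""
    return -q ** r / (r * math.log(1.0 - q))

q = 0.997  # illustrative value; Fisher-type fits for abundant fauna put q close to 1
total = sum(log_series_pmf(r, q) for r in range(1, 20001))
print(total)  # close to 1.0: the pmf is properly normalized
```

Note the expected frequencies in the table above are N times these probabilities, where N is the total number of species; solving the mean equation for q must be done numerically (e.g., by bisection), since it has no closed form.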
Re: Question about kappa
On 27 Apr 2000 13:24:01 -0700, [EMAIL PROTECTED] (Robert McGrath) wrote: I am looking for a formula for kappa that applies in very special circumstances: 1) Two raters rated each event, but the raters varied across events. 2) The study involved 100 subjects, each of whom generated approximately 17 events, so multiple events were generated by the same subject. I know Fleiss has developed a formula for kappa that allows for multiple sets of raters, but is there a formula that is appropriate for the circumstance I have described? Thanks for your help! I think it was Fleiss who stated that for complex situations, the kappa is usually equal to the Intraclass Correlation (ICC), to the first two decimal places. So all you need to do is this: Define the appropriate ANOVA table, and decide on the appropriate version of the ICC. My stats-FAQ has a reference on ICC for an unbalanced design. It entails approximations, so I hope the design is not *too* unbalanced. snip, McGrath sig. snip, Bob Wheeler post; included for no imaginable reason. snip, quoting of Edstat-L message from the bottom of Bob Wheeler's post snip, Edstat-L message -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
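Ulrich's ICC route is concrete enough to sketch. For the one-way case (each target rated by a different set of k raters, which matches "raters varied across events"), the ICC comes straight from the one-way random-effects ANOVA mean squares: ICC(1,1) = (MSB - MSW) / (MSB + (k-1) MSW). A sketch with made-up ratings; this is the balanced illustration, not the unbalanced-design approximation from the FAQ:

```python
import numpy as np

def icc_oneway(ratings):
    """ICC(1,1) from an (n_subjects x k_raters) array of ratings,
    using one-way random-effects ANOVA mean squares."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    # Between-subjects mean square, df = n - 1
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)
    # Within-subjects mean square, df = n * (k - 1)
    msw = ((ratings - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Perfect agreement between the two raters gives ICC = 1
print(icc_oneway([[1, 1], [2, 2], [3, 3], [4, 4]]))  # -> 1.0
```

With real two-rater data this number should sit within a couple of decimal places of kappa, per Fleiss's observation quoted above.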
Re: Is Bootstrapping Appropriate?
Date: Fri, 28 Apr 2000 16:04:34 -0400 From: Rich Ulrich [EMAIL PROTECTED] On Fri, 28 Apr 2000 03:31:45 -0400, Greg Heath [EMAIL PROTECTED] wrote: snip, various My simulation currently assumes that the residuals are Gaussian. If this is a bad assumption, I need to know ASAP to prevent higher level decision makers from making some very costly mistakes. ... Herman Rubin suggested doing simulations, and that seems like a good approach. How much sensitivity is there to assumptions? What sort of things will change the outcomes, and by how much? Can you reproduce your present data with your favored assumptions? Can you reproduce your present data with dangerous assumptions? Good questions. Currently, answers unknown. Will respond when I have them. Thank you. Greg Hope this helps. Gregory E. Heath, [EMAIL PROTECTED], M.I.T. Lincoln Lab, Lexington, MA 02420-9185, USA; (781) 981-2815, FAX (781) 981-0908. The views expressed here are not necessarily shared by M.I.T./LL or its sponsors.
Re: Is Bootstrapping Appropriate?
From: Herman Rubin [EMAIL PROTECTED] Newsgroups: sci.stat.consult, sci.stat.edu, sci.stat.math In article Pine.SOL.3.91.1000428033622.20399C-10@miles, Greg Heath [EMAIL PROTECTED] wrote: Date: Fri, 28 Apr 2000 00:00:45 GMT From: [EMAIL PROTECTED] ... One variable, 20 measurements per second, 26.25 seconds (526 measurements). The 1/e decorrelation time estimated from the autocorrelation function is ~1 second. Therefore, I will get independent measurements approximately every T0 seconds (probably ~2 <= T0 <= ~4 sec). Could these correlated measurements have come from a Gaussian distribution? Please see my responses to the other replies. Bootstrapping is totally inappropriate. However, there are other, simpler simulation methods of obtaining the significance level, using any test statistic you wish to use, assuming you are willing to use the particular value of the correlation coefficient and you are using a scale-invariant test. The variance will not affect your test in this problem. BTW, this method is the one used for obtaining significance levels for the Kolmogorov-Smirnov test when parameters are estimated. Construct samples according to the null hypothesis. The samples should be independent; the dependence within each sample should follow the model. Then use the empirical distribution to determine the significance of your data set. Sounds good. Thank you. However, I'm surprised that simulation is still necessary if the measurements were independent instead of correlated. Warren Sarle (private communication) commented that decorrelating the series using ARIMA and testing the residuals is also a valid approach. However, since the variance is estimated, I'd still have to use the simulation approach to obtain the significance level. Is that correct? Greg Hope this helps. Gregory E. Heath, [EMAIL PROTECTED], M.I.T. Lincoln Lab, Lexington, MA 02420-9185, USA; (781) 981-2815, FAX (781) 981-0908. The views expressed here are not necessarily shared by M.I.T./LL or its sponsors.
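Rubin's recipe (simulate the null with the dependence built into each sample, then read the significance level off the empirical distribution) can be sketched as follows. Everything here is illustrative: an AR(1) process stands in for the actual dependence model (phi ~ 0.95 corresponds roughly to a 1-second 1/e decorrelation time at 20 samples/sec), and the sample maximum stands in for whatever test statistic is actually of interest:

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1_series(n, phi, rng):
    """Simulate a zero-mean, unit-variance Gaussian AR(1) series with
    lag-1 correlation phi."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + np.sqrt(1.0 - phi ** 2) * rng.normal()
    return x

def mc_pvalue(stat, observed, n, phi, n_sim=500, rng=rng):
    """Monte Carlo significance level: simulate independent samples from
    the null (correlated Gaussian), build the empirical distribution of
    the statistic, and locate the observed value in it."""
    sims = np.array([stat(ar1_series(n, phi, rng)) for _ in range(n_sim)])
    return (sims >= observed).mean()

# Toy use: is the maximum of an observed 526-point series unusually large?
obs = ar1_series(526, 0.95, rng)   # here the "observed" series is itself null
p = mc_pvalue(np.max, np.max(obs), n=526, phi=0.95)
print(p)  # an empirical p-value in [0, 1]
```

As Rubin notes, for a scale-invariant statistic the unknown variance drops out, so only the correlation structure needs to be fixed for the null simulation.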
Re: Hypothesis
The EDSTAT traffic after the initial submission by Dennis Roberts on 4/7/2000 interested me. A lot of good thoughts on teaching a fundamental concept. His proposal resulted in a total of 117 messages up to 4/27/2000. This may be a record for comments on a single theme. It struck a chord with 25 separate individuals. Here is my tally.

Name               Number of messages
Dennis Roberts     23
Robert Dawson      18
Herman Rubin       16
Michael Granaas    13
Alan McLean         7
Bruce Weaver        5
Alan Hutson         4
David Heiser        4
Donald Burrill      4
Rich Ulrich         4
Henry Silvert       3
Jon Cryer           2
Thom Baguley        2
Art Kendall         1
Bill Knight         1
Chris Mecklin       1
I Williams          1
Jerrold Zar         1
Jerry Dallal        1
Joe Ward            1
Juha Puranen        1
Magil Brett         1
Milo Schield        1
Richard Barton      1
Robert McGrath      1

The count of replies down to the first single reply follows a Pareto distribution reasonably well. For some, this may be their first encounter with the Pareto distribution, since it is rarely discussed in stat textbooks. DAHeiser
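Heiser's Pareto (power-law) claim is easy to eyeball: on a log-log scale, message count versus rank should be roughly linear, i.e., strongly negatively correlated. A quick check using the tally above (standard library only):

```python
import math

# Message counts per author from Heiser's tally: 13 authors with more than
# one message, plus 12 authors with a single message each (117 total)
counts = sorted([23, 18, 16, 13, 7, 5, 4, 4, 4, 4, 3, 2, 2] + [1] * 12,
                reverse=True)
ranks = range(1, len(counts) + 1)

x = [math.log(r) for r in ranks]
y = [math.log(c) for c in counts]

# Pearson correlation of log(rank) against log(count)
mx, my = sum(x) / len(x), sum(y) / len(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
print(round(r, 3))  # strongly negative: counts fall off roughly as a power of rank
```

A correlation this close to -1 is consistent with Heiser's observation, though with only 25 authors it is no more than a rough diagnostic.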