Re: cigs & figs
- in respect of the up-coming U.S. holiday -

On Mon, 25 Jun 2001 11:49:47 GMT, mackeral@remove~this~first~yahoo.com
(J. Williams) wrote:

> On Sun, 24 Jun 2001 16:37:48 -0400, Rich Ulrich <[EMAIL PROTECTED]>
> wrote:
> >
> >What rights are denied to smokers?

JW >
> Many smokers, including my late mother, feel being unable to smoke on
> a commerical aircraft, sit anywhere in a restaurant, etc. were
> violation of her "rights." I don't agree as a non-smoker, but that
> was her viewpoint until the day she died.

What's your point: She was a crabby old lady, whining (or whinging)
about fancied 'rights'?

You don't introduce anything that seems "inalienable" or
"self-evident" (if I may introduce July-4th language). Nobody stopped
her from smoking as long as she kept it away from other
people-who-would-be-offended.

Okay, we form governments to help assure each other of rights. Lately,
the law sees fit to stop some assaults from happening, even though it
did not always do that in the past - the offender still has quite a
bit of leeway; if you don't cause fatal diseases, you legally can
offend quite a lot. We finally have laws about smoking. But she wants
the law to stop at HER convenience?

[ snip, various ]

JW >
> Talking about confused and/or politically driven, what do Scalia and
> Thomas have to do with smoking rights? Please cite the case law.

I mention "rights" because that did seem to be an attitude you
mentioned that was (as you see) provocative to me.

I toss in S & T, because I think that, to a large extent, they share
your mother's preference for a casual, self-centered definition of
rights. And they are Supreme Court justices. [ Well, they don't say,
"This is what *I* want"; these two translate the blame/credit to
Nature (euphemism for God). ]

So: I don't fault your mother *too* harshly, when Justices hardly do
better. Even though a prolonged skew was needed, to end up with two
like this.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
Re: about a problem of khi2 test
On Sun, 01 Jul 2001 14:19:31 +0200, Bruno Facon <[EMAIL PROTECTED]>
wrote:

> I work in the area of intelligence differentiation. I would like to know
> how to use the khi2 statistic to determine whether the number of
> statistically different correlations between two groups is due or not to
> random variations. In particular I would like to know how to determine
> the expected numbers of statistically different correlations due to
> chance.
> Let me take an example. Suppose I compare two correlations matrices of
> 45 coefficients obtained from two independent groups (A and B). If there
> is no true difference between the two matrices, the number of
> statistically different correlations should be equal to 1.25 in favor of

Yes, that is the number. But there is not a legitimate test that I
know of, unless you are willing to make a strong assumption that no
pair of the variables should be correlated.

I never heard of the khi2 statistic before this. I searched with
google, and found a respectable number of references, and here is
something that I had not seen with a statistic: khi2 appears to be
solely French in its use. Of the first 50 hits, most were in French,
at French ISPs (.fr). The few that were in English were also from
French sources. One article had a reference (not available in my local
libraries): Freilich MH and Chelton DB, J Phys Oceanogr 16, 741-757.

> > group A and equal to 1.25 in favor of group B (in case of alpha = .05).
> > Consequently, the expected number of nonsignificant differences should
> be 42.75. Is my reasoning correct?

It would be nice to test the numbers, but I don't credit that
reference as a good one, yet.

I don't remember for sure, but I think you might be able to compare
two correlation matrices with programs from Jim Steiger's site,
http://www.interchg.ubc.ca/steiger/multi.htm

On the other hand, you would be better off if you can compare the
entire covariance structures, to keep from making accidental
assumptions about variances. (Does Jim provide for that?)

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
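As a quick arithmetic check on the expected counts being discussed --
and only under the strong independence assumption flagged in the reply
above -- a short Python sketch; the "observed" count of 6 is invented
for illustration:

from scipy import stats

n_comparisons = 45
alpha = 0.05

expected_sig = n_comparisons * alpha              # 2.25 "significant" by chance, in total
expected_nonsig = n_comparisons - expected_sig    # 42.75 nonsignificant
print(expected_sig, expected_nonsig)

# With, say, 6 observed significant differences, an exact binomial test
# against p = .05 is one simple version of the idea -- but it is only
# valid if the 45 comparisons were independent, which correlations
# computed on overlapping variables are not.
observed_sig = 6
p_value = stats.binom.sf(observed_sig - 1, n_comparisons, alpha)
print(p_value)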
Re: Maximum Likelihood
On 28 Jun 2001 20:39:18 -0700, [EMAIL PROTECTED] (Mark W. Humphries) wrote: > Hi, > > Does anyone have references to a simple/intuitive introduction to Maximum > Log Likelihood methods. > References to algorithms would also be appreciated. > Look on the Internet. I used www.google.com to search on "maximum likelihood" tutorial (put the phrase in quotes to keep it together; or you can use Advanced search) There were MANY hits, and the second reference was in a tutorial that begins at http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_2.html The third reference was for some programs and examples in Gauss (a programming language) by Gary King at Harvard, in his application area. If these aren't worthwhile (I did not try to download anything), there are plenty of other sites to check. [ I am intrigued by G. King, a little. This is the fellow who putatively has a method, not Heckman's, for overcoming or compensating for aggregation bias. Which I never found available for free. But, too bad, the page says these programs go with his 1989 book, and I think his Method is more recent.] -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
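For anyone who wants a concrete toy example to go with those
tutorials, here is a minimal Python sketch of maximum likelihood by
direct minimization of a negative log-likelihood; the data and the
normal model are invented purely for illustration:

import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)   # invented sample

def neg_log_lik(params):
    mu, log_sigma = params            # work with log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(stats.norm.logpdf(data, loc=mu, scale=sigma))

result = optimize.minimize(neg_log_lik, x0=[0.0, 0.0])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)              # close to the sample mean and SD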
Re: Help with stats please
On 24 Jun 2001 13:54:56 -0700, [EMAIL PROTECTED] (dennis roberts)
wrote:

> At 12:20 PM 6/24/01 -0700, Melady Preece wrote:
> >Hi. I am teaching educational statistics for the first time, and although I
> >can go on at length about complex statistical techniques, I find myself at a
> >loss with this multiple choice question in my test bank. I understand why
> >the range of (b) is smaller than (a) and (c), but I can't figure out how to
> >prove that it is smaller than (d).
> >
> >If you can explain it to me, I will be humiliated, but grateful.
> >
> >
> >1. Which one of the following classes had
> > the smallest range in IQ scores?

dr >
> of course, there is nothing about the shape of the distribution of any
> class ... so, does the item assume sort of normal? in fact, since each of
> these classes is probably on the small side ... it would be hard to assume
> that but, for the sake of the item ... pretend
> [ snip ]

Good point, about normality. And who provides the "test bank" of
items?

The testee has to *assume* a certain amount of normality, which is not
stated; and you have to *assume* that the N is greater than 2 -- or
else the claim is *not* true.

It seems to me that when the reader has to supply unstated technical
assumptions like these, the test-validator should be careful: I
suspect that success on THIS item is context-dependent. There is less
problem, if everyone is always given exactly the same test. That *is*
an issue, if different sets of items are extracted for use, at
different times -- which is what I think of, when I hear "item bank."

Could other items clue this answer? That is, Do other items STATE
those assumptions? Do other items REQUIRE those assumptions if you are
going to answer them? - If the user has seen items in his selection
from the "bank", is he more apt to make the intended assumptions here?

I expect that a conscientious scale developer is interested in
minimizing the work required for validation; and he would avoid this
problem if he noticed it.

Answer (b) seems right, if the reader is supposed to describe what you
would expect 'for moderate sized samples, with scores that are
continuous and approximately normal.'

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
Re: cigs & figs
- re: some outstandingly confused thinking. Or writing. On Sat, 23 Jun 2001 15:25:31 GMT, mackeral@remove~this~first~yahoo.com (J. Williams) wrote: [ snip; Slate reference, etcetera ] > ... My mother was 91 years > old when she died a year ago and chain smoked since her college days. > She defended the tobacco companies for years saying, "it didn't hurt > me." She outlived most of her doctors. Upon quoting statistics and > research on the subject, her view was that I, like other "do gooders > and non-smokers," wanted to deny smokers their rights. What statistics would her view quote? to show that someone wants to deny smokers 'their rights'? [ Hey, I didn't write the sentence ] I just love it, how a 'natural right' works out to be *exactly* what the speaker wants to do. And not a whit more. (Thomas and Scalia are probably going to give us tons of that bad philosophy, over the next decades.) What rights are denied to smokers? You know, you can't build your outhouse right on the riverbank, either. >Obviously, > there is a health connection. How strong that connection is, is what > makes this a unique statistical conundrum. How strong is that connection? Well, quite strong. I once considered that it might not be so bad to die 9 years early, owing to smoking, if that cut off years of bad health and suffering. Then I realized, the smoking grants you most of the bad health of old age, EARLY. (You do miss the Alzheimer's.) One day, I might give up smoking my pipe. What is the statistical conundrum? I can almost imagine an ethical conundrum. ("How strongly can we legislate, to encourage cyclists to wear helmets?") I sure don't spot a statistical conundrum. Is this word intended? If so, how so? -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Marijuana
- I will delete most, and comment on a few points. Maybe further posts
will delete the sci.stat.* groups -

On Fri, 22 Jun 2001 20:49:02 GMT, Steve Leibel <[EMAIL PROTECTED]>
wrote:

> In article <[EMAIL PROTECTED]>,
> Rich Ulrich <[EMAIL PROTECTED]> wrote:
> [ ... ]
>
> Hallucinating? On pot? What are YOU smokin'? Pot doesn't cause
> hallucinations -- although a lot of anti-drug hysteria certainly does.

I read 30 years ago that the pharmacologists classed it as a
hallucinogen, and then I discovered why. Then I got bored and quit. At
least the stuff is not addictive.

Should I conclude from your comments that this domestic, sinsemilla
stuff I read about is grossly inferior to the imports of old?

> A cursory web search turned up these links among many others to support
> my statement. Naturally this subject is controversial and there are
> lots of conflicting studies. ...

- even the first one you cite includes ample support for what I
posted. After saying other negative things,

> http://www.norml.org/canorml/myths/myth1.shtml

says

' The second NHTSA study, "Marijuana and Actual Driving Performance,"
concluded that the adverse effects of cannabis on driving appear
"relatively small" and are less than those of drunken driving." '

"... less than those of drunken driving" is *not* refutation of what I
wrote. That article supports me rather fully: intoxicants help people
have accidents. Arguments to the contrary are (it seems to me)
supported by wishful thinking that causes the arguments to blur before
your very eyes.

> > And "stranger named Steve?" I've been on this newsgroup since 1995.
> Not as famous as James Harris, maybe, but certainly no stranger.

- sorry. Today's check with groups.google.com shows me you post
frequently in sci.math -- which I don't read. This thread, you may
note (as I just noted), is posted there, and crossposted to three
sci.stat.* groups, where I do participate.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
Re: Marijuana
On Fri, 22 Jun 2001 18:45:52 GMT, Steve Leibel <[EMAIL PROTECTED]>
wrote:

> In article <[EMAIL PROTECTED]>,
> [EMAIL PROTECTED] (Eamon) wrote:
>
> > (c) Reduced motor co-ordination, e.g. when driving a car
> >
>
> Numerous studies have shown that marijuana actually improves driving
> ability. It makes people more attentive and less aggressive. You could
> look it up.

An intoxicant does *that*? I think I recall in the literature, that
people getting stoned, on whatever, occasionally *think* that their
reaction time or sense of humor or other performance is getting
better.

Improving your driving by getting mildly stoned (omitting the episodes
of hallucinating) seems unlikely enough, to me, that *I* think the
burden of proof is on the stranger named Steve.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
Re: a form of censoring I have not met before
On 21 Jun 2001 00:35:11 -0700, [EMAIL PROTECTED] (Margaret Mackisack) wrote: > I was wondering if anyone could direct me to a reference about the > following situation. In a 3-factor experiment, measurements of a continuous > variable, which is increasing monotonically over time, are made every 2 > hours from 0 to 192 hours on the experimental units (this is an engineering > experiment). If the response exceeds a set maximum level the unit is not > observed any more (so we only know that the response is > that level). If > the measuring equipment could do so it would be preferred to observe all > units for the full 192 hours. The time to censoring is of no interest as > such, the aim is to estimate the form of the response for each unit which > is the trace of some curve that we observe every 2 hours. Ignoring the > censored traces in the time period after they are censored puts a huge Well, it certainly *sounds* as if the "time to censoring" should be of great interest, if you had an adequate model. Thus, when you say that "ignoring" them gives "a huge downward bias", it sounds to me as if you are admitting that you do not have an acceptable model. Who can you blame for that? What leverage do you have, if you try to toss out those bad results? (Surely, you do have some ideas about forming estimates that *do* take the hours into account. The problem belongs in the hands of someone who does.) - maybe you want to segregate trials into the ones with 192 hours, or less than 192 hours; and figure two (Maximum Likelihood) estimates for the parameters, which you then combine. > downward bias into the results and is clearly not the thing to do although > that's what has been done in the past with these experiments. Any > suggestions of where people have addressed data of this or related form > would be very gratefully received. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
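One hedged sketch, in Python, of the kind of maximum-likelihood
approach hinted at in the reply above: let each censored reading
contribute only P(response > cutoff) to the likelihood. The curve
form, the cutoff, and the simulated data are all assumptions made for
illustration, not part of the original experiment:

import numpy as np
from scipy import optimize, stats

t = np.arange(0, 194, 2.0)               # observation times, hours
cutoff = 8.0                             # level beyond which a unit stops being observed

# One simulated unit: y = a*(1 - exp(-b*t)) + noise, censored above the cutoff.
rng = np.random.default_rng(3)
y_true = 10.0 * (1 - np.exp(-0.01 * t))
y = y_true + rng.normal(0, 0.3, t.size)
censored = y > cutoff                    # at these times we only know y > cutoff

def neg_log_lik(params):
    a, b, log_s = params
    mu = a * (1 - np.exp(-b * t))
    s = np.exp(log_s)
    ll_obs = stats.norm.logpdf(y[~censored], mu[~censored], s)
    ll_cens = stats.norm.logsf(cutoff, mu[censored], s)   # P(Y > cutoff)
    return -(ll_obs.sum() + ll_cens.sum())

fit = optimize.minimize(neg_log_lik, x0=[5.0, 0.05, 0.0], method="Nelder-Mead")
print(fit.x[:2], np.exp(fit.x[2]))       # estimates of a, b, and the error SD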
Re: Help me, please!
On 18 Jun 2001 01:18:37 -0700, [EMAIL PROTECTED] (Monica De Stefani)
wrote:

> 1) Are there some conditions which I can apply normality to Kendall
> tau?

tau is *lumpy* in its distribution for N less than 10. And all
rank-order statistics are a bit problematic when you try to use them
on rating scales with just a few discrete scores -- the tied values
give you bad scaling intervals, and the estimate of variance won't be
very good, either.

For correlations, your assumption of 'normality' is usually applied to
the values at zero.

> I was wondering if x's observations must be
> independent and y's observations must be independent to apply
> asymptotically normal limiting
> distribution.
> (null hypothesis = x and y are independent).
> Could you tell me something about?

- Independence is needed for just about any tests. I started to say
(as a minor piece of exaggeration) that independence is needed
"absolutely"; but the correct statement, I think, is that independence
is always demanded "relative to the error term."

[ snip, non-linear?] "Monotonic" is the term.

[ snip, T(z): I don't know what that is.]

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
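To see that "lumpiness" for small N, here is a short Python
enumeration of the exact null distribution of Kendall's tau at N = 6
(assuming no ties); only a handful of values are attainable, so the
normal approximation is rough:

from itertools import permutations
import numpy as np
from scipy.stats import kendalltau

n = 6
x = np.arange(n)
taus = [kendalltau(x, perm)[0] for perm in permutations(range(n))]

values, counts = np.unique(np.round(taus, 3), return_counts=True)
for v, c in zip(values, counts):
    print(v, c / len(taus))    # the exact, discrete null distribution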
Re: Probability Of an Unknown Event
On Sat, 16 Jun 2001 23:05:52 GMT, "W. D. Allen Sr." <[EMAIL PROTECTED]> wrote: > It's been years since I was in school so I do not remember if I have the > following statement correct. > > Pascal said that if we know absolutely nothing > about the probability of occurrence of an event > then our best estimate for the probability of > occurrence of that event is one half. > > Do I have it correctly? Any guidance on a source reference would be greatly > appreciated! I did a little bit of Web searching and could not find that. Here is an essay about Bayes, which (dis)credits him and his contemporaries as assuming something like that, years before Laplace. I found it with a google search on <"know absolutely nothing" probability> . http://web.onetel.net.uk/~wstanners/bayes.htm -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: individual item analysis
On 15 Jun 2001 14:24:39 -0700, [EMAIL PROTECTED] (Doug Sawyer) wrote: > I am trying to locate a journal article or textbook that addresses > whether or not exam quesitons can be normalized, when the questions are > grouped differently. For example, could a question bank be developed > where any subset of questions could be selected, and the assembled exam > is normalized? > > What is name of this area of statistics? What authors or keywords would > I use for such a search? Do you know whether or not this can be done? I believe that they do this sort of thing in scholastic achievement tests, as a matter of course. Isn't that how they make the transition from year to year? I guess this would be "norming". A few weeks ago, I discovered that there is a whole series of tech-reports put out by one of the big test companies. I would look back to it, for this sort of question. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: meta-analysis
On 17 Jun 2001 04:34:26 -0700, [EMAIL PROTECTED] (Marc) wrote: > I have to summarize the results of some clinical trials. > Unfortunately the reported information is not complete. > The information given in the trials contain: > > (1) Mean effect in the treatment group (days of hospitalization) > > (2) Mean effect in the control group (days of hospitalization) > > (3) Numbers of patients in the control and treatment group > > (4) p-values of a t-test (between the differences of treatment > and control) > My question: > How can I calculate the variance of treatment difference which I need > to perform meta-analysis? Note that the numbers of patients in the Aren't you going too far? You said you have to summarize. Well, summarize. The difference is in terms of days. Or it is in terms of percentage of increase. And you have the t-test and p-values. You might be right in what you propose, but I think you are much more likely to produce a useful report if you keep it simple. You are right; meta-analyses are complex. And a majority of the published ones are (in my opinion) awful. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Marijuana
On 15 Jun 2001 02:04:36 -0700, [EMAIL PROTECTED] (Eamon) wrote:

[ snip, Paul Jones. About marijuana statistics.]
> > Surely this whole research is based upon a false premise. Isn't it
> like saying that 90%, say, of heroin users previously used soft drugs.
> Therefore, soft-drug use usually leads to hard-drug use - which does
> not logically follow. (A => B =/= B => A)
> > Conclusions drawn from the set of people who have had heart attacks
> cannot be validly applied to the set of people who smoke dope.
> Rather than collect data from a large number of people who had heart
> attacks and look for a backward link, they should monitor a large
> number of people who smoke dope. But, of course this is much more
> expensive.

It is much more expensive, but it is also totally stupid to carry out
the expensive research if the *cheap* and lousy research didn't give
you a hint that there might be something going on.

The numbers that he was asking about do pass the simple test. I mean,
there were not 1 million people contributing one hour each, but we
should still ask, *Would* this say something? If it would not, then
the whole question is *totally* arid.

The 2x2 table is approximately (dividing the first column by 100; and
subtracting from a total):

   10687    124
     175      9

That gives a contingency test of 21.2 or 18.2, with p-values under
.001. The Odds Ratio on that is 4.4. That is pretty convincing that
there is SOMETHING going on, POSSIBLY something that merits an
explanation.

The expectation for the cell with 9 is just 2.2 -- the tiny cell is
the cell that matters for contributions to the test -- which is why it
is okay to lop the "hundreds" off the first column (to make it
readable).

Now, you may return to your discussion of why the table is not any
good, and what is needed for a proper test.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
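A quick re-computation of that arithmetic in Python, using the 2x2
table exactly as quoted above (the row and column meanings are
whatever the original study used):

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[10687, 124],
                  [  175,   9]])

chi2_plain, p1, dof, expected = chi2_contingency(table, correction=False)
chi2_yates, p2, _, _ = chi2_contingency(table, correction=True)

odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])

print(round(chi2_plain, 1), round(chi2_yates, 1))   # about 21.2 and 18.2
print(round(odds_ratio, 1))                         # about 4.4
print(round(expected[1, 1], 1))                     # about 2.2, the tiny cell
print(p1, p2)                                       # both well under .001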
Re: multivariate techniques for large datasets
On 13 Jun 2001 20:32:51 -0700, [EMAIL PROTECTED] (Tracey Continelli)
wrote:

> Sidney Thomas <[EMAIL PROTECTED]> wrote in message
> news:<[EMAIL PROTECTED]>...
> > srinivas wrote:
> > >
> > > Hi,
> > >
> > > I have a problem in identifying the right multivariate tools to
> > > handle datset of dimension 1,00,000*500. The problem is still
> > > complicated with lot of missing data. can anyone suggest a way out to
> > > reduce the data set and also to estimate the missing value. I need to
> > > know which clustering tool is appropriate for grouping the
> > > observations( based on 500 variables ).
>
> One of the best ways in which to handle missing data is to impute the
> mean for other cases with the selfsame value. If I'm doing
> psychological research and I am missing some values on my depression
> scale for certain individuals, I can look at their, say, locus of
> control reported and impute the mean value. Let's say [common
> finding] that I find a pattern - individuals with a high locus of
> control report low levels of depression, and I have a scale ranging
> from 1-100 listing locus of control. If I have a missing value for
> depression at level 75 for one case, I can take the mean depression
> level for all individuals at level 75 of locus of control and impute
> that for all missing cases in which 75 is the listed locus of control
> value. I'm not sure why you'd want to reduce the size of the data
> set, since for the most part the larger the "N" the better.

Do you draw numeric limits for a variable, and for a person? Do you
make sure, first, that there is not a pattern? That is -- Do you do
something different depending on how many are missing? Say, estimate
the value, if it is an oversight in filling blanks on a form, BUT drop
a variable if more than 5% of responses are unexpectedly missing,
since (obviously) there was something wrong in the conception of it,
or the collection of it. Psychological research (possibly) expects
fewer missing than market research.

As to the N - As I suggested before - my computer takes more time to
read 50 megabytes than one megabyte. But a psychologist should
understand that it is easier to look at and grasp and balance raw
numbers that are only two or three digits, compared to 5 and 6.

A COMMENT ABOUT HUGE DATA-BASES.

And as a statistician, I keep noticing that HUGE databases tend to
consist of aggregations. And these are "random" samples only in the
sense that they are uncontrolled, and their structure is apt to be
ignored. If you start to sample, you are more likely to ask yourself
about the structure - by time, geography, what-have-you. An N of
millions gives you tests that are wrong; estimates ignoring "relevant"
structure have a spurious report of precision.

To put it another way: the Error (or real variation) that *exists*
between a fixed number of units (years, or cities, for what I
mentioned above) is something that you want to generalize across. With
a small N, that error term is (we assume?) small enough to ignore.
However, that error term will not decrease with N, so with a large N,
it will eventually dominate. The test based on N becomes increasingly
irrelevant.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
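For what it is worth, here is a minimal Python/pandas sketch of the
conditional-mean imputation Tracey describes, with hypothetical
"locus" and "depression" columns; it does nothing about the questions
raised in the reply about limits, patterns, or how many values are
missing:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "locus":      [75, 75, 75, 40, 40, 10],
    "depression": [12, 18, np.nan, 30, np.nan, 55],
})

# Fill each missing depression score with the mean of the cases that
# share the same locus-of-control value.
df["depression_imputed"] = (
    df.groupby("locus")["depression"]
      .transform(lambda s: s.fillna(s.mean()))
)
print(df)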
Re: multivariate techniques for large datasets
On 11 Jun 2001 22:18:11 -0700, [EMAIL PROTECTED] (srinivas) wrote: > Hi, > > I have a problem in identifying the right multivariate tools to > handle datset of dimension 1,00,000*500. The problem is still > complicated with lot of missing data. can anyone suggest a way out to > reduce the data set and also to estimate the missing value. I need to > know which clustering tool is appropriate for grouping the > observations( based on 500 variables ). 'An intelligent user' with a little experience. Look at all the data, and figure what comprises a 'random' subset. There are not many purposes that require more than 10,000 cases so long as your sampling gives you a few hundred in every interesting category. [This can cut down your subsequent computer processing time, since 1 million times 500 could be a couple of hundred megabytes, and might take some time just for the disk reading.] Look at the means/ SDs/ # missing for all 500; look at frequency tabulations for things in categories; look at cross tabulations between a few variables of your 'primary' interest, and the rest. Throw out what is relatively useless. For *your* purposes, how do you combine logical categories? - 8 ounce size with 24 ounce; chocolate with vanilla; etc. A computer program won't tell you what makes sense, not for another few years. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
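A rough sketch, in Python/pandas, of the "sample first, then screen"
advice above; the DataFrame here is simulated noise standing in for
the real 1,00,000 x 500 file:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
big = pd.DataFrame(rng.normal(size=(100_000, 50)),
                   columns=[f"v{i}" for i in range(50)])
big[big > 2.5] = np.nan                       # sprinkle in some missing values

sample = big.sample(n=10_000, random_state=1)  # a few thousand rows is usually plenty

summary = pd.DataFrame({
    "mean":      sample.mean(),
    "sd":        sample.std(),
    "n_missing": sample.isna().sum(),
})
print(summary.sort_values("n_missing", ascending=False).head(10))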
Re: About kendall
On 12 Jun 2001 08:43:53 -0700, [EMAIL PROTECTED] (Monica De Stefani) wrote: > When I aplly Kendall tau or Kendall's partial tau to a time series do > I have to calcolate ranks or not? > In fact a time series has a natural temporal order. ... but you are not partialing out time. Surely. Your program that does the Kendall tau must do some ranking, as part of the algorithm. Why do you think you might have to calculate ranks? -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: please help
On 10 Jun 2001 07:27:55 -0700, [EMAIL PROTECTED] (Kelly) wrote:

> I have the gage repeatability & reproducibility(gage R&R) analysis
> done on two instruments, what hyphoses test can I use to test that the
> repeatability variance(expected sigma values of repeatability) of the
> two instruments are significantly different form each other or to say
> one has a lower variance than the other.
> Any insight will be greatly appreciated.
> Thanks in advance for your help.

I am not completely sure I understand, but I will make a guess.

There is hardly any power for comparing two ANOVAs that are done on
different samples, until you make strong assumptions about samples
being equivalent, in various regards.

If ANOVAs are on the same sample, then a CHOW test can be used on the
"improved prediction" if one hypothesis consists of an extra d.f. of
prediction.

If ANOVAs are on separate samples, I wonder if you could compare the
residual variances, by the simple variance ratio F-test -- well, you
could do it, but I don't know what arguments should be raised against
it, for your particular case.

There are criteria resembling the CHOW test that are used less
formally, for incommensurate ANOVAs (not the same predictors) - AKAIKE
and others.

If your measures are done on the same (exact) items, you might have a
paired test: Instrument A gets closer values on more of the
measurements that are done.

Finally, if you can do a bunch of separate experiments, you can test
whether A or B does better in more than half of them.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
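If the simple variance-ratio F-test is attempted anyway, a minimal
Python sketch would look like the following; the variances and degrees
of freedom are made up, and the test leans on approximately normal
measurement errors:

from scipy import stats

var_a, df_a = 0.0040, 24    # repeatability variance and its d.f., instrument A
var_b, df_b = 0.0075, 24    # same for instrument B

F = var_b / var_a                          # larger variance on top
p_one_sided = stats.f.sf(F, df_b, df_a)
p_two_sided = 2 * min(p_one_sided, 1 - p_one_sided)
print(F, p_one_sided, p_two_sided)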
Re: Diagnosing and addressing collinearity in Survival Analysis
On 06 Jun 2001 06:46:55 GMT, [EMAIL PROTECTED] (ELANMEL) wrote: > Any assistance would be appreciated: I am attempting to run some survival > analyses using Stata STCOX, and am getting messages that certain variables are > collinear and have been dropped. Unfortunately, these variables are the ones I > am testing in my analysis! > If there are 3 groups (classes), then you can have only two dummy variables to refer to their degrees of freedom. You can code those in the most convenient and informative way. If your problem arises otherwise, then you have a fundamental problem in the logic of what is being tested. Google shows some examples of problems when I search for "statistical confounding" (use the quotes for the search). And "confounded designs" seems to obtain discussions. > I would appreciate any information or recommendations on how best to diagnose > and explore solutions to this problem. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
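A small Python/pandas illustration of the k-1 dummy-variable point
(the column names are hypothetical): with 3 classes you get only 2
dummies, and entering all 3 alongside an intercept is what makes a
program drop one as collinear.

import pandas as pd

df = pd.DataFrame({"group": ["A", "B", "C", "B", "A", "C"]})

# k - 1 = 2 dummies for 3 groups; "A" becomes the reference category.
dummies = pd.get_dummies(df["group"], prefix="grp", drop_first=True)
print(dummies)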
Re: Obtain standard error in nonlinear estimation
On Mon, 4 Jun 2001 14:54:56 -0400, "Jiwu Rao" <[EMAIL PROTECTED]>
wrote:

> Hi
>
> I performed a regression analysis on a model nonlinear in parameters. The
> function is:
> q = ( k* P^n ) + (k2 * P2^n2)
> where P and P2 are independent variables, k, n, k2, n2 are parameters.
> The estimates and their variances can be obtained, as well as correlation
> between any two parameters.
>
> The question is: how do I estimate the standard error in the first term of
> the equation? That is, what is the error in estimating w = k* P^n?

1) What is this question supposed to mean? How do *you* want to
interpret the error in adding two variables to an equation, if it were
an ordinary multiple regression?

In terms of 'error', does it answer your question, to drop out the
whole term, and compare the fuller Fit (4 or 5 variables) with a model
having 2 variables less? That probably gives you a statistical test if
you are fitting by Least squares, or by Maximum likelihood.

2) If your correlations between parameters are nearly 1.0 (as you go
on to say), that suggests you don't have the model in an elegant form.
Reparameterize.

The form of q = (k * P^n) looks like a power-transformation. If you
are trying to solve for the Box-Cox transformation, or to do something
similar, I think you want a component that multiplies or divides by n,
as part of the constant before P. That should get rid of some of the
correlation.

3) Are you really committed to that equation? Who around you knows
enough that they should commit you to that equation? - Ask *them* what
the proper parameterization should be, since the version yielding
correlations near to 1.0 is (at best) a mistake from not paying enough
attention.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
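A hedged Python sketch of the "drop the whole term and compare the
fuller fit" suggestion, assuming least-squares fitting of the model in
the post; the data here are simulated stand-ins for the real P, P2,
and q:

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import f

rng = np.random.default_rng(0)
P = rng.uniform(1, 10, 80)
P2 = rng.uniform(1, 10, 80)
q = 2.0 * P**0.7 + 0.5 * P2**1.3 + rng.normal(0, 0.5, 80)   # simulated data

def full_model(X, k, n, k2, n2):
    P, P2 = X
    return k * P**n + k2 * P2**n2

def reduced_model(X, k, n):            # the whole second term dropped
    P, P2 = X
    return k * P**n

popt_full, pcov_full = curve_fit(full_model, (P, P2), q, p0=[1, 1, 1, 1])
popt_red, _ = curve_fit(reduced_model, (P, P2), q, p0=[1, 1])

rss_full = np.sum((q - full_model((P, P2), *popt_full)) ** 2)
rss_red = np.sum((q - reduced_model((P, P2), *popt_red)) ** 2)

# Approximate F-test on the 2 extra parameters (k2, n2).
df_extra, df_resid = 2, len(q) - 4
F = ((rss_red - rss_full) / df_extra) / (rss_full / df_resid)
print(F, f.sf(F, df_extra, df_resid))

# Standard errors of the individual parameters come from the covariance matrix:
print(np.sqrt(np.diag(pcov_full)))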
Re: Need Good book on foundations of statistics
On 1 Jun 2001 19:07:31 GMT, [EMAIL PROTECTED] wrote: > > Can anyone refer me to a good book on the foundations of statistics? Stigler's "The History of Statistics" is the most widely read of recent popular histories. It covers pre-1900. His newer book is "Statistics on the Table" and I enjoyed that one, too. It includes the founding of *modern* statistics in, say, the 1930s, in addition to much older anecdotes. > I want to know of the limitations, assumptions, and philosophy > behind statistics. A discussion of how the quantum world may have > different laws of statistics might be a plus. That last sentence makes me think that you don't know any answers to the sentence just previous to it. " ... have different laws" is certainly not the way statisticians would put it. Leptons *obey* different laws than baryons do (I think), but the laws are descriptions that were imagined by human beings. I suppose one way to describe the dilemma of physics might be, It is trying to force all of these particles into fitting descriptions that are less than ideal (or, so it keeps working out). I think it is curious and interesting that the physicists at the highest levels of abstraction -- cosmology; and high-energy particles/relativity -- are beginning to use fairly ordinary 'statistical tests' to judge whether they have anything. "IS there oscillation in the measured background of stars, near 4 degrees kelvin, across the whole universe?" "IF they continued CERN for another 18 months, would there have been another dozen or so *apparent* particles of the right type, so they could conclude that the number observed was 'significant' at the one-in-a-million level, instead of just one-in-two-hundred?" -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: fit-ness
On Thu, 31 May 2001 12:05:24 +0100, "Alexis Gatt" <[EMAIL PROTECTED]> wrote: > Hi, > > a basic question from a MSc student in England. First of all, yeah I read > the FAQ and I didnt find anything answering my question, which is fairly > simple: I am trying to analyse how well several mathematical methods perform > to modelize a scanner. So I have, for every input data, the corresponding > output given by the scanner and the values given by the mathematical models > I am using. > First, given the distribution of the errors, I can use the usual mean-StdDev I can think of two or 3 meanings of 'scanner' and not a one of them would have a simple, indisputable measure of 'error.' 1) Some measures would be biased toward one 'method' or another, so a winner would be obvious. 2) Some samples to be tested would be biased (similarly) toward a winner by one method or another. So you select your winner by selecting your mix of samples. If you have fine measures, then you can give histograms of your results (assuming 1-dimensional, as your alternatives suggest). Is it enough to have the picture? What would your audience demand? What is your need? > if the distro is normal, or median-95th percentile otherwise. Any other > known methods to enhance the pertinence of the analysis? Any ideas welcome. Average squared error (giving SD) is popular. Average absolute error de-emphasizes the extremes. Count of errors beyond a critical limit sometimes fills a need. A more complicated way is to build in a cost function. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
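The error summaries mentioned in the reply, written out in Python for
a vector of model-minus-scanner errors; the numbers and the tolerance
are placeholders:

import numpy as np

errors = np.array([0.2, -0.1, 0.05, 0.7, -0.3, 0.0, 1.4, -0.2])
tolerance = 0.5

rmse = np.sqrt(np.mean(errors ** 2))           # average squared error, as an RMS
mae = np.mean(np.abs(errors))                  # average absolute error, softer on extremes
n_beyond = np.sum(np.abs(errors) > tolerance)  # count of errors beyond a critical limit
p95 = np.percentile(np.abs(errors), 95)        # median/95th-percentile style summary

print(rmse, mae, n_beyond, p95)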
Re: ONLY ONE
FYI - that piece of HTML code is a SPAM advertisement, which does seem
to evoke other Web addresses.

On 27 May 2001 18:51:32 -0700, [EMAIL PROTECTED] ([EMAIL PROTECTED])
wrote:

> [ quoted HTML spam advertisement for a screen-capture product at
> www.moodysoft.com - snipped ]

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
Re: The False Placebo Effect
On 26 May 2001 03:50:32 GMT, Elliot Cramer <[EMAIL PROTECTED]> wrote: > Rich Ulrich <[EMAIL PROTECTED]> wrote: > : - I was a bit surprised by the newspaper coverage. I tend to > : forget that most people, including scientists, do *not* blame > : regression-to-the-mean, as the FIRST suspicious cause > : whenever there is a pre-post design: because they have > : scarce heard of it. > > I don't see how RTM can explain the average change in a prepost design - explanation: whole experiment is conducted on patients who are at their *worst* because the flare-up is what sent them to a doctor. Sorry; I might have been more complete there. All the pre-post studies in psychiatric intervention (where I work) have this as something to watch for. I guess I could have said, "first suspicious cause *of selective improvement* in any pre-post design." > those above the pre population mean will tend to be closer to the post > population mean but this doesn't say anything about the average > change. Any depression study is apt to show both a placebo AND a no > treatment effect after 6 weeks - I'm not sure what that last phrase means... "both " 30% or so of acutely depressed patients will get quite a bit better. In psychiatry, I think we have called some effects "placebo" even when we know that it is not a very good word. The experience of being in a research trial, by the way, seems to produce a placebo effect, according to what people have told me. (I think that careful scientists attribute that one to the extra time and attention given to those subjects.) -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
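A small Python simulation of the regression-to-the-mean explanation
given above: patients enter the study when a noisy severity score
catches them at their worst, and the follow-up score drifts back
toward their own long-run level with no treatment at all. Every number
here is invented for illustration:

import numpy as np

rng = np.random.default_rng(42)
n = 100_000
true_severity = rng.normal(50, 10, n)          # stable underlying level
pre  = true_severity + rng.normal(0, 10, n)    # score on the flare-up day that sent them in
post = true_severity + rng.normal(0, 10, n)    # later score, same distribution

enrolled = pre > 70                            # only the flare-ups get enrolled
print(pre[enrolled].mean())    # about 76
print(post[enrolled].mean())   # about 63: a sizable "improvement" with no treatment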
Re: The False Placebo Effect
On 24 May 2001 21:39:17 -0700, [EMAIL PROTECTED] (David Heiser) wrote:

>
> Be careful on your assumptions in your models and studies!
> ---
>
> Placebo Effect An Illusion, Study Says
> By Gina Kolata
> New York Times
> (Published in the Sacramento Bee, Thursday, May 24, 2001)
>
> In a new report that is being met with a mixture of astonishment and some
> disbelief, two Danish researchers say that the placebo effect is a myth.

Do you think they will not believe in voudon/ voodoo, either?

> > The investigators analyzed 114 published studies involving about 7,500
> patients with 40 different conditions. They found no support for the common
> notion that, in general, about one-third of patients will improve if they
> are given a dummy pill and told it is real.

[ ... ] The story goes on. The authors look at studies where the
placebo effect is probably explained by regression-to-the-mean.

- I was a bit surprised by the newspaper coverage. I tend to forget
that most people, including scientists, do *not* blame
regression-to-the-mean, as the FIRST suspicious cause whenever there
is a pre-post design: because they have scarce heard of it.

On the other hand, I have expected for a long time that the best that
a light-weight placebo will do is a light-weight improvement.

> ...
> The researchers said they saw a slight effect of placebos on subjective
> outcomes reported by patients, like their descriptions of how much pain they
> experienced. But Hrobjartsson said he questioned that effect. "It could be a
> true effect, but it also could be a reporting bias," he said. "The patient
> wants to please the investigator and tells the investigator, 'I feel
> slightly better. ' "

"Pain" is a hugely subjective report. It is notorious. I would not
want to do a summary across the papers of the whole field of
pain-researchers, since -- based on difficulty, and not on knowing
those researchers -- I expect an enormous amount of bad research in
that area.

- I don't know if the researchers are quite unwise here, or if they
only seem that way because of bad news reporting.

- Oh, I did read a meta-analysis a while ago, that one from Steve
Simon. It was based on pain research (and, basically, only relevant to
pain research), and the authors insisted that the vast majority of
studies were not very good.

About the studies these authors found, using 3 groups:

> They found 114, published between 1946 and 1998. When they analyzed the
> data, they could detect no effects of placebos on objective measurements,
> like cholesterol levels or blood pressure.

- That is interesting. 114 is a big enough number. Controlled medical
research, however, seemed to undergo big changes across those decades.
I expect that double-blind and triple-blind studies did not get much
use until halfway through that interval.

If someone does look into the original publication, and will tell us
about it -- I am interested, especially, in what the authors say about
pain studies, and what they say about time trends.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
Re: sample size and sampling error
Posted to sci.stat.consult, sci.stat.math, sci.stat.edu where the same
questions had been posted.

On Thu, 24 May 2001 09:51:48 -0400, "Mike Tonkovich"
<[EMAIL PROTECTED]> wrote:

> Before I get to the issue at hand, I was hoping someone might explain the
> differences between the following 3 newsgroups: sci.stat.edu, sci.stat.cons,
> and sci.stat.math? Now that I've found these newsgroups, chances are good I
> will be taking advantage of the powerful resources that exist out there.
> However, I could use some guideance on what tends to get posted where? Some
> general guidelines would be helpful.

[ snip - statistical question, which someone has answered with plenty
of good references and commentary.]

Don't worry a whole lot about where. But if you want to post to all
three, you can put all three in your address line. That way, the
message only goes out once; people with decent newsreaders will only
see it once; and a person who Replies (with most newsreaders) will be
carried in all three.

Two of these three groups also exist as Mail-lists (sse, ssc). What I
just wrote about Replies probably doesn't work for them. Someone
reading on a List will (I think) reply just to that list. Also, the
Mail-list readers are less apt to read all the groups.

My stats-FAQ has messages saved from all three. I never did pay much
attention to what showed up, where, but you could scan my site for
some indication as of a few years ago, when I was compiling those
files.

There are a lot of questions that would suit any of them, and many are
cross-posted, or posted separately to each. The math group tends to
get some higher-calculus questions, and the questions overlapping with
numerical analysis or computer science. You might look for cross-posts
if you can examine Headers.

I think it has been in .edu where we have discussed standardized
testing; and the philosophical ideas of hypothesis testing; and how to
teach statistical ideas. The .consult group seems appropriate for
posing questions that don't have strong educational implications (that
the poser notices).

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
Re: Standardized testing in schools
On Thu, 24 May 2001 17:30:35 -0400, Rich Ulrich <[EMAIL PROTECTED]> wrote: > Standardized tests and their problems? Here was a > problem with equating the scores between years. > > The NY Times had a long front-page article on Monday, May 21: > "When a test fails the schools, careers and reputations suffer." > It was about a minor screw-up in standardizing, in 1999. Or, since I don't see the Sunday NY Times. But there were letters on Thursday, May 24, concerning a story on May 20, which was concerned with scoring errors -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Standardized testing in schools
On Thu, 24 May 2001 23:25:42 GMT, "W. D. Allen Sr." <[EMAIL PROTECTED]> wrote: > "And this proved to me , once again, > why nuclear power plants are too hazardous to trust:..." > > Maybe you better rush to tell the Navy how risky nuclear power plants are! > They have only been operating nuclear power plants for almost half a century > with NO, I repeat NO failures that has ever resulted in any radiation > poisoning or the death of any ship's crew. In fact the most extensive use of > Navy nuclear power plants has been under the most constrained possible > conditions, and that is aboard submarines! > > Beware of our imaginary boogy bears!! As I construct an appropriate sampling frame, one out of two nuclear navies has a good long-term record. Admiral Rickover had a fine success. The other navy was not so lucky, or suffered because it was more pressed for resources. > > You are right though. There is nothing really hazardous about the operation > of nuclear power plants. The real problem has been civilian management's > ignorance or laziness! [...] I'm glad you see the problem - though I see it more as 'ordinary management' than ignorance or laziness. It might not even have to be 'poor' management by conventional terms; the conventions don't take into account extraordinarily dangerous materials. The Japanese power plant's nuke-fluke of last year was an illustration of employee inventiveness and 'shop-floor innovation'. Unfortunately for them, they 'solved a problem' that had been a (too-) cleverly designed safety precaution. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Standardized testing in schools
Standardized tests and their problems? Here was a problem with
equating the scores between years.

The NY Times had a long front-page article on Monday, May 21: "When a
test fails the schools, careers and reputations suffer." It was about
a minor screw-up in standardizing, in 1999. Or, since the company
stonewalled and refused to admit any problems, and took a long time to
find the problems, it sounds like it became a moderately *bad*
screw-up.

The article about CTB/McGraw-Hill starts on page 1, and covers most of
two pages on the inside of the first section. It seems highly relevant
to the 'testing' that the Bush administration advocates, to substitute
for having an education policy. CTB/McGraw-Hill runs the tests for a
number of states, so they are one of the major players.

And this proved to me, once again, why nuclear power plants are too
hazardous to trust: we can't yet trust Managements to spot problems,
or to react to credible problem reports in a responsible way.

In this example, there was one researcher from Tennessee who had
strong longitudinal data to back up his protest to the company; the
company arbitrarily (it sounds like) fiddled with *his* scores, to
satisfy that complaint, without ever facing up to the fact that they
did have a real problem. Other people, they just talked down.

The company did not necessarily lose much business from the episode
because, as someone was quoted, all the companies who sell these tests
have histories of making mistakes. (But, do they have the same history
of responding so badly?)

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
Re: Variance in z test comparing purcenteges
- BUT, Robert, the equal N case is different from cases with unequal N - - or did I lose track of what the topic really is... - On 22 May 2001 06:52:27 -0700, [EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote: > and Rich Ulrich responded: > > Aren't we looking at the same contrast as the t-test with > > pooled and unpooled variance estimates? Then - > > Similar, but not identical. With the z-for-proportion we > have the additional twist that the amount of extra power > from te unpooled test is linked to the size of the effect > we're trying to measure, in such a way that we get it > precisely when we don't need it. Or, to avoid being too > pessimistic, let's say that the pooled test only costs us > power when we can afford to lose some . > - Robert wrote on May 18,"And, clearly, the pooled variance is larger; as the function is convex up, the linear interpolation is always less." Back to my example in the previous post: Whenever you do a t-test, you get exactly the same t if the Ns are equal. For unequal N, you get a bigger t when the group with the smaller variance gets more weight. I think your z-tests on proportions have to work the same way. I can do a t-test with a dichotomous variable as the criterion, testing 1 of 100 versus 3 of 6: 2x2 table is (1+99), (3+3). That gives me a pooled t of 6 or 7, that is p < .001; and a separate-variance t that is p= 0.06. - I like that pooled test, but I do think that it has stronger assumptions than the 2x2 table. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
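The 1-of-100 versus 3-of-6 example above can be re-run directly; here
is a Python sketch using a pooled-variance t-test, a separate-variance
(Welch) t-test, and the 2x2 table. Exact p-values depend on the
software's d.f. formula, but the contrast between the two t-tests is
the point:

import numpy as np
from scipy import stats

group1 = np.array([1] + [0] * 99)       # 1 "success" of 100
group2 = np.array([1, 1, 1, 0, 0, 0])   # 3 of 6

pooled = stats.ttest_ind(group1, group2, equal_var=True)
welch  = stats.ttest_ind(group1, group2, equal_var=False)

print(pooled)   # |t| around 7.5, p far below .001
print(welch)    # |t| around 2.2, p roughly in the .06-.08 range

# For comparison, the 2x2 table (1+99) vs (3+3) as a contingency test:
table = [[1, 99], [3, 3]]
print(stats.fisher_exact(table))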
Re: Elementary cross-sectional statistics
On Mon, 21 May 2001 13:41:16 GMT, "Sakke" <[EMAIL PROTECTED]> wrote: > Hello Everybody! > > We have a probably very simple question. We are doing cross-sectional > regressions. We are doing one regression per moth for a period of ten years, > resulting in 120 regressions. As we understood, it is possible to just take > a arithmetic average for every coefficient. Well, sure, it is possible to take an arithmetic average and then you can tell people, "Here is the arithmetic average." It's a lot harder to have any certainty that the average of a time series means much. > What we do not know, is how to > calculate the t-statistics for these coefficients. Can we just do the same, > arithmetic average? Can anybody help us? No, you certainly can't compute an average of some t-tests and claim that it is a t-test. What you absolutely have to have (in some sense) is a model of what happens over 10 years. For instance: If it is the same experience over and over again (that is your model of 'what happens'), *maybe* it would be proper to average each Variable over the 120 time points; and then do the regression. That is the easiest case I can think of --the mean is supposed to represent something, and you conclude that it represents the whole thing. Otherwise: What is there? What are you trying to conclude? Why? (Who cares?) Are the individual regressions 'significant'? Highly? Are there mean-differences over time? - variations between years or seasons? Are the lagged correlations near zero? -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Variance in z test comparing purcenteges
On 18 May 2001 07:51:21 -0700, [EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote: [ ... ] > OK, so what *is* going on here? Checking a dozen or so sources, I > found that indeed both versions are used fairly frequently (BTW, I > myself use the pooled version, and the last few textbooks I've used do > so). > > Then I did what I should have done years ago, and I tried a MINITAB > simulation. I saw that for (say) n1=n2=10, p1=p2=0.5, the unpooled > statistic tends to have a somewhat heavy-tailed distribution. This makes > sense: when the sample sizes are small the pooled variance estimator is > computed using a sample size for which the normal approximation works > better. > > The advantage of the unpooled statistic is presumably higher power; > hoewever, in most cases, this is illusory. When p1 and p2 are close > together, you do not *get* much extra power. When they are far apart > and have moderate sample sizes you don't *need* extra power. And when [ snip, rest] Aren't we looking at the same contrast as the t-test with pooled and unpooled variance estimates? Then - (a) there is exactly the same t-test value when the Ns are equal; the only change is in DF. (b) Which test is more powerful depends on which group is larger, the one with *small* variance, or the one with *large* variance. -- it is a large difference when Ns and variances are both different by (say) a fourfold factor or more. If the big N has the small variance, then the advantage lies with 'pooling' so that the wild, small group is not weighted as heavily. If the big N has the large variance, then the separate-variance estimate lets you take advantage of the precision of the smaller group. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Intepreting MANOVA and legitimacy of ANOVA
The usual problem of MANOVA, which is hard to avoid, is that even if a test comes out significant, you can't say what you have shown except 'different.' You get a clue by looking at the univariate tests and correlations. Or drawing up the interesting contrasts and testing them to see if they account for everything. I have a problem, here, that might be avoidable -- I can't tell what you are describing. Part of that is 'ugly abbreviations,' part is 'I do not like the terminology, DV and IV, abbreviated or not' so I will not take much time at it. On Fri, 18 May 2001 14:57:49 -0500, "auda" <[EMAIL PROTECTED]> wrote: > Hi, all, > In my experiment, two dependent variables were measured (say, DV1 and DV2). > I found that when analyzed sepeartely with ANOVA, independent variable (say, > IV and had two levels IV_1 and IV_2) modulated DV1 and DV2 differentially: > > mean DV1 in IV_1 > mean DV1 in IV_2 > mean DV2 in IV_1 < mean DV2 in IV_2 > > If analyzed with MANOVA, the effect of IV was significant, Rao > R(2,14)=112.60, p<0.000. How to intepret this result of MANOVA? Can I go > ahead to claim IV modulated DV1 and DV2 differentially based up the result > from MANOVA? Or I have to do other tests? > > Moreover, can I treat DV1 and DV2 as two levels of a factor, say, "type of > dependent variable", and then go ahead to test the data with > repeated-measures ANOVA and see if there is an interaction between IV and > "type of dependent variable"? -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
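For readers who want something concrete, here is a hedged Python
sketch of the MANOVA plus one follow-up (a univariate test on the
DV1 - DV2 difference score, which is one way to get at the
"differential modulation" question); the data frame and the column
names DV1, DV2, IV are hypothetical stand-ins for the poster's
variables:

import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)
iv = np.repeat(["IV_1", "IV_2"], 8)
df = pd.DataFrame({
    "IV":  iv,
    "DV1": np.where(iv == "IV_1", 5.0, 3.0) + rng.normal(0, 1, 16),
    "DV2": np.where(iv == "IV_1", 2.0, 4.0) + rng.normal(0, 1, 16),
})

m = MANOVA.from_formula("DV1 + DV2 ~ IV", data=df)
print(m.mv_test())      # Wilks, Pillai, etc. for the overall effect of IV

# The IV-by-measure interaction, tested as IV's effect on the difference score:
diff = df["DV1"] - df["DV2"]
print(stats.ttest_ind(diff[df["IV"] == "IV_1"], diff[df["IV"] == "IV_2"]))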
Re: bootstrap, and testing the mean of variance ratio
On Thu, 17 May 2001 02:33:54 + (UTC), [EMAIL PROTECTED] (rpking)
wrote:

[ snip, some of my response and his old and new comments.]

> I use bootstrap to get the confidence intervals for A and B because
> they are both >0 by construction, so the exact distributions of A and
> B cannot be normal, and thus starndard distribution theory cannot
> be used to obtain CIs.

Occasionally, someone will say, as a point of theoretical interest,
such-and-so cannot be 'normal' because it has a limited range (above
zero, say). That is, in the context I think of, a hyper-technical
point being made. It is to counter some silliness, where someone wants
to work from Perfect Normality.

Now, you have come up with the opposite silliness, and you claim that
normal distribution theory cannot be used for CIs, with that thin
excuse. You might consider: the name of 'normal' was attached because
of the success in describing sociological data with that shape:
measures including height, weight, number of births and deaths. Almost
none of them included negative numbers.

> > Now I want to test the null hypothesis that A - B=0. Let D=A-B. Could
> D have a normal distribution? I don't know, and that's why I'm asking.

As I suggested - if we are not happy with the normality of variances,
it is usually fine after we take the log. Ratios are another thing
that are usually dealt with by taking the log.

I posted:
> > ... and that is relevant to what? Distributions of raw data are
> >seldom (if ever) "asymptotically normal".

I could clarify: samples do not become 'more normal' when the N gets
larger. We hope that their *means* become better behaved, and they
usually do. They don't have to be normal for the means to be used with
the usual parametric statistics. So I will say, one more time, try to
apply ordinary (normal) statistics.

> > So no social scientist should ever use asymptotic theory in their
> > anaysis of (raw) data? This is certainly a very extreme view.

?? I don't know what you are attributing to me -- I was trying to tell
you firmly, without being too rude, that only an ignorant amateur will
start out with bootstrapping, and will refuse to use normal theory
(like you are doing). I think you are mis-construing 'asymptotic' if
you think it applies to raw data.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
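A sketch, in Python, of the advice in this exchange: for a
positive-by-construction quantity like a variance ratio, work on the
log scale and use ordinary normal-theory intervals, with a percentile
bootstrap alongside for comparison. The sample of ratios is simulated,
and the two intervals estimate slightly different things (a geometric
versus an arithmetic mean):

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
ratios = rng.f(10, 10, size=40)          # simulated positive variance ratios

# Normal-theory CI on the log scale, back-transformed.
log_r = np.log(ratios)
mean, se = log_r.mean(), log_r.std(ddof=1) / np.sqrt(len(log_r))
t_crit = stats.t.ppf(0.975, len(log_r) - 1)
ci_normal = np.exp([mean - t_crit * se, mean + t_crit * se])

# Percentile bootstrap CI for the plain mean ratio, for comparison.
boot_means = np.array([
    rng.choice(ratios, size=len(ratios), replace=True).mean()
    for _ in range(5000)
])
ci_boot = np.percentile(boot_means, [2.5, 97.5])

print(ci_normal)    # CI for the geometric-mean ratio, via logs
print(ci_boot)      # bootstrap CI for the arithmetic-mean ratio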
Re: (none)
[ note, Jay: HTML-formatting makes this hard to read ] On 11 May 2001 00:30:06 -0700, [EMAIL PROTECTED] (Jay Warner) wrote: [snip, HTML header] > I've had occasion to talk with a number of educator types lately, at different > application and responsibility levels of primary & secondary Ed. > Only one recalled the term, regression toward the mean. Some (granted, > the less analytically minded) vehemently denied that such could be causing > the results I was discussing. Lots of other causes were invoked. > IN an MBA course I teach, which frequently includes teachers wishing > to escape the trenches, the textbook never once mentions the term. > I don't recall any other intro stat book including the term, much less > an explanation. The explanation I worked out required some refinement > to become rational to those educator types (if it has yet :). - I am really sorry to learn that - Not even the texts! that's bad. By the way, there are two relevant chapters in the 1999 history, "Statistics on the Table" by Stephen Stigler (see pages 157-179). Stigler documents a big, embarrassing blunder by a noted economist, published in 1933. Horace Secrist wrote a book with tedious detail, much of it being accidental repetitions of regression fallacy. Hotelling panned it in a review in JASA. Next, Secrist replied in a letter, calling Hotelling "wholly mistaken." Hotelling tromped back, " ... and when one version of the thesis is interesting but false and the other is true but trivial, it becomes the duty of the reviewer to give warning at least against the false version." Maybe Stigler's user-friendly anecdote will help to spread the lesson, eventually. > So I'm not surprised that even the NYT would miss it entirely. > Rich, I hope you penned a short note to the editor, pointing out its presence. > Someone has to, soon. I did not write, yet. But I see an e-mail address, which is not usual in the NYTimes. I guess they identify Richard Rothstein as [EMAIL PROTECTED] because this article was laid out as a feature (Lessons) instead of an ordinary news report. I'm still considering what I should say, if someone else doesn't tell me that they have passed the word. > BTW, Campbell's text, "A primer on regression artifacts" mentions a > correction factor/method, which I haven't understood yet. Does anyone > in education and other social science circles use this correction, and > may I have a worked out example? Since you mentioned it, I checked my new copy of the Campbell/ Kenny book. Are you in Chapter 5? There is a lot going on, but I don't grasp that there is any well-recommended correction. Except, maybe, Structural-equations-modeling, and they just gesture vaguely in the direction of that. Give me a page number? I thought that they re-inforced my own prejudices, that when two groups are not matched at Pre, you have a lot of trouble forming clear conclusions. You can be a bit assertive if one group "wins" by all three standards (raw score, change score, regressed-change score), but you still can't be 100% sure. When your groups don't match, you draw the graphs to help you clarify trends, since the eyeball is great at pattern analysis. Then you see if any hostile interpretations can undercut your optimistic ones, and you sigh regrets when they do. > Jay > Rich Ulrich wrote: > http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
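Since even the textbooks skip the term, a ten-line simulation sometimes does more than a definition. A minimal sketch (all numbers invented; it illustrates the fallacy only, not the NYT data): schools with perfectly stable quality are measured with error in two years, and the bottom decile "improves" anyway.

    import numpy as np

    rng = np.random.default_rng(3)
    true_quality = rng.normal(500, 20, 1000)        # stable school quality
    year1 = true_quality + rng.normal(0, 30, 1000)  # noisy measurement, year 1
    year2 = true_quality + rng.normal(0, 30, 1000)  # noisy measurement, year 2

    worst = year1 < np.percentile(year1, 10)        # "failing" schools in year 1
    print("year-1 mean of the worst decile:", year1[worst].mean())
    print("year-2 mean of the same schools:", year2[worst].mean())
    # The second number is reliably higher, with no real improvement at all.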
Re: bootstrap, and testing the mean of variance ratio
On Wed, 16 May 2001 11:50:07 +0000 (UTC), [EMAIL PROTECTED] (rpking) wrote:

> For each of the two variance ratios, A=var(x)/var(y) and B=var(w)/var(z), I bootstrapped with 2000 replications to obtain confidence intervals. Now I want to test whether the means are equal, i.e. E(A) = E(B), and I am wondering whether I could just use the 2000 data points, calculate the standard deviations, and do a simple t test.

This raises questions, questions, questions. What do you mean by a "data point"? By "bootstrapping"? Why do you want ratios of the variances? If you are concerned with variances, why aren't you considering the logs of V? If you are concerned with ratios, why aren't you considering the logs of the ratios? With "2000 replications" each, there would seem to be 4000 points. Or, what relation is there among x-y-z-w? If these give you 2000 vectors, then why don't you have a paired comparison in mind?

It is tough enough to figure out what is proper in bootstrapping that I don't want to bother with it. Direct tests are usually enough. So, if you were considering a direct test, what would you be testing? (I figure there is a really good chance that you are wrong in what you are trying to bootstrap, or in how you are doing it.)

> I have concerns because A and B are bounded below at 0 (but not bounded above), so the distribution may not be asymptotically normal.

... and that is relevant to what? Distributions of raw data are seldom (if ever) "asymptotically normal".

> But I also found the bootstrapped A and B are well away from zero; the 1% percentile has a value of 0.78.

... well, I should hope they are away from zero. Relevance?

> So could a t test be used in this situation? Or should I do another bootstrapping for the test?

Take your original problem to a statistician. "Bootstrap" is something to retreat to when you can't estimate error directly, and you have given no clue why you might need it.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: additional variance explained (SPSS)
On 11 May 2001 12:04:04 -0700, [EMAIL PROTECTED] (Dianne Worth) wrote:

> "I have a multiple regression y=a+b1+b2+b3+b4+b5. My Adj. R-sq is .403.
>
> "I would like to determine how much explanation of variance each IV provides. I have created individual models (y=a+b1+b2+b3) to obtain

Unfortunately - that is an F.A.Q. which has no easy answer. Please read up on multiple regression in some textbooks. The variables are acting together, so there is not actually any "amount that each IV provides." Unless the variables are totally uncorrelated (say, design factors), there is no satisfactory, unique answer.

[ One clever partition uses the sum of products, zero-order r times beta, which does add up to the R-squared. However, if it were a *satisfactory* generalization, it could never have terms less than zero... which does happen. ]

What we usually get is the "variance AFTER all the others", which is (for instance) obtained by subtraction, as you suppose; and that will usually add up to far less than 100%. However, beware: in odd cases, with "suppressor variables," these variances may add up to more than 100%. The regression does give you the t-test (or F-test) on the contribution of variance. Actually, that can be manipulated to give precisely the "variance after all the others", since the tests are using that variance in the numerator.

As to reporting: you should report the initial, zero-order correlation. That is other evidence about whatever is happening in the equation. So, you can report, "B3 accounts for <this much> by itself; and it can account for <that much> even after the contributions of the others are taken into account first."

Hope this helps. I think my stats-FAQ offers some more perspective on multiple regression.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
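For the bracketed remark, a minimal numerical sketch (predictors and numbers fabricated for illustration): it checks that the zero-order correlations times the standardized coefficients sum to the R-squared, and gets the "variance after all the others" for one predictor by subtraction.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 200
    x = rng.normal(size=(n, 3))
    x[:, 1] += 0.6 * x[:, 0]                      # correlated predictors
    y = 1.0 + x @ np.array([0.5, 0.3, -0.4]) + rng.normal(size=n)

    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]      # raw coefficients
    r2_full = np.corrcoef(X @ b, y)[0, 1] ** 2

    beta = b[1:] * x.std(axis=0, ddof=1) / y.std(ddof=1)   # standardized b
    r0 = np.array([np.corrcoef(x[:, j], y)[0, 1] for j in range(3)])
    print(r2_full, np.sum(r0 * beta))             # the two agree

    # "Variance after all the others" for the 3rd predictor, by subtraction:
    Xr = np.delete(X, 3, axis=1)                  # drop the 3rd predictor
    br = np.linalg.lstsq(Xr, y, rcond=None)[0]
    r2_reduced = np.corrcoef(Xr @ br, y)[0, 1] ** 2
    print(r2_full - r2_reduced)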
Re: Question
On 11 May 2001 07:34:38 -0700, [EMAIL PROTECTED] (Magill, Brett) wrote:

> Don and Dennis,
>
> Thanks for your comments, I have some points and further questions on the issue below.
>
> For both Dennis and Don: I think the option of aggregating the information is a viable one.

I would call it "unavoidable" rather than just "viable." The data that you show is basically aggregated already; there's just one item per person.

> Yet, I cannot help but think there is some way to do this taking into account the fact that there is variation within organizations. I mean, if I have an organizational salary mean of .70 (70%) with a very tiny

[ snip, rest ]

- I agree, you can use the information concerning within-variation. I think it is totally proper to insist on using it, in order to validate the conclusions, to whatever degree is possible. You might be able to turn around that 'validation' to incorporate it into the initial test; but I think the role as "validation" is easier to see by itself, first.

Here's a simple example where the 'variance' is Poisson.

(Ex.) A town experiences some crime at a rate that declines steadily, from 20,000 incidents to 19,900 incidents, over a 5-year period. The linear trend fitted to the several points is "highly significant" by a regression test. Do you believe it?

(Answer) What I would believe is: No, there is no trend, but it is probably true that someone is fudging the numbers. The *observed variation* in means is far too small for the totals to have arisen by chance. And the most obvious sources of error would work in the opposite direction. [That is, if there were only a few criminals responsible for many crimes each, and the number-of-criminals is what was subject to Poisson variation, THEN the number-of-crimes should be even more variable.]

In your present case, I think you can estimate on the basis of your factory (aggregate) data, and then you figure what you can about how consistent those numbers are with the un-aggregated data, in terms of means or variances.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
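The crime example can be put through a standard Poisson dispersion (index-of-dispersion) check: under Poisson variation, (n-1)*s^2/xbar for the yearly totals is roughly chi-squared with n-1 df, and a very small value says the counts are far too regular to be chance. A minimal sketch, with yearly totals invented to match the story:

    import numpy as np
    from scipy import stats

    counts = np.array([20000, 19975, 19950, 19925, 19900])  # invented yearly totals

    xbar = counts.mean()
    disp = (len(counts) - 1) * counts.var(ddof=1) / xbar    # index of dispersion

    # Under Poisson variation this is ~ chi-squared with n-1 df; a very SMALL
    # value means the whole spread, trend included, is far below what chance
    # alone would produce.
    p_too_regular = stats.chi2.cdf(disp, df=len(counts) - 1)
    print(disp, p_too_regular)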
Re: Variance in z test comparing percentages
On 11 May 2001 22:29:37 -0700, [EMAIL PROTECTED] (Donald Burrill) wrote:

> On Sat, 12 May 2001, Alexandre Kaoukhov (RD <[EMAIL PROTECTED]>) wrote:
> > I am puzzled with the following question:
> > In the z test for continuous variables we just use the sum of estimated variances to calculate the variance of a difference of two means, i.e.
> > s^2 = s1^2/n1 + s2^2/n2.

[ snip, Q and A, AK and DB ... ]

> > On the other hand the chi2 is derived from Z^2 as assumed by the first approach.

DB> > Sorry; the relevance of this comment eludes me.

Well -- every (normal) z score can be squared, to produce a chi-squared score. One particular formula for a z, when squared, matches the Pearson chi-squared test statistic for the 2x2 table.

> > Finally, I would like to know whether the second formula is ever used and if so does it have any name.

DB> > "Ever" is a wider universe of discourse than I would dare pretend to. Perhaps colleagues on the list may know of applications. I would be surprised if it had been named, though.

I don't remember a name, either. I think I do remember seeing a textbook that presented that t as their preferred "test for proportions."

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
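A minimal numerical check of that z-versus-chi-squared connection (the counts are invented): the two-proportion z computed with the pooled variance, once squared, reproduces the Pearson chi-squared for the 2x2 table when no continuity correction is applied.

    import numpy as np
    from scipy import stats

    a, b = 30, 70     # group 1: successes, failures  (invented counts)
    c, d = 45, 55     # group 2
    n1, n2 = a + b, c + d

    p1, p2 = a / n1, c / n2
    p = (a + c) / (n1 + n2)                       # pooled proportion
    z = (p1 - p2) / np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

    x2, _, _, _ = stats.chi2_contingency([[a, b], [c, d]], correction=False)
    print(z ** 2, x2)                             # identical, up to rounding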
Re: 2x2 tables in epi. Why Fisher test?
- I offer a suggestion of a reference. On 10 May 2001 17:25:36 GMT, Ronald Bloom <[EMAIL PROTECTED]> wrote: [ snip, much detail ] > It has become the custom, in epidemiological reports > to use always the hypergeometric inference test -- > The Fisher Exact Test -- when treating 2x2 tables > arising from all manner of experimental setups -- e.g. > > a.) the prospective study > b.) the cross-sectional study > 3.) the retrospective (or case-control) study > [ ... ] I don't know what you are reading, to conclude that this has "become the custom." Is that a standard for some journals, now? I would have thought that the Logistic formulation was what was winning out, if anything. My stats-FAQ has mention of the discussion published in JRSS (Series B) in the1980s. Several statisticians gave ambivalent support to Fisher's test. Yates argued the logic of the exact test, and he further recommended the X2 test computed with his (1935) adjustment factor, as a very accurate estimator of Fisher's p-levels. I suppose that people who hate naked p-levels will have to hate Fisher's Exact test, since that is all it gives you. I like the conventional chisquared test for the 2x2, computed without Yates's correction -- for pragmatic reasons. Pragmatically, it produces a good imitation of what you describe, a randomization with a fixed N but not fixed margins. That is ironic, as Yates points out (cited above) because the test "assumes fixed margins" when you derive it. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
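For anyone who wants to see the alternatives side by side, a minimal sketch (the 2x2 counts are invented): Fisher's exact p-level, the Pearson chi-squared without Yates's correction, and the Yates-corrected value.

    from scipy import stats

    table = [[12, 5],
             [ 7, 16]]                      # invented 2x2 counts

    odds, p_fisher = stats.fisher_exact(table)
    x2_plain, p_plain, _, _ = stats.chi2_contingency(table, correction=False)
    x2_yates, p_yates, _, _ = stats.chi2_contingency(table, correction=True)

    print("Fisher exact        p =", p_fisher)
    print("Pearson chi-squared p =", p_plain)
    print("Yates-corrected     p =", p_yates)
    # The Yates-adjusted chi-squared typically tracks the Fisher p-level
    # closely; the uncorrected test is the less conservative of the three.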
Re: (none)
- selecting from CH's article, and re-formatting. I don't know if I am agreeing, disagreeing, or just rambling on.

On 4 May 2001 10:15:23 -0700, [EMAIL PROTECTED] (Carl Huberty) wrote:

CH: "Why do articles appear in print when study methods, analyses, results, and conclusions are somewhat faulty?"

- I suspect it might be a consequence of "Sturgeon's Law," named after the science fiction author. "Ninety percent of everything is crap."

Why do they appear in print when they are GROSSLY faulty? Yesterday's NY Times carried a report on how the WORST schools have improved more than the schools that were only BAD. That was much-discussed, if not published. - One critique was, the absence of peer review. There are comments from statisticians in the NY Times article; they criticize, but (I thought) they don't "get it" on the simplest point. The article, while expressing skepticism by numerous people, never mentions "REGRESSION TOWARD the MEAN", which did seem (to me) to account for every single claim of the original authors whose writing caused the article.

CH: "[ ... ] My first, and perhaps overly critical, response is that the editorial practices are faulty. [ ... ] I can think of two reasons: 1) journal editors cannot or do not send manuscripts to reviewers with statistical analysis expertise; and 2) manuscript originators do not regularly seek methodologists as co-authors. Which is more prevalent?"

APA journals have started trying for both, I think. But I think that "statistics" only scratches the surface. A lot of what arises are issues of design. And then there are issues of "data analysis". Becoming a statistician helped me understand those so that I could articulate them for other people; but a lot of what I know was never important in any courses. I remember taking just one course in epidemiology, where we students were responsible for reading and interpreting some published report, for the edification of the whole class -- I thought I did mine pretty well, but the rest of the class really did stagger through the exercise. Is this "critical reading" something that can be learned, and improved?

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Analysis of a time series of categorical data
On 3 May 2001 09:46:12 -0700, [EMAIL PROTECTED] (R. Mark Sharp; Ext. 476) wrote:

> If there is a better venue for this question, please advise me.

- an epidemiology mailing list?

[ snip, much detail ]

>          Time point 1   Time point 2   Time point 3   Time point 4    Hosts
>          Inf  Not-Inf   Inf  Not-Inf   Inf  Not-Inf   Inf  Not-Inf    Tested
>
> G1-S1     1   14        11    4        11    1        13    2           57
> G1-S2     7    8        12    3        14    2        15    8           69
> G1-S3     1 246 18815915                                                95
>
> G2-S4     3   12        12    4        10    4        14    2           61
> G2-S5     5 105 68 7 1114                                               57
> G2-S6     2   26        12   12        11   16        14   12          105
>
> The questions are how can group 1 (G1) be compared to group 2 (G2) and how can subgroups be compared. I maintain that the heterogeneity within each group does not prevent pooling of the subgroup data within each group, because the groupings were made a priori based on genetic similarity.

Mostly, heterogeneity prevents pooling. What's an average supposed to mean? Only if the Ns represent naturally-occurring proportions, and your hypothesis does too, MIGHT you want to analyze the numbers that way. How much do you know about the speed of expected onset, and offset, of the disease?

If this were real, it looks to me like you would want special software. Or special evaluation of a likelihood function. I can put the hypothesis in simple ANOVA terms, comparing species (S). Then, the within-variability of G1 and G2 -- which is big -- would be used to test the difference between them, according to some parameter. Would that be an estimate of "maximum number afflicted"?

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Please ignore. Only a test
On 1 May 2001 01:35:58 -0700, [EMAIL PROTECTED] wrote:

> Anyhow, what's going on here is that I've been rather dismayed at how my very carefully formatted postings have appeared on the list with all their lines truncated in all the wrong spots. (Or at least they've shown up that way on my server.) Have been advised that I should keep their length to 72 characters or less, and am giving that a shot. Wanna see how well it works.

- I just scanned a number of posts in sci.stat.edu, which is EDSTAT-L, and I don't see any that are truncated/wrapped. I do see messages, now and then, in some groups, that have been line-wrapped by some agency. And I can set a VIEW option in Forte Agent that might provide truncation, or it will line-wrap for those messages where the whole paragraph has been sent as a single line. But I don't see a problem today. And I don't have any line-problem with a message that you posted a couple of days ago.

So, I think you are reporting on a feature of the mail program or the newsreader that you happen to be using, or a feature of the OPTIONS of the program. You can keep lines short to avoid causing those problems, for whomever. Or you can keep them short just to make them easier to read.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: (none)
On 1 May 2001 16:14:28 -0700, [EMAIL PROTECTED] (SamFaz Consulting) wrote:

> Under the Bill s. 1618 title III passed by the 105th US congress this letter cannot be considered SPAM as long as the sender includes contact information and a method of removal. To be removed, hit reply and type "remove" in the subject line.

Here was a message posted, that my reader saw as an attachment. The lines above were at the start of the SPAM.

Ahem. I am about 100% sure that the above is a lie. In multiple ways. For instance, is there a legal definition of SPAM?

It has been remarked that you do *not* want to use the "remove" option when someone sends SPAM that is using a *garbage* mailing list (such as the above; and any other mailing list that includes a mail-list, or a Usenet group). That's because the REPLY from you proves that your mail address is real, and current, and that you read the message. So the SPAMmer will save your name, specially.

If you can read the headers, you sometimes can make an effective complaint to the X-abuse: address, if there is one, or to the appropriate < Postmaster@isp-name >.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
progress in science [ was: rotations and PCA ]
- "progress in science" is the new topic. I comment. On 9 Apr 2001 07:12:08 -0700, [EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote: > Eric Bohlman wrote: >In science, it's not enough to > > say that you have data that's consistent with your hypothesis; you also > > need to show a) that you don't have data that's inconsistent with your > > hypothesis and b) that your data is *not* consistent with competing > > hypotheses. And there's absolutely nothing controversial about that last > > sentence [...] > > Well, I'd want to modify it a little. On the one hand, a certain amount > of inconsistency can be (and sometimes must be) dealt with by saying > "every so often something unexpected happens"; otherwise it would only > take two researchers making inconsistent observations to bring the whole > structure of science crashing down. And on the other hand there are Once upon a time, I spent many hours with the book, "Criticism and the growth of knowledge." Various (top) philosophers comment on Thomas Kuhn's contributions (normal and revolutionary science; paradigms; and so on), and on each other. In real science (I. Lakatos argues), models are strongly resistant to refutation so long as they remain fertile for research and speculation. The pertinent historical model is "phlogiston versus the caloric theory" -- The honored professors on neither side, it seems, ever convinced the other; there was plenty of conflicting data, for decades. But one side won new adherents and new researchers. > _always_ competing hypotheses. [Consider Jaynes' example of the > policeman seeing one who appears to be a masked burglar exiting from the > broken window of a jewellery store with a bag of jewellery; he (the > policeman) does *not* draw the perfectly logical conclusion that this > might be the owner, returning from a costume party, and, having noticed > that the window was broken, collecting his stock for safekeeping.] It is > sufficient to show that your data are not consistent with hypotheses > that are simpler or more plausible, or at least not much less simple or > plausible. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Student's t vs. z tests
On 21 Apr 2001 13:04:55 -0700, [EMAIL PROTECTED] (Will Hopkins) wrote: > I've joined this one at the fag end. I'm with Dennis Roberts. The way I > would put it is this: the PRINCIPLE of a sampling distribution is actually > incredibly simple: keep repeating the study and this is the sort of spread > you get for the statistic you're interested in. What makes it incredibly > simple is that I keep well away from test statistics when I teach stats to > biomedical researchers. I deal only with effect (outcome) statistics. I > even forbid my students and colleagues from putting the values of test > statistics in their papers. Test statistics are clutter. > > The actual mathematical form of any given sampling distribution is > incredibly complex, but only the really gifted students who want to make > careers out of statistical research need to come to terms with that. The So you guys are all giving advice about teaching statistics to psychology majors/ graduates, who have no aspirations or potential for being anything more than "consumers" (readers) of statistics? Or (similar intent) to biomedical researchers? Don't researchers deserve to be shown a tad more? A problem that I have run into is that Researchers who are well-schooled in the names and terms of procedures don't always recognize the leap to "good data analysis." Actually, that can be true about people trained as biostatisticians, too, despite a modicum of exposure to case studies and Real Data, and I suspect it is *usually* true about people just emerging from training as mathematical statisticians. Just a couple of thoughts. > rest of us just plug numbers into a stats package or spreadsheet. I'm not > sure what would be a good sequence for teaching the mathematical > forms. Binomial --> normal --> t is probably as good as any. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
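The "keep repeating the study" principle is easy to show by brute force rather than by formula. A minimal sketch (population and numbers invented): draw many samples from a decidedly non-normal population and look at the spread of the sample means.

    import numpy as np

    rng = np.random.default_rng(5)
    population = rng.exponential(scale=2.0, size=100_000)   # skewed "population"

    n = 25
    means = np.array([rng.choice(population, n).mean() for _ in range(5000)])

    print("population mean:     ", population.mean())
    print("mean of sample means:", means.mean())
    print("SD of sample means:  ", means.std(ddof=1))
    print("sigma/sqrt(n):       ", population.std() / np.sqrt(n))
    # The spread of the repeated-study means is what the standard error describes.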
Re: Question: Assumptions for Statistical Clustering (ie. Euclidean distance based)
On Sun, 22 Apr 2001 16:23:46 GMT, Robert Ehrlich <[EMAIL PROTECTED]> wrote:

> Clustering has a lot of associated problems. The first is that of cluster validity -- most algorithms define the existence of as many clusters as the user demands. A very important problem is homogeneity of variance. So a Z transformation is not a bad idea whether or not the variables are normal.

Unless you want the 0-1 variable to count as 10% as potent as the variable scored 0-10. The classical default analysis does let you WEIGHT the variables, by using arbitrary scaling. (Years ago, it was typical, shoddy documentation of the standard default, that they didn't warn the tyro. Has it improved? Has the default changed?)

> Quasi-normality is about all you have to assume -- the absence of intersample polymodality and the approximation of the mean and the mode. However, to my knowledge, there is no satisfying "theory" associated with cluster analysis -- only rules of thumb.

[ snip, original question ]

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
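A minimal sketch of the scaling point (toy data, nothing real): with raw Euclidean distance a 0-10 variable swamps a 0-1 variable; z-scoring puts them on an equal footing, and rescaling a standardized column is exactly how you would weight it on purpose.

    import numpy as np

    rng = np.random.default_rng(6)
    small = rng.integers(0, 2, 50).astype(float)    # a 0-1 variable
    big = rng.uniform(0, 10, 50)                    # a 0-10 variable
    X = np.column_stack([small, big])

    def mean_sq_pairwise(col):
        d = col[:, None] - col[None, :]
        return np.mean(d ** 2)          # this variable's share of squared distance

    print([mean_sq_pairwise(X[:, j]) for j in range(2)])   # "big" dominates

    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)        # z-scores
    print([mean_sq_pairwise(Z[:, j]) for j in range(2)])    # roughly equal
    # Multiplying Z[:, 1] by a constant would then up-weight that variable
    # deliberately, instead of by an accident of scaling.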
Re: ANCOVA vs. sequential regression
On Fri, 20 Apr 2001 13:11:02 -0400, "William Levine" <[EMAIL PROTECTED]> wrote: ... > A study was conducted to assess whether there were age differences in memory > for order independent of memory for items. Two preexisting groups (younger > and older adults - let's call this variable A) were tested for memory for > order information (Y). These groups were also tested for item memory (X). > > Two ways of analyzing these data came to mind. One was to perform an ANCOVA > treating X as a covariate. But the two groups differ with respect to X, > which would make interpretation of the ANCOVA difficult. Thus, an ANCOVA did > not seem like the correct analysis. - "potentially problematic" - but not always wrong. > A second analysis option (suggested by a friend) is to perform a sequential > regression, entering X first and A second to > test if there is significant leftover variance explained by A. [ snip ... suggestions? ] Yes, you are right, that is exactly the same as the ANCOVA. What can you do? What can you conclude? That depends on - how much you know and trust the *scaling* of the X measure, - how much overlap there is between the groups, and - how much correlation there is, X and Y. You probably want to start by plotting the data. When you use different symbols for Age, what do you see about X and Y? and Age? Here's a quick example of hard choices when groups don't match. Assume: group A improves, on the average, from a mean score of 4, to 2. Assume group B improves from 10 to 5 Then: a) A is definitely better in "simple outcome" at 2 vs. 5; b) B is definitely better in "points of improvement" at 5 vs. 2; c) A and B fared exactly as well, in terms of "50% improvement" (dropping towards a 0 that is apparently meaningful). I would probably opt for that 3rd interpretation, given this set of numbers, since the 3rd answer preserves a null hypothesis. With another single set of numbers in hand, I would lean towards *whatever* preserves the null. But here is where experience is supposed to be a teacher -- If you have dozens of numbers, eventually you have to read them with consistency, instead of bending an interpretation to fit the latest set. But if you do have masses of data on hand, then you should have extra evidence about correlations, and about additive or multiplicative scaling. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
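On the point that the sequential regression is exactly the ANCOVA, a minimal sketch (fabricated data, with the age group coded 0/1): the 1-df F for entering A after X is the square of the t for the group term in the model with both X and A.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n = 40
    age = np.repeat([0.0, 1.0], n // 2)              # younger / older group
    x = rng.normal(50, 10, n) + 5 * age              # item memory (groups differ)
    y = 0.6 * x - 3.0 * age + rng.normal(0, 5, n)    # memory for order

    ones = np.ones(n)
    X_red = np.column_stack([ones, x])               # X entered first
    X_full = np.column_stack([ones, x, age])         # ... then the group term

    def rss(design):
        b = np.linalg.lstsq(design, y, rcond=None)[0]
        return np.sum((y - design @ b) ** 2), b

    rss_red, _ = rss(X_red)
    rss_full, b_full = rss(X_full)

    F = (rss_red - rss_full) / (rss_full / (n - 3))  # increment for A after X
    p = stats.f.sf(F, 1, n - 3)

    # The ANCOVA t for the group term, squared, is the same number:
    sigma2 = rss_full / (n - 3)
    se_A = np.sqrt(sigma2 * np.linalg.inv(X_full.T @ X_full)[2, 2])
    print(F, (b_full[2] / se_A) ** 2, p)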
Re: Student's t vs. z tests
On 19 Apr 2001 05:26:25 -0700, [EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote:

[ ... ]

> The z test and interval do have some value as a pedagogical scaffold with the better students who are intended to actually _understand_ the t test at a mathematical level by the end of the course.
>
> For the rest, we - like construction crews - have to be careful about leaving scaffolding unattended where youngsters might play on it in a dangerous fashion.
>
> One can also justify teaching advanced students about the Z test so that they can read papers that are 50 years out of date. The fact that some of those papers may have been written last year - or next - is, however, unfortunate; and we should make it plain to *our* students that this is a "deprecated feature included for reverse compatibility only".

Mainly, I disagree.

I had read 3 or 4 statistics books and used several stat programs before I enrolled in graduate courses. One of the *big surprises* to me was to learn that some statistics were approximations, through-and-through, whereas others might be 'exact' in some sense. Using z as the large-sample test, in place of t, is approximate. Using z as the test-statistic on a dichotomy or ranks is exact, since the variances are known from the marginal Ns. Using z for *huge* N is a desirable simplification, now and then.

Is the 1-df chi-squared equally worthless, in your opinion? A lot of those exist, fundamentally, as the square of a z that *could* be used instead (for example, McNemar's test).

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
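On how much the z-for-t approximation actually costs, a minimal sketch comparing the two-sided 5% critical values of t with the normal 1.96 across a few sample sizes:

    from scipy import stats

    for df in (5, 10, 30, 100, 1000):
        print(df, round(stats.t.ppf(0.975, df), 3))
    print("normal:", round(stats.norm.ppf(0.975), 3))
    # t: 2.571, 2.228, 2.042, 1.984, 1.962 ... against z = 1.960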
Re: regression hypotheses
On Thu, 19 Apr 2001 10:27:40 -0400, "Junwook Chi" <[EMAIL PROTECTED]> wrote: > Hi everybody! > I am doing Tobit analysis for my research and want to test regression > hypotheses (Normality, Constant variance, Independence) using plots of > residuals. I also want to check outliers and leverage. but I am not sure > whether I could use these tests for the Tobit model (non-linear) or they > apply only for linear regression. does anybody know it? thank you! > Does your Tobit model have *testing* as part of it? Normality might not matter much for any of the models. Statistical tests - any of them - depend on how you weight your cases. So, constant variance matters (more or less). You can measure to see how you get by. Independence is a tricky notion, at times, but you surely need it. It is best to assure independence from your model and your theory if you can, because it can be tough to account for autocorrelation, etc., after the fact. This is also, "If you want useful testing." -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Simple ? on standardized regression coeff.
On Tue, 17 Apr 2001 16:32:06 -0500, "d.u." <[EMAIL PROTECTED]> wrote:

> Hi, thanks for the reply. But is beta really just b/SD_b? In the standardized case, the X and Y variables are centered and scaled. If Rxx is the corr matrix

[ ... ]

No. b/SD_b is the t-test. Beta is b, after it is scaled by the SD of X and the SD of Y: beta = b*(SD_X/SD_Y). Yes, beta is the b you get if X and Y are each scaled to unit variance (standardized).

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
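A minimal numerical check of that scaling (data invented): fit the slope on raw scores, rescale it by the SDs, and compare with the slope obtained after z-scoring X and Y.

    import numpy as np

    rng = np.random.default_rng(8)
    x = rng.normal(100, 15, 200)
    y = 0.4 * x + rng.normal(0, 10, 200)

    b = np.polyfit(x, y, 1)[0]                       # raw slope
    beta_from_b = b * x.std(ddof=1) / y.std(ddof=1)

    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    beta_direct = np.polyfit(zx, zy, 1)[0]           # slope on standardized scores

    print(beta_from_b, beta_direct, np.corrcoef(x, y)[0, 1])
    # With a single predictor all three coincide; with several predictors the
    # first two still agree, but beta is no longer the zero-order correlation.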
Re: Simple ? on standardized regression coeff.
On Mon, 16 Apr 2001 20:24:10 -0500, "d.u." <[EMAIL PROTECTED]> wrote:

> Hi everyone. In the case of standardized regression coefficients (beta), do they have a range that's like a correlation coefficient's? In other words, must they be within (-1,+1)? And why, if they do? Thanks!

There is no limit on the raw coefficient, b, so there is no limit on beta, which is just b rescaled by the SDs. In practice, b gets large when there is a suppressor relationship, so that the x1-x2 difference is what matters, e.g., (10x1-9x2). Beta is about the size of the univariate correlation when the co-predictors balance out in their effects.

I usually want to consider a different equation if any beta is greater than 1 or has the opposite sign from its corresponding, initial r -- for instance, I might combine (X1, X2) in a rational way.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: In relation to t-tests
On Mon, 09 Apr 2001 10:44:40 -0400, Paige Miller <[EMAIL PROTECTED]> wrote: > "Andrew L." wrote: > > > > I am trying to learn what a t-test will actually tell me, in simple terms. > > Dennis Roberts and Paige Miller, have helped alot, but i still dont quite > > understand the significance. > > > > Andy L > > A t-test compares a mean to a specific value...or two means to each > other... [ ... ] I remember my estimation classes, where the comparison was always to ZERO for means. To ONE, I guess, for ratios. Technically speaking, or writing. For instance, if the difference in averages X1, X2 is expected to be zero, then "{(X1-X2) -0 }" ... is distributed as t . It might look like a lot of equations with the 'minus zero' seemingly tacked on, but I consider this to be good form. It formalizes as minus -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Logistic regression advice
On 6 Apr 2001 13:15:34 -0700, [EMAIL PROTECTED] (Zina Taran) wrote:

[ ... on logistic regression ]

ZT: "1). The 'omnibus' chi-squared for the equation. Is it accurate to say that I can interpret individual significant coefficients if (and only if) the equation itself is significant?"

Confused question. Why do you label it the omnibus test? When you think to use that term, the term is (mainly) a ROLE for the overall test, or for a test that subsumes a coherent set of several tests; sometimes you use a test in that role, and sometimes you don't.

ZT: "2) A few times I added interaction terms and some things became significant. Can I interpret these even if the interaction variable itself (such as 'age') is not significant? Can I interpret an interaction term if neither variable has a significant beta?"

Probably not. Assuredly not, unless someone has used care and attention (and knowledge) in the exact dummy-coding of the effects.

[ ... snip, 'Nagelkerke' that I don't recall; 'massive' regression, which is a term that escapes me, but I think it means 'no hypotheses, test everything', and so I disapprove. ]

ZT: "5) I know the general rule is 'just the facts' in the results section, meaning that there should be no explanation or interpretation regarding the results. When writing the results section do I specifically draw conclusions as to whether a hypothesis is supported or does that get left to the discussion?"

Do you have an example that is difficult? - It seems to me that if the analyses are straightforward, there should be little question about what the 'results' mean, when you lay them out in their own, minimalist section. In other words, leave discussion to the discussion; but that should be a re-cap of what's apparent. You hope.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
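On the first question: the "omnibus" chi-squared printed for a logistic regression is the likelihood-ratio test of the fitted equation against the intercept-only model. A minimal sketch with statsmodels (the data, the variable names, and the effect sizes are all invented):

    import numpy as np
    from scipy import stats
    import statsmodels.api as sm

    rng = np.random.default_rng(9)
    n = 300
    age = rng.normal(40, 10, n)
    income = rng.normal(50, 15, n)
    logit_p = -4 + 0.05 * age + 0.03 * income
    y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

    X = sm.add_constant(np.column_stack([age, income]))
    full = sm.Logit(y, X).fit(disp=0)
    null = sm.Logit(y, np.ones((n, 1))).fit(disp=0)      # intercept-only model

    lr = 2 * (full.llf - null.llf)                       # omnibus chi-squared, 2 df
    print(lr, stats.chi2.sf(lr, df=2))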
Re: rotations and PCA
- Intelligence, figuring what it might be, and categorizing it, and measuring it... I like the topics, so I have to post more.

On Thu, 05 Apr 2001 22:09:33 +0100, Colin Cooper <[EMAIL PROTECTED]> wrote:

> In article <[EMAIL PROTECTED]>, Rich Ulrich <[EMAIL PROTECTED]> wrote:
> > I liked Gould's book. I know that he offended people by pointing to gross evidence of racism and sexism in 'scientific reports.' But he has (I think) offended Carroll in a more subtle way. Gould is certainly partial to ideas that Carroll is not receptive to; I think that is what underlies this critique.
> ===snip
>
> I've several problems with Gould's book.
>
> (1) Sure - some of the original applications of intelligence testing (screening immigrants who were ignorant of the language using tests which were grossly unfair to them) were unfair, immoral and wrong. But why impugn the whole area as 'suspect' because of the politically-dubious activities of some researchers a century ago? It

I think Gould "impugned" more than just one area. The message, as I read it, was, "Be leery of social scientists who provide self-congratulatory and self-serving, simplistic conclusions." In recent decades, I imagine that economists have been bigger at that than psychologists. Historians have quite a bit of 20th century history-writing to live down, too.

> seems to me to be exceptionally surprising to find that ALL abilities - musical, aesthetic, abstract-reasoning, spatial, verbal, memory etc. correlate not just significantly but substantially.

Here is one URL for references to Howard Gardner, who has shown some facets of independence of abilities (and who you mention, below). http://www.newhorizons.org/trm_gardner.html

> (2) Gould's implication is that since Spearman found one factor (general ability) whilst Thurstone found about 9 identifiable factors, then factor analysis is a method of dubious use, since it seems to generate contradictory models. There are several crucial differences

- I read Gould as being more subtle than that.

> between the work of Spearman and Thurstone that may account for these differences. For example, (a) Spearman (stupidly) designed tests containing a broad spectrum of abilities: his 'numerical' test, for example, comprised various sorts of problems - addition, fractions, etc. Thurstone used separate tests for each: so Thurstone's factors essentially corresponded to Spearman's tests. (b) Thurstone's work was with students, where the limited range of abilities would reduce the magnitude of correlations between tests. (c) More recent work (e.g., Gustafsson, 1981; Carroll, 1993) using exploratory factoring and CFA finds good evidence for a three-stratum model of abilities: 20+ first-order factors, half a dozen second-order factors, or a single 3rd-order factor.
>
> (3) Interestingly, Gardner's recent work has come to almost exactly the same conclusions from a very different starting point. Gardner identified groups of abilities which, according to the literature, tended to covary - for example, which tend to develop at the same age, all change following drugs or brain injury, which interfere with each other in 'dual-task' experiments and so on. His list of abilities derived in this way is very similar to the factors identified by Gustafsson, Carroll and others.

- but Gardner has "groups of abilities" that are, therefore, distinct from each other. 
And also, only a couple of abilities are usually rewarded (or even measured) in our educational system. When I read his book, I thought Gardner was being overly "scholastic" in his leaning, and restrictive in his data, too. > I have a feeling that we're going to get on to the issue of whether > factors are merely arbitrary representations of sets of data or whether > some solutions are more are more meaningful than others - the rotational > indeterminacy problem - but I'm off to bed! Well, how much data can you load into one factor analysis? How much virtue can you assign to one 'central ability'? - I see the problem as philosophical instead of numeric. What you will *identify* as a single factor (by techniques of today) will be more trivial than you want. Daniel Dennett, in "Consciousness Explained," does a clever job of defining consciousness. And trivializing it; what I was interested in (I reflect to myself) was something much grander, something more meaningful. But intelligence and self-awareness are separate topics, and big ones. Julian Jaynes's book was more use
Re: Fw: statistics question
I reformatted this. Quoting a letter from Carmen Cummings to himself, On 6 Apr 2001 08:48:38 -0700, [EMAIL PROTECTED] wrote: > The below question was on my Doctorate Comprehensives in > Education at the University of North Florida. > > Would one of you learned scholars pop me back with >possible appropriate answers. the question An educational researcher was interested in developing a predictive scheme to forecast success in an elementary statistics course at a local university. He developed an instrument with a range of scores from 0 to 50. He administered this to 50 incoming frechmen signed up for the elementary statistics course, before the class started. At the end of the semester he obtained each of the 50 student's final average. Describe an appropriate design to collect data to test the hypothesis. = end of cite. I hope the time of the Comprehensives is past. Anyway, this might be better suited for facetious answers, than serious ones. The "appropriate design" in the strong sense: Consult with a statistician IN ORDER TO "develop an instrument". Who decided only a single dimension should be of interest? (How else does one interpret a score with a "range" from 0 to 50?) Consult with a statistician BEFORE administering something to -- selected? unselected? -- freshman; and consult (perhaps) in order to develop particular hypotheses worth testing. I mean, the kids scoring over 700 on Math SATs will ace the course, and the kids under 400 will have trouble. Generalizing, of course. If "final average" (as suggested) is the criterion, instead of "learning." But you don't need a new study to tell you those results. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Repeated-measures t test for ratio level data
replying to mine, and catching an error, (My apologies for the error, please, and my thanks to Jim for the catch),

On Tue, 03 Apr 2001 15:05:20 -0700, James H. Steiger wrote:

> Things are not always what they seem.
>
> Consider the following data:
>
>  A    B    A/B     Log A   Log B   Log A - Log B
>  3    1    3        .477    0        .477
>  1    3     .333    0       .477    -.477
>  2    2    1        .301    .301     0
>
> The t test for the difference of logs obviously gives a value of zero, while the t for the hypothesis that the mean ratio is 1 has a positive value.
>
> This seems to show that the statement that the two tests are "precisely, 100% identical" is incorrect.

[ snip, more ... ]

Yep, sorry -- I fear that I left out a step, even as I sat and read the problem. And when I read my own 1st draft of an answer, I saw that it was worded a bit equivocally. I made that statement firmer, but I forgot to make sure it was still true, in detail.

{ 1/2, 1/1, 2/1 } (equal to .5, 1.0, 2.0) clearly does not define equal steps. Except, if you first take log(X). The automatic advice for "ratios" -- not always true, but always to be considered -- is "take the logs". When you have a ratio (above zero), there is far less room between 0-1 than above 1. Is this asymmetry ever desirable, for the metric? Well, it *ought* to be desirable, if you are going to use a ratio without further transformation. But I think it is not apt to be desirable for human reaction times.

For log(A) and log(B), consider: log(A/B) = log(A) - log(B). The one-sample test on the *LOG* of A/B is the same as the test on the difference in logs. Those are the tests I had in mind... or should have had in mind.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
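Steiger's three rows make the distinction easy to verify numerically. A minimal sketch (only the three data rows above are used; the scipy calls are my own illustration):

    import numpy as np
    from scipy import stats

    A = np.array([3.0, 1.0, 2.0])
    B = np.array([1.0, 3.0, 2.0])

    print(stats.ttest_1samp(A / B, 1.0))           # t on the raw ratio, against 1
    print(stats.ttest_rel(np.log(A), np.log(B)))   # paired t on the logs
    print(stats.ttest_1samp(np.log(A / B), 0.0))   # same test as the line above

The last two lines give the same t (zero, for these rows); the first does not, which is the whole point of the correction.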
Re: attachments
On Fri, 06 Apr 2001 13:34:03 GMT, Jerry Dallal <[EMAIL PROTECTED]> wrote: > "Drake R. Bradley" wrote: > > > While I agree with the sentiments expressed by others that attachments should > > not be sent to email lists, I take exception that this should apply to small > > (only a few KB or so) gif or jpeg images. Pictures *are* often worth a > > thousand words, and certainly it makes sense that the subscribers to a stat > > It's worth noting that some lists have gateways to Usenet groups. > Usenet does not support attachments, so they will be lost to Usenet > readers. [ break ] - my Usenet connection seems to give me all the attachments. But if I depended on a modem and a 7-bit protocol, I would be pleased if my ISP filtered out the occasional, 100 kilobyte 8-bit attachment. (Some folk still use 7-bit protocols, don't they?) > Also, even in the anything-goes early 21-st Century climate > of the Internet, one big no-no remains the posting of binaries to > non-binary groups. Right; that's partly because of size. My vendor has the practice, these days, of saving ordinary groups for a week, binary groups (which are the BULK of their internet feed) for 24 hours. Binary strings may be treated as screen-commands, if your Reader doesn't know to package them as an 'attachment' or otherwise ignore them. Some attachments are binary, some are not. Standard HTML files are ASCII, with the added 'risk' (I sometimes look at it that way) of invoking an immediate internet connection. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Repeated-measures t test for ratio level data
Doing that one-sample t-test on the ratio is not a bad idea. But it is not a new idea, either. It is, precisely, 100% identical to doing a repeated measures test on the logarithm of the raw numbers. Which is the same as the paired t-test. On 2 Apr 2001 11:53:11 -0700, [EMAIL PROTECTED] (Dr Graham D Smith) wrote: > I would like to start a discussion on a family of procedures > that tend not to be emphasised in the literature. The procedures > I have in mind are based upon the ratio between two sets of > scores from the same sample. [ ... snip, detail ] > My feeling is that the t test for ratios should have a similar > status and profile as the repeated measures t test (on > differences). I suspect that the t test for differences is often > used when the t test for ratios would be more suitable. So > why is the procedure not more widely used? Perhaps this > is only a problem within psychology where ratio level data > is not commonly used. [ snip, rest ] Logarithms (if that is what is appropriate) is a more general start to a model. Building directly on ratios is not as convenient. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: rotations and PCA
On Sun, 01 Apr 2001 22:13:18 +0100, Colin Cooper <[EMAIL PROTECTED]> wrote: > ==snip See Stephen Jay Gould's _The Mismeasure of Man_ for more > > details; note that Thurstone adopted varimax rotations because their > > results were consistent with *his* pet theories about intelligence. > Hmm. Gould's book is generally reckoned to be rather partial and not > particularly accurate - see for example JB Carroll's 'editorial review' > of the second edition in 'Intelligence' about 4 years ago. (sorry - > haven't got the exact reference to hand). Comrey & Lee's book is one of A google search on < Carroll Gould Intelligence > immediately hit a copy of the article -- http://www.mugu.com/cgi-bin/Upstream/Issues/psychology/IQ/carroll-gould.html I liked Gould's book. I know that he offended people by pointing to gross evidence of racism and sexism in 'scientific reports.' But he has (I think) offended Carroll in a more subtle way. Gould is certainly partial to ideas that Carroll is not receptive to; I think that is what underlies this critique. After Google-ing Carroll, I see that he is a long-time researcher in "intelligence." To me, it seems that Gould is in touch with the newer stream of hypotheses about intelligence -- ideas that tend to invalidate the basic structures of old-line theorists like Carroll. In the article, Carroll eventually seems to express high enthusiasm for 'new techniques' (compared to what Gould made use of) in factor analysis. I can say, my own experience and reading has not led me to the same enthusiasm. Am I missing something? > the better introductions - Loehlin 'latent variable Models' is good if > you're coming to it from a structural equation modelling background. > > Colin Cooper -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: convergent validity
I'm coming in at a different slant from what I have seen posted on this thread (in sci.stat.edu). On Thu, 29 Mar 2001 20:30:59 +0200, "H.Goudriaan" <[EMAIL PROTECTED]> wrote: ... > I have 2 questionnaires assessing (physical and emotional) health of > heart patients. The 1st measures present state and it's assessed before > treatment and a couple of months after treatment, so that difference > scores can be calculated. The 2nd questionnaire is assessed after > treatment only, and asks respondents how much they have changed on every > aspect (same aspects as the first questionnaire) since just before > treatment. > Respondents received both questionnaires. Now I would like to > investigate the convergent validity of the two domains assessed with > both questionnaire versions. Is there a standard, straightforward way of > doing this? Someone advised me to do a factoranalysis (PCA) (on the > baseline items, the serially measured change scores and the > retrosepctively assessed change scores) and then compare the > factorloadings (I assume after rotation? (Varimax?)). I haven't got a > good feeling about this method for two reasons: > - my questionnaire items are measured on 5- and 7-point Likert scales, > so they're not measured on an interval level and consequently not > (bivariate) normally distributed; [ snip, about factor loading.] If items were really Likert, they would be close enough to normal. But there is no way (that comes to mind) that you should have labels for "Change" that are Likert: Likert range is "completely disagree" ... "completely agree" and responses describe attitudes. You can claim to have Likert-type labels, if you do have a symmetrical set. That is more likely to apply to your Present-Status reports, than to Changes. At any rate -- despite the fact that I have never found clean definitions on this -- having a summed score is not enough to qualify a scale as Likert. Thus, you *may* be well-advised, if someone has advised you so, to treat your responses as 'categories' -- at least, until you do the dual-scaling or other item analyses that will justify regarding them as "interval." For someone experienced in drawing up scales, or if you were picking up items from much-used tests, that would not be a problem; but people are apt to make mistakes if they haven't seen those mistakes well-illustrated. What is your question about individual items? Are some, perhaps, grossly inappropriate? Or, too rarely marked? If 11 are intended for a "physical factor", there *should* emerge a factor or principal component to reflect it. Ditto, for emotional. Any items that don't load are duds (that would be my guess). Or do you imagine 2 strong factors? Again -- whatever happens should not come as much surprise if you've done this sort of thing before. IF the items are done in strict-parallel, it seems unnecessary and obfuscatory to omit a comparison of responses, item by item. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: rotations and PCA
On Thu, 29 Mar 2001 10:17:09 +0200, "Nicolas Voirin" <[EMAIL PROTECTED]> wrote: > OK, thanks. > > In fact, it's a "visual" method to see a set of points with the better > view (maximum of variance). > It's like to swivel a cube around to see all of its sides ... but this > in more than 3D. > When I show points in differents planes (F1-F2, F2-F3, F2-F4 ... for > example), I make rotations, isn't it ? I think I would use the term, "projection" onto specific planes, if you are denoting x,y, and z (for instance) with F1, F2, F3 : You can look at the x-y plane, the y-z plane, and so on. Here is an example in 2 dimensions, which suggests a simplified version of an old controversy about 'intelligence'-- tests might provide two scores of Math=110, Verbal= 90. However, the abilities can be reported, with no loss of detail, as General= 100, M-versus-V= +20. Historically, Spearman wanted us all to conclude that "Spearman's g" had to exist as a mental entity, since its statistical description could be reliably produced. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
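The Math/Verbal example is just a change of coordinates, which is all a rotation amounts to. A minimal sketch (the 110/90 scores come from the example above; the transformation is the usual average-and-difference re-expression, a 45-degree rotation up to rescaling):

    import numpy as np

    scores = np.array([110.0, 90.0])          # Math, Verbal

    # New axes: "General" = average of the two, "M-versus-V" = their difference.
    T = np.array([[0.5,  0.5],
                  [1.0, -1.0]])
    general, m_vs_v = T @ scores
    print(general, m_vs_v)                    # 100.0 and +20.0 -- no detail lost

    # Invertible, so the original scores come straight back:
    print(np.linalg.inv(T) @ np.array([general, m_vs_v]))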
Re: One tailed vs. Two tailed test
- I finally get back to this topic - On Fri, 16 Mar 2001 23:40:07 GMT, [EMAIL PROTECTED] (Jerry Dallal) wrote: > Rich Ulrich ([EMAIL PROTECTED]) wrote: > > : Notice, you can take out a 0.1% test and leave the main > : test as 4.9%, which is not effectively different from 5%. > > I've no problem with having different probabilities in the > two tails as long as they're specified up front. I say > so on my web page about 1-sided tests. I have concerns about > getting investigators to settle on anything other than > equal tails, but that's a separate issue. > The thing I've found interesting about > this thread is that everyone who seems to be defending > one-tailed tests is proposing something other than a > standard one-tailed test! > > FWIW, for large samples, 0.1% in the unexpected tail > corresponds to a t statistic of 3.09. I'd love to > be a fly on the wall while someone is explaining to > a client why that t = 3.00 is non-significant! :-) = concerning the 5.1% solution; asymmetrical testing with 0.05 as a one-sided, nominal level of significance, and 0.001 as the other side (as a precaution). Jerry, In that last line, you are jumping to a conclusion. Aren't you jumping to a conclusion? If the Investigator was seriously headed toward a 1-sided test -- which (as I imagine it) is how it must have been, that he could have been talked-around to the prospect of a 5.1% combined test instead -- then he won't be eager to jump on t= 3.00 as significant. I mean, it can be easier to "publish" if you pass magic size, but it is easier to avoid "perishing" in the long run, with a series of connected hypotheses. I think of the Investigator as torn three ways. a) Stick to the plan; ignore the t=3.0, which is *not quite* 0.001. 'It did not reach the previously stated, 0.001 nominal level, and I still don't believe it. (And I don't want to furnish ammunition for arguments for other things.)' Practically speaking, the risk of earning blame for stonewalling like that is not high. b) Run with it; claim that a two-sided test always *did* make sense and the statistician was to blame for brain-fever, for wanting 1-tailed in the first place. (Or, never mention it.) The fly on the wall probably would not see this. The statistician should have already quit. c) Report the outcome in the same diffident style as would have been earned by a 0.06 result in the other direction, "not quite meeting the preset nominal test size, but it is suggestive." Unlike the 6% result, this one is unwelcome. T=3.00 will stir up investigation to try to undermine the implication (such as it is). I have trouble taking the imagined outcome much further without speculating about where you have the trade-off between effect-size and N; and whether the "experimental design" was thoroughly robust -- and there's a different slant to the arguments if you are explaining or explaining-away the results of uncontrolled observation. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: rotations and PCA
On Wed, 28 Mar 2001 08:57:36 +0200, "Nicolas V." <[EMAIL PROTECTED]> wrote: > Hi, > > What are "rotations" in PCA ? > What is the difference between "rotated" and "unrotated" PCA ? > Do they exist in other analyses? = just on 'existence' = Rotations certainly exist in other analyses, and for other purposes. Anytime you have a coordinate system, you have the potential for drawing in different axes, and then describing locations in terms of the new system. On a map in 2D, you can describe positions as directions, N-E-S-W. But if a river cuts along the diagonal, it could be more sensible to describe cities as "up-river" from the ocean by some amount, and by how far they are from the main tributary. - Simplification like that is the idea behind rotation. Common Factors are usually selected from the full-rank set, and rotated, so the description will be simpler. The full-rank set of PCs is often used as a matter of convenience (the vectors are not correlated); and there's no help from rotation if there's no separate description being used. The set of "significant" Factors in canonical correlation might be subjected to rotation, because they are rather like Common Factors; but that is seldom done (in what I read). -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
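Here is a rough sketch of the river example, in Python with numpy (the city coordinates are invented): rotating the axes onto the river's diagonal puts nearly all of the spread onto one "up-river" coordinate, which is the sort of simpler description rotation is after.

import numpy as np

xy = np.array([[0, 0], [10, 9], [20, 21], [30, 29], [40, 41]], float)  # (East, North)

theta = np.radians(45)                          # river runs along the diagonal
R = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
rotated = xy @ R.T                              # new axes: up-river, off-river

print(np.var(xy, axis=0, ddof=1))       # spread split between East and North
print(np.var(rotated, axis=0, ddof=1))  # nearly all of it on the up-river axis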
Re: Data Reduction
On 26 Mar 2001 19:12:22 -0800, [EMAIL PROTECTED] (Dianne Worth) wrote: > I have a regression model with (mostly) identifiable IVs. In addition, > I want to examine another set of responses and have about 15-20 > questions that relate to that new 'factor.' [ ... ] There is just one factor to be defined? And you have a set of proposed questions, just 15 or 20? It seems like you should be able to look at your correlation matrix, or unrotated PC analysis, or FA, and drop the variables that are odd. You add together the ones that are okay, and that's it. Do you really need a second factor? Or more? You might have to say why, before you figure out which analysis to do, and how many factors to save. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
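A minimal sketch of the "look at the correlations, drop the odd items, add up the rest" advice, in Python with numpy; the cutoff and the variable names are only placeholders, not a recommendation:

import numpy as np

def item_rest_correlations(X):
    """X: cases x items. Correlation of each item with the sum of the other items."""
    X = np.asarray(X, float)
    total = X.sum(axis=1)
    return np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1]
                     for j in range(X.shape[1])])

# usage sketch:
# keep = item_rest_correlations(items) > 0.30   # items with low item-rest r are the "odd" ones
# score = items[:, keep].sum(axis=1)            # the one factor, scored as a simple sum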
Re: 1 tail 2 tail mumbo jumbo
On Mon, 19 Mar 2001 13:14:39 -0500, Bruce Weaver <[EMAIL PROTECTED]> wrote: > On Fri, 16 Mar 2001, Rich Ulrich wrote: [ snip, including earlier post ] > > That ANOVA is inherently a 2-sided test. So is the traditional 2x2 > > contingency table. That is because, sides refer to hypotheses. > > > > > I agree with you Rich, except that I don't find "2-sided" all that > appropriate for describing ANOVA. For an ANOVA with more than 2 groups, > there are MULTIPLE patterns of means that invalidate the null hypothesis, > not just 2. With only 3 groups,for example: > > A < B < C > A < C < B > B < A < C [ ... ] > And then if you included all of the cases where 2 of the means are equal > to each other, but not equal to the 3rd mean, there are several more > possibilities. And these ways of departing from 3 equal means do not > correspond to tails in some distribution. > > There's my attempt to add to the confusion. ;-) If I convince people that they want only one *contrast* for their ANOVA, then it is just two-sided. I've been talking people out of blindly testing multiple-groups and multiple periods, for years. Then I have to start over on the folks, to convince them about MANOVA. If there are two groups and two variables, there are FOUR sides -- and that's if you just count what is 'significant' by the single variables. Most of the possible results are not useful ones; that is, they are not easily interpretable, when no variable is 'significant' by itself, or when logical directions seem to conflict. We can interpret "group A is better than B." And we analyze measures that have the scaled meaning, where one end is better. So the sensible analysis uses a defined contrast, the 'composite score'; and then you don't have to use the MANOVA packages, and you have the improved power of testing just one or two sides. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
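Here is a bare-bones sketch of the 'composite score' idea above (Python, scipy assumed; the data layout is hypothetical): standardize each measure so that higher means better, average them into one composite, and test that single contrast with an ordinary two-group t-test instead of a MANOVA.

import numpy as np
from scipy import stats

def composite_t_test(Y_a, Y_b):
    """Y_a, Y_b: cases x measures for groups A and B, all oriented so higher = better."""
    Y = np.vstack([Y_a, Y_b]).astype(float)
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)   # standardize over the pooled sample
    composite = Z.mean(axis=1)                          # one defined contrast across measures
    n_a = len(Y_a)
    return stats.ttest_ind(composite[:n_a], composite[n_a:])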
Re: calculating reliability
On 23 Mar 2001 02:53:11 GMT, John Uebersax <[EMAIL PROTECTED]> wrote: > Paul's comment is very apt. It is very important to consider whether > a consistent error should or should not count against reliability. > In some cases, a constant positive or negative bias should not matter. - If you have a choice, you design your experiment so that a bias will not matter. Assays may be conducted in large batches, or the same rater may be assigned for both Pre and Post assessment. > For example, one might be willing to standardize each measure before > using it in statistical analysis. The standardization would then > remove differences due to a constant bias (as well as differences > associated with a different variance for each measure/rating). ? so that rater A's BPRS on the patient is divided by something, to make it comparable to rater B's rating? That sounds hard to justify. I agree that, conceivably, raters could want to use a scale differently. If there's a chance of that, then before you start the study, you train the raters to use the scale the same way. Standardizing for variance like that, between *raters,* is something I don't remember doing. I do standardize for the observed SD of a variable, when I create a composite score across several variables. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: calculating reliability
On Thu, 22 Mar 2001 08:23:54 -0500, Bruce Weaver <[EMAIL PROTECTED]> wrote: > > On 21 Mar 2001, Awahab El-Naggar wrote: > > > Dear Colleagues > > I have been using "test-retest" method for calculating reliability by > > applying the Pearson Product Moment (PPM) analysis. However, I have been > > told that this not the right way to calculate reliability, and I should use > > the ANOVA to calculate the reliability. Would you comment and advise me. > > Many Thanks. > > A'Wahab > > > > Here are a couple sites that may provide some useful information: > > http://www.nyu.edu/acf/socsci/Docs/correlate.html > http://www.nyu.edu/acf/socsci/Docs/intracls.html Awahab, = what is in your data = If you want to know what you have in your data, you were doing it the right way. To be complete, you do want to look at the paired *t-test* to check for systematic differences; and you want to confirm that the variances are not too different. If you have multiple raters, you usually want to know about oddities for any single rater. You can find other comments about reliability in my stats-FAQ. = publishing a single number = If you want to publish a simple, single number, then editors have been trained to ask for an IntraClass Correlation (ICC) of some sort. The ICC reference Bruce W. cites above tells how SPSS now offers 10 different ICCs, following some over-used, much-cited studies. The most common ICC (between two raters) does a simple job of confounding the Pearson correlation with the mean difference (by assuming the means are equal), instead of inviting you look at those two dimensions separately. It can look pretty good, even when a t-test would give you a warning. That's why I think of an ICC as a summary that "only an editor can love." Once you have confirmed that you have good reliability, then you might want to do the ANOVA to get the ICC that an editor wants. But a wise editor or reviewer should be pleased with suitable reports of Pearson r and tests of means. = ICC for special purposes = I have seen a study planned, where 3-rater estimates of some X would be used, in order to increase the precision of X and reduce the number of cases. The estimate of the eventual sample size used one particular species of ICC, from the many that are possible. That's the legitimate reason for computing a special ICC (however, I do have doubts about its accuracy). Over the last thirty years, I remember seeing that once. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
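To make the 'confounding' point concrete, here is a small simulated sketch in Python (numpy and scipy assumed; the ratings are invented, with rater B running a constant 6 points high). The Pearson r stays near 1, the paired t flags the bias, and the common one-way random-effects ICC(1,1), computed from the ANOVA mean squares, folds the bias into "unreliability":

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true = rng.normal(50, 10, 30)                 # 30 hypothetical patients
rater_a = true + rng.normal(0, 2, 30)
rater_b = true + rng.normal(0, 2, 30) + 6.0   # rater B scores 6 points higher, consistently

r, _ = stats.pearsonr(rater_a, rater_b)       # stays high
t, p = stats.ttest_rel(rater_a, rater_b)      # flags the systematic difference

X = np.column_stack([rater_a, rater_b])
n, k = X.shape
grand = X.mean()
msb = k * ((X.mean(axis=1) - grand) ** 2).sum() / (n - 1)               # between subjects
msw = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))  # within subjects
icc = (msb - msw) / (msb + (k - 1) * msw)                               # ICC(1,1)
print(r, p, icc)   # r near 1, paired t clearly significant, ICC pulled down by the bias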
Re: Seeking engineering data sets
On Tue, 20 Mar 2001 23:20:55 GMT, "W. D. Allen Sr." <[EMAIL PROTECTED]> wrote: > Check out XLStats at, > > http://www.man.deakin.edu.au/rodneyc/XLStats.htm > > I have used a number of stat programs and this one is the easiest to use for > us non-professional statisticians. It solves what I believe is the biggest > problem in statistics, i.e., which statistical inference test is appropriate > for my particular problem. - Darn! Those years of study and practice, all wasted! Someone just needed to sell me that magic box. By the way, WD Allen, you are in a horrible fix if you don't read your XLStats more closely than you read the stats group. Jim asked for pointers to *data* and not to stat packs: > "Jim Youngman" <[EMAIL PROTECTED]> wrote in message > news:9xHt6.40968$[EMAIL PROTECTED]... > > Can anyone point me to data sets related to (Civil) Engineering that would > > be suitable for use as examples in an elementary statistics course for > > engineers? === Here is one reference. It might not help, but it might inspire others to mention journals, if this is now a popular thing. I discovered this a couple of weeks ago. "Biometrics" has a number of datasets online which have been used in their articles. I don't know what a civil engineer needs, so I can't judge their relevance, even if I look at more than the title of the article. Click on 'data sets' on their main page - http://stat.tamu.edu/Biometrics/ -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Discriminant analysis group sample sizes
On 20 Mar 2001 15:50:07 -0800, [EMAIL PROTECTED] (Brian M. Schott) wrote: BMS: " Is it necessary to approximately balance the sample size in the groups of the validation or holdout sample to develop a good discriminant function?" The answer is generally No, but I'm not sure what the alternatives are supposed to be. The 'holdout sample', if there's just one, doesn't do much to develop the function; it illustrates it. Some points - being representative usually matters a lot (as opposed to merely being numerous). Never throw away free cases that you have in-hand, just for the sake of achieving balance. BMS: " I am a little unclear about the extent to which the prior probabilities can be used to adjust for sample proportions which do not represent the population proportions. I suspect there is a difference between predictive and descriptive discriminant analysis in regard to this question, btw. But I cannot find a textbook that addresses this question. " In the usual, ordinary, discriminant function, the 'prior probabilities' play absolutely no role in the mathematics of the solution. The DF is, in other terms, a problem in canonical correlation; or an eigenvector problem. There's no place in the basic problem for those weights to enter in. [ You might assign weights on the groups, if you look at step-wise inclusion of variables -- but that whole prospect is unappealing. I have not bothered to see what is implemented. ] The 'prior probabilities' are used in the step that describes the (predicted) group memberships. You draw lines in particular places. In your terms, you might say, it is a part of the 'descriptive analysis' only. However, there is NO *analysis* in a sense - you just have the description. Furthermore: Using the priors is (often) not well understood. Most writers avoid saying much, because they haven't figured them out, either. USUALLY, you do not need to (and should not) use priors. In the cases where they are used, USUALLY the adjustment (away from 50-50) should be less than proportionate to the Ns. USUALLY, it is fair to draw a cutoff score (almost) arbitrarily. I don't have an explicit reference. The textbooks on my shelf don't say much. I can suggest looking for texts that mention cost-benefit, and Decisions. You might want to read journal references on those topics, which are from the 1970s in my texts. I have not seen the new ones, but I suspect there are citations for 1995+, to go along with multiple-category logistic regression being available in SPSS and SAS. -- Try google. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
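For the two-group case, here is a from-scratch sketch (Python, numpy assumed, two or more predictors) of the split described above: the discriminant direction is built from the group means and the pooled covariance and never sees the priors; a prior only moves the cutoff that is used when you describe predicted memberships. This is the textbook two-group rule, not any particular package's implementation.

import numpy as np

def two_group_lda(X1, X2, prior1=0.5):
    """X1, X2: cases x variables for the two groups (at least two predictor variables).
    Returns the discriminant direction (prior-free) and a prior-dependent cutoff."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    Sp = ((n1 - 1) * np.cov(X1, rowvar=False) +
          (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)   # pooled covariance
    w = np.linalg.solve(Sp, m1 - m2)                             # discriminant direction
    cutoff = 0.5 * w @ (m1 + m2) - np.log(prior1 / (1 - prior1))
    return w, cutoff

# classify a case x into group 1 when w @ x > cutoff;
# changing prior1 moves only the cutoff, never the direction w.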
Re: Brogden-Clemens coefficient
On Wed, 21 Mar 2001 01:08:00 GMT, Ken Reed <[EMAIL PROTECTED]> wrote: > Is anyone familiar with the Brogden-Clemens coefficient for measuring index > reliability? > > How is it calculated? > > What is the original reference? Is this your subject? "When is a test valid enough?" http://www.aptitude-testing.com/brogden.htm A google search on your subject drew a blank; the page above didn't mention reliability, but it looks like the place to start. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: One tailed vs. Two tailed test
On 16 Mar 2001 20:32:40 -0800, [EMAIL PROTECTED] (dennis roberts) wrote: [ ... ] > seems to me when you fold over (say) a t distribution ... you don't have a > t distribution anymore ... mighten you have a chi square if before you fold > it over you square the values? [ ... snip, rest ] Are you forgetting? A normal z^2 is chi-squared with 1 d.f. And t^2, with xxx degrees of freedom, is equal to F with (1, xxx) d.f. -- Rich U. http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
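A two-line check of those identities, with scipy:

from scipy import stats

z, df = 1.96, 20
print(2 * stats.norm.sf(z),  stats.chi2.sf(z**2, 1))    # two-tailed z  ==  chi-square(1) on z^2
print(2 * stats.t.sf(z, df), stats.f.sf(z**2, 1, df))   # two-tailed t  ==  F(1, df) on t^2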
Re: misuses of statistics
Elliot, the Baldus study *might* be a poor enough effort that we shouldn't bother trying to figure what it said, and whether one court or another made good use of it -- On Fri, 16 Mar 2001 16:31:15 -0500, Elliot Cramer <[EMAIL PROTECTED]> wrote: > On Fri, 16 Mar 2001, Rich Ulrich wrote: > > > Elliot, > > > > It appears to me that Arnold Barnett is guilty > > of a serious misuse of statistical argument. [ snip, various, mine and his. Elliot had posted an article from Barnett, concerning statistics offered for a court case.] EC > > The point of the article is that the Supreme Court apparently understood > the odd ratio to be a probability ratio. The US district court did not > make this mistake and issued a devastating critique of the Baldus Study > which used linear regression instead of logistic regression, amongh other > things. It was VERY inadequate in dealing with nature of the crime which > is the most important consideration in the death penalty. [ ... ] When I searched on "Baldus study", Google included this page by the Federation of American Scientists, with testimony to Congress in 1989. The FAS is a lobbying organization whose testimony and data collection have always been highly credible (and I have contributed money to FAS, for years). http://www.fas.org/irp/congress/1989_cr/s891018-drugs.htm Statement of Edward S.G. Dennis, Jr., Assistant Attorney General, Criminal Division [ ... ] "There appears to be a misconception that McCleskey involved a judicial finding of systemic discrimination in the imposition of the death penalty, and the upholding of capital punishment despite such a finding. Any such reading of the Court's opinion is contrary to fact. As I will discuss in greater detail below, the district court in McCleskey found that the empirical study on which the systemic discrimination claim was based was seriously flawed. The Supreme Court, in reviewing the case, did not question the accuracy of the district court's findings. " "In McCleskey, the defendant submitted a statistical study, the Baldus study, that purported to show that a disparity in the imposition of the death penalty in Georgia was attributable to the race of the murder victim and, to a lesser extent, the race of the defendant. Id. at 286. The defendant argued that the Baldus study demonstrated that his rights had been violated under the Eighth and Fourteenth Amendments. " [ ... ] "Second, as noted above, the Supreme Court simply assumed that the Baldus study was statistically accurate in order to reach the defendant's constitutional arguments. The record is clear, however, that the Baldus study was significantly flawed. As the Supreme Court noted, the district court in the McCleskey care had examined the Baldus study `with case,' following `an extensive evidentiary hearing.' 481 U.S. at 287. In the course of a thoughtful and exhaustive opinion, the district court found that the Baldus study was unpersuasive. Among many other things, the district court found that the data compiled as the basis for the study was incomplete and contained `substantial flaws' and that the defendant had not established by a preponderance of the evidence that the data was `essentially trustworthy.' McCleskey v. Zant, 580 F. Supp. 338, 360 (N.D. Ga. 1984). 1 "1 See also the Supreme Court's summary of the flaws in the Baldus study found by the district court. 481 U.S. at 288 n.6. 
" === end of cite -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: can you use a t-test with non-interval data?
On 17 Mar 2001 19:54:27 -0800, [EMAIL PROTECTED] (Will Hopkins) wrote: > I just thought of a new justification for doing the usual parametric analyses > on the numbered levels of a Likert-scale variable. Numbering the levels > is formally the same as ranking them, and a parametric analysis of a > rank-transformed variable is a non-parametric analysis. If non-parametric > analyses are OK, then so are parametric analyses of Likert-scale variables. Good comment. One thing that happened, in recent years, was that Conover, et al., showed that you can do the t-test on ranked data and get a really good approximation of the "exact" p-level, even when the Ns are quite small. Further: ranked data has theoretical problems with *ties* -- which is the chronic condition of Likert-scale items. In fact, using the t-test on ranks sometimes gives a better p-value than what your textbook recommends for "adjusting for ties." Further again: in the cases where there are "odd" distributions in the several categories, you want to check to see what the rank-transformation assigns to categories as their effective "scores" and then select between analyses. For my data, the 1...5 assigned scoring almost always looks better than the intervals achieved by ranks. Agresti has a detailed example of arbitrary scoring of categories in his textbook, "Introduction to categorical data analysis." > > But... an important condition is that the sampling distribution of your > outcome statistic must be normal. This topic came up on this list a few > weeks ago. In summary, if the majority of your responses are stacked up on > one or other extreme value of the Likert scale for one or more groups in > the analysis, and if you have less than 10 observations in one or more of > those groups, your confidence intervals or p values are untrustworthy. See > http://newstats.org/modelsdetail.html#normal for more. Good comment, too. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
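A toy illustration of the Conover point, in Python (scipy assumed; the Likert responses are made up, with plenty of ties): the t-test on the rank-transformed data lands very close to the Mann-Whitney result, and the plain 1..5 scoring can be compared alongside.

import numpy as np
from scipy import stats

a = np.array([2, 3, 3, 4, 4, 4, 5, 5, 3, 4])   # hypothetical group A responses
b = np.array([1, 2, 2, 3, 3, 2, 4, 3, 2, 3])   # hypothetical group B responses

ranks = stats.rankdata(np.concatenate([a, b]))
t_on_ranks = stats.ttest_ind(ranks[:len(a)], ranks[len(a):])
mw = stats.mannwhitneyu(a, b, alternative='two-sided')
raw_t = stats.ttest_ind(a, b)                  # the 1..5 scoring used directly
print(t_on_ranks.pvalue, mw.pvalue, raw_t.pvalue)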
Re: 1 tail 2 tail mumbo jumbo
On 14 Mar 2001 08:33:29 -0800, [EMAIL PROTECTED] (dennis roberts) wrote: [ ... ] > however, i think that we definitely need some standardization and revamping > when it comes to using terms like 1 and 2 tailed tests ... > the term "tail" ... either 1 tailed or 2 tailed ... should ONLY be used in > connection with what the test statistic that you have decided to use ... > naturally asks you to do with respect to deciding on critical values ... [ snip, much] I agree, that we need to be careful... Maybe we need some conventions? I was less sure, until I read the following: > > when we do a simple ANOVA ... this should be called a 1 tailed test ... no > matter what your research predictions are ... when we use chi square on a > contingency table ... it should be called a 1 tailed test ... no matter how > you think the direction of the relationship should go Ooh, I don't like it, I don't like any mention of "1" right here, in either case. Sure, "1" is true, but it is mainly misleading and irrelevant, right? That ANOVA is inherently a 2-sided test. So is the traditional 2x2 contingency table. That is because, sides refer to hypotheses. The t-test is inherently 1-sided, like a z: only the large, plus-sign values have small p's. But some people *always* refer to 2-sided probabilities of z and t. That is, they use a two-tailed t-test, (two-tailed z) which is equivalent to using an ANOVA F (chi-squared with 1 d.f.). The default-test of any sort, I suggest, is "one-tailed" and we get p from its Cumulative Distribution Function; and 1-tailed does not have to be mentioned. If we pool the tails, that requires special notice, and we should specify that the t is "two-tailed." [ snip, more details; including 't-test' suggestions that are contrary to what I just wrote.] -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: One tailed vs. Two tailed test
Sides? Tails? There are hypotheses that are one- or two-sided. There are distributions (like the t) that are sometimes folded over, in order to report "two tails" worth of p-level for the amount of the extreme. I don't like to write about these, because it is so easy to be careless and write it wrong -- there is not an official terminology. On Thu, 15 Mar 2001 14:29:04 GMT, Jerry Dallal <[EMAIL PROTECTED]> wrote: > We don't really disagree. Any apparent disagreement is probably due > to the abbreviated kind of discussion that takes place in Usenet. > See http://www.tufts.edu/~gdallal/onesided.htm > > Alan McLean ([EMAIL PROTECTED]) wrote: > > > My point however is still true - that the person who receives > > the control treatment is presumably getting an inferior treatment. You > > certainly don't test a new treatment if you think it is worse than > > nothing, or worse than current treatments! > > Equipoise demands the investigator be uncertain of the direction. > The problem with one-tailed tests is that they imply the irrelevance > of differences in a particular direction. I've yet to meet the > researcher who is willing to say they are irrelevant regardless of > what they might be. [ ... ] "Equipoise"? I'm not familiar with that as a principle, though I can guess at what it means. When I was taught testing, I was taught that using *one* tail of a distribution is what is statistically intelligible, or natural. Adding together the opposite extremes of the CDF, as with a "two-tailed t-test," is an arbitrary act. It seems to be justified or explained by pointing to the relation between tests on two means, t^2 = F. Is that explanation enough? Technically speaking (as I was taught, and as it still seems to me), there is nothing wrong with electing to take 4.5% from one tail, and 0.5% from the other tail. Someone has complained about this: that is "really" what some experimenters do. They say they plan a one-tailed t-test of a one-sided hypothesis. However, they do not *dismiss* a big effect in the wrong direction, but they want to apply different values to it. I say, This does make sense, if you set up the tests like I just said. That is: I ask, What is believable? Yes, to a 4.4% test (for instance) in the expected direction. No, to a test of 2% or 1% or so, in the other direction; - but: Pay attention, if it is EXTREME enough. Notice, you can take out a 0.1% test and leave the main test as 4.9%, which is not effectively different from 5%. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: misuses of statistics
a white-victim case is 0.99/0.01; in other > words, a death sentence is 99 times as likely as the alternative. But even > after being cut by a factor of 4.3, the odds ratio in the case of a black > victim would take the revised value of 99/4.3 = 23, meaning that the > perpetrator would be 23 times as likely as not to be sentenced to > death. That is: > > > > Work out the algebra and you find that PB = 0.96. In other words, while a > death sentence is almost inevitable when the murder victim is white, it is > also so when the victim is black - a result that few readers of the "four > times as likely" statistic would infer. While not all Georgia killings are > so aggravated that PW = 0.99, the quoted study found that the heavy > majority of capital verdicts came up in circumstances when PW, and thus > PB, is very high. > > None of this is to deny that there is some evidence of race-of-victim > disparity in sentencing. The point is that the improper interchange of two > apparently similar words greatly exaggerated the general understanding of > the degree of disparity. - Now, the author is asserting that 1% versus 4% is far different from 99% versus 96%. Statisticians should be leery of that. Yes, there are occasions when they differ: 1 versus 4 is an important difference if you multiply the fractions times costs or benefits. But I don't sense the relevance, when moving a fraction between categories of 'life in prison' and 'death'. Steve Simon posted a few weeks ago to one stats-group. He rather likes the likelihood approach, and he was citing someone else who does; whereas, I have posted several times about how foolish it seems to me, both logically and mathematically, to model 'Likely' instead of using Log-Odds. > Blame for the confusion should presumably be > shared by the judges and the journalists who made the mistake and the > researchers who did too little to prevent it. - the judges and journalists missed the word; they missed the math that would have made the word important; so they ended up with the right conclusion. > > (Despite its uncritical acceptance of an overstated racial disparity, the > Supreme Court's McClesky v. Kemp decision upheld Georgia's death > penalty. The court concluded that a defendant must show race prejudice in > his or her own case to have the death sentence countermanded as > discriminatory.) ==== For what I have noticed, omitting the Odds ratio is more likely to be abusive, than *using* it. For instance, 98% of whites will complete certain training, and 92% of blacks, that is another 4:1 Odds ratio. There is not much difference in terms of success-rate (or money-invested for training); that is a big difference in failure rate, which did seem to matter. - I have seen that oversight in a newspaper report. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
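Re-doing the arithmetic in the quoted passage, as a sanity check (plain Python):

PW = 0.99
odds_w = PW / (1 - PW)         # 99 to 1
odds_b = odds_w / 4.3          # about 23 to 1
PB = odds_b / (1 + odds_b)     # about 0.958
print(odds_w, odds_b, PB)

# and the training example: 98% vs 92% completion is also roughly a 4:1 odds ratio
print((0.98 / 0.02) / (0.92 / 0.08))    # about 4.3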
Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk
- I hate having to explain jokes - On 14 Mar 2001 15:34:45 -0800, [EMAIL PROTECTED] (dennis roberts) wrote: > At 04:10 PM 3/14/01 -0500, Rich Ulrich wrote: > > >Oh, I see. You do the opposite. Your own > >flabby rationalizations might be subtly valid, > >and, on close examination, > >*do* have some relationship to the questions > > > could we ALL please lower a notch or two ... the darts and arrows? i can't > keep track of who started what and who is tossing the latest flames but ... > somehow, i think we can do a little better than this ... Dennis, Please, where is YOUR sense of humor? My post was a literary exercise -- I intentionally posted his lines immediately before mine, so the reader could follow my re-write phrase by phrase. I'm still hoping "Irving" will lighten up. You chopped out the original that I was paraphrasing, and you did *not* indicate those important [snip]s -- You would mislead the casual reader to think someone other than JimS is originating lines like that, or intend them as critique in this group. - I'm not always kind, but I think I am never that wild. - It's probably been a dozen years since I purely flamed like . (Or maybe I never flamed, if you talk about the really empty ones. In the olden days of local Bulletin Boards, with political topics, I discarded 1/3 of my compositions without ever posting, because of poor content or tone. I still use some judgment in what I post.) Compare his original line about 'little or no ... relationship' with my clever reversal, "... on close examination, *do* have some relationship to the questions." Well, I was trying for humor, anyway. Sorry, if I missed. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk
On 14 Mar 2001 21:55:48 GMT, [EMAIL PROTECTED] (Radford Neal) wrote: > In article <[EMAIL PROTECTED]>, > Rich Ulrich <[EMAIL PROTECTED]> wrote: > > >(This guy is already posting irrelevant rants as if > >I've driven him up the wall or something. So this > >is just another poke in the eye with a blunt stick, to see > >what he will swing at next) > > I think we may take this as an admission by Mr. Ulrich that he is > incapable of advancing any sensible argument in favour of his > position. Certainly he's never made any sensible response to my > criticism. - In a new thread, I have now provided a response that is sensible, or, at least, somewhat numeric. I notice that Jim C. has taken up the cudgel, in trying to explain the basics of t-tests to Jim S, and that "furthers my position." I figure that after I state my position in one post, explicate it in another, and try that again while refining the language -- then I may as well call it quits with JS, when he still doesn't get the points from the first (or from the couple of other people who were posting them before I was). I may not be saying it all that well, but I wasn't inventing the position. You and I are in agreement, now, on one minor conclusion: "The t-test isn't good evidence about a difference in averages." But for me, that's true because the numbers are crappy indicators of performance -- which was clued *first* by the distribution. Whereas, you seem to have much more respect for crude averages, compared to the several of us who object. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
MIT numbers. Was: Re: On inappropriate hy.
Neal, I did intend to respond to this post -- you seem serious about this, more so than "Irving." On 13 Mar 2001 22:36:03 GMT, [EMAIL PROTECTED] (Radford Neal) wrote: [ snip, previous posts on what might be tested ] > > None of you said it explicitly, because none of you made any coherent > exposition of what should be done. I had to infer a procedure which > would make sense of the argument that a significance test should have > been done. > > NOW, however, you proceed to explicitly say exactly what you claim not > to be saying: > RU> > > >I know that I was explicit in saying otherwise. I said something > >like, If your data aren't good enough so you can quantify this mean > >difference with a t-test, you probably should not be offering means as > >evidence. - This is a point that you are still missing. I am considering the data... then rejecting the *data* as lousy. I'm *not* drawing substantial conclusions (about the original hypotheses) from the computed t, or going ahead with further tests. NR> > In other words, if you can't reject the null hypothesis that the > performance of male and female faculty does not differ in some > population from which the actual faculty were supposedly drawn, then > you should ignore the difference in performance seen with the actual > faculty, even though this difference would - by standard statistical > methodology explained in any elementary statistics book - result in a > higher standard error for the estimate of the gender effect, possibly > undermining the claim of discrimination. - Hey, I'm willing to use the honest standard error. When I have decent numbers to compare. But when the numbers are not *worthy* of computing a mean, then I resist comparing means. RU> > > > And, Many of us statisticians find tests to be useful, > >even when they are not wholly valid. > NR> > It is NOT standard statistical methodology to test the significance of > correlations between predictors in a regression setting, and to then > pretend that these correlations are zero if you can't reject the null. - again, I don't know where you get this. Besides, on these data, "We reject the null..." once JS finally did a t-test. But it was barely 5%. And now I complain that there is a huge gap. It is hard to pretend that these numbers were generated as small, independent effects that are added up to give a distribution that is approximately normal. [ snip, some ] RN> > So the bigger the performance differences, the less attention should > be paid to them? Strange... > Yep, strange but true. They would be more convincing if the gap were not there. The t-tests (Students/ Satterthwaite) give p-values of .044 and .048 for the comparison of raw average values, 7032 versus 1529. If we subtract off 5000 from each of the 3 large counts (over 10,000), the t-tests have p-values of .037 and .036, comparing 4532 versus 1529. Subtract 7000 for the three, p-values are hardly different, at .043, .040; comparing counts of 3532 versus 1539. In my opinion, this final difference rates (perhaps) higher on the scale of "huge differences" than the first one: the t-tests are about equal, but the actual numbers (in the second set) don't confirm any suspicions about a bad distribution. The first set is bad enough that "averages" are not very meaningful. http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk
On Tue, 13 Mar 2001 14:04:19 -0800, Irving Scheffe (JS) <[EMAIL PROTECTED]> wrote: > Actually, in practice, the decisions are seldom made > on the basis of rational evaluation of data. They > are usually made on the basis of political pressure, > with thin, and obviously invalid, pseudo-rationalizations > on the basis of data that, on close examination, have > little or no necessary relationship to the questions > being asked. Oh, I see. You do the opposite. Your own flabby rationalizations might be subtly valid, and, on close examination, *do* have some relationship to the questions [ snip, one sentence of post, plus irrelevant citation. ] (This guy is already posting irrelevant rants as if I've driven him up the wall or something. So this is just another poke in the eye with a blunt stick, to see what he will swing at next) -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk
On 12 Mar 2001 14:25:41 GMT, [EMAIL PROTECTED] (Radford Neal) wrote: [ snip, baseball game; etc. ] > In this context, all that matters is that there is a difference. As > explained in many previous posts by myself and others, it is NOT > appropriate in this context to do a significance test, and ignore the > difference if you can't reject the null hypothesis of no difference in > the populations from which these people were drawn (whatever one might > think those populations are). So far as I remember, you are the only person who imagined that procedure, "do a test and ignore ... if you can't reject" Oh, maybe Jim, too. I know that I was explicit in saying otherwise. I said something like, If your data aren't good enough so you can quantify this mean difference with a t-test, you probably should not be offering means as evidence. And, Many of us statisticians find tests to be useful, even when they are not wholly valid. As evidence, I pointed to the (over-) acceptance of observational studies in epidemiology. I think I made those arguments at least two or three times, each. As it turns out, the big gap in the "scores" makes those averages dubious, even though a t-test *is* nominally significant. (That's so, when computed on X or on log(X), but not so, on 1/X.) And then, as I later discovered, the arguments and the style of the original report make Jim's criticism tenuous. Even if you were to illustrate how all the males have out-achieved all the females, by one criterion or by several criteria, you would not discredit the decision of the dean -- Wasn't the report was talking more about 'what all our faculty deserve' instead of what's earned by individuals? You guys have skipped that half. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk
On Thu, 08 Mar 2001 10:38:59 -0800, Irving Scheffe <[EMAIL PROTECTED]> wrote: > On Fri, 02 Mar 2001 16:28:53 -0500, Rich Ulrich <[EMAIL PROTECTED]> > wrote: > > >On Tue, 27 Feb 2001 07:49:23 GMT, [EMAIL PROTECTED] (Irving > >Scheffe) wrote: > > > >My comments are written as responses to the technical > >comments to Jim Steiger's last post. This is shorter than his post, > >since I omit redundancy and mostly ignore his 'venting.' > >I think I offer a little different perspective on my previous posts. > > > >[ snip, intro. ] > > Mr. Ulrich's latest post is a thinly veiled ad hominem, and > I'd urge him to rethink this strategy, as it does not > present him in a favorable light. - I have a different notion of ad-hominem, since I think it is something directed towards 'the person' rather than at the presentation. Or else, I don't follow what he means by 'thinly veiled.' When a belligerent and nasty and arrogant tone seems to be an essential part of an argument, I don't consider myself to be reacting 'ad-hominem' when I complain about it -- it's not that I hate to be ad-hominem, but I don't like to be misconstrued. I'm willing, at times, to plunk for the 'ad-hominem'. For instance, since my last post on the subject, I looked at those reports. Also, I searched with google for the IWF -- who printed the anti-MIT critiques. I see the organization characterized as an 'anti-feminist' organization, with some large funding from Richard Scaife. 'Anti-feminist' could mean a reasoned-opposition, or a reflex opposition. Given these papers, it appears to me to qualify as 'reflex' or kneejerk opposition. Oh, ho! I say, this explains where the arguments came from, and why Jim keeps on going -- Now, THIS PARAGRAPH is what I consider an ad-hominem argument. And I'll give you some more. Scaife is a paranoid moneybags and publisher who infests this Pittsburgh region (which is why I have noticed him more than a westerner like Coors). His cash was important in persecuting Clinton for his terms in office. For example, Scaife kept alive Victor Foster's suicide for years. He held out money for anyone willing to chase down Clinton-scandals. Oh, he funded the chair at Pepperdine that Starr had intended to take. Now: My comment on the original reports: I am happy to say that it looks to me as if MIT is setting a good model for other universities to follow. The senior administrator listens to his faculty, especially his senior faculty, and responds. MIT makes no point about numbers in their statements, and it does seem to be wise and proper that they don't do so. I see now, Jim is not really arguing with MIT. They won't argue back. Jim's purpose is to create a hostile presence, a shadow to threaten other administrators. He goes, like, "If you try to 'cut a break' for women, we'll be watching and threatening and undermining, threatening your job if we can." I suppose state universities are more vulnerable than the private universities like MIT. On the other hand, with the numbers that Jim has put into the public eye, the next administrator can point to the precedent of MIT and assert that, clearly, the simple numbers on 'quality' are substantially irrelevant to the issues, since they were irrelevant at MIT. Hope this helps. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Trend analysis question: follow-up
On 5 Mar 2001 16:41:22 -0800, [EMAIL PROTECTED] (Donald Burrill) wrote: > On Mon, 5 Mar 2001, Philip Cozzolino wrote in part: > > > Yeah, I don't know why I didn't think to compute my eta-squared on the > > significant trends. As I said, trend analysis is new to me (psych grad > > student) and I just got startled by the results. > > > > The "significant" 4th and 5th order trends only account for 1% of the > > variance each, so I guess that should tell me something. The linear > > trend accounts for 44% and the quadratic accounts for 35% more, so 79% > > of the original 82% omnibus F (this is all practice data). > > > > I guess, if I am now interpreting this correctly, the quadratic trend > > is the best solution. DB > > Well, now, THAT depends in part on what the > spectrum of candidate solutions is, doesn't it? For all that what you > have is "practice data", I cannot resist asking: Are the linear & > quadratic components both positive, and is the overall relationship > monotonically increasing? Then, would the context have an interesting > interpretation if the relationship were exponential? Does plotting [ snip, rest ] "Interesting interpretation" is important. In this example, the interest (probably) lies mainly with the variance-explained: in the linear and quadratic. It's hard for me to be highly interested in an order-5 polynomial, and sometimes a quadratic seems unnecessarily awkward. What you want is the convenient, natural explanation. If "baseline" is far different from what follows, that will induce a bunch of high order terms if you insist on modeling all the periods in one repeated measures ANOVA. A sensible interpretation in that case might be, to describe the "shock effect" and separately describe what happened later. Example. The start of Psychotropic medications has a huge, immediate, "normalizing" effect on some aspects of sleep of depressed patients (sleep latency, REM latency, REM time, etc.). Various changes *after* the initial jolt can be described as no-change; continued improvement; or return toward the initial baseline. In real life, linear trends worked fine for describing the on-meds followup observation nights (with - not accidentally - increasing intervals between them). -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
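Here is a rough sketch of the bookkeeping behind that exchange (Python, numpy assumed; the means and n are invented): build orthogonal polynomial contrasts for equally spaced levels and split the effect sum of squares into linear, quadratic, and higher trends, so each trend's share of the omnibus effect is visible.

import numpy as np

def trend_ss(level_means, n_per_level):
    """Split the SS among levels over orthogonal polynomial trends
    (equally spaced levels, equal n assumed)."""
    ybar = np.asarray(level_means, float)
    k = len(ybar)
    levels = np.arange(k) - (k - 1) / 2.0
    V = np.vander(levels, k, increasing=True)     # columns: 1, x, x^2, ...
    Q, _ = np.linalg.qr(V)                        # orthonormal polynomial contrasts
    ss = n_per_level * (Q[:, 1:].T @ ybar) ** 2   # skip the constant column
    return ss                                     # linear, quadratic, ... SS

ss = trend_ss([2.0, 3.5, 4.6, 5.0, 5.1], n_per_level=10)
print(ss / ss.sum())   # share of the effect carried by each trend

With unequal n, or unequally spaced periods, the contrasts would have to be reweighted; this sketch only covers the simple balanced case.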
Re: Census Bureau nixes sampling on 2000 count
On Fri, 02 Mar 2001 12:16:42 GMT, [EMAIL PROTECTED] (J. Williams) wrote: > The Census Bureau urged Commerce Secretary Don Evans on Thursday not > to use adjusted results from the 2000 population count. Evans must > now weigh the recommendation from the Census Bureau, and will make the > decision next week. If the data were adjusted statistically it could > be used to redistribute and remap political district lines. William > Barron, the Bureau Director, said in a letter to Evans that he agreed > with a Census Bureau committee recommendation "that unadjusted census > data be released as the Census Bureau's official redistricting data." > Some say about 3 million or so people make up a disenfranchising > undercount. Others disagree viewing sampling as a method to "invent" > people who have not actually been counted. Politically, the stakes > are high on Evans' final decision. People may wonder, "Why did the Census Bureau say this, and why is there little criticism of them?" According to the reports of a few weeks ago, the inner-city counts, etc., of this census were quite a bit more accurate than they were 10 years ago. That means that we couldn't be so sure that adjustment would make a big improvement, or any improvement. This frees Republicans of some blame, for this one instance, of pushing specious technical arguments for short-term GOP gain. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Post-hoc comparisons
On 2 Mar 2001 07:27:16 -0800, [EMAIL PROTECTED] (Esa M. Rantanen) wrote: [ snip, detail ] > contingency table. I have used a Chi-Sq. analysis to determine if there is > a statisitcally significant difference between the (treatment) groups (all > 4!), and indeed there is. I assume, however, that I cannot simply do > pairwise comparisons between the groups using Chi-Sq. and 2 x 2 matrices > without inflating the probability of Type 1 error, (1-alpha)^4 in this > case. As far as I know, there are no equivalents to Duncan's or Tukey's > tests for the type of data (binary) I have to deal with. Well, if you want to do the ANOVA on the dichotomous variable, I won't complain. My reaction is, you are assuming that, somewhere, great precision matters. But being precise in your thinking will gain you most, so that you do and report just ONE important test, that you figured out beforehand, instead of trying to cope with 6 tests that happen to fall into your lap. I would probably (a) Let the Overall test justify all my followup testing, where the followup testing is descriptive, among categories of equal N and equivalent importance; or (b) Do a few specified tests with Bonferroni correction, and report those tests. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
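If option (b) is the route taken, the mechanics are nothing more than this sketch (Python, scipy assumed; the counts are invented):

import numpy as np
from itertools import combinations
from scipy.stats import chi2_contingency

# counts[group] = (successes, failures) for each of the 4 treatment groups (made-up numbers)
counts = {'A': (30, 10), 'B': (22, 18), 'C': (15, 25), 'D': (28, 12)}

pairs = list(combinations(counts, 2))
alpha = 0.05 / len(pairs)                     # Bonferroni-adjusted per-test alpha
for g1, g2 in pairs:
    table = np.array([counts[g1], counts[g2]])
    chi2, p, dof, _ = chi2_contingency(table)
    print(g1, g2, round(p, 4), 'sig' if p < alpha else 'ns')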
Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk
that judgment is impertinent. If these data were carefully designed, I should expect more qualification and justification to them; aren't they a crude number? - Perhaps I miss something by not reading the papers, but, if so, you should have pointed Gene and Dennis politely to the details, instead of blundering around and making it appear that "this one is huge" is your whole basis. My commentary is devoted to your presentation, here. [ snip, "importance of issue" and more redundancy.] Hope that helps. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Regression with repeated measures
On 28 Feb 2001 09:24:55 -0800, [EMAIL PROTECTED] (Mike Granaas) wrote: > > I have a student coming in later to talk about a regression problem. > Based on what he's told me so far he is going to be using predicting > inter-response intervals to predict inter-stimulus intervals (or vice > versa). - Is it just me, or is that sentence hard to parse? " ... he is going to be using predicting inter-response intervals to predict inter-stimulus intervals (or vice versa)." Since I am accustomed to S -> R, I assume the 'vice-versa' must be the case; it leaves me with "Intervals between stimuli that predict, predicting intervals between responses." Can I drop the word 'predicting' that seems (to me) accidental? Well, it seems to me that an 'interval' can be a stimulus or a measure of response, but when the problem keeps that terminology, it (further) suggests to me that data are collected as a time-series. - If so, Time-series has to be incorporated, from the start. > > What bothers me is that he will be collecting data from multiple trials > for each subject and then treating the trials as independent replicates. > That is, assuming 10 tials/S and 10 S he will act as if he has 100 > independent data points for calculating a bivariate regression. > > Obviously these are not independent data points. > > Is the non-independence likely to be severe enough to warrant concern? > > If yes, is there some method that will allow him to get the prediction > equation he wants? - Can he do a prediction equation on one person? If there's a parameter for a person, then he has 10 people, each of whom yields a parameter value. A test, of sorts, might be possible on the scores for one person. But the generalization is tested using the 10 scores, comparing those parameter values to some null. His power-of-analysis will be much better if he can define his hypotheses from the start, instead of trying to let a pattern 'emerge from the data' across the 10 consecutive trials. -- RIch Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
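One way to respect the 10-subjects-not-100-trials structure is the two-stage sketch below (Python, numpy and scipy assumed; the data layout is hypothetical): fit the regression within each subject, then treat the per-subject slopes as the data and test them across subjects.

import numpy as np
from scipy import stats

def per_subject_slopes(data):
    """data: dict of subject -> (x, y) arrays of trial-level values.
    Fits the regression within each subject, then tests the slopes across
    subjects, so the subjects -- not the trials -- are the replicates."""
    slopes = []
    for subj, (x, y) in data.items():
        b1, b0 = np.polyfit(np.asarray(x, float), np.asarray(y, float), 1)
        slopes.append(b1)
    slopes = np.array(slopes)
    return slopes, stats.ttest_1samp(slopes, 0.0)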
Re: Cronbach's alpha and sample size
On Wed, 28 Feb 2001 12:08:55 +0100, Nicolas Sander <[EMAIL PROTECTED]> wrote: > How is Cronbach's alpha affected by the sample size apart from questions > related to generalizability issues? - apart from generalizability, "not at all." > > I find it hard to trace down the mathematics related to this question > clearly, and whether there might be a trade-off between N of items and N > of subjects (i.e. compensating for lack of subjects by a high number of > items). I don't know what you mean by 'trade-off.' I have trouble trying to imagine just what it is that you are trying to trace down. But, NO. Once you assume some variances are equal, alpha can be seen as a fairly simple function of the number of items and the average correlation -- more items, higher alpha. The average correlation has a tiny bias by N, but that's typically, and safely, ignored. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
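The 'fairly simple function' is the standardized form of alpha; a tiny sketch in Python:

def standardized_alpha(k, rbar):
    """Coefficient alpha for k items with average inter-item correlation rbar."""
    return k * rbar / (1 + (k - 1) * rbar)

for k in (5, 10, 20, 40):
    print(k, round(standardized_alpha(k, 0.25), 3))
# more items -> higher alpha; the number of subjects never enters the formula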
Re: Satterthwaite-newbie question
On Wed, 28 Feb 2001 08:26:30 -0500, Christopher Tong <[EMAIL PROTECTED]> wrote: > On Tue, 27 Feb 2001, Allyson Rosen wrote: > > > I need to compare two means with unequal n's. Hayes (1994) suggests using a > > formula by Satterthwaite, 1946. I'm about to write up the paper and I can't > > find the full reference ANYWHERE in the book or in any databases or in my > > books. Is this an obscure test and should I be using another? > > Perhaps it refers to: > > F. E. Satterthwaite, 1946: An approximate distribution of estimates of > variance components. Biometrics Bulletin, 2, 110-114. > > According to Casella & Berger (1990, pp. 287-9), "this approximation > is quite good, and is still widely used today." However, it still may > not be valid for your specific analysis: I suggest reading the > discussion in Casella & Berger ("Statistical Inference", Duxbury Press, > 1990). There are more commonly used methods for comparing means with > unequal n available, and you should make sure that they can't be used > in your problem before resorting to Satterthwaite. I don't have access to Casella & Berger, but I am curious about what they recommend or suggest. Compare means with Student's t-test or logistic regression; or the Satterthwaite t if you can't avoid it -- if both means and variances are different enough, and you wouldn't rather do some transformation (for example, to ranks: then test the ranks). And there's randomization and the bootstrap. Anything else? Yesterday (so it should still be on your server), there was a post with comments about the t-tests; from the header: From: [EMAIL PROTECTED] (Jay Warner) Newsgroups: sci.stat.edu Subject: Re: two sample t There are *additional* methods for comparing, but the one that is *more common* is probably the Student's t, which ignores the inequality. Any intro-stat book with the t-test is likely to have one or another version of the Satterthwaite t. The SPSS website includes algorithms for what that stat package uses, under t-test, for "unequal variances." I find it almost impossible to find the algorithms by navigating the site, so here is an address -- http://www.spss.com/tech/stat/Algorithms.htm -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
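For what it's worth, the two flavors are both one call away in scipy; a minimal sketch with invented data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 12)     # small group, small SD (made-up data)
b = rng.normal(0.5, 3.0, 40)     # larger group, larger SD

print(stats.ttest_ind(a, b, equal_var=True))    # Student's t, pooled variance
print(stats.ttest_ind(a, b, equal_var=False))   # Welch/Satterthwaite adjustment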
Re: two sample t
On 26 Feb 2001 12:26:19 -0800, [EMAIL PROTECTED] (dennis roberts) wrote: > when we do a 2 sample t test ... where we are estimating the population > variances ... in the context of comparing means ... the test statistic ... > > diff in means / standard error of differences ... is not exactly like a t > distribution with n1-1 + n2-1 degrees of freedom (without using the term > non central t) > > would it be fair to tell students, as a thumb rule ... that in the case where: > > ns are quite different ... AND, smaller variance associated with larger > n, and reverse ... is the situation where the test statistic above is when > we are LEAST comfortable saying that it follows (close to) a t > distribution with n1-1 + n2-1 degrees of freedom? > > that is ... i want to set up the "red flag" condition for them ... > > what are guidelines (if any) any of you have used in this situation? Neither extreme is better than the other. Student's t-test and that Satterthwaite test have their problems in the opposite directions. With unequal Ns and unequal variances, and a one-tailed test, - one t-test will be too small (rejecting, approximately, never) and - the other will be too big (rejecting about twice as often); - making the TWO-tailed versions come out 'robust'! for size. Neither direction is better until you decide what bias you want. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk
- I want to comment a little more thoroughly about the lines I cited: what Garson said about inference, and his citation of Olkey. On Thu, 22 Feb 2001 18:21:41 -0500, Rich Ulrich <[EMAIL PROTECTED]> wrote: [ snip, previous discussion ] me > > I think that Garson is wrong, and the last 40 years of epidemiological > research have proven the worth of statistics provided on non-random, > "observational" samples. When handled with care. > > From G. David Garson, "PA 765 Notes: An Online Textbook." > > On Sampling > http://www2.chass.ncsu.edu/garson/pa765/sampling.htm > > Significance testing is only appropriate for random samples. > > Random sampling is assumed for inferential statistics > (significance testing). "Inferential" refers to the fact > that conclusions are drawn about relationships in the data > based on inference from knowledge of the sampling > distribution. Significance tests are based on a sampling > theory which requires that every case have a chance of being > selected known in advance of sample selection, usually an > equal chance. Statistical inference assesses the > significance of estimates made using random samples. For > enumerations and censuses, such inference is not needed > since estimates are exact. Sampling error is irrelevant and > therefore inferential statistics dealing with sampling error > are irrelevant. - I agree with most of what he says, throughout; there will be a matter of nuances on interpretation and actions. For enumerations and censuses, a limited sort of statistics on 'finite populations,' he says sampling error is irrelevant. Irrelevant is a good and fitting word here. This is not 'illegal and banned,' but rather 'unwanted and totally beside the point.' Garson > > Significance tests are sometimes applied > arbitrarily to non-random samples but there is no existing > method of assessing the validity of such estimates, though > analysis of non-response may shed some light. The following > is typical of a disclaimer footnote in research based on a > non random sample: Here is my perspective on testing, which does not match his. - For a randomized experimental design, a small p-level on a "test of hypothesis" establishes that *something* seemed to happen, owing to the treatment; the test might stand pretty-much by itself. - For a non-random sample, a similar test establishes that *something* seems to exist, owing to the factor in question *or* to any of a dozen factors that someone might imagine. The test establishes, perhaps, the _prima facie_ case but the investigator has the responsibility of trying to dispute it. That is, it is an investigator's responsibility (and not just an option) to consider potential confounders and covariates. If the small p-level stands up robustly, that is good for the theory -- but not definitive. If there are vital aspects or factors that cannot be tested, then opponents can stay unsatisfied, no matter WHAT the available tests may say. Garson > > "Because some authors (ex., Oakes, 1986) note the use of > inferential statistics is warranted for nonprobability > samples if the sample seems to represent the population, and > in deference to the widespread social science practice of > reporting significance levels for nonprobability samples as > a convenient if arbitrary assessment criterion, significance > levels have been reported in the tables included in this > article." See Michael Oakes (1986). Statistical inference: A > commentary for social and behavioral sciences. NY: Wiley. 
> Garson is telling his readers and would-be statisticians a way to present p-levels, even when the sampling doesn't justify it. And, I would say, when the analysis doesn't justify it. I am not happy with the lines -- The disclaimer does not assume that a *good* analysis has been done, nor does it point to what makes up a good analysis. '... if the sample seems to represent the population' seems to be a weak reminder of the proper effort to overcome 'confounding factors'; it is not an assurance that the effects have proven to be robust. So, the disclaimer should recognize that the non random sample is potentially open to various interpretations; the present analysis has attempted to control for several possibilities; certain effects do seem robust statistically, in addition to being supported by outside chains of inference, and data collected independently. I suggested earlier that this is the status of epidemiological, observational studies. For the most part, those studies have been quite fruitful. But not always. They have been e
Re: Sample size question
On 23 Feb 2001 12:08:45 -0800, [EMAIL PROTECTED] (Scheltema, Karen) wrote: > I tried the site but received errors trying to download it. It couldn't > find the FTP site. Has anyone else been able to access it?

As of a few minutes ago, it downloaded fine for me, when I clicked on it with Internet Explorer. The .zip file expanded okay. I used right-click (I just learned that last week) in order to download the .pdf version of the help. [ ... ]

< Earlier Q and Answer > "Can anyone point me to software for estimating ANCOVA or regression sample sizes based on effect size?" > > Look here: > > http://www.interchg.ubc.ca/steiger/r2.htm

Hmm. Placing limits on R^2. I haven't read the accompanying documentation. On the general principle that you can't compute power if you don't know what power you are looking for, I suggest reading the relevant chapters in Jacob Cohen's book (1988+ edition). -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
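A minimal sketch of the Cohen-style calculation mentioned just above, in Python with scipy. The R^2, number of predictors, and n are made-up illustration values; the convention follows Cohen's f^2 = R^2/(1 - R^2) with noncentrality f^2(u + v + 1):

from scipy import stats

def regression_power(r2, n_predictors, n, alpha=0.05):
    # Power for the overall test of R^2 in multiple regression,
    # via the noncentral F distribution (a sketch of Cohen's approach).
    f2 = r2 / (1.0 - r2)              # Cohen's effect size f^2
    u = n_predictors                  # numerator df
    v = n - n_predictors - 1          # denominator df
    lam = f2 * (u + v + 1)            # noncentrality, Cohen's convention
    f_crit = stats.f.ppf(1 - alpha, u, v)
    return 1 - stats.ncf.cdf(f_crit, u, v, lam)

# e.g., R^2 = .13 (f^2 about .15, Cohen's "medium"), 3 predictors, n = 77:
print(round(regression_power(0.13, 3, 77), 2))   # comes out near .80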
Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk
On Mon, 19 Feb 2001 04:27:24 GMT, [EMAIL PROTECTED] (Irving Scheffe) wrote: > In responding to Rich, I'll intersperse selected comments with > selected portions of his text and append his entire post below.

- I'm not done with the topic yet. But it is difficult to go on from this point. I think the difficulty is that JS has constructed his straw-man argument about how "hypotheses" are handled; and since it is a stupid strategy, it is easy for him to claim that it is fatally flawed. From his insistence on his "examples," it seems to me that he believes that someone else is committed to using p-levels in a strict way, by beating 5%. That's certainly not the case for me, and I doubt if anyone defends or promotes it, outside of carefully designed Controlled Random Experiments.

Despite the fact that I could not make sense of WHY he wanted his example, it turns out -- after he explains it more -- that my own analysis covered the relevant bases. I agree, if you don't have "statistical power," then you don't ask for a 5% test, or (maybe) any test at all. The JUSTIFICATION for having a test on the MIT data is that the power is sufficient to say something. And what it said is that Jim did BAD INFERENCE. I said that a couple of times. I regret that I may have confused people with unnecessary words about "inference." Outlier => No central tendency => Mean is a BAD statistic; a careful reader insists on more or better information before asserting there's a difference. I asserted that more than once.

Optimistically, my own data analysis technique might be described as "starting out with everything Jim might figure out and conclude from the data, and adding to that, flexible comparisons from related fields, and other statistical tools." -- It was quite annoying for me to read where he implicitly says, "You, idiot, would HAVE to decide otherwise." I mean, I thought I wrote a lot more clearly than that.

Now, below is a quotation that describes Jim's justifications, I hope, in more detail than Jim does. This is from a web site which I just discovered, but which looks quite admirable -- except for this question of "Sampling". I think that Garson is wrong, and the last 40 years of epidemiological research have proven the worth of statistics provided on non-random, "observational" samples. When handled with care.

From G. David Garson, "PA 765 Notes: An Online Textbook." On Sampling http://www2.chass.ncsu.edu/garson/pa765/sampling.htm

Significance testing is only appropriate for random samples. Random sampling is assumed for inferential statistics (significance testing). "Inferential" refers to the fact that conclusions are drawn about relationships in the data based on inference from knowledge of the sampling distribution. Significance tests are based on a sampling theory which requires that every case have a chance of being selected known in advance of sample selection, usually an equal chance. Statistical inference assesses the significance of estimates made using random samples. For enumerations and censuses, such inference is not needed since estimates are exact. Sampling error is irrelevant and therefore inferential statistics dealing with sampling error are irrelevant. Significance tests are sometimes applied arbitrarily to non-random samples but there is no existing method of assessing the validity of such estimates, though analysis of non-response may shed some light. 
The following is typical of a disclaimer footnote in research based on a non random sample: "Because some authors (ex., Oakes, 1986) note the use of inferential statistics is warranted for nonprobability samples if the sample seems to represent the population, and in deference to the widespread social science practice of reporting significance levels for nonprobability samples as a convenient if arbitrary assessment criterion, significance levels have been reported in the tables included in this article." See Michael Oakes (1986). Statistical inference: A commentary for social and behavioral sciences. NY: Wiley. Maybe we can pick up the discussion from here? -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk
I am going to try to stick to the statistics-related parts, in replying to Jim Steiger. With a fake user-name, JS wrote on Thu, 15 Feb 2001 17:34:15 GMT, [EMAIL PROTECTED] (Irving Scheffe):

JS > "Rich: "To be blunt, although your comments in this forum are often valuable, you fell far short of two cents worth this time. "This is not a popularity contest, it is a statistical argument. "

- I say, if your 'statistical argument' about 'populations' is rejected by a large (and growing) fraction of all statisticians, then I think you do have to go back to defend your textbook, or show how your argument differs from what I think it is. That's what I was getting at by mentioning textbooks. < snip, verbiage; Jim cited me, RU >

> > - and if you want to know something about how unlikely it was to > >get means that extreme, you can randomize. Do the test.

JS > "a. You do *have* means 'that extreme.' "b. There is no 'likelihood' to be considered, because the entire population is available. We were assessing the original MIT conjecture that to imply there were important performance differences between male and female biologists AT MIT would be 'the last refuge of the bigot.'"

Given group A and group B, I can do a t-test. Or something. That will give me a quantification that I did not have before. Is such a test interesting? - If I am really in a 'population' circumstance, that question can hardly arise; I would know that the test tells me nothing. It has nothing to do with taking a vote, or providing services to a fixed population. Why does Jim call some means 'extreme'? - in a theoretical 'population', you have means that *exist*. Right now, I think that it is difficult to justify applying any such adjectives, if you regard the set of numbers as a 'population.'

I am pointing out: Jim claimed that the productivity of the Men was impressively greater than that of the women; and that was an act of inference on his part. So, his act is screwed up, twice: He does a bad deduction / wrong inference (by ignoring p-level -- in this instance, apparently ignoring the strong impact of an outlier), and then he wrongly claims immunity from the standards of inference. That is, he ought NOT to use means when there are huge outliers that mess up the t-test; and he ought to find a way to use a p-level for support. I have said this a number of times: if you extract meaning, if you make inferences, then you are treating the population as a sample. That is what we do in science, and what we do on almost any occasion where we are publishing for people who are not 'administration.' And that is why we seldom use the set of statistics for 'finite populations' and why we do use tests of inference.

JS > "So, my countercomments to you are: "1. Rather than snipping the Gork example, deal with it. Explain, in detail, why the Gork women shouldn't be paid more than the men. My prediction: you can't, and you won't."

In detail: I think that it is a wretched example. I still can't figure out what it is supposed to exemplify. But I can comment on the problem.

= problem summarized
Productivity, Females: (91, 92, 93)
             Males:    (89.5, 90, 90.5)
Why should not Females be paid more, if that's what matters?
==

Based on a t-test, Females might test as having a higher mean. With a few more cases, that difference would be 'significant' with either parametric or rank-testing. (A quick numerical check of that appears just after this post.) But if the natural meaning of production is being used, then there would be a natural zero, and one should OBSERVE: all of these scores are confined to a tiny per-cent range. 
In fact, the range seems too tiny to be real. Eventually, I conclude that I don't understand the mechanism of generating the scores, and/ or someone has been 'cooking the books' or faking the numbers. If there were a few more subjects added to each Sex, in the same narrow range and pattern, I would conclude that there DEFINITELY was something phony going on. If pay is to be meritocratic, that would seem to justify a TINY difference in wages. Nothing about quality. Piece work, I assume. Sampling of 3 versus 3 is small N; it is far worse than 6 vs 6. If this is supposed to be about 'statistical power': In the MIT citation data, the "large difference" between M and F *would* be significant if there weren't something fishy. < snip, rest. #2 and #3 - (2) seems to have been answered, and (3) seems to be a contentious followup to the artificial example that I scarcely understand in the first place. > -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
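A quick numerical check of the Gork numbers quoted in the post above, using nothing beyond the six values given there: the pooled t already lands near p = .04, while an exact rank test on 3 vs 3 cannot get below p = .10.

from scipy import stats

females = [91, 92, 93]
males = [89.5, 90, 90.5]

t, p_t = stats.ttest_ind(females, males)              # pooled-variance t
u, p_u = stats.mannwhitneyu(females, males,
                            alternative='two-sided')  # exact rank test
print(round(t, 2), round(p_t, 3))   # about t = 3.1, p = .04
print(round(p_u, 2))                # p = .10; 3 vs 3 cannot reach .05 by ranks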
Re: Survival Analysis - Derivation of Functions?
On Fri, 16 Feb 2001 14:59:14 -0500, "Michelle White" <[EMAIL PROTECTED]> wrote: > Is there any text or article that someone can recommend that clearly goes > through the derivation of the survival function, density function, and > hazard functions? Especially how one is derived from the other?

Survival Analysis. Kleinbaum, David G., 1996. Springer-Verlag New York. ISBN 0387945431. US $65.00.

If you want the short version, here's a couple of pages on-line that I found in less than a minute by searching (google) on "survival function" "hazard function": http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
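For reference, the relationships the question asks about fit in a few lines; these are the standard definitions (not specific to either source above), written in LaTeX notation:

\begin{aligned}
S(t) &= \Pr(T > t) = 1 - F(t), \qquad
f(t) = \frac{d}{dt}F(t) = -\frac{d}{dt}S(t), \\
h(t) &= \frac{f(t)}{S(t)} = -\frac{d}{dt}\log S(t), \qquad
S(t) = \exp\!\Big(-\int_0^t h(u)\,du\Big).
\end{aligned}

So any one of S, f, or h determines the other two; the integral in the last expression is the cumulative hazard H(t), with S(t) = e^{-H(t)}.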
Re: On inappropriate hypothesis testing. Was: MIT Sexism & statistical bunk
I am just tossing in my two cents worth ... On Thu, 15 Feb 2001 07:53:13 GMT, Jim Steiger, posting as [EMAIL PROTECTED] (Irving Scheffe) wrote: < snip, name comment >

> 2. I tried to make the Detroit Pistons example as obvious as I could. > The point is, if you want to know whether one population performed > better than another, and you have the performance information, [under > the simplifying assumption, stated in the example and obviously not > literally true in basketball, that you have an acceptable > unidimensional index of performance], you don't do a statistical test, > you simply compare the groups.

- and if you want to know something about how unlikely it was to get means that extreme, you can randomize. Do the test.

> > Your question about the randomization test seems > to reflect a rather common confusion, probably > deriving from some overly enthusiastic comments > about randomization tests in some > elementary book.

- If you are willing, perhaps we could discuss the textbook examples. I don't remember seeing what I would call "overly enthusiastic comments about randomization." When I looked a few years ago, I did see one book with an opposite fault, exemplified in a problem about planets. I thought the authors were pedantic or silly, when they refused to admit randomization as a first step of assessing whether there *might* be something interesting going on.

>Some people seem to > emerge with vague notions that two-sample randomization tests make > statistical testing appropriate in any situation in which you have > two stacks of numbers. That obviously isn't true. > Your final question asks whether "statistical tests" can be appropriate > even when not sampling from a population. In some sense, sure. But not > in this case.

I can't say that I have absorbed everything that has been argued. But as of now, I think Gene has the better of it. To me, it is not very appropriate to be highly impressed at the mean-differences, when TESTS that are attempted can't show anything. The samples are small-ish, but the means must be wrecked a bit by outliers.

> > Maybe the following example will help make > it clearer: < snip rest, including example that brings in "power" but not convincingly. > -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
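To make "you can randomize -- do the test" concrete, here is a minimal sketch of a two-sample randomization (permutation) test; the two stacks of numbers are invented for illustration:

import numpy as np

rng = np.random.default_rng(1)
group_a = np.array([12.1, 9.8, 14.2, 11.5, 13.0, 10.7])
group_b = np.array([ 9.2, 8.5, 10.1,  9.9,  8.8, 11.0])

observed = group_a.mean() - group_b.mean()
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

# Shuffle the group labels many times; count how often the shuffled
# mean difference is at least as extreme as the observed one.
n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    diff = perm[:n_a].mean() - perm[n_a:].mean()
    if abs(diff) >= abs(observed):
        count += 1

print(round(observed, 2), round(count / n_perm, 3))   # two-sided permutation p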
Re: Simulating T tests for Likert scales
On 13 Feb 2001 01:38:35 -0800, [EMAIL PROTECTED] (Will Hopkins) wrote: > Rich Ulrich wrote: > >You can use t-tests > >effectively on outcomes that are dichotomous variables, and you use > >the pooled version (Student's t) despite any difference in variances. > >That is the test that gives you the proper p-levels.

WH > > Rich, if the sample sizes in the two groups are different, you have to use > the t test jigged for unequal variances. That's what my simulations showed. > > Your other comments about the robustness of t tests for Likert scales are > reassuring, and thanks for responding. I did find that the confidence > interval went awry when responses got too stacked up on the first or last > level.

And what were the conditions of your simulations, the ones that seemed to show a need for testing with 'unequal variances'? - I assume that those were for Likert examples, not dichotomies. I have been pleased with how well the Student's t performed with dichotomies, and annoyed at how badly the Unequal-var test performed. I can show those with EXAMPLES rather than randomizations. I just re-did a couple, to make sure that I was not remembering them wrong. Because I don't remember seeing these comparisons in public before, I will show the results below:

- Here are statistics (from SPSS) for the 2x2 table, and for the two t-tests that can be performed. I consider the primary, useful test to be the Pearson chi-squared (no correction for continuity). The Student's t and the Pearson chi-squared are practically identical in the first table; and in the second table, the Unequal-var. t is again far off the mark by every comparison. These tables are lined up for fixed font; but the lines are short enough that they should usually not-wrap.

== summary of 2x2 statistics ==

10% (of 20)  vs  1% (of 100)

   18 |  2
   99 |  1

Chi-Square                    Value    DF   Significance
---------------------------------------------------------
Pearson                        5.54     1      .0186
Continuity Correction          2.46     1      .117
Likelihood Ratio               3.85     1      .0496
Mantel-Haenszel test for       5.49     1      .0191
  linear association
Fisher's Exact Test:        One-Tail  .07
                            Two-Tail  .07
- - - - - - - -
t-test, pooled var                        2.39   118     .018
t-test, sep. var (means .01 vs .1)        1.29   19.8    .21
t-test, sep. var (means 1.84 vs 1.33)     1.53   2.04    .26

#1  Means of 0.01 vs 0.1
    Levene's Test for Equality of Variances:  F= 24.0   P= .000
#2  Means of 1.84 vs 1.33
    Levene's Test for Equality of Variances:  F= 1.59   P= .210


1% (of 100)  vs  10% (of 200)

   99 |  1
  180 | 20

Chi-Square                    Value    DF   Significance
---------------------------------------------------------
Pearson                        8.29     1      .00398
Continuity Correction          6.97     1      .0083
Likelihood Ratio              10.94     1      .00094
Mantel-Haenszel test for       8.26     1      .00404
  linear association
- - - - - - - -
t-test, pooled var                        2.91   298     .009
t-test, sep. var (means .01 vs .1)        3.83   270.2   .000
t-test, sep. var (means 1.54 vs 1.95)     5.53   36.8    .000

#1  Means of 0.01 vs 0.1
    Levene's Test for Equality of Variances:  F= 40.9    P= .000
#2  Mean of 1.64 vs 1.95
    Levene's Test for Equality of Variances:  F= 127.3   P= .000

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
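For anyone who wants to poke at the first table above without SPSS, here is a small sketch in Python/scipy that rebuilds the 10%-of-20 vs 1%-of-100 comparison; it shows the pooled t tracking the Pearson chi-squared while the unequal-variance (Welch) t drifts away:

import numpy as np
from scipy import stats

grp1 = np.array([1]*2 + [0]*18)     # 10% "yes" out of 20
grp2 = np.array([1]*1 + [0]*99)     #  1% "yes" out of 100

table = np.array([[18, 2], [99, 1]])
chi2, p_chi2, _, _ = stats.chi2_contingency(table, correction=False)

t_pooled, p_pooled = stats.ttest_ind(grp1, grp2, equal_var=True)
t_welch,  p_welch  = stats.ttest_ind(grp1, grp2, equal_var=False)

print(round(chi2, 2), round(p_chi2, 3))        # about 5.54, .019
print(round(t_pooled, 2), round(p_pooled, 3))  # about 2.39, .018 -- agrees
print(round(t_welch, 2), round(p_welch, 2))    # about 1.29, .21  -- far off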
Re: needed clustering algorithm for similar mean groups
On 12 Feb 2001 11:27:26 -0800, [EMAIL PROTECTED] (EAKIN MARK E) wrote: > I have just given a class their first exam and would like to put a class > of 60 into groups of size three. I would like the groups to have basically > the same average score on the first exam. Would anyone know of an > algorithm for doing this?

I don't imagine that I would be totally happy with a mechanical algorithm. How much do I care about the Standard Deviation? For starters, I guess I would generate 20 teams of 3 at random, and then evaluate on a criterion or two. If I generated 1000 or 10,000 sets like this, then I could sort them into order. And see if I like the results. Maybe it would be the set with the minimum F-test for the between-groups ANOVA. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
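A minimal sketch of that random-search idea; the exam scores here are simulated stand-ins, and "best" is taken to be the smallest between-team ANOVA F, as suggested above:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.normal(75, 10, size=60)         # stand-in for the 60 exam scores

best_F, best_teams = np.inf, None
for _ in range(10_000):
    teams = rng.permutation(60).reshape(20, 3)    # 20 random teams of 3 (indices)
    F, _ = stats.f_oneway(*(scores[t] for t in teams))
    if F < best_F:
        best_F, best_teams = F, teams

team_means = scores[best_teams].mean(axis=1)
print(round(best_F, 3), round(team_means.std(), 2))   # tiny F, nearly equal means

Keeping all 10,000 candidate sets, sorting them by F (or by the spread of team means), and eyeballing the top few is the same idea with the final choice left to the instructor, as the post suggests.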
Re: Hypothesis Testing where General Limit Theorem doesn't hold?
On Sun, 11 Feb 2001 01:53:00 GMT, "Neo Sunrider" <[EMAIL PROTECTED]> wrote: > I am just taking an undergraduate introductory stats course but now I > am faced with a somewhat difficult problem (at least for me). > > If I want to test a hypothesis (t-test, z-score etc.) and the underlying > distribution will under no circumstances approach normal... (i.e. the results > of the experiment will always be something like 100*10.5, 40*-5 etc.) The > Central Limit Theorem doesn't help here, or does it? > > Can anyone explain, or point me in the right direction - how can I test in > these cases?

It reads to me as if "the results" will be two-dimensional, Frequency (above: 100 or 40) and point-value (10.5 or -5), and you are combining them "unthinkingly" as a product. Or does your notation indicate a few outcome scores, for instance 10.5 or -5, and the number of times those were manifested? You don't want to use rank-transformation if you are rightfully concerned with the numerical average of the scores or of those products. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Simulating T tests for Likert scales
On 1 Feb 2001 01:03:40 -0800, [EMAIL PROTECTED] (Will Hopkins) wrote: > I have an important (for me) question, but first a preamble and hopefully > some useful info for people using Likert scales. > > A week or so ago I initiated a discussion about how non-normal the > residuals have to be before you stop trusting analyses based on > normality. Someone quite rightly pointed out that it depends on the sample > size, because the sampling distribution of almost every statistic derived > from a variable with almost any distribution is near enough to normal for a > large enough sample, thanks to the central limit theorem. Therefore you > get believable confidence limits from t statistics. > > But how non-normal, and how big a sample? I have been doing simulations to > find out. I've limited the simulations to t tests for Likert scales with > only a few levels, because these crop up often in research, and > Likert-scale variables with responses stacked up at one end are not what > you call normally distributed. Yes, I know you can and maybe should > analyze these with logistic regression, but it's hard work for [ ... snip, rest ]

Here is an echo of comments I have posted before. You can use t-tests effectively on outcomes that are dichotomous variables, and you use the pooled version (Student's t) despite any difference in variances. That is the test that gives you the proper p-levels. "Likert scales" are something that I tend to think of as "well developed," so they would pose no problem for t-testing. But, anyway, items with 3 or 4 or 5 scale points are not prone to having extreme outliers; and if your actual responses across 5 points are bi-modal, you might want to rethink your response-meanings. Generally, I generalize from the dichotomous case, to conclude that the t-test will be robust for items with a few points. Years ago, I read an article or two that explicitly asserted that conclusion, based on some Monte Carlo simulations.

Just a few weeks ago, I read another justification for scoring categories as integers -- the Mantel paper that is the basis for what Agresti presents in his "Introduction to Categorical Data Analysis." That "M^2" test (page 35) makes use of fixed variances for proportions. M^2 is tested as chi-squared, and its computation is almost identical to t. So I don't fret about using t on items with Likert-type responses, even for small N. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
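A small sketch of that last comparison, using a made-up 2 x 4 table of Likert-type counts (nothing here comes from the post's simulations): the linear-trend statistic M^2 = (n - 1) r^2 that Agresti presents, tested as chi-squared with 1 df, set next to the square of the pooled t on the same group-by-score data.

import numpy as np
from scipy import stats

# Hypothetical counts: two groups by a 4-point Likert-type response
counts = np.array([[10, 15, 12, 3],    # group 0
                   [ 4, 10, 16, 10]])  # group 1
group = np.repeat([0, 1], counts.sum(axis=1))
score = np.concatenate([np.repeat([1, 2, 3, 4], row) for row in counts])

n = len(score)
r = np.corrcoef(group, score)[0, 1]
M2 = (n - 1) * r**2                    # Mantel's linear-trend statistic
p_M2 = stats.chi2.sf(M2, df=1)

t, p_t = stats.ttest_ind(score[group == 1], score[group == 0])  # pooled t
print(round(M2, 2), round(p_M2, 4))    # about 7.7, p about .006
print(round(t**2, 2), round(p_t, 4))   # about 8.4, p about .005 -- very close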