Re: cite for using linear regression instead of logistic regression
David -- Logistic regression is more appealing to some folks since it maps the predicted values into the range 0-1. If you do a least-squares regression predicting a 0-1 dependent variable, the predicted values may not be mapped into 0-1 (e.g., some predicted values may be < 0 and some may be > 1). However, for "practical" decision-making such as "selection" or "classification", the results will be the same. Since you brought up the question, I'm sure that the logistic-regression folks can enlighten us on the "practical" advantages of logistic regression.

-- Joe

Joe Ward
167 East Arrowhead Dr.
San Antonio, TX 78228-2402
Home phone: 210-433-6575
Home fax: 210-433-2828
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html
Health Careers High School
4646 Hamilton Wolfe
San Antonio, TX 78229
Phone: 210-617-5400
Fax: 210-617-5423

----- Original Message -----
From: "David Duffy" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, March 18, 2001 8:41 PM
Subject: Re: cite for using linear regression instead of logistic regression

> Scheltema, Karen <[EMAIL PROTECTED]> wrote:
> > I've read several times on this listserve comments from people that when
> > p(y) is not extreme, a logistic regression model can be estimated by a
> > linear regression model.
>
> Some references cited by Harvey (1982): also BF&H
>
> Harvey WR (1982). Least squares analysis of discrete data. J Anim Sci 54: 1067-1071.
>
> Cochran WG (1940). The analysis of variance when experimental errors follow the Poisson or binomial laws. Ann Math Statist 11: 335.
>
> Cochran WG (1943). Analysis of variance for percentages based on unequal numbers. JASA 38: 287.
>
> Li JCR (1964). Introduction to statistical inference I. Ann Arbor: Edwards.
>
> --
> | David Duffy
> | email: [EMAIL PROTECTED]  ph: INT+61+7+3362-0217  fax: -0101
> | Epidemiology Unit, The Queensland Institute of Medical Research
> | 300 Herston Rd, Brisbane, Queensland 4029, Australia

=
Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/
=
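Joe's point can be sketched numerically. The data below is made up for illustration: an ordinary least-squares fit to a 0/1 dependent variable produces some fitted values below 0 and above 1, yet thresholding those fitted values at 0.5 still gives a sensible classification rule.

```python
# Minimal sketch, assuming hypothetical data: least-squares on a 0/1
# outcome can predict outside [0, 1], but the 0.5-threshold decisions
# can still be the same as a probability-based classifier's.

def ols_fit(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Made-up 0/1 outcome that rises with x
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 1, 1]

b0, b1 = ols_fit(x, y)
preds = [b0 + b1 * a for a in x]

print(min(preds))                              # below 0: not a probability
print(max(preds))                              # above 1: not a probability
print([1 if p >= 0.5 else 0 for p in preds])   # decision rule on fitted values
```

Here the fitted values range from about -0.14 to about 1.14, so they are unusable as probabilities, while the 0.5-threshold classifications reproduce the observed 0/1 pattern exactly.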
No Subject
subscribe edstat-L Vadim Abbakoumov
web robot
The source code (Perl) is free. What you pay for is advanced technical support to design your own applications. The program comes with a sample application to download stock quotes. Available at http://www.datashaping.com/robot.shtml
Re: cite for using linear regression instead of logistic regression
Scheltema, Karen <[EMAIL PROTECTED]> wrote:
> I've read several times on this listserve comments from people that when
> p(y) is not extreme, a logistic regression model can be estimated by a
> linear regression model.

Some references cited by Harvey (1982): also BF&H

Harvey WR (1982). Least squares analysis of discrete data. J Anim Sci 54: 1067-1071.

Cochran WG (1940). The analysis of variance when experimental errors follow the Poisson or binomial laws. Ann Math Statist 11: 335.

Cochran WG (1943). Analysis of variance for percentages based on unequal numbers. JASA 38: 287.

Li JCR (1964). Introduction to statistical inference I. Ann Arbor: Edwards.

--
| David Duffy
| email: [EMAIL PROTECTED]  ph: INT+61+7+3362-0217  fax: -0101
| Epidemiology Unit, The Queensland Institute of Medical Research
| 300 Herston Rd, Brisbane, Queensland 4029, Australia
Re: can you use a t-test with non-interval data?
On 17 Mar 2001 19:54:27 -0800, [EMAIL PROTECTED] (Will Hopkins) wrote:

> I just thought of a new justification for doing the usual parametric analyses
> on the numbered levels of a Likert-scale variable. Numbering the levels
> is formally the same as ranking them, and a parametric analysis of a
> rank-transformed variable is a non-parametric analysis. If non-parametric
> analyses are OK, then so are parametric analyses of Likert-scale variables.

Good comment. One thing that happened, in recent years, was that Conover et al. showed that you can do the t-test on ranked data and get a really good approximation of the "exact" p-level, even when the Ns are quite small.

Further: ranked data has theoretical problems with *ties* -- which is the chronic condition of Likert-scale items. In fact, using the t-test on ranks sometimes gives a better p-value than what your textbook recommends for "adjusting for ties."

Further again: in the cases where there are "odd" distributions in the several categories, you want to check to see what the rank transformation assigns to categories as their effective "scores" and then select between analyses. For my data, the 1...5 assigned scoring almost always looks better than the intervals achieved by ranks. Agresti has a detailed example of arbitrary scoring of categories in his textbook, "An Introduction to Categorical Data Analysis."

> But... an important condition is that the sampling distribution of your
> outcome statistic must be normal. This topic came up on this list a few
> weeks ago. In summary, if the majority of your responses are stacked up on
> one or other extreme value of the Likert scale for one or more groups in
> the analysis, and if you have less than 10 observations in one or more of
> those groups, your confidence intervals or p values are untrustworthy. See
> http://newstats.org/modelsdetail.html#normal for more.

Good comment, too.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
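The rank-transform idea above can be sketched in a few lines. The Likert responses below are made up for illustration: replace each observation by its midrank (tied values share the average of their ranks, which handles the ties Rich mentions), then run an ordinary pooled two-sample t test on the ranks.

```python
# Sketch of the rank-transform approach, on made-up Likert data:
# midranks for ties, then an ordinary pooled two-sample t statistic.
import math

def midranks(values):
    """Rank all values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend over the run of tied values
        avg = (i + j) / 2 + 1           # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pooled_t(a, b):
    """Ordinary two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    s2 = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(s2 * (1 / na + 1 / nb))

group_a = [1, 2, 2, 3, 3]   # hypothetical Likert responses, group A
group_b = [3, 4, 4, 5, 5]   # hypothetical Likert responses, group B

r = midranks(group_a + group_b)
t = pooled_t(r[:len(group_a)], r[len(group_a):])
print(round(t, 2))
```

Note how the tied 3s (appearing in both groups) all receive the same midrank, so no ad hoc tie correction is needed before the t test on ranks.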
edstat-l@jse.stat.ncsu.edu
Jerry Dallal <[EMAIL PROTECTED]> wrote:
>It is frustrating to keep getting errors when I try to access a
>printable version of the report, whether by using IE or Netscape.
>Is there a known workaround?

Yes, it's called Opera: http://www.operasoftware.com
Re: One tailed vs. Two tailed test
On Fri, 16 Mar 2001 23:40:07 -, [EMAIL PROTECTED] (Jerry Dallal) wrote:

>FWIW, for large samples, 0.1% in the unexpected tail
>corresponds to a t statistic of 3.09. I'd love to
>be a fly on the wall while someone is explaining to
>a client why that t = 3.00 is non-significant! :-)

What if you had an effect that, when it does happen, is pretty obvious (e.g., H_1 results in a std t-distn mean-shifted to mean = 10)? An observed t-value of 3 may be statistically significant at the 0.1% level and yet should still count as evidence for the null hypothesis rather than against it. But, of course, in situations like that there is no need to run a statistical test...

Vit D.
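Jerry's numbers are easy to check with the large-sample (standard normal) approximation to the t distribution: the upper-tail area beyond 3.09 is about 0.1%, so putting 0.1% in the "unexpected" tail makes 3.09 the critical value, and an observed t = 3.00 falls just short of it.

```python
# Checking the 0.1% tail / 3.09 critical value with the normal
# approximation, via the complementary error function.
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2))

print(upper_tail(3.09))   # about 0.001: the 0.1% cutoff
print(upper_tail(3.00))   # about 0.00135: t = 3.00 misses the 0.1% cutoff
```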
Re: can you use a t-test with non-interval data?
Ben Kenward wrote:
> My girlfriend is researching teaching methods using a questionnaire, and she
> has answers for questions in the form of numbers from 1 to 5 where 5 is
> strongly agree with a statement and 1 is strongly disagree. She is proposing
> to do a t-test to compare, for example, male and female responses to a
> particular question.
>
> I was surprised by this because I always thought that you needed at least
> interval data in order for a t-test to be valid. Her textbook actually says
> it is OK to do this though. I don't have any of my old (life-sciences) stats
> books with me, so I can't check what I used to do.
>
> So are the social scientists playing fast and loose with test validity, or
> is my memory playing up?

Classic issue, frequent discussion, careful distinctions needed.

Yes, interval data is needed to do a t test. Is the data from a Likert scale (what your friend has) interval data? Depends on how you see it. When a respondent puts a mark halfway between two check boxes (i.e., 3.5 on the numerical scale), they are trying to tell you that _they_ see it as interval, as continuous in fact. What is the '3' position? Is it really between 2 and 4, or is it a 'none of the above' type of thing? If the latter, it's no dice - not interval.

For a t test, you really want intervals that are equally spaced. Is this so? Is this reasonably close to so? Lots more debate on that. By marking the levels as points on the continuum from the 1 to the 5 positions, you are implying that they are equally spaced. Does the respondent see them that way? Could be. Maybe we should just try it, to see what comes out.

For a t test, you prefer a scale which is in principle potentially infinite. When I do this sort of thing, I sometimes get responses of 0 and 6, for potential conditions I didn't anticipate. Otherwise, the scale is restricted at the bottom and top. How to correct for this?
One way is to do a logit transform (if I get the term right). Convert the 1-5 scale into a 0 to 1 scale by:

    y' = (y - 1)/4

then a logit transform (omega transform via Taguchi):

    y'' = ln(y'/(1 - y''))

Correction: y'' = ln(y'/(1 - y')). The y'' distribution will more closely approach the infinite width requested, and will never give you a prediction of more than 5 or less than 1 on the y scale. BUT... this assumes that the earlier assumptions about scale and interval size are very tight. They probably aren't. Why waste your time doing very precise analyses on weak data?

Suggestion:
(a) Run the t test on the raw responses, y's. See if anything pops up.
(b) Go back and check that the assumption requirements are met or at least arguable. Check some respondents to see that they saw the scale as you did, and adjust your thinking to theirs.
(c) IF you have time and the data is reasonably tight, AND if you want to impress someone with your transformational skills, then go do that transform and re-analyze.

In most cases, the conclusions will not be greatly different, in my experience. The only place things get dicey is when a mean response is near the ends (1 or 5). Detecting differences there can be harder, and a small change there is more significant than a small change in the middle.

References? Sorry, I've only done it a couple of times, and know it works - it gets me predictions that pan out in confirmation. Treating the data as nominal, instead of interval, may give away information. That's expensive.

Good luck,
Jay

--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
North Green Bay Road
Racine, WI 53404-1216
USA
Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com
The A2Q Method (tm) -- What do you want to improve today?
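Jay's two-step transform can be sketched as code. One caveat his post doesn't address: raw responses of exactly 1 or 5 map to y' = 0 or 1, where the logit is undefined, so the epsilon clipping below is an assumption of mine (a common workaround), not part of his recipe.

```python
# Sketch of the Likert-to-logit transform above: y' = (y-1)/4, then
# y'' = ln(y'/(1-y')). The EPS clipping is an added assumption to keep
# the endpoint responses (1 and 5) finite on the logit scale.
import math

EPS = 0.01   # arbitrary shrinkage for the endpoints; my assumption, not Jay's

def likert_to_logit(y, eps=EPS):
    """Map a 1-5 Likert response to an unbounded logit scale."""
    p = (y - 1) / 4                    # rescale 1-5 onto 0-1
    p = min(max(p, eps), 1 - eps)      # keep strictly inside (0, 1)
    return math.log(p / (1 - p))

def logit_to_likert(z):
    """Inverse map: any real z comes back strictly inside the 1-5 range."""
    return 1 + 4 / (1 + math.exp(-z))

print(likert_to_logit(3))                                 # 0.0: midpoint maps to zero
print(round(logit_to_likert(likert_to_logit(2)), 6))      # round-trips to 2.0
```

The inverse map illustrates Jay's claim that predictions back-transformed from the y'' scale can never fall below 1 or above 5, however extreme the fitted value.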