Re: cite for using linear regression instead of logistic regression

2001-03-18 Thread Joe Ward



David --
 
Logistic regression is more appealing to some folks since it maps the
predicted values into the range 0-1.

If you do a least-squares regression predicting a 0-1 dependent variable,
the predicted values may not be mapped into 0-1 (e.g., some predicted
values may be < 0 and some may be > 1).

However, for "practical" decision-making such as "selection" or
"classification", the results will be the same.
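
A minimal sketch of the comparison described above, in plain Python. The
toy data, the function names, and the 0.5 cut-off are assumptions made for
illustration; they are not from the thread. It fits a least-squares line
and a logistic curve to the same 0-1 outcome, then checks how far the
least-squares predictions stray outside 0-1 and how often the two fits
agree on classification. Agreement on one toy data set does not show that
the two methods always agree.

```python
import math

# Deterministic toy data (an assumption, not from the thread): x runs from
# -3 to 3 in steps of 0.2; y is 1 for x > 0, with two labels flipped so the
# classes overlap and the logistic fit has a finite solution.
ks = list(range(-15, 16))
xs = [k / 5.0 for k in ks]
ys = [1.0 if k > 0 else 0.0 for k in ks]
ys[ks.index(-3)] = 1.0
ys[ks.index(3)] = 0.0


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_logistic(xs, ys, iterations=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))
        # Negative Hessian (2x2, symmetric), solved by hand.
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


a_lin, b_lin = fit_linear(xs, ys)
a_log, b_log = fit_logistic(xs, ys)

lin_pred = [a_lin + b_lin * x for x in xs]
log_pred = [1.0 / (1.0 + math.exp(-(a_log + b_log * x))) for x in xs]

# Classify with a 0.5 cut-off under both models and count agreements.
agree = sum((lp > 0.5) == (gp > 0.5) for lp, gp in zip(lin_pred, log_pred))

print("least-squares prediction range:", min(lin_pred), max(lin_pred))
print("logistic prediction range:     ", min(log_pred), max(log_pred))
print("same classification for", agree, "of", len(xs), "cases")
```

On this data the least-squares predictions fall below 0 at one end and
above 1 at the other, while the logistic predictions stay strictly inside
0-1.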
 
Since you brought up the question, I'm sure that the "logistic regression"
folks can enlighten us on the "practical" advantages of "logistic
regression".
 
-- Joe
 
Joe Ward
167 East Arrowhead Dr.
San Antonio, TX 78228-2402
Home phone: 210-433-6575
Home fax: 210-433-2828
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

Health Careers High School
4646 Hamilton Wolfe
San Antonio, TX 78229
Phone: 210-617-5400
Fax: 210-617-5423
 
 
----- Original Message -----
From: "David Duffy" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, March 18, 2001 8:41 PM
Subject: Re: cite for using linear regression instead of logistic regression

> Scheltema, Karen <[EMAIL PROTECTED]> wrote:
>
> > I've read several times on this listserve comments from people that when
> > p(y) is not extreme, a logistic regression model can be estimated by a
> > linear regression model.
>
> Some references cited by Harvey (1982):  also BF&H
>
> Harvey WR (1982).  Least squares analysis of discrete data.  J Anim Sci
> 54: 1067-1071.
>
> Cochran WG (1940).  The analysis of variance when experimental errors follow
> the Poisson or binomial laws.  Ann Math Statis 11: 335.
>
> Cochran WG (1943).  Analysis of variance for percentages based on
> unequal numbers. JASA 38:287.
>
> Li JCR (1964).  Introduction to statistical inference I.  Ann Arbor: Edwards.
>
> --
> | David Duffy.
> | email: [EMAIL PROTECTED]  ph: INT+61+7+3362-0217 fax: -0101
> | Epidemiology Unit, The Queensland Institute of Medical Research
> | 300 Herston Rd, Brisbane, Queensland 4029, Australia
>
> =
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
>   http://jse.stat.ncsu.edu/
> =



No Subject

2001-03-18 Thread Аббакумов Вадим Леонардович

subscribe edstat-L Vadim Abbakoumov 





web robot

2001-03-18 Thread Vincent Granville

The source code (Perl) is free. What you pay for is advanced technical
support to design your own applications. The program comes with a sample
application to download stock quotes. Available at
http://www.datashaping.com/robot.shtml







Re: cite for using linear regression instead of logistic regression

2001-03-18 Thread David Duffy

Scheltema, Karen <[EMAIL PROTECTED]> wrote:

> I've read several times on this listserve comments from people that when
> p(y) is not extreme, a logistic regression model can be estimated by a
> linear regression model.

Some references cited by Harvey (1982):  also BF&H

Harvey WR (1982).  Least squares analysis of discrete data.  J Anim Sci
54: 1067-1071.

Cochran WG (1940).  The analysis of variance when experimental errors follow
the Poisson or binomial laws.  Ann Math Statis 11: 335.

Cochran WG (1943).  Analysis of variance for percentages based on
unequal numbers. JASA 38:287.

Li JCR (1964).  Introduction to statistical inference I.  Ann Arbor: Edwards.

-- 
| David Duffy. ,-_|\
| email: [EMAIL PROTECTED]  ph: INT+61+7+3362-0217 fax: -0101/ *
| Epidemiology Unit, The Queensland Institute of Medical Research \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia v 





Re: can you use a t-test with non-interval data?

2001-03-18 Thread Rich Ulrich

On 17 Mar 2001 19:54:27 -0800, [EMAIL PROTECTED] (Will Hopkins)
wrote:

> I just thought of a new justification doing the usual parametric analyses 
> on the numbered levels of a Likert-scale variable.   Numbering the levels 
> is formally the same as ranking them, and a parametric analysis of a 
> rank-transformed variable is a non-parametric analysis.   If non-parametric 
> analyses are OK, then so are parametric analyses of Likert-scale variables.

Good comment.  

One thing that happened, in recent years, was that Conover, 
et al., showed that you can do the t-test on Ranked data and 
get a really good approximation of the "exact" p-level, 
even when the Ns are quite small. 

Further:  Ranked data has theoretical problems with *ties* --
which is the chronic condition of Likert-scale items.  In fact, using
the t-test on Ranks sometimes gives a better p-value than what your
textbook recommends for "adjusting for ties."  

Further again:  In the cases where there are "odd" distributions
in the several categories, you want to check to see what the
rank-transformation assigns to categories as their effective "scores"
and then select between analyses.  For my data, the 1...5
assigned scoring almost always looks better than the intervals
achieved by ranks.
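
A small sketch of the check described above: with tied data, every
response in a category receives the same midrank, so the rank
transformation amounts to assigning one score per category. The counts
below are hypothetical, and the function names are made up for this
illustration. The midranks are rescaled to run from 1 to 5 so they can be
set beside the assigned 1...5 scoring.

```python
def midrank_scores(counts):
    """Midrank given to each category when all responses are pooled and ranked.

    counts[i] is the number of responses in category i (each assumed > 0).
    """
    scores = []
    below = 0  # number of responses in lower categories
    for c in counts:
        # The c tied responses occupy ranks below+1 .. below+c; their mean
        # is below + (c + 1) / 2.
        scores.append(below + (c + 1) / 2.0)
        below += c
    return scores


def rescale(scores, low=1.0, high=5.0):
    """Linear rescaling so the first score is `low` and the last is `high`."""
    first, last = scores[0], scores[-1]
    return [low + (high - low) * (s - first) / (last - first) for s in scores]


# Hypothetical pooled counts for categories 1..5, stacked toward "agree".
counts = [2, 3, 10, 40, 45]
ranks = midrank_scores(counts)
print("midranks:         ", ranks)
print("rescaled to 1..5: ", [round(s, 2) for s in rescale(ranks)])
```

With counts this lopsided, the rank-based scores squeeze categories 1 to 3
together and stretch the gaps between 3, 4 and 5, which is the kind of
spacing you would want to look at before choosing between the analyses.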

Agresti has a detailed example of arbitrary scoring of categories
in his textbook, "Introduction to categorical data analysis."

> 
> But...  an important condition is that the sampling distribution of your 
> outcome statistic must be normal.  This topic came up on this list a few 
> weeks ago.  In summary, if the majority of your responses are stacked up on 
> one or other extreme value of the Likert scale for one or more groups in 
> the analysis, and if you have less than 10 observations in one or more of 
> those groups, your confidence intervals or p values are untrustworthy.  See 
> http://newstats.org/modelsdetail.html#normal for more.

Good comment, too.  

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





edstat-l@jse.stat.ncsu.edu

2001-03-18 Thread Neville X. Elliven

Jerry Dallal <[EMAIL PROTECTED]> wrote:

>It is frustrating to keep getting errors when I try to access a
>printable version of the report, whether by using IE or Netscape.
>Is there a known workaround?

Yes, it's called Opera:

http://www.operasoftware.com





Re: One tailed vs. Two tailed test

2001-03-18 Thread Vit Drga

On Fri, 16 Mar 2001 23:40:07 -, [EMAIL PROTECTED] (Jerry
Dallal) wrote:

>FWIW, for large samples, 0.1% in the unexpected tail 
>corresponds to a t statistic of 3.09.  I'd love to 
>be a fly on the wall while someone is explaining to 
>a client why that t = 3.00 is non-significant!  :-)

What if you had an effect that when it does happen is pretty obvious
(e.g. H_1 results in a std t-distn mean-shifted to mean = 10)? An
observed t-value of 3 may be statistically significant  at the 0.1%
level and yet should still count as evidence for the null hypothesis
rather than against it.  But, of course, in situations like that there
is no need to run a statistical test...
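
The numbers in this exchange can be checked with a short sketch. It
assumes the large-sample case, so the t statistic is treated as standard
normal under the null, and it takes the alternative from the example above
as the same distribution shifted to mean 10. The variable names are made
up for the illustration.

```python
import math
from statistics import NormalDist

null = NormalDist(mu=0.0, sigma=1.0)   # large-sample reference under H_0
alt = NormalDist(mu=10.0, sigma=1.0)   # mean-shifted alternative, H_1

# Critical value that leaves 0.1% in one tail.
critical = null.inv_cdf(0.999)

# One-tail area beyond an observed statistic of 3.00.
t_obs = 3.0
tail_area = 1.0 - null.cdf(t_obs)

# Likelihood ratio: how much better H_0 accounts for t = 3 than H_1 does.
likelihood_ratio = null.pdf(t_obs) / alt.pdf(t_obs)

print("0.1% one-tail critical value:", round(critical, 3))
print("tail area beyond 3.00:       ", round(tail_area, 5))
print("log likelihood ratio H_0/H_1:", round(math.log(likelihood_ratio), 3))
```

Under these assumptions an observed value of 3 is rare under the null but
far rarer under an alternative centred at 10, so the ratio favours the
null by a very wide margin.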

Vit D.






Re: can you use a t-test with non-interval data?

2001-03-18 Thread Jay Warner



Ben Kenward wrote:

> My girlfriend is researching teaching methods using a questionnaire, and she
> has answers for questions in the form of numbers from 1 to 5 where 5 is
> strongly agree with a statement and 1 is strongly disagree. She is proposing
> to do a t-test to compare, for example, male and female responses to a
> particular question.
> 
> I was surprised by this because I always thought that you needed at least
> interval data in order for a t-test to be valid. Her textbook actually says
> it is OK to do this though. I don't have any of my old (life-sciences) stats
> books with me, so I can't check what I used to do.
> 
> So are the social scientists playing fast and loose with test validity, or
> is my memory playing up?

Classic issue, frequent discussion, careful response distinctions needed.

Yes, interval data is needed to do a t test.  Is the data from a Likert 
scale (what your friend has) interval data?  Depends on how you see it.  
When a respondent puts a mark halfway between two check boxes (i.e., 3.5 
on the numerical scale), they are trying to tell you that _they_ see it 
as interval, as continuous in fact. 

What is the '3' position?  Is it really between 2 and 4, or is it 'none 
of the above' type of thing?  If the latter, it's no dice - not interval.

For a t test, you really want intervals that are equally spaced.  Is 
this so?  Is this reasonably close to so?  Lots more debate on that.  By 
marking the levels as points on the continuum from the 1 to the 5 
positions, you are implying that they are equally spaced.  Does the 
respondent see them that way?  Could be.  Maybe we should just try it, 
to see what comes out.

For a t test, you prefer a scale which is in principle potentially 
infinite.  When I do this sort of thing, I sometimes get responses of 0 
and 6, for potential conditions I didn't anticipate.  Otherwise, the 
scale is restricted at the bottom and top.  How to correct for this?  
One way is to do a logit transform (if I get the term right)

Convert the 1 - 5 scale into a 0 to 1 scale by:  y' = (y-1)/4
then a logit transform (omega transform via Taguchi):

   y'' = ln(y'/(1-y'))

the y'' distribution will more closely approach the infinite width 
potential requested, and will never give you a prediction of more than 5 
or less than 1 on the y scale.
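
A minimal sketch of the transform and its inverse, with function names
invented for the illustration. One caveat that follows directly from the
formula: a response of exactly 1 or 5 gives y' = 0 or 1, where the log is
undefined, so the sketch only accepts values strictly between the ends
(a group mean, for example). How to handle responses at the ends is left
open here.

```python
import math


def to_logit(y, low=1.0, high=5.0):
    """Map y in (low, high) to the whole real line: y'' = ln(y'/(1-y'))."""
    frac = (y - low) / (high - low)   # y' on the 0-1 scale
    if not 0.0 < frac < 1.0:
        raise ValueError("y must lie strictly between low and high")
    return math.log(frac / (1.0 - frac))


def from_logit(z, low=1.0, high=5.0):
    """Inverse transform: any real z maps back into the low-to-high range."""
    # Two branches so that exp() never overflows for large |z|.
    if z >= 0.0:
        share = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        share = e / (1.0 + e)
    return low + (high - low) * share


for y in (1.5, 3.0, 4.2):
    z = to_logit(y)
    print(y, "->", round(z, 4), "->", round(from_logit(z), 4))
```

Because the inverse is bounded, a prediction made on the y'' scale always
comes back between 1 and 5, however large or small it is.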

BUT...  this assumes that the earlier assumptions about scale and 
interval size are very tight.  They probably aren't.  Why waste your 
time doing very precise analyses on weak data?

Suggestion: 
(a) Run the t test on the raw responses, y's.  See if anything pops up.
(b) Go back and check that the assumption requirements are met or at 
least arguable.  Check some respondents to see that they saw the scale 
as you did, and adjust your thinking to theirs.
(c) IF you have time and the data is reasonably tight, AND if you 
want to impress someone with your transformational skills, then go do 
that transform and re-analyze.  In most cases, the conclusions will not 
be greatly different, in my experience.  The only place things get dicey 
is when a mean response is near the ends (1 or 5).  Detecting 
differences there can be harder, and a small change there is more 
significant than a small change in the middle.
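
Step (a) can be sketched as follows. The responses are made up, and the
unequal-variance (Welch) form of the t statistic is a choice made for this
sketch, not something specified above. The standard library has no t
distribution, so the sketch stops at the statistic and its approximate
degrees of freedom; a p-value would come from tables or a statistics
package.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical raw 1-5 responses to one question, by group.
male = [4, 5, 3, 4, 4, 5, 2, 4]
female = [3, 2, 4, 3, 3, 2, 3, 4]

t_stat, df = welch_t(male, female)
print("t =", round(t_stat, 3), " df =", round(df, 1))
```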

References?  Sorry.  I've only done it a couple times, and know it 
works - it gets me predictions that pan out in confirmation.  Treating 
the data as nominal, instead of interval, may give away information.  
That's expensive.

good luck,
Jay

-- 
Jay Warner
Principal Scientist
Warner Consulting, Inc.
 North Green Bay Road
Racine, WI 53404-1216
USA

Ph:     (262) 634-9100
FAX:    (262) 681-1133
email:  [EMAIL PROTECTED]
web:    http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?



