I need Help!!

2000-05-30 Thread John Lexmark

Please help me to solve this problem, I am stuck...

An inspector inspects large truckloads of potatoes to determine the
proportion p in the shipment with major defects prior to using the potatoes
to make potato chips.  Unless there is clear evidence that this proportion
is less than 0.10 she will reject the shipment.  To reach a decision she
will test the hypotheses

H0: p=0.10, Ha: p<0.10

She will use the large-sample test for a population proportion.  To do so,
she selects an SRS of 50 potatoes from the more than 2000 potatoes on the
truck.  Suppose that only 2 of the potatoes sampled are found to have major
defects.
Determine the P-value of her test.
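For what it's worth, the arithmetic can be checked directly. A sketch in
Python (standard library only; the standard error is computed under H0, as
the large-sample test requires):

```python
from math import sqrt, erf

# Large-sample z test for a proportion: H0: p = 0.10 vs Ha: p < 0.10
p0, n, defects = 0.10, 50, 2
p_hat = defects / n                      # sample proportion, 0.04

# Standard error uses p0, the value hypothesized under H0
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

p_value = phi(z)                         # lower tail, since Ha: p < 0.10
print(round(z, 3), round(p_value, 4))    # -1.414 0.0786
```

So the evidence that p < 0.10 is suggestive but not strong at the usual
0.05 level.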

I really appreciate your help.

John Lexmark
Struggling Statistics Student.





===
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===



Re: VIF

2000-05-30 Thread Donald F. Burrill

On 31 May 2000, Vmcw wrote:

> >>It is 10. I hope, you are talking about Variance Inflation Factor. 
> >>More than 10 indicates severe multicollinearity.

Thus spake Jin Singh.  And someone else (was it Dave Heiser?) retorted, 
sensibly I thought,

> >And where does this magic number come from? :)

To which Tom in PA replied (possibly tongue-in-cheek?), 

> Neter, Wasserman, Nachtsheim, and Kutner, of course!  (or is it Wasserman,
> Kutner, Neter, and Nachtsheim or one of the other 22 permutations?).

I've heard of a Wasserman (or Wassermann?) test, but didn't think it had 
to do with VIF.  Dunno about all those other blokes.  But apart from 
argument by Appeal to Irrelevant Authority at HeadQuarters, was there 
actually some _reasoning_ underlying the selection of VIF = 10, or was 
it just someone's arbitrary guess (like the 10 subjects per variable one 
is supposed to have before one dares essay a factor analysis)?
-- Don.
 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 603-535-2597
 184 Nashua Road, Bedford, NH 03110  603-471-7128  






Re: VIF

2000-05-30 Thread Vmcw

>>It is 10. I hope, you are talking about Variance Inflation Factor. More than
>>10 indicates severe multicollinearity.
>
>
>And where does this magic number come from? :)
>
>
Neter, Wasserman, Nachtsheim, and Kutner, of course!  (or is it Wasserman,
Kutner, Neter, and Nachtsheim or one of the other 22 permutations?).

Tom in PA





Re: VIF

2000-05-30 Thread T.S. Lim

In article <000701bfca86$f831b9a0$047c6395@sprint>, [EMAIL PROTECTED] 
says...
>
>It is 10. I hope, you are talking about Variance Inflation Factor. More than
>10 indicates severe multicollinearity.


And where does this magic number come from? :)


>Jin
>
>Jineshwar Singh, Coordinator, IDS
>Interdisciplinary Department
>George Brown College
>St. James campus
>[EMAIL PROTECTED]
>*
>You cannot control how others act but you can
>control how you react.
>416 -415-2089
>http://www.gbrownc.on.ca/~jsingh
>
>- Original Message -
>From: Karen Scheltema <[EMAIL PROTECTED]>
>To: <[EMAIL PROTECTED]>
>Sent: Tuesday, May 30, 2000 4:51 PM
>Subject: VIF
>
>
>> What is the usual cutoff for saying the VIF is too high?
>>
>> Karen Scheltema, M.A., M.S.
>> Statistician
>> HealthEast
>> 1700 University Ave W
>> St. Paul, MN 55104
>> (651) 232-5212   fax: (651) 641-0683

-- 
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
__
Get paid to write a review! http://recursive-partitioning.epinions.com






Re: VIF

2000-05-30 Thread Alan Miller

Karen Scheltema wrote in message <[EMAIL PROTECTED]>...
>What is the usual cutoff for saying the VIF is too high?

I don't see that there can be any general criterion for saying that
a VIF is too large.   A large value indicates collinearity between
predictor variables.   In some fields, this cannot be avoided.
I have one data set for which most of the VIFs are in excess
of a million.   The data are from NIR spectroscopy, where this
is unavoidable.

If you do have large VIFs then make sure that your least-squares
software uses some form of orthogonal reduction.   If it uses
the normal equations, and hence squares the condition number,
then you could be in trouble.
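Alan's point about the condition number can be seen numerically. A sketch
with NumPy on made-up nearly-collinear data (for the 2-norm, the condition
number of X'X is exactly the square of the condition number of X):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)      # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

cond_X = np.linalg.cond(X)               # condition number of the design
cond_XtX = np.linalg.cond(X.T @ X)       # normal equations square it
print(cond_X, cond_XtX)                  # cond_XtX is roughly cond_X ** 2
```

An orthogonal reduction (e.g. QR) works with X itself, so it only has to
cope with cond_X, not its square.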

>
>Karen Scheltema, M.A., M.S.
>Statistician
>HealthEast
>1700 University Ave W
>St. Paul, MN 55104
>(651) 232-5212   fax: (651) 641-0683
>

--
Alan Miller, Retired Scientist (Statistician)
CSIRO Mathematical & Information Sciences
Alan.Miller -at- vic.cmis.csiro.au
http://www.ozemail.com.au/~milleraj
http://users.bigpond.net.au/amiller/








RE: VIF

2000-05-30 Thread Donald F. Burrill

On Tue, 30 May 2000, Dale Glaser wrote:

> Karen..off the top of my head, the VIF is the inverse of tolerance, 
> hence, if tolerance = (1 - r^2j), then VIF = 1/(1-r^2j)..

Yes, Dale is correct.

> ... r^2j would be the percentage of variation accounted for by the 
> predictors in predicting the other predictor.. e.g., the linear 
> combination of x1 and x2 in predicting x3;
> anyway, as with any cutoff value there can be an element of 
> arbitrariness, 
Indeed.

> though some have registered concern if VIF > 10.0; my personal opinion 
> is that the aforementioned cutoff value is way too liberal; 

I agree, if one is using the idea of "cutoff";  though possibly I am 
thinking of "conservative" rather than "liberal", since I have seen (and 
dealt with) VIFs exceeding several hundreds.  They don't frighten me 
particularly, partly because by orthogonalizing they can be reduced to 
manageable levels.  Even partly orthogonalizing can reduce VIFs to values 
like 2 or 1.5, at least in some circumstances.

> for VIF to equal 10.0, then 1/(1 - .9) = 10, which entails a multiple R of
> .9486!!!; for me it is a stretch to conceive that collinearity only 
> becomes problematic when R = .9486...I'll be interested to see what 
> others think

Strictly speaking, "multicollinearity" implies R = 1.000, I believe. 
(I don't know why Dale calculates R;  the effective information is that 
[with VIF = 10] R^2 = 0.9, and 10% of the original variance in the 
predictor remains unaccounted for.  As one of our colleagues (Rich 
Ulrich, I think) recently remarked in another context, with R^2 values 
this large one may often usefully consider their complementary values 
(1 - R^2).)
 Most computer regression programs have a control based on tolerance (the
reciprocal of VIF);  I believe Minitab's default tolerance threshold is
around 0.0001 or 0.0002, implying VIFs of 10,000 or 5,000 respectively. 
This of course is not to be taken as an indication of "good practice", 
but of where the systems analysts thought the algorithm was in danger of 
breaking down:  "severe multicollinearity" indeed. 

But a lurking question, as my earlier post may have suggested, is 
whether the multicollinearity apparently present is inherent in the 
nature of the variables, or an artifact of variable construction.
The latter was the case in the problem addressed in the paper on the 
Minitab web site.

Karen's original question was:

> What is the usual cutoff for saying the VIF is too high?

Depends on the purpose for which you think you want a "cutoff", and 
whether you propose to implement it blindly and without further thought, 
or as a (very!) rough guideline regarding where the currents (and perhaps 
the undertow) may be dangerous and REQUIRE further thought;  just for two 
examples.
-- Don.
 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 603-535-2597
 184 Nashua Road, Bedford, NH 03110  603-471-7128  






Re: partial least squares regression

2000-05-30 Thread Donald F. Burrill

On Tue, 30 May 2000, Karen Scheltema wrote:

> Can someone enlighten me about how partial least squares regression 
> works to handle multicollinearity. 

Depends partly on whether you're looking at real, or spurious, 
multicollinearity;  and may depend on where the multicollinearity 
actually arises.  We may need more description of your particular 
problem for a useful conversation.

What I understand by "partial regression" is not really different from 
multiple regression:  it involves "partialling" out the effects of 
various predictors both from other (in general subsequent) predictors
and from the response variable, often either in a sequential manner or 
in a way that implies a sequence by assigning an order of hierarchical 
importance, if you will, to the several predictors.

If the multicollinearity arises from artificial variables computed as 
the product of two or more raw variables, usually in seeking evidence 
for or against the presence of interactions between those raw variables, 
there is an obvious sort of hierarchy: 
 raw variables > 2-way interactions > 3-way interactions > ...
If the multicollinearity is "built in", so to speak, because there really 
exists a (near-)linear combination among the predictors (or even more 
than one such combination), one may need to decide which variable(s) to 
EXclude in order to avoid the multicollinearity:  this is another way of 
saying that one needs to assign a hierarchy of importance to the 
predictors. 
In any event, the main problem with multicollinearity is 
computational:  in finite precision, estimation becomes unreliable as the 
apparent zero-order correlations become less distinguishable from 1.000...
The most effective approach to the problem that I know of is to (begin 
to) orthogonalize at least some of the predictors with respect to some 
or all of the others.  For an example applying this idea in practice 
(where multicollinearity arose from computing raw interaction variables) 
see my paper on the Minitab web site (www.minitab.com  and look for 
Resources, then White Papers).

>  Can SPSS do partial least squares regression? 

Yes, if we're talking on the same wavelength.  Requires computing 
residual variables and adjoining them to the variables in the data set, 
then using them as predictors in place of the products (or raw variables) 
they're residuals from.
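A minimal sketch of that residualizing idea, on made-up data (regress the
product term on an intercept and the raw predictors, and keep the residual
as the interaction variable; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(loc=2.0, size=n)      # non-centered, so the product term
x2 = rng.normal(loc=1.0, size=n)      # is correlated with x1 and x2
x1x2 = x1 * x2                        # raw interaction variable

# Regress the product on an intercept and the raw predictors,
# then keep the residual as the orthogonalized interaction
Z = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(Z, x1x2, rcond=None)
resid = x1x2 - Z @ beta               # orthogonal to 1, x1 and x2

print(resid @ x1, resid @ x2)         # both essentially zero
```

Using resid in place of x1x2 leaves the interaction test intact while
removing the collinearity with the raw predictors.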
-- DFB.
 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 603-535-2597
 184 Nashua Road, Bedford, NH 03110  603-471-7128  






Re: VIF

2000-05-30 Thread Jineshwar Singh

It is 10. I hope, you are talking about Variance Inflation Factor. More than
10 indicates severe multicollinearity.
Jin

Jineshwar Singh, Coordinator, IDS
Interdisciplinary Department
George Brown College
St. James campus
[EMAIL PROTECTED]
*
You cannot control how others act but you can
control how you react.
416 -415-2089
http://www.gbrownc.on.ca/~jsingh

- Original Message -
From: Karen Scheltema <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, May 30, 2000 4:51 PM
Subject: VIF


> What is the usual cutoff for saying the VIF is too high?
>
> Karen Scheltema, M.A., M.S.
> Statistician
> HealthEast
> 1700 University Ave W
> St. Paul, MN 55104
> (651) 232-5212   fax: (651) 641-0683
>






RE: VIF

2000-05-30 Thread Dale Glaser

Karen..off the top of my head, the VIF is the inverse of tolerance, hence,
if tolerance = (1 - r^2j), then VIF=
1/(1-r^2j)..[excuse the sloppiness of the notation, but r^2j would be the
percentage of variation accounted for by the predictors in predicting the
other predictor..ie., the linear combination of x1 and x2 in predicting x3];
anyway, as with any cutoff value there can be an element of arbitrariness,
though some have registered concern if VIF > 10.0; my personal (possibly
misinformed!) opinion is that the aforementioned cutoff value is way too
liberal; for VIF to equal 10.0, then 1/(1 - .9) = 10, which entails a multiple R of
.9486!!!; for me it is a stretch to conceive that collinearity only becomes
problematic when R = .9486...I'll be interested to see what others
think
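Dale's formula is easy to check numerically. A sketch in Python (the
function name is just for illustration):

```python
from math import sqrt

def vif(r2_j):
    """VIF for predictor j, where r2_j is the R^2 from regressing
    predictor j on all the other predictors (tolerance = 1 - r2_j)."""
    return 1.0 / (1.0 - r2_j)

# A VIF of 10 corresponds to R^2_j = 0.9, i.e. a multiple R of about 0.9487
print(round(vif(0.9), 6), round(sqrt(0.9), 4))   # 10.0 0.9487
```

which is exactly the R = .9486... Dale finds hard to accept as the point
where collinearity first becomes problematic.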

Dale Glaser, Ph.D.
Senior Statistician, Pacific Science and Engineering Group
Adjunct faculty/lecturer, SDSU/USD/CSPP
San Diego, CA.



-Original Message-
From:   [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
On Behalf Of Karen Scheltema
Sent:   Tuesday, May 30, 2000 1:52 PM
To: [EMAIL PROTECTED]
Subject:VIF

What is the usual cutoff for saying the VIF is too high?

Karen Scheltema, M.A., M.S.
Statistician
HealthEast
1700 University Ave W
St. Paul, MN 55104
(651) 232-5212   fax: (651) 641-0683











Re: sas vs s-plus for qc (fwd)

2000-05-30 Thread Ken K.

Minitab itself does not appear to be moving in that direction. They are
teaming with Hertzler Systems (www.hertzlersystems.com) and Qualifine to
jointly provide what looks to be a very nice real-time SPC system. Contact
qualifine (www.qualifine.com I think) to find out more. I'm sure Minitab could
also give you some info on that.

"Donald F. Burrill" wrote:

> Sorry, all;  my attempt to mail this to "Ken K." directly failed.
> Presumably he reads the list, since he posted to it.
> -- DFB.
>  
>  Donald F. Burrill [EMAIL PROTECTED]
>  348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
>  MSC #29, Plymouth, NH 03264 603-535-2597
>  184 Nashua Road, Bedford, NH 03110  603-471-7128
>
> -- Forwarded message --
> Date: Wed, 24 May 2000 13:25:43 -0400 (EDT)
> From: Donald F. Burrill <[EMAIL PROTECTED]>
> To: "Ken K." <[EMAIL PROTECTED]>
> Cc: "Donald F. Burrill" <[EMAIL PROTECTED]>
> Subject: Re: sas vs s-plus for qc
>
> On Wed, 24 May 2000, Ken K. wrote in part:
>
> > I should have mentioned that MINITAB does not provide, and does appear
> > to plan to offer, real-time data collection and SPC
>
> Did you mean "does appear", or "does not appear" ?
> -- DFB.
>  
>  Donald F. Burrill [EMAIL PROTECTED]
>  348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
>  MSC #29, Plymouth, NH 03264 603-535-2597
>  184 Nashua Road, Bedford, NH 03110  603-471-7128
>






Re: Statistical Calculator for Palm OS

2000-05-30 Thread Ken K.

If you do some searching at http://www.palmgear.com you should be able to
find some function sets that work with the RPN calculator. There is also a
nice statistical analysis package called Palm Stat. PalmGear also has that -
search for "palm stat".

Eric Turkheimer wrote:

> Is there a good statistical calculator for the Palm OS?  Not just the
> usual mean and SD functions, it would be especially useful if some
> statistical distribution functions were included.  Seems like a natural.
>
> Eric
> [EMAIL PROTECTED]






VIF

2000-05-30 Thread Karen Scheltema

What is the usual cutoff for saying the VIF is too high?

Karen Scheltema, M.A., M.S.
Statistician
HealthEast
1700 University Ave W
St. Paul, MN 55104
(651) 232-5212   fax: (651) 641-0683








partial least squares regression

2000-05-30 Thread Karen Scheltema

Can someone enlighten me about how partial least squares regression works to 
handle multicollinearity.  Can SPSS do partial least squares regression?

Karen Scheltema, M.A., M.S.
Statistician
HealthEast
1700 University Ave W
St. Paul, MN 55104
(651) 232-5212   fax: (651) 641-0683








WHY is heteroscedasticity bad? (fwd)

2000-05-30 Thread Bob Hayden


This may not fully answer all your questions, but the various formulas
for inference for regression have a place where you plug in THE
variance of the points around the regression model, i.e., they treat
this as a constant.  If it is not a constant, the results of the
formulas will not be correct.  Specific ramifications would depend on
what formula you are interested in, the exact manner in which the
variance varies, etc. 
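One way to see the consequence is a small simulation (made-up data, with
the error SD largest at extreme x values, the classic case in which the
conventional constant-variance SE formula comes out too small):

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 100, 2000
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

# Heteroscedastic disturbances: SD grows toward the ends of the x range
sd = 0.1 + 3.0 * np.abs(x - 0.5)

slopes, naive_ses = [], []
for _ in range(reps):
    y = 2.0 + 3.0 * x + rng.normal(scale=sd, size=n)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)          # single pooled variance estimate
    slopes.append(b[1])
    naive_ses.append(np.sqrt(s2 * XtX_inv[1, 1]))

true_sd = np.std(slopes)        # actual sampling variability of the slope
mean_naive_se = np.mean(naive_ses)
print(true_sd, mean_naive_se)   # the naive SE understates the truth here
```

The slope estimate itself stays unbiased; what breaks is the plugged-in
constant variance, so the reported SE no longer matches the estimator's
actual variability.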


- Forwarded message from Markus Quandt -

>From [EMAIL PROTECTED]  Tue May 30 15:54:27 2000
To: [EMAIL PROTECTED]
Date: Tue, 30 May 2000 19:03:49 +0200
From: Markus Quandt <[EMAIL PROTECTED]>
Organization: Universitaet zu Koeln
Subject: WHY is heteroscedasticity bad?

Hello all,

when discussing linear regression assumptions with a colleague, we
noticed that we were unable to explain WHY heteroscedasticity has
the well known ill effects on the estimators' properties. I know
WHAT the consequences are (loss of efficiency, tendency to
underestimate the standard errors) and I also know why these
consequences are undesirable. What I'm lacking is a substantial
understanding of HOW the presence of inhomogeneous error variances
increases the variability of the coefficients, and HOW the
estimation of the standard errors fails to reflect this.
I consulted a number of (obviously too basic) textbooks, all but
one only state the problems that arise from het.sc. The one that
isn't a total blank (Kmenta's Elements of Econometrics, 1986) tries
to give an intuitive explanation (along with a proof of the
inefficiency of the beta estimators with het.sc.), but I don't fully
understand that.
Kmenta writes:
"The standard least squares principle involves minimizing
[equation: sum of squared errors], which means that each squared
disturbance is given equal weight. This is justifiable when each
disturbance comes from the same distribution. Under het.sc.,
however, different disturbances come from different distributions
with different variances. Clearly, those disturbances that come
from distributions with a smaller variance give more precise
information about the regression line than those coming from
distributions with a larger variance. To use sample information
efficiently, one should give more weight to the observations with
less dispersed disturbances than to those with more dispersed
disturbances." p. 272

I see that the conditional distributions of the disturbances
obviously differ if het.sc. is present (well, this is the
definition of het.sc., right?), and that, IF I want to compensate
for this, I can weight the data accordingly (Kmenta goes on to
explain WLS estimation). But firstly, I still don't see why
standard errors increased in the first place... And secondly, is it
really legitimate to claim that OLS is 'wrong', if it treats
differing conditional disturbances with equal weight?

Assume the simple case of increasing variances of Y with increasing
values of X, and therefore het.sc. present. With differing
precision of prediction for different X values, the standard error
(SE) of the regression coefficient (b) should become conditional on
the value of X, the higher X, the higher SE, with E(b) constant
over all values of X - correct? Then, isn't the standard error as
estimated by OLS implicitly an _average_ over all these conditional
SEs (just following intuition here)? How can we claim that the
specific SE at the X value with the lowest disturbance is the
'true' one? (Exception: het.sc. is due to uneven measurement error
for Y - I can see that the respective data points are less
reliable.)

Regarding the first question: Can this be answered at all without
the formal proof?

Thanks for your patience, MQ

--

 Markus Quandt





- End of forwarded message from Markus Quandt -

-- 
 

  _
 | |  Robert W. Hayden
 | |  Department of Mathematics
/  |  Plymouth State College MSC#29
   |   |  Plymouth, New Hampshire 03264  USA
   | * |  82 River Street
  /|  Ashland, NH 03217-9702
 | )  (603) 968-9914 (home)
 L_/  fax (

WHY is heteroscedasticity bad?

2000-05-30 Thread Markus Quandt

Hello all,

when discussing linear regression assumptions with a colleague, we
noticed that we were unable to explain WHY heteroscedasticity has
the well known ill effects on the estimators' properties. I know
WHAT the consequences are (loss of efficiency, tendency to
underestimate the standard errors) and I also know why these
consequences are undesirable. What I'm lacking is a substantial
understanding of HOW the presence of inhomogeneous error variances
increases the variability of the coefficients, and HOW the
estimation of the standard errors fails to reflect this.
I consulted a number of (obviously too basic) textbooks, all but
one only state the problems that arise from het.sc. The one that
isn't a total blank (Kmenta's Elements of Econometrics, 1986) tries
to give an intuitive explanation (along with a proof of the
inefficiency of the beta estimators with het.sc.), but I don't fully
understand that.
Kmenta writes:
"The standard least squares principle involves minimizing
[equation: sum of squared errors], which means that each squared
disturbance is given equal weight. This is justifiable when each
disturbance comes from the same distribution. Under het.sc.,
however, different disturbances come from different distributions
with different variances. Clearly, those disturbances that come
from distributions with a smaller variance give more precise
information about the regression line than those coming from
distributions with a larger variance. To use sample information
efficiently, one should give more weight to the observations with
less dispersed disturbances than to those with more dispersed
disturbances." p. 272

I see that the conditional distributions of the disturbances
obviously differ if het.sc. is present (well, this is the
definition of het.sc., right?), and that, IF I want to compensate
for this, I can weight the data accordingly (Kmenta goes on to
explain WLS estimation). But firstly, I still don't see why
standard errors increased in the first place... And secondly, is it
really legitimate to claim that OLS is 'wrong', if it treats
differing conditional disturbances with equal weight?

Assume the simple case of increasing variances of Y with increasing
values of X, and therefore het.sc. present. With differing
precision of prediction for different X values, the standard error
(SE) of the regression coefficient (b) should become conditional on
the value of X, the higher X, the higher SE, with E(b) constant
over all values of X - correct? Then, isn't the standard error as
estimated by OLS implicitly an _average_ over all these conditional
SEs (just following intuition here)? How can we claim that the
specific SE at the X value with the lowest disturbance is the
'true' one? (Exception: het.sc. is due to uneven measurement error
for Y - I can see that the respective data points are less
reliable.)

Regarding the first question: Can this be answered at all without
the formal proof?

Thanks for your patience, MQ

--

 Markus Quandt







Re: question about minitab

2000-05-30 Thread Donald F. Burrill

On Tue, 30 May 2000, Niklas Hansen wrote:

> I'm a statistics student in sweden who needs some help.
> I'm running a best subsets (regression) and I get the following
> output:
<  output deleted  >

> What I would like to know is what does the statistic C-p mean?
> I would be very happy if someone could explain it to me...

C-p  is usually written as  C  with subscript  p .  I don't recall who 
invented it, but I remember encountering it in statistical journals 
several decades ago.  Minitab's Reference Manual (1989, for Release 7;  
there are probably more modern references) has this to say:

[begin Minitab quote]

The  C-p  statistic is given by the formula

C-p  =  (SSEp/MSEm) - (n - 2p)

where SSEp is SSE [Sum of squares due to error] for the best model with 
p  parameters (including the intercept, if it is in the equation), and 
MSEm is the mean square error for the model with all  m  predictors.

In general, we look for models where  C-p  is small and is also close to 
p.  If the model is adequate (i.e., fits the data well), then the 
expected value of  C-p  is approximately equal to  p,  the number of 
parameters in the model.  A small value of  C-p  indicates that the 
model is relatively precise (has small variance) in estimating the true 
regression coefficients and predicting future responses.  This precision 
will not improve much by adding more predictors.  Models with 
considerable lack of fit have values of  C-p  larger than  p.  
 See [9] for more on  C-p.

[end of Minitab quote]

The reference [9] cited is 
 R.R. Hocking (1976).  "A Biometrics Invited Paper:  The Analysis and 
 Selection of Variables in Linear Regression," Biometrics 32, pp. 1-49.
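The Minitab formula above is easy to compute directly. A minimal sketch (the numbers in the example call are hypothetical, not taken from Niklas's output):

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' C-p as given in the Minitab manual:
    (SSE_p / MSE_full) - (n - 2p),
    where p counts all parameters in the submodel, including the intercept."""
    return sse_p / mse_full - (n - 2 * p)

# Hypothetical example: n = 50 observations, a submodel with p = 5
# parameters. An adequate submodel should give C-p close to p.
print(mallows_cp(sse_p=92.0, mse_full=2.0, n=50, p=5))  # -> 6.0
```

Here C-p = 6.0 is close to p = 5, so by the rule of thumb quoted above this submodel would show little lack of fit.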

 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 603-535-2597
 184 Nashua Road, Bedford, NH 03110  603-471-7128  





RE: Ordinal log-linear model

2000-05-30 Thread Dale Glaser

I just downloaded LEM and the manual as well. The manual, which as you said
is in PostScript format, is readable by various viewers; the one I have is
Ghostscript. Do a search for Ghostscript and you can download it for
free... In the old days it was kind of a pain to read PS files, as you had
to save font files in separate subdirectories, but now it's a lot
easier... dale glaser

-Original Message-
From:   [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
On Behalf Of Buoy
Sent:   Sunday, May 28, 2000 6:35 AM
To: [EMAIL PROTECTED]
Subject:Re: Ordinal log-linear model

It's me again.

Following your suggestions I downloaded LEM. The program is working (I
examined the featured examples, which I downloaded too).
I also downloaded the zipped manual for LEM. The filename was MANUAL.PS,
and README.TXT said that this is a "postscript" format (???) which can
be printed on a "postscript" printer or viewed with a "postscript" viewer.

What is a POSTSCRIPT viewer / printer?
Or simply: how can I view or print the LEM manual?

Desperate
Michal Bojanowski










question about minitab

2000-05-30 Thread Niklas Hansen

I'm a statistics student in Sweden who needs some help.
I'm running a best subsets regression and I get the following
output:

Response is skad_tot

[The vertically printed predictor labels and the per-model X flags
were garbled in the archive; only the numeric columns survive.]

Vars   R-sq   Adj R-sq    C-p        S
  1    83.8     82.8      42.6    179.36
  1    81.9     80.8      49.3    189.70
  2    89.8     88.4      23.9    147.55
  2    89.6     88.2      24.6    148.95
  3    94.2     92.9      10.4    115.14
  3    92.5     90.9      16.3    130.75
  4    95.3     93.8       8.5    107.63
  4    95.2     93.7       8.9    108.65
  5    96.9     95.6       4.9     90.937
  5    96.6     95.2       5.9     94.921
  6    97.4     95.9       5.2     87.205
  6    97.2     95.6       6.0     90.823
  7    97.4     95.6       7.1     90.850
  7    97.4     95.6       7.1     91.162
  8    97.4     95.1       9.0     95.406
What I would like to know is: what does the statistic C-p mean?
I would be very happy if someone could explain it to me...

Regards
Niklas Hansen
Mälardalen University
Västerås, Sweden






dispersal distance

2000-05-30 Thread osama hussien

Hi all,
I am looking for a reference on the statistical analysis (discrete
probability distributions and survival functions) of random dispersal
distances (animal movement).
Thank you.






Statistical process control

2000-05-30 Thread Franck Golliot

Hi,

I'm looking for books or references on statistical process control.
Could anyone give me such references?

thank you,

--
Franck GOLLIOT



