[EMAIL PROTECTED] (Kload) wrote in message news:<[EMAIL PROTECTED]>...
> Hi all,
> 
> I've been trying to understand the breakdown point of M-Estimators in
> linear regression however there are a couple of issues that continue
> to confuse me.  I would be grateful if any of you could clarify these
> issues or point me to a reference that may help me.
> 
> Let's begin with the quadratic estimator - it has a breakdown point of
> zero because a single outlier (erroneous response/output variable) or
> leverage point (erroneous explanatory/input/factor space variable) can
> skew the estimate by an arbitrary amount.  The root cause of this skew
> is the large residual caused by the outlier/leverage point.
> 
> Now, redescending M-Estimators are designed to limit the effect of
> large residuals.  If a residual is over a certain magnitude, it is
> down-weighted.  Because the residuals of both outliers and leverage
> points are down-weighted in the same way, you would think that
> M-Estimators would be resistant to both types of error.  But the
> literature says that this isn't the case.  M-Estimators are said to
> have a breakdown point of zero because a single outlier can cause an
> arbitrarily large skew in the estimate.  But why?  Any residual over
> the cutoff point will have a small effect on the summation.  What is
> the breakdown point of an M-Estimator if only outliers are considered?
> Can 50% contamination be tolerated?
> 
> Why are leverage points considered to have a stronger skewing effect
> than outliers?  Is it because in a linear model (e.g. y=mx+c), the
> explanatory variable x (i.e. the leverage point) is multiplied by the
> parameter m?  If this is the case, then I return to my previous
> argument: the leverage point may have a larger residual due to the
> multiplication, but the redescending M-Estimator should limit the
> effect of this residual.

What makes a leverage point a leverage point is that it has high leverage.

Leverage is the ability of the point to "pull the line" toward itself.

A sufficiently high leverage point may therefore have a *small* residual.
An M-estimator that has nice properties in location estimation will not
solve the problem if the leverage is high enough that the point's
residual from the fitted line is already small.

Consider the data:

    x    y
    1  112
    2  111
    3  113
    4  125
    5  124
    6  135
  105    1

The point with x at 105 has very high leverage.

The residual for the high-leverage point from a linear regression through
these points is approximately -1, while the smallest residual (ignoring
sign) among the other points is about 6. So the high-leverage point will
not get down-weighted - it has the residual closest to zero.
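
If you want to check the arithmetic, here is a quick sketch in Python/numpy
(my own illustration, nothing that was in the original post). It fits
ordinary least squares to the seven points, prints the residuals, and then
runs iteratively reweighted least squares with Tukey's bisquare (the usual
4.685 tuning constant on a MAD scale estimate) to show that the x = 105
point keeps a weight of essentially 1:

import numpy as np

# The seven points from the example above
x = np.array([1, 2, 3, 4, 5, 6, 105], dtype=float)
y = np.array([112, 111, 113, 125, 124, 135, 1], dtype=float)

# Ordinary least-squares fit of y = m*x + b
m, b = np.polyfit(x, y, 1)
r = y - (m * x + b)
print("OLS residuals:", np.round(r, 2))
# x = 105 has a residual near -1; every other residual is roughly 6 or
# more in absolute value.

def bisquare_weights(r, c):
    # Tukey's bisquare: weight is near 1 for small |r| and 0 for |r| > c
    u = np.clip(np.abs(r) / c, 0.0, 1.0)
    return (1.0 - u**2) ** 2

# Iteratively reweighted least squares starting from the OLS fit
for _ in range(50):
    r = y - (m * x + b)
    scale = np.median(np.abs(r)) / 0.6745      # MAD scale estimate
    w = bisquare_weights(r, 4.685 * scale)     # usual tuning constant
    m, b = np.polyfit(x, y, 1, w=np.sqrt(w))   # weighted least squares

print("bisquare weights:", np.round(w, 3))
print(f"robust slope = {m:.3f}, intercept = {b:.3f}")
# The x = 105 point ends with a weight close to 1, so the "robust" fit is
# still pulled toward it, much like the plain least-squares line.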

Some people look at deleted residuals instead, but this too has problems
when there are two such high-leverage points instead of one (an effect
called "masking").
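
For what it's worth, a deleted (leave-one-out) residual does flag this
particular point, because refitting without it gives a very different
line. Here is a second small sketch (again my own illustration, not
something from the thread):

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 105], dtype=float)
y = np.array([112, 111, 113, 125, 124, 135, 1], dtype=float)

# Fit on the six "good" points only, then predict at x = 105
m6, b6 = np.polyfit(x[:-1], y[:-1], 1)
pred = m6 * 105 + b6
print(f"prediction at x = 105 from the other six points: {pred:.0f}")
print(f"deleted residual for the last point: {y[-1] - pred:.0f}")
# The prediction is around 600, so the deleted residual is roughly -600,
# which is easy to spot.  With two such high-leverage points, though,
# deleting one still leaves the other to pull the line toward them, and
# neither deleted residual looks unusual; that is the masking problem.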

Glen