>===== Original Message From Michael Joner <[EMAIL PROTECTED]> =====
>Does it make a big difference if I use
>an MM regression, or LTS, or LMS?
Good question. I answered your first post from a basic, introductory level. I was trying to convey the idea of robust regression. I used LMS as my example of a robust estimator for two reasons: (1) it is reasonably easy to understand, and (2) I had a ready-made example in Excel which I wanted to use as evidence that Excel is not completely worthless for teaching stats.

I felt I was on firm ground, but you are now moving into deeper intellectual waters and I am now treading just like you. I will give you my opinion, based on what I know right now, but I am not nearly as sure of myself as I was before. In my attempt to answer you, I ran across the work of Doug Martin and Andreas Ruckstuhl. I am cc'ing them on this post in the hope that they can correct any mistakes here and explain, in clear language, what MM in S Plus is doing.

First, I think it's pretty clear that LMS is dominated by LTS or MM because of the large SE of the LMS estimator. I found an excellent post to the S Plus list from Doug Martin:

http://www.math.yorku.ca/Who/Faculty/Monette/S-news/0032.html

I recommend that you read this carefully. He makes it clear that LTS and MM are attempts to improve the efficiency of the robust estimator without compromising its robustness to outliers.

As for which form of robust regression to run, I do not believe there is a clear answer. You can see intuitively that this is going to be an exercise in trading off efficiency for robustness, and an optimal estimator is going to be a function of the data or the particular problem at hand.

I am not an S Plus user, but it looks like S Plus is going to give you LTS and MM pretty easily. The S Plus 2000 Release Notes, which can be found many places on the web, e.g.,

http://www.uni-koeln.de/themen/Statistik/s/v51/readme_win.txt

say the following:

  Robust LTS regression (ltsreg)
  By default, ltsreg now uses 10% trimming. Previously it used 50% trimming.
  This change was made in response to user feedback that the default
  trimming of 50% was too extreme in most cases.

  Robust MM regression (lmRobMM)
  The Robust MM Regression dialog now has a default Resampling Method of
  "Auto", which uses the sample size and number of variables to determine
  which resampling method to use. The command line function lmRobMM() is
  unchanged.

I couldn't find a clear explanation of what exactly MM is doing. I fear you're going to have to read the paper that started this:

Yohai, V., Stahel, W. A., and Zamar, R. H. (1991). A procedure for robust estimation and inference in linear regression. In Stahel, W. A. and Weisberg, S. W., Eds., Directions in Robust Statistics and Diagnostics, Part II. Springer-Verlag.

It looks like this might also be a good source:

Marazzi, A. (1993). Algorithms, Routines, and S Functions for Robust Statistics. Wadsworth & Brooks/Cole, Pacific Grove, CA.

After you figure out exactly what MM and LTS are doing, I would suggest trying all of them: LS, LMS, LTS, and MM. Robust regression estimates are the result of complicated (read "lots of room for mistakes") algorithms. You need to be wary. I would also recommend that you think carefully about the process that generated the data. Why are you worried about outliers?

I am sorry that this is not a clean, clear answer. Perhaps others can offer better, more grounded advice.

Burble burble . . . :-))

Humberto Barreto
[EMAIL PROTECTED]
(765) 361-6315

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
http://jse.stat.ncsu.edu/
=================================================================
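P.S. To make the LMS and LTS objectives concrete, here is a minimal sketch in Python. This is NOT what S Plus actually runs (real implementations use clever resampling algorithms); it just fits a line by brute force over all pairs of points and shows how the two criteria differ: LMS minimizes the median squared residual, LTS the sum of the smallest squared residuals after trimming a fraction of them (the 10% default mentioned in the release notes). The data and function names are my own illustrative choices.

```python
import itertools

def fit_line(p1, p2):
    # Slope and intercept of the line through two data points.
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def lms_objective(params, xs, ys):
    # Least Median of Squares: the MEDIAN squared residual.
    slope, intercept = params
    r2 = sorted((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return r2[len(r2) // 2]

def lts_objective(params, xs, ys, trim=0.10):
    # Least Trimmed Squares: the sum of the h smallest squared residuals,
    # where trim is the fraction of points ignored (10% here, matching the
    # new S Plus 2000 default; the old default trimmed 50%).
    slope, intercept = params
    r2 = sorted((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    h = len(r2) - int(trim * len(r2))
    return sum(r2[:h])

def elemental_search(xs, ys, objective):
    # Brute force: try the line through every pair of points (an
    # "elemental set") and keep the one minimizing the objective.
    best, best_val = None, float("inf")
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line; skip
        params = fit_line((xs[i], ys[i]), (xs[j], ys[j]))
        val = objective(params, xs, ys)
        if val < best_val:
            best, best_val = params, val
    return best

# y = 2x + 1 with two gross outliers planted at x = 3 and x = 7.
xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
ys[3], ys[7] = 40.0, -25.0

lms_fit = elemental_search(xs, ys, lms_objective)
lts_fit = elemental_search(xs, ys, lts_objective)
print("LMS fit:", lms_fit)  # both recover slope 2, intercept 1
print("LTS fit:", lts_fit)
```

Notice that trim is the knob the release notes are talking about: with trim=0.50 you ignore half the data (maximal breakdown, poor efficiency), with trim=0.10 you keep 90% of it.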
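P.P.S. My rough understanding of the MM idea (from skimming; Doug or Andreas, please correct me if this is wrong) is: start from a high-breakdown but inefficient estimate, such as LMS or LTS, together with a robust residual scale, then refine the coefficients with an M-step that downweights large residuals using a bounded weight function such as Tukey's bisquare. Here is a sketch of that refinement step in Python, with made-up data and a pretend first-stage fit and scale. This is not the lmRobMM() algorithm, just the flavor.

```python
def bisquare_weight(u, c=4.685):
    # Tukey bisquare weight: smoothly downweights moderate residuals and
    # gives ZERO weight beyond c robust-scale units, which is what bounds
    # the influence of gross outliers.
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0

def m_step(xs, ys, slope, intercept, scale, iters=50):
    # Iteratively reweighted least squares (the "M" refinement):
    # repeat weighted least squares with weights recomputed from the
    # scaled residuals. The scale stays FIXED at its first-stage value,
    # which is how MM-estimation keeps the high breakdown point while
    # improving efficiency.
    for _ in range(iters):
        w = [bisquare_weight((y - (intercept + slope * x)) / scale)
             for x, y in zip(xs, ys)]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        den = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        slope = sum(wi * (x - mx) * (y - my)
                    for wi, x, y in zip(w, xs, ys)) / den
        intercept = my - slope * mx
    return slope, intercept

# y = 2x + 1 plus small noise, with one gross outlier at x = 2.
xs = list(range(10))
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.1, -0.15, 0.05]
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]
ys[2] = 30.0

# Pretend a first-stage robust fit gave us (2, 1) with robust scale 0.2.
slope, intercept = m_step(xs, ys, 2.0, 1.0, scale=0.2)
print(slope, intercept)
```

The outlier's scaled residual is enormous, so its bisquare weight is exactly zero and the refined fit is essentially least squares on the clean points, which is where the efficiency gain over LMS comes from.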