>===== Original Message From Michael Joner <[EMAIL PROTECTED]> =====
>Does it make a big difference if I use
>an MM regression, or LTS, or LMS?
Good question. I answered your first post from a basic, introductory level. I was trying to convey the idea of robust regression. I used LMS as my example of a robust estimator for two reasons: (1) it is reasonably easy to understand, and (2) I had a ready-made example in Excel which I wanted to use as evidence that Excel is not completely worthless for teaching stats.

I felt I was on firm ground, but you are now moving into deeper intellectual waters and I am now treading just like you. I will give you my opinion, based on what I know right now, but I am not nearly as sure of myself as I was before. In my attempt to answer you, I ran across the work of Doug Martin and Andreas Ruckstuhl. I am cc'ing them on this post in the hope that they can correct any mistakes here and explain, in clear language, what MM in S Plus is doing.

First, I think it's pretty clear that LMS is dominated by LTS or MM because of the large SE of the LMS estimator. I found an excellent post to the S Plus list from Doug Martin:

http://www.math.yorku.ca/Who/Faculty/Monette/S-news/0032.html

I recommend that you read this carefully. He makes it clear that LTS and MM are attempts to improve the efficiency of the robust estimator without compromising its robustness to outliers.

As for which form of robust regression to run, I do not believe there is a clear answer. You can see intuitively that this is going to be an exercise in trading off efficiency for robustness, and an optimal estimator is going to be a function of the data or the particular problem at hand.

I am not an S Plus user, but it looks like S Plus is going to give you LTS and MM pretty easily. The S Plus 2000 Release Notes, which can be found many places on the web, e.g.,

http://www.uni-koeln.de/themen/Statistik/s/v51/readme_win.txt

say the following:

  Robust LTS regression (ltsreg)
  By default, ltsreg now uses 10% trimming. Previously it used 50% trimming.
  This change was made in response to user feedback that the default
  trimming of 50% was too extreme in most cases.

  Robust MM regression (lmRobMM)
  The Robust MM Regression dialog now has a default Resampling Method of
  "Auto", which uses the sample size and number of variables to determine
  which resampling method to use. The command line function lmRobMM() is
  unchanged.

I couldn't find a clear explanation of what exactly MM is doing. I fear you're going to have to read the paper that started this:

Yohai, V., Stahel, W. A., and Zamar, R. H. (1991). A procedure for robust estimation and inference in linear regression. In Stahel, W. A. and Weisberg, S. W., Eds., Directions in Robust Statistics and Diagnostics, Part II. Springer-Verlag.

It looks like this might also be a good source:

Marazzi, A. (1993). Algorithms, Routines, and S Functions for Robust Statistics. Wadsworth & Brooks/Cole, Pacific Grove, CA.

After you figure out exactly what MM and LTS are doing, I would suggest trying all of them: LS, LMS, LTS, and MM. Robust regression estimates are the result of complicated (read "lots of room for mistakes") algorithms. You need to be wary. I would also recommend that you think carefully about the process that generated the data. Why are you worried about outliers?

I am sorry that this is not a clean, clear answer. Perhaps others can offer better, more grounded advice.

Burble burble . . . :-))

Humberto Barreto
[EMAIL PROTECTED]
(765) 361-6315

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
http://jse.stat.ncsu.edu/
=================================================================
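P.S. To make the LMS and LTS objectives concrete, here is a minimal sketch in Python. This is NOT what S Plus actually runs (real implementations use clever resampling algorithms); it just fits a line by brute force over all pairs of points and shows how the two criteria differ: LMS minimizes the median squared residual, LTS the sum of the smallest squared residuals after trimming a fraction of them (the 10% default mentioned in the release notes). The data and function names are my own illustrative choices.

```python
import itertools

def fit_line(p1, p2):
    # Slope and intercept of the line through two data points.
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def lms_objective(params, xs, ys):
    # Least Median of Squares: the MEDIAN squared residual.
    slope, intercept = params
    r2 = sorted((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return r2[len(r2) // 2]

def lts_objective(params, xs, ys, trim=0.10):
    # Least Trimmed Squares: the sum of the h smallest squared residuals,
    # where trim is the fraction of points ignored (10% here, matching the
    # new S Plus 2000 default; the old default trimmed 50%).
    slope, intercept = params
    r2 = sorted((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    h = len(r2) - int(trim * len(r2))
    return sum(r2[:h])

def elemental_search(xs, ys, objective):
    # Brute force: try the line through every pair of points (an
    # "elemental set") and keep the one minimizing the objective.
    best, best_val = None, float("inf")
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line; skip
        params = fit_line((xs[i], ys[i]), (xs[j], ys[j]))
        val = objective(params, xs, ys)
        if val < best_val:
            best, best_val = params, val
    return best

# y = 2x + 1 with two gross outliers planted at x = 3 and x = 7.
xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
ys[3], ys[7] = 40.0, -25.0

lms_fit = elemental_search(xs, ys, lms_objective)
lts_fit = elemental_search(xs, ys, lts_objective)
print("LMS fit:", lms_fit)  # both recover slope 2, intercept 1
print("LTS fit:", lts_fit)
```

Notice that trim is the knob the release notes are talking about: with trim=0.50 you ignore half the data (maximal breakdown, poor efficiency), with trim=0.10 you keep 90% of it.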
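P.P.S. My rough understanding of the MM idea (from skimming; Doug or Andreas, please correct me if this is wrong) is: start from a high-breakdown but inefficient estimate, such as LMS or LTS, together with a robust residual scale, then refine the coefficients with an M-step that downweights large residuals using a bounded weight function such as Tukey's bisquare. Here is a sketch of that refinement step in Python, with made-up data and a pretend first-stage fit and scale. This is not the lmRobMM() algorithm, just the flavor.

```python
def bisquare_weight(u, c=4.685):
    # Tukey bisquare weight: smoothly downweights moderate residuals and
    # gives ZERO weight beyond c robust-scale units, which is what bounds
    # the influence of gross outliers.
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0

def m_step(xs, ys, slope, intercept, scale, iters=50):
    # Iteratively reweighted least squares (the "M" refinement):
    # repeat weighted least squares with weights recomputed from the
    # scaled residuals. The scale stays FIXED at its first-stage value,
    # which is how MM-estimation keeps the high breakdown point while
    # improving efficiency.
    for _ in range(iters):
        w = [bisquare_weight((y - (intercept + slope * x)) / scale)
             for x, y in zip(xs, ys)]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        den = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        slope = sum(wi * (x - mx) * (y - my)
                    for wi, x, y in zip(w, xs, ys)) / den
        intercept = my - slope * mx
    return slope, intercept

# y = 2x + 1 plus small noise, with one gross outlier at x = 2.
xs = list(range(10))
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.1, -0.15, 0.05]
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]
ys[2] = 30.0

# Pretend a first-stage robust fit gave us (2, 1) with robust scale 0.2.
slope, intercept = m_step(xs, ys, 2.0, 1.0, scale=0.2)
print(slope, intercept)
```

The outlier's scaled residual is enormous, so its bisquare weight is exactly zero and the refined fit is essentially least squares on the clean points, which is where the efficiency gain over LMS comes from.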