That's true in all science but it puts finance on quite shaky ground because there are people trading serious money based on the idea that what they are doing is valid and working correctly. This is obviously kind of relevant to what's going on right now. Thanks for your references also.
mark

On Thu, Sep 18, 2008 at 1:29 PM, Matias Salibian-Barrera wrote:
Haven't read "Fooled by Randomness", but did start reading Black Swan, and although in general I like provocative books that challenge my points of view, I found his main thesis to be too short to warrant so many words... I took it that his main argument was with those who misinterpret and misuse statistics (particularly when they do it for their own benefit), not with statistics itself, which is always based on assumptions etc.

[snip]

In fact, he would say that a model works until it doesn't.

Which is a fair statement, that also applies to science in general, "theories work until they are proved wrong", and the whole "falsifiability" argument (cf. Popper vs. Kuhn vs. Feyerabend vs...). I believe robust statistics can help you determine when your model (theory) has stopped working.

In any case, with respect to the old "data cleaning versus robust estimators" discussion, I would point the interested reader to the first chapter of Maronna, Martin and Yohai's book (http://books.google.com/books?id=YD--AAAACAAJ&dq=martin+maronna+yohai), and for some more specific inference implications, to the first chapter of my PhD dissertation. Essentially, a couple of main issues are: (a) detecting outliers using non-robust estimators does not work well in general (but even if / when it does, see my next point); (b) if you remove (or alter) observations, all subsequent probabilistic statements (p-values, standard errors, etc) are conditional on the very non-linear cleaning operation you did, and thus both wrong at face value, and not easy to correct. Robust estimators incorporate the down-weighting and its effect on the corresponding inference at once, and are thus, IMHO, to be preferred.

Matias

[EMAIL PROTECTED] wrote:

Hi: I don't know if you read "Fooled by Randomness" by Nassim Taleb (spelling?), but he essentially says, using very non-statistical but nevertheless strong arguments (it's not a stat or a quant finance book), that outliers in finance are not modellable, and that you shouldn't claim you can model them because you'd be lying. In fact, he would say that a model works until it doesn't.

Anyway, it's an interesting book that sort of indirectly talks about your comment below (for a little too long, actually: you can get what he's saying in the first 50 pages and it's about 200 pages), so I figured I would just mention it in case you were interested.

On Thu, Sep 18, 2008 at 11:36 AM, Ajay Shah wrote:

In continuation of the discussion on `Winsorisation' that has taken place on r-sig-finance today, I thought I'd present all of you with an interesting dataset and a question.

This data is the daily stock returns of the large Indian software firm `Infosys'. (This is the symbol `INFY' on NASDAQ). It is a large number of observations of daily returns (i.e. percentage changes of the adjusted stock price). Load the data in --

  print(load(url("http://www.mayin.org/ajayshah/tmp/infosys_mm.rda")))
  str(x)
  summary(x)
  sapply(x, sd)   # originally sd(x); sd() on a data frame is defunct in current R

The name `rj' is used for returns on Infosys, and `rM' is used for returns on the stock market index (Nifty). There are three really weird observations in this.

  weird.rj <- c(1896,2395)
  weird.rM <- 2672
  x[weird.rj,]
  x[weird.rM,]

As you can see, these observations are quite remarkable given the small standard deviations that we saw above. There is absolutely no measurement error here. These things actually happened.

Now consider a typical application: using this to estimate a market model. The goal here is to estimate the coefficient of a regression of rj on rM.

  # A regression with all obs
  summary(lm(rj ~ rM, data=x))

  # Drop the weird rj --
  summary(lm(rj ~ rM, data=x[-weird.rj,]))

  # Drop the weird rM --
  summary(lm(rj ~ rM, data=x[-weird.rM,]))

  # Drop both kinds of weird observations --
  summary(lm(rj ~ rM, data=x[-c(weird.rM,weird.rj),]))

  # Robust regressions
  library(MASS)
  summary(rlm(rj ~ rM, data=x))
  summary(rlm(rj ~ rM, method="MM", data=x))

  library(robust)
  summary(lmRob(rj ~ rM, data=x))

  library(quantreg)
  summary(rq(rj ~ rM, tau=0.5, data=x))

So you see, we have a variety of different estimates for the slope (which is termed `beta' in finance). What value would you trust the most?

And, would winsorisation using either my code (https://stat.ethz.ch/pipermail/r-sig-finance/2008q3/002921.html) or Patrick Burns' code (https://stat.ethz.ch/pipermail/r-sig-finance/2008q3/002923.html) be a good idea here?

I'm instinctively unhappy with any scheme based on discarding observations that I'm absolutely sure have no measurement error. We have to model the weirdness of this data generating process, not ignore it.

--
Ajay Shah                    http://www.mayin.org/ajayshah
[EMAIL PROTECTED]            http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.

_______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only.
-- If you want to post, subscribe first.

_______________________________________________
R-SIG-Robust@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-robust

--
_____________________________________________________
Matias Salibian-Barrera - Department of Statistics
The University of British Columbia
Phone: (604) 822-3410 - Fax: (604) 822-6960
"The plural of anecdote is not data" (George Stigler?)
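
For anyone who cannot fetch the Infosys file, the comparison Ajay asks about (plain OLS versus winsorising versus a robust fit) can be tried on simulated data. What follows is only a rough sketch under stated assumptions: the returns are simulated with a known slope of 1.2 and three planted extreme rows, they are not the Infosys data, and the `winsorise` helper with its 1%/99% cut-offs is a hypothetical stand-in, not the code from either of the linked posts.

```r
set.seed(1)

# Simulated market-model data (NOT the Infosys returns); the true slope is 1.2
n  <- 3000
rM <- rnorm(n, mean = 0, sd = 1.5)
rj <- 1.2 * rM + rnorm(n, mean = 0, sd = 2)

# Plant three extreme rows at the same positions the thread calls weird
rj[c(1896, 2395)] <- c(-25, 30)
rM[2672] <- -12
x <- data.frame(rj = rj, rM = rM)

# Hypothetical helper: clamp values lying outside the given quantiles
winsorise <- function(v, probs = c(0.01, 0.99)) {
  q <- quantile(v, probs = probs, names = FALSE)
  pmin(pmax(v, q[1]), q[2])
}

# Winsorise each series separately (one of several possible conventions)
xw <- data.frame(rj = winsorise(x$rj), rM = winsorise(x$rM))

# Slope estimates: OLS on raw data, OLS on winsorised data, MM-estimator
beta_ols  <- coef(lm(rj ~ rM, data = x))[["rM"]]
beta_wins <- coef(lm(rj ~ rM, data = xw))[["rM"]]

library(MASS)  # recommended package, normally installed with R
beta_mm <- coef(rlm(rj ~ rM, data = x, method = "MM"))[["rM"]]

print(c(ols = beta_ols, winsorised = beta_wins, mm = beta_mm))
```

Re-running with different `probs` gives a feel for how much the winsorised slope depends on the chosen cut-off, whereas the MM fit has no such cut-off to choose by hand. Note that the standard errors reported by `summary(lm(...))` on the winsorised data ignore the winsorising step itself, which is Matias's point (b).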