sci.stat.edu people: There have been other replies to the original post, in sci.stat.math.
On 21 Apr 2004 09:10:14 -0500, [EMAIL PROTECTED] (Herman Rubin) wrote:

> In article <[EMAIL PROTECTED]>, S Fan <[EMAIL PROTECTED]> wrote:
> >If you plot the residuals and the residuals seem to be getting
> >bigger (or smaller), then you may need a transformation. When doing
> >regression, one assumption is that the data follow a constant
> >(though unknown) sigma. Hope it helps.
> >S Fan
>
> This is NOT the most important assumption; one can modify the
> regression approach to take it into account. The MOST important
> assumption is that the relationship is a linear relationship, with
> the "errors" independent of (or at least uncorrelated with) the
> predictors. Non-trivial transformations are extremely unlikely to
> preserve this property.

Herman is accustomed to data that *have* these properties of linearity
and independent errors at the start. He is also facile with non-linear
analyses where he knows how to accommodate the error structure
directly -- something not always easy to do, and sometimes easier to
do than to *explain* to an audience that is not sophisticated with
numbers.

My experience is different from his. In clinical research, bioassays
(for one instance) have unit 'concentrations', but the proper unit of
measurement, with the properties he mentions, is apt to be the log().
The proper unit of the growth curve is apt to be the logit. Bioassay
is an area with a long and healthy tradition of transformations; check
any textbook.

Tukey provided a rule of thumb for data with a natural zero: IF the
largest value is 10 or 20 times the smallest, THEN you probably want
to transform. Tukey also provided other guidelines, talking about
'folded' transformations such as the logit, and about the family of
power transformations. (Small worked sketches of several of these
re-expressions are in the P.S. below.)

Some people are fond of the rank transformation: that is, in my
opinion, the useful way of referring to a large fraction of the
'non-parametric' alternatives, which I avoid when I can.

Finally, some people like arbitrary transformations, including adding
arbitrary constants before taking the log or power. What I am thinking
of are the ones whose single virtue is giving residuals that are
apparently normal, for the data on hand. That is done in order to
improve (or justify) the use of the F-test: the proper p-level is not
achieved if you do not meet the assumption about the residuals, so
this DOES buy you that much. I can admit that I did that a time or
two, a long time ago, and I might someday do it again. However, the
F-test will be wrong in a more fundamental way if, say, the linearity
is fouled up by the transformation, making the coefficients wrong and
mis-measuring the error. I don't know whether I avoid 'arbitrary
transformations' because of that, or because they are inelegant and
hard to justify to anyone else.

> >On 19 Apr 04 03:24:58 -0400 (EDT), opaow wrote:
> >>Hi. I am just quite confused about data transformations (especially
> >>in doing ANOVA and Regression)... When and why do we transform
> >>data?... Any help??? I'm not quite good at it..... thanks in
> >>advance..

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
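
P.S. A few small sketches in Python, for the original poster. None of
this is verbatim from Tukey or anyone else; the data, cutoffs, and
function names are mine, invented purely for illustration.

First, the residual check that S Fan describes: fit a line, then see
whether the residual spread grows with the predictor.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(1, 50, 200)
    y = 2.0 * x + rng.normal(scale=0.3 * x)   # error sd grows with x

    b, a = np.polyfit(x, y, 1)                # slope, intercept
    resid = y - (a + b * x)

    # compare residual spread in the lower vs. upper half of x
    lo, hi = resid[x < 25].std(), resid[x >= 25].std()
    print("sd of residuals: low x %.2f, high x %.2f" % (lo, hi))
    # a clearly larger sd at high x is the fan shape that suggests
    # re-expressing y (log, say) or weighting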
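
Second, Tukey's ratio rule of thumb for data with a natural zero, and
the log re-expression it usually points to. Only the 10-or-20-times
ratio test is Tukey's; the sample values are made up.

    import numpy as np

    def suggests_transform(x, ratio=10.0):
        # Tukey's rule of thumb: with a natural zero, if the largest
        # value is 10 or 20 times the smallest, you probably want to
        # re-express the data.
        x = np.asarray(x, dtype=float)
        if np.any(x <= 0):
            raise ValueError("rule assumes positive data")
        return x.max() / x.min() >= ratio

    concentrations = np.array([0.5, 1.2, 3.0, 8.0, 24.0, 60.0])
    if suggests_transform(concentrations):
        print(np.round(np.log10(concentrations), 3))
    # 60/0.5 = 120, well past the cutoff, so the log is worth a look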
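
Third, the 'folded' transformation mentioned above, the logit, which
straightens out S-shaped growth or dose-response curves bounded by 0
and 1. The proportions here are invented.

    import numpy as np

    def logit(p):
        # log odds: log(p / (1 - p)); requires 0 < p < 1
        p = np.asarray(p, dtype=float)
        return np.log(p / (1.0 - p))

    proportion_responding = np.array([0.05, 0.20, 0.50, 0.80, 0.95])
    print(np.round(logit(proportion_responding), 3))
    # symmetric about 0; close to linear in dose for a logistic curve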
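
Last, the rank transformation: replace the values by their ranks and
run the usual parametric procedure on the ranks, which is one way of
viewing a large fraction of the 'non-parametric' alternatives.
scipy's rankdata does the bookkeeping; ties get the average rank.

    import numpy as np
    from scipy.stats import rankdata

    x = np.array([3.1, 110.0, 2.4, 9.7, 2.4, 45.0])
    print(rankdata(x))   # [3.  6.  1.5 4.  1.5 5. ]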
