On Mon, Nov 21, 2011 at 08:08:11PM +0000, John Darrington wrote:
> The good news is that I found and fixed a bug which was causing the
> Effects Coding to produce garbage results. The surprising news
> (surprising to me anyway) is that having fixed it, Effects Coding
> produces identical results to Dummy Coding.
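(For what it's worth, identical results here are expected: dummy coding and effects coding of the same factor span the same column space, so the fitted values, and any sums of squares computed from them, must coincide. A quick pure-Python sketch with made-up data; the helper functions and data are mine, not PSPP code:)

```python
def solve(a, b):
    # Solve the square system a @ x = b by Gauss-Jordan elimination
    # with partial pivoting.
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * v for x, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fitted(rows, y):
    # Ordinary least squares fitted values for the design matrix 'rows'.
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [sum(r[j] * beta[j] for j in range(p)) for r in rows]

# Made-up, deliberately unbalanced data: one factor with three levels.
level = [0, 0, 1, 1, 1, 2, 2]
y = [3.0, 4.0, 6.0, 5.0, 7.0, 9.0, 8.0]

# Dummy coding: intercept plus indicators, level 2 as the reference.
dummy = [[1.0, float(l == 0), float(l == 1)] for l in level]
# Effects coding: level 2 is coded -1 on both contrast columns.
effects = [[1.0,
            1.0 if l == 0 else (-1.0 if l == 2 else 0.0),
            1.0 if l == 1 else (-1.0 if l == 2 else 0.0)] for l in level]

fd = fitted(dummy, y)
fe = fitted(effects, y)
same = all(abs(a - b) < 1e-9 for a, b in zip(fd, fe))
print(same)  # identical fitted values under both codings
```

(Both designs are saturated for the one factor, so the fitted values are just the cell means; that is why every sums-of-squares quantity built from the fitted values comes out the same.)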
That's good news. At first, I was surprised when the sums of squares did
not agree, because, typically, the sums of squares should not change over
different coding schemes.

> 2. Another example, this time for SAS, at
>    http://www.sfu.ca/sasdoc/sashtml/stat/chap30/sect52.htm
>
>    I copied the data given there, and ran it through PSPP and got:
>
> #===============#=======================#==#============#==========#=======#
> # Source        #Type III Sum of Squares|df| Mean Square|    F     |  Sig. #
> #===============#=======================#==#============#==========#=======#
> #Corrected Model#            4259,338506|11|  387,212591|  3,505692|,001298#
> #Intercept      #           20672,844828| 1|20672,844828|187,164963|,000000#
> #drug           #            3063,432863| 3| 1021,144288|  9,245096|,000067#
> #disease        #             418,833741| 2|  209,416870|  1,895990|,161720#
> #drug * disease #             707,266259| 6|  117,877710|  1,067225|,395846#
> #Error          #            5080,816667|46|  110,452536|          |       #
> #Total          #           30013,000000|58|            |          |       #
> #Corrected Total#            9340,155172|57|            |          |       #
>
>    Now these numbers are exactly what the SAS example gives for the
>    Type II sums of squares (although PSPP is labelling them as Type III).
>
> 3. A concise but quite useful description of the various ssq "types" can
>    be found at http://afni.nimh.nih.gov/sscc/gangc/SS.html
>    It says this about Type III:
>
>    "SS gives the sum of squares that would be obtained for each variable
>    if it were entered last into the model. That is, the effect of each
>    variable is evaluated after all other factors have been accounted
>    for. Therefore the result for each term is equivalent to what is
>    obtained with Type I analysis when the term enters the model as the
>    last one in the ordering."

This is what I coded in glm.c to begin with, but it wasn't giving the
same result as SPSS. I found that SPSS drops an interaction term if it
first drops a main effect contained in it. So I went back to mimic that
behavior, and now it doesn't seem to match SAS.
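To make the interaction question concrete, here is a pure-Python sketch on made-up, unbalanced two-factor data. It computes "the" sums of squares for a factor A two ways: dropping A alone from the full model, and dropping A together with the A*B interaction. Both readings are my guesses at the behaviors under discussion, not a claim about what SPSS or SAS actually compute; the point is only that on unbalanced data the two numbers differ:

```python
def solve(a, b):
    # Solve the square system a @ x = b by Gauss-Jordan elimination
    # with partial pivoting.
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * v for x, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fitted(rows, y):
    # Ordinary least squares fitted values for the design matrix 'rows'.
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [sum(r[j] * beta[j] for j in range(p)) for r in rows]

# Made-up, unbalanced 2x2 layout; A and B are effect-coded (+1/-1).
A = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
B = [-1.0, -1.0, 1.0, 1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0, 9.0, 8.0, 10.0, 7.0]
AB = [a * b for a, b in zip(A, B)]

def design(cols):
    # Intercept plus the given predictor columns.
    return [[1.0] + [c[i] for c in cols] for i in range(len(y))]

full = fitted(design([A, B, AB]), y)
no_a = fitted(design([B, AB]), y)     # drop A, keep the A*B term
no_a_ab = fitted(design([B]), y)      # drop A and A*B together

def ssdiff(f_big, f_small):
    # For nested OLS models this equals the drop in the error sums of
    # squares when going from the smaller model to the bigger one.
    return sum((u - v) ** 2 for u, v in zip(f_big, f_small))

ss_keep_int = ssdiff(full, no_a)
ss_drop_int = ssdiff(full, no_a_ab)
print(ss_keep_int, ss_drop_int)  # two different answers for 'SS for A'
```

The second number absorbs the A*B contribution as well, which is one way the same label can end up meaning different things in different programs.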
Now I wonder if SAS and SPSS agree on the meaning of Type 3 sums of
squares. It would be nice to have some tests using both programs.

> 4. However, none of the SPSS examples I have found which feature
>    unbalanced designs actually correspond to what PSPP currently
>    produces for type III ssq. The interactions are the same, but the
>    main effects are quite different.
>
>    The foregoing leads me to infer that SPSS has the meaning of Type II
>    and Type III transposed, in comparison to the rest of the world.
>
>    This sounds somewhat incredible, but seems to be consistent with the
>    evidence so far.
>
>    I can only suggest that we try to implement Type II next, and see
>    what happens.

I agree, and in the meantime, if anybody out there has access to both SAS
and SPSS, please send us a few test results showing the Type 1, Type 2,
and Type 3 sums of squares from both programs. And the data.

Maybe now is the time for me to mention where this comes from, even
though I'm not sure if this will resolve the meaning of 'Type 1', 'Type
2' and 'Type 3'. Pardon me if this explanation is self-evident by now:

We are talking about sums of squares due to regression, that is:

  sum ((predicted Y's - other version of predicted Y's)^2)

if Y is the dependent variable. In this case, the predicted Y may be the
'final' predicted Y, or a predicted value based only on some of the
predictors, or the sample mean of Y.

The idea is to compute the reduction in the sums of squared errors:

  sum ((predicted Y - observed Y)^2)

...by adding more predictors. The usual way to measure this reduction is
to look at the sums of squares due to regression, as mentioned above.
That is, we look for the *drop* in the sums of squared errors as a
corresponding *rise* in the sums of squares due to regression. The two
ways to do this are via 'sequential' sums of squares and 'partial' sums
of squares.

Suppose X1 and X2 are predictors.
Let SSR (X1) be the sums of squares due to regression of Y on X1, that
is,

  SSR (X1) = sum ((predicted Y - mean of Y)^2)

...where 'predicted Y' is the predicted value of Y using X1 as the sole
predictor.

Now we add another predictor, X2. Its sequential sums of squares are

  SSR (X2 | X1) = sum ((predicted Y given X1
                        - predicted Y given X1 *and* X2)^2)

This measures the improvement in our prediction from adding X2 when X1
is already present in the model. The *partial* sums of squares for X2
are just

  SSR (X2) = sum ((predicted Y from X2 - mean of Y)^2)

...with no X1 in the model.

The first issue here is: What should we call the partial and the
sequential sums of squares? One is usually called 'type 1' and the
other, 'type 2'. But I think those names are mostly used by software,
and not by practitioners.

I'm scanning the beginning of chapter 8 of Neter, Wasserman and Kutner's
book, and they seem to be satisfied to refer only to 'extra' sums of
squares, then using something like the notation above to be more
specific. Chapter 4 of Meyers' and Milton's 'A First Course in the
Theory of Linear Statistical Models' refers to sums of squares of X2 'in
the presence of' X1. It then refers to partial and sequential tests. To
be more specific, they use vector and matrix notation. Mendenhall's and
Sincich's 'A Second Course in Statistics: Regression Analysis' mentions
'reduced', 'nested' and 'full' models, but doesn't seem to dwell on the
differences among the types of sums of squares, though they do mention
these a bit in chapter 4.

I myself remember in graduate school that several of us, including the
instructor, occasionally had to pause to figure out which sums of
squares were called 'type 2' by the software. On the other hand, I
remember that some coworkers at an old job always used the terms
'type *' and couldn't say much about what the sums of squares meant.

One other question: What should we do with an interaction if one of its
main effects is dropped? Drop the interaction?
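Before going on, here is a pure-Python sketch of the definitions above, with made-up data. It computes the sequential sums of squares SSR (X2 | X1) both directly from the two fitted vectors and as the difference SSR (X1, X2) - SSR (X1) (the two forms agree for nested least-squares models), and compares it with SSR (X2) computed with no X1 in the model (what I called the partial sums of squares above); when X1 and X2 are correlated, the two quantities differ:

```python
def solve(a, b):
    # Solve the square system a @ x = b by Gauss-Jordan elimination
    # with partial pivoting.
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * v for x, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fitted(rows, y):
    # Ordinary least squares fitted values for the design matrix 'rows'.
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [sum(r[j] * beta[j] for j in range(p)) for r in rows]

# Made-up data; x1 and x2 are deliberately correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, 1.0, 2.0, 5.0, 4.0, 6.0]
y = [2.0, 4.0, 3.0, 7.0, 6.0, 9.0]
n = len(y)
ybar = sum(y) / n

def ssr(cols):
    # SSR = sum ((predicted Y - mean of Y)^2), intercept always included.
    f = fitted([[1.0] + [c[i] for c in cols] for i in range(n)], y)
    return sum((v - ybar) ** 2 for v in f)

f1 = fitted([[1.0, a] for a in x1], y)
f12 = fitted([[1.0, a, b] for a, b in zip(x1, x2)], y)

# Sequential SS for X2, directly from the two fitted vectors:
seq_direct = sum((u - v) ** 2 for u, v in zip(f12, f1))
# ...and as a difference of SSRs; the two forms agree:
seq_asdiff = ssr([x1, x2]) - ssr([x1])
# Stand-alone SSR (X2), with no X1 in the model:
alone = ssr([x2])
print(seq_direct, seq_asdiff, alone)
```

The gap between the sequential and the stand-alone numbers is exactly the order-dependence that makes the 'type' labels matter once a design is unbalanced or the predictors are correlated.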
There are cases where it makes sense to retain an interaction without
the main effect. But there are cases to the contrary. What this means is
that there *may* not be a single definition of 'type 3' sums of squares
that we can code that will always work. Authors and professors seem to
prefer other, more descriptive terms than just 'types 1, 2 and 3'. I'm
pretty sure there is no standard definition of 'type 4'. I guess we
should figure out how to mimic SPSS in the case of unbalanced designs,
though.

When I coded type 3 sums of squares, I assumed it meant 'partial' for
each variable. That is, 'drop Xi, fit the model, add Xi, and find the
sums of squares':

  SSR_type3 (Xi | all but Xi) = sum ((predicted Y from all
                                      - predicted Y from all but Xi)^2)

This matched SPSS, unless Xi was in some interaction. So I dropped the
interaction terms involving Xi, too. Then this was correct, until the
design was unbalanced. Now I'm not sure what it should be.

-Jason

_______________________________________________
pspp-dev mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/pspp-dev
