Prof Brian Ripley wrote:
> On Tue, 13 Nov 2007, Dylan Beaudette wrote:
>
>> Hi,
>>
>> I have set up a simple logistic regression model with the glm() function, with
>> the following formula:
>>
>> y ~ a + b
>>
>> where:
>> 'a' is a continuous variable stratified by
>> the levels of 'b'
>>
>> Looking over the manual for model specification, it seems that coefficients
>> for unordered factors are given 'against' the first level of that factor.
>
> Only for the default coding.
>
>> This makes for difficult interpretation when using factor 'b' as a
>> stratifying model term.
>
> Really?  You realize that you have not 'stratified' on 'b', which would
> need the model to be a*b?  What you have is a model with parallel linear
> predictors for different levels of 'b', and if the coefficients are not
> telling you what you want you should change the coding.

I have to differ slightly here. "Stratification", at least in the fields
that I connect with, usually means that you combine information from
several groups via an assumption that they have a common value of a
parameter, which in the present case is essentially the same as assuming
an additive model y~a+b.
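To make the two readings concrete, here is a minimal sketch on simulated
data. Only the formulas come from the thread; the data, the level names
g1/g2/g3 and the object names fit_add, fit_int, grid and slopes are made
up for illustration. The additive fit has a single coefficient for a, so
on the link scale the lines for the levels of b are parallel, whereas a*b
adds one extra slope term per non-baseline level. Under the default
treatment coding, bg2 and bg3 are differences from the first level, g1.

```r
# Simulated data: names a, b, y mirror the thread, the values are invented.
set.seed(1)
n <- 300
b <- factor(sample(c("g1", "g2", "g3"), n, replace = TRUE))
a <- rnorm(n)
eta <- c(g1 = -0.5, g2 = 0.3, g3 = 1.0)[as.character(b)] + 0.8 * a
y <- rbinom(n, size = 1, prob = plogis(eta))

fit_add <- glm(y ~ a + b, family = binomial)  # one common slope for 'a'
fit_int <- glm(y ~ a * b, family = binomial)  # a separate slope per level of 'b'

names(coef(fit_add))  # "(Intercept)" "a" "bg2" "bg3"
names(coef(fit_int))  # additionally "a:bg2" and "a:bg3"

# Parallel linear predictors in the additive fit: the change in the
# link-scale prediction from a = 0 to a = 1 is the same in every level of 'b'.
grid <- expand.grid(a = c(0, 1), b = levels(b))
grid$eta <- predict(fit_add, newdata = grid, type = "link")
slopes <- with(grid, tapply(eta, b, diff))
slopes
coef(fit_add)[["a"]]
```

All three entries of slopes should agree with the coefficient of a from
fit_add, which is what "parallel" means here.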
I share your confusion as to why the parametrization of the effects of
factor b should matter, though. Surely, the original poster has already
noticed that the estimated effect of a is the same whether or not the
intercept is included? The only difference I see is that running
anova() or drop1() would not give interesting results for the effect of
b in the no-intercept variation.

    -p

> Much of what I am trying to get across is that you have a lot of choice as
> to how you specify a model to R.  There has to be a default, which is
> chosen because it is often a good choice.  It does rely on factors being
> coded well: the 'base level' (to quote ?contr.treatment) needs to be
> interpretable.  And also bear in mind that the default choices of
> statistical software in this area almost all differ (and R's differs from
> S, GLIM, some ways to do this in SAS ...), so people's ideas of a 'good
> choice' do differ.
>
>> Setting up the model, minus the intercept term, gives me what appear to be
>> more meaningful coefficients. However, I am not sure if I am interpreting the
>> results from a linear model without an intercept term correctly. Model
>> predictions from both specifications (with and without an intercept term) are
>> nearly identical (different by about 1E-16 in probability space).
>>
>> Are there any gotchas to look out for when removing the intercept term from
>> such a model?
>
> It is just a different parametrization of the linear predictor.
> Anything interpretable in terms of the predictions of the model will be
> unchanged.  That is the crux: the default coefficients of 'b' will be
> log odds-ratios that are directly interpretable, and those in the
> per-group coding will be log-odds for a zero value of 'a'.  Does a zero
> value of 'a' make sense?
>
>> Any guidance would be greatly appreciated.
>>
>> Cheers,

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
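P.S. For anyone who wants to check the reparametrization point in the
message above, here is a small sketch on simulated data. The data, the
level names g1/g2/g3 and the object names fit_default, fit_noint and
per_level are invented for illustration; only the two formulas come from
the thread. It compares the default fit with the no-intercept fit: the
slope for a and the fitted probabilities should agree up to rounding
error, and each per-level coefficient of the no-intercept fit should
equal the intercept plus the matching contrast from the default fit.

```r
# Simulated data: names a, b, y mirror the thread, the values are invented.
set.seed(1)
n <- 300
b <- factor(sample(c("g1", "g2", "g3"), n, replace = TRUE))
a <- rnorm(n)
eta <- c(g1 = -0.5, g2 = 0.3, g3 = 1.0)[as.character(b)] + 0.8 * a
y <- rbinom(n, size = 1, prob = plogis(eta))

fit_default <- glm(y ~ a + b, family = binomial)      # default treatment coding
fit_noint   <- glm(y ~ a + b - 1, family = binomial)  # one coefficient per level of 'b'

# Same slope for 'a' and same fitted probabilities, up to rounding error.
all.equal(coef(fit_default)[["a"]], coef(fit_noint)[["a"]])
all.equal(fitted(fit_default), fitted(fit_noint))

# Per-level coefficients are the log-odds at a = 0:
# the intercept plus the corresponding contrast from the default fit.
cf <- coef(fit_default)
per_level <- cbind(
  from_default = unname(cf["(Intercept)"] + c(0, cf[c("bg2", "bg3")])),
  no_intercept = unname(coef(fit_noint)[c("bg1", "bg2", "bg3")])
)
per_level

# Dropping 'b' from the default fit compares the levels with each other;
# dropping it from the no-intercept fit instead compares against a model
# in which every level has log-odds 0 at a = 0, which is rarely of interest.
drop1(fit_default, test = "Chisq")
drop1(fit_noint, test = "Chisq")
```

Whether the per-level numbers are worth reporting depends on whether
a = 0 is a meaningful value in the data at hand.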