I'm late to this discussion, but let me try to put it in another context.
Assume that I wanted to know whether kids who live west of their school or east of their school are more likely to be early (some hypothesis about walking slower if the sun is in their eyes). So I create a 0/1 variable east/west, get samples of 10 student arrival times at each of 100 different schools, and fit the model

   lm(arrive ~ factor(school) + east.west)

where "arrive" is on some common scale like "minutes since midnight". Since different schools could have different starting times for their first class, we need an intercept per school.

Two questions:
1. Incremental effect: the coefficient of east/west measures the incremental effect across all schools. With n of 1000 it is likely estimated with high precision.
2. Absolute: predict the average arrival time (on the clock) for students.
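
As a rough illustration of the two questions, here is a small simulation. Everything in it is invented for the sketch (the start times, the 2-minute east/west effect, the 5-minute noise); only the model formula comes from the text above. It compares the standard error of the shared east.west coefficient, which draws on all 1000 observations, with that of a single school intercept, which rests on about 10.

```r
# Simulated data only: 100 schools, 10 students each (all values are assumptions)
set.seed(1)
n.school <- 100
n.per    <- 10
school   <- rep(seq_len(n.school), each = n.per)

# Each school gets its own start time, in minutes since midnight
start     <- rep(sample(seq(450, 540, by = 15), n.school, replace = TRUE),
                 each = n.per)
east.west <- rbinom(n.school * n.per, 1, 0.5)
arrive    <- start - 10 + 2 * east.west + rnorm(n.school * n.per, sd = 5)

fit <- lm(arrive ~ factor(school) + east.west)

# Question 1: the common incremental effect, estimated from all schools
se <- sqrt(diag(vcov(fit)))
se["east.west"]

# Question 2: an absolute level for one school (here the reference school),
# which depends on only that school's 10 students
se["(Intercept)"]
```

With these made-up settings the intercept for any one school should come out noticeably less precise than the east.west coefficient, and the gap widens as the number of students per school shrinks.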

Conditional logistic regression is very much like this. We have a large number of strata ("schools") with a small number of observations in each (often only 2 per stratum). One can ask incremental questions about variables common to all strata, but absolute prediction is pretty worthless: a. you can only do it for schools (strata) that have already been seen, and b. there are so few subjects in each of them that the estimates are very noisy. The default prediction from clogit is focused on questions of type 1. The documentation doesn't even bother to mention predictions of type 2, which would be probabilities of events. I can think of a way to extract such output from the routine (being the author gives some insight), but why would I want to?
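
A minimal sketch of the same point with clogit from the survival package, using simulated 1:1 matched pairs. The data frame, the covariate x and the true coefficient 0.7 are all assumptions made up for the example. The stratum intercepts are conditioned out of the likelihood, so what the fit returns is the incremental coefficient and a linear predictor, not an event probability.

```r
library(survival)

# Simulated matched pairs: exactly one case per stratum (all values invented)
set.seed(2)
n.pair <- 200
beta   <- 0.7                       # assumed true log odds ratio for x
d <- data.frame(pair = rep(seq_len(n.pair), each = 2),
                x    = rnorm(2 * n.pair),
                case = 0)

for (i in seq_len(n.pair)) {
  idx <- which(d$pair == i)
  p   <- exp(beta * d$x[idx])       # within-pair chance of being the case
  d$case[idx[sample(2, 1, prob = p / sum(p))]] <- 1
}

fit <- clogit(case ~ x + strata(pair), data = d)

# Question 1: the incremental effect shared by all strata
coef(fit)

# Default prediction: a linear predictor (a relative quantity),
# one value per observation, with no per-stratum intercept behind it
lp <- predict(fit)
head(lp)
```

Note that the fit carries one coefficient for x and nothing at all for the 200 pairs, which is why a type 2 (absolute) prediction has no natural place to come from.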

Terry Therneau

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
