The second case also needs the argument weights=n, so that glm knows how many
trials each proportion is based on.
Then all 3 models should give the same general fit (same coefficients, same
predicted values).
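
A minimal sketch to check this, using simulated data. The data frames ind and
grp, their values, and the smaller formula age + sex are made up for
illustration and are not the poster's data:

```r
# Simulated individual-level data: 10 trials per age/sex combination.
set.seed(1)
ind <- expand.grid(age = c(20, 30, 40), sex = c("F", "M"), rep = 1:10)
ind$success <- rbinom(nrow(ind), 1, plogis(-1 + 0.05 * ind$age))

# Grouped version: r successes out of n trials per age/sex combination.
grp <- aggregate(success ~ age + sex, data = ind, FUN = sum)
names(grp)[names(grp) == "success"] <- "r"
grp$n <- aggregate(success ~ age + sex, data = ind, FUN = length)$success

# Model 1: one row per trial, 0/1 response.
fit1 <- glm(success ~ age + sex, family = binomial, data = ind)
# Model 2: proportion response, number of trials supplied as weights.
fit2 <- glm(r / n ~ age + sex, family = binomial, weights = n, data = grp)
# Model 3: two-column (successes, failures) response.
fit3 <- glm(cbind(r, n - r) ~ age + sex, family = binomial, data = grp)

# Compare the coefficients up to numerical tolerance.
all.equal(coef(fit1), coef(fit2), tolerance = 1e-6)
all.equal(coef(fit2), coef(fit3), tolerance = 1e-6)
```

Both comparisons should come back TRUE, up to the convergence tolerance of the
fitting algorithm.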
The differences are subtle and may not be of interest. Conceptually think
about: did you run 10 trials under a set of conditions (age=x, sex=y, class=z)
and 9 of them were successes? This is model 2/3. Or did you run a bunch of
individual trials and just by chance 10 of them happened to have the same
conditions (age=x, sex=y, class=z) and 9 of those 10 were successes? This is
model 1.
The biggest visible difference is in the deviance calculations. That comes
about because in model 1 the saturated model can fit every point exactly (since
the responses are all 0 or 1). In the other 2, the saturated model gives the
same proportion for each combination of predictors as observed, but these are
not 0/1 now.
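
As a rough illustration of the deviance point, here is a sketch on simulated
data (the data frames ind and grp and the formula age + sex are invented for
the example):

```r
# Simulated individual-level data: 10 trials per age/sex combination.
set.seed(1)
ind <- expand.grid(age = c(20, 30, 40), sex = c("F", "M"), rep = 1:10)
ind$success <- rbinom(nrow(ind), 1, plogis(-1 + 0.05 * ind$age))

# Grouped version: r successes out of n trials per age/sex combination.
grp <- aggregate(success ~ age + sex, data = ind, FUN = sum)
names(grp)[names(grp) == "success"] <- "r"
grp$n <- aggregate(success ~ age + sex, data = ind, FUN = length)$success

fit1 <- glm(success ~ age + sex, family = binomial, data = ind)
fit3 <- glm(cbind(r, n - r) ~ age + sex, family = binomial, data = grp)

# Residual deviance and its degrees of freedom differ between the two
# layouts, because each is measured against a different saturated model.
c(individual = deviance(fit1), grouped = deviance(fit3))
c(individual = df.residual(fit1), grouped = df.residual(fit3))

# The drop from the null deviance is the same in both layouts, because the
# saturated-model term cancels in the difference.
drop1 <- fit1$null.deviance - deviance(fit1)
drop3 <- fit3$null.deviance - deviance(fit3)
all.equal(drop1, drop3, tolerance = 1e-6)
```

So the residual deviance itself is not comparable between the two layouts, but
comparisons between nested models fitted to the same layout are unaffected.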
The most important difference comes when you decide to extend the model (mixed
effects, bootstrapping), because the observational unit is different between
model 1 and models 2 and 3 (I don't know of any differences between 2 and 3
other than looks/convenience).
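
To make the "observational unit" point concrete, here is a small sketch of one
naive bootstrap resample under each layout. The data are simulated, and the
names ind, grp, boot_ind and boot_grp are made up for the example:

```r
# Simulated individual-level data: 10 trials per age/sex combination.
set.seed(2)
ind <- expand.grid(age = c(20, 30, 40), sex = c("F", "M"), rep = 1:10)
ind$success <- rbinom(nrow(ind), 1, plogis(-1 + 0.05 * ind$age))

# Grouped version: r successes out of n trials per age/sex combination.
grp <- aggregate(success ~ age + sex, data = ind, FUN = sum)
names(grp)[names(grp) == "success"] <- "r"
grp$n <- aggregate(success ~ age + sex, data = ind, FUN = length)$success

# Resampling rows of the individual data draws single trials, so the number
# of trials per age/sex combination varies from resample to resample.
boot_ind <- ind[sample(nrow(ind), replace = TRUE), ]
table(boot_ind$age, boot_ind$sex)

# Resampling rows of the grouped data draws whole sets of n trials, so each
# selected combination keeps its observed r and n intact.
boot_grp <- grp[sample(nrow(grp), replace = TRUE), ]
boot_grp
```

Which of the two resampling schemes is appropriate depends on how the data were
actually collected, as described above.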
Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
project.org] On Behalf Of andyer weng
Sent: Monday, October 20, 2008 4:39 PM
To: r-help@r-project.org
Subject: Re: [R] Categorical Response Query
Hi all,
I have a question about a categorical response.
I have a data frame containing age, sex, class, success (1=success,
0=non-success).
age, sex, and class are the explanatory variables, and success is the
response variable. I can also get n (the number of times each age
occurs) and r (the number of successes at that age).
When I try to create the regression relationship for these variables, I
have seen many different cases; I just wonder which one fits this
situation best.
1st case:
xxx.glm <- glm(success~age*sex*class, family=binomial, data=xxx.data)
2nd case:
xxx.glm <- glm(r/n~age*sex*class, family=binomial, data=xxx.data)
3rd case:
xxx.glm <- glm(cbind(r,n-r)~age*sex*class, family=binomial, data=xxx.data)
What is the difference between the above 3 cases? Which one is the best to
use?
If I don't group the data, can I use the 1st case? If I group the
data, can I use the 2nd or 3rd case?
Please advise.
Cheers.
Andyer
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.