Re: [R] Chi square value of anova(binomialglmnull, binomglmmod, test=Chisq)

2012-06-06 Thread lincoln

David Winsemius wrote
> This is making me think you really have multiple observations on the
> same individuals (and that persons make transitions from one state to
> another as a result of the passage of time). That needs a more complex
> analysis than simple logistic regression. You might consider posting
> a more complete description of the study on the SIG Mixed Effects
> mailing list.
>
> --
> David.

No, I don't. The individuals are birds, each marked with a unique alphanumeric
code that gives me information on their gender (for some birds I have this
data, for others I don't) and their birth date (and consequently their age).
There are no multiple observations of the same individual.

Anyway, I believe my main question has not been answered: when using anova
with test="Chisq" to compare two models, is the difference in deviance
between the two models interpretable as the chi-square value, and the
difference in df as the df of the chi-square test?

For instance, given:

> anova(mod4,update(mod4,~.-cohort),test="Chisq")
Analysis of Deviance Table

Model 1: site ~ cohort
Model 2: site ~ 1
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1       993     1283.7
2      1002     1368.2 -9  -84.554 2.002e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Is 84.554 taken as the chi-square value, 9 as the df of the test, and does the
p-value depend on these two values?

--
View this message in context: 
http://r.789695.n4.nabble.com/Chi-square-value-of-anova-binomialglmnull-binomglmmod-test-Chisq-tp4632293p4632504.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Chi square value of anova(binomialglmnull, binomglmmod, test=Chisq)

2012-06-06 Thread peter dalgaard

On Jun 6, 2012, at 10:59 , lincoln wrote:

 
> Is 84.554 taken as the chi-square value, 9 as the df of the test, and does
> the p-value depend on these two values?

That's the general mechanism, yes. (Whether the chi-square distribution holds 
after variable selection is a more difficult issue. Frank Harrell might chime 
in and remind us that there are books on that subject.)
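
As a quick check of that mechanism, the p-value in the table can be recomputed
by hand from the deviance difference and the df difference. This is a minimal
sketch that uses only the numbers printed in the quoted table; the variable
names are illustrative and not part of the original analysis.

```r
# Numbers copied from the analysis-of-deviance table in this thread
df_null   <- 1002     # residual df of site ~ 1
df_cohort <- 993      # residual df of site ~ cohort

chi2 <- 84.554                 # deviance difference as printed by anova()
df   <- df_null - df_cohort    # 10 cohort levels give 9 extra parameters

# Upper-tail probability that a chi-square(9) variable exceeds 84.554
p <- pchisq(chi2, df = df, lower.tail = FALSE)
p
```

The result should be on the order of 2e-14, in line with the value reported in
the table. The sign of the deviance and df in the table only reflects the order
in which the two models were passed to anova().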

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Chi square value of anova(binomialglmnull, binomglmmod, test=Chisq)

2012-06-06 Thread Marc Schwartz
On Jun 6, 2012, at 9:36 AM, peter dalgaard wrote:

 
>> Is 84.554 taken as the chi-square value, 9 as the df of the test, and does
>> the p-value depend on these two values?
>
> That's the general mechanism, yes. (Whether the chi-square distribution holds
> after variable selection is a more difficult issue. Frank Harrell might chime
> in and remind us that there are books on that subject.)



Frank might be busy with useR preparations for next week...

Quoting from Frank's book Regression Modeling Strategies, page 58, in the 
context of variable selection, stepwise methods and stopping rules:

"The residual $\chi^2$ can be tested for significance (if one is able to forget
that because of variable selection this statistic does not have a $\chi^2$
distribution), or the stopping rule can be based on Akaike's information
criterion (AIC), here residual $\chi^2$ - 2 x d.f. Of course, use of more
insight from knowledge of the subject matter will generally improve the
modeling process substantially. It must be remembered that no currently
available stopping rule was developed for data driven variable selection.
Stopping rules such as AIC or Mallows' $C_p$ are intended for comparing only
two \emph{prespecified} models."


The entire chapter (4) discusses these issues in more detail and as Peter notes 
there are other books and papers that focus on the underlying issue of variable 
selection. As Frank is oft-quoted as saying:

"Variable selection is hazardous both to inference and to prediction. There is
no free lunch; we are torturing data to confess its own sins."


Going back to Lincoln's prior post in the thread, presuming that there is 
sufficient data to use the original pre-specified model and also that the 
original full model itself was not derived from prior variable selection or 
univariate pre-screening:

  mod1 <- glm(site ~ sex + birth + cohort + sex:birth, data=datasex,
              family = binomial)

I would recommend reviewing the likelihood ratio test for that model versus the
null model:

  anova(mod1, test = "Chisq")

and determine whether or not 'cohort' was significant at some level there, 
rather than in the final reduced model. You might also want to consider using 
some of the tools in Frank's rms package on CRAN to further evaluate/validate 
that model.
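
A minimal sketch of what such a review could look like, using simulated data
because the original data set is not available in this thread. The data frame
`d`, the effect sizes and all object names are assumptions made purely for
illustration. It may help to keep in mind that anova() on a single fitted model
gives sequential tests (terms added in formula order), that comparing the
intercept-only fit with the full fit gives the overall likelihood ratio test,
and that drop1() tests each droppable term given the others.

```r
set.seed(1)
n <- 300

# Simulated stand-in for the real data (hypothetical values)
d <- data.frame(
  sex    = factor(sample(0:1, n, replace = TRUE)),
  birth  = factor(sample(5:7, n, replace = TRUE)),
  cohort = factor(sample(1999:2003, n, replace = TRUE))
)
d$site <- rbinom(n, 1, plogis(-0.5 + 0.4 * (as.integer(d$cohort) - 1)))

# Pre-specified full model and the intercept-only model
full <- glm(site ~ sex + birth + cohort + sex:birth,
            data = d, family = binomial)
null <- update(full, . ~ 1)

# Overall likelihood ratio test: full model against the null model
overall <- anova(null, full, test = "Chisq")

# Sequential tests: each term added in the order given in the formula
sequential <- anova(full, test = "Chisq")

# Marginal tests: each droppable term given everything else in the model
marginal <- drop1(full, test = "Chisq")

overall
sequential
marginal
```

In this layout the row for cohort in `marginal` is the test of cohort within
the full model, which is closer to what is being recommended here than a test
in the reduced model that survived selection.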

Regards,

Marc Schwartz



Re: [R] Chi square value of anova(binomialglmnull, binomglmmod, test=Chisq)

2012-06-06 Thread lincoln
Thank you all,

This was exactly the sort of help I hoped to get.


--
View this message in context: 
http://r.789695.n4.nabble.com/Chi-square-value-of-anova-binomialglmnull-binomglmmod-test-Chisq-tp4632293p4632568.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Chi square value of anova(binomialglmnull, binomglmmod, test=Chisq)

2012-06-05 Thread lincoln
Thank you for your commentaries and suggestions.

Site 0 and site 1 can be interpreted as events.
These data come from simultaneous observations of individuals at two
different sites, so they are independent observations: while an individual
is being observed at one site, it cannot be at the other.

Each individual is assigned to age 0 (first year of life) or 1 (all the
rest). Even though this may seem a very strong (brutal?) pooling, it makes
sense from a biological point of view, because individuals within each age
class (0 or 1) are quite homogeneous in their dispersal behavior. The goal
of this analysis is just to characterize their dispersal behavior (which
individuals stay home at site 0 and which ones disperse to site 1?).

About the birth issue, I am more in doubt. Birth is the month of birth
(5 = May, 6 = June, 7 = July). It seems to me, too, that this is quite a
severe pooling: an individual born on 1st June is coded 6, the same as one
born on 30th June, whereas one born on 30th May or 1st July is coded 5 or 7,
which doesn't make much sense. Still, I have not found a better way to
measure this variable, since there is no real starting or ending point;
individuals may be born at any time from 1st May to 31st July (I mean that
in my data set there are no individuals born before or after these dates).

Any hint?


--
View this message in context: 
http://r.789695.n4.nabble.com/Chi-square-value-of-anova-binomialglmnull-binomglmmod-test-Chisq-tp4632293p4632380.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Chi square value of anova(binomialglmnull, binomglmmod, test=Chisq)

2012-06-05 Thread David Winsemius


On Jun 5, 2012, at 4:52 AM, lincoln wrote:


> Thank you for your commentaries and suggestions.
>
> Site 0 and site 1 can be interpreted as events.
> These data come from simultaneous observations of individuals at two
> different sites, so they are independent observations: while an individual
> is being observed at one site, it cannot be at the other.
>
> Each individual is assigned to age 0 (first year of life) or 1 (all the
> rest). Even though this may seem a very strong (brutal?) pooling, it makes
> sense from a biological point of view, because individuals within each age
> class (0 or 1) are quite homogeneous in their dispersal behavior. The goal
> of this analysis is just to characterize their dispersal behavior (which
> individuals stay home at site 0 and which ones disperse to site 1?).


This is making me think you really have multiple observations on the
same individuals (and that persons make transitions from one state to
another as a result of the passage of time). That needs a more complex
analysis than simple logistic regression. You might consider posting
a more complete description of the study on the SIG Mixed Effects
mailing list.


--
David.





David Winsemius, MD
West Hartford, CT



[R] Chi square value of anova(binomialglmnull, binomglmmod, test=Chisq)

2012-06-04 Thread lincoln
Hi all,

I have done a backward stepwise selection on a full binomial GLM where the
response variable is gender.
At the end of the selection I was left with a model with only one explanatory
variable (cohort, a factor with 10 levels).

I want to test the significance of the variable cohort, which, I believe, is
the same as the significance of this selected model:

> anova(mod4,update(mod4,~.-cohort),test="Chisq")
Analysis of Deviance Table

Model 1: site ~ cohort
Model 2: site ~ 1
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1       993     1283.7
2      1002     1368.2 -9  -84.554 2.002e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

My question is:
When I report this result, I would say /cohorts were unevenly distributed
between sites (Chi2=84.5, df=9, p < 0.001)/. Is that correct? Is the Chi2
value the difference in deviance between the model with the cohort effect and
the null model?

--
View this message in context: 
http://r.789695.n4.nabble.com/Chi-square-value-of-anova-binomialglmnull-binomglmmod-test-Chisq-tp4632293.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Chi square value of anova(binomialglmnull, binomglmmod, test=Chisq)

2012-06-04 Thread David Winsemius


On Jun 4, 2012, at 7:00 AM, lincoln wrote:


> Hi all,
>
> I have done a backward stepwise selection on a full binomial GLM where the
> response variable is gender.


I thought you said the response variable was gender? It seems to be  
'site' in these two models. Maybe you should give us some more  
information about how you constructed 'mod4'?


--

David Winsemius, MD
West Hartford, CT



Re: [R] Chi square value of anova(binomialglmnull, binomglmmod, test=Chisq)

2012-06-04 Thread lincoln
So sorry,

My response variable is site (not gender!).
The selection process was:

> str(data)
'data.frame':   1003 obs. of  5 variables:
 $ site  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ sex   : Factor w/ 2 levels "0","1": NA NA NA NA 1 NA NA NA NA NA ...
 $ age   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ cohort: Factor w/ 10 levels "1999","2000",..: 10 10 10 10 10 10 10 10 10 10 ...
 $ birth : Factor w/ 3 levels "5","6","7": 3 3 2 2 2 2 2 2 2 2 ...
> datasex<-subset(data, sex !="NA")
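
Filtering out missing `sex` values by comparison is fragile: comparing against
the bare `NA` constant yields `NA` for every row, so subset() would keep
nothing. A minimal sketch with a hypothetical toy data frame (`toy` and
`toysex` are illustrative names, not objects from the analysis), using is.na()
instead:

```r
# Hypothetical toy data standing in for the real data frame
toy <- data.frame(
  site = factor(c(0, 1, 0, 1, 1)),
  sex  = factor(c(NA, 0, 1, NA, 1))
)

# Comparing with the NA constant never yields TRUE or FALSE, only NA
toy$sex != NA

# subset() drops rows whose condition is NA, so this keeps no rows
nrow(subset(toy, sex != NA))

# is.na() tests for missingness directly and keeps the rows with known sex
toysex <- subset(toy, !is.na(sex))
nrow(toysex)
```

Writing the condition as `!is.na(sex)` states the intent explicitly and does
not depend on how the missing values happen to be coded.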

*Below is the structure of the analysis, with the anova.glm output shown only
for the last, selected model, mod4:*

mod1 <- glm(site ~ sex + birth + cohort + sex:birth, data=datasex,
            family = binomial)
summary(mod1)
anova(mod1,update(mod1,~.-sex:birth),test="Chisq")

mod2 <- glm(site ~ sex + birth + cohort, data=datasex, family = binomial)
summary(mod2)
anova(mod2,update(mod2,~.-sex),test="Chisq")

mod3 <- glm(site ~ birth + cohort, data=data, family = binomial)
summary(mod3)
anova(mod3,update(mod3,~.-birth),test="Chisq")

mod4 <- glm(site ~ cohort, data=data, family = binomial)
summary(mod4)
anova(mod4,update(mod4,~.-cohort),test="Chisq")
Analysis of Deviance Table

Model 1: site ~ cohort
Model 2: site ~ 1
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1       993     1283.7
2      1002     1368.2 -9  -84.554 2.002e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

*My question:*
In this case, would the Chi2 value be the difference in deviance between the
models, and the d.f. the difference in d.f. (84.554 and 9)?
In other words, can I correctly state: /cohorts were unevenly distributed
between sites (Chi2=84.5, df=9, p < 0.001)/?
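
One way to convince oneself of this is to compute the statistic by hand and
compare it with what anova() reports. The sketch below uses simulated data, not
the data from this post; `sim`, the effect sizes and the object names are
assumptions made for illustration only.

```r
set.seed(42)
n <- 500

# Simulated stand-in: a binary site and a 10-level cohort factor
cohort <- factor(sample(1999:2008, n, replace = TRUE))
site   <- rbinom(n, 1, plogis(-1 + 0.2 * (as.integer(cohort) - 1)))
sim    <- data.frame(site, cohort)

mod  <- glm(site ~ cohort, data = sim, family = binomial)
null <- update(mod, . ~ . - cohort)

# Likelihood ratio statistic computed by hand
chi2 <- deviance(null) - deviance(mod)
df   <- df.residual(null) - df.residual(mod)
p    <- pchisq(chi2, df, lower.tail = FALSE)

# The same quantities as reported by anova()
tab <- anova(null, mod, test = "Chisq")
all.equal(chi2, tab[2, "Deviance"])
all.equal(p, tab[2, ncol(tab)])   # the p-value is the last column
```

Passing the models in the order (null, mod) gives positive entries in the Df
and Deviance columns; the reversed order used in the post gives the same
magnitudes with negative signs.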



--
View this message in context: 
http://r.789695.n4.nabble.com/Chi-square-value-of-anova-binomialglmnull-binomglmmod-test-Chisq-tp4632293p4632312.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Chi square value of anova(binomialglmnull, binomglmmod, test=Chisq)

2012-06-04 Thread David Winsemius


On Jun 4, 2012, at 11:31 AM, lincoln wrote:


> So sorry,
>
> My response variable is site (not gender!).
> The selection process was:



If there is a natural probability interpretation of site==1 as a
sort of event (say, perhaps, a non-lymphatic site for the primary site
of a lymphoma), then you can say that the log-odds of 'site' being 1
rather than 0 differ among the cohorts.
(Or, equivalently, that at least some of the odds ratios between cohorts
differ significantly from 1.)
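
To make the log-odds and odds-ratio reading concrete, here is a minimal sketch
on simulated data. The three cohort labels, the effect size and the object
names are assumptions for illustration and are not taken from the data in this
thread.

```r
set.seed(7)
n <- 400

# Simulated stand-in: binary site and a 3-level cohort factor
cohort <- factor(sample(c("1999", "2000", "2001"), n, replace = TRUE))
site   <- rbinom(n, 1, plogis(-0.5 + 0.6 * (as.integer(cohort) - 1)))
fit    <- glm(site ~ cohort, family = binomial)

# Coefficients are on the log-odds scale; the cohort terms are differences
# in log-odds from the reference level (here "1999")
coef(fit)

# Exponentiating gives the odds of site == 1 in the reference cohort
# (intercept) and odds ratios relative to that cohort (cohort terms)
odds_ratios <- exp(coef(fit))

# Cross-check against the raw table for cohort 2000 versus 1999
tab <- table(site, cohort)
or_2000 <- (tab["1", "2000"] / tab["0", "2000"]) /
           (tab["1", "1999"] / tab["0", "1999"])
all.equal(unname(odds_ratios["cohort2000"]), or_2000, tolerance = 1e-6)
```

With a single categorical predictor the fitted odds ratios agree with those
from the cross-tabulation, which is why the deviance test for cohort can be
read as a test of whether the odds of site == 1 are the same in every cohort.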


Worries: The fact that the 'age' codes are 1/0 and 'birth' is 5, 6, or 7
makes me wonder what sort of measurements these are. I worry when
variables usually considered continuous get so severely
discretized. The fact that these data were measured over time also raises
further concerns about independence. Were controls observed in 1999
still subject to risk in 2000 and subsequent years? Were there
substantial differences in the time to events? I also worry when words
normally used for a location are interpreted as events and there is no
context offered.


--
David.



David Winsemius, MD
West Hartford, CT
