[R] Very high Nagelkerke’s Pseudo-R2 values

Simone Santoro Tue, 17 Sep 2013 12:19:11 -0700

Hi all,



I have a data set covering 12 years, each with a number of males and a
number of females. I tested the relationship between the sex ratio
(proportion of males over the total), weighted by the number of
individuals in each year, and a set of predictors.

In R:

glm.1 <- glm(cbind(males, females) ~ predictor, family = binomial, data = data)
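
For anyone who wants to reproduce the setup, here is a minimal sketch on simulated data. The data frame `sim`, the columns `habitat` and `rain`, and all the numbers are made up for illustration and are not the real data.

```r
# Simulated stand-in for the real data: 12 years, one row per year.
set.seed(1)
sim <- data.frame(
  year    = 2001:2012,
  habitat = factor(rep(c("A", "B"), each = 6)),  # hypothetical categorical predictor
  rain    = round(runif(12, 200, 600)),          # hypothetical continuous predictor
  total   = round(runif(12, 43, 950))            # individuals sexed per year
)

# Assumed true sex ratio differs between the two habitat levels.
p_male <- ifelse(sim$habitat == "A", 0.45, 0.60)
sim$males   <- rbinom(12, size = sim$total, prob = p_male)
sim$females <- sim$total - sim$males

# cbind(successes, failures) makes each year count in proportion to its sample size.
fit <- glm(cbind(males, females) ~ habitat, family = binomial, data = sim)
summary(fit)$coefficients
```

With the two-column response, no separate `weights` argument is needed: the number of individuals per year enters the likelihood directly.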

To this end I prepared a set of candidate models, each one representing
a specific biological hypothesis. I work with two data sets because I used two
sexing methods, and in one data set I have some extra individuals sexed each
year with another method. Hence the data sets have different sample sizes
(min=14 and 43, max=880 and 950, mean=244 and 324, respectively). I used the
same candidate set and the same analysis for each data set. I considered four
predictors, but each model contained at most two (one categorical predictor
plus one of the other three, which are continuous). The categorical predictor
has a clear effect on the sex ratio, as shown by simple plotting of the data
and by the logic behind the hypothesis it represents. I know both analyses are
at risk of being overparameterized, but I trust that QAICc (Akaike Information
Criterion corrected for small samples and overdispersion) takes care of this
problem.
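
A minimal sketch of how a QAICc ranking of this kind can be computed by hand, assuming the common form QAICc = -2*logLik/c-hat + 2K + 2K(K+1)/(n-K-1), with K counting the coefficients plus one for c-hat, and c-hat estimated as the Pearson chi-square over the residual degrees of freedom of the most complex model. The helpers `chat` and `qaicc`, the simulated data, and the choice of n as the number of years are assumptions for illustration only.

```r
# Simulated data (hypothetical): 12 years, one categorical and one continuous predictor.
set.seed(2)
sim <- data.frame(
  habitat = factor(rep(c("A", "B"), each = 6)),
  rain    = runif(12, 200, 600),
  total   = round(runif(12, 43, 950))
)
sim$males   <- rbinom(12, sim$total, ifelse(sim$habitat == "A", 0.45, 0.60))
sim$females <- sim$total - sim$males

m0 <- glm(cbind(males, females) ~ 1,              family = binomial, data = sim)
m1 <- glm(cbind(males, females) ~ habitat,        family = binomial, data = sim)
m2 <- glm(cbind(males, females) ~ habitat + rain, family = binomial, data = sim)

# c-hat: Pearson chi-square / residual df of the most complex model,
# not allowed to fall below 1.
chat <- function(global) {
  max(1, sum(residuals(global, type = "pearson")^2) / df.residual(global))
}

# QAICc for one model; K = number of coefficients + 1 for c-hat (assumed convention).
qaicc <- function(model, c_hat, n) {
  K <- length(coef(model)) + 1
  -2 * as.numeric(logLik(model)) / c_hat + 2 * K + 2 * K * (K + 1) / (n - K - 1)
}

c_hat <- chat(m2)
q <- sapply(list(null = m0, habitat = m1, habitat_rain = m2),
            qaicc, c_hat = c_hat, n = nrow(sim))
round(q - min(q), 2)  # delta QAICc relative to the best-ranked model
```

With only 12 rows the small-sample term 2K(K+1)/(n-K-1) grows quickly with K, which is the part of the criterion that penalizes the two-predictor models.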



In fact, for the smaller data set I don't find any clear pattern and, as
a result, the null (intercept-only) model performs as well as the one
with the categorical predictor. I report the QAICc (c-hat=1.2)
ranking and, as a measure of effect size, Nagelkerke's Pseudo-R2,
which in this case, for the best-ranked non-null model (the categorical
predictor), is about 0.3.



For the bigger data set I find very clear results: the model
with the categorical predictor plus another (continuous)
predictor is ranked first, more than six deltaAICc units (c-hat=1) ahead
of the next one (the one with only the categorical predictor).

In this case, Nagelkerke's Pseudo-R2 is about 0.95, and I feel somewhat
uncomfortable with such a high, possibly optimistic, estimate.



In R, Nagelkerke's Pseudo-R2 was computed following Faraway (2006) as:

R2.nagelkerke <- (1 - exp((glm.1$deviance - glm.1$null.deviance) / nrow(data))) /
  (1 - exp(-glm.1$null.deviance / nrow(data)))
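
The expression above depends strongly on the value used for n. Here is a minimal sketch on simulated data (the data frame `sim`, the helper `r2_nagelkerke`, and all numbers are hypothetical) that evaluates it with n as the number of rows, as in the code above, and with n as the total number of sexed individuals. Which n is appropriate for grouped binomial data is an assumption to check against Faraway (2006); the sketch only shows how much the choice matters.

```r
# Simulated grouped binomial data (hypothetical): 12 years, one categorical predictor.
set.seed(3)
sim <- data.frame(
  habitat = factor(rep(c("A", "B"), each = 6)),
  total   = round(runif(12, 43, 950))
)
sim$males   <- rbinom(12, sim$total, ifelse(sim$habitat == "A", 0.45, 0.60))
sim$females <- sim$total - sim$males
fit <- glm(cbind(males, females) ~ habitat, family = binomial, data = sim)

# Same expression as in the post, with n passed in explicitly.
r2_nagelkerke <- function(model, n) {
  (1 - exp((model$deviance - model$null.deviance) / n)) /
    (1 - exp(-model$null.deviance / n))
}

r2_rows  <- r2_nagelkerke(fit, n = nrow(sim))       # n = 12 years
r2_total <- r2_nagelkerke(fit, n = sum(sim$total))  # n = all sexed individuals
c(rows = r2_rows, individuals = r2_total)
```

When the deviances are large relative to n, both exponentials approach zero and the ratio approaches 1, so a small n combined with large yearly counts can by itself push the value toward 1.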

 

Any opinion/suggestion on this case?

  

Faraway, J. J. (2006). Extending the Linear Model with R. Boca Raton, FL:
Chapman & Hall/CRC.
                                                                                
          

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
