[R] Replicated LR goodness-of-fit tests, heterogeneity G, with loglm?

Stewart Schultz Wed, 23 May 2007 05:00:38 -0700


I have numerous replicated goodness-of-fit experiments (observed compared to 
expected counts in categories) and these
replicates are nested within a factor. 
The expected counts in each cell are external (from a
scientific model being tested).  The
calculations I need within each level of the nesting factor are a heterogeneity
G test, with the total G and the pooled G across replicates.  Then I would like 
to form an F ratio equal to
the ratio of pooled G divided by its degrees of freedom to heterogeneity G 
divided
by its degrees of freedom.  The F ratio
would (I think) test the hypothesis that the badness-of-fit in the pooled data
is greater than would be expected by chance from the heterogeneity among 
replicates.



 


It seems that the function loglm is the closest within R to what I want.  But I 
can¢t see
how it can be used when the expected
proportions are externally provided.


 


I¢ve appended here a function I wrote that more or less does
what I want (with the nesting factor ignored) but I would prefer to use 
something
like loglm because of the additional information it offers and its
flexibility with hierarchical models.



 


Thanks,


Stu


 


---------


 


hetg = function() {


#creating some fake data


tran=gl(10,10)   #tran
are the random replicates


cov= factor(rep(1:10,10))
#10 levels of this factor for each level of tran


tleng = c(219, 312, 178, 322, 311, 242, 235, 235, 266, 193)
#weighting for


   #each transect (of
each level of tran)


obscounts = rpois(100, 50) 
#these are the observed response count data


expprop = rep(c(.1, .1, .1, .05, .15, .1, .1, .1, .1, .1),
10)  #expected 


   # proportion of
counts within 10 levels within each level of tran


vv = tapply(obscounts, tran, sum) #get marginal sum of
counts


rr = rep(vv, each=10) #rep the marginal sums across each
cell


expcounts = expprop*rr #the vector of expected counts


 


#Now calculate the G for each level of tran (likelihood
ratios)


G = vector()


for (i in levels(tran)) {


   obsi =
obscounts[tran==i]


   expi =
expcounts[tran==i]


   G[i] =
2*sum((obsi*log(obsi/expi)))


}


 


dfs = rep(max(as.integer(levels(cov)))-1, max(as.integer(levels(cov))))










probs = pchisq(G, dfs, lower.tail=F)  #and the lower tail probability of the G


   #get a weighted
average for the pooled expectations


    weightexp = list()


    weights = vector()


    for (i in
levels(tran)) {


      expi = expprop[tran==i]


      lengi =
tleng[as.integer(i)]


      counti =
sum(obscounts[tran==i])


      weightexp[[i]] =
expi*lengi*counti


      weights[i] =
lengi*counti


    }


    sum = rep(0,
length(weightexp[[1]]))


    for (i in
1:length(weightexp)) {     


      sum = sum +
weightexp[[i]]    


    }


    expproppooled =
sum/sum(weights)


    


    #Now the pooled G


    obs =
tapply(obscounts,cov,sum)


    exp =
expproppooled*sum(obs)


    Gp =
2*sum(obs*log(obs/exp))


    dfp =  max(as.integer(levels(cov)))-1


    probp = pchisq(Gp,
dfp, lower.tail=F)


    


    #total G


    Gt = sum(G)


    dft = sum(dfs)


    probt = pchisq(Gt,
dft, lower.tail=F)


    


    #heterogeneity G


    Gh = Gt-Gp


    dfh = dft - dfp


    probh = pchisq(Gh,
dfh, lower.tail=F)


    


    #F ratio


    Fratio =
(Gp/dfp)/(Gh/dfh)


    probf = pf(Fratio,
dfp, dfh, lower.tail=F)


 


res = list(TotG=c(Gt, dft, probt), PooledG=c(Gp, dfp,
probp),


HetG=c(Gh, dfh, probh), Fratio=c(Fratio, dfp, dfh, probf))


 


res


 } #end hetg






       
____________________________________________________________________________________Take
 the Internet to Go: Yahoo!Go puts the Internet in your pocket: mail, news, 
photos & more. 

        [[alternative HTML version deleted]]

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Replicated LR goodness-of-fit tests, heterogeneity G, with loglm?

Reply via email to