[R] mu^2(1-mu)^2 variance function for GLM

2005-06-16 Thread Henric Nilsson
Dear list,

I'm trying to mimic the analysis of Wedderburn (1974) as cited by
McCullagh and Nelder (1989) on p.328-332. This is the leaf-blotch on 
barley example, and the data is available in the `faraway' package.

Wedderburn suggested using the variance function mu^2(1-mu)^2. This
variance function isn't readily available in R's `quasi' family object, 
but it seems to me that the following definition could be used:

}, "mu^2(1-mu)^2" = {
 variance <- function(mu) mu^2 * (1 - mu)^2
 validmu <- function(mu) all(mu > 0) && all(mu < 1)
 dev.resids <- function(y, mu, wt) 2 * wt * ((2 * y - 1) *
 (log(ifelse(y == 0, 1, y/mu)) - log(ifelse(y == 1, 1,
 (1 - y)/(1 - mu - 2 + y/mu + (1 - y)/(1 - mu))

I've modified the `quasi' function accordingly (into `quasi2' given 
below) and my results are very much in line with the ones cited by 
McCullagh and Nelder on p.330-331:

> data(leafblotch, package = "faraway")
> summary(fit <- glm(blotch ~ site + variety,
+ family = quasi2(link = "logit", variance = "mu^2(1-mu)^2"),
+ data = leafblotch))

Call:
glm(formula = blotch ~ site + variety, family = quasi2(link = "logit",
 variance = "mu^2(1-mu)^2"), data = leafblotch)

Deviance Residuals:
  Min1QMedian3Q   Max
-3.23175  -0.65385  -0.09426   0.46946   1.97152

Coefficients:
 Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.922530.44463 -17.818  < 2e-16 ***
site21.383080.44463   3.111  0.00268 **
site33.860130.44463   8.682 8.18e-13 ***
site43.556970.44463   8.000 1.53e-11 ***
site54.108410.44463   9.240 7.48e-14 ***
site64.305410.44463   9.683 1.13e-14 ***
site74.918110.44463  11.061  < 2e-16 ***
site85.694920.44463  12.808  < 2e-16 ***
site97.067620.44463  15.896  < 2e-16 ***
variety2-0.467280.46868  -0.997  0.32210
variety3 0.078770.46868   0.168  0.86699
variety4 0.954180.46868   2.036  0.04544 *
variety5 1.352760.46868   2.886  0.00514 **
variety6 1.328590.46868   2.835  0.00595 **
variety7 2.340660.46868   4.994 3.99e-06 ***
variety8 3.262680.46868   6.961 1.30e-09 ***
variety9 3.135560.46868   6.690 4.10e-09 ***
variety103.887360.46868   8.294 4.33e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasi family taken to be 0.9884755)

 Null deviance: 339.488  on 89  degrees of freedom
Residual deviance:  71.961  on 72  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 18


Also, the plot of the Pearson residuals against the linear predictor

> plot(residuals(fit, type = "pearson") ~ fit$linear.predictors)
> abline(h = 0, lty = 2)

results in a plot that, to my eyes at least, is very close to Fig. 9.2
on p. 332.

However, I can't seem to find any other published examples using this
variance function. I'd really like to verify that my code above is
working before applying it to real data sets. Can anybody help?

Thanks,
Henric
- - - - -
quasi2 <- function (link = "identity", variance = "constant")
{
 linktemp <- substitute(link)
 if (is.expression(linktemp) || is.call(linktemp))
 linktemp <- link
 else if (!is.character(linktemp))
 linktemp <- deparse(linktemp)
 if (is.character(linktemp))
 stats <- make.link(linktemp)
 else stats <- linktemp
 variancetemp <- substitute(variance)
 if (!is.character(variancetemp)) {
 variancetemp <- deparse(variancetemp)
 if (linktemp == "variance")
 variancetemp <- eval(variance)
 }
 switch(variancetemp, constant = {
 variance <- function(mu) rep.int(1, length(mu))
 dev.resids <- function(y, mu, wt) wt * ((y - mu)^2)
 validmu <- function(mu) TRUE
 }, "mu(1-mu)" = {
 variance <- function(mu) mu * (1 - mu)
 validmu <- function(mu) all(mu > 0) && all(mu < 1)
 dev.resids <- function(y, mu, wt) 2 * wt * (y * log(ifelse(y ==
 0, 1, y/mu)) + (1 - y) * log(ifelse(y == 1, 1, (1 -
 y)/(1 - mu
 }, "mu^2(1-mu)^2" = {
 variance <- function(mu) mu^2 * (1 - mu)^2
 validmu <- function(mu) all(mu > 0) && all(mu < 1)
 dev.resids <- function(y, mu, wt) 2 * wt * ((2 * y - 1) *
 (log(ifelse(y == 0, 1, y/mu)) - log(ifelse(y == 1, 1,
 (1 - y)/(1 - mu - 2 + y/mu + (1 - y)/(1 - mu))
 }, mu = {
 variance <- function(mu) mu
 validmu <- function(mu) all(mu > 0)
 dev.resids <- function(y, mu, wt) 2 * wt * (y * log(ifelse(y ==
 0, 1, y/mu)) - (y - mu))
 }, "mu^2" = {
 variance <- function(mu) mu^2
 validmu <- function(mu) all(mu > 0)
 dev.resids <- function(y, mu, wt) pmax(-2 * wt * (log(ifelse(y ==
 0, 1, y)/mu) - (y - mu)/mu), 0)
 }, "mu^3" = {
 variance <- function(mu) 

Re: [R] mu^2(1-mu)^2 variance function for GLM

2005-06-16 Thread David Firth
Dear Henric

I do not have a ready stock of other examples, but I do have my own 
version of a family function for this, reproduced below.  It differs 
from yours (apart from being a regular family function rather than 
using a modified "quasi") in the definition of deviance residuals.  
These necessarily involve an arbitrary constant (see McCullagh and 
Nelder, 1989, p330); in my function that arbitrariness is in the choice 
eps <- 0.0005.  I don't think the deviance contributions as you 
specified in your code below will have the right derivative (with 
respect to mu) for observations where y=0 or y=1.

Anyway, this at least gives you some kind of check.  I hope it helps.  
This function will be part of a new package which Heather Turner and I 
will submit to CRAN in a few days' time.  Please do let me know if you 
find any problems with it.

Here is my "wedderburn" family function:

"wedderburn" <-
 function (link = "logit")
{
 linktemp <- substitute(link)
 if (!is.character(linktemp)) {
 linktemp <- deparse(linktemp)
 if (linktemp == "link")
 linktemp <- eval(link)
 }
 if (any(linktemp == c("logit", "probit", "cloglog")))
 stats <- make.link(linktemp)
 else stop(paste(linktemp,
 "link not available for wedderburn quasi-family;",
 "available links are",
 "\"logit\", \"probit\" and \"cloglog\""))
 variance <- function(mu)  mu^2 * (1-mu)^2
 validmu <- function(mu) {
 all(mu > 0) && all(mu < 1)}
 dev.resids <- function(y, mu, wt){
 eps <-  0.0005
 2 * wt * (y/mu + (1 - y)/(1 - mu) - 2 +
   (2 * y - 1) * log((y + eps)*(1 - mu)/((1- y + eps) * 
mu)))
 }
 aic <- function(y, n, mu, wt, dev) NA
 initialize <- expression({
 if (any(y < 0 | y > 1)) stop(paste(
"Values for the wedderburn family must be in [0,1]"))
 n <- rep.int(1, nobs)
 mustart <- (y + 0.1)/1.2
 })
 structure(list(family = "wedderburn",
link = linktemp,
linkfun = stats$linkfun,
linkinv = stats$linkinv,
variance = variance,
dev.resids = dev.resids,
aic = aic,
mu.eta = stats$mu.eta,
initialize = initialize,
validmu = validmu,
valideta = stats$valideta),
   class = "family")
}

Best wishes,
David
http://www.warwick.ac.uk/go/dfirth

On 16 Jun 2005, at 09:27, Henric Nilsson wrote:

> Dear list,
>
> I'm trying to mimic the analysis of Wedderburn (1974) as cited by
> McCullagh and Nelder (1989) on p.328-332. This is the leaf-blotch on
> barley example, and the data is available in the `faraway' package.
>
> Wedderburn suggested using the variance function mu^2(1-mu)^2. This
> variance function isn't readily available in R's `quasi' family object,
> but it seems to me that the following definition could be used:
>
> }, "mu^2(1-mu)^2" = {
>  variance <- function(mu) mu^2 * (1 - mu)^2
>  validmu <- function(mu) all(mu > 0) && all(mu < 1)
>  dev.resids <- function(y, mu, wt) 2 * wt * ((2 * y - 1) *
>  (log(ifelse(y == 0, 1, y/mu)) - log(ifelse(y == 1, 1,
>  (1 - y)/(1 - mu - 2 + y/mu + (1 - y)/(1 - mu))
>
> I've modified the `quasi' function accordingly (into `quasi2' given
> below) and my results are very much in line with the ones cited by
> McCullagh and Nelder on p.330-331:
>
>> data(leafblotch, package = "faraway")
>> summary(fit <- glm(blotch ~ site + variety,
> + family = quasi2(link = "logit", variance = "mu^2(1-mu)^2"),
> + data = leafblotch))
>
> Call:
> glm(formula = blotch ~ site + variety, family = quasi2(link = "logit",
>  variance = "mu^2(1-mu)^2"), data = leafblotch)
>
> Deviance Residuals:
>   Min1QMedian3Q   Max
> -3.23175  -0.65385  -0.09426   0.46946   1.97152
>
> Coefficients:
>  Estimate Std. Error t value Pr(>|t|)
> (Intercept) -7.922530.44463 -17.818  < 2e-16 ***
> site21.383080.44463   3.111  0.00268 **
> site33.860130.44463   8.682 8.18e-13 ***
> site43.556970.44463   8.000 1.53e-11 ***
> site54.108410.44463   9.240 7.48e-14 ***
> site64.305410.44463   9.683 1.13e-14 ***
> site74.918110.44463  11.061  < 2e-16 ***
> site85.694920.44463  12.808  < 2e-16 ***
> site97.067620.44463  15.896  < 2e-16 ***
> variety2-0.467280.46868  -0.997  0.32210
> variety3 0.078770.46868   0.168  0.86699
> variety4 0.954180.46868   2.036  0.04544 *
> variety5 1.352760.46868   2.886  0.00514 **
> variety6 1.328590.46868   2.835  0.00595 **
> variety7 2.340660.46868   4.994 3.99e-06 ***
> variety8 3.262680.46868   6.961 1.30e-09 ***
> variety9 3.135560.46868   6.690

Re: [R] mu^2(1-mu)^2 variance function for GLM

2005-06-22 Thread Henric Nilsson
Dear Professor Firth,

David Firth said the following on 2005-06-16 17:22:

> I do not have a ready stock of other examples, but I do have my own 
> version of a family function for this, reproduced below.  It differs 
> from yours (apart from being a regular family function rather than using 
> a modified "quasi") in the definition of deviance residuals.  These 
> necessarily involve an arbitrary constant (see McCullagh and Nelder, 
> 1989, p330); in my function that arbitrariness is in the choice eps <- 
> 0.0005.  I don't think the deviance contributions as you specified in 
> your code below will have the right derivative (with respect to mu) for 
> observations where y=0 or y=1.

I'm sorry for the late reply.

You're right -- my definition of the deviance residuals isn't correct. 
Your code, on the other hand, seems to do the right thing.

Many thanks for this note and the provided `wedderburn' function.


Henric

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html