Hi all, I am fitting a logistic regression model on binary data. I care about the fitted probabilities, so I am not worried about infinite (or non-existent) MLEs. I use:
> glm(Y~., data=X, weights=wgt, family=binomial(link=logit), maxit=250)

I understand the three ways to fit the model, and in my case Y is a factor, one column:

> Y <- c(rep("A",679), rep("B",38))
> Y <- as.factor(Y)

My question is about the weights. I can use integer weights, which makes more mathematical sense:

> wgt <- c(rep(1,679), rep(17,38))

or I can use

> wgt <- c(rep(38/679,679), rep(1,38))

which makes more sense for my problem, but the mathematics is weak, as I am using non-integer successes in a Bernoulli...

I estimate the accuracy 'out of the bag' over 10000 experiments to get

         | integer wgt          | non-int wgt
-------- + -------------------- + --------------------
accuracy | A = 94.9%  B = 82.3% | A = 94.7%  B = 83.3%
std.dev. |     2.3%      15.4%  |     2.6%      13.2%
avg. AIC | 707                  | 124

As I understand it, non-integer weights are more respectful of what I observe: instead of augmenting the successes on the rare class, which I did not observe, they simply down-weight the successes on the populous class. The populations can be thought of as equal, and only the sample sizes are unbalanced.

Predictions also look better, so I was hoping that the continuity of the Binomial for N in [0,1] and X in [0,1] could guarantee that my results still make sense, but I am not sure. Any thoughts?

Thanks,
Edo
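
P.S. To make the weighting question concrete, here is a sketch of the objective I believe a weighted Bernoulli fit maximises. The symbols $w_i$, $y_i$, $x_i$, $\beta$ and $p_i$ are my own notation, not anything taken from the glm output, and $y_i \in \{0,1\}$ is assumed to be the 0/1 coding of the factor Y.

```latex
% Weighted Bernoulli log-likelihood with a logit link (assumed notation)
\ell_w(\beta) = \sum_{i=1}^{n} w_i \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right],
\qquad
p_i = \frac{1}{1 + e^{-x_i^{\top}\beta}}

% Rescaling every weight by a constant c > 0 rescales the objective
% but leaves its maximiser unchanged:
\ell_{c w}(\beta) = c \, \ell_w(\beta)
\quad\Longrightarrow\quad
\arg\max_{\beta} \ell_{c w}(\beta) = \arg\max_{\beta} \ell_w(\beta)

% The two schemes above, with the second one rescaled by 679/38:
(w_A, w_B) = (1,\ 17)
\qquad\text{vs.}\qquad
\tfrac{679}{38}\left(\tfrac{38}{679},\ 1\right) = \left(1,\ \tfrac{679}{38}\right) \approx (1,\ 17.87)
```

Read this way, the two schemes put nearly the same relative weight on class B (17 versus about 17.87) but differ a lot in overall scale (total weight 679 + 17·38 = 1325 versus 38 + 38 = 76). If this sketch is right, that would be one reason the average AIC values in the table are not directly comparable.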