Benilton Carvalho wrote:
> Hi,
>
> Until now, I thought that the results of glm() and bigglm() would
> coincide. Probably a naive assumption?
>
> Anyways, I've been using bigglm() on some datasets I have available.
> One of the sets has >15M observations.
>
> I have 3 continuous predictors (A, B, C) and a binary outcome (Y).
> And tried the following:
>
> m1 <- bigglm(Y~A+B+C, family=binomial(), data=dataset1, chunksize=10e6)
> m2 <- bigglm(Y~A*B+C, family=binomial(), data=dataset1, chunksize=10e6)
> imp <- m1$deviance-m2$deviance
>
> To my surprise "imp" was negative.
>
> I then tried the same models, using glm() instead... and as I
> expected, "imp" was positive.
>
> I also noticed differences in the coefficients estimated by glm() and
> bigglm() - small differences, though, and CIs for the coefficients (a
> given coefficient compared across methods) overlap.
>
> Are such incongruences expected? What can I use to check for
> convergence with bigglm(), as this might be one plausible cause for a
> negative difference in the deviances?
>

It doesn't sound right, but I cannot reproduce your problem on a
similarly sized problem (it pretty much killed my machine...). Some observations:

A: You do realize that you are only using 1.5 chunks? (15M observations
vs. chunksize=10e6)

B: Deviance changes are O(1) under the null hypothesis but the deviances
themselves are O(N). In a smaller variant (N=1e5), I got

> m1$deviance
[1] 138626.4
> m2$deviance
[1] 138626.4
> m2$deviance - m1$deviance
[1] -0.05865785

This does leave some scope for roundoff to creep in. You may want to
play with a lower setting of tol=...

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907
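
To illustrate point B (the deviances are O(N) while their difference is O(1)), here is a minimal sketch using only base R glm() on simulated data. The data frame `d`, the seed, and the glm.control() settings are assumptions made for illustration; it does not use bigglm() and does not reproduce the reported problem. It only shows how many significant digits each deviance has to carry before the sign of the difference can be trusted.

```r
# Simulated stand-in for the poster's data (hypothetical): Y is independent
# of A, B and C, so the null hypothesis of no A:B interaction holds.
set.seed(1)
N <- 1e5
d <- data.frame(A = rnorm(N), B = rnorm(N), C = rnorm(N),
                Y = rbinom(N, 1, 0.5))

# Tight convergence settings, so that the IRLS stopping error is small
# compared with the O(1) deviance difference under study.
ctrl <- glm.control(epsilon = 1e-10, maxit = 50)
g1 <- glm(Y ~ A + B + C, family = binomial(), data = d, control = ctrl)
g2 <- glm(Y ~ A * B + C, family = binomial(), data = d, control = ctrl)

dev1 <- deviance(g1)
dev2 <- deviance(g2)
imp  <- dev1 - dev2   # gain from adding A:B; nested models, so >= 0 at the MLE

# Relative size of the difference: roughly the precision each deviance
# needs for the sign of 'imp' to be meaningful.
rel <- abs(imp) / dev1

print(c(dev1 = dev1, dev2 = dev2, imp = imp, rel = rel))
print(c(converged1 = g1$converged, converged2 = g2$converged))
```

If `rel` comes out around 1e-6 or smaller, a fitting routine that stops at a relative tolerance of similar size can plausibly return deviances whose difference has the wrong sign, which is why a lower tol= is worth trying.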