I am trying to figure out how 'biglm' can handle large data sets...
According to the R documentation: "biglm creates a linear model object that uses only p^2 memory for p variables. It can be updated with more data using update. This allows linear regression on data sets larger than memory."

After reading the source code below, I still could not figure out how 'update' implements the algorithm. Thanks for any light shed on this.

> biglm::biglm
function (formula, data, weights = NULL, sandwich = FALSE)
{
    tt <- terms(formula)
    if (!is.null(weights)) {
        if (!inherits(weights, "formula"))
            stop("`weights' must be a formula")
        w <- model.frame(weights, data)[[1]]
    }
    else w <- NULL
    mf <- model.frame(tt, data)
    mm <- model.matrix(tt, mf)
    qr <- bigqr.init(NCOL(mm))
    qr <- update(qr, mm, model.response(mf), w)
    rval <- list(call = sys.call(), qr = qr, assign = attr(mm, "assign"),
        terms = tt, n = NROW(mm), names = colnames(mm), weights = weights)
    if (sandwich) {
        p <- ncol(mm)
        n <- nrow(mm)
        xyqr <- bigqr.init(p * (p + 1))
        xx <- matrix(nrow = n, ncol = p * (p + 1))
        xx[, 1:p] <- mm * model.response(mf)
        for (i in 1:p) xx[, p * i + (1:p)] <- mm * mm[, i]
        xyqr <- update(xyqr, xx, rep(0, n), w * w)
        rval$sandwich <- list(xy = xyqr)
    }
    rval$df.resid <- rval$n - length(qr$D)
    class(rval) <- "biglm"
    rval
}
<environment: namespace:biglm>

--
View this message in context: http://r.789695.n4.nabble.com/biglm-how-it-handles-large-data-set-tp3020890p3020890.html
Sent from the R help mailing list archive at Nabble.com.
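For context, here is my current understanding of why only p^2 memory is needed, as a toy sketch (in Python/NumPy, with made-up names, not anything from the package): for least squares you never need all n rows at once, because the sufficient statistics X'X (p x p) and X'y (p) can be accumulated chunk by chunk. Note that biglm itself does not form X'X; as far as I can tell it updates a QR-type factorization instead, which is more numerically stable, but the memory footprint argument is the same.

```python
import numpy as np

# Simulated data standing in for a file too big to hold in memory.
rng = np.random.default_rng(0)
n, p = 1000, 4
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Only these two accumulators persist: p*p + p numbers, independent of n.
XtX = np.zeros((p, p))
Xty = np.zeros(p)

# Stream the data in chunks, as update() would with successive data frames.
for start in range(0, n, 100):
    Xc, yc = X[start:start + 100], y[start:start + 100]
    XtX += Xc.T @ Xc
    Xty += Xc.T @ yc

# Solve the normal equations from the accumulated statistics.
beta = np.linalg.solve(XtX, Xty)
```

The result is identical (up to rounding) to fitting on the full matrix at once, which is what makes chunked updating possible.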
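And here is a sketch of what I think bigqr's update is actually doing. This is not the package's code (biglm appears to be based on Alan Miller's AS274 routine, which stores the factorization in a compact form); it is just the textbook row-at-a-time orthogonal reduction, written out in Python so the O(p^2) memory claim is visible: each new observation is folded into a p x p triangular factor R and a length-p rotated response via Givens rotations, after which the row can be discarded.

```python
import numpy as np

def include_row(R, qty, x, y):
    """Fold one observation (x, y) into the upper-triangular factor R
    and the rotated response qty using Givens rotations.  O(p^2) work,
    O(p^2) state -- nothing about earlier rows is retained."""
    x, y = x.astype(float).copy(), float(y)
    p = len(x)
    for i in range(p):
        if x[i] == 0.0:
            continue
        r = np.hypot(R[i, i], x[i])
        c, s = R[i, i] / r, x[i] / r
        # Rotate row i of R against the incoming row (zeroes x[i]).
        Ri = R[i, i:].copy()
        R[i, i:] = c * Ri + s * x[i:]
        x[i:] = -s * Ri + c * x[i:]
        # Apply the same rotation to the transformed response.
        t = qty[i]
        qty[i] = c * t + s * y
        y = -s * t + c * y

# Stream rows one at a time; only R (p x p) and qty (p) persist.
rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.01, size=n)

R = np.zeros((p, p))
qty = np.zeros(p)
for i in range(n):
    include_row(R, qty, X[i], y[i])

# Back-substitution on the triangle recovers the coefficients.
beta = np.linalg.solve(R, qty)
```

If this is roughly right, then update(qr, mm, ...) just pushes each new block of model-matrix rows through such a reduction, which would explain how more data can be absorbed without ever revisiting old rows. Corrections welcome.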