Hi. Background: I am working with a dataset of around 750K observations, where most of the covariates (8 of 11) are unordered factors.
The model typically used for this relationship in the literature is a simple linear additive model, but this is rejected out of hand by the data. I was asked to model this via kernel methods, but first wanted to play with the parametric specification out of curiosity. I thought it would be interesting to see what type of model stepwise BIC would yield, and have been playing with the step() function (on R-beta, due to the factor.scope() problem that has been fixed in the patched and beta versions).

I am running this on a 64-bit box with 32GB of RAM and tons of swap, but am hitting the memory wall, as memory needs occasionally grow to ungodly proportions (in the early iterations the program starts out around 8GB but quickly grows to 15GB, then grows from there). This is not due to my using the beta version, as it also arises under R-2.2.1, for what that is worth.

My question is whether there is some simple way to substantially reduce the memory footprint of this procedure. I took a look at previous posts on step() and memory issues, but still wonder whether there might be a switch, or possibly a better way of constructing my model, that would overcome the memory issues.

I include the code below, and any comments or suggestions would be most welcome (besides `what type of idiot lets information criteria determine their model ;-)').

Thanks ever so much in advance.
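For a sense of scale, the sketch below estimates how large a single dense design matrix could get under a scope like the one in my code. It is only a back-of-envelope illustration: the data frame `toy`, its factor level counts, and the helper `gb()` are made up for the example and are not the real data, and the scope is cut down to quadratic terms in two continuous covariates. The working assumption, which I have not checked against the step() internals, is that each candidate fit needs at least one n-by-p matrix of doubles.

```r
## Toy stand-in for the real data. The level counts (5, 20, 10) are
## hypothetical and chosen only to make the example small and fast.
set.seed(1)
m <- 2000
toy <- data.frame(
  logrprice    = rnorm(m),
  luc          = factor(sample(1:5,  m, replace = TRUE)),
  municipality = factor(sample(1:20, m, replace = TRUE)),
  time         = factor(sample(1:10, m, replace = TRUE)),
  distance     = runif(m),
  logr         = rnorm(m)
)

## Number of design-matrix columns for the additive model and for a
## reduced "upper" model with squares and all pairwise interactions.
p.linear <- ncol(model.matrix(logrprice ~ ., data = toy))
p.upper  <- ncol(model.matrix(
  logrprice ~ (. + I(distance^2) + I(logr^2))^2, data = toy))

## Storage for one dense n x p matrix of doubles (8 bytes each), in GB,
## evaluated at the full sample size.
n.full <- 745466
gb <- function(p, n = n.full) n * p * 8 / 2^30

c(p.linear = p.linear, p.upper = p.upper)
c(linear = gb(p.linear), upper = gb(p.upper))
```

At n=745466, every 1000 columns costs roughly 5.6GB under this assumption, so plugging the real level counts into the same calculation should indicate whether the upper model can fit in 32GB at all.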
-- Jeff

---- Begin ----

## Read in the full data set (n=745466 observations)

data <- read.table("../data_header.dat",header=TRUE)

## Create a data frame with all categorical variables declared as
## unordered factors

data <- data.frame(logrprice=data$logrprice,
                   cgt=factor(data$cgt),
                   cag=factor(data$cag),
                   gstann=factor(data$gstann),
                   fhogann=factor(data$fhogann),
                   gstfhog=factor(data$gstfhog),
                   luc=factor(data$luc),
                   municipality=factor(data$municipality),
                   time=factor(data$time),
                   distance=data$distance,
                   logr=data$logr,
                   loginc=data$loginc)

## Estimate a simple linear model (used repeatedly in the literature,
## fails the most simple of model specification tests e.g.,
## resettest())

model.linear <- lm(logrprice~.,data=data)

## Now conduct stepwise (BIC) regression using the step() function in
## the stats library. The lower model is the unconditional mean of y,
## the upper having polynomials of up to order 6 in the three
## continuous covariates, with interaction among all variables of
## order 2.

n <- nrow(data)

model.bic <- step(model.linear,
                  scope=list(
                    lower=~ 1,
                    upper=~ (.
                             +I(logr^2)
                             +I(logr^3)
                             +I(logr^4)
                             +I(logr^5)
                             +I(logr^6)
                             +I(distance^2)
                             +I(distance^3)
                             +I(distance^4)
                             +I(distance^5)
                             +I(distance^6)
                             +I(loginc^2)
                             +I(loginc^3)
                             +I(loginc^4)
                             +I(loginc^5)
                             +I(loginc^6))
                    ^2),
                  trace=TRUE,
                  k=log(n)
                  )

summary(model.bic)

---- End ----

--
Professor J. S. Racine         Phone:  (905) 525 9140 x 23825
Department of Economics        FAX:    (905) 521-8232
McMaster University            e-mail: [EMAIL PROTECTED]
1280 Main St. W.,Hamilton,     URL:    http://www.economics.mcmaster.ca/racine/
Ontario, Canada. L8S 4M4

`The generation of random numbers is too important to be left to chance.'

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html