On Wed, 5 Dec 2007, Tim Calkins wrote:

> Hi all -
>
> I'm trying to find a way to create dummy variables from factors in a
> regression. I have been using biglm along the lines of
>
> ff <- log(Price) ~ factor(Colour):factor(Store) +
>     factor(DummyVar):factor(Colour):factor(Store)
>
> lm1 <- biglm(ff, data=my.dataset)
>
> but because there are lots of colours (>100) and lots of stores
> (>250), I run into memory problems. Now, not every store sells every
> colour, so it should be possible to create the matrix of factor
> variables myself and greatly reduce the size of the problem. It seems
> that lm / biglm use all combinations of factor levels when used in
> factor(Colour):factor(Store), so by creating my own matrix of factor
> variables I should be able to reduce the size of the problem
> considerably.
>
> If I have a data frame
>
> > my.dataset <- data.frame(Price=1:12, Colour=c('red','blue','green'),
>     Store=c('a', 'b', 'c', 'a', 'c', 'd', 'e', 'e', 'e', 'e', 'b', 'e'),
>     DummyVar = sort(rep(c(0,1),6)) )
>
> I want to create a data frame with the dummy vars that looks like
>
> red:a red:e blue:b blue:c blue:e green:c green:d green:e
>     1     0      0      0      0       0       0       0
>     0     0      1      0      0       0       0       0
>     0     0      0      0      0       1       0       0
>     1     0      0      0      0       0       0       0
>     0     0      0      1      0       0       0       0
>     0     0      0      0      0       0       1       0
>     0     1      0      0      0       0       0       0
>     0     0      0      0      1       0       0       0
>     0     0      0      0      0       0       0       1
>     0     1      0      0      0       0       0       0
>     0     0      1      0      0       0       0       0
>     0     0      0      0      0       0       0       1
>
> Any ideas would be appreciated.
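The size argument in the question can be checked on the example data: count how many Colour:Store level pairs exist versus how many actually occur. This is an illustrative sketch, not part of the original thread; the names full.grid and observed are hypothetical, and stringsAsFactors = TRUE is added on the assumption of R 4.0 or later, where character columns are no longer converted to factors by default.

```r
# Example data from the question; stringsAsFactors = TRUE reproduces the
# factor columns that older R versions created by default
my.dataset <- data.frame(Price = 1:12,
                         Colour = c("red", "blue", "green"),
                         Store = c("a", "b", "c", "a", "c", "d",
                                   "e", "e", "e", "e", "b", "e"),
                         DummyVar = sort(rep(c(0, 1), 6)),
                         stringsAsFactors = TRUE)

# Hypothetical name: number of Colour:Store level pairs that
# factor(Colour):factor(Store) is coded over
full.grid <- nlevels(my.dataset$Colour) * nlevels(my.dataset$Store)

# Hypothetical name: Colour:Store combinations that actually occur
observed <- unique(paste(my.dataset$Colour, my.dataset$Store, sep = ":"))

full.grid         # 3 colours x 5 stores = 15
length(observed)  # 8, matching the 8 columns in the desired table
```

With more than 100 colours and 250 stores the gap between the two counts can be far larger, which is what makes building the matrix from observed combinations attractive.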
Use

    mat <- model.matrix( ~ ClrStr - 1,
                         transform( my.dataset,
                                    ClrStr = factor( paste(Colour, Store, sep=":") ) ) )

then tidy up the colnames() (each one carries the prefix "ClrStr") and
reorder the columns if their order matters. Because ClrStr only has
levels for the Colour:Store combinations that actually occur in the
data, the matrix has no columns for combinations that are never sold.

----

However, if DummyVar is a categorical variable, the model simply fits a
separate mean of log(Price) to each Colour:Store:DummyVar cell. You
could compute those means on the appropriate subsets by maintaining a
table of sums and counts, then get the residual sum of squares in a
second pass through the data. If the data are already in a database, it
might make sense to do these operations there and import the results
into R for further massaging.

HTH,

Chuck

> --
> Tim Calkins
> 0406 753 997
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
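The second suggestion above (per-cell sums and counts, then a second pass for the residual sum of squares) might look roughly like this in R. It is a sketch under stated assumptions, not code from the thread: DummyVar is treated as categorical, the data arrive as a list of data-frame chunks, and the helpers cell.key, accumulate and two.pass.cell.means are hypothetical names introduced here.

```r
# Sketch only: cell.key, accumulate and two.pass.cell.means are hypothetical
# helpers. Assumes DummyVar is categorical, so each fitted value is the mean
# of log(Price) within its Colour:Store:DummyVar cell.

# Label each row with its Colour:Store:DummyVar cell
cell.key <- function(d) paste(d$Colour, d$Store, d$DummyVar, sep = ":")

# Add one chunk's per-cell totals of `values` to the running named vector `acc`
accumulate <- function(acc, values, key) {
  s <- rowsum(values, key)                  # one row per cell seen in the chunk
  s <- setNames(as.vector(s), rownames(s))
  keys <- union(names(acc), names(s))
  out <- setNames(numeric(length(keys)), keys)
  out[names(acc)] <- acc
  out[names(s)] <- out[names(s)] + s
  out
}

# `chunks` is a list of data frames, e.g. pieces read from a file or database
two.pass.cell.means <- function(chunks) {
  sums <- numeric(0)
  counts <- numeric(0)
  # Pass 1: running sums and counts of log(Price) per cell
  for (d in chunks) {
    y <- log(d$Price)
    k <- cell.key(d)
    sums <- accumulate(sums, y, k)
    counts <- accumulate(counts, rep(1, length(y)), k)
  }
  means <- sums / counts[names(sums)]
  # Pass 2: residual sum of squares about the cell means
  rss <- 0
  for (d in chunks) {
    r <- log(d$Price) - means[cell.key(d)]
    rss <- rss + sum(r^2)
  }
  list(means = means, rss = rss,
       df.residual = sum(counts) - length(means))
}

# Usage on the example data, split into two pieces to mimic chunked input
my.dataset <- data.frame(Price = 1:12,
                         Colour = c("red", "blue", "green"),
                         Store = c("a", "b", "c", "a", "c", "d",
                                   "e", "e", "e", "e", "b", "e"),
                         DummyVar = sort(rep(c(0, 1), 6)),
                         stringsAsFactors = TRUE)
fit <- two.pass.cell.means(split(my.dataset, rep(1:2, each = 6)))
```

Only the per-cell sums and counts are kept in memory, so storage grows with the number of occupied cells rather than the number of rows. On data small enough to fit in memory, fit$rss should agree with deviance() from an lm() fit on interaction(Colour, Store, DummyVar).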