Re: [R] Multinomial Logit Model with lots of Dummy Variables
Hi,

Thanks to Jeremy for his response. I have been able to generate the factors and build the mlogit data using his code:

    mldata <- mlogit.data(mydata, varying = NULL, choice = "pitch_type_1", shape = "wide")

My mlogit data looks like:

    dependent_var,A variable,B Var,chid,alt
    FALSE,110,19,1,0
    FALSE,110,19,1,1
    FALSE,110,19,1,2
    FALSE,110,19,1,3
    FALSE,110,19,1,4
    TRUE,110,19,1,5
    FALSE,110,19,1,6
    FALSE,110,19,1,7
    FALSE,110,19,1,8
    FALSE,110,19,2,0
    FALSE,110,19,2,1
    FALSE,110,19,2,2
    FALSE,110,19,2,3
    FALSE,110,19,2,4
    FALSE,110,19,2,5
    TRUE,110,19,2,6
    FALSE,110,19,2,7
    FALSE,110,19,2,8
    TRUE,110,561,3,0
    FALSE,110,561,3,1
    FALSE,110,561,3,2
    FALSE,110,561,3,3
    FALSE,110,561,3,4
    FALSE,110,561,3,5
    FALSE,110,561,3,6
    FALSE,110,561,3,7
    FALSE,110,561,3,8
    FALSE,110,149,4,0
    FALSE,110,149,4,1
    TRUE,110,149,4,2
    ...

The mldata contains 651431 rows. If I try to run the full data set, I get the following error:

    mlogit.model <- mlogit(dependent_var ~ 0 | A + B, data = mldata, reflevel = "0")
    Error in model.matrix.default(formula, data) :
      allocMatrix: too many elements specified
    Calls: mlogit ... model.matrix.mFormula -> model.matrix -> model.matrix.default
    Execution halted

With smaller datasets (595 mldata rows), mlogit works fine and generates regression output.

Is there a problem with mlogit and huge datasets? I suppose this is perhaps not the best way to assess this kind of data, but I am trying to replicate a previous analysis that was completed on a similar amount of similar data.

--
View this message in context: http://r.789695.n4.nabble.com/Multinomial-Logit-Model-with-lots-of-Dummy-Variables-tp3439492p3455345.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
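A rough size estimate suggests why the full data set fails where 595 rows succeed. The sketch below is back-of-the-envelope arithmetic, not output from mlogit. It assumes that A and B keep the 965 and 805 levels mentioned in the original post, and that each of the 8 non-reference alternatives gets its own intercept and its own set of factor coefficients. The exact column count mlogit builds may differ, so treat `n_cols` as an estimate.

```r
# Assumed sizes, taken from the thread (not measured from the real data).
n_rows   <- 651431   # rows in mldata, as reported above
n_alt    <- 9        # choices 0 to 8
levels_A <- 965      # assumed number of levels of factor A
levels_B <- 805      # assumed number of levels of factor B

# Per non-reference alternative: one intercept plus treatment contrasts
# (levels - 1 columns) for each factor.
coef_per_alt <- 1 + (levels_A - 1) + (levels_B - 1)
n_cols <- (n_alt - 1) * coef_per_alt

# Doubles are used here on purpose: the product overflows R's integer type.
n_elements <- n_rows * n_cols

cat("estimated columns:        ", n_cols, "\n")
cat("estimated matrix elements:", format(n_elements, big.mark = ","), "\n")
cat("exceeds integer maximum:  ", n_elements > .Machine$integer.max, "\n")
cat("approx. size in GiB:      ", round(n_elements * 8 / 2^30, 1), "\n")
```

If this estimate is in the right range, a dense model matrix of that shape has more cells than the integer maximum (`.Machine$integer.max`), which is consistent with the `allocMatrix` message, and it would need tens of GiB as doubles in any case. That points to the number of factor levels, not a general defect in mlogit, as the likely limit.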
[R] Multinomial Logit Model with lots of Dummy Variables
Hi All,

I am attempting to build a Multinomial Logit model with dummy variables of the following form:

Dependent Variable: 0-8 Discrete Choices
Dummy Variable 1: 965 dummy vars
Dummy Variable 2: 805 dummy vars

The data set I am using has the dummy columns pre-created, so it's a table of 72,381 rows and 1770 columns.

The first 965 columns represent the dummy columns for Variable 1.
The next 805 columns represent the dummy columns for Variable 2.

My code to build the mlogit model looks like the following. I want to know: is there a better way of doing this without these huge equations? (I probably also need a more powerful PC to do all of this.)

I'll also want to perform a joint test of significance on the first 805 coefficients. Is this possible?

Thanks
GP

[code]
# load the mlogit package
library(mlogit)

# load mydata
mydata <- read.csv(file = "G:\\data.csv", header = TRUE)

# one row per observation: 965 + 805 dummy columns plus the dependent variable
num.rows <- nrow(mydata)
num.cols <- 965 + 805 + 1
my_data <- matrix(0, nrow = num.rows, ncol = num.cols)

for (i in 1:num.rows) {
  nb <- mydata[i, 2]                   # level of variable 1
  np <- mydata[i, 3]                   # level of variable 2
  my_data[i, nb] <- 1                  # set the variable 1 dummy
  my_data[i, 965 + np] <- 1            # set the variable 2 dummy
  my_data[i, 1 + 1770] <- mydata[i, 1] # dependent variable in the last column
}

# convert matrix to data frame
my_data_frame <- as.data.frame(my_data)

# check data frame headers
head(my_data_frame)

# load data frame into mldata with choice variable
mldata <- mlogit.data(my_data_frame, varying = NULL, choice = "V1771", shape = "wide")

# V1771 = dependent var
# V1-V965 = variable 1 dummies
# V966-V1770 = variable 2 dummies

# regress V1771 against all 1770 variables...
mlogit.model <- mlogit(V1771 ~ 0 | V1 + V2 + V3...+V1770, data = mldata, reflevel = "0")
[/code]

--
View this message in context: http://r.789695.n4.nabble.com/Multinomial-Logit-Model-with-lots-of-Dummy-Variables-tp3439492p3439492.html
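On the question of avoiding a hand-typed formula with 1770 terms: one option is to assemble the formula from the column names. The sketch below uses only base R on a small made-up table (`toy`, `n_a`, `n_b` and the other names are hypothetical stand-ins, with 4 and 3 levels in place of 965 and 805). It also shows a vectorised way to fill the dummy matrix without the row loop. It does not call mlogit, so whether the resulting model fits in memory is a separate matter.

```r
# Toy stand-in for the real table; sizes are illustrative only.
n_a <- 4   # the post has 965 levels for variable 1
n_b <- 3   # the post has 805 levels for variable 2
toy <- data.frame(choice = c(0, 2, 1, 0, 2),
                  nb     = c(1, 4, 2, 3, 1),
                  np     = c(3, 1, 2, 2, 3))

# Vectorised replacement for the row-by-row loop: a two-column
# (row, column) index matrix sets one cell per row in one assignment.
rows <- seq_len(nrow(toy))
dummies <- matrix(0, nrow = nrow(toy), ncol = n_a + n_b + 1)
dummies[cbind(rows, toy$nb)] <- 1
dummies[cbind(rows, n_a + toy$np)] <- 1
dummies[, n_a + n_b + 1] <- toy$choice
dummy_frame <- as.data.frame(dummies)   # columns are named V1, V2, ...

# Build "V8 ~ 0 | V1 + V2 + ... + V7" from the column names
# instead of typing every term.
choice_col <- paste0("V", n_a + n_b + 1)
predictors <- paste0("V", seq_len(n_a + n_b))
model_formula <- as.formula(
  paste(choice_col, "~ 0 |", paste(predictors, collapse = " + "))
)
print(model_formula)
```

With the real data, `n_a` and `n_b` would be 965 and 805, and `model_formula` could then be passed where the long formula was written out by hand.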
Re: [R] Multinomial Logit Model with lots of Dummy Variables
If you are just looking to collapse the dummy variables into two factor variables, the following will work.

    ## Generate some example data
    set.seed(1234)
    n <- 100

    # Generate outcome
    outcome <- rbinom(n, 3, 0.5)

    # Generate dummy variables for A and B (exactly one 1 per row)
    A <- t(apply(matrix(nrow = 100, ncol = 5), 1, function(x) {
      sample(c(1, 0, 0, 0, 0))
    }))
    B <- t(apply(matrix(nrow = 100, ncol = 5), 1, function(x) {
      sample(c(1, 0, 0, 0, 0))
    }))

    # Combine into data frame
    dat <- data.frame(outcome, A, B)
    names(dat) <- c('outcome', paste("A", 1:5, sep = ""), paste("B", 1:5, sep = ""))
    head(dat)

    ## Collapse dummies to factor variable
    # For each row, return the name of the A column that equals 1
    A <- apply(dat, 1, function(x) {
      A <- x[2:6]
      A.names <- names(x[2:6])
      A.value <- A.names[A == 1]
      return(A.value)
    })
    # Same for the B columns
    B <- apply(dat, 1, function(x) {
      B <- x[7:11]
      B.names <- names(x[7:11])
      B.value <- B.names[B == 1]
      return(B.value)
    })

    # Combine into new data frame
    dat.new <- data.frame(dat$outcome, A, B)
    head(dat.new)

Jeremy
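For a table with tens of thousands of rows, the row-wise `apply()` above can be slow. A possible vectorised variant, sketched here with base R's `max.col()`, finds the position of the single 1 in each row in one call. The helper `one_hot()` and the names `A_factor` and `B_factor` are made up for this example, and the approach assumes every row has exactly one 1 per variable.

```r
set.seed(1234)
n <- 100

# Hypothetical helper: n rows, each with a single 1 in a random column.
one_hot <- function(n, k) {
  t(replicate(n, sample(c(1, rep(0, k - 1)))))
}

A <- one_hot(n, 5)
B <- one_hot(n, 5)
colnames(A) <- paste0("A", 1:5)
colnames(B) <- paste0("B", 1:5)

# max.col() returns, for each row, the index of the largest entry,
# which for a one-hot row is the position of the 1.
A_factor <- factor(colnames(A)[max.col(A, ties.method = "first")],
                   levels = colnames(A))
B_factor <- factor(colnames(B)[max.col(B, ties.method = "first")],
                   levels = colnames(B))

dat.new <- data.frame(A = A_factor, B = B_factor)
str(dat.new)
```

Setting `ties.method = "first"` makes the result deterministic. The default breaks ties at random, which would only matter if a row contained more than one 1 or none at all.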