Re: [R] Multinomial Logit Model with lots of Dummy Variables

2011-04-17 Thread ghpow1
Hi 

Thanks to Jeremy for his response...

I have been able to generate the factors and generate mlogit data using his
code:

mldata <- mlogit.data(mydata, varying = NULL, choice = "pitch_type_1",
                      shape = "wide")

My mlogit data looks like:

dependent_var,A variable,B Var,chid,alt
FALSE,110,19,1,0
FALSE,110,19,1,1
FALSE,110,19,1,2
FALSE,110,19,1,3
FALSE,110,19,1,4
TRUE,110,19,1,5
FALSE,110,19,1,6
FALSE,110,19,1,7
FALSE,110,19,1,8
FALSE,110,19,2,0
FALSE,110,19,2,1
FALSE,110,19,2,2
FALSE,110,19,2,3
FALSE,110,19,2,4
FALSE,110,19,2,5
TRUE,110,19,2,6
FALSE,110,19,2,7
FALSE,110,19,2,8
TRUE,110,561,3,0
FALSE,110,561,3,1
FALSE,110,561,3,2
FALSE,110,561,3,3
FALSE,110,561,3,4
FALSE,110,561,3,5
FALSE,110,561,3,6
FALSE,110,561,3,7
FALSE,110,561,3,8
FALSE,110,149,4,0
FALSE,110,149,4,1
TRUE,110,149,4,2

...

The mldata contains 651431 rows.  

If I try to run this full data set I get the following error:  


mlogit.model <- mlogit(dependent_var ~ 0 | A + B, data = mldata, reflevel = "0")
Error in model.matrix.default(formula, data) :
  allocMatrix: too many elements specified
Calls: mlogit ... model.matrix.mFormula -> model.matrix ->
model.matrix.default
Execution halted

With smaller datasets (595 mldata rows), mlogit works fine and generates
regression output.

Is there a problem with mlogit and huge datasets?  

I suppose this is perhaps not the best way to assess this kind of data, but
I am trying to replicate a previous analysis that was completed on a similar
amount of similar data.
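
A rough size estimate suggests why the full data set fails where the small
one succeeds. The sketch below assumes (this is not confirmed anywhere in the
thread) that A and B are factors with 965 and 805 levels, and that each of the
8 non-reference alternatives gets its own intercept and its own set of
coefficients, so the dense model matrix has 8 x (1 + 964 + 804) columns. R
releases of that period could not allocate a matrix with more than 2^31 - 1
elements, which is the limit the allocMatrix error refers to.

```r
# Back-of-the-envelope size of the dense model matrix.
# ASSUMPTIONS: 965 and 805 factor levels, treatment coding, and one
# coefficient set per non-reference alternative (9 alternatives, 0 to 8).
n_rows  <- 651431                         # rows of mldata reported above
n_alts  <- 9                              # alternatives 0 to 8
n_terms <- 1 + (965 - 1) + (805 - 1)      # intercept + dummies for A and B
n_cols  <- (n_alts - 1) * n_terms         # alternative-specific columns

# Plain numerics are doubles, so this product does not overflow.
n_elements <- n_rows * n_cols

n_cols                                    # number of columns under these assumptions
n_elements                                # number of cells in the matrix
n_elements > .Machine$integer.max         # TRUE means over the 2^31 - 1 limit
n_elements * 8 / 1e9                      # approximate size in GB if stored as doubles
```

Under these assumptions the matrix is several times over the element limit,
so the failure looks like a size problem rather than an mlogit bug. The 595-row
subset stays far below the limit, which would explain why it runs.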




--
View this message in context: 
http://r.789695.n4.nabble.com/Multinomial-Logit-Model-with-lots-of-Dummy-Variables-tp3439492p3455345.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multinomial Logit Model with lots of Dummy Variables

2011-04-10 Thread ghpow1
Hi All,

I am attempting to build a Multinomial Logit model with dummy variables of
the following form:

Dependent Variable : 0-8 Discrete Choices

Dummy Variable 1: 965 dummy vars
Dummy Variable 2: 805 dummy vars

The data set I am using has the dummy columns pre-created, so it's a table
of 72,381 rows and 1770 columns.

The first 965 columns represent the dummy columns for Variable 1
The next 805 columns represent the dummy columns for Variable 2

My code to build the mlogit model looks like the following. I want to
know: is there a better way of doing this without these huge equations? (I
probably also need a more powerful PC to do all of this.)

I'll also want to perform a joint test of significance on the first 805
coefficients...

Is this possible?

Thanks

GP

[code]

# load the mlogit package
library(mlogit)

# load mydata
mydata <- read.csv(file = "G:\\data.csv", header = TRUE)

num.rows <- nrow(mydata)
num.cols <- 965 + 805 + 1

# one row per observation: 965 + 805 dummy columns plus the dependent variable
my_data <- matrix(0, nrow = num.rows, ncol = num.cols)

for (i in 1:num.rows) {
  nb <- mydata[i, 2]   # level of variable 1 (1 to 965)
  np <- mydata[i, 3]   # level of variable 2 (1 to 805)

  my_data[i, nb] <- 1
  my_data[i, 965 + np] <- 1
  my_data[i, 1 + 1770] <- mydata[i, 1]
}

# convert matrix to data frame
my_data_frame <- as.data.frame(my_data)

# check data frame headers
head(my_data_frame)

# load data frame into mldata with choice variable
mldata <- mlogit.data(my_data_frame, varying = NULL, choice = "V1771",
                      shape = "wide")

# V1771 = dependent var
# V1-V965 = variable 1 dummies
# V966-V1770 = variable 2 dummies

# regress V1771 against all 1770 variables...
mlogit.model <- mlogit(V1771 ~ 0 | V1 + V2 + V3 ... + V1770, data = mldata, reflevel = "0")


[/code]
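
On the question of avoiding the huge equation: one option is to assemble the
formula from the column names rather than typing it. The following is a
minimal sketch, assuming the dummy columns are named V1 to V1770 and the
choice column is V1771 as in the code above. Whether mlogit can then fit a
model of this size is a separate matter and is not tested here.

```r
# Build the right-hand side from the column names instead of typing it.
# ASSUMPTION: dummies are V1..V1770 and the choice column is V1771.
dummy_names   <- paste0("V", 1:1770)
rhs           <- paste(dummy_names, collapse = " + ")
model_formula <- as.formula(paste("V1771 ~ 0 |", rhs))

# Sanity check: the response plus every dummy should appear in the formula.
length(all.vars(model_formula))

# The formula object would then replace the typed one, for example:
# mlogit.model <- mlogit(model_formula, data = mldata, reflevel = "0")
```

Storing each variable as a single factor column, as suggested in the reply in
this thread, avoids the long formula altogether, because R expands a factor
into dummy columns itself.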



--
View this message in context: 
http://r.789695.n4.nabble.com/Multinomial-Logit-Model-with-lots-of-Dummy-Variables-tp3439492p3439492.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Multinomial Logit Model with lots of Dummy Variables

2011-04-10 Thread Jeremy Hetzel
If you are just looking to collapse the dummy variables into two factor 
variables, the following will work.

## Generate some example data
set.seed(1234)
n <- 100
# Generate outcome
outcome <- rbinom(n, 3, 0.5)

# Generate dummy variables for A and B
# (each row has exactly one 1 in a random position)
A <- t(apply(matrix(nrow = n, ncol = 5), 1, function(x) {
  sample(c(1, 0, 0, 0, 0))
}))
B <- t(apply(matrix(nrow = n, ncol = 5), 1, function(x) {
  sample(c(1, 0, 0, 0, 0))
}))

# Combine into data frame
dat <- data.frame(outcome, A, B)
names(dat) <- c("outcome", paste("A", seq(1:5), sep = ""),
                paste("B", seq(1:5), sep = ""))
head(dat)


## Collapse dummies to factor variable
# For each row, return the name of the column that holds the 1
A <- apply(dat, 1, function(x) {
  A <- x[2:6]
  A.names <- names(x[2:6])
  A.value <- A.names[A == 1]
  return(A.value)
})

B <- apply(dat, 1, function(x) {
  B <- x[7:11]
  B.names <- names(x[7:11])
  B.value <- B.names[B == 1]
  return(B.value)
})

# Combine into new data frame
dat.new <- data.frame(dat$outcome, A, B)

head(dat.new)
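
With 72,381 rows and 1770 dummy columns, a row-wise apply() may be slow. A
vectorised alternative is sketched below using base R's max.col(). It is
self-contained and regenerates a small example, so the names A.index and
A.factor are illustrative only, and it assumes each row has exactly one 1
among the dummy columns.

```r
# Collapse one-hot (dummy) columns to a factor without a row-wise apply().
# ASSUMPTION: every row contains exactly one 1 among the dummy columns.
set.seed(1234)
n <- 100
A <- t(replicate(n, sample(c(1, 0, 0, 0, 0))))   # n x 5 matrix of dummies
colnames(A) <- paste0("A", 1:5)

# max.col() gives, for each row, the index of the column with the largest
# value, which here is the column that holds the 1.
A.index  <- max.col(A, ties.method = "first")
A.factor <- factor(colnames(A)[A.index], levels = colnames(A))

table(A.factor)
```

The same call applied to the second block of dummy columns would give the
second factor.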



Jeremy
