Thanks Steve!

--
Noah Silverman, M.S.
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095
On Mar 14, 2013, at 12:54 PM, Steve Lianoglou <mailinglist.honey...@gmail.com> wrote:

> Hi,
>
> On Thu, Mar 14, 2013 at 2:36 PM, Noah Silverman <noahsilver...@ucla.edu> wrote:
>> Hello,
>>
>> I am attempting to use elasticnet to classify a number of documents.
>>
>> The features are words. The data is coded into a matrix with each document
>> as a row and each word as a column. The data is binary, with {0,1}
>> indicating the presence of a word.
>>
>> I want to use the cross validation function of elasticnet (cv.enet).
>> However, when the code selects a random subset of the data for a given run,
>> some of the word columns may be all 0. (A given word simply isn't present
>> in the subset of data sampled.) This causes the function to return an
>> error about variance of 0.
>>
>> Any suggestions on how to mitigate this issue, given that I want a 5-fold
>> cross validation to determine optimal tuning?
>
> It looks like you can jimmy-up your own splits for cross validation by
> using the `foldid` parameter to `cv.glmnet`, so you can construct your
> own splits to make sure that the scenario that's tripping you up
> doesn't happen.
>
> Or, you can create a modified version of the cv function that still
> picks samples randomly, but handles situations where you have all 0
> columns as a special case -- I guess you would reduce your feature
> matrix for that fold, run the goods, then drop the coefs back into the
> original "columns" they'd belong to as if you ran the training on the
> full feature matrix.
>
> Know what I mean?
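For anyone reading this in the archive, the two suggestions above can be sketched roughly as follows. This is a language-neutral illustration in plain Python, not R, and not code from glmnet or elasticnet. The function names (`zero_variance_columns`, `make_fold_ids`, `expand_coefs`) are made up for the example, and the data is assumed to be a list of rows of 0/1 values. The fold labels are 1-based on the assumption that they would be handed to something like the `foldid` argument mentioned above.

```python
import random

def zero_variance_columns(rows):
    """Return indices of columns that hold a single value across all rows."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols) if len({row[j] for row in rows}) == 1]

def make_fold_ids(x, k=5, seed=0, max_tries=1000):
    """Suggestion 1: draw balanced random fold labels (1..k), retrying until
    every training split (all rows outside one fold) has no constant column."""
    rng = random.Random(seed)
    base = [(i % k) + 1 for i in range(len(x))]
    for _ in range(max_tries):
        fold_ids = base[:]
        rng.shuffle(fold_ids)
        valid = True
        for fold in range(1, k + 1):
            train = [row for row, f in zip(x, fold_ids) if f != fold]
            if zero_variance_columns(train):
                valid = False
                break
        if valid:
            return fold_ids
    raise ValueError("no valid split found; some words may be too rare")

def expand_coefs(reduced_coefs, kept_cols, n_cols):
    """Suggestion 2: after fitting on a matrix with constant columns removed,
    put the coefficients back in full-width positions (0.0 for dropped ones)."""
    full = [0.0] * n_cols
    for j, coef in zip(kept_cols, reduced_coefs):
        full[j] = coef
    return full
```

The retry loop is the simplest thing that could work. If a word appears in fewer documents than fit in a single fold, a random split can always hide it, so very rare words may need to be dropped up front or handled with the second approach instead.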
>
> HTH,
> -steve
>
> --
> Steve Lianoglou
> Defender of The Thesis
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.