Thanks Steve!

--
Noah Silverman, M.S.
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095

On Mar 14, 2013, at 12:54 PM, Steve Lianoglou <mailinglist.honey...@gmail.com> 
wrote:

> Hi,
> 
> On Thu, Mar 14, 2013 at 2:36 PM, Noah Silverman <noahsilver...@ucla.edu> 
> wrote:
>> Hello,
>> 
>> I am attempting to use elasticnet to classify a number of documents.
>> 
>> The features are words.  The data is coded into a matrix with each document 
>> as a row and each word as a column.  The data is binary, with {0,1} 
>> indicating the presence of a word.
>> 
>> I want to use the cross validation function of elasticnet (cv.enet).  
>> However, when the code selects a random subset of the data for a given run, 
>> some of the word columns may be all 0.  (A given word simply isn't present 
>> in the subset of data sampled.)  This causes the function to return an 
>> error about variance of 0.
>> 
>> Any suggestions on how to mitigate this issue, given that I want 5-fold 
>> cross validation to determine the optimal tuning?
> 
> It looks like you can jimmy-up your own splits for cross validation by
> using the `foldid` parameter to `cv.glmnet` (in the glmnet package), so
> you can either construct your own splits to make sure that the scenario
> that's tripping you up doesn't happen.
> 
> Or, you can create a modified version of the cv function that still
> picks samples randomly, but handles situations where you have all 0
> columns as a special case -- I guess you would reduce your feature
> matrix for that fold, run the goods, then drop the coefs back into the
> original "columns" they'd belong to as if you ran the training on the
> full feature matrix.
> 
> Know what I mean?
> 
> HTH,
> -steve
> 
> -- 
> Steve Lianoglou
> Defender of The Thesis
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
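
For what it's worth, here is a rough sketch in R of the two ideas quoted
above. It is not taken from either package: `make_safe_foldid` and
`fit_without_constant_cols` are made-up helper names, the data is simulated,
and the `lm`-based fitter is only a stand-in for whatever model you actually
use. The only glmnet call is `cv.glmnet` with its `foldid` argument, and it
is skipped if glmnet is not installed.

```r
# Hypothetical helper (idea 1): draw fold labels at random until every
# training split (all rows outside one fold) has both a 0 and a 1 in
# every column of the binary matrix x.
make_safe_foldid <- function(x, nfolds = 5, max_tries = 1000) {
  n <- nrow(x)
  for (i in seq_len(max_tries)) {
    # Balanced random assignment of rows to folds 1..nfolds
    foldid <- sample(rep(seq_len(nfolds), length.out = n))
    ok <- TRUE
    for (k in seq_len(nfolds)) {
      train <- x[foldid != k, , drop = FALSE]
      counts <- colSums(train)
      # For 0/1 data a column is constant if it is all 0 or all 1
      if (any(counts == 0 | counts == nrow(train))) {
        ok <- FALSE
        break
      }
    }
    if (ok) return(foldid)
  }
  stop("no safe split found in ", max_tries,
       " tries; consider dropping rare words first")
}

# Hypothetical helper (idea 2): fit on the non-constant columns only, then
# put the coefficients back into a vector as long as the full feature set,
# leaving 0 for the columns that were dropped.
# fit_fun is an assumed user-supplied function(x, y) that returns one
# coefficient per column of x (no intercept).
fit_without_constant_cols <- function(x_train, y_train, fit_fun) {
  keep <- apply(x_train, 2, function(col) length(unique(col)) > 1)
  coefs <- numeric(ncol(x_train))
  coefs[keep] <- fit_fun(x_train[, keep, drop = FALSE], y_train)
  coefs
}

# Simulated data: 40 "documents", 6 "words"
set.seed(1)
x <- matrix(rbinom(40 * 6, 1, 0.5), nrow = 40, ncol = 6)
y <- rbinom(40, 1, 0.5)

# Idea 1: hand-made folds passed to cv.glmnet through foldid
foldid <- make_safe_foldid(x, nfolds = 5)
if (requireNamespace("glmnet", quietly = TRUE)) {
  cvfit <- glmnet::cv.glmnet(x, y, family = "binomial",
                             alpha = 0.5, foldid = foldid)
}

# Idea 2: add a word that never occurs, fit with a stand-in model
x2 <- cbind(x, 0)
toy_fit <- function(xm, ym) coef(lm(ym ~ xm))[-1]
b <- fit_without_constant_cols(x2, y, toy_fit)
```

On a real document-term matrix with many rare words, the random search in
`make_safe_foldid` may never succeed (a word that occurs in a single document
is all 0 whenever that document is held out), so dropping very rare words
first, or falling back to the second helper, is probably needed.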

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
