[ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Drew Farris updated MAHOUT-228:
-------------------------------

    Attachment: TrainLogisticTest.patch

Played with this a bit tonight to see how it worked. I was able to get the 
donut example working fine. I had the idea to use the text in 
ClassifierData.DATA as test input to TrainLogistic, à la 
BayesClassifierSelfTest. Attached is a patch including the simple test. 

This input has 2 columns, 'label' and 'text', which are assigned to the target 
and predictors arguments respectively. 'text' is processed by the 
TextValueEncoder.
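For context, here is a rough sketch of how a hashed text encoder could turn 
the 'text' column into features and fill a trace dictionary. This is not 
Mahout's TextValueEncoder: the class name HashedTextSketch, the whitespace 
tokenizer, the probe count, and the "text=word" key format are assumptions 
made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a hashed text encoder; not Mahout's API.
class HashedTextSketch {
  private final String name;   // column name, e.g. "text"
  private final int probes;    // hashed positions per word (assumed)
  private Map<String, Set<Integer>> traceDictionary; // null unless tracing

  HashedTextSketch(String name, int probes) {
    this.name = name;
    this.probes = probes;
  }

  void setTraceDictionary(Map<String, Set<Integer>> dictionary) {
    this.traceDictionary = dictionary;
  }

  // Splits on whitespace and adds 1.0 at each hashed position of each word,
  // recording those positions under "name=word" when tracing is enabled.
  void addToVector(String text, double[] vector) {
    for (String word : text.toLowerCase().split("\\s+")) {
      if (word.isEmpty()) {
        continue;
      }
      String key = name + "=" + word;
      for (int probe = 0; probe < probes; probe++) {
        int index = Math.floorMod((key + ":" + probe).hashCode(), vector.length);
        vector[index] += 1.0;
        if (traceDictionary != null) {
          traceDictionary.computeIfAbsent(key, k -> new TreeSet<>()).add(index);
        }
      }
    }
  }
}
```

With more than one probe, every word lands in several positions, which is why 
a single trace entry can point at more than one place in the vector.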

I had to modify TextValueEncoder to override setTraceDictionary so that it 
passes the dictionary reference on to the wordEncoder.
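The change amounts to delegation: the text encoder hands each token to an 
inner word encoder, so the dictionary has to be forwarded to that inner object 
as well. The classes below (BaseEncoder, WordEncoderSketch, TextEncoderSketch) 
are hypothetical stand-ins; their signatures are assumed, not copied from 
Mahout.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical base class holding an optional trace dictionary.
abstract class BaseEncoder {
  protected Map<String, Set<Integer>> traceDictionary;

  void setTraceDictionary(Map<String, Set<Integer>> dictionary) {
    this.traceDictionary = dictionary;
  }

  abstract void addToVector(String value, double[] vector);
}

// Hypothetical word encoder: hashes one word to one position and traces it.
class WordEncoderSketch extends BaseEncoder {
  private final String name;

  WordEncoderSketch(String name) {
    this.name = name;
  }

  @Override
  void addToVector(String word, double[] vector) {
    String key = name + "=" + word;
    int index = Math.floorMod(key.hashCode(), vector.length);
    vector[index] += 1.0;
    if (traceDictionary != null) {
      traceDictionary.computeIfAbsent(key, k -> new TreeSet<>()).add(index);
    }
  }
}

// Hypothetical text encoder: splits text and delegates each word.
class TextEncoderSketch extends BaseEncoder {
  private final WordEncoderSketch wordEncoder;

  TextEncoderSketch(String name) {
    wordEncoder = new WordEncoderSketch(name);
  }

  // Without this override the dictionary stops here and the inner
  // word encoder never records any positions.
  @Override
  void setTraceDictionary(Map<String, Set<Integer>> dictionary) {
    super.setTraceDictionary(dictionary);
    wordEncoder.setTraceDictionary(dictionary);
  }

  @Override
  void addToVector(String text, double[] vector) {
    for (String word : text.split("\\s+")) {
      if (!word.isEmpty()) {
        wordEncoder.addToVector(word, vector);
      }
    }
  }
}
```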

Once I did this I could train, but I ran into a problem producing the final 
output. Near line 85 in TrainLogistic, the predictorWeight method is called 
with the original column name 'text', not the predictor names generated by 
TextValueEncoder. Do you have any thoughts on the best way to modify the code 
so that the proper predictor names are used?
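One possible approach, assuming trace keys of the form "column=token": instead 
of asking for the weight of the raw column name, enumerate the keys the 
encoders actually wrote into the trace dictionary for that column and report 
each one. generatedPredictors is a hypothetical helper, not an existing 
TrainLogistic method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class PredictorNames {
  // Hypothetical helper: returns the feature names generated for one input
  // column, assuming trace keys of the form "column=token".
  static List<String> generatedPredictors(Map<String, Set<Integer>> traceDictionary,
                                          String column) {
    List<String> names = new ArrayList<>();
    if (traceDictionary.containsKey(column)) {
      names.add(column); // e.g. a numeric column traced under its own name
    }
    String prefix = column + "=";
    // Copy into a TreeMap so the output order is stable (sorted by key).
    for (String key : new TreeMap<>(traceDictionary).keySet()) {
      if (key.startsWith(prefix)) {
        names.add(key);
      }
    }
    return names;
  }
}
```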

Once that's fixed, predictorWeight will need to be modified to properly extract 
the weight for a predictor generated by WordValueEncoder from the lr's beta 
matrix. I can tell that the traceDictionary's entry points to the positions in 
the vector where the word's weight is stored, but I'm not sure where to go from 
there.
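With hashed features a word's contribution is spread over every position it 
was hashed to, so one reasonable definition of its weight is the sum of the 
coefficients at those positions (an average is an alternative). The sketch 
assumes the beta matrix is a plain double[][] indexed as beta[row][column] and 
that the trace dictionary maps a predictor name to a set of column indices; 
the real lr's beta matrix and dictionary types may differ.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

class PredictorWeightSketch {
  // Hypothetical variant of predictorWeight: sums the coefficients in one row
  // of beta at every position recorded for the predictor. Returns 0.0 if the
  // predictor was never traced.
  static double predictorWeight(double[][] beta, int row,
                                Map<String, Set<Integer>> traceDictionary,
                                String predictor) {
    double weight = 0.0;
    for (int column : traceDictionary.getOrDefault(predictor, Collections.emptySet())) {
      weight += beta[row][column];
    }
    return weight;
  }
}
```

Because unrelated features can hash to the same positions, the sum is only an 
approximation of that word's individual effect.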


> Need sequential logistic regression implementation using SGD techniques
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-228
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-228
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>             Fix For: 0.4
>
>         Attachments: logP.csv, MAHOUT-228-3.patch, MAHOUT-228.patch, 
> MAHOUT-228.patch, MAHOUT-228.patch, MAHOUT-228.patch, r.csv, 
> sgd-derivation.pdf, sgd-derivation.tex, sgd.csv, TrainLogisticTest.patch
>
>
> Stochastic gradient descent (SGD) is often fast enough for highly scalable 
> learning (see Vowpal Wabbit, http://hunch.net/~vw/).
> I often need to have a logistic regression in Java as well, so that is a 
> reasonable place to start.

-- 
This message is automatically generated by JIRA.