[
https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Farris updated MAHOUT-228:
-------------------------------
Attachment: TrainLogisticTest.patch
Played with this a bit tonight to see how it worked. I was able to get the
donut example working fine. Had the idea to use the text in ClassifierData.DATA
as test input to TrainLogistic a la the BayesClassifierSelfTest. Attached is a
patch including the simple test.
This input has 2 columns, 'label' and 'text', which get assigned to the target
and predictors arguments respectively. 'text' is processed by the
TextValueEncoder.
I had to modify TextValueEncoder to override setTraceDictionary so that it
passes the dictionary reference on to the wordEncoder.
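A minimal sketch of that delegation, for illustration only. The classes below (SketchEncoder, SketchWordEncoder, SketchTextEncoder) are simplified, hypothetical stand-ins and not the real Mahout encoders; the dictionary type, the "name=word" key format and the hashing are all assumptions. The point is only that the text encoder hands the same dictionary to its inner word encoder, so that per-word slots get recorded.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for an encoder base class; not Mahout's actual code.
class SketchEncoder {
    protected final String name;
    protected Map<String, Set<Integer>> traceDictionary;

    SketchEncoder(String name) {
        this.name = name;
    }

    void setTraceDictionary(Map<String, Set<Integer>> traceDictionary) {
        this.traceDictionary = traceDictionary;
    }

    // Record that a named feature was written to slot n, if tracing is enabled.
    protected void trace(String feature, int n) {
        if (traceDictionary != null) {
            traceDictionary.computeIfAbsent(name + "=" + feature, k -> new TreeSet<>()).add(n);
        }
    }
}

// Hashes a single word into one slot of the vector (assumed: one probe per word).
class SketchWordEncoder extends SketchEncoder {
    SketchWordEncoder(String name) {
        super(name);
    }

    void addToVector(String word, double[] data) {
        int n = Math.floorMod(word.hashCode(), data.length);
        trace(word, n);
        data[n] += 1.0;
    }
}

// Splits text into words and delegates each word to the inner word encoder.
class SketchTextEncoder extends SketchEncoder {
    private final SketchWordEncoder wordEncoder;

    SketchTextEncoder(String name) {
        super(name);
        this.wordEncoder = new SketchWordEncoder(name);
    }

    @Override
    void setTraceDictionary(Map<String, Set<Integer>> traceDictionary) {
        super.setTraceDictionary(traceDictionary);
        // Without this line the inner encoder never sees the dictionary,
        // so no per-word entries are ever recorded.
        wordEncoder.setTraceDictionary(traceDictionary);
    }

    void addToVector(String text, double[] data) {
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                wordEncoder.addToVector(word, data);
            }
        }
    }
}
```

With the override in place, encoding a string through SketchTextEncoder leaves one dictionary entry per distinct word, each pointing at the slot that word was hashed to.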
Once I did this I could train, but I ran into a problem producing the final
output. Near line 85 in TrainLogistic the predictorWeight method is called with
the original column name 'text', not the predictor names generated by
TextValueEncoder. Did you have any thoughts as to the best way to modify the
code so that the proper predictor names are used?
Once that's fixed, predictorWeight will need to be modified to properly extract
the weight for a predictor generated by WordValueEncoder from the lr's beta
matrix. I can tell that the traceDictionary's entry points to the positions in
the vector where the word's weight is stored, but I'm not sure where to go from
there.
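One possible way to go from the dictionary to a per-word weight, as a sketch under stated assumptions: beta is taken to be a plain double[][] with one row per non-reference category and one column per hashed slot, and the dictionary maps each generated predictor name to the set of slots it was written to. The class TraceWeights and its method are hypothetical, not part of TrainLogistic. Because each traced slot receives the same feature value, the sum of beta over those slots is the feature's contribution per unit of value; a hash collision with another word would be folded into that sum.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the beta layout and dictionary format are assumptions.
class TraceWeights {
    // Returns, for every traced predictor name, the sum of beta[row][slot]
    // over the slots that predictor was hashed to.
    static Map<String, Double> weights(Map<String, Set<Integer>> traceDictionary,
                                       double[][] beta, int row) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<Integer>> e : traceDictionary.entrySet()) {
            double sum = 0.0;
            for (int slot : e.getValue()) {
                sum += beta[row][slot];
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up dictionary and coefficients, purely for illustration.
        Map<String, Set<Integer>> dict = new LinkedHashMap<>();
        dict.put("text=good", new TreeSet<>(Arrays.asList(1, 4)));
        dict.put("text=bad", new TreeSet<>(Arrays.asList(2)));
        double[][] beta = {{0.0, 0.5, -1.25, 0.0, 0.25}};

        // Iterate over the generated predictor names, not the column name 'text'.
        weights(dict, beta, 0).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Iterating over the dictionary keys also sidesteps the column-name problem above, since the names reported are the ones the encoder generated rather than the original 'text' column.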
> Need sequential logistic regression implementation using SGD techniques
> -----------------------------------------------------------------------
>
> Key: MAHOUT-228
> URL: https://issues.apache.org/jira/browse/MAHOUT-228
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Reporter: Ted Dunning
> Assignee: Ted Dunning
> Fix For: 0.4
>
> Attachments: logP.csv, MAHOUT-228-3.patch, MAHOUT-228.patch,
> MAHOUT-228.patch, MAHOUT-228.patch, MAHOUT-228.patch, r.csv,
> sgd-derivation.pdf, sgd-derivation.tex, sgd.csv, TrainLogisticTest.patch
>
>
> Stochastic gradient descent (SGD) is often fast enough for highly scalable
> learning (see Vowpal Wabbit, http://hunch.net/~vw/).
> I often need to have a logistic regression in Java as well, so that is a
> reasonable place to start.