[ https://issues.apache.org/jira/browse/MAHOUT-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614847#action_12614847 ]
Steven Handerson commented on MAHOUT-60:
----------------------------------------
Robin,
I can get the training working very well. I've even started working with a
very large file (700+ MB, and not done being created yet, the old slow way).
No problem.
But I'd say the classification step (applying a model) probably needs a better
map-reduce treatment now. I *think* it's working (I've seen it work on smaller
training data), but with my larger task it's getting bogged down.
I may think about it and try it myself, though I'm very new to map-reduce.
It seems like you should be able to do something clever by throwing the
test data (feature|doc) and model data (feature|category, increment) together,
keyed on feature, reducing and emitting category increments / decrements for
each (doc, category) pair, and then summing them up in a second reduce.
Or just emit (doc|category, increment) for all features, and then in the
reduce you can also easily find the maximal category.
I don't think this is what you're doing yet. I think you're thinking of loading
the model, rather than shoving it through a map/reduce sequence.
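A rough sketch of that two-pass idea, in plain Python rather than actual
Hadoop code. The function names (shuffle, join_map, join_reduce, sum_reduce,
classify) and the in-memory shuffle are made up for illustration and are not
Mahout APIs. Picking the highest summed score assumes larger is better; with
complement weights you may want the smallest total instead.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group (key, value) pairs by key, standing in for the framework's shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped.items()


def join_map(test_records, model_records):
    """Pass 1 map: key both inputs by feature and tag where each value came from."""
    for feature, doc in test_records:
        yield feature, ("doc", doc)
    for feature, category, increment in model_records:
        yield feature, ("model", (category, increment))


def join_reduce(feature, values):
    """Pass 1 reduce: for one feature, emit an increment per (doc, category) pair."""
    docs = [v for tag, v in values if tag == "doc"]
    weights = [v for tag, v in values if tag == "model"]
    for doc in docs:
        for category, increment in weights:
            yield (doc, category), increment


def sum_reduce(doc_category, increments):
    """Pass 2 reduce: total the increments for one (doc, category) pair."""
    return doc_category, sum(increments)


def classify(test_records, model_records):
    """Run both passes, then keep the highest-scoring category per doc (assumption)."""
    joined = []
    for feature, values in shuffle(join_map(test_records, model_records)):
        joined.extend(join_reduce(feature, values))
    totals = [sum_reduce(key, incs) for key, incs in shuffle(joined)]
    best = {}
    for (doc, category), score in totals:
        if doc not in best or score > best[doc][1]:
            best[doc] = (category, score)
    return {doc: category for doc, (category, _) in best.items()}
```

The model is never loaded whole here: each reduce call only sees the docs and
category increments that share one feature, or the increments for one
(doc, category) pair.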
> Complementary Naive Bayes
> -------------------------
>
> Key: MAHOUT-60
> URL: https://issues.apache.org/jira/browse/MAHOUT-60
> Project: Mahout
> Issue Type: Sub-task
> Components: Classification
> Reporter: Robin Anil
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 0.1
>
> Attachments: MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch,
> twcnb.jpg
>
>
> The focus is to implement an improved text classifier based on this paper
> http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf.