Hadoop puts its MurmurHash under utils, so that might be a consideration. But for Mahout it fits better, IMO, in org.apache.mahout.common alongside other code with a similar philosophy and purpose. I assume others will want to add alternative hash tools, so I'd create a "hash" package in mahout.common.
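
Just to make that concrete, here is a rough sketch of what a promoted class could look like under a hypothetical org.apache.mahout.common.hash package. The package name and class shape are only my suggestion, not what the patch currently contains; the body is plain 32-bit MurmurHash2 (Austin Appleby's algorithm):

package org.apache.mahout.common.hash;   // hypothetical package, not in the current patch

/**
 * Sketch of a 32-bit MurmurHash2 implementation.  Only meant to illustrate
 * where such a utility could live, not to replace the class in the patch.
 */
public final class MurmurHash {

  private MurmurHash() {}

  public static int hash(byte[] data, int seed) {
    final int m = 0x5bd1e995;
    final int r = 24;
    int len = data.length;
    int h = seed ^ len;
    int i = 0;
    // Mix four bytes at a time into the hash.
    while (len >= 4) {
      int k = (data[i] & 0xff)
          | ((data[i + 1] & 0xff) << 8)
          | ((data[i + 2] & 0xff) << 16)
          | ((data[i + 3] & 0xff) << 24);
      k *= m;
      k ^= k >>> r;
      k *= m;
      h *= m;
      h ^= k;
      i += 4;
      len -= 4;
    }
    // Handle the last few bytes of the input (intentional fall-through).
    switch (len) {
      case 3: h ^= (data[i + 2] & 0xff) << 16;
      case 2: h ^= (data[i + 1] & 0xff) << 8;
      case 1: h ^= data[i] & 0xff;
              h *= m;
    }
    // Final avalanche.
    h ^= h >>> 13;
    h *= m;
    h ^= h >>> 15;
    return h;
  }
}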
The randomizers I'd put in org.apache.mahout.math because of their interaction with Vector, either at that level or in an org.apache.mahout.math.randomizer sub-package, since .math is getting dense given how much it already holds. I imagine using the priors outside of SGD, so they could move to org.apache.mahout.math as well, where they may merit their own sub-package. (A rough sketch of what I mean by the Vector interaction is at the end of this message.)

--- On Tue, 12/29/09, Ted Dunning (JIRA) <j...@apache.org> wrote:

> From: Ted Dunning (JIRA) <j...@apache.org>
> Subject: [jira] Commented: (MAHOUT-228) Need sequential logistic regression implementation using SGD techniques
> To: mahout-dev@lucene.apache.org
> Date: Tuesday, December 29, 2009, 12:29 PM
>
> [ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795138#action_12795138 ]
>
> Ted Dunning commented on MAHOUT-228:
> ------------------------------------
>
> This is the time. The MurmurHash and Randomizer classes both seem ripe for promotion to other packages.
>
> What I will do is file some additional JIRA's that include just those classes (one JIRA for Murmur, one for Randomizer/Vectorizer). Those patches will probably make it in before this one does because they are simpler. At that point, I will rework the patch on this JIRA to not include those classes.
>
> Where would you recommend these others go?
>
> > Need sequential logistic regression implementation using SGD techniques
> > -----------------------------------------------------------------------
> >
> >                 Key: MAHOUT-228
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-228
> >             Project: Mahout
> >          Issue Type: New Feature
> >          Components: Classification
> >            Reporter: Ted Dunning
> >             Fix For: 0.3
> >
> >         Attachments: logP.csv, MAHOUT-228-3.patch, r.csv, sgd-derivation.pdf, sgd-derivation.tex, sgd.csv
> >
> > Stochastic gradient descent (SGD) is often fast enough for highly scalable learning (see Vowpal Wabbit, http://hunch.net/~vw/).
> > I often need to have a logistic regression in Java as well, so that is a reasonable place to start.
>
> --
> This message is automatically generated by JIRA.
> You can reply to this email to add a comment to the issue online.
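
And here is the promised sketch of the Vector coupling I have in mind for the randomizers. The class and method names below are invented for illustration, not the ones in the MAHOUT-228 patch; it just hashes terms into positions of a fixed-cardinality Vector (feature hashing):

package org.apache.mahout.math.randomizer;   // hypothetical sub-package

import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

/**
 * Illustrative term randomizer: hashes string terms into positions of a
 * fixed-cardinality Vector.  Names are made up for this example.
 */
public class HashedTermRandomizer {

  private final int probes;        // how many positions each term updates
  private final int cardinality;   // size of the output vector

  public HashedTermRandomizer(int probes, int cardinality) {
    this.probes = probes;
    this.cardinality = cardinality;
  }

  public Vector randomize(Iterable<String> terms) {
    Vector v = new DenseVector(cardinality);
    for (String term : terms) {
      for (int probe = 0; probe < probes; probe++) {
        // Any decent hash will do; String.hashCode() keeps the sketch
        // self-contained, MurmurHash would be the obvious substitute.
        int h = (term + ':' + probe).hashCode();
        int index = (h & Integer.MAX_VALUE) % cardinality;   // non-negative index
        v.set(index, v.get(index) + 1.0);
      }
    }
    return v;
  }
}

Since everything a class like this touches is a Vector, it reads more naturally next to DenseVector and friends in .math than buried inside the sgd package.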