[ https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795128#action_12795128 ]
Jake Mannix commented on MAHOUT-220:
------------------------------------

Anil,

Your map-reduces look great; that's the kind of thing I've done for this as well. Good stuff.

As for HBase and caching layers, I'd say it's still not fully scalable, as it's limited by whatever cache size you set and by your hit/miss ratio.

It seems the Datastore interface really is just a wrapper around Matrix and Vector, calling out to the entries. Doing so in a random-access fashion seems like the reverse of the way I'd do it: pass the Algorithm *to* the Datastore, and have the computations be done where the data lives. The Datastore would iterate over itself internally: if it is in memory, it just loops; if it knows it's backed by MySQL, say, it can batch calls to the db, pulling chunks into memory; if it's HDFS-backed, it can fire off an M/R job, etc.

> Mahout Bayes Code cleanup
> -------------------------
>
>                 Key: MAHOUT-220
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-220
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
>
>
> Following Isabel's checkstyle, I am adding a whole slew of code cleanup with
> the following exceptions:
> 1. Line length used is 120 instead of 80.
> 2. The static final log is kept as is, not renamed to LOG.
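
To make the "pass the Algorithm *to* the Datastore" suggestion in the comment concrete, here is a rough sketch of what such an inversion might look like. Every name in it (RowAlgorithm, Datastore, InMemoryDatastore, ChunkedDatastore, ColumnSums) is hypothetical and made up for illustration; none of them are Mahout classes, and the "chunked" store only simulates a database-backed one by copying slices of an array. An HDFS-backed variant that launches an M/R job is not shown.

```java
import java.util.Arrays;

// Hypothetical sketch only: these types are NOT part of Mahout.

/** A computation handed to the store and fed one row at a time. */
interface RowAlgorithm<R> {
    void accept(int rowIndex, double[] row);
    R result();
}

/** A store that runs the algorithm where its data lives. */
interface Datastore {
    <R> R run(RowAlgorithm<R> algorithm);
}

/** In-memory backing: simply iterate over the rows. */
final class InMemoryDatastore implements Datastore {
    private final double[][] rows;

    InMemoryDatastore(double[][] rows) {
        this.rows = rows;
    }

    @Override
    public <R> R run(RowAlgorithm<R> algorithm) {
        for (int i = 0; i < rows.length; i++) {
            algorithm.accept(i, rows[i]);
        }
        return algorithm.result();
    }
}

/** Stand-in for a database-backed store: fetches fixed-size chunks, then iterates each one. */
final class ChunkedDatastore implements Datastore {
    private final double[][] table; // pretend this lives in a remote database
    private final int chunkSize;
    private int fetches = 0;        // counts simulated batched calls

    ChunkedDatastore(double[][] table, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.table = table;
        this.chunkSize = chunkSize;
    }

    /** Simulates one batched call that pulls a chunk of rows into memory. */
    private double[][] fetchChunk(int start) {
        fetches++;
        return Arrays.copyOfRange(table, start, Math.min(start + chunkSize, table.length));
    }

    int fetchCount() {
        return fetches;
    }

    @Override
    public <R> R run(RowAlgorithm<R> algorithm) {
        for (int start = 0; start < table.length; start += chunkSize) {
            double[][] chunk = fetchChunk(start);
            for (int j = 0; j < chunk.length; j++) {
                algorithm.accept(start + j, chunk[j]);
            }
        }
        return algorithm.result();
    }
}

/** Example algorithm: per-column sums over all rows. */
final class ColumnSums implements RowAlgorithm<double[]> {
    private final double[] sums;

    ColumnSums(int numColumns) {
        this.sums = new double[numColumns];
    }

    @Override
    public void accept(int rowIndex, double[] row) {
        for (int c = 0; c < row.length; c++) {
            sums[c] += row[c];
        }
    }

    @Override
    public double[] result() {
        return sums.clone();
    }
}

final class PushDownDemo {
    public static void main(String[] args) {
        double[][] data = {{1, 2}, {3, 4}, {5, 6}};
        Datastore store = new ChunkedDatastore(data, 2);
        System.out.println(Arrays.toString(store.run(new ColumnSums(2))));
    }
}
```

The point of the sketch is that the caller never asks for individual entries. The same ColumnSums object runs unchanged against either store, and each store decides for itself how to walk its data, so there is no cache to size and no hit/miss ratio to worry about.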