[ https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795128#action_12795128 ]

Jake Mannix commented on MAHOUT-220:
------------------------------------

Anil,

  Your map-reduces look great; that's the kind of thing I've done for this as
well. Good stuff.

As for HBase and caching layers, I'd say that's still not fully scalable: it's
limited by whatever cache size you set and by your hit/miss ratio. The
Datastore interface really seems to be just a wrapper around Matrix and Vector
that calls out to the entries. Doing that in a random-access fashion seems like
the reverse of the way I'd do it: pass the Algorithm *to* the Datastore, and
have the computations done where the data lives. The Datastore iterates over
its data internally: in memory; or, if it knows it's backed by MySQL, say, by
batching calls to the db and pulling chunks into memory; or, if it's
HDFS-backed, by firing off an M/R job, etc.
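To make the "pass the Algorithm to the Datastore" idea concrete, here is a minimal Java sketch. All names (Algorithm, Datastore, InMemoryDatastore, ChunkedDatastore, SumAlgorithm) are hypothetical and are not Mahout's actual API. The chunked store only simulates batched database reads by slicing an in-memory list; a real MySQL- or HDFS-backed store would replace fetchChunk with batched queries or an M/R job. The key point is that the caller never does random access: the store drives iteration in whatever way suits its backing.

```java
import java.util.List;

/** Hypothetical callback: the computation that gets shipped to the data. */
interface Algorithm<R> {
  void visit(int row, double[] values); // called once per row, in store order
  R result();
}

/** Hypothetical store abstraction: runs an Algorithm where the data lives. */
interface Datastore {
  <R> R run(Algorithm<R> algorithm);
}

/** In-memory store: simply iterates over its rows. */
final class InMemoryDatastore implements Datastore {
  private final List<double[]> rows;

  InMemoryDatastore(List<double[]> rows) { this.rows = rows; }

  public <R> R run(Algorithm<R> algorithm) {
    for (int i = 0; i < rows.size(); i++) {
      algorithm.visit(i, rows.get(i));
    }
    return algorithm.result();
  }
}

/** Store that pulls rows in fixed-size chunks, simulating batched DB calls. */
final class ChunkedDatastore implements Datastore {
  private final List<double[]> backing; // stands in for a remote table
  private final int chunkSize;
  int fetches = 0; // number of simulated round trips to the backing store

  ChunkedDatastore(List<double[]> backing, int chunkSize) {
    this.backing = backing;
    this.chunkSize = chunkSize;
  }

  /** Simulated batch fetch: one "round trip" returns up to chunkSize rows. */
  private List<double[]> fetchChunk(int offset) {
    fetches++;
    return backing.subList(offset, Math.min(offset + chunkSize, backing.size()));
  }

  public <R> R run(Algorithm<R> algorithm) {
    for (int offset = 0; offset < backing.size(); offset += chunkSize) {
      List<double[]> chunk = fetchChunk(offset);
      for (int j = 0; j < chunk.size(); j++) {
        algorithm.visit(offset + j, chunk.get(j));
      }
    }
    return algorithm.result();
  }
}

/** Example algorithm: sums every entry. Stateful, so use one instance per run. */
final class SumAlgorithm implements Algorithm<Double> {
  private double sum = 0.0;

  public void visit(int row, double[] values) {
    for (double v : values) {
      sum += v;
    }
  }

  public Double result() { return sum; }
}
```

The same SumAlgorithm gives the same answer against either store. Only the store decides how the data is traversed (a plain loop versus chunked fetches), which is where an HDFS-backed variant could launch a job instead.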

> Mahout Bayes Code cleanup
> -------------------------
>
>                 Key: MAHOUT-220
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-220
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
>
>
> Following isabel's checkstyle, I am adding a whole slew of code cleanup with
> the following exceptions:
> 1.  Line length used is 120 instead of 80.
> 2.  The static final logger is kept as log, not LOG.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
