[ https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795050#action_12795050 ]
Robin Anil commented on MAHOUT-220:
-----------------------------------

Datastore is an interface that lets you pick a named vector or matrix and look up a cell. The Bayes classifier code is based entirely on string tokens, not SparseVectors, so the names of the matrix, the rows and the columns are up to the implementation. For the CBayes/Bayes algorithms, we have HBaseBayesDatastore.java and InMemoryBayesDatastore.java.

{code}
double getWeight(String matrixName, String row, String column) throws InvalidDatastoreException;
double getWeight(String vectorName, String index) throws InvalidDatastoreException;
{code}

For the SGD algorithm, I suggest you define your own matrix names, row indices and column indices that your algorithm and datastore agree upon. I know this creates a limitation when you want to use integer-based column and row names. Maybe we can parameterize it, or change the Bayes package to use Vectors instead of the current string-token-based implementation.

I am currently writing a Map/Reduce job to convert text documents to vectors without relying on Lucene. Once that is done, I will overhaul the classifier package to use SparseVectors. Before that, I need to know whether this patch is OK in terms of code style; I will then patch it and start on the enhancements.

> Mahout Bayes Code cleanup
> -------------------------
>
>                 Key: MAHOUT-220
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-220
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
>
>
> Following Isabel's checkstyle, I am adding a whole slew of code cleanup with
> the following exceptions:
> 1. Line length used is 120 instead of 80.
> 2. The static final log is kept as is (log, not LOG).
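
As a rough illustration of the suggestion in the comment for SGD, here is a minimal sketch of a datastore whose matrix and vector names are agreed upon with the algorithm, with integer feature indices passed as strings. Only the two getWeight signatures come from the comment. The Datastore interface shape, the InvalidDatastoreException stand-in, the InMemorySgdDatastore class, the "weights" and "bias" names and the setWeight helpers are all assumptions made for this example, not Mahout's actual code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Mahout's InvalidDatastoreException.
class InvalidDatastoreException extends Exception {
  InvalidDatastoreException(String message) {
    super(message);
  }
}

// Assumed interface wrapping the two lookup methods quoted in the comment.
interface Datastore {
  double getWeight(String matrixName, String row, String column) throws InvalidDatastoreException;

  double getWeight(String vectorName, String index) throws InvalidDatastoreException;
}

// Hypothetical in-memory store for an SGD-style linear model.
class InMemorySgdDatastore implements Datastore {
  // Names the algorithm and the datastore agree upon (assumed for this sketch).
  static final String WEIGHTS = "weights"; // matrix: row = label, column = feature index
  static final String BIAS = "bias";       // vector: index = label

  // Cells are keyed by the list of name parts, so keys cannot collide.
  private final Map<List<String>, Double> cells = new HashMap<>();

  void setWeight(String matrixName, String row, String column, double value) {
    cells.put(Arrays.asList(matrixName, row, column), value);
  }

  void setWeight(String vectorName, String index, double value) {
    cells.put(Arrays.asList(vectorName, index), value);
  }

  @Override
  public double getWeight(String matrixName, String row, String column) throws InvalidDatastoreException {
    if (!WEIGHTS.equals(matrixName)) {
      throw new InvalidDatastoreException("Unknown matrix: " + matrixName);
    }
    // Cells that were never set read as 0.0, as in a sparse matrix.
    return cells.getOrDefault(Arrays.asList(matrixName, row, column), 0.0);
  }

  @Override
  public double getWeight(String vectorName, String index) throws InvalidDatastoreException {
    if (!BIAS.equals(vectorName)) {
      throw new InvalidDatastoreException("Unknown vector: " + vectorName);
    }
    return cells.getOrDefault(Arrays.asList(vectorName, index), 0.0);
  }
}
```

A caller with integer feature ids would convert them at the boundary, for example store.getWeight(InMemorySgdDatastore.WEIGHTS, "spam", Integer.toString(42)). That conversion is the cost of the string-based signatures that the comment describes as a limitation.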