[ https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Drew Farris updated MAHOUT-285:
-------------------------------
    Attachment: MAHOUT-285.patch

Robin got the bulk of this done last night; I reviewed his changes and integration strategy and tested them.

Very minor cleanup:
* used constants instead of string literals in several places where input/output paths are defined
* renamed some variables for more readable code in the CollocCombiner/CollocReducer
* handling of help display in SparseVectorsFromSequenceFiles

Once this patch is applied, consider moving mahout-utils o.a.m.text.SparseVectorsFromSequenceFiles to o.a.m.utils.text.SparseVectorsFromSequenceFiles

Also, consider using the Path(String parent, String child) constructor instead of concatenating strings with '/' and then calling new Path(String) later on. I lacked the clarity to pursue this one to its bitter end this evening.

With these changes, if this isn't ready to close, it is very, very close.

> Wrap up collocation and dictionary vectorizer integration
> ---------------------------------------------------------
>
>                 Key: MAHOUT-285
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-285
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>             Fix For: 0.3
>
>         Attachments: MAHOUT-285.patch, MAHOUT-285.patch, MAHOUT-285.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Final bit of work to integrate collocations into 0.3
> * Modify collocation finder to use dictionary vectorizer output as input
> (saves analysis step)
> * Generate input dictionary for dictionary vectorizer that includes unigrams
> and collocations.
> Chatted with Robin this morning; I know what needs to be done, it is just a
> matter of grinding out the code.
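
For reference, the two path suggestions in the comment above (named constants for path segments, and a parent/child join instead of '/' concatenation) could look roughly like the sketch below. So that it runs without Hadoop on the classpath, it uses java.nio.file.Paths.get(parent, child) as a stand-in for Hadoop's Path(String parent, String child) constructor; the class name, method names, and the directory constant are hypothetical and are not taken from the patch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathJoinSketch {
    // Hypothetical directory name, defined once as a constant rather than
    // repeated as a string literal wherever the path is built.
    static final String TOKENIZED_DIR = "tokenized-documents";

    // Fragile: the caller has to know whether `parent` already ends in '/'.
    static String joinByConcat(String parent, String child) {
        return parent + '/' + child;
    }

    // Sturdier: the path class decides how parent and child are joined.
    // (java.nio stand-in for Hadoop's two-argument Path constructor.)
    static Path joinByParentChild(String parent, String child) {
        return Paths.get(parent, child);
    }

    public static void main(String[] args) {
        // Concatenation keeps the doubled separator: "output//tokenized-documents"
        System.out.println(joinByConcat("output/", TOKENIZED_DIR));
        // The parent/child join yields the same path with or without the trailing '/'
        System.out.println(joinByParentChild("output/", TOKENIZED_DIR)
                .equals(joinByParentChild("output", TOKENIZED_DIR)));
    }
}
```

The point of the second form is that a trailing separator on the parent no longer matters, which is the case string concatenation tends to get wrong.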