[ https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831870#action_12831870 ]
Robin Anil commented on MAHOUT-285:
-----------------------------------

In the Colloc driver, why not run DocumentProcessor as the first step instead of using the SparseVectorsFromSequenceFiles task?

> Wrap up collocation and dictionary vectorizer integration
> ---------------------------------------------------------
>
>                 Key: MAHOUT-285
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-285
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>             Fix For: 0.3
>
>         Attachments: MAHOUT-285.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Final bit of work to integrate collocations into 0.3:
> * Modify collocation finder to use dictionary vectorizer output as input
>   (saves analysis step)
> * Generate input dictionary for dictionary vectorizer that includes unigrams
>   and collocations.
> Chatted with Robin this morning; I know what needs to be done, it is just a
> matter of grinding out the code.