[ https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831870#action_12831870 ]

Robin Anil commented on MAHOUT-285:
-----------------------------------

In the Colloc driver, why not run DocumentProcessor as the first step instead of
using the SparseVectorsFromSequenceFiles task?
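The point of running tokenization first is that documents are analyzed once, and every later step reads the same token lists. A minimal, self-contained sketch of that idea follows. All class and method names here are hypothetical and are not Mahout APIs; `tokenize` only stands in for the role DocumentProcessor would play, using a simple lower-case-and-split rule as an assumed analyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: none of these names are Mahout APIs.
// Documents are tokenized once; both downstream consumers read the same
// token lists instead of re-running analysis.
final class TokenizeOnceSketch {

    // Stand-in for the tokenization step (assumed analyzer): lower-case the
    // text and split on runs of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // a leading delimiter yields an empty first piece
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Consumer 1: unigram counts, the kind of input a dictionary vectorizer needs.
    static Map<String, Integer> countUnigrams(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            for (String t : doc) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Consumer 2: adjacent-pair (bigram) counts, the kind of input a
    // collocation finder needs. Pairs never cross document boundaries.
    static Map<String, Integer> countBigrams(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            for (int i = 0; i + 1 < doc.size(); i++) {
                counts.merge(doc.get(i) + " " + doc.get(i + 1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For example, tokenizing "New York is big." and "I love New York" once gives a count of 2 for both `new` and `new york`, with neither consumer touching the raw text again.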

> Wrap up collocation and dictionary vectorizer integration
> ---------------------------------------------------------
>
>                 Key: MAHOUT-285
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-285
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>             Fix For: 0.3
>
>         Attachments: MAHOUT-285.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Final bit of work to integrate collocations into 0.3
> * Modify collocation finder to use dictionary vectorizer output as input 
> (saves analysis step)
> * Generate input dictionary for dictionary vectorizer that includes unigrams 
> and collocations.
> Chatted with Robin this morning; I know what needs to be done, and it is just a
> matter of grinding out the code.
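The second bullet above, generating one input dictionary that covers both unigrams and collocations, can be sketched as merging two count tables into a single term-to-id map. This is a hypothetical sketch, not Mahout's dictionary format or API. A plain minimum-count cutoff (`minSupport`, an assumed parameter) stands in for whatever scoring the collocation finder actually applies.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not Mahout's dictionary format or API.
final class CombinedDictionarySketch {

    // Build one term -> id dictionary holding every unigram plus every bigram
    // whose count reaches minSupport. The frequency cutoff is an assumed
    // stand-in for the collocation finder's real scoring.
    static Map<String, Integer> build(Map<String, Integer> unigramCounts,
                                      Map<String, Integer> bigramCounts,
                                      int minSupport) {
        SortedSet<String> terms = new TreeSet<>(unigramCounts.keySet());
        for (Map.Entry<String, Integer> e : bigramCounts.entrySet()) {
            if (e.getValue() >= minSupport) {
                terms.add(e.getKey());
            }
        }
        // Assign ids in sorted term order so the result is deterministic.
        Map<String, Integer> dictionary = new TreeMap<>();
        int nextId = 0;
        for (String term : terms) {
            dictionary.put(term, nextId++);
        }
        return dictionary;
    }
}
```

With unigrams `big`, `new`, `york` and bigrams `new york` (count 2) and `york big` (count 1), a `minSupport` of 2 keeps `new york` as a dictionary entry alongside the unigrams and drops `york big`.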

-- 
This message is automatically generated by JIRA.
