[ 
https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Drew Farris updated MAHOUT-285:
-------------------------------

    Attachment: MAHOUT-285.patch

Robin got the bulk of this done last night. I reviewed his changes and 
integration strategy and tested them. This patch adds some very minor cleanup:

* use constants instead of string literals in several places where input/output 
paths are defined
* rename some variables in CollocCombiner/CollocReducer to make the code more 
readable
* fix handling of help display in SparseVectorsFromSequenceFiles

Once this patch is applied, consider moving 
o.a.m.text.SparseVectorsFromSequenceFiles in mahout-utils to 
o.a.m.utils.text.SparseVectorsFromSequenceFiles

Also, consider using the Path(String parent, String child) constructor instead 
of concatenating strings with '/' and then calling new Path(String) later on. I 
lacked the clarity to pursue this one to its bitter end this evening.
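
As a rough illustration of the two suggestions above (named constants for 
path components, and the two-argument Path constructor), here is a minimal 
sketch. It assumes Hadoop's org.apache.hadoop.fs.Path is on the classpath. The 
class name PathJoinSketch, the constant TOKENIZED_DOCS_DIR and its value are 
hypothetical and are not taken from the patch.

```java
import org.apache.hadoop.fs.Path;

public class PathJoinSketch {

  // Hypothetical constant: the name and value are made up for illustration
  // and do not come from MAHOUT-285.patch.
  static final String TOKENIZED_DOCS_DIR = "tokenized-documents";

  // Style the comment suggests moving away from: build the path by string
  // concatenation with '/', then wrap the result in a Path.
  static Path byConcatenation(String outputDir) {
    return new Path(outputDir + "/" + TOKENIZED_DOCS_DIR);
  }

  // Suggested style: let Path(String parent, String child) join the pieces.
  static Path byConstructor(String outputDir) {
    return new Path(outputDir, TOKENIZED_DOCS_DIR);
  }

  public static void main(String[] args) {
    // Print both forms so they can be compared side by side.
    System.out.println(byConcatenation("output"));
    System.out.println(byConstructor("output"));
  }
}
```

The gain here is mostly readability: the separator is handled in one place by 
Path itself rather than being repeated at each call site.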

With these changes, if this isn't ready to close, it is very very close.



> Wrap up collocation and dictionary vectorizer integration
> ---------------------------------------------------------
>
>                 Key: MAHOUT-285
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-285
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>             Fix For: 0.3
>
>         Attachments: MAHOUT-285.patch, MAHOUT-285.patch, MAHOUT-285.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Final bit of work to integrate collocations into 0.3
> * Modify collocation finder to use dictionary vectorizer output as input 
> (saves analysis step)
> * Generate input dictionary for dictionary vectorizer that includes unigrams 
> and collocations.
> Chatted with Robin this morning. I know what needs to be done; it is just a 
> matter of grinding out the code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
