[ https://issues.apache.org/jira/browse/MAHOUT-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-647.
------------------------------
Resolution: Fixed
> Two small bugs in seq2sparse
> ----------------------------
>
> Key: MAHOUT-647
> URL: https://issues.apache.org/jira/browse/MAHOUT-647
> Project: Mahout
> Issue Type: Bug
> Components: Utils
> Affects Versions: 0.4
> Reporter: Vasil Vasilev
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 0.5
>
>
> From Vasil on the mailing list:
> 1. The minLLR parameter is not taken into account. The problem is that in
>    the CollocDriver class
>        Job job = new Job(conf);
>    is executed before
>        conf.setFloat(LLRReducer.MIN_LLR, minLLRValue);
>    so the job's copy of the configuration never receives the value
>    (see the CollocDriver.computeNGramsPruneByLLR method).
> 2. The maxDFPercent parameter is not taken into account. The problem is that
>    in TFIDFPartialVectorReducer.reduce the check compares a ratio against a
>    percentage:
>        if (df / vectorCount > maxDfPercent) {
>          if (log.isInfoEnabled()) {
>            log.info("ommiting {}", e.index());
>          }
>          continue;
>        }
>    and should be:
>        if (df*100 / vectorCount > maxDfPercent) {
>          if (log.isInfoEnabled()) {
>            log.info("ommiting {}", e.index());
>          }
>          continue;
>        }
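
For illustration only, here is a minimal, Hadoop-free sketch of the two problems described above. It is not the Mahout code: SnapshotJob is a hypothetical stand-in for a job object that copies its configuration when constructed, the key "minLLR" is an assumed name, and df, vectorCount and maxDfPercent are assumed to be integer-typed.

```java
import java.util.HashMap;
import java.util.Map;

public class Seq2SparseSketch {

  /** Hypothetical stand-in for a job that snapshots its configuration on construction. */
  static final class SnapshotJob {
    private final Map<String, Float> settings;

    SnapshotJob(Map<String, Float> conf) {
      // Copy the settings; later changes to conf are not visible to this job.
      this.settings = new HashMap<>(conf);
    }

    float getFloat(String key, float defaultValue) {
      Float value = settings.get(key);
      return value == null ? defaultValue : value;
    }
  }

  /** The check as reported: integer ratio (0 or 1 here) compared against a percentage. */
  static boolean exceedsOriginal(long df, long vectorCount, long maxDfPercent) {
    return df / vectorCount > maxDfPercent;
  }

  /** The proposed check: scale to a percentage before dividing. */
  static boolean exceedsFixed(long df, long vectorCount, long maxDfPercent) {
    return df * 100 / vectorCount > maxDfPercent;
  }

  public static void main(String[] args) {
    // Bug 1: a value set after the job is created is not seen by the job.
    Map<String, Float> conf = new HashMap<>();
    SnapshotJob early = new SnapshotJob(conf);
    conf.put("minLLR", 5.0f); // assumed key name
    SnapshotJob late = new SnapshotJob(conf);
    System.out.println("early job sees minLLR = " + early.getFloat("minLLR", 1.0f));
    System.out.println("late job sees minLLR  = " + late.getFloat("minLLR", 1.0f));

    // Bug 2: a term in 95 of 100 documents with a 90 percent cutoff.
    System.out.println("original check omits term: " + exceedsOriginal(95, 100, 90));
    System.out.println("fixed check omits term:    " + exceedsFixed(95, 100, 90));
  }
}
```

Under these assumptions, the remedy for the first problem is simply to set the value on the configuration before constructing the job; the second only needs the left-hand side expressed in the same unit as maxDfPercent.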
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira