[ https://issues.apache.org/jira/browse/MAHOUT-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-647.
------------------------------

    Resolution: Fixed

> Two small bugs in seq2sparse
> ----------------------------
>
>                 Key: MAHOUT-647
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-647
>             Project: Mahout
>          Issue Type: Bug
>          Components: Utils
>    Affects Versions: 0.4
>            Reporter: Vasil Vasilev
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.5
>
>
> From Vasil on the mailing list:
> 1. The minLLR parameter is not taken into account. In the
> CollocDriver.computeNGramsPruneByLLR method,
>   Job job = new Job(conf);
> is executed before
>   conf.setFloat(LLRReducer.MIN_LLR, minLLRValue);
> so the Job, which copies the Configuration when it is constructed, never
> sees the minLLR value.
> 2. maxDFPercent is not taken into account. In
> TFIDFPartialVectorReducer.reduce the check compares a ratio (at most 1)
> with a percentage:
>   if (df / vectorCount > maxDfPercent) {
>     if (log.isInfoEnabled()) {
>       log.info("ommiting {}", e.index());
>     }
>     continue;
>   }
> and should be:
>   if (df * 100 / vectorCount > maxDfPercent) {
>     if (log.isInfoEnabled()) {
>       log.info("ommiting {}", e.index());
>     }
>     continue;
>   }
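
The two points quoted above can be reproduced in isolation. The following is
a minimal sketch, not the Mahout code: the class name Seq2SparseBugSketch, the
SnapshotJob stand-in, and the helper methods are hypothetical, and the use of
long for df, vectorCount and maxDfPercent is an assumption, since the quoted
snippet does not show their types. SnapshotJob only imitates the behaviour
relevant here, namely that a Hadoop Job works on a copy of the Configuration
it is constructed with.

```java
import java.util.HashMap;
import java.util.Map;

public class Seq2SparseBugSketch {

    // Hypothetical stand-in for a job that copies its configuration
    // at construction time (the behaviour behind point 1).
    static final class SnapshotJob {
        private final Map<String, String> conf;

        SnapshotJob(Map<String, String> conf) {
            this.conf = new HashMap<>(conf); // a copy, not a live reference
        }

        String get(String key) {
            return conf.get(key);
        }
    }

    // Point 1, buggy order: the value is set after the job took its copy.
    static String setAfterConstruction(String key, String value) {
        Map<String, String> conf = new HashMap<>();
        SnapshotJob job = new SnapshotJob(conf);
        conf.put(key, value); // too late, the job never sees this
        return job.get(key);
    }

    // Point 1, fixed order: the value is set before the job is created.
    static String setBeforeConstruction(String key, String value) {
        Map<String, String> conf = new HashMap<>();
        conf.put(key, value);
        SnapshotJob job = new SnapshotJob(conf);
        return job.get(key);
    }

    // Point 2, check as quoted: with integer types the quotient is 0 or 1.
    static boolean exceedsOriginal(long df, long vectorCount, long maxDfPercent) {
        return df / vectorCount > maxDfPercent;
    }

    // Point 2, check as proposed: scale to a percentage before dividing.
    static boolean exceedsProposed(long df, long vectorCount, long maxDfPercent) {
        return df * 100 / vectorCount > maxDfPercent;
    }

    public static void main(String[] args) {
        System.out.println(setAfterConstruction("minLLR", "1.0"));
        System.out.println(setBeforeConstruction("minLLR", "1.0"));
        System.out.println(exceedsOriginal(950, 1000, 90));
        System.out.println(exceedsProposed(950, 1000, 90));
    }
}
```

With a term that occurs in 950 of 1000 documents and a limit of 90 percent,
the quoted check keeps the term while the proposed check drops it. If the
variables were floating point instead, the quoted check would still compare a
ratio of at most 1 against a percentage, so the term would be kept as well.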

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
