[ https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408387#comment-15408387 ]

Ted Dunning commented on MAHOUT-1853:
-------------------------------------

[~pferrel] Computing the parameters of a normal distribution is definitely 
cheaper than updating a t-digest, but I doubt the difference will be 
noticeable. Updating the mean and sd takes a few additions and divisions, 
while updating a t-digest with a new sample takes 100-200 ns on average.
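To make "a few additions and divisions" concrete, here is a minimal Python sketch of an incremental mean/sd update. It uses Welford's online algorithm. The comment does not say which update rule it has in mind, so Welford's method is an assumption here, chosen because it is numerically stable. `RunningStats` is a hypothetical illustrative class, not Mahout code.

```python
import math


class RunningStats:
    """Hypothetical helper: online mean / standard deviation via Welford's algorithm."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def add(self, x: float) -> None:
        # Constant work per sample: a few additions/subtractions, one division, one multiply.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def sd(self) -> float:
        # Sample standard deviation; undefined (NaN) for fewer than two samples.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else float("nan")


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(value)
print(f"mean={stats.mean:.3f} sd={stats.sd:.3f}")  # prints: mean=5.000 sd=2.138
```

Each `add` call costs a fixed handful of floating-point operations and no allocation. That is why it is cheaper per sample than a t-digest update.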

But the big win comes when the data being collected is grossly non-normal, or 
when what matters is an anomalous tail in an otherwise normal 
distribution. Both of these cases apply here.
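The point about non-normal data can be checked with a small experiment. The following is a sketch of our own construction, not Mahout code. It summarizes a deterministic, heavily skewed sample by its mean and sd only, then compares the normal model's quantiles with exact empirical quantiles. A t-digest approximates the empirical column in bounded memory. The full sort used here is only a stand-in for it, assumed for simplicity.

```python
import math
from statistics import NormalDist, fmean, pstdev


def empirical_quantile(sorted_xs, q):
    """Exact nearest-rank quantile of an already sorted list (stand-in for a t-digest)."""
    rank = max(1, math.ceil(q * len(sorted_xs)))
    return sorted_xs[rank - 1]


# Deterministic skewed sample: exponential(1) quantiles at evenly spaced points.
n = 10_000
xs = sorted(-math.log(1.0 - (i + 0.5) / n) for i in range(n))

# Normal model: the data are summarized by mean and standard deviation only.
normal_fit = NormalDist(fmean(xs), pstdev(xs))

for q in (0.5, 0.99, 0.999):
    print(f"q={q}: normal fit {normal_fit.inv_cdf(q):.2f}, "
          f"empirical {empirical_quantile(xs, q):.2f}")
```

On this sample the normal model puts the 99.9th percentile at roughly 4.1, while the exact value is roughly 6.9 (close to ln 1000). The normal model also overstates the median. A tail-sensitive summary avoids exactly this kind of error.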



> Improvements to CCO (Correlated Cross-Occurrence)
> -------------------------------------------------
>
>                 Key: MAHOUT-1853
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1853
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.12.0
>            Reporter: Andrew Palumbo
>            Assignee: Pat Ferrel
>             Fix For: 0.13.0
>
>
> Improvements to CCO (Correlated Cross-Occurrence) to include auto-threshold 
> calculation for LLR downsampling, and possible multiple fixed thresholds for 
> A’A, A’B etc. This is to account for the vast difference in dimensionality 
> between indicator types.
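For readers unfamiliar with LLR downsampling, here is a minimal Python sketch. It scores a 2x2 cooccurrence table with the G² log-likelihood ratio, written in the unnormalized-entropy form. It then applies one possible "auto-threshold": keep pairs whose score reaches a chosen quantile of all observed scores. That thresholding rule, the function names, and the counts are hypothetical illustrations. The issue does not specify how the automatic threshold is computed.

```python
import math


def x_log_x(x: float) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: float) -> float:
    # Unnormalized entropy: N log N - sum(k log k).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22) -> float:
    """G^2 for a 2x2 table. k11: A and B together, k12: B without A,
    k21: A without B, k22: neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def keep_above_quantile(scores, q):
    """Hypothetical auto-threshold: keep pairs scoring at or above the
    nearest-rank q-quantile of all scores."""
    ranked = sorted(scores.values())
    cutoff = ranked[max(1, math.ceil(q * len(ranked))) - 1]
    return {pair: s for pair, s in scores.items() if s >= cutoff}


# Hypothetical cooccurrence counts (k11, k12, k21, k22) for three item pairs.
tables = {
    "strong": (100, 10, 10, 10_000),
    "moderate": (5, 100, 100, 10_000),
    "indep": (10, 10, 10, 10),
}
scores = {pair: llr(*t) for pair, t in tables.items()}
print(sorted(keep_above_quantile(scores, 0.9)))
```

With many indicator types, a quantile-based cutoff adapts to each score distribution, whereas a single fixed LLR threshold does not. Estimating such quantiles in a streaming setting is the kind of job a t-digest is suited for.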



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
