[ https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408387#comment-15408387 ]
Ted Dunning commented on MAHOUT-1853:
-------------------------------------

[~pferrel] Computing the parameters of a normal distribution is definitely cheaper than updating a t-digest, but I doubt that the difference will be visible. It takes a few additions and divisions to update the mean and sd, while it takes 100-200ns on average to update a t-digest with a new sample.

But the big win happens when the data being collected is grossly non-normal, or when the stuff of interest is an anomalous tail in an otherwise normal distribution. Both of these cases apply in this situation.

> Improvements to CCO (Correlated Cross-Occurrence)
> -------------------------------------------------
>
>                 Key: MAHOUT-1853
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1853
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.12.0
>            Reporter: Andrew Palumbo
>            Assignee: Pat Ferrel
>             Fix For: 0.13.0
>
>
> Improvements to CCO (Correlated Cross-Occurrence) to include auto-threshold
> calculation for LLR downsampling, and possible multiple fixed thresholds for
> A’A, A’B etc. This is to account for the vast difference in dimensionality
> between indicator types.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
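
To illustrate the trade-off in Ted Dunning's comment above, here is a minimal Python sketch. It is not Mahout code and does not use the t-digest library. `RunningNormal` (a hypothetical name) keeps the mean and sd with Welford's online update, which costs only a few additions and divisions per sample. `empirical_quantile` (also hypothetical) sorts the data to get an exact quantile and stands in for the estimate a t-digest would approximate. The synthetic scores, mostly normal with a small anomalous tail, are an assumption made up for this example.

```python
import math
import random


class RunningNormal:
    """Welford's online update for the mean and standard deviation."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        # A few additions and one division per sample.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def sd(self):
        # Sample standard deviation (n - 1 in the denominator).
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def empirical_quantile(samples, q):
    """Exact quantile from sorted data; a stand-in for a t-digest estimate."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]


rng = random.Random(42)

# Assumed data: 98% standard normal plus a 2% anomalous tail starting at 10.
scores = [rng.gauss(0.0, 1.0) for _ in range(9800)]
scores += [10.0 + rng.expovariate(0.05) for _ in range(200)]

stats = RunningNormal()
for s in scores:
    stats.add(s)

# 99th percentile predicted by a fitted normal (z = 2.326 for p = 0.99).
normal_p99 = stats.mean + 2.326 * stats.sd()

# 99th percentile actually present in the data.
actual_p99 = empirical_quantile(scores, 0.99)

print(f"mean={stats.mean:.3f} sd={stats.sd():.3f}")
print(f"normal-model p99={normal_p99:.3f}")
print(f"empirical p99={actual_p99:.3f}")
```

With data shaped like this, the threshold taken from the fitted normal falls well below the empirical 99th percentile, because the tail inflates the sd without making the distribution normal. That is the case where a quantile sketch earns its extra per-sample cost.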