[
https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408285#comment-15408285
]
Ted Dunning commented on MAHOUT-1853:
-------------------------------------
[~Andrew_Palumbo] There have been a number of upgrades to t-digest. It is faster
and more accurate, and very nearly API compatible.
[~pferrel] Yes, root LLR is normally distributed if there is no relationship
and you have enough data to see the negative side. Most importantly, it is signed.
And yes, the t-digest scan can be run pretty rarely. Once you know how the mass
of the data is distributed, you are good to go.
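
For reference, a minimal sketch of signed root LLR for a 2x2 contingency table
and of picking a threshold from the score distribution. The function names are
made up for illustration and are not Mahout's API. The quantile helper sorts
the scores exactly, as a stand-in for the approximate quantile a t-digest would
give on large data.

```python
import math


def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio (G^2) for the 2x2 table [[k11, k12], [k21, k22]].
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def root_llr(k11, k12, k21, k22):
    # Signed square root of LLR: negative when the first row has a lower
    # rate of k11 than the second row has of k21 (anti-correlation).
    root = math.sqrt(llr(k11, k12, k21, k22))
    # Cross-multiplied form of k11/(k11+k12) < k21/(k21+k22), which avoids
    # dividing by zero when a row is empty.
    if k11 * (k21 + k22) < k21 * (k11 + k12):
        root = -root
    return root


def quantile_threshold(scores, q):
    # Exact empirical quantile (assumption: scores fit in memory).
    ordered = sorted(scores)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]


if __name__ == "__main__":
    print(root_llr(100, 10, 10, 1000))   # strong positive association
    print(root_llr(10, 100, 1000, 10))   # strong negative association
    print(root_llr(10, 10, 10, 10))      # no association, close to zero
```

Because the score is signed, a threshold taken from the upper tail of the
distribution keeps only positively associated pairs, and it only needs to be
recomputed when the overall shape of the scores changes.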
> Improvements to CCO (Correlated Cross-Occurrence)
> -------------------------------------------------
>
> Key: MAHOUT-1853
> URL: https://issues.apache.org/jira/browse/MAHOUT-1853
> Project: Mahout
> Issue Type: New Feature
> Affects Versions: 0.12.0
> Reporter: Andrew Palumbo
> Assignee: Pat Ferrel
> Fix For: 0.13.0
>
>
> Improvements to CCO (Correlated Cross-Occurrence) to include auto-threshold
> calculation for LLR downsampling, and possible multiple fixed thresholds for
> A’A, A’B etc. This is to account for the vast difference in dimensionality
> between indicator types.