[
https://issues.apache.org/jira/browse/DATAFU-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884693#comment-13884693
]
Matthew Hayes commented on DATAFU-24:
-------------------------------------
So, I've thought about this some more. Here are the differences between these
two UDFs:
1) Entropy only calculates the empirical entropy. StreamingEntropy can
calculate empirical, chao-shen, and potentially others.
2) Entropy supports both algebraic and accumulator execution. StreamingEntropy
only supports accumulator.
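For reference, the empirical (plug-in) estimate mentioned in 1) is the Shannon entropy of the observed frequencies, H = -sum(p_i * log(p_i)) with p_i = count_i / total. The snippet below is only a minimal sketch of that formula, not the DataFu implementation. The class name EmpiricalEntropySketch and the method entropy are hypothetical, and it assumes the natural log as the base.

```java
// Hypothetical illustration of the empirical (plug-in) entropy estimate.
// This is not DataFu code; names and the choice of natural log are assumptions.
final class EmpiricalEntropySketch {

    // Computes -sum(p_i * ln(p_i)) where p_i = counts[i] / total.
    // Zero counts contribute nothing; an empty or all-zero input yields 0.
    static double entropy(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (long c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p);
            }
        }
        return h;
    }
}
```

Because the estimate depends only on per-value counts, partial counts can be combined before the final sum, which is what makes an algebraic implementation possible for this estimator.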
I think what may make more sense is to rename Entropy to EmpiricalEntropy so
there is no confusion about which method it uses. I'll open a separate JIRA to
cover this and other thoughts I have.
> Entropy constructor should be consistent with other UDFs
> --------------------------------------------------------
>
> Key: DATAFU-24
> URL: https://issues.apache.org/jira/browse/DATAFU-24
> Project: DataFu
> Issue Type: Bug
> Reporter: Matthew Hayes
>
> Entropy currently has the following constructors:
> {noformat}
> Entropy()
> Entropy(String base)
> {noformat}
> This is inconsistent with StreamingEntropy and StreamingCondEntropy, which
> both have constructors like the following:
> {noformat}
> StreamingEntropy()
> StreamingEntropy(String type)
> StreamingEntropy(String type, String base)
> {noformat}
> We should change Entropy to match the other UDFs.
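As a rough illustration of what the matching constructor set could look like, here is a minimal sketch using constructor chaining. EntropySketch is a hypothetical stand-in, not the actual DataFu class, and the default values "empirical" and "log" are assumptions made for the example.

```java
// Hypothetical sketch of a constructor set mirroring StreamingEntropy's
// signatures. Not the real DataFu class; the defaults are assumed.
final class EntropySketch {

    // Assumed defaults, for illustration only.
    static final String DEFAULT_TYPE = "empirical";
    static final String DEFAULT_BASE = "log";

    private final String type;
    private final String base;

    // EntropySketch(): both settings fall back to the defaults.
    EntropySketch() {
        this(DEFAULT_TYPE, DEFAULT_BASE);
    }

    // EntropySketch(String type): only the base falls back to its default.
    EntropySketch(String type) {
        this(type, DEFAULT_BASE);
    }

    // EntropySketch(String type, String base): everything is explicit.
    EntropySketch(String type, String base) {
        this.type = type;
        this.base = base;
    }

    String getType() {
        return type;
    }

    String getBase() {
        return base;
    }
}
```

Note that under this scheme the existing single-argument form, Entropy(String base), would change meaning, since the first argument becomes the type. Callers that pass only a base today would need to be updated.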