[ https://issues.apache.org/jira/browse/DATAFU-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884693#comment-13884693 ]

Matthew Hayes commented on DATAFU-24:
-------------------------------------

So, I've thought about this some more. Here are the differences between these two UDFs:

1) Entropy only calculates the empirical entropy. StreamingEntropy can calculate empirical, chao-shen, and potentially others.
2) Entropy supports both the algebraic and accumulator interfaces. StreamingEntropy only supports accumulator.

I think it may make more sense to rename Entropy to EmpiricalEntropy so there is no confusion about which method it uses. I'll open a separate JIRA to cover this and other thoughts I have.

> Entropy constructor should be consistent with other UDFs
> --------------------------------------------------------
>
>                 Key: DATAFU-24
>                 URL: https://issues.apache.org/jira/browse/DATAFU-24
>             Project: DataFu
>          Issue Type: Bug
>            Reporter: Matthew Hayes
>
> Entropy currently has the following constructors:
> {noformat}
> Entropy()
> Entropy(String base)
> {noformat}
> This is inconsistent with StreamingEntropy and StreamingCondEntropy, which
> both have constructors like the following:
> {noformat}
> StreamingEntropy()
> StreamingEntropy(String type)
> StreamingEntropy(String type, String base)
> {noformat}
> We should change Entropy to match the other UDFs.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)