[ 
https://issues.apache.org/jira/browse/DATAFU-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918377#comment-13918377
 ] 

Matthew Hayes commented on DATAFU-26:
-------------------------------------

Committed:  4aa2ef2a425dab3e3d5f5bdacf095af0b18fb993

> Resolve entropy UDF naming conventions
> --------------------------------------
>
>                 Key: DATAFU-26
>                 URL: https://issues.apache.org/jira/browse/DATAFU-26
>             Project: DataFu
>          Issue Type: Task
>            Reporter: Matthew Hayes
>            Assignee: jian wang
>             Fix For: 1.3.0
>
>         Attachments: 0001-update-entropy-naming-conventions.patch
>
>
> There are a couple of issues with the naming of the entropy UDFs that we 
> should work out before the next release.
> StreamingEntropy supports multiple estimation methods.  Entropy, however, only 
> supports the empirical method.  The supported constructors also differ as a 
> result.  Although Entropy's documentation states that it computes the 
> empirical entropy, I think the name itself may lead to confusion.  
> StreamingEntropy takes the data in sorted order.  Using this sorted data 
> it computes counts, which are then used to compute entropy.  Entropy, on the 
> other hand, takes counts directly and computes entropy.  These counts need to 
> be computed before calling it.  Our convention in DataFu has been that 
> "Streaming" implies that the data does not need to be sorted, so 
> StreamingEntropy is in conflict with this.
> My proposal is:
> 1) Rename Entropy to EmpiricalEntropy
> 2) Rename StreamingEntropy to Entropy
> 3) Clearly document why you would use EmpiricalEntropy over Entropy.  It will 
> be more efficient in some scenarios and we should explain this.
> One open question I have is whether we should distinguish in the name somehow 
> that EmpiricalEntropy accepts counts, not the actual items themselves.  
> EmpiricalCountBasedEntropy seems verbose.
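
To illustrate the distinction drawn in the description above, here is a minimal Python sketch of the two input styles: one function takes precomputed counts, the other takes raw items that are already sorted and derives the counts itself. This is not DataFu's implementation. The function names are hypothetical, only the empirical (plug-in) estimator is shown, and the natural-log base is an assumption.

```python
import math
from itertools import groupby


def entropy_from_counts(counts):
    """Empirical (plug-in) entropy in nats from per-item occurrence counts.

    Hypothetical helper, not a DataFu API.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    # H = log(N) - (1/N) * sum(c * log(c)), equivalent to -sum(p * log(p))
    return math.log(total) - sum(c * math.log(c) for c in counts) / total


def entropy_from_sorted_items(items):
    """Same estimate, but from raw items that must already be sorted.

    groupby only merges adjacent equal items, which is why the input
    has to arrive sorted (or at least grouped).
    """
    counts = (sum(1 for _ in group) for _, group in groupby(items))
    return entropy_from_counts(counts)


if __name__ == "__main__":
    print(entropy_from_counts([2, 2]))
    print(entropy_from_sorted_items(sorted(["a", "b", "a", "b"])))
```

If the items are not sorted, the second function treats each run of equal items as a separate symbol and overestimates the entropy. The count-based form avoids the sort entirely when counts are already available, which is one scenario where it would be cheaper.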



--
This message was sent by Atlassian JIRA
(v6.2#6252)
