[ 
https://issues.apache.org/jira/browse/HIVE-22960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar resolved HIVE-22960.
----------------------------------
    Resolution: Won't Fix

> Approximate TopN Key Operator
> -----------------------------
>
>                 Key: HIVE-22960
>                 URL: https://issues.apache.org/jira/browse/HIVE-22960
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: Screen Shot 2020-03-02 at 4.55.46 PM.png
>
>
> "Unlike other operators such as join and group by, the top n operator shows 
> a notable “long tail” characteristic: it saturates very quickly. Updates are 
> frequent at the beginning and then become very infrequent.
> The approximation can be implemented in two ways. One way is to stop the 
> array/heap update after a certain percentage of the data has been read, for 
> example 10% or 20%, if we know the table size. The other way is to set a 
> frequency threshold for the array/heap update. Once the update frequency 
> drops to that threshold, top n processing stops."
> [~rzhappy]
> !Screen Shot 2020-03-02 at 4.55.46 PM.png|width=688,height=468!
> Y axis: number of array/heap updates per 100 ms
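
The two stopping rules described above can be sketched as follows. This is a minimal illustration on a plain min-heap, not Hive's TopN key operator: the function name `approx_top_n` and the parameters `max_fraction`, `total_rows`, `window`, and `min_updates` are hypothetical, and the update frequency is counted per window of rows rather than per 100 ms to keep the example deterministic. What the operator does with rows that arrive after it stops is not modeled here.

```python
import heapq


def approx_top_n(values, n, total_rows=None, max_fraction=None,
                 window=None, min_updates=None):
    """Track the n largest values, freezing the heap once a stop rule fires.

    Rule 1: stop after max_fraction of total_rows has been read
            (requires the table size to be known).
    Rule 2: stop once a window of `window` rows produced fewer than
            `min_updates` heap updates.
    Returns (top values in descending order, rows read before stopping).
    """
    heap = []              # min-heap holding the current top n candidates
    updates_in_window = 0  # heap updates seen in the current window
    rows_read = 0

    for value in values:
        rows_read += 1
        if len(heap) < n:
            heapq.heappush(heap, value)
            updates_in_window += 1
        elif value > heap[0]:
            # New value beats the smallest of the current top n.
            heapq.heapreplace(heap, value)
            updates_in_window += 1

        # Rule 1: fixed percentage of the input has been read.
        if max_fraction is not None and total_rows is not None:
            if rows_read >= max_fraction * total_rows:
                break

        # Rule 2: update frequency fell below the threshold.
        if window is not None and min_updates is not None:
            if rows_read % window == 0:
                if updates_in_window < min_updates:
                    break
                updates_in_window = 0

    return sorted(heap, reverse=True), rows_read
```

With neither rule configured the function returns the exact top n. With a rule configured the result is exact only if no larger value appears after the cutoff, which is the trade-off the "long tail" observation relies on.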



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
