[ 
https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916282#comment-13916282
 ] 

Remus Rusanu commented on HIVE-6518:
------------------------------------

Can you modify the LOG.debug at the top of flush() to call out when the 
flush was triggered by gcCanary.get() == null? I was thinking: keep a count 
of gcCanary allocations and print it in the LOG.debug message. This will tell 
us whether the GC is the trigger, and also how often that has occurred over 
the operator's lifetime, which helps when debugging.
+1
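
A rough sketch of the counting idea, for illustration only. The class and 
method names (CountingCanarySketch, flushDebugMessage, afterFlush, 
clearCanaryForTesting) are hypothetical and are not taken from the patch; the 
message is built as a plain String here, on the assumption that the real 
operator would pass something similar to LOG.debug at the top of flush().

```java
import java.lang.ref.SoftReference;

// Hypothetical stand-in for the relevant state in the operator; not the patch code.
final class CountingCanarySketch {
    private SoftReference<Object> gcCanary;
    // Number of times the canary has been (re)allocated over the operator lifetime.
    private int canaryAllocations = 0;

    CountingCanarySketch() {
        armCanary();
    }

    // Allocate a fresh canary and bump the counter.
    private void armCanary() {
        gcCanary = new SoftReference<>(new Object());
        canaryAllocations++;
    }

    // Text that could be handed to LOG.debug at the top of flush() (assumed wiring).
    public String flushDebugMessage() {
        boolean gcTriggered = gcCanary.get() == null;
        return "flush: gcTriggered=" + gcTriggered
            + ", gcCanaryAllocations=" + canaryAllocations;
    }

    // After a flush, re-arm the canary only if the collector cleared it.
    public void afterFlush() {
        if (gcCanary.get() == null) {
            armCanary();
        }
    }

    // Test hook: Reference.clear() mimics the collector clearing the referent.
    public void clearCanaryForTesting() {
        gcCanary.clear();
    }

    public int getCanaryAllocations() {
        return canaryAllocations;
    }
}
```

With this shape, an allocation count above 1 in the debug output would suggest 
that at least one earlier flush was GC-triggered.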

> Add a GC canary to the VectorGroupByOperator to flush whenever a GC is 
> triggered
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-6518
>                 URL: https://issues.apache.org/jira/browse/HIVE-6518
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Minor
>         Attachments: HIVE-6518.1-tez.patch
>
>
> The current VectorGroupByOperator implementation flushes the in-memory hashes 
> when the maximum number of entries or fraction of memory is hit.
> This works for most cases, but there are some corner cases where we hit GC 
> overhead limits or heap size limits before either of those conditions is 
> reached, due to the rest of the pipeline.
> This patch adds a SoftReference as a GC canary. If the soft reference is 
> dead, then a full GC pass happened sometime in the recent past, and the 
> aggregation hashtables should be flushed immediately before another full GC 
> is triggered.
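
The canary mechanism described above could look roughly like the following. 
This is a minimal sketch, not the code in HIVE-6518.1-tez.patch: the class 
GcCanarySketch and its methods are hypothetical names. It relies only on the 
documented behaviour of java.lang.ref.SoftReference, whose referent the JVM 
may clear under memory pressure and is guaranteed to clear before throwing 
OutOfMemoryError.

```java
import java.lang.ref.SoftReference;

// Hypothetical helper illustrating the GC-canary idea; not the patch code.
final class GcCanarySketch {
    // The referent is only softly reachable, so the collector is free to clear it.
    private SoftReference<Object> gcCanary = new SoftReference<>(new Object());

    // True once the collector has cleared the canary since it was last armed.
    public boolean canaryIsDead() {
        return gcCanary.get() == null;
    }

    // Allocate a new canary, e.g. after the hashtables have been flushed.
    public void rearm() {
        gcCanary = new SoftReference<>(new Object());
    }

    // Test hook: Reference.clear() mimics the collector clearing the referent.
    public void simulateGcForTesting() {
        gcCanary.clear();
    }
}
```

An operator using such a helper would presumably check canaryIsDead() next to 
its existing entry-count and memory-fraction checks, flush if it returns true, 
and then call rearm().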



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
