[ https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915212#comment-13915212 ]
Gunther Hagleitner commented on HIVE-6518:
------------------------------------------
I like it. Sounds like this will allow you to be more aggressive with
caching/flushing params, while having a trigger that will flush out stuff when
necessary.
+1 (assuming tests pass)
> Add a GC canary to the VectorGroupByOperator to flush whenever a GC is
> triggered
> --------------------------------------------------------------------------------
>
> Key: HIVE-6518
> URL: https://issues.apache.org/jira/browse/HIVE-6518
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.13.0
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Minor
> Attachments: HIVE-6518.1-tez.patch
>
>
> The current VectorGroupByOperator implementation flushes the in-memory hashes
> when the maximum entries or fraction of memory is hit.
> This works for most cases, but there are some corner cases where we hit GC
> overhead limits or heap size limits before either of those conditions is
> reached, because of memory used by the rest of the pipeline.
> This patch adds a SoftReference as a GC canary. If the soft reference has
> been cleared, then a full GC pass happened in the recent past and the
> aggregation hashtables should be flushed immediately, before another full GC
> is triggered.
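
For readers unfamiliar with the technique, the canary idea described above can
be sketched roughly as follows. This is a minimal illustration, not the code
from the attached patch: the class name GcCanary and the methods shouldFlush()
and simulateClearForTest() are hypothetical, and the sketch assumes the JVM
clears soft references only under memory pressure (which is typical, but the
exact policy is collector-specific).

```java
import java.lang.ref.SoftReference;

// Hypothetical helper, not taken from the HIVE-6518 patch.
final class GcCanary {
    // The referent is reachable only through this soft reference, so the
    // collector may clear it when it comes under memory pressure.
    private SoftReference<Object> canary = new SoftReference<>(new Object());

    // Returns true if the canary was cleared since the last check, which
    // suggests a memory-pressured GC ran recently. Re-arms the canary so
    // that later collections can be detected too.
    boolean shouldFlush() {
        if (canary.get() == null) {
            canary = new SoftReference<>(new Object());
            return true;
        }
        return false;
    }

    // Test hook (an assumption for this sketch): clears the reference by
    // hand to imitate what the collector would do.
    void simulateClearForTest() {
        canary.clear();
    }
}
```

An operator would call shouldFlush() alongside its existing entry-count and
memory-fraction checks, and flush its aggregation hashtables when any of them
fires.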