[ https://issues.apache.org/jira/browse/HIVE-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809117#comment-13809117 ]
Remus Rusanu commented on HIVE-5692:
------------------------------------
The implementation is much more aggressive now:
- shouldFlush tests in-use vs. max at each batch boundary, not only at the
checking limit. The checking limit is only used to decide when to probe/adjust
the average variable row size
- flush is called in a while loop until shouldFlush returns false, i.e.
it flushes as much as necessary to stay within the prescribed bounds. Progress
is monitored to prevent an infinite loop.
- the checking limit is configured via HiveConf
hive.vectorized.groupby.checkinterval
- the flushing percent is configured via HiveConf
hive.vectorized.groupby.flush.percent
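
For illustration, the flush loop described above could look roughly like the sketch below. This is not the code from the attached patches: the class FlushLoopSketch, its fields, and its methods are hypothetical stand-ins. It also assumes that "in-use" is tracked in bytes, that the flush percent is a fraction of the current entries, and that the oldest entries are flushed first.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for the operator's hash table bookkeeping; not Hive code. */
public class FlushLoopSketch {
    private final long maxBytes;       // assumed upper bound on in-use memory
    private final int checkInterval;   // cf. hive.vectorized.groupby.checkinterval
    private final double flushPercent; // cf. hive.vectorized.groupby.flush.percent, as a fraction
    private final Deque<Integer> entrySizes = new ArrayDeque<>();
    private long inUseBytes = 0;
    private long avgRowSize = 0;
    private int entriesSinceCheck = 0;

    public FlushLoopSketch(long maxBytes, int checkInterval, double flushPercent) {
        this.maxBytes = maxBytes;
        this.checkInterval = checkInterval;
        this.flushPercent = flushPercent;
    }

    /** Adds one batch of entries, then flushes until usage is back within bounds. */
    public int processBatch(int[] sizes) {
        for (int size : sizes) {
            entrySizes.addLast(size);
            inUseBytes += size;
            entriesSinceCheck++;
        }
        int flushes = 0;
        // Test at every batch boundary and keep flushing while over the limit.
        while (shouldFlush()) {
            int before = entrySizes.size();
            flush();
            flushes++;
            // Progress guard: stop if a flush released nothing.
            if (entrySizes.size() >= before) {
                break;
            }
        }
        return flushes;
    }

    private boolean shouldFlush() {
        // The check interval only gates re-estimating the average row size.
        if (entriesSinceCheck >= checkInterval) {
            avgRowSize = entrySizes.isEmpty() ? 0 : inUseBytes / entrySizes.size();
            entriesSinceCheck = 0;
        }
        // The in-use vs. max comparison runs on every call.
        return inUseBytes > maxBytes;
    }

    private void flush() {
        // Release the configured fraction of the current entries, oldest first.
        int toFlush = (int) Math.ceil(entrySizes.size() * flushPercent);
        for (int i = 0; i < toFlush && !entrySizes.isEmpty(); i++) {
            inUseBytes -= entrySizes.removeFirst();
        }
    }

    public long inUseBytes() { return inUseBytes; }

    public long avgRowSize() { return avgRowSize; }
}
```

In this sketch a flush fraction of 0 still terminates, because the progress guard leaves the loop after the first flush that releases no entries.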
> Make VectorGroupByOperator parameters configurable
> --------------------------------------------------
>
> Key: HIVE-5692
> URL: https://issues.apache.org/jira/browse/HIVE-5692
> Project: Hive
> Issue Type: Sub-task
> Reporter: Remus Rusanu
> Assignee: Remus Rusanu
> Priority: Minor
> Attachments: HIVE-5692.1.patch, HIVE-5692.2.patch
>
>
> The FLUSH_CHECK_THRESHOLD and PERCENT_ENTRIES_TO_FLUSH should be configurable.