kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459243033
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
maxHashTblMemory/1024/1024,
gcCanary.get() == null ? "dead" : "alive"));
}
+ int avgAccess = computeAvgAccess();
/* Iterate the global (keywrapper,aggregationbuffers) map and emit
a row for each key */
Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
mapKeysAggregationBuffers.entrySet().iterator();
while(iter.hasNext()) {
Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+ if (!all && avgAccess >= 1) {
+ // Retain entries when access pattern is > than average access
+ if (pair.getValue().getAccessCount() > avgAccess) {
Review comment:
@ashutoshc this conversation was still not resolved - I was waiting for
a response. I think we could have improved this patch further with just a
small change.
@rbalamohan we are batch-removing elements from the cache here, which does
not happen in a regular LRU.
If we have {{K}} cache slots and the stream starts with an element that
occurs, say, {{N*K}} times, that alone raises the bar for retaining a new
cache element during flush to {{N}}.
I think the access counters of the retained entries should at least be reset
to 0 - that would increase the heuristic's effectiveness and neutralize
long-term memory effects like the one above.
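For illustration, a minimal sketch of that counter-reset idea inside the
flush loop; {{resetAccessCount()}} is a hypothetical helper on
{{VectorAggregationBufferRow}}, not something that exists in the current
patch:

    while (iter.hasNext()) {
      Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
      if (!all && avgAccess >= 1 && pair.getValue().getAccessCount() > avgAccess) {
        // Retain the hot entry, but clear its counter so past popularity
        // does not keep raising the retention bar for newer keys.
        pair.getValue().resetAccessCount();  // hypothetical helper
        continue;
      }
      // ... emit the row and remove the entry from mapKeysAggregationBuffers ...
    }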