kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459243033
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
maxHashTblMemory/1024/1024,
gcCanary.get() == null ? "dead" : "alive"));
}
+ int avgAccess = computeAvgAccess();
/* Iterate the global (keywrapper,aggregationbuffers) map and emit
a row for each key */
Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
mapKeysAggregationBuffers.entrySet().iterator();
while(iter.hasNext()) {
Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+ if (!all && avgAccess >= 1) {
+ // Retain entries when access pattern is > than average access
+ if (pair.getValue().getAccessCount() > avgAccess) {
Review comment:
@ashutoshc this conversation was still not resolved - I was waiting for
a response. I think we could have improved this patch further with just a
small change.
@rbalamohan we are batch-removing elements from the cache here, which does
not happen in a regular LRU.
If we have {{K}} cache slots and the stream starts with an element that
occurs, say, {{N*K}} times, that alone raises the bar for retaining a new
cache element during flush to {{N}}.
I think the access counters of the retained entries should at least be reset
to 0 - that would increase the heuristic's effectiveness and neutralize
long-term memory effects like the one above.
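For illustration, a minimal sketch of that counter-reset idea inside the
flush loop; {{resetAccessCount()}} is a hypothetical helper on
{{VectorAggregationBufferRow}}, not something that exists in the current
patch:

    while (iter.hasNext()) {
      Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
      if (!all && avgAccess >= 1 && pair.getValue().getAccessCount() > avgAccess) {
        // Retain the hot entry, but clear its counter so past popularity
        // does not keep raising the retention bar for newer keys.
        pair.getValue().resetAccessCount();  // hypothetical helper
        continue;
      }
      // ... emit the row and remove the entry from mapKeysAggregationBuffers ...
    }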