Implement LFU-based flushing policy for map-side aggregates
-----------------------------------------------------------

                 Key: HIVE-224
                 URL: https://issues.apache.org/jira/browse/HIVE-224
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Joydeep Sen Sarma


Currently, we flush an arbitrary set of rows when the map-side hash table
approaches its memory limit.

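We have discussed a strategy of flushing the hash table entries that have been
seen the least number of times (effectively an LFU flushing strategy). This
should be very effective at reducing the amount of data sent from the map step
to the reduce step, because frequently seen keys stay resident and keep
absorbing rows into a single partial aggregate. It should also reduce the
chance of skew at the reducers, since heavily repeated keys are mostly
collapsed on the map side.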
