[ https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841714#action_12841714 ]
Zheng Shao commented on HIVE-224:
---------------------------------

Hi James, currently we don't have the bandwidth to do this, but I guess it won't be too hard - we just need to use http://java.sun.com/j2se/1.4.2/docs/api/java/util/LinkedHashMap.html (search for LRU). Are you interested in joining forces on this?

> implement lfu based flushing policy for map side aggregates
> -----------------------------------------------------------
>
>                 Key: HIVE-224
>                 URL: https://issues.apache.org/jira/browse/HIVE-224
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Joydeep Sen Sarma
>
> currently we flush some random set of rows when the map side hash table
> approaches memory limits.
> we have discussed a strategy of flushing hash table entries that have
> been seen the least number of times (effectively an LFU flushing strategy). This
> will be very effective at reducing the amount of data sent from map to reduce
> step - as well as reduce the chances for any skews.
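For reference, the LinkedHashMap approach mentioned in the comment can be sketched roughly as follows. This is only an illustration, not Hive code: the class name LruFlushingMap, the maxEntries limit, and the flushed list are hypothetical. Note that an access-ordered LinkedHashMap evicts the least recently used entry (LRU), which approximates but is not the same as the LFU policy the issue describes; a true LFU policy would also need a per-entry hit count.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a bounded map that "flushes" the least recently
// accessed entry whenever an insert pushes it past maxEntries.
class LruFlushingMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    // Entries evicted so far. In a map-side aggregation these would be
    // forwarded to the reduce step; here they are only collected.
    final List<Map.Entry<K, V>> flushed = new ArrayList<>();

    LruFlushingMap(int maxEntries) {
        // accessOrder = true: iteration order runs from least recently
        // accessed to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxEntries) {
            // Copy the entry before the map removes it.
            flushed.add(new AbstractMap.SimpleImmutableEntry<>(eldest));
            return true;
        }
        return false;
    }
}
```

With a limit of 2, inserting keys "a" and "b", reading "a", and then inserting "c" would flush "b", since "a" was touched more recently. A memory-based limit, as in the issue, would need a different trigger than a fixed entry count.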