[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates
[ https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843878#action_12843878 ]

James Warren commented on HIVE-224:
---

Unfortunately I have bandwidth limitations myself, but when (if?) my queue clears I'll be happy to give it a go.

cheers,
-James

> implement lfu based flushing policy for map side aggregates
> ---
>
> Key: HIVE-224
> URL: https://issues.apache.org/jira/browse/HIVE-224
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Joydeep Sen Sarma
>
> Currently we flush some random set of rows when the map side hash table
> approaches memory limits.
> We have discussed a strategy of flushing the hash table entries that have
> been seen the least number of times (effectively an LFU flushing strategy).
> This will be very effective at reducing the amount of data sent from the map
> to the reduce step, as well as reducing the chances of any skew.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
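The LFU policy described in the issue can be sketched as follows. This is a minimal, hypothetical illustration, not Hive's actual map-side group-by code: the class `LfuPartialAggregator`, the SUM aggregate, the `maxEntries` entry count (standing in for the memory limit), the `flushCount` parameter, and the in-memory output list (standing in for the map output collector) are all assumptions. Each entry carries a hit counter, and when the table is full the keys with the fewest hits are emitted first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Hive code: map-side partial aggregation (SUM per key)
 * that flushes the least frequently seen keys (LFU) when the table is full.
 */
public class LfuPartialAggregator {
    /** Running partial sum plus the number of input rows seen for the key. */
    private static final class Entry {
        long sum;
        long hits;
    }

    /** A (key, partial sum) row sent on towards the reduce step. */
    public static final class Flushed {
        public final String key;
        public final long partialSum;

        Flushed(String key, long partialSum) {
            this.key = key;
            this.partialSum = partialSum;
        }

        @Override
        public String toString() {
            return key + "=" + partialSum;
        }
    }

    // LinkedHashMap keeps insertion order, so ties in hit count go to the oldest key.
    private final Map<String, Entry> table = new LinkedHashMap<>();
    private final List<Flushed> output = new ArrayList<>(); // stand-in for the map output collector
    private final int maxEntries; // stand-in for the memory limit (assumption)
    private final int flushCount; // keys evicted per flush (assumption)

    public LfuPartialAggregator(int maxEntries, int flushCount) {
        if (flushCount < 1 || flushCount > maxEntries) {
            throw new IllegalArgumentException("need 1 <= flushCount <= maxEntries");
        }
        this.maxEntries = maxEntries;
        this.flushCount = flushCount;
    }

    /** Folds one input row into the table, flushing cold keys first if it is full. */
    public void add(String key, long value) {
        Entry e = table.get(key);
        if (e == null) {
            // Flush before inserting so the brand-new key (0 hits) is never the victim.
            if (table.size() >= maxEntries) {
                flushLeastFrequent();
            }
            e = new Entry();
            table.put(key, e);
        }
        e.sum += value;
        e.hits++;
    }

    private void flushLeastFrequent() {
        List<String> keys = new ArrayList<>(table.keySet());
        // List.sort is stable, so keys with equal hit counts keep insertion order.
        keys.sort(Comparator.comparingLong(k -> table.get(k).hits));
        for (String k : keys.subList(0, flushCount)) {
            output.add(new Flushed(k, table.remove(k).sum));
        }
    }

    /** Emits every remaining partial aggregate, e.g. when the mapper finishes. */
    public void close() {
        for (Map.Entry<String, Entry> me : table.entrySet()) {
            output.add(new Flushed(me.getKey(), me.getValue().sum));
        }
        table.clear();
    }

    public List<Flushed> output() {
        return output;
    }

    public int size() {
        return table.size();
    }

    public static void main(String[] args) {
        LfuPartialAggregator agg = new LfuPartialAggregator(3, 1);
        for (String k : new String[] {"a", "a", "a", "b", "c", "c", "d", "a", "e"}) {
            agg.add(k, 1);
        }
        agg.close();
        System.out.println(agg.output()); // prints [b=1, d=1, a=4, c=2, e=1]
    }
}
```

In the sample run the hot key `a` stays resident and reaches the reduce side as a single row (`a=4`), while the rarely seen keys `b` and `d` are flushed early. Sorting every key on each flush costs O(n log n); flushing a larger batch per call amortizes that, and a structure that keeps keys bucketed by hit count would avoid the sort altogether, at the cost of more code.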
[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates
[ https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841714#action_12841714 ]

Zheng Shao commented on HIVE-224:
---

Hi James, currently we don't have the bandwidth to do this, but I guess it won't be too hard: we just need to use http://java.sun.com/j2se/1.4.2/docs/api/java/util/LinkedHashMap.html (search for LRU).

Are you interested in joining forces on this?
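The LinkedHashMap suggestion can be sketched with the two hooks its javadoc describes for building LRU caches: the `LinkedHashMap(int, float, boolean)` constructor with `accessOrder = true`, and an overridden `removeEldestEntry`. The class name `LruFlushingTable`, the SUM aggregate, the entry-count limit, and the `flushed` list standing in for map output are assumptions for illustration only. As written this gives LRU (least recently used) flushing; true LFU behaviour would additionally need per-key hit counts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Hive code: a bounded SUM-per-key table built on
 * LinkedHashMap's access-order mode. This yields LRU eviction, not LFU.
 */
public class LruFlushingTable extends LinkedHashMap<String, Long> {
    private final int maxEntries;
    private final List<String> flushed = new ArrayList<>(); // stand-in for map output

    public LruFlushingTable(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed.
        super(16, 0.75f, true);
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be >= 1");
        }
        this.maxEntries = maxEntries;
    }

    /** Adds value to the key's running sum; both get and put count as accesses. */
    public void add(String key, long value) {
        Long current = get(key);
        put(key, current == null ? value : current + value);
    }

    /** Invoked by put after inserting a new key; returning true evicts the eldest. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
        if (size() > maxEntries) {
            // Emit the evicted partial aggregate instead of silently dropping it.
            flushed.add(eldest.getKey() + "=" + eldest.getValue());
            return true;
        }
        return false;
    }

    public List<String> flushed() {
        return flushed;
    }

    public static void main(String[] args) {
        LruFlushingTable t = new LruFlushingTable(2);
        for (String k : new String[] {"a", "b", "a", "c", "b"}) {
            t.add(k, 1);
        }
        System.out.println(t.flushed() + " " + t); // prints [b=1, a=2] {c=1, b=1}
    }
}
```

In the example run, `a` is flushed even though it has been seen twice, because it was the least recently used key when `b` came back; an LFU policy as described in the issue would have flushed `c` (one hit) instead.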
[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates
[ https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841692#action_12841692 ]

James Warren commented on HIVE-224:
---

I think I bumped up against this or a related issue today. Are there any plans to incorporate this into a future release?

thanks,
-James
[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates
[ https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757715#action_12757715 ]

Joydeep Sen Sarma commented on HIVE-224:

No, I guess we didn't, although it's an easy one. Fallout of reading the SOSP paper? Ridiculous: they are reporting 'accumulator partial-hash' as something new (never reported in the literature) when reference #1 in their paper implements exactly that. So much for research.
[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates
[ https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757646#action_12757646 ]

Jeff Hammerbacher commented on HIVE-224:

Hey Joy,

Out of curiosity, did you guys ever look at this issue further?

Thanks,
Jeff