[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates

2010-03-10 Thread James Warren (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843878#action_12843878
 ] 

James Warren commented on HIVE-224:
---

Unfortunately I have bandwidth limitations myself, but when (if?) my queue 
clears I'll be happy to give it a go.

cheers,
-James

> implement lfu based flushing policy for map side aggregates
> ---
>
> Key: HIVE-224
> URL: https://issues.apache.org/jira/browse/HIVE-224
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Joydeep Sen Sarma
>
> Currently we flush a random set of rows when the map-side hash table 
> approaches its memory limit.
> We have discussed a strategy of flushing the hash table entries that have 
> been seen the fewest times (effectively an LFU flushing strategy). This 
> will be very effective at reducing the amount of data sent from the map step 
> to the reduce step, and will also reduce the chance of skew.
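
A minimal sketch of the LFU flushing idea described above, not Hive's actual implementation. The class name, the `maxEntries` and `flushBatch` parameters, and the `emitted` list (a stand-in for rows sent to the reducers) are all hypothetical. It keeps a hit count per key and, when the table is full, emits and drops the keys seen the fewest times.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Hive.
class LfuFlushingAggregator {
    // Partial aggregate plus the number of input rows that hit this key.
    private static final class Slot {
        long sum;
        long hits;
    }

    private final Map<String, Slot> table = new HashMap<String, Slot>();
    private final int maxEntries;   // stand-in for the memory limit
    private final int flushBatch;   // how many entries to evict per flush

    // Stand-in for partial aggregates sent on to the reduce step.
    final List<Map.Entry<String, Long>> emitted =
            new ArrayList<Map.Entry<String, Long>>();

    LfuFlushingAggregator(int maxEntries, int flushBatch) {
        this.maxEntries = maxEntries;
        this.flushBatch = Math.max(1, flushBatch);
    }

    void add(String key, long value) {
        Slot slot = table.get(key);
        if (slot == null) {
            // Make room before admitting a new key.
            if (table.size() >= maxEntries) {
                flushLeastFrequent();
            }
            slot = new Slot();
            table.put(key, slot);
        }
        slot.sum += value;
        slot.hits++;
    }

    Long partialSum(String key) {
        Slot slot = table.get(key);
        return slot == null ? null : Long.valueOf(slot.sum);
    }

    // Emit and drop the flushBatch entries with the lowest hit counts.
    // Sorting the whole table is O(n log n); a real implementation would
    // likely want something cheaper.
    private void flushLeastFrequent() {
        List<Map.Entry<String, Slot>> entries =
                new ArrayList<Map.Entry<String, Slot>>(table.entrySet());
        entries.sort(new Comparator<Map.Entry<String, Slot>>() {
            @Override
            public int compare(Map.Entry<String, Slot> a, Map.Entry<String, Slot> b) {
                return Long.compare(a.getValue().hits, b.getValue().hits);
            }
        });
        int n = Math.min(flushBatch, entries.size());
        for (int i = 0; i < n; i++) {
            Map.Entry<String, Slot> e = entries.get(i);
            emitted.add(new AbstractMap.SimpleImmutableEntry<String, Long>(
                    e.getKey(), Long.valueOf(e.getValue().sum)));
            table.remove(e.getKey());
        }
    }
}
```

Frequently seen keys stay in the table and keep absorbing rows, so only rarely seen keys are emitted early. A flushed key that reappears later starts again from a hit count of zero in this sketch.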

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates

2010-03-04 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841714#action_12841714
 ] 

Zheng Shao commented on HIVE-224:
-

Hi James, currently we don't have the bandwidth to do this, but I guess it 
won't be too hard - we just need to use 
http://java.sun.com/j2se/1.4.2/docs/api/java/util/LinkedHashMap.html (search 
for LRU).
Are you interested in joining forces on this?
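
For reference, a rough sketch of the LinkedHashMap approach mentioned above. Note that an access-ordered LinkedHashMap with removeEldestEntry gives LRU (least recently used) eviction, which only approximates the LFU policy in the issue title. The class name, `maxEntries`, and the `flushed` list are assumptions for illustration, not Hive code.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Hive.
class LruPartialAggregates extends LinkedHashMap<String, Long> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries; // stand-in for the memory limit

    // Stand-in for partial aggregates sent on to the reduce step.
    final List<Map.Entry<String, Long>> flushed =
            new ArrayList<Map.Entry<String, Long>>();

    LruPartialAggregates(int maxEntries) {
        // Third argument true = access order, so reads and updates move
        // an entry to the "most recently used" end of the map.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    void add(String key, long value) {
        Long old = get(key);
        put(key, Long.valueOf(old == null ? value : old.longValue() + value));
    }

    // Called by LinkedHashMap after each insertion of a new key.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
        if (size() > maxEntries) {
            // Copy the entry before the map removes it.
            flushed.add(new AbstractMap.SimpleImmutableEntry<String, Long>(eldest));
            return true;
        }
        return false;
    }
}
```

The difference from LFU matters when a heavily repeated key goes quiet for a while: LRU would flush it as soon as enough other keys are touched, regardless of how many rows it has already absorbed.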





[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates

2010-03-04 Thread James Warren (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841692#action_12841692
 ] 

James Warren commented on HIVE-224:
---

I think I bumped up against this or a related issue today - are there any 
plans to incorporate this into a future release?

thanks,
-James




[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates

2009-09-20 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757715#action_12757715
 ] 

Joydeep Sen Sarma commented on HIVE-224:


no - i guess we didn't - although it's an easy one.. fallout of reading the 
SOSP paper?

ridiculous - they are reporting 'accumulator partial-hash' as something new 
(never reported in literature) when reference #1 in their paper implements 
exactly that. so much for research.





[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates

2009-09-19 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757646#action_12757646
 ] 

Jeff Hammerbacher commented on HIVE-224:


Hey Joy,

Out of curiosity, did you guys ever look at this issue further?

Thanks,
Jeff
