Dave Shine,
Can you share how much data each map task is processing? If the load across
map tasks is uneven, it might be a hotspotting problem.
Have a look at
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
I faced the same problem and am trying to implement HBaseWD.
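
To make the idea concrete, here is a rough sketch of key salting, which is the general trick that post builds on: prefix each key with a small bucket number so that one hot key is spread over several partitions. This is not the HBaseWD API and not Hadoop code. The names partition, salted_key and reducer_loads, the bucket count of 16, and the data are all made up for illustration; only the 43 reducers come from your job.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 43   # from the job described in the quoted message
NUM_BUCKETS = 16    # hypothetical salt fan-out, chosen only for illustration


def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    """Stand-in for a hash partitioner: stable hash of the key modulo reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def salted_key(key: str, record_id: int, num_buckets: int = NUM_BUCKETS) -> str:
    """Prefix the key with a bucket number so one hot key becomes num_buckets keys."""
    return f"{record_id % num_buckets:02d}|{key}"


def reducer_loads(keys) -> Counter:
    """Count how many records each reducer would receive."""
    return Counter(partition(k) for k in keys)


# Synthetic data (invented): one hot key carries half of 10,000 records.
records = [("hot", i) for i in range(5000)] + [(f"k{i}", i) for i in range(5000)]

plain = reducer_loads(k for k, _ in records)
salted = reducer_loads(salted_key(k, i) for k, i in records)

print("max records on one reducer, plain keys: ", max(plain.values()))
print("max records on one reducer, salted keys:", max(salted.values()))
```

With plain keys every record for the hot key goes to the same reducer. With salted keys the hot key is split into 16 smaller groups, so the reducers only produce partial results per bucket and a second, much smaller pass is needed to merge the buckets back into one result per original key.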
Thanks and Regards,
S SYED ABDUL KATHER
On Fri, Jul 20, 2012 at 6:50 PM, Dave Shine <
[email protected]> wrote:
> I have a job that is emitting over 3 billion rows from the map to the
> reduce. The job is configured with 43 reduce tasks. A perfectly even
> distribution would amount to about 70 million rows per reduce task.
> However I actually got around 60 million for most of the tasks, one task
> got over 100 million, and one task got almost 350 million. This uneven
> distribution caused the job to run exceedingly long.
>
> I believe this is referred to as a “key skew problem”, which I know is
> heavily dependent on the actual data being processed. Can anyone point me
> to any blog posts, white papers, etc. that might give me some options on
> how to deal with this issue?
>
> Thanks,
> Dave Shine
> Sr. Software Engineer
