Thanks Syed.  I'm not using HBase, so I don't think this is related to my 
problem.

Dave Shine
Sr. Software Engineer
321.939.5093 direct |  407.314.0122 mobile
CI Boost(tm) Clients  Outperform Online(tm)  
www.ciboost.com

From: syed kather [mailto:in.ab...@gmail.com]
Sent: Friday, July 20, 2012 9:58 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Distributing Keys across Reducers

Dave Shine,
    Can you share how much data each map task is processing? If the load across 
map tasks is uneven, it might be a hot-spotting problem.
Have a look at 
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
  I faced the same problem, and I am trying to implement HBaseWD.

            Thanks and Regards,
        S SYED ABDUL KATHER




On Fri, Jul 20, 2012 at 6:50 PM, Dave Shine 
<dave.sh...@channelintelligence.com> wrote:
I have a job that is emitting over 3 billion rows from the map to the reduce.  
The job is configured with 43 reduce tasks.  A perfectly even distribution 
would amount to about 70 million rows per reduce task.  However, I actually got 
around 60 million for most of the tasks, one task got over 100 million, and one 
task got almost 350 million.  This uneven distribution caused the job to run 
exceedingly long.
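
To put rough numbers on the imbalance described above, here is a small sketch of the arithmetic. It assumes "over 3 billion" means exactly 3 billion rows and "almost 350 million" means 350 million; the variable and function names are illustrative only.

```python
# Figures quoted in the message; the exact values are assumptions
# ("over 3 billion" is taken as exactly 3 billion).
total_rows = 3_000_000_000
num_reducers = 43

# Rows each reducer would receive if keys were spread perfectly evenly.
even_share = total_rows / num_reducers

# Approximate per-task loads reported in the message.
observed = {
    "typical task": 60_000_000,
    "second-largest task": 100_000_000,
    "largest task": 350_000_000,
}


def skew_factor(rows, share=even_share):
    """Return how many times the even share a reducer received."""
    return rows / share


print(f"even share: {even_share:,.0f} rows per reducer")
for name, rows in observed.items():
    print(f"{name}: {rows:,} rows, {skew_factor(rows):.2f}x the even share")
```

Under these rounded figures the largest task holds roughly five times the even share. Since the reduce phase only finishes when its slowest task does, that one task sets the job's runtime.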

I believe this is referred to as a "key skew problem", which I know is heavily 
dependent on the actual data being processed.  Can anyone point me to any blog 
posts, white papers, etc. that might give me some options on how to deal with 
this issue?
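
One commonly described option for this kind of skew is key salting: append a small random suffix to each key in the map output so that a single hot key is spread over several reducers, then merge the partial results in a follow-up step. The sketch below is plain Python on synthetic data, not Hadoop code. `partition` is only a stand-in for a hash partitioner (it uses CRC32, not Java's `hashCode`), and `NUM_SALTS`, `salted_key`, and `unsalt` are hypothetical names chosen for illustration.

```python
import random
import zlib
from collections import Counter

NUM_REDUCERS = 43  # reducer count from the message
NUM_SALTS = 16     # hypothetical fan-out for each key


def partition(key, num_reducers=NUM_REDUCERS):
    """Stand-in for a hash partitioner: stable hash of the key modulo reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def salted_key(key, rng, num_salts=NUM_SALTS):
    """Append a random salt so one key can land on up to num_salts partitions."""
    return f"{key}#{rng.randrange(num_salts)}"


def unsalt(key):
    """Strip the salt again, as a follow-up aggregation step would."""
    return key.rsplit("#", 1)[0]


def reducer_loads(keys):
    """Count how many records each partition receives."""
    return Counter(partition(k) for k in keys)


rng = random.Random(0)
# Synthetic data (an assumption): one hot key plus many distinct keys.
keys = ["hot"] * 50_000 + [f"k{i}" for i in range(50_000)]

plain = reducer_loads(keys)
salted = reducer_loads(salted_key(k, rng) for k in keys)

print("max load without salting:", max(plain.values()))
print("max load with salting:   ", max(salted.values()))
```

This only helps when the reduce logic can be computed from partial results (sums, counts, and similar), because a second pass has to combine the per-salt outputs under the original key. Whether it suits this particular job depends on what the reducers do with the rows.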

Thanks,
Dave Shine


