Hi Deepak,

Maybe I did not make my mail clear. I had tried the instructions in the blog you mentioned, and they are working for me. Did you change the /etc/hosts file at any point?
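In case it helps: on Ubuntu, the usual /etc/hosts pitfall for multi-node Hadoop is a 127.0.1.1 line mapping the machine's own hostname to loopback, which can make a node advertise an address the other node cannot fetch map output from. A sketch of a typical two-node layout (the hostnames and 192.168.0.x addresses below are made up; substitute your own):

```
127.0.0.1   localhost
# Note: no 127.0.1.1 entry for the node's own hostname;
# each node should resolve itself to its LAN address.
192.168.0.1 master
192.168.0.2 slave
```

The same file should be identical on both nodes, and `hostname` on each machine should match the name used in conf/masters and conf/slaves.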
Regards,
Krishna

On Jul 27, 2010, at 2:30 PM, C.V.Krishnakumar wrote:

> Hi Deepak,
>
> You could refer to this too:
> http://markmail.org/message/mjq6gzjhst2inuab#query:MAX_FAILED_UNIQUE_FETCHES+page:1+mid:ubrwgmddmfvoadh2+state:results
>
> I tried those instructions and it is working for me.
> Regards,
> Krishna
>
> On Jul 27, 2010, at 12:31 PM, Deepak Diwakar wrote:
>
>> Hey friends,
>>
>> I got stuck setting up an HDFS cluster and am getting this error while
>> running the simple wordcount example (I did this 2 years back and did not
>> have any problem).
>>
>> Currently testing on hadoop-0.20.1 with 2 nodes. Instructions followed from
>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>
>> I checked the firewall settings and /etc/hosts; there is no issue there.
>> Also, master and slave are accessible both ways.
>>
>> Also, the input size is very low (~3 MB), so there shouldn't be any issue
>> with ulimit (it is btw 4096).
>>
>> Would be really thankful if anyone can guide me to resolve this.
>>
>> Thanks & regards,
>> - Deepak Diwakar
>>
>> On 28 June 2010 18:39, bmdevelopment <bmdevelopm...@gmail.com> wrote:
>>
>>> Hi, Sorry for the cross-post. But just trying to see if anyone else
>>> has had this issue before.
>>> Thanks
>>>
>>> ---------- Forwarded message ----------
>>> From: bmdevelopment <bmdevelopm...@gmail.com>
>>> Date: Fri, Jun 25, 2010 at 10:56 AM
>>> Subject: Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>>> To: mapreduce-u...@hadoop.apache.org
>>>
>>> Hello,
>>> Thanks so much for the reply.
>>> See inline.
>>>
>>> On Fri, Jun 25, 2010 at 12:40 AM, Hemanth Yamijala <yhema...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>>> I've been getting the following error when trying to run a very simple
>>>>> MapReduce job. Map finishes without problem, but the error occurs as
>>>>> soon as it enters the Reduce phase.
>>>>>
>>>>> 10/06/24 18:41:00 INFO mapred.JobClient: Task Id :
>>>>> attempt_201006241812_0001_r_000000_0, Status : FAILED
>>>>> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>>>>>
>>>>> I am running a 5-node cluster and I believe I have all my settings correct:
>>>>>
>>>>> * ulimit -n 32768
>>>>> * DNS/RDNS configured properly
>>>>> * hdfs-site.xml : http://pastebin.com/xuZ17bPM
>>>>> * mapred-site.xml : http://pastebin.com/JraVQZcW
>>>>>
>>>>> The program is very simple - it just counts a unique string in a log file.
>>>>> See here: http://pastebin.com/5uRG3SFL
>>>>>
>>>>> When I run it, the job fails and I get the following output:
>>>>> http://pastebin.com/AhW6StEb
>>>>>
>>>>> However, it runs fine when I do *not* use substring() on the value (see
>>>>> the map function in the code above).
>>>>>
>>>>> This runs fine and completes successfully:
>>>>> String str = val.toString();
>>>>>
>>>>> This causes the error and fails:
>>>>> String str = val.toString().substring(0,10);
>>>>>
>>>>> Please let me know if you need any further information.
>>>>> It would be greatly appreciated if anyone could shed some light on this
>>>>> problem.
>>>>
>>>> It catches attention that changing the code to use a substring is
>>>> causing a difference. Assuming it is consistent and not a red herring,
>>>
>>> Yes, this has been consistent over the last week. I was running 0.20.1
>>> first and then upgraded to 0.20.2, but the results have been exactly
>>> the same.
>>>
>>>> can you look at the counters for the two jobs using the JobTracker web
>>>> UI - things like map records, bytes etc. - and see if there is a
>>>> noticeable difference?
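One detail worth checking in the map code discussed above: in Java, `substring(0, 10)` throws `StringIndexOutOfBoundsException` on any line shorter than 10 characters. Since the maps are reported as finishing cleanly that may not be the cause here, but a guarded version costs nothing. A minimal sketch (the `safePrefix` helper is hypothetical, not from the original job):

```java
// Guarded prefix extraction: plain substring(0, n) throws
// StringIndexOutOfBoundsException when the input is shorter than n.
public class SafePrefix {

    // Returns at most the first n characters of s, never throwing
    // on short input.
    static String safePrefix(String s, int n) {
        return s.length() <= n ? s : s.substring(0, n);
    }

    public static void main(String[] args) {
        System.out.println(safePrefix("2010-06-24 18:41:00 INFO", 10)); // 2010-06-24
        System.out.println(safePrefix("short", 10));                    // short
    }
}
```

Inside a mapper this would replace `value.toString().substring(0, 10)` with `safePrefix(value.toString(), 10)`, so a single short line cannot fail the task.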
>>>
>>> Ok, so here is the first job using write.set(value.toString()); having
>>> *no* errors:
>>> http://pastebin.com/xvy0iGwL
>>>
>>> And here is the second job using
>>> write.set(value.toString().substring(0, 10)); that fails:
>>> http://pastebin.com/uGw6yNqv
>>>
>>> And here is yet another where I used a longer, and therefore unique,
>>> string with write.set(value.toString().substring(0, 20)); This makes
>>> every line unique, similar to the first job.
>>> Still fails:
>>> http://pastebin.com/GdQ1rp8i
>>>
>>>> Also, are the two programs being run against
>>>> the exact same input data?
>>>
>>> Yes, exactly the same input: a single csv file with 23K lines.
>>> Using a shorter string leads to more duplicate keys and therefore more
>>> combining/reducing, but going by the above it seems to fail whether the
>>> substring/key is entirely unique (23000 combine output records) or
>>> mostly the same (9 combine output records).
>>>
>>>> Also, since the cluster size is small, you could also look at the
>>>> tasktracker logs on the machines where the maps have run to see if
>>>> there are any failures when the reduce attempts start failing.
>>>
>>> Here is the TT log from the last failed job. I do not see anything
>>> besides the shuffle failure, but there may be something I am overlooking
>>> or simply do not understand:
>>> http://pastebin.com/DKFTyGXg
>>>
>>> Thanks again!
>>>
>>>> Thanks,
>>>> Hemanth
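The key-cardinality effect described in the thread - a shorter prefix collapsing many lines into few keys, a longer one keeping them unique - can be sanity-checked offline before submitting a job. A small sketch with made-up sample lines (the data and the `distinctPrefixes` helper are illustrative only, not from the failing job):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Counts how many distinct map keys a given prefix length produces,
// mirroring write.set(value.toString().substring(0, n)).
public class PrefixCardinality {

    static int distinctPrefixes(List<String> lines, int n) {
        Set<String> keys = new HashSet<>();
        for (String line : lines) {
            // Guard short lines so substring cannot throw.
            keys.add(line.length() <= n ? line : line.substring(0, n));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Hypothetical CSV lines: the date portion is shared, so a
        // 10-char prefix collides while a 20-char prefix stays unique.
        List<String> lines = Arrays.asList(
                "2010-06-24 18:41:00,host1,OK",
                "2010-06-24 18:41:05,host2,OK",
                "2010-06-24 18:41:09,host3,FAIL");
        System.out.println(distinctPrefixes(lines, 10)); // 1 (all share "2010-06-24")
        System.out.println(distinctPrefixes(lines, 20)); // 3 (the seconds differ)
    }
}
```

Running this kind of check against a sample of the real 23K-line file shows up front whether a given prefix length will drive the combiner toward 9 or toward 23000 output records.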