Hi Deepak,

Maybe I did not make my mail clear. I had tried the instructions in the blog you mentioned, and they are working for me. Did you change the /etc/hosts file at any point?
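In case it helps: on Ubuntu, the usual /etc/hosts pitfall for multi-node Hadoop is a 127.0.1.1 line mapping the machine's own hostname to loopback, which can make a node advertise an address the other node cannot fetch map output from. A sketch of a typical two-node layout (the hostnames and 192.168.0.x addresses below are made up; substitute your own):

```
127.0.0.1   localhost
# Note: no 127.0.1.1 entry for the node's own hostname;
# each node should resolve itself to its LAN address.
192.168.0.1 master
192.168.0.2 slave
```

The same file should be identical on both nodes, and `hostname` on each machine should match the name used in conf/masters and conf/slaves.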
Regards,
Krishna

On Jul 27, 2010, at 2:30 PM, C.V.Krishnakumar wrote:

> Hi Deepak,
>
> You could refer to this too:
> http://markmail.org/message/mjq6gzjhst2inuab#query:MAX_FAILED_UNIQUE_FETCHES+page:1+mid:ubrwgmddmfvoadh2+state:results
>
> I tried those instructions and it is working for me.
> Regards,
> Krishna
>
> On Jul 27, 2010, at 12:31 PM, Deepak Diwakar wrote:
>
>> Hey friends,
>>
>> I got stuck setting up an HDFS cluster and am getting this error while
>> running the simple wordcount example (I did this 2 years back and did not
>> have any problem).
>>
>> Currently testing on hadoop-0.20.1 with 2 nodes. Instructions followed from
>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>
>> I checked the firewall settings and /etc/hosts; there is no issue there.
>> Also, master and slave are accessible both ways.
>>
>> Also, the input size is very low (~3 MB), so there shouldn't be any issue
>> with ulimit (it is btw 4096).
>>
>> Would be really thankful if anyone can guide me to resolve this.
>>
>> Thanks & regards,
>> - Deepak Diwakar
>>
>> On 28 June 2010 18:39, bmdevelopment <bmdevelopm...@gmail.com> wrote:
>>
>>> Hi, Sorry for the cross-post. But just trying to see if anyone else
>>> has had this issue before.
>>> Thanks
>>>
>>> ---------- Forwarded message ----------
>>> From: bmdevelopment <bmdevelopm...@gmail.com>
>>> Date: Fri, Jun 25, 2010 at 10:56 AM
>>> Subject: Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>>> To: mapreduce-u...@hadoop.apache.org
>>>
>>> Hello,
>>> Thanks so much for the reply.
>>> See inline.
>>>
>>> On Fri, Jun 25, 2010 at 12:40 AM, Hemanth Yamijala <yhema...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>>> I've been getting the following error when trying to run a very simple
>>>>> MapReduce job. Map finishes without problem, but the error occurs as
>>>>> soon as it enters the Reduce phase.
>>>>>
>>>>> 10/06/24 18:41:00 INFO mapred.JobClient: Task Id :
>>>>> attempt_201006241812_0001_r_000000_0, Status : FAILED
>>>>> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>>>>>
>>>>> I am running a 5-node cluster and I believe I have all my settings correct:
>>>>>
>>>>> * ulimit -n 32768
>>>>> * DNS/RDNS configured properly
>>>>> * hdfs-site.xml : http://pastebin.com/xuZ17bPM
>>>>> * mapred-site.xml : http://pastebin.com/JraVQZcW
>>>>>
>>>>> The program is very simple - it just counts a unique string in a log file.
>>>>> See here: http://pastebin.com/5uRG3SFL
>>>>>
>>>>> When I run it, the job fails and I get the following output:
>>>>> http://pastebin.com/AhW6StEb
>>>>>
>>>>> However, it runs fine when I do *not* use substring() on the value (see
>>>>> the map function in the code above).
>>>>>
>>>>> This runs fine and completes successfully:
>>>>> String str = val.toString();
>>>>>
>>>>> This causes the error and fails:
>>>>> String str = val.toString().substring(0,10);
>>>>>
>>>>> Please let me know if you need any further information.
>>>>> It would be greatly appreciated if anyone could shed some light on this
>>>>> problem.
>>>>
>>>> It catches attention that changing the code to use a substring is
>>>> causing a difference. Assuming it is consistent and not a red herring,
>>>
>>> Yes, this has been consistent over the last week. I was running 0.20.1
>>> first and then upgraded to 0.20.2, but the results have been exactly
>>> the same.
>>>
>>>> can you look at the counters for the two jobs using the JobTracker web
>>>> UI - things like map records, bytes etc. - and see if there is a
>>>> noticeable difference?
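One detail worth checking in the map code discussed above: in Java, `substring(0, 10)` throws `StringIndexOutOfBoundsException` on any line shorter than 10 characters. Since the maps are reported as finishing cleanly that may not be the cause here, but a guarded version costs nothing. A minimal sketch (the `safePrefix` helper is hypothetical, not from the original job):

```java
// Guarded prefix extraction: plain substring(0, n) throws
// StringIndexOutOfBoundsException when the input is shorter than n.
public class SafePrefix {

    // Returns at most the first n characters of s, never throwing
    // on short input.
    static String safePrefix(String s, int n) {
        return s.length() <= n ? s : s.substring(0, n);
    }

    public static void main(String[] args) {
        System.out.println(safePrefix("2010-06-24 18:41:00 INFO", 10)); // 2010-06-24
        System.out.println(safePrefix("short", 10));                    // short
    }
}
```

Inside a mapper this would replace `value.toString().substring(0, 10)` with `safePrefix(value.toString(), 10)`, so a single short line cannot fail the task.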
>>>
>>> Ok, so here is the first job using write.set(value.toString()); having
>>> *no* errors:
>>> http://pastebin.com/xvy0iGwL
>>>
>>> And here is the second job using
>>> write.set(value.toString().substring(0, 10)); that fails:
>>> http://pastebin.com/uGw6yNqv
>>>
>>> And here is yet another where I used a longer, and therefore unique,
>>> string with write.set(value.toString().substring(0, 20)); This makes
>>> every line unique, similar to the first job.
>>> Still fails:
>>> http://pastebin.com/GdQ1rp8i
>>>
>>>> Also, are the two programs being run against
>>>> the exact same input data?
>>>
>>> Yes, exactly the same input: a single csv file with 23K lines.
>>> Using a shorter string leads to more duplicate keys and therefore more
>>> combining/reducing, but going by the above it seems to fail whether the
>>> substring/key is entirely unique (23000 combine output records) or
>>> mostly the same (9 combine output records).
>>>
>>>> Also, since the cluster size is small, you could also look at the
>>>> tasktracker logs on the machines where the maps have run to see if
>>>> there are any failures when the reduce attempts start failing.
>>>
>>> Here is the TT log from the last failed job. I do not see anything
>>> besides the shuffle failure, but there may be something I am overlooking
>>> or simply do not understand:
>>> http://pastebin.com/DKFTyGXg
>>>
>>> Thanks again!
>>>
>>>> Thanks,
>>>> Hemanth
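The key-cardinality effect described in the thread - a shorter prefix collapsing many lines into few keys, a longer one keeping them unique - can be sanity-checked offline before submitting a job. A small sketch with made-up sample lines (the data and the `distinctPrefixes` helper are illustrative only, not from the failing job):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Counts how many distinct map keys a given prefix length produces,
// mirroring write.set(value.toString().substring(0, n)).
public class PrefixCardinality {

    static int distinctPrefixes(List<String> lines, int n) {
        Set<String> keys = new HashSet<>();
        for (String line : lines) {
            // Guard short lines so substring cannot throw.
            keys.add(line.length() <= n ? line : line.substring(0, n));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Hypothetical CSV lines: the date portion is shared, so a
        // 10-char prefix collides while a 20-char prefix stays unique.
        List<String> lines = Arrays.asList(
                "2010-06-24 18:41:00,host1,OK",
                "2010-06-24 18:41:05,host2,OK",
                "2010-06-24 18:41:09,host3,FAIL");
        System.out.println(distinctPrefixes(lines, 10)); // 1 (all share "2010-06-24")
        System.out.println(distinctPrefixes(lines, 20)); // 3 (the seconds differ)
    }
}
```

Running this kind of check against a sample of the real 23K-line file shows up front whether a given prefix length will drive the combiner toward 9 or toward 23000 output records.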