1) I do a lot of progress reporting.
2) Why would the job succeed when the only change in the code is

    if (NumberWrites++ % 100 == 0) context.write(key, value);

Comment out the test, allowing full writes, and the job fails. Since every
write is a report, I assume that something in the write code, or other Hadoop
code dealing with output, is failing. I do increment a counter for every
write, or in the case of the above code, every potential write. What I am
seeing is that wherever the timeout occurs, it is not in a place where I am
capable of inserting more reporting.
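[Editor's note: the sampling logic above can be sketched in plain Java. This is a hypothetical stand-in, not the poster's actual mapper; the class and method names are invented, and a Runnable replaces the real Hadoop context.write/counter calls.]

```java
// Sketch of the throttled-write pattern from the post: every record bumps
// the potential-write counter, but only every 100th record is emitted.
public class WriteThrottle {
    private long numberWrites = 0;  // counts potential writes (every record)
    private long actualWrites = 0;  // counts records actually emitted

    // Mirrors: if (NumberWrites++ % 100 == 0) context.write(key, value);
    public boolean shouldWrite() {
        return numberWrites++ % 100 == 0;
    }

    public void recordWrite() {
        actualWrites++;
    }

    public long getNumberWrites() { return numberWrites; }
    public long getActualWrites() { return actualWrites; }

    public static void main(String[] args) {
        WriteThrottle t = new WriteThrottle();
        for (int i = 0; i < 1000; i++) {
            if (t.shouldWrite()) {
                t.recordWrite();  // in the real job: context.write(key, value)
            }
        }
        // 1000 potential writes produce only 10 actual writes (i = 0, 100, ..., 900)
        System.out.println(t.getNumberWrites() + " potential, "
                + t.getActualWrites() + " actual");
    }
}
```

With the modulo test in place the mapper emits 1/100th of the output volume, which is why the counters in the two jobs below differ so sharply.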
On Wed, Jan 18, 2012 at 4:01 PM, Leonardo Urbina <lurb...@mit.edu> wrote:
> Perhaps you are not reporting progress throughout your task. If you
> happen to run a large enough job you hit the default timeout,
> mapred.task.timeout (which defaults to 10 min). Perhaps you should
> consider reporting progress in your mapper/reducer by calling
> progress() on the Reporter object. Check tip 7 of this link:
>
> http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/
>
> Hope that helps,
> -Leo
>
> Sent from my phone
>
> On Jan 18, 2012, at 6:46 PM, Steve Lewis <lordjoe2...@gmail.com> wrote:
>
>> I KNOW it is a task timeout - what I do NOT know is WHY merely cutting
>> the number of writes causes it to go away. It seems to imply that some
>> context.write operation, or something downstream from it, is taking a
>> huge amount of time, and that is all Hadoop internal code - not mine. So
>> my question is: why should increasing the number and volume of writes
>> cause a task to time out?
>>
>> On Wed, Jan 18, 2012 at 2:33 PM, Tom Melendez <t...@supertom.com> wrote:
>>
>>> Sounds like mapred.task.timeout? The default is 10 minutes.
>>>
>>> http://hadoop.apache.org/common/docs/current/mapred-default.html
>>>
>>> Thanks,
>>>
>>> Tom
>>>
>>> On Wed, Jan 18, 2012 at 2:05 PM, Steve Lewis <lordjoe2...@gmail.com> wrote:
>>>> The map tasks fail, timing out after 600 sec.
>>>> I am processing one 9 GB file with 16,000,000 records. Each record
>>>> (think of it as a line) generates hundreds of key-value pairs.
>>>> The job is unusual in that the output of the mapper, in terms of
>>>> records or bytes, is orders of magnitude larger than the input.
>>>> I have no idea what is slowing down the job except that the problem
>>>> is in the writes.
>>>>
>>>> If I change the job to merely bypass a fraction of the context.write
>>>> statements, the job succeeds.
>>>> This is one map task that failed and one that succeeded - I cannot
>>>> understand how a write can take so long, or what else the mapper
>>>> might be doing.
>>>>
>>>> JOB FAILED WITH TIMEOUT
>>>>
>>>> Parser:
>>>>   TotalProteins           90,103
>>>>   NumberFragments         10,933,089
>>>> FileSystemCounters:
>>>>   HDFS_BYTES_READ         67,245,605
>>>>   FILE_BYTES_WRITTEN      444,054,807
>>>> Map-Reduce Framework:
>>>>   Combine output records  10,033,499
>>>>   Map input records       90,103
>>>>   Spilled Records         10,032,836
>>>>   Map output bytes        3,520,182,794
>>>>   Combine input records   10,844,881
>>>>   Map output records      10,933,089
>>>>
>>>> Same code but fewer writes:
>>>>
>>>> JOB SUCCEEDED
>>>>
>>>> Parser:
>>>>   TotalProteins           90,103
>>>>   NumberFragments         206,658,758
>>>> FileSystemCounters:
>>>>   FILE_BYTES_READ         111,578,253
>>>>   HDFS_BYTES_READ         67,245,607
>>>>   FILE_BYTES_WRITTEN      220,169,922
>>>> Map-Reduce Framework:
>>>>   Combine output records  4,046,128
>>>>   Map input records       90,103
>>>>   Spilled Records         4,046,128
>>>>   Map output bytes        662,354,413
>>>>   Combine input records   4,098,609
>>>>   Map output records      2,066,588
>>>>
>>>> Any bright ideas?
>>>>
>>>> --
>>>> Steven M. Lewis PhD
>>>> 4221 105th Ave NE
>>>> Kirkland, WA 98033
>>>> 206-384-1340 (cell)
>>>> Skype lordjoe_com

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
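[Editor's note: the advice in the thread - call progress() (or bump a counter) every N records so the tasktracker sees the task as alive - can be sketched in plain Java. This is a hypothetical helper, not Hadoop API code; the Runnable stands in for context.progress() or reporter.progress(), and the interval of 1000 is an arbitrary choice.]

```java
// Sketch of a periodic-heartbeat helper: invokes a callback once every
// REPORT_INTERVAL records processed, so a long-running loop keeps
// resetting the mapred.task.timeout clock.
public class ProgressHeartbeat {
    public static final int REPORT_INTERVAL = 1000;  // arbitrary for this sketch

    private long recordsSeen = 0;
    private long reportsSent = 0;
    private final Runnable reporter;  // stands in for context.progress()

    public ProgressHeartbeat(Runnable reporter) {
        this.reporter = reporter;
    }

    // Call once per record inside the map loop.
    public void onRecord() {
        if (recordsSeen++ % REPORT_INTERVAL == 0) {
            reporter.run();  // in a real job: context.progress()
            reportsSent++;
        }
    }

    public long getReportsSent() { return reportsSent; }

    public static void main(String[] args) {
        ProgressHeartbeat hb =
                new ProgressHeartbeat(() -> { /* heartbeat sent */ });
        for (int i = 0; i < 5000; i++) {
            hb.onRecord();  // one heartbeat at records 0, 1000, 2000, 3000, 4000
        }
        System.out.println(hb.getReportsSent() + " heartbeats");
    }
}
```

Note that in the situation described above this alone may not help: if the stall is inside context.write (e.g. combiner or spill activity triggered by the huge map output), user code never regains control to report progress, which matches Steve's observation that the timeout occurs where no reporting can be inserted.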