Re: Hadoop 2.2.0-cdh5.0.0-beta-1 - MapReduce Streaming - Failed to run on larger jobs

2014-04-24 Thread Silvina Caíno Lores
Hi!

I've faced the same issue a couple of times and found nothing in the logs
that led me to the source of the error. However, I've found that careful
container and block configuration can prevent these issues.
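
For example, on the container side, something like the following worked for
me (a rough sketch only: the streaming jar path, input/output paths and
mapper script are placeholders, and the sizes must fit your own records):

  # Raise the map container ceiling and give the child JVM a heap that
  # fits inside it (the heap is conventionally ~80% of the container size).
  hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -D mapreduce.map.memory.mb=2048 \
    -D mapreduce.map.java.opts=-Xmx1638m \
    -input /user/example/input \
    -output /user/example/output \
    -mapper mapper.py \
    -file mapper.py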

First of all, check the RM logs for any problematic container, since the
same task is failing every time (that split may be violating the container's
resource limits, which should be reflected in that log). For instance, in my
particular case I was running a memory-intensive map and some records needed
more memory than others in large test cases, hence I observed the behaviour
you describe because the containers were getting killed.
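
If that is what is happening to you, the kill should show up in the logs;
something along these lines (the log paths are just examples from my setup,
and the exact message wording can vary between YARN versions):

  # Did the RM/NM decide to kill the container of the failing attempt?
  grep "attempt_1395628276810_0062_m_000149" /var/log/hadoop-yarn/*resourcemanager*.log*
  # Typical NodeManager message when a container exceeds its memory limit:
  grep "beyond physical memory limits" /var/log/hadoop-yarn/*nodemanager*.log*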

I usually find the application log files under userlogs; just go to the
directory of the container that triggered the error, as indicated by the RM
logs.
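
Concretely, something like this (the application ID below is derived from
your attempt ID; local paths depend on yarn.nodemanager.log-dirs):

  # On the NodeManager that ran the container, logs live under e.g.:
  #   <yarn.nodemanager.log-dirs>/application_1395628276810_0062/container_*/
  # and each container directory holds stdout, stderr and syslog.
  # With log aggregation enabled you can fetch them from any node:
  yarn logs -applicationId application_1395628276810_0062 | less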

Hope it helps.

Regards,
Silvina



On 11 April 2014 09:15, Phan, Truong Q  wrote:

> I could not find the "attempt_1395628276810_0062_m_000149_0 attemp*" in
> the HDFS "/tmp" directory.
> Where can I find these log files?
>
> Thanks and Regards,
> Truong Phan
>
>
> P  + 61 2 8576 5771
> M  + 61 4 1463 7424
> E  troung.p...@team.telstra.com
> W  www.telstra.com
>
>
> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Thursday, 10 April 2014 4:32 PM
> To: 
> Subject: Re: Hadoop 2.2.0-cdh5.0.0-beta-1 - MapReduce Streaming - Failed
> to run on larger jobs
>
> It appears to me that whatever chunk of the input CSV files your map task
> 000149 gets, the program is unable to process it and throws an error and
> exits.
>
> Look into the attempt_1395628276810_0062_m_000149_0 attempt's task log to
> see if there's any stdout/stderr printed that may help. The syslog in the
> attempt's task log will also carry a "Processing split ..."
> message that may help you know which file and what offset+length under
> that file was being processed.
>
> On Thu, Apr 10, 2014 at 10:55 AM, Phan, Truong Q <troung.p...@team.telstra.com> wrote:
> > Hi
> >
> >
> >
> > My Hadoop 2.2.0-cdh5.0.0-beta-1 job fails to run as a larger MapReduce
> > Streaming job.
> >
> > I have no issue running the MapReduce Streaming job with a single
> > input CSV file of around 400 MB.
> >
> > However, it fails when I try to run the job with 11 input
> > data files of 400 MB each.
> >
> > The job fails with the following error.
> >
> > I would appreciate any hints or suggestions to fix this issue.
> >
> +
> >
> > 2014-04-10 10:28:10,498 FATAL [IPC Server handler 2 on 52179]
> > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task:
> > attempt_1395628276810_0062_m_000149_0 - exited : java.lang.RuntimeException:
> > PipeMapRed.waitOutputThreads(): subprocess failed with code 1
> >         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
> >         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
> >         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
> >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> >         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
> >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> >         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:415)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> >         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
> >
> > 2014-04-10 10:28:10,498 INFO [IPC Server handler 2 on 52179]
> > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from
> > attempt_1395628276810_0062_m_000149_0: Error: java.lang.RuntimeException:
> > PipeMapRed.waitOutputThreads(): subprocess failed with code 1
> >         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
> >         at org.apache.hadoop.streaming.PipeMapRe

RE: Hadoop 2.2.0-cdh5.0.0-beta-1 - MapReduce Streaming - Failed to run on larger jobs

2014-04-11 Thread Phan, Truong Q
I could not find the "attempt_1395628276810_0062_m_000149_0 attemp*" in the
HDFS "/tmp" directory.
Where can I find these log files?

Thanks and Regards,
Truong Phan


P  + 61 2 8576 5771
M  + 61 4 1463 7424
E  troung.p...@team.telstra.com
W  www.telstra.com


-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Thursday, 10 April 2014 4:32 PM
To: 
Subject: Re: Hadoop 2.2.0-cdh5.0.0-beta-1 - MapReduce Streaming - Failed to run
on larger jobs

It appears to me that whatever chunk of the input CSV files your map task 
000149 gets, the program is unable to process it and throws an error and exits.

Look into the attempt_1395628276810_0062_m_000149_0 attempt's task log to see 
if there's any stdout/stderr printed that may help. The syslog in the attempt's 
task log will also carry a "Processing split ..."
message that may help you know which file and what offset+length under that 
file was being processed.

On Thu, Apr 10, 2014 at 10:55 AM, Phan, Truong Q  wrote:
> Hi
>
>
>
> My Hadoop 2.2.0-cdh5.0.0-beta-1 job fails to run as a larger MapReduce
> Streaming job.
>
> I have no issue running the MapReduce Streaming job with a single
> input CSV file of around 400 MB.
>
> However, it fails when I try to run the job with 11 input
> data files of 400 MB each.
>
> The job fails with the following error.
>
>
>
> I would appreciate any hints or suggestions to fix this issue.
>
>
>
> +
>
> 2014-04-10 10:28:10,498 FATAL [IPC Server handler 2 on 52179]
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task:
> attempt_1395628276810_0062_m_000149_0 - exited : java.lang.RuntimeException:
> PipeMapRed.waitOutputThreads(): subprocess failed with code 1
>         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
>         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
>         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
>
> 2014-04-10 10:28:10,498 INFO [IPC Server handler 2 on 52179]
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from
> attempt_1395628276810_0062_m_000149_0: Error: java.lang.RuntimeException:
> PipeMapRed.waitOutputThreads(): subprocess failed with code 1
>         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
>         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
>         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
>
> 2014-04-10 10:28:10,499 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
> report from attempt_1395628276810_0062_m_000149_0: Error:
> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
> failed with code 1
>         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
>         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
>

Re: Hadoop 2.2.0-cdh5.0.0-beta-1 - MapReduce Streaming - Failed to run on larger jobs

2014-04-09 Thread Harsh J
It appears to me that whatever chunk of the input CSV files your map
task 000149 gets, the program is unable to process it and throws an
error and exits.

Look into the attempt_1395628276810_0062_m_000149_0 attempt's task log
to see if there's any stdout/stderr printed that may help. The syslog
in the attempt's task log will also carry a "Processing split ..."
message that may help you know which file and what offset+length under
that file was being processed.
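
For example (a sketch only: mapper.py and the input path are placeholders
for whatever your job actually runs and reads, and "yarn logs" needs log
aggregation to be enabled):

  # 1) Find which file and offset+length the failing attempt was reading:
  yarn logs -applicationId application_1395628276810_0062 | grep "Processing split"
  # 2) Pipe that file through the mapper locally to reproduce the exit code:
  hadoop fs -cat /path/to/suspect/input.csv | ./mapper.py > /dev/null
  echo "mapper exit code: $?"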

On Thu, Apr 10, 2014 at 10:55 AM, Phan, Truong Q  wrote:
> Hi
>
>
>
> My Hadoop 2.2.0-cdh5.0.0-beta-1 job fails to run as a larger MapReduce
> Streaming job.
>
> I have no issue running the MapReduce Streaming job with a single input
> CSV file of around 400 MB.
>
> However, it fails when I try to run the job with 11 input data
> files of 400 MB each.
>
> The job fails with the following error.
>
>
>
> I would appreciate any hints or suggestions to fix this issue.
>
>
>
> +
>
> 2014-04-10 10:28:10,498 FATAL [IPC Server handler 2 on 52179]
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task:
> attempt_1395628276810_0062_m_000149_0 - exited : java.lang.RuntimeException:
> PipeMapRed.waitOutputThreads(): subprocess failed with code 1
>         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
>         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
>         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
>
> 2014-04-10 10:28:10,498 INFO [IPC Server handler 2 on 52179]
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from
> attempt_1395628276810_0062_m_000149_0: Error: java.lang.RuntimeException:
> PipeMapRed.waitOutputThreads(): subprocess failed with code 1
>         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
>         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
>         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
>
> 2014-04-10 10:28:10,499 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
> report from attempt_1395628276810_0062_m_000149_0: Error:
> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
> failed with code 1
>         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
>         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
>         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
>
>
>
> +
>
> MAPREDUCE SCRIPT:
>
> $ cat