Skip bad records when streaming supported?

Pillis W Thu, 13 Apr 2017 13:34:04 -0700

Hello,
I am using 'hadoop-streaming.jar' to do a simple word count, and want to
skip records that fail execution. Below is the actual command I run, and
the mapper always fails on one record, and hence fails the job. The input
file is 3 lines with 1 bad line.


hadoop jar /usr/lib/hadoop/hadoop-streaming.jar -D mapred.job.name=SkipTest
-Dmapreduce.task.skip.start.attempts=1 -Dmapreduce.map.skip.maxrecords=1
-Dmapreduce.reduce.skip.maxgroups=1
-Dmapreduce.map.skip.proc.count.autoincr=false
-Dmapreduce.reduce.skip.proc.count.autoincr=false -D mapred.reduce.tasks=1
-D mapred.map.tasks=1 -files
/home/hadoop/wc/wordcount-mapper-t1.py,/home/hadoop/wc/wordcount-reducer-t1.py
-input /user/hadoop/data/test1 -output /user/hadoop/data/output-test-5
-mapper "python wordcount-mapper-t1.py" -reducer "python
wordcount-reducer-t1.py"


I was wondering if skipping of records is supported when MapReduce is used
in streaming mode?

Thanks in advance.
PW

Skip bad records when streaming supported?

Reply via email to