On Thu, Oct 21, 2010 at 8:23 AM, ed <hadoopn...@gmail.com> wrote:
> Hello,
>
> The MapRunner class looks promising. I noticed it is in the deprecated
> mapred package, but I didn't see an equivalent class in the mapreduce
> package. Is this going to be ported to mapreduce, or is it no longer
> supported? Thanks!
The equivalent functionality is in org.apache.hadoop.mapreduce.Mapper#run.

Cheers,
Tom

> ~Ed
>
> On Thu, Oct 21, 2010 at 6:36 AM, Harsh J <qwertyman...@gmail.com> wrote:
>
>> If the exception occurs as your record reader reads the input, you can
>> use a MapRunner class instead of a Mapper interface/subclass. That way
>> you can try/catch around the record reader itself and call your map
>> function only on valid next() calls. I think this ought to work.
>>
>> You can set it via JobConf.setMapRunnerClass(...).
>>
>> Ref: MapRunner API @
>> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/MapRunner.html
>>
>> On Wed, Oct 20, 2010 at 4:14 AM, ed <hadoopn...@gmail.com> wrote:
>>> Hello,
>>>
>>> I have a simple map-reduce job that reads in zipped files and converts
>>> them to LZO compression. Some of the files are not properly zipped,
>>> which causes Hadoop to throw a "java.io.EOFException: Unexpected end
>>> of input stream" error and makes the job fail. Is there a way to catch
>>> this exception and tell Hadoop to just ignore the file and move on? I
>>> think the exception is being thrown by the class reading in the gzip
>>> file and not by my mapper class. Is this correct? Is there a way to
>>> handle this type of error gracefully?
>>>
>>> Thank you!
>>>
>>> ~Ed
>>
>> --
>> Harsh J
>> www.harshj.com
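[Editor's note] Tom's pointer can be sketched concretely: in the new API, the per-task read loop lives in Mapper#run, so overriding it lets you catch the reader's EOFException before it kills the task. A rough, unverified sketch against the 0.20-era org.apache.hadoop.mapreduce API; the key/value types and the skip-the-rest-of-the-split policy are assumptions, not from the thread:

```java
import java.io.EOFException;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: key/value types chosen for illustration only.
public class SkippingMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
            while (true) {
                boolean hasNext;
                try {
                    // Decompression happens inside the record reader,
                    // so a truncated gzip surfaces here, not in map().
                    hasNext = context.nextKeyValue();
                } catch (EOFException e) {
                    // Corrupt input: give up on the rest of this split
                    // instead of failing the whole job.
                    System.err.println("Skipping corrupt input: " + e.getMessage());
                    break;
                }
                if (!hasNext) {
                    break;
                }
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}
```

This mirrors what the old-API MapRunner approach Harsh describes would do: wrap the reader's next() call in a try/catch and only invoke map() on records that were actually read.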
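[Editor's note] The failure Ed describes originates in the decompression layer, not the mapper: java.util.zip.GZIPInputStream throws EOFException when a gzip stream is truncated. The catch-and-skip pattern can be demonstrated without Hadoop at all; the class and helper names below are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SkipCorruptGzip {

    // Hypothetical helper: gzip-compress a string into a byte array.
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes("UTF-8"));
        }
        return bos.toByteArray();
    }

    // Fully decompress, or return null instead of propagating the
    // EOFException a truncated stream produces (EOFException extends
    // IOException, so one catch covers it).
    static String readOrSkip(byte[] data) {
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(data));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString("UTF-8");
        } catch (IOException e) {
            return null; // corrupt or truncated: skip this input
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] good = gzip("hello");
        // Truncate the trailer to simulate a badly zipped file.
        byte[] bad = java.util.Arrays.copyOf(good, good.length - 4);

        System.out.println(readOrSkip(good)); // hello
        System.out.println(readOrSkip(bad));  // null
    }
}
```

Inside a job, the same try/catch simply lives in the read loop (MapRunner in the old API, an overridden Mapper#run in the new one) so a bad file is skipped instead of failing the task.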