Tom,

Agreed, this is a third-party reader operating on a custom data format, neither of which I control. The error is happening in the reader, and I'm trying to isolate the issue in order to do proper handling.
Thanks!
Justin

On Thu, Oct 13, 2011 at 5:31 PM, Tom White <t...@cloudera.com> wrote:
> Justin,
>
> The skipping feature should really only be used when you are calling
> out to a third-party library that may segfault on corrupt data, and
> even then it's probably better to use a subprocess to handle it, as
> Owen suggested here:
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201108.mbox/%3ccafqou9ekv+sbvav-bsf5dorjo68vsj6ztqxywwut+qhs3v3...@mail.gmail.com%3e
>
> In other cases you should handle the corrupt data in your mapper or
> reducer, for example by catching the relevant exception.
>
> Tom
>
> On Thu, Oct 13, 2011 at 5:41 AM, Justin Woody <justin.wo...@gmail.com> wrote:
>> Harsh,
>>
>> Thanks for the info. If I get some time, maybe I can assist; I'm
>> looking over your code now. For now I am failing the files with the
>> mapred.max.map.failures.percent property, but I'm losing a lot of good
>> data going that route.
>>
>> Justin
>>
>> On Wed, Oct 12, 2011 at 4:27 PM, Harsh J <ha...@cloudera.com> wrote:
>>> Justin,
>>>
>>> Unfortunately not. The new API does not yet have a skipping feature
>>> like the older one.
>>>
>>> I did get started on some work on
>>> https://issues.apache.org/jira/browse/MAPREDUCE-1932 to fix this, but I
>>> haven't been able to find time to complete it with proper tests and
>>> such. I'll try to do it within a week from now.
>>>
>>> On Wed, Oct 12, 2011 at 10:06 PM, Justin Woody <justin.wo...@gmail.com> wrote:
>>>> Can anyone confirm whether the skip options work for MR jobs using the
>>>> new API? I have a job using the new API and I cannot get the job to
>>>> skip corrupted records. I tried configuring job properties manually
>>>> and using the SkipBadRecords class.
>>>>
>>>> Thanks,
>>>> Justin
>>>
>>> --
>>> Harsh J
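Tom's suggestion (catch the relevant exception around the failing reader, rather than rely on the record-skipping feature) can be sketched roughly as below. This is a minimal stand-alone sketch, not Justin's actual job: the parser, the exception type, and the skip counter are all stand-ins, since the third-party reader isn't shown in the thread. In a real job the try/catch loop body would live inside Mapper.map(), the emit would be context.write(), and the skip count would be a Hadoop Counter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipCorruptRecords {

    // Stand-in for the third-party reader's parse call (hypothetical);
    // the real reader and its exception type would come from the library.
    static int parseRecord(String raw) {
        if (raw == null || raw.isEmpty()) {
            throw new IllegalArgumentException("corrupt record");
        }
        return raw.length();
    }

    static long skipped = 0;  // stand-in for a Hadoop Counter

    // Mirrors the body of map(): emit good records, count and skip bad ones.
    static List<Integer> processAll(List<String> records) {
        List<Integer> out = new ArrayList<>();
        for (String raw : records) {
            try {
                // context.write(key, value) in a real mapper
                out.add(parseRecord(raw));
            } catch (IllegalArgumentException e) {
                // context.getCounter(...).increment(1) in a real mapper;
                // corrupt records are dropped instead of failing the task
                skipped++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> good = processAll(Arrays.asList("abc", "", "de"));
        System.out.println(good + " skipped=" + skipped);
    }
}
```

Unlike mapred.max.map.failures.percent, which fails whole map tasks and loses every good record in them, this keeps the good records in a partially corrupt file and only drops the individual records the reader rejects.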