[ 
https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509325#comment-13509325
 ] 

Joseph Adler commented on PIG-2614:
-----------------------------------

Could I propose an alternative? 

I like this functionality, but I don't think that it should be specific to 
Avro records. I think that it should be straightforward to modify 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader to 
implement this functionality for ALL LoadFunc types. Specifically, it should be 
possible to count the number of exceptions thrown by the getNext method of the 
underlying load function (inside PigRecordReader.nextKeyValue) and to fail the 
load only once that count exceeds a limit.
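
A minimal sketch of the counting idea, written as a standalone class rather 
than as a patch to PigRecordReader. Everything here is an assumption made for 
illustration: RecordSource is a hypothetical stand-in for a load function's 
getNext, ErrorTolerantReader and the maxErrors limit are invented names, and 
the sketch assumes the source has already advanced past the bad record when it 
throws.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ErrorTolerantReader {
    /** Hypothetical stand-in for a load function; getNext returns null at end of input. */
    public interface RecordSource {
        String getNext() throws IOException;
    }

    private final RecordSource source;
    private final long maxErrors;   // assumed limit; not an existing Pig setting
    private long errorCount = 0;
    private String current;

    public ErrorTolerantReader(RecordSource source, long maxErrors) {
        this.source = source;
        this.maxErrors = maxErrors;
    }

    /** Advances to the next good record, skipping bad ones until the limit is exceeded. */
    public boolean nextKeyValue() throws IOException {
        while (true) {
            try {
                current = source.getNext();
                return current != null;
            } catch (Exception e) {
                // Count the failure and retry; give up once the count passes the limit.
                errorCount++;
                if (errorCount > maxErrors) {
                    throw new IOException("Too many bad records: " + errorCount, e);
                }
            }
        }
    }

    public String getCurrentValue() { return current; }

    public long getErrorCount() { return errorCount; }

    /** Demo source: any record starting with "BAD" makes getNext throw. */
    public static RecordSource fromList(List<String> records) {
        Iterator<String> it = records.iterator();
        return () -> {
            if (!it.hasNext()) return null;
            String r = it.next();
            if (r.startsWith("BAD")) throw new IOException("malformed record: " + r);
            return r;
        };
    }

    public static void main(String[] args) throws IOException {
        ErrorTolerantReader reader = new ErrorTolerantReader(
                fromList(Arrays.asList("a", "BAD1", "b", "BAD2", "c")), 2);
        List<String> good = new ArrayList<>();
        while (reader.nextKeyValue()) {
            good.add(reader.getCurrentValue());
        }
        System.out.println("good=" + good + " errors=" + reader.getErrorCount());
    }
}
```

Because the counting sits in the reader and not in the load function, the same 
limit would apply to any source plugged in, which is the point of doing it at 
this layer. A real change would also need to decide how the limit is 
configured and whether the count is reported through a counter.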


                
> AvroStorage crashes on LOADING a single bad error
> -------------------------------------------------
>
>                 Key: PIG-2614
>                 URL: https://issues.apache.org/jira/browse/PIG-2614
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Russell Jurney
>            Assignee: Jonathan Coveney
>              Labels: avro, avrostorage, bad, book, cutting, doug, for, my, 
> pig, sadism
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2614_0.patch, PIG-2614_1.patch, PIG-2614_2.patch, 
> test_avro_files.tar.gz
>
>
> AvroStorage dies when a single bad record exists, such as one with missing 
> fields.  This is very bad on 'big data,' where bad records are inevitable.  
> See discussion at 
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
>  for more theory.

