[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239311#comment-13239311 ]
Russell Jurney commented on PIG-2614:
-------------------------------------
Thanks a lot for helping out. This feels like ONERROR :) Ok, I asked the
LinkedIn guys to try 0.10/JRuby, fwiw.
pig.piggybank.storage.avro.bad.record.threshold=0.99
pig.piggybank.storage.avro.bad.record.min=100
So I read these as: don't die unless both conditions are met, i.e. more than 1%
of the records fail and at least 100 records fail.
Correct?
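
For the record, here is a minimal sketch of that reading, in Java since
piggybank is Java. It assumes the two properties map to a required good-record
fraction and an absolute bad-record floor; the class and names below are
illustrative, not taken from the PIG-2614 patch:

    import java.io.IOException;

    // Hypothetical tracker for the tolerance check implied by
    // pig.piggybank.storage.avro.bad.record.threshold and
    // pig.piggybank.storage.avro.bad.record.min.
    public class BadRecordTracker {
        private final double threshold; // e.g. 0.99: required fraction of good records
        private final long minBad;      // e.g. 100: absolute floor of bad records
        private long total = 0;
        private long bad = 0;

        public BadRecordTracker(double threshold, long minBad) {
            this.threshold = threshold;
            this.minBad = minBad;
        }

        // Call once per input record. Throws only when BOTH limits are
        // crossed: the good fraction drops below the threshold AND the
        // bad-record count reaches the floor.
        public void record(boolean isBad) throws IOException {
            total++;
            if (isBad) {
                bad++;
            }
            double goodFraction = (double) (total - bad) / total;
            if (goodFraction < threshold && bad >= minBad) {
                throw new IOException("Too many bad records: " + bad + " of " + total);
            }
        }
    }

Under that reading, with threshold=0.99 and min=100, a handful of bad records
in a big load are skipped, and the job only dies once bad records exceed both
1% of the input and the 100-record floor. The properties could then be tuned
per script with Pig's SET command, e.g.
SET pig.piggybank.storage.avro.bad.record.threshold 0.99;
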
> AvroStorage crashes on LOADING a single bad record
> -------------------------------------------------
>
> Key: PIG-2614
> URL: https://issues.apache.org/jira/browse/PIG-2614
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.10, 0.11
> Reporter: Russell Jurney
> Labels: avro, avrostorage, bad, book, cutting, doug, for, my, pig, sadism
> Fix For: 0.10, 0.11
>
> Attachments: PIG-2614_0.patch
>
>
> AvroStorage dies when a single bad record exists, such as one with missing
> fields. This is very bad on 'big data,' where bad records are inevitable.
> See discussion at
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
> for more theory.