[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-2614:
-------------------------------
Attachment: test_avro_files.tar.gz, PIG-2614_2.patch
Hi all,
I rebased the patch onto trunk. Hopefully, this makes things clearer:
- Removed the PIG-2551 code since it's already committed to trunk.
- Replaced the {{ignore_bad_file}} option that was committed in PIG-2909 with
the {{bad.record.threshold}} and {{bad.record.min}} properties.
- Added unit test cases {{testCorruptedFile1,2,3}}.
@Joe,
I am not sure I fully understand your question. Please correct me if I am
wrong.
You're right that {{InputErrorTracker}} can be used by any LoadFunc. All a
storage needs to do is create an {{InputErrorTracker}} and increment its
counters. Do you have a better suggestion?
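To make the idea concrete, here is a minimal sketch of how such a tracker might work. It is not the code from the patch: the class name {{BadRecordTracker}} and its methods are hypothetical, and the semantics are assumptions, namely that {{bad.record.threshold}} is the maximum tolerated fraction of bad records and {{bad.record.min}} is the number of errors required before that fraction is checked.

```java
/**
 * Hypothetical stand-in for an input error tracker. This is NOT the
 * InputErrorTracker from the patch; names and semantics are assumed.
 */
final class BadRecordTracker {
    // Assumed meaning of bad.record.threshold: max tolerated fraction of bad records.
    private final double threshold;
    // Assumed meaning of bad.record.min: errors required before the rate is checked.
    private final long minErrors;
    private long records;
    private long errors;

    BadRecordTracker(double threshold, long minErrors) {
        this.threshold = threshold;
        this.minErrors = minErrors;
    }

    /** Call once for every record the loader attempts to read. */
    void incRecords() {
        records++;
    }

    /** Call when a record cannot be decoded; throws once the limits are exceeded. */
    void incErrors(Throwable cause) {
        errors++;
        if (threshold <= 0.0) {
            // A non-positive threshold means no bad records are tolerated.
            throw new RuntimeException("bad record and no errors are tolerated", cause);
        }
        // Guard against division by zero if incRecords() was never called.
        double rate = (double) errors / Math.max(records, 1L);
        if (errors >= minErrors && rate > threshold) {
            throw new RuntimeException(
                "too many bad records: " + errors + " of " + records, cause);
        }
    }

    long getErrors() {
        return errors;
    }

    long getRecords() {
        return records;
    }
}
```

Under these assumptions, a LoadFunc would call {{incRecords()}} for each record it attempts to read and {{incErrors(e)}} inside the catch block that currently lets the exception escape, skipping the bad record unless the tracker throws.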
Thanks!
> AvroStorage crashes on LOADING a single bad error
> -------------------------------------------------
>
> Key: PIG-2614
> URL: https://issues.apache.org/jira/browse/PIG-2614
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.10.0, 0.11
> Reporter: Russell Jurney
> Assignee: Jonathan Coveney
> Labels: avro, avrostorage, bad, book, cutting, doug, for, my,
> pig, sadism
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2614_0.patch, PIG-2614_1.patch, PIG-2614_2.patch,
> test_avro_files.tar.gz
>
>
> AvroStorage dies when a single bad record exists, such as one with missing
> fields. This is very bad on 'big data,' where bad records are inevitable.
> See discussion at
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
> for more theory.