[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error
[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2614:
-------------------------------

    Resolution: Duplicate
        Status: Resolved  (was: Patch Available)

Closing it as PIG-3059 incorporates this.

> AvroStorage crashes on LOADING a single bad error
> -------------------------------------------------
>
>                 Key: PIG-2614
>                 URL: https://issues.apache.org/jira/browse/PIG-2614
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Russell Jurney
>            Assignee: Jonathan Coveney
>              Labels: avro, avrostorage, bad, book, cutting, doug, for, my, pig, sadism
>             Fix For: 0.12
>
>         Attachments: PIG-2614_0.patch, PIG-2614_1.patch, PIG-2614_2.patch, test_avro_files.tar.gz
>
>
> AvroStorage dies when a single bad record exists, such as one with missing
> fields. This is very bad on 'big data,' where bad records are inevitable.
> See discussion at
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
> for more theory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error
[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem updated PIG-2614:
-------------------------------

    Fix Version/s:     (was: 0.10.1)
                       (was: 0.11)
                   0.12

Moving this to the next release so that we can converge on Pig 0.11.

> AvroStorage crashes on LOADING a single bad error
> -------------------------------------------------
>
>                 Key: PIG-2614
>                 URL: https://issues.apache.org/jira/browse/PIG-2614
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Russell Jurney
>            Assignee: Jonathan Coveney
>              Labels: avro, avrostorage, bad, book, cutting, doug, for, my, pig, sadism
>             Fix For: 0.12
>
>         Attachments: PIG-2614_0.patch, PIG-2614_1.patch, PIG-2614_2.patch, test_avro_files.tar.gz
>
>
> AvroStorage dies when a single bad record exists, such as one with missing
> fields. This is very bad on 'big data,' where bad records are inevitable.
> See discussion at
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
> for more theory.
[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error
[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2614:
-------------------------------

    Status: Patch Available  (was: Open)

> AvroStorage crashes on LOADING a single bad error
> -------------------------------------------------
>
>                 Key: PIG-2614
>                 URL: https://issues.apache.org/jira/browse/PIG-2614
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Russell Jurney
>            Assignee: Jonathan Coveney
>              Labels: avro, avrostorage, bad, book, cutting, doug, for, my, pig, sadism
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2614_0.patch, PIG-2614_1.patch, PIG-2614_2.patch, test_avro_files.tar.gz
>
>
> AvroStorage dies when a single bad record exists, such as one with missing
> fields. This is very bad on 'big data,' where bad records are inevitable.
> See discussion at
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
> for more theory.
[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error
[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2614:
-------------------------------

    Attachment: test_avro_files.tar.gz
                PIG-2614_2.patch

Hi all,

I rebased the patch to trunk. Hopefully, this will make things clearer:
- Removed PIG-2551 code since it's already committed to trunk.
- Replaced the {{ignore_bad_file}} option that was committed in PIG-2909 with the {{bad.record.threshold}} and {{bad.record.min}} properties.
- Added unit test cases {{testCorruptedFile1,2,3}}.

@Joe, I am not sure that I fully understand your question. Please correct me if I am wrong. You're right that {{InputErrorTracker}} can be used by any LoadFunc. All a storage needs to do is create an {{InputErrorTracker}} and increment its counters. Do you have a better suggestion?

Thanks!

> AvroStorage crashes on LOADING a single bad error
> -------------------------------------------------
>
>                 Key: PIG-2614
>                 URL: https://issues.apache.org/jira/browse/PIG-2614
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Russell Jurney
>            Assignee: Jonathan Coveney
>              Labels: avro, avrostorage, bad, book, cutting, doug, for, my, pig, sadism
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2614_0.patch, PIG-2614_1.patch, PIG-2614_2.patch, test_avro_files.tar.gz
>
>
> AvroStorage dies when a single bad record exists, such as one with missing
> fields. This is very bad on 'big data,' where bad records are inevitable.
> See discussion at
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
> for more theory.
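
Editor's sketch of the bad-record tolerance idea discussed in the update above (a tracker object that a loader creates and whose counters it increments). This is not the code from PIG-2614_2.patch and not the real {{InputErrorTracker}} API. The class name, the method names, and the reading of the two settings (a tolerated error ratio, plus a minimum error count below which the ratio is not enforced) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class BadRecordTrackerSketch {

    /** Hypothetical tracker; NOT the InputErrorTracker class from the patch. */
    public static final class BadRecordTracker {
        private final double threshold; // assumed meaning: tolerated errors / records ratio
        private final long minErrors;   // assumed meaning: errors tolerated before the ratio applies
        private long records;
        private long errors;

        public BadRecordTracker(double threshold, long minErrors) {
            this.threshold = threshold;
            this.minErrors = minErrors;
        }

        /** Call once for every record the loader attempts to read. */
        public void recordSeen() {
            records++;
        }

        /** Call when a record fails to decode; throws once the tolerance is exceeded. */
        public void recordError(Exception cause) {
            errors++;
            boolean pastMinimum = errors > minErrors;
            boolean pastRatio = records > 0 && (double) errors / records > threshold;
            if (pastMinimum && pastRatio) {
                throw new IllegalStateException(
                    errors + " bad records out of " + records + " exceeds threshold " + threshold,
                    cause);
            }
        }

        public long errorCount() {
            return errors;
        }
    }

    /** Stand-in for a loader: parses integers and skips the ones that fail to parse. */
    public static int loadAll(List<String> raw, BadRecordTracker tracker) {
        int loaded = 0;
        for (String value : raw) {
            tracker.recordSeen();
            try {
                Integer.parseInt(value);
                loaded++;
            } catch (NumberFormatException e) {
                tracker.recordError(e); // skip the record unless the tolerance is exceeded
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        BadRecordTracker tracker = new BadRecordTracker(0.5, 1);
        int loaded = loadAll(Arrays.asList("1", "x", "3", "4"), tracker);
        System.out.println("loaded=" + loaded + " errors=" + tracker.errorCount());
    }
}
```

With a tracker like this, a loader skips occasional bad records but still fails the job once bad records stop being the exception, which is the behaviour the issue description asks for.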
[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error
[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2614:
----------------------------

    Fix Version/s:     (was: 0.10.0)
                   0.10.1

> AvroStorage crashes on LOADING a single bad error
> -------------------------------------------------
>
>                 Key: PIG-2614
>                 URL: https://issues.apache.org/jira/browse/PIG-2614
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Russell Jurney
>              Labels: avro, avrostorage, bad, book, cutting, doug, for, my, pig, sadism
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2614_0.patch, PIG-2614_1.patch
>
>
> AvroStorage dies when a single bad record exists, such as one with missing
> fields. This is very bad on 'big data,' where bad records are inevitable.
> See discussion at
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
> for more theory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error
[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2614:
----------------------------------

    Attachment: PIG-2614_1.patch

Russell,

This patch gets rid of the logging dependency. Let me know if you have any issues with it!

> AvroStorage crashes on LOADING a single bad error
> -------------------------------------------------
>
>                 Key: PIG-2614
>                 URL: https://issues.apache.org/jira/browse/PIG-2614
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10, 0.11
>            Reporter: Russell Jurney
>              Labels: avro, avrostorage, bad, book, cutting, doug, for, my, pig, sadism
>             Fix For: 0.10, 0.11
>
>         Attachments: PIG-2614_0.patch, PIG-2614_1.patch
>
>
> AvroStorage dies when a single bad record exists, such as one with missing
> fields. This is very bad on 'big data,' where bad records are inevitable.
> See discussion at
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
> for more theory.
[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error
[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2614:
----------------------------------

    Attachment: PIG-2614_0.patch

works in both 0.10 and 0.11

> AvroStorage crashes on LOADING a single bad error
> -------------------------------------------------
>
>                 Key: PIG-2614
>                 URL: https://issues.apache.org/jira/browse/PIG-2614
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10, 0.11
>            Reporter: Russell Jurney
>            Priority: Blocker
>              Labels: avro, avrostorage, bad, book, cutting, doug, for, my, pig, sadism
>             Fix For: 0.10, 0.11
>
>         Attachments: PIG-2614_0.patch
>
>
> AvroStorage dies when a single bad record exists, such as one with missing
> fields. This is very bad on 'big data,' where bad records are inevitable.
> See discussion at
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
> for more theory.
[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error
[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2614:
----------------------------------

             Priority: Major  (was: Blocker)
    Affects Version/s: 0.11
        Fix Version/s: 0.11

> AvroStorage crashes on LOADING a single bad error
> -------------------------------------------------
>
>                 Key: PIG-2614
>                 URL: https://issues.apache.org/jira/browse/PIG-2614
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10, 0.11
>            Reporter: Russell Jurney
>              Labels: avro, avrostorage, bad, book, cutting, doug, for, my, pig, sadism
>             Fix For: 0.10, 0.11
>
>         Attachments: PIG-2614_0.patch
>
>
> AvroStorage dies when a single bad record exists, such as one with missing
> fields. This is very bad on 'big data,' where bad records are inevitable.
> See discussion at
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
> for more theory.