Travis Crawford created HCATALOG-487:
----------------------------------------

             Summary: HCatalog should tolerate a user-defined amount of bad records
                 Key: HCATALOG-487
                 URL: https://issues.apache.org/jira/browse/HCATALOG-487
             Project: HCatalog
          Issue Type: Improvement
            Reporter: Travis Crawford
            Assignee: Travis Crawford


HCatalog tasks currently fail when they encounter a corrupt record during 
deserialization. In some cases, a large data set contains only a small number 
of corrupt records, and it is acceptable to skip them. In fact, Hadoop supports 
skipping bad records for exactly this reason.

However, using the Hadoop-native record skipping feature (as Hive does) is 
very coarse: it leads to a large number of failed tasks, adds task scheduling 
overhead, and offers limited control over the skipping behavior.

HCatalog should have native support for skipping a user-defined number of bad 
records.
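
As a rough illustration of the requested behavior (not HCatalog code), the 
sketch below wraps a deserializer and skips corrupt records until the observed 
error rate crosses a user-defined limit. The names read_tolerantly, 
max_bad_fraction, min_records, and TooManyBadRecordsError are hypothetical and 
chosen only for this example. Expressing the limit as a fraction rather than 
an absolute count is also an assumption.

```python
import json
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


class TooManyBadRecordsError(RuntimeError):
    """Hypothetical error: the bad-record rate exceeded the configured limit."""


def read_tolerantly(
    raw_records: Iterable[bytes],
    deserialize: Callable[[bytes], T],
    max_bad_fraction: float = 0.0001,  # assumed knob: tolerated bad/seen ratio
    min_records: int = 2,              # assumed knob: do not judge the ratio earlier
) -> Iterator[T]:
    """Yield deserialized records, skipping corrupt ones within the limit."""
    seen = 0
    bad = 0
    for raw in raw_records:
        seen += 1
        try:
            record = deserialize(raw)
        except Exception as exc:  # any deserialization failure counts as "bad"
            bad += 1
            # Fail the task only once enough records have been read for the
            # ratio to be meaningful and the ratio is above the limit.
            if seen >= min_records and bad / seen > max_bad_fraction:
                raise TooManyBadRecordsError(
                    f"{bad} bad records out of {seen} read"
                ) from exc
            continue  # within tolerance: skip this record and keep reading
        yield record


# Usage: one corrupt record out of four is tolerated at a 50% limit.
raw = [b'{"id": 1}', b"not json", b'{"id": 2}', b'{"id": 3}']
good = list(read_tolerantly(raw, json.loads, max_bad_fraction=0.5))
# good == [{"id": 1}, {"id": 2}, {"id": 3}]
```

Because the skipping happens inside the reader, the task keeps running when 
it skips a record and only fails when the limit is exceeded, which avoids the 
failed-task and rescheduling cost described above.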
