Add an option in Hive to skip corrupted data entirely
-----------------------------------------------------
Key: HIVE-2658
URL: https://issues.apache.org/jira/browse/HIVE-2658
Project: Hive
Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
Add a new parameter:
hive.skip.corrupted.data
This is independent of the type of the underlying data.
The idea is as follows:
We have some corrupted data in our cluster right now.
We will run Hive over all the corrupted partitions:
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.skip.corrupted.data=true;
insert overwrite table <T> partition (<P>)
select * from <T> where <P>;
This way, <T>@<P> will be regenerated with all the data that can be read.
If HiveRecordReader gets an exception while reading the next row, the mapper
behaves as if no more data is present in the file. Rows after the unreadable
one in that file are dropped, not skipped individually.
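The end-of-file behavior described above can be sketched as a reader wrapper. This is a minimal illustration, not Hive's actual HiveRecordReader or Hadoop's RecordReader API: RowReader, SkippingRowReader and readAll are hypothetical names, and the "file" is simulated by a list in which the row at index corruptAt throws an IOException.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SkipCorruptedDemo {

    /** Hypothetical row source standing in for a record reader; returns null at end of input. */
    interface RowReader {
        String next() throws IOException;
    }

    /**
     * Hypothetical wrapper: when skipCorrupted is true, the first read error is
     * treated as end of input, so the caller only sees the rows before it.
     */
    static final class SkippingRowReader implements RowReader {
        private final RowReader delegate;
        private final boolean skipCorrupted;
        private boolean done = false;

        SkippingRowReader(RowReader delegate, boolean skipCorrupted) {
            this.delegate = delegate;
            this.skipCorrupted = skipCorrupted;
        }

        @Override
        public String next() throws IOException {
            if (done) {
                return null;
            }
            try {
                String row = delegate.next();
                if (row == null) {
                    done = true; // genuine end of input
                }
                return row;
            } catch (IOException e) {
                if (!skipCorrupted) {
                    throw e; // option off: the error propagates and fails the task
                }
                done = true; // option on: behave as if no more data is present
                return null;
            }
        }
    }

    /** Reads all rows from a simulated file whose row at index corruptAt is unreadable (-1 = none). */
    static List<String> readAll(List<String> rows, int corruptAt, boolean skipCorrupted) throws IOException {
        RowReader raw = new RowReader() {
            private int pos = 0;

            @Override
            public String next() throws IOException {
                if (pos == corruptAt) {
                    throw new IOException("corrupt row at index " + pos);
                }
                if (pos >= rows.size()) {
                    return null;
                }
                return rows.get(pos++);
            }
        };
        RowReader reader = new SkippingRowReader(raw, skipCorrupted);
        List<String> out = new ArrayList<>();
        String row;
        while ((row = reader.next()) != null) {
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Row r2 is corrupt: with skipping on, only r0 and r1 are returned; r3 is lost.
        System.out.println(readAll(List.of("r0", "r1", "r2", "r3"), 2, true));
    }
}
```

With skipping off, the same call rethrows the IOException. Treating the first error as end of file, rather than retrying past the bad row, keeps the wrapper independent of the underlying format, since it never has to know how to resynchronize inside a damaged file.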