better fault tolerance for corrupted text files
-----------------------------------------------
Key: HADOOP-3144
URL: https://issues.apache.org/jira/browse/HADOOP-3144
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Affects Versions: 0.15.3
Reporter: Joydeep Sen Sarma
Every once in a while we encounter corrupted text files (corrupted at the source, before being copied into Hadoop). Inevitably, some of the corrupted data looks like one extremely long line, and Hadoop trips over it: it tries to buffer the entire line in a single in-memory object and dies with an OutOfMemoryError. The code looks the same in trunk, so the problem is still present there. We are looking for an option on TextInputFormat (and similar input formats) to ignore overly long lines; ideally the reader would simply skip any line above a configurable size limit. A rough sketch of the idea follows.
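A minimal sketch of the desired behavior, in plain Java. The class name BoundedLineReader and the maxLineLength parameter are illustrative only, not an existing Hadoop API; an actual patch would live in LineRecordReader/TextInputFormat. The key point is that bytes beyond the limit are drained without being stored, so an errant multi-gigabyte "line" never materializes in memory:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Sketch: read newline-delimited records, silently skipping any
    // record longer than maxLineLength instead of buffering it fully.
    public class BoundedLineReader {
      private final InputStream in;
      private final int maxLineLength;

      public BoundedLineReader(InputStream in, int maxLineLength) {
        this.in = in;               // wrap in a BufferedInputStream in
        this.maxLineLength = maxLineLength; // practice; read() here is per-byte
      }

      // Returns the next line that fits within the limit, or null at EOF.
      public String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean tooLong = false;
        int b;
        while ((b = in.read()) != -1) {
          if (b == '\n') {
            if (!tooLong) {
              return buf.toString("UTF-8");
            }
            // Line exceeded the limit: drop it and start on the next one.
            tooLong = false;
            continue;
          }
          if (tooLong) {
            continue;               // drain bytes without storing them
          }
          if (buf.size() >= maxLineLength) {
            tooLong = true;         // stop accumulating; skip to next newline
            buf.reset();
            continue;
          }
          buf.write(b);
        }
        // EOF: return a final unterminated line if one fits, else null.
        return (!tooLong && buf.size() > 0) ? buf.toString("UTF-8") : null;
      }
    }

A record reader built along these lines could take the limit from a job configuration property and bump a counter for each skipped line, so the data loss stays visible in job output; the property name and default value would be up to the patch.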