Dale Richardson created SPARK-6593:
--------------------------------------

             Summary: Provide option for HadoopRDD to skip bad data splits.
                 Key: SPARK-6593
                 URL: https://issues.apache.org/jira/browse/SPARK-6593
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.3.0
            Reporter: Dale Richardson
            Priority: Minor
When reading a large number of files from HDFS, e.g. with sc.textFile("hdfs:///user/cloudera/logs*.gz"), a single corrupted split causes the entire job to fail. As default behaviour this is probably for the best, but in some circumstances, where you know the missing data is acceptable, it would be nice to have the option to skip the corrupted portion and continue the job.
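For illustration only (not part of the original request): a minimal Scala sketch of a user-level workaround under the assumption that the IOException raised while decompressing a corrupt gzip split propagates through the partition iterator. The object name, application name, and input path are hypothetical; the built-in option requested by this issue would presumably live inside HadoopRDD itself rather than in user code like this.

    import java.io.IOException

    import org.apache.spark.{SparkConf, SparkContext}

    object SkipBadSplitsSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("skip-bad-splits-sketch"))

        // Path taken from the issue description; corrupt .gz files normally fail the job.
        val raw = sc.textFile("hdfs:///user/cloudera/logs*.gz")

        // Wrap each partition's iterator so that an IOException thrown while reading a
        // corrupt split ends that partition early instead of failing the whole job.
        // Assumption: corruption surfaces as an IOException (or subclass) from the
        // underlying record reader; records after the corruption point are lost.
        val tolerant = raw.mapPartitions { iter =>
          new Iterator[String] {
            private var finished = false
            private var buffered: Option[String] = None

            private def advance(): Unit = {
              if (!finished && buffered.isEmpty) {
                try {
                  if (iter.hasNext) buffered = Some(iter.next()) else finished = true
                } catch {
                  case _: IOException => finished = true  // corrupt split: stop reading here
                }
              }
            }

            override def hasNext: Boolean = { advance(); buffered.isDefined }

            override def next(): String = {
              advance()
              val line = buffered.getOrElse(throw new NoSuchElementException("next on empty iterator"))
              buffered = None
              line
            }
          }
        }

        println(s"Readable lines: ${tolerant.count()}")
        sc.stop()
      }
    }

Since gzip is not splittable, each corrupt archive corresponds to a single partition, so this sketch drops at most the remainder of the damaged file while the rest of the job completes normally.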