Dale Richardson created SPARK-6593:
--------------------------------------

             Summary: Provide option for HadoopRDD to skip bad data splits.
                 Key: SPARK-6593
                 URL: https://issues.apache.org/jira/browse/SPARK-6593
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.3.0
            Reporter: Dale Richardson
            Priority: Minor
When reading a large number of files from HDFS, e.g. with sc.textFile("hdfs:///user/cloudera/logs*.gz"), a single corrupted split causes the entire job to fail. As default behaviour this is probably for the best, but in some circumstances, where you know the missing data is acceptable, it would be nice to have the option to skip the corrupted portion and continue the job.
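For illustration only (not part of the original request): a minimal Scala sketch of a user-level workaround under the assumption that the IOException raised while decompressing a corrupt gzip split propagates through the partition iterator. The object name, application name, and input path are hypothetical; the built-in option requested by this issue would presumably live inside HadoopRDD itself rather than in user code like this.

    import java.io.IOException

    import org.apache.spark.{SparkConf, SparkContext}

    object SkipBadSplitsSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("skip-bad-splits-sketch"))

        // Path taken from the issue description; corrupt .gz files normally fail the job.
        val raw = sc.textFile("hdfs:///user/cloudera/logs*.gz")

        // Wrap each partition's iterator so that an IOException thrown while reading a
        // corrupt split ends that partition early instead of failing the whole job.
        // Assumption: corruption surfaces as an IOException (or subclass) from the
        // underlying record reader; records after the corruption point are lost.
        val tolerant = raw.mapPartitions { iter =>
          new Iterator[String] {
            private var finished = false
            private var buffered: Option[String] = None

            private def advance(): Unit = {
              if (!finished && buffered.isEmpty) {
                try {
                  if (iter.hasNext) buffered = Some(iter.next()) else finished = true
                } catch {
                  case _: IOException => finished = true  // corrupt split: stop reading here
                }
              }
            }

            override def hasNext: Boolean = { advance(); buffered.isDefined }

            override def next(): String = {
              advance()
              val line = buffered.getOrElse(throw new NoSuchElementException("next on empty iterator"))
              buffered = None
              line
            }
          }
        }

        println(s"Readable lines: ${tolerant.count()}")
        sc.stop()
      }
    }

Since gzip is not splittable, each corrupt archive corresponds to a single partition, so this sketch drops at most the remainder of the damaged file while the rest of the job completes normally.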