Cheng Lian created SPARK-14273:
----------------------------------

Summary: Add FileFormat.isSplittable to indicate whether a format is splittable
Key: SPARK-14273
URL: https://issues.apache.org/jira/browse/SPARK-14273
Project: Spark
Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Cheng Lian

{{FileSourceStrategy}} assumes that all data source formats are splittable and always splits data files by a fixed partition size. However, not all HDFS-based formats are splittable. We need a flag that indicates whether a format is splittable, so that non-splittable files are never split across multiple Spark partitions.

(PS: Is it "splitable" or "splittable"? Probably the latter? Hadoop uses the former, though...)
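
To make the proposal concrete, here is a minimal sketch of how a planner might consult such a flag. It is written in Python for brevity and is not Spark code: {{FileFormat}}, {{FileSplit}}, {{plan_splits}} and {{max_split_bytes}} are hypothetical names for illustration only, and the actual change would live in Spark's Scala sources.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileFormat:
    """Hypothetical stand-in for a data source format (not Spark's class)."""
    name: str
    is_splittable: bool  # the proposed flag


@dataclass(frozen=True)
class FileSplit:
    path: str
    start: int   # byte offset where this split begins
    length: int  # number of bytes covered by this split


def plan_splits(path: str, file_size: int, fmt: FileFormat,
                max_split_bytes: int) -> List[FileSplit]:
    """Cut a file into fixed-size splits only if its format allows it."""
    if max_split_bytes <= 0:
        raise ValueError("max_split_bytes must be positive")
    if not fmt.is_splittable:
        # Non-splittable formats must be read by one task from start to end.
        return [FileSplit(path, 0, file_size)]
    # Splittable formats are cut at fixed byte offsets; the last split
    # may be shorter than max_split_bytes.
    return [
        FileSplit(path, offset, min(max_split_bytes, file_size - offset))
        for offset in range(0, file_size, max_split_bytes)
    ]


if __name__ == "__main__":
    plain = FileFormat("text", is_splittable=True)
    gzipped = FileFormat("text.gz", is_splittable=False)
    print(plan_splits("/data/a.txt", 250, plain, 100))
    print(plan_splits("/data/a.txt.gz", 250, gzipped, 100))
```

With the flag set to false, the whole file ends up in a single split regardless of its size, which is the behaviour requested above. The trade-off is that a large non-splittable file cannot be read in parallel.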