[ https://issues.apache.org/jira/browse/SPARK-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-15473:
---------------------------------
    Description:

Currently, the CSV data source fails to write and then read back empty data. The code below:

{code}
val emptyDf = spark.range(10).filter(_ => false)
emptyDf.write
  .format("csv")
  .save(path.getCanonicalPath)

val copyEmptyDf = spark.read
  .format("csv")
  .load(path.getCanonicalPath)
copyEmptyDf.show()
{code}

throws the exception below:

{code}
Can not create a Path from an empty string
java.lang.IllegalArgumentException: Can not create a Path from an empty string
	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
	at org.apache.hadoop.fs.Path.<init>(Path.java:135)
	at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:241)
	at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
	at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:987)
	at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:987)
	at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:178)
	at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:178)
	at scala.Option.map(Option.scala:146)
{code}

Note that this is a different case from creating an empty dataframe with an explicit schema, as below:

{code}
val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
{code}

In that case, no writer is ever initialised or created at all (there are no calls to {{WriterContainer.writeRows()}}); a self-contained sketch of this case is attached at the end of this message.

Perhaps the CSV source should be able to write and read back the header for the schema as well as the empty data. This already works for Parquet and JSON, but not for CSV; a comparison sketch is attached below as well.

    Summary: CSV fails to write and read back empty dataframe  (was: CSV fails to write empty dataframe)

> CSV fails to write and read back empty dataframe
> ------------------------------------------------
>
>                 Key: SPARK-15473
>                 URL: https://issues.apache.org/jira/browse/SPARK-15473
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
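For reference, here is a minimal, self-contained sketch of the second case mentioned in the description. The {{local[*]}} session and the one-column schema are illustrative assumptions, not part of the original report (any schema reproduces the behaviour); only the last line comes from the report itself.

{code}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Illustrative setup (assumption, not from the report): a local session
// and an arbitrary one-column schema.
val spark = SparkSession.builder().master("local[*]").appName("SPARK-15473").getOrCreate()
val schema = StructType(Seq(StructField("value", StringType, nullable = true)))

// emptyRDD has no partitions, so when this dataframe is saved no write task
// ever runs and WriterContainer.writeRows() is never called.
val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
{code}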
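And a sketch of the format comparison, assuming fresh temporary output directories and a local session (neither is in the original report). Per the report, the Parquet round trip of the same empty dataframe succeeds, while the CSV round trip throws the exception quoted in the description.

{code}
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("SPARK-15473").getOrCreate()
val emptyDf = spark.range(10).filter(_ => false)

// Parquet stores the schema in its own file metadata, so an empty dataframe
// can be written and read back without error.
val parquetPath = Files.createTempDirectory("spark-15473").resolve("parquet").toString
emptyDf.write.format("parquet").save(parquetPath)
spark.read.format("parquet").load(parquetPath).show()   // empty table, no exception

// CSV writes no data files for empty input, so reading the directory back
// fails: "Can not create a Path from an empty string".
val csvPath = Files.createTempDirectory("spark-15473").resolve("csv").toString
emptyDf.write.format("csv").save(csvPath)
spark.read.format("csv").load(csvPath).show()           // throws IllegalArgumentException
{code}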