Lakshminarayan Kamath created SPARK-25890:
---------------------------------------------
             Summary: Null rows are ignored with Ctrl-A as a delimiter when reading a CSV file.
                 Key: SPARK-25890
                 URL: https://issues.apache.org/jira/browse/SPARK-25890
             Project: Spark
          Issue Type: Bug
          Components: Spark Shell, SQL
    Affects Versions: 2.3.2
            Reporter: Lakshminarayan Kamath


Reading a Ctrl-A-delimited CSV file ignores rows in which every value is null; reading a comma-delimited CSV file does not.

*Reproduction in spark-shell:*

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val l = List(List(1, 2), List(null, null), List(2, 3))
val datasetSchema = StructType(List(
  StructField("colA", IntegerType, true),
  StructField("colB", IntegerType, true)))
val rdd = sc.parallelize(l).map(item => Row.fromSeq(item.toSeq))
val df = spark.createDataFrame(rdd, datasetSchema)

df.show()
+----+----+
|colA|colB|
+----+----+
|   1|   2|
|null|null|
|   2|   3|
+----+----+

df.write.option("delimiter", "\u0001").option("header", "true").csv("/ctrl-a-separated.csv")
df.write.option("delimiter", ",").option("header", "true").csv("/comma-separated.csv")

val commaDf = spark.read.option("header", "true").option("delimiter", ",").csv("/comma-separated.csv")
commaDf.show
+----+----+
|colA|colB|
+----+----+
|   1|   2|
|   2|   3|
|null|null|
+----+----+

val ctrlaDf = spark.read.option("header", "true").option("delimiter", "\u0001").csv("/ctrl-a-separated.csv")
ctrlaDf.show
+----+----+
|colA|colB|
+----+----+
|   1|   2|
|   2|   3|
+----+----+

As seen above, for the Ctrl-A-delimited CSV, rows containing only null values are ignored on read.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
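A side note on why the dropped rows should be recoverable: assuming Spark's default nullValue for CSV writes (the empty string), a row of two nulls serializes to a line that is nothing but the delimiter character, for comma and Ctrl-A alike. A minimal sketch in plain Python (not Spark; the helper name parse_line is invented for this illustration) shows that a generic CSV parser recovers two empty fields from such a line with either delimiter, so the data is present on disk in both files:

```python
import csv
import io

def parse_line(line, delim):
    """Parse one CSV line with the given single-character delimiter."""
    return next(csv.reader(io.StringIO(line), delimiter=delim))

# A row of two nulls written with an empty-string nullValue is just the
# delimiter sitting between two empty fields -- same shape for both files.
print(parse_line(",", ","))            # ['', '']
print(parse_line("\u0001", "\u0001"))  # ['', '']
```

Both lines carry the same two-empty-field content, which is consistent with the comma-delimited read returning a null/null row while the Ctrl-A read silently drops it.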