Martin Rueckl created SPARK-46959:
-------------------------------------

             Summary: CSV reader reads data inconsistently depending on column position
                 Key: SPARK-46959
                 URL: https://issues.apache.org/jira/browse/SPARK-46959
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.4.1
            Reporter: Martin Rueckl

Reading the following CSV
{code:java}
"a";"b";"c";"d"
10;100,00;"Some;String";"ok"
20;200,00;"";"still ok"
30;300,00;"also ok";""
40;400,00;"";""
{code}
with these options
{code:java}
spark.read
.option("header","true")
.option("sep",";")
.option("encoding","ISO-8859-1")
.option("lineSep","\r\n")
.option("nullValue","")
.option("quote",'"')
.option("escape","")
{code}
results in the following inconsistent dataframe

!image-2024-02-02-13-05-26-203.png|width=352,height=120!

As one can see, the quoted empty fields in the last column (d) are not read as null, whereas the quoted empty fields in column c are.

If I recall correctly, this only happens when the "escape" option is set to an empty string. Leaving it unset (it defaults to "\") does not seem to trigger the bug.
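
To make the report easier to replay, here is a minimal PySpark sketch that writes the sample rows to a file and reads them back with the options listed above. The helper names (`write_sample_csv`, `read_sample_csv`) are hypothetical, and the file layout is an assumption: the report does not show the raw bytes, so the sketch assumes CRLF line endings (to match `lineSep`), a trailing line break, and ISO-8859-1 encoding. No expected output is stated here; the observed result is the screenshot in the report.

```python
# Sample rows copied from the report.
SAMPLE_ROWS = [
    '"a";"b";"c";"d"',
    '10;100,00;"Some;String";"ok"',
    '20;200,00;"";"still ok"',
    '30;300,00;"also ok";""',
    '40;400,00;"";""',
]

# Reader options copied from the report. "escape" is the suspected trigger:
# the reporter recalls the problem only appearing when it is an empty string.
CSV_OPTIONS = {
    "header": "true",
    "sep": ";",
    "encoding": "ISO-8859-1",
    "lineSep": "\r\n",
    "nullValue": "",
    "quote": '"',
    "escape": "",
}


def write_sample_csv(path):
    """Write the sample rows to `path`.

    Assumption: CRLF line endings, a trailing line break, ISO-8859-1 bytes.
    """
    data = "\r\n".join(SAMPLE_ROWS) + "\r\n"
    with open(path, "wb") as f:
        f.write(data.encode("iso-8859-1"))
    return path


def read_sample_csv(spark, path, options=None):
    """Read `path` with the reported options.

    `spark` is an existing SparkSession supplied by the caller.
    Pass `options` to try a variant, e.g. without the "escape" key.
    """
    opts = CSV_OPTIONS if options is None else options
    return spark.read.options(**opts).csv(path)
```

With a running session named `spark`, calling `read_sample_csv(spark, write_sample_csv("/tmp/sample.csv")).show()` should display the frame for comparison with the screenshot. Passing a copy of `CSV_OPTIONS` with the `"escape"` key removed is one way to check the reporter's recollection that the default escape character avoids the problem.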