Willi Raschkowski created SPARK-42373:
-----------------------------------------
             Summary: Remove unused blank line removal from CSVExprUtils
                 Key: SPARK-42373
                 URL: https://issues.apache.org/jira/browse/SPARK-42373
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Willi Raschkowski


The non-multiline CSV read codepath contains references to blank line removal throughout. This is unnecessary because the parser already removes blank lines. It is also confusing: the code suggests that blank lines are removed at this point, when in fact they have already been omitted from the data. The multiline codepath does not explicitly remove blank lines, which makes the two codepaths look as though they behave differently.

The codepath for {{DataFrameReader.csv(dataset: Dataset[String])}} does need to explicitly skip lines, and this should be respected in {{CSVUtils}}.
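
For the {{Dataset[String]}} case, here is a minimal Scala sketch of what explicitly skipping blank lines ahead of a parser can look like when the input is raw strings. It does not use Spark and is not Spark's implementation: {{BlankLineSketch}}, {{skipBlankLines}}, and {{parseLine}} are hypothetical names, and treating whitespace-only lines as blank is an assumption made for illustration.

```scala
// Hypothetical sketch: these names are illustrative, not Spark's internal API.
object BlankLineSketch {

  // Drop lines that are empty or contain only whitespace.
  // Assumption: "blank" means nothing is left after trimming.
  def skipBlankLines(lines: Iterator[String]): Iterator[String] =
    lines.filter(_.trim.nonEmpty)

  // Naive stand-in for a CSV parser: split on commas and keep empty fields.
  // A real parser would also handle quoting, escaping, and comments.
  def parseLine(line: String): List[String] =
    line.split(",", -1).toList

  def main(args: Array[String]): Unit = {
    // Raw string input, analogous to a collection of lines handed to a reader.
    val raw = Seq("a,b", "", "   ", "1,2")

    // Blank lines are filtered out before any line reaches the parser.
    val rows = skipBlankLines(raw.iterator).map(parseLine).toList
    println(rows)
  }
}
```

In this sketch the filter runs before parsing, so the stand-in parser never sees a blank line. That is the kind of explicit step the description says the raw-string path still needs, in contrast to paths where the parser itself drops blank lines.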