Willi Raschkowski created SPARK-42373:
-----------------------------------------

             Summary: Remove unused blank line removal from CSVExprUtils
                 Key: SPARK-42373
                 URL: https://issues.apache.org/jira/browse/SPARK-42373
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Willi Raschkowski


The non-multiline CSV read codepath refers to blank-line removal in several 
places. This is unnecessary because the parser already removes blank lines. It 
is also confusing: it suggests that blank lines are removed at this point, when 
in fact they have already been omitted from the data. The multiline codepath 
does not explicitly remove blank lines, which makes the two codepaths look as 
if they behave differently.

The codepath for {{DataFrameReader.csv(dataset: Dataset[String])}} does need to 
explicitly skip blank lines, and {{CSVUtils}} should preserve that behavior.
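
To illustrate why a codepath that receives pre-split lines may need its own 
blank-line filter, here is a minimal standalone sketch in Python. It is not 
Spark code: {{skip_blank_lines}} and {{parse_lines}} are hypothetical names, and 
Python's standard {{csv}} module stands in for the real parser. It assumes that 
"blank" means empty or whitespace-only.

```python
import csv
from typing import Iterable, Iterator, List


def skip_blank_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: drop lines that are empty or whitespace-only."""
    return (line for line in lines if line.strip())


def parse_lines(lines: Iterable[str]) -> List[List[str]]:
    """Parse already-split CSV lines, filtering blank lines explicitly first.

    Without the filter, csv.reader would emit an empty row for "" and a
    one-field row for a whitespace-only line.
    """
    return list(csv.reader(skip_blank_lines(lines)))


# Assumed sample input: two data lines separated by blank lines.
rows = parse_lines(["a,b", "", "   ", "1,2"])
```

In this sketch the filter lives in the line-oriented entry point only, which 
mirrors the distinction drawn above: a path whose parser already drops blank 
lines needs no such step.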



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
