[jira] [Created] (SPARK-27873) Csv reader, adding a corrupt record column causes error if enforceSchema=false

Marcin Mejran (JIRA) Wed, 29 May 2019 13:17:14 -0700

Marcin Mejran created SPARK-27873:
-------------------------------------

             Summary: Csv reader, adding a corrupt record column causes error 
if enforceSchema=false
                 Key: SPARK-27873
                 URL: https://issues.apache.org/jira/browse/SPARK-27873
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.3
            Reporter: Marcin Mejran



In the Spark CSV reader, if you're using PERMISSIVE mode with a column for 
storing corrupt records, then you need to add a schema column corresponding 
to columnNameOfCorruptRecord.
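
A minimal PySpark sketch of this setup, for illustration only. The column
names, the sample data, and the helper names (write_sample_csv,
read_with_corrupt_column) are assumptions and do not come from the report. The
enforce_schema argument is passed straight through to the reader's
enforceSchema option so that both settings can be tried.

```python
import os
import tempfile

# Illustrative name for the corrupt-record column (an assumption).
CORRUPT_COL = "_corrupt_record"


def write_sample_csv(directory):
    """Write a two-column CSV with a header row and one malformed data row."""
    path = os.path.join(directory, "sample.csv")
    with open(path, "w", newline="") as f:
        # The second data row has a non-integer id, so it cannot be parsed
        # against an integer "id" field.
        f.write("id,name\n1,alice\nnot_a_number,bob\n")
    return path


def read_with_corrupt_column(spark, path, enforce_schema=True):
    """Read the CSV in PERMISSIVE mode with an extra corrupt-record field."""
    # Imported lazily so write_sample_csv works without PySpark installed.
    from pyspark.sql.types import (
        IntegerType, StringType, StructField, StructType,
    )

    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True),
        # Extra schema field with no counterpart in the CSV header.
        StructField(CORRUPT_COL, StringType(), True),
    ])
    return spark.read.csv(
        path,
        schema=schema,
        header=True,
        mode="PERMISSIVE",
        columnNameOfCorruptRecord=CORRUPT_COL,
        enforceSchema=enforce_schema,
    )
```

Given an existing SparkSession named spark, a call such as
read_with_corrupt_column(spark, write_sample_csv(tempfile.mkdtemp())).show()
would exercise this configuration.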

However, if you have a header row and enforceSchema=false, the schema vs. header 
validation fails because the schema contains an extra column, corresponding to 
columnNameOfCorruptRecord, that has no counterpart in the header.
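
The mismatch can be pictured with a simplified, pure-Python model of a
header-versus-schema check. This is not Spark's implementation: check_header
and its ignore parameter are hypothetical, and the comparison here is a plain
case-sensitive string match. The ignore parameter only shows one conceivable
way such a check could skip the corrupt-record field.

```python
def check_header(header, schema_fields, ignore=()):
    """Return a list of mismatch descriptions; an empty list means no mismatch."""
    # Drop any schema fields the caller asked the check to skip.
    expected = [f for f in schema_fields if f not in ignore]
    if len(header) != len(expected):
        return [
            f"header has {len(header)} columns, "
            f"schema has {len(expected)} fields"
        ]
    # Same length: compare names position by position.
    return [
        f"position {i}: header '{h}' != schema '{s}'"
        for i, (h, s) in enumerate(zip(header, expected))
        if h != s
    ]


header = ["id", "name"]                       # columns present in the file
schema = ["id", "name", "_corrupt_record"]    # user schema with the extra field

# The extra field makes the counts differ, so the check reports a mismatch.
print(check_header(header, schema))

# Skipping the corrupt-record field lets the same header pass.
print(check_header(header, schema, ignore=("_corrupt_record",)))
```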

Since FAILFAST mode doesn't print informative error messages about which rows 
failed to parse, there is no way to track down broken rows other than setting 
a corrupt record column.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

