Maxim Gekk created SPARK-25669:
----------------------------------

             Summary: Check header only when it exists
                 Key: SPARK-25669
                 URL: https://issues.apache.org/jira/browse/SPARK-25669
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Maxim Gekk


Currently, Spark checks the header of CSV files against the field names in the 
provided or inferred schema. When CSV content is read from files, the check is 
bypassed if the header does not exist. However, when the input CSV comes as a 
dataset of strings, Spark always compares the first row to the user-specified or 
inferred schema. For example, parsing the following dataset:
{code:scala}
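import spark.implicits._  // provides toDS(); already in scope in spark-shell
val input = Seq("1,2").toDS()
// The header option is not set (defaults to false), so "1,2" is data, not a header.
spark.read.option("enforceSchema", false).csv(input)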
{code}
throws the exception:
{code:java}
java.lang.IllegalArgumentException: CSV header does not conform to the schema.
 Header: 1, 2
 Schema: _c0, _c1
Expected: _c0 but found: 1   
{code}

The first row should not be compared to the user-specified or inferred schema 
when it is not a header.
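
The intended behavior can be illustrated with a small, Spark-free sketch. This is not Spark's implementation: the object `HeaderCheckSketch`, the function `checkHeader`, and its parameters are hypothetical names chosen for illustration, and the comparison is simplified to an exact, case-sensitive match. It only shows the guard being proposed, which is to compare the first row to the schema only when a header is declared and the schema is not enforced.

```scala
// Hypothetical, simplified model of the proposed guard (not Spark's actual code).
object HeaderCheckSketch {

  /** Returns an error message if the header conflicts with the schema, else None. */
  def checkHeader(
      firstRow: Seq[String],
      schemaFields: Seq[String],
      headerOption: Boolean,
      enforceSchema: Boolean): Option[String] = {
    if (!headerOption || enforceSchema) {
      // No header declared, or schema is enforced: the first row is not checked.
      None
    } else if (firstRow.length != schemaFields.length) {
      Some(s"Header has ${firstRow.length} columns but schema has ${schemaFields.length}")
    } else {
      // Report the first mismatching column name, if any.
      firstRow.zip(schemaFields).collectFirst {
        case (found, expected) if found != expected =>
          s"Expected: $expected but found: $found"
      }
    }
  }
}
```

With `headerOption = false`, the row `1,2` from the example above passes without comparison. With `headerOption = true`, the same row is reported as a mismatch against `_c0`.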



