Barry Becker created SPARK-17041:
------------------------------------

             Summary: Columns in schema are no longer case sensitive when reading csv file
                 Key: SPARK-17041
                 URL: https://issues.apache.org/jira/browse/SPARK-17041
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
    Affects Versions: 2.0.0
            Reporter: Barry Becker


In Spark 1.6.2, I could read a CSV file whose column names differed only by 
case. For example, one column might be named "output" and another "Output". 
With Spark 2.0.0, reading such a file fails with an error like this:
{code}
org.apache.spark.sql.AnalysisException: Reference 'Output' is ambiguous, could 
be: Output#1263, Output#1295.;
{code}

The schema (dfSchema below) that I pass to the csv read looks like this:
{code}
StructType( StructField(Output,StringType,true), ... 
StructField(output,StringType,true), ...)
{code}
The code that does the read is this:
{code}
sqlContext.read
          .format("csv")
          .option("header", "false") // Treat the first line as data, not as a header
          .option("inferSchema", "false") // Do not infer data types; use dfSchema as given
          .schema(dfSchema)
          .csv(dataFile)
{code}
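
To check whether a given schema contains column names that differ only by case (the condition that triggers the error above), the names can be grouped case-insensitively. This is a minimal sketch in plain Scala, not part of Spark: the helper name caseInsensitiveCollisions is hypothetical, and the column names are the two shown in this report.

```scala
import java.util.Locale

// Hypothetical helper (not a Spark API): returns every group of column names
// that become identical once case is ignored, keyed by the lower-cased name.
def caseInsensitiveCollisions(names: Seq[String]): Map[String, Seq[String]] =
  names
    .groupBy(_.toLowerCase(Locale.ROOT))
    .map { case (key, group) => key -> group.distinct }
    .filter { case (_, group) => group.size > 1 }

// Column names taken from the schema in this report.
val columnNames = Seq("Output", "output")

val collisions = caseInsensitiveCollisions(columnNames)
collisions.foreach { case (key, group) =>
  println(s"'$key' matches more than one column: ${group.mkString(", ")}")
}
```

With an actual schema, the input would presumably be dfSchema.fieldNames.toSeq rather than a hand-written list. An empty result means no two columns differ only by case.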


