Barry Becker created SPARK-17041:
------------------------------------

             Summary: Columns in schema are no longer case sensitive when reading csv file
                 Key: SPARK-17041
                 URL: https://issues.apache.org/jira/browse/SPARK-17041
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
    Affects Versions: 2.0.0
            Reporter: Barry Becker

In Spark 1.6.2 I could read a CSV file whose column names differed only by case. For example, one column might be named "output" and another "Output". With Spark 2.0.0, reading such a file fails with an error like this:

{code}
org.apache.spark.sql.AnalysisException: Reference 'Output' is ambiguous, could be: Output#1263, Output#1295.;
{code}

The schema (dfSchema below) that I pass to the CSV reader looks like this:

{code}
StructType(
  StructField(Output,StringType,true),
  ...
  StructField(output,StringType,true),
  ...)
{code}

The code that does the read is this:

{code}
sqlContext.read
  .format("csv")
  .option("header", "false")      // Do not treat the first line as a header
  .option("inferSchema", "false") // Do not infer data types; use dfSchema as given
  .schema(dfSchema)
  .csv(dataFile)
{code}
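
To illustrate why the reference becomes ambiguous, here is a minimal sketch in plain Scala (no Spark dependency) that finds column names colliding when compared without regard to case. The object name, the helper `caseInsensitiveCollisions`, and the sample column list are hypothetical and not part of Spark or of the original report. Spark SQL does have a `spark.sql.caseSensitive` setting, but this report does not confirm whether enabling it restores the 1.6.2 behavior.

```scala
import java.util.Locale

object CaseCollisionSketch {
  // Group names by their lowercase form and keep only the groups that
  // contain more than one distinct spelling. Those are the names a
  // case-insensitive resolver cannot tell apart.
  def caseInsensitiveCollisions(names: Seq[String]): Map[String, Seq[String]] =
    names.distinct
      .groupBy(_.toLowerCase(Locale.ROOT))
      .filter { case (_, spellings) => spellings.size > 1 }

  def main(args: Array[String]): Unit = {
    // Hypothetical column list modelled on the schema in the report.
    val columns = Seq("Output", "id", "output")
    caseInsensitiveCollisions(columns).foreach { case (key, spellings) =>
      println(s"'$key' matches several columns: ${spellings.mkString(", ")}")
    }
  }
}
```

Running such a check over the field names of a schema before the read would at least show which columns are affected.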