Weichen Xu created SPARK-15212:
----------------------------------

             Summary: CSV file reader does not strip blanks from schema column names when reading a file whose first line is the schema
                 Key: SPARK-15212
                 URL: https://issues.apache.org/jira/browse/SPARK-15212
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.1, 1.6.2, 2.0.0, 2.1.0
            Reporter: Weichen Xu
For example, run the following code in spark-shell:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val reader = sqlContext.read
    reader.option("header", true)
    val df = reader.csv("file:///diskext/tdata/spark/d1.csv")

where the CSV data file contains:

----------------------------------------------------------
col1, col2,col3,col4,col5
1997,Ford,E350,"ac, abs, moon",3000.00
....
------------------------------------------------------------

The first line contains the schema, and col2 has a blank before it, so the corresponding column name in the generated DataFrame's schema also contains the blank. This can cause problems. For example, df.select("col2") cannot find the column; you must use df.select(" col2") instead. Likewise, if the DataFrame is registered as a table and then queried, col2 cannot be selected:

    df.registerTempTable("tab1")
    sqlContext.sql("select col2 from tab1") // will fail

Column names should be validated when loading a CSV file whose first line is the schema.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
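The column-name validation proposed above could look roughly like the sketch below. It is a hypothetical, Spark-independent illustration, not Spark's implementation: the names `HeaderCleaner`, `cleanHeader` and `validate` are invented for this example, and the split assumes the header contains no quoted commas (Spark's real CSV parser handles quoting). Until the reader itself does this, an existing DataFrame can be renamed on the user side with `df.toDF(df.columns.map(_.trim): _*)`.

```scala
// Hypothetical helper sketching header validation for CSV column names.
// Assumption: header fields are separated by plain commas, with no quoted
// commas inside column names (this is NOT Spark's actual CSV parser).
object HeaderCleaner {

  /** Splits a CSV header line and trims surrounding blanks from each column name. */
  def cleanHeader(line: String): Array[String] =
    line.split(",", -1).map(_.trim) // -1 keeps trailing empty fields

  /** Reports empty names and duplicates that remain after trimming. */
  def validate(names: Seq[String]): Seq[String] = {
    val empty = names.zipWithIndex.collect {
      case (n, i) if n.isEmpty => s"column $i has an empty name"
    }
    val dups = names.filter(_.nonEmpty).groupBy(identity).collect {
      case (n, occ) if occ.size > 1 => s"duplicate column name: $n"
    }
    empty ++ dups.toSeq.sorted
  }

  def main(args: Array[String]): Unit = {
    // Header line from the example data file in this report
    val names = cleanHeader("col1, col2,col3,col4,col5")
    println(names.mkString("|"))         // prints col1|col2|col3|col4|col5
    println(validate(names.toSeq).isEmpty) // prints true
  }
}
```

Trimming alone fixes the reported case; the extra checks flag headers such as `a, a,` that only become ambiguous or empty once blanks are removed.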