[ https://issues.apache.org/jira/browse/SPARK-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276275#comment-15276275 ]
Weichen Xu commented on SPARK-15212:
------------------------------------

En... but this may still cause problems. For example, the CSV file header may contain the ` character, such as: col`1,col2,... So it would be better to also add a check that each column name read from the file is legal.

> CSV file reader does not trim whitespace from schema column names when reading a file whose first line is the schema
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15212
>                 URL: https://issues.apache.org/jira/browse/SPARK-15212
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Weichen Xu
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> For example, run the following code in spark-shell:
>
>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>     var reader = sqlContext.read
>     reader.option("header", true)
>     var df = reader.csv("file:///diskext/tdata/spark/d1.csv")
>
> where the CSV data file contains:
>
>     col1, col2,col3,col4,col5
>     1997,Ford,E350,"ac, abs, moon",3000.00
>     ....
>
> The first line contains the schema, and col2 has a leading blank, so the
> generated DataFrame's schema column name also contains the blank. This
> causes problems; for example,
>
>     df.select("col2")
>
> cannot find the column, and one must instead use
>
>     df.select(" col2")
>
> Likewise, if the DataFrame is registered as a table, a query cannot
> select col2:
>
>     df.registerTempTable("tab1")
>     sqlContext.sql("select col2 from tab1")  // will fail
>
> Column-name validation should be added when loading a CSV file with a
> header schema.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
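
The two checks discussed above (trimming blanks around header names, and rejecting names containing a backtick) can be sketched in plain Scala. This is a minimal illustration of the proposed validation, not Spark's actual CSV-reader code; `normalizeHeader` and `isLegal` are hypothetical helper names introduced here for the example:

```scala
// Sketch of header-name cleanup for a CSV first-line schema.
// Assumes a simple comma-split header (no quoted fields), purely to
// illustrate the trim + legality check suggested in the comment.
object CsvHeaderCheck {
  // Hypothetical helper: split the header line and trim the
  // surrounding whitespace from each column name.
  def normalizeHeader(line: String): Seq[String] =
    line.split(",", -1).map(_.trim).toSeq

  // Hypothetical legality check: reject empty names and names
  // containing a backtick, which is awkward to reference in SQL.
  def isLegal(name: String): Boolean =
    name.nonEmpty && !name.contains("`")

  def main(args: Array[String]): Unit = {
    // The header from the issue's example file: col2 has a leading blank.
    val header = "col1, col2,col3,col4,col5"
    val names = normalizeHeader(header)
    println(names.mkString(","))   // col1,col2,col3,col4,col5
    println(names.forall(isLegal)) // true
    println(isLegal("col`1"))      // false, per the comment's example
  }
}
```

With names normalized this way, `df.select("col2")` and `sqlContext.sql("select col2 from tab1")` in the report above would resolve the column as expected.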