[ 
https://issues.apache.org/jira/browse/SPARK-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276275#comment-15276275
 ] 

Weichen Xu commented on SPARK-15212:
------------------------------------

Hmm, but this may still cause problems. For example, the CSV file header may contain a ` 
character, such as:
col`1,col2,...
So it would be better to add a check that each column name read from the file is legal.
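Such a check could look roughly like the following (an illustrative sketch only; the function name and the exact legality rules are assumptions, not Spark's actual validation logic):

```python
def is_legal_column_name(name):
    """Reject empty names and names containing a backtick,
    which would break quoted-identifier syntax in Spark SQL."""
    return bool(name) and "`" not in name

# Header like the one in the example above:
header = "col`1,col2,col3".split(",")
illegal = [c for c in header if not is_legal_column_name(c)]
print(illegal)  # ['col`1']
```

In a real implementation the loader would raise an error (or escape the name) instead of just collecting the offending columns.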

> CSV file reader does not trim whitespace from schema column names when reading 
> a file with a header line
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15212
>                 URL: https://issues.apache.org/jira/browse/SPARK-15212
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Weichen Xu
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> for example, run the following code in spark-shell:
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> val reader = sqlContext.read
> reader.option("header", true)
> val df = reader.csv("file:///diskext/tdata/spark/d1.csv")
> when the csv data file contains:
> ----------------------------------------------------------
> col1, col2,col3,col4,col5
> 1997,Ford,E350,"ac, abs, moon",3000.00
> ....
> ------------------------------------------------------------
> The first line contains the schema, and col2 has a blank before it, so the 
> generated DataFrame's schema column name contains the blank.
> This can cause problems. For example,
> df.select("col2") 
> can't find the column; you must use 
> df.select(" col2") 
> Likewise, if the DataFrame is registered as a table and then queried, col2 can't be selected:
> df.registerTempTable("tab1")
> sqlContext.sql("select col2 from tab1") // will fail
> A column-name validation should be added when loading a CSV file with a header-derived schema.
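The trimming the report asks for can be sketched outside of Spark with Python's standard csv module (a standalone illustration of the fix, not Spark's implementation):

```python
import csv
import io

# Sample data matching the report: note the blank before "col2".
raw = 'col1, col2,col3,col4,col5\n1997,Ford,E350,"ac, abs, moon",3000.00\n'
rows = list(csv.reader(io.StringIO(raw)))

# Trim surrounding whitespace from each header field so that lookups
# like select("col2") match, instead of requiring select(" col2").
header = [name.strip() for name in rows[0]]
print(header)  # ['col1', 'col2', 'col3', 'col4', 'col5']
```

The same strip-on-ingest step applied to the header row during schema inference would make both df.select("col2") and SQL queries against the registered table resolve the column.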



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
