Weichen Xu created SPARK-15212:
----------------------------------

             Summary: CSV file reader does not trim blanks from column names 
when the schema is read from the first line of the file
                 Key: SPARK-15212
                 URL: https://issues.apache.org/jira/browse/SPARK-15212
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.1, 1.6.2, 2.0.0, 2.1.0
            Reporter: Weichen Xu


For example, run the following code in spark-shell:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// treat the first line of the file as the header (column names)
val df = sqlContext.read.option("header", true).csv("file:///diskext/tdata/spark/d1.csv")

where the CSV data file contains:
----------------------------------------------------------
col1, col2,col3,col4,col5
1997,Ford,E350,"ac, abs, moon",3000.00
....
------------------------------------------------------------

The first line holds the column names. Because col2 is preceded by a blank,
the corresponding column name in the generated DataFrame's schema keeps that
blank and becomes " col2".
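
To make the effect concrete, here is a minimal sketch of how a leading blank survives when a header line is split on commas. It uses a naive String.split purely for illustration and is not Spark's actual CSV parser; the names headerLine, rawNames and trimmedNames are made up for this example.

```scala
// Illustration only: a naive comma split, NOT Spark's CSV parser.
val headerLine = "col1, col2,col3,col4,col5"

// Splitting on "," keeps the blank that follows the first comma.
val rawNames = headerLine.split(",")

// Trimming each name removes the surrounding blanks.
val trimmedNames = rawNames.map(_.trim)

println(rawNames.map(n => s"[$n]").mkString(" "))
println(trimmedNames.map(n => s"[$n]").mkString(" "))
```

The second element of rawNames is " col2" (with the blank), while the second element of trimmedNames is "col2".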

This can cause problems. For example,

df.select("col2") 
cannot find the column; you must write 
df.select(" col2") 

Likewise, if the DataFrame is registered as a table, a query cannot select col2:

df.registerTempTable("tab1");
sqlContext.sql("select col2 from tab1"); //will fail

The CSV reader should validate column names (for example, by trimming
surrounding blanks) when it loads a CSV file whose first line is the header.
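
Until the reader does this itself, a user-side workaround might look like the sketch below. The helper normalizeHeader is hypothetical (it is not part of Spark); it trims each name and rejects names that end up empty or duplicated. The commented usage assumes the df from the example above and relies only on df.columns and df.toDF(colNames: String*).

```scala
// Hypothetical helper, not part of Spark: trims header names and rejects
// names that are empty or duplicated after trimming.
def normalizeHeader(names: Seq[String]): Seq[String] = {
  val trimmed = names.map(_.trim)
  require(trimmed.forall(_.nonEmpty), "empty column name in header")
  require(trimmed.distinct.size == trimmed.size,
    "duplicate column names after trimming")
  trimmed
}

// Possible usage in spark-shell, with the DataFrame from the report:
//   val cleaned = df.toDF(normalizeHeader(df.columns): _*)
//   cleaned.select("col2")
```

Failing on duplicates rather than silently renaming them is a design choice of this sketch; a real fix inside the reader might handle that case differently.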

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
