[ 
https://issues.apache.org/jira/browse/SPARK-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-6189:
------------------------------------
    Component/s:     (was: DataFrame)

> Pandas to DataFrame conversion should check field names for periods
> -------------------------------------------------------------------
>
>                 Key: SPARK-6189
>                 URL: https://issues.apache.org/jira/browse/SPARK-6189
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> Issue I ran into:  I imported an R dataset in CSV format into a Pandas 
> DataFrame and then used toDF() to convert that into a Spark DataFrame.  The R 
> dataset had a column with a period in its name (column "GNP.deflator" in the 
> "longley" dataset).  When I tried to select it using the Spark DataFrame DSL, 
> I could not, because the DSL interpreted the period as selecting a field 
> within GNP.
> Also, since "GNP" is another field's name, the resulting error could be 
> obscure to users:
> {code}
> org.apache.spark.sql.AnalysisException: GetField is not valid on fields of 
> type DoubleType;
> {code}
> We should either handle periods in column names or check during loading and 
> warn/fail gracefully.
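
As an illustration of the "check during loading" option, here is a minimal
sketch of a user-side workaround applied before conversion. It is not Spark's
behavior or a proposed patch; the helper names find_dotted_columns and
sanitize_column_names and the underscore replacement are assumptions made for
this example. It works on a plain list of column names, so it runs without
pandas or Spark.

```python
def find_dotted_columns(names):
    """Return the column names that contain a period."""
    return [name for name in names if "." in str(name)]


def sanitize_column_names(names, replacement="_"):
    """Replace periods in column names; fail if two names would collide."""
    cleaned = [str(name).replace(".", replacement) for name in names]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError(
            "renaming would produce duplicate column names: %r" % (cleaned,)
        )
    return cleaned


# The two column names mentioned in the report.
columns = ["GNP.deflator", "GNP"]

dotted = find_dotted_columns(columns)
if dotted:
    print("columns containing periods: %s" % ", ".join(dotted))

safe_columns = sanitize_column_names(columns)
print(safe_columns)
```

With a Pandas DataFrame named pdf (a hypothetical variable), the result could
be assigned back with pdf.columns = sanitize_column_names(pdf.columns) before
handing the frame to Spark, so the DSL no longer sees a period in the name.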


