[ https://issues.apache.org/jira/browse/SPARK-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357162#comment-14357162 ]
mgdadv commented on SPARK-6189:
-------------------------------

The current behavior really is quite confusing, particularly with R datasets, where periods are common in column names. It is not obvious what the correct fix is. The patch above replaces the period with an underscore. This fixes the problem, but it could be problematic if a different solution is adopted in the future, since scripts relying on this patch would then have to be changed. Alternatively, one could just emit a warning; the drawback is that Spark is quite verbose and the warning might be missed. That would be the least intrusive solution I can think of. Another possibility would be to raise an exception instead of just printing a warning.

> Pandas to DataFrame conversion should check field names for periods
> -------------------------------------------------------------------
>
>                 Key: SPARK-6189
>                 URL: https://issues.apache.org/jira/browse/SPARK-6189
>             Project: Spark
>          Issue Type: Improvement
>          Components: DataFrame, SQL
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> Issue I ran into: I imported an R dataset in CSV format into a Pandas DataFrame and then used toDF() to convert that into a Spark DataFrame. The R dataset had a column with a period in it (column "GNP.deflator" in the "longley" dataset). When I tried to select it using the Spark DataFrame DSL, I could not, because the DSL thought the period was selecting a field within GNP.
> Also, since "GNP" is another field's name, it gives an error which could be obscure to users, complaining:
> {code}
> org.apache.spark.sql.AnalysisException: GetField is not valid on fields of type DoubleType;
> {code}
> We should either handle periods in column names or check during loading and warn/fail gracefully.
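The underscore-replacement approach discussed above can be sketched as a small pre-processing step applied to the Pandas column names before conversion. This is only an illustration of the idea, not Spark code; the function name `sanitize_columns` is hypothetical. It combines two of the options mentioned: it warns on each rename, and it raises an exception when the rewrite would create duplicate column names.

```python
import warnings

def sanitize_columns(names):
    """Replace periods in column names with underscores.

    Warns for each renamed column and raises ValueError if the
    replacement produces duplicate names (e.g. "a.b" and "a_b").
    Hypothetical helper illustrating the approach in this comment.
    """
    sanitized = [name.replace(".", "_") for name in names]
    for old, new in zip(names, sanitized):
        if old != new:
            warnings.warn("Renamed column %r to %r" % (old, new))
    if len(set(sanitized)) != len(sanitized):
        raise ValueError(
            "Column name collision after replacing '.' with '_'")
    return sanitized
```

One would apply this to the Pandas DataFrame (e.g. `pdf.columns = sanitize_columns(pdf.columns)`) before calling toDF(). As a workaround on the Spark side, column names containing periods can also be selected by quoting them with backticks in the DSL, e.g. `` df.select("`GNP.deflator`") ``.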