[ https://issues.apache.org/jira/browse/SPARK-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-17024:
------------------------------------

    Assignee:     (was: Apache Spark)

> Weird behaviour of the DataFrame when a column name contains dots.
> ------------------------------------------------------------------
>
>                 Key: SPARK-17024
>                 URL: https://issues.apache.org/jira/browse/SPARK-17024
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Iaroslav Zeigerman
>
> When a column name contains dots and one of the segments in the name is the
> same as another column's name, Spark treats this column as a nested structure,
> although the actual type of the column is String/Int/etc. Example:
> {code}
> val df = sqlContext.createDataFrame(Seq(
>   ("user1", "task1"),
>   ("user2", "task2")
> )).toDF("user", "user.task")
> {code}
> This gives two columns, "user" and "user.task". Both of them are strings, and
> the schema resolution seems to be correct:
> {noformat}
> root
>  |-- user: string (nullable = true)
>  |-- user.task: string (nullable = true)
> {noformat}
> But when I try to query this DataFrame, e.g.:
> {code}
> df.select(df("user"), df("user.task"))
> {code}
> Spark throws an exception "Can't extract value from user#2;"
> It happens during the resolution of the LogicalPlan while processing the
> "user.task" column.
> Here is the full stacktrace:
> {noformat}
> Can't extract value from user#2;
> org.apache.spark.sql.AnalysisException: Can't extract value from user#2;
>         at org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:73)
>         at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$4.apply(LogicalPlan.scala:276)
>         at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$4.apply(LogicalPlan.scala:275)
>         at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
>         at scala.collection.immutable.List.foldLeft(List.scala:84)
>         at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:275)
>         at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:191)
>         at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:151)
>         at org.apache.spark.sql.DataFrame.col(DataFrame.scala:708)
>         at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:696)
> {noformat}
> Is this actually an expected behaviour?

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
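
The stack trace suggests that the name is split into parts, the first part is matched against a column, and the remaining parts are folded over it as nested field extractions. The sketch below is a simplified, Spark-free model of that behaviour and not Spark's actual code: DottedNameSketch, parseName, resolve and the hard-coded schema are all hypothetical names introduced for illustration. It assumes that backticks keep a dotted name together as a single part, which is how backtick-quoted column names behave in Spark SQL.

```scala
// Simplified, Spark-free model of dotted-name resolution.
// Every name here is hypothetical; none of this is a Spark API.
object DottedNameSketch {
  // Column name -> type name. Only "struct" columns would have nested fields.
  val schema: Map[String, String] =
    Map("user" -> "string", "user.task" -> "string")

  // Split on dots, except inside backticks: "`user.task`" stays one part.
  def parseName(name: String): List[String] = {
    val parts = scala.collection.mutable.ListBuffer.empty[String]
    val current = new StringBuilder
    var quoted = false
    for (c <- name) c match {
      case '`' => quoted = !quoted
      case '.' if !quoted =>
        parts += current.toString
        current.clear()
      case other => current.append(other)
    }
    parts += current.toString
    parts.toList
  }

  // Match the first part against a column, then treat the remaining parts
  // as nested field accesses, which only make sense on a struct column.
  def resolve(name: String): Either[String, String] = parseName(name) match {
    case head :: nested if schema.contains(head) =>
      if (nested.isEmpty) Right(head)
      else if (schema(head) == "struct") Right((head :: nested).mkString(" -> "))
      else Left(s"Can't extract value from $head")
    case _ => Left(s"Cannot resolve column name $name")
  }
}
```

In this model, resolve("user.task") matches the string column "user" and then fails while extracting "task" from it, whereas resolve("`user.task`") finds the column whose literal name contains the dot. Under the same assumption, df("`user.task`") would be the corresponding way to select the dotted column from the DataFrame in the report.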