[ https://issues.apache.org/jira/browse/SPARK-15522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299823#comment-15299823 ]
Takeshi Yamamuro commented on SPARK-15522: ------------------------------------------ Could you make a simpler query to reproduce this for making others easily understood? > DataFrame Column Names That are Numbers aren't referenced correctly in SQL > -------------------------------------------------------------------------- > > Key: SPARK-15522 > URL: https://issues.apache.org/jira/browse/SPARK-15522 > Project: Spark > Issue Type: Bug > Components: SQL > Reporter: Jason Pohl > > The following code is run: > val pre_piv_df_a = sqlContext.sql(""" > SELECT > CASE WHEN Gender = 'M' Then 1 ELSE 0 END AS has_male, > CASE WHEN Gender = 'F' Then 1 ELSE 0 END AS has_female, > CAST(StartAge AS Double) AS StartAge_dbl, > CAST(EndAge AS Double) AS EndAge_dbl, > * > FROM alldem_union_curr > """) > .withColumn("JavaStartTimestamp", create_ts($"StartTimestamp")) > .drop("StartTimestamp").withColumnRenamed("JavaStartTimestamp", > "StartTimestamp") > .drop("StartAge").drop("EndAge") > .withColumnRenamed("StartAge_dbl", > "StartAge").withColumnRenamed("EndAge_dbl", "EndAge") > val pre_piv_df_b = pre_piv_df_a > .withColumn("media_month_cc", media_month_cc($"MediaMonth")) > .withColumn("media_week_cc", media_week_sts_cc($"StartTimestamp")) > .withColumn("media_day_cc", media_day_sts_cc($"StartTimestamp")) > .withColumn("week_day", week_day($"StartTimestamp")) > .withColumn("week_end", week_end($"StartTimestamp")) > .join(sqlContext.table("cad_nets"), $"Network" === $"nielsen_network", > "inner") > .withColumnRenamed("cad_network", "norm_net_code_a") > .withColumn("norm_net_code", reCodeNets($"norm_net_code_a")) > pre_piv_df_b.registerTempTable("pre_piv_df") > val piv_qhID_df = pre_piv_df_b.groupBy("Network", "Audience", "StartDate", > "rating_category_cd") > .pivot("qaID").agg("rating" -> "mean") > The pivot creates a lot of columns (96) with names that are like > ‘01’,’02’,…,’96’ as a result of pivoting a table that has quarter hour IDs. > In the below SQL the highlighted section causes problems. If I rename the > columns to ‘col01’,’col02’,…,’col96’ I can run the SQL correctly and get the > expected results. > select * from piv_qhID where 82 is NULL limit 20 > And I am getting no rows even though there are nulls. > On the other hand the query: > select * from piv_qhID where 82 is NOT NULL limit 20 > Returns all rows (even those with nulls) > Renaming the columns fixes this, but it would be nice if the columns were > referenced correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org