[jira] [Commented] (SPARK-15522) DataFrame Column Names That are Numbers aren't referenced correctly in SQL

Takeshi Yamamuro (JIRA) Wed, 25 May 2016 03:28:57 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-15522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299823#comment-15299823
 ]


Takeshi Yamamuro commented on SPARK-15522:
------------------------------------------

Could you make a simpler query to reproduce this for making others easily 
understood?

> DataFrame Column Names That are Numbers aren't referenced correctly in SQL
> --------------------------------------------------------------------------
>
>                 Key: SPARK-15522
>                 URL: https://issues.apache.org/jira/browse/SPARK-15522
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Jason Pohl
>
> The following code is run:
> val pre_piv_df_a = sqlContext.sql("""
> SELECT
> CASE WHEN Gender = 'M' Then 1 ELSE 0 END AS has_male,
> CASE WHEN Gender = 'F' Then 1 ELSE 0 END AS has_female,
> CAST(StartAge AS Double) AS StartAge_dbl,
> CAST(EndAge AS Double) AS EndAge_dbl,
> *
> FROM alldem_union_curr
> """)
> .withColumn("JavaStartTimestamp", create_ts($"StartTimestamp"))
> .drop("StartTimestamp").withColumnRenamed("JavaStartTimestamp", 
> "StartTimestamp")
> .drop("StartAge").drop("EndAge")
> .withColumnRenamed("StartAge_dbl", 
> "StartAge").withColumnRenamed("EndAge_dbl", "EndAge")
> val pre_piv_df_b = pre_piv_df_a
> .withColumn("media_month_cc", media_month_cc($"MediaMonth"))
> .withColumn("media_week_cc", media_week_sts_cc($"StartTimestamp"))
> .withColumn("media_day_cc", media_day_sts_cc($"StartTimestamp"))
> .withColumn("week_day", week_day($"StartTimestamp"))
> .withColumn("week_end", week_end($"StartTimestamp"))
> .join(sqlContext.table("cad_nets"), $"Network" === $"nielsen_network", 
> "inner")
> .withColumnRenamed("cad_network", "norm_net_code_a")
> .withColumn("norm_net_code", reCodeNets($"norm_net_code_a"))
> pre_piv_df_b.registerTempTable("pre_piv_df")
> val piv_qhID_df = pre_piv_df_b.groupBy("Network", "Audience", "StartDate", 
> "rating_category_cd")
> .pivot("qaID").agg("rating" -> "mean")
> The pivot creates a lot of columns (96) with names that are like 
> ‘01’,’02’,…,’96’ as a result of pivoting a table that has quarter hour IDs.
> In the below SQL the highlighted section causes problems. If I rename the 
> columns to ‘col01’,’col02’,…,’col96’ I can run the SQL correctly and get the 
> expected results.
> select * from piv_qhID where 82 is NULL limit 20
> And I am getting no rows even though there are nulls.
> On the other hand the query:
> select * from piv_qhID where 82 is NOT NULL limit 20
> Returns all rows (even those with nulls)
> Renaming the columns fixes this, but it would be nice if the columns were 
> referenced correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-15522) DataFrame Column Names That are Numbers aren't referenced correctly in SQL

Reply via email to