[jira] [Comment Edited] (SPARK-15522) DataFrame Column Names That are Numbers aren't referenced correctly in SQL

2017-03-06 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897638#comment-15897638
 ] 

Hyukjin Kwon edited comment on SPARK-15522 at 3/6/17 11:26 PM:
---

We can use backticks for it as below:

{code}
scala> Seq(Some(1), None).toDF("82").createOrReplaceTempView("piv_qhID")

scala> spark.sql("select * from piv_qhID where '82' is NULL limit 20").show()
+---+
| 82|
+---+
+---+


scala> spark.sql("select * from piv_qhID where `82` is NULL limit 20").show()
++
|  82|
++
|null|
++
{code}

It seems {{'82'}} or {{"82"}} was being created as a constant. I am resolving 
this JIRA. Please reopen this if I misunderstood. 


was (Author: hyukjin.kwon):
We can use backticks for it as below:

{code}
scala> spark.range(10).toDF("82").createOrReplaceTempView("piv_qhID")

scala> spark.sql("select * from piv_qhID where `82` is NULL limit 20").show()
+---+
| 82|
+---+
+---+
{code}

It seems {{'82'}} or {{"82"}} was being created as a constant. I am resolving 
this JIRA. Please reopen this if I misunderstood. 

> DataFrame Column Names That are Numbers aren't referenced correctly in SQL
> --
>
> Key: SPARK-15522
> URL: https://issues.apache.org/jira/browse/SPARK-15522
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Jason Pohl
>
> The following code is run:
> val pre_piv_df_a = sqlContext.sql("""
> SELECT
> CASE WHEN Gender = 'M' Then 1 ELSE 0 END AS has_male,
> CASE WHEN Gender = 'F' Then 1 ELSE 0 END AS has_female,
> CAST(StartAge AS Double) AS StartAge_dbl,
> CAST(EndAge AS Double) AS EndAge_dbl,
> *
> FROM alldem_union_curr
> """)
> .withColumn("JavaStartTimestamp", create_ts($"StartTimestamp"))
> .drop("StartTimestamp").withColumnRenamed("JavaStartTimestamp", 
> "StartTimestamp")
> .drop("StartAge").drop("EndAge")
> .withColumnRenamed("StartAge_dbl", 
> "StartAge").withColumnRenamed("EndAge_dbl", "EndAge")
> val pre_piv_df_b = pre_piv_df_a
> .withColumn("media_month_cc", media_month_cc($"MediaMonth"))
> .withColumn("media_week_cc", media_week_sts_cc($"StartTimestamp"))
> .withColumn("media_day_cc", media_day_sts_cc($"StartTimestamp"))
> .withColumn("week_day", week_day($"StartTimestamp"))
> .withColumn("week_end", week_end($"StartTimestamp"))
> .join(sqlContext.table("cad_nets"), $"Network" === $"nielsen_network", 
> "inner")
> .withColumnRenamed("cad_network", "norm_net_code_a")
> .withColumn("norm_net_code", reCodeNets($"norm_net_code_a"))
> pre_piv_df_b.registerTempTable("pre_piv_df")
> val piv_qhID_df = pre_piv_df_b.groupBy("Network", "Audience", "StartDate", 
> "rating_category_cd")
> .pivot("qaID").agg("rating" -> "mean")
> The pivot creates a lot of columns (96) with names that are like 
> ‘01’,’02’,…,’96’ as a result of pivoting a table that has quarter hour IDs.
> In the below SQL the highlighted section causes problems. If I rename the 
> columns to ‘col01’,’col02’,…,’col96’ I can run the SQL correctly and get the 
> expected results.
> select * from piv_qhID where 82 is NULL limit 20
> And I am getting no rows even though there are nulls.
> On the other hand the query:
> select * from piv_qhID where 82 is NOT NULL limit 20
> Returns all rows (even those with nulls)
> Renaming the columns fixes this, but it would be nice if the columns were 
> referenced correctly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15522) DataFrame Column Names That are Numbers aren't referenced correctly in SQL

2016-05-25 Thread Huaxin Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300346#comment-15300346
 ] 

Huaxin Gao edited comment on SPARK-15522 at 5/25/16 4:25 PM:
-

I think it's not legal to have solely digit as the column name unless it's 
quoted. 
Could you please try select * from piv_qhID where "82" is NULL limit 20 or  
select * from piv_qhID where '82' is NULL limit 20? 
I think select * from piv_qhID where 82 is NOT NULL limit 20 will always return 
all the rows because 82 is NOT NULL is always true.  I believe 82 is treated as 
a number, not a column name. 


was (Author: huaxing):
I think it's not legal to have solely digit as the column name unless it's 
quoted. 
Could you please try **select * from piv_qhID where "82" is NULL limit 20** or 
** select * from piv_qhID where '82' is NULL limit 20**? 
I think **select * from piv_qhID where 82 is NOT NULL limit 20** will always 
return all the rows because **82 is NOT NULL** is always true.  I believe 82 is 
treated as a number, not a column name. 

> DataFrame Column Names That are Numbers aren't referenced correctly in SQL
> --
>
> Key: SPARK-15522
> URL: https://issues.apache.org/jira/browse/SPARK-15522
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Jason Pohl
>
> The following code is run:
> val pre_piv_df_a = sqlContext.sql("""
> SELECT
> CASE WHEN Gender = 'M' Then 1 ELSE 0 END AS has_male,
> CASE WHEN Gender = 'F' Then 1 ELSE 0 END AS has_female,
> CAST(StartAge AS Double) AS StartAge_dbl,
> CAST(EndAge AS Double) AS EndAge_dbl,
> *
> FROM alldem_union_curr
> """)
> .withColumn("JavaStartTimestamp", create_ts($"StartTimestamp"))
> .drop("StartTimestamp").withColumnRenamed("JavaStartTimestamp", 
> "StartTimestamp")
> .drop("StartAge").drop("EndAge")
> .withColumnRenamed("StartAge_dbl", 
> "StartAge").withColumnRenamed("EndAge_dbl", "EndAge")
> val pre_piv_df_b = pre_piv_df_a
> .withColumn("media_month_cc", media_month_cc($"MediaMonth"))
> .withColumn("media_week_cc", media_week_sts_cc($"StartTimestamp"))
> .withColumn("media_day_cc", media_day_sts_cc($"StartTimestamp"))
> .withColumn("week_day", week_day($"StartTimestamp"))
> .withColumn("week_end", week_end($"StartTimestamp"))
> .join(sqlContext.table("cad_nets"), $"Network" === $"nielsen_network", 
> "inner")
> .withColumnRenamed("cad_network", "norm_net_code_a")
> .withColumn("norm_net_code", reCodeNets($"norm_net_code_a"))
> pre_piv_df_b.registerTempTable("pre_piv_df")
> val piv_qhID_df = pre_piv_df_b.groupBy("Network", "Audience", "StartDate", 
> "rating_category_cd")
> .pivot("qaID").agg("rating" -> "mean")
> The pivot creates a lot of columns (96) with names that are like 
> ‘01’,’02’,…,’96’ as a result of pivoting a table that has quarter hour IDs.
> In the below SQL the highlighted section causes problems. If I rename the 
> columns to ‘col01’,’col02’,…,’col96’ I can run the SQL correctly and get the 
> expected results.
> select * from piv_qhID where 82 is NULL limit 20
> And I am getting no rows even though there are nulls.
> On the other hand the query:
> select * from piv_qhID where 82 is NOT NULL limit 20
> Returns all rows (even those with nulls)
> Renaming the columns fixes this, but it would be nice if the columns were 
> referenced correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org