[ https://issues.apache.org/jira/browse/SPARK-11949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Veli Kerim Celik updated SPARK-11949:
-------------------------------------
    Description: 
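{code:title=Reproduce bug|borderStyle=solid}
// Run in spark-shell (Spark 1.5.1), where sc and the sqlContext implicits
// needed for toDF() are already in scope.
case class Fact(date: Int, hour: Int, minute: Int, room_name: String, temp: Double)

val df0 = sc.parallelize(Seq(
  Fact(20151123, 18, 35, "room1", 18.6),
  Fact(20151123, 18, 35, "room2", 22.4),
  Fact(20151123, 18, 36, "room1", 17.4),
  Fact(20151123, 18, 36, "room2", 25.6)
)).toDF()

// Cube over the four dimensions, averaging temp per grouping.
val cube0 = df0.cube("date", "hour", "minute", "room_name").agg(Map("temp" -> "avg"))

// Expected: the rows where 'date' has been rolled up to null. Actual: no rows.
cube0.where("date IS NULL").show()
{code}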

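The query result is empty. It should not be, because cube0 contains null
several times in column 'date' (every row in which 'date' is rolled up). The
issue arises because the cube function reuses the schema information from df0,
where 'date' is non-nullable because the case-class field is a primitive Int;
presumably the optimizer then treats "date IS NULL" as always false. If I change
the types of the case-class parameters to Option[T], the query gives correct
results.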
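To see why "date IS NULL" ought to match rows, here is a plain-Scala model of what cube computes, as a sketch that needs no Spark: every subset of the four dimensions is a grouping set, and the dimensions left out of a grouping set are rolled up to null, modelled here as None. CubeModel and its members are illustrative names, not Spark APIs, and the model ignores Spark details such as output ordering and column naming.

```scala
// Plain-Scala model of CUBE semantics (illustrative only, not Spark code).
// Assumption: a rolled-up dimension is represented as None, i.e. SQL NULL.
object CubeModel {
  final case class Fact(date: Int, hour: Int, minute: Int, roomName: String, temp: Double)

  // The four cube dimensions, read out of a Fact as untyped values.
  val dimensions: Vector[Fact => Any] =
    Vector(_.date, _.hour, _.minute, _.roomName)

  // Every subset of dimension indices, encoded as a bitmask: 2^4 = 16 grouping sets.
  val groupingSets: Seq[Set[Int]] =
    (0 until (1 << dimensions.size)).map { mask =>
      dimensions.indices.filter(i => (mask & (1 << i)) != 0).toSet
    }

  // One output row per (grouping set, distinct key). Dimensions outside the
  // grouping set are None, the others are Some(value); the measure is avg(temp).
  def cube(facts: Seq[Fact]): Seq[(Vector[Option[Any]], Double)] =
    for {
      set <- groupingSets
      (key, group) <- facts.groupBy { f =>
        dimensions.indices.toVector.map(i => if (set(i)) Some(dimensions(i)(f)) else None)
      }.toSeq
    } yield (key, group.map(_.temp).sum / group.size)

  // Same data as the reproduction above.
  val sample: Seq[Fact] = Seq(
    Fact(20151123, 18, 35, "room1", 18.6),
    Fact(20151123, 18, 35, "room2", 22.4),
    Fact(20151123, 18, 36, "room1", 17.4),
    Fact(20151123, 18, 36, "room2", 25.6)
  )

  def main(args: Array[String]): Unit = {
    val rows = cube(sample)
    val dateIsNull = rows.filter { case (key, _) => key(0).isEmpty }
    println(s"${rows.size} cube rows, ${dateIsNull.size} with date rolled up")
  }
}
```

By this model the reproduction data yields 36 cube rows, 18 of which have the date dimension rolled up to None; those 18 are the rows one would expect the where("date IS NULL") query to return.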

Solution: The cube function should change the schema by setting the nullable
property to true for the columns (dimensions) passed as parameters to the
method call.
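The proposed rule can be sketched without Spark as a function over a simplified schema. Field is a hypothetical stand-in for Spark's StructField (which is a case class, so the real change would likewise be a copy with nullable = true), and cubeOutputSchema and isNullFoldsToFalse are illustrative names; temp is passed through only to show that non-dimension columns keep their nullability. This is a sketch of the idea, not Spark's implementation.

```scala
// Sketch of the proposed rule on a simplified schema (no Spark needed).
object ProposedCubeSchema {
  // Hypothetical stand-in for Spark's StructField; not the Spark API.
  final case class Field(name: String, dataType: String, nullable: Boolean)

  // Proposed rule: each cube dimension can be rolled up to NULL, so its output
  // field must be nullable; all other fields keep their declared nullability.
  def cubeOutputSchema(input: Seq[Field], dimensions: Set[String]): Seq[Field] =
    input.map(f => if (dimensions.contains(f.name)) f.copy(nullable = true) else f)

  // Assumed shortcut behind the bug: when a field is declared non-nullable,
  // "field IS NULL" may be rewritten to the constant false.
  def isNullFoldsToFalse(f: Field): Boolean = !f.nullable

  // Schema of df0 as inferred from the case class: primitive fields are
  // non-nullable, String is nullable.
  val df0Schema: Seq[Field] = Seq(
    Field("date", "int", nullable = false),
    Field("hour", "int", nullable = false),
    Field("minute", "int", nullable = false),
    Field("room_name", "string", nullable = true),
    Field("temp", "double", nullable = false)
  )

  def main(args: Array[String]): Unit = {
    val out = cubeOutputSchema(df0Schema, Set("date", "hour", "minute", "room_name"))
    out.foreach(f => println(s"${f.name}: nullable = ${f.nullable}"))
  }
}
```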

I am new to Scala and Spark and don't know how to implement this. Could
somebody please do it?

> Cube does not change the schema
> -------------------------------
>
>                 Key: SPARK-11949
>                 URL: https://issues.apache.org/jira/browse/SPARK-11949
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>            Reporter: Veli Kerim Celik
>              Labels: dataframe, sql
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
