[jira] [Commented] (SPARK-11949) Query on DataFrame from cube gives wrong results

Veli Kerim Celik (JIRA) Wed, 25 Nov 2015 03:12:07 -0800

    [ https://issues.apache.org/jira/browse/SPARK-11949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15026618#comment-15026618 ]


Veli Kerim Celik commented on SPARK-11949:
------------------------------------------

Yes, it works on room_name because that column has nullable = true. I don't 
know where groupByExprs is located in the code. My suggestion: in the method 
'def cube(col1: String, cols: String*): GroupedData', after col1 (String) and 
cols (String*) have been resolved to the corresponding columns, set 
nullable = true on those columns. 

In the method 'def cube(cols: Column*): GroupedData', set nullable = true on 
all columns in cols.
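
Until cube does this itself, the same idea can be approximated from user code. The following is only a sketch under assumptions, not the proposed change inside cube: relaxNullability and withNullableColumns are hypothetical helper names, and whether relaxing the input schema avoids the wrong result on 1.5.1 is inferred from the room_name observation above rather than verified here.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.{StructField, StructType}

// Hypothetical helper (not part of Spark): returns a copy of `schema` in
// which the fields listed in `names` are marked nullable = true.
def relaxNullability(schema: StructType, names: Set[String]): StructType =
  StructType(schema.map { field: StructField =>
    if (names.contains(field.name)) field.copy(nullable = true) else field
  })

// Hypothetical helper (not part of Spark): rebuilds `df` from the same rows,
// but with the relaxed schema, so the listed columns are declared nullable.
def withNullableColumns(sqlContext: SQLContext, df: DataFrame,
                        names: Set[String]): DataFrame =
  sqlContext.createDataFrame(df.rdd, relaxNullability(df.schema, names))
```

With df0 from the reproduction in the issue, the dimensions could be relaxed before cubing, for example: val dims = Seq("date", "hour", "minute", "room_name"), then withNullableColumns(sqlContext, df0, dims.toSet).cube(dims.head, dims.tail: _*).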

> Query on DataFrame from cube gives wrong results
> ------------------------------------------------
>
>                 Key: SPARK-11949
>                 URL: https://issues.apache.org/jira/browse/SPARK-11949
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>            Reporter: Veli Kerim Celik
>              Labels: dataframe, sql
>
> {code:title=Reproduce bug|borderStyle=solid}
> case class fact(date: Int, hour: Int, minute: Int, room_name: String, temp: Double)
> val df0 = sc.parallelize(Seq(
>   fact(20151123, 18, 35, "room1", 18.6),
>   fact(20151123, 18, 35, "room2", 22.4),
>   fact(20151123, 18, 36, "room1", 17.4),
>   fact(20151123, 18, 36, "room2", 25.6)
> )).toDF()
> val cube0 = df0.cube("date", "hour", "minute", "room_name").agg(Map("temp" -> "avg"))
> cube0.where("date IS NULL").show()
> {code}
> The query result is empty. It should not be, because cube0 contains null in 
> column 'date' several times. The issue arises because the cube function 
> reuses the schema information from df0, where 'date' (an Int in the case 
> class) is not nullable. If I change the types of the parameters in the case 
> class to Option[T], the query gives correct results.
> Solution: The cube function should change the schema by setting the nullable 
> property to true for the columns (dimensions) specified in the method call 
> parameters.
> I am new to Scala and Spark and don't know how to implement this. Could 
> somebody please do it?
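
The Option[T] workaround mentioned in the description could look roughly like the sketch below. It assumes spark-shell on Spark 1.5.x, where sc is predefined and the toDF() implicits are already in scope. FactOpt, rows, df1 and cube1 are names introduced here for illustration only.

```scala
// Hypothetical variant of the `fact` case class from the reproduction: the
// Int dimensions are wrapped in Option so that schema inference marks them
// nullable = true. room_name is a String and is already nullable.
case class FactOpt(date: Option[Int], hour: Option[Int], minute: Option[Int],
                   room_name: String, temp: Double)

val rows = Seq(
  FactOpt(Some(20151123), Some(18), Some(35), "room1", 18.6),
  FactOpt(Some(20151123), Some(18), Some(35), "room2", 22.4),
  FactOpt(Some(20151123), Some(18), Some(36), "room1", 17.4),
  FactOpt(Some(20151123), Some(18), Some(36), "room2", 25.6)
)

// Assumes `sc` and the toDF() implicits provided by spark-shell.
val df1 = sc.parallelize(rows).toDF()
df1.printSchema()  // shows the nullable flag inferred for each column

val cube1 = df1.cube("date", "hour", "minute", "room_name").agg(Map("temp" -> "avg"))
cube1.where("date IS NULL").show()
```

The cost of this approach is that every consumer of the case class has to deal with Option values, even though the source data never contains nulls.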



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
