[ https://issues.apache.org/jira/browse/SPARK-11949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15026508#comment-15026508 ]
Hyukjin Kwon commented on SPARK-11949:
--------------------------------------

Oops, I did the same test a little while ago. Here it is:

{code}
case class Fact(date: Option[Int], hour: Option[Int], minute: Option[Int], room_name: String, temp: Double)

val rdd = sc.parallelize(Seq(
  Fact(Some(20151123), Some(18), Some(35), "room1", 18.6),
  Fact(Some(20151123), Some(18), Some(35), "room2", 22.4),
  Fact(Some(20151123), Some(18), Some(36), "room1", 17.4),
  Fact(Some(20151123), Some(18), Some(36), "room2", 25.6)
))
val df0 = sqlContext.createDataFrame(rdd)
val cube0 = df0.cube("date", "hour", "minute", "room_name").agg(Map("temp" -> "avg")).toDF()
cube0.where("date IS NULL").show()
{code}

> Query on DataFrame from cube gives wrong results
> ------------------------------------------------
>
>                 Key: SPARK-11949
>                 URL: https://issues.apache.org/jira/browse/SPARK-11949
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>            Reporter: Veli Kerim Celik
>              Labels: dataframe, sql
>
> {code:title=Reproduce bug|borderStyle=solid}
> case class fact(date: Int, hour: Int, minute: Int, room_name: String, temp: Double)
>
> val df0 = sc.parallelize(Seq(
>   fact(20151123, 18, 35, "room1", 18.6),
>   fact(20151123, 18, 35, "room2", 22.4),
>   fact(20151123, 18, 36, "room1", 17.4),
>   fact(20151123, 18, 36, "room2", 25.6)
> )).toDF()
>
> val cube0 = df0.cube("date", "hour", "minute", "room_name").agg(Map("temp" -> "avg"))
> cube0.where("date IS NULL").show()
> {code}
> The query result is empty. It should not be, because cube0 contains the value null several times in the 'date' column. The issue arises because the cube function reuses the schema information from df0. If I change the type of the parameters in the case class to Option[T], the query gives correct results.
> Solution: the cube function should change the schema by setting the nullable property to true for the columns (dimensions) specified in the method call parameters.
> I am new to Scala and Spark.
> I don't know how to implement this; somebody please do.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
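Until the cube function is fixed, one possible user-side workaround (besides switching the case class fields to Option[T], as in the comment above) is to rebuild the DataFrame with a relaxed schema before calling cube. The sketch below assumes Spark 1.5 with `sc` and `sqlContext` in scope and `df0` defined as in the report; `withNullable` is a hypothetical helper, not part of the Spark API:

{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Hypothetical helper: copy the DataFrame's schema with nullable = true
// for the given columns, so the schema reused by cube() permits the
// nulls that cube introduces for the grouping dimensions.
def withNullable(df: DataFrame, cols: Set[String]): DataFrame = {
  val relaxed = StructType(df.schema.map { f =>
    if (cols.contains(f.name)) f.copy(nullable = true) else f
  })
  df.sqlContext.createDataFrame(df.rdd, relaxed)
}

val dims = Set("date", "hour", "minute", "room_name")
val cube1 = withNullable(df0, dims)
  .cube("date", "hour", "minute", "room_name")
  .agg(Map("temp" -> "avg"))

// With the relaxed schema, the rollup rows should no longer be filtered out.
cube1.where("date IS NULL").show()
{code}

This works around the symptom only; the proposed fix of having cube itself mark the grouping columns as nullable would make it unnecessary.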