[ https://issues.apache.org/jira/browse/SPARK-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-15204.
---------------------------------
    Resolution: Fixed
    Target Version/s: 2.1.0

> Improve nullability inference for Aggregator
> --------------------------------------------
>
>                 Key: SPARK-15204
>                 URL: https://issues.apache.org/jira/browse/SPARK-15204
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>        Environment: spark-2.0.0-SNAPSHOT
>           Reporter: koert kuipers
>           Assignee: Koert Kuipers
>           Priority: Minor
>
> {noformat}
> object SimpleSum extends Aggregator[Row, Int, Int] {
>   def zero: Int = 0
>   def reduce(b: Int, a: Row) = b + a.getInt(1)
>   def merge(b1: Int, b2: Int) = b1 + b2
>   def finish(b: Int) = b
>   def bufferEncoder: Encoder[Int] = Encoders.scalaInt
>   def outputEncoder: Encoder[Int] = Encoders.scalaInt
> }
>
> val df = List(("a", 1), ("a", 2), ("a", 3)).toDF("k", "v")
> val df1 = df.groupBy("k").agg(SimpleSum.toColumn as "v1")
> df1.printSchema
> df1.show
>
> root
>  |-- k: string (nullable = true)
>  |-- v1: integer (nullable = true)
>
> +---+---+
> |  k| v1|
> +---+---+
> |  a|  6|
> +---+---+
> {noformat}
>
> Notice how v1 has nullable set to true. The default (and expected) behavior
> for Spark SQL is to give an Int column nullable = false. For example, if I
> had used a built-in aggregator like "sum" instead, it would have reported
> nullable = false.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
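The fix the issue asks for is to infer nullability from the output encoder's type rather than defaulting to nullable. As a rough plain-Scala sketch of that rule (no Spark dependency; `NullabilityInference` and `Field` are hypothetical stand-ins, not Spark APIs): a primitive JVM type such as Int can never hold null, so its column should be non-nullable, while reference types like String stay nullable.

```scala
// Hypothetical sketch of the requested inference rule, in plain Scala.
// Field is a simplified stand-in for a Spark StructField.
object NullabilityInference {
  case class Field(name: String, dataType: String, nullable: Boolean)

  // Primitive JVM types cannot hold null, so their columns are non-nullable.
  private val primitives =
    Set("Int", "Long", "Double", "Float", "Short", "Byte", "Boolean")

  // Infer nullability from the Scala type name of the encoder's output.
  def inferField(name: String, scalaType: String): Field =
    Field(name, scalaType, nullable = !primitives.contains(scalaType))
}
```

Under this rule, the `v1` column produced by `SimpleSum` above (output type Int) would come out as `nullable = false`, matching the expected behavior, while the String key column `k` would remain `nullable = true`.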