I am trying to get the result schema of aggregate functions using the
DataFrame API.

However, I find that the result fields for the groupBy columns are always
nullable, even when the source field is not nullable.

I would like to know whether this is by design. Thank you! Below is a
simple example that shows the issue.

======

  import sqlContext.implicits._
  import org.apache.spark.sql.functions._
  case class Test(key: String, value: Long)
  val df = sc.makeRDD(Seq(Test("k1",2),Test("k1",1))).toDF
  
  val result = df.groupBy("key").agg($"key", sum("value"))
  
  // The output shows that the "key" column is nullable. Why?
  result.printSchema
  // root
  //  |-- key: string (nullable = true)
  //  |-- SUM(value): long (nullable = true)
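
To compare the flags field by field instead of reading the printSchema
output, a minimal sketch along these lines may help. Note that `nullability`
is a hypothetical helper of my own, not a Spark API, and the hand-built
schema below is only an assumption that mirrors the output printed above.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Hypothetical helper (not part of Spark): pairs each field name
// with its nullable flag, in schema order.
def nullability(schema: StructType): Seq[(String, Boolean)] =
  schema.fields.toSeq.map(f => f.name -> f.nullable)

// Hand-built schema that mirrors the printSchema output shown above.
val printed = StructType(Seq(
  StructField("key", StringType, nullable = true),
  StructField("SUM(value)", LongType, nullable = true)))

// Print one line per field with its nullable flag.
nullability(printed).foreach { case (name, isNullable) =>
  println(s"$name nullable=$isNullable")
}
```

In the same spark-shell session, `nullability(df.schema)` and
`nullability(result.schema)` could then be compared side by side.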

