Hi Michael,

Since we're at it, could you please point me to the code where the optimization happens? I assume you mean Catalyst, when it generates the code for queries (whole-stage code generation). Is this perhaps nullability (NULL value) propagation?

I'd appreciate a pointer; I'm hoping it would improve my understanding of the low-level bits quite substantially. Thanks!

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Fri, Aug 5, 2016 at 1:24 AM, Michael Armbrust <mich...@databricks.com> wrote:
> Nullable is an optimization for Spark SQL. It is telling spark to not even
> do an if check when accessing that field.
>
> In this case, your data is nullable, because timestamp is an object in java
> and you could put null there.
>
> On Thu, Aug 4, 2016 at 2:56 PM, luismattor <luismat...@gmail.com> wrote:
>>
>> Hi all,
>>
>> Consider the following case:
>>
>> import java.sql.Timestamp
>> case class MyProduct(t: Timestamp, a: Float)
>> val rdd = sc.parallelize(List(MyProduct(new Timestamp(0), 10))).toDF()
>> rdd.printSchema()
>>
>> The output is:
>> root
>>  |-- t: timestamp (nullable = true)
>>  |-- a: float (nullable = false)
>>
>> How can I set the timestamp column to be NOT nullable?
>>
>> Regards,
>> Luis
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-set-nullable-field-when-create-DataFrame-using-case-class-tp27479.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>
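
For reference, one way to get a non-nullable timestamp column for the case in Luis's question is to copy the inferred schema with the nullable flag flipped and re-create the DataFrame from the underlying RDD[Row]. What follows is only a minimal sketch, assuming Spark 2.x and a local SparkSession; the object name NonNullableSchema and the helper markNonNullable are hypothetical names made up for illustration, not part of Spark.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

// Case class from the original question; kept top-level so Spark can derive an encoder.
case class MyProduct(t: Timestamp, a: Float)

object NonNullableSchema {

  // Hypothetical helper: returns a copy of `schema` in which `column` is
  // marked non-nullable; all other fields are left untouched.
  def markNonNullable(schema: StructType, column: String): StructType =
    StructType(schema.map { field =>
      if (field.name == column) field.copy(nullable = false) else field
    })

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, just for the demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("non-nullable-sketch")
      .getOrCreate()
    import spark.implicits._

    // Inferred schema: `t` is nullable because Timestamp is a reference type.
    val df = Seq(MyProduct(new Timestamp(0), 10f)).toDF()
    df.printSchema()

    // Re-create the DataFrame over the same rows with the adjusted schema.
    val strictDf = spark.createDataFrame(df.rdd, markNonNullable(df.schema, "t"))
    strictDf.printSchema()

    spark.stop()
  }
}
```

With this, the second printSchema() should report t as nullable = false. Note that the flag is a promise to Spark rather than a constraint it enforces on this path: if a null does end up in a column declared non-nullable, the null check is skipped, as Michael describes, and the query may fail or return wrong results.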