Hi Michael,

Since we're at it, could you please point me to the code where the
optimization happens? I assume you mean Catalyst, when it generates
whole-stage code for queries. Is this perhaps nullability (NULL value)
propagation? I'd appreciate a pointer, as it would improve my
understanding of the low-level bits quite substantially. Thanks!

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Aug 5, 2016 at 1:24 AM, Michael Armbrust <mich...@databricks.com> wrote:
> Nullable is an optimization for Spark SQL. Marking a field as not nullable
> tells Spark to skip the null check (an if) when accessing that field.
>
> In this case, your data is nullable, because Timestamp is an object in Java
> and you could put null there.
>
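
For the original question below, one possible approach is to copy the
inferred schema, mark the column as not nullable, and re-create the
DataFrame with that schema. This is only a minimal sketch, assuming Spark 2.0
with a local SparkSession; the names NonNullableTimestamp, strictSchema and
strictDf are made up for illustration. Note that Spark does not check the
data against the flag here, so it is only safe if "t" really never holds null.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

case class MyProduct(t: Timestamp, a: Float)

// Hypothetical entry point, used only to make the sketch self-contained.
object NonNullableTimestamp {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nullable-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(MyProduct(new Timestamp(0), 10f)).toDF()

    // Copy the inferred schema, flipping nullable only for column "t".
    val strictSchema = StructType(df.schema.map { field =>
      if (field.name == "t") field.copy(nullable = false) else field
    })

    // Re-create the DataFrame from the same rows with the adjusted schema.
    // The rows are not validated: a null in "t" would now be a bug.
    val strictDf = spark.createDataFrame(df.rdd, strictSchema)
    strictDf.printSchema()

    spark.stop()
  }
}
```

With this schema, printSchema() should report nullable = false for "t".
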
> On Thu, Aug 4, 2016 at 2:56 PM, luismattor <luismat...@gmail.com> wrote:
>>
>> Hi all,
>>
>> Consider the following case:
>>
>> import java.sql.Timestamp
>> case class MyProduct(t: Timestamp, a: Float)
>> val df = sc.parallelize(List(MyProduct(new Timestamp(0), 10))).toDF()
>> df.printSchema()
>>
>> The output is:
>> root
>>  |-- t: timestamp (nullable = true)
>>  |-- a: float (nullable = false)
>>
>> How can I set the timestamp column to be NOT nullable?
>>
>> Regards,
>> Luis
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-set-nullable-field-when-create-DataFrame-using-case-class-tp27479.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>
