Hello there, I am trying to understand how and when a DataFrame (or Dataset) sets nullable = true vs. false in its schema.
Here is my observation from some sample code I tried:

```scala
scala> spark.createDataset(Seq((1, "a", 2.0d), (2, "b", 2.0d), (3, "c", 2.0d))).toDF("col1", "col2", "col3").withColumn("col4", lit("bla")).printSchema()
root
 |-- col1: integer (nullable = false)
 |-- col2: string (nullable = true)
 |-- col3: double (nullable = false)
 |-- col4: string (nullable = false)

scala> spark.createDataset(Seq((1, "a", 2.0d), (2, "b", 2.0d), (3, "c", 2.0d))).toDF("col1", "col2", "col3").withColumn("col4", lit("bla")).write.parquet("/tmp/sample.parquet")

scala> spark.read.parquet("/tmp/sample.parquet").printSchema()
root
 |-- col1: integer (nullable = true)
 |-- col2: string (nullable = true)
 |-- col3: double (nullable = true)
 |-- col4: string (nullable = true)
```

Where this seems to get me into trouble is when I try to union two such data structures: one created in memory (where, as shown above, some fields come out with nullable = false) and one read from a file, which starts out with a schema like this:

```text
 |-- some_histogram: struct (nullable = true)
 |    |-- values: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
 |    |-- freq: array (nullable = true)
 |    |    |-- element: long (containsNull = true)
```

Is there a way to convert this attribute from true to false without running any mapping / UDF on that column?

Please advise,
Muthu
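For context, the closest workaround I have found so far is to rebuild the DataFrame against an edited copy of its schema via spark.createDataFrame. This is only a sketch: it round-trips through the RDD API (which is exactly the overhead I am hoping to avoid), and flipping nullable is just a metadata assertion, so Spark does not re-validate the underlying data:

```scala
import org.apache.spark.sql.types.StructType

// Read back the Parquet data, whose schema comes in with nullable = true.
val df = spark.read.parquet("/tmp/sample.parquet")

// Copy the schema with every top-level field marked non-nullable.
// Nested fields (e.g. the struct/array columns above) would need the
// same edit applied recursively.
val strictSchema = StructType(df.schema.map(_.copy(nullable = false)))

// Rebuild the DataFrame against the edited schema.
val strictDf = spark.createDataFrame(df.rdd, strictSchema)

strictDf.printSchema()  // top-level columns now show nullable = false
```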