Consider the following code:

    import org.apache.spark.sql.SaveMode
    import sqlContext.implicits._

    // registerTempTable returns Unit, so keep the DataFrame in its own val
    val df = Seq((1, 3), (2, 3)).toDF("key", "value")
    df.registerTempTable("tbl")

    sqlContext.sql("select key, null as value from tbl")
      .write.format("json").mode(SaveMode.Overwrite).save("test")

    sqlContext.read.format("json").load("test").printSchema()

It shows:

    root
     |-- key: long (nullable = true)

The field `value` is removed from the schema when saving the DF to a JSON file, since it is null for all rows. Saving to a Parquet file gives the same result: the null fields are missing!

It seems that this is the default behavior for DataFrames, but I would like to keep the null fields for schema consistency. Are there any options or configs for this purpose?

Thanks.

--
Hao Ren
Data Engineer @ leboncoin
Paris, France