Suppose we have the following JSON, which we parse into a DataFrame (using the multiLine option).

    [{ "id": 8541, "value": "8541 changed again value" },
     { "id": 51109, "name": "newest bob", "value": "51109 changed again" }]

Regardless of whether we explicitly define a schema or allow it to be inferred, the result of df.show() after parsing this data is similar to the following:

    +-----+----------+--------------------+
    |   id|      name|               value|
    +-----+----------+--------------------+
    | 8541|      null|8541 changed agai...|
    |51109|newest bob| 51109 changed again|
    +-----+----------+--------------------+

Notice that the name column for the first row is null. This JSON will produce an identical DataFrame:

    [{ "id": 8541, "name": null, "value": "8541 changed again value" },
     { "id": 51109, "name": "newest bob", "value": "51109 changed again" }]

Is there a way to distinguish between these two cases in the DataFrame (i.e. the field is missing but added as null due to the inferred or explicit schema, versus the field is present but has a null value)?
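
To make the two cases concrete, here is a minimal sketch that uses only Python's standard json module on the raw text, not Spark. It does not show whether a DataFrame can preserve the distinction; it only shows that the information is still available before parsing into a DataFrame. The helper names field_state and name_states are hypothetical, introduced purely for illustration.

```python
import json

# The two payloads from the message above, as raw JSON text.
MISSING = (
    '[{ "id": 8541, "value": "8541 changed again value" },'
    '{ "id": 51109, "name": "newest bob", "value": "51109 changed again" }]'
)
EXPLICIT_NULL = (
    '[{ "id": 8541, "name": null, "value": "8541 changed again value" },'
    '{ "id": 51109, "name": "newest bob", "value": "51109 changed again" }]'
)


def field_state(record, field):
    """Classify one field of a parsed JSON object (hypothetical helper)."""
    if field not in record:
        return "missing"   # the key never appeared in the object
    if record[field] is None:
        return "null"      # the key appeared with an explicit JSON null
    return "present"


def name_states(payload):
    """Return the state of the "name" field for every record in the payload."""
    return [field_state(record, "name") for record in json.loads(payload)]


print(name_states(MISSING))
print(name_states(EXPLICIT_NULL))
```

One possible workaround under this assumption would be to record such a state per field before the data reaches the DataFrame, for example as an extra column; whether that is practical depends on the size and source of the data.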