[jira] [Commented] (SPARK-34441) from_json documentation is wrong about malformed JSONs output
    [ https://issues.apache.org/jira/browse/SPARK-34441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285625#comment-17285625 ]

Hyukjin Kwon commented on SPARK-34441:
--------------------------------------

You can work around it as below:

{code:scala}
scala> ds.withColumn("converted", from_json($"value", StructType(Array(StructField("a", StringType), StructField("_corrupt", StringType))), Map("columnNameOfCorruptRecord" -> "_corrupt"))).filter("converted._corrupt IS NULL").select($"value", $"converted".dropFields("_corrupt")).show()
+------------+------------------------+
|       value|update_fields(converted)|
+------------+------------------------+
|            |                    null|
|          {}|                  {null}|
|{"a": "bar"}|                   {bar}|
|   {"a": 42}|                    {42}|
+------------+------------------------+
{code}

But two cases:

{code}
...
|           {|                      []|
...
|       {"a"}|                      []|
...
{code}

look a bit odd. cc [~maxgekk] FYI.

> from_json documentation is wrong about malformed JSONs output
> -------------------------------------------------------------
>
>                 Key: SPARK-34441
>                 URL: https://issues.apache.org/jira/browse/SPARK-34441
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 3.0.0, 3.0.1
>            Reporter: Jean-Francis Roy
>            Priority: Minor
>
> The documentation of the `from_json` function states that malformed json will
> return a `null` value, which is not the case anymore after
> https://issues.apache.org/jira/browse/SPARK-25243.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34441) from_json documentation is wrong about malformed JSONs output
    [ https://issues.apache.org/jira/browse/SPARK-34441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285618#comment-17285618 ]

Jean-Francis Roy commented on SPARK-34441:
------------------------------------------

[~hyukjin.kwon] of course, here is an example:

{code:scala}
scala> case class Foo(a: String)
scala> val ds = List("", "{", "{}", """{"a"}""", """{"a": "bar"}""", """{"a": 42}""").toDS
scala> import org.apache.spark.sql.types._
scala> ds.withColumn("converted", from_json($"value", StructType(Array(StructField("a", StringType))))).show()
+------------+---------+
|       value|converted|
+------------+---------+
|            |     null|
|           {|       []|
|          {}|       []|
|       {"a"}|       []|
|{"a": "bar"}|    [bar]|
|   {"a": 42}|     [42]|
+------------+---------+
{code}

We see above that faulty JSON often results in a struct with `null` fields instead of a `null` value, which is a big change in behavior between Spark 2 and Spark 3. The documentation still describes Spark 2's behavior.

Moreover, I cannot reproduce Spark 2's behavior, and I do want faulty input to be converted to null. I can make the code throw using the `FAILFAST` mode:

{code:scala}
scala> ds.withColumn("converted", from_json($"value", StructType(Array(StructField("a", StringType))), Map("mode" -> "FAILFAST"))).show()
{code}

But I cannot use the `DROPMALFORMED` mode, as it is not supported:

{code:scala}
scala> ds.withColumn("converted", from_json($"value", StructType(Array(StructField("a", StringType))), Map("mode" -> "DROPMALFORMED"))).show()
java.lang.IllegalArgumentException: from_json() doesn't support the DROPMALFORMED mode. Acceptable modes are PERMISSIVE and FAILFAST.
{code}
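For readers who, like the comment above, want malformed input to become `null` while keeping every row, one possible approach is to combine the `columnNameOfCorruptRecord` option with `when`, so that the struct is kept only if the corrupt-record field is empty. This is a sketch under assumptions, not a confirmed fix from this issue: the object name `NullOnMalformedJson`, the helper `convert`, and the intermediate column `parsed` are hypothetical, `dropFields` assumes Spark 3.1 or later, and whether every malformed string (for example `{` or `{"a"}`) actually populates `_corrupt` has not been verified here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json, when}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper object, for illustration only.
object NullOnMalformedJson {
  // Target schema plus an extra field that receives the raw text of corrupt records.
  val schema: StructType = StructType(Array(
    StructField("a", StringType),
    StructField("_corrupt", StringType)))

  // Expects a string column named "value"; returns columns "value" and "converted".
  def convert(df: DataFrame): DataFrame = {
    val parsed = from_json(
      col("value"), schema, Map("columnNameOfCorruptRecord" -> "_corrupt"))
    df.withColumn("parsed", parsed)
      // `when` without `otherwise` yields null whenever the condition is false,
      // i.e. whenever the parser stored something in `_corrupt`.
      .withColumn("converted",
        when(col("parsed._corrupt").isNull, col("parsed").dropFields("_corrupt")))
      .drop("parsed")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("from_json-null-on-malformed")
      .getOrCreate()
    import spark.implicits._

    // Same inputs as in the comment above.
    val ds = List("", "{", "{}", """{"a"}""", """{"a": "bar"}""", """{"a": 42}""").toDS
    convert(ds.toDF("value")).show()
    spark.stop()
  }
}
```

Unlike a `filter` on the corrupt-record field, this keeps all input rows, which may matter when the parsed column has to stay aligned with other columns of the same DataFrame.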
[jira] [Commented] (SPARK-34441) from_json documentation is wrong about malformed JSONs output
    [ https://issues.apache.org/jira/browse/SPARK-34441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285602#comment-17285602 ]

Hyukjin Kwon commented on SPARK-34441:
--------------------------------------

[~jeanfrancisroy] can you share your code and input?
[jira] [Commented] (SPARK-34441) from_json documentation is wrong about malformed JSONs output
    [ https://issues.apache.org/jira/browse/SPARK-34441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284955#comment-17284955 ]

Jean-Francis Roy commented on SPARK-34441:
------------------------------------------

It even seems we cannot reproduce the previous behavior anymore:

{code}
from_json() doesn't support the DROPMALFORMED mode. Acceptable modes are PERMISSIVE and FAILFAST.
{code}