[jira] [Commented] (SPARK-34441) from_json documentation is wrong about malformed JSONs output

2021-02-16 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285625#comment-17285625
 ] 

Hyukjin Kwon commented on SPARK-34441:
--

You can work around it as below:


{code:scala}
scala> ds.withColumn("converted", from_json($"value", 
StructType(Array(StructField("a", StringType), StructField("_corrupt", 
StringType))), Map("columnNameOfCorruptRecord" -> 
"_corrupt"))).filter("converted._corrupt IS NULL").select($"value", 
$"converted".dropFields("_corrupt")).show()
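+------------+------------------------+
|       value|update_fields(converted)|
+------------+------------------------+
|            |                    null|
|          {}|                  {null}|
|{"a": "bar"}|                   {bar}|
|   {"a": 42}|                    {42}|
+------------+------------------------+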
{code}

But these two cases:

{code}
...
|           {|       []|
...
|       {"a"}|       []|
...
{code}

look a bit odd. cc [~maxgekk] FYI.
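
For readers who want malformed rows turned into `null` instead of being filtered out, the workaround above can be adapted roughly as in the following minimal sketch. It assumes Spark 3.1 or later (for `dropFields`) and a local session. The object name `NullOnMalformed`, the helper `nullOnMalformed`, and the intermediate column `parsed` are hypothetical names introduced for illustration; they are not part of Spark or of this issue.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, from_json, when}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object NullOnMalformed {
  // Schema with an extra column that receives the raw text of malformed records.
  val schema: StructType = StructType(Array(
    StructField("a", StringType),
    StructField("_corrupt", StringType)))

  // Hypothetical helper: keeps every row, but sets `converted` to null
  // whenever the corrupt-record column was populated.
  def nullOnMalformed(ds: Dataset[String]): DataFrame =
    ds.withColumn("parsed",
        from_json(col("value"), schema, Map("columnNameOfCorruptRecord" -> "_corrupt")))
      // `when` without `otherwise` yields null when the condition is false.
      .withColumn("converted",
        when(col("parsed._corrupt").isNull, col("parsed").dropFields("_corrupt")))
      .select("value", "converted")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-on-malformed")
      .getOrCreate()
    import spark.implicits._

    // Same inputs as in the reporter's example.
    val ds = List("", "{", "{}", """{"a"}""", """{"a": "bar"}""", """{"a": 42}""").toDS
    nullOnMalformed(ds).show()
    spark.stop()
  }
}
```

Under these assumptions the `{` and `{"a"}` rows should keep their place in the result with a null `converted` value rather than disappearing, while the other rows should match the output of the workaround.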

> from_json documentation is wrong about malformed JSONs output
> -
>
> Key: SPARK-34441
> URL: https://issues.apache.org/jira/browse/SPARK-34441
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Jean-Francis Roy
>Priority: Minor
>
> The documentation of the `from_json` function states that malformed json will 
> return a `null` value, which is not the case anymore after 
> https://issues.apache.org/jira/browse/SPARK-25243.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34441) from_json documentation is wrong about malformed JSONs output

2021-02-16 Thread Jean-Francis Roy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285618#comment-17285618
 ] 

Jean-Francis Roy commented on SPARK-34441:
--

[~hyukjin.kwon] Of course, here is an example:

{code:scala}
scala> case class Foo(a: String)
scala> val ds = List("", "{", "{}", """{"a"}""", """{"a": "bar"}""", """{"a": 42}""").toDS
scala> import org.apache.spark.sql.types._
scala> ds.withColumn("converted", from_json($"value", 
StructType(Array(StructField("a", StringType))))).show()
+------------+---------+
|       value|converted|
+------------+---------+
|            |     null|
|           {|       []|
|          {}|       []|
|       {"a"}|       []|
|{"a": "bar"}|    [bar]|
|   {"a": 42}|     [42]|
+------------+---------+
{code}
We see above that malformed JSON often results in a struct with `null` 
fields instead of a plain `null`, which is a big change in behavior between 
Spark 2 and Spark 3. The documentation still describes the Spark 2 behavior.

Moreover, I cannot reproduce Spark 2's behavior in Spark 3, and I do want 
malformed input to be converted to null.

I can make the code throw using the `FAILFAST` mode:

 
{code:scala}
scala> ds.withColumn("converted", from_json($"value", 
StructType(Array(StructField("a", StringType))), Map("mode" -> 
"FAILFAST"))).show()
{code}
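
To make the failure explicit, here is a minimal sketch that runs the same query in `FAILFAST` mode and captures the exception with `scala.util.Try`. The object name `FailFastDemo` and the helper `parseFailFast` are hypothetical names used only for illustration. The exact exception type and message are not shown in this thread, so the sketch only checks whether the action fails.

```scala
import scala.util.Try
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object FailFastDemo {
  val schema: StructType = StructType(Array(StructField("a", StringType)))

  // Hypothetical helper: parses the `value` column in FAILFAST mode.
  def parseFailFast(ds: Dataset[String]): DataFrame =
    ds.withColumn("converted",
      from_json(col("value"), schema, Map("mode" -> "FAILFAST")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("failfast-demo")
      .getOrCreate()
    import spark.implicits._

    val ds = List("""{"a": "bar"}""", "{").toDS
    // Parsing is lazy: the error only surfaces when an action such as collect() runs.
    val outcome = Try(parseFailFast(ds).collect())
    println(s"failed: ${outcome.isFailure}")
    spark.stop()
  }
}
```

This only detects malformed input; it does not by itself produce the per-row `null` that Spark 2 returned.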
 

 

But I cannot use the `DROPMALFORMED` mode as it is not supported:
{code:scala}
scala> ds.withColumn("converted", from_json($"value", 
StructType(Array(StructField("a", StringType))), Map("mode" -> 
"DROPMALFORMED"))).show()
java.lang.IllegalArgumentException: from_json() doesn't support the 
DROPMALFORMED mode. Acceptable modes are PERMISSIVE and FAILFAST.
{code}
 



[jira] [Commented] (SPARK-34441) from_json documentation is wrong about malformed JSONs output

2021-02-16 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285602#comment-17285602
 ] 

Hyukjin Kwon commented on SPARK-34441:
--

[~jeanfrancisroy] Can you share your code and input?



[jira] [Commented] (SPARK-34441) from_json documentation is wrong about malformed JSONs output

2021-02-15 Thread Jean-Francis Roy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284955#comment-17284955
 ] 

Jean-Francis Roy commented on SPARK-34441:
--

It even seems we cannot reproduce the previous behavior anymore:
{code:java}
from_json() doesn't support the DROPMALFORMED mode. Acceptable modes are 
PERMISSIVE and FAILFAST.{code}
