chenhao-db commented on code in PR #47920: URL: https://github.com/apache/spark/pull/47920#discussion_r1736844899
##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/variant/VariantExpressionEvalUtilsSuite.scala:
##########

@@ -89,6 +89,12 @@ class VariantExpressionEvalUtilsSuite extends SparkFunSuite {
       /* offset list */ 0, 2, 4, 6,
       /* field data */ primitiveHeader(INT1), 1, primitiveHeader(INT1), 2, shortStrHeader(1), '3'),
     Array(VERSION, 3, 0, 1, 2, 3, 'a', 'b', 'c'))
+    check("""{"a": 1, "b": 2, "c": "3", "a": 4}""", Array(objectHeader(false, 1, 1),

Review Comment:
   I agree that a JSON object is invalid if it contains duplicate keys. However, our implementation is not required to throw an error for this invalid input. As the JSON RFC (RFC 8259) states:

   > Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.

   It seems fair to follow the "many implementations" and report the last pair only. As a side note, `from_json` also takes the last-wins policy rather than throwing an error, and it is not even configurable (you cannot make it throw an error):

   ```
   spark-sql (default)> select from_json('{"a": 1, "a": 2, "a": 3}', 'a int');
   {"a":3}
   Time taken: 1.164 seconds, Fetched 1 row(s)
   ```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
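The "last name/value pair wins" policy discussed in the comment can be sketched in plain Scala. This is a hypothetical illustration of the semantics only (the object name `LastWinsDemo` and helper `dedupLastWins` are invented for this sketch), not the actual Variant builder or `from_json` implementation:

```scala
// Hypothetical sketch of the "last name/value pair wins" policy for
// duplicate JSON keys; NOT the actual Spark Variant builder code.
object LastWinsDemo {
  // Fold left over the parsed key/value pairs so that a later occurrence
  // of a key overwrites an earlier one, mirroring how from_json resolves
  // '{"a": 1, "a": 2, "a": 3}' to a = 3.
  def dedupLastWins(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) {
      case (acc, (k, v)) => acc + (k -> v)
    }

  def main(args: Array[String]): Unit = {
    // Same duplicate-key shape as the test input in the diff above.
    val parsed = Seq("a" -> 1, "b" -> 2, "c" -> 3, "a" -> 4)
    println(dedupLastWins(parsed)) // "a" resolves to 4, the last value
  }
}
```

Under this policy the duplicate key simply collapses to its final value, which is why the test input with two `"a"` entries can still produce a single well-formed Variant object.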