chenhao-db commented on code in PR #47920: URL: https://github.com/apache/spark/pull/47920#discussion_r1736844899
##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/variant/VariantExpressionEvalUtilsSuite.scala:
##########

@@ -89,6 +89,12 @@ class VariantExpressionEvalUtilsSuite extends SparkFunSuite {
       /* offset list */ 0, 2, 4, 6,
       /* field data */ primitiveHeader(INT1), 1, primitiveHeader(INT1), 2, shortStrHeader(1), '3'),
     Array(VERSION, 3, 0, 1, 2, 3, 'a', 'b', 'c'))
+    check("""{"a": 1, "b": 2, "c": "3", "a": 4}""", Array(objectHeader(false, 1, 1),

Review Comment:
   I agree that a JSON object is invalid if it contains duplicate keys. However, our implementation is not required to throw an error for this invalid input. As the JSON RFC (RFC 8259) states:

   > Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.

   It seems fair to follow the "many implementations" and report the last pair only. As a side note, `from_json` also takes the last-wins policy rather than throwing an error, and it is not even configurable (you cannot make it throw an error):

   ```
   spark-sql (default)> select from_json('{"a": 1, "a": 2, "a": 3}', 'a int');
   {"a":3}
   Time taken: 1.164 seconds, Fetched 1 row(s)
   ```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
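The "last name/value pair wins" policy discussed in the comment can be sketched in plain Scala. This is a hypothetical illustration of the semantics only (the object name `LastWinsDemo` and helper `dedupLastWins` are invented for this sketch), not the actual Variant builder or `from_json` implementation:

```scala
// Hypothetical sketch of the "last name/value pair wins" policy for
// duplicate JSON keys; NOT the actual Spark Variant builder code.
object LastWinsDemo {
  // Fold left over the parsed key/value pairs so that a later occurrence
  // of a key overwrites an earlier one, mirroring how from_json resolves
  // '{"a": 1, "a": 2, "a": 3}' to a = 3.
  def dedupLastWins(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) {
      case (acc, (k, v)) => acc + (k -> v)
    }

  def main(args: Array[String]): Unit = {
    // Same duplicate-key shape as the test input in the diff above.
    val parsed = Seq("a" -> 1, "b" -> 2, "c" -> 3, "a" -> 4)
    println(dedupLastWins(parsed)) // "a" resolves to 4, the last value
  }
}
```

Under this policy the duplicate key simply collapses to its final value, which is why the test input with two `"a"` entries can still produce a single well-formed Variant object.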