planga82 commented on a change in pull request #33747:
URL: https://github.com/apache/spark/pull/33747#discussion_r690790953



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
##########
@@ -197,12 +197,12 @@ class JacksonParser(
 
         case VALUE_STRING if parser.getTextLength >= 1 =>
           // Special case handling for NaN and Infinity.
-          parser.getText match {

Review comment:
       This is a good point, I have done research about it and it seems that it 
is not a consensus. In Json specification this special values are not permitted 
(https://www.ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf
 pag. 12). 
   
   From this point different libraries follow different criteria. For example
   * **Jackson**, if you use ALLOW_NON_NUMERIC_NUMBERS 
(https://fasterxml.github.io/jackson-core/javadoc/2.8/com/fasterxml/jackson/core/JsonParser.Feature.html?is-external=true)
 uses `'INF', '-INF', 'NaN'`.  That is the XML standard for this 
(https://www.w3.org/TR/xmlschema-2/)
   * **Python 3**  uses the following `'-Infinity', 'Infinity', 'NaN'`  
(https://docs.python.org/3/library/json.html)
   * **Spray-json**: It seems that not support it
   * **Gson**, using serializeSpecialFloatingPointValues 
(https://www.javadoc.io/doc/com.google.code.gson/gson/2.8.0/com/google/gson/GsonBuilder.html#serializeSpecialFloatingPointValues--)
 , uses the **JavaScript specification** `'-Infinity', 'Infinity', 'NaN'` 
(https://www.ecma-international.org/wp-content/uploads/ECMA-262_12th_edition_june_2021.pdf
 pag 119, pag. 63)
   
   From my point of view, since there is no standard, probably using the same 
options as cast could make the users' lives easier, but it’s true that, for 
example, all lowercase letters  is not an option in any library, so it’s a 
difficult decision. Now we are using the same option as Python and Javascript 
specification.
   

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
##########
@@ -197,12 +197,12 @@ class JacksonParser(
 
         case VALUE_STRING if parser.getTextLength >= 1 =>
           // Special case handling for NaN and Infinity.
-          parser.getText match {

Review comment:
       This is a good point, I have done research about it and it seems that it 
is not a consensus. In Json specification this special values are not permitted 
(https://www.ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf
 pag. 12). 
   
   From this point different libraries follow different criteria. For example
   * **Jackson**, if you use ALLOW_NON_NUMERIC_NUMBERS 
(https://fasterxml.github.io/jackson-core/javadoc/2.8/com/fasterxml/jackson/core/JsonParser.Feature.html?is-external=true)
 uses `'INF', '-INF', 'NaN'`.  That is the **XML standard** for this 
(https://www.w3.org/TR/xmlschema-2/)
   * **Python 3**  uses the following `'-Infinity', 'Infinity', 'NaN'`  
(https://docs.python.org/3/library/json.html)
   * **Spray-json**: It seems that not support it
   * **Gson**, using serializeSpecialFloatingPointValues 
(https://www.javadoc.io/doc/com.google.code.gson/gson/2.8.0/com/google/gson/GsonBuilder.html#serializeSpecialFloatingPointValues--)
 , uses the **JavaScript specification** `'-Infinity', 'Infinity', 'NaN'` 
(https://www.ecma-international.org/wp-content/uploads/ECMA-262_12th_edition_june_2021.pdf
 pag 119, pag. 63)
   
   From my point of view, since there is no standard, probably using the same 
options as cast could make the users' lives easier, but it’s true that, for 
example, all lowercase letters  is not an option in any library, so it’s a 
difficult decision. Now we are using the same option as Python and Javascript 
specification.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to