sadikovi commented on code in PR #46408:
URL: https://github.com/apache/spark/pull/46408#discussion_r1594580639


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala:
##########

@@ -280,13 +280,32 @@ class JacksonParser(
         case VALUE_STRING => UTF8String.fromString(parser.getText)

-      case _ =>
+      case other =>
         // Note that it always tries to convert the data as string without the case of failure.
-        val writer = new ByteArrayOutputStream()
-        Utils.tryWithResource(factory.createGenerator(writer, JsonEncoding.UTF8)) {
-          generator => generator.copyCurrentStructure(parser)
+        val startLocation = parser.getTokenLocation
+        startLocation.contentReference().getRawContent match {

Review Comment:
   Is there an existing API to get the remaining content as a string? Also, would it work with multi-line JSON?
########## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala: ########## @@ -280,13 +280,32 @@ class JacksonParser( case VALUE_STRING => UTF8String.fromString(parser.getText) - case _ => + case other => // Note that it always tries to convert the data as string without the case of failure. - val writer = new ByteArrayOutputStream() - Utils.tryWithResource(factory.createGenerator(writer, JsonEncoding.UTF8)) { - generator => generator.copyCurrentStructure(parser) + val startLocation = parser.getTokenLocation + startLocation.contentReference().getRawContent match { Review Comment: Is there an existing API to get the remaining content as string? Also, would it work with multi-line JSON? ########## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala: ########## @@ -3865,6 +3865,24 @@ abstract class JsonSuite } } } + + test("SPARK-48148: decimal precision is preserved when object is read as string") { + withTempPath { path => + + val granularFloat = "-999.99999999999999999999999999999999995" + val jsonString = s"""{"data": {"v": ${granularFloat}}}, {"data": {"v": ${granularFloat}}}]""" Review Comment: The JSON string appears to be invalid, for example, it ends with `]` but I don't see any opening bracket for it. ########## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala: ########## @@ -3865,6 +3865,24 @@ abstract class JsonSuite } } } + + test("SPARK-48148: decimal precision is preserved when object is read as string") { + withTempPath { path => + Review Comment: nit: You can remove the new line on L3871. 
##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala:
##########

@@ -3865,6 +3865,24 @@ abstract class JsonSuite
       }
     }
   }
+
+  test("SPARK-48148: decimal precision is preserved when object is read as string") {
+    withTempPath { path =>
+
+      val granularFloat = "-999.99999999999999999999999999999999995"
+      val jsonString = s"""{"data": {"v": ${granularFloat}}}, {"data": {"v": ${granularFloat}}}]"""
+
+      Seq(jsonString).toDF()
+        .repartition(1)
+        .write
+        .text(path.getAbsolutePath)
+
+      val df = spark.read.schema("data STRING").json(path.getAbsolutePath)
+
+      val expected = s"""{"v": ${granularFloat}}"""

Review Comment:
   Can you add more test cases for the following?
   - {"data": {"v": "abc"}}, expected: "{"v": "abc"}"
   - {"data": {"v": "0.999"}}, expected: "{"v": "0.999"}"
   - {"data": [1, 2, 3]}, expected: "[1, 2, 3]"
   - {"data": <deeply-nested-object>}, expected: the object as a string.


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
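The requested cases could be sketched as (input line, expected `data` string) pairs like the following; the variable names and the deeply-nested example object are illustrative, not the PR's actual test code:

```scala
// Hypothetical sketch of the reviewer's suggested cases as (input, expected) pairs.
// The nested object in the last pair is only one possible "deeply-nested-object".
val cases: Seq[(String, String)] = Seq(
  ("""{"data": {"v": "abc"}}""", """{"v": "abc"}"""),
  ("""{"data": {"v": "0.999"}}""", """{"v": "0.999"}"""),
  ("""{"data": [1, 2, 3]}""", """[1, 2, 3]"""),
  ("""{"data": {"a": {"b": {"c": [1, 2, 3]}}}}""", """{"a": {"b": {"c": [1, 2, 3]}}}""")
)
```

In `JsonSuite`, each input would presumably be written to a temp path, read back with `spark.read.schema("data STRING").json(...)`, and the resulting `data` column compared against the expected string, as in the test in the diff above.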