sadikovi commented on code in PR #46408:
URL: https://github.com/apache/spark/pull/46408#discussion_r1594580639


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala:
##########

@@ -280,13 +280,32 @@ class JacksonParser(
         case VALUE_STRING => UTF8String.fromString(parser.getText)

-      case _ =>
+      case other =>
         // Note that it always tries to convert the data as string without the case of failure.
-        val writer = new ByteArrayOutputStream()
-        Utils.tryWithResource(factory.createGenerator(writer, JsonEncoding.UTF8)) {
-          generator => generator.copyCurrentStructure(parser)
+        val startLocation = parser.getTokenLocation
+        startLocation.contentReference().getRawContent match {

Review Comment:
   Is there an existing API to get the remaining content as a string? Also, would it work with multi-line JSON?
########## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala: ########## @@ -280,13 +280,32 @@ class JacksonParser( case VALUE_STRING => UTF8String.fromString(parser.getText) - case _ => + case other => // Note that it always tries to convert the data as string without the case of failure. - val writer = new ByteArrayOutputStream() - Utils.tryWithResource(factory.createGenerator(writer, JsonEncoding.UTF8)) { - generator => generator.copyCurrentStructure(parser) + val startLocation = parser.getTokenLocation + startLocation.contentReference().getRawContent match { Review Comment: Is there an existing API to get the remaining content as string? Also, would it work with multi-line JSON? ########## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala: ########## @@ -3865,6 +3865,24 @@ abstract class JsonSuite } } } + + test("SPARK-48148: decimal precision is preserved when object is read as string") { + withTempPath { path => + + val granularFloat = "-999.99999999999999999999999999999999995" + val jsonString = s"""{"data": {"v": ${granularFloat}}}, {"data": {"v": ${granularFloat}}}]""" Review Comment: The JSON string appears to be invalid, for example, it ends with `]` but I don't see any opening bracket for it. ########## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala: ########## @@ -3865,6 +3865,24 @@ abstract class JsonSuite } } } + + test("SPARK-48148: decimal precision is preserved when object is read as string") { + withTempPath { path => + Review Comment: nit: You can remove the new line on L3871. 
##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala:
##########

@@ -3865,6 +3865,24 @@ abstract class JsonSuite
       }
     }
   }
+
+  test("SPARK-48148: decimal precision is preserved when object is read as string") {
+    withTempPath { path =>
+
+      val granularFloat = "-999.99999999999999999999999999999999995"
+      val jsonString = s"""{"data": {"v": ${granularFloat}}}, {"data": {"v": ${granularFloat}}}]"""
+
+      Seq(jsonString).toDF()
+        .repartition(1)
+        .write
+        .text(path.getAbsolutePath)
+
+      val df = spark.read.schema("data STRING").json(path.getAbsolutePath)
+
+      val expected = s"""{"v": ${granularFloat}}"""

Review Comment:
   Can you add more test cases for the following?
   - {"data": {"v": "abc"}}, expected: "{"v": "abc"}"
   - {"data": {"v": "0.999"}}, expected: "{"v": "0.999"}"
   - {"data": [1, 2, 3]}, expected: "[1, 2, 3]"
   - {"data": <deeply-nested-object>}, expected: the object as a string.


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
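The requested cases could be sketched as (input line, expected `data` string) pairs like the following; the variable names and the deeply-nested example object are illustrative, not the PR's actual test code:

```scala
// Hypothetical sketch of the reviewer's suggested cases as (input, expected) pairs.
// The nested object in the last pair is only one possible "deeply-nested-object".
val cases: Seq[(String, String)] = Seq(
  ("""{"data": {"v": "abc"}}""", """{"v": "abc"}"""),
  ("""{"data": {"v": "0.999"}}""", """{"v": "0.999"}"""),
  ("""{"data": [1, 2, 3]}""", """[1, 2, 3]"""),
  ("""{"data": {"a": {"b": {"c": [1, 2, 3]}}}}""", """{"a": {"b": {"c": [1, 2, 3]}}}""")
)
```

In `JsonSuite`, each input would presumably be written to a temp path, read back with `spark.read.schema("data STRING").json(...)`, and the resulting `data` column compared against the expected string, as in the test in the diff above.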