[ https://issues.apache.org/jira/browse/SPARK-26964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774859#comment-16774859 ]
Hyukjin Kwon commented on SPARK-26964:
--------------------------------------

I know that, in practice, JSON stores perfectly well as a binary or string column; that part is fine. I want to be very sure that primitive support is something absolutely required and useful.

{quote}
Looking at the source code, it seems like all of these types have support in JacksonGenerator and JacksonParser, and so most of the work will be surfacing that, rather than entirely new code. Is there something you expect to be more intricate than additions to JsonToStructs and StructsToJson (and tests)? I'm considering having a look at this myself, but if your intuition implies that this is going to be a dead end, I will not.
{quote}

The core logic itself can be reused, but surfacing it is the problem. When we exposed primitive arrays and maps, the community ran into a lot of corner-case problems, for instance around how to handle corrupt records (Spark provides several options for handling them). One PR had to be reverted recently; see https://github.com/apache/spark/pull/23665 . I guess it still needs a considerable amount of code (see what it took to add MapType to one of these functions: https://github.com/apache/spark/pull/18875). One thing I am pretty sure of is that it would take real effort to write the code and get it into the codebase, so I am being cautious here.

> to_json/from_json do not match JSON spec due to not supporting scalars
> ----------------------------------------------------------------------
>
>                 Key: SPARK-26964
>                 URL: https://issues.apache.org/jira/browse/SPARK-26964
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Huon Wilson
>            Priority: Major
>
> Spark SQL's {{to_json}} and {{from_json}} currently support arrays and
> objects, but not the scalar/primitive types. This doesn't match the JSON spec
> on https://www.json.org/ or [RFC8259|https://tools.ietf.org/html/rfc8259]: a
> JSON document ({{json: element}}) consists of a value surrounded by
> whitespace ({{element: ws value ws}}), where a value is an object or array
> _or_ a number or string etc.:
> {code:none}
> value
>     object
>     array
>     string
>     number
>     "true"
>     "false"
>     "null"
> {code}
> Having {{to_json}} and {{from_json}} support scalars would make them flexible
> enough for a library I'm working on, where an arbitrary (user-supplied)
> column needs to be turned into JSON (see the workaround sketch below).
> NB. these newer specs differ from the original
> [RFC4627|https://tools.ietf.org/html/rfc4627] (now obsolete), which
> (essentially) had {{value: object | array}}.
> This is related to SPARK-24391 and SPARK-25252, which added support for
> arrays of scalars.
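
A minimal sketch, assuming the Spark 2.4 {{to_json}}/{{from_json}} API, of the wrap-in-a-struct workaround that the current limitation forces. The column and field names ("value", "json") are illustrative only, not from the ticket or the library mentioned above; the point is that the round trip has to produce {"value":1} rather than the bare JSON value 1 that RFC 8259 permits.

{code:scala}
// Minimal sketch: runnable in a Spark 2.4 spark-shell, where `spark` and its
// implicits are already in scope. Names like "value" and "json" are illustrative only.
import org.apache.spark.sql.functions.{col, from_json, struct, to_json}
import org.apache.spark.sql.types.{IntegerType, StructType}

// A scalar column we would ideally serialize as the bare JSON values 1, 2, 3.
val df = Seq(1, 2, 3).toDF("value")

// to_json rejects a bare scalar column, so wrap it in a single-field struct first.
// The output is {"value":1}, not the bare value 1 that the JSON spec also allows.
val asJson = df.select(to_json(struct(col("value"))).as("json"))

// Round trip: parse with a matching one-field schema, then unwrap the field again.
val schema = new StructType().add("value", IntegerType)
val roundTripped = asJson.select(from_json(col("json"), schema).getField("value").as("value"))

roundTripped.show()
{code}

The wrapper changes the wire format, which is exactly why a library serializing arbitrary user-supplied columns cannot rely on it; surfacing the existing JacksonGenerator/JacksonParser scalar handling through JsonToStructs/StructsToJson, as discussed above, would remove the need for it.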