Re: JEP-198 - Lets start talking about JSON
> From: "Brian Goetz" > To: "Ethan McCue" , "core-libs-dev" > > Sent: Tuesday, February 28, 2023 8:48:00 PM > Subject: Re: JEP-198 - Lets start talking about JSON > As you can probably imagine, I've been thinking about these topics for quite a > while, ever since we started working on records and pattern matching. It > sounds > like a lot of your thoughts have followed a similar arc to ours. > I'll share with you some of our thoughts, but I can't be engaging in a > detailed > back-and-forth right now -- we have too many other things going on, and this > isn't yet on the front burner. I think there's a right time for this work, and > we're not quite there yet, but we'll get there soon enough and we'll pick up > the ball again then. > To the existential question: yes, there should be a simpler, built-in way to > parse JSON. And, as you observe, the railroad diagram in the JSON spec is a > graphical description of an algebraic data type. One of the great simplifying > effects of having algebraic data types (records + sealed classes) in the > language is that many data modeling problems collapse down to the point where > considerably less creativity is required of an API. Here's the JSON API one > can > write after literally only 30 seconds of thought: >> sealed interface JsonValue { >> record JsonString (String s)implements JsonValue { } >> record JsonNumber (double d)implements JsonValue { } >> record JsonNull ()implements JsonValue { } >> record JsonBoolean ( boolean b)implements JsonValue { } >> record JsonArray (List< JsonValue > values)implements JsonValue { } >> record JsonObject (Map pairs)implements JsonValue { } >> } > It matches the JSON spec almost literally, and you can use pattern matching to > parse a document. (OK, there's some tiny bit of creativity here in that > True/False have been collapsed to a single JsonBoolean type, but you get my > point.) 
> But, we're not quite ready to put this API into the JDK, because the language > isn't *quite* there yet. Records give you nice pattern matching, but they come > at a cost; they're very specific and have rigid ideas about initialization, > which ripples into a number of constraints on an implementation (i.e., much > harder to parse lazily.) So we're waiting until we have deconstruction > patterns > (next up on the patterns parade) so that the records above can be interfaces > and still support pattern matching (and more flexibility in implementation, > including using value classes when they arrive.) It's not a long hop, though. > I agree with your assessment of streaming models; for documents too large to > fit > into memory, we'll let someone else provide a specialized solution. Streaming > and fully-materialized-tree are not the only two options; there are plenty of > points in the middle. > As to API idioms, these can be layered. The lazy-tree model outlined above can > be a foundation for data binding, dynamic mapping to records, jsonpath, etc. > But once you've made the streaming-vs-materialized choice in favor of > materialized, it's hard to imagine not having something like the above at the > base of the tower. > The question you raise about error handling is one that infuses pattern > matching > in general. Pattern matching allows us to collapse what would be a thousand > questions -- "does key X exist? is it mapped to a number? is the number in the > range of byte?" -- each with their own failure-handling path, into a single > question. That's great for reliable and readable code, but it does make errors > more opaque, because it is more like the red "check engine" light on your > dashboard. (Something like JSONPath could generate better error messages since > you've given it a declarative description of an assumed structural invariant.) 
> But, imperative code that has to treat each structural assumption as a > possible > control-flow point is a disaster; we've seen too much code like this already. > The ecosystem is big enough that there will be lots of people with strong > opinions that "X is the only sensible way to do it" (we've already seen > X=databinding on this thread), but the reality is that there are multiple > overlapping audiences here, and we have to be clear which audiences we are > prioritizing. We can have that debate when the time is right. > So, we'll get there, but we're waiting for one or two more bits of language > evolution to give us the substrate for the API that feels right. > Hope this helps, > -Brian You can "simulate" deconstructors by using when + instanceof, Let say we an interface with a deconstructor that can deconstruct the instance of that interface as a
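The message is truncated in the archive, but a sketch of the idea it names, "simulating" deconstruction with `instanceof` patterns and `when` guards until real deconstruction patterns arrive, might look like the following. All names here are illustrative, not from the thread:

```java
// Until deconstruction patterns exist, an interface can expose accessors and
// callers can "deconstruct" with type patterns plus `when` guards (Java 21+)
// instead of record patterns. JsonNumber/LazyNumber are made-up names.
public class DeconstructDemo {
    sealed interface JsonNumber permits LazyNumber {
        double value();
    }

    // An implementation that is free to parse lazily, which a record could not be.
    static final class LazyNumber implements JsonNumber {
        private final String literal;
        LazyNumber(String literal) { this.literal = literal; }
        public double value() { return Double.parseDouble(literal); }
    }

    static String classify(Object o) {
        return switch (o) {
            case JsonNumber n when n.value() == 0.0 -> "zero";
            case JsonNumber n -> "number " + n.value();
            default -> "not a number";
        };
    }

    public static void main(String[] args) {
        System.out.println(classify(new LazyNumber("42")));  // number 42.0
    }
}
```

The guard plays the role a nested pattern would: the structural test and the value test collapse into one `case`.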
Re: JEP-198 - Lets start talking about JSON
As you can probably imagine, I've been thinking about these topics for quite a while, ever since we started working on records and pattern matching. It sounds like a lot of your thoughts have followed a similar arc to ours.

I'll share with you some of our thoughts, but I can't be engaging in a detailed back-and-forth right now -- we have too many other things going on, and this isn't yet on the front burner. I think there's a right time for this work, and we're not quite there yet, but we'll get there soon enough and we'll pick up the ball again then.

To the existential question: yes, there should be a simpler, built-in way to parse JSON. And, as you observe, the railroad diagram in the JSON spec is a graphical description of an algebraic data type. One of the great simplifying effects of having algebraic data types (records + sealed classes) in the language is that many data modeling problems collapse down to the point where considerably less creativity is required of an API. Here's the JSON API one can write after literally only 30 seconds of thought:

    sealed interface JsonValue {
        record JsonString(String s) implements JsonValue { }
        record JsonNumber(double d) implements JsonValue { }
        record JsonNull() implements JsonValue { }
        record JsonBoolean(boolean b) implements JsonValue { }
        record JsonArray(List<JsonValue> values) implements JsonValue { }
        record JsonObject(Map<String, JsonValue> pairs) implements JsonValue { }
    }

It matches the JSON spec almost literally, and you can use pattern matching to parse a document. (OK, there's some tiny bit of creativity here in that True/False have been collapsed to a single JsonBoolean type, but you get my point.)

But, we're not quite ready to put this API into the JDK, because the language isn't *quite* there yet. Records give you nice pattern matching, but they come at a cost; they're very specific and have rigid ideas about initialization, which ripples into a number of constraints on an implementation (i.e., much harder to parse lazily.)
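To make the "use pattern matching to parse a document" point concrete, here is an illustrative consumer of that 30-second sketch (the types are reproduced locally so the example is self-contained; `render` and the demo values are this editor's additions, not part of the proposed API):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// A local copy of the 30-second JsonValue sketch, plus a renderer showing how
// record patterns (Java 21+) take the whole tree apart in a single switch.
public class JsonValueDemo {
    sealed interface JsonValue {}
    record JsonString(String s) implements JsonValue {}
    record JsonNumber(double d) implements JsonValue {}
    record JsonNull() implements JsonValue {}
    record JsonBoolean(boolean b) implements JsonValue {}
    record JsonArray(List<JsonValue> values) implements JsonValue {}
    record JsonObject(Map<String, JsonValue> pairs) implements JsonValue {}

    // The switch is exhaustive over the sealed hierarchy: no default needed,
    // and adding a new JsonValue subtype becomes a compile error here.
    static String render(JsonValue v) {
        return switch (v) {
            case JsonString(String s)   -> "\"" + s + "\"";
            case JsonNumber(double d)   -> String.valueOf(d);
            case JsonNull()             -> "null";
            case JsonBoolean(boolean b) -> String.valueOf(b);
            case JsonArray(List<JsonValue> vs) ->
                vs.stream().map(JsonValueDemo::render)
                  .collect(Collectors.joining(",", "[", "]"));
            case JsonObject(Map<String, JsonValue> m) ->
                m.entrySet().stream()
                 .map(e -> "\"" + e.getKey() + "\":" + render(e.getValue()))
                 .collect(Collectors.joining(",", "{", "}"));
        };
    }

    public static void main(String[] args) {
        JsonValue doc = new JsonArray(List.of(new JsonBoolean(true), new JsonNull()));
        System.out.println(render(doc)); // [true,null]
    }
}
```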
So we're waiting until we have deconstruction patterns (next up on the patterns parade) so that the records above can be interfaces and still support pattern matching (and more flexibility in implementation, including using value classes when they arrive.) It's not a long hop, though.

I agree with your assessment of streaming models; for documents too large to fit into memory, we'll let someone else provide a specialized solution. Streaming and fully-materialized-tree are not the only two options; there are plenty of points in the middle.

As to API idioms, these can be layered. The lazy-tree model outlined above can be a foundation for data binding, dynamic mapping to records, jsonpath, etc. But once you've made the streaming-vs-materialized choice in favor of materialized, it's hard to imagine not having something like the above at the base of the tower.

The question you raise about error handling is one that infuses pattern matching in general. Pattern matching allows us to collapse what would be a thousand questions -- "does key X exist? is it mapped to a number? is the number in the range of byte?" -- each with their own failure-handling path, into a single question. That's great for reliable and readable code, but it does make errors more opaque, because it is more like the red "check engine" light on your dashboard. (Something like JSONPath could generate better error messages since you've given it a declarative description of an assumed structural invariant.)

But, imperative code that has to treat each structural assumption as a possible control-flow point is a disaster; we've seen too much code like this already. The ecosystem is big enough that there will be lots of people with strong opinions that "X is the only sensible way to do it" (we've already seen X=databinding on this thread), but the reality is that there are multiple overlapping audiences here, and we have to be clear which audiences we are prioritizing.
We can have that debate when the time is right. So, we'll get there, but we're waiting for one or two more bits of language evolution to give us the substrate for the API that feels right.

Hope this helps,
-Brian

On 12/15/2022 3:30 PM, Ethan McCue wrote:
> I'm writing this to drive some forward motion and to nerd-snipe those who know better than I do into putting their thoughts into words.
>
> There are three ways to process JSON[1]
>
> - Streaming (Push or Pull)
> - Traversing a Tree (Realized or Lazy)
> - Declarative Databind (N ways)
>
> Of these, JEP-198 explicitly ruled out providing "JAXB style type safe data binding." No justification is given, but if I had to insert my own: mapping the Json model to/from the Java/JVM object model is a cursed combo of
>
> - Huge possible design space
> - Unpalatably large surface for backwards compatibility
> - Serialization! Boo![2]
>
> So for an artifact like the JDK, it probably doesn't make sense to include. That tracks. It won't
Re: JEP-198 - Lets start talking about JSON
Link to the proxy which I forgot to include https://gist.github.com/bowbahdoe/eb29d172351162408eab5e4ee9d84fec

On Tue, Feb 28, 2023 at 12:16 PM Ethan McCue wrote:
> [Quoted message elided; it appears in full below in this thread.]
Re: JEP-198 - Lets start talking about JSON
As an update to my character arc, I documented and wrote up an explanation for the prototype library I was working on.[1] And I've gotten a good deal of feedback on reddit[2] and in private.

I think it's relevant to the conversation here in the sense of

- There are more of rzwitserloot's objections to read on the general concept of JSON as a built-in.[3]
- There are a lot of well-reasoned objections to the manner in which I am interpreting a JSON tree, as well as objections to the usage of a tree as the core. JEP 198's current writeup (which I know is subject to a rewrite/retraction) presumes that an immutable tree would be the core data structure.
- The peanut gallery might be interested in a "base" to implement whatever their take on an API should be.

For that last category, I have a method-handle proxy written up for those who want to try the "push parser into a pull parser" transformation I alluded to in my first email of this thread.

[1]: https://mccue.dev/pages/2-26-23-json
[2]: https://www.reddit.com/r/java/comments/11cyoh1/please_try_my_json_library/
[3]: Including one that reddit took down, but can be seen through reveddit https://www.reveddit.com/y/rzwitserloot/?after=t1_jacpsj6=1=new=t1_jaa3x0q_status=all

On Fri, Dec 16, 2022 at 6:23 PM Ethan McCue wrote:
> [Quoted message elided; it appears in full below in this thread.]
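The "push parser into a pull parser" transformation mentioned above can be illustrated in miniature. Ethan's gist reportedly does this with a method-handle proxy; the toy below only shows the shape of the idea by eagerly buffering push events into a queue that callers pull from, with made-up event and sink types:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// A toy push-to-pull adapter: run the push-style parser to completion,
// capturing its callbacks as events that can then be pulled one at a time.
public class PushToPull {
    // A push parser calls back with events as it encounters them.
    interface EventSink { void event(String type, String value); }

    record Event(String type, String value) {}

    static Queue<Event> pull(Consumer<EventSink> pushParser) {
        Queue<Event> events = new ArrayDeque<>();
        pushParser.accept((type, value) -> events.add(new Event(type, value)));
        return events;
    }

    public static void main(String[] args) {
        // A fake push "parser" emitting events for {"a": 1}
        Queue<Event> q = pull(sink -> {
            sink.event("START_OBJECT", "{");
            sink.event("FIELD", "a");
            sink.event("NUMBER", "1");
        });
        System.out.println(q.poll().type()); // START_OBJECT
    }
}
```

A real adapter would want to avoid materializing every event up front (hence the laziness tricks in the gist); this eager version trades that away for simplicity.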
Re: JEP-198 - Lets start talking about JSON
Sidenote about "Project Galahad" - I know Graal uses json for a few things including a reflection-config.json. Food for thought. > the java.util.log experiment shows that trying to ‘core-librarize’ needs that the community at large already fulfills with third party deps isn’t a good move, I, personally, do not have much historical context for java.util.log. What feels distinct about providing a JSON api is that logging is an implicitly global thing. If a JSON api doesn't fill all ecosystem niches, multiple can be used alongside each other. > The root issue with JSON is that you just can’t tell how to interpret any given JSON token The point where this could be an issue is numbers. Once something is identified as a number we can 1. Parse it immediately. Using a long and falling back to a BigInteger. For decimals its harder to know whether to use a double or BigDecimal internally. In the library I've been copy pasting from to build a prototype that last one is an explicit option and it defaults to doubles for the whole parse. 2. Store the string and parse it upon request. We can still model it as a Json.Number, but the work of interpreting is deferred. But in general, making a tree of json values doesn't particularly affect our ability to interpret it in a certain way. That interpretation is just positional. That's just as true as when making assertions in the form of class structure and field types as it is when making assertions in the form of code.[2] record Thing(Instant a) {} // vs. Decoder.field(json, "a", a -> Instant.ofEpochSecond(Decoder.long_(a))) If anything, using a named type as a lookup key for a deserialization function is the less obvious way to do this. > I’m not sure how to square this circle > I don’t like the idea of shipping a non-data-binding JSON API in the core libs. I think the way to cube this rhombus is to find ways to like the idea of a non-data-binding JSON API. 
¯\_(ツ)_/¯ My personal journey with that is reaching its terminus here I think. Look on the bright side though - there are legit upsides to explicit tree plucking! Yeah, the friction per field is slightly higher, but the relative friction of custom types, or multiple construction methods for a particular type, or maintaining compatibility with legacy representations, or even just handling a top level list of things - its much lower. And all that complexity - that an instant is made by looking for a long or that it is parsed from a string in a particular format - it lives in Java code you can see, touch, feel and taste. I know "nobody does this"[2] but it's not that bad, actually. [1]: I do apologize for the code sketches consistently being "what I think an interaction with a tree api should look like." That is what I have been thinking about for a while so it's hard to resist. [2]: https://youtu.be/dOgfWXw9VrI?t=1225 On Thu, Dec 15, 2022 at 6:34 PM Ethan McCue wrote: > > are pure JSON parsers really the go-to for most people? > > Depends on what you mean by JSON parsers and it depends on what you mean > by people. > > To the best of my knowledge, both python and Javascript do not include > streaming, databinding, or path navigation capabilities in their json > parsers. > > > On Thu, Dec 15, 2022 at 6:26 PM Ethan McCue wrote: > >> > The 95%+ use case for working with JSON for your average java coder is >> best done with data binding. >> >> To be brave yet controversial: I'm not sure this is neccesarily true. >> >> I will elaborate and respond to the other points after a hot cocoa, but >> the last point is part of why I think that tree-crawling needs _something_ >> better as an API to fit the bill. 
>> >> With my sketch that set of requirements would be represented as >> >> record Thing( >> List xs >> ) { >> static Thing fromJson(Json json) >> var defaultList = List.of(0L); >> return new Thing(Decoder.optionalNullableField( >> json >> "xs", >> Decoder.oneOf( >> Decoder.array(Decoder.oneOf( >> x -> Long.parseLong(Decoder.string(x)), >> Decoder::long >> )) >> Decoder.null_(defaultList), >> x -> List.of(Decoder.long_(x)) >> ), >> defaultList >> )); >> ) >> } >> >> Which isn't amazing at first glance, but also >> >>{} >>{"xs": null} >>{"xs": 5} >>{"xs": [5]} {"xs": ["5"]} >>{"xs": [1, "2", "3"]} >> >> these are some wildly varied structures. You could make a solid argument >> that something which silently treats these all the same is >> a bad API for all the reasons you would consider it a good one. >> >> On Thu, Dec 15, 2022 at 6:18 PM Johannes Lichtenberger < >> lichtenberger.johan...@gmail.com> wrote: >> >>> I'll have to read the whole thing, but are pure JSON parsers really the >>> go-to for
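Option 2 of the number-handling choices discussed earlier in the thread (store the raw literal, interpret on request) can be sketched as follows. The class and method names here are made up for illustration, with option 1's long-then-BigInteger fallback applied lazily:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Keep the raw JSON number literal; defer the question of how to interpret it
// until a caller actually asks. Illustrative names, not a real API.
final class LazyJsonNumber {
    private final String literal;

    LazyJsonNumber(String literal) { this.literal = literal; }

    // Integral view: try long first, fall back to BigInteger on overflow.
    Object asIntegral() {
        try {
            return Long.parseLong(literal);
        } catch (NumberFormatException e) {
            return new BigInteger(literal);
        }
    }

    // Decimal views: the caller, not the parser, picks double vs BigDecimal.
    double asDouble() { return Double.parseDouble(literal); }
    BigDecimal asBigDecimal() { return new BigDecimal(literal); }
}
```

The appeal of this shape is that the parse never has to guess a numeric representation on the document's behalf; the cost is re-parsing the literal on each access unless results are cached.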
Re: JEP-198 - Lets start talking about JSON
> are pure JSON parsers really the go-to for most people?

Depends on what you mean by JSON parsers and it depends on what you mean by people.

To the best of my knowledge, both python and Javascript do not include streaming, databinding, or path navigation capabilities in their json parsers.

On Thu, Dec 15, 2022 at 6:26 PM Ethan McCue wrote:
> [Quoted message elided; it appears in full below in this thread.]
Re: JEP-198 - Lets start talking about JSON
> The 95%+ use case for working with JSON for your average java coder is best done with data binding.

To be brave yet controversial: I'm not sure this is necessarily true.

I will elaborate and respond to the other points after a hot cocoa, but the last point is part of why I think that tree-crawling needs _something_ better as an API to fit the bill.

With my sketch that set of requirements would be represented as

    record Thing(
        List<Long> xs
    ) {
        static Thing fromJson(Json json) {
            var defaultList = List.of(0L);
            return new Thing(Decoder.optionalNullableField(
                json,
                "xs",
                Decoder.oneOf(
                    Decoder.array(Decoder.oneOf(
                        x -> Long.parseLong(Decoder.string(x)),
                        Decoder::long_
                    )),
                    Decoder.null_(defaultList),
                    x -> List.of(Decoder.long_(x))
                ),
                defaultList
            ));
        }
    }

Which isn't amazing at first glance, but also

    {}
    {"xs": null}
    {"xs": 5}
    {"xs": [5]}
    {"xs": ["5"]}
    {"xs": [1, "2", "3"]}

these are some wildly varied structures. You could make a solid argument that something which silently treats these all the same is a bad API for all the reasons you would consider it a good one.

On Thu, Dec 15, 2022 at 6:18 PM Johannes Lichtenberger <lichtenberger.johan...@gmail.com> wrote:
> I'll have to read the whole thing, but are pure JSON parsers really the go-to for most people? I'm a big advocate of also providing something similar to XPath/XQuery and that's IMHO JSONiq (90% XQuery). I might be biased, of course, as I'm working on Brackit[1] in my spare time (which is also a query compiler and intended to be used with proven optimizations by document stores / JSON stores), but also can be used as an in-memory query engine.
>
> kind regards
> Johannes
>
> [1] https://github.com/sirixdb/brackit
>
> On Thu, Dec 15, 2022 at 11:03 PM, Reinier Zwitserloot <rein...@zwitserloot.com> wrote:
>> A recent Advent-of-Code puzzle also made me double check the support of JSON in the java core libs and it is indeed a curious situation that the java core libs don't cater to it particularly well.
>> However, I'm not seeing an easy way forward to try to close this hole in the core library offerings.
>>
>> If you need to stream huge swaths of JSON, generally there's a clear unit size that you can just databind. Something like:
>>
>>     String jsonStr = """
>>         { "version": 5, "data": [
>>             -- 1 million relatively small records in this list --
>>         ] }
>>         """;
>>
>> The usual swath of JSON parsers tend to support this (giving you a stream of java instances created by databinding those small records one by one), or if not, the best move forward is presumably to file a pull request with those projects; the java.util.log experiment shows that trying to 'core-librarize' needs that the community at large already fulfills with third party deps isn't a good move, especially if the core library variant tries to oversimplify to avoid the trap of being too opinionated (which core libs shouldn't be). In other words, the need for 'stream this JSON for me' style APIs is even more exotic than Ethan is suggesting.
>>
>> I see a fundamental problem here:
>>
>> - The 95%+ use case for working with JSON for your average java coder is best done with data binding.
>> - core libs doesn't want to provide it, partly because it's got a large design space, partly because the field's already covered by GSON and Jackson-json; java.util.log proves this doesn't work. At least, I gather that's what Ethan thinks and I agree with this assessment.
>> - A language that claims to be "batteries included" that doesn't ship with a JSON parser in this era is dubious, to say the least.
>>
>> I'm not sure how to square this circle. Hence it feels like core-libs needs to hold some more fundamental debates first:
>>
>> - Maybe it's time to state in a more or less official decree that well-established, large design space jobs will remain the purview of dependencies no matter how popular the area is, unless being part of the core-libs adds something more fundamental the third party deps cannot bring to the table (such as language integration), or the community standardizes on a single library (JSR310's story, more or less). JSON parsing would qualify as 'well-established' (GSON and Jackson) and 'large design space' as Ethan pointed out.
>> - Given that 99% of java projects, even really simple ones, start with maven/gradle and a list of deps, is that really a problem?
>>
>> I'm honestly not sure what the right answer is. On one hand, the npm ecosystem seems to be doing very well even though their 'batteries included' situation is an utter
Re: JEP-198 - Lets start talking about JSON
A recent Advent-of-Code puzzle also made me double check the support of JSON in the java core libs and it is indeed a curious situation that the java core libs don’t cater to it particularly well.

However, I’m not seeing an easy way forward to try to close this hole in the core library offerings.

If you need to stream huge swaths of JSON, generally there’s a clear unit size that you can just databind. Something like:

    String jsonStr = """
        {
            "version": 5,
            "data": [
                -- 1 million relatively small records in this list --
            ]
        }
        """;

The usual swath of JSON parsers tend to support this (giving you a stream of java instances created by databinding those small records one by one), or if not, the best move forward is presumably to file a pull request with those projects; the java.util.log experiment shows that trying to ‘core-librarize’ needs that the community at large already fulfills with third party deps isn’t a good move, especially if the core library variant tries to oversimplify to avoid the trap of being too opinionated (which core libs shouldn’t be). In other words, the need for ’stream this JSON for me’ style APIs is even more exotic than Ethan is suggesting.

I see a fundamental problem here:

- The 95%+ use case for working with JSON for your average java coder is best done with data binding.
- core libs doesn’t want to provide it, partly because it’s got a large design space, partly because the field’s already covered by GSON and Jackson-json; java.util.log proves this doesn’t work. At least, I gather that’s what Ethan thinks and I agree with this assessment.
- A language that claims to be “batteries included” but doesn’t ship with a JSON parser in this era is dubious, to say the least.

I’m not sure how to square this circle.
Hence it feels like core-libs needs to hold some more fundamental debates first:

- Maybe it’s time to state in a more or less official decree that well-established, large design space jobs will remain the purview of dependencies no matter how popular the job is, unless being part of the core-libs adds something more fundamental the third party deps cannot bring to the table (such as language integration), or the community standardizes on a single library (JSR310’s story, more or less). JSON parsing would qualify as ‘well-established’ (GSON and Jackson) and ‘large design space’ as Ethan pointed out.
- Given that 99% of java projects, even really simple ones, start with maven/gradle and a list of deps, is that really a problem?

I’m honestly not sure what the right answer is. On one hand, the npm ecosystem seems to be doing very well even though their ‘batteries included’ situation is an utter shambles. Then again, the notion that your average nodejs project includes 10x+ more dependencies than other languages is likely a significant part of the security clown fiesta going on over there as far as 3rd party deps is concerned, so by no means should java just blindly emulate their solutions.

I don’t like the idea of shipping a non-data-binding JSON API in the core libs. The root issue with JSON is that you just can’t tell how to interpret any given JSON token, because that’s not how JSON is used in practice. What does 5 mean? Could be that I’m to take that as an int, or as a double, or perhaps even as a j.t.Instant (epoch-millis), and defaulting behaviour (similar to j.u.Map’s .getOrDefault) is *very* convenient to parse most JSON out there in the real world - omitting k/v pairs whose value is still on default is very common. That’s what makes those databind libraries so enticing: Instead of trying to pattern match my way into this behaviour:

- If the element isn’t there at all or null, give me a list-of-longs with a single 0 in it.
- If the element is a number, make me a list-of-longs with 1 value in it, that is that number, as long.
- If the element is a string, parse it into a long, then get me a list with this one long value (because IEEE double rules mean sometimes you have to put these things in string form or they get mangled by javascript-eval style parsers).

And yet the above is quite common, and can easily be done by a databinder, which sees you want a List<Long> for a field whose default value is List.of(1L), and, armed with that knowledge, can transit the JSON into java in that way.

You don’t *need* databinding to cater to this idea: You could for example have a jsonNode.asLong(123) method that would parse a string if need be, even. But this has nothing to do with pattern matching either.

 --Reinier Zwitserloot

On 15 Dec 2022 at 21:30:17, Ethan McCue wrote:

> I'm writing this to drive some forward motion and to nerd-snipe those who
> know better than I do into putting their thoughts into words.
>
> There are three ways to process JSON[1]
>
> - Streaming (Push or Pull)
> - Traversing a Tree (Realized or Lazy)
> - Declarative Databind (N ways)
>
> Of these, JEP-198 explicitly ruled out
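For what it's worth, the jsonNode.asLong(123) convenience Reinier mentions is easy to sketch on top of a minimal tree model. Everything below (`Node`, `Num`, `Str`, `Null`, `asLong`, `get`) is an illustrative name of my own, not any real library's API:

```java
import java.util.Map;

// Sketch of a lenient accessor in the jsonNode.asLong(123) spirit:
// numbers coerce, numeric strings parse, null and absent values fall
// back to the supplied default. Names are illustrative only.
sealed interface Node {
    record Num(double d) implements Node {}
    record Str(String s) implements Node {}
    record Null() implements Node {}

    default long asLong(long fallback) {
        return switch (this) {
            case Num n -> (long) n.d();
            case Str s -> Long.parseLong(s.s()); // tolerate string-wrapped longs
            case Null n -> fallback;
        };
    }

    // Absent keys get the same treatment as explicit nulls.
    static long get(Map<String, Node> obj, String key, long fallback) {
        Node v = obj.get(key);
        return v == null ? fallback : v.asLong(fallback);
    }
}
```

Note this bakes one particular coercion policy into the tree API itself, which is exactly the opinionated design-space choice the thread is worried about core-libs having to make.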