Re: JEP-198 - Lets start talking about JSON

2023-02-28 Thread Remi Forax
> From: "Brian Goetz" 
> To: "Ethan McCue" , "core-libs-dev"
> 
> Sent: Tuesday, February 28, 2023 8:48:00 PM
> Subject: Re: JEP-198 - Lets start talking about JSON

You can "simulate" deconstructors by using when + instanceof.

Let's say we have an interface with a deconstructor that can deconstruct an
instance of that interface as a
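The message is cut off above, but as a hedged sketch of what simulating a deconstructor with when + instanceof might look like (all names here are mine, for illustration): a `when` guard plus accessors on the matched instance can stand in for a deconstruction pattern the language does not have yet.

```java
public class DeconstructorSim {
    sealed interface JsonValue permits JsonString, JsonNumber {}
    record JsonString(String s) implements JsonValue {}
    record JsonNumber(double d) implements JsonValue {}

    static String describe(JsonValue v) {
        return switch (v) {
            // approximates a future `case JsonNumber(double d) when d > 0`
            case JsonNumber n when n.d() > 0 -> "positive number " + n.d();
            case JsonNumber n -> "non-positive number " + n.d();
            case JsonString s -> "string \"" + s.s() + "\"";
        };
    }
}
```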

Re: JEP-198 - Lets start talking about JSON

2023-02-28 Thread Brian Goetz
As you can probably imagine, I've been thinking about these topics for 
quite a while, ever since we started working on records and pattern 
matching.  It sounds like a lot of your thoughts have followed a similar 
arc to ours.


I'll share with you some of our thoughts, but I can't be engaging in a 
detailed back-and-forth right now -- we have too many other things going 
on, and this isn't yet on the front burner.  I think there's a right 
time for this work, and we're not quite there yet, but we'll get there 
soon enough and we'll pick up the ball again then.


To the existential question: yes, there should be a simpler, built-in 
way to parse JSON.  And, as you observe, the railroad diagram in the 
JSON spec is a graphical description of an algebraic data type.  One of 
the great simplifying effects of having algebraic data types (records + 
sealed classes) in the language is that many data modeling problems 
collapse down to the point where considerably less creativity is 
required of an API.  Here's the JSON API one can write after literally 
only 30 seconds of thought:



sealed interface JsonValue {

    record JsonString(String s) implements JsonValue { }

    record JsonNumber(double d) implements JsonValue { }

    record JsonNull() implements JsonValue { }

    record JsonBoolean(boolean b) implements JsonValue { }

    record JsonArray(List<JsonValue> values) implements JsonValue { }

    record JsonObject(Map<String, JsonValue> pairs) implements JsonValue { }

}



It matches the JSON spec almost literally, and you can use pattern 
matching to parse a document.  (OK, there's some tiny bit of creativity 
here in that True/False have been collapsed to a single JsonBoolean 
type, but you get my point.)
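A minimal sketch of that pattern-matching traversal (record patterns, JDK 21; the hierarchy is restated so the snippet is self-contained, and `render` is a name I'm introducing for illustration, not part of any proposed API):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class JsonDemo {
    sealed interface JsonValue {}
    record JsonString(String s) implements JsonValue {}
    record JsonNumber(double d) implements JsonValue {}
    record JsonNull() implements JsonValue {}
    record JsonBoolean(boolean b) implements JsonValue {}
    record JsonArray(List<JsonValue> values) implements JsonValue {}
    record JsonObject(Map<String, JsonValue> pairs) implements JsonValue {}

    // One exhaustive switch over the sealed hierarchy renders a tree back to text.
    static String render(JsonValue v) {
        return switch (v) {
            case JsonString(String s)   -> "\"" + s + "\"";
            case JsonNumber(double d)   -> Double.toString(d);
            case JsonNull()             -> "null";
            case JsonBoolean(boolean b) -> Boolean.toString(b);
            case JsonArray(List<JsonValue> vs) -> vs.stream()
                    .map(JsonDemo::render)
                    .collect(Collectors.joining(",", "[", "]"));
            case JsonObject(Map<String, JsonValue> m) -> m.entrySet().stream()
                    .map(e -> "\"" + e.getKey() + "\":" + render(e.getValue()))
                    .collect(Collectors.joining(",", "{", "}"));
        };
    }
}
```

Because the interface is sealed, the switch needs no default arm: the compiler checks that all six cases are covered.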


But, we're not quite ready to put this API into the JDK, because the 
language isn't *quite* there yet.  Records give you nice pattern 
matching, but they come at a cost; they're very specific and have rigid 
ideas about initialization, which ripples into a number of constraints 
on an implementation (i.e., much harder to parse lazily.)  So we're 
waiting until we have deconstruction patterns (next up on the patterns 
parade) so that the records above can be interfaces and still support 
pattern matching (and more flexibility in implementation, including 
using value classes when they arrive.)  It's not a long hop, though.


I agree with your assessment of streaming models; for documents too 
large to fit into memory, we'll let someone else provide a specialized 
solution.  Streaming and fully-materialized-tree are not the only two 
options; there are plenty of points in the middle.


As to API idioms, these can be layered.  The lazy-tree model outlined 
above can be a foundation for data binding, dynamic mapping to records, 
jsonpath, etc.  But once you've made the streaming-vs-materialized 
choice in favor of materialized, it's hard to imagine not having 
something like the above at the base of the tower.


The question you raise about error handling is one that infuses pattern 
matching in general.  Pattern matching allows us to collapse what would 
be a thousand questions -- "does key X exist?  is it mapped to a 
number?  is the number in the range of byte?" -- each with their own 
failure-handling path, into a single question.  That's great for 
reliable and readable code, but it does make errors more opaque, because 
it is more like the red "check engine" light on your dashboard.  
(Something like JSONPath could generate better error messages since 
you've given it a declarative description of an assumed structural 
invariant.)  But, imperative code that has to treat each structural 
assumption as a possible control-flow point is a disaster; we've seen 
too much code like this already.
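To make the "single question" concrete, here is a hedged sketch (types restated, names mine) in which one guarded pattern replaces three separate checks, each of which would otherwise need its own failure path:

```java
import java.util.Map;
import java.util.OptionalInt;

public class ByteCheck {
    sealed interface JsonValue {}
    record JsonNumber(double d) implements JsonValue {}
    record JsonObject(Map<String, JsonValue> pairs) implements JsonValue {}

    // "Does key X exist? Is it a number? Does it fit in a byte?" asked as one
    // combined pattern; the only failure signal is an empty OptionalInt --
    // the "check engine light" rather than three distinct error messages.
    static OptionalInt byteAt(JsonValue v, String key) {
        if (v instanceof JsonObject(Map<String, JsonValue> m)
                && m.get(key) instanceof JsonNumber(double d)
                && d == (byte) d) {
            return OptionalInt.of((byte) d);
        }
        return OptionalInt.empty();
    }
}
```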


The ecosystem is big enough that there will be lots of people with 
strong opinions that "X is the only sensible way to do it" (we've 
already seen X=databinding on this thread), but the reality is that 
there are multiple overlapping audiences here, and we have to be clear 
which audiences we are prioritizing.  We can have that debate when the 
time is right.


So, we'll get there, but we're waiting for one or two more bits of 
language evolution to give us the substrate for the API that feels right.


Hope this helps,
-Brian


On 12/15/2022 3:30 PM, Ethan McCue wrote:
I'm writing this to drive some forward motion and to nerd-snipe those 
who know better than I do into putting their thoughts into words.


There are three ways to process JSON[1]
- Streaming (Push or Pull)
- Traversing a Tree (Realized or Lazy)
- Declarative Databind (N ways)

Of these, JEP-198 explicitly ruled out providing "JAXB style type safe 
data binding."


No justification is given, but if I had to insert my own: mapping the 
Json model to/from the Java/JVM object model is a cursed combo of

- Huge possible design space
- Unpalatably large surface for backwards compatibility
- Serialization! Boo![2]

So for an artifact like the JDK, it probably doesn't make sense to 
include. That tracks.

It won't 

Re: JEP-198 - Lets start talking about JSON

2023-02-28 Thread Ethan McCue
Link to the proxy, which I forgot to include:

https://gist.github.com/bowbahdoe/eb29d172351162408eab5e4ee9d84fec


Re: JEP-198 - Lets start talking about JSON

2023-02-28 Thread Ethan McCue
As an update to my character arc, I documented and wrote up an explanation
for the prototype library I was working on.[1]

And I've gotten a good deal of feedback on reddit[2] and in private.

I think it's relevant to the conversation here in the sense that:

- There are more of rzwitserloot's objections to read on the general
concept of JSON as a built-in.[3]
- There are a lot of well-reasoned objections to the manner in which I am
interpreting a JSON tree, as well as objections to the usage of a tree as
the core. JEP 198's current writeup (which I know is subject to a
rewrite/retraction) presumes that an immutable tree would be the core data
structure.
- The peanut gallery might be interested in a "base" to implement whatever
their take on an API should be.

For that last category, I have a method-handle proxy written up for those
who want to try the "push parser into a pull parser"
transformation I alluded to in my first email of this thread.

[1]: https://mccue.dev/pages/2-26-23-json
[2]:
https://www.reddit.com/r/java/comments/11cyoh1/please_try_my_json_library/
[3]: Including one that reddit took down, but can be seen through reveddit
https://www.reveddit.com/y/rzwitserloot/?after=t1_jacpsj6=1=new=t1_jaa3x0q_status=all
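The "push parser into a pull parser" transformation can also be pictured without method handles. This is a deliberately naive sketch (all names are mine, not the gist's API): the push side feeds events to a callback, a background thread runs it, and a queue hands the events to a pull-side Iterator.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

public class PushToPull<T> implements Iterable<T> {
    private static final Object END = new Object(); // end-of-stream sentinel
    private final Consumer<Consumer<T>> pushSource;  // a "parser" that pushes events

    public PushToPull(Consumer<Consumer<T>> pushSource) {
        this.pushSource = pushSource;
    }

    @Override
    public Iterator<T> iterator() {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(16);
        Thread producer = new Thread(() -> {
            try {
                pushSource.accept(event -> {
                    try { queue.put(event); }
                    catch (InterruptedException e) { throw new IllegalStateException(e); }
                });
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
        return new Iterator<T>() {
            private Object next;

            @Override public boolean hasNext() {
                if (next == null) {
                    try { next = queue.take(); }
                    catch (InterruptedException e) { throw new IllegalStateException(e); }
                }
                return next != END;
            }

            @SuppressWarnings("unchecked")
            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                Object result = next;
                next = null;
                return (T) result;
            }
        };
    }
}
```

Usage: `for (String ev : new PushToPull<String>(push -> { push.accept("start"); push.accept("end"); })) ...` pulls events one at a time from the push side.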


Re: JEP-198 - Lets start talking about JSON

2022-12-16 Thread Ethan McCue
Sidenote about "Project Galahad" - I know Graal uses json for a few things
including a reflection-config.json. Food for thought.

> the java.util.log experiment shows that trying to ‘core-librarize’ needs
that the community at large already fulfills with third party deps isn’t a
good move,

I, personally, do not have much historical context for java.util.log. What
feels distinct about providing a JSON api is that
logging is an implicitly global thing. If a JSON api doesn't fill all
ecosystem niches, multiple can be used alongside
each other.

> The root issue with JSON is that you just can’t tell how to interpret any
given JSON token

The point where this could be an issue is numbers. Once something is
identified as a number we can

1. Parse it immediately, using a long and falling back to a BigInteger. For
decimals it's harder to know whether to use a double or BigDecimal
internally. In the library I've been copy-pasting from to build a
prototype, that last one is an explicit option and it defaults to doubles
for the whole parse.
2. Store the string and parse it upon request. We can still model it as a
Json.Number, but the
work of interpreting is deferred.
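The two strategies can be sketched as follows (class and method names are mine, not a real API; the eager path handles integral literals only, since a real parser would branch on '.', 'e', 'E' first):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class NumberStrategies {

    // Strategy 1: parse eagerly -- a long when it fits, BigInteger otherwise.
    static Object parseEager(String literal) {
        try {
            return Long.parseLong(literal);
        } catch (NumberFormatException tooBig) {
            return new BigInteger(literal);
        }
    }

    // Strategy 2: keep the raw literal and defer interpretation to the caller,
    // so each consumer decides how to read the same digits.
    record LazyNumber(String literal) {
        long asLong()             { return Long.parseLong(literal); }
        double asDouble()         { return Double.parseDouble(literal); }
        BigDecimal asBigDecimal() { return new BigDecimal(literal); }
    }
}
```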

But in general, making a tree of json values doesn't particularly affect
our ability to interpret it
in a certain way. That interpretation is just positional. That's just as
true as when making assertions
in the form of class structure and field types as it is when making
assertions in the form of code.[2]

record Thing(Instant a) {}

// vs.

Decoder.field(json, "a", a -> Instant.ofEpochSecond(Decoder.long_(a)))

If anything, using a named type as a lookup key for a deserialization
function is the less obvious
way to do this.

> I’m not sure how to square this circle
> I don’t like the idea of shipping a non-data-binding JSON API in the core
libs.

I think the way to cube this rhombus is to find ways to like the idea of a
non-data-binding JSON API. ¯\_(ツ)_/¯

My personal journey with that is reaching its terminus here, I think.

Look on the bright side though - there are legit upsides to explicit tree
plucking!

Yeah, the friction per field is slightly higher, but the relative friction
of custom types, or multiple construction methods for a particular type, or
maintaining compatibility with legacy representations, or even just
handling a top-level list of things - it's much lower.

And all that complexity - that an instant is made by looking for a long or
that it is parsed from a string in a
particular format - it lives in Java code you can see, touch, feel and
taste.

I know "nobody does this"[2] but it's not that bad, actually.

[1]: I do apologize for the code sketches consistently being "what I think
an interaction with a tree api should look like."
That is what I have been thinking about for a while so it's hard to resist.
[2]: https://youtu.be/dOgfWXw9VrI?t=1225


Re: JEP-198 - Lets start talking about JSON

2022-12-15 Thread Ethan McCue
> are pure JSON parsers really the go-to for most people?

Depends on what you mean by JSON parsers and it depends on what you mean by
people.

To the best of my knowledge, both Python and JavaScript do not include
streaming, databinding, or path-navigation capabilities in their JSON
parsers.



Re: JEP-198 - Lets start talking about JSON

2022-12-15 Thread Ethan McCue
> The 95%+ use case for working with JSON for your average java coder is
best done with data binding.

To be brave yet controversial: I'm not sure this is necessarily true.

I will elaborate and respond to the other points after a hot cocoa, but the
last point is part of why I think that tree-crawling needs _something_
better as an API to fit the bill.

With my sketch that set of requirements would be represented as

record Thing(
    List<Long> xs
) {
    static Thing fromJson(Json json) {
        var defaultList = List.of(0L);
        return new Thing(Decoder.optionalNullableField(
            json,
            "xs",
            Decoder.oneOf(
                Decoder.array(Decoder.oneOf(
                    x -> Long.parseLong(Decoder.string(x)),
                    Decoder::long_
                )),
                Decoder.null_(defaultList),
                x -> List.of(Decoder.long_(x))
            ),
            defaultList
        ));
    }
}

Which isn't amazing at first glance, but also

   {}
   {"xs": null}
   {"xs": 5}
   {"xs": [5]}
   {"xs": ["5"]}
   {"xs": [1, "2", "3"]}

these are some wildly varied structures. You could make a solid argument
that something which silently treats these all the same is
a bad API for all the reasons you would consider it a good one.

On Thu, Dec 15, 2022 at 6:18 PM Johannes Lichtenberger <
lichtenberger.johan...@gmail.com> wrote:

> I'll have to read the whole thing, but are pure JSON parsers really the
> go-to for most people? I'm a big advocate of providing also something
> similar to XPath/XQuery and that's IMHO JSONiq (90% XQuery). I might be
> biased, of course, as I'm working on Brackit[1] in my spare time (which is
> also a query compiler and intended to be used with proven optimizations by
> document stores / JSON stores), but also can be used as an in-memory query
> engine.
>
> kind regards
> Johannes
>
> [1] https://github.com/sirixdb/brackit
>

Re: JEP-198 - Lets start talking about JSON

2022-12-15 Thread Johannes Lichtenberger
I'll have to read the whole thing, but are pure JSON parsers really the
go-to for most people? I'm a big advocate of also providing something
similar to XPath/XQuery, and IMHO that's JSONiq (90% XQuery). I might be
biased, of course, as I'm working on Brackit[1] in my spare time (which is
also a query compiler, intended to be used with proven optimizations by
document stores / JSON stores, but can also be used as an in-memory query
engine).

kind regards
Johannes

[1] https://github.com/sirixdb/brackit


Re: JEP-198 - Lets start talking about JSON

2022-12-15 Thread Reinier Zwitserloot
A recent Advent-of-Code puzzle also made me double check the support of
JSON in the java core libs and it is indeed a curious situation that the
java core libs don’t cater to it particularly well.

However, I’m not seeing an easy way forward to try to close this hole in
the core library offerings.

If you need to stream huge swaths of JSON, generally there’s a clear unit
size that you can just databind. Something like:

String jsonStr = """ { "version": 5, "data": [
  -- 1 million relatively small records in this list --
  ] } """;


The usual swath of JSON parsers tend to support this (giving you a stream
of java instances created by databinding those small records one by one),
or if not, the best move forward is presumably to file a pull request with
those projects; the java.util.logging experiment shows that trying to
‘core-librarize’ needs that the community at large already fulfills with
third party deps isn’t a good move, especially if the core library variant
tries to oversimplify to avoid the trap of being too opinionated (which
core libs shouldn’t be). In other words, the need for ‘stream this JSON for
me’ style APIs is even more exotic than Ethan is suggesting.

I see a fundamental problem here:


   - The 95%+ use case for working with JSON for your average java coder is
   best done with data binding.
   - core libs doesn’t want to provide it, partly because it’s got a large
   design space, partly because the field’s already covered by GSON and
   Jackson-json; java.util.logging proves this doesn’t work. At least, I
   gather that’s what Ethan thinks and I agree with this assessment.
   - A language that claims to be “batteries included” that doesn’t ship
   with a JSON parser in this era is dubious, to say the least.


I’m not sure how to square this circle. Hence it feels like core-libs needs
to hold some more fundamental debates first:


   - Maybe it’s time to state in a more or less official decree that
   well-established, large design space jobs will remain the purview of
   dependencies no matter how popular it has become, unless being part of the
   core-libs adds something more fundamental the third party deps cannot bring
   to the table (such as language integration), or the community standardizes
   on a single library (JSR310’s story, more or less). JSON parsing would
   qualify as ‘well-established’ (GSON and Jackson) and ‘large design space’
   as Ethan pointed out.
   - Given that 99% of java projects, even really simple ones, start with
   maven/gradle and a list of deps, is that really a problem?


I’m honestly not sure what the right answer is. On one hand, the npm
ecosystem seems to be doing very well even though their ‘batteries
included’ situation is an utter shambles. Then again, the notion that your
average nodejs project includes 10x+ more dependencies than other languages
is likely a significant part of the security clown fiesta going on over
there as far as 3rd party deps is concerned, so by no means should java
just blindly emulate their solutions.

I don’t like the idea of shipping a non-data-binding JSON API in the core
libs. The root issue with JSON is that you just can’t tell how to interpret
any given JSON token, because that’s not how JSON is used in practice. What
does 5 mean? Could be that I’m to take that as an int, or as a double, or
perhaps even as a j.t.Instant (epoch-millis), and defaulting behaviour
(similar to j.u.Map’s .getOrDefault) is *very* convenient when parsing most
JSON out there in the real world - omitting k/v pairs whose value is still
at its default is very common. That’s what makes those databind libraries so
enticing: instead of trying to pattern match my way into this behaviour:


   - If the element isn’t there at all or null, give me a list-of-longs
   with a single 0 in it.
   - If the element is a number, make me a list-of-longs with one value in
   it: that number, as a long.
   - If the element is a string, parse it into a long, then get me a list
   with this one long value (because IEEE double rules mean sometimes you
   have to put these things in string form or they get mangled by
   javascript-eval style parsers).


And yet the above is quite common, and can easily be done by a databinder,
which sees you want a List<Long> for a field whose default value is
List.of(1L), and, armed with that knowledge, can translate the JSON into
Java in that way.
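For concreteness, pattern matching one's way into those three rules over the toy JsonValue hierarchy Brian sketched earlier in the thread might look like this (the types and the decode method are illustrative assumptions, not an existing API; JsonBoolean is omitted for brevity):

```java
import java.util.List;
import java.util.Map;

// Trimmed-down version of the JsonValue sketch from earlier in the thread.
sealed interface JsonValue {}
record JsonString(String s) implements JsonValue {}
record JsonNumber(double d) implements JsonValue {}
record JsonNull() implements JsonValue {}
record JsonArray(List<JsonValue> values) implements JsonValue {}
record JsonObject(Map<String, JsonValue> pairs) implements JsonValue {}

class Xs {
    // Absent or null -> a list-of-longs with a single 0 in it.
    // Number         -> a one-element list holding that number, as a long.
    // String         -> parsed into a long, then wrapped in a list.
    static List<Long> decode(JsonObject doc) {
        JsonValue xs = doc.pairs().get("xs");
        if (xs == null || xs instanceof JsonNull) return List.of(0L);
        return switch (xs) {
            case JsonNumber(double d) -> List.of((long) d);
            case JsonString(String s) -> List.of(Long.parseLong(s));
            default -> throw new IllegalArgumentException("unexpected shape: " + xs);
        };
    }
}
```

This needs Java 21 record patterns, and the awkwardness Reinier is pointing at is visible: every field with rules like these requires another hand-written switch, which is exactly the boilerplate a databinder hides.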

You don’t *need* databinding to cater to this idea: you could, for example,
have a jsonNode.asLong(123) method that would even parse a string if need
be. But this has nothing to do with pattern matching either.
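A sketch of such a lenient accessor, assuming a toy node type; the method shape follows the jsonNode.asLong(123) idea above, and falling back to the default when a string fails to parse is my own assumption:

```java
// Toy node wrapping one already-parsed JSON value (not a real API).
record JsonNode(Object value) {
    // Returns the value as a long, parsing a string if need be;
    // anything else (including null/absent) yields the default.
    long asLong(long defaultValue) {
        return switch (value) {
            case Number n -> n.longValue();
            case String s -> {
                try {
                    yield Long.parseLong(s);
                } catch (NumberFormatException e) {
                    yield defaultValue;
                }
            }
            case null, default -> defaultValue;
        };
    }
}
```

The design question is exactly the one Reinier raises: this one method silently accepts numbers, numeric strings, and absent values alike, which is either convenient or a correctness hazard depending on your point of view.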

 --Reinier Zwitserloot


On 15 Dec 2022 at 21:30:17, Ethan McCue  wrote:

> I'm writing this to drive some forward motion and to nerd-snipe those who
> know better than I do into putting their thoughts into words.
>
> There are three ways to process JSON[1]
> - Streaming (Push or Pull)
> - Traversing a Tree (Realized or Lazy)
> - Declarative Databind (N ways)
>
> Of these, JEP-198 explicitly ruled out