Link to the proxy which I forgot to include

https://gist.github.com/bowbahdoe/eb29d172351162408eab5e4ee9d84fec

On Tue, Feb 28, 2023 at 12:16 PM Ethan McCue <et...@mccue.dev> wrote:

> As an update to my character arc, I documented and wrote up an explanation
> for the prototype library I was working on.[1]
>
> And I've gotten a good deal of feedback on reddit[2] and in private.
>
> I think it's relevant to the conversation here in the sense of
>
> - There are more of rzwitserloot's objections to read on the general
> concept of JSON as a built-in.[3]
> - There are a lot of well-reasoned objections to the manner in which I am
> interpreting a JSON tree, as well
> as objections to the usage of a tree as the core. JEP 198's current
> writeup (which I know is subject to a rewrite/retraction)
> presumes that an immutable tree would be the core data structure.
> - The peanut gallery might be interested in a "base" to implement whatever
> their take on an API should be.
>
> For that last category, I have a method-handle proxy written up for those
> who want to try the "push parser into a pull parser"
> transformation I alluded to in my first email of this thread.
>
> [1]: https://mccue.dev/pages/2-26-23-json
> [2]:
> https://www.reddit.com/r/java/comments/11cyoh1/please_try_my_json_library/
> [3]: Including one that reddit took down, but can be seen through reveddit
> https://www.reveddit.com/y/rzwitserloot/?after=t1_jacpsj6&limit=1&sort=new&show=t1_jaa3x0q&removal_status=all
>
> On Fri, Dec 16, 2022 at 6:23 PM Ethan McCue <et...@mccue.dev> wrote:
>
>> Sidenote about "Project Galahad" - I know Graal uses json for a few
>> things including a reflection-config.json. Food for thought.
>>
>> > the java.util.log experiment shows that trying to ‘core-librarize’
>> needs that the community at large already fulfills with third party deps
>> isn’t a good move,
>>
>> I, personally, do not have much historical context for java.util.log.
>> What feels distinct about providing a JSON api is that
>> logging is an implicitly global thing. If a JSON api doesn't fill all
>> ecosystem niches, multiple can be used alongside
>> each other.
>>
>> > The root issue with JSON is that you just can’t tell how to interpret
>> any given JSON token
>>
>> The point where this could be an issue is numbers. Once something is
>> identified as a number we can
>>
>> 1. Parse it immediately, using a long and falling back to a BigInteger.
>> For decimals it's harder to know whether to use a double or BigDecimal
>> internally. In the library I've been copy-pasting from to build a
>> prototype, that last one is an explicit option and it defaults to
>> doubles for the whole parse.
>> 2. Store the string and parse it upon request. We can still model it as
>> a Json.Number, but the work of interpreting is deferred.
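Option 2 can be sketched in a few lines. All names below are hypothetical stand-ins, not part of the prototype's API; the point is just that the raw literal survives until a caller asks for a specific interpretation.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch of option 2: keep the literal exactly as it appeared in the
// document and interpret it only on request.
final class JsonNumber {
    private final String raw;

    JsonNumber(String raw) {
        this.raw = raw;
    }

    long longValue() {
        return Long.parseLong(raw);
    }

    BigInteger bigIntegerValue() {
        return new BigInteger(raw);
    }

    double doubleValue() {
        return Double.parseDouble(raw);
    }

    BigDecimal bigDecimalValue() {
        return new BigDecimal(raw); // lossless, unlike doubleValue()
    }
}
```

The long vs. BigDecimal question then becomes the caller's, per call site, rather than a global parser setting.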
>>
>> But in general, making a tree of json values doesn't particularly affect
>> our ability to interpret it in a certain way. That interpretation is
>> just positional. That's just as true when making assertions
>> in the form of class structure and field types as it is when making
>> assertions in the form of code.[1]
>>
>>     record Thing(Instant a) {}
>>
>>     // vs.
>>
>>     Decoder.field(json, "a", a -> Instant.ofEpochSecond(Decoder.long_(a)))
>>
>> If anything, using a named type as a lookup key for a deserialization
>> function is the less obvious
>> way to do this.
>>
>> > I’m not sure how to square this circle
>> > I don’t like the idea of shipping a non-data-binding JSON API in the
>> core libs.
>>
>> I think the way to cube this rhombus is to find ways to like the idea of
>> a non-data-binding JSON API. ¯\_(ツ)_/¯
>>
>> My personal journey with that is reaching its terminus here I think.
>>
>> Look on the bright side though - there are legit upsides to explicit tree
>> plucking!
>>
>> Yeah, the friction per field is slightly higher, but the relative
>> friction of custom types, or multiple construction methods for a
>> particular type, or maintaining compatibility with
>> legacy representations, or even just handling a top level list of things
>> - it's much lower.
>>
>> And all that complexity - that an instant is made by looking for a long
>> or that it is parsed from a string in a
>> particular format - it lives in Java code you can see, touch, feel and
>> taste.
>>
>> I know "nobody does this"[2] but it's not that bad, actually.
>>
>> [1]: I do apologize for the code sketches consistently being "what I
>> think an interaction with a tree api should look like."
>> That is what I have been thinking about for a while so it's hard to
>> resist.
>> [2]: https://youtu.be/dOgfWXw9VrI?t=1225
>>
>> On Thu, Dec 15, 2022 at 6:34 PM Ethan McCue <et...@mccue.dev> wrote:
>>
>>> > are pure JSON parsers really the go-to for most people?
>>>
>>> Depends on what you mean by JSON parsers and it depends on what you mean
>>> by people.
>>>
>>> To the best of my knowledge, both Python and JavaScript do not include
>>> streaming, databinding, or path navigation capabilities in their json
>>> parsers.
>>>
>>>
>>> On Thu, Dec 15, 2022 at 6:26 PM Ethan McCue <et...@mccue.dev> wrote:
>>>
>>>> > The 95%+ use case for working with JSON for your average java coder
>>>> is best done with data binding.
>>>>
>>>> To be brave yet controversial: I'm not sure this is necessarily true.
>>>>
>>>> I will elaborate and respond to the other points after a hot cocoa, but
>>>> the last point is part of why I think that tree-crawling needs _something_
>>>> better as an API to fit the bill.
>>>>
>>>> With my sketch that set of requirements would be represented as
>>>>
>>>>     record Thing(
>>>>         List<Long> xs
>>>>     ) {
>>>>         static Thing fromJson(Json json) {
>>>>             var defaultList = List.of(0L);
>>>>             return new Thing(Decoder.optionalNullableField(
>>>>                 json,
>>>>                 "xs",
>>>>                 Decoder.oneOf(
>>>>                     Decoder.array(Decoder.oneOf(
>>>>                         x -> Long.parseLong(Decoder.string(x)),
>>>>                         Decoder::long_
>>>>                     )),
>>>>                     Decoder.null_(defaultList),
>>>>                     x -> List.of(Decoder.long_(x))
>>>>                 ),
>>>>                 defaultList
>>>>             ));
>>>>         }
>>>>     }
>>>>
>>>> Which isn't amazing at first glance, but also
>>>>
>>>>    {}
>>>>    {"xs": null}
>>>>    {"xs": 5}
>>>>    {"xs": [5]}
>>>>    {"xs": ["5"]}
>>>>    {"xs": [1, "2", "3"]}
>>>>
>>>> these are some wildly varied structures. You could make a solid
>>>> argument that something which silently treats these all the same is
>>>> a bad API for all the reasons you would consider it a good one.
>>>>
>>>> On Thu, Dec 15, 2022 at 6:18 PM Johannes Lichtenberger <
>>>> lichtenberger.johan...@gmail.com> wrote:
>>>>
>>>>> I'll have to read the whole thing, but are pure JSON parsers really
>>>>> the go-to for most people? I'm a big advocate of also providing
>>>>> something similar to XPath/XQuery, and that's IMHO JSONiq (90% XQuery). I might be
>>>>> biased, of course, as I'm working on Brackit[1] in my spare time (which is
>>>>> also a query compiler and intended to be used with proven optimizations by
>>>>> document stores / JSON stores), but also can be used as an in-memory query
>>>>> engine.
>>>>>
>>>>> kind regards
>>>>> Johannes
>>>>>
>>>>> [1] https://github.com/sirixdb/brackit
>>>>>
>>>>> Am Do., 15. Dez. 2022 um 23:03 Uhr schrieb Reinier Zwitserloot <
>>>>> rein...@zwitserloot.com>:
>>>>>
>>>>>> A recent Advent-of-Code puzzle also made me double-check the support
>>>>>> of JSON in the java core libs and it is indeed a curious situation
>>>>>> that the java core libs don’t cater to it particularly well.
>>>>>>
>>>>>> However, I’m not seeing an easy way forward to try to close this hole
>>>>>> in the core library offerings.
>>>>>>
>>>>>> If you need to stream huge swaths of JSON, generally there’s a clear
>>>>>> unit size that you can just databind. Something like:
>>>>>>
>>>>>> String jsonStr = """ { "version": 5, "data": [
>>>>>>   -- 1 million relatively small records in this list --
>>>>>>   ] } """;
>>>>>>
>>>>>>
>>>>>> The usual swath of JSON parsers tend to support this (giving you a
>>>>>> stream of java instances created by databinding those small records
>>>>>> one by one), or if not, the best move forward is presumably to file a
>>>>>> pull request with those projects; the java.util.log experiment shows
>>>>>> that trying to ‘core-librarize’ needs that the community at large
>>>>>> already fulfills with third party deps isn’t a good move, especially
>>>>>> if the core library variant tries to oversimplify to avoid the trap of
>>>>>> being too opinionated (which core libs shouldn’t be). In other words,
>>>>>> the need for ’stream this JSON for me’ style APIs is even more exotic
>>>>>> than Ethan is suggesting.
>>>>>>
>>>>>> I see a fundamental problem here:
>>>>>>
>>>>>>
>>>>>>    - The 95%+ use case for working with JSON for your average java
>>>>>>    coder is best done with data binding.
>>>>>>    - core libs doesn’t want to provide it, partly because it’s got a
>>>>>>    large design space, partly because the field’s already covered by
>>>>>>    GSON and Jackson-json; java.util.log proves this doesn’t work. At
>>>>>>    least, I gather that’s what Ethan thinks and I agree with this
>>>>>>    assessment.
>>>>>>    - A language that claims to be “batteries included” that doesn’t
>>>>>>    ship with a JSON parser in this era is dubious, to say the least.
>>>>>>
>>>>>>
>>>>>> I’m not sure how to square this circle. Hence it feels like core-libs
>>>>>> needs to hold some more fundamental debates first:
>>>>>>
>>>>>>
>>>>>>    - Maybe it’s time to state in a more or less official decree that
>>>>>>    well-established, large design space jobs will remain the purview of
>>>>>>    dependencies no matter how popular they get, unless being part of the
>>>>>>    core-libs adds something more fundamental that third party deps cannot
>>>>>>    bring to the table (such as language integration), or the community
>>>>>>    standardizes on a single library (JSR310’s story, more or less). JSON
>>>>>>    parsing would qualify as ‘well-established’ (GSON and Jackson) and
>>>>>>    ‘large design space’ as Ethan pointed out.
>>>>>>    - Given that 99% of java projects, even really simple ones, start
>>>>>>    with maven/gradle and a list of deps, is that really a problem?
>>>>>>
>>>>>>
>>>>>> I’m honestly not sure what the right answer is. On one hand, the npm
>>>>>> ecosystem seems to be doing very well even though their ‘batteries
>>>>>> included’ situation is an utter shambles. Then again, the notion that
>>>>>> your average nodejs project includes 10x+ more dependencies than other
>>>>>> languages is likely a significant part of the security clown fiesta
>>>>>> going on over there as far as 3rd party deps is concerned, so by no
>>>>>> means should java just blindly emulate their solutions.
>>>>>>
>>>>>> I don’t like the idea of shipping a non-data-binding JSON API in the
>>>>>> core libs. The root issue with JSON is that you just can’t tell how to
>>>>>> interpret any given JSON token, because that’s not how JSON is used in
>>>>>> practice. What does 5 mean? Could be that I’m to take that as an int,
>>>>>> or as a double, or perhaps even as a j.t.Instant (epoch-millis), and
>>>>>> defaulting behaviour (similar to j.u.Map’s .getOrDefault) is *very*
>>>>>> convenient for parsing most JSON out there in the real world - omitting k/v
>>>>>> pairs whose value is still on default is very common). That’s what makes
>>>>>> those databind libraries so enticing: Instead of trying to pattern
>>>>>> match my way into this behaviour:
>>>>>>
>>>>>>
>>>>>>    - If the element isn’t there at all or null, give me a
>>>>>>    list-of-longs with a single 0 in it.
>>>>>>    - If the element is a number, make me a list-of-longs with 1
>>>>>>    value in it, that is that number, as long.
>>>>>>    - If the element is a string, parse it into a long, then get me a
>>>>>>    list with this one long value (because IEEE double rules mean
>>>>>>    sometimes you have to put these things in string form or they get
>>>>>>    mangled by javascript-eval style parsers).
>>>>>>
>>>>>>
>>>>>> And yet the above is quite common, and can easily be done by a
>>>>>> databinder, which sees you want a List<Long> for a field whose
>>>>>> default value is List.of(0L), and, armed with that knowledge, can
>>>>>> transit the JSON into java in that way.
>>>>>>
>>>>>> You don’t *need* databinding to cater to this idea: You could for
>>>>>> example have a jsonNode.asLong(123) method that would parse a string
>>>>>> if need be, even. But this has nothing to do with pattern matching 
>>>>>> either.
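A defaulting accessor of that shape is easy to sketch. JsonNode below is a hypothetical stand-in for whatever node type such an API would expose, not an existing class:

```java
// Hypothetical node type: asLong(fallback) parses a string if need be and
// returns the fallback for null, missing, or unparseable values.
final class JsonNode {
    private final Object value; // Long, String, or null in this sketch

    JsonNode(Object value) {
        this.value = value;
    }

    long asLong(long fallback) {
        if (value instanceof Long l) {
            return l;
        }
        if (value instanceof String s) {
            try {
                return Long.parseLong(s);
            } catch (NumberFormatException e) {
                return fallback;
            }
        }
        return fallback;
    }
}
```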
>>>>>>
>>>>>>  --Reinier Zwitserloot
>>>>>>
>>>>>>
>>>>>> On 15 Dec 2022 at 21:30:17, Ethan McCue <et...@mccue.dev> wrote:
>>>>>>
>>>>>>> I'm writing this to drive some forward motion and to nerd-snipe
>>>>>>> those who know better than I do into putting their thoughts into words.
>>>>>>>
>>>>>>> There are three ways to process JSON[1]
>>>>>>> - Streaming (Push or Pull)
>>>>>>> - Traversing a Tree (Realized or Lazy)
>>>>>>> - Declarative Databind (N ways)
>>>>>>>
>>>>>>> Of these, JEP-198 explicitly ruled out providing "JAXB style type
>>>>>>> safe data binding."
>>>>>>>
>>>>>>> No justification is given, but if I had to insert my own: mapping
>>>>>>> the Json model to/from the Java/JVM object model is a cursed combo of
>>>>>>> - Huge possible design space
>>>>>>> - Unpalatably large surface for backwards compatibility
>>>>>>> - Serialization! Boo![2]
>>>>>>>
>>>>>>> So for an artifact like the JDK, it probably doesn't make sense to
>>>>>>> include. That tracks.
>>>>>>> It won't make everyone happy, people like databind APIs, but it
>>>>>>> tracks.
>>>>>>>
>>>>>>> So for the "read flow" these are the things to figure out.
>>>>>>>
>>>>>>>                 | Should Provide? | Intended User(s) |
>>>>>>> ----------------+-----------------+------------------+
>>>>>>>  Streaming Push |                 |                  |
>>>>>>> ----------------+-----------------+------------------+
>>>>>>>  Streaming Pull |                 |                  |
>>>>>>> ----------------+-----------------+------------------+
>>>>>>>  Realized Tree  |                 |                  |
>>>>>>> ----------------+-----------------+------------------+
>>>>>>>  Lazy Tree      |                 |                  |
>>>>>>> ----------------+-----------------+------------------+
>>>>>>>
>>>>>>> At which point, we should talk about what "meets needs of Java
>>>>>>> developers using JSON" implies.
>>>>>>>
>>>>>>> JSON is ubiquitous. Most kinds of software us schmucks write could
>>>>>>> have a reason to interact with it.
>>>>>>> The full set of "user personas" therefore aren't practical for me to
>>>>>>> talk about.[3]
>>>>>>>
>>>>>>> JSON documents, however, are not so varied.
>>>>>>>
>>>>>>> - There are small ones (1-10kb)
>>>>>>> - There are medium ones (10-1000kb)
>>>>>>> - There are big ones (1000kb-???)
>>>>>>>
>>>>>>> - There are shallow ones
>>>>>>> - There are deep ones
>>>>>>>
>>>>>>> So that feels like an easier direction to talk about it from.
>>>>>>>
>>>>>>>
>>>>>>> This repo[4] has some convenient toy examples of how some of those
>>>>>>> APIs look in libraries
>>>>>>> in the ecosystem. Specifically the Streaming Pull and Realized Tree
>>>>>>> models.
>>>>>>>
>>>>>>>         User r = new User();
>>>>>>>         while (true) {
>>>>>>>             JsonToken token = reader.peek();
>>>>>>>             switch (token) {
>>>>>>>                 case BEGIN_OBJECT:
>>>>>>>                     reader.beginObject();
>>>>>>>                     break;
>>>>>>>                 case END_OBJECT:
>>>>>>>                     reader.endObject();
>>>>>>>                     return r;
>>>>>>>                 case NAME:
>>>>>>>                     String fieldname = reader.nextName();
>>>>>>>                     switch (fieldname) {
>>>>>>>                         case "id":
>>>>>>>                             r.setId(reader.nextString());
>>>>>>>                             break;
>>>>>>>                         case "index":
>>>>>>>                             r.setIndex(reader.nextInt());
>>>>>>>                             break;
>>>>>>>                         ...
>>>>>>>                         case "friends":
>>>>>>>                             r.setFriends(new ArrayList<>());
>>>>>>>                             Friend f = null;
>>>>>>>                             boolean carryOn = true;
>>>>>>>                             while (carryOn) {
>>>>>>>                                 token = reader.peek();
>>>>>>>                                 switch (token) {
>>>>>>>                                     case BEGIN_ARRAY:
>>>>>>>                                         reader.beginArray();
>>>>>>>                                         break;
>>>>>>>                                     case END_ARRAY:
>>>>>>>                                         reader.endArray();
>>>>>>>                                         carryOn = false;
>>>>>>>                                         break;
>>>>>>>                                     case BEGIN_OBJECT:
>>>>>>>                                         reader.beginObject();
>>>>>>>                                         f = new Friend();
>>>>>>>                                         break;
>>>>>>>                                     case END_OBJECT:
>>>>>>>                                         reader.endObject();
>>>>>>>                                         r.getFriends().add(f);
>>>>>>>                                         break;
>>>>>>>                                     case NAME:
>>>>>>>                                         String fn = reader.nextName();
>>>>>>>                                         switch (fn) {
>>>>>>>                                             case "id":
>>>>>>>                                                 f.setId(reader.nextString());
>>>>>>>                                                 break;
>>>>>>>                                             case "name":
>>>>>>>                                                 f.setName(reader.nextString());
>>>>>>>                                                 break;
>>>>>>>                                         }
>>>>>>>                                         break;
>>>>>>>                                 }
>>>>>>>                             }
>>>>>>>                             break;
>>>>>>>                     }
>>>>>>>             }
>>>>>>>
>>>>>>> I think it's not hard to argue that the streaming apis are brutalist.
>>>>>>> The above is Gson, but Jackson, moshi, etc.
>>>>>>> seem at least morally equivalent.
>>>>>>>
>>>>>>> It's hard to write, hard to write *correctly*, and there is a
>>>>>>> curious propensity towards pairing it
>>>>>>> with anemic, mutable models.
>>>>>>>
>>>>>>> That being said, it handles big documents and deep documents really
>>>>>>> well. It also performs
>>>>>>> pretty darn well and is good enough as a "fallback" when the
>>>>>>> intended user experience
>>>>>>> is through something like databind.
>>>>>>>
>>>>>>> So what could we do meaningfully better with the language we have
>>>>>>> today/will have tomorrow?
>>>>>>>
>>>>>>> - Sealed interfaces + Pattern matching could give a nicer model for
>>>>>>> tokens
>>>>>>>
>>>>>>>         sealed interface JsonToken {
>>>>>>>             record Field(String name) implements JsonToken {}
>>>>>>>             record BeginArray() implements JsonToken {}
>>>>>>>             record EndArray() implements JsonToken {}
>>>>>>>             record BeginObject() implements JsonToken {}
>>>>>>>             record EndObject() implements JsonToken {}
>>>>>>>             // ...
>>>>>>>         }
>>>>>>>
>>>>>>>         // ...
>>>>>>>
>>>>>>>         User r = new User();
>>>>>>>         while (true) {
>>>>>>>             JsonToken token = reader.peek();
>>>>>>>             switch (token) {
>>>>>>>                 case BeginObject __:
>>>>>>>                     reader.beginObject();
>>>>>>>                     break;
>>>>>>>                 case EndObject __:
>>>>>>>                     reader.endObject();
>>>>>>>                     return r;
>>>>>>>                 case Field("id"):
>>>>>>>                     r.setId(reader.nextString());
>>>>>>>                     break;
>>>>>>>                 case Field("index"):
>>>>>>>                     r.setIndex(reader.nextInt());
>>>>>>>                     break;
>>>>>>>
>>>>>>>                 // ...
>>>>>>>
>>>>>>>                 case Field("friends"):
>>>>>>>                     r.setFriends(new ArrayList<>());
>>>>>>>                     Friend f = null;
>>>>>>>                     boolean carryOn = true;
>>>>>>>                     while (carryOn) {
>>>>>>>                         token = reader.peek();
>>>>>>>                         switch (token) {
>>>>>>>                 // ...
>>>>>>>
>>>>>>> - Value classes can make it all more efficient
>>>>>>>
>>>>>>>         sealed interface JsonToken {
>>>>>>>             value record Field(String name) implements JsonToken {}
>>>>>>>             value record BeginArray() implements JsonToken {}
>>>>>>>             value record EndArray() implements JsonToken {}
>>>>>>>             value record BeginObject() implements JsonToken {}
>>>>>>>             value record EndObject() implements JsonToken {}
>>>>>>>             // ...
>>>>>>>         }
>>>>>>>
>>>>>>> - (Fun One) We can transform a simpler-to-write push parser into a
>>>>>>> pull parser with Coroutines
>>>>>>>
>>>>>>>     This is just a toy we could play with while making something in
>>>>>>> the JDK. I'm pretty sure
>>>>>>>     we could make a parser which feeds into something like
>>>>>>>
>>>>>>>         interface Listener {
>>>>>>>             void onObjectStart();
>>>>>>>             void onObjectEnd();
>>>>>>>             void onArrayStart();
>>>>>>>             void onArrayEnd();
>>>>>>>             void onField(String name);
>>>>>>>             // ...
>>>>>>>         }
>>>>>>>
>>>>>>>     and invert a loop like
>>>>>>>
>>>>>>>         while (true) {
>>>>>>>             char c = next();
>>>>>>>             switch (c) {
>>>>>>>                 case '{':
>>>>>>>                     listener.onObjectStart();
>>>>>>>                     // ...
>>>>>>>                 // ...
>>>>>>>             }
>>>>>>>         }
>>>>>>>
>>>>>>>     by putting a Coroutine.yield in the callback.
>>>>>>>
>>>>>>>     That might be a meaningful simplification in code structure, I
>>>>>>> don't know enough to say.
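Until coroutines land, the same inversion can be approximated with a thread and a blocking queue, where put() plays the role of Coroutine.yield. This is a toy sketch with hypothetical names, using plain strings as stand-in events:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Invert a push-style parser (one that fires events at a Consumer<String>)
// into a pull-style reader by running the push side on its own thread.
// Each events.put(...) blocks until the consumer pulls: the "yield".
final class PushToPull {
    private static final String END = "\u0000END";
    private final BlockingQueue<String> events = new ArrayBlockingQueue<>(16);

    PushToPull(Consumer<Consumer<String>> pushParser) {
        Thread producer = new Thread(() -> {
            pushParser.accept(event -> {
                try {
                    events.put(event); // suspends the push side
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            try {
                events.put(END); // signal that the push side finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    /** Pulls the next event, or null when the push side is done. */
    String next() {
        try {
            String event = events.take();
            return END.equals(event) ? null : event;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

A real coroutine (or virtual-thread) version would avoid the platform thread and the bounded queue, but the control-flow inversion is the same.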
>>>>>>>
>>>>>>> But, I think there are some hard questions like
>>>>>>>
>>>>>>> - Is the intent[5] to make the backing parser for ecosystem databind
>>>>>>> apis?
>>>>>>> - Is the intent that users who want to handle big/deep documents
>>>>>>> fall back to this?
>>>>>>> - Are those new language features / conveniences enough to offset
>>>>>>> the cost of committing to a new api?
>>>>>>> - To whom exactly does a low level api provide value?
>>>>>>> - What benefit is standardization in the JDK?
>>>>>>>
>>>>>>> and just generally - who would be the consumer(s) of this?
>>>>>>>
>>>>>>> The other kind of API still on the table is a Tree. There are two
>>>>>>> ways to handle this
>>>>>>>
>>>>>>> 1. Load it into `Object`. Use a bunch of instanceof checks/casts to
>>>>>>> confirm what it actually is.
>>>>>>>
>>>>>>>         Object v;
>>>>>>>         User u = new User();
>>>>>>>
>>>>>>>         if ((v = jso.get("id")) != null) {
>>>>>>>             u.setId((String) v);
>>>>>>>         }
>>>>>>>         if ((v = jso.get("index")) != null) {
>>>>>>>             u.setIndex(((Long) v).intValue());
>>>>>>>         }
>>>>>>>         if ((v = jso.get("guid")) != null) {
>>>>>>>             u.setGuid((String) v);
>>>>>>>         }
>>>>>>>         if ((v = jso.get("isActive")) != null) {
>>>>>>>             u.setIsActive(((Boolean) v));
>>>>>>>         }
>>>>>>>         if ((v = jso.get("balance")) != null) {
>>>>>>>             u.setBalance((String) v);
>>>>>>>         }
>>>>>>>         // ...
>>>>>>>         if ((v = jso.get("latitude")) != null) {
>>>>>>>             u.setLatitude(v instanceof BigDecimal
>>>>>>>                     ? ((BigDecimal) v).doubleValue() : (Double) v);
>>>>>>>         }
>>>>>>>         if ((v = jso.get("longitude")) != null) {
>>>>>>>             u.setLongitude(v instanceof BigDecimal
>>>>>>>                     ? ((BigDecimal) v).doubleValue() : (Double) v);
>>>>>>>         }
>>>>>>>         if ((v = jso.get("greeting")) != null) {
>>>>>>>             u.setGreeting((String) v);
>>>>>>>         }
>>>>>>>         if ((v = jso.get("favoriteFruit")) != null) {
>>>>>>>             u.setFavoriteFruit((String) v);
>>>>>>>         }
>>>>>>>         if ((v = jso.get("tags")) != null) {
>>>>>>>             List<Object> jsonarr = (List<Object>) v;
>>>>>>>             u.setTags(new ArrayList<>());
>>>>>>>             for (Object vi : jsonarr) {
>>>>>>>                 u.getTags().add((String) vi);
>>>>>>>             }
>>>>>>>         }
>>>>>>>         if ((v = jso.get("friends")) != null) {
>>>>>>>             List<Object> jsonarr = (List<Object>) v;
>>>>>>>             u.setFriends(new ArrayList<>());
>>>>>>>             for (Object vi : jsonarr) {
>>>>>>>                 Map<String, Object> jso0 = (Map<String, Object>) vi;
>>>>>>>                 Friend f = new Friend();
>>>>>>>                 f.setId((String) jso0.get("id"));
>>>>>>>                 f.setName((String) jso0.get("name"));
>>>>>>>                 u.getFriends().add(f);
>>>>>>>             }
>>>>>>>         }
>>>>>>>
>>>>>>> 2. Have an explicit model for Json, and helper methods that do said
>>>>>>> casts[6]
>>>>>>>
>>>>>>>
>>>>>>>     this.setSiteSetting(readFromJson(jsonObject.getJsonObject("site")));
>>>>>>>     JsonArray groups = jsonObject.getJsonArray("group");
>>>>>>>     if (groups != null) {
>>>>>>>         int len = groups.size();
>>>>>>>         for (int i = 0; i < len; i++) {
>>>>>>>             JsonObject grp = groups.getJsonObject(i);
>>>>>>>             SNMPSetting grpSetting = readFromJson(grp);
>>>>>>>             String grpName = grp.getString("dbgroup", null);
>>>>>>>             if (grpName != null && grpSetting != null)
>>>>>>>                 this.groupSettings.put(grpName, grpSetting);
>>>>>>>         }
>>>>>>>     }
>>>>>>>     JsonArray hosts = jsonObject.getJsonArray("host");
>>>>>>>     if (hosts != null) {
>>>>>>>         int len = hosts.size();
>>>>>>>         for (int i = 0; i < len; i++) {
>>>>>>>             JsonObject host = hosts.getJsonObject(i);
>>>>>>>             SNMPSetting hostSetting = readFromJson(host);
>>>>>>>             String hostName = host.getString("dbhost", null);
>>>>>>>             if (hostName != null && hostSetting != null)
>>>>>>>                 this.hostSettings.put(hostName, hostSetting);
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>> I think what has become easier to represent in the language nowadays
>>>>>>> is that explicit model for Json.
>>>>>>> It's the 101 lesson of sealed interfaces.[7] It feels nice and clean.
>>>>>>>
>>>>>>>         sealed interface Json {
>>>>>>>             final class Null implements Json {}
>>>>>>>             final class True implements Json {}
>>>>>>>             final class False implements Json {}
>>>>>>>             final class Array implements Json {}
>>>>>>>             final class Object implements Json {}
>>>>>>>             final class String implements Json {}
>>>>>>>             final class Number implements Json {}
>>>>>>>         }
>>>>>>>
>>>>>>> And the cast-and-check approach is now more viable on account of
>>>>>>> pattern matching.
>>>>>>>
>>>>>>>         if (jso.get("id") instanceof String v) {
>>>>>>>             u.setId(v);
>>>>>>>         }
>>>>>>>         if (jso.get("index") instanceof Long v) {
>>>>>>>             u.setIndex(v.intValue());
>>>>>>>         }
>>>>>>>         if (jso.get("guid") instanceof String v) {
>>>>>>>             u.setGuid(v);
>>>>>>>         }
>>>>>>>
>>>>>>>         // or
>>>>>>>
>>>>>>>         if (jso.get("id") instanceof String id &&
>>>>>>>                 jso.get("index") instanceof Long index &&
>>>>>>>                 jso.get("guid") instanceof String guid) {
>>>>>>>             return new User(id, index, guid, ...); // look ma, no setters!
>>>>>>>         }
>>>>>>>
>>>>>>>
>>>>>>> And on the horizon, again, is value types.
>>>>>>>
>>>>>>> But there are problems with this approach beyond the performance
>>>>>>> implications of loading into
>>>>>>> a tree.
>>>>>>>
>>>>>>> For one, all the code samples above have different behaviors around
>>>>>>> null keys and missing keys
>>>>>>> that are not obvious from first glance.
>>>>>>>
>>>>>>> This won't accept any null or missing fields
>>>>>>>
>>>>>>>         if (jso.get("id") instanceof String id &&
>>>>>>>                 jso.get("index") instanceof Long index &&
>>>>>>>                 jso.get("guid") instanceof String guid) {
>>>>>>>             return new User(id, index, guid, ...);
>>>>>>>         }
>>>>>>>
>>>>>>> This will accept individual null or missing fields, but also will
>>>>>>> silently ignore
>>>>>>> fields with incorrect types
>>>>>>>
>>>>>>>         if (jso.get("id") instanceof String v) {
>>>>>>>             u.setId(v);
>>>>>>>         }
>>>>>>>         if (jso.get("index") instanceof Long v) {
>>>>>>>             u.setIndex(v.intValue());
>>>>>>>         }
>>>>>>>         if (jso.get("guid") instanceof String v) {
>>>>>>>             u.setGuid(v);
>>>>>>>         }
>>>>>>>
>>>>>>> And, compared to databind where there is information about the
>>>>>>> expected structure of the document
>>>>>>> and it's the job of the framework to assert that, I posit that the
>>>>>>> errors that would be encountered
>>>>>>> when writing code against this would be more like
>>>>>>>
>>>>>>>     "something wrong with user"
>>>>>>>
>>>>>>> than
>>>>>>>
>>>>>>>     "problem at users[5].name, expected string or null. got 5"
>>>>>>>
>>>>>>> Which feels unideal.
>>>>>>>
>>>>>>>
>>>>>>> One approach I find promising is something close to what Elm does
>>>>>>> with its decoders[8]. Not just combining assertion
>>>>>>> and binding like what pattern matching with records allows, but
>>>>>>> including a scheme for bubbling/nesting errors.
>>>>>>>
>>>>>>>     static String string(Json json) throws JsonDecodingException {
>>>>>>>         if (!(json instanceof Json.String jsonString)) {
>>>>>>>             throw JsonDecodingException.of(
>>>>>>>                     "expected a string",
>>>>>>>                     json
>>>>>>>             );
>>>>>>>         } else {
>>>>>>>             return jsonString.value();
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>>     static <T> T field(Json json, String fieldName,
>>>>>>>             Decoder<? extends T> valueDecoder) throws JsonDecodingException {
>>>>>>>         var jsonObject = object(json);
>>>>>>>         var value = jsonObject.get(fieldName);
>>>>>>>         if (value == null) {
>>>>>>>             throw JsonDecodingException.atField(
>>>>>>>                     fieldName,
>>>>>>>                     JsonDecodingException.of(
>>>>>>>                             "no value for field",
>>>>>>>                             json
>>>>>>>                     )
>>>>>>>             );
>>>>>>>         }
>>>>>>>         else {
>>>>>>>             try {
>>>>>>>                 return valueDecoder.decode(value);
>>>>>>>             } catch (JsonDecodingException e) {
>>>>>>>                 throw JsonDecodingException.atField(
>>>>>>>                         fieldName,
>>>>>>>                         e
>>>>>>>                 );
>>>>>>>             } catch (Exception e) {
>>>>>>>                 throw JsonDecodingException.atField(fieldName,
>>>>>>>                         JsonDecodingException.of(e, value));
>>>>>>>             }
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>> Which I think has some benefits over the ways I've seen of working
>>>>>>> with trees.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> - It is declarative enough that folks who prefer databind might be
>>>>>>> happy enough.
>>>>>>>
>>>>>>>         static User fromJson(Json json) {
>>>>>>>             return new User(
>>>>>>>                 Decoder.field(json, "id", Decoder::string),
>>>>>>>                 Decoder.field(json, "index", Decoder::long_),
>>>>>>>                 Decoder.field(json, "guid", Decoder::string)
>>>>>>>             );
>>>>>>>         }
>>>>>>>
>>>>>>>         // ...
>>>>>>>
>>>>>>>         List<User> users = Decoders.array(json, User::fromJson);
>>>>>>>
>>>>>>> - Handling null and optional fields could be less easily conflated
>>>>>>>
>>>>>>>     Decoder.field(json, "id", Decoder::string);
>>>>>>>
>>>>>>>     Decoder.nullableField(json, "id", Decoder::string);
>>>>>>>
>>>>>>>     Decoder.optionalField(json, "id", Decoder::string);
>>>>>>>
>>>>>>>     Decoder.optionalNullableField(json, "id", Decoder::string);
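>>>>>>> A guess at how those four variants could differ in their handling of
>>>>>>> absent fields vs JSON nulls (names and semantics are my assumptions,
>>>>>>> not the prototype's):

```java
// Hypothetical sketch of the four field-variant semantics.
import java.util.Map;
import java.util.Optional;

public class FieldVariants {
    sealed interface Json {
        record Str(String value) implements Json {}
        record Null() implements Json {}
        record Obj(Map<String, Json> fields) implements Json {}
    }

    @FunctionalInterface
    interface Decoder<T> { T decode(Json json); }

    static String string(Json json) {
        if (json instanceof Json.Str s) return s.value();
        throw new IllegalArgumentException("expected a string");
    }

    // Required and non-null: absence or JSON null is an error.
    static <T> T field(Json.Obj obj, String name, Decoder<? extends T> d) {
        Json v = obj.fields().get(name);
        if (v == null || v instanceof Json.Null) {
            throw new IllegalArgumentException("missing or null field: " + name);
        }
        return d.decode(v);
    }

    // Must be present, but JSON null becomes a Java null.
    static <T> T nullableField(Json.Obj obj, String name, Decoder<? extends T> d) {
        Json v = obj.fields().get(name);
        if (v == null) throw new IllegalArgumentException("missing field: " + name);
        return v instanceof Json.Null ? null : d.decode(v);
    }

    // May be absent entirely; an explicit JSON null is still an error here.
    static <T> Optional<T> optionalField(Json.Obj obj, String name, Decoder<? extends T> d) {
        Json v = obj.fields().get(name);
        return v == null ? Optional.empty() : Optional.of(d.decode(v));
    }

    // May be absent, and JSON null also maps to empty (one possible reading).
    static <T> Optional<T> optionalNullableField(Json.Obj obj, String name, Decoder<? extends T> d) {
        Json v = obj.fields().get(name);
        if (v == null || v instanceof Json.Null) return Optional.empty();
        return Optional.of(d.decode(v));
    }

    public static void main(String[] args) {
        var obj = new Json.Obj(Map.of("id", new Json.Str("a"), "tag", new Json.Null()));
        System.out.println(field(obj, "id", FieldVariants::string));              // a
        System.out.println(nullableField(obj, "tag", FieldVariants::string));     // null
        System.out.println(optionalField(obj, "missing", FieldVariants::string)); // Optional.empty
    }
}
```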
>>>>>>>
>>>>>>>
>>>>>>> - It composes well with user defined classes
>>>>>>>
>>>>>>>     record Guid(String value) {
>>>>>>>         Guid {
>>>>>>>             // some assertions on the structure of value
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>>     Decoder.string(json, "guid", guid -> new
>>>>>>> Guid(Decoder.string(guid)));
>>>>>>>
>>>>>>>     // or even
>>>>>>>
>>>>>>>     record Guid(String value) {
>>>>>>>         Guid {
>>>>>>>             // some assertions on the structure of value
>>>>>>>         }
>>>>>>>
>>>>>>>         static Guid fromJson(Json json) {
>>>>>>>             return new Guid(Decoder.string(json));
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>>     Decoder.string(json, "guid", Guid::fromJson);
>>>>>>>
>>>>>>>
>>>>>>> - When something goes wrong, the API can handle the fiddliness of
>>>>>>> capturing information for feedback.
>>>>>>>
>>>>>>>     In the code I've sketched out, it's just which field/index the
>>>>>>> failure happened at. Capturing metadata like row/col numbers from
>>>>>>> the source would be sensible too.
>>>>>>>
>>>>>>>     It's just not reasonable to expect devs to do extra work to get
>>>>>>> that, and it's really nice to give it.
>>>>>>>
>>>>>>> There are also some downsides like
>>>>>>>
>>>>>>> -  I do not know how compatible it would be with lazy trees.
>>>>>>>
>>>>>>>      Lazy trees are the only way a tree API could handle big or deep
>>>>>>> documents. The general concept, as applied in libraries like
>>>>>>> json-tree[9], is to navigate without doing any work, and that
>>>>>>> clashes with wanting to instanceof-check the info at the current
>>>>>>> path.
>>>>>>>
>>>>>>> - It *almost* gives enough information to be a general schema
>>>>>>> approach
>>>>>>>
>>>>>>>     If one field fails, this model throws an exception immediately.
>>>>>>> If an API should return "errors": [...], that is inconvenient to
>>>>>>> construct.
>>>>>>>
>>>>>>> - None of the existing popular libraries are doing this
>>>>>>>
>>>>>>>      The only mechanics that are strictly required to give this sort
>>>>>>> of API is lambdas. Those have
>>>>>>>      been out for a decade. Yes sealed interfaces make the data
>>>>>>> model prettier but in concept you
>>>>>>>      can build the same thing on top of anything.
>>>>>>>
>>>>>>>      I could argue that this is because of the "cultural momentum" of
>>>>>>> databind or some other reason, but the fact remains that it isn't a
>>>>>>> proven-out approach.
>>>>>>>
>>>>>>>      Writing JSON libraries is a todo list[10]. There are a lot of
>>>>>>> bad ideas and this might be one of them.
>>>>>>> - Performance impact of so many instanceof checks
>>>>>>>
>>>>>>>     I've measured a 4.2% slowdown compared to the "regular" tree
>>>>>>> code without the repeated casts.
>>>>>>>
>>>>>>>     But that was with a parser that is 5x slower than Jackson's
>>>>>>> (using the same benchmark project as for the snippets).
>>>>>>>     I think there is reason to believe that the JIT does well enough
>>>>>>> with repeated instanceof checks to consider it.
>>>>>>>
>>>>>>>
>>>>>>> My current thinking is that - despite not solving for large or deep
>>>>>>> documents - a really "dumb" realized tree API might be the right
>>>>>>> place to start for the read side of a potential incubator module.
>>>>>>>
>>>>>>> But regardless - this feels like a good time to start more concrete
>>>>>>> conversations. I feel I should cap this email, since I've reached
>>>>>>> the point of decoherence and haven't even mentioned the write side
>>>>>>> of things.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [1]:
>>>>>>> http://www.cowtowncoder.com/blog/archives/2009/01/entry_131.html
>>>>>>> [2]: https://security.snyk.io/vuln/maven?search=jackson-databind
>>>>>>> [3]: I only know like 8 people
>>>>>>> [4]:
>>>>>>> https://github.com/fabienrenaud/java-json-benchmark/blob/master/src/main/java/com/github/fabienrenaud/jjb/stream/UsersStreamDeserializer.java
>>>>>>> [5]: When I say "intent", I do so knowing full well no one has been
>>>>>>> actively thinking of this for an entire Game of Thrones
>>>>>>> [6]:
>>>>>>> https://github.com/yahoo/mysql_perf_analyzer/blob/master/myperf/src/main/java/com/yahoo/dba/perf/myperf/common/SNMPSettings.java
>>>>>>> [7]: https://www.infoq.com/articles/data-oriented-programming-java/
>>>>>>> [8]:
>>>>>>> https://package.elm-lang.org/packages/elm/json/latest/Json-Decode
>>>>>>> [9]: https://github.com/jbee/json-tree
>>>>>>> [10]: https://stackoverflow.com/a/14442630/2948173
>>>>>>> [11]: In 30 days, JEP 198 will be recognizably PI days old for the
>>>>>>> 2nd time in its history.
>>>>>>> [12]: To me, the fact that it is still an open JEP is more a social
>>>>>>> convenience than anything. I could just as easily be writing this
>>>>>>> exact same email about TOML.
>>>>>>>
>>>>>>