[jackson-dev] XML vs JSON, Tree Model, support for "infer Arrays from sequence of Elements"?

Tatu Saloranta Sun, 28 Jan 2018 13:43:16 -0800

One common use case that I failed to anticipate -- ironically
enough, due to my knowledge of the information models of JSON and XML,
and the impedance between them (and not, as is more common, due to my
ignorance!) -- is reading XML as `JsonNode`, processing it, and often
converting between the 2 formats.


The reason I did not think this would be a common thing is that the
conversion is lossy by its nature, without additional metadata about
how the structures match.

Consider these two XML documents. First:

<root>
   <value>123</value>
</root>

If no more information is available, it would seem to map naturally
to the structure of

{ "value" : 123 }

or class:

public class Bean {
  public int value;
}

But there is another possibility: it might just happen to be a
special case of

<root>
   <value>123</value>
   <value>456</value>
</root>

which would seem to map to

{ "value" : [ 123, 456 ] }

or class:

public class Bean {
  public List<Integer> value; // or int[]
}

Unfortunately it is impossible, then, to determine the "intent" for
properties: the equivalent of a JSON Array can only be detected if
there is more than one element. A further practical concern is
that determining "more than one" can not be done without
mandatory buffering of content.

This is NOT a problem with databinding, actually, since in those cases
we have a Java class that helps determine the expected model. But it is
a problem when exposing Token Streams (via JsonParser subtype
`FromXmlParser`), or the Tree Model (`JsonNode`).

With this longish explanation, here is what XmlMapper does to get a `JsonNode`:

1. All elements map to JSON Objects
2. Since JSON Objects can not have duplicate properties, in case of the
above example, only the LAST value is retained (since each subsequent
element replaces the current value)

This handling occurs because the JsonNode deserializer has no knowledge
of format-level details: to it, content is just a token stream, and a
stream with repeated property names would not occur with valid JSON
(for example).
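
To make the replacement behavior concrete, here is a minimal sketch that
uses only JDK collections. It is not Jackson code: the class name
`LastValueWinsSketch` and the method `bind` are made up for
illustration, and a plain `Map` stands in for the JSON Object node.
Each (name, value) pair represents one child element in document order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of "last value wins"; a Map stands in for the JSON Object.
class LastValueWinsSketch {

    // Each entry of 'elements' is {name, text} for one child element.
    static Map<String, String> bind(String[][] elements) {
        Map<String, String> object = new LinkedHashMap<>();
        for (String[] element : elements) {
            // A repeated name silently replaces the earlier value.
            object.put(element[0], element[1]);
        }
        return object;
    }

    public static void main(String[] args) {
        String[][] children = { { "value", "123" }, { "value", "456" } };
        System.out.println(bind(children)); // prints {value=456}
    }
}
```

The first value is lost without any error, which matches the behavior
described in point 2 above.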

However: it would be relatively easy to add a new feature (either
general or XML specific) which, when enabled, would change handling so
that what looks like a replacement of a property value instead
"converts" the value into an `ArrayNode`, and changes logic to start
appending values in a way that makes sense (it gets a bit tricky but
seems doable).
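
The "convert on duplicate" idea can be sketched roughly as below. This is
only an illustration under stated assumptions, not the proposed
implementation: it uses the JDK's DOM parser rather than Jackson, plain
`Map` and `List` stand in for `ObjectNode` and `ArrayNode`, and the names
`ArrayInferenceSketch`, `addOrAppend`, `convert` and `parse` are
hypothetical. Attributes and mixed content are ignored, and all leaf
values are kept as Strings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch: Map/List stand in for ObjectNode/ArrayNode.
class ArrayInferenceSketch {

    // First occurrence is stored as-is; a repeated name promotes it to a List.
    static void addOrAppend(Map<String, Object> object, String name, Object value) {
        if (!object.containsKey(name)) {
            object.put(name, value);
            return;
        }
        Object existing = object.get(name);
        if (existing instanceof List) {
            // Already promoted: Lists are only ever created by promotion here.
            @SuppressWarnings("unchecked")
            List<Object> list = (List<Object>) existing;
            list.add(value);
        } else {
            List<Object> list = new ArrayList<>();
            list.add(existing);
            list.add(value);
            object.put(name, list);
        }
    }

    // Elements with child elements become Maps; leaf elements become text.
    static Object convert(Element element) {
        Map<String, Object> object = new LinkedHashMap<>();
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                addOrAppend(object, child.getNodeName(), convert((Element) child));
            }
        }
        return hasChildElements ? object : element.getTextContent().trim();
    }

    static Object parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes)).getDocumentElement();
            return convert(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<root><value>123</value></root>"));
        System.out.println(parse("<root><value>123</value><value>456</value></root>"));
    }
}
```

Note that the single-element document still comes out as a scalar, so
the ambiguity described earlier remains: promotion only happens once a
second element with the same name is actually seen.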

One thing that I really wonder about is whether there should be an
attempt to further specialize processing of XML content (like an
`XmlNode` subclass -- but that gets hairy pretty fast due to the class
model), or perhaps an XML-specific override of `JsonNodeDeserializer`,
instead of a general-purpose feature.
I think a format-specific deserializer is preferable over trying to
shoe-horn this into other components, since this really is more of a
format quirk.

Finally, there is the little problem of 2.9 being the end of the 2.x
dev cycle. So an addition to the API, whatever it is, may have to be
done in a way that goes against semantic versioning ("no API additions
in patch version"). I think that is a reasonable thing to do here, if
the feature is really something the community would like to see.

-+ Tatu +-
