At REGnosys we are running into fundamental limitations of Jackson's 
support for XML. I would like to know whether these limitations are 
deliberate trade-offs, or changeable design decisions that could be fixed. 
Based on the answer, we will decide whether to *extend* Jackson in 
our codebase, *contribute* to Jackson directly, or *move away* from Jackson 
if it doesn't fit at all.

First of all: why Jackson?
Saying that we just want to ingest XML based on an XSD is somewhat 
hand-wavy - the JAXB project exists exactly for that use case. So maybe the 
question is better stated: why not JAXB? In short: the XSD is not our 
source of truth, our domain specific language is.

At REGnosys we maintain the open-source Rune DSL 
<https://github.com/finos/rune-dsl>, a language specifically designed for 
modelling processes in the financial industry. One important component of 
the language is *ingestion*: the process of reading serial data (JSON, XML, 
CSV, ...) in various financial standard formats and representing it in a 
uniform way in our DSL. Many of these formats are XML-based and formally 
defined as multiple XSD files, such as FpML <https://www.fpml.org/>. To 
support ingestion of these data standards, we use the following steps:

   1. Transform the XSD into Rune types (similar to how JAXB transforms 
   XSD to Java classes).
   2. Annotate the Rune types and fields with additional serialization 
   information (similar to what both Jackson and JAXB support).
   3. From this Rune model, generate Java code with custom annotations.
   4. Using a custom Jackson annotation processor, deserialize using a 
   Jackson object mapper.
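
To make steps 3 and 4 a bit more concrete, here is a minimal, stdlib-only sketch of the idea: a generated class carries a custom annotation, and a processor reads it to learn the serial name of each field. The annotation `SerialName`, the class `Trade`, and the `@` prefix for attributes are all hypothetical names invented for this illustration; they are not the actual Rune annotations, and the real processor plugs into Jackson rather than using plain reflection as done here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

class AnnotationSketch {
    // Hypothetical stand-in for the custom annotations generated in step 3.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SerialName {
        String value();
        boolean attribute() default false;
    }

    // Hypothetical generated class.
    static class Trade {
        @SerialName(value = "id", attribute = true)
        String tradeId;

        @SerialName("tradeDate")
        String date;
    }

    // Maps each serial name to the Java field it belongs to. Attributes get
    // an "@" prefix (an assumption of this sketch) so they cannot collide
    // with an element of the same name.
    static Map<String, String> serialNames(Class<?> type) {
        Map<String, String> names = new TreeMap<>();
        for (Field field : type.getDeclaredFields()) {
            SerialName annotation = field.getAnnotation(SerialName.class);
            if (annotation != null) {
                String prefix = annotation.attribute() ? "@" : "";
                names.put(prefix + annotation.value(), field.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(serialNames(Trade.class));
    }
}
```

The point of the sketch is only that the serialization metadata lives on the generated Java code, independent of the serial format that is eventually read.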

Note that steps 2 to 4 are independent of the exact serial format: we don't 
just support XML, we also support JSON and CSV, and want to stay extensible 
for any future formats. That is exactly what makes Jackson attractive and 
where we lose interest in JAXB: Jackson's design principles align 
perfectly with this goal of format-agnostic deserialisation and serialisation.

Issues with Jackson XML
Most of our issues come down to the way bean properties are represented. 
Their identity is purely based on the local name of the property being 
deserialized, but doesn't take into account surrounding context such as 
ordering, namespaces, or representation (e.g., XML attribute versus XML 
element).
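
To illustrate what I mean by identity, here is a small sketch that contrasts keying properties by local name only with keying them by a richer, context-aware key. The `PropertyKey` record and the namespace `urn:example:other` are hypothetical and only serve the illustration; this is not an existing Jackson type, and ordering is deliberately left out to keep the sketch short.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PropertyKeySketch {
    enum Kind { ATTRIBUTE, ELEMENT }

    // Hypothetical context-aware identity; not an existing Jackson type.
    record PropertyKey(String namespace, String localName, Kind kind) {}

    // Identity by local name only: keys that share a local name collapse.
    static int distinctByLocalName(List<PropertyKey> keys) {
        Set<String> names = new HashSet<>();
        for (PropertyKey key : keys) {
            names.add(key.localName());
        }
        return names.size();
    }

    // Identity by the full key: namespace and kind keep properties apart.
    static int distinctByFullKey(List<PropertyKey> keys) {
        return new HashSet<>(keys).size();
    }

    public static void main(String[] args) {
        List<PropertyKey> keys = List.of(
            new PropertyKey("", "id", Kind.ATTRIBUTE),
            new PropertyKey("", "id", Kind.ELEMENT),
            new PropertyKey("urn:example:other", "id", Kind.ELEMENT));
        System.out.println(distinctByLocalName(keys)); // prints 1
        System.out.println(distinctByFullKey(keys));   // prints 3
    }
}
```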

Examples of problems we run into:

   1. Having XML elements and XML attributes with the same name is 
   unsupported.
   Issue also described here: https://stackoverflow.com/q/47199799/3083982
   E.g., <foo id="my-id"><id>MyElementId</id></foo>
   2. The @JsonUnwrapped annotation breaks some XML features. Fundamentally 
   this is because it replaces the `FromXMLParser` instance with a 
   `TokenBuffer`-based parser, which breaks assumptions for some XML related 
   features. One example is described here: 
   https://github.com/FasterXML/jackson-dataformat-xml/issues/762
   3. Jackson does not support XSD substitution groups, i.e., having a 
   single property with multiple potential names, where the name that 
   appears determines which subtype deserializer is used. It turns out that 
   this is not a fundamental issue: we have already extended Jackson to 
   support it in the open-source Rune Common 
   <https://github.com/finos/rune-common> project. See issue ticket here: 
   https://github.com/FasterXML/jackson-dataformat-xml/issues/679
   4. Having XML elements with the same local name, but a different 
   namespace, is unsupported. See long-standing issue ticket here: 
   https://github.com/FasterXML/jackson-dataformat-xml/issues/65
   5. Having XML elements with the same local name, but with a different 
   order, is unsupported. I don't see a direct issue open for this, but it is 
   related to this comment: 
   https://github.com/FasterXML/jackson-dataformat-xml/issues/676#issuecomment-2438049500
   E.g., deserializing A1 and A2 to two distinct properties: 
   <foo><a>A1</a><b/><a>A2</a></foo>

While we have ideas on how to approach this, I am definitely not saying we 
have a perfect solution in mind yet. We are mostly trying to find out 
whether it is worth looking for a solution in the first place, or whether 
these are fundamental limitations of Jackson.

I'm happy to discuss here, but if possible, I would also be very happy to 
jump on a call sometime to talk through this. Whatever works best.

Thanks in advance.
