The json library was written based on experience from this work:

http://www.calldei.com/pubs/Balisage2011/index.html
http://www.balisage.net/Proceedings/vol7/html/Lee01/BalisageVol7-Lee01.html

However promising, that was a small personal 'research project', not a product 
-- and certainly not efficient.
It does illuminate some of the more subtle issues -- it suggests possible 
strategies (see below),
but also exposes some unexpected limitations of those strategies.

--- >>>
With respect to ML and schemas -- and why the ML json library takes a different 
approach than the above -- or the following:

>>> ".  if you look at the sc:* functions you can parse to get to schema.  And 
>>> then using a few functions to build out the structure you need create a 
>>> function that does the transformation for you. "

-
I did investigate this approach, but it was not feasible in the context of the 
use cases that json:transform-xxx targets.
It may well be feasible in individual cases, but it doesn't pan out so well for 
a general-purpose library.


Two major issues:

Schemas for BOTH Sides

In general, to do a schema-based transformation you need a schema for *both 
sides*.
If you don't care about deterministic or bi-directional transformations, you 
can make do with less.

Given full schemas on both sides up front -- that helps reduce the problem, 
but doesn't solve it.
Consider a simpler example:
--> Produce a transform for XML into XHTML.    Pick one valid transformation.  
Only one.
Pick one that makes everyone happy.  --- #profit


JSON<>XML  is harder.
If you look at the 'data mapping' field, even the best implementations (say, 
Jackson) have the same problem.
It is fairly easy to map JSON to some generic data object, say a "Node" tree.  
Easy.
Or to transform from a specific class/object into JSON -- just write every 
field as a fully annotated generic JSON object.
That's easy.  It's also really ugly and generally undesirable.

The 'basic' strategy does this for JSON->XML, and the 'full' strategy does 
this for XML->JSON.
High fidelity, bi-directional, configuration-free, schema-agnostic.  Ugly as 
sin.
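
A minimal sketch of the two strategies, using the standard json library module 
(the sample data is made up, and exact output shapes vary by version):

    xquery version "1.0-ml";
    import module namespace json = "http://marklogic.com/xdmp/json"
        at "/MarkLogic/json/json.xqy";

    (: 'basic': JSON -> generic, fully annotated XML :)
    json:transform-from-json('{"name":"Ada","age":36}', json:config("basic")),

    (: 'full': arbitrary XML -> fully annotated JSON :)
    json:transform-to-json(<person role="admin">Ada</person>, json:config("full"))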

More than a good theory -- Make people Happy.

Even if you supply both sides -- a schema for source and target -- such as you 
can get directly in many programming languages
via reflection on class declarations -- one would think the problem is solved. 
But it's more like the XML<>XHTML problem.
You can determine what is valid, but not what is *desired*.    Valid is easy.  
Desirable -- not so easy to define or achieve.

---------  Back to schemas: XML + JSON
In ML we have neither, really -- we don't have JSON Schema currently.
And while we do have XML Schema, it's used for a very specific purpose.

The sc:* functions mentioned expose the results of schema validation, not the 
schema itself.
That's a subtle distinction that makes them less useful in this case than one 
might think.
The sc:* functions operate on *instances* of XML data, post schema validation.
You can't use them to query a schema document in the sense desired.
They are a reflection API into instances of validated documents, expressed as 
schema properties/axes on the node.
They are not a reflection API into the schema document itself.
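
A minimal sketch of that distinction (the element, namespace, and schema here 
are hypothetical, and a matching schema must already be loaded in the schemas 
database):

    xquery version "1.0-ml";
    declare namespace my = "http://example.com/my";  (: made-up namespace :)

    (: sc:type reflects the *typed instance*, so validate first :)
    let $typed := validate strict { <my:price>9.99</my:price> }
    return sc:name(sc:type($typed))

That hands back the instance's type name (say, xs:decimal per the assumed 
schema) -- not the xs:schema document that declared it.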


The json:transform-xxx library does use schema information for atomic types.
When converting from XML, if the type of an atomic value is known, that 
influences the JSON output.
E.g. an xs:string becomes a JSON string, an xs:int -> JSON number, an 
xs:boolean -> JSON boolean,
and an empty nillable element -> JSON null.
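
You can see the same atomic mapping with the xdmp:to-json built-in -- a quick 
illustration, not the transform library itself (in ML 8+ each call returns a 
small JSON document node):

    xdmp:to-json("hello"),    (: => "hello"   xs:string  -> JSON string  :)
    xdmp:to-json(42),         (: => 42        xs:integer -> JSON number  :)
    xdmp:to-json(fn:true())   (: => true      xs:boolean -> JSON boolean :)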

Even that has problems -- most JSON parsers use 64-bit doubles to represent 
Number, which means
any integer that doesn't fit in the 53-bit mantissa gets corrupted.  ML data 
types use 64-bit unsigned integers extensively -- so strings are used for 
large numbers.
You can see that in the management and metering API endpoints.
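
The precision cliff is easy to demonstrate in plain XQuery -- a 64-bit double 
has a 53-bit mantissa, so 2^53 = 9007199254740992 is the last integer it can 
count exactly:

    xs:double("9007199254740993") eq xs:double("9007199254740992")
    (: => true -- both round to the same double; the odd integer is silently lost :)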

xs:date, xs:dateTime, xs:duration -- the majority of the XSD primitive types 
-- have no standard JSON representation.

Beyond atomics, it's just too different.   XML has named values/nodes; JSON 
has unnamed values/nodes but named vectors/vertices.
If you look at the native JSON implementation in ML V8, there is a novel 
approach to how native JSON objects
are mapped into XDM and XPath.
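
A small taste of that mapping, assuming ML 8+ native JSON nodes (sample data 
made up):

    let $doc := xdmp:unquote('{"name":"Ada","tags":["a","b"]}', (), "format-json")
    return (
        $doc/name,         (: the node *named* "name" -- JSON members become named nodes :)
        $doc/tags/node()   (: the array itself is named; its members are anonymous :)
    )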

Suppose you have full schemas on both sides -- and 'obvious' mappings of 
structure and naming --

There is still a challenge -- even in theory.
Produce a single transform that is valid, lossless *and* universally desirable.

The last part -- that's the fun part.   Figure out how to determine what is 
'desirable' -- universally.
Then implement it simply.   Produce what is desired, not what is asked for.
Make People Happy.

I am very interested in how to achieve that better.



From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Gary Vidal
Sent: Thursday, April 13, 2017 3:38 PM
To: general@developer.marklogic.com
Subject: Re: [MarkLogic Dev General] json:config for XML schema

Well, the good news is if you have a schema you already know the definition of 
the structure you need to convert.  The general issue is to deal with "mixed" 
content and linking @ref elements to their ultimate definition and things like 
xs:sequence vs xs:choice.  The good news is MarkLogic has a library that can 
execute against the schema and provide you a means to create your own custom 
code to convert to JSON etc.  if you look at the sc:* functions you can parse 
to get to schema.  And then using a few functions to build out the structure 
you need create a function that does the transformation for you.  I  have some 
various code bits I can share if you need help.  If you give me some time (say 
tomorrow) I can probably write the code to generate the json for you.  Ping me 
directly if you need any help.

Regards,

Gary Vidal



