Hi Rong,

On 15/11/2011 13:44, Rong Chen wrote:
> Hi all,
>
> Since we are talking about serialization format of archetypes, I guess
> we are not talking about a very large amount of data.
>
> I would prefer to keep the serialization format(s) as close to the
> object model as possible in order to reduce differences between
> standards and associated tooling work.
>

That was my view in the past, but over the years I have learned a few
things:

  * 'serious' XML people don't do this. Instead they exploit XML
    attributes and other tricks to the maximum. Even though this manner
    of thinking may seem to make sense only for 'big data', they get
    used to working this way for everything, and indeed many books,
    tools and online resources are built on these assumptions. So when
    they see our 'purist' XML, they not only don't like it, they don't
    actually work that way.
  * Although one should not care about 'reading' raw XML (and I am the
    first to say that we should never ever do it!) there are people who
    do, and who cannot avoid it - for debugging, testing, forensic data
    investigations, efficiency / performance assessments and so on. Now,
    as we can see from inspection of both the ADL 1.4 style XML, and the
    JSON that Seref is producing right now (based on the purist object
    representation), the number of lines used by each occurrences and
    each cardinality constraint is not only large, it actually swamps
    the remainder of the content of some archetypes. Line count is not a
    particularly useful concept - only humans see lines - parsers just
    see a stream of lexical strings that get turned into tokens.
    Nevertheless, I can see the sense in reducing the XML content down
    from 6 lines (= 6 x tag pairs) for each occurrences / cardinality /
    existence to either a single XML attribute with a String value (the
    "2..*" option) or else the more complex XML attributes option I
    described in the first post on this thread.

The more I think about it, the more I think we should go with the pure 
String option, because:

  * it is the shortest form
  * it is the most human readable form
  * the same approach can be used for all three of occurrences /
    cardinality / existence, even though we know it is slightly overkill
    for existence.
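
To make the pure String option concrete, here is a rough Python sketch of
how a tool might turn a value like "2..*" back into a lower/upper interval
and write it out again. Everything in it is an assumption for illustration
only: the names Multiplicity, parse_multiplicity and format_multiplicity
are hypothetical, and treating a bare "1" as shorthand for "1..1" is my
guess at a convenient rule, not an agreed part of any schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Multiplicity:
    """Hypothetical in-memory form of occurrences / cardinality / existence."""
    lower: int
    upper: Optional[int]  # None stands for an unbounded upper limit ("*")

    @property
    def upper_unbounded(self) -> bool:
        return self.upper is None


# Accepts "N", "N..M" or "N..*" (assumed grammar, not taken from a spec).
_PATTERN = re.compile(r"^(\d+)(?:\.\.(\d+|\*))?$")


def parse_multiplicity(text: str) -> Multiplicity:
    """Parse a string such as "2..*" into a Multiplicity."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a multiplicity string: {text!r}")
    lower = int(match.group(1))
    upper_text = match.group(2)
    if upper_text is None:
        upper = lower          # assumed shorthand: "1" means "1..1"
    elif upper_text == "*":
        upper = None           # unbounded
    else:
        upper = int(upper_text)
    if upper is not None and upper < lower:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return Multiplicity(lower, upper)


def format_multiplicity(value: Multiplicity) -> str:
    """Serialise a Multiplicity back to the compact string form."""
    upper = "*" if value.upper is None else str(value.upper)
    return f"{value.lower}..{upper}"


if __name__ == "__main__":
    print(format_multiplicity(parse_multiplicity("2..*")))
```

The point of the sketch is only that the round trip between the one-line
string and the object form is a few lines of code, so the compact form
costs tooling very little.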

In sum: it would be nice to make the persisted form the same as the 
in-memory form, but reality doesn't work out that way, because there are 
different optimisation needs in each place. And the non-OO nature of XSD 
means that you lose that battle from the start, so better to go with the 
flow ;-)

- thomas
