Hi Rong,

On 15/11/2011 13:44, Rong Chen wrote:
> Hi all,
>
> Since we are talking about serialization format of archetypes, I guess
> we are not talking about a very large amount of data.
>
> I would prefer to keep the serialization format(s) as close to the
> object model as possible in order to reduce differences between
> standards and associated tooling work.
>
That was my view in the past, but over the years I have learned a few things:

* 'Serious' XML people don't do this. Instead they exploit XML attributes and other tricks to the maximum. Even though this manner of thinking may seem to make sense only for 'big data', they get used to working this way for everything, and indeed many books, tools and online resources are built on these assumptions. So when they see our 'purist' XML, they not only don't like it, they don't actually work that way.
* Although one should not care about 'reading' raw XML (and I am the first to say that we should never ever do it!), there are people who do, and who cannot avoid it - for debugging, testing, forensic data investigations, efficiency / performance assessments and so on.

Now, as we can see from inspection of both the ADL 1.4 style XML and the JSON that Seref is producing right now (based on the purist object representation), the number of lines used by each occurrences and each cardinality constraint is not only large, it actually swamps the remainder of the content of some archetypes. Line count is not a particularly useful concept - only humans see lines; parsers just see a stream of lexical strings that get turned into tokens. Nevertheless, I can see the sense in reducing the XML content from 6 lines (= 6 x tag pairs) for each occurrences / cardinality / existence down to either a single XML attribute with a String value (the "2..*" option) or else the more complex XML attributes option I described in the first post on this thread.

The more I think about it, the more I think we should go with the pure String option, because:

* it is the shortest form
* it is the most human readable form
* the same approach can be used for all three of occurrences / cardinality / existence, even though we know it is slightly overkill for existence.
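To make the pure String option a bit more concrete, here is a minimal Python sketch of how a tool might convert between a string such as "2..*" and an in-memory interval. It is only an illustration under assumptions: the names Multiplicity, parse_multiplicity and format_multiplicity are hypothetical, and the accepted syntax (lower..upper, with "*" meaning unbounded) is assumed from the "2..*" example rather than taken from any agreed schema.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed syntax: "<lower>..<upper>", where upper is an integer or "*" (unbounded).
_INTERVAL_RE = re.compile(r"(\d+)\.\.(\d+|\*)")


@dataclass(frozen=True)
class Multiplicity:
    """Hypothetical in-memory form of occurrences / cardinality / existence."""
    lower: int
    upper: Optional[int]  # None means unbounded, written as "*"


def parse_multiplicity(text: str) -> Multiplicity:
    """Turn a string such as "2..*" into a Multiplicity."""
    match = _INTERVAL_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid interval string: {text!r}")
    lower = int(match.group(1))
    upper = None if match.group(2) == "*" else int(match.group(2))
    if upper is not None and upper < lower:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return Multiplicity(lower, upper)


def format_multiplicity(value: Multiplicity) -> str:
    """Turn a Multiplicity back into its single-string form."""
    upper = "*" if value.upper is None else str(value.upper)
    return f"{value.lower}..{upper}"
```

The point of the sketch is that the round trip between the compact persisted string and the object form costs only a few lines of tooling, so the persisted form does not have to mirror the object model tag for tag.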
In sum: it would be nice to make the persisted form the same as the in-memory form, but reality doesn't work out that way, because there are different optimisation needs in each place. And the non-OO nature of XSD means that you lose that battle from the start, so better to go with the flow ;-)

- thomas