occurrences and cardinality in ADL, XML, JSON
Hi!

Just to make things more confused, here is another option for occurrence serialisation in JSON, YAML etc. Use arrays/sequences with two values for things like occurrences. That way it's compact (same number of characters as occurrences: 0..5) and almost as readable, but the parser/serializer does more of the job and will even provide the programmer with the data type (e.g. string or number) or null.

In JSON...

occurrences: [0, 5]

...and YAML...

occurrences: [0, 5]

The question of what to do with the unbounded * still remains of course. One could do valid (ugly but compact) JSON like...

occurrences: [0, "*"]

On Fri, Nov 11, 2011 at 04:36, Andrew Patterson (andrewpatto at gmail.com) wrote:

Why can't the absence of a value mean unbounded?

occurrences = <
    lower = <2>
>

Means 2..*

Then a JSON like...

occurrences: [2]

... (assuming occurrences are never unbounded at the lower end) or...

occurrences: [2, null]

...could mean unbounded upwards. I guess asking an API or programming language for the second value (index 1 if starting at 0) of the array will return null in both cases above.

Since the short form of 1..1 is often written as just 1 in UML and ER diagrams, the first style, with occurrences: [1] meaning 1..*, should probably be avoided, and occurrences: [1, null] should instead be recommended for 1..* if humans are supposed to read it. (And thus use occurrences: [1, 1] if you mean 1..1 and occurrences: [0, 0] if you mean 0..0.) It looks a bit scary/ugly, but it is probably better than [2, "*"], and checking for null is in many programming languages nicer than having to check the data type and possibly the string content.

On Fri, Nov 11, 2011 at 04:36, Andrew Patterson (andrewpatto at gmail.com) wrote:

Also, what about inclusive/exclusive values at either end of the interval? I know that this isn't an issue for occurrence and cardinality intervals, which are always inclusive - but are we proposing that the representation of normal intervals will not use the same mechanisms as you are proposing here?

What about using booleans in an array/sequence?

inclusive: [true, false]

...meaning inclusive at the lower but not the upper end. But perhaps intervals need a completely different approach.

Was that confusing enough?

Best regards,
Erik Sundvall
erik.sundvall at liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

P.s. Off-topic: If this discussion had been rushed and had to be decided in a time-limited face-to-face meeting, we might already have picked the 0..* string version and would have hesitated even to consider the above as a possibility if it popped up a few days later. (I am just trying to hint that slow, open mail discussions allow more technical ideas to come forward than rushed meetings. Face-to-face meetings have great value too, but perhaps even more for things other than technical design.)
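The two-element array form discussed in this message could be decoded roughly as follows. This is only a sketch under stated assumptions: the `Occurrences` type and the `occurrences_from_array` function are hypothetical names, and the convention that `null` in the second slot means unbounded is the one proposed above, not an agreed openEHR format.

```python
import json
from typing import NamedTuple, Optional


class Occurrences(NamedTuple):
    """Hypothetical in-memory interval; upper=None stands for unbounded (*)."""
    lower: int
    upper: Optional[int]


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def occurrences_from_array(value) -> Occurrences:
    """Decode [lower, upper]; upper may be null (None) for an unbounded interval."""
    if not isinstance(value, list) or len(value) != 2:
        raise ValueError(f"expected [lower, upper], got {value!r}")
    lower, upper = value
    if not _is_int(lower) or lower < 0:
        raise ValueError(f"invalid lower bound: {lower!r}")
    if upper is not None and (not _is_int(upper) or upper < lower):
        raise ValueError(f"invalid upper bound: {upper!r}")
    return Occurrences(lower, upper)


doc = json.loads('{"occurrences": [2, null]}')
occ = occurrences_from_array(doc["occurrences"])
print(occ)
```

The sketch deliberately rejects the one-element form `[2]`, following the recommendation above to always write both slots so that `[1]` cannot be misread as 1..1.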
occurrences and cardinality in ADL, XML, JSON
Hm... some further thoughts on this. I originally chose the {0..1} curly brackets mini-syntax for ADL because it is the UML 'constraint' syntax - in UML, all diagram constraints (such as they are) are in braces (see here
occurrences and cardinality in ADL, XML, JSON
Hi Rong,

On 15/11/2011 13:44, Rong Chen wrote:

Hi all, Since we are talking about serialization format of archetypes, I guess we are not talking about a very large amount of data. I would prefer to keep the serialization format(s) as close to the object model as possible in order to reduce differences between standards and associated tooling work.

That was my view in the past, but over the years, I have learned a few things:

* 'serious' XML people don't do this. Instead they exploit XML attributes and other tricks to the maximum, and they get used to working in this way. So even though this manner of thinking may seem to only make sense for 'big data', they get used to working this way for everything, and indeed many books, tools and online resources are built with these assumptions. So when they see our 'purist' XML, they not only don't like it, they don't actually work that way.
* Although one should not care about 'reading' raw XML (and I am the first to say that we should never ever do it!) there are people who do, and who cannot avoid it - for debugging, testing, forensic data investigations, efficiency / performance assessments and so on.

Now, as we can see from inspection of both the ADL 1.4 style XML and the JSON that Seref is producing right now (based on the purist object representation), the number of lines used by each occurrences and each cardinality is not only large, it actually swamps the remainder of the content of some archetypes. Line count is not a particularly useful concept - only humans see lines - parsers just see a stream of lexical strings that get turned into tokens. Nevertheless, I can see the sense in reducing the XML content down from 6 lines (= 6 x tag pairs) for each occurrences / cardinality / existence to either a single XML attribute with a String value (the 2..* option) or else the more complex XML attributes option I described in the first post on this thread.

The more I think about it, the more I think we should go with the pure String option, because:

* it is the shortest form
* it is the most human readable form
* the same approach can be used for all three of occurrences / cardinality / existence, even though we know it is slightly overkill for existence.

In sum: it would be nice to make the persisted form the same as the in-memory form, but reality doesn't work out that way, because there are different optimisation needs in each place. And the non-OO nature of XSD means that you lose that battle from the start, so better to go with the flow ;-)

- thomas
occurrences and cardinality in ADL, XML, JSON
The more I think about it, the more I think we should go with the pure String option, because:

* it is the shortest form
* it is the most human readable form
* the same approach can be used for all three of occurrences / cardinality / existence, even though we know it is slightly overkill for existence.

In sum: it would be nice to make the persisted form the same as the in-memory form, but reality doesn't work out that way, because there are different optimisation needs in each place. And the non-OO nature of XSD means that you lose that battle from the start, so better to go with the flow ;-)

I think that you, as the world's first human ADL Parser, have summed this up quite nicely. I agree with you.

Sebastian
occurrences and cardinality in ADL, XML, JSON
Hi Thomas,

yes - everyone goes through the same process I think. The P_ classes I now have in the ADL 1.5 compiler are my latest addition in this process.

[HKF: ] No, this is something you learn, as it sounds like you, I and others no doubt have learned. The first thing a newcomer does is use their favourite XML toolkit to create classes and instances derived from the XML Schema. This is why we still get questions about the slight variations that we currently have between the schema and the specifications.

The thing is, we do want to reduce the entry point to use openEHR and if we require a custom serializer then we make this entry point harder.

well, not if all the tooling is done and easy to use. Who writes their own XML parser these days?

[HKF: ] Wasn't talking about that. However, actually we do: they are SAX-based readers, where we want a stream reader into a domain model rather than an XML DOM. As I have stated previously, even with existing tools out there such as the Eiffel, Java, Python, Ruby and C# open source projects, people will still write their own for whatever reason. I bet there are at least a dozen Java RM implementations in the world; I know of four.

Heath
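As an illustration of the SAX-style approach mentioned above (streaming straight into domain objects rather than building a DOM), here is a minimal sketch using Python's standard `xml.sax` module. The element layout, the `occurrences` string attribute and the `ChildrenHandler` class are assumptions made for the example; they follow the compact form discussed in this thread, not any published openEHR schema.

```python
import xml.sax


class ChildrenHandler(xml.sax.ContentHandler):
    """Collects (node_id, lower, upper) tuples as <children> elements stream past.

    upper is None when the serialised upper bound is '*', i.e. unbounded.
    """

    def __init__(self):
        super().__init__()
        self.nodes = []

    def startElement(self, name, attrs):
        if name != "children":
            return
        # Split an 'n..m' or 'n..*' string into its two halves.
        lower_text, sep, upper_text = attrs.get("occurrences", "").partition("..")
        if not sep:
            raise ValueError("missing or malformed occurrences attribute")
        upper = None if upper_text == "*" else int(upper_text)
        self.nodes.append((attrs["node_id"], int(lower_text), upper))


# Hypothetical compact XML, invented for this example.
SAMPLE = b"""<attributes rm_attribute_name="items">
  <children node_id="at0005" occurrences="1..*"/>
  <children node_id="at0006" occurrences="0..1"/>
</attributes>"""

handler = ChildrenHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.nodes)
```

No intermediate document tree is built: each element is turned into a domain-level tuple as soon as its start tag is seen.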
occurrences and cardinality in ADL, XML, JSON
Hi all,

Since we are talking about serialization format of archetypes, I guess we are not talking about a very large amount of data. I would prefer to keep the serialization format(s) as close to the object model as possible in order to reduce differences between standards and associated tooling work.

Cheers,
Rong

On 14 November 2011 23:56, Heath Frankel heath.frankel at oceaninformatics.com wrote:

Hi Thomas,

yes - everyone goes through the same process I think. The P_ classes I now have in the ADL 1.5 compiler are my latest addition in this process.

[HKF: ] No, this is something you learn, as it sounds like you, I and others no doubt have learned. The first thing a newcomer does is use their favourite XML toolkit to create classes and instances derived from the XML Schema. This is why we still get questions about the slight variations that we currently have between the schema and the specifications.

The thing is, we do want to reduce the entry point to use openEHR and if we require a custom serializer then we make this entry point harder.

well, not if all the tooling is done and easy to use. Who writes their own XML parser these days?

[HKF: ] Wasn't talking about that. However, actually we do: they are SAX-based readers, where we want a stream reader into a domain model rather than an XML DOM. As I have stated previously, even with existing tools out there such as the Eiffel, Java, Python, Ruby and C# open source projects, people will still write their own for whatever reason. I bet there are at least a dozen Java RM implementations in the world; I know of four.

Heath
occurrences and cardinality in ADL, XML, JSON
I too have no problem with this custom serialisation as I have a hand-coded serializer that does the job (I gave up on the auto-generated ones years ago). However, I think we need to go back a step and get agreement from the community on what the most important features of an XML serialization are: readability, size, auto-generation etc. Once we get some sort of ranking then we can score each implementation choice accordingly.

I personally don't see the need to have consistency between different serialization formats; I think we should make the decisions that are best for the particular format. Having said that, I would be surprised if the logical features of the different formats were different unless their intended uses are dramatically different (i.e. the importance of auto-generation is likely to be the same for both JSON and XML).

Heath

-Original Message-
From: openehr-technical-bounces at openehr.org [mailto:openehr-technical-bounces at openehr.org] On Behalf Of Andrew Patterson
Sent: Saturday, 12 November 2011 12:26 AM
To: For openEHR technical discussions
Subject: Re: occurrences and cardinality in ADL, XML, JSON

On 11/11/2011 11:50 PM, Thomas Beale wrote:

occurrences: 1..* well that's my opinion as well, and XML-ers always react badly! The 'proper' parser code for dealing with this form, used in the ADL parser is (from the .y file):

Well I consider myself an XML-er and I don't see massive problems with it, but maybe I have become soft in my old age. My main argument would be that the XML at one point was almost a straight serialization of the object model, as supported by various XML data binding libraries. So XML -> AOM memory objects -> XML was all doable with very standard binding libraries.

BUT I was happy with the status quo because I don't really care about the size of the XML or how often elements are repeated or the fact that it looks ugly to people - if people want compressed data then they should use fastinfoset or exi, and then gzip and it'll compress beautifully. The size/format/look is a concern to others.

BUT if I have lost the battle and we are going to do customised XML serializations, then once you've taken it outside the normal data binding by introducing * forms or even 'properties' that aren't really properties but kind of quasi computed fields, you might as well give up on the pretence that the XML serialization will bind straight into an AOM compatible object model.. in which case parsing 1..* is not a problem

Andrew
occurrences and cardinality in ADL, XML, JSON
Hi Thomas,

The answers to the two questions below seem to be counter to each other. I think if we want to stay true to the RM then we should do this consistently, otherwise there is no reason why we can't deviate in other cases such as the first case below. In fact I would go further and suggest a syntax such as occurrences = 2..* in dADL and similar in XML.

However, others may not be so keen on this, as those starting out with openEHR like to use the built-in development tools in their favourite IDE to generate classes that match the AM/RM and automatically serialize data. This is certainly where I started, but I soon found limitations in this approach and now have a custom serializer. The thing is, we do want to reduce the entry point to use openEHR and if we require a custom serializer then we make this entry point harder.

Regards
Heath

From: openehr-technical-boun...@openehr.org [mailto:openehr-technical-bounces at openehr.org] On Behalf Of Thomas Beale
Sent: Friday, 11 November 2011 4:42 AM
To: Openehr-Technical
Subject: occurrences and cardinality in ADL, XML, JSON

In the current ADL 1.4-based XSDs used in openEHR, occurrences, cardinality and existence are expressed as XML elements. We will want to improve this for ADL 1.5 based XML. Now, we don't want to only take care of XML; we also need to make it work for JSON, and (internally) for dADL - neither of the latter have XML's 'attributes'. Many people have asked for more efficient ways of serialising. Here are some ideas for ADL 1.5 XML, JSON etc.

~~ first question: occurrences and cardinality

Occurrences and cardinality are proper intervals in the AOM representation. The most simplified object structure (JSON and dADL) for occurrences and cardinality could look as follows (I use dADL occurrences here):

occurrences = <
    lower = <2> -- Integer field
    upper = <10> -- Integer field
>

but the upper limit is commonly unbounded, i.e. '*' in typical UML-like syntax. We could do:

occurrences = <
    lower = <2> -- Integer field
    upper_bounded = <True> -- Boolean field
>

meaning that 3 possible attributes could occur for an occurrences, but only ever 2 at the same time. Or we could make everything into a string:

occurrences = <
    lower = <"2"> -- String field
    upper = <"*"> -- String field
>

The upside is that the 'upper' attribute now handles both bounded and unbounded values. The downside is that the JSON / dADL parsers would have to do a bit more work to generate the required Interval<Integer> object - since the 'upper' attribute now has to be treated as a little fragment of syntax and checked before being turned into an Integer.

If we were just doing JSON, dADL and other 'proper' OO syntaxes, the first one would be the obvious one. But since we are also targeting XML, we have to think whether it makes more sense to do:

<children node_id="at0005" occurrences_lower="2" occurrences_upper="10"> -- xsi:type="C_OBJECT"
    <rm_type_name>CLUSTER</rm_type_name>

and

<children node_id="at0005" occurrences_lower="2" occurrences_unbounded="true"> -- xs:boolean has to support 0/1 and true/false
    <rm_type_name>CLUSTER</rm_type_name>

which is the analog of the first approach above, or it could be:

<children node_id="at0005" occurrences_lower="2" occurrences_upper="10">
    <rm_type_name>CLUSTER</rm_type_name>

and

<children node_id="at0005" occurrences_lower="2" occurrences_upper="*">
    <rm_type_name>CLUSTER</rm_type_name>

with both attributes defined in the XSD as xs:string. This means that, as for JSON/dADL, the XML standard parser only generates strings, and something further has to be done to obtain a proper Interval object.

My preference is still to go with the first way of doing things. Do others agree with this? If so, it is what I will implement in the ADL 1.5 workbench.

~ second question: existence

Existence as an interval can be 0..0 (prohibited, commonly used in templates), 0..1 (optional, typical in the RM) and 1..1 (used in templates and sometimes in archetypes). Now, since archetypes and templates are constraint structures, they can only further constrain the RM in ADL/AOM 1.5. The only possibilities for this are actually 0..0 and 1..1, so we could collapse existence onto a single Boolean for serialised representation (it could also be a single Boolean in the AOM, but that would be a breaking change, and since we already use Intervals for occurrences and cardinality, it does not seem worth the trouble). Thus in JSON/dADL it could be:

some_attr = <
    existence = <True|False>
>

In XML:

<attributes rm_attribute_name="name" existence="true">
</attributes>

Now, this is cheating a bit because we are making it look like there is an AOM property 'existence' of type Boolean, but there isn't. Should it be named something else to make this clear? I.e. a pseudo attribute that only exists in serialised format but not in AOM internal format, e.g
occurrences and cardinality in ADL, XML, JSON
On 13/11/2011 22:43, Heath Frankel wrote:

I too have no problem with this custom serialisation as I have a hand-coded serializer that does the job (I gave up on the auto-generated ones years ago).

Heath, just to be completely clear, since we have already had quite a few posts: you are happy to go with strings like 0..*, 0..1 etc? For occurrences, existence and cardinality? I realise existence could be marginally simpler since it can only be 0..0 or 1..1 in ADL 1.5, but in ADL 1.4 there are lots of 0..1, and in any case it just doesn't seem worth using a different method to decode existence than the other two Intervals.

However, I think we need to go back a step and get agreement from the community on what the most important features of an XML serialization are: readability, size, auto-generation etc. Once we get some sort of ranking then we can score each implementation choice accordingly.

agree - please use these pages http://www.openehr.org/wiki/display/spec/XML+Schemas?focusedCommentId=12550150#comment-12550150 on the wiki

I personally don't see the need to have consistency between different serialization formats; I think we should make the decisions that are best for the particular format. Having said that, I would be surprised if the logical features of the different formats were different unless their intended uses are dramatically different (i.e. the importance of auto-generation is likely to be the same for both JSON and XML).

I would agree with these statements also...

- thomas
occurrences and cardinality in ADL, XML, JSON
Hi!

On Mon, Nov 14, 2011 at 06:23, Heath Frankel heath.frankel at oceaninformatics.com wrote:

However, others may not be so keen on this as those starting out with openEHR like to use the built-in development tools in their favourite IDE to generate classes that match the AM/RM and automatically serialize data.

Yes. Please do not exclude the current verbose RM-mimicking XML formats from a future version update, since they certainly have value too. They are, for example, very nice when you want to map AQL to XPath (and XQuery) and for generating stubs in programming languages.

Is there anything stopping us from having more than one serialization alternative per formalism, e.g. both verbose and compact XML?

Best regards,
Erik Sundvall
erik.sundvall at liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733
occurrences and cardinality in ADL, XML, JSON
On 14/11/2011 05:23, Heath Frankel wrote:

Hi Thomas, The answers to the two questions below seem to be counter to each other. I think if we want to stay true to the RM that we should do this consistently, otherwise there is no reason why we can't deviate in other cases such as the first case below. In fact I would go further and suggest a syntax such as occurrences = 2..* in dADL and similar in XML.

yep - in dADL it is 2..* because the corresponding field in the classes P_C_ATTRIBUTE, P_C_OBJECT is a String. In XML, it is also currently a String element, but it could be an attribute.

However, others may not be so keen on this as those starting out with openEHR like to use the built-in development tools in their favourite IDE to generate classes that match the AM/RM and automatically serialize data. This is certainly where I started but soon found limitations in this approach and now have a custom serializer.

yes - everyone goes through the same process I think. The P_ classes I now have in the ADL 1.5 compiler are my latest addition in this process.

The thing is, we do want to reduce the entry point to use openEHR and if we require a custom serializer then we make this entry point harder.

well, not if all the tooling is done and easy to use. Who writes their own XML parser these days?

- thomas
occurrences and cardinality in ADL, XML, JSON
On 14/11/2011 15:41, Erik Sundvall wrote:

Hi! On Mon, Nov 14, 2011 at 06:23, Heath Frankel heath.frankel at oceaninformatics.com wrote:

However, others may not be so keen on this as those starting out with openEHR like to use the built-in development tools in their favourite IDE to generate classes that match the AM/RM and automatically serialize data.

Yes. Please do not exclude the current verbose RM-mimicking XML formats from a future version update, since they certainly have value too. They are, for example, very nice when you want to map AQL to XPath (and XQuery) and for generating stubs in programming languages. Is there anything stopping us from having more than one serialization alternative per formalism, e.g. both verbose and compact XML?

I knew someone would say that ;-)

- thomas
occurrences and cardinality in ADL, XML, JSON
On 11/11/2011 7:19 PM, Erik Sundvall wrote:

When a value (e.g. upper bound) may be either a number or a symbol (* or infinity), most receiving software will need to have logic separating the cases anyway, no matter how they are serialized. So then I wonder how much harder it would be to include string parsing logic so that we can have JSON fields with string values like...

occurrences: "1..*"

Will a string pattern be good enough for validation by auto-generated validators, or does separation into fields clearly make auto-generated validators more capable in this case?

I'd agree with Erik here. The minute the receiving end has to deal with * or a number, the data binder is going to need some special logic. You might as well make the logic deal with parsing 1..*. It's not like that is much more of a challenge. So from an XML point of view we could then have

<children node_id="at0005" occurrences="1..*">

or, where we need elements,

<occurrences value="1..*"/>

To specify wildcards for upper in XSD would have taken a regex string restriction anyway - the regex for the n..* form is of similar complexity. The range string is easily implementable for JSON and YAML.

Andrew
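To give a feel for how little logic the n..m / n..* string form needs on the receiving end, here is a small sketch. The regular expression and the function names `parse_range` and `format_range` are illustrative assumptions, not taken from the ADL parser or any openEHR library; an equivalent pattern could in principle be used as an XSD string restriction.

```python
import re
from typing import Optional, Tuple

# One or more digits, two dots, then either digits or a single '*'.
_RANGE = re.compile(r"(\d+)\.\.(\d+|\*)")


def parse_range(text: str) -> Tuple[int, Optional[int]]:
    """Parse 'n..m' or 'n..*' into (lower, upper); upper=None means unbounded."""
    match = _RANGE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid range string: {text!r}")
    lower = int(match.group(1))
    upper = None if match.group(2) == "*" else int(match.group(2))
    if upper is not None and upper < lower:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return lower, upper


def format_range(lower: int, upper: Optional[int]) -> str:
    """Inverse of parse_range: serialise back to the compact string form."""
    return f"{lower}..{'*' if upper is None else upper}"


print(parse_range("1..*"))
print(format_range(0, 5))
```

The same two functions would serve occurrences, cardinality and existence alike, which is the uniformity argument made elsewhere in this thread.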
occurrences and cardinality in ADL, XML, JSON
On 11/11/2011 11:50 PM, Thomas Beale wrote:

occurrences: 1..* well that's my opinion as well, and XML-ers always react badly! The 'proper' parser code for dealing with this form, used in the ADL parser is (from the .y file):

Well I consider myself an XML-er and I don't see massive problems with it, but maybe I have become soft in my old age. My main argument would be that the XML at one point was almost a straight serialization of the object model, as supported by various XML data binding libraries. So XML -> AOM memory objects -> XML was all doable with very standard binding libraries.

BUT I was happy with the status quo because I don't really care about the size of the XML or how often elements are repeated or the fact that it looks ugly to people - if people want compressed data then they should use fastinfoset or exi, and then gzip and it'll compress beautifully. The size/format/look is a concern to others.

BUT if I have lost the battle and we are going to do customised XML serializations, then once you've taken it outside the normal data binding by introducing * forms or even 'properties' that aren't really properties but kind of quasi computed fields, you might as well give up on the pretence that the XML serialization will bind straight into an AOM compatible object model.. in which case parsing 1..* is not a problem

Andrew
occurrences and cardinality in ADL, XML, JSON
On 12/11/2011 1:16 AM, Ian McNicoll wrote: Apart from the size issue, readability is a particular problem because of the verbosity of the current XML schema. I'm not convinced that human readability should matter too much (especially seeing ADL is meant to be the human readable format - if we have readable XML can we ditch the ADL??) But I'm not passionately opposed to it or anything :) Just when it was brought up in the past many moons ago I thought we had other more pressing issues. But if the change is happening as part of an update to 1.5 then I'm all for it. Andrew
occurrences and cardinality in ADL, XML, JSON
On 11/11/2011 16:21, pablo pazos wrote:

Hi, I was thinking about this a lot: using schema-less formats to represent archetypes and RM instances. I think if we agree on a common language/standard/definition, we don't need to define the types of any node in a JSON/YAML structure, because those types are defined in the language/standard/definition those structures will follow. And if we define a good serialization to JSON/YAML of archetypes and RM instances, we don't need a schema to share instances of those structures; we just need to implement the serialization definitions, and base the parsing on the attribute names. What do you think?

PS: I was thinking of archetypes serialized to JSON because I want to build a web-based GUI generation layer completely implemented in Javascript (JSON objects are Javascript objects), so we can share this thin layer to show archetype-based GUI generation easily and, if we have a REST layer that implements EHR-Server services, we can use that GUI layer to send data input to the server and get information to show (a complete circle). If anyone wants to collaborate on the JSON format of ADL/AOM, please contact me.

-- Again, I agree with this point of view. But XML people may not... but now I should clarify something...

I should have explained one other thing: what I have done in the current AOM 1.5 implementation (but not yet documented) is to create a parallel set of P_XX classes ('P_' means 'persistent') like P_ARCHETYPE, P_C_OBJECT and so on. These classes formally specify the serialised form of the archetype so there can be no ambiguity. It is these classes that currently have occurrences, cardinality and existence defined as String properties. There are a few other simplifications as well. My proposal is to add these P_XX class definitions to the specification.

It might seem like slight overkill (and I resisted it for a long time) but once I implemented it, it seemed worthwhile, and it allows us to separate the in-memory computable version of the AOM from a P_ version whose sole purpose is serialisation. The Eiffel P_ classes are here http://www.openehr.org/svn/ref_impl_eiffel/BRANCHES/adl1.5/libraries/openehr/src/am/persistence/; it is easy to imagine what the Java, Python etc would look like.

So Pablo's argument, applied to the P_ classes, would indeed mean that the serialised form in JSON, YAML (also dADL) is a pure consequence of the P_AOM classes, and no extra logic is needed. That is why I built the P_ classes.

- thomas
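To make the P_ idea concrete, here is a rough Python sketch of a persistent class sitting beside an in-memory one. The class names `PCObject`, `CObject` and `IntInterval` and their fields are hypothetical simplifications loosely modelled on the P_C_OBJECT description above; they are not a transcription of the Eiffel classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntInterval:
    """Hypothetical in-memory interval; upper=None means unbounded."""
    lower: int
    upper: Optional[int]


@dataclass
class CObject:
    """Simplified stand-in for the computable (in-memory) AOM object."""
    rm_type_name: str
    node_id: str
    occurrences: IntInterval


@dataclass
class PCObject:
    """Simplified stand-in for the persistent form: occurrences is a plain String."""
    rm_type_name: str
    node_id: str
    occurrences: str


def to_persistent(obj: CObject) -> PCObject:
    upper = "*" if obj.occurrences.upper is None else str(obj.occurrences.upper)
    return PCObject(obj.rm_type_name, obj.node_id, f"{obj.occurrences.lower}..{upper}")


def to_aom(p: PCObject) -> CObject:
    lower_text, sep, upper_text = p.occurrences.partition("..")
    if not sep:
        raise ValueError(f"malformed occurrences string: {p.occurrences!r}")
    upper = None if upper_text == "*" else int(upper_text)
    return CObject(p.rm_type_name, p.node_id, IntInterval(int(lower_text), upper))


original = CObject("CLUSTER", "at0005", IntInterval(2, None))
stored = to_persistent(original)
print(stored.occurrences)
print(to_aom(stored) == original)
```

With this split, a generic JSON or YAML serialiser only ever sees `PCObject`, whose fields are all plain strings, while interval logic stays in the conversion functions.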
occurrences and cardinality in ADL, XML, JSON
On 11/11/2011 5:11 AM, Thomas Beale wrote:

In the current ADL 1.4-based XSDs used in openEHR, occurrences, cardinality and existence are expressed as XML elements. We will want to improve this for ADL 1.5 based XML. Now, we don't want to only take care of XML; we also need to make it work for JSON, and (internally) for dADL - neither of the latter have XML's 'attributes'. Many people have asked for more efficient ways of serialising. Here are some ideas for ADL 1.5 XML, JSON etc.

~~ first question: occurrences and cardinality

Occurrences and cardinality are proper intervals in the AOM representation. The most simplified object structure (JSON and dADL) for occurrences and cardinality could look as follows (I use dADL occurrences here):

occurrences = <
    lower = <2> -- Integer field
    upper = <10> -- Integer field
>

but the upper limit is commonly unbounded, i.e. '*' in typical UML-like syntax. We could do:

occurrences = <
    lower = <2> -- Integer field
    upper_bounded = <True> -- Boolean field
>

Why can't the absence of a value mean unbounded?

occurrences = <
    lower = <2>
>

Means 2..*

I vaguely remember us discussing this many moons ago but I've forgotten the rationale..

Also, what about inclusive/exclusive values at either end of the interval? I know that this isn't an issue for occurrence and cardinality intervals, which are always inclusive - but are we proposing that the representation of normal intervals will not use the same mechanisms as you are proposing here?

~ second question: existence

Existence as an interval can be 0..0 (prohibited, commonly used in templates), 0..1 (optional, typical in the RM) and 1..1 (used in templates and sometimes in archetypes). Now, since archetypes and templates are constraint structures, they can only further constrain the RM in ADL/AOM 1.5. The only possibilities for this are actually 0..0 and 1..1, so we could collapse existence onto a single Boolean for serialised representation (it could also be a single Boolean in the AOM, but that would be a breaking change, and since we already use Intervals for occurrences and cardinality, it does not seem worth the trouble). Thus in JSON/dADL it could be:

some_attr = <
    existence = <True|False>
>

In XML:

<attributes rm_attribute_name="name" existence="true">
</attributes>

If it was just to optimize the XML I'd give this a vote of 'meh'.. but given that existence is not really an interval because, as you say, it has very few possible valid values, I think the removal of the ambiguity by turning it into a single boolean is probably worthwhile.

Andrew
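The Boolean collapse of existence discussed above amounts to a two-entry mapping. A minimal sketch, assuming hypothetical function names and an interval represented as a plain `(lower, upper)` tuple:

```python
from typing import Tuple


def existence_to_bool(lower: int, upper: int) -> bool:
    """Collapse an existence interval to a Boolean: 1..1 -> True, 0..0 -> False.

    0..1 is deliberately rejected: under the assumption stated in the thread,
    an ADL 1.5 constraint can only narrow the RM to 0..0 or 1..1.
    """
    if (lower, upper) == (1, 1):
        return True
    if (lower, upper) == (0, 0):
        return False
    raise ValueError(f"existence {lower}..{upper} cannot be expressed as a Boolean")


def bool_to_existence(flag: bool) -> Tuple[int, int]:
    """Expand the serialised Boolean back into the AOM-style interval."""
    return (1, 1) if flag else (0, 0)


print(existence_to_bool(1, 1))
print(bool_to_existence(False))
```

Note that ADL 1.4 archetypes containing 0..1 would not survive this mapping, which is one reason the thread later leans towards using the same string form for all three intervals.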
occurrences and cardinality in ADL, XML, JSON
Hi All

As ADL only states constraints, there is no logical reason to include unbounded. So no constraint expressed = RM max. This is likely to be one or unbounded.

Sent from my phone

On 11/11/2011, at 5:11 AM, Thomas Beale <thomas.beale at oceaninformatics.com> wrote:

> In the current ADL 1.4-based XSDs used in openEHR, occurrences, cardinality and existence are expressed as XML elements. We will want to improve this for ADL 1.5 based XML. Now, we don't want to only take care of XML; we also need to make it work for JSON, and (internally) for dADL - neither of the latter have XML's 'attributes'. Many people have asked for more efficient ways of serialising. Here are some ideas for ADL 1.5 XML, JSON etc.
>
> First question: occurrences and cardinality
>
> Occurrences and cardinality are proper intervals in the AOM representation. The most simplified object structure (JSON and dADL) for occurrences and cardinality could look as follows (I use dADL occurrences here):
>
>     occurrences = <
>         lower = <2>     -- Integer field
>         upper = <10>    -- Integer field
>     >
>
> but the upper limit is commonly unbounded, i.e. '*' in typical UML-like syntax. We could do:
>
>     occurrences = <
>         lower = <2>             -- Integer field
>         upper_bounded = <True>  -- Boolean field
>     >

Sam: no need for this.

> meaning that 3 possible attributes could occur for an occurrences, but only ever 2 at the same time. Or we could make everything into a string:
>
>     occurrences = <
>         lower = <"2">   -- String field
>         upper = <"*">   -- String field
>     >

Sam: no need for this

> The upside is that the 'upper' attribute now handles both bounded and unbounded values. The downside is that the JSON / dADL parsers would have to do a bit more work to generate the required Interval<Integer> object - since the 'upper' attribute now has to be treated as a little fragment of syntax and checked before being turned into an Integer.
>
> If we were just doing JSON, dADL and other 'proper' OO syntaxes, the first one would be the obvious one. But since we are also targeting XML, we have to think whether it makes more sense to do:
>
>     <children node_id="at0005" occurrences_lower="2" occurrences_upper="10"> <!-- xsi:type="C_OBJECT" -->
>         <rm_type_name>CLUSTER</rm_type_name>
>
> and
>
>     <children node_id="at0005" occurrences_lower="2" occurrences_unbounded="true"> <!-- xs:boolean has to support 0/1 and true/false -->
>         <rm_type_name>CLUSTER</rm_type_name>
>
> which is the analog of the first approach above, or it could be:
>
>     <children node_id="at0005" occurrences_lower="2" occurrences_upper="10">
>         <rm_type_name>CLUSTER</rm_type_name>
>
> and
>
>     <children node_id="at0005" occurrences_lower="2" occurrences_upper="*">
>         <rm_type_name>CLUSTER</rm_type_name>
>
> with both attributes defined in the XSD as xs:string. This means that, like for JSON/dADL, the standard XML parser only generates strings, and something further has to be done to obtain a proper Interval object.
>
> My preference is still to go with the first way of doing things. Do others agree with this? If so, it is what I will implement in the ADL 1.5 workbench.
>
> Second question: existence
>
> Existence as an interval can be 0..0 (prohibited, commonly used in templates), 0..1 (optional, typical in the RM) and 1..1 (used in templates and sometimes in archetypes). Now, since archetypes and templates are constraint structures, they can only further constrain the RM in ADL/AOM 1.5. The only possibilities for this are actually 0..0 and 1..1, so we could collapse existence onto a single Boolean for serialised representation (it could also be a single Boolean in the AOM, but that would be a breaking change, and since we already use Intervals for occurrences and cardinality, it does not seem worth the trouble). Thus in JSON/dADL it could be:
>
>     some_attr = <
>         existence = <True|False>
>     >
>
> In XML:
>
>     <attributes rm_attribute_name="name" existence="true">
>     </attributes>
>
> Now, this is cheating a bit because we are making it look like there is an AOM property 'existence' of type Boolean, but there isn't. Should it be named something else to make this clear? I.e. a pseudo attribute that only exists in serialised format but not in AOM internal format, e.g. 'existence_constraint'? I would favour this. In my current implementation, the serialised format actually has its own object model, and this would have to be true for JSON as well. I think it also makes sense in XML - that there will be a level of classes corresponding to the space-efficient serial form, which are not the same as the internal AOM classes.
>
> thoughts?

Agree, it could be 0 or 1

> - thomas beale
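To make the trade-off between the encodings in Thomas's first question concrete, here is a small Python sketch that reads each variant from JSON into the same in-memory interval. The class and function names are invented for illustration, and the flag is spelled `upper_unbounded` (the correction noted elsewhere in the thread) rather than `upper_bounded`.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MultiplicityInterval:
    lower: int
    upper: Optional[int]  # None stands for unbounded ('*')


def from_typed_fields(obj: dict) -> MultiplicityInterval:
    """First style: integer 'upper', or a Boolean 'upper_unbounded' instead."""
    if obj.get("upper_unbounded", False):
        return MultiplicityInterval(obj["lower"], None)
    return MultiplicityInterval(obj["lower"], obj["upper"])


def from_string_fields(obj: dict) -> MultiplicityInterval:
    """Second style: both fields are strings, so 'upper' needs a syntax check."""
    upper = None if obj["upper"] == "*" else int(obj["upper"])
    return MultiplicityInterval(int(obj["lower"]), upper)


bounded = from_typed_fields(json.loads('{"lower": 2, "upper": 10}'))
unbounded = from_typed_fields(json.loads('{"lower": 2, "upper_unbounded": true}'))
as_strings = from_string_fields(json.loads('{"lower": "2", "upper": "*"}'))
```

With typed fields the JSON parser already delivers integers and Booleans; with string fields the extra conversion step is the "bit more work" mentioned in the message.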
occurrences and cardinality in ADL, XML, JSON
Although this would work, I think that it would make ADL far less readable and would oblige people to always know the underlying reference model AND the parent archetype (if for some reason the parent archetype is not available then you are completely screwed). Even if you say that people should know very well the model they are defining archetypes for, I think you would agree with me that they should not be obliged to remember all the archetypes in the specialization hierarchy.

This could be even worse for the minimum: if no constraint is expressed, it equals the RM min (again, also taking the parent archetype into account), which is almost always 0 or 1. And not being able to tell at first look if something is not needed is really bad (IMHO).

2011/11/11 Sam Heard <sam.heard at oceaninformatics.com>:

> Hi All
>
> As ADL only states constraints there is no logical reason to include unbounded. So no constraint expressed = RM max. This is likely to be one or unbounded.
occurrences and cardinality in ADL, XML, JSON
On 11/11/2011 08:15, Shinji KOBAYASHI wrote:

> Hi Thomas and colleagues,
>
> I would like to discuss other serialization forms for archetypes, too. I thought YAML could be an alternative.

I had forgotten about YAML, I have to admit. It would be interesting to support that in the ADL 1.5 tools as well. I will look into it.

> However, JSON/YAML are based on weakly typed languages and do not have an established schema definition such as XSD/ADL.
>
> Inline.
>
> 2011/11/11 Thomas Beale <thomas.beale at oceaninformatics.com>:
>
>> First question: occurrences and cardinality
>>
>> but the upper limit is commonly unbounded, i.e. '*' in typical UML-like syntax. We could do:
>>
>>     occurrences = <
>>         lower = <2>             -- Integer field
>>         upper_bounded = <True>  -- Boolean field
>>     >
>
> I think upper_bounded is a typo for upper_unbounded,

oops - you are right. Sorry about that.

> but this format conforms most closely to the INTERVAL specification in the assumed types library. I agree with this, because this form is easier to parse and to generate an INTERVAL instance from. I also agree with the first XML scheme, for the same reason.
>
> BTW, Rubyists might prefer this format (YAML):
>
>     occurrence: 2..

well, that's close to what I generate in dADL right now:

[non-text attachment scrubbed from the archive: ajbbddgj.png, image/png, 2382 bytes]

but XML developers don't like that.

- thomas
occurrences and cardinality in ADL, XML, JSON
On 11/11/2011 07:34, Diego Boscá wrote:

> Although this would work, I think that it would make ADL far less readable and would oblige people to know always the reference model underneath AND their parent archetype (if for some reason the parent archetype is not available then you are completely screwed).

to be clear, I am not proposing to make any change at all to ADL. ADL is meant as a proper readable, mathematical formal expression of archetype semantics. It is the other serialisations we are concerned with here - i.e. serialisations of AOM structures.

> Even if you say that people should know very well the model they are defining archetypes for, I think that you would agree with me that they should not be obliged to remember all archetypes on the specialization hierarchy.

yes, that is another issue here, which is whether you are seeing an archetype in differential or flattened form. If we use the ADL format for occurrences, cardinality and existence ranges, you can always just look at the most specialised archetype and you know the resulting occurrences / cardinality / existence, because you always have the full range, e.g. occ = 2..5 or whatever. But in the scheme I am proposing, this is not so easy to work out visually. The tools of course should generate the right result in 'flat' view. If you play around with the AWB, you will see the differential and flat views, but currently these intervals are easy to understand because they are always in the full n..m form (even in the dADL and XML serialisation). So... good point.

> This could be even worse for the minimum, as if no constraint is expressed = RM min (and again, also taking into account parent archetype), which is almost always 0 or 1. And not being able to tell at first look if something is not needed is really bad (IMHO).

well it would be bad if there were no flattener, but it is always possible to implement a flattener. The way the AWB tool works is that the serialised form of a differential archetype is converted to AOM form - which has proper MULTIPLICITY_INTERVAL objects (these are essentially just Interval<Integer>) - before flattening; then serialisation occurs in the other direction. So a flattened archetype will show the result of the archetype lineage and also the RM, if the 'flatten RM' option is on. I am not saying all tools have to work this way - this is the way I have done the reference compiler, but others may come up with more stream-based approaches in the future.

Anyway, this is a good point to be careful of.

- thomas
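The flattening step described in this message can be pictured with a very reduced Python sketch: each archetype in the lineage only records the occurrences it actually constrains, and the effective value is taken from the most specialised archetype that says anything, falling back to the RM. The data layout and the function name are assumptions made for this example; a real flattener such as the one in the AWB works on full AOM structures and does considerably more.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, Optional[int]]  # upper None = unbounded


def effective_occurrences(
    path: str,
    lineage: List[Dict[str, Interval]],
    rm_default: Interval,
) -> Interval:
    """Resolve occurrences for a node path.

    'lineage' is ordered from the most specialised archetype to the top-level
    parent; each entry maps node paths to the occurrences stated in that
    (differential) archetype. If no archetype constrains the path, the
    reference model value applies.
    """
    for differential in lineage:
        if path in differential:
            return differential[path]
    return rm_default


# Hypothetical three-level example
child = {"/items[at0005]": (1, 1)}
parent = {"/items[at0005]": (0, 5), "/items[at0006]": (2, None)}
lineage = [child, parent]
```

This also shows Diego's concern: reading only `child`, nothing tells you what applies to `/items[at0006]` unless the parent and the RM are available.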
occurrences and cardinality in ADL, XML, JSON
On 11/11/2011 08:19, Erik Sundvall wrote:

> Hi!
>
> On Fri, Nov 11, 2011 at 08:34, Diego Boscá <yampeku at gmail.com> wrote:
>
>> Although this would work, I think that it would make ADL far less readable
>
> Some readability thoughts...
>
> When a value (e.g. upper bound) may be either a number or a symbol (* or infinity), most receiving software will need to have logic separating the cases anyway, no matter how they are serialized. So then I wonder how much harder it would be to include string parsing logic so that we can have JSON fields with string values like...
>
>     occurrences: "1..*"

well that's my opinion as well, and XML-ers always react badly! The 'proper' parser code for dealing with this form, used in the ADL parser, is (from the .y file):

    ...
    %type <MULTIPLICITY_INTERVAL> c_occurrences c_existence occurrence_spec existence_spec
    ...

    c_occurrences: -- empty is ok
        | SYM_OCCURRENCES SYM_MATCHES SYM_START_CBLOCK occurrence_spec SYM_END_CBLOCK
            {
                $$ := $4
            }
        | SYM_OCCURRENCES error
            {
                abort_with_error(SOCCF, Void)
            }
        ;

    occurrence_spec: cardinality_limit_value -- single integer or '*'
            {
                if not cardinality_limit_pos_infinity then
                    create multiplicity_interval.make_point($1)
                else
                    create multiplicity_interval.make_upper_unbounded(0)
                    cardinality_limit_pos_infinity := False
                end
                $$ := multiplicity_interval
            }
        | V_INTEGER SYM_ELLIPSIS cardinality_limit_value
            {
                if cardinality_limit_pos_infinity then
                    create multiplicity_interval.make_upper_unbounded($1)
                    cardinality_limit_pos_infinity := False
                else
                    create multiplicity_interval.make_bounded($1, $3)
                end
                $$ := multiplicity_interval
            }
        ;

    cardinality_limit_value: integer_value
            {
                $$ := $1
            }
        | '*'
            {
                cardinality_limit_pos_infinity := True
            }
        ;

But the 'fast dADL' parser doesn't bother with any of that. Here is the Eiffel code - you can see how simple it is, and how it would work in Java, Python etc. Note that this parser only handles correct Interval strings, i.e. those that were generated by the serialiser, not by some erroneous human hand!

    class MULTIPLICITY_INTERVAL

    inherit
        INTERVAL [INTEGER]

        make_from_string (a_str: attached STRING)
                -- make from a string of the form n..m or just n, where n and m are integers, or m may be '*'
            require
                valid_multiplicity_string: valid_multiplicity_string (a_str)
            local
                a_lower, an_upper, delim_pos: INTEGER
                a_mult_str: STRING
            do
                a_mult_str := a_str.twin

                -- remove any spaces
                a_mult_str.prune_all (' ')

                -- make the interval
                delim_pos := a_mult_str.substring_index (Multiplicity_range_delimiter, 1)

                -- n..m case
                if delim_pos > 0 then
                    a_lower := a_mult_str.substring (1, delim_pos-1).to_integer
                    if a_mult_str.item (a_mult_str.count) = Multiplicity_unbounded_marker then
                        make_upper_unbounded (a_lower)
                    else
                        an_upper := a_mult_str.substring (a_mult_str.substring_index (Multiplicity_range_delimiter, 1) +
                            Multiplicity_range_delimiter.count, a_mult_str.count).to_integer
                        make_bounded (a_lower, an_upper)
                    end

                -- * case
                elseif a_mult_str.item (1) = Multiplicity_unbounded_marker then
                    make_upper_unbounded (0)

                -- m (single integer) case
                else
                    a_lower := a_mult_str.to_integer
                    make_bounded (a_lower, a_lower)
                end
            end

Not exactly hard. But I think XML developers are not used to this, and seem to prefer the XML-attributes style, which of course is not an OO structure, but does reduce the size of the XML file significantly.

> Will a string pattern be good enough for validation by auto-generated validators, or does separation into fields clearly make auto-generated validators more capable in this case?
>
> Archetypes and templates will likely often be re-used as in-memory objects anyway, so a little bit of string parsing at startup might not add any significant overhead cost.
>
> On the other hand, if we want to be verbose we could re-use some of the formalisms from http://json-schema.org/ Then we get schema validators in many programming languages for free (http://json-schema.org/implementations.html). Or perhaps json-schema should be an output format from something similar to the TDS (template data schema) approach?

I guess my assumption is that ADL will always use the most efficient and human readable form
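Since the message suggests the Eiffel `make_from_string` routine would carry over easily to other languages, here is what a Python equivalent might look like. This is a loose port rather than the reference implementation: the function name and the tuple return type are choices made for this sketch, and like the Eiffel original it assumes the string was produced by a serialiser and is well formed.

```python
from typing import Optional, Tuple

RANGE_DELIMITER = ".."
UNBOUNDED_MARKER = "*"


def multiplicity_from_string(text: str) -> Tuple[int, Optional[int]]:
    """Parse 'n..m', 'n..*', '*' or 'n' into (lower, upper); upper None = unbounded."""
    s = text.replace(" ", "")          # remove any spaces
    delim_pos = s.find(RANGE_DELIMITER)

    if delim_pos != -1:                # n..m case
        lower = int(s[:delim_pos])
        upper_part = s[delim_pos + len(RANGE_DELIMITER):]
        if upper_part == UNBOUNDED_MARKER:
            return (lower, None)
        return (lower, int(upper_part))

    if s == UNBOUNDED_MARKER:          # * case
        return (0, None)

    n = int(s)                         # single integer case: n..n
    return (n, n)
```

For example, `multiplicity_from_string("1..*")` gives `(1, None)` and `multiplicity_from_string("3")` gives `(3, 3)`.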
occurrences and cardinality in ADL, XML, JSON
Apart from the size issue, readability is a particular problem because of the verbosity of the current XML schema.

Ian

Dr Ian McNicoll
office +44 (0)1536 414 994
fax +44 (0)1536 516317
mobile +44 (0)775 209 7859
skype ianmcnicoll
ian.mcnicoll at oceaninformatics.com
Clinical Modelling Consultant, Ocean Informatics, UK
openEHR Clinical Knowledge Editor www.openehr.org/knowledge
Honorary Senior Research Associate, CHIME, UCL
BCS Primary Health Care www.phcsg.org

On 11 November 2011 13:56, Andrew Patterson <andrewpatto at gmail.com> wrote:

> On 11/11/2011 11:50 PM, Thomas Beale wrote:
>
>>     occurrences: "1..*"
>>
>> well that's my opinion as well, and XML-ers always react badly! The 'proper' parser code for dealing with this form, used in the ADL parser is (from the .y file):
>
> Well, I consider myself an XML-er and I don't see massive problems with it, but maybe I have become soft in my old age.
>
> My main argument would be that the XML at one point was almost a straight serialization of the object model, as supported by various XML data binding libraries. So XML -> AOM memory objects -> XML was all doable with very standard binding libraries.
>
> BUT I was happy with the status quo because I don't really care about the size of the XML, or how often elements are repeated, or the fact that it looks ugly to people - if people want compressed data then they should use fastinfoset or exi, and then gzip, and it'll compress beautifully. The size/format/look is a concern to others.
>
> BUT if I have lost the battle and we are going to do customised XML serializations, then once you've taken it outside the normal data binding by introducing * forms, or even 'properties' that aren't really properties but kind of quasi computed fields, you might as well give up on the pretence that the XML serialization will bind straight into an AOM compatible object model, in which case parsing 1..* is not a problem.
>
> Andrew
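Andrew's point, that a customised XML serialisation needs a small hand-written step between the parser and the object model, can be illustrated with Python's standard `xml.etree.ElementTree`. The attribute names follow the XML variants proposed earlier in the thread; the function name is hypothetical, and accepting both the `occurrences_unbounded` flag and the `*` string in one function is a simplification for this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def occurrences_from_element(elem: ET.Element) -> Tuple[int, Optional[int]]:
    """Read occurrences from the attribute-style XML; upper None = unbounded."""
    lower = int(elem.get("occurrences_lower"))

    # Boolean-flag variant; xs:boolean allows both 'true' and '1'
    if elem.get("occurrences_unbounded") in ("true", "1"):
        return (lower, None)

    # String variant, where '*' marks an unbounded upper limit
    upper = elem.get("occurrences_upper")
    if upper == "*":
        return (lower, None)
    return (lower, int(upper))


node = ET.fromstring(
    '<children node_id="at0005" occurrences_lower="2" occurrences_upper="*">'
    "<rm_type_name>CLUSTER</rm_type_name>"
    "</children>"
)
occurrences = occurrences_from_element(node)
```

A generic data binding library would hand back the raw attribute strings here, which is why this conversion has to live in application code once the serialisation stops mirroring the object model.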
occurrences and cardinality in ADL, XML, JSON
Hi Andrew,

In principle I agree. I speak only as one of the poor sods who sometimes has to visually check the .opt template schemas, which use the same format. I know - get a tool :-) But even in something like XMLSpy it can get hard to see the clinical wood for the occurrences trees.

Ian

On 11 November 2011 14:29, Andrew Patterson <andrewpatto at gmail.com> wrote:

> On 12/11/2011 1:16 AM, Ian McNicoll wrote:
>
>> Apart from the size issue, readability is a particular problem because of the verbosity of the current XML schema.
>
> I'm not convinced that human readability should matter too much (especially seeing ADL is meant to be the human readable format - if we have readable XML can we ditch the ADL??)
>
> But I'm not passionately opposed to it or anything :) Just when it was brought up in the past, many moons ago, I thought we had other more pressing issues. But if the change is happening as part of an update to 1.5 then I'm all for it.
>
> Andrew
occurrences and cardinality in ADL, XML, JSON
Hi Thomas,

do you have some examples of the JSON produced with your P_ classes from a couple of AOM instances? It would be nice to see the results.

I don't see why anyone would object to leaving out each node's type in the serialized form when we are talking about a schema-less format (I mean: we don't need to put each node's class in every instance of a JSON/YAML serialization of an AOM instance), provided we can agree on a specification of this format (and the specification will have each node's type, or a mapping to an AOM object that has a type defined in the AOM specs).

This is not the main issue, but I don't like the name 'persistence' for the package, because it gives me the idea that this is only for persisting something, whereas what I really want to do is to use the serialization for archetype interchange (between a server and a web browser).

--
Kind regards,
Ing. Pablo Pazos Gutiérrez
LinkedIn: http://uy.linkedin.com/in/pablopazosgutierrez
Blog: http://informatica-medica.blogspot.com/
Twitter: http://twitter.com/ppazos

> Date: Sat, 12 Nov 2011 01:04:22 +
> From: thomas.be...@oceaninformatics.com
> To: openehr-technical at openehr.org
> Subject: Re: occurrences and cardinality in ADL, XML, JSON
>
> On 11/11/2011 16:21, pablo pazos wrote:
>
>> Hi, I was thinking of this a lot: using schema-less formats to represent archetypes and RM instances.
>>
>> I think if we agree on a common language/standard/definition, we don't need to define the types of any node on a JSON/YAML structure, because those types are defined in the language/standard/definition those structures will follow. And if we define a good serialization to JSON/YAML of archetypes and RM instances, we don't need a schema to share instances of those structures; we just need to implement the serialization definitions, and base the parsing on the attribute names.
>>
>> What do you think?
>>
>> PS: I was thinking of archetypes serialized to JSON because I want to build a web-based GUI generation layer completely implemented in Javascript (JSON objects are Javascript objects), so we can use and share this thin layer to show archetype-based GUI generation easily, and, if we have a REST layer that implements EHR-Server services, we can use that GUI layer to send data input to the server and get information to show (a complete circle). If anyone wants to collaborate on the JSON format of ADL/AOM please contact me.
>
> Again, I agree with this point of view. But XML people may not.
>
> But now I should clarify something... I should have explained one other thing: what I have done in the current AOM 1.5 implementation (but not yet documented) is to create a parallel set of P_XX classes ('P_' means 'persistent') like P_ARCHETYPE, P_C_OBJECT and so on. These classes formally specify the serialised form of the archetype so there can be no ambiguity. It is these classes that currently have occurrences, cardinality and existence defined as String properties. There are a few other simplifications as well.
>
> My proposal is to add these P_XX class definitions to the specification. It might seem like slight overkill (and I resisted it for a long time) but once I implemented it, it seemed worthwhile, and it allows us to separate the in-memory computable version of the AOM from a P_ version whose sole purpose is serialisation. The Eiffel P_ classes are here; it is easy to imagine what the Java, Python etc would look like.
>
> So Pablo's argument, applied to the P_ classes, would indeed mean that the serialised form in JSON, YAML (also dADL) is a pure consequence of the P_AOM classes, and no extra logic is needed. That is why I built the P_ classes.
>
> - thomas
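The P_ class idea described in the quoted message, a parallel set of classes whose only job is to define the serialised shape, might look roughly like this in Python. Everything here is a guess for illustration: the class names `CObject` and `PCObject` merely echo C_OBJECT and P_C_OBJECT, the field set is reduced to three attributes, and the real Eiffel P_ classes are not reproduced.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class CObject:
    """In-memory form (stand-in for an AOM C_OBJECT) with a computable interval."""
    rm_type_name: str
    node_id: str
    occurrences: Tuple[int, Optional[int]]  # upper None = unbounded


@dataclass
class PCObject:
    """Persistent form: occurrences is a plain string such as '2..*'."""
    rm_type_name: str
    node_id: str
    occurrences: str

    @classmethod
    def from_aom(cls, obj: CObject) -> "PCObject":
        lower, upper = obj.occurrences
        text = f"{lower}..{'*' if upper is None else upper}"
        return cls(obj.rm_type_name, obj.node_id, text)

    def to_aom(self) -> CObject:
        lower, _, upper = self.occurrences.partition("..")
        parsed = (int(lower), None if upper == "*" else int(upper))
        return CObject(self.rm_type_name, self.node_id, parsed)


original = CObject("CLUSTER", "at0005", (2, None))
as_json = json.dumps(asdict(PCObject.from_aom(original)))
restored = PCObject(**json.loads(as_json)).to_aom()
```

Because the JSON is generated directly from the persistent class, its shape is fully determined by that class, which is the "no extra logic is needed" property the message argues for; the only conversion code sits between the two class families.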