occurrences and cardinality in ADL, XML, JSON

2011-11-11 Thread Andrew Patterson
On 11/11/2011 5:11 AM, Thomas Beale wrote:

 In the current ADL 1.4-based XSDs used in openEHR, occurrences, 
 cardinality and existence are expressed as XML elements. We will want 
 to improve this for ADL 1.5 based XML. Now, we don't want to only take 
 care of XML; we also need to make it work for JSON, and (internally) 
 for dADL - neither of the latter have XML's 'attributes'. Many people 
 have asked for more efficient ways of serialising. Here are some ideas 
 for ADL 1.5 XML, JSON etc.

 ~~ first question: occurrences and cardinality  
 Occurrences and cardinality  are proper intervals in the AOM 
 representation. The most simplified object structure (JSON and dADL) 
 for occurrences and cardinality could look as follows (I use dADL  
 occurrences here):

 occurrences = 
 lower = 2 -- Integer field
 upper = 10 -- Integer field
 

 but the upper limit is commonly unbounded, i.e. '*' in typical 
 UML-like syntax. We could do:

 occurrences = 
 lower = 2 -- Integer field
 upper_bounded = True -- Boolean field
 

Why can't the absence of a value mean unbounded?

occurrences = 
  lower = 2
 

Means 2..*

I vaguely remember us discussing this many moons ago but I've forgotten 
the rationale..
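
To make the "absent upper means unbounded" idea concrete, here is a minimal Python sketch. The `Interval` class and `interval_from_fields` helper are hypothetical names used only for illustration; they are not part of the AOM or of any openEHR library. It assumes the deserialised fields arrive as a plain dictionary, as a JSON parser would produce.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for an integer multiplicity interval."""
    lower: int
    upper: Optional[int] = None  # None is taken to mean unbounded ('*')

    @property
    def upper_unbounded(self) -> bool:
        return self.upper is None

    def __str__(self) -> str:
        return f"{self.lower}..{'*' if self.upper is None else self.upper}"


def interval_from_fields(fields: dict) -> Interval:
    """Build an interval from deserialised fields; a missing 'upper' means '*'."""
    return Interval(lower=fields["lower"], upper=fields.get("upper"))


print(interval_from_fields({"lower": 2}))               # upper omitted
print(interval_from_fields({"lower": 2, "upper": 10}))  # both bounds given
```

With this convention no separate Boolean field is needed, at the cost that a reader cannot tell "unbounded" from "not stated" without knowing the rule.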

Also, what about inclusive/exclusive values at either end
of the interval? I know that this isn't an issue for occurrence and
cardinality intervals, which are always inclusive - but are we proposing that
the representation of normal intervals will not use the same mechanisms
as you are proposing here?

 ~ second question:existence 
 Existence as an interval can be 0..0 (prohibited, commonly used in 
 templates), 0..1 (optional, typical in the RM) and 1..1 (used in 
 templates and sometimes in archetypes). Now, since archetypes and 
 templates are /constraint/ structures, they can only /further 
 /constrain the RM in ADL/AOM 1.5. The only possibilities for this are 
 actually 0..0 and 1..1, so we could collapse existence onto a 
 single Boolean for serialised representation (it could also be a 
 single Boolean in the AOM, but that would be a breaking change, and 
 since we already use Intervals for occurrences and cardinality, it 
 does not seem worth the trouble).

 Thus in JSON/dADL it could be:

 some_attr = 
 existence = True|False
 

 In XML:

 <attributes rm_attribute_name="name" existence="true">

 </attributes>


If it was just to optimize the XML I'd give this a vote of 'meh'.. but 
given that existence is not really an interval because
as you say it has very few possible valid values, I think the removal of 
the ambiguity by turning it into a single boolean
is probably worthwhile.
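
A rough sketch of what collapsing existence onto a single Boolean could look like in Python. The function names and the `(lower, upper)` tuple representation are assumptions made for illustration, as is the reading that True means 1..1 (mandatory), False means 0..0 (prohibited), and an absent value means "fall back to the RM".

```python
from typing import Optional, Tuple

Existence = Tuple[int, int]  # (lower, upper); assumed representation


def existence_to_interval(flag: Optional[bool],
                          rm_existence: Existence = (0, 1)) -> Existence:
    """Expand the serialised Boolean back into an interval.

    None means the archetype says nothing, so the RM value applies.
    """
    if flag is None:
        return rm_existence
    return (1, 1) if flag else (0, 0)


def interval_to_existence(interval: Existence) -> Optional[bool]:
    """Collapse an interval to the Boolean form; 0..1 is not serialised."""
    if interval == (1, 1):
        return True
    if interval == (0, 0):
        return False
    return None


print(existence_to_interval(True))
print(existence_to_interval(None))
print(interval_to_existence((0, 0)))
```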

Andrew




occurrences and cardinality in ADL, XML, JSON

2011-11-11 Thread Sam Heard
 attachment was scrubbed...
URL: 
http://lists.openehr.org/mailman/private/openehr-technical_lists.openehr.org/attachments/2011/3c173711/attachment.html


occurrences and cardinality in ADL, XML, JSON

2011-11-11 Thread Diego Boscá
Although this would work, I think that it would make ADL far less
readable and would oblige people to know always the reference model
underneath AND their parent archetype (if for some reason the parent
archetype is not available then you are completely screwed). Even if
you say that people should know very well the model they are defining
archetypes for, I think that you would agree with me that they should
not be obliged to remember all archetypes on the specialization
hierarchy.

This could be even worse for the minimum, as if no constraint is
expressed = RM min (and again, also taking into account parent
archetype), which is almost always 0 or 1. And not being able to tell
at first look if something is not needed is really bad (IMHO).

2011/11/11 Sam Heard sam.heard at oceaninformatics.com:
 Hi All
 As ADL only states constraints there is no logical reason to include
 unbounded. So no constraint expressed = RM max. This is likely to be one or
 unbounded.

 Sent from my phone
 On 11/11/2011, at 5:11 AM, Thomas Beale thomas.beale at oceaninformatics.com
 wrote:


 In the current ADL 1.4-based XSDs used in openEHR, occurrences, cardinality
 and existence are expressed as XML elements. We will want to improve this
 for ADL 1.5 based XML. Now, we don't want to only take care of XML; we also
 need to make it work for JSON, and (internally) for dADL - neither of the
 latter have XML's 'attributes'. Many people have asked for more efficient
 ways of serialising. Here are some ideas for ADL 1.5 XML, JSON etc.

 ~~ first question: occurrences and cardinality
 Occurrences and cardinality are proper intervals in the AOM representation.
 The most simplified object structure (JSON and dADL) for occurrences and
 cardinality could look as follows (I use dADL  occurrences here):

 occurrences = 
     lower = 2 -- Integer field
     upper = 10 -- Integer field


 but the upper limit is commonly unbounded, i.e. '*' in typical UML-like
 syntax. We could do:

 occurrences = 
     lower = 2 -- Integer field
     upper_bounded = True -- Boolean field

 Sam: no need for this.



 meaning that 3 possible attributes could occur for an occurrences, but only
 ever 2 at the same time. Or we could make everything into a string:

 occurrences = 
     lower = 2 -- String field
     upper = * -- String field


 Sam: no need for this

 The upside is that the 'upper' attribute now handles both bounded and
 unbounded values. The downside is that the JSON / dADL parsers would have to
 do a bit more work to generate the required IntervalInteger object - since
 the 'upper' attribute now has to be treated as a little fragment of syntax
 and checked before being turned into an Integer.

 If we were just doing JSON, dADL and other 'proper' OO syntaxes, the first
 one would be the obvious one. But since we are also targetting XML, we have
 to think whether it makes more sense to do:

     <children node_id="at0005" occurrences_lower="2" occurrences_upper="10"> -- xsi:type="C_OBJECT"
         <rm_type_name>CLUSTER</rm_type_name>

 and

     <children node_id="at0005" occurrences_lower="2" occurrences_unbounded="true"> -- xs:boolean has to support 0/1 and true/false
         <rm_type_name>CLUSTER</rm_type_name>

 which is the analog of the first approach above, or it could be:

     <children node_id="at0005" occurrences_lower="2" occurrences_upper="10">
         <rm_type_name>CLUSTER</rm_type_name>

 and

     <children node_id="at0005" occurrences_lower="2" occurrences_upper="*">
         <rm_type_name>CLUSTER</rm_type_name>

 with both attributes defined in the XSD as xs:string. This means that, as
 for JSON/dADL, the standard XML parser only generates strings, and something
 further has to be done to obtain a proper Interval object.

 My preference is still to go with the first way of doing things. Do others
 agree with this? If so, it is what I will implement in the ADL 1.5
 workbench.
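
 For illustration only, the first XML variant above (typed attributes, with a Boolean for the unbounded case) could be read along the following lines in Python. The attribute names are taken from the examples in this message; the function name and the `(lower, upper)` tuple with `None` for "unbounded" are assumptions, and a real implementation would build a proper Interval object instead.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

TRUE_LITERALS = {"true", "1"}  # xs:boolean accepts both spellings


def occurrences_from_attributes(elem: ET.Element) -> Tuple[int, Optional[int]]:
    """Read occurrences from XML attributes; upper is None when unbounded."""
    lower = int(elem.attrib["occurrences_lower"])
    if elem.get("occurrences_unbounded", "false") in TRUE_LITERALS:
        return lower, None
    upper = elem.get("occurrences_upper")
    if upper is None:
        raise ValueError("neither occurrences_upper nor occurrences_unbounded given")
    return lower, int(upper)


bounded = ET.fromstring(
    '<children node_id="at0005" occurrences_lower="2" occurrences_upper="10">'
    '<rm_type_name>CLUSTER</rm_type_name></children>')
unbounded = ET.fromstring(
    '<children node_id="at0005" occurrences_lower="2" occurrences_unbounded="1">'
    '<rm_type_name>CLUSTER</rm_type_name></children>')

print(occurrences_from_attributes(bounded))
print(occurrences_from_attributes(unbounded))
```

 No string fragment has to be re-parsed here: each attribute converts directly to an Integer or a Boolean, which is the main argument for this variant.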

 ~ second question:existence 
 Existence as an interval can be 0..0 (prohibited, commonly used in
 templates), 0..1 (optional, typical in the RM) and 1..1 (used in templates
 and sometimes in archetypes). Now, since archetypes and templates are
 constraint structures, they can only further constrain the RM in ADL/AOM
 1.5. The only possibilities for this are actually 0..0 and 1..1, so we
 could collapse existence onto a single Boolean for serialised representation
 (it could also be a single Boolean in the AOM, but that would be a breaking
 change, and since we already use Intervals for occurrences and cardinality,
 it does not seem worth the trouble).

 Thus in JSON/dADL it could be:

 some_attr = 
     existence = True|False


 In XML:

 <attributes rm_attribute_name="name" existence="true">

 </attributes>

 Now, this is cheating a bit because we are making it look like there is an
 AOM property 'existence' of type Boolean, but there isn't. Should it be
 named something else to make this clear? I.e. a pseudo 

occurrences and cardinality in ADL, XML, JSON

2011-11-11 Thread Thomas Beale
On 11/11/2011 08:15, Shinji KOBAYASHI wrote:
 Hi Thomas and colleagues,

 I would like to discuss about the other serialization form of archetype, too.
 I thought YAML could be an alternative of them.

I had forgotten about YAML I have to admit. It would be interesting to 
support that in the ADL 1.5 tools as well. I will look into it.

 However, JSON/YAML are based on weakly typing languages, do not have
 established scheme definition, such as XSD/ADL.

 inline.

 2011/11/11 Thomas Beale <thomas.beale at oceaninformatics.com>:

 ~~ first question: occurrences and cardinality  
 but the upper limit is commonly unbounded, i.e. '*' in typical UML-like
 syntax. We could do:

 occurrences =
  lower =2  -- Integer field
  upper_bounded =True  -- Boolean field
 I think upper_bounded is typo for upper_unbounded, but this format has the

oops - you are right. Sorry about that.

 most conformance to INTERVAL specification of assumed types library.
 I agree this, because this form is easier to parse and generate an
 INTERVAL instance.
 I also agree with the first way of XML scheme with the same reason.

 BTW, Rubyists might prefer this format (YAML):

 occurrence:
2..

well, that's close to what I generate in dADL right now:


but XML developers don't like that.

- thomas



occurrences and cardinality in ADL, XML, JSON

2011-11-11 Thread Thomas Beale
On 11/11/2011 07:34, Diego Boscá wrote:
 Although this would work, I think that it would make ADL far less
 readable and would oblige people to know always the reference model

to be clear, I am not proposing to make any change at all to ADL. ADL is 
meant as a proper readable, mathematical formal expression of archetype 
semantics. It is the other serialisations we are concerned with here - 
i.e. serialisations of AOM structures.

 underneath AND their parent archetype (if for some reason the parent
 archetype is not available then you are completely screwed). Even if
 you say that people should know very well the model they are defining
 archetypes for, I think that you would agree with me that they should
 not be obliged to remember all archetypes on the specialization
 hierarchy.

yes, that is another issue here, which is whether you are seeing an 
archetype in differential or flattened form. If we use the ADL format 
for occurrences, cardinality and existence ranges, you can always just 
look at the most specialised archetype and you know the resulting 
occurrences / card/ ex, because you always have the full range e.g. occ 
= 2..5 or whatever. But in the scheme I am proposing, this is not so 
easy to work out visually. The tools of course should generate the right 
result in 'flat' view. If you play around with the AWB, you will see the 
diff and flat views, but currently these intervals are easy to understand 
because of always being in the full n..m form (even in the dADL and XML 
serialisation). So... good point


 This could be even worse for the minimum, as if no constraint is
 expressed = RM min (and again, also taking into account parent
 archetype), which is almost always 0 or 1. And not being able to tell
 at first look if something is not needed is really bad (IMHO).

well it would be bad if there were no flattener, but it is always 
possible to implement a flattener. The way the AWB tool works is that 
the serialised form of a differential archetype is converted to AOM form 
- which has proper MULTIPLICITY_INTERVAL objects (these are essentially 
just IntervalInteger) before flattening; then serialisation occurs in 
the other direction. So a flattened archetype will show the result of 
the archetype lineage and also the RM, if the 'flatten RM' option is on. 
I am not saying all tools have to work this way - this is the way I have 
done the reference compiler, but others may come up with more 
stream-based approaches in the future.

Anyway, this is a good point to be careful of.
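
As a much-simplified illustration of the flattening idea described above (this is not the AWB flattener, and the function name and tuple representation are hypothetical): if a differential archetype states nothing for occurrences, the effective value is taken from the nearest ancestor that does state one, and finally from the RM.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[int, Optional[int]]  # (lower, upper); upper None = unbounded


def effective_occurrences(lineage: Sequence[Optional[Interval]],
                          rm_default: Interval) -> Interval:
    """Resolve occurrences for a node.

    `lineage` lists what each archetype states for the node, ordered from
    the most specialised archetype up to the top-level parent; None means
    that archetype expresses no constraint.
    """
    for stated in lineage:
        if stated is not None:
            return stated
    return rm_default


# Child says nothing, parent says 2..5, RM allows 0..*
print(effective_occurrences([None, (2, 5)], (0, None)))
# Nobody in the lineage says anything, so the RM value applies
print(effective_occurrences([None, None], (0, None)))
```

This is exactly the lookup a human reader would otherwise have to do by hand when only the differential form is visible.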

- thomas






occurrences and cardinality in ADL, XML, JSON

2011-11-11 Thread Thomas Beale
On 11/11/2011 08:19, Erik Sundvall wrote:
 Hi!

 On Fri, Nov 11, 2011 at 08:34, Diego Boscá <yampeku at gmail.com> wrote:
 Although this would work, I think that it would make ADL far less
 readable
 Some readability thoughts...

 When a value (e.g. upper bound) may be either a number or a symbol (*
 or infinity) most receiving software will need to have logic
 separating the cases anyway, no matter how they are serialized.
 So then I wonder how much harder it would be to include string parsing
 logic so that we can have JSON-fields with string values like...
 occurrences: 1..*

well that's my opinion as well, and XML-ers always react badly! The 
'proper' parser code for dealing with this form, used in the ADL parser 
is (from the .y file):

...
%type <MULTIPLICITY_INTERVAL> c_occurrences c_existence occurrence_spec existence_spec
...
c_occurrences:  -- empty is ok
    | SYM_OCCURRENCES SYM_MATCHES SYM_START_CBLOCK occurrence_spec SYM_END_CBLOCK
        {
            $$ := $4
        }
    | SYM_OCCURRENCES error
        {
            abort_with_error(SOCCF, Void)
        }
    ;

occurrence_spec: cardinality_limit_value -- single integer or '*'
        {
            if not cardinality_limit_pos_infinity then
                create multiplicity_interval.make_point($1)
            else
                create multiplicity_interval.make_upper_unbounded(0)
                cardinality_limit_pos_infinity := False
            end
            $$ := multiplicity_interval
        }
    | V_INTEGER SYM_ELLIPSIS cardinality_limit_value
        {
            if cardinality_limit_pos_infinity then
                create multiplicity_interval.make_upper_unbounded($1)
                cardinality_limit_pos_infinity := False
            else
                create multiplicity_interval.make_bounded($1, $3)
            end
            $$ := multiplicity_interval
        }
    ;

cardinality_limit_value: integer_value
        {
            $$ := $1
        }
    | '*'
        {
            cardinality_limit_pos_infinity := True
        }
    ;



But the 'fast dADL' parser doesn't bother with any of that. Here is the 
Eiffel code - you can see how simple it is, and how it would work in 
Java, Python etc etc. Note that this parser only handles correct 
Interval strings, i.e. those that were generated by the serialiser, not by 
some erroneous human hand!


class MULTIPLICITY_INTERVAL

inherit INTERVAL [INTEGER]

    make_from_string (a_str: attached STRING)
            -- make from a string of the form n..m or just n, where
            -- n and m are integers, or m may be '*'
        require
            valid_multiplicity_string: valid_multiplicity_string (a_str)
        local
            a_lower, an_upper, delim_pos: INTEGER
            a_mult_str: STRING
        do
            a_mult_str := a_str.twin

            -- remove any spaces
            a_mult_str.prune_all (' ')

            -- make the interval
            delim_pos := a_mult_str.substring_index (Multiplicity_range_delimiter, 1)
            -- n..m case
            if delim_pos > 0 then
                a_lower := a_mult_str.substring (1, delim_pos-1).to_integer
                if a_mult_str.item (a_mult_str.count) = Multiplicity_unbounded_marker then
                    make_upper_unbounded (a_lower)
                else
                    an_upper := a_mult_str.substring (
                        a_mult_str.substring_index (Multiplicity_range_delimiter, 1) +
                        Multiplicity_range_delimiter.count, a_mult_str.count).to_integer
                    make_bounded (a_lower, an_upper)
                end
            -- * case
            elseif a_mult_str.item (1) = Multiplicity_unbounded_marker then
                make_upper_unbounded (0)
            -- m (single integer) case
            else
                a_lower := a_mult_str.to_integer
                make_bounded (a_lower, a_lower)
            end
        end


Not exactly hard, but I think XML developers are not used to this, 
and seem to prefer the XML-attributes style, which of course is not an 
OO structure, but does reduce the size of the XML file significantly.
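
For readers who do not use Eiffel, the same logic might look roughly like this in Python. It is a sketch rather than a port of any existing openEHR code: the function name is made up, the two constants are assumed to be ".." and "*" (their values are not shown in the excerpt above), and the result is a plain `(lower, upper)` tuple with `None` standing for an unbounded upper limit. Like the Eiffel routine, it expects well-formed input.

```python
from typing import Optional, Tuple

RANGE_DELIMITER = ".."   # assumed value of Multiplicity_range_delimiter
UNBOUNDED_MARKER = "*"   # assumed value of Multiplicity_unbounded_marker


def parse_multiplicity(text: str) -> Tuple[int, Optional[int]]:
    """Parse 'n..m', 'n..*', '*' or 'n' into (lower, upper)."""
    s = text.replace(" ", "")           # remove any spaces
    pos = s.find(RANGE_DELIMITER)
    if pos != -1:                       # n..m case
        lower = int(s[:pos])
        if s.endswith(UNBOUNDED_MARKER):
            return lower, None
        return lower, int(s[pos + len(RANGE_DELIMITER):])
    if s.startswith(UNBOUNDED_MARKER):  # * case
        return 0, None
    n = int(s)                          # single integer case
    return n, n


for example in ("2..10", "1..*", "*", "3"):
    print(example, "->", parse_multiplicity(example))
```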



 Will a string pattern be good enough for validation by auto-generated
 validators or does separation into fields clearly make auto-generated
 validators more capable in this case?

 Archetypes and templates will likely often be re-used as in-memory
 objects anyway so a little bit of string parsing overhead at startup
 might not have any significant overhead cost.


 On the other hand if we want to be verbose we could re-use some of the
 formalisms from http://json-schema.org/ Then we get schema validators
 in many programming languages for free
 (http://json-schema.org/implementations.html). Or perhaps json-schema
 should be an output format from something similar to the TDS (template
 data schema) approach?

I guess my assumption is that ADL will always use the most efficient and 
human readable form 

occurrences and cardinality in ADL, XML, JSON

2011-11-11 Thread Ian McNicoll
Apart from the size issue, readability is a particular problem because
of the verbosity of the current XML schema.

Ian

Dr Ian McNicoll
office +44 (0)1536 414 994
fax +44 (0)1536 516317
mobile +44 (0)775 209 7859
skype ianmcnicoll
ian.mcnicoll at oceaninformatics.com

Clinical Modelling Consultant, Ocean Informatics, UK
openEHR Clinical Knowledge Editor www.openehr.org/knowledge
Honorary Senior Research Associate, CHIME, UCL
BCS Primary Health Care www.phcsg.org




On 11 November 2011 13:56, Andrew Patterson andrewpatto at gmail.com wrote:
 On 11/11/2011 11:50 PM, Thomas Beale wrote:
 occurrences: 1..*
 well that's my opinion as well, and XML-ers always react badly! The
 'proper' parser code for dealing with this form, used in the ADL parser
 is (from the .y file):

 Well I consider myself an XML-er and I don't see massive problems with
 it, but
 maybe I have become soft in my old age.

 My main argument would be that the XML at one point was almost a
 straight serialization
 of the object model, as supported by various XML data binding libraries. So
 XML -> AOM memory objects -> XML was all doable with very standard
 binding libraries.

 BUT

 I was happy with status quo because I don't really care about the
 size of the XML or how often elements are repeated or the fact that it looks
 ugly to people - if people want compressed data then they should use
 fastinfoset
 or exi, and then gzip and it'll compress beautifully. The size/format/look
 is a concern to others.

 BUT

 If I have lost the battle and if we are going to do customised
 XML serializations then once you've taken it outside the
 normal data binding by introducing * forms or even
 'properties' that aren't really properties but kind of quasi computed fields
 then you might as well give up on the pretence that the XML serialization
 will bind straight into an AOM compatible object model..
 in which case parsing 1..* is not a problem

 Andrew

 ___
 openEHR-technical mailing list
 openEHR-technical at openehr.org
 http://lists.chime.ucl.ac.uk/mailman/listinfo/openehr-technical





occurrences and cardinality in ADL, XML, JSON

2011-11-11 Thread Ian McNicoll
Hi Andrew,

In principle I agree. I speak only as one of the poor sods who
sometimes has to visually check the .opt template schemas, which
use the same format. I know - get a tool :-) But even in something
like XMLSpy it can get hard to see the clinical wood for the
occurrences trees.

Ian

Dr Ian McNicoll
office +44 (0)1536 414 994
fax +44 (0)1536 516317
mobile +44 (0)775 209 7859
skype ianmcnicoll
ian.mcnicoll at oceaninformatics.com

Clinical Modelling Consultant, Ocean Informatics, UK
openEHR Clinical Knowledge Editor www.openehr.org/knowledge
Honorary Senior Research Associate, CHIME, UCL
BCS Primary Health Care www.phcsg.org




On 11 November 2011 14:29, Andrew Patterson andrewpatto at gmail.com wrote:
 On 12/11/2011 1:16 AM, Ian McNicoll wrote:

 Apart from the size issue, readability is a particular problem because
 of the verbosity of the current XML schema.

 I'm not convinced that human readability should matter too much
 (especially seeing ADL is meant to be the human readable format
 - if we have readable XML can we ditch the ADL??)

 But I'm not passionately opposed to it or anything :) Just when it
 was brought up in the past many moons ago I thought we had other
 more pressing issues. But if the change is happening as part of
 an update to 1.5 then I'm all for it.

 Andrew






occurrences and cardinality in ADL, XML, JSON

2011-11-11 Thread pablo pazos

Hi Thomas, do you have some examples of the JSON produced with your P_ classes 
from a couple AOM instances? It would be nice to see the results.

I don't see why anyone would object to leaving each node's type out of 
the serialization form when we are talking about a schema-less format (I mean: 
we don't need to put each node's class in every instance of a JSON/YAML 
serialization of an AOM instance), provided we can agree on a specification of 
this format (and the specification will have each node's type, or a mapping to 
an AOM object that has a type defined in the AOM specs).

This is not the main issue, but I don't like the name 'persistence' for the package, 
because it gives the idea that this is only for persisting something, whereas what I really 
want to do is use the serialization for archetype interchange (between a 
server and a web browser).

-- 
Kind regards,
Ing. Pablo Pazos Gutiérrez
LinkedIn: http://uy.linkedin.com/in/pablopazosgutierrez
Blog: http://informatica-medica.blogspot.com/
Twitter: http://twitter.com/ppazos

Date: Sat, 12 Nov 2011 01:04:22 +
From: thomas.be...@oceaninformatics.com
To: openehr-technical at openehr.org
Subject: Re: occurrences and cardinality in ADL, XML, JSON


  



  
  
On 11/11/2011 16:21, pablo pazos wrote:

  
  
Hi, I was thinking of this a lot: using a schema-less formats to
represent archetypes and RM instances.



I think if we agree on a common
  language/standard/definition, we don't need to define the
  types of any node on a JSON/YAML structure, because those
  types are defined on the language/standard/definition those
  structures will follow. And if we define a good serialization
  to JSON/YAML of archetypes and RM instances, we don't need a
  schema to share instances of those structures, we just need to
  implement the serialization definitions, and base the parsing
  on the attribute names.



What do you think?

  

  

  PS: I was thinking of archetypes serialized to JSON because I
  want to build a web-based GUI generation layer completely
  implemented in Javascript (JSON objects are Javascript
  objects), so we can use and share this thin layer to show
  archetype-based GUI generation easily, and, if we have a REST
  layer that implements EHR-Server services, we can use that GUI
  layer to send data input to the server and get information to
  show (a complete circle). If anyone wants to collaborate on the
  JSON format of ADL/AOM, please contact me.

  

  -- 
  



Again, I agree with this point of view, though XML people may not.
But now I should clarify something...



I should have explained one other thing: what I have done in the
current AOM 1.5 implementation (but not yet documented) is to create
a parallel set of P_XX classes ('P_' means 'persistent') like
P_ARCHETYPE, P_C_OBJECT and so on. These classes formally specify
the serialised form of the archetype so there can be no ambiguity.
It is these classes that currently have occurrences, cardinality and
existence defined as String properties. There are a few other
simplifications as well. My proposal is to add these P_XX class
definitions to the specification. It might seem like slight overkill
(and I resisted it for a long time) but once I implemented it, it
seemed worthwhile, and it allows us to separate the in-memory
computable version of the AOM from a P_ version whose sole purpose
is serialisation. The Eiffel P_ classes are here;
it is easy to imagine what the Java, Python etc would look like. 



So Pablo's argument, applied to the P_ classes would indeed mean
that the serialised form in JSON, YAML (also dADL) is a pure
consequence of the P_AOM classes, and no extra logic is needed. That
is why I built the P_ classes.
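
To give a feel for the idea, here is a guess at what one such class could look like in Python. This is purely illustrative: `PCObject` and `to_json` are hypothetical names, the three fields are borrowed from the XML examples earlier in the thread, and the real P_C_OBJECT class may well differ. The point is only that occurrences is held as a String, so the JSON falls out of the class definition with no extra logic.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PCObject:
    """Hypothetical Python analogue of a persistent P_C_OBJECT."""
    rm_type_name: str
    node_id: str
    occurrences: Optional[str] = None  # e.g. "2..*"; kept as a String


def to_json(obj: PCObject) -> str:
    """Serialise field by field, leaving out anything not stated."""
    fields = {k: v for k, v in asdict(obj).items() if v is not None}
    return json.dumps(fields, sort_keys=True)


print(to_json(PCObject("CLUSTER", "at0005", "2..*")))
print(to_json(PCObject("CLUSTER", "at0006")))
```

Converting the String back into an in-memory interval object would then be a separate step on the AOM side, outside the persistent class.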



- thomas



  

