Hey folks,

Following on from the discussion about ADL paths / XPath, XSDs, and so forth, 
here's something semi-random that I've been thinking about for a while and 
thought I might share. Now seems a good time for that, with the work ongoing 
on openEHR 2.0.

Status
------
ADL is a rich schema language which allows for constructs that you cannot 
properly express in XML schema. The openEHR reference model is a rich model 
which incorporates many common data types.

This is (supposed to be?) great for modellers, leads to some very fun exercises 
in tool-building for openEHR implementation experts, and is not great at all 
for joe programmer, who becomes very dependent on bob programmer [1]. One way 
or another, joe programmer is converting (with some help from bob programmer's 
magic along the way) the smart and flexible data structures into the stupid and 
rigid data structures that are typical for business programming and the tools 
and languages he has available to him.

Is this a required situation due to the inherent complexity of handling medical 
information? That seems to be the general gist of the argument that led to GEHR 
and openEHR, but I'm really not so sure I buy it. I'm also not convinced at all 
that long-term data archival and interoperability goals are really met all that 
well by any format based on complex base data structures.

Radical simplification
----------------------
By reducing the set of base types and base primitives in openEHR, it _should_ 
be possible to produce a revised architecture that keeps _most_ of the core 
values of openEHR, like two-level modeling on top of a generic architecture, 
while making implementation much easier, and use by joe programmer easier 
still. I imagine most of the loss is in model conciseness.

I know discussions have been had here and elsewhere about "dumbing down to what 
you can conceivably get done with XML tools", but I'd actually like to take the 
thought experiment a bit further. In this case, the thoughts are about ADL and 
associated data format(s).

Lowest common denominator
-------------------------
Try and imagine openEHR as it is today, and then:
* enforce (just) Unicode characters, and UTF-8 encoding
* reduce the available collection primitives to just Map and List,
  losing Set and Interval
  (Set and Interval probably live in the reference model, but
  conceptually they're built out of other types)
* remove Cluster, Element and ItemStructure from the reference model,
  forcing use of nested maps and lists only, throughout
* reduce primitives to string, true, false, null (ugh), int (32-bit),
  bigint (64-bit), float (32-bit), double (64-bit)
* keep the concept of datetime as a text/value subtype

Why? So any openEHR data can be unambiguously serialized to and deserialized 
from JSON, that crappy lowest common denominator format that joe and all his 
friends and all their tools can work with out of the box. Maybe we can still be 
smart enough to have something non-crappy...
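
To make the reduced type set concrete, here is a minimal Python sketch of a 
check that a value uses only maps, lists and the primitives listed above, plus 
a JSON round-trip test. The helper names `is_lcd_value` and `round_trips` are 
made up for this illustration, and the 64-bit range check is my reading of 
"bigint"; none of this is part of any openEHR specification.

```python
import json

def is_lcd_value(value):
    """True if value uses only the reduced type set sketched above."""
    if value is None or isinstance(value, (str, bool, float)):
        return True
    if isinstance(value, int):
        # Python ints are unbounded, so enforce the 64-bit signed range.
        return -2**63 <= value < 2**63
    if isinstance(value, list):
        return all(is_lcd_value(item) for item in value)
    if isinstance(value, dict):
        # Maps must have string keys and allowed values.
        return all(isinstance(key, str) and is_lcd_value(child)
                   for key, child in value.items())
    return False  # sets, tuples, intervals, custom objects, ...

def round_trips(value):
    """True if value survives JSON serialization and parsing unchanged."""
    return json.loads(json.dumps(value)) == value

# Illustrative record only.
record = {"last_name": {"value": "Balkenende"}}
```

A Python set fails `is_lcd_value`, and a tuple does not round-trip because it 
comes back from JSON as a list, which is the kind of ambiguity the reduced 
type set is meant to rule out.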

...next, imagine that the map indices are now at-codes, meaning at-codes are 
required to be unique within maps. At-codes are also made mandatory for all 
model elements that are not primitives, though simple types can have 
reference-model-provided details. Finally, imagine that at-codes are not 
at-codes, but codes matching [a-zA-Z][a-zA-Z0-9_-]{1,36}.
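
As a rough sketch of what the key rule could look like in practice, the Python 
below walks nested maps and lists and reports every map key that does not 
match the pattern quoted above. The function name `invalid_keys` and the 
path notation in its output are assumptions made for this example.

```python
import re

# Pattern copied from the text above; fullmatch() requires the whole key to match.
CODE_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]{1,36}")

def invalid_keys(value, path=""):
    """Return the paths of all map keys that do not match CODE_PATTERN."""
    bad = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}/{key}"
            if not isinstance(key, str) or not CODE_PATTERN.fullmatch(key):
                bad.append(child_path)
            bad.extend(invalid_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            bad.extend(invalid_keys(child, f"{path}[{index}]"))
    return bad
```

Uniqueness within a map comes for free once the data is held in a dict. Note 
that the pattern as written requires at least two characters, so a 
one-letter key would be rejected.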

Why? So your canonical JSON representation can be logical, even beautiful.

Where are we now?

  {
    "person_name": {
      "first_name": [
        {"first_name_part": {"value": "Jan"}},
        {"first_name_part": {"value": "Peter"}}
      ],
      "last_name": {"value": "Balkenende"}
    }
  }
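
For a feel of what joe programmer gets out of this, here is a small Python 
sketch that parses the record above with the standard `json` module and walks 
it with plain key and index lookups. The `lookup` helper is a made-up 
convenience for this example, not part of any openEHR tooling.

```python
import json

# The example record from above, with keys quoted so json.loads accepts it.
DOC = json.loads("""
{
  "person_name": {
    "first_name": [
      {"first_name_part": {"value": "Jan"}},
      {"first_name_part": {"value": "Peter"}}
    ],
    "last_name": {"value": "Balkenende"}
  }
}
""")

def lookup(data, *steps):
    """Follow map keys (str) and list indices (int) through nested data."""
    for step in steps:
        data = data[step]
    return data

first_names = [part["first_name_part"]["value"]
               for part in lookup(DOC, "person_name", "first_name")]
last_name = lookup(DOC, "person_name", "last_name", "value")
print(" ".join(first_names), last_name)  # prints "Jan Peter Balkenende"
```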

Err, what?
----------
Wait, is that really openEHR data? Am I serious? Yeah, I think so. But at what 
cost? What did we just lose?

I imagine ADL would become quite a bit simpler/more regular

  archetype (adl_version=2.0)
    openEHR-DEMOGRAPHIC-MAP.person_details.v1
  definition
    MAP[details] optional matches {
      ENTRY[identities] optional matches {
        LIST[identity_list] matches {
          MAP[person_name] optional matches {
            -- you can have at most one first name consisting of 1 or more parts
            ENTRY[first_name] optional matches {
              LIST[first_name_parts] matches {
                DV_TEXT[first_name_part] occurrences {1..*} matches {
                  value matches {*}
                }
              }
            }
            -- you must have exactly one last name
            ENTRY[last_name] matches {
              DV_TEXT[last_name_value] matches {
                value matches {*}
              }
            }      
          }
        }
      }
    }

which could in turn much more easily translate to XSD [2]

  <xs:element name="person_name" type="PERSON_NAME"/>
  <xs:complexType name="PERSON_NAME">
    <xs:complexContent>
      <xs:restriction base="openehr:MAP">
  ...
  <xs:element name="first_name_part" type="FIRST_NAME_PART"/>
  <xs:complexType name="FIRST_NAME_PART">
    <xs:complexContent>
      <xs:restriction base="openehr:DV_TEXT">

so that we could have pretty XML

  <person
      xmlns:o="http://openehr.org/xsd/v2"
      xmlns="http://openehr.org/ckm/xsd/openEHR-DEMOGRAPHIC-CLUSTER.person_name.v1">
    <details>
      <identities>
        <!-- list is hidden xs:sequence -->
        <person_name>
          <first_name>
            <first_name_part>
              <!-- first_name_part is-a DV_TEXT mitigates
                   <value><value>...</value></value> -->
              <o:value>Jan</o:value>
            </first_name_part>
            <first_name_part>
              <o:value>Peter</o:value>
            </first_name_part>
          </first_name>
          <last_name>
            <last_name_value>
              <o:value>Balkenende</o:value>
            </last_name_value>
          </last_name>
        </person_name>
      </identities>
    </details>
  </person>

which would also be much more efficient to store and index, and lead to more 
intuitive XPath

  //first_name_part[1]/value/text()        (: Jan :)
  //first_name_part[2]/value/text()        (: Peter :)
  //last_name_value/value/text()           (: Balkenende :)
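
One caveat when trying this out: a strict XPath 1.0 processor needs a prefix 
for elements in a default namespace, so the paths above need namespace 
bindings to match the instance as written. A minimal Python sketch using the 
standard library's `xml.etree.ElementTree` (which supports only a subset of 
XPath) follows. The prefixes `a` and `o` are local choices for this example, 
and the instance is a condensed copy of the one above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the example instance; prefixes are arbitrary.
NS = {
    "o": "http://openehr.org/xsd/v2",
    "a": "http://openehr.org/ckm/xsd/openEHR-DEMOGRAPHIC-CLUSTER.person_name.v1",
}

XML = """
<person xmlns:o="http://openehr.org/xsd/v2"
        xmlns="http://openehr.org/ckm/xsd/openEHR-DEMOGRAPHIC-CLUSTER.person_name.v1">
  <details><identities><person_name>
    <first_name>
      <first_name_part><o:value>Jan</o:value></first_name_part>
      <first_name_part><o:value>Peter</o:value></first_name_part>
    </first_name>
    <last_name>
      <last_name_value><o:value>Balkenende</o:value></last_name_value>
    </last_name>
  </person_name></identities></details>
</person>
"""

root = ET.fromstring(XML)

# Same lookups as the XPath lines above, with explicit namespace prefixes.
first_parts = [e.text for e in root.findall(".//a:first_name_part/o:value", NS)]
last_name = root.findtext(".//a:last_name_value/o:value", namespaces=NS)
print(first_parts, last_name)
```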

Reflections
-----------
Basically, this involves sacrificing some modeling language power for increased 
implementation feasibility and tool interoperability. Most importantly it will 
reduce the cognitive dissonance that openEHR's two-level modelling introduces 
for joe programmer.

Now, modeling language power is of use, isn't it? Surely the great power of ADL 
is not to be tampered with, and conciseness of expression must not be lost? 
Well, hmm. Obviously, significantly reducing what clinical models are possible 
to express is not an option at all.

But, on the other hand, you can convert pretty much any data structure into a 
tree (and we can keep Link around for approximating graph constructs), and 
convert any tree into a combination of maps and lists (just look at DOM...), 
and so it seems that this would then really place a burden mostly on the 
evolution of more powerful modeling tools, to allow expressing rich concepts 
using the more limited underlying modeling language without getting stuck [3]. 
I think this may actually be reasonable: surveying the archetypes in the CKM, 
they are predominantly using pretty simple composition and polymorphism, and 
simple binary relations. The archetypes can get pretty big and they can have 
some interesting constraints, but they're not that complex structurally.
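
To illustrate the "any tree into maps and lists" step, here is one possible 
mapping, sketched in Python over an XML element tree. The convention used (a 
leaf becomes `{"value": text}`, a repeated child name becomes a list) and the 
element name `part` are assumptions for this example, not something openEHR 
defines.

```python
import xml.etree.ElementTree as ET

def to_maps_and_lists(element):
    """Convert an element tree into nested dicts and lists.

    Assumed convention: a leaf becomes {"value": text}; a child name that
    occurs once becomes a map entry; a repeated child name becomes a list.
    """
    children = list(element)
    if not children:
        return {"value": element.text}
    result = {}
    for child in children:
        converted = to_maps_and_lists(child)
        if child.tag in result:
            existing = result[child.tag]
            if not isinstance(existing, list):
                # Second occurrence of this name: promote the entry to a list.
                result[child.tag] = existing = [existing]
            existing.append(converted)
        else:
            result[child.tag] = converted
    return result

tree = ET.fromstring(
    "<person_name>"
    "<first_name><part>Jan</part><part>Peter</part></first_name>"
    "<last_name>Balkenende</last_name>"
    "</person_name>"
)
data = to_maps_and_lists(tree)
```

A weakness of this naive mapping is that a repeatable child that happens to 
occur once comes out as a map rather than a one-element list. Resolving that 
needs schema knowledge, which is where the archetype would come in.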

However, if it turns out this proposed loss of modeling power isn't acceptable, 
I think the ORM (object role modelling) community has shown a successful way to 
have the best of several worlds: we could keep ADL (and its supporting tooling) 
around, mostly as it is, but then introduce a standardized translation from it 
to an intermediary, dumber, schema form (like ORM can be machine-translated to 
ER or UML), say Flattened ADL, or FADL, which is then the basis for data 
formats and system connectivity.

But, and this is perhaps the sour pill to swallow, ORM has _also_ shown that 
the seductiveness of a really powerful modeling language [4] is a great way to 
forever remain a relatively small and obscure community [5] while the majority 
of IT is off making big bucks by building theoretically (even provably) 
inferior systems. In a landscape still full of HL7v2, 80s-style SQL, and 
90s-style data entry forms, perhaps the strategy with the most chance of 
long-term success for openEHR is to actually let go of the shiniest tools.

Not meant as a call to action, just some food for thought :-)


cheers,


Leo

[1] of course this leads to a reasonable business model for jane manager, bob's 
boss, who gets to sell bob's shiny things to joe...but only so long as joe 
doesn't revolt...and the marketing is hard...
[2] as long as there are archetype slots, any purely XSD-based validation is 
not going to work unless you annotate instance data with the schema, but I 
can't imagine we'd want to consider giving up slots...
[3] so this turns out great for jane anyway, since she prefers the smart&rich 
customers...?
[4] if you're not familiar with ORM or other fact-based modelling, basically, 
the approach has been around forever (much longer than OO), and it kicks UML's 
_ass_. GEHR and openEHR would no doubt have looked even prettier if they'd been 
expressed using ORM instead of UML.
[5] and a frustrated community at times too...hmm, I guess you might say ORM is 
to UML as Lisp is to Java...

