On 15/04/2013 04:07, Randolph Neall wrote:

> I just spent quite a few profitable hours today with ehr_im.pdf, which
> appears to be the main resource for understanding the "Information
> Model" or "Reference Model", available for download from the CKM web
> site.
>
> Overall, it's a very well-written document that anyone trying to design
> or implement any sort of EHR system should read. I'm left with a few
> questions about instantiation, isolation, persistence, querying and the
> impact of changes on stored content and querying. I hate to take
> valuable time for anyone to answer my questions, so maybe all I need are
> some more references.
>
> I'll first explain what I think I understand of how it all works.
>
> From what I can see, the entire system consists of a hierarchy of
> classes. Some, like EHR, Composition, Instruction, Observation,
> Evaluation and Action, are defined as part of the reference model, while
> others, the archetypes, which are not part of the reference model, all
> inherit from one of these RM classes. There are other RM classes (Entry,
> Navigation, Folder, Data_Structure, etc.) that are also part of the RM
> and are properties of the archetypes. EHR is the base class, containing,
> by reference, all the others. Navigational information inside the
> composition archetypes is apparently critical. The Composition type is
> the basic container for all other archetypes that might be used within a
> single "contribution". And templates specify which archetypes will exist
> in the composition types and in what arrangement. All of this seems
> quite clear.
>
> Several things would seem to follow from all this:
>
> To access even the smallest detail from the overall record, the software
> would need to request the entire record from the server, presumably in
> the form of a binary stream, deserialize it all, and then instantiate
> everything from the EHR class on down.
> It is somewhat analogous to loading a document of some sort, something
> you load into memory in its entirety before you can read anything from
> it. Am I mistaken here? Or is there a way to instantiate small pieces of
> it? That, it seems, would depend on the level at which serialization
> occurs: whether the record is serialized in one big blob (or XML
> document) or in smaller units.
Hi Randy,

It's a standard operation to query for and obtain objects of all sizes,
from whole Compositions (the largest contiguous objects in an openEHR
system) down to a single Element, or even just the Quantity inside the
Element. The wiki seems to be down right now, but there is a specification
of the return structures for querying that describes this.

> If it is all in one piece, how do you manage isolation? Can only one
> user "check out" the record at one time? Or does it work something like
> source control systems such as SVN, where different people can commit to
> a common project, merge differences, etc.? Once you obtain the binary
> stream from the server, you from then on know nothing of changes others
> might also be making.

The update logic is Composition-level, and you can't commit anything
smaller than a Composition. The default logic is 'optimistic', meaning
that there is no locking per se; instead, each retrieved Composition
includes its version (in meta-data not visible to the data author), and an
attempt to write back a new version of a Composition causes a check
between the current top version in the server and the version recorded for
the Composition when it was retrieved. If they are identical, the write
succeeds. Branching is also supported in the specification. Read the
Common IM
<http://www.openehr.org/releases/1.0.2/architecture/rm/common_im.pdf>
for the details.

> It would also seem to follow that when you want to save your work (say
> you added some composition) you would serialize the entire record--which
> may contain years of information--and send it to the server as a fresh
> new document, completely replacing the old one, which, presumably, would
> be moved to some "past version" archive. Correct?

Not correct ;-) The EHR is a virtual information object, and has no
containment relationship to the Compositions or other items it includes.
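[Editor's note] The optimistic, Composition-level version check described above can be sketched as follows. This is a minimal illustration, not openEHR code: all names (VersionStore, commit, retrieve, etc.) are hypothetical, and a real system would use structured version identifiers rather than a plain integer counter.

```python
class VersionConflict(Exception):
    """Raised when someone else committed a newer version since retrieval."""
    pass


class VersionStore:
    """Toy versioned store: one version list per Composition uid."""

    def __init__(self):
        self._versions = {}  # uid -> list of stored version contents

    def retrieve(self, uid):
        """Return (latest contents, version number it was read at)."""
        versions = self._versions.setdefault(uid, [])
        return (versions[-1] if versions else None), len(versions)

    def commit(self, uid, new_content, preceding_version):
        """Accept the write only if no newer version exists than the one
        the caller retrieved; otherwise reject so the caller can
        re-retrieve, merge, and retry."""
        versions = self._versions.setdefault(uid, [])
        if preceding_version != len(versions):
            raise VersionConflict(
                f"{uid}: store is at version {len(versions)}, "
                f"caller read version {preceding_version}")
        versions.append(new_content)
        return len(versions)  # new current version number
```

For example, if two users retrieve the same Composition and the first commits, the second's commit raises `VersionConflict` instead of silently overwriting the first user's change.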
> If so, how do you cope with your storage requirements roughly doubling
> with every tiny addition to the record? I'm probably way off here;
> you've probably got an elegant answer to this, namely, some sort of
> segmented storage, with each composition persisted in its own little
> blob??

Yep, that's much closer.

> You have event classes and you have persistent classes, well described
> in the pdf. A persistent class would be something like a current drug
> list. Following on with my understanding, it would seem that any change
> to this list via a new composition submission would effectively create
> an entirely new copy of the list, embracing any changes, however slight.
> Would the old one then be archived in the now-obsolete former EHR
> record?

Actually the spec doesn't say how the new version is stored, only that
its logical contents have to be the current full contents of the
medications list (or whatever). Vendors could implement differential
version representation if they want to.

> How, in all this, would querying work? Would the server itself have to
> deserialize and instantiate hundreds or thousands of complete EHR
> records in order to search within them?

No, the basic approach is:

* queries are expressed using archetype paths (see the AQL spec
<http://www.openehr.org/wiki/display/spec/AQL-+Archetype+Query+Language>)
* since data are also created using archetypes, the stored data know
which archetypes (and interior archetype nodes) are responsible for every
single fragment of data
* indexing can be based on this knowledge, so that an archetype-level
(i.e. domain-level) path map can be recorded in an index for every
Composition stored
* this enables a query server to look at the query, figure out which
archetypes & paths are implicated, and find candidate Compositions for
matches, based on the archetype index
* depending on the level of blobbing in the implementation, at some
point, smaller or larger blobs will have to be materialised to satisfy
WHERE-clause value comparisons.

The performance turns out to be very good doing this. With some other
basic assumptions - such as: most EHR access is for read purposes -
caching and other tricks can be used to make high-performance systems.
There are even smarter indexing systems in use now that index actual
values as well as structure, so that an entire query can be processed off
an index.

> I understand that you do have some path information persisted outside
> the EHR blob, giving you some idea what is inside of what, but that
> would still not eliminate the need to do a server-side deserialization
> and instantiation in order to read specific information pointed to by
> the externally-stored paths. Or so I would think. If I'm right, how fast
> are your queries and what sort of hardware does it take to run them?

Well, that's often true, depending on the query content, but you can
probably guess from the above that the amount of deserialisation is
actually quite limited and completely manageable. It's not even the
bottleneck - the bottleneck is almost always web services, and XML
serialise / deserialise.

hope this clarifies

- t
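[Editor's note] The candidate-finding strategy described in the bullet points above - index which archetype paths occur in each stored Composition, use the index to narrow the search, and materialise only the candidate blobs to evaluate WHERE-clause comparisons - can be sketched as follows. All names here are hypothetical and the flat path-to-value map is a simplification; real AQL paths are hierarchical expressions such as /data/events/data/items[at0004]/value.

```python
from collections import defaultdict


class PathIndex:
    """Toy archetype-path index over stored Composition 'blobs'."""

    def __init__(self):
        self._index = defaultdict(set)  # archetype path -> composition uids
        self._blobs = {}                # uid -> stored data (path -> value)

    def store(self, uid, data):
        """data: flat map of archetype path -> value for one Composition.
        Record, per path, which Compositions contain it."""
        self._blobs[uid] = data
        for path in data:
            self._index[path].add(uid)

    def query(self, path, predicate):
        """Index lookup finds candidates; only those blobs are then
        materialised to evaluate the WHERE-clause predicate."""
        candidates = self._index.get(path, set())
        results = []
        for uid in candidates:
            data = self._blobs[uid]          # deserialise one small blob
            if predicate(data[path]):        # WHERE-clause value comparison
                results.append(uid)
        return sorted(results)
```

A query such as "systolic pressure > 140" then touches only the Compositions the index says contain the blood-pressure path, never the whole EHR population.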