On 15/04/2013 04:07, Randolph Neall wrote:

> I just spent quite a few profitable hours today with ehr_im.pdf, which
> appears to be the main resource for understanding the "Information
> Model" or "Reference Model", available for download from the CKM web
> site.
>
> Overall, it's a very well-written document that anyone trying to design
> or implement any sort of EHR system should read. I'm left with a few
> questions about instantiation, isolation, persistence, querying and the
> impact of changes on stored content and querying. I hate to take
> valuable time for anyone to answer my questions, so maybe all I need are
> some more references.
>
> I'll first explain what I think I understand of how it all works.
>
> From what I can see, the entire system consists of a hierarchy of
> classes. Some, like EHR, Composition, Instruction, Observation,
> Evaluation and Action, are defined as part of the reference model, while
> others, the archetypes, which are not part of the reference model, all
> inherit from one of these RM classes. There are other RM classes (Entry,
> Navigation, Folder, Data_Structure, etc.) that are also part of the RM
> and are properties of the archetypes. EHR is the base class, containing,
> by reference, all the others. Navigational information inside the
> composition archetypes is apparently critical. The Composition type is
> the basic container for all other archetypes that might be used within a
> single "contribution". And templates specify which archetypes will exist
> in the composition types and in what arrangement. All of this seems
> quite clear.
>
> Several things would seem to follow from all this:
>
> To access even the smallest detail from the overall record, the software
> would need to request the entire record from the server, presumably in
> the form of a binary stream, deserialize it all, and then instantiate
> everything from the EHR class on down.
> It is somewhat analogous to loading a document of some sort, something
> you load into memory in its entirety before you can read anything from
> it. Am I mistaken here? Or is there a way to instantiate small pieces of
> it? That, it seems, would depend on the level at which serialization
> occurs: whether the record is serialized in one big blob (or XML
> document) or in smaller units.
Hi Randy,

It's a standard operation to query for and obtain objects of all sizes,
from whole Compositions (the largest contiguous objects in an openEHR
system) down to a single Element, or even just the Quantity inside the
Element. The wiki seems to be down right now, but there is a specification
of the return structures for querying that describes this.

> If it is all in one piece, how do you manage isolation? Can only one
> user "check out" the record at one time? Or does it work something like
> source control systems such as SVN, where different people can commit to
> a common project, merge differences, etc.? Once you obtain the binary
> stream from the server, you from then on know nothing of changes others
> might also be making.

The update logic is Composition-level, and you can't commit anything
smaller than a Composition. The default logic is 'optimistic', meaning
that there is no locking per se; instead, each retrieved Composition
includes its version (in meta-data not visible to the data author), and an
attempt to write back a new version of a Composition causes a check
between the current top version in the server and the version recorded for
the Composition when it was retrieved. If they are identical, the write
succeeds. Branching is also supported in the specification. Read the
Common IM
<http://www.openehr.org/releases/1.0.2/architecture/rm/common_im.pdf>
for the details.

> It would also seem to follow that when you want to save your work (say
> you added some composition) you would serialize the entire record--which
> may contain years of information--and send it to the server as a fresh
> new document, completely replacing the old one, which, presumably, would
> be moved to some "past version" archive. Correct?

Not correct ;-) The EHR is a virtual information object, and has no
containment relationship to the Compositions or other items it includes.
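[Editor's note] The optimistic, Composition-level version check described above can be sketched as follows. This is a minimal illustration, not openEHR code: all names (VersionStore, commit, retrieve, etc.) are hypothetical, and a real system would use structured version identifiers rather than a plain integer counter.

```python
class VersionConflict(Exception):
    """Raised when someone else committed a newer version since retrieval."""
    pass


class VersionStore:
    """Toy versioned store: one version list per Composition uid."""

    def __init__(self):
        self._versions = {}  # uid -> list of stored version contents

    def retrieve(self, uid):
        """Return (latest contents, version number it was read at)."""
        versions = self._versions.setdefault(uid, [])
        return (versions[-1] if versions else None), len(versions)

    def commit(self, uid, new_content, preceding_version):
        """Accept the write only if no newer version exists than the one
        the caller retrieved; otherwise reject so the caller can
        re-retrieve, merge, and retry."""
        versions = self._versions.setdefault(uid, [])
        if preceding_version != len(versions):
            raise VersionConflict(
                f"{uid}: store is at version {len(versions)}, "
                f"caller read version {preceding_version}")
        versions.append(new_content)
        return len(versions)  # new current version number
```

For example, if two users retrieve the same Composition and the first commits, the second's commit raises `VersionConflict` instead of silently overwriting the first user's change.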
> If so, how do you cope with your storage requirements roughly doubling
> with every tiny addition to the record? I'm probably way off here;
> you've probably got an elegant answer to this, namely, some sort of
> segmented storage, with each composition persisted in its own little
> blob??

Yep, that's much closer.

> You have event classes and you have persistent classes, well described
> in the pdf. A persistent class would be something like a current drug
> list. Following on with my understanding, it would seem that any change
> to this list via a new composition submission would effectively create
> an entirely new copy of the list, embracing any changes, however slight.
> Would the old one then be archived in the now-obsolete former EHR
> record?

Actually the spec doesn't say how the new version is stored, only that
its logical contents have to be the current full contents of the
medications list (or whatever). Vendors could implement differential
version representation if they want to.

> How, in all this, would querying work? Would the server itself have to
> deserialize and instantiate hundreds or thousands of complete EHR
> records in order to search within them?

No, the basic approach is:

* queries are expressed using archetype paths (see the AQL spec
<http://www.openehr.org/wiki/display/spec/AQL-+Archetype+Query+Language>)
* since data are also created using archetypes, the stored data know
which archetypes (and interior archetype nodes) are responsible for every
single fragment of data
* indexing can be based on this knowledge, so that an archetype-level
(i.e. domain-level) path map can be recorded in an index for every
Composition stored
* this enables a query server to look at the query, figure out which
archetypes & paths are implicated, and find candidate Compositions for
matches, based on the archetype index
* depending on the level of blobbing in the implementation, at some
point, smaller or larger blobs will have to be materialised to satisfy
WHERE-clause value comparisons.

The performance turns out to be very good doing this. With some other
basic assumptions - such as: most EHR access is for read purposes -
caching and other tricks can be used to make high-performance systems.
There are even smarter indexing systems in use now that index actual
values as well as structure, so that an entire query can be processed off
an index.

> I understand that you do have some path information persisted outside
> the EHR blob, giving you some idea what is inside of what, but that
> would still not eliminate the need to do a server-side deserialization
> and instantiation in order to read specific information pointed to by
> the externally-stored paths. Or so I would think. If I'm right, how fast
> are your queries and what sort of hardware does it take to run them?

Well, that's often true, depending on the query content, but you can
probably guess from the above that the amount of deserialisation is
actually quite limited and completely manageable. It's not even the
bottleneck - the bottleneck is almost always web services, and XML
serialise / deserialise.

hope this clarifies

- t
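[Editor's note] The candidate-finding strategy described in the bullet points above - index which archetype paths occur in each stored Composition, use the index to narrow the search, and materialise only the candidate blobs to evaluate WHERE-clause comparisons - can be sketched as follows. All names here are hypothetical and the flat path-to-value map is a simplification; real AQL paths are hierarchical expressions such as /data/events/data/items[at0004]/value.

```python
from collections import defaultdict


class PathIndex:
    """Toy archetype-path index over stored Composition 'blobs'."""

    def __init__(self):
        self._index = defaultdict(set)  # archetype path -> composition uids
        self._blobs = {}                # uid -> stored data (path -> value)

    def store(self, uid, data):
        """data: flat map of archetype path -> value for one Composition.
        Record, per path, which Compositions contain it."""
        self._blobs[uid] = data
        for path in data:
            self._index[path].add(uid)

    def query(self, path, predicate):
        """Index lookup finds candidates; only those blobs are then
        materialised to evaluate the WHERE-clause predicate."""
        candidates = self._index.get(path, set())
        results = []
        for uid in candidates:
            data = self._blobs[uid]          # deserialise one small blob
            if predicate(data[path]):        # WHERE-clause value comparison
                results.append(uid)
        return sorted(results)
```

A query such as "systolic pressure > 140" then touches only the Compositions the index says contain the blood-pressure path, never the whole EHR population.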