On 14-02-16 00:04, Birger Haarbrandt wrote:

Hi Bert,

I'm not arguing that you can represent most data in XML. I'm just concerned that mangling high-volume or specialized data, for example sensor data, genome data and geo-spatial data, into a document format might not work too well. Also, when the ER diagram of non-openEHR data is fairly complex, producing a meaningful XSD and XML documents might not be that quick and easy (at least I don't know of an industry-strength tool that can help with this task. However, I may be wrong about this and I'd be happy to learn).


I agree, long runs of data are not well represented in XML; it has too much overhead. (Although there are other solutions for that which integrate easily with XML, but that aside.)

So treat XML as an intermediate representation: it is good for software to handle and it can represent objects very well, so it fits the object-oriented paradigm, which openEHR also follows. XML has good support for validation, it is widely understood, and almost every development environment has standard support for it.

There are two kinds of related, mature industry support I am looking for: a good, well-defined query language, and, as an extension of it, a validation environment. XQuery and Schematron are excellent technologies which fit the two-level modelling (openEHR) paradigm very well, because they are path-based.
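To illustrate what "path-based" buys you, here is a minimal sketch in Python, with the standard library's limited XPath support standing in for full XQuery. The element names are invented for the example; the point is that a path expression selects data by its position in the tree without knowing the rest of the document's shape:

```python
import xml.etree.ElementTree as ET

xml = """<composition>
  <observation><name>blood pressure</name><systolic>120</systolic></observation>
  <observation><name>heart rate</name><rate>72</rate></observation>
</composition>"""

root = ET.fromstring(xml)
# The path expression matches only elements at this position in the tree;
# the heart-rate observation is simply skipped.
values = [e.text for e in root.findall("./observation/systolic")]
print(values)  # ['120']
```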

JSON is also very good, and it is leaner; especially when sender and receiver have deep knowledge of the data (which is the case in openEHR), JSON is better. But industry support for JSON is, as far as I know, not as good as it is for XML. On the other hand, it is easy to migrate from XML to JSON and vice versa, even without structural data loss; see for example
http://www.utilities-online.info/xmltojson/
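As a rough illustration of such a migration, here is a simplified sketch using only the Python standard library. It ignores attributes and mixed content (tools like the one linked above handle more cases); repeated child tags are promoted to lists:

```python
import json
import xml.etree.ElementTree as ET

def xml_to_dict(elem):
    """Recursively convert an XML element into a plain dict.
    Leaves become strings; repeated child tags become lists."""
    children = list(elem)
    if not children:
        return elem.text
    result = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:
            # Promote repeated tags to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

xml = "<observation><systolic>120</systolic><diastolic>80</diastolic></observation>"
doc = {ET.fromstring(xml).tag: xml_to_dict(ET.fromstring(xml))}
print(json.dumps(doc))
# {"observation": {"systolic": "120", "diastolic": "80"}}
```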

I don't believe that XML databases actually store XML. Oracle, for example, breaks it up into a relational structure, though I don't know the internals of the others well. The worst solution for storing XML, however, would be to really store it as XML. In the solution I presented in my email it is not XML in which I want to store data, but path-value combinations (in fact, in detail it differs somewhat; this is the base idea. The elaborated idea is ten times as efficient.)
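A minimal sketch of that base idea: flattening an XML tree into path-value pairs. This is simplified (it ignores attributes and does not number repeated siblings, which a real design would need), and the element names are invented:

```python
import xml.etree.ElementTree as ET

def to_path_values(elem, prefix=""):
    """Flatten an XML tree into (path, value) pairs."""
    path = f"{prefix}/{elem.tag}"
    pairs = []
    text = (elem.text or "").strip()
    if text:
        # Only leaf text becomes a stored value; structure lives in the path.
        pairs.append((path, text))
    for child in elem:
        pairs.extend(to_path_values(child, path))
    return pairs

xml = "<composition><events><systolic>120</systolic></events></composition>"
rows = to_path_values(ET.fromstring(xml))
print(rows)
# [('/composition/events/systolic', '120')]
```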

Because, regarding storage, there are other criteria than for validating and communicating data. In storage, speed and efficiency are very important, and so is a very good and fast implementation of AQL (or XQuery). And when data are retrieved, they can be represented in JSON or XML, or whatever one likes; even support for native American smoke signals is possible. These are, again, just representations.


Regarding performance, we did some tests on SQL Server 2012 last year. As I have only experience with this particular database, it might well be that my critique does not apply to Oracle or Marklogic!


I am not very impressed by these database tests; there are so many side factors which are not taken into account: the JDBC drivers, for example, the communication protocols used, the indexes, the code of the supporting software layers, the quality of the query engine, the operating system, the file system, the network-card driver, and so on.
You are testing completely different stacks of technologies.
It is like testing a chain and then concluding that the last link is no good because the chain breaks somewhere in the middle.

But there is indeed a problem with the old database technologies: they are built for data manipulation. There are good reasons for that: a bank does not want to process your complete history every day, but wants to know your current savings and mortgage position, so it modifies your current data constantly. Codd normalization is also designed for efficiency and integrity in the context of data manipulation.

When you use a database out of the box, you will see features which are needed for constant manipulation. But you don't need them, because medical data are immutable. This is very important.

Just a minute ago I compared a simple SQL query with an XQuery on our data repository. I simply wanted to get all validated blood pressure values and their corresponding datetimes from a pediatric ICU. Using the plain relational representation of the data (we automatically map data from compositions to tables), it takes under 1 second to get all 329,273 rows. With a full index on the blood pressure fragment of the composition (this is needed to get the internal tabular representation of the data) and a secondary index on the paths, querying the same rows still takes 30 seconds (without the indexes it would be 2 minutes, no surprise). Additionally, the size of the data increases from 10 MB to 270 MB.

I can assure you that my database storage requires only a few indexes, and very fast ones, because the data are immutable.
The disadvantage of my solution is that it is not out of the box.
The most important job is to make the query engine work with the data storage, but there are now new ways to work with grammars, and I don't think this is very difficult.

The W3C has a lot of information on XQuery grammars:
https://www.w3.org/TR/xquery-xpath-parsing/
https://www.w3.org/TR/xquery-30/

When this is done, a database configuration designed for speed can be used on any RDB engine to implement this data-processing method.
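A toy illustration of that idea, assuming sqlite3 as a stand-in RDB engine: an append-only path/value table with a single path index, which is all immutable data needs. Table and column names are invented for the example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pv (comp_id INTEGER, path TEXT, value TEXT)")
# Append-only data: no update machinery, just a few fast indexes.
con.execute("CREATE INDEX idx_pv_path ON pv(path)")
rows = [
    (1, "/composition/observation/systolic", "120"),
    (1, "/composition/observation/diastolic", "80"),
]
con.executemany("INSERT INTO pv VALUES (?, ?, ?)", rows)
hits = con.execute(
    "SELECT value FROM pv WHERE path = ?",
    ("/composition/observation/systolic",),
).fetchall()
print(hits)  # [('120',)]
```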

But I see that we are indeed approaching the problem along different tracks. You test out-of-the-box solutions, as many people do. And I think that out of the box, nothing is good enough, because the vendors were not thinking of openEHR but of a million other customer requirements when designing their databases. And however good, well designed, professional and well maintained they are, they will not remove the characteristics which stand in your way.

This is the reality we face in our system; therefore, I consider XQuery and XML not an option for us to do analysis in this database layer. As said, this might not apply to a better implementation of XML by other vendors, but I'd love to see some real-world numbers.

Just some thoughts and experiences; I'm not a dedicated database expert, so I would not be sad if I'm proven wrong :)


Embrace the good news ;-)

Bert
_______________________________________________
openEHR-technical mailing list
openEHR-technical@lists.openehr.org
http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org
