Re: Archetype relational mapping - a practical openEHR persistence solution

Bert Verhees Sat, 13 Feb 2016 09:53:49 -0800

On 13-02-16 18:13, Birger Haarbrandt wrote:

Even as a research associate or independent developer, its sometimesgood to take a break from work :) I hope I didn't miss a point in youre-mail and hope that I understood corretly: when you storeopenEHR-data using XML technologies, you will have to use XMLtechnologies to query that data. To my understanding, also Oracle willneed xpath/xquery when you store XML data using the XML datatypedcolumn. If you want to combine this data with other data that isn'tbased on XML, you will have to convert the external data to XML or mixquery languages in the mentioned data bases.

I don't understand the point you make. The leafnodes in an archetype arealways constraints on primitives, and XML can represent all types ofprimitives.

Everything that is not a leafnode is a structure, the structure isobject oriented and easy to convert to XML, where the tag-names are theattribute-names, and the type-names are the typenames from the RM-XSD.The structure which is represented in XML is also easy to bring back toan object oriented structure.


It is just a notation, nothing else.

There are even standard java-classes (I thought in JAXB) which supportthis kind of things.

AQL is easy to convert to XQuery, but maybe I would prefer XQuery, andit is, for a non technical person possible to use a drag and dropinterface which represents the archetypepaths.

I am missing your point, or better sad, we are not on the same track ofunderstanding


Best regards
Bert

In short: It makes sense to not use the openEHR CDR for such queriesbut export it to something more suited for such purpose. This does notneed to be a classical Data Warehouse (it has only limited use for thequestions clinicians have) but certainly XML would not be my firstchoice for the analytics layer.
Birger


Von meinem iPad gesendet
Am 13.02.2016 um 17:56 schrieb Bert Verhees <bert.verh...@rosa.nl<mailto:bert.verh...@rosa.nl>>:
Thanks,
I always forget it is weekend, as independent developer, there ain'tno such thing as a day of.
I agree that mixing query-languages is ugly, and I don't see thenecessity.
But maybe, you will explain that next week

Bert

On 13-02-16 17:46, Birger Haarbrandt wrote:
Hi Bert,
I will post some more thoughts on these things after the weekend :)To just give a quick answer: imo it's important to have a flexibledata format like the one I2B2 uses (roughly said EAV) to mix openEHRdata with non-openEHR data. Making analysis on XMLdocuments/databaes might prevent integration of other sources (andeven if its possible like in SQL Server or Oracle, queries that mixXQuery and SQL might become pretty ugly. And I see no good way toprovide a drag & drop interface to researchers/physicians to makequeries).
Cheers,

Birger

Von meinem iPad gesendet


Am 13.02.2016 um 16:24 schrieb Bert Verhees <bert.verh...@rosa.nl>:
No comments, on the other hand it is Saturday

I had left out some necessary technical details.
I will possible build it and then have possible the fastesttwo-level-modeling engine in the world, which will, of course, alsosupport OpenEHR.
So it is not really bad that this happens.

When it is ready, I will make an announcement.

Best regards and enjoy the weekend.
Bert Verhees


On 12-02-16 20:49, Bert Verhees wrote:
On 12-02-16 18:26, Erik Sundvall wrote:
if you are experimenting with open source native XML DBs foropenEHR, it preformed well for "clinical" patient-specificquerying even though all xml databases we tested were notsuitable for ad hoc epidemiological population queries (withoutquery specific indexing)
A very interesting paper. I have some first opinions on that. Butfirst I need to explain what I think about the matter.I have not prepared the story below, so there may be things whichI write to fast. See it as provisional view, not as a hard opinion.
---------
There are relational database-configurations for OLAP and forOLTP. The combination is hard to find. There are reasons. This isa classic problem.
You need specific indexes for data-mining (OLAP), and for everyextra data-mining query you need more indexes, especially if youdon't have time to wait a night for the result. Those extraindexes stand in the way for transactional processing (OLTP)because they need to be updated, and that is unnecessary burdenfor the OLTP-processes, as longer as the database exist, theburden becomes heavier.
That is why OLAP and OLTP are not often combined in one configuration.
So many professional databases have extra features for OLAP, Iworked, years ago with Oracle for a big Dutch television company,and my main job was to create indexes for marketing purposes.We ran those unusual queries during the night and stored theresult in special tables, Oracle called them "materialized views".The day after, those views were processed in analyzing software,like SPSS, and after that, thrown away.
It was a database with 900,000 persons in it, and every person hada lot of history of web-history, personal interests, etc."How much interest does a person have for opera, and is he able topay for opera, is it worth to call him for a ticket-offer, wecannot call 900,000 persons"These were complex queries based on things the customer bought,television programs he was interested in, web-activities.
That was the kind of thing they did with the database.
So this could well be compared with a medical database, regardingto size and complexity.
The same difficulties count for XML databases. That is why XMLdatabases have also features for creating extra indexes.Oracle, by the way, if it knows the structure of XML (via XSD), itbreaks, underwater, XML into relational data, and store it in arelational database. It also converts XQuery to SQL.In this way, it has the weakness and advantages of a relationaldatabase, and it needs the extra indexes for unusual queries, buton the developer view it is XML.
-------
Comparing XML and relational, for OpenEHR, I favor XML, because itcan easily reflect the structures which need to be stored. Itmakes the data-processing layer less complex. There is a lot oftooling around XML, XML-schema to make the database-structureknown to Oracle, Schematron to validate against archetypes. Thisis very matured software, and therefor the complexity is solvedyears ago, and well tested. It is hidden complexity, and maturedhidden complexity is no problem.
--------
And if you want to do data-mining, like epidemiological research,and you have the time to plan the research, then the classicaldatabase, XML or RDB, is OK.
In my opinion, there is not often a real need for adhocdata-mining (epidemiological research) queries, with result in afew minutes. They are always planned, and creating the indexes andstoring the result in "materialized views" are part of the workone has to do for data-mining research on data.
So, I don't think there is a real need for this.
--------
Regarding to XML databases, Oracle has a solution, which canperform well if it is professionally maintained.This is often a point, because professional Oracle maintainingregarding to advanced use is very expensive.Another company is MarkLogic. It is said that MarkLogic is better,but I don't know that from own experience.
Both are free to use for developers.
You must think of numbers to 35,000 Euro a year for licenses,which is not very much for a big hospital, but very much for asmall health serviceThe open source XML ExistDB database is not very good fordata-mining, is my personal experience.
So, we must ask ourselves, are we solving a problem that no oneexperiences?
--------
There are a few advantages to OpenEHR. Data are immutable, neverchanged, never deleted. This makes a few difficult steps unnecessary.The Dewey concepts look very attractive, although it is alsocreated with deleting and changing data in mind.
This is very important for normal company use.
But, as said, we don't have that in OpenEHR. Medical data alwaysgrow, it are always new data. An event that has passed will neverchange.The only things that change (in OpenEHR they are versioned, butfrom the user perspective, they change), that are demographicdata. And one can live with that, create extra provisions for thatdemographic database-section, which is only a small part of thecomplete database.
Often, the demographic database is external anyway.

So, my thoughts, maybe Dewey is too good.
Path-values (to leaf-nodes) storage is enough
The paths, combined with ID's are keys, and are much alike XPath,so it is easy to store XML in a path-value database.
And querying is also easy, because all queries are path based.
I think, for OpenEHR this is the fastest solution. But maybe Ioverlook something.Maybe I say something stupid. I am not offended if you say so(maybe in other words ;-).
We all want and need to learn.
What we still need to do to build this solution is handle theXQuery grammar and let it run on path-based-database.This is not very easy, but also not very hard. Maybe thealgorithms are already to find.Like the Dewey algorithm this can run on any database, also onfree open source databases.
I think you get an excellent performance on Postgres.
A path-value database is easy to index, it only needs a few. Theinserting will stay very fast, always.
Lets do some calculations, for fun:
How many path-value-combinations do you have to store for anaverage composition?
Maybe 30? How many compositions are for an average patient? 10,000?
So every patient needs 300,000 path-values. So you can store10,000 patients in 3 billion records.
This is not much for Postgres, and the simple indexes needed.
When you need to store 900,000 patients you need 90 separate tables.
Very cheap, very fast, also for adhoc queries, and easy toaccomplish, I think.
I am very interested in opinions on this.

Thanks
Bert
_______________________________________________
openEHR-technical mailing list
openEHR-technical@lists.openehr.org<mailto:openEHR-technical@lists.openehr.org>
http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org
_______________________________________________
openEHR-technical mailing list
openEHR-technical@lists.openehr.org
http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org
_______________________________________________
openEHR-technical mailing list
openEHR-technical@lists.openehr.org<mailto:openEHR-technical@lists.openehr.org>
http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org
_______________________________________________
openEHR-technical mailing list
openEHR-technical@lists.openehr.org
http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org

_______________________________________________
openEHR-technical mailing list
openEHR-technical@lists.openehr.org
http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org

Re: Archetype relational mapping - a practical openEHR persistence solution

Reply via email to