Re: Archetype relational mapping - a practical openEHR persistence solution

Birger Haarbrandt Sat, 13 Feb 2016 08:48:30 -0800

Hi Bert,

I will post some more thoughts on these things after the weekend :) To just 
give a quick answer: imo it's important to have a flexible data format like the 
one I2B2 uses (roughly said EAV) to mix openEHR data with non-openEHR data. 
Making analysis on XML documents/databaes might prevent integration of other 
sources (and even if its possible like in SQL Server or Oracle, queries that 
mix XQuery and SQL might become pretty ugly. And I see no good way to provide a 
drag & drop interface to researchers/physicians to make queries).


Cheers,

Birger 

Von meinem iPad gesendet


> Am 13.02.2016 um 16:24 schrieb Bert Verhees <bert.verh...@rosa.nl>:
> 
> No comments, on the other hand it is Saturday
> 
> I had left out some necessary technical details.
> I will possible build it and then have possible the fastest 
> two-level-modeling engine in the world, which will, of course, also support 
> OpenEHR.
> So it is not really bad that this happens.
> 
> When it is ready, I will make an announcement.
> 
> Best regards and enjoy the weekend.
> Bert Verhees
> 
> 
>> On 12-02-16 20:49, Bert Verhees wrote:
>>> On 12-02-16 18:26, Erik Sundvall wrote:
>>> if you are experimenting with open source native XML DBs for openEHR, it 
>>> preformed well for "clinical" patient-specific querying even though all xml 
>>> databases we tested were not suitable for ad hoc epidemiological population 
>>> queries (without query specific indexing)
>> 
>> 
>> A very interesting paper. I have some first opinions on that. But first I 
>> need to explain what I think about the matter.
>> I have not prepared the story below, so there may be things which I write to 
>> fast. See it as provisional view, not as a hard opinion.
>> ---------
>> There are relational database-configurations for OLAP and for OLTP. The 
>> combination is hard to find. There are reasons. This is a classic problem.
>> 
>> You need specific indexes for data-mining (OLAP), and for every extra 
>> data-mining query you need more indexes, especially if you don't have time 
>> to wait a night for the result. Those extra       indexes stand in the way 
>> for transactional processing (OLTP) because they need to be updated, and 
>> that is unnecessary burden for the OLTP-processes, as longer as the database 
>> exist, the burden becomes heavier. 
>> That is why OLAP and OLTP are not often combined in one configuration.
>> 
>> So many professional databases have extra features for OLAP, I worked, years 
>> ago with Oracle for a big Dutch television company, and my main job was to 
>> create indexes for marketing purposes.
>> We ran those unusual queries during the night and stored the result in 
>> special tables, Oracle called them "materialized views". 
>> The day after, those views were processed in analyzing software, like SPSS, 
>> and after that, thrown away.
>> 
>> It was a database with 900,000 persons in it, and every person had a lot of 
>> history of web-history, personal interests, etc.
>> "How much interest does a person have for opera, and is he able to pay for 
>> opera, is it worth to call him for a ticket-offer, we cannot call 900,000 
>> persons"
>> These were complex queries based on things the customer bought, television 
>> programs he was interested in, web-activities.
>> That was the kind of thing they did with the database.
>> 
>> So this could well be compared with a medical database, regarding to size 
>> and complexity.
>> 
>> The same difficulties count for XML databases. That is why XML databases 
>> have also features for creating extra indexes.
>> Oracle, by the way, if it knows the structure of XML (via XSD), it breaks, 
>> underwater, XML into relational data, and store it in a relational database. 
>> It also converts XQuery to SQL.
>> In this way, it has the weakness and advantages of a relational database, 
>> and it needs the extra indexes for unusual queries, but on the developer 
>> view it is XML.
>> -------
>> Comparing XML and relational, for OpenEHR, I favor XML, because it can 
>> easily reflect the structures which need to be stored. It makes the 
>> data-processing layer less complex. There is a lot of tooling around XML, 
>> XML-schema to make the database-structure known to Oracle, Schematron to 
>> validate against archetypes. This is very matured software, and therefor the 
>> complexity is solved years ago, and well tested. It is hidden complexity, 
>> and matured hidden complexity is no problem.
>> --------
>> And if you want to do data-mining, like epidemiological research, and you 
>> have the time to plan the research, then the classical database, XML or RDB, 
>> is OK.
>> 
>> In my opinion, there is not often a real need for adhoc data-mining 
>> (epidemiological research) queries, with result in a few minutes. They are 
>> always planned, and creating the indexes and storing the result in  
>> "materialized views" are part of the work one has to do for data-mining 
>> research on data.
>> So, I don't think there is a real need for this.
>> --------
>> Regarding to XML databases, Oracle has a solution, which can perform well if 
>> it is professionally maintained. 
>> This is often a point, because professional Oracle maintaining regarding to 
>> advanced use is very expensive.
>> Another company is MarkLogic. It is said that MarkLogic is better, but I 
>> don't know that from own experience. 
>> Both are free to use for developers.
>> You must think of numbers to 35,000 Euro a year for licenses, which is not 
>> very much for a big hospital, but very much for a small health service
>> The open source XML ExistDB database is not very good for data-mining, is my 
>> personal experience.
>> 
>> So, we must ask ourselves, are we solving a problem that no one experiences?
>> --------
>> There are a few advantages to OpenEHR. Data are immutable, never changed, 
>> never deleted. This makes a few difficult steps unnecessary.
>> The Dewey concepts look very attractive, although it is also created with 
>> deleting and changing data in mind. 
>> This is very important for normal company use.
>> 
>> But, as said, we don't have that in OpenEHR. Medical data always grow, it 
>> are always new data. An event that has passed will never change. 
>> The only things that change (in OpenEHR they are versioned, but from the 
>> user perspective, they change), that are demographic data. And one can live 
>> with that, create extra provisions for that demographic database-section, 
>> which is only a small part of the complete database.
>> Often, the demographic database is external anyway.
>> 
>> So, my thoughts, maybe Dewey is too good.
>> Path-values (to leaf-nodes) storage is enough
>> The paths, combined with ID's are keys, and are much alike XPath, so it is 
>> easy to store XML in a path-value database.
>> And querying is also easy, because all queries are path based.
>> 
>> I think, for OpenEHR this is the fastest solution. But maybe I overlook 
>> something.
>> Maybe I say something stupid. I am not offended if you say so (maybe in 
>> other words ;-). 
>> We all want and need to learn.
>> 
>> What we still need to do to build this solution is handle the XQuery grammar 
>> and let it run on path-based-database.
>> This is not very easy, but also not very hard. Maybe the algorithms are 
>> already to find.
>> Like the Dewey algorithm this can run on any database, also on free open 
>> source databases.
>> I think you get an excellent performance on Postgres.
>> 
>> A path-value database is easy to index, it only needs a few. The inserting 
>> will stay very fast, always. 
>> 
>> Lets do some calculations, for fun:
>> How many path-value-combinations do you have to store for an average 
>> composition?
>> Maybe 30? How many compositions are for an average patient? 10,000?
>> So every patient needs 300,000 path-values. So you can store 10,000 patients 
>> in 3 billion records. 
>> This is not much for Postgres, and the simple indexes needed. 
>> When you need to store 900,000 patients you need 90 separate tables.
>> 
>> Very cheap, very fast, also for adhoc queries, and easy to accomplish, I 
>> think.
>> 
>> I am very interested in opinions on this.
>> 
>> Thanks 
>> Bert
>> 
>> 
>> 
>> 
>> 
>> 
> 
> _______________________________________________
> openEHR-technical mailing list
> openEHR-technical@lists.openehr.org
> http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org

_______________________________________________
openEHR-technical mailing list
openEHR-technical@lists.openehr.org
http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org

Re: Archetype relational mapping - a practical openEHR persistence solution

Reply via email to