Hi Bert, I will post some more thoughts on these things after the weekend :) To just give a quick answer: imo it's important to have a flexible data format like the one I2B2 uses (roughly said EAV) to mix openEHR data with non-openEHR data. Making analysis on XML documents/databaes might prevent integration of other sources (and even if its possible like in SQL Server or Oracle, queries that mix XQuery and SQL might become pretty ugly. And I see no good way to provide a drag & drop interface to researchers/physicians to make queries).
Cheers, Birger Von meinem iPad gesendet > Am 13.02.2016 um 16:24 schrieb Bert Verhees <bert.verh...@rosa.nl>: > > No comments, on the other hand it is Saturday > > I had left out some necessary technical details. > I will possible build it and then have possible the fastest > two-level-modeling engine in the world, which will, of course, also support > OpenEHR. > So it is not really bad that this happens. > > When it is ready, I will make an announcement. > > Best regards and enjoy the weekend. > Bert Verhees > > >> On 12-02-16 20:49, Bert Verhees wrote: >>> On 12-02-16 18:26, Erik Sundvall wrote: >>> if you are experimenting with open source native XML DBs for openEHR, it >>> preformed well for "clinical" patient-specific querying even though all xml >>> databases we tested were not suitable for ad hoc epidemiological population >>> queries (without query specific indexing) >> >> >> A very interesting paper. I have some first opinions on that. But first I >> need to explain what I think about the matter. >> I have not prepared the story below, so there may be things which I write to >> fast. See it as provisional view, not as a hard opinion. >> --------- >> There are relational database-configurations for OLAP and for OLTP. The >> combination is hard to find. There are reasons. This is a classic problem. >> >> You need specific indexes for data-mining (OLAP), and for every extra >> data-mining query you need more indexes, especially if you don't have time >> to wait a night for the result. Those extra indexes stand in the way >> for transactional processing (OLTP) because they need to be updated, and >> that is unnecessary burden for the OLTP-processes, as longer as the database >> exist, the burden becomes heavier. >> That is why OLAP and OLTP are not often combined in one configuration. >> >> So many professional databases have extra features for OLAP, I worked, years >> ago with Oracle for a big Dutch television company, and my main job was to >> create indexes for marketing purposes. >> We ran those unusual queries during the night and stored the result in >> special tables, Oracle called them "materialized views". >> The day after, those views were processed in analyzing software, like SPSS, >> and after that, thrown away. >> >> It was a database with 900,000 persons in it, and every person had a lot of >> history of web-history, personal interests, etc. >> "How much interest does a person have for opera, and is he able to pay for >> opera, is it worth to call him for a ticket-offer, we cannot call 900,000 >> persons" >> These were complex queries based on things the customer bought, television >> programs he was interested in, web-activities. >> That was the kind of thing they did with the database. >> >> So this could well be compared with a medical database, regarding to size >> and complexity. >> >> The same difficulties count for XML databases. That is why XML databases >> have also features for creating extra indexes. >> Oracle, by the way, if it knows the structure of XML (via XSD), it breaks, >> underwater, XML into relational data, and store it in a relational database. >> It also converts XQuery to SQL. >> In this way, it has the weakness and advantages of a relational database, >> and it needs the extra indexes for unusual queries, but on the developer >> view it is XML. >> ------- >> Comparing XML and relational, for OpenEHR, I favor XML, because it can >> easily reflect the structures which need to be stored. It makes the >> data-processing layer less complex. There is a lot of tooling around XML, >> XML-schema to make the database-structure known to Oracle, Schematron to >> validate against archetypes. This is very matured software, and therefor the >> complexity is solved years ago, and well tested. It is hidden complexity, >> and matured hidden complexity is no problem. >> -------- >> And if you want to do data-mining, like epidemiological research, and you >> have the time to plan the research, then the classical database, XML or RDB, >> is OK. >> >> In my opinion, there is not often a real need for adhoc data-mining >> (epidemiological research) queries, with result in a few minutes. They are >> always planned, and creating the indexes and storing the result in >> "materialized views" are part of the work one has to do for data-mining >> research on data. >> So, I don't think there is a real need for this. >> -------- >> Regarding to XML databases, Oracle has a solution, which can perform well if >> it is professionally maintained. >> This is often a point, because professional Oracle maintaining regarding to >> advanced use is very expensive. >> Another company is MarkLogic. It is said that MarkLogic is better, but I >> don't know that from own experience. >> Both are free to use for developers. >> You must think of numbers to 35,000 Euro a year for licenses, which is not >> very much for a big hospital, but very much for a small health service >> The open source XML ExistDB database is not very good for data-mining, is my >> personal experience. >> >> So, we must ask ourselves, are we solving a problem that no one experiences? >> -------- >> There are a few advantages to OpenEHR. Data are immutable, never changed, >> never deleted. This makes a few difficult steps unnecessary. >> The Dewey concepts look very attractive, although it is also created with >> deleting and changing data in mind. >> This is very important for normal company use. >> >> But, as said, we don't have that in OpenEHR. Medical data always grow, it >> are always new data. An event that has passed will never change. >> The only things that change (in OpenEHR they are versioned, but from the >> user perspective, they change), that are demographic data. And one can live >> with that, create extra provisions for that demographic database-section, >> which is only a small part of the complete database. >> Often, the demographic database is external anyway. >> >> So, my thoughts, maybe Dewey is too good. >> Path-values (to leaf-nodes) storage is enough >> The paths, combined with ID's are keys, and are much alike XPath, so it is >> easy to store XML in a path-value database. >> And querying is also easy, because all queries are path based. >> >> I think, for OpenEHR this is the fastest solution. But maybe I overlook >> something. >> Maybe I say something stupid. I am not offended if you say so (maybe in >> other words ;-). >> We all want and need to learn. >> >> What we still need to do to build this solution is handle the XQuery grammar >> and let it run on path-based-database. >> This is not very easy, but also not very hard. Maybe the algorithms are >> already to find. >> Like the Dewey algorithm this can run on any database, also on free open >> source databases. >> I think you get an excellent performance on Postgres. >> >> A path-value database is easy to index, it only needs a few. The inserting >> will stay very fast, always. >> >> Lets do some calculations, for fun: >> How many path-value-combinations do you have to store for an average >> composition? >> Maybe 30? How many compositions are for an average patient? 10,000? >> So every patient needs 300,000 path-values. So you can store 10,000 patients >> in 3 billion records. >> This is not much for Postgres, and the simple indexes needed. >> When you need to store 900,000 patients you need 90 separate tables. >> >> Very cheap, very fast, also for adhoc queries, and easy to accomplish, I >> think. >> >> I am very interested in opinions on this. >> >> Thanks >> Bert >> >> >> >> >> >> > > _______________________________________________ > openEHR-technical mailing list > openEHR-technical@lists.openehr.org > http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org
_______________________________________________ openEHR-technical mailing list openEHR-technical@lists.openehr.org http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org