Trying to understand the openEHR Information Model

Bert Verhees Mon, 22 Apr 2013 22:44:07 +0200

On 04/22/2013 02:12 PM, Thomas Beale wrote:
> On 22/04/2013 10:01, Bert Verhees wrote:
>> On 04/22/2013 10:01 AM, Thomas Beale wrote:
>>>
>>> Hi Bert,
>>>
>>> Xquery wasn't stable in 2006 when we needed a query language. AQL 
>>> was implemented by Ocean by 2007 and has been working since then, 
>>> and something similar implemented by companies in Brazil. Later on, 
>>> Marand implemented it, and I suspect someone else.
>>
>> I am sorry, I have no time to provide a well done analysis, but I 
>> have an opinion.
>>
>> XQuery was stabilized in 2007. XPath has been around somewhat longer,
>> but as I understand it, version 2.0 is a subset of XQuery 1.0. I am
>> reading Priscilla Walmsley's O'Reilly book on XQuery; she explains it
>> very thoroughly (as we have come to expect from her).
>>
>> AQL as shown in the Wiki, (that is what I know of AQL), can very well 
>> be served by syntax-transformation to XPath/XQuery.
>> http://www.openehr.org/wiki/display/spec/Archetype+Query+Language+Description
>>
>> Should one do that? Syntax-transformations? There is a risk.
>>
>> In favor of XQuery, there are query engines available almost out of
>> the box, open source or closed, some of which have been in development
>> for 10 years, are based on good indexing, and are still being actively
>> developed.
>> With all respect, I think very good work has been done worldwide, and
>> one should profit from that if possible.
>
> that's fine for XML data. But many implementations do not use XML as 
> the storage format - and there are good reasons for that - XML Schema 
> representations of object data require transformation, and have 
> efficiency problems that have to be addressed in one way or another.


I don't see that any transformation is needed. Only leaf data are stored
as data, and those are always simple values, not objects, so no
transformation is needed and no efficiency is lost.
There are no proven efficiency problems in XML; that is only a story
from poor research that lacked detail, and we have had that discussion.
The technique has been an industry standard for many purposes for over
15 years.
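
To illustrate what "only leaf data are stored as data" can mean, here is
a minimal sketch in Python. It is an assumption-laden illustration, not a
description of any existing openEHR kernel: the function name leaf_paths
and the sample element names are made up, and repeated sibling elements
are not disambiguated.

```python
import xml.etree.ElementTree as ET


def leaf_paths(xml_text):
    """Flatten an XML document into (path, text) pairs for its leaf elements.

    Hypothetical helper for illustration only. Sibling elements with the
    same tag end up with the same path; a real store would need an index
    or node id to tell them apart.
    """
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(element, path):
        children = list(element)
        if not children:
            # A leaf: only here is there a simple value to store.
            pairs.append((path, (element.text or "").strip()))
            return
        for child in children:
            walk(child, f"{path}/{child.tag}")

    walk(root, f"/{root.tag}")
    return pairs


sample = "<value><magnitude>120</magnitude><units>mm[Hg]</units></value>"
for path, text in leaf_paths(sample):
    print(path, "=", text)
```

Every inner element contributes only to the path; the stored values are
plain strings.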

But I understand your point, we can discuss that without bashing XML:
You are saying that people may want to use storage other than
XML databases, and then they can't use XQuery.
You are right, but can they use AQL?

There is only an incomplete definition of AQL in a Wiki, which has had
no substantial changes for a long time, so there has been hardly any
progress.
There is no guarantee that the Wiki is stable.

I think you know how much effort and risk is involved in writing a new
query engine for a new language concept on the database concept of
one's choice.

Seref said it to Randolph a few days ago: hardly any work has been done
by third parties, there are only two implementations of AQL, and in the
same sentence he calls AQL almost the most important part of the OpenEHR
ecosystem.
Quote from Seref in this context:

> In my humble opinion, AQL is the most neglected, yet, probably one of 
> the most important components of an openEHR implementation. It is not 
> part of the implementation, but it has been implemented by at least 
> two vendors that I know of, with a third having something quite 
> similar to it.
Reading this, one could start to doubt whether OpenEHR can exist without
a query language.
I think Seref is right: it cannot. And yet there is no stable
specification?

Also consider this.
How can two companies have implemented AQL if there is no stable 
definition?
How much money did they put at stake for an uncertain result?
These are rhetorical questions.

This brings me to the conclusion that for third parties there is only
one way to go, and that is XML and XQuery. There is no other way to get
an OpenEHR system ready now or in the coming few years.
The query language is one difficult part; the other difficult part is
validation. Both can be solved using standard industry tools. I will
come back to this at the end of this message.
And I am not talking about MLHIM. ;-)

The OpenEHR eco-system for XML is ready and full of features.

I am not saying that XML is the only way to write a kernel. But it has
many advantages, because of the wide industry support and the thousands
of man-years of development behind it.
Choosing any other solution means having to write a query engine for a
query language which still has not been declared stable, and having to
write a validation tool which, as far as I know, only exists for DADL.

For a software vendor, implementing OpenEHR without using XML is hardly
an option.

>
> The general need we have in openEHR is for an abstract query language 
> that can be used to express queries to any openEHR (or 13606 or other 
> archetype-based system), regardless of whether its concrete 
> persistence happens to be in XML.
> If you are suggesting that we use Xquery/Xpath even for non-XML data 
> representation cases, that's a different conversation. It won't work 
> out of the box, because we use a more efficient path syntax (but which 
> is easily convertible), and Xquery/Xpath make other assumptions due to 
> being targeted to XML, e.g. they assume the XML attribute/element 
> dichotomy, which doesn't exist in normal object data; they don't 
> assume an object inheritance model, and so on.

By chance, tomorrow I am going to Intersystems for a technical
introduction to Cache and its tooling.
I am especially interested in the (proprietary) path-based query
formalism they support.

I will ask them about XQuery support. I have read on their website that
it is possible.
It would not be surprising if their proprietary path-based query
formalism turned out to be very much like XPath.

After all, how can a serious database vendor nowadays live without
XQuery support?
All big database vendors support XML structures, and they also support
XQuery.
Check Microsoft, check Oracle. XML is here to stay, and has been for
15 years.

XPath 2.0 (which is a subset of XQuery 1.0) is very similar to the
path-based AQL; it is "easily convertible", as you call it.
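
To make the "easily convertible" point concrete, here is a minimal
sketch in Python of such a syntax transformation. It is only a sketch
under stated assumptions: it assumes the XML serialisation carries the
node id in an XML attribute named archetype_node_id, it handles only
plain name[node_id] path segments (no name predicates, no WHERE
clauses), and the function name aql_path_to_xpath is made up for
illustration.

```python
import re

# One AQL-style path segment: an attribute name with an optional
# node-id predicate in square brackets, e.g. "data[at0001]".
_SEGMENT = re.compile(
    r"^(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?:\[(?P<node_id>[^\]]+)\])?$"
)


def aql_path_to_xpath(aql_path):
    """Rewrite a simple AQL-style path as an XPath expression.

    Assumption (not taken from any specification text quoted here): the
    node id is exposed as an 'archetype_node_id' XML attribute.
    """
    steps = []
    for segment in aql_path.strip("/").split("/"):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported path segment: {segment!r}")
        step = match.group("name")
        node_id = match.group("node_id")
        if node_id is not None:
            step += f"[@archetype_node_id='{node_id}']"
        steps.append(step)
    prefix = "/" if aql_path.startswith("/") else ""
    return prefix + "/".join(steps)


print(aql_path_to_xpath("/data[at0001]/events[at0006]/value/magnitude"))
# /data[@archetype_node_id='at0001']/events[@archetype_node_id='at0006']/value/magnitude
```

Anything beyond such plain paths would need a real parser rather than a
regular expression.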


>
> Nevertheless, if it could be shown that AQL could be mapped to a clean 
> subset of Xquery/Xpath as a standard formalism, that's likely to be 
> useful. It would mean that those implementers who choose XML as their 
> internal data representation would be able to use standard products 
> out of the box, as you say. 
Maybe not a 100% mapping; that can only be determined once there is a
stable AQL definition. But the constructs shown in the AQL Wiki can be
mapped.

> Others might be able to use some components, e.g. Xquery parsers in order 
> to build a query engine that talks to non-XML data.

I did something similar once, writing a virtual query engine.
It was an engine which could query
"mapped-to-objects-third-party-likely-structured-databases" (excusez le
mot).
It could connect to an old COBOL database, a MUMPS hierarchical
database, an API-based database, and some SQL databases.
Through this product, all of them could be accessed via SQL against a
common, simplified virtual data model. That was the goal, and it worked,
more or less.

That is much the same as what you propose in this sentence: taking the
grammar of one query engine and applying it to another database concept
with a similar but not identical structure, and probably with different
kinds of optimizations.

It is very difficult to do something like that. It will take man-months,
if not man-years, to make it perform well and be more or less bug-free.

The easy part, simple selects, will take a few months. But then comes
the rest: optimizing for different kinds of indexes (including
user-defined indexes), multi-user access, unions, sub-selects,
aggregations, authorization.
It is not easy at all, and I would definitely not advise a company to go
this way.


>
>>
>> XQuery can also be used directly to query OpenEHR datasets. I see no 
>> reasons against this very good working solution. There is not really 
>> a need for a separate query-language.
>> At this moment AQL is a niche and XQuery is a standard. I have read 
>> somewhere that also Cache from Intersystems in an additional module 
>> supports XQuery, but marketing language is often gibberish. One can 
>> never be sure what really is possible.
>>
>> Apart from that, maybe there is a wish to complete the 
>> ADL/AQL-eco-system, for those who chose not to store in XML and want 
>> to write their own AQL-query-engine on the database-concept of their 
>> choice.
>> In that case, AQL should, in my opinion, be defined as close as 
>> possible to XPath/XQuery. I think very very close is possible and 
>> even obvious.
>> This is, because the basic goal is the same, to offer a generic 
>> query-language.
>
> well it's a bit more than that - it's to define a query language that 
> is a) based on the logical content models of the data and b) needs to 
> know nothing about the concrete persistence representation of the 
> data. The query language also has to support terminology-based query 
> expressions and subsumption.

As far as I can see from the Wiki, AQL does not do anything particularly
advanced; it looks much as one would expect of a generic query language.
I don't know the state of the art that Seref refers to when he says
that AQL has been implemented by two vendors.
Terminology can possibly be handled by preprocessing, depending of
course on the terminology.
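
As a rough illustration of what such preprocessing could look like,
here is a minimal sketch in Python: a subsumption test is expanded into
an explicit list of codes before the query reaches the query engine.
Everything in it is hypothetical: the toy hierarchy, the code values,
the helper names, and the assumption that the coded value is found in a
child element called code_string.

```python
# Hypothetical toy hierarchy (child code -> parent code); a real system
# would ask a terminology service instead.
PARENT_OF = {
    "C2": "C1",
    "C3": "C1",
    "C4": "C2",
}


def descendants_or_self(code, parent_of):
    """Return the given code plus every code it subsumes, sorted."""
    result = {code}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return sorted(result)


def code_predicate(code, parent_of, element="code_string"):
    """Expand 'is subsumed by code' into an explicit XPath 'or' test."""
    codes = descendants_or_self(code, parent_of)
    return " or ".join(f"{element}='{c}'" for c in codes)


print(code_predicate("C2", PARENT_OF))
# code_string='C2' or code_string='C4'
```

Whether this is practical depends on the terminology: for a large
hierarchy the expanded list may become too long to embed in a query.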

>
> But if it can be aligned, let's do it. It just needs someone to do the 
> work.
>
>> But other arguments could be: to comfort developers, to profit from 
>> what is already been done (in standard-definition and in tooling), 
>> and to provide interoperability with that part of the world, which 
>> understands XML better than ADL/AQL.
>>
>> But the next issue comes up.
>>
>> A shortcoming of the OpenEHR-documentation is the expression of the 
>> RM in XML-Schema. Derived OpenEHR-datasets can never be validated 
>> legally in XML-Schema 1.1 or 1.0.
>
> Do you mean just that the Release 1.0.2 XSDs need to be better 
> designed? We certainly know that, and welcome any proposals on that 
> (of which there are already many).

No, I mean that it is impossible to represent RM 1.0.2 in W3C XML
Schema. It is unusable.
You cannot validate any XML dataset modeled from a CKM archetype
against the XSDs on the OpenEHR website, regardless of the XML Schema
version.
It is simply impossible, illegal. OpenEHR breaks several XML Schema
rules.
XML Schema, in any version, is not ready for multi-level modeling.

With some tricks it can be done, I do that now, but that is not very 
elegant.

But I have not found any reason why it cannot be defined in RelaxNG,
which is a widely used OASIS standard.
I must admit that I have not finished researching this, but I am more
than 80% done.
It relaxes the points where XML Schema has its blocking restrictions.
It looks promising. I will let you know, I think at the end of next
month, when I start working on this again.

My goal however is not only to represent OpenEHR in a schema-language, 
but everything that can be defined in ADL 1.4, so including OpenEHR.
And the translation from ADL to schema needs to be done automatically.

OASIS, as you will know, is an industry standardization organization. It
is a Domain Member of OMG, and it is also sponsored by OMG (and the
members of OMG).
Several RelaxNG schema definitions have made it to ISO standard, and
RelaxNG has been stable for many years now.


>
>> So defining the RM in a XML-Schema is quite useless, and bringing 
>> people on a dead end street. There are, however good alternatives, 
>> even better.
>
> Not sure what you are saying here, Bert. XML openEHR data is regularly 
> used as an exchange format for applications and systems. Can you 
> explain a bit better what you mean by the above?

I am sorry to say it, but writing a W3C XML Schema that represents an
archetype and conforms to the base schemas published on the OpenEHR
website is not possible, not even for a single archetype from CKM.
So the XSDs are useless; there is no way they can be useful.

Data converted to the exchange format cannot be validated against a
constrained schema representing the archetype in which they are defined.
I am pretty sure of this.
You know what you have as source data: maybe objects in Cache, or DADL,
or path/value combinations.

But you don't know whether the target/exchange XML data are still valid.
You can guess they are, but you cannot prove they will always be valid.
I think validation after data transformation is very important. A guess
should not be good enough.

Bert