> [original post from Prof John Roddick, Flinders University South 
> Australia, which failed to get through]
>
>> Parsons, S., 1996. Current approaches to handling imperfect 
>> information in data and knowledge bases. IEEE Transactions on 
>> Knowledge and Data Engineering 8 (3): 353-372.
>
>> in which he identifies five types of imperfection in data.  Namely:
>
>> 1.  Incomplete.  (eg. test results not known or qualified as in 
>> "interim results only") 
>
I think this is an aspect of the real-world situation, and just means 
that the information currently captured is only a "snapshot" along some 
timeline; later, the final information will (presumably) be available. In 
openEHR, this would be indicated in the clinical info itself, e.g. 
pathology results might say "preliminary results". We don't need to do 
anything special in this case.

In cases like an unconscious person coming to A&E, where the admission 
form on the screen requires all sorts of things which cannot be answered 
for now, traditional computer systems do completely the wrong thing: they 
either prevent the form from being committed with what is known 
(physical description=xxxx, presenting complaint=partially severed left 
hand....) or create dummy (but wrong) values for the fields that could 
not be filled in.

For this kind of situation, we have taken a lead from SCADA control 
systems (where I learned about software) and HL7's "flavours of null" 
approach. In control systems, all values have an associated "data 
quality" marker; if it indicates that the value is "old" or that 
serial communication from the field has stopped, you ignore the actual 
value (which might otherwise look like a completely legitimate 
transformer voltage or whatever). In HL7, all the data types include 
the notion of Null values in every possible field, and they include a 
"flavour of null" - the reason why the value is not available - e.g. 
"unknown", "unavailable", "not asked", "asked but refused", "not 
applicable" etc. (that's from memory so the values might be a bit off).

The approach we have taken in openEHR is similar to the control system 
approach, and uses HL7's flavours of null. Thus, the class ELEMENT has 
attributes:
value: DATA_VALUE
null_flavour: DV_CODED_TEXT {value from HL7 null flavours domain}

This approach also works for database systems - there is no need to mix 
fake null/0 values into the value domain of a value field - the null 
flavour is a separate field, but always associated with the value field. 
So even if Oracle forces you to have a real date in the date-of-birth 
field (e.g. "1-1-1800"), the null_flavour sitting next to it has the 
value "UNK", meaning "unknown - ignore what is in the value field".

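As a rough illustration of the value-plus-null-flavour pairing described 
above, here is a minimal Python sketch. The class name Element, the 
method names and the code table are assumptions made for this example, 
not the openEHR definitions; the codes are a small subset of HL7 null 
flavours quoted from memory.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Illustrative subset of HL7 null-flavour codes (assumed, quoted from memory).
NULL_FLAVOURS = {
    "UNK": "unknown",
    "NASK": "not asked",
    "ASKU": "asked but unknown",
    "NA": "not applicable",
}


@dataclass
class Element:
    """Hypothetical stand-in for ELEMENT: a value plus a separate null marker."""
    value: Any = None
    null_flavour: Optional[str] = None  # e.g. "UNK"; None means the value is real

    def __post_init__(self) -> None:
        if self.null_flavour is not None and self.null_flavour not in NULL_FLAVOURS:
            raise ValueError(f"unrecognised null flavour: {self.null_flavour}")

    def is_null(self) -> bool:
        return self.null_flavour is not None

    def usable_value(self) -> Any:
        """Return the value, or None when the null flavour says to ignore it."""
        return None if self.is_null() else self.value


# The database forced a dummy date, but the marker says to ignore it.
date_of_birth = Element(value="1800-01-01", null_flavour="UNK")
complaint = Element(value="partially severed left hand")
```

Consumers would call usable_value() (or test is_null() first) rather 
than reading value directly, so a dummy date never leaks into an age 
calculation.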
>> 2.  Imprecise.  (eg. age "between 25 and 30" etc.).  This arises from 
>> a lack of granularity. 
>
We definitely have to deal with this. The possible ways include:
- DV_INTERVAL<T> type for ranges
- partial dates & times
- using narrative text

Do we need more?
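The first two options in the list above could be sketched roughly as 
follows in Python. Interval and PartialDate are hypothetical names for 
this example (loosely modelled on the DV_INTERVAL<T> idea), and treating 
a missing bound as "unbounded" is an assumption.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Interval(Generic[T]):
    """A range of values; a bound of None means unbounded on that side."""
    lower: Optional[T] = None
    upper: Optional[T] = None

    def contains(self, x: T) -> bool:
        if self.lower is not None and x < self.lower:
            return False
        if self.upper is not None and x > self.upper:
            return False
        return True


@dataclass(frozen=True)
class PartialDate:
    """A date where the day, or the month and day, may be unknown."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def __str__(self) -> str:
        parts = [f"{self.year:04d}"]
        if self.month is not None:
            parts.append(f"{self.month:02d}")
            if self.day is not None:
                parts.append(f"{self.day:02d}")
        return "-".join(parts)


age = Interval(25, 30)        # age "between 25 and 30"
over_65 = Interval(lower=65)  # open-ended range
born = PartialDate(1972, 6)   # month known, day unknown
```

Neither type pretends to more precision than was recorded, which is 
what coercing to the nearest single value would do.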

>> 3.  Vague.  (eg. blood pressure "high", smokes "a lot", pain "acute", 
>> etc.)   This arises from the use of fuzzy terms. 
>
We also have to deal with this, including the typical clinical version 
found in pathology and other areas, where you get values from sets like 
{trace, +, ++, +++, ...}.

Currently we have avoided a complex fuzzy data type, and provided the 
DV_ORDINAL data type, which allows ordinal numbers to be associated with 
symbols (or words). So for smoking, if you really want to avoid 
characterising it quantitatively, you could use a DV_ORDINAL, which comes 
from a "Lilliputian DOH tobacco consumption" domain/set: {1=none; 
2=occasional; 3=regular/light; 4=heavy; 5=going to die real soon now}. 
From the medical perspective I imagine that this would be a 
spectacularly bad way to record this particular datum, but the model 
will certainly let you do it, and it will also allow comparison (use of 
the '<' operator) by virtue of the ordinal numbers associated with the 
symbols. For recording pain, or the Apgar characteristics, or urinalysis 
values, this approach seems fairly common among clinicians.

Our idea with DV_ORDINAL was primarily to avoid preventing doctors from 
using "+", "++", "+++" type values, while adding a little bit of rigour 
(ensuring comparability).
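To make the comparability point concrete, here is a small Python sketch 
of an ordinal value whose ordering comes only from its number, never 
from its symbol. The class name Ordinal and the numbers assigned to 
{trace, +, ++, +++} are assumptions for this example, not the DV_ORDINAL 
specification or any real urinalysis scale.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Ordinal:
    """Hypothetical ordinal: compared by 'value' only; 'symbol' is for display."""
    value: int
    symbol: str = field(compare=False)


# Invented scale for illustration only.
TRACE = Ordinal(1, "trace")
PLUS = Ordinal(2, "+")
PLUS2 = Ordinal(3, "++")
PLUS3 = Ordinal(4, "+++")

readings = [PLUS2, TRACE, PLUS3, PLUS]
worst = max(readings)                              # uses the generated ordering
in_order = [r.symbol for r in sorted(readings)]
```

Comparison is only meaningful between ordinals drawn from the same 
domain/set; nothing in this sketch enforces that.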

What we are not doing is implementing a mathematical fuzzy model where 
each symbol is associated with a sub-section of a numerical range. Those 
of you into fuzzy maths know that characterising these mappings requires 
a fair bit of extra information. However, this kind of information can 
be stored in archetypes, and is not needed in the data (the mappings 
should not change with respect to the patient), so we should probably 
consider this when designing the archetype version of the DV_ORDINAL 
class (and maybe other quantitative classes as well).
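As a very rough sketch of what such archetype-level mapping information 
might look like, the following Python keeps a symbol-to-range table 
separate from any patient data. The table name, the function and every 
number in it are invented for illustration; a real fuzzy model would 
need membership functions rather than these crisp sub-ranges.

```python
from typing import Dict, Optional, Tuple

# Hypothetical archetype-level definition, shared by all patients.
# Each symbol covers a half-open sub-range [low, high); the numbers are made up.
ARCHETYPE_RANGES: Dict[str, Tuple[float, float]] = {
    "trace": (0.0, 1.0),
    "+": (1.0, 2.0),
    "++": (2.0, 3.0),
    "+++": (3.0, 4.0),
}


def symbol_for(measurement: float) -> Optional[str]:
    """Return the symbol whose sub-range contains the measurement, if any."""
    for symbol, (low, high) in ARCHETYPE_RANGES.items():
        if low <= measurement < high:
            return symbol
    return None
```

The recorded data would still hold only the symbol and its ordinal 
number; the table would be consulted when a numeric interpretation is 
wanted.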

>> 4.  Uncertain.  (eg. a 95% chance of accuracy).  Arises from a lack 
>> of knowledge or subjective assessment. 
>
For this we include a "confidence: REAL" attribute in the ENTRY class.

>> 5.  Inconsistent.  (ie. contradictory information). 
>
I'm not sure what should be done about this, but I think it is in the 
clinical domain; the level of or reason for inconsistency should be 
characterised in the data by its authors; I don't think it needs anything 
special in the reference model. (Anyone disagree?)

>> to that you can add a sixth
>
>> 6.  Out-of-date.  (ie. correct when stored but unlikely to be true now). 
>
This is a tricky one. An example is "smoking status"=smoker, which 
might have been true up until two years ago, but changed then. Also, the 
converse - the EHR shows that the patient was recorded as a smoker 15 
years ago, but there is no new information regarding smoking at all. Is 
s/he still a smoker? In general the time-based transaction concept of 
GEHR gives systems the basic tool for recording updates to things.

Sam has been contemplating ways of representing the idea of 
"confirming" previous information whose value has not changed, but for 
which we want a more recent update on the situation (and medico-legally, 
the practitioner wants to show in the record that they did indeed review 
various things on such-and-such a date). This might require a special 
marker which does not change the value of something, but says that it 
was verified to be the same. I don't think we have an answer yet for 
this in the architecture.

>> These can, of course, be combined!
>
>> Incompleteness has traditionally been handled in databases with the 
>> null value. In my opinion this has been totally inadequate but that 
>> doesn't stop it being the only option available in most systems.  
>> Imprecision and uncertainty is often handled through coercion to the 
>> nearest value with all the problems that might cause and vagueness 
>> and inconsistency is often not handled at all.  Out-of-date-ness is 
>> handled by assuming it doesn't happen. 
>
John's long experience with the horrors of inadequate data handling 
certainly rings true with me.

>> For the purposes of GEHR, I would suggest that No. 5. Inconsistent 
>> data is a fact of life and since this is somewhat different (it 
>> requires two pieces of information for example) then we should leave 
>> this category to constraint handling and expert interpretation.  
>
Agree.

>> However, I would suggest we need to find a way of handling the other 
>> 5.  It's not initially clear how though.  Perhaps a qualifying field 
>> for each critical value? 
>
How do you feel about the current ways of dealing with the problems, 
detailed above? We would value your expert opinion.

- thomas beale


