[original post from Prof John Roddick, Flinders University South Australia, which failed to get through]
> At 3:43 PM +1000 17/6/02, Thomas Beale wrote:
>
>> One thing to be clear on - we must differentiate between "not
>> recorded" and "not there". Not recording someone's weight does not
>> make them "weightless" (don't worry I understood the joke, but this
>> is a serious point as well). A better example would be - not
>> recording smoking status doesn't make the patient a non-smoker.
>
> I've been following this discussion with some interest. Apart from
> Sam's valuable contribution to this, you might want to refer to Simon
> Parsons's paper:
>
>   Parsons, S., 1996. Current approaches to handling imperfect
>   information in data and knowledge bases. IEEE Transactions on
>   Knowledge and Data Engineering 8 (3): 353-372.
>
> in which he identifies five types of imperfection in data. Namely:
>
> 1. Incomplete. (e.g. test results not known, or qualified as in
>    "interim results only")
> 2. Imprecise. (e.g. age "between 25 and 30" etc.) This arises from
>    a lack of granularity.
> 3. Vague. (e.g. blood pressure "high", smokes "a lot", pain "acute",
>    etc.) This arises from the use of fuzzy terms.
> 4. Uncertain. (e.g. a 95% chance of accuracy) This arises from a lack
>    of knowledge or from subjective assessment.
> 5. Inconsistent. (i.e. contradictory information)
>
> to that you can add a sixth:
>
> 6. Out-of-date. (i.e. correct when stored but unlikely to be true now)
>
> These can, of course, be combined!
>
> Incompleteness has traditionally been handled in databases with the
> null value. In my opinion this has been totally inadequate, but that
> doesn't stop it being the only option available in most systems.
> Imprecision and uncertainty are often handled through coercion to the
> nearest value, with all the problems that might cause, and vagueness
> and inconsistency are often not handled at all. Out-of-date-ness is
> handled by assuming it doesn't happen.
>
> For the purposes of GEHR, I would suggest that No. 5, inconsistent
> data, is a fact of life, and since it is somewhat different (it
> requires two pieces of information, for example), we should leave
> this category to constraint handling and expert interpretation.
> However, I would suggest we need to find a way of handling the other
> five. It's not initially clear how, though. Perhaps a qualifying
> field for each critical value?
>
> If this is seen as important by others, I'll put my mind to thinking
> it through.
>
> My two cents worth...
>
> John.
> --
> Professor John Roddick
> Knowledge Discovery and Management Laboratory
> Flinders University * Adelaide * South Australia
> ----
> Ph: +61 8 8201 5611 Fax: +61 8 8201 3626 Mobile: 0414 190 073
> URL: http://kdm.first.flinders.edu.au/
> Email: roddick at cs.flinders.edu.au
-
If you have any questions about using this list, please send a message to d.lloyd at openehr.org
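
To make the "qualifying field for each critical value" suggestion in the post above a little more concrete, here is a minimal sketch in Python of one way it might look. Everything in it is an assumption for illustration only: the names `Imperfection`, `QualifiedValue` and `smokes` are hypothetical and are not part of GEHR or openEHR. The qualifier is a flag set so that categories can be combined, and inconsistency is left out because the post proposes leaving it to constraint handling.

```python
from dataclasses import dataclass
from enum import Flag
from typing import Any, Optional


class Imperfection(Flag):
    """Hypothetical qualifier: the categories from the post, minus 'inconsistent'."""
    NONE = 0
    INCOMPLETE = 1
    IMPRECISE = 2
    VAGUE = 4
    UNCERTAIN = 8
    OUT_OF_DATE = 16


@dataclass(frozen=True)
class QualifiedValue:
    """A recorded value together with a qualifying field (illustrative only)."""
    value: Any = None                            # None means "not recorded"
    qualifiers: Imperfection = Imperfection.NONE
    note: str = ""                               # free-text explanation


def smokes(status: QualifiedValue) -> Optional[bool]:
    """Return True or False only if a status was recorded, otherwise None."""
    if status.value is None:
        return None          # not recorded is not the same as non-smoker
    return bool(status.value)


# Smoking status was never recorded: the value is absent, not False.
smoking = QualifiedValue(qualifiers=Imperfection.INCOMPLETE,
                         note="smoking status not recorded")

# Age known only as a range: imprecise rather than missing.
age = QualifiedValue(value=(25, 30), qualifiers=Imperfection.IMPRECISE,
                     note="between 25 and 30")
```

With this shape, a caller has to deal with three outcomes from `smokes` (True, False, or None) instead of silently treating a missing entry as "non-smoker", and qualifiers can be combined, for example `Imperfection.IMPRECISE | Imperfection.OUT_OF_DATE`.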

