The IFOMIS work Dirk, Kirsten, and others have cited on referent
tracking is definitely important work to review in this light. I'd
not been familiar with the model theoretic work Bijan mentions, but
clearly that is important.
Werner Ceusters also has a list - a Google list I believe - on
referent tracking.
This work - and related work on "speech acts" - is most definitely relevant to this discussion and is specifically designed to address the ABox. As the citations given indicate, most of this work has
been done in the clinical domain with a focus on patient records,
which was the origin of this thread and would be directly relevant to
the Use Case Nigam put out there.
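To make the referent-tracking idea a bit more concrete for those who haven't read the IFOMIS papers, here is a toy Python sketch of the basic move - this is purely my own illustration, not the IFOMIS design, and every name in it is made up: each particular gets an instance-unique identifier (IUI), and every statement about it is a separate, attributed, timestamped assertion, so a later correction points back at what it retracts rather than overwriting it.

import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional

@dataclass
class Assertion:
    iui: str                            # the particular this assertion is about
    predicate: str                      # e.g., an ontology term ID
    value: str
    author: str
    asserted_at: datetime
    retracted_by: Optional[str] = None  # ID of a later, correcting assertion

class ReferentStore:
    def __init__(self) -> None:
        self.assertions: Dict[str, Assertion] = {}

    def new_iui(self) -> str:
        """Mint an instance-unique identifier (IUI) for a newly denoted particular."""
        return uuid.uuid4().urn

    def assert_fact(self, iui: str, predicate: str, value: str, author: str) -> str:
        """Record an attributed, timestamped statement about a particular."""
        aid = uuid.uuid4().urn
        self.assertions[aid] = Assertion(
            iui, predicate, value, author, datetime.now(timezone.utc))
        return aid

    def correct(self, old_assertion_id: str, new_value: str, author: str) -> str:
        """A correction is a new assertion; the old one stays, marked as retracted."""
        old = self.assertions[old_assertion_id]
        new_id = self.assert_fact(old.iui, old.predicate, new_value, author)
        old.retracted_by = new_id
        return new_id

The point is just that a correction (a change in our *knowledge*) is itself a new, attributed assertion pointing back at what it retracts, rather than an in-place overwrite - which is exactly the sort of history a patient-record ABox needs to preserve.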
Some of that work has begun to seep into the discussions regarding the sort of GENBANK issues Kei mentioned, but it's still really just discussion to my knowledge. As you could tell from the way I couched my description of that problem, referent tracking is clearly a big part of what must be accommodated in that domain as well - both in terms of the actual content and evolution of a record in GENBANK, TrEMBL, etc., and in terms of the many ways in which researchers link to and reference such records.
Also - the work I was mentioning regarding TBox-focused, highly granular revisions has been informally discussed by NCBO folks including Chris Mungall, Fabian Neuhaus, Barry, and others - again with an eye toward providing reasoning services to support this requirement, of the sort Bijan, Dirk, and others mention below. This is associated with the discussions on this topic amongst BIRN, OBI, and NCIT participants, but it has all been very informal so far - AFAIK.
One thing I would point out regarding the metadata properties I was referring to is that they were really meant to be a simple, "low-hanging fruit" approach to a much more complicated problem. No thought was given to how one would actually construct automatic means to mediate reasoning on - or even just representing - the evolving semantic graph. The idea was simply this: many biomedical ontology development projects have begun to notice a pressing need for version control, and it appears to be required at a very granular level. Standard source version control systems - e.g., CVS, SVN, etc. - just make the problem worse, in my opinion. This is where I'd differ with the point Vipul makes. It's not that there are NO aspects of the software versioning process relevant to this issue. It's just that I believe there are complex issues in this domain - some of which Bijan mentioned, and some of which I mention below regarding applying the traditional approach to employing CVs for literature annotation - that extend well beyond what common practice in software version control is intended to support. In that domain, highly granular version management has been required, and I believe something like it will be required in the ontology development space as well. Perhaps that's just a qualification and rewording of the point Vipul was trying to make.
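Just to make the contrast with file-level tools concrete, here's the sort of term-level comparison I have in mind, sketched in Python against rdflib. The file names and serialization format are placeholders, and this is only an illustration of the granularity issue, not a proposal for an actual diff tool:

# A term-level (per-subject) diff between two ontology versions, as opposed
# to the line-oriented diff CVS/SVN would give you over the serialized file.
# Assumes rdflib; file names and formats are placeholders.

from rdflib import Graph

def term_level_diff(old_file: str, new_file: str) -> dict:
    old, new = Graph(), Graph()
    old.parse(old_file, format="xml")   # assuming an RDF/XML serialization
    new.parse(new_file, format="xml")

    old_triples, new_triples = set(old), set(new)
    added = new_triples - old_triples
    removed = old_triples - new_triples

    # Group the changes by the term (subject) they concern.
    changes: dict = {}
    for s, p, o in added:
        changes.setdefault(s, {"added": [], "removed": []})["added"].append((p, o))
    for s, p, o in removed:
        changes.setdefault(s, {"added": [], "removed": []})["removed"].append((p, o))
    return changes

# e.g. term_level_diff("myont_v1.owl", "myont_v2.owl") reports, per term URI,
# exactly which axioms/annotations were added or removed in that revision.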
SKOS, as I mentioned, does try to absorb some of what has been done on this issue in the A&I/library science world in relation to applying CVs to the literature annotation process. This has long been recognized in that field as extremely important to the proper curation of a CV/taxonomy/classification scheme/thesaurus. If you step back a bit from the details and ask what the intended purpose of a CV in that domain is, the answer clearly is to improve both precision and recall (F-measure from standard IR) for Boolean, term-based queries. Anyone who has used MEDLINE over the years has learned the utility of this approach - and its limitations (the barrage of false positives and the unknown number of false negatives that typically still affect query results). Looked at purely empirically, there is no doubt that having the people who annotate the literature use a CV greatly improves the F-measure of the search system used to mine the resulting inverted indexes. However, I know from time spent working with the creators of the Biological Abstracts that it took months of training for the "indexers" to get good at consistently applying CV terms - and a lot of QA/QC was still needed to constantly monitor the output. The reason really comes down to this: the lack of complete, detailed definitions and of a formal semantic graph left far too much leeway for indexers, even when a moderate amount of effort was dedicated to incentivizing them. Having said that, when highly specific definitions were used, it was found that indexers' annotation output slowed greatly AND their use of CV terms went way down, both of which are really at odds with the intended goal of the process (back to F-measure), which is to provide maximal annotation according to the CV. Even with this work, BIOSIS (publishers of the Biological Abstracts) and really all the A&I vendors I knew of still required a huge educational staff that would constantly travel the world providing demos and updates to librarians, so they could be kept informed on how best to use the resulting CV indexes.
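For those not steeped in the IR jargon: precision is the fraction of retrieved records that are relevant, recall is the fraction of relevant records that are retrieved, and F-measure is their harmonic mean. A trivial Python sketch, with made-up counts, of the quantities the indexer training and QA/QC were meant to push up:

# Standard IR definitions; the counts below are made up purely for illustration.

def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int):
    precision = true_pos / (true_pos + false_pos)       # how clean the hits are
    recall = true_pos / (true_pos + false_neg)          # how complete the hits are
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# Hypothetical: a CV-indexed Boolean query returning 80 relevant and 20
# irrelevant records, while missing 40 relevant ones.
print(precision_recall_f1(true_pos=80, false_pos=20, false_neg=40))
# -> (0.8, 0.666..., 0.727...)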
It was still clearly an art to maximize F-measure - one that very much depended on the quality and structure of the CV/classification scheme/taxonomy, the talents of the indexers applying the CVs in the annotation process, and the talents of the info. retrieval experts/librarians constructing queries. By far the most confounding aspect of this process was the need to alter indexer and searcher practice as CV changes were introduced - as was of course inevitable - due both to changes in the *world* and to changes in *knowledge representation*, as Bijan describes it below. It was partly because of this that various CV curatorial practices were developed that again are partially represented in SKOS - fields such as "scope notes", "history notes", etc., which all relate to the versioning issue in this context but, of course, are designed for human consumption and are not particularly useful to KE/KR algorithms.
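For those who haven't looked at SKOS, those curatorial fields surface as documentation properties on a concept. A minimal rdflib sketch - the concept URI and the note text are invented purely for illustration:

# Minimal sketch of SKOS documentation notes via rdflib; the concept URI and
# note text are invented for illustration.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/thesaurus#")
g = Graph()
g.bind("skos", SKOS)

concept = EX.Neuron
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("neuron", lang="en")))
# Human-readable curatorial notes of the sort long used by A&I vendors:
g.add((concept, SKOS.scopeNote,
       Literal("Use for individual nerve cells; do not use for glia.")))
g.add((concept, SKOS.historyNote,
       Literal("Split from 'nerve cell' in the 2006 revision of the scheme.")))
g.add((concept, SKOS.changeNote,
       Literal("Definition tightened after QA review.")))

print(g.serialize(format="turtle"))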
My sense - as you can see in that OBI Wiki page I cited - is that there is a need to provide such curation support in the ontology development process, both to address the lexical issues, as has historically been done in info. science/library science, and to address semantic graph evolution. Both of these requirements arise due to changes in the *world* and to QA/QC performed on the KR (changes in *knowledge*). My sense is that in providing this first simple step - a shared collection of AnnotationProperties used across the community when building OWL-based ontologies - we provide the structure required to develop software tools to help automate the process. Nothing extending to the complexity of automatic reasoning, but just something to address the need quickly - a structured model for these processes, if you will, that can evolve toward the more complex "referent tracking" and "speech act" formalisms. This stop-gap isn't nearly enough to fully address this complex issue, but it should be relatively easy to implement and to put into practice (with a minimal amount of automated support for ontology curators), and, if done correctly, it should be something that can migrate to the more complex approach later. Providing too complex a strategy for addressing this versioning issue now might prohibitively slow the ontology development process as it is being carried out by various community biomed. ontology development projects.
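To give a flavor of what I mean by a shared collection of AnnotationProperties - and I should stress that the property names below are illustrative only, not the actual set on the OBI Wiki page - here is a minimal rdflib sketch of declaring such properties once and applying them to a revised class:

# Sketch of shared versioning/curation AnnotationProperties, declared and
# applied with rdflib. The property names and namespaces are hypothetical.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF

CUR = Namespace("http://example.org/curation#")     # hypothetical shared namespace
ONT = Namespace("http://example.org/myontology#")   # hypothetical ontology namespace

g = Graph()
g.bind("owl", OWL)
g.bind("cur", CUR)

# Declare the shared annotation properties once, in a common file.
for prop in (CUR.curationStatus, CUR.dateLastModified,
             CUR.modifiedBy, CUR.reasonForChange, CUR.obsoletedBy):
    g.add((prop, RDF.type, OWL.AnnotationProperty))

# Apply them to a class whose definition was revised.
cls = ONT.PurkinjeCell
g.add((cls, RDF.type, OWL.Class))
g.add((cls, CUR.curationStatus, Literal("pending final vetting")))
g.add((cls, CUR.dateLastModified, Literal("2007-01-12")))
g.add((cls, CUR.modifiedBy, Literal("curator:WJB")))
g.add((cls, CUR.reasonForChange,
       Literal("Textual definition corrected (change in *knowledge*, not in the world).")))

print(g.serialize(format="xml"))

Because these are ordinary OWL annotations, simple tooling (diff reports, "show me every term whose curation status changed since release X", etc.) can be layered on top without committing yet to any particular reasoning-based change formalism.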
As you can tell, this is a suggestion which OBI, BIRNLex, and a few other ontology development efforts have only just begun to implement, so it is most definitely a work in progress.
Having a review of the topic at this stage in the game, as Vipul suggests, by the several folks who've provided valuable pointers and feedback would be a wonderful idea, I think.
Cheers,
Bill
On Jan 12, 2007, at 6:26 AM, Kashyap, Vipul wrote:
Is there any work in the literature related to:
- Defining what and when a version is?
- Do all updates necessarily lead to a new version?
- Is there a utility to instance versioning?
The observation about the utility of knowledge base update and
revision is an
astute one. IMHO the utility of instance versioning is not clear
either.
Just my 2 cents,
---Vipul
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:public-semweb-lifesci-[EMAIL PROTECTED] On Behalf Of Bijan Parsia
Sent: Friday, January 12, 2007 5:28 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; 'w3c semweb hcls'; public-semweb-lifesci-[EMAIL PROTECTED]
Subject: Re: Versioning vs Temporal modeling of Patient State
On Jan 12, 2007, at 9:36 AM, [EMAIL PROTECTED] wrote:
Recently I had an interesting conversation with Werner Ceusters, a professor in Buffalo and colleague of Barry Smith. He has a theory about ontology maintenance and versioning, and it considers both "classes" and "instances". Both can change either because you made an error, or because your view of the world changed, or because the world itself changed. It turns out that you can only handle changes if you know, for each change, exactly what the reason for the change was. That reason should be documented in the system.
[snip]
The standard lingo for this is that a change to the knowledge base
due to a change in the *world* is called an *update* whereas a change
in your knowledge base due to a change in *your knowledge* of the
(current static) world is called a *revision*. The locus classicus
for this, IMHO, is:
<http://citeseer.ist.psu.edu/417296.html>
Following their model-theoretic accounts, there is a spate of work defining reasoning services that compute the updated or revised knowledge base given a proposed update or revision. E.g., recently:
<http://lat.inf.tu-dresden.de/~clu/papers/archive/kr06c.pdf>
The utility of model oriented revision and update for expressive
logics is, IMHO, not fully established, though it is conceptually
useful in my experience. There is, of course, a large chunk of work
on revising (and even updating) belief *bases*, that is, attending
primarily to the *asserted* set of formulae.
Hope this helps.
Cheers,
Bijan.
Bill Bug
Senior Research Analyst/Ontological Engineer
Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)
Please Note: I now have a new email - [EMAIL PROTECTED]