Just a quick addendum:
I would point out that the issues I mention below, regarding efforts to
improve F-measure through human annotation effort, don't even begin to
address the enormous amount of relevant work from the Text Mining
field (various approaches to improving on standard IR sparse-matrix
term representation, NLP, LSI, text summarization, text
categorization, etc.). There is also the work Bob Futrelle has
brought up before regarding "hedging" and other issues that very much
affect our ability to automatically perform KE/KR/KM on unstructured
text. This is not unrelated to work on "speech acts" as applied to
formalization of clinical records, of course.
It's not completely clear to me how or whether that specific work can
be put to use on this versioning problem.
Cheers,
Bill
On Jan 12, 2007, at 7:40 AM, William Bug wrote:
The IFOMIS work Dirk, Kirsten, and others have cited on referent
tracking is definitely important work to review in this light. I'd
not been familiar with the model-theoretic work Bijan mentions, but
clearly that is important.
Werner Ceusters also has a list - a Google list, I believe - on
referent tracking.
This work - and related work on "speech acts" - is most definitely
relevant to this discussion and is very specifically designed to
address the ABox. As the citations given indicate, most of this work
has been done in the clinical domain with a focus on patient
records, which was the origin of this thread and would be directly
relevant to the Use Case Nigam put out there.
Some of that work has begun to seep into the discussions regarding
the sort of GENBANK issues Kei mentioned, but it's still really just
discussion to my knowledge. As you could tell from the way I
couched my description of that problem, referent tracking is clearly
a big part of what must be accommodated in that domain as well -
both in terms of the actual content and evolution of a record in
GENBANK, TrEMBL, etc., and in terms of the many ways in which
researchers link to and reference such records.
Also - the work I was mentioning regarding TBox-focused, highly
granular revisions has been informally discussed by NCBO folks
including Chris Mungall, Fabian Neuhaus, Barry, and others - again
with an eye toward providing reasoning services to support
requirements of the kind Bijan, Dirk, and others mention
below. This is associated with the discussions on this topic
amongst BIRN, OBI, and NCIT participants, but it has all been very
informal so far - AFAIK.
One of the things I would point out regarding the metadata
properties I was referring to is that this was really meant to be
just a simple, "low-hanging fruit" approach to a much more
complicated problem. No thought was given to how one would actually
construct automatic means to mediate reasoning on - or even just
representing - the evolving semantic graph. The idea was simply
this: many biomedical ontology development projects have begun to
notice the pressing need for version control, which appears to be
required at a very granular level. Standard source version control
systems - e.g., CVS, SVN, etc. - just make the problem worse, in my
opinion. This is where I'd differ with the point Vipul makes.
It's not that there are NO aspects of the software versioning
process relevant to this issue. It's just that I believe there are
complex issues in this domain - some of which Bijan mentioned, some
of which I mention below regarding application of the traditional
approach to employing CVs for literature annotation - that extend
greatly beyond what common practice in software version control is
intended to support. In the software domain, highly granular version
management has been required, and I believe something like it will
be required in the ontology development space as well. Perhaps
that's just a qualification and rewording of the point Vipul was
trying to make.
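To make concrete why line-oriented tools like CVS and SVN are a poor
fit, here is a toy sketch in Python using the rdflib library (the
tiny two-triple "ontology" is invented for illustration). Two
serializations of the very same graph differ on nearly every line,
so a text diff reports spurious changes while a graph-level
comparison reports none:

# Toy illustration: the same RDF graph, serialized with its statements
# in a different order, is textually different but semantically identical.
from rdflib import Graph
from rdflib.compare import isomorphic

doc_v1 = """
@prefix ex: <http://example.org/onto#> .
ex:Axon ex:partOf ex:Neuron .
ex:Dendrite ex:partOf ex:Neuron .
"""

doc_v2 = """
@prefix ex: <http://example.org/onto#> .
ex:Dendrite ex:partOf ex:Neuron .
ex:Axon ex:partOf ex:Neuron .
"""

g1 = Graph().parse(data=doc_v1, format="turtle")
g2 = Graph().parse(data=doc_v2, format="turtle")

print(doc_v1 == doc_v2)    # False -- CVS/SVN would report a change
print(isomorphic(g1, g2))  # True  -- no semantic change at all

The converse is just as bad: a semantically significant edit shows
up only as anonymous changed lines, with no indication of which
class or axiom was actually touched.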
SKOS, as I mentioned, does try to absorb some of what has been done
on this issue in the A&I/library science world in relation to CV
application to the literature annotation process. This has long
been recognized in that field as extremely important to the proper
curation of a CV/taxonomy/classification scheme/thesaurus. In that
domain, if you step back a bit from the details and ask what the
intended purpose of a CV is, the answer clearly is
to improve both precision and recall (F-measure from standard IR)
for Boolean, term-based queries. Anyone who has used MEDLINE over
the years has learned the utility of this approach - and its
limitations (the barrage of false positives and unknown number of
false negatives that typically still affect query results). Looked
at purely empirically, there is no doubt that having the people who
annotate the literature use a CV greatly improves the F-measure
of the search system used to mine the resulting inverted
indexes. However, I know from my time working with the creators of
the Biological Abstracts that it took months of training for the
"indexers" to get good at consistently applying CV terms - and a
lot of QA/QC was still needed to constantly monitor the output.
The reason really comes down to the fact that incomplete, sketchy
definitions and the lack of a formal semantic graph left way
too much leeway for indexers, even when a moderate amount of effort
was dedicated to incentivizing them. Having said that, when
highly specific definitions were used, it was found that indexers'
annotation output slowed greatly AND their use of CV terms went
way down, both of which are really at odds with the intended goal of
the process (back to F-measure), which is to provide maximal
annotation according to the CV. Even with this work, BIOSIS
(publisher of the Biological Abstracts) - and really all the A&I
vendors I knew of - still required a huge educational staff that
would constantly travel the world providing demos and updates to
librarians, so they could be kept informed on how best to use the
resulting CV indexes.
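For reference, the F-measure referred to above is just the harmonic
mean of precision and recall, F = 2PR / (P + R). A quick sketch in
Python; the query counts are invented:

# F-measure computed from raw true-positive / false-positive /
# false-negative counts for a single query.
def f_measure(tp, fp, fn):
    precision = tp / (tp + fp)  # fraction of returned records that are relevant
    recall = tp / (tp + fn)     # fraction of relevant records actually returned
    return 2 * precision * recall / (precision + recall)

# E.g., a term query returning 80 relevant and 40 irrelevant records
# while missing 20 relevant ones (invented numbers):
print(f_measure(tp=80, fp=40, fn=20))  # ~0.727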
It was still clearly an art to maximize F-measure - one that very
much depended on the quality and structure of the CV/classification
scheme/taxonomy, the talents of the indexers applying the CV in
the annotation process, and the talents of the information retrieval
experts/librarians constructing the queries. By far the most
confounding aspect of this process was the need to alter indexer
and searcher practice as CV changes were introduced - as was of
course inevitable, both due to changes in the *world* and changes
in *knowledge representation*, as Bijan describes it below. It was
partly because of this that various CV curatorial practices were
developed, again partially represented in SKOS - fields
such as "scope notes", "history notes", etc., which all relate to
the versioning issue in this context but, of course, are designed
for human consumption and are not particularly useful to KE/KR
algorithms.
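To make that concrete: in SKOS those fields appear as documentation
properties such as skos:scopeNote, skos:historyNote, and
skos:changeNote. A minimal sketch in Python/rdflib - the concept and
all of the note text are invented for illustration:

# SKOS documentation properties attached to a CV concept: useful
# versioning breadcrumbs for human curators and indexers, but opaque
# free text as far as KE/KR algorithms are concerned.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")
EX = Namespace("http://example.org/cv#")  # hypothetical CV namespace

g = Graph()
g.bind("skos", SKOS)

concept = EX["Neuron"]
g.add((concept, RDF.type, SKOS["Concept"]))
g.add((concept, SKOS["prefLabel"], Literal("neuron", lang="en")))
# Guidance to indexers on when the term applies:
g.add((concept, SKOS["scopeNote"],
       Literal("Use for nerve cells proper; do not use for glia.", lang="en")))
# Human-readable change history -- the versioning fields discussed above:
g.add((concept, SKOS["historyNote"],
       Literal("Meaning narrowed in the 2006 release.", lang="en")))
g.add((concept, SKOS["changeNote"],
       Literal("Definition tightened after QA/QC review.", lang="en")))

print(g.serialize(format="turtle"))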
My sense - as you can see in that OBI Wiki page I cited - is there
is a need to provide such curation support in the ontology
development process, both to address the lexical issues as has been
historically done in information science/library science and to
address semantic graph evolution. Both of these requirements arise
due both to changes in the *world* and to QA/QC performed on the KR
(changes in *knowledge*). My sense is that in providing this first
simple step - a shared collection of AnnotationProperties used
across the community when building OWL-based ontologies - we
provide the structure required to develop software tools to help
automate the process. Nothing extending to the complexity of
automatic reasoning, but just something to address the need quickly
- a structured model for these processes, if you will, that can
evolve toward the more complex "referent tracking" and "speech act"
formalisms. This stop-gap isn't nearly enough to fully address this
complex issue, but it should be relatively easy to implement and to
put into practice (with a minimal amount of automated support for
ontology curators), and if done correctly, it should be something
that can migrate to the more complex approach later. Providing too
complex a strategy for addressing this versioning issue now might
prohibitively slow the ontology development process as it is being
carried out by the various community biomedical ontology development
projects.
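For a sense of what that first simple step might look like, here is
a sketch in Python/rdflib. The property names (obsoletedBy,
curatorNote, modifiedDate) and both namespaces are hypothetical
stand-ins - the actual shared vocabulary was still being worked out
on the OBI Wiki page cited:

# Sketch: shared curatorial AnnotationProperties supporting granular,
# class-level change tracking. All names below are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, XSD

CUR = Namespace("http://example.org/curation#")  # hypothetical shared vocabulary
ONTO = Namespace("http://example.org/birnlex#")  # hypothetical ontology namespace

g = Graph()
# Declare the shared annotation properties once, for reuse across ontologies:
for prop in (CUR.obsoletedBy, CUR.curatorNote, CUR.modifiedDate):
    g.add((prop, RDF.type, OWL.AnnotationProperty))

# Record a class-level change without touching any logical axioms:
old = ONTO["Interneuron_v1"]
g.add((old, RDF.type, OWL.Class))
g.add((old, CUR.obsoletedBy, ONTO["Interneuron_v2"]))
g.add((old, CUR.curatorNote,
       Literal("Definition revised after QA/QC - a change in *knowledge*, "
               "not in the world.")))
g.add((old, CUR.modifiedDate, Literal("2007-01-12", datatype=XSD.date)))

print(g.serialize(format="turtle"))

Because OWL reasoners ignore annotations, properties like these can
be layered onto existing ontologies without perturbing any
entailments - which is what makes this a workable stop-gap.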
As you can tell, this is a suggestion that OBI, BIRNLex, and
a few other ontology developers have only just begun to implement, so
it is most definitely a work-in-progress.
Having a review of the topic at this stage in the game, as Vipul
suggests, by the several folks who've provided valuable pointers and
feedback would be a wonderful idea, I think.
Cheers,
Bill
On Jan 12, 2007, at 6:26 AM, Kashyap, Vipul wrote:
Is there any work in the literature related to:
- Defining what a version is and when one comes into being?
- Do all updates necessarily lead to a new version?
- Is there utility to instance versioning?
The observation about the utility of knowledge base update and
revision is an
astute one. IMHO the utility of instance versioning is not clear
either.
Just my 2 cents,
---Vipul
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:public-semweb-lifesci-[EMAIL PROTECTED]] On Behalf Of Bijan Parsia
Sent: Friday, January 12, 2007 5:28 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; 'w3c semweb hcls'; public-semweb-lifesci-[EMAIL PROTECTED]
Subject: Re: Versioning vs Temporal modeling of Patient State
On Jan 12, 2007, at 9:36 AM, [EMAIL PROTECTED] wrote:
Recently I had an interesting conversation with Werner Ceusters,
professor in Buffalo and colleague of Barry Smith. He has a
theory about ontology maintenance and versioning, and it considers
both "classes" and "instances". Both can change either because you
made an error, because your view of the world changed, or because
the world itself changed. It turns out that you can only handle
changes if you know, for each change, exactly what the reason for
the change was. That reason should be documented in the system.
[snip]
The standard lingo for this is that a change to the knowledge base
due to a change in the *world* is called an *update* whereas a
change
in your knowledge base due to a change in *your knowledge* of the
(current static) world is called a *revision*. The locus classicus
for this, IMHO, is:
<http://citeseer.ist.psu.edu/417296.html>
Following their model-theoretic accounts, there is a spate of work
defining reasoning services that compute the updated or revised
knowledge base given a proposed update or revision. E.g., recently:
<http://lat.inf.tu-dresden.de/~clu/papers/archive/kr06c.pdf>
The utility of model-oriented revision and update for expressive
logics is, IMHO, not fully established, though it is conceptually
useful in my experience. There is, of course, a large chunk of work
on revising (and even updating) belief *bases*, that is, attending
primarily to the *asserted* set of formulae.
Hope this helps.
Cheers,
Bijan.
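To ground the update/revision vocabulary above, a toy sketch in
Python on a belief *base* of propositional literals - nothing like
the model-theoretic operators in the cited papers, just the
bookkeeping; the clinical literals are invented:

# Toy belief base of propositional literals; "~p" negates "p".
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def incorporate(base, new):
    # Naive consistency maintenance: drop the directly conflicting
    # literal, then add the new one. The mechanics are identical for
    # update and revision here; what differs is the *reason* for the
    # change, which is exactly what should be documented in the system.
    return (base - {negate(new)}) | {new}

kb = {"on_drugA", "~fever"}

# *Update*: the world changed -- the patient developed a fever.
kb = incorporate(kb, "fever")

# *Revision*: our knowledge of the (static) world changed -- the chart
# was wrong and the patient was never on drug A.
kb = incorporate(kb, "~on_drugA")

print(kb)  # e.g. {'fever', '~on_drugA'} (set ordering varies)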
Bill Bug
Senior Research Analyst/Ontological Engineer
Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)
Please Note: I now have a new email - [EMAIL PROTECTED]