I see evidence as a special type of provenance for "facts",
"observations", and "conclusions" in a knowledgebase.
My motivation for evidence is the desire to represent information about
an experiment, such as its hypothesis. If we want to work with
hypotheses, then we need to represent hypothetical information. But how?
A uniform approach would treat all information as propositional or
hypothetical, rather than keeping a separate class, so that a
"hypothesis" could be promoted to a "fact" -- but I digress.. :)
However we represent it, we would like to know how our hypothetical
fact is supported by evidence, such as protocols and methods.
Alan Ruttenberg wrote:
Maybe we can bring this back to the main subject: What problems are we
trying to solve by recording evidence? What are the ways we would know
that we've made a mistake?
(I suspect that there will be a variety of answers to this, and I'm very
curious to hear what people think)
I'll try to answer this:
We want to record evidence in order to evaluate and weigh the quality of
data/information, as well as to steer and/or evaluate any conclusions
that are drawn on the basis of that data. This is especially important
in an environment for computational experiments. My test: if we can
apply our own criteria to evaluate our confidence in a given fact, even
when it is in someone else's knowledgebase, then we have succeeded with
our representation of the evidence. So, an example of how to represent
such a criterion, and how to reason with it about example evidence,
would be nice..
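To make that test concrete, here is a toy sketch of applying one's own
confidence criterion to a fact's evidence record. All the field names,
values, and weights are invented for illustration -- they are not from
any existing evidence vocabulary:

```python
# Toy sketch: score a fact's evidence on a 0-100 scale using our own
# (invented) criteria. The fields and weights are placeholders, not a
# proposed standard.

def confidence(evidence):
    """Return a 0-100 confidence score for a fact, given its evidence record."""
    score = 0
    # Trust manually curated facts more than text-mined ones.
    if evidence.get("method") == "manual_curation":
        score += 60
    elif evidence.get("method") == "text_mining":
        score += 30
    # A recorded protocol or workflow URL adds weight.
    if evidence.get("protocol"):
        score += 20
    # A named responsible person or agent adds weight.
    if evidence.get("agent"):
        score += 20
    return min(score, 100)

fact_evidence = {
    "method": "text_mining",
    "protocol": "http://example.org/workflow.wsdl",  # hypothetical URL
    "agent": None,
}
print(confidence(fact_evidence))  # 50
```

The point is that the scoring function is ours, while the evidence
record could come from anyone's knowledgebase -- if the record carries
enough provenance, we can apply our criteria to it.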
Evidence in Text mining
-----------------------
Suppose that we are trying to distill knowledge provided by a
scientific article into some representation. Example: "Is the article
about proteinX?" If so, "How relevant is proteinX to the article?", and
so forth. If the distillation process is carried out by a person, then
we want to know who. In the case of text mining, we might like to know
what algorithms and techniques, queries, pattern recognizers (Bayesian
or lexical patterns?), threshold values, etc. were used to extract the
knowledge. If a person used a text mining workflow to support the
distillation process, then we would like the URL of the workflow WSDL
(from which we can usually discover the other details) and to know who
the person was.
In general, we would like to know the resources involved in producing a
particular piece of data (or "fact"). We would like to know the actors,
roles, conditions, algorithms, program versions, what rules were fired,
and information resources.
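For instance, such a provenance record might be sketched as follows.
Every field name and value here is my own invention for illustration,
not drawn from any existing schema:

```python
# Sketch of a provenance record for one extracted "fact": the actors,
# roles, conditions, algorithms, versions, rules fired, and information
# resources behind it. All names and values are illustrative.

provenance = {
    "fact": "article-123 mentions proteinX",
    "agent": "A. Curator",                    # who ran the process
    "role": "curator",
    "algorithm": "bayesian-pattern-matcher",  # hypothetical name
    "program_version": "0.9.2",
    "rules_fired": ["mention-detector", "relevance-scorer"],
    "threshold": 0.85,
    "workflow_wsdl": "http://example.org/textmining.wsdl",  # hypothetical
    "conditions": {"corpus": "PubMed abstracts"},
}

# With records like this we can later ask, e.g., which facts were
# produced by a given algorithm:
def produced_by(records, algorithm):
    return [r["fact"] for r in records if r["algorithm"] == algorithm]

print(produced_by([provenance], "bayesian-pattern-matcher"))
```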
An important challenge in the future will be to combine results from
manual and automated processes. Most of us would tend to view "facts"
that result from an automated process as more hypothetical or
questionable than the same facts coming from a human expert. On the
road to automation, however, we should eventually reach the point where
the quality of "text mining"-supported (i.e. not generated!) annotation
is generally higher than that of manual-only annotation.
Evidence in Microarrays
-----------------------
I don't intend to start a debate about the particulars of microarrays,
but I think that evidence comes up in practice here throughout the
entire process of measurement and analysis. Gene expression, as measured
by microarrays, is actually a measurement of changes in mRNA levels at a
particular time, which *indicates* how much change in the process of
expression has occurred under *specific* *conditions*. So, already we
have an example of terminology that is not ontologically accurate when
applied to microarrays - technically, measuring mRNA levels is not
equivalent to measuring the quantity of protein product ("expression").
But the term has been in use for so long that it remains acceptable to
refer to microarray analysis as "expression analysis". :)
In the case of "gene expression", the statistical process of microarray
analysis only provides a probability that a gene is up- or
down-regulated (e.g. in the common reference model). However, there is
a series of decisions and conditions that leads up to the "call" (up,
down, unchanged) for a particular gene, and thus to the resulting set of
differentially expressed genes for the array. The following conditions
can all be relevant to deciding how much weight to give to the
resulting data:
* Experimental design - organism, conditions, disease, phenotype, ..
* Source of cells, enzymes, ..
* Materials handling (thawed? how often?)
* Protocols used such as RNA extraction
* Operator
* Array layout and design - including choice of oligos
* Instrumentation details - array spotter/printer, laser type and
calibration, ..
* Ozone levels (I'm not kidding!)
* Image analysis ("Feature Extraction") software and settings
* Type of normalization
* Criteria for discarding data as "outliers"
* Criteria for classifying gene as differentially expressed (p-value
cutoff, ANOVA, ..)
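To make the last two bullets concrete, here is a toy version of such a
"call" for one gene. The cutoff values are arbitrary placeholders, not
recommendations -- which is exactly why the chosen cutoffs belong in
the recorded evidence:

```python
# Toy "call" for one gene: up, down, or unchanged. The cutoffs below
# are arbitrary placeholders; a real analysis chooses them, and that
# choice is part of the evidence we would want to record.

P_CUTOFF = 0.05     # significance threshold (placeholder)
FOLD_CUTOFF = 1.0   # |log2 fold change| threshold (placeholder)

def call(log2_fold_change, p_value, p_cutoff=P_CUTOFF, fold_cutoff=FOLD_CUTOFF):
    """Classify one gene as 'up', 'down', or 'unchanged'."""
    if p_value >= p_cutoff or abs(log2_fold_change) < fold_cutoff:
        return "unchanged"
    return "up" if log2_fold_change > 0 else "down"

print(call(2.3, 0.01))   # up
print(call(-1.8, 0.04))  # down
print(call(0.4, 0.01))   # unchanged (fold change too small)
```

Change p_cutoff and a different set of genes ends up "differentially
expressed" -- the same measurements, a different conclusion.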
Again, the point that I'm trying to make about microarrays is that
evidence (as well as uncertainty) can be represented and used, even for
the measurements ("observations") themselves. But this is not done in
practice. Even if you wanted to simply "pool" microarray data (most
people don't), it is very difficult to do, because some of the most
important metadata (e.g. experimental design), if available at all, is
often in free-text form.
-scott
p.s. My introduction to HCLS summarizes the way that I look at evidence
a lot more succinctly than the above: ;)
http://lists.w3.org/Archives/Public/public-semweb-lifesci/2006Feb/0131.html
--
M. Scott Marshall
http://staff.science.uva.nl/~marshall
http://adaptivedisclosure.org