Re: Seeking Help with finding an assertion

Kei Cheung Thu, 05 Jul 2007 19:38:12 -0700


Hi Chris,

Thanks for pointing out the potential flaws of their method. It sounds like there is room for improvement in terms of the accuracy of database contents and the method of assessing database accuracy. Don't get me wrong. I think highly of GO. :-)

I'm also thinking more about what "negative knowledge" really means. Does it mean any or all of the following:


1. inconsistent knowledge
2. inaccurate knowledge
3. incomplete knowledge
4. knowledge with uncertainties

Can SW/ontologies help turn "negative knowledge" into "positive knowledge"?

-Kei

Chris Mungall wrote:

On Jul 4, 2007, at 8:27 PM, Kei Cheung wrote:
As a follow-up example, a study for estimating the error rate of the Gene Ontology (GO) was done:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1892569#id2674403
The study showed that the GO term annotation error rate estimates for the GoSeqLite database were found to be 13% to 18% for curated non-ISS annotations, 49% for ISS annotations, and 28% to 30% for all curated annotations. (ISS stands for "inferred from sequence similarity".) Despite these findings, the authors concluded that GO is a comparatively high quality source of information. Integration of databases involving significant error rates, however, can negatively impact the quality of science.
I have not yet properly digested this paper, but on a cursory reading there appear to be a few serious flaws. First, a lack of understanding of basic ontology principles - annotations to less specific classes in the graph are treated as errors. Second, the authors appear to make a lot of incorrect assumptions about how ISS annotations are curated.
It's curious they predict such a high error rate yet don't provide any examples.
-Kei

Kei Cheung wrote:
Hi Karen,
Your questions remind me of the following classic article written by Robert Robbins on "Challenges in the Human Genome Project".
http://www.esp.org/umdnj.pdf
Although it doesn't directly answer the questions, in the "Nomenclature Problems" section (p. 20-21), it discusses the significant problem of inconsistent knowledge representation. It says that it's a mistake to believe that terminology fluidity is not an issue in biological database design. It also says that many biologists don't realize that, in a database built with 5% error in the definition of individual concepts, a query that joins across 15 concepts has less than 50% chance of returning an adequate answer. The section also points out the importance of formal representation of scientific knowledge in addressing the inconsistency and nomenclature problems. Semantic Web and standard ontologies provide a solution to these database problems. We shouldn't simply convert an existing database syntactically into a Semantic Web format; we also need to do careful semantic conversion to eliminate as many errors, ambiguities, and inconsistencies as possible in order to reduce the costs of knowledge retrieval and discovery.
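The "less than 50%" figure can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes each concept's definition is wrong independently with the same 5% probability, and that the query is adequate only if all 15 concepts are correct. The article may have used a different model, and the function name below is just illustrative.

```python
def join_success_probability(error_rate: float, n_concepts: int) -> float:
    """Chance that every concept touched by a join is correctly defined.

    Assumption (not from the article): errors are independent and each
    concept has the same error rate, so the chances simply multiply.
    """
    return (1.0 - error_rate) ** n_concepts


if __name__ == "__main__":
    # 5% error per concept, query joining across 15 concepts
    p = join_success_probability(0.05, 15)
    print(f"{p:.3f}")  # prints 0.463
```

Under these assumptions the chance of an adequate answer is about 46%, which is consistent with the claim, and it already drops below 50% at 14 concepts.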
-Kei

Skinner, Karen (NIH/NIDA) [E] wrote:
Recently I read somewhere (on this list, a blog, a news story, where...?) an assertion that struck me as an interesting passing fact at the time. As I recall, it indicated that more websites are accessed via a search engine than by typing a URL into a browser web address bar.
Alas, I did not save the reference, and now I am looking for the proverbial needle in a haystack. Namely, what is the exact assertion, who asserted it, and where did they make it? If anyone in the world has this information or knows how to get it, or has related data, I imagine they would belong to this list. I would be most grateful for any useful pointer.
Along this same vein, if anyone has any statistics, data, anecdotes or information related to the cost of
(1) "friction" arising from inefficient or inappropriate efforts at information retrieval
and
(2) the cost of "negative knowledge" about an existing resource or data,
these, too, would be helpful.
(For example, with respect to #2 above, we are all familiar with comparison shopping for goods and services. We seek data/information about prices and quality, but at what point does the expenditure of that effort exceed the value of the information learned?)
I am not looking for examples at the level of a philosophy or economics Ph.D. thesis, but rather a few examples in the sciences that can be used at the level of an "elevator speech."
Karen Skinner
Deputy Director for Science and Technology Development
Division of Basic Neuroscience and Behavior Research
National Institute on Drug Abuse/NIH
