Re: Seeking Help with finding an assertion

Mark Montgomery Fri, 06 Jul 2007 08:09:08 -0700

The cost of not knowing is of course the reason for investing in science, and is for practical purposes infinite. It would be nice if it ended there, but we must also deal with spin, turf protection, disinformation, and digital replication of false information. Terrorism and false health cures are extreme examples of digitally enhanced negative information that transforms into negative knowledge (belief systems), but it's everywhere.

Unfortunately those costs are very challenging to recover even when systems are built that can do so. For example, Google provides excellent search technology, and is making billions from the advertising model only because of the generosity (or other factors) of hundreds of millions of people. They do have a revenue sharing program, but it doesn't come close to covering even a fraction of the cost of generating that information, so we have a problem long term.

That's a major issue with free: a very small portion of the population is being compensated, it's not always the best or even accurate information, and even when coming from the most reputable institutions it is often conflicted, an old media issue as well. So architects need to redesign incentives and try to keep our emotion-based best intentions in check, if we want sustainable systems that are properly aligned. That's one major issue we've been working on. Call it economic justice. But with the public Internet combined with democracy, these issues may eventually mean the difference between survival or not.

Another is that the semantic web should have been the semantic internet, because the web, while very important, isn't totally integrated with messaging, which is a major source of information overload and bad information that often leads to corrupt knowledge, replication of falsehoods, etc. Even false rumors within organizations and communities are spread rapidly now and are causing major problems just in the small slices I see. Then add the cost of missing just one essential message, or an organization not giving it the priority deserved for whatever reason, like the infamous Phoenix memo in intelligence that if properly handled could have prevented 9/11.

So while ontologies are an essential piece of the puzzle, I think too much emphasis has been placed on them for the past decade, and far too many view the sector as a cure-all. They are a very important component, in particular for life sciences, and accelerated discovery is taking place, but every path I've gone down suggests strongly that unless each of the areas is addressed, particularly for organizations, systemic failure (at least events if not continual) is probable, which is why we take a holistic approach.

One problem I am still seeing with ontological languages and the tools surrounding them is that they are not sufficiently flexible to deal with the rapid transition and growth of knowledge. Many if not most areas of learning are evolving very quickly- yesterday's certainty is today's uncertainty and tomorrow's epiphany. Here we may have a conflict within the very community, as this situation creates good job protection as XML did, but also serves to limit adoption and usefulness, which is why we are focused on quality in, quality out. ERP has a similar structural problem that I for one would like to avoid.

Prevention is the best medicine in IT architecture as well as healthcare, and also generates the highest ROI, even if so few are aware of it.


.02-

Mark Montgomery
CEO, Kyield
http://www.kyield.com
Managing Partner
Initium Venture Capital
http://www.initiumcapital.com

----- Original Message -----
From: "Skinner, Karen (NIH/NIDA) [E]" <[EMAIL PROTECTED]>
To: "Kei Cheung" <[EMAIL PROTECTED]>
Cc: "public-semweb-lifesci hcls" <public-semweb-lifesci@w3.org>
Sent: Thursday, July 05, 2007 11:39 PM
Subject: RE: Seeking Help with finding an assertion

Many thanks to all for this lively discussion, the helpful references,
and your generosity with your knowledge!

I could not access the video presentation yesterday but was finally
successful late tonight. It indeed was very interesting.  The paper:
"Understanding user goals in web search" also appears really relevant,
but it is not readily accessible through the NIH online publications.
I look forward to going through the other references, and your comments
about them have been helpful already.

Just a comment about "negative knowledge." I did not know it has any
sort of formal meaning. I coined the phrase for my own purpose in
reference to a situation where some information might exist, but a
potential user might not be aware of it. For example, a consumer could
go to a local store and compare prices for refrigerators. But if the
consumer visited more stores, she could learn even more about prices and
models. If she visited only one store, all other information about
prices would be "negative" knowledge to her, because it does not exist
for her -- i.e., she does not know about it. Certainly, at some point,
the cost of expenditure of effort to "know" exceeds the benefit, whether
that cost is determined by an hourly wage equivalent, or some subjective
measure of the value of her time.

In science, such an analysis quickly becomes very complex. In some
cases, an investigator may not care if a certain study has been
conducted because they only trust the reagents or data they themselves
generate, and the existence of data and resources is irrelevant to that
investigator.

On the other hand, suppose that "database X" did not exist, but the
existence of information that would have been found in it can be
identified and obtained only through locating and reading thousands of
individual papers. At what point does the cost of locating and reading
the papers by "y" number of users exceed the cost of the database? It
would seem that most of the cost would derive from the expense of
determining IF the knowledge existed. How many papers would the
scientist have to read before being certain the knowledge or data did
not exist?
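
The break-even question above can be put into rough numbers. The following is only a minimal sketch: the function name, its parameters, and every figure in the usage example are hypothetical placeholders rather than data from this thread, and it assumes each user would otherwise have to read the same number of papers at the same cost per paper.

```python
import math


def break_even_users(database_cost: float,
                     papers_per_user: int,
                     cost_per_paper: float) -> int:
    """Smallest number of users "y" whose combined search cost
    strictly exceeds the one-time cost of building the database.

    All inputs are illustrative assumptions, not measured values.
    """
    per_user_cost = papers_per_user * cost_per_paper
    if per_user_cost <= 0:
        raise ValueError("per-user search cost must be positive")
    # y * per_user_cost > database_cost  =>  y = floor(D / c) + 1
    return math.floor(database_cost / per_user_cost) + 1


# Hypothetical figures: a 500,000 database, 1,000 papers per user,
# 25 per paper located and read.
users_needed = break_even_users(500_000, 1_000, 25.0)
print(users_needed)
```

The sketch leaves out the harder part of the question, namely how many papers a scientist must read before being reasonably sure the knowledge does not exist; that would need a model of search coverage rather than a fixed paper count.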

Karen Skinner

-----Original Message-----
From: Kei Cheung [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 05, 2007 10:36 PM
To: Chris Mungall
Cc: Skinner, Karen (NIH/NIDA) [E]; public-semweb-lifesci hcls
Subject: Re: Seeking Help with finding an assertion

Hi Chris,

Thanks for pointing out the potential flaws of their method. It sounded
like there is room for improvement in terms of the accuracy of database
contents and the method of assessing database accuracy. Don't get me
wrong. I think highly of GO. :-)

I'm also thinking more about what "negative knowledge" really means.
Does it mean any or all of the following:

1. inconsistent knowledge
2. inaccurate knowledge
3. incomplete knowledge
4. knowledge with uncertainties

Can SW/ontologies help turn "negative knowledge" to "positive
knowledge"?

-Kei

Chris Mungall wrote:



On Jul 4, 2007, at 8:27 PM, Kei Cheung wrote:


As a follow-up example, a study for estimating the error rate of
Gene Ontology (GO) was done:

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1892569#id2674403

The study estimated the GO term annotation error rate for the
GoSeqLite database at 13% to 18% for curated non-ISS annotations,
49% for ISS annotations, and 28% to 30% for all curated annotations.
(ISS stands for inferred from sequence similarity.) Despite these
findings, the authors concluded that GO is a comparatively high
quality source of information. Integration of databases involving
significant error rates, however, can negatively impact the quality
of science.



I have not yet properly digested this paper, but on a cursory reading
there appear to be a few serious flaws. First, a lack of
understanding of basic ontology principles - annotations to less
specific classes in the graph are treated as errors. Second, the
authors appear to make a lot of incorrect assumptions about how ISS
annotations are curated.

It's curious they predict such a high error rate yet don't provide
any examples.


-Kei

Kei Cheung wrote:


Hi Karen,

Your questions remind me of the following classic article written
by Robert Robbins on "Challenges in the Human Genome Project".

http://www.esp.org/umdnj.pdf

Although it doesn't directly answer the questions, in the
"Nomenclature Problems" section (p. 20-21), it discusses the
significant problem of inconsistent knowledge representation. It
says that it's a mistake to believe that terminology fluidity is not
an issue in biological database design. It also says that many
biologists don't realize that, in a database built with 5% error in
the definition of individual concepts, a query that joins across 15
concepts has less than 50% chance of returning an adequate answer.
The section also points out the importance of formal representation
of scientific knowledge in addressing the inconsistency and
nomenclature problems. Semantic Web and standard ontologies provide
a solution to these database problems. We can't simply convert
an existing database syntactically into a semantic web format; we
also need to do careful semantic conversion to eliminate as many
errors, ambiguities, and inconsistencies as possible in order to
reduce the costs of knowledge retrieval and discovery.
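
The 15-concept figure quoted above can be checked with a short calculation. This is a sketch under one assumption that the summary above does not state: that each concept's definition is wrong independently of the others, so the chance that all of them are right is (1 - error rate) raised to the number of concepts. The function names are illustrative.

```python
def join_success_probability(error_rate: float, concepts: int) -> float:
    """Chance that every concept in a join is correctly defined,
    assuming errors in different concepts are independent."""
    return (1.0 - error_rate) ** concepts


def concepts_until_below(error_rate: float, threshold: float) -> int:
    """Smallest number of joined concepts at which the success
    probability drops below the given threshold."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    count = 0
    probability = 1.0
    while probability >= threshold:
        count += 1
        probability *= 1.0 - error_rate
    return count


p = join_success_probability(0.05, 15)
print(f"{p:.3f}")                        # about 0.463, i.e. under 50%
print(concepts_until_below(0.05, 0.5))   # 14
```

Under the independence assumption, the success probability already falls below 50% at 14 joined concepts, which is consistent with the claim for 15.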

-Kei

Skinner, Karen (NIH/NIDA) [E] wrote:

Recently I read somewhere (on this list, a blog, a news story,
where...?) an assertion that struck me as an interesting passing
fact at the time.   As I recall, it indicated that more websites
are accessed via a search engine than by typing a URL into a
browser web address bar.

Alas, I did not save the reference, and now I am looking for the
proverbial needle in a haystack. Namely, what is the exact
assertion, who asserted it, and where did they make it? If anyone
in the world has this information, knows how to get it, or
has related data, I imagine they would belong to this list. I
would be most grateful for any useful pointer.

Along this same vein, if anyone has any statistics, data,
anecdotes or information related to the cost of
(1) "friction" arising from inefficient or inappropriate efforts
at information retrieval
and
(2) the cost of "negative knowledge" about an existing resource or
data,

these, too, would be helpful.

(For example, with respect to #2 above, we are all familiar with
comparison shopping for goods and services. We seek data/
information about prices and quality, but at what point does the
expenditure of that effort exceed the value of the information
learned?)

I am not looking for examples at the level of a philosophy or
economics Ph.D. thesis, but rather a few examples in the sciences
that can be used at the level of an "elevator speech."


Karen Skinner
Deputy Director for Science and Technology Development
Division of Basic Neuroscience and Behavior Research
National Institute on Drug Abuse/NIH
