Re: blog: semantic dissonance in uniprot

John F. Madden Thu, 26 Mar 2009 09:32:40 -0700

Pat et al.,

It sounds like people sometimes have an irresistible itch to say that "A is similar to B", but this statement as such has very little semantic content.

Perhaps it's not really intended as a statement that has a truth value, but rather as a record of somebody's feelings.

The semantic web can certainly serve as a repository for recording one's feelings, and this might even be useful. (If I were an astrophysicist, for example, I might be quite interested in Stephen Hawking's intuitions about some problem I was working on.)

So what would you say about an rdf:Property called, say, "http://www.example.com/intuit#similarTo" that could be used simply to post a record that somebody intuited a "similarity" between two things?

It would have little utility for inferencing, unless one were to write a custom application (i.e. not OWL) to do so. But it might have utility as a semantic web "bookmark" for relationships that could be interesting candidates for future formalization.
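
To make the "bookmark" idea a little more concrete, here is a minimal sketch in plain Python, with triples held as tuples rather than in a real RDF store. Only the similarTo URI comes from the proposal above; the example.org subjects, objects, and the locatedIn property are hypothetical placeholders, and the function name is my own.

```python
# The property URI proposed above; everything else is illustrative.
SIMILAR_TO = "http://www.example.com/intuit#similarTo"

# Hypothetical statements, written as (subject, predicate, object).
triples = [
    ("http://example.org/thing/A", SIMILAR_TO, "http://example.org/thing/B"),
    ("http://example.org/thing/A",
     "http://example.org/vocab#locatedIn",  # hypothetical property
     "http://example.org/thing/C"),
]


def similarity_bookmarks(statements):
    """Return (subject, object) pairs that somebody recorded as 'similar'.

    No inference is drawn from the property; the pairs are only
    collected as candidates for later formalization.
    """
    return [(s, o) for (s, p, o) in statements if p == SIMILAR_TO]


bookmarks = similarity_bookmarks(triples)
```

The point of the sketch is that the custom application does nothing but look the property up; any meaning beyond "someone noted a similarity" would have to be added later.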


John



On Mar 26, 2009, at 8:42 AM, Pat Hayes wrote:

On Mar 26, 2009, at 8:28 AM, Michel_Dumontier wrote:
Pursuant to my email, and in light of several other comments, if our
goal is to now rectify what Uniprot:Protein _actually_ means in our
domain, and how it can be semantically mapped to other bio-ontologies,
then I might also suggest that instances of Uniprot:Protein are
aggregates of proteins (err... :ProteinAggregate anyone?), possibly
separated by both space and time, having a similar (base sequence +
mutations / ptms) composition, sharing certain characteristics (e.g.
functionality, domains) and observed to participate in biological
processes. Clearly not a type of protein of the single molecule form,
but again, certainly not a Record.
Indeed. If I might make a suggestion, rather than talking about 'aggregates' (which sounds disturbingly, er, philosophical), why not just say that the entity being identified is a _substance_? Substances are 'kinds of stuff' that include mixtures (e.g. concrete is a kind of stuff comprising a mix of sand, crushed rock, cement and water in several possible proportions) but also 'pure' stuffs such as water. Note the distinction between a substance and a piece of the substance (concrete, the building material, vs. this or that lump of concrete) or a mereological sum (your 'aggregate', I think) of such pieces (all the concrete in America). The utility of this is that it eliminates the discussions about molecules, which I think are getting in the way of clarity here. Regarding sameAs, being the same substance is a very strict kind of sameAs, of course, but it really does only refer to substances, which is a step in the right direction. Each protein is a substance. It might turn out that one protein is a mixture of others, for example: this is fine, nothing breaks, as long as nobody says the mixture is sameAs one of its components. And now one can have notions such as 'purified form of' or 'isotopic version of' between substances, which might help to make all these distinctions that you chemists need to be concerned with.
Distinctions like object/substance/piece/mixture were worked out by ontologists over 20 years ago, by the way. None of this is rocket science.
Pat
-=Michel=-
If however, what we've been talking about is that identifiers like
        http://purl.uniprot.org/uniprot/Q16665
are actually database records, and not molecular entities, then we can
settle this quickly:

Uniprot RDF file: http://www.uniprot.org/uniprot/Q16665.rdf
(is this what people were referring to as a Record???)

Contains:

<rdf:Description rdf:about="http://purl.uniprot.org/uniprot/Q16665">
<rdf:type rdf:resource="http://purl.uniprot.org/core/Protein" />
It's clear that the entity denoted by :Q16665 is rdf:type :Protein and
is the subject of statements that are biological in nature, such as
being located in sub-cellular compartments or being involved in
biochemical reactions. It is clearly not a Record. This is generally
the case for nearly all entries in biomolecular databases.

Cheers,

-=Michel=-

Anxiously waiting to see if this clears things up or generates
controversy... it's hard to predict!
If nobody ever wants to use the same property to talk about the
database record as was used to talk about the molecule, and nobody ever
makes an assertion that implies that the class of database records is
disjoint from the class of molecules, then I don't see any harm in
using the same URI to ambiguously denote both. But if one is trying to
design data to be reusable by others in unforeseen ways, there clearly
*is* a risk that someone will want to make such assertions in
conjunction with the data, and if that happens there is a clear harm.
This risk is easy to avoid by using separate URIs.
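
As a rough illustration of that risk, the sketch below checks a set of type assertions against a disjointness declaration, using plain Python sets rather than an OWL reasoner. The class labels, the disjointness axiom, and the example.org record URI are assumptions made up for the example; only the Q16665 URI appears in this thread.

```python
# Illustrative class labels; not taken from any real ontology.
PROTEIN = "Protein"
RECORD = "Record"

# Assumption: some third party asserts that the two classes are disjoint.
DISJOINT_PAIRS = [frozenset({PROTEIN, RECORD})]

Q16665 = "http://purl.uniprot.org/uniprot/Q16665"


def clashes(types_by_uri):
    """Return the URIs that are typed with two classes declared disjoint."""
    bad = []
    for uri, types in types_by_uri.items():
        if any(pair <= types for pair in DISJOINT_PAIRS):
            bad.append(uri)
    return sorted(bad)


# One URI denoting both the molecule and its database record.
one_uri = {Q16665: {PROTEIN, RECORD}}

# Separate URIs; the record URI here is hypothetical.
two_uris = {
    Q16665: {PROTEIN},
    "http://example.org/record/Q16665": {RECORD},
}
```

With the single shared URI, `clashes(one_uri)` reports Q16665 as soon as the disjointness assertion is combined with the data; with two URIs, `clashes(two_uris)` reports nothing.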

There *are* trade-offs. Minting two URIs instead of one *does* add some
complexity, though as I pointed out that additional complexity can be
mitigated to the point that it is a *very* low cost. Still, different
people will weigh these trade-offs differently, and what's best for one
situation may not be best for another, as I indicated in my original
post.
Furthermore, even if one does use the same URI to ambiguously denote
both a database record and a molecule, that is not the end of the world
either. It is possible (though more difficult) to later separate out
and relate the different senses of an ambiguous URI, as I have
described:
http://dbooth.org/2007/splitting/
Ambiguity is inescapable, and ambiguity between a thing and a page that
describes that thing is not fundamentally different from other kinds of
ambiguity (except perhaps that we are aware of it in advance and it can
be easily avoided), as explained here:
http://dbooth.org/2007/splitting/#httpRange-14

Finally, although it is flattering that you have named this suggestion
after me, I cannot take credit. As I pointed out in my original post,
the suggestion to differentiate between a molecule and the database
record that describes that molecule originates with the Architecture of
the World Wide Web:
http://www.w3.org/TR/webarch/#URI-collision
and best practices for implementing this distinction are described in
Cool URIs for the Semantic Web:
http://www.w3.org/TR/cooluris

David Booth
------------------------------------------------------------
IHMC (850)434 8903 or (650)4943973
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
