In addition to Uniprot, in light of Matthias' earlier email, what about http://en.wikipedia.org/wiki/Protein, http://dbpedia.org/page/Protein, and the protein related ontologies listed in OBO (http://www.obofoundry.org/)?

-Kei

Michel_Dumontier wrote:
Pursuant to my email, and in light of several other comments, if our
goal is to now rectify what Uniprot:Protein _actually_ means in our
domain, and how it can be semantically mapped to other bio-ontologies,
then I might also suggest that instances of Uniprot:Protein are
aggregates of proteins (err... :ProteinAggregate anyone?), possibly
separated by both space and time, having a similar (base sequence +
mutations / ptms) composition, sharing certain characteristics (e.g.
functionality, domains) and observed to participate in biological
processes. Clearly not a type of protein of the single molecule form,
but again, certainly not a Record.

-=Michel=-



If, however, what we've been talking about is that identifiers like
        http://purl.uniprot.org/uniprot/Q16665

are actually database records, and not molecular entities, then we can
settle this quickly:

Uniprot RDF file: http://www.uniprot.org/uniprot/Q16665.rdf
(is this what people were referring to as a Record???)

Contains:

<rdf:Description rdf:about="http://purl.uniprot.org/uniprot/Q16665">
 <rdf:type rdf:resource="http://purl.uniprot.org/core/Protein" />


It's clear that the entity denoted by :Q16665 is rdf:type :Protein and
is the subject of statements that are biological in nature, such as
being located in sub-cellular compartments or being involved in
biochemical reactions. It is clearly not a Record. The same holds for
nearly all entries in biomolecular databases.

Cheers,

-=Michel=-

Anxiously waiting to see if this clears things up or generates
controversy... it's hard to predict!



If nobody ever wants to use the same property to talk about the
database record as was used to talk about the molecule, and nobody ever
makes an assertion that implies that the class of database records is
disjoint from the class of molecules, then I don't see any harm in using
the same URI to ambiguously denote both.  But if one is trying to design
data to be reusable by others in unforeseen ways, there clearly *is* a
risk that someone will want to make such assertions in conjunction with
the data, and if that happens there is a clear harm.  This risk is easy
to avoid by using separate URIs.
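The harm described above can be made concrete with a toy check, sketched in Python under stated assumptions: the class URI http://example.org/DatabaseRecord and the record URI http://example.org/record/Q16665 are hypothetical, and the check only looks for a resource typed with two classes declared owl:disjointWith each other, which is a small fragment of what a real OWL reasoner would report.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_DISJOINT = "http://www.w3.org/2002/07/owl#disjointWith"
PROTEIN = "http://purl.uniprot.org/core/Protein"
RECORD = "http://example.org/DatabaseRecord"  # hypothetical record class

Q = "http://purl.uniprot.org/uniprot/Q16665"
Q_REC = "http://example.org/record/Q16665"  # hypothetical separate record URI


def disjointness_clashes(triples):
    """Return, sorted, the resources typed with two classes declared disjoint."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    disjoint = {(s, o) for s, p, o in triples if p == OWL_DISJOINT}
    # owl:disjointWith is symmetric, so record both directions.
    disjoint |= {(b, a) for a, b in disjoint}
    return sorted(
        s for s, ts in types.items()
        if any((a, b) in disjoint for a in ts for b in ts)
    )


# Someone else later asserts that records and proteins are disjoint.
axiom = (RECORD, OWL_DISJOINT, PROTEIN)

one_uri = [(Q, RDF_TYPE, PROTEIN), (Q, RDF_TYPE, RECORD), axiom]
two_uris = [(Q, RDF_TYPE, PROTEIN), (Q_REC, RDF_TYPE, RECORD), axiom]

print(disjointness_clashes(one_uri))   # the shared URI is flagged
print(disjointness_clashes(two_uris))  # nothing is flagged
```

With one URI for both senses, the later axiom makes the data inconsistent; with separate URIs, the same axiom is harmless.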

There *are* trade-offs.  Minting two URIs instead of one *does* add some
complexity, though, as I pointed out, that additional complexity can be
mitigated to the point that it is a *very* low cost.  Still, different
people will weigh these trade-offs differently, and what's best for one
situation may not be best for another, as I indicated in my original
post.
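One way to keep the cost of two URIs low is to derive both mechanically from the same accession. The sketch below assumes, purely for illustration, that the two UniProt URI patterns quoted earlier in this thread (the purl.uniprot.org identifier and the www.uniprot.org .rdf file) serve as that convention, and it links them with foaf:isPrimaryTopicOf; the choice of linking property is an assumption, not something proposed in this thread.

```python
FOAF_PRIMARY_TOPIC_OF = "http://xmlns.com/foaf/0.1/isPrimaryTopicOf"

# Assumed URI patterns: one names the molecule, the other names the
# RDF record that describes it.
THING_PATTERN = "http://purl.uniprot.org/uniprot/{acc}"
RECORD_PATTERN = "http://www.uniprot.org/uniprot/{acc}.rdf"


def mint(accession):
    """Mint a molecule URI and a record URI from one accession,
    plus one triple that relates the two."""
    thing = THING_PATTERN.format(acc=accession)
    record = RECORD_PATTERN.format(acc=accession)
    return thing, record, (thing, FOAF_PRIMARY_TOPIC_OF, record)


thing, record, link = mint("Q16665")
print(thing)
print(record)
print(link)
```

Because both URIs come from one string template each, the extra bookkeeping per entry is a single formatted string and one linking triple.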

Furthermore, even if one does use the same URI to ambiguously denote
both a database record and a molecule, that is not the end of the world
either.  It is possible (though more difficult) to later separate out
and relate the different senses of an ambiguous URI, as I have
described:
http://dbooth.org/2007/splitting/
Ambiguity is inescapable, and ambiguity between a thing and a page that
describes that thing is not fundamentally different from other kinds of
ambiguity (except perhaps that we are aware of it in advance and it can
be easily avoided), as explained here:
http://dbooth.org/2007/splitting/#httpRange-14
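A rough sketch of what separating the senses of an ambiguous URI could look like at the triple level, in Python. This is not necessarily the procedure described at the splitting page above; the vocabulary under http://example.org/vocab#, the record URI, the describedBy linking property, and the rule of deciding each statement's sense by its predicate are all hypothetical choices for illustration.

```python
EX = "http://example.org/vocab#"  # hypothetical vocabulary
Q = "http://purl.uniprot.org/uniprot/Q16665"
Q_RECORD = "http://example.org/record/Q16665"  # hypothetical record URI


def split_uri(triples, ambiguous, record_uri, record_props, link_prop):
    """Return a copy of triples in which statements about the ambiguous URI
    that use a record-level property are moved onto record_uri.

    Statements with any other property stay on the ambiguous URI, which is
    thereafter read as the molecule. Only the subject position is rewritten;
    triples that use the ambiguous URI as an object are left unchanged.
    """
    out = []
    for s, p, o in triples:
        if s == ambiguous and p in record_props:
            out.append((record_uri, p, o))
        else:
            out.append((s, p, o))
    # Relate the two senses so that no information is lost by the split.
    out.append((ambiguous, link_prop, record_uri))
    return out


mixed = [
    (Q, EX + "locatedIn", EX + "SomeCompartment"),  # molecule-level (placeholder)
    (Q, EX + "recordVersion", "1"),                 # record-level (placeholder)
]
split = split_uri(mixed, Q, Q_RECORD, {EX + "recordVersion"}, EX + "describedBy")
for triple in split:
    print(triple)
```

The hard part in practice is deciding which properties belong to which sense; the code only applies that decision once it has been made.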

Finally, although it is flattering that you have named this suggestion
after me, I cannot take credit.  As I pointed out in my original post,
the suggestion to differentiate between a molecule and the database
record that describes that molecule originates with the Architecture of
the World Wide Web:
http://www.w3.org/TR/webarch/#URI-collision
and best practices for implementing this distinction are described in
Cool URIs for the Semantic Web:
http://www.w3.org/TR/cooluris
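The two patterns that Cool URIs for the Semantic Web describes, 303 redirects and hash URIs, can be sketched as toy request-handling logic in Python. No real server is involved, and all URIs under http://example.org/ are hypothetical.

```python
from urllib.parse import urldefrag

# Hypothetical URIs for this sketch.
THING = "http://example.org/id/Q16665"                  # names the molecule
RECORD = "http://example.org/doc/Q16665"                # names the record about it
HASH_THING = "http://example.org/doc/Q16665#molecule"  # hash-URI alternative


def respond(uri, documents, redirects):
    """Toy server logic: 200 for a document URI, 303 See Other from a
    non-document URI to the document that describes it, otherwise 404."""
    if uri in documents:
        return 200, None
    if uri in redirects:
        return 303, redirects[uri]
    return 404, None


def request_uri(uri):
    """A client drops the fragment before making an HTTP request, so a hash
    URI for the molecule retrieves the record document it belongs to."""
    return urldefrag(uri).url


documents = {RECORD}
redirects = {THING: RECORD}

print(respond(THING, documents, redirects))                    # 303 pattern
print(respond(request_uri(HASH_THING), documents, redirects))  # hash pattern
```

In both patterns the molecule and the record end up with distinct URIs, yet a client following the molecule's URI still reaches the record.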

David Booth






