On May 10, 2006, at 6:32 AM, Xiaoshu Wang wrote:
--Phil,
Also, it's not clear what is meant by "same thing".
A GenBank record and an EMBL record identifying the same piece
of DNA are not the same thing; they are different records.
Given that this is the semantic web, it might be nice to be
able to state "different records, but same gene". Or probably
"different record, but same gene, according to some criteria".
Using a "record"'s URI to identify a gene is fundamentally wrong.
The nature of a "record" is a text document, but the nature of a gene
is a biological entity. Mixing the two will of course generate
confusion. W3C's TAG has been tackling this issue for quite a while,
and they came up with a reasonably good resolution at the end of last
year (search for issue httpRange-14).
As I said before, what the URI looks like doesn't matter. What matters
is what will be returned when the URI is dereferenced. The URI is just
like a variable identifier in a programming language. Each variable
has its own type. If you try to use a Foo as a Bar, of course you get
a runtime error.
In the particular case mentioned, a gene is a biological entity,
whereas a GenBank record is an electronic text document. Of course,
you should not use the latter to identify the former. The gene should
have its own URI.
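
The variable-type analogy above can be made concrete with a small sketch. It is illustrative only: the class names `Gene` and `DatabaseRecord`, the function `transcribe`, and the record URI are hypothetical and appear nowhere in the thread; only http://example.com/gene/123 is taken from the example further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """A biological entity, named by its own URI."""
    uri: str


@dataclass(frozen=True)
class DatabaseRecord:
    """An electronic text document that describes a gene."""
    uri: str
    describes: Gene


def transcribe(thing: Gene) -> str:
    # Only a biological entity can be transcribed; a document cannot.
    if not isinstance(thing, Gene):
        raise TypeError(f"{type(thing).__name__} is not a Gene")
    return f"transcript of {thing.uri}"


# Hypothetical URIs, used for illustration only.
gene = Gene("http://example.com/gene/123")
record = DatabaseRecord("http://example.com/record/456", describes=gene)

print(transcribe(gene))              # the gene itself
print(transcribe(record.describes))  # the gene the record is about
try:
    transcribe(record)               # using the record as if it were the gene
except TypeError as err:
    print("runtime error:", err)
```

The record can still lead to the gene through an explicit link (`describes`), which is roughly the role rdfs:seeAlso plays in the RDF example later in this message.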
Genes should have their own URIs? That's some 10^16 or so URIs just
for the volume of space that I'm occupying right now.
More useful would be a URI for gene types, e.g., a URI for the type
"Homo sapiens p53 gene" (or an allele thereof).
Of course, this gets back to Phil's point about not being able to
define gene, species, etc. You could counter that an instance of a
gene is akin to an instance of a species and is some aggregation- or
population-like entity, and there is only one Homo sapiens p53 gene.
But that leaves open the question of the relation between that entity
and the 10^12 or so DNA regions encoding p53 proteins in my cells.
I agree with Matthias that it is not as hopeless as Phil makes out -
I don't think it's so hard to come up with commensurable definitions
for these things.
For example, suppose it is assigned the URI http://example.com/gene/123.
Dereferencing this URI should eventually (i.e., perhaps after an HTTP
303, see httpRange-14) lead to an RDF document that says

[http://example.com/gene/123] a rdfs:Resource;
    (or a URI for Gene from an ontology ...)
    rdfs:seeAlso [GenBank record ID];
    rdfs:seeAlso [EMBL record ID].
You could use the Sequence Ontology here,
but if we were to treat genes as types (which is ontologically
correct, I would argue), then the relation between 123 and SO:gene
would be subClass, not instantiation. I'm not sure why you couldn't
have an owl:sameAs between 123 and, say, an NCBI Gene ID. Both URIs
would dereference to representations of types.
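
To make the subclass-versus-instance distinction concrete, here is a minimal sketch using plain tuples as triples. All identifiers are placeholders: `ex:gene/123` stands for the example URI, `SO:gene` for the Sequence Ontology term, `ncbi:gene/PLACEHOLDER` for an unspecified NCBI Gene ID, and `ex:dna-region/cell-1` for one hypothetical DNA region in one cell. No real reasoner is involved; the two helper functions are assumptions made for illustration.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
SAME_AS = "owl:sameAs"

triples = {
    # The gene is modelled as a type, so it is a subclass of SO:gene ...
    ("ex:gene/123", SUBCLASS_OF, "SO:gene"),
    # ... and may be declared identical to another identifier for the type.
    ("ex:gene/123", SAME_AS, "ncbi:gene/PLACEHOLDER"),
    # One physical DNA region is an instance of that type.
    ("ex:dna-region/cell-1", RDF_TYPE, "ex:gene/123"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def types_of(individual, triples):
    """Return the asserted types of an individual and their superclasses."""
    result = set()
    for s, p, o in triples:
        if s == individual and p == RDF_TYPE:
            result |= superclasses(o, triples)
    return result


print(sorted(types_of("ex:dna-region/cell-1", triples)))
print(sorted(types_of("ex:gene/123", triples)))
```

Under this modelling the DNA region comes out as an instance of both `ex:gene/123` and `SO:gene`, while `ex:gene/123` itself has no asserted rdf:type, because it is related to `SO:gene` by subclassing rather than instantiation.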
Cheers
Chris
If you want, you can further write:
[GenBank record ID] owl:sameAs [EMBL record ID].
or whatever you want to say about anything in the world.
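
The dereferencing behaviour described above can be sketched without touching the network. In this sketch `FAKE_WEB` is an in-memory stand-in for a server, the document URI http://example.com/doc/gene/123 is a hypothetical choice, and the record IDs remain the bracketed placeholders from the example. It relies on the httpRange-14 reading that a 303 response leaves open what kind of thing the URI names, whereas a 200 response means the URI names a document.

```python
GENE_URI = "http://example.com/gene/123"
DOC_URI = "http://example.com/doc/gene/123"  # hypothetical document URI

# URI -> (status, payload); payload is a redirect target or a list of triples.
FAKE_WEB = {
    GENE_URI: (303, DOC_URI),
    DOC_URI: (200, [
        (GENE_URI, "rdfs:seeAlso", "[GenBank record ID]"),
        (GENE_URI, "rdfs:seeAlso", "[EMBL record ID]"),
    ]),
}


def dereference(uri, max_redirects=5):
    """Follow 303 redirects; return (final_uri, triples, was_redirected)."""
    redirected = False
    for _ in range(max_redirects + 1):
        status, payload = FAKE_WEB[uri]
        if status == 200:
            return uri, payload, redirected
        if status == 303:
            # The original URI may name a non-document thing, such as a gene.
            uri, redirected = payload, True
            continue
        raise ValueError(f"unexpected status {status} for {uri}")
    raise RuntimeError("too many redirects")


doc_uri, description, redirected = dereference(GENE_URI)
print("described in:", doc_uri, "| reached via 303:", redirected)
for subject, predicate, obj in description:
    print(subject, predicate, obj)
```

A client that asks for the gene URI ends up with a document about the gene, and the `redirected` flag records that the gene URI and the document URI name different things.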
Xiaoshu