On Jul 11, 2007, at 3:53 AM, Eric Jain wrote:
Alan Ruttenberg wrote:
On Jul 11, 2007, at 3:16 AM, Eric Jain wrote:
http://purl.uniprot.org/uniprot/P12345 does not identify an RDF
resource, it represents our concept of some protein.
What concept would that be? What are instances of the class of
proteins that this identifier denotes?
(serious question)
Some resources are quite simple and straightforward to understand,
e.g. http://purl.uniprot.org/uniparc/UPI00001328C5 represents a
specific amino acid sequence,
The instances are sequences of letters? Qualities of a class of
molecules? The molecules themselves?
and e.g. http://purl.uniprot.org/taxonomy/9606 represents a
specific organism (though there are some complications there, too...)
(indeed)
The resources in the http://purl.uniprot.org/uniprot/ namespace are
a bit more complicated, basically it's annotation for a sequence in
an organism:
Are there sequences in organisms? Or are there polypeptides? Which do
the records represent? If the proteins, then in all states -
unfolded, folded, misfolded, phosphorylated, glycosylated etc?
Does the set of sequences/proteins include common (in the organism's
population) non-function-changing mutants?
http://purl.uniprot.org/uniprot/P60484 (Human)
http://purl.uniprot.org/uniprot/P60483 (same sequence, but Dog)
What is the same about them?
...but these resources may also include annotation for related
sequences produced e.g. by alternative splicing:
http://purl.uniprot.org/uniprot/P00750 (Human, 3 sequences)
...provided the functions of the resulting sequences are not so
different that they warrant resources of their own...
How different do they have to be?
These might seem to be silly questions ("everyone knows what they
mean"), but I don't think they are. Would you use these identifiers to
identify a protein uniquely enough if your life depended on it? I
think that this is the standard that we should be aiming for - after
all, people's lives do/will depend on it.
What I'm trying to point out with these questions is that the UniProt
records are not trivially interpretable as "concepts", and that it
might be better not to even try in the first place. Rather, leave them
as database records, and separately create an ontology of proteins
that might use the records, or aspects of the records, as part of the
formal definitions of those proteins.
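
As a rough illustration of that separation, here is a minimal sketch in
Python. It is not anything UniProt provides: the class names
DatabaseRecord and ProteinClass, the field defined_using, and the label
are all hypothetical, chosen only to show a record and a protein class
kept as distinct things, with the class citing the record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseRecord:
    """A database entry, treated purely as a record (hypothetical type)."""
    uri: str


@dataclass(frozen=True)
class ProteinClass:
    """A separately defined ontology class (hypothetical type).

    Its definition may cite database records, but it is not itself a record.
    """
    label: str
    organism_uri: str
    defined_using: tuple = ()

    def cites(self, record: DatabaseRecord) -> bool:
        # True if this class's definition refers to the given record.
        return record in self.defined_using


# The record URI is taken from the thread; the label is a placeholder.
record = DatabaseRecord("http://purl.uniprot.org/uniprot/P60484")
protein = ProteinClass(
    label="hypothetical protein class",
    organism_uri="http://purl.uniprot.org/taxonomy/9606",
    defined_using=(record,),
)
```

The point of the design is only that the identifier keeps denoting the
record, while whatever the protein "is" gets stated in the separate class.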
-Alan