On Jul 16, 2007, at 10:19 AM, Phillip Lord wrote:
"MK" == Marijke Keet <[EMAIL PROTECTED]> writes:
MK> Lack of sufficient knowledge about a particular (biological) entity is
MK> a sideshow, not an argument, to the issue of distinguishing real proteins from
MK> their records.
I agree. The argument is that it's very hard to describe what you mean by a
"protein". We almost certainly don't mean a protein molecule. We might mean a type of
protein. But then we don't know whether two protein molecules are actually of a given
type.
I'm confused. I think we all would agree that there are instances of
proteins and we have a good idea of what they are. We also know that
there are groups of proteins that are built off the same template and
share certain properties. If we define classes using such properties,
then we can, in principle, decide whether these proteins are members
of a given class (subject to experimental limitations). For instance,
we can define a class of proteins that have a certain primary
structure (aa sequence), and then, via assay, measure what fraction
of the proteins in some sample have that structure.
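As a rough sketch of that last point: membership in a sequence-defined class reduces to a test on each molecule's primary structure. The reference sequence, the sample, and the function name below are all made up for illustration, and a real assay would not hand you sequences this cleanly.

```python
# Hypothetical illustration: a protein class defined by primary structure.
# The reference sequence and the sample are invented, not real assay data.

def fraction_in_class(sample, reference_sequence):
    """Return the fraction of molecules in `sample` whose amino-acid
    sequence exactly matches `reference_sequence`."""
    if not sample:
        raise ValueError("sample must contain at least one molecule")
    matches = sum(1 for seq in sample if seq == reference_sequence)
    return matches / len(sample)


# Each string stands for the measured sequence of one protein molecule.
reference = "MKTAYIAKQR"
sample = ["MKTAYIAKQR", "MKTAYIAKQR", "MKTAYIAKQK", "MKTAYIAKQR"]

print(fraction_in_class(sample, reference))  # 3 of 4 match -> 0.75
```

Exact string equality is the simplest possible membership test; a class defined by other shared properties would need a different predicate in its place.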
My questions are how often do we want to refer to a protein, rather than a record
about a protein?
Any time we want to make a scientific statement about proteins. In
my work, that means virtually all the time. For example, I have a
body of work that is the target of text mining at the moment. If the
text mining worked well enough to understand the articles, what
should it generate for semantic web consumption?
And who is responsible for ascribing an ID to a specific type of
protein? In practice, in bioinformatics, the answer to this is a)
we don't and b) uniprot.
I agree with a): we mostly don't, and when we do, we do it in an
unclear and nonstandard way. I disagree with b). Exactly what
class of proteins a uniprot record describes is not clear (though
Eric started to make a theory of what it could be). I have seen
uniprot ids used even to identify antibodies to a protein.
As for who is responsible, I would say that our community is
responsible. I expect that there will be efforts along this line in
the OBO Foundry and I would hope that there would be broad
participation from the people who are interested in following this list.
So, while distinguishing between a uniprot record and a protein seems like a good
idea, I'm not convinced it brings you anything. What are you going to do with your
protein ID?
I would like Invitrogen to be able to say that
product xxxyyy is an antibody to some specific class of
phosphoproteins, in a way that lets a semantic web agent do some
shopping for me if I needed such a reagent. I could go on and name a
long list of such cases, but I'm pretty sure you could do the same
thing, notwithstanding your playing dumb here.
-Alan
ps. Hi Phil - glad you're joining the party!
Phil
--
Phillip Lord, Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord
Claremont Tower Room 909, skype: russet_apples
Newcastle University,
NE1 7RU