On Jul 16, 2007, at 10:19 AM, Phillip Lord wrote:


"MK" == Marijke Keet <[EMAIL PROTECTED]> writes:

MK> Lack of sufficient knowledge about a particular (biological) entity is
MK> a sideshow, not an argument, to the issue of distinguishing real proteins from
MK> their records.

I agree. The argument is that it's very hard to describe what you mean by a "protein". We almost certainly don't mean a protein molecule. We might mean a type of protein. But then we don't know whether two protein molecules are actually of a given type.

I'm confused. I think we would all agree that there are instances of proteins and that we have a good idea of what they are. We also know that there are groups of proteins that are built off the same template and share certain properties. If we define classes using such properties, then we can, in principle, decide whether a given protein is a member of a given class (subject to experimental limitations). For instance, we can define a class of proteins that have a certain primary structure (amino acid sequence) and then, via an assay, measure what fraction of the proteins in some sample have that structure.

My questions are: how often do we want to refer to a protein, rather than a record about a protein?

Any time we want to make a scientific statement about proteins. In my work, that means virtually all the time. For example, I have a body of work that is currently the target of text mining. If the text mining worked well enough to understand the articles, what should it generate for semantic web consumption?

And who is responsible for ascribing an ID to a specific type of protein? In practice, in bioinformatics, the answer is a) we don't, and b) UniProt.

I agree with a): we mostly don't, and when we do, we do it in an unclear and nonstandard way. I disagree with b). Exactly what class of proteins a UniProt record describes is not clear (though Eric started to work out a theory of what it could be). I have even seen UniProt IDs used to identify antibodies to a protein.

As for who is responsible, I would say that our community is. I expect there will be efforts along these lines in the OBO Foundry, and I hope there will be broad participation from the people who follow this list.

So, while distinguishing between a UniProt record and a protein seems like a good idea, I'm not convinced it brings you anything. What are you going to do with your protein ID?

I would like Invitrogen to be able to say that product xxxyyy is an antibody to some specific class of phosphoproteins, in a way that would let a semantic web agent do some shopping for me if I needed such a reagent. I could go on and name a long list of such cases, but I'm pretty sure you could do the same, notwithstanding your playing dumb here.

-Alan

ps. Hi Phil - glad you're joining the party!


Phil


--
Phillip Lord,                           Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics,             Email: [EMAIL PROTECTED]
School of Computing Science,            http://homepages.cs.ncl.ac.uk/phillip.lord
Claremont Tower Room 909,               skype: russet_apples
Newcastle University,
NE1 7RU


