We don't yet have formal definitions for many of the classes and
relations (the effort only began in earnest a few months ago). But,
in essence, PRO distinguishes between the full-length (in terms of
amino acid sequence) protein and the sub-length parts of proteins
(which protein scientists, unfortunately, commonly call domains). The
term "whole protein" is something of a placeholder; it denotes the
evolutionary classes (families) of full-length proteins, as opposed to
the evolutionary classes of domains. "Sequence form" is likewise a
placeholder term, denoting the initial translation product of an mRNA.
That mRNA might derive from a "normal" gene or a mutant thereof, or it
might be one of several alternatively spliced transcripts of the
normal or mutant gene. "Cleaved and/or modified product" is a further
subdivision of those initial translation products. It allows one to
distinguish, for example, between the phosphorylated and the
non-phosphorylated versions of a protein. This level is needed because
the two versions might have different functions.
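To make these distinctions concrete, here is a minimal Python sketch of
the levels as a simple class hierarchy. It is only an illustration under
stated assumptions, not PRO's formal definitions (which, as noted, do
not yet exist). All class names, field names, and example values
(WholeProteinClass, DomainClass, SequenceForm, ModifiedProduct, gene,
mutant, isoform, modifications, "GENE1") are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical model of the levels described above; NOT PRO's actual schema.


@dataclass(frozen=True)
class WholeProteinClass:
    """Evolutionary class (family) of full-length proteins."""
    name: str


@dataclass(frozen=True)
class DomainClass:
    """Evolutionary class of sub-length parts of proteins (domains).

    Kept as a separate type so that domain families are never confused
    with whole-protein families.
    """
    name: str


@dataclass(frozen=True)
class SequenceForm:
    """Initial translation product of an mRNA."""
    family: WholeProteinClass
    gene: str                      # hypothetical gene label
    mutant: bool = False           # "normal" gene or a mutant thereof
    isoform: Optional[str] = None  # alternatively spliced transcript, if any


@dataclass(frozen=True)
class ModifiedProduct:
    """Cleaved and/or modified form of a given sequence form."""
    parent: SequenceForm
    cleaved: bool = False
    modifications: FrozenSet[str] = frozenset()

    def is_phosphorylated(self) -> bool:
        return "phosphorylation" in self.modifications


# Usage: two products derived from the same initial translation product.
family = WholeProteinClass("example kinase family")
form = SequenceForm(family, gene="GENE1", isoform="isoform 2")
unmodified = ModifiedProduct(form)
phospho = ModifiedProduct(form, modifications=frozenset({"phosphorylation"}))

print(unmodified == phospho)                # False: distinct products
print(unmodified.parent == phospho.parent)  # True: same sequence form
```

Keeping the modified product as its own class (rather than a flag on the
sequence form) is what lets the phosphorylated and non-phosphorylated
versions carry different functional annotations.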
Eric Jain wrote:
Darren Natale wrote:
We recently began a new Protein Ontology (PRO) effort geared precisely
toward the formal definition of the "smaller entities" referred to by
Alan. By "we" I mean the PRO Consortium, comprising the PIs Cathy Wu
of PIR (which is also a member organization of the UniProt
Consortium), Barry Smith of SUNY Buffalo, and Judy Blake of Jackson
Labs. PRO is being developed within the framework of the OBO Foundry,
and aims to specify protein entities at the level mentioned by Chris
(accounting for splice variation and post-translational modification
and cleavage). Where appropriate, PRO will indeed make reference to
both other ontologies and UniProt Knowledgebase (UniProtKB) records.
We are also undertaking the "wildly ambitious" job of representing
broader, more inclusive classes of similar proteins based on
evolutionary relatedness.
A further description of PRO (with examples and a link to a paper) can
be found at http://pir.georgetown.edu/pro
This will no doubt be interesting to quite a few people here! For the
sake of this discussion, could you elaborate a bit more on how the
different concepts in PRO are defined, i.e. what is a "protein", "whole
protein", "sequence form" and "cleaved and/or modified product"?