Quite a nice example! These are the sorts of issues we must contend with while creating the PRO framework. In fact, this raises another issue of scope: whether (in the long or short term) to also account for homodimers, trimers, and so on (currently, GO handles heteromeric complexes). This also gives me a good opportunity to mention that our most immediate goal is to provide a framework that others, as well as we, can build upon. That is, we would encourage you to unfold your own corner of the protein world! ;)

June Kinoshita wrote:
If I may put forward, as an example, a key protein in Alzheimer disease that we are grappling with: there is full-length APP (which itself has a number of forms as well as mutations); various peptides derived from cleavage of APP; and multimeric forms of those peptides. Abeta42 in particular is known to form soluble dimers, trimers, tetramers, hexamers, and dodecamers, each of which may have different functions or toxicities, as well as "misfolded" protofibrillar and insoluble fibrillar forms, and possibly a pore-like form consisting of I-forget-how-many Abetas. In addition, proteins form complexes whose functions differ from those of the non-complexed protein. I look forward to seeing how the Protein Ontology unfolds, so to speak! - June

On Jul 19, 2007, at 11:23 AM, Darren Natale wrote:


We don't yet have formal definitions for many of the classes and relations (the effort only began in earnest a few months ago). But, basically, there is a distinction made between the full-length (in terms of amino acid sequence) protein and the sub-length parts of proteins (commonly, and unfortunately, called domains by protein scientists). The term "whole protein" is somewhat of a placeholder; it is used to signify the evolutionary classes (families) of full-length proteins as opposed to the evolutionary classes of domains. "Sequence form" is likewise a placeholder term, used to denote the initial translation product from an mRNA, which itself might derive from a "normal" gene or a mutant thereof, or might be one of several possible alternatively spliced transcripts from the normal or mutant gene. The cleaved or modified product is a further breakdown of those initial translation products; it allows one to distinguish, for example, between the phosphorylated and non-phosphorylated versions of a protein. This distinction is needed because the two versions might have different functions.

Eric Jain wrote:
Darren Natale wrote:
We recently began a new Protein Ontology (PRO) effort geared precisely toward the formal definition of the "smaller entities" referred to by Alan. By "we" I mean the PRO Consortium, comprising the PIs Cathy Wu of PIR (which is also a member organization of the UniProt Consortium), Barry Smith of SUNY Buffalo, and Judy Blake of Jackson Labs. PRO is being developed within the framework of the OBO Foundry, and aims to specify protein entities at the level mentioned by Chris (accounting for splice variation, post-translational modification, and cleavage). Where appropriate, PRO will indeed refer both to other ontologies and to UniProt Knowledgebase (UniProtKB) records. Furthermore, we are also undertaking the "wildly ambitious" job of representing broader, more inclusive classes of similar proteins based on evolutionary relatedness.

A further description of PRO (with examples and a link to a paper) can be found at http://pir.georgetown.edu/pro
This will no doubt be interesting to quite a few people here! For the sake of this discussion, could you elaborate a bit more on how the different concepts in PRO are defined, i.e., what is a "protein", a "whole protein", a "sequence form", and a "cleaved and/or modified product"?


