Quite a nice example! These are the sorts of issues that we must
contend with while creating the PRO framework. In fact, this addresses
another issue of scope; that is, whether or not (in the long or short
term) to also account for homodimers, trimers, and so on (currently, GO
handles heteromeric complexes). This also provides a good opportunity
for me to mention that our most immediate goal is to provide a framework
that can be built upon by others as well as us. That is, we would
encourage you to unfold your own corner of the protein world! ;)
June Kinoshita wrote:
If I may put forward a key protein in Alzheimer disease as an example
that we are grappling with, there is full-length APP (which itself has a
number of forms as well as mutations); various peptides derived from
cleavage of APP; and then multimeric forms of the peptides, particularly
Abeta42, which is known to form soluble dimer, trimer, tetramer,
hexamer, and dodecamer, each of which may have different functions or
toxicities, as well as "misfolded" protofibrillar and insoluble
fibrillar forms, and possibly a pore-like form consisting of
I-forget-how-many Abetas. In addition, proteins form complexes that have
functions that are different from those of the non-complexed protein. I
look forward to seeing how the Protein Ontology unfolds, so to speak! -
June
On Jul 19, 2007, at 11:23 AM, Darren Natale wrote:
We don't yet have formal definitions for many of the classes and
relations (the effort only began in earnest a few months ago). But,
basically, there is a distinction made between the full-length (in
terms of amino acid sequence) protein and the sub-length parts of
proteins (commonly called domains by protein scientists,
unfortunately). The term "whole protein" is somewhat of a
placeholder; it is used to signify the evolutionary classes (families)
of full-length proteins as opposed to the evolutionary classes of
domains. "Sequence form" is likewise a placeholder term used to denote the
initial translation product from an mRNA, which itself might be based
on a "normal" gene or a mutant thereof, or which might be one of
several possible alternatively spliced transcripts from the normal or
mutant gene. The cleaved or modified product is a further breakdown
of those initial translation products, and allows one to distinguish
between a phosphorylated version of a protein and the
non-phosphorylated version (as an example). That distinction is needed
because the two versions might have different functions.
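To make the three levels a bit more concrete, here is a minimal Python sketch of how they might nest. It is only an illustration under stated assumptions, not PRO's actual data model: the class name ProteinClass, the lineage() helper, the strict "parent must be broader" rule, and all example entries are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Levels taken from the description above, ordered from broadest to most specific.
LEVELS = ("whole protein", "sequence form", "cleaved or modified product")


@dataclass(frozen=True)
class ProteinClass:
    """Hypothetical node in a PRO-like hierarchy (illustration only)."""
    name: str
    level: str
    parent: Optional["ProteinClass"] = None

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")
        # Simplifying assumption: a parent must sit at a broader level.
        if self.parent is not None:
            if LEVELS.index(self.parent.level) >= LEVELS.index(self.level):
                raise ValueError("parent must be at a broader level than child")

    def lineage(self) -> List[str]:
        """Return the names from the broadest ancestor down to this node."""
        path = []
        node = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))


# Hypothetical example entries; none of these are real PRO terms.
family = ProteinClass("example family", "whole protein")
isoform = ProteinClass("isoform 1 translation product", "sequence form", family)
phospho = ProteinClass(
    "isoform 1, phosphorylated", "cleaved or modified product", isoform
)

print(" > ".join(phospho.lineage()))
```

The point of the sketch is only that a phosphorylated form can be told apart from the translation product it derives from, while both still trace back to the same evolutionary class.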
Eric Jain wrote:
Darren Natale wrote:
We recently began a new Protein Ontology (PRO) effort geared
precisely toward the formal definition of the "smaller entities"
referred to by Alan. By "we" I mean the PRO Consortium, comprising
the PIs Cathy Wu of PIR (which is also a member organization of the
UniProt Consortium), Barry Smith of SUNY Buffalo, and Judy Blake of
Jackson Labs. PRO is being developed within the framework of the
OBO Foundry, and aims to specify protein entities at the level
mentioned by Chris (accounting for splice variation and
post-translational modification and cleavage). Where appropriate,
PRO will indeed make reference to both other ontologies and to
UniProt Knowledgebase (UniProtKB) records. Furthermore, we are also
undertaking the "wildly ambitious" job of representing broader,
more-inclusive classes of similar proteins based on evolutionary
relatedness.
A further description of PRO (with examples and link to a paper) can
be found at http://pir.georgetown.edu/pro
This will no doubt be interesting to quite a few people here! For the
sake of this discussion, could you elaborate a bit more on how the
different concepts in PRO are defined, i.e. what is a "protein",
"whole protein", "sequence form" and "cleaved and/or modified product"?