Summary: Continued discussion of whether we need to have identifiers
for protein classes in addition to those for records. An example
finding is given to support my view that we do need them, in response
to Phil's suggestion that I examine my scenarios.
[yah, I know I'm not being consistent about the summaries yet]
On Jul 18, 2007, at 2:22 PM, Phillip Lord wrote:
"Alan" == Alan Ruttenberg <[EMAIL PROTECTED]> writes:
Alan> Summary: Answering Phil's questions, and clarifying one thing he
Alan> asserts about what I said.
What if they have a polymorphism?
Alan> No.
Are two isoforms from an alternate splice the same protein?
Alan> No.
In both of these you differ from uniprot.
Well, if I am restricted to using such Uniprot classes I will have
trouble representing important scientific findings. If Uniprot only
has one name for the two molecules, one of which has a SNP that leads
to a loss of function that is the initiating factor of a disease,
then we have a problem, no? How do we say things about the disease
related form?
Unsatisfying, maybe. Clear definitions are important. But
interoperability and the lack of duplication are more so.
Alan> Forgive my confusion, but how exactly will we achieve interoperability
Alan> and lack of duplication if we don't have definitions? How would we
Alan> know that we don't have duplication, for example?
If you create identifiers to describe proteins rather than protein records
(like uniprot) then you have created a whole new set of IDs. When anyone wants
to talk about a protein, they will have to look up the ID.
As they will when they want to talk about a record. Of course perhaps
we all will add some links of the sort that say the record is about
some set of classes of proteins, and that aspects of the protein in a
class can be described by pieces of the record.
But at least we'll know what we are talking about.
<snip>
And, yet, you just told me that you could buy an antibody with just a
swissprot ID. So, let me restate the question: what are you going to do
with a protein ID that you are not going to do with a swissprot ID, or
"the protein formerly known as OPSD_HUMAN"?
Alan> I did not say that. I've said some people have identified antibodies
Alan> by such ids. Unfortunately this information is of limited use when
Alan> actually ordering an antibody, where I am interested in much more
Alan> information, such as how specific it is, how it has been validated,
Alan> and other properties related to how it behaves in certain experimental
Alan> settings. I *want* to be able to have identifiers (URIs) that are up to
Alan> the job of ordering reagents.
Well, I am not sure that you are going to achieve this with an identifier. You
need significant extra amounts of metadata.
By that reasoning I don't need DOIs for publications. All I need is
the URI for the journal and some metadata.
My point here is simple. Separating out the informatics and biology conforms
better to our notion of reality, sure. But you are talking about modelling
what makes a protein and, more, a type of protein. Work through your scenarios
and see whether you need a protein ID for this. If not, you are introducing a
layer of abstraction that you don't need.
I'm trying to be able to make statements that capture, among other
things, the conclusions that one finds in journal articles. In
http://www.nature.com/onc/journal/v21/n46/full/1205845a.html there is
a description of different isoforms of BAG-1. The different isoforms
have names, e.g. "BAG-1 p29". This name indicates a class of protein
instances. I expect I need a name and a definition for "BAG-1 p29"
and the others, so that I don't get confused and think there is a
contradiction between the statement that "BAG-1 p29 failed to protect
the transfected cells from apoptosis" and "BAG-1 p50, p46 and p33
isoforms enhanced the resistance to apoptosis".
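
To make the worry concrete, here is a minimal sketch of the two findings
quoted above. It records them once against per-isoform class identifiers
and once against a single record-level identifier. All of the identifier
strings (`class:BAG-1_p29`, `record:BAG1`, and so on) are made up for
illustration; they are not real Uniprot accessions or an agreed URI
scheme, and the `Finding` structure is just one hypothetical way of
holding such statements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    subject: str                    # identifier the statement is about
    protects_from_apoptosis: bool   # the conclusion reported in the paper


def contradictions(findings):
    """Return the subjects that carry both a positive and a negative finding."""
    seen = {}
    clashes = set()
    for f in findings:
        prior = seen.setdefault(f.subject, f.protects_from_apoptosis)
        if prior != f.protects_from_apoptosis:
            clashes.add(f.subject)
    return clashes


# Hypothetical identifiers: one record, four isoform classes described by it.
RECORD = "record:BAG1"
ISOFORM_TO_RECORD = {
    "class:BAG-1_p50": RECORD,
    "class:BAG-1_p46": RECORD,
    "class:BAG-1_p33": RECORD,
    "class:BAG-1_p29": RECORD,
}

# The two statements from the paper, one entry per isoform.
paper = [
    ("class:BAG-1_p29", False),
    ("class:BAG-1_p50", True),
    ("class:BAG-1_p46", True),
    ("class:BAG-1_p33", True),
]

# Same findings, keyed by isoform class versus keyed by record only.
by_class = [Finding(subject, result) for subject, result in paper]
by_record = [Finding(ISOFORM_TO_RECORD[subject], result) for subject, result in paper]

print(contradictions(by_class))
print(contradictions(by_record))
```

With class-level identifiers no subject carries conflicting findings.
Collapsing everything onto the one record identifier makes the same
findings look contradictory, unless the isoform is carried along as
extra metadata on every statement.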
But I'm open to discussing suggestions for representing these
statements by only making use of the Uniprot record IDs, if you have
any.
-Alan