Re: Ambiguous names. was: Re: URL +1, LSID -1

Alan Ruttenberg Mon, 16 Jul 2007 08:38:26 -0700


On Jul 16, 2007, at 10:19 AM, Phillip Lord wrote:

"MK" == Marijke Keet <[EMAIL PROTECTED]> writes:
MK> Lack of sufficient knowledge about a particular (biological) entity is
MK> a sideshow, not an argument, to the issue of distinguishing real proteins from
MK> their records.
I agree. The argument is that it's very hard to describe what you mean by a "protein". We almost certainly don't mean a protein molecule. We might mean a type of protein. But then we don't know whether two protein molecules are actually of a given type.

I'm confused. I think we all would agree that there are instances of proteins and we have a good idea of what they are. We also know that there are groups of proteins that are built off the same template and share certain properties. If we define classes using such properties, then we can, in principle, decide whether these proteins are members of a given class (subject to experimental limitations). For instance, we can define a class of proteins that have a certain primary structure (aa sequence), and then, via assay, measure what fraction of the proteins in some sample have that structure.
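
A minimal sketch of that last idea, assuming each protein in a sample has already been reduced to its amino-acid sequence as a string. The function names and the toy sequences are hypothetical, and exact sequence equality is only one possible membership criterion.

```python
from typing import Callable, Iterable

# A "class" of proteins is modelled as a membership predicate over sequences.
ProteinClass = Callable[[str], bool]


def has_primary_structure(reference: str) -> ProteinClass:
    """Class of proteins whose aa sequence equals `reference` (hypothetical helper)."""
    ref = reference.upper()
    return lambda sequence: sequence.upper() == ref


def fraction_in_class(sample: Iterable[str], protein_class: ProteinClass) -> float:
    """Fraction of the sequences in `sample` that are members of `protein_class`."""
    sequences = list(sample)
    if not sequences:
        raise ValueError("sample is empty")
    members = sum(1 for s in sequences if protein_class(s))
    return members / len(sequences)


# Toy data: made-up sequences standing in for assay read-outs.
sample = ["MKTAYIAK", "MKTAYIAK", "MKTAYIAR", "mktayiak"]
target = has_primary_structure("MKTAYIAK")
print(fraction_in_class(sample, target))
```

The point is only that class membership becomes a decidable question once the defining property is stated; the experimental limitations show up in how the sample is obtained, not in the definition.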

My questions are how often do we want to refer to a protein, rather than a record about a protein?

Any time we want to make a scientific statement about proteins. In my work, that means virtually all the time. For example, I have a body of work that is the target of text mining at the moment. If the text mining worked well enough to understand the articles, what should it generate for semantic web consumption?

And who is responsible for ascribing an ID to a specific type of protein? In practice, in bioinformatics, the answer to this is a) we don't and b) UniProt.

I agree with a): we mostly don't, and when we do, we do it in an unclear and nonstandard way. I disagree with b). Exactly what class of proteins a UniProt record describes is not clear (though Eric started to make a theory of what it could be). I have seen UniProt IDs used even to identify antibodies to a protein.

As for who is responsible, I would say that our community is responsible. I expect that there will be efforts along this line in the OBO Foundry and I would hope that there would be broad participation from the people who are interested in following this list.

So, while distinguishing between a UniProt record and a protein seems like a good idea, I'm not convinced it brings you anything. What are you going to do with your protein ID?

I would like to be able to have Invitrogen be able to say that product xxxyyy is an antibody to some specific class of phosphoproteins in a way that a semantic web agent could do some shopping for me if I needed such a reagent. I could go on and name a long list of such cases, but I'm pretty sure you could do the same thing, notwithstanding your playing dumb here.


-Alan

ps. Hi Phil - glad you're joining the party!

Phil


--
Phillip Lord,                           Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics,             Email: [EMAIL PROTECTED]
School of Computing Science,            http://homepages.cs.ncl.ac.uk/phillip.lord
Claremont Tower Room 909,               skype: russet_apples
Newcastle University,
NE1 7RU
