Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

Xiaoshu Wang Wed, 18 Jul 2007 03:05:54 -0700

I agree with Alan but feel sympathy for Eric as well. In the absence of a universally accepted ontology for describing biological entities, Eric has to develop something to start working on SW. But please note, just because "http://purl.uniprot.org/core/Protein" contains the string "Protein" does not make it the identifier for *Protein*, unless everyone else agrees to it. In an open world environment, which RDF is in, everything makes sense as long as there is no contradiction. The ambiguity problem will only arise when the term is to be aligned with other terms, which is not the case yet. The development of SW will be an evolving process because it is impossible to get things right at the very first try. I think the guideline to best practice should encourage people to (1) try to reuse an existing ontology and (2) if no such ontology exists, build their own. Eric's case obviously fell into the second case. If more users agree on the uniprot ontology, that is great and uniprot can gradually evolve into a standard. If not, we can learn some lessons.


That's my two cents,

Xiaoshu

Alan Ruttenberg wrote:

In that case, I would recommend that it is unwise to use Uniprot ids as identifiers of protein classes on the semantic web. Doing so would encourage exactly the kind of ambiguity that we need to avoid in order to write statements that will not confuse semantic web agents (including people).
I would suggest instead, that Uniprot not suggest that they represent specific classes of proteins, and instead keep them being exactly what they are, records containing information about diverse sets of entities, which we all admit is very useful. If there is interest in formalization for semantic web use at Uniprot, perhaps the focus can be instead on the smaller entities on which these records collect information.
Let others who are more interested in providing formal definitions for proteins work on making definitions that carve out specific classes. They can do so in part by pointing at information in the Uniprot records and other sources.
-Alan

On Jul 17, 2007, at 4:33 AM, Eric Jain wrote:
Alan Ruttenberg wrote:
To clarify, no, I didn't mean this. I meant that the definition of Uniprot records is already broad in the sense that sometimes multiple splice variants are included in a single record, as are population and disease-causing variants, according to Eric. Basically I don't know what set of proteins people currently intend to denote when they use a uniprot id as a protein, and I'm not entirely certain what the curators intend. So step one would be an English description of how to figure out what the curator's intent is, and we could go on from there to define OWL definitions based on that. I suspect that people currently using Uniprot ids may be using them in both broader and narrower ways, but we could leave the discovery of such cases to a reasoner once we had the basics in place.
People do indeed use UniProtKB identifiers in both broad and narrow ways: The narrow way is to talk about the exact, main sequence that is shown...
In any case, I'm not too optimistic about being able to define our concepts in a strict, yet meaningful way, as often it's practical criteria that are used to decide, e.g. here's what one of our curators has to say on this:
"[Usually] we have one entry per gene. We have several entries for a single gene when description of variations are too complicated to describe in FT lines (of course, this criteria depends on the annotator). For viruses, it is much more messy, due to ribosomal frameshifts."
Formalize that! :-)
