Re: [DAS] querying for nonpositional annotations

Jim Procter Mon, 02 Aug 2010 01:51:10 -0700

Interesting thread this - it's something that hasn't been properly discussed at previous DAS developer meetings.


On 30/07/2010 20:33, Dave Messina wrote:

I too agree with Eugene.


No magic numbers.

You're too late here. 0 *is already* a magic number in the normalised protein sequence world, since it indicates the transcription start site for a coding sequence (i.e. the initial M). This is the cause for some ambiguity in the bio* bindings, and confusion on the part of more simple minded programmers like myself :)

Types can be used for filtering, and actually you get more fine-grained control 
than simply positional or non-positional. (I use this technique now in DASher.) 
*

In my opinion, the current spec as written is correct. That is, non-positional 
features don't just apply to the whole sequence, they apply to any part of the 
sequence.

Agreed. But read on...

As an example, consider a journal reference — a particular protein was isolated 
by a lab, they wrote a paper about it, and deposited the protein sequence in a 
database. If you look at a subsequence of the protein sequence, that 
subsequence still derives from the paper, right? So therefore the feature 
containing that journal reference should still be attached to the subsequence.

On that basis, I think the uniprot server is technically doing it wrong and 
should be changed, although I have to say that in practice it hasn't been an 
issue for me.

It's a difficult call. The uniprot server's behaviour is almost certainly due to the ambiguity arising from non-positional annotation which have start/end attributes (where start==end && start==0), and those which do not (the annotation is then usually derived from some other table, viz. the BioSQL schema). Other DAS servers do similar things, and kludges are needed to fix them.
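
The kind of kludge a client needs for this can be sketched as follows. This is only an illustration under stated assumptions: the `Feature` record and the `is_non_positional` helper are hypothetical names, not part of any DAS library, and the rule simply treats both encodings described above (start == end == 0, or no start/end at all) as non-positional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    """Hypothetical, minimal stand-in for a parsed DAS feature."""
    id: str
    type: str
    start: Optional[int] = None  # None models a feature with no start attribute
    end: Optional[int] = None    # None models a feature with no end attribute


def is_non_positional(feature: Feature) -> bool:
    """Treat both encodings seen in the wild as non-positional."""
    # Encoding 1: the server omitted start/end altogether.
    if feature.start is None and feature.end is None:
        return True
    # Encoding 2: the server sent start == end == 0.
    return feature.start == 0 and feature.end == 0


# Example data (invented for illustration only).
journal_ref = Feature(id="ref1", type="publication", start=0, end=0)
xref = Feature(id="xref1", type="dbxref")
domain = Feature(id="dom1", type="domain", start=10, end=85)
```

A client that normalises features this way no longer has to care which of the two encodings a given server uses.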

My only worry with the expectation of 'proper behaviour' - is that currently, I frequently see IDs with more non-positional annotation than positional (notwithstanding histogram like continuous quantitative annotation such as running averages of predicted or observed local sequence properties). Enforcing compliance with the spec as written means that the average DAS metaserver (i.e. uniprot, or some server that aggregates sequence database info with other data) will send a huge non-positional header in response to every range qualified feature request, which is pretty inefficient. It may not scale well, either, since the amount of database cross references is (still) increasing.

* It might be nice, though, to add 'positional' and 'non-positional' types, 
which would be a way to grab all of the existing positional or non-positional 
types in one go. (currently it's necessary to specify multiple types to get the 
same functionality.)

This is essential, I think. However, the only way you are going to be able to do this in a DAS type constraint currently is to ensure the feature annotation source is ontology aware (and said ontology includes a distinct positional/non-positional hierarchy)**. One route would be to introduce a DAS-specific type term that the server maps to its source's ontology, another simpler approach would be to introduce a new boolean constraint 'positional', which if specified, limits the response to positional annotation only.
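
A rough sketch of how the simpler approach could behave on the server side is given below. It is an assumption-laden illustration, not an existing DAS feature: the `positional` flag is the proposed (hypothetical) constraint, `select_features` and the `Feature` tuple are invented names, and non-positional features are recognised by missing start/end or by start == end == 0.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    """Hypothetical, minimal stand-in for a stored annotation."""
    id: str
    start: Optional[int]  # None or 0 (with end == 0) means non-positional
    end: Optional[int]


def select_features(features: List[Feature], start: int, end: int,
                    positional: bool = False) -> List[Feature]:
    """Answer a range-qualified feature request.

    Without the flag, behaviour follows the spec as discussed in this
    thread: every non-positional feature is returned alongside the
    positional features overlapping [start, end]. With positional=True
    (the proposed constraint), non-positional features are left out.
    """
    selected = []
    for f in features:
        non_positional = (f.start is None or f.end is None
                          or (f.start == 0 and f.end == 0))
        if non_positional:
            if not positional:
                selected.append(f)
        elif f.start <= end and f.end >= start:
            # Positional feature that overlaps the requested range.
            selected.append(f)
    return selected


# Example data (invented for illustration only).
store = [
    Feature("journal_ref", 0, 0),
    Feature("dbxref", None, None),
    Feature("domain_a", 10, 85),
    Feature("domain_b", 200, 260),
]
```

With this store, a request for residues 50 to 100 returns the two non-positional features plus `domain_a` by default, and only `domain_a` when `positional=True`, which avoids resending the non-positional header on every range request.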


Jim.

** but this immediately brings to mind a nasty potential gotcha: e.g. 'expression' in the context of a genome is positional, but is a non-positional feature in the context of a proteome. So terms will have to be fully qualified in the type constraint on a feature request.


--
-------------------------------------------------------------------
J. B. Procter  (JALVIEW/ENFIN)  Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764  http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.

_______________________________________________
DAS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/das
