Hi Thomas,

Uh-oh, URIs... :)

For coordinate systems, I think the definitions of the component pieces are 
fairly well described. It is a pity that the species name is not given its own 
parameter though. The sources documentation then says: "The uri (required) 
attribute is a globally unique identifier for the coordinate system. It should 
be a fully resolvable URL providing more information about the coordinate 
system." This could be misleading: although the URIs _are_ resolvable, the 
content is not particularly machine-friendly.

I am not willing to change the syntax of the coordinate system URIs out in the 
wild, but if you need the content returned to be machine readable we could 
replace the HTML content with an XML+XSLT combination. That is, 
"http://www.dasregistry.org/dasregistry/coordsys/CS_DS6" would look more like 
one of the entries in "http://www.dasregistry.org/das/coordinatesystem" to a 
machine, and the same as it currently does to a human. From a practical 
perspective though, if a client parses the XML elements from the registry's 
/das/coordinatesystem output, it can identify all the coordinate systems by 
both URI and text description. Changing the output wouldn't materially change 
what a client needs to do given either a URI or a comma-separated string. It is 
always going to need to run an HTTP GET and do some parsing of coordinatesystem 
XML. But it is certainly true that having the URI resolve to the XML is a more 
elegant and simple-to-explain system, and in any case the spec makes no mention 
of the fact that a client can even obtain the XML for all the coordinate 
systems together.
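
To make the client-side lookup concrete, here is a minimal sketch in Python of 
indexing COORDINATES elements by both URI and comma-separated description. The 
sample element is the one quoted further down this thread. The 
DASCOORDINATESYSTEM wrapper element and the function name are assumptions made 
for illustration, not something the spec text above confirms, and a real client 
would fetch the XML from the registry rather than use an embedded string.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the COORDINATES element quoted later in this thread.
# The DASCOORDINATESYSTEM wrapper is an assumed root element; the code below
# does not depend on its name.
SAMPLE = """<DASCOORDINATESYSTEM>
  <COORDINATES uri="http://www.dasregistry.org/dasregistry/coordsys/CS_DS40"
               taxid="9606" source="Chromosome" authority="NCBI"
               test_range="1:1,1000"
               version="36">NCBI_36,Chromosome,Homo sapiens</COORDINATES>
</DASCOORDINATESYSTEM>"""


def index_coordinate_systems(xml_text):
    """Return (by_uri, by_description) lookups over COORDINATES elements.

    Hypothetical helper: both keys are treated as opaque strings.
    """
    by_uri, by_description = {}, {}
    for elem in ET.fromstring(xml_text).iter("COORDINATES"):
        record = dict(elem.attrib)
        record["description"] = (elem.text or "").strip()
        uri = record.get("uri")
        if uri:  # skip entries without the (required) uri attribute
            by_uri[uri] = record
        if record["description"]:
            by_description[record["description"]] = record
    return by_uri, by_description


by_uri, by_description = index_coordinate_systems(SAMPLE)
```

With both lookups in hand, a client can go from the URI used in an alignment 
query to the comma-separated string returned in the alignment content, or the 
other way round, without the URI itself needing to resolve to anything 
machine readable.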

Throughout writing the 1.6 spec, URIs have always been a big problem to 
describe, mainly because there are lots of complications for DAS (source vs 
version, server vs registry namespace). URIs simply weren't given a lot of 
thought and explanation from the start, and it's too late to change them. In 
1.6 things are a little better in that source URIs have been formalised and are 
more useful, without breaking the assumptions clients currently make. But I 
have changed the wording describing URIs a few times. I did have a large 
section describing URIs in general, the rules for formulating them, relative 
URI references etc, but in the latest drafts this is simplified so as not to 
confuse people (as much). It only really refers to source URIs rather than 
coordinate systems though, so I'm happy to add something. Could you please 
provide the wording and examples? Nobody ever seems to want to :)

With regards to the alignment command specifically, I wanted to use the URI for 
both the query and the content as they are more robust, but there was some 
practical reason for the existing servers that prevented us from doing so. 
Perhaps Rob or Andreas can comment? Again, technically it doesn't matter to the 
client if it has access to the coordinates XML, but it does make the spec not 
'feel right' IMO. Also, if coordinate system descriptions (i.e. the 
comma-separated string) were to change over time, servers would drift and this 
would cause big problems for the client, but in truth plenty of stuff would 
break if that were to happen.

Cheers,
Andy

On 11 Aug 2010, at 20:14, Thomas Down wrote:

> My reading of the current spec is a bit vague about how we should refer to
> coordinate systems.
> 
> There seem to be three ways to represent a CS:
> 
>              - Comma-separated list, e.g. NCBI_36,Chromosome,Homo sapiens
>              - URI, e.g.
> http://www.dasregistry.org/dasregistry/coordsys/CS_DS40
>              - XML, e.g.:
> 
>                                <COORDINATES uri="
> http://www.dasregistry.org/dasregistry/coordsys/CS_DS40" taxid="9606"
> source="Chromosome" authority="NCBI" test_range="1:1,1000"
> version="36">NCBI_36,Chromosome,Homo sapiens</COORDINATES>
> 
> The XML representation seems to be the most complete.
> 
> The URIs don't really get discussed much in the spec.  Should they resolve
> to anything in particular?  Or should they just be treated as opaque
> strings?  The example I've given resolves to an HTML document with a
> Vitruvian Man icon and some human-readable details, but probably isn't going
> to be any help to a client.
> 
> If you restrict yourself to single-genome DAS (sequence, features, etc.),
> this all works out fine -- the only interaction you need with the coordinate
> system infrastructure is to filter out suitable sources from a registry, and
> in that case you can either filter on the XML COORDINATES elements -- which
> is fairly straightforward -- or you can ask the registry to filter for you
> (using a data model which is a reasonably close match to the XML).
> 
> However, working with coordinate systems seems to be pretty much essential
> once you start working with alignments, and this is where things start to
> get complex.
> 
> The returned alignment XML defines the CS of each sequence in the alignment
> using the comma-separated form.  My assumption is that you're meant to treat
> this as an opaque string and correlate it with data from a registry, but
> this isn't 100% clear.
> 
> On the other hand, if you want to specify a coordinate system in the
> alignment QUERY, you're supposed to provide a URI.  It's not at all clear to
> me what a server is supposed to be doing with this.  Again, opaque string?
> 
> Is it too late to ask if there's any chance of rationalizing this (and maybe
> providing a few concrete examples in the spec) before 1.6-final?
> 
>             Thomas.
> _______________________________________________
> DAS mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/das

