On Thu, Aug 12, 2010 at 2:15 PM, Javier Herrero <[email protected]> wrote:
> Hi Thomas
>
> > Is there still any interest in this on the Ensembl side? It's something
> > I'm going to be needing soon, too (my current chain-file-based server
> > doesn't handle all the cases I'm interested in).
>
> I guess the interest must come from "the other side". I am quite keen on
> providing alignments through DAS if people and/or DAS clients will use them
> and if that is not too heavy for our servers.

Well, I'm very keen to get comparative data into Dalliance (http://www.biodalliance.org/human/ncbi36/, if you haven't seen it), and an ensembl-compara DAS server would be by far the best way to do that.

> You can imagine that things can
> go horribly wrong if one asked all 33-way EPO alignments on a chromosome at
> once. This can probably be controlled in the server.

That's an interesting general question. Historically, DAS has tended to trust clients to request "sensible" amounts of data (although personally, I'd like to see a richer way of hinting to clients what "sensible" might mean in a given context). You could just forbid fetching alignments over 1Mb or something.

Having said that, I actually think there *are* legitimate reasons for fetching a whole chromosome's worth of alignments. Firstly, for clients that want to do something other than pure data visualisation. Secondly, for a client which wants to show a "karyotype"-type view, with syntenic blocks labeled, rather than a very detailed base-level alignment. The first can probably only be addressed by chunky servers and/or responsible usage patterns, but the second could be handled quite nicely with a very small extension to the current DAS protocol.

The current format encourages you to represent the high-level structure of the alignment using BLOCK elements, then fill in the fine base-level structure with CIGAR strings. Given a server that follows this pattern, all that's needed is a flag to omit the CIGARs, and you'd have pretty much perfect data for use in a synteny browser.
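
As a rough illustration of the two server-side options discussed above (a cap on how much base-level alignment one request may return, and a flag that drops the CIGARs so only the BLOCK-level structure comes back), here is a minimal Python sketch. Everything in it is hypothetical: no DAS spec defines such a flag, and the `omit_cigar` parameter, the 1 Mb limit and the simplified `Block`/`Segment` classes are illustrative stand-ins, not the real dasalignment XML schema.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Sequence, Tuple

# Hypothetical cap on base-level alignment per request
# (the email only suggests ">1Mb or something").
MAX_DETAILED_SPAN = 1_000_000


@dataclass(frozen=True)
class Segment:
    """One aligned stretch of one sequence in a block (simplified, not the DAS schema)."""
    object_id: str                # e.g. a chromosome name such as "22"
    start: int
    end: int
    strand: str
    cigar: Optional[str] = None   # base-level detail; optional, as in the DAS alignment format


@dataclass(frozen=True)
class Block:
    """High-level alignment structure: a numbered group of segments."""
    order: int
    segments: Tuple[Segment, ...]


def prepare_response(blocks: Sequence[Block], query_start: int, query_end: int,
                     omit_cigar: bool = False) -> List[Block]:
    """Strip CIGARs for a blocks-only (synteny) view, or enforce the size cap.

    `omit_cigar` is a hypothetical request flag, not part of any DAS spec.
    """
    span = query_end - query_start + 1
    if omit_cigar:
        # Keep block/segment coordinates; drop the base-level CIGAR strings.
        return [
            Block(b.order, tuple(replace(s, cigar=None) for s in b.segments))
            for b in blocks
        ]
    if span > MAX_DETAILED_SPAN:
        raise ValueError(
            f"{span} bp is too large for a base-level alignment; "
            "retry with omit_cigar=True for a blocks-only response"
        )
    return list(blocks)
```

In this sketch a whole-chromosome request succeeds only in blocks-only mode, which matches the "karyotype"-style use case, while small regions still get full CIGAR detail.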
Given that the CIGAR is already optional, such a flag ought to be pretty painless to add.

> Another question is whether it is easy to fit genomic alignments into the
> current dasalignment structure. I am not sure how to interpret things like
> dbAccessionId, objectVersion, dbSource, etc for a genomic alignment. In other
> words, should protein and genomic alignments share the same query and
> response?

The response format actually fits pretty well, as far as I can tell. To my mind:

  assembly name/version == dbSource (although this is kind of redundant with the coordinate system...)
  chromosome name/number == objectVersion

Concrete example (NB: experimental server, may move, change or disappear!):

view-source: http://www.derkholm.net:8080/das/hg18ToHg19/alignment?query=22

Because of the block/segment approach, it would be easy to generalize this to >2-way alignments. There may be a few minor additions that are worthwhile, but overall I'm fairly certain this will work okay. Right now, I'm much more concerned about the query format.

Thomas.

_______________________________________________
DAS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/das
