Hi all,

As you may know, a new revision of the DAS specification will be published soon. One of the features to be added is improved support for hierarchical features, and I'm looking for input on one detail of how this will be done.

The plan is to replace the <GROUP> structure with something similar to the DAS/2 approach: parent features have concise <PART> elements that identify other (separate) child features. Child features have <PARENT> elements to represent the reciprocal relationship. This means the group data no longer needs to be duplicated when shared by several features, and groups can themselves have start/endpoints:

  <FEATURE id="A1">
    <PART id="B1" />
    <PART id="B2" />
    ... start, end, notes and other verbose content ...
  </FEATURE>
  <FEATURE id="B1">
    <PARENT id="A1" />
    ... content ...
  </FEATURE>
  <FEATURE id="B2">
    <PARENT id="A1" />
    ... content ...
  </FEATURE>

Here, the parent and each child reference one another, so every link is recorded twice. However, the relationship can still be represented if only one feature links to the other:

  <FEATURE id="A1">
    <PART id="B1" />
    ...
  </FEATURE>
  <FEATURE id="B1">
    ...
  </FEATURE>
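
To illustrate that nothing is lost in the uni-directional form, here is a minimal Python sketch that rebuilds the child-to-parent mapping from <PART> elements alone once the whole response is available. The <FEATURES> wrapper element and the function name are assumptions made for the example, not part of the proposed spec:

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment; the <FEATURES> wrapper is an assumption.
XML = """
<FEATURES>
  <FEATURE id="A1">
    <PART id="B1" />
    <PART id="B2" />
  </FEATURE>
  <FEATURE id="B1" />
  <FEATURE id="B2" />
</FEATURES>
"""

def parents_from_parts(xml_text):
    """Return {child_id: [parent_id, ...]} using only <PART> elements."""
    parents = {}
    root = ET.fromstring(xml_text.strip())
    for feature in root.iter("FEATURE"):
        for part in feature.findall("PART"):
            # Each <PART> in a parent implies the reciprocal parent link.
            parents.setdefault(part.get("id"), []).append(feature.get("id"))
    return parents

child_to_parents = parents_from_parts(XML)
```

Note that this only works after the full document has been read, which is where the streaming argument comes in.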

Therefore the option exists to omit the <PARENT> element from the specification entirely. Over the last couple of years we have seen DAS sources become more and more dense, and browsers wishing to display larger regions. As a result, there is significant pressure to minimise the verbosity of the XML response (there are other changes in the upcoming spec to help with this). Whilst DAS/2's alternative content negotiation feature sidesteps the issue, DAS does not yet have this, and in any case I believe the fallback XML format should still be fit for purpose.

The counter-argument (i.e. the case for requiring both <PARENT> and <PART> elements) rests on the rendering efficiency benefits of streaming. If a client knows for sure that it has parsed all features that are related to each other, it can render them while it waits for the server to send the rest of the response. A client could potentially use this to offer a significant usability boost - a user's perception of the speed of an interface is greatly influenced by how quickly the display starts to render, rather than by the time it takes to complete. At the moment, however, no DAS clients do this: it is not possible with the current spec, and some clients such as Ensembl cannot because of the way the data is rendered. I am also not sure to what extent it would be used in future. For example, it could not be used where post-processing of the entire set of features is necessary (e.g. binning).
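
As a rough sketch of what such a streaming client might do (not a description of any existing client), the Python below reads features incrementally and hands over each hierarchy as soon as every feature it references has been seen. It assumes bi-directional references, a <FEATURES> wrapper element, and a hypothetical function name. With <PART> only, a child parsed before its parent would look the same as a standalone feature, so the client could not safely render it until the end of the response.

```python
import io
import xml.etree.ElementTree as ET

def stream_groups(source):
    """Yield sets of feature ids as soon as each hierarchy is complete."""
    seen = set()   # ids of features parsed so far
    links = {}     # feature id -> ids it names via <PART> or <PARENT>
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "FEATURE":
            continue
        fid = elem.get("id")
        seen.add(fid)
        links[fid] = {e.get("id") for e in elem if e.tag in ("PART", "PARENT")}
        elem.clear()  # free memory; the links are already recorded
        # Walk the links from this feature; stop if anything is still missing.
        group, stack, complete = set(), [fid], True
        while stack:
            cur = stack.pop()
            if cur in group:
                continue
            if cur not in seen:
                complete = False
                break
            group.add(cur)
            stack.extend(links[cur])
        if complete:
            yield group  # safe to render now

# Hypothetical response: B1 arrives before its parent, C1 is standalone.
DOC = b"""<FEATURES>
<FEATURE id="B1"><PARENT id="A1" /></FEATURE>
<FEATURE id="C1" />
<FEATURE id="A1"><PART id="B1" /><PART id="B2" /></FEATURE>
<FEATURE id="B2"><PARENT id="A1" /></FEATURE>
</FEATURES>"""

groups = list(stream_groups(io.BytesIO(DOC)))
```

In this example C1 can be rendered immediately, while B1 is held back because its <PARENT> element announces that A1 is still to come.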

So my question is: should the specification require bi-directional references (<PARENT> and <PART>), or uni-directional references (<PART> only)? Whichever approach is taken, replacing the <GROUP> structure will significantly reduce verbosity for groups with large numbers of child features, but do we want to reduce it further by removing <PARENT> elements, at the cost of the potential for "streaming"?

Apologies for the long and technical post.
Andy
_______________________________________________
DAS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/das
