On Wednesday, 1 September 2021 at 05:36:53 UTC, James Blachly wrote:
In another post, I've just announced our D-based high throughput sequencing library, dhtslib.

One feature that is, AFAIK, novel in the field is leveraging the compiler's type system to enforce correctness regarding different genome/reference sequence coordinate systems. Clearly, the encoding of domain specific knowledge in a language's type system is nothing new, but it is surprising that this has not been done before in bioinformatics, and it is an idea that IMO is long overdue given the trainwreck of different coordinate systems in our field.

You can find dhtslib's develop branch, with Typesafe Coordinates merged and ready to use, here:

https://github.com/blachlylab/dhtslib/


**Now the request:**
We've drafted a manuscript describing Typesafe Coordinates as a sort of low-key endorsement of the D language and our library package `dhtslib`. You can find the manuscript here:

https://github.com/blachlylab/typesafe-coordinates/

We would be very grateful to those of you who would take the time to read the manuscript and post comments (publicly or privately), _especially if we have made any incorrect statements_ or our language regarding type systems is awkward or nonstandard.

We did praise D, and gently criticized Rust and OCaml* somewhat as it appeared to me that they lacked the features required to implement Typesafe Coordinate Systems in as ergonomic a way as we could in D. However, being a true novice at both of these other languages there is the possibility that I've missed something significant, and that the Rust and OCaml implementations could be retooled to match the D implementation. I'd still be glad to hear it if that's the case.

I plan to make a few minor cleanups and submit this to a preprint server as well as a scientific journal in the next week or so.

Kind regards

James S Blachly, MD
The Ohio State University


* as a side note, I actually find the OCaml code quite attractive in its terseness: `let j = cl_interval_of_ho (ob_interval_of_zb i)`

Hi James and Charles,

I am happy to hear of your latest idea of creating type-safe coordinate systems. It's a great idea!

After reading the code on GitHub, I have only one major remark: IMHO, it would be great to separate the new coordinate systems from any `htslib` dependencies ([see lines 47-50](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L47-L50)), since only auxiliary functions use both the coordinate systems and `htslib`. The greater goal I have in mind is to provide the coordinate systems in a separate DUB sub-package (e.g. `dhtslib:coordinates`) that requires only a D compiler. That would make integration into existing projects that do not need `htslib` much easier.

Also, I have a short list of minor, technical remarks:

1. The returned type in [line 114](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L114) has a typo: there is an additional 's'.
2. The array of identifiers `CoordSystemLabels` in [line 203](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L203) is a bit unsafe and not strictly required, for two reasons:
   1. It can be generated by the compiler using `enum CoordSystemLabels = __traits(allMembers, CoordSystem);`.
   2. As far as I can tell, its only application is in [line 376](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L376). The same result can be achieved safely using `cs.stringof.split('.')[$ - 1]` or, without `std.array.split`, using `cs.stringof[CoordSystem.stringof.length + 1 .. $]`.
3. The function `unionImpl` in [line 326](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L326) actually computes the convex hull of the two intervals, which should be noted in the doc comment for completeness' sake.
4. I have noted that you use operator overloading for union and intersection of `Interval`s. You may also add overloads for the `offset` function in both `Interval` and `Coordinate` with `auto opBinary(string op, T)(T off) if ((op == "+" || op == "-") && isIntegral!T)` and `auto opBinaryRight(string op, T)(T off) if ((op == "+" || op == "-") && isIntegral!T)`.
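
To make the remarks about the label array and the `offset` operators concrete, here is a minimal sketch. It does not reproduce dhtslib's code: the `CoordSystem` member names, the simplified `Coordinate` struct, and its `offset` method are assumptions made purely for illustration. The label list uses the array-literal form `[__traits(allMembers, ...)]`, which I know to yield a `string[]`.

```d
import std.traits : isIntegral;

// Assumed stand-in for dhtslib's CoordSystem enum; the member names are hypothetical.
enum CoordSystem { zbho, zbc, obho, obc }

// Labels generated by the compiler, so they cannot drift out of sync with the enum.
enum string[] coordSystemLabels = [__traits(allMembers, CoordSystem)];

/// Hypothetical, simplified coordinate type (not dhtslib's actual definition).
struct Coordinate(CoordSystem cs)
{
    long pos;

    /// Shift the coordinate by a signed amount; stand-in for the library's `offset`.
    Coordinate offset(T)(T off) const if (isIntegral!T)
    {
        return Coordinate(pos + cast(long) off);
    }

    /// Enables `coord + n` and `coord - n`.
    Coordinate opBinary(string op, T)(T off) const
        if ((op == "+" || op == "-") && isIntegral!T)
    {
        static if (op == "+")
            return offset(off);
        else
            return offset(-cast(long) off);
    }

    /// Enables `n + coord`. This sketch leaves out `n - coord` on purpose,
    /// because it is not obvious what that expression should mean.
    Coordinate opBinaryRight(string op, T)(T off) const
        if (op == "+" && isIntegral!T)
    {
        return offset(off);
    }
}

unittest
{
    static assert(coordSystemLabels == ["zbho", "zbc", "obho", "obc"]);

    auto c = Coordinate!(CoordSystem.zbho)(100);
    assert((c + 5).pos == 105);
    assert((c - 5).pos == 95);
    assert((5 + c).pos == 105);
}
```

Because the result keeps the type `Coordinate!cs`, an offset never silently changes the coordinate system, which is the point of the typesafe design.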

I enjoyed reading the manuscript. It highlights the issue clearly and presents the solution without getting lost in details. Ignoring typos at this stage, I have no remarks on it – keep going!


Cheers!

-- Arne
