Re: Announcement and Request: Typesafe Coordinate Systems for High-Throughput Sequencing Applications

2021-09-01 Thread Arne Ludwig via Digitalmars-d-announce
On Wednesday, 1 September 2021 at 05:36:53 UTC, James Blachly 
wrote:
In another post, I've just announced our D-based high 
throughput sequencing library, dhtslib.


One feature that is, AFAIK, novel in the field is leveraging 
the compiler's type system to enforce correctness regarding 
different genome/reference sequence coordinate systems. 
Clearly, the encoding of domain specific knowledge in a 
language's type system is nothing new, but it is surprising 
that this has not been done before in bioinformatics, and it is 
an idea that IMO is long overdue given the trainwreck of 
different coordinate systems in our field.
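
To make the idea concrete, here is a minimal sketch of how a coordinate's basis can be carried in its type. The names `Basis`, `Coord`, `convertTo`, `ZeroBased`, `OneBased` and `oneBasedOnly` are hypothetical and chosen for illustration only; this is not dhtslib's actual API.

```d
import std.stdio : writeln;

/// Hypothetical names for illustration; not dhtslib's actual API.
enum Basis { zero, one }

/// A position whose basis (zero- or one-based) is part of its type.
struct Coord(Basis b)
{
    long pos;

    /// Convert to another basis; the shift is a compile-time constant.
    Coord!target convertTo(Basis target)() const
    {
        enum long shift = cast(long) target - cast(long) b;
        return Coord!target(pos + shift);
    }
}

alias ZeroBased = Coord!(Basis.zero);
alias OneBased = Coord!(Basis.one);

/// Accepts one-based positions only.
long oneBasedOnly(OneBased c)
{
    return c.pos;
}

void main()
{
    auto z = ZeroBased(99);
    auto o = z.convertTo!(Basis.one)();
    writeln(oneBasedOnly(o));

    // Passing a zero-based position where a one-based one is expected
    // is rejected by the compiler instead of failing silently at run time.
    static assert(!__traits(compiles, oneBasedOnly(z)));
    static assert(__traits(compiles, oneBasedOnly(o)));
}
```

With this layout, mixing up bases becomes a type error, and every conversion is an explicit, visible call.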


You can find dhtslib's develop branch, with Typesafe 
Coordinates merged and ready to use, here:


https://github.com/blachlylab/dhtslib/


**Now the request:**
We've drafted a manuscript describing Typesafe Coordinates as a 
sort of low-key endorsement of the D language and our library 
package `dhtslib`. You can find the manuscript here:


https://github.com/blachlylab/typesafe-coordinates/

We would be very grateful to those of you who would take the 
time to read the manuscript and post comments (publicly or 
privately), _especially if we have made any incorrect 
statements_ or our language regarding type systems is awkward 
or nonstandard.


We did praise D, and gently criticized Rust and OCaml* somewhat 
as it appeared to me that they lacked the features required to 
implement Typesafe Coordinate Systems in as ergonomic a way as 
we could in D. However, being a true novice at both of these 
other languages there is the possibility that I've missed 
something significant, and that the Rust and OCaml 
implementations could be retooled to match the D 
implementation. I'd still be glad to hear it if that's the case.


I plan to make a few minor cleanups and submit this to a 
preprint server as well as a scientific journal in the next 
week or so.


Kind regards

James S Blachly, MD
The Ohio State University


* as a side note, I actually find the OCaml code quite 
attractive in its terseness: `let j = cl_interval_of_ho 
(ob_interval_of_zb i)`


Hi James and Charles,

I am happy to hear of your latest idea of creating type-safe 
coordinate systems. It's a great idea!


After reading the code on GitHub, I have only one major remark: 
IMHO, it would be great to separate the novel coordinate systems 
from any `htslib` dependencies ([see lines 
47-50](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L47-L50)), since only auxiliary functions use both the coordinate systems and `htslib`. The greater goal I have in mind is to provide the coordinate systems in a separate DUB sub-package (e.g. `dhtslib:coordinates`) that requires only a D compiler. That would make integration into existing projects that do not need `htslib` much easier.


Also, I have a short list of minor, technical remarks:

1. The return type in [line 
114](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L114) has a typo: there is an extra 's'.
2. The array of identifiers `CoordSystemLabels` in [line 
203](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L203) is a bit unsafe and not strictly required for two reasons:
   1. It can be generated by the compiler using `enum 
   CoordSystemLabels = __traits(allMembers, CoordSystem);`.
   2. As far as I can tell, its only application is in [line 
   376](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L376). The same result can be achieved safely using `cs.stringof.split('.')[$ - 1]` or, without `std.array.split`, using `cs.stringof[CoordSystem.stringof.length + 1 .. $]`.
3. The function `unionImpl` in [line 
326](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L326) actually computes the convex hull of the two intervals which should be noted in the doc comment for completeness' sake.
4. I noticed that you use operator overloading for union and 
intersection of `Interval`s. You could also add overloads for the 
`offset` function in both `Interval` and `Coordinate` with `auto 
opBinary(string op, T)(T off) if ((op == "+" || op == "-") && 
isIntegral!T)` and `auto opBinaryRight(string op, T)(T off) if 
((op == "+" || op == "-") && isIntegral!T)`.
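
A minimal sketch of remark 4, assuming a hypothetical stand-in type `Pos` with an `offset` method (the real `Coordinate` and `Interval` types will differ). Note that `op` is a `string`, so it is compared against `"+"` rather than `'+'`. As a design choice of this sketch only, `opBinaryRight` is limited to `+`, because `n - pos` has no obvious meaning for a position.

```d
import std.traits : isIntegral;

/// Hypothetical stand-in for a coordinate type with an `offset` method.
struct Pos
{
    long pos;

    /// Shift the position by a signed amount.
    Pos offset(long off) const
    {
        return Pos(pos + off);
    }

    /// Enables `p + n` and `p - n` by forwarding to `offset`.
    Pos opBinary(string op, T)(T off) const
        if ((op == "+" || op == "-") && isIntegral!T)
    {
        static if (op == "+")
            return offset(cast(long) off);
        else
            return offset(-cast(long) off);
    }

    /// Enables `n + p`; subtraction from an integer is deliberately left out.
    Pos opBinaryRight(string op, T)(T off) const
        if (op == "+" && isIntegral!T)
    {
        return offset(cast(long) off);
    }
}

void main()
{
    auto p = Pos(10);
    assert((p + 5).pos == 15);
    assert((p - 3).pos == 7);
    assert((2 + p).pos == 12);

    // `2 - p` matches no overload and is rejected at compile time.
    static assert(!__traits(compiles, 2 - p));
}
```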


I enjoyed reading the manuscript. It highlights the issue clearly 
and presents the solution without getting lost in details. 
Ignoring typos at this stage, I have no remarks on it – keep 
going!



Cheers!

-- Arne


Re: SAOC 2021 Projects Summarized

2021-09-01 Thread Ahmet Sait via Digitalmars-d-announce

On Wednesday, 1 September 2021 at 06:44:53 UTC, Mike Parker wrote:
On Wednesday, 1 September 2021 at 04:56:28 UTC, Ali Çehreli 
wrote:

On 8/30/21 5:47 AM, Mike Parker wrote:

> Ahmet Sait KoC’ak

Being a fellow Turk, I am curious why his last name is 
spelled that way. Unless it was especially requested by him, I 
would use the following obviously correct spelling:


 Ahmet Sait Koçak

Ali


That's how it was submitted in the application. I simply 
copy-pasted.


Must have been an encoding issue with the mail then. I don't mind 
though.


Re: SAOC 2021 Projects Summarized

2021-09-01 Thread Mike Parker via Digitalmars-d-announce

On Wednesday, 1 September 2021 at 10:32:19 UTC, Ahmet Sait wrote:



Must have been an encoding issue with the mail then. I don't 
mind though.


I updated the blog as soon as I saw Ali's post.


Re: SAOC 2021 Projects Summarized

2021-09-01 Thread Paul Backus via Digitalmars-d-announce

On Monday, 30 August 2021 at 16:12:57 UTC, Adam D Ruppe wrote:
On Monday, 30 August 2021 at 16:03:29 UTC, Guillaume Piolat 
wrote:
Hyped by ProtoObject, this is our hope for a nothrow @nogc 
.destroy eventually!


This fails today only because of the rt_finalize hook working 
through void*. If you cut that out...


---
class Foo {
    ~this() @nogc nothrow {}
}

void main() @nogc nothrow {
    scope auto foo = new Foo();
    foo.__xdtor();
}
---

this works today.


I thought the problem with this was that destructors aren't 
virtual, so if you write something like this:


---
class Foo {
    ~this() { }
}

class Bar : Foo {
    ~this() { }
}

void main() {
    Foo foo = new Bar();
    foo.__xdtor;
}
---

...then you end up calling Foo's destructor, but not Bar's. 
That's why rt_finalize uses TypeInfo to look up the destructor 
for the object's runtime type.
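
A small sketch of the difference described above, reusing the `Foo` and `Bar` names from the example. The counters `fooDtorCalls` and `barDtorCalls` are hypothetical additions for observing which destructors run. It assumes, as stated above, that `__xdtor` is resolved against the static type while `destroy` goes through the runtime type.

```d
int fooDtorCalls; // how often Foo's destructor has run
int barDtorCalls; // how often Bar's destructor has run

class Foo
{
    ~this() { ++fooDtorCalls; }
}

class Bar : Foo
{
    ~this() { ++barDtorCalls; }
}

void main()
{
    // Direct call: looked up on the static type Foo, so Bar's
    // destructor is skipped.
    Foo a = new Bar();
    a.__xdtor();
    assert(fooDtorCalls == 1);
    assert(barDtorCalls == 0);

    // destroy() finalizes by runtime type: Bar's destructor runs,
    // followed by Foo's.
    Foo b = new Bar();
    destroy(b);
    assert(barDtorCalls == 1);
    assert(fooDtorCalls == 2);
}
```

The counters are plain integers rather than arrays so that the destructors never allocate, which matters if the GC finalizes an object later.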


Re: SAOC 2021 Projects Summarized

2021-09-01 Thread user1234 via Digitalmars-d-announce

On Wednesday, 1 September 2021 at 18:20:59 UTC, Paul Backus wrote:

On Monday, 30 August 2021 at 16:12:57 UTC, Adam D Ruppe wrote:
On Monday, 30 August 2021 at 16:03:29 UTC, Guillaume Piolat 
wrote:
Hyped by ProtoObject, this is our hope for a nothrow @nogc 
.destroy eventually!


This fails today only because of the rt_finalize hook working 
through void*. If you cut that out...


```
class Foo {
    ~this() @nogc nothrow {}
}

void main() @nogc nothrow {
    scope auto foo = new Foo();
    foo.__xdtor();
}
```

this works today.


I thought the problem with this was that destructors aren't 
virtual, so if you write something like this:

[...]
...then you end up calling Foo's destructor, but not Bar's. 
That's why rt_finalize uses TypeInfo to look up the destructor 
for the object's runtime type.


we can use custom destructors:

```d
class Foo {
    ~this() nothrow @nogc {}
    void destroy() nothrow @nogc
    {
        __xdtor();
    }
}

class Bar : Foo {
    override void destroy()
    {
        super.destroy();
    }
}

void main() {
    Foo foo = new Bar();
    foo.destroy();
}
```

I don't know why destructors are not virtual. Being non-virtual 
makes sense for constructors, as the correct instance size is 
required, but not for destructors.






Re: SAOC 2021 Projects Summarized

2021-09-01 Thread Adam Ruppe via Digitalmars-d-announce

On Wednesday, 1 September 2021 at 22:23:59 UTC, user1234 wrote:

I don't know why destructors are not virtual.


https://dlang.org/spec/class.html#destructors

"There can be only one destructor per class, the destructor does 
not have any parameters, and has no attributes. It is always 
virtual. "



I guess it is virtually virtual due to the rt_finalize 
implementation papering it over.


But you can write a destroy function to do this too by looking up 
the base class xdtor as well (I'm pretty sure anyway).


My point is really just that ProtoObject is almost certain to 
have exactly the same destructor situation as Object.


Re: Announcement and Request: Typesafe Coordinate Systems for High-Throughput Sequencing Applications

2021-09-01 Thread James Blachly via Digitalmars-d-announce

On 9/1/21 5:01 AM, Arne Ludwig wrote:
I am happy to hear of your latest idea of creating type-safe coordinate 
systems. It's a great idea!


After reading the code on GitHub, I have only one major remark: IMHO, it 
would be great to separate the novel coordinate systems from any 
`htslib` dependencies ([see lines 
47-50](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L47-L50)), 
since only auxiliary functions use both the coordinate systems and 
`htslib`. The greater goal I have in mind is to 
provide the coordinate systems in a separate DUB sub-package (e.g. 
`dhtslib:coordinates`) that requires only a D compiler. That would make 
integration into existing projects that do not need `htslib` much easier.


This is an absolutely **outstanding** idea. Those imports were only to 
reuse an htslib `chr:X-Y` string parsing function, but we can trivially 
rewrite this in native D to enable sub-package independence!



Also, I have a short list of minor, technical remarks:

1. The returned type in [line 
114](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L114) 
has a typo, there is an additional 's'.


Ahh, the curse of templates. Without 100% test coverage these things 
which would cause failure to compile in non-template code seem to always 
sneak in. Thank you so much.


2. The array of identifiers `CoordSystemLabels` in [line 
203](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L203) 
is a bit unsafe and not strictly required for two reasons:


A very excellent suggestion. I am still a metaprogramming novice.
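
For reference, a minimal sketch of letting the compiler produce the labels. The enum member names and the helper `labelOf` are assumptions made for illustration. The sketch wraps `__traits(allMembers, ...)` in an array literal to get a `string[]`, and it swaps in `std.conv.to` for the `stringof` slicing suggested above.

```d
import std.conv : to;
import std.stdio : writeln;

/// Hypothetical enum; member names chosen for illustration.
enum CoordSystem { zbho, zbc, obho, obc }

/// Member names collected by the compiler, so the list cannot
/// drift out of sync with the enum.
enum string[] coordSystemLabels = [__traits(allMembers, CoordSystem)];

static assert(coordSystemLabels == ["zbho", "zbc", "obho", "obc"]);

/// Label of a compile-time enum value (hypothetical helper).
template labelOf(CoordSystem cs)
{
    enum string labelOf = cs.to!string;
}

static assert(labelOf!(CoordSystem.obc) == "obc");

void main()
{
    writeln(coordSystemLabels);
    writeln(labelOf!(CoordSystem.zbho));
}
```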

3. The function `unionImpl` in [line 
326](https://github.com/blachlylab/dhtslib/blob/e3b5af14e9eefa54bcc27bc0fcc9066dc3a4ea54/source/dhtslib/coordinates.d#L326) 
actually computes the convex hull of the two intervals which should be 
noted in the doc comment for completeness' sake.


Yes, we had some internal debate about the appropriate result of both 
union and intersect operations when the intervals are non-overlapping and 
the return type is not an array. We will leave it as is and document it as 
the convex hull in this case.
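
A minimal sketch of why the distinction matters, using a hypothetical half-open `Span` type and the helper names `hull` and `overlaps` (none of these are dhtslib names): for disjoint inputs, the convex hull also covers the gap between them.

```d
import std.algorithm.comparison : max, min;

/// Hypothetical zero-based, half-open interval [start, end).
struct Span
{
    long start;
    long end;
}

/// Smallest single interval covering both inputs (their convex hull).
Span hull(Span a, Span b)
{
    return Span(min(a.start, b.start), max(a.end, b.end));
}

/// True if the two half-open intervals share at least one position.
bool overlaps(Span a, Span b)
{
    return a.start < b.end && b.start < a.end;
}

void main()
{
    auto a = Span(0, 10);
    auto b = Span(20, 30);
    assert(!overlaps(a, b));

    // The hull also covers 10 .. 20, which belongs to neither input.
    assert(hull(a, b) == Span(0, 30));

    // For overlapping inputs, the hull and the set union coincide.
    assert(overlaps(Span(0, 10), Span(5, 15)));
    assert(hull(Span(0, 10), Span(5, 15)) == Span(0, 15));
}
```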


4. I noticed that you use operator overloading for union and 
intersection of `Interval`s. You could also add overloads for the `offset` 
function in both `Interval` and `Coordinate` with `auto opBinary(string 
op, T)(T off) if ((op == "+" || op == "-") && isIntegral!T)` and `auto 
opBinaryRight(string op, T)(T off) if ((op == "+" || op == "-") && 
isIntegral!T)`.


Very nice. I do miss operator overloading in some of the other languages 
I explored recently.


I enjoyed reading the manuscript. It highlights the issue clearly and 
presents the solution without getting lost in details. Ignoring typos at 
this stage, I have no remarks on it – keep going!


Thanks again for this critical review. As you know we are really pleased 
with how D has accelerated our science and wish to share it with the world.


James