On Sunday, 26 August 2018 at 05:55:47 UTC, Pjotr Prins wrote:
Artem wrote Sambamba as a student
https://github.com/biod/sambamba
and it is now running around the world in sequencing centers.
Many, many CPU hours and a resulting huge carbon footprint. The
large competing C++ samtools project has been trying for 8
years to catch up with an almost unchanged student project and
they are still slower in many cases.
[snip]
Note that Artem used the GC and only took GC out for critical
sections in parallel code. I don't buy these complaints about
GC.
The complaints about breaking code I don't see that much
either. Sambamba pretty much kept compiling over the years and
with LDC/LLVM latest we see a 20% performance increase. For free
(at least from our perspective). Kudos to LDC/LLVM efforts!!
This sounds very similar to my experiences with the tsv
utilities, on most of the same points (development simplicity,
comparative performance, GC use, LDC). Data processing apps may
well be a sweet spot. See my DConf talk for an overview
(https://github.com/eBay/tsv-utils/blob/master/docs/dconf2018.pdf).
Though not mentioned in the talk, I also haven't had any
significant issues with new compiler releases. This may be
related to the type of code being written. Regarding the GC: the
throughput-oriented nature of data processing tools like the tsv
utilities looks like a very good fit for the current GC.
Applications where low GC latency is needed may have different
results. It'd be great to hear an experience report from
development of an application where GC was used and low GC
latency was a priority.
--Jon