I've released a new tool for weighted random sampling of tabular
data files: tsv-sample. It's one of several tools recently added
to the TSV toolkit I released last year. These tools are
especially useful when data files are too large to comfortably
read entirely into memory in R and similar applications.
I'll publish a broader announcement covering the other tool updates
in the next few weeks, once I finish some performance benchmarks.
However, weighted reservoir sampling algorithms are interesting
enough that I thought they might warrant a separate announcement.
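For anyone curious about the technique: one well-known single-pass approach is the Efraimidis-Spirakis "A-Res" algorithm. Each item with weight w gets a random key u^(1/w), where u is uniform on [0, 1), and the k items with the largest keys are kept in a min-heap. Memory stays O(k) no matter how large the input is, which is what makes it a good fit for files too big to load. Below is a minimal Python sketch for illustration only. It is not the tsv-sample implementation (that is written in D), and the function name `weighted_reservoir_sample` and its (item, weight) input format are hypothetical.

```python
import heapq
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def weighted_reservoir_sample(
    items: Iterable[Tuple[T, float]],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Sketch of Efraimidis-Spirakis A-Res weighted sampling without replacement.

    Hypothetical helper: consumes (item, weight) pairs in one pass and keeps
    at most k entries in memory.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    # Min-heap of (key, sequence number, item). The sequence number breaks
    # key ties so the items themselves never need to be comparable.
    heap: List[Tuple[float, int, T]] = []
    for seq, (item, weight) in enumerate(items):
        if weight <= 0:
            continue  # in this sketch, non-positive weights are never selected
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, seq, item))
        elif key > heap[0][0]:
            # New key beats the smallest key in the reservoir: replace it.
            heapq.heapreplace(heap, (key, seq, item))
    # Largest keys first, i.e. the order in which items would be "drawn".
    return [item for _, _, item in sorted(heap, reverse=True)]


if __name__ == "__main__":
    rows = [("a", 1.0), ("b", 5.0), ("c", 0.0), ("d", 2.0)]
    print(weighted_reservoir_sample(rows, 2, random.Random(42)))
```

Because `items` can be any iterable, the same function works on a generator that parses a file line by line, so only the k-entry reservoir is ever held in memory.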
Repo: https://github.com/eBay/tsv-utils-dlang
tsv-sample code:
https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/src/tsv-sample.d
--Jon