It's not quite a year since the open-sourcing of eBay's tsv
utilities. Since then there have been a number of additions and
updates, and the tools form a more complete package. The tools
assist with manipulation of tabular data files common in machine
learning and data mining environments. They work alongside
traditional Unix command line tools like 'cut' and 'sort'. They
also fit well with data mining and stats packages like R and
Pandas.
The tools include filtering, slicing, joins and other
manipulation, sampling, and statistical calculations. If you find
yourself working with large data files from a Unix shell, you may
like these tools.
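
For readers unfamiliar with this kind of work, here is a rough Python sketch of what "filtering" and "slicing" a tab-separated file mean. It is an illustration only: the function name `filter_and_select`, its parameters, and the sample data are all made up for this example, and it does not reflect the command line interface or the implementation of the tsv utilities.

```python
def filter_and_select(lines, filter_field, min_value, keep_fields):
    """Yield selected fields of rows whose numeric filter field is >= min_value.

    Hypothetical helper for illustration. Field numbers are 1-based and
    the first line is treated as a header that is always passed through.
    """
    rows = (line.rstrip("\n").split("\t") for line in lines)
    header = next(rows)
    yield [header[i - 1] for i in keep_fields]
    for row in rows:
        # Filtering: keep the row only if the chosen field passes the test.
        if float(row[filter_field - 1]) >= min_value:
            # Slicing: emit only the requested columns.
            yield [row[i - 1] for i in keep_fields]


# Made-up sample data standing in for a TSV file.
sample = [
    "name\tcolor\tweight\n",
    "apple\tred\t150\n",
    "grape\tgreen\t5\n",
    "melon\tgreen\t1200\n",
]

for fields in filter_and_select(sample, filter_field=3, min_value=100, keep_fields=[1, 3]):
    print("\t".join(fields))
```

A line-at-a-time generator like this keeps memory use flat on large files, though an interpreted loop of this sort is exactly where speed becomes a concern.
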
Speed matters when processing large data files, and these tools
are fast. I've published new benchmarks comparing the tools to
similar tools written in several native compiled programming
languages. eBay's tsv utilities are the fastest on five of the six
benchmarks run, generally by significant margins. It's a good
result for the D programming language. The benchmarks may be of
interest regardless of your interest in the tools themselves.
Repository: https://github.com/eBay/tsv-utils-dlang
Performance benchmarks:
https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md
--Jon