On Tuesday, 23 February 2021 at 00:08:40 UTC, tsbockman wrote:
> On Friday, 19 February 2021 at 00:13:19 UTC, Jon Degenhardt wrote:
>> It would be interesting to see how the performance compares to
>> tsv-uniq
>> (https://github.com/eBay/tsv-utils/tree/master/tsv-uniq). The
>> prebuilt binaries turn on all the optimizations
>> (https://github.com/eBay/tsv-utils/releases).
>
> My program (called line-dedup below) is modestly faster than
> yours, with the gap gradually widening as files get bigger.
> Similarly, when not using a memory-mapped scratch file, my
> program is modestly less memory hungry than yours, with the gap
> gradually widening as files get bigger.
>
> In neither case is the difference very exciting though; the
> real benefit of my algorithm is that it can process files too
> large for physical memory. It might also handle frequent hash
> collisions better, and could be upgraded to handle huge numbers
> of very short lines efficiently.
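For readers unfamiliar with the larger-than-memory case discussed above, one standard way to deduplicate such a file is hash partitioning into scratch files, then deduplicating each (now small) partition in memory. This is only an illustrative sketch, not line-dedup's or tsv-uniq's actual algorithm, and `dedup_external` is a hypothetical name:

```python
import os
import tempfile

def dedup_external(in_path, out_path, n_buckets=64):
    """Deduplicate lines of a file that may not fit in memory.

    Pass 1: hash each line into one of n_buckets scratch files, so
    duplicate lines always land in the same bucket.
    Pass 2: dedup each bucket in memory (each holds roughly
    1/n_buckets of the input) and append survivors to the output.
    Note: unlike tsv-uniq, this does not preserve input line order.
    """
    with tempfile.TemporaryDirectory() as tmp:
        buckets = [open(os.path.join(tmp, "b%d" % i), "w+")
                   for i in range(n_buckets)]
        with open(in_path) as f:
            for line in f:
                buckets[hash(line) % n_buckets].write(line)
        with open(out_path, "w") as out:
            for b in buckets:
                b.seek(0)
                seen = set()        # only one bucket's lines at a time
                for line in b:
                    if line not in seen:
                        seen.add(line)
                        out.write(line)
                b.close()
```

Peak memory is bounded by the largest bucket rather than the whole file, at the cost of a second pass over the data.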
Thanks for running the comparison! I appreciate seeing how other
implementations compare.
I'd characterize the results a bit differently though. Based on
the numbers, line-dedup is materially faster than tsv-uniq, at least
on the tests run. To your point, it may not make much practical
difference on data sets that fit in memory. tsv-uniq is fast
enough for most needs. But it's still a material performance
delta. Nice job!
I agree also that the bigger pragmatic benefit is fast processing
of files much larger than will fit in memory. There are other
useful problems like this. One I often need is creating a random
weighted ordering. Easy to do for data sets that fit in memory,
but hard to do fast for data sets that do not.
--Jon