On Wednesday, 13 May 2015 at 13:40:33 UTC, John Colvin wrote:
On Wednesday, 13 May 2015 at 11:33:55 UTC, John Colvin wrote:
On Tuesday, 12 May 2015 at 18:14:56 UTC, Gerald Jansen wrote:
On Tuesday, 12 May 2015 at 16:35:23 UTC, Rikki Cattermole
wrote:
On 13/05/2015 4:20 a.m., Gerald Jansen wrote:
At the risk of great embarrassment ... here's my program:
http://dekoppel.eu/tmp/pedupg.d
Would it be possible to give us some example data?
I might give it a go to try rewriting it tomorrow.
http://dekoppel.eu/tmp/pedupgLarge.tar.gz (89 MB)
Contains two largish datasets in a directory structure
expected by the program.
I only see 2 traits in that example, so it's hard for anyone
to explore your scaling problem, since there can be at most
2 tasks.
Either way, a few small changes were enough to cut the runtime
by a factor of ~6 in the single-threaded case and improve the
scaling a bit, although the printing to output files still
looks like a bit of a bottleneck.
http://dpaste.dzfl.pl/80cd36fd6796
The key thing was reducing the number of allocations (more
std.algorithm.splitter copying to static arrays, less
std.array.split) and avoiding File.byLine. Other people in this
thread have mentioned alternatives to it that may be faster or
have lower memory usage; I just read the whole files into
memory and then lazily split them with
std.algorithm.splitter. I ended up with some blank lines coming
through, so I added if(line.empty) continue; in a few places.
You might want to look more carefully at that; it could be my
mistake.
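The pattern described above could be sketched roughly like this
(a minimal, hypothetical example, not the actual pedupg.d code;
the file name and field count are made up):

```d
import std.algorithm : splitter;
import std.file : readText;
import std.range : empty;

void main()
{
    // One allocation for the whole file instead of per-line buffers.
    string text = readText("data.txt");

    foreach (line; text.splitter('\n'))
    {
        // A file ending in a newline yields a trailing empty line.
        if (line.empty) continue;

        // Static array: no per-line GC allocation, unlike std.array.split,
        // which allocates a fresh array of slices for every line.
        string[8] fields;
        size_t n;
        foreach (field; line.splitter(' '))
        {
            if (n == fields.length) break;
            fields[n++] = field; // slices into `text`, no copying of characters
        }
        // ... process fields[0 .. n] ...
    }
}
```

The point is that splitter is lazy and its results are slices of the
original buffer, so the only real allocation is the initial readText.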
The use of std.array.appender for `info` is just good practice,
but it doesn't make much difference here.
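For reference, the appender idiom mentioned above looks like this
(a generic sketch; `info` here is just an illustrative string
accumulator, not the program's actual type):

```d
import std.array : appender;

void main()
{
    auto info = appender!(string[]);
    info.reserve(1024);       // optional: pre-size to avoid regrowth
    foreach (i; 0 .. 10)
        info.put("line");     // amortized appends; fewer reallocations than ~=
    auto result = info.data;  // the accumulated string[]
}
```

Appender keeps its own capacity, so repeated puts don't reallocate the
way naive `arr ~= x` in a loop can.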
Wow, I'm impressed with the effort you guys (John, Rikki, others)
are making to teach me some efficiency tricks. I guess this is
one of the strengths of D: its community. I'm studying your
various contributions closely!
The empty line comes from the very last line of the files, which
also end with a newline (as per "normal" practice?).
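That matches how splitter behaves: when the input ends with the
separator, an empty element is yielded at the end. A tiny
illustration:

```d
import std.algorithm : splitter;
import std.array : array;

void main()
{
    string text = "a,b\nc,d\n";          // file content ending in a newline
    auto lines = text.splitter('\n').array;
    assert(lines == ["a,b", "c,d", ""]); // the final "" is the blank line
}
```

Hence the `if(line.empty) continue;` guard rather than anything wrong
in the data itself.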