Ximin Luo:
> Ximin Luo:
>> Package: diffoscope
>> Version: 78
>> Severity: normal
>>
>> Dear Maintainer,
>>
>> diff(1) first reads the contents of one file then the next one:
>>
>> https://sources.debian.net/src/diffutils/1:3.5-3/src/io.c/#L552
>>
>> This means that if the "files" are actually FIFOs connected to the output of a
>> process, as they are in many cases in diffoscope, the second process has to wait
>> for diff(1) to fully read the output of the first process, before it itself can
>> run. This prevents both processes from running in parallel.
>>
>> An appropriate fix would be to store the output of at least one of the commands
>> into a temporary file, and have diff(1) read from this instead. This has to be
>> done carefully however, to make sure that diff(1) doesn't accidentally read it
>> before the process is finished.
>>
>> [..]
>
> It seems readelf specifically has weird performance behaviours when running
> in parallel.
>
> [..]

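For illustration, the fix proposed in the original report could be sketched
roughly as below. This is only a sketch under my own assumptions, not what
diffoscope does: cmd_a, cmd_b and diff_parallel are made-up names, and for
simplicity it writes *both* outputs to temporary files and waits for both
writers, so diff(1) can never see a half-written file.

```shell
#!/bin/sh
# Hypothetical stand-ins for the two external commands being compared.
cmd_a() { printf 'one\ntwo\n'; }
cmd_b() { printf 'one\nthree\n'; }

# Run two commands concurrently into temporary files, then diff the files.
# $1 and $2 are the names of the commands (or shell functions) to run.
diff_parallel() {
    out_a=$(mktemp) || return 2
    out_b=$(mktemp) || { rm -f "$out_a"; return 2; }

    # Start both producers in the background so they can run in parallel.
    "$1" > "$out_a" &
    pid_a=$!
    "$2" > "$out_b" &
    pid_b=$!

    # Both writers must have exited before diff(1) opens the files.
    wait "$pid_a"
    wait "$pid_b"

    diff "$out_a" "$out_b"
    status=$?
    rm -f "$out_a" "$out_b"
    return "$status"
}

# diff(1) exits with 1 when the inputs differ; only treat >1 as an error.
diff_parallel cmd_a cmd_b || [ $? -eq 1 ]
```

The cost of this approach is the extra disk space for the temporary files;
whether it pays off depends on how well the two commands actually run in
parallel.
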
I couldn't reproduce the results quoted above on Holger's profitbricks machine,
and bunk@ couldn't reproduce them either. That is, running the commands in
parallel *did* produce roughly a 2x speed-up.

Also on my local machine I got:

$ ls -laSr /usr/bin/{hot,hokey,darcs}
-rwxr-xr-x 1 root root 20555008 Oct 28  2016 /usr/bin/hot*
-rwxr-xr-x 1 root root 29637664 Oct 28  2016 /usr/bin/hokey*
-rwxr-xr-x 1 root root 37144392 Oct 28  2016 /usr/bin/darcs*

$ f() { taskset --cpu-list $1 objdump -S /usr/bin/hot >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

real    0m12.445s
user    0m12.408s
sys     0m0.024s

real    0m7.653s
user    0m15.224s
sys     0m0.040s

$ f() { taskset --cpu-list $1 objdump -S /usr/bin/hokey >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

real    0m24.998s
user    0m24.896s
sys     0m0.064s

real    0m21.197s
user    0m42.224s
sys     0m0.076s

$ f() { taskset --cpu-list $1 objdump -S /usr/bin/darcs >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

real    0m38.652s
user    0m38.532s
sys     0m0.064s

real    0m34.323s
user    1m8.168s
sys     0m0.104s

i.e. the speed improvement due to parallelism decreases as the size of the
input increases (about 1.6x for hot, but only about 1.1x to 1.2x for hokey and
darcs) - but I couldn't reproduce this on the profitbricks machine either.

Due to the lack of debugging symbols for binutils (#863728) it's hard for me to
investigate this further, so I'll pause this for now. It's probably worth
un-reverting e28b540b0b289ce9fda70095160382799d7602a6, perhaps guarded by a CLI
flag; though diffoscope's heavy use of Python-based filtering of external
commands' output makes this less significant (unless we also optimise how this
filtering is done).

In the meantime I'm also using "--exclude-command '^readelf.*\s--debug-dump=info'"
to avoid the longest part of ELF processing.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git