Bug#863636: diffoscope: usage of FIFOs causes pair-comparisons to not run in parallel, wasting performance by about 1/2
forwarded 863636 https://salsa.debian.org/reproducible-builds/diffoscope/issues/22 thanks I've forwarded this upstream here: https://salsa.debian.org/reproducible-builds/diffoscope/issues/22 Regards, -- ,''`. : :' : Chris Lamb `. `'` la...@debian.org / chris-lamb.co.uk `-
Bug#863636: diffoscope: usage of FIFOs causes pair-comparisons to not run in parallel, wasting performance by about 1/2
Ximin Luo: > Ximin Luo: >> Package: diffoscope >> Version: 78 >> Severity: normal >> >> Dear Maintainer, >> >> diff(1) first reads the contents of one file then the next one: >> >> https://sources.debian.net/src/diffutils/1:3.5-3/src/io.c/#L552 >> >> This means that if the "files" are actually FIFOs connected to the output of >> a >> process, as they are in many cases in diffoscope, the second process has to >> wait >> for diff(1) to fully read the output of the first process, before it itself >> can >> run. This prevents both processes from running in parallel. >> >> An appropriate fix would be to store the output of at least one of the >> commands >> into a temporary file, and have diff(1) read from this instead. This has to >> be >> done carefully however, to make sure that diff(1) doesn't accidentally read >> it >> before the process is finished. >> >> [..] > > It seems readelf specifically has weird performance behaviours when running > in parallel. > > [..] I couldn't reproduce the above results on Holger's profitbricks machine, and bunk@ couldn't reproduce it either. That is, running the commands in parallel *did* produce roughly a 2x speed up. Also on my local machine I got: $ ls -laSr /usr/bin/{hot,hokey,darcs} -rwxr-xr-x 1 root root 20555008 Oct 28 2016 /usr/bin/hot* -rwxr-xr-x 1 root root 29637664 Oct 28 2016 /usr/bin/hokey* -rwxr-xr-x 1 root root 37144392 Oct 28 2016 /usr/bin/darcs* $ f() { taskset --cpu-list $1 objdump -S /usr/bin/hot >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; ) real0m12.445s user0m12.408s sys 0m0.024s real0m7.653s user0m15.224s sys 0m0.040s $ f() { taskset --cpu-list $1 objdump -S /usr/bin/hokey >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; ) real0m24.998s user0m24.896s sys 0m0.064s real0m21.197s user0m42.224s sys 0m0.076s $ f() { taskset --cpu-list $1 objdump -S /usr/bin/darcs >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; ) real0m38.652s user0m38.532s sys 0m0.064s real0m34.323s user1m8.168s sys 0m0.104s i.e. the speed-improvement-due-to-parallelism decreases as the size of the input increases - but I couldn't reproduce this the profitbricks machine either. Due to the lack of debugging symbols for binutils (#863728) it's hard for me to investigate this further, so I'll pause this for now. It's probably worth un-reverting e28b540b0b289ce9fda70095160382799d7602a6 perhaps guarded by a CLI flag; though diffoscope's heavy use of Python-based filtering of external commands' output makes this less significant (without also trying to optimise how this filtering is done). In the meantime I'm also using "--exclude-command '^readelf.*\s--debug-dump=info'" to avoid the longest part of ELF processing. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Bug#863636: diffoscope: usage of FIFOs causes pair-comparisons to not run in parallel, wasting performance by about 1/2
Ximin Luo: > Package: diffoscope > Version: 78 > Severity: normal > > Dear Maintainer, > > diff(1) first reads the contents of one file then the next one: > > https://sources.debian.net/src/diffutils/1:3.5-3/src/io.c/#L552 > > This means that if the "files" are actually FIFOs connected to the output of a > process, as they are in many cases in diffoscope, the second process has to > wait > for diff(1) to fully read the output of the first process, before it itself > can > run. This prevents both processes from running in parallel. > > An appropriate fix would be to store the output of at least one of the > commands > into a temporary file, and have diff(1) read from this instead. This has to be > done carefully however, to make sure that diff(1) doesn't accidentally read it > before the process is finished. > > [..] It seems readelf specifically has weird performance behaviours when running in parallel. My initial attempt at fixing this works in the general case, but not for specific cases that are important to diffoscope and this scenario. https://anonscm.debian.org/cgit/reproducible/diffoscope.git/commit/?h=experimental=e28b540b0b289ce9fda70095160382799d7602a6 Taking diffoscope completely out of the equation, we see the following behaviours: $ f() { sleep 3; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; ) real0m6.010s user0m0.000s sys 0m0.000s real0m3.004s user0m0.000s sys 0m0.000s $ f() { sha256sum /run/shm/elf.$1 >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; ) real0m0.451s user0m0.432s sys 0m0.012s real0m0.221s user0m0.428s sys 0m0.004s (The above two cases confirm that the shell snippets are correct. /run/shm/elf.{1,2} is about 50MB.) $ f() { taskset $1 readelf --wide --debug-dump=rawline /run/shm/elf.$1 >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; ) real0m17.128s user0m17.096s sys 0m0.008s real0m19.126s user0m37.256s sys 0m0.016s $ f() { taskset $1 readelf --wide --debug-dump=rawline /usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; ) real0m3.330s user0m3.324s sys 0m0.000s real0m1.674s user0m3.324s sys 0m0.000s $ f() { taskset $1 readelf --wide --debug-dump=info /usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; ) real0m18.437s user0m18.384s sys 0m0.012s real0m29.039s user0m56.196s sys 0m0.044s /usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug is about 4MB and from libc6-dbg=2.24-10. So it looks like readelf has issues when running for a long time in parallel. (--debug-dump=rawline with a 4MB file sped up when running in parallel.) However other tools don't seem to have this problem: $ time ( { echo 1; echo 2; } | xargs -n1 -I '{}' taskset '{}' readelf --wide --debug-dump=info /usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug >/dev/null ) real0m18.730s user0m18.644s sys 0m0.032s $ time ( { echo 1; echo 2; } | parallel --will-cite taskset '{}' readelf --wide --debug-dump=info /usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug >/dev/null ) real0m33.076s user1m0.844s sys 0m0.460s $ time ( { echo 1; echo 2; } | xargs -n1 -I '{}' taskset '{}' sha256sum /run/shm/test.iso >/dev/null ) real0m33.302s user0m32.332s sys 0m0.932s $ time ( { echo 1; echo 2; } | parallel --will-cite taskset '{}' sha256sum /run/shm/test.iso >/dev/null ) real0m16.843s user0m32.224s sys 0m1.272s Here, /run/shm/test.iso is about 4GB in size, and it does speed up when being processed by sha256sum in parallel. Running strace on the "readelf" calls shows that most syscalls are done at the start, then for most of the rest of the execution it's just "write()" syscalls and indeed the "sys" times above are negligible. So it seems unlikely that the processes are blocking on anything. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Bug#863636: diffoscope: usage of FIFOs causes pair-comparisons to not run in parallel, wasting performance by about 1/2
Package: diffoscope Version: 78 Severity: normal Dear Maintainer, diff(1) first reads the contents of one file then the next one: https://sources.debian.net/src/diffutils/1:3.5-3/src/io.c/#L552 This means that if the "files" are actually FIFOs connected to the output of a process, as they are in many cases in diffoscope, the second process has to wait for diff(1) to fully read the output of the first process, before it itself can run. This prevents both processes from running in parallel. An appropriate fix would be to store the output of at least one of the commands into a temporary file, and have diff(1) read from this instead. This has to be done carefully however, to make sure that diff(1) doesn't accidentally read it before the process is finished. X -- System Information: Debian Release: 9.0 APT prefers testing APT policy: (990, 'testing'), (500, 'unstable-debug'), (500, 'testing-debug'), (500, 'buildd-unstable'), (300, 'unstable'), (100, 'experimental'), (1, 'experimental-debug') Architecture: amd64 (x86_64) Foreign Architectures: i386 Kernel: Linux 4.9.0-3-amd64 (SMP w/4 CPU cores) Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) Versions of packages diffoscope depends on: ii python3-libarchive-c 2.1-3.1 ii python3-magic 1:5.30-1 ii python3-pkg-resources 33.1.1-1 pn python3:any Versions of packages diffoscope recommends: ii acl 2.2.52-3+b1 ii apktool 2.2.1+dfsg-2 ii binutils-multiarch 2.28-5 ii bzip21.0.6-8.1 ii caca-utils 0.99.beta19-2+b2 ii colord 1.3.3-2 ii default-jdk [java-sdk] 2:1.8-58 ii default-jdk-headless 2:1.8-58 ii enjarify 1:1.0.3-3 ii fontforge-extras 0.3-4 pn fp-utils ii genisoimage 9:1.1.11-3+b2 ii gettext 0.19.8.1-2 ii ghc 8.0.1-17+b1 ii ghostscript 9.20~dfsg-3.2 ii gnupg2.1.18-6 ii imagemagick 8:6.9.7.4+dfsg-8 ii imagemagick-6.q16 [imagemagick] 8:6.9.7.4+dfsg-8 ii jsbeautifier 1.6.4-6 ii llvm 1:3.8-36 ii mono-utils 4.6.2.7+dfsg-1 ii openjdk-8-jdk [java-sdk] 8u131-b11-2 ii openssh-client 1:7.4p1-10 ii pdftk2.02-4+b2 ii poppler-utils0.48.0-2 ii python3-argcomplete 1.8.1-1 ii python3-debian 0.1.30 pn python3-guestfs ii python3-progressbar 2.3-4 ii python3-rpm 4.12.0.2+dfsg1-2 ii python3-tlsh 3.4.4+20151206-1+b2 ii rpm2cpio 4.12.0.2+dfsg1-2 ii sng 1.1.0-1+b1 ii sqlite3 3.16.2-3 ii squashfs-tools 1:4.3-3+b1 ii unzip6.0-21 ii vim-common 2:8.0.0197-4 ii xxd 2:8.0.0197-4 ii xz-utils 5.2.2-1.2+b1 Versions of packages diffoscope suggests: ii libjs-jquery 3.1.1-2 -- no debconf information