[email protected] (Ludovic Courtès) skribis: > I tried running this: > > guix build --max-jobs=200 $(guix gc -R $(guix build -d inkscape > --no-grafts) | sort) & \ > guix build --max-jobs=200 $(guix gc -R $(guix build -d inkscape > --no-grafts) | sort -r) > > … also in parallel with this (for good measure): > > guix build --max-jobs=200 $(guix gc -R $(guix build -d inkscape > --no-grafts) | sort -R) > > Since we have 3 clients, that leads to 3 guix-daemon processes, and > those are stuck in a deadlock:
This strikes again: ‘cuirass remote-worker’ processes on berlin occasionally end up deadlocking in the exact same way. When running ‘current remote-worker --workers=4’, 4 sessions (4 clients) are used, which can lead to that situation, as in this example: --8<---------------cut here---------------start------------->8--- root@hydra-guix-126 ~# guix processes |guix shell recutils -- recsel -p 'SessionPID,ClientCommand,LockHeld' SessionPID: 27250 ClientCommand: /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile -e main -s /gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real remote-worker --user=cuirass-worker --workers=4 --systems=x86_64-linux,i686-linux --publish-port=5558 --substitute-urls=http://141.80.167.131 SessionPID: 27269 ClientCommand: /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile -e main -s /gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real remote-worker --user=cuirass-worker --workers=4 --systems=x86_64-linux,i686-linux --publish-port=5558 --substitute-urls=http://141.80.167.131 LockHeld: /gnu/store/72s7500g3zg2p6fjdc1paazvm1w2xdr2-libva-2.19.0.lock LockHeld: /gnu/store/0bbnhq7bagn6sbj2lmapmdiiw50v3dgz-rav1e-0.7.1.lock SessionPID: 27308 ClientCommand: /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile -e main -s /gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real remote-worker --user=cuirass-worker --workers=4 --systems=x86_64-linux,i686-linux --publish-port=5558 --substitute-urls=http://141.80.167.131 LockHeld: /gnu/store/zf5w9ypk8il0i9y22n81aamypr2qgsmm-dav1d-1.5.0.lock SessionPID: 27345 ClientCommand: /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile -e main -s /gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real remote-worker --user=cuirass-worker --workers=4 --systems=x86_64-linux,i686-linux --publish-port=5558 --substitute-urls=http://141.80.167.131 LockHeld: /gnu/store/0xbi2bgq34yyx2fqjjwpgdv4gkfyaf60-gst-plugins-bad-minimal-1.22.3.lock LockHeld: /gnu/store/ij5igi5xrp4sx6c78nbvg24lb4ma2f4l-libcbor-0.11.0.lock LockHeld: /gnu/store/czfvm14yy517vb8w2hpp46nyrdrymqyp-libfido2-1.12.0.lock LockHeld: /gnu/store/1ldcq0p20nqy7d3mxdy4yra1ax5ik3xc-mpg123-1.31.2.lock LockHeld: /gnu/store/sadbf1fmb0n9k754x5jbbdklcxbjqlhx-openssh-9.9p1.lock LockHeld: /gnu/store/86rl29llmb7s4sl3bx0vl465mmq7nk6f-gcr-3.41.2.lock SessionPID: 27382 ClientCommand: /gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile -e main -s /gnu/store/ll18sc406b5cqapmvz17v22gh4sryb24-cuirass-1.2.0-11.e96f088/bin/.cuirass-real remote-worker --user=cuirass-worker --workers=4 --systems=x86_64-linux,i686-linux --publish-port=5558 --substitute-urls=http://141.80.167.131 --8<---------------cut here---------------end--------------->8--- Here process 27269 holds locks on libva and rav1e and waits forever trying to get the dav1d lock, held by 27308; process 27308 tries to get the rav1e lock; process 27345 tries to get the libva lock. FWIW, each of them is trying to substitute (not build) those things, via the ‘build-things’ call made after the “substituting ~a inputs for ~a” message in remote-worker. Ludo’.
