severity 765577 serious
thanks

On Wed, Feb 25, 2015 at 03:24:08PM +0000, Filippo Giunchedi wrote:
> FWIW we're running into the same bug with jessie installer, passing
> 'debug' at boot apparently is enough to not trigger the race with good
> success rate.

Filippo and I both work for the Wikimedia Foundation, where this is
affecting us on dozens of systems. I tried to debug this extensively and
had a chat with Marco d'Itri on IRC. It's both my and Marco's opinion that
this is an RC bug, hence raising the severity to serious.

Unfortunately, Marco told me that he won't be able to tackle this and
suggested replying to this bug report so that the other udev maintainers
can help out.

The result of my own investigation is (not speaking for Marco):

It's clear that there's some race condition happening here, both because
there are reports of it happening sporadically (not in my case, though)
and because setting d-i to debug mode fixes it.

The operating theory is therefore that multiple "add" events are triggered
for the same interface. This race is supposed to be handled, as:
a) write_net_rules takes a lock before writing anything -- it's also
   evident that this happens, as the duplicate entries have ethNs that are
   numerically ascending and not the same for the same card.
b) 75-persistent-net-generator.rules is supposed to be idempotent, as it
   bails out early (3rd line) for interfaces that already have a NAME set.
   For the ones that don't, it also sets NAME right after the
   write_net_rules invocation.

However, this still leaves room for a race: write_net_rules is *not*
idempotent. If 75-persistent-net-generator.rules gets called twice in very
quick succession, before write_net_rules gets a chance to finish and name
the interface, then the interface will be named twice, with a different
name each time (and hence eth0 will be renamed to e.g. eth2).

It's still unknown to me why this is a regression.

I've tried the following, under /lib/debian-installer/start-udev:
1) Adding a "udevadm settle || true" right after the "udevadm trigger".
2) Adding a "sleep 15" before "udevadm trigger".
3) Adding a "sleep 15" (or 3) *after* "udevadm trigger".

Surprisingly, of these three, only (3) worked around the bug.

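To make the suspected race concrete, here is a toy simulation. It is only a sketch under stated assumptions, not the real write_net_rules: the function name, the mkdir-based lock, the rule format and the MAC address are all made up for illustration. It models two "add" events for the same card that have both already passed the NAME check; the lock serialises them, but because the writer never checks for an existing entry, the file should end up with two rules for the one MAC, with ascending names.

```shell
#!/bin/sh
# Toy simulation only: none of these names come from the real write_net_rules.
RULES_FILE=$(mktemp)
LOCK_DIR="$RULES_FILE.lock"

# Hypothetical stand-in for write_net_rules: serialised by a lock, but it
# never checks whether the MAC address already has a rule.
sim_write_net_rules() {
    mac="$1"
    # mkdir is atomic, so it can serve as a simple lock
    until mkdir "$LOCK_DIR" 2>/dev/null; do sleep 1; done
    # next free index = number of rules already written
    index=$(grep -c . "$RULES_FILE")
    echo "ATTR{address}==\"$mac\", NAME=\"eth$index\"" >> "$RULES_FILE"
    rmdir "$LOCK_DIR"
}

# Two "add" events for the same card, both already past the NAME check.
sim_write_net_rules "00:11:22:33:44:55" &
sim_write_net_rules "00:11:22:33:44:55" &
wait

cat "$RULES_FILE"
```

The lock prevents the two writers from corrupting the file or picking the same index, which matches the ascending ethNs seen in the duplicate entries, but it does nothing to stop the second writer from adding a rule at all.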
Another, less arbitrary/racy workaround I suggested was a grep near the top
of write_net_rules' write_rule() function. Since write_rule() operates
under a lock, this would completely eliminate any kind of race here.

I pitched this to Marco but he wasn't thrilled with the idea -- he said
he'd prefer finding the root cause. I've made the change and tested it
anyway, though, and it successfully alleviates this issue:

diff --git a/debian/extra/write_net_rules b/debian/extra/write_net_rules
index 4379792..fbd1230 100644
--- a/debian/extra/write_net_rules
+++ b/debian/extra/write_net_rules
@@ -60,6 +60,9 @@ write_rule() {
 	local name="$2"
 	local comment="$3"
 
+	# workaround potential races, #765577
+	if grep -q -F "$match" "$RULES_FILE"; then return; fi
+
 	{
 	if [ "$PRINT_HEADER" ]; then
 		PRINT_HEADER=

Thanks,
Faidon