Bug#765577: (no subject)
Control: tag -1 pending Hey Faidon, Faidon Liambotis [2015-04-22 15:25 +0300]: -new_rule_pattern=$(echo ^SUBSYSTEM==\net\, ACTION==\add\$match | sed -re 's/([\?\*])/\\\1/g') +new_rule_pattern=$(echo ^SUBSYSTEM==\net\, ACTION==\add\$match | sed -re 's/([\?\*\{\}])/\\\1/g') Thanks for spotting this! I tested this now with Marco's test script, and it still works fine. Committed to master now: http://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?id=3d02f64154 (to experimental too). I have to say... constructing regexps in shell is tricky and the whole escaping-with-sed logic feels like a hack. I think a literal grep (i.e. -F) would be better here, especially since I don't see the point of an exact match (even if the file was modified by the sysadmin, the right thing would to not write a new rule anyway). This is probably something to be considered post-jessie. Yeah, the whole generator is a horrible thing.. For the future I actually want to propose a bolder change: https://lists.debian.org/debian-devel/2015/05/msg00170.html Martin -- Martin Pitt| http://www.piware.de Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org) signature.asc Description: Digital signature
Bug#765577: (no subject)
reopen 765577 ! found 765577 215-14 thanks On Mon, Mar 30, 2015 at 06:06:47AM +0200, Marco d'Itri wrote: I see that we have independently devised the same fix, I am attaching a test case and a more refined version of your patch. I tried Jessie RC3 today and immediately found that the fix is, unfortunately, buggy. Your patch constructs a regexp and takes care to escape metacharacters ? and * with a sed but does not escape { and } that are also metacharacters in the extended set of POSIX regexps. These are always found in the string-to-be-matched here with 'ATTR{dev_id}==0x0' and 'ATTR{type}==1', so the if always fails. This was likely not caught by your test case (and was harder to debug and figure out!) because GNU grep's -E mode handles { as both a literal and a metacharacter heuristically for historic reasons (consult grep's manpage for that) but busybox grep does not: $ echo 'foo{bar}' test $ egrep 'foo{bar}' test foo{bar} $ busybox egrep 'foo{bar}' test egrep: bad regex 'foo{bar}' $ egrep 'fo{1,2}' test foo{bar} $ busybox egrep 'fo{1,2}' test foo{bar} Note that this is NOT a bug in busybox; foo{bar} is indeed an invalid extended POSIX regexp and busybox is right to complain and error out. The very minimal last-minute fix below did the trick for me but I have to say... constructing regexps in shell is tricky and the whole escaping-with-sed logic feels like a hack. I think a literal grep (i.e. -F) would be better here, especially since I don't see the point of an exact match (even if the file was modified by the sysadmin, the right thing would to not write a new rule anyway). This is probably something to be considered post-jessie. Thanks, Faidon diff --git a/debian/extra/write_net_rules b/debian/extra/write_net_rules index 38a3ca0..fedc0f1 100644 --- a/debian/extra/write_net_rules +++ b/debian/extra/write_net_rules @@ -118,7 +118,7 @@ basename=${INTERFACE%%[0-9]*} match=$match, KERNEL==\$basename*\ # build a regular expression that matches the new rule that we want to write -new_rule_pattern=$(echo ^SUBSYSTEM==\net\, ACTION==\add\$match | sed -re 's/([\?\*])/\\\1/g') +new_rule_pattern=$(echo ^SUBSYSTEM==\net\, ACTION==\add\$match | sed -re 's/([\?\*\{\}])/\\\1/g') # Double check if the new rule has already been written. This happens if # multiple add events are generated before the script returns and udevd -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#765577: (no subject)
On Mar 18, Faidon Liambotis parav...@debian.org wrote: Well, the root cause IMO is that 75-persistent-net-generator.rules is inherently susceptible to races. It's my understanding that it's valid for events to be triggered multiple times -- there are multiple places in d-i that udevadm trigger is called, including start-udev (as shipped by udev-udeb) which would trigger another set of add events beyond the original hotplug one. I do not think that this is valid in the sense that the kernel could generate multiple add events, but removing all misuses of udevadm trigger requires some work and may not even be possible at this time. I see that we have independently devised the same fix, I am attaching a test case and a more refined version of your patch. -- ciao, Marco #!/bin/sh -e rm -f /etc/udev/rules.d/70-persistent-net.rules rmmod e1000e || true modprobe e1000e sleep 1 rm -f /etc/udev/rules.d/70-persistent-net.rules iteration=0 while : ; do iteration=$((iteration + 1)) if ! echo add /sys/class/net/eth0/uevent; then echo Failed at iteration $iteration! exit fi # do not send too many events to udev because they will all be queued # and eventually processed if [ $iteration -eq 2500 ]; then echo Aborting at iteration $iteration! exit fi done --- /lib/udev/write_net_rules.orig 2015-03-30 05:18:43.228147201 +0200 +++ /lib/udev/write_net_rules 2015-03-30 06:00:39.181085432 +0200 @@ -117,6 +117,18 @@ basename=${INTERFACE%%[0-9]*} match=$match, KERNEL==\$basename*\ +# build a regular expression that matches the new rule that we want to write +new_rule_pattern=$(echo ^SUBSYSTEM==\net\, ACTION==\add\$match | sed -re 's/([\?\*])/\\\1/g') + +# Double check if the new rule has already been written. This happens if +# multiple add events are generated before the script returns and udevd +# renames the interfaces. See #765577 for details. +if egrep -q $new_rule_pattern \ +$RO_RULES_FILE $([ -e $RULES_FILE ] echo $RULES_FILE); then + unlock_rules_file + exit 0 +fi + if [ $INTERFACE_NAME ]; then # external tools may request a custom name COMMENT=$COMMENT (custom name provided by external tool) pgp099UwXLv9Q.pgp Description: PGP signature
Bug#765577: (no subject)
severity 765577 serious thanks On Wed, Feb 25, 2015 at 03:24:08PM +, Filippo Giunchedi wrote: FWIW we're running into the same bug with jessie installer, passing 'debug' at boot apparently is enough to not trigger the race with good success rate. Filippo and I both work for the Wikimedia Foundation, where this is affecting us on dozens of systems. I tried to debug this extensively and had a chat with Marco d'Itri on IRC. It's both mine Marco's opinion that this is an RC bug, thus elevating this to serious. Unfortunately, Marco told me that he won't able to tackle this and suggested to reply to this bug report so that the other udev maintainers can help out. The result of my own investigation is (not speaking for Marco): It's clear that there's some race condition happening here both because there are reports of it happening sporadically (not in my case, though) and because setting d-i to debug mode fixes it. Therefore, the operating theory is that multiple events for the same add event are triggered. This race is supposed to be handled, as: a) write_net_rules takes a lock before writing anything -- it's also evident this happens, as the duplicate entries have ethNs that are numerically ascending and not the same for the same card. b) 75-persistent-net-generator.rules is supposed to be idempotent, as it bails out early (3rd line) for interfaces that already have a NAME set. For the ones that don't, it also sets NAME right after the write_net_rules invocation. However this still leaves room for a race: write_net_rules is *not* idempotent and hence if 75-persistent-net-generator.rules gets called twice in very quick succession, before write_net_rules gets a chance to finish and name the interface, then an interface will be named twice, with a different name (and hence, eth0 will be renamed to e.g. eth2). It's still unknown to me why this is a regression. I've tried the following, under /lib/debian-installer/start-udev: 1) Adding a udevadm settle || true right after the udevadm trigger. 2) Adding a sleep 15 before udevadm trigger 3) Adding a sleep 15 (or 3) *after* udevadm trigger. Surprisingly, of these three, only (3) worked around the bug. Another less arbitrary/racy workaround I suggesed was a grep near the top of write_net_rules' write_rule() function. Since write_rule() operates under a lock, this would completely eliminate any kind of race here. I pitched this to Marco but he wasn't thrilled with the idea -- he said he'd prefer finding the root cause. I've done the change and tested it anyway, though, and it successfully aleviates this issue: diff --git a/debian/extra/write_net_rules b/debian/extra/write_net_rules index 4379792..fbd1230 100644 --- a/debian/extra/write_net_rules +++ b/debian/extra/write_net_rules @@ -60,6 +60,9 @@ write_rule() { local name=$2 local comment=$3 + # workaround potential races, #765577 + if grep -q -F $match $RULES_FILE then return; fi + { if [ $PRINT_HEADER ]; then PRINT_HEADER= Thanks, Faidon -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#765577: (no subject)
Am 18.03.2015 um 18:15 schrieb Faidon Liambotis: Another less arbitrary/racy workaround I suggesed was a grep near the top of write_net_rules' write_rule() function. Since write_rule() operates under a lock, this would completely eliminate any kind of race here. I pitched this to Marco but he wasn't thrilled with the idea -- he said he'd prefer finding the root cause. I've done the change and tested it anyway, though, and it successfully aleviates this issue: I'm with Marco here. Before adding any workarounds, we need to understand what the underlying problem is. Otherwise we are adding cruft which nobody understands anymore a few years from now. Since I can't reproduce the issue and already have trouble keeping up with other bug reports, further investigation needs to be done by someone else. Regards, Michael -- Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? signature.asc Description: OpenPGP digital signature
Bug#765577: (no subject)
On Wed, Mar 18, 2015 at 06:52:14PM +0100, Michael Biebl wrote: I'm with Marco here. Before adding any workarounds, we need to understand what the underlying problem is. Otherwise we are adding cruft which nobody understands anymore a few years from now. Since I can't reproduce the issue and already have trouble keeping up with other bug reports, further investigation needs to be done by someone else. Well, the root cause IMO is that 75-persistent-net-generator.rules is inherently susceptible to races. It's my understanding that it's valid for events to be triggered multiple times -- there are multiple places in d-i that udevadm trigger is called, including start-udev (as shipped by udev-udeb) which would trigger another set of add events beyond the original hotplug one. This is why write_net_rules operates under a lockfile too (again, AIUI). It's just that this doesn't fix the race, just limits the race window significantly but not entirely (as it makes write_net_rules idempotent, but not 75-persistent-net-generator.rules). The reason this is found in current jessie might just be that udev got faster, or more udevadm triggers were added in other parts of the installer etc. In any case, if you need more input from a buggy system, feel free to let me know and I'll try my best to get back to you with more details. Oh, my original patch was buggy for many reasons, here's the updated version: diff --git a/debian/extra/write_net_rules b/debian/extra/write_net_rules index 4379792..95939b8 100644 --- a/debian/extra/write_net_rules +++ b/debian/extra/write_net_rules @@ -117,6 +117,12 @@ fi basename=${INTERFACE%%[0-9]*} match=$match, KERNEL==\$basename*\ +# detect a race and abort, cf. #765577 +if [ -f $RULES_FILE ] grep -q -F $match $RULES_FILE; then + unlock_rules_file + exit 0 +fi + if [ $INTERFACE_NAME ]; then # external tools may request a custom name COMMENT=$COMMENT (custom name provided by external tool) Regards, Faidon -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#765577: (no subject)
On Fri, Feb 06, 2015 at 01:26 AM, Chris Kuehl wrote: Hi Christof, On Fri, Feb 06, 2015 at 08:30:58AM +0100, Christof Böckler wrote: Maybe it is has to do with the presence of a second networking device? The devices I tested on (which had a 100% reproducibility rate) all had a single network card. So maybe it's possible that's the reason you don't see the same level of occurrence. FWIW we're running into the same bug with jessie installer, passing 'debug' at boot apparently is enough to not trigger the race with good success rate. Unfortunately it isn't a very good workaround, it needs to be undone e.g. in /etc/default/grub after installation. See also https://phabricator.wikimedia.org/T90236 -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#765577: (no subject)
Hi Christof, On Fri, Feb 06, 2015 at 08:30:58AM +0100, Christof Böckler wrote: Maybe it is has to do with the presence of a second networking device? The devices I tested on (which had a 100% reproducibility rate) all had a single network card. So maybe it's possible that's the reason you don't see the same level of occurrence. Chris -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#765577: (no subject)
Being redirected from my bugreport #777126 I can confirm that this happens with automatic installations. Although it does not happen all the time. With a Lenovo Thinkpad X61 for example, I got this /etc/udev/rules.d/70-persistent-net.rules: # PCI device 0x8086:0x1049 (e1000e) SUBSYSTEM==net, ACTION==add, DRIVERS==?*, ATTR{address}==00:16:d3:ca:56:05, ATTR{dev_id}==0x0, ATTR{type}==1, KERNEL==eth*, NAME=eth0 # PCI device 0x8086:0x4230 (iwl4965) SUBSYSTEM==net, ACTION==add, DRIVERS==?*, ATTR{address}==00:13:e8:ab:5d:43, ATTR{dev_id}==0x0, ATTR{type}==1, KERNEL==wlan*, NAME=wlan0 Maybe it is has to do with the presence of a second networking device? Greetings Christof -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org