Bug#765577: (no subject)

2015-05-21 Thread Martin Pitt
Control: tag -1 pending

Hey Faidon,

Faidon Liambotis [2015-04-22 15:25 +0300]:
 -new_rule_pattern=$(echo ^SUBSYSTEM==\net\, ACTION==\add\$match | sed 
 -re 's/([\?\*])/\\\1/g')
 +new_rule_pattern=$(echo ^SUBSYSTEM==\net\, ACTION==\add\$match | sed 
 -re 's/([\?\*\{\}])/\\\1/g')

Thanks for spotting this! I tested this now with Marco's test script,
and it still works fine. Committed to master now:

  http://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?id=3d02f64154

(to experimental too).

 I have to say... constructing regexps in shell is tricky and the
 whole escaping-with-sed logic feels like a hack. I think a literal
 grep (i.e.  -F) would be better here, especially since I don't see
 the point of an exact match (even if the file was modified by the
 sysadmin, the right thing would to not write a new rule anyway).
 This is probably something to be considered post-jessie.

Yeah, the whole generator is a horrible thing.. For the future I
actually want to propose a bolder change:

  https://lists.debian.org/debian-devel/2015/05/msg00170.html

Martin

-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)


signature.asc
Description: Digital signature


Bug#765577: (no subject)

2015-04-22 Thread Faidon Liambotis
reopen 765577 !
found 765577 215-14
thanks

On Mon, Mar 30, 2015 at 06:06:47AM +0200, Marco d'Itri wrote:
 I see that we have independently devised the same fix, I am attaching 
 a test case and a more refined version of your patch.

I tried Jessie RC3 today and immediately found that the fix is,
unfortunately, buggy. Your patch constructs a regexp and takes care to
escape metacharacters ? and * with a sed but does not escape { and
} that are also metacharacters in the extended set of POSIX regexps.
These are always found in the string-to-be-matched here with
'ATTR{dev_id}==0x0' and 'ATTR{type}==1', so the if always fails.

This was likely not caught by your test case (and was harder to debug
and figure out!) because GNU grep's -E mode handles { as both a literal
and a metacharacter heuristically for historic reasons (consult grep's
manpage for that) but busybox grep does not:
  $ echo 'foo{bar}'  test
  $ egrep 'foo{bar}' test 
  foo{bar}
  $ busybox egrep 'foo{bar}' test 
  egrep: bad regex 'foo{bar}'
  $ egrep 'fo{1,2}' test 
  foo{bar}
  $ busybox egrep 'fo{1,2}' test 
  foo{bar}
Note that this is NOT a bug in busybox; foo{bar} is indeed an invalid
extended POSIX regexp and busybox is right to complain and error out.

The very minimal last-minute fix below did the trick for me but I have
to say... constructing regexps in shell is tricky and the whole
escaping-with-sed logic feels like a hack. I think a literal grep (i.e.
-F) would be better here, especially since I don't see the point of an
exact match (even if the file was modified by the sysadmin, the right
thing would to not write a new rule anyway). This is probably something
to be considered post-jessie.

Thanks,
Faidon

diff --git a/debian/extra/write_net_rules b/debian/extra/write_net_rules
index 38a3ca0..fedc0f1 100644
--- a/debian/extra/write_net_rules
+++ b/debian/extra/write_net_rules
@@ -118,7 +118,7 @@ basename=${INTERFACE%%[0-9]*}
 match=$match, KERNEL==\$basename*\
 
 # build a regular expression that matches the new rule that we want to write
-new_rule_pattern=$(echo ^SUBSYSTEM==\net\, ACTION==\add\$match | sed -re 
's/([\?\*])/\\\1/g')
+new_rule_pattern=$(echo ^SUBSYSTEM==\net\, ACTION==\add\$match | sed -re 
's/([\?\*\{\}])/\\\1/g')
 
 # Double check if the new rule has already been written. This happens if
 # multiple add events are generated before the script returns and udevd


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#765577: (no subject)

2015-03-29 Thread Marco d'Itri
On Mar 18, Faidon Liambotis parav...@debian.org wrote:

 Well, the root cause IMO is that 75-persistent-net-generator.rules is
 inherently susceptible to races. It's my understanding that it's valid
 for events to be triggered multiple times -- there are multiple places
 in d-i that udevadm trigger is called, including start-udev (as
 shipped by udev-udeb) which would trigger another set of add events
 beyond the original hotplug one.
I do not think that this is valid in the sense that the kernel could 
generate multiple add events, but removing all misuses of udevadm 
trigger requires some work and may not even be possible at this time.

I see that we have independently devised the same fix, I am attaching 
a test case and a more refined version of your patch.

-- 
ciao,
Marco
#!/bin/sh -e

rm -f /etc/udev/rules.d/70-persistent-net.rules

rmmod e1000e || true
modprobe e1000e
sleep 1

rm -f /etc/udev/rules.d/70-persistent-net.rules

iteration=0

while : ; do
  iteration=$((iteration + 1))
  if ! echo add  /sys/class/net/eth0/uevent; then
echo Failed at iteration $iteration!
exit
  fi
  # do not send too many events to udev because they will all be queued
  # and eventually processed
  if [ $iteration -eq 2500 ]; then
echo Aborting at iteration $iteration!
exit
  fi
done

--- /lib/udev/write_net_rules.orig	2015-03-30 05:18:43.228147201 +0200
+++ /lib/udev/write_net_rules	2015-03-30 06:00:39.181085432 +0200
@@ -117,6 +117,18 @@
 basename=${INTERFACE%%[0-9]*}
 match=$match, KERNEL==\$basename*\
 
+# build a regular expression that matches the new rule that we want to write
+new_rule_pattern=$(echo ^SUBSYSTEM==\net\, ACTION==\add\$match | sed -re 's/([\?\*])/\\\1/g')
+
+# Double check if the new rule has already been written. This happens if
+# multiple add events are generated before the script returns and udevd
+# renames the interfaces. See #765577 for details.
+if egrep -q $new_rule_pattern \
+$RO_RULES_FILE $([ -e $RULES_FILE ]  echo $RULES_FILE); then
+  unlock_rules_file
+  exit 0
+fi
+
 if [ $INTERFACE_NAME ]; then
 	# external tools may request a custom name
 	COMMENT=$COMMENT (custom name provided by external tool)


pgp099UwXLv9Q.pgp
Description: PGP signature


Bug#765577: (no subject)

2015-03-18 Thread Faidon Liambotis
severity 765577 serious
thanks

On Wed, Feb 25, 2015 at 03:24:08PM +, Filippo Giunchedi wrote:
 FWIW we're running into the same bug with jessie installer, passing
 'debug' at boot apparently is enough to not trigger the race with good
 success rate.

Filippo and I both work for the Wikimedia Foundation, where this is
affecting us on dozens of systems.

I tried to debug this extensively and had a chat with Marco d'Itri on
IRC. It's both mine  Marco's opinion that this is an RC bug, thus
elevating this to serious. Unfortunately, Marco told me that he won't
able to tackle this and suggested to reply to this bug report so that
the other udev maintainers can help out.

The result of my own investigation is (not speaking for Marco):

It's clear that there's some race condition happening here both because
there are reports of it happening sporadically (not in my case, though)
and because setting d-i to debug mode fixes it.

Therefore, the operating theory is that multiple events for the same
add event are triggered. This race is supposed to be handled, as:

a) write_net_rules takes a lock before writing anything -- it's also
evident this happens, as the duplicate entries have ethNs that are
numerically ascending and not the same for the same card.

b) 75-persistent-net-generator.rules is supposed to be idempotent, as it
bails out early (3rd line) for interfaces that already have a NAME set.
For the ones that don't, it also sets NAME right after the
write_net_rules invocation.

However this still leaves room for a race: write_net_rules is *not*
idempotent and hence if 75-persistent-net-generator.rules gets called
twice in very quick succession, before write_net_rules gets a chance to
finish and name the interface, then an interface will be named twice,
with a different name (and hence, eth0 will be renamed to e.g. eth2).

It's still unknown to me why this is a regression.

I've tried the following, under /lib/debian-installer/start-udev:
1) Adding a udevadm settle || true right after the udevadm trigger.
2) Adding a sleep 15 before udevadm trigger
3) Adding a sleep 15 (or 3) *after* udevadm trigger.

Surprisingly, of these three, only (3) worked around the bug.

Another less arbitrary/racy workaround I suggesed was a grep near the
top of write_net_rules' write_rule() function.  Since write_rule()
operates under a lock, this would completely eliminate any kind of race
here. I pitched this to Marco but he wasn't thrilled with the idea -- he
said he'd prefer finding the root cause. I've done the change and tested
it anyway, though, and it successfully aleviates this issue:

diff --git a/debian/extra/write_net_rules b/debian/extra/write_net_rules
index 4379792..fbd1230 100644
--- a/debian/extra/write_net_rules
+++ b/debian/extra/write_net_rules
@@ -60,6 +60,9 @@ write_rule() {
local name=$2
local comment=$3
 
+   # workaround potential races, #765577
+   if grep -q -F $match $RULES_FILE then return; fi
+
{
if [ $PRINT_HEADER ]; then
PRINT_HEADER=

Thanks,
Faidon


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#765577: (no subject)

2015-03-18 Thread Michael Biebl
Am 18.03.2015 um 18:15 schrieb Faidon Liambotis:
 Another less arbitrary/racy workaround I suggesed was a grep near the
 top of write_net_rules' write_rule() function.  Since write_rule()
 operates under a lock, this would completely eliminate any kind of race
 here. I pitched this to Marco but he wasn't thrilled with the idea -- he
 said he'd prefer finding the root cause. I've done the change and tested
 it anyway, though, and it successfully aleviates this issue:

I'm with Marco here. Before adding any workarounds, we need to
understand what the underlying problem is. Otherwise we are adding cruft
which nobody understands anymore a few years from now.

Since I can't reproduce the issue and already have trouble keeping up
with other bug reports, further investigation needs to be done by
someone else.

Regards,
Michael

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?



signature.asc
Description: OpenPGP digital signature


Bug#765577: (no subject)

2015-03-18 Thread Faidon Liambotis
On Wed, Mar 18, 2015 at 06:52:14PM +0100, Michael Biebl wrote:
 I'm with Marco here. Before adding any workarounds, we need to
 understand what the underlying problem is. Otherwise we are adding cruft
 which nobody understands anymore a few years from now.
 
 Since I can't reproduce the issue and already have trouble keeping up
 with other bug reports, further investigation needs to be done by
 someone else.

Well, the root cause IMO is that 75-persistent-net-generator.rules is
inherently susceptible to races. It's my understanding that it's valid
for events to be triggered multiple times -- there are multiple places
in d-i that udevadm trigger is called, including start-udev (as
shipped by udev-udeb) which would trigger another set of add events
beyond the original hotplug one.

This is why write_net_rules operates under a lockfile too (again, AIUI).
It's just that this doesn't fix the race, just limits the race window
significantly but not entirely (as it makes write_net_rules idempotent,
but not 75-persistent-net-generator.rules).

The reason this is found in current jessie might just be that udev got
faster, or more udevadm triggers were added in other parts of the
installer etc.

In any case, if you need more input from a buggy system, feel free to
let me know and I'll try my best to get back to you with more details.

Oh, my original patch was buggy for many reasons, here's the updated
version:

diff --git a/debian/extra/write_net_rules b/debian/extra/write_net_rules
index 4379792..95939b8 100644
--- a/debian/extra/write_net_rules
+++ b/debian/extra/write_net_rules
@@ -117,6 +117,12 @@ fi
 basename=${INTERFACE%%[0-9]*}
 match=$match, KERNEL==\$basename*\
 
+# detect a race and abort, cf. #765577
+if [ -f $RULES_FILE ]  grep -q -F $match $RULES_FILE; then
+   unlock_rules_file
+   exit 0
+fi
+
 if [ $INTERFACE_NAME ]; then
# external tools may request a custom name
COMMENT=$COMMENT (custom name provided by external tool)

Regards,
Faidon


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#765577: (no subject)

2015-02-25 Thread Filippo Giunchedi
On Fri, Feb 06, 2015 at 01:26 AM, Chris Kuehl wrote:
 Hi Christof,
 
 On Fri, Feb 06, 2015 at 08:30:58AM +0100, Christof Böckler wrote:
  Maybe it is has to do with the presence of a second networking device?
 
 The devices I tested on (which had a 100% reproducibility rate) all had
 a single network card. So maybe it's possible that's the reason you
 don't see the same level of occurrence.

FWIW we're running into the same bug with jessie installer, passing
'debug' at boot apparently is enough to not trigger the race with good
success rate. Unfortunately it isn't a very good workaround, it needs to
be undone e.g. in /etc/default/grub after installation. See also
https://phabricator.wikimedia.org/T90236


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#765577: (no subject)

2015-02-06 Thread Chris Kuehl
Hi Christof,

On Fri, Feb 06, 2015 at 08:30:58AM +0100, Christof Böckler wrote:
 Maybe it is has to do with the presence of a second networking device?

The devices I tested on (which had a 100% reproducibility rate) all had
a single network card. So maybe it's possible that's the reason you
don't see the same level of occurrence.

Chris


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#765577: (no subject)

2015-02-05 Thread Christof Böckler
Being redirected from my bugreport #777126 I can confirm that this 
happens with automatic installations.
Although it does not happen all the time. With a Lenovo Thinkpad X61 for 
example, I got this /etc/udev/rules.d/70-persistent-net.rules:


# PCI device 0x8086:0x1049 (e1000e)
SUBSYSTEM==net, ACTION==add, DRIVERS==?*, 
ATTR{address}==00:16:d3:ca:56:05, ATTR{dev_id}==0x0, 
ATTR{type}==1, KERNEL==eth*, NAME=eth0


# PCI device 0x8086:0x4230 (iwl4965)
SUBSYSTEM==net, ACTION==add, DRIVERS==?*, 
ATTR{address}==00:13:e8:ab:5d:43, ATTR{dev_id}==0x0, 
ATTR{type}==1, KERNEL==wlan*, NAME=wlan0


Maybe it is has to do with the presence of a second networking device?

Greetings
Christof


--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org