Bug#787191: Bug#776192: Linux null-pointer deref in 3.16.7-ctk2-1 (was: Bug#776192: upgrade-reports wheezy to jessie boot problem)

2015-08-25 Thread Faidon Liambotis
Hey,

On Tue, Jun 16, 2015 at 06:21:20PM +0300, Faidon Liambotis wrote:
> Any news about this? Can I help in any way?

Are there any objections/holdups here? It'd be great if this made it to
8.2, the deadline for which is this weekend AIUI. Let me know if I can
help! :)

Faidon

___
Pkg-systemd-maintainers mailing list
Pkg-systemd-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-systemd-maintainers


Bug#765577: (no subject)

2015-04-22 Thread Faidon Liambotis
reopen 765577 !
found 765577 215-14
thanks

On Mon, Mar 30, 2015 at 06:06:47AM +0200, Marco d'Itri wrote:
> I see that we have independently devised the same fix, I am attaching
> a test case and a more refined version of your patch.

I tried Jessie RC3 today and immediately found that the fix is,
unfortunately, buggy. Your patch constructs a regexp and takes care to
escape the metacharacters "?" and "*" with sed, but does not escape "{"
and "}", which are also metacharacters in POSIX extended regexps.
These always appear in the string to be matched here, as in
'ATTR{dev_id}=="0x0"' and 'ATTR{type}=="1"', so the if always fails.

This was likely not caught by your test case (and was harder to debug
and figure out!) because GNU grep's -E mode handles { as both a literal
and a metacharacter heuristically for historic reasons (consult grep's
manpage for that) but busybox grep does not:
  $ echo 'foo{bar}' > test
  $ egrep 'foo{bar}' test 
  foo{bar}
  $ busybox egrep 'foo{bar}' test 
  egrep: bad regex 'foo{bar}'
  $ egrep 'fo{1,2}' test 
  foo{bar}
  $ busybox egrep 'fo{1,2}' test 
  foo{bar}
Note that this is NOT a bug in busybox; foo{bar} is indeed an invalid
extended POSIX regexp and busybox is right to complain and error out.
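
To illustrate the escaping, here is a minimal sketch, not the actual
write_net_rules code. escape_ere is a hypothetical helper name and the
rule fragment is only an example built from the match keys above. It
escapes the same four characters as the fix below; other ERE
metacharacters (".", "+", "(", ")", "|", "[", "^", "$") would need the
same treatment if they could appear in the rule.

```shell
#!/bin/sh
# escape_ere: hypothetical helper (not part of write_net_rules) that
# backslash-escapes ?, *, { and } so an ERE treats them as literals.
escape_ere() {
    printf '%s\n' "$1" | sed -E 's/([?*{}])/\\\1/g'
}

# Illustrative rule fragment using the match keys quoted above.
rule='ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*"'
pattern=$(escape_ere "$rule")

# Show the escaped pattern, then count the lines it matches when the
# literal rule text is fed to grep -E.
printf '%s\n' "$pattern"
printf '%s\n' "$rule" | grep -E -c -- "^$pattern"
```

With the braces escaped, the pattern should no longer depend on GNU
grep's lenient handling of a stray "{".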

The very minimal last-minute fix below did the trick for me but I have
to say... constructing regexps in shell is tricky and the whole
escaping-with-sed logic feels like a hack. I think a literal grep (i.e.
-F) would be better here, especially since I don't see the point of an
exact match (even if the file was modified by the sysadmin, the right
thing would be to not write a new rule anyway). This is probably something
to be considered post-jessie.
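
A rough sketch of what the literal-match idea could look like, assuming
a fixed-string substring match is acceptable. rule_already_written is a
hypothetical helper name, and the rules line is an illustrative example
rather than real udev output.

```shell
#!/bin/sh
# rule_already_written: hypothetical helper; succeeds if the file ($2)
# contains the text ($1) as a fixed string anywhere on a line.
rule_already_written() {
    grep -q -F -- "$1" "$2"
}

# Illustrative rules file with a single, made-up persistent-net rule.
rules_file=$(mktemp)
printf '%s\n' 'SUBSYSTEM=="net", ACTION=="add", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"' > "$rules_file"

# No escaping is needed: -F treats {, }, ? and * as ordinary characters.
match='ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*"'
if rule_already_written "$match" "$rules_file"; then
    echo "rule present, not writing a new one"
fi
rm -f "$rules_file"
```

Since -F disables regexp interpretation entirely, the sed escaping step
goes away, at the cost of no longer anchoring the match to the start of
the line.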

Thanks,
Faidon

diff --git a/debian/extra/write_net_rules b/debian/extra/write_net_rules
index 38a3ca0..fedc0f1 100644
--- a/debian/extra/write_net_rules
+++ b/debian/extra/write_net_rules
@@ -118,7 +118,7 @@ basename=${INTERFACE%%[0-9]*}
 match="$match, KERNEL==\"$basename*\""
 
 # build a regular expression that matches the new rule that we want to write
-new_rule_pattern=$(echo "^SUBSYSTEM==\"net\", ACTION==\"add\"$match" | sed -re 's/([\?\*])/\\\1/g')
+new_rule_pattern=$(echo "^SUBSYSTEM==\"net\", ACTION==\"add\"$match" | sed -re 's/([\?\*\{\}])/\\\1/g')
 
 # Double check if the new rule has already been written. This happens if
 # multiple add events are generated before the script returns and udevd



Bug#781210: systemd asserts on function cg_is_empty_recursive, crashes

2015-03-27 Thread Faidon Liambotis
Hi Martin,

On Fri, Mar 27, 2015 at 04:40:25PM +0100, Martin Pitt wrote:
> > If so, a mere "ipsec stop" after that should be able to crash
> > systemd.
>
> Not that, it just marks the unit as stopped but keeps the processes
> running. But killing the two daemons manually makes the cgroup empty
> and I get that very exception.

I *think* you read "systemctl stop ipsec" while I really meant "ipsec
stop" (ipsec being /usr/sbin/ipsec, and "stop" being an action that
sends SIGTERM to the daemons, among other things).

By "get that very exception" you mean that systemd crashes for you as
well? If so, that's great :) Anything more I can do to help then? You
seem to be in a better position to reproduce than me at the moment.

On a side note, I've noticed that if I put the system under "stress
--cpu 8" the behavior changes and "systemctl restart strongswan" works
properly. This definitely points to some kind of race.

Thanks!
Faidon



Bug#781210: systemd asserts on function cg_is_empty_recursive, crashes

2015-03-26 Thread Faidon Liambotis
Hi Martin!

On Thu, Mar 26, 2015 at 03:17:16PM +0100, Martin Pitt wrote:
> Control: severity -1 important
>
> I downgrade the severity to important as per
> https://www.debian.org/Bugs/Developer#severities (and with #781209 we
> have the bug that triggers this one); nevertheless, this is still an
> important issue of course.

Well, this does make the whole system break (the system needs a reboot
to properly function again; daemon-reexec didn't work). I was about to
deploy this to a large fleet of machines and having to reboot all of
them would be quite catastrophic. I think it deserves to at least be RC.

> So cgroup in manager_notify_cgroup_empty() is valid, and
> manager_get_unit_by_cgroup(m, cgroup) returns some unit u, but
> u->cgroup_path is NULL. I suppose u->id is "ipsec.service" (if you can
> easily reproduce this, confirming this in gdb would be appreciated),
> so as the first iteration this smells like a bug in
> the cgroup_unit hashmap maintenance.
>
> I don't get quite the same effect as you, but I can reproduce the
> wrong cgroup and that "systemctl restart strongswan" leaves the old
> processes around and does not actually kill them. I don't get the
> assertion or crash, though.
>
> 219 in experimental behaves much better, the processes get put into
> the strongswan.service cgroup, and stopping, starting, restarting
> works properly. Can you confirm this?

I haven't been able to reproduce it again :/ I must be missing
something, as I was able to reproduce it multiple times on two different
servers yesterday. "systemctl restart strongswan" does not leave any
processes behind in my runs (I wrote something to do the same sequence
of events in a loop).

Can you confirm that in your case "systemctl restart strongswan" leaves
unmanageable processes behind (i.e. the ipsec binaries you see do *not*
have --nofork as an argument)? If so, a mere "ipsec stop" after that
should be able to crash systemd.

Thanks,
Faidon



Bug#781210: systemd asserts on function cg_is_empty_recursive, crashes

2015-03-25 Thread Faidon Liambotis
Package: systemd
Version: 215-12
Severity: critical

I've managed to reproducibly crash systemd:

# grep systemd /var/log/syslog | tail -3
Mar 26 01:02:15 curium systemd[1]: Assertion 'path' failed at ../src/shared/cgroup-util.c:913, function cg_is_empty_recursive().  Aborting.
Mar 26 01:02:15 curium systemd[1]: Caught ABRT, dumped core as pid 6916.
Mar 26 01:02:15 curium systemd[1]: Freezing execution.

After that, the system remains functioning (i.e. pid1 stays alive and
the kernel does not panic) but systemctl etc. do not respond ("Failed to
list units: Connection timed out") and the system as a whole is pretty
useless until a reboot.

This is hard to trigger as it happens under very specific conditions
plus a race, but I've managed to reproduce it five times already on two
different servers.

The gory details and steps to reproduce are over at #781209, but in
short:

- strongswan-starter ships an init script /etc/init.d/ipsec and a system
  unit file named strongswan.service but containing Alias=ipsec.service.

- strongswan-starter's postinst is buggy and calls "invoke-rc.d ipsec
  start" manually before the systemd unit is fully set up.

- This results in the ipsec daemons actually starting up in an
  ipsec.service cgroup, as evidenced by e.g. "systemctl status".

- A subsequent "systemctl restart strongswan" almost always results in
  the service becoming inactive and the processes under the
  ipsec.service cgroup being killed.

  Sometimes, though, the service gets into an "inactive (dead)" state
  but the processes from the (wrong) cgroup stay up. This possibly
  happens because systemd tries to set up a strongswan.service cgroup?

- At that point, the processes are orphaned and lost from systemd's
  control and are completely unmanageable by systemctl.

- Killing them by hand (e.g. via kill or "ipsec stop") crashes systemd.

A gdb "bt full" is attached.

Faidon
(gdb) bt full
#0  0x7f8fb8b0d79b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
        resultvar = 0
        pid = <optimized out>
#1  0x7f8fb8f633d8 in crash.lto_priv.234 (sig=6) at ../src/core/main.c:158
        rl = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        sa = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x0}
        __func__ = "crash"
        __PRETTY_FUNCTION__ = "crash"
#2  <signal handler called>
No locals.
#3  0x7f8fb878a107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 1
        selftid = 1
#4  0x7f8fb878b4e8 in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7ffd56ee0ae4, sa_sigaction = 0x7ffd56ee0ae4}, sa_mask = {__val = {140255286606672, 140255286672192, 140726061894624, 140255287343296, 5990105739957488896, 140255286672192, 140255260941991, 140255260944944, 2, 1, 140255287343248, 913, 140255260573900, 140255263445632, 140255287195268, 140255286635808}}, sa_flags = -1165411504, sa_restorer = 0x7f8fba8a3b40}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#5  0x7f8fb8f9aed2 in log_assert_failed (text=text@entry=0x7f8fb901c736 "path", file=file@entry=0x7f8fb9019ea7 "../src/shared/cgroup-util.c", line=line@entry=913, func=func@entry=0x7f8fb901aa30 <__PRETTY_FUNCTION__.8851> "cg_is_empty_recursive") at ../src/shared/log.c:709
No locals.
#6  0x7f8fb8f6cc8f in cg_is_empty_recursive.constprop.53 (path=0x0, ignore_self=ignore_self@entry=true, controller=<synthetic pointer>) at ../src/shared/cgroup-util.c:913
        d = 0x0
        fn = 0x7f8fba8a4d90 "\001"
        r = <optimized out>
#7  0x7f8fb8ff21c3 in manager_notify_cgroup_empty (cgroup=<optimized out>, m=0x7f8fba893b50) at ../src/core/cgroup.c:978
        u = 0x7f8fba894d30
        r = <optimized out>
#8  signal_agent_released (bus=<optimized out>, message=0x7f8fba8a3b40, userdata=0x7f8fba893b50, error=<optimized out>) at ../src/core/dbus.c:90
        m = 0x7f8fba893b50
        cgroup = 0x7f8fba923684 "/system.slice/ipsec.service"
        r = <optimized out>
        __PRETTY_FUNCTION__ = "signal_agent_released"
        __func__ = "signal_agent_released"
#9  0x7f8fb9009137 in bus_match_run (bus=0x7f8fba8c2970, node=0x7f8fba90f260, m=0x7f8fba8a3b40) at ../src/libsystemd/sd-bus/bus-match.c:299
        error_buffer = {name = 0x0, message = 0x0, _need_free = 0}
        slot = 0x7f8fba947890
        test_str = <optimized out>
        test_u8 = <optimized out>
        r = <optimized out>
        m = 0x7f8fba8a3b40
        node = 0x7f8fba90f260
        bus = <optimized out>
        __PRETTY_FUNCTION__ = "bus_match_run"
#10 0x7f8fb9008dae in bus_match_run (bus=0x7f8fba8c2970, node=0x7f8fba9242c0, m=0x7f8fba8a3b40) at ../src/libsystemd/sd-bus/bus-match.c:391
        test_str = <optimized out>
        test_u8 = <optimized out>
        r = <optimized out>
        m = 0x7f8fba8a3b40
        node =