Bug#787191: Bug#776192: Linux null-pointer deref in 3.16.7-ctk2-1 (was: Bug#776192: upgrade-reports wheezy to jessie boot problem)
Hey,

On Tue, Jun 16, 2015 at 06:21:20PM +0300, Faidon Liambotis wrote:
> Any news about this? Can I help in any way?

Are there any objections/holdups here? It'd be great if this made it to 8.2, the deadline for which is this weekend AIUI.

Let me know if I can help! :)

Faidon
___
Pkg-systemd-maintainers mailing list
Pkg-systemd-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-systemd-maintainers
Bug#765577: (no subject)
reopen 765577 !
found 765577 215-14
thanks

On Mon, Mar 30, 2015 at 06:06:47AM +0200, Marco d'Itri wrote:
> I see that we have independently devised the same fix, I am attaching a test case and a more refined version of your patch.

I tried Jessie RC3 today and immediately found that the fix is, unfortunately, buggy.

Your patch constructs a regexp and takes care to escape the metacharacters ? and * with sed, but it does not escape { and }, which are also metacharacters in POSIX extended regexps. These are always present in the string to be matched here, in 'ATTR{dev_id}==0x0' and 'ATTR{type}==1', so the if always fails.

This was likely not caught by your test case (and was harder to debug and figure out!) because GNU grep's -E mode heuristically handles { as either a literal or a metacharacter, for historic reasons (consult grep's manpage for that), but busybox grep does not:

    $ echo 'foo{bar}' > test
    $ egrep 'foo{bar}' test
    foo{bar}
    $ busybox egrep 'foo{bar}' test
    egrep: bad regex 'foo{bar}'
    $ egrep 'fo{1,2}' test
    foo{bar}
    $ busybox egrep 'fo{1,2}' test
    foo{bar}

Note that this is NOT a bug in busybox; foo{bar} is indeed an invalid extended POSIX regexp and busybox is right to complain and error out.

The very minimal last-minute fix below did the trick for me but I have to say... constructing regexps in shell is tricky and the whole escaping-with-sed logic feels like a hack. I think a literal grep (i.e. -F) would be better here, especially since I don't see the point of an exact match (even if the file was modified by the sysadmin, the right thing would be to not write a new rule anyway). This is probably something to be considered post-jessie.
Thanks,
Faidon

diff --git a/debian/extra/write_net_rules b/debian/extra/write_net_rules
index 38a3ca0..fedc0f1 100644
--- a/debian/extra/write_net_rules
+++ b/debian/extra/write_net_rules
@@ -118,7 +118,7 @@ basename=${INTERFACE%%[0-9]*}
 match="$match, KERNEL==\"$basename*\""

 # build a regular expression that matches the new rule that we want to write
-new_rule_pattern=$(echo "^SUBSYSTEM==\"net\", ACTION==\"add\"$match" | sed -re 's/([\?\*])/\\\1/g')
+new_rule_pattern=$(echo "^SUBSYSTEM==\"net\", ACTION==\"add\"$match" | sed -re 's/([\?\*\{\}])/\\\1/g')

 # Double check if the new rule has already been written. This happens if
 # multiple add events are generated before the script returns and udevd
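
To illustrate the literal-grep idea mentioned above, here is a minimal sketch of what a fixed-string check could look like. It is not the patch that was applied and not code from write_net_rules: the helper name rule_already_written, the sample rule text and the temporary file are all made up for the example. The point is only that with -F the braces, ? and * are matched literally, so no sed escaping step is needed, and the behaviour no longer depends on how a given grep treats { in -E mode.

```shell
#!/bin/sh
# Sketch only: the rule text and the temporary file are hypothetical.
rules_file=$(mktemp)
rule='SUBSYSTEM=="net", ACTION=="add", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*"'
printf '%s, NAME="eth0"\n' "$rule" > "$rules_file"

# Succeeds if the file contains the given text literally (-F), so characters
# such as ?, *, { and } need no escaping. Hypothetical helper name.
rule_already_written() {
    grep -qF -- "$1" "$2"
}

if rule_already_written "$rule" "$rules_file"; then
    found=yes
else
    found=no
fi

# A match key for a different (made-up) device type should not be found.
if rule_already_written 'ATTR{type}=="32"' "$rules_file"; then
    other=yes
else
    other=no
fi

echo "same rule: $found, different rule: $other"
rm -f "$rules_file"
```

Because this is a substring match rather than an anchored one, a rule that the sysadmin has edited after the match keys would still count as already written, which is the behaviour argued for above.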
Bug#781210: systemd asserts on function cg_is_empty_recursive, crashes
Hi Martin,

On Fri, Mar 27, 2015 at 04:40:25PM +0100, Martin Pitt wrote:
> > If so, a mere ipsec stop after that should be able to crash systemd.
>
> Not that, it just marks the unit as stopped but keeps the processes running. But killing the two daemons manually makes the cgroup empty and I get that very exception.

I *think* you read "systemctl stop ipsec" while I really meant "ipsec stop" (ipsec being /usr/sbin/ipsec, and stop being an action that sends SIGTERM to the daemons, among other things).

By "get that very exception" you mean that systemd crashes for you as well? If so, that's great :) Anything more I can do to help then? You seem to be in a better position to reproduce than me at the moment.

On a side note, I've noticed that if I put the system under "stress --cpu 8" the behavior changes and systemctl restart strongswan works properly. This definitely points to some kind of race.

Thanks!
Faidon
Bug#781210: systemd asserts on function cg_is_empty_recursive, crashes
Hi Martin!

On Thu, Mar 26, 2015 at 03:17:16PM +0100, Martin Pitt wrote:
> Control: severity -1 important
>
> I downgrade the severity to important as per https://www.debian.org/Bugs/Developer#severities (and with #781209 we have the bug that triggers this one); nevertheless, this is still an important issue of course.

Well, this does make the whole system break (the system needs a reboot to properly function again; daemon-reexec didn't work). I was about to deploy this to a large fleet of machines and having to reboot all of them would be quite catastrophic. I think it deserves to at least be RC.

> So cgroup in manager_notify_cgroup_empty() is valid, and manager_get_unit_by_cgroup(m, cgroup) returns some unit u, but u->cgroup_path is NULL. I suppose u->id is ipsec.service (if you can easily reproduce this, confirming this in gdb would be appreciated), so as the first iteration this smells like a bug in the cgroup_unit hashmap maintenance.
>
> I don't get quite the same effect as you, but I can reproduce the wrong cgroup and that systemctl restart strongswan leaves the old processes around and does not actually kill them. I don't get the assertion or crash, though. 219 in experimental behaves much better, the processes get put into the strongswan.service cgroup, and stopping, starting, restarting works properly. Can you confirm this?

I haven't been able to reproduce it again :/ I must be missing something, as I was able to reproduce it multiple times on two different servers yesterday. systemctl restart strongswan does not leave any processes behind in my runs (I wrote something to do the same sequence of events in a loop).

Can you confirm that in your case systemctl restart strongswan leaves unmanageable processes behind (i.e. the ipsec binaries you see do *not* have --nofork as an argument)? If so, a mere ipsec stop after that should be able to crash systemd.
Thanks,
Faidon
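
For anyone trying to confirm the "wrong cgroup" state discussed above, a rough sketch of how one might read the systemd cgroup of a leftover process follows. It assumes the named name=systemd hierarchy appears in /proc/PID/cgroup, whose lines have the form hierarchy-ID:controller-list:cgroup-path. The helper name systemd_cgroup_of and the sample file contents are hypothetical stand-ins so that the snippet runs anywhere; they are not output captured from the affected machines.

```shell
#!/bin/sh
# Print the cgroup path of the name=systemd hierarchy from a
# /proc/PID/cgroup-style file (lines: hierarchy-ID:controllers:path).
# Hypothetical helper name, for illustration only.
systemd_cgroup_of() {
    sed -n 's/^[0-9]*:name=systemd://p' "$1"
}

# Made-up sample standing in for /proc/PID/cgroup of a leftover daemon.
sample=$(mktemp)
cat > "$sample" <<'EOF'
8:perf_event:/
2:cpu,cpuacct:/
1:name=systemd:/system.slice/ipsec.service
EOF

cgroup=$(systemd_cgroup_of "$sample")
echo "$cgroup"
rm -f "$sample"
```

On a live system the same function could be pointed at /proc/PID/cgroup for each leftover daemon, to see whether it sits under ipsec.service or strongswan.service.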
Bug#781210: systemd asserts on function cg_is_empty_recursive, crashes
Package: systemd
Version: 215-12
Severity: critical

I've managed to reproducibly crash systemd:

# grep systemd /var/log/syslog | tail -3
Mar 26 01:02:15 curium systemd[1]: Assertion 'path' failed at ../src/shared/cgroup-util.c:913, function cg_is_empty_recursive(). Aborting.
Mar 26 01:02:15 curium systemd[1]: Caught ABRT, dumped core as pid 6916.
Mar 26 01:02:15 curium systemd[1]: Freezing execution.

After that, the system remains functioning (i.e. pid1 stays alive and the kernel does not panic) but systemctl etc. do not respond ("Failed to list units: Connection timed out") and the system as a whole is pretty useless until a reboot.

This is hard to trigger as it happens under very specific conditions plus a race, but I've managed to reproduce it five times already on two different servers.

The gory details and steps to reproduce are over at #781209, but in short:
 - strongswan-starter ships an init script /etc/init.d/ipsec and a system unit file named strongswan.service but containing Alias=ipsec.service.
 - strongswan-starter's postinst is buggy and calls invoke-rc.d ipsec start manually before the systemd unit is fully set up.
 - This results in the ipsec daemons actually starting up in an ipsec.service cgroup, as evidenced by e.g. a systemctl status.
 - A subsequent systemctl restart strongswan almost always results in the service becoming inactive and the processes under the ipsec.service cgroup being killed. Sometimes, though, the service gets into an inactive (dead) state but the processes from the (wrong) cgroup stay up. This possibly happens because systemd tries to set up a strongswan.service cgroup?
 - At that point, the processes are orphaned and lost from systemd's control and are completely unmanageable by systemctl.
 - Killing them by hand (e.g. via kill or ipsec stop) crashes systemd.

A gdb bt full is attached.
Faidon

(gdb) bt full
#0  0x7f8fb8b0d79b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
        resultvar = 0
        pid = <optimized out>
#1  0x7f8fb8f633d8 in crash.lto_priv.234 (sig=6) at ../src/core/main.c:158
        rl = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        sa = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x0}
        __func__ = "crash"
        __PRETTY_FUNCTION__ = "crash"
#2  <signal handler called>
No locals.
#3  0x7f8fb878a107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 1
        selftid = 1
#4  0x7f8fb878b4e8 in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7ffd56ee0ae4, sa_sigaction = 0x7ffd56ee0ae4}, sa_mask = {__val = {140255286606672, 140255286672192, 140726061894624, 140255287343296, 5990105739957488896, 140255286672192, 140255260941991, 140255260944944, 2, 1, 140255287343248, 913, 140255260573900, 140255263445632, 140255287195268, 140255286635808}}, sa_flags = -1165411504, sa_restorer = 0x7f8fba8a3b40}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#5  0x7f8fb8f9aed2 in log_assert_failed (text=text@entry=0x7f8fb901c736 "path", file=file@entry=0x7f8fb9019ea7 "../src/shared/cgroup-util.c", line=line@entry=913, func=func@entry=0x7f8fb901aa30 <__PRETTY_FUNCTION__.8851> "cg_is_empty_recursive") at ../src/shared/log.c:709
No locals.
#6  0x7f8fb8f6cc8f in cg_is_empty_recursive.constprop.53 (path=0x0, ignore_self=ignore_self@entry=true, controller=<synthetic pointer>) at ../src/shared/cgroup-util.c:913
        d = 0x0
        fn = 0x7f8fba8a4d90 "\001"
        r = <optimized out>
#7  0x7f8fb8ff21c3 in manager_notify_cgroup_empty (cgroup=<optimized out>, m=0x7f8fba893b50) at ../src/core/cgroup.c:978
        u = 0x7f8fba894d30
        r = <optimized out>
#8  signal_agent_released (bus=<optimized out>, message=0x7f8fba8a3b40, userdata=0x7f8fba893b50, error=<optimized out>) at ../src/core/dbus.c:90
        m = 0x7f8fba893b50
        cgroup = 0x7f8fba923684 "/system.slice/ipsec.service"
        r = <optimized out>
        __PRETTY_FUNCTION__ = "signal_agent_released"
        __func__ = "signal_agent_released"
#9  0x7f8fb9009137 in bus_match_run (bus=0x7f8fba8c2970, node=0x7f8fba90f260, m=0x7f8fba8a3b40) at ../src/libsystemd/sd-bus/bus-match.c:299
        error_buffer = {name = 0x0, message = 0x0, _need_free = 0}
        slot = 0x7f8fba947890
        test_str = <optimized out>
        test_u8 = <optimized out>
        r = <optimized out>
        m = 0x7f8fba8a3b40
        node = 0x7f8fba90f260
        bus = <optimized out>
        __PRETTY_FUNCTION__ = "bus_match_run"
#10 0x7f8fb9008dae in bus_match_run (bus=0x7f8fba8c2970, node=0x7f8fba9242c0, m=0x7f8fba8a3b40) at ../src/libsystemd/sd-bus/bus-match.c:391
        test_str = <optimized out>
        test_u8 = <optimized out>
        r = <optimized out>
        m = 0x7f8fba8a3b40
        node =