Re: [systemd-devel] systemd freezes after rshd execution, if network connection is down
Now I've tested it, and it fixed the problem :)

Thanks,
Jimmy

On Tue, May 13, 2014 at 11:31 PM, Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl wrote:
> Can you check if this patch fixes the problem:
>
> Subject: [PATCH] core: close socket fds asynchronously
> http://lists.freedesktop.org/archives/systemd-devel/2014-April/018928.html

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd freezes after rshd execution, if network connection is down
On Thu, May 15, 2014 at 10:40:48AM +0200, Jimmy Assarsson wrote:
> Now I've tested it, and it fixed the problem :)

Great, thanks for testing. I'll push it to master then.

Zbyszek
Re: [systemd-devel] systemd freezes after rshd execution, if network connection is down
Hi,

We will check it. Meanwhile, I believe you can also try it since the
problem is arch independent. Just follow Jimmy Assarsson's instructions.

Umut

On Tue, May 13, 2014 at 11:31 PM, Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl wrote:
> Can you check if this patch fixes the problem:
>
> Subject: [PATCH] core: close socket fds asynchronously
> http://lists.freedesktop.org/archives/systemd-devel/2014-April/018928.html
Re: [systemd-devel] systemd freezes after rshd execution, if network connection is down
On Wed, May 14, 2014 at 09:05:12AM +0200, Umut Tezduyar Lindskog wrote:
> Hi,
>
> We will check it. Meanwhile,

Thanks.

> I believe you can also try it since the problem is arch independent.
> Just follow Jimmy Assarsson's instructions.

I know, I'm just being lazy :)

Zbyszek
Re: [systemd-devel] systemd freezes after rshd execution, if network connection is down
It is also reproducible by just losing the carrier on the link. Maybe
the new async close is a candidate to solve it.

On Tuesday, April 29, 2014, Harald Hoyer harald.ho...@gmail.com wrote:
> On 28.04.2014 13:33, Jimmy Assarsson wrote:
> > We stumbled upon a freeze/block in systemd. The problem occurs when
> > an rshd (socket activated) execution is completed, the network
> > connection is down and systemd is closing the socket. This causes a
> > long (60 seconds) freeze where it's not possible to communicate with
> > systemd.
>
> Hmm, reminds me of:
>
> http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required
> http://oroboro.com/dealing-with-network-port-abuse-in-sockets-in-c/
Re: [systemd-devel] systemd freezes after rshd execution, if network connection is down
On Tue, May 13, 2014 at 07:40:53PM +0200, Umut Tezduyar Lindskog wrote:
> It is also reproducible by just losing the carrier on the link. Maybe
> the new async close is a candidate to solve it.
>
> On Tuesday, April 29, 2014, Harald Hoyer harald.ho...@gmail.com wrote:
> > On 28.04.2014 13:33, Jimmy Assarsson wrote:
> > > We stumbled upon a freeze/block in systemd. The problem occurs when
> > > an rshd (socket activated) execution is completed, the network
> > > connection is down and systemd is closing the socket. This causes a
> > > long (60 seconds) freeze where it's not possible to communicate
> > > with systemd. Do you have any idea on what is causing this or how
> > > we can investigate this further?

Can you check if this patch fixes the problem:

--
Subject: [PATCH] core: close socket fds asynchronously

http://lists.freedesktop.org/archives/systemd-devel/2014-April/018928.html
---
 src/core/service.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/core/service.c b/src/core/service.c
index 694a265..7461ec3 100644
--- a/src/core/service.c
+++ b/src/core/service.c
@@ -27,6 +27,7 @@
 #include <linux/reboot.h>
 #include <sys/syscall.h>
 
+#include "async.h"
 #include "manager.h"
 #include "unit.h"
 #include "service.h"
@@ -222,7 +223,7 @@ static void service_close_socket_fd(Service *s) {
         if (s->socket_fd < 0)
                 return;
 
-        s->socket_fd = safe_close(s->socket_fd);
+        s->socket_fd = asynchronous_close(s->socket_fd);
 }
 
 static void service_connection_unref(Service *s) {
@@ -2705,7 +2706,7 @@ static int service_deserialize_item(Unit *u, const char *key, const char *value,
                         log_debug_unit(u->id, "Failed to parse socket-fd value %s", value);
                 else {
-                        safe_close(s->socket_fd);
+                        asynchronous_close(s->socket_fd);
                         s->socket_fd = fdset_remove(fds, fd);
                 }
         } else if (streq(key, "main-exec-status-pid")) {
--
1.9.0

--
Thanks,
Zbyszek
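The idea behind the patch can be sketched in isolation. The following is only an illustration and not systemd's actual asynchronous_close() implementation: the helper name close_in_background(), the use of a detached pthread and the synchronous fallback are all assumptions made for this example. It hands the descriptor to a separate thread so that a close() that blocks in the kernel cannot stall the caller's event loop.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Thread body: run the potentially blocking close() off the main thread. */
static void *close_thread(void *arg) {
    int fd = (int)(intptr_t)arg;
    close(fd);
    return NULL;
}

/*
 * Hypothetical helper (not a systemd API): close fd without blocking the
 * caller. Always returns -1, so callers can write
 *     fd = close_in_background(fd);
 * in the same way the patch assigns the result back to s->socket_fd.
 */
int close_in_background(int fd) {
    if (fd >= 0) {
        int saved_errno = errno;   /* keep the caller's errno intact */
        pthread_attr_t attr;
        pthread_t thread;
        int r = pthread_attr_init(&attr);

        if (r == 0) {
            /* Detached: nobody has to join the thread later. */
            pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
            r = pthread_create(&thread, &attr, close_thread,
                               (void *)(intptr_t)fd);
            pthread_attr_destroy(&attr);
        }

        /* Assumed fallback: if no thread could be started, close inline. */
        if (r != 0)
            close(fd);

        errno = saved_errno;
    }
    return -1;
}
```

With such a helper, the blocking wait still happens, but in a thread that nothing else depends on, so the main loop stays responsive to requests such as systemctl list-jobs.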
Re: [systemd-devel] systemd freezes after rshd execution, if network connection is down
On 28.04.2014 13:33, Jimmy Assarsson wrote:
> We stumbled upon a freeze/block in systemd. The problem occurs when
> an rshd (socket activated) execution is completed, the network
> connection is down and systemd is closing the socket. This causes a
> long (60 seconds) freeze where it's not possible to communicate with
> systemd.
>
> By looking at the stack trace (see below), one can see that we are
> trying to close a socket and waiting on a system close call. So it's
> probably not a systemd problem, however systemd is affected by it.

Hmm, reminds me of:

http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required
http://oroboro.com/dealing-with-network-port-abuse-in-sockets-in-c/
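Both links above are about SO_LINGER. When lingering is enabled on a socket, close() may block for up to l_linger seconds while unsent data remains. Whether the socket handed over by rshd actually has this option set is not confirmed anywhere in this thread, so the following is only a sketch of how one could check it; the helper name linger_block_seconds() is made up for this example.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: report how long close() may block on fd because
 * of SO_LINGER. Returns the linger timeout in seconds, 0 if lingering
 * is disabled, or -1 if getsockopt() fails (errno is set by the call).
 */
int linger_block_seconds(int fd) {
    struct linger lg;
    socklen_t len = sizeof(lg);

    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, &len) < 0)
        return -1;

    /* l_onoff == 0 means close() returns immediately. */
    return lg.l_onoff ? lg.l_linger : 0;
}
```

A result of 60 on the affected connection would match the length of the observed freeze, but that remains to be verified on a real system.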
[systemd-devel] systemd freezes after rshd execution, if network connection is down
Hi,

We stumbled upon a freeze/block in systemd. The problem occurs when an
rshd (socket activated) execution is completed, the network connection
is down and systemd is closing the socket. This causes a long (60
seconds) freeze where it's not possible to communicate with systemd.
Do you have any idea on what is causing this or how we can investigate
this further?

To reproduce the problem:
1) Get latest Arch Linux
2) On remote machine execute
   rsh $target_ip -l root 'sleep 40'
3) Set link down on the interface which is assigned with $target_ip, on
   systemd machine
   ip link set down dev $if
4) On systemd machine, wait for 'sleep 40' to be completed. Then execute
   any systemd command
   systemctl list-jobs
5) After 60 seconds systemd is responding again

By looking at the stack trace (see below), one can see that we are
trying to close a socket and waiting on a system close call. So it's
probably not a systemd problem, however systemd is affected by it.

We've successfully reproduced the problem on different hardware
architectures (x86_64, arm, cris), systemd versions (208, 210, 212) and
rshd implementations (netkit-rsh-0.17, inetutils 1.9.2-1).
The problem occurs not only when the interface's link is set down, but
also when the IP address is removed or the ethernet cable is unplugged.
ssh does not seem to be affected by the problem.

We generated a core dump:
kill -SIGABRT 1

Here is the stack trace (the machine is running systemd 210).

(gdb) bt
#0  0xb6f4d830 in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:46
#1  0x000527e8 in crash.4282 (sig=6) at apps/systemd/systemd/src/core/main.c:156
#2  <signal handler called>
#3  0xb6f4c28c in close () from target/armv6-axis-linux-gnueabi/lib/libpthread.so.0
#4  0x0009417c in close_nointr (fd=<optimized out>) at apps/systemd/systemd/src/shared/util.c:167
#5  0x00094250 in close_nointr_nofail (fd=<optimized out>) at apps/systemd/systemd/src/shared/util.c:191
#6  0x00073e0c in service_close_socket_fd.9824 (s=s@entry=0x1b6f918) at apps/systemd/systemd/src/core/service.c:229
#7  0x00079728 in service_set_state.9835 (s=s@entry=0x1b6f918, state=SERVICE_DEAD) at apps/systemd/systemd/src/core/service.c:1496
#8  0x00079b70 in service_enter_dead.9847 (s=0x1b6f918, f=<optimized out>, allow_restart=<optimized out>) at apps/systemd/systemd/src/core/service.c:1852
#9  0x00065470 in service_sigchld_event (u=0x1b6f918, pid=<optimized out>, code=1, status=0) at apps/systemd/systemd/src/core/service.c:3037
#10 0x00073490 in invoke_sigchld_event.5410 (m=m@entry=0x1ad7360, u=0x1b6f918, si=0xbe862670, si@entry=0xbe862668) at apps/systemd/systemd/src/core/manager.c:1430
#11 0x00054084 in manager_dispatch_sigchld.5415 (m=m@entry=0x1ad7360) at apps/systemd/systemd/src/core/manager.c:1477
#12 0x000629b0 in manager_dispatch_signal_fd.part.32 (userdata=<optimized out>) at apps/systemd/systemd/src/core/manager.c:1723
#13 manager_dispatch_signal_fd.5363 (source=<optimized out>, fd=<optimized out>, revents=<optimized out>, userdata=0x1ad7360) at apps/systemd/systemd/src/core/manager.c:1508
#14 0x0003e880 in source_dispatch (s=0x1ad7758) at apps/systemd/systemd/src/libsystemd/sd-event/sd-event.c:1861
#15 0x00041288 in sd_event_run (e=0x1ad61d8, timeout=<optimized out>) at apps/systemd/systemd/src/libsystemd/sd-event/sd-event.c:2117
#16 0x000103c8 in manager_loop (m=0x1ad7360) at apps/systemd/systemd/src/core/manager.c:1844
#17 main (argc=1, argv=0xbe862ee4) at apps/systemd/systemd/src/core/main.c:1704

Thanks,
Jimmy