Re: [systemd-devel] Wierd Segfault in sd_rtnl_message_unref (libnss_myhostname.so.2 by sshd )
Hi Tom,

I will be happy to run different tests, but I need a serious amount of hand-holding, as I haven't done this kind of work for ages. You could start with how I run it through Valgrind (which I do have installed, but I have no clue how to use it in this context).

Svenne

On 13-01-2015 23:33, Tom Gundersen wrote:
> Hi Svenne,
>
> On Mon, Jan 12, 2015 at 10:08 PM, Svenne Krap svenne.li...@krap.dk wrote:
>> Hi.
>>
>> On Arch X64 using 218-1 (first packaging of 218) I have run into the following weird problem.
>>
>> When trying to connect to an ssh server running dual-stack (both IPv4 and IPv6) over IPv6, ssh segfaults when I have loaded the full IPv4 BGP routing table (~500k+ routes). IPv4 connections work for some reason, and IPv6 recovers if I kill the routing daemon (bird).
>>
>> The stack trace of the core file starts with
>>
>>   Stack trace of thread 515:
>>   #0 0x7f48334a3dd5 _int_free (libc.so.6)
>>   #1 0x7f4834a1e62a sd_rtnl_message_unref (libnss_myhostname.so.2)
>>   #2 0x7f4834a1e657 sd_rtnl_message_unref (libnss_myhostname.so.2)
>>
>> And continues with that line (#1 and #2) until frame 63.
>>
>> I have looked in src/libsystemd/sd-rtnl/rtnl-message.c and have two observations (my C is very rusty, so feel free to correct me).
>>
>> Line 589, shouldn't the line
>>   if (m && REFCNT_DEC(m->n_ref) <= 0) {
>> be
>>   if (m && REFCNT_DEC(m->n_ref) >= 0) {
>> (i.e. greater-than-or-equal instead of less-than-or-equal)
>
> As Zbigniew explained, this is actually correct, but misleading. I fixed it to use equality now, which should hopefully make it clearer.
>
> Any chance you could run this through valgrind to get a bit more info about what's going wrong?
>
>> Also, perhaps a test of whether m->next is equal to m on line 597
>
> Hm, well, if there is a loop in the message list we are in trouble, but checking just for two messages pointing at each other is not enough, as the loop could be bigger. That said, such a loop can only happen if there is a real bug in our code, so I don't think we should be checking for that all the time.
>
> Thanks for the report!
>
> Tom

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Wierd Segfault in sd_rtnl_message_unref (libnss_myhostname.so.2 by sshd )
Hi Svenne,

On Mon, Jan 12, 2015 at 10:08 PM, Svenne Krap svenne.li...@krap.dk wrote:
> Hi.
>
> On Arch X64 using 218-1 (first packaging of 218) I have run into the following weird problem.
>
> When trying to connect to an ssh server running dual-stack (both IPv4 and IPv6) over IPv6, ssh segfaults when I have loaded the full IPv4 BGP routing table (~500k+ routes). IPv4 connections work for some reason, and IPv6 recovers if I kill the routing daemon (bird).
>
> The stack trace of the core file starts with
>
>   Stack trace of thread 515:
>   #0 0x7f48334a3dd5 _int_free (libc.so.6)
>   #1 0x7f4834a1e62a sd_rtnl_message_unref (libnss_myhostname.so.2)
>   #2 0x7f4834a1e657 sd_rtnl_message_unref (libnss_myhostname.so.2)
>
> And continues with that line (#1 and #2) until frame 63.
>
> I have looked in src/libsystemd/sd-rtnl/rtnl-message.c and have two observations (my C is very rusty, so feel free to correct me).
>
> Line 589, shouldn't the line
>   if (m && REFCNT_DEC(m->n_ref) <= 0) {
> be
>   if (m && REFCNT_DEC(m->n_ref) >= 0) {
> (i.e. greater-than-or-equal instead of less-than-or-equal)

As Zbigniew explained, this is actually correct, but misleading. I fixed it to use equality now, which should hopefully make it clearer.

Any chance you could run this through valgrind to get a bit more info about what's going wrong?

> Also, perhaps a test of whether m->next is equal to m on line 597

Hm, well, if there is a loop in the message list we are in trouble, but checking just for two messages pointing at each other is not enough, as the loop could be bigger. That said, such a loop can only happen if there is a real bug in our code, so I don't think we should be checking for that all the time.

Thanks for the report!

Tom
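To illustrate the point about larger loops: a minimal sketch, not taken from the systemd sources, comparing a self-pointer test with a general cycle check (Floyd's tortoise-and-hare). The `struct node` type and both function names are hypothetical stand-ins for a singly linked message list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a message that links to the next one. */
struct node {
    struct node *next;
};

/* The narrow test: only catches a node whose next pointer is itself. */
static bool points_to_self(const struct node *m) {
    return m && m->next == m;
}

/* General check: advance one pointer by one step and another by two.
 * If the list has a cycle of any length, the two pointers eventually meet;
 * otherwise the fast pointer runs off the end of the list. */
static bool has_cycle(const struct node *head) {
    const struct node *slow = head;
    const struct node *fast = head;

    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}
```

For a three-node list whose last node points back to the first, `points_to_self()` reports nothing for any node while `has_cycle()` returns true. The general check walks the whole list, which fits the argument above that it is not something to run on every unref.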
Re: [systemd-devel] Wierd Segfault in sd_rtnl_message_unref (libnss_myhostname.so.2 by sshd )
On Mon, Jan 12, 2015 at 10:08:30PM +0100, Svenne Krap wrote:
> Hi.
>
> On Arch X64 using 218-1 (first packaging of 218) I have run into the following weird problem.
>
> When trying to connect to an ssh server running dual-stack (both IPv4 and IPv6) over IPv6, ssh segfaults when I have loaded the full IPv4 BGP routing table (~500k+ routes). IPv4 connections work for some reason, and IPv6 recovers if I kill the routing daemon (bird).
>
> The stack trace of the core file starts with
>
>   Stack trace of thread 515:
>   #0 0x7f48334a3dd5 _int_free (libc.so.6)
>   #1 0x7f4834a1e62a sd_rtnl_message_unref (libnss_myhostname.so.2)
>   #2 0x7f4834a1e657 sd_rtnl_message_unref (libnss_myhostname.so.2)

The reference counting might be broken. It is in other places, unfortunately.

> And continues with that line (#1 and #2) until frame 63.
>
> I have looked in src/libsystemd/sd-rtnl/rtnl-message.c and have two observations (my C is very rusty, so feel free to correct me).
>
> Line 589, shouldn't the line
>   if (m && REFCNT_DEC(m->n_ref) <= 0) {

No, it's supposed to do the freeing when it reaches 0. It is spelled as <= 0, but that is either simply misleading or a workaround for a bug.

Zbyszek
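As a rough illustration of "do the freeing when it reaches 0": a minimal sketch, assuming a chained message in which each message owns one reference to its successor. The `struct msg` type, the `msg_*` helpers, and the `freed_count` counter are hypothetical and are not the sd-rtnl implementation; the counter is plain (non-atomic) to keep the example short. The chain is walked in a loop rather than by recursion, so a very long chain does not add one stack frame per message.

```c
#include <stdlib.h>

/* Hypothetical reference-counted message that may link to a successor. */
struct msg {
    unsigned n_ref;
    struct msg *next;
};

/* Counts freed messages, only so the behaviour can be observed. */
static int freed_count = 0;

/* Allocate a message holding one reference; it takes over the caller's
 * reference to `next` (which may be NULL). */
static struct msg *msg_new(struct msg *next) {
    struct msg *m = calloc(1, sizeof(*m));
    if (!m)
        return NULL;
    m->n_ref = 1;
    m->next = next;
    return m;
}

/* Take an additional reference. */
static struct msg *msg_ref(struct msg *m) {
    if (m)
        m->n_ref++;
    return m;
}

/* Drop one reference. The message is freed exactly when the count reaches
 * zero, and the reference it held on its successor is then dropped in turn. */
static struct msg *msg_unref(struct msg *m) {
    while (m && --m->n_ref == 0) {
        struct msg *next = m->next;
        free(m);
        freed_count++;
        m = next;
    }
    return NULL;
}
```

With a two-message chain and one extra reference on the head, the first `msg_unref()` frees nothing and the second frees both messages. Writing the comparison as `== 0` makes the intent explicit, in line with the change described earlier in the thread.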