Re: [systemd-devel] Wierd Segfault in sd_rtnl_message_unref (libnss_myhostname.so.2 by sshd )

2015-01-15 Thread Svenne Krap

Hi Tom,

I will be happy to run different tests, but I need a serious amount of
handholding, as I haven't done this kind of work for ages...

You could start with how I run it through Valgrind... (which I do have
installed, but no clue how to use in this context...)

Svenne

On 13-01-2015 23:33, Tom Gundersen wrote:
> Hi Svenne,
>
> On Mon, Jan 12, 2015 at 10:08 PM, Svenne Krap svenne.li...@krap.dk wrote:
>> Hi.
>>
>> On Arch X64 using 218-1 (first packaging of 218) I have run into the
>> following weird problem.
>>
>> When trying to connect to an ssh server running dual-stack (both IPv4 and
>> IPv6) over IPv6, ssh segfaults when I have loaded the full IPv4 BGP
>> routing table (~500k+ routes). IPv4 connections work for some reason,
>> and IPv6 recovers if I kill the routing daemon (bird).
>>
>> The stack trace of the core file starts with
>>
>> Stack trace of thread 515:
>> #0  0x7f48334a3dd5 _int_free (libc.so.6)
>> #1  0x7f4834a1e62a sd_rtnl_message_unref (libnss_myhostname.so.2)
>> #2  0x7f4834a1e657 sd_rtnl_message_unref (libnss_myhostname.so.2)
>>
>> And continues with that line (#1 and #2) until frame 63.
>>
>> I have looked in src/libsystemd/sd-rtnl/rtnl-message.c and have two
>> observations (my C is very rusty so feel free to correct me).
>>
>> Line 589, shouldn't the line
>> if (m && REFCNT_DEC(m->n_ref) <= 0) {
>>
>> be
>>
>> if (m && REFCNT_DEC(m->n_ref) >= 0) {
>>
>> (I.e. greater-than-or-equal instead of less-than-or-equal)
>
> As Zbigniew explained, this is actually correct, but misleading. I
> fixed it to use equality now, which should hopefully make it clearer.
>
> Any chance you could run this through valgrind to get a bit more info
> about what's going wrong?
>
>> Also, perhaps a test of whether m->next is equal to m on line 597
>
> Hm, well, if there is a loop in the message list we are in trouble,
> but checking just for two messages pointing at each other is not
> enough, as the loop could be bigger. That said, such a loop can only
> happen if there is a real bug in our code, so I don't think we should
> be checking for that all the time.
>
> Thanks for the report!
>
> Tom



___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Wierd Segfault in sd_rtnl_message_unref (libnss_myhostname.so.2 by sshd )

2015-01-13 Thread Tom Gundersen
Hi Svenne,

On Mon, Jan 12, 2015 at 10:08 PM, Svenne Krap svenne.li...@krap.dk wrote:
> Hi.
>
> On Arch X64 using 218-1 (first packaging of 218) I have run into the
> following weird problem.
>
> When trying to connect to an ssh server running dual-stack (both IPv4 and
> IPv6) over IPv6, ssh segfaults when I have loaded the full IPv4 BGP
> routing table (~500k+ routes). IPv4 connections work for some reason,
> and IPv6 recovers if I kill the routing daemon (bird).
>
> The stack trace of the core file starts with
>
> Stack trace of thread 515:
> #0  0x7f48334a3dd5 _int_free (libc.so.6)
> #1  0x7f4834a1e62a sd_rtnl_message_unref (libnss_myhostname.so.2)
> #2  0x7f4834a1e657 sd_rtnl_message_unref (libnss_myhostname.so.2)
>
> And continues with that line (#1 and #2) until frame 63.
>
> I have looked in src/libsystemd/sd-rtnl/rtnl-message.c and have two
> observations (my C is very rusty so feel free to correct me).
>
> Line 589, shouldn't the line
> if (m && REFCNT_DEC(m->n_ref) <= 0) {
>
> be
>
> if (m && REFCNT_DEC(m->n_ref) >= 0) {
>
> (I.e. greater-than-or-equal instead of less-than-or-equal)

As Zbigniew explained, this is actually correct, but misleading. I
fixed it to use equality now, which should hopefully make it clearer.

Any chance you could run this through valgrind to get a bit more info
about what's going wrong?

> Also, perhaps a test of whether m->next is equal to m on line 597

Hm, well, if there is a loop in the message list we are in trouble,
but checking just for two messages pointing at each other is not
enough, as the loop could be bigger. That said, such a loop can only
happen if there is a real bug in our code, so I don't think we should
be checking for that all the time.
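
To illustrate why comparing m->next with m would not be enough, here is a minimal sketch. It is not systemd code: `struct node`, `points_at_self` and `list_has_cycle` are hypothetical names, and Floyd's tortoise-and-hare walk is simply one well-known way to find a loop of any length. The first helper only catches a node that points at itself; the second also catches larger loops, at the cost of walking the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical singly linked node, standing in for a chained message. */
struct node {
        struct node *next;
};

/* The narrow check: true only if the node's next pointer is the node itself. */
static bool points_at_self(const struct node *m) {
        return m && m->next == m;
}

/* Floyd's cycle detection: 'slow' advances one step and 'fast' two steps
 * per iteration. If the list loops back anywhere, the two pointers must
 * eventually meet; if it ends in NULL, 'fast' reaches the end first. */
static bool list_has_cycle(const struct node *head) {
        const struct node *slow = head, *fast = head;

        while (fast && fast->next) {
                slow = slow->next;
                fast = fast->next->next;
                if (slow == fast)
                        return true;
        }
        return false;
}
```

A three-node loop (a -> b -> c -> a) is missed by `points_at_self` on every node but found by `list_has_cycle`. Since the full walk costs time proportional to the list length, it fits a debug-only assertion better than a check on every unref.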

Thanks for the report!

Tom


Re: [systemd-devel] Wierd Segfault in sd_rtnl_message_unref (libnss_myhostname.so.2 by sshd )

2015-01-12 Thread Zbigniew Jędrzejewski-Szmek
On Mon, Jan 12, 2015 at 10:08:30PM +0100, Svenne Krap wrote:
 
> Hi.
>
> On Arch X64 using 218-1 (first packaging of 218) I have run into the
> following weird problem.
>
> When trying to connect to an ssh server running dual-stack (both IPv4 and
> IPv6) over IPv6, ssh segfaults when I have loaded the full IPv4 BGP
> routing table (~500k+ routes). IPv4 connections work for some reason,
> and IPv6 recovers if I kill the routing daemon (bird).
>
> The stack trace of the core file starts with
>
> Stack trace of thread 515:
> #0  0x7f48334a3dd5 _int_free (libc.so.6)
> #1  0x7f4834a1e62a sd_rtnl_message_unref (libnss_myhostname.so.2)
> #2  0x7f4834a1e657 sd_rtnl_message_unref (libnss_myhostname.so.2)
The reference counting might be broken. It is in other places
unfortunately.

> And continues with that line (#1 and #2) until frame 63.
>
> I have looked in src/libsystemd/sd-rtnl/rtnl-message.c and have two
> observations (my C is very rusty so feel free to correct me).
>
> Line 589, shouldn't the line
> if (m && REFCNT_DEC(m->n_ref) <= 0) {
No, it's supposed to do the freeing when it reaches 0. It is spelled as <= 0
but that is either simply misleading, or a workaround for a bug.
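
For illustration, a minimal sketch of the "free when the count reaches 0" pattern. It is not the systemd source: `struct msg`, `msg_new`, `msg_ref`, `msg_unref` and the `freed_count` counter are hypothetical, and a plain non-atomic `unsigned` stands in for whatever REFCNT_DEC operates on. With a counter that never goes below zero, `<= 0` and `== 0` select the same case, so the equality spelling states the intent more directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted message. */
struct msg {
        unsigned n_ref;
};

/* Counts freed messages, only so the behaviour can be observed. */
static unsigned freed_count = 0;

/* Allocate a message; the creator holds the first reference. */
static struct msg *msg_new(void) {
        struct msg *m = calloc(1, sizeof(*m));

        if (m)
                m->n_ref = 1;
        return m;
}

/* Take an additional reference. */
static struct msg *msg_ref(struct msg *m) {
        if (m) {
                assert(m->n_ref > 0);
                m->n_ref++;
        }
        return m;
}

/* Drop one reference and free the message when the last one is gone.
 * Always returns NULL so callers can write m = msg_unref(m). */
static struct msg *msg_unref(struct msg *m) {
        if (!m)
                return NULL;

        assert(m->n_ref > 0);   /* an unref without a matching ref is a bug */
        if (--m->n_ref == 0) {
                free(m);
                freed_count++;
        }
        return NULL;
}
```

The decrement-then-compare in `msg_unref` is the line corresponding to the one quoted above: the body runs only for the caller that drops the final reference.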

Zbyszek