GPL compliance issue with liquidio/lio_23xx_vsw.bin firmware

2018-08-27 Thread Florian Weimer

liquidio/lio_23xx_vsw.bin contains a compiled MIPS Linux kernel:

$ tail --bytes=+1313 liquidio/lio_23xx_vsw.bin > elf
$ readelf -aW elf
[…]
  [ 6] __ksymtab PROGBITS80e495f8 64a5f8 00d130 
00   A  0   0  8
  [ 7] __ksymtab_gpl PROGBITS80e56728 657728 008400 
00   A  0   0  8
  [ 8] __ksymtab_strings PROGBITS80e5eb28 65fb28 018868 
00   A  0   0  1

[…]
Symbol table '.symtab' contains 1349 entries:
   Num:Value  Size TypeBind   Vis  Ndx Name
 0:  0 NOTYPE  LOCAL  DEFAULT  UND
 1:  0 FILELOCAL  DEFAULT  ABS 
arch/mips/kernel/head.o

 2:  0 FILELOCAL  DEFAULT  ABS init/main.c
 3:  0 FILELOCAL  DEFAULT  ABS 
include/linux/types.h

[…]

Yet there is no corresponding source provided, and LICENCE.cavium lacks 
the required notices.


Thanks,
Florian


sendmmsg flags userspace ABI change in kernel 4.6

2018-04-18 Thread Florian Weimer
Since this commit:

commit 28a94d8fb35b3a75b802f368ae6f4a9f6b0d435a
Author: Tom Herbert 
Date:   Mon Mar 7 14:11:02 2016 -0800

net: Allow MSG_EOR in each msghdr of sendmmsg

This patch allows setting MSG_EOR in each individual msghdr passed
in sendmmsg. This allows a sendmmsg to send multiple messages when
using SOCK_SEQPACKET.

Signed-off-by: Tom Herbert 
Signed-off-by: David S. Miller 

the msg_flags argument in individual msghdr arguments is longer
completely ignored for SOCK_SEQPACKET sockets.  msg_flags was and is
still documented as ignored for sendmsg(2), so by analogy for
sendmmsg(2) as well.

It seems that valgrind does not know about this yet, and due to
limited use of SCTP, this userspace ABI change has not been noticed so
far.

What are the plans in this area?  Will other kinds of sockets start
using the msghdr flags for sending?

A fully backwards-compatibility way to achieve this would be to
specify that you have to pass a new flag to sendmmsg (MSG_PERHDR?), in
its flags argument, to activate the per-msghdr flags.

The glibc DNS stub resolver relies on the previously documented
behavior, and I wonder how widely we should backport the change:

  https://sourceware.org/bugzilla/show_bug.cgi?id=23037

If the MSG_PERHDR route will be taken, we can skip this work, and
valgrind can flag uninitialized bits in msg_flags only if MSG_PERHDR
is passed.  (I believe it would be difficult for valgrind to look at
the socket type to determine whether undefined bits need reporting.)


Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-16 Thread Florian Weimer

On 03/16/2018 06:29 PM, Linus Torvalds wrote:


Gcc people are crazy.


End of discussion from me.  This is not acceptable.

Florian


Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-16 Thread Florian Weimer

On 03/16/2018 05:25 AM, Kees Cook wrote:

In the effort to remove all VLAs from the kernel[1], it is desirable to
build with -Wvla. However, this warning is overly pessimistic, in that
it is only happy with stack array sizes that are declared as constant
expressions, and not constant values. One case of this is the evaluation
of the max() macro which, due to its construction, ends up converting
constant expression arguments into a constant value result. Attempts
to adjust the behavior of max() ran afoul of version-dependent compiler
behavior[2].


I find this commit message confusing.  VLAs have precisely defined 
semantics which differ from other arrays, and these differences can be 
observable (maybe not in the kernel, but certainly for userspace), so 
the compiler has to treat a VLA as such even if the length is a constant 
known at compile time.  (The original intent of the warning probably was 
a portability check anyway.)


If you want to catch stack frames which have unbounded size, 
-Werror=stack-usage=1000 or -Werror=vla-larger-than=1000 (with the 
constant adjusted as needed) might be the better approach.


Thanks,
Florian


Re: [net-next PATCH 2/3] net: reduce cycles spend on ICMP replies that gets rate limited

2017-06-05 Thread Florian Weimer
On 06/04/2017 04:38 PM, Jesper Dangaard Brouer wrote:
> On Sun, 4 Jun 2017 09:11:53 +0200
> Florian Weimer <fwei...@redhat.com> wrote:
> 
>> On 01/09/2017 04:04 PM, Jesper Dangaard Brouer wrote:
>>
>>> This patch split the global and per (inet)peer ICMP-reply limiter
>>> code, and moves the global limit check to earlier in the packet
>>> processing path.  Thus, avoid spending cycles on ICMP replies that
>>> gets limited/suppressed anyhow.
>>>
>>> The global ICMP rate limiter icmp_global_allow() is a good solution,
>>> it just happens too late in the process.  The kernel goes through the
>>> full route lookup (return path) for the ICMP message, before taking
>>> the rate limit decision of not sending the ICMP reply.
>>>
>>> Details: The kernels global rate limiter for ICMP messages got added
>>> in commit 4cdf507d5452 ("icmp: add a global rate limitation").  It is
>>> a token bucket limiter with a global lock.  It brilliantly avoids
>>> locking congestion by only updating when 20ms (HZ/50) were elapsed. It
>>> can then avoids taking lock when credit is exhausted (when under
>>> pressure) and time constraint for refill is not yet meet.  
>>
>> This patch removed the rate limit bypass for localhost.  As a result, it
>> is impossible to write deterministic UDP client tests tests which
>> exercise failover behavior in response to unreachable servers.
> 
> You cannot rely on ICMP responses delivery, too many systems (and
> middleboxes) limit or drop ICMP. Before this patch, loopback dev was
> explicitly excluded from being ICMP rate limited.  Thus, your localhost
> test passed.

Yes, I know that.  But there's a difference between failing a UDP query
immediately and waiting for the timeout to happen.  ICMP responses are
really helpful for that, even though they cannot be relied upon.

> Is there a real use-case behind "failover behavior in response to
> unreachable servers" (which would need to run on localhost)?

It's also relevant during boot when local UDP services are not running.
There, waiting for the timeout can delay the boot process.

It used to be relevant for switching to backup UDP-based servers (such
as name servers), but it seems the Linux kernel has not generated ICMP
messages at a sufficient rate to facilitate that long before the recent
changes.  And of course, it only covers a subset of the failure
scenarios (and arguably only a small subset of them).

In any case, we need a working way to test clients which have ICMP-based
failure detection, and we can't do that if the kernel sends them only
once in a while.

> Adding back outgoing-dev loopback test will require a full
> route-lookup, which is what the hole optimization gain[1] comes from.
> [1] https://git.kernel.org/torvalds/c/9f2f27a9a518
> 
> I've tried to come-up with an alternative solution, see inlined patch
> below...

Looking at the incoming interface doesn't seem unreasonable here.

Thanks,
Florian


Re: [net-next PATCH 2/3] net: reduce cycles spend on ICMP replies that gets rate limited

2017-06-04 Thread Florian Weimer
On 01/09/2017 04:04 PM, Jesper Dangaard Brouer wrote:
> This patch split the global and per (inet)peer ICMP-reply limiter
> code, and moves the global limit check to earlier in the packet
> processing path.  Thus, avoid spending cycles on ICMP replies that
> gets limited/suppressed anyhow.
> 
> The global ICMP rate limiter icmp_global_allow() is a good solution,
> it just happens too late in the process.  The kernel goes through the
> full route lookup (return path) for the ICMP message, before taking
> the rate limit decision of not sending the ICMP reply.
> 
> Details: The kernels global rate limiter for ICMP messages got added
> in commit 4cdf507d5452 ("icmp: add a global rate limitation").  It is
> a token bucket limiter with a global lock.  It brilliantly avoids
> locking congestion by only updating when 20ms (HZ/50) were elapsed. It
> can then avoids taking lock when credit is exhausted (when under
> pressure) and time constraint for refill is not yet meet.

This patch removed the rate limit bypass for localhost.  As a result, it
is impossible to write deterministic UDP client tests tests which
exercise failover behavior in response to unreachable servers.

H.J. Lu noted that a glibc test started failing on kernel 4.11 and
identified the regression:

  https://sourceware.org/ml/libc-alpha/2017-06/msg00167.html

(I have more tests which are afflicted by this, but are not yet in glibc
upstream.)

This is particularly annoying because we already run such tests in a
network namespace for isolation, but the rate limit counter is global,
so that doesn't help here.

I'm attaching a self-contained test case.  It fails for me with:

localhost-icmp: iteration 50: no ICMP message (poll timeout)

On kernel 4.10, it passes and runs within just a few milliseconds.

Would you please fix this in some way?  Thanks.

Florian
#include 
#include 
#include 
#include 
#include 
#include 
#include 

/* How many UDP packets to send to a non-responding part.  */
enum { ITERATIONS = 1000 };

int
main (void)
{
  /* Pick a port number which is likely unused.  */
  unsigned short port;
  {
int sock = socket (AF_INET, SOCK_DGRAM, 0);
if (sock < 0)
  err (1, "socket");
struct sockaddr_in sin = { .sin_family = AF_INET };
if (bind (sock, (struct sockaddr *) , sizeof (sin)) < 0)
  err (1, "bind");
socklen_t sinlen = sizeof (sin);
if (getsockname (sock, (struct sockaddr *) , ))
  err (1, "getsockname");
if (sinlen != sizeof (sin) || sin.sin_family != AF_INET)
  errx (1, "wrong address information for socket");
if (close (sock) < 0)
  err (1, "close");
port = sin.sin_port;
  }

  for (int i = 0; i < ITERATIONS; ++i)
{
  int sock = socket (AF_INET, SOCK_DGRAM, 0);
  if (sock < 0)
err (1, "socket");
  struct sockaddr_in sin =
{
  .sin_family = AF_INET,
  .sin_addr = { ntohl (INADDR_LOOPBACK) },
  .sin_port = port,
};
  if (connect (sock, (struct sockaddr *) , sizeof (sin)) < 0)
err (1, "connect");
  if (sendto (sock, "", 1, 0, NULL, 0) < 0)
err (1, "sendto");
  struct pollfd fd = { .fd = sock, .events = POLLIN };
  int ret = poll (, 1, 5000);
  if (ret < 0)
err (1, "poll");
  if (ret == 0)
errx (1, "iteration %d: no ICMP message (poll timeout)", i);
  if (close (sock) < 0)
err (1, "close");
}
}


Re: Kernel uapi and glibc header conflicts (was Re: header conflict introduced by change to netfilter_ipv4/ip_tables.h )

2016-02-08 Thread Florian Weimer
On 02/07/2016 12:31 PM, Mikko Rapeli wrote:
> On Thu, Jan 07, 2016 at 10:30:40AM -0800, Stephen Hemminger wrote:
>> On Thu, 7 Jan 2016 07:29:50 +
>> Mikko Rapeli  wrote:
>>
>>> On Wed, Jan 06, 2016 at 09:20:07AM -0800, Stephen Hemminger wrote:
 This commit breaks compilation of iproute2 with net-next.
>>>
>>> Ok, linux/if.h and libc net/if.h have overlapping defines, and this is not
>>> the only one. I saw lots of them in the core dump headers.
>>>
>>> How should we handle them? Another ifndef for IFNAMSIZ into kernel uapi
>>> headers?
>>>
>>> -Mikko
>>
>> Probably need to do the same thing that was done previously for these
>> kind of conflicts.  This makes make linux/if.h change to adapt to net/if.h
>> being included before it.
> 
> So uapi headers now have a libc-compat.h
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/libc-compat.h?id=refs/tags/v4.5-rc2
> which tries to detect and fix incompatibilities between Linux kernel and glibc
> headers. Part of the fix is then in the kernel side headers and another part
> should be in glibc headers, but glibc git repo does not include any of these
> fixes yet.
> 
> Has the glibc part of this incompatiblity mess been discussed and agreed
> with glibc developers?

I don't remember any recent discussions on libc-alpha, or any bug
reports about this concrete change.

(Redirecting to libc-alpha, which seems the more appropriate list.)

> Many of the conflics arise from propably old glibc headers which had copied
> out definitions from the Linux kernel side before it could export any headers
> to userspace. I assume that the glibc headers are not allowed to depend and
> include Linux kernel uapi headers in deployments but maybe the Linux kernel
> headers could be used at glibc compile time to generate needed glibc side
> definitions. That would allow having a single source for definitions like 
> FNAMSIZ 16.

My impression is that this inconsistency isn't the only problem.  The
problems start if application developers need functionality which is
only in kernel-provided headers, but they still need to include glibc
headers at the same time.

> I'm drafting a test, similar to the kernel uapi header compile test
> https://github.com/mcfrisk/linux/blob/headers_test_v05/scripts/headers_compile_test.sh
> for the glibc conflicts too, and of course noticed that also glibc headers
> conflict with each other. With some workarounds I can test compile each kernel
> uapi header against all compiling glibc headers and see the conflicts as
> build failures.

That could be helpful.

I'm not familiar with relevant developer practices.  It seems to me that
from an application developer point of view, kernel headers are updated
a bit more frequently than glibc headers.  This likely pushes the
solution into a certain direction (and may be the rationale behind the
kernel's libc-compat.h).

Florian


Re: Asterisk deadlocks since Kernel 4.1

2015-11-19 Thread Florian Weimer
On 11/18/2015 10:23 PM, Stefan Priebe wrote:

>> please try to get a backtrace with debugging information.  It is likely
>> that this is the make_request/__check_pf functionality in glibc, but it
>> would be nice to get some certainty.
>>
>> Which glibc version do you use?  Has it got a fix for CVE-2013-7423?
> 
> It's Debians 2.13-38+deb7u8 Debians issue tracker says it is fixed:
> https://security-tracker.debian.org/tracker/CVE-2013-7423

I checked, and the patch is there and applied.

>> Theoretically, recvmsg could also hang if the Netlink query was dropped
>> by the kernel, or the final packet in the response was dropped.  We
>> never saw that happen, even under extreme load, but I didn't test with
>> recent kernels.
> 
> The load is very low in this system. Just 30 phones and only 1-6 calling.

This is rather odd.

I'm not sure if we can do anything simple on the glibc to help to debug
this.  If something in the process consumes the Netlink message from the
kernel (incorrectly reading from the internal glibc file descriptor),
this is fairly difficult detect inside glibc (it would require rather
large changes which do not currently exist at all).

Florian

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Asterisk deadlocks since Kernel 4.1

2015-11-19 Thread Florian Weimer
On 11/18/2015 10:36 PM, Stefan Priebe wrote:

>> please try to get a backtrace with debugging information.  It is likely
>> that this is the make_request/__check_pf functionality in glibc, but it
>> would be nice to get some certainty.
> 
> sorry here it is. What I'm wondering is why is there ipv6 stuff? I don't
> have ipv6 except for link local.

glibc needs to know if the system has global unicast addresses if it
receives  records.

It's curious that net.ipv6.conf.all.disable_ipv6=1 makes the problem go
away.  Even with that setting, the kernel seems to send two Netlink
responses.  So either this is enough to narrow the window for the race
so that no longer triggers, or there is a genuine kernel issue with
supplying the requested IPv6 Netlink response.

> Could it be this one?
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=505105#c79

No, that's on the DNS/UDP side, not in the Netlink code.

Florian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Asterisk deadlocks since Kernel 4.1

2015-11-19 Thread Florian Weimer
On 11/19/2015 01:46 PM, Stefan Priebe - Profihost AG wrote:

> I can try Kernel 4.4-rc1 next week. Or something else?

I found this bug report which indicates that 4.1.10 works:

  

But in your original report, you said that 4.1.13 is broken.

This backtrace:

  

shows a lot of waiting on quite different netlink sockets.  So if this
is due to a race in Asterisk, it must have happened several times in a row.

Thanks,
Florian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Asterisk deadlocks since Kernel 4.1

2015-11-18 Thread Florian Weimer
On 11/18/2015 09:23 PM, Stefan Priebe wrote:
> 
> Am 17.11.2015 um 20:43 schrieb Thomas Gleixner:
>> On Tue, 17 Nov 2015, Stefan Priebe wrote:
>>> I've now also two gdb backtraces from two crashes:
>>> http://pastebin.com/raw.php?i=yih5jNt8
>>>
>>> http://pastebin.com/raw.php?i=kGEcvH4T
>>
>> They don't tell me anything as I have no idea of the inner workings of
>> asterisk. You might be better of to talk to the asterisk folks to help
>> you track down what that thing is waiting for, so we can actually look
>> at a well defined area.
> 
> The asterisk guys told me it's a livelock asterisk is waiting for
> getaddrinfo / recvmsg.
> 
> Thread 2 (Thread 0x7fbe989c6700 (LWP 12890)):
> #0  0x7fbeb9eb487d in recvmsg () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x7fbeb9ed4fcc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x7fbeb9ed544a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x7fbeb9e92007 in getaddrinfo () from
> /lib/x86_64-linux-gnu/libc.so.6

Stefan,

please try to get a backtrace with debugging information.  It is likely
that this is the make_request/__check_pf functionality in glibc, but it
would be nice to get some certainty.

Which glibc version do you use?  Has it got a fix for CVE-2013-7423?

So far, the only known cause for a hang in this place (that is, lack of
return from recvmsg) is incorrect file descriptor use.  (CVE-2013-7423
is such an issue in glibc itself.)  The kernel upgrade could change
scheduling behavior, and the actual bug might have been latent before.

Theoretically, recvmsg could also hang if the Netlink query was dropped
by the kernel, or the final packet in the response was dropped.  We
never saw that happen, even under extreme load, but I didn't test with
recent kernels.

The glibc change Hannes mentioned won't detect the hang, but if there is
incorrect file descriptor reuse going on, it is possible that the new
assert catches it.

Florian

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] glibc: Terminate process on invalid netlink response from kernel [BZ #12926]

2015-11-03 Thread Florian Weimer
On 10/24/2015 06:22 AM, Mike Frysinger wrote:
> On 23 Oct 2015 22:07, Florian Weimer wrote:
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/netlink_assert_response.c
>> @@ -0,0 +1,100 @@
>> +/* Copyright (C) 2015 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
> 
> guess we like to have the first line be a short desc of the file

Added.

>> +static int
>> +get_address_family (int fd)
>> +{
>> ...
>> +  return sa.ss_family;

> ss_family is of type sa_family_t, not int ... not a big deal, but the
> two do differ in sign ...

Thanks.  I added static asserts to make sure that the actual type does
not cause problems with the use of -1 and an int return value.

I do not want to use SO_DOMAIN here because I expect this to eventually
move into generic code because we probably should do similar checking on
other internally-used sockets (where reporting impossible errors to the
caller would be grossly misleading).

I'm still waiting for comments from the kernel people. :)

Florian
Terminate process on invalid netlink response from kernel [BZ

The recvmsg system calls for netlink sockets have been particularly
prone to picking up unrelated data after a file descriptor race
(where the descriptor is closed and reopened concurrently in a
multi-threaded process, as the result of a file descriptor
management issue elsewhere).  This commit adds additional error
checking and aborts the process if a datagram of unexpected length
(without the netlink header) is received, or an error code which
cannot happen due to the way the netlink socket is used.

2015-11-03  Florian Weimer  <fwei...@redhat.com>

	[BZ #12926]
	Terminate process on invalid netlink response.
	* sysdeps/unix/sysv/linux/netlinkaccess.h
	(__netlink_assert_response): Declare.
	* sysdeps/unix/sysv/linux/netlink_assert_response.c: New file.
	* sysdeps/unix/sysv/linux/Makefile [$(subdir) == inet]
	(sysdep_routines): Add netlink_assert_response.
	* sysdeps/unix/sysv/linux/check_native.c (__check_native): Call
	__netlink_assert_response.
	* sysdeps/unix/sysv/linux/check_pf.c (make_request): Likewise.
	* sysdeps/unix/sysv/linux/ifaddrs.c (__netlink_request): Likewise.
	* sysdeps/unix/sysv/linux/Versions (GLIBC_PRIVATE): Add
	__netlink_assert_response.

diff --git a/ChangeLog b/ChangeLog
index ab7aa69..0b8ed92 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2015-11-03  Florian Weimer  <fwei...@redhat.com>
+
+	[BZ #12926]
+	Terminate process on invalid netlink response.
+	* sysdeps/unix/sysv/linux/netlinkaccess.h
+	(__netlink_assert_response): Declare.
+	* sysdeps/unix/sysv/linux/netlink_assert_response.c: New file.
+	* sysdeps/unix/sysv/linux/Makefile [$(subdir) == inet]
+	(sysdep_routines): Add netlink_assert_response.
+	* sysdeps/unix/sysv/linux/check_native.c (__check_native): Call
+	__netlink_assert_response.
+	* sysdeps/unix/sysv/linux/check_pf.c (make_request): Likewise.
+	* sysdeps/unix/sysv/linux/ifaddrs.c (__netlink_request): Likewise.
+	* sysdeps/unix/sysv/linux/Versions (GLIBC_PRIVATE): Add
+	__netlink_assert_response.
+
 2015-11-02  Joseph Myers  <jos...@codesourcery.com>
 
 	* math/libm-test.inc (modf_test_data): Add more tests.
diff --git a/NEWS b/NEWS
index 896eb02..d3738df 100644
--- a/NEWS
+++ b/NEWS
@@ -24,6 +24,14 @@ Version 2.23
   19095, 19124, 19125, 19129, 19134, 19137, 19156, 19174, 19181, 19189,
   19201.
 
+* getaddrinfo now detects certain invalid responses on an internal netlink
+  socket.  If such responses are received, an affected process will
+  terminate with an error message of "Unexpected error  on netlink
+  descriptor " or "Unexpected netlink response of size  on
+  descriptor ".  The most likely cause for these errors is a
+  multi-threaded application which erroneously closes and reuses the netlink
+  file descriptor while it is used by getaddrinfo.
+
 * A defect in the malloc implementation, present since glibc 2.15 (2012) or
   glibc 2.10 via --enable-experimental-malloc (2009), could result in the
   unnecessary serialization of memory allocation requests across threads.
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 2c67a66..d6cc529 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -151,6 +151,7 @@ sysdep_headers += netinet/if_fddi.h netinet/if_tr.h \
 		  netipx/ipx.h netash/ash.h netax25/ax25.h netatalk/at.h \
 		  netrom/netrom.h netpacket/packet.h netrose/rose.h \
 		  neteconet/ec.h netiucv/iucv.h
+sysdep_routines += netlink_assert_response
 endif
 
 # Don't compile the ctype glue code, since there is no old non-GNU C library.
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index 16bb281..202ffcc 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -169,5 +169,7 @@ libc {
   GLIBC_PRIVATE {
 # functions used in other lib

Re: [PATCH] glibc: Terminate process on invalid netlink response from kernel [BZ #12926]

2015-11-03 Thread Florian Weimer
On 11/03/2015 02:48 PM, Hannes Frederic Sowa wrote:
> Hello,
> 
> On Fri, Oct 23, 2015, at 21:07, Florian Weimer wrote:
>> (By the way, we'd also love to have a better kernel interface to fulfill
>> the needs for getaddrinfo address sorting.  The netlink requests we
>> currently use are much too slow if the host has many addresses
>> configured.)
> 
> One solution would be to finish the IPv6 ioctl interface to list
> addresses. The ioctl interface would need less memory allocations and is
> a synchronous interface which would make it much more easier for glibc
> to deal with. No timeouts and retries like with netlink are necessary.

The more fundamental question is whether we actually have to copy all
the addresses to userspace.  In the end, it may be better to hand a list
of destination addresses to the kernel and have it sort them according
to some algorithm.  But for the algorithm proposed in RFC 6724 section
6, this may be not worth the effort because there are so many
configurable bits.

I still think most of the address sorting is bogus because it appears to
make guarantees which can break after renumbering.

Florian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] glibc: Terminate process on invalid netlink response from kernel [BZ #12926]

2015-10-23 Thread Florian Weimer
This patch revisits this glibc bug:

  https://sourceware.org/bugzilla/show_bug.cgi?id=12926

For some reason, this particular code path is very good at picking up
file descriptors which have been reused in correctly.  This happens if
other threads have a race, close the wrong file descriptor (the one used
in the glibc netlink code), and reopen another one in its place.

The netlink requests we send to the kernel are:

  struct req
  {
struct nlmsghdr nlh;
struct rtgenmsg g;
/* struct rtgenmsg consists of a single byte.  This means there
   are three bytes of padding included in the REQ definition.
   We make them explicit here.  */
char pad[3];
  } req;

  req.nlh.nlmsg_len = sizeof (req);
  req.nlh.nlmsg_type = RTM_GETADDR;
  req.nlh.nlmsg_flags = NLM_F_ROOT | NLM_F_MATCH | NLM_F_REQUEST;
  req.nlh.nlmsg_pid = 0;
  req.nlh.nlmsg_seq = time (NULL);
  req.g.rtgen_family = AF_UNSPEC;

  req.nlh.nlmsg_len = sizeof (req);
  req.nlh.nlmsg_type = RTM_GETLINK;
  req.nlh.nlmsg_flags = NLM_F_ROOT | NLM_F_MATCH | NLM_F_REQUEST;
  req.nlh.nlmsg_pid = 0;
  req.nlh.nlmsg_seq = time (NULL);
  req.g.rtgen_family = AF_UNSPEC;

I discussed this with Hannes and he thinks that a zero-length reply (as
received by recvmsg) is impossible at this point, for these specific
types of netlink requests.  The new assert triggers for zero-length
replies, but also for replies less than sizeof (struct nlmsghdr) bytes
long, and for unexpected errors (EBADF, ENOTSOCK, ENOTCONN,
ECONNREFUSED, and EAGAIN on a non-blocking sockets—ours are all blocking).

This is purely a defense against silent data corruption and bug reports
incorrectly blaming glibc (or the wrong part of glibc at least).  I
added it to all three copies of the netlink code in glibc.

The glibc netlink code is still broken: It does not time out and retry
(needed in case the request gets lots), does not handle NLM_F_DUMP_INTR,
and does not deal with NLMSG_ERROR and ENOBUFS.  But these are separate
issues.  SOCK_CLOEXEC is not used, either. If we fix those issues, the
assert would remain in place, except for the EAGAIN part.

(By the way, we'd also love to have a better kernel interface to fulfill
the needs for getaddrinfo address sorting.  The netlink requests we
currently use are much too slow if the host has many addresses configured.)

I have tested that basic getaddrinfo operations still work after the
patch, but glibc testsuite coverage in this area is very limited, and I
have yet to do full-system testing with this patch.

Florian
Terminate process on invalid netlink response from kernel [BZ #12926]

The recvmsg system calls for netlink sockets have been particularly
prone to picking up unrelated data after a file descriptor race
(where the descriptor is closed and reopened concurrently in a
multi-threaded process, as the result of a file descriptor
management issue elsewhere).  This commit adds additional error
checking and aborts the process if a datagram of unexpected length
(without the netlink header) is received, or an error code which
cannot happen due to the way the netlink socket is used.

2015-10-23  Florian Weimer  <fwei...@redhat.com>

	[BZ #12926]
	Terminate process on invalid netlink response.
	* sysdeps/unix/sysv/linux/netlinkaccess.h
	(__netlink_assert_response): Declare.
	* sysdeps/unix/sysv/linux/netlink_assert_response.c: New file.
	* sysdeps/unix/sysv/linux/Makefile [$(subdir) == inet]
	(sysdep_routines): Add netlink_assert_response.
	* sysdeps/unix/sysv/linux/check_native.c (__check_native): Call
	__netlink_assert_response.
	* sysdeps/unix/sysv/linux/check_pf.c (make_request): Likewise.
	* sysdeps/unix/sysv/linux/ifaddrs.c (__netlink_request): Likewise.
	* sysdeps/unix/sysv/linux/Versions (GLIBC_PRIVATE): Add
	__netlink_assert_response.

diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 2c67a66..d6cc529 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -151,6 +151,7 @@ sysdep_headers += netinet/if_fddi.h netinet/if_tr.h \
 		  netipx/ipx.h netash/ash.h netax25/ax25.h netatalk/at.h \
 		  netrom/netrom.h netpacket/packet.h netrose/rose.h \
 		  neteconet/ec.h netiucv/iucv.h
+sysdep_routines += netlink_assert_response
 endif
 
 # Don't compile the ctype glue code, since there is no old non-GNU C library.
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index 16bb281..202ffcc 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -169,5 +169,7 @@ libc {
   GLIBC_PRIVATE {
 # functions used in other libraries
 __syscall_rt_sigqueueinfo;
+# functions used by nscd
+__netlink_assert_response;
   }
 }
diff --git a/sysdeps/unix/sysv/linux/check_native.c b/sysdeps/unix/sysv/linux/check_native.c
index eaefca1..d04c8f2 100644
--- a/sysdeps/unix/sysv/linux/check_native.c
+++ b/sysdeps/unix/sysv/linux/check_native.c
@@ -35,6 +35,7 @@
 
 #include 
 
+#include "netlinkaccess

Re: e1000 driver and samba

2007-09-18 Thread Florian Weimer
* Urs Thuermann:

 How can a corrupted frame pass the TCP checksum check?

The TCP/IP checksums are extremely weak.  If the corruption is due to
defective SRAM or something like that, it's likely that it causes an
error pattern which is 16-bit-aligned.  And an even number of
16-bit-aligned bit flips is not detected by the TCP checksum. 8-(

Actually, nobody should use TCP without application-level checksums
for that reason.  But of course, there is HTTP.

-- 
Florian Weimer[EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Beyond 64K TCP connections limit per IP-address

2007-07-04 Thread Florian Weimer
* Robert Iakobashvili:

 If I am correct, a TCP server can make up to
 64K accepts for a port at a single IP-address.

I don't think such a limit exists.  In typical configurations, a
single client IP address can only establish a few tens of thousands of
TCP connections to one server port.  But as soon as multiple clients
are involved, there is virtually no protocol-imposed limit.

-- 
Florian Weimer[EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.18-rc6 1/2] dllink driver: porting v1.19 to linux 2.6.18-rc6

2006-09-21 Thread Florian Weimer
* Arjan van de Ven:

  if (irq)
  spin_lock(np-tx_lock);
  else
  spin_lock_irqsave(np-tx_lock, flag);

 double p

 this is wrong to do with in_interrupt() as gating factor!

Well, this is not Hayim fault.  It's in 2.6.18 as is.

Unfortunately, the driver does not work very well.  Transmitting
frames hangs after a while (quickly to reproduce with ping -f
-s4).  Any suggestions how we could this get working?  The cards
used work quite well in older systems despite all the driver bugs, but
current systems with different timings expose them.

-- 
Florian Weimer[EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Durlacher Allee 47tel: +49-721-96201-1
D-76131 Karlsruhe fax: +49-721-96201-99
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question about tcp hash function tcp_hashfn()

2006-06-02 Thread Florian Weimer
* Evgeniy Polyakov:

 :) thats true, but to be 100% honest I used different code to test for
 hash artifacts...

Ah, okay.

 But it still does not fix artifacts with for example const IP and random
 ports or const IP and linear port selection.

I see them now.  Hmm.  Is there a theoretical explanation for them?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html