from:"Jay Smith"

Re: UDP wierdness around skb_copy_and_csum_datagram_msg()

2016-10-17 Thread Jay Smith


Trying to revive this thread.

To review:  skb_copy_and_csum_datagram_msg() pretty clearly doesn't do
the right thing since it started using an iov_iter to copy into the
user's iovec.  In particular, if it encounters a datagram that fails the
checksum, the iov_iter continues to point to the end of the failed
datagram's data, and that data makes it out to user-space.

I'd previously sent a test program that consistenly reproduced the UDP(v4)
symptoms of this problem [0].  I've now also taken Christian's netem
suggestion and written a quick program that sends known data over
loopback TCP from one thread and reads it from another.  It optionally
sets up a netem qdisc that corrupts 1% of packets.  As expected, even
with corruption, tcp delivers correct data to the user on pre-3.19
kernels; on 3.19 and later, long transfers usually see corruptions.
(Source for the TCP test program below.)

I've also tried both test programs with the following patch:

diff --git a/net/core/datagram.c b/net/core/datagram.c
index b7de71f..61da091 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -730,6 +730,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 {
__wsum csum;
int chunk = skb->len - hlen;
+   struct iov_iter saved_iter;
 
if (!chunk)
return 0;
@@ -741,11 +742,16 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
goto fault;
} else {
csum = csum_partial(skb->data, hlen, skb->csum);
+
+   /* save msg_iter state, so we can revert if csum fails. */
+   saved_iter = msg->msg_iter;
if (skb_copy_and_csum_datagram(skb, hlen, &msg->msg_iter,
   chunk, &csum))
goto fault;
-   if (csum_fold(csum))
+   if (csum_fold(csum)) {
+   msg->msg_iter = saved_iter;
goto csum_error;
+   }
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
netdev_rx_csum_fault(skb->dev);
}


(This is essentially the same fix Alan previously sent [1], except that
it uses struct assignment rather than memcpy'ing the struct iov_iter.)
As expected, both UDP and TCP tests succeed under this fix.

So I've missed whatever conversations happened off-list after Alan's
original report.  But it appears to me that this patch completely
resolves the csum/iov_iter problem for both TCP and UDP; I'm not sure I
see what further problem we'd want to hold the fix off for?

[0] https://www.spinics.net/lists/netdev/msg398026.html
[1] https://patchwork.kernel.org/patch/9260733/


New TCP test program:

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 
#include 
#include 

int bytes = 0;
struct sockaddr_in addr;
socklen_t addrLen = sizeof(addr);  


void *sender(void *ignore)
{
  int send = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
  if (send < 0)
  {
fprintf(stderr, "failed to create sending socket (err=%d: %s)\n", errno, 
strerror(errno));
exit(1);
  }

  int ret = connect(send, (struct sockaddr *) &addr, addrLen);
  if (ret != 0)
  {
fprintf(stderr, "failed to connect sending socket (err=%d: %s)\n", errno, 
strerror(errno));
exit(1);
  }

  int i = 0;
  while (i < bytes)
  {
#define OUT_MESSAGE  
""
int w = write(send, OUT_MESSAGE, strlen(OUT_MESSAGE));
if (w < 0)
{
  fprintf(stdout, "failed to write byte %d\n", i);
  exit(1);
}
i += w;
  }

  close(send);
}

/* set a qdisc on lo with corruption_probability 1% (or remove if if on==0) */
void setCorrupt(int on)
{
  struct nl_sock *sock;
  struct nl_cache *cache;
  struct rtnl_qdisc *q;
  struct rtnl_link *link;
  int if_index;

  sock = nl_socket_alloc();
  nl_connect(sock, NETLINK_ROUTE);

  rtnl_link_alloc_cache(sock, AF_UNSPEC, &cache);
  link = rtnl_link_get_by_name(cache, "lo");
  if_index = rtnl_link_get_ifindex(link);

  q = rtnl_qdisc_alloc();
  rtnl_tc_set_ifindex(TC_CAST(q), if_index);
  rtnl_tc_set_parent(TC_CAST(q), TC_H_ROOT);
  rtnl_tc_set_handle(TC_CAST(q), TC_HANDLE(1, 0));
  rtnl_tc_set_kind(TC_CAST(q), "netem");
  rtnl_netem_set_corruption_probability(q, 0x / 100);
  if (on)
  {
int ret = rtnl_qdisc_add(sock, q, NLM_F_CREATE);
if (ret < 0)
  {
fprintf(stderr, "rtnl_qdisc_add error: %s\n", nl_geterror(ret));
exit(1);
  }
  }
  else
  {
int ret = rtnl_qdisc_delete(sock, q);
if (ret < 0)
  {
fprintf(stderr, "rtnl_qdisc_del error: %s\n", nl_geterror(ret));
exit(1);
  }
  }

  rtnl_qdisc_put(q);
  nl_socket_free(sock);
  rtnl_link_put(link);
  nl_cache_put(cache);
}


int main(int argc, char **argv)
{
  if (argc < 2)
  {
fprintf(stderr, "Usage: tcpcsum  [corruption-rate]");
exit(1);
  }

  bytes = atoi(argv[1]);
  int cor

Re: UDP wierdness around skb_copy_and_csum_datagram_msg()

2016-09-30 Thread Jay Smith

Christian Lamparter writes:

> On Wednesday, September 28, 2016 7:20:39 PM CEST Jay Smith wrote:
>> Actually, on a little more searching of this list's archives, I think
>> that this discussion:  https://patchwork.kernel.org/patch/9260733/ is
>> about exactly the same issue I've found, except from the TCP side. I'm
>> cc'ing a few of the participants from that discussion.
>> 
>> So is the patch proposed there (copying and restoring the entire
>> iov_iter in skb_copy_and_csum_datagram_msg()) being considered as a
>> fix?
>
> From Alan's post:
>
> "My ugly patch fixes this in the most obvious way: make a local copy of
> msg->msg_iter before the call to skb_copy_and_csum_datagram(), and copy
> it back if the checksum is bad, just before goto csum_error;"
>
> IMHO this meant that the patch is a proof of concept for his problem.

It's also the simplest thing that fixes all of the relevant cases (udp4,
udp6, tcp4).  Basically, the state of the iov_iter (which, if I'm
reading correctly, consists of three elements -- iov_offset, count, and
nr_segs all change values as the iterator moves through the vectors)
needs to be backed-up and restored at exactly the points in datagram.c
where Alan's patch does so.

Whether that should be done with memcpy, as Alan does, or by exposing
some more abstract backup/restore functions from iov_iter.c is a matter
of taste.  I'm happy to accept the call of someone more maintainer-ish
on that.

> Al Viro identified more inconsistencies within the error-paths that deal
> with EFAULT in the whole area (in and around skb_copy_and_csum_datagram()).

Was this in some other thread?  The only other discussion I see of that
function in the "PROBLEM: network data corruption..." thread is around
this patch https://patchwork.kernel.org/patch/9245023/ , which as Al
says was just a diagnostic patch -- it intentionally doesn't handle the
multiple-vector case.

It seems like the EFAULT case in skb_copy_and_csum_datagram() would
indicate that the iov_iter code ran out of room to copy the current
message, even though it's checked for that room at datagram.c:738.
Which I guess is possible -- there could be some non-obvious counting
error in the iov_inter.c macros.  But, at least in the UDP cases, it
wouldn't trigger the same problem as a checksum failure -- the EFAULT
gets returned to the caller in that case, and the buffer isn't meant
to be valid.  It's only in the checksum case that we retry underneath
the udp_recvmsg() covers, and end up returning the supposedly-rejected data.

>
> As for fixing the issue: I'm happy to test and review patches. 
> The trouble is that nobody seem to be able to produce them...

Sorry -- is the trouble you're talking about here that no-one's produced
a patch, or that we don't have a reproduction of the problem?  I don't
think either is true.

The test program I'd attached to my first mail reliably reproduces
the UDP version of the problem.  It's pretty simple: listen on a UDP
port (using loopback, so that there's no hardware csum offload), use a
raw socket to send a datagram with a bad UDP checksum, then send a good
datagram, and then finally read from the socket.  On post-3.19 kernels,
you always get the contents of the bad packet at the start of the user
buffer: 

# bin/csumtestn 69
listening on port 47193
recvmsg returned 9 bytes: BAD DATABAD DATABAD DATABAD DATABAD DATABAD DATABAD 
DATABAD DATABAD DGood data

After Alan's patch, the good packet's contents are at the start of the
buffer, where they belong:

# bin/csumtestn 69
listening on port 54620
recvmsg returned 9 bytes: Good dataAD DATABAD DATABAD DATABAD DATABAD DATABAD 
DATABAD DATABAD D

So functionally, I believe that Alan's patch does the trick.  I haven't
actually tested it on UDP6, but a similar test should work there.

Inserting the bad packets deterministically into a TCP connection is
trickier, but I thought in the previous thread that you and Alan both
had wireless hardware configurations that frequently generated checksum
errors, and that Alan's claim was that his patch gave him good TCP data
even in the presence of those checksum errors.  Or do I misunderstand?

(Just to be clear, though, if there is a need for a new patch, for
whatever reason, I'm happy to generate one.)

Re: UDP wierdness around skb_copy_and_csum_datagram_msg()

2016-09-28 Thread Jay Smith

Actually, on a little more searching of this list's archives, I think
that this discussion:  https://patchwork.kernel.org/patch/9260733/ is
about exactly the same issue I've found, except from the TCP side. I'm
cc'ing a few of the participants from that discussion.

So is the patch proposed there (copying and restoring the entire
iov_iter in skb_copy_and_csum_datagram_msg()) being considered as a
fix?  If not, would an alternate one that concealed the
save-and-restore logic inside iov_iter.c be more acceptable?  I'd be
happy to produce whatever's needed, or yield to someone with stronger
feelings about where it should go...

On Wed, Sep 28, 2016 at 6:24 PM, Eric Dumazet  wrote:
> On Wed, 2016-09-28 at 17:18 -0700, Jay Smith wrote:
>> I've spent the last week or so trying to track down a recurring
>> problem I'm seeing with UDP datagram handling.  I'm new to the
>> internals of the Linux network stack, but it appears to me that
>> there's a substantial error in recent kernels' handling of UDP
>> checksum errors.
>>
>> The behavior I'm seeing:  I have a UDP server that receives lots of
>> datagrams from many devices on a single port. A small number of those
>> devices occasionally send packets with bad UDP checksums.  After I
>> receive one of these bad packets, the next recvmsg() made on the
>> socket returns data from the bad-checksum packet, but the
>> source-address and length of the next (good) packet that arrived at
>> the port.
>>
>> I believe this issue was introduced between linux 3.18 and 3.19, by a
>> set of changes to net/core/datagram.c that made
>> skb_copy_and_csum_datagram_msg() and related functions use the
>> iov_iter structure to copy data to user buffers.  In the case where
>> those functions copy a datagram and then conclude that the checksum is
>> invalid, they don't remove the already-copied data from the user's
>> iovec; when udp_recvmsg() calls skb_copy_and_csum_datagram_msg() for a
>> second time, looking at the next datagram on the queue, that second
>> datagram's data is appended to the first datagram's.  So when
>> recvmsg() returns to the user, the return value and msg_name reflect
>> the second datagram, but the first bytes in the user's iovec come from
>> the first (bad) datagram.
>>
>> (I've attached a test program that demonstrates the problem.  Note
>> that it sees correct behavior unless the bad-checksum packet has > 68
>> bytes of UDP data; otherwise, the packet doesn't make it past the
>> CHECKSUM_BREAK test, and never enters
>> skp_copy_and_csum_datagram_msg().)
>>
>> The fix for UDP seems pretty simple; the iov_iter's iov_offset member
>> just needs to be set back to zero on a checksum failure.  But it looks
>> like skb_copy_and_csum_datagram_msg() is also called from tcp_input.c,
>> where I assume that multiple sk_buffs can be copied-and-csum'd into
>> the same iov -- if that's right, it seems like iov_iter needs some
>> additional state to support rolling-back the most recent copy without
>> losing previous ones.
>>
>> Any thoughts?
>
> Nice catch !
>
> What about clearing iov_offset from UDP (v4 & v6) only ?
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 
> 7d96dc2d3d08fa909f247dfbcbd0fc1eeb59862b..928da2fbb3caa6de4d0e1d889c237590f71607ea
>  100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1342,6 +1342,7 @@ csum_copy_err:
> /* starting over for a new packet, but check if we need to yield */
> cond_resched();
> msg->msg_flags &= ~MSG_TRUNC;
> +   msg->msg_iter.iov_offset = 0;
> goto try_again;
>  }
>
>
> (similar for ipv6)
>
>

UDP wierdness around skb_copy_and_csum_datagram_msg()

2016-09-28 Thread Jay Smith

I've spent the last week or so trying to track down a recurring
problem I'm seeing with UDP datagram handling.  I'm new to the
internals of the Linux network stack, but it appears to me that
there's a substantial error in recent kernels' handling of UDP
checksum errors.

The behavior I'm seeing:  I have a UDP server that receives lots of
datagrams from many devices on a single port. A small number of those
devices occasionally send packets with bad UDP checksums.  After I
receive one of these bad packets, the next recvmsg() made on the
socket returns data from the bad-checksum packet, but the
source-address and length of the next (good) packet that arrived at
the port.

I believe this issue was introduced between linux 3.18 and 3.19, by a
set of changes to net/core/datagram.c that made
skb_copy_and_csum_datagram_msg() and related functions use the
iov_iter structure to copy data to user buffers.  In the case where
those functions copy a datagram and then conclude that the checksum is
invalid, they don't remove the already-copied data from the user's
iovec; when udp_recvmsg() calls skb_copy_and_csum_datagram_msg() for a
second time, looking at the next datagram on the queue, that second
datagram's data is appended to the first datagram's.  So when
recvmsg() returns to the user, the return value and msg_name reflect
the second datagram, but the first bytes in the user's iovec come from
the first (bad) datagram.

(I've attached a test program that demonstrates the problem.  Note
that it sees correct behavior unless the bad-checksum packet has > 68
bytes of UDP data; otherwise, the packet doesn't make it past the
CHECKSUM_BREAK test, and never enters
skp_copy_and_csum_datagram_msg().)

The fix for UDP seems pretty simple; the iov_iter's iov_offset member
just needs to be set back to zero on a checksum failure.  But it looks
like skb_copy_and_csum_datagram_msg() is also called from tcp_input.c,
where I assume that multiple sk_buffs can be copied-and-csum'd into
the same iov -- if that's right, it seems like iov_iter needs some
additional state to support rolling-back the most recent copy without
losing previous ones.

Any thoughts?
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 


int main(int argc, char **argv)
{
  int ret;

  if (argc < 2) {
fprintf(stderr, "Usage: csumtest \n");
exit(1);
  }

  struct sockaddr_in addr;
  socklen_t addrLen = sizeof(addr);  
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = inet_addr("127.0.0.1");
  addr.sin_port = htons(0);

  int listen = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
  if (listen < 0)
  {
fprintf(stderr, "failed to create listening socket (err=%d: %s)\n", errno, strerror(errno));
exit(1);
  }

  ret = bind(listen, (struct sockaddr *) &addr, addrLen);
  if (ret != 0)
  {
fprintf(stderr, "failed to bind listening socket socket (err=%d: %s)\n", errno, strerror(errno));
exit(1);
  }

  ret = getsockname(listen, (struct sockaddr *) &addr, &addrLen);
  if (ret != 0)
  {
fprintf(stderr, "getsockname failed for listening socket socket (err=%d: %s)\n", errno, strerror(errno));
exit(1);
  }
  else
  {
printf("listening on port %d\n", ntohs(addr.sin_port));
  }


  
  // Send a packet with deliberately bad checksum
  char rawBuf[4096];
  memset(rawBuf, 0, 4096);
  struct udphdr *udph = (struct udphdr *) rawBuf;

  // Send as UDP data the string "BAD DATA", repeated as necessary
  // to fill the number of bytes requested on the command-line
  char *data = rawBuf + sizeof(struct udphdr);
  int badBytes = atoi(argv[1]);
  int i;
  for (i = 0; i < badBytes; i += 8)
  {
strncpy(data + i, "BAD DATA", 8);
  }

  // UDP headers contain a bogus checksum, but are otherwise reasonable
  udph->check = htons(0xbadd);
  udph->source = htons(18);
  udph->dest = addr.sin_port;
  udph->len = htons(sizeof(struct udphdr) + badBytes);

  int raw = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
  ret = sendto(raw, rawBuf,
	   sizeof(struct udphdr) + badBytes, 0,
	   (struct sockaddr *) &addr, addrLen);
  if (ret < 0)
  {
fprintf(stderr, "raw send failed (err=%d: %s)\n", errno, strerror(errno));
exit(1);
  }
  
  
  // Send a second, legitimate UDP packet
  int cooked = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
  char *second = "Good data";
  ret = sendto(cooked, second, strlen(second), 0,
	   (struct sockaddr *) &addr, addrLen);
  if (ret < 0)
  {
fprintf(stderr, "second cooked send failed (err=%d: %s)\n", errno, strerror(errno));
exit(1);
  }

  // Read what's queued up for us.
  while (1)
  {
struct msghdr msg;
struct iovec  iov;
char readBuf[4096];
ssize_t bytes;

memset(readBuf, 0, 4096);

iov.iov_base = readBuf;
iov.iov_len = sizeof(readBuf);

msg.msg_name = NULL;
msg.msg_namelen = 0;
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_flags = 0;

b

Re: UDP wierdness around skb_copy_and_csum_datagram_msg()

Re: UDP wierdness around skb_copy_and_csum_datagram_msg()

Re: UDP wierdness around skb_copy_and_csum_datagram_msg()

UDP wierdness around skb_copy_and_csum_datagram_msg()

4 matches

Site Navigation

Mail list logo

Footer information