Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
Paolo Abeniwrites: > Thank you! > > I'll submit formally the patch after some more testing. Thanks. > I noticed this version has entered the ppc patchwork, but I think that > the formal submission should go towards the net-next tree. Yeah it picks up all patches sent to the list. That's fine I'll just mark it "Not applicable", and expect to see it arrive via net-next. cheers
Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
On Fri, 2017-06-23 at 16:59 +1000, Michael Ellerman wrote: > Hannes Frederic Sowawrites: > > > On Thu, Jun 22, 2017, at 22:57, Paolo Abeni wrote: > > > > > > Can you please check if the following patch fixes the issue? Only > > > compiled tested here. > > > > > > Thanks!!! > > > --- > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > > > index 067a607..80d89fe 100644 > > > --- a/net/ipv4/udp.c > > > +++ b/net/ipv4/udp.c > > > @@ -1446,16 +1446,19 @@ static struct sk_buff > > > *__first_packet_length(struct sock *sk, > > > { > > > struct sk_buff *skb; > > > > > > - while ((skb = skb_peek(rcvq)) != NULL && > > > - udp_lib_checksum_complete(skb)) { > > > - __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS, > > > - IS_UDPLITE(sk)); > > > - __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, > > > - IS_UDPLITE(sk)); > > > - atomic_inc(>sk_drops); > > > - __skb_unlink(skb, rcvq); > > > - *total += skb->truesize; > > > - kfree_skb(skb); > > > + while ((skb = skb_peek(rcvq)) != NULL) { > > > + if (udp_lib_checksum_complete(skb)) { > > > + __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS, > > > + IS_UDPLITE(sk)); > > > + __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, > > > + IS_UDPLITE(sk)); > > > + atomic_inc(>sk_drops); > > > + __skb_unlink(skb, rcvq); > > > + *total += skb->truesize; > > > + kfree_skb(skb); > > > + } else { > > > + udp_set_dev_scratch(skb); > > > > It needs a "break;" here. > > > > > + } > > > } > > > return skb; > > > } > > That works! > > $ wget google.com > --2017-06-23 16:56:31-- http://google.com/ > Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3 > Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected. > Proxy request sent, awaiting response... 302 Found > Location: http://www.google.com.au/?gfe_rd=cr=n7tMWeb9JYPr8wfg4LXYAQ > [following] > --2017-06-23 16:56:31-- > http://www.google.com.au/?gfe_rd=cr=n7tMWeb9JYPr8wfg4LXYAQ > Reusing existing connection to proxy.pmdw.com:3128. > Proxy request sent, awaiting response... 200 OK > Length: unspecified [text/html] > Saving to: ‘index.html’ > > > The patch had whitespace issues or something and I had to apply it by > hand, here's what I actually tested. Thank you! I'll submit formally the patch after some more testing. I noticed this version has entered the ppc patchwork, but I think that the formal submission should go towards the net-next tree. Cheers, Paolo
Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
Hannes Frederic Sowawrites: > On Thu, Jun 22, 2017, at 22:57, Paolo Abeni wrote: >> >> Can you please check if the following patch fixes the issue? Only >> compiled tested here. >> >> Thanks!!! >> --- >> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c >> index 067a607..80d89fe 100644 >> --- a/net/ipv4/udp.c >> +++ b/net/ipv4/udp.c >> @@ -1446,16 +1446,19 @@ static struct sk_buff >> *__first_packet_length(struct sock *sk, >> { >> struct sk_buff *skb; >> >> - while ((skb = skb_peek(rcvq)) != NULL && >> - udp_lib_checksum_complete(skb)) { >> - __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS, >> - IS_UDPLITE(sk)); >> - __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, >> - IS_UDPLITE(sk)); >> - atomic_inc(>sk_drops); >> - __skb_unlink(skb, rcvq); >> - *total += skb->truesize; >> - kfree_skb(skb); >> + while ((skb = skb_peek(rcvq)) != NULL) { >> + if (udp_lib_checksum_complete(skb)) { >> + __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS, >> + IS_UDPLITE(sk)); >> + __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, >> + IS_UDPLITE(sk)); >> + atomic_inc(>sk_drops); >> + __skb_unlink(skb, rcvq); >> + *total += skb->truesize; >> + kfree_skb(skb); >> + } else { >> + udp_set_dev_scratch(skb); > > It needs a "break;" here. > >> + } >> } >> return skb; >> } That works! $ wget google.com --2017-06-23 16:56:31-- http://google.com/ Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3 Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected. Proxy request sent, awaiting response... 302 Found Location: http://www.google.com.au/?gfe_rd=cr=n7tMWeb9JYPr8wfg4LXYAQ [following] --2017-06-23 16:56:31-- http://www.google.com.au/?gfe_rd=cr=n7tMWeb9JYPr8wfg4LXYAQ Reusing existing connection to proxy.pmdw.com:3128. Proxy request sent, awaiting response... 200 OK Length: unspecified [text/html] Saving to: ‘index.html’ The patch had whitespace issues or something and I had to apply it by hand, here's what I actually tested. cheers diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 067a607917f9..d3227c1bbe8e 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1446,16 +1446,20 @@ static struct sk_buff *__first_packet_length(struct sock *sk, { struct sk_buff *skb; - while ((skb = skb_peek(rcvq)) != NULL && - udp_lib_checksum_complete(skb)) { - __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS, - IS_UDPLITE(sk)); - __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, - IS_UDPLITE(sk)); - atomic_inc(>sk_drops); - __skb_unlink(skb, rcvq); - *total += skb->truesize; - kfree_skb(skb); + while ((skb = skb_peek(rcvq)) != NULL) { + if (udp_lib_checksum_complete(skb)) { + __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS, + IS_UDPLITE(sk)); + __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, + IS_UDPLITE(sk)); + atomic_inc(>sk_drops); + __skb_unlink(skb, rcvq); + *total += skb->truesize; + kfree_skb(skb); + } else { + udp_set_dev_scratch(skb); + break; + } } return skb; }
Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
On Thu, 2017-06-22 at 18:43 +0200, Paolo Abeni wrote: > On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote: > > Paolo wrote: > > > when udp_recvmsg() is executed, on x86_64 and other archs, most skb > > > fields are on cold cachelines. > > > If the skb are linear and the kernel don't need to compute the udp > > > csum, only a handful of skb fields are required by udp_recvmsg(). > > > Since we already use skb->dev_scratch to cache hot data, and > > > there are 32 bits unused on 64 bit archs, use such field to cache > > > as much data as we can, and try to prefetch on dequeue the relevant > > > fields that are left out. > > > > > > This can save up to 2 cache miss per packet. > > > > > > v1 -> v2: > > > - changed udp_dev_scratch fields types to u{32,16} variant, > > > replaced bitfiled with bool > > > > > > Signed-off-by: Paolo Abeni> > > Acked-by: Eric Dumazet > > > --- > > > net/ipv4/udp.c | 114 > > > +++-- > > > 1 file changed, 103 insertions(+), 11 deletions(-) > > > > This appears to break wget on one of my machines. > > > > Networking in general is working, I'm able to SSH in, but then I can't > > do a wget. > > > > eg: > > > > $ wget google.com > > --2017-06-22 22:45:39-- http://google.com/ > > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in > > name resolution. > > wget: unable to resolve host address ‘proxy.pmdw.com’ > > > > $ host proxy.pmdw.com > > proxy.pmdw.com is an alias for raven.pmdw.com. > > raven.pmdw.com has address 10.1.2.3 > > > > $ wget google.com > > --2017-06-22 22:52:08-- http://google.com/ > > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in > > name resolution. > > wget: unable to resolve host address ‘proxy.pmdw.com’ > > > > Maybe host is using TCP but the man page says it doesn't? > > > > > > Everything is OK if I boot back to the previous commit 0a463c78d25b > > ("udp: avoid a cache miss on dequeue"): > > > > $ wget google.com > > --2017-06-22 23:00:01-- http://google.com/ > > Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3 > > Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected. > > Proxy request sent, awaiting response... 302 Found > > Location: http://www.google.com.au/?gfe_rd=cr=Ub9LWbPbLujDXrH1uPgE > > [following] > > --2017-06-22 23:00:01-- > > http://www.google.com.au/?gfe_rd=cr=Ub9LWbPbLujDXrH1uPgE > > Reusing existing connection to proxy.pmdw.com:3128. > > Proxy request sent, awaiting response... 200 OK > > Length: unspecified [text/html] > > Saving to: ‘index.html’ > > > > index.html [ <=> > > ] 11.37K --.-KB/sin 0.001s > > > > 2017-06-22 23:00:01 (22.0 MB/s) - ‘index.html’ saved [11640] > > > > $ uname -a > > Linux 4.12.0-rc4-gcc6-00988-g0a463c7 #88 SMP Thu Jun 22 22:55:12 AEST 2017 > > ppc64 GNU/Linux > > > > > > Haven't had time to debug any further. Any ideas? > > Thank you for this report. > > Can you please specify features of the relevant NIC ? (ethtool -k > ) > > I'll try to replicate the issue as soon I'll get hands on suitable HW, I had my hands on power7, but I can't trivially reproduce the issue so I'm going to bug you for more info. Can you please specify the host CPU, the NIC in use (ethtool -i ), the compiler version used to build the kernel and possibly provide a tcpdump of the DNS packets received/sent while running wget and while running the host command? Do you have the relevant kernel running on others PPC hosts? Thank you, Paolo
Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
On Thu, Jun 22, 2017, at 22:57, Paolo Abeni wrote: > > Can you please check if the following patch fixes the issue? Only > compiled tested here. > > Thanks!!! > --- > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > index 067a607..80d89fe 100644 > --- a/net/ipv4/udp.c > +++ b/net/ipv4/udp.c > @@ -1446,16 +1446,19 @@ static struct sk_buff > *__first_packet_length(struct sock *sk, > { > struct sk_buff *skb; > > - while ((skb = skb_peek(rcvq)) != NULL && > - udp_lib_checksum_complete(skb)) { > - __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS, > - IS_UDPLITE(sk)); > - __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, > - IS_UDPLITE(sk)); > - atomic_inc(>sk_drops); > - __skb_unlink(skb, rcvq); > - *total += skb->truesize; > - kfree_skb(skb); > + while ((skb = skb_peek(rcvq)) != NULL) { > + if (udp_lib_checksum_complete(skb)) { > + __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS, > + IS_UDPLITE(sk)); > + __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, > + IS_UDPLITE(sk)); > + atomic_inc(>sk_drops); > + __skb_unlink(skb, rcvq); > + *total += skb->truesize; > + kfree_skb(skb); > + } else { > + udp_set_dev_scratch(skb); It needs a "break;" here. > + } > } > return skb; > } Bye, Hannes
Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote: > Paolo wrote: > > when udp_recvmsg() is executed, on x86_64 and other archs, most skb > > fields are on cold cachelines. > > If the skb are linear and the kernel don't need to compute the udp > > csum, only a handful of skb fields are required by udp_recvmsg(). > > Since we already use skb->dev_scratch to cache hot data, and > > there are 32 bits unused on 64 bit archs, use such field to cache > > as much data as we can, and try to prefetch on dequeue the relevant > > fields that are left out. > > > > This can save up to 2 cache miss per packet. > > > > v1 -> v2: > > - changed udp_dev_scratch fields types to u{32,16} variant, > > replaced bitfiled with bool > > > > Signed-off-by: Paolo Abeni> > Acked-by: Eric Dumazet > > --- > > net/ipv4/udp.c | 114 > > +++-- > > 1 file changed, 103 insertions(+), 11 deletions(-) > > This appears to break wget on one of my machines. > > Networking in general is working, I'm able to SSH in, but then I can't > do a wget. Can you please check if the following patch fixes the issue? Only compiled tested here. Thanks!!! --- diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 067a607..80d89fe 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1446,16 +1446,19 @@ static struct sk_buff *__first_packet_length(struct sock *sk, { struct sk_buff *skb; - while ((skb = skb_peek(rcvq)) != NULL && - udp_lib_checksum_complete(skb)) { - __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS, - IS_UDPLITE(sk)); - __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, - IS_UDPLITE(sk)); - atomic_inc(>sk_drops); - __skb_unlink(skb, rcvq); - *total += skb->truesize; - kfree_skb(skb); + while ((skb = skb_peek(rcvq)) != NULL) { + if (udp_lib_checksum_complete(skb)) { + __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS, + IS_UDPLITE(sk)); + __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, + IS_UDPLITE(sk)); + atomic_inc(>sk_drops); + __skb_unlink(skb, rcvq); + *total += skb->truesize; + kfree_skb(skb); + } else { + udp_set_dev_scratch(skb); + } } return skb; }
Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
On Thu, 2017-06-22 at 18:43 +0200, Paolo Abeni wrote: > On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote: > > Paolo wrote: > > > when udp_recvmsg() is executed, on x86_64 and other archs, most skb > > > fields are on cold cachelines. > > > If the skb are linear and the kernel don't need to compute the udp > > > csum, only a handful of skb fields are required by udp_recvmsg(). > > > Since we already use skb->dev_scratch to cache hot data, and > > > there are 32 bits unused on 64 bit archs, use such field to cache > > > as much data as we can, and try to prefetch on dequeue the relevant > > > fields that are left out. > > > > > > This can save up to 2 cache miss per packet. > > > > > > v1 -> v2: > > > - changed udp_dev_scratch fields types to u{32,16} variant, > > > replaced bitfiled with bool > > > > > > Signed-off-by: Paolo Abeni> > > Acked-by: Eric Dumazet > > > --- > > > net/ipv4/udp.c | 114 > > > +++-- > > > 1 file changed, 103 insertions(+), 11 deletions(-) > > > > This appears to break wget on one of my machines. > > > > Networking in general is working, I'm able to SSH in, but then I can't > > do a wget. > > > > eg: > > > > $ wget google.com > > --2017-06-22 22:45:39-- http://google.com/ > > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in > > name resolution. > > wget: unable to resolve host address ‘proxy.pmdw.com’ > > > > $ host proxy.pmdw.com > > proxy.pmdw.com is an alias for raven.pmdw.com. > > raven.pmdw.com has address 10.1.2.3 > > > > $ wget google.com > > --2017-06-22 22:52:08-- http://google.com/ > > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in > > name resolution. > > wget: unable to resolve host address ‘proxy.pmdw.com’ > > > > Maybe host is using TCP but the man page says it doesn't? > > > > > > Everything is OK if I boot back to the previous commit 0a463c78d25b > > ("udp: avoid a cache miss on dequeue"): > > > > $ wget google.com > > --2017-06-22 23:00:01-- http://google.com/ > > Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3 > > Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected. > > Proxy request sent, awaiting response... 302 Found > > Location: http://www.google.com.au/?gfe_rd=cr=Ub9LWbPbLujDXrH1uPgE > > [following] > > --2017-06-22 23:00:01-- > > http://www.google.com.au/?gfe_rd=cr=Ub9LWbPbLujDXrH1uPgE > > Reusing existing connection to proxy.pmdw.com:3128. > > Proxy request sent, awaiting response... 200 OK > > Length: unspecified [text/html] > > Saving to: ‘index.html’ > > > > index.html [ <=> > > ] 11.37K --.-KB/sin 0.001s > > > > 2017-06-22 23:00:01 (22.0 MB/s) - ‘index.html’ saved [11640] > > > > $ uname -a > > Linux 4.12.0-rc4-gcc6-00988-g0a463c7 #88 SMP Thu Jun 22 22:55:12 AEST 2017 > > ppc64 GNU/Linux > > > > > > Haven't had time to debug any further. Any ideas? > > Thank you for this report. > > Can you please specify features of the relevant NIC ? (ethtool -k > ) > > I'll try to replicate the issue as soon I'll get hands on suitable HW, I had my hands on power7, but I can't trivially reproduce the issue so I'm going to bug you for more info. Can you please specify the host CPU, the NIC in use (ethtool -i ), the compiler version used to build the kernel and possibly provide a tcpdump of the DNS packets received/sent while running wget and while running the host command? Do you have the relevant kernel running on others PPC hosts? Thank you, Paolo
Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote: > Paolo wrote: > > when udp_recvmsg() is executed, on x86_64 and other archs, most skb > > fields are on cold cachelines. > > If the skb are linear and the kernel don't need to compute the udp > > csum, only a handful of skb fields are required by udp_recvmsg(). > > Since we already use skb->dev_scratch to cache hot data, and > > there are 32 bits unused on 64 bit archs, use such field to cache > > as much data as we can, and try to prefetch on dequeue the relevant > > fields that are left out. > > > > This can save up to 2 cache miss per packet. > > > > v1 -> v2: > > - changed udp_dev_scratch fields types to u{32,16} variant, > > replaced bitfiled with bool > > > > Signed-off-by: Paolo Abeni> > Acked-by: Eric Dumazet > > --- > > net/ipv4/udp.c | 114 > > +++-- > > 1 file changed, 103 insertions(+), 11 deletions(-) > > This appears to break wget on one of my machines. > > Networking in general is working, I'm able to SSH in, but then I can't > do a wget. > > eg: > > $ wget google.com > --2017-06-22 22:45:39-- http://google.com/ > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in > name resolution. > wget: unable to resolve host address ‘proxy.pmdw.com’ > > $ host proxy.pmdw.com > proxy.pmdw.com is an alias for raven.pmdw.com. > raven.pmdw.com has address 10.1.2.3 > > $ wget google.com > --2017-06-22 22:52:08-- http://google.com/ > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in > name resolution. > wget: unable to resolve host address ‘proxy.pmdw.com’ > > Maybe host is using TCP but the man page says it doesn't? > > > Everything is OK if I boot back to the previous commit 0a463c78d25b > ("udp: avoid a cache miss on dequeue"): > > $ wget google.com > --2017-06-22 23:00:01-- http://google.com/ > Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3 > Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected. > Proxy request sent, awaiting response... 302 Found > Location: http://www.google.com.au/?gfe_rd=cr=Ub9LWbPbLujDXrH1uPgE > [following] > --2017-06-22 23:00:01-- > http://www.google.com.au/?gfe_rd=cr=Ub9LWbPbLujDXrH1uPgE > Reusing existing connection to proxy.pmdw.com:3128. > Proxy request sent, awaiting response... 200 OK > Length: unspecified [text/html] > Saving to: ‘index.html’ > > index.html [ <=> ] > 11.37K --.-KB/sin 0.001s > > 2017-06-22 23:00:01 (22.0 MB/s) - ‘index.html’ saved [11640] > > $ uname -a > Linux 4.12.0-rc4-gcc6-00988-g0a463c7 #88 SMP Thu Jun 22 22:55:12 AEST 2017 > ppc64 GNU/Linux > > > Haven't had time to debug any further. Any ideas? Thank you for this report. Can you please specify features of the relevant NIC ? (ethtool -k ) I'll try to replicate the issue as soon I'll get hands on suitable HW, meanwhile can you please try to trace the system behavior with perf? Something like: perf probe -a __udp_enqueue_schedule_skb perf probe -a udp_recvmsg perf probe -a udpv6_recvmsg perf record -e probe:__udp_enqueue_schedule_skb -e probe:udp_recvmsg -e probe:udpv6_recvmsg -ag wget google.com perf report --stdio Thanks, Paolo