Re: scheduling while atomic from vmci_transport_recv_stream_cb in 3.16 kernels
On Fri 15-09-17 18:12:15, Ben Hutchings wrote:
> On Thu, 2017-09-14 at 10:59 +0200, Michal Hocko wrote:
> > On Wed 13-09-17 18:58:13, Jorgen S. Hansen wrote:
> > [...]
> > > The patch series look good to me.
> >
> > Thanks for double checking. Ben, could you merge this to 3.16 stable
> > branch, please?
>
> I have a long list of requests to work through, but I will get to this
> eventually.

Thanks!
--
Michal Hocko
SUSE Labs
[PATCH net-next] net: remove useless comments in dst.c
dst gc related code has been removed in commit 5b7c9a8ff828, so those
comments are no longer useful.

Signed-off-by: Duan Jiong
---
 net/core/dst.c | 17 -
 1 file changed, 17 deletions(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index a6c47da..a710d39 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -25,23 +25,6 @@
 #include
 #include

-/*
- * Theory of operations:
- * 1) We use a list, protected by a spinlock, to add
- *    new entries from both BH and non-BH context.
- * 2) In order to keep spinlock held for a small delay,
- *    we use a second list where are stored long lived
- *    entries, that are handled by the garbage collect thread
- *    fired by a workqueue.
- * 3) This list is guarded by a mutex,
- *    so that the gc_task and dst_dev_event() can be synchronized.
- */
-
-/*
- * We want to keep lock & list close together
- * to dirty as few cache lines as possible in __dst_free().
- * As this is not a very strong hint, we dont force an alignment on SMP.
- */
 int dst_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
 	kfree_skb(skb);
--
2.9.3
Re: [PATCH] bnx2x: drop packets where gso_size is too big for hardware
Hi Eric,

>> +	if (unlikely(skb_shinfo(skb)->gso_size + hlen > MAX_PACKET_SIZE)) {
>> +		BNX2X_ERR("reported gso segment size plus headers "
>> +			  "(%d + %d) > MAX_PACKET_SIZE; dropping pkt!",
>> +			  skb_shinfo(skb)->gso_size, hlen);
>> +
>> +		goto free_and_drop;
>> +	}
>> +
>
> If you had this test in bnx2x_features_check(), packet could be
> segmented by core networking stack before reaching bnx2x_start_xmit() by
> clearing NETIF_F_GSO_MASK
>
> -> No drop would be involved.
>
> check i40evf_features_check() for similar logic.

So I've been experimenting with this and reading through the core
networking code. If my understanding is correct, disabling GSO will
cause the packet to be segmented, but it will be segmented into
gso_size+header length packets. So in this case (~10kB gso_size) the
resultant packets will still be too big - although at least they don't
cause a crash in that case.

We could continue with this anyway as it at least prevents the crash -
but, and I haven't been able to find a nice definitive answer to this -
are implementations of ndo_start_xmit permitted to assume that the skb
passed in will fit within the MTU? I notice that most callers will
attempt to ensure this - for example ip_output.c, ip6_output.c and
ip_forward.c all contain calls to skb_gso_validate_mtu(). If
implementations are permitted to assume this, perhaps a fix to
openvswitch would be more appropriate?

Regards,
Daniel
Re: [PATCH] vhost_net: conditionally enable tx polling
Hi Jason, [auto build test WARNING on vhost/linux-next] [also build test WARNING on v4.14-rc1 next-20170915] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Jason-Wang/vhost_net-conditionally-enable-tx-polling/20170918-112041 base: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next config: x86_64-randconfig-x009-201738 (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): drivers//vhost/net.c: In function 'handle_tx': >> drivers//vhost/net.c:565:4: warning: suggest parentheses around assignment >> used as truth value [-Wparentheses] if (err = -EAGAIN) ^~ vim +565 drivers//vhost/net.c 442 443 /* Expects to be always run from workqueue - which acts as 444 * read-size critical section for our kind of RCU. */ 445 static void handle_tx(struct vhost_net *net) 446 { 447 struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; 448 struct vhost_virtqueue *vq = &nvq->vq; 449 unsigned out, in; 450 int head; 451 struct msghdr msg = { 452 .msg_name = NULL, 453 .msg_namelen = 0, 454 .msg_control = NULL, 455 .msg_controllen = 0, 456 .msg_flags = MSG_DONTWAIT, 457 }; 458 size_t len, total_len = 0; 459 int err; 460 size_t hdr_size; 461 struct socket *sock; 462 struct vhost_net_ubuf_ref *uninitialized_var(ubufs); 463 bool zcopy, zcopy_used; 464 465 mutex_lock(&vq->mutex); 466 sock = vq->private_data; 467 if (!sock) 468 goto out; 469 470 if (!vq_iotlb_prefetch(vq)) 471 goto out; 472 473 vhost_disable_notify(&net->dev, vq); 474 vhost_net_disable_vq(net, vq); 475 476 hdr_size = nvq->vhost_hlen; 477 zcopy = nvq->ubufs; 478 479 for (;;) { 480 /* Release DMAs done buffers first */ 481 if (zcopy) 482 vhost_zerocopy_signal_used(net, vq); 483 484 /* If more outstanding DMAs, queue the work. 
485 * Handle upend_idx wrap around 486 */ 487 if (unlikely(vhost_exceeds_maxpend(net))) 488 break; 489 490 head = vhost_net_tx_get_vq_desc(net, vq, vq->iov, 491 ARRAY_SIZE(vq->iov), 492 &out, &in); 493 /* On error, stop handling until the next kick. */ 494 if (unlikely(head < 0)) 495 break; 496 /* Nothing new? Wait for eventfd to tell us they refilled. */ 497 if (head == vq->num) { 498 if (unlikely(vhost_enable_notify(&net->dev, vq))) { 499 vhost_disable_notify(&net->dev, vq); 500 continue; 501 } 502 break; 503 } 504 if (in) { 505 vq_err(vq, "Unexpected descriptor format for TX: " 506 "out %d, int %d\n", out, in); 507 break; 508 } 509 /* Skip header. TODO: support TSO. */ 510 len = iov_length(vq->iov, out); 511 iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len); 512 iov_iter_advance(&msg.msg_iter, hdr_size); 513 /* Sanity check */ 514 if (!msg_data_left(&msg)) { 515 vq_err(vq, "Unexpected header len for TX: " 516 "%zd expected %zd\n", 517 len, hdr_size); 518 break; 519 } 520 len = msg_data_left(&msg); 521 522 zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN 523 && (nvq->upend_idx + 1) % UIO_MAXIOV != 524nvq->done_idx 525 && vhost_net_tx_select_zcopy(net); 526 527 /* use msg_control to pass vhost zerocopy ubuf info to skb */ 528 if (zcopy_used) { 529 struct ubuf_info *ubuf; 530
Re: [RFC PATCH] can: m_can: Support higher speed CAN-FD bitrates
On 2017/9/14 13:06, Sekhar Nori wrote:
> On Thursday 14 September 2017 03:28 AM, Franklin S Cooper Jr wrote:
>> On 08/18/2017 02:39 PM, Franklin S Cooper Jr wrote:
>>> During test transmitting using CAN-FD at high bitrates (4 Mbps) only
>>> resulted in errors. Scoping the signals I noticed that only a single
>>> bit was being transmitted and with a bit more investigation realized
>>> the actual MCAN IP would go back to initialization mode automatically.
>>>
>>> It appears this issue is due to the MCAN needing to use the
>>> Transmitter Delay Compensation Mode as defined in the MCAN User's
>>> Guide. When this mode is used the User's Guide indicates that the
>>> Transmitter Delay Compensation Offset register should be set. The
>>> document mentions that this register should be set to
>>> (1/dbitrate)/2*(Func Clk Freq).
>>>
>>> Additionally, CAN-CIA's "Bit Time Requirements for CAN FD" document
>>> indicates that this TDC mode is only needed for data bit rates above
>>> 2.5 Mbps. Therefore, only enable this mode and only set TDCO when the
>>> data bit rate is above 2.5 Mbps.
>>>
>>> Signed-off-by: Franklin S Cooper Jr
>>> ---
>>> I'm pretty surprised that this hasn't been implemented already since
>>> the primary purpose of CAN-FD is to go beyond 1 Mbps and the MCAN IP
>>> supports up to 10 Mbps. So it will be nice to get comments from users
>>> of this driver to understand if they have been able to use CAN-FD
>>> beyond 2.5 Mbps without this patch. If they haven't, what did they do
>>> to get around it if they needed higher speeds. Meanwhile I plan on
>>> testing this using a more "realistic" CAN bus to ensure everything
>>> still works at 5 Mbps, which is the max speed of my CAN transceiver.
>>
>> ping. Anyone has any thoughts on this?
>
> I added Dong who authored the m_can driver and Wenyou who added the
> only in-kernel user of the driver for any help.

I tested it on a SAMA5D2 Xplained board both with and without this patch;
both work with the 4 Mbps data bit rate.
Thanks,
Sekhar

 drivers/net/can/m_can/m_can.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index f4947a7..720e073 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -126,6 +126,12 @@ enum m_can_mram_cfg {
 #define DBTP_DSJW_SHIFT		0
 #define DBTP_DSJW_MASK		(0xf << DBTP_DSJW_SHIFT)

+/* Transmitter Delay Compensation Register (TDCR) */
+#define TDCR_TDCO_SHIFT		8
+#define TDCR_TDCO_MASK		(0x7F << TDCR_TDCO_SHIFT)
+#define TDCR_TDCF_SHIFT		0
+#define TDCR_TDCF_MASK		(0x7F << TDCR_TDCF_SHIFT)
+
 /* Test Register (TEST) */
 #define TEST_LBCK	BIT(4)

@@ -977,6 +983,8 @@ static int m_can_set_bittiming(struct net_device *dev)
 	const struct can_bittiming *dbt = &priv->can.data_bittiming;
 	u16 brp, sjw, tseg1, tseg2;
 	u32 reg_btp;
+	u32 enable_tdc = 0;
+	u32 tdco;

 	brp = bt->brp - 1;
 	sjw = bt->sjw - 1;
@@ -991,9 +999,23 @@ static int m_can_set_bittiming(struct net_device *dev)
 		sjw = dbt->sjw - 1;
 		tseg1 = dbt->prop_seg + dbt->phase_seg1 - 1;
 		tseg2 = dbt->phase_seg2 - 1;
+
+		/* TDC is only needed for bitrates beyond 2.5 MBit/s
+		 * Specified in the "Bit Time Requirements for CAN FD" document
+		 */
+		if (dbt->bitrate > 2500000) {
+			enable_tdc = DBTP_TDC;
+			/* Equation based on Bosch's M_CAN User Manual's
+			 * Transmitter Delay Compensation Section
+			 */
+			tdco = priv->can.clock.freq / (dbt->bitrate * 2);
+			m_can_write(priv, M_CAN_TDCR, tdco << TDCR_TDCO_SHIFT);
+		}
+
 		reg_btp = (brp << DBTP_DBRP_SHIFT) | (sjw << DBTP_DSJW_SHIFT) |
 			(tseg1 << DBTP_DTSEG1_SHIFT) |
-			(tseg2 << DBTP_DTSEG2_SHIFT);
+			(tseg2 << DBTP_DTSEG2_SHIFT) | enable_tdc;
+
 		m_can_write(priv, M_CAN_DBTP, reg_btp);
 	}

Regards,
Wenyou Yang
Re: Regression in throughput between kvm guests over virtual bridge
On 2017-09-16 03:19, Matthew Rosato wrote:
>> It looks like vhost is slowed down for some reason which leads to more
>> idle time on 4.13+VHOST_RX_BATCH=1. Appreciated if you can collect the
>> perf.diff on host, one for rx and one for tx.
>
> perf data below for the associated vhost threads,
> baseline=4.12, delta1=4.13, delta2=4.13+VHOST_RX_BATCH=1
>
> Client vhost:
>   60.12%  -11.11%  -12.34%  [kernel.vmlinux]  [k] raw_copy_from_user
>   13.76%   -1.28%   -0.74%  [kernel.vmlinux]  [k] get_page_from_freelist
>    2.00%   +3.69%   +3.54%  [kernel.vmlinux]  [k] __wake_up_sync_key
>    1.19%   +0.60%   +0.66%  [kernel.vmlinux]  [k] __alloc_pages_nodemask
>    1.12%   +0.76%   +0.86%  [kernel.vmlinux]  [k] copy_page_from_iter
>    1.09%   +0.28%   +0.35%  [vhost]           [k] vhost_get_vq_desc
>    1.07%   +0.31%   +0.26%  [kernel.vmlinux]  [k] alloc_skb_with_frags
>    0.94%   +0.42%   +0.65%  [kernel.vmlinux]  [k] alloc_pages_current
>    0.91%   -0.19%   -0.18%  [kernel.vmlinux]  [k] memcpy
>    0.88%   +0.26%   +0.30%  [kernel.vmlinux]  [k] __next_zones_zonelist
>    0.85%   +0.05%   +0.12%  [kernel.vmlinux]  [k] iov_iter_advance
>    0.79%   +0.09%   +0.19%  [vhost]           [k] __vhost_add_used_n
>    0.74%                    [kernel.vmlinux]  [k] get_task_policy.part.7
>    0.74%   -0.01%   -0.05%  [kernel.vmlinux]  [k] tun_net_xmit
>    0.60%   +0.17%   +0.33%  [kernel.vmlinux]  [k] policy_nodemask
>    0.58%   -0.15%   -0.12%  [ebtables]        [k] ebt_do_table
>    0.52%   -0.25%   -0.22%  [kernel.vmlinux]  [k] __alloc_skb
>    ...
>    0.42%   +0.58%   +0.59%  [kernel.vmlinux]  [k] eventfd_signal
>    ...
>    0.32%   +0.96%   +0.93%  [kernel.vmlinux]  [k] finish_task_switch
>    ...
>            +1.50%   +1.16%  [kernel.vmlinux]  [k] get_task_policy.part.9
>            +0.40%   +0.42%  [kernel.vmlinux]  [k] __skb_get_hash_symmetr
>            +0.39%   +0.40%  [kernel.vmlinux]  [k] _copy_from_iter_full
>            +0.24%   +0.23%  [vhost_net]       [k] vhost_net_buf_peek
>
> Server vhost:
>   61.93%  -10.72%  -10.91%  [kernel.vmlinux]  [k] raw_copy_to_user
>    9.25%   +0.47%   +0.86%  [kernel.vmlinux]  [k] free_hot_cold_page
>    5.16%   +1.41%   +1.57%  [vhost]           [k] vhost_get_vq_desc
>    5.12%   -3.81%   -3.78%  [kernel.vmlinux]  [k] skb_release_data
>    3.30%   +0.42%   +0.55%  [kernel.vmlinux]  [k] raw_copy_from_user
>    1.29%   +2.20%   +2.28%  [kernel.vmlinux]  [k] copy_page_to_iter
>    1.24%   +1.65%   +0.45%  [vhost_net]       [k] handle_rx
>    1.08%   +3.03%   +2.85%  [kernel.vmlinux]  [k] __wake_up_sync_key
>    0.96%   +0.70%   +1.10%  [vhost]           [k] translate_desc
>    0.69%   -0.20%   -0.22%  [kernel.vmlinux]  [k] tun_do_read.part.10
>    0.69%                    [kernel.vmlinux]  [k] tun_peek_len
>    0.67%   +0.75%   +0.78%  [kernel.vmlinux]  [k] eventfd_signal
>    0.52%   +0.96%   +0.98%  [kernel.vmlinux]  [k] finish_task_switch
>    0.50%   +0.05%   +0.09%  [vhost]           [k] vhost_add_used_n
>    ...
>            +0.63%   +0.58%  [vhost_net]       [k] vhost_net_buf_peek
>            +0.32%   +0.32%  [kernel.vmlinux]  [k] _copy_to_iter
>            +0.19%   +0.19%  [kernel.vmlinux]  [k] __skb_get_hash_symmetr
>            +0.11%   +0.21%  [vhost]           [k] vhost_umem_interval_tr

Looks like there is some unknown reason leading to more wakeups. Could you
please try the attached patch to see if it solves or mitigates the issue?
Thanks

From 63b276ed881c1e2a89b7ea35b6f328f70ddd6185 Mon Sep 17 00:00:00 2001
From: Jason Wang
Date: Mon, 18 Sep 2017 10:56:30 +0800
Subject: [PATCH] vhost_net: conditionally enable tx polling

Signed-off-by: Jason Wang
---
 drivers/vhost/net.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 58585ec..397d86a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -471,6 +471,7 @@ static void handle_tx(struct vhost_net *net)
 		goto out;

 	vhost_disable_notify(&net->dev, vq);
+	vhost_net_disable_vq(net, vq);

 	hdr_size = nvq->vhost_hlen;
 	zcopy = nvq->ubufs;
@@ -562,6 +563,8 @@ static void handle_tx(struct vhost_net *net)
 					% UIO_MAXIOV;
 			}
 			vhost_discard_vq_desc(vq, 1);
+			if (err = -EAGAIN)
+				vhost_net_enable_vq(net, vq);
 			break;
 		}
 		if (err != len)
--
1.8.3.1
Re: [PATCH V2] tipc: Use bsearch library function
On Sun, 2017-09-17 at 16:27 +, Jon Maloy wrote:
> > -----Original Message-----
> > From: Thomas Meyer [mailto:tho...@m3y3r.de]
> > []
> > What about the other binary search implementation in the same file?
> > Should I try to convert it, or will it get NAKed for performance
> > reasons too?
>
> The searches for inserting and removing publications are less time
> critical, so that would be ok with me.
> If you have any more general interest in improving the code in this
> file (which is needed) it would also be appreciated.

Perhaps using an rbtree would be an improvement.
Re: [PATCH net-next v2 0/7] korina: performance fixes and cleanup
On 09/17/2017 10:23 AM, Roman Yeryomin wrote: > Changes from v1: > - use GRO instead of increasing ring size > - use NAPI_POLL_WEIGHT instead of defining own NAPI_WEIGHT > - optimize rx descriptor flags processing net-next is closed at the moment, but these look like reasonable changes, I would just replace patch 7 with a patch that entirely drops the driver specific version since that does not serve any purpose in the context of an in-kernel driver. Some nice clean-ups that you should also consider for future changes: - reduce the duplication of tests/conditions in korina_send_packet(), a lot of them are testing for the same things and setting the same descriptor bits - move korina_tx() to a NAPI context instead of working from hard interrupt context - get rid of the MIPS dma_cache_* calls and instead properly use the DMA-API to allocate descriptors and invalidate/write-back skb->data > > Roman Yeryomin (7): > net: korina: don't use overflow and underflow interrupts > net: korina: optimize rx descriptor flags processing > net: korina: use NAPI_POLL_WEIGHT > net: korina: use GRO > net: korina: whitespace cleanup > net: korina: update authors > net: korina: bump version > > drivers/net/ethernet/korina.c | 230 > ++ > 1 file changed, 78 insertions(+), 152 deletions(-) > -- Florian
Re: [PATCH net-next v2 7/7] net: korina: bump version
On 09/17/2017 10:25 AM, Roman Yeryomin wrote: > Signed-off-by: Roman Yeryomin You can probably drop the version because it does not really make much sense for an in-kernel driver anyway. -- Florian
Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
Hi.

Just to note that it looks like disabling RACK and re-enabling FACK
prevents the warning from happening:

net.ipv4.tcp_fack = 1
net.ipv4.tcp_recovery = 0

Hope I get the semantics of these tunables right.

On Friday 15 September 2017 21:04:36 CEST Oleksandr Natalenko wrote:
> Hello.
>
> With net.ipv4.tcp_fack set to 0 the warning still appears:
>
> ===
> » sysctl net.ipv4.tcp_fack
> net.ipv4.tcp_fack = 0
>
> » LC_TIME=C dmesg -T | grep WARNING
> [Fri Sep 15 20:40:30 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
> [Fri Sep 15 20:40:30 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
> [Fri Sep 15 20:48:37 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
> [Fri Sep 15 20:48:55 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
>
> » ps -up 711
> USER  PID %CPU %MEM  VSZ  RSS TTY  STAT START  TIME COMMAND
> root  711  4.3  0.0    0    0 ?    S    18:12  7:23 [irq/123-enp3s0]
> ===
>
> Any suggestions?
>
> On Friday 15 September 2017 16:03:00 CEST Neal Cardwell wrote:
> > Thanks for testing that. That is a very useful data point.
> >
> > I was able to cook up a packetdrill test that could put the connection
> > in CA_Disorder with retransmitted packets out, but not in CA_Open. So
> > we do not yet have a test case to reproduce this.
> >
> > We do not see this warning on our fleet at Google. One significant
> > difference I see between our environment and yours is that it seems
> > you run with FACK enabled:
> > net.ipv4.tcp_fack = 1
> >
> > Note that FACK was disabled by default (since it was replaced by RACK)
> > between kernel v4.10 and v4.11. And this is exactly the time when this
> > bug started manifesting itself for you and some others, but not our
> > fleet. So my new working hypothesis would be that this warning is due
> > to a behavior that only shows up in kernels >=4.11 when FACK is
> > enabled.
> >
> > Would you be able to disable FACK ("sysctl net.ipv4.tcp_fack=0" at
> > boot, or net.ipv4.tcp_fack=0 in /etc/sysctl.conf, or equivalent),
> > reboot, and test the kernel for a few days to see if the warning still
> > pops up?
> >
> > thanks,
> > neal
> >
> > [ps: apologies for the previous, mis-formatted post...]
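For completeness, the tunable pair discussed in this thread can be persisted in a sysctl drop-in file; the file name and /tmp path below are illustrative only (on a real system the file would live in /etc/sysctl.d/ and be applied with "sysctl --system"):

```shell
# Write the workaround tunables from this thread to a drop-in file.
cat > /tmp/99-tcp-fack-test.conf <<'EOF'
net.ipv4.tcp_fack = 1
net.ipv4.tcp_recovery = 0
EOF

# Show what would be applied; a real system would run: sysctl --system
cat /tmp/99-tcp-fack-test.conf
```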
Re: [PATCH] hamradio: baycom: use new parport device model
Acked-By: Thomas Sailer

On 17.09.2017 at 13:46, Sudip Mukherjee wrote:
> Modify baycom driver to use the new parallel port device model.
>
> Signed-off-by: Sudip Mukherjee
> ---
> Not tested on real hardware, only tested on qemu and verified that the
> device is binding to the driver properly in epp_open but then unbinding
> as the device was not found.

 drivers/net/hamradio/baycom_epp.c | 50 +++
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c
index 1503f10..1e62d00 100644
--- a/drivers/net/hamradio/baycom_epp.c
+++ b/drivers/net/hamradio/baycom_epp.c
@@ -840,6 +840,7 @@ static int epp_open(struct net_device *dev)
 	unsigned char tmp[128];
 	unsigned char stat;
 	unsigned long tstart;
+	struct pardev_cb par_cb;

 	if (!pp) {
 		printk(KERN_ERR "%s: parport at 0x%lx unknown\n",
 		       bc_drvname, dev->base_addr);
@@ -859,8 +860,21 @@ static int epp_open(struct net_device *dev)
 		return -EIO;
 	}
 	memset(&bc->modem, 0, sizeof(bc->modem));
-	bc->pdev = parport_register_device(pp, dev->name, NULL, epp_wakeup,
-					   NULL, PARPORT_DEV_EXCL, dev);
+	memset(&par_cb, 0, sizeof(par_cb));
+	par_cb.wakeup = epp_wakeup;
+	par_cb.private = (void *)dev;
+	par_cb.flags = PARPORT_DEV_EXCL;
+	for (i = 0; i < NR_PORTS; i++)
+		if (baycom_device[i] == dev)
+			break;
+
+	if (i == NR_PORTS) {
+		pr_err("%s: no device found\n", bc_drvname);
+		parport_put_port(pp);
+		return -ENODEV;
+	}
+
+	bc->pdev = parport_register_dev_model(pp, dev->name, &par_cb, i);
 	parport_put_port(pp);
 	if (!bc->pdev) {
 		printk(KERN_ERR "%s: cannot register parport at 0x%lx\n",
 		       bc_drvname, pp->base);
@@ -1185,6 +1199,23 @@ MODULE_LICENSE("GPL");

 /* --------------------------------------------------------------------- */

+static int baycom_epp_par_probe(struct pardevice *par_dev)
+{
+	struct device_driver *drv = par_dev->dev.driver;
+	int len = strlen(drv->name);
+
+	if (strncmp(par_dev->name, drv->name, len))
+		return -ENODEV;
+
+	return 0;
+}
+
+static struct parport_driver baycom_epp_par_driver = {
+	.name = "bce",
+	.probe = baycom_epp_par_probe,
+	.devmodel = true,
+};
+
 static void __init baycom_epp_dev_setup(struct net_device *dev)
 {
 	struct baycom_state *bc = netdev_priv(dev);
@@ -1204,10 +1235,15 @@ static void __init baycom_epp_dev_setup(struct net_device *dev)

 static int __init init_baycomepp(void)
 {
-	int i, found = 0;
+	int i, found = 0, ret;
 	char set_hw = 1;

 	printk(bc_drvinfo);
+
+	ret = parport_register_driver(&baycom_epp_par_driver);
+	if (ret)
+		return ret;
+
 	/*
 	 * register net devices
 	 */
@@ -1241,7 +1277,12 @@ static int __init init_baycomepp(void)
 		found++;
 	}

-	return found ? 0 : -ENXIO;
+	if (found == 0) {
+		parport_unregister_driver(&baycom_epp_par_driver);
+		return -ENXIO;
+	}
+
+	return 0;
 }

 static void __exit cleanup_baycomepp(void)
@@ -1260,6 +1301,7 @@ static void __exit cleanup_baycomepp(void)
 			printk(paranoia_str, "cleanup_module");
 		}
 	}
+	parport_unregister_driver(&baycom_epp_par_driver);
 }

 module_init(init_baycomepp);
[PATCH net-next v2 7/7] net: korina: bump version
Signed-off-by: Roman Yeryomin
---
 drivers/net/ethernet/korina.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index d58aa4bfcb58..7cecd9dbc111 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -66,8 +66,8 @@
 #include

 #define DRV_NAME	"korina"
-#define DRV_VERSION	"0.10"
-#define DRV_RELDATE	"04Mar2008"
+#define DRV_VERSION	"0.20"
+#define DRV_RELDATE	"15Sep2017"

 #define STATION_ADDRESS_HIGH(dev) (((dev)->dev_addr[0] << 8) | \
				   ((dev)->dev_addr[1]))
--
2.11.0
[PATCH net-next v2 4/7] net: korina: use GRO
Performance gain when receiving locally is 55->95 Mbps, and 50->65 Mbps
for NAT.

Signed-off-by: Roman Yeryomin
---
 drivers/net/ethernet/korina.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index c210add9b654..5f36e1703378 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -406,7 +406,7 @@ static int korina_rx(struct net_device *dev, int limit)
 		skb->protocol = eth_type_trans(skb, dev);

 		/* Pass the packet to upper layers */
-		netif_receive_skb(skb);
+		napi_gro_receive(&lp->napi, skb);
 		dev->stats.rx_packets++;
 		dev->stats.rx_bytes += pkt_len;
--
2.11.0
[PATCH net-next v2 5/7] net: korina: whitespace cleanup
Signed-off-by: Roman Yeryomin --- drivers/net/ethernet/korina.c | 58 +++ 1 file changed, 31 insertions(+), 27 deletions(-) diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c index 5f36e1703378..c26f0d84ba6b 100644 --- a/drivers/net/ethernet/korina.c +++ b/drivers/net/ethernet/korina.c @@ -64,9 +64,9 @@ #include #include -#define DRV_NAME"korina" -#define DRV_VERSION "0.10" -#define DRV_RELDATE "04Mar2008" +#define DRV_NAME "korina" +#define DRV_VERSION"0.10" +#define DRV_RELDATE"04Mar2008" #define STATION_ADDRESS_HIGH(dev) (((dev)->dev_addr[0] << 8) | \ ((dev)->dev_addr[1])) @@ -75,7 +75,7 @@ ((dev)->dev_addr[4] << 8) | \ ((dev)->dev_addr[5])) -#define MII_CLOCK 125 /* no more than 2.5MHz */ +#define MII_CLOCK 125 /* no more than 2.5MHz */ /* the following must be powers of two */ #define KORINA_NUM_RDS 64 /* number of receive descriptors */ @@ -87,15 +87,19 @@ #define KORINA_RBSIZE 1536 /* size of one resource buffer = Ether MTU */ #define KORINA_RDS_MASK(KORINA_NUM_RDS - 1) #define KORINA_TDS_MASK(KORINA_NUM_TDS - 1) -#define RD_RING_SIZE (KORINA_NUM_RDS * sizeof(struct dma_desc)) +#define RD_RING_SIZE (KORINA_NUM_RDS * sizeof(struct dma_desc)) #define TD_RING_SIZE (KORINA_NUM_TDS * sizeof(struct dma_desc)) -#define TX_TIMEOUT (6000 * HZ / 1000) +#define TX_TIMEOUT (6000 * HZ / 1000) -enum chain_status { desc_filled, desc_empty }; -#define IS_DMA_FINISHED(X) (((X) & (DMA_DESC_FINI)) != 0) -#define IS_DMA_DONE(X) (((X) & (DMA_DESC_DONE)) != 0) -#define RCVPKT_LENGTH(X) (((X) & ETH_RX_LEN) >> ETH_RX_LEN_BIT) +enum chain_status { + desc_filled, + desc_empty +}; + +#define IS_DMA_FINISHED(X) (((X) & (DMA_DESC_FINI)) != 0) +#define IS_DMA_DONE(X) (((X) & (DMA_DESC_DONE)) != 0) +#define RCVPKT_LENGTH(X) (((X) & ETH_RX_LEN) >> ETH_RX_LEN_BIT) /* Information that need to be kept for each board. 
*/ struct korina_private { @@ -123,7 +127,7 @@ struct korina_private { int rx_irq; int tx_irq; - spinlock_t lock;/* NIC xmit lock */ + spinlock_t lock;/* NIC xmit lock */ int dma_halt_cnt; int dma_run_cnt; @@ -146,17 +150,17 @@ static inline void korina_start_dma(struct dma_reg *ch, u32 dma_addr) static inline void korina_abort_dma(struct net_device *dev, struct dma_reg *ch) { - if (readl(&ch->dmac) & DMA_CHAN_RUN_BIT) { - writel(0x10, &ch->dmac); + if (readl(&ch->dmac) & DMA_CHAN_RUN_BIT) { + writel(0x10, &ch->dmac); - while (!(readl(&ch->dmas) & DMA_STAT_HALT)) - netif_trans_update(dev); + while (!(readl(&ch->dmas) & DMA_STAT_HALT)) + netif_trans_update(dev); - writel(0, &ch->dmas); - } + writel(0, &ch->dmas); + } - writel(0, &ch->dmadptr); - writel(0, &ch->dmandptr); + writel(0, &ch->dmadptr); + writel(0, &ch->dmandptr); } static inline void korina_chain_dma(struct dma_reg *ch, u32 dma_addr) @@ -685,7 +689,7 @@ static int korina_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) /* ethtool helpers */ static void netdev_get_drvinfo(struct net_device *dev, - struct ethtool_drvinfo *info) + struct ethtool_drvinfo *info) { struct korina_private *lp = netdev_priv(dev); @@ -728,10 +732,10 @@ static u32 netdev_get_link(struct net_device *dev) } static const struct ethtool_ops netdev_ethtool_ops = { - .get_drvinfo= netdev_get_drvinfo, - .get_link = netdev_get_link, - .get_link_ksettings = netdev_get_link_ksettings, - .set_link_ksettings = netdev_set_link_ksettings, + .get_drvinfo= netdev_get_drvinfo, + .get_link = netdev_get_link, + .get_link_ksettings = netdev_get_link_ksettings, + .set_link_ksettings = netdev_set_link_ksettings, }; static int korina_alloc_ring(struct net_device *dev) @@ -863,7 +867,7 @@ static int korina_init(struct net_device *dev) /* Management Clock Prescaler Divisor * Clock independent setting */ writel(((idt_cpu_freq) / MII_CLOCK + 1) & ~1, - &lp->eth_regs->ethmcp); + &lp->eth_regs->ethmcp); /* don't transmit until fifo contains 48b */ 
writel(48, &lp->eth_regs->ethfifott); @@ -946,14 +950,14 @@ static int korina_open(struct net_device *dev) 0, "Korina ethernet Rx", dev); if (ret < 0) { printk(KERN_ERR "%s: unable to get Rx DMA IRQ %d\n", - dev->name, lp->rx_irq);
[PATCH net-next v2 6/7] net: korina: update authors
Signed-off-by: Roman Yeryomin
---
 drivers/net/ethernet/korina.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index c26f0d84ba6b..d58aa4bfcb58 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -4,6 +4,7 @@
  * Copyright 2004 IDT Inc. (risch...@idt.com)
  * Copyright 2006 Felix Fietkau
  * Copyright 2008 Florian Fainelli
+ * Copyright 2017 Roman Yeryomin
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
@@ -1150,5 +1151,6 @@ module_platform_driver(korina_driver);
 MODULE_AUTHOR("Philip Rischel ");
 MODULE_AUTHOR("Felix Fietkau ");
 MODULE_AUTHOR("Florian Fainelli ");
+MODULE_AUTHOR("Roman Yeryomin ");
 MODULE_DESCRIPTION("IDT RC32434 (Korina) Ethernet driver");
 MODULE_LICENSE("GPL");
--
2.11.0
[PATCH net-next v2 3/7] net: korina: use NAPI_POLL_WEIGHT
Signed-off-by: Roman Yeryomin
---
 drivers/net/ethernet/korina.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index e5466e19994a..c210add9b654 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -1082,7 +1082,7 @@ static int korina_probe(struct platform_device *pdev)
 	dev->netdev_ops = &korina_netdev_ops;
 	dev->ethtool_ops = &netdev_ethtool_ops;
 	dev->watchdog_timeo = TX_TIMEOUT;
-	netif_napi_add(dev, &lp->napi, korina_poll, 64);
+	netif_napi_add(dev, &lp->napi, korina_poll, NAPI_POLL_WEIGHT);

 	lp->phy_addr = (((lp->rx_irq == 0x2c? 1:0) << 8) | 0x05);
 	lp->mii_if.dev = dev;
--
2.11.0
[PATCH net-next v2 2/7] net: korina: optimize rx descriptor flags processing
Signed-off-by: Roman Yeryomin --- drivers/net/ethernet/korina.c | 87 ++- 1 file changed, 44 insertions(+), 43 deletions(-) diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c index 98d686ed69a9..e5466e19994a 100644 --- a/drivers/net/ethernet/korina.c +++ b/drivers/net/ethernet/korina.c @@ -363,59 +363,60 @@ static int korina_rx(struct net_device *dev, int limit) if ((KORINA_RBSIZE - (u32)DMA_COUNT(rd->control)) == 0) break; - /* Update statistics counters */ - if (devcs & ETH_RX_CRC) - dev->stats.rx_crc_errors++; - if (devcs & ETH_RX_LOR) - dev->stats.rx_length_errors++; - if (devcs & ETH_RX_LE) - dev->stats.rx_length_errors++; - if (devcs & ETH_RX_OVR) - dev->stats.rx_fifo_errors++; - if (devcs & ETH_RX_CV) - dev->stats.rx_frame_errors++; - if (devcs & ETH_RX_CES) - dev->stats.rx_length_errors++; - if (devcs & ETH_RX_MP) - dev->stats.multicast++; - - if ((devcs & ETH_RX_LD) != ETH_RX_LD) { - /* check that this is a whole packet -* WARNING: DMA_FD bit incorrectly set -* in Rc32434 (errata ref #077) */ + /* check that this is a whole packet +* WARNING: DMA_FD bit incorrectly set +* in Rc32434 (errata ref #077) */ + if (!(devcs & ETH_RX_LD)) + goto next; + + if (!(devcs & ETH_RX_ROK)) { + /* Update statistics counters */ dev->stats.rx_errors++; dev->stats.rx_dropped++; - } else if ((devcs & ETH_RX_ROK)) { - pkt_len = RCVPKT_LENGTH(devcs); + if (devcs & ETH_RX_CRC) + dev->stats.rx_crc_errors++; + if (devcs & ETH_RX_LE) + dev->stats.rx_length_errors++; + if (devcs & ETH_RX_OVR) + dev->stats.rx_fifo_errors++; + if (devcs & ETH_RX_CV) + dev->stats.rx_frame_errors++; + if (devcs & ETH_RX_CES) + dev->stats.rx_frame_errors++; + + goto next; + } - /* must be the (first and) last -* descriptor then */ - pkt_buf = (u8 *)lp->rx_skb[lp->rx_next_done]->data; + pkt_len = RCVPKT_LENGTH(devcs); - /* invalidate the cache */ - dma_cache_inv((unsigned long)pkt_buf, pkt_len - 4); + /* must be the (first and) last +* descriptor then */ + pkt_buf = (u8 
*)lp->rx_skb[lp->rx_next_done]->data; - /* Malloc up new buffer. */ - skb_new = netdev_alloc_skb_ip_align(dev, KORINA_RBSIZE); + /* invalidate the cache */ + dma_cache_inv((unsigned long)pkt_buf, pkt_len - 4); - if (!skb_new) - break; - /* Do not count the CRC */ - skb_put(skb, pkt_len - 4); - skb->protocol = eth_type_trans(skb, dev); + /* Malloc up new buffer. */ + skb_new = netdev_alloc_skb_ip_align(dev, KORINA_RBSIZE); - /* Pass the packet to upper layers */ - netif_receive_skb(skb); - dev->stats.rx_packets++; - dev->stats.rx_bytes += pkt_len; + if (!skb_new) + break; + /* Do not count the CRC */ + skb_put(skb, pkt_len - 4); + skb->protocol = eth_type_trans(skb, dev); - /* Update the mcast stats */ - if (devcs & ETH_RX_MP) - dev->stats.multicast++; + /* Pass the packet to upper layers */ + netif_receive_skb(skb); + dev->stats.rx_packets++; + dev->stats.rx_bytes += pkt_len; - lp->rx_skb[lp->rx_next_done] = skb_new; - } + /* Update the mcast stats */ + if (devcs & ETH_RX_MP) + dev->stats.multicast++; + + lp->rx_skb[lp->rx_next_done] = skb_new; +next: rd->devcs = 0; /* Restore descriptor's curr_addr */ -- 2.11.0
[PATCH net-next v2 1/7] net: korina: don't use overflow and underflow interrupts
When such interrupts occur there is not much we can do. Dropping the whole ring doesn't help and only produces high packet loss. If we just ignore the interrupt the mac will drop one or few packets instead of the whole ring. Also this will lower the irq handling load and increase performance. Signed-off-by: Roman Yeryomin --- drivers/net/ethernet/korina.c | 83 +-- 1 file changed, 1 insertion(+), 82 deletions(-) diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c index 3c0a6451273d..98d686ed69a9 100644 --- a/drivers/net/ethernet/korina.c +++ b/drivers/net/ethernet/korina.c @@ -122,8 +122,6 @@ struct korina_private { int rx_irq; int tx_irq; - int ovr_irq; - int und_irq; spinlock_t lock;/* NIC xmit lock */ @@ -891,8 +889,6 @@ static void korina_restart_task(struct work_struct *work) */ disable_irq(lp->rx_irq); disable_irq(lp->tx_irq); - disable_irq(lp->ovr_irq); - disable_irq(lp->und_irq); writel(readl(&lp->tx_dma_regs->dmasm) | DMA_STAT_FINI | DMA_STAT_ERR, @@ -911,40 +907,10 @@ static void korina_restart_task(struct work_struct *work) } korina_multicast_list(dev); - enable_irq(lp->und_irq); - enable_irq(lp->ovr_irq); enable_irq(lp->tx_irq); enable_irq(lp->rx_irq); } -static void korina_clear_and_restart(struct net_device *dev, u32 value) -{ - struct korina_private *lp = netdev_priv(dev); - - netif_stop_queue(dev); - writel(value, &lp->eth_regs->ethintfc); - schedule_work(&lp->restart_task); -} - -/* Ethernet Tx Underflow interrupt */ -static irqreturn_t korina_und_interrupt(int irq, void *dev_id) -{ - struct net_device *dev = dev_id; - struct korina_private *lp = netdev_priv(dev); - unsigned int und; - - spin_lock(&lp->lock); - - und = readl(&lp->eth_regs->ethintfc); - - if (und & ETH_INT_FC_UND) - korina_clear_and_restart(dev, und & ~ETH_INT_FC_UND); - - spin_unlock(&lp->lock); - - return IRQ_HANDLED; -} - static void korina_tx_timeout(struct net_device *dev) { struct korina_private *lp = netdev_priv(dev); @@ -952,25 +918,6 @@ static void 
korina_tx_timeout(struct net_device *dev) schedule_work(&lp->restart_task); } -/* Ethernet Rx Overflow interrupt */ -static irqreturn_t -korina_ovr_interrupt(int irq, void *dev_id) -{ - struct net_device *dev = dev_id; - struct korina_private *lp = netdev_priv(dev); - unsigned int ovr; - - spin_lock(&lp->lock); - ovr = readl(&lp->eth_regs->ethintfc); - - if (ovr & ETH_INT_FC_OVR) - korina_clear_and_restart(dev, ovr & ~ETH_INT_FC_OVR); - - spin_unlock(&lp->lock); - - return IRQ_HANDLED; -} - #ifdef CONFIG_NET_POLL_CONTROLLER static void korina_poll_controller(struct net_device *dev) { @@ -993,8 +940,7 @@ static int korina_open(struct net_device *dev) } /* Install the interrupt handler -* that handles the Done Finished -* Ovr and Und Events */ +* that handles the Done Finished */ ret = request_irq(lp->rx_irq, korina_rx_dma_interrupt, 0, "Korina ethernet Rx", dev); if (ret < 0) { @@ -1010,31 +956,10 @@ static int korina_open(struct net_device *dev) goto err_free_rx_irq; } - /* Install handler for overrun error. */ - ret = request_irq(lp->ovr_irq, korina_ovr_interrupt, - 0, "Ethernet Overflow", dev); - if (ret < 0) { - printk(KERN_ERR "%s: unable to get OVR IRQ %d\n", - dev->name, lp->ovr_irq); - goto err_free_tx_irq; - } - - /* Install handler for underflow error. 
*/ - ret = request_irq(lp->und_irq, korina_und_interrupt, - 0, "Ethernet Underflow", dev); - if (ret < 0) { - printk(KERN_ERR "%s: unable to get UND IRQ %d\n", - dev->name, lp->und_irq); - goto err_free_ovr_irq; - } mod_timer(&lp->media_check_timer, jiffies + 1); out: return ret; -err_free_ovr_irq: - free_irq(lp->ovr_irq, dev); -err_free_tx_irq: - free_irq(lp->tx_irq, dev); err_free_rx_irq: free_irq(lp->rx_irq, dev); err_release: @@ -1052,8 +977,6 @@ static int korina_close(struct net_device *dev) /* Disable interrupts */ disable_irq(lp->rx_irq); disable_irq(lp->tx_irq); - disable_irq(lp->ovr_irq); - disable_irq(lp->und_irq); korina_abort_tx(dev); tmp = readl(&lp->tx_dma_regs->dmasm); @@ -1073,8 +996,6 @@ static int korina_close(struct net_device *dev) free_irq(lp->rx_irq, dev); free_irq(lp->tx_irq, dev); - free_irq(lp->ovr_irq, dev); - free_irq(lp->und_irq, dev); return 0; } @@ -1113
[PATCH net-next v2 0/7] korina: performance fixes and cleanup
Changes from v1:
- use GRO instead of increasing ring size
- use NAPI_POLL_WEIGHT instead of defining own NAPI_WEIGHT
- optimize rx descriptor flags processing

Roman Yeryomin (7):
  net: korina: don't use overflow and underflow interrupts
  net: korina: optimize rx descriptor flags processing
  net: korina: use NAPI_POLL_WEIGHT
  net: korina: use GRO
  net: korina: whitespace cleanup
  net: korina: update authors
  net: korina: bump version

 drivers/net/ethernet/korina.c | 230 ++
 1 file changed, 78 insertions(+), 152 deletions(-)

-- 
2.11.0
RE: [PATCH V2] tipc: Use bsearch library function
> -----Original Message-----
> From: Thomas Meyer [mailto:tho...@m3y3r.de]
> Sent: Sunday, September 17, 2017 11:00
> To: Jon Maloy
> Cc: Joe Perches ; Ying Xue ; netdev@vger.kernel.org;
> tipc-discuss...@lists.sourceforge.net; linux-ker...@vger.kernel.org;
> da...@davemloft.net
> Subject: Re: [PATCH V2] tipc: Use bsearch library function
>
> > Am 16.09.2017 um 15:20 schrieb Jon Maloy .
> >>
> >> What part of "very time critical" have you verified and benchmarked
> >> as inconsequential?
> >>
> >> Please post your results.
> >
> > I agree with Joe here. This change does not simplify anything, it
> > does not reduce the amount of code, plus it introduces an unnecessary
> > out-of-line call in a place where we have every reason to let the
> > compiler do its optimization job properly.
>
> Hi,
>
> Okay, should I prepare some performance numbers, or do we NAK this
> change?
> What about the other binary search implementation in the same file?
> Should I try to convert it, or will it get NAKed for performance
> reasons too?

The searches for inserting and removing publications are less time
critical, so that would be ok with me. If you have any more general
interest in improving the code in this file (which is needed), that
would also be appreciated.

BR
///jon

>
> With kind regards
> Thomas

smime.p7s Description: S/MIME cryptographic signature
Re: Page allocator bottleneck
On 15/09/2017 10:28 AM, Jesper Dangaard Brouer wrote:
> On Thu, 14 Sep 2017 19:49:31 +0300 Tariq Toukan wrote:
>> Hi all,
>>
>> As part of the efforts to support increasing next-generation NIC
>> speeds, I am investigating SW bottlenecks in network stack receive
>> flow. Here I share some numbers I got for a simple experiment, in
>> which I simulate the page allocation rate needed in 200Gbps NICs.
>
> Thanks for bringing this up again.

Sure. We need to keep up with the increasing NIC speeds.

>> I ran the test below over 3 different (modified) mlx5 driver versions,
>> loaded on server side (RX):
>> 1) RX page cache disabled, 2 packets per page.
>
> 2 packets per page basically reduce the overhead you see from the page
> allocator to half.
>
>> 2) RX page cache disabled, one packet per page.
>
> This, should stress the page allocator.
>
>> 3) Huge RX page cache, one packet per page.
>
> A driver level page-cache will look nice, as long as it "works".

I verified that it worked in the experiment.

> Drivers usually have no other option than basing their recycle facility
> to be based on the page-refcnt (as there is no destructor callback).
> Which implies packets/pages need to be returned quickly enough for it
> to work.

Yes, that's how our current default (small) RX page-cache is
implemented. Unfortunately, the timing and terms for a fair reuse rate
are not always satisfied.

>> All page allocations are of order 0.
>>
>> NIC: Connectx-5 100 Gbps.
>> CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
>>
>> Test: 128 TCP streams (using super_netperf).
>> Changing num of RX queues. HW LRO OFF, GRO ON, MTU 1500.
>
> With TCP streams and GRO, is actually a good stress test for the page
> allocator (or drivers page-recycle cache). As Eric Dumazet have made
> some nice optimizations, that (in most situations) cause us to quickly
> free/recycle the SKB (coming from driver) and store the pages in 1-SKB.
> This cause us to hit the SLUB fastpath for the SKBs, but once the pages
> need to be free'ed this stress the page allocator more.
> Yep, bulking would help here, as you mention below.
>
> Also be aware that with TCP flows, the packets are likely delivered
> into a socket, that is consumed on another CPU. Thus, the pages are
> allocated on one CPU and free'ed on another. AFAIK this stress the
> order-0 cache PCP (Per-Cpu-Pages).

Good point. Do you know of any tool/kernel counters that help observe
and quantify this behavior?

>> Observe: BW as a function of num RX queues.
>>
>> Results:
>>
>> Driver #1:
>> #rings  BW (Mbps)
>> 1       23,813
>> 2       44,086
>> 3       62,128
>> 4       78,058
>> 6       94,210 (linerate)
>> 8       94,205 (linerate)
>> 12      94,202 (linerate)
>> 16      94,191 (linerate)
>>
>> Driver #2:
>> #rings  BW (Mbps)
>> 1       18,835
>> 2       36,716
>> 3       50,521
>> 4       61,746
>> 6       63,637
>> 8       60,299
>> 12      51,048
>> 16      43,337
>>
>> Driver #3:
>> #rings  BW (Mbps)
>> 1       19,316
>> 2       44,850
>> 3       69,549
>> 4       87,434
>> 6       94,342 (linerate)
>> 8       94,350 (linerate)
>> 12      94,327 (linerate)
>> 16      94,327 (linerate)
>>
>> Insights:
>> Major degradation between #1 and #2, not getting any close to
>> linerate! Degradation is fixed between #2 and #3. This is because page
>> allocator cannot stand the higher allocation rate. In #2, we also see
>> that the addition of rings (cores) reduces BW (!!), as result of
>> increasing congestion over shared resources.
>>
>> Congestion in this case is very clear. When monitored in perf top:
>> 85.58% [kernel] [k] queued_spin_lock_slowpath
>
> Well, we obviously need to know the caller of the spin_lock. In this
> case it is likely the page allocator lock. It could also be the TCP
> socket locks, but given GRO is enabled, they should be hit much less.

It is the page allocator lock. I verified this based on Andi's
suggestion, see other mail.
It's nice to have the option to dynamically play with the parameter.
But maybe we should also think of changing the default fraction
guaranteed to the PCP, so that unaware admins of networking servers
would also benefit.

>> I think that page allocator issues should be discussed separately:
>> 1) Rate: Increase the allocation rate on a single core.
>> 2) Scalability: Reduce congestion and sync overhead between cores.
> Yes, but this is no small task. It is on my TODO-list (emacs org-mode),
> but I have other tasks that have higher priority atm. I'll be working
> on XDP_REDIRECT for the next many months. Currently trying to convince
> people that we do an explicit packet-page return/free callback (which
> would avoid many of these issues).

>> This is clearly the current bottleneck in the network stack receive
>> flow. I know about some efforts that were made in the past two years.
>> For example the ones from Jesper et al.:
>> - Page-pool (not accepted AFAIK).

> The page-pool have many purposes.
> 1. generic page-cache for drivers,
> 2. keep pages DMA-mapped
> 3. facilitate drivers to change RX-ring memory model
>
> From a MM-point-of-view the page pool is just a destructor callback,
> that can "steal" the page. If I can convince XDP_REDIRECT to use an
> explicit dest
Re: Page allocator bottleneck
On 14/09/2017 11:19 PM, Andi Kleen wrote:
> Tariq Toukan writes:
>> Congestion in this case is very clear. When monitored in perf top:
>> 85.58% [kernel] [k] queued_spin_lock_slowpath
>
> Please look at the callers. Spinlock profiles without callers are
> usually useless because it's just blaming the messenger.
>
> Most likely the PCP lists are too small for your extreme allocation
> rate, so it goes back too often to the shared pool. You can play with
> the vm.percpu_pagelist_fraction setting.
>
> -Andi

Thanks Andi. That was my initial guess, but I wasn't familiar enough
with these VM tunables to verify it. Indeed, the bottleneck goes away
when increasing the PCP size, and BW becomes significantly better.
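For anyone reproducing this: the knob Andi refers to lives under /proc/sys/vm and can be changed at runtime. A rough sketch of the tuning step (the value 8 is simply the smallest non-zero fraction the kernel accepts, not a recommendation — pick a value based on your allocation rate):

```shell
# Show the current setting; 0 means the kernel's default PCP sizing.
sysctl vm.percpu_pagelist_fraction

# Allow each CPU's per-cpu pagelist to hold up to 1/8 of the zone's
# pages (8 is the minimum accepted non-zero value).
sysctl -w vm.percpu_pagelist_fraction=8

# Revert to the default sizing when the experiment is done.
sysctl -w vm.percpu_pagelist_fraction=0
```

Note this is a config fragment requiring root; re-run the bandwidth test between changes to see the effect on the page allocator lock contention.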
Re: [PATCH V2] tipc: Use bsearch library function
> Am 16.09.2017 um 15:20 schrieb Jon Maloy .
>>
>> What part of "very time critical" have you verified and benchmarked as
>> inconsequential?
>>
>> Please post your results.
>
> I agree with Joe here. This change does not simplify anything, it does
> not reduce the amount of code, plus it introduces an unnecessary
> out-of-line call in a place where we have every reason to let the
> compiler do its optimization job properly.

Hi,

Okay, should I prepare some performance numbers, or do we NAK this
change?
What about the other binary search implementation in the same file?
Should I try to convert it, or will it get NAKed for performance
reasons too?

With kind regards
Thomas

smime.p7s Description: S/MIME cryptographic signature
Re: 319554f284dd ("inet: don't use sk_v6_rcv_saddr directly") causes bind port regression
On 09/15/2017 01:51 PM, Josef Bacik wrote:
> Finally got access to a box to run this down myself. This patch on top
> of the other patches fixes the problem for me, could you verify it
> works for you? Thanks,

Yup, I can confirm that patch fixes things when applied on top of the
previous 3 patches. Thanks!

Please tag those patches for stable releases if appropriate; this is
affecting a decent number of libvirt users.

Thanks,
Cole
[pktgen script v2 1/2] Add some helper functions
From: Robert Hoo 1. given a device, get its NUMA belongings 2. given a device, get its queues' irq numbers. 3. given a NUMA node, get its cpu id list. Signed-off-by: Robert Hoo --- pktgen/functions.sh | 44 1 file changed, 44 insertions(+) diff --git a/pktgen/functions.sh b/pktgen/functions.sh index 205e4cd..09dfe7a 100644 --- a/pktgen/functions.sh +++ b/pktgen/functions.sh @@ -119,3 +119,47 @@ function root_check_run_with_sudo() { err 4 "cannot perform sudo run of $0" fi } + +# Exact input device's NUMA node info +function get_iface_node() +{ +local node=$(
[pktgen script v2 2/2] Add pktgen script: pktgen_sample06_numa_awared_queue_irq_affinity.sh
From: Robert Hoo This script simply does: Detect $DEV's NUMA node belonging. Bind each thread (processor of NUMA locality) with each $DEV queue's irq affinity, 1:1 mapping. How many '-t' threads input determines how many queues will be utilized. If '-f' designates first cpu id, then offset in the NUMA node's cpu list. Signed-off-by: Robert Hoo --- ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 97 ++ 1 file changed, 97 insertions(+) create mode 100755 pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh diff --git a/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh new file mode 100755 index 000..52da0f4 --- /dev/null +++ b/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh @@ -0,0 +1,97 @@ +#!/bin/bash +# +# Multiqueue: Using pktgen threads for sending on multiple CPUs +# * adding devices to kernel threads which are in the same NUMA node +# * bound devices queue's irq affinity to the threads, 1:1 mapping +# * notice the naming scheme for keeping device names unique +# * nameing scheme: dev@thread_number +# * flow variation via random UDP source port +# +basedir=`dirname $0` +source ${basedir}/functions.sh +root_check_run_with_sudo "$@" +# +# Required param: -i dev in $DEV +source ${basedir}/parameters.sh + +# Base Config +DELAY="0"# Zero means max speed +COUNT="2000" # Zero means indefinitely +[ -z "$CLONE_SKB" ] && CLONE_SKB="0" + +# Flow variation random source port between min and max +UDP_MIN=9 +UDP_MAX=109 + +node=`get_iface_node $DEV` +irq_array=(`get_iface_irqs $DEV`) +cpu_array=(`get_node_cpus $node`) + +[ $THREADS -gt ${#irq_array[*]} -o $THREADS -gt ${#cpu_array[*]} ] && \ + err 1 "Thread number $THREADS exceeds: min (${#irq_array[*]},${#cpu_array[*]})" + +# (example of setting default params in your script) +if [ -z "$DEST_IP" ]; then +[ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1" +fi +[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff" + +# General cleanup 
everything since last run +pg_ctrl "reset" + +# Threads are specified with parameter -t value in $THREADS +for ((i = 0; i < $THREADS; i++)); do +# The device name is extended with @name, using thread number to +# make then unique, but any name will do. +# Set the queue's irq affinity to this $thread (processor) +# if '-f' is designated, offset cpu id +thread=${cpu_array[$((i+F_THREAD))]} +dev=${DEV}@${thread} +echo $thread > /proc/irq/${irq_array[$i]}/smp_affinity_list +info "irq ${irq_array[$i]} is set affinity to `cat /proc/irq/${irq_array[$i]}/smp_affinity_list`" + +# Add remove all other devices and add_device $dev to thread +pg_thread $thread "rem_device_all" +pg_thread $thread "add_device" $dev + +# select queue and bind the queue and $dev in 1:1 relationship +queue_num=$i +info "queue number is $queue_num" +pg_set $dev "queue_map_min $queue_num" +pg_set $dev "queue_map_max $queue_num" + +# Notice config queue to map to cpu (mirrors smp_processor_id()) +# It is beneficial to map IRQ /proc/irq/*/smp_affinity 1:1 to CPU number +pg_set $dev "flag QUEUE_MAP_CPU" + +# Base config of dev +pg_set $dev "count $COUNT" +pg_set $dev "clone_skb $CLONE_SKB" +pg_set $dev "pkt_size $PKT_SIZE" +pg_set $dev "delay $DELAY" + +# Flag example disabling timestamping +pg_set $dev "flag NO_TIMESTAMP" + +# Destination +pg_set $dev "dst_mac $DST_MAC" +pg_set $dev "dst$IP6 $DEST_IP" + +# Setup random UDP port src range +pg_set $dev "flag UDPSRC_RND" +pg_set $dev "udp_src_min $UDP_MIN" +pg_set $dev "udp_src_max $UDP_MAX" +done + +# start_run +echo "Running... ctrl^C to stop" >&2 +pg_ctrl "start" +echo "Done" >&2 + +# Print results +for ((i = 0; i < $THREADS; i++)); do +thread=${cpu_array[$((i+F_THREAD))]} +dev=${DEV}@${thread} +echo "Device: $dev" +cat /proc/net/pktgen/$dev | grep -A2 "Result:" +done -- 1.8.3.1
[pktgen script v2 0/2] Add a pktgen sample script of NUMA awareness
From: Robert Hoo

It's hard to benchmark 40G+ network bandwidth using ordinary tools like
iperf or netperf (see reference 1). Pktgen, the packet generator in
kernel space, is a good candidate. I derived this NUMA-aware irq
affinity sample script from the multi-queue sample02 and successfully
benchmarked a 40G link. I think it can also serve as a 100G reference,
though I haven't got a device to test with yet.

This script simply does:
Detect $DEV's NUMA node.
Bind each thread (a processor local to that NUMA node) to each $DEV
queue's irq affinity, in a 1:1 mapping. The number of '-t' threads given
determines how many queues will be utilized. If '-f' designates the
first cpu id, offset into the NUMA node's cpu list accordingly.

Tested with an Intel XL710 NIC and a Cisco 3172 switch.

References:
https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf

Change log v2:
Rebased to https://github.com/netoptimizer/network-testing/tree/master/pktgen
Move helper functions to functions.sh
More concise shell grammar usage
Take the '-f' parameter into consideration: if the first CPU is
designated, offset into the NUMA node's CPU list.
Use the err(), info() helper functions for such outputs.

Robert Hoo (2):
  Add some helper functions
  Add pktgen script: pktgen_sample06_numa_awared_queue_irq_affinity.sh

 pktgen/functions.sh| 44 ++
 ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 97 ++
 2 files changed, 141 insertions(+)
 create mode 100755 pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh

-- 
1.8.3.1
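A usage sketch of the new sample, based on the parameters described above (the device name `eth1` and thread counts are placeholders; running it requires root and the pktgen kernel module):

```shell
# pktgen lives in the kernel; load the module first.
modprobe pktgen

# Send from 8 pktgen kernel threads, each pinned to a CPU of the NIC's
# local NUMA node, with queue irq affinity bound 1:1 to those CPUs.
./pktgen_sample06_numa_awared_queue_irq_affinity.sh -i eth1 -t 8

# Start from a later entry in the node's CPU list with -f, e.g. skip
# the node's first two CPUs and use 4 threads.
./pktgen_sample06_numa_awared_queue_irq_affinity.sh -i eth1 -t 4 -f 2
```

Per the cover letter, the '-t' value also determines how many TX queues are exercised, so it should not exceed the NIC's queue count or the node's CPU count (the script errors out if it does).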
[PATCH] hamradio: baycom: use new parport device model
Modify baycom driver to use the new parallel port device model. Signed-off-by: Sudip Mukherjee --- Not tested on real hardware, only tested on qemu and verified that the device is binding to the driver properly in epp_open but then unbinding as the device was not found. drivers/net/hamradio/baycom_epp.c | 50 +++ 1 file changed, 46 insertions(+), 4 deletions(-) diff --git a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c index 1503f10..1e62d00 100644 --- a/drivers/net/hamradio/baycom_epp.c +++ b/drivers/net/hamradio/baycom_epp.c @@ -840,6 +840,7 @@ static int epp_open(struct net_device *dev) unsigned char tmp[128]; unsigned char stat; unsigned long tstart; + struct pardev_cb par_cb; if (!pp) { printk(KERN_ERR "%s: parport at 0x%lx unknown\n", bc_drvname, dev->base_addr); @@ -859,8 +860,21 @@ static int epp_open(struct net_device *dev) return -EIO; } memset(&bc->modem, 0, sizeof(bc->modem)); -bc->pdev = parport_register_device(pp, dev->name, NULL, epp_wakeup, - NULL, PARPORT_DEV_EXCL, dev); + memset(&par_cb, 0, sizeof(par_cb)); + par_cb.wakeup = epp_wakeup; + par_cb.private = (void *)dev; + par_cb.flags = PARPORT_DEV_EXCL; + for (i = 0; i < NR_PORTS; i++) + if (baycom_device[i] == dev) + break; + + if (i == NR_PORTS) { + pr_err("%s: no device found\n", bc_drvname); + parport_put_port(pp); + return -ENODEV; + } + + bc->pdev = parport_register_dev_model(pp, dev->name, &par_cb, i); parport_put_port(pp); if (!bc->pdev) { printk(KERN_ERR "%s: cannot register parport at 0x%lx\n", bc_drvname, pp->base); @@ -1185,6 +1199,23 @@ MODULE_LICENSE("GPL"); /* - */ +static int baycom_epp_par_probe(struct pardevice *par_dev) +{ + struct device_driver *drv = par_dev->dev.driver; + int len = strlen(drv->name); + + if (strncmp(par_dev->name, drv->name, len)) + return -ENODEV; + + return 0; +} + +static struct parport_driver baycom_epp_par_driver = { + .name = "bce", + .probe = baycom_epp_par_probe, + .devmodel = true, +}; + static void __init 
baycom_epp_dev_setup(struct net_device *dev) { struct baycom_state *bc = netdev_priv(dev); @@ -1204,10 +1235,15 @@ static void __init baycom_epp_dev_setup(struct net_device *dev) static int __init init_baycomepp(void) { - int i, found = 0; + int i, found = 0, ret; char set_hw = 1; printk(bc_drvinfo); + + ret = parport_register_driver(&baycom_epp_par_driver); + if (ret) + return ret; + /* * register net devices */ @@ -1241,7 +1277,12 @@ static int __init init_baycomepp(void) found++; } - return found ? 0 : -ENXIO; + if (found == 0) { + parport_unregister_driver(&baycom_epp_par_driver); + return -ENXIO; + } + + return 0; } static void __exit cleanup_baycomepp(void) @@ -1260,6 +1301,7 @@ static void __exit cleanup_baycomepp(void) printk(paranoia_str, "cleanup_module"); } } + parport_unregister_driver(&baycom_epp_par_driver); } module_init(init_baycomepp); -- 2.7.4
Re: [PATCH net] net/sched: cls_matchall: fix crash when used with classful qdisc
On 09/16/2017 03:02 PM, Davide Caratti wrote:
> this script, edited from Linux Advanced Routing and Traffic Control guide
>
> tc q a dev en0 root handle 1: htb default a
> tc c a dev en0 parent 1: classid 1:1 htb rate 6mbit burst 15k
> tc c a dev en0 parent 1:1 classid 1:a htb rate 5mbit ceil 6mbit burst 15k
> tc c a dev en0 parent 1:1 classid 1:b htb rate 1mbit ceil 6mbit burst 15k
> tc f a dev en0 parent 1:0 prio 1 $clsname $clsargs classid 1:b
> ping $address -c1
> tc -s c s dev en0
>
> classifies traffic to 1:b or 1:a, depending on whether the packet
> matches or not the pattern $clsargs of filter $clsname. However, when
> $clsname is 'matchall', a systematic crash can be observed in
> htb_classify(). HTB and classful qdiscs don't assign initial value to
> struct tcf_result, but then they expect it to contain valid values
> after filters have been run. Thus, current 'matchall' ignores the
> TCA_MATCHALL_CLASSID attribute, configured by user, and makes HTB (and
> classful qdiscs) dereference random pointers.
>
> By assigning head->res to *res in mall_classify(), before the actions
> are invoked, we fix this crash and enable TCA_MATCHALL_CLASSID
> functionality, that had no effect on 'matchall' classifier since its
> first introduction.
>
> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1460213
> Reported-by: Jiri Benc
> Fixes: b87f7936a932 ("net/sched: introduce Match-all classifier")
> Signed-off-by: Davide Caratti
> ---
>  net/sched/cls_matchall.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
> index 21cc45caf842..eeac606c95ab 100644
> --- a/net/sched/cls_matchall.c
> +++ b/net/sched/cls_matchall.c
> @@ -32,6 +32,7 @@ static int mall_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>  	if (tc_skip_sw(head->flags))
>  		return -1;
>  
> +	*res = head->res;
>  	return tcf_exts_exec(skb, &head->exts, res);
>  }

Acked-by: Yotam Gigi
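For the record, the matchall instantiation of Davide's reproducer above is simply the same script with no match arguments (matchall matches every packet); with the fix applied, the ping traffic should be accounted to class 1:b instead of crashing in htb_classify(). `en0` is the example device name from the script:

```shell
tc qdisc add dev en0 root handle 1: htb default a
tc class add dev en0 parent 1: classid 1:1 htb rate 6mbit burst 15k
tc class add dev en0 parent 1:1 classid 1:a htb rate 5mbit ceil 6mbit burst 15k
tc class add dev en0 parent 1:1 classid 1:b htb rate 1mbit ceil 6mbit burst 15k
# matchall takes no $clsargs: every packet is classified to 1:b
tc filter add dev en0 parent 1:0 prio 1 matchall classid 1:b
tc -s class show dev en0
```

This is a config fragment (requires root and a real netdev); before the fix, the TCA_MATCHALL_CLASSID above was silently ignored because mall_classify() never copied head->res into the caller's tcf_result.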