Re: [PATCH] kni: fix use-after-free when kni release
Acked-by: Igor Ryzhov On Wed, Feb 9, 2022 at 10:36 AM Min Hu (Connor) wrote: > Hi, Igor, > fixed in v2, please check it, thanks. > > 在 2022/2/8 20:41, Igor Ryzhov 写道: > > Looks correct. > > Could you, please, also change the order of `list_del` and > > `kni_dev_remove` in `kni_release`? It suffers from the same problem. > > > > Igor > > > > On Fri, Jan 28, 2022 at 5:43 AM Min Hu (Connor) > <mailto:humi...@huawei.com>> wrote: > > > > From: Huisong Li mailto:lihuis...@huawei.com > >> > > > > The "kni_dev" is the private data of the "net_device" in kni, and > > allocated > > with the "net_device" by calling "alloc_netdev()". The "net_device" > is > > freed by calling "free_netdev()" when kni release. The freed memory > > includes the "kni_dev". So After "kni_dev" should not be accessed > after > > "net_device" is released. > > > > Fixes: e77fec694936 ("kni: fix possible mbuf leaks and speed up port > > release") > > Cc: sta...@dpdk.org <mailto:sta...@dpdk.org> > > > > KASAN trace: > > > > [ 85.263717] > > == > > [ 85.264418] BUG: KASAN: use-after-free in > kni_net_release_fifo_phy+ > > 0x30/0x84 [rte_kni] > > [ 85.265139] Read of size 8 at addr 000260668d60 by task > kni/341 > > [ 85.265703] > > [ 85.265857] CPU: 0 PID: 341 Comm: kni Tainted: G U O > > 5.15.0-rc4+ #1 > > [ 85.266525] Hardware name: linux,dummy-virt (DT) > > [ 85.266968] Call trace: > > [ 85.267220] dump_backtrace+0x0/0x2d0 > > [ 85.267591] show_stack+0x24/0x30 > > [ 85.267924] dump_stack_lvl+0x8c/0xb8 > > [ 85.268294] print_address_description.constprop.0+0x74/0x2b8 > > [ 85.268855] kasan_report+0x1e4/0x200 > > [ 85.269224] __asan_load8+0x98/0xd4 > > [ 85.269577] kni_net_release_fifo_phy+0x30/0x84 [rte_kni] > > [ 85.270116] kni_dev_remove.isra.0+0x50/0x64 [rte_kni] > > [ 85.270630] kni_ioctl_release+0x254/0x320 [rte_kni] > > [ 85.271136] kni_ioctl+0x64/0xb0 [rte_kni] > > [ 85.271553] __arm64_sys_ioctl+0xdc/0x120 > > [ 85.271955] invoke_syscall+0x68/0x1a0 > > [ 85.272332] el0_svc_common.constprop.0+0x90/0x200 > > [ 85.272807] do_el0_svc+0x94/0xa4 > > [ 85.273144] el0_svc+0x78/0x240 > > [ 85.273463] el0t_64_sync_handler+0x1a8/0x1b0 > > [ 85.273895] el0t_64_sync+0x1a0/0x1a4 > > [ 85.274264] > > [ 85.274427] Allocated by task 341: > > [ 85.274767] kasan_save_stack+0x2c/0x60 > > [ 85.275157] __kasan_kmalloc+0x90/0xb4 > > [ 85.275533] __kmalloc_node+0x230/0x594 > > [ 85.275917] kvmalloc_node+0x8c/0x190 > > [ 85.276286] alloc_netdev_mqs+0x70/0x6b0 > > [ 85.276678] kni_ioctl_create+0x224/0xf40 [rte_kni] > > [ 85.277166] kni_ioctl+0x9c/0xb0 [rte_kni] > > [ 85.277581] __arm64_sys_ioctl+0xdc/0x120 > > [ 85.277980] invoke_syscall+0x68/0x1a0 > > [ 85.278357] el0_svc_common.constprop.0+0x90/0x200 > > [ 85.278830] do_el0_svc+0x94/0xa4 > > [ 85.279172] el0_svc+0x78/0x240 > > [ 85.279491] el0t_64_sync_handler+0x1a8/0x1b0 > > [ 85.279925] el0t_64_sync+0x1a0/0x1a4 > > [ 85.280292] > > [ 85.280454] Freed by task 341: > > [ 85.280763] kasan_save_stack+0x2c/0x60 > > [ 85.281147] kasan_set_track+0x2c/0x40 > > [ 85.281522] kasan_set_free_info+0x2c/0x50 > > [ 85.281930] __kasan_slab_free+0xdc/0x140 > > [ 85.282331] slab_free_freelist_hook+0x90/0x250 > > [ 85.282782] kfree+0x128/0x580 > > [ 85.283099] kvfree+0x48/0x60 > > [ 85.283402] netdev_freemem+0x34/0x44 > > [ 85.283770] netdev_release+0x50/0x64 > > [ 85.284138] device_release+0xa0/0x120 > > [ 85.284516] kobject_put+0xf8/0x160 > > [ 85.284867] put_device+0x20/0x30 > > [ 85.285204] free_netdev+0x22c/0x310 > > [ 85.285562] kni_dev_remove.isra.0+0x48/0x64 [rte_kni] > > [ 85.286076] kni_ioctl_release+0x254/0x320 [rte_kni] > > [ 85.286573] kni_ioctl+0x64/0xb0 [rte_kni] > > [ 85.286992] __arm64_sys_ioctl+0xdc/0x120 > > [ 85.287392] invoke_syscall+0x68/0x1a0 > > [ 85.287769] el0_svc
Re: [PATCH] kni: fix use-after-free when kni release
Looks correct. Could you, please, also change the order of `list_del` and `kni_dev_remove` in `kni_release`? It suffers from the same problem. Igor On Fri, Jan 28, 2022 at 5:43 AM Min Hu (Connor) wrote: > From: Huisong Li > > The "kni_dev" is the private data of the "net_device" in kni, and allocated > with the "net_device" by calling "alloc_netdev()". The "net_device" is > freed by calling "free_netdev()" when kni release. The freed memory > includes the "kni_dev". So After "kni_dev" should not be accessed after > "net_device" is released. > > Fixes: e77fec694936 ("kni: fix possible mbuf leaks and speed up port > release") > Cc: sta...@dpdk.org > > KASAN trace: > > [ 85.263717] == > [ 85.264418] BUG: KASAN: use-after-free in kni_net_release_fifo_phy+ > 0x30/0x84 [rte_kni] > [ 85.265139] Read of size 8 at addr 000260668d60 by task kni/341 > [ 85.265703] > [ 85.265857] CPU: 0 PID: 341 Comm: kni Tainted: G U O > 5.15.0-rc4+ #1 > [ 85.266525] Hardware name: linux,dummy-virt (DT) > [ 85.266968] Call trace: > [ 85.267220] dump_backtrace+0x0/0x2d0 > [ 85.267591] show_stack+0x24/0x30 > [ 85.267924] dump_stack_lvl+0x8c/0xb8 > [ 85.268294] print_address_description.constprop.0+0x74/0x2b8 > [ 85.268855] kasan_report+0x1e4/0x200 > [ 85.269224] __asan_load8+0x98/0xd4 > [ 85.269577] kni_net_release_fifo_phy+0x30/0x84 [rte_kni] > [ 85.270116] kni_dev_remove.isra.0+0x50/0x64 [rte_kni] > [ 85.270630] kni_ioctl_release+0x254/0x320 [rte_kni] > [ 85.271136] kni_ioctl+0x64/0xb0 [rte_kni] > [ 85.271553] __arm64_sys_ioctl+0xdc/0x120 > [ 85.271955] invoke_syscall+0x68/0x1a0 > [ 85.272332] el0_svc_common.constprop.0+0x90/0x200 > [ 85.272807] do_el0_svc+0x94/0xa4 > [ 85.273144] el0_svc+0x78/0x240 > [ 85.273463] el0t_64_sync_handler+0x1a8/0x1b0 > [ 85.273895] el0t_64_sync+0x1a0/0x1a4 > [ 85.274264] > [ 85.274427] Allocated by task 341: > [ 85.274767] kasan_save_stack+0x2c/0x60 > [ 85.275157] __kasan_kmalloc+0x90/0xb4 > [ 85.275533] __kmalloc_node+0x230/0x594 > [ 85.275917] kvmalloc_node+0x8c/0x190 > [ 85.276286] alloc_netdev_mqs+0x70/0x6b0 > [ 85.276678] kni_ioctl_create+0x224/0xf40 [rte_kni] > [ 85.277166] kni_ioctl+0x9c/0xb0 [rte_kni] > [ 85.277581] __arm64_sys_ioctl+0xdc/0x120 > [ 85.277980] invoke_syscall+0x68/0x1a0 > [ 85.278357] el0_svc_common.constprop.0+0x90/0x200 > [ 85.278830] do_el0_svc+0x94/0xa4 > [ 85.279172] el0_svc+0x78/0x240 > [ 85.279491] el0t_64_sync_handler+0x1a8/0x1b0 > [ 85.279925] el0t_64_sync+0x1a0/0x1a4 > [ 85.280292] > [ 85.280454] Freed by task 341: > [ 85.280763] kasan_save_stack+0x2c/0x60 > [ 85.281147] kasan_set_track+0x2c/0x40 > [ 85.281522] kasan_set_free_info+0x2c/0x50 > [ 85.281930] __kasan_slab_free+0xdc/0x140 > [ 85.282331] slab_free_freelist_hook+0x90/0x250 > [ 85.282782] kfree+0x128/0x580 > [ 85.283099] kvfree+0x48/0x60 > [ 85.283402] netdev_freemem+0x34/0x44 > [ 85.283770] netdev_release+0x50/0x64 > [ 85.284138] device_release+0xa0/0x120 > [ 85.284516] kobject_put+0xf8/0x160 > [ 85.284867] put_device+0x20/0x30 > [ 85.285204] free_netdev+0x22c/0x310 > [ 85.285562] kni_dev_remove.isra.0+0x48/0x64 [rte_kni] > [ 85.286076] kni_ioctl_release+0x254/0x320 [rte_kni] > [ 85.286573] kni_ioctl+0x64/0xb0 [rte_kni] > [ 85.286992] __arm64_sys_ioctl+0xdc/0x120 > [ 85.287392] invoke_syscall+0x68/0x1a0 > [ 85.287769] el0_svc_common.constprop.0+0x90/0x200 > [ 85.288243] do_el0_svc+0x94/0xa4 > [ 85.288579] el0_svc+0x78/0x240 > [ 85.288899] el0t_64_sync_handler+0x1a8/0x1b0 > [ 85.289332] el0t_64_sync+0x1a0/0x1a4 > [ 85.289699] > [ 85.289862] The buggy address belongs to the object at 000260668000 > [ 85.289862] which belongs to the cache kmalloc-cg-8k of size 8192 > [ 85.291079] The buggy address is located 3424 bytes inside of > [ 85.291079] 8192-byte region [000260668000, 00026066a000) > [ 85.292213] The buggy address belongs to the page: > [ 85.292684] page:(ptrval) refcount:1 mapcount:0 mapping: > index:0x0 pfn:0x2a0668 > [ 85.293585] head:(ptrval) order:3 compound_mapcount:0 > compound_pincount:0 > [ 85.294305] flags: 0xbfff8010200(slab|head|node=0|zone=2| > lastcpupid=0x7fff) > [ 85.295020] raw: 0bfff8010200 dead0122 > c000d680 > [ 85.295767] raw: 80020002 0001 > > [ 85.296512] page dumped because: kasan: bad access detected > [ 85.297054] > [ 85.297217] Memory state around the buggy address: > [ 85.297688] 000260668c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb > fb fb > [ 85.298384] 000260668c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb >
Re: [PATCH] kni: restrict bifurcated device support
Acked-by: Igor Ryzhov On Sat, Oct 9, 2021 at 2:58 AM Ferruh Yigit wrote: > To enable bifurcated device support, rtnl_lock is released before calling > userspace callbacks and asynchronous requests are enabled. > > But these changes caused more issues, like bug #809, #816. To reduce the > scope of the problems, the bifurcated device support related changes are > only enabled when it is requested explicitly with new 'enable_bifurcated' > module parameter. > And bifurcated device support is disabled by default. > > So the bifurcated device related problems are isolated and they can be > fixed without impacting all use cases. > > Bugzilla ID: 816 > Fixes: 631217c76135 ("kni: fix kernel deadlock with bifurcated device") > Cc: sta...@dpdk.org > > Signed-off-by: Ferruh Yigit > --- > Cc: Igor Ryzhov > Cc: Elad Nachman > Cc: Eric Christian > Cc: Stephen Hemminger > --- > .../prog_guide/kernel_nic_interface.rst | 28 + > kernel/linux/kni/kni_dev.h| 3 ++ > kernel/linux/kni/kni_misc.c | 36 > kernel/linux/kni/kni_net.c| 42 +++ > 4 files changed, 92 insertions(+), 17 deletions(-) > > diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst > b/doc/guides/prog_guide/kernel_nic_interface.rst > index 1ce03ec1a374..771c7d7fdac4 100644 > --- a/doc/guides/prog_guide/kernel_nic_interface.rst > +++ b/doc/guides/prog_guide/kernel_nic_interface.rst > @@ -56,6 +56,12 @@ can be specified when the module is loaded to control > its behavior: > off Interfaces will be created with carrier state > set to off. > onInterfaces will be created with carrier state > set to on. > (charp) > +parm: enable_bifurcated: Enable request processing support > for > +bifurcated drivers, which means releasing rtnl_lock > before calling > +userspace callback and supporting async requests > (default=off): > +onEnable request processing support for > bifurcated drivers. > + (charp) > + > > Loading the ``rte_kni`` kernel module without any optional parameters is > the typical way a DPDK application gets packets into and out of the kernel > @@ -174,6 +180,28 @@ To set the default carrier state to *off*: > If the ``carrier`` parameter is not specified, the default carrier state > of KNI interfaces will be set to *off*. > > +.. _kni_bifurcated_device_support: > + > +Bifurcated Device Support > +~ > + > +User callbacks are executed while kernel module holds the ``rtnl`` lock, > this > +causes a deadlock when callbacks run control commands on another Linux > kernel > +network interface. > + > +Bifurcated devices has kernel network driver part and to prevent deadlock > for > +them ``enable_bifurcated`` is used. > + > +To enable bifurcated device support: > + > +.. code-block:: console > + > +# insmod /kernel/linux/kni/rte_kni.ko enable_bifurcated=on > + > +Enabling bifurcated device support releases ``rtnl`` lock before calling > +callback and locks it back after callback. Also enables asynchronous > request to > +support callbacks that requires rtnl lock to work (interface down). > + > KNI Creation and Deletion > - > > diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h > index c15da311ba25..e8633486eeb8 100644 > --- a/kernel/linux/kni/kni_dev.h > +++ b/kernel/linux/kni/kni_dev.h > @@ -34,6 +34,9 @@ > /* Default carrier state for created KNI network interfaces */ > extern uint32_t kni_dflt_carrier; > > +/* Request processing support for bifurcated drivers. */ > +extern uint32_t bifurcated_support; > + > /** > * A structure describing the private information for a kni device. > */ > diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c > index 2b464c438113..aae977c187a9 100644 > --- a/kernel/linux/kni/kni_misc.c > +++ b/kernel/linux/kni/kni_misc.c > @@ -41,6 +41,10 @@ static uint32_t multiple_kthread_on; > static char *carrier; > uint32_t kni_dflt_carrier; > > +/* Request processing support for bifurcated drivers. */ > +static char *enable_bifurcated; > +uint32_t bifurcated_support; > + > #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */ > > static int kni_net_id; > @@ -568,6 +572,22 @@ kni_parse_carrier_state(void) > return 0; > } > > +static int __init > +kni_parse_bifurcated_support(void) > +{ > + if (!enable_bifurcated) { > + bif
Re: [dpdk-dev] [PATCH v5 1/3] kni: rework rte_kni_update_link using ioctl
On Thu, Aug 26, 2021 at 9:06 PM Stephen Hemminger < step...@networkplumber.org> wrote: > On Thu, 26 Aug 2021 20:46:47 +0300 > Igor Ryzhov wrote: > > > Could you please clarify where exactly do I need to use rtnl lock? > > From what I understand, netif_carrier_on/off can be called without the > lock. > > Just a basic concern. The new stuff looks ok. > If you follow what current upstream virtio is doing, it should be safe. > > See drivers/net/virtio/virtio_net.c, looks like more is needed? Thanks for the suggestion, I looked at it. > Do you call ethtool_validate_speed() and ethtool_validate_duplex()? > The problem with those functions is that they are only available since kernel version 4.6 and currently we have to also support 4.4. I don't think we need to introduce more compatibility code, as nothing really bad happens if the developer provides some invalued values. The worst thing is that ethtool will show weird speed or unknown duplex. > > Also, in virtio the driver does stop/wakeup of tx queues on carrier change. > This can be added for sure, but this improvement is not actually related to my patch. We don't currently stop kernel's tx queue when the link is down and I don't change that behavior. The code change is trivial but I'll need to find some time to test that everything works fine, so let's consider it a separate thing, please.
[dpdk-dev] [PATCH v6 3/3] app/test: fix return value of test_kni_link_change
If the child process returns -1, WEXITSTATUS converts to 255 because it takes only 8 least significant bits. As a result, the test is not considered failed when it should. Signed-off-by: Igor Ryzhov --- app/test/test_kni.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/app/test/test_kni.c b/app/test/test_kni.c index 0df028696f36..70c81f4af0f4 100644 --- a/app/test/test_kni.c +++ b/app/test/test_kni.c @@ -214,7 +214,7 @@ test_kni_link_change(void) p_ret = waitpid(pid, &status, WNOHANG); if (p_ret != 0) { if (WIFEXITED(status)) - return WEXITSTATUS(status); + return WEXITSTATUS(status) ? -1 : 0; return -1; } rte_delay_ms(10); -- 2.33.0
[dpdk-dev] [PATCH v6 2/3] kni: implement basic get_link_ksettings callback
It allows inspecting link parameters using ethtool. Signed-off-by: Igor Ryzhov --- kernel/linux/kni/kni_net.c | 25 + 1 file changed, 25 insertions(+) diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c index 611719b5ee27..931b9daf104f 100644 --- a/kernel/linux/kni/kni_net.c +++ b/kernel/linux/kni/kni_net.c @@ -820,9 +820,34 @@ static void kni_get_drvinfo(struct net_device *dev, strlcpy(info->driver, "kni", sizeof(info->driver)); } +#ifdef ETHTOOL_GLINKSETTINGS +static int kni_get_link_ksettings(struct net_device *dev, + struct ethtool_link_ksettings *settings) +{ + struct kni_dev *kni = netdev_priv(dev); + + settings->base.port = PORT_OTHER; + + if (netif_carrier_ok(dev)) { + settings->base.speed = kni->speed; + settings->base.duplex = kni->duplex; + settings->base.autoneg = kni->autoneg; + } else { + settings->base.speed = SPEED_UNKNOWN; + settings->base.duplex = DUPLEX_UNKNOWN; + settings->base.autoneg = AUTONEG_ENABLE; + } + + return 0; +} +#endif + static const struct ethtool_ops kni_net_ethtool_ops = { .get_drvinfo= kni_get_drvinfo, .get_link = ethtool_op_get_link, +#ifdef ETHTOOL_GLINKSETTINGS + .get_link_ksettings = kni_get_link_ksettings, +#endif }; void -- 2.33.0
[dpdk-dev] [PATCH v6 1/3] kni: rework rte_kni_update_link using ioctl
Current implementation doesn't allow us to update KNI carrier if the interface is not yet UP in kernel. It means that we can't use it in the same thread which is processing rte_kni_ops.config_network_if, which is very convenient, because it allows us to have correct carrier status of the interface right after we enabled it and we don't have to use any additional thread to track link status. Propagating speed/duplex/autoneg to the kernel module also allows us to implement ethtool_ops.get_link_ksettings callback. Suggested-by: Dan Gora Signed-off-by: Igor Ryzhov --- app/test/test_kni.c | 62 +++-- examples/kni/main.c | 8 +++-- kernel/linux/kni/kni_dev.h | 5 +++ kernel/linux/kni/kni_misc.c | 47 lib/kni/rte_kni.c | 41 ++-- lib/kni/rte_kni.h | 12 +++ lib/kni/rte_kni_common.h| 9 ++ 7 files changed, 126 insertions(+), 58 deletions(-) diff --git a/app/test/test_kni.c b/app/test/test_kni.c index 96733554b6c4..0df028696f36 100644 --- a/app/test/test_kni.c +++ b/app/test/test_kni.c @@ -122,9 +122,32 @@ kni_change_mtu(uint16_t port_id, unsigned int new_mtu) return 0; } +static int +kni_get_carrier(const char *name) +{ + FILE *fd; + char path[128]; + int carrier; + + snprintf(path, sizeof(path), "/sys/devices/virtual/net/%s/carrier", + name); + fd = fopen(path, "r"); + if (fd == NULL) + return -1; + + if (fscanf(fd, "%d", &carrier) != 1) + return -1; + + fclose(fd); + + return carrier; +} + static int test_kni_link_change(void) { + struct rte_eth_link link; + int carrier; int ret; int pid; @@ -135,42 +158,47 @@ test_kni_link_change(void) } if (pid == 0) { + link.link_speed = ETH_SPEED_NUM_10G; + link.link_duplex = ETH_LINK_FULL_DUPLEX; + link.link_autoneg = ETH_LINK_AUTONEG; + printf("Starting KNI Link status change tests.\n"); if (system(IFCONFIG TEST_KNI_PORT" up") == -1) { ret = -1; goto error; } - ret = rte_kni_update_link(test_kni_ctx, 1); + link.link_status = ETH_LINK_UP; + ret = rte_kni_update_link(test_kni_ctx, &link); if (ret < 0) { printf("Failed to change link state to Up ret=%d.\n", ret); goto error; } rte_delay_ms(1000); - printf("KNI: Set LINKUP, previous state=%d\n", ret); - - ret = rte_kni_update_link(test_kni_ctx, 0); - if (ret != 1) { - printf( - "Failed! Previous link state should be 1, returned %d.\n", - ret); + carrier = kni_get_carrier(TEST_KNI_PORT); + if (carrier != 1) { + printf("Carrier did not change to Up in kernel.\n"); + ret = -1; goto error; } - rte_delay_ms(1000); - printf("KNI: Set LINKDOWN, previous state=%d\n", ret); + printf("KNI: Set LINKUP\n"); - ret = rte_kni_update_link(test_kni_ctx, 1); - if (ret != 0) { - printf( - "Failed! Previous link state should be 0, returned %d.\n", + link.link_status = ETH_LINK_DOWN; + ret = rte_kni_update_link(test_kni_ctx, &link); + if (ret < 0) { + printf("Failed to change link state to Down ret=%d.\n", ret); goto error; } - printf("KNI: Set LINKUP, previous state=%d\n", ret); - - ret = 0; rte_delay_ms(1000); + carrier = kni_get_carrier(TEST_KNI_PORT); + if (carrier != 0) { + printf("Carrier did not change to Down in kernel.\n"); + ret = -1; + goto error; + } + printf("KNI: Set LINKDOWN\n"); error: if (system(IFCONFIG TEST_KNI_PORT" down") == -1) diff --git a/examples/kni/main.c b/examples/kni/main.c index beabb3c848aa..aea44beac550 100644 --- a/examples/kni/main.c +++ b/examples/kni/main.c @@ -85,6 +85,7 @@ struct kni_port_params { unsigned lcore_tx; /* lcore ID for TX */ uint32_t nb_lcore_k; /* Number of lcores for KNI multi kernel threads */ uint32_t nb_kni; /* Number of KNI devices
Re: [dpdk-dev] [PATCH v5 1/3] kni: rework rte_kni_update_link using ioctl
Could you please clarify where exactly do I need to use rtnl lock? >From what I understand, netif_carrier_on/off can be called without the lock. On Thu, Aug 26, 2021 at 8:15 PM Stephen Hemminger < step...@networkplumber.org> wrote: > On Thu, 26 Aug 2021 18:19:09 +0300 > Igor Ryzhov wrote: > > > > > +static int > > +kni_ioctl_link(struct net *net, uint32_t ioctl_num, > > + unsigned long ioctl_param) > > +{ > > + struct kni_net *knet = net_generic(net, kni_net_id); > > + int ret = -EINVAL; > > + struct kni_dev *dev, *n; > > + struct rte_kni_link_info link_info; > > + struct net_device *netdev; > > + > > + if (_IOC_SIZE(ioctl_num) > sizeof(link_info)) > > + return -EINVAL; > > + > > + if (copy_from_user(&link_info, (void *)ioctl_param, > sizeof(link_info))) > > + return -EFAULT; > > + > > + if (strlen(link_info.name) == 0) > > + return -EINVAL; > > + > > + down_read(&knet->kni_list_lock); > > + list_for_each_entry_safe(dev, n, &knet->kni_list_head, list) { > > + if (strncmp(dev->name, link_info.name, RTE_KNI_NAMESIZE) > != 0) > > + continue; > > + > > + netdev = dev->net_dev; > > + > > + if (link_info.status) { > > + netif_carrier_on(netdev); > > + > > + dev->speed = link_info.speed; > > + dev->duplex = link_info.duplex; > > + dev->autoneg = link_info.autoneg; > > + } else { > > + netif_carrier_off(netdev); > > + } > > + > > + ret = 0; > > + break; > > + } > > + up_read(&knet->kni_list_lock); > > + > > + return ret; > > +} > > + > > You need to be using the RTNL mutex in KNI here (and probably elsewhere). > The use of semaphore for list lock should also be replaced by a mutex. > > The KNI driver was written long ago and was never reviewed by people > knowledgeable about kernel networking. That is one reason IMHO KNI > should not be used in production systems. > >
[dpdk-dev] [PATCH v5 3/3] app/test: fix return value of test_kni_link_change
If the child process returns -1, WEXITSTATUS converts to 255 because it takes only 8 least significant bits. As a result, the test is not considered failed when it should. Signed-off-by: Igor Ryzhov --- app/test/test_kni.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/app/test/test_kni.c b/app/test/test_kni.c index 0df028696f36..70c81f4af0f4 100644 --- a/app/test/test_kni.c +++ b/app/test/test_kni.c @@ -214,7 +214,7 @@ test_kni_link_change(void) p_ret = waitpid(pid, &status, WNOHANG); if (p_ret != 0) { if (WIFEXITED(status)) - return WEXITSTATUS(status); + return WEXITSTATUS(status) ? -1 : 0; return -1; } rte_delay_ms(10); -- 2.33.0
[dpdk-dev] [PATCH v5 2/3] kni: implement basic get_link_ksettings callback
It allows inspecting link parameters using ethtool. Signed-off-by: Igor Ryzhov --- kernel/linux/kni/kni_net.c | 25 + 1 file changed, 25 insertions(+) diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c index 611719b5ee27..931b9daf104f 100644 --- a/kernel/linux/kni/kni_net.c +++ b/kernel/linux/kni/kni_net.c @@ -820,9 +820,34 @@ static void kni_get_drvinfo(struct net_device *dev, strlcpy(info->driver, "kni", sizeof(info->driver)); } +#ifdef ETHTOOL_GLINKSETTINGS +static int kni_get_link_ksettings(struct net_device *dev, + struct ethtool_link_ksettings *settings) +{ + struct kni_dev *kni = netdev_priv(dev); + + settings->base.port = PORT_OTHER; + + if (netif_carrier_ok(dev)) { + settings->base.speed = kni->speed; + settings->base.duplex = kni->duplex; + settings->base.autoneg = kni->autoneg; + } else { + settings->base.speed = SPEED_UNKNOWN; + settings->base.duplex = DUPLEX_UNKNOWN; + settings->base.autoneg = AUTONEG_ENABLE; + } + + return 0; +} +#endif + static const struct ethtool_ops kni_net_ethtool_ops = { .get_drvinfo= kni_get_drvinfo, .get_link = ethtool_op_get_link, +#ifdef ETHTOOL_GLINKSETTINGS + .get_link_ksettings = kni_get_link_ksettings, +#endif }; void -- 2.33.0
[dpdk-dev] [PATCH v5 1/3] kni: rework rte_kni_update_link using ioctl
Current implementation doesn't allow us to update KNI carrier if the interface is not yet UP in kernel. It means that we can't use it in the same thread which is processing rte_kni_ops.config_network_if, which is very convenient, because it allows us to have correct carrier status of the interface right after we enabled it and we don't have to use any additional thread to track link status. Propagating speed/duplex/autoneg to the kernel module also allows us to implement ethtool_ops.get_link_ksettings callback. Suggested-by: Dan Gora Signed-off-by: Igor Ryzhov --- app/test/test_kni.c | 62 +++-- examples/kni/main.c | 8 +++-- kernel/linux/kni/kni_dev.h | 5 +++ kernel/linux/kni/kni_misc.c | 47 lib/kni/rte_kni.c | 38 ++- lib/kni/rte_kni.h | 12 +++ lib/kni/rte_kni_common.h| 9 ++ 7 files changed, 126 insertions(+), 55 deletions(-) diff --git a/app/test/test_kni.c b/app/test/test_kni.c index 96733554b6c4..0df028696f36 100644 --- a/app/test/test_kni.c +++ b/app/test/test_kni.c @@ -122,9 +122,32 @@ kni_change_mtu(uint16_t port_id, unsigned int new_mtu) return 0; } +static int +kni_get_carrier(const char *name) +{ + FILE *fd; + char path[128]; + int carrier; + + snprintf(path, sizeof(path), "/sys/devices/virtual/net/%s/carrier", + name); + fd = fopen(path, "r"); + if (fd == NULL) + return -1; + + if (fscanf(fd, "%d", &carrier) != 1) + return -1; + + fclose(fd); + + return carrier; +} + static int test_kni_link_change(void) { + struct rte_eth_link link; + int carrier; int ret; int pid; @@ -135,42 +158,47 @@ test_kni_link_change(void) } if (pid == 0) { + link.link_speed = ETH_SPEED_NUM_10G; + link.link_duplex = ETH_LINK_FULL_DUPLEX; + link.link_autoneg = ETH_LINK_AUTONEG; + printf("Starting KNI Link status change tests.\n"); if (system(IFCONFIG TEST_KNI_PORT" up") == -1) { ret = -1; goto error; } - ret = rte_kni_update_link(test_kni_ctx, 1); + link.link_status = ETH_LINK_UP; + ret = rte_kni_update_link(test_kni_ctx, &link); if (ret < 0) { printf("Failed to change link state to Up ret=%d.\n", ret); goto error; } rte_delay_ms(1000); - printf("KNI: Set LINKUP, previous state=%d\n", ret); - - ret = rte_kni_update_link(test_kni_ctx, 0); - if (ret != 1) { - printf( - "Failed! Previous link state should be 1, returned %d.\n", - ret); + carrier = kni_get_carrier(TEST_KNI_PORT); + if (carrier != 1) { + printf("Carrier did not change to Up in kernel.\n"); + ret = -1; goto error; } - rte_delay_ms(1000); - printf("KNI: Set LINKDOWN, previous state=%d\n", ret); + printf("KNI: Set LINKUP\n"); - ret = rte_kni_update_link(test_kni_ctx, 1); - if (ret != 0) { - printf( - "Failed! Previous link state should be 0, returned %d.\n", + link.link_status = ETH_LINK_DOWN; + ret = rte_kni_update_link(test_kni_ctx, &link); + if (ret < 0) { + printf("Failed to change link state to Down ret=%d.\n", ret); goto error; } - printf("KNI: Set LINKUP, previous state=%d\n", ret); - - ret = 0; rte_delay_ms(1000); + carrier = kni_get_carrier(TEST_KNI_PORT); + if (carrier != 0) { + printf("Carrier did not change to Down in kernel.\n"); + ret = -1; + goto error; + } + printf("KNI: Set LINKDOWN\n"); error: if (system(IFCONFIG TEST_KNI_PORT" down") == -1) diff --git a/examples/kni/main.c b/examples/kni/main.c index beabb3c848aa..aea44beac550 100644 --- a/examples/kni/main.c +++ b/examples/kni/main.c @@ -85,6 +85,7 @@ struct kni_port_params { unsigned lcore_tx; /* lcore ID for TX */ uint32_t nb_lcore_k; /* Number of lcores for KNI multi kernel threads */ uint32_t nb_kni; /* Number of KNI devices
Re: [dpdk-dev] [PATCH v4 1/2] kni: rework rte_kni_update_link using ioctl
Hi Ferruh, Thanks for the review. My comments inline. I'll send a new version later this week. On Mon, Jul 5, 2021 at 2:58 PM Ferruh Yigit wrote: > On 7/4/2021 6:06 PM, Igor Ryzhov wrote: > > Current implementation doesn't allow us to update KNI carrier if the > > interface is not yet UP in kernel. It means that we can't use it in the > > same thread which is processing rte_kni_ops.config_network_if, which is > > very convenient, because it allows us to have correct carrier status > > of the interface right after we enabled it and we don't have to use any > > additional thread to track link status. > > > > Propagating speed/duplex/autoneg to the kernel module also allows us to > > implement ethtool_ops.get_link_ksettings callback. > > > > Suggested-by: Dan Gora > > Signed-off-by: Igor Ryzhov > > --- > > app/test/test_kni.c | 32 ++--- > > examples/kni/main.c | 2 +- > > kernel/linux/kni/compat.h | 4 > > kernel/linux/kni/kni_dev.h | 5 > > kernel/linux/kni/kni_misc.c | 47 + > > kernel/linux/kni/kni_net.c | 15 > > lib/kni/rte_kni.c | 38 -- > > lib/kni/rte_kni.h | 10 > > lib/kni/rte_kni_common.h| 9 +++ > > 9 files changed, 89 insertions(+), 73 deletions(-) > > > > diff --git a/app/test/test_kni.c b/app/test/test_kni.c > > index 96733554b6c4..f9300552b3f6 100644 > > --- a/app/test/test_kni.c > > +++ b/app/test/test_kni.c > > @@ -125,6 +125,7 @@ kni_change_mtu(uint16_t port_id, unsigned int > new_mtu) > > static int > > test_kni_link_change(void) > > { > > + struct rte_eth_link link; > > int ret; > > int pid; > > > > @@ -135,42 +136,35 @@ test_kni_link_change(void) > > } > > > > if (pid == 0) { > > + link.link_speed = ETH_SPEED_NUM_10G; > > + link.link_duplex = ETH_LINK_FULL_DUPLEX; > > + link.link_autoneg = ETH_LINK_AUTONEG; > > + > > printf("Starting KNI Link status change tests.\n"); > > if (system(IFCONFIG TEST_KNI_PORT" up") == -1) { > > ret = -1; > > goto error; > > } > > > > - ret = rte_kni_update_link(test_kni_ctx, 1); > > + link.link_status = ETH_LINK_UP; > > + ret = rte_kni_update_link(test_kni_ctx, &link); > > if (ret < 0) { > > printf("Failed to change link state to Up > ret=%d.\n", > > ret); > > goto error; > > } > > rte_delay_ms(1000); > > - printf("KNI: Set LINKUP, previous state=%d\n", ret); > > - > > - ret = rte_kni_update_link(test_kni_ctx, 0); > > - if (ret != 1) { > > - printf( > > - "Failed! Previous link state should be 1, returned %d.\n", > > - ret); > > - goto error; > > - } > > - rte_delay_ms(1000); > > - printf("KNI: Set LINKDOWN, previous state=%d\n", ret); > > + printf("KNI: Set LINKUP\n"); > > > > - ret = rte_kni_update_link(test_kni_ctx, 1); > > - if (ret != 0) { > > - printf( > > - "Failed! Previous link state should be 0, returned %d.\n", > > + link.link_status = ETH_LINK_DOWN; > > + ret = rte_kni_update_link(test_kni_ctx, &link); > > + if (ret < 0) { > > + printf("Failed to change link state to Down > ret=%d.\n", > > ret); > > goto error; > > } > > Is there a way to verify the link status of the KNI interface, to double > check > setting link status succedded? > Yes, we can still read from the carrier file to check the current status. I'll add this to the test. > > > - printf("KNI: Set LINKUP, previous state=%d\n", ret); > > - > > - ret = 0; > > rte_delay_ms(1000); > > + printf("KNI: Set LINKDOWN\n"); > > > > error: > > if (system(IFCONFIG TEST_KNI_PORT" down") == -1) > >
[dpdk-dev] [PATCH v4 2/2] kni: implement basic get_link_ksettings callback
Signed-off-by: Igor Ryzhov --- kernel/linux/kni/kni_net.c | 25 + 1 file changed, 25 insertions(+) diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c index 99da8d37dd6b..84357bec1341 100644 --- a/kernel/linux/kni/kni_net.c +++ b/kernel/linux/kni/kni_net.c @@ -805,9 +805,34 @@ static void kni_get_drvinfo(struct net_device *dev, strlcpy(info->driver, "kni", sizeof(info->driver)); } +#ifdef ETHTOOL_GLINKSETTINGS +static int kni_get_link_ksettings(struct net_device *dev, + struct ethtool_link_ksettings *settings) +{ + struct kni_dev *kni = netdev_priv(dev); + + settings->base.port = PORT_OTHER; + + if (netif_carrier_ok(dev)) { + settings->base.speed = kni->speed; + settings->base.duplex = kni->duplex; + settings->base.autoneg = kni->autoneg; + } else { + settings->base.speed = SPEED_UNKNOWN; + settings->base.duplex = DUPLEX_UNKNOWN; + settings->base.autoneg = AUTONEG_ENABLE; + } + + return 0; +} +#endif + static const struct ethtool_ops kni_net_ethtool_ops = { .get_drvinfo= kni_get_drvinfo, .get_link = ethtool_op_get_link, +#ifdef ETHTOOL_GLINKSETTINGS + .get_link_ksettings = kni_get_link_ksettings, +#endif }; void -- 2.32.0
[dpdk-dev] [PATCH v4 1/2] kni: rework rte_kni_update_link using ioctl
Current implementation doesn't allow us to update KNI carrier if the interface is not yet UP in kernel. It means that we can't use it in the same thread which is processing rte_kni_ops.config_network_if, which is very convenient, because it allows us to have correct carrier status of the interface right after we enabled it and we don't have to use any additional thread to track link status. Propagating speed/duplex/autoneg to the kernel module also allows us to implement ethtool_ops.get_link_ksettings callback. Suggested-by: Dan Gora Signed-off-by: Igor Ryzhov --- app/test/test_kni.c | 32 ++--- examples/kni/main.c | 2 +- kernel/linux/kni/compat.h | 4 kernel/linux/kni/kni_dev.h | 5 kernel/linux/kni/kni_misc.c | 47 + kernel/linux/kni/kni_net.c | 15 lib/kni/rte_kni.c | 38 -- lib/kni/rte_kni.h | 10 lib/kni/rte_kni_common.h| 9 +++ 9 files changed, 89 insertions(+), 73 deletions(-) diff --git a/app/test/test_kni.c b/app/test/test_kni.c index 96733554b6c4..f9300552b3f6 100644 --- a/app/test/test_kni.c +++ b/app/test/test_kni.c @@ -125,6 +125,7 @@ kni_change_mtu(uint16_t port_id, unsigned int new_mtu) static int test_kni_link_change(void) { + struct rte_eth_link link; int ret; int pid; @@ -135,42 +136,35 @@ test_kni_link_change(void) } if (pid == 0) { + link.link_speed = ETH_SPEED_NUM_10G; + link.link_duplex = ETH_LINK_FULL_DUPLEX; + link.link_autoneg = ETH_LINK_AUTONEG; + printf("Starting KNI Link status change tests.\n"); if (system(IFCONFIG TEST_KNI_PORT" up") == -1) { ret = -1; goto error; } - ret = rte_kni_update_link(test_kni_ctx, 1); + link.link_status = ETH_LINK_UP; + ret = rte_kni_update_link(test_kni_ctx, &link); if (ret < 0) { printf("Failed to change link state to Up ret=%d.\n", ret); goto error; } rte_delay_ms(1000); - printf("KNI: Set LINKUP, previous state=%d\n", ret); - - ret = rte_kni_update_link(test_kni_ctx, 0); - if (ret != 1) { - printf( - "Failed! Previous link state should be 1, returned %d.\n", - ret); - goto error; - } - rte_delay_ms(1000); - printf("KNI: Set LINKDOWN, previous state=%d\n", ret); + printf("KNI: Set LINKUP\n"); - ret = rte_kni_update_link(test_kni_ctx, 1); - if (ret != 0) { - printf( - "Failed! Previous link state should be 0, returned %d.\n", + link.link_status = ETH_LINK_DOWN; + ret = rte_kni_update_link(test_kni_ctx, &link); + if (ret < 0) { + printf("Failed to change link state to Down ret=%d.\n", ret); goto error; } - printf("KNI: Set LINKUP, previous state=%d\n", ret); - - ret = 0; rte_delay_ms(1000); + printf("KNI: Set LINKDOWN\n"); error: if (system(IFCONFIG TEST_KNI_PORT" down") == -1) diff --git a/examples/kni/main.c b/examples/kni/main.c index beabb3c848aa..5833037cf1c9 100644 --- a/examples/kni/main.c +++ b/examples/kni/main.c @@ -755,7 +755,7 @@ monitor_all_ports_link_status(void *arg) } for (i = 0; i < p[portid]->nb_kni; i++) { prev = rte_kni_update_link(p[portid]->kni[i], - link.link_status); + &link); log_link_state(p[portid]->kni[i], prev, &link); } } diff --git a/kernel/linux/kni/compat.h b/kernel/linux/kni/compat.h index 5f65640d5ed2..995d109c275a 100644 --- a/kernel/linux/kni/compat.h +++ b/kernel/linux/kni/compat.h @@ -61,10 +61,6 @@ #define kni_sock_map_fd(s) sock_map_fd(s, 0) #endif -#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0) -#define HAVE_CHANGE_CARRIER_CB -#endif - #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 14, 0) #define ether_addr_copy(dst, src) memcpy(dst, src, ETH_ALEN) #endif diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h index c15da311ba25..969108cc30f8 100644 --- a/kernel/linux/kni/kni_dev.h +++ b/kernel/l
Re: [dpdk-dev] [PATCH v3] kni: rework rte_kni_update_link using ioctl
Thanks Ferruh, I'll send an update later this week. I also want to add a "Suggested-by: Dan Gora " as it was his idea. Dan, please let me know if you don't want this tag to be added. Thanks, Igor On Mon, Jun 28, 2021 at 3:55 PM Ferruh Yigit wrote: > On 10/27/2019 8:16 PM, Igor Ryzhov wrote: > > Hi Ferruh, Dan, > > > > Sure, I remember last year discussion but now I see the problem in > current > > implementation. > > > > Ferruh, here is an example: > > > > We have a thread in the application that processes KNI commands from the > > kernel. > > It receives config_network_if command to set interface up, calls > > rte_eth_dev_start, and here is the problem. > > We cannot call current rte_kni_update_link from here as the interface is > > not yet up in the kernel, > > as we didn't send a response for config_network_if yet. So we need to > send > > a response first and only > > after that, we can use rte_kni_update_link. Actually, we don't even know > > the exact time between we > > send a response and the moment when the kernel receives it and the > > interface becomes up. > > We always have a dependency on the interface state in the kernel. With > > ioctl approach, we don't > > have such dependency - we can call rte_kni_update_link whenever we want, > > even when the interface is > > down in the kernel. As I explained, it's common when processing > > config_network_if to set interface up. > > > > Hi Igor, > > I agree with the mentioned problem. When the KNI interface is down, not > able to > update the link carrier status is not convenient. > > For a physical interface this may make sense, since interface won't be > used by > the OS, no need to power on the PHY and trace the carrier status. But for > the > intention of the original link set feature, it requires to be able to > update the > carrier status independent from the interface up/down status. > > Overall, also agree to not introduce a new ioctl and use existing > interface, but > for this case existing interface doesn't exactly fit to the intended use > case > and I am OK have the ioctl. > > Can you please send a new version rebasing latest head, we can continue on > that one? > > Thanks, > ferruh > > > > Igor > > > > On Mon, Oct 14, 2019 at 11:56 PM Dan Gora wrote: > > > >> Here's another link to the thread where this was discussed last year.. > >> Igor was actually on this thread as well... > >> > >> https://mails.dpdk.org/archives/dev/2018-August/110383.html > >> > >> On Mon, Oct 14, 2019 at 4:01 PM Dan Gora wrote: > >>> > >>> My original patch to add this feature was basically the same thing as > >>> this: setting the link status via a KNI ioctl. That method was > >>> rejected after _much_ discussion and we eventually settled on the > >>> currently implementation. > >>> > >>> My original patch was here: Message-Id: < > >> 20180628225548.21885-1...@adax.com> > >>> > >>> If you search for KNI and d...@adax.com in the DPDK devel list you > >>> should be able to suss out the whole discussion that lead to the > >>> current implementation. > >>> > >>> thanks > >>> dan > >>> > >>> On Mon, Oct 14, 2019 at 1:17 PM Ferruh Yigit > >> wrote: > >>>> > >>>> On 10/14/2019 5:10 PM, Ferruh Yigit wrote: > >>>>> On 9/25/2019 10:36 AM, Igor Ryzhov wrote: > >>>>>> Current implementation doesn't allow us to update KNI carrier if the > >>>>>> interface is not yet UP in kernel. It means that we can't use it in > >> the > >>>>>> same thread which is processing rte_kni_ops.config_network_if, > >> which is > >>>>>> very convenient, because it allows us to have correct carrier status > >>>>>> of the interface right after we enabled it and we don't have to use > >> any > >>>>>> additional thread to track link status. > >>>>> > >>>>> Hi Igor, > >>>>> > >>>>> The existing thread tracks the link status of the physical device > >> and reflects > >>>>> the changes to the kni netdev, but the "struct rte_kni_ops" > >>>>> (rte_kni_ops.config_network_if) works other way around, it captures > >> (some) > >>>>> requests to kni netdev and reflects them to the underlying physical > >> device. > >>>>> Even 'rte_kni_update_link()' updated to use ioctl, the thread still > >> looks > >>>>> required and this patch doesn't really changes that part. > >>>>> > >>>>> Also I am reluctant to extend the KNI ioctl interface when there is > >> a generic > >>>>> way to do that work. > >>>>> > >>>>> What is the use case of updating kni netdev carrier status when the > >> interface is > >>>>> down? > >>>> > >>>> btw, if the problem is status of the interface being 'no-carrier' by > >> default, > >>>> this can be changed by "carrier=on" parameter of the kni kernel > module: > >>>> "insmod ./build/kmod/rte_kni.ko carrier=on" > >> > >
Re: [dpdk-dev] Experimental symbols in kni lib
Hi Ferruh, all, Let's please discuss another approach to setting KNI link status before making this API stable: http://patches.dpdk.org/project/dpdk/patch/20190925093623.18419-1-iryz...@nfware.com/ I explained the problem with the current implementation there. More than that, using ioctl approach makes it possible to set also speed and duplex and use them to implement get_link_ksettings callback. I can send patches for both features. Igor On Thu, Jun 24, 2021 at 4:54 PM Kinsella, Ray wrote: > Sounds more than reasonable, +1 from me. > > Ray K > > On 24/06/2021 14:24, Ferruh Yigit wrote: > > On 6/24/2021 11:42 AM, Kinsella, Ray wrote: > >> Hi Ferruh, > >> > >> The following kni experimental symbols are present in both v21.05 and > v19.11 release. These symbols should be considered for promotion to stable > as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= > 2yrs at this point. > >> > >> * rte_kni_update_link > >> > >> Ray K > >> > > > > Hi Ray, > > > > Thanks for follow up. > > > > I just checked the API and planning a small behavior update to it. > > If the update is accepted, I suggest keeping the API experimental for > 21.08 too, > > but can mature it on v21.11. > > > > Thanks, > > ferruh > > >
Re: [dpdk-dev] [dpdk-stable] [PATCH v5 3/3] kni: fix kernel deadlock when using mlx devices
Sorry I remembered the problem with the deadlock. We can't just make the shutdown command synchronous, because we can't release the rtnl_lock anyway. So regardless of the process mode (sync/async), we have to preserve the lock when processing the shutdown. It looks like two different settings... On Fri, Apr 23, 2021 at 3:43 PM Igor Ryzhov wrote: > Hi Ferruh, > > Thanks. I think it would be great to make this configurable, and maybe even > make shutdown synchronous by default to preserve the old behavior. > > I would be grateful if you could spend time on the work and I am ready to > review it. > > Igor > > On Fri, Apr 23, 2021 at 11:59 AM Ferruh Yigit > wrote: > >> On 4/23/2021 9:41 AM, Igor Ryzhov wrote: >> > This patch changes the behavior for KNI interface shutdown. >> > Previously we would receive a real response from the driver, now we >> > always receive success. >> > I think this should be reflected in the docs/release notes. >> > >> >> Hi Igor, >> >> Make sense, I can add it. >> >> Meanwhile do you think has a benefit to make shutdown behavior >> configurable? >> Async/Sync shutdown based on module param? >> >> > Igor >> > >> > On Wed, Apr 21, 2021 at 2:07 AM Thomas Monjalon > > <mailto:tho...@monjalon.net>> wrote: >> > >> > 12/04/2021 16:35, Elad Nachman: >> > > Hi, >> > > >> > > The new patch is fine by me. >> > > >> > > Tested several dozens restarts of our proprietary application >> without >> > > apparent problem. >> > >> > Series applied, thanks. >> > >> > >> >>
Re: [dpdk-dev] [dpdk-stable] [PATCH v5 3/3] kni: fix kernel deadlock when using mlx devices
Hi Ferruh, Thanks. I think it would be great to make this configurable, and maybe even make shutdown synchronous by default to preserve the old behavior. I would be grateful if you could spend time on the work and I am ready to review it. Igor On Fri, Apr 23, 2021 at 11:59 AM Ferruh Yigit wrote: > On 4/23/2021 9:41 AM, Igor Ryzhov wrote: > > This patch changes the behavior for KNI interface shutdown. > > Previously we would receive a real response from the driver, now we > > always receive success. > > I think this should be reflected in the docs/release notes. > > > > Hi Igor, > > Make sense, I can add it. > > Meanwhile do you think has a benefit to make shutdown behavior > configurable? > Async/Sync shutdown based on module param? > > > Igor > > > > On Wed, Apr 21, 2021 at 2:07 AM Thomas Monjalon > <mailto:tho...@monjalon.net>> wrote: > > > > 12/04/2021 16:35, Elad Nachman: > > > Hi, > > > > > > The new patch is fine by me. > > > > > > Tested several dozens restarts of our proprietary application > without > > > apparent problem. > > > > Series applied, thanks. > > > > > >
Re: [dpdk-dev] [dpdk-stable] [PATCH v5 3/3] kni: fix kernel deadlock when using mlx devices
This patch changes the behavior for KNI interface shutdown. Previously we would receive a real response from the driver, now we always receive success. I think this should be reflected in the docs/release notes. Igor On Wed, Apr 21, 2021 at 2:07 AM Thomas Monjalon wrote: > 12/04/2021 16:35, Elad Nachman: > > Hi, > > > > The new patch is fine by me. > > > > Tested several dozens restarts of our proprietary application without > > apparent problem. > > Series applied, thanks. > > >
Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v4
Stephen, No, I don't have a better proposal, but I think it is not correct to change the behavior of KNI (making link down without a real response). Even though we know that communicating with userspace under rtnl_lock is a bad idea, it works as it is for many years already. Elad, I agree with you that KNI should be removed from the main tree if it is not possible to fix this __dev_close_many issue. There were discussions about this multiple times already, but no one is working on this AFAIK. Last time the discussion was a month ago: https://www.mail-archive.com/dev@dpdk.org/msg196033.html Igor On Fri, Feb 26, 2021 at 8:43 PM Elad Nachman wrote: > The way the kernel handles its locks and lists for the dev close many > path, there is no way you can go around this with rtnl unlocked : > " > > There is a race condition in __dev_close_many() processing the > close_list while the application terminates. > It looks like if two vEth devices are terminating, > and one releases the rtnl lock, the other takes it, > updating the close_list in an unstable state, > causing the close_list to become a circular linked list, > hence list_for_each_entry() will endlessly loop inside > __dev_close_many() . > > " > And I don't expect David Miller will bend the kernel networking for DPDK > or KNI. > > But - Stephen - if you can personally convince David to accept a > kernel patch which will separate the close_list locking mechanism to a > separate (RCU?) lock, then I can introduce first a patch to the kernel > which will add a lock for the close_list, this way rtnl can be > unlocked for the if down case. > > After that kernel patch, your original patch + relocation of the sync > mutex locking will do the job . > > Otherwise, rtnl has to be kept locked all of the way for the if down > case in order to prevent corruption causing a circular linked list out > of the close_list, causing a hang in the kernel. > > Currently, the rtnl lock is the only thing keeping the close_list from > corruption. > > If you doubt rtnl cannot be unlocked for dev close path, you can > consult David for his opinion, as I think it is critical to understand > what the kernel can or cannot do, or expects to be done before we can > unlock its locks as we wish inside rte_kni.ko . > > Otherwise, if we are still in disagreement on how to patch this set of > problems, I think the responsible way around it is to completely > remove kni from the main dpdk tree and move it to dpdk-kmods > repository. > > I know BSD style open-source does not carry legal responsibility from > the developers, but I think when a bunch of developers know a piece of > code is highly buggy, they should not leave it for countless new users > to bounce their head desperately against, if they cannot agree on a > correct way to solve the bunch of problems, of which I think we all > agree exist (we just do not agree on the proper solution or patch)... > > That's my two cents, > > Elad. > > On Fri, Feb 26, 2021 at 5:49 PM Stephen Hemminger > wrote: > > > > On Fri, 26 Feb 2021 00:01:01 +0300 > > Igor Ryzhov wrote: > > > > > Hi Elad, > > > > > > Thanks for the patch, but this is still NACK from me. > > > > > > The only real advantage of KNI over other exceptional-path techniques > > > like virtio-user is the ability to configure DPDK-managed interfaces > > > directly > > > from the kernel using well-known utils like iproute2. A very important > part > > > of this is getting responses from the DPDK app and knowing the actual > > > result of command execution. > > > If you're making async requests to the application and you don't know > > > the result, then what's the point of using KNI at all? > > > > > > Igor > > > > Do you have a better proposal that keeps the request result but does not > > call userspace with lock held. > > > > PS: I also have strong dislike of KNI, as designed it would have been > rejected > > by Linux kernel developers. A better solution would be userspace > version of > > something like devlink devices. But doing control operations by proxy is > > a locking nightmare. >
Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v4
Hi Elad, Thanks for the patch, but this is still NACK from me. The only real advantage of KNI over other exceptional-path techniques like virtio-user is the ability to configure DPDK-managed interfaces directly from the kernel using well-known utils like iproute2. A very important part of this is getting responses from the DPDK app and knowing the actual result of command execution. If you're making async requests to the application and you don't know the result, then what's the point of using KNI at all? Igor On Thu, Feb 25, 2021 at 5:32 PM Elad Nachman wrote: > This part of the series includes my fixes for the issues reported > by Ferruh and Igor (and Igor comments for v3 of the patch) > on top of part 1 of the patch series: > > A. KNI sync lock is being locked while rtnl is held. > If two threads are calling kni_net_process_request() , > then the first one will take the sync lock, release rtnl lock then sleep. > The second thread will try to lock sync lock while holding rtnl. > The first thread will wake, and try to lock rtnl, resulting in a deadlock. > The remedy is to release rtnl before locking the KNI sync lock. > Since in between nothing is accessing Linux network-wise, > no rtnl locking is needed. > > B. There is a race condition in __dev_close_many() processing the > close_list while the application terminates. > It looks like if two vEth devices are terminating, > and one releases the rtnl lock, the other takes it, > updating the close_list in an unstable state, > causing the close_list to become a circular linked list, > hence list_for_each_entry() will endlessly loop inside > __dev_close_many() . > Since the description for the original patch indicate the > original motivation was bringing the device up, > I have changed kni_net_process_request() to hold the rtnl mutex > in case of bringing the device down since this is the path called > from __dev_close_many() , causing the corruption of the close_list. > In order to prevent deadlock in Mellanox device in this case, the > code has been modified not to wait for user-space while holding > the rtnl lock. > Instead, after the request has been sent, all locks are relinquished > and the function exits immediately with return code of zero (success). > > To summarize: > request != interface down : unlock rtnl, send request to user-space, > wait for response, send the response error code to caller in user-space. > > request == interface down: send request to user-space, return immediately > with error code of 0 (success) to user-space. > > Signed-off-by: Elad Nachman > > > --- > v4: > * for if down case, send asynchronously with rtnl locked and without > wait, returning immediately to avoid both kernel race conditions > and deadlock in user-space > v3: > * Include original patch and new patch as a series of patch, added a > comment to the new patch > v2: > * rebuild the patch as increment from patch 64106 > * fix comment and blank lines > --- > kernel/linux/kni/kni_net.c | 41 +++-- > lib/librte_kni/rte_kni.c| 7 -- > lib/librte_kni/rte_kni_common.h | 1 + > 3 files changed, 40 insertions(+), 9 deletions(-) > > diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c > index f0b6e9a8d..ba991802b 100644 > --- a/kernel/linux/kni/kni_net.c > +++ b/kernel/linux/kni/kni_net.c > @@ -110,12 +110,34 @@ kni_net_process_request(struct net_device *dev, > struct rte_kni_request *req) > void *resp_va; > uint32_t num; > int ret_val; > + int req_is_dev_stop = 0; > + > + /* For configuring the interface to down, > +* rtnl must be held all the way to prevent race condition > +* inside __dev_close_many() between two netdev instances of KNI > +*/ > + if (req->req_id == RTE_KNI_REQ_CFG_NETWORK_IF && > + req->if_up == 0) > + req_is_dev_stop = 1; > > ASSERT_RTNL(); > > + /* Since we need to wait and RTNL mutex is held > +* drop the mutex and hold reference to keep device > +*/ > + if (!req_is_dev_stop) { > + dev_hold(dev); > + rtnl_unlock(); > + } > + > mutex_lock(&kni->sync_lock); > > - /* Construct data */ > + /* Construct data, for dev stop send asynchronously > +* so instruct user-space not to sent response as no > +* one will be waiting for it. > +*/ > + if (req_is_dev_stop) > + req->skip_post_resp_to_q = 1; > memcpy(kni->sync_kva, req, sizeof(struct rte_kni_request)); > num = kni_fifo_put(kni->req_q, &kni->sync_va, 1); > if (num < 1) { > @@ -124,16 +146,16 @@ kni_net_process_request(struct net_device *dev, > struct rte_kni_request *req) > goto fail; > } > > - /* Since we need to wait and RTNL mutex is held > -* drop the mutex and hold refernce to keep device > + /* No result available since reques
Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v3
Elad, To make it work on Mellanox NIC, you need to find a way to send ALL requests to userspace without rtnl_lock held, including link down. As I understand, the race condition in __dev_close_many must be solved somehow. I can't provide remote access, but I am happy to test on Mellanox NICs, if you find a way to fix link down requests. On Wed, Feb 24, 2021 at 7:11 PM Elad Nachman wrote: > Sorry, don't have Mellanox NIC currently. Will have one in 8-12 weeks. > Will be happy to test it remotely if anyone can provide remote HW or > VM (Azure, for example). > > Elad. > > On Wed, Feb 24, 2021 at 5:18 PM Igor Ryzhov wrote: > > > > Stephen's idea was to fix the deadlock when working with the bifurcated > driver. > > Your rework breaks this because you still send link down requests under > rtnl_lock. > > Did you test your patch with Mellanox devices? > > > > On Wed, Feb 24, 2021 at 5:56 PM Elad Nachman wrote: > >> > >> The deadlock scenarios are explained below: > >> > >> It is described in Stephen Hemminger's original patch: > >> > >> " > >> > >> This fixes a deadlock when using KNI with bifurcated drivers. > >> Bringing kni device up always times out when using Mellanox > >> devices. > >> > >> The kernel KNI driver sends message to userspace to complete > >> the request. For the case of bifurcated driver, this may involve > >> an additional request to kernel to change state. This request > >> would deadlock because KNI was holding the RTNL mutex. > >> > >> " > >> > >> And also in my patch: > >> > >> " > >> KNI sync lock is being locked while rtnl is held. > >> If two threads are calling kni_net_process_request() , > >> then the first one will take the sync lock, release rtnl lock then > sleep. > >> The second thread will try to lock sync lock while holding rtnl. > >> The first thread will wake, and try to lock rtnl, resulting in a > deadlock. > >> The remedy is to release rtnl before locking the KNI sync lock. > >> Since in between nothing is accessing Linux network-wise, > >> no rtnl locking is needed. > >> " > >> > >> FYI, > >> > >> Elad. > >> > >> On Wed, Feb 24, 2021 at 4:41 PM Igor Ryzhov wrote: > >> > > >> > Both link up and link down also work for me without this patch. > >> > So what's the point in merging it? > >> > > >> > Just to clarify - I am not against the idea of this patch. > >> > Talking to userspace under rtnl_lock is a bad idea. > >> > I just think that any patch should fix some specified problem. > >> > > >> > If this patch is trying to solve the overall "userspace request under > rtnl_lock" problem, > >> > then it doesn't solve it correctly, because we still send link down > requests under the lock. > >> > > >> > If this patch is trying to solve some other issue, for example, some > "KNI deadlocks" > >> > you're talking about, then you should explain what these deadlocks > are, how to reproduce > >> > them and why this patch solves the issue. > >> > > >> > On Wed, Feb 24, 2021 at 5:07 PM Elad Nachman > wrote: > >> >> > >> >> I tested both link up and link down many times without any problems > on > >> >> 100 restarts of the application. > >> >> > >> >> Having KNI deadlock frequently for real life applications is far > worst, IMHO. > >> >> > >> >> FYI > >> >> > >> >> Elad. > >> >> > >> >> On Wed, Feb 24, 2021 at 4:04 PM Igor Ryzhov > wrote: > >> >> > > >> >> > Elad, > >> >> > > >> >> > I understand your point. > >> >> > But the fact that this fix works for you doesn't mean that it will > work for all DPDK users. > >> >> > > >> >> > For example, I provided two simple commands: "ip link set up" and > "ip link set down". > >> >> > Your fix works for only one of them. For me, this is not a proper > fix. > >> >> > It may work for you because you don't disable interfaces, but it > will fail for users who do. > >> >> > > >> >> > On Wed, Feb 24, 2021 at 4:33 PM Elad Nachman > wrote: > >> >> >>
Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v3
Stephen's idea was to fix the deadlock when working with the bifurcated driver. Your rework breaks this because you still send link down requests under rtnl_lock. Did you test your patch with Mellanox devices? On Wed, Feb 24, 2021 at 5:56 PM Elad Nachman wrote: > The deadlock scenarios are explained below: > > It is described in Stephen Hemminger's original patch: > > " > > This fixes a deadlock when using KNI with bifurcated drivers. > Bringing kni device up always times out when using Mellanox > devices. > > The kernel KNI driver sends message to userspace to complete > the request. For the case of bifurcated driver, this may involve > an additional request to kernel to change state. This request > would deadlock because KNI was holding the RTNL mutex. > > " > > And also in my patch: > > " > KNI sync lock is being locked while rtnl is held. > If two threads are calling kni_net_process_request() , > then the first one will take the sync lock, release rtnl lock then sleep. > The second thread will try to lock sync lock while holding rtnl. > The first thread will wake, and try to lock rtnl, resulting in a deadlock. > The remedy is to release rtnl before locking the KNI sync lock. > Since in between nothing is accessing Linux network-wise, > no rtnl locking is needed. > " > > FYI, > > Elad. > > On Wed, Feb 24, 2021 at 4:41 PM Igor Ryzhov wrote: > > > > Both link up and link down also work for me without this patch. > > So what's the point in merging it? > > > > Just to clarify - I am not against the idea of this patch. > > Talking to userspace under rtnl_lock is a bad idea. > > I just think that any patch should fix some specified problem. > > > > If this patch is trying to solve the overall "userspace request under > rtnl_lock" problem, > > then it doesn't solve it correctly, because we still send link down > requests under the lock. > > > > If this patch is trying to solve some other issue, for example, some > "KNI deadlocks" > > you're talking about, then you should explain what these deadlocks are, > how to reproduce > > them and why this patch solves the issue. > > > > On Wed, Feb 24, 2021 at 5:07 PM Elad Nachman wrote: > >> > >> I tested both link up and link down many times without any problems on > >> 100 restarts of the application. > >> > >> Having KNI deadlock frequently for real life applications is far worst, > IMHO. > >> > >> FYI > >> > >> Elad. > >> > >> On Wed, Feb 24, 2021 at 4:04 PM Igor Ryzhov wrote: > >> > > >> > Elad, > >> > > >> > I understand your point. > >> > But the fact that this fix works for you doesn't mean that it will > work for all DPDK users. > >> > > >> > For example, I provided two simple commands: "ip link set up" and "ip > link set down". > >> > Your fix works for only one of them. For me, this is not a proper fix. > >> > It may work for you because you don't disable interfaces, but it will > fail for users who do. > >> > > >> > On Wed, Feb 24, 2021 at 4:33 PM Elad Nachman > wrote: > >> >> > >> >> Currently KNI has a lot of issues with deadlocks locking the code, > >> >> after this commit, they are gone, and the code runs properly without > >> >> crashing. > >> >> That was tested with over 100 restarts of the application, which > >> >> previously required a hard reset of the board. > >> >> > >> >> I think this benefit overweights the complication of the code. > >> >> > >> >> The function is called with rtnl locked because this is how the Linux > >> >> kernel is designed to work - it is not designed to work with deferral > >> >> to user-space mid-function. > >> >> > >> >> To fix all such requests you need to reach an agreement with Linux > >> >> netdev, which is unlikely. > >> >> > >> >> Calling user-space can be done asynchronously, as Ferruh asked, but > >> >> then you will always have to return success, even on failure, as > Linux > >> >> kernel does not have a mechanism to asynchronously report on failure > >> >> for such system calls. > >> >> > >> >> IMHO - weighting the non-reporting of failure versus how the code > >> >> looks (as it functions perfectly OK), I decided to
Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v3
Both link up and link down also work for me without this patch. So what's the point in merging it? Just to clarify - I am not against the idea of this patch. Talking to userspace under rtnl_lock is a bad idea. I just think that any patch should fix some specified problem. If this patch is trying to solve the overall "userspace request under rtnl_lock" problem, then it doesn't solve it correctly, because we still send link down requests under the lock. If this patch is trying to solve some other issue, for example, some "KNI deadlocks" you're talking about, then you should explain what these deadlocks are, how to reproduce them and why this patch solves the issue. On Wed, Feb 24, 2021 at 5:07 PM Elad Nachman wrote: > I tested both link up and link down many times without any problems on > 100 restarts of the application. > > Having KNI deadlock frequently for real life applications is far worst, > IMHO. > > FYI > > Elad. > > On Wed, Feb 24, 2021 at 4:04 PM Igor Ryzhov wrote: > > > > Elad, > > > > I understand your point. > > But the fact that this fix works for you doesn't mean that it will work > for all DPDK users. > > > > For example, I provided two simple commands: "ip link set up" and "ip > link set down". > > Your fix works for only one of them. For me, this is not a proper fix. > > It may work for you because you don't disable interfaces, but it will > fail for users who do. > > > > On Wed, Feb 24, 2021 at 4:33 PM Elad Nachman wrote: > >> > >> Currently KNI has a lot of issues with deadlocks locking the code, > >> after this commit, they are gone, and the code runs properly without > >> crashing. > >> That was tested with over 100 restarts of the application, which > >> previously required a hard reset of the board. > >> > >> I think this benefit overweights the complication of the code. > >> > >> The function is called with rtnl locked because this is how the Linux > >> kernel is designed to work - it is not designed to work with deferral > >> to user-space mid-function. > >> > >> To fix all such requests you need to reach an agreement with Linux > >> netdev, which is unlikely. > >> > >> Calling user-space can be done asynchronously, as Ferruh asked, but > >> then you will always have to return success, even on failure, as Linux > >> kernel does not have a mechanism to asynchronously report on failure > >> for such system calls. > >> > >> IMHO - weighting the non-reporting of failure versus how the code > >> looks (as it functions perfectly OK), I decided to go with > >> functionality. > >> > >> FYI, > >> > >> Elad. > >> > >> On Wed, Feb 24, 2021 at 2:50 PM Igor Ryzhov wrote: > >> > > >> > This looks more like a hack than an actual fix to me. > >> > > >> > After this commit: > >> > "ip link set up" is sent to the userspace with unlocked rtnl_lock > >> > "ip link set down" is sent to the userspace with locked rtnl_lock > >> > > >> > How is this really fixing anything? IMHO it only complicates the code. > >> > If talking with userspace under rtnl_lock is a problem, then we > should fix all such requests, not only part of them. > >> > If it is not a problem, then I don't see any point in merging this. > >> > > >> > On Tue, Feb 23, 2021 at 4:45 PM Elad Nachman > wrote: > >> >> > >> >> This part of the series includes my fixes for the issues reported > >> >> by Ferruh and Igor on top of part 1 of the patch series: > >> >> > >> >> A. KNI sync lock is being locked while rtnl is held. > >> >> If two threads are calling kni_net_process_request() , > >> >> then the first one will take the sync lock, release rtnl lock then > sleep. > >> >> The second thread will try to lock sync lock while holding rtnl. > >> >> The first thread will wake, and try to lock rtnl, resulting in a > deadlock. > >> >> The remedy is to release rtnl before locking the KNI sync lock. > >> >> Since in between nothing is accessing Linux network-wise, > >> >> no rtnl locking is needed. > >> >> > >> >> B. There is a race condition in __dev_close_many() processing the > >> >> close_list while the application terminates. > >> >> It looks like if two vEth devices are terminating, &
Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v3
Elad, I understand your point. But the fact that this fix works for you doesn't mean that it will work for all DPDK users. For example, I provided two simple commands: "ip link set up" and "ip link set down". Your fix works for only one of them. For me, this is not a proper fix. It may work for you because you don't disable interfaces, but it will fail for users who do. On Wed, Feb 24, 2021 at 4:33 PM Elad Nachman wrote: > Currently KNI has a lot of issues with deadlocks locking the code, > after this commit, they are gone, and the code runs properly without > crashing. > That was tested with over 100 restarts of the application, which > previously required a hard reset of the board. > > I think this benefit overweights the complication of the code. > > The function is called with rtnl locked because this is how the Linux > kernel is designed to work - it is not designed to work with deferral > to user-space mid-function. > > To fix all such requests you need to reach an agreement with Linux > netdev, which is unlikely. > > Calling user-space can be done asynchronously, as Ferruh asked, but > then you will always have to return success, even on failure, as Linux > kernel does not have a mechanism to asynchronously report on failure > for such system calls. > > IMHO - weighting the non-reporting of failure versus how the code > looks (as it functions perfectly OK), I decided to go with > functionality. > > FYI, > > Elad. > > On Wed, Feb 24, 2021 at 2:50 PM Igor Ryzhov wrote: > > > > This looks more like a hack than an actual fix to me. > > > > After this commit: > > "ip link set up" is sent to the userspace with unlocked rtnl_lock > > "ip link set down" is sent to the userspace with locked rtnl_lock > > > > How is this really fixing anything? IMHO it only complicates the code. > > If talking with userspace under rtnl_lock is a problem, then we should > fix all such requests, not only part of them. > > If it is not a problem, then I don't see any point in merging this. > > > > On Tue, Feb 23, 2021 at 4:45 PM Elad Nachman wrote: > >> > >> This part of the series includes my fixes for the issues reported > >> by Ferruh and Igor on top of part 1 of the patch series: > >> > >> A. KNI sync lock is being locked while rtnl is held. > >> If two threads are calling kni_net_process_request() , > >> then the first one will take the sync lock, release rtnl lock then > sleep. > >> The second thread will try to lock sync lock while holding rtnl. > >> The first thread will wake, and try to lock rtnl, resulting in a > deadlock. > >> The remedy is to release rtnl before locking the KNI sync lock. > >> Since in between nothing is accessing Linux network-wise, > >> no rtnl locking is needed. > >> > >> B. There is a race condition in __dev_close_many() processing the > >> close_list while the application terminates. > >> It looks like if two vEth devices are terminating, > >> and one releases the rtnl lock, the other takes it, > >> updating the close_list in an unstable state, > >> causing the close_list to become a circular linked list, > >> hence list_for_each_entry() will endlessly loop inside > >> __dev_close_many() . > >> Since the description for the original patch indicate the > >> original motivation was bringing the device up, > >> I have changed kni_net_process_request() to hold the rtnl mutex > >> in case of bringing the device down since this is the path called > >> from __dev_close_many() , causing the corruption of the close_list. > >> > >> Signed-off-by: Elad Nachman > >> --- > >> v3: > >> * Include original patch and new patch as a series of patch, added a > >> comment to the new patch > >> v2: > >> * rebuild the patch as increment from patch 64106 > >> * fix comment and blank lines > >> --- > >> kernel/linux/kni/kni_net.c | 29 + > >> 1 file changed, 21 insertions(+), 8 deletions(-) > >> > >> diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c > >> index f0b6e9a8d..017e44812 100644 > >> --- a/kernel/linux/kni/kni_net.c > >> +++ b/kernel/linux/kni/kni_net.c > >> @@ -110,9 +110,26 @@ kni_net_process_request(struct net_device *dev, > struct rte_kni_request *req) > >> void *resp_va; > >> uint32_t num; > >> int ret_val; > >> + int req_is_dev_stop = 0; > &g
Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v3
This looks more like a hack than an actual fix to me. After this commit: "ip link set up" is sent to the userspace with unlocked rtnl_lock "ip link set down" is sent to the userspace with locked rtnl_lock How is this really fixing anything? IMHO it only complicates the code. If talking with userspace under rtnl_lock is a problem, then we should fix all such requests, not only part of them. If it is not a problem, then I don't see any point in merging this. On Tue, Feb 23, 2021 at 4:45 PM Elad Nachman wrote: > This part of the series includes my fixes for the issues reported > by Ferruh and Igor on top of part 1 of the patch series: > > A. KNI sync lock is being locked while rtnl is held. > If two threads are calling kni_net_process_request() , > then the first one will take the sync lock, release rtnl lock then sleep. > The second thread will try to lock sync lock while holding rtnl. > The first thread will wake, and try to lock rtnl, resulting in a deadlock. > The remedy is to release rtnl before locking the KNI sync lock. > Since in between nothing is accessing Linux network-wise, > no rtnl locking is needed. > > B. There is a race condition in __dev_close_many() processing the > close_list while the application terminates. > It looks like if two vEth devices are terminating, > and one releases the rtnl lock, the other takes it, > updating the close_list in an unstable state, > causing the close_list to become a circular linked list, > hence list_for_each_entry() will endlessly loop inside > __dev_close_many() . > Since the description for the original patch indicate the > original motivation was bringing the device up, > I have changed kni_net_process_request() to hold the rtnl mutex > in case of bringing the device down since this is the path called > from __dev_close_many() , causing the corruption of the close_list. > > Signed-off-by: Elad Nachman > --- > v3: > * Include original patch and new patch as a series of patch, added a > comment to the new patch > v2: > * rebuild the patch as increment from patch 64106 > * fix comment and blank lines > --- > kernel/linux/kni/kni_net.c | 29 + > 1 file changed, 21 insertions(+), 8 deletions(-) > > diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c > index f0b6e9a8d..017e44812 100644 > --- a/kernel/linux/kni/kni_net.c > +++ b/kernel/linux/kni/kni_net.c > @@ -110,9 +110,26 @@ kni_net_process_request(struct net_device *dev, > struct rte_kni_request *req) > void *resp_va; > uint32_t num; > int ret_val; > + int req_is_dev_stop = 0; > + > + /* For configuring the interface to down, > +* rtnl must be held all the way to prevent race condition > +* inside __dev_close_many() between two netdev instances of KNI > +*/ > + if (req->req_id == RTE_KNI_REQ_CFG_NETWORK_IF && > + req->if_up == 0) > + req_is_dev_stop = 1; > > ASSERT_RTNL(); > > + /* Since we need to wait and RTNL mutex is held > +* drop the mutex and hold reference to keep device > +*/ > + if (!req_is_dev_stop) { > + dev_hold(dev); > + rtnl_unlock(); > + } > + > mutex_lock(&kni->sync_lock); > > /* Construct data */ > @@ -124,16 +141,8 @@ kni_net_process_request(struct net_device *dev, > struct rte_kni_request *req) > goto fail; > } > > - /* Since we need to wait and RTNL mutex is held > -* drop the mutex and hold refernce to keep device > -*/ > - dev_hold(dev); > - rtnl_unlock(); > - > ret_val = wait_event_interruptible_timeout(kni->wq, > kni_fifo_count(kni->resp_q), 3 * HZ); > - rtnl_lock(); > - dev_put(dev); > > if (signal_pending(current) || ret_val <= 0) { > ret = -ETIME; > @@ -152,6 +161,10 @@ kni_net_process_request(struct net_device *dev, > struct rte_kni_request *req) > > fail: > mutex_unlock(&kni->sync_lock); > + if (!req_is_dev_stop) { > + rtnl_lock(); > + dev_put(dev); > + } > return ret; > } > > -- > 2.17.1 > >
Re: [dpdk-dev] KNI alternatives
On Wed, Jan 13, 2021 at 9:45 PM Thomas Monjalon wrote: > 13/01/2021 19:17, Igor Ryzhov: > > On Wed, Jan 13, 2021 at 8:10 PM Stephen Hemminger < > > step...@networkplumber.org> wrote: > > > On Wed, Jan 13, 2021, 9:06 AM Thomas Monjalon > wrote: > > > > > > > Hi, > > > > > > > > As discussed today in the techboard meeting, KNI has probably > > > > better alternatives today without using an out-of-tree module. > > > > Virtio-user is a good candidate to replace KNI. > > > > What is the performance of TAP? > > > > Is there a way to leverage af_packet, af_xdp, or even pcap > interfaces? > > > > > > Last time I tried. > > > Virtio user was as fast as KNI and consumed less cpu. Was seeing 10mpps > > > Tap was much slower. Like 1mpps. > > > > Performance is not the only question. The advantage of KNI we are > currently > > using is > > the ability to control the DPDK interfaces by the kernel. > > For example, to implement bonding in the DPDK application, it is possible > > to create KNI > > pair for each physical interface, create a bond interface in Linux over > > those KNI interfaces > > and just pass LACP packets between the app and the kernel. The kernel > > itself will control > > MACs, MTU, etc. of underlying interfaces. AFAIK it's not possible with > > virtio-user or tap. > > Am I wrong? > > I see at least 2 alternatives for bonding with kernel management: > - mlx5 bonding > - af_xdp interface for most of NICs > Bonding is just an example. My point is that KNI allows changing MAC, MTU, rx mode and admin status of a NIC used by DPDK application - all using standard Linux utilities without any need for additional APIs provided by the app. Yes, Mellanox NICs also allow this but it's just one of many drivers supported by DPDK. I'll dig into af_xdp, thanks.
Re: [dpdk-dev] KNI alternatives
On Wed, Jan 13, 2021 at 8:10 PM Stephen Hemminger < step...@networkplumber.org> wrote: > Last time I tried. > Virtio user was as fast as KNI and consumed less cpu. Was seeing 10mpps > Tap was much slower. Like 1mpps. > > Vpp uses virtio user. > > Sorry for top post. Only have phone internet > > On Wed, Jan 13, 2021, 9:06 AM Thomas Monjalon wrote: > > > Hi, > > > > As discussed today in the techboard meeting, KNI has probably > > better alternatives today without using an out-of-tree module. > > Virtio-user is a good candidate to replace KNI. > > What is the performance of TAP? > > Is there a way to leverage af_packet, af_xdp, or even pcap interfaces? Performance is not the only question. The advantage of KNI we are currently using is the ability to control the DPDK interfaces by the kernel. For example, to implement bonding in the DPDK application, it is possible to create KNI pair for each physical interface, create a bond interface in Linux over those KNI interfaces and just pass LACP packets between the app and the kernel. The kernel itself will control MACs, MTU, etc. of underlying interfaces. AFAIK it's not possible with virtio-user or tap. Am I wrong? > > > > In order to avoid using the KNI out-of-tree module, > > we should make the librte_kni compatible with an other interface. > > The big plan is then to move the KNI module out of the main DPDK repo, > > as we did for igb_uio. > > > > The first step of this plan might be to document pros & cons > > of the KNI alternatives inside the KNI documentation: > > https://doc.dpdk.org/guides/prog_guide/kernel_nic_interface.html > > > > As a start, this doc should be better referenced: > > > > https://doc.dpdk.org/guides/howto/virtio_user_as_exceptional_path.html > > > > Note: I won't do this update myself, so feel free to step up! > > Thanks > > > > > > >
Re: [dpdk-dev] [PATCH] net/i40e: fix counters
This code is just ported from the Linux kernel where it is used for around 7 years, so I suppose it is pretty safe. But of course, take your time to test it, I am fine with getting this in the next LTS release. Igor On Tue, Nov 24, 2020 at 12:43 PM Zhang, Qi Z wrote: > > > > -Original Message- > > From: dev On Behalf Of Thomas Monjalon > > Sent: Tuesday, November 24, 2020 4:25 PM > > To: Igor Ryzhov ; dev ; Guo, Jia > > > > Cc: dpdk stable ; Xing, Beilei ; > Yigit, > > Ferruh > > Subject: Re: [dpdk-dev] [PATCH] net/i40e: fix counters > > > > I will follow the recommendation of Ferruh and i40e maintainers. > > It is risky but it can be applied just before the release. > > I will suggest not to merge this patch in this release cycle, we need time > to fully test it and it can always be captured in following LTS release if > no issue be found. > > Thanks > Qi > > > > > > 24/11/2020 04:34, Guo, Jia: > > > hi, igor ryzhov and Thomas > > > > > > Since this remain issue is report recently and we need to reproduce > > > the issue and evaluate the patch and guaranty no side affect for other > case, so > > I am not sure even I don't think it still have time window to hit 20.11. > But > > whatever we have begin to check your patch for now on. What do you think > so? > > > > > > > > > From: Igor Ryzhov > > > Sent: Friday, November 20, 2020 2:27 AM > > > To: dev > > > Cc: dpdk stable ; Xing, Beilei > > > ; Guo, Jia ; Thomas Monjalon > > > > > > Subject: Re: [PATCH] net/i40e: fix counters > > > > > > CC maintainers and Thomas. > > > > > > This fix should be 20.11. The issue is seen multiple times a day under > ~20G > > traffic with stats collection once per second. > > > > > > Igor > > > > > > On Tue, Nov 17, 2020 at 11:56 AM Igor Ryzhov > > mailto:iryz...@nfware.com>> wrote: > > > When low and high registers are read separately, this opens the door > > > to a race condition: > > > - low register is read > > > - NIC updates the registers > > > - high register is read > > > > > > Because of this, we may end up with an incorrect counter value. > > > Let's read the registers in one shot, as it is done in Linux kernel > > > since the introduction of the i40e driver. > > > > > > Fixes: 4861cde46116 ("i40e: new poll mode driver") > > > Cc: sta...@dpdk.org<mailto:sta...@dpdk.org> > > > Signed-off-by: Igor Ryzhov > > > mailto:iryz...@nfware.com>> > > > --- > > > drivers/net/i40e/base/i40e_osdep.h | 10 ++ > > > drivers/net/i40e/i40e_ethdev.c | 10 +++--- > > > 2 files changed, 17 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/net/i40e/base/i40e_osdep.h > > > b/drivers/net/i40e/base/i40e_osdep.h > > > index 64b15e1b6138..ebd687240006 100644 > > > --- a/drivers/net/i40e/base/i40e_osdep.h > > > +++ b/drivers/net/i40e/base/i40e_osdep.h > > > @@ -133,6 +133,14 @@ static inline uint32_t i40e_read_addr(volatile > void > > *addr) > > > return rte_le_to_cpu_32(I40E_PCI_REG(addr)); > > > } > > > > > > +#define I40E_PCI_REG64(reg)rte_read64(reg) > > > +#define I40E_PCI_REG64_ADDR(a, reg) \ > > > + ((volatile uint64_t *)((char *)(a)->hw_addr + (reg))) static > > > +inline uint64_t i40e_read64_addr(volatile void *addr) { > > > + return rte_le_to_cpu_64(I40E_PCI_REG64(addr)); > > > +} > > > + > > > #define I40E_PCI_REG_WRITE(reg, value) \ > > > rte_write32((rte_cpu_to_le_32(value)), reg) #define > > > I40E_PCI_REG_WRITE_RELAXED(reg, value) \ @@ -145,6 +153,8 @@ static > > > inline uint32_t i40e_read_addr(volatile void *addr) #define > > > I40E_WRITE_REG(hw, reg, value) \ > > > I40E_PCI_REG_WRITE(I40E_PCI_REG_ADDR((hw), (reg)), (value)) > > > > > > +#define I40E_READ_REG64(hw, reg) > > > +i40e_read64_addr(I40E_PCI_REG64_ADDR((hw), (reg))) > > > + > > > #define rd32(a, reg) i40e_read_addr(I40E_PCI_REG_ADDR((a), (reg))) > > > #define wr32(a, reg, value) \ > > > I40E_PCI_REG_WRITE(I40E_PCI_REG_ADDR((a), (reg)), (value)) > > > diff --git a/drivers/net/i40e/i40e_ethdev.c > > > b/drivers/net/i40e/i40e_ethdev.c index 74f4ac1f9d4e..53b1e9b9e067 > > > 100644 > > > --- a/drivers/net/i40e/i40e_ethdev.c > > > +++ b/drivers/net/i40e/i40e_ethdev.c > > > @@ -6451,9 +6451,13 @@ i40e_stat_update_48(struct i40e_hw *hw, { > > > uint64_t new_data; > > > > > > - new_data = (uint64_t)I40E_READ_REG(hw, loreg); > > > - new_data |= ((uint64_t)(I40E_READ_REG(hw, hireg) & > > > - I40E_16_BIT_MASK)) << I40E_32_BIT_WIDTH; > > > + if (hw->device_id == I40E_DEV_ID_QEMU) { > > > + new_data = (uint64_t)I40E_READ_REG(hw, loreg); > > > + new_data |= ((uint64_t)(I40E_READ_REG(hw, hireg) & > > > + I40E_16_BIT_MASK)) << > > I40E_32_BIT_WIDTH; > > > + } else { > > > + new_data = I40E_READ_REG64(hw, loreg); > > > + } > > > > > > if (!offset_loaded) > > > *offset = new_data; > > > -- > > > 2.29.2 > > > > > > > > > > > > >
Re: [dpdk-dev] [PATCH] net/i40e: fix counters
CC maintainers and Thomas. This fix should be 20.11. The issue is seen multiple times a day under ~20G traffic with stats collection once per second. Igor On Tue, Nov 17, 2020 at 11:56 AM Igor Ryzhov wrote: > When low and high registers are read separately, this opens the door to > a race condition: > - low register is read > - NIC updates the registers > - high register is read > > Because of this, we may end up with an incorrect counter value. > Let's read the registers in one shot, as it is done in Linux kernel > since the introduction of the i40e driver. > > Fixes: 4861cde46116 ("i40e: new poll mode driver") > Cc: sta...@dpdk.org > Signed-off-by: Igor Ryzhov > --- > drivers/net/i40e/base/i40e_osdep.h | 10 ++ > drivers/net/i40e/i40e_ethdev.c | 10 +++--- > 2 files changed, 17 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/i40e/base/i40e_osdep.h > b/drivers/net/i40e/base/i40e_osdep.h > index 64b15e1b6138..ebd687240006 100644 > --- a/drivers/net/i40e/base/i40e_osdep.h > +++ b/drivers/net/i40e/base/i40e_osdep.h > @@ -133,6 +133,14 @@ static inline uint32_t i40e_read_addr(volatile void > *addr) > return rte_le_to_cpu_32(I40E_PCI_REG(addr)); > } > > +#define I40E_PCI_REG64(reg)rte_read64(reg) > +#define I40E_PCI_REG64_ADDR(a, reg) \ > + ((volatile uint64_t *)((char *)(a)->hw_addr + (reg))) > +static inline uint64_t i40e_read64_addr(volatile void *addr) > +{ > + return rte_le_to_cpu_64(I40E_PCI_REG64(addr)); > +} > + > #define I40E_PCI_REG_WRITE(reg, value) \ > rte_write32((rte_cpu_to_le_32(value)), reg) > #define I40E_PCI_REG_WRITE_RELAXED(reg, value) \ > @@ -145,6 +153,8 @@ static inline uint32_t i40e_read_addr(volatile void > *addr) > #define I40E_WRITE_REG(hw, reg, value) \ > I40E_PCI_REG_WRITE(I40E_PCI_REG_ADDR((hw), (reg)), (value)) > > +#define I40E_READ_REG64(hw, reg) > i40e_read64_addr(I40E_PCI_REG64_ADDR((hw), (reg))) > + > #define rd32(a, reg) i40e_read_addr(I40E_PCI_REG_ADDR((a), (reg))) > #define wr32(a, reg, value) \ > I40E_PCI_REG_WRITE(I40E_PCI_REG_ADDR((a), (reg)), (value)) > diff --git a/drivers/net/i40e/i40e_ethdev.c > b/drivers/net/i40e/i40e_ethdev.c > index 74f4ac1f9d4e..53b1e9b9e067 100644 > --- a/drivers/net/i40e/i40e_ethdev.c > +++ b/drivers/net/i40e/i40e_ethdev.c > @@ -6451,9 +6451,13 @@ i40e_stat_update_48(struct i40e_hw *hw, > { > uint64_t new_data; > > - new_data = (uint64_t)I40E_READ_REG(hw, loreg); > - new_data |= ((uint64_t)(I40E_READ_REG(hw, hireg) & > - I40E_16_BIT_MASK)) << I40E_32_BIT_WIDTH; > + if (hw->device_id == I40E_DEV_ID_QEMU) { > + new_data = (uint64_t)I40E_READ_REG(hw, loreg); > + new_data |= ((uint64_t)(I40E_READ_REG(hw, hireg) & > + I40E_16_BIT_MASK)) << I40E_32_BIT_WIDTH; > + } else { > + new_data = I40E_READ_REG64(hw, loreg); > + } > > if (!offset_loaded) > *offset = new_data; > -- > 2.29.2 > >
[dpdk-dev] [PATCH] net/i40e: fix counters
When low and high registers are read separately, this opens the door to a race condition: - low register is read - NIC updates the registers - high register is read Because of this, we may end up with an incorrect counter value. Let's read the registers in one shot, as it is done in Linux kernel since the introduction of the i40e driver. Fixes: 4861cde46116 ("i40e: new poll mode driver") Cc: sta...@dpdk.org Signed-off-by: Igor Ryzhov --- drivers/net/i40e/base/i40e_osdep.h | 10 ++ drivers/net/i40e/i40e_ethdev.c | 10 +++--- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/net/i40e/base/i40e_osdep.h b/drivers/net/i40e/base/i40e_osdep.h index 64b15e1b6138..ebd687240006 100644 --- a/drivers/net/i40e/base/i40e_osdep.h +++ b/drivers/net/i40e/base/i40e_osdep.h @@ -133,6 +133,14 @@ static inline uint32_t i40e_read_addr(volatile void *addr) return rte_le_to_cpu_32(I40E_PCI_REG(addr)); } +#define I40E_PCI_REG64(reg)rte_read64(reg) +#define I40E_PCI_REG64_ADDR(a, reg) \ + ((volatile uint64_t *)((char *)(a)->hw_addr + (reg))) +static inline uint64_t i40e_read64_addr(volatile void *addr) +{ + return rte_le_to_cpu_64(I40E_PCI_REG64(addr)); +} + #define I40E_PCI_REG_WRITE(reg, value) \ rte_write32((rte_cpu_to_le_32(value)), reg) #define I40E_PCI_REG_WRITE_RELAXED(reg, value) \ @@ -145,6 +153,8 @@ static inline uint32_t i40e_read_addr(volatile void *addr) #define I40E_WRITE_REG(hw, reg, value) \ I40E_PCI_REG_WRITE(I40E_PCI_REG_ADDR((hw), (reg)), (value)) +#define I40E_READ_REG64(hw, reg) i40e_read64_addr(I40E_PCI_REG64_ADDR((hw), (reg))) + #define rd32(a, reg) i40e_read_addr(I40E_PCI_REG_ADDR((a), (reg))) #define wr32(a, reg, value) \ I40E_PCI_REG_WRITE(I40E_PCI_REG_ADDR((a), (reg)), (value)) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 74f4ac1f9d4e..53b1e9b9e067 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -6451,9 +6451,13 @@ i40e_stat_update_48(struct i40e_hw *hw, { uint64_t new_data; - new_data = (uint64_t)I40E_READ_REG(hw, loreg); - new_data |= ((uint64_t)(I40E_READ_REG(hw, hireg) & - I40E_16_BIT_MASK)) << I40E_32_BIT_WIDTH; + if (hw->device_id == I40E_DEV_ID_QEMU) { + new_data = (uint64_t)I40E_READ_REG(hw, loreg); + new_data |= ((uint64_t)(I40E_READ_REG(hw, hireg) & + I40E_16_BIT_MASK)) << I40E_32_BIT_WIDTH; + } else { + new_data = I40E_READ_REG64(hw, loreg); + } if (!offset_loaded) *offset = new_data; -- 2.29.2
Re: [dpdk-dev] [RFC v2 2/2] doc: announce queue stats moving to xstats
Done. https://bugs.dpdk.org/show_bug.cgi?id=558 On Thu, Oct 15, 2020 at 11:03 AM Thomas Monjalon wrote: > 15/10/2020 09:55, Igor Ryzhov: > > Hi Thomas, > > > > Thanks for the slides, I checked the latest code and multicast/broadcast > > counters mostly look standardized. But mlx5 adds the "port" word to all > > its xstats names. I suppose you may be the right person to ask - is there > > any specific reason for this? > > No specific reason, I consider it as a bug. > Please could you open a bugzilla ticket? > > > > On Wed, Oct 14, 2020 at 12:45 PM Thomas Monjalon > > wrote: > > > > > 14/10/2020 11:35, Igor Ryzhov: > > > > Hi Ferruh, > > > > > > > > As rte_eth_stats is going to be changed, is it possible to add new > > > counters > > > > there? > > > > For example, in/out multicast/broadcast packets. > > > > > > Good question. > > > In order to avoid redundancy, I prefer focusing on xstats > > > and plan deprecation of rte_eth_stats. > > > > > > The basic stats you are asking for should have standardized names > > > and reserved ids in xstats API. > > > Please see these slides: > > > > https://fast.dpdk.org/events/slides/DPDK-2019-09-Ethernet_Statistics.pdf > > > >
Re: [dpdk-dev] [RFC v2 2/2] doc: announce queue stats moving to xstats
Hi Thomas, Thanks for the slides, I checked the latest code and multicast/broadcast counters mostly look standardized. But mlx5 adds the "port" word to all its xstats names. I suppose you may be the right person to ask - is there any specific reason for this? On Wed, Oct 14, 2020 at 12:45 PM Thomas Monjalon wrote: > 14/10/2020 11:35, Igor Ryzhov: > > Hi Ferruh, > > > > As rte_eth_stats is going to be changed, is it possible to add new > counters > > there? > > For example, in/out multicast/broadcast packets. > > Good question. > In order to avoid redundancy, I prefer focusing on xstats > and plan deprecation of rte_eth_stats. > > The basic stats you are asking for should have standardized names > and reserved ids in xstats API. > Please see these slides: > https://fast.dpdk.org/events/slides/DPDK-2019-09-Ethernet_Statistics.pdf > > >
Re: [dpdk-dev] [RFC v2 2/2] doc: announce queue stats moving to xstats
Hi Ferruh, As rte_eth_stats is going to be changed, is it possible to add new counters there? For example, in/out multicast/broadcast packets. Igor On Wed, Oct 14, 2020 at 5:27 AM Ferruh Yigit wrote: > Queue stats will be removed from basic stats to xstats. > > It will be PMDs responsibility to fill queue stats based on number of > queues they have. > > Until all PMDs implement the xstats, a temporary > 'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS' device flag created. PMDs switched > to the xstats should clear this flag to bypass the ethdev layer autofill > for queue stats. > > Signed-off-by: Ferruh Yigit > --- > config/rte_config.h | 2 +- > doc/guides/rel_notes/deprecation.rst | 7 +++ > lib/librte_ethdev/rte_ethdev.h | 3 +++ > 3 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/config/rte_config.h b/config/rte_config.h > index 03d90d78bc..9ef3b75940 100644 > --- a/config/rte_config.h > +++ b/config/rte_config.h > @@ -55,7 +55,7 @@ > > /* ether defines */ > #define RTE_MAX_QUEUES_PER_PORT 1024 > -#define RTE_ETHDEV_QUEUE_STAT_CNTRS 16 > +#define RTE_ETHDEV_QUEUE_STAT_CNTRS 16 /* max 256 */ > #define RTE_ETHDEV_RXTX_CALLBACKS 1 > > /* cryptodev defines */ > diff --git a/doc/guides/rel_notes/deprecation.rst > b/doc/guides/rel_notes/deprecation.rst > index 584e720879..9143cfc529 100644 > --- a/doc/guides/rel_notes/deprecation.rst > +++ b/doc/guides/rel_notes/deprecation.rst > @@ -164,6 +164,13 @@ Deprecation Notices >following the IPv6 header, as proposed in RFC >https://mails.dpdk.org/archives/dev/2020-August/177257.html. > > +* ethdev: Queue specific stats fields will be removed from ``struct > rte_eth_stats``. > + Mentioned fields are: ``q_ipackets``, ``q_opackets``, ``q_ibytes``, > ``q_obytes``, > + ``q_errors``. > + Instead queue stats will be received via xstats API. Current method > support > + will be limited to maximum 256 queues. > + Also compile time flag ``RTE_ETHDEV_QUEUE_STAT_CNTRS`` will be removed. > + > * security: The API ``rte_security_session_create`` takes only single > mempool >for session and session private data. So the application need to create >mempool for twice the number of sessions needed and will also lead to > diff --git a/lib/librte_ethdev/rte_ethdev.h > b/lib/librte_ethdev/rte_ethdev.h > index bb7a2b4289..a2e811ca48 100644 > --- a/lib/librte_ethdev/rte_ethdev.h > +++ b/lib/librte_ethdev/rte_ethdev.h > @@ -253,6 +253,7 @@ struct rte_eth_stats { > uint64_t ierrors; /**< Total number of erroneous received > packets. */ > uint64_t oerrors; /**< Total number of failed transmitted > packets. */ > uint64_t rx_nombuf; /**< Total number of RX mbuf allocation > failures. */ > + /* Queue stats are limited to max 256 queues */ > uint64_t q_ipackets[RTE_ETHDEV_QUEUE_STAT_CNTRS]; > /**< Total number of queue RX packets. */ > uint64_t q_opackets[RTE_ETHDEV_QUEUE_STAT_CNTRS]; > @@ -2704,6 +2705,7 @@ int rte_eth_xstats_reset(uint16_t port_id); > * The per-queue packet statistics functionality number that the > transmit > * queue is to be assigned. > * The value must be in the range [0, RTE_ETHDEV_QUEUE_STAT_CNTRS - 1]. > + * Max RTE_ETHDEV_QUEUE_STAT_CNTRS being 256. > * @return > * Zero if successful. Non-zero otherwise. > */ > @@ -2724,6 +2726,7 @@ int rte_eth_dev_set_tx_queue_stats_mapping(uint16_t > port_id, > * The per-queue packet statistics functionality number that the receive > * queue is to be assigned. > * The value must be in the range [0, RTE_ETHDEV_QUEUE_STAT_CNTRS - 1]. > + * Max RTE_ETHDEV_QUEUE_STAT_CNTRS being 256. > * @return > * Zero if successful. Non-zero otherwise. > */ > -- > 2.26.2 > >
Re: [dpdk-dev] [dpdk-stable] [PATCH v2] net/i40e: fix incorrect byte counters
Hi, Your code will work only if stats are updated at least once between two overflows. So it's still up to the application to handle this properly. I think it should be mentioned in the docs. Igor On Fri, Sep 18, 2020 at 6:45 AM Jiang, JunyuX wrote: > Hi Ferruh, > > > -Original Message- > > From: Ferruh Yigit > > Sent: Wednesday, September 16, 2020 8:31 PM > > To: Jiang, JunyuX ; dev@dpdk.org > > Cc: Guo, Jia ; Xing, Beilei ; > > sta...@dpdk.org > > Subject: Re: [dpdk-stable] [PATCH v2] net/i40e: fix incorrect byte > counters > > > > On 9/16/2020 2:51 AM, Junyu Jiang wrote: > > > This patch fixed the issue that rx/tx bytes overflowed > > > > "Rx/Tx statistics counters overflowed"? > > > Yes, the rx_bytes and tx_bytes counter in X710 cards is 48-bit long, if > keep sending packets for a log time, the register will overflow. > > > > on 48 bit limitation by enlarging the limitation. > > > > > > Fixes: 4861cde46116 ("i40e: new poll mode driver") > > > Cc: sta...@dpdk.org > > > > > > Signed-off-by: Junyu Jiang > > > --- > > > drivers/net/i40e/i40e_ethdev.c | 47 > > ++ > > > drivers/net/i40e/i40e_ethdev.h | 9 +++ > > > 2 files changed, 56 insertions(+) > > > > > > diff --git a/drivers/net/i40e/i40e_ethdev.c > > > b/drivers/net/i40e/i40e_ethdev.c index 563f21d9d..4d4ea9861 100644 > > > --- a/drivers/net/i40e/i40e_ethdev.c > > > +++ b/drivers/net/i40e/i40e_ethdev.c > > > @@ -3073,6 +3073,13 @@ i40e_update_vsi_stats(struct i40e_vsi *vsi) > > > i40e_stat_update_48(hw, I40E_GLV_BPRCH(idx), > > I40E_GLV_BPRCL(idx), > > > vsi->offset_loaded, &oes->rx_broadcast, > > > &nes->rx_broadcast); > > > + /* enlarge the limitation when rx_bytes overflowed */ > > > + if (vsi->offset_loaded) { > > > + if (I40E_RXTX_BYTES_LOW(vsi->old_rx_bytes) > nes- > > >rx_bytes) > > > + nes->rx_bytes += (uint64_t)1 << I40E_48_BIT_WIDTH; > > > + nes->rx_bytes += I40E_RXTX_BYTES_HIGH(vsi- > > >old_rx_bytes); > > > + } > > > + vsi->old_rx_bytes = nes->rx_bytes; > > > > > > Can you please describe this logic? (indeed better to describe it in the > > commit log) > > > > 'nes->rx_bytes' is diff in the stats register since last read. > > 'old_rx_bytes' is the previous stats diff. > > > > Why/how "I40E_RXTX_BYTES_LOW(vsi->old_rx_bytes) > nes->rx_bytes" has > > a meaning? Isn't this very depends on the read frequency? > > > > I guess I am missing something but please help me understand. > > > This patch fixes the issue of rx/tx bytes counter register overflow: > The counter register in i40e is 48-bit long, when overflow, nes->rx_bytes > becomes less than old_rx_bytes, the correct value of nes->rx_bytes should > be plused 1 << 48. > Use I40E_RXTX_BYTES_HIGH() to remember the MSB, nes->rx_bytes plus the MSB > is the correct value, So that using uint64_t to enlarge the 48 bit > limitation of register . > > > Also can you please confirm the initial value of the > "vsi->offset_loaded" is > > correct. > > > offset_loaded will be true when get statistics of port and offset_loaded > will be false when reset or clear the statistics, > so if offset_loaded is false, shouldn't to calculate the value of > nes->rx_bytes, it will be 0. > > > <> > > > > > @@ -282,6 +282,9 @@ struct rte_flow { > > > #define I40E_ETH_OVERHEAD \ > > > (RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN + > > I40E_VLAN_TAG_SIZE * 2) > > > > > > +#define I40E_RXTX_BYTES_HIGH(bytes) ((bytes) & ~I40E_48_BIT_MASK) > > > +#define I40E_RXTX_BYTES_LOW(bytes) ((bytes) & I40E_48_BIT_MASK) > > > + > > > > HIGH/LOW is a little misleading, for 64Bits it sounds like it is getting > low 32 bits > > and high 32 bits, can you please clarify macro masks out > > 48/16 bits. > > > Yes, I will change the macro name in V3. > > > > > struct i40e_adapter; > > > struct rte_pci_driver; > > > > > > @@ -399,6 +402,8 @@ struct i40e_vsi { > > > uint8_t vlan_anti_spoof_on; /* The VLAN anti-spoofing enabled */ > > > uint8_t vlan_filter_on; /* The VLAN filter enabled */ > > > struct i40e_bw_info bw_info; /* VSI bandwidth information */ > > > + uint64_t old_rx_bytes; > > > + uint64_t old_tx_bytes; > > > > 'prev' seems better naming than 'old', what do you think renaming > > 'old_rx_bytes' -> 'prev_rx_bytes' (for all variables)? > Yes, it's better, I will fix it in V3. >
Re: [dpdk-dev] [PATCH] kni: fix kernel deadlock when using mlx devices
Hi, Didn't see this patch previously, but we came up with the same idea internally and also faced a hang during the application shutdown. We didn't dig deep, but it occurred in kni_release function. Igor On Mon, Jul 27, 2020 at 8:53 PM Stephen Hemminger < step...@networkplumber.org> wrote: > On Mon, 27 Jul 2020 18:33:08 +0100 > Ferruh Yigit wrote: > > > On 5/6/2020 1:14 AM, Stephen Hemminger wrote: > > > On Wed, 18 Mar 2020 16:17:57 +0100 > > > Thomas Monjalon wrote: > > > > > >> 17/01/2020 17:43, Ferruh Yigit: > > >>> On 12/22/2019 5:55 PM, Stephen Hemminger wrote: > > This fixes a deadlock when using KNI with bifurcated drivers. > > Bringing kni device up always times out when using Mellanox > > devices. > > > > The kernel KNI driver sends message to userspace to complete > > the request. For the case of bifurcated driver, this may involve > > an additional request to kernel to change state. This request > > would deadlock because KNI was holding the RTNL mutex. > > > > This was a bad design which goes back to the original code. > > A workaround is for KNI driver to drop RTNL while waiting. > > To prevent the device from disappearing while the operation > > is in progress, it needs to hold reference to network device > > while waiting. > > > > As an added benefit, an useless error check can also be removed. > > > > Fixes: 3fc5ca2f6352 ("kni: initial import") > > Cc: sta...@dpdk.org > > Signed-off-by: Stephen Hemminger > > --- > > >>> > > >>> This patch cause a hang on my server, not sure what exactly was the > problem but > > >>> kernel log was continuously printing "Cannot send to req_q". Will > dig more. > > >> > > >> Ferruh, did you have a chance to check what is hanging? > > >> Stephen, is there any news on your side? > > >> > > >> > > > > > > It did not hang when I tested it. The bug report is still open > > > > > > > Sorry for the delay, since I am working remotely I was worried about > loosing the > > connection to my server, finally I did create a virtual environment to > test again. > > > > I confirm the hang observed %100 when two different process updates the > kni > > interface, like two different process sets the mtu. Without this patch > this > > works fine. > > > > I understand the motivation of the patch, but with change there is a > possibility > > to hang the server, which we can't allow, need to find another way. Can > updating > > mlx interface wait KNI interface operation to complete? > > Still KNI driver is broken. Calling userspace with RTNL held is > fundamentally > broken design. If KNI were to be incorporated in upstream kernel, then the > netdev > developer would see this. > > What ever solution you think is best. > I will continue to recommend against anyone using KNI. >
Re: [dpdk-dev] [PATCH] net/i40e: fix getting eeprom information
CCing i40e maintainers. This is a trivial fix without which getting module EEPROM doesn't work at all. On Thu, Jul 2, 2020 at 3:37 PM Pavel Ivashchenko wrote: > Signed-off-by: Pavel Ivashchenko > --- > drivers/net/i40e/i40e_ethdev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/i40e/i40e_ethdev.c > b/drivers/net/i40e/i40e_ethdev.c > index 970a31cb2..5fdfadf7b 100644 > --- a/drivers/net/i40e/i40e_ethdev.c > +++ b/drivers/net/i40e/i40e_ethdev.c > @@ -12093,7 +12093,7 @@ static int i40e_get_module_eeprom(struct > rte_eth_dev *dev, > } > status = i40e_aq_get_phy_register(hw, > I40E_AQ_PHY_REG_ACCESS_EXTERNAL_MODULE, > - addr, offset, 1, &value, NULL); > + addr, 1, offset, &value, NULL); > if (status) > return -EIO; > data[i] = (uint8_t)value; > -- > 2.25.1 > >
Re: [dpdk-dev] [PATCH] kernel/linux: fix kernel dir for meson
We should at least install it into /lib/modules/kernel-version. For convenience, dpdk modules are installed into /lib/modules/kernel-version/extra/dpdk. In the cross-compilation case, you can use DEST_DIR to set some prefix. I don't really see the issue here. The description clearly says that headers must be in $kernel_dir/build which is usually a symlink to /usr/src/linux-headers-kernel-version. Just set kernel_dir correctly and there won't be compilation failure. Igor On Mon, Dec 2, 2019 at 11:43 AM Ye Xiaolong wrote: > Hi, Igor > > Thanks for the feedback. > > On 12/02, Igor Ryzhov wrote: > >Hi Xiaolong, > > > >Nack from me. It's just an incorrect revert of my fix. > >Kernel modules will be installed in wrong directory, just check > install_dir > > Is there any convention that we must install kernel modules to which > directory? > And what about for cross compilation case? > > Current issue is that if I specify kernel_dir as one local linux src dir > in meson_options.txt, > meson build would fail to compile kernel modules while setting > RTE_KERNELDIR='//build_dir/target-x86_64_glibc/linux-x86_64/linux-4.19.81/' > before make works fine. Is there any subtlety I have missed? > > Thanks, > Xiaolong > > >parameter in kni/meson.build and igb_uio/meson.build. > > > >Igor > > > >On Mon, Dec 2, 2019 at 9:18 AM Xiaolong Ye wrote: > > > >> kernel_dir option in meson build is equivalent to RTE_KERNELDIR in make > >> system, for cross-compilation case, users would specify it as local > >> kernel src dir like > >> > >> //target-arm_glibc/linux-arm/linux-4.19.81/ > >> > >> Current meson build would fail to compile kernel module if user specify > >> kernel_dir as above, this patch fixes this issue. > >> > >> Fixes: 317832f97c16 ("kernel/linux: fix modules install path") > >> Cc: sta...@dpdk.org > >> Cc: iryz...@nfware.com > >> > >> Signed-off-by: Xiaolong Ye > >> --- > >> kernel/linux/igb_uio/meson.build | 2 +- > >> kernel/linux/kni/meson.build | 2 +- > >> kernel/linux/meson.build | 4 ++-- > >> meson_options.txt| 2 +- > >> 4 files changed, 5 insertions(+), 5 deletions(-) > >> > >> diff --git a/kernel/linux/igb_uio/meson.build > >> b/kernel/linux/igb_uio/meson.build > >> index fac404f07..e66218dae 100644 > >> --- a/kernel/linux/igb_uio/meson.build > >> +++ b/kernel/linux/igb_uio/meson.build > >> @@ -8,7 +8,7 @@ mkfile = custom_target('igb_uio_makefile', > >> custom_target('igb_uio', > >> input: ['igb_uio.c', 'Kbuild'], > >> output: 'igb_uio.ko', > >> - command: ['make', '-C', kernel_dir + '/build', > >> + command: ['make', '-C', kernel_dir, > >> 'M=' + meson.current_build_dir(), > >> 'src=' + meson.current_source_dir(), > >> 'EXTRA_CFLAGS=-I' + meson.current_source_dir() + > >> diff --git a/kernel/linux/kni/meson.build b/kernel/linux/kni/meson.build > >> index 955eec949..9fce0c16e 100644 > >> --- a/kernel/linux/kni/meson.build > >> +++ b/kernel/linux/kni/meson.build > >> @@ -13,7 +13,7 @@ kni_sources = files( > >> custom_target('rte_kni', > >> input: kni_sources, > >> output: 'rte_kni.ko', > >> - command: ['make', '-j4', '-C', kernel_dir + '/build', > >> + command: ['make', '-j4', '-C', kernel_dir, > >> 'M=' + meson.current_build_dir(), > >> 'src=' + meson.current_source_dir(), > >> 'MODULE_CFLAGS=-include ' + meson.source_root() + > >> '/config/rte_config.h' + > >> diff --git a/kernel/linux/meson.build b/kernel/linux/meson.build > >> index 1796cc686..a37c95752 100644 > >> --- a/kernel/linux/meson.build > >> +++ b/kernel/linux/meson.build > >> @@ -13,11 +13,11 @@ kernel_dir = get_option('kernel_dir') > >> if kernel_dir == '' > >> # use default path for native builds > >> kernel_version = run_command('uname', '-r').stdout().strip() > >> - kernel_dir = '/lib/modules/' + kernel_version > >> + kernel_dir = '/lib/modules/
Re: [dpdk-dev] [PATCH] kernel/linux: fix kernel dir for meson
Hi Xiaolong, Nack from me. It's just an incorrect revert of my fix. Kernel modules will be installed in wrong directory, just check install_dir parameter in kni/meson.build and igb_uio/meson.build. Igor On Mon, Dec 2, 2019 at 9:18 AM Xiaolong Ye wrote: > kernel_dir option in meson build is equivalent to RTE_KERNELDIR in make > system, for cross-compilation case, users would specify it as local > kernel src dir like > > //target-arm_glibc/linux-arm/linux-4.19.81/ > > Current meson build would fail to compile kernel module if user specify > kernel_dir as above, this patch fixes this issue. > > Fixes: 317832f97c16 ("kernel/linux: fix modules install path") > Cc: sta...@dpdk.org > Cc: iryz...@nfware.com > > Signed-off-by: Xiaolong Ye > --- > kernel/linux/igb_uio/meson.build | 2 +- > kernel/linux/kni/meson.build | 2 +- > kernel/linux/meson.build | 4 ++-- > meson_options.txt| 2 +- > 4 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/kernel/linux/igb_uio/meson.build > b/kernel/linux/igb_uio/meson.build > index fac404f07..e66218dae 100644 > --- a/kernel/linux/igb_uio/meson.build > +++ b/kernel/linux/igb_uio/meson.build > @@ -8,7 +8,7 @@ mkfile = custom_target('igb_uio_makefile', > custom_target('igb_uio', > input: ['igb_uio.c', 'Kbuild'], > output: 'igb_uio.ko', > - command: ['make', '-C', kernel_dir + '/build', > + command: ['make', '-C', kernel_dir, > 'M=' + meson.current_build_dir(), > 'src=' + meson.current_source_dir(), > 'EXTRA_CFLAGS=-I' + meson.current_source_dir() + > diff --git a/kernel/linux/kni/meson.build b/kernel/linux/kni/meson.build > index 955eec949..9fce0c16e 100644 > --- a/kernel/linux/kni/meson.build > +++ b/kernel/linux/kni/meson.build > @@ -13,7 +13,7 @@ kni_sources = files( > custom_target('rte_kni', > input: kni_sources, > output: 'rte_kni.ko', > - command: ['make', '-j4', '-C', kernel_dir + '/build', > + command: ['make', '-j4', '-C', kernel_dir, > 'M=' + meson.current_build_dir(), > 'src=' + meson.current_source_dir(), > 'MODULE_CFLAGS=-include ' + meson.source_root() + > '/config/rte_config.h' + > diff --git a/kernel/linux/meson.build b/kernel/linux/meson.build > index 1796cc686..a37c95752 100644 > --- a/kernel/linux/meson.build > +++ b/kernel/linux/meson.build > @@ -13,11 +13,11 @@ kernel_dir = get_option('kernel_dir') > if kernel_dir == '' > # use default path for native builds > kernel_version = run_command('uname', '-r').stdout().strip() > - kernel_dir = '/lib/modules/' + kernel_version > + kernel_dir = '/lib/modules/' + kernel_version + '/build' > endif > > # test running make in kernel directory, using "make kernelversion" > -make_returncode = run_command('make', '-sC', kernel_dir + '/build', > +make_returncode = run_command('make', '-sC', kernel_dir, > 'kernelversion').returncode() > if make_returncode != 0 > warning('Cannot compile kernel modules as requested - are kernel > headers installed?') > diff --git a/meson_options.txt b/meson_options.txt > index bc369d06c..7eba3b720 100644 > --- a/meson_options.txt > +++ b/meson_options.txt > @@ -17,7 +17,7 @@ option('ibverbs_link', type: 'combo', choices : > ['shared', 'dlopen'], value: 'sh > option('include_subdir_arch', type: 'string', value: '', > description: 'subdirectory where to install arch-dependent > headers') > option('kernel_dir', type: 'string', value: '', > - description: 'Path to the kernel for building kernel modules. > Headers must be in $kernel_dir/build. Modules will be installed in > $DEST_DIR/$kernel_dir/extra/dpdk.') > + description: 'Path to the kernel for building kernel modules. > Modules will be installed in $DEST_DIR/$kernel_dir/extra/dpdk.') > option('lib_musdk_dir', type: 'string', value: '', > description: 'path to the MUSDK library installation directory') > option('machine', type: 'string', value: 'native', > -- > 2.17.1 > >
Re: [dpdk-dev] [PATCH] kni: increase kernel version requirement for VA
Hi Ferruh, There is a typo in "version incread to > 4.9.0 ...". incread > increased Igor On Wed, Nov 20, 2019 at 7:00 PM Ferruh Yigit wrote: > A build error reported related to the selected > 'get_user_pages_remote()' kernel API: > > .../kernel/linux/kni/kni_dev.h:113:8: > error: too few arguments to function ‘get_user_pages_remote’ > ret = get_user_pages_remote(tsk, tsk->mm, iova, 1 > ^ > > Currently there are three version of the 'get_user_pages_remote()' > supported, based on kernel version > < 4.9, = 4.9, > 4.9 > > These version based checks are not working fine with the distro kernels > which is the cause of reported build error. The error reported by the > kernel version 4.8, but it is using API defined in > 4.9. > > To be able to take control of this, and possible more, related build > error, increasing the minimum supported kernel version for iova=va with > KNI to kernel version 4.9. > > This leaves us with single version of the kernel API and more > manageable. > > Signed-off-by: Ferruh Yigit > --- > Cc: David Marchand > Cc: Vamsi Attunuru > Cc: Kiran Kumar K > Cc: Jerin Jacob > --- > doc/guides/prog_guide/kernel_nic_interface.rst | 2 +- > doc/guides/rel_notes/release_19_11.rst | 2 +- > kernel/linux/kni/compat.h | 16 +--- > kernel/linux/kni/kni_dev.h | 10 -- > lib/librte_eal/linux/eal/eal.c | 6 +++--- > lib/librte_kni/rte_kni.c | 2 +- > 6 files changed, 11 insertions(+), 27 deletions(-) > > diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst > b/doc/guides/prog_guide/kernel_nic_interface.rst > index c4479ffbf..848b00253 100644 > --- a/doc/guides/prog_guide/kernel_nic_interface.rst > +++ b/doc/guides/prog_guide/kernel_nic_interface.rst > @@ -305,7 +305,7 @@ IOVA = VA: Support > > KNI operates in IOVA_VA scheme when > > -- LINUX_VERSION_CODE >= KERNEL_VERSION(4, 6, 0) and > +- LINUX_VERSION_CODE > KERNEL_VERSION(4, 9, 0) and > - EAL option `iova-mode=va` is passed or bus IOVA scheme in the DPDK is > selected >as RTE_IOVA_VA. > > diff --git a/doc/guides/rel_notes/release_19_11.rst > b/doc/guides/rel_notes/release_19_11.rst > index 21be600ab..45b58190c 100644 > --- a/doc/guides/rel_notes/release_19_11.rst > +++ b/doc/guides/rel_notes/release_19_11.rst > @@ -299,7 +299,7 @@ New Features >* Added IOVA = VA support for KNI, KNI can operate in IOVA = VA mode > when > `iova-mode=va` EAL option is passed to the application or when bus > IOVA > scheme is selected as RTE_IOVA_VA. This mode only works on Linux > Kernel > -versions 4.6.0 and above. > +versions above 4.9.0. > >* Due to IOVA to KVA address translations, based on the KNI use case > there > can be a performance impact. For mitigation, forcing IOVA to PA via > EAL > diff --git a/kernel/linux/kni/compat.h b/kernel/linux/kni/compat.h > index 062b170ef..339dd25a7 100644 > --- a/kernel/linux/kni/compat.h > +++ b/kernel/linux/kni/compat.h > @@ -122,16 +122,10 @@ > #define HAVE_SIGNAL_FUNCTIONS_OWN_HEADER > #endif > > -#if KERNEL_VERSION(4, 6, 0) <= LINUX_VERSION_CODE > - > +/* > + * iova to kva mapping support can be provided since 4.6.0, but support > + * version incread to > 4.9.0 because of the updates in > get_user_pages_remote() > + */ > +#if KERNEL_VERSION(4, 9, 0) < LINUX_VERSION_CODE > #define HAVE_IOVA_TO_KVA_MAPPING_SUPPORT > - > -#if KERNEL_VERSION(4, 9, 0) > LINUX_VERSION_CODE > -#define GET_USER_PAGES_REMOTE_API_V1 > -#elif KERNEL_VERSION(4, 9, 0) == LINUX_VERSION_CODE > -#define GET_USER_PAGES_REMOTE_API_V2 > -#else > -#define GET_USER_PAGES_REMOTE_API_V3 > -#endif > - > #endif > diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h > index fb641b696..5e75c6371 100644 > --- a/kernel/linux/kni/kni_dev.h > +++ b/kernel/linux/kni/kni_dev.h > @@ -101,18 +101,8 @@ static inline phys_addr_t iova_to_phys(struct > task_struct *tsk, > offset = iova & (PAGE_SIZE - 1); > > /* Read one page struct info */ > -#ifdef GET_USER_PAGES_REMOTE_API_V3 > ret = get_user_pages_remote(tsk, tsk->mm, iova, 1, > FOLL_TOUCH, &page, NULL, NULL); > -#endif > -#ifdef GET_USER_PAGES_REMOTE_API_V2 > - ret = get_user_pages_remote(tsk, tsk->mm, iova, 1, > - FOLL_TOUCH, &page, NULL); > -#endif > -#ifdef GET_USER_PAGES_REMOTE_API_V1 > - ret = get_user_pages_remote(tsk, tsk->mm, iova, 1 > - 0, 0, &page, NULL); > -#endif > if (ret < 0) > return 0; > > diff --git a/lib/librte_eal/linux/eal/eal.c > b/lib/librte_eal/linux/eal/eal.c > index b5b71500c..5879e33e5 100644 > --- a/lib/librte_eal/linux/eal/eal.c > +++ b/lib/librte_eal/linux/eal/eal.c > @@ -1073,7 +1073,7 @@ rte_eal_init(int argc, char **argv) > */ > iova_mode = RTE_IOVA_VA; >
Re: [dpdk-dev] [PATCH] kni: reduce interface name size
Hi Michael, Isn't it better to set it to IFNAMSIZ instead of 16? Best regards, Igot On Fri, Nov 15, 2019 at 2:41 PM Michael Pfeiffer < michael.pfeif...@tu-ilmenau.de> wrote: > The name in rte_kni_device_info is passed to the kernel, which allows > interface names with at most 16 bytes (IFNAMSIZ). rte_kni_alloc with a > longer name currently trigger a kernel BUG in alloc_netdev_mqs in > net/core/dev.c. Reduce RTE_KNI_NAMESIZE to prevent this situation. > > Signed-off-by: Michael Pfeiffer > --- > lib/librte_eal/linux/eal/include/rte_kni_common.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/lib/librte_eal/linux/eal/include/rte_kni_common.h > b/lib/librte_eal/linux/eal/include/rte_kni_common.h > index 46f75a710..59339271b 100644 > --- a/lib/librte_eal/linux/eal/include/rte_kni_common.h > +++ b/lib/librte_eal/linux/eal/include/rte_kni_common.h > @@ -18,7 +18,7 @@ > /** > * KNI name is part of memzone name. > */ > -#define RTE_KNI_NAMESIZE 32 > +#define RTE_KNI_NAMESIZE 16 > > #define RTE_CACHE_LINE_MIN_SIZE 64 > > -- > 2.20.1 > >
Re: [dpdk-dev] [PATCH v3] kni: rework rte_kni_update_link using ioctl
Hi Ferruh, Dan, Sure, I remember last year discussion but now I see the problem in current implementation. Ferruh, here is an example: We have a thread in the application that processes KNI commands from the kernel. It receives config_network_if command to set interface up, calls rte_eth_dev_start, and here is the problem. We cannot call current rte_kni_update_link from here as the interface is not yet up in the kernel, as we didn't send a response for config_network_if yet. So we need to send a response first and only after that, we can use rte_kni_update_link. Actually, we don't even know the exact time between we send a response and the moment when the kernel receives it and the interface becomes up. We always have a dependency on the interface state in the kernel. With ioctl approach, we don't have such dependency - we can call rte_kni_update_link whenever we want, even when the interface is down in the kernel. As I explained, it's common when processing config_network_if to set interface up. Igor On Mon, Oct 14, 2019 at 11:56 PM Dan Gora wrote: > Here's another link to the thread where this was discussed last year.. > Igor was actually on this thread as well... > > https://mails.dpdk.org/archives/dev/2018-August/110383.html > > On Mon, Oct 14, 2019 at 4:01 PM Dan Gora wrote: > > > > My original patch to add this feature was basically the same thing as > > this: setting the link status via a KNI ioctl. That method was > > rejected after _much_ discussion and we eventually settled on the > > currently implementation. > > > > My original patch was here: Message-Id: < > 20180628225548.21885-1...@adax.com> > > > > If you search for KNI and d...@adax.com in the DPDK devel list you > > should be able to suss out the whole discussion that lead to the > > current implementation. > > > > thanks > > dan > > > > On Mon, Oct 14, 2019 at 1:17 PM Ferruh Yigit > wrote: > > > > > > On 10/14/2019 5:10 PM, Ferruh Yigit wrote: > > > > On 9/25/2019 10:36 AM, Igor Ryzhov wrote: > > > >> Current implementation doesn't allow us to update KNI carrier if the > > > >> interface is not yet UP in kernel. It means that we can't use it in > the > > > >> same thread which is processing rte_kni_ops.config_network_if, > which is > > > >> very convenient, because it allows us to have correct carrier status > > > >> of the interface right after we enabled it and we don't have to use > any > > > >> additional thread to track link status. > > > > > > > > Hi Igor, > > > > > > > > The existing thread tracks the link status of the physical device > and reflects > > > > the changes to the kni netdev, but the "struct rte_kni_ops" > > > > (rte_kni_ops.config_network_if) works other way around, it captures > (some) > > > > requests to kni netdev and reflects them to the underlying physical > device. > > > > Even 'rte_kni_update_link()' updated to use ioctl, the thread still > looks > > > > required and this patch doesn't really changes that part. > > > > > > > > Also I am reluctant to extend the KNI ioctl interface when there is > a generic > > > > way to do that work. > > > > > > > > What is the use case of updating kni netdev carrier status when the > interface is > > > > down? > > > > > > btw, if the problem is status of the interface being 'no-carrier' by > default, > > > this can be changed by "carrier=on" parameter of the kni kernel module: > > > "insmod ./build/kmod/rte_kni.ko carrier=on" >
Re: [dpdk-dev] [PATCH v2] kni: add ability to set min/max MTU
Hi David, Ferruh, Sorry I was on vacation. Thank you for dealing with this. Best regards, Igor On Sun, Oct 27, 2019 at 1:12 PM David Marchand wrote: > On Fri, Oct 25, 2019 at 8:31 PM Ferruh Yigit > wrote: > > > > From: Igor Ryzhov > > > > Starting with kernel version 4.10, there are new min/max MTU values in > > net_device structure, which are set to ETH_MIN_MTU and ETH_DATA_LEN by > > default. We should be able to change these values to allow MTU more than > > 1500 to be set on KNI. > > > > Signed-off-by: Igor Ryzhov > > Acked-by: Ferruh Yigit > > Applied, thanks. > > > -- > David Marchand > >
[dpdk-dev] [PATCH v3] kni: rework rte_kni_update_link using ioctl
Current implementation doesn't allow us to update KNI carrier if the interface is not yet UP in kernel. It means that we can't use it in the same thread which is processing rte_kni_ops.config_network_if, which is very convenient, because it allows us to have correct carrier status of the interface right after we enabled it and we don't have to use any additional thread to track link status. Signed-off-by: Igor Ryzhov --- v3: remove unused variables v2: fix checkpatch warnings kernel/linux/kni/compat.h | 4 -- kernel/linux/kni/kni_misc.c | 42 +++ kernel/linux/kni/kni_net.c| 15 --- .../linux/eal/include/rte_kni_common.h| 6 +++ lib/librte_kni/rte_kni.c | 33 +++ lib/librte_kni/rte_kni.h | 3 +- 6 files changed, 55 insertions(+), 48 deletions(-) diff --git a/kernel/linux/kni/compat.h b/kernel/linux/kni/compat.h index fe0ee55e7..e0a491bcd 100644 --- a/kernel/linux/kni/compat.h +++ b/kernel/linux/kni/compat.h @@ -61,10 +61,6 @@ #define kni_sock_map_fd(s) sock_map_fd(s, 0) #endif -#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0) -#define HAVE_CHANGE_CARRIER_CB -#endif - #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 14, 0) #define ether_addr_copy(dst, src) memcpy(dst, src, ETH_ALEN) #endif diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c index aeb275329..9bcba68c8 100644 --- a/kernel/linux/kni/kni_misc.c +++ b/kernel/linux/kni/kni_misc.c @@ -462,6 +462,45 @@ kni_ioctl_release(struct net *net, uint32_t ioctl_num, return ret; } +static int +kni_ioctl_link(struct net *net, uint32_t ioctl_num, + unsigned long ioctl_param) +{ + struct kni_net *knet = net_generic(net, kni_net_id); + int ret = -EINVAL; + struct kni_dev *dev, *n; + struct rte_kni_link_info link_info; + struct net_device *netdev; + + if (_IOC_SIZE(ioctl_num) > sizeof(link_info)) + return -EINVAL; + + if (copy_from_user(&link_info, (void *)ioctl_param, sizeof(link_info))) + return -EFAULT; + + if (strlen(link_info.name) == 0) + return -EINVAL; + + down_read(&knet->kni_list_lock); + list_for_each_entry_safe(dev, n, &knet->kni_list_head, list) { + if (strncmp(dev->name, link_info.name, RTE_KNI_NAMESIZE) != 0) + continue; + + netdev = dev->net_dev; + + if (link_info.linkup) + netif_carrier_on(netdev); + else + netif_carrier_off(netdev); + + ret = 0; + break; + } + up_read(&knet->kni_list_lock); + + return ret; +} + static int kni_ioctl(struct inode *inode, uint32_t ioctl_num, unsigned long ioctl_param) { @@ -483,6 +522,9 @@ kni_ioctl(struct inode *inode, uint32_t ioctl_num, unsigned long ioctl_param) case _IOC_NR(RTE_KNI_IOCTL_RELEASE): ret = kni_ioctl_release(net, ioctl_num, ioctl_param); break; + case _IOC_NR(RTE_KNI_IOCTL_LINK): + ret = kni_ioctl_link(net, ioctl_num, ioctl_param); + break; default: pr_debug("IOCTL default\n"); break; diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c index 7bd3a9f1e..cd852eea3 100644 --- a/kernel/linux/kni/kni_net.c +++ b/kernel/linux/kni/kni_net.c @@ -706,18 +706,6 @@ kni_net_set_mac(struct net_device *netdev, void *p) return (ret == 0 ? req.result : ret); } -#ifdef HAVE_CHANGE_CARRIER_CB -static int -kni_net_change_carrier(struct net_device *dev, bool new_carrier) -{ - if (new_carrier) - netif_carrier_on(dev); - else - netif_carrier_off(dev); - return 0; -} -#endif - static const struct header_ops kni_net_header_ops = { .create = kni_net_header, .parse = eth_header_parse, @@ -736,9 +724,6 @@ static const struct net_device_ops kni_net_netdev_ops = { .ndo_change_mtu = kni_net_change_mtu, .ndo_tx_timeout = kni_net_tx_timeout, .ndo_set_mac_address = kni_net_set_mac, -#ifdef HAVE_CHANGE_CARRIER_CB - .ndo_change_carrier = kni_net_change_carrier, -#endif }; static void kni_get_drvinfo(struct net_device *dev, diff --git a/lib/librte_eal/linux/eal/include/rte_kni_common.h b/lib/librte_eal/linux/eal/include/rte_kni_common.h index 70992d835..07a10dd93 100644 --- a/lib/librte_eal/linux/eal/include/rte_kni_common.h +++ b/lib/librte_eal/linux/eal/include/rte_kni_common.h @@ -125,10 +125,16 @@ struct rte_kni_device_info { uint8_t mac_addr[6]; }; +struct rte_kni_link_info { + char name[RTE_KNI_NAMESIZE]; + unsigned int linkup; +}; + #define KNI_DEVICE "kni" #define RTE_KNI_IOCTL_TEST_IOWR(0, 1, int) #define RTE_KNI_IOCTL_CRE
Re: [dpdk-dev] [PATCH v2] kni: rework rte_kni_update_link using ioctl
Sure, my bad, sorry. Will send v3. On Tue, Sep 24, 2019 at 11:37 PM Aaron Conole wrote: > Igor Ryzhov writes: > > > Current implementation doesn't allow us to update KNI carrier if the > > interface is not yet UP in kernel. It means that we can't use it in the > > same thread which is processing rte_kni_ops.config_network_if, which is > > very convenient, because it allows us to have correct carrier status > > of the interface right after we enabled it and we don't have to use any > > additional thread to track link status. > > > > Signed-off-by: Igor Ryzhov > > --- > > v2: fix checkpatch warnings > > > > kernel/linux/kni/compat.h | 4 -- > > kernel/linux/kni/kni_misc.c | 42 +++ > > kernel/linux/kni/kni_net.c| 15 --- > > .../linux/eal/include/rte_kni_common.h| 6 +++ > > lib/librte_kni/rte_kni.c | 28 +++-- > > lib/librte_kni/rte_kni.h | 3 +- > > 6 files changed, 55 insertions(+), 43 deletions(-) > > > > diff --git a/kernel/linux/kni/compat.h b/kernel/linux/kni/compat.h > > index fe0ee55e7..e0a491bcd 100644 > > --- a/kernel/linux/kni/compat.h > > +++ b/kernel/linux/kni/compat.h > > @@ -61,10 +61,6 @@ > > #define kni_sock_map_fd(s) sock_map_fd(s, 0) > > #endif > > > > -#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0) > > -#define HAVE_CHANGE_CARRIER_CB > > -#endif > > - > > #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 14, 0) > > #define ether_addr_copy(dst, src) memcpy(dst, src, ETH_ALEN) > > #endif > > diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c > > index aeb275329..9bcba68c8 100644 > > --- a/kernel/linux/kni/kni_misc.c > > +++ b/kernel/linux/kni/kni_misc.c > > @@ -462,6 +462,45 @@ kni_ioctl_release(struct net *net, uint32_t > ioctl_num, > > return ret; > > } > > > > +static int > > +kni_ioctl_link(struct net *net, uint32_t ioctl_num, > > + unsigned long ioctl_param) > > +{ > > + struct kni_net *knet = net_generic(net, kni_net_id); > > + int ret = -EINVAL; > > + struct kni_dev *dev, *n; > > + struct rte_kni_link_info link_info; > > + struct net_device *netdev; > > + > > + if (_IOC_SIZE(ioctl_num) > sizeof(link_info)) > > + return -EINVAL; > > + > > + if (copy_from_user(&link_info, (void *)ioctl_param, > sizeof(link_info))) > > + return -EFAULT; > > + > > + if (strlen(link_info.name) == 0) > > + return -EINVAL; > > + > > + down_read(&knet->kni_list_lock); > > + list_for_each_entry_safe(dev, n, &knet->kni_list_head, list) { > > + if (strncmp(dev->name, link_info.name, RTE_KNI_NAMESIZE) > != 0) > > + continue; > > + > > + netdev = dev->net_dev; > > + > > + if (link_info.linkup) > > + netif_carrier_on(netdev); > > + else > > + netif_carrier_off(netdev); > > + > > + ret = 0; > > + break; > > + } > > + up_read(&knet->kni_list_lock); > > + > > + return ret; > > +} > > + > > static int > > kni_ioctl(struct inode *inode, uint32_t ioctl_num, unsigned long > ioctl_param) > > { > > @@ -483,6 +522,9 @@ kni_ioctl(struct inode *inode, uint32_t ioctl_num, > unsigned long ioctl_param) > > case _IOC_NR(RTE_KNI_IOCTL_RELEASE): > > ret = kni_ioctl_release(net, ioctl_num, ioctl_param); > > break; > > + case _IOC_NR(RTE_KNI_IOCTL_LINK): > > + ret = kni_ioctl_link(net, ioctl_num, ioctl_param); > > + break; > > default: > > pr_debug("IOCTL default\n"); > > break; > > diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c > > index 7bd3a9f1e..cd852eea3 100644 > > --- a/kernel/linux/kni/kni_net.c > > +++ b/kernel/linux/kni/kni_net.c > > @@ -706,18 +706,6 @@ kni_net_set_mac(struct net_device *netdev, void *p) > > return (ret == 0 ? req.result : ret); > > } > > > > -#ifdef HAVE_CHANGE_CARRIER_CB > > -static int > > -kni_net_change_carrier(struct net_device *dev, bool new_carrier) > > -{ > > - if (new_carrier) > > - netif_carrier_on(dev); > > - else >
[dpdk-dev] [PATCH v2] kni: rework rte_kni_update_link using ioctl
Current implementation doesn't allow us to update KNI carrier if the interface is not yet UP in kernel. It means that we can't use it in the same thread which is processing rte_kni_ops.config_network_if, which is very convenient, because it allows us to have correct carrier status of the interface right after we enabled it and we don't have to use any additional thread to track link status. Signed-off-by: Igor Ryzhov --- v2: fix checkpatch warnings kernel/linux/kni/compat.h | 4 -- kernel/linux/kni/kni_misc.c | 42 +++ kernel/linux/kni/kni_net.c| 15 --- .../linux/eal/include/rte_kni_common.h| 6 +++ lib/librte_kni/rte_kni.c | 28 +++-- lib/librte_kni/rte_kni.h | 3 +- 6 files changed, 55 insertions(+), 43 deletions(-) diff --git a/kernel/linux/kni/compat.h b/kernel/linux/kni/compat.h index fe0ee55e7..e0a491bcd 100644 --- a/kernel/linux/kni/compat.h +++ b/kernel/linux/kni/compat.h @@ -61,10 +61,6 @@ #define kni_sock_map_fd(s) sock_map_fd(s, 0) #endif -#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0) -#define HAVE_CHANGE_CARRIER_CB -#endif - #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 14, 0) #define ether_addr_copy(dst, src) memcpy(dst, src, ETH_ALEN) #endif diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c index aeb275329..9bcba68c8 100644 --- a/kernel/linux/kni/kni_misc.c +++ b/kernel/linux/kni/kni_misc.c @@ -462,6 +462,45 @@ kni_ioctl_release(struct net *net, uint32_t ioctl_num, return ret; } +static int +kni_ioctl_link(struct net *net, uint32_t ioctl_num, + unsigned long ioctl_param) +{ + struct kni_net *knet = net_generic(net, kni_net_id); + int ret = -EINVAL; + struct kni_dev *dev, *n; + struct rte_kni_link_info link_info; + struct net_device *netdev; + + if (_IOC_SIZE(ioctl_num) > sizeof(link_info)) + return -EINVAL; + + if (copy_from_user(&link_info, (void *)ioctl_param, sizeof(link_info))) + return -EFAULT; + + if (strlen(link_info.name) == 0) + return -EINVAL; + + down_read(&knet->kni_list_lock); + list_for_each_entry_safe(dev, n, &knet->kni_list_head, list) { + if (strncmp(dev->name, link_info.name, RTE_KNI_NAMESIZE) != 0) + continue; + + netdev = dev->net_dev; + + if (link_info.linkup) + netif_carrier_on(netdev); + else + netif_carrier_off(netdev); + + ret = 0; + break; + } + up_read(&knet->kni_list_lock); + + return ret; +} + static int kni_ioctl(struct inode *inode, uint32_t ioctl_num, unsigned long ioctl_param) { @@ -483,6 +522,9 @@ kni_ioctl(struct inode *inode, uint32_t ioctl_num, unsigned long ioctl_param) case _IOC_NR(RTE_KNI_IOCTL_RELEASE): ret = kni_ioctl_release(net, ioctl_num, ioctl_param); break; + case _IOC_NR(RTE_KNI_IOCTL_LINK): + ret = kni_ioctl_link(net, ioctl_num, ioctl_param); + break; default: pr_debug("IOCTL default\n"); break; diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c index 7bd3a9f1e..cd852eea3 100644 --- a/kernel/linux/kni/kni_net.c +++ b/kernel/linux/kni/kni_net.c @@ -706,18 +706,6 @@ kni_net_set_mac(struct net_device *netdev, void *p) return (ret == 0 ? req.result : ret); } -#ifdef HAVE_CHANGE_CARRIER_CB -static int -kni_net_change_carrier(struct net_device *dev, bool new_carrier) -{ - if (new_carrier) - netif_carrier_on(dev); - else - netif_carrier_off(dev); - return 0; -} -#endif - static const struct header_ops kni_net_header_ops = { .create = kni_net_header, .parse = eth_header_parse, @@ -736,9 +724,6 @@ static const struct net_device_ops kni_net_netdev_ops = { .ndo_change_mtu = kni_net_change_mtu, .ndo_tx_timeout = kni_net_tx_timeout, .ndo_set_mac_address = kni_net_set_mac, -#ifdef HAVE_CHANGE_CARRIER_CB - .ndo_change_carrier = kni_net_change_carrier, -#endif }; static void kni_get_drvinfo(struct net_device *dev, diff --git a/lib/librte_eal/linux/eal/include/rte_kni_common.h b/lib/librte_eal/linux/eal/include/rte_kni_common.h index 70992d835..07a10dd93 100644 --- a/lib/librte_eal/linux/eal/include/rte_kni_common.h +++ b/lib/librte_eal/linux/eal/include/rte_kni_common.h @@ -125,10 +125,16 @@ struct rte_kni_device_info { uint8_t mac_addr[6]; }; +struct rte_kni_link_info { + char name[RTE_KNI_NAMESIZE]; + unsigned int linkup; +}; + #define KNI_DEVICE "kni" #define RTE_KNI_IOCTL_TEST_IOWR(0, 1, int) #define RTE_KNI_IOCTL_CREATE _IOWR(0, 2, st
[dpdk-dev] [PATCH] kni: rework rte_kni_update_link using ioctl
Current implementation doesn't allow us to update KNI carrier if the interface is not yet UP in kernel. It means that we can't use it in the same thread which is processing rte_kni_ops.config_network_if, which is very convenient, because it allows us to have correct carrier status of the interface right after we enabled it and we don't have to use any additional thread to track link status. Signed-off-by: Igor Ryzhov --- kernel/linux/kni/compat.h | 4 -- kernel/linux/kni/kni_misc.c | 47 +++ kernel/linux/kni/kni_net.c| 15 -- .../linux/eal/include/rte_kni_common.h| 6 +++ lib/librte_kni/rte_kni.c | 28 +++ lib/librte_kni/rte_kni.h | 3 +- 6 files changed, 60 insertions(+), 43 deletions(-) diff --git a/kernel/linux/kni/compat.h b/kernel/linux/kni/compat.h index fe0ee55e7..e0a491bcd 100644 --- a/kernel/linux/kni/compat.h +++ b/kernel/linux/kni/compat.h @@ -61,10 +61,6 @@ #define kni_sock_map_fd(s) sock_map_fd(s, 0) #endif -#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0) -#define HAVE_CHANGE_CARRIER_CB -#endif - #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 14, 0) #define ether_addr_copy(dst, src) memcpy(dst, src, ETH_ALEN) #endif diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c index aeb275329..fa87bbf41 100644 --- a/kernel/linux/kni/kni_misc.c +++ b/kernel/linux/kni/kni_misc.c @@ -462,6 +462,50 @@ kni_ioctl_release(struct net *net, uint32_t ioctl_num, return ret; } +static int +kni_ioctl_link(struct net *net, uint32_t ioctl_num, + unsigned long ioctl_param) +{ + struct kni_net *knet = net_generic(net, kni_net_id); + int ret = -EINVAL; + struct kni_dev *dev, *n; + struct rte_kni_link_info link_info; + struct net_device *netdev; + + if (_IOC_SIZE(ioctl_num) > sizeof(link_info)) + return -EINVAL; + + ret = copy_from_user(&link_info, (void *)ioctl_param, +sizeof(link_info)); + if (ret) { + pr_err("copy_from_user in kni_ioctl_link"); + return -EIO; + } + + if (strlen(link_info.name) == 0) + return ret; + + down_read(&knet->kni_list_lock); + list_for_each_entry_safe(dev, n, &knet->kni_list_head, list) { + if (strncmp(dev->name, link_info.name, RTE_KNI_NAMESIZE) != 0) + continue; + + netdev = dev->net_dev; + + if (link_info.linkup) { + netif_carrier_on(netdev); + } else { + netif_carrier_off(netdev); + } + + ret = 0; + break; + } + up_read(&knet->kni_list_lock); + + return ret; +} + static int kni_ioctl(struct inode *inode, uint32_t ioctl_num, unsigned long ioctl_param) { @@ -483,6 +527,9 @@ kni_ioctl(struct inode *inode, uint32_t ioctl_num, unsigned long ioctl_param) case _IOC_NR(RTE_KNI_IOCTL_RELEASE): ret = kni_ioctl_release(net, ioctl_num, ioctl_param); break; + case _IOC_NR(RTE_KNI_IOCTL_LINK): + ret = kni_ioctl_link(net, ioctl_num, ioctl_param); + break; default: pr_debug("IOCTL default\n"); break; diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c index 7bd3a9f1e..cd852eea3 100644 --- a/kernel/linux/kni/kni_net.c +++ b/kernel/linux/kni/kni_net.c @@ -706,18 +706,6 @@ kni_net_set_mac(struct net_device *netdev, void *p) return (ret == 0 ? req.result : ret); } -#ifdef HAVE_CHANGE_CARRIER_CB -static int -kni_net_change_carrier(struct net_device *dev, bool new_carrier) -{ - if (new_carrier) - netif_carrier_on(dev); - else - netif_carrier_off(dev); - return 0; -} -#endif - static const struct header_ops kni_net_header_ops = { .create = kni_net_header, .parse = eth_header_parse, @@ -736,9 +724,6 @@ static const struct net_device_ops kni_net_netdev_ops = { .ndo_change_mtu = kni_net_change_mtu, .ndo_tx_timeout = kni_net_tx_timeout, .ndo_set_mac_address = kni_net_set_mac, -#ifdef HAVE_CHANGE_CARRIER_CB - .ndo_change_carrier = kni_net_change_carrier, -#endif }; static void kni_get_drvinfo(struct net_device *dev, diff --git a/lib/librte_eal/linux/eal/include/rte_kni_common.h b/lib/librte_eal/linux/eal/include/rte_kni_common.h index 70992d835..07a10dd93 100644 --- a/lib/librte_eal/linux/eal/include/rte_kni_common.h +++ b/lib/librte_eal/linux/eal/include/rte_kni_common.h @@ -125,10 +125,16 @@ struct rte_kni_device_info { uint8_t mac_addr[6]; }; +struct rte_kni_link_info { + char name[RTE_KNI_NAMESIZE]; + unsigned int linkup; +}; + #define
[dpdk-dev] [PATCH] kni: add ability to set min/max MTU
Starting with kernel version 4.10, there are new min/max MTU values in net_device structure, which are set to ETH_MIN_MTU and ETH_DATA_LEN by default. We should be able to change these values to allow MTU more than 1500 to be set on KNI. Signed-off-by: Igor Ryzhov --- examples/kni/main.c | 3 +++ kernel/linux/kni/compat.h | 4 kernel/linux/kni/kni_misc.c | 8 lib/librte_eal/linux/eal/include/rte_kni_common.h | 2 ++ lib/librte_kni/rte_kni.c | 2 ++ lib/librte_kni/rte_kni.h | 2 ++ 6 files changed, 21 insertions(+) diff --git a/examples/kni/main.c b/examples/kni/main.c index 4710d7176..c22a7c18d 100644 --- a/examples/kni/main.c +++ b/examples/kni/main.c @@ -907,6 +907,9 @@ kni_alloc(uint16_t port_id) rte_eth_dev_get_mtu(port_id, &conf.mtu); + conf.min_mtu = dev_info.min_mtu; + conf.max_mtu = dev_info.max_mtu; + memset(&ops, 0, sizeof(ops)); ops.port_id = port_id; ops.change_mtu = kni_change_mtu; diff --git a/kernel/linux/kni/compat.h b/kernel/linux/kni/compat.h index 562d8bf94..fe0ee55e7 100644 --- a/kernel/linux/kni/compat.h +++ b/kernel/linux/kni/compat.h @@ -89,6 +89,10 @@ #define HAVE_TRANS_START_HELPER #endif +#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 10, 0) +#define HAVE_MIN_MAX_MTU +#endif + /* * KNI uses NET_NAME_UNKNOWN macro to select correct version of alloc_netdev() * For old kernels just backported the commit that enables the macro diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c index 2b75502a8..aeb275329 100644 --- a/kernel/linux/kni/kni_misc.c +++ b/kernel/linux/kni/kni_misc.c @@ -390,6 +390,14 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num, net_dev->max_mtu = net_dev->mtu; #endif +#ifdef HAVE_MIN_MAX_MTU + if (dev_info.min_mtu) + net_dev->min_mtu = dev_info.min_mtu; + + if (dev_info.max_mtu) + net_dev->max_mtu = dev_info.max_mtu; +#endif + ret = register_netdev(net_dev); if (ret) { pr_err("error %i registering device \"%s\"\n", diff --git a/lib/librte_eal/linux/eal/include/rte_kni_common.h b/lib/librte_eal/linux/eal/include/rte_kni_common.h index 37d9ee8f0..70992d835 100644 --- a/lib/librte_eal/linux/eal/include/rte_kni_common.h +++ b/lib/librte_eal/linux/eal/include/rte_kni_common.h @@ -120,6 +120,8 @@ struct rte_kni_device_info { /* mbuf size */ unsigned mbuf_size; unsigned int mtu; + unsigned int min_mtu; + unsigned int max_mtu; uint8_t mac_addr[6]; }; diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index 4b51fb4fe..521db27c4 100644 --- a/lib/librte_kni/rte_kni.c +++ b/lib/librte_kni/rte_kni.c @@ -252,6 +252,8 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool, dev_info.group_id = conf->group_id; dev_info.mbuf_size = conf->mbuf_size; dev_info.mtu = conf->mtu; + dev_info.min_mtu = conf->min_mtu; + dev_info.max_mtu = conf->max_mtu; memcpy(dev_info.mac_addr, conf->mac_addr, RTE_ETHER_ADDR_LEN); diff --git a/lib/librte_kni/rte_kni.h b/lib/librte_kni/rte_kni.h index 5699a6443..b22446fa7 100644 --- a/lib/librte_kni/rte_kni.h +++ b/lib/librte_kni/rte_kni.h @@ -70,6 +70,8 @@ struct rte_kni_conf { uint8_t force_bind : 1; /* Flag to bind kernel thread */ uint8_t mac_addr[RTE_ETHER_ADDR_LEN]; /* MAC address assigned to KNI */ uint16_t mtu; + uint16_t min_mtu; + uint16_t max_mtu; }; /** -- 2.23.0
Re: [dpdk-dev] [PATCH] config: set RTE_KNI_PREEMPT_DEFAULT in meson
CCing Bruce, maintainer of the meson build system. It's a simple patch to make meson behavior consistent with make behavior. CONFIG_RTE_KNI_PREEMPT_DEFAULT is set to "y" in config/common_base, so we need to set RTE_KNI_PREEMPT_DEFAULT to 1 in meson. Best regards, Igor On Mon, Sep 16, 2019 at 1:08 PM Igor Ryzhov wrote: > Same behavior as in make build system. > > Signed-off-by: Igor Ryzhov > --- > config/rte_config.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/config/rte_config.h b/config/rte_config.h > index 0bbbe274f..e6a35a170 100644 > --- a/config/rte_config.h > +++ b/config/rte_config.h > @@ -98,6 +98,9 @@ > #define RTE_SCHED_PORT_N_GRINDERS 8 > #undef RTE_SCHED_VECTOR > > +/* rte_kni defines */ > +#define RTE_KNI_PREEMPT_DEFAULT 1 > + > /** driver defines / > > /* QuickAssist device */ > -- > 2.23.0 > >
[dpdk-dev] [PATCH] config: set RTE_KNI_PREEMPT_DEFAULT in meson
Same behavior as in make build system. Signed-off-by: Igor Ryzhov --- config/rte_config.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/config/rte_config.h b/config/rte_config.h index 0bbbe274f..e6a35a170 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -98,6 +98,9 @@ #define RTE_SCHED_PORT_N_GRINDERS 8 #undef RTE_SCHED_VECTOR +/* rte_kni defines */ +#define RTE_KNI_PREEMPT_DEFAULT 1 + /** driver defines / /* QuickAssist device */ -- 2.23.0
Re: [dpdk-dev] [PATCH v9 2/5] kni: add IOVA=VA support in KNI lib
Hi, I believe iova_mode check should not be automatic and should be a part of rte_kni_conf. What if I want to use KNI just as a pure virtual device without any connection to a real PCI device in an application that works in VA mode? Best regards, Igor On Mon, Jul 29, 2019 at 3:14 PM wrote: > > From: Vamsi Attunuru > > Current KNI implementation only operates in IOVA=PA mode, patch adds > required functionality in KNI lib to support IOVA=VA mode. > > KNI kernel module requires device info to get iommu domain related > information for IOVA addr related translations. Patch defines device > related info in rte_kni_device_info structure and passes device info > to the kernel KNI module when IOVA=VA mode is enabled. > > Signed-off-by: Vamsi Attunuru > Signed-off-by: Kiran Kumar K > --- > lib/librte_eal/linux/eal/include/rte_kni_common.h | 8 ++ > lib/librte_kni/Makefile | 1 + > lib/librte_kni/meson.build| 1 + > lib/librte_kni/rte_kni.c | 30 > +++ > 4 files changed, 40 insertions(+) > > diff --git a/lib/librte_eal/linux/eal/include/rte_kni_common.h > b/lib/librte_eal/linux/eal/include/rte_kni_common.h > index 37d9ee8..4fd8a90 100644 > --- a/lib/librte_eal/linux/eal/include/rte_kni_common.h > +++ b/lib/librte_eal/linux/eal/include/rte_kni_common.h > @@ -111,6 +111,13 @@ struct rte_kni_device_info { > void * mbuf_va; > phys_addr_t mbuf_phys; > > + /* PCI info */ > + uint16_t vendor_id; /**< Vendor ID or PCI_ANY_ID. */ > + uint16_t device_id; /**< Device ID or PCI_ANY_ID. */ > + uint8_t bus; /**< Device bus */ > + uint8_t devid;/**< Device ID */ > + uint8_t function; /**< Device function. */ > + > uint16_t group_id;/**< Group ID */ > uint32_t core_id; /**< core ID to bind for kernel thread > */ > > @@ -121,6 +128,7 @@ struct rte_kni_device_info { > unsigned mbuf_size; > unsigned int mtu; > uint8_t mac_addr[6]; > + uint8_t iova_mode; > }; > > #define KNI_DEVICE "kni" > diff --git a/lib/librte_kni/Makefile b/lib/librte_kni/Makefile > index cbd6599..ab15d10 100644 > --- a/lib/librte_kni/Makefile > +++ b/lib/librte_kni/Makefile > @@ -7,6 +7,7 @@ include $(RTE_SDK)/mk/rte.vars.mk > LIB = librte_kni.a > > CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -fno-strict-aliasing > +CFLAGS += -I$(RTE_SDK)/drivers/bus/pci > LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev > > EXPORT_MAP := rte_kni_version.map > diff --git a/lib/librte_kni/meson.build b/lib/librte_kni/meson.build > index 41fa2e3..fd46f87 100644 > --- a/lib/librte_kni/meson.build > +++ b/lib/librte_kni/meson.build > @@ -9,3 +9,4 @@ version = 2 > sources = files('rte_kni.c') > headers = files('rte_kni.h') > deps += ['ethdev', 'pci'] > +includes += include_directories('../../drivers/bus/pci') > diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c > index 4b51fb4..2aaaeaa 100644 > --- a/lib/librte_kni/rte_kni.c > +++ b/lib/librte_kni/rte_kni.c > @@ -14,6 +14,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -199,6 +200,27 @@ kni_release_mz(struct rte_kni *kni) > rte_memzone_free(kni->m_sync_addr); > } > > +static void > +kni_dev_pci_addr_get(uint16_t port_id, struct rte_kni_device_info > *kni_dev_info) > +{ > + const struct rte_pci_device *pci_dev; > + struct rte_eth_dev_info dev_info; > + const struct rte_bus *bus = NULL; > + > + rte_eth_dev_info_get(port_id, &dev_info); > + > + if (dev_info.device) > + bus = rte_bus_find_by_device(dev_info.device); > + if (bus && !strcmp(bus->name, "pci")) { > + pci_dev = RTE_DEV_TO_PCI(dev_info.device); > + kni_dev_info->bus = pci_dev->addr.bus; > + kni_dev_info->devid = pci_dev->addr.devid; > + kni_dev_info->function = pci_dev->addr.function; > + kni_dev_info->vendor_id = pci_dev->id.vendor_id; > + kni_dev_info->device_id = pci_dev->id.device_id; > + } > +} > + > struct rte_kni * > rte_kni_alloc(struct rte_mempool *pktmbuf_pool, > const struct rte_kni_conf *conf, > @@ -247,6 +269,12 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool, > kni->ops.port_id = UINT16_MAX; > > memset(&dev_info, 0, sizeof(dev_info)); > + > + if (rte_eal_iova_mode() == RTE_IOVA_VA) { > + uint16_t port_id = conf->group_id; > + > + kni_dev_pci_addr_get(port_id, &dev_info); > + } > dev_info.core_id = conf->core_id; > dev_info.force_bind = conf->force_bind; > dev_info.group_id = conf->group_id; > @@ -300,6 +328,8 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool, > kni->group_id = conf->group_id; > kni->mbuf_size = conf->mbuf
Re: [dpdk-dev] [PATCH v5 9/9] kni: add minimal ethtool
Hi Stephen, As ethtool support is restored, I think your patch #7 should be updated to keep ethtool support in docs. Best regards, Igor On Thu, Jun 20, 2019 at 10:22 PM Stephen Hemminger wrote: > > Some applications use ethtool so add the minimum ethtool ops. > > Signed-off-by: Stephen Hemminger > --- > kernel/linux/kni/kni_dev.h | 2 ++ > kernel/linux/kni/kni_misc.c | 1 + > kernel/linux/kni/kni_net.c | 14 ++ > 3 files changed, 17 insertions(+) > > diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h > index ceba5f73c1d9..c1ca6789ce12 100644 > --- a/kernel/linux/kni/kni_dev.h > +++ b/kernel/linux/kni/kni_dev.h > @@ -11,6 +11,8 @@ > #endif > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > +#define KNI_VERSION"1.0" > + > #include "compat.h" > > #include > diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c > index be45f823408f..2b75502a8b0e 100644 > --- a/kernel/linux/kni/kni_misc.c > +++ b/kernel/linux/kni/kni_misc.c > @@ -21,6 +21,7 @@ > #include "compat.h" > #include "kni_dev.h" > > +MODULE_VERSION(KNI_VERSION); > MODULE_LICENSE("Dual BSD/GPL"); > MODULE_AUTHOR("Intel Corporation"); > MODULE_DESCRIPTION("Kernel Module for managing kni devices"); > diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c > index 320d51d7fc83..319ee2dcb19a 100644 > --- a/kernel/linux/kni/kni_net.c > +++ b/kernel/linux/kni/kni_net.c > @@ -13,6 +13,7 @@ > #include > #include > #include /* eth_type_trans */ > +#include > #include > #include > #include > @@ -725,6 +726,18 @@ static const struct net_device_ops kni_net_netdev_ops = { > #endif > }; > > +static void kni_get_drvinfo(struct net_device *dev, > + struct ethtool_drvinfo *info) > +{ > + strlcpy(info->version, KNI_VERSION, sizeof(info->version)); > + strlcpy(info->driver, "kni", sizeof(info->driver)); > +} > + > +static const struct ethtool_ops kni_net_ethtool_ops = { > + .get_drvinfo= kni_get_drvinfo, > + .get_link = ethtool_op_get_link, > +}; > + > void > kni_net_init(struct net_device *dev) > { > @@ -736,6 +749,7 @@ kni_net_init(struct net_device *dev) > ether_setup(dev); /* assign some of the fields */ > dev->netdev_ops = &kni_net_netdev_ops; > dev->header_ops = &kni_net_header_ops; > + dev->ethtool_ops = &kni_net_ethtool_ops; > dev->watchdog_timeo = WD_TIMEOUT; > } > > -- > 2.20.1 >
[dpdk-dev] [PATCH v2] kernel/linux: fix modules install path
Currently kernel modules are installed into /usr/src instead of /lib/modules when meson build system is used. This patch fixes that. Signed-off-by: Igor Ryzhov --- v2 - don't change kernel_dir to kernel_version kernel/linux/igb_uio/meson.build | 4 ++-- kernel/linux/kni/meson.build | 4 ++-- kernel/linux/meson.build | 4 ++-- meson_options.txt| 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/kernel/linux/igb_uio/meson.build b/kernel/linux/igb_uio/meson.build index f5a9d5ccf..fac404f07 100644 --- a/kernel/linux/igb_uio/meson.build +++ b/kernel/linux/igb_uio/meson.build @@ -8,7 +8,7 @@ mkfile = custom_target('igb_uio_makefile', custom_target('igb_uio', input: ['igb_uio.c', 'Kbuild'], output: 'igb_uio.ko', - command: ['make', '-C', kernel_dir, + command: ['make', '-C', kernel_dir + '/build', 'M=' + meson.current_build_dir(), 'src=' + meson.current_source_dir(), 'EXTRA_CFLAGS=-I' + meson.current_source_dir() + @@ -16,5 +16,5 @@ custom_target('igb_uio', 'modules'], depends: mkfile, install: true, - install_dir: kernel_dir + '/../extra/dpdk', + install_dir: kernel_dir + '/extra/dpdk', build_by_default: get_option('enable_kmods')) diff --git a/kernel/linux/kni/meson.build b/kernel/linux/kni/meson.build index a9f48b0e6..955eec949 100644 --- a/kernel/linux/kni/meson.build +++ b/kernel/linux/kni/meson.build @@ -13,7 +13,7 @@ kni_sources = files( custom_target('rte_kni', input: kni_sources, output: 'rte_kni.ko', - command: ['make', '-j4', '-C', kernel_dir, + command: ['make', '-j4', '-C', kernel_dir + '/build', 'M=' + meson.current_build_dir(), 'src=' + meson.current_source_dir(), 'MODULE_CFLAGS=-include ' + meson.source_root() + '/config/rte_config.h' + @@ -25,5 +25,5 @@ custom_target('rte_kni', depends: kni_mkfile, console: true, install: true, - install_dir: kernel_dir + '/../extra/dpdk', + install_dir: kernel_dir + '/extra/dpdk', build_by_default: get_option('enable_kmods')) diff --git a/kernel/linux/meson.build b/kernel/linux/meson.build index a37c95752..1796cc686 100644 --- a/kernel/linux/meson.build +++ b/kernel/linux/meson.build @@ -13,11 +13,11 @@ kernel_dir = get_option('kernel_dir') if kernel_dir == '' # use default path for native builds kernel_version = run_command('uname', '-r').stdout().strip() - kernel_dir = '/lib/modules/' + kernel_version + '/build' + kernel_dir = '/lib/modules/' + kernel_version endif # test running make in kernel directory, using "make kernelversion" -make_returncode = run_command('make', '-sC', kernel_dir, +make_returncode = run_command('make', '-sC', kernel_dir + '/build', 'kernelversion').returncode() if make_returncode != 0 warning('Cannot compile kernel modules as requested - are kernel headers installed?') diff --git a/meson_options.txt b/meson_options.txt index 16d9f92c6..5302b9c68 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -15,7 +15,7 @@ option('ibverbs_link', type: 'combo', choices : ['shared', 'dlopen'], value: 'sh option('include_subdir_arch', type: 'string', value: '', description: 'subdirectory where to install arch-dependent headers') option('kernel_dir', type: 'string', value: '', - description: 'path to the kernel for building kernel modules, they will be installed in $DEST_DIR/$kernel_dir/../extra/dpdk') + description: 'Path to the kernel for building kernel modules. Headers must be in $kernel_dir/build. Modules will be installed in $DEST_DIR/$kernel_dir/extra/dpdk.') option('lib_musdk_dir', type: 'string', value: '', description: 'path to the MUSDK library installation directory') option('machine', type: 'string', value: 'native', -- 2.21.0
Re: [dpdk-dev] [PATCH] kernel/linux: fix modules install path
Thanks for the clarification, I get it now. I will send v2 later today. On Mon, Jun 10, 2019 at 2:12 PM Bruce Richardson wrote: > On Mon, Jun 10, 2019 at 02:04:26PM +0300, Igor Ryzhov wrote: > >Bruce, > >From my understanding, kernel_dir is a directory with kernel headers > >needed > >for modules building. > > Right now, yes. I'd suggest that we change that to the actual kernel > modules directory, and we get both the build directory and the install > directory based off that. > > >When it's formed automatically, yes, it will be > >"/lib/modules/version/build" and we can get installation directory by > >stripping > >"/build". > > Well, I'd suggest if we query the value automatically we don't both adding > the build, and just add that later when building the modules, i.e. > kernel_dir should always be the the base directory without "build" on it. > > > But when it's set manually, it can be set to, for example, > >"/usr/src/linux-headers-version", and build will be successful, but we > >won't be > >able to strip "/build" as there is no "/build". > >Which path should be used for installation in cross-compile case, when > >the > >kernel_dir is set manually? > > The stripping "build" was just a suggestion to allow the value to be > specified either with or without the "build/" suffix and have things work. > For the paths specified in the cross-compile case, my thinking was that we > would: > * build using /build > * install to /extra/dpdk > > as with the non-cross-compile case. > > /Bruce > >
Re: [dpdk-dev] [PATCH] kernel/linux: fix modules install path
Bruce, >From my understanding, kernel_dir is a directory with kernel headers needed for modules building. When it's formed automatically, yes, it will be "/lib/modules/version/build" and we can get installation directory by stripping "/build". But when it's set manually, it can be set to, for example, "/usr/src/linux-headers-version", and build will be successful, but we won't be able to strip "/build" as there is no "/build". Which path should be used for installation in cross-compile case, when the kernel_dir is set manually? On Mon, Jun 10, 2019 at 12:37 PM Bruce Richardson < bruce.richard...@intel.com> wrote: > On Mon, Jun 10, 2019 at 11:25:52AM +0300, Igor Ryzhov wrote: > > Currently kernel modules are installed into /usr/src/ instead of > > /lib/modules when meson build system is used. This patch fixes that. > > > > Old build option "kernel_dir" is changed to "kernel_version". > > > > Signed-off-by: Igor Ryzhov > > --- > > kernel/linux/igb_uio/meson.build | 2 +- > > kernel/linux/kni/meson.build | 2 +- > > kernel/linux/meson.build | 16 +--- > > meson_options.txt| 4 ++-- > > 4 files changed, 13 insertions(+), 11 deletions(-) > > > > diff --git a/kernel/linux/igb_uio/meson.build > b/kernel/linux/igb_uio/meson.build > > index f5a9d5ccf..5093610e3 100644 > > --- a/kernel/linux/igb_uio/meson.build > > +++ b/kernel/linux/igb_uio/meson.build > > @@ -16,5 +16,5 @@ custom_target('igb_uio', > > 'modules'], > > depends: mkfile, > > install: true, > > - install_dir: kernel_dir + '/../extra/dpdk', > > + install_dir: kernel_install_dir + '/extra/dpdk', > > build_by_default: get_option('enable_kmods')) > > diff --git a/kernel/linux/kni/meson.build b/kernel/linux/kni/meson.build > > index a9f48b0e6..8a902d2ed 100644 > > --- a/kernel/linux/kni/meson.build > > +++ b/kernel/linux/kni/meson.build > > @@ -25,5 +25,5 @@ custom_target('rte_kni', > > depends: kni_mkfile, > > console: true, > > install: true, > > - install_dir: kernel_dir + '/../extra/dpdk', > > + install_dir: kernel_install_dir + '/extra/dpdk', > > build_by_default: get_option('enable_kmods')) > > diff --git a/kernel/linux/meson.build b/kernel/linux/meson.build > > index a37c95752..5a9303b33 100644 > > --- a/kernel/linux/meson.build > > +++ b/kernel/linux/meson.build > > @@ -3,19 +3,21 @@ > > > > subdirs = ['igb_uio', 'kni'] > > > > -# if we are cross-compiling we need kernel_dir specified > > -if get_option('kernel_dir') == '' and meson.is_cross_build() > > - warning('Need "kernel_dir" option for kmod compilation when > cross-compiling') > > +# if we are cross-compiling we need kernel_version specified > > +if get_option('kernel_version') == '' and meson.is_cross_build() > > + warning('Need "kernel_version" option for kmod compilation when > cross-compiling') > > subdir_done() > > endif > > Looking at the patch now, I'm not sure that this change from kernel_dir to > kernel_version is the right thing to do - since it almost certainly cause > issues for cross-compiling. The kernel modules almost certainly won't be in > the host's /lib/modules folder in cross-compile cases, so I think we need > to continue to specify a path to the kernel modules folder. > > To me the simplest option is that we should take the path to the kernels > module folder i.e. same as now, just without the "/build". A comment update > in the meson_options.txt file should be all that is needed to modify that > parameter, and we should be able to have the meson.build file automatically > strip "/build" off the path in order to enable backward compatibility. What > do you think? > > /Bruce > >
Re: [dpdk-dev] Incorrect install_dir for Linux kernel modules
Done – http://patches.dpdk.org/patch/54619/ On Fri, Jun 7, 2019 at 12:29 PM Bruce Richardson wrote: > On Thu, Jun 06, 2019 at 09:33:09PM +0300, Igor Ryzhov wrote: > > Hi everyone, > > > > I faced an issue today with install directory for rte_kni and igb_uio. > > Modules are installed to '/usr/src/...' instead of '/lib/modules/...'. > > > > Right now install_dir for both modules is set to: > > > > install_dir: kernel_dir + '/../extra/dpdk' > > > > where 'kernel_dir' is set manually or automatically to: > > > > kernel_version = run_command('uname', '-r').stdout().strip() > > kernel_dir = '/lib/modules/' + kernel_version + '/build' > > > > > > I believe the intention of using '/../extra/dpdk' was to come back to > > '/lib/modules/kernel_version/' and install the modules there, but as > > '/lib/modules/kernel_version/build' is a symlink, we actually get into > > the '/usr/src/linux-headers-kernel-version' and install modules there. > > > > I see two possible solutions here: > > 1. Add new 'kernel_install_dir' option. If it is not set, set it > > automatically: > > > > kernel_version = run_command('uname', '-r').stdout().strip() > > kernel_install_dir = '/lib/modules/' + kernel_version > > > > and set install_dir to: > > install_dir: kernel_install_dir + '/extra/dpdk' > > > > 2. Replace 'kernel_dir' option with new option 'kernel_version', and do > the > > following: > > > > if kernel_version == '' > > > > kernel_version = run_command('uname', '-r').stdout().strip() > > > > > > then set 'kernel_dir' and 'kernel_install_dir' accordingly: > > > > kernel_dir = '/lib/modules/' + kernel_version + '/build' > > > > kernel_install_dir = 'lib/modules/' + kernel_version > > > > and use it for building and installation: > > custom_target('rte_kni', > > > > ... > > > > command: ['make', '-j4', '-C', kernel_dir, > > > > ... > > > > install_dir: kernel_install_dir + '/extra/dpdk', > > > > ... > > > > > > I prefer the second one and already have a patch. > > Can send the patch if maintainers are good with the solution. > > > Thanks for highlighting this, and I agree with option 2 as a good solution. > Please send out the patch. > > /Bruce >
[dpdk-dev] [PATCH] kernel/linux: fix modules install path
Currently kernel modules are installed into /usr/src/ instead of /lib/modules when meson build system is used. This patch fixes that. Old build option "kernel_dir" is changed to "kernel_version". Signed-off-by: Igor Ryzhov --- kernel/linux/igb_uio/meson.build | 2 +- kernel/linux/kni/meson.build | 2 +- kernel/linux/meson.build | 16 +--- meson_options.txt| 4 ++-- 4 files changed, 13 insertions(+), 11 deletions(-) diff --git a/kernel/linux/igb_uio/meson.build b/kernel/linux/igb_uio/meson.build index f5a9d5ccf..5093610e3 100644 --- a/kernel/linux/igb_uio/meson.build +++ b/kernel/linux/igb_uio/meson.build @@ -16,5 +16,5 @@ custom_target('igb_uio', 'modules'], depends: mkfile, install: true, - install_dir: kernel_dir + '/../extra/dpdk', + install_dir: kernel_install_dir + '/extra/dpdk', build_by_default: get_option('enable_kmods')) diff --git a/kernel/linux/kni/meson.build b/kernel/linux/kni/meson.build index a9f48b0e6..8a902d2ed 100644 --- a/kernel/linux/kni/meson.build +++ b/kernel/linux/kni/meson.build @@ -25,5 +25,5 @@ custom_target('rte_kni', depends: kni_mkfile, console: true, install: true, - install_dir: kernel_dir + '/../extra/dpdk', + install_dir: kernel_install_dir + '/extra/dpdk', build_by_default: get_option('enable_kmods')) diff --git a/kernel/linux/meson.build b/kernel/linux/meson.build index a37c95752..5a9303b33 100644 --- a/kernel/linux/meson.build +++ b/kernel/linux/meson.build @@ -3,19 +3,21 @@ subdirs = ['igb_uio', 'kni'] -# if we are cross-compiling we need kernel_dir specified -if get_option('kernel_dir') == '' and meson.is_cross_build() - warning('Need "kernel_dir" option for kmod compilation when cross-compiling') +# if we are cross-compiling we need kernel_version specified +if get_option('kernel_version') == '' and meson.is_cross_build() + warning('Need "kernel_version" option for kmod compilation when cross-compiling') subdir_done() endif -kernel_dir = get_option('kernel_dir') -if kernel_dir == '' - # use default path for native builds +kernel_version = get_option('kernel_version') +if kernel_version == '' + # use default version for native builds kernel_version = run_command('uname', '-r').stdout().strip() - kernel_dir = '/lib/modules/' + kernel_version + '/build' endif +kernel_dir = '/lib/modules/' + kernel_version + '/build' +kernel_install_dir = '/lib/modules/' + kernel_version + # test running make in kernel directory, using "make kernelversion" make_returncode = run_command('make', '-sC', kernel_dir, 'kernelversion').returncode() diff --git a/meson_options.txt b/meson_options.txt index 16d9f92c6..5ca50d8dc 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -14,8 +14,8 @@ option('ibverbs_link', type: 'combo', choices : ['shared', 'dlopen'], value: 'sh description: 'Linkage method (shared/dlopen) for Mellanox PMDs with ibverbs dependencies.') option('include_subdir_arch', type: 'string', value: '', description: 'subdirectory where to install arch-dependent headers') -option('kernel_dir', type: 'string', value: '', - description: 'path to the kernel for building kernel modules, they will be installed in $DEST_DIR/$kernel_dir/../extra/dpdk') +option('kernel_version', type: 'string', value: '', + description: 'kernel version for building kernel modules') option('lib_musdk_dir', type: 'string', value: '', description: 'path to the MUSDK library installation directory') option('machine', type: 'string', value: 'native', -- 2.21.0
[dpdk-dev] Incorrect install_dir for Linux kernel modules
Hi everyone, I faced an issue today with install directory for rte_kni and igb_uio. Modules are installed to '/usr/src/...' instead of '/lib/modules/...'. Right now install_dir for both modules is set to: install_dir: kernel_dir + '/../extra/dpdk' where 'kernel_dir' is set manually or automatically to: kernel_version = run_command('uname', '-r').stdout().strip() kernel_dir = '/lib/modules/' + kernel_version + '/build' I believe the intention of using '/../extra/dpdk' was to come back to '/lib/modules/kernel_version/' and install the modules there, but as '/lib/modules/kernel_version/build' is a symlink, we actually get into the '/usr/src/linux-headers-kernel-version' and install modules there. I see two possible solutions here: 1. Add new 'kernel_install_dir' option. If it is not set, set it automatically: kernel_version = run_command('uname', '-r').stdout().strip() kernel_install_dir = '/lib/modules/' + kernel_version and set install_dir to: install_dir: kernel_install_dir + '/extra/dpdk' 2. Replace 'kernel_dir' option with new option 'kernel_version', and do the following: if kernel_version == '' kernel_version = run_command('uname', '-r').stdout().strip() then set 'kernel_dir' and 'kernel_install_dir' accordingly: kernel_dir = '/lib/modules/' + kernel_version + '/build' kernel_install_dir = 'lib/modules/' + kernel_version and use it for building and installation: custom_target('rte_kni', ... command: ['make', '-j4', '-C', kernel_dir, ... install_dir: kernel_install_dir + '/extra/dpdk', ... I prefer the second one and already have a patch. Can send the patch if maintainers are good with the solution. Best regards, Igor
[dpdk-dev] [PATCH] kni: remove PCI related information
As there is no ethtool support in KNI anymore, PCI related information is no longer needed. Signed-off-by: Igor Ryzhov --- kernel/linux/kni/kni_dev.h| 3 --- kernel/linux/kni/kni_misc.c | 6 -- lib/librte_eal/linux/eal/include/rte_kni_common.h | 7 --- lib/librte_kni/rte_kni.c | 4 4 files changed, 20 deletions(-) diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h index 44a5a61f7..d57bce647 100644 --- a/kernel/linux/kni/kni_dev.h +++ b/kernel/linux/kni/kni_dev.h @@ -50,9 +50,6 @@ struct kni_dev { wait_queue_head_t wq; struct mutex sync_lock; - /* PCI device id */ - uint16_t device_id; - /* kni device */ struct net_device *net_dev; diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c index af18c67c4..1fc5eeb9c 100644 --- a/kernel/linux/kni/kni_misc.c +++ b/kernel/linux/kni/kni_misc.c @@ -377,12 +377,6 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num, (unsigned long long) dev_info.resp_phys, kni->resp_q); pr_debug("mbuf_size:%u\n", kni->mbuf_size); - pr_debug("PCI: %02x:%02x.%02x %04x:%04x\n", - dev_info.bus, - dev_info.devid, - dev_info.function, - dev_info.vendor_id, - dev_info.device_id); /* if user has provided a valid mac address */ if (is_valid_ether_addr(dev_info.mac_addr)) memcpy(net_dev->dev_addr, dev_info.mac_addr, ETH_ALEN); diff --git a/lib/librte_eal/linux/eal/include/rte_kni_common.h b/lib/librte_eal/linux/eal/include/rte_kni_common.h index 5db5a1333..91a1c1408 100644 --- a/lib/librte_eal/linux/eal/include/rte_kni_common.h +++ b/lib/librte_eal/linux/eal/include/rte_kni_common.h @@ -111,13 +111,6 @@ struct rte_kni_device_info { void * mbuf_va; phys_addr_t mbuf_phys; - /* PCI info */ - uint16_t vendor_id; /**< Vendor ID or PCI_ANY_ID. */ - uint16_t device_id; /**< Device ID or PCI_ANY_ID. */ - uint8_t bus; /**< Device bus */ - uint8_t devid;/**< Device ID */ - uint8_t function; /**< Device function. */ - uint16_t group_id;/**< Group ID */ uint32_t core_id; /**< core ID to bind for kernel thread */ diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index a0f1e3767..e29d0cc7d 100644 --- a/lib/librte_kni/rte_kni.c +++ b/lib/librte_kni/rte_kni.c @@ -252,10 +252,6 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool, strlcpy(dev_info.name, conf->name, RTE_KNI_NAMESIZE); - RTE_LOG(INFO, KNI, "pci: %02x:%02x:%02x \t %02x:%02x\n", - dev_info.bus, dev_info.devid, dev_info.function, - dev_info.vendor_id, dev_info.device_id); - ret = kni_reserve_mz(kni); if (ret < 0) goto mz_fail; -- 2.21.0
Re: [dpdk-dev] [PATCH v2] kni: implement header_ops parse method
Hi Ferruh, To be absolutely sure, I performed a test using the test application. When I send pings from an interface: 3: ens8: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:c8:79:c6 brd ff:ff:ff:ff:ff:ff Here is what's in sockaddr_ll: $2 = {sll_family = 0x11, sll_protocol = 0x8, sll_ifindex = 0x2, sll_hatype = 0x1, sll_pkttype = 0x0, sll_halen = 0x6, sll_addr = { 0x52, 0x54, 0x0, 0xc8, 0x79, 0xc6, 0x0, 0x0}} So everything works as expected – the address in sll_addr is correct. Last two bytes are zero because the length of sll_addr is 8, however, Ethernet length is 6. Igor On Fri, Apr 12, 2019 at 8:15 PM Ferruh Yigit wrote: > On 4/12/2019 6:12 PM, Igor Ryzhov wrote: > > Hi Ferruh, > > > > I didn't test it with any special application, but FRR's ISIS works for > me after > > the patch, and it didn't work before. > > That is good enough, and by work you mean that you are able to get correct > value > on 'sll_addr', right? > > > > > Igor > > > > On Fri, Apr 12, 2019 at 5:53 PM Ferruh Yigit > <mailto:ferruh.yi...@intel.com>> wrote: > > > > On 4/12/2019 3:52 PM, Ferruh Yigit wrote: > > > On 4/10/2019 11:30 AM, Igor Ryzhov wrote: > > >> It allows applications running packet sockets over KNI interfaces > to get > > >> source Ethernet addresses of packets received using recvfrom > function. > > >> > > >> Signed-off-by: Igor Ryzhov iryz...@nfware.com>> > > > > > > Acked-by: Ferruh Yigit > <mailto:ferruh.yi...@intel.com>> > > > > > > > > > Hi Igor, > > > > > > I tested this with a quick application on top of kni interfaces, > that > > reads and > > > prints the 'sll_halen', but the last two bytes of the mac address > are always > > > > I mean 'sll_addr', 'sll_halen' is right (6). > > > > > zero, it is quite possible that something is not right in the test > app, but > > > before spending any time on it, can you please confirm this is > working > > fine for you? > > > > > > >
Re: [dpdk-dev] [PATCH v2] kni: implement header_ops parse method
Hi Ferruh, I didn't test it with any special application, but FRR's ISIS works for me after the patch, and it didn't work before. Igor On Fri, Apr 12, 2019 at 5:53 PM Ferruh Yigit wrote: > On 4/12/2019 3:52 PM, Ferruh Yigit wrote: > > On 4/10/2019 11:30 AM, Igor Ryzhov wrote: > >> It allows applications running packet sockets over KNI interfaces to get > >> source Ethernet addresses of packets received using recvfrom function. > >> > >> Signed-off-by: Igor Ryzhov > > > > Acked-by: Ferruh Yigit > > > > > > Hi Igor, > > > > I tested this with a quick application on top of kni interfaces, that > reads and > > prints the 'sll_halen', but the last two bytes of the mac address are > always > > I mean 'sll_addr', 'sll_halen' is right (6). > > > zero, it is quite possible that something is not right in the test app, > but > > before spending any time on it, can you please confirm this is working > fine for you? > > > >
[dpdk-dev] [PATCH v2] kni: implement header_ops parse method
It allows applications running packet sockets over KNI interfaces to get source Ethernet addresses of packets received using recvfrom function. Signed-off-by: Igor Ryzhov --- v2: Use eth_header_parse instead of own implementation. --- kernel/linux/kni/kni_net.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c index be9e6b0b9..ad8365877 100644 --- a/kernel/linux/kni/kni_net.c +++ b/kernel/linux/kni/kni_net.c @@ -744,6 +744,7 @@ kni_net_change_carrier(struct net_device *dev, bool new_carrier) static const struct header_ops kni_net_header_ops = { .create = kni_net_header, + .parse = eth_header_parse, #ifdef HAVE_REBUILD_HEADER .rebuild = kni_net_rebuild_header, #endif /* < 4.1.0 */ -- 2.21.0
Re: [dpdk-dev] [PATCH] kni: implement header_ops parse method
Hi Ferruh, Can we proceed with this patch? Igor On Thu, Jan 24, 2019 at 9:05 PM Igor Ryzhov wrote: > Hi Ferruh, > > Ok, no problem. Generally, it is needed for all applications using > packet(7) interface, running over KNI interfaces. > More specifically, one example of such application is FRRouting, I suppose > you are familiar with it. > FRR's ISIS daemon is using AF_PACKET sockets and checking received > sockaddr_ll. > Here is the link on one of the usages – > https://github.com/FRRouting/frr/blob/master/isisd/isis_pfpacket.c#L294 > > When we are good with motivation, I'll send v2 using eth_header_parse. > > Best regards, > Igor > > On Thu, Jan 24, 2019 at 8:15 PM Ferruh Yigit > wrote: > >> On 1/24/2019 4:35 PM, Igor Ryzhov wrote: >> > Hi Ferruh, >> > >> > I already answered your question in my previous email, header_ops.parse >> method >> > is used by packet(7) interface for packet parsing and filling >> sockaddr_ll structure. >> > Here is the link on the usage – >> > >> https://elixir.bootlin.com/linux/latest/source/net/packet/af_packet.c#L2100 >> >> I saw your previous reply. That is why changed the question slightly, >> from 'what >> it does' to 'what is the use case'. >> Trying to understand do we need it, please help me understand. >> >> > >> > Regarding the question about eth_header_ops usage: >> > Right now both already existing functions, kni_net_header and >> > kni_net_rebuild_header, are implemented as copies, not using >> eth_header_ops >> > functions. >> >> OK, I see your reasoning, but if there is an already Linux API that does >> what we >> want, I suggest calling it instead of re-implementing it, unless there is >> a good >> reason to not do so. >> >> > That's why I think my patch should be accepted as is, and the problem of >> > eth_header_ops usage should be investigated separately, and possibly >> resolved by >> > a separate patch. >> >> Agreed, eth_header_ops usage should be investigated separately. >> >> > >> > Best regards, >> > Igor >> > >> > On Thu, Jan 24, 2019 at 5:10 PM Ferruh Yigit > > <mailto:ferruh.yi...@intel.com>> wrote: >> > >> > On 1/24/2019 9:18 AM, Igor Ryzhov wrote: >> > > Hi Ferruh, >> > > >> > > What about this patch? >> > > Can you merge it as-is, or should I change it to use relevant >> eth_header_ops >> > > functions? Or maybe completely use eth_header_ops? >> > >> > Hi Igor, >> > >> > I am not clear about motivation of the patch, what use case enabled >> by this >> > patch? What is not working with current code? >> > I am for rejecting the patch without need justified. >> > >> > And if the need is justified, still there is a question that why >> not use >> > 'eth_header_parse()' directly but implement our copy? >> > >> > >> > And an extended question/investigation about why not use >> 'eth_header_ops', which >> > seems done intentionally but I am missing the reasoning. >> > >> > > >> > > Best regards, >> > > Igor >> > > >> > > On Fri, Nov 30, 2018 at 10:07 PM Igor Ryzhov > > <mailto:iryz...@nfware.com> >> > > <mailto:iryz...@nfware.com <mailto:iryz...@nfware.com>>> wrote: >> > > >> > > Hi Ferruh, >> > > >> > > header_ops.parse method is used by raw-sockets to >> fill sockaddr_ll >> > structure. >> > > It is used, for example, in isisd for frrouting. >> > > >> > > Regarding your question about eth_header_ops – I, >> unfortunately, don't >> > know >> > > why .cache and .cache_update are disabled for KNI. >> > > I also think that it will be better to use default >> eth_header_ops. >> > > >> > > Best regards, >> > > Igor >> > > >> > > On Tue, Oct 2, 2018 at 7:58 PM Ferruh Yigit < >> ferruh.yi...@intel.com >> > <mailto:ferruh.yi...@intel.com> >> > > <mailto:ferruh.yi...@intel.com <mailto:ferruh.yi...@intel.com>>> >> wrote: >> > > >> >
Re: [dpdk-dev] [PATCH] doc: deprecate KNI ethtool support
Acked-by: Igor Ryzhov On Mon, Feb 18, 2019 at 3:30 PM Ferruh Yigit wrote: > Remove KNI ethtool support. > > Signed-off-by: Ferruh Yigit > --- > RFC Patch: https://patches.dpdk.org/patch/49025/ > --- > doc/guides/rel_notes/deprecation.rst | 8 > 1 file changed, 8 insertions(+) > > diff --git a/doc/guides/rel_notes/deprecation.rst > b/doc/guides/rel_notes/deprecation.rst > index 1b4fcb7e6..0491eeea1 100644 > --- a/doc/guides/rel_notes/deprecation.rst > +++ b/doc/guides/rel_notes/deprecation.rst > @@ -75,3 +75,11 @@ Deprecation Notices > > * crypto/aesni_mb: the minimum supported intel-ipsec-mb library version > will be >changed from 0.49.0 to 0.52.0. > + > +* kni: remove KNI ethtool support. To clarify, this is not to remove the > KNI, > + but only to remove ethtool support of it that is disabled by default and > + can be enabled via ``CONFIG_RTE_KNI_KMOD_ETHTOOL`` config option. > + Existing KNI ethtool implementation is only supported by ``igb`` & > ``ixgbe`` > + drivers, by using a copy of kernel drivers in DPDK. This model can't be > extend > + to all drivers in DPDK and it is too much effort to maintain kernel > modules in DPDK. > + As a result users won't be able to use ``ethtool`` via ``igb`` & > ``ixgbe`` anymore. > -- > 2.20.1 > >
Re: [dpdk-dev] [RFC] kni: remove ethtool support
Hi Ferruh, Thanks. Should I be a maintainer to ack the patch? Best regards, Igor On Mon, Feb 18, 2019 at 3:33 PM Ferruh Yigit wrote: > On 2/6/2019 1:12 PM, Igor Ryzhov wrote: > > Hi Ferruh, > > > > What's the plan with this patch? > > Hi Igor, > > I just sent a deprecation notice for this: > https://patches.dpdk.org/patch/50347/ > > If the deprecation notice approved, requires 3 acks, note will go into > 19.05 > And later this patch can go in 19.08 > > Thanks, > ferruh > > > > > Best regards, > > Igor > > > > On Sat, Jan 5, 2019 at 7:55 PM Igor Ryzhov > <mailto:iryz...@nfware.com>> wrote: > > > > Hi Ferruh, > > > > I answered in another thread. > > > > Regarding this patch – I have no objections now. > > > > Best regards, > > Igor > > > > On Tue, Dec 18, 2018 at 9:17 PM Ferruh Yigit > <mailto:ferruh.yi...@intel.com>> wrote: > > > > On 12/18/2018 9:20 AM, Ferruh Yigit wrote: > > > On 12/18/2018 8:20 AM, Igor Ryzhov wrote: > > >> Hi Ferruh, > > >> > > >> Please, look at my patch http://patches.dpdk.org/patch/48454/ > and > > consider > > >> rebasing your patch over mine. > > > > > > Sorry about that, yes I will check it today. > > > > Hi Igor, > > > > I put some comments on your patch. > > > > As far as I can see it also has a target to remove current type > of ethtool > > support, so this RFC should not be a concern to you. > > All ethtool support can be removed, when you have an actual > solution for > > driver > > independent ethtool support only a little code needs to be added > back. > > > > Thanks, > > ferruh > > > > > > > >> > > >> As we discussed with Stephen, KNI needs to supply ethtool_ops > with > > >> .get_link function, to properly support link status. > > >> So we should save ethtool_ops and implement .get_link using > standard > > >> ethtool_op_get_link. > > >> > > >> Best regards, > > >> Igor > > > > > > > > > >
Re: [dpdk-dev] [RFC] kni: remove ethtool support
Hi Ferruh, What's the plan with this patch? Best regards, Igor On Sat, Jan 5, 2019 at 7:55 PM Igor Ryzhov wrote: > Hi Ferruh, > > I answered in another thread. > > Regarding this patch – I have no objections now. > > Best regards, > Igor > > On Tue, Dec 18, 2018 at 9:17 PM Ferruh Yigit > wrote: > >> On 12/18/2018 9:20 AM, Ferruh Yigit wrote: >> > On 12/18/2018 8:20 AM, Igor Ryzhov wrote: >> >> Hi Ferruh, >> >> >> >> Please, look at my patch http://patches.dpdk.org/patch/48454/ and >> consider >> >> rebasing your patch over mine. >> > >> > Sorry about that, yes I will check it today. >> >> Hi Igor, >> >> I put some comments on your patch. >> >> As far as I can see it also has a target to remove current type of ethtool >> support, so this RFC should not be a concern to you. >> All ethtool support can be removed, when you have an actual solution for >> driver >> independent ethtool support only a little code needs to be added back. >> >> Thanks, >> ferruh >> >> > >> >> >> >> As we discussed with Stephen, KNI needs to supply ethtool_ops with >> >> .get_link function, to properly support link status. >> >> So we should save ethtool_ops and implement .get_link using standard >> >> ethtool_op_get_link. >> >> >> >> Best regards, >> >> Igor >> > >> > >> >>
[dpdk-dev] i40e doesn't calculate RSS for GRE traffic.
Hello everyone, We are currently testing i40e support for RSS calculation. RSS is configured with all supported flags: #define I40E_RSS_OFFLOAD_ALL ( \ ETH_RSS_FRAG_IPV4 | \ ETH_RSS_NONFRAG_IPV4_TCP | \ ETH_RSS_NONFRAG_IPV4_UDP | \ ETH_RSS_NONFRAG_IPV4_SCTP | \ ETH_RSS_NONFRAG_IPV4_OTHER | \ ETH_RSS_FRAG_IPV6 | \ ETH_RSS_NONFRAG_IPV6_TCP | \ ETH_RSS_NONFRAG_IPV6_UDP | \ ETH_RSS_NONFRAG_IPV6_SCTP | \ ETH_RSS_NONFRAG_IPV6_OTHER | \ ETH_RSS_L2_PAYLOAD) We checked RSS calculation for following types of packets: IPv4+TCP IPv4+UDP IPv4+ICMP IPv4+GRE and the same, but for IPv4 fragmented packets. And for the case "IPv4+GRE" RSS hash is not calculated and set to 0 in rte_mbuf. Is this a known issue? Best regards, Igor
Re: [dpdk-dev] [PATCH] kni: fix rte_kni_update_link
Hi again, Sorry for bothering, I should have done more testing. It works as it is now. So, self NACK on the patch. Best regards, Igor On Mon, Jan 28, 2019 at 2:45 PM Igor Ryzhov wrote: > Hi Ferruh, > > Can you, please, take a look at this patch? > The current implementation is broken, I think the patch should be merged > into 19.02 and 18.11.1. > > Best regards, > Igor > > On Thu, Jan 24, 2019 at 11:47 PM Igor Ryzhov wrote: > >> After read, file offset must be set to 0 before write. >> Otherwise, the third byte will be overwritten instead of the first. >> >> Fixes: c6fd54f28c24 ("kni: add function to set link state on kernel >> interface") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Igor Ryzhov >> --- >> lib/librte_kni/rte_kni.c | 6 ++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c >> index 73aef..5899bb14a 100644 >> --- a/lib/librte_kni/rte_kni.c >> +++ b/lib/librte_kni/rte_kni.c >> @@ -746,6 +746,12 @@ rte_kni_update_link(struct rte_kni *kni, unsigned >> int linkup) >> } >> old_linkup = (old_carrier[0] == '1'); >> >> + if (lseek(fd, 0, SEEK_SET) == -1) { >> + RTE_LOG(ERR, KNI, "Failed to change file position: >> %s.\n", path); >> + close(fd); >> + return -1; >> + } >> + >> new_carrier = linkup ? "1" : "0"; >> ret = write(fd, new_carrier, 1); >> if (ret < 1) { >> -- >> 2.20.1 >> >>
Re: [dpdk-dev] [PATCH] kni: fix rte_kni_update_link
Hi Ferruh, Can you, please, take a look at this patch? The current implementation is broken, I think the patch should be merged into 19.02 and 18.11.1. Best regards, Igor On Thu, Jan 24, 2019 at 11:47 PM Igor Ryzhov wrote: > After read, file offset must be set to 0 before write. > Otherwise, the third byte will be overwritten instead of the first. > > Fixes: c6fd54f28c24 ("kni: add function to set link state on kernel > interface") > Cc: sta...@dpdk.org > > Signed-off-by: Igor Ryzhov > --- > lib/librte_kni/rte_kni.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c > index 73aef..5899bb14a 100644 > --- a/lib/librte_kni/rte_kni.c > +++ b/lib/librte_kni/rte_kni.c > @@ -746,6 +746,12 @@ rte_kni_update_link(struct rte_kni *kni, unsigned int > linkup) > } > old_linkup = (old_carrier[0] == '1'); > > + if (lseek(fd, 0, SEEK_SET) == -1) { > + RTE_LOG(ERR, KNI, "Failed to change file position: %s.\n", > path); > + close(fd); > + return -1; > + } > + > new_carrier = linkup ? "1" : "0"; > ret = write(fd, new_carrier, 1); > if (ret < 1) { > -- > 2.20.1 > >
[dpdk-dev] [PATCH] kni: fix rte_kni_update_link
After read, file offset must be set to 0 before write. Otherwise, the third byte will be overwritten instead of the first. Fixes: c6fd54f28c24 ("kni: add function to set link state on kernel interface") Cc: sta...@dpdk.org Signed-off-by: Igor Ryzhov --- lib/librte_kni/rte_kni.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index 73aef..5899bb14a 100644 --- a/lib/librte_kni/rte_kni.c +++ b/lib/librte_kni/rte_kni.c @@ -746,6 +746,12 @@ rte_kni_update_link(struct rte_kni *kni, unsigned int linkup) } old_linkup = (old_carrier[0] == '1'); + if (lseek(fd, 0, SEEK_SET) == -1) { + RTE_LOG(ERR, KNI, "Failed to change file position: %s.\n", path); + close(fd); + return -1; + } + new_carrier = linkup ? "1" : "0"; ret = write(fd, new_carrier, 1); if (ret < 1) { -- 2.20.1
Re: [dpdk-dev] [PATCH] kni: implement header_ops parse method
Hi Ferruh, Ok, no problem. Generally, it is needed for all applications using packet(7) interface, running over KNI interfaces. More specifically, one example of such application is FRRouting, I suppose you are familiar with it. FRR's ISIS daemon is using AF_PACKET sockets and checking received sockaddr_ll. Here is the link on one of the usages – https://github.com/FRRouting/frr/blob/master/isisd/isis_pfpacket.c#L294 When we are good with motivation, I'll send v2 using eth_header_parse. Best regards, Igor On Thu, Jan 24, 2019 at 8:15 PM Ferruh Yigit wrote: > On 1/24/2019 4:35 PM, Igor Ryzhov wrote: > > Hi Ferruh, > > > > I already answered your question in my previous email, header_ops.parse > method > > is used by packet(7) interface for packet parsing and filling > sockaddr_ll structure. > > Here is the link on the usage – > > > https://elixir.bootlin.com/linux/latest/source/net/packet/af_packet.c#L2100 > > I saw your previous reply. That is why changed the question slightly, from > 'what > it does' to 'what is the use case'. > Trying to understand do we need it, please help me understand. > > > > > Regarding the question about eth_header_ops usage: > > Right now both already existing functions, kni_net_header and > > kni_net_rebuild_header, are implemented as copies, not using > eth_header_ops > > functions. > > OK, I see your reasoning, but if there is an already Linux API that does > what we > want, I suggest calling it instead of re-implementing it, unless there is > a good > reason to not do so. > > > That's why I think my patch should be accepted as is, and the problem of > > eth_header_ops usage should be investigated separately, and possibly > resolved by > > a separate patch. > > Agreed, eth_header_ops usage should be investigated separately. > > > > > Best regards, > > Igor > > > > On Thu, Jan 24, 2019 at 5:10 PM Ferruh Yigit > <mailto:ferruh.yi...@intel.com>> wrote: > > > > On 1/24/2019 9:18 AM, Igor Ryzhov wrote: > > > Hi Ferruh, > > > > > > What about this patch? > > > Can you merge it as-is, or should I change it to use relevant > eth_header_ops > > > functions? Or maybe completely use eth_header_ops? > > > > Hi Igor, > > > > I am not clear about motivation of the patch, what use case enabled > by this > > patch? What is not working with current code? > > I am for rejecting the patch without need justified. > > > > And if the need is justified, still there is a question that why not > use > > 'eth_header_parse()' directly but implement our copy? > > > > > > And an extended question/investigation about why not use > 'eth_header_ops', which > > seems done intentionally but I am missing the reasoning. > > > > > > > > Best regards, > > > Igor > > > > > > On Fri, Nov 30, 2018 at 10:07 PM Igor Ryzhov > <mailto:iryz...@nfware.com> > > > <mailto:iryz...@nfware.com <mailto:iryz...@nfware.com>>> wrote: > > > > > > Hi Ferruh, > > > > > > header_ops.parse method is used by raw-sockets to > fill sockaddr_ll > > structure. > > > It is used, for example, in isisd for frrouting. > > > > > > Regarding your question about eth_header_ops – I, > unfortunately, don't > > know > > > why .cache and .cache_update are disabled for KNI. > > > I also think that it will be better to use default > eth_header_ops. > > > > > > Best regards, > > > Igor > > > > > > On Tue, Oct 2, 2018 at 7:58 PM Ferruh Yigit < > ferruh.yi...@intel.com > > <mailto:ferruh.yi...@intel.com> > > > <mailto:ferruh.yi...@intel.com <mailto:ferruh.yi...@intel.com>>> > wrote: > > > > > > On 9/27/2018 1:02 AM, Igor Ryzhov wrote: > > > > Signed-off-by: Igor Ryzhov > <mailto:iryz...@nfware.com> > > > <mailto:iryz...@nfware.com <mailto:iryz...@nfware.com>>> > > > > > > Hi Igor, > > > > > > What is the motivation to add this support? What is > enabled by this? > > > > > > > > > Meanwhile, why we are not using eth_header_ops, which is > already > >
Re: [dpdk-dev] [PATCH] kni: implement header_ops parse method
Hi Ferruh, I already answered your question in my previous email, header_ops.parse method is used by packet(7) interface for packet parsing and filling sockaddr_ll structure. Here is the link on the usage – https://elixir.bootlin.com/linux/latest/source/net/packet/af_packet.c#L2100 Regarding the question about eth_header_ops usage: Right now both already existing functions, kni_net_header and kni_net_rebuild_header, are implemented as copies, not using eth_header_ops functions. That's why I think my patch should be accepted as is, and the problem of eth_header_ops usage should be investigated separately, and possibly resolved by a separate patch. Best regards, Igor On Thu, Jan 24, 2019 at 5:10 PM Ferruh Yigit wrote: > On 1/24/2019 9:18 AM, Igor Ryzhov wrote: > > Hi Ferruh, > > > > What about this patch? > > Can you merge it as-is, or should I change it to use relevant > eth_header_ops > > functions? Or maybe completely use eth_header_ops? > > Hi Igor, > > I am not clear about motivation of the patch, what use case enabled by this > patch? What is not working with current code? > I am for rejecting the patch without need justified. > > And if the need is justified, still there is a question that why not use > 'eth_header_parse()' directly but implement our copy? > > > And an extended question/investigation about why not use 'eth_header_ops', > which > seems done intentionally but I am missing the reasoning. > > > > > Best regards, > > Igor > > > > On Fri, Nov 30, 2018 at 10:07 PM Igor Ryzhov > <mailto:iryz...@nfware.com>> wrote: > > > > Hi Ferruh, > > > > header_ops.parse method is used by raw-sockets to fill sockaddr_ll > structure. > > It is used, for example, in isisd for frrouting. > > > > Regarding your question about eth_header_ops – I, unfortunately, > don't know > > why .cache and .cache_update are disabled for KNI. > > I also think that it will be better to use default eth_header_ops. > > > > Best regards, > > Igor > > > > On Tue, Oct 2, 2018 at 7:58 PM Ferruh Yigit > <mailto:ferruh.yi...@intel.com>> wrote: > > > > On 9/27/2018 1:02 AM, Igor Ryzhov wrote: > > > Signed-off-by: Igor Ryzhov > <mailto:iryz...@nfware.com>> > > > > Hi Igor, > > > > What is the motivation to add this support? What is enabled by > this? > > > > > > Meanwhile, why we are not using eth_header_ops, which is already > set by > > ether_setup(). > > To disable .cache & .cache_update? > > > > If so why not using relevant eth_header_ops (eth_header, > > eth_header_parse ..) > > instead of implementing ours? > > > > > --- > > > kernel/linux/kni/kni_net.c | 14 ++ > > > 1 file changed, 14 insertions(+) > > > > > > diff --git a/kernel/linux/kni/kni_net.c > b/kernel/linux/kni/kni_net.c > > > index 7fcfa106c..128a5477c 100644 > > > --- a/kernel/linux/kni/kni_net.c > > > +++ b/kernel/linux/kni/kni_net.c > > > @@ -678,6 +678,19 @@ kni_net_header(struct sk_buff *skb, struct > > net_device *dev, > > > return dev->hard_header_len; > > > } > > > > > > +/* > > > + * Extract hardware address from packet > > > + */ > > > +static int > > > +kni_net_header_parse(const struct sk_buff *skb, unsigned char > *haddr) > > > +{ > > > + const struct ethhdr *eth = eth_hdr(skb); > > > + > > > + memcpy(haddr, eth->h_source, ETH_ALEN); > > > + > > > + return ETH_ALEN; > > > +} > > > + > > > /* > > > * Re-fill the eth header > > > */ > > > @@ -739,6 +752,7 @@ kni_net_change_carrier(struct net_device > *dev, > > bool new_carrier) > > > > > > static const struct header_ops kni_net_header_ops = { > > > .create = kni_net_header, > > > + .parse = kni_net_header_parse, > > > #ifdef HAVE_REBUILD_HEADER > > > .rebuild = kni_net_rebuild_header, > > > #endif /* < 4.1.0 */ > > > > > > >
Re: [dpdk-dev] [PATCH] kni: implement header_ops parse method
Hi Ferruh, What about this patch? Can you merge it as-is, or should I change it to use relevant eth_header_ops functions? Or maybe completely use eth_header_ops? Best regards, Igor On Fri, Nov 30, 2018 at 10:07 PM Igor Ryzhov wrote: > Hi Ferruh, > > header_ops.parse method is used by raw-sockets to fill sockaddr_ll > structure. > It is used, for example, in isisd for frrouting. > > Regarding your question about eth_header_ops – I, unfortunately, don't > know why .cache and .cache_update are disabled for KNI. > I also think that it will be better to use default eth_header_ops. > > Best regards, > Igor > > On Tue, Oct 2, 2018 at 7:58 PM Ferruh Yigit > wrote: > >> On 9/27/2018 1:02 AM, Igor Ryzhov wrote: >> > Signed-off-by: Igor Ryzhov >> >> Hi Igor, >> >> What is the motivation to add this support? What is enabled by this? >> >> >> Meanwhile, why we are not using eth_header_ops, which is already set by >> ether_setup(). >> To disable .cache & .cache_update? >> >> If so why not using relevant eth_header_ops (eth_header, eth_header_parse >> ..) >> instead of implementing ours? >> >> > --- >> > kernel/linux/kni/kni_net.c | 14 ++ >> > 1 file changed, 14 insertions(+) >> > >> > diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c >> > index 7fcfa106c..128a5477c 100644 >> > --- a/kernel/linux/kni/kni_net.c >> > +++ b/kernel/linux/kni/kni_net.c >> > @@ -678,6 +678,19 @@ kni_net_header(struct sk_buff *skb, struct >> net_device *dev, >> > return dev->hard_header_len; >> > } >> > >> > +/* >> > + * Extract hardware address from packet >> > + */ >> > +static int >> > +kni_net_header_parse(const struct sk_buff *skb, unsigned char *haddr) >> > +{ >> > + const struct ethhdr *eth = eth_hdr(skb); >> > + >> > + memcpy(haddr, eth->h_source, ETH_ALEN); >> > + >> > + return ETH_ALEN; >> > +} >> > + >> > /* >> > * Re-fill the eth header >> > */ >> > @@ -739,6 +752,7 @@ kni_net_change_carrier(struct net_device *dev, bool >> new_carrier) >> > >> > static const struct header_ops kni_net_header_ops = { >> > .create = kni_net_header, >> > + .parse = kni_net_header_parse, >> > #ifdef HAVE_REBUILD_HEADER >> > .rebuild = kni_net_rebuild_header, >> > #endif /* < 4.1.0 */ >> > >> >>
Re: [dpdk-dev] [RFC] kni: remove ethtool support
Hi Ferruh, I answered in another thread. Regarding this patch – I have no objections now. Best regards, Igor On Tue, Dec 18, 2018 at 9:17 PM Ferruh Yigit wrote: > On 12/18/2018 9:20 AM, Ferruh Yigit wrote: > > On 12/18/2018 8:20 AM, Igor Ryzhov wrote: > >> Hi Ferruh, > >> > >> Please, look at my patch http://patches.dpdk.org/patch/48454/ and > consider > >> rebasing your patch over mine. > > > > Sorry about that, yes I will check it today. > > Hi Igor, > > I put some comments on your patch. > > As far as I can see it also has a target to remove current type of ethtool > support, so this RFC should not be a concern to you. > All ethtool support can be removed, when you have an actual solution for > driver > independent ethtool support only a little code needs to be added back. > > Thanks, > ferruh > > > > >> > >> As we discussed with Stephen, KNI needs to supply ethtool_ops with > >> .get_link function, to properly support link status. > >> So we should save ethtool_ops and implement .get_link using standard > >> ethtool_op_get_link. > >> > >> Best regards, > >> Igor > > > > > >
Re: [dpdk-dev] [PATCH v2] kni: use kni_ethtool_ops only with unknown drivers
Hi Ferruh, Thank you for all your comments. The only real purpose of the patch is to support .get_link for all KNI interfaces, not just for those using igb or ixgbe. So I propose to remove ethtool support at all with your patch, and after that, I will add .get_link support again. Yes, I understand that it doesn't implement real link status support, it just relies on the application, that should set link status using sysfs. Regarding the ethtool support in a driver-independent way – the idea is to use the same req/resp queues which are used for some net_device_ops like .ndo_change_mtu or .ndo_set_mac_address. I want to add more types of requests to rte_kni_req_id enum and implement them. It's not an urgent task, so I don't know yet when I'll find time to do it. Best regards, Igor On Tue, Dec 18, 2018 at 9:13 PM Ferruh Yigit wrote: > On 12/3/2018 2:06 PM, Igor Ryzhov wrote: > > Hi Ferruh, > > > > What about the patch? > > > > I also support dropping ethtool for ixgbe and i40e, but to save generic > ethtool_ops > > with .get_link implementation, because it's an essential function that > works > > correctly > > after proper implementation of carrier status that was merged into 18.11. > > > > Also, other ethtool operations may be implemented in a > driver-independent way using > > the same concept as for netdev_ops. > > "carrier status" relies on the sample app support also relies on kni > net_device > sysfs interface. > It is good target to have ethtool support in a driver-independent way, > please > share more details and lets discuss them. > > > > > On Mon, Dec 3, 2018 at 4:09 PM Ferruh Yigit > <mailto:ferruh.yi...@intel.com>> wrote: > > > > On 11/30/2018 11:38 PM, Stephen Hemminger wrote: > > > On Fri, 30 Nov 2018 22:47:50 +0300 > > > Igor Ryzhov mailto:iryz...@nfware.com>> > wrote: > > > > > >> Current implementation of kni_ethtool_ops just uses corresponding > > >> ethtool_ops function of underlying driver for all functions > except for > > >> .get_link. This commit sets kni->net_dev->ethtool_ops directly to > the > > >> ethtool_ops of the corresponding driver. > > >> > > >> For unknown drivers (all but ixgbe and i40e) we still use > > >> kni_ethtool_ops with implemented .get_link function. > > >> > > >> Signed-off-by: Igor Ryzhov iryz...@nfware.com>> > > > > > > Why does KNI still support ethtool which: > > > 1. Only works on a subset of devices > > > 2. Requires a 3rd implmentation of the HW device (Linux, DPDK, > and KNI) > > > > +1 to drop ethtool support, last time we tried concern was anybody > may be using > > it, perhaps we can try again. > > > > > > > > Then again why does KNI exist at all? What is missing from virtio > user which > > > is faster anyway. > > > > > > >
Re: [dpdk-dev] [RFC] kni: remove ethtool support
Hi Ferruh, Please, look at my patch http://patches.dpdk.org/patch/48454/ and consider rebasing your patch over mine. As we discussed with Stephen, KNI needs to supply ethtool_ops with .get_link function, to properly support link status. So we should save ethtool_ops and implement .get_link using standard ethtool_op_get_link. Best regards, Igor On Tue, Dec 18, 2018 at 4:28 AM Ferruh Yigit wrote: > Current design requires kernel drivers and they need to be probed by Linux > up to some level so that they can be usable by DPDK for ethtool support, > this requires maintaining the Linux drivers in DPDK. Also ethtool support > is limited and hard, if not impossible, to expand to other PMDs. Since KNI > ethtool support is not used commonly, if not used at all, removing the > support for the sake of simplicity and maintenance. Signed-off-by: Ferruh > Yigit --- config/common_base | 1 - > .../sample_app_ug/kernel_nic_interface.rst | 10 - examples/kni/main.c | 9 - > kernel/linux/kni/Kbuild | 4 +- kernel/linux/kni/Makefile | 26 +- > kernel/linux/kni/ethtool/README | 71 - > kernel/linux/kni/ethtool/igb/e1000_82575.c | 3650 -- > kernel/linux/kni/ethtool/igb/e1000_82575.h | 494 - > kernel/linux/kni/ethtool/igb/e1000_api.c | 1144 -- > kernel/linux/kni/ethtool/igb/e1000_api.h | 142 - > kernel/linux/kni/ethtool/igb/e1000_defines.h | 1365 -- > kernel/linux/kni/ethtool/igb/e1000_hw.h | 778 -- > kernel/linux/kni/ethtool/igb/e1000_i210.c | 894 -- > kernel/linux/kni/ethtool/igb/e1000_i210.h | 76 - > kernel/linux/kni/ethtool/igb/e1000_mac.c | 2081 > kernel/linux/kni/ethtool/igb/e1000_mac.h | 65 - > kernel/linux/kni/ethtool/igb/e1000_manage.c | 539 - > kernel/linux/kni/ethtool/igb/e1000_manage.h | 74 - > kernel/linux/kni/ethtool/igb/e1000_mbx.c | 510 - > kernel/linux/kni/ethtool/igb/e1000_mbx.h | 72 - > kernel/linux/kni/ethtool/igb/e1000_nvm.c | 950 -- > kernel/linux/kni/ethtool/igb/e1000_nvm.h | 60 - > kernel/linux/kni/ethtool/igb/e1000_osdep.h | 121 - > kernel/linux/kni/ethtool/igb/e1000_phy.c | 3392 - > kernel/linux/kni/ethtool/igb/e1000_phy.h | 241 - > kernel/linux/kni/ethtool/igb/e1000_regs.h | 631 - > kernel/linux/kni/ethtool/igb/igb.h | 844 -- > kernel/linux/kni/ethtool/igb/igb_ethtool.c | 2851 - > kernel/linux/kni/ethtool/igb/igb_main.c | 10344 > kernel/linux/kni/ethtool/igb/igb_param.c | 832 -- > kernel/linux/kni/ethtool/igb/igb_regtest.h | 234 - > kernel/linux/kni/ethtool/igb/igb_vmdq.c | 421 - > kernel/linux/kni/ethtool/igb/igb_vmdq.h | 31 - > kernel/linux/kni/ethtool/igb/kcompat.h | 3945 -- > kernel/linux/kni/ethtool/igb/meson.build | 16 - > kernel/linux/kni/ethtool/ixgbe/ixgbe.h | 912 -- > kernel/linux/kni/ethtool/ixgbe/ixgbe_82598.c | 1281 -- > kernel/linux/kni/ethtool/ixgbe/ixgbe_82598.h | 29 - > kernel/linux/kni/ethtool/ixgbe/ixgbe_82599.c | 2299 > kernel/linux/kni/ethtool/ixgbe/ixgbe_82599.h | 43 - > kernel/linux/kni/ethtool/ixgbe/ixgbe_api.c | 1142 -- > kernel/linux/kni/ethtool/ixgbe/ixgbe_api.h | 153 - > kernel/linux/kni/ethtool/ixgbe/ixgbe_common.c | 4067 -- > kernel/linux/kni/ethtool/ixgbe/ixgbe_common.h | 125 - > kernel/linux/kni/ethtool/ixgbe/ixgbe_dcb.h | 153 - > .../linux/kni/ethtool/ixgbe/ixgbe_ethtool.c | 2894 - > kernel/linux/kni/ethtool/ixgbe/ixgbe_fcoe.h | 76 - > kernel/linux/kni/ethtool/ixgbe/ixgbe_main.c | 2951 - > kernel/linux/kni/ethtool/ixgbe/ixgbe_mbx.h | 90 - > kernel/linux/kni/ethtool/ixgbe/ixgbe_osdep.h | 117 - > kernel/linux/kni/ethtool/ixgbe/ixgbe_phy.c | 1832 --- > kernel/linux/kni/ethtool/ixgbe/ixgbe_phy.h | 122 - > kernel/linux/kni/ethtool/ixgbe/ixgbe_type.h | 3239 - > kernel/linux/kni/ethtool/ixgbe/ixgbe_x540.c | 922 -- > kernel/linux/kni/ethtool/ixgbe/ixgbe_x540.h | 43 - > kernel/linux/kni/ethtool/ixgbe/kcompat.c | 1231 -- > kernel/linux/kni/ethtool/ixgbe/kcompat.h | 3140 - > kernel/linux/kni/ethtool/ixgbe/meson.build | 13 - > kernel/linux/kni/ethtool/meson.build | 5 - kernel/linux/kni/kni_dev.h | 8 - > kernel/linux/kni/kni_ethtool.c | 229 - kernel/linux/kni/kni_misc.c | 82 +- > kernel/linux/kni/meson.build | 9 +- lib/librte_kni/rte_kni.c | 5 - > lib/librte_kni/rte_kni.h | 4 +- 65 files changed, 15 insertions(+), 64119 > deletions(-) delete mode 100644 kernel/linux/kni/ethtool/README delete mode > 100644 kernel/linux/kni/ethtool/igb/e1000_82575.c delete mode 100644 > kernel/linux/kni/ethtool/igb/e1000_82575.h delete mode 100644 > kernel/linux/kni/ethtool/igb/e1000_api.c delete mode 100644 > kernel/linux/kni/ethtool/igb/e1000_api.h delete mode 100644 > kernel/linux/kni/ethtool/igb/e1000_defines.h delete mode 100644 > kernel/linux/kni/ethtool/igb/e1000_hw.h delete mode 100644 > kernel/linux/kni/ethtool/igb/e1000_i210.c delete mode 100644 > kernel/linux/kni/ethtool/igb/e1000_i210.h delete mode 100644 > kernel/linux/kni/ethtool/igb/e1000_mac.c delete mode 100644 > kernel/linux/kni/ethtool/igb/e1000_mac.h delete mode 100644 > kernel/linux/kni/ethtool/igb/e1000_manage.c delete mode 100644 > kernel/linux/kni/ethtool/igb/e1000
Re: [dpdk-dev] [PATCH v2] kni: use kni_ethtool_ops only with unknown drivers
Hi Ferruh, What about the patch? I also support dropping ethtool for ixgbe and i40e, but to save generic ethtool_ops with .get_link implementation, because it's an essential function that works correctly after proper implementation of carrier status that was merged into 18.11. Also, other ethtool operations may be implemented in a driver-independent way using the same concept as for netdev_ops. On Mon, Dec 3, 2018 at 4:09 PM Ferruh Yigit wrote: > On 11/30/2018 11:38 PM, Stephen Hemminger wrote: > > On Fri, 30 Nov 2018 22:47:50 +0300 > > Igor Ryzhov wrote: > > > >> Current implementation of kni_ethtool_ops just uses corresponding > >> ethtool_ops function of underlying driver for all functions except for > >> .get_link. This commit sets kni->net_dev->ethtool_ops directly to the > >> ethtool_ops of the corresponding driver. > >> > >> For unknown drivers (all but ixgbe and i40e) we still use > >> kni_ethtool_ops with implemented .get_link function. > >> > >> Signed-off-by: Igor Ryzhov > > > > Why does KNI still support ethtool which: > > 1. Only works on a subset of devices > > 2. Requires a 3rd implmentation of the HW device (Linux, DPDK, and KNI) > > +1 to drop ethtool support, last time we tried concern was anybody may be > using > it, perhaps we can try again. > > > > > Then again why does KNI exist at all? What is missing from virtio user > which > > is faster anyway. > > > >
Re: [dpdk-dev] [PATCH v2] kni: use kni_ethtool_ops only with unknown drivers
Stephen, ethtool_get_link returns EOPNOTSUPP if device doesn't supply get_link: static int ethtool_get_link(struct net_device *dev, char __user *useraddr) { struct ethtool_value edata = { .cmd = ETHTOOL_GLINK }; if (!dev->ethtool_ops->get_link) return -EOPNOTSUPP; edata.data = netif_running(dev) && dev->ethtool_ops->get_link(dev); if (copy_to_user(useraddr, &edata, sizeof(edata))) return -EFAULT; return 0; } On Sat, Dec 1, 2018 at 8:31 PM Stephen Hemminger wrote: > On Sat, 1 Dec 2018 14:12:54 +0300 > Igor Ryzhov wrote: > > > Hi Stephen, > > > > I also do not see the point of the current implementation of ethtool > > support. > > That's why I sent this patch – it enables ethtool_ops for all devices, > > independent of the underlying driver. > > Right now only .get_link is supported, but I am thinking about > > implementation of a larger set of functions, using req/resp queue, like > > netdev_ops functions are working. > > > > Regarding the KNI itself, we use it as Linux mirror of physical port for: > > 1. Port configuration from Linux – such functions as set_mac, change_mtu, > > etc. And ethtool_ops will be used the same way. > > 2. Passing control-plane packets to Linux. > > > > Can virtio user be used the same way, as a mirror of physical port? > > > > Best regards, > > Igor > > In Linux if device does not supply get_link the base code does the > right thing > > > u32 ethtool_op_get_link(struct net_device *dev) > { > return netif_carrier_ok(dev) ? 1 : 0; > } > > > > Doing set_mac, change_mtu and ethtool_ops in virtio_user should be possible > but probably not implemented. > >
Re: [dpdk-dev] [PATCH v2] kni: use kni_ethtool_ops only with unknown drivers
Hi Stephen, I also do not see the point of the current implementation of ethtool support. That's why I sent this patch – it enables ethtool_ops for all devices, independent of the underlying driver. Right now only .get_link is supported, but I am thinking about implementation of a larger set of functions, using req/resp queue, like netdev_ops functions are working. Regarding the KNI itself, we use it as Linux mirror of physical port for: 1. Port configuration from Linux – such functions as set_mac, change_mtu, etc. And ethtool_ops will be used the same way. 2. Passing control-plane packets to Linux. Can virtio user be used the same way, as a mirror of physical port? Best regards, Igor On Sat, Dec 1, 2018 at 2:38 AM Stephen Hemminger wrote: > On Fri, 30 Nov 2018 22:47:50 +0300 > Igor Ryzhov wrote: > > > Current implementation of kni_ethtool_ops just uses corresponding > > ethtool_ops function of underlying driver for all functions except for > > .get_link. This commit sets kni->net_dev->ethtool_ops directly to the > > ethtool_ops of the corresponding driver. > > > > For unknown drivers (all but ixgbe and i40e) we still use > > kni_ethtool_ops with implemented .get_link function. > > > > Signed-off-by: Igor Ryzhov > > Why does KNI still support ethtool which: > 1. Only works on a subset of devices > 2. Requires a 3rd implmentation of the HW device (Linux, DPDK, and KNI) > > Then again why does KNI exist at all? What is missing from virtio user > which > is faster anyway. >
[dpdk-dev] [PATCH v2] kni: use kni_ethtool_ops only with unknown drivers
Current implementation of kni_ethtool_ops just uses corresponding ethtool_ops function of underlying driver for all functions except for .get_link. This commit sets kni->net_dev->ethtool_ops directly to the ethtool_ops of the corresponding driver. For unknown drivers (all but ixgbe and i40e) we still use kni_ethtool_ops with implemented .get_link function. Signed-off-by: Igor Ryzhov --- kernel/linux/kni/Makefile | 2 +- kernel/linux/kni/kni_ethtool.c | 210 - kernel/linux/kni/kni_misc.c| 9 +- 3 files changed, 7 insertions(+), 214 deletions(-) diff --git a/kernel/linux/kni/Makefile b/kernel/linux/kni/Makefile index 282be7b68..ee5e1e136 100644 --- a/kernel/linux/kni/Makefile +++ b/kernel/linux/kni/Makefile @@ -30,7 +30,7 @@ endif # SRCS-y := kni_misc.c SRCS-y += kni_net.c -SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += kni_ethtool.c +SRCS-y += kni_ethtool.c SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += ethtool/ixgbe/ixgbe_main.c SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += ethtool/ixgbe/ixgbe_api.c diff --git a/kernel/linux/kni/kni_ethtool.c b/kernel/linux/kni/kni_ethtool.c index b1c84f8f0..ccfd58ef0 100644 --- a/kernel/linux/kni/kni_ethtool.c +++ b/kernel/linux/kni/kni_ethtool.c @@ -8,218 +8,8 @@ #include #include "kni_dev.h" -static int -kni_check_if_running(struct net_device *dev) -{ - struct kni_dev *priv = netdev_priv(dev); - - if (priv->lad_dev) - return 0; - else - return -EOPNOTSUPP; -} - -static void -kni_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info) -{ - struct kni_dev *priv = netdev_priv(dev); - - priv->lad_dev->ethtool_ops->get_drvinfo(priv->lad_dev, info); -} - -/* ETHTOOL_GLINKSETTINGS replaces ETHTOOL_GSET */ -#ifndef ETHTOOL_GLINKSETTINGS -static int -kni_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->get_settings(priv->lad_dev, ecmd); -} -#endif - -/* ETHTOOL_SLINKSETTINGS replaces ETHTOOL_SSET */ -#ifndef ETHTOOL_SLINKSETTINGS -static int -kni_set_settings(struct net_device *dev, struct ethtool_cmd *ecmd) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->set_settings(priv->lad_dev, ecmd); -} -#endif - -static void -kni_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) -{ - struct kni_dev *priv = netdev_priv(dev); - - priv->lad_dev->ethtool_ops->get_wol(priv->lad_dev, wol); -} - -static int -kni_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->set_wol(priv->lad_dev, wol); -} - -static int -kni_nway_reset(struct net_device *dev) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->nway_reset(priv->lad_dev); -} - -static int -kni_get_eeprom_len(struct net_device *dev) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->get_eeprom_len(priv->lad_dev); -} - -static int -kni_get_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, - u8 *bytes) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->get_eeprom(priv->lad_dev, eeprom, - bytes); -} - -static int -kni_set_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, - u8 *bytes) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->set_eeprom(priv->lad_dev, eeprom, - bytes); -} - -static void -kni_get_ringparam(struct net_device *dev, struct ethtool_ringparam *ring) -{ - struct kni_dev *priv = netdev_priv(dev); - - priv->lad_dev->ethtool_ops->get_ringparam(priv->lad_dev, ring); -} - -static int -kni_set_ringparam(struct net_device *dev, struct ethtool_ringparam *ring) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->set_ringparam(priv->lad_dev, ring); -} - -static void -kni_get_pauseparam(struct net_device *dev, struct ethtool_pauseparam *pause) -{ - struct kni_dev *priv = netdev_priv(dev); - - priv->lad_dev->ethtool_ops->get_pauseparam(priv->lad_dev, pause); -} - -static int -kni_set_pauseparam(struct net_device *dev, struct ethtool_pauseparam *pause) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->set_pauseparam(priv->lad_dev, - pause); -} - -static u32 -kni_get_m
[dpdk-dev] [PATCH] kni: use kni_ethtool_ops only with unknown drivers
Current implementation of kni_ethtool_ops just uses corresponding ethtool_ops function of underlying driver for all functions except for .get_link. This commit sets kni->net_dev->ethtool_ops directly to the ethtool_ops of the corresponding driver. For unknown drivers (all but ixgbe and i40e) we still use kni_ethtool_ops with implemented .get_link function. Signed-off-by: Igor Ryzhov --- kernel/linux/kni/Makefile | 2 +- kernel/linux/kni/kni_ethtool.c | 210 - kernel/linux/kni/kni_misc.c| 6 +- 3 files changed, 6 insertions(+), 212 deletions(-) diff --git a/kernel/linux/kni/Makefile b/kernel/linux/kni/Makefile index 282be7b68..ee5e1e136 100644 --- a/kernel/linux/kni/Makefile +++ b/kernel/linux/kni/Makefile @@ -30,7 +30,7 @@ endif # SRCS-y := kni_misc.c SRCS-y += kni_net.c -SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += kni_ethtool.c +SRCS-y += kni_ethtool.c SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += ethtool/ixgbe/ixgbe_main.c SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += ethtool/ixgbe/ixgbe_api.c diff --git a/kernel/linux/kni/kni_ethtool.c b/kernel/linux/kni/kni_ethtool.c index b1c84f8f0..ccfd58ef0 100644 --- a/kernel/linux/kni/kni_ethtool.c +++ b/kernel/linux/kni/kni_ethtool.c @@ -8,218 +8,8 @@ #include #include "kni_dev.h" -static int -kni_check_if_running(struct net_device *dev) -{ - struct kni_dev *priv = netdev_priv(dev); - - if (priv->lad_dev) - return 0; - else - return -EOPNOTSUPP; -} - -static void -kni_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info) -{ - struct kni_dev *priv = netdev_priv(dev); - - priv->lad_dev->ethtool_ops->get_drvinfo(priv->lad_dev, info); -} - -/* ETHTOOL_GLINKSETTINGS replaces ETHTOOL_GSET */ -#ifndef ETHTOOL_GLINKSETTINGS -static int -kni_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->get_settings(priv->lad_dev, ecmd); -} -#endif - -/* ETHTOOL_SLINKSETTINGS replaces ETHTOOL_SSET */ -#ifndef ETHTOOL_SLINKSETTINGS -static int -kni_set_settings(struct net_device *dev, struct ethtool_cmd *ecmd) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->set_settings(priv->lad_dev, ecmd); -} -#endif - -static void -kni_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) -{ - struct kni_dev *priv = netdev_priv(dev); - - priv->lad_dev->ethtool_ops->get_wol(priv->lad_dev, wol); -} - -static int -kni_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->set_wol(priv->lad_dev, wol); -} - -static int -kni_nway_reset(struct net_device *dev) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->nway_reset(priv->lad_dev); -} - -static int -kni_get_eeprom_len(struct net_device *dev) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->get_eeprom_len(priv->lad_dev); -} - -static int -kni_get_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, - u8 *bytes) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->get_eeprom(priv->lad_dev, eeprom, - bytes); -} - -static int -kni_set_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, - u8 *bytes) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->set_eeprom(priv->lad_dev, eeprom, - bytes); -} - -static void -kni_get_ringparam(struct net_device *dev, struct ethtool_ringparam *ring) -{ - struct kni_dev *priv = netdev_priv(dev); - - priv->lad_dev->ethtool_ops->get_ringparam(priv->lad_dev, ring); -} - -static int -kni_set_ringparam(struct net_device *dev, struct ethtool_ringparam *ring) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->set_ringparam(priv->lad_dev, ring); -} - -static void -kni_get_pauseparam(struct net_device *dev, struct ethtool_pauseparam *pause) -{ - struct kni_dev *priv = netdev_priv(dev); - - priv->lad_dev->ethtool_ops->get_pauseparam(priv->lad_dev, pause); -} - -static int -kni_set_pauseparam(struct net_device *dev, struct ethtool_pauseparam *pause) -{ - struct kni_dev *priv = netdev_priv(dev); - - return priv->lad_dev->ethtool_ops->set_pauseparam(priv->lad_dev, - pause); -} - -static u32 -kni_get_m
Re: [dpdk-dev] [PATCH] kni: implement header_ops parse method
Hi Ferruh, header_ops.parse method is used by raw-sockets to fill sockaddr_ll structure. It is used, for example, in isisd for frrouting. Regarding your question about eth_header_ops – I, unfortunately, don't know why .cache and .cache_update are disabled for KNI. I also think that it will be better to use default eth_header_ops. Best regards, Igor On Tue, Oct 2, 2018 at 7:58 PM Ferruh Yigit wrote: > On 9/27/2018 1:02 AM, Igor Ryzhov wrote: > > Signed-off-by: Igor Ryzhov > > Hi Igor, > > What is the motivation to add this support? What is enabled by this? > > > Meanwhile, why we are not using eth_header_ops, which is already set by > ether_setup(). > To disable .cache & .cache_update? > > If so why not using relevant eth_header_ops (eth_header, eth_header_parse > ..) > instead of implementing ours? > > > --- > > kernel/linux/kni/kni_net.c | 14 ++ > > 1 file changed, 14 insertions(+) > > > > diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c > > index 7fcfa106c..128a5477c 100644 > > --- a/kernel/linux/kni/kni_net.c > > +++ b/kernel/linux/kni/kni_net.c > > @@ -678,6 +678,19 @@ kni_net_header(struct sk_buff *skb, struct > net_device *dev, > > return dev->hard_header_len; > > } > > > > +/* > > + * Extract hardware address from packet > > + */ > > +static int > > +kni_net_header_parse(const struct sk_buff *skb, unsigned char *haddr) > > +{ > > + const struct ethhdr *eth = eth_hdr(skb); > > + > > + memcpy(haddr, eth->h_source, ETH_ALEN); > > + > > + return ETH_ALEN; > > +} > > + > > /* > > * Re-fill the eth header > > */ > > @@ -739,6 +752,7 @@ kni_net_change_carrier(struct net_device *dev, bool > new_carrier) > > > > static const struct header_ops kni_net_header_ops = { > > .create = kni_net_header, > > + .parse = kni_net_header_parse, > > #ifdef HAVE_REBUILD_HEADER > > .rebuild = kni_net_rebuild_header, > > #endif /* < 4.1.0 */ > > > >
Re: [dpdk-dev] [PATCH] kni: implement header_ops parse method
Hello Stephen, I looked at KNI code again, memcpy is already used everywhere (kni_net_header, kni_net_rebuild_header) for MAC address copy. So I propose to accept the patch as it is. memcpy can be replaced by ether_addr_copy in all functions at once in separate patch. Igor On Sat, Sep 29, 2018 at 10:19 PM Igor Ryzhov wrote: > It's just exact copy of eth_header_parse function from Linux kernel. > > No problem, can do that with ether_addr_copy. > > On Sat, Sep 29, 2018 at 10:22 AM Stephen Hemminger < > step...@networkplumber.org> wrote: > >> On Thu, 27 Sep 2018 03:02:24 +0300 >> Igor Ryzhov wrote: >> >> > +/* >> > + * Extract hardware address from packet >> > + */ >> > +static int >> > +kni_net_header_parse(const struct sk_buff *skb, unsigned char *haddr) >> > +{ >> > + const struct ethhdr *eth = eth_hdr(skb); >> > + >> > + memcpy(haddr, eth->h_source, ETH_ALEN); >> > + >> > + return ETH_ALEN; >> > +} >> >> Kernel has function ether_addr_copy which is marginally faster and >> commonly used. >> >
Re: [dpdk-dev] [PATCH] kni: implement header_ops parse method
It's just exact copy of eth_header_parse function from Linux kernel. No problem, can do that with ether_addr_copy. On Sat, Sep 29, 2018 at 10:22 AM Stephen Hemminger < step...@networkplumber.org> wrote: > On Thu, 27 Sep 2018 03:02:24 +0300 > Igor Ryzhov wrote: > > > +/* > > + * Extract hardware address from packet > > + */ > > +static int > > +kni_net_header_parse(const struct sk_buff *skb, unsigned char *haddr) > > +{ > > + const struct ethhdr *eth = eth_hdr(skb); > > + > > + memcpy(haddr, eth->h_source, ETH_ALEN); > > + > > + return ETH_ALEN; > > +} > > Kernel has function ether_addr_copy which is marginally faster and > commonly used. >
Re: [dpdk-dev] [PATCH v2 1/5] kni: add API to set link status on kernel interface
Hi Dan, Ferruh, Why do we need "struct rte_eth_link" as a parameter at all? Only link status is used in the function – let's use it only: rte_kni_update_link(struct rte_kni *kni, int link_status) /* 0 – down, 1 – up */ It will also solve your differences as we won't have any "redundant" information to print :) Best regards, Igor On Fri, Sep 28, 2018 at 2:52 AM Dan Gora wrote: > On Thu, Sep 27, 2018 at 8:44 PM, Ferruh Yigit > wrote: > >> Well, yes the link_status (link up, link down) _is_ applied to the KNI > >> interface. When that occurs, most people want to know what the link > >> speed is that the link came up at. > > > > +1 to this, people would like to know link speed of the interface. > > Are you printing link speed of interface? You are printing whatever user > pass to > > API. > > There is no such thing as "link speed of the interface". The link > speed is the speed of the underlying Ethernet link that the interface > corresponds to. This is true for all other ethernet interfaces in the > kernel. > > > I guess you trust to user to provide correct values there, but since > only link > > up & down matters, what prevents user to leave other fields, like speed, > just > > random values? > > Nothing. What prevents anyone from providing random values for > anything? The point of the API was to make it super simple, just: > > rte_eth_link_get_nowait(portid, &link); > rte_kni_update_link(p[portid]->kni[i], &link); > > No messing around with the link info retrieved from > rte_eth_link_get(_nowait), just dump in the struct that was returned. >
[dpdk-dev] [PATCH] kni: implement header_ops parse method
Signed-off-by: Igor Ryzhov --- kernel/linux/kni/kni_net.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c index 7fcfa106c..128a5477c 100644 --- a/kernel/linux/kni/kni_net.c +++ b/kernel/linux/kni/kni_net.c @@ -678,6 +678,19 @@ kni_net_header(struct sk_buff *skb, struct net_device *dev, return dev->hard_header_len; } +/* + * Extract hardware address from packet + */ +static int +kni_net_header_parse(const struct sk_buff *skb, unsigned char *haddr) +{ + const struct ethhdr *eth = eth_hdr(skb); + + memcpy(haddr, eth->h_source, ETH_ALEN); + + return ETH_ALEN; +} + /* * Re-fill the eth header */ @@ -739,6 +752,7 @@ kni_net_change_carrier(struct net_device *dev, bool new_carrier) static const struct header_ops kni_net_header_ops = { .create = kni_net_header, + .parse = kni_net_header_parse, #ifdef HAVE_REBUILD_HEADER .rebuild = kni_net_rebuild_header, #endif /* < 4.1.0 */ -- 2.19.0
[dpdk-dev] [PATCH v3] kni: dynamically allocate memory for each KNI
Long time ago preallocation of memory for KNI was introduced in commit 0c6bc8e. It was done because of lack of ability to free previously allocated memzones, which led to memzone exhaustion. Currently memzones can be freed and this patch uses this ability for dynamic KNI memory allocation. Signed-off-by: Igor Ryzhov --- v3: * style fixes v2: * allocate KNI using rte_zmalloc * swap reserve/release functions * use "kni" as a variable name * use macros for memzone names lib/librte_kni/rte_kni.c | 496 +-- lib/librte_kni/rte_kni.h | 6 +- test/test/test_kni.c | 6 - 3 files changed, 218 insertions(+), 290 deletions(-) diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index 8a8f6c1cc..8b68008bb 100644 --- a/lib/librte_kni/rte_kni.c +++ b/lib/librte_kni/rte_kni.c @@ -18,6 +18,9 @@ #include #include #include +#include +#include +#include #include #include "rte_kni_fifo.h" @@ -30,7 +33,23 @@ #define KNI_REQUEST_MBUF_NUM_MAX 32 -#define KNI_MEM_CHECK(cond) do { if (cond) goto kni_fail; } while (0) +#define KNI_MEM_CHECK(cond, fail) do { if (cond) goto fail; } while (0) + +#define KNI_MZ_NAME_FMT"kni_info_%s" +#define KNI_TX_Q_MZ_NAME_FMT "kni_tx_%s" +#define KNI_RX_Q_MZ_NAME_FMT "kni_rx_%s" +#define KNI_ALLOC_Q_MZ_NAME_FMT"kni_alloc_%s" +#define KNI_FREE_Q_MZ_NAME_FMT "kni_free_%s" +#define KNI_REQ_Q_MZ_NAME_FMT "kni_req_%s" +#define KNI_RESP_Q_MZ_NAME_FMT "kni_resp_%s" +#define KNI_SYNC_ADDR_MZ_NAME_FMT "kni_sync_%s" + +TAILQ_HEAD(rte_kni_list, rte_tailq_entry); + +static struct rte_tailq_elem rte_kni_tailq = { + .name = "RTE_KNI", +}; +EAL_REGISTER_TAILQ(rte_kni_tailq) /** * KNI context @@ -42,18 +61,26 @@ struct rte_kni { struct rte_mempool *pktmbuf_pool; /**< pkt mbuf mempool */ unsigned mbuf_size; /**< mbuf size */ + const struct rte_memzone *m_tx_q; /**< TX queue memzone */ + const struct rte_memzone *m_rx_q; /**< RX queue memzone */ + const struct rte_memzone *m_alloc_q;/**< Alloc queue memzone */ + const struct rte_memzone *m_free_q; /**< Free queue memzone */ + struct rte_kni_fifo *tx_q; /**< TX queue */ struct rte_kni_fifo *rx_q; /**< RX queue */ struct rte_kni_fifo *alloc_q; /**< Allocated mbufs queue */ struct rte_kni_fifo *free_q;/**< To be freed mbufs queue */ + const struct rte_memzone *m_req_q; /**< Request queue memzone */ + const struct rte_memzone *m_resp_q; /**< Response queue memzone */ + const struct rte_memzone *m_sync_addr;/**< Sync addr memzone */ + /* For request & response */ struct rte_kni_fifo *req_q; /**< Request queue */ struct rte_kni_fifo *resp_q;/**< Response queue */ void * sync_addr; /**< Req/Resp Mem address */ struct rte_kni_ops ops; /**< operations for request */ - uint8_t in_use : 1; /**< kni in use */ }; enum kni_ops_status { @@ -61,231 +88,111 @@ enum kni_ops_status { KNI_REQ_REGISTERED, }; -/** - * KNI memzone pool slot - */ -struct rte_kni_memzone_slot { - uint32_t id; - uint8_t in_use : 1;/**< slot in use */ - - /* Memzones */ - const struct rte_memzone *m_ctx; /**< KNI ctx */ - const struct rte_memzone *m_tx_q; /**< TX queue */ - const struct rte_memzone *m_rx_q; /**< RX queue */ - const struct rte_memzone *m_alloc_q; /**< Allocated mbufs queue */ - const struct rte_memzone *m_free_q;/**< To be freed mbufs queue */ - const struct rte_memzone *m_req_q; /**< Request queue */ - const struct rte_memzone *m_resp_q;/**< Response queue */ - const struct rte_memzone *m_sync_addr; - - /* Free linked list */ - struct rte_kni_memzone_slot *next; /**< Next slot link.list */ -}; - -/** - * KNI memzone pool - */ -struct rte_kni_memzone_pool { - uint8_t initialized : 1;/**< Global KNI pool init flag */ - - uint32_t max_ifaces;/**< Max. num of KNI ifaces */ - struct rte_kni_memzone_slot *slots;/**< Pool slots */ - rte_spinlock_t mutex; /**< alloc/release mutex */ - - /* Free memzone slots linked-list */ - struct rte_kni_memzone_slot *free; /**< First empty slot */ - struct rte_kni_memzone_slot *free_tail;/**< Last empty slot */ -}; - - static void kni_free_mbufs(struct rte_kni *kni); static void kni_allocate_mbufs(struct rte_kni *kni); static v
Re: [dpdk-dev] [PATCH v2] kni: dynamically allocate memory for each KNI
Hi Ferruh, Will fix it today. Igor On Wed, Sep 26, 2018 at 1:41 PM Ferruh Yigit wrote: > On 9/23/2018 8:12 PM, Igor Ryzhov wrote: > > Long time ago preallocation of memory for KNI was introduced in commit > > 0c6bc8e. It was done because of lack of ability to free previously > > allocated memzones, which led to memzone exhaustion. Currently memzones > > can be freed and this patch uses this ability for dynamic KNI memory > > allocation. > > Hi Igor, > > Good cleanup, thanks. > +1 to eal_tailq for ctx > > A few minor comments below, but they are not significant enough to block > the > patch, please let us know if you don't have bandwidth for a new version. > > > Signed-off-by: Igor Ryzhov > <...> > > > @@ -294,41 +180,52 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool, > > { > > int ret; > > struct rte_kni_device_info dev_info; > > - struct rte_kni *ctx; > > - char intf_name[RTE_KNI_NAMESIZE]; > > - const struct rte_memzone *mz; > > - struct rte_kni_memzone_slot *slot = NULL; > > + struct rte_kni *kni; > > + struct rte_tailq_entry *te = NULL; > > + struct rte_kni_list *kni_list; > > + > > + kni_list = RTE_TAILQ_CAST(rte_kni_tailq.head, rte_kni_list); > > Can you move this below input validation, no need this assignment if API > will > fail because of wrong input. > > > if (!pktmbuf_pool || !conf || !conf->name[0]) > > return NULL; > > > > /* Check if KNI subsystem has been initialized */ > > - if (kni_memzone_pool.initialized != 1) { > > + if (kni_fd < 0) { > > RTE_LOG(ERR, KNI, "KNI subsystem has not been initialized. > Invoke rte_kni_init() first\n"); > > return NULL; > > } > > > > - /* Get an available slot from the pool */ > > - slot = kni_memzone_pool_alloc(); > > - if (!slot) { > > - RTE_LOG(ERR, KNI, "Cannot allocate more KNI interfaces; > increase the number of max_kni_ifaces(current %d) or release unused > ones.\n", > > - kni_memzone_pool.max_ifaces); > > - return NULL; > > + rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); > > + > > + TAILQ_FOREACH(te, kni_list, next) { > > + kni = (struct rte_kni *) te->data; > > + if (strncmp(conf->name, kni->name, RTE_KNI_NAMESIZE) == 0) > > + break; > > } > > This is rte_kni_get(), why not reuse it. You can create an version of it > without > lock. > > like _rte_kni_get() which you can call here. > > And > rte_kni_get() > rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK); > _rte_kni_get() > rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK); > > <...> > > > + > > + if (te == NULL) { > > + goto unlock; > > + } > > No need {} for single line. > One more below. >
[dpdk-dev] [PATCH v2] kni: dynamically allocate memory for each KNI
Long time ago preallocation of memory for KNI was introduced in commit 0c6bc8e. It was done because of lack of ability to free previously allocated memzones, which led to memzone exhaustion. Currently memzones can be freed and this patch uses this ability for dynamic KNI memory allocation. Signed-off-by: Igor Ryzhov --- v2: * allocate KNI using rte_zmalloc * swap reserve/release functions * use "kni" as a variable name * use macros for memzone names lib/librte_kni/rte_kni.c | 502 +-- lib/librte_kni/rte_kni.h | 6 +- test/test/test_kni.c | 6 - 3 files changed, 218 insertions(+), 296 deletions(-) diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index 8a8f6c1cc..6af6e7efe 100644 --- a/lib/librte_kni/rte_kni.c +++ b/lib/librte_kni/rte_kni.c @@ -18,6 +18,9 @@ #include #include #include +#include +#include +#include #include #include "rte_kni_fifo.h" @@ -30,7 +33,23 @@ #define KNI_REQUEST_MBUF_NUM_MAX 32 -#define KNI_MEM_CHECK(cond) do { if (cond) goto kni_fail; } while (0) +#define KNI_MEM_CHECK(cond, fail) do { if (cond) goto fail; } while (0) + +#define KNI_MZ_NAME_FMT"kni_info_%s" +#define KNI_TX_Q_MZ_NAME_FMT "kni_tx_%s" +#define KNI_RX_Q_MZ_NAME_FMT "kni_rx_%s" +#define KNI_ALLOC_Q_MZ_NAME_FMT"kni_alloc_%s" +#define KNI_FREE_Q_MZ_NAME_FMT "kni_free_%s" +#define KNI_REQ_Q_MZ_NAME_FMT "kni_req_%s" +#define KNI_RESP_Q_MZ_NAME_FMT "kni_resp_%s" +#define KNI_SYNC_ADDR_MZ_NAME_FMT "kni_sync_%s" + +TAILQ_HEAD(rte_kni_list, rte_tailq_entry); + +static struct rte_tailq_elem rte_kni_tailq = { + .name = "RTE_KNI", +}; +EAL_REGISTER_TAILQ(rte_kni_tailq) /** * KNI context @@ -42,18 +61,26 @@ struct rte_kni { struct rte_mempool *pktmbuf_pool; /**< pkt mbuf mempool */ unsigned mbuf_size; /**< mbuf size */ + const struct rte_memzone *m_tx_q; /**< TX queue memzone */ + const struct rte_memzone *m_rx_q; /**< RX queue memzone */ + const struct rte_memzone *m_alloc_q;/**< Alloc queue memzone */ + const struct rte_memzone *m_free_q; /**< Free queue memzone */ + struct rte_kni_fifo *tx_q; /**< TX queue */ struct rte_kni_fifo *rx_q; /**< RX queue */ struct rte_kni_fifo *alloc_q; /**< Allocated mbufs queue */ struct rte_kni_fifo *free_q;/**< To be freed mbufs queue */ + const struct rte_memzone *m_req_q; /**< Request queue memzone */ + const struct rte_memzone *m_resp_q; /**< Response queue memzone */ + const struct rte_memzone *m_sync_addr;/**< Sync addr memzone */ + /* For request & response */ struct rte_kni_fifo *req_q; /**< Request queue */ struct rte_kni_fifo *resp_q;/**< Response queue */ void * sync_addr; /**< Req/Resp Mem address */ struct rte_kni_ops ops; /**< operations for request */ - uint8_t in_use : 1; /**< kni in use */ }; enum kni_ops_status { @@ -61,232 +88,91 @@ enum kni_ops_status { KNI_REQ_REGISTERED, }; -/** - * KNI memzone pool slot - */ -struct rte_kni_memzone_slot { - uint32_t id; - uint8_t in_use : 1;/**< slot in use */ - - /* Memzones */ - const struct rte_memzone *m_ctx; /**< KNI ctx */ - const struct rte_memzone *m_tx_q; /**< TX queue */ - const struct rte_memzone *m_rx_q; /**< RX queue */ - const struct rte_memzone *m_alloc_q; /**< Allocated mbufs queue */ - const struct rte_memzone *m_free_q;/**< To be freed mbufs queue */ - const struct rte_memzone *m_req_q; /**< Request queue */ - const struct rte_memzone *m_resp_q;/**< Response queue */ - const struct rte_memzone *m_sync_addr; - - /* Free linked list */ - struct rte_kni_memzone_slot *next; /**< Next slot link.list */ -}; - -/** - * KNI memzone pool - */ -struct rte_kni_memzone_pool { - uint8_t initialized : 1;/**< Global KNI pool init flag */ - - uint32_t max_ifaces;/**< Max. num of KNI ifaces */ - struct rte_kni_memzone_slot *slots;/**< Pool slots */ - rte_spinlock_t mutex; /**< alloc/release mutex */ - - /* Free memzone slots linked-list */ - struct rte_kni_memzone_slot *free; /**< First empty slot */ - struct rte_kni_memzone_slot *free_tail;/**< Last empty slot */ -}; - - static void kni_free_mbufs(struct rte_kni *kni); static void kni_allocate_mbufs(struct rte_kni *kni); static v
Re: [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface
Hi again, Forgot to mention netif_carrier_off changes. Your changes are necessary for correct link status management, but netif_carrier_off call should also be added to kni_ioctl_create before the register_netdev call. That's needed for correct link status even before first call to ndo_open. Best regards, Igor On Thu, Aug 30, 2018 at 12:49 PM, Igor Ryzhov wrote: > Hi Dan, > > We use KNI device exactly the same way you described – with IP addresses, > routing, etc. > And we also faced the same problem of having the actual link status in > Linux kernel. > > There is a special callback for link state management in net_device_ops > for soft-devices like KNI called ndo_change_carrier. > Current KNI driver implements it already, you just need to write to > /sys/class/net//carrier to change link status. > > Right now we implement it on application side, but I think it'll be good > to have this in rte_kni API. > > Here is our implementation: > > static int > linux_set_carrier(const char *name, int status) > { > char path[64]; > const char *carrier = status ? "1" : "0"; > int fd, ret; > > sprintf(path, "/sys/devices/virtual/net/%s/carrier", name); > fd = open(path, O_WRONLY); > if (fd == -1) { > return -errno; > } > > ret = write(fd, carrier, 2); > if (ret == -1) { > close(fd); > return -errno; > } > > close(fd); > > return 0; > } > > Best regards, > Igor > > On Thu, Aug 30, 2018 at 2:10 AM, Stephen Hemminger < > step...@networkplumber.org> wrote: > >> On Wed, 29 Aug 2018 19:41:23 -0300 >> Dan Gora wrote: >> >> > On Wed, Aug 29, 2018 at 7:12 PM, Dan Gora wrote: >> > > On Wed, Aug 29, 2018 at 7:00 PM, Stephen Hemminger >> > > wrote: >> > >>> >> Add a new API function to KNI, rte_kni_update_link() to allow >> DPDK >> > >>> >> applications to update the link state for the KNI network >> interfaces >> > >>> >> in the linux kernel. >> > >>> >> >> > >>> >> Note that the default carrier state is set to off when the >> interface >> > >>> >> is opened. >> > >>> >> >> > >>> >> Signed-off-by: Dan Gora >> > >>> > >> > >>> > Do you really need a special ioctl for this? >> > >>> > There is already ability to set link state via sysfs or netlink. >> > >>> >> > >>> I think yes.. AFAIK sysfs does not constitute a stable API; >> > >> >> > >> It is a stable API on Linux. >> > > >> > >> > Actually this does not seem to be completely true... >> > >> > From Documentation/admin-guide/sysfs-rules.rst: >> > >> > Rules on how to access information in sysfs >> > === >> > >> > The kernel-exported sysfs exports internal kernel implementation details >> > and depends on internal kernel structures and layout. It is agreed upon >> > by the kernel developers that the Linux kernel does not provide a stable >> > internal API. Therefore, there are aspects of the sysfs interface that >> > may not be stable across kernel releases. >> > >> > >> > >> > - devices are only "devices" >> > There is no such thing like class-, bus-, physical devices, >> > interfaces, and such that you can rely on in userspace. Everything >> is >> > just simply a "device". Class-, bus-, physical, ... types are just >> > kernel implementation details which should not be expected by >> > applications that look for devices in sysfs. >> > >> > The properties of a device are: >> > >> > - devpath (``/devices/pci:00/:00:1d.1/usb2/2-2/2-2:1.0``) >> > >> > >> > - kernel name (``sda``, ``tty``, ``:00:1f.2``, ...) >> > >> > >> > - subsystem (``block``, ``tty``, ``pci``, ...) >> > >> > >> > - driver (``tg3``, ``ata_piix``, ``uhci_hcd``) >> > >> > >> > - attributes >> > >> > >> > Everything else is just a kernel driver-core implementation detail >> > that should not be assumed to be stable across kernel releases. >> >> Network device sysfs is stable. No one ever got around to putting it in >> documentation >> I wouldn't worry, once anything in /sys/class/net is added it is not >> going to change without major breakage in many many tools. >> > >
Re: [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface
Hi Dan, We use KNI device exactly the same way you described – with IP addresses, routing, etc. And we also faced the same problem of having the actual link status in Linux kernel. There is a special callback for link state management in net_device_ops for soft-devices like KNI called ndo_change_carrier. Current KNI driver implements it already, you just need to write to /sys/class/net//carrier to change link status. Right now we implement it on application side, but I think it'll be good to have this in rte_kni API. Here is our implementation: static int linux_set_carrier(const char *name, int status) { char path[64]; const char *carrier = status ? "1" : "0"; int fd, ret; sprintf(path, "/sys/devices/virtual/net/%s/carrier", name); fd = open(path, O_WRONLY); if (fd == -1) { return -errno; } ret = write(fd, carrier, 2); if (ret == -1) { close(fd); return -errno; } close(fd); return 0; } Best regards, Igor On Thu, Aug 30, 2018 at 2:10 AM, Stephen Hemminger < step...@networkplumber.org> wrote: > On Wed, 29 Aug 2018 19:41:23 -0300 > Dan Gora wrote: > > > On Wed, Aug 29, 2018 at 7:12 PM, Dan Gora wrote: > > > On Wed, Aug 29, 2018 at 7:00 PM, Stephen Hemminger > > > wrote: > > >>> >> Add a new API function to KNI, rte_kni_update_link() to allow DPDK > > >>> >> applications to update the link state for the KNI network > interfaces > > >>> >> in the linux kernel. > > >>> >> > > >>> >> Note that the default carrier state is set to off when the > interface > > >>> >> is opened. > > >>> >> > > >>> >> Signed-off-by: Dan Gora > > >>> > > > >>> > Do you really need a special ioctl for this? > > >>> > There is already ability to set link state via sysfs or netlink. > > >>> > > >>> I think yes.. AFAIK sysfs does not constitute a stable API; > > >> > > >> It is a stable API on Linux. > > > > > > > Actually this does not seem to be completely true... > > > > From Documentation/admin-guide/sysfs-rules.rst: > > > > Rules on how to access information in sysfs > > === > > > > The kernel-exported sysfs exports internal kernel implementation details > > and depends on internal kernel structures and layout. It is agreed upon > > by the kernel developers that the Linux kernel does not provide a stable > > internal API. Therefore, there are aspects of the sysfs interface that > > may not be stable across kernel releases. > > > > > > > > - devices are only "devices" > > There is no such thing like class-, bus-, physical devices, > > interfaces, and such that you can rely on in userspace. Everything is > > just simply a "device". Class-, bus-, physical, ... types are just > > kernel implementation details which should not be expected by > > applications that look for devices in sysfs. > > > > The properties of a device are: > > > > - devpath (``/devices/pci:00/:00:1d.1/usb2/2-2/2-2:1.0``) > > > > > > - kernel name (``sda``, ``tty``, ``:00:1f.2``, ...) > > > > > > - subsystem (``block``, ``tty``, ``pci``, ...) > > > > > > - driver (``tg3``, ``ata_piix``, ``uhci_hcd``) > > > > > > - attributes > > > > > > Everything else is just a kernel driver-core implementation detail > > that should not be assumed to be stable across kernel releases. > > Network device sysfs is stable. No one ever got around to putting it in > documentation > I wouldn't worry, once anything in /sys/class/net is added it is not going > to change without major breakage in many many tools. >
Re: [dpdk-dev] [PATCH] kni: dynamically allocate memory for each KNI
Hello Ferruh, Thanks for the review, comments inline. On Mon, Aug 27, 2018 at 8:06 PM, Ferruh Yigit wrote: > On 8/2/2018 3:25 PM, Igor Ryzhov wrote: > > Long time ago preallocation of memory for KNI was introduced in commit > > 0c6bc8e. It was done because of lack of ability to free previously > > allocated memzones, which led to memzone exhaustion. Currently memzones > > can be freed and this patch uses this ability for dynamic KNI memory > > allocation. > > Hi Igor, > > It is good to be able to allocate memory dynamically and get rid of the > "max_kni_ifaces" and "kni_memzone_pool", thanks for the patch. > > Overall looks good, a few comments below. > > > > > Signed-off-by: Igor Ryzhov > > --- > > lib/librte_kni/rte_kni.c | 392 --- > > lib/librte_kni/rte_kni.h | 6 +- > > test/test/test_kni.c | 6 - > > 3 files changed, 128 insertions(+), 276 deletions(-) > > > > diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c > > index 8a8f6c1cc..028b44bfd 100644 > > --- a/lib/librte_kni/rte_kni.c > > +++ b/lib/librte_kni/rte_kni.c > > @@ -36,24 +36,33 @@ > > * KNI context > > */ > > struct rte_kni { > > + const struct rte_memzone *mz; /**< KNI context memzone */ > > I was thinking remove the context memzone and use rte_zmalloc() to create > kni > objects but updated rte_kni_get() API seems relaying this. > If you see any other way to get kni object from name in rte_kni_get(), I > am for > removing above *mz variable from rte_kni struct. > I had absolutely the same thought but didn't find a way to save rte_kni_get() API. Maybe someone has any ideas? Or maybe this API can be marked deprecated and deleted in future? > > <...> > > > +static void > > +kni_ctx_release_mz(struct rte_kni *ctx) > > +{ > > + rte_memzone_free(ctx->m_tx_q); > > + rte_memzone_free(ctx->m_rx_q); > > + rte_memzone_free(ctx->m_alloc_q); > > + rte_memzone_free(ctx->m_free_q); > > + rte_memzone_free(ctx->m_req_q); > > + rte_memzone_free(ctx->m_resp_q); > > + rte_memzone_free(ctx->m_sync_addr); > > > "ctx" sounds confusing to me, isn't this "rte_kni" object instance, why > not just > call it "kni" or if it is too generic "kni_obj" or similar? For other APIs > as well. > "ctx" was already used in the code, so I didn't change it. I also think that it's better to use "kni" – will change it in v2. > > And this is just a detail but about order of APIs would you mind having > first > reserve() one, later release() one? > Ok. > > <...> > > > -/* Shall be called before any allocation happens */ > > -void > > -rte_kni_init(unsigned int max_kni_ifaces) > > +static struct rte_kni * > > +kni_ctx_reserve(const char *name) > > { > > - uint32_t i; > > - struct rte_kni_memzone_slot *it; > > + struct rte_kni *ctx; > > const struct rte_memzone *mz; > > -#define OBJNAMSIZ 32 > > - char obj_name[OBJNAMSIZ]; > > char mz_name[RTE_MEMZONE_NAMESIZE]; > > > > - /* Immediately return if KNI is already initialized */ > > - if (kni_memzone_pool.initialized) { > > - RTE_LOG(WARNING, KNI, "Double call to rte_kni_init()"); > > - return; > > - } > > + snprintf(mz_name, RTE_MEMZONE_NAMESIZE, "kni_info_%s", name); > > Can you please convert memzone names, like "kni_info" to defines, for all > of them? > Ok. > > <...> > > > @@ -81,8 +81,12 @@ struct rte_kni_conf { > > * > > * @param max_kni_ifaces > > * The maximum number of KNI interfaces that can coexist concurrently > > + * > > + * @return > > + * - 0 indicates success. > > + * - negative value indicates failure. > > */ > > -void rte_kni_init(unsigned int max_kni_ifaces); > > +int rte_kni_init(unsigned int max_kni_ifaces); > > This changes the API. Return type changes from "void" to "int". I agree > "int" > makes more sense since API can fail, but this changes the ABI/API. > > Since existing binaries doesn't check the return type at all there may be > no > issue from ABI point of view but from API point of view some apps may get > return > value not checked warnings, not sure though. > > And the need of the API is questionable at this stage, it may be possible > to > move rte_kni_alloc() whe
[dpdk-dev] [PATCH] kni: dynamically allocate memory for each KNI
Long time ago preallocation of memory for KNI was introduced in commit 0c6bc8e. It was done because of lack of ability to free previously allocated memzones, which led to memzone exhaustion. Currently memzones can be freed and this patch uses this ability for dynamic KNI memory allocation. Signed-off-by: Igor Ryzhov --- lib/librte_kni/rte_kni.c | 392 --- lib/librte_kni/rte_kni.h | 6 +- test/test/test_kni.c | 6 - 3 files changed, 128 insertions(+), 276 deletions(-) diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index 8a8f6c1cc..028b44bfd 100644 --- a/lib/librte_kni/rte_kni.c +++ b/lib/librte_kni/rte_kni.c @@ -36,24 +36,33 @@ * KNI context */ struct rte_kni { + const struct rte_memzone *mz; /**< KNI context memzone */ char name[RTE_KNI_NAMESIZE];/**< KNI interface name */ uint16_t group_id; /**< Group ID of KNI devices */ uint32_t slot_id; /**< KNI pool slot ID */ struct rte_mempool *pktmbuf_pool; /**< pkt mbuf mempool */ unsigned mbuf_size; /**< mbuf size */ + const struct rte_memzone *m_tx_q; /**< TX queue memzone */ + const struct rte_memzone *m_rx_q; /**< RX queue memzone */ + const struct rte_memzone *m_alloc_q;/**< Alloc queue memzone */ + const struct rte_memzone *m_free_q; /**< Free queue memzone */ + struct rte_kni_fifo *tx_q; /**< TX queue */ struct rte_kni_fifo *rx_q; /**< RX queue */ struct rte_kni_fifo *alloc_q; /**< Allocated mbufs queue */ struct rte_kni_fifo *free_q;/**< To be freed mbufs queue */ + const struct rte_memzone *m_req_q; /**< Request queue memzone */ + const struct rte_memzone *m_resp_q; /**< Response queue memzone */ + const struct rte_memzone *m_sync_addr;/**< Sync addr memzone */ + /* For request & response */ struct rte_kni_fifo *req_q; /**< Request queue */ struct rte_kni_fifo *resp_q;/**< Response queue */ void * sync_addr; /**< Req/Resp Mem address */ struct rte_kni_ops ops; /**< operations for request */ - uint8_t in_use : 1; /**< kni in use */ }; enum kni_ops_status { @@ -61,232 +70,110 @@ enum kni_ops_status { KNI_REQ_REGISTERED, }; -/** - * KNI memzone pool slot - */ -struct rte_kni_memzone_slot { - uint32_t id; - uint8_t in_use : 1;/**< slot in use */ - - /* Memzones */ - const struct rte_memzone *m_ctx; /**< KNI ctx */ - const struct rte_memzone *m_tx_q; /**< TX queue */ - const struct rte_memzone *m_rx_q; /**< RX queue */ - const struct rte_memzone *m_alloc_q; /**< Allocated mbufs queue */ - const struct rte_memzone *m_free_q;/**< To be freed mbufs queue */ - const struct rte_memzone *m_req_q; /**< Request queue */ - const struct rte_memzone *m_resp_q;/**< Response queue */ - const struct rte_memzone *m_sync_addr; - - /* Free linked list */ - struct rte_kni_memzone_slot *next; /**< Next slot link.list */ -}; - -/** - * KNI memzone pool - */ -struct rte_kni_memzone_pool { - uint8_t initialized : 1;/**< Global KNI pool init flag */ - - uint32_t max_ifaces;/**< Max. num of KNI ifaces */ - struct rte_kni_memzone_slot *slots;/**< Pool slots */ - rte_spinlock_t mutex; /**< alloc/release mutex */ - - /* Free memzone slots linked-list */ - struct rte_kni_memzone_slot *free; /**< First empty slot */ - struct rte_kni_memzone_slot *free_tail;/**< Last empty slot */ -}; - - static void kni_free_mbufs(struct rte_kni *kni); static void kni_allocate_mbufs(struct rte_kni *kni); static volatile int kni_fd = -1; -static struct rte_kni_memzone_pool kni_memzone_pool = { - .initialized = 0, -}; -static const struct rte_memzone * -kni_memzone_reserve(const char *name, size_t len, int socket_id, - unsigned flags) +/* Shall be called before any allocation happens */ +int +rte_kni_init(unsigned int max_kni_ifaces __rte_unused) { - const struct rte_memzone *mz = rte_memzone_lookup(name); + /* Check FD and open */ + if (kni_fd < 0) { + kni_fd = open("/dev/" KNI_DEVICE, O_RDWR); + if (kni_fd < 0) { + RTE_LOG(ERR, KNI, + "Can not open /dev/%s\n", KNI_DEVICE); + return -1; + } + } - if (mz == NULL) - mz = rte_memzone_reserve(name, len, socket_id, flags); + return 0; +} - re
Re: [dpdk-dev] [PATCH v2] net/i40e: fix flag sent to mac_address_write
Sure. Sent v3. On Fri, Jan 12, 2018 at 5:22 AM, Xing, Beilei wrote: > > > > -Original Message- > > From: Igor Ryzhov [mailto:iryz...@nfware.com] > > Sent: Thursday, January 11, 2018 6:07 PM > > To: dev@dpdk.org > > Cc: Xing, Beilei ; sta...@dpdk.org > > Subject: [PATCH v2] net/i40e: fix flag sent to mac_address_write > > > > Use the same value as in Linux driver. > > Sorry for missing the comment in v1 patch, but could you please detail the > commit log? > > > > > Fixes: e18e01e92c29 ("i40e: support default MAC address setting") > > Cc: sta...@dpdk.org > > > > Signed-off-by: Igor Ryzhov >
[dpdk-dev] [PATCH v3] net/i40e: fix flag sent to mac_address_write
Current flag is in wrong byte order. Use defined macro as it is done in Linux driver. Fixes: e18e01e92c29 ("i40e: support default MAC address setting") Cc: sta...@dpdk.org Signed-off-by: Igor Ryzhov --- v3: * fix commit message v2: * fix checkpatch warning (long line) * fix commit subject * add Fixes line * CC to stable --- drivers/net/i40e/i40e_ethdev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 285d92b..055b9e8 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -10795,8 +10795,8 @@ static void i40e_set_default_mac_addr(struct rte_eth_dev *dev, return; } - /* Flags: 0x3 updates port address */ - i40e_aq_mac_address_write(hw, 0x3, mac_addr->addr_bytes, NULL); + i40e_aq_mac_address_write(hw, I40E_AQC_WRITE_TYPE_LAA_WOL, + mac_addr->addr_bytes, NULL); } static int -- 2.6.4
Re: [dpdk-dev] [PATCH] net/i40e: fix VSI MAC filter on primary address change
Sent v2: https://dpdk.org/dev/patchwork/patch/33570/ On Thu, Jan 11, 2018 at 11:21 AM, Xing, Beilei wrote: > Hi Igor, > > > > Thanks for the catch, and glad to see your patchJ It resolves a potential > problem in PMD. > > The patch looks OK for me except some minor comments (in another mail > thread). > > > > Best Regards, > > Beilei > > > > *From:* Igor Ryzhov [mailto:iryz...@nfware.com] > *Sent:* Thursday, January 11, 2018 6:47 AM > *To:* Zhang, Helin > *Cc:* Xing, Beilei ; Olivier Matz < > olivier.m...@6wind.com>; dev@dpdk.org; Wu, Jingjing ; > sta...@dpdk.org; Laurent Hardy > > *Subject:* Re: [dpdk-dev] [PATCH] net/i40e: fix VSI MAC filter on primary > address change > > > > Hello everyone. > > > > It's sad that my comments were unanswered. > > I'm ok with the first two – they were mostly style-related. > > But I made an investigation on the third one and it is a bug. > > > > This is a description (from X710 datasheet) of flags sent to > mac_address_write command: > > > > By bits: > > 0-7 – Reserved > > 8 – MAC_MAG_EN > > 9 – LAA_WOL_PRESERVE > > 10-13 – Reserved > > 14-15 – Write type (00 – Update LAA only, 01 – Update LAA and WOL, 10 – > Update port address, 11 – Reserved, but used in Linux to enable multicast > magic packet wake up) > > > > Current code uses 0x3 flag, apparently trying to update LAA, WOL and port > address, but it sets first two bits instead of last two. > > These bits are reserved, that's why it doesn't break anything. > > At the same time, last two bits are set to zero, and the command changes > LAA address only – it's enough to work in simple case. > > > > The last question – which flag is correct to use – 01 (LAA + WOL) or 11 > (LAA + WOL + port). > > Linux driver uses the first one, and here is the patch to fix the issue: > > https://dpdk.org/dev/patchwork/patch/33524/ > > > > Best regards, > > Igor > > > > On Wed, Jan 10, 2018 at 4:57 PM, Zhang, Helin > wrote: > > > > > -Original Message- > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Xing, Beilei > > Sent: Thursday, January 4, 2018 1:39 PM > > To: Olivier Matz; dev@dpdk.org; Wu, Jingjing > > Cc: sta...@dpdk.org; Laurent Hardy > > Subject: Re: [dpdk-dev] [PATCH] net/i40e: fix VSI MAC filter on primary > address > > change > > > > > > > > > -Original Message- > > > From: Olivier Matz [mailto:olivier.m...@6wind.com] > > > Sent: Wednesday, January 3, 2018 10:29 PM > > > To: dev@dpdk.org; Wu, Jingjing ; Xing, Beilei > > > > > > Cc: sta...@dpdk.org; Laurent Hardy > > > Subject: [PATCH] net/i40e: fix VSI MAC filter on primary address > > > change > > > > > > When primary address mac is changed, the mac filters were not updated > > > in the VSI with the new mac addr and incoming packets with this > > > destination address are dropped by the hardware filters. > > > > > > This patch removes the VSI mac filter for the previous mac address and > > > adds a new one for new mac address. > > > > > > Fixes: e18e01e92c29 ("i40e: support default MAC address setting") > > > Cc: sta...@dpdk.org > > > > > > Signed-off-by: Laurent Hardy > > > Signed-off-by: Olivier Matz > > > > Thanks for the fix. > > Acked-by: Beilei Xing > > Applied to dpdk-next-net-intel, thanks! > > /Helin > > >
[dpdk-dev] [PATCH v2] net/i40e: fix flag sent to mac_address_write
Use the same value as in Linux driver. Fixes: e18e01e92c29 ("i40e: support default MAC address setting") Cc: sta...@dpdk.org Signed-off-by: Igor Ryzhov --- v2: * fix checkpatch warning (long line) * fix commit subject * add Fixes line * CC to stable --- drivers/net/i40e/i40e_ethdev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 285d92b..055b9e8 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -10795,8 +10795,8 @@ static void i40e_set_default_mac_addr(struct rte_eth_dev *dev, return; } - /* Flags: 0x3 updates port address */ - i40e_aq_mac_address_write(hw, 0x3, mac_addr->addr_bytes, NULL); + i40e_aq_mac_address_write(hw, I40E_AQC_WRITE_TYPE_LAA_WOL, + mac_addr->addr_bytes, NULL); } static int -- 2.6.4
Re: [dpdk-dev] [PATCH] net/i40e: fix VSI MAC filter on primary address change
Hello everyone. It's sad that my comments were unanswered. I'm ok with the first two – they were mostly style-related. But I made an investigation on the third one and it is a bug. This is a description (from X710 datasheet) of flags sent to mac_address_write command: By bits: 0-7 – Reserved 8 – MAC_MAG_EN 9 – LAA_WOL_PRESERVE 10-13 – Reserved 14-15 – Write type (00 – Update LAA only, 01 – Update LAA and WOL, 10 – Update port address, 11 – Reserved, but used in Linux to enable multicast magic packet wake up) Current code uses 0x3 flag, apparently trying to update LAA, WOL and port address, but it sets first two bits instead of last two. These bits are reserved, that's why it doesn't break anything. At the same time, last two bits are set to zero, and the command changes LAA address only – it's enough to work in simple case. The last question – which flag is correct to use – 01 (LAA + WOL) or 11 (LAA + WOL + port). Linux driver uses the first one, and here is the patch to fix the issue: https://dpdk.org/dev/patchwork/patch/33524/ Best regards, Igor On Wed, Jan 10, 2018 at 4:57 PM, Zhang, Helin wrote: > > > > -Original Message- > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Xing, Beilei > > Sent: Thursday, January 4, 2018 1:39 PM > > To: Olivier Matz; dev@dpdk.org; Wu, Jingjing > > Cc: sta...@dpdk.org; Laurent Hardy > > Subject: Re: [dpdk-dev] [PATCH] net/i40e: fix VSI MAC filter on primary > address > > change > > > > > > > > > -Original Message- > > > From: Olivier Matz [mailto:olivier.m...@6wind.com] > > > Sent: Wednesday, January 3, 2018 10:29 PM > > > To: dev@dpdk.org; Wu, Jingjing ; Xing, Beilei > > > > > > Cc: sta...@dpdk.org; Laurent Hardy > > > Subject: [PATCH] net/i40e: fix VSI MAC filter on primary address > > > change > > > > > > When primary address mac is changed, the mac filters were not updated > > > in the VSI with the new mac addr and incoming packets with this > > > destination address are dropped by the hardware filters. > > > > > > This patch removes the VSI mac filter for the previous mac address and > > > adds a new one for new mac address. > > > > > > Fixes: e18e01e92c29 ("i40e: support default MAC address setting") > > > Cc: sta...@dpdk.org > > > > > > Signed-off-by: Laurent Hardy > > > Signed-off-by: Olivier Matz > > > > Thanks for the fix. > > Acked-by: Beilei Xing > Applied to dpdk-next-net-intel, thanks! > > /Helin >
[dpdk-dev] [PATCH] i40e: fix flag sent to mac_address_write
Use the same value as in Linux driver. Signed-off-by: Igor Ryzhov --- drivers/net/i40e/i40e_ethdev.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 285d92b..93b9dd0 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -10795,8 +10795,7 @@ static void i40e_set_default_mac_addr(struct rte_eth_dev *dev, return; } - /* Flags: 0x3 updates port address */ - i40e_aq_mac_address_write(hw, 0x3, mac_addr->addr_bytes, NULL); + i40e_aq_mac_address_write(hw, I40E_AQC_WRITE_TYPE_LAA_WOL, mac_addr->addr_bytes, NULL); } static int -- 2.6.4
Re: [dpdk-dev] [PATCH] net/i40e: fix VSI MAC filter on primary address change
Thank you for the patch! Comments inline. On Wed, Jan 3, 2018 at 5:29 PM, Olivier Matz wrote: > > drivers/net/i40e/i40e_ethdev.c | 29 + > 1 file changed, 29 insertions(+) > > diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ > ethdev.c > index 811cc9ffe..e7d070879 100644 > --- a/drivers/net/i40e/i40e_ethdev.c > +++ b/drivers/net/i40e/i40e_ethdev.c > @@ -10818,12 +10818,41 @@ static void i40e_set_default_mac_addr(struct > rte_eth_dev *dev, > struct ether_addr *mac_addr) > { > struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev-> > data->dev_private); > + struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev-> > data->dev_private); > + struct i40e_vsi *vsi = pf->main_vsi; > + struct i40e_mac_filter_info mac_filter; > + struct i40e_mac_filter *f; > + int ret; > > if (!is_valid_assigned_ether_addr(mac_addr)) { > PMD_DRV_LOG(ERR, "Tried to set invalid MAC address."); > return; > } > > Is following check really necessary here? i40e_vsi_delete_mac(vsi, &pf->dev_addr) will do absolutely the same. + TAILQ_FOREACH(f, &vsi->mac_list, next) { > + if (is_same_ether_addr(&pf->dev_addr, > &f->mac_info.mac_addr)) > + break; > + } > + > + if (f == NULL) { > + PMD_DRV_LOG(ERR, "Failed to find filter for default mac"); > + return; > + } > + > + mac_filter = f->mac_info; > + ret = i40e_vsi_delete_mac(vsi, &mac_filter.mac_addr); > + if (ret != I40E_SUCCESS) { > + PMD_DRV_LOG(ERR, "Failed to delete mac filter"); > + return; > + } > + memcpy(&mac_filter.mac_addr, mac_addr, ETH_ADDR_LEN); > Shouldn't mac_filter.filter_type be set to RTE_MACVLAN_PERFECT_MATCH? > + ret = i40e_vsi_add_mac(vsi, &mac_filter); > + if (ret != I40E_SUCCESS) { > + PMD_DRV_LOG(ERR, "Failed to add mac filter"); > + return; > + } > + memcpy(&pf->dev_addr, mac_addr, ETH_ADDR_LEN); > + > /* Flags: 0x3 updates port address */ > In Linux driver I40E_AQC_WRITE_TYPE_LAA_WOL is used as a flag instead of 0x3. Shouldn't we use the same flag? > i40e_aq_mac_address_write(hw, 0x3, mac_addr->addr_bytes, NULL); > } > -- > 2.11.0 > > Best regards, Igor