Re: [EXT] Libtpa: a DPDK based userspace TCP stack implementation
On Mon, Dec 11, 2023 at 01:16:52PM +0100, Thomas Monjalon wrote: > 11/12/2023 12:32, Jerin Jacob Kollanukkaran: > > From: Yuanhan Liu > > > Hi all, > > > > > > I'd like to share a new DPDK open source project, libtpa(Transport > > > Protocol > > > Acceleration)[0], which is just another userspace TCP stack > > > implementation so > > > far, written from scratch. > > > > > > I started this project 3 years ago, while I was searching for a feasible > > > open > > > source project with no luck. There were indeed quite a few options, but > > > none of > > > them actually met my needs. I then started writing one. Likely, there are > > > still > > > other guys out there looking for a high performance and stable userspace > > > TCP > > > stack. This is what this email and libtpa for. > > > > Great Yuanhan. > > > > If you have time and willing to put effort, I suggest make this part of > > dpdk code base > > as new library (tcp or so) and leverage + improve another existing library > > such ip_frag. > > > > I believe, that is only way. > > - This code soon won't soon outdated based on new DPDK version > > - More community review and contributors > > - More review and features from NIC vendors PoV. > > - More arch and driver support. > > - More quality > > As Yuanhan said, there are many TCP stacks running on top of DPDK. > We should add this one to the list: > https://www.dpdk.org/ecosystem/#projects > Also a discussion has started recently about integrating one in DPDK. > As Jerin suggests, libtpa looks like a very good candidate to focus efforts > on it. > > Regarding performance, how does it compare with F-Stack? TLDK? Seastar? I think it should be fair to say (I haven't done the testing though; I never tried to run those stacks), libtpa is the userspace tcp stack with the best performance I'm aware of. The redis numbers showed in this email thread is just one example. Libtpa also ships an performance benchmark, tperf. With tperf write test (and without jumboframe), libtpa can achieve 200G linerate with only one physical core for write. The read test is not that good though, because of missing hardware acceleration features like TSO. Although performance is very important to an userspace stack, I still want to point out that, during the design, performance is not my major goal. I spent most of my effort on shaping the testing system and the debug-ablity initially. Libtpa has been deployed in bytedance for more than two years, till now, there is no single TCP protocol bug reported. (I do get very few bugs reported though, but most of them are related to the OS environment, such as sigbus due to wrong API used when running out of tmpfs). Thanks, Yuanhan Liu
Re: [EXT] Libtpa: a DPDK based userspace TCP stack implementation
On Mon, Dec 11, 2023 at 11:32:16AM +, Jerin Jacob Kollanukkaran wrote: > > > > -Original Message- > > From: Yuanhan Liu > > Sent: Monday, December 11, 2023 3:27 PM > > To: lib...@googlegroups.com > > Cc: dev@dpdk.org; Yuanhan Liu > > Subject: [EXT] Libtpa: a DPDK based userspace TCP stack implementation > > > > External Email > > > > -- > > Hi all, > > > > I'd like to share a new DPDK open source project, libtpa(Transport Protocol > > Acceleration)[0], which is just another userspace TCP stack implementation > > so > > far, written from scratch. > > > > I started this project 3 years ago, while I was searching for a feasible > > open > > source project with no luck. There were indeed quite a few options, but > > none of > > them actually met my needs. I then started writing one. Likely, there are > > still > > other guys out there looking for a high performance and stable userspace TCP > > stack. This is what this email and libtpa for. > > Great Yuanhan. > > If you have time and willing to put effort, I suggest make this part of dpdk > code base > as new library (tcp or so) and leverage + improve another existing library > such ip_frag. > > I believe, that is only way. > - This code soon won't soon outdated based on new DPDK version > - More community review and contributors > - More review and features from NIC vendors PoV. > - More arch and driver support. > - More quality Hi Jerin, Thanks for you suggestion and these really are good points! Although libtpa is currently designed as a libray, I doubt it would suit well as a new library to DPDK. Just taking the code base an example, libtpa so far is about 27K lines of code. The TCP part is only about 3K lines of code. All the rest are codes supporting the TCP part, such as sock tracing, mem file, mem file auto archive, etc. You can look more from the internals page (or even read the code ;) https://github.com/bytedance/libtpa/blob/main/doc/internals.rst Thanks, Yuanhan Liu > > Just my 2c. > > -- > You received this message because you are subscribed to the Google Groups > "libtpa" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to libtpa+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/libtpa/BY3PR18MB478592AF236C7BBDB0285E3CC88FA%40BY3PR18MB4785.namprd18.prod.outlook.com. > For more options, visit https://groups.google.com/d/optout.
Libtpa: a DPDK based userspace TCP stack implementation
Hi all, I'd like to share a new DPDK open source project, libtpa(Transport Protocol Acceleration)[0], which is just another userspace TCP stack implementation so far, written from scratch. I started this project 3 years ago, while I was searching for a feasible open source project with no luck. There were indeed quite a few options, but none of them actually met my needs. I then started writing one. Likely, there are still other guys out there looking for a high performance and stable userspace TCP stack. This is what this email and libtpa for. Libtpa is fast. To demonstrate that, we did a hacky redis integration. The benchmark shows that libtpa can boost the performance more than 5 times, from 0.21m rps to 1.14m rps[1]. Right, it can achieve 1 million rps just with one CPU thread. Meanwhile, the p99 latency decreases from 0.815ms to 0.159ms. Regarding the stableness, I'd say it's not bad, all kudos to the comprehensive testing. I've written more than 200 tests. Together with the testing arguments matrix[2], it can result in a big variety of test cases. Therefore, most of the bugs are captured before deployment. Having said that, I'd still suggest you to do as much testing as you can if you want to use it, for libtpa is still under active development and it's just v1.0-rc0 being released. Tons of changes have been made since the last stable release. There is one more thing I'm a bit proud of about libtpa: as a DPDK based project, libtpa has rich set of debug tools[3]. The sock tracing is particularly handy on debugging that libtpa doesn't ship a tcpdump like tool, simply for we don't really need one. The TCP part then may not sound that exciting. It's basically just an initial TCP implementation, with standard congestion avoid algorithm (New Reno). Libtpa implements slightly more than that though, such as SACK, congestion window validation, spurious retransmission detection, keepalive, etc. That's all. Comments, questions, patches and testing are all welcome! Thanks, Yuanhan Liu --- [0]: libtpa: https://github.com/bytedance/libtpa [1]: redis: https://github.com/bytedance/libtpa/tree/main/doc/redis.rst [2]: matrix shell: https://github.com/bytedance/libtpa/tree/main/doc/internals.rst [3]: user guide: https://github.com/bytedance/libtpa/tree/main/doc/user_guide.rst
Re: [dpdk-dev] [PATCH v8 04/19] ethdev: introduce device lock
On Mon, Jul 2, 2018, at 1:44 PM, Qi Zhang wrote: > Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to let > application lock or unlock on specific ethdev, a locked device > can't be detached, this help applicaiton to prevent unexpected > device detaching, especially in multi-process envrionment. I'm just wondering why the need of introducing such 2 (very specific) APIs, judging that it looks like a ref count could solve this kind of issues perfectly. Also, the API name (_dev_lock/unlock) looks like an over killing, judging that you just want to pin a device (if I understand this commit log correctly). What firstly comes to my mind when I see "lock" was "no one can *access* a device after another has grabbed the lock", which doesn't quite match what you described here. That said, if I were you, I would go with introducing few generic refcount APIs. --yliu > > Aslo introduce the new API rte_eth_dev_lock_with_callback and > rte_eth_dev_unlock_with callback to let application to register > a callback function which will be invoked before a device is going > to be detached, the return value of the function will decide if > device will continue be detached or not, this support application > to do condition check at runtime. > > Signed-off-by: Qi Zhang > Reviewed-by: Anatoly Burakov
[dpdk-dev] [dpdk-announce] DPDK 17.11.3 (LTS) released
perf profile handler Santosh Shukla (2): net/octeontx: fix null pointer dereference net/octeontx: fix uninitialized variable in port open Shachar Beiser (2): net: add IPv6 header fields macros net/mlx5: fix IPv6 header fields Shahaf Shuler (12): net/mlx5: fix secondary process mempool registration net/mlx5: remove assert un-accessible from secondary process net/mlx5: warn for unsuccessful memory registration net/mlx5: fix CRC strip capability query net/mlx5: change pkt burst select function prototype net/mlx5: enforce RSS key length limitation net/mlx5: fix RSS key length query net/mlx5: fix link status initialization net/mlx5: fix ethtool link setting call order net/mlx5: fix socket connection return value net/mlx5: fix probe return value polarity net/mlx5: fix flow director drop rule deletion crash Shahed Shaikh (1): net/qede: fix unicast filter routine return code Stefan Hajnoczi (1): vhost: fix message payload union in setting ring address Stephen Hemminger (1): net/octeontx: fix uninitialized speed variable Thomas Monjalon (2): mbuf: fix Tx checksum offload API doc mbuf: improve tunnel Tx offloads API doc Tomasz Duszynski (1): net/mrvl: fix Rx descriptors number Tomasz Kulasek (5): vhost: fix offset while mmaping log base address vhost: check cmsg not null vhost: fix device cleanup at stop vhost: fix realloc failure vhost: fix ring index returned to master on stop Tonghao Zhang (2): net/bonding: free mempool used in mode 6 vhost: fix dead lock on closing in server mode Wenzhuo Lu (1): net/ixgbe: fix too many interrupts Xiaoxin Peng (1): net/bnxt: fix Rx mbuf and agg ring leak in dev stop Xueming Li (5): net/mlx5: fix existing file removal net/mlx5: map UAR address around huge pages net/mlx5: fix close after start failure net/mlx5: fix invalid flow item check net/mlx5: add data-plane debug message macro Yongseok Koh (6): doc: add timestamp offload to mlx5 features net/mlx5: fix synchronization on polling Rx completions net/mlx5: fix disabling Tx packet inlining net/mlx5: remove excessive data prefetch net/mlx5: fix calculation of Tx TSO inline room size net/mlx5: change device reference for secondary process Yuanhan Liu (1): version: 17.11.3 Yunjian Wang (2): net/i40e: fix intr callback unregister by adding retry net/ixgbe: fix intr callback unregister by adding retry
Re: [dpdk-dev] 17.11.3 (LTS) patches review and test
On Fri, Jun 8, 2018, at 11:00 PM, Bruce Richardson wrote: > On Sun, May 27, 2018 at 01:35:17PM +0800, Yuanhan Liu wrote: > > Hi all, > > > > Here is a list of patches targeted for LTS release 17.11.3. Please > > help review and test. The planned date for the final release is 8th, > > Jun. Before that, please let me know if anyone has objections with > > these patches being applied. > > > > These patches are located at branch 17.11 of dpdk-stable repo: > > http://dpdk.org/browse/dpdk-stable/ > > > > Thanks. > > > > --yliu > > > Hi Yuanhan, > > while I understand that you are eager to get the release out, I don't > believe all testing has been completed by us on the patches. Can the > release be postponed by a few days to allow the testing to be completed. Hi Bruce, Sure, no problem, I will postpone it to next Friday. Hope it's okay to you guys. And thanks for the testing! --yliu
Re: [dpdk-dev] 17.11.3 (LTS) patches review and test
On Thu, Jun 07, 2018 at 11:09:56PM +, Raslan Darawsheh wrote: > Hi, > > We've tested this version and we don't see new issues with it. Hi Raslan, Thank you a lot for the testing! --yliu > > Kindest regards, > Raslan Darawsheh > > -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yuanhan Liu > Sent: Sunday, May 27, 2018 8:35 AM > To: dpdk stable > Cc: dev@dpdk.org; Thomas Monjalon > Subject: [dpdk-dev] 17.11.3 (LTS) patches review and test > > Hi all, > > Here is a list of patches targeted for LTS release 17.11.3. Please help > review and test. The planned date for the final release is 8th, Jun. Before > that, please let me know if anyone has objections with these patches being > applied. > > These patches are located at branch 17.11 of dpdk-stable repo: > > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdpdk.org%2Fbrowse%2Fdpdk-stable%2F&data=02%7C01%7Crasland%40mellanox.com%7Ce19869d187dc4afd691208d5c393b1e5%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636629961495952407&sdata=Afb4An7wMdhwquNjwMvC3iJy%2FCICWS9nR4KHlpLPIqY%3D&reserved=0 > > Thanks. > > --yliu > > --- > Aaron Conole (2): > nfp: unlink the appropriate lock file > nfp: allow for non-root user > > Adrien Mazarguil (1): > app/testpmd: fix empty list of RSS queues for flow > > Ajit Khaparde (9): > net/bnxt: fix LRO disable > net/bnxt: fix Rx drop setting > net/bnxt: set padding flags in Rx descriptor > net/bnxt: fix endianness of flag > net/bnxt: fix Rx checksum flags for tunnel frames > net/bnxt: free memory allocated for VF filters > net/bnxt: avoid invalid vnic id in set L2 Rx mask > net/bnxt: fix usage of vnic id > net/bnxt: fix Rx checksum flags > > Alejandro Lucero (5): > net/nfp: fix assigning port id in mbuf > net/nfp: fix barrier location > net/nfp: fix link speed capabilities > doc: fix NFP NIC guide grammar > net/nfp: fix mbufs releasing when stop or close > > Allain Legacy (1): > ip_frag: fix double free of chained mbufs > > Anatoly Burakov (4): > app/crypto-perf: fix IOVA translation > mem: do not use physical addresses in IOVA as VA mode > vfio: do not needlessly check for IOVA mode > mempool: fix virtual address population > > Andrew Rybchenko (6): > net/sfc: add missing defines for SAL annotation > net/sfc: fix mbuf data alignment calculation > net/sfc/base: fix comparison always true warning > mempool: fix leak when no objects are populated > test/mempool: fix autotest retry > net/sfc: ignore spec bits not covered by mask > > Andy Green (3): > net/bnx2x: do not cast function pointers as a policy > net/bnx2x: fix KR2 device check > net/bnx2x: fix memzone name overrun > > Ashish Jain (1): > event/dpaa2: remove link from info structure > > Beilei Xing (3): > net/i40e: fix DDP profile DEL operation > net/i40e: fix link status update > net/i40e: fix failing to disable FDIR Tx queue > > Bin Huang (1): > net/mlx5: add packet type index for TCP ack > > Chas Williams (5): > net/vmxnet3: set the queue shared buffer at start > net/bonding: fix setting VLAN ID on slave ports > net/bonding: clear started state if start fails > net/ixgbe: fix busy wait during checking link status > net/bonding: export mode 4 slave info routine > > Chuhong Yao (1): > net/liquidio: fix link state fetching during start > > Ciara Loftus (1): > net/vhost: initialise device as inactive > > Dan Gora (1): > kni: fix build on CentOS 7.4 > > Daniel Shelepov (1): > app/testpmd: fix burst stats reporting > > David Hunt (4): > mk: fix make defconfig on FreeBSD > test/distributor: fix return type of thread function > test/pipeline: fix return type of stub miss > examples/performance-thread: fix return type of threads > > Fan Zhang (2): > net/i40e: fix link update no wait > crypto/scheduler: fix possible duplicated ring names > > Ferruh Yigit (4): > pci: remove duplicated symbol from map file > net/tap: fix icc build > drivers/net: fix link autoneg value for virtual PMDs > net/i40e: fix shifts of signed values > > Gaetan Rivet (2): > bus/fslmc: fix find device start condition > bus/pci: fix find device implementation > > Gowrishankar Muthukrishnan (2): > eal/ppc: remove braces in SMP memory barrier macro > net/bo
Re: [dpdk-dev] [PATCH] app/testpmd: fix failsafe PMD failure on exit
On Sun, May 27, 2018 at 02:37:28PM +0200, Thomas Monjalon wrote: > 27/05/2018 06:06, Yuanhan Liu: > > On Tue, May 22, 2018 at 07:35:08PM +0100, Ferruh Yigit wrote: > > > + /* > > > + * This is a workaround to fix a virtio-user issue that > > > + * requires to call clean-up routine to remove existing > > > + * socket. > > > > I came across this patch while I was cherry-picking patches to 17.11.4 > > release. And this patch seems wrong to me. > > Yes it is far from perfect. > > > Any particular reason why the socket removal can not be done in virtio-user > > pmd, say at its close method? > > The socket is removed in the remove function of the driver. > The right fix is to call the remove functions of all driver from > the EAL cleanup function. > We have decided of this last minute workaround for testpmd > because we need it for testing convenience, but do not want to > take any risk with a proper fix as it is really late for that. If there must be a workaround, I would do it at virtio-user pmd. --yliu > > > > > + * This workaround valid only for testpmd, needs a fix > > > + * valid for all applications. > > > + * TODO: Implement proper resource cleanup > > > + */ > > > + device = rte_eth_devices[pt_id].device; > > > + if (device && !strcmp(device->driver->name, > > > "net_virtio_user")) > > > detach_port(pt_id); > > > } > > > } > > > > > >
[dpdk-dev] 17.11.3 (LTS) patches review and test
Hi all, Here is a list of patches targeted for LTS release 17.11.3. Please help review and test. The planned date for the final release is 8th, Jun. Before that, please let me know if anyone has objections with these patches being applied. These patches are located at branch 17.11 of dpdk-stable repo: http://dpdk.org/browse/dpdk-stable/ Thanks. --yliu --- Aaron Conole (2): nfp: unlink the appropriate lock file nfp: allow for non-root user Adrien Mazarguil (1): app/testpmd: fix empty list of RSS queues for flow Ajit Khaparde (9): net/bnxt: fix LRO disable net/bnxt: fix Rx drop setting net/bnxt: set padding flags in Rx descriptor net/bnxt: fix endianness of flag net/bnxt: fix Rx checksum flags for tunnel frames net/bnxt: free memory allocated for VF filters net/bnxt: avoid invalid vnic id in set L2 Rx mask net/bnxt: fix usage of vnic id net/bnxt: fix Rx checksum flags Alejandro Lucero (5): net/nfp: fix assigning port id in mbuf net/nfp: fix barrier location net/nfp: fix link speed capabilities doc: fix NFP NIC guide grammar net/nfp: fix mbufs releasing when stop or close Allain Legacy (1): ip_frag: fix double free of chained mbufs Anatoly Burakov (4): app/crypto-perf: fix IOVA translation mem: do not use physical addresses in IOVA as VA mode vfio: do not needlessly check for IOVA mode mempool: fix virtual address population Andrew Rybchenko (6): net/sfc: add missing defines for SAL annotation net/sfc: fix mbuf data alignment calculation net/sfc/base: fix comparison always true warning mempool: fix leak when no objects are populated test/mempool: fix autotest retry net/sfc: ignore spec bits not covered by mask Andy Green (3): net/bnx2x: do not cast function pointers as a policy net/bnx2x: fix KR2 device check net/bnx2x: fix memzone name overrun Ashish Jain (1): event/dpaa2: remove link from info structure Beilei Xing (3): net/i40e: fix DDP profile DEL operation net/i40e: fix link status update net/i40e: fix failing to disable FDIR Tx queue Bin Huang (1): net/mlx5: add packet type index for TCP ack Chas Williams (5): net/vmxnet3: set the queue shared buffer at start net/bonding: fix setting VLAN ID on slave ports net/bonding: clear started state if start fails net/ixgbe: fix busy wait during checking link status net/bonding: export mode 4 slave info routine Chuhong Yao (1): net/liquidio: fix link state fetching during start Ciara Loftus (1): net/vhost: initialise device as inactive Dan Gora (1): kni: fix build on CentOS 7.4 Daniel Shelepov (1): app/testpmd: fix burst stats reporting David Hunt (4): mk: fix make defconfig on FreeBSD test/distributor: fix return type of thread function test/pipeline: fix return type of stub miss examples/performance-thread: fix return type of threads Fan Zhang (2): net/i40e: fix link update no wait crypto/scheduler: fix possible duplicated ring names Ferruh Yigit (4): pci: remove duplicated symbol from map file net/tap: fix icc build drivers/net: fix link autoneg value for virtual PMDs net/i40e: fix shifts of signed values Gaetan Rivet (2): bus/fslmc: fix find device start condition bus/pci: fix find device implementation Gowrishankar Muthukrishnan (2): eal/ppc: remove braces in SMP memory barrier macro net/bonding: fix primary slave port id storage type Harish Patil (1): net/qede: fix multicast filtering Hemant Agrawal (6): net/dpaa: fix oob access bus/dpaa: fix resource leak net/dpaa2: fix xstats app/crypto-perf: fix excess crypto device error examples/l2fwd-crypto: fix the default aead assignments crypto/dpaa2_sec: fix HMAC supported digest sizes Hyong Youb Kim (1): net/enic: allocate stats DMA buffer upfront during probe Ivan Malov (2): net/sfc: add missing Rx fini on RSS setup fail path net/sfc: process RSS settings on Rx configure step Jasvinder Singh (1): test/pipeline: fix type of table entry parameter Jerin Jacob (2): app/crypto-perf: use strcpy for allocated string app/crypto-perf: fix parameters copy John Daley (1): net/enic: fix crash on MTU update with non-setup queues Junjie Chen (2): net/vhost: fix crash when creating vdev dynamically net/vhost: fix invalid state Kirill Rybalchenko (2): crypto/scheduler: fix multicore rings re-use crypto/scheduler: fix 64-bit mask of workers cores Lee Roberts (1): kni: fix build on RHEL 7.5 Matan Azrad (10): ethdev: fix port accessing after release app/testpmd: fix slave port detection app/testpmd: fix valid ports prints app/testpmd: fix forward ports update app/testpmd: fix forward ports Rx flu
Re: [dpdk-dev] [PATCH] app/testpmd: fix failsafe PMD failure on exit
On Tue, May 22, 2018 at 07:35:08PM +0100, Ferruh Yigit wrote: > vdevs detach on testpmd exit implemented as workaround to fix > a virtio-user issue. The issue was virtio-user cleanup is not > called and existing socket file not cleaned up which will fail > next run. > > The vdev cleanup causing problems in failsafe PMD. > > Reduce the cleanup to only virtio-user and add a comment that this > workaround should be converted to a proper cleanup, not something > specific to virtio-user, and not something specific to vdev and > testpmd. > > Fixes: fe890955114d ("app/testpmd: fix exit for vdevs") > ... > pmd_test_exit(void) > { > - const struct rte_bus *bus; > struct rte_device *device; > portid_t pt_id; > int ret; > @@ -2025,13 +2024,21 @@ pmd_test_exit(void) > if (ports != NULL) { > no_link_check = 1; > RTE_ETH_FOREACH_DEV(pt_id) { > - device = rte_eth_devices[pt_id].device; > - bus = rte_bus_find_by_device(device); > printf("\nShutting down port %d...\n", pt_id); > fflush(stdout); > stop_port(pt_id); > close_port(pt_id); > - if (bus && !strcmp(bus->name, "vdev")) > + > + /* > + * This is a workaround to fix a virtio-user issue that > + * requires to call clean-up routine to remove existing > + * socket. I came across this patch while I was cherry-picking patches to 17.11.4 release. And this patch seems wrong to me. Any particular reason why the socket removal can not be done in virtio-user pmd, say at its close method? --yliu > + * This workaround valid only for testpmd, needs a fix > + * valid for all applications. > + * TODO: Implement proper resource cleanup > + */ > + device = rte_eth_devices[pt_id].device; > + if (device && !strcmp(device->driver->name, > "net_virtio_user")) > detach_port(pt_id); > } > } > -- > 2.14.3
Re: [dpdk-dev] release date for 17.11.3?
On Thu, May 17, 2018 at 10:11:40PM +0200, Thomas Monjalon wrote: > 17/05/2018 18:16, Montorsi, Francesco: > > Hi all, > > I'd like to build DPDK on Centos 7.5... I found a commit > > (http://dpdk.org/browse/dpdk-stable/commit/?h=17.11&id=3ee847054cc9ab62fa2c9c6dc6ba68899d620e3a) > > that allows that, but it will go into DPDK 17.11.3 right? > > Is there a release date for 17.11.3 already defined? > > Hopefully June 8th: > http://dpdk.org/dev/roadmap#stable Yes, something like that. I probably will make it a little bit earlier than that. --yliu > > +Cc Yuanhan
Re: [dpdk-dev] [RFC v2] vhost: new rte_vhost API proposal
On Fri, May 18, 2018 at 03:50:45PM +0200, Maxime Coquelin wrote: > Hi Dariusz, > > On 05/18/2018 03:01 PM, Dariusz Stojaczyk wrote: > > rte_vhost is not vhost-user spec compliant. Some Vhost drivers have > > been already confirmed not to work with rte_vhost. virtio-user-scsi-pci > > in QEMU 2.12 doesn't fully initialize its management queues at SeaBIOS > > stage. This is perfectly fine from the Vhost-user spec perspective, but > > doesn't meet rte_vhost expectations. rte_vhost waits for all queues > > to be fully initialized before it allows the entire device to be > > processed. qFixing rte_vhost directly would require quite a big amount > > of changes, which would completely break backwards compatibility. > > > > This rte_vhost2 library is intended to smooth out the transition. > > It exposes a low-level API for implementing new Vhost-user slaves. > > The existing rte_vhost is about to be refactored to use rte_vhost2 > > library underneath, and demanding backends could now use rte_vhost2 > > directly. > > I like the idea, and the proposed way to smooth the transition. Please be aware of that I just had a quick glimpse of this patch and it's likely I don't have too much time to follow this. However, I also like this idea. And thank you for working on it. --yliu
Re: [dpdk-dev] [PATCH] maintainers: resign from vhost and vdev
On Thu, May 03, 2018 at 05:44:29PM +0200, Thomas Monjalon wrote: > 03/05/2018 12:49, Burakov, Anatoly: > > On 03-May-18 11:02 AM, Yang, Zhiyong wrote: > > > Thanks a lot for your excellent contributions to DPDK community, > > > Jianfeng. > > > See you later. > > > > +1, you will be missed! > > +1 > Thanks a lot Jianfeng +1, and good luck! --yliu
[dpdk-dev] [dpdk-announce] DPDK 17.11.1 (LTS) released
t/octeontx: fix Rx adapter port id mapping Phil Yang (2): test/memzone: fix NULL freeing test/memzone: fix freeing test Qi Zhang (2): net/i40e: fix VLAN offload setting net/i40e: exclude LLDP packet count Radu Nicolau (3): examples/bond: fix vdev name examples/bond: check mbuf allocation net/bonding: check error of MAC address setting Rafal Kozik (1): net/ena: do not set Tx L4 offloads in Rx path Rasesh Mody (4): net/qede: replace config option with run-time arg net/qede: fix MTU set and max Rx length net/qede: fix few log messages net/qede: fix clearing of queue stats Raslan Darawsheh (1): net/mlx5: fix flow type for allmulti rules Roman Zhukov (2): net/sfc: fix initialization of flow structure net/sfc: fix flow RSS check in error handling Rosen Xu (1): net/i40e: fix packet type for X722 Samuel Gauthier (1): net/virtio: fix Rx and Tx handler selection for ARM32 Shachar Beiser (1): net/mlx5: fix IPv6 header fields Shahaf Shuler (9): app/testpmd: fix port configuration print app/testpmd: fix flowgen forwarding offload flags net/mlx5: fix VLAN configuration after port stop net/mlx5: fix RSS key configuration net/mlx5: fix missing RSS capability net/mlx5: fix memory region cache lookup net/mlx5: fix memory region cache last index net/mlx5: fix memory region boundary checks net/mlx5: fix link state on device start Somnath Kotur (3): net/bnxt: fix duplicate filter pattern creation error net/bnxt: fix duplicate pattern for 5tuple filter net/bnxt: free the aggregation ring Stephen Hemminger (1): net/tap: remove unused kernel version definitions Thierry Herbelot (1): net/mlx5: cleanup allocation of ethtool stats Thomas Monjalon (1): ethdev: fix link autonegotiation value Timothy Redaelli (1): app/testpmd: remove xenvirt again Tiwei Bie (2): net/virtio: fix vector Rx flushing net/virtio: fix typo in LRO support Tomasz Jozwiak (2): crypto/qat: fix out-of-bounds access crypto/qat: fix parameter type Victor Kaplansky (1): vhost: protect active rings from async ring changes Vipin Varghese (1): net/pcap: fix the NUMA id display in logs Wei Dai (2): app/testpmd: fix invalid Rx queue number setting app/testpmd: fix invalid Tx queue number setting Wei Zhao (5): net/ixgbe: fix tunnel filter fail problem net/i40e: add FDIR NVGRE parameter check net/ixgbe: fix parsing FDIR NVGRE issue net/i40e: fix port segmentation fault when restart net/ixgbe: fix reset error handling Wenzhuo Lu (4): net/ixgbe: fix wrong PBA setting net/i40e: fix VF Rx interrupt enabling net/ixgbe: fix VF Rx interrupt enabling net/e1000: fix VF Rx interrupt enabling Xiao Wang (2): net/fm10k: fix logical port delete igb_uio: allow multi-process access Xueming Li (2): cmdline: fix dynamic tokens parsing cmdline: avoid garbage in unused fields of parsed result Yangchao Zhou (1): net/igb: fix Tx queue number assignment Yanglong Wu (3): app/testpmd: fix port id allocation net/ixgbe: fix the failure of number of Tx queue check net/ixgbe: fix max queue number for VF Yong Wang (4): net/e1000: fix null pointer check net/i40e: fix memory leak crypto/qat: fix allocation check and leak net/dpaa: fix potential memory leak Yongseok Koh (10): app/testpmd: fix crash of txonly with multiple segments net/mlx5: fix Memory Region registration net/mlx5: fix overflow of Memory Region cache net/mlx5: fix HW checksum offload for outer IP net/mlx5: fix overwriting bit-fields in SW Rx queue net/mlx5: fix deadlock of link status alarm net/mlx5: fix missing attribute size for drop action net/mlx5: fix calculation of flow ID flag net/mlx5: fix memory region lookup net/mlx5: fix handling link status event Yuanhan Liu (5): Revert "net/mlx5: fix IPv6 header fields" Revert "ethdev: fix port id allocation" net/mlx5: use PCI address as port name Revert "net/mlx5: fix flow type for allmulti rules" version: 17.11.1 Zhiyong Yang (3): bus/pci: fix interrupt handler type examples/vhost: fix startup check cryptodev: fix session pointer cast
Re: [dpdk-dev] 17.11.1 patches review and test
On Mon, Feb 26, 2018 at 07:38:32PM +0530, gowrishankar muthukrishnan wrote: > Hi yliu > Could you please include below patch merged 18.02 ?. Apologies for short > notice. > > http://dpdk.org/dev/patchwork/patch/34690/ Done. Thanks. --yliu > > Thanks, > Gowrishankar > > On Monday 19 February 2018 06:13 PM, Yuanhan Liu wrote: > >Hi all, > > > >Here is a list of patches targeted for LTS release 17.11.1. Please > >help review and test. The planned date for the final release is 27th, > >Feb. Before that, please shout if anyone has objections with these > >patches being applied. > > > >These patches are located at branch 17.11 of dpdk-stable repo: > > http://dpdk.org/browse/dpdk-stable/ > > > >Thanks. > > > > --yliu > > > >--- > >Adrien Mazarguil (7): > > net/mlx4: fix unnecessary include > > net/i40e: fix ISO C in exported header > > flow_classify: fix ISO C in exported header > > member: fix ISO C in exported header > > lib: fix missing includes in exported headers > > net/failsafe: fix invalid free > > net/mlx4: fix drop flow resources leak > > > >Ajit Khaparde (7): > > net/bnxt: fix double increment of idx during Tx ring alloc > > net/bnxt: parse checksum offload flags > > net/bnxt: fix group info usage > > net/bnxt: fix check for ether type > > net/bnxt: fix size of Tx ring in HW > > net/bnxt: fix number of pools for RSS > > net/bnxt: fix return code in MAC address set > > > >Akhil Goyal (2): > > security: fix enum start value > > examples/ipsec-secgw: fix corner case for SPI value > > > >Alejandro Lucero (3): > > net/nfp: fix MTU settings > > net/nfp: fix jumbo settings > > net/nfp: fix CRC strip check behaviour > > > >Anatoly Burakov (16): > > memzone: fix leak on allocation error > > malloc: protect stats with lock > > malloc: fix end for bounded elements > > vfio: fix enabled check on error > > app/procinfo: add compilation option in config > > test: register test as failed if setup failed > > test/table: fix uninitialized parameter > > test/memzone: fix wrong test > > member: fix memory leak on error > > usertools/devbind: fix kernel module reporting > > test/bitmap: fix memory leak > > test/reorder: fix memory leak > > test/ring: fix memory leak > > test/ring_perf: fix memory leak > > test/table: fix memory leak > > test/timer_perf: fix memory leak > > > >Andrea Grandi (2): > > doc: fix lists of supported crypto algorithms > > doc: fix format in OpenSSL installation guide > > > >Andrew Rybchenko (7): > > net/sfc: stop periodic DMA if MAC stats upload fails > > net/sfc: fix multicast address list copy memory leak > > net/sfc: fix DMA memory leak after kvarg processing failure > > net/sfc: fix label name to be consistent > > net/sfc: do not hold management event queue lock while MCDI > > net/sfc: fix incorrect bitwise ORing of L3/L4 packet types > > mempool: fix physical contiguous check > > > >Andriy Berestovskyy (1): > > keepalive: fix state alignment > > > >Anoob Joseph (1): > > examples/ipsec-secgw: fix usage of incorrect port > > > >Ashish Jain (1): > > net/dpaa: fix the mbuf packet type if zero > > > >Bao-Long Tran (1): > > examples/ip_pipeline: fix timer period unit > > > >Beilei Xing (12): > > net/i40e: fix VLAN offload setting issue > > net/i40e: fix FDIR input set conflict > > net/i40e: fix FDIR rule confiliction issue > > net/i40e: fix setting MAC address of VF > > net/i40e: fix flow director Rx resource defect > > net/i40e: warn when writing global registers > > net/i40e: add debug logs when writing global registers > > net/i40e: fix multiple driver support > > net/i40e: fix interrupt conflict with multi-driver > > net/i40e: fix Rx interrupt > > net/i40e: check multi-driver option parsing > > app/testpmd: fix flow director filter > > > >Chas Williams (1): > > net/bonding: fix setting slave MAC addresses > > > >David Harton (1): > > net/i40e: fix VF reset stats crash > > > >Didier Pallard (1): > > net/virtio: fix incorrect cast
[dpdk-dev] 17.11.1 patches review and test
/qede: fix few log messages Raslan Darawsheh (1): net/mlx5: fix flow type for allmulti rules Roman Zhukov (2): net/sfc: fix initialization of flow structure net/sfc: fix flow RSS check in error handling Rosen Xu (1): net/i40e: fix packet type for X722 Samuel Gauthier (1): net/virtio: fix Rx and Tx handler selection for ARM32 Shachar Beiser (1): net/mlx5: fix IPv6 header fields Shahaf Shuler (9): app/testpmd: fix port configuration print app/testpmd: fix flowgen forwarding offload flags net/mlx5: fix VLAN configuration after port stop net/mlx5: fix RSS key configuration net/mlx5: fix missing RSS capability net/mlx5: fix memory region cache lookup net/mlx5: fix memory region cache last index net/mlx5: fix memory region boundary checks net/mlx5: fix link state on device start Somnath Kotur (3): net/bnxt: fix duplicate filter pattern creation error net/bnxt: fix duplicate pattern for 5tuple filter net/bnxt: free the aggregation ring Stephen Hemminger (1): net/tap: remove unused kernel version definitions Thierry Herbelot (1): net/mlx5: cleanup allocation of ethtool stats Thomas Monjalon (1): ethdev: fix link autonegotiation value Timothy Redaelli (1): app/testpmd: remove xenvirt again Tiwei Bie (2): net/virtio: fix vector Rx flushing net/virtio: fix typo in LRO support Tomasz Jozwiak (2): crypto/qat: fix out-of-bounds access crypto/qat: fix parameter type Victor Kaplansky (1): vhost: protect active rings from async ring changes Vipin Varghese (1): net/pcap: fix the NUMA id display in logs Wei Dai (2): app/testpmd: fix invalid Rx queue number setting app/testpmd: fix invalid Tx queue number setting Wei Zhao (5): net/ixgbe: fix tunnel filter fail problem net/i40e: add FDIR NVGRE parameter check net/ixgbe: fix parsing FDIR NVGRE issue net/i40e: fix port segmentation fault when restart net/ixgbe: fix reset error handling Wenzhuo Lu (4): net/ixgbe: fix wrong PBA setting net/i40e: fix VF Rx interrupt enabling net/ixgbe: fix VF Rx interrupt enabling net/e1000: fix VF Rx interrupt enabling Xiao Wang (2): net/fm10k: fix logical port delete igb_uio: allow multi-process access Xueming Li (2): cmdline: fix dynamic tokens parsing cmdline: avoid garbage in unused fields of parsed result Yangchao Zhou (1): net/igb: fix Tx queue number assignment Yanglong Wu (3): app/testpmd: fix port id allocation net/ixgbe: fix the failure of number of Tx queue check net/ixgbe: fix max queue number for VF Yong Wang (4): net/e1000: fix null pointer check net/i40e: fix memory leak crypto/qat: fix allocation check and leak net/dpaa: fix potential memory leak Yongseok Koh (10): app/testpmd: fix crash of txonly with multiple segments net/mlx5: fix Memory Region registration net/mlx5: fix overflow of Memory Region cache net/mlx5: fix HW checksum offload for outer IP net/mlx5: fix overwriting bit-fields in SW Rx queue net/mlx5: fix deadlock of link status alarm net/mlx5: fix missing attribute size for drop action net/mlx5: fix calculation of flow ID flag net/mlx5: fix memory region lookup net/mlx5: fix handling link status event Yuanhan Liu (4): Revert "net/mlx5: fix IPv6 header fields" Revert "ethdev: fix port id allocation" net/mlx5: use PCI address as port name Revert "net/mlx5: fix flow type for allmulti rules" Zhiyong Yang (3): bus/pci: fix interrupt handler type examples/vhost: fix startup check cryptodev: fix session pointer cast
Re: [dpdk-dev] [PATCH] doc: coding style: use linux kernel style for indentation
On Tue, Feb 13, 2018 at 12:51:57PM +, Ferruh Yigit wrote: > On 1/13/2016 4:49 PM, stephen at networkplumber.org (Stephen Hemminger) wrote: > > On Wed, 13 Jan 2016 15:07:08 + > > Bruce Richardson wrote: > > > >> So, while the two-tab indent may look "a bit weird" it does solve the two > >> issues > >> above. I believe practical benefits should override initial impressions. > >> [It took > >> me a while to get used to also, but now I very much like it as a style.] > > > > I don't think that deviating from kernel style for this case is justified. > > This is very old patch still sitting in patchwork, re-visiting it mainly to be > able to clean the patchwork. Just drop it, there is no reason to revisit it. --yliu > > This is a syntax change request and although I have my personal preferences I > would be OK with whatever decided. > > Currently there is already a decided syntax, changing it after this point will > cause either mixed usage or a big syntax cleanup patch. I think both are not > good. > > I am for continue whatever documented in current DPDK coding style doc, hence > NAK from my side. > > Thanks, > ferruh
Re: [dpdk-dev] [PATCH] MAINTAINERS: resign from vhost/virtio maintainer
On Mon, Feb 12, 2018 at 10:54:08AM +0100, Maxime Coquelin wrote: > Hi Yuanhan, > > On 02/12/2018 10:44 AM, Yuanhan Liu wrote: > >On Mon, Feb 12, 2018 at 04:51:58PM +0800, Yuanhan Liu wrote: > >>I was doing terrible reviews jobs recently, and to not hold back > >>the vhost/virtio development, it's better for me to resign. > > I would not put this in the commit message, because even you had less > time to review vhost/virtio patches recently, you have done a great work > on maintaining virtio and vhost generally, we owe you a lot of thanks > for this. Thank you! > >>Then here it is. > > > >One thing I forgot to mention is, not surprisingly, I will also resign > >from the committer role on the next-virtio tree. > > I propose myself for the committer role, I have already some experience as > Kernel maintainer for some ARM SoCs. Thank you again for volunteering youself to take care of it. I think you have proved that well since you joined DPDK community :) So Acked-by: Yuanhan Liu Also Cc'ed the teckboard, per ask from Thomas. --yliu > > Cheers, > Maxime > > > --yliu > >> > >>Signed-off-by: Yuanhan Liu > >>--- > >> MAINTAINERS | 3 --- > >> 1 file changed, 3 deletions(-) > >> > >>diff --git a/MAINTAINERS b/MAINTAINERS > >>index 4f1f33b..f12d83d 100644 > >>--- a/MAINTAINERS > >>+++ b/MAINTAINERS > >>@@ -557,7 +557,6 @@ F: doc/guides/nics/vmxnet3.rst > >> F: doc/guides/nics/features/vmxnet3.ini > >> Vhost-user > >>-M: Yuanhan Liu > >> M: Maxime Coquelin > >> T: git://dpdk.org/next/dpdk-next-virtio > >> F: lib/librte_vhost/ > >>@@ -569,14 +568,12 @@ F: doc/guides/sample_app_ug/vhost_scsi.rst > >> Vhost PMD > >> M: Tetsuya Mukawa > >>-M: Yuanhan Liu > >> M: Maxime Coquelin > >> T: git://dpdk.org/next/dpdk-next-virtio > >> F: drivers/net/vhost/ > >> F: doc/guides/nics/features/vhost.ini > >> Virtio PMD > >>-M: Yuanhan Liu > >> M: Maxime Coquelin > >> M: Tiwei Bie > >> T: git://dpdk.org/next/dpdk-next-virtio > >>-- > >>2.7.4
Re: [dpdk-dev] [PATCH] MAINTAINERS: resign from vhost/virtio maintainer
On Mon, Feb 12, 2018 at 04:51:58PM +0800, Yuanhan Liu wrote: > I was doing terrible reviews jobs recently, and to not hold back > the vhost/virtio development, it's better for me to resign. > > Then here it is. One thing I forgot to mention is, not surprisingly, I will also resign from the committer role on the next-virtio tree. --yliu > > Signed-off-by: Yuanhan Liu > --- > MAINTAINERS | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 4f1f33b..f12d83d 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -557,7 +557,6 @@ F: doc/guides/nics/vmxnet3.rst > F: doc/guides/nics/features/vmxnet3.ini > > Vhost-user > -M: Yuanhan Liu > M: Maxime Coquelin > T: git://dpdk.org/next/dpdk-next-virtio > F: lib/librte_vhost/ > @@ -569,14 +568,12 @@ F: doc/guides/sample_app_ug/vhost_scsi.rst > > Vhost PMD > M: Tetsuya Mukawa > -M: Yuanhan Liu > M: Maxime Coquelin > T: git://dpdk.org/next/dpdk-next-virtio > F: drivers/net/vhost/ > F: doc/guides/nics/features/vhost.ini > > Virtio PMD > -M: Yuanhan Liu > M: Maxime Coquelin > M: Tiwei Bie > T: git://dpdk.org/next/dpdk-next-virtio > -- > 2.7.4
[dpdk-dev] [PATCH] MAINTAINERS: resign from vhost/virtio maintainer
I was doing terrible reviews jobs recently, and to not hold back the vhost/virtio development, it's better for me to resign. Then here it is. Signed-off-by: Yuanhan Liu --- MAINTAINERS | 3 --- 1 file changed, 3 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 4f1f33b..f12d83d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -557,7 +557,6 @@ F: doc/guides/nics/vmxnet3.rst F: doc/guides/nics/features/vmxnet3.ini Vhost-user -M: Yuanhan Liu M: Maxime Coquelin T: git://dpdk.org/next/dpdk-next-virtio F: lib/librte_vhost/ @@ -569,14 +568,12 @@ F: doc/guides/sample_app_ug/vhost_scsi.rst Vhost PMD M: Tetsuya Mukawa -M: Yuanhan Liu M: Maxime Coquelin T: git://dpdk.org/next/dpdk-next-virtio F: drivers/net/vhost/ F: doc/guides/nics/features/vhost.ini Virtio PMD -M: Yuanhan Liu M: Maxime Coquelin M: Tiwei Bie T: git://dpdk.org/next/dpdk-next-virtio -- 2.7.4
[dpdk-dev] [PATCH] net/mlx4: use PCI address as port name
It is suggested to use PCI BDF to identify a port for port addition in OVS-DPDK. While mlx5 has its own naming style: name it by ib dev name. This breaks the typical OVS DPDK use case and brings more puzzle to the end users. To fix it, this patch changes it to use PCI BDF as the name, too. Judging the fact that there are 2 ports associated with one PCI for ConnectX-3, a postfix is needed. Thus, the final name looks like something below: :04:00.0-port0 :04:00.0-port1 Cc: sta...@dpdk.org Signed-off-by: Yuanhan Liu --- drivers/net/mlx4/mlx4.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c index ee93daf..eb8851c 100644 --- a/drivers/net/mlx4/mlx4.c +++ b/drivers/net/mlx4/mlx4.c @@ -497,6 +497,14 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) struct priv *priv = NULL; struct rte_eth_dev *eth_dev = NULL; struct ether_addr mac; + char name[RTE_ETH_NAME_MAX_LEN]; + int len; + + len = snprintf(name, sizeof(name), PCI_PRI_FMT, +pci_dev->addr.domain, pci_dev->addr.bus, +pci_dev->addr.devid, pci_dev->addr.function); + if (device_attr.phys_port_cnt > 1) + snprintf(name + len, sizeof(name), "-port%u", i); /* If port is not enabled, skip. */ if (!(conf.ports.enabled & (1 << i))) @@ -605,14 +613,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) /* Get actual MTU if possible. */ mlx4_mtu_get(priv, &priv->mtu); DEBUG("port %u MTU is %u", priv->port, priv->mtu); - /* from rte_ethdev.c */ - { - char name[RTE_ETH_NAME_MAX_LEN]; - - snprintf(name, sizeof(name), "%s port %u", -mlx4_glue->get_device_name(ibv_dev), port); - eth_dev = rte_eth_dev_allocate(name); - } + eth_dev = rte_eth_dev_allocate(name); if (eth_dev == NULL) { ERROR("can not allocate rte ethdev"); rte_errno = ENOMEM; -- 2.7.4
Re: [dpdk-dev] [PATCH] maintainers: update vhost lib and pmd
On Sun, Feb 11, 2018 at 01:04:51AM +, Jianfeng Tan wrote: > Signed-off-by: Jianfeng Tan Acked-by: Yuanhan Liu And welcome! BTW, the verb "update" doesn't look good to me ;) --yliu > --- > MAINTAINERS | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 4f1f33b..38e5fb8 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -559,6 +559,7 @@ F: doc/guides/nics/features/vmxnet3.ini > Vhost-user > M: Yuanhan Liu > M: Maxime Coquelin > +M: Jianfeng Tan > T: git://dpdk.org/next/dpdk-next-virtio > F: lib/librte_vhost/ > F: doc/guides/prog_guide/vhost_lib.rst > @@ -571,6 +572,7 @@ Vhost PMD > M: Tetsuya Mukawa > M: Yuanhan Liu > M: Maxime Coquelin > +M: Jianfeng Tan > T: git://dpdk.org/next/dpdk-next-virtio > F: drivers/net/vhost/ > F: doc/guides/nics/features/vhost.ini > -- > 2.7.4
[dpdk-dev] [PATCH] net/mlx5: fix link speed
When the link is down, mlx5 kernel driver reports the link speed as -1 (UNKNOWN_SPEED). We need turn it to 0 for such case, otherwise, it will be re-queried again due to the link_speed is not 0, due to following code: 1201 if (((link->link_speed == 0) && link->link_status) || 1202 ((link->link_speed != 0) && !link->link_status)) { 1203 /* 1204* Inconsistent status. Event likely occurred before the 1205* kernel netdevice exposes the new status. 1206*/ Fixes: 188408719888 ("net/mlx5: fix support for newer link speeds") Cc: sta...@dpdk.org Signed-off-by: Yuanhan Liu --- drivers/net/mlx5/mlx5_ethdev.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index 6665076..492ca07 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -808,6 +808,10 @@ mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev, int wait_to_complete) return -1; } dev_link.link_speed = ecmd->speed; + if (link_speed == -1) + dev_link.link_speed = 0; + else + dev_link.link_speed = link_speed; sc = ecmd->link_mode_masks[0] | ((uint64_t)ecmd->link_mode_masks[1] << 32); priv->link_speed_capa = 0; -- 2.7.4
Re: [dpdk-dev] [PATCH v3 0/2] vhost: IOTLB fixes
On Mon, Jan 29, 2018 at 05:30:38PM +0100, Maxime Coquelin wrote: > First patch of the series fixes OOM handling from the IOTLB > mempool, the second one removes pending IOTLB entry when the > IOTLB miss request sending failed. > > Changes since v2: > - > - patch 2: Fix error message with correct IOVA > > Changes since v1: > - > - Make log levels consistent (Tiwei) > - Remove pending IOTLB entry of miss request seding failed (Tiwei) > > Maxime Coquelin (2): > vhost: fix iotlb pool out-of-memory handling > vhost: remove pending IOTLB entry if IOTLB MISS request sending failed Series Acked-by: Yuanhan Liu Thanks. --yliu > > lib/librte_vhost/iotlb.c | 20 ++-- > lib/librte_vhost/iotlb.h | 3 +++ > lib/librte_vhost/vhost.c | 13 ++--- > 3 files changed, 27 insertions(+), 9 deletions(-) > > -- > 2.14.3
Re: [dpdk-dev] [PATCH] examples/vhost_scsi: drop unimplemented EVENT_IDX feature bit
On Wed, Jan 31, 2018 at 05:48:28PM +, Stefan Hajnoczi wrote: > The vhost_scsi example application negotiates the > VIRTIO_RING_F_EVENT_IDX feature bit but does not honor it when accessing > vrings. > > In particular, commit e37ff954405addb8ea422426a2d162d00dcad196 ("vhost: > support virtqueue interrupt/notification suppression") broke vring call > because vq->last_used_idx is never updated by vhost_scsi. The > vq->last_used_idx field is not even available via the librte_vhost > public API, so VIRTIO_RING_F_EVENT_IDX is currently only usable by the > built-in virtio_net.c driver in librte_vhost. > > This patch drops VIRTIO_RING_F_EVENT_IDX from vhost_scsi so that vring > call works again. > > Cc: Changpeng Liu > Cc: Junjie Chen > Signed-off-by: Stefan Hajnoczi Acked-by: Yuanhan Liu Thanks. --yliu > --- > examples/vhost_scsi/vhost_scsi.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/examples/vhost_scsi/vhost_scsi.c > b/examples/vhost_scsi/vhost_scsi.c > index da01ad378..3cb4383e9 100644 > --- a/examples/vhost_scsi/vhost_scsi.c > +++ b/examples/vhost_scsi/vhost_scsi.c > @@ -21,7 +21,6 @@ > #include "scsi_spec.h" > > #define VIRTIO_SCSI_FEATURES ((1 << VIRTIO_F_NOTIFY_ON_EMPTY) |\ > - (1 << VIRTIO_RING_F_EVENT_IDX) |\ > (1 << VIRTIO_SCSI_F_INOUT) |\ > (1 << VIRTIO_SCSI_F_CHANGE)) > > -- > 2.14.3
Re: [dpdk-dev] [PATCH 0/2] vhost: fix VIRTIO_NET_F_MQ vhost_scsi breakage
On Wed, Jan 31, 2018 at 05:46:49PM +, Stefan Hajnoczi wrote: > These patches fix a recent regression in librte_vhost that breaks the > vhost_scsi example application. vhost_user.c assumes all devices are vhost > net > backends when handling the VIRTIO_NET_F_MQ feature bit. The code is triggered > by vhost scsi devices and causes virtqueues to be removed. See Patch 2 for > details. > > Patch 1 puts the infrastructure in place to distinguish between the built-in > virtio_net.c driver and generic vhost device backend usage. > > Patch 2 fixes the regression by handling VIRTIO_NET_F_MQ only when the > built-in > virtio_net.c driver is in use. > > Stefan Hajnoczi (2): > vhost: add flag for built-in virtio_net.c driver > vhost: only drop vqs with built-in virtio_net.c driver Series Acked-by: Yuanhan Liu Thanks. --yliu > > lib/librte_vhost/vhost.h | 3 +++ > lib/librte_vhost/socket.c | 15 +++ > lib/librte_vhost/vhost.c | 17 - > lib/librte_vhost/vhost_user.c | 3 ++- > lib/librte_vhost/virtio_net.c | 14 ++ > 5 files changed, 50 insertions(+), 2 deletions(-) > > -- > 2.14.3
Re: [dpdk-dev] [PATCH 1/2] vhost: add flag for built-in virtio_net.c driver
Hi Stefan, On Wed, Jan 31, 2018 at 05:46:50PM +, Stefan Hajnoczi wrote: > The librte_vhost API is used in two ways: > 1. As a vhost net device backend via rte_vhost_enqueue/dequeue_burst(). This is how DPDK vhost-user firstly implemented. > 2. As a library for implementing vhost device backends. This is how DPDK vhost-use extended later, and vhost-user scsi is the first one being added. > There is no distinction between the two at the API level or in the > librte_vhost implementation. For example, device state is kept in > "struct virtio_net" regardless of whether this is actually a net device > backend or whether the built-in virtio_net.c driver is in use. Indeed. virtio_net should be renamed to "vhost_dev" or something like this. It's part of something un-finished in the last vhost-user extension refactoring. > > The virtio_net.c driver should be a librte_vhost API client just like > the vhost-scsi code and have no special access to vhost.h internals. > Unfortunately, fixing this requires significant librte_vhost API > changes. The way I thought was to move the virtio_net.c completely to vhost pmd (drivers/net/vhost). And let vhost-user just be a generic lib without any device specific stuff. Unfortunately, it can not be done recently, as there are still a lot of applications using rte_vhost_enqueue/dequeue_burst directly, for example, OVS. > This patch takes a different approach: keep the librte_vhost API > unchanged but track whether the built-in virtio_net.c driver is in use. > See the next patch for a bug fix that requires knowledge of whether > virtio_net.c is in use. LGTM. Thanks. --yliu
Re: [dpdk-dev] [PATCH v3 0/4] net/virtio: fix memory leak when reinitializing device
On Tue, Jan 23, 2018 at 04:54:39PM +0100, Olivier Matz wrote: > When devops->configure() is called, the queues may be reallocated > if the features flag changed, but the previous are not freed. This > patchset fixes this issue. Seires applied to dpdk-next-virtio. Thanks. --yliu > > To really point out the issue, I instrumented rte_malloc, > rte_memzone_reserve, rte_mbuf_alloc to track the allocations and frees. > For reference, the patch is available here: > https://www.droids-corp.org/~zer0/hidden/0001-debug-malloc.patch > Iit conflicts a bit with the patchset, but it is quite easy to solve. > > To reproduce the issue: > > cd dpdk > make config T=x86_64-native-linuxapp-gcc > sed -i > 's,CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_INIT=n,CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_INIT=y,' > build/.config > make -j4 > mkdir -p /mnt/huge > mount -t hugetlbfs nodev /mnt/huge > echo 256 > > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages > modprobe uio_pci_generic > python usertools/dpdk-devbind.py -b uio_pci_generic :00:02.0 > ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \ > -i --port-topology=chained --disable-hw-vlan-filter \ > --disable-hw-vlan-strip --enable-rx-cksum --enable-lro \ > --txqflags=0 --no-lsc-interrupt > > # testpmd commands > port stop 0 > port config all rx-cksum off > port start 0 > quit > > Without the patchset, the debug patch displays a leak of mbufs allocated > from virtio_dev_rx_queue_setup_finish() and a leak of rte_mallocs allocated > from virtio_init_queue(). > > After the patchset, only one malloc is remaining after the device close, > but this is normal, because this area is freed in dev_uninit(). > > v2 -> v3: > - fix queue flushing with vector Rx enabled (Tiwei). > The patch is slightly changed compared to what was proposed by mail, > due to a checkpatch warning. > > v1 -> v2: > - move the vq != NULL check from the refactor patch to the memory > leak fix patch (Yuanhan) > - new patch to fix queue flushing with vector Rx enabled (Tiwei) > - move the backportable fixes at the beginning of the patchset (Yuanhan) > - remove Cc stable on the typo fix (Yuanhan) > note: I finally kept it in the patchset > > > Olivier Matz (4): > net/virtio: fix queue flushing with vector Rx enabled > net/virtio: fix memory leak when reinitializing device > net/virtio: rationalize queue flushing > net/virtio: fix typo in function name > > drivers/net/virtio/virtio_ethdev.c | 71 > +- > drivers/net/virtio/virtqueue.c | 30 +--- > drivers/net/virtio/virtqueue.h | 13 ++- > 3 files changed, 69 insertions(+), 45 deletions(-) > > -- > 2.11.0
Re: [dpdk-dev] [PATCH] net/virtio: fix Rx and Tx handler selection for arm32
On Thu, Dec 14, 2017 at 03:45:04PM +0100, Maxime Coquelin wrote: > Hi Olivier, > > On 12/14/2017 03:32 PM, Olivier Matz wrote: > >From: Samuel Gauthier > > > >On arm32, we were always selecting the simple handler, but it is only > >available if neon is present. > > > >This is due to a typo in the name of the config option. > >CONFIG_RTE_ARCH_ARM is for Makefiles. One should use RTE_ARCH_ARM. > > > >Fixes: 2d7c37194ee4 ("net/virtio: add NEON based Rx handler") > >Cc:sta...@dpdk.org > > > >Signed-off-by: Samuel Gauthier > >--- > > drivers/net/virtio/virtio_ethdev.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > Reviewed-by: Maxime Coquelin Applied to dpdk-next-virtio. Thanks. --yliu
Re: [dpdk-dev] [PATCH v2] net/virtio-user: fix segfault as features change
On Tue, Jan 23, 2018 at 09:52:43AM +, Jianfeng Tan wrote: > Since commit 59fe5e17d930 ("vhost: propagate set features handling error"), > vhost does not allow to set different features without reset. > > The virito-user driver fails to reset the device in below commit. > > To fix, we send the reset message as stopping the device. > > Fixes: c12a26ee209e ("net/virtio-user: fix not properly reset device") > > Reported-by: Lei Yao > Reported-by: Tiwei Bie > Signed-off-by: Jianfeng Tan Applied to dpdk-next-virtio. Thanks. --yliu
Re: [dpdk-dev] [PATCH] vhost: fix ANY_LAYOUT declaration
On Sat, Jan 20, 2018 at 01:54:52AM +0800, Tan, Jianfeng wrote: > > > On 1/19/2018 10:42 PM, Yuanhan Liu wrote: > >On Fri, Jan 19, 2018 at 02:02:50PM -0500, Zhihong Wang wrote: > >>The VIRTIO_F_ANY_LAYOUT feature indicates the device accepts arbitrary > >>descriptor layouts. The vhost-user lib already supports it, but the > >>feature declaration is missing. This patch fixes the mismatch. > >I remembered there was a long discussion one year ago, that we can't > >blindly set this feature, as this flag is reserved (thus should not > >be set) for virtio 1.0. > > We might need to read that old thread again. But as you said, this flag is > reserved for 1.0, but not used for other purpose yet. So that the feature is > negotiated does not affect anything, no? > > > > >We should set it when v1.0 is not enabled. > > But in fact, vhost kernel reports supported feature bits consisting of > ANY_LAYOUT and VERSION_1. Yes, I was aware of that, and that was also one my points before. I now also release that it's a must from migration from vhost-net to vhost-user. Thus, I'd like to merge it this time. So Applied to dpdk-next-virtio. Thanks. --yliu > > > However, as you know, v1.0 is also enabled by default in DPDK vhost-user. > > Even v1.0 is used, the feature negotiated between virtio-net (kernel) and > vhost kernel indeed contains ANY_LAYOUT. I suppose we shall do as the > de-facto way. > > Thanks, > Jianfeng > > > > > --yliu > >>Signed-off-by: Zhihong Wang > >>--- > >> lib/librte_vhost/vhost.h | 1 + > >> 1 file changed, 1 insertion(+) > >> > >>diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > >>index b2bf0e8..57a9bea 100644 > >>--- a/lib/librte_vhost/vhost.h > >>+++ b/lib/librte_vhost/vhost.h > >>@@ -170,6 +170,7 @@ struct vhost_msg { > >> /* Features supported by this builtin vhost-user net driver. */ > >> #define VIRTIO_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | > >> \ > >>+ (1ULL << VIRTIO_F_ANY_LAYOUT) | \ > >>(1ULL << VIRTIO_NET_F_CTRL_VQ) | \ > >>(1ULL << VIRTIO_NET_F_CTRL_RX) | \ > >>(1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \ > >>-- > >>2.7.5
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
On Wed, Jan 24, 2018 at 05:57:34PM +0100, Thomas Monjalon wrote: > 24/01/2018 16:04, Yuanhan Liu: > > On Wed, Jan 24, 2018 at 11:37:31AM +0100, Thomas Monjalon wrote: > > > 24/01/2018 11:36, Yuanhan Liu: > > > > On Wed, Jan 24, 2018 at 11:21:44AM +0100, Thomas Monjalon wrote: > > > > > 24/01/2018 10:28, Yuanhan Liu: > > > > > > On Wed, Jan 24, 2018 at 09:19:10AM +0100, Thomas Monjalon wrote: > > > > > > > 24/01/2018 07:43, Yuanhan Liu: > > > > > > > > On Tue, Jan 23, 2018 at 03:29:34PM +0100, Thomas Monjalon wrote: > > > > > > > > > 23/01/2018 13:46, Yuanhan Liu: > > > > > > > > > > If port not found, then the whole string will be used for > > > > > > > > > > dev attachment. > > > > > > > > > > It means we are attaching a port with PCI BDF == 04.00.0 AND > > > > > > > > > > port == 0 (the 2nd port will not be attached). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > And here is how the devargs would look like if > > > > > > > > > > "matching;settings" is > > > > > > > > > > being used: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > bus=pci,id=04:00.0/class=eth,port=0;bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,... > > > > > > > > > > > > > > > > > > > > The part before ";" will be used for lookup and the later > > > > > > > > > > part will be > > > > > > > > > > used for attachment. It should work. It just looks > > > > > > > > > > redundant. > > > > > > > > > > > > > > > > > > It does not have to be redundant. > > > > > > > > > It can be: > > > > > > > > > > > > > > > > > > bus=pci,id=04:00.0/class=eth,port=0;driver=mlx4,mlx4_arg1=settings1,... > > > > > > > > > > > > > > > > I knew you would make such reply :) > > > > > > > > Then there is a contradiction. According your suggestion, the > > > > > > > > "port=0" belongs > > > > > > > > to the matching section, but it also has to be used in the > > > > > > > > settings section. > > > > > > > > > > > > > > If port=0 is matched, it is already set, right? > > > > > > > > > > > > Yes. > > > > > > > > > > > > > Why it needs to be in settings? > > > > > > > > > > > > But I was talking the case it's not matched, say it's not probed > > > > > > and here > > > > > > we do hotplug. > > > > > > > > > > I don't understand. > > > > > Anyway, the port property should be read-only. > > > > > > > > All proberties should be read-only. > > > > > > > > > Are we talking about the dev_port from the Linux kernel? > > > > > > > > Yes. And it can be used for probing one port only (out of 2 ports in a > > > > NIC) > > > > at probe stage. So, at this stage, it's a setting but not a match. > > > > > > No it's a match! > > > > > > A settings is changing data in the port. > > > > So I see that's your definition about the "settings". What I think is > > everything needed for driver initiation are settings. > > > > For example, one proposed interface for VF rep is the "vf_id" property, > > Similar to "port" property we have just discussed above, it's used for > > probing one specific VR rep for the given VF id. > > > > You can say it's a match here, just like the "port" property. > > > > But note that "vf_id" could be a range, to enable multiple VF reps. > > The semantics looks like "setting" more than "match". > > Not sure why it would look like settings. > > > Another example is from the failsafe PMD that Gaetan had mentioned: > > > > driver=failsafe,dev(bus=pci,id=00:02.0),fd(/some/file/) > > > > They (dev and fd) should belong the "setting" section, for 2 reasons: > > > > - they should not be used for matching > > - they are used for failsafe PMD initiation > > Yes these ones are settings. > > > But it belongs "match", according to your definition about "settings", > > because it doesn't change data in the port. > > No, it changes data, dev() is adding a slave, and fd() is adding > a file descriptor. Then "port" should be settings, as it's used for adding a port. Moreover, "vf_id=0-3" is definitely settings (according your definition), as it's going to add 4 VF rep ports. --yliu > So we agree these are settings. > > > That also means, the word "settings" might not be well named. It's > > probably better to name it "drvargs". > > I disagree. But it's only naming. > Settings can be class settings, not only driver settings. > And driver properties can be matching or settings. > So "drvargs" does not make sense.
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
On Tue, Jan 23, 2018 at 05:08:16PM +0100, Gaëtan Rivet wrote: > Drivers answers to a specific API (ethdev, cryptodev, ...), to create > standardized objects in response to parameters that are given to them > for init. I think matching properties should be restricted to higher > classes (bus, eth/crypto), That's also what I thought. But I'm okay to have "driver" category included for matching. I just don't really see a good example for that. > while the driver class should be left > free-form and to the responsibility of the PMD itself (while having the > proper libraries for helping parsing safely, thus driving developpers > toward similar syntaxes, while not forcing them in those). I agree. The drv args are parsed by the drivers after all. It's hard to have a good parser for all. I also don't know why we have to force them to use "key=value" pairs. I even see some drawbacks from the forcement: - some PMDs already use none key/value format. Forcing them breaks more. If the "-w" "--vdev" compatibility is kept", nothing will be broken from the user point of view. However, if "key=value" pair is going to be used, user have to do some changes. - Some "value" might have to use the nested "=". Handling the nested pairs introduces more complexity. - sometimes, it's simple without an assignment. For example, it could be "driver=vhost-pmd,...,client" to let the vhost PMD acts as the client mode. Both Linux kernel and QEMU don't force the "key=value" pair usage, I don't see any good reason why we have to do that. --yliu
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
On Wed, Jan 24, 2018 at 11:37:31AM +0100, Thomas Monjalon wrote: > 24/01/2018 11:36, Yuanhan Liu: > > On Wed, Jan 24, 2018 at 11:21:44AM +0100, Thomas Monjalon wrote: > > > 24/01/2018 10:28, Yuanhan Liu: > > > > On Wed, Jan 24, 2018 at 09:19:10AM +0100, Thomas Monjalon wrote: > > > > > 24/01/2018 07:43, Yuanhan Liu: > > > > > > On Tue, Jan 23, 2018 at 03:29:34PM +0100, Thomas Monjalon wrote: > > > > > > > 23/01/2018 13:46, Yuanhan Liu: > > > > > > > > If port not found, then the whole string will be used for dev > > > > > > > > attachment. > > > > > > > > It means we are attaching a port with PCI BDF == 04.00.0 AND > > > > > > > > port == 0 (the 2nd port will not be attached). > > > > > > > > > > > > > > > > > > > > > > > > And here is how the devargs would look like if > > > > > > > > "matching;settings" is > > > > > > > > being used: > > > > > > > > > > > > > > > > > > > > > > > > bus=pci,id=04:00.0/class=eth,port=0;bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,... > > > > > > > > > > > > > > > > The part before ";" will be used for lookup and the later part > > > > > > > > will be > > > > > > > > used for attachment. It should work. It just looks redundant. > > > > > > > > > > > > > > It does not have to be redundant. > > > > > > > It can be: > > > > > > > > > > > > > > bus=pci,id=04:00.0/class=eth,port=0;driver=mlx4,mlx4_arg1=settings1,... > > > > > > > > > > > > I knew you would make such reply :) > > > > > > Then there is a contradiction. According your suggestion, the > > > > > > "port=0" belongs > > > > > > to the matching section, but it also has to be used in the settings > > > > > > section. > > > > > > > > > > If port=0 is matched, it is already set, right? > > > > > > > > Yes. > > > > > > > > > Why it needs to be in settings? > > > > > > > > But I was talking the case it's not matched, say it's not probed and > > > > here > > > > we do hotplug. > > > > > > I don't understand. > > > Anyway, the port property should be read-only. > > > > All proberties should be read-only. > > > > > Are we talking about the dev_port from the Linux kernel? > > > > Yes. And it can be used for probing one port only (out of 2 ports in a NIC) > > at probe stage. So, at this stage, it's a setting but not a match. > > No it's a match! > > A settings is changing data in the port. So I see that's your definition about the "settings". What I think is everything needed for driver initiation are settings. For example, one proposed interface for VF rep is the "vf_id" property, Similar to "port" property we have just discussed above, it's used for probing one specific VR rep for the given VF id. You can say it's a match here, just like the "port" property. But note that "vf_id" could be a range, to enable multiple VF reps. The semantics looks like "setting" more than "match". Another example is from the failsafe PMD that Gaetan had mentioned: driver=failsafe,dev(bus=pci,id=00:02.0),fd(/some/file/) They (dev and fd) should belong the "setting" section, for 2 reasons: - they should not be used for matching - they are used for failsafe PMD initiation But it belongs "match", according to your definition about "settings", because it doesn't change data in the port. That also means, the word "settings" might not be well named. It's probably better to name it "drvargs". --yliu
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
On Wed, Jan 24, 2018 at 11:21:44AM +0100, Thomas Monjalon wrote: > 24/01/2018 10:28, Yuanhan Liu: > > On Wed, Jan 24, 2018 at 09:19:10AM +0100, Thomas Monjalon wrote: > > > 24/01/2018 07:43, Yuanhan Liu: > > > > On Tue, Jan 23, 2018 at 03:29:34PM +0100, Thomas Monjalon wrote: > > > > > 23/01/2018 13:46, Yuanhan Liu: > > > > > > If port not found, then the whole string will be used for dev > > > > > > attachment. > > > > > > It means we are attaching a port with PCI BDF == 04.00.0 AND > > > > > > port == 0 (the 2nd port will not be attached). > > > > > > > > > > > > > > > > > > And here is how the devargs would look like if "matching;settings" > > > > > > is > > > > > > being used: > > > > > > > > > > > > > > > > > > bus=pci,id=04:00.0/class=eth,port=0;bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,... > > > > > > > > > > > > The part before ";" will be used for lookup and the later part will > > > > > > be > > > > > > used for attachment. It should work. It just looks redundant. > > > > > > > > > > It does not have to be redundant. > > > > > It can be: > > > > > > > > > > bus=pci,id=04:00.0/class=eth,port=0;driver=mlx4,mlx4_arg1=settings1,... > > > > > > > > I knew you would make such reply :) > > > > Then there is a contradiction. According your suggestion, the "port=0" > > > > belongs > > > > to the matching section, but it also has to be used in the settings > > > > section. > > > > > > If port=0 is matched, it is already set, right? > > > > Yes. > > > > > Why it needs to be in settings? > > > > But I was talking the case it's not matched, say it's not probed and here > > we do hotplug. > > I don't understand. > Anyway, the port property should be read-only. All proberties should be read-only. > Are we talking about the dev_port from the Linux kernel? Yes. And it can be used for probing one port only (out of 2 ports in a NIC) at probe stage. So, at this stage, it's a setting but not a match. --yliu
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
On Wed, Jan 24, 2018 at 09:19:10AM +0100, Thomas Monjalon wrote: > 24/01/2018 07:43, Yuanhan Liu: > > On Tue, Jan 23, 2018 at 03:29:34PM +0100, Thomas Monjalon wrote: > > > 23/01/2018 13:46, Yuanhan Liu: > > > > On Thu, Jan 18, 2018 at 10:46:23AM +0100, Gaëtan Rivet wrote: > > > > > On Thu, Jan 18, 2018 at 09:46:29AM +0100, Thomas Monjalon wrote: > > > > > > 18/01/2018 08:35, Yuanhan Liu: > > > > > > > On Wed, Jan 17, 2018 at 12:34:08PM +, Ferruh Yigit wrote: > > > > > > > > So does it make sense to separate them logically? Perhaps as > > > > > > > > "device identifier" > > > > > > > > and "device args". > > > > > > > > > > > > > > Then I think it returns back to the old issue: how could we > > > > > > > identify a > > > > > > > port when the bus id (say BDF for PCI bus) is not enough for > > > > > > > identifying > > > > > > > a port? Such case could happen when a single NIC has 2 ports > > > > > > > sharing > > > > > > > the same BDF. It could also happen with the VF representors that > > > > > > > will > > > > > > > be introduced shortly. > > > > > > > > > > > > Yes, the device matching syntax must include bus category, class > > > > > > category > > > > > > and driver category. So any device can be identified in future. > > > > > > > > > > > > But I think Ferruh is talking about separating device matching > > > > > > (which is described in this proposal) and device settings > > > > > > (which are usually mixed in -w and --vdev options). > > > > > > I agree there are different things and may be separate. > > > > > > They could share the same syntax (bus/class/driver) but be separate > > > > > > with a semicolon: > > > > > > matching;settings > > > > > > > > > > Can you give an example? > > > > > > > > Let's take port addition in OVS-DPDK as an example. It happens in 2 > > > > steps: > > > > - port lookup (if port is already probed) > > > > - dev attachment (if lookup fails) > > > > > > > > And also let's assume we need probe a ConnectX-3 port. Note that for > > > > ConnectX-3, there are 2 ports sharing the same PCI addr. Thus, PCI > > > > BDF is not enough. And let's assume we use another extra property > > > > "port". > > > > > > > > If the proposal described in this patch is being used, the devarg > > > > would look like following: > > > > > > > > bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,... > > > > > > > > Then "bus=pci,id=04:00.0/class=eth,port=0" will be used for lookup, > > > > It means we are looking for a port with PCI BDF == 04:00.0 AND > > > > port == 0 (the first port of the 2 ports). > > > > > > > > Note that in my proposal the driver category is not intended for lookup. > > > > If any properties needed be looked in the driver category, they would > > > > probably need be elevated to the class category. > > > > > > It is not my thought. > > > I think we should be able to use bus, class and driver properties for > > > lookup. > > > We can imagine doing a lookup on a driver specific id, which is not > > > candidate to elevation to the class category. > > > > > > > If port not found, then the whole string will be used for dev > > > > attachment. > > > > It means we are attaching a port with PCI BDF == 04.00.0 AND > > > > port == 0 (the 2nd port will not be attached). > > > > > > > > > > > > And here is how the devargs would look like if "matching;settings" is > > > > being used: > > > > > > > > > > > > bus=pci,id=04:00.0/class=eth,port=0;bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,... > > > > > > > > The part before ";" will be used for lookup and the later part will be > > > > used for attachment. It should work. It just looks redundant. > > > > > > It does not have to be redundant. > > > It can be: > > > bus=pci,id=04:00.0/class=eth,port=0;driver=mlx4,mlx4_arg1=settings1,... > > > > I knew you would make such reply :) > > Then there is a contradiction. According your suggestion, the "port=0" > > belongs > > to the matching section, but it also has to be used in the settings section. > > If port=0 is matched, it is already set, right? Yes. > Why it needs to be in settings? But I was talking the case it's not matched, say it's not probed and here we do hotplug. --yliu > > > Another example, setting the MAC address: > > > bus=pci,id=04:00.0/class=eth,port=0;class=eth,mac=00:11:22:33:44:55 > > > > What's the scenario it will be used? And who is going to parse it? > > It can be used on command line for whitelisting a device and let the user > change its MAC address. > The matching is parsed by the PCI driver + ethdev, here. > As mac is a property of ethdev (class=eth), this part must be parsed by > ethdev.
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
On Tue, Jan 23, 2018 at 03:29:34PM +0100, Thomas Monjalon wrote: > 23/01/2018 13:46, Yuanhan Liu: > > On Thu, Jan 18, 2018 at 10:46:23AM +0100, Gaëtan Rivet wrote: > > > On Thu, Jan 18, 2018 at 09:46:29AM +0100, Thomas Monjalon wrote: > > > > 18/01/2018 08:35, Yuanhan Liu: > > > > > On Wed, Jan 17, 2018 at 12:34:08PM +, Ferruh Yigit wrote: > > > > > > So does it make sense to separate them logically? Perhaps as > > > > > > "device identifier" > > > > > > and "device args". > > > > > > > > > > Then I think it returns back to the old issue: how could we identify a > > > > > port when the bus id (say BDF for PCI bus) is not enough for > > > > > identifying > > > > > a port? Such case could happen when a single NIC has 2 ports sharing > > > > > the same BDF. It could also happen with the VF representors that will > > > > > be introduced shortly. > > > > > > > > Yes, the device matching syntax must include bus category, class > > > > category > > > > and driver category. So any device can be identified in future. > > > > > > > > But I think Ferruh is talking about separating device matching > > > > (which is described in this proposal) and device settings > > > > (which are usually mixed in -w and --vdev options). > > > > I agree there are different things and may be separate. > > > > They could share the same syntax (bus/class/driver) but be separate > > > > with a semicolon: > > > > matching;settings > > > > > > Can you give an example? > > > > Let's take port addition in OVS-DPDK as an example. It happens in 2 > > steps: > > - port lookup (if port is already probed) > > - dev attachment (if lookup fails) > > > > And also let's assume we need probe a ConnectX-3 port. Note that for > > ConnectX-3, there are 2 ports sharing the same PCI addr. Thus, PCI > > BDF is not enough. And let's assume we use another extra property > > "port". > > > > If the proposal described in this patch is being used, the devarg > > would look like following: > > > > bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,... > > > > Then "bus=pci,id=04:00.0/class=eth,port=0" will be used for lookup, > > It means we are looking for a port with PCI BDF == 04:00.0 AND > > port == 0 (the first port of the 2 ports). > > > > Note that in my proposal the driver category is not intended for lookup. > > If any properties needed be looked in the driver category, they would > > probably need be elevated to the class category. > > It is not my thought. > I think we should be able to use bus, class and driver properties for lookup. > We can imagine doing a lookup on a driver specific id, which is not > candidate to elevation to the class category. > > > If port not found, then the whole string will be used for dev attachment. > > It means we are attaching a port with PCI BDF == 04.00.0 AND > > port == 0 (the 2nd port will not be attached). > > > > > > And here is how the devargs would look like if "matching;settings" is > > being used: > > > > > > bus=pci,id=04:00.0/class=eth,port=0;bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,... > > > > The part before ";" will be used for lookup and the later part will be > > used for attachment. It should work. It just looks redundant. > > It does not have to be redundant. > It can be: > bus=pci,id=04:00.0/class=eth,port=0;driver=mlx4,mlx4_arg1=settings1,... I knew you would make such reply :) Then there is a contradiction. According your suggestion, the "port=0" belongs to the matching section, but it also has to be used in the settings section. > Another example, setting the MAC address: > bus=pci,id=04:00.0/class=eth,port=0;class=eth,mac=00:11:22:33:44:55 What's the scenario it will be used? And who is going to parse it? --yliu
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
On Thu, Jan 18, 2018 at 10:46:23AM +0100, Gaëtan Rivet wrote: > On Thu, Jan 18, 2018 at 09:46:29AM +0100, Thomas Monjalon wrote: > > 18/01/2018 08:35, Yuanhan Liu: > > > On Wed, Jan 17, 2018 at 12:34:08PM +, Ferruh Yigit wrote: > > > > So does it make sense to separate them logically? Perhaps as "device > > > > identifier" > > > > and "device args". > > > > > > Then I think it returns back to the old issue: how could we identify a > > > port when the bus id (say BDF for PCI bus) is not enough for identifying > > > a port? Such case could happen when a single NIC has 2 ports sharing > > > the same BDF. It could also happen with the VF representors that will > > > be introduced shortly. > > > > Yes, the device matching syntax must include bus category, class category > > and driver category. So any device can be identified in future. > > > > But I think Ferruh is talking about separating device matching > > (which is described in this proposal) and device settings > > (which are usually mixed in -w and --vdev options). > > I agree there are different things and may be separate. > > They could share the same syntax (bus/class/driver) but be separate > > with a semicolon: > > matching;settings > > Can you give an example? Let's take port addition in OVS-DPDK as an example. It happens in 2 steps: - port lookup (if port is already probed) - dev attachment (if lookup fails) And also let's assume we need probe a ConnectX-3 port. Note that for ConnectX-3, there are 2 ports sharing the same PCI addr. Thus, PCI BDF is not enough. And let's assume we use another extra property "port". If the proposal described in this patch is being used, the devarg would look like following: bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,... Then "bus=pci,id=04:00.0/class=eth,port=0" will be used for lookup, It means we are looking for a port with PCI BDF == 04:00.0 AND port == 0 (the first port of the 2 ports). Note that in my proposal the driver category is not intended for lookup. If any properties needed be looked in the driver category, they would probably need be elevated to the class category. If port not found, then the whole string will be used for dev attachment. It means we are attaching a port with PCI BDF == 04.00.0 AND port == 0 (the 2nd port will not be attached). And here is how the devargs would look like if "matching;settings" is being used: bus=pci,id=04:00.0/class=eth,port=0;bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,... The part before ";" will be used for lookup and the later part will be used for attachment. It should work. It just looks redundant. -yliu
[dpdk-dev] [PATCH] net/mlx5: use PCI BDF as the port name
It is suggested to use PCI BDF to identify a port for port addition in OVS-DPDK. While mlx5 has its own naming style: name it by ib dev name. This breaks the typical OVS DPDK use case and brings more puzzle to the end users. To fix it, this patch changes it to use PCI BDF as the name, too. Also, a postfix " port %u" is added, just in case their might be more than 1 port assoicated with a PCI device. Signed-off-by: Yuanhan Liu --- drivers/net/mlx5/mlx5.c | 22 +- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 9d1de36..7cd9db7 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -610,6 +610,8 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) INFO("%u port(s) detected", device_attr.orig_attr.phys_port_cnt); for (i = 0; i < device_attr.orig_attr.phys_port_cnt; i++) { + char name[RTE_ETH_NAME_MAX_LEN]; + int len; uint32_t port = i + 1; /* ports are indexed from one */ uint32_t test = (1 << i); struct ibv_context *ctx = NULL; @@ -633,14 +635,15 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) .inline_max_packet_sz = MLX5_ARG_UNSET, }; + len = snprintf(name, sizeof(name), PCI_PRI_FMT, +pci_dev->addr.domain, pci_dev->addr.bus, +pci_dev->addr.devid, pci_dev->addr.function); + if (device_attr.orig_attr.phys_port_cnt > 1) + snprintf(name + len, sizeof(name), " port %u", i); + mlx5_dev[idx].ports |= test; if (rte_eal_process_type() == RTE_PROC_SECONDARY) { - /* from rte_ethdev.c */ - char name[RTE_ETH_NAME_MAX_LEN]; - - snprintf(name, sizeof(name), "%s port %u", -ibv_get_device_name(ibv_dev), port); eth_dev = rte_eth_dev_attach_secondary(name); if (eth_dev == NULL) { ERROR("can not attach rte ethdev"); @@ -834,14 +837,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) priv_get_mtu(priv, &priv->mtu); DEBUG("port %u MTU is %u", priv->port, priv->mtu); - /* from rte_ethdev.c */ - { - char name[RTE_ETH_NAME_MAX_LEN]; - - snprintf(name, sizeof(name), "%s port %u", -ibv_get_device_name(ibv_dev), port); - eth_dev = rte_eth_dev_allocate(name); - } + eth_dev = rte_eth_dev_allocate(name); if (eth_dev == NULL) { ERROR("can not allocate rte ethdev"); err = ENOMEM; -- 2.7.4
Re: [dpdk-dev] [PATCH] vhost: fix ANY_LAYOUT declaration
On Fri, Jan 19, 2018 at 02:02:50PM -0500, Zhihong Wang wrote: > The VIRTIO_F_ANY_LAYOUT feature indicates the device accepts arbitrary > descriptor layouts. The vhost-user lib already supports it, but the > feature declaration is missing. This patch fixes the mismatch. I remembered there was a long discussion one year ago, that we can't blindly set this feature, as this flag is reserved (thus should not be set) for virtio 1.0. We should set it when v1.0 is not enabled. However, as you know, v1.0 is also enabled by default in DPDK vhost-user. --yliu > > Signed-off-by: Zhihong Wang > --- > lib/librte_vhost/vhost.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > index b2bf0e8..57a9bea 100644 > --- a/lib/librte_vhost/vhost.h > +++ b/lib/librte_vhost/vhost.h > @@ -170,6 +170,7 @@ struct vhost_msg { > > /* Features supported by this builtin vhost-user net driver. */ > #define VIRTIO_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \ > + (1ULL << VIRTIO_F_ANY_LAYOUT) | \ > (1ULL << VIRTIO_NET_F_CTRL_VQ) | \ > (1ULL << VIRTIO_NET_F_CTRL_RX) | \ > (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \ > -- > 2.7.5
Re: [dpdk-dev] [PATCH] reset src fd field to -1 in fdset_move of vhost
On Thu, Dec 21, 2017 at 05:15:40PM +0800, Bing Zhao wrote: > In the fdset_move, after copying the fd&rwfds from the src to the dst, the fd > should be set to -1. Or else in some cases, there will be a fault missing. > E.g: > Before: 1 -1 3 4 -1 6 7 -1 9 10 > After: 1 10 3 4 9 6 7 -1 9 10 > Then the index7 will be returned correctly for the first time, but if another > fd is to be added, it will fail. Hi, Have you met a real issue? I'm a bit doubt about that, since the fd array is also guarded by "pfdset->num", which makes sure we will not access those invalid entries (i.e. the last 2 entries in above example). --yliu > Signed-off-by: Bing Zhao > --- > lib/librte_vhost/fd_man.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/librte_vhost/fd_man.c b/lib/librte_vhost/fd_man.c > index 4c6fed418..48594dd7f 100644 > --- a/lib/librte_vhost/fd_man.c > +++ b/lib/librte_vhost/fd_man.c > @@ -63,6 +63,7 @@ fdset_move(struct fdset *pfdset, int dst, int src) > { > pfdset->fd[dst]= pfdset->fd[src]; > pfdset->rwfds[dst] = pfdset->rwfds[src]; > + pfdset->fd[src].fd = -1; > } > > static void > -- > 2.11.0.windows.1 >
Re: [dpdk-dev] [PATCH v5] vhost_user: protect active rings from async ring changes
On Wed, Jan 17, 2018 at 03:49:25PM +0200, Victor Kaplansky wrote: > When performing live migration or memory hot-plugging, > the changes to the device and vrings made by message handler > done independently from vring usage by PMD threads. > > This causes for example segfaults during live-migration > with MQ enable, but in general virtually any request > sent by qemu changing the state of device can cause > problems. Applied to dpdk-next-virtio, with Cc: sta...@dpdk.org Thanks. --yliu > > These patches fixes all above issues by adding a spinlock > to every vring and requiring message handler to start operation > only after ensuring that all PMD threads related to the device > are out of critical section accessing the vring data. > > Each vring has its own lock in order to not create contention > between PMD threads of different vrings and to prevent > performance degradation by scaling queue pair number. > > See https://bugzilla.redhat.com/show_bug.cgi?id=1450680 > > Signed-off-by: Victor Kaplansky > --- > v5: > o get rid of spinlock wrapping functions in vhost.h > > v4: > > o moved access_unlock before accessing enable flag and >access_unlock after iommu_unlock consistently. > o cosmetics: removed blank line. > o the access_lock variable moved to be in the same >cache line with enable and access_ok flags. > o dequeue path is now guarded with trylock and returning >zero if unsuccessful. > o GET_VRING_BASE operation is not guarded by access lock >to avoid deadlock with device_destroy. See the comment >in the code. > o Fixed error path exit from enqueue and dequeue carefully >unlocking access and iommu locks as appropriate. > > v3: >o Added locking to enqueue flow. >o Enqueue path guarded as well as dequeue path. >o Changed name of active_lock. >o Added initialization of guarding spinlock. >o Reworked functions skimming over all virt-queues. >o Performance measurements done by Maxime Coquelin shows > no degradation in bandwidth and throughput. >o Spelling. >o Taking lock only on set operations. >o IOMMU messages are not guarded by access lock. > > v2: >o Fixed checkpatch complains. >o Added Signed-off-by. >o Refined placement of guard to exclude IOMMU messages. >o TODO: performance degradation measurement. > > lib/librte_vhost/vhost.h | 6 ++-- > lib/librte_vhost/vhost.c | 1 + > lib/librte_vhost/vhost_user.c | 70 > +++ > lib/librte_vhost/virtio_net.c | 28 ++--- > 4 files changed, 99 insertions(+), 6 deletions(-) > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > index 1cc81c17..c8f2a817 100644 > --- a/lib/librte_vhost/vhost.h > +++ b/lib/librte_vhost/vhost.h > @@ -108,12 +108,14 @@ struct vhost_virtqueue { > > /* Backend value to determine if device should started/stopped */ > int backend; > + int enabled; > + int access_ok; > + rte_spinlock_t access_lock; > + > /* Used to notify the guest (trigger interrupt) */ > int callfd; > /* Currently unused as polling mode is enabled */ > int kickfd; > - int enabled; > - int access_ok; > > /* Physical address of used ring, for logging */ > uint64_tlog_guest_addr; > diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c > index 4f8b73a0..dcc42fc7 100644 > --- a/lib/librte_vhost/vhost.c > +++ b/lib/librte_vhost/vhost.c > @@ -259,6 +259,7 @@ alloc_vring_queue(struct virtio_net *dev, uint32_t > vring_idx) > > dev->virtqueue[vring_idx] = vq; > init_vring_queue(dev, vring_idx); > + rte_spinlock_init(&vq->access_lock); > > dev->nr_vring += 1; > > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > index f4c7ce46..0685d4e7 100644 > --- a/lib/librte_vhost/vhost_user.c > +++ b/lib/librte_vhost/vhost_user.c > @@ -1190,12 +1190,47 @@ vhost_user_check_and_alloc_queue_pair(struct > virtio_net *dev, VhostUserMsg *msg) > return alloc_vring_queue(dev, vring_idx); > } > > +static void > +vhost_user_lock_all_queue_pairs(struct virtio_net *dev) > +{ > + unsigned int i = 0; > + unsigned int vq_num = 0; > + > + while (vq_num < dev->nr_vring) { > + struct vhost_virtqueue *vq = dev->virtqueue[i]; > + > + if (vq) { > + rte_spinlock_lock(&vq->access_lock); > + vq_num++; > + } > + i++; > + } > +} > + > +static void > +vhost_user_unlock_all_queue_pairs(struct virtio_net *dev) > +{ > + unsigned int i = 0; > + unsigned int vq_num = 0; > + > + while (vq_num < dev->nr_vring) { > + struct vhost_virtqueue *vq = dev->virtqueue[i]; > + > + if (vq) { > +
Re: [dpdk-dev] [PATCH 00/12] lib/librte_vhost: introduce new vhost_user crypto
On Mon, Nov 27, 2017 at 08:01:03PM +, Fan Zhang wrote: > This patchset adds crypto backend suppport to vhost_user library, > including a proof-of-concept sample application. The implementation > follows the virtio-crypto specification and have been tested > with qemu 2.9.50 (with several patches applied, detailed later) > with Fedora 24 running in the frontend. > > The vhost_user library acts as a "bridge" method that translate > the virtio-crypto crypto requests to DPDK crypto operations, so it > is purely software implementation. However it does require the user > to provide the DPDK Cryptodev ID so it knows how to handle the > virtio-crypto session creation and deletion mesages. > > Currently the implementation supports AES-CBC-128 and HMAC-SHA1 > cipher only/chaining modes and does not support sessionless mode > yet. The guest can use standard virtio-crypto driver to set up > session and sends encryption/decryption requests to backend. The > vhost-crypto sample application provided in this patchset will > do the actual crypto work. > > To make this patchset working, a few tweaks need to be done: > > In the host: > 1. Download the qemu source code, and apply the patches in: > list.nongnu.org/archive/html/qemu-devel/2017-07/msg04664.html. I could not open it. What's the status of them now? Have they got merged? Besides that, this patchset doesn't seem well organized. For example, Makefile, version_map, config file, doc (etc.) are all updated separately. Could you do a re-organization? Personally, I'd suggest something below: - one patch for introduce extra msg_handler - one patch for implementing vhost_crypto (release doc should also be updated here). - one patch for adding an example (example doc should also be included here). Of course, if you could split 2nd item a bit further, that'd be better. Also, the new functions should be marked as "experimental". --yliu > As the patches have been out for over 2 months, they cannot be > cleanly applied to the latest code. I have reverted the commits > back to 4c4414a to make the patches applied cleanly. > > 2. Recompile your qemu. > > 3. Apply this patchset to latest DPDK code and recompile DPDK. > > 4. Compile and run vhost-crypto sample application. > > ./examples/vhost_crypto/build/vhost-crypto -l 11,12,13 -w :86:01.0 \ > --socket-mem 2048,2048 > > Where :86:01.0 is the QAT PCI address. You may use AES-NI-MB if it is > not available. The sample application requires 3 lcores: 1 master, 1 worker, > and 1 statistics displaying worker. The application will create a UNIX socket > file /tmp/vhost_crypto1.socket. > > 5. Start your qemu application. Here is my command: > > qemu/x86_64-softmmu/qemu-system-x86_64 -machine accel=kvm -cpu host \ > -smp 2 -m 1G -hda ~/path-to-your/image.qcow \ > -object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on \ > -mem-prealloc -numa node,memdev=mem -chardev \ > socket,id=charcrypto0,path=/tmp/vhost_crypto1.socket \ > -object cryptodev-vhost-user,id=cryptodev0,chardev=charcrypto0 \ > -device virtio-crypto-pci,id=crypto0,cryptodev=cryptodev0 > > 6. Once guest is booted. The Linux virtio_crypto kernel module is loaded by > default. You shall see the following logs in your demsg: > > [ 17.611723] virtio_crypto virtio0: max_queues: 1, max_cipher_key_len: ... > [ 17.612156] virtio_crypto virtio0: will run requests pump with realtime ... > [ 18.376100] virtio_crypto virtio0: Accelerator is ready > > The virtio_crypto driver in the guest is now up and running. > > 7. The rest steps can be as same as the Testing section in > https://wiki.qemu.org/Features/VirtioCrypto > > Fan Zhang (12): > lib/librte_vhost: add private data field > lib/librte_vhost: add vhost_user private info structure > lib/librte_vhost: add virtio crypto user message structure > lib/librte_vhost: add session messsage handler > lib/librte_vhost: add request handler > lib/librte_vhost: add head file > lib/librte_vhost: add public function implementation > lib/librte_vhost: update version map > lib/librte_vhost: update makefile > config: added no copy configuration item > example/vhost_crypto: add vhost_crypto sample application > doc: update vhost crypto documentation > > config/common_base|1 + > doc/guides/prog_guide/vhost_lib.rst | 16 + > doc/guides/rel_notes/release_18_02.rst|7 + > doc/guides/sample_app_ug/index.rst|1 + > doc/guides/sample_app_ug/vhost_crypto.rst | 95 +++ > examples/vhost_crypto/Makefile| 59 ++ > examples/vhost_crypto/main.c | 588 + > lib/librte_vhost/Makefile |6 +- > lib/librte_vhost/rte_vhost_crypto.h | 150 > lib/librte_vhost/rte_vhost_version.map| 11 + > lib/librte_vhost/vhost.h |4 +- > lib/librte_vhost/vhost_crypto.c | 1314 > + > lib/librte_vh
Re: [dpdk-dev] [PATCH 03/12] lib/librte_vhost: add virtio crypto user message structure
On Mon, Nov 27, 2017 at 08:01:06PM +, Fan Zhang wrote: > This patch adds virtio-crypto spec user message structure to > vhost_user. > > Signed-off-by: Fan Zhang > --- > lib/librte_vhost/vhost_user.h | 25 + > 1 file changed, 25 insertions(+) > > diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h > index 990b40a..793cdc2 100644 > --- a/lib/librte_vhost/vhost_user.h > +++ b/lib/librte_vhost/vhost_user.h > @@ -108,6 +108,30 @@ typedef struct VhostUserLog { > uint64_t mmap_offset; > } VhostUserLog; > > +/* Comply with Cryptodev-Linux */ > +#define VHOST_USER_CRYPTO_MAX_HMAC_KEY_LENGTH512 > +#define VHOST_USER_CRYPTO_MAX_CIPHER_KEY_LENGTH 64 > + > +/* Same structure as vhost-user backend session info */ > +typedef struct VhostUserCryptoSessionParam { > + int64_t session_id; > + uint32_t op_code; > + uint32_t cipher_algo; > + uint32_t cipher_key_len; > + uint32_t hash_algo; > + uint32_t digest_len; > + uint32_t auth_key_len; > + uint32_t aad_len; > + uint8_t op_type; > + uint8_t dir; > + uint8_t hash_mode; > + uint8_t chaining_dir; > + uint8_t *ciphe_key; > + uint8_t *auth_key; > + uint8_t cipher_key_buf[VHOST_USER_CRYPTO_MAX_CIPHER_KEY_LENGTH]; > + uint8_t auth_key_buf[VHOST_USER_CRYPTO_MAX_HMAC_KEY_LENGTH]; > +} VhostUserCryptoSessionParam; I'd suggest to merge this patch (which defines the struct) and the next one (which uses it) into one patch. --yliu > + > typedef struct VhostUserMsg { > union { > VhostUserRequest master; > @@ -128,6 +152,7 @@ typedef struct VhostUserMsg { > VhostUserMemory memory; > VhostUserLoglog; > struct vhost_iotlb_msg iotlb; > + VhostUserCryptoSessionParam crypto_session; > } payload; > int fds[VHOST_MEMORY_MAX_NREGIONS]; > } __attribute((packed)) VhostUserMsg; > -- > 2.9.5
Re: [dpdk-dev] [PATCH 02/12] lib/librte_vhost: add vhost_user private info structure
On Mon, Nov 27, 2017 at 08:01:05PM +, Fan Zhang wrote: > This patch adds a vhost_user_dev_priv structure and a vhost_user > message handler function prototype to vhost_user. This allows > different types of devices to add private information and their > device-specific vhost-user message function handlers to > virtio_net structure. The change to vhost_user_msg_handler is > also added to call the device-specific message handler. > > Signed-off-by: Fan Zhang > --- > lib/librte_vhost/vhost_user.c | 13 - > lib/librte_vhost/vhost_user.h | 9 - > 2 files changed, 20 insertions(+), 2 deletions(-) > > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > index f4c7ce4..c5ae85f 100644 > --- a/lib/librte_vhost/vhost_user.c > +++ b/lib/librte_vhost/vhost_user.c > @@ -1337,7 +1337,18 @@ vhost_user_msg_handler(int vid, int fd) > break; > > default: Hi Fan, Firstly, apologize for the very late review! > - ret = -1; > + if (!dev->private_data) > + ret = -1; > + else { > + struct vhost_user_dev_priv *priv = dev->private_data; > + > + if (!priv->vhost_user_msg_handler) > + ret = -1; > + else { > + ret = (*priv->vhost_user_msg_handler)(dev, > + &msg, fd); I think this is more complex than necessary. You could just add msg_handler in the virtio_net struct and do: if (dev->msg_handler) ret = dev->msg_handler(dev, &msg, fd); else ret = -1; > + } > + } > break; > > } > diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h > index 76d9fe2..990b40a 100644 > --- a/lib/librte_vhost/vhost_user.h > +++ b/lib/librte_vhost/vhost_user.h > @@ -81,7 +81,7 @@ typedef enum VhostUserRequest { > VHOST_USER_NET_SET_MTU = 20, > VHOST_USER_SET_SLAVE_REQ_FD = 21, > VHOST_USER_IOTLB_MSG = 22, > - VHOST_USER_MAX > + VHOST_USER_MAX = 25 That's weird, we should list all VHOST_USER messages here. --yliu > } VhostUserRequest; > > typedef enum VhostUserSlaveRequest { > @@ -137,6 +137,13 @@ typedef struct VhostUserMsg { > /* The version of the protocol we support */ > #define VHOST_USER_VERSION0x1 > > +typedef int (*msg_handler)(struct virtio_net *dev, struct VhostUserMsg *msg, > + int fd); > + > +struct vhost_user_dev_priv { > + msg_handler vhost_user_msg_handler; > + char data[0]; > +}; > > /* vhost_user.c */ > int vhost_user_msg_handler(int vid, int fd); > -- > 2.9.5
Re: [dpdk-dev] [PATCH 1/3] net/virtio: fix typo in function name
On Thu, Jan 18, 2018 at 02:45:48PM +0100, Olivier Matz wrote: > On Thu, Jan 18, 2018 at 09:27:10PM +0800, Yuanhan Liu wrote: > > On Thu, Jan 18, 2018 at 10:07:31AM +0100, Olivier Matz wrote: > > > Fixes: c1f86306a026 ("virtio: add new driver") > > > Cc: sta...@dpdk.org > > > > I would not suggest to include such patch for a stable release. It doesn't > > fix a real issue. > > Yes. > > I've included it in the patchset to avoid conflicts for backports because it > changes the same code area. > > If you prefer, I can send a new patchset with the current 2/3 and 3/3, plus > the typo fix as a individual patch, on top of them. > > Does it look good to you? Yes, that's the reason I personaly always put the bug fix patches in head of a patchset, for avoiding conflicts like this. Thanks. --yliu
Re: [dpdk-dev] [PATCH 2/3] net/virtio: rationalize queue flushing
On Thu, Jan 18, 2018 at 02:55:44PM +0100, Olivier Matz wrote: > On Thu, Jan 18, 2018 at 09:26:09PM +0800, Yuanhan Liu wrote: > > Hi Oliver, > > > > On Thu, Jan 18, 2018 at 10:07:32AM +0100, Olivier Matz wrote: > > > Rationalize the function virtio_dev_free_mbufs(): > > > > > > - skip NULL vqs instead of crashing: this is required for the > > > next commit > > > - use the same kind of loop than in virtio_free_queues() > > > - also flush mbufs from the control queue (this is useless yet) > > > > Could we just do "nr_vq = virtio_get_nr_vq(hw) - 1" with a comment that > > CQ is excluded, for skipping the CQ? > > Is "nr_vq = virtio_get_nr_vq(hw) - 1" always valid? > Shouldn't we do this check? > if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) > > Instead, I suggest this: > > queue_type = virtio_get_queue_type(hw, i); > if (queue_type == VTNET_RQ) > type = "rxq"; > else if (queue_type == VTNET_TQ) > type = "txq"; > else > - type = "cq"; > + continue; Yes, this is better. > > > > - factorize common code between rxq, txq, cq > > > > > > Cc: sta...@dpdk.org > > > > Could you split the patch two 2: > > > > - one for fixing the crash (skip the NULL vqs). We only need this one > > for stable release. > > - another one for the refactoring > > Yes, do you want them all in the same patchset? I think it's okay. Thanks. --yliu
Re: [dpdk-dev] [PATCH v3] lib/librte_vhost: move fdset_del out of conn_mutex
Hi, Apologize for late review. On Tue, Jan 02, 2018 at 02:08:36AM -0800, zhike wang wrote: > From: wang zhike > > v3: > * Fix duplicate variable name, which leads to unexpected memory write. > v2: > * Move fdset_del before conn destroy. > * Fix coding style. Note that we prefer to put the change logs after "---" below Signed-off-by, so that those change logs won't be tracked in the git log history. > This patch fixes below race condition: > 1. one thread calls: rte_vhost_driver_unregister->lock conn_mutex >->fdset_del->loop to check fd.busy. > 2. another thread calls fdset_event_dispatch, and the busy flag is >changed AFTER handling on the fd, i.e, rcb(). However, the rcb, >such as vhost_user_read_cb() would try to retrieve the conn_mutex. > > So issue is that the 1st thread will loop check the flag while holding > the mutex, while the 2nd thread would be blocked by mutex and can not > change the flag. Then dead lock is observed. I then would change the title to "vhost: fix deadlock". I'm also keen to know how do you reproduce this issue with real-life APP (say ovs) and how easy it is for reproduce. > Signed-off-by: zhike wang Again, you need fix your git config file about your name. > --- > lib/librte_vhost/socket.c | 18 +- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c > index 422da00..ea01327 100644 > --- a/lib/librte_vhost/socket.c > +++ b/lib/librte_vhost/socket.c > @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list { > struct vhost_user_socket *vsocket = vhost_user.vsockets[i]; > > if (!strcmp(vsocket->path, path)) { > + int del_fds[MAX_FDS]; > + int num_of_fds = 0, fd_index; > + I think the naming could be a bit shorter, like "fds, nr_fds (or nb_fds), fd_idx". > if (vsocket->is_server) { > fdset_del(&vhost_user.fdset, > vsocket->socket_fd); > close(vsocket->socket_fd); > @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list { > vhost_user_remove_reconnect(vsocket); > } > > + /* fdset_del() must be called without conn_mutex. */ > + pthread_mutex_lock(&vsocket->conn_mutex); > + for (conn = TAILQ_FIRST(&vsocket->conn_list); > + conn != NULL; > + conn = next) { > + next = TAILQ_NEXT(conn, next); > + > + del_fds[num_of_fds++] = conn->connfd; > + } > + pthread_mutex_unlock(&vsocket->conn_mutex); > + > + for (fd_index = 0; fd_index < num_of_fds; fd_index++) > + fdset_del(&vhost_user.fdset, del_fds[fd_index]); > + > pthread_mutex_lock(&vsocket->conn_mutex); > for (conn = TAILQ_FIRST(&vsocket->conn_list); >conn != NULL; >conn = next) { > next = TAILQ_NEXT(conn, next); > > - fdset_del(&vhost_user.fdset, conn->connfd); If you log the fd here and invoke fdset_del() and close() after the loop, you then could avoid one extra loop as you did above. --yliu > RTE_LOG(INFO, VHOST_CONFIG, > "free connfd = %d for device '%s'\n", > conn->connfd, path); > -- > 1.8.3.1
Re: [dpdk-dev] [PATCH 1/3] net/virtio: fix typo in function name
On Thu, Jan 18, 2018 at 10:07:31AM +0100, Olivier Matz wrote: > Fixes: c1f86306a026 ("virtio: add new driver") > Cc: sta...@dpdk.org I would not suggest to include such patch for a stable release. It doesn't fix a real issue. Thanks. --yliu > > Signed-off-by: Olivier Matz > --- > drivers/net/virtio/virtio_ethdev.c | 4 ++-- > drivers/net/virtio/virtqueue.c | 2 +- > drivers/net/virtio/virtqueue.h | 2 +- > 3 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/virtio/virtio_ethdev.c > b/drivers/net/virtio/virtio_ethdev.c > index ec012c2ac..c7426951c 100644 > --- a/drivers/net/virtio/virtio_ethdev.c > +++ b/drivers/net/virtio/virtio_ethdev.c > @@ -1879,7 +1879,7 @@ static void virtio_dev_free_mbufs(struct rte_eth_dev > *dev) > VIRTQUEUE_DUMP(rxvq->vq); > > PMD_INIT_LOG(DEBUG, "rx_queues[%d]=%p", i, rxvq); > - while ((buf = virtqueue_detatch_unused(rxvq->vq)) != NULL) { > + while ((buf = virtqueue_detach_unused(rxvq->vq)) != NULL) { > rte_pktmbuf_free(buf); > mbuf_num++; > } > @@ -1899,7 +1899,7 @@ static void virtio_dev_free_mbufs(struct rte_eth_dev > *dev) > VIRTQUEUE_DUMP(txvq->vq); > > mbuf_num = 0; > - while ((buf = virtqueue_detatch_unused(txvq->vq)) != NULL) { > + while ((buf = virtqueue_detach_unused(txvq->vq)) != NULL) { > rte_pktmbuf_free(buf); > mbuf_num++; > } > diff --git a/drivers/net/virtio/virtqueue.c b/drivers/net/virtio/virtqueue.c > index 1ada4fe08..6988bfea4 100644 > --- a/drivers/net/virtio/virtqueue.c > +++ b/drivers/net/virtio/virtqueue.c > @@ -16,7 +16,7 @@ > * 2) mbuf that hasn't been consued by backend. > */ > struct rte_mbuf * > -virtqueue_detatch_unused(struct virtqueue *vq) > +virtqueue_detach_unused(struct virtqueue *vq) > { > struct rte_mbuf *cookie; > int idx; > diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h > index fedbaa39c..1288e5287 100644 > --- a/drivers/net/virtio/virtqueue.h > +++ b/drivers/net/virtio/virtqueue.h > @@ -270,7 +270,7 @@ void virtqueue_dump(struct virtqueue *vq); > /** > * Get all mbufs to be freed. > */ > -struct rte_mbuf *virtqueue_detatch_unused(struct virtqueue *vq); > +struct rte_mbuf *virtqueue_detach_unused(struct virtqueue *vq); > > /* Flush the elements in the used ring. */ > void virtqueue_rxvq_flush(struct virtqueue *vq); > -- > 2.11.0
Re: [dpdk-dev] [PATCH 2/3] net/virtio: rationalize queue flushing
Hi Oliver, On Thu, Jan 18, 2018 at 10:07:32AM +0100, Olivier Matz wrote: > Rationalize the function virtio_dev_free_mbufs(): > > - skip NULL vqs instead of crashing: this is required for the > next commit > - use the same kind of loop than in virtio_free_queues() > - also flush mbufs from the control queue (this is useless yet) Could we just do "nr_vq = virtio_get_nr_vq(hw) - 1" with a comment that CQ is excluded, for skipping the CQ? > - factorize common code between rxq, txq, cq > > Cc: sta...@dpdk.org Could you split the patch two 2: - one for fixing the crash (skip the NULL vqs). We only need this one for stable release. - another one for the refactoring Thanks. --yliu
Re: [dpdk-dev] [PATCH v2] vhost: dequeue zero copy should restore mbuf before return to pool
On Thu, Jan 18, 2018 at 09:44:41AM +0100, Maxime Coquelin wrote: > > > On 01/17/2018 04:45 PM, Junjie Chen wrote: > >dequeue zero copy change buf_addr and buf_iova of mbuf, and return > >to mbuf pool without restore them, it breaks vm memory if others allocate > >mbuf from same pool since mbuf reset doesn't reset buf_addr and buf_iova. > > > >Signed-off-by: Junjie Chen > >--- > >v2 changes: > >Remove useless restore > > lib/librte_vhost/virtio_net.c | 17 + > > 1 file changed, 17 insertions(+) > > > > Reviewed-by: Maxime Coquelin Applied to dpdk-next-virtio, with below addings: Fixes: b0a985d1f340 ("vhost: add dequeue zero copy") Cc: sta...@dpdk.org Thanks. --yliu
Re: [dpdk-dev] [PATCH] lib/librte_vhost: mov enum definition from PMD to lib
On Fri, Jan 12, 2018 at 04:12:12PM +0800, Zhiyong Yang wrote: > The enum definition is placed in librte_vhost in order to avoid many > duplication definitions in PMD and example code everywhere. I moved them out from rte_vhost.h on purpose, for they are virtio-net specific while this library is not. So nack. --yliu
Re: [dpdk-dev] [PATCH 1/2] net: fixup RARP generation
On Thu, Jan 18, 2018 at 09:38:39AM +0100, Thomas Monjalon wrote: > 18/01/2018 04:14, Yuanhan Liu: > > Due to a mistake operation from me, older version (v10) was merged to > > master branch. It's the v11 should be applied. However, the master branch > > is not rebase-able. Thus, this patch is made, from the diff between v10 > > and v11. > > Understood it is a mistake. > However, you can briefly describes what does this change. > Is there a changelog in v11 patch? Yes, ther is: v11: - Add check for parameter and tailroom in rte_net_make_rarp_packet. - Allocate mbuf in rte_net_make_rarp_packet. > > > > Code is from Xiao Wang. > > You may add his Signed-off. I have no objection. Xiao, okay to you? I will also set the author to you. --yliu > > Signed-off-by: Yuanhan Liu
Re: [dpdk-dev] [PATCH 2/2] net: fix build error
On Thu, Jan 18, 2018 at 09:36:46AM +0100, Thomas Monjalon wrote: > 18/01/2018 09:03, Yuanhan Liu: > > On Thu, Jan 18, 2018 at 07:45:23AM +, Wang, Xiao W wrote: > > > From: Thomas Monjalon [mailto:tho...@monjalon.net] > > > > 18/01/2018 04:14, Yuanhan Liu: > > > > > Fix build error when shared lib is enabled: > > > > > > > > > > LD librte_net.so.1.1 > > > > > rte_arp.o: In function `rte_net_make_rarp_packet': > > > > > rte_arp.c:(.text+0x1f0): undefined reference to > > > > > `rte_mempool_ops_table' > > > > > rte_arp.c:(.text+0x21d): undefined reference to > > > > > `rte_mempool_ops_table' > > > > > rte_arp.c:(.text+0x2d5): undefined reference to > > > > > `rte_mempool_ops_table' > > > > > rte_arp.c:(.text+0x384): undefined reference to > > > > > `rte_mempool_ops_table' > > > > > rte_arp.c:(.text+0x4b7): undefined reference to > > > > > `rte_mempool_ops_table' > > > > > > > > This is very strange, I do not see this error on my machine. > > > > > > I could see this error on mine with: > > > +CONFIG_RTE_BUILD_SHARED_LIB=y > > > > Yes, that's what meant in the commit log by "when shared lib is enabled". > > Got it: you are fixing a build issue introduced by the patch 1 > in this series. > So please merge them. Right. I will merge them. --yliu
Re: [dpdk-dev] [PATCH 2/2] net: fix build error
On Thu, Jan 18, 2018 at 07:45:23AM +, Wang, Xiao W wrote: > > > > -Original Message- > > From: Thomas Monjalon [mailto:tho...@monjalon.net] > > Sent: Thursday, January 18, 2018 3:39 PM > > To: Yuanhan Liu > > Cc: dev@dpdk.org; Wang, Xiao W ; Yigit, Ferruh > > ; Olivier Matz > > Subject: Re: [PATCH 2/2] net: fix build error > > > > 18/01/2018 04:14, Yuanhan Liu: > > > Fix build error when shared lib is enabled: > > > > > > LD librte_net.so.1.1 > > > rte_arp.o: In function `rte_net_make_rarp_packet': > > > rte_arp.c:(.text+0x1f0): undefined reference to `rte_mempool_ops_table' > > > rte_arp.c:(.text+0x21d): undefined reference to `rte_mempool_ops_table' > > > rte_arp.c:(.text+0x2d5): undefined reference to `rte_mempool_ops_table' > > > rte_arp.c:(.text+0x384): undefined reference to `rte_mempool_ops_table' > > > rte_arp.c:(.text+0x4b7): undefined reference to `rte_mempool_ops_table' > > > > This is very strange, I do not see this error on my machine. > > I could see this error on mine with: > +CONFIG_RTE_BUILD_SHARED_LIB=y Yes, that's what meant in the commit log by "when shared lib is enabled". --yliu
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
On Wed, Jan 17, 2018 at 12:34:08PM +, Ferruh Yigit wrote: > On 1/16/2018 2:50 PM, Yuanhan Liu wrote: > > This patch documents the new devargs syntax, which is going to be > > implemented in DPDK v18.05. > > > > The new devargs proposal is introduced for having a consistent > > interface for: > > > > - whitelisting/blacklisting devices > > - identifying ports > > - attaching/detaching devices > > Hi Yuanhan, Hi Ferruh, > devargs = device arguments, the PMD specific arguments, similar to > module_param > in Linux. Not exactly. If you look at the function rte_eth_dev_attach(const char *devargs, uint16_t *port_id), the "device identifier" part you called also part of the devargs. > > Currently only "--vdev" and -w/-b eal parameters parse proceeding strings as > devargs. > > Like: "--vdev "net_pcap,iface=lo" . > For this case "iface=lo" device specific argument and available to use from > pcap > PMD. > > I agree it to have a consistent way to describe device, that makes better > whitelist/blacklist support. But that part is not device args, more like > device > identifier. > > When you use this string with whitelist/blacklist I think you won't need > "iface=lo" part, Right. That's actaully mentioned in this patch: Before device is probed, only the bus category is relevant. > only need first part. And when using with --vdev, (or perhaps > with attach) you don't need to use first part > "bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55", PMD already knows > it > is in virtual bus and its class etc. We are going to keep the compatibily, meaning we will leave "-w" and "--vdev" as they are. We plan to introduce one more CLI option and let user to switch to it so that it can work with any bus, not only PCI and vdev bus. > So does it make sense to separate them logically? Perhaps as "device > identifier" > and "device args". Then I think it returns back to the old issue: how could we identify a port when the bus id (say BDF for PCI bus) is not enough for identifying a port? Such case could happen when a single NIC has 2 ports sharing the same BDF. It could also happen with the VF representors that will be introduced shortly. --yliu > string can become: > "device=bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55;driver=PMD_NAME,driverspecificproperty=VALUE" > > specific usages can become: > -w "device=bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55" > --vdev "driver=PMD_NAME,driverspecificproperty=VALUE" > > And store them in two separate storage, and eal or PMD can ask for "device > identifier" or "device args" separately? > > What do you think? > > > > > Please check the patch content for the details. Also, here is link > > for the background: > > http://dpdk.org/ml/archives/dev/2017-November/082600.html > > > > This syntax is suggestd by Thomas: > > http://dpdk.org/ml/archives/dev/2017-December/084234.html > > > > Signed-off-by: Yuanhan Liu > > --- > > doc/guides/prog_guide/env_abstraction_layer.rst | 34 > > + > > 1 file changed, 34 insertions(+) > > > > diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst > > b/doc/guides/prog_guide/env_abstraction_layer.rst > > index 34d871c..12f37f2 100644 > > --- a/doc/guides/prog_guide/env_abstraction_layer.rst > > +++ b/doc/guides/prog_guide/env_abstraction_layer.rst > > @@ -213,6 +213,40 @@ device having emitted a Device Removal Event. In such > > case, calling > > callback. Care must be taken not to close the device from the interrupt > > handler > > context. It is necessary to reschedule such closing operation. > > > > +Devargs > > +~~~ > > + > > +The ``devargs`` can be used for whitelisting/blacklisting devices, > > identifying > > +DPDK ports and attaching/deatching devices. They all share the same syntax. > > + > > +It is split in 3 categories, where almost everything is optional key/value > > pairs: > > + > > +* bus (pci, vdev, vmbus, fslmc, etc) > > +* class (eth, crypto, etc) > > +* driver (i40e, mlx5, virtio, etc) > > + > > +The key/value pair describing the category scope is mandatory and must be > > the > > +first pair in the category properties. Example: bus=pci, must be placed > > before > > +id=:01:00.0. > > + > > +The syntax has below rules: > > + > > +* Between categories, the separator is a slash. > > +* Inside a category, the separator is a comma. > > +* Inside a key/value pair, the separator is an equal sign. > > +* Each category can be used alone. > > + > > +Here is an example with all categories:: > > + > > + > > bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55/driver=PMD_NAME,driverspecificproperty=VALUE > > + > > +It can also be simple like below:: > > + > > +class=eth,mac=00:11:22:33:44:55 > > + > > +A device is identified when every properties are matched. Before device is > > +probed, only the bus category is relevant. > > + > > Blacklisting > > > > > >
[dpdk-dev] [PATCH 1/2] net: fixup RARP generation
Due to a mistake operation from me, older version (v10) was merged to master branch. It's the v11 should be applied. However, the master branch is not rebase-able. Thus, this patch is made, from the diff between v10 and v11. Code is from Xiao Wang. Signed-off-by: Yuanhan Liu --- lib/librte_net/rte_arp.c | 26 +- lib/librte_net/rte_arp.h | 11 ++- lib/librte_vhost/virtio_net.c | 12 +++- 3 files changed, 26 insertions(+), 23 deletions(-) diff --git a/lib/librte_net/rte_arp.c b/lib/librte_net/rte_arp.c index d7223b0..b953bcd 100644 --- a/lib/librte_net/rte_arp.c +++ b/lib/librte_net/rte_arp.c @@ -7,17 +7,28 @@ #include #define RARP_PKT_SIZE 64 -int -rte_net_make_rarp_packet(struct rte_mbuf *mbuf, const struct ether_addr *mac) +struct rte_mbuf * +rte_net_make_rarp_packet(struct rte_mempool *mpool, + const struct ether_addr *mac) { struct ether_hdr *eth_hdr; struct arp_hdr *rarp; + struct rte_mbuf *mbuf; - if (mbuf->buf_len < RARP_PKT_SIZE) - return -1; + if (mpool == NULL) + return NULL; + + mbuf = rte_pktmbuf_alloc(mpool); + if (mbuf == NULL) + return NULL; + + eth_hdr = (struct ether_hdr *)rte_pktmbuf_append(mbuf, RARP_PKT_SIZE); + if (eth_hdr == NULL) { + rte_pktmbuf_free(mbuf); + return NULL; + } /* Ethernet header. */ - eth_hdr = rte_pktmbuf_mtod(mbuf, struct ether_hdr *); memset(eth_hdr->d_addr.addr_bytes, 0xff, ETHER_ADDR_LEN); ether_addr_copy(mac, ð_hdr->s_addr); eth_hdr->ether_type = htons(ETHER_TYPE_RARP); @@ -35,8 +46,5 @@ rte_net_make_rarp_packet(struct rte_mbuf *mbuf, const struct ether_addr *mac) memset(&rarp->arp_data.arp_sip, 0x00, 4); memset(&rarp->arp_data.arp_tip, 0x00, 4); - mbuf->data_len = RARP_PKT_SIZE; - mbuf->pkt_len = RARP_PKT_SIZE; - - return 0; + return mbuf; } diff --git a/lib/librte_net/rte_arp.h b/lib/librte_net/rte_arp.h index dad7423..457a39b 100644 --- a/lib/librte_net/rte_arp.h +++ b/lib/librte_net/rte_arp.h @@ -82,16 +82,17 @@ struct arp_hdr { * * Make a RARP packet based on MAC addr. * - * @param mbuf - * Pointer to the rte_mbuf structure + * @param mpool + * Pointer to the rte_mempool * @param mac * Pointer to the MAC addr * * @return - * - 0 on success, negative on error + * - RARP packet pointer on success, or NULL on error */ -int -rte_net_make_rarp_packet(struct rte_mbuf *mbuf, const struct ether_addr *mac); +struct rte_mbuf * +rte_net_make_rarp_packet(struct rte_mempool *mpool, + const struct ether_addr *mac); #ifdef __cplusplus } diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index ca89288..a1d8026 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -1162,19 +1162,13 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, rte_atomic16_cmpset((volatile uint16_t *) &dev->broadcast_rarp.cnt, 1, 0))) { - rarp_mbuf = rte_pktmbuf_alloc(mbuf_pool); + rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac); if (rarp_mbuf == NULL) { RTE_LOG(ERR, VHOST_DATA, - "Failed to allocate memory for mbuf.\n"); + "Failed to make RARP packet.\n"); return 0; } - - if (rte_net_make_rarp_packet(rarp_mbuf, &dev->mac) < 0) { - rte_pktmbuf_free(rarp_mbuf); - rarp_mbuf = NULL; - } else { - count -= 1; - } + count -= 1; } free_entries = *((volatile uint16_t *)&vq->avail->idx) - -- 2.7.4
[dpdk-dev] [PATCH 2/2] net: fix build error
Fix build error when shared lib is enabled: LD librte_net.so.1.1 rte_arp.o: In function `rte_net_make_rarp_packet': rte_arp.c:(.text+0x1f0): undefined reference to `rte_mempool_ops_table' rte_arp.c:(.text+0x21d): undefined reference to `rte_mempool_ops_table' rte_arp.c:(.text+0x2d5): undefined reference to `rte_mempool_ops_table' rte_arp.c:(.text+0x384): undefined reference to `rte_mempool_ops_table' rte_arp.c:(.text+0x4b7): undefined reference to `rte_mempool_ops_table' Fixes: 45ae05df824c ("net: add a helper for making RARP packet") Signed-off-by: Yuanhan Liu --- lib/librte_net/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile index ab290c3..95ff549 100644 --- a/lib/librte_net/Makefile +++ b/lib/librte_net/Makefile @@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk LIB = librte_net.a CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -LDLIBS += -lrte_mbuf -lrte_eal +LDLIBS += -lrte_mbuf -lrte_eal -lrte_mempool EXPORT_MAP := rte_net_version.map LIBABIVER := 1 -- 2.7.4
Re: [dpdk-dev] [PATCH v11 0/5] net/virtio: support GUEST ANNOUNCE
Xiao told me that this series (except the last patch) was already applied to the Thomas master branch. I then realised it was my mistake. I applied v10 last week locally for some basic testing. There is a conflict in last patch, that's why the last patch is not merged. I forgot to do a reset before I applied another patch. Then later, I did a push to the next-virtio tree, thus patches from Xiao were also pushed. Ferruh then did a pull from it. As a result, they got merged to the master branch before I realised. Non-rebase is allowed there, thus I have made a patch to fix my mistake. Meanwhile, I have also spotted a build error when shared lib is enabled. I will send them out soon. --yliu On Wed, Jan 17, 2018 at 05:40:58AM +0800, Xiao Wang wrote: > When live migration is finished, the backup VM needs to proactively announce > its new location. DPDK vhost has implemented VHOST_USER_PROTOCOL_F_RARP to > generate a RARP packet to switch in dequeue path. Another method is to let > the guest proactively send out RARP packet using VIRTIO_NET_F_GUEST_ANNOUNCE > feature. > > This patch set enables this feature in virtio pmd, to support VM running > virtio > pmd be migrated without vhost supporting RARP generation. > > v11: > - Add check for parameter and tailroom in rte_net_make_rarp_packet. > - Allocate mbuf in rte_net_make_rarp_packet. > > v10: > - Add a bold doxygen comment for the experimental function. > > v9: > - Introduce function with the experimental state. > > v8: > - Add a helper in lib/librte_net to make rarp packet, it's used by > both vhost and virtio. > > v7: > - Improve comment for state_lock. > - Rename spinlock variable 'sl' to 'lock'. > > v6: > - Use rte_pktmbuf_alloc() instead of rte_mbuf_raw_alloc(). > - Remove the 'len' parameter in calling virtio_send_command(). > - Remove extra space between typo and var. > - Improve comment and alignment. > - Remove the unnecessary header file. > - A better usage of 'unlikely' indication. > > v5: > - Remove txvq parameter in virtio_inject_pkts. > - Zero hw->special_buf after using it. > - Return the retval of tx_pkt_burst(). > - Allocate a mbuf pointer on stack directly. > > v4: > - Move spinlock lock/unlock into dev_pause/resume. > - Separate out a patch for packet injection. > > v3: > - Remove Tx function code duplication, use a special pointer for rarp > injection. > - Rename function generate_rarp to virtio_notify_peers, replace > 'virtnet_' with 'virtio_'. > - Add comment for state_lock. > - Typo fix and comment improvement. > > v2: > - Use spaces instead of tabs between the code and comments. > - Remove unnecessary parentheses. > - Use rte_pktmbuf_mtod directly to get eth_hdr addr. > - Fix virtio_dev_pause return value check. > > Xiao Wang (5): > net/virtio: make control queue thread-safe > net/virtio: add packet injection method > net: add a helper for making RARP packet > vhost: use lib API to make RARP packet > net/virtio: support GUEST ANNOUNCE > > drivers/net/virtio/virtio_ethdev.c | 113 > +++- > drivers/net/virtio/virtio_ethdev.h | 6 ++ > drivers/net/virtio/virtio_pci.h | 7 ++ > drivers/net/virtio/virtio_rxtx.c| 3 +- > drivers/net/virtio/virtio_rxtx.h| 1 + > drivers/net/virtio/virtio_rxtx_simple.c | 2 +- > drivers/net/virtio/virtqueue.h | 11 > lib/Makefile| 3 +- > lib/librte_net/Makefile | 1 + > lib/librte_net/rte_arp.c| 50 ++ > lib/librte_net/rte_arp.h| 18 + > lib/librte_net/rte_net_version.map | 6 ++ > lib/librte_vhost/Makefile | 2 +- > lib/librte_vhost/virtio_net.c | 51 +- > 14 files changed, 219 insertions(+), 55 deletions(-) > create mode 100644 lib/librte_net/rte_arp.c > > -- > 2.15.1
Re: [dpdk-dev] [PATCH] vhost: do deep copy while reallocate vq
On Mon, Jan 15, 2018 at 06:32:19AM -0500, Junjie Chen wrote: > When vhost reallocate dev and vq for NUMA enabled case, it doesn't perform > deep copy, which lead to 1) zmbuf list not valid 2) remote memory access. > This patch is to re-initlize the zmbuf list and also do the deep copy. > > Signed-off-by: Junjie Chen > --- Applied to dpdk-next-virtio. Thanks. --yliu
[dpdk-dev] [PATCH] doc: document the new devargs syntax
This patch documents the new devargs syntax, which is going to be implemented in DPDK v18.05. The new devargs proposal is introduced for having a consistent interface for: - whitelisting/blacklisting devices - identifying ports - attaching/detaching devices Please check the patch content for the details. Also, here is link for the background: http://dpdk.org/ml/archives/dev/2017-November/082600.html This syntax is suggestd by Thomas: http://dpdk.org/ml/archives/dev/2017-December/084234.html Signed-off-by: Yuanhan Liu --- doc/guides/prog_guide/env_abstraction_layer.rst | 34 + 1 file changed, 34 insertions(+) diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index 34d871c..12f37f2 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -213,6 +213,40 @@ device having emitted a Device Removal Event. In such case, calling callback. Care must be taken not to close the device from the interrupt handler context. It is necessary to reschedule such closing operation. +Devargs +~~~ + +The ``devargs`` can be used for whitelisting/blacklisting devices, identifying +DPDK ports and attaching/deatching devices. They all share the same syntax. + +It is split in 3 categories, where almost everything is optional key/value pairs: + +* bus (pci, vdev, vmbus, fslmc, etc) +* class (eth, crypto, etc) +* driver (i40e, mlx5, virtio, etc) + +The key/value pair describing the category scope is mandatory and must be the +first pair in the category properties. Example: bus=pci, must be placed before +id=:01:00.0. + +The syntax has below rules: + +* Between categories, the separator is a slash. +* Inside a category, the separator is a comma. +* Inside a key/value pair, the separator is an equal sign. +* Each category can be used alone. + +Here is an example with all categories:: + + bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55/driver=PMD_NAME,driverspecificproperty=VALUE + +It can also be simple like below:: + +class=eth,mac=00:11:22:33:44:55 + +A device is identified when every properties are matched. Before device is +probed, only the bus category is relevant. + Blacklisting -- 2.7.4
Re: [dpdk-dev] [PATCH] vhost: support Explicit Congestion Notification
On Wed, Nov 22, 2017 at 11:19:42AM +0800, Jiayu Hu wrote: > In virtio, Explicit Congestion Notification (ECN) includes two parts: > guest ECN and host ECN. Guest ECN means the frontend can handle TSO > packets which have ECN set, and host ECN means the backend can handle > TSO packets which have ECN set. > > The ECN features are rarely used. However, virtio-net enables them by > default, and vhost-net support both. To make live migration from > vhost-net to vhost-user possible, this patch announces to support > guest and host ECN in vhost-user. > > Signed-off-by: Jiayu Hu Applied to dpdk-next-virtio. Thanks. --yliu
Re: [dpdk-dev] [PATCH v4 0/5] lib: add Port Representors
On Mon, Jan 08, 2018 at 02:37:15PM +, Remy Horton wrote: > Port Representors provide a logical presentation in DPDK of VF (virtual > function) ports for the purposes of control and monitoring. Each port > representor device represents a single VF and is associated with it's > parent physical function (PF) PMD which provides the back-end hooks for > the representor device ops and defines the control domain to which that > port belongs. This allows to use existing DPDK APIs to monitor and control > the port without the need to create and maintain VF specific APIs. Firstly, I don't object this model. More precisely, I don't object for introducing a port to control the VFs. I even like this idea a bit, for at least it could get rid of some PMD specific APIs, say rte_pmd_i40e_set_vf_mac_addr, etc. However, I don't quite like this design, for a simple reason, it makes things way more complex: - new APIs are introduced. - new virtual PMD is required - broker concept I was thinking below should work: - Use the devargs for enabling the port (or VF) representor model. For example: -w 04:00.0,vf_ports=0-3 The vf_ports is for telling the driver we are going to create VF representors for VF 0 to VF 3. It could be a port mask like what this patchset does. - Then inside the driver (say, i40e) * allocate 1 DPDK ethdev port * allocate 4 DPDK ethdev port, one for each VF specified by the vf_ports option. And these are VF representors. * link the VF representors to the right VF port For example, for i40e, it should be (from patch 3): vf = &pf->vfs[representor->vport_id]; * set the proper ethdev callbacks for the VF representors. As you can see, none of above complexity is introduced. Probably you might find something are missing: - to identify the vf rep port for a specific VF. Which could be addressed by the new devargs syntax we are proposing: http://dpdk.org/ml/archives/dev/2017-December/084234.html For example, the right VF rep port could be identified by below devarg: bus=pci,id=04:00.0/class=eth,vf_id=0 - the ability of hotplug I think it could also be addressed by the new devargs syntax, also by re-using the rte_eth_dev_attach() function: rte_eth_dev_attach("bus=pci,id=04:00.0/class=eth,vf_id=4", &port_id); Thoughts? --yliu > > +-+ +---+ +---+ > |Control Plane| | Data Plane | | Data Plane | > | Application | | Application | | Application | > +-+ +---+ +---+ > | eth dev api | | eth dev api | | eth dev api | > +-+ +---+ +---+ > +---+ +---+ +---+ +---+ +---+ > | PF0 | | Port | | Port | |VF0 PMD| |VF0 PMD| > | PMD <--+ Rep 0 | | Rep 1 | +---+ +--++ > | | | PMD | | PMD | | > +---+--^+ +---+ +-+-+ | > | || || > | ++ || > | || > | || > ++ | > | | HW (logical view) | | | > | --+--+ +---+ +---+---+ | | > | | PF | | VF0 | | VF1 | | | > | || | | | ++ > | ++ +---+ +---+ | > | ++ | > | |VEB | | > | ++ | > | ++ | > | | Port | | > | | 0| | > | ++ | > ++ > > The figure above shows a deployment where the PF is bound to a DPDK control > plane application which uses representor ports to manage the configuration and > monitoring of it's VF ports. Each virtual function is represented in the > application by a representor port PMD which enables control of the > corresponding > VF through eth dev APIs on the representor PMD such as: > > - void rte_eth_promiscuous_enable(uint8_t port_id); > - void rte_eth_promiscuous_disable(uint8_t port_id); > - void rte_eth_allmulticast_enable(uint8_t port_id); > - void rte_eth_allmulticast_disable(uint8_t port_id); > - int rte_eth_dev_mac_addr_add(uint8_t port, struct ether_addr *mac_addr, > uint32_t pool); > - int rte_eth_dev_set_vlan_offload(uint8_t port_id, int offload_mask); > > as well as monitoring through API's like > > - void rte_eth_link_get(uint8_t port_id, struct rte_eth_link *link); > - int rte_eth_stats_get(uint8_t port_id, struct rte_eth_s
Re: [dpdk-dev] [PATCH v10 3/5] net: add a helper for making RARP packet
Thomas, look good to you? --yliu On Wed, Jan 10, 2018 at 09:23:54AM +0800, Xiao Wang wrote: > Suggested-by: Maxime Coquelin > Signed-off-by: Xiao Wang > Reviewed-by: Maxime Coquelin > --- > lib/librte_net/Makefile| 1 + > lib/librte_net/rte_arp.c | 42 > ++ > lib/librte_net/rte_arp.h | 17 +++ > lib/librte_net/rte_net_version.map | 6 ++ > 4 files changed, 66 insertions(+) > create mode 100644 lib/librte_net/rte_arp.c > > diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile > index 5e8a76b68..ab290c382 100644 > --- a/lib/librte_net/Makefile > +++ b/lib/librte_net/Makefile > @@ -13,6 +13,7 @@ LIBABIVER := 1 > > SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c > SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c > +SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_arp.c > > # install includes > SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h > rte_esp.h > diff --git a/lib/librte_net/rte_arp.c b/lib/librte_net/rte_arp.c > new file mode 100644 > index 0..d7223b044 > --- /dev/null > +++ b/lib/librte_net/rte_arp.c > @@ -0,0 +1,42 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2018 Intel Corporation > + */ > + > +#include > + > +#include > + > +#define RARP_PKT_SIZE64 > +int > +rte_net_make_rarp_packet(struct rte_mbuf *mbuf, const struct ether_addr *mac) > +{ > + struct ether_hdr *eth_hdr; > + struct arp_hdr *rarp; > + > + if (mbuf->buf_len < RARP_PKT_SIZE) > + return -1; > + > + /* Ethernet header. */ > + eth_hdr = rte_pktmbuf_mtod(mbuf, struct ether_hdr *); > + memset(eth_hdr->d_addr.addr_bytes, 0xff, ETHER_ADDR_LEN); > + ether_addr_copy(mac, ð_hdr->s_addr); > + eth_hdr->ether_type = htons(ETHER_TYPE_RARP); > + > + /* RARP header. */ > + rarp = (struct arp_hdr *)(eth_hdr + 1); > + rarp->arp_hrd = htons(ARP_HRD_ETHER); > + rarp->arp_pro = htons(ETHER_TYPE_IPv4); > + rarp->arp_hln = ETHER_ADDR_LEN; > + rarp->arp_pln = 4; > + rarp->arp_op = htons(ARP_OP_REVREQUEST); > + > + ether_addr_copy(mac, &rarp->arp_data.arp_sha); > + ether_addr_copy(mac, &rarp->arp_data.arp_tha); > + memset(&rarp->arp_data.arp_sip, 0x00, 4); > + memset(&rarp->arp_data.arp_tip, 0x00, 4); > + > + mbuf->data_len = RARP_PKT_SIZE; > + mbuf->pkt_len = RARP_PKT_SIZE; > + > + return 0; > +} > diff --git a/lib/librte_net/rte_arp.h b/lib/librte_net/rte_arp.h > index 183641874..dad7423ad 100644 > --- a/lib/librte_net/rte_arp.h > +++ b/lib/librte_net/rte_arp.h > @@ -76,6 +76,23 @@ struct arp_hdr { > struct arp_ipv4 arp_data; > } __attribute__((__packed__)); > > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice > + * > + * Make a RARP packet based on MAC addr. > + * > + * @param mbuf > + * Pointer to the rte_mbuf structure > + * @param mac > + * Pointer to the MAC addr > + * > + * @return > + * - 0 on success, negative on error > + */ > +int > +rte_net_make_rarp_packet(struct rte_mbuf *mbuf, const struct ether_addr > *mac); > + > #ifdef __cplusplus > } > #endif > diff --git a/lib/librte_net/rte_net_version.map > b/lib/librte_net/rte_net_version.map > index 687c40eaf..213e6fd32 100644 > --- a/lib/librte_net/rte_net_version.map > +++ b/lib/librte_net/rte_net_version.map > @@ -12,3 +12,9 @@ DPDK_17.05 { > rte_net_crc_set_alg; > > } DPDK_16.11; > + > +EXPERIMENTAL { > + global: > + > + rte_net_make_rarp_packet; > +} DPDK_17.05; > -- > 2.15.1
Re: [dpdk-dev] [PATCH v2] examples/vhost: fix remove dev_info.max_rx_queues checking to solve startup failure
On Wed, Jan 10, 2018 at 02:01:01PM +0800, Zhiyong Yang wrote: > For vhost sample, the operation if (dev_info.max_rx_queues > > MAX_QUEUES) in the function port_init causes startup failure > when using X710(i40e driver). X710 requires that MAX_QUEUES > should be defined no less than 320, however it is defined as > 128 currently. > > Such checking is overkill and Removal don't affect any > functionality (have already validated ixgbe and i40e). > > The removal can avoid similar issue when introduing new physical > NIC. > > Fixes: 8bd6c395a568("examples/vhost: increase maximum queue number") > Cc: sta...@dpdk.org > > Signed-off-by: Zhiyong Yang Applied to dpdk-next-virtio. Thanks. --yliu
Re: [dpdk-dev] [PATCH 5/5] vhost: add reconnect thread name for client mode.
On Tue, Jan 09, 2018 at 09:12:43PM +0800, Yuanhan Liu wrote: > On Fri, Jan 05, 2018 at 06:10:39AM -0800, Tonghao Zhang wrote: > > This patch adds the name for vhost-user reconnect thread. > > It can help us to know whether the thread is running. > > > > Signed-off-by: Tonghao Zhang > > --- > > lib/librte_vhost/socket.c | 8 > > 1 file changed, 8 insertions(+) > > > > diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c > > index d44a0f1..c2e34e0 100644 > > --- a/lib/librte_vhost/socket.c > > +++ b/lib/librte_vhost/socket.c > > @@ -433,6 +433,7 @@ struct vhost_user_reconnect_list { > > vhost_user_reconnect_init(void) > > { > > int ret; > > + char thread_name[RTE_MAX_THREAD_NAME_LEN]; > > > > ret = pthread_mutex_init(&reconn_list.mutex, NULL); > > if (ret < 0) { > > @@ -449,6 +450,13 @@ struct vhost_user_reconnect_list { > > RTE_LOG(ERR, VHOST_CONFIG, > > "failed to destroy reconnect mutex"); > > } > > + } else { > > + snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN, > > +"vhost-reconn"); > > + > > + if (rte_thread_setname(reconn_tid, thread_name)) > > + RTE_LOG(DEBUG, VHOST_CONFIG, > > + "Failed to set thread name for vhost-user reconnect"); > > Applied to dpdk-next-virtio, with the intendation fixed. Note that I have just applied this patch only. The rest of this patchset belongs to the ixgbe PMD driver, which need another maintainer for review. That also means, it's better if you could send them sperately next time when they are not related. More specifically, in this case, one patch for vhost-user, another patchset for all the rest ixgbe changes. --yliu
Re: [dpdk-dev] [PATCH v7 3/3] net/virtio: support GUEST ANNOUNCE
On Tue, Jan 09, 2018 at 12:41:53PM +0100, Thomas Monjalon wrote: > > > Do you think it could make sense to have this function in a lib, as > > > vhost user lib does exactly the same? > > > > > > I don't know if it could be useful to others than vhost/virtio though. > > > > Hi Thomas, > > > > Do you think it's worth adding a new helper for ARP in lib/librte_net/? > > Currently we just need a helper to build RARP packet (the above > > make_rarp_packet) > > Yes, good idea +1 --yliu
Re: [dpdk-dev] [PATCH] vhost: fix error code check when creating pthread
On Fri, Dec 08, 2017 at 11:19:49AM +0100, Olivier Matz wrote: > On error, pthread_create() returns a positive number (errno). > Fix the test on the return value. > > Fixes: af1475918124 ("vhost: introduce API to start a specific driver") > Fixes: e623e0c6d8a5 ("vhost: add reconnect ability") > Cc: sta...@dpdk.org > > Signed-off-by: Olivier Matz > --- Applied to dpdk-next-virtio. Thanks. --yliu
Re: [dpdk-dev] [PATCH] examples/vhost: fix extend MAX_QUEUES to resolve startup failure
On Thu, Jan 04, 2018 at 02:33:32PM +0800, Zhiyong Yang wrote: > When binding X710 NIC (i40e driver) to DPDK, vhost sample startups > failure. > The sample requires that MAX_QUEUES should be defined no less than 320. > So, the patch redefines MAX_QUEUES 320 to fix the issue. It just makes the issue disappear. It doesn't really fix the issue. And I belive we have tried to fix this kind of issues in this way many times. (just check the git history). As you known, none of them really worked. You just added one more try, which is very likely will be broken again when Intel has one more new NIC. The error comes from: if (dev_info.max_rx_queues > MAX_QUEUES) { rte_exit(EXIT_FAILURE, "please define MAX_QUEUES no less than %u in %s\n", dev_info.max_rx_queues, __FILE__); } I think such check is overkill and we don't really need that. Could you just remove such check and do some validations on few difference nics? Thanks. --yliu
Re: [dpdk-dev] [PATCH] net/virtio: init MTU in case no control channel
On Fri, Jan 05, 2018 at 02:28:06AM -0800, Zhike Wang wrote: > From: zhike wang > > The max_mtu is kept as zero in case no CRTL channel, which leads > to failure when calling virtio_mtu_set(). > > Signed-off-by: Zhike Wang Applied to dpdk-next-virtio. Thanks. --yliu > --- > drivers/net/virtio/virtio_ethdev.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/net/virtio/virtio_ethdev.c > b/drivers/net/virtio/virtio_ethdev.c > index 21f2131..b7b3364 100644 > --- a/drivers/net/virtio/virtio_ethdev.c > +++ b/drivers/net/virtio/virtio_ethdev.c > @@ -1479,6 +1479,8 @@ static int virtio_dev_xstats_get_names(struct > rte_eth_dev *dev, > } else { > PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=1"); > hw->max_queue_pairs = 1; > + hw->max_mtu = VIRTIO_MAX_RX_PKTLEN - ETHER_HDR_LEN - > + VLAN_TAG_LEN - hw->vtnet_hdr_size; > } > > ret = virtio_alloc_queues(eth_dev); > -- > 1.8.3.1
Re: [dpdk-dev] [PATCH 5/5] vhost: add reconnect thread name for client mode.
On Fri, Jan 05, 2018 at 06:10:39AM -0800, Tonghao Zhang wrote: > This patch adds the name for vhost-user reconnect thread. > It can help us to know whether the thread is running. > > Signed-off-by: Tonghao Zhang > --- > lib/librte_vhost/socket.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c > index d44a0f1..c2e34e0 100644 > --- a/lib/librte_vhost/socket.c > +++ b/lib/librte_vhost/socket.c > @@ -433,6 +433,7 @@ struct vhost_user_reconnect_list { > vhost_user_reconnect_init(void) > { > int ret; > + char thread_name[RTE_MAX_THREAD_NAME_LEN]; > > ret = pthread_mutex_init(&reconn_list.mutex, NULL); > if (ret < 0) { > @@ -449,6 +450,13 @@ struct vhost_user_reconnect_list { > RTE_LOG(ERR, VHOST_CONFIG, > "failed to destroy reconnect mutex"); > } > + } else { > + snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN, > + "vhost-reconn"); > + > + if (rte_thread_setname(reconn_tid, thread_name)) > + RTE_LOG(DEBUG, VHOST_CONFIG, > + "Failed to set thread name for vhost-user reconnect"); Applied to dpdk-next-virtio, with the intendation fixed. I have also shortten the log a bit "failed to set reconnect thread name". There is no need to specify "vhost-user" again since VHOST_CONFIG tells you it comes from vhost-user. Thanks. --yliu
Re: [dpdk-dev] [PATCH v7] vhost: support virtqueue interrupt/notification suppression
On Tue, Jan 09, 2018 at 07:34:17AM +, Yao, Lei A wrote: > > > > -Original Message- > > From: Chen, Junjie J > > Sent: Tuesday, January 9, 2018 7:04 PM > > To: y...@fridaylinux.org; maxime.coque...@redhat.com; Wang, Xiao W > > ; Bie, Tiwei ; Yao, Lei A > > > > Cc: dev@dpdk.org; Chen, Junjie J > > Subject: [PATCH v7] vhost: support virtqueue interrupt/notification > > suppression > > > > The driver can suppress interrupt when VIRTIO_F_EVENT_IDX feature bit is > > negotiated. The driver set vring flags to 0, and MAY use used_event in > > available ring to advise device interrupt util reach an index specified > > by used_event. The device ignore the lower bit of vring flags, and send > > an interrupt when index reach used_event. > > > > The device can suppress notification in a manner analogous to the ways > > driver suppress interrupt. The device manipulates flags or avail_event in > > the used ring in the same way the driver manipulates flags or used_event in > > available ring. > > > > Signed-off-by: Junjie Chen > Tested-by: Lei Yao Applied to dpdk-next-virtio. Thanks. --yliu > VM2VM Iperf test has been executed with virtio-net driver after apply this > patch. > No performance drop. > CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz > Host OS: Ubuntu 16.04 > Guest OS: Ubuntu 16.04 > Kernel: 4.4.0
Re: [dpdk-dev] [PATCH v6] vhost: support virtqueue interrupt/notification suppression
On Tue, Jan 09, 2018 at 02:12:02AM +, Chen, Junjie J wrote: > > > + if (vring_need_event(vhost_used_event(vq), new, old) > > > > It's a bit weird that you use one from the standard linux header file > > (vring_need_event), while you define you own one (vhost_used_event). > > Note that the system header file also has "vring_used_event()" defined. > The vring_used_event is defined and used for virtio in kernel, kernel defines > a vhost_used_event in vhost.c for vhost, so I just use a separated macro for > vhost end. > > I'd like to define both vhost_need_event and vhost_used_event in vhost.h to > remove potential build issue in old linux distribution and also to keep > consistent. Is that OK for you? Yes. --yliu
Re: [dpdk-dev] [PATCH v5 0/4] Vhost: fix mq=on but VIRTIO_NET_F_MQ not negotiated
On Wed, Dec 13, 2017 at 09:51:05AM +0100, Maxime Coquelin wrote: > Hi, > > This fifth revision fixes bug reported by Tiwei, dev->virtqueue[$idx] > wasn't reset to NULL at vq freeing time. Applied to dpdk-next-virtio. Thanks. --yliu > > Having QEMU started with mq=on but guest driver not negotiating > VIRTIO_NET_F_MQ feature ends up in the vhost device to never > start. Indeed, more queues are created in the vhost backend than > configured. > > Guest drivers known to not advertise the VIRTIO_NET_F_MQ feature are > iPXE and OVMF Virtio-net drivers. > > Queues are created because before starting the guest, QEMU sends > VHOST_USER_SET_VRING_CALL requests for all queues declared in QEMU > command line. Also, once Virtio features negotiated, QEMU sends > VHOST_USER_SET_VRING_ENABLE requests to disable all but the first > queue pair. > > This series fixes this by destroying all but first queue pair in > the backend if VIRTIO_NET_F_MQ isn't negotiated. First patches > makes sure that VHOST_USER_SET_FEATURES request doesn't change > Virtio features while the device is running, which should never > happen as per the Virtio spec. This helps to make sure vitqueues > aren't destroyed while being processed, but also protect from > other illegal features changes (e.g. VIRTIO_NET_F_MRG_RXBUF). > > Changes since v4: > = > - Fix dev->virtqueue[$ixd] not reset to NULL at VQ free time (Tiwei) > Changes since v3: > = > - Fix Virtio features = 0 case (Tiwei) > Changes since v2: > = > - Patch 2: Rework & fix VQs destruction loop (Laszlo) > Changes since v1: > = > - Patch 1: shift bits in the right direction (Ladi) > > Maxime Coquelin (4): > vhost: prevent features to be changed while device is running > vhost: propagate VHOST_USER_SET_FEATURES handling error > vhost: extract virtqueue cleaning and freeing functions > vhost: destroy unused virtqueues when multiqueue not negotiated > > lib/librte_vhost/vhost.c | 22 -- > lib/librte_vhost/vhost.h | 3 +++ > lib/librte_vhost/vhost_user.c | 40 ++-- > 3 files changed, 53 insertions(+), 12 deletions(-) > > -- > 2.14.3
Re: [dpdk-dev] [PATCH] vhost: support UDP Fragmentation Offload
On Tue, Nov 21, 2017 at 02:56:52PM +0800, Jiayu Hu wrote: > In virtio, UDP Fragmentation Offload (UFO) includes two parts: host UFO > and guest UFO. Guest UFO means the frontend can receive large UDP packets, > and host UFO means the backend can receive large UDP packets. This patch > supports host UFO and guest UFO for vhost-user. > > Signed-off-by: Jiayu Hu Applied to dpdk-next-virtio. Thanks. --yliu
Re: [dpdk-dev] [PATCH v2] examples/vhost: fix sending arp packet to self
On Fri, Dec 29, 2017 at 09:33:19AM -0500, Junjie Chen wrote: > ARP packets are not dropped when dest vdev is itself, which breaks > RX ring inconspicuously. > > Fixes: 9c5ef51207c6 ("examples/vhost: handle broadcast packet") > > Signed-off-by: Junjie Chen > --- Applied to dpdk-next-virtio. Thanks. --yliu
Re: [dpdk-dev] [PATCH v6] vhost: support virtqueue interrupt/notification suppression
On Tue, Dec 26, 2017 at 12:43:10PM -0500, Junjie Chen wrote: > The driver can suppress interrupt when VIRTIO_F_EVENT_IDX feature bit is > negotiated. The driver set vring flags to 0, and MAY use used_event in > available ring to advise device interrupt util reach an index specified > by used_event. The device ignore the lower bit of vring flags, and send > an interrupt when index reach used_event. > > The device can suppress notification in a manner analogous to the ways > driver suppress interrupt. The device manipulates flags or avail_event in > the used ring in the same way the driver manipulates flags or used_event in > available ring. > > This patch is to enable this feature in vhost. > > Signed-off-by: Junjie Chen You need put "---" before the change log. Otherwise, it will be tracked in the commit log. > +#define vhost_used_event(vr) \ > + (*(volatile uint16_t*)&(vr)->avail->ring[(vr)->size]) > + > +static __rte_always_inline void > +vhost_notify(struct virtio_net *dev, struct vhost_virtqueue *vq) > +{ > + /* Don't notify guest if we don't reach index specified by guest. */ > + if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) { > + uint16_t old = vq->signalled_used; > + uint16_t new = vq->last_used_idx; > + > + LOG_DEBUG(VHOST_DATA, "%s: used_event_idx=%d, old=%d, new=%d\n", > + __func__, > + vhost_used_event(vq), > + old, new); > + if (vring_need_event(vhost_used_event(vq), new, old) It's a bit weird that you use one from the standard linux header file (vring_need_event), while you define you own one (vhost_used_event). Note that the system header file also has "vring_used_event()" defined. Besides that, I have few more comments (and some requirements): - It'd be much better if there is a Tested-by tag. Expeclitly, I'm asking a test with Linux kernel virtio-net driver in guest. - I also hope you could have done a build test on some old distributions. AFAIK, the two macros (vring_need_event and vring_used_event) come from kernel 3.0 (or above). Any kernel older than that would fail the build. - I'd be great if you could make a new one based on top of my latest tree: I have just applied a patchset that should conflict with this one. --yliu
Re: [dpdk-dev] [PATCH v2 0/2] vhost: introduce rte_vhost_vring_call()
On Wed, Jan 03, 2018 at 09:17:32AM +0100, Maxime Coquelin wrote: > >>Reviewed-by: Maxime Coquelin > > > > From 3.2. Managing ABI updates, General Guidelines: > > > > New APIs will be marked as experimental for at least one release to > > allow any issues found by users of the new API to be fixed quickly. > > > >http://dpdk.org/doc/guides/contributing/versioning.html > > > >I'm new to DPDK so I followed this guideline strictly and expect the > >maintainers to mark the API stable at some point in the future when they > >are happy with it. Maybe this guideline isn't followed strictly for > >librte_vhost? > > Thanks for the pointer, it seems it is not strictly followed for > librte_vhost & other libs. Let's keep it experimental for the coming > release. IIRC, it's a new guideline added recently (people were talking about that in last year's DPDK user event after all). And my understanding is, for a new (big set of) APIs, it's good to make them experimental, as normally, it's not an easy task to make a big stuff right in one time. For this one, which is simple and straightforward, I don't see a strong need to mark it as experimental. Thus I have removed it (while apply). If anyone has objections, let me know. Thanks. --yliu
Re: [dpdk-dev] [PATCH v6 1/3] net/virtio: make control queue thread-safe
On Sun, Jan 07, 2018 at 04:05:11AM -0800, Xiao Wang wrote: > diff --git a/drivers/net/virtio/virtio_rxtx.h > b/drivers/net/virtio/virtio_rxtx.h > index 54f1e849b..71b5798b0 100644 > --- a/drivers/net/virtio/virtio_rxtx.h > +++ b/drivers/net/virtio/virtio_rxtx.h > @@ -84,6 +84,7 @@ struct virtnet_ctl { > rte_iova_t virtio_net_hdr_mem; /**< hdr for each xmit packet */ > uint16_t port_id; /**< Device port identifier. */ > const struct rte_memzone *mz; /**< mem zone to populate CTL ring. */ > + rte_spinlock_t sl; /**< spinlock for control queue. */ It's weird to name it "sl". The typical naming is just "lock". --yliu
Re: [dpdk-dev] [PATCH v6 2/3] net/virtio: add packet injection method
On Sun, Jan 07, 2018 at 04:05:12AM -0800, Xiao Wang wrote: > + /* > + * App management thread and virtio interrupt handler thread > + * both can change the 'started' flag, this lock is meant to > + * avoid such a contention. > + */ > + rte_spinlock_t state_lock; Why not turning the "started" to atomic type, so that you don't need the lock? --yliu > + struct rte_mbuf **inject_pkts; > > struct virtqueue **vqs; > };
Re: [dpdk-dev] [PATCH v2 0/4] various fixes and cleanups for virtio PMD
On Mon, Dec 11, 2017 at 01:13:28PM +0800, Tiwei Bie wrote: > v2: > - refine indent for patch 2 > - fix a typo in commit log for patch 4 > - drop the blank lines squeezing patch Series applied to dpdk-next-virtio. Thanks. --yliu > > Tiwei Bie (4): > net/virtio: fix vector Rx break caused by rxq flushing > net/virtio: fix typo in LRO support > net/virtio: remove a redundant macro definition for ctrl vq > net/virtio: remove redundant macro definitions for vector Rx > > drivers/net/virtio/virtio_ethdev.c | 4 ++-- > drivers/net/virtio/virtio_rxtx_simple_neon.c | 2 -- > drivers/net/virtio/virtio_rxtx_simple_sse.c | 2 -- > drivers/net/virtio/virtqueue.c | 31 > +--- > drivers/net/virtio/virtqueue.h | 4 +--- > 5 files changed, 27 insertions(+), 16 deletions(-) > > -- > 2.13.3
Re: [dpdk-dev] [PATCH] vhost: support Generic Segmentation Offload
On Mon, Dec 25, 2017 at 01:53:29AM +, Yao, Lei A wrote: > > > > -Original Message- > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jiayu Hu > > Sent: Tuesday, November 28, 2017 1:29 PM > > To: dev@dpdk.org > > Cc: y...@fridaylinux.org; Tan, Jianfeng ; Hu, Jiayu > > > > Subject: [dpdk-dev] [PATCH] vhost: support Generic Segmentation Offload > > > > In virtio, Generic Segmentation Offload (GSO) is the feature for the > > backend, which means the backend can receive packets with any GSO > > type. > > > > Virtio-net enables the GSO feature by default, and vhost-net supports it. > > To make live migration from vhost-net to vhost-user possible, this patch > > enables GSO for vhost-user. > > > > Signed-off-by: Jiayu Hu > Tested-by: Lei Yao Applied to dpdk-next-virtio. Thanks. --yliu > This patch has been tested on my server, after add csum=on, gso=on to qemu > cmdline, > Following offload are active in vm: > udp-fragmentation-offload: on > tx-tcp-segmentation: on > tx-tcp-ecn-segmentation: on > tx-tcp6-segmentation: on > > > --- > > lib/librte_vhost/vhost.h | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > > index 1cc81c1..04f54cb 100644 > > --- a/lib/librte_vhost/vhost.h > > +++ b/lib/librte_vhost/vhost.h > > @@ -204,6 +204,7 @@ struct vhost_msg { > > (1ULL << VIRTIO_F_VERSION_1) | \ > > (1ULL << VHOST_F_LOG_ALL) | \ > > (1ULL << > > VHOST_USER_F_PROTOCOL_FEATURES) | \ > > + (1ULL << VIRTIO_NET_F_GSO) | \ > > (1ULL << VIRTIO_NET_F_HOST_TSO4) | \ > > (1ULL << VIRTIO_NET_F_HOST_TSO6) | \ > > (1ULL << VIRTIO_NET_F_CSUM)| \ > > -- > > 2.7.4
Re: [dpdk-dev] [PATCH] net/virtio: fix incorrect cast of void *
On Thu, Dec 14, 2017 at 03:49:32PM +0100, Maxime Coquelin wrote: > > > On 12/14/2017 03:33 PM, Olivier Matz wrote: > >From: Didier Pallard > > > >The rx_queues and tx_queues fields of the data structure points to a struct > >virtnet_rx or virtnet_tx. Casting it to a virtqueue is an error. > > > >It does not trigger any bug because pointer is not dereferenced inside the > >function, but it can become a bug if this code is copy/pasted and vq is > >dereferenced. > > > >Fixes: 01ad44fd374f ("net/virtio: split Rx/Tx queue") > >Cc: sta...@dpdk.org > > > >Signed-off-by: Didier Pallard > >Signed-off-by: Olivier Matz > >--- > > drivers/net/virtio/virtio_ethdev.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > >diff --git a/drivers/net/virtio/virtio_ethdev.c > >b/drivers/net/virtio/virtio_ethdev.c > >index c0ba83b06..b398f9960 100644 > >--- a/drivers/net/virtio/virtio_ethdev.c > >+++ b/drivers/net/virtio/virtio_ethdev.c > >@@ -893,7 +893,7 @@ static int virtio_dev_xstats_get_names(struct > >rte_eth_dev *dev, > > /* Note: limit checked in rte_eth_xstats_names() */ > > for (i = 0; i < dev->data->nb_rx_queues; i++) { > >-struct virtqueue *rxvq = dev->data->rx_queues[i]; > >+struct virtnet_rx *rxvq = dev->data->rx_queues[i]; > > if (rxvq == NULL) > > continue; > > for (t = 0; t < VIRTIO_NB_RXQ_XSTATS; t++) { > >@@ -906,7 +906,7 @@ static int virtio_dev_xstats_get_names(struct > >rte_eth_dev *dev, > > } > > for (i = 0; i < dev->data->nb_tx_queues; i++) { > >-struct virtqueue *txvq = dev->data->tx_queues[i]; > >+struct virtnet_tx *txvq = dev->data->tx_queues[i]; > > if (txvq == NULL) > > continue; > > for (t = 0; t < VIRTIO_NB_TXQ_XSTATS; t++) { > > > > Reviewed-by: Maxime Coquelin Applied to dpdk-next-virtio. Thanks. --yliu
Re: [dpdk-dev] [PATCH] vhost: support UDP Fragmentation Offload
On Tue, Nov 21, 2017 at 02:56:52PM +0800, Jiayu Hu wrote: > In virtio, UDP Fragmentation Offload (UFO) includes two parts: host UFO > and guest UFO. Guest UFO means the frontend can receive large UDP packets, > and host UFO means the backend can receive large UDP packets. This patch > supports host UFO and guest UFO for vhost-user. > > Signed-off-by: Jiayu Hu > --- > lib/librte_mbuf/rte_mbuf.h| 7 +++ > lib/librte_vhost/vhost.h | 2 ++ > lib/librte_vhost/virtio_net.c | 10 ++ > 3 files changed, 19 insertions(+) > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index ce8a05d..3d8cfc9 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -209,6 +209,13 @@ extern "C" { > /* add new TX flags here */ > > /** > + * UDP Fragmentation Offload flag. This flag is used for enabling UDP > + * fragmentation in SW or in HW. When use UFO, mbuf->tso_segsz is used > + * to store the MSS of UDP fragments. > + */ > +#define PKT_TX_UDP_SEG (1ULL << 42) This patch added a new mbuf flag, Olivier, do you have objections? --yliu
Re: [dpdk-dev] [PATCH] lib/librte_vhost: remove redundant logic judgement
On Mon, Dec 25, 2017 at 05:16:17PM +0800, Zhiyong Yang wrote: > At the beginning of vring_translate, the code > if(!(dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) already judges > if IOMMU_PLATFORM is supported. The function vhost_iova_to_vva always > repeats the logic, __vhost_iova_to_vva can be used directly to avoid it > here. > > Signed-off-by: Zhiyong Yang > --- > lib/librte_vhost/vhost.c | 7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c > index 4f8b73a09..bb615fd2a 100644 > --- a/lib/librte_vhost/vhost.c > +++ b/lib/librte_vhost/vhost.c > @@ -157,7 +157,7 @@ vring_translate(struct virtio_net *dev, struct > vhost_virtqueue *vq) > goto out; > > size = sizeof(struct vring_desc) * vq->size; > - vq->desc = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev, vq, > + vq->desc = (struct vring_desc *)(uintptr_t)__vhost_iova_to_vva(dev, vq, > vq->ring_addrs.desc_user_addr, > size, VHOST_ACCESS_RW); I don't see strong reason to bother doing the change. It's not in the datapath after all. I'd like to keep the code as it is, to keep it simpler: user just has to call vhost_iova_to_vva() and let it to handle the details. --yliu > if (!vq->desc) > @@ -165,7 +165,8 @@ vring_translate(struct virtio_net *dev, struct > vhost_virtqueue *vq) > > size = sizeof(struct vring_avail); > size += sizeof(uint16_t) * vq->size; > - vq->avail = (struct vring_avail *)(uintptr_t)vhost_iova_to_vva(dev, vq, > + vq->avail = (struct vring_avail *)(uintptr_t)__vhost_iova_to_vva(dev, > + vq, > vq->ring_addrs.avail_user_addr, > size, VHOST_ACCESS_RW); > if (!vq->avail) > @@ -173,7 +174,7 @@ vring_translate(struct virtio_net *dev, struct > vhost_virtqueue *vq) > > size = sizeof(struct vring_used); > size += sizeof(struct vring_used_elem) * vq->size; > - vq->used = (struct vring_used *)(uintptr_t)vhost_iova_to_vva(dev, vq, > + vq->used = (struct vring_used *)(uintptr_t)__vhost_iova_to_vva(dev, vq, > vq->ring_addrs.used_user_addr, > size, VHOST_ACCESS_RW); > if (!vq->used) > -- > 2.13.3
Re: [dpdk-dev] [PATCH] net/virtio: remove unnecessary macro definitions
On Tue, Dec 26, 2017 at 05:25:00PM +0800, Zhiyong Yang wrote: > DPDK has already the definition of Ethernet numeric link speeds in Mbps in > the file Rte_ethdev.h, it is unnecessary to rededine virtio specific link > speeds macros again. > > Signed-off-by: Zhiyong Yang Applied to dpdk-next-virtio. Thanks. --yliu
Re: [dpdk-dev] [PATCH] vhost: fix dequeue zero copy not work with virtio1
On Fri, Dec 15, 2017 at 10:33:41AM +, Loftus, Ciara wrote: > > > > Hi Junjie, > > > > On 12/13/2017 05:50 PM, Junjie Chen wrote: > > > This fix dequeue zero copy can not work with Qemu > > > version >= 2.7. Since from Qemu 2.7 virtio device > > > use virtio-1 protocol, the zero copy code path > > > forget to add offset to buffer address. > > > > > > Signed-off-by: Junjie Chen > > > --- > > > lib/librte_vhost/virtio_net.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c > > > index 6fee16e..79d80f7 100644 > > > --- a/lib/librte_vhost/virtio_net.c > > > +++ b/lib/librte_vhost/virtio_net.c > > > @@ -977,7 +977,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct > > vhost_virtqueue *vq, > > > desc->addr + desc_offset, > > cpy_len { > > > cur->data_len = cpy_len; > > > cur->data_off = 0; > > > - cur->buf_addr = (void *)(uintptr_t)desc_addr; > > > + cur->buf_addr = (void *)(uintptr_t)(desc_addr > > > + + desc_offset); > > > cur->buf_iova = hpa; > > > > > > /* > > > > > > > Thanks for fixing this. > > > > Reviewed-by: Maxime Coquelin > > > > Maxime > > Thanks for the fix. Can this be considered for the stable branch? Yes, I think so. Applied to dpdk-next-virtio, with Fixes: b0a985d1f340 ("vhost: add dequeue zero copy") Cc: sta...@dpdk.org Thanks. --yliu
[dpdk-dev] [dpdk-announce] DPDK 17.08.1 released
-Og net/qede: fix compilation with -Og app/test-crypto-perf: fix memory leak app/test-crypto-perf: fix compilation with -Og net/virtio: revert not claiming LRO support net/virtio: revert not claiming IP checksum offload net/virtio: fix log levels in configure net/virtio: fix mbuf port for simple Rx function net/virtio: fix queue setup consistency net/virtio: fix compilation with -Og lpm6: fix compilation with -Og Ophir Munk (3): net/tap: fix flow and port commands net/failsafe: fix VLAN stripping configuration app/testpmd: fix forwarding between non consecutive ports Pablo de Lara (8): hash: fix eviction counter crypto/aesni_gcm: fix zero data operation app/crypto-perf: fix packet length check app/crypto-perf: parse AEAD data from vectors crypto/openssl: fix AEAD parameters examples/l2fwd-crypto: fix physical address setting crypto/qat: fix HMAC supported digest sizes app/testpmd: fix topology error message Patrick MacArthur (1): eal: copy raw strings taken from command line Phil Yang (1): app/testpmd: fix quitting in container Qi Zhang (4): net/i40e: fix flow control watermark mismatch net/i40e: fix packet count for PF net/i40e: fix mbuf free in vector Tx net/i40e: fix mirror with firmware 6.0 Rami Rosen (1): net/kni: remove driver struct forward declaration Rasesh Mody (4): net/qede/base: fix to use a passed ptt handle net/qede/base: fix return code to align with FW net/qede: remove duplicate includes net/qede/base: fix division by zero Raslan Darawsheh (2): net/failsafe: fix failsafe bus uninit return value net/failsafe: fix PCI devices init RongQiang Xie (2): net/enic: fix possible null pointer dereference net/qede: fix possible null pointer dereference Sebastian Basierski (3): net/vmxnet3: fix unintentional integer overflow net/virtio-user: fix TAP name string termination net/virtio: check error on setting non block flag Sergio Gonzalez Monroy (1): crypto/aesni_mb: fix invalid session error Shahaf Shuler (5): net/mlx5: fix num seg assumption in SSE Tx net/mlx5: fix Tx stats error counter definition net/mlx5: fix Tx stats error counter logic net/mlx5: fix TSO segment size verification net/mlx5: fix packet type flags for Ethernet only frame Stefan Baranoff (1): net/pcap: fix memory leak in dumper open Stephen Hemminger (1): eal: initialize logging before bus Tiwei Bie (1): net/virtio: flush Rx queues on start Tomasz Duszynski (3): drivers/crypto: use snprintf return value correctly examples/ipsec-secgw: fix IP version check examples/ipsec-secgw: fix IPv6 payload length Tomasz Kulasek (3): net/bonding: fix slaves capacity check net/i40e: fix assignment of enum values net/bonding: fix check slaves link properties Vipin Varghese (1): net/tap: fix unregistering callback with invalid fd Wei Dai (8): net/ixgbe: fix mapping of user priority to TC net/ixgbe: fix adding a mirror rule net/i40e: fix mirror rule reset when port is closed net/ixgbe: fix Rx queue interrupt mapping in VF net/ixgbe: fix VFIO interrupt mapping in VF net/ixgbe: fix PF DCB info app/testpmd: fix mapping of user priority to DCB TC net/i40e: fix VFIO interrupt mapping in VF Wei Zhao (4): net/ixgbe: fix MAC VLAN filter fail problem net/i40e: fix clear xstats bug in VF app/testpmd: fix packet throughput after stats reset net/ixgbe: fix filter parser for L2 tunnel Wenzhuo Lu (7): net/i40e: fix TM node parameter checking net/i40e: fix TM level capability getting net/ixgbe: fix TM node parameter checking net/ixgbe: fix TM level capability getting net/i40e: fix not supporting NULL TM profile net/ixgbe: fix not supporting NULL TM profile net/i40e: fix parent when adding TM node Xiaoyun Li (2): net/i40e: fix PF notify issue when VF is not up net/igb: fix Rx interrupt with VFIO and MSI-X Xueming Li (5): net/mlx5: fix tunnel offload detection mem: fix malloc debug config mem: fix malloc element free in debug mode examples/l2fwd_fork: fix message pool init examples/multi_process: fix received message length Yi Yang (1): service: fix build with gcc 4.9 Yong Wang (4): net/liquidio: fix uninitialized variable net/igb: fix memcpy length net/i40e: fix uninitialized variable net/ixgbe: fix uninitialized variable Yongseok Koh (3): net/mlx5: fix calculating TSO inline size net/mlx5: fix overflow of Rx SW ring net/mlx5: fix tunneled TCP/UDP packet type Yuanhan Liu (2): Revert "net/virtio: flush Rx queues on start" version: 17.08.1 Zhiyong Yang (2): test: fix assignment operation
Re: [dpdk-dev] 17.08.1 patches review and test
On Mon, Dec 04, 2017 at 08:40:03PM +, Patil, Harish wrote: > > > -Original Message- > From: dev on behalf of Yuanhan Liu > > Date: Monday, November 27, 2017 at 4:21 AM > To: dpdk stable > Cc: "dev@dpdk.org" , "Xu, Qian Q" > Subject: [dpdk-dev] 17.08.1 patches review and test > > >Hi all, > > > >Here is a list of patches targeted for stable release 17.08.1. Please > >help review and test. The planned date for the final release is 7th, > >Dec. Before that, please shout if anyone has objections with these > >patches being applied. > > > >These patches are located at branch 17.08 of dpdk-stable repo: > >http://dpdk.org/browse/dpdk-stable/ > > > >Thanks. > > > >--yliu > > > >--- > >Aaron Conole (1): > > net/enic: fix assignment > > > >Ajit Khaparde (28): > > net/bnxt: fix HWRM macros and locking > > net/bnxt: use 64-bits of address for VLAN table > > net/bnxt: fix an issue with group id calculation > > net/bnxt: fix calculation of number of pools > > net/bnxt: handle multi queue mode properly > > net/bnxt: fix Rx handling and buffer allocation logic > > net/bnxt: fix an issue with broadcast traffic > > net/bnxt: fix usage of VMDq flags > > net/bnxt: set checksum offload flags correctly > > net/bnxt: update status of Rx IP/L4 CKSUM > > net/bnxt: fix config RSS update > > net/bnxt: set the hash key size > > net/bnxt: fix per queue stats display in xstats > > net/bnxt: fix interrupt handler > > net/bnxt: fix number of MAC addresses for VMDq > > net/bnxt: fix the association of a MACVLAN per VNIC > > net/bnxt: fix Tx offload capability > > net/bnxt: fix Rx offload capability > > net/bnxt: handle Rx multi queue creation properly > > net/bnxt: remove redundant code parsing pool map > > net/bnxt: fix a bit shift operation > > net/bnxt: fix a potential null pointer dereference > > net/bnxt: fix a potential null pointer dereference > > net/bnxt: fix a pointer deref before null check > > net/bnxt: fix an unused value > > net/bnxt: check VLANs from pool map only for VMDq > > net/bnxt: do not set hash type unnecessarily > > net/bnxt: fix VLAN spoof configuration > > > >Akhil Goyal (2): > > test/crypto: fix dpaa2 sec macros and definitions > > net/dpaa2: set queues after reconfiguration > > > >Alejandro Lucero (2): > > net/nfp: fix RSS > > net/nfp: fix Rx interrupt when multiqueue > > > >Alok Makhariya (2): > > crypto/dpaa2_sec: remove ICV memset on decryption side > > crypto/dpaa2_sec: add check for segmented buffer > > > >Anatoly Burakov (1): > > vfio: fix secondary process initialization > > > >Andrey Chilikin (1): > > net/i40e: fix flexible payload configuration > > > >Aviad Yehezkel (4): > > examples/ipsec-secgw: fix crypto device mapping > > examples/ipsec-secgw: fix session creation > > examples/ipsec-secgw: fix AAD length setting > > app/testpmd: fix build without ixgbe and bnxt PMDs > > > >Beilei Xing (1): > > net/i40e: fix VF device stop issue > > > >Chas Williams (1): > > net/vmxnet3: fix memory leak when releasing queues > > > >Congwen Zhang (1): > > net/cxgbe: fix memory leak > > > >Daniel Mrzyglod (3): > > net/virtio: fix untrusted scalar value > > app/testpmd: fix DDP package filesize detection > > net/bonding: fix default aggregator mode to stable > > > >David Harton (2): > > net/vmxnet3: fix MAC address set > > net/i40e: fix i40evf MAC filter table > > > >Ferruh Yigit (4): > > ethdev: fix ABI version > > ethdev: revert use port name from device structure > > igb_uio: remove device reset in open > > net/qede: fix icc build > > > >Gaetan Rivet (1): > > net/failsafe: fix errno set on command execution > > > >Gowrishankar Muthukrishnan (1): > > net/bonding: support bifurcated driver in eal > > > >Guduri Prathyusha (2): > > examples/l3fwd: fix NEON instructions > > examples/l3fwd: fix aliasing in port grouping > > > >Harish Patil (2): > > net/qede: fix supported packet types > > net/qede: fix to re-enable LRO during device start > > > >Hemant Agrawal (3): > > net/d
Re: [dpdk-dev] [PATCH] [RFC] ether: standardize getting the port by name
On Tue, Dec 05, 2017 at 06:22:05PM +0100, Adrien Mazarguil wrote: > > > > > Just for information, this "port=x" argument in mlx4 is consistent > > > > > with the > > > > > value found in /sys/class/net/ethX/dev_port under Linux. If we choose > > > > > to use > > > > > a port index (instead of a MAC or something else), it would make > > > > > sense to > > > > > standardize it on the same order as given by the host OS for > > > > > consistency > > > > > across all PMDs. > > > > > > Thanks for the info. > > > > > > But FYI, for most of other PMDs, such sys file won't exist, as the host > > > driver should have been unbind and bind with something like uio. So I > > > don't think it applies to all other nics. > > Sure, I only meant PMDs must implement the same numbering scheme the kernel > driver they replace would have used (as exposed through dev_port) for > consistency. Note dev_port is always present since Linux 3.15, even with > single port/bus address devices, so applications that want to construct > -w/-b arguments can rely on that before unbinding devices. I don't think that's a clean way. Fortunate enough though, even we want to use the port as one of the key for identification, we don't really need that in most cases. So, just like the mac proposed here, we could (and probably should) make it optional. [...] > > > I'm not a fan of the MAC naming, neither. The reason this patch proposes > > > mac > > > naming is that it's more clear for the user to specify a port. I also > > > agree > > > that the port index could be another good option. > > > > > > One thing I really haven't considered yet is how it becomes when the VF > > > reprenstor comes to play? (more guys are cc'ed). > > Won't VFs come through a separate PCI bus address anyway? Not sure extra > care is needed for those. Yes, for VF. But I was talking about VF representors [1]. Note that the interface is not settled down yet, but it's likely we need some interfaces (or more precisely, some concepts) like that. In that proposal, several ports could share a same vdev name, while each one is differenced by a vport_id. Again, it's not settled down yet, it might be something else in the end, say all share a same (PF) PCI id, while each one is identified by something else, etc. [1]: http://dpdk.org/ml/archives/dev/2017-November/080946.html --yliu
Re: [dpdk-dev] [PATCH v2 2/2] net/virtio: support GUEST ANNOUNCE
On Wed, Dec 06, 2017 at 07:23:11PM +0800, Tiwei Bie wrote: > On Mon, Dec 04, 2017 at 06:02:08AM -0800, Xiao Wang wrote: > [...] > > diff --git a/drivers/net/virtio/virtio_rxtx.c > > b/drivers/net/virtio/virtio_rxtx.c > > index 6a24fde..7313bdd 100644 > > --- a/drivers/net/virtio/virtio_rxtx.c > > +++ b/drivers/net/virtio/virtio_rxtx.c > > @@ -1100,3 +1100,84 @@ > > > > return nb_tx; > > } > > + > > +uint16_t > > +virtio_inject_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t > > nb_pkts) > > +{ > > + struct virtnet_tx *txvq = tx_queue; > > + struct virtqueue *vq = txvq->vq; > > + struct virtio_hw *hw = vq->hw; > > + uint16_t hdr_size = hw->vtnet_hdr_size; > > + uint16_t nb_used, nb_tx = 0; > > + > > + if (unlikely(nb_pkts < 1)) > > + return nb_pkts; > > + > > + PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts); > > + nb_used = VIRTQUEUE_NUSED(vq); > > + > > + virtio_rmb(); > > + if (likely(nb_used > vq->vq_nentries - vq->vq_free_thresh)) > > + virtio_xmit_cleanup(vq, nb_used); > > + > > + for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) { > > + struct rte_mbuf *txm = tx_pkts[nb_tx]; > > + int can_push = 0, use_indirect = 0, slots, need; > > + > > + /* optimize ring usage */ > > + if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) || > > + vtpci_with_feature(hw, > > VIRTIO_F_VERSION_1)) && > > + rte_mbuf_refcnt_read(txm) == 1 && > > + RTE_MBUF_DIRECT(txm) && > > + txm->nb_segs == 1 && > > + rte_pktmbuf_headroom(txm) >= hdr_size && > > + rte_is_aligned(rte_pktmbuf_mtod(txm, char *), > > + __alignof__(struct virtio_net_hdr_mrg_rxbuf))) > > + can_push = 1; > > + else if (vtpci_with_feature(hw, VIRTIO_RING_F_INDIRECT_DESC) && > > +txm->nb_segs < VIRTIO_MAX_TX_INDIRECT) > > + use_indirect = 1; > > + > > + /* How many main ring entries are needed to this Tx? > > +* any_layout => number of segments > > +* indirect => 1 > > +* default=> number of segments + 1 > > +*/ > > + slots = use_indirect ? 1 : (txm->nb_segs + !can_push); > > + need = slots - vq->vq_free_cnt; > > + > > + /* Positive value indicates it need free vring descriptors */ > > + if (unlikely(need > 0)) { > > + nb_used = VIRTQUEUE_NUSED(vq); > > + virtio_rmb(); > > + need = RTE_MIN(need, (int)nb_used); > > + > > + virtio_xmit_cleanup(vq, need); > > + need = slots - vq->vq_free_cnt; > > + if (unlikely(need > 0)) { > > + PMD_TX_LOG(ERR, > > + "No free tx descriptors to > > transmit"); > > + break; > > + } > > + } > > + > > + /* Enqueue Packet buffers */ > > + virtqueue_enqueue_xmit(txvq, txm, slots, use_indirect, > > can_push); > > + > > + txvq->stats.bytes += txm->pkt_len; > > + virtio_update_packet_stats(&txvq->stats, txm); > > + } > > + > > + txvq->stats.packets += nb_tx; > > + > > + if (likely(nb_tx)) { > > + vq_update_avail_idx(vq); > > + > > + if (unlikely(virtqueue_kick_prepare(vq))) { > > + virtqueue_notify(vq); > > + PMD_TX_LOG(DEBUG, "Notified backend after xmit"); > > + } > > + } > > + > > + return nb_tx; > > +} > > Simple Tx has some special assumptions and setups of the txq. > Basically the current implementation of virtio_inject_pkts() > is a mirror of virtio_xmit_pkts(). So when simple Tx function > is chosen, calling virtio_inject_pkts() could cause problems. That's why I suggested to invoke the tx_pkt_burst callback directly. --yliu
Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes
On Wed, Dec 06, 2017 at 03:55:49PM +0200, Victor Kaplansky wrote: > When performing live migration or memory hot-plugging, > the changes to the device and vrings made by message handler > done independently from vring usage by PMD threads. > > This causes for example segfauls during live-migration > with MQ enable, but in general virtually any request > sent by qemu changing the state of device can cause > problems. > > These patches fixes all above issues by adding a spinlock > to every vring and requiring message handler to start operation > only after ensuring that all PMD threads related to the divece > are out of critical section accessing the vring data. > > Each vring has its own lock in order to not create contention > between PMD threads of different vrings and to prevent > performance degradation by scaling queue pair number. Hi, Thanks for the patch! Firstly, I didn't see your SoB. I'm also more interested to know do you see any performance penalty? > --- > lib/librte_vhost/vhost.h | 1 + > lib/librte_vhost/vhost_user.c | 44 > +++ > lib/librte_vhost/virtio_net.c | 8 > 3 files changed, 53 insertions(+) > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > index 1cc81c17..812aeccd 100644 > --- a/lib/librte_vhost/vhost.h > +++ b/lib/librte_vhost/vhost.h > @@ -137,6 +137,7 @@ struct vhost_virtqueue { > TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list; > int iotlb_cache_nr; > TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list; > +rte_spinlock_t active_lock; The indentation is broken. > } __rte_cache_aligned; > @@ -1356,6 +1361,8 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > vhost_user_iotlb_rd_unlock(vq); > > +rte_spinlock_unlock(&vq->active_lock); Ditto. --yliu
Re: [dpdk-dev] The limitation of VHOST_MEMORY_MAX_NREGIONS in virtio-user with vhost-user backend
On Wed, Dec 06, 2017 at 12:23:19PM +0100, Maxime Coquelin wrote: > Hi, > > On 12/05/2017 07:58 AM, Wang, Wei 5. (NSB - CN/Hangzhou) wrote: > > > > > >Hi, all > > > >In DPDK document, it it described that Virtio in containers Cannot work when > >there are more than VHOST_MEMORY_MAX_NREGIONS(8) hugepages. In another word, > >do not use 2MB hugepage so far. Do you know the reason of this limitation? > > What document are you referring to? > Actually, a single region is composed of several hugepages, be it 1G or > 2M. I use 2M hugepages daily without issues. He was talking about virtio-user. --yliu > > The limitation comes from the Vhost-user protocol spec. I know there are > discussions to enlarge it, but it would require a new protocol flag not > to break existing implementations, so no workaround possible. > > >In my envirionment, the pdpe1gb is not set in cpu flag, so hugepage can't be > >set to 1GB size. The hugepage number shall be more than 8 > > > > > >So is there any solution or workaround to fix this limitation? Or change > >dpdk code to fix this limitation? > > Have you tried to use 2MB pages and faced the NREGIONS > 8 case? > If so, could you tell us more about your setup (QEMU version, VM RAM > size, etc...)? > > Cheers, > Maxime > > > > > >Vivian > >seat: 19009 > >phone: 13738006921 > > > > > >
Re: [dpdk-dev] Your next patch "net/virtio: fix an incorrect behavior..."
Thanks for the remind! I have reverted it. --yliu On Wed, Dec 06, 2017 at 03:11:48PM +0800, Tiwei Bie wrote: > Hi Yuanhan, > > Thanks for Antonio's reminder! > > Below patch may break vector Rx in some cases [1]. > > http://dpdk.org/browse/dpdk-stable/commit/drivers/net/virtio?h=17.08&id=f4e455b776f61ec7fbdfb5dce6ba34ad9016689b > > And I'm working on a fix for it. But as the deadline of 17.08.1 > is approaching, we don't have enough time to get the fix reviewed > and tested by the community. I'm not sure, maybe we should remove > it from the stable branch for 17.08.1 for now? > > [1] http://dpdk.org/ml/archives/dev/2017-December/082983.html > > Best regards, > Tiwei Bie > > On Tue, Dec 05, 2017 at 09:14:39PM +0800, Fischetti, Antonio wrote: > > Thanks Tiwei, > > one more thing. I saw they are going to backport some patches into 17.08.1 > > > > http://dev.dpdk.narkive.com/pX0IIwPl/dpdk-dev-17-08-1-patches-review-and-test > > > > is your next fix related to the one reported as "net/virtio: flush Rx > > queues on start"? > > > > If that's the case, should we flag this to Yuanhan Liu, just to be aware > > there's some work in progress? > > > > > > Antonio > >
Re: [dpdk-dev] The limitation of VHOST_MEMORY_MAX_NREGIONS in virtio-user with vhost-user backend
On Tue, Dec 05, 2017 at 06:58:53AM +, Wang, Wei 5. (NSB - CN/Hangzhou) wrote: > > > Hi, all > > In DPDK document, it it described that Virtio in containers Cannot work when > there are more than VHOST_MEMORY_MAX_NREGIONS(8) hugepages. In another word, > do not use 2MB hugepage so far. Do you know the reason of this limitation? It comes from the vhost-user spec. > In my envirionment, the pdpe1gb is not set in cpu flag, so hugepage can't be > set to 1GB size. The hugepage number shall be more than 8 > > > So is there any solution or workaround to fix this limitation? Or change dpdk > code to fix this limitation? > AFAIK, there are no workarounds. And I think the DPDK EAL memory code need be refactored a bit to fix this limitation. Unforunately, seems there are no work on that recently. --yliu
Re: [dpdk-dev] [PATCH 2/2] net/virtio: support GUEST ANNOUNCE
On Thu, Nov 30, 2017 at 02:41:12AM +, Wang, Xiao W wrote: > > > > -Original Message- > > From: Yuanhan Liu [mailto:y...@fridaylinux.org] > > Sent: Monday, November 27, 2017 8:49 PM > > To: Wang, Xiao W > > Cc: dev@dpdk.org > > Subject: Re: [PATCH 2/2] net/virtio: support GUEST ANNOUNCE > > > > On Fri, Nov 24, 2017 at 03:04:00AM -0800, Xiao Wang wrote: > > > When live migration is done, for the backup VM, either the virtio > > > frontend or the vhost backend needs to send out gratuitous RARP packet > > > to announce its new network location. > > > > > > This patch enables VIRTIO_NET_F_GUEST_ANNOUNCE feature to support > > live > > > migration scenario where the vhost backend doesn't have the ability to > > > generate RARP packet. > > > > Yes, it's a feature good to have. > > > > > +static int > > > +virtio_dev_pause(struct rte_eth_dev *dev) > > > +{ > > > + struct virtio_hw *hw = dev->data->dev_private; > > > + > > > + if (hw->started == 0) > > > + return -1; > > > + hw->started = 0; > > > + /* > > > + * Prevent the worker thread from touching queues to avoid condition, > > > + * 1 ms should be enough for the ongoing Tx function to finish. > > > + */ > > > + rte_delay_ms(1); > > > + return 0; > > > +} > > > + > > > +static void > > > +virtio_dev_resume(struct rte_eth_dev *dev) > > > +{ > > > + struct virtio_hw *hw = dev->data->dev_private; > > > + > > > + hw->started = 1; > > > +} > > > > However, the implementation (stop first, pause for 1ms, duplicate another > > Tx function, resume) doesn't seem elegant. > > > > You probably could try something like DPDK vhost does: > > > > - set a flag when S_ANNOUCE is received > > - inject a pkt when such flag is set in the xmit function > > > > You then should be able to get rid of all of above stuffs. > > > > --yliu > > The difference is that the virtio port may just receive packet, without xmit. Thanks, I missed that. However, you really should not add a duplicate function. It adds more maintain effort. I think you probably could just invoke the tx_pkt_burst callback directly. You have stopped the device after all. What's the necessary to duplicate it? --yliu
Re: [dpdk-dev] [PATCH] vhost: fix segfault as handle set_mem_table message
On Tue, Nov 28, 2017 at 01:09:29PM +0100, Maxime Coquelin wrote: > > > On 11/15/2017 12:41 PM, Jianfeng Tan wrote: > >In a running VM, operations (like device attach/detach) will > >trigger the QEMU to resend set_mem_table to vhost-user backend. > > > >DPDK vhost-user handles this message rudely by unmap all existing > >regions and map new ones. This might lead to segfault if there > >is pmd thread just trying to touch those unmapped memory regions. > > > >But for most cases, except VM memory hotplug, QEMU still sends the > >set_mem_table message even the memory regions are not changed as > >QEMU vhost-user filters out those not backed by file (fd > 0). > > > >To fix this case, we add a check in the handler to see if the > >memory regions are really changed; if not, we just keep old memory > >regions. > > > >Fixes: 8f972312b8f4 ("vhost: support vhost-user") > > > >CC: sta...@dpdk.org > > > >CC: Yuanhan Liu > >CC: Maxime Coquelin > > > >Reported-by: Yang Zhang > >Reported-by: Xin Long > >Signed-off-by: Yi Yang > >Signed-off-by: Jianfeng Tan > >--- > > lib/librte_vhost/vhost_user.c | 33 + > > 1 file changed, 33 insertions(+) > > Reviewed-by: Maxime Coquelin Applied to dpdk-next-virtio. Thanks. --yliu
Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
On Tue, Dec 05, 2017 at 06:59:26AM +, Yang, Zhiyong wrote: > Hi all, > > > -Original Message- > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhiyong Yang > > Sent: Thursday, November 30, 2017 5:47 PM > > To: dev@dpdk.org; y...@fridaylinux.org; maxime.coque...@redhat.com > > Cc: Wang, Wei W ; Tan, Jianfeng > > > > Subject: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD > > supporting VM2VM scenario > > > > Vhostpci PMD is a new type driver working in guest OS which has ability to > > drive > > the vhostpci modern pci device, which is a new virtio device. > > > > The following linking is about vhostpci design: > > > > An initial device design is presented at KVM Forum'16: > > http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost- > > pci.pdf > > The latest device design and implementation will be posted to the QEMU > > community soon. > > > > The latest version vhostpci device code has been posted to QEMU community. > [PATCH v3 0/7] Vhost-pci for inter-VM communication > http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg00494.html I didn't follow on the qemu ML. Do you know when it's likely it will get accepted? --yliu > > [2016] Design of Vhost-pci by Wei Wang - YouTube linking as reference. > https://www.youtube.com/watch?v=xITj0qsaSJQ > > thanks > Zhiyong
Re: [dpdk-dev] [PATCH] [RFC] ether: standardize getting the port by name
On Tue, Dec 05, 2017 at 02:20:05PM +0100, Thomas Monjalon wrote: > 05/12/2017 12:04, Adrien Mazarguil: > > > Hi Yuanhan, > > > > > > On Mon, Dec 04, 2017 at 09:55:31PM +0800, Yuanhan Liu wrote: > > > rte_devargs is identified by the name (pci id for pci device). It also > > > includes other driver specific key-value options. It's not clear for the > > > user to know which one (or few) of them should be used together with the > > > PCI id to identify a specific port. For example, as you mentioned, in > > > mlx4, it's "pci_id,port=x". It could be something else in other drivers. > > > Just for information, this "port=x" argument in mlx4 is consistent with the > > value found in /sys/class/net/ethX/dev_port under Linux. If we choose to use > > a port index (instead of a MAC or something else), it would make sense to > > standardize it on the same order as given by the host OS for consistency > > across all PMDs. Thanks for the info. But FYI, for most of other PMDs, such sys file won't exist, as the host driver should have been unbind and bind with something like uio. So I don't think it applies to all other nics. > > Good idea, thanks. > > > > I think we will MAC in some cases and port number in others. Could you list some examples? > It is important to have identifiers available even without initializing > the device. I don't object that. I think that's good for whitelisting/blacklisting before the device initialization. However, once the initialization is done, everything could be used to identify a port (say, the mac this patch mentioned). > > Devices with a single port per PCI address would simply use/allow "0". Yes. > > > > > > Actually, this patch also proposes a devarg like naming style: "name[,mac] > ". > > > What it does try to do is to define a standard syntax, so that the user > > > doesn't have to know any driver specific options. > > > > > > However, the mac address is changeable, leaving this naming inconsistent. > > > Well, in practice, PCI id is also changeable. > > > > > > > > OTOH, having a consistent naming seems a bit far away from this patch's > > > goal: define a standard ethdev naming and leave less harassment to the > users. > > > > > > I'm not a fan of the MAC naming scheme either, a kind of per-device physical > > port index seems more robust and doesn't require much initialization to > > determine how many ports are supported by the device and whether the index > > is > > known/valid (particularly given the vast majority exposes only one per bus > > address). I'm not a fan of the MAC naming, neither. The reason this patch proposes mac naming is that it's more clear for the user to specify a port. I also agree that the port index could be another good option. One thing I really haven't considered yet is how it becomes when the VF reprenstor comes to play? (more guys are cc'ed). > > Would it make sense to have this name usable unmodified as a valid device > > (-w) argument, including parameters? > > Yes we must provide some new key/value arguments for devargs. > So it can be used anywhere, including -w/-b options in DPDK > or port configuration in OVS. > > > > If so, PMDs could append parameters while probing the underlying device, by > > appending ",port=x", ",mac=x" followed by other unspecified parameters with > > default values. This could uniquely identify the port _and_ its > > configuration in a reusable fashion. > > > > Question: should we separate device identification and configuration > in the syntax? Yes, we should. I think we should parse the identification generically, and leave the left to the driver. > > Otherwise if all we need is unique names, we can use the opposite and much > > simpler approach. Let librte_ether assign them sequentially > > (e.g. "rte_eth%d", no need for consistency with OS netdevices), applications > > can figure the rest based on data structures if needed. > > No, unique names are not useful / not usable by users. Agree. We don't really need another unique name, since the port id is already unique. However, I think a consistent naming might be useful: user doesn't have to worry it even though MAC is changed or the PCI slot is changed. --yliu
Re: [dpdk-dev] [PATCH] maintainers: claim responsibility for virtio PMD
On Fri, Dec 01, 2017 at 08:06:04PM +0800, Yuanhan Liu wrote: > On Fri, Dec 01, 2017 at 07:04:35PM +0800, Tiwei Bie wrote: > > Add myself as co-maintainer for virtio driver. > > > > Signed-off-by: Tiwei Bie > > --- > > Hi Yuanhan and Maxime, > > > > As there will be more and more virtio related stuffs (e.g. vhost-pci, > > virtio crypto, vhost crypto) in DPDK, more maintaining efforts will be > > needed for virtio/vhost. So I volunteer to co-maintain virtio driver. > > Hope this could help! > > I'm glad that you are offering help! And > > Acked-by: Yuanhan Liu It was a mistake; I switched to an older machine, with older configs. Sorry for that. And here is the right one: Acked-by: Yuanhan Liu --yliu