Re: [EXT] Libtpa: a DPDK based userspace TCP stack implementation

2023-12-11 Thread Yuanhan Liu
On Mon, Dec 11, 2023 at 01:16:52PM +0100, Thomas Monjalon wrote:
> 11/12/2023 12:32, Jerin Jacob Kollanukkaran:
> > From: Yuanhan Liu 
> > > Hi all,
> > > 
> > > I'd like to share a new DPDK open source project, libtpa(Transport 
> > > Protocol
> > > Acceleration)[0], which is just another userspace TCP stack 
> > > implementation so
> > > far, written from scratch.
> > > 
> > > I started this project 3 years ago, while I was searching for a feasible 
> > > open
> > > source project with no luck. There were indeed quite a few options, but 
> > > none of
> > > them actually met my needs. I then started writing one. Likely, there are 
> > > still
> > > other guys out there looking for a high performance and stable userspace 
> > > TCP
> > > stack. This is what this email and libtpa for.
> > 
> > Great Yuanhan.
> > 
> > If you have time and willing to put effort, I suggest make this part of 
> > dpdk code base
> > as new library (tcp or so) and leverage + improve another existing library 
> > such ip_frag.
> > 
> > I believe, that is only way.
> > - This code soon won't soon outdated based on new DPDK version
> > - More community review and contributors
> > - More review and features from NIC vendors PoV.
> > - More arch and driver support.
> > - More quality
> 
> As Yuanhan said, there are many TCP stacks running on top of DPDK.
> We should add this one to the list:
>   https://www.dpdk.org/ecosystem/#projects
> Also a discussion has started recently about integrating one in DPDK.
> As Jerin suggests, libtpa looks like a very good candidate to focus efforts 
> on it.
> 
> Regarding performance, how does it compare with F-Stack? TLDK? Seastar?

I think it should be fair to say (I haven't done the testing though; I
never tried to run those stacks), libtpa is the userspace tcp stack with
the best performance I'm aware of. The redis numbers showed in this email
thread is just one example. Libtpa also ships an performance benchmark,
tperf. With tperf write test (and without jumboframe), libtpa can achieve
200G linerate with only one physical core for write. The read test is not
that good though, because of missing hardware acceleration features like
TSO.

Although performance is very important to an userspace stack, I still
want to point out that, during the design, performance is not my major
goal. I spent most of my effort on shaping the testing system and the
debug-ablity initially. Libtpa has been deployed in bytedance for more
than two years, till now, there is no single TCP protocol bug reported.
(I do get very few bugs reported though, but most of them are related
to the OS environment, such as sigbus due to wrong API used when running
out of tmpfs).

Thanks,
Yuanhan Liu


Re: [EXT] Libtpa: a DPDK based userspace TCP stack implementation

2023-12-11 Thread Yuanhan Liu
On Mon, Dec 11, 2023 at 11:32:16AM +, Jerin Jacob Kollanukkaran wrote:
> 
> 
> > -Original Message-
> > From: Yuanhan Liu 
> > Sent: Monday, December 11, 2023 3:27 PM
> > To: lib...@googlegroups.com
> > Cc: dev@dpdk.org; Yuanhan Liu 
> > Subject: [EXT] Libtpa: a DPDK based userspace TCP stack implementation
> > 
> > External Email
> > 
> > --
> > Hi all,
> > 
> > I'd like to share a new DPDK open source project, libtpa(Transport Protocol
> > Acceleration)[0], which is just another userspace TCP stack implementation 
> > so
> > far, written from scratch.
> > 
> > I started this project 3 years ago, while I was searching for a feasible 
> > open
> > source project with no luck. There were indeed quite a few options, but 
> > none of
> > them actually met my needs. I then started writing one. Likely, there are 
> > still
> > other guys out there looking for a high performance and stable userspace TCP
> > stack. This is what this email and libtpa for.
> 
> Great Yuanhan.
> 
> If you have time and willing to put effort, I suggest make this part of dpdk 
> code base
> as new library (tcp or so) and leverage + improve another existing library 
> such ip_frag.
> 
> I believe, that is only way.
> - This code soon won't soon outdated based on new DPDK version
> - More community review and contributors
> - More review and features from NIC vendors PoV.
> - More arch and driver support.
> - More quality

Hi Jerin,

Thanks for you suggestion and these really are good points!

Although libtpa is currently designed as a libray, I doubt it would suit
well as a new library to DPDK. Just taking the code base an example,
libtpa so far is about 27K lines of code. The TCP part is only about
3K lines of code. All the rest are codes supporting the TCP part, such
as sock tracing, mem file, mem file auto archive, etc. You can look
more from the internals page (or even read the code ;)

   https://github.com/bytedance/libtpa/blob/main/doc/internals.rst

Thanks,
Yuanhan Liu


> 
> Just my 2c.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "libtpa" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to libtpa+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/libtpa/BY3PR18MB478592AF236C7BBDB0285E3CC88FA%40BY3PR18MB4785.namprd18.prod.outlook.com.
> For more options, visit https://groups.google.com/d/optout.


Libtpa: a DPDK based userspace TCP stack implementation

2023-12-11 Thread Yuanhan Liu
Hi all,

I'd like to share a new DPDK open source project, libtpa(Transport
Protocol Acceleration)[0], which is just another userspace TCP stack
implementation so far, written from scratch.

I started this project 3 years ago, while I was searching for a feasible
open source project with no luck. There were indeed quite a few options,
but none of them actually met my needs. I then started writing one. Likely,
there are still other guys out there looking for a high performance and
stable userspace TCP stack. This is what this email and libtpa for.

Libtpa is fast. To demonstrate that, we did a hacky redis integration. The
benchmark shows that libtpa can boost the performance more than 5 times,
from 0.21m rps to 1.14m rps[1]. Right, it can achieve 1 million rps just
with one CPU thread. Meanwhile, the p99 latency decreases from 0.815ms
to 0.159ms.

Regarding the stableness, I'd say it's not bad, all kudos to the
comprehensive testing. I've written more than 200 tests. Together with
the testing arguments matrix[2], it can result in a big variety of test
cases. Therefore, most of the bugs are captured before deployment.

Having said that, I'd still suggest you to do as much testing as you can
if you want to use it, for libtpa is still under active development and
it's just v1.0-rc0 being released. Tons of changes have been made since
the last stable release.

There is one more thing I'm a bit proud of about libtpa: as a DPDK based
project, libtpa has rich set of debug tools[3]. The sock tracing is
particularly handy on debugging that libtpa doesn't ship a tcpdump like
tool, simply for we don't really need one.

The TCP part then may not sound that exciting. It's basically just an
initial TCP implementation, with standard congestion avoid algorithm
(New Reno). Libtpa implements slightly more than that though, such as
SACK, congestion window validation, spurious retransmission detection,
keepalive, etc.

That's all. Comments, questions, patches and testing are all welcome!

Thanks,
Yuanhan Liu

---
[0]: libtpa: https://github.com/bytedance/libtpa
[1]: redis: https://github.com/bytedance/libtpa/tree/main/doc/redis.rst
[2]: matrix shell: 
https://github.com/bytedance/libtpa/tree/main/doc/internals.rst
[3]: user guide: 
https://github.com/bytedance/libtpa/tree/main/doc/user_guide.rst


Re: [dpdk-dev] [PATCH v8 04/19] ethdev: introduce device lock

2018-07-03 Thread Yuanhan Liu



On Mon, Jul 2, 2018, at 1:44 PM, Qi Zhang wrote:
> Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to let
> application lock or unlock on specific ethdev, a locked device
> can't be detached, this help applicaiton to prevent unexpected
> device detaching, especially in multi-process envrionment.

I'm just wondering why the need of introducing such 2 (very specific)
APIs, judging that it  looks like a ref count could solve this kind of issues
perfectly.

Also, the API name (_dev_lock/unlock) looks like an over killing, judging
that you just want to pin a device (if I understand this commit log correctly).
What firstly comes to my mind when I see "lock" was "no one can *access*
a device after another has grabbed the lock", which doesn't quite match
what you described here.

That said, if I were you, I would go with introducing few generic refcount
APIs.

--yliu


> 
> Aslo introduce the new API rte_eth_dev_lock_with_callback and
> rte_eth_dev_unlock_with callback to let application to register
> a callback function which will be invoked before a device is going
> to be detached, the return value of the function will decide if
> device will continue be detached or not, this support application
> to do condition check at runtime.
> 
> Signed-off-by: Qi Zhang 
> Reviewed-by: Anatoly Burakov 


[dpdk-dev] [dpdk-announce] DPDK 17.11.3 (LTS) released

2018-06-15 Thread Yuanhan Liu
 perf profile handler

Santosh Shukla (2):
  net/octeontx: fix null pointer dereference
  net/octeontx: fix uninitialized variable in port open

Shachar Beiser (2):
  net: add IPv6 header fields macros
  net/mlx5: fix IPv6 header fields

Shahaf Shuler (12):
  net/mlx5: fix secondary process mempool registration
  net/mlx5: remove assert un-accessible from secondary process
  net/mlx5: warn for unsuccessful memory registration
  net/mlx5: fix CRC strip capability query
  net/mlx5: change pkt burst select function prototype
  net/mlx5: enforce RSS key length limitation
  net/mlx5: fix RSS key length query
  net/mlx5: fix link status initialization
  net/mlx5: fix ethtool link setting call order
  net/mlx5: fix socket connection return value
  net/mlx5: fix probe return value polarity
  net/mlx5: fix flow director drop rule deletion crash

Shahed Shaikh (1):
  net/qede: fix unicast filter routine return code

Stefan Hajnoczi (1):
  vhost: fix message payload union in setting ring address

Stephen Hemminger (1):
  net/octeontx: fix uninitialized speed variable

Thomas Monjalon (2):
  mbuf: fix Tx checksum offload API doc
  mbuf: improve tunnel Tx offloads API doc

Tomasz Duszynski (1):
  net/mrvl: fix Rx descriptors number

Tomasz Kulasek (5):
  vhost: fix offset while mmaping log base address
  vhost: check cmsg not null
  vhost: fix device cleanup at stop
  vhost: fix realloc failure
  vhost: fix ring index returned to master on stop

Tonghao Zhang (2):
  net/bonding: free mempool used in mode 6
  vhost: fix dead lock on closing in server mode

Wenzhuo Lu (1):
  net/ixgbe: fix too many interrupts

Xiaoxin Peng (1):
  net/bnxt: fix Rx mbuf and agg ring leak in dev stop

Xueming Li (5):
  net/mlx5: fix existing file removal
  net/mlx5: map UAR address around huge pages
  net/mlx5: fix close after start failure
  net/mlx5: fix invalid flow item check
  net/mlx5: add data-plane debug message macro

Yongseok Koh (6):
  doc: add timestamp offload to mlx5 features
  net/mlx5: fix synchronization on polling Rx completions
  net/mlx5: fix disabling Tx packet inlining
  net/mlx5: remove excessive data prefetch
  net/mlx5: fix calculation of Tx TSO inline room size
  net/mlx5: change device reference for secondary process

Yuanhan Liu (1):
  version: 17.11.3

Yunjian Wang (2):
  net/i40e: fix intr callback unregister by adding retry
  net/ixgbe: fix intr callback unregister by adding retry


Re: [dpdk-dev] 17.11.3 (LTS) patches review and test

2018-06-08 Thread Yuanhan Liu



On Fri, Jun 8, 2018, at 11:00 PM, Bruce Richardson wrote:
> On Sun, May 27, 2018 at 01:35:17PM +0800, Yuanhan Liu wrote:
> > Hi all,
> > 
> > Here is a list of patches targeted for LTS release 17.11.3. Please
> > help review and test. The planned date for the final release is 8th,
> > Jun. Before that, please let me know if anyone has objections with
> > these patches being applied.
> > 
> > These patches are located at branch 17.11 of dpdk-stable repo:
> > http://dpdk.org/browse/dpdk-stable/
> > 
> > Thanks.
> > 
> > --yliu
> > 
> Hi Yuanhan,
> 
> while I understand that you are eager to get the release out, I don't
> believe all testing has been completed by us on the patches. Can the
> release be postponed by a few days to allow the testing to be completed.


Hi Bruce,

Sure, no problem, I will postpone it to next Friday. Hope it's okay to you guys.

And thanks for the testing!

--yliu



Re: [dpdk-dev] 17.11.3 (LTS) patches review and test

2018-06-08 Thread Yuanhan Liu
On Thu, Jun 07, 2018 at 11:09:56PM +, Raslan Darawsheh wrote:
> Hi,
> 
> We've tested this version and we don't see new issues with it.

Hi Raslan,

Thank you a lot for the testing!

--yliu
> 
> Kindest regards,
> Raslan Darawsheh
> 
> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yuanhan Liu
> Sent: Sunday, May 27, 2018 8:35 AM
> To: dpdk stable 
> Cc: dev@dpdk.org; Thomas Monjalon 
> Subject: [dpdk-dev] 17.11.3 (LTS) patches review and test
> 
> Hi all,
> 
> Here is a list of patches targeted for LTS release 17.11.3. Please help 
> review and test. The planned date for the final release is 8th, Jun. Before 
> that, please let me know if anyone has objections with these patches being 
> applied.
> 
> These patches are located at branch 17.11 of dpdk-stable repo:
> 
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdpdk.org%2Fbrowse%2Fdpdk-stable%2F&data=02%7C01%7Crasland%40mellanox.com%7Ce19869d187dc4afd691208d5c393b1e5%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636629961495952407&sdata=Afb4An7wMdhwquNjwMvC3iJy%2FCICWS9nR4KHlpLPIqY%3D&reserved=0
> 
> Thanks.
> 
>   --yliu
> 
> ---
> Aaron Conole (2):
>   nfp: unlink the appropriate lock file
>   nfp: allow for non-root user
> 
> Adrien Mazarguil (1):
>   app/testpmd: fix empty list of RSS queues for flow
> 
> Ajit Khaparde (9):
>   net/bnxt: fix LRO disable
>   net/bnxt: fix Rx drop setting
>   net/bnxt: set padding flags in Rx descriptor
>   net/bnxt: fix endianness of flag
>   net/bnxt: fix Rx checksum flags for tunnel frames
>   net/bnxt: free memory allocated for VF filters
>   net/bnxt: avoid invalid vnic id in set L2 Rx mask
>   net/bnxt: fix usage of vnic id
>   net/bnxt: fix Rx checksum flags
> 
> Alejandro Lucero (5):
>   net/nfp: fix assigning port id in mbuf
>   net/nfp: fix barrier location
>   net/nfp: fix link speed capabilities
>   doc: fix NFP NIC guide grammar
>   net/nfp: fix mbufs releasing when stop or close
> 
> Allain Legacy (1):
>   ip_frag: fix double free of chained mbufs
> 
> Anatoly Burakov (4):
>   app/crypto-perf: fix IOVA translation
>   mem: do not use physical addresses in IOVA as VA mode
>   vfio: do not needlessly check for IOVA mode
>   mempool: fix virtual address population
> 
> Andrew Rybchenko (6):
>   net/sfc: add missing defines for SAL annotation
>   net/sfc: fix mbuf data alignment calculation
>   net/sfc/base: fix comparison always true warning
>   mempool: fix leak when no objects are populated
>   test/mempool: fix autotest retry
>   net/sfc: ignore spec bits not covered by mask
> 
> Andy Green (3):
>   net/bnx2x: do not cast function pointers as a policy
>   net/bnx2x: fix KR2 device check
>   net/bnx2x: fix memzone name overrun
> 
> Ashish Jain (1):
>   event/dpaa2: remove link from info structure
> 
> Beilei Xing (3):
>   net/i40e: fix DDP profile DEL operation
>   net/i40e: fix link status update
>   net/i40e: fix failing to disable FDIR Tx queue
> 
> Bin Huang (1):
>   net/mlx5: add packet type index for TCP ack
> 
> Chas Williams (5):
>   net/vmxnet3: set the queue shared buffer at start
>   net/bonding: fix setting VLAN ID on slave ports
>   net/bonding: clear started state if start fails
>   net/ixgbe: fix busy wait during checking link status
>   net/bonding: export mode 4 slave info routine
> 
> Chuhong Yao (1):
>   net/liquidio: fix link state fetching during start
> 
> Ciara Loftus (1):
>   net/vhost: initialise device as inactive
> 
> Dan Gora (1):
>   kni: fix build on CentOS 7.4
> 
> Daniel Shelepov (1):
>   app/testpmd: fix burst stats reporting
> 
> David Hunt (4):
>   mk: fix make defconfig on FreeBSD
>   test/distributor: fix return type of thread function
>   test/pipeline: fix return type of stub miss
>   examples/performance-thread: fix return type of threads
> 
> Fan Zhang (2):
>   net/i40e: fix link update no wait
>   crypto/scheduler: fix possible duplicated ring names
> 
> Ferruh Yigit (4):
>   pci: remove duplicated symbol from map file
>   net/tap: fix icc build
>   drivers/net: fix link autoneg value for virtual PMDs
>   net/i40e: fix shifts of signed values
> 
> Gaetan Rivet (2):
>   bus/fslmc: fix find device start condition
>   bus/pci: fix find device implementation
> 
> Gowrishankar Muthukrishnan (2):
>   eal/ppc: remove braces in SMP memory barrier macro
>   net/bo

Re: [dpdk-dev] [PATCH] app/testpmd: fix failsafe PMD failure on exit

2018-06-01 Thread Yuanhan Liu
On Sun, May 27, 2018 at 02:37:28PM +0200, Thomas Monjalon wrote:
> 27/05/2018 06:06, Yuanhan Liu:
> > On Tue, May 22, 2018 at 07:35:08PM +0100, Ferruh Yigit wrote:
> > > + /*
> > > +  * This is a workaround to fix a virtio-user issue that
> > > +  * requires to call clean-up routine to remove existing
> > > +  * socket.
> > 
> > I came across this patch while I was cherry-picking patches to 17.11.4
> > release. And this patch seems wrong to me.
> 
> Yes it is far from perfect.
> 
> > Any particular reason why the socket removal can not be done in virtio-user
> > pmd, say at its close method?
> 
> The socket is removed in the remove function of the driver.
> The right fix is to call the remove functions of all driver from
> the EAL cleanup function.
> We have decided of this last minute workaround for testpmd
> because we need it for testing convenience, but do not want to
> take any risk with a proper fix as it is really late for that.

If there must be a workaround, I would do it at virtio-user pmd.

--yliu
> 
> 
> > > +  * This workaround valid only for testpmd, needs a fix
> > > +  * valid for all applications.
> > > +  * TODO: Implement proper resource cleanup
> > > +  */
> > > + device = rte_eth_devices[pt_id].device;
> > > + if (device && !strcmp(device->driver->name, 
> > > "net_virtio_user"))
> > >   detach_port(pt_id);
> > >   }
> > >   }
> > 
> 
> 
> 
> 


[dpdk-dev] 17.11.3 (LTS) patches review and test

2018-05-26 Thread Yuanhan Liu
Hi all,

Here is a list of patches targeted for LTS release 17.11.3. Please
help review and test. The planned date for the final release is 8th,
Jun. Before that, please let me know if anyone has objections with
these patches being applied.

These patches are located at branch 17.11 of dpdk-stable repo:
http://dpdk.org/browse/dpdk-stable/

Thanks.

--yliu

---
Aaron Conole (2):
  nfp: unlink the appropriate lock file
  nfp: allow for non-root user

Adrien Mazarguil (1):
  app/testpmd: fix empty list of RSS queues for flow

Ajit Khaparde (9):
  net/bnxt: fix LRO disable
  net/bnxt: fix Rx drop setting
  net/bnxt: set padding flags in Rx descriptor
  net/bnxt: fix endianness of flag
  net/bnxt: fix Rx checksum flags for tunnel frames
  net/bnxt: free memory allocated for VF filters
  net/bnxt: avoid invalid vnic id in set L2 Rx mask
  net/bnxt: fix usage of vnic id
  net/bnxt: fix Rx checksum flags

Alejandro Lucero (5):
  net/nfp: fix assigning port id in mbuf
  net/nfp: fix barrier location
  net/nfp: fix link speed capabilities
  doc: fix NFP NIC guide grammar
  net/nfp: fix mbufs releasing when stop or close

Allain Legacy (1):
  ip_frag: fix double free of chained mbufs

Anatoly Burakov (4):
  app/crypto-perf: fix IOVA translation
  mem: do not use physical addresses in IOVA as VA mode
  vfio: do not needlessly check for IOVA mode
  mempool: fix virtual address population

Andrew Rybchenko (6):
  net/sfc: add missing defines for SAL annotation
  net/sfc: fix mbuf data alignment calculation
  net/sfc/base: fix comparison always true warning
  mempool: fix leak when no objects are populated
  test/mempool: fix autotest retry
  net/sfc: ignore spec bits not covered by mask

Andy Green (3):
  net/bnx2x: do not cast function pointers as a policy
  net/bnx2x: fix KR2 device check
  net/bnx2x: fix memzone name overrun

Ashish Jain (1):
  event/dpaa2: remove link from info structure

Beilei Xing (3):
  net/i40e: fix DDP profile DEL operation
  net/i40e: fix link status update
  net/i40e: fix failing to disable FDIR Tx queue

Bin Huang (1):
  net/mlx5: add packet type index for TCP ack

Chas Williams (5):
  net/vmxnet3: set the queue shared buffer at start
  net/bonding: fix setting VLAN ID on slave ports
  net/bonding: clear started state if start fails
  net/ixgbe: fix busy wait during checking link status
  net/bonding: export mode 4 slave info routine

Chuhong Yao (1):
  net/liquidio: fix link state fetching during start

Ciara Loftus (1):
  net/vhost: initialise device as inactive

Dan Gora (1):
  kni: fix build on CentOS 7.4

Daniel Shelepov (1):
  app/testpmd: fix burst stats reporting

David Hunt (4):
  mk: fix make defconfig on FreeBSD
  test/distributor: fix return type of thread function
  test/pipeline: fix return type of stub miss
  examples/performance-thread: fix return type of threads

Fan Zhang (2):
  net/i40e: fix link update no wait
  crypto/scheduler: fix possible duplicated ring names

Ferruh Yigit (4):
  pci: remove duplicated symbol from map file
  net/tap: fix icc build
  drivers/net: fix link autoneg value for virtual PMDs
  net/i40e: fix shifts of signed values

Gaetan Rivet (2):
  bus/fslmc: fix find device start condition
  bus/pci: fix find device implementation

Gowrishankar Muthukrishnan (2):
  eal/ppc: remove braces in SMP memory barrier macro
  net/bonding: fix primary slave port id storage type

Harish Patil (1):
  net/qede: fix multicast filtering

Hemant Agrawal (6):
  net/dpaa: fix oob access
  bus/dpaa: fix resource leak
  net/dpaa2: fix xstats
  app/crypto-perf: fix excess crypto device error
  examples/l2fwd-crypto: fix the default aead assignments
  crypto/dpaa2_sec: fix HMAC supported digest sizes

Hyong Youb Kim (1):
  net/enic: allocate stats DMA buffer upfront during probe

Ivan Malov (2):
  net/sfc: add missing Rx fini on RSS setup fail path
  net/sfc: process RSS settings on Rx configure step

Jasvinder Singh (1):
  test/pipeline: fix type of table entry parameter

Jerin Jacob (2):
  app/crypto-perf: use strcpy for allocated string
  app/crypto-perf: fix parameters copy

John Daley (1):
  net/enic: fix crash on MTU update with non-setup queues

Junjie Chen (2):
  net/vhost: fix crash when creating vdev dynamically
  net/vhost: fix invalid state

Kirill Rybalchenko (2):
  crypto/scheduler: fix multicore rings re-use
  crypto/scheduler: fix 64-bit mask of workers cores

Lee Roberts (1):
  kni: fix build on RHEL 7.5

Matan Azrad (10):
  ethdev: fix port accessing after release
  app/testpmd: fix slave port detection
  app/testpmd: fix valid ports prints
  app/testpmd: fix forward ports update
  app/testpmd: fix forward ports Rx flu

Re: [dpdk-dev] [PATCH] app/testpmd: fix failsafe PMD failure on exit

2018-05-26 Thread Yuanhan Liu
On Tue, May 22, 2018 at 07:35:08PM +0100, Ferruh Yigit wrote:
> vdevs detach on testpmd exit implemented as workaround to fix
> a virtio-user issue. The issue was virtio-user cleanup is not
> called and existing socket file not cleaned up which will fail
> next run.
> 
> The vdev cleanup causing problems in failsafe PMD.
> 
> Reduce the cleanup to only virtio-user and add a comment that this
> workaround should be converted to a proper cleanup, not something
> specific to virtio-user, and not something specific to vdev and
> testpmd.
> 
> Fixes: fe890955114d ("app/testpmd: fix exit for vdevs")
> 
...
>  pmd_test_exit(void)
>  {
> - const struct rte_bus *bus;
>   struct rte_device *device;
>   portid_t pt_id;
>   int ret;
> @@ -2025,13 +2024,21 @@ pmd_test_exit(void)
>   if (ports != NULL) {
>   no_link_check = 1;
>   RTE_ETH_FOREACH_DEV(pt_id) {
> - device = rte_eth_devices[pt_id].device;
> - bus = rte_bus_find_by_device(device);
>   printf("\nShutting down port %d...\n", pt_id);
>   fflush(stdout);
>   stop_port(pt_id);
>   close_port(pt_id);
> - if (bus && !strcmp(bus->name, "vdev"))
> +
> + /*
> +  * This is a workaround to fix a virtio-user issue that
> +  * requires to call clean-up routine to remove existing
> +  * socket.

I came across this patch while I was cherry-picking patches to 17.11.4
release. And this patch seems wrong to me.

Any particular reason why the socket removal can not be done in virtio-user
pmd, say at its close method?

--yliu
> +  * This workaround valid only for testpmd, needs a fix
> +  * valid for all applications.
> +  * TODO: Implement proper resource cleanup
> +  */
> + device = rte_eth_devices[pt_id].device;
> + if (device && !strcmp(device->driver->name, 
> "net_virtio_user"))
>   detach_port(pt_id);
>   }
>   }
> -- 
> 2.14.3


Re: [dpdk-dev] release date for 17.11.3?

2018-05-20 Thread Yuanhan Liu
On Thu, May 17, 2018 at 10:11:40PM +0200, Thomas Monjalon wrote:
> 17/05/2018 18:16, Montorsi, Francesco:
> > Hi all,
> > I'd like to build DPDK on Centos 7.5... I found a commit 
> > (http://dpdk.org/browse/dpdk-stable/commit/?h=17.11&id=3ee847054cc9ab62fa2c9c6dc6ba68899d620e3a)
> >  that allows that, but it will go into DPDK 17.11.3 right?
> > Is there a release date for 17.11.3 already defined?
> 
> Hopefully June 8th:
>   http://dpdk.org/dev/roadmap#stable

Yes, something like that. I probably will make it a little bit earlier than 
that.

--yliu
> 
> +Cc Yuanhan



Re: [dpdk-dev] [RFC v2] vhost: new rte_vhost API proposal

2018-05-20 Thread Yuanhan Liu
On Fri, May 18, 2018 at 03:50:45PM +0200, Maxime Coquelin wrote:
> Hi Dariusz,
> 
> On 05/18/2018 03:01 PM, Dariusz Stojaczyk wrote:
> > rte_vhost is not vhost-user spec compliant. Some Vhost drivers have
> > been already confirmed not to work with rte_vhost. virtio-user-scsi-pci
> > in QEMU 2.12 doesn't fully initialize its management queues at SeaBIOS
> > stage. This is perfectly fine from the Vhost-user spec perspective, but
> > doesn't meet rte_vhost expectations. rte_vhost waits for all queues
> > to be fully initialized before it allows the entire device to be
> > processed. qFixing rte_vhost directly would require quite a big amount
> > of changes, which would completely break backwards compatibility.
> > 
> > This rte_vhost2 library is intended to smooth out the transition.
> > It exposes a low-level API for implementing new Vhost-user slaves.
> > The existing rte_vhost is about to be refactored to use rte_vhost2
> > library underneath, and demanding backends could now use rte_vhost2
> > directly.
> 
> I like the idea, and the proposed way to smooth the transition.

Please be aware of that I just had a quick glimpse of this patch and it's
likely I don't have too much time to follow this.

However, I also like this idea. And thank you for working on it.

--yliu


Re: [dpdk-dev] [PATCH] maintainers: resign from vhost and vdev

2018-05-05 Thread Yuanhan Liu
On Thu, May 03, 2018 at 05:44:29PM +0200, Thomas Monjalon wrote:
> 03/05/2018 12:49, Burakov, Anatoly:
> > On 03-May-18 11:02 AM, Yang, Zhiyong wrote:
> > > Thanks a lot for your excellent contributions to DPDK community,  
> > > Jianfeng.
> > > See you later.
> > 
> > +1, you will be missed!
> 
> +1
> Thanks a lot Jianfeng

+1, and good luck!

--yliu


[dpdk-dev] [dpdk-announce] DPDK 17.11.1 (LTS) released

2018-02-27 Thread Yuanhan Liu
t/octeontx: fix Rx adapter port id mapping

Phil Yang (2):
  test/memzone: fix NULL freeing
  test/memzone: fix freeing test

Qi Zhang (2):
  net/i40e: fix VLAN offload setting
  net/i40e: exclude LLDP packet count

Radu Nicolau (3):
  examples/bond: fix vdev name
  examples/bond: check mbuf allocation
  net/bonding: check error of MAC address setting

Rafal Kozik (1):
  net/ena: do not set Tx L4 offloads in Rx path

Rasesh Mody (4):
  net/qede: replace config option with run-time arg
  net/qede: fix MTU set and max Rx length
  net/qede: fix few log messages
  net/qede: fix clearing of queue stats

Raslan Darawsheh (1):
  net/mlx5: fix flow type for allmulti rules

Roman Zhukov (2):
  net/sfc: fix initialization of flow structure
  net/sfc: fix flow RSS check in error handling

Rosen Xu (1):
  net/i40e: fix packet type for X722

Samuel Gauthier (1):
  net/virtio: fix Rx and Tx handler selection for ARM32

Shachar Beiser (1):
  net/mlx5: fix IPv6 header fields

Shahaf Shuler (9):
  app/testpmd: fix port configuration print
  app/testpmd: fix flowgen forwarding offload flags
  net/mlx5: fix VLAN configuration after port stop
  net/mlx5: fix RSS key configuration
  net/mlx5: fix missing RSS capability
  net/mlx5: fix memory region cache lookup
  net/mlx5: fix memory region cache last index
  net/mlx5: fix memory region boundary checks
  net/mlx5: fix link state on device start

Somnath Kotur (3):
  net/bnxt: fix duplicate filter pattern creation error
  net/bnxt: fix duplicate pattern for 5tuple filter
  net/bnxt: free the aggregation ring

Stephen Hemminger (1):
  net/tap: remove unused kernel version definitions

Thierry Herbelot (1):
  net/mlx5: cleanup allocation of ethtool stats

Thomas Monjalon (1):
  ethdev: fix link autonegotiation value

Timothy Redaelli (1):
  app/testpmd: remove xenvirt again

Tiwei Bie (2):
  net/virtio: fix vector Rx flushing
  net/virtio: fix typo in LRO support

Tomasz Jozwiak (2):
  crypto/qat: fix out-of-bounds access
  crypto/qat: fix parameter type

Victor Kaplansky (1):
  vhost: protect active rings from async ring changes

Vipin Varghese (1):
  net/pcap: fix the NUMA id display in logs

Wei Dai (2):
  app/testpmd: fix invalid Rx queue number setting
  app/testpmd: fix invalid Tx queue number setting

Wei Zhao (5):
  net/ixgbe: fix tunnel filter fail problem
  net/i40e: add FDIR NVGRE parameter check
  net/ixgbe: fix parsing FDIR NVGRE issue
  net/i40e: fix port segmentation fault when restart
  net/ixgbe: fix reset error handling

Wenzhuo Lu (4):
  net/ixgbe: fix wrong PBA setting
  net/i40e: fix VF Rx interrupt enabling
  net/ixgbe: fix VF Rx interrupt enabling
  net/e1000: fix VF Rx interrupt enabling

Xiao Wang (2):
  net/fm10k: fix logical port delete
  igb_uio: allow multi-process access

Xueming Li (2):
  cmdline: fix dynamic tokens parsing
  cmdline: avoid garbage in unused fields of parsed result

Yangchao Zhou (1):
  net/igb: fix Tx queue number assignment

Yanglong Wu (3):
  app/testpmd: fix port id allocation
  net/ixgbe: fix the failure of number of Tx queue check
  net/ixgbe: fix max queue number for VF

Yong Wang (4):
  net/e1000: fix null pointer check
  net/i40e: fix memory leak
  crypto/qat: fix allocation check and leak
  net/dpaa: fix potential memory leak

Yongseok Koh (10):
  app/testpmd: fix crash of txonly with multiple segments
  net/mlx5: fix Memory Region registration
  net/mlx5: fix overflow of Memory Region cache
  net/mlx5: fix HW checksum offload for outer IP
  net/mlx5: fix overwriting bit-fields in SW Rx queue
  net/mlx5: fix deadlock of link status alarm
  net/mlx5: fix missing attribute size for drop action
  net/mlx5: fix calculation of flow ID flag
  net/mlx5: fix memory region lookup
  net/mlx5: fix handling link status event

Yuanhan Liu (5):
  Revert "net/mlx5: fix IPv6 header fields"
  Revert "ethdev: fix port id allocation"
  net/mlx5: use PCI address as port name
  Revert "net/mlx5: fix flow type for allmulti rules"
  version: 17.11.1

Zhiyong Yang (3):
  bus/pci: fix interrupt handler type
  examples/vhost: fix startup check
  cryptodev: fix session pointer cast


Re: [dpdk-dev] 17.11.1 patches review and test

2018-02-26 Thread Yuanhan Liu
On Mon, Feb 26, 2018 at 07:38:32PM +0530, gowrishankar muthukrishnan wrote:
> Hi yliu
> Could you please include below patch merged 18.02 ?. Apologies for short
> notice.
> 
> http://dpdk.org/dev/patchwork/patch/34690/

Done. Thanks.

--yliu
> 
> Thanks,
> Gowrishankar
> 
> On Monday 19 February 2018 06:13 PM, Yuanhan Liu wrote:
> >Hi all,
> >
> >Here is a list of patches targeted for LTS release 17.11.1. Please
> >help review and test. The planned date for the final release is 27th,
> >Feb. Before that, please shout if anyone has objections with these
> >patches being applied.
> >
> >These patches are located at branch 17.11 of dpdk-stable repo:
> > http://dpdk.org/browse/dpdk-stable/
> >
> >Thanks.
> >
> > --yliu
> >
> >---
> >Adrien Mazarguil (7):
> >   net/mlx4: fix unnecessary include
> >   net/i40e: fix ISO C in exported header
> >   flow_classify: fix ISO C in exported header
> >   member: fix ISO C in exported header
> >   lib: fix missing includes in exported headers
> >   net/failsafe: fix invalid free
> >   net/mlx4: fix drop flow resources leak
> >
> >Ajit Khaparde (7):
> >   net/bnxt: fix double increment of idx during Tx ring alloc
> >   net/bnxt: parse checksum offload flags
> >   net/bnxt: fix group info usage
> >   net/bnxt: fix check for ether type
> >   net/bnxt: fix size of Tx ring in HW
> >   net/bnxt: fix number of pools for RSS
> >   net/bnxt: fix return code in MAC address set
> >
> >Akhil Goyal (2):
> >   security: fix enum start value
> >   examples/ipsec-secgw: fix corner case for SPI value
> >
> >Alejandro Lucero (3):
> >   net/nfp: fix MTU settings
> >   net/nfp: fix jumbo settings
> >   net/nfp: fix CRC strip check behaviour
> >
> >Anatoly Burakov (16):
> >   memzone: fix leak on allocation error
> >   malloc: protect stats with lock
> >   malloc: fix end for bounded elements
> >   vfio: fix enabled check on error
> >   app/procinfo: add compilation option in config
> >   test: register test as failed if setup failed
> >   test/table: fix uninitialized parameter
> >   test/memzone: fix wrong test
> >   member: fix memory leak on error
> >   usertools/devbind: fix kernel module reporting
> >   test/bitmap: fix memory leak
> >   test/reorder: fix memory leak
> >   test/ring: fix memory leak
> >   test/ring_perf: fix memory leak
> >   test/table: fix memory leak
> >   test/timer_perf: fix memory leak
> >
> >Andrea Grandi (2):
> >   doc: fix lists of supported crypto algorithms
> >   doc: fix format in OpenSSL installation guide
> >
> >Andrew Rybchenko (7):
> >   net/sfc: stop periodic DMA if MAC stats upload fails
> >   net/sfc: fix multicast address list copy memory leak
> >   net/sfc: fix DMA memory leak after kvarg processing failure
> >   net/sfc: fix label name to be consistent
> >   net/sfc: do not hold management event queue lock while MCDI
> >   net/sfc: fix incorrect bitwise ORing of L3/L4 packet types
> >   mempool: fix physical contiguous check
> >
> >Andriy Berestovskyy (1):
> >   keepalive: fix state alignment
> >
> >Anoob Joseph (1):
> >   examples/ipsec-secgw: fix usage of incorrect port
> >
> >Ashish Jain (1):
> >   net/dpaa: fix the mbuf packet type if zero
> >
> >Bao-Long Tran (1):
> >   examples/ip_pipeline: fix timer period unit
> >
> >Beilei Xing (12):
> >   net/i40e: fix VLAN offload setting issue
> >   net/i40e: fix FDIR input set conflict
> >   net/i40e: fix FDIR rule confiliction issue
> >   net/i40e: fix setting MAC address of VF
> >   net/i40e: fix flow director Rx resource defect
> >   net/i40e: warn when writing global registers
> >   net/i40e: add debug logs when writing global registers
> >   net/i40e: fix multiple driver support
> >   net/i40e: fix interrupt conflict with multi-driver
> >   net/i40e: fix Rx interrupt
> >   net/i40e: check multi-driver option parsing
> >   app/testpmd: fix flow director filter
> >
> >Chas Williams (1):
> >   net/bonding: fix setting slave MAC addresses
> >
> >David Harton (1):
> >   net/i40e: fix VF reset stats crash
> >
> >Didier Pallard (1):
> >   net/virtio: fix incorrect cast

[dpdk-dev] 17.11.1 patches review and test

2018-02-19 Thread Yuanhan Liu
/qede: fix few log messages

Raslan Darawsheh (1):
  net/mlx5: fix flow type for allmulti rules

Roman Zhukov (2):
  net/sfc: fix initialization of flow structure
  net/sfc: fix flow RSS check in error handling

Rosen Xu (1):
  net/i40e: fix packet type for X722

Samuel Gauthier (1):
  net/virtio: fix Rx and Tx handler selection for ARM32

Shachar Beiser (1):
  net/mlx5: fix IPv6 header fields

Shahaf Shuler (9):
  app/testpmd: fix port configuration print
  app/testpmd: fix flowgen forwarding offload flags
  net/mlx5: fix VLAN configuration after port stop
  net/mlx5: fix RSS key configuration
  net/mlx5: fix missing RSS capability
  net/mlx5: fix memory region cache lookup
  net/mlx5: fix memory region cache last index
  net/mlx5: fix memory region boundary checks
  net/mlx5: fix link state on device start

Somnath Kotur (3):
  net/bnxt: fix duplicate filter pattern creation error
  net/bnxt: fix duplicate pattern for 5tuple filter
  net/bnxt: free the aggregation ring

Stephen Hemminger (1):
  net/tap: remove unused kernel version definitions

Thierry Herbelot (1):
  net/mlx5: cleanup allocation of ethtool stats

Thomas Monjalon (1):
  ethdev: fix link autonegotiation value

Timothy Redaelli (1):
  app/testpmd: remove xenvirt again

Tiwei Bie (2):
  net/virtio: fix vector Rx flushing
  net/virtio: fix typo in LRO support

Tomasz Jozwiak (2):
  crypto/qat: fix out-of-bounds access
  crypto/qat: fix parameter type

Victor Kaplansky (1):
  vhost: protect active rings from async ring changes

Vipin Varghese (1):
  net/pcap: fix the NUMA id display in logs

Wei Dai (2):
  app/testpmd: fix invalid Rx queue number setting
  app/testpmd: fix invalid Tx queue number setting

Wei Zhao (5):
  net/ixgbe: fix tunnel filter fail problem
  net/i40e: add FDIR NVGRE parameter check
  net/ixgbe: fix parsing FDIR NVGRE issue
  net/i40e: fix port segmentation fault when restart
  net/ixgbe: fix reset error handling

Wenzhuo Lu (4):
  net/ixgbe: fix wrong PBA setting
  net/i40e: fix VF Rx interrupt enabling
  net/ixgbe: fix VF Rx interrupt enabling
  net/e1000: fix VF Rx interrupt enabling

Xiao Wang (2):
  net/fm10k: fix logical port delete
  igb_uio: allow multi-process access

Xueming Li (2):
  cmdline: fix dynamic tokens parsing
  cmdline: avoid garbage in unused fields of parsed result

Yangchao Zhou (1):
  net/igb: fix Tx queue number assignment

Yanglong Wu (3):
  app/testpmd: fix port id allocation
  net/ixgbe: fix the failure of number of Tx queue check
  net/ixgbe: fix max queue number for VF

Yong Wang (4):
  net/e1000: fix null pointer check
  net/i40e: fix memory leak
  crypto/qat: fix allocation check and leak
  net/dpaa: fix potential memory leak

Yongseok Koh (10):
  app/testpmd: fix crash of txonly with multiple segments
  net/mlx5: fix Memory Region registration
  net/mlx5: fix overflow of Memory Region cache
  net/mlx5: fix HW checksum offload for outer IP
  net/mlx5: fix overwriting bit-fields in SW Rx queue
  net/mlx5: fix deadlock of link status alarm
  net/mlx5: fix missing attribute size for drop action
  net/mlx5: fix calculation of flow ID flag
  net/mlx5: fix memory region lookup
  net/mlx5: fix handling link status event

Yuanhan Liu (4):
  Revert "net/mlx5: fix IPv6 header fields"
  Revert "ethdev: fix port id allocation"
  net/mlx5: use PCI address as port name
  Revert "net/mlx5: fix flow type for allmulti rules"

Zhiyong Yang (3):
  bus/pci: fix interrupt handler type
  examples/vhost: fix startup check
  cryptodev: fix session pointer cast


Re: [dpdk-dev] [PATCH] doc: coding style: use linux kernel style for indentation

2018-02-13 Thread Yuanhan Liu
On Tue, Feb 13, 2018 at 12:51:57PM +, Ferruh Yigit wrote:
> On 1/13/2016 4:49 PM, stephen at networkplumber.org (Stephen Hemminger) wrote:
> > On Wed, 13 Jan 2016 15:07:08 +
> > Bruce Richardson  wrote:
> > 
> >> So, while the two-tab indent may look "a bit weird" it does solve the two 
> >> issues
> >> above. I believe practical benefits should override initial impressions. 
> >> [It took
> >> me a while to get used to also, but now I very much like it as a style.]
> > 
> > I don't think that deviating from kernel style for this case is justified.
> 
> This is very old patch still sitting in patchwork, re-visiting it mainly to be
> able to clean the patchwork.

Just drop it, there is no reason to revisit it.

--yliu
> 
> This is a syntax change request and although I have my personal preferences I
> would be OK with whatever decided.
> 
> Currently there is already a decided syntax, changing it after this point will
> cause either mixed usage or a big syntax cleanup patch. I think both are not 
> good.
> 
> I am for continue whatever documented in current DPDK coding style doc, hence
> NAK from my side.
> 
> Thanks,
> ferruh


Re: [dpdk-dev] [PATCH] MAINTAINERS: resign from vhost/virtio maintainer

2018-02-12 Thread Yuanhan Liu
On Mon, Feb 12, 2018 at 10:54:08AM +0100, Maxime Coquelin wrote:
> Hi Yuanhan,
> 
> On 02/12/2018 10:44 AM, Yuanhan Liu wrote:
> >On Mon, Feb 12, 2018 at 04:51:58PM +0800, Yuanhan Liu wrote:
> >>I was doing terrible reviews jobs recently, and to not hold back
> >>the vhost/virtio development, it's better for me to resign.
> 
> I would not put this in the commit message, because even you had less
> time to review vhost/virtio patches recently, you have done a great work
> on maintaining virtio and vhost generally, we owe you a lot of thanks
> for this.

Thank you!

> >>Then here it is.
> >
> >One thing I forgot to mention is, not surprisingly, I will also resign
> >from the committer role on the next-virtio tree.
> 
> I propose myself for the committer role, I have already some experience as
> Kernel maintainer for some ARM SoCs.

Thank you again for volunteering youself to take care of it. I think
you have proved that well since you joined DPDK community :)

So Acked-by: Yuanhan Liu 

Also Cc'ed the teckboard, per ask from Thomas.

--yliu


> 
> Cheers,
> Maxime
> 
> > --yliu
> >>
> >>Signed-off-by: Yuanhan Liu 
> >>---
> >>  MAINTAINERS | 3 ---
> >>  1 file changed, 3 deletions(-)
> >>
> >>diff --git a/MAINTAINERS b/MAINTAINERS
> >>index 4f1f33b..f12d83d 100644
> >>--- a/MAINTAINERS
> >>+++ b/MAINTAINERS
> >>@@ -557,7 +557,6 @@ F: doc/guides/nics/vmxnet3.rst
> >>  F: doc/guides/nics/features/vmxnet3.ini
> >>  Vhost-user
> >>-M: Yuanhan Liu 
> >>  M: Maxime Coquelin 
> >>  T: git://dpdk.org/next/dpdk-next-virtio
> >>  F: lib/librte_vhost/
> >>@@ -569,14 +568,12 @@ F: doc/guides/sample_app_ug/vhost_scsi.rst
> >>  Vhost PMD
> >>  M: Tetsuya Mukawa 
> >>-M: Yuanhan Liu 
> >>  M: Maxime Coquelin 
> >>  T: git://dpdk.org/next/dpdk-next-virtio
> >>  F: drivers/net/vhost/
> >>  F: doc/guides/nics/features/vhost.ini
> >>  Virtio PMD
> >>-M: Yuanhan Liu 
> >>  M: Maxime Coquelin 
> >>  M: Tiwei Bie 
> >>  T: git://dpdk.org/next/dpdk-next-virtio
> >>-- 
> >>2.7.4


Re: [dpdk-dev] [PATCH] MAINTAINERS: resign from vhost/virtio maintainer

2018-02-12 Thread Yuanhan Liu
On Mon, Feb 12, 2018 at 04:51:58PM +0800, Yuanhan Liu wrote:
> I was doing terrible reviews jobs recently, and to not hold back
> the vhost/virtio development, it's better for me to resign.
> 
> Then here it is.

One thing I forgot to mention is, not surprisingly, I will also resign
from the committer role on the next-virtio tree.

--yliu
> 
> Signed-off-by: Yuanhan Liu 
> ---
>  MAINTAINERS | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4f1f33b..f12d83d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -557,7 +557,6 @@ F: doc/guides/nics/vmxnet3.rst
>  F: doc/guides/nics/features/vmxnet3.ini
>  
>  Vhost-user
> -M: Yuanhan Liu 
>  M: Maxime Coquelin 
>  T: git://dpdk.org/next/dpdk-next-virtio
>  F: lib/librte_vhost/
> @@ -569,14 +568,12 @@ F: doc/guides/sample_app_ug/vhost_scsi.rst
>  
>  Vhost PMD
>  M: Tetsuya Mukawa 
> -M: Yuanhan Liu 
>  M: Maxime Coquelin 
>  T: git://dpdk.org/next/dpdk-next-virtio
>  F: drivers/net/vhost/
>  F: doc/guides/nics/features/vhost.ini
>  
>  Virtio PMD
> -M: Yuanhan Liu 
>  M: Maxime Coquelin 
>  M: Tiwei Bie 
>  T: git://dpdk.org/next/dpdk-next-virtio
> -- 
> 2.7.4


[dpdk-dev] [PATCH] MAINTAINERS: resign from vhost/virtio maintainer

2018-02-12 Thread Yuanhan Liu
I was doing terrible reviews jobs recently, and to not hold back
the vhost/virtio development, it's better for me to resign.

Then here it is.

Signed-off-by: Yuanhan Liu 
---
 MAINTAINERS | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4f1f33b..f12d83d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -557,7 +557,6 @@ F: doc/guides/nics/vmxnet3.rst
 F: doc/guides/nics/features/vmxnet3.ini
 
 Vhost-user
-M: Yuanhan Liu 
 M: Maxime Coquelin 
 T: git://dpdk.org/next/dpdk-next-virtio
 F: lib/librte_vhost/
@@ -569,14 +568,12 @@ F: doc/guides/sample_app_ug/vhost_scsi.rst
 
 Vhost PMD
 M: Tetsuya Mukawa 
-M: Yuanhan Liu 
 M: Maxime Coquelin 
 T: git://dpdk.org/next/dpdk-next-virtio
 F: drivers/net/vhost/
 F: doc/guides/nics/features/vhost.ini
 
 Virtio PMD
-M: Yuanhan Liu 
 M: Maxime Coquelin 
 M: Tiwei Bie 
 T: git://dpdk.org/next/dpdk-next-virtio
-- 
2.7.4



[dpdk-dev] [PATCH] net/mlx4: use PCI address as port name

2018-02-11 Thread Yuanhan Liu
It is suggested to use PCI BDF to identify a port for port addition
in OVS-DPDK. While mlx5 has its own naming style: name it by ib dev
name. This breaks the typical OVS DPDK use case and brings more puzzle
to the end users.

To fix it, this patch changes it to use PCI BDF as the name, too.
Judging the fact that there are 2 ports associated with one PCI for
ConnectX-3, a postfix is needed. Thus, the final name looks like
something below:

:04:00.0-port0
:04:00.0-port1

Cc: sta...@dpdk.org
Signed-off-by: Yuanhan Liu 
---
 drivers/net/mlx4/mlx4.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ee93daf..eb8851c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -497,6 +497,14 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
struct priv *priv = NULL;
struct rte_eth_dev *eth_dev = NULL;
struct ether_addr mac;
+   char name[RTE_ETH_NAME_MAX_LEN];
+   int len;
+
+   len = snprintf(name, sizeof(name), PCI_PRI_FMT,
+pci_dev->addr.domain, pci_dev->addr.bus,
+pci_dev->addr.devid, pci_dev->addr.function);
+   if (device_attr.phys_port_cnt > 1)
+   snprintf(name + len, sizeof(name), "-port%u", i);
 
/* If port is not enabled, skip. */
if (!(conf.ports.enabled & (1 << i)))
@@ -605,14 +613,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
/* Get actual MTU if possible. */
mlx4_mtu_get(priv, &priv->mtu);
DEBUG("port %u MTU is %u", priv->port, priv->mtu);
-   /* from rte_ethdev.c */
-   {
-   char name[RTE_ETH_NAME_MAX_LEN];
-
-   snprintf(name, sizeof(name), "%s port %u",
-mlx4_glue->get_device_name(ibv_dev), port);
-   eth_dev = rte_eth_dev_allocate(name);
-   }
+   eth_dev = rte_eth_dev_allocate(name);
if (eth_dev == NULL) {
ERROR("can not allocate rte ethdev");
rte_errno = ENOMEM;
-- 
2.7.4



Re: [dpdk-dev] [PATCH] maintainers: update vhost lib and pmd

2018-02-10 Thread Yuanhan Liu
On Sun, Feb 11, 2018 at 01:04:51AM +, Jianfeng Tan wrote:
> Signed-off-by: Jianfeng Tan 

Acked-by: Yuanhan Liu 

And welcome! BTW, the verb "update" doesn't look good to me ;)

--yliu
> ---
>  MAINTAINERS | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4f1f33b..38e5fb8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -559,6 +559,7 @@ F: doc/guides/nics/features/vmxnet3.ini
>  Vhost-user
>  M: Yuanhan Liu 
>  M: Maxime Coquelin 
> +M: Jianfeng Tan 
>  T: git://dpdk.org/next/dpdk-next-virtio
>  F: lib/librte_vhost/
>  F: doc/guides/prog_guide/vhost_lib.rst
> @@ -571,6 +572,7 @@ Vhost PMD
>  M: Tetsuya Mukawa 
>  M: Yuanhan Liu 
>  M: Maxime Coquelin 
> +M: Jianfeng Tan 
>  T: git://dpdk.org/next/dpdk-next-virtio
>  F: drivers/net/vhost/
>  F: doc/guides/nics/features/vhost.ini
> -- 
> 2.7.4


[dpdk-dev] [PATCH] net/mlx5: fix link speed

2018-02-05 Thread Yuanhan Liu
When the link is down, mlx5 kernel driver reports the link speed as -1
(UNKNOWN_SPEED). We need turn it to 0 for such case, otherwise, it
will be re-queried again due to the link_speed is not 0, due to following
code:

  1201   if (((link->link_speed == 0) && link->link_status) ||
  1202   ((link->link_speed != 0) && !link->link_status)) {
  1203   /*
  1204* Inconsistent status. Event likely occurred before the
  1205* kernel netdevice exposes the new status.
  1206*/

Fixes: 188408719888 ("net/mlx5: fix support for newer link speeds")

Cc: sta...@dpdk.org
Signed-off-by: Yuanhan Liu 
---
 drivers/net/mlx5/mlx5_ethdev.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 6665076..492ca07 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -808,6 +808,10 @@ mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev, int 
wait_to_complete)
return -1;
}
dev_link.link_speed = ecmd->speed;
+   if (link_speed == -1)
+   dev_link.link_speed = 0;
+   else
+   dev_link.link_speed = link_speed;
sc = ecmd->link_mode_masks[0] |
((uint64_t)ecmd->link_mode_masks[1] << 32);
priv->link_speed_capa = 0;
-- 
2.7.4



Re: [dpdk-dev] [PATCH v3 0/2] vhost: IOTLB fixes

2018-02-01 Thread Yuanhan Liu
On Mon, Jan 29, 2018 at 05:30:38PM +0100, Maxime Coquelin wrote:
> First patch of the series fixes OOM handling from the IOTLB
> mempool, the second one removes pending IOTLB entry when the
> IOTLB miss request sending failed.
> 
> Changes since v2:
> -
> - patch 2: Fix error message with correct IOVA
> 
> Changes since v1:
> -
> - Make log levels consistent (Tiwei)
> - Remove pending IOTLB entry of miss request seding failed (Tiwei)
> 
> Maxime Coquelin (2):
>   vhost: fix iotlb pool out-of-memory handling
>   vhost: remove pending IOTLB entry if IOTLB MISS request sending failed

Series Acked-by: Yuanhan Liu 

Thanks.

--yliu
> 
>  lib/librte_vhost/iotlb.c | 20 ++--
>  lib/librte_vhost/iotlb.h |  3 +++
>  lib/librte_vhost/vhost.c | 13 ++---
>  3 files changed, 27 insertions(+), 9 deletions(-)
> 
> -- 
> 2.14.3


Re: [dpdk-dev] [PATCH] examples/vhost_scsi: drop unimplemented EVENT_IDX feature bit

2018-02-01 Thread Yuanhan Liu
On Wed, Jan 31, 2018 at 05:48:28PM +, Stefan Hajnoczi wrote:
> The vhost_scsi example application negotiates the
> VIRTIO_RING_F_EVENT_IDX feature bit but does not honor it when accessing
> vrings.
> 
> In particular, commit e37ff954405addb8ea422426a2d162d00dcad196 ("vhost:
> support virtqueue interrupt/notification suppression") broke vring call
> because vq->last_used_idx is never updated by vhost_scsi.  The
> vq->last_used_idx field is not even available via the librte_vhost
> public API, so VIRTIO_RING_F_EVENT_IDX is currently only usable by the
> built-in virtio_net.c driver in librte_vhost.
> 
> This patch drops VIRTIO_RING_F_EVENT_IDX from vhost_scsi so that vring
> call works again.
> 
> Cc: Changpeng Liu 
> Cc: Junjie Chen 
> Signed-off-by: Stefan Hajnoczi 

Acked-by: Yuanhan Liu 

Thanks.

--yliu
> ---
>  examples/vhost_scsi/vhost_scsi.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/examples/vhost_scsi/vhost_scsi.c 
> b/examples/vhost_scsi/vhost_scsi.c
> index da01ad378..3cb4383e9 100644
> --- a/examples/vhost_scsi/vhost_scsi.c
> +++ b/examples/vhost_scsi/vhost_scsi.c
> @@ -21,7 +21,6 @@
>  #include "scsi_spec.h"
>  
>  #define VIRTIO_SCSI_FEATURES ((1 << VIRTIO_F_NOTIFY_ON_EMPTY) |\
> -   (1 << VIRTIO_RING_F_EVENT_IDX) |\
> (1 << VIRTIO_SCSI_F_INOUT) |\
> (1 << VIRTIO_SCSI_F_CHANGE))
>  
> -- 
> 2.14.3


Re: [dpdk-dev] [PATCH 0/2] vhost: fix VIRTIO_NET_F_MQ vhost_scsi breakage

2018-02-01 Thread Yuanhan Liu
On Wed, Jan 31, 2018 at 05:46:49PM +, Stefan Hajnoczi wrote:
> These patches fix a recent regression in librte_vhost that breaks the
> vhost_scsi example application.  vhost_user.c assumes all devices are vhost 
> net
> backends when handling the VIRTIO_NET_F_MQ feature bit.  The code is triggered
> by vhost scsi devices and causes virtqueues to be removed.  See Patch 2 for
> details.
> 
> Patch 1 puts the infrastructure in place to distinguish between the built-in
> virtio_net.c driver and generic vhost device backend usage.
> 
> Patch 2 fixes the regression by handling VIRTIO_NET_F_MQ only when the 
> built-in
> virtio_net.c driver is in use.
> 
> Stefan Hajnoczi (2):
>   vhost: add flag for built-in virtio_net.c driver
>   vhost: only drop vqs with built-in virtio_net.c driver

Series Acked-by: Yuanhan Liu 

Thanks.

--yliu
> 
>  lib/librte_vhost/vhost.h  |  3 +++
>  lib/librte_vhost/socket.c | 15 +++
>  lib/librte_vhost/vhost.c  | 17 -
>  lib/librte_vhost/vhost_user.c |  3 ++-
>  lib/librte_vhost/virtio_net.c | 14 ++
>  5 files changed, 50 insertions(+), 2 deletions(-)
> 
> -- 
> 2.14.3


Re: [dpdk-dev] [PATCH 1/2] vhost: add flag for built-in virtio_net.c driver

2018-02-01 Thread Yuanhan Liu
Hi Stefan,

On Wed, Jan 31, 2018 at 05:46:50PM +, Stefan Hajnoczi wrote:
> The librte_vhost API is used in two ways:
> 1. As a vhost net device backend via rte_vhost_enqueue/dequeue_burst().

This is how DPDK vhost-user firstly implemented.

> 2. As a library for implementing vhost device backends.

This is how DPDK vhost-use extended later, and vhost-user scsi is
the first one being added.

> There is no distinction between the two at the API level or in the
> librte_vhost implementation.  For example, device state is kept in
> "struct virtio_net" regardless of whether this is actually a net device
> backend or whether the built-in virtio_net.c driver is in use.

Indeed. virtio_net should be renamed to "vhost_dev" or something like
this. It's part of something un-finished in the last vhost-user extension
refactoring.

> 
> The virtio_net.c driver should be a librte_vhost API client just like
> the vhost-scsi code and have no special access to vhost.h internals.
> Unfortunately, fixing this requires significant librte_vhost API
> changes.

The way I thought was to move the virtio_net.c completely to vhost
pmd (drivers/net/vhost). And let vhost-user just be a generic lib
without any device specific stuff.

Unfortunately, it can not be done recently, as there are still a lot
of applications using rte_vhost_enqueue/dequeue_burst directly, for
example, OVS.

> This patch takes a different approach: keep the librte_vhost API
> unchanged but track whether the built-in virtio_net.c driver is in use.
> See the next patch for a bug fix that requires knowledge of whether
> virtio_net.c is in use.

LGTM.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH v3 0/4] net/virtio: fix memory leak when reinitializing device

2018-01-26 Thread Yuanhan Liu
On Tue, Jan 23, 2018 at 04:54:39PM +0100, Olivier Matz wrote:
> When devops->configure() is called, the queues may be reallocated
> if the features flag changed, but the previous are not freed. This
> patchset fixes this issue.

Seires applied to dpdk-next-virtio.

Thanks.

--yliu
> 
> To really point out the issue, I instrumented rte_malloc,
> rte_memzone_reserve, rte_mbuf_alloc to track the allocations and frees.
> For reference, the patch is available here:
> https://www.droids-corp.org/~zer0/hidden/0001-debug-malloc.patch
> Iit conflicts a bit with the patchset, but it is quite easy to solve.
> 
> To reproduce the issue:
> 
>   cd dpdk
>   make config T=x86_64-native-linuxapp-gcc
>   sed -i 
> 's,CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_INIT=n,CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_INIT=y,'
>  build/.config
>   make -j4
>   mkdir -p /mnt/huge
>   mount -t hugetlbfs nodev /mnt/huge
>   echo 256 > 
> /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>   modprobe uio_pci_generic
>   python usertools/dpdk-devbind.py -b uio_pci_generic :00:02.0
>   ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
>  -i --port-topology=chained --disable-hw-vlan-filter \
>  --disable-hw-vlan-strip --enable-rx-cksum --enable-lro \
>  --txqflags=0 --no-lsc-interrupt
> 
>   # testpmd commands
>   port stop 0
>   port config all rx-cksum off
>   port start 0
>   quit
> 
> Without the patchset, the debug patch displays a leak of mbufs allocated
> from virtio_dev_rx_queue_setup_finish() and a leak of rte_mallocs allocated
> from virtio_init_queue().
> 
> After the patchset, only one malloc is remaining after the device close,
> but this is normal, because this area is freed in dev_uninit().
> 
> v2 -> v3:
> - fix queue flushing with vector Rx enabled (Tiwei).
>   The patch is slightly changed compared to what was proposed by mail,
>   due to a checkpatch warning.
> 
> v1 -> v2:
> - move the vq != NULL check from the refactor patch to the memory
>   leak fix patch (Yuanhan)
> - new patch to fix queue flushing with vector Rx enabled (Tiwei)
> - move the backportable fixes at the beginning of the patchset (Yuanhan)
> - remove Cc stable on the typo fix (Yuanhan)
>   note: I finally kept it in the patchset
> 
> 
> Olivier Matz (4):
>   net/virtio: fix queue flushing with vector Rx enabled
>   net/virtio: fix memory leak when reinitializing device
>   net/virtio: rationalize queue flushing
>   net/virtio: fix typo in function name
> 
>  drivers/net/virtio/virtio_ethdev.c | 71 
> +-
>  drivers/net/virtio/virtqueue.c | 30 +---
>  drivers/net/virtio/virtqueue.h | 13 ++-
>  3 files changed, 69 insertions(+), 45 deletions(-)
> 
> -- 
> 2.11.0


Re: [dpdk-dev] [PATCH] net/virtio: fix Rx and Tx handler selection for arm32

2018-01-26 Thread Yuanhan Liu
On Thu, Dec 14, 2017 at 03:45:04PM +0100, Maxime Coquelin wrote:
> Hi Olivier,
> 
> On 12/14/2017 03:32 PM, Olivier Matz wrote:
> >From: Samuel Gauthier
> >
> >On arm32, we were always selecting the simple handler, but it is only
> >available if neon is present.
> >
> >This is due to a typo in the name of the config option.
> >CONFIG_RTE_ARCH_ARM is for Makefiles. One should use RTE_ARCH_ARM.
> >
> >Fixes: 2d7c37194ee4 ("net/virtio: add NEON based Rx handler")
> >Cc:sta...@dpdk.org
> >
> >Signed-off-by: Samuel Gauthier
> >---
> >  drivers/net/virtio/virtio_ethdev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Reviewed-by: Maxime Coquelin 

Applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH v2] net/virtio-user: fix segfault as features change

2018-01-26 Thread Yuanhan Liu
On Tue, Jan 23, 2018 at 09:52:43AM +, Jianfeng Tan wrote:
> Since commit 59fe5e17d930 ("vhost: propagate set features handling error"),
> vhost does not allow to set different features without reset.
> 
> The virito-user driver fails to reset the device in below commit.
> 
> To fix, we send the reset message as stopping the device.
> 
> Fixes: c12a26ee209e ("net/virtio-user: fix not properly reset device")
> 
> Reported-by: Lei Yao 
> Reported-by: Tiwei Bie 
> Signed-off-by: Jianfeng Tan 

Applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH] vhost: fix ANY_LAYOUT declaration

2018-01-26 Thread Yuanhan Liu
On Sat, Jan 20, 2018 at 01:54:52AM +0800, Tan, Jianfeng wrote:
> 
> 
> On 1/19/2018 10:42 PM, Yuanhan Liu wrote:
> >On Fri, Jan 19, 2018 at 02:02:50PM -0500, Zhihong Wang wrote:
> >>The VIRTIO_F_ANY_LAYOUT feature indicates the device accepts arbitrary
> >>descriptor layouts. The vhost-user lib already supports it, but the
> >>feature declaration is missing. This patch fixes the mismatch.
> >I remembered there was a long discussion one year ago, that we can't
> >blindly set this feature, as this flag is reserved (thus should not
> >be set) for virtio 1.0.
> 
> We might need to read that old thread again. But as you said, this flag is
> reserved for 1.0, but not used for other purpose yet. So that the feature is
> negotiated does not affect anything, no?
> 
> >
> >We should set it when v1.0 is not enabled.
> 
> But in fact, vhost kernel reports supported feature bits consisting of
> ANY_LAYOUT and VERSION_1.

Yes, I was aware of that, and that was also one my points before.

I now also release that it's a must from migration from vhost-net to
vhost-user. Thus, I'd like to merge it this time. So

Applied to dpdk-next-virtio.

Thanks.

--yliu

> 
> >  However, as you know, v1.0 is also enabled by default in DPDK vhost-user.
> 
> Even v1.0 is used, the feature negotiated between virtio-net (kernel) and
> vhost kernel indeed contains ANY_LAYOUT. I suppose we shall do as the
> de-facto way.
> 
> Thanks,
> Jianfeng
> 
> >
> > --yliu
> >>Signed-off-by: Zhihong Wang 
> >>---
> >>  lib/librte_vhost/vhost.h | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >>diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> >>index b2bf0e8..57a9bea 100644
> >>--- a/lib/librte_vhost/vhost.h
> >>+++ b/lib/librte_vhost/vhost.h
> >>@@ -170,6 +170,7 @@ struct vhost_msg {
> >>  /* Features supported by this builtin vhost-user net driver. */
> >>  #define VIRTIO_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | 
> >> \
> >>+   (1ULL << VIRTIO_F_ANY_LAYOUT) | \
> >>(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
> >>(1ULL << VIRTIO_NET_F_CTRL_RX) | \
> >>(1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
> >>-- 
> >>2.7.5


Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax

2018-01-25 Thread Yuanhan Liu
On Wed, Jan 24, 2018 at 05:57:34PM +0100, Thomas Monjalon wrote:
> 24/01/2018 16:04, Yuanhan Liu:
> > On Wed, Jan 24, 2018 at 11:37:31AM +0100, Thomas Monjalon wrote:
> > > 24/01/2018 11:36, Yuanhan Liu:
> > > > On Wed, Jan 24, 2018 at 11:21:44AM +0100, Thomas Monjalon wrote:
> > > > > 24/01/2018 10:28, Yuanhan Liu:
> > > > > > On Wed, Jan 24, 2018 at 09:19:10AM +0100, Thomas Monjalon wrote:
> > > > > > > 24/01/2018 07:43, Yuanhan Liu:
> > > > > > > > On Tue, Jan 23, 2018 at 03:29:34PM +0100, Thomas Monjalon wrote:
> > > > > > > > > 23/01/2018 13:46, Yuanhan Liu:
> > > > > > > > > > If port not found, then the whole string will be used for 
> > > > > > > > > > dev attachment.
> > > > > > > > > > It means we are attaching a port with PCI BDF == 04.00.0 AND
> > > > > > > > > > port == 0 (the 2nd port will not be attached).
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > And here is how the devargs would look like if 
> > > > > > > > > > "matching;settings" is
> > > > > > > > > > being used:
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > bus=pci,id=04:00.0/class=eth,port=0;bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,...
> > > > > > > > > > 
> > > > > > > > > > The part before ";" will be used for lookup and the later 
> > > > > > > > > > part will be
> > > > > > > > > > used for attachment. It should work. It just looks 
> > > > > > > > > > redundant.
> > > > > > > > > 
> > > > > > > > > It does not have to be redundant.
> > > > > > > > > It can be:
> > > > > > > > >   
> > > > > > > > > bus=pci,id=04:00.0/class=eth,port=0;driver=mlx4,mlx4_arg1=settings1,...
> > > > > > > > 
> > > > > > > > I knew you would make such reply :)
> > > > > > > > Then there is a contradiction. According your suggestion, the 
> > > > > > > > "port=0" belongs
> > > > > > > > to the matching section, but it also has to be used in the 
> > > > > > > > settings section.
> > > > > > > 
> > > > > > > If port=0 is matched, it is already set, right?
> > > > > > 
> > > > > > Yes.
> > > > > > 
> > > > > > > Why it needs to be in settings?
> > > > > > 
> > > > > > But I was talking the case it's not matched, say it's not probed 
> > > > > > and here
> > > > > > we do hotplug.
> > > > > 
> > > > > I don't understand.
> > > > > Anyway, the port property should be read-only.
> > > > 
> > > > All proberties should be read-only.
> > > > 
> > > > > Are we talking about the dev_port from the Linux kernel?
> > > > 
> > > > Yes. And it can be used for probing one port only (out of 2 ports in a 
> > > > NIC)
> > > > at probe stage. So, at this stage, it's a setting but not a match.
> > > 
> > > No it's a match!
> > > 
> > > A settings is changing data in the port.
> > 
> > So I see that's your definition about the "settings". What I think is
> > everything needed for driver initiation are settings.
> > 
> > For example, one proposed interface for VF rep is the "vf_id" property,
> > Similar to "port" property we have just discussed above,  it's used for
> > probing one specific VR rep for the given VF id.
> > 
> > You can say it's a match here, just like the "port" property.
> > 
> > But note that "vf_id" could be a range, to enable multiple VF reps.
> > The semantics looks like "setting" more than "match".
> 
> Not sure why it would look like settings.
> 
> > Another example is from the failsafe PMD that Gaetan had mentioned:
> > 
> > driver=failsafe,dev(bus=pci,id=00:02.0),fd(/some/file/)
> > 
> > They (dev and fd) should belong the "setting" section, for 2 reasons:
> > 
> > - they should not be used for matching
> > - they are used for failsafe PMD initiation
> 
> Yes these ones are settings.
> 
> > But it belongs "match", according to your definition about "settings",
> > because it doesn't change data in the port.
> 
> No, it changes data, dev() is adding a slave, and fd() is adding
> a file descriptor.

Then "port" should be settings, as it's used for adding a port.
Moreover, "vf_id=0-3" is definitely settings (according your definition),
as it's going to add 4 VF rep ports.

--yliu

> So we agree these are settings.
> 
> > That also means, the word "settings" might not be well named. It's
> > probably better to name it "drvargs".
> 
> I disagree. But it's only naming.
> Settings can be class settings, not only driver settings.
> And driver properties can be matching or settings.
> So "drvargs" does not make sense.


Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax

2018-01-24 Thread Yuanhan Liu
On Tue, Jan 23, 2018 at 05:08:16PM +0100, Gaëtan Rivet wrote:
> Drivers answers to a specific API (ethdev, cryptodev, ...), to create
> standardized objects in response to parameters that are given to them
> for init. I think matching properties should be restricted to higher
> classes (bus, eth/crypto),

That's also what I thought. But I'm okay to have "driver" category
included for matching. I just don't really see a good example for that.

> while the driver class should be left
> free-form and to the responsibility of the PMD itself (while having the
> proper libraries for helping parsing safely, thus driving developpers
> toward similar syntaxes, while not forcing them in those).

I agree. The drv args are parsed by the drivers after all. It's hard to
have a good parser for all. I also don't know why we have to force them
to use "key=value" pairs.

I even see some drawbacks from the forcement:

- some PMDs already use none key/value format. Forcing them breaks more.
  If the "-w" "--vdev" compatibility is kept", nothing will be broken
  from the user point of view. However, if "key=value" pair is going to
  be used, user have to do some changes.

- Some "value" might have to use the nested "=". Handling the nested pairs
  introduces more complexity.

- sometimes, it's simple without an assignment. For example, it could be
  "driver=vhost-pmd,...,client" to let the vhost PMD acts as the client
  mode.

Both Linux kernel and QEMU don't force the "key=value" pair usage, I don't
see any good reason why we have to do that.

--yliu


Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax

2018-01-24 Thread Yuanhan Liu
On Wed, Jan 24, 2018 at 11:37:31AM +0100, Thomas Monjalon wrote:
> 24/01/2018 11:36, Yuanhan Liu:
> > On Wed, Jan 24, 2018 at 11:21:44AM +0100, Thomas Monjalon wrote:
> > > 24/01/2018 10:28, Yuanhan Liu:
> > > > On Wed, Jan 24, 2018 at 09:19:10AM +0100, Thomas Monjalon wrote:
> > > > > 24/01/2018 07:43, Yuanhan Liu:
> > > > > > On Tue, Jan 23, 2018 at 03:29:34PM +0100, Thomas Monjalon wrote:
> > > > > > > 23/01/2018 13:46, Yuanhan Liu:
> > > > > > > > If port not found, then the whole string will be used for dev 
> > > > > > > > attachment.
> > > > > > > > It means we are attaching a port with PCI BDF == 04.00.0 AND
> > > > > > > > port == 0 (the 2nd port will not be attached).
> > > > > > > > 
> > > > > > > > 
> > > > > > > > And here is how the devargs would look like if 
> > > > > > > > "matching;settings" is
> > > > > > > > being used:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > bus=pci,id=04:00.0/class=eth,port=0;bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,...
> > > > > > > > 
> > > > > > > > The part before ";" will be used for lookup and the later part 
> > > > > > > > will be
> > > > > > > > used for attachment. It should work. It just looks redundant.
> > > > > > > 
> > > > > > > It does not have to be redundant.
> > > > > > > It can be:
> > > > > > >   
> > > > > > > bus=pci,id=04:00.0/class=eth,port=0;driver=mlx4,mlx4_arg1=settings1,...
> > > > > > 
> > > > > > I knew you would make such reply :)
> > > > > > Then there is a contradiction. According your suggestion, the 
> > > > > > "port=0" belongs
> > > > > > to the matching section, but it also has to be used in the settings 
> > > > > > section.
> > > > > 
> > > > > If port=0 is matched, it is already set, right?
> > > > 
> > > > Yes.
> > > > 
> > > > > Why it needs to be in settings?
> > > > 
> > > > But I was talking the case it's not matched, say it's not probed and 
> > > > here
> > > > we do hotplug.
> > > 
> > > I don't understand.
> > > Anyway, the port property should be read-only.
> > 
> > All proberties should be read-only.
> > 
> > > Are we talking about the dev_port from the Linux kernel?
> > 
> > Yes. And it can be used for probing one port only (out of 2 ports in a NIC)
> > at probe stage. So, at this stage, it's a setting but not a match.
> 
> No it's a match!
> 
> A settings is changing data in the port.

So I see that's your definition about the "settings". What I think is
everything needed for driver initiation are settings.

For example, one proposed interface for VF rep is the "vf_id" property,
Similar to "port" property we have just discussed above,  it's used for
probing one specific VR rep for the given VF id.

You can say it's a match here, just like the "port" property.

But note that "vf_id" could be a range, to enable multiple VF reps.
The semantics looks like "setting" more than "match".

Another example is from the failsafe PMD that Gaetan had mentioned:

driver=failsafe,dev(bus=pci,id=00:02.0),fd(/some/file/)

They (dev and fd) should belong the "setting" section, for 2 reasons:

- they should not be used for matching
- they are used for failsafe PMD initiation

But it belongs "match", according to your definition about "settings",
because it doesn't change data in the port.

That also means, the word "settings" might not be well named. It's
probably better to name it "drvargs".

--yliu


Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax

2018-01-24 Thread Yuanhan Liu
On Wed, Jan 24, 2018 at 11:21:44AM +0100, Thomas Monjalon wrote:
> 24/01/2018 10:28, Yuanhan Liu:
> > On Wed, Jan 24, 2018 at 09:19:10AM +0100, Thomas Monjalon wrote:
> > > 24/01/2018 07:43, Yuanhan Liu:
> > > > On Tue, Jan 23, 2018 at 03:29:34PM +0100, Thomas Monjalon wrote:
> > > > > 23/01/2018 13:46, Yuanhan Liu:
> > > > > > If port not found, then the whole string will be used for dev 
> > > > > > attachment.
> > > > > > It means we are attaching a port with PCI BDF == 04.00.0 AND
> > > > > > port == 0 (the 2nd port will not be attached).
> > > > > > 
> > > > > > 
> > > > > > And here is how the devargs would look like if "matching;settings" 
> > > > > > is
> > > > > > being used:
> > > > > > 
> > > > > > 
> > > > > > bus=pci,id=04:00.0/class=eth,port=0;bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,...
> > > > > > 
> > > > > > The part before ";" will be used for lookup and the later part will 
> > > > > > be
> > > > > > used for attachment. It should work. It just looks redundant.
> > > > > 
> > > > > It does not have to be redundant.
> > > > > It can be:
> > > > >   
> > > > > bus=pci,id=04:00.0/class=eth,port=0;driver=mlx4,mlx4_arg1=settings1,...
> > > > 
> > > > I knew you would make such reply :)
> > > > Then there is a contradiction. According your suggestion, the "port=0" 
> > > > belongs
> > > > to the matching section, but it also has to be used in the settings 
> > > > section.
> > > 
> > > If port=0 is matched, it is already set, right?
> > 
> > Yes.
> > 
> > > Why it needs to be in settings?
> > 
> > But I was talking the case it's not matched, say it's not probed and here
> > we do hotplug.
> 
> I don't understand.
> Anyway, the port property should be read-only.

All proberties should be read-only.

> Are we talking about the dev_port from the Linux kernel?

Yes. And it can be used for probing one port only (out of 2 ports in a NIC)
at probe stage. So, at this stage, it's a setting but not a match.

--yliu


Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax

2018-01-24 Thread Yuanhan Liu
On Wed, Jan 24, 2018 at 09:19:10AM +0100, Thomas Monjalon wrote:
> 24/01/2018 07:43, Yuanhan Liu:
> > On Tue, Jan 23, 2018 at 03:29:34PM +0100, Thomas Monjalon wrote:
> > > 23/01/2018 13:46, Yuanhan Liu:
> > > > On Thu, Jan 18, 2018 at 10:46:23AM +0100, Gaëtan Rivet wrote:
> > > > > On Thu, Jan 18, 2018 at 09:46:29AM +0100, Thomas Monjalon wrote:
> > > > > > 18/01/2018 08:35, Yuanhan Liu:
> > > > > > > On Wed, Jan 17, 2018 at 12:34:08PM +, Ferruh Yigit wrote:
> > > > > > > > So does it make sense to separate them logically? Perhaps as 
> > > > > > > > "device identifier"
> > > > > > > > and "device args".
> > > > > > > 
> > > > > > > Then I think it returns back to the old issue: how could we 
> > > > > > > identify a
> > > > > > > port when the bus id (say BDF for PCI bus) is not enough for 
> > > > > > > identifying
> > > > > > > a port? Such case could happen when a single NIC has 2 ports 
> > > > > > > sharing
> > > > > > > the same BDF. It could also happen with the VF representors that 
> > > > > > > will
> > > > > > > be introduced shortly.
> > > > > > 
> > > > > > Yes, the device matching syntax must include bus category, class 
> > > > > > category
> > > > > > and driver category. So any device can be identified in future.
> > > > > > 
> > > > > > But I think Ferruh is talking about separating device matching
> > > > > > (which is described in this proposal) and device settings
> > > > > > (which are usually mixed in -w and --vdev options).
> > > > > > I agree there are different things and may be separate.
> > > > > > They could share the same syntax (bus/class/driver) but be separate
> > > > > > with a semicolon:
> > > > > > matching;settings
> > > > >
> > > > > Can you give an example?
> > > > 
> > > > Let's take port addition in OVS-DPDK as an example. It happens in 2
> > > > steps:
> > > > - port lookup (if port is already probed)
> > > > - dev attachment (if lookup fails)
> > > > 
> > > > And also let's assume we need probe a ConnectX-3 port. Note that for
> > > > ConnectX-3, there are 2 ports sharing the same PCI addr. Thus, PCI
> > > > BDF is not enough. And let's assume we use another extra property
> > > > "port".
> > > > 
> > > > If the proposal described in this patch is being used, the devarg
> > > > would look like following:
> > > > 
> > > > bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,...
> > > > 
> > > > Then "bus=pci,id=04:00.0/class=eth,port=0" will be used for lookup,
> > > > It means we are looking for a port with PCI BDF == 04:00.0 AND
> > > > port == 0 (the first port of the 2 ports).
> > > > 
> > > > Note that in my proposal the driver category is not intended for lookup.
> > > > If any properties needed be looked in the driver category, they would
> > > > probably need be elevated to the class category.
> > > 
> > > It is not my thought.
> > > I think we should be able to use bus, class and driver properties for 
> > > lookup.
> > > We can imagine doing a lookup on a driver specific id, which is not
> > > candidate to elevation to the class category.
> > > 
> > > > If port not found, then the whole string will be used for dev 
> > > > attachment.
> > > > It means we are attaching a port with PCI BDF == 04.00.0 AND
> > > > port == 0 (the 2nd port will not be attached).
> > > > 
> > > > 
> > > > And here is how the devargs would look like if "matching;settings" is
> > > > being used:
> > > > 
> > > > 
> > > > bus=pci,id=04:00.0/class=eth,port=0;bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,...
> > > > 
> > > > The part before ";" will be used for lookup and the later part will be
> > > > used for attachment. It should work. It just looks redundant.
> > > 
> > > It does not have to be redundant.
> > > It can be:
> > >   bus=pci,id=04:00.0/class=eth,port=0;driver=mlx4,mlx4_arg1=settings1,...
> > 
> > I knew you would make such reply :)
> > Then there is a contradiction. According your suggestion, the "port=0" 
> > belongs
> > to the matching section, but it also has to be used in the settings section.
> 
> If port=0 is matched, it is already set, right?

Yes.

> Why it needs to be in settings?

But I was talking the case it's not matched, say it's not probed and here
we do hotplug.

--yliu

> > > Another example, setting the MAC address:
> > >   bus=pci,id=04:00.0/class=eth,port=0;class=eth,mac=00:11:22:33:44:55
> > 
> > What's the scenario it will be used? And who is going to parse it?
> 
> It can be used on command line for whitelisting a device and let the user
> change its MAC address.
> The matching is parsed by the PCI driver + ethdev, here.
> As mac is a property of ethdev (class=eth), this part must be parsed by 
> ethdev.



Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax

2018-01-23 Thread Yuanhan Liu
On Tue, Jan 23, 2018 at 03:29:34PM +0100, Thomas Monjalon wrote:
> 23/01/2018 13:46, Yuanhan Liu:
> > On Thu, Jan 18, 2018 at 10:46:23AM +0100, Gaëtan Rivet wrote:
> > > On Thu, Jan 18, 2018 at 09:46:29AM +0100, Thomas Monjalon wrote:
> > > > 18/01/2018 08:35, Yuanhan Liu:
> > > > > On Wed, Jan 17, 2018 at 12:34:08PM +, Ferruh Yigit wrote:
> > > > > > So does it make sense to separate them logically? Perhaps as 
> > > > > > "device identifier"
> > > > > > and "device args".
> > > > > 
> > > > > Then I think it returns back to the old issue: how could we identify a
> > > > > port when the bus id (say BDF for PCI bus) is not enough for 
> > > > > identifying
> > > > > a port? Such case could happen when a single NIC has 2 ports sharing
> > > > > the same BDF. It could also happen with the VF representors that will
> > > > > be introduced shortly.
> > > > 
> > > > Yes, the device matching syntax must include bus category, class 
> > > > category
> > > > and driver category. So any device can be identified in future.
> > > > 
> > > > But I think Ferruh is talking about separating device matching
> > > > (which is described in this proposal) and device settings
> > > > (which are usually mixed in -w and --vdev options).
> > > > I agree there are different things and may be separate.
> > > > They could share the same syntax (bus/class/driver) but be separate
> > > > with a semicolon:
> > > > matching;settings
> > >
> > > Can you give an example?
> > 
> > Let's take port addition in OVS-DPDK as an example. It happens in 2
> > steps:
> > - port lookup (if port is already probed)
> > - dev attachment (if lookup fails)
> > 
> > And also let's assume we need probe a ConnectX-3 port. Note that for
> > ConnectX-3, there are 2 ports sharing the same PCI addr. Thus, PCI
> > BDF is not enough. And let's assume we use another extra property
> > "port".
> > 
> > If the proposal described in this patch is being used, the devarg
> > would look like following:
> > 
> > bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,...
> > 
> > Then "bus=pci,id=04:00.0/class=eth,port=0" will be used for lookup,
> > It means we are looking for a port with PCI BDF == 04:00.0 AND
> > port == 0 (the first port of the 2 ports).
> > 
> > Note that in my proposal the driver category is not intended for lookup.
> > If any properties needed be looked in the driver category, they would
> > probably need be elevated to the class category.
> 
> It is not my thought.
> I think we should be able to use bus, class and driver properties for lookup.
> We can imagine doing a lookup on a driver specific id, which is not
> candidate to elevation to the class category.
> 
> > If port not found, then the whole string will be used for dev attachment.
> > It means we are attaching a port with PCI BDF == 04.00.0 AND
> > port == 0 (the 2nd port will not be attached).
> > 
> > 
> > And here is how the devargs would look like if "matching;settings" is
> > being used:
> > 
> > 
> > bus=pci,id=04:00.0/class=eth,port=0;bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,...
> > 
> > The part before ";" will be used for lookup and the later part will be
> > used for attachment. It should work. It just looks redundant.
> 
> It does not have to be redundant.
> It can be:
>   bus=pci,id=04:00.0/class=eth,port=0;driver=mlx4,mlx4_arg1=settings1,...

I knew you would make such reply :)
Then there is a contradiction. According your suggestion, the "port=0" belongs
to the matching section, but it also has to be used in the settings section.

> Another example, setting the MAC address:
>   bus=pci,id=04:00.0/class=eth,port=0;class=eth,mac=00:11:22:33:44:55

What's the scenario it will be used? And who is going to parse it?

--yliu


Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax

2018-01-23 Thread Yuanhan Liu
On Thu, Jan 18, 2018 at 10:46:23AM +0100, Gaëtan Rivet wrote:
> On Thu, Jan 18, 2018 at 09:46:29AM +0100, Thomas Monjalon wrote:
> > 18/01/2018 08:35, Yuanhan Liu:
> > > On Wed, Jan 17, 2018 at 12:34:08PM +, Ferruh Yigit wrote:
> > > > So does it make sense to separate them logically? Perhaps as "device 
> > > > identifier"
> > > > and "device args".
> > > 
> > > Then I think it returns back to the old issue: how could we identify a
> > > port when the bus id (say BDF for PCI bus) is not enough for identifying
> > > a port? Such case could happen when a single NIC has 2 ports sharing
> > > the same BDF. It could also happen with the VF representors that will
> > > be introduced shortly.
> > 
> > Yes, the device matching syntax must include bus category, class category
> > and driver category. So any device can be identified in future.
> > 
> > But I think Ferruh is talking about separating device matching
> > (which is described in this proposal) and device settings
> > (which are usually mixed in -w and --vdev options).
> > I agree there are different things and may be separate.
> > They could share the same syntax (bus/class/driver) but be separate
> > with a semicolon:
> > matching;settings
>
> Can you give an example?

Let's take port addition in OVS-DPDK as an example. It happens in 2
steps:
- port lookup (if port is already probed)
- dev attachment (if lookup fails)

And also let's assume we need probe a ConnectX-3 port. Note that for
ConnectX-3, there are 2 ports sharing the same PCI addr. Thus, PCI
BDF is not enough. And let's assume we use another extra property
"port".

If the proposal described in this patch is being used, the devarg
would look like following:

bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,...

Then "bus=pci,id=04:00.0/class=eth,port=0" will be used for lookup,
It means we are looking for a port with PCI BDF == 04:00.0 AND
port == 0 (the first port of the 2 ports).

Note that in my proposal the driver category is not intended for lookup.
If any properties needed be looked in the driver category, they would
probably need be elevated to the class category.

If port not found, then the whole string will be used for dev attachment.
It means we are attaching a port with PCI BDF == 04.00.0 AND
port == 0 (the 2nd port will not be attached).


And here is how the devargs would look like if "matching;settings" is
being used:


bus=pci,id=04:00.0/class=eth,port=0;bus=pci,id=04:00.0/class=eth,port=0/driver=mlx4,mlx4_arg_A=val,...

The part before ";" will be used for lookup and the later part will be
used for attachment. It should work. It just looks redundant.

-yliu


[dpdk-dev] [PATCH] net/mlx5: use PCI BDF as the port name

2018-01-22 Thread Yuanhan Liu
It is suggested to use PCI BDF to identify a port for port addition
in OVS-DPDK. While mlx5 has its own naming style: name it by ib dev
name. This breaks the typical OVS DPDK use case and brings more puzzle
to the end users.

To fix it, this patch changes it to use PCI BDF as the name, too.
Also, a postfix " port %u" is added, just in case their might be more
than 1 port assoicated with a PCI device.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/mlx5/mlx5.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9d1de36..7cd9db7 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -610,6 +610,8 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
INFO("%u port(s) detected", device_attr.orig_attr.phys_port_cnt);
 
for (i = 0; i < device_attr.orig_attr.phys_port_cnt; i++) {
+   char name[RTE_ETH_NAME_MAX_LEN];
+   int len;
uint32_t port = i + 1; /* ports are indexed from one */
uint32_t test = (1 << i);
struct ibv_context *ctx = NULL;
@@ -633,14 +635,15 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
.inline_max_packet_sz = MLX5_ARG_UNSET,
};
 
+   len = snprintf(name, sizeof(name), PCI_PRI_FMT,
+pci_dev->addr.domain, pci_dev->addr.bus,
+pci_dev->addr.devid, pci_dev->addr.function);
+   if (device_attr.orig_attr.phys_port_cnt > 1)
+   snprintf(name + len, sizeof(name), " port %u", i);
+
mlx5_dev[idx].ports |= test;
 
if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
-   /* from rte_ethdev.c */
-   char name[RTE_ETH_NAME_MAX_LEN];
-
-   snprintf(name, sizeof(name), "%s port %u",
-ibv_get_device_name(ibv_dev), port);
eth_dev = rte_eth_dev_attach_secondary(name);
if (eth_dev == NULL) {
ERROR("can not attach rte ethdev");
@@ -834,14 +837,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
priv_get_mtu(priv, &priv->mtu);
DEBUG("port %u MTU is %u", priv->port, priv->mtu);
 
-   /* from rte_ethdev.c */
-   {
-   char name[RTE_ETH_NAME_MAX_LEN];
-
-   snprintf(name, sizeof(name), "%s port %u",
-ibv_get_device_name(ibv_dev), port);
-   eth_dev = rte_eth_dev_allocate(name);
-   }
+   eth_dev = rte_eth_dev_allocate(name);
if (eth_dev == NULL) {
ERROR("can not allocate rte ethdev");
err = ENOMEM;
-- 
2.7.4



Re: [dpdk-dev] [PATCH] vhost: fix ANY_LAYOUT declaration

2018-01-19 Thread Yuanhan Liu
On Fri, Jan 19, 2018 at 02:02:50PM -0500, Zhihong Wang wrote:
> The VIRTIO_F_ANY_LAYOUT feature indicates the device accepts arbitrary
> descriptor layouts. The vhost-user lib already supports it, but the 
> feature declaration is missing. This patch fixes the mismatch.

I remembered there was a long discussion one year ago, that we can't
blindly set this feature, as this flag is reserved (thus should not
be set) for virtio 1.0.

We should set it when v1.0 is not enabled. However, as you know,
v1.0 is also enabled by default in DPDK vhost-user.

--yliu
> 
> Signed-off-by: Zhihong Wang 
> ---
>  lib/librte_vhost/vhost.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index b2bf0e8..57a9bea 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -170,6 +170,7 @@ struct vhost_msg {
>  
>  /* Features supported by this builtin vhost-user net driver. */
>  #define VIRTIO_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
> + (1ULL << VIRTIO_F_ANY_LAYOUT) | \
>   (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
>   (1ULL << VIRTIO_NET_F_CTRL_RX) | \
>   (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
> -- 
> 2.7.5


Re: [dpdk-dev] [PATCH] reset src fd field to -1 in fdset_move of vhost

2018-01-19 Thread Yuanhan Liu
On Thu, Dec 21, 2017 at 05:15:40PM +0800, Bing Zhao wrote:
> In the fdset_move, after copying the fd&rwfds from the src to the dst, the fd 
> should be set to -1. Or else in some cases, there will be a fault missing. 
> E.g:
> Before: 1 -1 3 4 -1 6 7 -1 9 10
> After: 1 10 3 4 9 6 7 -1 9 10
> Then the index7 will be returned correctly for the first time, but if another 
> fd is to be added, it will fail.

Hi,

Have you met a real issue? I'm a bit doubt about that, since the fd array
is also guarded by "pfdset->num", which makes sure we will not access
those invalid entries (i.e. the last 2 entries in above example).

--yliu

> Signed-off-by: Bing Zhao 
> ---
>  lib/librte_vhost/fd_man.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/librte_vhost/fd_man.c b/lib/librte_vhost/fd_man.c
> index 4c6fed418..48594dd7f 100644
> --- a/lib/librte_vhost/fd_man.c
> +++ b/lib/librte_vhost/fd_man.c
> @@ -63,6 +63,7 @@ fdset_move(struct fdset *pfdset, int dst, int src)
>  {
>   pfdset->fd[dst]= pfdset->fd[src];
>   pfdset->rwfds[dst] = pfdset->rwfds[src];
> + pfdset->fd[src].fd = -1;
>  }
>  
>  static void
> -- 
> 2.11.0.windows.1
> 


Re: [dpdk-dev] [PATCH v5] vhost_user: protect active rings from async ring changes

2018-01-19 Thread Yuanhan Liu
On Wed, Jan 17, 2018 at 03:49:25PM +0200, Victor Kaplansky wrote:
> When performing live migration or memory hot-plugging,
> the changes to the device and vrings made by message handler
> done independently from vring usage by PMD threads.
> 
> This causes for example segfaults during live-migration
> with MQ enable, but in general virtually any request
> sent by qemu changing the state of device can cause
> problems.

Applied to dpdk-next-virtio, with

Cc: sta...@dpdk.org

Thanks.

--yliu
> 
> These patches fixes all above issues by adding a spinlock
> to every vring and requiring message handler to start operation
> only after ensuring that all PMD threads related to the device
> are out of critical section accessing the vring data.
> 
> Each vring has its own lock in order to not create contention
> between PMD threads of different vrings and to prevent
> performance degradation by scaling queue pair number.
> 
> See https://bugzilla.redhat.com/show_bug.cgi?id=1450680
> 
> Signed-off-by: Victor Kaplansky 
> ---
> v5:
>  o get rid of spinlock wrapping functions in vhost.h
> 
> v4:
> 
>  o moved access_unlock before accessing enable flag and
>access_unlock after iommu_unlock consistently.
>  o cosmetics: removed blank line.
>  o the access_lock variable moved to be in the same
>cache line with enable and access_ok flags.
>  o dequeue path is now guarded with trylock and returning
>zero if unsuccessful.
>  o GET_VRING_BASE operation is not guarded by access lock
>to avoid deadlock with device_destroy. See the comment
>in the code.
>  o Fixed error path exit from enqueue and dequeue carefully
>unlocking access and iommu locks as appropriate.
> 
> v3:
>o Added locking to enqueue flow.
>o Enqueue path guarded as well as dequeue path.
>o Changed name of active_lock.
>o Added initialization of guarding spinlock.
>o Reworked functions skimming over all virt-queues.
>o Performance measurements done by Maxime Coquelin shows
>  no degradation in bandwidth and throughput.
>o Spelling.
>o Taking lock only on set operations.
>o IOMMU messages are not guarded by access lock.
> 
> v2:
>o Fixed checkpatch complains.
>o Added Signed-off-by.
>o Refined placement of guard to exclude IOMMU messages.
>o TODO: performance degradation measurement.
> 
>  lib/librte_vhost/vhost.h  |  6 ++--
>  lib/librte_vhost/vhost.c  |  1 +
>  lib/librte_vhost/vhost_user.c | 70 
> +++
>  lib/librte_vhost/virtio_net.c | 28 ++---
>  4 files changed, 99 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 1cc81c17..c8f2a817 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -108,12 +108,14 @@ struct vhost_virtqueue {
>  
>   /* Backend value to determine if device should started/stopped */
>   int backend;
> + int enabled;
> + int access_ok;
> + rte_spinlock_t  access_lock;
> +
>   /* Used to notify the guest (trigger interrupt) */
>   int callfd;
>   /* Currently unused as polling mode is enabled */
>   int kickfd;
> - int enabled;
> - int access_ok;
>  
>   /* Physical address of used ring, for logging */
>   uint64_tlog_guest_addr;
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 4f8b73a0..dcc42fc7 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -259,6 +259,7 @@ alloc_vring_queue(struct virtio_net *dev, uint32_t 
> vring_idx)
>  
>   dev->virtqueue[vring_idx] = vq;
>   init_vring_queue(dev, vring_idx);
> + rte_spinlock_init(&vq->access_lock);
>  
>   dev->nr_vring += 1;
>  
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index f4c7ce46..0685d4e7 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1190,12 +1190,47 @@ vhost_user_check_and_alloc_queue_pair(struct 
> virtio_net *dev, VhostUserMsg *msg)
>   return alloc_vring_queue(dev, vring_idx);
>  }
>  
> +static void
> +vhost_user_lock_all_queue_pairs(struct virtio_net *dev)
> +{
> + unsigned int i = 0;
> + unsigned int vq_num = 0;
> +
> + while (vq_num < dev->nr_vring) {
> + struct vhost_virtqueue *vq = dev->virtqueue[i];
> +
> + if (vq) {
> + rte_spinlock_lock(&vq->access_lock);
> + vq_num++;
> + }
> + i++;
> + }
> +}
> +
> +static void
> +vhost_user_unlock_all_queue_pairs(struct virtio_net *dev)
> +{
> + unsigned int i = 0;
> + unsigned int vq_num = 0;
> +
> + while (vq_num < dev->nr_vring) {
> + struct vhost_virtqueue *vq = dev->virtqueue[i];
> +
> + if (vq) {
> +   

Re: [dpdk-dev] [PATCH 00/12] lib/librte_vhost: introduce new vhost_user crypto

2018-01-18 Thread Yuanhan Liu
On Mon, Nov 27, 2017 at 08:01:03PM +, Fan Zhang wrote:
> This patchset adds crypto backend suppport to vhost_user library,
> including a proof-of-concept sample application. The implementation
> follows the virtio-crypto specification and have been tested
> with qemu 2.9.50 (with several patches applied, detailed later)
> with Fedora 24 running in the frontend.
> 
> The vhost_user library acts as a "bridge" method that translate
> the virtio-crypto crypto requests to DPDK crypto operations, so it
> is purely software implementation. However it does require the user
> to provide the DPDK Cryptodev ID so it knows how to handle the
> virtio-crypto session creation and deletion mesages.
> 
> Currently the implementation supports AES-CBC-128 and HMAC-SHA1
> cipher only/chaining modes and does not support sessionless mode
> yet. The guest can use standard virtio-crypto driver to set up
> session and sends encryption/decryption requests to backend. The
> vhost-crypto sample application provided in this patchset will
> do the actual crypto work.
> 
> To make this patchset working, a few tweaks need to be done:
> 
> In the host:
> 1. Download the qemu source code, and apply the patches in:
> list.nongnu.org/archive/html/qemu-devel/2017-07/msg04664.html.

I could not open it. What's the status of them now? Have they got
merged?

Besides that, this patchset doesn't seem well organized. For example,
Makefile, version_map, config file, doc (etc.) are all updated separately.

Could you do a re-organization? Personally, I'd suggest something below:

- one patch for introduce extra msg_handler
- one patch for implementing vhost_crypto (release doc should also be
  updated here).
- one patch for adding an example (example doc should also be included
  here).

Of course, if you could split 2nd item a bit further, that'd be better.

Also, the new functions should be marked as "experimental".

--yliu

> As the patches have been out for over 2 months, they cannot be
> cleanly applied to the latest code. I have reverted the commits
> back to 4c4414a to make the patches applied cleanly.
> 
> 2. Recompile your qemu.
> 
> 3. Apply this patchset to latest DPDK code and recompile DPDK.
> 
> 4. Compile and run vhost-crypto sample application.
> 
> ./examples/vhost_crypto/build/vhost-crypto -l 11,12,13 -w :86:01.0 \
>  --socket-mem 2048,2048
> 
> Where :86:01.0 is the QAT PCI address. You may use AES-NI-MB if it is
> not available. The sample application requires 3 lcores: 1 master, 1 worker,
> and 1 statistics displaying worker. The application will create a UNIX socket
> file /tmp/vhost_crypto1.socket.
> 
> 5. Start your qemu application. Here is my command:
> 
> qemu/x86_64-softmmu/qemu-system-x86_64 -machine accel=kvm -cpu host \
> -smp 2 -m 1G -hda ~/path-to-your/image.qcow \
> -object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on \
> -mem-prealloc -numa node,memdev=mem -chardev \
> socket,id=charcrypto0,path=/tmp/vhost_crypto1.socket \
> -object cryptodev-vhost-user,id=cryptodev0,chardev=charcrypto0 \
> -device virtio-crypto-pci,id=crypto0,cryptodev=cryptodev0 
> 
> 6. Once guest is booted. The Linux virtio_crypto kernel module is loaded by
> default. You shall see the following logs in your demsg:
> 
> [   17.611723] virtio_crypto virtio0: max_queues: 1, max_cipher_key_len: ...
> [   17.612156] virtio_crypto virtio0: will run requests pump with realtime ...
> [   18.376100] virtio_crypto virtio0: Accelerator is ready
> 
> The virtio_crypto driver in the guest is now up and running.
> 
> 7. The rest steps can be as same as the Testing section in
> https://wiki.qemu.org/Features/VirtioCrypto
> 
> Fan Zhang (12):
>   lib/librte_vhost: add private data field
>   lib/librte_vhost: add vhost_user private info structure
>   lib/librte_vhost: add virtio crypto user message structure
>   lib/librte_vhost: add session messsage handler
>   lib/librte_vhost: add request handler
>   lib/librte_vhost: add head file
>   lib/librte_vhost: add public function implementation
>   lib/librte_vhost: update version map
>   lib/librte_vhost: update makefile
>   config: added no copy configuration item
>   example/vhost_crypto: add vhost_crypto sample application
>   doc: update vhost crypto documentation
> 
>  config/common_base|1 +
>  doc/guides/prog_guide/vhost_lib.rst   |   16 +
>  doc/guides/rel_notes/release_18_02.rst|7 +
>  doc/guides/sample_app_ug/index.rst|1 +
>  doc/guides/sample_app_ug/vhost_crypto.rst |   95 +++
>  examples/vhost_crypto/Makefile|   59 ++
>  examples/vhost_crypto/main.c  |  588 +
>  lib/librte_vhost/Makefile |6 +-
>  lib/librte_vhost/rte_vhost_crypto.h   |  150 
>  lib/librte_vhost/rte_vhost_version.map|   11 +
>  lib/librte_vhost/vhost.h  |4 +-
>  lib/librte_vhost/vhost_crypto.c   | 1314 
> +
>  lib/librte_vh

Re: [dpdk-dev] [PATCH 03/12] lib/librte_vhost: add virtio crypto user message structure

2018-01-18 Thread Yuanhan Liu
On Mon, Nov 27, 2017 at 08:01:06PM +, Fan Zhang wrote:
> This patch adds virtio-crypto spec user message structure to
> vhost_user.
> 
> Signed-off-by: Fan Zhang 
> ---
>  lib/librte_vhost/vhost_user.h | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> index 990b40a..793cdc2 100644
> --- a/lib/librte_vhost/vhost_user.h
> +++ b/lib/librte_vhost/vhost_user.h
> @@ -108,6 +108,30 @@ typedef struct VhostUserLog {
>   uint64_t mmap_offset;
>  } VhostUserLog;
>  
> +/* Comply with Cryptodev-Linux */
> +#define VHOST_USER_CRYPTO_MAX_HMAC_KEY_LENGTH512
> +#define VHOST_USER_CRYPTO_MAX_CIPHER_KEY_LENGTH  64
> +
> +/* Same structure as vhost-user backend session info */
> +typedef struct VhostUserCryptoSessionParam {
> + int64_t session_id;
> + uint32_t op_code;
> + uint32_t cipher_algo;
> + uint32_t cipher_key_len;
> + uint32_t hash_algo;
> + uint32_t digest_len;
> + uint32_t auth_key_len;
> + uint32_t aad_len;
> + uint8_t op_type;
> + uint8_t dir;
> + uint8_t hash_mode;
> + uint8_t chaining_dir;
> + uint8_t *ciphe_key;
> + uint8_t *auth_key;
> + uint8_t cipher_key_buf[VHOST_USER_CRYPTO_MAX_CIPHER_KEY_LENGTH];
> + uint8_t auth_key_buf[VHOST_USER_CRYPTO_MAX_HMAC_KEY_LENGTH];
> +} VhostUserCryptoSessionParam;

I'd suggest to merge this patch (which defines the struct) and the next one
(which uses it) into one patch.

--yliu
> +
>  typedef struct VhostUserMsg {
>   union {
>   VhostUserRequest master;
> @@ -128,6 +152,7 @@ typedef struct VhostUserMsg {
>   VhostUserMemory memory;
>   VhostUserLoglog;
>   struct vhost_iotlb_msg iotlb;
> + VhostUserCryptoSessionParam crypto_session;
>   } payload;
>   int fds[VHOST_MEMORY_MAX_NREGIONS];
>  } __attribute((packed)) VhostUserMsg;
> -- 
> 2.9.5


Re: [dpdk-dev] [PATCH 02/12] lib/librte_vhost: add vhost_user private info structure

2018-01-18 Thread Yuanhan Liu
On Mon, Nov 27, 2017 at 08:01:05PM +, Fan Zhang wrote:
> This patch adds a vhost_user_dev_priv structure and a vhost_user
> message handler function prototype to vhost_user. This allows
> different types of devices to add private information and their
> device-specific vhost-user message function handlers to
> virtio_net structure. The change to vhost_user_msg_handler is
> also added to call the device-specific message handler.
> 
> Signed-off-by: Fan Zhang 
> ---
>  lib/librte_vhost/vhost_user.c | 13 -
>  lib/librte_vhost/vhost_user.h |  9 -
>  2 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index f4c7ce4..c5ae85f 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1337,7 +1337,18 @@ vhost_user_msg_handler(int vid, int fd)
>   break;
>  
>   default:

Hi Fan,

Firstly, apologize for the very late review!

> - ret = -1;
> + if (!dev->private_data)
> + ret = -1;
> + else {
> + struct vhost_user_dev_priv *priv = dev->private_data;
> +
> + if (!priv->vhost_user_msg_handler)
> + ret = -1;
> + else {
> + ret = (*priv->vhost_user_msg_handler)(dev,
> + &msg, fd);

I think this is more complex than necessary. You could just add msg_handler
in the virtio_net struct and do:

if (dev->msg_handler)
ret = dev->msg_handler(dev, &msg, fd);
else
ret = -1;

> + }
> + }
>   break;
>  
>   }
> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> index 76d9fe2..990b40a 100644
> --- a/lib/librte_vhost/vhost_user.h
> +++ b/lib/librte_vhost/vhost_user.h
> @@ -81,7 +81,7 @@ typedef enum VhostUserRequest {
>   VHOST_USER_NET_SET_MTU = 20,
>   VHOST_USER_SET_SLAVE_REQ_FD = 21,
>   VHOST_USER_IOTLB_MSG = 22,
> - VHOST_USER_MAX
> + VHOST_USER_MAX = 25

That's weird, we should list all VHOST_USER messages here.

--yliu

>  } VhostUserRequest;
>  
>  typedef enum VhostUserSlaveRequest {
> @@ -137,6 +137,13 @@ typedef struct VhostUserMsg {
>  /* The version of the protocol we support */
>  #define VHOST_USER_VERSION0x1
>  
> +typedef int (*msg_handler)(struct virtio_net *dev, struct VhostUserMsg *msg,
> + int fd);
> +
> +struct vhost_user_dev_priv {
> + msg_handler vhost_user_msg_handler;
> + char data[0];
> +};
>  
>  /* vhost_user.c */
>  int vhost_user_msg_handler(int vid, int fd);
> -- 
> 2.9.5


Re: [dpdk-dev] [PATCH 1/3] net/virtio: fix typo in function name

2018-01-18 Thread Yuanhan Liu
On Thu, Jan 18, 2018 at 02:45:48PM +0100, Olivier Matz wrote:
> On Thu, Jan 18, 2018 at 09:27:10PM +0800, Yuanhan Liu wrote:
> > On Thu, Jan 18, 2018 at 10:07:31AM +0100, Olivier Matz wrote:
> > > Fixes: c1f86306a026 ("virtio: add new driver")
> > > Cc: sta...@dpdk.org
> > 
> > I would not suggest to include such patch for a stable release. It doesn't
> > fix a real issue.
> 
> Yes.
> 
> I've included it in the patchset to avoid conflicts for backports because it
> changes the same code area.
> 
> If you prefer, I can send a new patchset with the current 2/3 and 3/3, plus
> the typo fix as a individual patch, on top of them.
> 
> Does it look good to you?

Yes, that's the reason I personaly always put the bug fix patches in
head of a patchset, for avoiding conflicts like this.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH 2/3] net/virtio: rationalize queue flushing

2018-01-18 Thread Yuanhan Liu
On Thu, Jan 18, 2018 at 02:55:44PM +0100, Olivier Matz wrote:
> On Thu, Jan 18, 2018 at 09:26:09PM +0800, Yuanhan Liu wrote:
> > Hi Oliver,
> > 
> > On Thu, Jan 18, 2018 at 10:07:32AM +0100, Olivier Matz wrote:
> > > Rationalize the function virtio_dev_free_mbufs():
> > > 
> > > - skip NULL vqs instead of crashing: this is required for the
> > >   next commit
> > > - use the same kind of loop than in virtio_free_queues()
> > > - also flush mbufs from the control queue (this is useless yet)
> > 
> > Could we just do "nr_vq = virtio_get_nr_vq(hw) - 1" with a comment that
> > CQ is excluded, for skipping the CQ?
> 
> Is "nr_vq = virtio_get_nr_vq(hw) - 1" always valid?
> Shouldn't we do this check?
>   if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ))
> 
> Instead, I suggest this:
> 
>   queue_type = virtio_get_queue_type(hw, i);
>   if (queue_type == VTNET_RQ)
>   type = "rxq";
>   else if (queue_type == VTNET_TQ)
>   type = "txq";
>   else
> - type = "cq";
> + continue;

Yes, this is better.

> 
> > > - factorize common code between rxq, txq, cq
> > > 
> > > Cc: sta...@dpdk.org
> > 
> > Could you split the patch two 2:
> > 
> > - one for fixing the crash (skip the NULL vqs). We only need this one
> >   for stable release.
> > - another one for the refactoring
> 
> Yes, do you want them all in the same patchset?

I think it's okay.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH v3] lib/librte_vhost: move fdset_del out of conn_mutex

2018-01-18 Thread Yuanhan Liu
Hi,

Apologize for late review.

On Tue, Jan 02, 2018 at 02:08:36AM -0800, zhike wang wrote:
> From: wang zhike 
> 
> v3:
> * Fix duplicate variable name, which leads to unexpected memory write.
> v2:
> * Move fdset_del before conn destroy.
> * Fix coding style.

Note that we prefer to put the change logs after "---" below Signed-off-by,
so that those change logs won't be tracked in the git log history.

> This patch fixes below race condition:
> 1. one thread calls: rte_vhost_driver_unregister->lock conn_mutex
>->fdset_del->loop to check fd.busy.
> 2. another thread calls fdset_event_dispatch, and the busy flag is
>changed AFTER handling on the fd, i.e, rcb(). However, the rcb,
>such as vhost_user_read_cb() would try to retrieve the conn_mutex.
> 
> So issue is that the 1st thread will loop check the flag while holding
> the mutex, while the 2nd thread would be blocked by mutex and can not
> change the flag. Then dead lock is observed.

I then would change the title to "vhost: fix deadlock".

I'm also keen to know how do you reproduce this issue with real-life
APP (say ovs) and how easy it is for reproduce.

> Signed-off-by: zhike wang 

Again, you need fix your git config file about your name.

> ---
>  lib/librte_vhost/socket.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 422da00..ea01327 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list {
>   struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
>  
>   if (!strcmp(vsocket->path, path)) {
> + int del_fds[MAX_FDS];
> + int num_of_fds = 0, fd_index;
> +

I think the naming could be a bit shorter, like "fds, nr_fds (or nb_fds),
fd_idx".

>   if (vsocket->is_server) {
>   fdset_del(&vhost_user.fdset, 
> vsocket->socket_fd);
>   close(vsocket->socket_fd);
> @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list {
>   vhost_user_remove_reconnect(vsocket);
>   }
>  
> + /* fdset_del() must be called without conn_mutex. */
> + pthread_mutex_lock(&vsocket->conn_mutex);
> + for (conn = TAILQ_FIRST(&vsocket->conn_list);
> +  conn != NULL;
> +  conn = next) {
> + next = TAILQ_NEXT(conn, next);
> +
> + del_fds[num_of_fds++] = conn->connfd;
> + }
> + pthread_mutex_unlock(&vsocket->conn_mutex);
> +
> + for (fd_index = 0; fd_index < num_of_fds; fd_index++)
> + fdset_del(&vhost_user.fdset, del_fds[fd_index]);
> +
>   pthread_mutex_lock(&vsocket->conn_mutex);
>   for (conn = TAILQ_FIRST(&vsocket->conn_list);
>conn != NULL;
>conn = next) {
>   next = TAILQ_NEXT(conn, next);
>  
> - fdset_del(&vhost_user.fdset, conn->connfd);

If you log the fd here and invoke fdset_del() and close() after the loop,
you then could avoid one extra loop as you did above.

--yliu
>   RTE_LOG(INFO, VHOST_CONFIG,
>   "free connfd = %d for device '%s'\n",
>   conn->connfd, path);
> -- 
> 1.8.3.1


Re: [dpdk-dev] [PATCH 1/3] net/virtio: fix typo in function name

2018-01-18 Thread Yuanhan Liu
On Thu, Jan 18, 2018 at 10:07:31AM +0100, Olivier Matz wrote:
> Fixes: c1f86306a026 ("virtio: add new driver")
> Cc: sta...@dpdk.org

I would not suggest to include such patch for a stable release. It doesn't
fix a real issue.

Thanks.

--yliu
> 
> Signed-off-by: Olivier Matz 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 4 ++--
>  drivers/net/virtio/virtqueue.c | 2 +-
>  drivers/net/virtio/virtqueue.h | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> b/drivers/net/virtio/virtio_ethdev.c
> index ec012c2ac..c7426951c 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1879,7 +1879,7 @@ static void virtio_dev_free_mbufs(struct rte_eth_dev 
> *dev)
>   VIRTQUEUE_DUMP(rxvq->vq);
>  
>   PMD_INIT_LOG(DEBUG, "rx_queues[%d]=%p", i, rxvq);
> - while ((buf = virtqueue_detatch_unused(rxvq->vq)) != NULL) {
> + while ((buf = virtqueue_detach_unused(rxvq->vq)) != NULL) {
>   rte_pktmbuf_free(buf);
>   mbuf_num++;
>   }
> @@ -1899,7 +1899,7 @@ static void virtio_dev_free_mbufs(struct rte_eth_dev 
> *dev)
>   VIRTQUEUE_DUMP(txvq->vq);
>  
>   mbuf_num = 0;
> - while ((buf = virtqueue_detatch_unused(txvq->vq)) != NULL) {
> + while ((buf = virtqueue_detach_unused(txvq->vq)) != NULL) {
>   rte_pktmbuf_free(buf);
>   mbuf_num++;
>   }
> diff --git a/drivers/net/virtio/virtqueue.c b/drivers/net/virtio/virtqueue.c
> index 1ada4fe08..6988bfea4 100644
> --- a/drivers/net/virtio/virtqueue.c
> +++ b/drivers/net/virtio/virtqueue.c
> @@ -16,7 +16,7 @@
>   * 2) mbuf that hasn't been consued by backend.
>   */
>  struct rte_mbuf *
> -virtqueue_detatch_unused(struct virtqueue *vq)
> +virtqueue_detach_unused(struct virtqueue *vq)
>  {
>   struct rte_mbuf *cookie;
>   int idx;
> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
> index fedbaa39c..1288e5287 100644
> --- a/drivers/net/virtio/virtqueue.h
> +++ b/drivers/net/virtio/virtqueue.h
> @@ -270,7 +270,7 @@ void virtqueue_dump(struct virtqueue *vq);
>  /**
>   *  Get all mbufs to be freed.
>   */
> -struct rte_mbuf *virtqueue_detatch_unused(struct virtqueue *vq);
> +struct rte_mbuf *virtqueue_detach_unused(struct virtqueue *vq);
>  
>  /* Flush the elements in the used ring. */
>  void virtqueue_rxvq_flush(struct virtqueue *vq);
> -- 
> 2.11.0


Re: [dpdk-dev] [PATCH 2/3] net/virtio: rationalize queue flushing

2018-01-18 Thread Yuanhan Liu
Hi Oliver,

On Thu, Jan 18, 2018 at 10:07:32AM +0100, Olivier Matz wrote:
> Rationalize the function virtio_dev_free_mbufs():
> 
> - skip NULL vqs instead of crashing: this is required for the
>   next commit
> - use the same kind of loop than in virtio_free_queues()
> - also flush mbufs from the control queue (this is useless yet)

Could we just do "nr_vq = virtio_get_nr_vq(hw) - 1" with a comment that
CQ is excluded, for skipping the CQ?

> - factorize common code between rxq, txq, cq
> 
> Cc: sta...@dpdk.org

Could you split the patch two 2:

- one for fixing the crash (skip the NULL vqs). We only need this one
  for stable release.
- another one for the refactoring

Thanks.

--yliu


Re: [dpdk-dev] [PATCH v2] vhost: dequeue zero copy should restore mbuf before return to pool

2018-01-18 Thread Yuanhan Liu
On Thu, Jan 18, 2018 at 09:44:41AM +0100, Maxime Coquelin wrote:
> 
> 
> On 01/17/2018 04:45 PM, Junjie Chen wrote:
> >dequeue zero copy change buf_addr and buf_iova of mbuf, and return
> >to mbuf pool without restore them, it breaks vm memory if others allocate
> >mbuf from same pool since mbuf reset doesn't reset buf_addr and buf_iova.
> >
> >Signed-off-by: Junjie Chen 
> >---
> >v2 changes:
> >Remove useless restore
> >  lib/librte_vhost/virtio_net.c | 17 +
> >  1 file changed, 17 insertions(+)
> >
> 
> Reviewed-by: Maxime Coquelin 

Applied to dpdk-next-virtio, with below addings:

Fixes: b0a985d1f340 ("vhost: add dequeue zero copy")
Cc: sta...@dpdk.org

Thanks.

--yliu


Re: [dpdk-dev] [PATCH] lib/librte_vhost: mov enum definition from PMD to lib

2018-01-18 Thread Yuanhan Liu
On Fri, Jan 12, 2018 at 04:12:12PM +0800, Zhiyong Yang wrote:
> The enum definition is placed in librte_vhost in order to avoid many
> duplication definitions in PMD and example code everywhere.

I moved them out from rte_vhost.h on purpose, for they are virtio-net
specific while this library is not.

So nack.

--yliu


Re: [dpdk-dev] [PATCH 1/2] net: fixup RARP generation

2018-01-18 Thread Yuanhan Liu
On Thu, Jan 18, 2018 at 09:38:39AM +0100, Thomas Monjalon wrote:
> 18/01/2018 04:14, Yuanhan Liu:
> > Due to a mistake operation from me, older version (v10) was merged to
> > master branch. It's the v11 should be applied. However, the master branch
> > is not rebase-able. Thus, this patch is made, from the diff between v10
> > and v11.
> 
> Understood it is a mistake.
> However, you can briefly describes what does this change.
> Is there a changelog in v11 patch?

Yes, ther is:

v11:
- Add check for parameter and tailroom in rte_net_make_rarp_packet.
- Allocate mbuf in rte_net_make_rarp_packet.

> > 
> > Code is from Xiao Wang.
> 
> You may add his Signed-off.

I have no objection. Xiao, okay to you? I will also set the author
to you.

--yliu

> > Signed-off-by: Yuanhan Liu 


Re: [dpdk-dev] [PATCH 2/2] net: fix build error

2018-01-18 Thread Yuanhan Liu
On Thu, Jan 18, 2018 at 09:36:46AM +0100, Thomas Monjalon wrote:
> 18/01/2018 09:03, Yuanhan Liu:
> > On Thu, Jan 18, 2018 at 07:45:23AM +, Wang, Xiao W wrote:
> > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > > 18/01/2018 04:14, Yuanhan Liu:
> > > > > Fix build error when shared lib is enabled:
> > > > >
> > > > >   LD librte_net.so.1.1
> > > > > rte_arp.o: In function `rte_net_make_rarp_packet':
> > > > > rte_arp.c:(.text+0x1f0): undefined reference to 
> > > > > `rte_mempool_ops_table'
> > > > > rte_arp.c:(.text+0x21d): undefined reference to 
> > > > > `rte_mempool_ops_table'
> > > > > rte_arp.c:(.text+0x2d5): undefined reference to 
> > > > > `rte_mempool_ops_table'
> > > > > rte_arp.c:(.text+0x384): undefined reference to 
> > > > > `rte_mempool_ops_table'
> > > > > rte_arp.c:(.text+0x4b7): undefined reference to 
> > > > > `rte_mempool_ops_table'
> > > > 
> > > > This is very strange, I do not see this error on my machine.
> > > 
> > > I could see this error on mine with:
> > > +CONFIG_RTE_BUILD_SHARED_LIB=y
> > 
> > Yes, that's what meant in the commit log by "when shared lib is enabled".
> 
> Got it: you are fixing a build issue introduced by the patch 1
> in this series.
> So please merge them.

Right. I will merge them.

--yliu


Re: [dpdk-dev] [PATCH 2/2] net: fix build error

2018-01-18 Thread Yuanhan Liu
On Thu, Jan 18, 2018 at 07:45:23AM +, Wang, Xiao W wrote:
> 
> 
> > -Original Message-
> > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > Sent: Thursday, January 18, 2018 3:39 PM
> > To: Yuanhan Liu 
> > Cc: dev@dpdk.org; Wang, Xiao W ; Yigit, Ferruh
> > ; Olivier Matz 
> > Subject: Re: [PATCH 2/2] net: fix build error
> > 
> > 18/01/2018 04:14, Yuanhan Liu:
> > > Fix build error when shared lib is enabled:
> > >
> > >   LD librte_net.so.1.1
> > > rte_arp.o: In function `rte_net_make_rarp_packet':
> > > rte_arp.c:(.text+0x1f0): undefined reference to `rte_mempool_ops_table'
> > > rte_arp.c:(.text+0x21d): undefined reference to `rte_mempool_ops_table'
> > > rte_arp.c:(.text+0x2d5): undefined reference to `rte_mempool_ops_table'
> > > rte_arp.c:(.text+0x384): undefined reference to `rte_mempool_ops_table'
> > > rte_arp.c:(.text+0x4b7): undefined reference to `rte_mempool_ops_table'
> > 
> > This is very strange, I do not see this error on my machine.
> 
> I could see this error on mine with:
> +CONFIG_RTE_BUILD_SHARED_LIB=y

Yes, that's what meant in the commit log by "when shared lib is enabled".

--yliu


Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax

2018-01-17 Thread Yuanhan Liu
On Wed, Jan 17, 2018 at 12:34:08PM +, Ferruh Yigit wrote:
> On 1/16/2018 2:50 PM, Yuanhan Liu wrote:
> > This patch documents the new devargs syntax, which is going to be
> > implemented in DPDK v18.05.
> > 
> > The new devargs proposal is introduced for having a consistent
> > interface for:
> > 
> > - whitelisting/blacklisting devices
> > - identifying ports
> > - attaching/detaching devices
> 
> Hi Yuanhan,

Hi Ferruh,

> devargs = device arguments, the PMD specific arguments, similar to 
> module_param
> in Linux.

Not exactly. If you look at the function
rte_eth_dev_attach(const char *devargs, uint16_t *port_id), the "device
identifier" part you called also part of the devargs.

> 
> Currently only "--vdev" and -w/-b eal parameters parse proceeding strings as
> devargs.
> 
> Like: "--vdev "net_pcap,iface=lo" .
> For this case "iface=lo" device specific argument and available to use from 
> pcap
> PMD.
> 
> I agree it to have a consistent way to describe device, that makes better
> whitelist/blacklist support. But that part is not device args, more like 
> device
> identifier.
> 
> When you use this string with whitelist/blacklist I think you won't need
> "iface=lo" part,

Right. That's actaully mentioned in this patch:

   Before device is probed, only the bus category is relevant.

> only need first part. And when using with --vdev, (or perhaps
> with attach) you don't need to use first part
> "bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55", PMD already knows 
> it
> is in virtual bus and its class etc.

We are going to keep the compatibily, meaning we will leave "-w" and
"--vdev" as they are. We plan to introduce one more CLI option and
let user to switch to it so that it can work with any bus, not only
PCI and vdev bus.

> So does it make sense to separate them logically? Perhaps as "device 
> identifier"
> and "device args".

Then I think it returns back to the old issue: how could we identify a
port when the bus id (say BDF for PCI bus) is not enough for identifying
a port? Such case could happen when a single NIC has 2 ports sharing
the same BDF. It could also happen with the VF representors that will
be introduced shortly.

--yliu

> string can become:
> "device=bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55;driver=PMD_NAME,driverspecificproperty=VALUE"
> 
> specific usages can become:
> -w "device=bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55"
> --vdev "driver=PMD_NAME,driverspecificproperty=VALUE"
> 
> And store them in two separate storage, and eal or PMD can ask for "device
> identifier" or "device args" separately?
> 
> What do you think?
> 
> > 
> > Please check the patch content for the details. Also, here is link
> > for the background:
> > http://dpdk.org/ml/archives/dev/2017-November/082600.html
> > 
> > This syntax is suggestd by Thomas:
> > http://dpdk.org/ml/archives/dev/2017-December/084234.html
> > 
> > Signed-off-by: Yuanhan Liu 
> > ---
> >  doc/guides/prog_guide/env_abstraction_layer.rst | 34 
> > +
> >  1 file changed, 34 insertions(+)
> > 
> > diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst 
> > b/doc/guides/prog_guide/env_abstraction_layer.rst
> > index 34d871c..12f37f2 100644
> > --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> > +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> > @@ -213,6 +213,40 @@ device having emitted a Device Removal Event. In such 
> > case, calling
> >  callback. Care must be taken not to close the device from the interrupt 
> > handler
> >  context. It is necessary to reschedule such closing operation.
> >  
> > +Devargs
> > +~~~
> > +
> > +The ``devargs`` can be used for whitelisting/blacklisting devices, 
> > identifying
> > +DPDK ports and attaching/deatching devices. They all share the same syntax.
> > +
> > +It is split in 3 categories, where almost everything is optional key/value 
> > pairs:
> > +
> > +* bus (pci, vdev, vmbus, fslmc, etc)
> > +* class (eth, crypto, etc)
> > +* driver (i40e, mlx5, virtio, etc)
> > +
> > +The key/value pair describing the category scope is mandatory and must be 
> > the
> > +first pair in the category properties. Example: bus=pci, must be placed 
> > before
> > +id=:01:00.0.
> > +
> > +The syntax has below rules:
> > +
> > +* Between categories, the separator is a slash.
> > +* Inside a category, the separator is a comma.
> > +* Inside a key/value pair, the separator is an equal sign.
> > +* Each category can be used alone.
> > +
> > +Here is an example with all categories::
> > +
> > +
> > bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55/driver=PMD_NAME,driverspecificproperty=VALUE
> > +
> > +It can also be simple like below::
> > +
> > +class=eth,mac=00:11:22:33:44:55
> > +
> > +A device is identified when every properties are matched. Before device is
> > +probed, only the bus category is relevant.
> > +
> >  Blacklisting
> >  
> >  
> > 


[dpdk-dev] [PATCH 1/2] net: fixup RARP generation

2018-01-17 Thread Yuanhan Liu
Due to a mistake operation from me, older version (v10) was merged to
master branch. It's the v11 should be applied. However, the master branch
is not rebase-able. Thus, this patch is made, from the diff between v10
and v11.

Code is from Xiao Wang.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_net/rte_arp.c  | 26 +-
 lib/librte_net/rte_arp.h  | 11 ++-
 lib/librte_vhost/virtio_net.c | 12 +++-
 3 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/lib/librte_net/rte_arp.c b/lib/librte_net/rte_arp.c
index d7223b0..b953bcd 100644
--- a/lib/librte_net/rte_arp.c
+++ b/lib/librte_net/rte_arp.c
@@ -7,17 +7,28 @@
 #include 
 
 #define RARP_PKT_SIZE  64
-int
-rte_net_make_rarp_packet(struct rte_mbuf *mbuf, const struct ether_addr *mac)
+struct rte_mbuf *
+rte_net_make_rarp_packet(struct rte_mempool *mpool,
+   const struct ether_addr *mac)
 {
struct ether_hdr *eth_hdr;
struct arp_hdr *rarp;
+   struct rte_mbuf *mbuf;
 
-   if (mbuf->buf_len < RARP_PKT_SIZE)
-   return -1;
+   if (mpool == NULL)
+   return NULL;
+
+   mbuf = rte_pktmbuf_alloc(mpool);
+   if (mbuf == NULL)
+   return NULL;
+
+   eth_hdr = (struct ether_hdr *)rte_pktmbuf_append(mbuf, RARP_PKT_SIZE);
+   if (eth_hdr == NULL) {
+   rte_pktmbuf_free(mbuf);
+   return NULL;
+   }
 
/* Ethernet header. */
-   eth_hdr = rte_pktmbuf_mtod(mbuf, struct ether_hdr *);
memset(eth_hdr->d_addr.addr_bytes, 0xff, ETHER_ADDR_LEN);
ether_addr_copy(mac, ð_hdr->s_addr);
eth_hdr->ether_type = htons(ETHER_TYPE_RARP);
@@ -35,8 +46,5 @@ rte_net_make_rarp_packet(struct rte_mbuf *mbuf, const struct 
ether_addr *mac)
memset(&rarp->arp_data.arp_sip, 0x00, 4);
memset(&rarp->arp_data.arp_tip, 0x00, 4);
 
-   mbuf->data_len = RARP_PKT_SIZE;
-   mbuf->pkt_len = RARP_PKT_SIZE;
-
-   return 0;
+   return mbuf;
 }
diff --git a/lib/librte_net/rte_arp.h b/lib/librte_net/rte_arp.h
index dad7423..457a39b 100644
--- a/lib/librte_net/rte_arp.h
+++ b/lib/librte_net/rte_arp.h
@@ -82,16 +82,17 @@ struct arp_hdr {
  *
  * Make a RARP packet based on MAC addr.
  *
- * @param mbuf
- *   Pointer to the rte_mbuf structure
+ * @param mpool
+ *   Pointer to the rte_mempool
  * @param mac
  *   Pointer to the MAC addr
  *
  * @return
- *   - 0 on success, negative on error
+ *   - RARP packet pointer on success, or NULL on error
  */
-int
-rte_net_make_rarp_packet(struct rte_mbuf *mbuf, const struct ether_addr *mac);
+struct rte_mbuf *
+rte_net_make_rarp_packet(struct rte_mempool *mpool,
+   const struct ether_addr *mac);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index ca89288..a1d8026 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1162,19 +1162,13 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
rte_atomic16_cmpset((volatile uint16_t *)
&dev->broadcast_rarp.cnt, 1, 0))) {
 
-   rarp_mbuf = rte_pktmbuf_alloc(mbuf_pool);
+   rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
if (rarp_mbuf == NULL) {
RTE_LOG(ERR, VHOST_DATA,
-   "Failed to allocate memory for mbuf.\n");
+   "Failed to make RARP packet.\n");
return 0;
}
-
-   if (rte_net_make_rarp_packet(rarp_mbuf, &dev->mac) < 0) {
-   rte_pktmbuf_free(rarp_mbuf);
-   rarp_mbuf = NULL;
-   } else {
-   count -= 1;
-   }
+   count -= 1;
}
 
free_entries = *((volatile uint16_t *)&vq->avail->idx) -
-- 
2.7.4



[dpdk-dev] [PATCH 2/2] net: fix build error

2018-01-17 Thread Yuanhan Liu
Fix build error when shared lib is enabled:

  LD librte_net.so.1.1
rte_arp.o: In function `rte_net_make_rarp_packet':
rte_arp.c:(.text+0x1f0): undefined reference to `rte_mempool_ops_table'
rte_arp.c:(.text+0x21d): undefined reference to `rte_mempool_ops_table'
rte_arp.c:(.text+0x2d5): undefined reference to `rte_mempool_ops_table'
rte_arp.c:(.text+0x384): undefined reference to `rte_mempool_ops_table'
rte_arp.c:(.text+0x4b7): undefined reference to `rte_mempool_ops_table'

Fixes: 45ae05df824c ("net: add a helper for making RARP packet")

Signed-off-by: Yuanhan Liu 
---
 lib/librte_net/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index ab290c3..95ff549 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_net.a
 
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
-LDLIBS += -lrte_mbuf -lrte_eal
+LDLIBS += -lrte_mbuf -lrte_eal -lrte_mempool
 
 EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
-- 
2.7.4



Re: [dpdk-dev] [PATCH v11 0/5] net/virtio: support GUEST ANNOUNCE

2018-01-17 Thread Yuanhan Liu
Xiao told me that this series (except the last patch) was already applied
to the Thomas master branch. I then realised it was my mistake.

I applied v10 last week locally for some basic testing. There is a conflict
in last patch, that's why the last patch is not merged. I forgot to do
a reset before I applied another patch. Then later, I did a push to the
next-virtio tree, thus patches from Xiao were also pushed. Ferruh then
did a pull from it. As a result, they got merged to the master branch
before I realised. Non-rebase is allowed there, thus I have made a patch
to fix my mistake.

Meanwhile, I have also spotted a build error when shared lib is enabled.
I will send them out soon.

--yliu

On Wed, Jan 17, 2018 at 05:40:58AM +0800, Xiao Wang wrote:
> When live migration is finished, the backup VM needs to proactively announce
> its new location. DPDK vhost has implemented VHOST_USER_PROTOCOL_F_RARP to
> generate a RARP packet to switch in dequeue path. Another method is to let
> the guest proactively send out RARP packet using VIRTIO_NET_F_GUEST_ANNOUNCE
> feature.
> 
> This patch set enables this feature in virtio pmd, to support VM running 
> virtio
> pmd be migrated without vhost supporting RARP generation.
> 
> v11:
> - Add check for parameter and tailroom in rte_net_make_rarp_packet.
> - Allocate mbuf in rte_net_make_rarp_packet.
> 
> v10:
> - Add a bold doxygen comment for the experimental function.
> 
> v9:
> - Introduce function with the experimental state.
> 
> v8:
> - Add a helper in lib/librte_net to make rarp packet, it's used by
>   both vhost and virtio.
> 
> v7:
> - Improve comment for state_lock.
> - Rename spinlock variable 'sl' to 'lock'.
> 
> v6:
> - Use rte_pktmbuf_alloc() instead of rte_mbuf_raw_alloc().
> - Remove the 'len' parameter in calling virtio_send_command().
> - Remove extra space between typo and var.
> - Improve comment and alignment.
> - Remove the unnecessary header file.
> - A better usage of 'unlikely' indication.
> 
> v5:
> - Remove txvq parameter in virtio_inject_pkts.
> - Zero hw->special_buf after using it.
> - Return the retval of tx_pkt_burst().
> - Allocate a mbuf pointer on stack directly.
> 
> v4:
> - Move spinlock lock/unlock into dev_pause/resume.
> - Separate out a patch for packet injection.
> 
> v3:
> - Remove Tx function code duplication, use a special pointer for rarp
>   injection.
> - Rename function generate_rarp to virtio_notify_peers, replace
>   'virtnet_' with 'virtio_'.
> - Add comment for state_lock.
> - Typo fix and comment improvement.
> 
> v2:
> - Use spaces instead of tabs between the code and comments.
> - Remove unnecessary parentheses.
> - Use rte_pktmbuf_mtod directly to get eth_hdr addr.
> - Fix virtio_dev_pause return value check.
> 
> Xiao Wang (5):
>   net/virtio: make control queue thread-safe
>   net/virtio: add packet injection method
>   net: add a helper for making RARP packet
>   vhost: use lib API to make RARP packet
>   net/virtio: support GUEST ANNOUNCE
> 
>  drivers/net/virtio/virtio_ethdev.c  | 113 
> +++-
>  drivers/net/virtio/virtio_ethdev.h  |   6 ++
>  drivers/net/virtio/virtio_pci.h |   7 ++
>  drivers/net/virtio/virtio_rxtx.c|   3 +-
>  drivers/net/virtio/virtio_rxtx.h|   1 +
>  drivers/net/virtio/virtio_rxtx_simple.c |   2 +-
>  drivers/net/virtio/virtqueue.h  |  11 
>  lib/Makefile|   3 +-
>  lib/librte_net/Makefile |   1 +
>  lib/librte_net/rte_arp.c|  50 ++
>  lib/librte_net/rte_arp.h|  18 +
>  lib/librte_net/rte_net_version.map  |   6 ++
>  lib/librte_vhost/Makefile   |   2 +-
>  lib/librte_vhost/virtio_net.c   |  51 +-
>  14 files changed, 219 insertions(+), 55 deletions(-)
>  create mode 100644 lib/librte_net/rte_arp.c
> 
> -- 
> 2.15.1


Re: [dpdk-dev] [PATCH] vhost: do deep copy while reallocate vq

2018-01-17 Thread Yuanhan Liu
On Mon, Jan 15, 2018 at 06:32:19AM -0500, Junjie Chen wrote:
> When vhost reallocate dev and vq for NUMA enabled case, it doesn't perform
> deep copy, which lead to 1) zmbuf list not valid 2) remote memory access.
> This patch is to re-initlize the zmbuf list and also do the deep copy.
> 
> Signed-off-by: Junjie Chen 
> ---

Applied to dpdk-next-virtio.

Thanks.

--yliu


[dpdk-dev] [PATCH] doc: document the new devargs syntax

2018-01-16 Thread Yuanhan Liu
This patch documents the new devargs syntax, which is going to be
implemented in DPDK v18.05.

The new devargs proposal is introduced for having a consistent
interface for:

- whitelisting/blacklisting devices
- identifying ports
- attaching/detaching devices

Please check the patch content for the details. Also, here is link
for the background:
http://dpdk.org/ml/archives/dev/2017-November/082600.html

This syntax is suggestd by Thomas:
http://dpdk.org/ml/archives/dev/2017-December/084234.html

Signed-off-by: Yuanhan Liu 
---
 doc/guides/prog_guide/env_abstraction_layer.rst | 34 +
 1 file changed, 34 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst 
b/doc/guides/prog_guide/env_abstraction_layer.rst
index 34d871c..12f37f2 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -213,6 +213,40 @@ device having emitted a Device Removal Event. In such 
case, calling
 callback. Care must be taken not to close the device from the interrupt handler
 context. It is necessary to reschedule such closing operation.
 
+Devargs
+~~~
+
+The ``devargs`` can be used for whitelisting/blacklisting devices, identifying
+DPDK ports and attaching/deatching devices. They all share the same syntax.
+
+It is split in 3 categories, where almost everything is optional key/value 
pairs:
+
+* bus (pci, vdev, vmbus, fslmc, etc)
+* class (eth, crypto, etc)
+* driver (i40e, mlx5, virtio, etc)
+
+The key/value pair describing the category scope is mandatory and must be the
+first pair in the category properties. Example: bus=pci, must be placed before
+id=:01:00.0.
+
+The syntax has below rules:
+
+* Between categories, the separator is a slash.
+* Inside a category, the separator is a comma.
+* Inside a key/value pair, the separator is an equal sign.
+* Each category can be used alone.
+
+Here is an example with all categories::
+
+
bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55/driver=PMD_NAME,driverspecificproperty=VALUE
+
+It can also be simple like below::
+
+class=eth,mac=00:11:22:33:44:55
+
+A device is identified when every properties are matched. Before device is
+probed, only the bus category is relevant.
+
 Blacklisting
 
 
-- 
2.7.4



Re: [dpdk-dev] [PATCH] vhost: support Explicit Congestion Notification

2018-01-11 Thread Yuanhan Liu
On Wed, Nov 22, 2017 at 11:19:42AM +0800, Jiayu Hu wrote:
> In virtio, Explicit Congestion Notification (ECN) includes two parts:
> guest ECN and host ECN. Guest ECN means the frontend can handle TSO
> packets which have ECN set, and host ECN means the backend can handle
> TSO packets which have ECN set.
> 
> The ECN features are rarely used. However, virtio-net enables them by
> default, and vhost-net support both. To make live migration from
> vhost-net to vhost-user possible, this patch announces to support
> guest and host ECN in vhost-user.
> 
> Signed-off-by: Jiayu Hu 

Applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH v4 0/5] lib: add Port Representors

2018-01-10 Thread Yuanhan Liu
On Mon, Jan 08, 2018 at 02:37:15PM +, Remy Horton wrote:
> Port Representors provide a logical presentation in DPDK of VF (virtual 
> function) ports for the purposes of control and monitoring. Each port 
> representor device represents a single VF and is associated with it's 
> parent physical function (PF) PMD which provides the back-end hooks for 
> the representor device ops and defines the control domain to which that 
> port belongs. This allows to use existing DPDK APIs to monitor and control 
> the port without the need to create and maintain VF specific APIs.

Firstly, I don't object this model. More precisely, I don't object for
introducing a port to control the VFs. I even like this idea a bit, for
at least it could get rid of some PMD specific APIs, say
rte_pmd_i40e_set_vf_mac_addr, etc.

However, I don't quite like this design, for a simple reason, it makes
things way more complex:

- new APIs are introduced.
- new virtual PMD is required
- broker concept


I was thinking below should work:

- Use the devargs for enabling the port (or VF) representor model.
  For example: -w 04:00.0,vf_ports=0-3

  The vf_ports is for telling the driver we are going to create VF
  representors for VF 0 to VF 3. It could be a port mask like what
  this patchset does.

- Then inside the driver (say, i40e)

  * allocate 1 DPDK ethdev port
  * allocate 4 DPDK ethdev port, one for each VF specified by
the vf_ports option. And these are VF representors.
  * link the VF representors to the right VF port
For example, for i40e, it should be (from patch 3):
vf = &pf->vfs[representor->vport_id];
  * set the proper ethdev callbacks for the VF representors.

As you can see, none of above complexity is introduced.

Probably you might find something are missing:

- to identify the vf rep port for a specific VF.

  Which could be addressed by the new devargs syntax we are proposing:
  http://dpdk.org/ml/archives/dev/2017-December/084234.html

  For example, the right VF rep port could be identified by below devarg:
  bus=pci,id=04:00.0/class=eth,vf_id=0

- the ability of hotplug

  I think it could also be addressed by the new devargs syntax, also by
  re-using the rte_eth_dev_attach() function:

  rte_eth_dev_attach("bus=pci,id=04:00.0/class=eth,vf_id=4", &port_id);
  

Thoughts?

--yliu
> 
> +-+   +---+  +---+
> |Control Plane|   |   Data Plane  |  |   Data Plane  |
> | Application |   |   Application |  |   Application |
> +-+   +---+  +---+
> | eth dev api |   |  eth dev api  |  |  eth dev api  |
> +-+   +---+  +---+
> +---+  +---+  +---+   +---+  +---+
> |  PF0  |  | Port  |  | Port  |   |VF0 PMD|  |VF0 PMD|
> |  PMD  <--+ Rep 0 |  | Rep 1 |   +---+  +--++
> |   |  | PMD   |  | PMD   | |
> +---+--^+  +---+  +-+-+ |
> |  ||  ||
> |  ++  ||
> |  ||
> |  ||
> ++  |
> |   |  HW (logical view)   | |  |
> | --+--+ +---+ +---+---+ |  |
> | |   PF   | |  VF0  | |  VF1  | |  |
> | || |   | |   ++
> | ++ +---+ +---+ |
> | ++ |
> | |VEB | |
> | ++ |
> | ++ |
> | |  Port  | |
> | |   0| |
> | ++ |
> ++
> 
> The figure above shows a deployment where the PF is bound to a DPDK control
> plane application which uses representor ports to manage the configuration and
> monitoring of it's VF ports. Each virtual function is represented in the
> application by a representor port PMD which enables control of the 
> corresponding
> VF through eth dev APIs on the representor PMD such as:
> 
> - void rte_eth_promiscuous_enable(uint8_t port_id);
> - void rte_eth_promiscuous_disable(uint8_t port_id);
> - void rte_eth_allmulticast_enable(uint8_t port_id);
> - void rte_eth_allmulticast_disable(uint8_t port_id);
> - int rte_eth_dev_mac_addr_add(uint8_t port, struct ether_addr *mac_addr,
>   uint32_t pool);
> - int rte_eth_dev_set_vlan_offload(uint8_t port_id, int offload_mask);
> 
> as well as monitoring through API's like
> 
> - void rte_eth_link_get(uint8_t port_id, struct rte_eth_link *link);
> - int rte_eth_stats_get(uint8_t port_id, struct rte_eth_s

Re: [dpdk-dev] [PATCH v10 3/5] net: add a helper for making RARP packet

2018-01-10 Thread Yuanhan Liu
Thomas, look good to you?

--yliu

On Wed, Jan 10, 2018 at 09:23:54AM +0800, Xiao Wang wrote:
> Suggested-by: Maxime Coquelin 
> Signed-off-by: Xiao Wang 
> Reviewed-by: Maxime Coquelin 
> ---
>  lib/librte_net/Makefile|  1 +
>  lib/librte_net/rte_arp.c   | 42 
> ++
>  lib/librte_net/rte_arp.h   | 17 +++
>  lib/librte_net/rte_net_version.map |  6 ++
>  4 files changed, 66 insertions(+)
>  create mode 100644 lib/librte_net/rte_arp.c
> 
> diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
> index 5e8a76b68..ab290c382 100644
> --- a/lib/librte_net/Makefile
> +++ b/lib/librte_net/Makefile
> @@ -13,6 +13,7 @@ LIBABIVER := 1
>  
>  SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
>  SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
> +SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_arp.c
>  
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h 
> rte_esp.h
> diff --git a/lib/librte_net/rte_arp.c b/lib/librte_net/rte_arp.c
> new file mode 100644
> index 0..d7223b044
> --- /dev/null
> +++ b/lib/librte_net/rte_arp.c
> @@ -0,0 +1,42 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#include 
> +
> +#include 
> +
> +#define RARP_PKT_SIZE64
> +int
> +rte_net_make_rarp_packet(struct rte_mbuf *mbuf, const struct ether_addr *mac)
> +{
> + struct ether_hdr *eth_hdr;
> + struct arp_hdr *rarp;
> +
> + if (mbuf->buf_len < RARP_PKT_SIZE)
> + return -1;
> +
> + /* Ethernet header. */
> + eth_hdr = rte_pktmbuf_mtod(mbuf, struct ether_hdr *);
> + memset(eth_hdr->d_addr.addr_bytes, 0xff, ETHER_ADDR_LEN);
> + ether_addr_copy(mac, ð_hdr->s_addr);
> + eth_hdr->ether_type = htons(ETHER_TYPE_RARP);
> +
> + /* RARP header. */
> + rarp = (struct arp_hdr *)(eth_hdr + 1);
> + rarp->arp_hrd = htons(ARP_HRD_ETHER);
> + rarp->arp_pro = htons(ETHER_TYPE_IPv4);
> + rarp->arp_hln = ETHER_ADDR_LEN;
> + rarp->arp_pln = 4;
> + rarp->arp_op  = htons(ARP_OP_REVREQUEST);
> +
> + ether_addr_copy(mac, &rarp->arp_data.arp_sha);
> + ether_addr_copy(mac, &rarp->arp_data.arp_tha);
> + memset(&rarp->arp_data.arp_sip, 0x00, 4);
> + memset(&rarp->arp_data.arp_tip, 0x00, 4);
> +
> + mbuf->data_len = RARP_PKT_SIZE;
> + mbuf->pkt_len = RARP_PKT_SIZE;
> +
> + return 0;
> +}
> diff --git a/lib/librte_net/rte_arp.h b/lib/librte_net/rte_arp.h
> index 183641874..dad7423ad 100644
> --- a/lib/librte_net/rte_arp.h
> +++ b/lib/librte_net/rte_arp.h
> @@ -76,6 +76,23 @@ struct arp_hdr {
>   struct arp_ipv4 arp_data;
>  } __attribute__((__packed__));
>  
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Make a RARP packet based on MAC addr.
> + *
> + * @param mbuf
> + *   Pointer to the rte_mbuf structure
> + * @param mac
> + *   Pointer to the MAC addr
> + *
> + * @return
> + *   - 0 on success, negative on error
> + */
> +int
> +rte_net_make_rarp_packet(struct rte_mbuf *mbuf, const struct ether_addr 
> *mac);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_net/rte_net_version.map 
> b/lib/librte_net/rte_net_version.map
> index 687c40eaf..213e6fd32 100644
> --- a/lib/librte_net/rte_net_version.map
> +++ b/lib/librte_net/rte_net_version.map
> @@ -12,3 +12,9 @@ DPDK_17.05 {
>   rte_net_crc_set_alg;
>  
>  } DPDK_16.11;
> +
> +EXPERIMENTAL {
> + global:
> +
> + rte_net_make_rarp_packet;
> +} DPDK_17.05;
> -- 
> 2.15.1


Re: [dpdk-dev] [PATCH v2] examples/vhost: fix remove dev_info.max_rx_queues checking to solve startup failure

2018-01-10 Thread Yuanhan Liu
On Wed, Jan 10, 2018 at 02:01:01PM +0800, Zhiyong Yang wrote:
> For vhost sample, the operation if (dev_info.max_rx_queues >
> MAX_QUEUES) in the function port_init causes startup failure
> when using X710(i40e driver). X710 requires that MAX_QUEUES
> should be defined no less than 320, however it is defined as
> 128 currently.
> 
> Such checking is overkill and Removal don't affect any
> functionality (have already validated ixgbe and i40e).
> 
> The removal can avoid similar issue when introduing new physical
> NIC.
> 
> Fixes: 8bd6c395a568("examples/vhost: increase maximum queue number")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Zhiyong Yang 

Applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH 5/5] vhost: add reconnect thread name for client mode.

2018-01-09 Thread Yuanhan Liu
On Tue, Jan 09, 2018 at 09:12:43PM +0800, Yuanhan Liu wrote:
> On Fri, Jan 05, 2018 at 06:10:39AM -0800, Tonghao Zhang wrote:
> > This patch adds the name for vhost-user reconnect thread.
> > It can help us to know whether the thread is running.
> > 
> > Signed-off-by: Tonghao Zhang 
> > ---
> >  lib/librte_vhost/socket.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> > index d44a0f1..c2e34e0 100644
> > --- a/lib/librte_vhost/socket.c
> > +++ b/lib/librte_vhost/socket.c
> > @@ -433,6 +433,7 @@ struct vhost_user_reconnect_list {
> >  vhost_user_reconnect_init(void)
> >  {
> > int ret;
> > +   char thread_name[RTE_MAX_THREAD_NAME_LEN];
> >  
> > ret = pthread_mutex_init(&reconn_list.mutex, NULL);
> > if (ret < 0) {
> > @@ -449,6 +450,13 @@ struct vhost_user_reconnect_list {
> > RTE_LOG(ERR, VHOST_CONFIG,
> > "failed to destroy reconnect mutex");
> > }
> > +   } else {
> > +   snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN,
> > +"vhost-reconn");
> > +
> > +   if (rte_thread_setname(reconn_tid, thread_name))
> > +   RTE_LOG(DEBUG, VHOST_CONFIG,
> > +   "Failed to set thread name for vhost-user reconnect");
> 
> Applied to dpdk-next-virtio, with the intendation fixed.

Note that I have just applied this patch only. The rest of this patchset
belongs to the ixgbe PMD driver, which need another maintainer for review.

That also means, it's better if you could send them sperately next time
when they are not related. More specifically, in this case, one patch
for vhost-user, another patchset for all the rest ixgbe changes.

--yliu


Re: [dpdk-dev] [PATCH v7 3/3] net/virtio: support GUEST ANNOUNCE

2018-01-09 Thread Yuanhan Liu
On Tue, Jan 09, 2018 at 12:41:53PM +0100, Thomas Monjalon wrote:
> > > Do you think it could make sense to have this function in a lib, as
> > > vhost user lib does exactly the same?
> > > 
> > > I don't know if it could be useful to others than vhost/virtio though.
> > 
> > Hi Thomas,
> > 
> > Do you think it's worth adding a new helper for ARP in lib/librte_net/?
> > Currently we just need a helper to build RARP packet (the above 
> > make_rarp_packet)
> 
> Yes, good idea

+1

--yliu


Re: [dpdk-dev] [PATCH] vhost: fix error code check when creating pthread

2018-01-09 Thread Yuanhan Liu
On Fri, Dec 08, 2017 at 11:19:49AM +0100, Olivier Matz wrote:
> On error, pthread_create() returns a positive number (errno).
> Fix the test on the return value.
> 
> Fixes: af1475918124 ("vhost: introduce API to start a specific driver")
> Fixes: e623e0c6d8a5 ("vhost: add reconnect ability")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Olivier Matz 
> ---

Applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH] examples/vhost: fix extend MAX_QUEUES to resolve startup failure

2018-01-09 Thread Yuanhan Liu
On Thu, Jan 04, 2018 at 02:33:32PM +0800, Zhiyong Yang wrote:
> When binding X710 NIC (i40e driver) to DPDK, vhost sample startups
> failure.
> The sample requires that MAX_QUEUES should be defined no less than 320.
> So, the patch redefines MAX_QUEUES 320 to fix the issue.

It just makes the issue disappear. It doesn't really fix the issue.
And I belive we have tried to fix this kind of issues in this way
many times. (just check the git history). As you known, none of them
really worked. You just added one more try, which is very likely
will be broken again when Intel has one more new NIC.

The error comes from:

if (dev_info.max_rx_queues > MAX_QUEUES) {
rte_exit(EXIT_FAILURE,
"please define MAX_QUEUES no less than %u in %s\n",
dev_info.max_rx_queues, __FILE__);
}

I think such check is overkill and we don't really need that. Could
you just remove such check and do some validations on few difference
nics?

Thanks.

--yliu


Re: [dpdk-dev] [PATCH] net/virtio: init MTU in case no control channel

2018-01-09 Thread Yuanhan Liu
On Fri, Jan 05, 2018 at 02:28:06AM -0800, Zhike Wang wrote:
> From: zhike wang 
> 
> The max_mtu is kept as zero in case no CRTL channel, which leads
> to failure when calling virtio_mtu_set().
> 
> Signed-off-by: Zhike Wang 

Applied to dpdk-next-virtio.

Thanks.

--yliu
> ---
>  drivers/net/virtio/virtio_ethdev.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> b/drivers/net/virtio/virtio_ethdev.c
> index 21f2131..b7b3364 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1479,6 +1479,8 @@ static int virtio_dev_xstats_get_names(struct 
> rte_eth_dev *dev,
>   } else {
>   PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=1");
>   hw->max_queue_pairs = 1;
> + hw->max_mtu = VIRTIO_MAX_RX_PKTLEN - ETHER_HDR_LEN -
> + VLAN_TAG_LEN - hw->vtnet_hdr_size;
>   }
>  
>   ret = virtio_alloc_queues(eth_dev);
> -- 
> 1.8.3.1


Re: [dpdk-dev] [PATCH 5/5] vhost: add reconnect thread name for client mode.

2018-01-09 Thread Yuanhan Liu
On Fri, Jan 05, 2018 at 06:10:39AM -0800, Tonghao Zhang wrote:
> This patch adds the name for vhost-user reconnect thread.
> It can help us to know whether the thread is running.
> 
> Signed-off-by: Tonghao Zhang 
> ---
>  lib/librte_vhost/socket.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index d44a0f1..c2e34e0 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -433,6 +433,7 @@ struct vhost_user_reconnect_list {
>  vhost_user_reconnect_init(void)
>  {
>   int ret;
> + char thread_name[RTE_MAX_THREAD_NAME_LEN];
>  
>   ret = pthread_mutex_init(&reconn_list.mutex, NULL);
>   if (ret < 0) {
> @@ -449,6 +450,13 @@ struct vhost_user_reconnect_list {
>   RTE_LOG(ERR, VHOST_CONFIG,
>   "failed to destroy reconnect mutex");
>   }
> + } else {
> + snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN,
> +  "vhost-reconn");
> +
> + if (rte_thread_setname(reconn_tid, thread_name))
> + RTE_LOG(DEBUG, VHOST_CONFIG,
> + "Failed to set thread name for vhost-user reconnect");

Applied to dpdk-next-virtio, with the intendation fixed.

I have also shortten the log a bit "failed to set reconnect thread name".
There is no need to specify "vhost-user" again since VHOST_CONFIG tells
you it comes from vhost-user.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH v7] vhost: support virtqueue interrupt/notification suppression

2018-01-09 Thread Yuanhan Liu
On Tue, Jan 09, 2018 at 07:34:17AM +, Yao, Lei A wrote:
> 
> 
> > -Original Message-
> > From: Chen, Junjie J
> > Sent: Tuesday, January 9, 2018 7:04 PM
> > To: y...@fridaylinux.org; maxime.coque...@redhat.com; Wang, Xiao W
> > ; Bie, Tiwei ; Yao, Lei A
> > 
> > Cc: dev@dpdk.org; Chen, Junjie J 
> > Subject: [PATCH v7] vhost: support virtqueue interrupt/notification
> > suppression
> > 
> > The driver can suppress interrupt when VIRTIO_F_EVENT_IDX feature bit is
> > negotiated. The driver set vring flags to 0, and MAY use used_event in
> > available ring to advise device interrupt util reach an index specified
> > by used_event. The device ignore the lower bit of vring flags, and send
> > an interrupt when index reach used_event.
> > 
> > The device can suppress notification in a manner analogous to the ways
> > driver suppress interrupt. The device manipulates flags or avail_event in
> > the used ring in the same way the driver manipulates flags or used_event in
> > available ring.
> > 
> > Signed-off-by: Junjie Chen 
> Tested-by: Lei Yao 

Applied to dpdk-next-virtio.

Thanks.

--yliu

> VM2VM Iperf test has been executed with virtio-net driver after apply this 
> patch.
> No performance drop. 
> CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> Host OS: Ubuntu 16.04
> Guest OS: Ubuntu 16.04
> Kernel: 4.4.0


Re: [dpdk-dev] [PATCH v6] vhost: support virtqueue interrupt/notification suppression

2018-01-08 Thread Yuanhan Liu
On Tue, Jan 09, 2018 at 02:12:02AM +, Chen, Junjie J wrote:
> > > + if (vring_need_event(vhost_used_event(vq), new, old)
> > 
> > It's a bit weird that you use one from the standard linux header file
> > (vring_need_event), while you define you own one (vhost_used_event).
> > Note that the system header file also has "vring_used_event()" defined.
> The vring_used_event is defined and used for virtio in kernel, kernel defines 
> a vhost_used_event in vhost.c for vhost, so I just use a separated macro for 
> vhost end.
> 
> I'd like to define both vhost_need_event and vhost_used_event in vhost.h to 
> remove potential build issue in old linux distribution and also to keep 
> consistent. Is that OK for you?

Yes.

--yliu


Re: [dpdk-dev] [PATCH v5 0/4] Vhost: fix mq=on but VIRTIO_NET_F_MQ not negotiated

2018-01-08 Thread Yuanhan Liu
On Wed, Dec 13, 2017 at 09:51:05AM +0100, Maxime Coquelin wrote:
> Hi,
> 
> This fifth revision fixes bug reported by Tiwei, dev->virtqueue[$idx]
> wasn't reset to NULL at vq freeing time.

Applied to dpdk-next-virtio.

Thanks.

--yliu
> 
> Having QEMU started with mq=on but guest driver not negotiating
> VIRTIO_NET_F_MQ feature ends up in the vhost device to never
> start. Indeed, more queues are created in the vhost backend than
> configured.
> 
> Guest drivers known to not advertise the VIRTIO_NET_F_MQ feature are
> iPXE and OVMF Virtio-net drivers.
> 
> Queues are created because before starting the guest, QEMU sends
> VHOST_USER_SET_VRING_CALL requests for all queues declared in QEMU
> command line. Also, once Virtio features negotiated, QEMU sends
> VHOST_USER_SET_VRING_ENABLE requests to disable all but the first
> queue pair.
> 
> This series fixes this by destroying all but first queue pair in
> the backend if VIRTIO_NET_F_MQ isn't negotiated. First patches
> makes sure that VHOST_USER_SET_FEATURES request doesn't change
> Virtio features while the device is running, which should never
> happen as per the Virtio spec. This helps to make sure vitqueues
> aren't destroyed while being processed, but also protect from
> other illegal features changes (e.g. VIRTIO_NET_F_MRG_RXBUF).
> 
> Changes since v4:
> =
> - Fix dev->virtqueue[$ixd] not reset to NULL at VQ free time (Tiwei)
> Changes since v3:
> =
> - Fix Virtio features = 0 case (Tiwei)
> Changes since v2:
> =
> - Patch 2: Rework & fix VQs destruction loop (Laszlo)
> Changes since v1:
> =
> - Patch 1: shift bits in the right direction (Ladi)
> 
> Maxime Coquelin (4):
>   vhost: prevent features to be changed while device is running
>   vhost: propagate VHOST_USER_SET_FEATURES handling error
>   vhost: extract virtqueue cleaning and freeing functions
>   vhost: destroy unused virtqueues when multiqueue not negotiated
> 
>  lib/librte_vhost/vhost.c  | 22 --
>  lib/librte_vhost/vhost.h  |  3 +++
>  lib/librte_vhost/vhost_user.c | 40 ++--
>  3 files changed, 53 insertions(+), 12 deletions(-)
> 
> -- 
> 2.14.3


Re: [dpdk-dev] [PATCH] vhost: support UDP Fragmentation Offload

2018-01-08 Thread Yuanhan Liu
On Tue, Nov 21, 2017 at 02:56:52PM +0800, Jiayu Hu wrote:
> In virtio, UDP Fragmentation Offload (UFO) includes two parts: host UFO
> and guest UFO. Guest UFO means the frontend can receive large UDP packets,
> and host UFO means the backend can receive large UDP packets. This patch
> supports host UFO and guest UFO for vhost-user.
> 
> Signed-off-by: Jiayu Hu 

Applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH v2] examples/vhost: fix sending arp packet to self

2018-01-08 Thread Yuanhan Liu
On Fri, Dec 29, 2017 at 09:33:19AM -0500, Junjie Chen wrote:
> ARP packets are not dropped when dest vdev is itself, which breaks
> RX ring inconspicuously.
> 
> Fixes: 9c5ef51207c6 ("examples/vhost: handle broadcast packet")
> 
> Signed-off-by: Junjie Chen 
> ---

Applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH v6] vhost: support virtqueue interrupt/notification suppression

2018-01-08 Thread Yuanhan Liu
On Tue, Dec 26, 2017 at 12:43:10PM -0500, Junjie Chen wrote:
> The driver can suppress interrupt when VIRTIO_F_EVENT_IDX feature bit is
> negotiated. The driver set vring flags to 0, and MAY use used_event in
> available ring to advise device interrupt util reach an index specified
> by used_event. The device ignore the lower bit of vring flags, and send
> an interrupt when index reach used_event.
> 
> The device can suppress notification in a manner analogous to the ways
> driver suppress interrupt. The device manipulates flags or avail_event in
> the used ring in the same way the driver manipulates flags or used_event in
> available ring.
> 
> This patch is to enable this feature in vhost.
> 
> Signed-off-by: Junjie Chen 

You need put "---" before the change log. Otherwise, it will be tracked in
the commit log.

> +#define vhost_used_event(vr) \
> + (*(volatile uint16_t*)&(vr)->avail->ring[(vr)->size])
> +
> +static __rte_always_inline void
> +vhost_notify(struct virtio_net *dev,  struct vhost_virtqueue *vq)
> +{
> + /* Don't notify guest if we don't reach index specified by guest. */
> + if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) {
> + uint16_t old = vq->signalled_used;
> + uint16_t new = vq->last_used_idx;
> +
> + LOG_DEBUG(VHOST_DATA, "%s: used_event_idx=%d, old=%d, new=%d\n",
> + __func__,
> + vhost_used_event(vq),
> + old, new);
> + if (vring_need_event(vhost_used_event(vq), new, old)

It's a bit weird that you use one from the standard linux header file
(vring_need_event), while you define you own one (vhost_used_event).
Note that the system header file also has "vring_used_event()" defined.

Besides that, I have few more comments (and some requirements):

- It'd be much better if there is a Tested-by tag. Expeclitly,
  I'm asking a test with Linux kernel virtio-net driver in guest.

- I also hope you could have done a build test on some old distributions.
  AFAIK, the two macros (vring_need_event and vring_used_event) come
  from kernel 3.0 (or above). Any kernel older than that would fail
  the build.

- I'd be great if you could make a new one based on top of my latest
  tree: I have just applied a patchset that should conflict with this
  one.

--yliu


Re: [dpdk-dev] [PATCH v2 0/2] vhost: introduce rte_vhost_vring_call()

2018-01-08 Thread Yuanhan Liu
On Wed, Jan 03, 2018 at 09:17:32AM +0100, Maxime Coquelin wrote:
> >>Reviewed-by: Maxime Coquelin 
> >
> > From 3.2. Managing ABI updates, General Guidelines:
> >
> >   New APIs will be marked as experimental for at least one release to
> >   allow any issues found by users of the new API to be fixed quickly.
> >
> >http://dpdk.org/doc/guides/contributing/versioning.html
> >
> >I'm new to DPDK so I followed this guideline strictly and expect the
> >maintainers to mark the API stable at some point in the future when they
> >are happy with it.  Maybe this guideline isn't followed strictly for
> >librte_vhost?
> 
> Thanks for the pointer, it seems it is not strictly followed for
> librte_vhost & other libs. Let's keep it experimental for the coming
> release.

IIRC, it's a new guideline added recently (people were talking about
that in last year's DPDK user event after all).

And my understanding is, for a new (big set of) APIs, it's good to
make them experimental, as normally, it's not an easy task to make
a big stuff right in one time.

For this one, which is simple and straightforward, I don't see a
strong need to mark it as experimental.

Thus I have removed it (while apply). If anyone has objections,
let me know.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH v6 1/3] net/virtio: make control queue thread-safe

2018-01-08 Thread Yuanhan Liu
On Sun, Jan 07, 2018 at 04:05:11AM -0800, Xiao Wang wrote:
> diff --git a/drivers/net/virtio/virtio_rxtx.h 
> b/drivers/net/virtio/virtio_rxtx.h
> index 54f1e849b..71b5798b0 100644
> --- a/drivers/net/virtio/virtio_rxtx.h
> +++ b/drivers/net/virtio/virtio_rxtx.h
> @@ -84,6 +84,7 @@ struct virtnet_ctl {
>   rte_iova_t virtio_net_hdr_mem;  /**< hdr for each xmit packet */
>   uint16_t port_id;   /**< Device port identifier. */
>   const struct rte_memzone *mz;   /**< mem zone to populate CTL ring. */
> + rte_spinlock_t sl;  /**< spinlock for control queue. */

It's weird to name it "sl". The typical naming is just "lock".

--yliu


Re: [dpdk-dev] [PATCH v6 2/3] net/virtio: add packet injection method

2018-01-08 Thread Yuanhan Liu
On Sun, Jan 07, 2018 at 04:05:12AM -0800, Xiao Wang wrote:
> + /*
> +  * App management thread and virtio interrupt handler thread
> +  * both can change the 'started' flag, this lock is meant to
> +  * avoid such a contention.
> +  */
> + rte_spinlock_t state_lock;

Why not turning the "started" to atomic type, so that you don't need
the lock?

--yliu
> + struct rte_mbuf **inject_pkts;
>  
>   struct virtqueue **vqs;
>  };


Re: [dpdk-dev] [PATCH v2 0/4] various fixes and cleanups for virtio PMD

2017-12-27 Thread Yuanhan Liu
On Mon, Dec 11, 2017 at 01:13:28PM +0800, Tiwei Bie wrote:
> v2:
> - refine indent for patch 2
> - fix a typo in commit log for patch 4
> - drop the blank lines squeezing patch

Series applied to dpdk-next-virtio.

Thanks.

--yliu
> 
> Tiwei Bie (4):
>   net/virtio: fix vector Rx break caused by rxq flushing
>   net/virtio: fix typo in LRO support
>   net/virtio: remove a redundant macro definition for ctrl vq
>   net/virtio: remove redundant macro definitions for vector Rx
> 
>  drivers/net/virtio/virtio_ethdev.c   |  4 ++--
>  drivers/net/virtio/virtio_rxtx_simple_neon.c |  2 --
>  drivers/net/virtio/virtio_rxtx_simple_sse.c  |  2 --
>  drivers/net/virtio/virtqueue.c   | 31 
> +---
>  drivers/net/virtio/virtqueue.h   |  4 +---
>  5 files changed, 27 insertions(+), 16 deletions(-)
> 
> -- 
> 2.13.3


Re: [dpdk-dev] [PATCH] vhost: support Generic Segmentation Offload

2017-12-27 Thread Yuanhan Liu
On Mon, Dec 25, 2017 at 01:53:29AM +, Yao, Lei A wrote:
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jiayu Hu
> > Sent: Tuesday, November 28, 2017 1:29 PM
> > To: dev@dpdk.org
> > Cc: y...@fridaylinux.org; Tan, Jianfeng ; Hu, Jiayu
> > 
> > Subject: [dpdk-dev] [PATCH] vhost: support Generic Segmentation Offload
> > 
> > In virtio, Generic Segmentation Offload (GSO) is the feature for the
> > backend, which means the backend can receive packets with any GSO
> > type.
> > 
> > Virtio-net enables the GSO feature by default, and vhost-net supports it.
> > To make live migration from vhost-net to vhost-user possible, this patch
> > enables GSO for vhost-user.
> > 
> > Signed-off-by: Jiayu Hu 
> Tested-by: Lei Yao

Applied to dpdk-next-virtio.

Thanks.

--yliu

> This patch has been tested on my server, after add csum=on, gso=on to qemu 
> cmdline,
> Following offload are active in vm:
> udp-fragmentation-offload: on
> tx-tcp-segmentation: on
>  tx-tcp-ecn-segmentation: on
>  tx-tcp6-segmentation: on
> 
> > ---
> >  lib/librte_vhost/vhost.h | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 1cc81c1..04f54cb 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -204,6 +204,7 @@ struct vhost_msg {
> > (1ULL << VIRTIO_F_VERSION_1)   | \
> > (1ULL << VHOST_F_LOG_ALL)  | \
> > (1ULL <<
> > VHOST_USER_F_PROTOCOL_FEATURES) | \
> > +   (1ULL << VIRTIO_NET_F_GSO) | \
> > (1ULL << VIRTIO_NET_F_HOST_TSO4) | \
> > (1ULL << VIRTIO_NET_F_HOST_TSO6) | \
> > (1ULL << VIRTIO_NET_F_CSUM)| \
> > --
> > 2.7.4


Re: [dpdk-dev] [PATCH] net/virtio: fix incorrect cast of void *

2017-12-27 Thread Yuanhan Liu
On Thu, Dec 14, 2017 at 03:49:32PM +0100, Maxime Coquelin wrote:
> 
> 
> On 12/14/2017 03:33 PM, Olivier Matz wrote:
> >From: Didier Pallard 
> >
> >The rx_queues and tx_queues fields of the data structure points to a struct
> >virtnet_rx or virtnet_tx. Casting it to a virtqueue is an error.
> >
> >It does not trigger any bug because pointer is not dereferenced inside the
> >function, but it can become a bug if this code is copy/pasted and vq is
> >dereferenced.
> >
> >Fixes: 01ad44fd374f ("net/virtio: split Rx/Tx queue")
> >Cc: sta...@dpdk.org
> >
> >Signed-off-by: Didier Pallard 
> >Signed-off-by: Olivier Matz 
> >---
> >  drivers/net/virtio/virtio_ethdev.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/net/virtio/virtio_ethdev.c 
> >b/drivers/net/virtio/virtio_ethdev.c
> >index c0ba83b06..b398f9960 100644
> >--- a/drivers/net/virtio/virtio_ethdev.c
> >+++ b/drivers/net/virtio/virtio_ethdev.c
> >@@ -893,7 +893,7 @@ static int virtio_dev_xstats_get_names(struct 
> >rte_eth_dev *dev,
> > /* Note: limit checked in rte_eth_xstats_names() */
> > for (i = 0; i < dev->data->nb_rx_queues; i++) {
> >-struct virtqueue *rxvq = dev->data->rx_queues[i];
> >+struct virtnet_rx *rxvq = dev->data->rx_queues[i];
> > if (rxvq == NULL)
> > continue;
> > for (t = 0; t < VIRTIO_NB_RXQ_XSTATS; t++) {
> >@@ -906,7 +906,7 @@ static int virtio_dev_xstats_get_names(struct 
> >rte_eth_dev *dev,
> > }
> > for (i = 0; i < dev->data->nb_tx_queues; i++) {
> >-struct virtqueue *txvq = dev->data->tx_queues[i];
> >+struct virtnet_tx *txvq = dev->data->tx_queues[i];
> > if (txvq == NULL)
> > continue;
> > for (t = 0; t < VIRTIO_NB_TXQ_XSTATS; t++) {
> >
> 
> Reviewed-by: Maxime Coquelin 

Applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH] vhost: support UDP Fragmentation Offload

2017-12-27 Thread Yuanhan Liu
On Tue, Nov 21, 2017 at 02:56:52PM +0800, Jiayu Hu wrote:
> In virtio, UDP Fragmentation Offload (UFO) includes two parts: host UFO
> and guest UFO. Guest UFO means the frontend can receive large UDP packets,
> and host UFO means the backend can receive large UDP packets. This patch
> supports host UFO and guest UFO for vhost-user.
> 
> Signed-off-by: Jiayu Hu 
> ---
>  lib/librte_mbuf/rte_mbuf.h|  7 +++
>  lib/librte_vhost/vhost.h  |  2 ++
>  lib/librte_vhost/virtio_net.c | 10 ++
>  3 files changed, 19 insertions(+)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index ce8a05d..3d8cfc9 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -209,6 +209,13 @@ extern "C" {
>  /* add new TX flags here */
>  
>  /**
> + * UDP Fragmentation Offload flag. This flag is used for enabling UDP
> + * fragmentation in SW or in HW. When use UFO, mbuf->tso_segsz is used
> + * to store the MSS of UDP fragments.
> + */
> +#define PKT_TX_UDP_SEG   (1ULL << 42)

This patch added a new mbuf flag, Olivier, do you have objections?

--yliu


Re: [dpdk-dev] [PATCH] lib/librte_vhost: remove redundant logic judgement

2017-12-27 Thread Yuanhan Liu
On Mon, Dec 25, 2017 at 05:16:17PM +0800, Zhiyong Yang wrote:
> At the beginning of vring_translate, the code
> if(!(dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) already judges
> if IOMMU_PLATFORM is supported. The function vhost_iova_to_vva always
> repeats the logic, __vhost_iova_to_vva can be used directly to avoid it
> here.
> 
> Signed-off-by: Zhiyong Yang 
> ---
>  lib/librte_vhost/vhost.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 4f8b73a09..bb615fd2a 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -157,7 +157,7 @@ vring_translate(struct virtio_net *dev, struct 
> vhost_virtqueue *vq)
>   goto out;
>  
>   size = sizeof(struct vring_desc) * vq->size;
> - vq->desc = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev, vq,
> + vq->desc = (struct vring_desc *)(uintptr_t)__vhost_iova_to_vva(dev, vq,
>   vq->ring_addrs.desc_user_addr,
>   size, VHOST_ACCESS_RW);


I don't see strong reason to bother doing the change. It's not in the
datapath after all. I'd like to keep the code as it is, to keep it simpler:
user just has to call vhost_iova_to_vva() and let it to handle the details.

--yliu


>   if (!vq->desc)
> @@ -165,7 +165,8 @@ vring_translate(struct virtio_net *dev, struct 
> vhost_virtqueue *vq)
>  
>   size = sizeof(struct vring_avail);
>   size += sizeof(uint16_t) * vq->size;
> - vq->avail = (struct vring_avail *)(uintptr_t)vhost_iova_to_vva(dev, vq,
> + vq->avail = (struct vring_avail *)(uintptr_t)__vhost_iova_to_vva(dev,
> + vq,
>   vq->ring_addrs.avail_user_addr,
>   size, VHOST_ACCESS_RW);
>   if (!vq->avail)
> @@ -173,7 +174,7 @@ vring_translate(struct virtio_net *dev, struct 
> vhost_virtqueue *vq)
>  
>   size = sizeof(struct vring_used);
>   size += sizeof(struct vring_used_elem) * vq->size;
> - vq->used = (struct vring_used *)(uintptr_t)vhost_iova_to_vva(dev, vq,
> + vq->used = (struct vring_used *)(uintptr_t)__vhost_iova_to_vva(dev, vq,
>   vq->ring_addrs.used_user_addr,
>   size, VHOST_ACCESS_RW);
>   if (!vq->used)
> -- 
> 2.13.3


Re: [dpdk-dev] [PATCH] net/virtio: remove unnecessary macro definitions

2017-12-27 Thread Yuanhan Liu
On Tue, Dec 26, 2017 at 05:25:00PM +0800, Zhiyong Yang wrote:
> DPDK has already the definition of Ethernet numeric link speeds in Mbps in
> the file Rte_ethdev.h, it is unnecessary to rededine virtio specific link
> speeds macros again.
> 
> Signed-off-by: Zhiyong Yang 

Applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH] vhost: fix dequeue zero copy not work with virtio1

2017-12-15 Thread Yuanhan Liu
On Fri, Dec 15, 2017 at 10:33:41AM +, Loftus, Ciara wrote:
> > 
> > Hi Junjie,
> > 
> > On 12/13/2017 05:50 PM, Junjie Chen wrote:
> > > This fix dequeue zero copy can not work with Qemu
> > > version >= 2.7. Since from Qemu 2.7 virtio device
> > > use virtio-1 protocol, the zero copy code path
> > > forget to add offset to buffer address.
> > >
> > > Signed-off-by: Junjie Chen 
> > > ---
> > >   lib/librte_vhost/virtio_net.c | 3 ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > > index 6fee16e..79d80f7 100644
> > > --- a/lib/librte_vhost/virtio_net.c
> > > +++ b/lib/librte_vhost/virtio_net.c
> > > @@ -977,7 +977,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct
> > vhost_virtqueue *vq,
> > >   desc->addr + desc_offset,
> > cpy_len {
> > >   cur->data_len = cpy_len;
> > >   cur->data_off = 0;
> > > - cur->buf_addr = (void *)(uintptr_t)desc_addr;
> > > + cur->buf_addr = (void *)(uintptr_t)(desc_addr
> > > + + desc_offset);
> > >   cur->buf_iova = hpa;
> > >
> > >   /*
> > >
> > 
> > Thanks for fixing this.
> > 
> > Reviewed-by: Maxime Coquelin 
> > 
> > Maxime
> 
> Thanks for the fix. Can this be considered for the stable branch?

Yes, I think so. Applied to dpdk-next-virtio, with

Fixes: b0a985d1f340 ("vhost: add dequeue zero copy")
Cc: sta...@dpdk.org

Thanks.

--yliu


[dpdk-dev] [dpdk-announce] DPDK 17.08.1 released

2017-12-07 Thread Yuanhan Liu
 -Og
  net/qede: fix compilation with -Og
  app/test-crypto-perf: fix memory leak
  app/test-crypto-perf: fix compilation with -Og
  net/virtio: revert not claiming LRO support
  net/virtio: revert not claiming IP checksum offload
  net/virtio: fix log levels in configure
  net/virtio: fix mbuf port for simple Rx function
  net/virtio: fix queue setup consistency
  net/virtio: fix compilation with -Og
  lpm6: fix compilation with -Og

Ophir Munk (3):
  net/tap: fix flow and port commands
  net/failsafe: fix VLAN stripping configuration
  app/testpmd: fix forwarding between non consecutive ports

Pablo de Lara (8):
  hash: fix eviction counter
  crypto/aesni_gcm: fix zero data operation
  app/crypto-perf: fix packet length check
  app/crypto-perf: parse AEAD data from vectors
  crypto/openssl: fix AEAD parameters
  examples/l2fwd-crypto: fix physical address setting
  crypto/qat: fix HMAC supported digest sizes
  app/testpmd: fix topology error message

Patrick MacArthur (1):
  eal: copy raw strings taken from command line

Phil Yang (1):
  app/testpmd: fix quitting in container

Qi Zhang (4):
  net/i40e: fix flow control watermark mismatch
  net/i40e: fix packet count for PF
  net/i40e: fix mbuf free in vector Tx
  net/i40e: fix mirror with firmware 6.0

Rami Rosen (1):
  net/kni: remove driver struct forward declaration

Rasesh Mody (4):
  net/qede/base: fix to use a passed ptt handle
  net/qede/base: fix return code to align with FW
  net/qede: remove duplicate includes
  net/qede/base: fix division by zero

Raslan Darawsheh (2):
  net/failsafe: fix failsafe bus uninit return value
  net/failsafe: fix PCI devices init

RongQiang Xie (2):
  net/enic: fix possible null pointer dereference
  net/qede: fix possible null pointer dereference

Sebastian Basierski (3):
  net/vmxnet3: fix unintentional integer overflow
  net/virtio-user: fix TAP name string termination
  net/virtio: check error on setting non block flag

Sergio Gonzalez Monroy (1):
  crypto/aesni_mb: fix invalid session error

Shahaf Shuler (5):
  net/mlx5: fix num seg assumption in SSE Tx
  net/mlx5: fix Tx stats error counter definition
  net/mlx5: fix Tx stats error counter logic
  net/mlx5: fix TSO segment size verification
  net/mlx5: fix packet type flags for Ethernet only frame

Stefan Baranoff (1):
  net/pcap: fix memory leak in dumper open

Stephen Hemminger (1):
  eal: initialize logging before bus

Tiwei Bie (1):
  net/virtio: flush Rx queues on start

Tomasz Duszynski (3):
  drivers/crypto: use snprintf return value correctly
  examples/ipsec-secgw: fix IP version check
  examples/ipsec-secgw: fix IPv6 payload length

Tomasz Kulasek (3):
  net/bonding: fix slaves capacity check
  net/i40e: fix assignment of enum values
  net/bonding: fix check slaves link properties

Vipin Varghese (1):
  net/tap: fix unregistering callback with invalid fd

Wei Dai (8):
  net/ixgbe: fix mapping of user priority to TC
  net/ixgbe: fix adding a mirror rule
  net/i40e: fix mirror rule reset when port is closed
  net/ixgbe: fix Rx queue interrupt mapping in VF
  net/ixgbe: fix VFIO interrupt mapping in VF
  net/ixgbe: fix PF DCB info
  app/testpmd: fix mapping of user priority to DCB TC
  net/i40e: fix VFIO interrupt mapping in VF

Wei Zhao (4):
  net/ixgbe: fix MAC VLAN filter fail problem
  net/i40e: fix clear xstats bug in VF
  app/testpmd: fix packet throughput after stats reset
  net/ixgbe: fix filter parser for L2 tunnel

Wenzhuo Lu (7):
  net/i40e: fix TM node parameter checking
  net/i40e: fix TM level capability getting
  net/ixgbe: fix TM node parameter checking
  net/ixgbe: fix TM level capability getting
  net/i40e: fix not supporting NULL TM profile
  net/ixgbe: fix not supporting NULL TM profile
  net/i40e: fix parent when adding TM node

Xiaoyun Li (2):
  net/i40e: fix PF notify issue when VF is not up
  net/igb: fix Rx interrupt with VFIO and MSI-X

Xueming Li (5):
  net/mlx5: fix tunnel offload detection
  mem: fix malloc debug config
  mem: fix malloc element free in debug mode
  examples/l2fwd_fork: fix message pool init
  examples/multi_process: fix received message length

Yi Yang (1):
  service: fix build with gcc 4.9

Yong Wang (4):
  net/liquidio: fix uninitialized variable
  net/igb: fix memcpy length
  net/i40e: fix uninitialized variable
  net/ixgbe: fix uninitialized variable

Yongseok Koh (3):
  net/mlx5: fix calculating TSO inline size
  net/mlx5: fix overflow of Rx SW ring
  net/mlx5: fix tunneled TCP/UDP packet type

Yuanhan Liu (2):
  Revert "net/virtio: flush Rx queues on start"
  version: 17.08.1

Zhiyong Yang (2):
  test: fix assignment operation

Re: [dpdk-dev] 17.08.1 patches review and test

2017-12-07 Thread Yuanhan Liu
On Mon, Dec 04, 2017 at 08:40:03PM +, Patil, Harish wrote:
> 
> 
> -Original Message-
> From: dev  on behalf of Yuanhan Liu
> 
> Date: Monday, November 27, 2017 at 4:21 AM
> To: dpdk stable 
> Cc: "dev@dpdk.org" , "Xu, Qian Q" 
> Subject: [dpdk-dev] 17.08.1 patches review and test
> 
> >Hi all,
> >
> >Here is a list of patches targeted for stable release 17.08.1. Please
> >help review and test. The planned date for the final release is 7th,
> >Dec. Before that, please shout if anyone has objections with these
> >patches being applied.
> >
> >These patches are located at branch 17.08 of dpdk-stable repo:
> >http://dpdk.org/browse/dpdk-stable/
> >
> >Thanks.
> >
> >--yliu
> >
> >---
> >Aaron Conole (1):
> >  net/enic: fix assignment
> >
> >Ajit Khaparde (28):
> >  net/bnxt: fix HWRM macros and locking
> >  net/bnxt: use 64-bits of address for VLAN table
> >  net/bnxt: fix an issue with group id calculation
> >  net/bnxt: fix calculation of number of pools
> >  net/bnxt: handle multi queue mode properly
> >  net/bnxt: fix Rx handling and buffer allocation logic
> >  net/bnxt: fix an issue with broadcast traffic
> >  net/bnxt: fix usage of VMDq flags
> >  net/bnxt: set checksum offload flags correctly
> >  net/bnxt: update status of Rx IP/L4 CKSUM
> >  net/bnxt: fix config RSS update
> >  net/bnxt: set the hash key size
> >  net/bnxt: fix per queue stats display in xstats
> >  net/bnxt: fix interrupt handler
> >  net/bnxt: fix number of MAC addresses for VMDq
> >  net/bnxt: fix the association of a MACVLAN per VNIC
> >  net/bnxt: fix Tx offload capability
> >  net/bnxt: fix Rx offload capability
> >  net/bnxt: handle Rx multi queue creation properly
> >  net/bnxt: remove redundant code parsing pool map
> >  net/bnxt: fix a bit shift operation
> >  net/bnxt: fix a potential null pointer dereference
> >  net/bnxt: fix a potential null pointer dereference
> >  net/bnxt: fix a pointer deref before null check
> >  net/bnxt: fix an unused value
> >  net/bnxt: check VLANs from pool map only for VMDq
> >  net/bnxt: do not set hash type unnecessarily
> >  net/bnxt: fix VLAN spoof configuration
> >
> >Akhil Goyal (2):
> >  test/crypto: fix dpaa2 sec macros and definitions
> >  net/dpaa2: set queues after reconfiguration
> >
> >Alejandro Lucero (2):
> >  net/nfp: fix RSS
> >  net/nfp: fix Rx interrupt when multiqueue
> >
> >Alok Makhariya (2):
> >  crypto/dpaa2_sec: remove ICV memset on decryption side
> >  crypto/dpaa2_sec: add check for segmented buffer
> >
> >Anatoly Burakov (1):
> >  vfio: fix secondary process initialization
> >
> >Andrey Chilikin (1):
> >  net/i40e: fix flexible payload configuration
> >
> >Aviad Yehezkel (4):
> >  examples/ipsec-secgw: fix crypto device mapping
> >  examples/ipsec-secgw: fix session creation
> >  examples/ipsec-secgw: fix AAD length setting
> >  app/testpmd: fix build without ixgbe and bnxt PMDs
> >
> >Beilei Xing (1):
> >  net/i40e: fix VF device stop issue
> >
> >Chas Williams (1):
> >  net/vmxnet3: fix memory leak when releasing queues
> >
> >Congwen Zhang (1):
> >  net/cxgbe: fix memory leak
> >
> >Daniel Mrzyglod (3):
> >  net/virtio: fix untrusted scalar value
> >  app/testpmd: fix DDP package filesize detection
> >  net/bonding: fix default aggregator mode to stable
> >
> >David Harton (2):
> >  net/vmxnet3: fix MAC address set
> >  net/i40e: fix i40evf MAC filter table
> >
> >Ferruh Yigit (4):
> >  ethdev: fix ABI version
> >  ethdev: revert use port name from device structure
> >  igb_uio: remove device reset in open
> >  net/qede: fix icc build
> >
> >Gaetan Rivet (1):
> >  net/failsafe: fix errno set on command execution
> >
> >Gowrishankar Muthukrishnan (1):
> >  net/bonding: support bifurcated driver in eal
> >
> >Guduri Prathyusha (2):
> >  examples/l3fwd: fix NEON instructions
> >  examples/l3fwd: fix aliasing in port grouping
> >
> >Harish Patil (2):
> >  net/qede: fix supported packet types
> >  net/qede: fix to re-enable LRO during device start
> >
> >Hemant Agrawal (3):
> >  net/d

Re: [dpdk-dev] [PATCH] [RFC] ether: standardize getting the port by name

2017-12-06 Thread Yuanhan Liu
On Tue, Dec 05, 2017 at 06:22:05PM +0100, Adrien Mazarguil wrote:
> > > > > Just for information, this "port=x" argument in mlx4 is consistent 
> > > > > with the
> > > > > value found in /sys/class/net/ethX/dev_port under Linux. If we choose 
> > > > > to use
> > > > > a port index (instead of a MAC or something else), it would make 
> > > > > sense to
> > > > > standardize it on the same order as given by the host OS for 
> > > > > consistency
> > > > > across all PMDs.
> > > 
> > > Thanks for the info.
> > > 
> > > But FYI, for most of other PMDs, such sys file won't exist, as the host
> > > driver should have been unbind and bind with something like uio. So I
> > > don't think it applies to all other nics.
> 
> Sure, I only meant PMDs must implement the same numbering scheme the kernel
> driver they replace would have used (as exposed through dev_port) for
> consistency. Note dev_port is always present since Linux 3.15, even with
> single port/bus address devices, so applications that want to construct
> -w/-b arguments can rely on that before unbinding devices.

I don't think that's a clean way. Fortunate enough though, even we
want to use the port as one of the key for identification, we don't
really need that in most cases. So, just like the mac proposed here,
we could (and probably should) make it optional.

[...]
> > > I'm not a fan of the MAC naming, neither. The reason this patch proposes 
> > > mac
> > > naming is that it's more clear for the user to specify a port. I also 
> > > agree
> > > that the port index could be another good option.
> > > 
> > > One thing I really haven't considered yet is how it becomes when the VF
> > > reprenstor comes to play? (more guys are cc'ed).
> 
> Won't VFs come through a separate PCI bus address anyway? Not sure extra
> care is needed for those.

Yes, for VF. But I was talking about VF representors [1]. Note that the
interface is not settled down yet, but it's likely we need some interfaces
(or more precisely, some concepts) like that. In that proposal, several
ports could share a same vdev name, while each one is differenced by a
vport_id. Again, it's not settled down yet, it might be something else
in the end, say all share a same (PF) PCI id, while each one is identified
by something else, etc.

[1]: http://dpdk.org/ml/archives/dev/2017-November/080946.html

--yliu


Re: [dpdk-dev] [PATCH v2 2/2] net/virtio: support GUEST ANNOUNCE

2017-12-06 Thread Yuanhan Liu
On Wed, Dec 06, 2017 at 07:23:11PM +0800, Tiwei Bie wrote:
> On Mon, Dec 04, 2017 at 06:02:08AM -0800, Xiao Wang wrote:
> [...]
> > diff --git a/drivers/net/virtio/virtio_rxtx.c 
> > b/drivers/net/virtio/virtio_rxtx.c
> > index 6a24fde..7313bdd 100644
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -1100,3 +1100,84 @@
> >  
> > return nb_tx;
> >  }
> > +
> > +uint16_t
> > +virtio_inject_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t 
> > nb_pkts)
> > +{
> > +   struct virtnet_tx *txvq = tx_queue;
> > +   struct virtqueue *vq = txvq->vq;
> > +   struct virtio_hw *hw = vq->hw;
> > +   uint16_t hdr_size = hw->vtnet_hdr_size;
> > +   uint16_t nb_used, nb_tx = 0;
> > +
> > +   if (unlikely(nb_pkts < 1))
> > +   return nb_pkts;
> > +
> > +   PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts);
> > +   nb_used = VIRTQUEUE_NUSED(vq);
> > +
> > +   virtio_rmb();
> > +   if (likely(nb_used > vq->vq_nentries - vq->vq_free_thresh))
> > +   virtio_xmit_cleanup(vq, nb_used);
> > +
> > +   for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
> > +   struct rte_mbuf *txm = tx_pkts[nb_tx];
> > +   int can_push = 0, use_indirect = 0, slots, need;
> > +
> > +   /* optimize ring usage */
> > +   if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
> > +   vtpci_with_feature(hw, 
> > VIRTIO_F_VERSION_1)) &&
> > +   rte_mbuf_refcnt_read(txm) == 1 &&
> > +   RTE_MBUF_DIRECT(txm) &&
> > +   txm->nb_segs == 1 &&
> > +   rte_pktmbuf_headroom(txm) >= hdr_size &&
> > +   rte_is_aligned(rte_pktmbuf_mtod(txm, char *),
> > +   __alignof__(struct virtio_net_hdr_mrg_rxbuf)))
> > +   can_push = 1;
> > +   else if (vtpci_with_feature(hw, VIRTIO_RING_F_INDIRECT_DESC) &&
> > +txm->nb_segs < VIRTIO_MAX_TX_INDIRECT)
> > +   use_indirect = 1;
> > +
> > +   /* How many main ring entries are needed to this Tx?
> > +* any_layout => number of segments
> > +* indirect   => 1
> > +* default=> number of segments + 1
> > +*/
> > +   slots = use_indirect ? 1 : (txm->nb_segs + !can_push);
> > +   need = slots - vq->vq_free_cnt;
> > +
> > +   /* Positive value indicates it need free vring descriptors */
> > +   if (unlikely(need > 0)) {
> > +   nb_used = VIRTQUEUE_NUSED(vq);
> > +   virtio_rmb();
> > +   need = RTE_MIN(need, (int)nb_used);
> > +
> > +   virtio_xmit_cleanup(vq, need);
> > +   need = slots - vq->vq_free_cnt;
> > +   if (unlikely(need > 0)) {
> > +   PMD_TX_LOG(ERR,
> > +   "No free tx descriptors to 
> > transmit");
> > +   break;
> > +   }
> > +   }
> > +
> > +   /* Enqueue Packet buffers */
> > +   virtqueue_enqueue_xmit(txvq, txm, slots, use_indirect, 
> > can_push);
> > +
> > +   txvq->stats.bytes += txm->pkt_len;
> > +   virtio_update_packet_stats(&txvq->stats, txm);
> > +   }
> > +
> > +   txvq->stats.packets += nb_tx;
> > +
> > +   if (likely(nb_tx)) {
> > +   vq_update_avail_idx(vq);
> > +
> > +   if (unlikely(virtqueue_kick_prepare(vq))) {
> > +   virtqueue_notify(vq);
> > +   PMD_TX_LOG(DEBUG, "Notified backend after xmit");
> > +   }
> > +   }
> > +
> > +   return nb_tx;
> > +}
> 
> Simple Tx has some special assumptions and setups of the txq.
> Basically the current implementation of virtio_inject_pkts()
> is a mirror of virtio_xmit_pkts(). So when simple Tx function
> is chosen, calling virtio_inject_pkts() could cause problems.

That's why I suggested to invoke the tx_pkt_burst callback directly.

--yliu


Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes

2017-12-06 Thread Yuanhan Liu
On Wed, Dec 06, 2017 at 03:55:49PM +0200, Victor Kaplansky wrote:
> When performing live migration or memory hot-plugging,
> the changes to the device and vrings made by message handler
> done independently from vring usage by PMD threads.
> 
> This causes for example segfauls during live-migration
> with MQ enable, but in general virtually any request
> sent by qemu changing the state of device can cause
> problems.
> 
> These patches fixes all above issues by adding a spinlock
> to every vring and requiring message handler to start operation
> only after ensuring that all PMD threads related to the divece
> are out of critical section accessing the vring data.
> 
> Each vring has its own lock in order to not create contention
> between PMD threads of different vrings and to prevent
> performance degradation by scaling queue pair number.

Hi,

Thanks for the patch! Firstly, I didn't see your SoB.

I'm also more interested to know do you see any performance penalty?

> ---
>  lib/librte_vhost/vhost.h  |  1 +
>  lib/librte_vhost/vhost_user.c | 44 
> +++
>  lib/librte_vhost/virtio_net.c |  8 
>  3 files changed, 53 insertions(+)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 1cc81c17..812aeccd 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -137,6 +137,7 @@ struct vhost_virtqueue {
>   TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
>   int iotlb_cache_nr;
>   TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
> +rte_spinlock_t active_lock;

The indentation is broken.

>  } __rte_cache_aligned;
> @@ -1356,6 +1361,8 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>   if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
>   vhost_user_iotlb_rd_unlock(vq);
>  
> +rte_spinlock_unlock(&vq->active_lock);

Ditto.

--yliu


Re: [dpdk-dev] The limitation of VHOST_MEMORY_MAX_NREGIONS in virtio-user with vhost-user backend

2017-12-06 Thread Yuanhan Liu
On Wed, Dec 06, 2017 at 12:23:19PM +0100, Maxime Coquelin wrote:
> Hi,
> 
> On 12/05/2017 07:58 AM, Wang, Wei 5. (NSB - CN/Hangzhou) wrote:
> >
> >
> >Hi, all
> >
> >In DPDK document, it it described that Virtio in containers Cannot work when 
> >there are more than VHOST_MEMORY_MAX_NREGIONS(8) hugepages. In another word, 
> >do not use 2MB hugepage so far. Do you know the reason of this limitation?
> 
> What document are you referring to?
> Actually, a single region is composed of several hugepages, be it 1G or
> 2M. I use 2M hugepages daily without issues.

He was talking about virtio-user.

--yliu
> 
> The limitation comes from the Vhost-user protocol spec. I know there are
> discussions to enlarge it, but it would require a new protocol flag not
> to break existing implementations, so no workaround possible.
> 
> >In my envirionment, the pdpe1gb is not set in cpu flag, so hugepage can't be 
> >set to 1GB size. The hugepage number shall be more than 8
> >
> >
> >So is there any solution or workaround to fix this limitation? Or change 
> >dpdk code to fix this limitation?
> 
> Have you tried to use 2MB pages and faced the NREGIONS > 8 case?
> If so, could you tell us more about your setup (QEMU version, VM RAM
> size, etc...)?
> 
> Cheers,
> Maxime
> >
> >
> >Vivian
> >seat:  19009
> >phone: 13738006921
> >
> >
> >


Re: [dpdk-dev] Your next patch "net/virtio: fix an incorrect behavior..."

2017-12-05 Thread Yuanhan Liu
Thanks for the remind! I have reverted it.

--yliu

On Wed, Dec 06, 2017 at 03:11:48PM +0800, Tiwei Bie wrote:
> Hi Yuanhan,
> 
> Thanks for Antonio's reminder!
> 
> Below patch may break vector Rx in some cases [1].
> 
> http://dpdk.org/browse/dpdk-stable/commit/drivers/net/virtio?h=17.08&id=f4e455b776f61ec7fbdfb5dce6ba34ad9016689b
> 
> And I'm working on a fix for it. But as the deadline of 17.08.1
> is approaching, we don't have enough time to get the fix reviewed
> and tested by the community. I'm not sure, maybe we should remove
> it from the stable branch for 17.08.1 for now?
> 
> [1] http://dpdk.org/ml/archives/dev/2017-December/082983.html
> 
> Best regards,
> Tiwei Bie
> 
> On Tue, Dec 05, 2017 at 09:14:39PM +0800, Fischetti, Antonio wrote:
> > Thanks Tiwei,
> > one more thing. I saw they are going to backport some patches into 17.08.1
> > 
> > http://dev.dpdk.narkive.com/pX0IIwPl/dpdk-dev-17-08-1-patches-review-and-test
> > 
> > is your next fix related to the one reported as "net/virtio: flush Rx 
> > queues on start"?
> > 
> > If that's the case, should we flag this to Yuanhan Liu, just to be aware 
> > there's some work in progress?
> > 
> > 
> > Antonio
> > 


Re: [dpdk-dev] The limitation of VHOST_MEMORY_MAX_NREGIONS in virtio-user with vhost-user backend

2017-12-05 Thread Yuanhan Liu
On Tue, Dec 05, 2017 at 06:58:53AM +, Wang, Wei 5. (NSB - CN/Hangzhou) 
wrote:
> 
> 
> Hi, all
> 
> In DPDK document, it it described that Virtio in containers Cannot work when 
> there are more than VHOST_MEMORY_MAX_NREGIONS(8) hugepages. In another word, 
> do not use 2MB hugepage so far. Do you know the reason of this limitation?

It comes from the vhost-user spec.

> In my envirionment, the pdpe1gb is not set in cpu flag, so hugepage can't be 
> set to 1GB size. The hugepage number shall be more than 8
> 
> 
> So is there any solution or workaround to fix this limitation? Or change dpdk 
> code to fix this limitation?
> 

AFAIK, there are no workarounds. And I think the DPDK EAL memory code need be
refactored a bit to fix this limitation. Unforunately, seems there are no work
on that recently.

--yliu


Re: [dpdk-dev] [PATCH 2/2] net/virtio: support GUEST ANNOUNCE

2017-12-05 Thread Yuanhan Liu
On Thu, Nov 30, 2017 at 02:41:12AM +, Wang, Xiao W wrote:
> 
> 
> > -Original Message-
> > From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> > Sent: Monday, November 27, 2017 8:49 PM
> > To: Wang, Xiao W 
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH 2/2] net/virtio: support GUEST ANNOUNCE
> > 
> > On Fri, Nov 24, 2017 at 03:04:00AM -0800, Xiao Wang wrote:
> > > When live migration is done, for the backup VM, either the virtio
> > > frontend or the vhost backend needs to send out gratuitous RARP packet
> > > to announce its new network location.
> > >
> > > This patch enables VIRTIO_NET_F_GUEST_ANNOUNCE feature to support
> > live
> > > migration scenario where the vhost backend doesn't have the ability to
> > > generate RARP packet.
> > 
> > Yes, it's a feature good to have.
> > 
> > > +static int
> > > +virtio_dev_pause(struct rte_eth_dev *dev)
> > > +{
> > > + struct virtio_hw *hw = dev->data->dev_private;
> > > +
> > > + if (hw->started == 0)
> > > + return -1;
> > > + hw->started = 0;
> > > + /*
> > > +  * Prevent the worker thread from touching queues to avoid condition,
> > > +  * 1 ms should be enough for the ongoing Tx function to finish.
> > > +  */
> > > + rte_delay_ms(1);
> > > + return 0;
> > > +}
> > > +
> > > +static void
> > > +virtio_dev_resume(struct rte_eth_dev *dev)
> > > +{
> > > + struct virtio_hw *hw = dev->data->dev_private;
> > > +
> > > + hw->started = 1;
> > > +}
> > 
> > However, the implementation (stop first, pause for 1ms, duplicate another
> > Tx function, resume) doesn't seem elegant.
> > 
> > You probably could try something like DPDK vhost does:
> > 
> > - set a flag when S_ANNOUCE is received
> > - inject a pkt when such flag is set in the xmit function
> > 
> > You then should be able to get rid of all of above stuffs.
> > 
> > --yliu
> 
> The difference is that the virtio port may just receive packet, without xmit.

Thanks, I missed that.

However, you really should not add a duplicate function. It adds more
maintain effort. I think you probably could just invoke the tx_pkt_burst
callback directly. You have stopped the device after all. What's the
necessary to duplicate it?

--yliu


Re: [dpdk-dev] [PATCH] vhost: fix segfault as handle set_mem_table message

2017-12-05 Thread Yuanhan Liu
On Tue, Nov 28, 2017 at 01:09:29PM +0100, Maxime Coquelin wrote:
> 
> 
> On 11/15/2017 12:41 PM, Jianfeng Tan wrote:
> >In a running VM, operations (like device attach/detach) will
> >trigger the QEMU to resend set_mem_table to vhost-user backend.
> >
> >DPDK vhost-user handles this message rudely by unmap all existing
> >regions and map new ones. This might lead to segfault if there
> >is pmd thread just trying to touch those unmapped memory regions.
> >
> >But for most cases, except VM memory hotplug, QEMU still sends the
> >set_mem_table message even the memory regions are not changed as
> >QEMU vhost-user filters out those not backed by file (fd > 0).
> >
> >To fix this case, we add a check in the handler to see if the
> >memory regions are really changed; if not, we just keep old memory
> >regions.
> >
> >Fixes: 8f972312b8f4 ("vhost: support vhost-user")
> >
> >CC: sta...@dpdk.org
> >
> >CC: Yuanhan Liu 
> >CC: Maxime Coquelin 
> >
> >Reported-by: Yang Zhang 
> >Reported-by: Xin Long 
> >Signed-off-by: Yi Yang 
> >Signed-off-by: Jianfeng Tan 
> >---
> >  lib/librte_vhost/vhost_user.c | 33 +
> >  1 file changed, 33 insertions(+)
> 
> Reviewed-by: Maxime Coquelin 

Applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario

2017-12-05 Thread Yuanhan Liu
On Tue, Dec 05, 2017 at 06:59:26AM +, Yang, Zhiyong wrote:
> Hi all,
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhiyong Yang
> > Sent: Thursday, November 30, 2017 5:47 PM
> > To: dev@dpdk.org; y...@fridaylinux.org; maxime.coque...@redhat.com
> > Cc: Wang, Wei W ; Tan, Jianfeng
> > 
> > Subject: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> > supporting VM2VM scenario
> > 
> > Vhostpci PMD is a new type driver working in guest OS which has ability to 
> > drive
> > the vhostpci modern pci device, which is a new virtio device.
> > 
> > The following linking is about vhostpci design:
> > 
> > An initial device design is presented at KVM Forum'16:
> > http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-
> > pci.pdf
> > The latest device design and implementation will be posted to the QEMU
> > community soon.
> > 
> 
> The latest version vhostpci device code has been posted to QEMU community.
> [PATCH v3 0/7] Vhost-pci for inter-VM communication
> http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg00494.html

I didn't follow on the qemu ML. Do you know when it's likely it will
get accepted?

--yliu
> 
> [2016] Design of Vhost-pci by Wei Wang - YouTube linking as reference.
> https://www.youtube.com/watch?v=xITj0qsaSJQ
> 
> thanks
> Zhiyong


Re: [dpdk-dev] [PATCH] [RFC] ether: standardize getting the port by name

2017-12-05 Thread Yuanhan Liu
On Tue, Dec 05, 2017 at 02:20:05PM +0100, Thomas Monjalon wrote:
> 05/12/2017 12:04, Adrien Mazarguil:
> 
> > Hi Yuanhan,
> 
> >
> 
> > On Mon, Dec 04, 2017 at 09:55:31PM +0800, Yuanhan Liu wrote:
> > > rte_devargs is identified by the name (pci id for pci device). It also
> > > includes other driver specific key-value options. It's not clear for the
> > > user to know which one (or few) of them should be used together with the
> > > PCI id to identify a specific port. For example, as you mentioned, in
> > > mlx4, it's "pci_id,port=x". It could be something else in other drivers.
> 
> > Just for information, this "port=x" argument in mlx4 is consistent with the
> > value found in /sys/class/net/ethX/dev_port under Linux. If we choose to use
> > a port index (instead of a MAC or something else), it would make sense to
> > standardize it on the same order as given by the host OS for consistency
> > across all PMDs.

Thanks for the info.

But FYI, for most of other PMDs, such sys file won't exist, as the host
driver should have been unbind and bind with something like uio. So I
don't think it applies to all other nics.

> 
> Good idea, thanks.
> 
>  
> 
> I think we will MAC in some cases and port number in others.

Could you list some examples?

> It is important to have identifiers available even without initializing
> the device.

I don't object that. I think that's good for whitelisting/blacklisting
before the device initialization. However, once the initialization is
done, everything could be used to identify a port (say, the mac this patch
mentioned).

> > Devices with a single port per PCI address would simply use/allow "0".

Yes.

> >
> 
> > > Actually, this patch also proposes a devarg like naming style: "name[,mac]
> ".
> > > What it does try to do is to define a standard syntax, so that the user
> > > doesn't have to know any driver specific options.
> > >
> > > However, the mac address is changeable, leaving this naming inconsistent.
> > > Well, in practice, PCI id is also changeable.
> 
> > >
> 
> > > OTOH, having a consistent naming seems a bit far away from this patch's
> > > goal: define a standard ethdev naming and leave less harassment to the
> users.
> 
> >
> 
> > I'm not a fan of the MAC naming scheme either, a kind of per-device physical
> > port index seems more robust and doesn't require much initialization to
> > determine how many ports are supported by the device and whether the index 
> > is
> > known/valid (particularly given the vast majority exposes only one per bus
> > address).

I'm not a fan of the MAC naming, neither. The reason this patch proposes mac
naming is that it's more clear for the user to specify a port. I also agree
that the port index could be another good option.

One thing I really haven't considered yet is how it becomes when the VF
reprenstor comes to play? (more guys are cc'ed).

> > Would it make sense to have this name usable unmodified as a valid device
> > (-w) argument, including parameters?
> 
> Yes we must provide some new key/value arguments for devargs.
> So it can be used anywhere, including -w/-b options in DPDK
> or port configuration in OVS.
>  
> 
> > If so, PMDs could append parameters while probing the underlying device, by
> > appending ",port=x", ",mac=x" followed by other unspecified parameters with
> > default values. This could uniquely identify the port _and_ its
> > configuration in a reusable fashion.
> 
>  
> 
> Question: should we separate device identification and configuration
> in the syntax?

Yes, we should. I think we should parse the identification generically,
and leave the left to the driver.

> > Otherwise if all we need is unique names, we can use the opposite and much
> > simpler approach. Let librte_ether assign them sequentially
> > (e.g. "rte_eth%d", no need for consistency with OS netdevices), applications
> > can figure the rest based on data structures if needed.
> 
> No, unique names are not useful / not usable by users.

Agree. We don't really need another unique name, since the port id is
already unique.

However, I think a consistent naming might be useful: user doesn't have
to worry it even though MAC is changed or the PCI slot is changed.

--yliu


Re: [dpdk-dev] [PATCH] maintainers: claim responsibility for virtio PMD

2017-12-05 Thread Yuanhan Liu
On Fri, Dec 01, 2017 at 08:06:04PM +0800, Yuanhan Liu wrote:
> On Fri, Dec 01, 2017 at 07:04:35PM +0800, Tiwei Bie wrote:
> > Add myself as co-maintainer for virtio driver.
> > 
> > Signed-off-by: Tiwei Bie 
> > ---
> > Hi Yuanhan and Maxime,
> > 
> > As there will be more and more virtio related stuffs (e.g. vhost-pci,
> > virtio crypto, vhost crypto) in DPDK, more maintaining efforts will be
> > needed for virtio/vhost. So I volunteer to co-maintain virtio driver.
> > Hope this could help!
> 
> I'm glad that you are offering help! And
> 
> Acked-by: Yuanhan Liu 

It was a mistake; I switched to an older machine, with older configs.
Sorry for that. And here is the right one:

Acked-by: Yuanhan Liu 

--yliu


  1   2   3   4   5   6   7   8   9   10   >