[dpdk-dev] [PATCH v3 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2015-12-24 Thread Xie, Huawei
On 12/24/2015 2:49 AM, Ananyev, Konstantin wrote:
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
>> Sent: Wednesday, December 23, 2015 6:38 PM
>> To: Xie, Huawei
>> Cc: dev at dpdk.org; dprovan at bivio.net
>> Subject: Re: [dpdk-dev] [PATCH v3 1/2] mbuf: provide rte_pktmbuf_alloc_bulk 
>> API
>>
>> On Wed, 23 Dec 2015 00:17:53 +0800
>> Huawei Xie  wrote:
>>
>>> +
>>> +   rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
>>> +   if (unlikely(rc))
>>> +   return rc;
>>> +
>>> +   switch (count % 4) {
>>> +   case 0: while (idx != count) {
>>> +   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
>>> +   rte_mbuf_refcnt_set(mbufs[idx], 1);
>>> +   rte_pktmbuf_reset(mbufs[idx]);
>>> +   idx++;
>>> +   case 3:
>>> +   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
>>> +   rte_mbuf_refcnt_set(mbufs[idx], 1);
>>> +   rte_pktmbuf_reset(mbufs[idx]);
>>> +   idx++;
>>> +   case 2:
>>> +   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
>>> +   rte_mbuf_refcnt_set(mbufs[idx], 1);
>>> +   rte_pktmbuf_reset(mbufs[idx]);
>>> +   idx++;
>>> +   case 1:
>>> +   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
>>> +   rte_mbuf_refcnt_set(mbufs[idx], 1);
>>> +   rte_pktmbuf_reset(mbufs[idx]);
>>> +   idx++;
>>> +   }
>>> +   }
>>> +   return 0;
>>> +}
>> Since function will not work if count can not be 0 (otherwise 
>> rte_mempool_get_bulk will fail),
> As I understand, rte_mempool_get_bulk() will work correctly and return 0, if 
> count==0.
> That's why Huawei prefers while() {}, instead of do {} while() - to avoid 
> extra check for
> (count != 0) at the start. 
> Konstantin

Yes.

>
>
>> why not:
>>  1. Document that assumption
>>  2. Use that assumption to speed up code.
>>
>>
>>
>>  switch(count % 4) {
>>  do {
>>  case 0:
>>  ...
>>  case 1:
>>  ...
>>  } while (idx != count);
>>  }
>>
>> Also you really need to add a big block comment about this loop, to explain
>> what it does and why.

Since we change duff's implementation a bit, and for people who don't
know duff's device, we could add comment.
Is comment like this enough?
Use Duff's device to unroll the loop a bit to gain more performance
Use while() rather than do {} while() as count could be zero.




[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-24 Thread Tetsuya Mukawa
On 2015/12/22 13:47, Rich Lane wrote:
> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu 
> wrote:
>
>> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
>>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
>> in a few
>>> ways:
>> Rich, thanks for the info!
>>
>>> 1. new_device/destroy_device: Link state change (will be covered by the
>> link
>>> status interrupt).
>>> 2. new_device: Add first queue to datapath.
>> I'm wondering why vring_state_changed() is not used, as it will also be
>> triggered at the beginning, when the default queue (the first queue) is
>> enabled.
>>
> Turns out I'd misread the code and it's already using the
> vring_state_changed callback for the
> first queue. Not sure if this is intentional but vring_state_changed is
> called for the first queue
> before new_device.
>
>
>>> 3. vring_state_changed: Add/remove queue to datapath.
>>> 4. destroy_device: Remove all queues (vring_state_changed is not called
>> when
>>> qemu is killed).
>> I had a plan to invoke vring_state_changed() to disable all vrings
>> when destroy_device() is called.
>>
> That would be good.
>
>
>>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>> You can get the 'struct virtio_net' dev from all above callbacks.
>
>
>> 1. Link status interrupt.
>>
>> To vhost pmd, new_device()/destroy_device() equals to the link status
>> interrupt, where new_device() is a link up, and destroy_device() is link
>> down().
>>
>>
>>> 2. New queue_state_changed callback. Unlike vring_state_changed this
>> should
>>> cover the first queue at new_device and removal of all queues at
>>> destroy_device.
>> As stated above, vring_state_changed() should be able to do that, except
>> the one on destroy_device(), which is not done yet.
>>
>>> 3. Per-queue or per-device NUMA node info.
>> You can query the NUMA node info implicitly by get_mempolicy(); check
>> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>>
> Your suggestions are exactly how my application is already working. I was
> commenting on the
> proposed changes to the vhost PMD API. I would prefer to
> use RTE_ETH_EVENT_INTR_LSC
> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
> of these vhost-specific
> hacks. The queue state change callback is the one new API that needs to be
> added because
> normal NICs don't have this behavior.
>
> You could add another rte_eth_event_type for the queue state change
> callback, and pass the
> queue ID, RX/TX direction, and enable bit through cb_arg. 

Hi Rich,

So far, EAL provides rte_eth_dev_callback_register() for event handling.
DPDK app can register callback handler and "callback argument".
And EAL will call callback handler with the argument.
Anyway, vhost library and PMD cannot change the argument.

I guess the callback handler will need to call ethdev APIs to know what
causes the interrupt.
Probably rte_eth_dev_socket_id() is to know numa_node, and
rte_eth_dev_info_get() is to know the number of queues.
Is this okay for your case?

Thanks,
Tetsuya


[dpdk-dev] [Question] How pmd virtio works without UIO?

2015-12-24 Thread Yuanhan Liu
On Wed, Dec 23, 2015 at 11:26:17PM +0100, Thomas Monjalon wrote:
> 2015-12-23 10:09, Yuanhan Liu:
> > On Wed, Dec 23, 2015 at 09:55:54AM +0800, Peter Xu wrote:
> > > On Tue, Dec 22, 2015 at 04:38:30PM +, Xie, Huawei wrote:
> > > > On 12/22/2015 7:39 PM, Peter Xu wrote:
> > > > > I tried to unbind one of the virtio net device, I see the PCI entry
> > > > > still there.
> > > > >
> > > > > Before unbind:
> > > > >
> > > > > [root at vm proc]# lspci -k -s 00:03.0
> > > > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> > > > > Subsystem: Red Hat, Inc Device 0001
> > > > > Kernel driver in use: virtio-pci
> > > > > [root at vm proc]# cat /proc/ioports | grep c060-c07f
> > > > >   c060-c07f : :00:03.0
> > > > > c060-c07f : virtio-pci
> > > > >
> > > > > After unbind:
> > > > >
> > > > > [root at vm proc]# lspci -k -s 00:03.0
> > > > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> > > > > Subsystem: Red Hat, Inc Device 0001
> > > > > [root at vm proc]# cat /proc/ioports | grep c060-c07f
> > > > >   c060-c07f : :00:03.0
> > > > >
> > > > > So... does this means that it is an alternative to black list
> > > > > solution?
> > > > Oh, we could firstly check if this port is manipulated by kernel driver
> > > > in virtio_resource_init/eth_virtio_dev_init, as long as it is not too 
> > > > late.
> > 
> > Why can't we simply quit at pci_scan_one, once finding that it's not
> > bond to uio (or similar stuff)? That would be generic enough, that we
> > don't have to do similar checks for each new pmd driver.
> > 
> > Or, am I missing something?
> 
> UIO is not needed to make virtio works (without interrupt support).
> Sometimes it may be required to avoid using kernel modules.

While dig the git history, I found the commit:

commit da978dfdc43b59e290a46d7ece5fd19ce79a1162
Author: Ouyang Changchun 
Date:   Mon Feb 9 09:14:06 2015 +0800

virtio: use port IO to get PCI resource

Make virtio not require UIO for some security reasons, this is to match
6WIND's virtio-net-pmd.

Signed-off-by: Changchun Ouyang 
Acked-by: Huawei Xie 

The commit log is not well written, giving no explanation about the
"some security reasons".

Anyway, I see it now that it's kind of a design.


Note that my first patch set about enabling virtio 1.0 [0] sets the
RTE_PCI_DRV_NEED_MAPPING flag, which somehow implies that uio is a
must, otherwise, eal init would fail at pci_map_device().

So that my pathset breaks the un-documented rule, and I need fix it.
How about adding a wrapper, say rte_pci_map_device(), and exporting
it, so that virtio pmd driver could map resources when necessary?

[0]: http://dpdk.org/dev/patchwork/bundle/yliu/virtio-1.0-pmd/


> > > I guess there might be two problems? Which are:
> > > 
> > > 1. How user avoid DPDK taking over virtio devices that they do not
> > >want for IO (chooses which device to use)
> > 
> > Isn't that what's the 'binding/unbinding' for?
> 
> Binding is, sometimes, required.

We may need fix the doc then. As the doc says it's a must:

3.6. Binding and Unbinding Network Ports to/from the Kernel Modules

Instead, all ports that are to be used by an DPDK application
==> must be bound to the uio_pci_generic, igb_uio or vfio-pci
module before the application is run. Any network ports under
Linux* control will be ignored by the DPDK poll-mode drivers
and cannot be used by the application.


--yliu

> But does it mean DPDK should use every available ports?
> That's the default and may be configured with blacklist/whitelist.
> 
> > > 2. Driver conflict between virtio PMD in DPDK, and virtio-pci in
> > >kernel (happens on every virtio device that DPDK uses)
> > 
> > If you unbinded the kernel driver first, which is the suggested (or
> > must?) way to use DPDK, that will not happen.


[dpdk-dev] [PATCH] virtio: fix crashes in virtio stats functions

2015-12-24 Thread Yuanhan Liu
On Wed, Dec 23, 2015 at 09:45:19AM +, Bernard Iremonger wrote:
> This initialisation of nb_rx_queues and nb_tx_queues has been removed
> from eth_virtio_dev_init.
> 
> The nb_rx_queues and nb_tx_queues were being initialised in 
> eth_virtio_dev_init
> before the tx_queues and rx_queues arrays were allocated.
> 
> The arrays are allocated when the ethdev port is configured and the
> nb_tx_queues and nb_rx_queues are initialised.
> 
> If any of the following functions were called before the ethdev
> port was configured there was a segmentation fault because
> rx_queues and tx_queues were NULL:
> 
> rte_eth_stats_get
> rte_eth_stats_reset
> rte_eth_xstats_get
> rte_eth_xstats_reset
> 
> Fixes: 823ad647950a ("virtio: support multiple queues")
> Signed-off-by: Bernard Iremonger 

Acked-by: Yuanhan Liu 

Thanks.

--yliu


[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-24 Thread Yuanhan Liu
On Wed, Dec 23, 2015 at 11:00:15PM +0100, Thomas Monjalon wrote:
> 2015-12-23 10:44, Yuanhan Liu:
> > On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
> > > On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu  > > linux.intel.com>
> > > wrote:
> > > 
> > > On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> > > > The queue state change callback is the one new API that needs to be
> > > > added because
> > > > normal NICs don't have this behavior.
> > > 
> > > Again I'd ask, will vring_state_changed() be enough, when above issues
> > > are resolved: vring_state_changed() will be invoked at new_device()/
> > > destroy_device(), and of course, ethtool change?
> > > 
> > > 
> > > It would be sufficient. It is not a great API though, because it requires 
> > > the
> > > application to do the conversion from struct virtio_net to a DPDK port 
> > > number,
> > > and from a virtqueue index to a DPDK queue id and direction. Also, the 
> > > current
> > > implementation often makes this callback when the vring state has not 
> > > actually
> > > changed (enabled -> enabled and disabled -> disabled).
> > > 
> > > If you're asking about using vring_state_changed() _instead_ of the link 
> > > status
> > > event and rte_eth_dev_socket_id(),
> > 
> > No, I like the idea of link status event and rte_eth_dev_socket_id();
> > I was just wondering why a new API is needed. Both Tetsuya and I
> > were thinking to leverage the link status event to represent the
> > queue stats change (triggered by vring_state_changed()) as well,
> > so that we don't need to introduce another eth event. However, I'd
> > agree that it's better if we could have a new dedicate event.
> > 
> > Thomas, here is some background for you. For vhost pmd and linux
> > virtio-net combo, the queue can be dynamically changed by ethtool,
> > therefore, the application wishes to have another eth event, say
> > RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can
> > add/remove corresponding queue to the datapath when that happens.
> > What do you think of that?
> 
> Yes it is an event. So I don't understand the question.
> What may be better than a specific rte_eth_event_type?

The alternative is a new set of callbacks, but judging that we already
have a set of callback for vhost libraray, and adding a new set to vhost
pmd doesn't seem elegant to me.

Therefore, I'd prefer a new eth event.

--yliu


[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-24 Thread Tetsuya Mukawa
On 2015/12/24 12:09, Tetsuya Mukawa wrote:
> On 2015/12/22 13:47, Rich Lane wrote:
>> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu 
>> wrote:
>>
>>> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
 I'm using the vhost callbacks and struct virtio_net with the vhost PMD
>>> in a few
 ways:
>>> Rich, thanks for the info!
>>>
 1. new_device/destroy_device: Link state change (will be covered by the
>>> link
 status interrupt).
 2. new_device: Add first queue to datapath.
>>> I'm wondering why vring_state_changed() is not used, as it will also be
>>> triggered at the beginning, when the default queue (the first queue) is
>>> enabled.
>>>
>> Turns out I'd misread the code and it's already using the
>> vring_state_changed callback for the
>> first queue. Not sure if this is intentional but vring_state_changed is
>> called for the first queue
>> before new_device.
>>
>>
 3. vring_state_changed: Add/remove queue to datapath.
 4. destroy_device: Remove all queues (vring_state_changed is not called
>>> when
 qemu is killed).
>>> I had a plan to invoke vring_state_changed() to disable all vrings
>>> when destroy_device() is called.
>>>
>> That would be good.
>>
>>
 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>>> You can get the 'struct virtio_net' dev from all above callbacks.
>>
>>> 1. Link status interrupt.
>>>
>>> To vhost pmd, new_device()/destroy_device() equals to the link status
>>> interrupt, where new_device() is a link up, and destroy_device() is link
>>> down().
>>>
>>>
 2. New queue_state_changed callback. Unlike vring_state_changed this
>>> should
 cover the first queue at new_device and removal of all queues at
 destroy_device.
>>> As stated above, vring_state_changed() should be able to do that, except
>>> the one on destroy_device(), which is not done yet.
>>>
 3. Per-queue or per-device NUMA node info.
>>> You can query the NUMA node info implicitly by get_mempolicy(); check
>>> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>>>
>> Your suggestions are exactly how my application is already working. I was
>> commenting on the
>> proposed changes to the vhost PMD API. I would prefer to
>> use RTE_ETH_EVENT_INTR_LSC
>> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
>> of these vhost-specific
>> hacks. The queue state change callback is the one new API that needs to be
>> added because
>> normal NICs don't have this behavior.
>>
>> You could add another rte_eth_event_type for the queue state change
>> callback, and pass the
>> queue ID, RX/TX direction, and enable bit through cb_arg. 
> Hi Rich,
>
> So far, EAL provides rte_eth_dev_callback_register() for event handling.
> DPDK app can register callback handler and "callback argument".
> And EAL will call callback handler with the argument.
> Anyway, vhost library and PMD cannot change the argument.
>
> I guess the callback handler will need to call ethdev APIs to know what
> causes the interrupt.
> Probably rte_eth_dev_socket_id() is to know numa_node, and
> rte_eth_dev_info_get() is to know the number of queues.

It seems rte_eth_dev_info_get() is not enough, because DPDK application
needs not only the number of queues, but also which queues are enabled
or disabled at least.
I guess current interrupt mechanism and ethdev APIs cannot provides such
information, because it's something special for vhost case.

Tetsuya

> Is this okay for your case?
>
> Thanks,
> Tetsuya



[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-24 Thread Yuanhan Liu
On Thu, Dec 24, 2015 at 12:09:10PM +0900, Tetsuya Mukawa wrote:
> On 2015/12/22 13:47, Rich Lane wrote:
> > On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu  > linux.intel.com>
> > wrote:
> >
> >> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> >>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
> >> in a few
> >>> ways:
> >> Rich, thanks for the info!
> >>
> >>> 1. new_device/destroy_device: Link state change (will be covered by the
> >> link
> >>> status interrupt).
> >>> 2. new_device: Add first queue to datapath.
> >> I'm wondering why vring_state_changed() is not used, as it will also be
> >> triggered at the beginning, when the default queue (the first queue) is
> >> enabled.
> >>
> > Turns out I'd misread the code and it's already using the
> > vring_state_changed callback for the
> > first queue. Not sure if this is intentional but vring_state_changed is
> > called for the first queue
> > before new_device.
> >
> >
> >>> 3. vring_state_changed: Add/remove queue to datapath.
> >>> 4. destroy_device: Remove all queues (vring_state_changed is not called
> >> when
> >>> qemu is killed).
> >> I had a plan to invoke vring_state_changed() to disable all vrings
> >> when destroy_device() is called.
> >>
> > That would be good.
> >
> >
> >>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> >> You can get the 'struct virtio_net' dev from all above callbacks.
> >
> >
> >> 1. Link status interrupt.
> >>
> >> To vhost pmd, new_device()/destroy_device() equals to the link status
> >> interrupt, where new_device() is a link up, and destroy_device() is link
> >> down().
> >>
> >>
> >>> 2. New queue_state_changed callback. Unlike vring_state_changed this
> >> should
> >>> cover the first queue at new_device and removal of all queues at
> >>> destroy_device.
> >> As stated above, vring_state_changed() should be able to do that, except
> >> the one on destroy_device(), which is not done yet.
> >>
> >>> 3. Per-queue or per-device NUMA node info.
> >> You can query the NUMA node info implicitly by get_mempolicy(); check
> >> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
> >>
> > Your suggestions are exactly how my application is already working. I was
> > commenting on the
> > proposed changes to the vhost PMD API. I would prefer to
> > use RTE_ETH_EVENT_INTR_LSC
> > and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
> > of these vhost-specific
> > hacks. The queue state change callback is the one new API that needs to be
> > added because
> > normal NICs don't have this behavior.
> >
> > You could add another rte_eth_event_type for the queue state change
> > callback, and pass the
> > queue ID, RX/TX direction, and enable bit through cb_arg. 
> 
> Hi Rich,
> 
> So far, EAL provides rte_eth_dev_callback_register() for event handling.
> DPDK app can register callback handler and "callback argument".
> And EAL will call callback handler with the argument.
> Anyway, vhost library and PMD cannot change the argument.

Yes, the event callback argument is provided from application, where it
has no way to know info you mentioned above, like queue ID, RX/TX direction.

For that, we may need introduce another structure to note all above
info, and embed it to virtio_net struct.

> I guess the callback handler will need to call ethdev APIs to know what
> causes the interrupt.
> Probably rte_eth_dev_socket_id() is to know numa_node, and
> rte_eth_dev_info_get() is to know the number of queues.
> Is this okay for your case?

I don't think that's enough, and I think Rich has already given all info
needed to a queue change interrupt to you. (That would also answer your
questions in another email)

--yliu


[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-24 Thread Tetsuya Mukawa
On 2015/12/24 12:51, Yuanhan Liu wrote:
> On Wed, Dec 23, 2015 at 11:00:15PM +0100, Thomas Monjalon wrote:
>> 2015-12-23 10:44, Yuanhan Liu:
>>> On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
 On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu >>> linux.intel.com>
 wrote:

 On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
 > The queue state change callback is the one new API that needs to be
 > added because
 > normal NICs don't have this behavior.

 Again I'd ask, will vring_state_changed() be enough, when above issues
 are resolved: vring_state_changed() will be invoked at new_device()/
 destroy_device(), and of course, ethtool change?


 It would be sufficient. It is not a great API though, because it requires 
 the
 application to do the conversion from struct virtio_net to a DPDK port 
 number,
 and from a virtqueue index to a DPDK queue id and direction. Also, the 
 current
 implementation often makes this callback when the vring state has not 
 actually
 changed (enabled -> enabled and disabled -> disabled).

 If you're asking about using vring_state_changed() _instead_ of the link 
 status
 event and rte_eth_dev_socket_id(),
>>> No, I like the idea of link status event and rte_eth_dev_socket_id();
>>> I was just wondering why a new API is needed. Both Tetsuya and I
>>> were thinking to leverage the link status event to represent the
>>> queue stats change (triggered by vring_state_changed()) as well,
>>> so that we don't need to introduce another eth event. However, I'd
>>> agree that it's better if we could have a new dedicate event.
>>>
>>> Thomas, here is some background for you. For vhost pmd and linux
>>> virtio-net combo, the queue can be dynamically changed by ethtool,
>>> therefore, the application wishes to have another eth event, say
>>> RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can
>>> add/remove corresponding queue to the datapath when that happens.
>>> What do you think of that?
>> Yes it is an event. So I don't understand the question.
>> What may be better than a specific rte_eth_event_type?
> The alternative is a new set of callbacks, but judging that we already
> have a set of callback for vhost libraray, and adding a new set to vhost
> pmd doesn't seem elegant to me.
>
> Therefore, I'd prefer a new eth event.
>
>   --yliu

I am ok to have one more event type.

BTW, I have questions about numa_node.
I guess "rte_eth_dev_socket_id()" can only return numa_node of specified
port.
If multiple queues are used in one device(port), can we say all queues
are always in same numa_node?

If the answer is no, I am still not sure we can remove "struct
virtio_net" from DPDK application callback handling.
I agree we can add RTE_ETH_EVENT_QUEUE_STATE_CHANGE for interrupt notice.
But current ethdev APIs may not be able to hide vhost specific
properties, then the callback hander needs to handle "struct virtio_net"
directly.

Thanks,
Tetsuya




[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-24 Thread Tetsuya Mukawa
On 2015/12/24 13:00, Yuanhan Liu wrote:
> On Thu, Dec 24, 2015 at 12:09:10PM +0900, Tetsuya Mukawa wrote:
>> On 2015/12/22 13:47, Rich Lane wrote:
>>> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu >> linux.intel.com>
>>> wrote:
>>>
 On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
 in a few
> ways:
 Rich, thanks for the info!

> 1. new_device/destroy_device: Link state change (will be covered by the
 link
> status interrupt).
> 2. new_device: Add first queue to datapath.
 I'm wondering why vring_state_changed() is not used, as it will also be
 triggered at the beginning, when the default queue (the first queue) is
 enabled.

>>> Turns out I'd misread the code and it's already using the
>>> vring_state_changed callback for the
>>> first queue. Not sure if this is intentional but vring_state_changed is
>>> called for the first queue
>>> before new_device.
>>>
>>>
> 3. vring_state_changed: Add/remove queue to datapath.
> 4. destroy_device: Remove all queues (vring_state_changed is not called
 when
> qemu is killed).
 I had a plan to invoke vring_state_changed() to disable all vrings
 when destroy_device() is called.

>>> That would be good.
>>>
>>>
> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
 You can get the 'struct virtio_net' dev from all above callbacks.
>>>
 1. Link status interrupt.

 To vhost pmd, new_device()/destroy_device() equals to the link status
 interrupt, where new_device() is a link up, and destroy_device() is link
 down().


> 2. New queue_state_changed callback. Unlike vring_state_changed this
 should
> cover the first queue at new_device and removal of all queues at
> destroy_device.
 As stated above, vring_state_changed() should be able to do that, except
 the one on destroy_device(), which is not done yet.

> 3. Per-queue or per-device NUMA node info.
 You can query the NUMA node info implicitly by get_mempolicy(); check
 numa_realloc() at lib/librte_vhost/virtio-net.c for reference.

>>> Your suggestions are exactly how my application is already working. I was
>>> commenting on the
>>> proposed changes to the vhost PMD API. I would prefer to
>>> use RTE_ETH_EVENT_INTR_LSC
>>> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
>>> of these vhost-specific
>>> hacks. The queue state change callback is the one new API that needs to be
>>> added because
>>> normal NICs don't have this behavior.
>>>
>>> You could add another rte_eth_event_type for the queue state change
>>> callback, and pass the
>>> queue ID, RX/TX direction, and enable bit through cb_arg. 
>> Hi Rich,
>>
>> So far, EAL provides rte_eth_dev_callback_register() for event handling.
>> DPDK app can register callback handler and "callback argument".
>> And EAL will call callback handler with the argument.
>> Anyway, vhost library and PMD cannot change the argument.
> Yes, the event callback argument is provided from application, where it
> has no way to know info you mentioned above, like queue ID, RX/TX direction.
>
> For that, we may need introduce another structure to note all above
> info, and embed it to virtio_net struct.
>
>> I guess the callback handler will need to call ethdev APIs to know what
>> causes the interrupt.
>> Probably rte_eth_dev_socket_id() is to know numa_node, and
>> rte_eth_dev_info_get() is to know the number of queues.
>> Is this okay for your case?
> I don't think that's enough, and I think Rich has already given all info
> needed to a queue change interrupt to you. (That would also answer your
> questions in another email)
>
>   --yliu

Thanks. Yes, I agree we need to provides such information through
"struct virtio_net".
And the callback handler needs to handle the vhost specific structure
directly.

Probably it's difficult to remove "struct virtio_net" from callback
handler of DPDK app.
This is the point I want to describe.

Tetsuya,


[dpdk-dev] rte_prefetch0() is effective?

2015-12-24 Thread Moon-Sang Lee
I see codes as below in example directory, and I wonder it is effective.
Coherent IO is adopted to modern architectures,
so I think that DMA initiation by rte_eth_rx_burst() might already fulfills
cache lines of RX buffers.
Do I really need to call rte_prefetchX()?

nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
MAX_PKT_BURST);
...
/* Prefetch and forward already prefetched packets */
for (j = 0; j < (nb_rx - PREFETCH_OFFSET); j++) {
rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[
j + PREFETCH_OFFSET], void *));
l3fwd_simple_forward(pkts_burst[j], portid,
qconf);
}


-- 
Moon-Sang Lee, SW Engineer
Email: sang0627 at gmail.com
Wisdom begins in wonder. *Socrates*


[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-24 Thread Tetsuya Mukawa
On 2015/12/24 14:37, Rich Lane wrote:
> On Wed, Dec 23, 2015 at 7:09 PM, Tetsuya Mukawa  wrote:
>
>> On 2015/12/22 13:47, Rich Lane wrote:
>>> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <
>> yuanhan.liu at linux.intel.com>
>>> wrote:
>>>
 On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
 in a few
> ways:
 Rich, thanks for the info!

> 1. new_device/destroy_device: Link state change (will be covered by the
 link
> status interrupt).
> 2. new_device: Add first queue to datapath.
 I'm wondering why vring_state_changed() is not used, as it will also be
 triggered at the beginning, when the default queue (the first queue) is
 enabled.

>>> Turns out I'd misread the code and it's already using the
>>> vring_state_changed callback for the
>>> first queue. Not sure if this is intentional but vring_state_changed is
>>> called for the first queue
>>> before new_device.
>>>
>>>
> 3. vring_state_changed: Add/remove queue to datapath.
> 4. destroy_device: Remove all queues (vring_state_changed is not called
 when
> qemu is killed).
 I had a plan to invoke vring_state_changed() to disable all vrings
 when destroy_device() is called.

>>> That would be good.
>>>
>>>
> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
 You can get the 'struct virtio_net' dev from all above callbacks.
>>>
 1. Link status interrupt.

 To vhost pmd, new_device()/destroy_device() equals to the link status
 interrupt, where new_device() is a link up, and destroy_device() is link
 down().


> 2. New queue_state_changed callback. Unlike vring_state_changed this
 should
> cover the first queue at new_device and removal of all queues at
> destroy_device.
 As stated above, vring_state_changed() should be able to do that, except
 the one on destroy_device(), which is not done yet.

> 3. Per-queue or per-device NUMA node info.
 You can query the NUMA node info implicitly by get_mempolicy(); check
 numa_realloc() at lib/librte_vhost/virtio-net.c for reference.

>>> Your suggestions are exactly how my application is already working. I was
>>> commenting on the
>>> proposed changes to the vhost PMD API. I would prefer to
>>> use RTE_ETH_EVENT_INTR_LSC
>>> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
>>> of these vhost-specific
>>> hacks. The queue state change callback is the one new API that needs to
>> be
>>> added because
>>> normal NICs don't have this behavior.
>>>
>>> You could add another rte_eth_event_type for the queue state change
>>> callback, and pass the
>>> queue ID, RX/TX direction, and enable bit through cb_arg.
>> Hi Rich,
>>
>> So far, EAL provides rte_eth_dev_callback_register() for event handling.
>> DPDK app can register callback handler and "callback argument".
>> And EAL will call callback handler with the argument.
>> Anyway, vhost library and PMD cannot change the argument.
>>
> You're right, I'd mistakenly thought that the PMD controlled the void *
> passed to the callback.
>
> Here's a thought:
>
> struct rte_eth_vhost_queue_event {
> uint16_t queue_id;
> bool rx;
> bool enable;
> };
>
> int rte_eth_vhost_get_queue_event(uint8_t port_id, struct
> rte_eth_vhost_queue_event *event);
>
> On receiving the ethdev event the application could repeatedly call
> rte_eth_vhost_get_queue_event
> to find out what happened.

Hi Rich and Yuanhan,

I guess we have 2 implementations here.

1. rte_eth_vhost_get_queue_event() returns each event.
2. rte_eth_vhost_get_queue_status() returns current status of the queues.

I guess option "2" is more generic manner to handle interrupts from
device driver.
In the case of option "1", if DPDK application doesn't call
rte_eth_vhost_get_queue_event(), the vhost PMD needs to keep all events.
This may exhaust memory.

One more example is current link status interrupt handling.
Actually ethdev API just returns current status of the port.
What do you think?

>
> An issue with having the application dig into struct virtio_net is that it
> can only be safely accessed from
> a callback on the vhost thread.

Here is one of example how to invoke a callback handler registered by
DPDK application from the PMD.

  _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC);

Above function is called by interrupt handling thread of the PMDs.

Please check implementation of above function.
A callback handler that DPDK application registers is called in
"interrupt handling context".
(I mean the interrupt handling thread of the PMD calls the callback
handler of DPDK application also.)
Anyway, I guess the callback handler of DPDK application can access to
"struct virtio_net" safely.

> A typical application running its control
> plane on lcore 0 would need to
> copy all the relevant info from str

[dpdk-dev] [PATCH 1/3] librte_ether: remove RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET

2015-12-24 Thread Qiu, Michael
On 12/23/2015 8:19 PM, Reshma Pattan wrote:
> Macros RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET
> are blocking the secondary process from using the APIs.
> API access should be given to both secondary and primary.

Just as the log says, is it safe to do so?

Thanks,
Michael
> Fix minor checkpath issues in rte_ethdev.h
>
> Reported-by: Sean Harte 
> Signed-off-by: Reshma Pattan 
> ---
>  lib/librte_ether/rte_ethdev.c | 50 
> +--
>  lib/librte_ether/rte_ethdev.h | 20 -
>  2 files changed, 11 insertions(+), 59 deletions(-)
>
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index ed971b4..5849102 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -711,10 +711,6 @@ rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t 
> rx_queue_id)
>  {
>   struct rte_eth_dev *dev;
>  
> - /* This function is only safe when called from the primary process
> -  * in a multi-process setup*/
> - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
> -
>   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>  
>   dev = &rte_eth_devices[port_id];
> @@ -741,10 +737,6 @@ rte_eth_dev_rx_queue_stop(uint8_t port_id, uint16_t 
> rx_queue_id)
>  {
>   struct rte_eth_dev *dev;
>  
> - /* This function is only safe when called from the primary process
> -  * in a multi-process setup*/
> - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
> -
>   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>  
>   dev = &rte_eth_devices[port_id];
> @@ -771,10 +763,6 @@ rte_eth_dev_tx_queue_start(uint8_t port_id, uint16_t 
> tx_queue_id)
>  {
>   struct rte_eth_dev *dev;
>  
> - /* This function is only safe when called from the primary process
> -  * in a multi-process setup*/
> - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
> -
>   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>  
>   dev = &rte_eth_devices[port_id];
> @@ -801,10 +789,6 @@ rte_eth_dev_tx_queue_stop(uint8_t port_id, uint16_t 
> tx_queue_id)
>  {
>   struct rte_eth_dev *dev;
>  
> - /* This function is only safe when called from the primary process
> -  * in a multi-process setup*/
> - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
> -
>   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>  
>   dev = &rte_eth_devices[port_id];
> @@ -874,10 +858,6 @@ rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, 
> uint16_t nb_tx_q,
>   struct rte_eth_dev_info dev_info;
>   int diag;
>  
> - /* This function is only safe when called from the primary process
> -  * in a multi-process setup*/
> - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
> -
>   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>  
>   if (nb_rx_q > RTE_MAX_QUEUES_PER_PORT) {
> @@ -1059,10 +1039,6 @@ rte_eth_dev_start(uint8_t port_id)
>   struct rte_eth_dev *dev;
>   int diag;
>  
> - /* This function is only safe when called from the primary process
> -  * in a multi-process setup*/
> - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
> -
>   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>  
>   dev = &rte_eth_devices[port_id];
> @@ -1096,10 +1072,6 @@ rte_eth_dev_stop(uint8_t port_id)
>  {
>   struct rte_eth_dev *dev;
>  
> - /* This function is only safe when called from the primary process
> -  * in a multi-process setup*/
> - RTE_PROC_PRIMARY_OR_RET();
> -
>   RTE_ETH_VALID_PORTID_OR_RET(port_id);
>   dev = &rte_eth_devices[port_id];
>  
> @@ -1121,10 +1093,6 @@ rte_eth_dev_set_link_up(uint8_t port_id)
>  {
>   struct rte_eth_dev *dev;
>  
> - /* This function is only safe when called from the primary process
> -  * in a multi-process setup*/
> - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
> -
>   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>  
>   dev = &rte_eth_devices[port_id];
> @@ -1138,10 +1106,6 @@ rte_eth_dev_set_link_down(uint8_t port_id)
>  {
>   struct rte_eth_dev *dev;
>  
> - /* This function is only safe when called from the primary process
> -  * in a multi-process setup*/
> - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
> -
>   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>  
>   dev = &rte_eth_devices[port_id];
> @@ -1155,10 +1119,6 @@ rte_eth_dev_close(uint8_t port_id)
>  {
>   struct rte_eth_dev *dev;
>  
> - /* This function is only safe when called from the primary process
> -  * in a multi-process setup*/
> - RTE_PROC_PRIMARY_OR_RET();
> -
>   RTE_ETH_VALID_PORTID_OR_RET(port_id);
>   dev = &rte_eth_devices[port_id];
>  
> @@ -1183,10 +1143,6 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t 
> rx_queue_id,
>   struct rte_eth_dev *dev;
>   struct rte_eth_dev_info dev_info;
>  
> - /* This function is only safe when called from the primary process
> -  * in a multi-process setup*

[dpdk-dev] [PATCH v5 0/6] interrupt mode for fm10k

2015-12-24 Thread Qiu, Michael
On 12/23/2015 3:38 PM, He, Shaopeng wrote:
> This patch series adds interrupt mode support for fm10k,
> contains four major parts:
>
> 1. implement rx_descriptor_done function in fm10k
> 2. add rx interrupt support in fm10k PF and VF
> 3. make sure default VID available in dev_init in fm10k
> 4. fix a memory leak for non-ip packet in l3fwd-power,
>which happens mostly when testing fm10k interrupt mode.
>
> v5 changes:
>   - remove one unnecessary NULL check for rte_free
>   - fix a wrong error message
>   - add more clean up when memory allocation fails
>   - split line over 80 characters to 2 lines
>   - update interrupt mode limitation in fm10k.rst
>
> v4 changes:
>   - rebase to latest code
>   - update release 2.3 note in corresponding patches
>
> v3 changes:
>   - rebase to latest code
>   - macro renaming according to the EAL change
>
> v2 changes:
>   - reword some comments and commit messages
>   - split one big patch into three smaller ones
>
> Shaopeng He (6):
>   fm10k: implement rx_descriptor_done function
>   fm10k: setup rx queue interrupts for PF and VF
>   fm10k: remove rx queue interrupts when dev stops
>   fm10k: add rx queue interrupt en/dis functions
>   fm10k: make sure default VID available in dev_init
>   l3fwd-power: fix a memory leak for non-ip packet
>
>  doc/guides/nics/fm10k.rst|   7 ++
>  doc/guides/rel_notes/release_2_3.rst |   8 ++
>  drivers/net/fm10k/fm10k.h|   6 ++
>  drivers/net/fm10k/fm10k_ethdev.c | 174 
> ---
>  drivers/net/fm10k/fm10k_rxtx.c   |  25 +
>  examples/l3fwd-power/main.c  |   3 +-
>  6 files changed, 211 insertions(+), 12 deletions(-)
>

Acked-by: Michael Qiu 


[dpdk-dev] [PATCH 1/3] app/test-pmd: Handle SIGINT and SIGTERM in testpmd

2015-12-24 Thread Qiu, Michael
On 12/24/2015 11:07 AM, Zhihong Wang wrote:
> Handle SIGINT and SIGTERM in testpmd.
>
> Signed-off-by: Zhihong Wang 
> ---
>  app/test-pmd/testpmd.c | 23 +++
>  1 file changed, 23 insertions(+)
>
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 98ae46d..c259ba3 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -1573,6 +1573,7 @@ pmd_test_exit(void)
>   FOREACH_PORT(pt_id, ports) {
>   printf("Stopping port %d...", pt_id);
>   fflush(stdout);
> + rte_eth_dev_stop(pt_id);
>   rte_eth_dev_close(pt_id);
>   printf("done\n");
>   }
> @@ -1984,12 +1985,34 @@ init_port(void)
>   ports[pid].enabled = 1;
>  }
>  
> +/* When we receive a INT signal, close all ports */
> +static void
> +sigint_handler(__rte_unused int signum)
> +{
> + unsigned portid;
> +
> + printf("Preparing to exit...\n");

Better to notice user "Signal xxx received, reparing to exit... "

> + FOREACH_PORT(portid, ports) {
> + if (port_id_is_invalid(portid, ENABLED_WARN))
> + continue;
> + printf("Stopping port %d...", portid);
> + rte_eth_dev_stop(portid);
> + rte_eth_dev_close(portid);
> + printf(" Done\n");
> + }
> + printf("Bye...\n");

Here why don't call pmd_test_exit()? Any issue with that func?

Thanks,
Michael
> + exit(0);
> +}
> +
>  int
>  main(int argc, char** argv)
>  {
>   int  diag;
>   uint8_t port_id;
>  
> + signal(SIGINT, sigint_handler);
> + signal(SIGTERM, sigint_handler);
> +
>   diag = rte_eal_init(argc, argv);
>   if (diag < 0)
>   rte_panic("Cannot init EAL\n");



[dpdk-dev] [PATCH 2/3] examples/l2fwd: Handle SIGINT and SIGTERM in l2fwd

2015-12-24 Thread Qiu, Michael
On 12/24/2015 11:07 AM, Zhihong Wang wrote:
> Handle SIGINT and SIGTERM in l2fwd.
>
> Signed-off-by: Zhihong Wang 
> ---
>  examples/l2fwd/main.c | 25 +
>  1 file changed, 25 insertions(+)
>
> diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
> index 720fd5a..0594037 100644
> --- a/examples/l2fwd/main.c
> +++ b/examples/l2fwd/main.c
> @@ -44,6 +44,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -534,6 +535,27 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
> port_mask)
>   }
>  }
>  
> +/* When we receive a INT signal, close all ports */
> +static void
> +sigint_handler(__rte_unused int signum)
> +{
> + unsigned portid, nb_ports;
> +
> + printf("Preparing to exit...\n");

Same here and l3fwd, better to show the reason of this exit.

Thanks,
Michael
> + nb_ports = rte_eth_dev_count();
> + for (portid = 0; portid < nb_ports; portid++) {
> + if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
> + continue;
> + }
> + printf("Stopping port %d...", portid);
> + rte_eth_dev_stop(portid);
> + rte_eth_dev_close(portid);
> + printf(" Done\n");
> + }
> + printf("Bye...\n");
> + exit(0);
> +}
> +
>  int
>  main(int argc, char **argv)
>  {
> @@ -546,6 +568,9 @@ main(int argc, char **argv)
>   unsigned lcore_id, rx_lcore_id;
>   unsigned nb_ports_in_mask = 0;
>  
> + signal(SIGINT, sigint_handler);
> + signal(SIGTERM, sigint_handler);
> +
>   /* init EAL */
>   ret = rte_eal_init(argc, argv);
>   if (ret < 0)



[dpdk-dev] [RFC PATCH 0/2] Reduce DPDK initialization time

2015-12-24 Thread Qiu, Michael
On 11/18/2015 6:30 PM, Zhihong Wang wrote:
> This RFC patch aims to reduce DPDK initialization time, which is important in 
> cases such as micro service.
>
> Changes are:
>
> 1. Reduce timer initialization time
>
> 2. Remove unnecessary hugepage zero-filling operations
>
> With this patch:
>
> 1. Timer initialization time can be reduced by 4/10 second
>
> 2. Memory initialization time can be reduced nearly by half
>
> The 2nd topic has been brought up before in this thread:
> http://dpdk.org/dev/patchwork/patch/4219/
>
> Zhihong Wang (2):
>   lib/librte_eal: Reduce timer initialization time
>   lib/librte_eal: Remove unnecessary hugepage zero-filling
>
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 5 +
>  lib/librte_eal/linuxapp/eal/eal_timer.c  | 2 +-
>  2 files changed, 2 insertions(+), 5 deletions(-)
>

As I tested with 8192 hugepages(size 2M), one nic 82599 bind, using time
to get the seconds used:
with this patch:

echo quit | time ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 -- -i
2.15 user
5.55 system
0:07.82 elapsed

Without patch:
echo quit | time ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 -- -i
3.18 user
5.63 system
0:09.32 elapsed

1.5s saved,  16% improvement, I don't know if this is good enough, but
indeed save lots of time.

Thanks,
Michael


[dpdk-dev] Network software verification survey.

2015-12-24 Thread Arseniy Zaostrovnykh
Hello,

I am a PhD student and I am doing research about verification of network 
applications. In order to maximize the utility of our efforts for the 
networking community, our team would grateful if you answer a couple of 
questions:

http://goo.gl/cL760p

This short form should take just 2-3 minutes of your time. If you are 
subscribed to the DPDK-users list and received a similar message, please 
use this link instead. If you have any questions, please ask them in 
this thread or e-mail me personally (arseniy.zaostrovnykh at epfl.ch).

-- 
Respectfully,
Arseniy.



[dpdk-dev] [PATCH 1/3] app/test-pmd: Handle SIGINT and SIGTERM in testpmd

2015-12-24 Thread Wang, Zhihong
> > +/* When we receive a INT signal, close all ports */ static void
> > +sigint_handler(__rte_unused int signum) {
> > +   unsigned portid;
> > +
> > +   printf("Preparing to exit...\n");
> 
> Better to notice user "Signal xxx received, reparing to exit... "

Can do that.

> 
> > +   FOREACH_PORT(portid, ports) {
> > +   if (port_id_is_invalid(portid, ENABLED_WARN))
> > +   continue;
> > +   printf("Stopping port %d...", portid);
> > +   rte_eth_dev_stop(portid);
> > +   rte_eth_dev_close(portid);
> > +   printf(" Done\n");
> > +   }
> > +   printf("Bye...\n");
> 
> Here why don't call pmd_test_exit()? Any issue with that func?

Yes should just call this one :)

> 
> Thanks,
> Michael
> > +   exit(0);
> > +}
> > +
> >  int
> >  main(int argc, char** argv)
> >  {
> > int  diag;
> > uint8_t port_id;
> >
> > +   signal(SIGINT, sigint_handler);
> > +   signal(SIGTERM, sigint_handler);
> > +
> > diag = rte_eal_init(argc, argv);
> > if (diag < 0)
> > rte_panic("Cannot init EAL\n");



[dpdk-dev] [PATCH 3/3] examples/l3fwd: Handle SIGINT and SIGTERM in l3fwd

2015-12-24 Thread Ananyev, Konstantin

Hi,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhihong Wang
> Sent: Wednesday, December 23, 2015 8:03 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 3/3] examples/l3fwd: Handle SIGINT and SIGTERM in 
> l3fwd
> 
> Handle SIGINT and SIGTERM in l3fwd.
> 
> Signed-off-by: Zhihong Wang 
> ---
>  examples/l3fwd/main.c | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
> index 5b0c2dd..aae16d2 100644
> --- a/examples/l3fwd/main.c
> +++ b/examples/l3fwd/main.c
> @@ -41,6 +41,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
>  #include 
> @@ -2559,6 +2560,27 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
> port_mask)
>   }
>  }
> 
> +/* When we receive a INT signal, close all ports */
> +static void
> +sigint_handler(__rte_unused int signum)
> +{
> + unsigned portid, nb_ports;
> +
> + printf("Preparing to exit...\n");
> + nb_ports = rte_eth_dev_count();
> + for (portid = 0; portid < nb_ports; portid++) {
> + if ((enabled_port_mask & (1 << portid)) == 0) {
> + continue;
> + }
> + printf("Stopping port %d...", portid);
> + rte_eth_dev_stop(portid);
> + rte_eth_dev_close(portid);

Hmm, so your interrupt thread invokes dev_stop,
while IO lcores keep calling rx_burst/tx_burst? 
For graceful shutdown on SIGINT, I suppose you first have to
stop your IO lcores first.
Let say have a global var: 'stop' that every lcore has to check from
time to time (or something similar).
Konstantin

> + printf(" Done\n");
> + }
> + printf("Bye...\n");
> + exit(0);
> +}
> +
>  int
>  main(int argc, char **argv)
>  {
> @@ -2572,6 +2594,9 @@ main(int argc, char **argv)
>   uint32_t n_tx_queue, nb_lcores;
>   uint8_t portid, nb_rx_queue, queue, socketid;
> 
> + signal(SIGINT, sigint_handler);
> + signal(SIGTERM, sigint_handler);
> +
>   /* init EAL */
>   ret = rte_eal_init(argc, argv);
>   if (ret < 0)
> --
> 2.5.0



[dpdk-dev] [PATCH] mk: fix examples build failure

2015-12-24 Thread steeven lee
1. Fix examples build failure
2. make build as default output folder name

Signed-off-by: steeven 
---
 mk/internal/rte.extvars.mk | 4 ++--
 mk/rte.extsubdir.mk| 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/mk/internal/rte.extvars.mk b/mk/internal/rte.extvars.mk
index 040d39f..cabef0a 100644
--- a/mk/internal/rte.extvars.mk
+++ b/mk/internal/rte.extvars.mk
@@ -52,9 +52,9 @@ RTE_EXTMK ?= $(RTE_SRCDIR)/Makefile
 export RTE_EXTMK

 # RTE_SDK_BIN must point to .config, include/ and lib/.
-RTE_SDK_BIN := $(RTE_SDK)/$(RTE_TARGET)
+RTE_SDK_BIN := $(RTE_SDK)/build
 ifeq ($(wildcard $(RTE_SDK_BIN)/.config),)
-$(error Cannot find .config in $(RTE_SDK))
+$(error Cannot find .config in $(RTE_SDK_BIN))
 endif

 #
diff --git a/mk/rte.extsubdir.mk b/mk/rte.extsubdir.mk
index f50f006..819020a 100644
--- a/mk/rte.extsubdir.mk
+++ b/mk/rte.extsubdir.mk
@@ -46,7 +46,7 @@ $(DIRS-y):
@echo "== $@"
$(Q)$(MAKE) -C $(@) \
M=$(CURDIR)/$(@)/Makefile \
-   O=$(BASE_OUTPUT)/$(CUR_SUBDIR)/$(@)/$(RTE_TARGET) \
+   O=$(BASE_OUTPUT)/$(CUR_SUBDIR)/build \
BASE_OUTPUT=$(BASE_OUTPUT) \
CUR_SUBDIR=$(CUR_SUBDIR)/$(@) \
S=$(CURDIR)/$(@) \
-- 
1.9.1


[dpdk-dev] [PATCH v3] doc: announce ABI change for struct rte_eth_conf

2015-12-24 Thread Ivan Boule
Hi Jijiang,

See my comments inline below prefixewd with IB>
Ivan

On 12/18/2015 03:00 AM, Liu, Jijiang wrote:
> Hi Boule,
>
>> -Original Message-
>> From: Ivan Boule [mailto:ivan.boule at 6wind.com]
>> Sent: Tuesday, December 15, 2015 4:50 PM
>> To: Liu, Jijiang
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v3] doc: announce ABI change for struct
>> rte_eth_conf
>>
>> On 12/14/2015 08:48 AM, Jijiang Liu wrote:
>>> In current codes, tunnel configuration information is not stored in a device
>> configuration, and it will get nothing if application want to retrieve tunnel
>> config, so I think it is necessary to add rte_eth_dev_tunnel_configure()
>> function is to do the configurations including flow and classification
>> information and store it in a device configuration.
>>>
>>> And tunneling packet encapsulation operation will benifit from the change.
>>>
>>> There are more descriptions for the ABI changes below,
>>>
>>> The struct 'rte_eth_tunnel_conf' is a new, its defination like below,
>>> struct rte_eth_tunnel_conf {
>>>  uint16_t tx_queue;
>>>  uint16_t filter_type;
>>>  struct rte_eth_tunnel_flow flow_tnl; };
>>>
>>> The ABI change announcement of struct 'rte_eth_tunnel_flow' have
>> already sent out, refer to [1].
>>>
>>> [1]http://dpdk.org/ml/archives/dev/2015-December/029837.html.
>>>
>>> The change of struct 'rte_eth_conf' like below, but it have not finalized 
>>> yet.
>>> struct rte_eth_conf {
>>> ...
>>> uint32_t dcb_capability_en;
>>> struct rte_fdir_conf fdir_conf; /**< FDIR configuration. */
>>> struct rte_intr_conf intr_conf; /**< Interrupt mode configuration. */
>>> struct rte_eth_tunnel_conf
>> *tunnel_conf[RTE_MAX_QUEUES_PER_PORT];
>>> /**< Tunnel configuration. */
>>> };
>>>
>>> v2 change:
>>> Add more description for the change.
>>>
>>> v3 change:
>>> Change ABI announcement description.
>>>
>>> Signed-off-by: Jijiang Liu  ---cmdline.c
>>>doc/guides/rel_notes/deprecation.rst |6 ++
>>>1 files changed, 6 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>> b/doc/guides/rel_notes/deprecation.rst
>>> index 5c458f2..9dbe89e 100644
>>> --- a/doc/guides/rel_notes/deprecation.rst
>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>> @@ -23,3 +23,9 @@ Deprecation Notices
>>>* ABI changes are planned for struct rte_eth_tunnel_flow in order to
>> extend new fileds to support
>>>  tunneling packet configuration in unified tunneling APIs. The release 
>>> 2.2
>> does not contain these ABI
>>>  changes, but release 2.3 will, and no backwards compatibility is 
>>> planned.
>>> +
>>> +* ABI changes are planned for the struct rte_eth_conf in order to add
>>> +'tunnel_conf' variable
>>> +  in the struct to store tunnel configuration when using new API
>>> +rte_eth_dev_tunnel_configure
>>> +  (uint8_t port_id, uint16_t rx_queue, struct rte_eth_tunnel_conf *
>>> +tunnel_conf) to configure
>>> +  tunnel flow and classification information. The release 2.2 does
>>> +not contain these ABI change,
>>> +  but release 2.3 will, and no backward compatibility is planned.
>>>
>> Hi Jijiang,
>>
>> Can you provide a real use case - I mean an example of a real network
>> application - that really needs to save tunnel configurations in a data
>> structure associated with a NIC port?
>
> I'm trying to provide a tunneling packet solution in DPDK, which would 
> accelerate de/encapsulation operation of tunneling packet.
IB> I was asking for an example of an application that needs to SAVE in 
the DPDK structure associated with a port a tunnel configuration that it 
applies to that port.
Where does that saved tunnel configuration will participate to the 
acceleration of decap/encap ops?

>
> It was described at [1],
> [1] http://dpdk.org/ml/archives/dev/2015-December/030283.html
>
>
> Let me provide more details on this, these data structure definition have not 
> fully finalized yet, just for your reference.
> We are talking about why tunnel configuration need to be stored.
IB? yes :-)

>
> For NIC A RX process,
> VM 0--->VTEP A---> VXLAN network--->VTEP B---NIC A (Rx queue 1 with info [1] 
> )--->SW decapsulation--->vSwitch--->VM 0
>
> For NIC A TX process,
> VM 0<---VTEP A<---VXLAN network<---VTEP B<---NIC A (TX queue 1)<---SW 
> Encapsulation with info[2]<---vSwitch<---VM 0
>
> The[2] information  will be got by retrieving the tunnel configuration, if 
> the tunnel configuration is not stored  in 'rt_eth_conf', and how to get it?

IB> it is assumed that the encapsulation acceleration relies on having 
this operation done in hardware. Am I wrong?
If I am right, then can you tell me which PMD function accesses the 
saved tunnel configuration?

>
> Of course, the tunnel configuration is also stored in Application, does it 
> make sense?
IB> No. Why store it twice? Are you considering that memory if available 
for free?

>
> [1] outr src ip(192.168.10.1) + outer dst ip(10.239.129.11)

[dpdk-dev] [PATCH v1 0/2] Virtio-net PMD Extension to work on host

2015-12-24 Thread Tan, Jianfeng
Hi Tetsuya,

After several days' studying your patch, I have some questions as follows:

1. Is physically-contig memory really necessary?
This is a too strong requirement IMHO. IVSHMEM doesn't require this in its 
original meaning. So how do you think of
Huawei Xie's idea of using virtual address for address translation? (In 
addition, virtual address of mem_table could be
different in application and QTest, but this can be addressed because 
SET_MEM_TABLE msg will be intercepted by
QTest)

2. Is root privilege OK in container's case?
Another reason we'd like to give up physically-contig feature is that it needs 
root privilege to read /proc/self/pagemap
file. Container has already been widely criticized for bad security isolation. 
Enabling root privilege will make it worse.
On the other hand, it's not easy to remove root privilege too. If we use 
vhost-net as the backend, kernel will definitely
require root privilege to create a tap device/raw socket. We tend to pick such 
work, which requires root, into runtime
preparation of a container. Do you agree?

3.Is one Qtest process per virtio device too heavy?
Although we can foresee that each container always owns only one virtio device, 
but take its possible high density
into consideration, hundreds or even thousands of container requires the same 
number of QTest processes. As
you mentioned that port hotplug is supported, is it possible to use just one 
QTest process for all virtio devices
emulation?

As you know, we have another solution according to this (which under heavy 
internal review). But I think we have lots
of common problems to be solved, right?

Thanks for your great work!

Thanks,
Jianfeng

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Wednesday, December 16, 2015 4:37 PM
> To: dev at dpdk.org
> Cc: nakajima.yoshihiro at lab.ntt.co.jp; Tan, Jianfeng; Xie, Huawei;
> mst at redhat.com; marcandre.lureau at gmail.com; Tetsuya Mukawa
> Subject: [PATCH v1 0/2] Virtio-net PMD Extension to work on host
> 
> [Change log]
> 
> PATCH v1:
> (Just listing functionality changes and important bug fix)
> * Support virtio-net interrupt handling.
>   (It means virtio-net PMD on host and guest have same virtio-net features)
> * Fix memory allocation method to allocate contiguous memory correctly.
> * Port Hotplug is supported.
> * Rebase on DPDK-2.2.
> 
> 
> [Abstraction]
> 
> Normally, virtio-net PMD only works on VM, because there is no virtio-net
> device on host.
> This RFC patch extends virtio-net PMD to be able to work on host as virtual
> PMD.
> But we didn't implement virtio-net device as a part of virtio-net PMD.
> To prepare virtio-net device for the PMD, start QEMU process with special
> QTest mode, then connect it from virtio-net PMD through unix domain
> socket.
> 
> The virtio-net PMD on host is fully compatible with the PMD on guest.
> We can use same functionalities, and connect to anywhere QEMU virtio-net
> device can.
> For example, the PMD can use virtio-net multi queues function. Also it can
> connects to vhost-net kernel module and vhost-user backend application.
> Similar to virtio-net PMD on QEMU, application memory that uses virtio-net
> PMD will be shared between vhost backend application. But vhost backend
> application memory will not be shared.
> 
> Main target of this PMD is container like docker, rkt, lxc and etc.
> We can isolate related processes(virtio-net PMD process, QEMU and vhost-
> user backend process) by container.
> But, to communicate through unix domain socket, shared directory will be
> needed.
> 
> 
> [How to use]
> 
> So far, we need QEMU patch to connect to vhost-user backend.
> See below patch.
>  - http://patchwork.ozlabs.org/patch/552549/
> To know how to use, check commit log.
> 
> 
> [Detailed Description]
> 
>  - virtio-net device implementation
> This host mode PMD uses QEMU virtio-net device. To do that, QEMU QTest
> functionality is used.
> QTest is a test framework of QEMU devices. It allows us to implement a
> device driver outside of QEMU.
> With QTest, we can implement DPDK application and virtio-net PMD as
> standalone process on host.
> When QEMU is invoked as QTest mode, any guest code will not run.
> To know more about QTest, see below.
>  - http://wiki.qemu.org/Features/QTest
> 
>  - probing devices
> QTest provides a unix domain socket. Through this socket, driver process can
> access to I/O port and memory of QEMU virtual machine.
> The PMD will send I/O port accesses to probe pci devices.
> If we can find virtio-net and ivshmem device, initialize the devices.
> Also, I/O port accesses of virtio-net PMD will be sent through socket, and
> virtio-net PMD can initialize vitio-net device on QEMU correctly.
> 
>  - ivshmem device to share memory
> To share memory that virtio-net PMD process uses, ivshmem device will be
> used.
> Because ivshmem device can only handle one file descriptor, shared memory
> should be consist of one file.
> To allocate such a m

[dpdk-dev] [PATCH 1/3] librte_ether: remove RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET

2015-12-24 Thread Pattan, Reshma


> -Original Message-
> From: Qiu, Michael
> On 12/23/2015 8:19 PM, Reshma Pattan wrote:
> > Macros RTE_PROC_PRIMARY_OR_ERR_RET and
> RTE_PROC_PRIMARY_OR_RET are
> > blocking the secondary process from using the APIs.
> > API access should be given to both secondary and primary.
> 
> Just as the log says, is it safe to do so?


Hi,

Some parts of the code still need these macros, which I am not sure yet. But as 
and when we identify those we have to add the macros to the needed places. 
But it is safe to remove from start of function to allow secondary process to 
do device configuration and queue setups for vdev. 
Please let me know if you know any of such cases where these macros should be 
added.

Thanks,
Reshma



[dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Getting memory backing issues with qemu parameter passing

2015-12-24 Thread Abhijeet Karve
Hi Przemek,

Thank you so much for your quick response. 

The guide(
https://github.com/openstack/networking-ovs-dpdk/blob/stable/kilo/doc/source/getstarted/ubuntu.rst
) which you have suggested that is for openstack vhost user installations 
with devstack. 
Can't we have any reference for including ovs-dpdk mechanisam driver for 
openstack Ubuntu distribution which we are following for 
compute+controller node setup?" 

We are facing below listed issues With the current approach of setting up 
openstack kilo interactively + replacing ovs with ovs-dpdk enabled and 
Instance creation in openstack with
passing that instance id to QEMU command line which further passes the 
vhost-user sockets to instances for enabling the DPDK libraries in it.


1. Created a flavor m1.hugepages which is backed by hugepage memory, 
unable to spawn instance with this flavor ? Getting a issue like: No 
matching hugetlbfs for the number of hugepages assigned to the flavor.
2. Passing socket info to instances via qemu manually and instnaces 
created are not persistent.

Now as you suggested, we are looking in enabling ovsdpdk ml2 mechanism 
driver and agent all of that in our openstack ubuntu distribution.

Would be really appriciate if get any help or ref with explanation.

We are using compute + controller node setup and we are using following 
software platform on compute node: 
_ 
Openstack: Kilo 
Distribution: Ubuntu 14.04 
OVS Version: 2.4.0 
DPDK 2.0.0 
_ 

Thanks,
Abhijeet Karve





From:   "Czesnowicz, Przemyslaw" 
To: Abhijeet Karve 
Cc: "dev at dpdk.org" , "discuss at openvswitch.org" 
, "Gray, Mark D" 
Date:   12/17/2015 06:32 PM
Subject:RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# 
Successfully setup DPDK OVS with vhostuser



I haven?t tried that approach not sure if that would work, it seems 
clunky.

If you enable ovsdpdk ml2 mechanism driver and agent all of that (add 
ports to ovs with the right type, pass the sockets to qemu) would be done 
by OpenStack.

Przemek

From: Abhijeet Karve [mailto:abhijeet.ka...@tcs.com] 
Sent: Thursday, December 17, 2015 12:41 PM
To: Czesnowicz, Przemyslaw
Cc: dev at dpdk.org; discuss at openvswitch.org; Gray, Mark D
Subject: RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# 
Successfully setup DPDK OVS with vhostuser

Hi Przemek, 

Thank you so much for sharing the ref guide. 

Would be appreciate if clear one doubt. 

At present we are setting up openstack kilo interactively and further 
replacing ovs with ovs-dpdk enabled. 
Once the above setup done, We are creating instance in openstack and 
passing that instance id to QEMU command line which further passes the 
vhost-user sockets to instances, enabling the DPDK libraries in it. 

Isn't this the correct way of integrating ovs-dpdk with openstack? 


Thanks & Regards
Abhijeet Karve




From:"Czesnowicz, Przemyslaw"  
To:Abhijeet Karve  
Cc:"dev at dpdk.org" , "discuss at openvswitch.org" <
discuss at openvswitch.org>, "Gray, Mark D"  
Date:12/17/2015 05:27 PM 
Subject:RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# 
Successfully setup DPDK OVS with vhostuser 




HI Abhijeet, 

For Kilo you need to use ovsdpdk mechanism driver and a matching agent to 
integrate ovs-dpdk with OpenStack. 

The guide you are following only talks about running ovs-dpdk not how it 
should be integrated with OpenStack. 

Please follow this guide: 
https://github.com/openstack/networking-ovs-dpdk/blob/stable/kilo/doc/source/getstarted/ubuntu.rst
 


Best regards 
Przemek 


From: Abhijeet Karve [mailto:abhijeet.ka...@tcs.com] 
Sent: Wednesday, December 16, 2015 9:37 AM
To: Czesnowicz, Przemyslaw
Cc: dev at dpdk.org; discuss at openvswitch.org; Gray, Mark D
Subject: RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# 
Successfully setup DPDK OVS with vhostuser 

Hi Przemek, 


We have configured the accelerated data path between a physical interface 
to the VM using openvswitch netdev-dpdk with vhost-user support. The VM 
created with this special data path and vhost library, I am calling as 
DPDK instance. 

If assigning ip manually to the newly created Cirros VM instance, We are 
able to make 2 VM's to communicate on the same compute node. Else it's not 
associating any ip through DHCP though DHCP is in compute node only. 

Yes it's a compute + controller node setup and we are using following 
software platform on compute node: 
_ 
Openstack: Kilo 
Distribution: Ubuntu 14.04 
OVS Version: 2.4.0 
DPDK 2.0.0 
_ 

We are following the intel guide 
https://software.intel.com/en-us/blogs/2015/06/09/building-vhost-user-for-ovs-today-using-dpdk-200
 


When doing "ovs-vsctl show" in compute node, it shows below output: 
_ 
ovs-vsctl show 
c2ec29a5-992d-4875-8adc-1265c23e0304 
   Bridge br-ex 
   Port phy-br-ex 
   Interface phy-br-ex 
   type: patch

[dpdk-dev] [PATCH 3/3] examples/l3fwd: Handle SIGINT and SIGTERM in l3fwd

2015-12-24 Thread Stephen Hemminger
On Wed, 23 Dec 2015 15:03:15 -0500
Zhihong Wang  wrote:

> +/* When we receive a INT signal, close all ports */
> +static void
> +sigint_handler(__rte_unused int signum)
> +{
> + unsigned portid, nb_ports;
> +
> + printf("Preparing to exit...\n");
> + nb_ports = rte_eth_dev_count();
> + for (portid = 0; portid < nb_ports; portid++) {
> + if ((enabled_port_mask & (1 << portid)) == 0) {
> + continue;
> + }
> + printf("Stopping port %d...", portid);
> + rte_eth_dev_stop(portid);
> + rte_eth_dev_close(portid);
> + printf(" Done\n");
> + }
> + printf("Bye...\n");
> + exit(0);
> +}

Signal handlers should only set a flag, which is then checked by thread loops.
Calling functions in DPDK from signal handlers is not safe.


[dpdk-dev] [Question] How pmd virtio works without UIO?

2015-12-24 Thread Stephen Hemminger
On Thu, 24 Dec 2015 11:30:27 +0800
Yuanhan Liu  wrote:

> On Wed, Dec 23, 2015 at 11:26:17PM +0100, Thomas Monjalon wrote:
> > 2015-12-23 10:09, Yuanhan Liu:
> > > On Wed, Dec 23, 2015 at 09:55:54AM +0800, Peter Xu wrote:
> > > > On Tue, Dec 22, 2015 at 04:38:30PM +, Xie, Huawei wrote:
> > > > > On 12/22/2015 7:39 PM, Peter Xu wrote:
> > > > > > I tried to unbind one of the virtio net device, I see the PCI entry
> > > > > > still there.
> > > > > >
> > > > > > Before unbind:
> > > > > >
> > > > > > [root at vm proc]# lspci -k -s 00:03.0
> > > > > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> > > > > > Subsystem: Red Hat, Inc Device 0001
> > > > > > Kernel driver in use: virtio-pci
> > > > > > [root at vm proc]# cat /proc/ioports | grep c060-c07f
> > > > > >   c060-c07f : :00:03.0
> > > > > > c060-c07f : virtio-pci
> > > > > >
> > > > > > After unbind:
> > > > > >
> > > > > > [root at vm proc]# lspci -k -s 00:03.0
> > > > > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> > > > > > Subsystem: Red Hat, Inc Device 0001
> > > > > > [root at vm proc]# cat /proc/ioports | grep c060-c07f
> > > > > >   c060-c07f : :00:03.0
> > > > > >
> > > > > > So... does this means that it is an alternative to black list
> > > > > > solution?
> > > > > Oh, we could firstly check if this port is manipulated by kernel 
> > > > > driver
> > > > > in virtio_resource_init/eth_virtio_dev_init, as long as it is not too 
> > > > > late.
> > > 
> > > Why can't we simply quit at pci_scan_one, once finding that it's not
> > > bond to uio (or similar stuff)? That would be generic enough, that we
> > > don't have to do similar checks for each new pmd driver.
> > > 
> > > Or, am I missing something?
> > 
> > UIO is not needed to make virtio works (without interrupt support).
> > Sometimes it may be required to avoid using kernel modules.
> 
> While dig the git history, I found the commit:
> 
> commit da978dfdc43b59e290a46d7ece5fd19ce79a1162
> Author: Ouyang Changchun 
> Date:   Mon Feb 9 09:14:06 2015 +0800
> 
> virtio: use port IO to get PCI resource
> 
> Make virtio not require UIO for some security reasons, this is to 
> match
> 6WIND's virtio-net-pmd.
> 
> Signed-off-by: Changchun Ouyang 
> Acked-by: Huawei Xie 
> 
> The commit log is not well written, giving no explanation about the
> "some security reasons".
> 
> Anyway, I see it now that it's kind of a design.
> 
> 
> Note that my first patch set about enabling virtio 1.0 [0] sets the
> RTE_PCI_DRV_NEED_MAPPING flag, which somehow implies that uio is a
> must, otherwise, eal init would fail at pci_map_device().
> 
> So that my pathset breaks the un-documented rule, and I need fix it.
> How about adding a wrapper, say rte_pci_map_device(), and exporting
> it, so that virtio pmd driver could map resources when necessary?
> 
> [0]: http://dpdk.org/dev/patchwork/bundle/yliu/virtio-1.0-pmd/
> 
> 
> > > > I guess there might be two problems? Which are:
> > > > 
> > > > 1. How user avoid DPDK taking over virtio devices that they do not
> > > >want for IO (chooses which device to use)
> > > 
> > > Isn't that what's the 'binding/unbinding' for?
> > 
> > Binding is, sometimes, required.
> 
> We may need fix the doc then. As the doc says it's a must:
> 
> 3.6. Binding and Unbinding Network Ports to/from the Kernel Modules
> 
> Instead, all ports that are to be used by an DPDK application
> ==> must be bound to the uio_pci_generic, igb_uio or vfio-pci
> module before the application is run. Any network ports under
> Linux* control will be ignored by the DPDK poll-mode drivers
> and cannot be used by the application.
> 
> 
>   --yliu
> 
> > But does it mean DPDK should use every available ports?
> > That's the default and may be configured with blacklist/whitelist.
> > 
> > > > 2. Driver conflict between virtio PMD in DPDK, and virtio-pci in
> > > >kernel (happens on every virtio device that DPDK uses)
> > > 
> > > If you unbinded the kernel driver first, which is the suggested (or
> > > must?) way to use DPDK, that will not happen.


As far as I remember; there are some environments where DPDK but guests are not
allowed to load their own kernel modules. But since virtio only needs access to
I/O ports on x86, the driver can accomodate. I haven't run into these 
environments.

But the added code in virtio driver causes issues. Mostly because it causes
an unnecessary duplication of the initialization code, and is missing many of
the protections and interfaces that exist in the base code. For example,
there are lots of corner cases in interrupt support which are related to
this non-UIO mode of operation.


[dpdk-dev] [PATCH v2 0/3] Handle SIGINT and SIGTERM in DPDK examples

2015-12-24 Thread Zhihong Wang
This patch handles SIGINT and SIGTERM in testpmd, l2fwd, and l3fwd, make sure 
all ports are properly stopped and closed.
For virtual ports, the stop and close function may deal with resource cleanup, 
such as socket files unlinking.

--
Changes in v2:

1. Make sure graceful exit for all running phases

2. Make sure program exits with the right status

Zhihong Wang (3):
  app/test-pmd: Handle SIGINT and SIGTERM in testpmd
  examples/l2fwd: Handle SIGINT and SIGTERM in l2fwd
  examples/l3fwd: Handle SIGINT and SIGTERM in l3fwd

 app/test-pmd/cmdline.c |  19 ++---
 app/test-pmd/testpmd.c |  38 ++---
 app/test-pmd/testpmd.h |   1 +
 examples/l2fwd/main.c  |  60 +++
 examples/l3fwd/main.c  | 110 -
 5 files changed, 196 insertions(+), 32 deletions(-)

-- 
2.5.0



[dpdk-dev] [PATCH v2 1/3] app/test-pmd: Handle SIGINT and SIGTERM in testpmd

2015-12-24 Thread Zhihong Wang
Handle SIGINT and SIGTERM in testpmd.

Signed-off-by: Zhihong Wang 
---
 app/test-pmd/cmdline.c | 19 +--
 app/test-pmd/testpmd.c | 38 --
 app/test-pmd/testpmd.h |  1 +
 3 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 73298c9..4ff1739 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -90,6 +90,8 @@

 #include "testpmd.h"

+static struct cmdline *testpmd_cl;
+
 static void cmd_reconfig_device_queue(portid_t id, uint8_t dev, uint8_t queue);

 #ifdef RTE_NIC_BYPASS
@@ -9778,17 +9780,22 @@ cmdline_parse_ctx_t main_ctx[] = {
 void
 prompt(void)
 {
-   struct cmdline *cl;
-
/* initialize non-constant commands */
cmd_set_fwd_mode_init();

-   cl = cmdline_stdin_new(main_ctx, "testpmd> ");
-   if (cl == NULL) {
+   testpmd_cl = cmdline_stdin_new(main_ctx, "testpmd> ");
+   if (testpmd_cl == NULL) {
return;
}
-   cmdline_interact(cl);
-   cmdline_stdin_exit(cl);
+   cmdline_interact(testpmd_cl);
+   cmdline_stdin_exit(testpmd_cl);
+}
+
+void
+prompt_exit(void)
+{
+   if (testpmd_cl != NULL)
+   cmdline_quit(testpmd_cl);
 }

 static void
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 98ae46d..cb38d56 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1570,13 +1570,16 @@ pmd_test_exit(void)
if (test_done == 0)
stop_packet_forwarding();

-   FOREACH_PORT(pt_id, ports) {
-   printf("Stopping port %d...", pt_id);
-   fflush(stdout);
-   rte_eth_dev_close(pt_id);
-   printf("done\n");
+   if (ports != NULL) {
+   FOREACH_PORT(pt_id, ports) {
+   printf("Stopping port %d...", pt_id);
+   fflush(stdout);
+   rte_eth_dev_stop(pt_id);
+   rte_eth_dev_close(pt_id);
+   printf(" Done\n");
+   }
}
-   printf("bye...\n");
+   printf("Bye...\n");
 }

 typedef void (*cmd_func_t)(void);
@@ -1984,12 +1987,34 @@ init_port(void)
ports[pid].enabled = 1;
 }

+static void
+force_quit(void)
+{
+   pmd_test_exit();
+   prompt_exit();
+}
+
+static void
+sigint_handler(__rte_unused int signum)
+{
+   if (signum == SIGINT || signum == SIGTERM) {
+   printf("\nSignal %d received, preparing to exit...\n",
+   signum);
+   force_quit();
+   signal(signum, SIG_DFL);
+   kill(getpid(), signum);
+   }
+}
+
 int
 main(int argc, char** argv)
 {
int  diag;
uint8_t port_id;

+   signal(SIGINT, sigint_handler);
+   signal(SIGTERM, sigint_handler);
+
diag = rte_eal_init(argc, argv);
if (diag < 0)
rte_panic("Cannot init EAL\n");
@@ -2041,6 +2066,7 @@ main(int argc, char** argv)
start_packet_forwarding(0);
printf("Press enter to exit\n");
rc = read(0, &c, 1);
+   pmd_test_exit();
if (rc < 0)
return 1;
}
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index ee7de98..7ffc17b 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -462,6 +462,7 @@ unsigned int parse_item_list(char* str, const char* 
item_name,
unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void prompt(void);
+void prompt_exit(void);
 void nic_stats_display(portid_t port_id);
 void nic_stats_clear(portid_t port_id);
 void nic_xstats_display(portid_t port_id);
-- 
2.5.0



[dpdk-dev] [PATCH v2 2/3] examples/l2fwd: Handle SIGINT and SIGTERM in l2fwd

2015-12-24 Thread Zhihong Wang
Handle SIGINT and SIGTERM in l2fwd.

Signed-off-by: Zhihong Wang 
---
 examples/l2fwd/main.c | 60 +++
 1 file changed, 60 insertions(+)

diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index 720fd5a..75899dd 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -44,6 +44,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #include 
 #include 
@@ -69,6 +71,9 @@
 #include 
 #include 

+static int force_quit = -1;
+static int signo_quit = -1;
+
 #define RTE_LOGTYPE_L2FWD RTE_LOGTYPE_USER1

 #define NB_MBUF   8192
@@ -284,6 +289,8 @@ l2fwd_main_loop(void)
}

while (1) {
+   if (unlikely(force_quit != 0))
+   break;

cur_tsc = rte_rdtsc();

@@ -534,6 +541,45 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
port_mask)
}
 }

+static void
+stop_ports(void)
+{
+   unsigned portid, nb_ports;
+
+   nb_ports = rte_eth_dev_count();
+   for (portid = 0; portid < nb_ports; portid++) {
+   if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
+   continue;
+   }
+   printf("Stopping port %d...", portid);
+   rte_eth_dev_stop(portid);
+   rte_eth_dev_close(portid);
+   printf(" Done\n");
+   }
+}
+
+static void
+signal_handler(__rte_unused int signum)
+{
+   if (signum == SIGINT || signum == SIGTERM) {
+   printf("\nSignal %d received, preparing to exit...\n",
+   signum);
+   if (force_quit < 0) {
+   printf("Forwarding not started yet...\n");
+   /* stop ports */
+   stop_ports();
+   printf("Bye...\n");
+   /* inform if there's a caller */
+   signal(signum, SIG_DFL);
+   kill(getpid(), signum);
+   } else {
+   printf("Forwarding started already...\n");
+   signo_quit = signum;
+   force_quit = 1;
+   }
+   }
+}
+
 int
 main(int argc, char **argv)
 {
@@ -546,6 +592,9 @@ main(int argc, char **argv)
unsigned lcore_id, rx_lcore_id;
unsigned nb_ports_in_mask = 0;

+   signal(SIGINT, signal_handler);
+   signal(SIGTERM, signal_handler);
+
/* init EAL */
ret = rte_eal_init(argc, argv);
if (ret < 0)
@@ -697,11 +746,22 @@ main(int argc, char **argv)
check_all_ports_link_status(nb_ports, l2fwd_enabled_port_mask);

/* launch per-lcore init on every lcore */
+   force_quit = 0;
rte_eal_mp_remote_launch(l2fwd_launch_one_lcore, NULL, CALL_MASTER);
RTE_LCORE_FOREACH_SLAVE(lcore_id) {
if (rte_eal_wait_lcore(lcore_id) < 0)
return -1;
}

+   printf("Stopping forwarding... Done\n");
+   /* stop ports */
+   stop_ports();
+   printf("Bye...\n");
+   /* inform if there's a caller */
+   if (force_quit != 0) {
+   signal(signo_quit, SIG_DFL);
+   kill(getpid(), signo_quit);
+   }
+
return 0;
 }
-- 
2.5.0



[dpdk-dev] [PATCH v2 3/3] examples/l3fwd: Handle SIGINT and SIGTERM in l3fwd

2015-12-24 Thread Zhihong Wang
Handle SIGINT and SIGTERM in l3fwd.

Signed-off-by: Zhihong Wang 
---
 examples/l3fwd/main.c | 110 +-
 1 file changed, 90 insertions(+), 20 deletions(-)

diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 5b0c2dd..b9f3232 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -41,6 +41,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #include 
 #include 
@@ -75,6 +77,9 @@
 #include 
 #include 

+static int force_quit = -1;
+static int signo_quit = -1;
+
 #define APP_LOOKUP_EXACT_MATCH  0
 #define APP_LOOKUP_LPM  1
 #define DO_RFC_1812_CHECKS
@@ -1554,6 +1559,8 @@ main_loop(__attribute__((unused)) void *dummy)
}

while (1) {
+   if (unlikely(force_quit != 0))
+   break;

cur_tsc = rte_rdtsc();

@@ -2559,6 +2566,74 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
port_mask)
}
 }

+static void
+start_ports(void)
+{
+   unsigned portid, nb_ports;
+   int ret;
+
+   nb_ports = rte_eth_dev_count();
+   for (portid = 0; portid < nb_ports; portid++) {
+   if ((enabled_port_mask & (1 << portid)) == 0) {
+   continue;
+   }
+   printf("Starting port %d...", portid);
+   ret = rte_eth_dev_start(portid);
+   if (ret < 0)
+   rte_exit(EXIT_FAILURE,
+   "rte_eth_dev_start: err=%d, port=%d\n",
+   ret, portid);
+   /*
+* If enabled, put device in promiscuous mode.
+* This allows IO forwarding mode to forward packets
+* to itself through 2 cross-connected  ports of the
+* target machine.
+*/
+   if (promiscuous_on)
+   rte_eth_promiscuous_enable(portid);
+   printf(" Done\n");
+   }
+}
+
+static void
+stop_ports(void)
+{
+   unsigned portid, nb_ports;
+
+   nb_ports = rte_eth_dev_count();
+   for (portid = 0; portid < nb_ports; portid++) {
+   if ((enabled_port_mask & (1 << portid)) == 0) {
+   continue;
+   }
+   printf("Stopping port %d...", portid);
+   rte_eth_dev_stop(portid);
+   rte_eth_dev_close(portid);
+   printf(" Done\n");
+   }
+}
+
+static void
+signal_handler(__rte_unused int signum)
+{
+   if (signum == SIGINT || signum == SIGTERM) {
+   printf("\nSignal %d received, preparing to exit...\n",
+   signum);
+   if (force_quit < 0) {
+   printf("Forwarding not started yet...\n");
+   /* stop ports */
+   stop_ports();
+   printf("Bye...\n");
+   /* inform if there's a caller */
+   signal(signum, SIG_DFL);
+   kill(getpid(), signum);
+   } else {
+   printf("Forwarding started already...\n");
+   signo_quit = signum;
+   force_quit = 1;
+   }
+   }
+}
+
 int
 main(int argc, char **argv)
 {
@@ -2572,6 +2647,9 @@ main(int argc, char **argv)
uint32_t n_tx_queue, nb_lcores;
uint8_t portid, nb_rx_queue, queue, socketid;

+   signal(SIGINT, signal_handler);
+   signal(SIGTERM, signal_handler);
+
/* init EAL */
ret = rte_eal_init(argc, argv);
if (ret < 0)
@@ -2711,34 +2789,26 @@ main(int argc, char **argv)
printf("\n");

/* start ports */
-   for (portid = 0; portid < nb_ports; portid++) {
-   if ((enabled_port_mask & (1 << portid)) == 0) {
-   continue;
-   }
-   /* Start device */
-   ret = rte_eth_dev_start(portid);
-   if (ret < 0)
-   rte_exit(EXIT_FAILURE, "rte_eth_dev_start: err=%d, 
port=%d\n",
-   ret, portid);
-
-   /*
-* If enabled, put device in promiscuous mode.
-* This allows IO forwarding mode to forward packets
-* to itself through 2 cross-connected  ports of the
-* target machine.
-*/
-   if (promiscuous_on)
-   rte_eth_promiscuous_enable(portid);
-   }
-
+   start_ports();
check_all_ports_link_status((uint8_t)nb_ports, enabled_port_mask);

/* launch per-lcore init on every lcore */
+   force_quit = 0;
rte_eal_mp_remote_launch(main_loop, NULL, CALL_MASTER);
RTE_LCORE_FOREACH_SLAVE(lcore_id) {
if (rte_eal_wait_lcore(lcore_id) < 0)
return -1;
}

+   printf("Stopping forwarding... Done\n");
+   /*