Re: [dpdk-dev] [PATCH] vfio: remove deprecated DMA mapping functions

2019-11-04 Thread Damjan Marion (damarion)


On 4 Nov 2019, at 18:42, Burakov, Anatoly <anatoly.bura...@intel.com> wrote:

On 04-Nov-19 5:34 PM, Damjan Marion (damarion) wrote:
On 4 Nov 2019, at 18:27, Burakov, Anatoly <anatoly.bura...@intel.com> wrote:

On 04-Nov-19 1:57 PM, Damjan Marion (damarion) wrote:
On 25 Oct 2019, at 15:02, Damjan Marion (damarion) <damar...@cisco.com> wrote:



On 25 Oct 2019, at 14:23, Burakov, Anatoly <anatoly.bura...@intel.com> wrote:

On 25-Oct-19 12:13 PM, Damjan Marion (damarion) wrote:
On 25 Oct 2019, at 00:32, Thomas Monjalon <tho...@monjalon.net> wrote:

24/10/2019 21:09, David Marchand:
On Thu, Oct 24, 2019 at 2:18 PM Anatoly Burakov
<anatoly.bura...@intel.com> wrote:

The rte_vfio_dma_map/unmap API's have been marked as deprecated in
release 19.05. Remove them.

Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
---

Notes:
   Although `rte_vfio_dma_map` et al. was marked as deprecated in our 
documentation,
   it wasn't marked as __rte_deprecated in code. Should we still remove it?

I can see that vpp is still using this api.
I would prefer we get some ack from their side.

Shahaf?
Ray?

Do you guys have contact with VPP devs?

+Cc Damjan
Thanks for looping me in. If I remember correctly, that was used only to get the mlx
PMDs working.
We can remove those calls, but then the mlx PMDs will stop working unless there is
an alternative solution.
From my perspective it is not a big issue, as we already have native RDMA-based
mlx support, but I would expect that other people will complain.
Is there an alternative way to tell DPDK about DMA mappings?

The rte_vfio_container_dma_map(VFIO_DEFAULT_CONTAINER, ...) is the exact 
equivalent of the functions being removed. Also, rte_dev_dma_map() is supposed 
to be the more general DMA mapping API that works with VFIO and with any other 
bus/device-specific DMA mapping.

So yes, a simple search and replace for "rte_vfio_dma_(un)?map(" to 
"rte_vfio_container_dma_(un)?map(VFIO_DEFAULT_CONTAINER, " should trigger 
exactly the same behavior.
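For VPP's side, the replacement described above can be sketched as follows. This is only an illustrative, self-contained sketch: the stub definitions below stand in for the real rte_vfio.h declarations, and the only point is the call-site change (the default-container fd becomes explicit).

```c
#include <assert.h>
#include <stdint.h>

/* Stubs standing in for the real DPDK rte_vfio.h declarations; the
 * value -1 for RTE_VFIO_DEFAULT_CONTAINER_FD matches the discussion
 * later in this thread. */
#define RTE_VFIO_DEFAULT_CONTAINER_FD (-1)

static int
rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
                           uint64_t iova, uint64_t len)
{
	/* The real function programs the IOMMU via VFIO ioctls. */
	(void)vaddr; (void)iova; (void)len;
	return (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD) ? 0 : -1;
}

/* Old (removed) call:
 *     rte_vfio_dma_map(vaddr, iova, len);
 * New call: identical, but with the container fd made explicit. */
static int
map_buffer_for_dma(uint64_t vaddr, uint64_t iova, uint64_t len)
{
	return rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
	                                  vaddr, iova, len);
}
```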

Done, will be merged after it passes verify jobs…

https://gerrit.fd.io/r/c/vpp/+/22982
I just got a report that this patch breaks some tests. Is
RTE_VFIO_DEFAULT_CONTAINER_FD the right value to use here?
Maybe I wrongly assumed that when you said VFIO_DEFAULT_CONTAINER, you meant
RTE_VFIO_DEFAULT_CONTAINER_FD…
—
Damjan
Yes, I think I can see the bug. Can you rerun the failing test after applying
the following patch?

diff --git a/lib/librte_eal/linux/eal/eal_vfio.c 
b/lib/librte_eal/linux/eal/eal_vfio.c
index d9541b1220..d7887388f9 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -412,6 +412,9 @@ get_vfio_cfg_by_container_fd(int container_fd)
 {
 	int i;
 
+	if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD)
+		return default_vfio_cfg;
+
 	for (i = 0; i < VFIO_MAX_CONTAINERS; i++) {
 		if (vfio_cfgs[i].vfio_container_fd == container_fd)
 			return &vfio_cfgs[i];


The problem seems to be that we're looking at the actual fd, whereas the
RTE_VFIO_DEFAULT_CONTAINER_FD value is -1, which will not match anything in
that list.
That was exactly my reading, but I didn't want to rush to a conclusion. I will
ask the guys to test…

This should make it easier to test:

http://patches.dpdk.org/patch/62390/

This one even easier :)

https://gerrit.fd.io/r/c/vpp/+/23227




Re: [dpdk-dev] [PATCH] vfio: remove deprecated DMA mapping functions

2019-11-04 Thread Damjan Marion (damarion)


> On 4 Nov 2019, at 18:27, Burakov, Anatoly  wrote:
> 
> On 04-Nov-19 1:57 PM, Damjan Marion (damarion) wrote:
>>> On 25 Oct 2019, at 15:02, Damjan Marion (damarion) <damar...@cisco.com> wrote:
>>> 
>>> 
>>> 
>>>> On 25 Oct 2019, at 14:23, Burakov, Anatoly <anatoly.bura...@intel.com> wrote:
>>>> 
>>>> On 25-Oct-19 12:13 PM, Damjan Marion (damarion) wrote:
>>>>>> On 25 Oct 2019, at 00:32, Thomas Monjalon <tho...@monjalon.net> wrote:
>>>>>> 
>>>>>> 24/10/2019 21:09, David Marchand:
>>>>>>> On Thu, Oct 24, 2019 at 2:18 PM Anatoly Burakov
>>>>>>> <anatoly.bura...@intel.com> wrote:
>>>>>>>> 
>>>>>>>> The rte_vfio_dma_map/unmap API's have been marked as deprecated in
>>>>>>>> release 19.05. Remove them.
>>>>>>>> 
>>>>>>>> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
>>>>>>>> ---
>>>>>>>> 
>>>>>>>> Notes:
>>>>>>>>Although `rte_vfio_dma_map` et al. was marked as deprecated in our 
>>>>>>>> documentation,
>>>>>>>>it wasn't marked as __rte_deprecated in code. Should we still 
>>>>>>>> remove it?
>>>>>>> 
>>>>>>> I can see that vpp is still using this api.
>>>>>>> I would prefer we get some ack from their side.
>>>>>>> 
>>>>>>> Shahaf?
>>>>>>> Ray?
>>>>>>> 
>>>>>>> Do you guys have contact with VPP devs?
>>>>>> 
>>>>>> +Cc Damjan
>>>>> Thanks for looping me in. If I remember correctly, that was used only to 
>>>>> get the mlx PMDs working.
>>>>> We can remove those calls, but then the mlx PMDs will stop working unless 
>>>>> there is an alternative solution.
>>>>> From my perspective it is not a big issue, as we already have native 
>>>>> RDMA-based mlx support, but I would expect that other people will complain.
>>>>> Is there an alternative way to tell DPDK about DMA mappings?
>>>> 
>>>> The rte_vfio_container_dma_map(VFIO_DEFAULT_CONTAINER, ...) is the exact 
>>>> equivalent of the functions being removed. Also, rte_dev_dma_map() is 
>>>> supposed to be the more general DMA mapping API that works with VFIO and 
>>>> with any other bus/device-specific DMA mapping.
>>>> 
>>>> So yes, a simple search and replace for "rte_vfio_dma_(un)?map(" to 
>>>> "rte_vfio_container_dma_(un)?map(VFIO_DEFAULT_CONTAINER, " should trigger 
>>>> exactly the same behavior.
>>> 
>>> Done, will be merged after it passes verify jobs…
>>> 
>>> https://gerrit.fd.io/r/c/vpp/+/22982
>> I just got a report that this patch breaks some tests. Is 
>> RTE_VFIO_DEFAULT_CONTAINER_FD the right value to use here?
>> Maybe I wrongly assumed that when you said VFIO_DEFAULT_CONTAINER, you meant 
>> RTE_VFIO_DEFAULT_CONTAINER_FD…
>> —
>> Damjan
> Yes, I think I can see the bug. Can you rerun the failing test after applying 
> the following patch?
> 
> diff --git a/lib/librte_eal/linux/eal/eal_vfio.c 
> b/lib/librte_eal/linux/eal/eal_vfio.c
> index d9541b1220..d7887388f9 100644
> --- a/lib/librte_eal/linux/eal/eal_vfio.c
> +++ b/lib/librte_eal/linux/eal/eal_vfio.c
> @@ -412,6 +412,9 @@ get_vfio_cfg_by_container_fd(int container_fd)
> {
>   int i;
> 
> + if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD)
> + return default_vfio_cfg;
> +
>   for (i = 0; i < VFIO_MAX_CONTAINERS; i++) {
>   if (vfio_cfgs[i].vfio_container_fd == container_fd)
>   return &vfio_cfgs[i];
> 
> 
> The problem seems to be that we're looking at the actual fd, whereas the 
> RTE_VFIO_DEFAULT_CONTAINER_FD value is -1, which will not match anything in 
> that list.

That was exactly my reading, but I didn't want to rush to a conclusion. I will 
ask the guys to test…



Re: [dpdk-dev] [PATCH] vfio: remove deprecated DMA mapping functions

2019-11-04 Thread Damjan Marion (damarion)


On 27 Oct 2019, at 08:00, Shahaf Shuler <shah...@mellanox.com> wrote:



-Original Message-
From: Damjan Marion (damarion) <damar...@cisco.com>
Sent: Friday, October 25, 2019 2:14 PM
To: Thomas Monjalon <tho...@monjalon.net>
Cc: David Marchand <david.march...@redhat.com>; Anatoly Burakov
<anatoly.bura...@intel.com>; Shahaf Shuler <shah...@mellanox.com>; Ray
Kinsella <m...@ashroe.eu>; dev <dev@dpdk.org>; Neil Horman
<nhor...@tuxdriver.com>; John McNamara <john.mcnam...@intel.com>;
Marko Kovacevic <marko.kovace...@intel.com>; Bruce Richardson
<bruce.richard...@intel.com>
Subject: Re: [dpdk-dev] [PATCH] vfio: remove deprecated DMA mapping
functions



On 25 Oct 2019, at 00:32, Thomas Monjalon <tho...@monjalon.net>
wrote:

24/10/2019 21:09, David Marchand:
On Thu, Oct 24, 2019 at 2:18 PM Anatoly Burakov
<anatoly.bura...@intel.com> wrote:

The rte_vfio_dma_map/unmap API's have been marked as deprecated
in
release 19.05. Remove them.

Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
---

Notes:
  Although `rte_vfio_dma_map` et al. was marked as deprecated in our
documentation,
  it wasn't marked as __rte_deprecated in code. Should we still remove
it?

I can see that vpp is still using this api.
I would prefer we get some ack from their side.

Shahaf?
Ray?

Do you guys have contact with VPP devs?

+Cc Damjan

Thanks for looping me in. If I remember correctly, that was used only to get
the mlx PMDs working.
We can remove those calls, but then the mlx PMDs will stop working unless there is
an alternative solution.

From my perspective it is not a big issue, as we already have native RDMA-based
mlx support, but I would expect that other people will complain.

Is there an alternative way to tell DPDK about DMA mappings?

Damjan, I don't follow.

Why would using rte_dev_dma_map break the Mellanox PMDs?

Maybe it is just me being confused. I remember you guys put in some hack to get it working in 
the past, but that patch may be using a different way to get DMA mappings…





Re: [dpdk-dev] [PATCH] vfio: remove deprecated DMA mapping functions

2019-11-04 Thread Damjan Marion (damarion)


On 25 Oct 2019, at 15:02, Damjan Marion (damarion) <damar...@cisco.com> wrote:



On 25 Oct 2019, at 14:23, Burakov, Anatoly <anatoly.bura...@intel.com> wrote:

On 25-Oct-19 12:13 PM, Damjan Marion (damarion) wrote:
On 25 Oct 2019, at 00:32, Thomas Monjalon <tho...@monjalon.net> wrote:

24/10/2019 21:09, David Marchand:
On Thu, Oct 24, 2019 at 2:18 PM Anatoly Burakov
<anatoly.bura...@intel.com> wrote:

The rte_vfio_dma_map/unmap API's have been marked as deprecated in
release 19.05. Remove them.

Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
---

Notes:
   Although `rte_vfio_dma_map` et al. was marked as deprecated in our 
documentation,
   it wasn't marked as __rte_deprecated in code. Should we still remove it?

I can see that vpp is still using this api.
I would prefer we get some ack from their side.

Shahaf?
Ray?

Do you guys have contact with VPP devs?

+Cc Damjan
Thanks for looping me in. If I remember correctly, that was used only to get the mlx
PMDs working.
We can remove those calls, but then the mlx PMDs will stop working unless there is
an alternative solution.
From my perspective it is not a big issue, as we already have native RDMA-based
mlx support, but I would expect that other people will complain.
Is there an alternative way to tell DPDK about DMA mappings?

The rte_vfio_container_dma_map(VFIO_DEFAULT_CONTAINER, ...) is the exact 
equivalent of the functions being removed. Also, rte_dev_dma_map() is supposed 
to be the more general DMA mapping API that works with VFIO and with any other 
bus/device-specific DMA mapping.

So yes, a simple search and replace for "rte_vfio_dma_(un)?map(" to 
"rte_vfio_container_dma_(un)?map(VFIO_DEFAULT_CONTAINER, " should trigger 
exactly the same behavior.

Done, will be merged after it passes verify jobs…

https://gerrit.fd.io/r/c/vpp/+/22982

I just got a report that this patch breaks some tests. Is 
RTE_VFIO_DEFAULT_CONTAINER_FD the right value to use here?
Maybe I wrongly assumed that when you said VFIO_DEFAULT_CONTAINER, you meant 
RTE_VFIO_DEFAULT_CONTAINER_FD…

—
Damjan








Re: [dpdk-dev] [PATCH v1 3/3] ethdev: enhance the API for getting burst mode information

2019-11-02 Thread Damjan Marion (damarion)



> On 2 Nov 2019, at 09:38, Slava Ovsiienko  wrote:
> 
> Hi
>> -Original Message-
>> From: Liu, Yu Y 
>> Sent: Saturday, November 2, 2019 8:56
>> To: Wang, Haiyue ; Thomas Monjalon
>> 
>> Cc: dev@dpdk.org; arybche...@solarflare.com; Yigit, Ferruh
>> ; jerinjac...@gmail.com; Ye, Xiaolong
>> ; Kinsella, Ray ; Sun,
>> Chenmin ; Slava Ovsiienko
>> ; Damjan Marion (damarion)
>> ; Liu, Yu Y 
>> Subject: RE: [PATCH v1 3/3] ethdev: enhance the API for getting burst mode
>> information
>> 
>> Add Damjan from FD.io for awareness...
>> 
>> Hi Thomas,
>> 
>> Long time no see. Sorry, I use Outlook, which is not friendly to community
>> email.
>> 
>>> Anyway I will propose to replace this API in the next release.
>> Will your plan be affected by the API/ABI stability plan?
>> BTW, if you propose a new change in the next release, it will make DPDK
>> consumers (FD.io) change again.
>> So even if it does not affect the API/ABI stability plan, do we still have time
>> to get a solution for everyone in DPDK 19.11 with your
>> contribution/acceleration?
>> 
>>> I suspect a real hidden issue in Intel CPUs that you try to mitigate.
>> Please rest assured it is not the case.
>> This request is just from one FD.io project internal bug ("tx/rx burst function
>> is shown as nil") reported by Chenmin.
> 
> Why is just presenting the string with the function name (possibly with a suffix) 
> not enough?
> I would like to see this API (strings approach) in mlx5 as well; dropping the 
> entire feature
> does not look nice to me.
> 
> We could consider some requirements for the name suffixes, to distinguish 
> whether a
> function uses vector instructions, and which ones, if any.
> 
>> My understanding is that the DPDK behavior was taken as a bug by someone in the
>> FD.io project and will potentially mislead other DPDK consumers.
> 
> Why does FD.io code want to know which vector extension is used by burst 
> routines?
> Is it going to share/preserve some resources (registers, etc.)? Is it robust?
> Burst routines might not know whether vector extensions are used (they might 
> call 
> libraries; even rte_memcpy() can use vectors in an implicit fashion).

This is just a debug CLI print. It was added by me out of frustration from 
constantly dealing with operational issues where people report lower performance 
caused by DPDK deciding, for a variety of reasons, to switch from a vector PMD 
to a scalar one.
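For context, the bit-flag scheme being debated (quoted further below) and the string alternative are easy to bridge. Here is a hedged sketch of decoding such bits into the kind of string a debug CLI would print; the RTE_ETH_BURST_* values are copied from the quoted patch, while the decoder itself is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Flag values as proposed in the quoted patch (RTE_ETH_BURST_*). */
#define RTE_ETH_BURST_ALTIVEC (1ULL << 2)
#define RTE_ETH_BURST_NEON    (1ULL << 3)
#define RTE_ETH_BURST_SSE     (1ULL << 4)
#define RTE_ETH_BURST_AVX2    (1ULL << 5)
#define RTE_ETH_BURST_AVX512  (1ULL << 6)

/* Sketch: turn the burst-mode bits into a human-readable CLI string. */
static void
burst_mode_to_str(uint64_t flags, char *buf, size_t len)
{
	static const struct { uint64_t bit; const char *name; } tbl[] = {
		{ RTE_ETH_BURST_ALTIVEC, "altivec" },
		{ RTE_ETH_BURST_NEON,    "neon" },
		{ RTE_ETH_BURST_SSE,     "sse" },
		{ RTE_ETH_BURST_AVX2,    "avx2" },
		{ RTE_ETH_BURST_AVX512,  "avx512" },
	};
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (flags & tbl[i].bit) {
			if (buf[0] != '\0')
				strncat(buf, " ", len - strlen(buf) - 1);
			strncat(buf, tbl[i].name, len - strlen(buf) - 1);
		}
	}
	if (buf[0] == '\0')
		snprintf(buf, len, "scalar");
}
```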

> 
>> Haiyue is working with Chenmin to address the issue and with your support it
>> will be even better.
>> 
>> Your support will be highly appreciated!
>> 
>> Thanks & Regards,
>> Yu Liu
>> 
>> -Original Message-
>> From: dev  On Behalf Of Wang, Haiyue
>> Sent: Saturday, November 2, 2019 1:30 PM
>> To: Thomas Monjalon 
>> Cc: dev@dpdk.org; arybche...@solarflare.com; Yigit, Ferruh
>> ; jerinjac...@gmail.com; Ye, Xiaolong
>> ; Kinsella, Ray ; Sun,
>> Chenmin ; Slava Ovsiienko
>> 
>> Subject: Re: [dpdk-dev] [PATCH v1 3/3] ethdev: enhance the API for getting
>> burst mode information
>> 
>>> -Original Message-
>>> From: Thomas Monjalon 
>>> Sent: Saturday, November 2, 2019 06:46
>>> To: Wang, Haiyue 
>>> Cc: dev@dpdk.org; arybche...@solarflare.com; Yigit, Ferruh
>>> ; jerinjac...@gmail.com; Ye, Xiaolong
>>> ; Kinsella, Ray ; Sun,
>>> Chenmin ; Slava Ovsiienko
>>> 
>>> Subject: Re: [PATCH v1 3/3] ethdev: enhance the API for getting burst
>>> mode information
>>> 
>>> Thank you for trying to address comments done late.
>>> 
>>> 31/10/2019 18:11, Haiyue Wang:
>>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>> 
>> 
>>>> +#define RTE_ETH_BURST_ALTIVEC   (1ULL << 2)
>>>> +#define RTE_ETH_BURST_NEON  (1ULL << 3)
>>>> +#define RTE_ETH_BURST_SSE   (1ULL << 4)
>>>> +#define RTE_ETH_BURST_AVX2  (1ULL << 5)
>>>> +#define RTE_ETH_BURST_AVX512(1ULL << 6)
>>> 
>>> Of course, I still believe that giving a special treatment to vector
>>> instructions is wrong.
>>> You did not justify why it needs to be defined in bits instead of
>>> string. I am not asking again because anyway you don't really reply. I
>>> think you are executing an order you received and I don't want to
>>> blame you more.
>>> I suspect a real hidden issue in Intel CPUs that you try to mitigate.
>>> No need to reply to this comment.
>>> Anyway I will propose to replace this API in the next release.
>> 
>> Never mind, if this design is truly ugly, drop it all now. I also prefer to 
>> do the
>> best, that's why open source is amazing, thanks! ;-)
> 



Re: [dpdk-dev] [PATCH] vfio: remove deprecated DMA mapping functions

2019-10-25 Thread Damjan Marion (damarion)


On 25 Oct 2019, at 14:23, Burakov, Anatoly <anatoly.bura...@intel.com> wrote:

On 25-Oct-19 12:13 PM, Damjan Marion (damarion) wrote:
On 25 Oct 2019, at 00:32, Thomas Monjalon <tho...@monjalon.net> wrote:

24/10/2019 21:09, David Marchand:
On Thu, Oct 24, 2019 at 2:18 PM Anatoly Burakov
<anatoly.bura...@intel.com> wrote:

The rte_vfio_dma_map/unmap API's have been marked as deprecated in
release 19.05. Remove them.

Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
---

Notes:
   Although `rte_vfio_dma_map` et al. was marked as deprecated in our 
documentation,
   it wasn't marked as __rte_deprecated in code. Should we still remove it?

I can see that vpp is still using this api.
I would prefer we get some ack from their side.

Shahaf?
Ray?

Do you guys have contact with VPP devs?

+Cc Damjan
Thanks for looping me in. If I remember correctly, that was used only to get the mlx
PMDs working.
We can remove those calls, but then the mlx PMDs will stop working unless there is
an alternative solution.
From my perspective it is not a big issue, as we already have native RDMA-based
mlx support, but I would expect that other people will complain.
Is there an alternative way to tell DPDK about DMA mappings?

The rte_vfio_container_dma_map(VFIO_DEFAULT_CONTAINER, ...) is the exact 
equivalent of the functions being removed. Also, rte_dev_dma_map() is supposed 
to be the more general DMA mapping API that works with VFIO and with any other 
bus/device-specific DMA mapping.

So yes, a simple search and replace for "rte_vfio_dma_(un)?map(" to 
"rte_vfio_container_dma_(un)?map(VFIO_DEFAULT_CONTAINER, " should trigger 
exactly the same behavior.

Done, will be merged after it passes verify jobs…

https://gerrit.fd.io/r/c/vpp/+/22982

Thanks,

Damjan



Re: [dpdk-dev] [PATCH] vfio: remove deprecated DMA mapping functions

2019-10-25 Thread Damjan Marion (damarion)


> On 25 Oct 2019, at 00:32, Thomas Monjalon  wrote:
> 
> 24/10/2019 21:09, David Marchand:
>> On Thu, Oct 24, 2019 at 2:18 PM Anatoly Burakov
>>  wrote:
>>> 
>>> The rte_vfio_dma_map/unmap API's have been marked as deprecated in
>>> release 19.05. Remove them.
>>> 
>>> Signed-off-by: Anatoly Burakov 
>>> ---
>>> 
>>> Notes:
>>>Although `rte_vfio_dma_map` et al. was marked as deprecated in our 
>>> documentation,
>>>it wasn't marked as __rte_deprecated in code. Should we still remove it?
>> 
>> I can see that vpp is still using this api.
>> I would prefer we get some ack from their side.
>> 
>> Shahaf?
>> Ray?
>> 
>> Do you guys have contact with VPP devs?
> 
> +Cc Damjan

Thanks for looping me in. If I remember correctly, that was used only to get the mlx
PMDs working.
We can remove those calls, but then the mlx PMDs will stop working unless there is
an alternative solution.

From my perspective it is not a big issue, as we already have native RDMA-based
mlx support, but I would expect that other people will complain.

Is there an alternative way to tell DPDK about DMA mappings?

Thanks,

— 
Damjan



Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)

2019-05-07 Thread Damjan Marion (damarion)

--
Damjan

On 7 May 2019, at 13:29, Honnappa Nagarahalli
<honnappa.nagaraha...@arm.com> wrote:


On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
Memory interface (memif), provides high performance packet transfer
over shared memory.

Signed-off-by: Jakub Grajciar <jgraj...@cisco.com>


<...>

  With that in mind, I believe that 23Mpps is fine performance. No
performance target is
  defined, the goal is to be as fast as possible.
Use of C11 atomics has proven to provide better performance on weakly
ordered architectures (at least on Arm). IMO, C11 atomics should be used to
implement the fast path functions at least. This ensures optimal performance
on all supported architectures in DPDK.

  Atomics are not required by memif driver.

Correct, the only thing we need is a store barrier once per batch of packets, to
make sure that descriptor changes are globally visible before we bump the head
pointer.
Maybe I was not clear in my comments; I meant that the use of GCC C++11 memory-
model-aware atomic operations [1] shows better performance. So, instead of using 
full memory barriers you can use store-release and load-acquire semantics. A 
similar change was done to the svm_fifo data structure in VPP [2] (though the 
original algorithm used was different from the one used in this memif patch).

[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
[2] https://gerrit.fd.io/r/#/c/18223/

Typically we have hundreds of normal memory stores into the packet ring, then a 
single store fence, and then finally one more store to bump the head pointer.
Sorry, I'm not getting what you are suggesting here, and how that can be 
faster...
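To make the two options concrete, here is a self-contained C11 sketch of the producer side of such a ring: approach A is the batched store fence described above, approach B is the store-release suggestion. This is illustrative only, not memif's actual code, and the single-threaded test only checks the logic, not the memory ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 256

struct ring {
	uint64_t desc[RING_SIZE];	/* descriptors: plain stores */
	_Atomic uint16_t head;		/* producer index seen by consumer */
};

/* Approach A (as described above): plain descriptor stores, one
 * explicit store fence per batch, then bump the head. */
static void
produce_with_fence(struct ring *r, const uint64_t *batch, uint16_t n)
{
	uint16_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint16_t i;

	for (i = 0; i < n; i++)
		r->desc[(uint16_t)(head + i) % RING_SIZE] = batch[i];
	atomic_thread_fence(memory_order_release);	/* store fence */
	atomic_store_explicit(&r->head, head + n, memory_order_relaxed);
}

/* Approach B (the suggestion): fold the barrier into the head update
 * itself with a store-release; on weakly ordered CPUs this can be
 * cheaper than a full fence. */
static void
produce_with_release(struct ring *r, const uint64_t *batch, uint16_t n)
{
	uint16_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint16_t i;

	for (i = 0; i < n; i++)
		r->desc[(uint16_t)(head + i) % RING_SIZE] = batch[i];
	atomic_store_explicit(&r->head, head + n, memory_order_release);
}
```

A consumer would pair this with a load-acquire of `head` before reading the descriptors.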


Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)

2019-05-06 Thread Damjan Marion (damarion)


> On 6 May 2019, at 13:00, Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES 
> at Cisco)  wrote:
> 
> 
> 
> From: Honnappa Nagarahalli 
> Sent: Friday, May 3, 2019 6:27 AM
> To: Jakub Grajciar; Ferruh Yigit; dev@dpdk.org; Honnappa Nagarahalli
> Cc: nd; nd
> Subject: RE: [dpdk-dev] [RFC v5] /net: memory interface (memif)
> 
>> On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
>>> Memory interface (memif), provides high performance packet transfer
>>> over shared memory.
>>> 
>>> Signed-off-by: Jakub Grajciar 
>> 
> 
> <...>
> 
>>With that in mind, I believe that 23Mpps is fine performance. No
>> performance target is
>>defined, the goal is to be as fast as possible.
> Use of C11 atomics has proven to provide better performance on weakly 
> ordered architectures (at least on Arm). IMO, C11 atomics should be used to 
> implement the fast path functions at least. This ensures optimal performance 
> on all supported architectures in DPDK.
> 
>Atomics are not required by memif driver.

Correct, the only thing we need is a store barrier once per batch of packets,
to make sure that descriptor changes are globally visible before we bump the head 
pointer.

-- 
Damjan

[dpdk-dev] Adding API to force freeing consumed buffers in TX ring

2016-11-21 Thread Damjan Marion (damarion)

Hi,

Currently in VPP we do a memcpy of the whole packet when we need to do 
replication, as we cannot know if a specific buffer has been transmitted
from the TX ring before we update it again (i.e. L2 header rewrite).

Unless there is already a way to address this issue in DPDK which I'm not aware
of, my proposal is that we provide a mechanism for polling the TX ring 
for consumed buffers. This can be either a completely new API or an 
extension of rte_eth_tx_burst (i.e. a special case when nb_pkts=0).

This would allow us to start polling the TX ring when we expect some 
mbufs back, instead of waiting for the next TX burst (which we don't know
when will happen) and hoping that we will reach the free threshold soon.

Any thoughts?

Thanks,

Damjan
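As a toy model of the proposal above (all names here are made up; this is not a DPDK API), a "poll the TX ring for consumed buffers" call could look like the following.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define TX_RING_SIZE 16

/* Toy TX ring: hw_tail marks how far the "NIC" has consumed entries. */
struct tx_ring {
	void    *mbufs[TX_RING_SIZE];
	uint16_t next_free;	/* next entry to reclaim */
	uint16_t hw_tail;	/* advanced by hardware as packets leave */
};

/* Hypothetical reclaim call: hands back buffers the NIC is done with,
 * without enqueueing anything new (the nb_pkts == 0 idea above). */
static uint16_t
tx_ring_reclaim(struct tx_ring *r, void **freed, uint16_t max)
{
	uint16_t n = 0;

	while (r->next_free != r->hw_tail && n < max) {
		freed[n++] = r->mbufs[r->next_free];
		r->mbufs[r->next_free] = NULL;
		r->next_free = (r->next_free + 1) % TX_RING_SIZE;
	}
	return n;
}
```

For reference, later DPDK releases added rte_eth_tx_done_cleanup(), which addresses this use case.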





[dpdk-dev] [PATCH v2 2/2] i40e: Enable bad checksum flags in i40e vPMD

2016-10-06 Thread Damjan Marion (damarion)

On 6 Oct 2016, at 04:19, Jeff Shaw <jeffrey.b.shaw at intel.com> wrote:

On Wed, Oct 05, 2016 at 04:57:28PM -0700, Chen, Jing D wrote:
Hi,

-Original Message-
From: Shaw, Jeffrey B
Sent: Wednesday, October 5, 2016 5:13 PM
To: dev at dpdk.org
Cc: Zhang, Helin <helin.zhang at intel.com>; Wu, Jingjing
<jingjing.wu at intel.com>; damarion at cisco.com; Zhang, Qi Z
<qi.z.zhang at intel.com>; Chen, Jing D <jing.d.chen at intel.com>
Subject: [PATCH v2 2/2] i40e: Enable bad checksum flags in i40e vPMD

From: Damjan Marion <damar...@cisco.com>

Decode the checksum flags from the rx descriptor, setting the appropriate bit
in the mbuf ol_flags field when the flag indicates a bad checksum.

Signed-off-by: Damjan Marion <damarion at cisco.com>
Signed-off-by: Jeff Shaw <jeffrey.b.shaw at intel.com>
---
drivers/net/i40e/i40e_rxtx_vec.c | 48 ++++++++++++++++++++++++----------------
1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx_vec.c
b/drivers/net/i40e/i40e_rxtx_vec.c
index 6c63141..d2267ad 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -138,19 +138,14 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 static inline void
 desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 {
- __m128i vlan0, vlan1, rss;
- union {
- uint16_t e[4];
- uint64_t dword;
- } vol;
+ __m128i vlan0, vlan1, rss, l3_l4e;

/* mask everything except RSS, flow director and VLAN flags
 * bit2 is for VLAN tag, bit11 for flow director indication
 * bit13:12 for RSS indication.
 */
- const __m128i rss_vlan_msk = _mm_set_epi16(
- 0x, 0x, 0x, 0x,
- 0x3804, 0x3804, 0x3804, 0x3804);
+ const __m128i rss_vlan_msk = _mm_set_epi32(
+ 0x1c03004, 0x1c03004, 0x1c03004, 0x1c03004);

Mask is wrong here. Should be 0x1c03804, ..., etc.


/* map rss and vlan type to rss hash and vlan flag */
	const __m128i vlan_flags = _mm_set_epi8(0, 0, 0, 0,
@@ -163,23 +158,36 @@ desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
PKT_RX_RSS_HASH | PKT_RX_FDIR,
PKT_RX_RSS_HASH, 0, 0,
0, 0, PKT_RX_FDIR, 0);

- vlan0 = _mm_unpackhi_epi16(descs[0], descs[1]);
- vlan1 = _mm_unpackhi_epi16(descs[2], descs[3]);
- vlan0 = _mm_unpacklo_epi32(vlan0, vlan1);
+ const __m128i l3_l4e_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD
| PKT_RX_IP_CKSUM_BAD,
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD,
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD,
+ PKT_RX_EIP_CKSUM_BAD,
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD,
+ PKT_RX_L4_CKSUM_BAD,
+ PKT_RX_IP_CKSUM_BAD,
+ 0);
+
+ vlan0 = _mm_unpackhi_epi32(descs[0], descs[1]);
+ vlan1 = _mm_unpackhi_epi32(descs[2], descs[3]);
+ vlan0 = _mm_unpacklo_epi64(vlan0, vlan1);

vlan1 = _mm_and_si128(vlan0, rss_vlan_msk);
vlan0 = _mm_shuffle_epi8(vlan_flags, vlan1);

- rss = _mm_srli_epi16(vlan1, 11);
+ rss = _mm_srli_epi32(vlan1, 12);
rss = _mm_shuffle_epi8(rss_flags, rss);

My bad. The original code uses bits [13:11] to identify the RSS and FDIR flags. Now
it masks bit 11 out when creating "rss_vlan_msk" and does the shift above,
while it still tries to use the original "rss_flags"?

Good catch.  I have no idea how you spotted that, and you're right, we should
be shifting by 11, not 12. Also the mask needs to be updated (as you
mentioned to me offline) which I noted above.

Damjan, unless you object I'll send a v3 with an updated rss_vlan_msk and
the 11 bit shift so we also get the Flow Director Filter Match (FLM)
indication.

Absolutely no objection, and thanks for taking care of that!

Damjan




[dpdk-dev] spinlock: Move constructor function out of header file

2016-07-16 Thread Damjan Marion (damarion)

> On 15 Jul 2016, at 17:08, Thomas Monjalon  
> wrote:
> 
> 2016-07-15 16:37, Thomas Monjalon:
>> I will apply it with trivial changes suggested by Jan and
>> the small needed changes that I describe below:
>> 
>> 2016-07-14 20:03, Jan Viktorin:
>>> On Thu, 14 Jul 2016 15:27:29 +0200
>>> damarion at cisco.com wrote:
 --- /dev/null
 +++ b/lib/librte_eal/common/arch/x86/rte_spinlock.c
>> [...]
 +uint8_t rte_rtm_supported; /* cache the flag to avoid the overhead
 +of the rte_cpu_get_flag_enabled function */
>> 
>> This variable must be exported in the .map file.
>> 
 --- a/lib/librte_eal/linuxapp/eal/Makefile
 +++ b/lib/librte_eal/linuxapp/eal/Makefile
 @@ -106,6 +106,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += rte_keepalive.c
 
 # from arch dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += rte_cpuflags.c
 +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += rte_spinlock.c
>>> 
>>> This is not good, you provide rte_spinlock.c only for x86. Building
>>> for any other arch would fail to find this file.
>> 
>> This change do the trick:
>> 
>> -SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += rte_spinlock.c
>> +SRCS-$(CONFIG_RTE_ARCH_X86) += rte_spinlock.c
>> 
>> (Note that CONFIG_RTE_EXEC_ENV_LINUXAPP check is not needed inside linuxapp 
>> EAL)
>> 
>>> Moreover, the bsdapp/eal/Makefile should reflect this situation as
>>> well.
>> 
>> Yes
> 
> Applied with above changes, thanks

Thanks!


[dpdk-dev] spinlock: Move constructor function out of header file

2016-07-16 Thread Damjan Marion (damarion)

> On 15 Jul 2016, at 12:09, Thomas Monjalon  
> wrote:
> 
> 2016-07-15 09:54, Damjan Marion:
>> So we don't have much pending besides 2 patches for i40e which 
>> Jeff submitted yesterday, and they will, I guess, need to wait for 16.11.
> 
> Yes these i40e patches will probably have to wait 16.11.
> 
>> Only one which I have on my mind is:
>> 
>> https://git.fd.io/cgit/vpp/tree/dpdk/dpdk-16.04_patches/0005-Allow-applications-to-override-rte_delay_us.patch
>> 
>> This is a big issue for us when running single-core, as some
>> drivers tend to call rte_delay_us for a long time, and that is 
>> causing packet drops. I.e. if you do stop/start on one interface
>> and you are running BFD on another one, BFD will fail…
>> 
>> The current patch is a hack; it basically allows us to override the 
>> delay function so we can de-schedule it,
>> do some other useful work while waiting for the delay to finish
>> and then give control back to the original function…
>> 
>> Maybe we can fix this by providing a delay callback functionality...
> 
> Yes it could be an idea.
> Please send a RFC patch.

OK, I will ask one of our guys to work on it...


[dpdk-dev] spinlock: Move constructor function out of header file

2016-07-15 Thread Damjan Marion (damarion)

> On 15 Jul 2016, at 10:31, Thomas Monjalon  
> wrote:
> 
> 2016-07-14 22:20, Damjan Marion:
>> 
>>> On 15 Jul 2016, at 00:06, Thomas Monjalon  
>>> wrote:
>>> 
>>> 2016-07-14 18:10, Damjan Marion:
 Dear Jan,
 
Thank you for your comments. A bit too much overhead to submit a simple patch,
so let's forget about it. I will just add it as it is to our private
collection of patches.
>>> 
>>> These are changes trivial to fix when applying.
>>> I strongly prefer that you upstream patches instead of keeping patches
>>> in the VPP repository. I will help you in this task.
>>> Thanks for the effort.
>> 
>> Yeah, I agree with you; unfortunately it is all about time, and
>> for me it is still cheaper to rebase them.
>> 
>> I respect your rules, but I just don't have enough free cycles
>> to spend learning all of them for my occasional patch submissions.
> 
> We know there is a learning curve for the submission process.
> That's why we do not expect it to be fully satisfied, especially
> from occasional contributors.
> I am used to do some not significant changes before applying to
> help newcomers patches being accepted. That's what I will do in
> your case.
> As I said previously I will help you to drop your local patches.
> 
> Please continue sending other patches with a detailed explanation
> and we will take care of them.

Ok, Thanks!

So we don't have much pending besides 2 patches for i40e which 
Jeff submitted yesterday, and they will, I guess, need to wait for 16.11.

Only one which I have on my mind is:

https://git.fd.io/cgit/vpp/tree/dpdk/dpdk-16.04_patches/0005-Allow-applications-to-override-rte_delay_us.patch

This is a big issue for us when running single-core, as some
drivers tend to call rte_delay_us for a long time, and that is 
causing packet drops. I.e. if you do stop/start on one interface
and you are running BFD on another one, BFD will fail…

The current patch is a hack; it basically allows us to override the 
delay function so we can de-schedule it,
do some other useful work while waiting for the delay to finish
and then give control back to the original function…

Maybe we can fix this by providing a delay callback functionality...

Thanks,

Damjan
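A callback-based variant of that override might be sketched like this. The function names are invented for illustration, not the actual DPDK interface; DPDK did later add rte_delay_us_callback_register() in this spirit.

```c
#include <assert.h>
#include <stddef.h>

/* Pluggable microsecond delay: the default busy-waits; an application
 * can install its own handler that de-schedules or does useful work
 * while the delay elapses. */
typedef void (*delay_us_fn)(unsigned int us);

static void
delay_us_default(unsigned int us)
{
	/* Stand-in for the default busy-wait loop. */
	volatile unsigned int spin = us * 100;
	while (spin--)
		;
}

static delay_us_fn delay_us_cb = delay_us_default;

/* Passing NULL restores the default implementation. */
static void
delay_us_callback_register(delay_us_fn fn)
{
	delay_us_cb = (fn != NULL) ? fn : delay_us_default;
}

static void
app_delay_us(unsigned int us)
{
	delay_us_cb(us);	/* every delay goes through the callback */
}
```

A single-core scheduler would register a callback that yields to other work instead of spinning.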




[dpdk-dev] spinlock: Move constructor function out of header file

2016-07-14 Thread Damjan Marion (damarion)

> On 15 Jul 2016, at 00:06, Thomas Monjalon  
> wrote:
> 
> 2016-07-14 18:10, Damjan Marion:
>> Dear Jan,
>> 
>> Thank you for your comments. A bit too much overhead to submit simple patch
>> so let's forget about it. I will just add it as it is to our private
>> collection of patches.
> 
> These are changes trivial to fix when applying.
> I strongly prefer that you upstream patches instead of keeping patches
> in the VPP repository. I will help you in this task.
> Thanks for the effort.

Yeah, I agree with you; unfortunately it is all about time,
and for me it is still cheaper to rebase them.

I respect your rules, but I just don't have enough free cycles
to spend learning all of them, for my occasional patch submission.


[dpdk-dev] spinlock: Move constructor function out of header file

2016-07-14 Thread Damjan Marion (damarion)

Dear Jan,

Thank you for your comments. A bit too much overhead to submit simple patch
so let's forget about it. I will just add it as it is to our private
collection of patches.

If anybody wants to pick it from here, please do...

Thanks,

Damjan


> On 14 Jul 2016, at 20:03, Jan Viktorin  wrote:
> 
> Hello Damjan,
> 
> thank you for the patch. It makes sense to me.
> 
> Next time, please CC the appropriate maintainers.
> (See the MAINTAINERS file in the root of the DPDK source tree.)
> 
> In the subject after "spinlock:" you should start with a lower case
> letter, so "move constructor..."
> 
> On Thu, 14 Jul 2016 15:27:29 +0200
> damarion at cisco.com wrote:
> 
>> From: Damjan Marion 
>> 
>> Having constructor function in the header gile is generaly
> 
> I'd write:
> 
> Having constructor functions in header files is generally a bad idea.
> 
> Anyway:
> 
> s/gile/file/
> 
>> bad idea, as it will eventually be implanted to 3rd party
>> library.
>> 
>> In this case it is causing linking issues with 3rd party
> 
> it causes linking issues
> 
>> libraries when application is not linked to dpdk, due to missing
> 
> an application
> 
> to dpdk due to a missing symbol (no comma)
> 
>> symbol called by constructor.
> 
> Please include the following line:
> 
> Fixes: ba7468997ea6 ("spinlock: add HTM lock elision for x86")
> 
>> 
>> Signed-off-by: Damjan Marion 
>> 
>> ---
>> lib/librte_eal/common/arch/x86/rte_spinlock.c  | 45 
>> ++
>> .../common/include/arch/x86/rte_spinlock.h | 13 ++-
>> lib/librte_eal/linuxapp/eal/Makefile   |  1 +
>> 3 files changed, 49 insertions(+), 10 deletions(-)
>> create mode 100644 lib/librte_eal/common/arch/x86/rte_spinlock.c
>> 
>> diff --git a/lib/librte_eal/common/arch/x86/rte_spinlock.c 
>> b/lib/librte_eal/common/arch/x86/rte_spinlock.c
>> new file mode 100644
>> index 000..ad8cc5a
>> --- /dev/null
>> +++ b/lib/librte_eal/common/arch/x86/rte_spinlock.c
>> @@ -0,0 +1,45 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + * * Redistributions of source code must retain the above copyright
>> + *   notice, this list of conditions and the following disclaimer.
>> + * * Redistributions in binary form must reproduce the above copyright
>> + *   notice, this list of conditions and the following disclaimer in
>> + *   the documentation and/or other materials provided with the
>> + *   distribution.
>> + * * Neither the name of Intel Corporation nor the names of its
>> + *   contributors may be used to endorse or promote products derived
>> + *   from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +
>> +#include "rte_cpuflags.h"
>> +
>> +#include 
> 
> According to:
> http://dpdk.org/doc/guides-16.04/contributing/coding_style.html#coding-style
> 
> you should change the order of these includes.
> 
>> +
>> +uint8_t rte_rtm_supported; /* cache the flag to avoid the overhead
>> +  of the rte_cpu_get_flag_enabled function */
> 
> The comment should be placed before the declaration or use the /**< */
> Doxygen style. I'd prefer to placed it before. Can you fix it with this
> patch?
> 
>> +
>> +static void __attribute__((constructor))
>> +rte_rtm_init(void)
>> +{
>> +rte_rtm_supported = rte_cpu_get_flag_enabled(RTE_CPUFLAG_RTM);
>> +}
>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_spinlock.h 
>> b/lib/librte_eal/common/include/arch/x86/rte_spinlock.h
>> index 02f95cb..8e630c2 100644
>> --- a/lib/librte_eal/common/include/arch/x86/rte_spinlock.h
>> +++ b/lib/librte_eal/common/include/arch/x86/rte_spinlock.h
>> @@ -94,24 +94,17 @@ rte_spinlock_trylock (rte_spinlock_t *sl)
>> }
>> #endif
>> 
>> -static uint8_t rtm_supported; /* cache the flag to avoid the overhead
>> - of the rte_cpu_get_flag_enabled func

[dpdk-dev] 16.07-rc2 issue with rte_rtm_init(void) constructor

2016-07-14 Thread Damjan Marion (damarion)

> On 14 Jul 2016, at 13:30, Thomas Monjalon  
> wrote:
> 
> 2016-07-14 09:36, Damjan Marion:
 
 linking fails with:
 dpdk/include/rte_spinlock.h:103: undefined reference to 
 `rte_cpu_get_flag_enabled'
 
 Is there any chance that this one is moved to some .c file, so it is loaded
 only when it is really needed?
>>> 
>>> Yes it could be moved to lib/librte_eal/common/arch/x86/.
>> 
>> Any chance to get this in 16.07 ?
> 
> Yes maybe if you submit a patch quickly and it is clean enough.

OK, I already have a working patch [1], but it is possibly not clean enough.
Basically I moved rte_rtm_init to lib/librte_eal/common/arch/x86/rte_cpuflags.c

Is this the right place?

Should we name 'rtm_supported' differently, knowing that it is not static
anymore?

Thanks,

Damjan


[1] 
https://github.com/vpp-dev/vpp/blob/dpdk-16.07-rc2/dpdk/dpdk-16.07-rc2_patches/0001-Fix-linking-issue-due-to-constructor-in-header-file.patch




[dpdk-dev] 16.07-rc2 issue with rte_rtm_init(void) constructor

2016-07-14 Thread Damjan Marion (damarion)

> On 14 Jul 2016, at 10:20, Thomas Monjalon  
> wrote:
> 
> 2016-07-13 22:58, Damjan Marion:
>> I have issues with linking application to 16.07-rc2.
>> 
>> Looks like reason is constructor function in include file,
>> so our unit test apps are failing to link as they are not linked with dpdk 
>> libs.
>> (and they should not be as they are not calling any dpdk function).
> 
> I don't understand:
> Why are you linking DPDK if you do not use any DPDK function?

If i simplify it, i have 4 components:

1. libdpdk
2. libXXX
3. app-A
4. app-B

libXXX includes some dpdk headers, mainly because of data structures like 
rte_mbuf, and some functions are calling
functions from libdpdk.

app-A links against libdpdk and libXXX
app-B links against libXXX only, and only uses functions from libXXX which
don't have a dpdk dependency.

This is working fine in 16.04. In 16.07 rte_rtm_init() implants itself into
libXXX, as it is defined
in a header file, and that causes linking to fail due to the missing
rte_cpu_get_flag_enabled().

> 
>> static inline void __attribute__((constructor))
>> rte_rtm_init(void)
>> {
>>rtm_supported = rte_cpu_get_flag_enabled(RTE_CPUFLAG_RTM);
>> }
>> 
>> linking fails with:
>> dpdk/include/rte_spinlock.h:103: undefined reference to 
>> `rte_cpu_get_flag_enabled'
>> 
>> Is there any chance that this one is moved to some .c file, so it is loaded
>> only when it is really needed?
> 
> Yes it could be moved to lib/librte_eal/common/arch/x86/.


Any chance to get this in 16.07 ?

Thanks!



[dpdk-dev] 16.07-rc2 issue with rte_rtm_init(void) constructor

2016-07-13 Thread Damjan Marion (damarion)

Folks,

I have issues with linking application to 16.07-rc2.

Looks like the reason is a constructor function in an include file,
so our unit test apps are failing to link as they are not linked with dpdk libs
(and they should not be, as they are not calling any dpdk function).


static inline void __attribute__((constructor))
rte_rtm_init(void)
{
rtm_supported = rte_cpu_get_flag_enabled(RTE_CPUFLAG_RTM);
}

linking fails with:
dpdk/include/rte_spinlock.h:103: undefined reference to 
`rte_cpu_get_flag_enabled'

Is there any chance that this one is moved to some .c file, so it is loaded
only when it is really needed?

Thanks,

Damjan




[dpdk-dev] weak functions in some drivers

2016-06-21 Thread Damjan Marion (damarion)


> On 21 Jun 2016, at 09:01, Ferruh Yigit  wrote:
> 
> Hi Damjan,
> 
> On 6/21/2016 4:01 PM, Damjan Marion (damarion) wrote:
>> 
>> Hello,
>> 
>> We just spent few hours troubleshooting why vPMD is not working
>> in i40e driver. Conclusion was that problem is caused by linker 
>> linking the wrong instance of the i40e_rx_vec_dev_conf_condition_check(...).
>> 
>> That function is defined 2 times, once in i40e_rxtx.c and once in 
>> i40e_rxtx_vec.c. First one is defined as weak and it just returns -1.
>> 
>> librte_pmd_i40e.a contains both versions:
>> 
>> $ objdump -x librte_pmd_i40e.a| grep i40e_rx_vec_dev_conf_condition_check
>> 6ca0  wF .text   0006 
>> i40e_rx_vec_dev_conf_condition_check
>> 07c1 g F .text.unlikely  001c 
>> i40e_rx_vec_dev_conf_condition_check
>> 
>> However when we are linking our app, linker was picking 1st (weak) one and 
>> vPMD init was failing.
> App is linking with correct instance of the function for me, how can I
> reproduce this linkage issue?
> 
>> 
>> Workaround we applied to get it working:  -Wl,--whole-archive
>> -Wl,--no-whole-archive
> These flags already wraps PMD libraries, can you please give more detail
> about workaround?
> 

You can see our exact fix here:

https://git.fd.io/cgit/vpp/commit/?id=0977e4baabd97d1de711a3d7a0f285364a84159c

If you want to repro just pick the commit before this one...

Thanks,

Damjan




[dpdk-dev] weak functions in some drivers

2016-06-21 Thread Damjan Marion (damarion)

Hello,

We just spent a few hours troubleshooting why vPMD is not working
in the i40e driver. The conclusion was that the problem is caused by the
linker picking the wrong instance of i40e_rx_vec_dev_conf_condition_check(...).

That function is defined twice, once in i40e_rxtx.c and once in
i40e_rxtx_vec.c. The first one is defined as weak and just returns -1.

librte_pmd_i40e.a contains both versions:

$ objdump -x librte_pmd_i40e.a| grep i40e_rx_vec_dev_conf_condition_check
6ca0  wF .text  0006 
i40e_rx_vec_dev_conf_condition_check
07c1 g F .text.unlikely 001c 
i40e_rx_vec_dev_conf_condition_check

However, when we are linking our app, the linker was picking the 1st (weak) one
and vPMD init was failing.

The workaround we applied to get it working:  -Wl,--whole-archive
-Wl,--no-whole-archive

What is not clear to me is the motivation for using weak here instead of simply
using the CONFIG_RTE_I40E_INC_VECTOR
macro to exclude the stubs in i40e_rxtx.c. That would make the library smaller
and avoid issues like this one,
which are quite hard to troubleshoot.

BTW Looks like same issue is happening with fm10k driver.

Thanks,

Damjan





[dpdk-dev] rte_mbuf.next in 2nd cacheline

2015-06-17 Thread Damjan Marion (damarion)

> On 17 Jun 2015, at 16:06, Bruce Richardson  
> wrote:
> 
> On Wed, Jun 17, 2015 at 01:55:57PM +0000, Damjan Marion (damarion) wrote:
>> 
>>> On 15 Jun 2015, at 16:12, Bruce Richardson  
>>> wrote:
>>> 
>>> The next pointers always start out as NULL when the mbuf pool is created. 
>>> The
>>> only time it is set to non-NULL is when we have chained mbufs. If we never 
>>> have
>>> any chained mbufs, we never need to touch the next field, or even read it - 
>>> since
>>> we have the num-segments count in the first cache line. If we do have a 
>>> multi-segment
>>> mbuf, it's likely to be a big packet, so we have more processing time 
>>> available
>>> and we can then take the hit of setting the next pointer.
>> 
>> There are applications which are not using rx offload, but they deal with 
>> chained mbufs.
>> Why they are less important than ones using rx offload? This is something 
>> people 
>> should be able to configure on build time.
> 
> It's not that they are less important, it's that the packet processing cycle 
> count
> budget is going to be greater. A packet which is 64 bytes, or 128 bytes in 
> size
> can make use of a number of RX offloads to reduce its processing time. 
> However,
> a 64/128 packet is not going to be split across multiple buffers [unless we
> are dealing with a very unusual setup!].
> 
> To handle 64 byte packets at 40G line rate, one has 50 cycles per core per 
> packet
> when running at 3GHz. [3000 Mcycles / 59.5 mpps].
> If we assume that we are dealing with fairly small buffers
> here, and that anything greater than 1k packets are chained, we still have 626
> cycles per 3GHz core per packet to work with for that 1k packet. Given that
> "normal" DPDK buffers are 2k in size, we have over a thousand cycles per 
> packet
> for any packet that is split. 
> 
> In summary, packets spread across multiple buffers are large packets, and so 
> have
> larger packet cycle count budgets and so can much better absorb the cost of
> touching a second cache line in the mbuf than a 64-byte packet can. Therefore,
> we optimize for the 64B packet case.

This makes sense if there is no other work to do on the same core.
Otherwise it is better to spend those cycles on actual work instead of waiting
for the
2nd cache line...

Thanks,

Damjan


[dpdk-dev] rte_mbuf.next in 2nd cacheline

2015-06-17 Thread Damjan Marion (damarion)

> On 15 Jun 2015, at 16:12, Bruce Richardson  
> wrote:
> 
> The next pointers always start out as NULL when the mbuf pool is created. The
> only time it is set to non-NULL is when we have chained mbufs. If we never 
> have
> any chained mbufs, we never need to touch the next field, or even read it - 
> since
> we have the num-segments count in the first cache line. If we do have a 
> multi-segment
> mbuf, it's likely to be a big packet, so we have more processing time 
> available
> and we can then take the hit of setting the next pointer.

There are applications which are not using rx offload, but they do deal with
chained mbufs.
Why are they less important than the ones using rx offload? This is something
people
should be able to configure at build time.
That should not be too hard to achieve with a set of macros. I can come up with
the patch...

Thanks,

Damjan


[dpdk-dev] rte_mbuf.next in 2nd cacheline

2015-06-10 Thread Damjan Marion (damarion)

Hi,

We noticed a 7% performance improvement by simply moving the rte_mbuf.next
field to the 1st cache line.

Currently, it falls under /* second cache line - fields only used in slow path
or on TX */,
but it is actually used in several places in the rx fast path (e.g.
i40e_rx_alloc_bufs() sets that field to NULL).

Is there anything we can do here (stop using the next field, or move it to the
1st cache line)?

Thanks,

Damjan






[dpdk-dev] mmap fails with more than 40000 hugepages

2015-02-06 Thread Damjan Marion (damarion)

> On 06 Feb 2015, at 11:26, De Lara Guarch, Pablo  intel.com> wrote:
> 
> 
> 
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Linhaifeng
>> Sent: Friday, February 06, 2015 2:04 AM
>> To: Damjan Marion (damarion); dev at dpdk.org
>> Subject: Re: [dpdk-dev] mmap fails with more than 40000 hugepages
>> 
>> 
>> 
>> On 2015/2/5 20:00, Damjan Marion (damarion) wrote:
>>> Hi,
>>> 
>>> I have system with 2 NUMA nodes and 256G RAM total. I noticed that DPDK
>> crashes in rte_eal_init()
>>> when the number of available hugepages is around 40000 or above.
>>> Everything works fine with lower values (i.e. 30000).
>>> 
>>> I also tried with allocating 40000 on node0 and 0 on node1, same crash
>> happens.
>>> 
>>> 
>>> Any idea what might be causing this?
>>> 
>>> Thanks,
>>> 
>>> Damjan
>>> 
>> 
>> cat /proc/sys/vm/max_map_count
> 
> I just checked on my board, and having 40k hugepages, you need that value to 
> be over the double of that (a value of 81k should work).
> 
> Let us know if that fixed the problem!

Yes, it works now. Thanks!



[dpdk-dev] mmap fails with more than 40000 hugepages

2015-02-05 Thread Damjan Marion (damarion)

> On 05 Feb 2015, at 15:59, Neil Horman  wrote:
> 
> On Thu, Feb 05, 2015 at 01:20:01PM +0000, Damjan Marion (damarion) wrote:
>> 
>>> On 05 Feb 2015, at 13:59, Neil Horman  wrote:
>>> 
>>> On Thu, Feb 05, 2015 at 12:00:48PM +, Damjan Marion (damarion) wrote:
>>>> Hi,
>>>> 
>>>> I have system with 2 NUMA nodes and 256G RAM total. I noticed that DPDK 
>>>> crashes in rte_eal_init()
>>>> when number of available hugepages is around 40000 or above.
>>>> Everything works fine with lower values (i.e. 30000).
>>>> 
>>>> I also tried with allocating 40000 on node0 and 0 on node1, same crash 
>>>> happens.
>>>> 
>>>> 
>>>> Any idea what might be causing this?
>>>> 
>>>> Thanks,
>>>> 
>>>> Damjan
>>>> 
>>>> 
>>>> $ cat 
>>>> /sys/devices/system/node/node[01]/hugepages/hugepages-2048kB/nr_hugepages
>>>> 20000
>>>> 20000
>>>> 
>>>> $ grep -i huge /proc/meminfo
>>>> AnonHugePages:706560 kB
>>>> HugePages_Total:   40000
>>>> HugePages_Free:    40000
>>>> HugePages_Rsvd:        0
>>>> HugePages_Surp:        0
>>>> Hugepagesize:   2048 kB
>>>> 
>>> What's your shmmax value set to? 40000 2MB hugepages is way above the default
>>> setting for how much shared ram a system will allow.  I've not done the 
>>> math on
>>> your logs below, but judging by the size of some of the mapped segments, I'm
>>> betting you're hitting the default limit of 4GB.
>> 
>> $ cat /proc/sys/kernel/shmmax
>> 33554432
>> 
>> $ sysctl -w kernel.shmmax=8589934592
>> kernel.shmmax = 8589934592
>> 
>> same crash :(
>> 
>> Thanks,
>> 
>> Damjan
> 
> What about the shmmni and shmmax values.  The shmmax value will also need to 
> be
> set to at least 80G (more if you have other shared memory needs), and shmmni
> will need to be larger than 40,000 to handle all the segments you're creating.
> Neil

Hmm, if that is the reason, then 30000 will also not work, as both values are
set by default to much, much lower values.

Anyway, I tried:

$ sysctl kernel.shmmni
kernel.shmmni = 10

$ sysctl kernel.shmmax
kernel.shmmax = 274877906944

$ cat /sys/devices/system/node/node[01]/hugepages/hugepages-2048kB/nr_hugepages
20000
20000

$ sudo ~/src/dpdk/x86_64-native-linuxapp-gcc/app/testpmd -l 5-7 -n 3 
--socket-mem 512,512
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 5 on socket 0
EAL: Detected lcore 6 as core 6 on socket 0
EAL: Detected lcore 7 as core 7 on socket 0
EAL: Detected lcore 8 as core 0 on socket 1
EAL: Detected lcore 9 as core 1 on socket 1
EAL: Detected lcore 10 as core 2 on socket 1
EAL: Detected lcore 11 as core 3 on socket 1
EAL: Detected lcore 12 as core 4 on socket 1
EAL: Detected lcore 13 as core 5 on socket 1
EAL: Detected lcore 14 as core 6 on socket 1
EAL: Detected lcore 15 as core 7 on socket 1
EAL: Detected lcore 16 as core 0 on socket 0
EAL: Detected lcore 17 as core 1 on socket 0
EAL: Detected lcore 18 as core 2 on socket 0
EAL: Detected lcore 19 as core 3 on socket 0
EAL: Detected lcore 20 as core 4 on socket 0
EAL: Detected lcore 21 as core 5 on socket 0
EAL: Detected lcore 22 as core 6 on socket 0
EAL: Detected lcore 23 as core 7 on socket 0
EAL: Detected lcore 24 as core 0 on socket 1
EAL: Detected lcore 25 as core 1 on socket 1
EAL: Detected lcore 26 as core 2 on socket 1
EAL: Detected lcore 27 as core 3 on socket 1
EAL: Detected lcore 28 as core 4 on socket 1
EAL: Detected lcore 29 as core 5 on socket 1
EAL: Detected lcore 30 as core 6 on socket 1
EAL: Detected lcore 31 as core 7 on socket 1
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 32 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up memory...
EAL: Ask a virtual area of 0x8000 bytes
EAL: Virtual area found at 0x7fe28000 (size = 0x8000)
EAL: Ask a virtual area of 0x8000 bytes
EAL: Virtual area found at 0x7fe1c000 (size = 0x8000)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fe335c0 (size = 0x20)
EAL: Ask a virtual area of 0x940 bytes
EAL: Virtual area found at 0x7fe32c60 (size = 0x940)
EAL: Ask a virtual area of 0x5c0 bytes
EAL: Virtual area found at 0x7fe32680 (size = 0x5c0)
EAL: Ask a virtual area of 0x38e0 bytes
EAL: Virtual area found at 0x7fe24700 (size = 0x38e0)
EA

[dpdk-dev] mmap fails with more than 40000 hugepages

2015-02-05 Thread Damjan Marion (damarion)

On 05 Feb 2015, at 14:22, Jay Rolette  wrote:

On Thu, Feb 5, 2015 at 6:00 AM, Damjan Marion (damarion)  wrote:
Hi,

I have system with 2 NUMA nodes and 256G RAM total. I noticed that DPDK crashes 
in rte_eal_init()
when the number of available hugepages is around 40000 or above.
Everything works fine with lower values (i.e. 30000).

I also tried with allocating 40000 on node0 and 0 on node1, same crash happens.


Any idea what might be causing this?

Any reason you can't switch to using 1GB hugepages? You'll get better 
performance and your init time will be shorter. The systems we run on are 
similar (256GB, 2 NUMA nodes) and that works fine for us.

Yes, unfortunately some other consumers are needing smaller pages


Not directly related, but if you have to stick with 2MB hugepages, you might 
want to take a look at a patch I submitted that fixes the O(n^2) algorithm used 
in initializing hugepages.

I tried it hoping that it would change something, no luck...

Thanks,

Damjan



[dpdk-dev] mmap fails with more than 40000 hugepages

2015-02-05 Thread Damjan Marion (damarion)

> On 05 Feb 2015, at 13:59, Neil Horman  wrote:
> 
> On Thu, Feb 05, 2015 at 12:00:48PM +0000, Damjan Marion (damarion) wrote:
>> Hi,
>> 
>> I have system with 2 NUMA nodes and 256G RAM total. I noticed that DPDK 
>> crashes in rte_eal_init()
>> when number of available hugepages is around 40000 or above.
>> Everything works fine with lower values (i.e. 30000).
>> 
>> I also tried with allocating 40000 on node0 and 0 on node1, same crash 
>> happens.
>> 
>> 
>> Any idea what might be causing this?
>> 
>> Thanks,
>> 
>> Damjan
>> 
>> 
>> $ cat 
>> /sys/devices/system/node/node[01]/hugepages/hugepages-2048kB/nr_hugepages
>> 20000
>> 20000
>> 
>> $ grep -i huge /proc/meminfo
>> AnonHugePages:706560 kB
>> HugePages_Total:   40000
>> HugePages_Free:    40000
>> HugePages_Rsvd:0
>> HugePages_Surp:0
>> Hugepagesize:   2048 kB
>> 
> What's your shmmax value set to? 40000 2MB hugepages is way above the default
> setting for how much shared ram a system will allow.  I've not done the math 
> on
> your logs below, but judging by the size of some of the mapped segments, I'm
> betting you're hitting the default limit of 4GB.

$ cat /proc/sys/kernel/shmmax
33554432

$ sysctl -w kernel.shmmax=8589934592
kernel.shmmax = 8589934592

same crash :(

Thanks,

Damjan


[dpdk-dev] mmap fails with more than 40000 hugepages

2015-02-05 Thread Damjan Marion (damarion)
Hi,

I have a system with 2 NUMA nodes and 256G RAM total. I noticed that DPDK
crashes in rte_eal_init()
when the number of available hugepages is around 40000 or above.
Everything works fine with lower values (i.e. 30000).

I also tried with allocating 40000 on node0 and 0 on node1, same crash happens.


Any idea what might be causing this?

Thanks,

Damjan


$ cat /sys/devices/system/node/node[01]/hugepages/hugepages-2048kB/nr_hugepages
20000
20000

$ grep -i huge /proc/meminfo
AnonHugePages:706560 kB
HugePages_Total:   40000
HugePages_Free:    40000
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB


$ sudo ~/src/dpdk/x86_64-native-linuxapp-gcc/app/testpmd -l 5-7 -n 3 
--socket-mem 512,512
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 5 on socket 0
EAL: Detected lcore 6 as core 6 on socket 0
EAL: Detected lcore 7 as core 7 on socket 0
EAL: Detected lcore 8 as core 0 on socket 1
EAL: Detected lcore 9 as core 1 on socket 1
EAL: Detected lcore 10 as core 2 on socket 1
EAL: Detected lcore 11 as core 3 on socket 1
EAL: Detected lcore 12 as core 4 on socket 1
EAL: Detected lcore 13 as core 5 on socket 1
EAL: Detected lcore 14 as core 6 on socket 1
EAL: Detected lcore 15 as core 7 on socket 1
EAL: Detected lcore 16 as core 0 on socket 0
EAL: Detected lcore 17 as core 1 on socket 0
EAL: Detected lcore 18 as core 2 on socket 0
EAL: Detected lcore 19 as core 3 on socket 0
EAL: Detected lcore 20 as core 4 on socket 0
EAL: Detected lcore 21 as core 5 on socket 0
EAL: Detected lcore 22 as core 6 on socket 0
EAL: Detected lcore 23 as core 7 on socket 0
EAL: Detected lcore 24 as core 0 on socket 1
EAL: Detected lcore 25 as core 1 on socket 1
EAL: Detected lcore 26 as core 2 on socket 1
EAL: Detected lcore 27 as core 3 on socket 1
EAL: Detected lcore 28 as core 4 on socket 1
EAL: Detected lcore 29 as core 5 on socket 1
EAL: Detected lcore 30 as core 6 on socket 1
EAL: Detected lcore 31 as core 7 on socket 1
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 32 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up memory...
EAL: Ask a virtual area of 0x80 bytes
EAL: Virtual area found at 0x7fae2a20 (size = 0x80)
EAL: Ask a virtual area of 0x760 bytes
EAL: Virtual area found at 0x7fae22a0 (size = 0x760)
EAL: Ask a virtual area of 0x140 bytes
EAL: Virtual area found at 0x7fae2140 (size = 0x140)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fae2100 (size = 0x20)
EAL: Ask a virtual area of 0x40 bytes
EAL: Virtual area found at 0x7fae20a0 (size = 0x40)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fae2060 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fae2020 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fae1fe0 (size = 0x20)
EAL: Ask a virtual area of 0x580 bytes
EAL: Virtual area found at 0x7fae1a40 (size = 0x580)
EAL: Ask a virtual area of 0x3b20 bytes
EAL: Virtual area found at 0x7faddf00 (size = 0x3b20)
EAL: Ask a virtual area of 0x40 bytes
EAL: Virtual area found at 0x7faddea0 (size = 0x40)
EAL: Ask a virtual area of 0x7c0 bytes
EAL: Virtual area found at 0x7fadd6c0 (size = 0x7c0)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fadd680 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fadd640 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fadd600 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fadd5c0 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fadd580 (size = 0x20)
EAL: Ask a virtual area of 0x980 bytes
EAL: Virtual area found at 0x7fadcbe0 (size = 0x980)
EAL: Ask a virtual area of 0x1900 bytes
EAL: Virtual area found at 0x7fadb2c0 (size = 0x1900)
EAL: Ask a virtual area of 0x2440 bytes
EAL: Virtual area found at 0x7fad8e60 (size = 0x2440)
EAL: Ask a virtual area of 0xc80 bytes
EAL: Virtual area found at 0x7fad81c0 (size = 0xc80)
EAL: Ask a virtual area of 0x3200 bytes
EAL: Virtual area found at 0x7fad4fa0 (size = 0x3200)
EAL: Ask a virtual area of 0x3db80 bytes
EAL: Virtual area found at 0x7fa97400 (size = 0x3db80)
EAL: Ask a virtual area of 0xfa00 bytes
EAL: Virtual area found at 0x7fa879e0 (size = 0xfa00)
EAL: Ask a virtual area of 0x16840 bytes
EAL: Virtual area found at 0x7fa71180 (size = 0x16840)
EAL: Ask a virtual area of 0x7d00 bytes
EAL: Virtual area found at 0x7fa6946

[dpdk-dev] [PATCH] virtio: fix crash if VIRTIO_NET_F_CTRL_VQ is not negotiated

2014-09-29 Thread Damjan Marion (damarion)


On 17 Sep 2014, at 09:32, Olivier MATZ  wrote:

> Hello,
> 
> On 09/12/2014 12:25 AM, damarion at cisco.com wrote:
>> From: Damjan Marion 
>> 
>> If VIRTIO_NET_F_CTRL_VQ is not negotiated hw->cvq will be NULL
>> 
>> Signed-off-by: Damjan Marion 
>> ---
>>  lib/librte_pmd_virtio/virtio_rxtx.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>> 
> 
> Acked-by: Olivier Matz 
> 

Is this going to be applied, or is any action pending on my side?

Thanks,

Damjan