[dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure

2016-07-31 Thread Vlad Zolotarov


On 07/31/2016 10:46 AM, Vlad Zolotarov wrote:
>
>
> On 07/20/2016 05:24 PM, Tomasz Kulasek wrote:
>> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
>> changes in rte_eth_dev and rte_eth_desc_lim structures.
>>
>> In 16.11, we plan to introduce the rte_eth_tx_prep() function to do the
>> necessary preparation of a packet burst so it can be safely transmitted
>> by the device with the desired HW offloads (setting/resetting checksum
>> fields according to the hardware requirements) and to check HW
>> constraints (number of segments per packet, etc).
>>
>> Since the limitations and requirements may differ between devices, this
>> requires extending the rte_eth_dev structure with a new function pointer,
>> "tx_pkt_prep", which can be implemented in the driver to prepare and
>> verify packets in a device-specific way before the burst, preventing the
>> application from sending malformed packets.
>>
>> Also, new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
>> nb_mtu_seg_max, providing information about the maximum number of
>> segments in TSO and non-TSO packets acceptable to the device.
>>
>> Signed-off-by: Tomasz Kulasek 
>
> Acked-by: Vlad Zolotarov 

One small comment, however.
Although this function is a must, we need a way to clearly identify
which clusters in the burst are malformed, since dropping the whole
burst is usually not an option, and sending the malformed packets anyway
may cause a HW hang, so that is not an option either.
Another thing - I've pulled the current master and I couldn't find a
way for an application to query the mentioned Tx offload HW limitations,
e.g. the maximum number of segments.
Knowing these limits would avoid unnecessary linearization.

thanks,
vlad

>
>> ---
>>   doc/guides/rel_notes/deprecation.rst |7 +++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst 
>> b/doc/guides/rel_notes/deprecation.rst
>> index f502f86..485aacb 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -41,3 +41,10 @@ Deprecation Notices
>>    * The mempool functions for single/multi producer/consumer are deprecated
>>      and will be removed in 16.11.
>>      It is replaced by rte_mempool_generic_get/put functions.
>> +
>> +* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
>> +  extended with a new function pointer ``tx_pkt_prep`` allowing verification
>> +  and processing of a packet burst to meet HW-specific requirements before
>> +  transmit. Also, new fields will be added to the ``rte_eth_desc_lim``
>> +  structure: ``nb_seg_max`` and ``nb_mtu_seg_max``, providing information
>> +  about the number of segments the device can transmit in TSO and non-TSO
>> +  packets.
>



[dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure

2016-07-31 Thread Vlad Zolotarov


On 07/22/2016 01:48 AM, Ananyev, Konstantin wrote:
>
>> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
>> changes in rte_eth_dev and rte_eth_desc_lim structures.
>>
>> As discussed in that thread:
>>
>> http://dpdk.org/ml/archives/dev/2015-September/023603.html
>>
>> Different NIC models depending on HW offload requested might impose
>> different requirements on packets to be TX-ed in terms of:
>>
>>   - Max number of fragments per packet allowed
>>   - Max number of fragments per TSO segments
>>   - The way pseudo-header checksum should be pre-calculated
>>   - L3/L4 header fields filling
>>   - etc.
>>
>>
>> MOTIVATION:
>> ---
>>
>> 1) Some work cannot (and should not) be done in rte_eth_tx_burst.
>> However, this work is sometimes required, and today it is left to
>> the application.
>>
>> 2) Different hardware may have different requirements for TX offloads;
>> a different subset may be supported, and so on.
>>
>> 3) Some parameters (e.g. the number of segments in the ixgbe driver) may
>> hang the device. These parameters may vary between devices.
>>
>> For example, i40e HW allows 8 fragments per packet, but that is after
>> TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.
>>
>> 4) Fields in the packet may require different initialization (e.g.
>> pseudo-header checksum precalculation, sometimes done differently
>> depending on the packet type, and so on). Currently the application
>> needs to take care of this.
>>
>> 5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst
>> lets the application prepare the packet burst in a form acceptable to
>> the specific device.
>>
>> 6) Some additional checks may be done in debug mode keeping tx_burst
>> implementation clean.
>>
>>
>> PROPOSAL:
>> -
>>
>> To help the user deal with all these variations, we propose to:
>>
>> 1. Introduce the rte_eth_tx_prep() function to do the necessary
>> preparation of a packet burst so it can be safely transmitted by the
>> device with the desired HW offloads (setting/resetting checksum fields
>> according to the hardware requirements) and to check HW constraints
>> (number of segments per packet, etc).
>>
>> Since the limitations and requirements may differ between devices, this
>> requires extending the rte_eth_dev structure with a new function pointer,
>> "tx_pkt_prep", which can be implemented in the driver to prepare and
>> verify packets in a device-specific way before the burst, preventing the
>> application from sending malformed packets.
>>
>> 2. Also, new fields will be introduced in rte_eth_desc_lim:
>> nb_seg_max and nb_mtu_seg_max, providing information about the maximum
>> number of segments in TSO and non-TSO packets acceptable to the device.
>>
>> This information helps the application avoid creating malformed
>> packets.
>>
>>
>> APPLICATION (USE CASE):
>> --
>>
>> 1) The application initializes the burst of packets to send and sets
>> the required TX offload flags and fields, such as l2_len, l3_len,
>> l4_len, and tso_segsz.
>>
>> 2) The application passes the burst to rte_eth_tx_prep to check the
>> conditions required to send the packets through the NIC.
>>
>> 3) The result of rte_eth_tx_prep can be used to send the valid packets
>> and/or to restore the invalid ones if the function fails.
>>
>> eg.
>>
>>  for (i = 0; i < nb_pkts; i++) {
>>
>>  /* initialize or process packet */
>>
>>  bufs[i]->tso_segsz = 800;
>>  bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
>>  | PKT_TX_IP_CKSUM;
>>  bufs[i]->l2_len = sizeof(struct ether_hdr);
>>  bufs[i]->l3_len = sizeof(struct ipv4_hdr);
>>  bufs[i]->l4_len = sizeof(struct tcp_hdr);
>>  }
>>
>>  /* Prepare burst of TX packets */
>>  nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);
>>
>>  if (nb_prep < nb_pkts) {
>>  printf("tx_prep failed\n");
>>
>>  /* drop or restore invalid packets */
>>
>>  }
>>
>>  /* Send burst of TX packets */
>>  nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);
>>
>>  /* Free any unsent packets. */
>>
>>
>>
>> Signed-off-by: Tomasz Kulasek 

Acked-by: Vlad Zolotarov 

>> ---
>>   do

[dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure

2016-07-31 Thread Vlad Zolotarov


On 07/20/2016 05:24 PM, Tomasz Kulasek wrote:
> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> changes in rte_eth_dev and rte_eth_desc_lim structures.
>
> In 16.11, we plan to introduce the rte_eth_tx_prep() function to do the
> necessary preparation of a packet burst so it can be safely transmitted
> by the device with the desired HW offloads (setting/resetting checksum
> fields according to the hardware requirements) and to check HW
> constraints (number of segments per packet, etc).
>
> Since the limitations and requirements may differ between devices, this
> requires extending the rte_eth_dev structure with a new function pointer,
> "tx_pkt_prep", which can be implemented in the driver to prepare and
> verify packets in a device-specific way before the burst, preventing the
> application from sending malformed packets.
>
> Also, new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
> nb_mtu_seg_max, providing information about the maximum number of
> segments in TSO and non-TSO packets acceptable to the device.
>
> Signed-off-by: Tomasz Kulasek 

Acked-by: Vlad Zolotarov 

> ---
>   doc/guides/rel_notes/deprecation.rst |7 +++
>   1 file changed, 7 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst 
> b/doc/guides/rel_notes/deprecation.rst
> index f502f86..485aacb 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -41,3 +41,10 @@ Deprecation Notices
>    * The mempool functions for single/multi producer/consumer are deprecated
>      and will be removed in 16.11.
>      It is replaced by rte_mempool_generic_get/put functions.
> +
> +* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
> +  extended with a new function pointer ``tx_pkt_prep`` allowing verification
> +  and processing of a packet burst to meet HW-specific requirements before
> +  transmit. Also, new fields will be added to the ``rte_eth_desc_lim``
> +  structure: ``nb_seg_max`` and ``nb_mtu_seg_max``, providing information
> +  about the number of segments the device can transmit in TSO and non-TSO
> +  packets.



[dpdk-dev] API feature check _HAS_

2015-11-29 Thread Vlad Zolotarov


On 11/29/15 11:10, Gleb Natapov wrote:
> On Sun, Nov 29, 2015 at 11:07:44AM +0200, Vlad Zolotarov wrote:
>>
>> On 11/26/15 22:35, Thomas Monjalon wrote:
>>> When introducing LRO, Vlad has defined the macro RTE_ETHDEV_HAS_LRO_SUPPORT:
>>> http://dpdk.org/browse/dpdk/commit/lib/librte_ether/rte_ethdev.h?id=8eecb329
>>>
>>> It allows to use the feature without version check (before the release or
>>> after a backport).
>>> Do you think it is useful?
>>> Should we define other macros RTE_[API]_HAS_[FEATURE] for each new feature
>>> or API change?
>> The main purpose of the above macro was to identify the presence of the
>> new field in rte_eth_rxmode during the period when there was no other
>> way to know it. Once this can be concluded from the release version, I
>> see no reason to keep it.
>>
> Concluding things based on release version does not work so well for
> back ports.

In that case the existing applications won't be able to enjoy the
feature with the older releases that carry the backport - that's true.
Having this flag has its benefits (e.g. the corresponding ifdefs are
much more readable), however to be consistent we'd rather define this
type of flag for other features too, like Thomas wrote above. I'm not
against this approach either...

>
>>> It's time to fix it before releasing the 2.2 version.
> --
>   Gleb.



[dpdk-dev] API feature check _HAS_

2015-11-29 Thread Vlad Zolotarov


On 11/26/15 22:35, Thomas Monjalon wrote:
> When introducing LRO, Vlad has defined the macro RTE_ETHDEV_HAS_LRO_SUPPORT:
> http://dpdk.org/browse/dpdk/commit/lib/librte_ether/rte_ethdev.h?id=8eecb329
>
> It allows to use the feature without version check (before the release or
> after a backport).
> Do you think it is useful?
> Should we define other macros RTE_[API]_HAS_[FEATURE] for each new feature
> or API change?

The main purpose of the above macro was to identify the presence of the
new field in rte_eth_rxmode during the period when there was no other
way to know it. Once this can be concluded from the release version, I
see no reason to keep it.

> It's time to fix it before releasing the 2.2 version.



[dpdk-dev] [PATCH v4] ixgbe_pmd: enforce RS bit on every EOP descriptor for devices newer than 82598

2015-10-27 Thread Vlad Zolotarov


On 10/27/15 21:10, Ananyev, Konstantin wrote:
> Hi lads,
>
>> -Original Message-----
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Tuesday, October 27, 2015 6:48 PM
>> To: Thomas Monjalon; Ananyev, Konstantin; Zhang, Helin
>> Cc: dev at dpdk.org; Kirsher, Jeffrey T; Brandeburg, Jesse
>> Subject: Re: [dpdk-dev] [PATCH v4] ixgbe_pmd: enforce RS bit on every EOP 
>> descriptor for devices newer than 82598
>>
>>
>>
>> On 10/27/15 20:09, Thomas Monjalon wrote:
>>> Any Follow-up to this discussion?
>>> Should we mark this patch as rejected?
>> Hmmm... This patch fixes an obvious spec violation. Why would it be
>> rejected?
> No, I don't think we can reject the patch:
> There is a reproducible TX hang in the ixgbe PMD under the described
> conditions.
> Though, as I explained here:
> http://dpdk.org/ml/archives/dev/2015-September/023574.html
> Vlad's patch would cause quite a big slowdown.
> We are still in the process of getting an answer from the HW guys on
> whether there are any alternatives that would fix the problem and avoid
> the slowdown.

+1

> Konstantin
>
>>> 2015-08-24 11:11, Vlad Zolotarov:
>>>> On 08/20/15 18:37, Vlad Zolotarov wrote:
>>>>> According to 82599 and x540 HW specifications RS bit *must* be
>>>>> set in the last descriptor of *every* packet.
>>>>>
>>>>> Before this patch there were 3 types of Tx callbacks that were setting
>>>>> RS bit every tx_rs_thresh descriptors. This patch introduces a set of
>>>>> new callbacks, one for each type mentioned above, that will set the RS
>>>>> bit in every EOP descriptor.
>>>>>
>>>>> ixgbe_set_tx_function() will set the appropriate Tx callback according
>>>>> to the device family.
>>>> [+Jesse and Jeff]
>>>>
>>>> I've started to look at the i40e PMD and it has the same RS bit
>>>> deferring logic
>>>> as ixgbe PMD has (surprise, surprise!.. ;)). To recall, i40e PMD uses a
>>>> descriptor write-back
>>>> completion mode.
>>>>
>>>> From the HW Spec it's unclear if the RS bit should be set on *every*
>>>> descriptor
>>>> with EOP bit. However I noticed that Linux driver, before it moved to
>>>> HEAD write-back mode, was setting RS
>>>> bit on every EOP descriptor.
>>>>
>>>> So, here is a question to Intel guys: could u, pls., clarify this point?
>>>>
>>>> Thanks in advance,
>>>> vlad
>>>



[dpdk-dev] [PATCH v4] ixgbe_pmd: enforce RS bit on every EOP descriptor for devices newer than 82598

2015-10-27 Thread Vlad Zolotarov


On 10/27/15 20:09, Thomas Monjalon wrote:
> Any Follow-up to this discussion?
> Should we mark this patch as rejected?

Hmmm... This patch fixes an obvious spec violation. Why would it be 
rejected?

>
> 2015-08-24 11:11, Vlad Zolotarov:
>> On 08/20/15 18:37, Vlad Zolotarov wrote:
>>> According to 82599 and x540 HW specifications RS bit *must* be
>>> set in the last descriptor of *every* packet.
>>>
>>> Before this patch there were 3 types of Tx callbacks that were setting
>>> RS bit every tx_rs_thresh descriptors. This patch introduces a set of
>>> new callbacks, one for each type mentioned above, that will set the RS
>>> bit in every EOP descriptor.
>>>
>>> ixgbe_set_tx_function() will set the appropriate Tx callback according
>>> to the device family.
>> [+Jesse and Jeff]
>>
>> I've started to look at the i40e PMD and it has the same RS bit
>> deferring logic
>> as ixgbe PMD has (surprise, surprise!.. ;)). To recall, i40e PMD uses a
>> descriptor write-back
>> completion mode.
>>
>>   From the HW Spec it's unclear if RS bit should be set on *every* descriptor
>> with EOP bit. However I noticed that Linux driver, before it moved to
>> HEAD write-back mode, was setting RS
>> bit on every EOP descriptor.
>>
>> So, here is a question to Intel guys: could u, pls., clarify this point?
>>
>> Thanks in advance,
>> vlad
>
>



[dpdk-dev] [PATCH v4] ixgbe: Drop flow control frames from VFs

2015-10-23 Thread Vlad Zolotarov


On 10/23/15 11:32, Zhang, Helin wrote:
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Friday, October 23, 2015 4:27 PM
>> To: Zhang, Helin
>> Cc: Lu, Wenzhuo; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v4] ixgbe: Drop flow control frames from VFs
>>
>>
>>
>> On 10/23/15 10:14, Zhang, Helin wrote:
>>> From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
>>> Sent: Friday, October 23, 2015 2:57 PM
>>> To: Zhang, Helin
>>> Cc: Lu, Wenzhuo; dev at dpdk.org
>>> Subject: RE: [dpdk-dev] [PATCH v4] ixgbe: Drop flow control frames
>>> from VFs
>>>
>>>
>>> On Oct 23, 2015 9:30 AM, "Zhang, Helin"  wrote:
>>>>
>>>> From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
>>>> Sent: Friday, October 23, 2015 2:24 PM
>>>> To: Zhang, Helin
>>>> Cc: Lu, Wenzhuo; dev at dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH v4] ixgbe: Drop flow control frames
>>>> from VFs
>>>>
>>>>
>>>> On Oct 23, 2015 9:02 AM, "Zhang, Helin"  wrote:
>>>>>
>>>>>> -Original Message-
>>>>>> From: Lu, Wenzhuo
>>>>>> Sent: Friday, October 23, 2015 1:52 PM
>>>>>> To: dev at dpdk.org
>>>>>> Cc: Zhang, Helin; Lu, Wenzhuo
>>>>>> Subject: [PATCH v4] ixgbe: Drop flow control frames from VFs
>>>>>>
>>>>>> This patch will drop flow control frames from being transmitted from 
>>>>>> VSIs.
>>>>>> With this patch in place a malicious VF cannot send flow control or
>>>>>> PFC packets out on the wire.
>>>> The whole idea of this (and similar i40e patches sent before) is really
>>>> confusing.
>>>> If u want to disable the FC feature for VFs then go and disable the
>>>> feature. Why let a (non-malicious) user think that he/she has enabled
>>>> the feature while u silently block it?
>>>> Helin: I don't think disabling FC is equal to filtering out any pause
>>>> frames. What if the software application constructs a pause frame and
>>>> then tries to send it out?
>>> But not disabling FC for the user and silently preventing it is bogus.
>>> First, the conventional user should not be affected. I think this patch
>>> (and all its clones) should be extended to, first, disable the FC Tx
>>> feature for the relevant devices and only then add any anti-malicious
>>> filtering.
>>> Helin: Disabling FC will disable both PF and VF FC; I can't find where
>>> VF FC alone can be disabled. Am I wrong?
>>
>> There are flow_ctrl_get/set callbacks in eth_dev_ops which are used for
>> configuring FC.
>> I see that they are not set for either ixgbevf or i40evf, so here we are all 
>> set for
>> these.
> Helin: The behaviors rely on the hardware capability, but not the SW.
> I meant I don't think it can support disabling VF FC. Please correct me in 
> case I am wrong!

I see. After a shallow sweep through the x540 and xl710 specs it seems
that u r right. However, I was talking about the SW interface only, and
since it is not enabled for the devices in question my whole objection
is withdrawn.

thanks,
vlad

>
>
>>>>>> V2:
>>>>>> Reword the comments.
>>>>>>
>>>>>> V3:
>>>>>> Move the check of set_ethertype_anti_spoofing to the top of the function,
>> to
>>>>>> avoid occupying an ethertype_filter entity without using it.
>>>>>>
>>>>>> V4:
>>>>>> Remove the useless braces and return.
>>>>>>
>>>>>> Signed-off-by: Wenzhuo Lu 
>>>>> Acked-by: Helin Zhang 
>>>>>



[dpdk-dev] [PATCH v4] ixgbe: Drop flow control frames from VFs

2015-10-23 Thread Vlad Zolotarov


On 10/23/15 10:14, Zhang, Helin wrote:
>
> From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
> Sent: Friday, October 23, 2015 2:57 PM
> To: Zhang, Helin
> Cc: Lu, Wenzhuo; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v4] ixgbe: Drop flow control frames from VFs
>
>
> On Oct 23, 2015 9:30 AM, "Zhang, Helin"  wrote:
>>
>>
>> From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Friday, October 23, 2015 2:24 PM
>> To: Zhang, Helin
>> Cc: Lu, Wenzhuo; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v4] ixgbe: Drop flow control frames from VFs
>>
>>
>> On Oct 23, 2015 9:02 AM, "Zhang, Helin"  wrote:
>>>
>>>
 -Original Message-
 From: Lu, Wenzhuo
 Sent: Friday, October 23, 2015 1:52 PM
 To: dev at dpdk.org
 Cc: Zhang, Helin; Lu, Wenzhuo
 Subject: [PATCH v4] ixgbe: Drop flow control frames from VFs

 This patch will drop flow control frames from being transmitted from VSIs.
 With this patch in place a malicious VF cannot send flow control or PFC 
 packets
 out on the wire.
>> The whole idea of this (and similar i40e patches sent before) is really
>> confusing.
>> If u want to disable the FC feature for VFs then go and disable the
>> feature. Why let a (non-malicious) user think that he/she has enabled
>> the feature while u silently block it?
>>
>> Helin: I don't think disabling FC is equal to filtering out any pause
>> frames. What if the software application constructs a pause frame and
>> then tries to send it out?
> But not disabling FC for the user and silently preventing it is bogus.
> First, the conventional user should not be affected. I think this patch
> (and all its clones) should be extended to, first, disable the FC Tx
> feature for the relevant devices and only then add any anti-malicious
> filtering.
>   
> Helin: Disabling FC will disable both PF and VF FC; I can't find where
> VF FC alone can be disabled. Am I wrong?

There are flow_ctrl_get/set callbacks in eth_dev_ops which are used for 
configuring FC.
I see that they are not set for either ixgbevf or i40evf, so here we are 
all set for these.

>
 V2:
 Reword the comments.

 V3:
 Move the check of set_ethertype_anti_spoofing to the top of the function, 
 to
 avoid occupying an ethertype_filter entity without using it.

 V4:
 Remove the useless braces and return.

 Signed-off-by: Wenzhuo Lu 
>>> Acked-by: Helin Zhang 
>>>



[dpdk-dev] [PATCH] i40e: workaround for Security issue in SR-IOV mode

2015-10-08 Thread Vlad Zolotarov


On 10/08/15 05:17, Wu, Jingjing wrote:
>>> In SR-IOV mode a VF sending LFC or PFC would throttle the entire port.
>>> The workaround is to add a filter to drop pause frames from VFs from
>>> sending pause frames.
>> This is a very strange approach - this would silently disable Tx FC
>> while the user would think it's enabled. Wouldn't the right approach be
>> to let the user decide whether to enable this feature, or even better -
>> allow the PF to disable this feature in the VF?
> So, even if we let the VF send FC frames, it does not make sense at all.
> To my understanding, flow control is used for full-duplex point-to-point
> connections. What about a VF? What is its peer for the point-to-point
> connection? So if we enable it, it will be a security risk if an
> attacker sends FC frames on VFs.

I'll start from the end: AFAIR FC frames are not forwarded, they only
throttle the sender on the side that receives the PAUSE frame.
Therefore it's quite tricky to create a PAUSE-frame attack as I see it -
u'll have to hack the switch next to the host u attack. So, let's drop
the "security risk" argument for now... ;)

Regarding VF-sent FC frames being useless: this depends on the setup's
demands. If drops in the VF at the MAC level are not acceptable then it
makes a whole lot of sense, just like it does for a PF in the same
situation. Of course, as a result the whole (switch) link will be
throttled, but that's the price to pay, and system administrators
should be well aware of it.

If, on the other hand, the system administrator doesn't want FC, he may
simply not enable it on this VF. If memory serves me well, FC is
disabled by default in DPDK.

thanks,
vlad

>
> Thanks
> Jingjing



[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Vlad Zolotarov


On 10/06/15 18:00, Michael S. Tsirkin wrote:
> On Tue, Oct 06, 2015 at 05:49:21PM +0300, Vlad Zolotarov wrote:
>>> and read/write the config space.
>>> This means that a single userspace bug is enough to corrupt kernel
>>> memory.
>> Could u, pls., provide an example of this simple bug? Because it's
>> absolutely not obvious...
> Stick a value that happens to match a kernel address in Msg Addr field
> in an unmasked MSI-X entry.

This patch neither configures MSI-X entries in user space nor provides
additional means to do so; therefore this "sticking" would be a matter
of some extra code that is absolutely unrelated to this patch. So, this
example seems irrelevant to this particular discussion.

thanks,
vlad

>



[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Vlad Zolotarov


On 10/06/15 16:58, Michael S. Tsirkin wrote:
> On Tue, Oct 06, 2015 at 11:23:11AM +0300, Vlad Zolotarov wrote:
>> Michael, how is this or any other related patch connected to the problem
>> u r describing?
>> The above ability has been there for years, and if memory serves me
>> well it was u who wrote uio_pci_generic with this "security flaw".  ;)
> I answered all this already.
>
> This patch enables bus mastering, enables MSI or MSI-X

This may be done from the user space right now without this patch...

> , and requires
> userspace to map the MSI-X table

Hmmm... I must have missed this requirement. Could u, pls., clarify?
From what I see, the MSI/MSI-X table is configured completely in the
kernel here...

> and read/write the config space.
> This means that a single userspace bug is enough to corrupt kernel
> memory.

Could u, pls., provide an example of this simple bug? Because it's
absolutely not obvious...

>
> uio_pci_generic does not enable bus mastering or MSI, and
> it might be a good idea to have uio_pci_generic block
> access to MSI/MSI-X config.

Since device BARs may be mapped bypassing UIO/uio_pci_generic, this
won't solve the issue.




[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Vlad Zolotarov


On 10/06/15 01:49, Michael S. Tsirkin wrote:
> On Tue, Oct 06, 2015 at 01:09:55AM +0300, Vladislav Zolotarov wrote:
>> How about instead of trying to invent the wheel just go and attack the 
>> problem
>> directly just like i've proposed already a few times in the last days: 
>> instead
>> of limiting the UIO limit the users that are allowed to use UIO to privileged
>> users only (e.g. root). This would solve all clearly unresolvable issues u 
>> are
>> raising here all together, wouldn't it?
> No - root or no root, if the user can modify the addresses in the MSI-X
> table and make the chip corrupt random memory, this is IMHO a non-starter.

Michael, how is this or any other related patch connected to the problem
u r describing? The above ability has been there for years, and if
memory serves me well it was u who wrote uio_pci_generic with this
"security flaw".  ;)

This patch in general only adds the ability to receive notifications per
MSI-X interrupt, and it has nothing to do with the ability to reprogram
the MSI-X related registers from user space, which was always there.

>
> And tainting kernel is not a solution - your patch adds a pile of
> code that either goes completely unused or taints the kernel.
> Not just that - it's a dedicated userspace API that either
> goes completely unused or taints the kernel.
>
>>> --
>>> MST



[dpdk-dev] [PATCH 0/2] uio_msi: device driver

2015-10-05 Thread Vlad Zolotarov


On 10/04/15 22:03, Greg KH wrote:
> On Sun, Oct 04, 2015 at 07:49:35PM +0300, Vlad Zolotarov wrote:
>> FYI: I've just posted to linux-kernel list patches that add support for both
>> MSI and MSI-X interrupt modes to uio_pci_generic driver.
>> It addresses most (all) remarks on this thread and also fixes some issues
>> this code has, e.g. not disabling msi-x in remove(), etc.
>>
>> U are all welcome to comment... ;)
> Not if you don't at least cc: all of the uio maintainers :(
>
> I'm just going to ignore the things, as obviously you don't want them
> merged, quite strange...

I actually do mean them to be merged, and I did (try to) cc all the
maintainers. Unfortunately I missed the first letter when I copied your
email from the get_maintainers.pl output. I resent v3 with your correct
email. Hope u don't have (too) hard feelings about the first iterations
of the series. Pls., believe me, there was nothing personal, just a
typo... ;)

>
> greg k-h



[dpdk-dev] [PATCH 0/2] uio_msi: device driver

2015-10-04 Thread Vlad Zolotarov
FYI: I've just posted to the linux-kernel list patches that add support
for both MSI and MSI-X interrupt modes to the uio_pci_generic driver.
They address most (all) of the remarks on this thread and also fix some
issues this code has, e.g. not disabling MSI-X in remove(), etc.

U are all welcome to comment... ;)

thanks,
vlad

On 10/02/15 04:39, Alexander Duyck wrote:
> On 10/01/2015 05:04 PM, Stephen Hemminger wrote:
>> On Thu, 1 Oct 2015 16:43:23 -0700
>> Alexander Duyck  wrote:
>>
>>> Yes, but in the case of something like a VF it is going to just make a
>>> bigger mess of things since INTx doesn't work.  So what would you 
>>> expect
>>> your driver to do in that case?  Also we have to keep in mind that the
>>> MSI-X failure case is very unlikely.
>>>
>>> One other thing that just occurred to me is that you may want to try
>>> using the range allocation call instead of a hard set number of
>>> interrupts.  Then if you start running short on vectors you don't hard
>>> fail and instead just allocate what you can.
>> I tried that but the bookkeeping gets messy since there is no good
>> way to communicate that back to userspace and have it adapt.
>
> Actually I kind of just realized that uio_msi_open is kind of messed 
> up.  So if the MSI-X allocation fails due to no resources it will 
> return a positive value indicating the number of vectors that could be 
> allocated, a negative value if one of the input values is invalid, or 
> 0.  I'm not sure if returning a positive value on failure is an issue 
> or not.  I know the open call is supposed to return a negative value 
> or the file descriptor if not negative.  I don't know if the return 
> value might be interpreted as a file descriptor or not.
>
> Also if MSI-X is supported by the hardware, but disabled for some 
> reason by the kernel ("pci=nomsi")  then this driver is rendered 
> inoperable since it will never give you anything but -EINVAL from the 
> open call.
>
> I really think you should probably look at taking care of enabling 
> MSI-X and maybe MSI as a fall-back in probe.  At least then you can 
> post a message about how many vectors are enabled and what type. Then 
> if you cannot enable any interrupts due to MSI being disabled you can 
> simply fail at probe time and let then load a different driver.
>
> - Alex



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-10-01 Thread Vlad Zolotarov


On 10/01/15 17:47, Stephen Hemminger wrote:
> On Thu, 1 Oct 2015 11:00:28 +0300
> Vlad Zolotarov  wrote:
>
>>
>> On 10/01/15 00:36, Stephen Hemminger wrote:
>>> On Wed, 30 Sep 2015 23:09:33 +0300
>>> Vlad Zolotarov  wrote:
>>>
>>>> On 09/30/15 22:39, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 30, 2015 at 10:06:52PM +0300, Vlad Zolotarov wrote:
>>>>>>>> How would iommu
>>>>>>>> virtualization change anything?
>>>>>>> Kernel can use an iommu to limit device access to memory of
>>>>>>> the controlling application.
>>>>>> Ok, this is obvious but what it has to do with enabling using MSI/MSI-X
>>>>>> interrupts support in uio_pci_generic? kernel may continue to limit the
>>>>>> above access with this support as well.
>>>>> It could maybe. So if you write a patch to allow MSI by at the same time
>>>>> creating an isolated IOMMU group and blocking DMA from device in
>>>>> question anywhere, that sounds reasonable.
>>>> No, I'm only planning to add MSI and MSI-X interrupts support for
>>>> uio_pci_generic device.
>>>> The rest mentioned above should naturally be a matter of a different
>>>> patch and writing it is orthogonal to the patch I'm working on as has
>>>> been extensively discussed in this thread.
>>>>
>>> I have a generic MSI and MSI-X driver (posted earlier on this list).
>>> About to post to upstream kernel.
>> Stephen, hi!
>>
>> I found the mentioned series and first thing I noticed was that it's
>> been sent in May so the first question is how far in your list of tasks
>> submitting it upstream is? We need it more or less yesterday and I'm
>> working on it right now. Therefore if u don't have time for it I'd like
>> to help... ;) However I'd like u to clarify a few small things. Pls.,
>> see below...
>>
>> I noticed that u've created a separate msi_msix driver and the second
>> question is what do u plan for the upstream? I was thinking of extending
>> the existing uio_pci_generic with the MSI-X functionality similar to
>> your code and preserving the INT#X functionality as it is now:
> The igb_uio has a bunch of other things I didn't want to deal with:
> the name (being specific to old Intel driver); compatibility with older
> kernels; legacy ABI support. Therefore in effect uio_msi is a rebase
> of igb_uio.
>
> The submission upstream yesterday is the first step, I expect lots
> of review feedback.

Sure, we have lots of feedback already even before the patch has been 
sent... ;)
So, I'm preparing the uio_pci_generic patch. Just wanted to make sure we 
are not working on the same patch at the same time... ;)

It's going to enable both MSI and MSI-X support.
For backward compatibility it'll enable INT#X by default.
It follows the concepts and uses some code pieces from your uio_msi 
patch. If u want I'll add your Signed-off-by when I send it.


>
>>*   INT#X and MSI would provide the IRQ number to the UIO module while
>>  only MSI-X case would register with UIO_IRQ_CUSTOM.
> I wanted all IRQ's to be the same for the driver, ie all go through
> eventfd mechanism. This makes code on DPDK side consistent with less
> special cases.

Of course. The name (uio_msi) is a bit confusing: I mistakenly thought it 
adds both MSI and MSI-X, but it seems to add only MSI-X, in which case 
there are no further questions... ;)
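For context, the eventfd-style delivery Stephen describes ends up, on the application side, as a blocking read of a 32-bit interrupt count from the UIO device node. A minimal userspace sketch (the helper name uio_wait_irq is mine, not part of any existing API):

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Block until an interrupt is signalled on a UIO file descriptor
 * (e.g. an fd obtained by open("/dev/uio0", O_RDONLY)).  The kernel
 * completes the read with the total number of events seen so far as
 * a 32-bit count.  Returns 0 on success, -1 on a short read or error. */
int uio_wait_irq(int fd, uint32_t *count)
{
    ssize_t n = read(fd, count, sizeof(*count));
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

With a real /dev/uioX fd each successful read corresponds to one (possibly coalesced) interrupt; with MSI-X and multiple vectors, a per-vector fd as in the uio_msi design keeps the same calling convention.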

>
>> I also noticed that u enable MSI-X on a first open() call. I assume
>> there was a good reason (that I miss) for not doing it in probe(). Could
>> u, pls., clarify?

What about this?




[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-10-01 Thread Vlad Zolotarov


On 10/01/15 11:44, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 11:40:16PM +0300, Michael S. Tsirkin wrote:
>>> And for what, to prevent
>>> root from touching memory via dma that they can access in a million other
>>> ways?
>> So one can be reasonably sure a kernel oops is not a result of a
>> userspace bug.
> Actually, I thought about this overnight, and  it should be possible to
> drive it securely from userspace, without hypervisor changes.
>
> See
>
> https://mid.gmane.org/20151001104505-mutt-send-email-mst at redhat.com

Looks like a dead link.

>
>
>
>> -- 
>> MST



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-10-01 Thread Vlad Zolotarov


On 10/01/15 00:36, Stephen Hemminger wrote:
> On Wed, 30 Sep 2015 23:09:33 +0300
> Vlad Zolotarov  wrote:
>
>>
>> On 09/30/15 22:39, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 10:06:52PM +0300, Vlad Zolotarov wrote:
>>>>>> How would iommu
>>>>>> virtualization change anything?
>>>>> Kernel can use an iommu to limit device access to memory of
>>>>> the controlling application.
>>>> Ok, this is obvious but what it has to do with enabling using MSI/MSI-X
>>>> interrupts support in uio_pci_generic? kernel may continue to limit the
>>>> above access with this support as well.
>>> It could maybe. So if you write a patch to allow MSI by at the same time
>>> creating an isolated IOMMU group and blocking DMA from device in
>>> question anywhere, that sounds reasonable.
>> No, I'm only planning to add MSI and MSI-X interrupts support for
>> uio_pci_generic device.
>> The rest mentioned above should naturally be a matter of a different
>> patch and writing it is orthogonal to the patch I'm working on as has
>> been extensively discussed in this thread.
>>
> I have a generic MSI and MSI-X driver (posted earlier on this list).
> About to post to upstream kernel.

Great! It would save me a few working days... ;) Thanks, Stephen!



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-10-01 Thread Vlad Zolotarov


On 09/30/15 22:39, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 10:06:52PM +0300, Vlad Zolotarov wrote:
>>>> How would iommu
>>>> virtualization change anything?
>>> Kernel can use an iommu to limit device access to memory of
>>> the controlling application.
>> Ok, this is obvious but what it has to do with enabling using MSI/MSI-X
>> interrupts support in uio_pci_generic? kernel may continue to limit the
>> above access with this support as well.
> It could maybe. So if you write a patch to allow MSI by at the same time
> creating an isolated IOMMU group and blocking DMA from device in
> question anywhere, that sounds reasonable.

No, I'm only planning to add MSI and MSI-X interrupts support for 
uio_pci_generic device.
The rest mentioned above should naturally be a matter of a different 
patch and writing it is orthogonal to the patch I'm working on as has 
been extensively discussed in this thread.

>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 22:10, Vlad Zolotarov wrote:
>
>
> On 09/30/15 22:06, Vlad Zolotarov wrote:
>>
>>
>> On 09/30/15 21:55, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 09:15:56PM +0300, Vlad Zolotarov wrote:
>>>>
>>>> On 09/30/15 18:26, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 30, 2015 at 03:50:09PM +0300, Vlad Zolotarov wrote:
>>>>>> How not virtualizing iommu forces "all or nothing" approach?
>>>>> Looks like you can't limit an assigned device to only access part of
>>>>> guest memory that belongs to a given process.  Either let it 
>>>>> access all
>>>>> of guest memory ("all") or don't assign the device ("nothing").
>>>> Ok. A question then: can u limit the assigned device to only access 
>>>> part of
>>>> the guest memory even if iommu was virtualized?
>>> That's exactly what an iommu does - limit the device io access to 
>>> memory.
>>
>> If it does - it will continue to do so with or without the patch and 
>> if it doesn't (for any reason) it won't do it even without the patch.
>> So, again, the above (rhetorical) question stands. ;)
>>
>> I think Avi has already explained quite in detail why security is 
>> absolutely a non issue in regard to this patch or in regard to UIO in 
>> general. Security has to be enforced by some other means like iommu.
>>
>>>
>>>> How would iommu
>>>> virtualization change anything?
>>> Kernel can use an iommu to limit device access to memory of
>>> the controlling application.
>>
>> Ok, this is obvious but what it has to do with enabling using 
>> MSI/MSI-X interrupts support in uio_pci_generic? kernel may continue 
>> to limit the above access with this support as well.
>>
>>>
>>>> And why do we care about an assigned device
>>>> to be able to access all Guest memory?
>>> Because we want to be reasonably sure a kernel memory corruption
>>> is not a result of a bug in a userspace application.
>>
>> Corrupting Guest's memory due to any SW misbehavior (including bugs) 
>> is a non-issue by design - this is what HV and Guest machines were 
>> invented for. So, like Avi also said, instead of trying to enforce 
>> nobody cares about 
>
> Let me rephrase: by pretending enforcing some security promise that u 
> don't actually fulfill... ;)

...the promise nobody really cares about...

>
>> we'd rather make the developers life easier instead (by applying the 
>> not-yet-completed patch I'm working on).
>>>
>>
>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 22:06, Vlad Zolotarov wrote:
>
>
> On 09/30/15 21:55, Michael S. Tsirkin wrote:
>> On Wed, Sep 30, 2015 at 09:15:56PM +0300, Vlad Zolotarov wrote:
>>>
>>> On 09/30/15 18:26, Michael S. Tsirkin wrote:
>>>> On Wed, Sep 30, 2015 at 03:50:09PM +0300, Vlad Zolotarov wrote:
>>>>> How not virtualizing iommu forces "all or nothing" approach?
>>>> Looks like you can't limit an assigned device to only access part of
>>>> guest memory that belongs to a given process.  Either let it access 
>>>> all
>>>> of guest memory ("all") or don't assign the device ("nothing").
>>> Ok. A question then: can u limit the assigned device to only access 
>>> part of
>>> the guest memory even if iommu was virtualized?
>> That's exactly what an iommu does - limit the device io access to 
>> memory.
>
> If it does - it will continue to do so with or without the patch and 
> if it doesn't (for any reason) it won't do it even without the patch.
> So, again, the above (rhetorical) question stands. ;)
>
> I think Avi has already explained quite in detail why security is 
> absolutely a non issue in regard to this patch or in regard to UIO in 
> general. Security has to be enforced by some other  means like iommu.
>
>>
>>> How would iommu
>>> virtualization change anything?
>> Kernel can use an iommu to limit device access to memory of
>> the controlling application.
>
> Ok, this is obvious but what it has to do with enabling using 
> MSI/MSI-X interrupts support in uio_pci_generic? kernel may continue 
> to limit the above access with this support as well.
>
>>
>>> And why do we care about an assigned device
>>> to be able to access all Guest memory?
>> Because we want to be reasonably sure a kernel memory corruption
>> is not a result of a bug in a userspace application.
>
> Corrupting Guest's memory due to any SW misbehavior (including bugs) 
> is a non-issue by design - this is what HV and Guest machines were 
> invented for. So, like Avi also said, instead of trying to enforce 
> nobody cares about 

Let me rephrase: by pretending enforcing some security promise that u 
don't actually fulfill... ;)

> we'd rather make the developers life easier instead (by applying the 
> not-yet-completed patch I'm working on).
>>
>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 21:55, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 09:15:56PM +0300, Vlad Zolotarov wrote:
>>
>> On 09/30/15 18:26, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 03:50:09PM +0300, Vlad Zolotarov wrote:
>>>> How not virtualizing iommu forces "all or nothing" approach?
>>> Looks like you can't limit an assigned device to only access part of
>>> guest memory that belongs to a given process.  Either let it access all
>>> of guest memory ("all") or don't assign the device ("nothing").
>> Ok. A question then: can u limit the assigned device to only access part of
>> the guest memory even if iommu was virtualized?
> That's exactly what an iommu does - limit the device io access to memory.

If it does - it will continue to do so with or without the patch and if 
it doesn't (for any reason) it won't do it even without the patch.
So, again, the above (rhetorical) question stands. ;)

I think Avi has already explained quite in detail why security is 
absolutely a non issue in regard to this patch or in regard to UIO in 
general. Security has to be enforced by some other  means like iommu.

>
>> How would iommu
>> virtualization change anything?
> Kernel can use an iommu to limit device access to memory of
> the controlling application.

Ok, this is obvious but what it has to do with enabling using MSI/MSI-X 
interrupts support in uio_pci_generic? kernel may continue to limit the 
above access with this support as well.

>
>> And why do we care about an assigned device
>> to be able to access all Guest memory?
> Because we want to be reasonably sure a kernel memory corruption
> is not a result of a bug in a userspace application.

Corrupting Guest's memory due to any SW misbehavior (including bugs) is 
a non-issue by design - this is what HV and Guest machines were invented 
for. So, like Avi also said, instead of trying to enforce nobody cares 
about we'd rather make the developers life easier instead (by applying 
the not-yet-completed patch I'm working on).
>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 18:26, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 03:50:09PM +0300, Vlad Zolotarov wrote:
>> How not virtualizing iommu forces "all or nothing" approach?
> Looks like you can't limit an assigned device to only access part of
> guest memory that belongs to a given process.  Either let it access all
> of guest memory ("all") or don't assign the device ("nothing").

Ok. A question then: can u limit the assigned device to only access part 
of the guest memory even if iommu was virtualized? How would iommu 
virtualization change anything? And why do we care about an assigned 
device to be able to access all Guest memory?

>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 15:27, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 03:16:04PM +0300, Vlad Zolotarov wrote:
>>
>> On 09/30/15 15:03, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
>>>> On 09/30/15 14:41, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
>>>>>>   The whole idea is to bypass kernel. Especially for networking...
>>>>> ... on dumb hardware that doesn't support doing that securely.
>>>> On a very capable HW that supports whatever security requirements needed
>>>> (e.g. 82599 Intel's SR-IOV VF devices).
>>> Network card type is irrelevant as long as you do not have an IOMMU,
>>> otherwise you would just use e.g. VFIO.
>> Sorry, but I don't follow your logic here - Amazon EC2 environment is a
>> example where there *is* iommu but it's not virtualized
>> and thus VFIO is
>> useless and there is an option to use directly assigned SR-IOV networking
>> device there where using the kernel drivers impose a performance impact
>> compared to user space UIO-based user space kernel bypass mode of usage. How
>> is it irrelevant? Could u, pls, clarify your point?
>>
> So it's not even dumb hardware, it's another piece of software
> that forces an "all or nothing" approach where either
> device has access to all VM memory, or none.
> And this, unfortunately, leaves you with no secure way to
> allow userspace drivers.
UIO is not secure even today, so what are we arguing about? ;)
Adding MSI/MSI-X support won't change this state, so, pls., discard the 
security argument unless u think that UIO is a completely secure piece of 
software today. In the latter case, could u, pls., clarify what would 
prevent a userspace program from configuring a DMA controller via its 
registers and doing whatever it wants?


How does not virtualizing the iommu force an "all or nothing" approach? 
What is insecure in relying on the HV to control the iommu and not 
letting the VF any access to it?
As far as I see it, there isn't any security problem here at all. The 
only problem I see is that the dumb current uio_pci_generic 
implementation forces people to go and invent workarounds instead of 
having proper MSI/MSI-X support implemented. And as I've mentioned 
above, it has nothing to do with security, because there is no such thing 
as security (at the UIO driver level) when we talk about UIO - it has to 
be ensured by some other entity, like the HV.

>
> So it makes even less sense to add insecure work-arounds in the kernel.
> It seems quite likely that by the time the new kernel reaches
> production X years from now, EC2 will have a virtual iommu.

I'd bet that new kernel would reach production long before Amazon does 
that... ;)

>
>
>>>>> Colour me unimpressed.
>>>>>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 15:03, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
>>
>> On 09/30/15 14:41, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
>>>> The whole idea is to bypass kernel. Especially for networking...
>>> ... on dumb hardware that doesn't support doing that securely.
>> On a very capable HW that supports whatever security requirements needed
>> (e.g. 82599 Intel's SR-IOV VF devices).
> Network card type is irrelevant as long as you do not have an IOMMU,
> otherwise you would just use e.g. VFIO.

Sorry, but I don't follow your logic here - the Amazon EC2 environment is 
an example where there *is* an iommu but it's not virtualized, and thus 
VFIO is useless there. There is an option to use a directly assigned 
SR-IOV networking device, where using the kernel drivers imposes a 
performance penalty compared to UIO-based userspace kernel-bypass usage. 
How is that irrelevant? Could u, pls, clarify your point?

>
>>> Colour me unimpressed.
>>>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 14:41, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
>> The whole idea is to bypass kernel. Especially for networking...
> ... on dumb hardware that doesn't support doing that securely.

On a very capable HW that supports whatever security requirements needed 
(e.g. 82599 Intel's SR-IOV VF devices).

> Colour me unimpressed.
>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 13:58, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 01:37:22PM +0300, Vlad Zolotarov wrote:
>>
>> On 09/30/15 00:49, Michael S. Tsirkin wrote:
>>> On Tue, Sep 29, 2015 at 02:46:16PM -0700, Stephen Hemminger wrote:
>>>> On Tue, 29 Sep 2015 23:54:54 +0300
>>>> "Michael S. Tsirkin"  wrote:
>>>>
>>>>> On Tue, Sep 29, 2015 at 07:41:09PM +0300, Vlad Zolotarov wrote:
>>>>>> The security breach motivation u brought in "[RFC PATCH] uio:
>>>>>> uio_pci_generic: Add support for MSI interrupts" thread seems a bit weak
>>>>>> since one u let the userland access to the bar it may do any funny thing
>>>>>> using the DMA engine of the device. This kind of stuff should be 
>>>>>> prevented
>>>>>> using the iommu and if it's enabled then any funny tricks using MSI/MSI-X
>>>>>> configuration will be prevented too.
>>>>>>
>>>>>> I'm about to send the patch to main Linux mailing list. Let's continue 
>>>>>> this
>>>>>> discussion there.
>>>>> Basically UIO shouldn't be used with devices capable of DMA.
>>>>> Use VFIO for that (yes, this implies an emulated or PV IOMMU).
>> If there is an IOMMU in the picture there shouldn't be any problem to use
>> UIO with DMA capable devices.
> UIO doesn't enforce the IOMMU though. That's why it's not a good fit.

Having said all that - does UIO refuse to work with DMA-capable devices 
today? Either I have missed that logic or it's not there.
So everything u are so worried about may already be done today. That's why 
I don't understand why adding support for MSI/MSI-X interrupts
would change anything here. U are right that UIO *today* has a security 
hole, however it should be addressed separately, and the same solution
that covers the security breach in the current code will also cover 
the "MSI/MSI-X security vulnerability" because they are actually exactly 
the same issue.

>
>>>>> I don't think this can change.
>>>> Given there is no PV IOMMU and even if there was it would be too slow for 
>>>> DPDK
>>>> use, I can't accept that.
>>> QEMU does allow emulating an iommu.
>> Amazon's EC2 xen HV doesn't. At least today. Therefore VFIO is not an option
>> there.
> Not only that, a bunch of boxes have their IOMMU disabled.
> So for such systems, you can't have userspace poking at
> device registers. You need a kernel driver to validate
> userspace requests before passing them on to devices.

I think u are describing HV functionality here. ;) And yes, u are 
absolutely right, the HV has to control the non-privileged userland.
For HV and non-virtualized boxes a possible solution could be to allow UIO 
only for some privileged group of processes.

>
>> And again, it's a general issue not DPDK specific.
>> Today one has to develop some proprietary modules (like igb_uio) to
>> workaround the issue and this is lame.
> Of course it is lame. So don't bypass the kernel then, use the upstream 
> drivers.

This would impose a heavy performance penalty. The whole idea is to 
bypass kernel. Especially for networking...

>
>> IMHO uio_pci_generic should
>> be fixed to be able to properly work within any virtualized environment and
>> not only with KVM.
> The motivation for UIO is pretty clear:
>
>  For many types of devices, creating a Linux kernel driver is
>  overkill.  All that is really needed is some way to handle an
>  interrupt and provide access to the memory space of the
>  device.  The logic of controlling the device does not
>  necessarily have to be within the kernel, as the device does
>  not need to take advantage of any of other resources that the
>  kernel provides.  One such common class of devices that are
>  like this are for industrial I/O cards.
>
> Devices doing DMA do need to take advantage of memory protection
> that the kernel provides.
Well, yeah - but who said it has to be forbidden to work with the device 
if MSI-X interrupts are my only option?

The kernel may provide protection by checking the process's permissions 
and denying UIO access to non-privileged processes.
I'm not sure that's the case today, and if it's not, then, as mentioned 
above, this should rather be fixed ASAP exactly for the reasons u bring up
here. And once that's done there shouldn't be any limitation on allowing MSI 
or MSI-X interrupts along with INT#X.

>
>>>   DPDK uses static mappings, so I
>>> doubt it's speed matters at all.
>>>
>>> Anyway, DPDK is doing polling all the time. I don't see why does it
>>> insist on using interrupts to detect link up events. Just poll for that
>>> too.
>>>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 00:49, Michael S. Tsirkin wrote:
> On Tue, Sep 29, 2015 at 02:46:16PM -0700, Stephen Hemminger wrote:
>> On Tue, 29 Sep 2015 23:54:54 +0300
>> "Michael S. Tsirkin"  wrote:
>>
>>> On Tue, Sep 29, 2015 at 07:41:09PM +0300, Vlad Zolotarov wrote:
>>>> The security breach motivation u brought in "[RFC PATCH] uio:
>>>> uio_pci_generic: Add support for MSI interrupts" thread seems a bit weak
>>>> since one u let the userland access to the bar it may do any funny thing
>>>> using the DMA engine of the device. This kind of stuff should be prevented
>>>> using the iommu and if it's enabled then any funny tricks using MSI/MSI-X
>>>> configuration will be prevented too.
>>>>
>>>> I'm about to send the patch to main Linux mailing list. Let's continue this
>>>> discussion there.
>>>>
>>> Basically UIO shouldn't be used with devices capable of DMA.
>>> Use VFIO for that (yes, this implies an emulated or PV IOMMU).

If there is an IOMMU in the picture there shouldn't be any problem to 
use UIO with DMA capable devices.

>>> I don't think this can change.
>> Given there is no PV IOMMU and even if there was it would be too slow for 
>> DPDK
>> use, I can't accept that.
> QEMU does allow emulating an iommu.

Amazon's EC2 xen HV doesn't. At least today. Therefore VFIO is not an 
option there. And again, it's a general issue not DPDK specific.
Today one has to develop proprietary modules (like igb_uio) to work 
around the issue, and this is lame. IMHO uio_pci_generic should
be fixed to be able to properly work within any virtualized environment 
and not only with KVM.



>   DPDK uses static mappings, so I
> doubt it's speed matters at all.
>
> Anyway, DPDK is doing polling all the time. I don't see why does it
> insist on using interrupts to detect link up events. Just poll for that
> too.
>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-29 Thread Vlad Zolotarov


On 09/27/15 12:43, Michael S. Tsirkin wrote:
> On Sun, Sep 27, 2015 at 10:05:11AM +0300, Vlad Zolotarov wrote:
>> Hi,
>> I was trying to use uio_pci_generic with Intel's 10G SR-IOV devices on
>> Amazon EC2 instances with Enhanced Networking enabled.
>> The idea is to create a DPDK environment that doesn't require compiling
>> kernel modules (igb_uio).
>> However I was surprised to discover that uio_pci_generic refuses to work
>> with EN device on AWS:
>>
>> $ lspci
>> 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
>> 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
>> 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
>> 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
>> 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
>> 00:03.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller 
>> Virtual Function (rev 01)
>> 00:04.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller 
>> Virtual Function (rev 01)
>> 00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
>>
>> $ sudo ./dpdk/tools/dpdk_nic_bind.py -b uio_pci_generic 00:04.0
>> Error: bind failed for 0000:00:04.0 - Cannot bind to driver uio_pci_generic
>> $dmesg
>>
>> --> snip <---
>> [  816.655575] uio_pci_generic 0000:00:04.0: No IRQ assigned to device: no 
>> support for interrupts?
>>
>> $ sudo lspci -s 00:04.0 -vvv
>> 00:04.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller 
>> Virtual Function (rev 01)
>>  Physical Slot: 4
>>  Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
>> Stepping- SERR- FastB2B- DisINTx-
>>  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>  Region 0: Memory at f3008000 (64-bit, prefetchable) [size=16K]
>>  Region 3: Memory at f300c000 (64-bit, prefetchable) [size=16K]
>>  Capabilities: [70] MSI-X: Enable- Count=3 Masked-
>>  Vector table: BAR=3 offset=
>>  PBA: BAR=3 offset=2000
>>  Kernel modules: ixgbevf
>>
>> So, as we may see the PCI device doesn't have an INTX interrupt line
>> assigned indeed. It has an MSI-X capability however.
>> Looking at the uio_pci_generic code it seems to require the INTX:
>>
>> uio_pci_generic.c: line 74: probe():
>>
>>  if (!pdev->irq) {
>>  dev_warn(&pdev->dev, "No IRQ assigned to device: "
>>   "no support for interrupts?\n");
>>  pci_disable_device(pdev);
>>  return -ENODEV;
>>  }
>>
>> Is it a known limitation? Michael, could u, pls., comment on this?
>>
>> thanks,
>> vlad

Michael, I took a look at the pci-stub driver and the reason why DPDK 
uses uio in the first place, and I have some comments below.

> This is expected. uio_pci_generic forwards INT#x interrupts from device
> to userspace, but VF devices never assert INT#x.
>
> So it doesn't seem to make any sense to bind uio_pci_generic there.

Well, it's not completely correct to put it this way. The thing is that 
DPDK (and it could be any other framework/developer)
uses uio_pci_generic to actually get interrupts from the device, and it 
makes perfect sense to be able to do so
with SR-IOV devices too. The problem is, as u've described above, 
that the current implementation of uio_pci_generic
won't let them do so, which seems like bogus behavior to me. There is 
no reason why uio_pci_generic couldn't work
the same way it does today but with MSI-X interrupts. To keep things 
simple, an initial implementation could forward just the first vector.
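To illustrate what such a driver would be looking at: the MSI-X capability visible in the lspci output quoted above (Count=3, table in BAR 3) is a 12-byte structure in PCI config space. A self-contained decoder sketch following the standard PCI MSI-X capability layout (struct and function names are mine, not kernel API):

```c
#include <assert.h>
#include <stdint.h>

struct msix_cap {
    uint16_t table_size;   /* number of vectors */
    int      enabled;      /* MSI-X Enable bit */
    uint8_t  table_bir;    /* BAR holding the vector table */
    uint32_t table_offset;
    uint8_t  pba_bir;      /* BAR holding the pending-bit array */
    uint32_t pba_offset;
};

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode an MSI-X capability found at `cap` in config space:
 * byte 0 cap ID (0x11), bytes 2-3 Message Control, bytes 4-7
 * Table Offset/BIR, bytes 8-11 PBA Offset/BIR.
 * Returns 0 on success, -1 if the capability is not MSI-X. */
int parse_msix_cap(const uint8_t *cap, struct msix_cap *out)
{
    if (cap[0] != 0x11)
        return -1;
    uint16_t ctrl = (uint16_t)cap[2] | ((uint16_t)cap[3] << 8);
    out->table_size   = (ctrl & 0x07ff) + 1;  /* field holds N-1 */
    out->enabled      = !!(ctrl & 0x8000);
    uint32_t tbl = rd32(cap + 4);
    uint32_t pba = rd32(cap + 8);
    out->table_bir    = tbl & 0x7;
    out->table_offset = tbl & ~0x7u;
    out->pba_bir      = pba & 0x7;
    out->pba_offset   = pba & ~0x7u;
    return 0;
}
```

Fed the capability bytes of the 82599 VF above, this yields table_size 3, table BIR 3 and PBA offset 0x2000, matching what lspci reports.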

The security breach motivation u brought up in the "[RFC PATCH] uio: 
uio_pci_generic: Add support for MSI interrupts" thread seems a bit weak,
since once u let userland access the BAR it may do any funny thing 
using the DMA engine of the device. This kind of stuff should be prevented
using the iommu, and if that is enabled then any funny tricks using 
MSI/MSI-X configuration will be prevented too.

I'm about to send the patch to main Linux mailing list. Let's continue 
this discussion there.

>
> I think that DPDK should be fixed to not require uio_pci_generic
> for VF devices (or any devices without INT#x).
>
> If DPDK requires a place-holder driver, the pci-stub driver should
> do this adequately. See ./drivers/pci/pci-stub.c
>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-27 Thread Vlad Zolotarov
Hi,
I was trying to use uio_pci_generic with Intel's 10G SR-IOV devices on 
Amazon EC2 instances with Enhanced Networking enabled.
The idea is to create a DPDK environment that doesn't require compiling 
kernel modules (igb_uio).
However I was surprised to discover that uio_pci_generic refuses to work 
with EN device on AWS:

$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller 
Virtual Function (rev 01)
00:04.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller 
Virtual Function (rev 01)
00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)

$ sudo ./dpdk/tools/dpdk_nic_bind.py -b uio_pci_generic 00:04.0
Error: bind failed for 0000:00:04.0 - Cannot bind to driver uio_pci_generic

$dmesg

--> snip <---
[  816.655575] uio_pci_generic 0000:00:04.0: No IRQ assigned to device: no 
support for interrupts?

$ sudo lspci -s 00:04.0 -vvv
00:04.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller 
Virtual Function (rev 01)
Physical Slot: 4
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Region 0: Memory at f3008000 (64-bit, prefetchable) [size=16K]
Region 3: Memory at f300c000 (64-bit, prefetchable) [size=16K]
Capabilities: [70] MSI-X: Enable- Count=3 Masked-
Vector table: BAR=3 offset=
PBA: BAR=3 offset=2000
Kernel modules: ixgbevf

So, as we may see, the PCI device indeed has no INTx interrupt line 
assigned. It has an MSI-X capability however.
Looking at the uio_pci_generic code, it seems to require INTx:

uio_pci_generic.c: line 74: probe():

if (!pdev->irq) {
        dev_warn(&pdev->dev, "No IRQ assigned to device: "
                 "no support for interrupts?\n");
        pci_disable_device(pdev);
        return -ENODEV;
}

Is it a known limitation? Michael, could u, pls., comment on this?

thanks,
vlad


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-09-13 Thread Vlad Zolotarov


On 09/13/15 14:47, Ananyev, Konstantin wrote:
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Avi Kivity
>> Sent: Friday, September 11, 2015 6:48 PM
>> To: Thomas Monjalon; Vladislav Zolotarov; didier.pallard
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 
>> for all NICs but 82598
>>
>> On 09/11/2015 07:08 PM, Thomas Monjalon wrote:
>>> 2015-09-11 18:43, Avi Kivity:
 On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote:
> On Sep 11, 2015 5:55 PM, "Thomas Monjalon"  > wrote:
>> 2015-09-11 17:47, Avi Kivity:
>>> On 09/11/2015 05:25 PM, didier.pallard wrote:
 Hi vlad,

 Documentation states that a packet (or multiple packets in transmit
 segmentation) can span any number of
 buffers (and their descriptors) up to a limit of 40 minus WTHRESH
 minus 2.

 Shouldn't there be a test in transmit function that drops
> properly the
 mbufs with a too large number of
 segments, while incrementing a statistic; otherwise transmit
> function
 may be locked by the faulty packet without
 notification.

>>> What we proposed is that the pmd expose to dpdk, and dpdk expose
> to the
>>> application, an mbuf check function.  This way applications that can
>>> generate complex packets can verify that the device will be able to
>>> process them, and applications that only generate simple mbufs can
> avoid
>>> the overhead by not calling the function.
>> More than a check, it should be exposed as a capability of the port.
>> Anyway, if the application sends too much segments, the driver must
>> drop it to avoid hang, and maintain a dedicated statistic counter to
>> allow easy debugging.
> I agree with Thomas - this should not be optional. Malformed packets
> should be dropped. In the icgbe case it's a very simple test - it's a
> single branch per packet so i doubt that it could impose any
> measurable performance degradation.
 A drop allows the application no chance to recover.  The driver must
 either provide the ability for the application to know that it cannot
 accept the packet, or it must fix it up itself.
>>> I have the feeling that everybody agrees on the same thing:
>>> the application must be able to make a well formed packet by checking
>>> limitations of the port. What about a field rte_eth_dev_info.max_tx_segs?
>> It is not generic enough.  i40e has a limit that it imposes post-TSO.
>>
>>
>>> In case the application fails in its checks, the driver must drop it and
>>> notify the user via a stat counter.
>>> The driver can also remove the hardware limitation by gathering the segments
>>> but it may be hard to implement and would be a slow operation.
>> I think that to satisfy both the 64b full line rate applications and the
>> more complicated full stack applications, this must be made optional.
>> In particular, an application that only forwards packets will never hit
>> a NIC's limits, so it need not take any action. That's why I think a
>> verification function is ideal; a forwarding application can ignore it,
>> and a complex application can call it, and if it fails the packet, it
>> can linearize it itself, removing complexity from dpdk itself.
> I think that's a good approach to that problem.
> As I remember we discussed something similar a while ago -
> A function (tx_prep() or something) that would check nb_segs and probably 
> some other HW-specific restrictions,
> calculate the pseudo-header checksum, reset the IP header length, etc.
>
> On the other hand, we could also add two more fields to rte_eth_dev_info:
> 1) Max number of segs per TSO packet (tx_max_seg ?).
> 2) Max number of segs per single packet/TSO segment (tx_max_mtu_seg ?).
> So for ixgbe both would have the value 40 - wthresh,
> while for i40e 1) would be UINT8_MAX and 2) would be 8.
> Then the upper layer can use that information to select an optimal size for its 
> TX buffers.

HW limitations differ from HW to HW not only in their values but also in 
their nature - for instance, for QLogic bnx2x NICs the limitations may not 
be expressible in the values above, so this must be a callback.
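To make the proposal above concrete, here is a minimal sketch of the kind of check a tx_prep()-style callback backed by per-device segment limits would enable. All names below (tx_desc_lim, fake_mbuf, tx_pkt_seg_ok) are hypothetical stand-ins for illustration, not DPDK APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins -- none of these names are DPDK APIs; the
 * nb_seg_max / nb_mtu_seg_max fields and a tx_prep()-style callback are
 * only being proposed in this thread. */
struct tx_desc_lim {
	uint16_t nb_seg_max;     /* max segments per TSO packet */
	uint16_t nb_mtu_seg_max; /* max segments per non-TSO packet */
};

struct fake_mbuf {		/* minimal stand-in for rte_mbuf */
	uint16_t nb_segs;
	bool tso;
};

/* The generic part of the check such a callback could perform: reject a
 * packet whose segment count exceeds the device limit, so the application
 * can linearize it itself instead of hanging the HW. */
static bool
tx_pkt_seg_ok(const struct tx_desc_lim *lim, const struct fake_mbuf *m)
{
	uint16_t max = m->tso ? lim->nb_seg_max : lim->nb_mtu_seg_max;

	return m->nb_segs <= max;
}
```

With x540-style limits (40 minus WTHRESH minus 2, i.e. 38 with WTHRESH=0), a 33-segment cluster passes and a 40-segment one is rejected for linearization.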

>   
> Konstantin
>



[dpdk-dev] [PATCH v1] net: i40e: add VLAN tag size to RXMAX

2015-08-31 Thread Vlad Zolotarov
HW requires it regardless of the presence of the VLAN tag in the received frame.
Otherwise Rx frames are being filtered out on the MTU-4 boundary.

Signed-off-by: Vlad Zolotarov 
---
 drivers/net/i40e/i40e_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index eae4ab0..22aaeb1 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -3156,7 +3156,7 @@ i40e_rx_queue_init(struct i40e_rx_queue *rxq)
rx_ctx.hsplit_0 = I40E_HEADER_SPLIT_ALL;
else
rx_ctx.hsplit_0 = I40E_HEADER_SPLIT_NONE;
-   rx_ctx.rxmax = rxq->max_pkt_len;
+   rx_ctx.rxmax = rxq->max_pkt_len + I40E_VLAN_TAG_SIZE;
rx_ctx.tphrdesc_ena = 1;
rx_ctx.tphwdesc_ena = 1;
rx_ctx.tphdata_ena = 1;
-- 
2.1.0



[dpdk-dev] i40e: XL710 Rx filters out frames above 1510 bytes

2015-08-31 Thread Vlad Zolotarov


On 08/30/15 16:03, Vlad Zolotarov wrote:
> Hi, I have the most strange issue on a setup of 2 pairs of connected 
> back to back XL710 cards.
> It seems that frames above 1510 bytes are being filtered out by an 
> i40e PMD receiver.
> The same setup works perfectly when I use Linux drivers on both sides 
> but when I use a PMD on one side and a Linux driver on the other - the 
> issue above occurs.
>
> i40e PMD statistics show nothing: all counters stay zero.
> The MFS field in PRTGL_SAH register has the expected (reset) value: 
> 0x2600.

We've found the problem. There is an additional per-Rx-queue context 
configuration that limits the maximum
allowed frame size - RXMAX. It is configured to be 1518 by default, 
which is supposed to be MTU + L2 HDR + CRC = 1500 + 14 + 4. However, 
when MTU-sized frames are sent they are still being filtered 
out. This is probably due to some HW "feature" that requires RXMAX to 
include the VLAN header size even if there isn't any VLAN header. This 
"feature" has been addressed, for instance, in Jesse's 61a46a4c0 
commit in the net-next tree.
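Spelled out, the sizes discussed above (constant names are made up here for illustration; 4 bytes is the 802.1Q VLAN tag size the HW apparently always accounts for):

```c
#include <assert.h>

/* Illustrative constants per the discussion above (names made up). */
enum { MTU = 1500, L2_HDR = 14, CRC = 4, VLAN_TAG = 4 };

/* The default RXMAX of 1518 covers only MTU + L2 header + CRC; to let
 * MTU-sized frames through, RXMAX must additionally cover the VLAN tag. */
static int
required_rxmax(int max_pkt_len)
{
	return max_pkt_len + VLAN_TAG;
}
```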

In addition there is some mysterious "GLV_REPC/GLVREPC counter of the 
VSI" that is supposed to count this and a few other Rx drop events, but 
this counter is nowhere to be found - neither in the data sheet (I mean 
its offset on the BAR), nor in the DPDK i40e PMD register 
descriptions, nor in the net-next Linux driver. DPDK has a "/* GLV_REPC 
not supported */" comment in i40e_ethdev.c implying that there must be 
some problem with this register too, which is a pity.

I'll send a patch to the dpdk-dev that fixes the RXMAX issue shortly.

thanks,
vlad

>
> I even tried to disable all Tx offload capabilities on the Linux side 
> - still no success.
>
> Could u guys clarify what is going on and what I may be doing 
> wrong?
> Pls., let me know if u need any additional info about the setup.
>
> thanks in advance,
> vlad
>
>
>How to reproduce the issue:
>
> I used the in tree apps from examples/multi_process/client_server_mp:
>
>
>  On the DPDK box:
>
> 1. Bound both NICs to DPDK's UIO.
> 2. Configured 1024 huge pages on each NUMA Node (there are two NUMA
>Nodes in total).
>
> $ cat /proc/meminfo  | grep Huge
> AnonHugePages: 0 kB
> HugePages_Total:2048
> HugePages_Free:  292
> HugePages_Rsvd:0
> HugePages_Surp:0
> Hugepagesize:   2048 kB
>
> 3. Compiled mp_server and mp_client example applications.
> 4. Run: sudo ./build/mp_server -c 6 -n 4  --  -p 0x3 -n 1
> 5. Run: sudo ./build/mp_client -c 8 -n 4 --proc-type=auto -- -n 0
>
>
>  On the "other side" box (with the Linux device drivers):
>
> 1. configure a static arp:
>
> $ arp 192.169.10.118
> Address  HWtype  HWaddress   Flags 
> MaskIface
> 192.169.10.118   ether   68:05:ca:2d:39:88 
> CMens3
>
> 2. Set the MTU on ens3 to 3000 (see below).
> 3. Use *ping* application for sending frames of different sizes:
> 1. "ping 192.169.10.118 -s 1400" makes both Rx and Tx counters on
>the DPDK side increment.
> 2. "ping 192.169.10.118 -s 1500" - Rx counter on the DPDK side
>doesn't increment.
>
>
>The DPDK PMD driven setup description:
>
>  * Two XL710 single port cards connected to XL710 cards in a different
>server back to back.
>
> $ lspci | grep Ether
> ---snip---
> 05:00.0 Ethernet controller: Intel Corporation Ethernet Controller 
> XL710 for 40GbE QSFP+ (rev 01)
> 83:00.0 Ethernet controller: Intel Corporation Ethernet Controller 
> XL710 for 40GbE QSFP+ (rev 01)
> ---snip---
>
> $ ethtool -i ens288
> driver: i40e
> version: 1.3.2-k
> firmware-version: f4.22.26225 a1.1 n4.24 e13fd
> bus-info: :83:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
>
> $ ethtool -i ens785
> driver: i40e
> version: 1.3.2-k
> firmware-version: f4.22.26225 a1.1 n4.24 e13fd
> bus-info: :05:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
>
> $ ifconfig
> ---snip---
> ens288: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
> ether 68:05:ca:2d:3a:00  txqueuelen 1000  (Ethernet)
> RX packets 0  bytes 0 (0.0 B)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 0  bytes 0 (0.0 B)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>
> ens785: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mt

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-26 Thread Vlad Zolotarov


On 08/25/15 22:30, Vladislav Zolotarov wrote:
>
>
> On Aug 25, 2015 22:16, "Zhang, Helin" <helin.zhang at intel.com> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > > Sent: Tuesday, August 25, 2015 11:53 AM
> > > To: Zhang, Helin
> > > Cc: Lu, Wenzhuo; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh 
> above 1 for
> > > all NICs but 82598
> > >
> > >
> > >
> > > On 08/25/15 21:43, Zhang, Helin wrote:
> > > >
> > > > Hi Vlad
> > > >
> > > > I think this could possibly be the root cause of your TX hang issue.
> > > > Please try to limit the number to 8 or less, and then see if the 
> issue
> > > > will still be there or not?
> > > >
> > >
> > > Helin, the issue has been seen on x540 devices. Pls., see a chapter
> > > 7.2.1.1 of x540 devices spec:
> > >
> > > A packet (or multiple packets in transmit segmentation) can span 
> any number of
> > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH 
> minus 2 (see
> > > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 
> for WTHRESH
> > > details). For best performance it is recommended to minimize the 
> number of
> > > buffers as possible.
> > >
> > > Could u, pls., clarify why u think that the maximum number of data
> > > buffers is limited to 8?
> > OK, i40e hardware is 8
>
> For i40e it's a bit more complicated than just "not more than 8" - it's 
> not more than 8 for a non-TSO packet and not more than 8 for each MSS, 
> including header buffers, for TSO. But this thread is not about i40e 
> so this doesn't seem relevant anyway.
>
> , so I'd assume x540 could have a similar one.
>
> x540 spec assumes otherwise... ;)
>
> Yes, in your case,
> > the limit could be around 38, right?
>
> If by "around 38" u mean "exactly 38" then u are absolutely right... ?
>
> > Could you help to make sure there is no packet to be transmitted 
> uses more than
> > 38 descriptors?
>
> Just like I've already mentioned, we limit the cluster to at most 33 
> data segments. Therefore we are good here...
>
> > I heard that there is a similar hang issue on X710 if using more 
> than 8 descriptors for
> > a single packet. I am wondering if the issue is similar on x540.
>
> What's x710? If that's xl710 40G nics (i40e driver),
>

I've found what x710 NICs are - they are another NIC family managed by 
the i40e PMD. Therefore the rest of what I said still stands... ;)

> then it has its own specs with its own HW limitations i've mentioned 
> above. It has nothing to do with this thread that is all about 10G 
> nics managed by ixgbe driver.
>
> There is a different thread, where i've raised the 40G NICs xmit 
> issues. See "i40e xmit path HW limitation" thread.
>
> >
> > Regards,
> > Helin
> >
> > >
> > > thanks,
> > > vlad
> > >
> > > > It does not have any check for the number of descriptors to be used
> > > > for a single packet, and it relies on the users to give correct mbuf
> > > > chains.
> > > >
> > > > We may need a check of this somewhere. Of cause the point you
> > > > indicated we also need to carefully investigate or fix.
> > > >
> > > > Regards,
> > > >
> > > > Helin
> > > >
> > > > *From:* Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
> > > > *Sent:* Tuesday, August 25, 2015 11:34 AM
> > > > *To:* Zhang, Helin
> > > > *Cc:* Lu, Wenzhuo; dev at dpdk.org
> > > > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > > above 1 for all NICs but 82598
> > > >
> > > >
> > > > On Aug 25, 2015 21:14, "Zhang, Helin" <helin.zhang at intel.com> 
> wrote:
> > > > >
> > > > > Hi Vlad
> > > > >
> > > > >
> > > > >
> > > > > In addition, I'd double check with you what's the maximum 
> number 

[dpdk-dev] i40e and RSS woes

2015-08-24 Thread Vlad Zolotarov


On 08/24/15 20:51, Zhang, Helin wrote:
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Monday, August 24, 2015 4:14 AM
>> To: Zhang, Helin; Gleb Natapov; dev at dpdk.org
>> Subject: Re: [dpdk-dev] i40e and RSS woes
>>
>>
>>
>> On 03/05/15 07:56, Zhang, Helin wrote:
>>> Hi Gleb
>>>
>>> Sorry for late! I am struggling on my tasks for the following DPDK release 
>>> these
>> days.
>>>> -Original Message-
>>>> From: Gleb Natapov [mailto:gleb at cloudius-systems.com]
>>>> Sent: Monday, March 2, 2015 8:56 PM
>>>> To: dev at dpdk.org
>>>> Cc: Zhang, Helin
>>>> Subject: Re: i40e and RSS woes
>>>>
>>>> Ping.
>>>>
>>>> On Thu, Feb 19, 2015 at 04:50:10PM +0200, Gleb Natapov wrote:
>>>>> CCing i40e driver author in a hope to get an answer.
>>>>>
>>>>> On Mon, Feb 16, 2015 at 03:36:54PM +0200, Gleb Natapov wrote:
>>>>>> I have an application that works reasonably well with ixgbe driver,
>>>>>> but when I try to use it with i40e I encounter various RSS related 
>>>>>> issues.
>>>>>>
>>>>>> First one is that for some reason i40e, when it builds the default reta
>>>>>> table, rounds down the number of queues to a power of two. Why is this? If
>>> It seems because of i40e queue configuration. We will check it more
>>> and see if it can be changed or improved later.
>> Helin, hi!
>> Sorry for bringing it back but it seems that the RSS queues number issue
>> (rounding it down to the nearest power of 2) still hasn't been addressed in 
>> the
>> master branch.
>>
>> Could u, pls., clarify what is that "i40e queue configuration" that requires 
>> this
>> alignment u are referring above?
>>
>> From what I could see, the "num" parameter is not propagated outside of
>> i40e_pf_config_rss() in any form except for the RSS table contents.
>> This means that any code that needs to know the number of Rx queues
>> would use dev_data->nb_rx_queues (e.g. i40e_dev_rx_init()) and wouldn't
>> be able to know that i40e_pf_config_rss() configured something different,
>> except by scanning the RSS table in HW, which is of course not an option.
>>
>> Therefore, from the first look it seems that this rounding may be safely 
>> removed
>> unless I've missed something.
> Could you refer to the data sheet sections 'Hash Filter' and 'Receive Queue 
> Regions'? It is said that '1, 2, 4, 8, 16, 32, 64' are the supported queue 
> region sizes.
> Yes, we should support more than 64 queues per port, but for RSS it should 
> be one of '1, 2, 4, 8, 16, 32, 64'.

"The VSIs support 8 regions of receive queues that are aimed mainly for
the TCs. The TC regions are defined per VSI by the VSIQF_TCREGION
register. The region sizes (defined by the TC_SIZE fields) can be any of
the following value: 1, 2, 4, 8, 16, 32, 64 as long as the total number of
queues do not exceed the VSI allocation. These regions starts at the
offset defined by the TC_OFFSET parameter. According to the region
size, the 'n' LS bits of the Queue Index from the LUT are enabled."

I think the above says that the region sizes may only be one of the 
mentioned values.

AFAIU this doesn't mean that the number of RSS queues has to be the same 
- it only may not exceed the region size.

Just like it's stated in the "Outcome Queue Index" definition the final 
mapping to the PF index space is done using the
VSILAN_QTABLE or VSILAN_QBASE registers (a.k.a. RSS indirection table).

For instance if u have a region of size 8 u may configure 3 RSS queues 
by setting the following RSS table:
0,1,2,0,1,2,0,1
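Spelling that example out: filling an 8-entry region round-robin with 3 queues yields exactly the table above. A tiny illustrative sketch in plain C (not the actual i40e LUT programming code; the function name is made up):

```c
#include <assert.h>
#include <stdint.h>

/* Fill an RSS LUT region round-robin with `nb_queues` queue indices.
 * Per the datasheet quote above, the *region size* must be one of
 * 1, 2, 4, 8, 16, 32, 64, but the number of RSS queues only must not
 * exceed it -- it need not be a power of two itself. */
static void
fill_rss_lut(uint8_t *lut, unsigned int region_size, unsigned int nb_queues)
{
	unsigned int i;

	for (i = 0; i < region_size; i++)
		lut[i] = (uint8_t)(i % nb_queues);
}
```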

>
> Thanks,
> Helin
>
>> Pls., comment.
>>
>> thanks,
>> vlad
>>
>>>>>> I configure the reta on my own using all of the queues, everything seems
>>>>>> to be working. To add insult to injury, I do not get any errors
>>>>>> during configuration; some queues just do not receive any traffic.
>>>>>>
>>>>>> The second problem is that for some reason i40e does not use a 40
>>>>>> byte toeplitz hash key like any other driver, but it expects the
>>>>>> key to be 52 bytes. And it would have been fine (if we ignore the
>>>>>> fact that it contradicts the MS spec), but how is my high level code
>>>>>> supposed to know
>>>> that?
>>> Actually a rss_key_len was introduced in struct rte_eth_rss_conf
>>> recently. So 

[dpdk-dev] i40e and RSS woes

2015-08-24 Thread Vlad Zolotarov


On 03/05/15 07:56, Zhang, Helin wrote:
> Hi Gleb
>
> Sorry for late! I am struggling on my tasks for the following DPDK release 
> these days.
>
>> -Original Message-
>> From: Gleb Natapov [mailto:gleb at cloudius-systems.com]
>> Sent: Monday, March 2, 2015 8:56 PM
>> To: dev at dpdk.org
>> Cc: Zhang, Helin
>> Subject: Re: i40e and RSS woes
>>
>> Ping.
>>
>> On Thu, Feb 19, 2015 at 04:50:10PM +0200, Gleb Natapov wrote:
>>> CCing i40e driver author in a hope to get an answer.
>>>
>>> On Mon, Feb 16, 2015 at 03:36:54PM +0200, Gleb Natapov wrote:
 I have an application that works reasonably well with ixgbe driver,
 but when I try to use it with i40e I encounter various RSS related issues.

 First one is that for some reason i40e, when it builds the default reta
 table, rounds down the number of queues to a power of two. Why is this? If
> It seems because of i40e queue configuration. We will check it more and see
> if it can be changed or improved later.

Helin, hi!
Sorry for bringing it back but it seems that the RSS queues number issue 
(rounding it down to the nearest power of 2)
still hasn't been addressed in the master branch.

Could u, pls., clarify what the "i40e queue configuration" u refer to 
above is, that requires this alignment?

From what I could see, the "num" parameter is not propagated outside of 
i40e_pf_config_rss() in any form except for the RSS table contents.
This means that any code that needs to know the number of Rx queues 
would use dev_data->nb_rx_queues (e.g. i40e_dev_rx_init())
and wouldn't be able to know that i40e_pf_config_rss() configured 
something different, except by scanning the RSS table in HW, which is of 
course not an option.

Therefore, from the first look it seems that this rounding may be safely 
removed unless I've missed something.

Pls., comment.

thanks,
vlad

>
 I configure the reta on my own using all of the queues, everything seems
 to be working. To add insult to injury, I do not get any errors
 during configuration; some queues just do not receive any traffic.

 The second problem is that for some reason i40e does not use a 40 byte
 toeplitz hash key like any other driver, but it expects the key to
 be 52 bytes. And it would have been fine (if we ignore the fact
 that it contradicts the MS spec), but how is my high level code supposed to know
>> that?
> Actually a rss_key_len was introduced in struct rte_eth_rss_conf recently. So 
> the
> length should be indicated clearly. But I found the annotations of that 
> structure
> should have been reworked. I will try to rework it with clear descriptions.
>
 And again, device configuration does not fail when wrong key length
 is provided, it just uses some other key. Guys this kind of error
 handling is completely unacceptable.
> If a key of a shorter length is provided, it will not be used at all; the 
> default key will be used.
> So there is no issue as you said. But we need to add more clear description 
> for the
> structure of rte_eth_rss_conf.
>
> Thank you very much for the good comments!
>
> Regards,
> Helin
>
 The last one is more of a question. Why interface to change RSS hash
 function (XOR or toeplitz) is part of a filter configuration and not
 rss config?

 --
Gleb.
>>> --
>>> Gleb.
>> --
>>  Gleb.



[dpdk-dev] [PATCH v4] ixgbe_pmd: enforce RS bit on every EOP descriptor for devices newer than 82598

2015-08-24 Thread Vlad Zolotarov


On 08/20/15 18:37, Vlad Zolotarov wrote:
> According to 82599 and x540 HW specifications RS bit *must* be
> set in the last descriptor of *every* packet.
>
> Before this patch there were 3 types of Tx callbacks that were setting
> RS bit every tx_rs_thresh descriptors. This patch introduces a set of
> new callbacks, one for each type mentioned above, that will set the RS
> bit in every EOP descriptor.
>
> ixgbe_set_tx_function() will set the appropriate Tx callback according
> to the device family.

[+Jesse and Jeff]

I've started to look at the i40e PMD and it has the same RS bit 
deferring logic
as ixgbe PMD has (surprise, surprise!.. ;)). To recall, i40e PMD uses a 
descriptor write-back
completion mode.

From the HW spec it's unclear whether the RS bit should be set on *every*
descriptor with the EOP bit. However, I noticed that the Linux driver, before 
it moved to HEAD write-back mode, was setting the RS
bit on every EOP descriptor.

So, here is a question to Intel guys: could u, pls., clarify this point?

Thanks in advance,
vlad

>
> This patch fixes the Tx hang we were constantly hitting with a
> seastar-based application on x540 NIC.
>
> Signed-off-by: Vlad Zolotarov 
> ---
> New in v4:
> - Styling (white spaces) fixes.
>
> New in v3:
> - Enforce the RS bit setting instead of enforcing tx_rs_thresh to be 1.
> ---
>   drivers/net/ixgbe/ixgbe_ethdev.c   |  14 +++-
>   drivers/net/ixgbe/ixgbe_ethdev.h   |   4 ++
>   drivers/net/ixgbe/ixgbe_rxtx.c | 139 
> -
>   drivers/net/ixgbe/ixgbe_rxtx.h |   2 +
>   drivers/net/ixgbe/ixgbe_rxtx_vec.c |  29 ++--
>   5 files changed, 149 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c 
> b/drivers/net/ixgbe/ixgbe_ethdev.c
> index b8ee1e9..355882c 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -866,12 +866,17 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
>   uint32_t ctrl_ext;
>   uint16_t csum;
>   int diag, i;
> + bool rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB);
>   
>   PMD_INIT_FUNC_TRACE();
>   
>   eth_dev->dev_ops = &ixgbe_eth_dev_ops;
>   eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
> - eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
> +
> + if (rs_deferring_allowed)
> + eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
> + else
> + eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts_always_rs;
>   
>   /*
>* For secondary processes, we don't initialise any further as primary
> @@ -1147,12 +1152,17 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
>   struct ixgbe_hwstrip *hwstrip =
>   IXGBE_DEV_PRIVATE_TO_HWSTRIP_BITMAP(eth_dev->data->dev_private);
>   struct ether_addr *perm_addr = (struct ether_addr *) hw->mac.perm_addr;
> + bool rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB);
>   
>   PMD_INIT_FUNC_TRACE();
>   
>   eth_dev->dev_ops = &ixgbevf_eth_dev_ops;
>   eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
> - eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
> +
> + if (rs_deferring_allowed)
> + eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
> + else
> + eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts_always_rs;
>   
>   /* for secondary processes, we don't initialise any further as primary
>* has already done this work. Only check we don't need a different
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h 
> b/drivers/net/ixgbe/ixgbe_ethdev.h
> index c3d4f4f..390356d 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.h
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.h
> @@ -367,9 +367,13 @@ uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
>   
>   uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>   uint16_t nb_pkts);
> +uint16_t ixgbe_xmit_pkts_always_rs(
> + void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
>   
>   uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
>   uint16_t nb_pkts);
> +uint16_t ixgbe_xmit_pkts_simple_always_rs(
> + void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
>   
>   int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
> struct rte_eth_rss_conf *rss_conf);
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index 91023b9..044f72c 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -164,11 +164,16 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
>   
>   /* Populate 4 descriptors with data from 4 mbufs */
>   static inline void
> -tx4(volatile union ixgbe_adv_tx_desc *txdp, struct rt

[dpdk-dev] [PATCH v4] ixgbe_pmd: enforce RS bit on every EOP descriptor for devices newer than 82598

2015-08-20 Thread Vlad Zolotarov
According to 82599 and x540 HW specifications RS bit *must* be
set in the last descriptor of *every* packet.

Before this patch there were 3 types of Tx callbacks that were setting
RS bit every tx_rs_thresh descriptors. This patch introduces a set of
new callbacks, one for each type mentioned above, that will set the RS
bit in every EOP descriptor.

ixgbe_set_tx_function() will set the appropriate Tx callback according
to the device family.

This patch fixes the Tx hang we were constantly hitting with a
seastar-based application on x540 NIC.

Signed-off-by: Vlad Zolotarov 
---
New in v4:
   - Styling (white spaces) fixes.

New in v3:
   - Enforce the RS bit setting instead of enforcing tx_rs_thresh to be 1.
---
 drivers/net/ixgbe/ixgbe_ethdev.c   |  14 +++-
 drivers/net/ixgbe/ixgbe_ethdev.h   |   4 ++
 drivers/net/ixgbe/ixgbe_rxtx.c | 139 -
 drivers/net/ixgbe/ixgbe_rxtx.h |   2 +
 drivers/net/ixgbe/ixgbe_rxtx_vec.c |  29 ++--
 5 files changed, 149 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index b8ee1e9..355882c 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -866,12 +866,17 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
uint32_t ctrl_ext;
uint16_t csum;
int diag, i;
+   bool rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB);

PMD_INIT_FUNC_TRACE();

eth_dev->dev_ops = &ixgbe_eth_dev_ops;
eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
-   eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+
+   if (rs_deferring_allowed)
+   eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+   else
+   eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts_always_rs;

/*
 * For secondary processes, we don't initialise any further as primary
@@ -1147,12 +1152,17 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
struct ixgbe_hwstrip *hwstrip =
IXGBE_DEV_PRIVATE_TO_HWSTRIP_BITMAP(eth_dev->data->dev_private);
struct ether_addr *perm_addr = (struct ether_addr *) hw->mac.perm_addr;
+   bool rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB);

PMD_INIT_FUNC_TRACE();

eth_dev->dev_ops = &ixgbevf_eth_dev_ops;
eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
-   eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+
+   if (rs_deferring_allowed)
+   eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+   else
+   eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts_always_rs;

/* for secondary processes, we don't initialise any further as primary
 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index c3d4f4f..390356d 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -367,9 +367,13 @@ uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,

 uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+uint16_t ixgbe_xmit_pkts_always_rs(
+   void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);

 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+uint16_t ixgbe_xmit_pkts_simple_always_rs(
+   void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);

 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
  struct rte_eth_rss_conf *rss_conf);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 91023b9..044f72c 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -164,11 +164,16 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)

 /* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-tx4(volatile union ixgbe_adv_tx_desc *txdp, struct rte_mbuf **pkts)
+tx4(volatile union ixgbe_adv_tx_desc *txdp, struct rte_mbuf **pkts,
+bool always_rs)
 {
uint64_t buf_dma_addr;
uint32_t pkt_len;
int i;
+   uint32_t flags = DCMD_DTYP_FLAGS;
+
+   if (always_rs)
+   flags |= IXGBE_ADVTXD_DCMD_RS;

for (i = 0; i < 4; ++i, ++txdp, ++pkts) {
buf_dma_addr = RTE_MBUF_DATA_DMA_ADDR(*pkts);
@@ -178,7 +183,7 @@ tx4(volatile union ixgbe_adv_tx_desc *txdp, struct rte_mbuf 
**pkts)
txdp->read.buffer_addr = rte_cpu_to_le_64(buf_dma_addr);

txdp->read.cmd_type_len =
-   rte_cpu_to_le_32((uint32_t)DCMD_DTYP_FLAGS | pkt_len);
+   rte_cpu_to_le_32(flags | pkt_len);

txdp->read.olinfo_status =
rte_cpu_to_le_32(pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
@@ -189,10 +194,15 @@ tx4(volatile union ixgbe_adv_tx_desc *txdp, struct 
rte_mbuf **pkts)

 /* Populate 1 descriptor with data from 1 mbuf */
 static inl

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-20 Thread Vlad Zolotarov


On 08/20/15 11:56, Vlad Zolotarov wrote:
>
>
> On 08/20/15 11:41, Ananyev, Konstantin wrote:
>> Hi Vlad,
>>
>>> -Original Message-
>>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>>> Sent: Wednesday, August 19, 2015 11:03 AM
>>> To: Ananyev, Konstantin; Lu, Wenzhuo
>>> Cc: dev at dpdk.org
>>> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh 
>>> above 1 for all NICs but 82598
>>>
>>>
>>>
>>> On 08/19/15 10:43, Ananyev, Konstantin wrote:
>>>> Hi Vlad,
>>>> Sorry for delay with review, I am OOO till next week.
>>>> Meanwhile, few questions/comments from me.
>>> Hi, Konstantin, long time no see... ;)
>>>
>>>>>>>>> This patch fixes the Tx hang we were constantly hitting with a
>>>>> seastar-based
>>>>>>>>> application on x540 NIC.
>>>>>>>> Could you help to share with us how to reproduce the tx hang 
>>>>>>>> issue,
>>>>> with using
>>>>>>>> typical DPDK examples?
>>>>>>> Sorry. I'm not very familiar with the typical DPDK examples to 
>>>>>>> help u
>>>>>>> here. However this is quite irrelevant since without this this 
>>>>>>> patch
>>>>>>> ixgbe PMD obviously abuses the HW spec as has been explained above.
>>>>>>>
>>>>>>> We saw the issue when u stressed the xmit path with a lot of highly
>>>>>>> fragmented TCP frames (packets with up to 33 fragments with 
>>>>>>> non-headers
>>>>>>> fragments as small as 4 bytes) with all offload features enabled.
>>>> Could you provide us with the pcap file to reproduce the issue?
>>> Well, the thing is it takes some time to reproduce it (a few minutes of
>>> heavy load) therefore a pcap would be quite large.
>> Probably you can upload it to some place, from which we will be able 
>> to download it?
>
> I'll see what I can do but no promises...

On second thought, a pcap file won't help u much since in order to 
reproduce the issue u have to reproduce exactly the same structure of 
clusters I give to the HW, and that's not what u see on the wire in the TSO case.

>
>> Or might be you have some sort of scapy script to generate it?
>> I suppose we'll need something to reproduce the issue and verify the 
>> fix.
>
> Since the original code abuses the HW spec u don't have to... ;)
>
>>
>>>> My concern with your approach is that it would affect TX performance.
>>> It certainly will ;) But it seem inevitable. See below.
>>>
>>>> Right now, for simple TX the PMD usually reads only 
>>>> (nb_tx_desc/tx_rs_thresh) TXDs,
>>>> while with your patch (if I understand it correctly) it has to read 
>>>> all TXDs in the HW TX ring.
>>> If by "simple" u refer an always single fragment per Tx packet - then u
>>> are absolutely correct.
>>>
>>> My initial patch was to only set RS on every EOP descriptor without
>>> changing the rs_thresh value and this patch worked.
>>> However HW spec doesn't ensure in a general case that packets are 
>>> always
>>> handled/completion write-back completes in the same order the packets
>>> are placed on the ring (see "Tx arbitration schemes" chapter in 82599
>>> spec for instance). Therefore AFAIU one should not assume that if
>>> packet[x+1] DD bit is set then packet[x] is completed too.
>>  From my understanding, TX arbitration controls the order in which 
>> TXDs from
>> different queues are fetched/processed.
>> But descriptors from the same TX queue are processed in FIFO order.
>> So, I think that  - yes, if TXD[x+1] DD bit is set, then TXD[x] is 
>> completed too,
>> and setting RS on every EOP TXD should be enough.
>
> Ok. I'll rework the patch under this assumption then.
>
>>
>>> That's why I changed the patch to be as u see it now. However if I miss
>>> something here and your HW people ensure the in-order completion 
>>> this of
>>> course may be changed back.
>>>
>>>> Even if we really need to setup RS bit in each TXD (I still doubt 
>>>> we really do) - ,
>>> Well, if u doubt u may ask the guys from the Intel networking division
>>> that wrote the 82599 and x540 HW specs where they clearly state 
>>> that. ;)
>> Good point, we'll see what we can do here :)

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-19 Thread Vlad Zolotarov


On 08/19/15 10:43, Ananyev, Konstantin wrote:
> Hi Vlad,
> Sorry for delay with review, I am OOO till next week.
> Meanwhile, few questions/comments from me.

Hi, Konstantin, long time no see... ;)

>
>>>>>> This patch fixes the Tx hang we were constantly hitting with a
>> seastar-based
>>>>>> application on x540 NIC.
>>>>> Could you help to share with us how to reproduce the tx hang issue,
>> with using
>>>>> typical DPDK examples?
>>>> Sorry. I'm not very familiar with the typical DPDK examples to help u
>>>> here. However this is quite irrelevant since without this patch the
>>>> ixgbe PMD obviously abuses the HW spec, as has been explained above.
>>>>
>>>> We saw the issue when u stressed the xmit path with a lot of highly
>>>> fragmented TCP frames (packets with up to 33 fragments with non-headers
>>>> fragments as small as 4 bytes) with all offload features enabled.
> Could you provide us with the pcap file to reproduce the issue?

Well, the thing is it takes some time to reproduce it (a few minutes of 
heavy load) therefore a pcap would be quite large.

> My concern with your approach is that it would affect TX performance.

It certainly will ;) But it seems inevitable. See below.

> Right now, for simple TX PMD usually reads only (nb_tx_desc/tx_rs_thresh) 
> TXDs,
> While with your patch (if I understand it correctly) it has to read all TXDs 
> in the HW TX ring.

If by "simple" u mean always a single fragment per Tx packet - then u 
are absolutely correct.

My initial patch was to only set RS on every EOP descriptor without 
changing the rs_thresh value and this patch worked.
However HW spec doesn't ensure in a general case that packets are always 
handled/completion write-back completes in the same order the packets 
are placed on the ring (see "Tx arbitration schemes" chapter in 82599 
spec for instance). Therefore AFAIU one should not assume that if 
packet[x+1] DD bit is set then packet[x] is completed too.

That's why I changed the patch to be as u see it now. However if I miss 
something here and your HW people ensure the in-order completion this of 
course may be changed back.

> Even if we really need to setup RS bit in each TXD (I still doubt we really 
> do) - ,

Well, if u doubt u may ask the guys from the Intel networking division 
that wrote the 82599 and x540 HW specs where they clearly state that. ;)

> I think inside PMD it still should be possible to check TX completion in 
> chunks.
> Konstantin
>
>
>>>> Thanks,
>>>> vlad
>>>>>> Signed-off-by: Vlad Zolotarov 
>>>>>> ---
>>>>>>drivers/net/ixgbe/ixgbe_ethdev.c |  9 +
>>>>>>drivers/net/ixgbe/ixgbe_rxtx.c   | 23 ++-
>>>>>>2 files changed, 31 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>> b/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>> index b8ee1e9..6714fd9 100644
>>>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev,
>>>> struct
>>>>>> rte_eth_dev_info *dev_info)
>>>>>> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
>>>>>> ETH_TXQ_FLAGS_NOOFFLOADS,
>>>>>> };
>>>>>> +
>>>>>> +  /*
>>>>>> +   * According to 82599 and x540 specifications RS bit *must* be set on the
>>>>>> +   * last descriptor of *every* packet. Therefore we will not allow the
>>>>>> +   * tx_rs_thresh above 1 for all NICs newer than 82598.
>>>>>> +   */
>>>>>> +  if (hw->mac.type > ixgbe_mac_82598EB)
>>>>>> +  dev_info->default_txconf.tx_rs_thresh = 1;
>>>>>> +
>>>>>> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t);
>>>>>> dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
>>>>>> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff --
>>>> git
>>>>>> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>> index
>>>>>> 91023b9..8dbdffc 100644
>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>> @@ -2085,11 +2085,19 @@ ixgbe_de

[dpdk-dev] [PATCH v2] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-17 Thread Vlad Zolotarov
According to 82599 and x540 HW specifications RS bit *must* be
set in the last descriptor of *every* packet.

This patch fixes the Tx hang we were constantly hitting with a
seastar-based application on x540 NIC.

Signed-off-by: Vlad Zolotarov 
---
New in v2:
   - ixgbevf: ixgbevf_dev_info_get(): return tx_rs_thresh=1 in
  default tx configuration for all devices since VFs
  are available only on devices newer than 82598.

Signed-off-by: Vlad Zolotarov 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 19 ++-
 drivers/net/ixgbe/ixgbe_rxtx.c   | 23 ++-
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index b8ee1e9..fd9cb77 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
ETH_TXQ_FLAGS_NOOFFLOADS,
};
+
+   /*
+* According to 82599 and x540 specifications RS bit *must* be set on the
+* last descriptor of *every* packet. Therefore we will not allow the
+* tx_rs_thresh above 1 for all NICs newer than 82598.
+*/
+   if (hw->mac.type > ixgbe_mac_82598EB)
+   dev_info->default_txconf.tx_rs_thresh = 1;
+
dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t);
dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL;
@@ -2463,7 +2472,15 @@ ixgbevf_dev_info_get(struct rte_eth_dev *dev,
.wthresh = IXGBE_DEFAULT_TX_WTHRESH,
},
.tx_free_thresh = IXGBE_DEFAULT_TX_FREE_THRESH,
-   .tx_rs_thresh = IXGBE_DEFAULT_TX_RSBIT_THRESH,
+   /*
+* According to 82599 and x540 specifications RS bit *must* be
+* set on the last descriptor of *every* packet. Therefore we
+* will not allow the tx_rs_thresh above 1 for all NICs newer
+* than 82598. Since VFs are available only on devices starting
+* from 82599, tx_rs_thresh should be set to 1 for ALL VF
+* devices.
+*/
+   .tx_rs_thresh = 1,
.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
ETH_TXQ_FLAGS_NOOFFLOADS,
};
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 91023b9..8dbdffc 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
struct ixgbe_tx_queue *txq;
struct ixgbe_hw *hw;
uint16_t tx_rs_thresh, tx_free_thresh;
+   bool rs_deferring_allowed;

PMD_INIT_FUNC_TRACE();
hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);

/*
+* According to 82599 and x540 specifications RS bit *must* be set on the
+* last descriptor of *every* packet. Therefore we will not allow the
+* tx_rs_thresh above 1 for all NICs newer than 82598.
+*/
+   rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB);
+
+   /*
 * Validate number of transmit descriptors.
 * It must not exceed hardware maximum, and must be multiple
 * of IXGBE_ALIGN.
@@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 * to transmit a packet is greater than the number of free TX
 * descriptors.
 * The following constraints must be satisfied:
+*  tx_rs_thresh must be less than 2 for NICs for which RS deferring is
+*  forbidden (all but 82598).
 *  tx_rs_thresh must be greater than 0.
 *  tx_rs_thresh must be less than the size of the ring minus 2.
 *  tx_rs_thresh must be less than or equal to tx_free_thresh.
@@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 * When set to zero use default values.
 */
tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ?
-   tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH);
+   tx_conf->tx_rs_thresh :
+   (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1));
tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ?
tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
+
+   if (!rs_deferring_allowed && tx_rs_thresh > 1) {
+   PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since RS "
+ "must be set for every packet for this HW. "
+ "(tx_rs_thresh=%u port=%d queue=%d)",
+(unsigned int)tx_rs_thresh,
+  

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-14 Thread Vlad Zolotarov


On 08/13/15 23:28, Zhang, Helin wrote:
> Hi Vlad
>
> I don't think the changes are needed. It says in datasheet that the RS bit 
> should be
> set on the last descriptor of every packet, ONLY WHEN TXDCTL.WTHRESH equals 
> to ZERO.

Of course it's needed! See below.
Exactly the same spec a few lines above the place u've just quoted states:

"Software should not set the RS bit when TXDCTL.WTHRESH is greater than zero."

And since all three (3) ixgbe xmit callbacks utilize the RS bit, the 
ixgbe PMD actually does not support any WTHRESH value other than zero.

>
> Regards,
> Helin
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Thursday, August 13, 2015 11:07 AM
>> To: dev at dpdk.org
>> Cc: Zhang, Helin; Ananyev, Konstantin; avi at cloudius-systems.com; Vlad
>> Zolotarov
>> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all
>> NICs but 82598
>>
>> According to 82599 and x540 HW specifications RS bit *must* be set in the 
>> last
>> descriptor of *every* packet.
> There is a condition that if TXDCTL.WTHRESH equal to zero.

Right and ixgbe PMD requires this condition to be fulfilled in order to 
function. See above.

>
>> This patch fixes the Tx hang we were constantly hitting with a seastar-based
>> application on x540 NIC.
> Could you help to share with us how to reproduce the tx hang issue, with using
> typical DPDK examples?

Sorry. I'm not very familiar with the typical DPDK examples to help u 
here. However this is quite irrelevant since without this patch the 
ixgbe PMD obviously abuses the HW spec, as has been explained above.

We saw the issue when u stressed the xmit path with a lot of highly 
fragmented TCP frames (packets with up to 33 fragments with non-headers 
fragments as small as 4 bytes) with all offload features enabled.

Thanks,
vlad
>
>> Signed-off-by: Vlad Zolotarov 
>> ---
>>   drivers/net/ixgbe/ixgbe_ethdev.c |  9 +
>>   drivers/net/ixgbe/ixgbe_rxtx.c   | 23 ++-
>>   2 files changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
>> b/drivers/net/ixgbe/ixgbe_ethdev.c
>> index b8ee1e9..6714fd9 100644
>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
>> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct
>> rte_eth_dev_info *dev_info)
>>  .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
>>  ETH_TXQ_FLAGS_NOOFFLOADS,
>>  };
>> +
>> +/*
>> + * According to 82599 and x540 specifications RS bit *must* be set on the
>> + * last descriptor of *every* packet. Therefore we will not allow the
>> + * tx_rs_thresh above 1 for all NICs newer than 82598.
>> + */
>> +if (hw->mac.type > ixgbe_mac_82598EB)
>> +dev_info->default_txconf.tx_rs_thresh = 1;
>> +
>>  dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t);
>>  dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
>>  dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff --git
>> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index
>> 91023b9..8dbdffc 100644
>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev
>> *dev,
>>  struct ixgbe_tx_queue *txq;
>>  struct ixgbe_hw *hw;
>>  uint16_t tx_rs_thresh, tx_free_thresh;
>> +bool rs_deferring_allowed;
>>
>>  PMD_INIT_FUNC_TRACE();
>>  hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>
>>  /*
>> + * According to 82599 and x540 specifications RS bit *must* be set on the
>> + * last descriptor of *every* packet. Therefore we will not allow the
>> + * tx_rs_thresh above 1 for all NICs newer than 82598.
>> + */
>> +rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB);
>> +
>> +/*
>>   * Validate number of transmit descriptors.
>>   * It must not exceed hardware maximum, and must be multiple
>>   * of IXGBE_ALIGN.
>> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
>>   * to transmit a packet is greater than the number of free TX
>>   * descriptors.
>>   * The following constraints must be satisfied:
>> + *  tx_rs_thresh must be less than 2 for NICs for which RS deferring is
>> + *  forbidden (all but 82598).
>>   *  

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-13 Thread Vlad Zolotarov
According to 82599 and x540 HW specifications RS bit *must* be
set in the last descriptor of *every* packet.

This patch fixes the Tx hang we were constantly hitting with a
seastar-based application on x540 NIC.

Signed-off-by: Vlad Zolotarov 
---
 drivers/net/ixgbe/ixgbe_ethdev.c |  9 +
 drivers/net/ixgbe/ixgbe_rxtx.c   | 23 ++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index b8ee1e9..6714fd9 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
ETH_TXQ_FLAGS_NOOFFLOADS,
};
+
+   /*
+* According to 82599 and x540 specifications RS bit *must* be set on the
+* last descriptor of *every* packet. Therefore we will not allow the
+* tx_rs_thresh above 1 for all NICs newer than 82598.
+*/
+   if (hw->mac.type > ixgbe_mac_82598EB)
+   dev_info->default_txconf.tx_rs_thresh = 1;
+
dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t);
dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL;
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 91023b9..8dbdffc 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
struct ixgbe_tx_queue *txq;
struct ixgbe_hw *hw;
uint16_t tx_rs_thresh, tx_free_thresh;
+   bool rs_deferring_allowed;

PMD_INIT_FUNC_TRACE();
hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);

/*
+* According to 82599 and x540 specifications RS bit *must* be set on the
+* last descriptor of *every* packet. Therefore we will not allow the
+* tx_rs_thresh above 1 for all NICs newer than 82598.
+*/
+   rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB);
+
+   /*
 * Validate number of transmit descriptors.
 * It must not exceed hardware maximum, and must be multiple
 * of IXGBE_ALIGN.
@@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 * to transmit a packet is greater than the number of free TX
 * descriptors.
 * The following constraints must be satisfied:
+*  tx_rs_thresh must be less than 2 for NICs for which RS deferring is
+*  forbidden (all but 82598).
 *  tx_rs_thresh must be greater than 0.
 *  tx_rs_thresh must be less than the size of the ring minus 2.
 *  tx_rs_thresh must be less than or equal to tx_free_thresh.
@@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 * When set to zero use default values.
 */
tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ?
-   tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH);
+   tx_conf->tx_rs_thresh :
+   (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1));
tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ?
tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
+
+   if (!rs_deferring_allowed && tx_rs_thresh > 1) {
+   PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since RS "
+ "must be set for every packet for this HW. "
+ "(tx_rs_thresh=%u port=%d queue=%d)",
+(unsigned int)tx_rs_thresh,
+(int)dev->data->port_id, (int)queue_idx);
+   return -(EINVAL);
+   }
+
if (tx_rs_thresh >= (nb_desc - 2)) {
PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the number "
 "of TX descriptors minus 2. (tx_rs_thresh=%u "
-- 
2.1.0



[dpdk-dev] i40e xmit path HW limitation

2015-07-30 Thread Vlad Zolotarov


On 07/30/15 20:33, Zhang, Helin wrote:
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Thursday, July 30, 2015 9:44 AM
>> To: Zhang, Helin; Ananyev, Konstantin
>> Cc: dev at dpdk.org
>> Subject: Re: i40e xmit path HW limitation
>>
>>
>>
>> On 07/30/15 19:10, Zhang, Helin wrote:
>>>> -Original Message-
>>>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>>>> Sent: Thursday, July 30, 2015 7:58 AM
>>>> To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
>>>> Subject: RFC: i40e xmit path HW limitation
>>>>
>>>> Hi, Konstantin, Helin,
>>>> there is a documented limitation of xl710 controllers (i40e driver)
>>>> which is not handled in any way by a DPDK driver.
>>>> From the datasheet chapter 8.4.1:
>>>>
>>>> "? A single transmit packet may span up to 8 buffers (up to 8 data
>>>> descriptors per packet including both the header and payload buffers).
>>>> ? The total number of data descriptors for the whole TSO (explained
>>>> later on in this chapter) is unlimited as long as each segment within
>>>> the TSO obeys the previous rule (up to 8 data descriptors per segment
>>>> for both the TSO header and the segment payload buffers)."
>>> Yes, I remember the RX side just supports 5 segments per packet receiving.
>>> But what's the possible issue you thought about?
>> Note that it's the Tx side we are talking about.
>>
>> See 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 commit in linux net-next repo.
>> If such a cluster arrives and you post it on the HW ring - HW will shut this 
>> HW ring
>> down permanently. The application will see that it's ring is stuck.
> That issue was because of using more than 8 descriptors for a packet for TSO.

There is no problem transmitting a TSO packet with more than 8 fragments.
On the contrary - one can't transmit a non-TSO packet with more than 8 
fragments.
One also can't transmit a TSO packet that would contain more than 8 
fragments in a single TSO segment, including the TSO headers.

Pls., read the HW spec as I quoted above for more details.

>
>>>> This means that, for instance, long cluster with small fragments has to be
>>>> linearized before it may be placed on the HW ring.
>>> What type of size of the small fragments? Basically 2KB is the default size 
>>> of
>> mbuf of most
>>> example applications. 2KB x 8 is bigger than 1.5KB. So it is enough for the
>> maximum
>>> packet size we supported.
>>> If 1KB mbuf is used, don't expect it can transmit more than 8KB size of 
>>> packet.
>> I kinda lost u here. Again, we talk about the Tx side here and buffers
>> are not obligatory completely filled. Namely there may be a cluster with
>> 15 fragments 100 bytes each.
> The root cause is using more than 8 descriptors for a packet.

That would be the case if u wanted to SUPER simplify the HW limitation above. 
In that case u would significantly limit the set of packets that may 
be sent without linearization.

> Linux driver can help
> on reducing number of descriptors to be used by merging small size of payload
> together, right?
> It is not for TSO, it is just for packet transmitting. 2 options in my mind:
> 1. User should ensure it will not use more than 8 descriptors per packet for 
> transmitting.

This requirement is too restrictive. Pls., see above.

> 2. DPDK driver should try to merge small packet together for such case, like 
> Linux kernel driver.
> I prefer to use option 1, users should ensure that in the application or up 
> layer software,
> and keep the PMD driver as simple as possible.

The above statement is super confusing: on the one hand u suggest the 
DPDK driver to merge the small packet (fragments?) together (how?) and 
then u immediately propose the user application to do that. Could u, 
pls., clarify what exactly u suggest here?
If that's to leave it to the application - note that it would demand 
patching all existing DPDK applications that send TCP packets.

>
> But I have a thought that the maximum number of RX/TX descriptor should be 
> able to be
> queried somewhere.

There is no such thing as maximum number of Tx fragments in a TSO case. 
It's only limited by the Tx ring size.

>
> Regards,
> Helin
>>>> In more standard environments like Linux or FreeBSD drivers the solution is
>>>> straight forward - call skb_linearize()/m_collapse() corresponding.
>>>> In the non-conformist environment like DPDK life is not that easy - there 

[dpdk-dev] RFC: i40e xmit path HW limitation

2015-07-30 Thread Vlad Zolotarov


On 07/30/15 20:01, Stephen Hemminger wrote:
> On Thu, 30 Jul 2015 19:50:27 +0300
> Vlad Zolotarov  wrote:
>
>>
>> On 07/30/15 19:20, Avi Kivity wrote:
>>>
>>> On 07/30/2015 07:17 PM, Stephen Hemminger wrote:
>>>> On Thu, 30 Jul 2015 17:57:33 +0300
>>>> Vlad Zolotarov  wrote:
>>>>
>>>>> Hi, Konstantin, Helin,
>>>>> there is a documented limitation of xl710 controllers (i40e driver)
>>>>> which is not handled in any way by a DPDK driver.
>>>>>From the datasheet chapter 8.4.1:
>>>>>
>>>>> "? A single transmit packet may span up to 8 buffers (up to 8 data
>>>>> descriptors per packet including
>>>>> both the header and payload buffers).
>>>>> ? The total number of data descriptors for the whole TSO (explained
>>>>> later on in this chapter) is
>>>>> unlimited as long as each segment within the TSO obeys the previous
>>>>> rule (up to 8 data descriptors
>>>>> per segment for both the TSO header and the segment payload buffers)."
>>>>>
>>>>> This means that, for instance, long cluster with small fragments has to
>>>>> be linearized before it may be placed on the HW ring.
>>>>> In more standard environments like Linux or FreeBSD drivers the
>>>>> solution
>>>>> is straight forward - call skb_linearize()/m_collapse() corresponding.
>>>>> In the non-conformist environment like DPDK life is not that easy -
>>>>> there is no easy way to collapse the cluster into a linear buffer from
>>>>> inside the device driver
>>>>> since device driver doesn't allocate memory in a fast path and utilizes
>>>>> the user allocated pools only.
>>>>>
>>>>> Here are two proposals for a solution:
>>>>>
>>>>>1. We may provide a callback that would return TRUE to the user if a given
>>>>>   cluster has to be linearized; it should always be called before
>>>>>   rte_eth_tx_burst(). Alternatively it may be called from inside the
>>>>>   rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return
>>>>> some
>>>>>   error code for a case when one of the clusters it's given has
>>>>> to be
>>>>>   linearized.
>>>>>2. Another option is to allocate a mempool in the driver with the
>>>>>   elements consuming a single page each (standard 2KB buffers would
>>>>>   do). Number of elements in the pool should be as Tx ring length
>>>>>   multiplied by "64KB/(linear data length of the buffer in the pool
>>>>>   above)". Here I use 64KB as a maximum packet length and not taking
>>>>>   into an account esoteric things like "Giant" TSO mentioned in the
>>>>>   spec above. Then we may actually go and linearize the cluster if
>>>>>   needed on top of the buffers from the pool above, post the buffer
>>>>>   from the mempool above on the HW ring, link the original
>>>>> cluster to
>>>>>   that new cluster (using the private data) and release it when the
>>>>>   send is done.
>>>> Or just silently drop heavily scattered packets (and increment oerrors)
>>>> with a PMD_TX_LOG debug message.
>>>>
>>>> I think a DPDK driver doesn't have to accept all possible mbufs and do
>>>> extra work. It seems reasonable to expect caller to be well behaved
>>>> in this restricted ecosystem.
>>>>
>>> How can the caller know what's well behaved?  It's device dependent.
>> +1
>>
>> Stephen, how do you imagine this well-behaved application? Having switch
>> case by an underlying device type and then "well-behaving" correspondingly?
>> Not to mention that to "well-behave" the application writer has to read
>> HW specs and understand them, which would limit the amount of DPDK
>> developers to a very small amount of people... ;) Not to mention that
>> the mentioned above switch-case would be a super ugly thing to be found
>> in an application that would raise a big question about the
>> justification of a DPDK existence as an SDK providing a device driver
>> interface. ;)
> Either have a RTE_MAX_MBUF_SEGMENTS

And what would it be in our care? 8? This would limit the maximum TSO 
packet to 16KB for 2KB buffers.

> that is global or
> a mbuf_linearize function?  Driver already can stash the
> mbuf pool used for Rx and reuse it for the transient Tx buffers.
First of all, who can guarantee that that pool would meet our needs - 
namely, have large enough buffers?
Secondly, using the user's Rx mempool for that would be really not nice 
(read - dirty) towards the user, who may have allocated a specific 
amount of buffers in it according to some calculations that didn't 
include the usage from the Tx flow.

And lastly and most importantly, this would require using the atomic 
operations during access to Rx mempool, that would both require a 
specific mempool initialization and would significantly hit the 
performance.


>



[dpdk-dev] RFC: i40e xmit path HW limitation

2015-07-30 Thread Vlad Zolotarov


On 07/30/15 19:20, Avi Kivity wrote:
>
>
> On 07/30/2015 07:17 PM, Stephen Hemminger wrote:
>> On Thu, 30 Jul 2015 17:57:33 +0300
>> Vlad Zolotarov  wrote:
>>
>>> Hi, Konstantin, Helin,
>>> there is a documented limitation of xl710 controllers (i40e driver)
>>> which is not handled in any way by a DPDK driver.
>>>   From the datasheet chapter 8.4.1:
>>>
>>> "? A single transmit packet may span up to 8 buffers (up to 8 data 
>>> descriptors per packet including
>>> both the header and payload buffers).
>>> ? The total number of data descriptors for the whole TSO (explained 
>>> later on in this chapter) is
>>> unlimited as long as each segment within the TSO obeys the previous 
>>> rule (up to 8 data descriptors
>>> per segment for both the TSO header and the segment payload buffers)."
>>>
>>> This means that, for instance, long cluster with small fragments has to
>>> be linearized before it may be placed on the HW ring.
>>> In more standard environments like Linux or FreeBSD drivers the 
>>> solution
>>> is straight forward - call skb_linearize()/m_collapse() corresponding.
>>> In the non-conformist environment like DPDK life is not that easy -
>>> there is no easy way to collapse the cluster into a linear buffer from
>>> inside the device driver
>>> since device driver doesn't allocate memory in a fast path and utilizes
>>> the user allocated pools only.
>>>
>>> Here are two proposals for a solution:
>>>
>>>   1. We may provide a callback that would return TRUE to the user if a given
>>>  cluster has to be linearized; it should always be called before
>>>  rte_eth_tx_burst(). Alternatively it may be called from inside the
>>>  rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return 
>>> some
>>>  error code for a case when one of the clusters it's given has 
>>> to be
>>>  linearized.
>>>   2. Another option is to allocate a mempool in the driver with the
>>>  elements consuming a single page each (standard 2KB buffers would
>>>  do). Number of elements in the pool should be as Tx ring length
>>>  multiplied by "64KB/(linear data length of the buffer in the pool
>>>  above)". Here I use 64KB as a maximum packet length and not taking
>>>  into an account esoteric things like "Giant" TSO mentioned in the
>>>  spec above. Then we may actually go and linearize the cluster if
>>>  needed on top of the buffers from the pool above, post the buffer
>>>  from the mempool above on the HW ring, link the original 
>>> cluster to
>>>  that new cluster (using the private data) and release it when the
>>>  send is done.
>> Or just silently drop heavily scattered packets (and increment oerrors)
>> with a PMD_TX_LOG debug message.
>>
>> I think a DPDK driver doesn't have to accept all possible mbufs and do
>> extra work. It seems reasonable to expect caller to be well behaved
>> in this restricted ecosystem.
>>
>
> How can the caller know what's well behaved?  It's device dependent.

+1

Stephen, how do you imagine this well-behaved application? Having switch 
case by an underlying device type and then "well-behaving" correspondingly?
Not to mention that to "well-behave" the application writer has to read 
HW specs and understand them, which would limit the amount of DPDK 
developers to a very small amount of people... ;) Not to mention that 
the mentioned above switch-case would be a super ugly thing to be found 
in an application that would raise a big question about the 
justification of a DPDK existence as an SDK providing a device driver 
interface. ;)

>
>



[dpdk-dev] i40e xmit path HW limitation

2015-07-30 Thread Vlad Zolotarov


On 07/30/15 19:10, Zhang, Helin wrote:
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Thursday, July 30, 2015 7:58 AM
>> To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
>> Subject: RFC: i40e xmit path HW limitation
>>
>> Hi, Konstantin, Helin,
>> there is a documented limitation of xl710 controllers (i40e driver) which is 
>> not
>> handled in any way by a DPDK driver.
>>   From the datasheet chapter 8.4.1:
>>
>> "? A single transmit packet may span up to 8 buffers (up to 8 data 
>> descriptors per
>> packet including both the header and payload buffers).
>> ? The total number of data descriptors for the whole TSO (explained later on 
>> in
>> this chapter) is unlimited as long as each segment within the TSO obeys the
>> previous rule (up to 8 data descriptors per segment for both the TSO header 
>> and
>> the segment payload buffers)."
> Yes, I remember the RX side just supports 5 segments per packet receiving.
> But what's the possible issue you thought about?
Note that it's the Tx side we are talking about.

See 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 commit in linux net-next repo.
If such a cluster arrives and you post it on the HW ring - HW will shut 
this HW ring down permanently. The application will see that it's ring 
is stuck.

>
>> This means that, for instance, long cluster with small fragments has to be
>> linearized before it may be placed on the HW ring.
> What type of size of the small fragments? Basically 2KB is the default size 
> of mbuf of most
> example applications. 2KB x 8 is bigger than 1.5KB. So it is enough for the 
> maximum
> packet size we supported.
> If 1KB mbuf is used, don't expect it can transmit more than 8KB size of 
> packet.

I kinda lost u here. Again, we talk about the Tx side here and buffers 
are not obligatory completely filled. Namely there may be a cluster with 
15 fragments 100 bytes each.

>
>> In more standard environments like Linux or FreeBSD drivers the solution is
>> straight forward - call skb_linearize()/m_collapse() corresponding.
>> In the non-conformist environment like DPDK life is not that easy - there is 
>> no
>> easy way to collapse the cluster into a linear buffer from inside the device 
>> driver
>> since device driver doesn't allocate memory in a fast path and utilizes the 
>> user
>> allocated pools only.
>> Here are two proposals for a solution:
>>
>>   1. We may provide a callback that would return TRUE to the user if a given
>>  cluster has to be linearized; it should always be called before
>>  rte_eth_tx_burst(). Alternatively it may be called from inside the
>>  rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
>>  error code for a case when one of the clusters it's given has to be
>>  linearized.
>>   2. Another option is to allocate a mempool in the driver with the
>>  elements consuming a single page each (standard 2KB buffers would
>>  do). The number of elements in the pool should be the Tx ring length
>>  multiplied by "64KB/(linear data length of the buffer in the pool
>>  above)". Here I use 64KB as a maximum packet length, not taking
>>  into account esoteric things like the "Giant" TSO mentioned in the
>>  spec above. Then we may actually go and linearize the cluster if
>>  needed on top of the buffers from the pool above, post the buffer
>>  from the mempool above on the HW ring, link the original cluster to
>>  that new cluster (using the private data) and release it when the
>>  send is done.
>>
>>
>> The first is a change in the API and would require some additional handling
>> (linearization) from the application. The second would require some
>> additional memory but would keep all the dirty details inside the driver and
>> would leave the rest of the code intact.
>>
>> Pls., comment.
>>
>> thanks,
>> vlad
>>



[dpdk-dev] RFC: i40e xmit path HW limitation

2015-07-30 Thread Vlad Zolotarov
Hi, Konstantin, Helin,
there is a documented limitation of xl710 controllers (i40e driver) 
which is not handled in any way by a DPDK driver.
From the datasheet, chapter 8.4.1:

"- A single transmit packet may span up to 8 buffers (up to 8 data
descriptors per packet, including both the header and payload buffers).
- The total number of data descriptors for the whole TSO (explained later on
in this chapter) is unlimited as long as each segment within the TSO obeys
the previous rule (up to 8 data descriptors per segment, for both the TSO
header and the segment payload buffers)."

This means that, for instance, a long cluster with small fragments has to 
be linearized before it may be placed on the HW ring.
In more standard environments like Linux or FreeBSD drivers the solution 
is straightforward - call skb_linearize()/m_collapse() respectively.
In the non-conformist environment like DPDK life is not that easy - there 
is no easy way to collapse the cluster into a linear buffer from inside 
the device driver, since the device driver doesn't allocate memory in the 
fast path and utilizes the user-allocated pools only.

Here are two proposals for a solution:

 1. We may provide a callback that would return TRUE to the user if a given
cluster has to be linearized; it should always be called before
rte_eth_tx_burst(). Alternatively it may be called from inside
rte_eth_tx_burst(), and rte_eth_tx_burst() is changed to return an
error code for the case when one of the clusters it is given has to
be linearized.
 2. Another option is to allocate a mempool in the driver with the
elements consuming a single page each (standard 2KB buffers would
do). The number of elements in the pool should be the Tx ring length
multiplied by "64KB/(linear data length of the buffer in the pool
above)". Here I use 64KB as a maximum packet length, not taking
into account esoteric things like the "Giant" TSO mentioned in the
spec above. Then we may actually go and linearize the cluster if
needed on top of the buffers from the pool above, post the buffer
from the mempool above on the HW ring, link the original cluster to
that new cluster (using the private data) and release it when the
send is done.
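The sizing rule in option 2 can be sketched as follows; the function name and the example values are illustrative only (the actual ring length and buffer size are application choices):

```c
#include <assert.h>
#include <stdint.h>

/* Number of elements for the proposed linearization mempool:
 * tx_ring_len * (64KB / linear data length of one pool buffer),
 * where 64KB is taken as the maximum packet length, ignoring the
 * "Giant" TSO case mentioned in the spec. */
static uint32_t linearize_pool_elems(uint32_t tx_ring_len,
				     uint32_t buf_data_len)
{
	const uint32_t max_pkt_len = 64 * 1024;

	return tx_ring_len * (max_pkt_len / buf_data_len);
}
```

For example, with a 512-entry Tx ring and standard 2KB buffers this gives 512 * 32 = 16384 pool elements.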


The first is a change in the API and would require some additional handling 
(linearization) from the application. The second would require some 
additional memory but would keep all the dirty details inside the driver 
and would leave the rest of the code intact.
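Proposal 1 could look roughly like the burst-level pre-check below. Everything here is an illustrative stand-in, not the real DPDK API; returning an index rather than a bare error lets the application linearize the one offending cluster instead of dropping the whole burst:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for struct rte_mbuf. */
struct mbuf {
	struct mbuf *next;
};

#define TX_MAX_SEG 8	/* xl710 limit for a non-TSO packet */

static unsigned int seg_count(const struct mbuf *m)
{
	unsigned int n = 0;

	for (; m != NULL; m = m->next)
		n++;
	return n;
}

/* Return nb_pkts if every cluster may be posted as-is; otherwise return
 * the index of the first cluster the caller has to linearize. */
static uint16_t tx_burst_check(struct mbuf *const pkts[], uint16_t nb_pkts)
{
	uint16_t i;

	for (i = 0; i < nb_pkts; i++)
		if (seg_count(pkts[i]) > TX_MAX_SEG)
			return i;
	return nb_pkts;
}
```

This is only the non-TSO half of the check; a complete implementation would also have to walk each TSO payload in MSS-sized windows to enforce the 8-descriptors-per-segment rule quoted from the datasheet.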

Pls., comment.

thanks,
vlad




[dpdk-dev] [PATCH] ethdev: fix ABI breakage in lro code

2015-07-17 Thread Vlad Zolotarov


On 07/13/15 13:26, John McNamara wrote:
> Fix for ABI breakage introduced in LRO addition. Moves
> lro bitfield to the end of the struct/member.
>
> Fixes: 8eecb3295aed (ixgbe: add LRO support)
>
> Signed-off-by: John McNamara 
> ---
>   lib/librte_ether/rte_ethdev.h | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 79bde89..1c3ace1 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1578,9 +1578,9 @@ struct rte_eth_dev_data {
>   uint8_t port_id;   /**< Device [external] port identifier. */
>   uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0). */
>   scattered_rx : 1,  /**< RX of scattered packets is ON(1) / 
> OFF(0) */
> - lro  : 1,  /**< RX LRO is ON(1) / OFF(0) */
>   all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
> - dev_started : 1;   /**< Device state: STARTED(1) / STOPPED(0). 
> */
> + dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). 
> */
> + lro : 1;   /**< RX LRO is ON(1) / OFF(0) */

Acked-by: Vlad Zolotarov 

>   };
>   
>   /**



[dpdk-dev] [PATCH v2 5/5] ixgbe: Add support for scattered Rx with bulk allocation.

2015-04-29 Thread Vlad Zolotarov
Simply initialize the rx_pkt_burst callback to ixgbe_recv_pkts_lro_bulk_alloc()
if the conditions are right.

This is possible because work against the HW in the LRO and scattered cases 
is exactly the same, and the LRO callback already supports bulk allocation.

Signed-off-by: Vlad Zolotarov 
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 1766c1a..63284c9 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -3780,6 +3780,11 @@ void ixgbe_set_rx_function(struct rte_eth_dev *dev)
 dev->data->port_id);

dev->rx_pkt_burst = ixgbe_recv_scattered_pkts_vec;
+   } else if (adapter->rx_bulk_alloc_allowed) {
+   PMD_INIT_LOG(INFO, "Using a Scattered with bulk "
+  "allocation callback (port=%d).",
+dev->data->port_id);
+   dev->rx_pkt_burst = ixgbe_recv_pkts_lro_bulk_alloc;
} else {
PMD_INIT_LOG(DEBUG, "Using Regualr (non-vector, "
"single allocation) "
-- 
2.1.0



[dpdk-dev] [PATCH v2 4/5] ixgbe: Kill ixgbe_recv_scattered_pkts()

2015-04-29 Thread Vlad Zolotarov
Kill ixgbe_recv_scattered_pkts() - use ixgbe_recv_pkts_lro_single_alloc()
instead.

Work against HW queues in LRO and scattered Rx cases is exactly the same.
Therefore we may drop the inferior callback.

This patch also changes the sw_rsc_ring allocation in 
ixgbe_dev_rx_queue_setup() to always allocate sw_rsc_ring instead of 
allocating it only in the cases when it may be needed: LRO and/or 
scattered Rx.

This will only impose sizeof(void*) * IXGBE_MAX_RING_DESC = 32KB of overhead
per Rx queue as the price for much simpler code, which seems reasonable.
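The 32KB figure checks out as follows, assuming a 64-bit build (sizeof(void *) == 8) and the IXGBE_MAX_RING_DESC value of 4096 from ixgbe_rxtx.h at the time:

```c
#include <assert.h>
#include <stddef.h>

#define IXGBE_MAX_RING_DESC 4096

/* sw_rsc_ring (sw_sc_ring after the rename) holds one pointer-sized
 * fbuf entry per HW descriptor. */
static size_t sc_ring_overhead(void)
{
	return sizeof(void *) * IXGBE_MAX_RING_DESC;
}
```

On an LP64 platform this is 8 * 4096 = 32768 bytes = 32KB per Rx queue.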

Signed-off-by: Vlad Zolotarov 
---
New in v2:
   - Always allocate sw_sc_ring.
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   2 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   3 -
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 276 +++-
 3 files changed, 22 insertions(+), 259 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index aec1de9..5f9a1cf 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -986,7 +986,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
 * RX function */
if (rte_eal_process_type() != RTE_PROC_PRIMARY){
if (eth_dev->data->scattered_rx)
-   eth_dev->rx_pkt_burst = ixgbe_recv_scattered_pkts;
+   eth_dev->rx_pkt_burst = 
ixgbe_recv_pkts_lro_single_alloc;
return 0;
}

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index 5b90115..419ea5d 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -352,9 +352,6 @@ void ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
 uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

-uint16_t ixgbe_recv_scattered_pkts(void *rx_queue,
-   struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
-
 uint16_t ixgbe_recv_pkts_lro_single_alloc(void *rx_queue,
struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
 uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index b335a57..1766c1a 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -1722,239 +1722,6 @@ ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct 
rte_mbuf **rx_pkts,
return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, true);
 }

-uint16_t
-ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
- uint16_t nb_pkts)
-{
-   struct ixgbe_rx_queue *rxq;
-   volatile union ixgbe_adv_rx_desc *rx_ring;
-   volatile union ixgbe_adv_rx_desc *rxdp;
-   struct ixgbe_rx_entry *sw_ring;
-   struct ixgbe_rx_entry *rxe;
-   struct rte_mbuf *first_seg;
-   struct rte_mbuf *last_seg;
-   struct rte_mbuf *rxm;
-   struct rte_mbuf *nmb;
-   union ixgbe_adv_rx_desc rxd;
-   uint64_t dma; /* Physical address of mbuf data buffer */
-   uint32_t staterr;
-   uint16_t rx_id;
-   uint16_t nb_rx;
-   uint16_t nb_hold;
-   uint16_t data_len;
-
-   nb_rx = 0;
-   nb_hold = 0;
-   rxq = rx_queue;
-   rx_id = rxq->rx_tail;
-   rx_ring = rxq->rx_ring;
-   sw_ring = rxq->sw_ring;
-
-   /*
-* Retrieve RX context of current packet, if any.
-*/
-   first_seg = rxq->pkt_first_seg;
-   last_seg = rxq->pkt_last_seg;
-
-   while (nb_rx < nb_pkts) {
-   next_desc:
-   /*
-* The order of operations here is important as the DD status
-* bit must not be read after any other descriptor fields.
-* rx_ring and rxdp are pointing to volatile data so the order
-* of accesses cannot be reordered by the compiler. If they were
-* not volatile, they could be reordered which could lead to
-* using invalid descriptor fields when read from rxd.
-*/
-   rxdp = &rx_ring[rx_id];
-   staterr = rxdp->wb.upper.status_error;
-   if (! (staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))
-   break;
-   rxd = *rxdp;
-
-   /*
-* Descriptor done.
-*
-* Allocate a new mbuf to replenish the RX ring descriptor.
-* If the allocation fails:
-*- arrange for that RX descriptor to be the first one
-*  being parsed the next time the receive function is
-*  invoked [on the same queue].
-*
-*- Stop parsing the RX ring and return immediately.
-*
-* This policy does not drop the packet received in the RX
-* descriptor for whic

[dpdk-dev] [PATCH v2 3/5] ixgbe: Rename yy_rsc_xx -> yy_sc/scattered_rx_xx

2015-04-29 Thread Vlad Zolotarov
   - ixgbe_rsc_entry -> ixgbe_scattered_rx_entry
   - sw_rsc_ring -> sw_sc_ring
   - ixgbe_free_rsc_cluster() -> ixgbe_free_sc_cluster()
   - In local variables: xx_rsc_yy -> xx_sc_yy

Signed-off-by: Vlad Zolotarov 
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 48 +++
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h |  4 ++--
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index a45f51e..b335a57 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -1466,7 +1466,7 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts,
struct ixgbe_rx_queue *rxq = rx_queue;
volatile union ixgbe_adv_rx_desc *rx_ring = rxq->rx_ring;
struct ixgbe_rx_entry *sw_ring = rxq->sw_ring;
-   struct ixgbe_rsc_entry *sw_rsc_ring = rxq->sw_rsc_ring;
+   struct ixgbe_scattered_rx_entry *sw_sc_ring = rxq->sw_sc_ring;
uint16_t rx_id = rxq->rx_tail;
uint16_t nb_rx = 0;
uint16_t nb_hold = rxq->nb_rx_hold;
@@ -1475,8 +1475,8 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts,
while (nb_rx < nb_pkts) {
bool eop;
struct ixgbe_rx_entry *rxe;
-   struct ixgbe_rsc_entry *rsc_entry;
-   struct ixgbe_rsc_entry *next_rsc_entry;
+   struct ixgbe_scattered_rx_entry *sc_entry;
+   struct ixgbe_scattered_rx_entry *next_sc_entry;
struct ixgbe_rx_entry *next_rxe;
struct rte_mbuf *first_seg;
struct rte_mbuf *rxm;
@@ -1619,14 +1619,14 @@ next_desc:
else
nextp_id = next_id;

-   next_rsc_entry = &sw_rsc_ring[nextp_id];
+   next_sc_entry = &sw_sc_ring[nextp_id];
next_rxe = &sw_ring[nextp_id];
rte_ixgbe_prefetch(next_rxe);
}

-   rsc_entry = &sw_rsc_ring[rx_id];
-   first_seg = rsc_entry->fbuf;
-   rsc_entry->fbuf = NULL;
+   sc_entry = &sw_sc_ring[rx_id];
+   first_seg = sc_entry->fbuf;
+   sc_entry->fbuf = NULL;

/*
 * If this is the first buffer of the received packet,
@@ -1651,11 +1651,11 @@ next_desc:
/*
 * If this is not the last buffer of the received packet, update
 * the pointer to the first mbuf at the NEXTP entry in the
-* sw_rsc_ring and continue to parse the RX ring.
+* sw_sc_ring and continue to parse the RX ring.
 */
if (!eop) {
rxm->next = next_rxe->mbuf;
-   next_rsc_entry->fbuf = first_seg;
+   next_sc_entry->fbuf = first_seg;
goto next_desc;
}

@@ -2305,7 +2305,7 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 }

 /**
- * ixgbe_free_rsc_cluster - free the not-yet-completed RSC cluster
+ * ixgbe_free_sc_cluster - free the not-yet-completed scattered cluster
  *
  * The "next" pointer of the last segment of (not-yet-completed) RSC clusters
  * in the sw_rsc_ring is not set to NULL but rather points to the next
@@ -2314,10 +2314,10 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
  * will just free first "nb_segs" segments of the cluster explicitly by calling
  * an rte_pktmbuf_free_seg().
  *
- * @m RSC cluster head
+ * @m scattered cluster head
  */
 static void
-ixgbe_free_rsc_cluster(struct rte_mbuf *m)
+ixgbe_free_sc_cluster(struct rte_mbuf *m)
 {
uint8_t i, nb_segs = m->nb_segs;
struct rte_mbuf *next_seg;
@@ -2353,11 +2353,11 @@ ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq)
 #endif
}

-   if (rxq->sw_rsc_ring)
+   if (rxq->sw_sc_ring)
for (i = 0; i < rxq->nb_rx_desc; i++)
-   if (rxq->sw_rsc_ring[i].fbuf) {
-   ixgbe_free_rsc_cluster(rxq->sw_rsc_ring[i].fbuf);
-   rxq->sw_rsc_ring[i].fbuf = NULL;
+   if (rxq->sw_sc_ring[i].fbuf) {
+   ixgbe_free_sc_cluster(rxq->sw_sc_ring[i].fbuf);
+   rxq->sw_sc_ring[i].fbuf = NULL;
}
 }

@@ -2367,7 +2367,7 @@ ixgbe_rx_queue_release(struct ixgbe_rx_queue *rxq)
if (rxq != NULL) {
ixgbe_rx_queue_release_mbufs(rxq);
rte_free(rxq->sw_ring);
-   rte_free(rxq->sw_rsc_ring);
+   rte_free(rxq->sw_sc_ring);
rte_free(rxq);
}
 }
@@ -2624,20 +2624,20 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
}

 

[dpdk-dev] [PATCH v2 2/5] ixgbe: ixgbe_rx_queue: remove unused rsc_en field

2015-04-29 Thread Vlad Zolotarov
Signed-off-by: Vlad Zolotarov 
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 3 ---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h | 1 -
 2 files changed, 4 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 60344a9..a45f51e 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -2489,7 +2489,6 @@ ixgbe_reset_rx_queue(struct ixgbe_adapter *adapter, 
struct ixgbe_rx_queue *rxq)
rxq->nb_rx_hold = 0;
rxq->pkt_first_seg = NULL;
rxq->pkt_last_seg = NULL;
-   rxq->rsc_en = 0;
 }

 int
@@ -4188,8 +4187,6 @@ ixgbe_set_rsc(struct rte_eth_dev *dev)
 * interrupt vector.
 */
ixgbe_set_ivar(dev, rxq->reg_idx, i, 0);
-
-   rxq->rsc_en = 1;
}

dev->data->lro = 1;
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
index 4d77042..a1bcbe8 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
@@ -131,7 +131,6 @@ struct ixgbe_rx_queue {
uint8_t port_id;  /**< Device port identifier. */
uint8_t crc_len;  /**< 0 if CRC stripped, 4 otherwise. */
uint8_t drop_en;  /**< If not 0, set SRRCTL.Drop_En. */
-   uint8_t rsc_en;   /**< If not 0, RSC is enabled. */
uint8_t rx_deferred_start; /**< not in global dev start. */
 #ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC
/** need to alloc dummy mbuf, for wraparound when scanning hw ring */
-- 
2.1.0



[dpdk-dev] [PATCH v2 1/5] ixgbe: move rx_bulk_alloc_allowed and rx_vec_allowed to ixgbe_adapter

2015-04-29 Thread Vlad Zolotarov
Move the above fields from ixgbe_hw to ixgbe_adapter.

Signed-off-by: Vlad Zolotarov 
---
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |  2 --
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  8 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  3 +++
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 38 +++--
 4 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h 
b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
index 9a66370..c67d462 100644
--- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
+++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
@@ -3657,8 +3657,6 @@ struct ixgbe_hw {
bool force_full_reset;
bool allow_unsupported_sfp;
bool wol_enabled;
-   bool rx_bulk_alloc_allowed;
-   bool rx_vec_allowed;
 };

 #define ixgbe_call_func(hw, func, params, error) \
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 366aa45..aec1de9 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1428,8 +1428,8 @@ ixgbe_dev_configure(struct rte_eth_dev *dev)
 {
struct ixgbe_interrupt *intr =
IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
-   struct ixgbe_hw *hw =
-   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct ixgbe_adapter *adapter =
+   (struct ixgbe_adapter *)dev->data->dev_private;

PMD_INIT_FUNC_TRACE();

@@ -1440,8 +1440,8 @@ ixgbe_dev_configure(struct rte_eth_dev *dev)
 * Initialize to TRUE. If any of Rx queues doesn't meet the bulk
 * allocation or vector Rx preconditions we will reset it.
 */
-   hw->rx_bulk_alloc_allowed = true;
-   hw->rx_vec_allowed = true;
+   adapter->rx_bulk_alloc_allowed = true;
+   adapter->rx_vec_allowed = true;

return 0;
 }
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index e45e727..5b90115 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -265,6 +265,9 @@ struct ixgbe_adapter {
struct ixgbe_bypass_info bps;
 #endif /* RTE_NIC_BYPASS */
struct ixgbe_filter_info filter;
+
+   bool rx_bulk_alloc_allowed;
+   bool rx_vec_allowed;
 };

 #define IXGBE_DEV_PRIVATE_TO_HW(adapter)\
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 3c61d1c..60344a9 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -2442,7 +2442,7 @@ check_rx_burst_bulk_alloc_preconditions(__rte_unused 
struct ixgbe_rx_queue *rxq)

 /* Reset dynamic ixgbe_rx_queue fields back to defaults */
 static void
-ixgbe_reset_rx_queue(struct ixgbe_hw *hw, struct ixgbe_rx_queue *rxq)
+ixgbe_reset_rx_queue(struct ixgbe_adapter *adapter, struct ixgbe_rx_queue *rxq)
 {
static const union ixgbe_adv_rx_desc zeroed_desc = {{0}};
unsigned i;
@@ -2458,7 +2458,7 @@ ixgbe_reset_rx_queue(struct ixgbe_hw *hw, struct 
ixgbe_rx_queue *rxq)
 * constraints here to see if we need to zero out memory after the end
 * of the H/W descriptor ring.
 */
-   if (hw->rx_bulk_alloc_allowed)
+   if (adapter->rx_bulk_alloc_allowed)
/* zero out extra memory */
len += RTE_PMD_IXGBE_RX_MAX_BURST;

@@ -2504,6 +2504,8 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
struct ixgbe_rx_queue *rxq;
struct ixgbe_hw *hw;
uint16_t len;
+   struct ixgbe_adapter *adapter =
+   (struct ixgbe_adapter *)dev->data->dev_private;
struct rte_eth_dev_info dev_info = { 0 };
struct rte_eth_rxmode *dev_rx_mode = &dev->data->dev_conf.rxmode;
bool rsc_requested = false;
@@ -2602,7 +2604,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
"preconditions - canceling the feature for "
"the whole port[%d]",
 rxq->queue_id, rxq->port_id);
-   hw->rx_bulk_alloc_allowed = false;
+   adapter->rx_bulk_alloc_allowed = false;
}

/*
@@ -2611,7 +2613,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
 * function does not access an invalid memory region.
 */
len = nb_desc;
-   if (hw->rx_bulk_alloc_allowed)
+   if (adapter->rx_bulk_alloc_allowed)
len += RTE_PMD_IXGBE_RX_MAX_BURST;

rxq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
@@ -2644,13 +2646,13 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
"preconditions - canceling the feature for "
"the whole port[%d]",
 rxq->queue_id, rxq->port_id);
-   hw->rx_vec_allowed = false;
+ 

[dpdk-dev] [PATCH v2 0/5]: Cleanups in the ixgbe PMD

2015-04-29 Thread Vlad Zolotarov
This series includes:
   - Fix the "issue" introduced in 01fa1d6215fa7cd6b5303ac9296381b75b9226de:
 files in librte_pmd_ixgbe/ixgbe/ are shared with FreeBSD and AFAIU
 should not be changed unless the change is pushed into the FreeBSD
 tree first.
   - Remove unused rsc_en field in ixgbe_rx_queue struct.
 Thanks to Shiweixian  for pointing this out.
   - Kill the non-vector scattered Rx callback and use an appropriate LRO
 callback instead. This is possible because work against the HW in both
 the LRO and scattered Rx cases is the same. Note that this patch
 touches the ixgbevf PMD as well.
   - Use the LRO bulk callback when scattered (non-LRO) Rx is requested and
 parameters allow bulk allocation.

Note that this series is meant to clean up the PF PMD and is a follow-up 
series for my previous patches. Although the VF PMD is slightly modified 
here too, this series doesn't aim to fix or add new functionality to it. 
The VF PMD should be patched in a similar way to how I've patched the PF 
PMD in my previous series, in order to fix the same issues that were fixed 
in the PF PMD and in order to enable LRO and scattered Rx with bulk 
allocations.

New in v2:
   - Rename RSC-specific structures to "Scattered Rx" derivatives.
   - Always allocate Scattered Rx ring.

Vlad Zolotarov (5):
  ixgbe: move rx_bulk_alloc_allowed and rx_vec_allowed to ixgbe_adapter
  ixgbe: ixgbe_rx_queue: remove unused rsc_en field
  ixgbe: Rename yy_rsc_xx -> yy_sc/scattered_rx_xx
  ixgbe: Kill ixgbe_recv_scattered_pkts()
  ixgbe: Add support for scattered Rx with bulk allocation.

 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   2 -
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  10 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   6 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 360 ++--
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   5 +-
 5 files changed, 77 insertions(+), 306 deletions(-)

-- 
2.1.0



[dpdk-dev] [PATCH v1 3/4] ixgbe: Kill ixgbe_recv_scattered_pkts()

2015-04-29 Thread Vlad Zolotarov


On 04/29/15 09:47, Vlad Zolotarov wrote:
>
>
> On 04/28/15 20:42, Ananyev, Konstantin wrote:
>> Hi Vlad,
>>
>>> -Original Message-
>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
>>> Sent: Sunday, April 26, 2015 3:46 PM
>>> To: dev at dpdk.org
>>> Subject: [dpdk-dev] [PATCH v1 3/4] ixgbe: Kill 
>>> ixgbe_recv_scattered_pkts()
>>>
>>> Kill ixgbe_recv_scattered_pkts() - use 
>>> ixgbe_recv_pkts_lro_single_alloc()
>>> instead.
>>>
>>> Work against HW queues in LRO and scattered Rx cases is exactly the 
>>> same.
>>> Therefore we may drop the inferior callback.
>>>
>>> Signed-off-by: Vlad Zolotarov 
>>> ---
>>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   2 +-
>>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   3 -
>>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 243 
>>> +---
>>>   3 files changed, 7 insertions(+), 241 deletions(-)
>>>
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
>>> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>> index aec1de9..5f9a1cf 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>> @@ -986,7 +986,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
>>>* RX function */
>>>   if (rte_eal_process_type() != RTE_PROC_PRIMARY){
>>>   if (eth_dev->data->scattered_rx)
>>> -eth_dev->rx_pkt_burst = ixgbe_recv_scattered_pkts;
>>> +eth_dev->rx_pkt_burst = ixgbe_recv_pkts_lro_single_alloc;
>>>   return 0;
>>>   }
>>>
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
>>> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>>> index 5b90115..419ea5d 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>>> @@ -352,9 +352,6 @@ void ixgbevf_dev_rxtx_start(struct rte_eth_dev 
>>> *dev);
>>>   uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>>>   uint16_t nb_pkts);
>>>
>>> -uint16_t ixgbe_recv_scattered_pkts(void *rx_queue,
>>> -struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>>> -
>>>   uint16_t ixgbe_recv_pkts_lro_single_alloc(void *rx_queue,
>>>   struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>>>   uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
>>> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> index a45f51e..c23e20f 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> @@ -1722,239 +1722,6 @@ ixgbe_recv_pkts_lro_bulk_alloc(void 
>>> *rx_queue, struct rte_mbuf **rx_pkts,
>>>   return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, true);
>>>   }
>>>
>>> -uint16_t
>>> -ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>>> -  uint16_t nb_pkts)
>>> -{
>>> -struct ixgbe_rx_queue *rxq;
>>> -volatile union ixgbe_adv_rx_desc *rx_ring;
>>> -volatile union ixgbe_adv_rx_desc *rxdp;
>>> -struct ixgbe_rx_entry *sw_ring;
>>> -struct ixgbe_rx_entry *rxe;
>>> -struct rte_mbuf *first_seg;
>>> -struct rte_mbuf *last_seg;
>>> -struct rte_mbuf *rxm;
>>> -struct rte_mbuf *nmb;
>>> -union ixgbe_adv_rx_desc rxd;
>>> -uint64_t dma; /* Physical address of mbuf data buffer */
>>> -uint32_t staterr;
>>> -uint16_t rx_id;
>>> -uint16_t nb_rx;
>>> -uint16_t nb_hold;
>>> -uint16_t data_len;
>>> -
>>> -nb_rx = 0;
>>> -nb_hold = 0;
>>> -rxq = rx_queue;
>>> -rx_id = rxq->rx_tail;
>>> -rx_ring = rxq->rx_ring;
>>> -sw_ring = rxq->sw_ring;
>>> -
>>> -/*
>>> - * Retrieve RX context of current packet, if any.
>>> - */
>>> -first_seg = rxq->pkt_first_seg;
>>> -last_seg = rxq->pkt_last_seg;
>>> -
>>> -while (nb_rx < nb_pkts) {
>>> -next_desc:
>>> -/*
>>> - * The order of operations here is important as the DD status
>>> - * bit must not be read after any other descriptor fields.
>>> - * rx_ring and rxdp are pointing to volatile data so the order
>>> 

[dpdk-dev] [PATCH v1 3/4] ixgbe: Kill ixgbe_recv_scattered_pkts()

2015-04-29 Thread Vlad Zolotarov


On 04/28/15 20:42, Ananyev, Konstantin wrote:
> Hi Vlad,
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
>> Sent: Sunday, April 26, 2015 3:46 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH v1 3/4] ixgbe: Kill ixgbe_recv_scattered_pkts()
>>
>> Kill ixgbe_recv_scattered_pkts() - use ixgbe_recv_pkts_lro_single_alloc()
>> instead.
>>
>> Work against HW queues in LRO and scattered Rx cases is exactly the same.
>> Therefore we may drop the inferior callback.
>>
>> Signed-off-by: Vlad Zolotarov 
>> ---
>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   2 +-
>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   3 -
>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 243 
>> +---
>>   3 files changed, 7 insertions(+), 241 deletions(-)
>>
>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
>> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>> index aec1de9..5f9a1cf 100644
>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>> @@ -986,7 +986,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
>>   * RX function */
>>  if (rte_eal_process_type() != RTE_PROC_PRIMARY){
>>  if (eth_dev->data->scattered_rx)
>> -eth_dev->rx_pkt_burst = ixgbe_recv_scattered_pkts;
>> +eth_dev->rx_pkt_burst = 
>> ixgbe_recv_pkts_lro_single_alloc;
>>  return 0;
>>  }
>>
>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
>> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>> index 5b90115..419ea5d 100644
>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>> @@ -352,9 +352,6 @@ void ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
>>   uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>>  uint16_t nb_pkts);
>>
>> -uint16_t ixgbe_recv_scattered_pkts(void *rx_queue,
>> -struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>> -
>>   uint16_t ixgbe_recv_pkts_lro_single_alloc(void *rx_queue,
>>  struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>>   uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
>> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>> index a45f51e..c23e20f 100644
>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>> @@ -1722,239 +1722,6 @@ ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, 
>> struct rte_mbuf **rx_pkts,
>>  return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, true);
>>   }
>>
>> -uint16_t
>> -ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>> -  uint16_t nb_pkts)
>> -{
>> -struct ixgbe_rx_queue *rxq;
>> -volatile union ixgbe_adv_rx_desc *rx_ring;
>> -volatile union ixgbe_adv_rx_desc *rxdp;
>> -struct ixgbe_rx_entry *sw_ring;
>> -struct ixgbe_rx_entry *rxe;
>> -struct rte_mbuf *first_seg;
>> -struct rte_mbuf *last_seg;
>> -struct rte_mbuf *rxm;
>> -struct rte_mbuf *nmb;
>> -union ixgbe_adv_rx_desc rxd;
>> -uint64_t dma; /* Physical address of mbuf data buffer */
>> -uint32_t staterr;
>> -uint16_t rx_id;
>> -uint16_t nb_rx;
>> -uint16_t nb_hold;
>> -uint16_t data_len;
>> -
>> -nb_rx = 0;
>> -nb_hold = 0;
>> -rxq = rx_queue;
>> -rx_id = rxq->rx_tail;
>> -rx_ring = rxq->rx_ring;
>> -sw_ring = rxq->sw_ring;
>> -
>> -/*
>> - * Retrieve RX context of current packet, if any.
>> - */
>> -first_seg = rxq->pkt_first_seg;
>> -last_seg = rxq->pkt_last_seg;
>> -
>> -while (nb_rx < nb_pkts) {
>> -next_desc:
>> -/*
>> - * The order of operations here is important as the DD status
>> - * bit must not be read after any other descriptor fields.
>> - * rx_ring and rxdp are pointing to volatile data so the order
>> - * of accesses cannot be reordered by the compiler. If they were
>> - * not volatile, they could be reordered which could lead to
>> - * using invalid descriptor fields when read from rxd.
>> - */
>> -rxdp = &rx_ring[rx_id];
>> -staterr = rxdp->wb.upper.status_error;
>> -if (! (staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))

[dpdk-dev] [PATCH v1 4/4] ixgbe: Add support for scattered Rx with bulk allocation.

2015-04-26 Thread Vlad Zolotarov
Simply initialize the rx_pkt_burst callback to ixgbe_recv_pkts_lro_bulk_alloc()
if the conditions are right.

This is possible because work against the HW in the LRO and scattered cases 
is exactly the same, and the LRO callback already supports bulk allocation.

Signed-off-by: Vlad Zolotarov 
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index c23e20f..6addc41 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -3783,6 +3783,11 @@ void ixgbe_set_rx_function(struct rte_eth_dev *dev)
 dev->data->port_id);

dev->rx_pkt_burst = ixgbe_recv_scattered_pkts_vec;
+   } else if (adapter->rx_bulk_alloc_allowed) {
+   PMD_INIT_LOG(INFO, "Using a Scattered with bulk "
+  "allocation callback (port=%d).",
+dev->data->port_id);
+   dev->rx_pkt_burst = ixgbe_recv_pkts_lro_bulk_alloc;
} else {
PMD_INIT_LOG(DEBUG, "Using Regualr (non-vector, "
"single allocation) "
-- 
2.1.0



[dpdk-dev] [PATCH v1 3/4] ixgbe: Kill ixgbe_recv_scattered_pkts()

2015-04-26 Thread Vlad Zolotarov
Kill ixgbe_recv_scattered_pkts() - use ixgbe_recv_pkts_lro_single_alloc()
instead.

Work against HW queues in LRO and scattered Rx cases is exactly the same.
Therefore we may drop the inferior callback.

Signed-off-by: Vlad Zolotarov 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   2 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   3 -
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 243 +---
 3 files changed, 7 insertions(+), 241 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index aec1de9..5f9a1cf 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -986,7 +986,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
 * RX function */
if (rte_eal_process_type() != RTE_PROC_PRIMARY){
if (eth_dev->data->scattered_rx)
-   eth_dev->rx_pkt_burst = ixgbe_recv_scattered_pkts;
+   eth_dev->rx_pkt_burst = 
ixgbe_recv_pkts_lro_single_alloc;
return 0;
}

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index 5b90115..419ea5d 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -352,9 +352,6 @@ void ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
 uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

-uint16_t ixgbe_recv_scattered_pkts(void *rx_queue,
-   struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
-
 uint16_t ixgbe_recv_pkts_lro_single_alloc(void *rx_queue,
struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
 uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index a45f51e..c23e20f 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -1722,239 +1722,6 @@ ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct 
rte_mbuf **rx_pkts,
return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, true);
 }

-uint16_t
-ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
- uint16_t nb_pkts)
-{
-   struct ixgbe_rx_queue *rxq;
-   volatile union ixgbe_adv_rx_desc *rx_ring;
-   volatile union ixgbe_adv_rx_desc *rxdp;
-   struct ixgbe_rx_entry *sw_ring;
-   struct ixgbe_rx_entry *rxe;
-   struct rte_mbuf *first_seg;
-   struct rte_mbuf *last_seg;
-   struct rte_mbuf *rxm;
-   struct rte_mbuf *nmb;
-   union ixgbe_adv_rx_desc rxd;
-   uint64_t dma; /* Physical address of mbuf data buffer */
-   uint32_t staterr;
-   uint16_t rx_id;
-   uint16_t nb_rx;
-   uint16_t nb_hold;
-   uint16_t data_len;
-
-   nb_rx = 0;
-   nb_hold = 0;
-   rxq = rx_queue;
-   rx_id = rxq->rx_tail;
-   rx_ring = rxq->rx_ring;
-   sw_ring = rxq->sw_ring;
-
-   /*
-* Retrieve RX context of current packet, if any.
-*/
-   first_seg = rxq->pkt_first_seg;
-   last_seg = rxq->pkt_last_seg;
-
-   while (nb_rx < nb_pkts) {
-   next_desc:
-   /*
-* The order of operations here is important as the DD status
-* bit must not be read after any other descriptor fields.
-* rx_ring and rxdp are pointing to volatile data so the order
-* of accesses cannot be reordered by the compiler. If they were
-* not volatile, they could be reordered which could lead to
-* using invalid descriptor fields when read from rxd.
-*/
-   rxdp = &rx_ring[rx_id];
-   staterr = rxdp->wb.upper.status_error;
-   if (! (staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))
-   break;
-   rxd = *rxdp;
-
-   /*
-* Descriptor done.
-*
-* Allocate a new mbuf to replenish the RX ring descriptor.
-* If the allocation fails:
-*- arrange for that RX descriptor to be the first one
-*  being parsed the next time the receive function is
-*  invoked [on the same queue].
-*
-*- Stop parsing the RX ring and return immediately.
-*
-* This policy does not drop the packet received in the RX
-* descriptor for which the allocation of a new mbuf failed.
-* Thus, it allows that packet to be later retrieved if
-* mbuf have been freed in the mean time.
-* As a side effect, holding RX descriptors instead of
-* systematically giving them back to the NIC may lead to
-* RX ring exhaustion situations.
-* However, the NIC can gracefully prevent such si

[dpdk-dev] [PATCH v1 2/4] ixgbe: ixgbe_rx_queue: remove unused rsc_en field

2015-04-26 Thread Vlad Zolotarov
Signed-off-by: Vlad Zolotarov 
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 3 ---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h | 1 -
 2 files changed, 4 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 60344a9..a45f51e 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -2489,7 +2489,6 @@ ixgbe_reset_rx_queue(struct ixgbe_adapter *adapter, 
struct ixgbe_rx_queue *rxq)
rxq->nb_rx_hold = 0;
rxq->pkt_first_seg = NULL;
rxq->pkt_last_seg = NULL;
-   rxq->rsc_en = 0;
 }

 int
@@ -4188,8 +4187,6 @@ ixgbe_set_rsc(struct rte_eth_dev *dev)
 * interrupt vector.
 */
ixgbe_set_ivar(dev, rxq->reg_idx, i, 0);
-
-   rxq->rsc_en = 1;
}

dev->data->lro = 1;
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
index 4d77042..a1bcbe8 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
@@ -131,7 +131,6 @@ struct ixgbe_rx_queue {
uint8_t port_id;  /**< Device port identifier. */
uint8_t crc_len;  /**< 0 if CRC stripped, 4 otherwise. */
uint8_t drop_en;  /**< If not 0, set SRRCTL.Drop_En. */
-   uint8_t rsc_en;   /**< If not 0, RSC is enabled. */
uint8_t rx_deferred_start; /**< not in global dev start. */
 #ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC
/** need to alloc dummy mbuf, for wraparound when scanning hw ring */
-- 
2.1.0



[dpdk-dev] [PATCH v1 1/4] ixgbe: move rx_bulk_alloc_allowed and rx_vec_allowed to ixgbe_adapter

2015-04-26 Thread Vlad Zolotarov
Move the above fields from ixgbe_hw to ixgbe_adapter.

Signed-off-by: Vlad Zolotarov 
---
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |  2 --
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  8 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  3 +++
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 38 +++--
 4 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h 
b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
index 9a66370..c67d462 100644
--- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
+++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
@@ -3657,8 +3657,6 @@ struct ixgbe_hw {
bool force_full_reset;
bool allow_unsupported_sfp;
bool wol_enabled;
-   bool rx_bulk_alloc_allowed;
-   bool rx_vec_allowed;
 };

 #define ixgbe_call_func(hw, func, params, error) \
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 366aa45..aec1de9 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1428,8 +1428,8 @@ ixgbe_dev_configure(struct rte_eth_dev *dev)
 {
struct ixgbe_interrupt *intr =
IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
-   struct ixgbe_hw *hw =
-   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct ixgbe_adapter *adapter =
+   (struct ixgbe_adapter *)dev->data->dev_private;

PMD_INIT_FUNC_TRACE();

@@ -1440,8 +1440,8 @@ ixgbe_dev_configure(struct rte_eth_dev *dev)
 * Initialize to TRUE. If any of Rx queues doesn't meet the bulk
 * allocation or vector Rx preconditions we will reset it.
 */
-   hw->rx_bulk_alloc_allowed = true;
-   hw->rx_vec_allowed = true;
+   adapter->rx_bulk_alloc_allowed = true;
+   adapter->rx_vec_allowed = true;

return 0;
 }
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index e45e727..5b90115 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -265,6 +265,9 @@ struct ixgbe_adapter {
	struct ixgbe_bypass_info bps;
 #endif /* RTE_NIC_BYPASS */
	struct ixgbe_filter_info filter;
+
+   bool rx_bulk_alloc_allowed;
+   bool rx_vec_allowed;
 };

 #define IXGBE_DEV_PRIVATE_TO_HW(adapter)\
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 3c61d1c..60344a9 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -2442,7 +2442,7 @@ check_rx_burst_bulk_alloc_preconditions(__rte_unused 
struct ixgbe_rx_queue *rxq)

 /* Reset dynamic ixgbe_rx_queue fields back to defaults */
 static void
-ixgbe_reset_rx_queue(struct ixgbe_hw *hw, struct ixgbe_rx_queue *rxq)
+ixgbe_reset_rx_queue(struct ixgbe_adapter *adapter, struct ixgbe_rx_queue *rxq)
 {
static const union ixgbe_adv_rx_desc zeroed_desc = {{0}};
unsigned i;
@@ -2458,7 +2458,7 @@ ixgbe_reset_rx_queue(struct ixgbe_hw *hw, struct 
ixgbe_rx_queue *rxq)
 * constraints here to see if we need to zero out memory after the end
 * of the H/W descriptor ring.
 */
-   if (hw->rx_bulk_alloc_allowed)
+   if (adapter->rx_bulk_alloc_allowed)
/* zero out extra memory */
len += RTE_PMD_IXGBE_RX_MAX_BURST;

@@ -2504,6 +2504,8 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
struct ixgbe_rx_queue *rxq;
struct ixgbe_hw *hw;
uint16_t len;
+   struct ixgbe_adapter *adapter =
+   (struct ixgbe_adapter *)dev->data->dev_private;
struct rte_eth_dev_info dev_info = { 0 };
	struct rte_eth_rxmode *dev_rx_mode = &dev->data->dev_conf.rxmode;
bool rsc_requested = false;
@@ -2602,7 +2604,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
"preconditions - canceling the feature for "
"the whole port[%d]",
 rxq->queue_id, rxq->port_id);
-   hw->rx_bulk_alloc_allowed = false;
+   adapter->rx_bulk_alloc_allowed = false;
}

/*
@@ -2611,7 +2613,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
 * function does not access an invalid memory region.
 */
len = nb_desc;
-   if (hw->rx_bulk_alloc_allowed)
+   if (adapter->rx_bulk_alloc_allowed)
len += RTE_PMD_IXGBE_RX_MAX_BURST;

rxq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
@@ -2644,13 +2646,13 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
"preconditions - canceling the feature for "
"the whole port[%d]",
 rxq->queue_id, rxq->port_id);
-   hw->rx_vec_allowed = false;
+ 

[dpdk-dev] [PATCH v1 0/4]: Cleanups in the ixgbe PMD

2015-04-26 Thread Vlad Zolotarov
This series includes:
   - Fix the "issue" introduced in 01fa1d6215fa7cd6b5303ac9296381b75b9226de:
 files in librte_pmd_ixgbe/ixgbe/ are shared with FreeBSD and AFAIU should 
not
 be changed unless the change is pushed into the FreeBSD tree first.
   - Remove unused rsc_en field in ixgbe_rx_queue struct.
 Thanks to Shiweixian  for pointing this out.
   - Kill the non-vector scattered Rx callback and use an appropriate LRO 
callback
 instead. This is possible because work against HW in both LRO and 
scattered RX
 cases is the same. Note that this patch touches the ixgbevf PMD as well.
   - Use LRO bulk callback when scattered (non-LRO) Rx is requested and 
parameters
 allow bulk allocation.

Note that this series is meant to clean up the PF PMD and is a follow-up to my
previous patches. Although the VF PMD is slightly modified here too, this series
isn't meant to fix or add new functionality to it. The VF PMD should be patched
in a similar way to how I've patched the PF PMD in my previous series, in order
to fix the same issues that were fixed in the PF PMD and to enable LRO and
scattered Rx with bulk allocation.
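The first patch in the series works because the DPDK-only flags can live in the driver's private data (reachable through `dev->data->dev_private`) instead of the base-driver `ixgbe_hw` struct that is shared with FreeBSD. A hypothetical sketch of that layout (all names here are illustrative stand-ins, not the real DPDK types):

```c
#include <stddef.h>

/* Stand-in for the shared base-driver struct: must not be changed. */
struct ixgbe_hw_sketch {
        int mac_type;
};

/* Stand-in for the DPDK-private adapter struct: DPDK-only flags,
 * such as the Rx-path capabilities moved by patch 1/4, live here. */
struct ixgbe_adapter_sketch {
        struct ixgbe_hw_sketch hw;      /* embedded base-driver state */
        int rx_bulk_alloc_allowed;
        int rx_vec_allowed;
};

/* dev_private points at the adapter, so both views are reachable. */
struct ixgbe_adapter_sketch *to_adapter(void *dev_private)
{
        return (struct ixgbe_adapter_sketch *)dev_private;
}

struct ixgbe_hw_sketch *to_hw(void *dev_private)
{
        return &((struct ixgbe_adapter_sketch *)dev_private)->hw;
}

/* Sanity check: both accessors resolve into the same allocation. */
int private_layout_ok(void)
{
        struct ixgbe_adapter_sketch a;

        return to_adapter(&a) == &a && to_hw(&a) == &a.hw;
}
```

This mirrors how `IXGBE_DEV_PRIVATE_TO_HW()` and the new adapter casts in the series both start from the same `dev_private` pointer.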

Vlad Zolotarov (4):
  ixgbe: move rx_bulk_alloc_allowed and rx_vec_allowed to ixgbe_adapter
  ixgbe: ixgbe_rx_queue: remove unused rsc_en field
  ixgbe: Kill ixgbe_recv_scattered_pkts()
  ixgbe: Add support for scattered Rx with bulk allocation.

 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   2 -
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  10 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   6 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 289 
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   1 -
 5 files changed, 41 insertions(+), 267 deletions(-)

-- 
2.1.0



[dpdk-dev] DCA

2015-04-20 Thread Vlad Zolotarov
Hi,
I would like to ask if there is any reason why DPDK doesn't have support
for the DCA (Direct Cache Access) feature?

thanks,
vlad


[dpdk-dev] [PATCH v3 2/2] use simple zero initializers

2015-04-19 Thread Vlad Zolotarov


On 04/17/15 01:10, Thomas Monjalon wrote:
> To initialize a structure with zeros, one field was explicitly set
> to avoid "missing initializer" bug with old GCC (e.g. 4.4).
> This warning is now disabled (commit ) for old versions of GCC,
> so the workarounds may be removed.
>
> These initializers should not be needed for static variables but they
> are still used to workaround an ICC bug (see commit b2595c4aa92d).
>
> There is one remaining exception where {0} initializer doesn't work cleanly,
> even with recent GCC:
> lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c:735:9:
> error: missing braces around initializer [-Werror=missing-braces]
>struct rte_mbuf mb_def = {0}; /* zeroed mbuf */
>
> Tested with gcc-4.4.7 (CentOS), gcc-4.7.2 (Debian), gcc-4.9.2 (Arch),
> clang-3.6.0 and icc-13.1.1.
>
> Signed-off-by: Thomas Monjalon 
> Tested-by: Thomas Monjalon 
> Tested-by: John McNamara 

Acked-by: Vlad Zolotarov 

> ---
> changes in v2:
> - new patch
> changes in v3:
> - tested with clang and icc
>
>   app/test/test_ring_perf.c | 2 +-
>   lib/librte_pmd_e1000/em_ethdev.c  | 2 +-
>   lib/librte_pmd_e1000/igb_ethdev.c | 4 ++--
>   lib/librte_pmd_e1000/igb_rxtx.c   | 6 ++
>   lib/librte_pmd_enic/enic_clsf.c   | 2 +-
>   lib/librte_pmd_i40e/i40e_rxtx.c   | 2 +-
>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 8 +++-
>   lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c | 3 +--
>   lib/librte_pmd_mlx4/mlx4.c| 2 +-
>   9 files changed, 13 insertions(+), 18 deletions(-)
>
> diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> index 44dda4d..8c47ccb 100644
> --- a/app/test/test_ring_perf.c
> +++ b/app/test/test_ring_perf.c
> @@ -253,7 +253,7 @@ static void
>   run_on_core_pair(struct lcore_pair *cores,
>   lcore_function_t f1, lcore_function_t f2)
>   {
> - struct thread_params param1 = {.size = 0}, param2 = {.size = 0};
> + struct thread_params param1 = {0}, param2 = {0};
>   unsigned i;
>   for (i = 0; i < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); i++) {
>   lcore_count = 0;
> diff --git a/lib/librte_pmd_e1000/em_ethdev.c 
> b/lib/librte_pmd_e1000/em_ethdev.c
> index 12ecf5f..82e0b7a 100644
> --- a/lib/librte_pmd_e1000/em_ethdev.c
> +++ b/lib/librte_pmd_e1000/em_ethdev.c
> @@ -130,7 +130,7 @@ static struct rte_pci_id pci_id_em_map[] = {
>   #define RTE_PCI_DEV_ID_DECL_EM(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
>   #include "rte_pci_dev_ids.h"
>   
> -{.device_id = 0},
> +{0},
>   };
>   
>   static const struct eth_dev_ops eth_em_ops = {
> diff --git a/lib/librte_pmd_e1000/igb_ethdev.c 
> b/lib/librte_pmd_e1000/igb_ethdev.c
> index 1ea2d38..e2b7cf3 100644
> --- a/lib/librte_pmd_e1000/igb_ethdev.c
> +++ b/lib/librte_pmd_e1000/igb_ethdev.c
> @@ -221,7 +221,7 @@ static struct rte_pci_id pci_id_igb_map[] = {
>   #define RTE_PCI_DEV_ID_DECL_IGB(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
>   #include "rte_pci_dev_ids.h"
>   
> -{.device_id = 0},
> +{0},
>   };
>   
>   /*
> @@ -232,7 +232,7 @@ static struct rte_pci_id pci_id_igbvf_map[] = {
>   #define RTE_PCI_DEV_ID_DECL_IGBVF(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
>   #include "rte_pci_dev_ids.h"
>   
> -{.device_id = 0},
> +{0},
>   };
>   
>   static const struct eth_dev_ops eth_igb_ops = {
> diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
> index 946b39d..084e45a 100644
> --- a/lib/librte_pmd_e1000/igb_rxtx.c
> +++ b/lib/librte_pmd_e1000/igb_rxtx.c
> @@ -1164,8 +1164,7 @@ igb_reset_tx_queue_stat(struct igb_tx_queue *txq)
>   static void
>   igb_reset_tx_queue(struct igb_tx_queue *txq, struct rte_eth_dev *dev)
>   {
> - static const union e1000_adv_tx_desc zeroed_desc = { .read = {
> - .buffer_addr = 0}};
> + static const union e1000_adv_tx_desc zeroed_desc = {{0}};
>   struct igb_tx_entry *txe = txq->sw_ring;
>   uint16_t i, prev;
>   struct e1000_hw *hw;
> @@ -1330,8 +1329,7 @@ eth_igb_rx_queue_release(void *rxq)
>   static void
>   igb_reset_rx_queue(struct igb_rx_queue *rxq)
>   {
> - static const union e1000_adv_rx_desc zeroed_desc = { .read = {
> - .pkt_addr = 0}};
> + static const union e1000_adv_rx_desc zeroed_desc = {{0}};
>   unsigned i;
>   
>   /* Zero out HW ring memory */
> diff --git a/lib/librte_pmd_enic/enic_clsf.c b/lib/librte_pmd_enic/enic_clsf.c
> index b61d625..a069194 100644
> --- a/lib/librte_pmd_enic/enic_clsf.c
> +++ b/lib/librte_pmd_enic/enic_clsf.c
> @@ -96,7 +96,7 @@ int enic_fdir_add_fltr(struct enic *enic, struct 
> rte_fdir_filter *params,
>   u16 

[dpdk-dev] [PATCH v3 1/2] mk: fix build with gcc 4.4 and clang

2015-04-19 Thread Vlad Zolotarov


On 04/17/15 01:10, Thomas Monjalon wrote:
> With GCC 4.4.7 from CentOS 6.5, the following errors arise:
>
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function 'ixgbe_dev_rx_queue_setup':
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:2509: error: missing initializer
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:2509: error: (near initialization for 
> 'dev_info.driver_name')
>
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function 'ixgbe_set_rsc':
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:4072: error: missing initializer
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:4072: error: (near initialization for 
> 'dev_info.driver_name')
>
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function 
> 'ixgbe_recv_pkts_lro_single_alloc':
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1479: error: 'next_rsc_entry' may be used 
> uninitialized in this function
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1480: error: 'next_rxe' may be used 
> uninitialized in this function
>
> The "missing initializer" warning is a GCC bug which seems fixed in 4.7.
> The same warning is thrown by clang.
> The "may be used uninitialized" warning is another GCC bug which seems fixed 
> in 4.7.
>
> Fixes: 8eecb3295aed ("ixgbe: add LRO support")
>
> Signed-off-by: Thomas Monjalon 

Acked-by: Vlad Zolotarov 

> ---
> changes in v2:
> - option -Wno-missing-field-initializers for old GCC instead of code 
> workaround
> changes in v3:
> - option -Wno-missing-field-initializers for clang
> - option -Wno-uninitialized for old GCC instead of code workaround (=NULL)
> - remove redundants -Wno-uninitialized from ixgbe Makefile
>
>   lib/librte_pmd_ixgbe/Makefile  | 4 
>   mk/toolchain/clang/rte.vars.mk | 3 +++
>   mk/toolchain/gcc/rte.vars.mk   | 9 +
>   3 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/lib/librte_pmd_ixgbe/Makefile b/lib/librte_pmd_ixgbe/Makefile
> index ae36202..fbf6966 100644
> --- a/lib/librte_pmd_ixgbe/Makefile
> +++ b/lib/librte_pmd_ixgbe/Makefile
> @@ -76,10 +76,6 @@ ifeq ($(shell test $(GCC_VERSION) -ge 50 && echo 1), 1)
>   CFLAGS_ixgbe_common.o += -Wno-logical-not-parentheses
>   endif
>   
> -ifeq ($(shell test $(GCC_VERSION) -le 46 && echo 1), 1)
> -CFLAGS_ixgbe_x550.o += -Wno-uninitialized
> -CFLAGS_ixgbe_phy.o += -Wno-uninitialized
> -endif
>   endif
>   
>   #
> diff --git a/mk/toolchain/clang/rte.vars.mk b/mk/toolchain/clang/rte.vars.mk
> index 40cb389..245ea7e 100644
> --- a/mk/toolchain/clang/rte.vars.mk
> +++ b/mk/toolchain/clang/rte.vars.mk
> @@ -72,5 +72,8 @@ WERROR_FLAGS += -Wundef -Wwrite-strings
>   # process cpu flags
>   include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
>   
> +# workaround clang bug with warning "missing field initializer" for "= {0}"
> +WERROR_FLAGS += -Wno-missing-field-initializers
> +
>   export CC AS AR LD OBJCOPY OBJDUMP STRIP READELF
>   export TOOLCHAIN_CFLAGS TOOLCHAIN_LDFLAGS TOOLCHAIN_ASFLAGS
> diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
> index 88f235c..0f51c66 100644
> --- a/mk/toolchain/gcc/rte.vars.mk
> +++ b/mk/toolchain/gcc/rte.vars.mk
> @@ -80,5 +80,14 @@ WERROR_FLAGS += -Wundef -Wwrite-strings
>   # process cpu flags
>   include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
>   
> +# workaround GCC bug with warning "missing initializer" for "= {0}"
> +ifeq ($(shell test $(GCC_VERSION) -lt 47 && echo 1), 1)
> +WERROR_FLAGS += -Wno-missing-field-initializers
> +endif
> +# workaround GCC bug with warning "may be used uninitialized"
> +ifeq ($(shell test $(GCC_VERSION) -lt 47 && echo 1), 1)
> +WERROR_FLAGS += -Wno-uninitialized
> +endif
> +
>   export CC AS AR LD OBJCOPY OBJDUMP STRIP READELF
>   export TOOLCHAIN_CFLAGS TOOLCHAIN_LDFLAGS TOOLCHAIN_ASFLAGS



[dpdk-dev] [PATCH v2 1/2] ixgbe: fix build with gcc 4.4

2015-04-16 Thread Vlad Zolotarov


On 04/15/15 23:49, Thomas Monjalon wrote:
> With GCC 4.4.7 from CentOS 6.5, the following errors arise:
>
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function ?ixgbe_dev_rx_queue_setup?:
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:2509: error: missing initializer
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:2509: error: (near initialization for 
> ?dev_info.driver_name?)
>
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function ?ixgbe_set_rsc?:
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:4072: error: missing initializer
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:4072: error: (near initialization for 
> ?dev_info.driver_name?)
>
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function 
> ?ixgbe_recv_pkts_lro_single_alloc?:
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1479: error: ?next_rsc_entry? may be used 
> uninitialized in this function
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1480: error: ?next_rxe? may be used 
> uninitialized in this function
>
> The "missing initializer" warning is a GCC bug which seems fixed in 4.7.
> The "may be used uninitialized" warning seems to be another GCC bug and is
> workarounded with NULL initialization.
>
> Fixes: 8eecb3295aed ("ixgbe: add LRO support")
>
> Signed-off-by: Thomas Monjalon 
> ---
> changes in v2:
> - option -Wno-missing-field-initializers for old GCC instead of code 
> workaround
>
>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 4 ++--
>   mk/toolchain/gcc/rte.vars.mk  | 5 +
>   2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index f1da9ec..6475c44 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -1476,8 +1476,8 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf 
> **rx_pkts, uint16_t nb_pkts,
>   bool eop;
>   struct ixgbe_rx_entry *rxe;
>   struct ixgbe_rsc_entry *rsc_entry;
> - struct ixgbe_rsc_entry *next_rsc_entry;
> - struct ixgbe_rx_entry *next_rxe;
> + struct ixgbe_rsc_entry *next_rsc_entry = NULL;
> + struct ixgbe_rx_entry *next_rxe = NULL;

-Wno-maybe-uninitialized ?

>   struct rte_mbuf *first_seg;
>   struct rte_mbuf *rxm;
>   struct rte_mbuf *nmb;
> diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
> index 88f235c..208cddd 100644
> --- a/mk/toolchain/gcc/rte.vars.mk
> +++ b/mk/toolchain/gcc/rte.vars.mk
> @@ -80,5 +80,10 @@ WERROR_FLAGS += -Wundef -Wwrite-strings
>   # process cpu flags
>   include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
>   
> +# workaround GCC bug with warning "missing initializer" for "= {0}"
> +ifeq ($(shell test $(GCC_VERSION) -lt 47 && echo 1), 1)
> +WERROR_FLAGS += -Wno-missing-field-initializers
> +endif
> +
>   export CC AS AR LD OBJCOPY OBJDUMP STRIP READELF
>   export TOOLCHAIN_CFLAGS TOOLCHAIN_LDFLAGS TOOLCHAIN_ASFLAGS



[dpdk-dev] [PATCH] ixgbe: fix build with gcc 4.4

2015-04-14 Thread Vlad Zolotarov


On 04/14/15 18:28, Thomas Monjalon wrote:
> 2015-04-14 18:21, Vlad Zolotarov:
>> On 04/14/15 18:13, Thomas Monjalon wrote:
>>> 2015-04-14 17:59, Vlad Zolotarov:
>>>> On 04/14/15 17:17, Thomas Monjalon wrote:
>>>>> 2015-04-14 16:38, Vlad Zolotarov:
>>>>>> On 04/14/15 16:06, Ananyev, Konstantin wrote:
>>>>>>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>>>>>>>> On 04/14/15 12:31, Thomas Monjalon wrote:
>>>>>>>>> - struct rte_eth_dev_info dev_info = { 0 };
>>>>>>>>> + struct rte_eth_dev_info dev_info = { .max_rx_queues = 0 };
>>>>>>>> Hmmm... Unless I miss something this and one above would zero only a
>>>>>>>> single field - "max_rx_queues"; and would leave the rest uninitialized.
>>>>>>>> The original code intend to zero the whole struct. The alternative to
>>>>>>>> the original lines could be usage of memset().
>>>>>>> As I understand, in that case compiler had to set all non-explicitly 
>>>>>>> initialised members to 0.
>>>>>>> So I think we are ok here.
>>>>>> Yeah, I guess it does zero-initializes the rest
>>>>>> (https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html) however I
>>>>>> don't understand how the above change fixes the error if it complains
>>>>>> about the dev_info.driver_name?
>>>>> As only 1 field is required, I chose the one which should not be removed
>>>>> from this structure in the future.
>>>>>
>>>>>> What I'm trying to say - the proposed fix is completely unclear and
>>>>>> confusing. Think of somebody reading this line in a month from today -
>>>>>> he wouldn't get a clue why is it there, why to explicitly set
>>>>>> max_rx_queues to zero and leave the rest be zeroed automatically... Why
>>>>>> to add such artifacts to the code instead of just zeroing the struct
>>>>>> with a memset() and putting a good clear comment above it explaining why
>>>>>> we use a memset() and not and initializer?
>>>>> We can make it longer yes.
>>>>> I think you agree we should avoid extra lines if not needed.
>>>>> In this case, when reading "= { .field = 0 }", it seems clear our goal
>>>>> is to zero the structure (it is to me).
>>>> I'm sorry but it's not clear to me at all since the common C practice
>>>> for zeroing the struct would be
>>>>
>>>> struct st a = {0};
>>>>
>>>> Like in the lines u are changing. Lines like the above clearly should
>>>> not need comments and are absolutely clear.
>>>> The lines u are adding on the other hand are absolutely unclear and
>>>> confusing outside the gcc bug context. Therefore it should be clearly
>>>> stated so in a form of comment. Otherwise somebody (like myself) may see
>>>> this and immediately fix it back (as it should be).
>>>>
>>>>> I thought it is a basic C practice.
>>>> I doubt that. ;) Explained above.
>>>>
>>>>> You should try "git grep '\.[^ ]\+ *= *0 *}'" to be convinced that we are
>>>>> not going to comment each occurence of this coding style.
>>>>> But it must be explained in the coding style document. Agree?
>>>> OMG! This is awful! I think everybody agrees that this is a workaround
>>>> and has nothing to do with a coding style (it's the opposite of a style
>>>> actually). I don't know where this should be explained, frankly.
>>> Once we assert we want to support this buggy compiler, the workarounds
>>> are automatically parts of the coding style.
>> I'd rather not... ;)
>>
>>> I don't know how to deal differently with this constraint.
>> Add -Wno-missing-braces compilation option for compiler versions below
>> 4.7. U (and me and I guess most other developers) compile DPDK code with
>> a newer compiler thus the code would be properly inspected with these
>> compilers and we may afford to be less restrictive with compilation
>> warnings with legacy compiler versions...
> You're right.
> I will test it and submit a v2.
> Then I could use the above grep command to replace other occurences of this
> workaround.

U read my mind!.. ;)

>
>>>> Getting back to the issue - I'm a bit surprised since I use this kind of
>>>> initializer ({0}) in a C code for quite a long time - long before 2012.
>>>> I'd like to understand what is a problem with this specific gcc version.
>>>> This seems too trivial. I'm surprised CentOS has a gcc version with this
>>>> kind of bugs.
>>> Each day brings its surprise :)
>



[dpdk-dev] [PATCH] ixgbe: fix build with gcc 4.4

2015-04-14 Thread Vlad Zolotarov


On 04/14/15 18:13, Thomas Monjalon wrote:
> 2015-04-14 17:59, Vlad Zolotarov:
>> On 04/14/15 17:17, Thomas Monjalon wrote:
>>> 2015-04-14 16:38, Vlad Zolotarov:
>>>> On 04/14/15 16:06, Ananyev, Konstantin wrote:
>>>>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>>>>>> On 04/14/15 12:31, Thomas Monjalon wrote:
>>>>>>> -   struct rte_eth_dev_info dev_info = { 0 };
>>>>>>> +   struct rte_eth_dev_info dev_info = { .max_rx_queues = 0 };
>>>>>> Hmmm... Unless I miss something this and one above would zero only a
>>>>>> single field - "max_rx_queues"; and would leave the rest uninitialized.
>>>>>> The original code intend to zero the whole struct. The alternative to
>>>>>> the original lines could be usage of memset().
>>>>> As I understand, in that case compiler had to set all non-explicitly 
>>>>> initialised members to 0.
>>>>> So I think we are ok here.
>>>> Yeah, I guess it does zero-initializes the rest
>>>> (https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html) however I
>>>> don't understand how the above change fixes the error if it complains
>>>> about the dev_info.driver_name?
>>> As only 1 field is required, I chose the one which should not be removed
>>> from this structure in the future.
>>>
>>>> What I'm trying to say - the proposed fix is completely unclear and
>>>> confusing. Think of somebody reading this line in a month from today -
>>>> he wouldn't get a clue why is it there, why to explicitly set
>>>> max_rx_queues to zero and leave the rest be zeroed automatically... Why
>>>> to add such artifacts to the code instead of just zeroing the struct
>>>> with a memset() and putting a good clear comment above it explaining why
>>>> we use a memset() and not and initializer?
>>> We can make it longer yes.
>>> I think you agree we should avoid extra lines if not needed.
>>> In this case, when reading "= { .field = 0 }", it seems clear our goal
>>> is to zero the structure (it is to me).
>> I'm sorry but it's not clear to me at all since the common C practice
>> for zeroing the struct would be
>>
>> struct st a = {0};
>>
>> Like in the lines u are changing. Lines like the above clearly should
>> not need comments and are absolutely clear.
>> The lines u are adding on the other hand are absolutely unclear and
>> confusing outside the gcc bug context. Therefore it should be clearly
>> stated so in a form of comment. Otherwise somebody (like myself) may see
>> this and immediately fix it back (as it should be).
>>
>>> I thought it is a basic C practice.
>> I doubt that. ;) Explained above.
>>
>>> You should try "git grep '\.[^ ]\+ *= *0 *}'" to be convinced that we are
>>> not going to comment each occurence of this coding style.
>>> But it must be explained in the coding style document. Agree?
>> OMG! This is awful! I think everybody agrees that this is a workaround
>> and has nothing to do with a coding style (it's the opposite of a style
>> actually). I don't know where this should be explained, frankly.
> Once we assert we want to support this buggy compiler, the workarounds
> are automatically parts of the coding style.

I'd rather not... ;)

> I don't know how to deal differently with this constraint.

Add -Wno-missing-braces compilation option for compiler versions below 
4.7. U (and me and I guess most other developers) compile DPDK code with 
a newer compiler thus the code would be properly inspected with these 
compilers and we may afford to be less restrictive with compilation 
warnings with legacy compiler versions...

>
>> Getting back to the issue - I'm a bit surprised since I use this kind of
>> initializer ({0}) in a C code for quite a long time - long before 2012.
>> I'd like to understand what is a problem with this specific gcc version.
>> This seems too trivial. I'm surprised CentOS has a gcc version with this
>> kind of bugs.
> Each day brings its surprise :)
>



[dpdk-dev] [PATCH] ixgbe: fix build with gcc 4.4

2015-04-14 Thread Vlad Zolotarov


On 04/14/15 17:17, Thomas Monjalon wrote:
> 2015-04-14 16:38, Vlad Zolotarov:
>> On 04/14/15 16:06, Ananyev, Konstantin wrote:
>>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>>>> On 04/14/15 12:31, Thomas Monjalon wrote:
>>>>> - struct rte_eth_dev_info dev_info = { 0 };
>>>>> + struct rte_eth_dev_info dev_info = { .max_rx_queues = 0 };
>>>> Hmmm... Unless I miss something this and one above would zero only a
>>>> single field - "max_rx_queues"; and would leave the rest uninitialized.
>>>> The original code intend to zero the whole struct. The alternative to
>>>> the original lines could be usage of memset().
>>> As I understand, in that case compiler had to set all non-explicitly 
>>> initialised members to 0.
>>> So I think we are ok here.
>> Yeah, I guess it does zero-initializes the rest
>> (https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html) however I
>> don't understand how the above change fixes the error if it complains
>> about the dev_info.driver_name?
> As only 1 field is required, I chose the one which should not be removed
> from this structure in the future.
>
>> What I'm trying to say - the proposed fix is completely unclear and
>> confusing. Think of somebody reading this line in a month from today -
>> he wouldn't get a clue why is it there, why to explicitly set
>> max_rx_queues to zero and leave the rest be zeroed automatically... Why
>> to add such artifacts to the code instead of just zeroing the struct
>> with a memset() and putting a good clear comment above it explaining why
>> we use a memset() and not and initializer?
> We can make it longer yes.
> I think you agree we should avoid extra lines if not needed.
> In this case, when reading "= { .field = 0 }", it seems clear our goal
> is to zero the structure (it is to me).

I'm sorry, but it's not clear to me at all, since the common C practice 
for zeroing a struct would be

struct st a = {0};

like in the lines you are changing. Lines like that clearly need no 
comment and are absolutely clear.
The lines you are adding, on the other hand, are absolutely unclear and 
confusing outside the context of this gcc bug. Therefore that context 
should be clearly stated in a comment. Otherwise somebody (like myself) 
may see this and immediately fix it back (as it should be).

> I thought it is a basic C practice.

I doubt that. ;) Explained above.

>
> You should try "git grep '\.[^ ]\+ *= *0 *}'" to be convinced that we are
> not going to comment each occurence of this coding style.
> But it must be explained in the coding style document. Agree?

OMG! This is awful! I think everybody agrees that this is a workaround 
and has nothing to do with a coding style (it's the opposite of a 
style, actually). Frankly, I don't know where this should be explained.

Getting back to the issue - I'm a bit surprised, since I have been 
using this kind of initializer ({0}) in C code for quite a long time - 
long before 2012. I'd like to understand what the problem with this 
specific gcc version is. This seems too trivial. I'm surprised CentOS 
ships a gcc version with this kind of bug.
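For reference, a minimal sketch of the two initializer forms being debated (the struct here is a hypothetical stand-in for rte_eth_dev_info, not the real definition). Per C99 6.7.8, any initializer list - designated or not - zero-initializes every member it does not name, so all three variants below produce the same fully-zeroed object; gcc 4.4's -Wmissing-field-initializers only complains about the `{0}` form:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for rte_eth_dev_info. */
struct dev_info {
	const char *driver_name;
	unsigned int max_rx_queues;
	unsigned int rx_offload_capa;
};

/* Returns 1 if the members not named in the initializer really
 * come out zeroed, for all three variants discussed in the thread. */
static int unnamed_members_zeroed(void)
{
	struct dev_info a = { .max_rx_queues = 0 }; /* the patch's workaround */
	struct dev_info b = { 0 };                  /* form gcc 4.4 warns on */
	struct dev_info c;

	memset(&c, 0, sizeof(c));                   /* memset() alternative */

	return a.driver_name == NULL && a.rx_offload_capa == 0 &&
	       b.driver_name == NULL && b.rx_offload_capa == 0 &&
	       c.driver_name == NULL && c.rx_offload_capa == 0;
}
```

All three are semantically equivalent; the disagreement in the thread is only about which form best survives an old compiler's bogus warning while staying readable.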




[dpdk-dev] [PATCH] ixgbe: fix build with gcc 4.4

2015-04-14 Thread Vlad Zolotarov


On 04/14/15 17:17, Thomas Monjalon wrote:
> 2015-04-14 16:38, Vlad Zolotarov:
>> On 04/14/15 16:06, Ananyev, Konstantin wrote:
>>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>>>> On 04/14/15 12:31, Thomas Monjalon wrote:
>>>>> - struct rte_eth_dev_info dev_info = { 0 };
>>>>> + struct rte_eth_dev_info dev_info = { .max_rx_queues = 0 };
>>>> Hmmm... Unless I miss something this and one above would zero only a
>>>> single field - "max_rx_queues"; and would leave the rest uninitialized.
>>>> The original code intend to zero the whole struct. The alternative to
>>>> the original lines could be usage of memset().
>>> As I understand, in that case compiler had to set all non-explicitly 
>>> initialised members to 0.
>>> So I think we are ok here.
>> Yeah, I guess it does zero-initializes the rest
>> (https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html) however I
>> don't understand how the above change fixes the error if it complains
>> about the dev_info.driver_name?
> As only 1 field is required, I chose the one which should not be removed
> from this structure in the future.

I don't follow - where/why is only one field required? The function you 
are patching uses the "rx_offload_capa" field. Or do you mean this gcc 
version requires only one field? If so, could you please point to the 
errata you are referring to, since the standard doesn't require any 
field and {0} is an absolutely legal (and proper) initializer in this 
case...

>
>> What I'm trying to say - the proposed fix is completely unclear and
>> confusing. Think of somebody reading this line in a month from today -
>> he wouldn't get a clue why is it there, why to explicitly set
>> max_rx_queues to zero and leave the rest be zeroed automatically... Why
>> to add such artifacts to the code instead of just zeroing the struct
>> with a memset() and putting a good clear comment above it explaining why
>> we use a memset() and not and initializer?
> We can make it longer yes.
> I think you agree we should avoid extra lines if not needed.
> In this case, when reading "= { .field = 0 }", it seems clear our goal
> is to zero the structure (it is to me).
> I thought it is a basic C practice.
>
> You should try "git grep '\.[^ ]\+ *= *0 *}'" to be convinced that we are
> not going to comment each occurence of this coding style.
> But it must be explained in the coding style document. Agree?



[dpdk-dev] [PATCH] ixgbe: fix build with gcc 4.4

2015-04-14 Thread Vlad Zolotarov


On 04/14/15 16:23, Ananyev, Konstantin wrote:
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Tuesday, April 14, 2015 1:52 PM
>> To: Thomas Monjalon; Ananyev, Konstantin; Zhang, Helin
>> Cc: dev at dpdk.org
>> Subject: Re: [PATCH] ixgbe: fix build with gcc 4.4
>>
>>
>>
>> On 04/14/15 12:31, Thomas Monjalon wrote:
>>> With GCC 4.4.7 from CentOS 6.5, the following errors arise:
>>>
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function ‘ixgbe_dev_rx_queue_setup’:
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:2509: error: missing initializer
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:2509: error: (near initialization for 
>>> ‘dev_info.driver_name’)
>>>
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function ‘ixgbe_set_rsc’:
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:4072: error: missing initializer
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:4072: error: (near initialization for 
>>> ‘dev_info.driver_name’)
>>>
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function 
>>> ‘ixgbe_recv_pkts_lro_single_alloc’:
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1479: error: ‘next_rsc_entry’ may be used 
>>> uninitialized in this function
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1480: error: ‘next_rxe’ may be used 
>>> uninitialized in this function
>> :D Looks like a gcc bug ;) Both are set and only after that (!!!) used
>> under "!eop" condition.
> Possibly, but we still need to make it build cleanly.

It clearly is - I was just trying to be polite here... ;)
Please add a comment explaining this initialization so that nobody 
removes these workarounds by mistake...

> Konstantin
>
>>> Fixes: 8eecb3295aed ("ixgbe: add LRO support")
>>>
>>> Signed-off-by: Thomas Monjalon 
>>> ---
>>>lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 8 
>>>1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
>>> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> index f1da9ec..a2b8631 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> @@ -1476,8 +1476,8 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf 
>>> **rx_pkts, uint16_t nb_pkts,
>>> bool eop;
>>> struct ixgbe_rx_entry *rxe;
>>> struct ixgbe_rsc_entry *rsc_entry;
>>> -   struct ixgbe_rsc_entry *next_rsc_entry;
>>> -   struct ixgbe_rx_entry *next_rxe;
>>> +   struct ixgbe_rsc_entry *next_rsc_entry = NULL;
>>> +   struct ixgbe_rx_entry *next_rxe = NULL;
>>> struct rte_mbuf *first_seg;
>>> struct rte_mbuf *rxm;
>>> struct rte_mbuf *nmb;
>>> @@ -2506,7 +2506,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
>>> struct ixgbe_rx_queue *rxq;
>>> struct ixgbe_hw *hw;
>>> uint16_t len;
>>> -   struct rte_eth_dev_info dev_info = { 0 };
>>> +   struct rte_eth_dev_info dev_info = { .max_rx_queues = 0 };
>>> struct rte_eth_rxmode *dev_rx_mode = &dev->data->dev_conf.rxmode;
>>> bool rsc_requested = false;
>>>
>>> @@ -4069,7 +4069,7 @@ ixgbe_set_rsc(struct rte_eth_dev *dev)
>>>{
>>> struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
>>> struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>> -   struct rte_eth_dev_info dev_info = { 0 };
>>> +   struct rte_eth_dev_info dev_info = { .max_rx_queues = 0 };
>>> bool rsc_capable = false;
>>> uint16_t i;
>>> uint32_t rdrxctl;



[dpdk-dev] [PATCH] ixgbe: fix build with gcc 4.4

2015-04-14 Thread Vlad Zolotarov


On 04/14/15 16:06, Ananyev, Konstantin wrote:
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Tuesday, April 14, 2015 1:49 PM
>> To: Thomas Monjalon; Ananyev, Konstantin; Zhang, Helin
>> Cc: dev at dpdk.org
>> Subject: Re: [PATCH] ixgbe: fix build with gcc 4.4
>>
>>
>>
>> On 04/14/15 12:31, Thomas Monjalon wrote:
>>> With GCC 4.4.7 from CentOS 6.5, the following errors arise:
>>>
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function ‘ixgbe_dev_rx_queue_setup’:
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:2509: error: missing initializer
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:2509: error: (near initialization for 
>>> ‘dev_info.driver_name’)
>>>
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function ‘ixgbe_set_rsc’:
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:4072: error: missing initializer
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:4072: error: (near initialization for 
>>> ‘dev_info.driver_name’)
>>>
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function 
>>> ‘ixgbe_recv_pkts_lro_single_alloc’:
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1479: error: ‘next_rsc_entry’ may be used 
>>> uninitialized in this function
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1480: error: ‘next_rxe’ may be used 
>>> uninitialized in this function
>>>
>>> Fixes: 8eecb3295aed ("ixgbe: add LRO support")
>>>
>>> Signed-off-by: Thomas Monjalon 
>>> ---
>>>lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 8 
>>>1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
>>> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> index f1da9ec..a2b8631 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> @@ -1476,8 +1476,8 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf 
>>> **rx_pkts, uint16_t nb_pkts,
>>> bool eop;
>>> struct ixgbe_rx_entry *rxe;
>>> struct ixgbe_rsc_entry *rsc_entry;
>>> -   struct ixgbe_rsc_entry *next_rsc_entry;
>>> -   struct ixgbe_rx_entry *next_rxe;
>>> +   struct ixgbe_rsc_entry *next_rsc_entry = NULL;
>>> +   struct ixgbe_rx_entry *next_rxe = NULL;
>>> struct rte_mbuf *first_seg;
>>> struct rte_mbuf *rxm;
>>> struct rte_mbuf *nmb;
>>> @@ -2506,7 +2506,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
>>> struct ixgbe_rx_queue *rxq;
>>> struct ixgbe_hw *hw;
>>> uint16_t len;
>>> -   struct rte_eth_dev_info dev_info = { 0 };
>>> +   struct rte_eth_dev_info dev_info = { .max_rx_queues = 0 };
>>> struct rte_eth_rxmode *dev_rx_mode = &dev->data->dev_conf.rxmode;
>>> bool rsc_requested = false;
>>>
>>> @@ -4069,7 +4069,7 @@ ixgbe_set_rsc(struct rte_eth_dev *dev)
>>>{
>>> struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
>>> struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>> -   struct rte_eth_dev_info dev_info = { 0 };
>>> +   struct rte_eth_dev_info dev_info = { .max_rx_queues = 0 };
>> Hmmm... Unless I miss something this and one above would zero only a
>> single field - "max_rx_queues"; and would leave the rest uninitialized.
>> The original code intend to zero the whole struct. The alternative to
>> the original lines could be usage of memset().
> As I understand, in that case compiler had to set all non-explicitly 
> initialised members to 0.
> So I think we are ok here.

Yeah, I guess it does zero-initialize the rest 
(https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html); however, I 
don't understand how the above change fixes the error if it complains 
about dev_info.driver_name.

What I'm trying to say is that the proposed fix is completely unclear 
and confusing. Think of somebody reading this line a month from today - 
he wouldn't get a clue why it is there, why max_rx_queues is explicitly 
set to zero while the rest is left to be zeroed automatically... Why 
add such artifacts to the code instead of just zeroing the struct with 
a memset() and putting a good, clear comment above it explaining why we 
use a memset() and not an initializer?

>   
>>> bool rsc_capable = false;
>>> uint16_t i;
>>> uint32_t rdrxctl;



[dpdk-dev] [PATCH] ixgbe: fix build with gcc 4.4

2015-04-14 Thread Vlad Zolotarov


On 04/14/15 12:31, Thomas Monjalon wrote:
> With GCC 4.4.7 from CentOS 6.5, the following errors arise:
>
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function ‘ixgbe_dev_rx_queue_setup’:
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:2509: error: missing initializer
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:2509: error: (near initialization for 
> ‘dev_info.driver_name’)
>
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function ‘ixgbe_set_rsc’:
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:4072: error: missing initializer
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:4072: error: (near initialization for 
> ‘dev_info.driver_name’)
>
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function 
> ‘ixgbe_recv_pkts_lro_single_alloc’:
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1479: error: ‘next_rsc_entry’ may be used 
> uninitialized in this function
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1480: error: ‘next_rxe’ may be used 
> uninitialized in this function

:D Looks like a gcc bug ;) Both are set, and only after that (!!!) are 
they used, under the "!eop" condition.
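A minimal sketch (with hypothetical names, not the actual ixgbe code) of the control-flow pattern old gcc mis-diagnoses here: the pointer is written under !eop and read only under the same !eop condition, so it can never be used uninitialized, yet gcc 4.4 cannot prove that, hence the "= NULL" workaround in the patch:

```c
#include <stddef.h>

/* Hypothetical reduction of the flow in ixgbe_recv_pkts_lro():
 * "next" is assigned if and only if !eop, and dereferenced only
 * under the same !eop condition. The "= NULL" initializer exists
 * solely to silence gcc 4.4's false-positive warning. */
static int pick(int eop, int cur, int next_val)
{
	int *next = NULL;	/* workaround for gcc 4.4 only */

	if (!eop)
		next = &next_val;	/* set only when not end-of-packet */

	if (!eop)
		return *next;		/* read only under the same condition */
	return cur;
}
```

Newer compilers track the correlated conditions and do not warn on this shape.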

>
> Fixes: 8eecb3295aed ("ixgbe: add LRO support")
>
> Signed-off-by: Thomas Monjalon 
> ---
>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 8 
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index f1da9ec..a2b8631 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -1476,8 +1476,8 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf 
> **rx_pkts, uint16_t nb_pkts,
>   bool eop;
>   struct ixgbe_rx_entry *rxe;
>   struct ixgbe_rsc_entry *rsc_entry;
> - struct ixgbe_rsc_entry *next_rsc_entry;
> - struct ixgbe_rx_entry *next_rxe;
> + struct ixgbe_rsc_entry *next_rsc_entry = NULL;
> + struct ixgbe_rx_entry *next_rxe = NULL;
>   struct rte_mbuf *first_seg;
>   struct rte_mbuf *rxm;
>   struct rte_mbuf *nmb;
> @@ -2506,7 +2506,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
>   struct ixgbe_rx_queue *rxq;
>   struct ixgbe_hw *hw;
>   uint16_t len;
> - struct rte_eth_dev_info dev_info = { 0 };
> + struct rte_eth_dev_info dev_info = { .max_rx_queues = 0 };
>   struct rte_eth_rxmode *dev_rx_mode = &dev->data->dev_conf.rxmode;
>   bool rsc_requested = false;
>   
> @@ -4069,7 +4069,7 @@ ixgbe_set_rsc(struct rte_eth_dev *dev)
>   {
>   struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
>   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> - struct rte_eth_dev_info dev_info = { 0 };
> + struct rte_eth_dev_info dev_info = { .max_rx_queues = 0 };
>   bool rsc_capable = false;
>   uint16_t i;
>   uint32_t rdrxctl;



[dpdk-dev] [PATCH] ixgbe: fix build with gcc 4.4

2015-04-14 Thread Vlad Zolotarov


On 04/14/15 12:31, Thomas Monjalon wrote:
> With GCC 4.4.7 from CentOS 6.5, the following errors arise:
>
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function ‘ixgbe_dev_rx_queue_setup’:
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:2509: error: missing initializer
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:2509: error: (near initialization for 
> ‘dev_info.driver_name’)
>
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function ‘ixgbe_set_rsc’:
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:4072: error: missing initializer
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:4072: error: (near initialization for 
> ‘dev_info.driver_name’)
>
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c: In function 
> ‘ixgbe_recv_pkts_lro_single_alloc’:
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1479: error: ‘next_rsc_entry’ may be used 
> uninitialized in this function
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c:1480: error: ‘next_rxe’ may be used 
> uninitialized in this function
>
> Fixes: 8eecb3295aed ("ixgbe: add LRO support")
>
> Signed-off-by: Thomas Monjalon 
> ---
>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 8 
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index f1da9ec..a2b8631 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -1476,8 +1476,8 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf 
> **rx_pkts, uint16_t nb_pkts,
>   bool eop;
>   struct ixgbe_rx_entry *rxe;
>   struct ixgbe_rsc_entry *rsc_entry;
> - struct ixgbe_rsc_entry *next_rsc_entry;
> - struct ixgbe_rx_entry *next_rxe;
> + struct ixgbe_rsc_entry *next_rsc_entry = NULL;
> + struct ixgbe_rx_entry *next_rxe = NULL;
>   struct rte_mbuf *first_seg;
>   struct rte_mbuf *rxm;
>   struct rte_mbuf *nmb;
> @@ -2506,7 +2506,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
>   struct ixgbe_rx_queue *rxq;
>   struct ixgbe_hw *hw;
>   uint16_t len;
> - struct rte_eth_dev_info dev_info = { 0 };
> + struct rte_eth_dev_info dev_info = { .max_rx_queues = 0 };
>   struct rte_eth_rxmode *dev_rx_mode = &dev->data->dev_conf.rxmode;
>   bool rsc_requested = false;
>   
> @@ -4069,7 +4069,7 @@ ixgbe_set_rsc(struct rte_eth_dev *dev)
>   {
>   struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
>   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> - struct rte_eth_dev_info dev_info = { 0 };
> + struct rte_eth_dev_info dev_info = { .max_rx_queues = 0 };

Hmmm... Unless I miss something, this and the one above would zero only 
a single field - "max_rx_queues" - and would leave the rest 
uninitialized. The original code intends to zero the whole struct. The 
alternative to the original lines would be a memset().

>   bool rsc_capable = false;
>   uint16_t i;
>   uint32_t rdrxctl;



[dpdk-dev] [PATCH v9 0/3]: Add LRO support to ixgbe PMD

2015-04-13 Thread Vlad Zolotarov


On 03/31/15 14:47, Ananyev, Konstantin wrote:
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
>> Sent: Monday, March 30, 2015 8:21 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH v9 0/3]: Add LRO support to ixgbe PMD
>>
>> This series adds the missing flow for enabling the LRO in the ethdev and
>> adds a support for this feature in the ixgbe PMD. There is a big hope that 
>> this
>> initiative is going to be picked up by some Intel developer that would add 
>> the LRO support
>> to other Intel PMDs.
>>
>> The series starts with some cleanup work in the code the final patch (the 
>> actual adding of
>> the LRO support) is going to touch/use/change. There are still quite a few 
>> issues in the ixgbe
>> PMD code left but they have to be a matter of a different series and I've 
>> left a few "TODO"
>> remarks in the code.
>>
>> The LRO ("RSC" in Intel's context) PMD completion handling code follows the 
>> same design as the
>> corresponding Linux and FreeBSD implementation: pass the aggregation's 
>> cluster HEAD buffer to
>> the NEXTP entry of the software ring till EOP is met.
>>
>> HW configuration follows the corresponding specs: this feature is supported 
>> only by x540 and
>> 82599 PF devices.
>>
>> The feature has been tested with seastar TCP stack with the following 
>> configuration on Tx side:
>> - MTU: 400B
>> - 100 concurrent TCP connections.
>>
>> The results were:
>> - Without LRO: total throughput: 0.12Gbps, coefficient of variance: 1.41%
>> - With LRO:    total throughput: 8.21Gbps, coefficient of variance: 0.59%
>>
>> This is an almost factor 80 improvement.
>>
>> New in v9:
>> - Move newly added IXGBE_XXX macros to ixgbe_ethdev.h.
>>
>> New in v8:
>> - Fixed the structs naming: igb_xxx -> ixgbe_xxx (some leftovers in 
>> PATCH2).
>> - Took the RSC configuration code from ixgbe_dev_rx_init() into a 
>> separate
>>   function - ixgbe_set_rsc().
>> - Added some missing macros for HW configuration.
>> - Styling adjustments:
>>- Functions names.
>>- Functions descriptions.
>> - Reworked the ixgbe_free_rsc_cluster() code to make it more readable.
>> - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not supported 
>> by
>>   ixgbe PMD.
>>
>> New in v7:
>> - Free not-yet-completed RSC aggregations in rte_eth_dev_stop() flow.
>> - Fixed rx_bulk_alloc_allowed and rx_vec_allowed initialization:
>>- Don't set them to FALSE in rte_eth_dev_stop() flow - the following
>>  rte_eth_dev_start() will need them.
>>- Reset them to TRUE in rte_eth_dev_configure() and not in a probe() 
>> flow.
>>  This will ensure the proper behaviour if port is re-configured.
>> - Reset the sw_ring[].mbuf entry in a bulk allocation case.
>>   This is needed for ixgbe_rx_queue_release_mbufs().
>> - _recv_pkts_lro(): added the missing memory barrier before RDT update 
>> in a
>>   non-bulk allocation case.
>> - Don't allow RSC when device is configured in an SR-IOV mode.
>>
>> New in v6:
>> - Fix of the typo in the "bug fixes" series that broke the compilation 
>> caused a
>>   minor change in this follow-up series.
>>
>> New in v5:
>> - Split the series into "bug fixes" and "all the rest" so that the 
>> former could be
>>   integrated into a 2.0 release.
>> - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of 
>> rte_ethdev.h.
>> - Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.
>>
>> New in v4:
>> - Remove CONFIG_RTE_ETHDEV_LRO_SUPPORT from config/common_linuxapp.
>> - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h.
>> - As a result of "ixgbe: check rxd number to avoid mbuf leak" 
>> (352078e8e) Vector Rx
>>   had to get the same treatment as Rx Bulk Alloc (see PATCH4 for more 
>> details).
>>
>> New in v3:
>> - ixgbe_rx_alloc_bufs(): Always reset refcnt of the buffers to 1. 
>> Otherwise rte_pktmbuf_free()
>>   won't free them.
>>
>> New in v2:
>> - Removed rte_eth_dev_data.lro_bulk_alloc and added 
>> ixgbe_hw.rx_bulk_alloc_allowed
>>   instead.
>> - Unified the rx_pkt_bulk callback setting (a separate new patch).
>> - Fixed a few styling and spelling issues.
>>
>> Vlad Zolotarov (3):
>>ixgbe: Cleanups
>>ixgbe: Code refactoring
>>ixgbe: Add LRO support
>>
>>   lib/librte_ether/rte_ethdev.h   |   9 +-
>>   lib/librte_net/rte_ip.h |   3 +
>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  11 +
>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  13 +
>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 765 
>> 
>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
>>   6 files changed, 738 insertions(+), 69 deletions(-)
>>
>> --
> Acked-by: Konstantin Ananyev  for 2.1 
> release.

Thomas, could you consider applying this, please?

thanks,
vlad

>
>> 2.1.0



[dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support

2015-03-31 Thread Vlad Zolotarov


On 03/31/15 13:25, Ananyev, Konstantin wrote:
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Monday, March 30, 2015 4:57 PM
>> To: Ananyev, Konstantin; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
>>
>>
>>
>> On 03/30/15 18:37, Vlad Zolotarov wrote:
>>>
>>> On 03/30/15 17:18, Ananyev, Konstantin wrote:
>>>> Hi Vlad,
>>>>
>>>>> -Original Message-
>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
>>>>> Sent: Wednesday, March 18, 2015 5:52 PM
>>>>> To: dev at dpdk.org
>>>>> Subject: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
>>>>>
>>>>>   - Only x540 and 82599 devices support LRO.
>>>>>   - Add the appropriate HW configuration.
>>>>>   - Add RSC aware rx_pkt_burst() handlers:
>>>>>  - Implemented bulk allocation and non-bulk allocation versions.
>>>>>  - Add LRO-specific fields to rte_eth_rxmode, to
>>>>> rte_eth_dev_data
>>>>>and to ixgbe_rx_queue.
>>>>>  - Use the appropriate handler when LRO is requested.
>>>>>
>>>>> Signed-off-by: Vlad Zolotarov 
>>>>> ---
>>>>> New in v8:
>>>>>  - Took the RSC configuration code from ixgbe_dev_rx_init() into
>>>>> a separate
>>>>>function - ixgbe_set_rsc().
>>>>>  - Added some missing macros for HW configuration.
>>>>>  - Styling adjustments:
>>>>> - Functions names.
>>>>> - Functions descriptions.
>>>>>  - Reworked the ixgbe_free_rsc_cluster() code to make it more
>>>>> readable.
>>>>>  - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not
>>>>> supported by
>>>>>ixgbe PMD.
>>>>>
>>>>> New in v7:
>>>>>  - Free not-yet-completed RSC aggregations in rte_eth_dev_stop()
>>>>> flow.
>>>>>  - Reset the sw_ring[].mbuf entry in a bulk allocation case.
>>>>>This is needed for ixgbe_rx_queue_release_mbufs().
>>>>>  - _recv_pkts_lro(): added the missing memory barrier before RDT
>>>>> update in a
>>>>>non-bulk allocation case.
>>>>>  - Don't allow RSC when device is configured in an SR-IOV mode.
>>>>>
>>>>> New in v5:
>>>>>  - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning
>>>>> of rte_ethdev.h.
>>>>>  - Removed the "TODO: Remove me" comment near
>>>>> RTE_ETHDEV_HAS_LRO_SUPPORT.
>>>>>
>>>>> New in v4:
>>>>>  - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
>>>>>RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.
>>>>>
>>>>> New in v2:
>>>>>  - Removed rte_eth_dev_data.lro_bulk_alloc.
>>>>>  - Fixed a few styling and spelling issues.
>>>>> ---
>>>>>lib/librte_ether/rte_ethdev.h   |   9 +-
>>>>>lib/librte_net/rte_ip.h |   3 +
>>>>>lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   6 +
>>>>>lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  11 +
>>>>>lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   5 +
>>>>>lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 610
>>>>> +++-
>>>>>lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
>>>>>7 files changed, 642 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/lib/librte_ether/rte_ethdev.h
>>>>> b/lib/librte_ether/rte_ethdev.h
>>>>> index 21aa359..61dc49a 100644
>>>>> --- a/lib/librte_ether/rte_ethdev.h
>>>>> +++ b/lib/librte_ether/rte_ethdev.h
>>>>> @@ -172,6 +172,9 @@ extern "C" {
>>>>>
>>>>>#include 
>>>>>
>>>>> +/* Use this macro to check if LRO API is supported */
>>>>> +#define RTE_ETHDEV_HAS_LRO_SUPPORT
>>>>> +
>>>>>#include 
>>>>>#include 
>>>>>#include 
>>>>> @@ -320,14 +323,15 @@ struct rte_eth_rxmode {
>>

[dpdk-dev] [PATCH v9 3/3] ixgbe: Add LRO support

2015-03-30 Thread Vlad Zolotarov
- Only x540 and 82599 devices support LRO.
- Add the appropriate HW configuration.
- Add RSC aware rx_pkt_burst() handlers:
   - Implemented bulk allocation and non-bulk allocation versions.
   - Add LRO-specific fields to rte_eth_rxmode, to rte_eth_dev_data
 and to ixgbe_rx_queue.
   - Use the appropriate handler when LRO is requested.

Signed-off-by: Vlad Zolotarov 
---
New in v9:
   - Move new IXGBE_XXX macros to ixgbe_ethdev.h

New in v8:
   - Took the RSC configuration code from ixgbe_dev_rx_init() into a separate
 function - ixgbe_set_rsc().
   - Added some missing macros for HW configuration.
   - Styling adjustments:
  - Functions names.
  - Functions descriptions.
   - Reworked the ixgbe_free_rsc_cluster() code to make it more readable.
   - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not supported by
 ixgbe PMD.

New in v7:
   - Free not-yet-completed RSC aggregations in rte_eth_dev_stop() flow.
   - Reset the sw_ring[].mbuf entry in a bulk allocation case.
 This is needed for ixgbe_rx_queue_release_mbufs().
   - _recv_pkts_lro(): added the missing memory barrier before RDT update in a
 non-bulk allocation case.
   - Don't allow RSC when device is configured in an SR-IOV mode.

New in v5:
   - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of 
rte_ethdev.h.
   - Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.

New in v4:
   - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
 RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.

New in v2:
   - Removed rte_eth_dev_data.lro_bulk_alloc.
   - Fixed a few styling and spelling issues.
---
 lib/librte_ether/rte_ethdev.h   |   9 +-
 lib/librte_net/rte_ip.h |   3 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  11 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  13 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 610 +++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
 6 files changed, 644 insertions(+), 8 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 725321a..84b0b7d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -172,6 +172,9 @@ extern "C" {

 #include 

+/* Use this macro to check if LRO API is supported */
+#define RTE_ETHDEV_HAS_LRO_SUPPORT
+
 #include 
 #include 
 #include 
@@ -320,14 +323,15 @@ struct rte_eth_rxmode {
enum rte_eth_rx_mq_mode mq_mode;
uint32_t max_rx_pkt_len;  /**< Only used if jumbo_frame enabled. */
uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
-   uint8_t header_split : 1, /**< Header Split enable. */
+   uint16_t header_split : 1, /**< Header Split enable. */
hw_ip_checksum   : 1, /**< IP/UDP/TCP checksum offload enable. 
*/
hw_vlan_filter   : 1, /**< VLAN filter enable. */
hw_vlan_strip: 1, /**< VLAN strip enable. */
hw_vlan_extend   : 1, /**< Extended VLAN enable. */
jumbo_frame  : 1, /**< Jumbo Frame Receipt enable. */
hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
-   enable_scatter   : 1; /**< Enable scatter packets rx handler */
+   enable_scatter   : 1, /**< Enable scatter packets rx handler */
+   enable_lro   : 1; /**< Enable LRO */
 };

 /**
@@ -1515,6 +1519,7 @@ struct rte_eth_dev_data {
uint8_t port_id;   /**< Device [external] port identifier. */
uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0). */
scattered_rx : 1,  /**< RX of scattered packets is ON(1) / 
OFF(0) */
+   lro  : 1,  /**< RX LRO is ON(1) / OFF(0) */
all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
dev_started : 1;   /**< Device state: STARTED(1) / STOPPED(0). 
*/
 };
diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index 64935d9..74c9ced 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -110,6 +110,9 @@ struct ipv4_hdr {
   (((c) & 0xff) << 8)  | \
   ((d) & 0xff))

+/** Maximal IPv4 packet length (including a header) */
+#define IPV4_MAX_PKT_LEN65535
+
 /** Internet header length mask for version_ihl field */
 #define IPV4_HDR_IHL_MASK  (0x0f)
 /**
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 5caee22..3946115 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1645,6 +1645,7 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)

/* Clear stored conf */
dev->data->scattered_rx = 0;
+   dev->data->lro = 0;

/* Clear recorded link status */
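The RTE_ETHDEV_HAS_LRO_SUPPORT macro added to rte_ethdev.h in the patch above exists for compile-time feature detection. A minimal sketch of how an application might use it (we define the macro ourselves here to stand in for the new header; the function name is hypothetical):

```c
/* Stand-in for including the patched rte_ethdev.h, which defines
 * this macro so applications can compile against both old and new
 * ethdev headers. */
#define RTE_ETHDEV_HAS_LRO_SUPPORT

/* Hypothetical application helper: decide at compile time whether
 * the ethdev API of this DPDK version exposes the enable_lro bit. */
static int lro_available(void)
{
#ifdef RTE_ETHDEV_HAS_LRO_SUPPORT
	return 1;	/* would set rxmode.enable_lro = 1 before configuring */
#else
	return 0;	/* fall back to software reassembly or plain Rx */
#endif
}
```

Whether the device actually supports LRO (only x540 and 82599 PFs, per the patch) must still be checked at runtime; the macro only guards API availability.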
 

[dpdk-dev] [PATCH v9 2/3] ixgbe: Code refactoring

2015-03-30 Thread Vlad Zolotarov
   - ixgbe_rx_alloc_bufs():
  - Reset the rte_mbuf fields only when requested.
  - Take the RDT update out of the function.
  - Add the stub when RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is not defined.
   - ixgbe_recv_scattered_pkts():
  - Moved the code that updates the fields of the cluster's HEAD buffer
into a separate inline function.

Signed-off-by: Vlad Zolotarov 
---
New in v8:
   - Fixed the structs naming: igb_xxx -> ixgbe_xxx
   - Adjust a code style with the ixgbe PMD styling.

New in v3:
   - ixgbe_rx_alloc_bufs(): Always reset refcnt of the buffers to 1.
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 126 --
 1 file changed, 81 insertions(+), 45 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 7173db8..c2bcecb 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -1021,7 +1021,7 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
 }

 static inline int
-ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
+ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool reset_mbuf)
 {
volatile union ixgbe_adv_rx_desc *rxdp;
struct ixgbe_rx_entry *rxep;
@@ -1042,11 +1042,14 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
for (i = 0; i < rxq->rx_free_thresh; ++i) {
/* populate the static rte mbuf fields */
mb = rxep[i].mbuf;
+   if (reset_mbuf) {
+   mb->next = NULL;
+   mb->nb_segs = 1;
+   mb->port = rxq->port_id;
+   }
+
rte_mbuf_refcnt_set(mb, 1);
-   mb->next = NULL;
mb->data_off = RTE_PKTMBUF_HEADROOM;
-   mb->nb_segs = 1;
-   mb->port = rxq->port_id;

/* populate the descriptors */
dma_addr = rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
@@ -1054,10 +1057,6 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
rxdp[i].read.pkt_addr = dma_addr;
}

-   /* update tail pointer */
-   rte_wmb();
-   IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, rxq->rx_free_trigger);
-
/* update state of internal queue structure */
rxq->rx_free_trigger = rxq->rx_free_trigger + rxq->rx_free_thresh;
if (rxq->rx_free_trigger >= rxq->nb_rx_desc)
@@ -1109,7 +1108,9 @@ rx_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,

/* if required, allocate new buffers to replenish descriptors */
if (rxq->rx_tail > rxq->rx_free_trigger) {
-   if (ixgbe_rx_alloc_bufs(rxq) != 0) {
+   uint16_t cur_free_trigger = rxq->rx_free_trigger;
+
+   if (ixgbe_rx_alloc_bufs(rxq, true) != 0) {
int i, j;
PMD_RX_LOG(DEBUG, "RX mbuf alloc failed port_id=%u "
   "queue_id=%u", (unsigned) rxq->port_id,
@@ -1129,6 +1130,10 @@ rx_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,

return 0;
}
+
+   /* update tail pointer */
+   rte_wmb();
+   IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, cur_free_trigger);
}

if (rxq->rx_tail >= rxq->nb_rx_desc)
@@ -1179,6 +1184,12 @@ ixgbe_recv_pkts_bulk_alloc(__rte_unused void *rx_queue,
return 0;
 }

+static inline int
+ixgbe_rx_alloc_bufs(__rte_unused struct ixgbe_rx_queue *rxq,
+   __rte_unused bool reset_mbuf)
+{
+   return -ENOMEM;
+}
 #endif /* RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC */

 uint16_t
@@ -1363,6 +1374,64 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts,
return (nb_rx);
 }

+/**
+ * Detect an RSC descriptor.
+ */
+static inline uint32_t
+ixgbe_rsc_count(union ixgbe_adv_rx_desc *rx)
+{
+   return (rte_le_to_cpu_32(rx->wb.lower.lo_dword.data) &
+   IXGBE_RXDADV_RSCCNT_MASK) >> IXGBE_RXDADV_RSCCNT_SHIFT;
+}
+
+/**
+ * ixgbe_fill_cluster_head_buf - fill the first mbuf of the returned packet
+ *
+ * Fill the following info in the HEAD buffer of the Rx cluster:
+ *- RX port identifier
+ *- hardware offload data, if any:
+ *  - RSS flag & hash
+ *  - IP checksum flag
+ *  - VLAN TCI, if any
+ *  - error flags
+ * @head HEAD of the packet cluster
+ * @desc HW descriptor to get data from
+ * @port_id Port ID of the Rx queue
+ */
+static inline void
+ixgbe_fill_cluster_head_buf(
+   struct rte_mbuf *head,
+   union ixgbe_adv_rx_desc *desc,
+   uint8_t port_id,
+   uint32_t staterr)
+{
+   uint32_t hlen_type_rss;
+   uint64_t pkt_flags;
+
+   head->port = port_id;
+
+   /*
+* The vlan_tci field is only valid when PKT_RX_VLAN_PKT is
+* set in the pkt_flags field.
+*/
+   head->vlan_tci 

[dpdk-dev] [PATCH v9 1/3] ixgbe: Cleanups

2015-03-30 Thread Vlad Zolotarov
   - Removed the unneeded casts.
   - ixgbe_dev_rx_init(): shorten the lines by defining a local alias variable 
to access
  &dev->data->dev_conf.rxmode.
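The removed `(uint16_t)` casts are redundant because of C's integer promotion rules: the `uint16_t` operands are promoted to `int` for the arithmetic, and assigning the result back to a `uint16_t` truncates it again, cast or no cast. A minimal standalone sketch of the trigger update (the helper name `next_free_trigger` is illustrative, not from the PMD):

```c
#include <stdint.h>

/* Illustrative model of the rx_free_trigger update in ixgbe_rx_alloc_bufs():
 * advance the trigger by rx_free_thresh and wrap when it passes nb_rx_desc. */
static uint16_t
next_free_trigger(uint16_t trigger, uint16_t thresh, uint16_t nb_desc)
{
	/* uint16_t + uint16_t is computed in int; storing the sum back into
	 * a uint16_t truncates it, so the explicit casts removed by this
	 * cleanup changed nothing. */
	uint16_t next = trigger + thresh;

	if (next >= nb_desc)
		next = thresh - 1;
	return next;
}
```

With a 128-descriptor ring and a threshold of 32, the trigger cycles 31, 63, 95, 127 and then wraps back to 31.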

Signed-off-by: Vlad Zolotarov 
---
New in v6:
   - Fixed a compilation error caused by a patches recomposition during series 
separation.
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 29 +
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 9da2c7e..7173db8 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -1031,8 +1031,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
int diag, i;

/* allocate buffers in bulk directly into the S/W ring */
-   alloc_idx = (uint16_t)(rxq->rx_free_trigger -
-   (rxq->rx_free_thresh - 1));
+   alloc_idx = rxq->rx_free_trigger - (rxq->rx_free_thresh - 1);
	rxep = &rxq->sw_ring[alloc_idx];
diag = rte_mempool_get_bulk(rxq->mb_pool, (void *)rxep,
rxq->rx_free_thresh);
@@ -1060,10 +1059,9 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, rxq->rx_free_trigger);

/* update state of internal queue structure */
-   rxq->rx_free_trigger = (uint16_t)(rxq->rx_free_trigger +
-   rxq->rx_free_thresh);
+   rxq->rx_free_trigger = rxq->rx_free_trigger + rxq->rx_free_thresh;
if (rxq->rx_free_trigger >= rxq->nb_rx_desc)
-   rxq->rx_free_trigger = (uint16_t)(rxq->rx_free_thresh - 1);
+   rxq->rx_free_trigger = rxq->rx_free_thresh - 1;

/* no errors */
return 0;
@@ -3590,6 +3588,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
uint32_t rxcsum;
uint16_t buf_size;
uint16_t i;
+   struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;

PMD_INIT_FUNC_TRACE();
hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -3612,7 +3611,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
 * Configure CRC stripping, if any.
 */
hlreg0 = IXGBE_READ_REG(hw, IXGBE_HLREG0);
-   if (dev->data->dev_conf.rxmode.hw_strip_crc)
+   if (rx_conf->hw_strip_crc)
hlreg0 |= IXGBE_HLREG0_RXCRCSTRP;
else
hlreg0 &= ~IXGBE_HLREG0_RXCRCSTRP;
@@ -3620,11 +3619,11 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
/*
 * Configure jumbo frame support, if any.
 */
-   if (dev->data->dev_conf.rxmode.jumbo_frame == 1) {
+   if (rx_conf->jumbo_frame == 1) {
hlreg0 |= IXGBE_HLREG0_JUMBOEN;
maxfrs = IXGBE_READ_REG(hw, IXGBE_MAXFRS);
maxfrs &= 0x0000FFFF;
-   maxfrs |= (dev->data->dev_conf.rxmode.max_rx_pkt_len << 16);
+   maxfrs |= (rx_conf->max_rx_pkt_len << 16);
IXGBE_WRITE_REG(hw, IXGBE_MAXFRS, maxfrs);
} else
hlreg0 &= ~IXGBE_HLREG0_JUMBOEN;
@@ -3648,9 +3647,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
 * Reset crc_len in case it was changed after queue setup by a
 * call to configure.
 */
-   rxq->crc_len = (uint8_t)
-   ((dev->data->dev_conf.rxmode.hw_strip_crc) ? 0 :
-   ETHER_CRC_LEN);
+   rxq->crc_len = rx_conf->hw_strip_crc ? 0 : ETHER_CRC_LEN;

/* Setup the Base and Length of the Rx Descriptor Rings */
bus_addr = rxq->rx_ring_phys_addr;
@@ -3668,7 +3665,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
/*
 * Configure Header Split
 */
-   if (dev->data->dev_conf.rxmode.header_split) {
+   if (rx_conf->header_split) {
if (hw->mac.type == ixgbe_mac_82599EB) {
/* Must setup the PSRTYPE register */
uint32_t psrtype;
@@ -3678,7 +3675,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
IXGBE_PSRTYPE_IPV6HDR;
IXGBE_WRITE_REG(hw, 
IXGBE_PSRTYPE(rxq->reg_idx), psrtype);
}
-   srrctl = ((dev->data->dev_conf.rxmode.split_hdr_size <<
+   srrctl = ((rx_conf->split_hdr_size <<
IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT) &
IXGBE_SRRCTL_BSIZEHDR_MASK);
srrctl |= IXGBE_SRRCTL_DESCTYPE_HDR_SPLIT_ALWAYS;
@@ -3712,7 +3709,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
dev->data->s

[dpdk-dev] [PATCH v9 0/3]: Add LRO support to ixgbe PMD

2015-03-30 Thread Vlad Zolotarov
This series adds the missing flow for enabling the LRO in the ethdev and
adds a support for this feature in the ixgbe PMD. There is a big hope that this
initiative is going to be picked up by some Intel developer that would add the 
LRO support
to other Intel PMDs.

The series starts with some cleanup work in the code that the final patch (the 
actual addition of
the LRO support) is going to touch/use/change. There are still quite a few 
issues left in the ixgbe
PMD code, but they will have to be the subject of a different series; I've left 
a few "TODO"
remarks in the code.

The LRO ("RSC" in Intel's context) PMD completion handling code follows the 
same design as the
corresponding Linux and FreeBSD implementation: pass the aggregation's cluster 
HEAD buffer to
the NEXTP entry of the software ring till EOP is met.
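That design can be sketched as a toy model (the `sw_entry` struct and `rsc_walk` helper are illustrative stand-ins, not the actual PMD code): the hardware links an aggregation through NEXTP indices, and the software ring hands the cluster HEAD forward along that chain until the descriptor flagged EOP is reached, at which point the whole cluster is returned at once.

```c
#include <stdbool.h>

/* Toy model of the RSC completion walk described above. */
struct sw_entry {
	int head;	/* index of the cluster HEAD, -1 if not yet assigned */
	int nextp;	/* NEXTP: next descriptor of this aggregation */
	bool eop;	/* end-of-packet flag from the HW descriptor */
};

static int
rsc_walk(struct sw_entry *ring, int idx)
{
	int head = idx;	/* the first buffer of the aggregation is the HEAD */

	while (!ring[idx].eop) {
		/* pass the HEAD to the NEXTP entry of the software ring */
		ring[ring[idx].nextp].head = head;
		idx = ring[idx].nextp;
	}
	return head;	/* EOP met: the cluster is complete */
}
```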

HW configuration follows the corresponding specs: this feature is supported 
only by x540 and
82599 PF devices.

The feature has been tested with seastar TCP stack with the following 
configuration on Tx side:
   - MTU: 400B
   - 100 concurrent TCP connections.

The results were:
   - Without LRO: total throughput: 0.12Gbps, coefficient of variance: 1.41%
   - With LRO: total throughput: 8.21Gbps, coefficient of variance: 0.59%

This is almost a 70-fold improvement.

New in v9:
   - Move newly added IXGBE_XXX macros to ixgbe_ethdev.h.

New in v8:
   - Fixed the structs naming: igb_xxx -> ixgbe_xxx (some leftovers in PATCH2).
   - Took the RSC configuration code from ixgbe_dev_rx_init() into a separate
 function - ixgbe_set_rsc().
   - Added some missing macros for HW configuration.
   - Styling adjustments:
  - Functions names.
  - Functions descriptions.
   - Reworked the ixgbe_free_rsc_cluster() code to make it more readable.
   - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not supported by
 ixgbe PMD.

New in v7:
   - Free not-yet-completed RSC aggregations in rte_eth_dev_stop() flow.
   - Fixed rx_bulk_alloc_allowed and rx_vec_allowed initialization:
  - Don't set them to FALSE in rte_eth_dev_stop() flow - the following
rte_eth_dev_start() will need them.
  - Reset them to TRUE in rte_eth_dev_configure() and not in a probe() flow.
This will ensure the proper behaviour if port is re-configured.
   - Reset the sw_ring[].mbuf entry in a bulk allocation case.
 This is needed for ixgbe_rx_queue_release_mbufs().
   - _recv_pkts_lro(): added the missing memory barrier before RDT update in a
 non-bulk allocation case.
   - Don't allow RSC when device is configured in an SR-IOV mode.
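The ordering the memory-barrier bullet refers to can be sketched as follows (`publish_desc`, `desc` and `rdt` are stand-in names, not DPDK symbols): every descriptor store must be globally visible before the RDT tail write, otherwise the NIC could fetch a half-initialized descriptor.

```c
#include <stdint.h>

/* Stand-in for the descriptor ring and the RDT tail register. */
static volatile uint64_t desc[4];
static volatile uint16_t rdt;

static void
publish_desc(uint64_t dma_addr, uint16_t tail)
{
	desc[tail] = dma_addr;	/* 1. fill the descriptor */
	__sync_synchronize();	/* 2. full barrier, like rte_wmb() */
	rdt = tail;		/* 3. only then expose it via the tail write */
}
```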

New in v6:
   - Fixing the typo in the "bug fixes" series that broke the compilation caused
 a minor change in this follow-up series.

New in v5:
   - Split the series into "bug fixes" and "all the rest" so that the former 
could be
 integrated into a 2.0 release.
   - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of 
rte_ethdev.h.
   - Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.

New in v4:
   - Remove CONFIG_RTE_ETHDEV_LRO_SUPPORT from config/common_linuxapp.
   - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h.
   - As a result of "ixgbe: check rxd number to avoid mbuf leak" (352078e8e) 
Vector Rx
 had to get the same treatment as Rx Bulk Alloc (see PATCH4 for more 
details).

New in v3:
   - ixgbe_rx_alloc_bufs(): Always reset refcnt of the buffers to 1. Otherwise 
rte_pktmbuf_free()
 won't free them.
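The reason a stale refcount leaks the buffer can be shown with a minimal model of mbuf refcounting (the names are illustrative, not the DPDK API): rte_pktmbuf_free() only returns a buffer to the pool when the post-decrement count reaches zero, so a buffer left with refcnt 0 would wrap on decrement and never be freed.

```c
#include <stdint.h>
#include <stdbool.h>

struct toy_mbuf { uint16_t refcnt; };

/* Mirrors the free semantics: release the buffer iff the count hits zero. */
static bool
toy_mbuf_free(struct toy_mbuf *m)
{
	return --m->refcnt == 0;
}
```

Resetting refcnt to 1 on every (re)allocation, as this change does, guarantees the subsequent free actually releases the buffer.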

New in v2:
   - Removed rte_eth_dev_data.lro_bulk_alloc and added 
ixgbe_hw.rx_bulk_alloc_allowed
 instead.
   - Unified the rx_pkt_bulk callback setting (a separate new patch).
   - Fixed a few styling and spelling issues.

Vlad Zolotarov (3):
  ixgbe: Cleanups
  ixgbe: Code refactoring
  ixgbe: Add LRO support

 lib/librte_ether/rte_ethdev.h   |   9 +-
 lib/librte_net/rte_ip.h |   3 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  11 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  13 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 765 
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
 6 files changed, 738 insertions(+), 69 deletions(-)

-- 
2.1.0



[dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support

2015-03-30 Thread Vlad Zolotarov


On 03/30/15 18:37, Vlad Zolotarov wrote:
>
>
> On 03/30/15 17:18, Ananyev, Konstantin wrote:
>> Hi Vlad,
>>
>>> -Original Message-
>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
>>> Sent: Wednesday, March 18, 2015 5:52 PM
>>> To: dev at dpdk.org
>>> Subject: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
>>>
>>>  - Only x540 and 82599 devices support LRO.
>>>  - Add the appropriate HW configuration.
>>>  - Add RSC aware rx_pkt_burst() handlers:
>>> - Implemented bulk allocation and non-bulk allocation versions.
>>> - Add LRO-specific fields to rte_eth_rxmode, to 
>>> rte_eth_dev_data
>>>   and to ixgbe_rx_queue.
>>> - Use the appropriate handler when LRO is requested.
>>>
>>> Signed-off-by: Vlad Zolotarov 
>>> ---
>>> New in v8:
>>> - Took the RSC configuration code from ixgbe_dev_rx_init() into 
>>> a separate
>>>   function - ixgbe_set_rsc().
>>> - Added some missing macros for HW configuration.
>>> - Styling adjustments:
>>>- Functions names.
>>>- Functions descriptions.
>>> - Reworked the ixgbe_free_rsc_cluster() code to make it more 
>>> readable.
>>> - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not 
>>> supported by
>>>   ixgbe PMD.
>>>
>>> New in v7:
>>> - Free not-yet-completed RSC aggregations in rte_eth_dev_stop() 
>>> flow.
>>> - Reset the sw_ring[].mbuf entry in a bulk allocation case.
>>>   This is needed for ixgbe_rx_queue_release_mbufs().
>>> - _recv_pkts_lro(): added the missing memory barrier before RDT 
>>> update in a
>>>   non-bulk allocation case.
>>> - Don't allow RSC when device is configured in an SR-IOV mode.
>>>
>>> New in v5:
>>> - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning 
>>> of rte_ethdev.h.
>>> - Removed the "TODO: Remove me" comment near 
>>> RTE_ETHDEV_HAS_LRO_SUPPORT.
>>>
>>> New in v4:
>>> - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
>>>   RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.
>>>
>>> New in v2:
>>> - Removed rte_eth_dev_data.lro_bulk_alloc.
>>> - Fixed a few styling and spelling issues.
>>> ---
>>>   lib/librte_ether/rte_ethdev.h   |   9 +-
>>>   lib/librte_net/rte_ip.h |   3 +
>>>   lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   6 +
>>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  11 +
>>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   5 +
>>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 610 
>>> +++-
>>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
>>>   7 files changed, 642 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/lib/librte_ether/rte_ethdev.h 
>>> b/lib/librte_ether/rte_ethdev.h
>>> index 21aa359..61dc49a 100644
>>> --- a/lib/librte_ether/rte_ethdev.h
>>> +++ b/lib/librte_ether/rte_ethdev.h
>>> @@ -172,6 +172,9 @@ extern "C" {
>>>
>>>   #include 
>>>
>>> +/* Use this macro to check if LRO API is supported */
>>> +#define RTE_ETHDEV_HAS_LRO_SUPPORT
>>> +
>>>   #include 
>>>   #include 
>>>   #include 
>>> @@ -320,14 +323,15 @@ struct rte_eth_rxmode {
>>>   enum rte_eth_rx_mq_mode mq_mode;
>>>   uint32_t max_rx_pkt_len;  /**< Only used if jumbo_frame 
>>> enabled. */
>>>   uint16_t split_hdr_size;  /**< hdr buf size (header_split 
>>> enabled).*/
>>> -uint8_t header_split : 1, /**< Header Split enable. */
>>> +uint16_t header_split : 1, /**< Header Split enable. */
>>>   hw_ip_checksum   : 1, /**< IP/UDP/TCP checksum offload 
>>> enable. */
>>>   hw_vlan_filter   : 1, /**< VLAN filter enable. */
>>>   hw_vlan_strip: 1, /**< VLAN strip enable. */
>>>   hw_vlan_extend   : 1, /**< Extended VLAN enable. */
>>>   jumbo_frame  : 1, /**< Jumbo Frame Receipt enable. */
>>>   hw_strip_crc : 1, /**< Enable CRC stripping by 
>>> hardware. */
>>> -enable_scatter   : 1; /**< Enable scatter packets rx 
>>> handler */
>>> + 

[dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support

2015-03-30 Thread Vlad Zolotarov


On 03/30/15 17:18, Ananyev, Konstantin wrote:
> Hi Vlad,
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
>> Sent: Wednesday, March 18, 2015 5:52 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
>>
>>  - Only x540 and 82599 devices support LRO.
>>  - Add the appropriate HW configuration.
>>  - Add RSC aware rx_pkt_burst() handlers:
>> - Implemented bulk allocation and non-bulk allocation versions.
>> - Add LRO-specific fields to rte_eth_rxmode, to rte_eth_dev_data
>>   and to ixgbe_rx_queue.
>>     - Use the appropriate handler when LRO is requested.
>>
>> Signed-off-by: Vlad Zolotarov 
>> ---
>> New in v8:
>> - Took the RSC configuration code from ixgbe_dev_rx_init() into a 
>> separate
>>   function - ixgbe_set_rsc().
>> - Added some missing macros for HW configuration.
>> - Styling adjustments:
>>- Functions names.
>>- Functions descriptions.
>> - Reworked the ixgbe_free_rsc_cluster() code to make it more readable.
>> - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not supported 
>> by
>>   ixgbe PMD.
>>
>> New in v7:
>> - Free not-yet-completed RSC aggregations in rte_eth_dev_stop() flow.
>> - Reset the sw_ring[].mbuf entry in a bulk allocation case.
>>   This is needed for ixgbe_rx_queue_release_mbufs().
>> - _recv_pkts_lro(): added the missing memory barrier before RDT update 
>> in a
>>   non-bulk allocation case.
>> - Don't allow RSC when device is configured in an SR-IOV mode.
>>
>> New in v5:
>> - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of 
>> rte_ethdev.h.
>> - Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.
>>
>> New in v4:
>> - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
>>   RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.
>>
>> New in v2:
>> - Removed rte_eth_dev_data.lro_bulk_alloc.
>> - Fixed a few styling and spelling issues.
>> ---
>>   lib/librte_ether/rte_ethdev.h   |   9 +-
>>   lib/librte_net/rte_ip.h |   3 +
>>   lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   6 +
>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  11 +
>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   5 +
>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 610 
>> +++-
>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
>>   7 files changed, 642 insertions(+), 8 deletions(-)
>>
>> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
>> index 21aa359..61dc49a 100644
>> --- a/lib/librte_ether/rte_ethdev.h
>> +++ b/lib/librte_ether/rte_ethdev.h
>> @@ -172,6 +172,9 @@ extern "C" {
>>
>>   #include 
>>
>> +/* Use this macro to check if LRO API is supported */
>> +#define RTE_ETHDEV_HAS_LRO_SUPPORT
>> +
>>   #include 
>>   #include 
>>   #include 
>> @@ -320,14 +323,15 @@ struct rte_eth_rxmode {
>>  enum rte_eth_rx_mq_mode mq_mode;
>>  uint32_t max_rx_pkt_len;  /**< Only used if jumbo_frame enabled. */
>>  uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
>> -uint8_t header_split : 1, /**< Header Split enable. */
>> +uint16_t header_split : 1, /**< Header Split enable. */
>>  hw_ip_checksum   : 1, /**< IP/UDP/TCP checksum offload enable. 
>> */
>>  hw_vlan_filter   : 1, /**< VLAN filter enable. */
>>  hw_vlan_strip: 1, /**< VLAN strip enable. */
>>  hw_vlan_extend   : 1, /**< Extended VLAN enable. */
>>  jumbo_frame  : 1, /**< Jumbo Frame Receipt enable. */
>>  hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
>> -enable_scatter   : 1; /**< Enable scatter packets rx handler */
>> +enable_scatter   : 1, /**< Enable scatter packets rx handler */
>> +enable_lro   : 1; /**< Enable LRO */
>>   };
>>
>>   /**
>> @@ -1515,6 +1519,7 @@ struct rte_eth_dev_data {
>>  uint8_t port_id;   /**< Device [external] port identifier. */
>>  uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0). */
>>  scattered_rx : 1,  /**< RX of scattered packets is ON(1) / 
>> OFF(0) */
>> +lro   

[dpdk-dev] [dpdk=dev] [PATCH v8 0/3]: Add LRO support to ixgbe PMD

2015-03-18 Thread Vlad Zolotarov
There was a typo in a format-patch command - please ignore the whole 
series. I'm respinning it with the proper subject.


On 03/18/15 19:48, Vlad Zolotarov wrote:
> This series adds the missing flow for enabling the LRO in the ethdev and
> adds a support for this feature in the ixgbe PMD. There is a big hope that 
> this
> initiative is going to be picked up by some Intel developer that would add 
> the LRO support
> to other Intel PMDs.
>
> The series starts with some cleanup work in the code the final patch (the 
> actual adding of
> the LRO support) is going to touch/use/change. There are still quite a few 
> issues in the ixgbe
> PMD code left but they have to be a matter of a different series and I've 
> left a few "TODO"
> remarks in the code.
>
> The LRO ("RSC" in Intel's context) PMD completion handling code follows the 
> same design as the
> corresponding Linux and FreeBSD implementation: pass the aggregation's 
> cluster HEAD buffer to
> the NEXTP entry of the software ring till EOP is met.
>
> HW configuration follows the corresponding specs: this feature is supported 
> only by x540 and
> 82599 PF devices.
>
> The feature has been tested with seastar TCP stack with the following 
> configuration on Tx side:
> - MTU: 400B
> - 100 concurrent TCP connections.
>
> The results were:
> - Without LRO: total throughput: 0.12Gbps, coefficient of variance: 1.41%
> - With LRO: total throughput: 8.21Gbps, coefficient of variance: 0.59%
>
> This is almost a 70-fold improvement.
>
> New in v8:
> - Fixed the structs naming: igb_xxx -> ixgbe_xxx (some leftovers in 
> PATCH2).
> - Took the RSC configuration code from ixgbe_dev_rx_init() into a separate
>   function - ixgbe_set_rsc().
> - Added some missing macros for HW configuration.
> - Styling adjustments:
>- Functions names.
>- Functions descriptions.
> - Reworked the ixgbe_free_rsc_cluster() code to make it more readable.
> - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not supported 
> by
>   ixgbe PMD.
>
> New in v7:
> - Free not-yet-completed RSC aggregations in rte_eth_dev_stop() flow.
> - Fixed rx_bulk_alloc_allowed and rx_vec_allowed initialization:
>- Don't set them to FALSE in rte_eth_dev_stop() flow - the following
>  rte_eth_dev_start() will need them.
>- Reset them to TRUE in rte_eth_dev_configure() and not in a probe() 
> flow.
>  This will ensure the proper behaviour if port is re-configured.
> - Reset the sw_ring[].mbuf entry in a bulk allocation case.
>   This is needed for ixgbe_rx_queue_release_mbufs().
> - _recv_pkts_lro(): added the missing memory barrier before RDT update in 
> a
>   non-bulk allocation case.
> - Don't allow RSC when device is configured in an SR-IOV mode.
>
> New in v6:
> - Fix of the typo in the "bug fixes" series that broke the compilation 
> caused a
>   minor change in this follow-up series.
>
> New in v5:
> - Split the series into "bug fixes" and "all the rest" so that the former 
> could be
>   integrated into a 2.0 release.
> - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of 
> rte_ethdev.h.
> - Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.
>
> New in v4:
> - Remove CONFIG_RTE_ETHDEV_LRO_SUPPORT from config/common_linuxapp.
> - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h.
> - As a result of "ixgbe: check rxd number to avoid mbuf leak" (352078e8e) 
> Vector Rx
>   had to get the same treatment as Rx Bulk Alloc (see PATCH4 for more 
> details).
>
> New in v3:
> - ixgbe_rx_alloc_bufs(): Always reset refcnt of the buffers to 1. 
> Otherwise rte_pktmbuf_free()
>   won't free them.
>
> New in v2:
> - Removed rte_eth_dev_data.lro_bulk_alloc and added 
> ixgbe_hw.rx_bulk_alloc_allowed
>   instead.
> - Unified the rx_pkt_bulk callback setting (a separate new patch).
> - Fixed a few styling and spelling issues.
>
>
> Vlad Zolotarov (3):
>ixgbe: Cleanups
>ixgbe: Code refactoring
>ixgbe: Add LRO support
>
>   lib/librte_ether/rte_ethdev.h   |   9 +-
>   lib/librte_net/rte_ip.h |   3 +
>   lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   6 +
>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  11 +
>   lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   5 +
>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 766 
> +---
>   lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
>   7 files changed, 737 insertions(+), 69 deletions(-)
>



[dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support

2015-03-18 Thread Vlad Zolotarov
- Only x540 and 82599 devices support LRO.
- Add the appropriate HW configuration.
- Add RSC aware rx_pkt_burst() handlers:
   - Implemented bulk allocation and non-bulk allocation versions.
   - Add LRO-specific fields to rte_eth_rxmode, to rte_eth_dev_data
 and to ixgbe_rx_queue.
   - Use the appropriate handler when LRO is requested.

Signed-off-by: Vlad Zolotarov 
---
New in v8:
   - Took the RSC configuration code from ixgbe_dev_rx_init() into a separate
 function - ixgbe_set_rsc().
   - Added some missing macros for HW configuration.
   - Styling adjustments:
  - Functions names.
  - Functions descriptions.
   - Reworked the ixgbe_free_rsc_cluster() code to make it more readable.
   - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not supported by
 ixgbe PMD.

New in v7:
   - Free not-yet-completed RSC aggregations in rte_eth_dev_stop() flow.
   - Reset the sw_ring[].mbuf entry in a bulk allocation case.
 This is needed for ixgbe_rx_queue_release_mbufs().
   - _recv_pkts_lro(): added the missing memory barrier before RDT update in a
 non-bulk allocation case.
   - Don't allow RSC when device is configured in an SR-IOV mode.

New in v5:
   - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of 
rte_ethdev.h.
   - Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.

New in v4:
   - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
 RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.

New in v2:
   - Removed rte_eth_dev_data.lro_bulk_alloc.
   - Fixed a few styling and spelling issues.
---
 lib/librte_ether/rte_ethdev.h   |   9 +-
 lib/librte_net/rte_ip.h |   3 +
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   6 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  11 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   5 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 610 +++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
 7 files changed, 642 insertions(+), 8 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 21aa359..61dc49a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -172,6 +172,9 @@ extern "C" {

 #include 

+/* Use this macro to check if LRO API is supported */
+#define RTE_ETHDEV_HAS_LRO_SUPPORT
+
 #include 
 #include 
 #include 
@@ -320,14 +323,15 @@ struct rte_eth_rxmode {
enum rte_eth_rx_mq_mode mq_mode;
uint32_t max_rx_pkt_len;  /**< Only used if jumbo_frame enabled. */
uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
-   uint8_t header_split : 1, /**< Header Split enable. */
+   uint16_t header_split : 1, /**< Header Split enable. */
hw_ip_checksum   : 1, /**< IP/UDP/TCP checksum offload enable. 
*/
hw_vlan_filter   : 1, /**< VLAN filter enable. */
hw_vlan_strip: 1, /**< VLAN strip enable. */
hw_vlan_extend   : 1, /**< Extended VLAN enable. */
jumbo_frame  : 1, /**< Jumbo Frame Receipt enable. */
hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
-   enable_scatter   : 1; /**< Enable scatter packets rx handler */
+   enable_scatter   : 1, /**< Enable scatter packets rx handler */
+   enable_lro   : 1; /**< Enable LRO */
 };

 /**
@@ -1515,6 +1519,7 @@ struct rte_eth_dev_data {
uint8_t port_id;   /**< Device [external] port identifier. */
uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0). */
scattered_rx : 1,  /**< RX of scattered packets is ON(1) / 
OFF(0) */
+   lro  : 1,  /**< RX LRO is ON(1) / OFF(0) */
all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
dev_started : 1;   /**< Device state: STARTED(1) / STOPPED(0). 
*/
 };
diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index 64935d9..74c9ced 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -110,6 +110,9 @@ struct ipv4_hdr {
   (((c) & 0xff) << 8)  | \
   ((d) & 0xff))

+/** Maximal IPv4 packet length (including a header) */
+#define IPV4_MAX_PKT_LEN 65535
+
 /** Internet header length mask for version_ihl field */
 #define IPV4_HDR_IHL_MASK  (0x0f)
 /**
diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h 
b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
index 9a66370..4998627 100644
--- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
+++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
@@ -234,8 +234,14 @@ struct ixgbe_thermal_sensor_data {
 #define IXGBE_EITR(_i) (((_i) <= 23) ? (0x00820 + ((_i) * 4)) : \
 (0x012300 + (((_i) - 24) * 4)))
 

[dpdk-dev] [PATCH v8 2/3] ixgbe: Code refactoring

2015-03-18 Thread Vlad Zolotarov
   - ixgbe_rx_alloc_bufs():
  - Reset the rte_mbuf fields only when requested.
  - Take the RDT update out of the function.
  - Add the stub when RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is not defined.
   - ixgbe_recv_scattered_pkts():
  - Take the code that updates the fields of the cluster's HEAD buffer into
the inline function.

Signed-off-by: Vlad Zolotarov 
---
New in v8:
   - Fixed the structs naming: igb_xxx -> ixgbe_xxx
   - Adjust the code style to match the ixgbe PMD styling.

New in v3:
   - ixgbe_rx_alloc_bufs(): Always reset refcnt of the buffers to 1.
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 127 --
 1 file changed, 82 insertions(+), 45 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index f17e8e1..a08ae6a 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -1021,7 +1021,7 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
 }

 static inline int
-ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
+ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool reset_mbuf)
 {
volatile union ixgbe_adv_rx_desc *rxdp;
struct ixgbe_rx_entry *rxep;
@@ -1042,11 +1042,14 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
for (i = 0; i < rxq->rx_free_thresh; ++i) {
/* populate the static rte mbuf fields */
mb = rxep[i].mbuf;
+   if (reset_mbuf) {
+   mb->next = NULL;
+   mb->nb_segs = 1;
+   mb->port = rxq->port_id;
+   }
+
rte_mbuf_refcnt_set(mb, 1);
-   mb->next = NULL;
mb->data_off = RTE_PKTMBUF_HEADROOM;
-   mb->nb_segs = 1;
-   mb->port = rxq->port_id;

/* populate the descriptors */
dma_addr = rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
@@ -1054,10 +1057,6 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
rxdp[i].read.pkt_addr = dma_addr;
}

-   /* update tail pointer */
-   rte_wmb();
-   IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, rxq->rx_free_trigger);
-
/* update state of internal queue structure */
rxq->rx_free_trigger = rxq->rx_free_trigger + rxq->rx_free_thresh;
if (rxq->rx_free_trigger >= rxq->nb_rx_desc)
@@ -1109,7 +1108,9 @@ rx_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,

/* if required, allocate new buffers to replenish descriptors */
if (rxq->rx_tail > rxq->rx_free_trigger) {
-   if (ixgbe_rx_alloc_bufs(rxq) != 0) {
+   uint16_t cur_free_trigger = rxq->rx_free_trigger;
+
+   if (ixgbe_rx_alloc_bufs(rxq, true) != 0) {
int i, j;
PMD_RX_LOG(DEBUG, "RX mbuf alloc failed port_id=%u "
   "queue_id=%u", (unsigned) rxq->port_id,
@@ -1129,6 +1130,10 @@ rx_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,

return 0;
}
+
+   /* update tail pointer */
+   rte_wmb();
+   IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, cur_free_trigger);
}

if (rxq->rx_tail >= rxq->nb_rx_desc)
@@ -1168,6 +1173,13 @@ ixgbe_recv_pkts_bulk_alloc(void *rx_queue, struct 
rte_mbuf **rx_pkts,

return nb_rx;
 }
+#else
+static inline int
+ixgbe_rx_alloc_bufs(__rte_unused struct ixgbe_rx_queue *rxq,
+   __rte_unused bool reset_mbuf)
+{
+   return -ENOMEM;
+}
 #endif /* RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC */

 uint16_t
@@ -1352,6 +1364,64 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts,
return (nb_rx);
 }

+/**
+ * Detect an RSC descriptor.
+ */
+static inline uint32_t
+ixgbe_rsc_count(union ixgbe_adv_rx_desc *rx)
+{
+   return (rte_le_to_cpu_32(rx->wb.lower.lo_dword.data) &
+   IXGBE_RXDADV_RSCCNT_MASK) >> IXGBE_RXDADV_RSCCNT_SHIFT;
+}
+
+/**
+ * ixgbe_fill_cluster_head_buf - fill the first mbuf of the returned packet
+ *
+ * Fill the following info in the HEAD buffer of the Rx cluster:
+ *- RX port identifier
+ *- hardware offload data, if any:
+ *  - RSS flag & hash
+ *  - IP checksum flag
+ *  - VLAN TCI, if any
+ *  - error flags
+ * @head HEAD of the packet cluster
+ * @desc HW descriptor to get data from
+ * @port_id Port ID of the Rx queue
+ */
+static inline void
+ixgbe_fill_cluster_head_buf(
+   struct rte_mbuf *head,
+   union ixgbe_adv_rx_desc *desc,
+   uint8_t port_id,
+   uint32_t staterr)
+{
+   uint32_t hlen_type_rss;
+   uint64_t pkt_flags;
+
+   head->port = port_id;
+
+   /*
+* The vlan_tci field is only valid when PKT_RX_VLAN_PKT is
+* set in the pkt_flags field.
+*/
+ 

[dpdk-dev] [PATCH v8 1/3] ixgbe: Cleanups

2015-03-18 Thread Vlad Zolotarov
   - Removed the unneeded casts.
   - ixgbe_dev_rx_init(): shorten the lines by defining a local alias variable 
to access
  &dev->data->dev_conf.rxmode.

Signed-off-by: Vlad Zolotarov 
---
New in v6:
   - Fixed a compilation error caused by a patches recomposition during series 
separation.
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 29 +
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 42f0aa5..f17e8e1 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -1031,8 +1031,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
int diag, i;

/* allocate buffers in bulk directly into the S/W ring */
-   alloc_idx = (uint16_t)(rxq->rx_free_trigger -
-   (rxq->rx_free_thresh - 1));
+   alloc_idx = rxq->rx_free_trigger - (rxq->rx_free_thresh - 1);
	rxep = &rxq->sw_ring[alloc_idx];
diag = rte_mempool_get_bulk(rxq->mb_pool, (void *)rxep,
rxq->rx_free_thresh);
@@ -1060,10 +1059,9 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, rxq->rx_free_trigger);

/* update state of internal queue structure */
-   rxq->rx_free_trigger = (uint16_t)(rxq->rx_free_trigger +
-   rxq->rx_free_thresh);
+   rxq->rx_free_trigger = rxq->rx_free_trigger + rxq->rx_free_thresh;
if (rxq->rx_free_trigger >= rxq->nb_rx_desc)
-   rxq->rx_free_trigger = (uint16_t)(rxq->rx_free_thresh - 1);
+   rxq->rx_free_trigger = rxq->rx_free_thresh - 1;

/* no errors */
return 0;
@@ -3579,6 +3577,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
uint32_t rxcsum;
uint16_t buf_size;
uint16_t i;
+   struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;

PMD_INIT_FUNC_TRACE();
hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -3601,7 +3600,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
 * Configure CRC stripping, if any.
 */
hlreg0 = IXGBE_READ_REG(hw, IXGBE_HLREG0);
-   if (dev->data->dev_conf.rxmode.hw_strip_crc)
+   if (rx_conf->hw_strip_crc)
hlreg0 |= IXGBE_HLREG0_RXCRCSTRP;
else
hlreg0 &= ~IXGBE_HLREG0_RXCRCSTRP;
@@ -3609,11 +3608,11 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
/*
 * Configure jumbo frame support, if any.
 */
-   if (dev->data->dev_conf.rxmode.jumbo_frame == 1) {
+   if (rx_conf->jumbo_frame == 1) {
hlreg0 |= IXGBE_HLREG0_JUMBOEN;
maxfrs = IXGBE_READ_REG(hw, IXGBE_MAXFRS);
maxfrs &= 0x0000FFFF;
-   maxfrs |= (dev->data->dev_conf.rxmode.max_rx_pkt_len << 16);
+   maxfrs |= (rx_conf->max_rx_pkt_len << 16);
IXGBE_WRITE_REG(hw, IXGBE_MAXFRS, maxfrs);
} else
hlreg0 &= ~IXGBE_HLREG0_JUMBOEN;
@@ -3637,9 +3636,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
 * Reset crc_len in case it was changed after queue setup by a
 * call to configure.
 */
-   rxq->crc_len = (uint8_t)
-   ((dev->data->dev_conf.rxmode.hw_strip_crc) ? 0 :
-   ETHER_CRC_LEN);
+   rxq->crc_len = rx_conf->hw_strip_crc ? 0 : ETHER_CRC_LEN;

/* Setup the Base and Length of the Rx Descriptor Rings */
bus_addr = rxq->rx_ring_phys_addr;
@@ -3657,7 +3654,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
/*
 * Configure Header Split
 */
-   if (dev->data->dev_conf.rxmode.header_split) {
+   if (rx_conf->header_split) {
if (hw->mac.type == ixgbe_mac_82599EB) {
/* Must setup the PSRTYPE register */
uint32_t psrtype;
@@ -3667,7 +3664,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
IXGBE_PSRTYPE_IPV6HDR;
IXGBE_WRITE_REG(hw, 
IXGBE_PSRTYPE(rxq->reg_idx), psrtype);
}
-   srrctl = ((dev->data->dev_conf.rxmode.split_hdr_size <<
+   srrctl = ((rx_conf->split_hdr_size <<
IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT) &
IXGBE_SRRCTL_BSIZEHDR_MASK);
srrctl |= IXGBE_SRRCTL_DESCTYPE_HDR_SPLIT_ALWAYS;
@@ -3701,7 +3698,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
dev->data->s

[dpdk-dev] [PATCH v8 0/3]: Add LRO support to ixgbe PMD

2015-03-18 Thread Vlad Zolotarov
This series adds the missing flow for enabling LRO in the ethdev layer and
adds support for this feature in the ixgbe PMD. There is a big hope that this
initiative will be picked up by some Intel developer who would add LRO support
to the other Intel PMDs.

The series starts with some cleanup work in the code that the final patch (the
actual addition of LRO support) is going to touch, use, or change. There are
still quite a few issues left in the ixgbe PMD code, but they will have to be
a matter of a different series; I've left a few "TODO" remarks in the code.

The LRO ("RSC" in Intel's context) PMD completion handling code follows the
same design as the corresponding Linux and FreeBSD implementations: pass the
aggregation's cluster HEAD buffer to the NEXTP entry of the software ring
till EOP is met.
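The completion scheme described above (follow NEXTP from the HEAD buffer until EOP is met) can be sketched with simplified stand-in types. The struct layout and names below are illustrative only, not the PMD's actual definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the ixgbe software-ring entry and mbuf. */
struct mbuf { struct mbuf *next; unsigned nb_segs; };

struct sw_entry {
    struct mbuf *mbuf;
    unsigned short next_id; /* NEXTP: index of the next entry in the chain */
    int eop;                /* EOP bit from the completed descriptor */
};

/* Walk an RSC aggregation: starting from the HEAD entry, follow NEXTP
 * until a descriptor with EOP set is found, linking the mbufs together. */
static struct mbuf *
rsc_assemble(struct sw_entry *ring, unsigned short head_id)
{
    struct mbuf *head = ring[head_id].mbuf;
    struct mbuf *tail = head;
    unsigned short id = head_id;

    head->nb_segs = 1;
    while (!ring[id].eop) {
        id = ring[id].next_id;      /* hop to the NEXTP entry */
        tail->next = ring[id].mbuf; /* append the buffer to the cluster */
        tail = tail->next;
        head->nb_segs++;
    }
    tail->next = NULL;
    return head;
}
```

Note that the chain is followed in completion order, so entries may be visited out of ring order; only the EOP bit terminates the walk.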

HW configuration follows the corresponding specs: this feature is supported 
only by x540 and
82599 PF devices.

The feature has been tested with seastar TCP stack with the following 
configuration on Tx side:
   - MTU: 400B
   - 100 concurrent TCP connections.

The results were:
   - Without LRO: total throughput: 0.12 Gbps, coefficient of variance: 1.41%
   - With LRO:    total throughput: 8.21 Gbps, coefficient of variance: 0.59%

This is roughly a 68-fold throughput improvement.

New in v8:
   - Fixed the structs naming: igb_xxx -> ixgbe_xxx (some leftovers in PATCH2).
   - Took the RSC configuration code from ixgbe_dev_rx_init() into a separate
 function - ixgbe_set_rsc().
   - Added some missing macros for HW configuration.
   - Styling adjustments:
  - Functions names.
  - Functions descriptions.
   - Reworked the ixgbe_free_rsc_cluster() code to make it more readable.
   - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not supported by
 ixgbe PMD.

New in v7:
   - Free not-yet-completed RSC aggregations in rte_eth_dev_stop() flow.
   - Fixed rx_bulk_alloc_allowed and rx_vec_allowed initialization:
  - Don't set them to FALSE in rte_eth_dev_stop() flow - the following
rte_eth_dev_start() will need them.
  - Reset them to TRUE in rte_eth_dev_configure() and not in a probe() flow.
This will ensure the proper behaviour if port is re-configured.
   - Reset the sw_ring[].mbuf entry in a bulk allocation case.
 This is needed for ixgbe_rx_queue_release_mbufs().
   - _recv_pkts_lro(): added the missing memory barrier before RDT update in a
 non-bulk allocation case.
   - Don't allow RSC when device is configured in an SR-IOV mode.

New in v6:
   - Fix of the typo in the "bug fixes" series that broke the compilation 
caused a
 minor change in this follow-up series.

New in v5:
   - Split the series into "bug fixes" and "all the rest" so that the former 
could be
 integrated into a 2.0 release.
   - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of 
rte_ethdev.h.
   - Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.

New in v4:
   - Remove CONFIG_RTE_ETHDEV_LRO_SUPPORT from config/common_linuxapp.
   - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h.
   - As a result of "ixgbe: check rxd number to avoid mbuf leak" (352078e8e) 
Vector Rx
 had to get the same treatment as Rx Bulk Alloc (see PATCH4 for more 
details).

New in v3:
   - ixgbe_rx_alloc_bufs(): Always reset refcnt of the buffers to 1. Otherwise 
rte_pktmbuf_free()
 won't free them.

New in v2:
   - Removed rte_eth_dev_data.lro_bulk_alloc and added 
ixgbe_hw.rx_bulk_alloc_allowed
 instead.
   - Unified the rx_pkt_bulk callback setting (a separate new patch).
   - Fixed a few styling and spelling issues.


Vlad Zolotarov (3):
  ixgbe: Cleanups
  ixgbe: Code refactoring
  ixgbe: Add LRO support

 lib/librte_ether/rte_ethdev.h   |   9 +-
 lib/librte_net/rte_ip.h |   3 +
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   6 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  11 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   5 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 766 +---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
 7 files changed, 737 insertions(+), 69 deletions(-)

-- 
2.1.0



[dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support

2015-03-18 Thread Vlad Zolotarov
- Only x540 and 82599 devices support LRO.
- Add the appropriate HW configuration.
- Add RSC aware rx_pkt_burst() handlers:
   - Implemented bulk allocation and non-bulk allocation versions.
   - Add LRO-specific fields to rte_eth_rxmode, to rte_eth_dev_data
 and to ixgbe_rx_queue.
   - Use the appropriate handler when LRO is requested.
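The last bullet — using the appropriate handler when LRO is requested — amounts to a small selection step at queue setup. The handler names and signature below are simplified stand-ins for illustration, not the PMD's actual symbols:

```c
#include <assert.h>

typedef int (*rx_burst_t)(void); /* simplified handler signature */

static int recv_pkts(void)            { return 0; } /* plain Rx path */
static int recv_pkts_lro_bulk(void)   { return 1; } /* RSC, bulk alloc */
static int recv_pkts_lro_single(void) { return 2; } /* RSC, per-mbuf alloc */

/* When LRO is requested, pick an RSC-aware handler, preferring the
 * bulk-allocation variant when the queue configuration allows it. */
static rx_burst_t
select_rx_handler(int enable_lro, int bulk_alloc_allowed)
{
    if (!enable_lro)
        return recv_pkts;
    return bulk_alloc_allowed ? recv_pkts_lro_bulk : recv_pkts_lro_single;
}
```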

Signed-off-by: Vlad Zolotarov 
---
New in v8:
   - Took the RSC configuration code from ixgbe_dev_rx_init() into a separate
 function - ixgbe_set_rsc().
   - Added some missing macros for HW configuration.
   - Styling adjustments:
  - Functions names.
  - Functions descriptions.
   - Reworked the ixgbe_free_rsc_cluster() code to make it more readable.
   - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not supported by
 ixgbe PMD.

New in v7:
   - Free not-yet-completed RSC aggregations in rte_eth_dev_stop() flow.
   - Reset the sw_ring[].mbuf entry in a bulk allocation case.
 This is needed for ixgbe_rx_queue_release_mbufs().
   - _recv_pkts_lro(): added the missing memory barrier before RDT update in a
 non-bulk allocation case.
   - Don't allow RSC when device is configured in an SR-IOV mode.

New in v5:
   - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of 
rte_ethdev.h.
   - Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.

New in v4:
   - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
 RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.

New in v2:
   - Removed rte_eth_dev_data.lro_bulk_alloc.
   - Fixed a few styling and spelling issues.
---
 lib/librte_ether/rte_ethdev.h   |   9 +-
 lib/librte_net/rte_ip.h |   3 +
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   6 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  11 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   5 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 610 +++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
 7 files changed, 642 insertions(+), 8 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 21aa359..61dc49a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -172,6 +172,9 @@ extern "C" {

 #include 

+/* Use this macro to check if LRO API is supported */
+#define RTE_ETHDEV_HAS_LRO_SUPPORT
+
 #include 
 #include 
 #include 
@@ -320,14 +323,15 @@ struct rte_eth_rxmode {
enum rte_eth_rx_mq_mode mq_mode;
uint32_t max_rx_pkt_len;  /**< Only used if jumbo_frame enabled. */
uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
-   uint8_t header_split : 1, /**< Header Split enable. */
+   uint16_t header_split : 1, /**< Header Split enable. */
hw_ip_checksum   : 1, /**< IP/UDP/TCP checksum offload enable. 
*/
hw_vlan_filter   : 1, /**< VLAN filter enable. */
hw_vlan_strip: 1, /**< VLAN strip enable. */
hw_vlan_extend   : 1, /**< Extended VLAN enable. */
jumbo_frame  : 1, /**< Jumbo Frame Receipt enable. */
hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
-   enable_scatter   : 1; /**< Enable scatter packets rx handler */
+   enable_scatter   : 1, /**< Enable scatter packets rx handler */
+   enable_lro   : 1; /**< Enable LRO */
 };

 /**
@@ -1515,6 +1519,7 @@ struct rte_eth_dev_data {
uint8_t port_id;   /**< Device [external] port identifier. */
uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0). */
scattered_rx : 1,  /**< RX of scattered packets is ON(1) / 
OFF(0) */
+   lro  : 1,  /**< RX LRO is ON(1) / OFF(0) */
all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
dev_started : 1;   /**< Device state: STARTED(1) / STOPPED(0). 
*/
 };
diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index 64935d9..74c9ced 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -110,6 +110,9 @@ struct ipv4_hdr {
   (((c) & 0xff) << 8)  | \
   ((d) & 0xff))

+/** Maximal IPv4 packet length (including a header) */
+#define IPV4_MAX_PKT_LEN     65535
+
 /** Internet header length mask for version_ihl field */
 #define IPV4_HDR_IHL_MASK  (0x0f)
 /**
diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h 
b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
index 9a66370..4998627 100644
--- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
+++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
@@ -234,8 +234,14 @@ struct ixgbe_thermal_sensor_data {
 #define IXGBE_EITR(_i) (((_i) <= 23) ? (0x00820 + ((_i) * 4)) : \
 (0x012300 + (((_i) - 24) * 4)))
 

[dpdk-dev] [PATCH v8 2/3] ixgbe: Code refactoring

2015-03-18 Thread Vlad Zolotarov
   - ixgbe_rx_alloc_bufs():
  - Reset the rte_mbuf fields only when requested.
  - Take the RDT update out of the function.
  - Add the stub when RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is not defined.
   - ixgbe_recv_scattered_pkts():
  - Take the code that updates the fields of the cluster's HEAD buffer into
the inline function.

Signed-off-by: Vlad Zolotarov 
---
New in v8:
   - Fixed the structs naming: igb_xxx -> ixgbe_xxx
   - Adjust a code style with the ixgbe PMD styling.

New in v3:
   - ixgbe_rx_alloc_bufs(): Always reset refcnt of the buffers to 1.
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 127 --
 1 file changed, 82 insertions(+), 45 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index f17e8e1..a08ae6a 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -1021,7 +1021,7 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
 }

 static inline int
-ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
+ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool reset_mbuf)
 {
volatile union ixgbe_adv_rx_desc *rxdp;
struct ixgbe_rx_entry *rxep;
@@ -1042,11 +1042,14 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
for (i = 0; i < rxq->rx_free_thresh; ++i) {
/* populate the static rte mbuf fields */
mb = rxep[i].mbuf;
+   if (reset_mbuf) {
+   mb->next = NULL;
+   mb->nb_segs = 1;
+   mb->port = rxq->port_id;
+   }
+
rte_mbuf_refcnt_set(mb, 1);
-   mb->next = NULL;
mb->data_off = RTE_PKTMBUF_HEADROOM;
-   mb->nb_segs = 1;
-   mb->port = rxq->port_id;

/* populate the descriptors */
dma_addr = rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
@@ -1054,10 +1057,6 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq)
rxdp[i].read.pkt_addr = dma_addr;
}

-   /* update tail pointer */
-   rte_wmb();
-   IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, rxq->rx_free_trigger);
-
/* update state of internal queue structure */
rxq->rx_free_trigger = rxq->rx_free_trigger + rxq->rx_free_thresh;
if (rxq->rx_free_trigger >= rxq->nb_rx_desc)
@@ -1109,7 +1108,9 @@ rx_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,

/* if required, allocate new buffers to replenish descriptors */
if (rxq->rx_tail > rxq->rx_free_trigger) {
-   if (ixgbe_rx_alloc_bufs(rxq) != 0) {
+   uint16_t cur_free_trigger = rxq->rx_free_trigger;
+
+   if (ixgbe_rx_alloc_bufs(rxq, true) != 0) {
int i, j;
PMD_RX_LOG(DEBUG, "RX mbuf alloc failed port_id=%u "
   "queue_id=%u", (unsigned) rxq->port_id,
@@ -1129,6 +1130,10 @@ rx_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,

return 0;
}
+
+   /* update tail pointer */
+   rte_wmb();
+   IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, cur_free_trigger);
}

if (rxq->rx_tail >= rxq->nb_rx_desc)
@@ -1168,6 +1173,13 @@ ixgbe_recv_pkts_bulk_alloc(void *rx_queue, struct 
rte_mbuf **rx_pkts,

return nb_rx;
 }
+#else
+static inline int
+ixgbe_rx_alloc_bufs(__rte_unused struct ixgbe_rx_queue *rxq,
+   __rte_unused bool reset_mbuf)
+{
+   return -ENOMEM;
+}
 #endif /* RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC */

 uint16_t
@@ -1352,6 +1364,64 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts,
return (nb_rx);
 }

+/**
+ * Detect an RSC descriptor.
+ */
+static inline uint32_t
+ixgbe_rsc_count(union ixgbe_adv_rx_desc *rx)
+{
+   return (rte_le_to_cpu_32(rx->wb.lower.lo_dword.data) &
+   IXGBE_RXDADV_RSCCNT_MASK) >> IXGBE_RXDADV_RSCCNT_SHIFT;
+}
+
+/**
+ * ixgbe_fill_cluster_head_buf - fill the first mbuf of the returned packet
+ *
+ * Fill the following info in the HEAD buffer of the Rx cluster:
+ *- RX port identifier
+ *- hardware offload data, if any:
+ *  - RSS flag & hash
+ *  - IP checksum flag
+ *  - VLAN TCI, if any
+ *  - error flags
+ * @head HEAD of the packet cluster
+ * @desc HW descriptor to get data from
+ * @port_id Port ID of the Rx queue
+ */
+static inline void
+ixgbe_fill_cluster_head_buf(
+   struct rte_mbuf *head,
+   union ixgbe_adv_rx_desc *desc,
+   uint8_t port_id,
+   uint32_t staterr)
+{
+   uint32_t hlen_type_rss;
+   uint64_t pkt_flags;
+
+   head->port = port_id;
+
+   /*
+* The vlan_tci field is only valid when PKT_RX_VLAN_PKT is
+* set in the pkt_flags field.
+*/
+ 



[dpdk-dev] [PATCH v6 3/3] ixgbe: Add LRO support

2015-03-18 Thread Vlad Zolotarov


On 03/18/15 02:31, Ananyev, Konstantin wrote:
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
>> Sent: Monday, March 16, 2015 6:27 PM
>> To: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v6 3/3] ixgbe: Add LRO support
>>
>>
>>
>> On 03/09/15 21:07, Vlad Zolotarov wrote:
>>>   - Only x540 and 82599 devices support LRO.
>>>   - Add the appropriate HW configuration.
>>>   - Add RSC aware rx_pkt_burst() handlers:
>>>  - Implemented bulk allocation and non-bulk allocation versions.
>>>  - Add LRO-specific fields to rte_eth_rxmode, to rte_eth_dev_data
>>>    and to igb_rx_queue.
>>>  - Use the appropriate handler when LRO is requested.
>>>
>>> Signed-off-by: Vlad Zolotarov 
>>> ---
>>> New in v5:
>>>  - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of 
>>> rte_ethdev.h.
>>>  - Removed the "TODO: Remove me" comment near 
>>> RTE_ETHDEV_HAS_LRO_SUPPORT.
>>>
>>> New in v4:
>>>  - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
>>>RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.
>>>
>>> New in v2:
>>>  - Removed rte_eth_dev_data.lro_bulk_alloc.
>>>  - Fixed a few styling and spelling issues.
>>> ---
>>>lib/librte_ether/rte_ethdev.h   |   9 +-
>>>lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   6 +
>>>lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   5 +
>>>lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 562 
>>> +++-
>>>lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
>>>5 files changed, 581 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
>>> index 8db3127..44f081f 100644
>>> --- a/lib/librte_ether/rte_ethdev.h
>>> +++ b/lib/librte_ether/rte_ethdev.h
>>> @@ -172,6 +172,9 @@ extern "C" {
>>>
>>>#include 
>>>
>>> +/* Use this macro to check if LRO API is supported */
>>> +#define RTE_ETHDEV_HAS_LRO_SUPPORT
>>> +
>>>#include 
>>>#include 
>>>#include 
>>> @@ -320,14 +323,15 @@ struct rte_eth_rxmode {
>>> enum rte_eth_rx_mq_mode mq_mode;
>>> uint32_t max_rx_pkt_len;  /**< Only used if jumbo_frame enabled. */
>>> uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
>>> -   uint8_t header_split : 1, /**< Header Split enable. */
>>> +   uint16_t header_split : 1, /**< Header Split enable. */
>>> hw_ip_checksum   : 1, /**< IP/UDP/TCP checksum offload enable. 
>>> */
>>> hw_vlan_filter   : 1, /**< VLAN filter enable. */
>>> hw_vlan_strip: 1, /**< VLAN strip enable. */
>>> hw_vlan_extend   : 1, /**< Extended VLAN enable. */
>>> jumbo_frame  : 1, /**< Jumbo Frame Receipt enable. */
>>> hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
>>> -   enable_scatter   : 1; /**< Enable scatter packets rx handler */
>>> +   enable_scatter   : 1, /**< Enable scatter packets rx handler */
>>> +   enable_lro   : 1; /**< Enable LRO */
>>>};
>>>
>>>/**
>>> @@ -1515,6 +1519,7 @@ struct rte_eth_dev_data {
>>> uint8_t port_id;   /**< Device [external] port identifier. */
>>> uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0). */
>>> scattered_rx : 1,  /**< RX of scattered packets is ON(1) / 
>>> OFF(0) */
>>> +   lro  : 1,  /**< RX LRO is ON(1) / OFF(0) */
>>> all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
>>> dev_started : 1;   /**< Device state: STARTED(1) / STOPPED(0). 
>>> */
>>>};
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
>>> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>> index 9d3de1a..765174d 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>> @@ -1648,6 +1648,7 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
>>>
>>> /* Clear stored conf */
>>> dev->data->scattered_rx = 0;
>>> +   dev->data->lro = 0;
>>> hw->rx_bulk_alloc_allo

[dpdk-dev] [PATCH v6 3/3] ixgbe: Add LRO support

2015-03-16 Thread Vlad Zolotarov


On 03/09/15 21:07, Vlad Zolotarov wrote:
>  - Only x540 and 82599 devices support LRO.
>  - Add the appropriate HW configuration.
>  - Add RSC aware rx_pkt_burst() handlers:
> - Implemented bulk allocation and non-bulk allocation versions.
> - Add LRO-specific fields to rte_eth_rxmode, to rte_eth_dev_data
>   and to igb_rx_queue.
> - Use the appropriate handler when LRO is requested.
>
> Signed-off-by: Vlad Zolotarov 
> ---
> New in v5:
> - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of 
> rte_ethdev.h.
> - Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.
>
> New in v4:
> - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
>   RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.
>
> New in v2:
> - Removed rte_eth_dev_data.lro_bulk_alloc.
> - Fixed a few styling and spelling issues.
> ---
>   lib/librte_ether/rte_ethdev.h   |   9 +-
>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   6 +
>   lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   5 +
>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 562 
> +++-
>   lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
>   5 files changed, 581 insertions(+), 7 deletions(-)
>
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 8db3127..44f081f 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -172,6 +172,9 @@ extern "C" {
>   
>   #include 
>   
> +/* Use this macro to check if LRO API is supported */
> +#define RTE_ETHDEV_HAS_LRO_SUPPORT
> +
>   #include 
>   #include 
>   #include 
> @@ -320,14 +323,15 @@ struct rte_eth_rxmode {
>   enum rte_eth_rx_mq_mode mq_mode;
>   uint32_t max_rx_pkt_len;  /**< Only used if jumbo_frame enabled. */
>   uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
> - uint8_t header_split : 1, /**< Header Split enable. */
> + uint16_t header_split : 1, /**< Header Split enable. */
>   hw_ip_checksum   : 1, /**< IP/UDP/TCP checksum offload enable. 
> */
>   hw_vlan_filter   : 1, /**< VLAN filter enable. */
>   hw_vlan_strip: 1, /**< VLAN strip enable. */
>   hw_vlan_extend   : 1, /**< Extended VLAN enable. */
>   jumbo_frame  : 1, /**< Jumbo Frame Receipt enable. */
>   hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
> - enable_scatter   : 1; /**< Enable scatter packets rx handler */
> + enable_scatter   : 1, /**< Enable scatter packets rx handler */
> + enable_lro   : 1; /**< Enable LRO */
>   };
>   
>   /**
> @@ -1515,6 +1519,7 @@ struct rte_eth_dev_data {
>   uint8_t port_id;   /**< Device [external] port identifier. */
>   uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0). */
>   scattered_rx : 1,  /**< RX of scattered packets is ON(1) / 
> OFF(0) */
> + lro  : 1,  /**< RX LRO is ON(1) / OFF(0) */
>   all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
>   dev_started : 1;   /**< Device state: STARTED(1) / STOPPED(0). 
> */
>   };
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> index 9d3de1a..765174d 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> @@ -1648,6 +1648,7 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
>   
>   /* Clear stored conf */
>   dev->data->scattered_rx = 0;
> + dev->data->lro = 0;
>   hw->rx_bulk_alloc_allowed = false;
>   hw->rx_vec_allowed = false;
>   
> @@ -2018,6 +2019,11 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct 
> rte_eth_dev_info *dev_info)
>   DEV_RX_OFFLOAD_IPV4_CKSUM |
>   DEV_RX_OFFLOAD_UDP_CKSUM  |
>   DEV_RX_OFFLOAD_TCP_CKSUM;
> +
> + if (hw->mac.type == ixgbe_mac_82599EB ||
> + hw->mac.type == ixgbe_mac_X540)
> + dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_TCP_LRO;
> +
>   dev_info->tx_offload_capa =
>   DEV_TX_OFFLOAD_VLAN_INSERT |
>   DEV_TX_OFFLOAD_IPV4_CKSUM  |
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
> index a549f5c..e206584 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
> @@ -349,6 +349,11 @@ uint16_t ixgbe_recv_pkts_bulk_alloc(void *rx_queue, 
> struct rte_mbuf **rx_pkts,
>   uint16_t ixgbe_rec

[dpdk-dev] [PATCH v3 0/3]: bug fixes in the ixgbe PF PMD Rx flow

2015-03-16 Thread Vlad Zolotarov


On 03/13/15 15:03, Vladislav Zolotarov wrote:
>
>
> On Mar 13, 2015 2:51 PM, "Ananyev, Konstantin" <konstantin.ananyev at intel.com> wrote:
> >
> > Hi Vlad,
> >
> > > -Original Message-
> > > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > > Sent: Friday, March 13, 2015 11:52 AM
> > > To: Ananyev, Konstantin; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v3 0/3]: bug fixes in the ixgbe PF 
> PMD Rx flow
> > >
> > >
> > >
> > > On 03/13/15 13:07, Ananyev, Konstantin wrote:
> > > >
> > > >> -Original Message-
> > > >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
> > > >> Sent: Thursday, March 12, 2015 9:17 PM
> > > >> To: dev at dpdk.org
> > > >> Subject: [dpdk-dev] [PATCH v3 0/3]: bug fixes in the ixgbe PF 
> PMD Rx flow
> > > >>
> > > >> This series contains some bug fixes that were found during my 
> work on the ixgbe LRO
> > > >> patches. Sending this series separately on Thomas request so 
> that it may be integrated
> > > >> into the 2.0 release.
> > > >>
> > > >> New in v3:
> > > >> - Adjusted to the new structs naming in the master.
> > > >> - Fixed rx_bulk_alloc_allowed and rx_vec_allowed 
> initialization:
> > > >>- Don't set them to FALSE in rte_eth_dev_stop() flow - 
> the following
> > > >>  rte_eth_dev_start() will need them.
> > > >>- Reset them to TRUE in rte_eth_dev_configure() and not 
> in a probe() flow.
> > > >>  This will ensure the proper behaviour if port is 
> re-configured.
> > > >> - Rename:
> > > >>- ixgbe_rx_vec_condition_check() -> 
> ixgbe_rx_vec_dev_conf_condition_check()
> > > >>- set_rx_function() -> ixgbe_set_rx_function()
> > > >> - Clean up the logic in ixgbe_set_rx_function().
> > > >> - Define stubs with __attribute__((weak)) instead of using 
> #ifdef's.
> > > >> - Styling: beautify ixgbe_rxtx.h a bit.
> > > >>
> > > >> New in v2:
> > > >> - Fixed a compilation failure.
> > > >>
> > > >>
> > > >> Vlad Zolotarov (3):
> > > >>ixgbe: Use the rte_le_to_cpu_xx()/rte_cpu_to_le_xx() when
> > > >>  reading/setting HW ring descriptor fields
> > > >>ixgbe: Bug fix: Properly configure Rx CRC stripping for x540 
> devices
> > > >>ixgbe: Unify the rx_pkt_bulk callback initialization
> > > >>
> > > >>   lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |  2 +
> > > >>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 13 +-
> > > >>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 216 
> +---
> > > >>   lib/librte_pmd_ixgbe/ixgbe_rxtx.h   | 28 -
> > > >>   lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c   |  2 +-
> > > >>   5 files changed, 183 insertions(+), 78 deletions(-)
> > > >>
> > > > Acked-by: Konstantin Ananyev  <mailto:konstantin.ananyev at intel.com>>
> > > >
> > > > Just one nit:
> > > >
> > > > +int __attribute__((weak)) ixgbe_rxq_vec_setup(
> > > > +   struct ixgbe_rx_queue __rte_unused *rxq)
> > > > +{
> > > >
> > > > Please use notation:
> > > > int __attribute__((weak))
> > > > ixgbe_rxq_vec_setup(struct ixgbe_rx_queue __rte_unused *rxq)
> > > >
> > > > To keep up with the rest of the code, plus makes much easier to 
> read.
> > >
> > > I took an example from kni/ethtool/igb/kcompat.h for a template but no
> > > problem.
> > > Do u want me to respin or it's ok? I will use this format for the
> > > follow-up LRO patch anyway...
> >
> > Doing that in LRO patch set is ok.
> > No need for respin that one, I think.
>
> Great! Thanks a lot for reviewing this.
>
> Thomas, it seems like ixgbe maintainer gives this series a green 
> light!.. ;)
>

Ping.
Thomas, could you please consider applying this series to at least the 
upstream master?

thanks in advance,
vlad

> > Konstantin
> >
> > >
> > > >
> > > >> --
> > > >> 2.1.0
> >
>



[dpdk-dev] [PATCH v6 3/3] ixgbe: Add LRO support

2015-03-13 Thread Vlad Zolotarov


On 03/13/15 13:28, Ananyev, Konstantin wrote:
> Hi Olivier,
>
>> -Original Message-
>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>> Sent: Friday, March 13, 2015 9:08 AM
>> To: Vlad Zolotarov; Ananyev, Konstantin; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v6 3/3] ixgbe: Add LRO support
>>
>> Hi Vlad,
>>
>> On 03/11/2015 05:54 PM, Vlad Zolotarov wrote:
>>>>>> About the existing RX/TX functions and PPC support:
>>>>>> Note that all of them were created before PPC support for DPDK was
>>>>>> introduced.
>>>>>> At that moment only IA was supported.
>>>>>> That's why in some places where you would expect to see 'mb()' there
>>>>>> are 'volatile' and/or ' rte_compiler_barrier' instead.
>>>>>> Why all that places wasn't updated when PPC support was added -
>>>>>> that's another question.
>>>>>> From my understanding - with current implementation some of DPDK
>>>>>> PMDs RX/TX functions and  rte_ring wouldn't work correctly
>>>>> on PPC.
>>>>>> So, I suppose we need to decide for ourselves - do we really want to
>>>>>> support PPC and other architectures with non-IA memory
>>>>> model or not?
>>>>>> If not, then I think we don't need any mb()s inside recv_pkts_lro()
>>>>>> - just rte_compiler_barrier seems enough, and no point to
>>>>> complain about
>>>>>> it in comments.
>>>>>> If yes - then why to introduce a new function with a known potential
>>>>>> bug?
>>>>> In order to introduce a new function with the proper implementation or
>>>>> to fix any other places with the similar weakness I would need a proper
>>>>> tools like a proper platform-dependent barrier-macros similar to
>>>>> smp_Xmb() Linux macros that reduce to a compiler barrier where
>>>>> appropriate or to a proper memory fence where needed.
>>>> I understand that.
>>>> Let's add new macro for that: rte_smp_Xmb() or something,
>>>> so it would be just rte_compiler_barrier() for x86 and a proper mb()
>>>> for PPC.
>>> There was an idea to use the C11 built-in memory barriers. I suggest we
>>> open a separate discussion about that and add these and the appropriate
>>> fixes in a separate series. There are quite a few places to fix anyway,
>>> which are currently broken on PPC so this patch doesn't make things any
>>> worse. However adding a new memory barrier doesn't belong to an LRO
>>> functionality and thus to this series.
>> This is an interesting discussion. Just for reference, I submitted a
>> patch on this topic but it was probably too early as only Intel
>> architecture was supported at that time.
>>
>> See http://dpdk.org/ml/archives/dev/2014-May/002597.html
> I do remember that conversation :)
> At that moment, as nothing except IA wasn't supported, I feel it was not 
> needed.
> Though now, if we do want to support PPC and other architectures with weak 
> memory model,
> I think we do need to introduce some platform dependent set of Xmb() macros.
> See http://dpdk.org/ml/archives/dev/2014-October/006729.html
>
> Actually while thinking about it once again:
> Is there any good use for rte_compiler_barrier() for PPC memory model?
> I can't think about any.
> So I wonder can't we just make for PPC:
>   #define rte_compiler_barrier rte_mb
> While keeping it as it is for IA.
> Would save us from searching/replacing though all the code.
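
For concreteness, the rte_smp_Xmb() macro set suggested above might look
like the following sketch (names were hypothetical at the time of this
thread; DPDK later gained similar macros in rte_atomic.h). On strongly
ordered x86, SMP wmb/rmb need only a compiler barrier; weakly ordered
PPC needs a real fence instruction such as lwsync:

```c
#include <assert.h>

/* Per-architecture SMP write/read memory barriers (illustrative names). */
#if defined(__x86_64__) || defined(__i386__)
#define rte_smp_wmb() __asm__ volatile ("" ::: "memory")
#define rte_smp_rmb() __asm__ volatile ("" ::: "memory")
#elif defined(__powerpc__) || defined(__powerpc64__)
#define rte_smp_wmb() __asm__ volatile ("lwsync" ::: "memory")
#define rte_smp_rmb() __asm__ volatile ("lwsync" ::: "memory")
#else /* fall back to the compiler's portable fences */
#define rte_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#define rte_smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#endif

static unsigned int payload;
static volatile unsigned int ready;

/* Producer: publish the payload, then set the flag. */
static void producer(unsigned int v)
{
	payload = v;
	rte_smp_wmb();	/* payload store must be visible before the flag */
	ready = 1;
}

/* Consumer: see the flag, then read the payload. */
static unsigned int consumer(void)
{
	while (!ready)
		;
	rte_smp_rmb();	/* flag load must complete before the payload load */
	return payload;
}
```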

I wonder why we should reinvent the wheel. As Avi has proposed, we may 
use the existing standard C11 atomic primitives for that. See 
http://en.cppreference.com/w/c/atomic. I don't know what the state of 
icc is in this area, though... ;)

Pros:

  * Zero maintenance.
  * Multi-platform support.
  * It seems that this is the direction the industry is going to (as
opposed to the discussed above mb(), rmb(), wmb() model).

Cons:

  * The model is a bit different from what most kernel
programmers are used to.
  * The current code adaptation would be a bit more painful (due to the
first "con").


I think this could be a very nice move, for user space at least. The 
open question is the KNI component. I don't know how much code is shared 
between the kernel and user-space DPDK code, but if there isn't much then 
we may still go for the built-in C atomic primitives in user space and 
do whatever we choose in the KNI...
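
As an illustration of the C11 approach, the descriptor publish/consume
pattern that recv_pkts_lro() must get right can be written portably
with <stdatomic.h>. This is only a sketch with hypothetical names; the
point is that the release/acquire fences compile down to compiler
barriers on x86 and to real fences on PPC:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical descriptor-ring slot, for illustration only: the
 * producer "writes" the payload and then sets a descriptor-done flag. */
struct rx_desc {
	unsigned int data;
	atomic_uint dd;		/* descriptor-done flag */
};

/* Release-store the flag after the payload: on x86 this is a plain
 * store (the ordering costs only a compiler barrier); on PPC the
 * compiler emits the needed lwsync. */
static void desc_publish(struct rx_desc *d, unsigned int v)
{
	d->data = v;
	atomic_store_explicit(&d->dd, 1, memory_order_release);
}

/* Acquire-load the flag before reading the payload - the C11
 * equivalent of the rmb() debated in this thread. */
static int desc_poll(struct rx_desc *d, unsigned int *out)
{
	if (!atomic_load_explicit(&d->dd, memory_order_acquire))
		return 0;
	*out = d->data;
	return 1;
}
```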

>
>   Konstantin
>
>
>
>> Regards,
>> Olivier



[dpdk-dev] [PATCH v3 0/3]: bug fixes in the ixgbe PF PMD Rx flow

2015-03-13 Thread Vlad Zolotarov


On 03/13/15 13:07, Ananyev, Konstantin wrote:
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
>> Sent: Thursday, March 12, 2015 9:17 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH v3 0/3]: bug fixes in the ixgbe PF PMD Rx flow
>>
>> This series contains some bug fixes that were found during my work on the 
>> ixgbe LRO
>> patches. Sending this series separately on Thomas request so that it may be 
>> integrated
>> into the 2.0 release.
>>
>> New in v3:
>> - Adjusted to the new structs naming in the master.
>> - Fixed rx_bulk_alloc_allowed and rx_vec_allowed initialization:
>>- Don't set them to FALSE in rte_eth_dev_stop() flow - the following
>>  rte_eth_dev_start() will need them.
>>- Reset them to TRUE in rte_eth_dev_configure() and not in a probe() 
>> flow.
>>  This will ensure the proper behaviour if port is re-configured.
>> - Rename:
>>- ixgbe_rx_vec_condition_check() -> 
>> ixgbe_rx_vec_dev_conf_condition_check()
>>- set_rx_function() -> ixgbe_set_rx_function()
>> - Clean up the logic in ixgbe_set_rx_function().
>> - Define stubs with __attribute__((weak)) instead of using #ifdef's.
>> - Styling: beautify ixgbe_rxtx.h a bit.
>>
>> New in v2:
>> - Fixed a compilation failure.
>>
>>
>> Vlad Zolotarov (3):
>>ixgbe: Use the rte_le_to_cpu_xx()/rte_cpu_to_le_xx() when
>>  reading/setting HW ring descriptor fields
>>ixgbe: Bug fix: Properly configure Rx CRC stripping for x540 devices
>>ixgbe: Unify the rx_pkt_bulk callback initialization
>>
>>   lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   2 +
>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  13 +-
>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 216 
>> +---
>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  28 -
>>   lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c   |   2 +-
>>   5 files changed, 183 insertions(+), 78 deletions(-)
>>
> Acked-by: Konstantin Ananyev 
>
> Just one nit:
>
> +int __attribute__((weak)) ixgbe_rxq_vec_setup(
> + struct ixgbe_rx_queue __rte_unused *rxq)
> +{
>
> Please use notation:
> int __attribute__((weak))
> ixgbe_rxq_vec_setup(struct ixgbe_rx_queue __rte_unused *rxq)
>
> To keep up with the rest of the code, plus makes much easier to read.

I took an example from kni/ethtool/igb/kcompat.h for a template but no 
problem.
Do you want me to respin, or is it ok? I will use this format for the 
follow-up LRO patch anyway...
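
For reference, a stub in the requested notation might look like the
following sketch (the `__rte_unused` macro is defined here only to make
the snippet self-contained; the `-1` return value is a hypothetical
"not supported" status, not the driver's actual contract):

```c
#include <assert.h>
#include <stddef.h>

#define __rte_unused __attribute__((unused))

struct ixgbe_rx_queue;	/* opaque here; the real definition lives in ixgbe_rxtx.h */

/* Weak stub: it links in only when no strong definition (the real
 * vector Rx implementation in ixgbe_rxtx_vec.c) is present in another
 * object file, which is what replaces the old #ifdef scheme. */
int __attribute__((weak))
ixgbe_rxq_vec_setup(struct ixgbe_rx_queue __rte_unused *rxq)
{
	return -1;	/* hypothetical "vector Rx not available" status */
}
```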

>
>> --
>> 2.1.0



[dpdk-dev] [PATCH v3 3/3] ixgbe: Unify the rx_pkt_bulk callback initialization

2015-03-13 Thread Vlad Zolotarov
   - Set the callback in a single function that is called from
 ixgbe_dev_rx_init() for a primary process and from eth_ixgbe_dev_init()
 for a secondary processes. This is instead of multiple, hard to track 
places.
   - Added ixgbe_hw.rx_bulk_alloc_allowed - see ixgbe_hw.rx_vec_allowed 
description below.
   - Added ixgbe_hw.rx_vec_allowed: like with Bulk Allocation, Vector Rx is
 enabled or disabled on a per-port level. All queues have to meet the 
appropriate
 preconditions and if any of them doesn't - the feature has to be disabled.
 Therefore ixgbe_hw.rx_vec_allowed will be updated during each queues 
configuration
 (rte_eth_rx_queue_setup()) and then used in rte_eth_dev_start() to 
configure the
 appropriate callbacks. The same happens with ixgbe_hw.rx_vec_allowed in a 
Bulk Allocation
 context.
   - Bugs fixed:
  - Vector scattered packets callback was called regardless of the appropriate
preconditions:
 - Vector Rx specific preconditions.
 - Bulk Allocation preconditions.
  - Vector Rx was enabled/disabled according to the last queue's settings and
not based on all queues' settings (which may be different for each queue).
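
The per-port precondition logic described above can be sketched as
follows (a simplified model, not the driver code itself; the real logic
lives in ixgbe_dev_configure() and the Rx queue setup path):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the flags added to struct ixgbe_hw. */
struct hw_flags {
	bool rx_bulk_alloc_allowed;
	bool rx_vec_allowed;
};

/* dev_configure(): start optimistic - every feature is allowed until
 * some queue fails its preconditions. */
static void dev_configure(struct hw_flags *hw)
{
	hw->rx_bulk_alloc_allowed = true;
	hw->rx_vec_allowed = true;
}

/* rx_queue_setup(): flags are only ever cleared here, so a single
 * non-conforming queue disables the feature for the whole port. */
static void rx_queue_setup(struct hw_flags *hw, bool bulk_ok, bool vec_ok)
{
	if (!bulk_ok)
		hw->rx_bulk_alloc_allowed = false;
	if (!vec_ok)
		hw->rx_vec_allowed = false;
}
```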

Signed-off-by: Vlad Zolotarov 
---
New in v3:
   - Adjusted to the new structs naming in the master.
   - Fixed rx_bulk_alloc_allowed and rx_vec_allowed initialization:
  - Don't set them to FALSE in rte_eth_dev_stop() flow - the following
rte_eth_dev_start() will need them.
  - Reset them to TRUE in rte_eth_dev_configure() and not in a probe() flow.
This will ensure the proper behaviour if port is re-configured.
   - Rename:
  - ixgbe_rx_vec_condition_check() -> 
ixgbe_rx_vec_dev_conf_condition_check()
  - set_rx_function() -> ixgbe_set_rx_function()
   - Clean up the logic in ixgbe_set_rx_function().
   - Define stubs with __attribute__((weak)) instead of using #ifdef's.
   - Styling: beautify ixgbe_rxtx.h a bit.

New in v2:
   - Fixed an artifact caused by git rebasing that broke the compilation.
---
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   2 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  13 ++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 200 ++--
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  28 -
 lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c   |   2 +-
 5 files changed, 174 insertions(+), 71 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h 
b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
index c67d462..9a66370 100644
--- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
+++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
@@ -3657,6 +3657,8 @@ struct ixgbe_hw {
bool force_full_reset;
bool allow_unsupported_sfp;
bool wol_enabled;
+   bool rx_bulk_alloc_allowed;
+   bool rx_vec_allowed;
 };

 #define ixgbe_call_func(hw, func, params, error) \
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index e4edb01..92d75db 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -756,8 +756,8 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
   "Using default TX function.");
}

-   if (eth_dev->data->scattered_rx)
-   eth_dev->rx_pkt_burst = ixgbe_recv_scattered_pkts;
+   ixgbe_set_rx_function(eth_dev);
+
return 0;
}
pci_dev = eth_dev->pci_dev;
@@ -1429,12 +1429,21 @@ ixgbe_dev_configure(struct rte_eth_dev *dev)
 {
struct ixgbe_interrupt *intr =
IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
+   struct ixgbe_hw *hw =
+   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);

PMD_INIT_FUNC_TRACE();

/* set flag to update link status after init */
intr->flags |= IXGBE_FLAG_NEED_LINK_UPDATE;

+   /*
+* Initialize to TRUE. If any of Rx queues doesn't meet the bulk
+* allocation or vector Rx preconditions we will reset it.
+*/
+   hw->rx_bulk_alloc_allowed = true;
+   hw->rx_vec_allowed = true;
+
return 0;
 }

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 92be61e..5b1ba82 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -2075,12 +2075,12 @@ check_rx_burst_bulk_alloc_preconditions(__rte_unused 
struct ixgbe_rx_queue *rxq)

 /* Reset dynamic ixgbe_rx_queue fields back to defaults */
 static void
-ixgbe_reset_rx_queue(struct ixgbe_rx_queue *rxq)
+ixgbe_reset_rx_queue(struct ixgbe_hw *hw, struct ixgbe_rx_queue *rxq)
 {
static const union ixgbe_adv_rx_desc zeroed_desc = { .read = {
.pkt_addr = 0}};
unsigned i;
-   uint16_t len;
+   uint16_t len = rxq->nb_rx_desc;

/*
 * By default, the Rx queue setup function allocates enough memory fo

[dpdk-dev] [PATCH v3 2/3] ixgbe: Bug fix: Properly configure Rx CRC stripping for x540 devices

2015-03-13 Thread Vlad Zolotarov
According to x540 spec chapter 8.2.4.8.9 CRCSTRIP field of RDRXCTL should
be configured to the same value as HLREG0.RXCRCSTRP.

Clearing the RDRXCTL.RSCFRSTSIZE field for x540 is not required by the spec
but seems harmless.

Signed-off-by: Vlad Zolotarov 
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index f7c081f..92be61e 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -3680,7 +3680,8 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)

IXGBE_WRITE_REG(hw, IXGBE_RXCSUM, rxcsum);

-   if (hw->mac.type == ixgbe_mac_82599EB) {
+   if (hw->mac.type == ixgbe_mac_82599EB ||
+   hw->mac.type == ixgbe_mac_X540) {
rdrxctl = IXGBE_READ_REG(hw, IXGBE_RDRXCTL);
if (dev->data->dev_conf.rxmode.hw_strip_crc)
rdrxctl |= IXGBE_RDRXCTL_CRCSTRIP;
-- 
2.1.0



[dpdk-dev] [PATCH v3 0/3]: bug fixes in the ixgbe PF PMD Rx flow

2015-03-13 Thread Vlad Zolotarov
This series contains some bug fixes that were found during my work on the ixgbe 
LRO
patches. Sending this series separately on Thomas request so that it may be 
integrated
into the 2.0 release.

New in v3:
   - Adjusted to the new structs naming in the master.
   - Fixed rx_bulk_alloc_allowed and rx_vec_allowed initialization:
  - Don't set them to FALSE in rte_eth_dev_stop() flow - the following
rte_eth_dev_start() will need them.
  - Reset them to TRUE in rte_eth_dev_configure() and not in a probe() flow.
This will ensure the proper behaviour if port is re-configured.
   - Rename:
  - ixgbe_rx_vec_condition_check() -> 
ixgbe_rx_vec_dev_conf_condition_check()
  - set_rx_function() -> ixgbe_set_rx_function()
   - Clean up the logic in ixgbe_set_rx_function().
   - Define stubs with __attribute__((weak)) instead of using #ifdef's.
   - Styling: beautify ixgbe_rxtx.h a bit.

New in v2:
   - Fixed a compilation failure.


Vlad Zolotarov (3):
  ixgbe: Use the rte_le_to_cpu_xx()/rte_cpu_to_le_xx() when
reading/setting HW ring descriptor fields
  ixgbe: Bug fix: Properly configure Rx CRC stripping for x540 devices
  ixgbe: Unify the rx_pkt_bulk callback initialization

 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   2 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  13 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 216 +---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  28 -
 lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c   |   2 +-
 5 files changed, 183 insertions(+), 78 deletions(-)

-- 
2.1.0



[dpdk-dev] [PATCH v2 3/3] ixgbe: Unify the rx_pkt_bulk callback initialization

2015-03-11 Thread Vlad Zolotarov


On 03/11/15 14:45, Ananyev, Konstantin wrote:
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
>> Sent: Monday, March 09, 2015 4:29 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH v2 3/3] ixgbe: Unify the rx_pkt_bulk callback 
>> initialization
>>
>> - Set the callback in a single function that is called from
>>   ixgbe_dev_rx_init() for a primary process and from eth_ixgbe_dev_init()
>>   for a secondary processes. This is instead of multiple, hard to track 
>> places.
>> - Added ixgbe_hw.rx_bulk_alloc_allowed - see ixgbe_hw.rx_vec_allowed 
>> description below.
>> - Added ixgbe_hw.rx_vec_allowed: like with Bulk Allocation, Vector Rx is
>>   enabled or disabled on a per-port level. All queues have to meet the 
>> appropriate
>>   preconditions and if any of them doesn't - the feature has to be 
>> disabled.
>>   Therefore ixgbe_hw.rx_vec_allowed will be updated during each queues 
>> configuration
>>   (rte_eth_rx_queue_setup()) and then used in rte_eth_dev_start() to 
>> configure the
>>   appropriate callbacks. The same happens with ixgbe_hw.rx_vec_allowed 
>> in a Bulk Allocation
>>   context.
>> - Bugs fixed:
>>- Vector scattered packets callback was called regardless the 
>> appropriate
>>  preconditions:
>>   - Vector Rx specific preconditions.
>>   - Bulk Allocation preconditions.
>>    - Vector Rx was enabled/disabled according to the last queue setting 
>> and not
>>  based on all queues setting (which may be different for each queue).
>>
>> Signed-off-by: Vlad Zolotarov 
>> ---
>> New in v2:
>> - Fixed an broken compilation.
>> ---
>>   lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   2 +
>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  13 ++-
>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 183 
>> +---
>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  22 +++-
>>   4 files changed, 152 insertions(+), 68 deletions(-)
>>
>> diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h 
>> b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>> index c67d462..9a66370 100644
>> --- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>> +++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>> @@ -3657,6 +3657,8 @@ struct ixgbe_hw {
>>  bool force_full_reset;
>>  bool allow_unsupported_sfp;
>>  bool wol_enabled;
>> +bool rx_bulk_alloc_allowed;
>> +bool rx_vec_allowed;
>>   };
>>
>>   #define ixgbe_call_func(hw, func, params, error) \
>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
>> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>> index 9bdc046..9d3de1a 100644
>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>> @@ -760,8 +760,8 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct 
>> eth_driver *eth_drv,
>> "Using default TX function.");
>>  }
>>
>> -if (eth_dev->data->scattered_rx)
>> -eth_dev->rx_pkt_burst = ixgbe_recv_scattered_pkts;
>> +set_rx_function(eth_dev);
>> +
>>  return 0;
>>  }
>>  pci_dev = eth_dev->pci_dev;
>> @@ -772,6 +772,13 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct 
>> eth_driver *eth_drv,
>>  hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
>>  hw->allow_unsupported_sfp = 1;
>>
>> +/*
>> + * Initialize to TRUE. If any of Rx queues doesn't meet the bulk
>> + * allocation or vector Rx preconditions we will reset it.
>> + */
>> +hw->rx_bulk_alloc_allowed = true;
>> +hw->rx_vec_allowed = true;
>> +
>>  /* Initialize the shared code (base driver) */
>>   #ifdef RTE_NIC_BYPASS
>>  diag = ixgbe_bypass_init_shared_code(hw);
>> @@ -1641,6 +1648,8 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
>>
>>  /* Clear stored conf */
>>  dev->data->scattered_rx = 0;
>> +hw->rx_bulk_alloc_allowed = false;
>> +hw->rx_vec_allowed = false;
> If dev_stop() sets it to 'false', who will reset it back to 'true' then?
>
>>  /* Clear recorded link status */
>>  memset(, 0, sizeof(link));
>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
>> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>> index ce9658e..a00f5c9 100644
>> --- a/lib/librte_pmd_ixgbe
