[dpdk-dev] [PATCH v4] ixgbe: Drop flow control frames from VFs

2015-10-23 Thread Vladislav Zolotarov
On Oct 23, 2015 9:30 AM, "Zhang, Helin"  wrote:
>
>
>
> From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
> Sent: Friday, October 23, 2015 2:24 PM
> To: Zhang, Helin
> Cc: Lu, Wenzhuo; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4] ixgbe: Drop flow control frames from
VFs
>
>
> On Oct 23, 2015 9:02 AM, "Zhang, Helin"  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Lu, Wenzhuo
> > > Sent: Friday, October 23, 2015 1:52 PM
> > > To: dev at dpdk.org
> > > Cc: Zhang, Helin; Lu, Wenzhuo
> > > Subject: [PATCH v4] ixgbe: Drop flow control frames from VFs
> > >
> > > This patch will drop flow control frames from being transmitted from
VSIs.
> > > With this patch in place a malicious VF cannot send flow control or
PFC packets
> > > out on the wire.
> The whole idea of this (and similar i40e patches sent before) is really
> confusing.
> If u want to disable the FC feature for VFs then go and disable the feature.
> Why let a (non-malicious) user think that he/she has enabled the feature
> while u silently block it?
>
> Helin: I don't think disabling FC is equal to filtering out any pause
> frames. What if the software application constructs a pause frame and
> then tries to send it out?

But not disabling FC for the user while silently preventing it is bogus.
First, the conventional user should not be affected. I think this patch
(and all its clones) should be extended to, first, disable the FC Tx
feature for the relevant devices and only then add any anti-malicious
filtering.
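
To make the concern concrete: an application today queries Tx flow control
through the ethdev flow-control API, and with a silent filter in place that
API would keep reporting Tx pause as enabled. A minimal sketch of such a
query (assuming the port's PMD implements rte_eth_dev_flow_ctrl_get(); the
helper name report_tx_fc_state() is purely illustrative):

    #include <stdio.h>
    #include <string.h>
    #include <rte_ethdev.h>

    /* Illustrative only: show what the port *claims* about Tx flow control.
     * With the proposed filter in place this would still report Tx pause as
     * enabled even though pause frames never reach the wire. */
    static void
    report_tx_fc_state(uint8_t port_id)
    {
            struct rte_eth_fc_conf fc_conf;

            memset(&fc_conf, 0, sizeof(fc_conf));
            if (rte_eth_dev_flow_ctrl_get(port_id, &fc_conf) != 0) {
                    printf("port %u: flow control query not supported\n", port_id);
                    return;
            }

            if (fc_conf.mode == RTE_FC_TX_PAUSE || fc_conf.mode == RTE_FC_FULL)
                    printf("port %u: Tx pause reported as enabled\n", port_id);
            else
                    printf("port %u: Tx pause reported as disabled\n", port_id);
    }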

>
> > >
> > > V2:
> > > Reword the comments.
> > >
> > > V3:
> > > Move the check of set_ethertype_anti_spoofing to the top of the
function, to
> > > avoid occupying an ethertype_filter entity without using it.
> > >
> > > V4:
> > > Remove the useless braces and return.
> > >
> > > Signed-off-by: Wenzhuo Lu 
> > Acked-by: Helin Zhang 
> >


[dpdk-dev] [PATCH v4] ixgbe: Drop flow control frames from VFs

2015-10-23 Thread Vladislav Zolotarov
On Oct 23, 2015 9:02 AM, "Zhang, Helin"  wrote:
>
>
>
> > -Original Message-
> > From: Lu, Wenzhuo
> > Sent: Friday, October 23, 2015 1:52 PM
> > To: dev at dpdk.org
> > Cc: Zhang, Helin; Lu, Wenzhuo
> > Subject: [PATCH v4] ixgbe: Drop flow control frames from VFs
> >
> > This patch will drop flow control frames from being transmitted from
VSIs.
> > With this patch in place a malicious VF cannot send flow control or PFC
packets
> > out on the wire.

The whole idea of this (and similar i40e patches sent before) is really
confusing.
If u want to disable the FC feature for VFs then go and disable the feature.
Why let a (non-malicious) user think that he/she has enabled the feature
while u silently block it?


> >
> > V2:
> > Reword the comments.
> >
> > V3:
> > Move the check of set_ethertype_anti_spoofing to the top of the
function, to
> > avoid occupying an ethertype_filter entity without using it.
> >
> > V4:
> > Remove the useless braces and return.
> >
> > Signed-off-by: Wenzhuo Lu 
> Acked-by: Helin Zhang 
>


[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Vladislav Zolotarov
On Oct 6, 2015 12:55 AM, "Michael S. Tsirkin"  wrote:
>
> On Thu, Oct 01, 2015 at 11:33:06AM +0300, Michael S. Tsirkin wrote:
> > Just forwarding events is not enough to make a valid driver.
> > What is missing is a way to access the device in a safe way.
>
> Thinking about it some more, maybe some devices don't do DMA, and merely
> signal events with MSI/MSI-X.
>
> The fact you mention igb_uio in the cover letter seems to hint that this
> isn't the case, and that the real intent is to abuse it for DMA-capable
> devices, but still ...
>
> If we assume such a simple device, we need to block userspace from
> tweaking at least the MSI control and the MSI-X table.
> And changing BARs might make someone else corrupt the MSI-X
> table, so we need to block it from changing BARs, too.
>
> Things like device reset will clear the table.  I guess this means we
> need to track access to reset, too, make sure we restore the
> table to a sane config.
>
> PM capability can be used to reset things too, I think. Better be
> careful about that.
>
> And a bunch of devices could be doing weird things that need
> to be special-cased.
>
> All of this is what VFIO is already dealing with.
>
> Maybe extending VFIO for this usecase, or finding another way to share
> code might be a better idea than duplicating the code within uio?

How about, instead of trying to reinvent the wheel, just go and attack the
problem directly, just like I've already proposed a few times in the last
few days: instead of limiting the UIO itself, limit the users that are
allowed to use UIO to privileged users only (e.g. root). This would solve
all the clearly unresolvable issues u are raising here altogether, wouldn't it?
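
For illustration only, one purely hypothetical way to express "privileged
users only" at the driver level would be a capability check in the device
open path; the function name below is made up and this is a sketch of the
idea, not an actual kernel patch (the same effect can also be achieved with
file permissions or udev rules on /dev/uioN):

    #include <linux/capability.h>
    #include <linux/fs.h>
    #include <linux/uio_driver.h>

    /* Hypothetical sketch: refuse to open the UIO device unless the caller
     * holds CAP_SYS_RAWIO, i.e. is a privileged user. */
    static int uio_msix_open(struct uio_info *info, struct inode *inode)
    {
            if (!capable(CAP_SYS_RAWIO))
                    return -EPERM;
            return 0;
    }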

>
> --
> MST


[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-27 Thread Vladislav Zolotarov
On Sep 27, 2015 12:43 PM, "Michael S. Tsirkin"  wrote:
>
> On Sun, Sep 27, 2015 at 10:05:11AM +0300, Vlad Zolotarov wrote:
> > Hi,
> > I was trying to use uio_pci_generic with Intel's 10G SR-IOV devices on
> > Amazon EC2 instances with Enhanced Networking enabled.
> > The idea is to create a DPDK environment that doesn't require compiling
> > kernel modules (igb_uio).
> > However I was surprised to discover that uio_pci_generic refuses to work
> > with EN device on AWS:
> >
> > $ lspci
> > 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
(rev 02)
> > 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton
II]
> > 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE
[Natoma/Triton II]
> > 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
> > 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
> > 00:03.0 Ethernet controller: Intel Corporation 82599 Ethernet
Controller Virtual Function (rev 01)
> > 00:04.0 Ethernet controller: Intel Corporation 82599 Ethernet
Controller Virtual Function (rev 01)
> > 00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device
(rev 01)
> >
> > $ sudo ./dpdk/tools/dpdk_nic_bind.py -b uio_pci_generic 00:04.0
> > Error: bind failed for :00:04.0 - Cannot bind to driver
uio_pci_generic
>
> > $dmesg
> >
> > --> snip <---
> > [  816.655575] uio_pci_generic :00:04.0: No IRQ assigned to device:
no support for interrupts?
> >
> > $ sudo lspci -s 00:04.0 -vvv
> > 00:04.0 Ethernet controller: Intel Corporation 82599 Ethernet
Controller Virtual Function (rev 01)
> >   Physical Slot: 4
> >   Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
> >   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
> >   Region 0: Memory at f3008000 (64-bit, prefetchable) [size=16K]
> >   Region 3: Memory at f300c000 (64-bit, prefetchable) [size=16K]
> >   Capabilities: [70] MSI-X: Enable- Count=3 Masked-
> >   Vector table: BAR=3 offset=
> >   PBA: BAR=3 offset=2000
> >   Kernel modules: ixgbevf
> >
> > So, as we may see the PCI device doesn't have an INTX interrupt line
> > assigned indeed. It has an MSI-X capability however.
> > Looking at the uio_pci_generic code it seems to require the INTX:
> >
> > uio_pci_generic.c: line 74: probe():
> >
> >   if (!pdev->irq) {
> >   dev_warn(>dev, "No IRQ assigned to device: "
> >"no support for interrupts?\n");
> >   pci_disable_device(pdev);
> >   return -ENODEV;
> >   }
> >
> > Is it a known limitation? Michael, could u, pls., comment on this?
> >
> > thanks,
> > vlad
>
> This is expected. uio_pci_generic forwards INT#x interrupts from device
> to userspace, but VF devices never assert INT#x.
>
> So it doesn't seem to make any sense to bind uio_pci_generic there.
>
> I think that DPDK should be fixed to not require uio_pci_generic
> for VF devices (or any devices without INT#x).
>
> If DPDK requires a place-holder driver, the pci-stub driver should
> do this adequately. See ./drivers/pci/pci-stub.c

Thanks for the clarification, Michael. I'll take a look.

>
> --
> MST


[dpdk-dev] [PATCH] i40e: workaround for Security issue in SR-IOV mode

2015-09-25 Thread Vladislav Zolotarov
On Sep 25, 2015 11:44 AM, "Jingjing Wu"  wrote:
>
> In SR-IOV mode a VF sending LFC or PFC would throttle the entire port.
> The workaround is to add a filter that drops pause frames sent from
> VFs.

This is a very strange approach - this would silently disable the Tx FC
while a user would think it's enabled. Wouldn't the right approach be to
let the user decide whether to enable this feature or, even better, to
allow the PF to disable this feature in the VF?

>
> Signed-off-by: Jingjing Wu 
> ---
>  drivers/net/i40e/i40e_ethdev.c | 30 ++
>  1 file changed, 30 insertions(+)
>
> diff --git a/drivers/net/i40e/i40e_ethdev.c
b/drivers/net/i40e/i40e_ethdev.c
> index 2dd9fdc..6cc2172 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -382,6 +382,30 @@ static inline void i40e_flex_payload_reg_init(struct
i40e_hw *hw)
> I40E_WRITE_REG(hw, I40E_GLQF_PIT(17), 0x7440);
>  }
>
> +#define I40E_FLOW_CONTROL_ETHERTYPE  0x8808
> +
> +/*
> + * Add an ethertype filter to drop all flow control frames transmitted
> + * from VSIs.
> +*/
> +static void
> +i40e_add_tx_flow_control_drop_filter(struct i40e_pf *pf)
> +{
> +   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
> +   uint16_t flags = I40E_AQC_ADD_CONTROL_PACKET_FLAGS_IGNORE_MAC |
> +   I40E_AQC_ADD_CONTROL_PACKET_FLAGS_DROP |
> +   I40E_AQC_ADD_CONTROL_PACKET_FLAGS_TX;
> +   int ret;
> +
> +   ret = i40e_aq_add_rem_control_packet_filter(hw, NULL,
> +   I40E_FLOW_CONTROL_ETHERTYPE, flags,
> +   pf->main_vsi_seid, 0,
> +   TRUE, NULL, NULL);
> +   if (ret)
> +   PMD_INIT_LOG(ERR, "Failed to add filter to drop flow
control "
> + " frames from VSIs.");
> +}
> +
>  static int
>  eth_i40e_dev_init(struct rte_eth_dev *dev)
>  {
> @@ -584,6 +608,12 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
>
> /* enable uio intr after callback register */
> rte_intr_enable(&(pci_dev->intr_handle));
> +   /*
> +* Add an ethertype filter to drop all flow control frames transmitted
> +* from VSIs. This is used to work around the issue that in SR-IOV mode
> +* a VF sending LFC or PFC would throttle the entire port.
> +*/
> +   i40e_add_tx_flow_control_drop_filter(pf);
>
> /* initialize mirror rule list */
> TAILQ_INIT(&pf->mirror_list);
> --
> 2.4.0
>


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-09-11 Thread Vladislav Zolotarov
On Sep 11, 2015 7:09 PM, "Thomas Monjalon" 
wrote:
>
> 2015-09-11 18:43, Avi Kivity:
> > On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote:
> > > > On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon at 6wind.com> wrote:
> > > > 2015-09-11 17:47, Avi Kivity:
> > > > > On 09/11/2015 05:25 PM, didier.pallard wrote:
> > > > > > Hi vlad,
> > > > > >
> > > > > > Documentation states that a packet (or multiple packets in
transmit
> > > > > > segmentation) can span any number of
> > > > > > buffers (and their descriptors) up to a limit of 40 minus
WTHRESH
> > > > > > minus 2.
> > > > > >
> > > > > > Shouldn't there be a test in transmit function that drops
> > > properly the
> > > > > > mbufs with a too large number of
> > > > > > segments, while incrementing a statistic; otherwise transmit
> > > function
> > > > > > may be locked by the faulty packet without
> > > > > > notification.
> > > > > >
> > > > >
> > > > > What we proposed is that the pmd expose to dpdk, and dpdk expose
> > > to the
> > > > > application, an mbuf check function.  This way applications that
can
> > > > > generate complex packets can verify that the device will be able
to
> > > > > process them, and applications that only generate simple mbufs can
> > > avoid
> > > > > the overhead by not calling the function.
> > > >
> > > > More than a check, it should be exposed as a capability of the port.
> > > > Anyway, if the application sends too much segments, the driver must
> > > > drop it to avoid hang, and maintain a dedicated statistic counter to
> > > > allow easy debugging.
> > >
> > > I agree with Thomas - this should not be optional. Malformed packets
> > > should be dropped. In the ixgbe case it's a very simple test - it's a
> > > single branch per packet so i doubt that it could impose any
> > > measurable performance degradation.
> >
> > A drop allows the application no chance to recover.  The driver must
> > either provide the ability for the application to know that it cannot
> > accept the packet, or it must fix it up itself.
>
> I have the feeling that everybody agrees on the same thing:
> the application must be able to make a well formed packet by checking
> limitations of the port. What about a field rte_eth_dev_info.max_tx_segs?
> In case the application fails in its checks, the driver must drop it and
> notify the user via a stat counter.
> The driver can also remove the hardware limitation by gathering the
segments
> but it may be hard to implement and would be a slow operation.

We thought about linearization too. It's doable with an extra mempool, and
it may be made optional so that those who don't need it could compile it
out and/or disable it at runtime...
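
To illustrate the check being discussed here, below is a hedged sketch of
an application-side guard. The max_tx_segs value stands for the *proposed*
rte_eth_dev_info field (it does not exist in DPDK today), and the helper
names (tx_pkt_fits_hw, tx_checked_burst, the too_many_segs counter) are
purely illustrative:

    #include <rte_mbuf.h>
    #include <rte_ethdev.h>

    /* Sketch only: max_tx_segs models the port capability under discussion. */
    static inline int
    tx_pkt_fits_hw(const struct rte_mbuf *m, uint16_t max_tx_segs)
    {
            return m->nb_segs <= max_tx_segs;
    }

    /* Verify each mbuf chain before handing it to the PMD; drop and count
     * the ones the hardware could not handle instead of hanging the ring. */
    static uint16_t
    tx_checked_burst(uint8_t port, uint16_t queue, struct rte_mbuf **pkts,
                     uint16_t n, uint16_t max_tx_segs, uint64_t *too_many_segs)
    {
            uint16_t i, k = 0;

            for (i = 0; i < n; i++) {
                    if (tx_pkt_fits_hw(pkts[i], max_tx_segs)) {
                            pkts[k++] = pkts[i];
                    } else {
                            (*too_many_segs)++;
                            rte_pktmbuf_free(pkts[i]);
                    }
            }
            return rte_eth_tx_burst(port, queue, pkts, k);
    }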


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-09-11 Thread Vladislav Zolotarov
On Sep 11, 2015 7:07 PM, "Richardson, Bruce" 
wrote:
>
>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladislav Zolotarov
> > Sent: Friday, September 11, 2015 5:04 PM
> > To: Avi Kivity
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above
1
> > for all NICs but 82598
> >
> > On Sep 11, 2015 6:43 PM, "Avi Kivity"  wrote:
> > >
> > > On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote:
> > >>
> > >>
> > >> On Sep 11, 2015 5:55 PM, "Thomas Monjalon"
> > >> 
> > wrote:
> > >> >
> > >> > 2015-09-11 17:47, Avi Kivity:
> > >> > > On 09/11/2015 05:25 PM, didier.pallard wrote:
> > >> > > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote:
> > >> > > >>
> > >> > > >> Helin, the issue has been seen on x540 devices. Pls., see a
> > chapter
> > >> > > >> 7.2.1.1 of x540 devices spec:
> > >> > > >>
> > >> > > >> A packet (or multiple packets in transmit segmentation) can
> > >> > > >> span
> > any
> > >> > > >> number of
> > >> > > >> buffers (and their descriptors) up to a limit of 40 minus
> > >> > > >> WTHRESH minus 2 (see Section 7.2.3.3 for Tx Ring details and
> > >> > > >> section Section 7.2.3.5.1
> > for
> > >> > > >> WTHRESH
> > >> > > >> details). For best performance it is recommended to minimize
> > >> > > >> the number of buffers as possible.
> > >> > > >>
> > >> > > >> Could u, pls., clarify why do u think that the maximum number
> > >> > > >> of
> > data
> > >> > > >> buffers is limited by 8?
> > >> > > >>
> > >> > > >> thanks,
> > >> > > >> vlad
> > >> > > >
> > >> > > > Hi vlad,
> > >> > > >
> > >> > > > Documentation states that a packet (or multiple packets in
> > >> > > > transmit
> > >> > > > segmentation) can span any number of buffers (and their
> > >> > > > descriptors) up to a limit of 40 minus WTHRESH minus 2.
> > >> > > >
> > >> > > > Shouldn't there be a test in transmit function that drops
> > >> > > > properly
> > the
> > >> > > > mbufs with a too large number of segments, while incrementing a
> > >> > > > statistic; otherwise transmit
> > function
> > >> > > > may be locked by the faulty packet without notification.
> > >> > > >
> > >> > >
> > >> > > What we proposed is that the pmd expose to dpdk, and dpdk expose
> > >> > > to
> > the
> > >> > > application, an mbuf check function.  This way applications that
> > >> > > can generate complex packets can verify that the device will be
> > >> > > able to process them, and applications that only generate simple
> > >> > > mbufs can
> > avoid
> > >> > > the overhead by not calling the function.
> > >> >
> > >> > More than a check, it should be exposed as a capability of the
port.
> > >> > Anyway, if the application sends too much segments, the driver must
> > >> > drop it to avoid hang, and maintain a dedicated statistic counter
> > >> > to
> > allow
> > >> > easy debugging.
> > >>
> > >> I agree with Thomas - this should not be optional. Malformed packets
> > should be dropped. In the ixgbe case it's a very simple test - it's a
> > single branch per packet so i doubt that it could impose any measurable
> > performance degradation.
> > >>
> > >>
> > >
> > > A drop allows the application no chance to recover.  The driver must
> > either provide the ability for the application to know that it cannot
> > accept the packet, or it must fix it up itself.
> >
> > An appropriate statistics counter would be a perfect tool to detect such
> > issues. Knowingly sending a packet that will cause a HW to hang is not
> > acceptable.
>
> I would agree. Drivers should provide a function to query the max number
of
> segments they can accept and the driver should be able to discard any
packets
> exceeding that number, and just track it via a stat.

+1

>
> /Bruce


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-09-11 Thread Vladislav Zolotarov
On Sep 11, 2015 7:00 PM, "Richardson, Bruce" 
wrote:
>
>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladislav Zolotarov
> > Sent: Friday, September 11, 2015 4:13 PM
> > To: Thomas Monjalon
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above
1
> > for all NICs but 82598
> >
> > On Sep 11, 2015 5:55 PM, "Thomas Monjalon" 
> > wrote:
> > >
> > > 2015-09-11 17:47, Avi Kivity:
> > > > On 09/11/2015 05:25 PM, didier.pallard wrote:
> > > > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote:
> > > > >>
> > > > >> Helin, the issue has been seen on x540 devices. Pls., see a
> > > > >> chapter
> > > > >> 7.2.1.1 of x540 devices spec:
> > > > >>
> > > > >> A packet (or multiple packets in transmit segmentation) can span
> > > > >> any number of buffers (and their descriptors) up to a limit of 40
> > > > >> minus WTHRESH minus 2 (see Section 7.2.3.3 for Tx Ring details
> > > > >> and section Section 7.2.3.5.1 for WTHRESH details). For best
> > > > >> performance it is recommended to minimize the number of buffers
> > > > >> as possible.
> > > > >>
> > > > >> Could u, pls., clarify why do u think that the maximum number of
> > > > >> data buffers is limited by 8?
> > > > >>
> > > > >> thanks,
> > > > >> vlad
> > > > >
> > > > > Hi vlad,
> > > > >
> > > > > Documentation states that a packet (or multiple packets in
> > > > > transmit
> > > > > segmentation) can span any number of buffers (and their
> > > > > descriptors) up to a limit of 40 minus WTHRESH minus 2.
> > > > >
> > > > > Shouldn't there be a test in transmit function that drops properly
> > > > > the mbufs with a too large number of segments, while incrementing
> > > > > a statistic; otherwise transmit function may be locked by the
> > > > > faulty packet without notification.
> > > > >
> > > >
> > > > What we proposed is that the pmd expose to dpdk, and dpdk expose to
> > > > the application, an mbuf check function.  This way applications that
> > > > can generate complex packets can verify that the device will be able
> > > > to process them, and applications that only generate simple mbufs
> > > > can avoid the overhead by not calling the function.
> > >
> > > More than a check, it should be exposed as a capability of the port.
> > > Anyway, if the application sends too much segments, the driver must
> > > drop it to avoid hang, and maintain a dedicated statistic counter to
> > > allow easy debugging.
> >
> > I agree with Thomas - this should not be optional. Malformed packets
> > should be dropped. In the ixgbe case it's a very simple test - it's a
> > single branch per packet so i doubt that it could impose any measurable
> > performance degradation.
> >
> Actually, it could very well do - we'd have to test it. For the vector IO
> paths, every additional cycle in the RX or TX paths causes a noticeable
perf
> drop.

Well, if your application is willing to know all the different HW
limitations then u may not need it. However, usually an application doesn't
want to know the HW technical details, and in this case ignoring them may
cause the HW to hang.

Of course, if your app always sends single-fragment packets of less than
1500 bytes then u are right and u will most likely not hit any HW
limitation. However, what I have in mind is a full-featured case where
packets are a bit bigger and more complicated and where a single branch per
packet will change nothing. This is regarding the 40-segments case.

In regard to the RS bit - this is relevant to any packet and according to
the spec it should be set in every packet.

>
> /Bruce


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-09-11 Thread Vladislav Zolotarov
On Sep 11, 2015 6:43 PM, "Avi Kivity"  wrote:
>
> On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote:
>>
>>
>> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" 
wrote:
>> >
>> > 2015-09-11 17:47, Avi Kivity:
>> > > On 09/11/2015 05:25 PM, didier.pallard wrote:
>> > > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote:
>> > > >>
>> > > >> Helin, the issue has been seen on x540 devices. Pls., see a
chapter
>> > > >> 7.2.1.1 of x540 devices spec:
>> > > >>
>> > > >> A packet (or multiple packets in transmit segmentation) can span
any
>> > > >> number of
>> > > >> buffers (and their descriptors) up to a limit of 40 minus WTHRESH
>> > > >> minus 2 (see
>> > > >> Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1
for
>> > > >> WTHRESH
>> > > >> details). For best performance it is recommended to minimize the
>> > > >> number of buffers
>> > > >> as possible.
>> > > >>
>> > > >> Could u, pls., clarify why do u think that the maximum number of
data
>> > > >> buffers is limited by 8?
>> > > >>
>> > > >> thanks,
>> > > >> vlad
>> > > >
>> > > > Hi vlad,
>> > > >
>> > > > Documentation states that a packet (or multiple packets in transmit
>> > > > segmentation) can span any number of
>> > > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH
>> > > > minus 2.
>> > > >
>> > > > Shouldn't there be a test in transmit function that drops properly
the
>> > > > mbufs with a too large number of
>> > > > segments, while incrementing a statistic; otherwise transmit
function
>> > > > may be locked by the faulty packet without
>> > > > notification.
>> > > >
>> > >
>> > > What we proposed is that the pmd expose to dpdk, and dpdk expose to
the
>> > > application, an mbuf check function.  This way applications that can
>> > > generate complex packets can verify that the device will be able to
>> > > process them, and applications that only generate simple mbufs can
avoid
>> > > the overhead by not calling the function.
>> >
>> > More than a check, it should be exposed as a capability of the port.
>> > Anyway, if the application sends too much segments, the driver must
>> > drop it to avoid hang, and maintain a dedicated statistic counter to
allow
>> > easy debugging.
>>
>> I agree with Thomas - this should not be optional. Malformed packets
should be dropped. In the ixgbe case it's a very simple test - it's a
single branch per packet so i doubt that it could impose any measurable
performance degradation.
>>
>>
>
> A drop allows the application no chance to recover.  The driver must
either provide the ability for the application to know that it cannot
accept the packet, or it must fix it up itself.

An appropriate statistics counter would be a perfect tool to detect such
issues. Knowingly sending a packet that will cause the HW to hang is not
acceptable.


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-09-11 Thread Vladislav Zolotarov
On Sep 9, 2015 4:19 PM, "Ananyev, Konstantin" 
wrote:
>
> Hi Thomas,
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> > Sent: Wednesday, September 09, 2015 1:19 PM
> > To: Zhang, Helin
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above
1 for all NICs but 82598
> >
> > 2015-08-25 20:13, Zhang, Helin:
> > > Yes, I got the perfect answers. Thank you very much!
> > > I just wanted to make sure the test case was OK with the limit of
maximum number of descriptors, as I heard there is a hang issue on
> > other NICs of using more descriptors than hardware allowed.
> > > OK. I am still waiting for the answers/confirmation from x540
hardware designers. We need all agree on your patches to avoid risks.
> >
> > Helin, any news?
> > Can we apply v4 of this patch?
>
> Unfortunately we are seeing a huge performance drop with that patch:
> On my box bi-directional traffic (64B packet) over one port can't reach
even 11 Mpps.

Konstantin, could u clarify - u saw "only" 11 Mpps with v3 of this patch
which doesn't change the rs_thresh and only sets RS on every packet? What
is the performance in the same test without this patch?

> We are still doing some experiments and consultations with ND guys how we
can overcome
> this problem and keep performance intact.
>
> Konstantin


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-09-11 Thread Vladislav Zolotarov
On Sep 11, 2015 5:55 PM, "Thomas Monjalon" 
wrote:
>
> 2015-09-11 17:47, Avi Kivity:
> > On 09/11/2015 05:25 PM, didier.pallard wrote:
> > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote:
> > >>
> > >> Helin, the issue has been seen on x540 devices. Pls., see a chapter
> > >> 7.2.1.1 of x540 devices spec:
> > >>
> > >> A packet (or multiple packets in transmit segmentation) can span any
> > >> number of
> > >> buffers (and their descriptors) up to a limit of 40 minus WTHRESH
> > >> minus 2 (see
> > >> Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for
> > >> WTHRESH
> > >> details). For best performance it is recommended to minimize the
> > >> number of buffers
> > >> as possible.
> > >>
> > >> Could u, pls., clarify why do u think that the maximum number of data
> > >> buffers is limited by 8?
> > >>
> > >> thanks,
> > >> vlad
> > >
> > > Hi vlad,
> > >
> > > Documentation states that a packet (or multiple packets in transmit
> > > segmentation) can span any number of
> > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH
> > > minus 2.
> > >
> > > Shouldn't there be a test in transmit function that drops properly the
> > > mbufs with a too large number of
> > > segments, while incrementing a statistic; otherwise transmit function
> > > may be locked by the faulty packet without
> > > notification.
> > >
> >
> > What we proposed is that the pmd expose to dpdk, and dpdk expose to the
> > application, an mbuf check function.  This way applications that can
> > generate complex packets can verify that the device will be able to
> > process them, and applications that only generate simple mbufs can avoid
> > the overhead by not calling the function.
>
> More than a check, it should be exposed as a capability of the port.
> Anyway, if the application sends too much segments, the driver must
> drop it to avoid hang, and maintain a dedicated statistic counter to allow
> easy debugging.

I agree with Thomas - this should not be optional. Malformed packets should
be dropped. In the ixgbe case it's a very simple test - it's a single
branch per packet, so I doubt that it could impose any measurable
performance degradation.

>


[dpdk-dev] rte_eth_rx_burst only returns up to 32 packets

2015-08-29 Thread Vladislav Zolotarov
On Aug 29, 2015 16:55, "Masood Moshref Javadi" 
wrote:
>
> Thanks a lot.
>
> So I assume the only way to see if there are more than 256 packets in the
> queue is to count them using rte_eth_rx_count

Sorry for butting in, but I think Masood raised a very important general
question here, which is: how do I check if there are more pending packets
in the PMD Rx queue without actually polling them (out)?

As far as I see it the short answer is: you can't. Or, more specifically,
you shouldn't, because if it happens that your code depends on it, it most
likely doesn't fit the PMD model.

The thing is that u can actually go and query the Rx ring about the number
of pending packets, but the truth is that by the time u get the query
result it may already be wrong, since new packets may have arrived.

The only correct way to deal with a PMD Rx queue is to actually go and poll
for new packets: if there are some - handle them, and if there aren't any -
poll again.

This is my understanding of this.
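
As a hedged sketch of that polling model (handle_pkt() below is a made-up
placeholder for application processing, and 32 simply matches the vector Rx
burst size mentioned in this thread):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SZ 32   /* the vector Rx path services at most 32 packets per call */

    static void
    handle_pkt(struct rte_mbuf *m)
    {
            /* application-specific processing would go here */
            (void)m;
    }

    /* Keep polling: a short read means "nothing more right now",
     * not "I know exactly how many packets are still queued". */
    static void
    poll_rx_queue(uint8_t port, uint16_t queue)
    {
            struct rte_mbuf *pkts[BURST_SZ];
            uint16_t i, n;

            for (;;) {
                    n = rte_eth_rx_burst(port, queue, pkts, BURST_SZ);
                    for (i = 0; i < n; i++) {
                            handle_pkt(pkts[i]);
                            rte_pktmbuf_free(pkts[i]);
                    }
            }
    }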

Thanks,
Vlad

> On Aug 28, 2015 8:32 PM, "Gaohaifeng (A)" 
wrote:
>
> > Please see  _rece_raw_pkts_vec function.
> >
> > Here is part of its comments may explain this question
> > /*
> >  * vPMD receive routine, now only accept (nb_pkts ==
> > RTE_IXGBE_VPMD_RX_BURST)
> >  * in one loop
> >  *
> >  * Notice:
> >  * - nb_pkts < RTE_IXGBE_VPMD_RX_BURST, just return no packet
> >  * - nb_pkts > RTE_IXGBE_VPMD_RX_BURST, only scan
RTE_IXGBE_VPMD_RX_BURST
> >  *   numbers of DD bit
> > */
> >
> >
> >
> > Hi,
> >
> > The documentation of this method says: Applications implementing a
> > "retrieve as much received packets as possible" policy can check this
> > specific case and keep invoking the rte_eth_rx_burst()
> > (http://dpdk.org/doc/api/rte__ethdev_8h.html#aee7daffe261e67355a78b106627c4c45)
> > function
> > until a value less than nb_pkts is returned.
> >
> > But the function returns at most 32 packets regardless of the burst size
> > parameter (nb_pkts). For example when I set the burst size to 256, it
only
> > returns 32 packets even though the queue has more packets. This means
that
> > I cannot rely on the returned value to know if there are >
> > 256 packets in the queue or not.
> >
> > Where this number 32 comes from? Is it because "PMD:
> > ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst
size
> > no less than 32." ?
> >
> > I use DPDK 2.0.0 and Intel 82599 10 G NIC.
> >
> > Thanks
> >


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Vladislav Zolotarov
On Aug 25, 2015 22:16, "Zhang, Helin"  wrote:
>
>
>
> > -Original Message-
> > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > Sent: Tuesday, August 25, 2015 11:53 AM
> > To: Zhang, Helin
> > Cc: Lu, Wenzhuo; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above
1 for
> > all NICs but 82598
> >
> >
> >
> > On 08/25/15 21:43, Zhang, Helin wrote:
> > >
> > > Hi Vlad
> > >
> > > I think this could possibly be the root cause of your TX hang issue.
> > > Please try to limit the number to 8 or less, and then see if the issue
> > > will still be there or not?
> > >
> >
> > Helin, the issue has been seen on x540 devices. Pls., see a chapter
> > 7.2.1.1 of x540 devices spec:
> >
> > A packet (or multiple packets in transmit segmentation) can span any
number of
> > buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus
2 (see
> > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for
WTHRESH
> > details). For best performance it is recommended to minimize the number
of
> > buffers as possible.
> >
> > Could u, pls., clarify why do u think that the maximum number of data
buffers is
> > limited by 8?
> OK, i40e hardware is 8

For i40e it's a bit more complicated than just "not more than 8" - it's not
more than 8 for a non-TSO packet and not more than 8 for each MSS,
including the header buffers, for TSO. But this thread is not about i40e so
this doesn't seem to be relevant anyway.

> , so I'd assume x540 could have a similar one.

x540 spec assumes otherwise... ?

> Yes, in your case,
> the limit could be around 38, right?

If by "around 38" u mean "exactly 38" then u are absolutely right... ?

> Could you help to make sure there is no packet to be transmitted uses
more than
> 38 descriptors?

Just like I've already mentioned, we limit the cluster to at most 33 data
segments. Therefore we are good here...

> I heard that there is a similar hang issue on X710 if using more than 8
descriptors for
> a single packet. I am wondering if the issue is similar on x540.

What's X710? If that's the XL710 40G NICs (i40e driver), then it has its
own spec with its own HW limitations I've mentioned above. It has nothing
to do with this thread, which is all about 10G NICs managed by the ixgbe
driver.

There is a different thread where I've raised the 40G NICs' xmit issues.
See the "i40e xmit path HW limitation" thread.

>
> Regards,
> Helin
>
> >
> > thanks,
> > vlad
> >
> > > It does not have any check for the number of descriptors to be used
> > > for a single packet, and it relies on the users to give correct mbuf
> > > chains.
> > >
> > > We may need a check of this somewhere. Of cause the point you
> > > indicated we also need to carefully investigate or fix.
> > >
> > > Regards,
> > >
> > > Helin
> > >
> > > From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
> > > Sent: Tuesday, August 25, 2015 11:34 AM
> > > To: Zhang, Helin
> > > Cc: Lu, Wenzhuo; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for all NICs but 82598
> > >
> > >
> > > On Aug 25, 2015 21:14, "Zhang, Helin" <helin.zhang at intel.com> wrote:
> > > >
> > > > Hi Vlad
> > > >
> > > >
> > > >
> > > > In addition, I'd double check with you what's the maximum number of
> > > descriptors would be used for a single packet transmitting?
> > > >
> > > > Datasheet said that it supports up to 8. I am wondering if more than
> > > 8 were used in your case?
> > >
> > > If memory serves me well the maximum number of data descriptors per
> > > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK
> > > is always zero it gives us 38 segments. We limit them to 33.
> > >
> > > >
> > > > Thank you very much!
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Helin
> > > >
> > > >
> > > >
> > > > From: Zhang, Helin
> > > > Sent: Wednesday, August 19, 2015 10:29 AM
> > > > To: Vladislav Zolotarov
> > > > Cc: Lu, Wenzhuo; dev at dpdk.org <mailto:dev at dpdk.org>
> > > >
> > > > Subject: RE: [dpdk-dev

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Vladislav Zolotarov
On Aug 25, 2015 21:14, "Zhang, Helin"  wrote:
>
> Hi Vlad
>
>
>
> In addition, I'd double check with you what's the maximum number of
descriptors would be used for a single packet transmitting?
>
> Datasheet said that it supports up to 8. I am wondering if more than 8
were used in your case?

If memory serves me well, the maximum number of data descriptors per single
xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK is always
zero, that gives us 38 segments. We limit them to 33.
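
Expressed as a purely illustrative macro (the names are made up here, not
taken from the ixgbe PMD):

    /* x540/82599 Tx limit quoted above, as code; 0 is the WTHRESH value DPDK
     * uses, so the limit evaluates to 38 data descriptors per packet. */
    #define IXGBE_TX_WTHRESH         0
    #define IXGBE_TX_MAX_DATA_SEGS   (40 - IXGBE_TX_WTHRESH - 2)   /* = 38 */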

>
> Thank you very much!
>
>
>
> Regards,
>
> Helin
>
>
>
> From: Zhang, Helin
> Sent: Wednesday, August 19, 2015 10:29 AM
> To: Vladislav Zolotarov
> Cc: Lu, Wenzhuo; dev at dpdk.org
>
> Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1
for all NICs but 82598
>
>
>
> Hi Vlad
>
>
>
> Thank you very much for the patches! Give me a few more time to double
check with more guys, and possibly hardware experts.
>
>
>
> Regards,
>
> Helin
>
>
>
> From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
> Sent: Tuesday, August 18, 2015 9:56 PM
> To: Lu, Wenzhuo
> Cc: dev at dpdk.org; Zhang, Helin
> Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1
for all NICs but 82598
>
>
>
>
> On Aug 19, 2015 03:42, "Lu, Wenzhuo"  wrote:
> >
> > Hi Helin,
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
> > > Sent: Friday, August 14, 2015 1:38 PM
> > > To: Zhang, Helin; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
above 1 for
> > > all NICs but 82598
> > >
> > >
> > >
> > > On 08/13/15 23:28, Zhang, Helin wrote:
> > > > Hi Vlad
> > > >
> > > > I don't think the changes are needed. It says in datasheet that the
RS
> > > > bit should be set on the last descriptor of every packet, ONLY WHEN
> > > TXDCTL.WTHRESH equals to ZERO.
> > >
> > > Of course it's needed! See below.
> > > Exactly the same spec a few lines above the place u've just quoted
states:
> > >
> > > "Software should not set the RS bit when TXDCTL.WTHRESH is greater
than
> > > zero."
> > >
> > > And since all three (3) ixgbe xmit callbacks are utilizing RS bit
notation ixgbe PMD
> > > is actually not supporting any value of WTHRESH different from zero.
> > I think Vlad is right. We need to fix this issue. Any suggestion? If
not, I'd like to ack this patch.
>
> Pls., note that there is a v2 of this patch on the list. I forgot to
patch ixgbevf_dev_info_get() in v1.
>
> >
> > >
> > > >
> > > > Regards,
> > > > Helin
> > > >
> > > >> -Original Message-
> > > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > > >> Sent: Thursday, August 13, 2015 11:07 AM
> > > >> To: dev at dpdk.org
> > > >> Cc: Zhang, Helin; Ananyev, Konstantin; avi at cloudius-systems.com;
Vlad
> > > >> Zolotarov
> > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
above 1 for
> > > all
> > > >> NICs but 82598
> > > >>
> > > >> According to 82599 and x540 HW specifications RS bit *must* be set
in the
> > > last
> > > >> descriptor of *every* packet.
> > > > There is a condition that if TXDCTL.WTHRESH equal to zero.
> > >
> > > Right and ixgbe PMD requires this condition to be fulfilled in order
to
> > > function. See above.
> > >
> > > >
> > > >> This patch fixes the Tx hang we were constantly hitting with a
seastar-based
> > > >> application on x540 NIC.
> > > > Could you help to share with us how to reproduce the tx hang issue,
with using
> > > > typical DPDK examples?
> > >
> > > Sorry. I'm not very familiar with the typical DPDK examples to help u
> > here. However this is quite irrelevant since without this patch
> > > ixgbe PMD obviously abuses the HW spec as has been explained above.
> > >
> > > We saw the issue when u stressed the xmit path with a lot of highly
> > > fragmented TCP frames (packets with up to 33 fragments with
non-headers
> > > fragments as small as 4 bytes) with all offload features enabled.
> > >
> > > Thanks,
> > > vlad
> > > >
> > > >> Sign

[dpdk-dev] i40e and RSS woes

2015-08-24 Thread Vladislav Zolotarov
On Aug 24, 2015 21:54, "Zhang, Helin"  wrote:
>
>
>
> > -Original Message-
> > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > Sent: Monday, August 24, 2015 11:26 AM
> > To: Zhang, Helin; Gleb Natapov
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] i40e and RSS woes
> >
> >
> >
> > On 08/24/15 20:51, Zhang, Helin wrote:
> > >
> > >> -Original Message-
> > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > >> Sent: Monday, August 24, 2015 4:14 AM
> > >> To: Zhang, Helin; Gleb Natapov; dev at dpdk.org
> > >> Subject: Re: [dpdk-dev] i40e and RSS woes
> > >>
> > >>
> > >>
> > >> On 03/05/15 07:56, Zhang, Helin wrote:
> > >>> Hi Gleb
> > >>>
> > >>> Sorry for late! I am struggling on my tasks for the following DPDK
> > >>> release these
> > >> days.
> >  -Original Message-
> >  From: Gleb Natapov [mailto:gleb at cloudius-systems.com]
> >  Sent: Monday, March 2, 2015 8:56 PM
> >  To: dev at dpdk.org
> >  Cc: Zhang, Helin
> >  Subject: Re: i40e and RSS woes
> > 
> >  Ping.
> > 
> >  On Thu, Feb 19, 2015 at 04:50:10PM +0200, Gleb Natapov wrote:
> > > CCing i40e driver author in a hope to get an answer.
> > >
> > > On Mon, Feb 16, 2015 at 03:36:54PM +0200, Gleb Natapov wrote:
> > >> I have an application that works reasonably well with ixgbe
> > >> driver, but when I try to use it with i40e I encounter various
RSS related
> > issues.
> > >>
> > >> First one is that for some reason i40e, when it builds default
> > >> reta table, round down number of queues to power of two. Why is
> > >> this? If
> > >>> It seems because of i40e queue configuration. We will check it more
> > >>> and see if it can be changed or improved later.
> > >> Helin, hi!
> > >> Sorry for bringing it back but it seems that the RSS queues number
> > >> issue (rounding it down to the nearest power of 2) still hasn't been
> > >> addressed in the master branch.
> > >>
> > >> Could u, pls., clarify what is that "i40e queue configuration" that
> > >> requires this alignment u are referring above?
> > >>
> > >>   From what i could see "num" parameter is not propagated outside the
> > >> i40e_pf_config_rss() in any form except for RSS table contents.
> > >> This means that any code that would need to know the number of Rx
> > >> queues would use the dev_data->nb_rx_queues (e.g. i40e_dev_rx_init())
> > >> and wouldn't be able to know that i40e_pf_config_rss() something
> > >> different except for scanning the RSS table in HW which is of course
not an
> > option.
> > >>
> > >> Therefore, from the first look it seems that this rounding may be
> > >> safely removed unless I've missed something.
> > > Could you help to refer to the data sheet of 'Hash Filter', 'Receive
> > > Queue Regions', it is said that '1, 2, 4, 8, 16, 32, 64' are the
supported queue
> > regions.
> > > Yes, we should support more than 64 queues per port, but for rss, it
> > > should be one of '1, 2, 4, 8, 16, 32, 64'.
> >
> > "The VSIs support 8 regions of receive queues that are aimed mainly for
the TCs.
> > The TC regions are defined per VSI by the VSIQF_TCREGION register. The
region
> > sizes (defined by the TC_SIZE fields) can be any of the following
value: 1, 2, 4, 8,
> > 16, 32, 64 as long as the total number of queues do not exceed the VSI
allocation.
> > These regions starts at the offset defined by the TC_OFFSET parameter.
> > According to the region size, the 'n' LS bits of the Queue Index from
the LUT are
> > enabled."
> >
> > I think the above says that the region sizes may only be one of the
mentioned
> > values.
> >
> > AFAIU this doesn't mean that the number or RSS queues has to be the same
> > - it may not exceed it.
> >
> > Just like it's stated in the "Outcome Queue Index" definition the final
mapping to
> > the PF index space is done using the VSILAN_QTABLE or VSILAN_QBASE
registers
> > (a.k.a. RSS indirection table).
> >
> > For instance if u have a region of size 8 u may configure 3 RSS queues
by setting
> > the following RSS table:
> > 0,1,2,0,1,2,0,1
> I tend to agree with you. Anyway, I am working on supporting more queues
per port than 64,
> and I will take this into account. If not other strong reasons, I will
change it. Thank you very much!

Great! Thanks a lot, Helin.
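
For reference, a hedged sketch of the RETA programming described above
(spreading e.g. 3 queues over the redirection table with the standard
ethdev API; error handling is omitted and the helper name is made up):

    #include <string.h>
    #include <rte_ethdev.h>

    /* Fill the RSS redirection table as 0,1,2,0,1,2,... so that any queue
     * count (here nb_queues) can be used, not only a power of two. */
    static int
    spread_rss_over_queues(uint8_t port_id, uint16_t reta_size, uint16_t nb_queues)
    {
            struct rte_eth_rss_reta_entry64 reta_conf[reta_size / RTE_RETA_GROUP_SIZE];
            uint16_t i;

            memset(reta_conf, 0, sizeof(reta_conf));
            for (i = 0; i < reta_size; i++) {
                    uint16_t grp = i / RTE_RETA_GROUP_SIZE;
                    uint16_t idx = i % RTE_RETA_GROUP_SIZE;

                    reta_conf[grp].mask |= 1ULL << idx;
                    reta_conf[grp].reta[idx] = i % nb_queues;
            }
            return rte_eth_dev_rss_reta_update(port_id, reta_conf, reta_size);
    }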

>
> Regards,
> Helin
>
> >
> > >
> > > Thanks,
> > > Helin
> > >
> > >> Pls., comment.
> > >>
> > >> thanks,
> > >> vlad
> > >>
> > >> I configure reta by my own using all of the queues everything
> > >> seams to be working. To add insult to injury I do not get any
> > >> errors during configuration some queues just do not receive any
traffic.
> > >>
> > >> The second problem is that for some reason i40e does not use 40
> > >> byte toeplitz hash key like any other driver, but it expects the
> > >> key to be 52 bytes. And it would have being fine (if we ignore
> > >> the fact that it contradicts MS spec), but how my high level 

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-19 Thread Vladislav Zolotarov
On Aug 19, 2015 03:42, "Lu, Wenzhuo"  wrote:
>
> Hi Helin,
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
> > Sent: Friday, August 14, 2015 1:38 PM
> > To: Zhang, Helin; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above
1 for
> > all NICs but 82598
> >
> >
> >
> > On 08/13/15 23:28, Zhang, Helin wrote:
> > > Hi Vlad
> > >
> > > I don't think the changes are needed. It says in datasheet that the RS
> > > bit should be set on the last descriptor of every packet, ONLY WHEN
> > TXDCTL.WTHRESH equals to ZERO.
> >
> > Of course it's needed! See below.
> > Exactly the same spec a few lines above the place u've just quoted
states:
> >
> > "Software should not set the RS bit when TXDCTL.WTHRESH is greater than
> > zero."
> >
> > And since all three (3) ixgbe xmit callbacks are utilizing RS bit
notation ixgbe PMD
> > is actually not supporting any value of WTHRESH different from zero.
> I think Vlad is right. We need to fix this issue. Any suggestion? If not,
I'd like to ack this patch.

Pls., note that there is a v2 of this patch on the list. I forgot to patch
ixgbevf_dev_info_get() in v1.

>
> >
> > >
> > > Regards,
> > > Helin
> > >
> > >> -Original Message-
> > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > >> Sent: Thursday, August 13, 2015 11:07 AM
> > >> To: dev at dpdk.org
> > >> Cc: Zhang, Helin; Ananyev, Konstantin; avi at cloudius-systems.com; Vlad
> > >> Zolotarov
> > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above
1 for
> > all
> > >> NICs but 82598
> > >>
> > >> According to 82599 and x540 HW specifications RS bit *must* be set
in the
> > last
> > >> descriptor of *every* packet.
> > > There is a condition that if TXDCTL.WTHRESH equal to zero.
> >
> > Right and ixgbe PMD requires this condition to be fulfilled in order to
> > function. See above.
> >
> > >
> > >> This patch fixes the Tx hang we were constantly hitting with a
seastar-based
> > >> application on x540 NIC.
> > > Could you help to share with us how to reproduce the tx hang issue,
with using
> > > typical DPDK examples?
> >
> > Sorry. I'm not very familiar with the typical DPDK examples to help u
> > here. However this is quite irrelevant since without this patch
> > ixgbe PMD obviously abuses the HW spec as has been explained above.
> >
> > We saw the issue when u stressed the xmit path with a lot of highly
> > fragmented TCP frames (packets with up to 33 fragments with non-headers
> > fragments as small as 4 bytes) with all offload features enabled.
> >
> > Thanks,
> > vlad
> > >
> > >> Signed-off-by: Vlad Zolotarov 
> > >> ---
> > >>   drivers/net/ixgbe/ixgbe_ethdev.c |  9 +
> > >>   drivers/net/ixgbe/ixgbe_rxtx.c   | 23 ++-
> > >>   2 files changed, 31 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > >> b/drivers/net/ixgbe/ixgbe_ethdev.c
> > >> index b8ee1e9..6714fd9 100644
> > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev,
> > struct
> > >> rte_eth_dev_info *dev_info)
> > >>.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
> > >>ETH_TXQ_FLAGS_NOOFFLOADS,
> > >>};
> > >> +
> > >> +  /*
> > >> +   * According to 82599 and x540 specifications RS bit *must* be
set on
> > the
> > >> +   * last descriptor of *every* packet. Therefore we will not allow
the
> > >> +   * tx_rs_thresh above 1 for all NICs newer than 82598.
> > >> +   */
> > >> +  if (hw->mac.type > ixgbe_mac_82598EB)
> > >> +  dev_info->default_txconf.tx_rs_thresh = 1;
> > >> +
> > >>dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t);
> > >>dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
> > >>dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff --
> > git
> > >> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index
> > >> 91023b9..8dbdffc 100644
> > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev
> > >> *dev,
> > >>struct ixgbe_tx_queue *txq;
> > >>struct ixgbe_hw *hw;
> > >>uint16_t tx_rs_thresh, tx_free_thresh;
> > >> +  bool rs_deferring_allowed;
> > >>
> > >>PMD_INIT_FUNC_TRACE();
> > >>hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > >>
> > >>/*
> > >> +   * According to 82599 and x540 specifications RS bit *must* be
set on
> > the
> > >> +   * last descriptor of *every* packet. Therefore we will not allow
the
> > >> +   * tx_rs_thresh above 1 for all NICs newer than 82598.
> > >> +   */
> > >> +  rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB);
> > >> +
> > >> +  /*
> > >> * Validate number of transmit descriptors.
> > >> * It must not exceed hardware maximum, and must be 

[dpdk-dev] i40e xmit path HW limitation

2015-07-30 Thread Vladislav Zolotarov
On Jul 30, 2015 22:00, "Zhang, Helin"  wrote:
>
>
>
> > -Original Message-
> > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > Sent: Thursday, July 30, 2015 10:56 AM
> > To: Zhang, Helin; Ananyev, Konstantin
> > Cc: dev at dpdk.org
> > Subject: Re: i40e xmit path HW limitation
> >
> >
> >
> > On 07/30/15 20:33, Zhang, Helin wrote:
> > >
> > >> -Original Message-
> > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > >> Sent: Thursday, July 30, 2015 9:44 AM
> > >> To: Zhang, Helin; Ananyev, Konstantin
> > >> Cc: dev at dpdk.org
> > >> Subject: Re: i40e xmit path HW limitation
> > >>
> > >>
> > >>
> > >> On 07/30/15 19:10, Zhang, Helin wrote:
> >  -Original Message-
> >  From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> >  Sent: Thursday, July 30, 2015 7:58 AM
> >  To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
> >  Subject: RFC: i40e xmit path HW limitation
> > 
> >  Hi, Konstantin, Helin,
> >  there is a documented limitation of xl710 controllers (i40e driver)
> >  which is not handled in any way by a DPDK driver.
> > From the datasheet chapter 8.4.1:
> > 
> >  "- A single transmit packet may span up to 8 buffers (up to 8 data
> >  descriptors per packet including both the header and payload
buffers).
> >  ? The total number of data descriptors for the whole TSO (explained
> >  - The total number of data descriptors for the whole TSO (explained
> >  within the TSO obeys the previous rule (up to 8 data descriptors
> >  per segment for both the TSO header and the segment payload
buffers)."
> > >>> Yes, I remember the RX side just supports 5 segments per packet
receiving.
> > >>> But what's the possible issue you thought about?
> > >> Note that it's a Tx size we are talking about.
> > >>
> > >> See 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 commit in linux net-next
> > repo.
> > >> If such a cluster arrives and you post it on the HW ring - HW will
> > >> shut this HW ring down permanently. The application will see that
it's ring is
> > stuck.
> > > That issue was because of using more than 8 descriptors for a packet
for TSO.
> >
> > There is no problem in transmitting the TSO packet with more than 8
fragments.
> > On the opposite - one can't transmit a non-TSO packet with more than 8
> > fragments.
> > One also can't transmit the TSO packet that would contain more than 8
fragments
> > in a single TSO segment including the TSO headers.
> >
> > Pls., read the HW spec as I quoted above for more details.
> I meant a packet to be transmitted by the hardware, but not the TSO
packet in memory.
> It could be a segment in TSO packet in memory.
> The linearize check in kernel driver is not for TSO only, it is for both
TSO and
> NON-TSO cases.

That's what I was trying to tell u. Great, we are on the same page at
last... ?

>
> >
> > >
> >  This means that, for instance, long cluster with small fragments
> >  has to be linearized before it may be placed on the HW ring.
> > >>> What type of size of the small fragments? Basically 2KB is the
> > >>> default size of
> > >> mbuf of most
> > >>> example applications. 2KB x 8 is bigger than 1.5KB. So it is enough
> > >>> for the
> > >> maximum
> > >>> packet size we supported.
> > >>> If 1KB mbuf is used, don't expect it can transmit more than 8KB
size of
> > packet.
> > >> I kinda lost u here. Again, we talk about the Tx side here and
> > >> buffers are not necessarily completely filled. Namely there may be a
> > >> cluster with
> > >> 15 fragments 100 bytes each.
> > > The root cause is using more than 8 descriptors for a packet.
> >
> > That would be if u would like to SUPER simplify the HW limitation above.
> > In that case u would significantly limit the different packets that may
be sent
> > without the linearization.
> >
> > > Linux driver can help
> > > on reducing number of descriptors to be used by merging small size of
> > > payload together, right?
> > > It is not for TSO, it is just for packet transmitting. 2 options in
my mind:
> > > 1. Use should ensure it will not use more than 8 descriptors per
packet for
> > transmitting.
> >
> > This requirement is too restrictive. Pls., see above.
> >
> > > 2. DPDK driver should try to merge small packet together for such
case, like
> > Linux kernel driver.
> > > I prefer to use option 1, users should ensure that in the application
> > > or up layer software, and keep the PMD driver as simple as possible.
> >
> > The above statement is super confusing: on the one hand u suggest the
DPDK
> > driver to merge the small packet (fragments?) together (how?) and then u
> > immediately propose the user application to do that. Could u, pls.,
clarify what
> > exactly u suggest here?
> > If that's to leave it to the application - note that it would demand
patching all
> > existing DPDK applications that send TCP packets.
> Those are two of obvious options. One is to do that in PMD, the 
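
For reference, a hedged, simplified sketch of the kind of check this thread
is about; the constant and function names are made up, and a real PMD would
need the exact datasheet algorithm rather than this conservative
approximation:

    #include <stdbool.h>
    #include <rte_mbuf.h>

    #define I40E_TX_MAX_SEGS_PER_PKT 8   /* from the datasheet excerpt above */

    /*
     * Non-TSO: the whole mbuf chain must fit in 8 data descriptors.
     * TSO: conservatively require that any 7 consecutive data buffers carry
     * at least one MSS of payload, so no single TSO segment can need more
     * than 8 descriptors.  Returns true if the chain would have to be
     * linearized before being posted on the Tx ring.
     */
    static bool
    i40e_pkt_needs_linearize(const struct rte_mbuf *m, uint16_t mss, bool tso)
    {
            const struct rte_mbuf *seg, *tail = m;
            uint32_t win = 0;
            uint16_t nb = 0;

            if (!tso)
                    return m->nb_segs > I40E_TX_MAX_SEGS_PER_PKT;

            for (seg = m; seg != NULL; seg = seg->next) {
                    win += seg->data_len;
                    nb++;
                    if (nb > I40E_TX_MAX_SEGS_PER_PKT - 1) {
                            /* slide the 7-buffer window forward */
                            win -= tail->data_len;
                            tail = tail->next;
                            nb--;
                    }
                    if (nb == I40E_TX_MAX_SEGS_PER_PKT - 1 && win < mss)
                            return true;   /* 7 buffers hold less than one MSS */
            }
            return false;
    }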

[dpdk-dev] [PATCH v3 0/3]: bug fixes in the ixgbe PF PMD Rx flow

2015-03-13 Thread Vladislav Zolotarov
On Mar 13, 2015 2:51 PM, "Ananyev, Konstantin" 
wrote:
>
> Hi Vlad,
>
> > -Original Message-
> > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > Sent: Friday, March 13, 2015 11:52 AM
> > To: Ananyev, Konstantin; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v3 0/3]: bug fixes in the ixgbe PF PMD
Rx flow
> >
> >
> >
> > On 03/13/15 13:07, Ananyev, Konstantin wrote:
> > >
> > >> -Original Message-
> > >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
> > >> Sent: Thursday, March 12, 2015 9:17 PM
> > >> To: dev at dpdk.org
> > >> Subject: [dpdk-dev] [PATCH v3 0/3]: bug fixes in the ixgbe PF PMD Rx
flow
> > >>
> > >> This series contains some bug fixes that were found during my work
on the ixgbe LRO
> > >> patches. Sending this series separately on Thomas request so that it
may be integrated
> > >> into the 2.0 release.
> > >>
> > >> New in v3:
> > >> - Adjusted to the new structs naming in the master.
> > >> - Fixed rx_bulk_alloc_allowed and rx_vec_allowed initialization:
> > >>- Don't set them to FALSE in rte_eth_dev_stop() flow - the
following
> > >>  rte_eth_dev_start() will need them.
> > >>- Reset them to TRUE in rte_eth_dev_configure() and not in a
probe() flow.
> > >>  This will ensure the proper behaviour if port is
re-configured.
> > >> - Rename:
> > >>- ixgbe_rx_vec_condition_check() ->
ixgbe_rx_vec_dev_conf_condition_check()
> > >>- set_rx_function() -> ixgbe_set_rx_function()
> > >> - Clean up the logic in ixgbe_set_rx_function().
> > >> - Define stubs with __attribute__((weak)) instead of using
#ifdef's.
> > >> - Styling: beautify ixgbe_rxtx.h a bit.
> > >>
> > >> New in v2:
> > >> - Fixed a compilation failure.
> > >>
> > >>
> > >> Vlad Zolotarov (3):
> > >>ixgbe: Use the rte_le_to_cpu_xx()/rte_cpu_to_le_xx() when
> > >>  reading/setting HW ring descriptor fields
> > >>ixgbe: Bug fix: Properly configure Rx CRC stripping for x540
devices
> > >>ixgbe: Unify the rx_pkt_bulk callback initialization
> > >>
> > >>   lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   2 +
> > >>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  13 +-
> > >>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 216
+---
> > >>   lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  28 -
> > >>   lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c   |   2 +-
> > >>   5 files changed, 183 insertions(+), 78 deletions(-)
> > >>
> > > Acked-by: Konstantin Ananyev 
> > >
> > > Just one nit:
> > >
> > > +int __attribute__((weak)) ixgbe_rxq_vec_setup(
> > > +   struct ixgbe_rx_queue __rte_unused *rxq)
> > > +{
> > >
> > > Please use notation:
> > > int __attribute__((weak))
> > > ixgbe_rxq_vec_setup(struct ixgbe_rx_queue __rte_unused *rxq)
> > >
> > > To keep up with the rest of the code, plus makes much easier to read.
> >
> > I took an example from kni/ethtool/igb/kcompat.h for a template but no
> > problem.
> > Do u want me to respin or it's ok? I will use this format for the
> > follow-up LRO patch anyway...
>
> Doing that in LRO patch set is ok.
> No need for respin that one, I think.

Great! Thanks a lot for reviewing this.

Thomas, it seems like the ixgbe maintainer gives this series a green light!..
;)

> Konstantin
>
> >
> > >
> > >> --
> > >> 2.1.0
>


[dpdk-dev] [PATCH v5 1/3] ixgbe: Cleanups

2015-03-09 Thread Vladislav Zolotarov
On Mar 9, 2015 8:01 PM, "Mcnamara, John"  wrote:
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladislav Zolotarov
> > Sent: Monday, March 9, 2015 5:14 PM
> > To: Ananyev, Konstantin
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v5 1/3] ixgbe: Cleanups
> >
> > On Mar 9, 2015 6:39 PM, "Ananyev, Konstantin"
> > 
> > wrote:
> > > > > If I remember things correctly, it considered result at right side
> > > > > of
> > '=' operator as unsigned int,
> > > > > and then complained that we assign it to smaller size (unsigned
> > short) operand.
> > > >
> > > > If that's the case - that's a clear compiler bug.
> > >
> > > Might be, though if we still have to support it, there is no much
> > > choice
> > I am afraid.
> >
> > Well to begin with could anybody who has this icc thing (preferably the
> > latest version) post the compilation errors u are talking about
>
> Hi,
>
> For what it is worth there aren't any warnings with ICC 13 and the 1/3
patch (+ the previous patchset):
>
>   $ make T=x86_64-native-linuxapp-icc install CC=icc
>   Build complete
>
>   $ git log --pretty=format:"%h - %an: %s" -4
>   b5d06e4 - Vladislav Zolotarov: ixgbe: Cleanups
>   3cd0367 - Vladislav Zolotarov: ixgbe: Unify the rx_pkt_bulk callback ...
>   10ed30e - Vladislav Zolotarov: ixgbe: Bug fix: Properly configure Rx ...
>   2a5bc6a - Vladislav Zolotarov: ixgbe: Use the rte_le_to_cpu_xx()/rte_...
>
>   $ icc --version
>   icc (ICC) 13.1.1 20130313

Thanks a lot, John.

>
> John.
> --
>
>


[dpdk-dev] [PATCH v5 1/3] ixgbe: Cleanups

2015-03-09 Thread Vladislav Zolotarov
On Mar 9, 2015 8:01 PM, "Mcnamara, John"  wrote:
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladislav Zolotarov
> > Sent: Monday, March 9, 2015 5:14 PM
> > To: Ananyev, Konstantin
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v5 1/3] ixgbe: Cleanups
> >
> > On Mar 9, 2015 6:39 PM, "Ananyev, Konstantin"
> > 
> > wrote:
> > > > > If I remember things correctly, it considered result at right side
> > > > > of
> > '=' operator as unsigned int,
> > > > > and then complained that we assign it to smaller size (unsigned
> > short) operand.
> > > >
> > > > If that's the case - that's a clear compiler bug.
> > >
> > > Might be, though if we still have to support it, there is no much
> > > choice
> > I am afraid.
> >
> > Well to begin with could anybody who has this icc thing (preferably the
> > latest version) post the compilation errors u are talking about
>
> Hi,
>
> For what it is worth there aren't any warnings with ICC 13 and the 1/3
patch (+ the previous patchset):
>
>   $ make T=x86_64-native-linuxapp-icc install CC=icc
>   Build complete
>
>   $ git log --pretty=format:"%h - %an: %s" -4
>   b5d06e4 - Vladislav Zolotarov: ixgbe: Cleanups
>   3cd0367 - Vladislav Zolotarov: ixgbe: Unify the rx_pkt_bulk callback ...
>   10ed30e - Vladislav Zolotarov: ixgbe: Bug fix: Properly configure Rx ...
>   2a5bc6a - Vladislav Zolotarov: ixgbe: Use the rte_le_to_cpu_xx()/rte_...
>
>   $ icc --version
>   icc (ICC) 13.1.1 20130313

That's worth a lot to me!.. :D

>
> John.
> --
>
>


[dpdk-dev] [PATCH v5 1/3] ixgbe: Cleanups

2015-03-09 Thread Vladislav Zolotarov
On Mar 9, 2015 6:39 PM, "Ananyev, Konstantin" 
wrote:
>
>
>
> > -Original Message-
> > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > Sent: Monday, March 09, 2015 3:58 PM
> > To: Ananyev, Konstantin; Wodkowski, PawelX; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v5 1/3] ixgbe: Cleanups
> >
> >
> >
> > On 03/09/15 13:29, Ananyev, Konstantin wrote:
> > >
> > >> -Original Message-
> > >> From: Wodkowski, PawelX
> > >> Sent: Monday, March 09, 2015 11:09 AM
> > >> To: Ananyev, Konstantin; Vlad Zolotarov; dev at dpdk.org
> > >> Subject: Re: [dpdk-dev] [PATCH v5 1/3] ixgbe: Cleanups
> > >>
> > >> On 2015-03-09 11:49, Ananyev, Konstantin wrote:
> > >>>
> >  -Original Message-
> >  From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
> >  Sent: Monday, March 09, 2015 10:21 AM
> >  To: dev at dpdk.org
> >  Subject: [dpdk-dev] [PATCH v5 1/3] ixgbe: Cleanups
> > 
> >   - Removed the not needed casting.
> >   - ixgbe_dev_rx_init(): shorten the lines by defining a local
alias variable to access
> >  >data->dev_conf.rxmode.
> > 
> >  Signed-off-by: Vlad Zolotarov 
> >  ---
> > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 27
---
> > 1 file changed, 12 insertions(+), 15 deletions(-)
> > 
> >  diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> >  index 72c65df..609b5fd 100644
> >  --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> >  +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> >  @@ -1032,8 +1032,7 @@ ixgbe_rx_alloc_bufs(struct igb_rx_queue *rxq)
> >   int diag, i;
> > 
> >   /* allocate buffers in bulk directly into the S/W ring */
> >  -alloc_idx = (uint16_t)(rxq->rx_free_trigger -
> >  -(rxq->rx_free_thresh - 1));
> >  +alloc_idx = rxq->rx_free_trigger - (rxq->rx_free_thresh -
1);
> > >>> I think all these extra casts came in to keep icc 12.* compiling
without warnings.
> > >>> I am agree that they are unnecessary.
> > >>> Though if we still have to support icc 12.* we either need to keep
them, or find
> > >>> some other way to keep it happy.
> > >>> Konstantin
> > >>>
> > >> What warnings icc 12.* is throwing?
> > > Try and see :)
> > >
> > >> Only warning I can think of here is
> > >> signed -> unsigned implicit cast.
> > > If I remember things correctly, it considered result at right side of
'=' operator as unsigned int,
> > > and then complained that we assign it to smaller size (unsigned
short) operand.
> >
> > If that's the case - that's a clear compiler bug.
>
> Might be, though if we still have to support it, there is no much choice
I am afraid.

Well, to begin with, could anybody who has this icc thing (preferably the
latest version) post the compilation errors you are talking about? Let's
continue this discussion with some specific things in hand. So far there
have been a lot of "maybe"s and "as far as I remember"s. Could you, guys,
please be more specific so that I can address the real issues and not just
your fears? ;)

Thanks,
Vlad

>
> >
> > >
> > >> Changing '1' to '1U' helps?
> > > Don't think so, but you are welcome to try.
> > >
> > > Konstantin
> > >
> > >>
> > >> --
> > >> Pawel
>
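
For anyone trying to reproduce the warning in question, the promotion
involved can be shown in isolation (a minimal sketch, assuming the
diagnostic is the usual narrowing-on-assignment warning; the variable names
simply mirror the quoted ixgbe code):

    #include <stdint.h>

    int main(void)
    {
            uint16_t rx_free_trigger = 128;
            uint16_t rx_free_thresh = 32;
            uint16_t alloc_idx;

            /*
             * Both operands are uint16_t, but integer promotion turns the
             * subtraction into an int expression, so the assignment narrows
             * it back to 16 bits; this is what older icc reportedly flagged.
             */
            alloc_idx = rx_free_trigger - (rx_free_thresh - 1);

            /* The explicit cast from the original code silences it. */
            alloc_idx = (uint16_t)(rx_free_trigger - (rx_free_thresh - 1));

            return alloc_idx == 97 ? 0 : 1;
    }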


[dpdk-dev] [PATCH v2 5/6] common_linuxapp: Added CONFIG_RTE_ETHDEV_LRO_SUPPORT option

2015-03-05 Thread Vladislav Zolotarov
On Mar 5, 2015 9:14 PM, "Thomas Monjalon"  wrote:
>
> 2015-03-05 16:18, Vlad Zolotarov:
> >
> > On 03/05/15 16:01, Thomas Monjalon wrote:
> > > 2015-03-05 15:39, Vlad Zolotarov:
> > >> On 03/05/15 15:19, Thomas Monjalon wrote:
> > >>> 2015-03-05 13:28, Vlad Zolotarov:
> >  Enables LRO support in PMDs.
> > 
> >  Signed-off-by: Vlad Zolotarov 
> >  ---
> > config/common_linuxapp | 1 +
> > 1 file changed, 1 insertion(+)
> > 
> >  diff --git a/config/common_linuxapp b/config/common_linuxapp
> >  index 97f1c9e..5b98595 100644
> >  --- a/config/common_linuxapp
> >  +++ b/config/common_linuxapp
> >  @@ -137,6 +137,7 @@ CONFIG_RTE_MAX_ETHPORTS=32
> > CONFIG_RTE_LIBRTE_IEEE1588=n
> > CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
> > CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
> >  +CONFIG_RTE_ETHDEV_LRO_SUPPORT=y
> > >>> Sorry I don't really follow this ixgbe discussion but I wonder why
you
> > >>> would add a compile time option for this feature.
> > >> The only reason is to be able to detect that the feature is present
in
> > >> the DPDK version u r compiling against because of the API change.
> > >> Currently, this can't be done using the DPDK version thus we may
either
> > > Why you cannot use version? In development phase?
> > > When released, you'll be able to test >= 2.1.
> >
> > Of course! When the version bumps, the #ifdef I've mentioned above may
> > be replaced with the version check.
> >
> > >
> > >> do a try-compilation and if it fails define some application-level
macro
> > >> disabling
> > >> the feature usage or we may define a macro in the library level
> > >> (together with tons of other such macros like those in the patch
snippet
> > >> above).
> > > I'd prefer a request rte_eth_dev_info_get() to advertise that the
feature
> > > is available with the device and driver.
> > > Please let's try to remove config options and #ifdefs.
> >
> > This is exactly what my patch does. But that's not ending there - there
> > is a new feature bit added in rte_eth_rxmode (enable_lro) and in order
> > to compile the application has to know somehow if this bit is present or
> > not. How do u propose to do this now?
>
> I think it would be better to define something like RTE_HAS_LRO in
rte_ethdev.h.

OK. So I'll change it in v4.

>
> > Of course, I can put such macro in my own tree but then I'll have to
> > rebase all the time and inform other developers that will have to work
> > against my tree (and not upstream as it's supposed to be) - to update.
> > This sounds like a hassle thus I added this macro to resolve this issue
> > until the version is bumped.
> >
> > >
> > >>> What is the benefit of disabling it?
> > >> No benefit whatsoever.
>
>
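
On the application side, the guard being proposed would look something like
this (a sketch only: RTE_HAS_LRO is the macro name suggested above, not an
existing upstream define, and enable_lro is the rxmode bit added by the LRO
series):

    #include <rte_ethdev.h>

    static void enable_lro_if_available(struct rte_eth_conf *conf)
    {
    #ifdef RTE_HAS_LRO
            conf->rxmode.enable_lro = 1;
    #else
            (void)conf; /* LRO bit not exposed by these headers */
    #endif
    }

Once the version is bumped, the macro test could give way to a comparison
against RTE_VERSION / RTE_VERSION_NUM from rte_version.h, as discussed
above.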


[dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS

2014-12-26 Thread Vladislav Zolotarov
On Dec 26, 2014 9:28 AM, "Ouyang, Changchun" 
wrote:
>
> Hi Vladislav,
>
>
>
> From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
> Sent: Friday, December 26, 2014 2:49 PM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
>
>
>
>
> On Dec 26, 2014 3:52 AM, "Ouyang, Changchun" 
wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > > Sent: Thursday, December 25, 2014 9:20 PM
> > > To: Ouyang, Changchun; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
> > >
> > >
> > > On 12/25/14 04:43, Ouyang, Changchun wrote:
> > > > Hi,
> > > > Sorry miss some comments, so continue my response below,
> > > >
> > > >> -Original Message-
> > > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > > >> Sent: Wednesday, December 24, 2014 6:40 PM
> > > >> To: Ouyang, Changchun; dev at dpdk.org
> > > >> Subject: Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: Config VF RSS
> > > >>
> > > >>
> > > >> On 12/24/14 07:23, Ouyang Changchun wrote:
> > > >>> It needs config RSS and IXGBE_MRQC and IXGBE_VFPSRTYPE to enable
> > > VF
> > > >> RSS.
> > > >>> The psrtype will determine how many queues the received packets
will
> > > >>> distribute to, and the value of psrtype should depends on both
facet:
> > > >>> max VF rxq number which has been negotiated with PF, and the
number
> > > >>> of
> > > >> rxq specified in config on guest.
> > > >>> Signed-off-by: Changchun Ouyang 
> > > >>> ---
> > > >>>lib/librte_pmd_ixgbe/ixgbe_pf.c   | 15 +++
> > > >>>lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 92
> > > >> ++-
> > > >>>2 files changed, 97 insertions(+), 10 deletions(-)
> > > >>>
> > > >>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c
> > > >>> b/lib/librte_pmd_ixgbe/ixgbe_pf.c index cbb0145..9c9dad8 100644
> > > >>> --- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
> > > >>> +++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
> > > >>> @@ -187,6 +187,21 @@ int ixgbe_pf_host_configure(struct
rte_eth_dev
> > > >> *eth_dev)
> > > >>>   IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(hw->mac.num_rar_entries), 0);
> > > >>>   IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(hw->mac.num_rar_entries), 0);
> > > >>>
> > > >>> + /*
> > > >>> +  * VF RSS can support at most 4 queues for each VF, even if
> > > >>> +  * 8 queues are available for each VF, it need refine to 4
> > > >>> +  * queues here due to this limitation, otherwise no queue
> > > >>> +  * will receive any packet even RSS is enabled.
> > > >> According to Table 7-3 in the 82599 spec RSS is not available when
> > > >> port is configured to have 8 queues per pool. This means that if u
> > > >> see this configuration u may immediately disable RSS flow in your
code.
> > > >>
> > > >>> +  */
> > > >>> + if (eth_dev->data->dev_conf.rxmode.mq_mode ==
> > > >> ETH_MQ_RX_VMDQ_RSS) {
> > > >>> + if (RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool == 8) {
> > > >>> + RTE_ETH_DEV_SRIOV(eth_dev).active =
> > > >> ETH_32_POOLS;
> > > >>> + RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool = 4;
> > > >>> + RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx =
> > > >>> + dev_num_vf(eth_dev) * 4;
> > > >> According to 82599 spec u can't do that since RSS is not allowed
when
> > > >> port is configured to have 8 function per-VF. Have u verified that
> > > >> this works? If yes, then spec should be updated.
> > > >>
> > > >>> + }
> > > >>> + }
> > > >>> +
> > > >>>   /* set VMDq map to default PF pool */
> > > >>>   hw->mac.ops.set_vmdq(hw, 0,
> > > >>> RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx);
> > > >>>
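
The constraint at the heart of this exchange can be condensed into a small
stand-alone sketch (not the driver code: per the quoted patch comment a VF
pool gets at most 4 RSS queues, and per the review comment Table 7-3 of the
82599 datasheet offers no RSS at all in the 8-queues-per-pool layout, so a
request for 8 queues per pool has to be either clamped to 4, as the patch
does, or rejected, as the review suggests):

    #include <stdint.h>
    #include <stdio.h>

    /* Returns 1 if the request had to be clamped, 0 otherwise. */
    static int vf_rss_effective_queues(uint16_t requested_q_per_pool,
                                       uint16_t *effective_q_per_pool)
    {
            if (requested_q_per_pool <= 4) {
                    *effective_q_per_pool = requested_q_per_pool;
                    return 0;
            }
            /* 8 queues per pool: no RSS; clamp (patch) or fail (review). */
            *effective_q_per_pool = 4;
            return 1;
    }

    int main(void)
    {
            uint16_t q;
            int clamped = vf_rss_effective_queues(8, &q);

            printf("effective queues per pool: %u (clamped=%d)\n",
                   (unsigned int)q, clamped);
            return 0;
    }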

[dpdk-dev] DPDK RSS support for ixgbevf PMD

2014-12-15 Thread Vladislav Zolotarov
Hi,
I'm running the ixgbevf PMD on AWS guests with enhanced networking
(SR-IOV functions of Intel's 82599 NIC) and noticed that even in the
current git tree there is no multi-queue support in this PMD: the reta
size returned by the rte_eth_dev_info_get() call is 0, while max_rx_queues
and max_tx_queues are both 4.
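
The numbers above come from a query along these lines (a minimal sketch of
the check, assuming nothing beyond the standard rte_eth_dev_info_get() API):

    #include <stdio.h>
    #include <rte_ethdev.h>

    static void print_rss_caps(uint8_t port_id)
    {
            struct rte_eth_dev_info info;

            rte_eth_dev_info_get(port_id, &info);
            printf("port %u: reta_size=%u max_rx_queues=%u max_tx_queues=%u\n",
                   port_id, info.reta_size, info.max_rx_queues,
                   info.max_tx_queues);
    }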

The Linux ixgbevf-2.15.3 driver, on the other hand, successfully initializes
2 RSS queues: for some reason it always limits the number of RSS queues to 2.

ixgbevf_main.c: line 2539
u16 rss = min_t(u16, num_online_cpus(), 2);

The above is strange since, if MRQE is set to 1010b, there are 4 RSS queues
available, which seems to be the case in my AWS guest.

However, let's get back to DPDK. As I've mentioned above, the SR-IOV
function I have is RSS capable (to be 100% sure, I've verified that both
queues receive packets in a multi-socket TCP test). And it's a shame I
can't utilize it with DPDK.

I wonder if there are any blockers to adding this capability to the ixgbevf
PMD and, if not, whether it is scheduled for some time soon?

Thanks in advance,
vlad