On 12/13/23 14:06, David Marchand wrote:
> This commit adds support for DPDK v23.11.
> It updates the CI script and documentation and includes the following
> changes coming from the dpdk-latest branch:
> 
> - sparse: Add some compiler intrinsics for DPDK build.
>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=371129&state=*
> 
> - ci: Cache DPDK installed libraries only.
> - ci: Reduce optional libraries in DPDK.
>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=383367&state=*
> 
> - system-dpdk: Ignore net/ice error log about QinQ offloading.
>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=385259&state=*
> 
> Signed-off-by: David Marchand <david.march...@redhat.com>
> ---

Hi, Kevin, David, others.

We need to make a decision on this patch as the proposed branching is
only one week away.  As far as I understand, there is a problem with the
Intel Virtual Function driver (iavf) that deadlocks OVS when a VF is added.
The problem is described in https://bugs.dpdk.org/show_bug.cgi?id=1337
(not 1337 at all) and the commit that introduced the issue in DPDK is
known.  To date, the issue is not fixed.  The potential solution is
to revert the commit from DPDK, bringing back another issue fixed by
the aforementioned commit, though that issue seems less severe and, to my
knowledge, we didn't actually experience it in the past.

There is also a situation around DPDK stable releases.  Since these are
normally created after the next major release of DPDK is out, the time
gap between xx.11 and xx.11.1 is 5 months.  Which is a lot, especially
for an LTS release, since projects are likely to migrate to new LTS
releases of DPDK and they are likely to discover bugs that need fixes
earlier than in 5 months.

With that said, we have a few options for the current patch:

0. Accept the patch and do nothing about the issue.  Clearly not a good
   option.  The argument can be made that the problem was also
   backported to stable DPDK 21.11.5 and 22.11.something, so older OVS
   releases are also affected, i.e. it's not really a problem specific to
   the 3.3 release of OVS.  However, for older releases the users
   can choose to fall back to older stable releases of DPDK.  With a
   major version upgrade we are going to introduce breaking changes,
   and there is nowhere to fall back, since going back to 22.11 will
   break features for certain drivers even if DPDK API/ABI that we
   use would have been compatible.

1. Accept the patch and document that users will need to revert a
   particular DPDK commit if they are planning to use VFs on Intel NICs,
   and upgrade to 23.11.1 as soon as it is available, assuming the issue
   will be fixed there.

   This is not a very user-friendly option.  And it is not clear if
   distributions will do that.  Also, it's a one-off solution that we may
   have to repeat every year.  And it might not be possible for other
   types of issues we may encounter in the future.  Also, users will
   have zero validation for the changes they make in DPDK.

2. Check if DPDK can make a one-off stable release of 23.11.1 with just this
   patch reverted or the fix implemented.  If this can be done before OVS
   release in mid February, that might be acceptable.

   This will likely mean skipping some validation steps on the DPDK release
   side, so not ideal.  However, it is better than asking users to revert
   this patch themselves as they will have zero validation this way.
   This also doesn't address the bigger problem with DPDK stable release
   cadence and making one-off releases every year doesn't sound right.

3. Postpone 23.11 to OVS 3.4 and likely just move DPDK upgrades to summer
   releases of OVS.

   This should address the release cadence problem, since we'll have at
   least one stable release of DPDK before moving to a new major version,
   giving us time to test and report issues.  Upgrading to .1 stable versions
   instead of unstable ones seems like a good idea for software in general.
   The obvious downside of this approach is an even longer wait for new DPDK
   features to become available to OVS users.

Note: Moving release dates for major releases of OVS or DPDK doesn't sound
right and may create more issues than it solves due to release time alignments
with major consumers like OVN, distributions and cluster management systems.
So, not suggesting that.

<rant>
4. Revisiting the stable release policy for DPDK LTS releases might be a good
   thing though, since 5 months is an unreasonably long time for a fresh
   release to not receive any bug fixes. This time gap is also larger than a
   time gap between two stable releases of the same series, i.e. time between
   xx.11.1 and xx.11.2 is less than time between xx.11 and xx.11.1, which
   doesn't make a lot of sense.

   I understand the position of the DPDK project to not incorporate testing of
   external applications into their release process, since it can't possibly
   test with every application.  However, application developers can't possibly
   test every DPDK driver on their own, because upstream communities like OVS
   simply don't have the hardware/infrastructure to do so.  And there is a
   clear gap in testing and validation on the DPDK side, i.e. validation
   performed by the DPDK project alone is not sufficient.  That means that bugs
   are inevitable and fresh releases of DPDK will contain bugs making them
   unusable for some applications.  Hence the need for a faster process for .1
   releases.  E.g. having the xx.11.1 release at the end of January / start of
   February would be fine.  Though the timing with different holidays around
   the world is not good.

   This option is just a slightly more sustainable version of option 2, as it
   will involve proper validation on the DPDK side.  But again, it's not OVS'
   call to make.

5. Have bug-free DPDK right out of the gate :D.  This is obviously not
   happening unless OVS is tightly integrated into DPDK testing and validation
   and all the issues are caught before a new version of DPDK is released.
</rant>

I think option 0 is a no-go.  To resolve the current issue at hand for OVS
3.3 we could go with 1, 2 or 3.  Though 2 is not OVS' call to make.  Long
term solutions are 3 or 4, as 1 and 2 require solving this problem every year,
depending on whether we have problems with a new release or not.  5 doesn't
seem like a possible solution at the moment for various reasons.

Thoughts?

We need to make a decision on this by the end of this week.

Best regards, Ilya Maximets.