On 12/13/23 14:06, David Marchand wrote:
> This commit adds support for DPDK v23.11.
> It updates the CI script and documentation and includes the following
> changes coming from the dpdk-latest branch:
>
> - sparse: Add some compiler intrinsics for DPDK build.
>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=371129&state=*
>
> - ci: Cache DPDK installed libraries only.
> - ci: Reduce optional libraries in DPDK.
>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=383367&state=*
>
> - system-dpdk: Ignore net/ice error log about QinQ offloading.
>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=385259&state=*
>
> Signed-off-by: David Marchand <david.march...@redhat.com>
> ---

Hi, Kevin, David, others.

We need to make a decision on this patch as the proposed branching is
only one week away.

As far as I understand, there is a problem with the Intel Virtual
Function driver (iavf) that deadlocks OVS when a VF is added.  The
problem is described in https://bugs.dpdk.org/show_bug.cgi?id=1337
(not 1337 at all) and the commit that introduced the issue in DPDK is
known.  To date, the issue is not fixed.  The potential solution is to
revert the commit from DPDK, bringing back another issue fixed by the
aforementioned commit, though that issue seems less severe and, to my
knowledge, we didn't actually experience it in the past.

There is also a situation around DPDK stable releases.  Since these are
normally created after the next major release of DPDK is out, the time
gap between xx.11 and xx.11.1 is 5 months.  That is a lot, especially
for an LTS release, since projects are likely to migrate to new LTS
releases of DPDK and they are likely to discover bugs that need fixes
earlier than in 5 months.

With that said, we have a few options for the current patch:

0. Accept the patch and do nothing about the issue.  Clearly not a good
   option.  The argument can be made that the problem was also
   backported to stable DPDK 21.11.5 and 22.11.something, so older OVS
   releases are also affected, i.e. it's kind of not a problem for the
   3.3 release of OVS in particular.  However, for older releases the
   users can choose to fall back to older stable releases of DPDK.
   With a major version upgrade we are going to introduce breaking
   changes, and there is nowhere to fall back to, since going back to
   22.11 will break features for certain drivers even if the DPDK
   API/ABI that we use would have been compatible.

1. Accept the patch and document that users will need to revert a
   particular DPDK commit if they are planning to use VFs on Intel
   NICs, and upgrade to 23.11.1 as soon as it is available, assuming
   the issue will be fixed there.  This is not a very user-friendly
   option.
   And it is not clear if distributions will do that.  Also, it's a
   one-off solution that we may have to repeat every year.  And it
   might not be possible for other types of issues we may encounter in
   the future.  Also, users will have zero validation for the changes
   they make in DPDK.

2. Check if DPDK can make a one-off stable release of 23.11.1 with just
   this patch reverted or the fix implemented.  If this can be done
   before the OVS release in mid February, that might be acceptable.
   This will likely mean skipping some validation steps on the DPDK
   release side, so not ideal.  However, it is better than asking users
   to revert this patch themselves, as they will have zero validation
   this way.  This also doesn't address the bigger problem with the
   DPDK stable release cadence, and making one-off releases every year
   doesn't sound right.

3. Postpone 23.11 to OVS 3.4 and likely just move DPDK upgrades to
   summer releases of OVS.  This should address the release cadence
   problem, since we'll have at least one stable release of DPDK before
   moving to a new major version, giving us time to test and report
   issues.  Upgrading to .1 stable versions instead of unstable ones
   seems like a good idea for software in general.  The obvious
   downside of this approach is an even longer time for new DPDK
   features to be available for OVS users.

Note: Moving release dates for major releases of OVS or DPDK doesn't
sound right and may create more issues than it solves due to release
time alignments with major consumers like OVN, distributions and
cluster management systems.  So, not suggesting that.

<rant>

4. Revisiting the stable release policy for DPDK LTS releases might be
   a good thing though, since 5 months is an unreasonably long time for
   a fresh release to not receive any bug fixes.  This time gap is also
   larger than the time gap between two stable releases of the same
   series, i.e. the time between xx.11.1 and xx.11.2 is less than the
   time between xx.11 and xx.11.1, which doesn't make a lot of sense.
   I understand the position of the DPDK project to not incorporate
   testing of external applications into their release process, since
   it can't possibly test with every application.  However, application
   developers can't possibly test every DPDK driver on their own,
   because upstream communities like OVS simply don't have the
   hardware/infrastructure to do so.  And there is a clear gap in
   testing and validation on the DPDK side, i.e. validation performed
   by the DPDK project alone is not sufficient.  That means that bugs
   are inevitable and fresh releases of DPDK will contain bugs making
   them unusable for some applications.  Hence the need for a faster
   process for .1 releases.  E.g. having the xx.11.1 release at the end
   of January / start of February would be fine.  Though the timing
   with different holidays around the world is not good.  This option
   is just a slightly more sustainable version of option 2, as it will
   involve proper validation on the DPDK side.  But again, it's not
   OVS' call to make.

5. Have bug-free DPDK right out of the gate :D.  This is obviously not
   happening unless OVS is tightly integrated into DPDK testing and
   validation and all the issues are caught before a new version of
   DPDK is released.

</rant>

I think option 0 is a no-go.  To resolve the current issue at hand for
OVS 3.3 we could go with 1, 2 or 3.  Though 2 is not OVS' call to make.
Long term solutions are 3 or 4, as 1 and 2 require solving this problem
every year, depending on us having problems with a new release or not.
5 doesn't seem like a possible solution at the moment for various
reasons.

Thoughts?  We need to make a decision on this by the end of this week.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev