On 1/12/24 14:39, Kevin Traynor wrote:
> On 11/01/2024 15:44, Ilya Maximets wrote:
>> On 1/10/24 19:35, Kevin Traynor wrote:
>>> +cc some other people who may be interested in OVS upgrading DPDK
>>> version.
>>>
>>> On 10/01/2024 16:52, Ilya Maximets wrote:
>>>> On 12/13/23 14:06, David Marchand wrote:
>>>>> This commit adds support for DPDK v23.11.
>>>>> It updates the CI script and documentation and includes the following
>>>>> changes coming from the dpdk-latest branch:
>>>>>
>>>>> - sparse: Add some compiler intrinsics for DPDK build.
>>>>>
>>>>> https://patchwork.ozlabs.org/project/openvswitch/list/?series=371129&state=*
>>>>>
>>>>> - ci: Cache DPDK installed libraries only.
>>>>> - ci: Reduce optional libraries in DPDK.
>>>>>
>>>>> https://patchwork.ozlabs.org/project/openvswitch/list/?series=383367&state=*
>>>>>
>>>>> - system-dpdk: Ignore net/ice error log about QinQ offloading.
>>>>>
>>>>> https://patchwork.ozlabs.org/project/openvswitch/list/?series=385259&state=*
>>>>>
>>>>> Signed-off-by: David Marchand <david.march...@redhat.com>
>>>>> ---
>>>>
>>>> Hi, Kevin, David, others.
>>>>
>>>
>>> Hi Ilya,
>>>
>>> Thanks for summarizing the options.
>>>
>>>> We need to make a decision on this patch as the proposed branching is
>>>> only one week away. As far as I understand, there is a problem with
>>>> Intel Virtual Function driver (iavf) that deadlocks OVS when VF is added.
>>>> The problem is described in https://bugs.dpdk.org/show_bug.cgi?id=1337
>>>> (not 1337 at all) and the commit that introduced the issue in DPDK is
>>>> known. To date, the issue is not fixed. The potential solution is
>>>> to revert the commit from DPDK, bringing back another issue fixed by
>>>> the aforementioned commit, though that issue seems less severe and, to my
>>>> knowledge, we didn't actually experience it in the past.
>>>>
>>>
>>> Agree, non-regression is always better.
>>>
>>>> There is also a situation around DPDK stable releases. Since these are
>>>> normally created after the next major release of DPDK is out, the time
>>>> gap between xx.11 and xx.11.1 is 5 months. Which is a lot, especially
>>>> for an LTS release, since projects are likely to migrate to new LTS
>>>> releases of DPDK and they are likely to discover bugs that need fixes
>>>> earlier than in 5 months.
>>>>
>>>
>>> Good feedback. The issue is the LTS follows the DPDK main release in
>>> order that fixes are applied in main branch and already have gone
>>> through some validation. But maybe there's a more limited xx.11.1
>>> version with fixes for reported issues that could be released etc. It's
>>> something that would need more discussion.
>>>
>>> I think it's better to address the current issue and possible future
>>> workflow changes separately as much as possible, as they might need
>>> different resolutions and the thread could get a bit overloaded. I've
>>> just commented on the current issue below for now.
>>
>> OK. That makes sense. It's hard to solve long term issues in
>> a time scramble.
>>
>>>
>>>> With that said, we have a few options for the current patch:
>>>>
>>>> 0. Accept the patch and do nothing about the issue. Clearly not a good
>>>>    option. The argument can be made that the problem was also
>>>>    backported to stable DPDK 21.11.5 and 22.11.something, so older OVS
>>>>    releases are also affected, i.e. it's kind of not a problem for 3.3
>>>>    release of OVS in particular. However, for older releases the users
>>>>    can choose to fall back to older stable releases of DPDK. With a
>>>>    major version upgrade we are going to introduce breaking changes,
>>>>    and there is nowhere to fall back, since going back to 22.11 will
>>>>    break features for certain drivers even if DPDK API/ABI that we
>>>>    use would have been compatible.
>>>>
>>>
>>> I have reverted the patch that introduced the issue for 21.11.6.
>>> Hopefully we can do the same for 22.11.4, and we will have those
>>> releases shortly to cover the branches using those LTS's.
>>>
>>>> 1. Accept the patch and document that users will need to revert a
>>>>    particular DPDK commit, if they are planning to use VFs on Intel NICs.
>>>>    And upgrade to 23.11.1 as soon as it is available, assuming the issue
>>>>    will be fixed there.
>>>>
>>>>    This is not a very user-friendly option. And it is not clear if
>>>>    distributions will do that. Also, it's a one-off solution that we may
>>>>    have to repeat every year. And it might not be possible for other
>>>>    types of issues we may encounter in the future. Also, users will
>>>>    have zero validation for the changes they make in DPDK.
>>>>
>>>> 2. Check if DPDK can make a one-off stable release of 23.11.1 with just this
>>>>    patch reverted or the fix implemented. If this can be done before OVS
>>>>    release in mid February, that might be acceptable.
>>>>
>>>>    This will likely mean skipping some validation steps on the DPDK release
>>>>    side, so not ideal. However, it is better than asking users to revert
>>>>    this patch themselves as they will have zero validation this way.
>>>>    This also doesn't address the bigger problem with DPDK stable release
>>>>    cadence and making one-off releases every year doesn't sound right.
>>>>
>>>
>>> Quite similar, but I guess 1 is more of an inconvenience for the user to
>>> have to revert that patch themselves, especially if they are just using
>>> the tar file.
>>>
>>> I'm not sure if it's Luca who is going to maintain 23.11 LTS, but if
>>> he's not available I would be prepared to make a 23.11.1 release with a
>>> revert for that issue *if* it's confirmed and agreed by Intel devs.
>>
>> I see that there were no replies to your questions in the BZ. Should the
>> revert patch for the main branch be posted to dpdk-dev?
>>
>
> The code has diverged between main and stable branches and the patch
> that triggered the issue was a "fix" for another issue, so they may want
> to take a different approach on main branch, or debate about API usage
> and integration.
>
>>>
>>>> 3. Postpone 23.11 to OVS 3.4 and likely just move DPDK upgrades to summer
>>>>    releases of OVS.
>>>>
>>>>    This should address the release cadence problem, since we'll have at
>>>>    least one stable release of DPDK before moving to a new major version,
>>>>    giving us time to test and report issues. Upgrading to .1 stable versions
>>>>    instead of unstable ones seems like a good idea for software in general.
>>>>    Obvious downside for this approach is an even longer time for new DPDK
>>>>    features to be available for OVS users.
>>>>
>>>
>>> A couple of downsides wrt doing this for current issue:
>>> - Possibly users of other DPDK drivers want to use the updated versions
>>>   in DPDK 23.11
>>> - Some users may have already planned updating to a common DPDK
>>>   with/without OVS to 23.11 based on what has been the standard workflow
>>>   over last few years
>>> - 22.11 will EoL a year before 23.11 so it may mean a user using OVS 3.3
>>>   faces more time with an unmaintained DPDK LTS at the backend of their usage
>>
>> Good point. OVS LTS support is already longer (3 years) than DPDK's (2 years)
>> and moving adoption of new DPDK LTS releases to summer releases will make
>> the difference even larger, since DPDK versions they are using will last only
>> for 10 months. This is not ideal, but we don't have a lot of options, unless
>> the options 4 or 5 are happening.
>>
>
> We have been doing 3 years maintenance on last few DPDK LTS releases as
> a trial and it has gone well. We didn't want to update docs and then
> break a promise in a year or two's time, but I think at this point we
> can update the docs to officially state 3 years maintenance.
>
> So the overlap maintenance time is better, but the point about the extra
> year still holds, just a bit later on.

At least, 3 years of DPDK LTS support will mean about 2 years support for OVS,
i.e. the new OVS LTS candidate will be about to be released when support ends.
Much better than 10 months.

>
>>>
>>>> Note: Moving release dates for major releases of OVS or DPDK doesn't sound
>>>> right and may create more issues than it solves due to release time alignments
>>>> with major consumers like OVN, distributions and cluster management systems.
>>>> So, not suggesting that.
>>>>
>>>> <rant>
>>>> 4. Revisiting the stable release policy for DPDK LTS releases might be a good
>>>>    thing though, since 5 months is an unreasonably long time for a fresh
>>>>    release to not receive any bug fixes. This time gap is also larger than a
>>>>    time gap between two stable releases of the same series, i.e. time between
>>>>    xx.11.1 and xx.11.2 is less than time between xx.11 and xx.11.1, which
>>>>    doesn't make a lot of sense.
>>>>
>>>>    I understand a position of DPDK project to not incorporate testing of
>>>>    external applications into their release process, since it can't possibly
>>>>    test with every application. However, application developers can't possibly
>>>>    test every DPDK driver on their own, because upstream communities like OVS
>>>>    simply don't have hardware/infrastructure to do so. And there is a clear
>>>>    gap in testing and validation on DPDK side, i.e. validation performed by
>>>>    DPDK project alone is not sufficient. That means that bugs are inevitable
>>>>    and fresh releases of DPDK will contain bugs making them unusable for some
>>>>    applications. Hence the need for faster process for .1 releases. E.g.
>>>>    have xx.11.1 release in the end of January / start of February would be
>>>>    fine. Though the timing with different holidays around the world is not
>>>>    good.
>>>>
>>>>    This option is just a little more sustainable option 2 as it will involve
>>>>    proper validation on DPDK side. But again it's not OVS' call to make.
>>>>
>>>> 5. Have bug-free DPDK right out the gate :D. This is obviously not happening
>>>>    unless OVS is tightly integrated into DPDK testing and validation and all
>>>>    the issues are caught before new version of DPDK is released.
>>>> </rant>
>>>>
>>>> I think, option 0 is a no-go. To resolve the current issue at hand for OVS
>>>> 3.3 we could go with 1, 2 or 3. Though 2 is not OVS' call to make. Long
>>>> term solutions are 3 or 4, as 1 and 2 require solving this problem every year,
>>>> depending on us having problems with a new release or not. 5 doesn't seem
>>>> like a possible solution at the moment for various reasons.
>>>>
>>>> Thoughts?
>>>>
>>>
>>> My preference would be 2, as it's the least amount of headaches and
>>> change for users.
>>
>> 2 does sound like the best short term option, I agree. Though it is also
>> the one we (OVS community) have the least control over. We're waiting for
>> iavf maintainers to confirm the issue and then we're relying on 23.11.1
>> release to be made and be made on time. So, the option is getting less
>> viable each day.
>>
>
> I'm coming to the conclusion that there may not be a quick solution on
> DPDK side to allow for option 2, and it's probably best to just go with
> option 1 at this point.
>
> That will allow more time before OVS 3.4/DPDK 23.11.1 to debate the API
> and how and where is best way to fix it.

OK. I think today we have no real choice but to go with the option 1.
We'll need a NEWS entry for that in the patch. I'll make sure to include a
variant of it in the release announcement in February if nothing changes until
then. But I think we should still pursue the option 2 in case the solution
will be found before the final release in February. Though if there will be
no conclusion on the long term problem until autumn, we should go with 3 and
move 24.11 adoption to summer of 2025.
And follow that strategy going forward, as the current approach is not
sustainable.

>
> David, let us know if you agree? If so, maybe you can send a new
> version of the patch with the added documentation. I can help with docs
> or discussing further.

David, could you, please, add a note in the NEWS file and send a new version
of the patch?

>
>>>
>>> thanks,
>>> Kevin.
>>>
>>>> We need to make a decision on this by the end of this week.
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>
>>
>