On 1/12/24 14:39, Kevin Traynor wrote:
> On 11/01/2024 15:44, Ilya Maximets wrote:
>> On 1/10/24 19:35, Kevin Traynor wrote:
>>> +cc some other people who may be interested in OVS upgrading DPDK
>>> version.
>>>
>>> On 10/01/2024 16:52, Ilya Maximets wrote:
>>>> On 12/13/23 14:06, David Marchand wrote:
>>>>> This commit adds support for DPDK v23.11.
>>>>> It updates the CI script and documentation and includes the following
>>>>> changes coming from the dpdk-latest branch:
>>>>>
>>>>> - sparse: Add some compiler intrinsics for DPDK build.
>>>>>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=371129&state=*
>>>>>
>>>>> - ci: Cache DPDK installed libraries only.
>>>>> - ci: Reduce optional libraries in DPDK.
>>>>>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=383367&state=*
>>>>>
>>>>> - system-dpdk: Ignore net/ice error log about QinQ offloading.
>>>>>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=385259&state=*
>>>>>
>>>>> Signed-off-by: David Marchand <david.march...@redhat.com>
>>>>> ---
>>>>
>>>> Hi, Kevin, David, others.
>>>>
>>>
>>> Hi Ilya,
>>>
>>> Thanks for summarizing the options.
>>>
>>>> We need to make a decision on this patch as the proposed branching is
>>>> only one week away.  As far as I understand, there is a problem with
>>>> the Intel Virtual Function driver (iavf) that deadlocks OVS when a VF is added.
>>>> The problem is described in https://bugs.dpdk.org/show_bug.cgi?id=1337
>>>> (not 1337 at all) and the commit that introduced the issue in DPDK is
>>>> known.  To date, the issue is not fixed.  A potential solution is
>>>> to revert the commit from DPDK, bringing back another issue fixed by
>>>> the aforementioned commit, though that issue seems less severe and, to my
>>>> knowledge, we didn't actually experience it in the past.
>>>>
>>>
>>> Agree, non-regression is always better.
>>>
>>>> There is also an issue with the timing of DPDK stable releases.  Since
>>>> these are normally created after the next major release of DPDK is out,
>>>> the time gap between xx.11 and xx.11.1 is 5 months, which is a lot,
>>>> especially for an LTS release: projects are likely to migrate to new LTS
>>>> releases of DPDK and to discover bugs that need fixes well before 5
>>>> months have passed.
>>>>
>>>
>>> Good feedback. The issue is that the LTS follows the DPDK main release
>>> so that fixes are applied in the main branch first and have already gone
>>> through some validation. But maybe there could be a more limited xx.11.1
>>> version with fixes for reported issues that could be released, etc. It's
>>> something that would need more discussion.
>>>
>>> I think it's better to address the current issue and possible future
>>> workflow changes separately as much as possible, as they might need
>>> different resolutions and the thread could get a bit overloaded. I've
>>> just commented on the current issue below for now.
>>
>> OK.  That makes sense.  It's hard to solve long term issues in
>> a time scramble.
>>
>>>
>>>> With that said, we have a few options for the current patch:
>>>>
>>>> 0. Accept the patch and do nothing about the issue.  Clearly not a good
>>>>    option.  The argument can be made that the problem was also
>>>>    backported to stable DPDK 21.11.5 and 22.11.something, so older OVS
>>>>    releases are also affected, i.e. it's not really a problem specific
>>>>    to the 3.3 release of OVS.  However, for older releases, users
>>>>    can choose to fall back to older stable releases of DPDK.  With a
>>>>    major version upgrade we are going to introduce breaking changes,
>>>>    and there is nowhere to fall back, since going back to 22.11 will
>>>>    break features for certain drivers even if the DPDK API/ABI that we
>>>>    use were compatible.
>>>>
>>>
>>> I have reverted the patch that introduced the issue for 21.11.6.
>>> Hopefully we can do the same for 22.11.4, and we will have those
>>> releases shortly to cover the branches using those LTS releases.
>>>
>>>> 1. Accept the patch and document that users will need to revert a
>>>>    particular DPDK commit, if they are planning to use VFs on Intel NICs.
>>>>    And upgrade to 23.11.1 as soon as it is available, assuming the issue
>>>>    will be fixed there.
>>>>
>>>>    This is not a very user-friendly option.  And it is not clear if
>>>>    distributions will do that.  Also, it's a one-off solution that we may
>>>>    have to repeat every year.  And it might not be possible for other
>>>>    types of issues we may encounter in the future.  Also, users will
>>>>    have zero validation for the changes they make in DPDK.
>>>>
>>>> 2. Check if DPDK can make a one-off stable release of 23.11.1 with just
>>>>    this patch reverted or the fix implemented.  If this can be done
>>>>    before the OVS release in mid-February, that might be acceptable.
>>>>
>>>>    This will likely mean skipping some validation steps on the DPDK release
>>>>    side, so not ideal.  However, it is better than asking users to revert
>>>>    this patch themselves as they will have zero validation this way.
>>>>    This also doesn't address the bigger problem with the DPDK stable
>>>>    release cadence, and making one-off releases every year doesn't sound
>>>>    right.
>>>>
>>>
>>> Quite similar, but I guess 1 is more of an inconvenience for the user to
>>> have to revert that patch themselves, especially if they are just using
>>> the tar file.
>>>
>>> I'm not sure if it's Luca who is going to maintain 23.11 LTS, but if
>>> he's not available I would be prepared to make a 23.11.1 release with a
>>> revert for that issue *if* it's confirmed and agreed by Intel devs.
>>
>> I see that there were no replies to your questions in the BZ.  Should the
>> revert patch for the main branch be posted to dpdk-dev?
>>
> 
> The code has diverged between main and stable branches and the patch
> that triggered the issue was a "fix" for another issue, so they may want
> to take a different approach on the main branch, or debate API usage
> and integration.
> 
>>>
>>>> 3. Postpone 23.11 to OVS 3.4 and likely just move DPDK upgrades to summer
>>>>    releases of OVS.
>>>>
>>>>    This should address the release cadence problem, since we'll have at
>>>>    least one stable release of DPDK before moving to a new major version,
>>>>    giving us time to test and report issues.  Upgrading to .1 stable
>>>>    versions instead of unstable ones seems like a good idea for software
>>>>    in general.  The obvious downside of this approach is an even longer
>>>>    time for new DPDK features to be available for OVS users.
>>>>
>>>
>>> A couple of downsides with doing this for the current issue:
>>> - Users of other DPDK drivers may want to use the updated versions
>>> in DPDK 23.11
>>> - Some users may have already planned to update a common DPDK, with or
>>> without OVS, to 23.11 based on what has been the standard workflow
>>> over the last few years
>>> - 22.11 will reach EoL a year before 23.11, so a user of OVS 3.3 may
>>> face more time with an unmaintained DPDK LTS toward the end of their usage
>>
>> Good point.  OVS LTS support is already longer (3 years) than DPDK's
>> (2 years), and moving adoption of new DPDK LTS releases to summer releases
>> will make the difference even larger, since the DPDK versions they use will
>> remain supported for only 10 months.  This is not ideal, but we don't have
>> a lot of options, unless options 4 or 5 happen.
>>
> 
> We have been doing 3 years of maintenance on the last few DPDK LTS
> releases as a trial and it has gone well. We didn't want to update the
> docs and then break a promise in a year or two's time, but I think at this
> point we can update the docs to officially state 3 years of maintenance.
> 
> So the overlap in maintenance time is better, but the point about the extra
> year still holds, just a bit later on.

At least, 3 years of DPDK LTS support will mean about 2 years of support
for OVS, i.e. the next OVS LTS candidate will be about to be released
when support ends.  Much better than 10 months.

> 
>>>
>>>> Note: Moving release dates for major releases of OVS or DPDK doesn't
>>>> sound right and may create more issues than it solves due to release
>>>> time alignment with major consumers like OVN, distributions and cluster
>>>> management systems.  So, not suggesting that.
>>>>
>>>> <rant>
>>>> 4. Revisiting the stable release policy for DPDK LTS releases might be a
>>>>    good thing though, since 5 months is an unreasonably long time for a
>>>>    fresh release to go without any bug fixes.  This time gap is also
>>>>    larger than the gap between two stable releases of the same series,
>>>>    i.e. the time between xx.11.1 and xx.11.2 is less than the time
>>>>    between xx.11 and xx.11.1, which doesn't make a lot of sense.
>>>>
>>>>    I understand the DPDK project's position of not incorporating testing
>>>>    of external applications into its release process, since it can't
>>>>    possibly test with every application.  However, application developers
>>>>    can't possibly test every DPDK driver on their own, because upstream
>>>>    communities like OVS simply don't have the hardware/infrastructure to
>>>>    do so.  And there is a clear gap in testing and validation on the DPDK
>>>>    side, i.e. validation performed by the DPDK project alone is not
>>>>    sufficient.  That means that bugs are inevitable and fresh releases of
>>>>    DPDK will contain bugs that make them unusable for some applications.
>>>>    Hence the need for a faster process for .1 releases.  E.g. having an
>>>>    xx.11.1 release at the end of January / start of February would be
>>>>    fine, though the timing with various holidays around the world is not
>>>>    good.
>>>>
>>>>    This option is just a slightly more sustainable version of option 2,
>>>>    as it will involve proper validation on the DPDK side.  But again,
>>>>    it's not OVS' call to make.
>>>>
>>>> 5. Have bug-free DPDK right out of the gate :D.  This is obviously not
>>>>    happening unless OVS is tightly integrated into DPDK testing and
>>>>    validation and all the issues are caught before a new version of DPDK
>>>>    is released.
>>>> </rant>
>>>>
>>>> I think option 0 is a no-go.  To resolve the current issue at hand for
>>>> OVS 3.3, we could go with 1, 2 or 3.  Though 2 is not OVS' call to make.
>>>> Long-term solutions are 3 or 4, as 1 and 2 require solving this problem
>>>> every year, depending on whether we have problems with a new release or
>>>> not.  5 doesn't seem like a possible solution at the moment for various
>>>> reasons.
>>>>
>>>> Thoughts?
>>>>
>>>
>>> My preference would be 2, as it's the least amount of headaches and
>>> change for users.
>>
>> 2 does sound like the best short-term option, I agree.  Though it is also
>> the one we (the OVS community) have the least control over.  We're waiting for
>> iavf maintainers to confirm the issue and then we're relying on 23.11.1
>> release to be made and be made on time.  So, the option is getting less
>> viable each day.
>>
> 
> I'm coming to the conclusion that there may not be a quick solution on
> the DPDK side to allow for option 2, and it's probably best to just go with
> option 1 at this point.
> 
> That will allow more time before OVS 3.4/DPDK 23.11.1 to debate the API
> and how and where best to fix it.

OK.  I think today we have no real choice but to go with option 1.
We'll need a NEWS entry for that in the patch.  I'll make sure to include
a variant of it in the release announcement in February if nothing changes
before then.

But I think we should still pursue option 2 in case a solution is found
before the final release in February.

However, if there is no conclusion on the long-term problem by autumn,
we should go with option 3 and move 24.11 adoption to the summer of 2025,
and follow that strategy going forward, as the current approach is not
sustainable.

> 
> David, let us know if you agree? If so, maybe you can send a new
> version of the patch with the added documentation. I can help with docs
> or discussing further.

David, could you please add a note to the NEWS file and send a new
version of the patch?

> 
>>>
>>> thanks,
>>> Kevin.
>>>
>>>> We need to make a decision on this by the end of this week.
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>
>>
> 
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
