On 1/10/24 19:35, Kevin Traynor wrote:
> +cc some other people who may be interested in OVS upgrading DPDK
> version.
> 
> On 10/01/2024 16:52, Ilya Maximets wrote:
>> On 12/13/23 14:06, David Marchand wrote:
>>> This commit adds support for DPDK v23.11.
>>> It updates the CI script and documentation and includes the following
>>> changes coming from the dpdk-latest branch:
>>>
>>> - sparse: Add some compiler intrinsics for DPDK build.
>>>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=371129&state=*
>>>
>>> - ci: Cache DPDK installed libraries only.
>>> - ci: Reduce optional libraries in DPDK.
>>>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=383367&state=*
>>>
>>> - system-dpdk: Ignore net/ice error log about QinQ offloading.
>>>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=385259&state=*
>>>
>>> Signed-off-by: David Marchand <david.march...@redhat.com>
>>> ---
>>
>> Hi, Kevin, David, others.
>>
> 
> Hi Ilya,
> 
> Thanks for summarizing the options.
> 
>> We need to make a decision on this patch as the proposed branching is
>> only one week away.  As far as I understand, there is a problem with
>> Intel Virtual Function driver (iavf) that deadlocks OVS when VF is added.
>> The problem is described in https://bugs.dpdk.org/show_bug.cgi?id=1337
>> (not 1337 at all) and the commit that introduced the issue in DPDK is
>> known.  To date, the issue is not fixed.  The potential solution is
>> to revert the commit from DPDK, bringing back another issue fixed by
>> aforementioned commit, though that issue seems less severe and, to my
>> knowledge, we didn't actually experience it in the past.
>>
> 
> Agree, non-regression is always better.
> 
>> There is also a situation around DPDK stable releases.  Since these are
>> normally created after the next major release of DPDK is out, the time
>> gap between xx.11 and xx.11.1 is 5 months.  Which is a lot, especially
>> for an LTS release, since projects are likely to migrate to new LTS
>> releases of DPDK and they are likely to discover bugs that need fixes
>> earlier than in 5 months.
>>
> 
> Good feedback. The issue is that the LTS follows the DPDK main release so
> that fixes are applied in the main branch first and have already gone
> through some validation. But maybe a more limited xx.11.1 version with
> fixes for reported issues could be released. It's
> something that would need more discussion.
> 
> I think it's better to address the current issue and possible future
> workflow changes separately as much as possible, as they might need
> different resolutions and the thread could get a bit overloaded. I've
> just commented on the current issue below for now.

OK.  That makes sense.  It's hard to solve long term issues in
a time scramble.

> 
>> With that said, we have a few options for the current patch:
>>
>> 0. Accept the patch and do nothing about the issue.  Clearly not a good
>>    option.  The argument can be made that the problem was also
>>    backported to stable DPDK 21.11.5 and 22.11.something, so older OVS
>>    releases are also affected, i.e. it's kind of not a problem for 3.3
>>    release of OVS in particular.  However, for older releases the users
>>    can choose to fall back to older stable releases of DPDK.  With a
>>    major version upgrade we are going to introduce breaking changes,
>>    and there is nowhere to fall back, since going back to 22.11 will
>>    break features for certain drivers even if DPDK API/ABI that we
>>    use would have been compatible.
>>
> 
> I have reverted the patch that introduced the issue for 21.11.6.
> Hopefully we can do the same for 22.11.4, and we will have those
> releases shortly to cover the branches using those LTS's.
> 
>> 1. Accept the patch and document that users will need to revert a
>>    particular DPDK commit, if they are planning to use VFs on Intel NICs.
>>    And upgrade to 23.11.1 as soon as it is available, assuming the issue
>>    will be fixed there.
>>
>>    This is not a very user-friendly option.  And it is not clear if
>>    distributions will do that.  Also, it's a one-off solution that we may
>>    have to repeat every year.  And it might not be possible for other
>>    types of issues we may encounter in the future.  Also, users will
>>    have zero validation for the changes they make in DPDK.
>>
>> 2. Check if DPDK can make a one-off stable release of 23.11.1 with just this
>>    patch reverted or the fix implemented.  If this can be done before OVS
>>    release in mid February, that might be acceptable.
>>
>>    This will likely mean skipping some validation steps on the DPDK release
>>    side, so not ideal.  However, it is better than asking users to revert
>>    this patch themselves as they will have zero validation this way.
>>    This also doesn't address the bigger problem with DPDK stable release
>>    cadence and making one-off releases every year doesn't sound right.
>>
> 
> Quite similar, but I guess 1 is more of an inconvenience for the user to
> have to revert that patch themselves, especially if they are just using
> the tar file.
> 
> I'm not sure if it's Luca who is going to maintain 23.11 LTS, but if
> he's not available I would be prepared to make a 23.11.1 release with a
> revert for that issue *if* it's confirmed and agreed by Intel devs.

I see that there were no replies to your questions in the BZ.  Should the
revert patch for the main branch be posted to dpdk-dev?

> 
>> 3. Postpone 23.11 to OVS 3.4 and likely just move DPDK upgrades to summer
>>    releases of OVS.
>>
>>    This should address the release cadence problem, since we'll have at
>>    least one stable release of DPDK before moving to a new major version,
>>    giving us time to test and report issues.  Upgrading to .1 stable versions
>>    instead of unstable ones seems like a good idea for software in general.
>>    Obvious downside for this approach is an even longer time for new DPDK
>>    features to be available for OVS users.
>>
> 
> A couple of downsides wrt doing this for current issue:
> - Possibly users of other DPDK drivers want to use the updated versions
> in DPDK 23.11
> - Some users may have already planned updating to a common DPDK
> with/without OVS to 23.11 based on what has been the standard workflow
> over last few years
> - 22.11 will EoL a year before 23.11 so it may mean a user using OVS 3.3
> faces more time with an unmaintained DPDK LTS at the backend of their usage

Good point.  OVS LTS support is already longer (3 years) than DPDK's (2 years)
and moving adoption of new DPDK LTS releases to summer releases will make
the difference even larger, since DPDK versions they are using will last only
for 10 months.  This is not ideal, but we don't have a lot of options, unless
options 4 or 5 happen.

> 
>> Note: Moving release dates for major releases of OVS or DPDK doesn't sound
>> right and may create more issues than it solves due to release time
>> alignments with major consumers like OVN, distributions and cluster
>> management systems.  So, not suggesting that.
>>
>> <rant>
>> 4. Revisiting the stable release policy for DPDK LTS releases might be a good
>>    thing though, since 5 months is an unreasonably long time for a fresh
>>    release to not receive any bug fixes. This time gap is also larger than a
>>    time gap between two stable releases of the same series, i.e. time between
>>    xx.11.1 and xx.11.2 is less than time between xx.11 and xx.11.1, which
>>    doesn't make a lot of sense.
>>
>>    I understand the position of the DPDK project to not incorporate testing
>>    of external applications into their release process, since it can't
>>    possibly test with every application.  However, application developers
>>    can't possibly test every DPDK driver on their own, because upstream
>>    communities like OVS simply don't have hardware/infrastructure to do so.
>>    And there is a clear
>>    gap in testing and validation on DPDK side, i.e. validation performed by
>>    DPDK project alone is not sufficient.  That means that bugs are inevitable
>>    and fresh releases of DPDK will contain bugs making them unusable for some
>>    applications.  Hence the need for a faster process for .1 releases.  E.g.
>>    having the xx.11.1 release at the end of January / start of February
>>    would be fine.  Though the timing with different holidays around the
>>    world is not good.
>>
>>    This option is just a slightly more sustainable version of option 2, as
>>    it will involve proper validation on the DPDK side.  But again it's not
>>    OVS' call to make.
>>
>> 5. Have bug-free DPDK right out of the gate :D.  This is obviously not
>>    happening unless OVS is tightly integrated into DPDK testing and
>>    validation and all the issues are caught before a new version of DPDK is
>>    released.
>> </rant>
>>
>> I think, option 0 is a no-go.  To resolve the current issue at hand for OVS
>> 3.3 we could go with 1, 2 or 3.  Though 2 is not OVS' call to make.  Long
>> term solutions are 3 or 4, as 1 and 2 require solving this problem every
>> year, depending on us having problems with a new release or not.  5 doesn't
>> seem like a possible solution at the moment for various reasons.
>>
>> Thoughts?
>>
> 
> My preference would be 2, as it's the least amount of headaches and
> change for users.

2 does sound like the best short term option, I agree.  Though it is also
the one we (OVS community) have the least control over.  We're waiting for
iavf maintainers to confirm the issue and then we're relying on 23.11.1
release to be made and be made on time.  So, the option is getting less
viable each day.

> 
> thanks,
> Kevin.
> 
>> We need to make a decision on this by the end of this week.
>>
>> Best regards, Ilya Maximets.
>>
> 
