On Wed, Aug 20, 2025 at 04:26:14PM +0300, Timo Teras wrote:
> On Wed, 20 Aug 2025 15:38:12 +0300
> "Lifshits, Vitaly" <[email protected]> wrote:
> 
> > On 8/20/2025 9:57 AM, Timo Teras wrote:
> > 
> > >>>
> > >>> Thanks for adding this!
> > >>>
> > >>> However, as a user, I find it inconvenient if the default setting
> > >>> results in a subtly broken system on a device I just bought from a store.
> > >>>
> > >>> Since this affects devices from multiple large vendors, would it
> > >>> be possible to add some kind of quirk mechanism to automatically
> > >>> enable this on known "bad" systems. Perhaps something based on
> > >>> the DMI or other system specific information. Could something
> > >>> like this be implemented?
> > >>>
> > >>> At least in my use case I have multiple e1000e-based laptops
> > >>> working with the same link partner, and only one broken device,
> > >>> for which I reported this issue. So, in my experience, the issue
> > >>> relates primarily to the specific system (perhaps also requiring
> > >>> a specific link partner for the issue to show up).
> > >>
> > >> Unfortunately, there is no visible configuration that allows the
> > >> driver to reliably identify problematic systems.
> > >> If in the future we find such data, then we can improve the
> > >> workaround and make it automatic.
> > >>
> > >> At present, the user-controlled interface is the best we have.  
> > > 
> > > Could you look at:
> > >   - drivers/hid/i2c-hid/i2c-hid-dmi-quirks.c
> > >   - drivers/soundwire/dmi-quirks.c
> > > 
> > > These use dmi_first_match() to match the DMI information of the
> > > system and then apply quirks based on the matching per-system data.
> > > 
> > > Having similar mechanism in e1000e should be possible, right?
> > > 
> > > I am happy to provide the needed DMI information from my system if
> > > this works out.
> > > 
> > > Timo  
> > 
> > Hi Timo,
> > 
> > At the moment, we have no clear knowledge as to which systems may be 
> > affected, and what common characteristics they share.
> > We are working with vendors to try to narrow it down.
> > You are most welcome to share DMI information from your system. It
> > can help with further investigation.
> > 
> > However, maintaining a DMI quirk for every single system for which an 
> > issue has been reported is not feasible. Trying to deduce a pattern
> > from a handful of data points can lead to it being too broad or too
> > narrow. Furthermore, it may set up expectations of updating the quirk
> > every time another user comes and says 'your default setting does not
> > work for me'. This can quickly escalate out of control, and generally
> > seems like the wrong approach.
> > 
> > Ultimately, vendors are best positioned to manage this, as they know 
> > which of their systems require this parameter. If a list were to be 
> > maintained, I’d suggest something similar to what Mario proposed for 
> > Dell platforms a few years ago for a different issue:
> > https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/
> > 
> > For now, I prefer not to delay the current patch, acknowledging that 
> > finding a better solution may take time.
> 
> Thank you for the continued investigation on the issue!
> 
> But I do not think this commit fixes the reported regression. Nothing
> changes without additional admin/user action. Things used to work, and
> the added/modified K1 support is causing a regression.
> 
> Ubuntu has already reverted the offending patch due to complaints in
> some flavors:
>  https://patchwork.ozlabs.org/project/ubuntu-kernel/patch/[email protected]/
>  https://bugs.launchpad.net/bugs/2115393
>  https://www.mail-archive.com/[email protected]/msg551129.html

Qubes OS also has this change reverted in its default kernel, for the
same reason:
https://github.com/QubesOS/qubes-issues/issues/9896
https://github.com/QubesOS/qubes-linux-kernel/commit/4fb8c96dd7bd73dda00a89d026b6ebefff939a67

We've received several reports of the regression caused by the commit
"e1000e: change k1 configuration on MTP and later platforms", and _no_
complaints after reverting it. And we do have many users on MTL or newer.

> This is what I ended up also doing as it reliably fixes things on every
> model I have, and has not caused any of them to have any other issues
> (including packet loss).
> 
> At least mainstream Dell Pro and HP Zbook laptops have been reported to
> be broken. See:
>  https://lists.openwall.net/netdev/2025/07/01/57
>  https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20250623/048860.html
> 
> This seems to be the same issue:
>  https://bugzilla.kernel.org/show_bug.cgi?id=218642
> 
> So some questions at this point:
> 
> If the added K1 configuration does not work and causes regressions,
> could it be reverted and added back once a K1 configuration change that
> can determine the affected systems is ready?
> 
> Could you explain the commit "e1000e: change k1 configuration on MTP
> and later platforms" in more detail? What does it fix? My understanding
> is that it is "minor packet loss that may affect some machines"?
> 
> How many machines / what kind of scenario is affected? Is it fixing a
> more serious issue than the regression it is causing?
> The regression is completely non-functional Ethernet after unplugging
> the cable.
> 
> My understanding is that the K1 change affects only power consumption.
> Is this right? How much is the consumption difference? Would it rather
> make sense to disable K1 by default on the potentially affected mac/phy
> versions until a good common denominator is found?

Given the severity of the regression, I'd suggest something like the
above. Have a functional configuration by default, and have an option to
potentially improve power consumption. Once the criteria for safely
enabling it by default are figured out, it's fine to apply the
improvement by default. But I'd rather have users with functional
Ethernet than a slight power (or performance?) improvement at the cost
of completely breaking it for others...

> On the other hand, do you think that asking for a list of the few
> currently known affected machines (until a simpler common denominator
> can be found) is too unreasonable? If the list seems to grow much, it
> would be an indication that the default setting is wrong and changing
> the defaults might be a good idea.

Let me know what info you'd need for such a list.
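
As a starting point, here is a minimal userspace sketch (Python) of how
the identifiers for such a list could be gathered and matched. It
assumes the standard sysfs location /sys/class/dmi/id. DMI_FIELDS,
read_dmi() and first_match() are names made up for illustration, and the
substring comparison only loosely mimics what DMI_MATCH() entries do in
the kernel tables used with dmi_first_match(). It is not a proposal for
the actual e1000e code.

```python
from pathlib import Path

# sysfs attributes corresponding to DMI fields that kernel quirk tables
# commonly match on (DMI_SYS_VENDOR, DMI_PRODUCT_NAME, ...).
# The selection here is an assumption, not a requirement from the driver.
DMI_FIELDS = (
    "sys_vendor",
    "product_name",
    "product_version",
    "board_vendor",
    "board_name",
    "bios_version",
)


def read_dmi(base="/sys/class/dmi/id"):
    """Return the readable DMI identification strings as a dict."""
    info = {}
    for field in DMI_FIELDS:
        try:
            info[field] = (Path(base) / field).read_text().strip()
        except OSError:
            # Attribute missing or not readable; skip it.
            continue
    return info


def first_match(info, quirks):
    """Return the first quirk entry whose strings all occur in info.

    Each entry is a dict with a "match" mapping of field -> substring.
    Hypothetical helper: it models substring matching only.
    """
    for quirk in quirks:
        wanted = quirk["match"]
        if all(text in info.get(field, "") for field, text in wanted.items()):
            return quirk
    return None
```

Running read_dmi() on an affected machine and posting the resulting
fields should be enough to write one table entry per system. Whether
vendor plus product name is specific enough is an open question.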

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

