Hi @kleber-souza, @chengendu

Thanks for your attention.

Allow me to give my view of the impact of fixing "bug" LP: #2003053.

The original patchset introduced *two* regressions. One (an NFS deadlock)
hit everybody and was fixed by #2009325. The remaining one is now
hitting those of us who spawn new user processes frequently: each new
process gets a new "login time", and the access cache is zapped. As a
result we are looking at a 300-400% increase in *overall* NFS operations,
which makes the current kernels unusable for production. We do not have
that kind of headroom on our NFS servers.
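To make the cost mechanism concrete, here is a toy model in Python. It is a
sketch under simplifying assumptions, not the kernel's code. Each
(inode, uid) access-cache entry records when it was filled. With
`zap_on_login` enabled, an entry older than the caller's login time is
discarded and an ACCESS call is sent again. The class `AccessCacheModel`,
the function `simulate` and the workload parameters are hypothetical. The
real client also revalidates on attribute changes, and its invalidation
granularity may differ.

```python
from dataclasses import dataclass, field


@dataclass
class AccessCacheModel:
    """Hypothetical, simplified model of an NFS client access cache."""
    zap_on_login: bool
    entries: dict = field(default_factory=dict)  # (inode, uid) -> fill time
    access_rpcs: int = 0  # ACCESS calls "sent" to the server

    def permission(self, inode: int, uid: int, now: int, login_time: int) -> None:
        key = (inode, uid)
        filled_at = self.entries.get(key)
        if filled_at is not None and self.zap_on_login and filled_at < login_time:
            # Entry predates the caller's login time: discard it.
            del self.entries[key]
            filled_at = None
        if filled_at is None:
            # Cache miss: the client must ask the server with an ACCESS call.
            self.access_rpcs += 1
            self.entries[key] = now


def simulate(zap_on_login: bool, sessions: int, files: int) -> int:
    """Count ACCESS calls when `sessions` new logins each touch `files` inodes."""
    cache = AccessCacheModel(zap_on_login=zap_on_login)
    now = 0
    for _ in range(sessions):
        now += 1
        login_time = now  # every new session ("su", new job, ...) has a newer login time
        for inode in range(files):
            now += 1
            cache.permission(inode, uid=1000, now=now, login_time=login_time)
    return cache.access_rpcs


if __name__ == "__main__":
    for zap in (False, True):
        print(f"zap_on_login={zap}: {simulate(zap, sessions=4, files=100)} ACCESS calls")
```

Consider 4 sessions that touch the same 100 files. Without the login-time check, the model sends 100 ACCESS calls. With the check, it sends 400, because every new login time invalidates every entry. These numbers come from the toy workload. They are not measurements from our servers.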

As a result, we are stuck with kernels that predate the #2003053 fixes.
With recent CVE fixes in the current kernels, we have also had to resort
to building our own kernels. This is very counter-productive.

I understand the use case for the changes that went into "bug"
#2003053. I call this a "bug" (in quotes) because the behaviour has been
around for more than 15 years. Age alone is not a qualifier, but it does
mean this has been accepted behaviour for that long. Furthermore, #2003053
only matters in environments where the NFS server knows about users and
their secondary groups and validates them for ACCESS calls (ours don't).

From the original upstream commit message
0eb43812c0270ee3d005ff32f91f7d0a6c4943af: "While it is reasonable to
expect that such group membership changes are rare, and that we do not
want to optimise the cache to accommodate them, it is also not
unreasonable for the user to expect that if they log out and log back in
again, that the staleness would clear up".

It is clear that a trade-off was considered. However, the use case was a
"user" (an interactive person), not a service of any kind. I am quite
certain that if a use case with a 3-4x increase in NFS operations had
been considered, this change would not have gone in the way it did.

I understand that there are sometimes strong reasons to cherry-pick
changes from upstream, or to make your own changes. IMHO, the use case
for #2003053 was not strong enough to justify that.

The regression risk for #2003053 was assessed as low because the changes
came from upstream. We now know this was not the case.

Given that knowledge, and weighing it against the weak use case the
changes were trying to address, the right decision would have been to
revert them.

The suggested upstream change, which introduces a mount option to address
this, should be turned around. The option should be added for those who
want their access caches zapped and re-validated on re-login, and the
default behaviour should stay as it was.
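Here is a sketch of the proposed inversion. It assumes a hypothetical mount
option named `zap_access_on_login`, which is not an existing nfs(5) option.
The login-time check stays off unless that flag is given explicitly. The
helper `zap_on_login_enabled` is also hypothetical. In this sketch, the last
occurrence of the flag or its `no` form wins.

```python
# "zap_access_on_login" is a HYPOTHETICAL option name used only for
# illustration; it is not an existing nfs(5) mount option.
OPT_IN_FLAG = "zap_access_on_login"


def zap_on_login_enabled(mount_options: str) -> bool:
    """Return True only if the opt-in flag is set; default keeps the old behaviour."""
    enabled = False  # proposed default: do NOT clear the access cache on login
    for opt in mount_options.split(","):
        opt = opt.strip()
        if opt == OPT_IN_FLAG:
            enabled = True
        elif opt == "no" + OPT_IN_FLAG:
            enabled = False
    return enabled
```

An existing mount line such as `rw,vers=4.2,hard` would keep today's
behaviour. Only sites that ask for the re-login semantics would pay the
extra ACCESS traffic.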

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2015827

Title:
  NFS performance issue while clearing the file access cache upon login

Status in linux package in Ubuntu:
  In Progress

Bug description:
  The performance issue that has been observed may be attributed to an
  increase in NFS ACCESS operations, possibly due to a new mechanism
  introduced on the NFS client side in Linux 6.2-rc3.
  This mechanism clears the access cache as soon as the cache timestamp
  becomes older than the user's login time. Its primary objective is to
  prevent the NFS client's access cache from becoming stale due to changes
  made to the user's group membership on the server after the user has
  already logged in on the client.

  It's worth noting that POSIX only refreshes the user's supplementary
  group information upon login.
  Upstream took into consideration that users may reasonably expect the
  access cache to be cleared when they log out and log back in again, so
  that any staleness clears up after logging back in.

  The performance overhead can be particularly noticeable when
  applications or users switch to other privileged users via commands
  such as "su" to operate on NFS-mounted folders.
  In such cases, the privileged user's login time is renewed, and NFS
  ACCESS operations need to be re-sent, potentially leading to
  performance degradation.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2015827/+subscriptions