Hi @kleber-souza, @chengendu,

Thanks for your attention.
Allow me to give my perception of the impact of fixing "bug" LP: #2003053.

The original patchset introduced *two* regressions. The first (an NFS deadlock) hit everybody and was fixed by #2009325. The remaining one is now hitting those of us who spawn new user processes frequently, which causes new "login times" to be created and the access cache to be zapped. As a result we are looking at a 300-400% increase in *overall* NFS operations, making the current kernels unusable for production. We do not have that kind of headroom on our NFS servers.

The result is that we are simply stuck with kernels prior to the #2003053 fixes. With the recent CVE fixes in the current kernel, we have now also resorted to building our own kernels. This is very counter-productive.

I understand the use case for the changes that went into "bug" #2003053. The reason I call this a "bug" (in quotes) is that the behaviour has been around for more than 15 years. While age alone is not a qualifier, I am just saying that this has been accepted behaviour for that long. Furthermore, #2003053 only applies in environments where the NFS server has knowledge of users and their secondary groups and validates them for ACCESS calls (ours don't).

From the original upstream commit message, 0eb43812c0270ee3d005ff32f91f7d0a6c4943af:

"While it is reasonable to expect that such group membership changes are rare, and that we do not want to optimise the cache to accommodate them, it is also not unreasonable for the user to expect that if they log out and log back in again, that the staleness would clear up".

It is clear that a trade-off was considered, but the use case in mind was a "user" (a physical, interactive person), not a service of any kind. I am quite certain that, had a use case with a 3-4x increase in NFS ops been considered, this would not have gone in the way it did.

I understand that there are sometimes strong reasons to cherry-pick changes from upstream, or to make your own changes.
IMHO, I do not think the use case for #2003053 was strong enough to justify that. The regression risk for #2003053 was assessed as low because the changes came from upstream. We now know this was not the case. With that knowledge, and comparing it to the weak use case the changes were trying to address, reverting the changes should have been the right decision.

The suggested upstream change to introduce a mount option to address this should be turned around: the option should be added for those wanting to zap/re-validate their access caches on re-login, leaving the default behaviour as it was.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2015827

Title: NFS performance issue while clearing the file access cache upon login

Status in linux package in Ubuntu: In Progress

Bug description:
The performance issue that has been observed may be attributed to an increase in NFS ACCESS operations, possibly due to a new mechanism introduced in the Linux 6.2-rc3 NFS client side. This mechanism clears the access cache as soon as the cache timestamp becomes older than the user's login time, with the primary objective of preventing the NFS client's access cache from becoming stale due to any changes made to the user's group membership on the server after the user has already logged in on the client. It's worth noting that POSIX only refreshes the user's supplementary group information upon login. Upstream has taken into consideration that users may reasonably expect the access cache to be cleared when they log out and log back in again, with all behavior returning to normal after the replacement. The performance overhead can be particularly noticeable when applications or users switch to other privileged users via commands such as "su" to operate on NFS-mounted folders.
In such cases, the privileged user's login time will be renewed, and NFS ACCESS operations will need to be re-sent, potentially leading to performance degradation.
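
To illustrate the mechanism described above, here is a minimal sketch in Python. It is a toy model and not the kernel code: `AccessCache`, `simulate`, the logical clock, and the session and path counts are all hypothetical names and values chosen for illustration. It only assumes what the description states, namely that a cached entry is discarded when its timestamp is older than the caller's login time. The counts it produces are arbitrary and are not the measured 300-400% figure.

```python
from dataclasses import dataclass, field


@dataclass
class AccessCache:
    """Toy model of a client-side access cache keyed by (uid, path)."""
    check_login_time: bool                       # True models the new behaviour
    entries: dict = field(default_factory=dict)  # (uid, path) -> time cached
    access_rpcs: int = 0                         # simulated ACCESS calls sent

    def access(self, uid, path, now, login_time):
        cached_at = self.entries.get((uid, path))
        # Assumption: an entry older than the login time counts as stale.
        stale = cached_at is None or (
            self.check_login_time and cached_at < login_time
        )
        if stale:
            self.access_rpcs += 1                # would go to the server
            self.entries[(uid, path)] = now


def simulate(check_login_time, sessions=5, paths=("a", "b", "c"), uid=1000):
    """Count simulated ACCESS calls over repeated short-lived sessions."""
    cache = AccessCache(check_login_time)
    clock = 0
    for _ in range(sessions):
        clock += 1
        login_time = clock                       # e.g. a fresh `su` session
        for path in paths:
            for _ in range(2):                   # each path touched twice
                clock += 1
                cache.access(uid, path, clock, login_time)
    return cache.access_rpcs


if __name__ == "__main__":
    print("without login-time check:", simulate(False))
    print("with login-time check:   ", simulate(True))
```

In this model the number of simulated ACCESS calls stays constant without the login-time check, but grows with the number of sessions once the check is enabled, since every new session starts with an effectively empty cache.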