Hi all,

I am from Canonical's kernel team and am currently investigating this
issue. jammy-hwe, mantic-hwe, and noble all use the 6.8 kernel by
default (a generic jammy or mantic install uses the hwe version by
default). So the issue lies with the 6.8 kernel rather than with any
particular series.

I was not able to reproduce the error with the generic 6.8.0-45.45
kernel after 1 hour of stress testing. I am still working on this, and I
really appreciate all the feedback you have provided.


Meanwhile, for those who are hitting the problem, I have built an
unofficial version of the 6.8.0-45.45 kernel that includes the upstream
fix, commit 6ddc9deacc1312762c2edd9de00ce76b00f69f7c:
 - for jammy:
https://launchpad.net/~mehmetbasaran/+archive/ubuntu/linux-hwe-6.8-6.8.0-45.45-nfs-patch
 - for noble:
https://launchpad.net/~mehmetbasaran/+archive/ubuntu/linux-6.8.0-45.45-nfs-patch


Installation instructions:

Note that if you are using Secure Boot, you will not be able to boot
into these kernels, because the kernel images are unsigned. You will
need to disable Secure Boot first.
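
If you are not sure whether Secure Boot is on, something like the sketch
below may help. It assumes the mokutil tool is installed and that
"mokutil --sb-state" prints either "SecureBoot enabled" or "SecureBoot
disabled"; the helper name sb_state_from_output is only for illustration.

```shell
#!/bin/sh
# Classify the text printed by `mokutil --sb-state`.
# Prints "enabled", "disabled", or "unknown".
sb_state_from_output() {
    case "$1" in
        *"SecureBoot enabled"*)  echo enabled ;;
        *"SecureBoot disabled"*) echo disabled ;;
        *)                       echo unknown ;;
    esac
}

# mokutil may be missing, or the machine may not boot via EFI at all;
# both cases are reported as "unknown" rather than treated as an error.
if command -v mokutil >/dev/null 2>&1; then
    state=$(sb_state_from_output "$(mokutil --sb-state 2>&1)")
else
    state=unknown
fi
echo "Secure Boot: $state"
```

If this reports "enabled", disable Secure Boot in the firmware setup
before booting the test kernel.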

# Add the unofficial PPA. Pick the correct one for your series
# (the command shown after these comments uses the noble PPA):
# For jammy: sudo add-apt-repository ppa:mehmetbasaran/linux-hwe-6.8-6.8.0-45.45-nfs-patch
# For noble: sudo add-apt-repository ppa:mehmetbasaran/linux-6.8.0-45.45-nfs-patch

$ sudo add-apt-repository ppa:mehmetbasaran/linux-6.8.0-45.45-nfs-patch
$ sudo apt update
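
To avoid adding the wrong PPA, the series can be read from
/etc/os-release. This is only a sketch: the function name ppa_for_series
is made up for illustration, and it only knows the two PPAs listed in
this message (no PPA is listed for mantic). It prints the command
instead of running it.

```shell
#!/bin/sh
# Map an Ubuntu series codename to the matching test PPA from this message.
# Returns 1 for any series that has no PPA listed here.
ppa_for_series() {
    case "$1" in
        jammy) echo "ppa:mehmetbasaran/linux-hwe-6.8-6.8.0-45.45-nfs-patch" ;;
        noble) echo "ppa:mehmetbasaran/linux-6.8.0-45.45-nfs-patch" ;;
        *)     echo "no test PPA listed for series '$1'" >&2; return 1 ;;
    esac
}

# VERSION_CODENAME is set in /etc/os-release on Ubuntu; read it in a
# subshell so the current shell's variables are left alone.
series=$( . /etc/os-release 2>/dev/null && echo "${VERSION_CODENAME:-}" ) || series=""

# Print the command to run rather than running it.
if ppa=$(ppa_for_series "$series"); then
    echo "sudo add-apt-repository $ppa"
fi
```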

$ sudo apt install linux-buildinfo-6.8.0-46-generic-nfs \
  linux-cloud-tools-6.8.0-46-generic-nfs \
  linux-cloud-tools-common \
  linux-headers-6.8.0-46-generic-nfs \
  linux-image-unsigned-6.8.0-46-generic-nfs \
  linux-modules-6.8.0-46-generic-nfs \
  linux-modules-extra-6.8.0-46-generic-nfs \
  linux-modules-ipu6-6.8.0-46-generic-nfs \
  linux-modules-iwlwifi-6.8.0-46-generic-nfs \
  linux-modules-usbio-6.8.0-46-generic-nfs \
  linux-nfs-6.8-cloud-tools-6.8.0-46 \
  linux-nfs-6.8-headers-6.8.0-46 \
  linux-nfs-6.8-tools-6.8.0-46 \
  linux-tools-6.8.0-46-generic-nfs

The next time you boot, you will be running the patched 6.8.0-45.45 kernel:
$ uname -r
# 6.8.0-46-generic-nfs
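
If you want to check this from a script (for example before starting a
long stress run), a small test on the release string is enough. The
sketch assumes the patched kernel is the only one on the machine whose
release ends in "-generic-nfs"; is_patched_kernel is an illustrative
name.

```shell
#!/bin/sh
# Succeeds if the given kernel release string carries the "-generic-nfs"
# flavour suffix used by the test packages in this message.
is_patched_kernel() {
    case "$1" in
        *-generic-nfs) return 0 ;;
        *)             return 1 ;;
    esac
}

if is_patched_kernel "$(uname -r)"; then
    echo "running the patched test kernel"
else
    echo "running $(uname -r), not the patched test kernel"
fi
```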

To return to the previous kernel (the official 6.8.0-45.45), you just need to
change the GRUB default:
# Print the kernels available on your machine; in my case:
$ grep 'menuentry \|submenu ' /boot/grub/grub.cfg | cut -f2 -d "'"
  Ubuntu
  Advanced options for Ubuntu
  Ubuntu, with Linux 6.8.0-46-generic-nfs
  Ubuntu, with Linux 6.8.0-46-generic-nfs (recovery mode)
  Ubuntu, with Linux 6.8.0-45-generic
  Ubuntu, with Linux 6.8.0-45-generic (recovery mode)
  Ubuntu, with Linux 6.5.0-18-generic
  Ubuntu, with Linux 6.5.0-18-generic (recovery mode)


# Change GRUB_DEFAULT in /etc/default/grub
# from GRUB_DEFAULT=0
# to GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-45-generic"
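
The GRUB_DEFAULT value has to match the menu titles exactly, so it can
help to build it from the titles rather than typing it. The sketch below
assumes the submenu is called "Advanced options for Ubuntu" and that
entries are titled "Ubuntu, with Linux <version>", as in my listing; the
function name grub_default_for is made up. It only prints the line, it
does not edit /etc/default/grub.

```shell
#!/bin/sh
# Read GRUB menu titles (one per line) on stdin and print a GRUB_DEFAULT
# line for the non-recovery entry of kernel version $1.
# Returns 1 if there is no entry whose title matches exactly.
grub_default_for() {
    entry="Ubuntu, with Linux $1"
    if grep -Fxq -- "$entry"; then
        printf 'GRUB_DEFAULT="Advanced options for Ubuntu>%s"\n' "$entry"
    else
        echo "no menu entry for kernel $1" >&2
        return 1
    fi
}

# Sample titles copied from the listing in this message.
titles='Ubuntu
Advanced options for Ubuntu
Ubuntu, with Linux 6.8.0-46-generic-nfs
Ubuntu, with Linux 6.8.0-46-generic-nfs (recovery mode)
Ubuntu, with Linux 6.8.0-45-generic
Ubuntu, with Linux 6.8.0-45-generic (recovery mode)
Ubuntu, with Linux 6.5.0-18-generic
Ubuntu, with Linux 6.5.0-18-generic (recovery mode)'

printf '%s\n' "$titles" | grub_default_for 6.8.0-45-generic
```

On a real machine, pipe the output of the grep/cut command for
/boot/grub/grub.cfg into grub_default_for instead of the sample titles.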

$ sudo update-grub
$ reboot

$ uname -r
# 6.8.0-45-generic


After switching back to the previous kernel, you can completely remove the
unofficial kernel:

# These packages are now safe to remove
$ sudo apt remove linux-buildinfo-6.8.0-46-generic-nfs \
  linux-cloud-tools-6.8.0-46-generic-nfs \
  linux-headers-6.8.0-46-generic-nfs \
  linux-image-unsigned-6.8.0-46-generic-nfs \
  linux-modules-6.8.0-46-generic-nfs \
  linux-modules-extra-6.8.0-46-generic-nfs \
  linux-modules-ipu6-6.8.0-46-generic-nfs \
  linux-modules-iwlwifi-6.8.0-46-generic-nfs \
  linux-modules-usbio-6.8.0-46-generic-nfs \
  linux-nfs-6.8-cloud-tools-6.8.0-46 \
  linux-nfs-6.8-headers-6.8.0-46 \
  linux-nfs-6.8-tools-6.8.0-46 \
  linux-tools-6.8.0-46-generic-nfs

# Remove the unofficial PPA from your apt sources
# (use the jammy PPA name instead if that is the one you added)
$ sudo add-apt-repository --remove ppa:mehmetbasaran/linux-6.8.0-45.45-nfs-patch

# Restore grub settings
# Change GRUB_DEFAULT in /etc/default/grub to GRUB_DEFAULT=0
$ sudo update-grub

$ reboot

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2062568

Title:
  nfsd gets unresponsive after some hours of operation

Status in linux package in Ubuntu:
  Confirmed
Status in nfs-utils package in Ubuntu:
  Confirmed

Bug description:
  I installed the 24.04 Beta on two test machines that were running
  22.04 without issues before. One of them exports two volumes that are
  mounted by the other machine, which primarily uses them as a secondary
  storage for ccache.

  After being up for a couple of hours (happened twice since yesterday
  evening) it seems that nfsd on the machine exporting the volumes hangs
  on something.

  From dmesg on the server (repeated a few times):

  [11183.290548] INFO: task nfsd:1419 blocked for more than 1228 seconds.
  [11183.290558]       Not tainted 6.8.0-22-generic #22-Ubuntu
  [11183.290563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [11183.290582] task:nfsd            state:D stack:0     pid:1419  tgid:1419  ppid:2      flags:0x00004000
  [11183.290587] Call Trace:
  [11183.290602]  <TASK>
  [11183.290606]  __schedule+0x27c/0x6b0
  [11183.290612]  schedule+0x33/0x110
  [11183.290615]  schedule_timeout+0x157/0x170
  [11183.290619]  wait_for_completion+0x88/0x150
  [11183.290623]  __flush_workqueue+0x140/0x3e0
  [11183.290629]  nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
  [11183.290689]  nfsd4_destroy_session+0x186/0x260 [nfsd]
  [11183.290744]  nfsd4_proc_compound+0x3af/0x770 [nfsd]
  [11183.290798]  nfsd_dispatch+0xd4/0x220 [nfsd]
  [11183.290851]  svc_process_common+0x44d/0x710 [sunrpc]
  [11183.290924]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
  [11183.290976]  svc_process+0x132/0x1b0 [sunrpc]
  [11183.291041]  svc_handle_xprt+0x4d3/0x5d0 [sunrpc]
  [11183.291105]  svc_recv+0x18b/0x2e0 [sunrpc]
  [11183.291168]  ? __pfx_nfsd+0x10/0x10 [nfsd]
  [11183.291220]  nfsd+0x8b/0xe0 [nfsd]
  [11183.291270]  kthread+0xef/0x120
  [11183.291274]  ? __pfx_kthread+0x10/0x10
  [11183.291276]  ret_from_fork+0x44/0x70
  [11183.291279]  ? __pfx_kthread+0x10/0x10
  [11183.291281]  ret_from_fork_asm+0x1b/0x30
  [11183.291286]  </TASK>

  From dmesg on the client (repeated a number of times):
  [ 6596.911785] RPC: Could not send backchannel reply error: -110
  [ 6596.972490] RPC: Could not send backchannel reply error: -110
  [ 6837.281307] RPC: Could not send backchannel reply error: -110

  ProblemType: Bug
  DistroRelease: Ubuntu 24.04
  Package: nfs-kernel-server 1:2.6.4-3ubuntu5
  ProcVersionSignature: Ubuntu 6.8.0-22.22-generic 6.8.1
  Uname: Linux 6.8.0-22-generic x86_64
  .etc.request-key.d.id_resolver.conf: create   id_resolver     *       *       /usr/sbin/nfsidmap -t 600 %k %d
  ApportVersion: 2.28.1-0ubuntu1
  Architecture: amd64
  CasperMD5CheckResult: pass
  Date: Fri Apr 19 14:10:25 2024
  InstallationDate: Installed on 2024-04-16 (3 days ago)
  InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Beta amd64 (20240410.1)
  NFSMounts:

  NFSv4Mounts:

  ProcEnviron:
   LANG=en_US.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=xterm-256color
   XDG_RUNTIME_DIR=<set>
  SourcePackage: nfs-utils
  UpgradeStatus: No upgrade log present (probably fresh install)
