Hi everyone,

Thanks for helping to test the test kernel, and I am glad that it fixes
the issue.

I have written a SRU template and have set it as the description of the
bug.

I have submitted the revert to the Kernel team mailing list for SRU:

Cover Letter:
https://lists.ubuntu.com/archives/kernel-team/2024-April/150433.html
Patch:
https://lists.ubuntu.com/archives/kernel-team/2024-April/150434.html

I'll go and have a talk to the kernel team now. This is a regression,
yes, but I am not sure if they will be interested in respinning the
2024.04.01 SRU cycle to include the fix, due to it being quite late in
the cycle. https://kernel.ubuntu.com/

We just did an emergency respin last week to fix bug 2060780, which was
another cifs regression for the 5.15 series, with an unrelated cause.

I'll write back with what the kernel team say, and hopefully give you an
estimated time of release. If its this SRU cycle, or the next.

Thanks,
Matthew

PS: Peter, I tried Debian 12.5 and I still can't reproduce the issue.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2060919

Title:
  cifs: Copying file to same directory results in page fault

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Mantic:
  In Progress

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2060919

  [Impact]

  Copying or modifying a file to the same directory within a cifs mount
  results in a page fault, and the process that initiated the copy being
  killed. This could be cp, nautilus, etc.

  This results in the following oops:

  BUG: unable to handle page fault for address: fffffffffffffffe
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD f45a3f067 P4D f45a3f067 PUD f45a41067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 0 PID: 28103 Comm: Thread (pooled) Tainted: P OE 6.5.0-27-generic 
#28-Ubuntu
  RIP: 0010:cifs_flush_folio+0x41/0xf0 [cifs]
  Code: 49 89 cd 31 c9 41 54 49 89 f4 48 c1 ee 0c 53 48 83 ec 08 48 8b 7f 30 44 
89 45 d4 e8 79 b3 23 f1 48 89 c3 31 c0 48 85 db 74 77 <48> 8b 13 b8 00 10 00 00 
f7 c2 00 00 01 00 74 10 0f b6 4b 51 48 d3
  RSP: 0018:ffffaab6865ffbf8 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: fffffffffffffffe RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffffaab6865ffc28 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000023854 R11: 0000000000000000 R12: 0000000000000000
  R13: ffffaab6865ffc78 R14: ffff906675d8aed0 R15: ffffaab6865ffc70
  FS: 00007bd4d594b6c0(0000) GS:ffff90753f800000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: fffffffffffffffe CR3: 000000017022a000 CR4: 0000000000750ef0
  PKRU: 55555554
  Call Trace:
  <TASK>
  ? show_regs+0x6d/0x80
  ? __die+0x24/0x80
  ? page_fault_oops+0x99/0x1b0
  ? kernelmode_fixup_or_oops+0xb2/0x140
  ? __bad_area_nosemaphore+0x1a5/0x2c0
  ? bad_area_nosemaphore+0x16/0x30
  ? do_kern_addr_fault+0x7b/0xa0
  ? exc_page_fault+0x1a4/0x1b0
  ? asm_exc_page_fault+0x27/0x30
  ? cifs_flush_folio+0x41/0xf0 [cifs]
  ? cifs_flush_folio+0x37/0xf0 [cifs]
  cifs_remap_file_range+0x172/0x660 [cifs]
  do_clone_file_range+0x101/0x2d0
  vfs_clone_file_range+0x3f/0x150
  ioctl_file_clone+0x52/0xc0
  do_vfs_ioctl+0x68f/0x910
  ? __fget_light+0xa5/0x120
  __x64_sys_ioctl+0x7d/0xf0
  do_syscall_64+0x59/0x90
  ? kmem_cache_free+0x22/0x3e0
  ? putname+0x5b/0x80
  ? exit_to_user_mode_prepare+0x30/0xb0
  ? syscall_exit_to_user_mode+0x37/0x60
  ? do_syscall_64+0x68/0x90
  ? do_syscall_64+0x68/0x90
  ? do_syscall_64+0x68/0x90

  There is no known workaround.

  [Fix]

  The stacktrace is very similar to a regression reported to upstream
  6.1.y:

  https://lore.kernel.org/linux-
  mm/a76b370f93cb928c049b94e1fde0d2da506dfcb2.ca...@amazon.com/T/

  The thread mentions that:

  commit 7b2404a886f8b91250c31855d287e632123e1746
  Author: David Howells <dhowe...@redhat.com>
  Date: Fri Dec 1 00:22:00 2023 +0000
  Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b2404a886f8b91250c31855d287e632123e1746

  introduced the issue to Debian's 6.1 kernel.

  This got backported to Ubuntu in:

  commit 3adbe2ccd8b9b8fde93e03958d6176945794d288
  Author: David Howells <dhowe...@redhat.com>
  Date: Fri Dec 1 00:22:00 2023 +0000
  Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()

  $ git describe --contains 3adbe2ccd8b9b8fde93e03958d6176945794d288
  Ubuntu-6.5.0-20.20~107

  Which we have been using for some time now, and is not the culprit.

  Reading the regression mailing list thread, they mention that things
  work differently in 6.1:

  > Yeah.  __filemap_get_folio() works differently in v6.1.y. There it returns a
  > folio or NULL.  In 6.7 it returns a folio or a negative error code.  The 
error
  > check in cifs_flush_folio() needs to change to something like:
  >
  >     folio = filemap_get_folio(inode->i_mapping, index);
  >     if (!folio)
  >             return -ENOMEM;
  >
  > David 

  6.1.y then got a specific patch to fix the issue in 6.1, which is:

  commit 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
  Author: Steve French <stfre...@microsoft.com>
  Date: Fri Jan 12 23:08:51 2024 -0600
  Subject: cifs: fix flushing folio regression for 6.1 backport
  Link: 
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/mantic/commit/?id=21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1

  $ git describe --contains 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
  Ubuntu-6.5.0-27.28~162

  Since the Ubuntu mantic kernel consumes both 6.1.y and 6.7.y / 6.8.y
  stable patches, this patch was applied to mantic's 6.5 kernel by
  mistake, and contains the wrong logic for how __filemap_get_folio()
  works in 6.5.

  The fix is to revert "cifs: fix flushing folio regression for 6.1
  backport" as a SAUCE patch.

  [Testcase]

  Start two VMs. One is recommended to be Debian 12, which is what some
  users have had luck with in the past, as the server, and the client
  can be mantic.

  Server
  ------

  $ sudo apt update
  $ sudo apt upgrade
  $ sudo apt install samba
  $ sudo vim /etc/samba/smb.conf
  server min protocol = NT1
  [sambashare]
      comment = Samba on Ubuntu
      path = /home/ubuntu/sambashare
      read only = no
      browsable = yes
  $ mkdir ~/sambashare
  $ sudo smbpasswd -a ubuntu

  Client
  ------

  $ sudo apt update
  $ sudo apt install cifs-utils
  $ mkdir ~/share
  $ sudo mount -t cifs -o username=ubuntu //192.168.122.185/sambashare ~/share
  Password for ubuntu@//192.168.122.185/sambashare:
  $ mount -l
  ...
  //192.168.122.185/sambashare on /home/ubuntu/share type cifs 
(rw,relatime,vers=3.1.1,cache=strict,username=ubuntu,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.122.185,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)

  $ ls
  hallo.txt hello.txt sample.txt sample2.txt
  $ sudo cp hello.txt hello.txt.1
  Killed

  If you install the test kernel available from the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2060919-test

  The copy will work as expected.

  [Where problems could occur]

  Reverting the patch restores logic back to how it was between
  6.5.0-20-generic through to 6.5.0-26-generic, which functions, and is
  well tested by the community.

  If a regression were to occur, it would impact all writes to cifs
  mounts, particularly to the same destination directory as the origin
  file. There is no known workarounds.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2060919/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to