Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
On Wed, 20 Dec 2017, at 3:14 PM, Tejun Heo wrote:
> On Tue, Dec 19, 2017 at 05:42:39AM -0800, Tejun Heo wrote:
> > On Sun, Dec 17, 2017 at 03:24:48PM -0800, vcap...@pengaru.com wrote:
> > > On Sun, Dec 17, 2017 at 05:49:44PM +, Bronek Kozicki wrote:
> > > > I just upgraded to 4.14.7 and tried to reproduce this error, this time
> > > > under strace. As you can see this happens when systemctl tries to read
> > > > a specific entry under /sys/fs . In case this matters, the entry is for
> > > > a small virtual machine running under qemu/kvm and managed by libvirt.
> > > >
> > > > open("/sys/fs/cgroup/unified/machine.slice",
> > > > O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> > > > fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > > > getdents(5, /* 12 entries */, 32768) = 464
> > > > openat(AT_FDCWD,
> > > > "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs",
> > > > O_RDONLY|O_CLOEXEC) = 8
> > > > fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > > > read(8, ) = ?
> > > > +++ killed by SIGKILL +++
> > > > [1] 12078 killed strace -- systemctl status
> > >
> > > This recently came through lkml, may be related:
> > > https://marc.info/?l=linux-kernel=151320108922415=2
> >
> > It looks like it could be the same problem. Working on the fix now.
> > Will let you know when I have something.
>
> Fix posted.
>
> http://lkml.kernel.org/r/20171220151331.ga3413...@devbig577.frc2.facebook.com

Thank you Tejun - I tested this fix and it works for me

B.
Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
On Tue, Dec 19, 2017 at 05:42:39AM -0800, Tejun Heo wrote:
> On Sun, Dec 17, 2017 at 03:24:48PM -0800, vcap...@pengaru.com wrote:
> > On Sun, Dec 17, 2017 at 05:49:44PM +, Bronek Kozicki wrote:
> > > I just upgraded to 4.14.7 and tried to reproduce this error, this time
> > > under strace. As you can see this happens when systemctl tries to read a
> > > specific entry under /sys/fs . In case this matters, the entry is for a
> > > small virtual machine running under qemu/kvm and managed by libvirt.
> > >
> > > open("/sys/fs/cgroup/unified/machine.slice",
> > > O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> > > fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > > getdents(5, /* 12 entries */, 32768) = 464
> > > openat(AT_FDCWD,
> > > "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs",
> > > O_RDONLY|O_CLOEXEC) = 8
> > > fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > > read(8, ) = ?
> > > +++ killed by SIGKILL +++
> > > [1] 12078 killed strace -- systemctl status
> >
> > This recently came through lkml, may be related:
> > https://marc.info/?l=linux-kernel=151320108922415=2
>
> It looks like it could be the same problem. Working on the fix now.
> Will let you know when I have something.

Fix posted.

http://lkml.kernel.org/r/20171220151331.ga3413...@devbig577.frc2.facebook.com

Thanks.

--
tejun
Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
Hello,

On Mon, Dec 18, 2017 at 03:17:54PM -0500, George Amanakis wrote:
> I can replicate this on a Thinkpad X230i running archlinux with latest
> 4.14.7 kernel, without the ZFS modules.
>
> Steps to reproduce:
> 1) create a virtual machine using libvirt (attached xml)
> 2) virsh start vm
> 3) head /sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d2\\x2dvm.scope/cgroup.procs

It took some massaging but I can reproduce the problem. Will report
when I know more.

Thanks.

--
tejun
Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
On Sun, Dec 17, 2017 at 03:24:48PM -0800, vcap...@pengaru.com wrote:
> On Sun, Dec 17, 2017 at 05:49:44PM +, Bronek Kozicki wrote:
> > I just upgraded to 4.14.7 and tried to reproduce this error, this time
> > under strace. As you can see this happens when systemctl tries to read a
> > specific entry under /sys/fs . In case this matters, the entry is for a
> > small virtual machine running under qemu/kvm and managed by libvirt.
> >
> > open("/sys/fs/cgroup/unified/machine.slice",
> > O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> > fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > getdents(5, /* 12 entries */, 32768) = 464
> > openat(AT_FDCWD,
> > "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs",
> > O_RDONLY|O_CLOEXEC) = 8
> > fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > read(8, ) = ?
> > +++ killed by SIGKILL +++
> > [1] 12078 killed strace -- systemctl status
>
> This recently came through lkml, may be related:
> https://marc.info/?l=linux-kernel=151320108922415=2

It looks like it could be the same problem. Working on the fix now.
Will let you know when I have something.

Thanks.

--
tejun
Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
I can replicate this on a Thinkpad X230i running archlinux with latest
4.14.7 kernel, without the ZFS modules.

Steps to reproduce:
1) create a virtual machine using libvirt (attached xml)
2) virsh start vm
3) head /sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d2\\x2dvm.scope/cgroup.procs

This hangs the laptop requiring a hard reset.

Regards,
George

vm.xml
Description: XML document
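The path in step 3 follows a pattern that shows up twice in this thread: the scope directory is named machine-<escaped>.scope, where <escaped> is "qemu-<id>-<name>" with every "-" written as "\x2d". Below is a minimal Python sketch of building that path, assuming the pattern holds for other guests as well. The names CGROUP_ROOT, cgroup_procs_path, vm_id and vm_name are hypothetical and only for illustration, and the escaping handles "-" alone, which is all the examples here need. The sketch only builds the string; it deliberately does not open the file, since reading it is what triggers the hang on an affected kernel.

```python
# Hypothetical helper: builds the cgroup.procs path for a libvirt/qemu guest
# using the naming pattern seen in this thread. Nothing is opened or read.

CGROUP_ROOT = "/sys/fs/cgroup/unified/machine.slice"


def cgroup_procs_path(vm_id: int, vm_name: str) -> str:
    """Return the cgroup.procs path for the given guest id and name.

    Assumption: the scope is named machine-<escaped>.scope, where <escaped>
    is "qemu-<id>-<name>" with each "-" replaced by the literal four
    characters backslash, x, 2, d.
    """
    unescaped = f"qemu-{vm_id}-{vm_name}"
    escaped = unescaped.replace("-", "\\x2d")
    return f"{CGROUP_ROOT}/machine-{escaped}.scope/cgroup.procs"


if __name__ == "__main__":
    # Guest from the reproduction steps (id 2, name "vm").
    print(cgroup_procs_path(2, "vm"))
    # Guest from the original report (id 1, name "kartuzy-spice").
    print(cgroup_procs_path(1, "kartuzy-spice"))
```

When pasting the result into a shell, put it in single quotes so the backslashes reach the kernel unchanged.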
Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
On 17/12/2017 23:24, vcap...@pengaru.com wrote:
> On Sun, Dec 17, 2017 at 05:49:44PM +, Bronek Kozicki wrote:
> > I just upgraded to 4.14.7 and tried to reproduce this error, this time
> > under strace. As you can see this happens when systemctl tries to read a
> > specific entry under /sys/fs . In case this matters, the entry is for a
> > small virtual machine running under qemu/kvm and managed by libvirt.
> >
> > open("/sys/fs/cgroup/unified/machine.slice",
> > O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> > fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > getdents(5, /* 12 entries */, 32768) = 464
> > openat(AT_FDCWD,
> > "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs",
> > O_RDONLY|O_CLOEXEC) = 8
> > fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > read(8, ) = ?
> > +++ killed by SIGKILL +++
> > [1] 12078 killed strace -- systemctl status
>
> This recently came through lkml, may be related:
> https://marc.info/?l=linux-kernel=151320108922415=2

thank you, it certainly seems related. Is there some debugging option I
could enable, or patch I could apply, which would make the point of data
corruption easier to find? I'm ok taking untested patches, if that helps
finding the location of the bug.

B.
Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
On Sun, Dec 17, 2017 at 05:49:44PM +, Bronek Kozicki wrote:
> I just upgraded to 4.14.7 and tried to reproduce this error, this time under
> strace. As you can see this happens when systemctl tries to read a specific
> entry under /sys/fs . In case this matters, the entry is for a small virtual
> machine running under qemu/kvm and managed by libvirt.
>
> open("/sys/fs/cgroup/unified/machine.slice",
> O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> getdents(5, /* 12 entries */, 32768) = 464
> openat(AT_FDCWD,
> "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs",
> O_RDONLY|O_CLOEXEC) = 8
> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> read(8, ) = ?
> +++ killed by SIGKILL +++
> [1] 12078 killed strace -- systemctl status

This recently came through lkml, may be related:
https://marc.info/?l=linux-kernel=151320108922415=2

CCd Tejun

> B.
>
> [ 1889.226051]
> [ 1889.235286] UBSAN: Undefined behaviour in kernel/cgroup/pids.c:67:9
> [ 1889.241563] member access within null pointer of type 'struct pids_cgroup'
> [ 1889.249920]
> [ 1889.259698] BUG: unable to handle kernel NULL pointer dereference at 00b0
> [ 1889.267524] IP: pids_free+0x28/0xb0
> [ 1889.272394] PGD 0 P4D 0
> [ 1889.274925] Oops: [#1] SMP
> [ 1889.278061] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink joydev hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ext4 intel_cstate crc16 mbcache jbd2 fscrypto nls_iso8859_1 nls_cp437 evdev input_leds led_class vfat fat intel_rapl_perf mac_hid pcspkr hid_logitech_dj igb ptp mei_me pps_core i2c_i801 i2c_algo_bit mei lpc_ich ioatdma tpm_tis tpm_tis_core dca shpchp tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 isci ahci libsas libahci xhci_pci ehci_pci mpt3sas xhci_hcd ehci_hcd raid_class libata
> [ 1889.349864] scsi_transport_sas usbcore scsi_mod usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
> [ 1889.368439] CPU: 1 PID: 12084 Comm: systemctl Tainted: PW O 4.14.7-1-ARCH #1
> [ 1889.376525] Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
> [ 1889.383474] task: 93149aaec140 task.stack: a88c3836c000
> [ 1889.389387] RIP: 0010:pids_free+0x28/0xb0
> [ 1889.393388] RSP: 0018:a88c3836fcc8 EFLAGS: 00010282
> [ 1889.398605] RAX: RBX: RCX: 0006
> [ 1889.405731] RDX: RSI: 0202 RDI: 0202
> [ 1889.412854] RBP: 931499ab2d58 R08: 079a R09:
> [ 1889.419979] R10: 001f5954 R11: 0003d040 R12: 56e21a48
> [ 1889.427102] R13: a91de5c0 R14: 93247b0598c0 R15: a91cd0a0
> [ 1889.434227] FS: 7f18eee6b8c0() GS:931ebfa4() knlGS:
> [ 1889.442302] CS: 0010 DS: ES: CR0: 80050033
> [ 1889.448041] CR2: 00b0 CR3: 000611019003 CR4: 001626e0
> [ 1889.455164] Call Trace:
> [ 1889.457610] cgroup_free+0xaa/0x190
> [ 1889.461095] __put_task_struct+0x68/0x230
> [ 1889.465105] ? seq_printf+0x4e/0x70
> [ 1889.468591] css_task_iter_next+0x74/0x90
> [ 1889.472594] kernfs_seq_next+0x58/0x110
> [ 1889.476424] seq_read+0x36c/0x620
> [ 1889.479735] __vfs_read+0x54/0x2e0
> [ 1889.483134] vfs_read+0x9d/0x200
> [ 1889.486358] SyS_read+0x52/0xc0
> [ 1889.489494] do_syscall_64+0x69/0x1e0
> [ 1889.493152] entry_SYSCALL64_slow_path+0x25/0x25
> [ 1889.497771] RIP: 0033:0x7f18ee784a11
> [ 1889.501341] RSP: 002b:7ffd56942618 EFLAGS: 0246 ORIG_RAX:
> [ 1889.508897] RAX: ffda RBX: 559a9ae6d260 RCX: 7f18ee784a11
> [ 1889.516022] RDX: 1000 RSI: 559a9ae80f70 RDI: 0008
> [ 1889.523145] RBP: 0d68 R08: 0003 R09: ffb0
> [ 1889.530270] R10: 1000 R11: 0246 R12: 7f18eea4b700
> [ 1889.537395] R13: 7f18eea4c240 R14: 559a9ae6d260 R15:
> [ 1889.544518] Code: 44 00 00 0f 1f 44 00 00 48 81 ff c8 f7 ff ff 55 53 48 89 fb 74 4c 48 8b 9b 38 08 00 00 48 85 db 74 7c 48 8b 5b 50 48 85 db 74 63 <48> 83 bb b0 00 00 00 00 74 2a 48 c7 c5 60 2e 1e a9 48 89 df e8
> [ 1889.563368] RIP:
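For anyone reading the call trace quoted above: each symbol+0xOFF/0xSIZE entry gives the offset into the function and the function's total size, and a leading "?" marks a frame the stack unwinder is not sure about. A minimal Python sketch that pulls those fields out of an oops is shown below. Frame, FRAME_RE and parse_frames are names made up for illustration, the regular expression is only an assumption about the line layout seen in this thread, and the sample lines are copied from the trace above.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    symbol: str     # function name
    offset: int     # offset of the return address inside the function
    size: int       # total size of the function in bytes
    reliable: bool  # False when the line carries a leading "?"


# Matches "[ timestamp] [? ]symbol+0xOFF/0xSIZE" as printed in the trace.
FRAME_RE = re.compile(
    r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(oops_text: str) -> List[Frame]:
    """Extract call-trace frames from dmesg-style oops text."""
    frames = []
    for match in FRAME_RE.finditer(oops_text):
        marker, symbol, offset, size = match.groups()
        frames.append(
            Frame(symbol, int(offset, 16), int(size, 16), marker is None)
        )
    return frames


# Sample copied from the trace in this thread.
SAMPLE = """\
[ 1889.455164] Call Trace:
[ 1889.457610] cgroup_free+0xaa/0x190
[ 1889.461095] __put_task_struct+0x68/0x230
[ 1889.465105] ? seq_printf+0x4e/0x70
[ 1889.468591] css_task_iter_next+0x74/0x90
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        flag = "" if frame.reliable else " (unreliable)"
        print(f"{frame.symbol} +{frame.offset:#x} of {frame.size:#x}{flag}")
```

Lines such as "IP: pids_free+0x28/0xb0" are intentionally not matched, so the result only contains entries from the call trace itself.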
Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
FWIW, I can do "cat" . I get a single number seemingly followed by an
infinite stream of 0s (I tried wc -l, but did not want to wait very long
and killed it). Here is what it looks like, if limited by "head":

root@gdansk ~ # cat '/sys/fs/cgroup/unified/machine.slice/machine-qemu\x2d1\x2dkartuzy\x2dspice.scope/cgroup.procs' | head
10649
0
0
0
0
0
0
0
0
0
root@gdansk ~ #

PID 10649 is indeed qemu process running the virtual machine in question:

root@gdansk ~ # ps lw 10649
F   UID   PID  PPID PRI  NI     VSZ   RSS WCHAN  STAT TTY   TIME COMMAND
6     0 10649     1  20   0 4815836 60252 -      Sl   ?     2:56 /usr/bin/qemu-system-x86_64 -name guest=kartuzy-spice,process=qemu:kartuzy-spice,debug-threads=on -S -object se

Sorry about taint by ZFS, but there is nothing I can do, it is my root
filesystem. Since I am the only user of the package in question I could
cheat and replace the license for the build of the ZFS module, but I do
not see how that might help.

B.
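The symptom described above (one real PID followed by an apparently endless run of 0 entries) can be checked without waiting on an unbounded read. Below is a minimal Python sketch, assuming the input is any iterable of lines in the cgroup.procs format; summarize_procs and its limit parameter are hypothetical names, and the demo feeds it simulated data modelled on the output above, not a real file.

```python
from itertools import chain, islice, repeat
from typing import Dict, Iterable


def summarize_procs(lines: Iterable[str], limit: int = 1000) -> Dict[str, object]:
    """Look at no more than `limit` lines and count zero and non-zero entries.

    The bound matters: on the affected system the stream never seemed to
    end, so an unbounded loop would never return.
    """
    pids = []
    zeros = 0
    seen = 0
    for line in islice(lines, limit):
        seen += 1
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        value = int(text)
        if value == 0:
            zeros += 1
        else:
            pids.append(value)
    # hit_limit only says we stopped at the bound, not that more data exists.
    return {"pids": pids, "zeros": zeros, "hit_limit": seen == limit}


if __name__ == "__main__":
    # Simulated stream: one PID, then zeros forever, as reported above.
    simulated = chain(["10649\n"], repeat("0\n"))
    print(summarize_procs(simulated, limit=10))
```

To run it against a real file, pass an open file object as `lines`; on an unpatched kernel that read is exactly what triggered the oops, so this is best tried only where a crash is acceptable.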
Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
On 12/17/2017 10:30 AM, Bronek Kozicki wrote:
> On 17/12/2017 18:25, Randy Dunlap wrote:
>> On 12/17/2017 09:49 AM, Bronek Kozicki wrote:
>>> I just upgraded to 4.14.7 and tried to reproduce this error, this time
>>> under strace. As you can see this happens when systemctl tries to read a
>>> specific entry under /sys/fs . In case this matters, the entry is for a
>>> small virtual machine running under qemu/kvm and managed by libvirt.
>>>
>>> open("/sys/fs/cgroup/unified/machine.slice",
>>> O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>>> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>>> getdents(5, /* 12 entries */, 32768) = 464
>>> openat(AT_FDCWD,
>>> "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs",
>>> O_RDONLY|O_CLOEXEC) = 8
>>> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>>> read(8, ) = ?
>>> +++ killed by SIGKILL +++
>>> [1] 12078 killed strace -- systemctl status
>>>
>>>
>>> B.
>>>
>>
>> Hi,
>>
>> Can you reproduce this without using (loading) the XFS modules?
>> They cause the kernel to be tainted.
>
> I think you mean ZFS - I cannot do that. It is my root filesystem.

Sorry, yes, I did mean ZFS.

thanks,
--
~Randy
Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
On 17/12/2017 18:25, Randy Dunlap wrote:
> On 12/17/2017 09:49 AM, Bronek Kozicki wrote:
>> I just upgraded to 4.14.7 and tried to reproduce this error, this time
>> under strace. As you can see this happens when systemctl tries to read a
>> specific entry under /sys/fs . In case this matters, the entry is for a
>> small virtual machine running under qemu/kvm and managed by libvirt.
>>
>> open("/sys/fs/cgroup/unified/machine.slice",
>> O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>> getdents(5, /* 12 entries */, 32768) = 464
>> openat(AT_FDCWD,
>> "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs",
>> O_RDONLY|O_CLOEXEC) = 8
>> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>> read(8, ) = ?
>> +++ killed by SIGKILL +++
>> [1] 12078 killed strace -- systemctl status
>>
>> B.
>
> Hi,
>
> Can you reproduce this without using (loading) the XFS modules?
> They cause the kernel to be tainted.

I think you mean ZFS - I cannot do that. It is my root filesystem.

B.
Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
On 12/17/2017 09:49 AM, Bronek Kozicki wrote:
> I just upgraded to 4.14.7 and tried to reproduce this error, this time under
> strace. As you can see this happens when systemctl tries to read a specific
> entry under /sys/fs . In case this matters, the entry is for a small virtual
> machine running under qemu/kvm and managed by libvirt.
>
> open("/sys/fs/cgroup/unified/machine.slice",
> O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> getdents(5, /* 12 entries */, 32768) = 464
> openat(AT_FDCWD,
> "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs",
> O_RDONLY|O_CLOEXEC) = 8
> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> read(8, ) = ?
> +++ killed by SIGKILL +++
> [1] 12078 killed strace -- systemctl status
>
>
> B.
>

Hi,

Can you reproduce this without using (loading) the XFS modules?
They cause the kernel to be tainted.

Adding cgroups mailing list also.

>
> [ 1889.226051]
> [ 1889.235286] UBSAN: Undefined behaviour in kernel/cgroup/pids.c:67:9
> [ 1889.241563] member access within null pointer of type 'struct pids_cgroup'
> [ 1889.249920]
> [ 1889.259698] BUG: unable to handle kernel NULL pointer dereference at 00b0
> [ 1889.267524] IP: pids_free+0x28/0xb0
> [ 1889.272394] PGD 0 P4D 0
> [ 1889.274925] Oops: [#1] SMP
> [ 1889.278061] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink joydev hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ext4 intel_cstate crc16 mbcache jbd2 fscrypto nls_iso8859_1 nls_cp437 evdev input_leds led_class vfat fat intel_rapl_perf mac_hid pcspkr hid_logitech_dj igb ptp mei_me pps_core i2c_i801 i2c_algo_bit mei lpc_ich ioatdma tpm_tis tpm_tis_core dca shpchp tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 isci ahci libsas libahci xhci_pci ehci_pci mpt3sas xhci_hcd ehci_hcd raid_class libata
> [ 1889.349864] scsi_transport_sas usbcore scsi_mod usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
> [ 1889.368439] CPU: 1 PID: 12084 Comm: systemctl Tainted: P W O 4.14.7-1-ARCH #1
> [ 1889.376525] Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
> [ 1889.383474] task: 93149aaec140 task.stack: a88c3836c000
> [ 1889.389387] RIP: 0010:pids_free+0x28/0xb0
> [ 1889.393388] RSP: 0018:a88c3836fcc8 EFLAGS: 00010282
> [ 1889.398605] RAX: RBX: RCX: 0006
> [ 1889.405731] RDX: RSI: 0202 RDI: 0202
> [ 1889.412854] RBP: 931499ab2d58 R08: 079a R09:
> [ 1889.419979] R10: 001f5954 R11: 0003d040 R12: 56e21a48
> [ 1889.427102] R13: a91de5c0 R14: 93247b0598c0 R15: a91cd0a0
> [ 1889.434227] FS: 7f18eee6b8c0() GS:931ebfa4() knlGS:
> [ 1889.442302] CS: 0010 DS: ES: CR0: 80050033
> [ 1889.448041] CR2: 00b0 CR3: 000611019003 CR4: 001626e0
> [ 1889.455164] Call Trace:
> [ 1889.457610] cgroup_free+0xaa/0x190
> [ 1889.461095] __put_task_struct+0x68/0x230
> [ 1889.465105] ? seq_printf+0x4e/0x70
> [ 1889.468591] css_task_iter_next+0x74/0x90
> [ 1889.472594] kernfs_seq_next+0x58/0x110
> [ 1889.476424] seq_read+0x36c/0x620
> [ 1889.479735] __vfs_read+0x54/0x2e0
> [ 1889.483134] vfs_read+0x9d/0x200
> [ 1889.486358] SyS_read+0x52/0xc0
> [ 1889.489494] do_syscall_64+0x69/0x1e0
> [ 1889.493152] entry_SYSCALL64_slow_path+0x25/0x25
> [ 1889.497771] RIP: 0033:0x7f18ee784a11
> [ 1889.501341] RSP: 002b:7ffd56942618 EFLAGS: 0246 ORIG_RAX:
> [ 1889.508897] RAX: ffda RBX: 559a9ae6d260 RCX: 7f18ee784a11
> [ 1889.516022] RDX: 1000 RSI: 559a9ae80f70 RDI: 0008
> [ 1889.523145] RBP: 0d68 R08: 0003 R09: ffb0
> [ 1889.530270] R10: 1000 R11: 0246 R12: 7f18eea4b700
> [ 1889.537395] R13: 7f18eea4c240 R14: 559a9ae6d260 R15:
> [ 1889.544518] Code: 44 00 00 0f 1f 44 00 00 48 81 ff c8 f7 ff ff 55 53 48 89 fb 74 4c 48 8b 9b 38 08 00 00 48 85 db 74 7c 48 8b 5b 50 48 85 db 74 63 <48> 83 bb b0 00 00 00 00 74 2a 48 c7 c5 60 2e 1e a9 48 89 df e8
> [ 1889.563368] RIP:
Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs. In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.

open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents(5, /* 12 entries */, 32768) = 464
openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(8, ) = ?
+++ killed by SIGKILL +++
[1] 12078 killed strace -- systemctl status

B.

[ 1889.226051]
[ 1889.235286] UBSAN: Undefined behaviour in kernel/cgroup/pids.c:67:9
[ 1889.241563] member access within null pointer of type 'struct pids_cgroup'
[ 1889.249920]
[ 1889.259698] BUG: unable to handle kernel NULL pointer dereference at 00b0
[ 1889.267524] IP: pids_free+0x28/0xb0
[ 1889.272394] PGD 0 P4D 0
[ 1889.274925] Oops: [#1] SMP
[ 1889.278061] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink joydev hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ext4 intel_cstate crc16 mbcache jbd2 fscrypto nls_iso8859_1 nls_cp437 evdev input_leds led_class vfat fat intel_rapl_perf mac_hid pcspkr hid_logitech_dj igb ptp mei_me pps_core i2c_i801 i2c_algo_bit mei lpc_ich ioatdma tpm_tis tpm_tis_core dca shpchp tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 isci ahci libsas libahci xhci_pci ehci_pci mpt3sas xhci_hcd ehci_hcd raid_class libata
[ 1889.349864] scsi_transport_sas usbcore scsi_mod usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[ 1889.368439] CPU: 1 PID: 12084 Comm: systemctl Tainted: P W O 4.14.7-1-ARCH #1
[ 1889.376525] Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
[ 1889.383474] task: 93149aaec140 task.stack: a88c3836c000
[ 1889.389387] RIP: 0010:pids_free+0x28/0xb0
[ 1889.393388] RSP: 0018:a88c3836fcc8 EFLAGS: 00010282
[ 1889.398605] RAX: RBX: RCX: 0006
[ 1889.405731] RDX: RSI: 0202 RDI: 0202
[ 1889.412854] RBP: 931499ab2d58 R08: 079a R09:
[ 1889.419979] R10: 001f5954 R11: 0003d040 R12: 56e21a48
[ 1889.427102] R13: a91de5c0 R14: 93247b0598c0 R15: a91cd0a0
[ 1889.434227] FS: 7f18eee6b8c0() GS:931ebfa4() knlGS:
[ 1889.442302] CS: 0010 DS: ES: CR0: 80050033
[ 1889.448041] CR2: 00b0 CR3: 000611019003 CR4: 001626e0
[ 1889.455164] Call Trace:
[ 1889.457610] cgroup_free+0xaa/0x190
[ 1889.461095] __put_task_struct+0x68/0x230
[ 1889.465105] ? seq_printf+0x4e/0x70
[ 1889.468591] css_task_iter_next+0x74/0x90
[ 1889.472594] kernfs_seq_next+0x58/0x110
[ 1889.476424] seq_read+0x36c/0x620
[ 1889.479735] __vfs_read+0x54/0x2e0
[ 1889.483134] vfs_read+0x9d/0x200
[ 1889.486358] SyS_read+0x52/0xc0
[ 1889.489494] do_syscall_64+0x69/0x1e0
[ 1889.493152] entry_SYSCALL64_slow_path+0x25/0x25
[ 1889.497771] RIP: 0033:0x7f18ee784a11
[ 1889.501341] RSP: 002b:7ffd56942618 EFLAGS: 0246 ORIG_RAX:
[ 1889.508897] RAX: ffda RBX: 559a9ae6d260 RCX: 7f18ee784a11
[ 1889.516022] RDX: 1000 RSI: 559a9ae80f70 RDI: 0008
[ 1889.523145] RBP: 0d68 R08: 0003 R09: ffb0
[ 1889.530270] R10: 1000 R11: 0246 R12: 7f18eea4b700
[ 1889.537395] R13: 7f18eea4c240 R14: 559a9ae6d260 R15:
[ 1889.544518] Code: 44 00 00 0f 1f 44 00 00 48 81 ff c8 f7 ff ff 55 53 48 89 fb 74 4c 48 8b 9b 38 08 00 00 48 85 db 74 7c 48 8b 5b 50 48 85 db 74 63 <48> 83 bb b0 00 00 00 00 74 2a 48 c7 c5 60 2e 1e a9 48 89 df e8
[ 1889.563368] RIP: pids_free+0x28/0xb0 RSP: a88c3836fcc8
[ 1889.568846] CR2: 00b0
[ 1889.572175] ---[ end trace eab2ed000b4d5c66 ]---
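The openat()/read() pair from the strace output can be replayed without systemctl. The sketch below is one rough way to do that, not a confirmed reproducer: `read_cgroup_procs` is a hypothetical helper name, the 4096-byte chunk size is an assumption, and the demo reads a temporary file rather than a real cgroup.procs entry. Pointing it at the real path on an affected kernel may well trigger the same oops.

```python
import os
import tempfile
from typing import List


def read_cgroup_procs(path: str, chunk_size: int = 4096) -> List[int]:
    """Open `path` read-only and read it to EOF, returning the PIDs found.

    Mirrors the openat(O_RDONLY|O_CLOEXEC) and read() calls in the trace.
    The chunk size is an assumption, not something the trace confirms.
    """
    fd = os.open(path, os.O_RDONLY | os.O_CLOEXEC)
    try:
        data = bytearray()
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # an empty read means end of file
                break
            data += chunk
    finally:
        os.close(fd)
    # cgroup.procs lists one PID per line
    return [int(line) for line in data.decode().split()]


if __name__ == "__main__":
    # Stand-in file so the demo never touches /sys/fs/cgroup.
    with tempfile.NamedTemporaryFile("w", delete=False) as sample:
        sample.write("1\n42\n12084\n")
    try:
        print(read_cgroup_procs(sample.name))
    finally:
        os.unlink(sample.name)
```

To try it against the entry from the trace, pass the cgroup.procs path shown in the openat() call instead of the temporary file.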
Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
This has happened again, and hopefully the report is not as mangled as the previous one. I was also trying to start "systemctl status", only once this time. The kernel build is different because I've just disabled RCU tracing/debugging options. One more thing, this kernel was built with gcc 7.2.1.

B.

2017-12-17T12:50:38,640725+ [ cut here ]
2017-12-17T12:50:38,640741+ WARNING: CPU: 10 PID: 16921 at kernel/fork.c:414 __put_task_struct+0x160/0x230
2017-12-17T12:50:38,640742+ Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink joydev hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ext4 kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc crc16 mbcache aesni_intel jbd2 aes_x86_64 crypto_simd glue_helper cryptd nls_iso8859_1 nls_cp437 vfat fscrypto fat intel_cstate evdev input_leds led_class intel_rapl_perf mac_hid pcspkr igb hid_logitech_dj ptp pps_core i2c_algo_bit tpm_tis ioatdma mei_me i2c_i801 tpm_tis_core lpc_ich mei dca shpchp tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 isci ehci_pci ahci xhci_pci libsas libahci mpt3sas xhci_hcd ehci_hcd raid_class libata
2017-12-17T12:50:38,640812+ scsi_transport_sas usbcore scsi_mod usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
2017-12-17T12:50:38,640833+ CPU: 10 PID: 16921 Comm: systemctl Tainted: P O 4.14.6-3-ARCH #1
2017-12-17T12:50:38,640835+ Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
2017-12-17T12:50:38,640837+ task: 9c4b5475c140 task.stack: b4bf8641c000
2017-12-17T12:50:38,640840+ RIP: 0010:__put_task_struct+0x160/0x230
2017-12-17T12:50:38,640841+ RSP: 0018:b4bf8641fd50 EFLAGS: 00010246
2017-12-17T12:50:38,640843+ RAX: RBX: 9c4b4f2c33f8 RCX: 0001
2017-12-17T12:50:38,640845+ RDX: b4bf8641fdf8 RSI: 9c4b4f2c33f8 RDI: 9c4b4f2c33f8
2017-12-17T12:50:38,640846+ RBP: b21ddda0 R08: 000a R09: 0008
2017-12-17T12:50:38,640847+ R10: b4bf8641fcf8 R11: R12: b4bf8641fdf8
2017-12-17T12:50:38,640849+ R13: 9c4b4f2c33f8 R14: 9c4b4f2c33f8 R15: 9c655f98c578
2017-12-17T12:50:38,640851+ FS: 7fb1df5308c0() GS:9c65bfa8() knlGS:
2017-12-17T12:50:38,640852+ CS: 0010 DS: ES: CR0: 80050033
2017-12-17T12:50:38,640853+ CR2: 55e957fd6f78 CR3: 0006042fb001 CR4: 001626e0
2017-12-17T12:50:38,640855+ Call Trace:
2017-12-17T12:50:38,640862+ ? seq_printf+0x4e/0x70
2017-12-17T12:50:38,640870+ css_task_iter_next+0x74/0x90
2017-12-17T12:50:38,640876+ kernfs_seq_next+0x58/0x110
2017-12-17T12:50:38,640878+ seq_read+0x36c/0x620
2017-12-17T12:50:38,640886+ ? __handle_mm_fault+0xb10/0x1630
2017-12-17T12:50:38,640889+ __vfs_read+0x54/0x2e0
2017-12-17T12:50:38,640891+ vfs_read+0x9d/0x200
2017-12-17T12:50:38,640893+ SyS_read+0x52/0xc0
2017-12-17T12:50:38,640899+ entry_SYSCALL_64_fastpath+0x1a/0xa5
2017-12-17T12:50:38,640902+ RIP: 0033:0x7fb1dee49a11
2017-12-17T12:50:38,640903+ RSP: 002b:7ffcf8aa5268 EFLAGS: 0246 ORIG_RAX:
2017-12-17T12:50:38,640905+ RAX: ffda RBX: 7fb1df114aa0 RCX: 7fb1dee49a11
2017-12-17T12:50:38,640907+ RDX: 1000 RSI: 55e957fd5f70 RDI: 0008
2017-12-17T12:50:38,640908+ RBP: 7fb1df114b00 R08: 0003 R09: ffb0
2017-12-17T12:50:38,640909+ R10: 1000 R11: 0246 R12: 1010
2017-12-17T12:50:38,640910+ R13: 7fb1df114b00 R14: 1000 R15: 0001
2017-12-17T12:50:38,640912+ Code: 44 24 10 65 48 33 04 25 28 00 00 00 0f 85 85 00 00 00 48 83 c4 18 48 89 df 5b 5d 41 5c 41 5d e9 27 fe ff ff 0f ff e9 ee fe ff ff <0f> ff e9 d2 fe ff ff 0f ff e9 f2 fe ff ff 4d 8d ac 24 d0 03 00
2017-12-17T12:50:38,640950+ ---[ end trace bc939269a984f4e0 ]---
2017-12-17T12:50:38,640953+
2017-12-17T12:50:38,649395+ UBSAN: Undefined behaviour in kernel/cgroup/pids.c:67:9
2017-12-17T12:50:38,655693+ member access within null pointer of type 'struct pids_cgroup'
2017-12-17T12:50:38,662630+ CPU: 10 PID: 16921 Comm: systemctl Tainted: P W O 4.14.6-3-ARCH #1
2017-12-17T12:50:38,662631+ Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
2017-12-17T12:50:38,662632+ Call Trace:
2017-12-17T12:50:38,662638+ dump_stack+0x70/0xae
2017-12-17T12:50:38,662645+ ubsan_epilogue+0x9/0x40
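Entries such as `__put_task_struct+0x160/0x230` in the trace follow the kernel's symbol+offset/size notation: the offset is the position in bytes from the start of the function, and the size is the function's total length. A small sketch for pulling those fields out of a log line follows; `Frame`, `FRAME_RE` and `parse_frame` are illustrative names, not part of any kernel tooling.

```python
import re
from typing import NamedTuple, Optional

# symbol+0xOFFSET/0xSIZE, as printed in the RIP and Call Trace lines
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class Frame(NamedTuple):
    symbol: str  # function name
    offset: int  # bytes from the start of the function
    size: int    # total length of the function in bytes


def parse_frame(text: str) -> Optional[Frame]:
    """Return the first symbol+offset/size entry in `text`, or None."""
    match = FRAME_RE.search(text)
    if match is None:
        return None
    return Frame(
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
    )


if __name__ == "__main__":
    print(parse_frame("RIP: 0010:__put_task_struct+0x160/0x230"))
    print(parse_frame("? seq_printf+0x4e/0x70"))
```

With a vmlinux that has debug info, the symbol and offset are what tools such as the kernel's scripts/faddr2line take to map a frame back to a source line.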