NULL pointer dereference in process_one_work
Hi tj and jiangshan,

I built a Ceph storage pool to run some benchmarks on a 3.10 kernel. Occasionally, when the CPU load is very high, some nodes crash with the message below.

[292273.612014] BUG: unable to handle kernel NULL pointer dereference at 0008
[292273.612057] IP: [] process_one_work+0x31/0x470
[292273.612087] PGD 0
[292273.612099] Oops: [#1] SMP
[292273.612117] Modules linked in: rbd(OE) bcache(OE) ip_vs xfs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bonding intel_powerclamp coretemp intel_rapl kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mxm_wmi iTCO_wdt iTCO_vendor_support dcdbas ipmi_devintf pcspkr ipmi_ssif mei_me sg lpc_ich mei sb_edac ipmi_si mfd_core edac_core ipmi_msghandler shpchp wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper
[292273.612495] crct10dif_pclmul crct10dif_common ttm crc32c_intel drm ahci nvme bnx2x libahci i2c_core libata mdio libcrc32c megaraid_sas ptp pps_core dm_mirror dm_region_hash dm_log dm_mod
[292273.612580] CPU: 16 PID: 353223 Comm: kworker/16:2 Tainted: G OE 3.10.0-327.el7.x86_64 #1
[292273.612620] Hardware name: Dell Inc. PowerEdge R730xd/0WCJNT, BIOS 2.4.3 01/17/2017
[292273.612655] task: 8801f55e6780 ti: 882a199b task.ti: 882a199b
[292273.612685] RIP: 0010:[] [] process_one_work+0x31/0x470
[292273.612721] RSP: 0018:882a199b3e28 EFLAGS: 00010046
[292273.612743] RAX: RBX: 88088b273028 RCX: 882a199b3fd8
[292273.612771] RDX: RSI: 88088b273028 RDI: 88088b273000
[292273.612799] RBP: 882a199b3e60 R08: R09: 0770
[292273.612827] R10: 8822a3bb1f80 R11: 8822a3bb1f80 R12: 88088b273000
[292273.612855] R13: 881fff313fc0 R14: R15: 881fff313fc0
[292273.612883] FS: () GS:881fff30() knlGS:
[292273.612914] CS: 0010 DS: ES: CR0: 80050033
[292273.612937] CR2: 00b8 CR3: 0194a000 CR4: 003407e0
[292273.612965] DR0: DR1: DR2:
[292273.612994] DR3: DR6: fffe0ff0 DR7: 0400
[292273.613021] Stack:
[292273.613031] ff313fd8 881fff313fd8 000188088b273030
[292273.613069] 8801f55e6780 88088b273000 881fff313fc0 882a199b3ec0
[292273.613108] 8109e4cc 882a199b3fd8 882a199b3fd8 8801f55e6780
[292273.613146] Call Trace:
[292273.613160] [] worker_thread+0x21c/0x400
[292273.613185] [] ? rescuer_thread+0x400/0x400
[292273.613212] [] kthread+0xcf/0xe0
[292273.613234] [] ? kthread_create_on_node+0x140/0x140
[292273.613263] [] ret_from_fork+0x58/0x90
[292273.613287] [] ? kthread_create_on_node+0x140/0x140
[292273.614303] Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 10 48 8b 06 4c 8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44 89 f8 83 e0 01
[292273.617971] RIP [] process_one_work+0x31/0x470
[292273.620011] RSP
[292273.621940] CR2: 0008

Some output from the crash utility:

crash> sys
KERNEL: /usr/lib/debug/lib/modules/3.10.0-327.el7.x86_64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 32
DATE: Wed Oct 18 05:21:14 2017
UPTIME: 3 days, 09:07:25
LOAD AVERAGE: 221.70, 222.22, 224.96
TASKS: 3115
NODENAME: node121
RELEASE: 3.10.0-327.el7.x86_64
VERSION: #1 SMP Thu Nov 19 22:10:57 UTC 2015
MACHINE: x86_64 (2099 Mhz)
MEMORY: 255.9 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0008"

crash> bt
PID: 353223 TASK: 8801f55e6780 CPU: 16 COMMAND: "kworker/16:2"
#0 [882a199b3af0] machine_kexec at 81051beb
#1 [882a199b3b50] crash_kexec at 810f2542
#2 [882a199b3c20] oops_end at 8163e1a8
#3 [882a199b3c48] no_context at 8162e2b8
#4 [882a199b3c98] __bad_area_nosemaphore at 8162e34e
#5 [882a199b3ce0] bad_area_nosemaphore at 8162e4b8
#6 [882a199b3cf0] __do_page_fault at 81640fce
#7 [882a199b3d48] do_page_fault at 81641113
#8 [882a199b3d70] page_fault at 8163d408
[exception RIP: process_one_work+49]
RIP: 8109d4b1 RSP: 882a199b3e28 RFLAGS: 00010046
RAX: RBX: 88088b273028 RCX: 882a199b3fd8
RDX: RSI: 88088b273028 RDI: 88088b273000
RBP: 882a199b3e60 R8: R9: 0770
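
In case it helps anyone reading the oops, here is a small sketch for locating the faulting instruction in the "Code:" line. It relies on the kernel convention that the byte wrapped in <...> is the one at the faulting RIP. The helper name split_code_line is made up for illustration, and the bytes are copied from the oops above. Read by hand, the four bytes at the marker (49 8b 46 08) look like "mov 0x8(%r14),%rax", which would match the fault address of 0008 if R14 was zero; please treat that decoding as unverified until it is checked against a disassembly of the vmlinux.

```python
# Code: line copied verbatim from the oops above.
CODE_LINE = (
    "48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 "
    "48 83 ec 10 48 8b 06 4c 8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 "
    "<49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44 89 f8 83 e0 01"
)


def split_code_line(line):
    """Split an oops Code: dump at the <..> marker.

    Returns (bytes before the faulting RIP, bytes from the faulting RIP on).
    Hypothetical helper, not part of any kernel tooling.
    """
    tokens = line.split()
    # Index of the token wrapped in angle brackets (the byte at RIP).
    idx = next(i for i, t in enumerate(tokens) if t.startswith("<"))

    def to_int(tok):
        return int(tok.strip("<>"), 16)

    before = bytes(to_int(t) for t in tokens[:idx])
    after = bytes(to_int(t) for t in tokens[idx:])
    return before, after


before, after = split_code_line(CODE_LINE)
fault_offset = 0x31  # from "process_one_work+0x31/0x470"

# Offset into the function at which the dumped bytes begin.
print(f"dump starts at process_one_work+{fault_offset - len(before):#x}")
# First bytes of the faulting instruction, ready to feed to a disassembler.
print("bytes at faulting RIP:", after[:4].hex(" "))
```

The "before" bytes can be disassembled the same way to see how R14 was computed just ahead of the fault.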