ping
This issue can still be reproduced on 5.10.0-rc4:

[ 1914.356562] BUG: Bad page state in process kworker/58:0  pfn:1fadf5
[ 1914.390159] page:00000000fee4d2a1 refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x1fadf5
[ 1914.436292] flags: 0x17ffffc0000000()
[ 1914.452792] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[ 1914.488322] raw: 0000000000000000 0000000000000000 00000000fffffbff 0000000000000000
[ 1914.523625] page dumped because: nonzero mapcount
[ 1914.544972] Modules linked in: dm_log_writes loop ext4 mbcache jbd2 rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass mgag200 crct10dif_pclmul i2c_algo_bit drm_kms_helper syscopyarea crc32_pclmul ghash_clmulni_intel iTCO_wdt sysfillrect sysimgblt rapl fb_sys_fops intel_cstate iTCO_vendor_support drm dax_pmem_compat ipmi_ssif device_dax intel_uncore pcspkr dax_pmem_core i2c_i801 lpc_ich acpi_ipmi ipmi_si joydev ipmi_devintf acpi_tad ipmi_msghandler hpilo hpwdt i2c_smbus ioatdma acpi_power_meter dca ip_tables xfs sr_mod cdrom sd_mod t10_pi sg nd_pmem nd_btt ahci bnx2x nfit libahci libata tg3 libnvdimm hpsa mdio libcrc32c scsi_transport_sas crc32c_intel wmi dm_mirror dm_region_hash dm_log dm_mod
[ 1914.862181] CPU: 58 PID: 14617 Comm: kworker/58:0 Tainted: G S  B             5.10.0-rc4 #1
[ 1914.903469] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
[ 1914.945189] Workqueue: mm_percpu_wq vmstat_update
[ 1914.966350] Call Trace:
[ 1914.977331]  dump_stack+0x57/0x6a
[ 1914.992193]  bad_page.cold.114+0x9b/0xa0
[ 1915.009908]  free_pcppages_bulk+0x538/0x760
[ 1915.029226]  drain_zone_pages+0x1f/0x30
[ 1915.046526]  refresh_cpu_vm_stats+0x1ea/0x2b0
[ 1915.066113]  vmstat_update+0xf/0x50
[ 1915.081784]  process_one_work+0x1a4/0x340
[ 1915.099858]  ? process_one_work+0x340/0x340
[ 1915.118741]  worker_thread+0x30/0x370
[ 1915.135268]  ? process_one_work+0x340/0x340
[ 1915.154211]  kthread+0x116/0x130
[ 1915.168771]  ? kthread_park+0x80/0x80
[ 1915.185635]  ret_from_fork+0x22/0x30
[ 1972.063440] restraintd[2377]: *** Current Time: Mon Nov 16 00:56:57 2020  Localwatchdog at: Mon Nov 16 02:55:57 2020
[ 1976.501706] BUG: Bad page state in process kworker/4:0  pfn:a24692
[ 1976.532586] page:00000000f000e4ba refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0xa24692
[ 1976.581869] flags: 0x57ffffc0000000()
[ 1976.599064] raw: 0057ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[ 1976.635786] raw: 0000000000000000 0000000000000000 00000000fffffbff 0000000000000000
[ 1976.671862] page dumped because: nonzero mapcount
[ 1976.694287] Modules linked in: dm_log_writes loop ext4 mbcache jbd2 rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass mgag200 crct10dif_pclmul i2c_algo_bit drm_kms_helper syscopyarea crc32_pclmul ghash_clmulni_intel iTCO_wdt sysfillrect sysimgblt rapl fb_sys_fops intel_cstate iTCO_vendor_support drm dax_pmem_compat ipmi_ssif device_dax intel_uncore pcspkr dax_pmem_core i2c_i801 lpc_ich acpi_ipmi ipmi_si joydev ipmi_devintf acpi_tad ipmi_msghandler hpilo hpwdt i2c_smbus ioatdma acpi_power_meter dca ip_tables xfs sr_mod cdrom sd_mod t10_pi sg nd_pmem nd_btt ahci bnx2x nfit libahci libata tg3 libnvdimm hpsa mdio libcrc32c scsi_transport_sas crc32c_intel wmi dm_mirror dm_region_hash dm_log dm_mod
[ 1977.024006] CPU: 4 PID: 23471 Comm: kworker/4:0 Tainted: G S  B             5.10.0-rc4 #1
[ 1977.067069] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
[ 1977.106156] Workqueue: mm_percpu_wq vmstat_update
[ 1977.128645] Call Trace:
[ 1977.140263]  dump_stack+0x57/0x6a
[ 1977.155844]  bad_page.cold.114+0x9b/0xa0
[ 1977.174451]  free_pcppages_bulk+0x538/0x760
[ 1977.194417]  drain_zone_pages+0x1f/0x30
[ 1977.212748]  refresh_cpu_vm_stats+0x1ea/0x2b0
[ 1977.233450]  vmstat_update+0xf/0x50
[ 1977.249779]  process_one_work+0x1a4/0x340
[ 1977.268797]  ? process_one_work+0x340/0x340
[ 1977.288564]  worker_thread+0x30/0x370
[ 1977.306138]  ? process_one_work+0x340/0x340
[ 1977.326017]  kthread+0x116/0x130
[ 1977.341274]  ? kthread_park+0x80/0x80
[ 1977.358649]  ret_from_fork+0x22/0x30
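
As a reading aid for the dumps above, here is a minimal sketch of how the reported mapcount:-1024 can be recovered from the second raw line. It assumes the usual x86_64 struct page layout, where the low 32 bits of the seventh 64-bit word hold _mapcount (which shares storage with page_type) and the high 32 bits hold _refcount, and that the dump prints _mapcount + 1. The helper name decode_mapcount is hypothetical, and the PG_table value 0x400 is an assumption taken from a reading of include/linux/page-flags.h in 5.10, not something stated in this thread.

```python
# Second "raw:" line of the first dump above (words 4..7 of struct page).
RAW_LINE = "raw: 0000000000000000 0000000000000000 00000000fffffbff 0000000000000000"

# Assumption: PG_table as defined in 5.10's include/linux/page-flags.h.
PG_TABLE = 0x00000400


def decode_mapcount(raw_line):
    """Return (refcount, mapcount, cleared_bits) from the second raw line."""
    words = [int(w, 16) for w in raw_line.split("raw:")[1].split()]
    word = words[2]              # assumed to hold _mapcount and _refcount
    low32 = word & 0xFFFFFFFF    # _mapcount / page_type (little-endian)
    refcount = word >> 32        # _refcount
    # Reinterpret the low half as a signed 32-bit integer.
    signed = low32 - (1 << 32) if low32 & 0x80000000 else low32
    # The dump reports _mapcount + 1; page types are stored by clearing bits.
    cleared_bits = ~low32 & 0xFFFFFFFF
    return refcount, signed + 1, cleared_bits


refcount, mapcount, cleared = decode_mapcount(RAW_LINE)
print(refcount, mapcount, hex(cleared), cleared == PG_TABLE)
```

If the PG_table assumption holds, the single cleared bit would hint that the page was still typed as a page table when it was freed, but that is only a hint and not a confirmed diagnosis.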



On 11/11/20 11:44 AM, Yi Zhang wrote:
Add Ralph


Hi Dan/Jason

It turns out that it was introduced by the patch below [1], which fixed the "static key devmap_managed_key" issue but introduced [2].
I eventually found that it does not reproduce 100% of the time; sorry for my mistake.

[1]
commit 46b1ee38b2ba1a9524c8e886ad078bd3ca40de2a (HEAD)
Author: Ralph Campbell <rcampb...@nvidia.com>
Date:   Sun Nov 1 17:07:23 2020 -0800

    mm/mremap_pages: fix static key devmap_managed_key updates

[2]
[ 1129.792673] memmap_init_zone_device initialised 2063872 pages in 34ms
[ 1129.865469] memmap_init_zone_device initialised 2063872 pages in 34ms
[ 1129.924080] memmap_init_zone_device initialised 2063872 pages in 24ms
[ 1129.987160] memmap_init_zone_device initialised 2063872 pages in 25ms
[ 1170.785114] BUG: Bad page state in process kworker/67:2 pfn:189e3e
[ 1170.815859] page:000000002f5fe047 refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x189e3e
[ 1170.864772] flags: 0x17ffffc0000000()
[ 1170.883291] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[ 1170.920537] raw: 0000000000000000 0000000000000000 00000000fffffbff 0000000000000000
[ 1170.957627] page dumped because: nonzero mapcount
[ 1170.980101] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_ssc fscache rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass mgag200 crct10dif_pclmul iTCO_wdt i2c_algo_bit crc32_pclmul iTCO_vendor_support drm_kms_helper syscopyarea acpi_ipmi ghash_clmulni_intel sysfillrect ipmi_si rapl sysimgblt fb_sys_fops i2c_i801 ipmi_devintf drm ipmi_msghandler intel_cstate intel_uncore dax_pmem_compat device_dax ioatdma i2c_smbus acpi_tad joydev dax_pmem_core pcspkr hpwdt lpc_ich acpi_power_meter hpilo dca ip_tables xfs sr_mod cdrom sd_mod t10_pi sg nd_pmem nd_btt ahci bnx2x libahci nfit libata tg3 libnvdimm hpsa mdio scsi_transport_sas libcrc32c wmi crc32c_intel dm_mirror dm_region_hash dm_log dm_mod
[ 1171.332281] CPU: 67 PID: 2700 Comm: kworker/67:2 Tainted: G S                5.10.0-rc2.46b1ee38b2ba+ #4
[ 1171.378334] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
[ 1171.419774] Workqueue: mm_percpu_wq vmstat_update
[ 1171.442726] Call Trace:
[ 1171.454481]  dump_stack+0x57/0x6a
[ 1171.470597]  bad_page.cold.114+0x9b/0xa0
[ 1171.489841]  free_pcppages_bulk+0x538/0x760
[ 1171.509124]  drain_zone_pages+0x1f/0x30
[ 1171.527649]  refresh_cpu_vm_stats+0x1ea/0x2b0
[ 1171.548935]  vmstat_update+0xf/0x50
[ 1171.565961]  process_one_work+0x1a4/0x340
[ 1171.585142]  ? process_one_work+0x340/0x340
[ 1171.605147]  worker_thread+0x30/0x370
[ 1171.622603]  ? process_one_work+0x340/0x340
[ 1171.642355]  kthread+0x116/0x130
[ 1171.657519]  ? kthread_park+0x80/0x80
[ 1171.674713]  ret_from_fork+0x22/0x30
[ 1171.691291] Disabling lock debugging due to kernel taint

How confident are you in the bisection?

Jason

_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
