Hi Dan

Bisecting shows this issue was introduced by the patch below:

commit f8f6ae5d077a9bdaf5cbf2ac960a5d1a04b47482
Author: Jason Gunthorpe <j...@ziepe.ca>
Date:   Sun Nov 1 17:08:00 2020 -0800

    mm: always have io_remap_pfn_range() set pgprot_decrypted()

    The purpose of io_remap_pfn_range() is to map IO memory, such as a
    memory mapped IO exposed through a PCI BAR.  IO devices do not
    understand encryption, so this memory must always be decrypted.
    Automatically call pgprot_decrypted() as part of the generic
    implementation.

    This fixes a bug where enabling AMD SME causes subsystems, such as RDMA,
    using io_remap_pfn_range() to expose BAR pages to user space to fail.
    The CPU will encrypt access to those BAR pages instead of passing
    unencrypted IO directly to the device.

    Places not mapping IO should use remap_pfn_range().


On 11/9/20 10:38 AM, Yi Zhang wrote:
Hello

I found this regression during a devdax fio test on 5.10.0-rc3. Could anyone help
check it? Thanks.

[  303.441089] memmap_init_zone_device initialised 2063872 pages in 34ms
[  303.501085] memmap_init_zone_device initialised 2063872 pages in 34ms
[  303.556891] memmap_init_zone_device initialised 2063872 pages in 24ms
[  303.612790] memmap_init_zone_device initialised 2063872 pages in 24ms
[  326.779920] perf: interrupt took too long (2714 > 2500), lowering 
kernel.perf_event_max_sample_rate to 73000
[  334.857133] perf: interrupt took too long (3737 > 3392), lowering 
kernel.perf_event_max_sample_rate to 53000
[  366.202597] memmap_init_zone_device initialised 1835008 pages in 21ms
[  366.255031] memmap_init_zone_device initialised 1835008 pages in 22ms
[  366.317048] memmap_init_zone_device initialised 1835008 pages in 31ms
[  366.377970] memmap_init_zone_device initialised 1835008 pages in 32ms
[  368.785285] BUG: Bad page state in process kworker/41:0  pfn:891066
[  368.818471] page:00000000581ab220 refcount:0 mapcount:-1024 
mapping:0000000000000000 index:0x0 pfn:0x891066
[  368.865117] flags: 0x57ffffc0000000()
[  368.882138] raw: 0057ffffc0000000 dead000000000100 dead000000000122 
0000000000000000
[  368.917429] raw: 0000000000000000 0000000000000000 00000000fffffbff 
0000000000000000
[  368.952788] page dumped because: nonzero mapcount
[  368.974190] Modules linked in: rfkill sunrpc vfat fat dm_multipath 
intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp 
coretemp mgag200 ipmi_ssif i2c_algo_bit kvm_intel drm_kms_helper syscopyarea 
acpi_ipmi sysfillrect kvm sysimgblt ipmi_si fb_sys_fops iTCO_wdt 
iTCO_vendor_support ipmi_devintf drm irqbypass crct10dif_pclmul ipmi_msghandler 
crc32_pclmul i2c_i801 ghash_clmulni_intel dax_pmem_compat rapl device_dax 
i2c_smbus intel_cstate ioatdma intel_uncore joydev hpilo dax_pmem_core pcspkr 
acpi_tad hpwdt lpc_ich dca acpi_power_meter ip_tables xfs sr_mod cdrom sd_mod 
t10_pi sg nd_pmem nd_btt ahci nfit bnx2x libahci libata tg3 libnvdimm hpsa mdio 
libcrc32c scsi_transport_sas wmi crc32c_intel dm_mirror dm_region_hash dm_log 
dm_mod
[  369.281195] CPU: 41 PID: 3258 Comm: kworker/41:0 Tainted: G S                
5.10.0-rc3 #1
[  369.321037] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS 
P89 10/05/2016
[  369.363640] Workqueue: mm_percpu_wq vmstat_update
[  369.385044] Call Trace:
[  369.388275] perf: interrupt took too long (5477 > 4671), lowering 
kernel.perf_event_max_sample_rate to 36000
[  369.396225]  dump_stack+0x57/0x6a
[  369.411391]  bad_page.cold.114+0x9b/0xa0
[  369.429316]  free_pcppages_bulk+0x538/0x760
[  369.448465]  drain_zone_pages+0x1f/0x30
[  369.466027]  refresh_cpu_vm_stats+0x1ea/0x2b0
[  369.485972]  vmstat_update+0xf/0x50
[  369.502064]  process_one_work+0x1a4/0x340
[  369.520412]  ? process_one_work+0x340/0x340
[  369.539510]  worker_thread+0x30/0x370
[  369.555744]  ? process_one_work+0x340/0x340
[  369.574765]  kthread+0x116/0x130
[  369.589612]  ? kthread_park+0x80/0x80
[  369.606231]  ret_from_fork+0x22/0x30
[  369.622910] Disabling lock debugging due to kernel taint
[  393.619285] perf: interrupt took too long (6874 > 6846), lowering 
kernel.perf_event_max_sample_rate to 29000
[  397.904036] BUG: Bad page state in process kworker/57:1  pfn:189525
[  397.936971] page:00000000be782875 refcount:0 mapcount:-1024 
mapping:0000000000000000 index:0x0 pfn:0x189525
[  397.984722] flags: 0x17ffffc0000000()
[  398.002324] raw: 0017ffffc0000000 dead000000000100 dead000000000122 
0000000000000000
[  398.039032] raw: 0000000000000000 0000000000000000 00000000fffffbff 
0000000000000000
[  398.075804] page dumped because: nonzero mapcount
[  398.098130] Modules linked in: rfkill sunrpc vfat fat dm_multipath 
intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp 
coretemp mgag200 ipmi_ssif i2c_algo_bit kvm_intel drm_kms_helper syscopyarea 
acpi_ipmi sysfillrect kvm sysimgblt ipmi_si fb_sys_fops iTCO_wdt 
iTCO_vendor_support ipmi_devintf drm irqbypass crct10dif_pclmul ipmi_msghandler 
crc32_pclmul i2c_i801 ghash_clmulni_intel dax_pmem_compat rapl device_dax 
i2c_smbus intel_cstate ioatdma intel_uncore joydev hpilo dax_pmem_core pcspkr 
acpi_tad hpwdt lpc_ich dca acpi_power_meter ip_tables xfs sr_mod cdrom sd_mod 
t10_pi sg nd_pmem nd_btt ahci nfit bnx2x libahci libata tg3 libnvdimm hpsa mdio 
libcrc32c scsi_transport_sas wmi crc32c_intel dm_mirror dm_region_hash dm_log 
dm_mod
[  398.413042] CPU: 57 PID: 587 Comm: kworker/57:1 Tainted: G S  B             
5.10.0-rc3 #1
[  398.455914] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS 
P89 10/05/2016
[  398.496657] Workqueue: mm_percpu_wq vmstat_update
[  398.518938] Call Trace:
[  398.530673]  dump_stack+0x57/0x6a
[  398.546463]  bad_page.cold.114+0x9b/0xa0
[  398.564977]  free_pcppages_bulk+0x538/0x760
[  398.584697]  drain_zone_pages+0x1f/0x30
[  398.602907]  refresh_cpu_vm_stats+0x1ea/0x2b0
[  398.623681]  vmstat_update+0xf/0x50
[  398.640415]  process_one_work+0x1a4/0x340
[  398.659517]  ? process_one_work+0x340/0x340
[  398.678659]  worker_thread+0x30/0x370
[  398.695506]  ? process_one_work+0x340/0x340
[  398.715204]  kthread+0x116/0x130
[  398.730572]  ? kthread_park+0x80/0x80
[  398.747761]  ret_from_fork+0x22/0x30




Best Regards,
   Yi Zhang

_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org