On 11/10/20 3:36 PM, Yi Zhang wrote:
On 11/10/20 8:36 AM, Jason Gunthorpe wrote:
On Mon, Nov 09, 2020 at 01:54:42PM -0400, Jason Gunthorpe wrote:
On Mon, Nov 09, 2020 at 09:26:19AM -0800, Dan Williams wrote:
On Mon, Nov 9, 2020 at 6:12 AM Jason Gunthorpe <j...@ziepe.ca> wrote:
Wow, this is surprising
This has been widely backported already, Dan please check??
I thought pgprot_decrypted was a NOP on most x86 platforms -
sme_me_mask == 0:
#define __sme_set(x) ((x) | sme_me_mask)
#define __sme_clr(x) ((x) & ~sme_me_mask)
??
Confused how this can be causing DAX issues
Does that correctly preserve the "soft" pte bits? Especially
PTE_DEVMAP that DAX uses?
I'll check...
extern u64 sme_me_mask;
#define __pgprot(x) ((pgprot_t) { (x) } )
#define pgprot_val(x) ((x).pgprot)
#define __sme_clr(x) ((x) & ~sme_me_mask)
#define pgprot_decrypted(prot) __pgprot(__sme_clr(pgprot_val(prot)))
static inline int io_remap_pfn_range(struct vm_area_struct *vma,
                                     unsigned long addr, unsigned long pfn,
                                     unsigned long size, pgprot_t prot)
{
        return remap_pfn_range(vma, addr, pfn, size,
                               pgprot_decrypted(prot));
}
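To make the reasoning above concrete, here is a user-space sketch that copies the quoted macros and checks what pgprot_decrypted() does to a protection value. The bit positions are assumptions for illustration only: DEMO_PAGE_DEVMAP stands in for the x86 software bit used for devmap (assumed here to be bit 58), and DEMO_SME_CBIT stands in for the SME encryption bit, whose real position is CPU-specific (47 is just an assumed value). DEMO_* and decrypted_bits() are hypothetical names. With sme_me_mask == 0 the macro is an identity, so soft bits survive; with a nonzero mask only the mask bit is cleared.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* User-space stand-ins for the kernel definitions quoted above. */
typedef struct { u64 pgprot; } pgprot_t;

static u64 sme_me_mask; /* 0 unless the kernel has enabled AMD SME */

#define __pgprot(x) ((pgprot_t) { (x) } )
#define pgprot_val(x) ((x).pgprot)
#define __sme_clr(x) ((x) & ~sme_me_mask)
#define pgprot_decrypted(prot) __pgprot(__sme_clr(pgprot_val(prot)))

/* Hypothetical bit positions, for illustration only. */
#define DEMO_PAGE_PRESENT (1ULL << 0)
#define DEMO_PAGE_RW      (1ULL << 1)
#define DEMO_PAGE_DEVMAP  (1ULL << 58) /* assumed software (devmap) bit */
#define DEMO_SME_CBIT     (1ULL << 47) /* assumed SME C-bit position */

/* Apply pgprot_decrypted() to 'bits' with the given sme_me_mask. */
static u64 decrypted_bits(u64 mask, u64 bits)
{
        sme_me_mask = mask;
        u64 out = pgprot_val(pgprot_decrypted(__pgprot(bits)));

        /* Clearing a mask can only remove bits, never add them. */
        assert((out & ~bits) == 0);
        return out;
}
```

For example, decrypted_bits(0, DEMO_PAGE_PRESENT | DEMO_PAGE_DEVMAP) returns its input unchanged, which is why the change should be a no-op on a non-SME machine.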
Not seeing how that could change the pgprot in any harmful way?
Yi, are you using a platform where sme_me_mask != 0?
That code looks clearly like it would only trigger on AMD SME systems,
is that what you are using?
Can't be, the system is too old:
[ 398.455914] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
I'm at a total loss how this change could even do anything on a
non-AMD system, let alone how this intersects in any way with DEVDAX,
which I could not find being used with io_remap_pfn_range()
I will double-check it.
Hi Dan/Jason
It turns out that the issue was introduced by the patch below [1], which
fixed the "static key devmap_managed_key" issue but introduced the splat
in [2].
In the end I found that the problem does not reproduce 100% of the time,
so my earlier result was wrong. Sorry for my mistake.
[1]
commit 46b1ee38b2ba1a9524c8e886ad078bd3ca40de2a (HEAD)
Author: Ralph Campbell <rcampb...@nvidia.com>
Date: Sun Nov 1 17:07:23 2020 -0800
mm/mremap_pages: fix static key devmap_managed_key updates
[2]
[ 1129.792673] memmap_init_zone_device initialised 2063872 pages in 34ms
[ 1129.865469] memmap_init_zone_device initialised 2063872 pages in 34ms
[ 1129.924080] memmap_init_zone_device initialised 2063872 pages in 24ms
[ 1129.987160] memmap_init_zone_device initialised 2063872 pages in 25ms
[ 1170.785114] BUG: Bad page state in process kworker/67:2 pfn:189e3e
[ 1170.815859] page:000000002f5fe047 refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x189e3e
[ 1170.864772] flags: 0x17ffffc0000000()
[ 1170.883291] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[ 1170.920537] raw: 0000000000000000 0000000000000000 00000000fffffbff 0000000000000000
[ 1170.957627] page dumped because: nonzero mapcount
[ 1170.980101] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_ssc fscache rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass mgag200 crct10dif_pclmul iTCO_wdt i2c_algo_bit crc32_pclmul iTCO_vendor_support drm_kms_helper syscopyarea acpi_ipmi ghash_clmulni_intel sysfillrect ipmi_si rapl sysimgblt fb_sys_fops i2c_i801 ipmi_devintf drm ipmi_msghandler intel_cstate intel_uncore dax_pmem_compat device_dax ioatdma i2c_smbus acpi_tad joydev dax_pmem_core pcspkr hpwdt lpc_ich acpi_power_meter hpilo dca ip_tables xfs sr_mod cdrom sd_mod t10_pi sg nd_pmem nd_btt ahci bnx2x libahci nfit libata tg3 libnvdimm hpsa mdio scsi_transport_sas libcrc32c wmi crc32c_intel dm_mirror dm_region_hash dm_log dm_mod
[ 1171.332281] CPU: 67 PID: 2700 Comm: kworker/67:2 Tainted: G S 5.10.0-rc2.46b1ee38b2ba+ #4
[ 1171.378334] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
[ 1171.419774] Workqueue: mm_percpu_wq vmstat_update
[ 1171.442726] Call Trace:
[ 1171.454481] dump_stack+0x57/0x6a
[ 1171.470597] bad_page.cold.114+0x9b/0xa0
[ 1171.489841] free_pcppages_bulk+0x538/0x760
[ 1171.509124] drain_zone_pages+0x1f/0x30
[ 1171.527649] refresh_cpu_vm_stats+0x1ea/0x2b0
[ 1171.548935] vmstat_update+0xf/0x50
[ 1171.565961] process_one_work+0x1a4/0x340
[ 1171.585142] ? process_one_work+0x340/0x340
[ 1171.605147] worker_thread+0x30/0x370
[ 1171.622603] ? process_one_work+0x340/0x340
[ 1171.642355] kthread+0x116/0x130
[ 1171.657519] ? kthread_park+0x80/0x80
[ 1171.674713] ret_from_fork+0x22/0x30
[ 1171.691291] Disabling lock debugging due to kernel taint
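The mapcount:-1024 in the splat can be read straight off the raw dump. This sketch assumes the 5.10 x86-64 struct page layout, where the third word of the second raw line holds the 32-bit _mapcount (low half) and _refcount (high half) on a little-endian machine, and that dump_page() prints _mapcount + 1. struct decoded_counts and decode_count_word() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

struct decoded_counts {
        int32_t raw_mapcount; /* page->_mapcount as stored */
        int32_t mapcount;     /* value printed by dump_page(): _mapcount + 1 */
        int32_t refcount;     /* page->_refcount */
};

/* Split one raw 64-bit word (assumed layout: _mapcount low, _refcount high). */
static struct decoded_counts decode_count_word(uint64_t word)
{
        struct decoded_counts d;

        /* Two's-complement reinterpretation of the 32-bit halves. */
        d.raw_mapcount = (int32_t)(uint32_t)(word & 0xffffffffu);
        d.refcount = (int32_t)(uint32_t)(word >> 32);
        d.mapcount = d.raw_mapcount + 1;
        return d;
}
```

Decoding 0x00000000fffffbff gives a stored _mapcount of -1025 and a refcount of 0. The free path expects a stored _mapcount of -1 (no mappings), so any other value is reported as "nonzero mapcount".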
How confident are you in the bisection?
Jason
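Jason's question matters because Yi notes the splat does not reproduce on every run, so one clean run at a bisection step is weak evidence that the step is good. A rough sketch, assuming each run independently hits the bug with a fixed probability p (the thread gives no actual rate), computes how many consecutive clean runs are needed before the chance of having missed the bug, (1 - p)^n, drops to at most alpha. runs_needed() is a hypothetical helper.

```c
#include <assert.h>

/*
 * Smallest n such that (1 - p)^n <= alpha, i.e. the number of
 * consecutive clean runs needed before marking a bisection step "good".
 * Assumes independent runs, each hitting the bug with probability p.
 */
static int runs_needed(double p, double alpha)
{
        assert(p > 0.0 && p <= 1.0);
        assert(alpha > 0.0 && alpha < 1.0);

        double miss = 1.0; /* probability that all n runs came out clean */
        int n = 0;

        while (miss > alpha) {
                miss *= 1.0 - p;
                n++;
        }
        return n;
}
```

For example, if the splat showed up in about half of the runs, runs_needed(0.5, 0.05) returns 5: four clean runs would still leave a 6.25% chance of having missed it.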
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org