On Thu, Jan 12, 2023 at 05:21:30PM +0000, Jonathan Cameron wrote:
> On Thu, 12 Jan 2023 10:39:17 -0500
> Gregory Price <gregory.pr...@memverge.com> wrote:
>
> > On Wed, Jan 11, 2023 at 02:24:32PM +0000, Jonathan Cameron via wrote:
> > > Gregory's patches were posted as part of his work on adding volatile
> > > support.
> > > https://lore.kernel.org/linux-cxl/20221006233702.18532-1-gregory.pr...@memverge.com/
> > > https://lore.kernel.org/linux-cxl/20221128150157.97724-2-gregory.pr...@memverge.com/
> > > I might propose this for upstream inclusion this cycle, but testing is
> > > currently limited by lack of suitable kernel support.
> >
> > fwiw the testing i've done suggests the problem isn't necessarily the
> > implementation so much as either the EFI support or the ACPI tables.
> >
> > For example, we see memory expanders come up no problem and turn into
> > volatile memory on real hardware, with the same kernels with just a few
> > commands. My gut feeling is that either a mailbox command is missing or
> > that the ACPI tables are missing/significantly different.
> >
> > I haven't been able to investigate further at this point, but that's my
> > current state with the volatile type-3 device testing.
>
> My assumption was that all shipping hardware platforms were doing the
> enumeration and bring up of memory expanders in the BIOS / firmware.
> Those are then presented to the OS already set up exactly as if they were
> normal memory. We could do the same on QEMU but that means a lot of
> work in EDK2. Note that it makes no sense to do the enumeration and
> creation of ACPI tables in QEMU itself though could hack it like that.
> This stuff is done in firmware because that enables it for legacy
> OSes. Everything is more or less presented to the OS like you would
> present RAM (EFI memory map, ACPI tables etc).
>
> Firmware enumeration doesn't typically support hotplug, so if we add
> support for hotplug of volatile memory type 3 devices to the kernel
> we will also be able to do 'cold plug' and have the kernel bring them up
> in a similar fashion to what we do for non-volatile (for non volatile there
> is typically no real support in firmware as there is a bunch of policy to
> deal with that doesn't belong in firmware). (simplifying heavily ;)
>
> So I don't think we are missing anything in the emulation, just in the
> software layers above it. Could be wrong though ;)
>
> Jonathan
>
>

I'm not so sure something is missing so much as something seems
incorrect in either the ACPI table structure definitions, the mailbox,
or even the DOE emulation.

I took your branch and reverted to just prior to the volatile patch
(reference: 59a59ef725699e0efb3e9e31a7f8d246de7286ed).

QEMU configuration for boot (please let me know if something is wrong):

sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
    -drive file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
    -m 2G,slots=4,maxmem=4G \
    -smp 4 \
    -machine type=q35,accel=kvm,cxl=on \
    -enable-kvm \
    -nographic \
    -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
    -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \
    -object memory-backend-file,pmem=true,id=cxl-mem0,mem-path=/tmp/cxl-mem0,size=1G \
    -object memory-backend-file,pmem=true,id=lsa0,mem-path=/tmp/cxl-lsa0,size=1G \
    -device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 \
    -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G

After boot we find:

[root@fedora ~]# ls /sys/bus/cxl/devices/
decoder0.0  decoder2.0  mem0            pmem0  root0
decoder1.0  endpoint2   nvdimm-bridge0  port1

[root@fedora ~]# ls -al /sys/bus/dax/devices/
total 0
drwxr-xr-x. 2 root root 0 Jan 12 22:44 .
drwxr-xr-x. 4 root root 0 Jan 12 22:44 ..

During boot, I am seeing three separate call traces, all of which
appear to be related to PCI DOE and/or getting CDAT information.

[ 3.916900] Call Trace:
[ 3.916906] <TASK>
[ 3.931217] pci_doe_submit_task+0x5d/0xd0
[ 3.936609] pci_doe_discovery+0xb4/0x100
[ 3.936627] ? pci_doe_xa_destroy+0x10/0x10
[ 3.942675] pcim_doe_create_mb+0x219/0x290
[ 3.950506] cxl_pci_probe+0x192/0x430
[ 3.960248] local_pci_probe+0x41/0x80
[ 3.966564] pci_device_probe+0xb3/0x220
[ 3.966579] really_probe+0xde/0x380
[ 3.966583] ? pm_runtime_barrier+0x50/0x90
[ 3.969158] __driver_probe_device+0x78/0x170
[ 3.969167] driver_probe_device+0x1f/0x90
[ 3.978264] __driver_attach_async_helper+0x5c/0xe0
[ 3.983953] async_run_entry_fn+0x30/0x130
[ 3.991084] process_one_work+0x294/0x5b0
[ 4.004458] worker_thread+0x4f/0x3a0
[ 4.012612] ? process_one_work+0x5b0/0x5b0
[ 4.019114] kthread+0xf5/0x120
[ 4.025133] ? kthread_complete_and_exit+0x20/0x20
[ 4.031327] ret_from_fork+0x22/0x30
[ 4.038969] </TASK>

[ 16.047704] pci_doe_submit_task+0x5d/0xd0
[ 16.047713] cxl_cdat_get_length+0xb8/0x110
[ 16.047779] ? dvsec_range_allowed+0x60/0x60
[ 16.047803] read_cdat_data+0xaf/0x1a0
[ 16.047814] cxl_port_probe+0x80/0x120
[ 16.047824] cxl_bus_probe+0x17/0x50
[ 16.047830] really_probe+0xde/0x380
[ 16.047835] ? pm_runtime_barrier+0x50/0x90
[ 16.047843] __driver_probe_device+0x78/0x170
[ 16.047851] driver_probe_device+0x1f/0x90
[ 16.047858] __device_attach_driver+0x85/0x110
[ 16.047881] ? driver_allows_async_probing+0x70/0x70
[ 16.047884] bus_for_each_drv+0x7a/0xb0
[ 16.047896] __device_attach+0xb3/0x1d0
[ 16.047907] bus_probe_device+0x9f/0xc0
[ 16.047913] device_add+0x41e/0x9b0
[ 16.047918] ? kobject_set_name_vargs+0x6d/0x90
[ 16.047928] ? dev_set_name+0x4b/0x60
[ 16.047944] devm_cxl_add_port+0x27b/0x3b0
[ 16.047970] devm_cxl_add_endpoint+0x82/0x130
[ 16.047982] cxl_mem_probe+0xc4/0x11d [cxl_mem]
[ 16.047997] cxl_bus_probe+0x17/0x50
[ 16.048003] really_probe+0xde/0x380
[ 16.048007] ? pm_runtime_barrier+0x50/0x90
[ 16.048014] __driver_probe_device+0x78/0x170
[ 16.048022] driver_probe_device+0x1f/0x90
[ 16.048029] __driver_attach+0xd5/0x1d0
[ 16.048036] ? __device_attach_driver+0x110/0x110
[ 16.048040] bus_for_each_dev+0x76/0xa0
[ 16.048051] bus_add_driver+0x1b1/0x200
[ 16.048061] driver_register+0x89/0xe0
[ 16.048066] ? 0xffffffffc056e000
[ 16.048070] do_one_initcall+0x6e/0x320
[ 16.048091] do_init_module+0x4a/0x200
[ 16.048099] __do_sys_init_module+0x16a/0x1a0
[ 16.048132] do_syscall_64+0x5b/0x80
[ 16.048138] ? lock_is_held_type+0xe8/0x140
[ 16.048148] ? asm_exc_page_fault+0x22/0x30
[ 16.048156] ? lockdep_hardirqs_on+0x7d/0x100
[ 16.048162] entry_SYSCALL_64_after_hwframe+0x63/0xcd

[ 16.054601] pci_doe_submit_task+0x5d/0xd0
[ 16.054610] cxl_cdat_read_table.isra.0+0x141/0x190
[ 16.054660] ? dvsec_range_allowed+0x60/0x60
[ 16.054685] read_cdat_data+0xfc/0x1a0
[ 16.054695] cxl_port_probe+0x80/0x120
[ 16.054706] cxl_bus_probe+0x17/0x50
[ 16.054712] really_probe+0xde/0x380
[ 16.054717] ? pm_runtime_barrier+0x50/0x90
[ 16.054725] __driver_probe_device+0x78/0x170
[ 16.054733] driver_probe_device+0x1f/0x90
[ 16.054739] __device_attach_driver+0x85/0x110
[ 16.054747] ? driver_allows_async_probing+0x70/0x70
[ 16.054751] bus_for_each_drv+0x7a/0xb0
[ 16.054767] __device_attach+0xb3/0x1d0
[ 16.054782] bus_probe_device+0x9f/0xc0
[ 16.054791] device_add+0x41e/0x9b0
[ 16.054798] ? kobject_set_name_vargs+0x6d/0x90
[ 16.054811] ? dev_set_name+0x4b/0x60
[ 16.054831] devm_cxl_add_port+0x27b/0x3b0
[ 16.054843] devm_cxl_add_endpoint+0x82/0x130
[ 16.054854] cxl_mem_probe+0xc4/0x11d [cxl_mem]
[ 16.054869] cxl_bus_probe+0x17/0x50
[ 16.054875] really_probe+0xde/0x380
[ 16.054879] ? pm_runtime_barrier+0x50/0x90
[ 16.054887] __driver_probe_device+0x78/0x170
[ 16.054894] driver_probe_device+0x1f/0x90
[ 16.054901] __driver_attach+0xd5/0x1d0
[ 16.054908] ? __device_attach_driver+0x110/0x110
[ 16.054912] bus_for_each_dev+0x76/0xa0
[ 16.054923] bus_add_driver+0x1b1/0x200
[ 16.055204] driver_register+0x89/0xe0
[ 16.055211] ? 0xffffffffc056e000
[ 16.055215] do_one_initcall+0x6e/0x320
[ 16.055237] do_init_module+0x4a/0x200
[ 16.055245] __do_sys_init_module+0x16a/0x1a0
[ 16.055277] do_syscall_64+0x5b/0x80
[ 16.055283] ? lock_is_held_type+0xe8/0x140
[ 16.055294] ? asm_exc_page_fault+0x22/0x30
[ 16.055301] ? lockdep_hardirqs_on+0x7d/0x100
[ 16.055307] entry_SYSCALL_64_after_hwframe+0x63/0xcd
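
For anyone who wants to check mechanically that the three traces share the DOE submission path, here is a rough sketch in Python. It assumes each trace has been saved as its own block of dmesg text; the helper names reliable_frames and common_frames are hypothetical and not part of any kernel or QEMU tooling. Frames prefixed with "? " are skipped, since the unwinder marks those as unreliable.

```python
import re

# Matches a stack frame such as "pci_doe_submit_task+0x5d/0xd0".
# Group 1 captures an optional leading "? " (unreliable frame),
# group 2 captures the symbol name (may include ".isra.0" suffixes).
FRAME_RE = re.compile(r"(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(trace_text):
    """Return symbol names of frames not marked '?', in call-trace order."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group(1) is None:
            frames.append(match.group(2))
    return frames


def common_frames(traces):
    """Return symbols present in every trace, ordered as in the first trace."""
    per_trace = [reliable_frames(text) for text in traces]
    if not per_trace:
        return []
    shared = set(per_trace[0])
    for frames in per_trace[1:]:
        shared &= set(frames)
    ordered, seen = [], set()
    for name in per_trace[0]:
        if name in shared and name not in seen:
            ordered.append(name)
            seen.add(name)
    return ordered
```

Passing the three traces above to common_frames should leave pci_doe_submit_task plus the generic driver-core probe frames, which is consistent with the DOE mailbox being the common element rather than any one CDAT read step.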