On Thu, Jan 12, 2023 at 05:21:30PM +0000, Jonathan Cameron wrote:
> On Thu, 12 Jan 2023 10:39:17 -0500
> Gregory Price <gregory.pr...@memverge.com> wrote:
> 
> > On Wed, Jan 11, 2023 at 02:24:32PM +0000, Jonathan Cameron wrote:
> > > Gregory's patches were posted as part of his work on adding volatile 
> > > support.
> > > https://lore.kernel.org/linux-cxl/20221006233702.18532-1-gregory.pr...@memverge.com/
> > > https://lore.kernel.org/linux-cxl/20221128150157.97724-2-gregory.pr...@memverge.com/
> > > I might propose this for upstream inclusion this cycle, but testing is
> > > currently limited by lack of suitable kernel support.  
> > 
> > fwiw the testing i've done suggests the problem isn't necessarily the
> > implementation so much as either the EFI support or the ACPI tables.
> > 
> > For example, we see memory expanders come up no problem and turn into
> > volatile memory on real hardware, with the same kernels with just a few
> > commands.  My gut feeling is that either a mailbox command is missing or
> > that the ACPI tables are missing/significantly different.
> > 
> > I haven't been able to investigate further at this point, but that's my
> > > current state with the volatile type-3 device testing.
> 
> My assumption was that all shipping hardware platforms were doing the
> enumeration and bring up of memory expanders in the BIOS / firmware.
> Those are then presented to the OS already set up exactly as if they were
> normal memory.  We could do the same on QEMU but that means a lot of
> work in EDK2. Note that it makes no sense to do the enumeration and
> creation of ACPI tables in QEMU itself, though we could hack it like that.
> This stuff is done in firmware because that enables it for legacy
> OSes. Everything is more or less presented to the OS like you would
> present RAM (EFI memory map, ACPI tables etc).
> 
> Firmware enumeration doesn't typically support hotplug, so if we add
> support for hotplug of volatile memory type 3 devices to the kernel
> we will also be able to do 'cold plug' and have the kernel bring them up
> in a similar fashion to what we do for non-volatile (for non volatile there
> is typically no real support in firmware as there is a bunch of policy to
> deal with that doesn't belong in firmware). (simplifying heavily ;)
> 
> So I don't think we are missing anything in the emulation, just in the
> software layers above it.  Could be wrong though ;)
> 
> Jonathan
> 
>

I'm not so sure that something is missing; it looks more like something
is incorrect in the ACPI table structure definitions, the mailbox, or
even the DOE emulation.

I took your branch and reverted to just prior to the volatile patch
(reference: 59a59ef725699e0efb3e9e31a7f8d246de7286ed).


QEMU configuration for boot (please let me know if something is wrong):

sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
-drive file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
-m 2G,slots=4,maxmem=4G \
-smp 4 \
-machine type=q35,accel=kvm,cxl=on \
-enable-kvm \
-nographic \
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \
-object memory-backend-file,pmem=true,id=cxl-mem0,mem-path=/tmp/cxl-mem0,size=1G \
-object memory-backend-file,pmem=true,id=lsa0,mem-path=/tmp/cxl-lsa0,size=1G \
-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 \
-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G

After boot we find:

[root@fedora ~]# ls /sys/bus/cxl/devices/
decoder0.0  decoder2.0  mem0            pmem0  root0
decoder1.0  endpoint2   nvdimm-bridge0  port1

[root@fedora ~]# ls -al /sys/bus/dax/devices/
total 0
drwxr-xr-x. 2 root root 0 Jan 12 22:44 .
drwxr-xr-x. 4 root root 0 Jan 12 22:44 ..
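
For anyone reproducing this, the same check can be scripted. This is
only a minimal sketch: the function name count_bus_devices is made up
for illustration, and it assumes nothing beyond the two sysfs paths
listed above.

```shell
#!/bin/sh
# Count the entries under a sysfs bus "devices" directory.
# Prints 0 if the directory is missing or empty.
# (count_bus_devices is a hypothetical helper, not part of any tool.)
count_bus_devices() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo 0
        return
    fi
    # ls -A omits "." and ".."; when piped it prints one entry per line.
    ls -A "$dir" | wc -l | tr -d ' '
}

# Report both buses looked at above.
for bus in cxl dax; do
    echo "$bus: $(count_bus_devices "/sys/bus/$bus/devices") device(s)"
done
```

A dax count of zero next to a non-zero cxl count matches the state
listed above.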


During boot, I am seeing three separate call traces, all of which appear
to be related to PCI DOE and/or getting CDAT information.

[    3.916900] Call Trace:
[    3.916906]  <TASK>
[    3.931217]  pci_doe_submit_task+0x5d/0xd0
[    3.936609]  pci_doe_discovery+0xb4/0x100
[    3.936627]  ? pci_doe_xa_destroy+0x10/0x10
[    3.942675]  pcim_doe_create_mb+0x219/0x290
[    3.950506]  cxl_pci_probe+0x192/0x430
[    3.960248]  local_pci_probe+0x41/0x80
[    3.966564]  pci_device_probe+0xb3/0x220
[    3.966579]  really_probe+0xde/0x380
[    3.966583]  ? pm_runtime_barrier+0x50/0x90
[    3.969158]  __driver_probe_device+0x78/0x170
[    3.969167]  driver_probe_device+0x1f/0x90
[    3.978264]  __driver_attach_async_helper+0x5c/0xe0
[    3.983953]  async_run_entry_fn+0x30/0x130
[    3.991084]  process_one_work+0x294/0x5b0
[    4.004458]  worker_thread+0x4f/0x3a0
[    4.012612]  ? process_one_work+0x5b0/0x5b0
[    4.019114]  kthread+0xf5/0x120
[    4.025133]  ? kthread_complete_and_exit+0x20/0x20
[    4.031327]  ret_from_fork+0x22/0x30
[    4.038969]  </TASK>

[   16.047704]  pci_doe_submit_task+0x5d/0xd0
[   16.047713]  cxl_cdat_get_length+0xb8/0x110
[   16.047779]  ? dvsec_range_allowed+0x60/0x60
[   16.047803]  read_cdat_data+0xaf/0x1a0
[   16.047814]  cxl_port_probe+0x80/0x120
[   16.047824]  cxl_bus_probe+0x17/0x50
[   16.047830]  really_probe+0xde/0x380
[   16.047835]  ? pm_runtime_barrier+0x50/0x90
[   16.047843]  __driver_probe_device+0x78/0x170
[   16.047851]  driver_probe_device+0x1f/0x90
[   16.047858]  __device_attach_driver+0x85/0x110
[   16.047881]  ? driver_allows_async_probing+0x70/0x70
[   16.047884]  bus_for_each_drv+0x7a/0xb0
[   16.047896]  __device_attach+0xb3/0x1d0
[   16.047907]  bus_probe_device+0x9f/0xc0
[   16.047913]  device_add+0x41e/0x9b0
[   16.047918]  ? kobject_set_name_vargs+0x6d/0x90
[   16.047928]  ? dev_set_name+0x4b/0x60
[   16.047944]  devm_cxl_add_port+0x27b/0x3b0
[   16.047970]  devm_cxl_add_endpoint+0x82/0x130
[   16.047982]  cxl_mem_probe+0xc4/0x11d [cxl_mem]
[   16.047997]  cxl_bus_probe+0x17/0x50
[   16.048003]  really_probe+0xde/0x380
[   16.048007]  ? pm_runtime_barrier+0x50/0x90
[   16.048014]  __driver_probe_device+0x78/0x170
[   16.048022]  driver_probe_device+0x1f/0x90
[   16.048029]  __driver_attach+0xd5/0x1d0
[   16.048036]  ? __device_attach_driver+0x110/0x110
[   16.048040]  bus_for_each_dev+0x76/0xa0
[   16.048051]  bus_add_driver+0x1b1/0x200
[   16.048061]  driver_register+0x89/0xe0
[   16.048066]  ? 0xffffffffc056e000
[   16.048070]  do_one_initcall+0x6e/0x320
[   16.048091]  do_init_module+0x4a/0x200
[   16.048099]  __do_sys_init_module+0x16a/0x1a0
[   16.048132]  do_syscall_64+0x5b/0x80
[   16.048138]  ? lock_is_held_type+0xe8/0x140
[   16.048148]  ? asm_exc_page_fault+0x22/0x30
[   16.048156]  ? lockdep_hardirqs_on+0x7d/0x100
[   16.048162]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

[   16.054601]  pci_doe_submit_task+0x5d/0xd0
[   16.054610]  cxl_cdat_read_table.isra.0+0x141/0x190
[   16.054660]  ? dvsec_range_allowed+0x60/0x60
[   16.054685]  read_cdat_data+0xfc/0x1a0
[   16.054695]  cxl_port_probe+0x80/0x120
[   16.054706]  cxl_bus_probe+0x17/0x50
[   16.054712]  really_probe+0xde/0x380
[   16.054717]  ? pm_runtime_barrier+0x50/0x90
[   16.054725]  __driver_probe_device+0x78/0x170
[   16.054733]  driver_probe_device+0x1f/0x90
[   16.054739]  __device_attach_driver+0x85/0x110
[   16.054747]  ? driver_allows_async_probing+0x70/0x70
[   16.054751]  bus_for_each_drv+0x7a/0xb0
[   16.054767]  __device_attach+0xb3/0x1d0
[   16.054782]  bus_probe_device+0x9f/0xc0
[   16.054791]  device_add+0x41e/0x9b0
[   16.054798]  ? kobject_set_name_vargs+0x6d/0x90
[   16.054811]  ? dev_set_name+0x4b/0x60
[   16.054831]  devm_cxl_add_port+0x27b/0x3b0
[   16.054843]  devm_cxl_add_endpoint+0x82/0x130
[   16.054854]  cxl_mem_probe+0xc4/0x11d [cxl_mem]
[   16.054869]  cxl_bus_probe+0x17/0x50
[   16.054875]  really_probe+0xde/0x380
[   16.054879]  ? pm_runtime_barrier+0x50/0x90
[   16.054887]  __driver_probe_device+0x78/0x170
[   16.054894]  driver_probe_device+0x1f/0x90
[   16.054901]  __driver_attach+0xd5/0x1d0
[   16.054908]  ? __device_attach_driver+0x110/0x110
[   16.054912]  bus_for_each_dev+0x76/0xa0
[   16.054923]  bus_add_driver+0x1b1/0x200
[   16.055204]  driver_register+0x89/0xe0
[   16.055211]  ? 0xffffffffc056e000
[   16.055215]  do_one_initcall+0x6e/0x320
[   16.055237]  do_init_module+0x4a/0x200
[   16.055245]  __do_sys_init_module+0x16a/0x1a0
[   16.055277]  do_syscall_64+0x5b/0x80
[   16.055283]  ? lock_is_held_type+0xe8/0x140
[   16.055294]  ? asm_exc_page_fault+0x22/0x30
[   16.055301]  ? lockdep_hardirqs_on+0x7d/0x100
[   16.055307]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
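
All three traces above pass through pci_doe_submit_task, so it can help
to pull out just its immediate callers from a saved log. The following
is a rough sketch, not an established tool: callers_of is a
hypothetical helper name, and it assumes the log lines have the
"[timestamp]  symbol+offset/size" shape seen in the traces above.

```shell
#!/bin/sh
# Print the frame directly below each occurrence of a function in a
# kernel call trace read from stdin, i.e. its immediate caller.
# Frames marked "?" are skipped because the unwinder flags them as
# unreliable. (callers_of is a hypothetical helper name.)
callers_of() {
    awk -v fn="$1" '
        # Previous reliable frame was fn: this line is the caller.
        want && $0 !~ /\] +\? / {
            sub(/^\[[^]]*\] +/, "")   # drop the "[ timestamp]" prefix
            sub(/\+.*/, "")           # drop "+offset/size" and any module tag
            print
            want = 0
        }
        # Remember that the next reliable frame is the caller of fn.
        index($0, fn "+") { want = 1 }
    '
}

# Example use on a saved log (the path is an assumption):
#   callers_of pci_doe_submit_task < /tmp/dmesg.txt | sort | uniq -c
```

Run against the three traces above, this should list
pci_doe_discovery, cxl_cdat_get_length and cxl_cdat_read_table.isra.0,
i.e. the DOE discovery path and the two CDAT read paths.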
