Fan,

Many thanks, it helps a lot. Previously I forgot to create a new dax
device (daxctl create-device region0).
Question: Why do we need to create dax0.1? Why isn't dax0.0 associated
with the newly added DCD region?

Ira,

I would also like to report a kernel panic.

kernel: dcd-2024-04-17
qemu: dcd-2024-04-17

QEMU command line:
    <qemu:arg value='-device'/>
    <qemu:arg value='cxl-type3,bus=cxl-rp-hb0rp0,persistent-memdev=cxl-pmem0,lsa=cxl-pmem-lsa0,id=pmem-dcmem,volatile-dc-memdev=cxl-dcmem0,num-dc-regions=4'/>
    <qemu:arg value='-object'/>
    <qemu:arg value='memory-backend-file,id=cxl-dcmem0,share=on,mem-path=/home/lizhijian/images/cxldcmem0.raw,size=2048M'/>
    <qemu:arg value='-object'/>
    <qemu:arg value='memory-backend-file,id=cxl-pmem0,share=on,mem-path=/home/lizhijian/images/cxlpmem0.raw,size=2048M'/>
    <qemu:arg value='-object'/>
    <qemu:arg value='memory-backend-file,id=cxl-pmem-lsa0,share=on,mem-path=/home/lizhijian/images/cxlpmem-lsa0.raw,size=4K'/>
    <qemu:arg value='-M'/>
    <qemu:arg value='cxl=on,cxl-fmw.0.targets.0=pxb-cxl.0,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k'/>


Reproducer:
  1. guest: ./create-dc.sh
  2. host: virsh qemu-monitor-command rdma-server-cxl-persistent-dcd $(cat cxl-add-dcd.json)
  3. guest: daxctl create-device region0 # will create dax0.1
  4. guest: daxctl reconfigure-device --mode=system-ram --force dax0.1 -u # kernel panic

=====================
# cat ./create-dc.sh
#!/bin/bash
set -ex

region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
echo 0x10000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
echo 0x10000000 > /sys/bus/cxl/devices/$region/size
echo "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind
=========================
# cat cxl-add-dcd.json
{ "execute": "cxl-add-dynamic-capacity",
   "arguments": {
       "path": "/machine/peripheral/pmem-dcmem",
       "hid": 0,
       "selection-policy": 2,
       "region-id": 0,
       "tag": "",
       "extents": [
       {
           "offset": 0,
           "len": 268435456
       }
       ]
   }
}
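
For reference, the "len" above is 256 MiB expressed in bytes, the same size
as the 0x10000000 written to dpa_size and size in create-dc.sh. Below is a
minimal sketch of generating the same command from MiB values. Note that
make_add_dcd_json is a hypothetical helper written only for this mail, not
part of QEMU or libvirt, and "hid", "selection-policy" and "tag" are simply
copied from the command above.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not a QEMU tool): print a
# cxl-add-dynamic-capacity QMP command describing a single extent.
# Arguments: QOM path, region id, extent offset in MiB, extent length in MiB.
make_add_dcd_json() {
    local path=$1 region_id=$2 offset_mib=$3 len_mib=$4
    local mib=$((1024 * 1024))

    printf '{ "execute": "cxl-add-dynamic-capacity",\n'
    printf '  "arguments": {\n'
    printf '    "path": "%s",\n' "$path"
    printf '    "hid": 0,\n'
    printf '    "selection-policy": 2,\n'
    printf '    "region-id": %d,\n' "$region_id"
    printf '    "tag": "",\n'
    # QMP expects offset and len in bytes, so convert from MiB here.
    printf '    "extents": [ { "offset": %d, "len": %d } ]\n' \
        "$((offset_mib * mib))" "$((len_mib * mib))"
    printf '  }\n}\n'
}

# Same values as cxl-add-dcd.json: one 256 MiB extent at offset 0.
make_add_dcd_json /machine/peripheral/pmem-dcmem 0 0 256
```

Redirecting the output to cxl-add-dcd.json should give a file that can be
passed to virsh in the same way as in step 2 of the reproducer.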



[  126.909297] Demotion targets for Node 0: preferred: 1, fallback: 1
[  126.911186] Demotion targets for Node 1: null
[  126.913808] BUG: kernel NULL pointer dereference, address: 0000000000000468
[  126.915431] #PF: supervisor read access in kernel mode
[  126.917156] #PF: error_code(0x0000) - not-present page
[  126.918976] PGD 8000000006771067 P4D 8000000006771067 PUD e777067 PMD 0
[  126.920587] Oops: 0000 [#1] PREEMPT SMP PTI
[  126.921714] CPU: 0 PID: 1101 Comm: daxctl Kdump: loaded Not tainted 6.9.0-rc3-lizhijian+ #489
[  126.924914] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  126.928620] RIP: 0010:cxl_region_perf_attrs_callback+0x25/0x110 [cxl_core]
[  126.930316] Code: 90 90 90 90 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 8b 6a 24 83 fd ff 74 20 48 83 fe 01 75 1a 48 8b 87 58 ff ff ff 48 89 fb <48> 8b b8 68 04 00 00 e8 cf a2 f4 e0 39 c5 74 13 45 31 e4 5b 44 89
[  126.934920] RSP: 0018:ffffc900007cbc58 EFLAGS: 00010246
[  126.936994] RAX: 0000000000000000 RBX: ffff888007534d60 RCX: 0000000000000020
[  126.939378] RDX: ffffc900007cbcf8 RSI: 0000000000000001 RDI: ffff888007534d60
[  126.942721] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001
[  126.944762] R10: ffff88807fc31d80 R11: 0000000000000000 R12: 0000000000000000
[  126.946900] R13: 0000000000000001 R14: ffffc900007cbcf8 R15: ffff888007534d60
[  126.948871] FS:  00007fb2ab918880(0000) GS:ffff88807fc00000(0000) knlGS:0000000000000000
[  126.951241] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  126.952722] CR2: 0000000000000468 CR3: 000000000aaf0003 CR4: 00000000001706f0
[  126.954623] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  126.956768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  126.958887] Call Trace:
[  126.959814]  <TASK>
[  126.960569]  ? __die+0x20/0x70
[  126.961645]  ? page_fault_oops+0x15a/0x450
[  126.962930]  ? search_module_extables+0x33/0x90
[  126.964374]  ? fixup_exception+0x22/0x310
[  126.965693]  ? exc_page_fault+0x68/0x200
[  126.967371]  ? asm_exc_page_fault+0x22/0x30
[  126.968713]  ? cxl_region_perf_attrs_callback+0x25/0x110 [cxl_core]
[  126.972508]  notifier_call_chain+0x40/0x110
[  126.974380]  blocking_notifier_call_chain+0x43/0x60
[  126.975788]  online_pages+0x24c/0x2d0
[  126.977008]  memory_subsys_online+0x233/0x290
[  126.978338]  device_online+0x64/0x90
[  126.979440]  state_store+0xae/0xc0
[  126.980510]  kernfs_fop_write_iter+0x143/0x200
[  126.981734]  vfs_write+0x3a6/0x570
[  126.982851]  ksys_write+0x65/0xf0
[  126.984006]  do_syscall_64+0x6d/0x140
[  126.985309]  entry_SYSCALL_64_after_hwframe+0x71/0x79
[  126.986927] RIP: 0033:0x7fb2abc777a7
[  126.987983] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  126.992770] RSP: 002b:00007ffebec70b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  126.994874] RAX: ffffffffffffffda RBX: 000000000040e1f0 RCX: 00007fb2abc777a7
[  126.996906] RDX: 000000000000000f RSI: 00007fb2abdb6434 RDI: 0000000000000004
[  126.998911] RBP: 00007ffebec70bd0 R08: 0000000000000000 R09: 00007ffebec70640
[  127.000879] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000403840
[  127.003572] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  127.005543]  </TASK>
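
Reading the Code: bytes by hand (so please treat this as a guess rather than
a confirmed analysis): the faulting instruction at the <48> marker is
"48 8b b8 68 04 00 00", which I believe decodes to mov 0x468(%rax),%rdi.
With RAX = 0 that matches the fault address 0x468, i.e. a field at offset
0x468 is read through a NULL pointer in cxl_region_perf_attrs_callback.
A small sketch of that arithmetic follows; le32 is a hypothetical helper, and
the values are copied from the oops above. The kernel tree's
scripts/decodecode should be able to confirm the disassembly.

```shell
#!/bin/bash
# Hypothetical helper: convert four little-endian hex bytes to a decimal number.
le32() {
    printf '%d\n' "$(( 0x$4$3$2$1 ))"
}

# disp32 bytes that follow "48 8b b8" at the <48> marker in the Code: line.
disp=$(le32 68 04 00 00)
rax=0                            # RAX from the register dump
cr2=$(( 0x0000000000000468 ))    # fault address from the BUG: line (also CR2)

# If the decode is right, rax + disp should equal the fault address.
printf 'disp=0x%x rax+disp=0x%x cr2=0x%x\n' "$disp" "$((rax + disp))" "$cr2"
```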

Thanks
Zhijian

On 17/05/2024 01:12, fan wrote:
> On Tue, May 14, 2024 at 02:16:51AM +0000, Zhijian Li (Fujitsu) wrote:
>> Hi Fan
>>
>>
>> Do you have a newer instruction to play with the DCD. It seems that
>> the instruction in RFC[0] doesn't work for current code.
>>
>> [0] https://lore.kernel.org/all/20230511175609.2091136-1-fan...@samsung.com/
>>
> 
> For the testing, the only thing that has been changed for this series is
> the QMP interface for add/release DC extents.
> 
> https://lore.kernel.org/linux-cxl/d708f7c8-2598-4a17-9cbb-935c6ae2a...@fujitsu.com/T/#m05066f0098e976fb1c4b05db5e7ff7ca1bf27b1e
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>    "arguments": {
>        "path": "/machine/peripheral/cxl-dcd0",
>        "hid": 0,
>        "selection-policy": 2,
>        "region-id": 0,
>        "tag": "",
>        "extents": [
>        {
>            "offset": 0,
>            "len": 134217728
>        },
>        {
>            "offset": 134217728,
>            "len": 134217728
>        }
>        ]
>    }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>    "arguments": {
>        "path": "/machine/peripheral/cxl-dcd0",
>        "hid": 0,
>        "flags": 1,
>        "region-id": 0,
>        "tag": "",
>        "extents": [
>        {
>            "offset": 134217728,
>            "len": 134217728
>        }
>        ]
>    }
> }
> 
> btw, I have a wiki page to explain how to test CXL DCD with a tool I
> wrote.
> https://github.com/moking/moking.github.io/wiki/cxl%E2%80%90test%E2%80%90tool:-A-tool-to-ease-CXL-test-with-QEMU-setup%E2%80%90%E2%80%90Using-DCD-test-as-an-example
>
> Let me know if you need more info for testing.
> 
> 
> Fan
> 
>>
>>
>> On 19/04/2024 07:10, nifan....@gmail.com wrote:
>>> A git tree of this series can be found here (with one extra commit on top
>>> for printing out accepted/pending extent list):
>>> https://github.com/moking/qemu/tree/dcd-v7
>>>
>>> v6->v7:
>>>
>>> 1. Fixed the dvsec range register issue mentioned in the cover letter in v6.
>>>      Only relevant bits are set to mark the device ready (Patch 6). (Jonathan)
>>> 2. Moved the if statement in cxl_setup_memory from Patch 6 to Patch 4. (Jonathan)
>>> 3. Used MIN instead of if statement to get record_count in Patch 7. (Jonathan)
>>> 4. Added "Reviewed-by" tag to Patch 7.
>>> 5. Modified cxl_dc_extent_release_dry_run so the updated extent list can be
>>>      reused in cmd_dcd_release_dyn_cap to simplify the process in Patch 8. (Jørgen)
>>> 6. Added comments to indicate further "TODO" items in cmd_dcd_add_dyn_cap_rsp.
>>>       (Jonathan)
>>> 7. Avoided irrelevant code reformat in Patch 8. (Jonathan)
>>> 8. Modified QMP interfaces for adding/releasing DC extents to allow passing
>>>      tags, selection policy, flags in the interface. (Jonathan, Gregory)
>>> 9. Redesigned the pending list so extents in the same requests are grouped
>>>       together. A new data structure is introduced to represent "extent group"
>>>       in pending list.  (Jonathan)
>>> 10. Added support in QMP interface for "More" flag.
>>> 11. Check "Forced removal" flag for release request and not let it pass through.
>>> 12. Removed the dynamic capacity log type from CxlEventLog definition in cxl.json
>>>      to avoid the side effect it may introduce to inject error to DC event log.
>>>      (Jonathan)
>>> 13. Hard coded the event log type to dynamic capacity event log in QMP
>>>       interfaces. (Jonathan)
>>> 14. Adding space in between "-1]". (Jonathan)
>>> 15. Some minor comment fixes.
>>>
>>> The code is tested with similar setup and has passed similar tests as listed
>>> in the cover letter of v5[1] and v6[2].
>>> Also, the code is tested with the latest DCD kernel patchset[3].
>>>
>>> [1] Qemu DCD patchset v5: https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan....@gmail.com/T/#t
>>> [2] Qemu DCD patchset v6: https://lore.kernel.org/linux-cxl/20240325190339.696686-1-nifan....@gmail.com/T/#t
>>> [3] DCD kernel patches: https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623...@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3
>>>
>>>
>>> Fan Ni (12):
>>>     hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
>>>       payload of identify memory device command
>>>     hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
>>>       and mailbox command support
>>>     include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
>>>       type3 memory devices
>>>     hw/mem/cxl_type3: Add support to create DC regions to type3 memory
>>>       devices
>>>     hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr
>>>       size instead of mr as argument
>>>     hw/mem/cxl_type3: Add host backend and address space handling for DC
>>>       regions
>>>     hw/mem/cxl_type3: Add DC extent list representative and get DC extent
>>>       list mailbox support
>>>     hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
>>>       dynamic capacity response
>>>     hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
>>>       extents
>>>     hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
>>>     hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
>>>     hw/mem/cxl_type3: Allow to release extent superset in QMP interface
>>>
>>>    hw/cxl/cxl-mailbox-utils.c  | 620 ++++++++++++++++++++++++++++++++++-
>>>    hw/mem/cxl_type3.c          | 633 +++++++++++++++++++++++++++++++++---
>>>    hw/mem/cxl_type3_stubs.c    |  20 ++
>>>    include/hw/cxl/cxl_device.h |  81 ++++-
>>>    include/hw/cxl/cxl_events.h |  18 +
>>>    qapi/cxl.json               |  69 ++++
>>>    6 files changed, 1396 insertions(+), 45 deletions(-)
>>>
