On Thu, Jun 18, 2026 at 05:38:33PM -0700, Alison Schofield wrote:
> On Wed, Jun 17, 2026 at 10:52:04PM -0700, Anisa Su wrote:
> > On Wed, Jun 17, 2026 at 12:10:07AM -0700, Alison Schofield wrote:
> > > On Thu, Jun 04, 2026 at 10:43:10PM -0700, Alison Schofield wrote:
> > > > On Sat, May 23, 2026 at 02:50:35AM -0700, Anisa Su wrote:
> > > > > CXL Dynamic Capacity Device (DCD) support has continued to evolve in the
> > > > > upstream kernel since Ira's v5 posting [1]. The kernel side has settled
> > > > > on a uuid-driven claim model for sparse DAX devices: dax_resources carry
> > > > > the tag delivered with each extent, and userspace selects which ones to
> > > > > claim by writing a UUID to the dax device's sysfs 'uuid' attribute (or
> > > > > "0" to claim a single untagged resource). Size on a sparse region is
> > > > > determined by the claim, not requested up-front.
> > > > >
> > > > > This series brings cxl-cli and daxctl in line with that model and
> > > > > extends cxl_test to exercise the new paths end-to-end.
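As a rough illustration of the claim model described in the cover letter, here is a minimal shell sketch of the userspace side. The helper name claim_dax, its argument order, and the UUID-format check are assumptions for illustration only, not cxl-cli or daxctl behavior. It simply writes a tag UUID, or "0" for a single untagged resource, into a dax device's 'uuid' attribute and leaves the sizing to the kernel.

```shell
#!/bin/bash
# Hypothetical helper (not part of cxl-cli/daxctl): claim sparse DAX
# capacity by writing a tag UUID, or "0" for a single untagged resource,
# to a dax device's sysfs 'uuid' attribute.
claim_dax() {
    local devdir=$1 tag=$2
    # Assumed input check: canonical 8-4-4-4-12 hex UUID, or the literal "0".
    local uuid_re='^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
    if [ "$tag" != 0 ] && ! [[ $tag =~ $uuid_re ]]; then
        echo "claim_dax: '$tag' is neither a UUID nor 0" >&2
        return 2
    fi
    if [ ! -e "$devdir/uuid" ]; then
        echo "claim_dax: no uuid attribute under $devdir" >&2
        return 1
    fi
    # Size is decided by the kernel from the claim, not requested here.
    echo "$tag" > "$devdir/uuid"
}

# Example (device name is hypothetical):
#   claim_dax /sys/bus/dax/devices/dax0.1 "$(uuidgen)"
```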
> > > >
> > > > Hi Anisa,
> > > >
> > > > I just now picked this up with the kernel side and took it for a quick
> > > > test drive. Based on what's been touched, the first meaningful finding
> > > > is that all the DAX unit tests pass, and all but two of the CXL unit
> > > > tests pass: cxl-security.sh and cxl-dcd.sh.
> > > >
> > > > Please let me know if there are known problems with either of those
> > > > before I explore further.
> > >
> > > Hi Anisa,
> > >
> > > Good news: DCD exposed a long-hidden bug that made cxl-security.sh
> > > fail. It is not an issue with the DCD patches.
> > >
> > > I found that the DCD set changes which mock memdev the test happens to
> > > land on, and that's enough to uncover a latent hex/decimal bug in the
> > > CXL nvdimm code. We used to always land on id '1', but now this patch:
> > >
> > > tools/testing/cxl: Add DC Regions to mock mem data
> > >
> > > reorders the sorted dimm list, so the test selects a dimm with
> > > serial 10 (0xa), and there's the hex/decimal mismatch.
> > >
> > > The renumbering is harmless in itself; it just changed the serial
> > > the test exercises, and that tripped over the old bug.
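Why a radix mix-up like this can stay hidden: for serials 0 through 9 the decimal and hexadecimal spellings are identical, so code that formats a serial one way and parses or compares it the other way only misbehaves once a serial reaches 10. The thread does not show where the nvdimm code mixes the two, so this shell sketch only illustrates the general effect; the function name serial_forms is made up.

```shell
#!/bin/bash
# Illustration only: print a serial in decimal and in hex and report whether
# the two spellings agree. They agree for 0..9, which is why a test that
# always landed on serial 1 could never notice a decimal/hex mix-up.
serial_forms() {
    local dec hex
    dec=$(printf '%d' "$1")
    hex=$(printf '%x' "$1")
    if [ "$dec" = "$hex" ]; then
        echo "dec=$dec hex=$hex same"
    else
        echo "dec=$dec hex=$hex differ"
    fi
}

for serial in 1 9 10 56540; do
    echo "serial $serial: $(serial_forms "$serial")"
done
# serial 1: dec=1 hex=1 same
# serial 9: dec=9 hex=9 same
# serial 10: dec=10 hex=a differ
# serial 56540: dec=56540 hex=dcdc differ
```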
> > >
> > > I'll send a separate fixup patch for the hex/dec cleanup.
> > >
> > > (No answer on cxl-dcd.sh yet)
> > >
> > > -- Alison
> > >
> > Thanks for looking into this! I can also look into what might be going
> > on with cxl-dcd.sh if you let me know which base commit you applied the
> > DCD patches onto :)
>
> The base commit was indeed the key to the cxl-dcd.sh failure.
>
> I'm seeing a probe-ordering race that you may not see unless you're
> using v7.1-rc1 or later. The branch linked in the kernel patchset does
> not include this commit:
>
> 39aa1d4be12b ("dax/cxl, hmem: Initialize hmem early and defer dax_cxl
> binding")
>
> Dan changed cxl_dax_region to PROBE_PREFER_ASYNCHRONOUS in support of the
> DAX and HMEM synchronization, so I'm guessing that undoing that is not
> an option. Before that change, cxl_dax_region probed synchronously and
> created the zero-sized seed dax device before cxlr_add_existing_extents()
> ran, so no race existed.
>
> Move to 7.1 and you *should* see cxl-dcd.sh start failing. Since it's a
> timing issue, you may need to dial down any dynamic debug and do
> repeated runs.
>
> The race is on the dax_region device's devres_head, between:
> (a) the asynchronous cxl_dax_region probe reaching really_probe()
> and
> (b) cxlr_add_existing_extents() attaching devres to the same device
>
> really_probe() rejects probing devices that already have resources
> attached. If (b) wins, probe fails with -EBUSY, cxl_dax_region never
> binds, and the seed dax device is never created.
>
> One possible fixup would be to move existing-extent processing into
> cxl_dax_region_probe() so that the resource attachment happens
> within the probe itself. That looked like more restructuring than I
> could quickly test out, so I'm sending it back to you.
>
> Below is a reproducer using cxl_test and cxl-cli. It creates a DC region
> and checks immediately if its dax_region driver bound and a seed dax
> device exists. An 'unbound' dax_region is the bug.
>
> #!/bin/bash
> set -u
> CXL=${CXL:-cxl}; NDCTL=${NDCTL:-ndctl}; TRIALS=${1:-10}
> bound=0 unbound=0
> for t in $(seq 1 "$TRIALS"); do
>     $NDCTL disable-region -b cxl_test all >/dev/null 2>&1
>     modprobe -r cxl_test 2>/dev/null; modprobe cxl_test
>     udevadm settle 2>/dev/null; dmesg -C 2>/dev/null
>     # first non-sharable memdev with a dynamic_ram_a partition
>     # (serial 56540 == 0xDCDC is the mock's sharable fixture)
>     mem=$($CXL list -b cxl_test -Mi \
>         | jq -r '.[] | select(.dynamic_ram_a_size != null)
>             | select(.serial != 56540) | .memdev' | head -1)
>     reg=$($CXL create-region -t dynamic_ram_a -d decoder0.0 -m "$mem" \
>         2>/dev/null | jq -r .region)
>     rnum=${reg#region}
>     # sample immediately, no sleep (what the test does via daxctl)
>     daxreg=$(readlink -f /sys/bus/cxl/devices/"$reg"/dax_region"$rnum" \
>         2>/dev/null)
>     drv=$([ -e "$daxreg/driver" ] && echo bound || echo UNBOUND)
>     seed=$([ -e /sys/bus/dax/devices/dax"$rnum".0/uuid ] && echo yes || echo NO)
>     ebusy=$(dmesg 2>/dev/null | grep -c "Resources present before probing")
>     printf 'trial %2d: %s drv=%-7s seed=%-3s ebusy_msgs=%s\n' \
>         "$t" "$reg" "$drv" "$seed" "$ebusy"
>     [ "$drv" = bound ] && bound=$((bound+1)) || unbound=$((unbound+1))
> done
> echo "SUMMARY: bound=$bound unbound(FAIL)=$unbound of $TRIALS"
> [ "$unbound" -eq 0 ] || exit 1
>
> Sample output on a failing kernel:
> trial  1: region9 drv=bound   seed=yes ebusy_msgs=0
> trial  2: region9 drv=UNBOUND seed=NO  ebusy_msgs=1
> trial  3: region9 drv=bound   seed=yes ebusy_msgs=0
> trial  4: region9 drv=UNBOUND seed=NO  ebusy_msgs=1
> ...
> SUMMARY: bound=4 unbound(FAIL)=4 of 8
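Because cxl_dax_region now probes asynchronously, an UNBOUND sample taken with no delay could in principle just mean the probe has not run yet. A possible follow-up check, sketched below under that assumption, polls the device's driver link for a few seconds; a dax_region that is still unbound after the wait, together with a nonzero ebusy_msgs count, points at the -EBUSY probe failure rather than a slow probe. The helper name wait_for_driver and the default timeout are hypothetical and not part of the reproducer above.

```shell
#!/bin/bash
# Hypothetical helper: wait up to $2 seconds (default 5) for a device's
# 'driver' link to appear. Returns 0 once bound, 1 if still unbound.
wait_for_driver() {
    local devdir=$1 timeout=${2:-5} i
    for ((i = 0; i <= timeout * 10; i++)); do
        [ -e "$devdir/driver" ] && return 0
        sleep 0.1
    done
    return 1
}

# e.g. in the reproducer loop, after the immediate sample:
#   wait_for_driver "$daxreg" 5 || echo "still unbound after 5s"
```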
>
Thank you so much for investigating. The fix you suggested works: I
moved the processing of existing extents into cxl_dax_region_probe().
That solved the probe race, but it introduced a deadlock: the
existing-extents path calls cxlr_notify_extent(), which does
guard(device)(dev) on &cxlr->cxlr_dax->dev, and that lock is already
held by the driver core while probe runs.

To fix it, I split cxlr_notify_extent() into a core
__cxlr_notify_extent() and a cxlr_notify_extent() wrapper:

- __cxlr_notify_extent() calls device_lock_assert(dev) instead of
  acquiring the lock. The existing-extents path calls it directly, so
  it no longer tries to take the lock a second time.
- The cxlr_notify_extent() wrapper acquires the lock for the other
  case: extents added after the driver has loaded.

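The split described above is the usual locked/unlocked pair: a double-underscore core that only asserts the lock is held, and a wrapper that takes it. Below is a toy, single-threaded shell model of that control flow. It is not the kernel code: the lock is a plain variable, the function names only mirror the kernel ones, and a nested acquire reports the would-be deadlock instead of hanging.

```shell
#!/bin/bash
# Toy model only: DEV_LOCKED stands in for the dax_region device lock.
DEV_LOCKED=0

device_lock() {
    # A real non-recursive lock would hang here; the model reports it instead.
    if [ "$DEV_LOCKED" -eq 1 ]; then
        echo "device_lock: already held, would deadlock" >&2
        return 1
    fi
    DEV_LOCKED=1
}
device_unlock() { DEV_LOCKED=0; }
device_lock_assert() {
    [ "$DEV_LOCKED" -eq 1 ] || { echo "device_lock_assert: lock not held" >&2; return 1; }
}

# Core: caller must already hold the lock (mirrors __cxlr_notify_extent()).
__notify_extent() {
    device_lock_assert || return 1
    echo "notified $1"
}

# Wrapper: takes the lock itself (mirrors cxlr_notify_extent()), for
# extents that arrive after the driver is bound.
notify_extent() {
    device_lock || return 1
    __notify_extent "$1"
    local rc=$?
    device_unlock
    return "$rc"
}

# The driver core holds the device lock across probe, so the
# existing-extents path inside probe calls the core directly.
probe_with_existing_extents() {
    local e
    device_lock || return 1
    for e in "$@"; do
        __notify_extent "$e" || { device_unlock; return 1; }
    done
    device_unlock
}
```

Calling notify_extent from inside probe_with_existing_extents instead of __notify_extent reproduces, in the model, the double acquire that deadlocked.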
Thanks again for the thorough investigation! It was super helpful :)
- Anisa
> >
> > Thanks,
> > Anisa
> >
>
> snip
>