On Thu, 11 Aug 2022 17:46:55 -0700 Dan Williams <dan.j.willi...@intel.com> wrote:
> Dan Williams wrote:
> > Bobo WL wrote:
> > > Hi Dan,
> > >
> > > Thanks for your reply!
> > >
> > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.willi...@intel.com>
> > > wrote:
> > > >
> > > > What is the output of:
> > > >
> > > >     cxl list -MDTu -d decoder0.0
> > > >
> > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > > > at least not in the specified order, or that validation check is
> > > > broken.
> > >
> > > Command "cxl list -MDTu -d decoder0.0" output:
> >
> > Thanks for this, I think I know the problem, but will try some
> > experiments with cxl_test first.
>
> Hmm, so my cxl_test experiment unfortunately passed so I'm not
> reproducing the failure mode. This is the result of creating x4 region
> with devices directly attached to a single host-bridge:
>
> # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
> {
>   "region":"region8",
>   "resource":"0xf1f0000000",
>   "size":"1024.00 MiB (1073.74 MB)",
>   "interleave_ways":4,
>   "interleave_granularity":256,
>   "decode_state":"commit",
>   "mappings":[
>     {
>       "position":3,
>       "memdev":"mem11",
>       "decoder":"decoder21.0"
>     },
>     {
>       "position":2,
>       "memdev":"mem9",
>       "decoder":"decoder19.0"
>     },
>     {
>       "position":1,
>       "memdev":"mem10",
>       "decoder":"decoder20.0"
>     },
>     {
>       "position":0,
>       "memdev":"mem12",
>       "decoder":"decoder22.0"
>     }
>   ]
> }
> cxl region: cmd_create_region: created 1 region
>
> > Did the commit_store() crash stop reproducing with latest cxl/preview
> > branch?
>
> I missed the answer to this question.
>
> All of these changes are now in Linus' tree perhaps give that a try and
> post the debug log again?

Hi Dan,

I've moved onto looking at this one.
1 HB, 2 RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy that up
at some stage), 1 switch, 4 downstream switch ports each with a type 3.

I'm not getting a crash, but can't successfully set up a region.
Upon adding the final target, it's failing in check_last_peer() as pos < distance.
It seems distance is 4, which makes me think it's using the wrong level of the
hierarchy for some reason, or that the distance check is wrong.
It wasn't a good idea to just skip that step though, as it goes boom, though
the stack trace is not useful.

Jonathan
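
To make the failing condition concrete, here is a minimal sketch in Python rather than kernel C. It is not the driver's code. The helper names peer_distance() and last_peer_position() are hypothetical, and the formula for distance (region interleave ways divided by the number of targets the region uses at a given port) is an assumption that this thread does not confirm. Only the "pos < distance" comparison and the observed distance of 4 come from the report above.

```python
def peer_distance(region_ways, port_targets):
    """Hypothetical model: spacing between region positions that share
    one target of a port. ASSUMPTION: ways divided by targets in use."""
    return region_ways // port_targets


def last_peer_position(pos, distance):
    """Return the position of the earlier peer expected behind the same
    target, or None when pos < distance (the failure reported above)."""
    if pos < distance:
        return None
    return pos - distance


# 4-way region seen at the switch: 4 downstream ports in use.
switch_distance = peer_distance(4, 4)
# Same region seen at the host bridge: assumed 1 root port in use.
host_bridge_distance = peer_distance(4, 1)

# Final target added is position 3 of the 4-way region.
print(last_peer_position(3, switch_distance))
print(last_peer_position(3, host_bridge_distance))
```

Under these assumptions a distance of 4 can never be satisfied in a 4-way region, since every position from 0 to 3 is below it. That would fit either reading above: the distance being taken from the wrong level of the hierarchy, or the check itself being wrong for a port that has a single target in use.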