On Thu, 2022-08-11 at 12:34 -0700, Dan Williams wrote: > Vishal Verma wrote: > > Add a 'create-region' command to cxl-cli that walks the platform's CXL > > hierarchy to find an appropriate root decoder based on any options > > provided, and uses libcxl APIs to create a 'region' that is comprehended > > by libnvdimm and ndctl. > > > > Cc: Dan Williams <[email protected]> > > Signed-off-by: Vishal Verma <[email protected]> > > --- > > Documentation/cxl/bus-option.txt | 5 + > > Documentation/cxl/cxl-create-region.txt | 114 +++++ > > Documentation/cxl/region-description.txt | 7 + > > cxl/builtin.h | 1 + > > cxl/filter.h | 4 +- > > cxl/cxl.c | 1 + > > cxl/region.c | 594 +++++++++++++++++++++++ > > Documentation/cxl/meson.build | 2 + > > cxl/meson.build | 1 + > > 9 files changed, 728 insertions(+), 1 deletion(-) > > create mode 100644 Documentation/cxl/bus-option.txt > > create mode 100644 Documentation/cxl/cxl-create-region.txt > > create mode 100644 Documentation/cxl/region-description.txt > > create mode 100644 cxl/region.c > > > > diff --git a/Documentation/cxl/bus-option.txt > > b/Documentation/cxl/bus-option.txt > > new file mode 100644 > > index 0000000..02e2f08 > > --- /dev/null > > +++ b/Documentation/cxl/bus-option.txt > > @@ -0,0 +1,5 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +-b:: > > +--bus=:: > > + Restrict the operation to the specified bus. > > diff --git a/Documentation/cxl/cxl-create-region.txt > > b/Documentation/cxl/cxl-create-region.txt > > new file mode 100644 > > index 0000000..15dc742 > > --- /dev/null > > +++ b/Documentation/cxl/cxl-create-region.txt > > @@ -0,0 +1,114 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +cxl-create-region(1) > > +==================== > > + > > +NAME > > +---- > > +cxl-create-region - Assemble a CXL region by setting up attributes of its > > +constituent CXL memdevs. > > + > > +SYNOPSIS > > +-------- > > +[verse] > > +'cxl create-region [<options>]' > > + > > +include::region-description.txt[] > > + > > +For create-region, a size can optionally be specified, but if not, the > > maximum > > +possible size for each memdev will be used up to the available decode > > capacity > > +in the system for the given memory type. For persistent regions a UUID can > > +optionally be specified, but if not, one will be generated. > > + > > +If the region-creation operation is successful, a region object will be > > +emitted on stdout in JSON format (see examples). If the specified arguments > > +cannot be satisfied with a legal configuration, then an appropriate error > > will > > +be emitted on stderr. > > + > > +EXAMPLE > > +------- > > +---- > > +# cxl create-region -m -d decoder0.1 -w 2 -g 1024 mem0 mem1 > > +{ > > + "region":"region0", > > + "resource":"0xc90000000", > > + "size":"512.00 MiB (536.87 MB)", > > + "interleave_ways":2, > > + "interleave_granularity":1024, > > + "mappings":[ > > + { > > + "position":1, > > + "decoder":"decoder4.0" > > + }, > > + { > > + "position":0, > > + "decoder":"decoder3.0" > > + } > > + ] > > +} > > +created 1 region > > +---- > > + > > +OPTIONS > > +------- > > +<target(s)>:: > > +The CXL targets that should be used to form the region. This is optional, > > +as they can be chosen automatically based on other options chosen. The > > number of > > +'target' arguments must match the '--ways' option (if provided). The > > +targets may be memdevs, or endpoints. The options below control what type > > of > > +targets are being used. > > + > > +include::bus-option.txt[] > > + > > +-m:: > > +--memdevs:: > > + Indicate that the non-option arguments for 'target(s)' refer to > > memdev > > + names. > > Are they names or filter parameters ala 'cxl list -m'? I.e. do you > foresee being able to do something like:
Hm, in the current implementation they really are just names - I just use the remaining argv[] as the targets array, but.. > > "cxl create-region -p $port -m all" > > ...to just select all the memdevs that are descendants of $port in the > future? More of a curiosity about future possibilities then a request > for change. This sounds like a useful thing to do - perhaps with the next bit of porcelain additions. > > > + > > +-e:: > > +--ep-decoders:: > > + Indicate that the non-option arguments for 'target(s)' refer to > > endpoint > > + decoder names. > > I wonder if this should have a note about it being for test-only > purposes? Given the strict CXL decoder allocation rules I wonder if > anyone can use this in practice? There might be some synergy with 'cxl > reserve-dpa' where this option could be used to say "do not allocate new > decoders, and do not reserve more DPA, just try to use the DPA that was > already reserved in the following decoders". > > We might even delete this option for now unless I am missing the marquee > use case for it? Because unless someone knows what they are doing they > are almost always going to be wrong. Yeah I agree - it is definitely not straightforward to use, and I don't see a practical use case. I think I only had it because the ABI wanted decoders, and very early implementations had me explicitly asking for /everything/. I'm okay adding a note that this shouldn't normally be used, or removing it entirely. > > > + > > +-s:: > > +--size=:: > > + Specify the total size for the new region. This is optional, and by > > + default, the maximum possible size will be used. > > How about add: > > "The maximum possible size is gated by both the contiguous free HPA > space remaining in the root decoder, and the available DPA space in the > component memdevs." Yep. > > > + > > +-t:: > > +--type=:: > > + Specify the region type - 'pmem' or 'ram'. Defaults to 'pmem'. > > + > > +-U:: > > +--uuid=:: > > + Specify a UUID for the new region. This shouldn't usually need to be > > + specified, as one will be generated by default. > > + > > +-w:: > > +--ways=:: > > + The number of interleave ways for the new region's interleave. This > > + should be equal to the number of memdevs specified in --memdevs, if > > + --memdevs is being supplied. If --memdevs is not specified, an > > + appropriate number of memdevs will be chosen based on the number of > > + ways specified. > > The reverse is also true, right? That if -w is not specified then the > number of ways is determined by the number of targets specified, or by > other default target searches. I guess notes about those enhanced > default modes can wait until more 'create-region' porcelain arrives. Hm actually in the current state, /only/ the reverse is true, so this description was certainly a bit forward looking. I'll fix to say what it actually does today. > > > + > > +-g:: > > +--granularity=:: > > + The interleave granularity for the new region. Must match the > > selected > > + root decoder's (if provided) granularity. > > This just has me thinking that the kernel needs to up-level its code > comments and changelogs on the "why" for this restriction to somewhere > this man page can reference. > > However second sentence should be: > > "If the root decoder is interleaved across more than one host-bridge > then this value must match that granularity. Otherwise, for > non-interleaved decode windows, any granularity can be > specified as long as all devices support that setting." > > As I type that it raises 2 questions: > > 1/ If someone does "cxl create-region -g 1024" with no other arguments, > will it fallback to a decoder that can support that setting if its first > choice can not? Well there's no automatic root decoder selection at all in this series, but for future porcelain, I'd imagine it should try to match exactly any args that were specified, and fail if /all/ of those can't be matched. e.g.: decoder1 - 2-way - IG:256, and decoder2 - 1-way - IG:1024 and we get 'cxl create-region -g 1024', we should pick decoder 2, create a x1 interleave under it. Does that make sense? > > 2/ Per Dave's recent series [1], cxl-cli is missing any consideration > that a given endpoint may not have the support for the interleave > address bits that are necessary for a given 'create-region' request. > Does not need to be solved now, but should be queued up next. > > 2a/ Same for interleave-ways, there are endpoint capabilities that need > to be accounted. Yep agreed. > > [1]: > https://lore.kernel.org/linux-cxl/165999244272.493131.1975513183227389633.st...@djiang5-desk4.jf.intel.com/ > [..] > > > > + > > + > > +static int parse_create_options(int argc, const char **argv, > > + struct parsed_params *p) > > +{ > > + int i; > > + > > + if (!param.root_decoder) { > > + log_err(&rl, "no root decoder specified\n"); > > + return -EINVAL; > > + } > > + > > + if (param.type) { > > + if (strcmp(param.type, "ram") == 0) > > + p->mode = CXL_DECODER_MODE_RAM; > > + else if (strcmp(param.type, "volatile") == 0) > > + p->mode = CXL_DECODER_MODE_RAM; > > + else if (strcmp(param.type, "pmem") == 0) > > + p->mode = CXL_DECODER_MODE_PMEM; > > + else { > > + log_err(&rl, "unsupported type: %s\n", param.type); > > + return -EINVAL; > > This probably wants a common helper that can be shared with > __reserve_dpa() and add_cxl_decoder() that do the same conversion. I.e. > cxl_decoder_mode_name() needs a buddy. That can be a follow on change. > ISTR I might have already noted that, so forgive the duplicate comment. Ah I do remember this, I probably just missed this cleanup - I'll add it for v3. > > > + } > > + } else > > + p->mode = CXL_DECODER_MODE_PMEM; > > + > > + if (param.size) { > > + p->size = parse_size64(param.size); > > + if (p->size == ULLONG_MAX) { > > + log_err(&rl, "Invalid size: %s\n", param.size); > > + return -EINVAL; > > + } > > + } > > + > > + if (param.ways) { > > + unsigned long ways = strtoul(param.ways, NULL, 0); > > + > > + if (ways == ULONG_MAX || (int)ways <= 0) { > > + log_err(&rl, "Invalid interleave ways: %s\n", > > + param.ways); > > + return -EINVAL; > > + } > > + p->ways = ways; > > + } else if (argc) { > > + p->ways = argc; > > This is where: > > cxl create-region -p $port -m all > > ...would not work, but maybe requiring explicit targets is ok. There's > so many potential moving pieces in a CXL topology maybe we do not want > to go there with this flexibility. Hm yeah - true. It was certainly convenient (ab)using argc and argv for this, but if we did add the flexibility of specifying a filter, or even multiple filters in the future, the targets array would need proper reconstruction anyway, and counting the objects in it would come alongwith it. > > > [..] > > + > > +static int cxl_region_determine_granularity(struct cxl_region *region, > > + struct parsed_params *p) > > +{ > > + const char *devname = cxl_region_get_devname(region); > > + unsigned int granularity, ways; > > + > > + /* Default granularity will be the root decoder's granularity */ > > + granularity = > > cxl_decoder_get_interleave_granularity(p->root_decoder); > > + if (granularity == 0 || granularity == UINT_MAX) { > > + log_err(&rl, "%s: unable to determine root decoder > > granularity\n", > > + devname); > > + return -ENXIO; > > + } > > + > > + /* If no user-supplied granularity, just use the default */ > > + if (!p->granularity) > > + return granularity; > > + > > + ways = cxl_decoder_get_interleave_ways(p->root_decoder); > > + if (ways == 0 || ways == UINT_MAX) { > > + log_err(&rl, "%s: unable to determine root decoder ways\n", > > + devname); > > + return -ENXIO; > > + } > > + > > + /* For ways == 1, any user-supplied granularity is fine */ > > ...modulo a future patch that checks the device's interleave capability. > > Rest of the patch looks good. After fixing up the option descriptions > per above you can add: > > Reviewed-by: Dan Williams <[email protected]> Thanks!
