Re: [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory

2018-08-31 Thread Jonathan Cameron
On Thu, 30 Aug 2018 12:53:40 -0600
Logan Gunthorpe  wrote:

> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in peer-to-peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.
> 
> Add an interface for other subsystems to find and allocate chunks of P2P
> memory as necessary to facilitate transfers between two PCI peers:
> 
> int pci_p2pdma_add_client();
> struct pci_dev *pci_p2pmem_find();
> void *pci_alloc_p2pmem();
> 
> The new interface requires a driver to collect a list of client devices
> involved in the transaction with the pci_p2pmem_add_client*() functions
> then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
> this is done the list is bound to the memory and the calling driver is
> free to add and remove clients as necessary (adding incompatible clients
> will fail). With a suitable p2pmem device, memory can then be
> allocated with pci_alloc_p2pmem() for use in DMA transactions.
> 
> Depending on hardware, using peer-to-peer memory may reduce the bandwidth
> of the transfer but can significantly reduce pressure on system memory.
> This may be desirable in many cases: for example a system could be designed
> with a small CPU connected to a PCIe switch by a small number of lanes
> which would maximize the number of lanes available to connect to NVMe
> devices.
> 
> The code is designed to only utilize the p2pmem device if all the devices
> involved in a transfer are behind the same PCI bridge. This is because we
> have no way of knowing whether peer-to-peer routing between PCIe Root Ports
> is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
> transfers that go through the RC are limited to only reducing DRAM usage
> and, in some cases, coding convenience. The PCI-SIG may be exploring
> adding a new capability bit to advertise whether this is possible for
> future hardware.
> 
> This commit includes significant rework and feedback from Christoph
> Hellwig.
> 
> Signed-off-by: Christoph Hellwig 
> Signed-off-by: Logan Gunthorpe 

Apologies for being a late entrant to this conversation so I may be asking
about a topic that has been covered in detail in earlier patches!
> ---
...

> +/*
> + * Find the distance through the nearest common upstream bridge between
> + * two PCI devices.
> + *
> + * If the two devices are the same device then 0 will be returned.
> + *
> + * If there are two virtual functions of the same device behind the same
> + * bridge port then 2 will be returned (one step down to the PCIe switch,
> + * then one step back to the same device).
> + *
> + * In the case where two devices are connected to the same PCIe switch, the
> + * value 4 will be returned. This corresponds to the following PCI tree:
> + *
> + * -+  Root Port
> + *  \+ Switch Upstream Port
> + *   +-+ Switch Downstream Port
> + *   + \- Device A
> + *   \-+ Switch Downstream Port
> + * \- Device B
> + *
> + * The distance is 4 because we traverse from Device A through the downstream
> + * port of the switch, to the common upstream port, back up to the second
> + * downstream port and then to Device B.
> + *
> + * Any two devices that don't have a common upstream bridge will return -1.
> + * In this way devices on separate PCIe root ports will be rejected, which
> + * is what we want for peer-to-peer, seeing as each PCIe root port defines a
> + * separate hierarchy domain and there's no way to determine whether the root
> + * complex supports forwarding between them.
> + *
> + * In the case where two devices are connected to different PCIe switches,
> + * this function will still return a positive distance as long as both
> + * switches eventually have a common upstream bridge. Note this covers
> + * the case of using multiple PCIe switches to achieve a desired level of
> + * fan-out from a root port. The exact distance will be a function of the
> + * number of switches between Device A and Device B.

This feels like a somewhat simplistic starting point rather than a
generally correct estimate to use.  Should we be taking the bandwidth of
those links into account for example, or any discoverable latencies?
Not all PCIe switches are alike - particularly when it comes to P2P.

I guess that can be a topic for future development if it turns out people
have horrible mixed systems.

> + *
> + * If a bridge which has any ACS redirection bits set is in the path
> + * then this function will return -2. This is so we reject any
> + * cases where the TLPs are forwarded up into the root complex.
> + * In this case, a list of all infringing bridge addresses will be
> + * populated in acs_list (assuming it's non-null) for printk purposes.
> + */

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-27 Thread Jonathan Cameron
On Mon, 26 Mar 2018 09:46:24 -0600
Logan Gunthorpe <log...@deltatee.com> wrote:

> On 26/03/18 08:01 AM, Bjorn Helgaas wrote:
> > On Mon, Mar 26, 2018 at 12:11:38PM +0100, Jonathan Cameron wrote:  
> >> On Tue, 13 Mar 2018 10:43:55 -0600
> >> Logan Gunthorpe <log...@deltatee.com> wrote:  
> >>> It turns out that root ports that support P2P are far less common than 
> >>> anyone thought. So it will likely have to be a white list.  
> >>
> >> This came as a bit of a surprise to our PCIe architect.
> >>
> >> His follow up was whether it was worth raising an ECR for the PCIe spec
> >> to add a capability bit to allow this to be discovered.  This might
> >> long term avoid the need to maintain the white list for new devices.
> >>
> >> So is it worth having a long term solution for making this discoverable?  
> > 
> > It was surprising to me that there's no architected way to discover
> > this.  It seems like such an obvious thing that I guess I assumed the
> > omission was intentional, i.e., maybe there's something that makes it
> > impractical, but it would be worth at least asking somebody in the
> > SIG.  It seems like for root ports in the same root complex, at least,
> > there could be a bit somewhere in the root port or the RCRB (which
> > Linux doesn't support yet).  
> 
> Yes, I agree. It would be a good long term solution to have this bit in
> the spec. That would avoid us needing to create a white list for new
> hardware. However, I expect it would be years before we can rely on it
> so someone may yet implement that white list.
> 
> Logan

I'll see if I can get our PCI SIG people to follow this through and see if
it is just an omission or as Bjorn suggested, there is some reason we
aren't thinking of that makes it hard.

Agreed that it is a somewhat tangential question to what to do with current
hardware, but let's get the ball rolling if possible.

Jonathan


Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-26 Thread Jonathan Cameron
On Tue, 13 Mar 2018 10:43:55 -0600
Logan Gunthorpe  wrote:

> On 12/03/18 09:28 PM, Sinan Kaya wrote:
> > On 3/12/2018 3:35 PM, Logan Gunthorpe wrote:
> > Regarding the switch business, it is amazing how much trouble you went to
> > in order to limit this functionality to very specific hardware.
> > 
> > I thought that we had reached an agreement that the code would not impose
> > any limits on what user wants.
> > 
> > What happened to all the emails we exchanged?  
> 
> It turns out that root ports that support P2P are far less common than 
> anyone thought. So it will likely have to be a white list.

This came as a bit of a surprise to our PCIe architect.

His follow up was whether it was worth raising an ECR for the PCIe spec
to add a capability bit to allow this to be discovered.  This might
long term avoid the need to maintain the white list for new devices.

So is it worth having a long term solution for making this discoverable?

Jonathan

> Nobody else 
> seems keen on allowing the user to enable this on hardware that doesn't 
> work. The easiest solution is still limiting it to using a switch. From 
> there, if someone wants to start creating a white-list then that's 
> probably the way forward to support root ports.
> 
> And there's also the ACS problem which means if you want to use P2P on 
> the root ports you'll have to disable ACS on the entire system. (Or 
> preferably, the IOMMU groups need to get more sophisticated to allow for 
> dynamic changes).
> 
> Additionally, once you allow for root ports you may find the IOMMU 
> getting in the way.
> 
> So there are great deal more issues to sort out if you don't restrict to 
> devices behind switches.
> 
> Logan



Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg

2018-09-17 Thread Jonathan Cameron
On Wed, 12 Sep 2018 17:08:52 +0200
Arnd Bergmann  wrote:

> The .ioctl and .compat_ioctl file operations have the same prototype so
> they can both point to the same function, which works great almost all
> the time when all the commands are compatible.
> 
> One exception is the s390 architecture, where a compat pointer is only
> 31 bits wide, and converting it into a 64-bit pointer requires calling
> compat_ptr(). Most drivers here will never run on s390, but since we now
> have a generic helper for it, it's easy enough to use it consistently.
> 
> I double-checked all these drivers to ensure that all ioctl arguments
> are used as pointers or are ignored, but are not interpreted as integer
> values.
> 
> Signed-off-by: Arnd Bergmann 
> ---

For IIO part.

Acked-by: Jonathan Cameron 

Thanks,
> diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
> index a062cfddc5af..22844b94b0e9 100644
> --- a/drivers/iio/industrialio-core.c
> +++ b/drivers/iio/industrialio-core.c
> @@ -1630,7 +1630,7 @@ static const struct file_operations iio_buffer_fileops = {
>   .owner = THIS_MODULE,
>   .llseek = noop_llseek,
>   .unlocked_ioctl = iio_ioctl,
> - .compat_ioctl = iio_ioctl,
> + .compat_ioctl = generic_compat_ioctl_ptrarg,
>  };
>  



Re: [RFC PATCH 4/5] acpi/hmat: Register special purpose memory as a device

2019-04-05 Thread Jonathan Cameron
On Thu, 4 Apr 2019 12:08:49 -0700
Dan Williams  wrote:

> Memory that has been tagged EFI_SPECIAL_PURPOSE, and has performance
> properties described by the ACPI HMAT is expected to have an application
> specific consumer.
> 
> Those consumers may want 100% of the memory capacity to be reserved from
> any usage by the kernel. By default, with this enabling, a platform
> device is created to represent this differentiated resource.
> 
> A follow on change arranges for device-dax to claim these devices by
> default and provide an mmap interface for the target application.
> However, if the administrator prefers that some or all of the special
> purpose memory is made available to the core-mm the device-dax hotplug
> facility can be used to online the memory with its own numa node.
> 
> Cc: "Rafael J. Wysocki" 
> Cc: Len Brown 
> Cc: Keith Busch 
> Cc: Jonathan Cameron 
> Signed-off-by: Dan Williams 

Hi Dan,

Great to see you getting this discussion going so fast and in
general the approach makes sense to me.

I'm a little confused why HMAT has anything to do with this.
SPM is defined either via the attribute in SRAT SPA entries,
EFI_MEMORY_SP, or via the EFI memory map.

Whether it is in HMAT or not isn't all that relevant.
Back in the days of the reservation hint (so before yesterday :)
it was relevant obviously but that's no longer true.

So what am I missing?

Thanks,

Jonathan


> ---
>  drivers/acpi/hmat/Kconfig |1 +
>  drivers/acpi/hmat/hmat.c  |   63 +
>  include/linux/memregion.h |3 ++
>  3 files changed, 67 insertions(+)
> 
> diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig
> index 95a29964dbea..4fcf76e8aa1d 100644
> --- a/drivers/acpi/hmat/Kconfig
> +++ b/drivers/acpi/hmat/Kconfig
> @@ -3,6 +3,7 @@ config ACPI_HMAT
>   bool "ACPI Heterogeneous Memory Attribute Table Support"
>   depends on ACPI_NUMA
>   select HMEM_REPORTING
> + select MEMREGION
>   help
>If set, this option has the kernel parse and report the
>platform's ACPI HMAT (Heterogeneous Memory Attributes Table),
> diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c
> index e7ae44c8d359..482360004ea0 100644
> --- a/drivers/acpi/hmat/hmat.c
> +++ b/drivers/acpi/hmat/hmat.c
> @@ -13,6 +13,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -612,6 +615,65 @@ static __init void hmat_register_target_perf(struct memory_target *target)
>   node_set_perf_attrs(mem_nid, &target->hmem_attrs, 0);
>  }
>  
> +static __init void hmat_register_target_device(struct memory_target *target)
> +{
> + struct memregion_info info;
> + struct resource res = {
> + .start = target->start,
> + .end = target->start + target->size - 1,
> + .flags = IORESOURCE_MEM,
> + .desc = IORES_DESC_APPLICATION_RESERVED,
> + };
> + struct platform_device *pdev;
> + int rc, id;
> +
> + if (region_intersects(target->start, target->size, IORESOURCE_MEM,
> + IORES_DESC_APPLICATION_RESERVED)
> + != REGION_INTERSECTS)
> + return;
> +
> + id = memregion_alloc();
> + if (id < 0) {
> + pr_err("acpi/hmat: memregion allocation failure for %pr\n", &res);
> + return;
> + }
> +
> + pdev = platform_device_alloc("hmem", id);
> + if (!pdev) {
> + pr_err("acpi/hmat: hmem device allocation failure for %pr\n", &res);
> + goto out_pdev;
> + }
> +
> + pdev->dev.numa_node = acpi_map_pxm_to_online_node(target->processor_pxm);
> + info = (struct memregion_info) {
> + .target_node = acpi_map_pxm_to_node(target->memory_pxm),
> + };
> + rc = platform_device_add_data(pdev, &info, sizeof(info));
> + if (rc < 0) {
> + pr_err("acpi/hmat: hmem memregion_info allocation failure for %pr\n", &res);
> + goto out_pdev;
> + }
> +
> + rc = platform_device_add_resources(pdev, &res, 1);
> + if (rc < 0) {
> + pr_err("acpi/hmat: hmem resource allocation failure for %pr\n", &res);
> + goto out_resource;
> + }
> +
> + rc = platform_device_add(pdev);
> + if (rc < 0) {
> + dev_err(&pdev->dev, "acpi/hmat: device add failed for %pr\n", &res);
> + goto out_resource;
> + }
> +
> + return;
> +
> +out_resource:
> + put_device(&pdev->dev);
> +out_pdev:

Re: [RFC PATCH 4/5] acpi/hmat: Register special purpose memory as a device

2019-04-05 Thread Jonathan Cameron
On Fri, 5 Apr 2019 08:43:03 -0700
Dan Williams  wrote:

> On Fri, Apr 5, 2019 at 4:19 AM Jonathan Cameron
>  wrote:
> >
> > On Thu, 4 Apr 2019 12:08:49 -0700
> > Dan Williams  wrote:
> >  
> > > Memory that has been tagged EFI_SPECIAL_PURPOSE, and has performance
> > > properties described by the ACPI HMAT is expected to have an application
> > > specific consumer.
> > >
> > > Those consumers may want 100% of the memory capacity to be reserved from
> > > any usage by the kernel. By default, with this enabling, a platform
> > > device is created to represent this differentiated resource.
> > >
> > > A follow on change arranges for device-dax to claim these devices by
> > > default and provide an mmap interface for the target application.
> > > However, if the administrator prefers that some or all of the special
> > > purpose memory is made available to the core-mm the device-dax hotplug
> > > facility can be used to online the memory with its own numa node.
> > >
> > > Cc: "Rafael J. Wysocki" 
> > > Cc: Len Brown 
> > > Cc: Keith Busch 
> > > Cc: Jonathan Cameron 
> > > Signed-off-by: Dan Williams   
> >
> > Hi Dan,
> >
> > Great to see you getting this discussion going so fast and in
> > general the approach makes sense to me.
> >
> > I'm a little confused why HMAT has anything to do with this.
> > SPM is defined either via the attribute in SRAT SPA entries,
> > EFI_MEMORY_SP, or via the EFI memory map.
> >
> > Whether it is in HMAT or not isn't all that relevant.
> > Back in the days of the reservation hint (so before yesterday :)
> > it was relevant obviously but that's no longer true.
> >
> > So what am I missing?  
> 
> It's a good question, and an assumption I should have explicitly
> declared in the changelog. The problem with EFI_MEMORY_SP is the same
> as the problem with the EfiPersistentMemory type, it isn't precise
> enough on its own for the kernel to delineate 'type' or
> device/replaceable-unit boundaries. For example, I expect one
> EFI_MEMORY_SP range of a specific type may be contiguous with another
> range of a different type. Similar to the NFIT there is no requirement
> in the specification that platform firmware inject multiple range
> entries. Instead that precision is left to the SRAT + HMAT, or the
> NFIT in the case of PMEM.

Absolutely, as long as they are all SPM, they could be anywhere in
the system.

> 
> Conversely, and thinking through this a bit more, if a memory range is
> "special", but the platform fails to enumerate it in HMAT I think
> Linux should scream loudly that the firmware is broken and leave the
> range alone. The "scream loudly" piece is missing in the current set,
> but the "leave the range alone" functionality is included.

I am certainly keen on screaming if the various entries are inconsistent
but am not sure they necessarily are here.

So there are a couple of ways we could get an SPM range defined.
The key thing here is that firmware should be attempting to describe
what it has to some degree somewhere.  If not it won't get a good
result ;)  So if there is no SRAT then you are on your own. SCREAM!

1. Directly in the memory map.  If there is no other information then
   tough luck the kernel can only sensibly handle it as one device.
   Or not at all, which seems like a reasonable decision to me.
   SCREAM

2. In memory map + a proximity domain entry in SRAT.  Given that memory
   with different characteristics should be in different proximity
   domains anyway, this should be fairly precise. The slight snag
   here is that the fine-grained nature of SRAT is actually a side
   effect of HMAT, so I'm not sure how well platforms have traditionally
   described their more subtle differences.

3. In NFIT as NFIT SPA carries the memory attribute.  Not sure if
   we should scream if this disagrees with the memory map.

4. In HMAT?  Now this changed in ACPI 6.3 to clean up the 'messy'
   prior relationship between it and SRAT.  Now HMAT no longer has
   memory address ranges as you observed.  That means, to describe
   properties of memory, it has to use the proximity domains of
   SRAT.  It provides lots of additional info about those domains
   but it is SRAT that defines them.

So I would argue that HMAT itself doesn't tell us anything useful.
SRAT certainly does, though, so I think this should be coming from
SRAT (or NFIT, as that also defines the required precision).

Jonathan





Re: [RFC PATCH 4/5] acpi/hmat: Register special purpose memory as a device

2019-04-05 Thread Jonathan Cameron
On Fri, 5 Apr 2019 09:56:22 -0700
Dan Williams  wrote:

> On Fri, Apr 5, 2019 at 9:24 AM Jonathan Cameron
>  wrote:
> >
> > On Fri, 5 Apr 2019 08:43:03 -0700
> > Dan Williams  wrote:
> >  
> > > On Fri, Apr 5, 2019 at 4:19 AM Jonathan Cameron
> > >  wrote:  
> > > >
> > > > On Thu, 4 Apr 2019 12:08:49 -0700
> > > > Dan Williams  wrote:
> > > >  
> > > > > Memory that has been tagged EFI_SPECIAL_PURPOSE, and has performance
> > > > > properties described by the ACPI HMAT is expected to have an 
> > > > > application
> > > > > specific consumer.
> > > > >
> > > > > Those consumers may want 100% of the memory capacity to be reserved 
> > > > > from
> > > > > any usage by the kernel. By default, with this enabling, a platform
> > > > > device is created to represent this differentiated resource.
> > > > >
> > > > > A follow on change arranges for device-dax to claim these devices by
> > > > > default and provide an mmap interface for the target application.
> > > > > However, if the administrator prefers that some or all of the special
> > > > > purpose memory is made available to the core-mm the device-dax hotplug
> > > > > facility can be used to online the memory with its own numa node.
> > > > >
> > > > > Cc: "Rafael J. Wysocki" 
> > > > > Cc: Len Brown 
> > > > > Cc: Keith Busch 
> > > > > Cc: Jonathan Cameron 
> > > > > Signed-off-by: Dan Williams   
> > > >
> > > > Hi Dan,
> > > >
> > > > Great to see you getting this discussion going so fast and in
> > > > general the approach makes sense to me.
> > > >
> > > > I'm a little confused why HMAT has anything to do with this.
> > > > SPM is defined either via the attribute in SRAT SPA entries,
> > > > EFI_MEMORY_SP, or via the EFI memory map.
> > > >
> > > > Whether it is in HMAT or not isn't all that relevant.
> > > > Back in the days of the reservation hint (so before yesterday :)
> > > > it was relevant obviously but that's no longer true.
> > > >
> > > > So what am I missing?  
> > >
> > > It's a good question, and an assumption I should have explicitly
> > > declared in the changelog. The problem with EFI_MEMORY_SP is the same
> > > as the problem with the EfiPersistentMemory type, it isn't precise
> > > enough on its own for the kernel to delineate 'type' or
> > > device/replaceable-unit boundaries. For example, I expect one
> > > EFI_MEMORY_SP range of a specific type may be contiguous with another
> > > range of a different type. Similar to the NFIT there is no requirement
> > > in the specification that platform firmware inject multiple range
> > > entries. Instead that precision is left to the SRAT + HMAT, or the
> > > NFIT in the case of PMEM.  
> >
> > Absolutely, as long as they are all SPM, they could be anywhere in
> > the system.
> >  
> > >
> > > Conversely, and thinking through this a bit more, if a memory range is
> > > "special", but the platform fails to enumerate it in HMAT I think
> > > Linux should scream loudly that the firmware is broken and leave the
> > > range alone. The "scream loudly" piece is missing in the current set,
> > > but the "leave the range alone" functionality is included.  
> >
> > I am certainly keen on screaming if the various entries are inconsistent
> > but am not sure they necessarily are here.
> >
> > So there are a couple of ways we could get an SPM range defined.
> > The key thing here is that firmware should be attempting to describe
> > what it has to some degree somewhere.  If not it won't get a good
> > result ;)  So if there is no SRAT then you are on your own. SCREAM!
> >
> > 1. Directly in the memory map.  If there is no other information then
> >tough luck the kernel can only sensibly handle it as one device.
> >Or not at all, which seems like a reasonable decision to me.
> >SCREAM
> >
> > 2. In memory map + a proximity domain entry in SRAT.  Given memory
> >with different characteristics should be in different proximity
> >domains anyway - this should be fairly precise. The slight snag
> >here is that the fine grained nature of SRAT is actually a side
> >effe

Re: [PATCH v4 01/10] acpi/numa: Establish a new drivers/acpi/numa/ directory

2019-06-25 Thread Jonathan Cameron
On Mon, 24 Jun 2019 11:19:32 -0700
Dan Williams  wrote:

> Currently hmat.c lives under an "hmat" directory which does not enhance
> the description of the file. The initial motivation for giving hmat.c
> its own directory was to delineate it as mm functionality in contrast to
> ACPI device driver functionality.
> 
> As ACPI continues to play an increasing role in conveying
> memory location and performance topology information to the OS take the
> opportunity to co-locate these NUMA relevant tables in a combined
> directory.
> 
> numa.c is renamed to srat.c and moved to drivers/acpi/numa/ along with
> hmat.c.

Hi Dan,

srat.c now includes the processing for the SLIT table, which is a bit odd.

Now we could split this up in to a top level numa.c and then
srat.c, slit.c and hmat.c

Does feel rather silly though.  Perhaps better to just leave it as
numa.c?

I don't really feel strongly about this though.

Jonathan

> 
> Cc: Len Brown 
> Cc: Keith Busch 
> Cc: "Rafael J. Wysocki" 
> Reviewed-by: Dave Hansen 
> Signed-off-by: Dan Williams 
> ---
>  drivers/acpi/Kconfig   |9 +
>  drivers/acpi/Makefile  |3 +--
>  drivers/acpi/hmat/Makefile |2 --
>  drivers/acpi/numa/Kconfig  |7 ++-
>  drivers/acpi/numa/Makefile |3 +++
>  drivers/acpi/numa/hmat.c   |0 
>  drivers/acpi/numa/srat.c   |0 
>  7 files changed, 11 insertions(+), 13 deletions(-)
>  delete mode 100644 drivers/acpi/hmat/Makefile
>  rename drivers/acpi/{hmat/Kconfig => numa/Kconfig} (72%)
>  create mode 100644 drivers/acpi/numa/Makefile
>  rename drivers/acpi/{hmat/hmat.c => numa/hmat.c} (100%)
>  rename drivers/acpi/{numa.c => numa/srat.c} (100%)
> 
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index 283ee94224c6..82c4a31c8701 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -321,12 +321,6 @@ config ACPI_THERMAL
> To compile this driver as a module, choose M here:
> the module will be called thermal.
>  
> -config ACPI_NUMA
> - bool "NUMA support"
> - depends on NUMA
> - depends on (X86 || IA64 || ARM64)
> - default y if IA64_GENERIC || IA64_SGI_SN2 || ARM64
> -
>  config ACPI_CUSTOM_DSDT_FILE
>   string "Custom DSDT Table file to include"
>   default ""
> @@ -475,8 +469,7 @@ config ACPI_REDUCED_HARDWARE_ONLY
> If you are unsure what to do, do not enable this option.
>  
>  source "drivers/acpi/nfit/Kconfig"
> -source "drivers/acpi/hmat/Kconfig"
> -
> +source "drivers/acpi/numa/Kconfig"
>  source "drivers/acpi/apei/Kconfig"
>  source "drivers/acpi/dptf/Kconfig"
>  
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 5d361e4e3405..f08a661274e8 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -55,7 +55,6 @@ acpi-$(CONFIG_X86)  += acpi_cmos_rtc.o
>  acpi-$(CONFIG_X86)   += x86/apple.o
>  acpi-$(CONFIG_X86)   += x86/utils.o
>  acpi-$(CONFIG_DEBUG_FS)  += debugfs.o
> -acpi-$(CONFIG_ACPI_NUMA) += numa.o
>  acpi-$(CONFIG_ACPI_PROCFS_POWER) += cm_sbs.o
>  acpi-y   += acpi_lpat.o
>  acpi-$(CONFIG_ACPI_LPIT) += acpi_lpit.o
> @@ -80,7 +79,7 @@ obj-$(CONFIG_ACPI_PROCESSOR)+= processor.o
>  obj-$(CONFIG_ACPI)   += container.o
>  obj-$(CONFIG_ACPI_THERMAL)   += thermal.o
>  obj-$(CONFIG_ACPI_NFIT)  += nfit/
> -obj-$(CONFIG_ACPI_HMAT)  += hmat/
> +obj-$(CONFIG_ACPI_NUMA)  += numa/
>  obj-$(CONFIG_ACPI)   += acpi_memhotplug.o
>  obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
>  obj-$(CONFIG_ACPI_BATTERY)   += battery.o
> diff --git a/drivers/acpi/hmat/Makefile b/drivers/acpi/hmat/Makefile
> deleted file mode 100644
> index 1c20ef36a385..
> --- a/drivers/acpi/hmat/Makefile
> +++ /dev/null
> @@ -1,2 +0,0 @@
> -# SPDX-License-Identifier: GPL-2.0-only
> -obj-$(CONFIG_ACPI_HMAT) := hmat.o
> diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/numa/Kconfig
> similarity index 72%
> rename from drivers/acpi/hmat/Kconfig
> rename to drivers/acpi/numa/Kconfig
> index 95a29964dbea..d14582387ed0 100644
> --- a/drivers/acpi/hmat/Kconfig
> +++ b/drivers/acpi/numa/Kconfig
> @@ -1,4 +1,9 @@
> -# SPDX-License-Identifier: GPL-2.0
> +config ACPI_NUMA
> + bool "NUMA support"
> + depends on NUMA
> + depends on (X86 || IA64 || ARM64)
> + default y if IA64_GENERIC || IA64_SGI_SN2 || ARM64
> +
>  config ACPI_HMAT
>   bool "ACPI Heterogeneous Memory Attribute Table Support"
>   depends on ACPI_NUMA
> diff --git a/drivers/acpi/numa/Makefile b/drivers/acpi/numa/Makefile
> new file mode 100644
> index ..517a6c689a94
> --- /dev/null
> +++ b/drivers/acpi/numa/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_ACPI_NUMA) += srat.o
> +obj-$(CONFIG_ACPI_HMAT) += hmat.o
> diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/numa/hmat.c
> similarity index 100%
> rename from drivers/acpi/hmat/hmat.c
> rename to 

Re: [PATCH v4 08/10] device-dax: Add a driver for "hmem" devices

2019-06-25 Thread Jonathan Cameron
On Mon, 24 Jun 2019 11:20:16 -0700
Dan Williams  wrote:

> Platform firmware like EFI/ACPI may publish "hmem" platform devices.
> Such a device is a performance differentiated memory range likely
> reserved for an application specific use case. The driver gives access
> to 100% of the capacity via a device-dax mmap instance by default.
> 
> However, if over-subscription and other kernel memory management is
> desired the resulting dax device can be assigned to the core-mm via the
> kmem driver.
> 
> This consumes "hmem" devices; the producer of "hmem" devices is saved for
> a follow-on patch so that it can reference the new CONFIG_DEV_DAX_HMEM
> symbol to gate performing the enumeration work.
> 
> Cc: Vishal Verma 
> Cc: Keith Busch 
> Cc: Dave Jiang 
> Reported-by: kbuild test robot 
> Reviewed-by: Dave Hansen 
> Signed-off-by: Dan Williams 
No need to have a remove function at all.  Otherwise this looks good to me.

Reviewed-by: Jonathan Cameron 

> ---
>  drivers/dax/Kconfig|   27 +++
>  drivers/dax/Makefile   |2 ++
>  drivers/dax/hmem.c |   57 
> 
>  include/linux/ioport.h |4 +++
>  4 files changed, 85 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/dax/hmem.c
> 
> diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
> index f33c73e4af41..1a59ef86f148 100644
> --- a/drivers/dax/Kconfig
> +++ b/drivers/dax/Kconfig
> @@ -32,19 +32,36 @@ config DEV_DAX_PMEM
>  
> Say M if unsure
>  
> +config DEV_DAX_HMEM
> + tristate "HMEM DAX: direct access to 'specific purpose' memory"
> + depends on EFI_APPLICATION_RESERVED
> + default DEV_DAX
> + help
> +   EFI 2.8 platforms, and others, may advertise 'specific purpose'
> +   memory.  For example, a high bandwidth memory pool. The
> +   indication from platform firmware is meant to reserve the
> +   memory from typical usage by default.  This driver creates
> +   device-dax instances for these memory ranges, and that also
> +   enables the possibility to assign them to the DEV_DAX_KMEM
> +   driver to override the reservation and add them to kernel
> +   "System RAM" pool.
> +
> +   Say M if unsure.
> +
>  config DEV_DAX_KMEM
>   tristate "KMEM DAX: volatile-use of persistent memory"
>   default DEV_DAX
>   depends on DEV_DAX
>   depends on MEMORY_HOTPLUG # for add_memory() and friends
>   help
> -   Support access to persistent memory as if it were RAM.  This
> -   allows easier use of persistent memory by unmodified
> -   applications.
> +   Support access to persistent, or other performance
> +   differentiated memory as if it were System RAM. This allows
> +   easier use of persistent memory by unmodified applications, or
> +   adds core kernel memory services to heterogeneous memory types
> +   (HMEM) marked "reserved" by platform firmware.
>  
> To use this feature, a DAX device must be unbound from the
> -   device_dax driver (PMEM DAX) and bound to this kmem driver
> -   on each boot.
> +   device_dax driver and bound to this kmem driver on each boot.
>  
> Say N if unsure.
>  
> diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
> index 81f7d54dadfb..80065b38b3c4 100644
> --- a/drivers/dax/Makefile
> +++ b/drivers/dax/Makefile
> @@ -2,9 +2,11 @@
>  obj-$(CONFIG_DAX) += dax.o
>  obj-$(CONFIG_DEV_DAX) += device_dax.o
>  obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
> +obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o
>  
>  dax-y := super.o
>  dax-y += bus.o
>  device_dax-y := device.o
> +dax_hmem-y := hmem.o
>  
>  obj-y += pmem/
> diff --git a/drivers/dax/hmem.c b/drivers/dax/hmem.c
> new file mode 100644
> index ..62f9e3c80e21
> --- /dev/null
> +++ b/drivers/dax/hmem.c
> @@ -0,0 +1,57 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/platform_device.h>
> +#include <linux/memregion.h>
> +#include <linux/module.h>
> +#include <linux/pfn_t.h>
> +#include "bus.h"
> +
> +static int dax_hmem_probe(struct platform_device *pdev)
> +{
> + struct dev_pagemap pgmap = { NULL };
> + struct device *dev = &pdev->dev;
> + struct dax_region *dax_region;
> + struct memregion_info *mri;
> + struct dev_dax *dev_dax;
> + struct resource *res;
> +
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + if (!res)
> + return -ENOMEM;
> +
> + mri = dev->platform_data;
> + pgmap.dev = dev;
> + memcpy(&pgmap.res, res, sizeof(*res));
> +
> + dax_region = alloc_dax_region(dev, pdev->id, res, mri->target_node,
> +  

Re: [PATCH v4 10/10] acpi/numa/hmat: Register "specific purpose" memory as an "hmem" device

2019-06-25 Thread Jonathan Cameron
On Mon, 24 Jun 2019 11:20:26 -0700
Dan Williams  wrote:

> Memory that has been tagged EFI_MEMORY_SP, and has performance
> properties described by the ACPI HMAT is expected to have an application
> specific consumer.
> 
> Those consumers may want 100% of the memory capacity to be reserved from
> any usage by the kernel. By default, with this enabling, a platform
> device is created to represent this differentiated resource.
> 
> The device-dax "hmem" driver claims these devices by default and
> provides an mmap interface for the target application.  If the
> administrator prefers, the hmem resource range can be made available to
> the core-mm via the device-dax hotplug facility, kmem, to online the
> memory with its own numa node.
> 
> This was tested with an emulated HMAT produced by qemu (with the pending
> HMAT enabling patches), and "efi_fake_mem=8G@9G:0x4" on the kernel
> command line to mark the memory ranges associated with node2 and node3
> as EFI_MEMORY_SP.
> 
> qemu numa configuration options:
> 
> -numa node,mem=4G,cpus=0-19,nodeid=0
> -numa node,mem=4G,cpus=20-39,nodeid=1
> -numa node,mem=4G,nodeid=2
> -numa node,mem=4G,nodeid=3
> -numa dist,src=0,dst=0,val=10
> -numa dist,src=0,dst=1,val=21
> -numa dist,src=0,dst=2,val=21
> -numa dist,src=0,dst=3,val=21
> -numa dist,src=1,dst=0,val=21
> -numa dist,src=1,dst=1,val=10
> -numa dist,src=1,dst=2,val=21
> -numa dist,src=1,dst=3,val=21
> -numa dist,src=2,dst=0,val=21
> -numa dist,src=2,dst=1,val=21
> -numa dist,src=2,dst=2,val=10
> -numa dist,src=2,dst=3,val=21
> -numa dist,src=3,dst=0,val=21
> -numa dist,src=3,dst=1,val=21
> -numa dist,src=3,dst=2,val=21
> -numa dist,src=3,dst=3,val=10
> -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,base-lat=10,latency=5
> -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=5
> -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,base-lat=10,latency=10
> -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10
> -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-latency,base-lat=10,latency=15
> -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=15
> -numa hmat-lb,initiator=0,target=3,hierarchy=memory,data-type=access-latency,base-lat=10,latency=20
> -numa hmat-lb,initiator=0,target=3,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=20
> -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,base-lat=10,latency=10
> -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10
> -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,base-lat=10,latency=5
> -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=5
> -numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-latency,base-lat=10,latency=15
> -numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=15
> -numa hmat-lb,initiator=1,target=3,hierarchy=memory,data-type=access-latency,base-lat=10,latency=20
> -numa hmat-lb,initiator=1,target=3,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=20
> 
> Result:
> 
> # daxctl list -RDu
> [
>   {
> "path":"\/platform\/hmem.1",
> "id":1,
> "size":"4.00 GiB (4.29 GB)",
> "align":2097152,
> "devices":[
>   {
> "chardev":"dax1.0",
> "size":"4.00 GiB (4.29 GB)"
>   }
> ]
>   },
>   {
> "path":"\/platform\/hmem.0",
> "id":0,
> "size":"4.00 GiB (4.29 GB)",
> "align":2097152,
> "devices":[
>   {
> "chardev":"dax0.0",
> "size":"4.00 GiB (4.29 GB)"
>   }
> ]
>   }
> ]
> 
> # cat /proc/iomem
> [..]
> 240000000-43fffffff : Application Reserved
>   240000000-33fffffff : hmem.0
>     240000000-33fffffff : dax0.0
>   340000000-43fffffff : hmem.1
>     340000000-43fffffff : dax1.0
> 
> Cc: Len Brown 
> Cc: Keith Busch 
> Cc: "Rafael J. Wysocki" 
> Cc: Vishal Verma 
> Cc: Jonathan Cameron 
> Reviewed-by: Dave Hansen 
> Signed-off-by: Dan Williams 
Looks good to me.

Reviewed-by: Jonathan Cameron 

I like the approach of this patch set in general as it gives the flex

Re: [PATCH v2 08/27] ocxl: Save the device serial number in ocxl_fn

2020-02-03 Thread Jonathan Cameron
On Tue, 3 Dec 2019 14:46:36 +1100
Alastair D'Silva  wrote:

> From: Alastair D'Silva 
> 
> This patch retrieves the serial number of the card and makes it available
> to consumers of the ocxl driver via the ocxl_fn struct.
> 
> Signed-off-by: Alastair D'Silva 
> Acked-by: Frederic Barrat 
> Acked-by: Andrew Donnellan 
> ---
>  drivers/misc/ocxl/config.c | 46 ++
>  include/misc/ocxl.h|  1 +
>  2 files changed, 47 insertions(+)
> 
> diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
> index fb0c3b6f8312..a9203c309365 100644
> --- a/drivers/misc/ocxl/config.c
> +++ b/drivers/misc/ocxl/config.c
> @@ -71,6 +71,51 @@ static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 
> afu_idx)
>   return 0;
>  }
>  
> +/**

Make sure anything you mark as kernel doc with /** is valid
kernel-doc.

> + * Find a related PCI device (function 0)
> + * @device: PCI device to match
> + *
> + * Returns a pointer to the related device, or null if not found
> + */
> +static struct pci_dev *get_function_0(struct pci_dev *dev)
> +{
> + unsigned int devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0); // Look for 
> function 0

Not sure the trailing comment adds much.

I'd personally not bother with this wrapper at all and just call
the pci functions directly where needed.

> +
> + return pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
> + dev->bus->number, devfn);
> +}
> +
> +static void read_serial(struct pci_dev *dev, struct ocxl_fn_config *fn)
> +{
> + u32 low, high;
> + int pos;
> +
> + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DSN);
> + if (pos) {
> + pci_read_config_dword(dev, pos + 0x04, &low);
> + pci_read_config_dword(dev, pos + 0x08, &high);
> +
> + fn->serial = low | ((u64)high) << 32;
> +
> + return;
> + }
> +
> + if (PCI_FUNC(dev->devfn) != 0) {
> + struct pci_dev *related = get_function_0(dev);
> +
> + if (!related) {
> + fn->serial = 0;
> + return;
> + }
> +
> + read_serial(related, fn);
> + pci_dev_put(related);
> + return;
> + }
> +
> + fn->serial = 0;
> +}
> +
>  static void read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
>  {
>   u16 val;
> @@ -208,6 +253,7 @@ int ocxl_config_read_function(struct pci_dev *dev, struct 
> ocxl_fn_config *fn)
>   int rc;
>  
>   read_pasid(dev, fn);
> + read_serial(dev, fn);
>  
>   rc = read_dvsec_tl(dev, fn);
>   if (rc) {
> diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> index 6f7c02f0d5e3..9843051c3c5b 100644
> --- a/include/misc/ocxl.h
> +++ b/include/misc/ocxl.h
> @@ -46,6 +46,7 @@ struct ocxl_fn_config {
>   int dvsec_afu_info_pos; /* offset of the AFU information DVSEC */
>   s8 max_pasid_log;
>   s8 max_afu_index;
> + u64 serial;
>  };
>  
>  enum ocxl_endian {

___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH v2 24/27] nvdimm/ocxl: Implement Overwrite

2020-02-03 Thread Jonathan Cameron
On Tue, 3 Dec 2019 14:46:52 +1100
Alastair D'Silva  wrote:

> From: Alastair D'Silva 
> 
> The near storage command 'Secure Erase' overwrites all data on the
> media.
> 
> This patch hooks it up to the security function 'overwrite'.
> 
> Signed-off-by: Alastair D'Silva 

A few things to tidy up in here.

Thanks,

Jonathan


> ---
>  drivers/nvdimm/ocxl/scm.c  | 164 -
>  drivers/nvdimm/ocxl/scm_internal.c |   1 +
>  drivers/nvdimm/ocxl/scm_internal.h |  17 +++
>  3 files changed, 180 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> index a81eb5916eb3..8deb7862793c 100644
> --- a/drivers/nvdimm/ocxl/scm.c
> +++ b/drivers/nvdimm/ocxl/scm.c
> @@ -169,6 +169,86 @@ static int scm_reserve_metadata(struct scm_data 
> *scm_data,
>   return 0;
>  }
>  
> +/**
> + * scm_overwrite() - Overwrite all data on the card
> + * @scm_data: The SCM device data

I would mention in here that this exits with the lock held and
where that is unlocked again.

> + * Return: 0 on success
> + */
> +int scm_overwrite(struct scm_data *scm_data)
> +{
> + int rc;
> +
> + mutex_lock(&scm_data->ns_command.lock);
> +
> + rc = scm_ns_command_request(scm_data, NS_COMMAND_SECURE_ERASE);
> + if (rc)

Perhaps change that goto label to reflect it is the error path rather
than a shared exit route.

> + goto out;
> +
> + rc = scm_ns_command_execute(scm_data);
> + if (rc)
> + goto out;
> +
> + scm_data->overwrite_state = SCM_OVERWRITE_BUSY;
> +
> + return 0;
> +
> +out:
> + mutex_unlock(&scm_data->ns_command.lock);
> + return rc;
> +}
> +
> +/**
> + * scm_secop_overwrite() - Overwrite all data on the card
> + * @nvdimm: The nvdimm representation of the SCM device to start the 
> overwrite on
> + * @key_data: Unused (no security key implementation)
> + * Return: 0 on success
> + */
> +static int scm_secop_overwrite(struct nvdimm *nvdimm,
> +const struct nvdimm_key_data *key_data)
> +{
> + struct scm_data *scm_data = nvdimm_provider_data(nvdimm);
> +
> + return scm_overwrite(scm_data);
> +}
> +
> +/**
> + * scm_secop_query_overwrite() - Get the current overwrite state
> + * @nvdimm: The nvdimm representation of the SCM device to start the 
> overwrite on
> + * Return: 0 if successful or idle, -EBUSY if busy, -EFAULT if failed
> + */
> +static int scm_secop_query_overwrite(struct nvdimm *nvdimm)
> +{
> + struct scm_data *scm_data = nvdimm_provider_data(nvdimm);
> +
> + if (scm_data->overwrite_state == SCM_OVERWRITE_BUSY)
> + return -EBUSY;
> +
> + if (scm_data->overwrite_state == SCM_OVERWRITE_FAILED)
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> +/**
> + * scm_secop_get_flags() - return the security flags for the SCM device

All params need to be documented in kernel-doc comments.

> + */
> +static unsigned long scm_secop_get_flags(struct nvdimm *nvdimm,
> + enum nvdimm_passphrase_type ptype)
> +{
> + struct scm_data *scm_data = nvdimm_provider_data(nvdimm);
> +
> + if (scm_data->overwrite_state == SCM_OVERWRITE_BUSY)
> + return BIT(NVDIMM_SECURITY_OVERWRITE);
> +
> + return BIT(NVDIMM_SECURITY_DISABLED);
> +}
> +
> +static const struct nvdimm_security_ops sec_ops  = {
> + .get_flags = scm_secop_get_flags,
> + .overwrite = scm_secop_overwrite,
> + .query_overwrite = scm_secop_query_overwrite,
> +};
> +
>  /**
>   * scm_register_lpc_mem() - Discover persistent memory on a device and 
> register it with the NVDIMM subsystem
>   * @scm_data: The SCM device data
> @@ -224,10 +304,10 @@ static int scm_register_lpc_mem(struct scm_data 
> *scm_data)
>   set_bit(NDD_ALIASING, _flags);
>  
>   snprintf(serial, sizeof(serial), "%llx", fn_config->serial);
> - nd_mapping_desc.nvdimm = nvdimm_create(scm_data->nvdimm_bus, scm_data,
> + nd_mapping_desc.nvdimm = __nvdimm_create(scm_data->nvdimm_bus, scm_data,
>scm_dimm_attribute_groups,
>nvdimm_flags, nvdimm_cmd_mask,
> -  0, NULL);
> +  0, NULL, serial, _ops);
>   if (!nd_mapping_desc.nvdimm)
>   return -ENOMEM;
>  
> @@ -1530,6 +1610,83 @@ static void scm_dump_error_log(struct scm_data 
> *scm_data)
>   kfree(buf);
>  }
>  
> +static void scm_handle_nscra_doorbell(struct scm_data *scm_data)
> +{
> + int rc;
> +
> + if (scm_data->ns_command.op_code == NS_COMMAND_SECURE_ERASE) {

Feels likely that we are going to end up with quite a few blocks like this as
the driver is extended. Perhaps just start out with a switch statement and
separate functions that it calls?

> + u64 success, attempted;
> +

One is enough here.

> +
> + rc = scm_ns_response(scm_data);
> + if (rc < 0) {
> + scm_data->overwrite_state = SCM_OVERWRITE_FAILED;

If 

Re: [PATCH v2 22/27] nvdimm/ocxl: Implement the heartbeat command

2020-02-03 Thread Jonathan Cameron
On Tue, 3 Dec 2019 14:46:50 +1100
Alastair D'Silva  wrote:

> From: Alastair D'Silva 
> 
> The heartbeat admin command is a simple admin command that exercises
> the communication mechanisms within the controller.
> 
> This patch issues a heartbeat command to the card during init to ensure
> we can communicate with the card's crontroller.

controller

> 
> Signed-off-by: Alastair D'Silva 
> ---
>  drivers/nvdimm/ocxl/scm.c | 43 +++
>  1 file changed, 43 insertions(+)
> 
> diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> index 8a30c887b5ed..e8b34262f397 100644
> --- a/drivers/nvdimm/ocxl/scm.c
> +++ b/drivers/nvdimm/ocxl/scm.c
> @@ -353,6 +353,44 @@ static bool scm_is_usable(const struct scm_data 
> *scm_data)
>   return true;
>  }
>  
> +/**
> + * scm_heartbeat() - Issue a heartbeat command to the controller
> + * @scm_data: a pointer to the SCM device data
> + * Return: 0 if the controller responded correctly, negative on error
> + */
> +static int scm_heartbeat(struct scm_data *scm_data)
> +{
> + int rc;
> +
> + mutex_lock(&scm_data->admin_command.lock);
> +
> + rc = scm_admin_command_request(scm_data, ADMIN_COMMAND_HEARTBEAT);
> + if (rc)
> + goto out;
> +
> + rc = scm_admin_command_execute(scm_data);
> + if (rc)
> + goto out;
> +
> + rc = scm_admin_command_complete_timeout(scm_data, 
> ADMIN_COMMAND_HEARTBEAT);
> + if (rc < 0) {
> + dev_err(&scm_data->dev, "Heartbeat timeout\n");
> + goto out;
> + }
> +
> + rc = scm_admin_response(scm_data);
> + if (rc < 0)
> + goto out;
> + if (rc != STATUS_SUCCESS)
> + scm_warn_status(scm_data, "Unexpected status from heartbeat", 
> rc);
> +
> + rc = scm_admin_response_handled(scm_data);
> +
> +out:
> + mutex_unlock(&scm_data->admin_command.lock);
> + return rc;
> +}
> +
>  /**
>   * allocate_scm_minor() - Allocate a minor number to use for an SCM device
>   * @scm_data: The SCM device to associate the minor with
> @@ -1508,6 +1546,11 @@ static int scm_probe(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>   goto err;
>   }
>  
> + if (scm_heartbeat(scm_data)) {
> + dev_err(&pdev->dev, "SCM Heartbeat failed\n");
> + goto err;
> + }
> +
>   elapsed = 0;
>   timeout = scm_data->readiness_timeout + 
> scm_data->memory_available_timeout;
>   while (!scm_is_usable(scm_data)) {



Re: [PATCH v2 10/27] nvdimm: Add driver for OpenCAPI Storage Class Memory

2020-02-03 Thread Jonathan Cameron
On Tue, 3 Dec 2019 14:46:38 +1100
Alastair D'Silva  wrote:

> From: Alastair D'Silva 
> 
> This driver exposes LPC memory on OpenCAPI SCM cards
> as an NVDIMM, allowing the existing nvram infrastructure
> to be used.
> 
> Namespace metadata is stored on the media itself, so
> scm_reserve_metadata() maps 1 section's worth of PMEM storage
> at the start to hold this. The rest of the PMEM range is registered
> with libnvdimm as an nvdimm. scm_ndctl_config_read/write/size() provide
> callbacks to libnvdimm to access the metadata.
> 
> Signed-off-by: Alastair D'Silva 
Hi Alastair,

A few bits and bobs inline.

Thanks,

Jonathan

> ---
>  drivers/nvdimm/Kconfig |   2 +
>  drivers/nvdimm/Makefile|   2 +-
>  drivers/nvdimm/ocxl/Kconfig|  15 +
>  drivers/nvdimm/ocxl/Makefile   |   7 +
>  drivers/nvdimm/ocxl/scm.c  | 519 +
>  drivers/nvdimm/ocxl/scm_internal.h |  28 ++
>  6 files changed, 572 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/nvdimm/ocxl/Kconfig
>  create mode 100644 drivers/nvdimm/ocxl/Makefile
>  create mode 100644 drivers/nvdimm/ocxl/scm.c
>  create mode 100644 drivers/nvdimm/ocxl/scm_internal.h
> 
> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> index 36af7af6b7cf..d1bab36da61c 100644
> --- a/drivers/nvdimm/Kconfig
> +++ b/drivers/nvdimm/Kconfig
> @@ -130,4 +130,6 @@ config NVDIMM_TEST_BUILD
> core devm_memremap_pages() implementation and other
> infrastructure.
>  
> +source "drivers/nvdimm/ocxl/Kconfig"
> +
>  endif
> diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
> index 29203f3d3069..e33492128042 100644
> --- a/drivers/nvdimm/Makefile
> +++ b/drivers/nvdimm/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
> +obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o ocxl/
>  obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
>  obj-$(CONFIG_ND_BTT) += nd_btt.o
>  obj-$(CONFIG_ND_BLK) += nd_blk.o
> diff --git a/drivers/nvdimm/ocxl/Kconfig b/drivers/nvdimm/ocxl/Kconfig
> new file mode 100644
> index 000000000000..24099b300f5e
> --- /dev/null
> +++ b/drivers/nvdimm/ocxl/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +if LIBNVDIMM
> +
> +config OCXL_SCM
> + tristate "OpenCAPI Storage Class Memory"
> + depends on LIBNVDIMM && PPC_POWERNV && PCI && EEH
> + select ZONE_DEVICE
> + select OCXL
> + help
> +   Exposes devices that implement the OpenCAPI Storage Class Memory
> +   specification as persistent memory regions.
> +
> +   Select N if unsure.
> +
> +endif
> diff --git a/drivers/nvdimm/ocxl/Makefile b/drivers/nvdimm/ocxl/Makefile
> new file mode 100644
> index 000000000000..74a1bd98848e
> --- /dev/null
> +++ b/drivers/nvdimm/ocxl/Makefile
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +ccflags-$(CONFIG_PPC_WERROR) += -Werror
> +
> +obj-$(CONFIG_OCXL_SCM) += ocxlscm.o
> +
> +ocxlscm-y := scm.o
> diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> new file mode 100644
> index 000000000000..571058a9e7b8
> --- /dev/null
> +++ b/drivers/nvdimm/ocxl/scm.c
> @@ -0,0 +1,519 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +// Copyright 2019 IBM Corp.
> +
> +/*
> + * A driver for Storage Class Memory, connected via OpenCAPI
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "scm_internal.h"
> +
> +
> +static const struct pci_device_id scm_pci_tbl[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), },
> + { }
> +};
> +
> +MODULE_DEVICE_TABLE(pci, scm_pci_tbl);
> +
> +#define SCM_NUM_MINORS 256 // Total to reserve
> +
> +static dev_t scm_dev;
> +static struct class *scm_class;
> +static struct mutex minors_idr_lock;
> +static struct idr minors_idr;
> +
> +static const struct attribute_group *scm_pmem_attribute_groups[] = {
> + &nvdimm_bus_attribute_group,
> + NULL,
> +};
> +
> +static const struct attribute_group *scm_pmem_region_attribute_groups[] = {
> + &nd_region_attribute_group,
> + &nd_device_attribute_group,
> + &nd_mapping_attribute_group,
> + &nd_numa_attribute_group,
> + NULL,
> +};
> +
> +/**
> + * scm_ndctl_config_write() - Handle a ND_CMD_SET_CONFIG_DATA command from 
> ndctl
> + * @scm_data: the SCM metadata
> + * @command: the incoming data to write
> + * Return: 0 on success, negative on failure
> + */
> +static int scm_ndctl_config_write(struct scm_data *scm_data,
> +   struct nd_cmd_set_config_hdr *command)
> +{
> + if (command->in_offset + command->in_length > SCM_LABEL_AREA_SIZE)
> + return -EINVAL;
> +
> + memcpy_flushcache(scm_data->metadata_addr + command->in_offset, 
> command->in_buf,
> +   command->in_length);
> +
> + return 0;
> +}
> +
> +/**
> + * scm_ndctl_config_read() - Handle a ND_CMD_GET_CONFIG_DATA command from 
> ndctl
> + * @scm_data: the SCM metadata
> + * @command: the read request
> + * Return: 0 on 

Re: [PATCH v2 07/27] ocxl: Add functions to map/unmap LPC memory

2020-02-03 Thread Jonathan Cameron
On Tue, 3 Dec 2019 14:46:35 +1100
Alastair D'Silva  wrote:

> From: Alastair D'Silva 
> 
> Add functions to map/unmap LPC memory
> 
> Signed-off-by: Alastair D'Silva 
> ---
>  drivers/misc/ocxl/config.c|  4 +++
>  drivers/misc/ocxl/core.c  | 50 +++
>  drivers/misc/ocxl/ocxl_internal.h |  3 ++
>  include/misc/ocxl.h   | 18 +++
>  4 files changed, 75 insertions(+)
> 
> diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
> index c8e19bfb5ef9..fb0c3b6f8312 100644
> --- a/drivers/misc/ocxl/config.c
> +++ b/drivers/misc/ocxl/config.c
> @@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct pci_dev *dev,
>   afu->special_purpose_mem_size =
>   total_mem_size - lpc_mem_size;
>   }
> +
> + dev_info(&dev->dev, "Probed LPC memory of %#llx bytes and special 
> purpose memory of %#llx bytes\n",
> + afu->lpc_mem_size, afu->special_purpose_mem_size);
> +

If we are being fussy, this block has nothing to do with the rest of the patch
so we shouldn't be seeing it here.

>   return 0;
>  }
>  
> diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
> index 2531c6cf19a0..98611faea219 100644
> --- a/drivers/misc/ocxl/core.c
> +++ b/drivers/misc/ocxl/core.c
> @@ -210,6 +210,55 @@ static void unmap_mmio_areas(struct ocxl_afu *afu)
>   release_fn_bar(afu->fn, afu->config.global_mmio_bar);
>  }
>  
> +int ocxl_afu_map_lpc_mem(struct ocxl_afu *afu)
> +{
> + struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
> +
> + if ((afu->config.lpc_mem_size + afu->config.special_purpose_mem_size) 
> == 0)
> + return 0;
> +
> + afu->lpc_base_addr = ocxl_link_lpc_map(afu->fn->link, dev);
> + if (afu->lpc_base_addr == 0)
> + return -EINVAL;
> +
> + if (afu->config.lpc_mem_size) {

I was happy with the explicit check on 0 above, but we should be consistent.  
Either
we make use of 0 == false, or we don't and explicitly check vs 0.

Hence

if (afu->config.lpc_mem_size != 0) {

here or

if (!(afu->config.lpc_mem_size + afu->config.special_purpose_mem_size))
return 0;

above.

> + afu->lpc_res.start = afu->lpc_base_addr + 
> afu->config.lpc_mem_offset;
> + afu->lpc_res.end = afu->lpc_res.start + 
> afu->config.lpc_mem_size - 1;
> + }
> +
> + if (afu->config.special_purpose_mem_size) {
> + afu->special_purpose_res.start = afu->lpc_base_addr +
> +  
> afu->config.special_purpose_mem_offset;
> + afu->special_purpose_res.end = afu->special_purpose_res.start +
> +
> afu->config.special_purpose_mem_size - 1;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(ocxl_afu_map_lpc_mem);
> +
> +struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu)
> +{
> + return &afu->lpc_res;
> +}
> +EXPORT_SYMBOL_GPL(ocxl_afu_lpc_mem);
> +
> +static void unmap_lpc_mem(struct ocxl_afu *afu)
> +{
> + struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
> +
> + if (afu->lpc_res.start || afu->special_purpose_res.start) {
> + void *link = afu->fn->link;
> +
> + ocxl_link_lpc_release(link, dev);
> +
> + afu->lpc_res.start = 0;
> + afu->lpc_res.end = 0;
> + afu->special_purpose_res.start = 0;
> + afu->special_purpose_res.end = 0;
> + }
> +}
> +
>  static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev 
> *dev)
>  {
>   int rc;
> @@ -251,6 +300,7 @@ static int configure_afu(struct ocxl_afu *afu, u8 
> afu_idx, struct pci_dev *dev)
>  
>  static void deconfigure_afu(struct ocxl_afu *afu)
>  {
> + unmap_lpc_mem(afu);

Hmm. This breaks the existing balance between configure_afu and deconfigure_afu.

Given comments below on why we don't do map_lpc_mem in the afu bring up
(as it's a shared operation) it seems to me that we should be doing this
outside of the afu deconfigure.  Perhaps ocxl_function_close is appropriate?
I don't know this infrastructure well enough to be sure.

If it does need to be here, then a comment to give more info on
why would be great!

>   unmap_mmio_areas(afu);
>   reclaim_afu_pasid(afu);
>   reclaim_afu_actag(afu);
> diff --git a/drivers/misc/ocxl/ocxl_internal.h 
> b/drivers/misc/ocxl/ocxl_internal.h
> index 20b417e00949..9f4b47900e62 100644
> --- a/drivers/misc/ocxl/ocxl_internal.h
> +++ b/drivers/misc/ocxl/ocxl_internal.h
> @@ -52,6 +52,9 @@ struct ocxl_afu {
>   void __iomem *global_mmio_ptr;
>   u64 pp_mmio_start;
>   void *private;
> + u64 lpc_base_addr; /* Covers both LPC & special purpose memory */
> + struct resource lpc_res;
> + struct resource special_purpose_res;
>  };
>  
>  enum ocxl_context_status {
> diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> index 06dd5839e438..6f7c02f0d5e3 100644
> --- a/include/misc/ocxl.h
> +++ 

Re: [PATCH v2 06/27] ocxl: Tally up the LPC memory on a link & allow it to be mapped

2020-02-03 Thread Jonathan Cameron
On Tue, 3 Dec 2019 14:46:34 +1100
Alastair D'Silva  wrote:

> From: Alastair D'Silva 
> 
> Tally up the LPC memory on an OpenCAPI link & allow it to be mapped
> 
> Signed-off-by: Alastair D'Silva 
Hi Alastair,

A few trivial comments inline.

Jonathan

> ---
>  drivers/misc/ocxl/core.c  | 10 ++
>  drivers/misc/ocxl/link.c  | 60 +++
>  drivers/misc/ocxl/ocxl_internal.h | 33 +
>  3 files changed, 103 insertions(+)
> 
> diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
> index b7a09b21ab36..2531c6cf19a0 100644
> --- a/drivers/misc/ocxl/core.c
> +++ b/drivers/misc/ocxl/core.c
> @@ -230,8 +230,18 @@ static int configure_afu(struct ocxl_afu *afu, u8 
> afu_idx, struct pci_dev *dev)
>   if (rc)
>   goto err_free_pasid;
>  
> + if (afu->config.lpc_mem_size || afu->config.special_purpose_mem_size) {
> + rc = ocxl_link_add_lpc_mem(afu->fn->link, 
> afu->config.lpc_mem_offset,
> +afu->config.lpc_mem_size +
> +
> afu->config.special_purpose_mem_size);
> + if (rc)
> + goto err_free_mmio;
> + }
> +
>   return 0;
>  
> +err_free_mmio:
> + unmap_mmio_areas(afu);
>  err_free_pasid:
>   reclaim_afu_pasid(afu);
>  err_free_actag:
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 58d111afd9f6..d8503f0dc6ec 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -84,6 +84,11 @@ struct ocxl_link {
>   int dev;
>   atomic_t irq_available;
>   struct spa *spa;
> + struct mutex lpc_mem_lock;

Always a good idea to explicitly document what a lock is intended to protect.

> + u64 lpc_mem_sz; /* Total amount of LPC memory presented on the link */
> + u64 lpc_mem;
> + int lpc_consumers;
> +
>   void *platform_data;
>  };
>  static struct list_head links_list = LIST_HEAD_INIT(links_list);
> @@ -396,6 +401,8 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, 
> struct ocxl_link **out_l
>   if (rc)
>   goto err_spa;
>  
> + mutex_init(&link->lpc_mem_lock);
> +
>   /* platform specific hook */
>   rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask,
> &link->platform_data);
> @@ -711,3 +718,56 @@ void ocxl_link_free_irq(void *link_handle, int hw_irq)
> + atomic_inc(&link->irq_available);
>  }
>  EXPORT_SYMBOL_GPL(ocxl_link_free_irq);
> +
> +int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size)
> +{
> + struct ocxl_link *link = (struct ocxl_link *) link_handle;
> +
> + // Check for overflow

Stray c++ style comment.

> + if (offset > (offset + size))
> + return -EINVAL;
> +
> + mutex_lock(&link->lpc_mem_lock);
> + link->lpc_mem_sz = max(link->lpc_mem_sz, offset + size);
> +
> + mutex_unlock(&link->lpc_mem_lock);
> +
> + return 0;
> +}
> +
> +u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
> +{
> + struct ocxl_link *link = (struct ocxl_link *) link_handle;
> + u64 lpc_mem;
> +
> + mutex_lock(&link->lpc_mem_lock);
> + if (link->lpc_mem) {

If you don't modify this later in the series (I haven't read it all yet :),
it rather feels like it would be more compact and just as readable as
something like...

if (!link->lpc_mem)
link->lpc_mem = pnv_ocxl...

if (link->lpc_mem)
link->lpc_consumers++;
mutex_unlock(&link->lpc_mem_lock);

return link->lpc_mem;

> + lpc_mem = link->lpc_mem;
> +
> + link->lpc_consumers++;
> + mutex_unlock(&link->lpc_mem_lock);
> + return lpc_mem;
> + }
> +
> + link->lpc_mem = pnv_ocxl_platform_lpc_setup(pdev, link->lpc_mem_sz);
> + if (link->lpc_mem)
> + link->lpc_consumers++;
> + lpc_mem = link->lpc_mem;
> + mutex_unlock(&link->lpc_mem_lock);
> +
> + return lpc_mem;
> +}
> +
> +void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev)
> +{
> + struct ocxl_link *link = (struct ocxl_link *) link_handle;
> +
> + mutex_lock(&link->lpc_mem_lock);
> + WARN_ON(--link->lpc_consumers < 0);
> + if (link->lpc_consumers == 0) {
> + pnv_ocxl_platform_lpc_release(pdev);
> + link->lpc_mem = 0;
> + }
> +
> + mutex_unlock(&link->lpc_mem_lock);
> +}
> diff --git a/drivers/misc/ocxl/ocxl_internal.h 
> b/drivers/misc/ocxl/ocxl_internal.h
> index 97415afd79f3..20b417e00949 100644
> --- a/drivers/misc/ocxl/ocxl_internal.h
> +++ b/drivers/misc/ocxl/ocxl_internal.h
> @@ -141,4 +141,37 @@ int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 
> offset);
>  u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id);
>  void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
>  
> +/**
> + * ocxl_link_add_lpc_mem() - Increment the amount of memory required by an 
> OpenCAPI link
> + *
> + * @link_handle: The OpenCAPI link 

Re: [PATCH v2 14/27] nvdimm/ocxl: Add support for near storage commands

2020-02-03 Thread Jonathan Cameron
On Tue, 3 Dec 2019 14:46:42 +1100
Alastair D'Silva  wrote:

> From: Alastair D'Silva 
> 
> Similar to the previous patch, this adds support for near storage commands.
> 
> Signed-off-by: Alastair D'Silva 
> ---
>  drivers/nvdimm/ocxl/scm.c  |  6 +
>  drivers/nvdimm/ocxl/scm_internal.c | 41 ++
>  drivers/nvdimm/ocxl/scm_internal.h | 38 +++
>  3 files changed, 85 insertions(+)
> 
> diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> index 1e175f3c3cf2..6c16ca7fabfa 100644
> --- a/drivers/nvdimm/ocxl/scm.c
> +++ b/drivers/nvdimm/ocxl/scm.c
> @@ -310,12 +310,18 @@ static int scm_setup_command_metadata(struct scm_data 
> *scm_data)
>   int rc;
>  
>   mutex_init(&scm_data->admin_command.lock);
> + mutex_init(&scm_data->ns_command.lock);
>  
>   rc = scm_extract_command_metadata(scm_data, GLOBAL_MMIO_ACMA_CREQO,
> &scm_data->admin_command);
>   if (rc)
>   return rc;
>  
> + rc = scm_extract_command_metadata(scm_data, GLOBAL_MMIO_NSCMA_CREQO,
> + &scm_data->ns_command);
> + if (rc)
> + return rc;
> +

Ah. So much for my comment in previous patch.  Ignore that...

>   return 0;
>  }
>  
> diff --git a/drivers/nvdimm/ocxl/scm_internal.c 
> b/drivers/nvdimm/ocxl/scm_internal.c
> index 7b11b56863fb..c405f1d8afb8 100644
> --- a/drivers/nvdimm/ocxl/scm_internal.c
> +++ b/drivers/nvdimm/ocxl/scm_internal.c
> @@ -132,6 +132,47 @@ int scm_admin_response_handled(const struct scm_data 
> *scm_data)
> OCXL_LITTLE_ENDIAN, GLOBAL_MMIO_CHI_ACRA);
>  }
>  
> +int scm_ns_command_request(struct scm_data *scm_data, u8 op_code)
> +{
> + u64 val;
> + int rc = ocxl_global_mmio_read64(scm_data->ocxl_afu, GLOBAL_MMIO_CHI,
> +  OCXL_LITTLE_ENDIAN, &val);
> + if (rc)
> + return rc;
> +
> + if (!(val & GLOBAL_MMIO_CHI_NSCRA))
> + return -EBUSY;
> +
> + return scm_command_request(scm_data, &scm_data->ns_command, op_code);
> +}
> +
> +int scm_ns_response(const struct scm_data *scm_data)
> +{
> + return scm_command_response(scm_data, &scm_data->ns_command);
> +}
> +
> +int scm_ns_command_execute(const struct scm_data *scm_data)
> +{
> + return ocxl_global_mmio_set64(scm_data->ocxl_afu, GLOBAL_MMIO_HCI,
> +   OCXL_LITTLE_ENDIAN, 
> GLOBAL_MMIO_HCI_NSCRW);
> +}
> +
> +bool scm_ns_command_complete(const struct scm_data *scm_data)
> +{
> + u64 val = 0;
> + int rc = scm_chi(scm_data, &val);
> +
> + WARN_ON(rc);
> +
> + return (val & GLOBAL_MMIO_CHI_NSCRA) != 0;
> +}
> +
> +int scm_ns_response_handled(const struct scm_data *scm_data)
> +{
> + return ocxl_global_mmio_set64(scm_data->ocxl_afu, GLOBAL_MMIO_CHIC,
> +   OCXL_LITTLE_ENDIAN, 
> GLOBAL_MMIO_CHI_NSCRA);
> +}
> +
>  void scm_warn_status(const struct scm_data *scm_data, const char *message,
>u8 status)
>  {
> diff --git a/drivers/nvdimm/ocxl/scm_internal.h 
> b/drivers/nvdimm/ocxl/scm_internal.h
> index 9bff684cd069..9575996a89e7 100644
> --- a/drivers/nvdimm/ocxl/scm_internal.h
> +++ b/drivers/nvdimm/ocxl/scm_internal.h
> @@ -108,6 +108,7 @@ struct scm_data {
>   struct ocxl_context *ocxl_context;
>   void *metadata_addr;
>   struct command_metadata admin_command;
> + struct command_metadata ns_command;
>   struct resource scm_res;
>   struct nd_region *nd_region;
>   char fw_version[8+1];
> @@ -176,6 +177,42 @@ int scm_admin_command_complete_timeout(const struct 
> scm_data *scm_data,
>   */
>  int scm_admin_response_handled(const struct scm_data *scm_data);
>  
> +/**
> + * scm_ns_command_request() - Issue a near storage command request
> + * @scm_data: a pointer to the SCM device data
> + * @op_code: The op-code for the command
> + * Returns an identifier for the command, or negative on error
> + */
> +int scm_ns_command_request(struct scm_data *scm_data, u8 op_code);
> +
> +/**
> + * scm_ns_response() - Validate a near storage response
> + * @scm_data: a pointer to the SCM device data
> + * Returns the status code of the command, or negative on error
> + */
> +int scm_ns_response(const struct scm_data *scm_data);
> +
> +/**
> + * scm_ns_command_execute() - Notify the controller to start processing a 
> pending near storage command
> + * @scm_data: a pointer to the SCM device data
> + * Returns 0 on success, negative on error
> + */
> +int scm_ns_command_execute(const struct scm_data *scm_data);
> +
> +/**
> + * scm_ns_command_complete() - Is a near storage command executing
> + * scm_data: a pointer to the SCM device data
> + * Returns true if the previous admin command has completed
> + */
> +bool scm_ns_command_complete(const struct scm_data *scm_data);
> +
> +/**
> + * scm_ns_response_handled() - Notify the controller that the near storage 
> response 
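The near-storage helpers quoted above (request, execute, complete, response, handled) imply a fixed handshake against the CHI/HCI/CHIC registers. A toy, self-contained sketch of that sequence follows; the bit position, the EBUSY value, and the synchronous "hardware" are all stand-ins for illustration, not the real device model:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the CHI "near storage command" attention bit; the real
 * bit position (GLOBAL_MMIO_CHI_NSCRA) is not reproduced here. */
#define CHI_NSCRA (1ULL << 0)

static uint64_t fake_chi = CHI_NSCRA; /* controller starts idle/ready */
static uint8_t last_op;

/* Mirrors scm_ns_command_request(): refuse a new request while the
 * previous one has not completed (NSCRA still clear). */
static int ns_command_request(uint8_t op_code)
{
	if (!(fake_chi & CHI_NSCRA))
		return -16; /* stand-in for -EBUSY */
	last_op = op_code;
	return 0;
}

/* Mirrors scm_ns_command_execute(): kick the controller.  The real
 * hardware completes asynchronously; this toy completes instantly. */
static void ns_command_execute(void)
{
	fake_chi &= ~CHI_NSCRA; /* busy while the command runs */
	fake_chi |= CHI_NSCRA;  /* completion re-raises NSCRA */
}

/* Mirrors scm_ns_command_complete(): NSCRA set means done. */
static bool ns_command_complete(void)
{
	return (fake_chi & CHI_NSCRA) != 0;
}
```

A caller would drive this as request, execute, poll complete, then read and acknowledge the response, which is the order the driver's helper names suggest.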

Re: [PATCH v2 12/27] nvdimm/ocxl: Read the capability registers & wait for device ready

2020-02-03 Thread Jonathan Cameron
On Tue, 3 Dec 2019 14:46:40 +1100
Alastair D'Silva  wrote:

> From: Alastair D'Silva 
> 
> This patch reads timeouts & firmware version from the controller, and
> uses those timeouts to wait for the controller to report that it is ready
> before handing the memory over to libnvdimm.
> 
> Signed-off-by: Alastair D'Silva 
> ---
>  drivers/nvdimm/ocxl/Makefile   |  2 +-
>  drivers/nvdimm/ocxl/scm.c  | 84 ++
>  drivers/nvdimm/ocxl/scm_internal.c | 19 +++
>  drivers/nvdimm/ocxl/scm_internal.h | 24 +
>  4 files changed, 128 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/nvdimm/ocxl/scm_internal.c
> 
> diff --git a/drivers/nvdimm/ocxl/Makefile b/drivers/nvdimm/ocxl/Makefile
> index 74a1bd98848e..9b6e31f0eb3e 100644
> --- a/drivers/nvdimm/ocxl/Makefile
> +++ b/drivers/nvdimm/ocxl/Makefile
> @@ -4,4 +4,4 @@ ccflags-$(CONFIG_PPC_WERROR)  += -Werror
>  
>  obj-$(CONFIG_OCXL_SCM) += ocxlscm.o
>  
> -ocxlscm-y := scm.o
> +ocxlscm-y := scm.o scm_internal.o
> diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> index 571058a9e7b8..8088f65c289e 100644
> --- a/drivers/nvdimm/ocxl/scm.c
> +++ b/drivers/nvdimm/ocxl/scm.c
> @@ -7,6 +7,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -266,6 +267,30 @@ static int scm_register_lpc_mem(struct scm_data 
> *scm_data)
>   return 0;
>  }
>  
> +/**
> + * scm_is_usable() - Is a controller usable?
> + * @scm_data: a pointer to the SCM device data
> + * Return: true if the controller is usable
> + */
> +static bool scm_is_usable(const struct scm_data *scm_data)
> +{
> + u64 chi = 0;
> + int rc = scm_chi(scm_data, &chi);
> +
> + if (!(chi & GLOBAL_MMIO_CHI_CRDY)) {
> + dev_err(&scm_data->dev, "SCM controller is not ready.\n");
> + return false;
> + }
> +
> + if (!(chi & GLOBAL_MMIO_CHI_MA)) {
> + dev_err(&scm_data->dev,
> + "SCM controller does not have memory available.\n");
> + return false;
> + }
> +
> + return true;
> +}
> +
>  /**
>   * allocate_scm_minor() - Allocate a minor number to use for an SCM device
>   * @scm_data: The SCM device to associate the minor with
> @@ -380,6 +405,48 @@ static void scm_remove(struct pci_dev *pdev)
>   }
>  }
>  
> +/**
> + * read_device_metadata() - Retrieve config information from the AFU and 
> save it for future use
> + * @scm_data: the SCM metadata
> + * Return: 0 on success, negative on failure
> + */
> +static int read_device_metadata(struct scm_data *scm_data)
> +{
> + u64 val;
> + int rc;
> +
> + rc = ocxl_global_mmio_read64(scm_data->ocxl_afu, GLOBAL_MMIO_CCAP0,
> +  OCXL_LITTLE_ENDIAN, &val);
> + if (rc)
> + return rc;
> +
> + scm_data->scm_revision = val & 0x;
> + scm_data->read_latency = (val >> 32) & 0xFF;
> + scm_data->readiness_timeout = (val >> 48) & 0xff;
> + scm_data->memory_available_timeout = val >> 52;

This overlaps with the masked region for readiness_timeout.  I'll guess the mask
on that should be 0xF.
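To make the overlap concrete: with an 0xff mask, bits 55:48 of CCAP0 feed readiness_timeout while bits 63:52 feed memory_available_timeout, so bits 55:52 land in both fields. A self-contained sketch of the unpacking with the suggested 0xF mask (the field widths here are inferred from the shifts in the quoted code, not from a spec):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CCAP0 layout inferred from the shifts above:
 *   bits 15:0  -> scm_revision
 *   bits 39:32 -> read_latency
 *   bits 51:48 -> readiness_timeout (4 bits, hence the 0xF mask)
 *   bits 63:52 -> memory_available_timeout */
struct ccap0_fields {
	uint32_t scm_revision;
	uint8_t  read_latency;
	uint8_t  readiness_timeout;
	uint16_t memory_available_timeout;
};

static struct ccap0_fields unpack_ccap0(uint64_t val)
{
	struct ccap0_fields f;

	f.scm_revision = val & 0xFFFF;
	f.read_latency = (val >> 32) & 0xFF;
	/* 0xFF here would pull in the low 4 bits of the field at bit 52 */
	f.readiness_timeout = (val >> 48) & 0xF;
	f.memory_available_timeout = val >> 52;
	return f;
}
```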

> +
> + rc = ocxl_global_mmio_read64(scm_data->ocxl_afu, GLOBAL_MMIO_CCAP1,
> +  OCXL_LITTLE_ENDIAN, &val);
> + if (rc)
> + return rc;
> +
> + scm_data->max_controller_dump_size = val & 0x;
> +
> + // Extract firmware version text
> + rc = ocxl_global_mmio_read64(scm_data->ocxl_afu, GLOBAL_MMIO_FWVER,
> +  OCXL_HOST_ENDIAN, (u64 
> *)scm_data->fw_version);
> + if (rc)
> + return rc;
> +
> + scm_data->fw_version[8] = '\0';
> +
> + dev_info(&scm_data->dev,
> +  "Firmware version '%s' SCM revision %d:%d\n", 
> scm_data->fw_version,
> +  scm_data->scm_revision >> 4, scm_data->scm_revision & 0x0F);
> +
> + return 0;
> +}
> +
>  /**
>   * scm_probe_function_0 - Set up function 0 for an OpenCAPI Storage Class 
> Memory device
>   * This is important as it enables templates higher than 0 across all other 
> functions,
> @@ -420,6 +487,8 @@ static int scm_probe_function_0(struct pci_dev *pdev)
>  static int scm_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  {
>   struct scm_data *scm_data = NULL;
> + int elapsed;
> + u16 timeout;
>  
>   if (PCI_FUNC(pdev->devfn) == 0)
>   return scm_probe_function_0(pdev);
> @@ -469,6 +538,21 @@ static int scm_probe(struct pci_dev *pdev, const struct 
> pci_device_id *ent)
>   goto err;
>   }
>  
> + if (read_device_metadata(scm_data)) {
> + dev_err(&pdev->dev, "Could not read SCM device metadata\n");
> + goto err;
> + }
> +
> + elapsed = 0;
> + timeout = scm_data->readiness_timeout + 
> scm_data->memory_available_timeout;
> + while (!scm_is_usable(scm_data)) {
> + if (elapsed++ > timeout) {
> + dev_warn(&scm_data->dev, "SCM ready 

Re: [PATCH v2 13/27] nvdimm/ocxl: Add support for Admin commands

2020-02-03 Thread Jonathan Cameron
On Tue, 3 Dec 2019 14:46:41 +1100
Alastair D'Silva  wrote:

> From: Alastair D'Silva 
> 
> This patch requests the metadata required to issue admin commands, as well
> as some helper functions to construct and check the completion of the
> commands.
> 
> Signed-off-by: Alastair D'Silva 

A few trivial bits inline.

Jonathan

> ---
>  drivers/nvdimm/ocxl/scm.c  |  67 +
>  drivers/nvdimm/ocxl/scm_internal.c | 152 +
>  drivers/nvdimm/ocxl/scm_internal.h |  62 
>  3 files changed, 281 insertions(+)
> 
> diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> index 8088f65c289e..1e175f3c3cf2 100644
> --- a/drivers/nvdimm/ocxl/scm.c
> +++ b/drivers/nvdimm/ocxl/scm.c
> @@ -267,6 +267,58 @@ static int scm_register_lpc_mem(struct scm_data 
> *scm_data)
>   return 0;
>  }
>  
> +/**
> + * scm_extract_command_metadata() - Extract command data from MMIO & save it 
> for further use
> + * @scm_data: a pointer to the SCM device data
> + * @offset: The base address of the command data structures (address of 
> CREQO)
> + * @command_metadata: A pointer to the command metadata to populate
> + * Return: 0 on success, negative on failure
> + */
> +static int scm_extract_command_metadata(struct scm_data *scm_data, u32 
> offset,
> + struct command_metadata 
> *command_metadata)
> +{
> + int rc;
> + u64 tmp;
> +
> + rc = ocxl_global_mmio_read64(scm_data->ocxl_afu, offset, 
> OCXL_LITTLE_ENDIAN,
> +  &tmp);
> + if (rc)
> + return rc;
> +
> + command_metadata->request_offset = tmp >> 32;
> + command_metadata->response_offset = tmp & 0xFFFFFFFF;
> +
> + rc = ocxl_global_mmio_read64(scm_data->ocxl_afu, offset + 8, 
> OCXL_LITTLE_ENDIAN,
> +  &tmp);
> + if (rc)
> + return rc;
> +
> + command_metadata->data_offset = tmp >> 32;
> + command_metadata->data_size = tmp & 0xFFFFFFFF;
> +
> + command_metadata->id = 0;
> +
> + return 0;
> +}
> +
> +/**
> + * scm_setup_command_metadata() - Set up the command metadata
> + * @scm_data: a pointer to the SCM device data
> + */
> +static int scm_setup_command_metadata(struct scm_data *scm_data)
> +{
> + int rc;
> +
> + mutex_init(&scm_data->admin_command.lock);
> +
> + rc = scm_extract_command_metadata(scm_data, GLOBAL_MMIO_ACMA_CREQO,
> +   &scm_data->admin_command);
> + if (rc)
> + return rc;

Unless you are adding to this later in the series, this could simply end with:

return scm_extract_command_metadata(scm_data, ...);

> +
> + return 0;
> +}
> +
>  /**
>   * scm_is_usable() - Is a controller usable?
>   * @scm_data: a pointer to the SCM device data
> @@ -276,6 +328,8 @@ static bool scm_is_usable(const struct scm_data *scm_data)
>  {
>   u64 chi = 0;
> + int rc = scm_chi(scm_data, &chi);
> + if (rc)
> + return false;
>  
>   if (!(chi & GLOBAL_MMIO_CHI_CRDY)) {
> + dev_err(&scm_data->dev, "SCM controller is not ready.\n");
> @@ -502,6 +556,14 @@ static int scm_probe(struct pci_dev *pdev, const struct 
> pci_device_id *ent)
>   }
>   scm_data->pdev = pdev;
>  
> + scm_data->timeouts[ADMIN_COMMAND_ERRLOG] = 2000; // ms
> + scm_data->timeouts[ADMIN_COMMAND_HEARTBEAT] = 100; // ms
> + scm_data->timeouts[ADMIN_COMMAND_SMART] = 100; // ms
> + scm_data->timeouts[ADMIN_COMMAND_CONTROLLER_DUMP] = 1000; // ms
> + scm_data->timeouts[ADMIN_COMMAND_CONTROLLER_STATS] = 100; // ms
> + scm_data->timeouts[ADMIN_COMMAND_SHUTDOWN] = 1000; // ms
> + scm_data->timeouts[ADMIN_COMMAND_FW_UPDATE] = 16000; // ms
> +
>   pci_set_drvdata(pdev, scm_data);
>  
>   scm_data->ocxl_fn = ocxl_function_open(pdev);
> @@ -543,6 +605,11 @@ static int scm_probe(struct pci_dev *pdev, const struct 
> pci_device_id *ent)
>   goto err;
>   }
>  
> + if (scm_setup_command_metadata(scm_data)) {
> + dev_err(&pdev->dev, "Could not read OCXL command matada\n");
> + goto err;
> + }
> +
>   elapsed = 0;
>   timeout = scm_data->readiness_timeout + 
> scm_data->memory_available_timeout;
>   while (!scm_is_usable(scm_data)) {
> diff --git a/drivers/nvdimm/ocxl/scm_internal.c 
> b/drivers/nvdimm/ocxl/scm_internal.c
> index 72d3c0e7d846..7b11b56863fb 100644
> --- a/drivers/nvdimm/ocxl/scm_internal.c
> +++ b/drivers/nvdimm/ocxl/scm_internal.c
> @@ -17,3 +17,155 @@ int scm_chi(const struct scm_data *scm_data, u64 *chi)
>  
>   return 0;
>  }
> +
> +static int scm_command_request(const struct scm_data *scm_data,
> +struct command_metadata *cmd, u8 op_code)
> +{
> + u64 val = op_code;
> + int rc;
> + u8 i;
> +
> + cmd->op_code = op_code;
> + cmd->id++;
> +
> + val |= ((u64)cmd->id) << 16;
> +
> + rc = ocxl_global_mmio_write64(scm_data->ocxl_afu, cmd->request_offset,
> +  

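The request-word packing quoted above (op-code in the low byte, a post-incremented command id shifted to bit 16) is easy to check in isolation. A minimal sketch, with the id field width an assumption:

```c
#include <assert.h>
#include <stdint.h>

struct cmd_meta {
	uint8_t  op_code;
	uint16_t id;
};

/* Build the request doubleword as scm_command_request() does: op-code
 * in bits 7:0, the freshly incremented command id starting at bit 16. */
static uint64_t pack_request(struct cmd_meta *cmd, uint8_t op_code)
{
	uint64_t val = op_code;

	cmd->op_code = op_code;
	cmd->id++;                      /* each request gets a new id */
	val |= (uint64_t)cmd->id << 16;
	return val;
}
```

The incrementing id lets the response path match a completion back to the request that produced it.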
Re: [PATCH 06/17] Documentation/driver-api: generic-counter: drop doubled word

2020-07-04 Thread Jonathan Cameron
On Sat, 4 Jul 2020 08:30:41 -0400
William Breathitt Gray  wrote:

> On Fri, Jul 03, 2020 at 08:44:51PM -0700, Randy Dunlap wrote:
> > Drop the doubled word "the".
> > 
> > Signed-off-by: Randy Dunlap 
> > Cc: Jonathan Corbet 
> > Cc: linux-...@vger.kernel.org
> > Cc: William Breathitt Gray 
> > Cc: linux-...@vger.kernel.org
> > ---
> >  Documentation/driver-api/generic-counter.rst |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > --- linux-next-20200701.orig/Documentation/driver-api/generic-counter.rst
> > +++ linux-next-20200701/Documentation/driver-api/generic-counter.rst
> > @@ -262,7 +262,7 @@ the system.
> >  Counter Counts may be allocated via counter_count structures, and
> >  respective Counter Signal associations (Synapses) made via
> >  counter_synapse structures. Associated counter_synapse structures are
> > -stored as an array and set to the the synapses array member of the
> > +stored as an array and set to the synapses array member of the
> >  respective counter_count structure. These counter_count structures are
> >  set to the counts array member of an allocated counter_device structure
> >  before the Counter is registered to the system.  
> 
> Acked-by: William Breathitt Gray 

Applied to the togreg branch of iio.git

Thanks,

Jonathan
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH 07/17] Documentation/driver-api: iio/buffers: drop doubled word

2020-07-04 Thread Jonathan Cameron
On Fri,  3 Jul 2020 20:44:52 -0700
Randy Dunlap  wrote:

> Drop the doubled word "struct".
> 
> Signed-off-by: Randy Dunlap 
> Cc: Jonathan Corbet 
> Cc: linux-...@vger.kernel.org
> Cc: William Breathitt Gray 
> Cc: linux-...@vger.kernel.org
Applied to the togreg branch of iio.git.

Thanks,

Jonathan

> ---
>  Documentation/driver-api/iio/buffers.rst |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- linux-next-20200701.orig/Documentation/driver-api/iio/buffers.rst
> +++ linux-next-20200701/Documentation/driver-api/iio/buffers.rst
> @@ -88,7 +88,7 @@ fields in iio_chan_spec definition::
>  The driver implementing the accelerometer described above will have the
>  following channel definition::
>  
> -   struct struct iio_chan_spec accel_channels[] = {
> +   struct iio_chan_spec accel_channels[] = {
> {
> .type = IIO_ACCEL,
>  .modified = 1,


Re: [PATCH 01/14] cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints

2021-02-01 Thread Jonathan Cameron
On Fri, 29 Jan 2021 16:24:25 -0800
Ben Widawsky  wrote:

> From: Dan Williams 
> 
> The CXL.mem protocol allows a device to act as a provider of "System
> RAM" and/or "Persistent Memory" that is fully coherent as if the memory
> was attached to the typical CPU memory controller.
> 
> With the CXL-2.0 specification a PCI endpoint can implement a "Type-3"
> device interface and give the operating system control over "Host
> Managed Device Memory". See section 2.3 Type 3 CXL Device.
> 
> The memory range exported by the device may optionally be described by
> the platform firmware memory map, or by infrastructure like LIBNVDIMM to
> provision persistent memory capacity from one, or more, CXL.mem devices.
> 
> A pre-requisite for Linux-managed memory-capacity provisioning is this
> cxl_mem driver that can speak the mailbox protocol defined in section
> 8.2.8.4 Mailbox Registers.
> 
> For now just land the initial driver boiler-plate and Documentation/
> infrastructure.
> 
> Link: https://www.computeexpresslink.org/download-the-specification
> Cc: Jonathan Corbet 
> Signed-off-by: Dan Williams 
> Signed-off-by: Ben Widawsky 
Hi Ben,

One thing below about using defs from generic PCI headers where
they are not CXL specific.


> diff --git a/drivers/cxl/pci.h b/drivers/cxl/pci.h
> new file mode 100644
> index ..a8a9935fa90b
> --- /dev/null
> +++ b/drivers/cxl/pci.h
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> +#ifndef __CXL_PCI_H__
> +#define __CXL_PCI_H__
> +
> +#define PCI_CLASS_MEMORY_CXL 0x050210
> +
> +/*
> + * See section 8.1 Configuration Space Registers in the CXL 2.0
> + * Specification
> + */
> +#define PCI_EXT_CAP_ID_DVSEC 0x23
> +#define PCI_DVSEC_VENDOR_ID_CXL  0x1E98
> +#define PCI_DVSEC_VENDOR_ID_OFFSET   0x4
> +#define PCI_DVSEC_ID_CXL 0x0
> +#define PCI_DVSEC_ID_OFFSET  0x8

include/uapi/linux/pci-regs.h includes equivalents of generic parts of
this already though PCI_DVSEC_HEADER1 isn't exactly informative naming.
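For reference, the generic DVSEC layout those offsets correspond to (PCIe r4.0+, sec 7.9.6): the dword at +0x4 (PCI_DVSEC_HEADER1) carries the vendor ID in bits 15:0, revision in 19:16, and length in 31:20, while the dword at +0x8 (PCI_DVSEC_HEADER2) carries the DVSEC ID in bits 15:0. A self-contained decode sketch (the sample values below are illustrative, not read from real hardware):

```c
#include <assert.h>
#include <stdint.h>

/* Decode the two generic DVSEC header dwords.
 * Header 1 (offset 0x4): [15:0] vendor ID, [19:16] revision, [31:20] length.
 * Header 2 (offset 0x8): [15:0] DVSEC ID. */
struct dvsec_hdr {
	uint16_t vendor_id;
	uint8_t  revision;
	uint16_t length;
	uint16_t dvsec_id;
};

static struct dvsec_hdr decode_dvsec(uint32_t hdr1, uint32_t hdr2)
{
	struct dvsec_hdr h;

	h.vendor_id = hdr1 & 0xFFFF;
	h.revision  = (hdr1 >> 16) & 0xF;
	h.length    = (hdr1 >> 20) & 0xFFF;
	h.dvsec_id  = hdr2 & 0xFFFF;
	return h;
}
```

Since only the vendor/DVSEC IDs are CXL-specific, the field extraction itself could reuse the generic names from pci_regs.h rather than redefining them.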

> +
> +#define PCI_DVSEC_ID_CXL_REGLOC  0x8
> +
> +#endif /* __CXL_PCI_H__ */


Re: [PATCH v2 1/8] cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints

2021-02-10 Thread Jonathan Cameron
On Tue, 9 Feb 2021 16:02:52 -0800
Ben Widawsky  wrote:

> From: Dan Williams 
> 
> The CXL.mem protocol allows a device to act as a provider of "System
> RAM" and/or "Persistent Memory" that is fully coherent as if the memory
> was attached to the typical CPU memory controller.
> 
> With the CXL-2.0 specification a PCI endpoint can implement a "Type-3"
> device interface and give the operating system control over "Host
> Managed Device Memory". See section 2.3 Type 3 CXL Device.
> 
> The memory range exported by the device may optionally be described by
> the platform firmware memory map, or by infrastructure like LIBNVDIMM to
> provision persistent memory capacity from one, or more, CXL.mem devices.
> 
> A pre-requisite for Linux-managed memory-capacity provisioning is this
> cxl_mem driver that can speak the mailbox protocol defined in section
> 8.2.8.4 Mailbox Registers.
> 
> For now just land the initial driver boiler-plate and Documentation/
> infrastructure.
> 
> Link: https://www.computeexpresslink.org/download-the-specification
> Cc: Jonathan Corbet 
> Signed-off-by: Dan Williams 
> Signed-off-by: Ben Widawsky 
> Acked-by: David Rientjes  (v1)

A few trivial bits inline but nothing that I feel that strongly about.
It is probably a good idea to add a note about generic dvsec code
somewhere in this patch description (to avoid people raising it on
future versions!)

With the define of PCI_EXT_CAP_ID_DVSEC dropped (it's in the generic
header already).

Reviewed-by: Jonathan Cameron 

> ---
>  Documentation/driver-api/cxl/index.rst| 12 
>  .../driver-api/cxl/memory-devices.rst | 29 +
>  Documentation/driver-api/index.rst|  1 +
>  drivers/Kconfig   |  1 +
>  drivers/Makefile  |  1 +
>  drivers/cxl/Kconfig   | 35 +++
>  drivers/cxl/Makefile  |  4 ++
>  drivers/cxl/mem.c | 63 +++
>  drivers/cxl/pci.h | 18 ++
>  include/linux/pci_ids.h   |  1 +
>  10 files changed, 165 insertions(+)
>  create mode 100644 Documentation/driver-api/cxl/index.rst
>  create mode 100644 Documentation/driver-api/cxl/memory-devices.rst
>  create mode 100644 drivers/cxl/Kconfig
>  create mode 100644 drivers/cxl/Makefile
>  create mode 100644 drivers/cxl/mem.c
>  create mode 100644 drivers/cxl/pci.h
> 
> diff --git a/Documentation/driver-api/cxl/index.rst 
> b/Documentation/driver-api/cxl/index.rst
> new file mode 100644
> index ..036e49553542
> --- /dev/null
> +++ b/Documentation/driver-api/cxl/index.rst
> @@ -0,0 +1,12 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +
> +Compute Express Link
> +
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   memory-devices
> +
> +.. only::  subproject and html
> diff --git a/Documentation/driver-api/cxl/memory-devices.rst 
> b/Documentation/driver-api/cxl/memory-devices.rst
> new file mode 100644
> index ..43177e700d62
> --- /dev/null
> +++ b/Documentation/driver-api/cxl/memory-devices.rst
> @@ -0,0 +1,29 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: 
> +
> +===
> +Compute Express Link Memory Devices
> +===
> +
> +A Compute Express Link Memory Device is a CXL component that implements the
> +CXL.mem protocol. It contains some amount of volatile memory, persistent 
> memory,
> +or both. It is enumerated as a PCI device for configuration and passing
> +messages over an MMIO mailbox. Its contribution to the System Physical
> +Address space is handled via HDM (Host Managed Device Memory) decoders
> +that optionally define a device's contribution to an interleaved address
> +range across multiple devices underneath a host-bridge or interleaved
> +across host-bridges.
> +
> +Driver Infrastructure
> +=
> +
> +This section covers the driver infrastructure for a CXL memory device.
> +
> +CXL Memory Device
> +-
> +
> +.. kernel-doc:: drivers/cxl/mem.c
> +   :doc: cxl mem
> +
> +.. kernel-doc:: drivers/cxl/mem.c
> +   :internal:
> diff --git a/Documentation/driver-api/index.rst 
> b/Documentation/driver-api/index.rst
> index 2456d0a97ed8..d246a18fd78f 100644
> --- a/Documentation/driver-api/index.rst
> +++ b/Documentation/driver-api/index.rst
> @@ -35,6 +35,7 @@ available subsections can be seen below.
> usb/index
> firewire
> pci/index
> +   cxl/index
> spi
> i2c
> ipmb
> diff --gi

Re: [PATCH v2 1/8] cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints

2021-02-10 Thread Jonathan Cameron
On Wed, 10 Feb 2021 09:12:20 -0800
Ben Widawsky  wrote:

...
   
> > > +}
> > > +
> > > +static int cxl_mem_probe(struct pci_dev *pdev, const struct 
> > > pci_device_id *id)
> > > +{
> > > + struct device *dev = &pdev->dev;
> > > + int regloc;
> > > +
> > > + regloc = cxl_mem_dvsec(pdev, PCI_DVSEC_ID_CXL_REGLOC_OFFSET);
> > > + if (!regloc) {
> > > + dev_err(dev, "register location dvsec not found\n");
> > > + return -ENXIO;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static const struct pci_device_id cxl_mem_pci_tbl[] = {
> > > + /* PCI class code for CXL.mem Type-3 Devices */
> > > + { PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
> > > +   PCI_CLASS_MEMORY_CXL << 8 | CXL_MEMORY_PROGIF, 0xff, 0 },  
> > 
> > Having looked at this and thought 'that's a bit tricky to check'
> > I did a quick grep and it seems the kernel is split between this approach
> > and people going with the more readable C99-style designated initializers
> > .class = .. etc
> > 
> > Personally I'd find the C99 approach easier to read. 
> >   
> 
> Well, it's Dan's patch, but I did modify this last. I took a look around, and
> the best fit seems to me seems to be:
> -   { PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
> - PCI_CLASS_MEMORY_CXL << 8 | CXL_MEMORY_PROGIF, 0xff, 0 },
> +   { PCI_DEVICE_CLASS((PCI_CLASS_MEMORY_CXL << 8 | CXL_MEMORY_PROGIF), 
> ~0)},
> 
> That work for you?
> 

Yes that's definitely nicer.


Re: [PATCH v2 2/8] cxl/mem: Find device capabilities

2021-02-10 Thread Jonathan Cameron
On Wed, 10 Feb 2021 08:55:57 -0800
Ben Widawsky  wrote:

> On 21-02-10 15:07:59, Jonathan Cameron wrote:
> > On Wed, 10 Feb 2021 13:32:52 +
> > Jonathan Cameron  wrote:
> >   
> > > On Tue, 9 Feb 2021 16:02:53 -0800
> > > Ben Widawsky  wrote:
> > >   
> > > > Provide enough functionality to utilize the mailbox of a memory device.
> > > > The mailbox is used to interact with the firmware running on the memory
> > > > device. The flow is proven with one implemented command, "identify".
> > > > Because the class code has already told the driver this is a memory
> > > > device and the identify command is mandatory.
> > > > 
> > > > CXL devices contain an array of capabilities that describe the
> > > > interactions software can have with the device or firmware running on
> > > > the device. A CXL compliant device must implement the device status and
> > > > the mailbox capability. Additionally, a CXL compliant memory device must
> > > > implement the memory device capability. Each of the capabilities can
> > > > [will] provide an offset within the MMIO region for interacting with the
> > > > CXL device.
> > > > 
> > > > The capabilities tell the driver how to find and map the register space
> > > > for CXL Memory Devices. The registers are required to utilize the CXL
> > > > spec defined mailbox interface. The spec outlines two mailboxes, primary
> > > > and secondary. The secondary mailbox is earmarked for system firmware,
> > > > and not handled in this driver.
> > > > 
> > > > Primary mailboxes are capable of generating an interrupt when submitting
> > > > a background command. That implementation is saved for a later time.
> > > > 
> > > > Link: https://www.computeexpresslink.org/download-the-specification
> > > > Signed-off-by: Ben Widawsky 
> > > > Reviewed-by: Dan Williams 
> > > 
> > > Hi Ben,
> > > 
> > >   
> > > > +/**
> > > > + * cxl_mem_mbox_send_cmd() - Send a mailbox command to a memory device.
> > > > + * @cxlm: The CXL memory device to communicate with.
> > > > + * @mbox_cmd: Command to send to the memory device.
> > > > + *
> > > > + * Context: Any context. Expects mbox_lock to be held.
> > > > + * Return: -ETIMEDOUT if timeout occurred waiting for completion. 0 on 
> > > > success.
> > > > + * Caller should check the return code in @mbox_cmd to make 
> > > > sure it
> > > > + * succeeded.
> > > 
> > > cxl_xfer_log() doesn't check mbox_cmd->return_code and for my test it 
> > > currently
> > > enters an infinite loop as a result.  
> 
> I meant to fix that.
> 
> > > 
> > > I haven't checked other paths, but to my mind it is not a good idea to 
> > > require
> > > two levels of error checking - the example here proves how easy it is to 
> > > forget
> > > one.  
> 
> Demonstrably, you're correct. I think it would be good to have a kernel only
> mbox command that does the error checking though. Let me type something up and
> see how it looks.
> 
> > > 
> > > Now all I have to do is figure out why I'm getting an error in the first 
> > > place!  
> > 
> > For reference this seems to be our old issue of arm64 memcpy_fromio() only 
> > doing 8 byte
> > or 1 byte copies.  The hack in QEMU to allow that to work doesn't work.
> > Result is that 1 byte reads replicate across the register
> > (in this case instead of 001c I get 1c1c1c1c)
> > 
> > For these particular registers, we are covered by the rules in 8.2 which 
> > says that
> > a 1, 2, 4, 8 aligned reads of 64 bit registers etc are fine.
> > 
> > So we should not have to care.  This isn't true for the component registers 
> > where
> > we need to guarantee 4 or 8 byte reads only.
> > 
> > For this particular issue the mailbox_read_reg() function in the QEMU code
> > needs to handle the size 1 case and set min_access_size = 1 for
> > mailbox_ops.  Logically it should also handle the 2 byte case I think,
> > but I'm not hitting that.
> > 
> > Jonathan  
> 
> I think the latest QEMU patches should do the right thing (I have a v4 branch 
> if
> you want to try it). If it doesn't, it'd be worth debugging. The memory
> accessors should split up or combine the reads/
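The replication symptom above is straightforward to reproduce in miniature: if the backend answers every 1-byte read with the register's low byte (effectively what a min_access_size=8 device model does when forced to serve byte accesses), reassembling a 32-bit value from four byte reads yields the repeated pattern. A toy illustration, not QEMU code:

```c
#include <assert.h>
#include <stdint.h>

static const uint32_t reg = 0x0000001c; /* the register's true value */

/* Broken backend: ignores the byte offset and always returns the low
 * byte, mimicking the misbehaving device model. */
static uint8_t broken_read8(unsigned off)
{
	(void)off;
	return reg & 0xFF;
}

/* Correct backend: shifts the register down by the byte offset. */
static uint8_t fixed_read8(unsigned off)
{
	return (reg >> (8 * off)) & 0xFF;
}

/* What an arch doing only 1-byte copies in memcpy_fromio() ends up
 * doing: assemble the 32-bit value byte by byte. */
static uint32_t read32_bytewise(uint8_t (*rd)(unsigned))
{
	uint32_t v = 0;

	for (unsigned i = 0; i < 4; i++)
		v |= (uint32_t)rd(i) << (8 * i);
	return v;
}
```

With the broken backend the assembled value is 0x1c1c1c1c rather than 0x0000001c, matching the observed corruption.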

Re: [PATCH v2 2/8] cxl/mem: Find device capabilities

2021-02-10 Thread Jonathan Cameron
On Tue, 9 Feb 2021 16:02:53 -0800
Ben Widawsky  wrote:

> Provide enough functionality to utilize the mailbox of a memory device.
> The mailbox is used to interact with the firmware running on the memory
> device. The flow is proven with one implemented command, "identify".
> Because the class code has already told the driver this is a memory
> device and the identify command is mandatory.
> 
> CXL devices contain an array of capabilities that describe the
> interactions software can have with the device or firmware running on
> the device. A CXL compliant device must implement the device status and
> the mailbox capability. Additionally, a CXL compliant memory device must
> implement the memory device capability. Each of the capabilities can
> [will] provide an offset within the MMIO region for interacting with the
> CXL device.
> 
> The capabilities tell the driver how to find and map the register space
> for CXL Memory Devices. The registers are required to utilize the CXL
> spec defined mailbox interface. The spec outlines two mailboxes, primary
> and secondary. The secondary mailbox is earmarked for system firmware,
> and not handled in this driver.
> 
> Primary mailboxes are capable of generating an interrupt when submitting
> a background command. That implementation is saved for a later time.
> 
> Link: https://www.computeexpresslink.org/download-the-specification
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Dan Williams 

A few more comments inline (proper review whereas my other reply was a
bug chase).

Jonathan

> ---
>  drivers/cxl/Kconfig   |  14 +
>  drivers/cxl/cxl.h |  93 +++
>  drivers/cxl/mem.c | 511 +-
>  drivers/cxl/pci.h |  13 +
>  include/uapi/linux/pci_regs.h |   1 +
>  5 files changed, 630 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/cxl/cxl.h
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 9e80b311e928..c4ba3aa0a05d 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -32,4 +32,18 @@ config CXL_MEM
> Chapter 2.3 Type 3 CXL Device in the CXL 2.0 specification.
>  
> If unsure say 'm'.
> +
> +config CXL_MEM_INSECURE_DEBUG
> + bool "CXL.mem debugging"

As mentioned below, this makes me a tiny bit uncomfortable.

> + depends on CXL_MEM
> + help
> +   Enable debug of all CXL command payloads.
> +
> +   Some CXL devices and controllers support encryption and other
> +   security features. The payloads for the commands that enable
> +   those features may contain sensitive clear-text security
> +   material. Disable debug of those command payloads by default.
> +   If you are a kernel developer actively working on CXL
> +   security enabling say Y, otherwise say N.
> +
>  endif
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> new file mode 100644
> index ..745f5e0bfce3
> --- /dev/null
> +++ b/drivers/cxl/cxl.h
> @@ -0,0 +1,93 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright(c) 2020 Intel Corporation. */
> +
> +#ifndef __CXL_H__
> +#define __CXL_H__
> +
> +#include 
> +#include 
> +#include 
> +
> +/* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
> +#define CXLDEV_CAP_ARRAY_OFFSET 0x0
> +#define   CXLDEV_CAP_ARRAY_CAP_ID 0
> +#define   CXLDEV_CAP_ARRAY_ID_MASK GENMASK(15, 0)
> +#define   CXLDEV_CAP_ARRAY_COUNT_MASK GENMASK(47, 32)
> +/* CXL 2.0 8.2.8.2.1 CXL Device Capabilities */
> +#define CXLDEV_CAP_CAP_ID_DEVICE_STATUS 0x1
> +#define CXLDEV_CAP_CAP_ID_PRIMARY_MAILBOX 0x2
> +#define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
> +#define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
> +
> +/* CXL 2.0 8.2.8.4 Mailbox Registers */
> +#define CXLDEV_MBOX_CAPS_OFFSET 0x00
> +#define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> +#define CXLDEV_MBOX_CTRL_OFFSET 0x04
> +#define   CXLDEV_MBOX_CTRL_DOORBELL BIT(0)
> +#define CXLDEV_MBOX_CMD_OFFSET 0x08
> +#define   CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK GENMASK(15, 0)
> +#define   CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK GENMASK(36, 16)
> +#define CXLDEV_MBOX_STATUS_OFFSET 0x10
> +#define   CXLDEV_MBOX_STATUS_RET_CODE_MASK GENMASK(47, 32)
> +#define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
> +#define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
> +
> +/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> +#define CXLMDEV_STATUS_OFFSET 0x0
> +#define   CXLMDEV_DEV_FATAL BIT(0)
> +#define   CXLMDEV_FW_HALT BIT(1)
> +#define   CXLMDEV_STATUS_MEDIA_STATUS_MASK GENMASK(3, 2)
> +#define CXLMDEV_MS_NOT_READY 0
> +#define CXLMDEV_MS_READY 1
> +#define CXLMDEV_MS_ERROR 2
> +#define CXLMDEV_MS_DISABLED 3
> +#define CXLMDEV_READY(status)
>   \
> + (FIELD_GET(CXLMDEV_STATUS_MEDIA_STATUS_MASK, status) ==\
> +  CXLMDEV_MS_READY)
> +#define   CXLMDEV_MBOX_IF_READY BIT(4)
> +#define   CXLMDEV_RESET_NEEDED_MASK GENMASK(7, 5)
> +#define CXLMDEV_RESET_NEEDED_NOT 0
> +#define 

Re: [PATCH v2 2/8] cxl/mem: Find device capabilities

2021-02-10 Thread Jonathan Cameron
On Tue, 9 Feb 2021 16:02:53 -0800
Ben Widawsky  wrote:

> Provide enough functionality to utilize the mailbox of a memory device.
> The mailbox is used to interact with the firmware running on the memory
> device. The flow is proven with one implemented command, "identify".
> Because the class code has already told the driver this is a memory
> device and the identify command is mandatory.
> 
> CXL devices contain an array of capabilities that describe the
> interactions software can have with the device or firmware running on
> the device. A CXL compliant device must implement the device status and
> the mailbox capability. Additionally, a CXL compliant memory device must
> implement the memory device capability. Each of the capabilities can
> [will] provide an offset within the MMIO region for interacting with the
> CXL device.
> 
> The capabilities tell the driver how to find and map the register space
> for CXL Memory Devices. The registers are required to utilize the CXL
> spec defined mailbox interface. The spec outlines two mailboxes, primary
> and secondary. The secondary mailbox is earmarked for system firmware,
> and not handled in this driver.
> 
> Primary mailboxes are capable of generating an interrupt when submitting
> a background command. That implementation is saved for a later time.
> 
> Link: https://www.computeexpresslink.org/download-the-specification
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Dan Williams 

Hi Ben,


> +/**
> + * cxl_mem_mbox_send_cmd() - Send a mailbox command to a memory device.
> + * @cxlm: The CXL memory device to communicate with.
> + * @mbox_cmd: Command to send to the memory device.
> + *
> + * Context: Any context. Expects mbox_lock to be held.
> + * Return: -ETIMEDOUT if timeout occurred waiting for completion. 0 on
> + * success.
> + * Caller should check the return code in @mbox_cmd to make sure it
> + * succeeded.

cxl_xfer_log() doesn't check mbox_cmd->return_code and for my test it currently
enters an infinite loop as a result.

I haven't checked other paths, but to my mind it is not a good idea to require
two levels of error checking - the example here proves how easy it is to forget
one.

Now all I have to do is figure out why I'm getting an error in the first place!

Jonathan



> + *
> + * This is a generic form of the CXL mailbox send command, thus the only I/O
> + * operations used are cxl_read_mbox_reg(). Memory devices, and perhaps other
> + * types of CXL devices may have further information available upon error
> + * conditions.
> + *
> + * The CXL spec allows for up to two mailboxes. The intention is for the
> + * primary mailbox to be OS controlled and the secondary mailbox to be used
> + * by system firmware. This allows the OS and firmware to communicate with
> + * the device and not need to coordinate with each other. The driver only
> + * uses the primary mailbox.
> + */
> +static int cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm,
> +  struct mbox_cmd *mbox_cmd)
> +{
> + void __iomem *payload = cxlm->mbox_regs + CXLDEV_MBOX_PAYLOAD_OFFSET;
> + u64 cmd_reg, status_reg;
> + size_t out_len;
> + int rc;
> +
> + lockdep_assert_held(&cxlm->mbox_mutex);
> +
> + /*
> +  * Here are the steps from 8.2.8.4 of the CXL 2.0 spec.
> +  *   1. Caller reads MB Control Register to verify doorbell is clear
> +  *   2. Caller writes Command Register
> +  *   3. Caller writes Command Payload Registers if input payload is non-empty
> +  *   4. Caller writes MB Control Register to set doorbell
> +  *   5. Caller either polls for doorbell to be clear or waits for interrupt if configured
> +  *   6. Caller reads MB Status Register to fetch Return code
> +  *   7. If command successful, Caller reads Command Register to get Payload Length
> +  *   8. If output payload is non-empty, host reads Command Payload Registers
> +  *
> +  * Hardware is free to do whatever it wants before the doorbell is rung,
> +  * and isn't allowed to change anything after it clears the doorbell. As
> +  * such, steps 2 and 3 can happen in any order, and steps 6, 7, 8 can
> +  * also happen in any order (though some orders might not make sense).
> +  */
> +
> + /* #1 */
> + if (cxl_doorbell_busy(cxlm)) {
> + dev_err_ratelimited(&cxlm->pdev->dev,
> + "Mailbox re-busy after acquiring\n");
> + return -EBUSY;
> + }
> +
> + cmd_reg = FIELD_PREP(CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK,
> +  mbox_cmd->opcode);
> + if (mbox_cmd->size_in) {
> + if (WARN_ON(!mbox_cmd->payload_in))
> + return -EINVAL;
> +
> + cmd_reg |= FIELD_PREP(CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK,
> +   mbox_cmd->size_in);
> + memcpy_toio(payload, mbox_cmd->payload_in, mbox_cmd->size_in);
> + }
> +
> + /* #2, #3 

Re: [PATCH v2 2/8] cxl/mem: Find device capabilities

2021-02-10 Thread Jonathan Cameron
On Wed, 10 Feb 2021 13:32:52 +
Jonathan Cameron  wrote:

> On Tue, 9 Feb 2021 16:02:53 -0800
> Ben Widawsky  wrote:
> 
> > Provide enough functionality to utilize the mailbox of a memory device.
> > The mailbox is used to interact with the firmware running on the memory
> > device. The flow is proven with one implemented command, "identify",
> > because the class code has already told the driver this is a memory
> > device and the identify command is mandatory.
> > 
> > CXL devices contain an array of capabilities that describe the
> > interactions software can have with the device or firmware running on
> > the device. A CXL compliant device must implement the device status and
> > the mailbox capability. Additionally, a CXL compliant memory device must
> > implement the memory device capability. Each of the capabilities can
> > [will] provide an offset within the MMIO region for interacting with the
> > CXL device.
> > 
> > The capabilities tell the driver how to find and map the register space
> > for CXL Memory Devices. The registers are required to utilize the CXL
> > spec defined mailbox interface. The spec outlines two mailboxes, primary
> > and secondary. The secondary mailbox is earmarked for system firmware,
> > and not handled in this driver.
> > 
> > Primary mailboxes are capable of generating an interrupt when submitting
> > a background command. That implementation is saved for a later time.
> > 
> > Link: https://www.computeexpresslink.org/download-the-specification
> > Signed-off-by: Ben Widawsky 
> > Reviewed-by: Dan Williams   
> 
> Hi Ben,
> 
> 
> > +/**
> > + * cxl_mem_mbox_send_cmd() - Send a mailbox command to a memory device.
> > + * @cxlm: The CXL memory device to communicate with.
> > + * @mbox_cmd: Command to send to the memory device.
> > + *
> > + * Context: Any context. Expects mbox_lock to be held.
> > + * Return: -ETIMEDOUT if timeout occurred waiting for completion. 0 on
> > + * success.
> > + * Caller should check the return code in @mbox_cmd to make sure it
> > + * succeeded.  
> 
> cxl_xfer_log() doesn't check mbox_cmd->return_code and for my test it
> currently enters an infinite loop as a result.
> 
> I haven't checked other paths, but to my mind it is not a good idea to
> require two levels of error checking - the example here proves how easy it
> is to forget one.
> 
> Now all I have to do is figure out why I'm getting an error in the first
> place!

For reference, this seems to be our old issue of arm64 memcpy_fromio() only
doing 8 byte or 1 byte copies.  The hack in QEMU to allow that to work doesn't
work.  The result is that 1 byte reads replicate across the register (in this
case, instead of 001c I get 1c1c1c1c).

For these particular registers we are covered by the rules in 8.2, which say
that 1, 2, 4 or 8 byte aligned reads of 64 bit registers etc are fine.

So we should not have to care.  This isn't true for the component registers,
where we need to guarantee 4 or 8 byte reads only.

For this particular issue the mailbox_read_reg() function in the QEMU code
needs to handle the size 1 case and set min_access_size = 1 for
mailbox_ops.  Logically it should also handle the 2 byte case I think,
but I'm not hitting that.

Jonathan

> 
> Jonathan
> 
> 
> 
> > + *
> > + * This is a generic form of the CXL mailbox send command, thus the only
> > + * I/O operations used are cxl_read_mbox_reg(). Memory devices, and
> > + * perhaps other types of CXL devices may have further information
> > + * available upon error conditions.
> > + *
> > + * The CXL spec allows for up to two mailboxes. The intention is for the
> > + * primary mailbox to be OS controlled and the secondary mailbox to be
> > + * used by system firmware. This allows the OS and firmware to communicate
> > + * with the device and not need to coordinate with each other. The driver
> > + * only uses the primary mailbox.
> > + */
> > +static int cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm,
> > +struct mbox_cmd *mbox_cmd)
> > +{
> > +   void __iomem *payload = cxlm->mbox_regs + CXLDEV_MBOX_PAYLOAD_OFFSET;
> > +   u64 cmd_reg, status_reg;
> > +   size_t out_len;
> > +   int rc;
> > +
> > +   lockdep_assert_held(&cxlm->mbox_mutex);
> > +
> > +   /*
> > +* Here are the steps from 8.2.8.4 of the CXL 2.0 spec.
> > +*   1. Caller reads MB Control Register to verify doorbell is clear
> > +*   2. Caller writes Command Register
> > +*   3.

Re: [PATCH v2 3/8] cxl/mem: Register CXL memX devices

2021-02-10 Thread Jonathan Cameron
On Tue, 9 Feb 2021 16:02:54 -0800
Ben Widawsky  wrote:

> From: Dan Williams 
> 
> Create the /sys/bus/cxl hierarchy to enumerate:
> 
> * Memory Devices (per-endpoint control devices)
> 
> * Memory Address Space Devices (platform address ranges with
>   interleaving, performance, and persistence attributes)
> 
> * Memory Regions (active provisioned memory from an address space device
>   that is in use as System RAM or delegated to libnvdimm as Persistent
>   Memory regions).
> 
> For now, only the per-endpoint control devices are registered on the
> 'cxl' bus. However, going forward it will provide a mechanism to
> coordinate cross-device interleave.
> 
> Signed-off-by: Dan Williams 
> Signed-off-by: Ben Widawsky 

One stray header, and a request for a tiny bit of reordering to
make it easier to chase through creation and destruction.

Either way with the header move to earlier patch I'm fine with this one.

Reviewed-by: Jonathan Cameron 

> ---
>  Documentation/ABI/testing/sysfs-bus-cxl   |  26 ++
>  .../driver-api/cxl/memory-devices.rst |  17 +
>  drivers/cxl/Makefile  |   3 +
>  drivers/cxl/bus.c |  29 ++
>  drivers/cxl/cxl.h |   4 +
>  drivers/cxl/mem.c | 301 +-
>  6 files changed, 378 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-bus-cxl
>  create mode 100644 drivers/cxl/bus.c
> 


> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 745f5e0bfce3..b3c56fa6e126 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -3,6 +3,7 @@
>  
>  #ifndef __CXL_H__
>  #define __CXL_H__
> +#include <linux/range.h>

Why is this coming in now? Feels like it should have been in earlier
patch that started using struct range

>  
>  #include 
>  #include 
> @@ -55,6 +56,7 @@
>   (FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) !=   \
>CXLMDEV_RESET_NEEDED_NOT)
>  
> +struct cxl_memdev;
>  /**
>   * struct cxl_mem - A CXL memory device
>   * @pdev: The PCI device associated with this CXL device.
> @@ -72,6 +74,7 @@
>  struct cxl_mem {
>   struct pci_dev *pdev;
>   void __iomem *regs;
> + struct cxl_memdev *cxlmd;
>  
>   void __iomem *status_regs;
>   void __iomem *mbox_regs;
> @@ -90,4 +93,5 @@ struct cxl_mem {
>   } ram;
>  };
>  
> +extern struct bus_type cxl_bus_type;
>  #endif /* __CXL_H__ */
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 0a868a15badc..8bbd2495e237 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -1,11 +1,36 @@
>

> +
> +static void cxl_memdev_release(struct device *dev)
> +{
> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +
> + percpu_ref_exit(&cxlmd->ops_active);
> + ida_free(&cxl_memdev_ida, cxlmd->id);
> + kfree(cxlmd);
> +}
> +
...

> +static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
> +{
> + struct pci_dev *pdev = cxlm->pdev;
> + struct cxl_memdev *cxlmd;
> + struct device *dev;
> + struct cdev *cdev;
> + int rc;
> +
> + cxlmd = kzalloc(sizeof(*cxlmd), GFP_KERNEL);
> + if (!cxlmd)
> + return -ENOMEM;
> + init_completion(&cxlmd->ops_dead);
> +
> + /*
> +  * @cxlm is deallocated when the driver unbinds so operations
> +  * that are using it need to hold a live reference.
> +  */
> + cxlmd->cxlm = cxlm;
> + rc = percpu_ref_init(&cxlmd->ops_active, cxlmdev_ops_active_release, 0,
> +  GFP_KERNEL);
> + if (rc)
> + goto err_ref;
> +
> + rc = ida_alloc_range(&cxl_memdev_ida, 0, CXL_MEM_MAX_DEVS, GFP_KERNEL);
> + if (rc < 0)
> + goto err_id;
> + cxlmd->id = rc;
> +
> + dev = &cxlmd->dev;
> + device_initialize(dev);
> + dev->parent = &pdev->dev;
> + dev->bus = &cxl_bus_type;
> + dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
> + dev->type = &cxl_memdev_type;
> + dev_set_name(dev, "mem%d", cxlmd->id);
> +
> + cdev = &cxlmd->cdev;
> + cdev_init(cdev, &cxl_memdev_fops);
> +
> + rc = cdev_device_add(cdev, dev);
> + if (rc)
> + goto err_add;
> +
> + return devm_add_action_or_reset(dev->parent, cxlmdev_unregister, cxlmd);

This had me scratching my head. The cxlmdev_unregister() if called normally
or in the _or_reset() results in

percpu_ref_kill(&cxlmd->ops_active);
cdev_device_del(&cxlmd->cdev, dev);
wait_for_completion(&cxlmd->ops_dead);
cxlmd->cxlm = NULL;
put_device(dev);
/* If last ref this will result in */
percpu

Re: [PATCH v2 4/8] cxl/mem: Add basic IOCTL interface

2021-02-10 Thread Jonathan Cameron
On Tue, 9 Feb 2021 16:02:55 -0800
Ben Widawsky  wrote:

> Add a straightforward IOCTL that provides a mechanism for userspace to
> query the supported memory device commands. CXL commands as they appear
> to userspace are described as part of the UAPI kerneldoc. The command
> list returned via this IOCTL will contain the full set of commands that
> the driver supports, however, some of those commands may not be
> available for use by userspace.
> 
> Memory device commands first appear in the CXL 2.0 specification. They
> are submitted through a mailbox mechanism also originally specified in
> the CXL 2.0 specification.
> 
> The send command allows userspace to issue mailbox commands directly to
> the hardware. The list of available commands to send are the output of
> the query command. The driver verifies basic properties of the command
> and possibly inspects the input (or output) payload to determine whether
> or not the command is allowed (or might taint the kernel).
> 
> Reported-by: kernel test robot  # bug in earlier revision
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Dan Williams 

A bit of anti macro commentary below.  Heavy use of them may make the code
shorter, but I'd argue they make it harder to do review if you've not looked
at a given bit of code for a while.

Also there is a bit of documentation in here for flags that don't seem to
exist (at this stage anyway) - may just be in the wrong patch.

Jonathan


> ---
>  .clang-format |   1 +
>  .../userspace-api/ioctl/ioctl-number.rst  |   1 +
>  drivers/cxl/mem.c | 291 +-
>  include/uapi/linux/cxl_mem.h  | 152 +
>  4 files changed, 443 insertions(+), 2 deletions(-)
>  create mode 100644 include/uapi/linux/cxl_mem.h
> 
> diff --git a/.clang-format b/.clang-format
> index 10dc5a9a61b3..3f11c8901b43 100644
> --- a/.clang-format
> +++ b/.clang-format
> @@ -109,6 +109,7 @@ ForEachMacros:
>- 'css_for_each_child'
>- 'css_for_each_descendant_post'
>- 'css_for_each_descendant_pre'
> +  - 'cxl_for_each_cmd'
>- 'device_for_each_child_node'
>- 'dma_fence_chain_for_each'
>- 'do_for_each_ftrace_op'
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index a4c75a28c839..6eb8e634664d 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -352,6 +352,7 @@ Code  Seq#   Include File   Comments
> 
>  0xCC  00-0F  drivers/misc/ibmvmc.h   pseries VMC driver
>  0xCD  01 linux/reiserfs_fs.h
> +0xCE  01-02  uapi/linux/cxl_mem.hCompute Express Link Memory Devices
>  0xCF  02 fs/cifs/ioctl.c
>  0xDB  00-0F  drivers/char/mwave/mwavepub.h
>  0xDD  00-3F  ZFCP 
> device driver see drivers/s390/scsi/
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 8bbd2495e237..ce65630bb75e 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -39,6 +40,7 @@
>  #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ)
>  
>  enum opcode {
> + CXL_MBOX_OP_INVALID = 0x0000,
>   CXL_MBOX_OP_IDENTIFY= 0x4000,
>   CXL_MBOX_OP_MAX = 0x10000
>  };
> @@ -90,9 +92,57 @@ struct cxl_memdev {
>  static int cxl_mem_major;
>  static DEFINE_IDA(cxl_memdev_ida);
>  
> +/**
> + * struct cxl_mem_command - Driver representation of a memory device command
> + * @info: Command information as it exists for the UAPI
> + * @opcode: The actual bits used for the mailbox protocol
> + * @flags: Set of flags reflecting the state of the command.
> + *
> + *  * %CXL_CMD_FLAG_MANDATORY: Hardware must support this command. This
> + *flag is only used internally by the driver for sanity checking.

Doesn't seem to be defined yet.

> + *
> + * The cxl_mem_command is the driver's internal representation of commands
> + * that are supported by the driver. Some of these commands may not be
> + * supported by the hardware. The driver will use @info to validate the
> + * fields passed in by the user then submit the @opcode to the hardware.
> + *
> + * See struct cxl_command_info.
> + */
> +struct cxl_mem_command {
> + struct cxl_command_info info;
> + enum opcode opcode;
> +};
> +
> +#define CXL_CMD(_id, _flags, sin, sout)                               \
> + [CXL_MEM_COMMAND_ID_##_id] = { \
> + .info = {

Re: [PATCH v2 2/8] cxl/mem: Find device capabilities

2021-02-11 Thread Jonathan Cameron
On Wed, 10 Feb 2021 11:54:29 -0800
Dan Williams  wrote:

> > > ...
> > >  
> > > > +static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm,
> > > > +struct mbox_cmd *mbox_cmd)
> > > > +{
> > > > +   struct device *dev = &cxlm->pdev->dev;
> > > > +
> > > > +   dev_dbg(dev, "Mailbox command (opcode: %#x size: %zub) timed out\n",
> > > > +   mbox_cmd->opcode, mbox_cmd->size_in);
> > > > +
> > > > +   if (IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {  
> > >
> > > Hmm.  Whilst I can see the advantage of this for debug, I'm not sure we
> > > want it upstream even under a rather evil looking CONFIG variable.
> > >
> > > Is there a bigger lock we can use to avoid the chance of accidental
> > > enablement?  
> >
> > Any suggestions? I'm told this functionality was extremely valuable for 
> > NVDIMM,
> > though I haven't personally experienced it.  
> 
> Yeah, there was no problem with the identical mechanism in LIBNVDIMM
> land. However, I notice that the useful feature for LIBNVDIMM is the
> option to dump all payloads. This one only fires on timeouts which is
> less useful. So I'd say fix it to dump all payloads on the argument
> that the safety mechanism was proven with the LIBNVDIMM precedent, or
> delete it altogether to maintain v5.12 momentum. Payload dumping can
> be added later.

I think I'd drop it for now - feels like a topic that needs more discussion.

Also, dumping this data to the kernel log isn't exactly elegant - particularly
if we dump a lot more of it.  Perhaps tracepoints?

> 
> [..]
> > > > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> > > > index e709ae8235e7..6267ca9ae683 100644
> > > > --- a/include/uapi/linux/pci_regs.h
> > > > +++ b/include/uapi/linux/pci_regs.h
> > > > @@ -1080,6 +1080,7 @@
> > > >
> > > >  /* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */
> > > >  #define PCI_DVSEC_HEADER1  0x4 /* Designated Vendor-Specific Header1 */
> > > > +#define PCI_DVSEC_HEADER1_LENGTH_MASK  0xFFF0  
> > >
> > > Seems sensible to add the revision mask as well.
> > > The vendor id is currently read using a word read rather than dword, but
> > > perhaps neater to add that as well for completeness?
> > >
> > > Having said that, given Bjorn's comment on clashes and the fact he'd
> > > rather see this stuff defined in drivers and combined later (see review
> > > of patch 1 and follow the link), perhaps this series should not touch
> > > this header at all.  
> >
> > I'm fine to move it back.  
> 
> Yeah, we're playing tennis now between Bjorn's and Christoph's
> comments, but I like Bjorn's suggestion of "deduplicate post merge"
> given the bloom of DVSEC infrastructure landing at the same time.

I guess it may depend on the timing of this.  Personally I think 5.12 may be
too aggressive.

As long as Bjorn can take a DVSEC deduplication as an immutable branch then
perhaps during 5.13 this tree can sit on top of that.

Jonathan

___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH v2 4/8] cxl/mem: Add basic IOCTL interface

2021-02-11 Thread Jonathan Cameron
On Wed, 10 Feb 2021 20:40:52 -0800
Dan Williams  wrote:

> On Wed, Feb 10, 2021 at 10:47 AM Jonathan Cameron
>  wrote:
> [..]
> > > +#define CXL_CMDS                                                  \
> > > + ___C(INVALID, "Invalid Command"), \
> > > + ___C(IDENTIFY, "Identify Command"),   \
> > > + ___C(MAX, "Last command")
> > > +
> > > +#define ___C(a, b) CXL_MEM_COMMAND_ID_##a
> > > +enum { CXL_CMDS };
> > > +
> > > +#undef ___C
> > > +#define ___C(a, b) { b }
> > > +static const struct {
> > > + const char *name;
> > > +} cxl_command_names[] = { CXL_CMDS };
> > > +#undef ___C  
> >
> > Unless there are going to be a lot of these, I'd just write them out
> > long hand as much more readable than the macro magic.  
> 
> This macro magic isn't new to Linux it was introduced with ftrace:
> 
> See "cpp tricks and treats": https://lwn.net/Articles/383362/

Yeah. I've dealt with that one a few times. It's very clever and compact
but a PITA to debug build errors related to it.

> 
> >
> > enum {
> > CXL_MEM_COMMAND_ID_INVALID,
> > CXL_MEM_COMMAND_ID_IDENTIFY,
> > CXL_MEM_COMMAND_ID_MAX
> > };
> >
> > static const struct {
> > const char *name;
> > } cxl_command_names[] = {
> > [CXL_MEM_COMMAND_ID_INVALID] = { "Invalid Command" },
> > [CXL_MEM_COMMAND_ID_IDENTIFY] = { "Identify Command" },
> > /* I hope you never need the Last command to exist in here as that 
> > sounds like a bug */
> > };
> >
> > That's assuming I actually figured the macro fun out correctly.
> > To my mind it's worth doing this stuff for 'lots' no so much for 3.  
> 
> The list will continue to expand, and it eliminates the "did you
> remember to update cxl_command_names" review burden permanently.

How about a compromise?  Add a comment giving how the first entry expands, to
avoid people (me at least :) having to think their way through it every time.

Jonathan


Re: [PATCH v2 3/8] cxl/mem: Register CXL memX devices

2021-02-11 Thread Jonathan Cameron
On Wed, 10 Feb 2021 18:17:25 +
Jonathan Cameron  wrote:

> On Tue, 9 Feb 2021 16:02:54 -0800
> Ben Widawsky  wrote:
> 
> > From: Dan Williams 
> > 
> > Create the /sys/bus/cxl hierarchy to enumerate:
> > 
> > * Memory Devices (per-endpoint control devices)
> > 
> > * Memory Address Space Devices (platform address ranges with
> >   interleaving, performance, and persistence attributes)
> > 
> > * Memory Regions (active provisioned memory from an address space device
> >   that is in use as System RAM or delegated to libnvdimm as Persistent
> >   Memory regions).
> > 
> > For now, only the per-endpoint control devices are registered on the
> > 'cxl' bus. However, going forward it will provide a mechanism to
> > coordinate cross-device interleave.
> > 
> > Signed-off-by: Dan Williams 
> > Signed-off-by: Ben Widawsky   
> 
> One stray header, and a request for a tiny bit of reordering to
> make it easier to chase through creation and destruction.
> 
> Either way with the header move to earlier patch I'm fine with this one.
> 
> Reviewed-by: Jonathan Cameron 

Actually, thinking more on this, what is the justification for the
complexity + overhead of a percpu_refcount vs a refcount?

I don't think this is a high enough performance path for it to matter.
Perhaps I'm missing a usecase where it does?

Jonathan

> 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl   |  26 ++
> >  .../driver-api/cxl/memory-devices.rst |  17 +
> >  drivers/cxl/Makefile  |   3 +
> >  drivers/cxl/bus.c |  29 ++
> >  drivers/cxl/cxl.h |   4 +
> >  drivers/cxl/mem.c | 301 +-
> >  6 files changed, 378 insertions(+), 2 deletions(-)
> >  create mode 100644 Documentation/ABI/testing/sysfs-bus-cxl
> >  create mode 100644 drivers/cxl/bus.c
> >   
> 
> 
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 745f5e0bfce3..b3c56fa6e126 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -3,6 +3,7 @@
> >  
> >  #ifndef __CXL_H__
> >  #define __CXL_H__
> > +#include <linux/range.h>  
> 
> Why is this coming in now? Feels like it should have been in earlier
> patch that started using struct range
> 
> >  
> >  #include 
> >  #include 
> > @@ -55,6 +56,7 @@
> > (FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) !=   \
> >  CXLMDEV_RESET_NEEDED_NOT)
> >  
> > +struct cxl_memdev;
> >  /**
> >   * struct cxl_mem - A CXL memory device
> >   * @pdev: The PCI device associated with this CXL device.
> > @@ -72,6 +74,7 @@
> >  struct cxl_mem {
> > struct pci_dev *pdev;
> > void __iomem *regs;
> > +   struct cxl_memdev *cxlmd;
> >  
> > void __iomem *status_regs;
> > void __iomem *mbox_regs;
> > @@ -90,4 +93,5 @@ struct cxl_mem {
> > } ram;
> >  };
> >  
> > +extern struct bus_type cxl_bus_type;
> >  #endif /* __CXL_H__ */
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > index 0a868a15badc..8bbd2495e237 100644
> > --- a/drivers/cxl/mem.c
> > +++ b/drivers/cxl/mem.c
> > @@ -1,11 +1,36 @@
> >  
> 
> > +
> > +static void cxl_memdev_release(struct device *dev)
> > +{
> > +   struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > +
> > +   percpu_ref_exit(&cxlmd->ops_active);
> > +   ida_free(&cxl_memdev_ida, cxlmd->id);
> > +   kfree(cxlmd);
> > +}
> > +  
> ...
> 
> > +static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
> > +{
> > +   struct pci_dev *pdev = cxlm->pdev;
> > +   struct cxl_memdev *cxlmd;
> > +   struct device *dev;
> > +   struct cdev *cdev;
> > +   int rc;
> > +
> > +   cxlmd = kzalloc(sizeof(*cxlmd), GFP_KERNEL);
> > +   if (!cxlmd)
> > +   return -ENOMEM;
> > +   init_completion(&cxlmd->ops_dead);
> > +
> > +   /*
> > +* @cxlm is deallocated when the driver unbinds so operations
> > +* that are using it need to hold a live reference.
> > +*/
> > +   cxlmd->cxlm = cxlm;
> > +   rc = percpu_ref_init(&cxlmd->ops_active, cxlmdev_ops_active_release, 0,
> > +GFP_KERNEL);
> > +   if (rc)
> > +   goto err_ref;
> > +
> > +   rc = ida_alloc_range(&cxl_memdev_ida, 0, CXL_MEM_MAX_DEVS, GFP_KERNEL);
> > +   if (rc < 0)
> > +   goto err_id;
> > +   cxlmd->id = rc;
> > +
> > +   dev = >de

Re: [PATCH v2 2/8] cxl/mem: Find device capabilities

2021-02-11 Thread Jonathan Cameron
On Wed, 10 Feb 2021 10:16:05 -0800
Ben Widawsky  wrote:

> On 21-02-10 08:55:57, Ben Widawsky wrote:
> > On 21-02-10 15:07:59, Jonathan Cameron wrote:  
> > > On Wed, 10 Feb 2021 13:32:52 +0000
> > > Jonathan Cameron  wrote:
> > >   
> > > > On Tue, 9 Feb 2021 16:02:53 -0800
> > > > Ben Widawsky  wrote:
> > > >   
> > > > > Provide enough functionality to utilize the mailbox of a memory 
> > > > > device.
> > > > > The mailbox is used to interact with the firmware running on the 
> > > > > memory
> > > > > device. The flow is proven with one implemented command, "identify".
> > > > > Because the class code has already told the driver this is a memory
> > > > > device and the identify command is mandatory.
> > > > > 
> > > > > CXL devices contain an array of capabilities that describe the
> > > > > interactions software can have with the device or firmware running on
> > > > > the device. A CXL compliant device must implement the device status 
> > > > > and
> > > > > the mailbox capability. Additionally, a CXL compliant memory device 
> > > > > must
> > > > > implement the memory device capability. Each of the capabilities can
> > > > > [will] provide an offset within the MMIO region for interacting with 
> > > > > the
> > > > > CXL device.
> > > > > 
> > > > > The capabilities tell the driver how to find and map the register 
> > > > > space
> > > > > for CXL Memory Devices. The registers are required to utilize the CXL
> > > > > spec defined mailbox interface. The spec outlines two mailboxes, 
> > > > > primary
> > > > > and secondary. The secondary mailbox is earmarked for system firmware,
> > > > > and not handled in this driver.
> > > > > 
> > > > > Primary mailboxes are capable of generating an interrupt when 
> > > > > submitting
> > > > > a background command. That implementation is saved for a later time.
> > > > > 
> > > > > Link: https://www.computeexpresslink.org/download-the-specification
> > > > > Signed-off-by: Ben Widawsky 
> > > > > Reviewed-by: Dan Williams 
> > > > 
> > > > Hi Ben,
> > > > 
> > > >   
> > > > > +/**
> > > > > + * cxl_mem_mbox_send_cmd() - Send a mailbox command to a memory 
> > > > > device.
> > > > > + * @cxlm: The CXL memory device to communicate with.
> > > > > + * @mbox_cmd: Command to send to the memory device.
> > > > > + *
> > > > > + * Context: Any context. Expects mbox_lock to be held.
> > > > > + * Return: -ETIMEDOUT if timeout occurred waiting for completion. 0 
> > > > > on success.
> > > > > + * Caller should check the return code in @mbox_cmd to make 
> > > > > sure it
> > > > > + * succeeded.
> > > > 
> > > > cxl_xfer_log() doesn't check mbox_cmd->return_code and for my test it 
> > > > currently
> > > > enters an infinite loop as a result.  
> > 
> > I meant to fix that.
> >   
> > > > 
> > > > I haven't checked other paths, but to my mind it is not a good idea to 
> > > > require
> > > > two levels of error checking - the example here proves how easy it is 
> > > > to forget
> > > > one.  
> > 
> > Demonstrably, you're correct. I think it would be good to have a kernel only
> > mbox command that does the error checking though. Let me type something up 
> > and
> > see how it looks.  
> 
> Hi Jonathan. What do you think of this? The bit I'm on the fence about is if I
> should validate output size too. I like the simplicity as it is, but it 
> requires
> every caller to possibly check output size, which is kind of the same problem
> you're originally pointing out.

The simplicity is good and this is pretty much what I expected you would end
up with (always reassuring).

For the output, perhaps just add another parameter to the wrapper for minimum
output length expected?

Now you mention the length question.  It does rather feel like there should also
be some protection on memcpy_fromio() copying too much data if the hardware
happens to return an unexpectedly long length.  Should never happen, but
the hardening is worth adding anyway given it'

Re: [PATCH v2 7/8] cxl/mem: Add set of informational commands

2021-02-11 Thread Jonathan Cameron
On Tue, 9 Feb 2021 16:02:58 -0800
Ben Widawsky  wrote:

> Add initial set of formal commands beyond basic identify and command
> enumeration.
> 
> Of special note is the Get Log Command which is only specified to return
> 2 log types, CEL and VENDOR_DEBUG. Given that VENDOR_DEBUG is already a
> large catch all for vendor specific information there is no known reason
> for devices to be implementing other log types. Unknown log types are
> included in the "vendor passthrough shenanigans" safety regime like raw
> commands and blocked by default.

As mentioned in previous patch comments, the way that is worded in the spec
suggests to me that we might see other specifications providing more UUIDs
to define other 'standard' info.  Maybe something else was intended...
Still, what you have done here makes sense to me.

> 
> Up to this point there has been no reason to inspect payload data.
> Given the need to check the log type add a new "validate_payload"
> operation to define a generic mechanism to restrict / filter commands.
> 
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Dan Williams 

Reviewed-by: Jonathan Cameron 

> ---
>  drivers/cxl/mem.c| 55 +++-
>  include/uapi/linux/cxl_mem.h |  5 
>  2 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index e9aa6ca18d99..e8cc076b9f1b 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -44,12 +44,16 @@
>  enum opcode {
>   CXL_MBOX_OP_INVALID = 0x0000,
>   CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
> + CXL_MBOX_OP_GET_FW_INFO = 0x0200,
>   CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
>   CXL_MBOX_OP_GET_SUPPORTED_LOGS  = 0x0400,
>   CXL_MBOX_OP_GET_LOG = 0x0401,
>   CXL_MBOX_OP_IDENTIFY= 0x4000,
> + CXL_MBOX_OP_GET_PARTITION_INFO  = 0x4100,
>   CXL_MBOX_OP_SET_PARTITION_INFO  = 0x4101,
> + CXL_MBOX_OP_GET_LSA = 0x4102,
>   CXL_MBOX_OP_SET_LSA = 0x4103,
> + CXL_MBOX_OP_GET_HEALTH_INFO = 0x4200,
>   CXL_MBOX_OP_SET_SHUTDOWN_STATE  = 0x4204,
>   CXL_MBOX_OP_SCAN_MEDIA  = 0x4304,
>   CXL_MBOX_OP_GET_SCAN_MEDIA  = 0x4305,
> @@ -118,6 +122,9 @@ static const uuid_t log_uuid[] = {
>   0xd6, 0x07, 0x19, 0x40, 0x3d, 0x86)
>  };
>  
> +static int validate_log_uuid(struct cxl_mem *cxlm, void __user *payload,
> +  size_t size);
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI
> @@ -129,6 +136,10 @@ static const uuid_t log_uuid[] = {
>   *  * %CXL_CMD_INTERNAL_FLAG_PSEUDO: This is a pseudo command which doesn't have
>   *a direct mapping to hardware. They are implicitly always enabled.
>   *
> + * @validate_payload: A function called after the command is validated but
> + * before it's sent to the hardware. The primary purpose is to validate, or
> + * fixup the actual payload.
> + *
>   * The cxl_mem_command is the driver's internal representation of commands that
>   * are supported by the driver. Some of these commands may not be supported by
>   * the hardware. The driver will use @info to validate the fields passed in by
> @@ -139,9 +150,12 @@ static const uuid_t log_uuid[] = {
>  struct cxl_mem_command {
>   struct cxl_command_info info;
>   enum opcode opcode;
> +
> + int (*validate_payload)(struct cxl_mem *cxlm, void __user *payload,
> + size_t size);
>  };
>  
> -#define CXL_CMD(_id, _flags, sin, sout)\
> +#define CXL_CMD_VALIDATE(_id, _flags, sin, sout, v)\
>   [CXL_MEM_COMMAND_ID_##_id] = { \
>   .info = {  \
>   .id = CXL_MEM_COMMAND_ID_##_id,\
> @@ -150,8 +164,12 @@ struct cxl_mem_command {
>   .size_out = sout,  \
>   }, \
>   .opcode = CXL_MBOX_OP_##_id,   \
> + .validate_payload = v, \
>   }
>  
> +#define CXL_CMD(_id, _flags, sin, sout)\
> + CXL_CMD_VALIDATE(_id, _flags, sin, sout, NULL)
> +
>  /*
>   * This table defines the supported mailbox commands for the driver. This 
> ta

Re: [PATCH v2 5/8] cxl/mem: Add a "RAW" send command

2021-02-11 Thread Jonathan Cameron
On Tue, 9 Feb 2021 16:02:56 -0800
Ben Widawsky  wrote:

> The CXL memory device send interface will have a number of supported
> commands. The raw command is not such a command. Raw commands allow
> userspace to send a specified opcode to the underlying hardware and
> bypass all driver checks on the command. This is useful for a couple of
> usecases, mainly:
> 1. Undocumented vendor specific hardware commands

This one I get.  There are things we'd love to standardize but often they
need proving in a generation of hardware before the data is available to
justify taking it to a standards body.  Stuff like performance stats.
This stuff will all sit in the vendor defined range.  Maybe there is an
argument for in driver hooks to allow proper support even for these
(Ben mentioned this in the other branch of the thread).

> 2. Prototyping new hardware commands not yet supported by the driver

For 2, we could just have a convenient place to enable this via a one-line
patch.  Some subsystems (SPI comes to mind) do this for their equivalent of
raw commands.  The code is all there to enable it but you need to hook it
up if you want to use it.  Avoids the chance of a distro shipping it.

> 
> While this all sounds very powerful it comes with a couple of caveats:
> 1. Bug reports using raw commands will not get the same level of
>attention as bug reports using supported commands (via taint).
> 2. Supported commands will be rejected by the RAW command.

Perhaps I'm misreading point 2 (I'm not sure the code actually does it!)

As stated, what worries me is that it means when we add support for a new
bit of the spec we just break the userspace ABI.

> 
> With this comes a new debugfs knob to allow full access to your toes with
> your weapon of choice.

A few trivial things inline,

Jonathan

> 
> Cc: Ariel Sibley 
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Dan Williams 
> ---
>  drivers/cxl/Kconfig  |  18 +
>  drivers/cxl/mem.c| 125 ++-
>  include/uapi/linux/cxl_mem.h |  12 +++-
>  3 files changed, 152 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index c4ba3aa0a05d..08eaa8e52083 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -33,6 +33,24 @@ config CXL_MEM
>  
> If unsure say 'm'.
>  
> +config CXL_MEM_RAW_COMMANDS
> + bool "RAW Command Interface for Memory Devices"
> + depends on CXL_MEM
> + help
> +   Enable CXL RAW command interface.
> +
> +   The CXL driver ioctl interface may assign a kernel ioctl command
> +   number for each specification defined opcode. At any given point in
> +   time the number of opcodes that the specification defines and a device
> +   may implement may exceed the kernel's set of associated ioctl function
> +   numbers. The mismatch is either by omission, specification is too new,
> +   or by design. When prototyping new hardware, or developing / debugging
> +   the driver it is useful to be able to submit any possible command to
> +   the hardware, even commands that may crash the kernel due to their
> +   potential impact to memory currently in use by the kernel.
> +
> +   If developing CXL hardware or the driver say Y, otherwise say N.
> +
>  config CXL_MEM_INSECURE_DEBUG
>   bool "CXL.mem debugging"
>   depends on CXL_MEM
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index ce65630bb75e..6d766a994dce 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -1,6 +1,8 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -41,7 +43,14 @@
>  
>  enum opcode {
> + CXL_MBOX_OP_INVALID = 0x0000,
> + CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
> + CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
>   CXL_MBOX_OP_IDENTIFY= 0x4000,
> + CXL_MBOX_OP_SET_PARTITION_INFO  = 0x4101,
> + CXL_MBOX_OP_SET_LSA = 0x4103,
> + CXL_MBOX_OP_SET_SHUTDOWN_STATE  = 0x4204,
> + CXL_MBOX_OP_SCAN_MEDIA  = 0x4304,
> + CXL_MBOX_OP_GET_SCAN_MEDIA  = 0x4305,
> + CXL_MBOX_OP_MAX = 0x10000
>  };
>  
> @@ -91,6 +100,8 @@ struct cxl_memdev {
>  
>  static int cxl_mem_major;
>  static DEFINE_IDA(cxl_memdev_ida);
> +static struct dentry *cxl_debugfs;
> +static bool raw_allow_all;
>  
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
> @@ -132,6 +143,49 @@ struct cxl_mem_command {
>   */
>  static struct cxl_mem_command mem_commands[] = {
>   CXL_CMD(IDENTIFY, NONE, 0, 0x43),
> +#ifdef CONFIG_CXL_MEM_RAW_COMMANDS
> + CXL_CMD(RAW, NONE, ~0, ~0),
> +#endif
> +};
> +
> +/*
> + * Commands that RAW doesn't permit. The rationale for each:
> + *
> + * CXL_MBOX_OP_ACTIVATE_FW: Firmware activation requires adjustment /
> + * coordination of 

Re: [PATCH v2 6/8] cxl/mem: Enable commands via CEL

2021-02-11 Thread Jonathan Cameron
On Tue, 9 Feb 2021 16:02:57 -0800
Ben Widawsky  wrote:

> CXL devices identified by the memory-device class code must implement
> the Device Command Interface (described in 8.2.9 of the CXL 2.0 spec).
> While the driver already maintains a list of commands it supports, there
> is still a need to be able to distinguish between commands that the
> driver knows about from commands that are optionally supported by the
> hardware.
> 
> The Command Effects Log (CEL) is specified in the CXL 2.0 specification.
> The CEL is one of two types of logs, the other being vendor specific.

I'd say "vendor specific debug" just so that no one thinks it has anything
to do with the rest of this description (which mentioned vendor specific
commands).

> They are distinguished in hardware/spec via UUID. The CEL is useful for
> 2 things:
> 1. Determine which optional commands are supported by the CXL device.
> 2. Enumerate any vendor specific commands
> 
> The CEL is used by the driver to determine which commands are available
> in the hardware and therefore which commands userspace is allowed to
> execute. The set of enabled commands might be a subset of commands which
> are advertised in UAPI via CXL_MEM_SEND_COMMAND IOCTL.
> 
> The implementation leaves the statically defined table of commands and
> supplements it with a bitmap to determine commands that are enabled.
> This organization was chosen for the following reasons:
> - Smaller memory footprint. Doesn't need a table per device.
> - Reduce memory allocation complexity.
> - Fixed command IDs to opcode mapping for all devices makes development
>   and debugging easier.
> - Certain helpers are easily achievable, like cxl_for_each_cmd().
> 
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Dan Williams 
> ---
>  drivers/cxl/cxl.h|   2 +
>  drivers/cxl/mem.c| 216 +++
>  include/uapi/linux/cxl_mem.h |   1 +
>  3 files changed, 219 insertions(+)
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index b3c56fa6e126..9a5e595abfa4 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -68,6 +68,7 @@ struct cxl_memdev;
>   *(CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
>   * @mbox_mutex: Mutex to synchronize mailbox access.
>   * @firmware_version: Firmware version for the memory device.
> + * @enabled_commands: Hardware commands found enabled in CEL.
>   * @pmem: Persistent memory capacity information.
>   * @ram: Volatile memory capacity information.
>   */
> @@ -83,6 +84,7 @@ struct cxl_mem {
>   size_t payload_size;
>   struct mutex mbox_mutex; /* Protects device mailbox and firmware */
>   char firmware_version[0x10];
> + unsigned long *enabled_cmds;
>  
>   struct {
>   struct range range;
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 6d766a994dce..e9aa6ca18d99 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -45,6 +45,8 @@ enum opcode {
>   CXL_MBOX_OP_INVALID = 0x0000,
>   CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
>   CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
> + CXL_MBOX_OP_GET_SUPPORTED_LOGS  = 0x0400,
> + CXL_MBOX_OP_GET_LOG = 0x0401,
>   CXL_MBOX_OP_IDENTIFY= 0x4000,
>   CXL_MBOX_OP_SET_PARTITION_INFO  = 0x4101,
>   CXL_MBOX_OP_SET_LSA = 0x4103,
> @@ -103,6 +105,19 @@ static DEFINE_IDA(cxl_memdev_ida);
>  static struct dentry *cxl_debugfs;
>  static bool raw_allow_all;
>  
> +enum {
> + CEL_UUID,
> + VENDOR_DEBUG_UUID

Who wants to take a bet this will get extended at some point in the future?
Add a trailing comma to make that less noisy.

They would never have used a UUID if this wasn't expected to expand.
CXL spec calls out that "The following Log Identifier UUIDs are defined in
_this_ specification" rather implying other specs may well define more.
Fun for the future!

> +};
> +
> +/* See CXL 2.0 Table 170. Get Log Input Payload */
> +static const uuid_t log_uuid[] = {
> + [CEL_UUID] = UUID_INIT(0x0da9c0b5, 0xbf41, 0x4b78, 0x8f, 0x79, 0x96,
> +0xb1, 0x62, 0x3b, 0x3f, 0x17),
> + [VENDOR_DEBUG_UUID] = UUID_INIT(0x5e1819d9, 0x11a9, 0x400c, 0x81, 0x1f,
> + 0xd6, 0x07, 0x19, 0x40, 0x3d, 0x86)

likewise on trailing comma

> +};
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI
> @@ -111,6 +126,8 @@ static bool raw_allow_all;
>   *
>   *  * %CXL_CMD_FLAG_MANDATORY: Hardware must support this command. This flag 
> is
>   *only used internally by the driver for sanity checking.
> + *  * %CXL_CMD_INTERNAL_FLAG_PSEUDO: This is a pseudo command which doesn't have
> + *a direct mapping to hardware. They are implicitly always enabled.

Stale comment?

>   *
>   * The cxl_mem_command is the driver's internal representation of commands that
>   * are 

Re: [PATCH 02/14] cxl/mem: Map memory device registers

2021-02-01 Thread Jonathan Cameron
On Mon, 1 Feb 2021 08:46:24 -0800
Ben Widawsky  wrote:

> On 21-01-30 15:51:42, David Rientjes wrote:
> > On Fri, 29 Jan 2021, Ben Widawsky wrote:
> >   
> > > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > > new file mode 100644
> > > index ..d81d0ba4617c
> > > --- /dev/null
> > > +++ b/drivers/cxl/cxl.h
> > > @@ -0,0 +1,17 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/* Copyright(c) 2020 Intel Corporation. */
> > > +
> > > +#ifndef __CXL_H__
> > > +#define __CXL_H__
> > > +
> > > +/**
> > > + * struct cxl_mem - A CXL memory device
> > > + * @pdev: The PCI device associated with this CXL device.
> > > + * @regs: IO mappings to the device's MMIO
> > > + */
> > > +struct cxl_mem {
> > > + struct pci_dev *pdev;
> > > + void __iomem *regs;
> > > +};
> > > +
> > > +#endif  
> > 
> > Stupid question: can there be more than one CXL.mem capable logical 
> > device?  I only ask to determine if an ordinal is needed to enumerate 
> > multiple LDs.  
> 
> Not a stupid question at all. I admit, I haven't spent much time thinking
> about MLDs. I don't have a solid answer to your question. As I understand
> it, the devices in the virtual hierarchy will appear as individual CXL
> type 3 device components (2.4 in the spec) and transparent to software. A
> few times I've attempted to think about MLDs, get confused, and go do
> something else. The only MLD specificity I know of is the MLD DVSEC
> (8.1.10), which seems not incredibly interesting to me at present
> (basically, only supporting hot reset).

That's my understanding as well.  If you have an MLD (and hence multiple
logical memory devices) the fact there are multiple logical devices will be
nearly invisible to any given host (it will more or less look like an SLD).
Configuration via the fabric manager API will be invisible to the host other
than via hotplug events when the configuration changes.
Note the MLD DVSEC is only in the fabric-manager-owned logical device (so
Linux won't see it in the PCI hierarchy).  Note the fabric manager is
usually controlled by a BMC or similar.

Various registers become hardwired to 0 and certain writes are ignored when
it's an MLD logical device rather than a physical device (SLD).

We might want to look at emulating this in QEMU to give us a
platform to verify the spec (particularly to make sure the various hardwired
registers don't cause any problems), but I'd not expect to see anything specific
in kernel support.

Jonathan



> 
> >   
> > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > index f4ee9a507ac9..a869c8dc24cc 100644
> > > --- a/drivers/cxl/mem.c
> > > +++ b/drivers/cxl/mem.c
> > > @@ -4,6 +4,58 @@
> > >  #include 
> > >  #include 
> > >  #include "pci.h"
> > > +#include "cxl.h"
> > > +
> > > +/**
> > > + * cxl_mem_create() - Create a new  cxl_mem.
> > > + * @pdev: The pci device associated with the new  cxl_mem.
> > > + * @reg_lo: Lower 32b of the register locator
> > > + * @reg_hi: Upper 32b of the register locator.
> > > + *
> > > + * Return: The new  cxl_mem on success, NULL on failure.
> > > + *
> > > + * Map the BAR for a CXL memory device. This BAR has the memory device's
> > > + * registers for the device as specified in CXL specification.
> > > + */
> > > +static struct cxl_mem *cxl_mem_create(struct pci_dev *pdev, u32 reg_lo,
> > > +   u32 reg_hi)
> > > +{
> > > + struct device *dev = &pdev->dev;
> > > + struct cxl_mem *cxlm;
> > > + void __iomem *regs;
> > > + u64 offset;
> > > + u8 bar;
> > > + int rc;
> > > +
> > > + offset = ((u64)reg_hi << 32) | (reg_lo & CXL_REGLOC_ADDR_MASK);
> > > + bar = (reg_lo >> CXL_REGLOC_BIR_SHIFT) & CXL_REGLOC_BIR_MASK;
> > > +
> > > + /* Basic sanity check that BAR is big enough */
> > > + if (pci_resource_len(pdev, bar) < offset) {
> > > + dev_err(dev, "BAR%d: %pr: too small (offset: %#llx)\n", bar,
> > > + &pdev->resource[bar], (unsigned long long)offset);
> > > + return NULL;
> > > + }
> > > +
> > > + rc = pcim_iomap_regions(pdev, BIT(bar), pci_name(pdev));
> > > + if (rc != 0) {
> > > + dev_err(dev, "failed to map registers\n");
> > > + return NULL;
> > > + }
> > > +
> > > + cxlm = devm_kzalloc(&pdev->dev, sizeof(*cxlm), GFP_KERNEL);
> > > + if (!cxlm) {
> > > + dev_err(dev, "No memory available\n");
> > > + return NULL;
> > > + }
> > > +
> > > + regs = pcim_iomap_table(pdev)[bar];
> > > + cxlm->pdev = pdev;
> > > + cxlm->regs = regs + offset;
> > > +
> > > + dev_dbg(dev, "Mapped CXL Memory Device resource\n");
> > > + return cxlm;
> > > +}
> > >  
> > >  static int cxl_mem_dvsec(struct pci_dev *pdev, int dvsec)
> > >  {
> > > @@ -32,15 +84,42 @@ static int cxl_mem_dvsec(struct pci_dev *pdev, int 
> > > dvsec)
> > >  static int cxl_mem_probe(struct pci_dev *pdev, const struct 
> > > pci_device_id *id)
> > >  {
> > >   struct device *dev = &pdev->dev;
> > > - int regloc;
> > > + struct cxl_mem *cxlm;
> > > + int rc, regloc, i;
> > > +
> > > + rc = 

Re: [PATCH v2 5/8] cxl/mem: Add a "RAW" send command

2021-02-12 Thread Jonathan Cameron
On Thu, 11 Feb 2021 08:01:48 -0800
Ben Widawsky  wrote:

> On 21-02-11 11:19:24, Jonathan Cameron wrote:
> > On Tue, 9 Feb 2021 16:02:56 -0800
> > Ben Widawsky  wrote:
> >   
> > > The CXL memory device send interface will have a number of supported
> > > commands. The raw command is not such a command. Raw commands allow
> > > userspace to send a specified opcode to the underlying hardware and
> > > bypass all driver checks on the command. This is useful for a couple of
> > > usecases, mainly:
> > > 1. Undocumented vendor specific hardware commands  
> > 
> > This one I get.  There are things we'd love to standardize but often they
> > need proving in a generation of hardware before the data is available to
> > justify taking it to a standards body.  Stuff like performance stats.
> > This stuff will all sit in the vendor defined range.  Maybe there is an
> > argument for in driver hooks to allow proper support even for these
> > (Ben mentioned this in the other branch of the thread).
> >   
> > > 2. Prototyping new hardware commands not yet supported by the driver  
> > 
> > For 2, could just have a convenient place to enable this by one line patch.
> > Some subsystems (SPI comes to mind) do this for their equivalent of raw
> > commands.  The code is all there to enable it but you need to hook it
> > up if you want to use it.  Avoids chance of a distro shipping it.
> >   
> 
> I'm fine to drop #2 as a justification point, or maybe reword the commit
> message to say, "you could also just do... but since we have it for #1
> already..."
> 
> > > 
> > > While this all sounds very powerful it comes with a couple of caveats:
> > > 1. Bug reports using raw commands will not get the same level of
> > >attention as bug reports using supported commands (via taint).
> > > 2. Supported commands will be rejected by the RAW command.  
> > 
> > Perhaps I'm missing reading this point 2 (not sure the code actually does 
> > it!)
> > 
> > As stated what worries me as it means when we add support for a new
> > bit of the spec we just broke the userspace ABI.
> >   
> 
> It does not break ABI. The agreement is userspace must always use the QUERY
> command to find out what commands are supported. If it tries to use a RAW
> command that is a supported command, it will be rejected. In the case you
> mention, that's an application bug. If there is a way to document that better
> than what's already in the UAPI kdocs, I'm open to suggestions.
> 
> Unlike perhaps other UAPI, this one only promises to give you a way to 
> determine
> what commands you can use, not the list of what commands you can use.

*crossed fingers* on this.  Users may have a different view when their
application just stops working.  It might print a nice error message telling
them why, but it still doesn't work, and that way lies grumpy Linus and
reverts...

Mostly we'll get away with it because no one will notice, but it's
unfortunately still risky.  Personal preference is to play it safer and not
allow direct userspace access to commands in the spec (unless we've decided
they will always be available directly to userspace).  This includes
anything in the ranges reserved for future spec usage.

Jonathan



> 
> > > 
> > > With this comes new debugfs knob to allow full access to your toes with
> > > your weapon of choice.  
> > 
> > A few trivial things inline,
> > 
> > Jonathan
> >   
> > > 
> > > Cc: Ariel Sibley 
> > > Signed-off-by: Ben Widawsky 
> > > Reviewed-by: Dan Williams 
> > > ---
> > >  drivers/cxl/Kconfig  |  18 +
> > >  drivers/cxl/mem.c| 125 ++-
> > >  include/uapi/linux/cxl_mem.h |  12 +++-
> > >  3 files changed, 152 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > > index c4ba3aa0a05d..08eaa8e52083 100644
> > > --- a/drivers/cxl/Kconfig
> > > +++ b/drivers/cxl/Kconfig
> > > @@ -33,6 +33,24 @@ config CXL_MEM
> > >  
> > > If unsure say 'm'.
> > >  
> > > +config CXL_MEM_RAW_COMMANDS
> > > + bool "RAW Command Interface for Memory Devices"
> > > + depends on CXL_MEM
> > > + help
> > > +   Enable CXL RAW command interface.
> > > +
> > > +   The CXL driver ioctl interface may assign a kernel ioctl command
> > > +   number for each specification defined opcode. At any given point in
> > > +   time the number of o

Re: [PATCH v2 3/8] cxl/mem: Register CXL memX devices

2021-02-12 Thread Jonathan Cameron
On Thu, 11 Feb 2021 12:40:45 -0800
Dan Williams  wrote:

> On Thu, Feb 11, 2021 at 2:19 AM Jonathan Cameron
>  wrote:
> >
> > On Wed, 10 Feb 2021 18:17:25 +
> > Jonathan Cameron  wrote:
> >  
> > > On Tue, 9 Feb 2021 16:02:54 -0800
> > > Ben Widawsky  wrote:
> > >  
> > > > From: Dan Williams 
> > > >
> > > > Create the /sys/bus/cxl hierarchy to enumerate:
> > > >
> > > > * Memory Devices (per-endpoint control devices)
> > > >
> > > > * Memory Address Space Devices (platform address ranges with
> > > >   interleaving, performance, and persistence attributes)
> > > >
> > > > * Memory Regions (active provisioned memory from an address space device
> > > >   that is in use as System RAM or delegated to libnvdimm as Persistent
> > > >   Memory regions).
> > > >
> > > > For now, only the per-endpoint control devices are registered on the
> > > > 'cxl' bus. However, going forward it will provide a mechanism to
> > > > coordinate cross-device interleave.
> > > >
> > > > Signed-off-by: Dan Williams 
> > > > Signed-off-by: Ben Widawsky   
> > >
> > > One stray header, and a request for a tiny bit of reordering to
> > > make it easier to chase through creation and destruction.
> > >
> > > Either way with the header move to earlier patch I'm fine with this one.
> > >
> > > Reviewed-by: Jonathan Cameron   
> >
> > Actually thinking more on this, what is the justification for the
> > complexity + overhead of a percpu_refcount vs a refcount?
> 
> A typical refcount does not have the block and drain semantics of a
> percpu_ref. I'm planning to circle back and make this a first class
> facility of the cdev interface borrowing the debugfs approach [1], but
> for now percpu_ref fits the bill locally.
> 
> > I don't think this is a high enough performance path for it to matter.
> > Perhaps I'm missing a usecase where it does?  
> 
> It's less about percpu_ref performance and more about the
> percpu_ref_tryget_live() facility.
> 
> [1]: 
> http://lore.kernel.org/r/CAPcyv4jEYPsyh0bhbtKGRbK3bgp=_+=2rjx4x0gli5-25vv...@mail.gmail.com

Thanks for the reference. Definitely a nasty corner to clean up so I'll
keep an eye open for a new version of that series.

Jonathan

___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH v2 2/8] cxl/mem: Find device capabilities

2021-02-12 Thread Jonathan Cameron
On Thu, 11 Feb 2021 10:27:41 -0800
Ben Widawsky  wrote:

> On 21-02-11 09:55:48, Jonathan Cameron wrote:
> > On Wed, 10 Feb 2021 10:16:05 -0800
> > Ben Widawsky  wrote:
> >   
> > > On 21-02-10 08:55:57, Ben Widawsky wrote:  
> > > > On 21-02-10 15:07:59, Jonathan Cameron wrote:
> > > > > On Wed, 10 Feb 2021 13:32:52 +
> > > > > Jonathan Cameron  wrote:
> > > > > 
> > > > > > On Tue, 9 Feb 2021 16:02:53 -0800
> > > > > > Ben Widawsky  wrote:
> > > > > > 
> > > > > > > Provide enough functionality to utilize the mailbox of a memory 
> > > > > > > device.
> > > > > > > The mailbox is used to interact with the firmware running on the 
> > > > > > > memory
> > > > > > > device. The flow is proven with one implemented command, 
> > > > > > > "identify".
> > > > > > > Because the class code has already told the driver this is a 
> > > > > > > memory
> > > > > > > device and the identify command is mandatory.
> > > > > > > 
> > > > > > > CXL devices contain an array of capabilities that describe the
> > > > > > > interactions software can have with the device or firmware 
> > > > > > > running on
> > > > > > > the device. A CXL compliant device must implement the device 
> > > > > > > status and
> > > > > > > the mailbox capability. Additionally, a CXL compliant memory 
> > > > > > > device must
> > > > > > > implement the memory device capability. Each of the capabilities 
> > > > > > > can
> > > > > > > [will] provide an offset within the MMIO region for interacting 
> > > > > > > with the
> > > > > > > CXL device.
> > > > > > > 
> > > > > > > The capabilities tell the driver how to find and map the register 
> > > > > > > space
> > > > > > > for CXL Memory Devices. The registers are required to utilize the 
> > > > > > > CXL
> > > > > > > spec defined mailbox interface. The spec outlines two mailboxes, 
> > > > > > > primary
> > > > > > > and secondary. The secondary mailbox is earmarked for system 
> > > > > > > firmware,
> > > > > > > and not handled in this driver.
> > > > > > > 
> > > > > > > Primary mailboxes are capable of generating an interrupt when 
> > > > > > > submitting
> > > > > > > a background command. That implementation is saved for a later 
> > > > > > > time.
> > > > > > > 
> > > > > > > Link: 
> > > > > > > https://www.computeexpresslink.org/download-the-specification
> > > > > > > Signed-off-by: Ben Widawsky 
> > > > > > > Reviewed-by: Dan Williams   
> > > > > > 
> > > > > > Hi Ben,
> > > > > > 
> > > > > > 
> > > > > > > +/**
> > > > > > > + * cxl_mem_mbox_send_cmd() - Send a mailbox command to a memory 
> > > > > > > device.
> > > > > > > + * @cxlm: The CXL memory device to communicate with.
> > > > > > > + * @mbox_cmd: Command to send to the memory device.
> > > > > > > + *
> > > > > > > + * Context: Any context. Expects mbox_lock to be held.
> > > > > > > + * Return: -ETIMEDOUT if timeout occurred waiting for 
> > > > > > > completion. 0 on success.
> > > > > > > + * Caller should check the return code in @mbox_cmd to 
> > > > > > > make sure it
> > > > > > > + * succeeded.  
> > > > > > 
> > > > > > cxl_xfer_log() doesn't check mbox_cmd->return_code and for my test 
> > > > > > it currently
> > > > > > enters an infinite loop as a result.
> > > > 
> > > > I meant to fix that.
> > > > 
> > > > > > 
> > > > > > I haven't checked other paths, but to my mind it is not a good idea 
> > > > > > to require
> > > > > > two le

Re: [PATCH v4 2/9] cxl/mem: Find device capabilities

2021-02-16 Thread Jonathan Cameron
On Mon, 15 Feb 2021 17:45:31 -0800
Ben Widawsky  wrote:

> Provide enough functionality to utilize the mailbox of a memory device.
> The mailbox is used to interact with the firmware running on the memory
> device. The flow is proven with one implemented command, "identify".
> Because the class code has already told the driver this is a memory
> device and the identify command is mandatory.
> 
> CXL devices contain an array of capabilities that describe the
> interactions software can have with the device or firmware running on
> the device. A CXL compliant device must implement the device status and
> the mailbox capability. Additionally, a CXL compliant memory device must
> implement the memory device capability. Each of the capabilities can
> [will] provide an offset within the MMIO region for interacting with the
> CXL device.
> 
> The capabilities tell the driver how to find and map the register space
> for CXL Memory Devices. The registers are required to utilize the CXL
> spec defined mailbox interface. The spec outlines two mailboxes, primary
> and secondary. The secondary mailbox is earmarked for system firmware,
> and not handled in this driver.
> 
> Primary mailboxes are capable of generating an interrupt when submitting
> a background command. That implementation is saved for a later time.
> 
> Link: https://www.computeexpresslink.org/download-the-specification
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Dan Williams  (v2)

Looks like an off by one error in the register locator iteration.

The potential buffer overrun from memcpy_fromio is still there as well
as far as I can see.

If the software provides storage for a payload of size n and the hardware
reports a size of n + d, code will happily write beyond the end of the
storage provided.

Obviously, this shouldn't happen, but I'm not that trusting of both
hardware and software never having bugs.
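The hardening being asked for amounts to one comparison before the copy-out. A userspace sketch (struct and helper names here are illustrative, and plain memcpy() stands in for the driver's memcpy_fromio()):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for the driver's mbox_cmd output fields. */
struct mbox_out {
	void *payload_out;   /* storage provided by the caller */
	size_t size_out;     /* capacity on entry, bytes copied on exit */
};

/*
 * Copy the hardware-reported payload, but never past the caller's
 * buffer. Returns 0, or -1 when the device reported more bytes than
 * the caller provided storage for, so the mismatch can be logged or
 * failed rather than silently corrupting memory.
 */
static int copy_payload_clamped(struct mbox_out *out,
				const void *hw_payload, size_t hw_len)
{
	size_t n = hw_len < out->size_out ? hw_len : out->size_out;

	memcpy(out->payload_out, hw_payload, n); /* memcpy_fromio() in-kernel */
	out->size_out = n;
	return hw_len > n ? -1 : 0;
}
```

The check is cheap enough that there is no real cost to distrusting the device.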

Jonathan

> ---
>  drivers/cxl/cxl.h |  88 
>  drivers/cxl/mem.c | 543 +-
>  drivers/cxl/pci.h |  14 ++
>  3 files changed, 643 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/cxl/cxl.h
> 
...

> +
> +#endif /* __CXL_H__ */
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index ce33c5ee77c9..b86cda2d299a 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -3,7 +3,458 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "pci.h"
> +#include "cxl.h"
> +
> +#define cxl_doorbell_busy(cxlm)                                       \
> + (readl((cxlm)->mbox_regs + CXLDEV_MBOX_CTRL_OFFSET) &  \
> +  CXLDEV_MBOX_CTRL_DOORBELL)
> +
> +/* CXL 2.0 - 8.2.8.4 */
> +#define CXL_MAILBOX_TIMEOUT_MS (2 * HZ)
> +
> +enum opcode {
> + CXL_MBOX_OP_IDENTIFY= 0x4000,
> + CXL_MBOX_OP_MAX = 0x1
> +};
> +
> +/**
> + * struct mbox_cmd - A command to be submitted to hardware.
> + * @opcode: (input) The command set and command submitted to hardware.
> + * @payload_in: (input) Pointer to the input payload.
> + * @payload_out: (output) Pointer to the output payload. Must be allocated by
> + *the caller.
> + * @size_in: (input) Number of bytes to load from @payload.
> + * @size_out: (output) Number of bytes loaded into @payload.
> + * @return_code: (output) Error code returned from hardware.
> + *
> + * This is the primary mechanism used to send commands to the hardware.
> + * All the fields except @payload_* correspond exactly to the fields described in
> + * Command Register section of the CXL 2.0 8.2.8.4.5. @payload_in and
> + * @payload_out are written to, and read from the Command Payload Registers
> + * defined in CXL 2.0 8.2.8.4.8.
> + */
> +struct mbox_cmd {
> + u16 opcode;
> + void *payload_in;
> + void *payload_out;
> + size_t size_in;
> + size_t size_out;
> + u16 return_code;
> +#define CXL_MBOX_SUCCESS 0
> +};


> +
> +/**
> + * __cxl_mem_mbox_send_cmd() - Execute a mailbox command
> + * @cxlm: The CXL memory device to communicate with.
> + * @mbox_cmd: Command to send to the memory device.
> + *
> + * Context: Any context. Expects mbox_mutex to be held.
> + * Return: -ETIMEDOUT if timeout occurred waiting for completion. 0 on success.
> + * Caller should check the return code in @mbox_cmd to make sure it
> + * succeeded.
> + *
> + * This is a generic form of the CXL mailbox send command thus only using the
> + * registers defined by the mailbox capability ID - CXL 2.0 8.2.8.4. Memory
> + * devices, and perhaps other types of CXL devices may have further information
> + * available upon error conditions. Driver facilities wishing to send mailbox
> + * commands should use the wrapper command.
> + *
> + * The CXL spec allows for up to two mailboxes. The intention is for the primary
> + * mailbox to be OS controlled and the secondary mailbox to be used by system
> + * firmware. This allows the OS and firmware to communicate with the device 

Re: [PATCH v4 4/9] cxl/mem: Add basic IOCTL interface

2021-02-16 Thread Jonathan Cameron
On Mon, 15 Feb 2021 17:45:33 -0800
Ben Widawsky  wrote:

> Add a straightforward IOCTL that provides a mechanism for userspace to
> query the supported memory device commands. CXL commands as they appear
> to userspace are described as part of the UAPI kerneldoc. The command
> list returned via this IOCTL will contain the full set of commands that
> the driver supports, however, some of those commands may not be
> available for use by userspace.
> 
> Memory device commands first appear in the CXL 2.0 specification. They
> are submitted through a mailbox mechanism specified in the CXL 2.0
> specification.
> 
> The send command allows userspace to issue mailbox commands directly to
> the hardware. The list of available commands to send is the output of
> the query command. The driver verifies basic properties of the command
> and possibly inspects the input (or output) payload to determine whether
> or not the command is allowed (or might taint the kernel).
> 
> Reported-by: kernel test robot  # bug in earlier revision
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Dan Williams  (v2)

I may be misreading this but I think the logic to ensure commands
using a variable sized buffer have enough space is broken.

Jonathan
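The check in question reduces to a simple rule: fixed-size commands must match their table entry exactly, while variable-size commands (marked ~0 in the table) only need to fit within the mailbox's maximum payload. A self-contained sketch of that rule (names are illustrative, not the driver's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CXL_VARIABLE_PAYLOAD 0xFFFFFFFFu  /* the command table's ~0 marker */

/*
 * Accept a user-supplied payload size for a command whose table entry
 * declares 'cmd_size'. Variable-size commands are bounded only by the
 * mailbox payload capacity; fixed-size commands must match exactly.
 */
static bool payload_size_ok(uint32_t user_size, uint32_t cmd_size,
			    uint32_t max_payload)
{
	if (cmd_size == CXL_VARIABLE_PAYLOAD)
		return user_size <= max_payload;
	return user_size == cmd_size;
}
```

Getting either branch wrong is what allows a user-controlled size to reach the mailbox code unchecked.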

> ---
>  .clang-format |   1 +
>  .../userspace-api/ioctl/ioctl-number.rst  |   1 +
>  drivers/cxl/mem.c | 288 +-
>  include/uapi/linux/cxl_mem.h  | 154 ++
>  4 files changed, 443 insertions(+), 1 deletion(-)
>  create mode 100644 include/uapi/linux/cxl_mem.h
> 
> diff --git a/.clang-format b/.clang-format
> index 10dc5a9a61b3..3f11c8901b43 100644
> --- a/.clang-format
> +++ b/.clang-format
> @@ -109,6 +109,7 @@ ForEachMacros:
>- 'css_for_each_child'
>- 'css_for_each_descendant_post'
>- 'css_for_each_descendant_pre'
> +  - 'cxl_for_each_cmd'
>- 'device_for_each_child_node'
>- 'dma_fence_chain_for_each'
>- 'do_for_each_ftrace_op'
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index a4c75a28c839..6eb8e634664d 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -352,6 +352,7 @@ Code  Seq#    Include File                           Comments
>  0xCC  00-0F  drivers/misc/ibmvmc.h                          pseries VMC driver
>  0xCD  01     linux/reiserfs_fs.h
> +0xCE  01-02  uapi/linux/cxl_mem.h                           Compute Express Link Memory Devices
>  0xCF  02     fs/cifs/ioctl.c
>  0xDB  00-0F  drivers/char/mwave/mwavepub.h
>  0xDD  00-3F                                                 ZFCP device driver see drivers/s390/scsi/
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 410adb1bdffc..a4298cb1182d 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -40,6 +41,7 @@
>  #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ)
>  
>  enum opcode {
> + CXL_MBOX_OP_INVALID = 0x,
>   CXL_MBOX_OP_IDENTIFY= 0x4000,
>   CXL_MBOX_OP_MAX = 0x1
>  };
> @@ -91,6 +93,49 @@ struct cxl_memdev {
>  static int cxl_mem_major;
>  static DEFINE_IDA(cxl_memdev_ida);
>  
> +/**
> + * struct cxl_mem_command - Driver representation of a memory device command
> + * @info: Command information as it exists for the UAPI
> + * @opcode: The actual bits used for the mailbox protocol
> + *
> + * The cxl_mem_command is the driver's internal representation of commands that
> + * are supported by the driver. Some of these commands may not be supported by
> + * the hardware. The driver will use @info to validate the fields passed in by
> + * the user then submit the @opcode to the hardware.
> + *
> + * See struct cxl_command_info.
> + */
> +struct cxl_mem_command {
> + struct cxl_command_info info;
> + enum opcode opcode;
> +};
> +
> +#define CXL_CMD(_id, sin, sout)                                       \
> + [CXL_MEM_COMMAND_ID_##_id] = { \
> + .info = {  \
> + .id = CXL_MEM_COMMAND_ID_##_id,\
> + .size_in = sin,\
> + .size_out = sout,  \
> + }, \
> + .opcode = CXL_MBOX_OP_##_id,   \
> + }
> +
> +/*
> + * This table defines 

Re: [PATCH v4 5/9] cxl/mem: Add a "RAW" send command

2021-02-16 Thread Jonathan Cameron
On Mon, 15 Feb 2021 17:45:34 -0800
Ben Widawsky  wrote:

> The CXL memory device send interface will have a number of supported
> commands. The raw command is not such a command. Raw commands allow
> userspace to send a specified opcode to the underlying hardware and
> bypass all driver checks on the command. The primary use for this
> command is to [begrudgingly] allow undocumented vendor specific hardware
> commands.
> 
> While not the main motivation, it also allows prototyping new hardware
> commands without a driver patch and rebuild.
> 
> While this all sounds very powerful it comes with a couple of caveats:
> 1. Bug reports using raw commands will not get the same level of
>attention as bug reports using supported commands (via taint).
> 2. Supported commands will be rejected by the RAW command.
> 
> With this comes a new debugfs knob to allow full access to your toes with
> your weapon of choice.
> 
> Cc: Ariel Sibley 
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Dan Williams  (v2)

Whilst I'm definitely dubious about introducing this interface
so early in development, I haven't found any problems with 'how' it
has been done.

I guess it's now just up to us to hassle our hardware colleagues into
only using this facility when absolutely necessary...

Reviewed-by: Jonathan Cameron 
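The guardrail on the raw path is essentially a small deny-list over the opcodes the patch enumerates (ACTIVATE_FW, SET_PARTITION_INFO, SET_LSA, SET_SHUTDOWN_STATE, SCAN_MEDIA, GET_SCAN_MEDIA). A standalone sketch of the check, with the helper name illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Opcodes the RAW path refuses even when raw commands are compiled in,
 * per the rationale in the patch (values from the enum in the diff).
 */
static const uint16_t disabled_raw_commands[] = {
	0x0202, /* ACTIVATE_FW */
	0x4101, /* SET_PARTITION_INFO */
	0x4103, /* SET_LSA */
	0x4204, /* SET_SHUTDOWN_STATE */
	0x4304, /* SCAN_MEDIA */
	0x4305, /* GET_SCAN_MEDIA */
};

/* Linear scan is fine: the deny-list is tiny and the path is not hot. */
static bool is_disabled_raw_command(uint16_t opcode)
{
	size_t i;

	for (i = 0; i < sizeof(disabled_raw_commands) /
			sizeof(disabled_raw_commands[0]); i++)
		if (disabled_raw_commands[i] == opcode)
			return true;
	return false;
}
```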

> ---
>  drivers/cxl/Kconfig  |  18 +
>  drivers/cxl/mem.c| 132 +++
>  include/uapi/linux/cxl_mem.h |  12 +++-
>  3 files changed, 161 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 9e80b311e928..97dc4d751651 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -32,4 +32,22 @@ config CXL_MEM
> Chapter 2.3 Type 3 CXL Device in the CXL 2.0 specification.
>  
> If unsure say 'm'.
> +
> +config CXL_MEM_RAW_COMMANDS
> + bool "RAW Command Interface for Memory Devices"
> + depends on CXL_MEM
> + help
> +   Enable CXL RAW command interface.
> +
> +   The CXL driver ioctl interface may assign a kernel ioctl command
> +   number for each specification defined opcode. At any given point in
> +   time the number of opcodes that the specification defines and a device
> +   may implement may exceed the kernel's set of associated ioctl function
> +   numbers. The mismatch is either by omission, specification is too new,
> +   or by design. When prototyping new hardware, or developing / debugging
> +   the driver it is useful to be able to submit any possible command to
> +   the hardware, even commands that may crash the kernel due to their
> +   potential impact to memory currently in use by the kernel.
> +
> +   If developing CXL hardware or the driver say Y, otherwise say N.
>  endif
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index a4298cb1182d..6b4feb0ce47d 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -1,6 +1,8 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -42,7 +44,14 @@
>  
>  enum opcode {
>   CXL_MBOX_OP_INVALID = 0x,
> + CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
> + CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
>   CXL_MBOX_OP_IDENTIFY= 0x4000,
> + CXL_MBOX_OP_SET_PARTITION_INFO  = 0x4101,
> + CXL_MBOX_OP_SET_LSA = 0x4103,
> + CXL_MBOX_OP_SET_SHUTDOWN_STATE  = 0x4204,
> + CXL_MBOX_OP_SCAN_MEDIA  = 0x4304,
> + CXL_MBOX_OP_GET_SCAN_MEDIA  = 0x4305,
>   CXL_MBOX_OP_MAX = 0x1
>  };
>  
> @@ -92,6 +101,8 @@ struct cxl_memdev {
>  
>  static int cxl_mem_major;
>  static DEFINE_IDA(cxl_memdev_ida);
> +static struct dentry *cxl_debugfs;
> +static bool cxl_raw_allow_all;
>  
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
> @@ -128,6 +139,49 @@ struct cxl_mem_command {
>   */
>  static struct cxl_mem_command mem_commands[] = {
>   CXL_CMD(IDENTIFY, 0, 0x43),
> +#ifdef CONFIG_CXL_MEM_RAW_COMMANDS
> + CXL_CMD(RAW, ~0, ~0),
> +#endif
> +};
> +
> +/*
> + * Commands that RAW doesn't permit. The rationale for each:
> + *
> + * CXL_MBOX_OP_ACTIVATE_FW: Firmware activation requires adjustment /
> + * coordination of transaction timeout values at the root bridge level.
> + *
> + * CXL_MBOX_OP_SET_PARTITION_INFO: The device memory map may change live
> + * and needs to be coordinated with HDM updates.
> + *
> + * CXL_MBOX_OP_SET_LSA: The label storage area may b

Re: [PATCH v4 6/9] cxl/mem: Enable commands via CEL

2021-02-16 Thread Jonathan Cameron
On Mon, 15 Feb 2021 17:45:35 -0800
Ben Widawsky  wrote:

> CXL devices identified by the memory-device class code must implement
> the Device Command Interface (described in 8.2.9 of the CXL 2.0 spec).
> While the driver already maintains a list of commands it supports, there
> is still a need to be able to distinguish between commands that the
> driver knows about from commands that are optionally supported by the
> hardware.
> 
> The Command Effects Log (CEL) is specified in the CXL 2.0 specification.
> The CEL is one of two types of logs, the other being vendor specific.
> They are distinguished in hardware/spec via UUID. The CEL is useful for
> 2 things:
> 1. Determine which optional commands are supported by the CXL device.
> 2. Enumerate any vendor specific commands
> 
> The CEL is used by the driver to determine which commands are available
> in the hardware and therefore which commands userspace is allowed to
> execute. The set of enabled commands might be a subset of commands which
> are advertised in UAPI via CXL_MEM_SEND_COMMAND IOCTL.
> 
> With the CEL enabling comes an internal flag to indicate a base set of
> commands that are enabled regardless of CEL. Such commands are required
> for basic interaction with the hardware and thus can be useful in debug
> cases, for example if the CEL is corrupted.
> 
> The implementation leaves the statically defined table of commands and
> supplements it with a bitmap to determine commands that are enabled.
> This organization was chosen for the following reasons:
> - Smaller memory footprint. Doesn't need a table per device.
> - Reduce memory allocation complexity.
> - Fixed command IDs to opcode mapping for all devices makes development
>   and debugging easier.
> - Certain helpers are easily achievable, like cxl_for_each_cmd().
> 
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Dan Williams 
A few trivial, now-unnecessary casts remain, following on from your changing
the types of the buffer pointers passed to cxl_mem_mbox_send_cmd() to void *.

Otherwise looks good to me.

Reviewed-by: Jonathan Cameron 
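The static-table-plus-bitmap arrangement the commit message argues for can be sketched in a few lines of userspace C (the driver itself uses DECLARE_BITMAP() and test_bit(); command IDs here are an illustrative subset):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/* Fixed command IDs shared by every device. */
enum cmd_id { CMD_IDENTIFY, CMD_GET_LOG, CMD_SET_LSA, CMD_MAX };

/*
 * One global command table would sit alongside this; the only
 * per-device state is this bitmap of IDs the CEL enabled, so no
 * per-device copy of the table is ever allocated.
 */
static unsigned long enabled_cmds[(CMD_MAX + BITS_PER_LONG - 1) / BITS_PER_LONG];

static void cmd_enable(enum cmd_id id)
{
	enabled_cmds[id / BITS_PER_LONG] |= 1UL << (id % BITS_PER_LONG);
}

static bool cmd_is_enabled(enum cmd_id id)
{
	return enabled_cmds[id / BITS_PER_LONG] & (1UL << (id % BITS_PER_LONG));
}
```

Because the ID-to-opcode mapping is fixed, the same bit means the same command on every device, which is what makes helpers like cxl_for_each_cmd() cheap.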

> ---
>  drivers/cxl/cxl.h|   2 +
>  drivers/cxl/mem.c| 213 ++-
>  include/uapi/linux/cxl_mem.h |   1 +
>  3 files changed, 213 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index f69313dc3b4e..cb7a77ed443d 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -67,6 +67,7 @@ struct cxl_memdev;
>   *(CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
>   * @mbox_mutex: Mutex to synchronize mailbox access.
>   * @firmware_version: Firmware version for the memory device.
> + * @enabled_commands: Hardware commands found enabled in CEL.
>   * @pmem: Persistent memory capacity information.
>   * @ram: Volatile memory capacity information.
>   */
> @@ -82,6 +83,7 @@ struct cxl_mem {
>   size_t payload_size;
>   struct mutex mbox_mutex; /* Protects device mailbox and firmware */
>   char firmware_version[0x10];
> + unsigned long *enabled_cmds;
>  
>   struct range pmem_range;
>   struct range ram_range;
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 6b4feb0ce47d..4d921b4375f9 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -46,6 +46,8 @@ enum opcode {
>   CXL_MBOX_OP_INVALID = 0x,
>   CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
>   CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
> + CXL_MBOX_OP_GET_SUPPORTED_LOGS  = 0x0400,
> + CXL_MBOX_OP_GET_LOG = 0x0401,
>   CXL_MBOX_OP_IDENTIFY= 0x4000,
>   CXL_MBOX_OP_SET_PARTITION_INFO  = 0x4101,
>   CXL_MBOX_OP_SET_LSA = 0x4103,
> @@ -104,10 +106,28 @@ static DEFINE_IDA(cxl_memdev_ida);
>  static struct dentry *cxl_debugfs;
>  static bool cxl_raw_allow_all;
>  
> +enum {
> + CEL_UUID,
> + VENDOR_DEBUG_UUID,
> +};
> +
> +/* See CXL 2.0 Table 170. Get Log Input Payload */
> +static const uuid_t log_uuid[] = {
> + [CEL_UUID] = UUID_INIT(0xda9c0b5, 0xbf41, 0x4b78, 0x8f, 0x79, 0x96,
> +0xb1, 0x62, 0x3b, 0x3f, 0x17),
> + [VENDOR_DEBUG_UUID] = UUID_INIT(0xe1819d9, 0x11a9, 0x400c, 0x81, 0x1f,
> + 0xd6, 0x07, 0x19, 0x40, 0x3d, 0x86),
> +};
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI
>   * @opcode: The actual bits used for the mailbox protocol
> + * @flags: Set of flags affecting driver behavior.
> + *
> + *  * %CXL_CMD_FLAG_FORCE_ENABLE: In cases of error, commands with this flag
> + *will be enabled by t

Re: [PATCH v4 9/9] cxl/mem: Add payload dumping for debug

2021-02-16 Thread Jonathan Cameron
On Mon, 15 Feb 2021 17:45:38 -0800
Ben Widawsky  wrote:

> It's often useful in debug scenarios to see what the hardware has dumped
> out. As it stands today, any device error will result in the payload not
> being copied out, so there is no way to triage commands which weren't
> expected to fail (and sometimes the payload may have that information).
> 
> The functionality is protected by normal kernel security mechanisms as
> well as a CONFIG option in the CXL driver.
> 
> This was extracted from the original version of the CXL enabling patch
> series.
> 
> Signed-off-by: Ben Widawsky 

My gut feeling here is to use a tracepoint rather than spamming the kernel
log.  Alternatively just don't bother merging this patch - it's on the list
now anyway, so it's trivial for anyone doing such debug to pick it up.

Jonathan



> ---
>  drivers/cxl/Kconfig | 13 +
>  drivers/cxl/mem.c   |  8 
>  2 files changed, 21 insertions(+)
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 97dc4d751651..3eec9276e586 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -50,4 +50,17 @@ config CXL_MEM_RAW_COMMANDS
> potential impact to memory currently in use by the kernel.
>  
> If developing CXL hardware or the driver say Y, otherwise say N.
> +
> +config CXL_MEM_INSECURE_DEBUG
> + bool "CXL.mem debugging"
> + depends on CXL_MEM
> + help
> +   Enable debug of all CXL command payloads.
> +
> +   Some CXL devices and controllers support encryption and other
> +   security features. The payloads for the commands that enable
> +   those features may contain sensitive clear-text security
> +   material. Disable debug of those command payloads by default.
> +   If you are a kernel developer actively working on CXL
> +   security enabling say Y, otherwise say N.
>  endif
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index dc608bb20a31..237b956f0be0 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -342,6 +342,14 @@ static int __cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm,
>  
>   /* #5 */
>   rc = cxl_mem_wait_for_doorbell(cxlm);
> +
> + if (!cxl_is_security_command(mbox_cmd->opcode) ||
> + IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {
> + print_hex_dump_debug("Payload ", DUMP_PREFIX_OFFSET, 16, 1,
> +  mbox_cmd->payload_in, mbox_cmd->size_in,
> +  true);
> + }
> +
>   if (rc == -ETIMEDOUT) {
>   cxl_mem_mbox_timeout(cxlm, mbox_cmd);
>   return rc;
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH v4 2/9] cxl/mem: Find device capabilities

2021-02-16 Thread Jonathan Cameron
On Tue, 16 Feb 2021 08:43:03 -0800
Ben Widawsky  wrote:

> On 21-02-16 14:51:48, Jonathan Cameron wrote:
> > On Mon, 15 Feb 2021 17:45:31 -0800
> > Ben Widawsky  wrote:
> >   
> > > Provide enough functionality to utilize the mailbox of a memory device.
> > > The mailbox is used to interact with the firmware running on the memory
> > > device. The flow is proven with one implemented command, "identify",
> > > because the class code has already told the driver this is a memory
> > > device and the identify command is mandatory.
> > > 
> > > CXL devices contain an array of capabilities that describe the
> > > interactions software can have with the device or firmware running on
> > > the device. A CXL compliant device must implement the device status and
> > > the mailbox capability. Additionally, a CXL compliant memory device must
> > > implement the memory device capability. Each of the capabilities can
> > > [will] provide an offset within the MMIO region for interacting with the
> > > CXL device.
> > > 
> > > The capabilities tell the driver how to find and map the register space
> > > for CXL Memory Devices. The registers are required to utilize the CXL
> > > spec defined mailbox interface. The spec outlines two mailboxes, primary
> > > and secondary. The secondary mailbox is earmarked for system firmware,
> > > and not handled in this driver.
> > > 
> > > Primary mailboxes are capable of generating an interrupt when submitting
> > > a background command. That implementation is saved for a later time.
> > > 
> > > Link: https://www.computeexpresslink.org/download-the-specification
> > > Signed-off-by: Ben Widawsky 
> > > Reviewed-by: Dan Williams  (v2)  
> > 
> > Looks like an off by one error in the register locator iteration.
> > 
> > The potential buffer overrun from memcpy_fromio is still there as well
> > as far as I can see.
> > 
> > If the software provides storage for a payload of size n and the hardware
> > reports a size of n + d, code will happily write beyond the end of the
> > storage provided.
> > 
> > Obviously, this shouldn't happen, but I'm not that trusting of both
> > hardware and software never having bugs.
> > 
> > Jonathan
> >   
> > > ---
> > >  drivers/cxl/cxl.h |  88 
> > >  drivers/cxl/mem.c | 543 +-
> > >  drivers/cxl/pci.h |  14 ++
> > >  3 files changed, 643 insertions(+), 2 deletions(-)
> > >  create mode 100644 drivers/cxl/cxl.h
> > >   
> > ...
> >   
> > > +
> > > +#endif /* __CXL_H__ */
> > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > index ce33c5ee77c9..b86cda2d299a 100644
> > > --- a/drivers/cxl/mem.c
> > > +++ b/drivers/cxl/mem.c
> > > @@ -3,7 +3,458 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include "pci.h"
> > > +#include "cxl.h"
> > > +
> > > +#define cxl_doorbell_busy(cxlm)                                       \
> > > + (readl((cxlm)->mbox_regs + CXLDEV_MBOX_CTRL_OFFSET) &  \
> > > +  CXLDEV_MBOX_CTRL_DOORBELL)
> > > +
> > > +/* CXL 2.0 - 8.2.8.4 */
> > > +#define CXL_MAILBOX_TIMEOUT_MS (2 * HZ)
> > > +
> > > +enum opcode {
> > > + CXL_MBOX_OP_IDENTIFY= 0x4000,
> > > + CXL_MBOX_OP_MAX = 0x1
> > > +};
> > > +
> > > +/**
> > > + * struct mbox_cmd - A command to be submitted to hardware.
> > > + * @opcode: (input) The command set and command submitted to hardware.
> > > + * @payload_in: (input) Pointer to the input payload.
> > > + * @payload_out: (output) Pointer to the output payload. Must be allocated by
> > > + *the caller.
> > > + * @size_in: (input) Number of bytes to load from @payload.
> > > + * @size_out: (output) Number of bytes loaded into @payload.
> > > + * @return_code: (output) Error code returned from hardware.
> > > + *
> > > + * This is the primary mechanism used to send commands to the hardware.
> > > + * All the fields except @payload_* correspond exactly to the fields described in
> > > + * Command Register section of the CXL 2.0 8.2.8.4.5. @payload_in and
> > > + * @payload_out are written to, and read from the Command Pay

Re: [PATCH v4 4/9] cxl/mem: Add basic IOCTL interface

2021-02-16 Thread Jonathan Cameron
On Tue, 16 Feb 2021 09:53:14 -0800
Ben Widawsky  wrote:

> On 21-02-16 15:22:23, Jonathan Cameron wrote:
> > On Mon, 15 Feb 2021 17:45:33 -0800
> > Ben Widawsky  wrote:
> >   
> > > Add a straightforward IOCTL that provides a mechanism for userspace to
> > > query the supported memory device commands. CXL commands as they appear
> > > to userspace are described as part of the UAPI kerneldoc. The command
> > > list returned via this IOCTL will contain the full set of commands that
> > > the driver supports, however, some of those commands may not be
> > > available for use by userspace.
> > > 
> > > Memory device commands first appear in the CXL 2.0 specification. They
> > > are submitted through a mailbox mechanism specified in the CXL 2.0
> > > specification.
> > > 
> > > The send command allows userspace to issue mailbox commands directly to
> > > the hardware. The list of available commands to send is the output of
> > > the query command. The driver verifies basic properties of the command
> > > and possibly inspects the input (or output) payload to determine whether
> > > or not the command is allowed (or might taint the kernel).
> > > 
> > > Reported-by: kernel test robot  # bug in earlier revision
> > > Signed-off-by: Ben Widawsky 
> > > Reviewed-by: Dan Williams  (v2)  
> > 
> > I may be misreading this but I think the logic to ensure commands
> > using a variable sized buffer have enough space is broken.
> > 
> > Jonathan
> >   
> > > ---
> > >  .clang-format |   1 +
> > >  .../userspace-api/ioctl/ioctl-number.rst  |   1 +
> > >  drivers/cxl/mem.c | 288 +-
> > >  include/uapi/linux/cxl_mem.h  | 154 ++
> > >  4 files changed, 443 insertions(+), 1 deletion(-)
> > >  create mode 100644 include/uapi/linux/cxl_mem.h
> > > 
> > > diff --git a/.clang-format b/.clang-format
> > > index 10dc5a9a61b3..3f11c8901b43 100644
> > > --- a/.clang-format
> > > +++ b/.clang-format
> > > @@ -109,6 +109,7 @@ ForEachMacros:
> > >- 'css_for_each_child'
> > >- 'css_for_each_descendant_post'
> > >- 'css_for_each_descendant_pre'
> > > +  - 'cxl_for_each_cmd'
> > >- 'device_for_each_child_node'
> > >- 'dma_fence_chain_for_each'
> > >- 'do_for_each_ftrace_op'
> > > diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> > > index a4c75a28c839..6eb8e634664d 100644
> > > --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> > > +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> > > @@ -352,6 +352,7 @@ Code  Seq#    Include File                       Comments
> > >  0xCC  00-0F  drivers/misc/ibmvmc.h                          pseries VMC driver
> > >  0xCD  01     linux/reiserfs_fs.h
> > > +0xCE  01-02  uapi/linux/cxl_mem.h                           Compute Express Link Memory Devices
> > >  0xCF  02     fs/cifs/ioctl.c
> > >  0xDB  00-0F  drivers/char/mwave/mwavepub.h
> > >  0xDD  00-3F                                                 ZFCP device driver see drivers/s390/scsi/
> > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > index 410adb1bdffc..a4298cb1182d 100644
> > > --- a/drivers/cxl/mem.c
> > > +++ b/drivers/cxl/mem.c
> > > @@ -1,5 +1,6 @@
> > >  // SPDX-License-Identifier: GPL-2.0-only
> > >  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -40,6 +41,7 @@
> > >  #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ)
> > >  
> > >  enum opcode {
> > > + CXL_MBOX_OP_INVALID = 0x,
> > >   CXL_MBOX_OP_IDENTIFY= 0x4000,
> > >   CXL_MBOX_OP_MAX = 0x1
> > >  };
> > > @@ -91,6 +93,49 @@ struct cxl_memdev {
> > >  static int cxl_mem_major;
> > >  static DEFINE_IDA(cxl_memdev_ida);
> > >  
> > > +/**
> > > + * struct cxl_mem_command - Driver representation of a m

Re: [PATCH v4 4/9] cxl/mem: Add basic IOCTL interface

2021-02-17 Thread Jonathan Cameron
On Tue, 16 Feb 2021 10:34:32 -0800
Ben Widawsky  wrote:

...

> > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > index 237b956f0be0..4ca4f5afd9d2 100644
> > > --- a/drivers/cxl/mem.c
> > > +++ b/drivers/cxl/mem.c
> > > @@ -686,7 +686,11 @@ static int cxl_validate_cmd_from_user(struct cxl_mem *cxlm,
> > > 
> > > memcpy(out_cmd, c, sizeof(*c));
> > > out_cmd->info.size_in = send_cmd->in.size;
> > > -   out_cmd->info.size_out = send_cmd->out.size;
> > > +   /*
> > > +* XXX: out_cmd->info.size_out will be controlled by the driver, and the
> > > +* specified number of bytes @send_cmd->out.size will be copied back out
> > > +* to userspace.
> > > +*/
> > > 
> > > return 0;
> > >  }  
> > 
> > This deals with the buffer overflow being triggered from userspace.
> > 
> > I'm still nervous.  I really don't like assuming hardware will do the right
> > thing and never send us more data than we expect.
> > 
> > Given the check that it will fit in the target buffer is simple,
> > I'd prefer to harden it and know we can't have a problem.
> > 
> > Jonathan  
> 
> I'm working on hardening __cxl_mem_mbox_send_cmd now per your request. With
> that, I think this solves the issue, right?

Should do.  Thanks,

Jonathan


Re: [PATCH v5 2/9] cxl/mem: Find device capabilities

2021-02-17 Thread Jonathan Cameron
On Tue, 16 Feb 2021 20:09:51 -0800
Ben Widawsky  wrote:

> Provide enough functionality to utilize the mailbox of a memory device.
> The mailbox is used to interact with the firmware running on the memory
> device. The flow is proven with one implemented command, "identify",
> because the class code has already told the driver this is a memory
> device and the identify command is mandatory.
> 
> CXL devices contain an array of capabilities that describe the
> interactions software can have with the device or firmware running on
> the device. A CXL compliant device must implement the device status and
> the mailbox capability. Additionally, a CXL compliant memory device must
> implement the memory device capability. Each of the capabilities can
> [will] provide an offset within the MMIO region for interacting with the
> CXL device.
> 
> The capabilities tell the driver how to find and map the register space
> for CXL Memory Devices. The registers are required to utilize the CXL
> spec defined mailbox interface. The spec outlines two mailboxes, primary
> and secondary. The secondary mailbox is earmarked for system firmware,
> and not handled in this driver.
> 
> Primary mailboxes are capable of generating an interrupt when submitting
> a background command. That implementation is saved for a later time.
> 
> Reported-by: Colin Ian King  (coverity)
> Reported-by: Dan Carpenter  (smatch)
> Link: https://www.computeexpresslink.org/download-the-specification
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Dan Williams  (v2)

Looks good to me. 

Reviewed-by: Jonathan Cameron 


Re: [PATCH v5 4/9] cxl/mem: Add basic IOCTL interface

2021-02-17 Thread Jonathan Cameron
On Tue, 16 Feb 2021 20:09:53 -0800
Ben Widawsky  wrote:

> Add a straightforward IOCTL that provides a mechanism for userspace to
> query the supported memory device commands. CXL commands as they appear
> to userspace are described as part of the UAPI kerneldoc. The command
> list returned via this IOCTL will contain the full set of commands that
> the driver supports, however, some of those commands may not be
> available for use by userspace.
> 
> Memory device commands first appear in the CXL 2.0 specification. They
> are submitted through a mailbox mechanism specified in the CXL 2.0
> specification.
> 
> The send command allows userspace to issue mailbox commands directly to
> the hardware. The list of available commands to send is the output of
> the query command. The driver verifies basic properties of the command
> and possibly inspects the input (or output) payload to determine whether
> or not the command is allowed (or might taint the kernel).
> 
> Reported-by: kernel test robot  # bug in earlier revision
> Reported-by: Stephen Rothwell 
> Cc: Al Viro 
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Dan Williams  (v2)

Reviewed-by: Jonathan Cameron 
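As a usage note, the query/send split described above implies the usual two-call pattern from userspace: ask for the count, allocate, then fetch the list. Sketched here with query_stub() standing in for ioctl(fd, CXL_MEM_QUERY_COMMANDS, ...) so the example is self-contained; the structure shape and the "n_commands == 0 reports the count" convention are assumptions mirroring the UAPI, not a verbatim copy of it:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors the shape of the UAPI query structure (illustrative). */
struct query_commands {
	uint32_t n_commands;   /* in: slots provided; out: entries filled */
	uint32_t commands[];   /* flexible array of command IDs */
};

/* Stub for the driver: pretend three commands are supported. */
static int query_stub(struct query_commands *q)
{
	const uint32_t supported[] = { 1, 2, 3 };
	uint32_t i, avail = 3;

	if (q->n_commands == 0) {   /* size probe: only report the count */
		q->n_commands = avail;
		return 0;
	}
	for (i = 0; i < q->n_commands && i < avail; i++)
		q->commands[i] = supported[i];
	q->n_commands = i;
	return 0;
}

/* Two-call pattern: size query, allocate, then the real query. */
static struct query_commands *query_all(void)
{
	struct query_commands probe = { 0 };
	struct query_commands *q;

	query_stub(&probe);
	q = malloc(sizeof(*q) + probe.n_commands * sizeof(uint32_t));
	q->n_commands = probe.n_commands;
	query_stub(q);
	return q;
}
```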


Re: [PATCH v2] cxl/mem: Fix potential memory leak

2021-02-22 Thread Jonathan Cameron
On Sat, 20 Feb 2021 19:58:46 -0800
Ben Widawsky  wrote:

> When submitting a command for userspace, input and output payload bounce
> buffers are allocated. For a given command, both input and output
> buffers may exist and so when allocation of the input buffer fails, the
> output buffer must be freed too.
> 
> As far as I can tell, userspace can't easily exploit the leak to OOM a
> machine unless the machine was already near OOM state.
> 
> Fixes: 583fa5e71cae ("cxl/mem: Add basic IOCTL interface")
> Reported-by: Konrad Rzeszutek Wilk 
> Signed-off-by: Ben Widawsky 

I'd probably have used a goto and label but I guess this works fine as well.
FWIW on such a small patch.
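For comparison, the goto-based shape would look like this: a userspace sketch with malloc/free standing in for the driver's kvmalloc/kvfree, and the function name purely illustrative:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Allocate both payload bounce buffers, unwinding the first allocation
 * through a label if the second fails, instead of freeing inline at
 * the failure site. Keeps all cleanup in one place as the function
 * grows more failure points.
 */
static int alloc_payloads(void **out, void **in, size_t out_sz, size_t in_sz)
{
	*in = NULL;
	*out = malloc(out_sz);
	if (!*out)
		return -ENOMEM;
	if (in_sz) {
		*in = malloc(in_sz);
		if (!*in)
			goto err_free_out;
	}
	return 0;

err_free_out:
	free(*out);
	*out = NULL;
	return -ENOMEM;
}
```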

Reviewed-by: Jonathan Cameron 

> ---
>  drivers/cxl/mem.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index df895bcca63a..244cb7d89678 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -514,8 +514,10 @@ static int handle_mailbox_cmd_from_user(struct cxl_mem *cxlm,
>   if (cmd->info.size_in) {
>   mbox_cmd.payload_in = vmemdup_user(u64_to_user_ptr(in_payload),
>  cmd->info.size_in);
> - if (IS_ERR(mbox_cmd.payload_in))
> + if (IS_ERR(mbox_cmd.payload_in)) {
> + kvfree(mbox_cmd.payload_out);
>   return PTR_ERR(mbox_cmd.payload_in);
> + }
>   }
>  
>   rc = cxl_mem_mbox_get(cxlm);