Re: [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE

2020-10-21 Thread Derrick, Jonathan
On Tue, 2020-10-20 at 21:21 -0500, Bjorn Helgaas wrote:
> On Wed, Oct 21, 2020 at 01:20:24AM +0000, Derrick, Jonathan wrote:
> > On Tue, 2020-10-20 at 15:26 -0500, Bjorn Helgaas wrote:
> > > On Tue, Jul 28, 2020 at 01:49:44PM -0600, Jon Derrick wrote:
> > > > VMD retransmits child device MSI/X with the VMD endpoint's requester-id.
> > > > In order to support direct interrupt remapping of VMD child devices,
> > > > ensure that the IRTE is programmed with the VMD endpoint's requester-id
> > > > using pci_real_dma_dev().
> > > > 
> > > > Reviewed-by: Andy Shevchenko 
> > > > Signed-off-by: Jon Derrick 
> > > 
> > > As Thomas (and Stephen) pointed out, this conflicts with 7ca435cf857d
> > > ("x86/irq: Cleanup the arch_*_msi_irqs() leftovers"), which removes
> > > native_setup_msi_irqs().
> > > 
> > > Stephen resolved the conflict by dropping this hunk.  I would rather
> > > just drop this patch completely from the PCI tree.  If I keep the
> > > patch, (1) Linus will have to resolve the conflict, and worse, (2)
> > > it's not clear what happened to the use of pci_real_dma_dev() here.
> > > It will just vanish into the ether with no explanation other than
> > > "this function was removed."
> > > 
> > > Is dropping this patch the correct thing to do?  Or do you need to add
> > > pci_real_dma_dev() elsewhere to compensate?
> > 
> > It would still need the pci_real_dma_dev() for IRTE programming.
> > 
> > I think at this point I would rather see 5+6 dropped and this included
> > for TGL enablement:
> > https://patchwork.kernel.org/project/linux-pci/patch/20200914190128.5114-1-jonathan.derr...@intel.com/
> 
> It's too late to add new things for v5.10.  I'll drop 5 and I'll be
> happy to drop 6, too, if you want.  I have several comments/questions
> on 6 anyway that I haven't finished writing up.
> 
> But if you'd rather have 1-4 + 6 in v5.10 instead of just 1-4, let me
> know.
> 
> Bjorn

Here's the proposed new location for patch 5 for pci_real_dma_dev(),
but I can't test this at the moment:

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 6313f0a05db7..707968b234e9 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -194,6 +194,7 @@ int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
 		arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
 	}
+	arg->devid = pci_real_dma_dev(pdev);
 
 	return 0;
 }
-- 
2.18.1


Otherwise I would want to drop 5 & 6, because 6 will likely break VMD
without patch 5 when the IO APIC is in use.


Re: [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE

2020-10-20 Thread Derrick, Jonathan
On Tue, 2020-10-20 at 15:26 -0500, Bjorn Helgaas wrote:
> On Tue, Jul 28, 2020 at 01:49:44PM -0600, Jon Derrick wrote:
> > VMD retransmits child device MSI/X with the VMD endpoint's requester-id.
> > In order to support direct interrupt remapping of VMD child devices,
> > ensure that the IRTE is programmed with the VMD endpoint's requester-id
> > using pci_real_dma_dev().
> > 
> > Reviewed-by: Andy Shevchenko 
> > Signed-off-by: Jon Derrick 
> 
> As Thomas (and Stephen) pointed out, this conflicts with 7ca435cf857d
> ("x86/irq: Cleanup the arch_*_msi_irqs() leftovers"), which removes
> native_setup_msi_irqs().
> 
> Stephen resolved the conflict by dropping this hunk.  I would rather
> just drop this patch completely from the PCI tree.  If I keep the
> patch, (1) Linus will have to resolve the conflict, and worse, (2)
> it's not clear what happened to the use of pci_real_dma_dev() here.
> It will just vanish into the ether with no explanation other than
> "this function was removed."
> 
> Is dropping this patch the correct thing to do?  Or do you need to add
> pci_real_dma_dev() elsewhere to compensate?
It would still need the pci_real_dma_dev() for IRTE programming.

I think at this point I would rather see 5+6 dropped and this included
for TGL enablement:
https://patchwork.kernel.org/project/linux-pci/patch/20200914190128.5114-1-jonathan.derr...@intel.com/

> 
> > ---
> >  arch/x86/kernel/apic/msi.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> > index c2b2911feeef..7ca271b8d891 100644
> > --- a/arch/x86/kernel/apic/msi.c
> > +++ b/arch/x86/kernel/apic/msi.c
> > @@ -189,7 +189,7 @@ int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> >  
> >  	init_irq_alloc_info(&info, NULL);
> >  	info.type = X86_IRQ_ALLOC_TYPE_MSI;
> > -	info.msi_dev = dev;
> > +	info.msi_dev = pci_real_dma_dev(dev);
> >  
> >  	domain = irq_remapping_get_irq_domain(&info);
> >  	if (domain == NULL)
> > -- 
> > 2.27.0
> > 


Re: [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI

2020-09-30 Thread Derrick, Jonathan
+Megha

On Wed, 2020-09-30 at 09:57 -0300, Jason Gunthorpe wrote:
> On Wed, Sep 30, 2020 at 12:45:30PM +0000, Derrick, Jonathan wrote:
> > Hi Jason
> > 
> > On Mon, 2020-08-31 at 11:39 -0300, Jason Gunthorpe wrote:
> > > On Wed, Aug 26, 2020 at 01:16:52PM +0200, Thomas Gleixner wrote:
> > > > From: Thomas Gleixner 
> > > > 
> > > > Devices on the VMD bus use their own MSI irq domain, but it is not
> > > > distinguishable from regular PCI/MSI irq domains. This is required
> > > > to exclude VMD devices from getting the irq domain pointer set by
> > > > interrupt remapping.
> > > > 
> > > > Override the default bus token.
> > > > 
> > > > Signed-off-by: Thomas Gleixner 
> > > > Acked-by: Bjorn Helgaas 
> > > > ---
> > > >  drivers/pci/controller/vmd.c |    6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > > 
> > > > +++ b/drivers/pci/controller/vmd.c
> > > > @@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
> > > > return -ENODEV;
> > > > }
> > > >  
> > > > +	/*
> > > > +	 * Override the irq domain bus token so the domain can be distinguished
> > > > +	 * from a regular PCI/MSI domain.
> > > > +	 */
> > > > +	irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> > > > +
> > > 
> > > Having the non-transparent-bridge hold a MSI table and
> > > multiplex/de-multiplex IRQs looks like another good use case for
> > > something close to pci_subdevice_msi_create_irq_domain()?
> > > 
> > > If each device could have its own internal MSI-X table programmed
properly, things would work a lot better. Disable capture/remap of the
> > > MSI range in the NTB.
> > We can disable the capture and remap in newer devices so we don't even
> > need the irq domain.
> 
> You'd still need an irq domain, it just comes from
> pci_subdevice_msi_create_irq_domain() instead of internal to this
> driver.
I have this set which disables remapping and bypasses the creation of
the IRQ domain:
https://patchwork.ozlabs.org/project/linux-pci/list/?series=192936

It allows end devices like NVMe to directly manage their own interrupts
and eliminates the VMD interrupt completely. The only quirk is that the
kernel has to program the IRTE with the VMD device ID, as VMD still
remaps the Requester-ID from subdevices.

> 
> > I would however like to determine if the MSI data bits could be used
> > for individual devices to have the IRQ domain subsystem demultiplex the
> > virq from that and eliminate the IRQ list iteration.
> 
> Yes, exactly. This new pci_subdevice_msi_create_irq_domain() creates
> *proper* fully functional interrupts, no need for any IRQ handler in
> this driver.
> 
> > A concern I have about that scheme is virtualization as I think those
> > bits are used to route to the virtual vector.
> 
> It should be fine with this patch series, consult with Megha how
> virtualization is working with IDXD/etc
> 
> Jason


Re: [patch V2 24/46] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI

2020-09-30 Thread Derrick, Jonathan
Hi Jason

On Mon, 2020-08-31 at 11:39 -0300, Jason Gunthorpe wrote:
> On Wed, Aug 26, 2020 at 01:16:52PM +0200, Thomas Gleixner wrote:
> > From: Thomas Gleixner 
> > 
> > Devices on the VMD bus use their own MSI irq domain, but it is not
> > distinguishable from regular PCI/MSI irq domains. This is required
> > to exclude VMD devices from getting the irq domain pointer set by
> > interrupt remapping.
> > 
> > Override the default bus token.
> > 
> > Signed-off-by: Thomas Gleixner 
> > Acked-by: Bjorn Helgaas 
> > ---
> >  drivers/pci/controller/vmd.c |    6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > +++ b/drivers/pci/controller/vmd.c
> > @@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
> > return -ENODEV;
> > }
> >  
> > +   /*
> > +* Override the irq domain bus token so the domain can be distinguished
> > +* from a regular PCI/MSI domain.
> > +*/
> > +   irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> > +
> 
> Having the non-transparent-bridge hold a MSI table and
> multiplex/de-multiplex IRQs looks like another good use case for
> something close to pci_subdevice_msi_create_irq_domain()?
> 
> If each device could have its own internal MSI-X table programmed
properly, things would work a lot better. Disable capture/remap of the
> MSI range in the NTB.
We can disable the capture and remap in newer devices so we don't even
need the irq domain. Legacy VMD will automatically remap based on the
APIC dest bits in the MSI address.

I would however like to determine if the MSI data bits could be used
for individual devices to have the IRQ domain subsystem demultiplex the
virq from that and eliminate the IRQ list iteration.

A concern I have about that scheme is virtualization as I think those
bits are used to route to the virtual vector.

> 
> Then each device could have a proper non-multiplexed interrupt
> delivered to the CPU.. Affinity would work properly, no need to share
> IRQs (eg vmd_irq() goes away)/etc.
> 
> Something for the VMD maintainers to think about at least.
> 
> As I hear more about NTB these days a full MSI scheme for NTB seems
> interesting to have in the PCI-E core code..
> 
> Jason




Re: [PATCH] PCI/ASPM: Enable ASPM for links under VMD domain

2020-09-10 Thread Derrick, Jonathan
On Thu, 2020-09-10 at 14:17 -0500, Bjorn Helgaas wrote:
> On Thu, Sep 10, 2020 at 06:52:48PM +0000, Derrick, Jonathan wrote:
> > On Thu, 2020-09-10 at 12:38 -0500, Bjorn Helgaas wrote:
> > > On Thu, Sep 10, 2020 at 04:33:39PM +, Derrick, Jonathan wrote:
> > > > On Wed, 2020-09-09 at 20:55 -0500, Bjorn Helgaas wrote:
> > > > > On Fri, Aug 21, 2020 at 08:32:20PM +0800, Kai-Heng Feng wrote:
> > > > > > New Intel laptops with VMD cannot reach deeper power saving
> > > > > > states, which results in very short battery life.
> > > > > > 
> > > > > > As BIOS may not be able to program the config space for devices
> > > > > > under the VMD domain, ASPM needs to be programmed manually by
> > > > > > software. This is also the case under Windows.
> > > > > > 
> > > > > > The VMD controller itself is a root complex integrated endpoint
> > > > > > that doesn't have ASPM capability, so we can't propagate the ASPM
> > > > > > settings to devices under it. Hence, simply apply ASPM_STATE_ALL
> > > > > > to the links under the VMD domain; unsupported states will be
> > > > > > cleared out anyway.
> > > > > > 
> > > > > > Signed-off-by: Kai-Heng Feng 
> > > > > > ---
> > > > > >  drivers/pci/pcie/aspm.c |  3 ++-
> > > > > >  drivers/pci/quirks.c    | 11 +++++++++++
> > > > > >  include/linux/pci.h     |  2 ++
> > > > > >  3 files changed, 15 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > > index 253c30cc1967..dcc002dbca19 100644
> > > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > > @@ -624,7 +624,8 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
> > > > > >  	aspm_calc_l1ss_info(link, &upreg, &dwreg);
> > > > > >  
> > > > > >  	/* Save default state */
> > > > > > -	link->aspm_default = link->aspm_enabled;
> > > > > > +	link->aspm_default = parent->dev_flags & PCI_DEV_FLAGS_ENABLE_ASPM ?
> > > > > > +			     ASPM_STATE_ALL : link->aspm_enabled;
> > > > > 
> > > > > This function is ridiculously complicated already, and I really don't
> > > > > want to make it worse.
> > > > > 
> > > > > What exactly is the PCIe topology here?  Apparently the VMD controller
> > > > > is a Root Complex Integrated Endpoint, so it's a Type 0 (non-bridge)
> > > > > device.  And it has no Link, hence no Link Capabilities or Control and
> > > > > hence no ASPM-related bits.  Right?
> > > > 
> > > > That's correct. VMD is the Type 0 device providing config/mmio
> > > > apertures to another segment and MSI/X remapping. No link and no ASPM
> > > > related bits.
> > > > 
> > > > Hierarchy is usually something like:
> > > > 
> > > > Segment 0           | VMD segment
> > > > Root Complex -> VMD | Type 0 (RP/Bridge; physical slot) - Type 1
> > > >                     | Type 0 (RP/Bridge; physical slot) - Type 1
> > > > 
> > > > > And the devices under the VMD controller?  I guess they are regular
> > > > > PCIe Endpoints, Switch Ports, etc?  Obviously there's a Link involved
> > > > > somewhere.  Does the VMD controller have some magic, non-architected
> > > > > Port on the downstream side?
> > > > 
> > > > Correct: Type 0 and Type 1 devices, and any number of Switch ports as
> > > > it's usually pinned out to physical slot.
> > > > 
> > > > > Does this patch enable ASPM on this magic Link between VMD and the
> > > > > next device?  Configuring ASPM correctly requires knowledge and knobs
> > > > > from both ends of the Link, and apparently we don't have those for the
> > > > > VMD end.
> > > > 
> > > > VMD itself doesn't have the link to its domain. It's really just the
> > > > config/mmio aperture and MSI/X remapper. The PCIe link is between the
> > > > Type 0 and Type 1 devices on the VMD domain. So fortunately the VMD
> > > > itself is not the upstream part of the link.

Re: [PATCH] PCI/ASPM: Enable ASPM for links under VMD domain

2020-09-10 Thread Derrick, Jonathan
Hi Bjorn

On Wed, 2020-09-09 at 20:55 -0500, Bjorn Helgaas wrote:
> [+cc Saheed]
> 
> On Fri, Aug 21, 2020 at 08:32:20PM +0800, Kai-Heng Feng wrote:
> > New Intel laptops with VMD cannot reach deeper power saving states,
> > which results in very short battery life.
> > 
> > As BIOS may not be able to program the config space for devices under
> > the VMD domain, ASPM needs to be programmed manually by software. This
> > is also the case under Windows.
> > 
> > The VMD controller itself is a root complex integrated endpoint that
> > doesn't have ASPM capability, so we can't propagate the ASPM settings to
> > devices under it. Hence, simply apply ASPM_STATE_ALL to the links under
> > the VMD domain; unsupported states will be cleared out anyway.
> > 
> > Signed-off-by: Kai-Heng Feng 
> > ---
> >  drivers/pci/pcie/aspm.c |  3 ++-
> >  drivers/pci/quirks.c    | 11 +++++++++++
> >  include/linux/pci.h     |  2 ++
> >  3 files changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 253c30cc1967..dcc002dbca19 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -624,7 +624,8 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
> >  	aspm_calc_l1ss_info(link, &upreg, &dwreg);
> >  
> >  	/* Save default state */
> > -	link->aspm_default = link->aspm_enabled;
> > +	link->aspm_default = parent->dev_flags & PCI_DEV_FLAGS_ENABLE_ASPM ?
> > +			     ASPM_STATE_ALL : link->aspm_enabled;
> 
> This function is ridiculously complicated already, and I really don't
> want to make it worse.
> 
> What exactly is the PCIe topology here?  Apparently the VMD controller
> is a Root Complex Integrated Endpoint, so it's a Type 0 (non-bridge)
> device.  And it has no Link, hence no Link Capabilities or Control and
> hence no ASPM-related bits.  Right?
That's correct. VMD is the Type 0 device providing config/mmio
apertures to another segment and MSI/X remapping. No link and no ASPM
related bits.

Hierarchy is usually something like:

Segment 0           | VMD segment
Root Complex -> VMD | Type 0 (RP/Bridge; physical slot) - Type 1
                    | Type 0 (RP/Bridge; physical slot) - Type 1

> 
> And the devices under the VMD controller?  I guess they are regular
> PCIe Endpoints, Switch Ports, etc?  Obviously there's a Link involved
> somewhere.  Does the VMD controller have some magic, non-architected
> Port on the downstream side?
Correct: Type 0 and Type 1 devices, and any number of Switch ports as
it's usually pinned out to physical slot.


> 
> Does this patch enable ASPM on this magic Link between VMD and the
> next device?  Configuring ASPM correctly requires knowledge and knobs
> from both ends of the Link, and apparently we don't have those for the
> VMD end.
VMD itself doesn't have the link to its domain. It's really just the
config/mmio aperture and MSI/X remapper. The PCIe link is between the
Type 0 and Type 1 devices on the VMD domain. So fortunately the VMD
itself is not the upstream part of the link.

> 
> Or is it for Links deeper in the hierarchy?  I assume those should
> just work already, although there might be issues with latency
> computation, etc., because we may not be able to account for the part
> of the path above VMD.
That's correct. This is for the links within the domain itself, such as
between a type 0 and NVMe device.

> 
> I want aspm.c to eventually get out of the business of managing struct
> pcie_link_state.  I think it should manage *device* state for each end
> of the link.  Maybe that's a path forward, e.g., if we cache the Link
> Capabilities during enumeration, quirks could modify that directly,
> and aspm.c could just consume that cached information.  I think Saheed
> (cc'd) is already working on patches in this direction.
> 
> I'm still not sure how this works if VMD is the upstream end of a
> Link, though.
> 
> > /* Setup initial capable state. Will be updated later */
> > link->aspm_capable = link->aspm_support;
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index bdf9b52567e0..2e2f525bd892 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5632,3 +5632,14 @@ static void apex_pci_fixup_class(struct pci_dev *pdev)
> >  }
> >  DECLARE_PCI_FIXUP_CLASS_HEADER(0x1ac1, 0x089a,
> > 			       PCI_CLASS_NOT_DEFINED, 8, apex_pci_fixup_class);
> > +
> > +/*
> > + * Device [8086:9a09]
> > + * BIOS may not be able to access config space of devices under VMD domain, so
> > + * it relies on software to enable ASPM for links under VMD.
> > + */
> > +static void pci_fixup_enable_aspm(struct pci_dev *pdev)
> > +{
> > +	pdev->dev_flags |= PCI_DEV_FLAGS_ENABLE_ASPM;
> > +}
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a09, pci_fixup_enable_aspm);
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 835530605c0d..66a45916c7c6 100644
> > --- a/include/linux/pci.h
> > 

Re: [PATCH] PCI/ASPM: Enable ASPM for links under VMD domain

2020-08-27 Thread Derrick, Jonathan
On Thu, 2020-08-27 at 17:23 +0100, h...@infradead.org wrote:
> On Thu, Aug 27, 2020 at 04:13:44PM +0000, Derrick, Jonathan wrote:
> > On Thu, 2020-08-27 at 06:34 +, h...@infradead.org wrote:
> > > On Wed, Aug 26, 2020 at 09:43:27PM +, Derrick, Jonathan wrote:
> > > > Feel free to review my set to disable the MSI remapping which will
> > > > make
> > > > it perform as well as direct-attached:
> > > > 
> > > > https://patchwork.kernel.org/project/linux-pci/list/?series=325681
> > > 
> > > So that then we have to deal with your schemes to make individual
> > > device direct assignment work in a convoluted way?
> > 
> > That's not the intent of that patchset -at all-. It was to address the
> > performance bottlenecks with VMD that you constantly complain about. 
> 
> I know.  But once we fix that bottleneck we fix the next issue,
> then to tackle the next.  While at the same time VMD brings zero
> actual benefits.
> 

Just a few benefits and there are other users with unique use cases:
1. Passthrough of the endpoint to OSes which don't natively support
hotplug can enable hotplug for that OS using the guest VMD driver
2. Some hypervisors have a limit on the number of devices that can be
passed through. VMD endpoint is a single device that expands to many.
3. Expansion of possible bus numbers beyond 256 by using other
segments.
4. Custom RAID LED patterns driven by ledctl

I'm not trying to market this. Just pointing out that this isn't
"bringing zero actual benefits" to many users.


> > > Please just give us
> > > a disable knob for VMD, which solves _all_ these problems without
> > > adding
> > > any.
> > 
> > I don't see the purpose of this line of discussion. VMD has been in the
> > kernel for 5 years. We are constantly working on better support.
> 
> Please just work with the platform people to allow the host to disable
> VMD.  That is the only really useful value add here.

Cheers


Re: [PATCH] PCI/ASPM: Enable ASPM for links under VMD domain

2020-08-27 Thread Derrick, Jonathan
On Thu, 2020-08-27 at 06:34 +, h...@infradead.org wrote:
> On Wed, Aug 26, 2020 at 09:43:27PM +0000, Derrick, Jonathan wrote:
> > Feel free to review my set to disable the MSI remapping which will
> > make
> > it perform as well as direct-attached:
> > 
> > https://patchwork.kernel.org/project/linux-pci/list/?series=325681
> 
> So that then we have to deal with your schemes to make individual
> device direct assignment work in a convoluted way?

That's not the intent of that patchset -at all-. It was to address the
performance bottlenecks with VMD that you constantly complain about. 


> Please just give us
> a disable knob for VMD, which solves _all_ these problems without
> adding
> any.

I don't see the purpose of this line of discussion. VMD has been in the
kernel for 5 years. We are constantly working on better support.


Re: [PATCH] PCI/ASPM: Enable ASPM for links under VMD domain

2020-08-26 Thread Derrick, Jonathan
On Tue, 2020-08-25 at 06:23 +, Christoph Hellwig wrote:
> On Fri, Aug 21, 2020 at 08:32:20PM +0800, Kai-Heng Feng wrote:
> > New Intel laptops with VMD cannot reach deeper power saving state,
> > renders very short battery time.
> 
> So what about just disabling VMD given how bloody pointless it is?
> Hasn't anyone learned from the AHCI remapping debacle?
> 
> I'm really pissed at all this pointless crap intel comes up with just
> to make life hard for absolutely no gain.  Is it so hard to just leave
> a NVMe device as a standard NVMe device instead of f*^ everything
> up in the chipset to make OS support a pain and I/O slower than by
> doing nothing?


Feel free to review my set to disable the MSI remapping which will make
it perform as well as direct-attached:

https://patchwork.kernel.org/project/linux-pci/list/?series=325681


Re: [patch V2 23/46] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI

2020-08-26 Thread Derrick, Jonathan
On Wed, 2020-08-26 at 21:42 +0100, Marc Zyngier wrote:
> On Wed, 26 Aug 2020 12:16:51 +0100,
> Thomas Gleixner  wrote:
> > From: Thomas Gleixner 
> > 
> > PCI devices behind a VMD bus are not subject to interrupt remapping, but
> > the irq domain for VMD MSI cannot be distinguished from a regular PCI/MSI
> > irq domain.
> > 
> > Add a new domain bus token and allow it in the bus token check in
> > msi_check_reservation_mode() to keep the functionality the same once VMD
> > uses this token.
> > 
> > Signed-off-by: Thomas Gleixner 
> > 
> > ---
> >  include/linux/irqdomain.h |    1 +
> >  kernel/irq/msi.c          |    7 ++++++-
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > --- a/include/linux/irqdomain.h
> > +++ b/include/linux/irqdomain.h
> > @@ -84,6 +84,7 @@ enum irq_domain_bus_token {
> > DOMAIN_BUS_FSL_MC_MSI,
> > DOMAIN_BUS_TI_SCI_INTA_MSI,
> > DOMAIN_BUS_WAKEUP,
> > +   DOMAIN_BUS_VMD_MSI,
> >  };
> >  
> >  /**
> > --- a/kernel/irq/msi.c
> > +++ b/kernel/irq/msi.c
> > @@ -370,8 +370,13 @@ static bool msi_check_reservation_mode(s
> >  {
> > struct msi_desc *desc;
> >  
> > -   if (domain->bus_token != DOMAIN_BUS_PCI_MSI)
> > +   switch(domain->bus_token) {
> > +   case DOMAIN_BUS_PCI_MSI:
> > +   case DOMAIN_BUS_VMD_MSI:
> > +   break;
> > +   default:
> > return false;
> > +   }
> >  
> > if (!(info->flags & MSI_FLAG_MUST_REACTIVATE))
> > return false;
> 
> Acked-by: Marc Zyngier 
> 
>   M.
> 

Acked-by: Jon Derrick 


Re: [PATCH 0/6] VMD MSI Remapping Bypass

2020-08-17 Thread Derrick, Jonathan
Hi Lorenzo,

On Tue, 2020-07-28 at 13:49 -0600, Jon Derrick wrote:
> The Intel Volume Management Device acts similar to a PCI-to-PCI bridge in
> that it changes downstream devices' requester-ids to its own. As VMD supports
> PCIe devices, it has its own MSI/X table and transmits child device MSI/X by
> remapping child device MSI/X and handling it like a demultiplexer.
> 
> Some newer VMD devices (Icelake Server and client) have an option to bypass
> the VMD MSI/X remapping table. This allows for better performance scaling as
> the child device MSI/X won't be limited by VMD's MSI/X count and IRQ handler.
> 
> It's expected that most users don't want MSI/X remapping when they can get
> better performance without this limitation. This set includes some long
> overdue cleanup of overgrown VMD code and introduces the MSI/X remapping
> disable.
> 
> Applies on top of e3beca48a45b ("irqdomain/treewide: Keep firmware node
> unconditionally allocated") and ec0160891e38 ("irqdomain/treewide: Free
> firmware node after domain removal") in tip/urgent
> 
> 
> Jon Derrick (6):
>   PCI: vmd: Create physical offset helper
>   PCI: vmd: Create bus offset configuration helper
>   PCI: vmd: Create IRQ Domain configuration helper
>   PCI: vmd: Create IRQ allocation helper
>   x86/apic/msi: Use Real PCI DMA device when configuring IRTE
>   PCI: vmd: Disable MSI/X remapping when possible
> 
>  arch/x86/kernel/apic/msi.c   |   2 +-
>  drivers/pci/controller/vmd.c | 344 +++
>  2 files changed, 224 insertions(+), 122 deletions(-)
> 
> 
> base-commit: ec0160891e387f4771f953b888b1fe951398e5d9

Gentle reminder. Please don't forget about this.
We have a few more patches coming soon that I'd prefer to stage upon
this set.


Re: Parallel compilation performance regression

2020-06-19 Thread Derrick, Jonathan
On Thu, 2020-06-18 at 18:05 -0700, Matthew Wilcox wrote:
> On Thu, Jun 18, 2020 at 11:52:55PM +0000, Derrick, Jonathan wrote:
> > Hi David,
> > 
> > I've been experiencing a performance regression when running a parallel
> > compilation (eg, make -j72) on recent kernels.
> 
> I bet you're using a version of make which predates 4.3:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ddad21d3e99c743a3aa473121dc5561679e26bb
> 

I am!

# make --version
GNU Make 4.2.1


Thank you Matthew!


Parallel compilation performance regression

2020-06-18 Thread Derrick, Jonathan
Hi David,

I've been experiencing a performance regression when running a parallel
compilation (e.g., make -j72) on recent kernels.

I bisected it to this commit:

commit b667b867344301e24f21d4a4c844675ff61d89e1
Author: David Howells 
Date:   Tue Sep 24 16:09:04 2019 +0100

pipe: Advance tail pointer inside of wait spinlock in pipe_read()

Advance the pipe ring tail pointer inside of wait spinlock in pipe_read()
so that the pipe can be written into with kernel notifications from
contexts where pipe->mutex cannot be taken.


Prior to this commit I got 70% or so thread saturation of cc1; after it,
saturation rarely gets above 15% and often drops down to 1-2 threads. It
doesn't look like a clean revert either. Looking at upstream, it seems
that some later code changed the wakeup. I'm not really sure how this
all fits into parallelized make.

Best
Jon


Re: [PATCH v3 0/2] PCI/ERR: Allow Native AER/DPC using _OSC

2020-05-22 Thread Derrick, Jonathan
On Fri, 2020-05-01 at 11:35 -0600, Jonathan Derrick wrote:
> On Fri, 2020-05-01 at 12:16 -0500, Bjorn Helgaas wrote:
> > On Thu, Apr 30, 2020 at 12:46:07PM -0600, Jon Derrick wrote:
> > > Hi Bjorn & Kuppuswamy,
> > > 
> > > I see a problem in the DPC ECN [1] to _OSC in that it doesn't give us a
> > > way to determine if firmware supports _OSC DPC negotiation, and
> > > therefore how to handle DPC.
> > > 
> > > Here is the wording of the ECN that implies that Firmware without _OSC DPC
> > > negotiation support should have the OSPM rely on _OSC AER negotiation when
> > > determining DPC control:
> > > 
> > >   PCIe Base Specification suggests that Downstream Port Containment may be
> > >   controlled either by the Firmware or the Operating System. It also
> > >   suggests that the Firmware retain ownership of Downstream Port
> > >   Containment if it also owns AER. When the Firmware owns Downstream Port
> > >   Containment, it is expected to use the new "Error Disconnect Recover"
> > >   notification to alert OSPM of a Downstream Port Containment event.
> > > 
> > > In legacy platforms, as bits in _OSC are reserved prior to implementation,
> > > ACPI Root Bus enumeration will mark these Host Bridges as without Native
> > > DPC support, even though the specification implies it's expected that AER
> > > _OSC negotiation determines DPC control for these platforms. There seems
> > > to be a need for a way to determine if the DPC control bit in _OSC is
> > > supported and fall back on AER otherwise.
> > > 
> > > 
> > > Currently portdrv assumes DPC control if the port has Native AER services:
> > > 
> > > static int get_port_device_capability(struct pci_dev *dev)
> > > ...
> > >   if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
> > >   pci_aer_available() &&
> > >   (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
> > >   services |= PCIE_PORT_SERVICE_DPC;
> > > 
> > > Newer firmware may not grant OSPM DPC control if, for instance, it expects
> > > to use Error Disconnect Recovery. However, it looks like ACPI will use DPC
> > > services via the EDR driver, without binding the full DPC port service
> > > driver.
> > > 
> > > 
> > > If we change portdrv to probe based on host->native_dpc and not AER, then
> > > we break instances with legacy firmware where OSPM will clear
> > > host->native_dpc solely due to _OSC bits being reserved:
> > > 
> > > struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
> > > ...
> > >   if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
> > >   host_bridge->native_dpc = 0;
> > > 
> > > 
> > > 
> > > So my assumption instead is that host->native_dpc can be 0 and expect
> > > Native DPC services if AER is used. In other words, if and only if DPC
> > > probe is invoked from portdrv, then it needs to rely on the AER
> > > dependency. Otherwise it should be assumed that ACPI set up DPC via EDR.
> > > This covers legacy firmware.
> > > 
> > > However it seems like that could be trouble with newer firmware that might
> > > give OSPM control of AER but not DPC, and would result in both Native DPC
> > > and EDR being in effect.
> > > 
> > > 
> > > Anyway, here are two patches that give control of AER and DPC based on
> > > the results of _OSC. They don't mess with the HEST parser, as I expect
> > > those to be removed at some point. I need these for VMD support, which
> > > doesn't even rely on _OSC, but I suspect this won't be the last effort as
> > > we detangle Firmware First.
> > > 
> > > [1] https://members.pcisig.com/wg/PCI-SIG/document/12888
> > 
> > Hi Jon, I think we need to sort out the _OSC/FIRMWARE_FIRST patches
> > from Alex and Sathy first, then see what needs to be done on top of
> > those, so I'm going to push these off for a few days and they'll
> > probably need a refresh.
> > 
> > Bjorn
> 
> Agreed, no need to merge now. Just wanted to bring up the DPC
> ambiguity, which I think was first addressed by dpc-native:
> 
> commit 35a0b2378c199d4f26e458b2ca38ea56aaf2d9b8
> Author: Olof Johansson 
> Date:   Wed Oct 23 12:22:05 2019 -0700
> 
> PCI/DPC: Add "pcie_ports=dpc-native" to allow DPC without AER control
> 
> Prior to eed85ff4c0da7 ("PCI/DPC: Enable DPC only if AER is available"),
> Linux handled DPC events regardless of whether firmware had granted it
> ownership of AER or DPC, e.g., via _OSC.
> 
> PCIe r5.0, sec 6.2.10, recommends that the OS link control of DPC to
> control of AER, so after eed85ff4c0da7, Linux handles DPC events only if 
> it
> has control of AER.
> 
> On platforms that do not grant OS control of AER via _OSC, Linux DPC
> handling worked before eed85ff4c0da7 but not after.
> 
> To make Linux DPC handling work on those platforms the same way they did
> before, add a "pcie_ports=dpc-native" kernel parameter that 

Re: [PATCH v4 0/3] Replace private domain with per-group default domain

2020-05-06 Thread Derrick, Jonathan
On Wed, 2020-05-06 at 10:09 +0800, Daniel Drake wrote:
> On Wed, May 6, 2020 at 10:03 AM Lu Baolu  wrote:
> > https://lkml.org/lkml/2020/4/14/616
> > [This has been applied in iommu/next.]
> > 
> > Hence, there is no need to keep the private domain implementation
> > in the Intel IOMMU driver. This patch series aims to remove it.
> 
> I applied these patches on top of Joerg's branch and confirmed that
> they fix the issue discussed in the thread:
> 
> [PATCH v2] iommu/vt-d: consider real PCI device when checking if
> mapping is needed
> (the patch there is no longer needed)
> 
> Tested-by: Daniel Drake 
> 
> Thanks!

Looks like the key to the real DMA dev fix was removing the
identity_mapping() paths that led to dev->archdata.iommu == NULL and thus
to a DMA domain.

Works great for me as well

Reviewed-by: Jon Derrick 


Re: [PATCH v3 0/2] PCI/ERR: Allow Native AER/DPC using _OSC

2020-05-01 Thread Derrick, Jonathan
On Fri, 2020-05-01 at 12:16 -0500, Bjorn Helgaas wrote:
> On Thu, Apr 30, 2020 at 12:46:07PM -0600, Jon Derrick wrote:
> > Hi Bjorn & Kuppuswamy,
> > 
> > I see a problem in the DPC ECN [1] to _OSC in that it doesn't give us a way
> > to determine if firmware supports _OSC DPC negotiation, and therefore how to
> > handle DPC.
> > 
> > Here is the wording of the ECN that implies that Firmware without _OSC DPC
> > negotiation support should have the OSPM rely on _OSC AER negotiation when
> > determining DPC control:
> > 
> >   PCIe Base Specification suggests that Downstream Port Containment may be
> >   controlled either by the Firmware or the Operating System. It also suggests
> >   that the Firmware retain ownership of Downstream Port Containment if it also
> >   owns AER. When the Firmware owns Downstream Port Containment, it is expected
> >   to use the new "Error Disconnect Recover" notification to alert OSPM of a
> >   Downstream Port Containment event.
> > 
> > On legacy platforms, since bits in _OSC are reserved prior to implementation,
> > ACPI Root Bus enumeration will mark these Host Bridges as lacking Native DPC
> > support, even though the specification implies it's expected that AER _OSC
> > negotiation determines DPC control for these platforms. There seems to be a
> > need for a way to determine whether the DPC control bit in _OSC is supported,
> > and to fall back on AER otherwise.
> > 
> > 
> > Currently portdrv assumes DPC control if the port has Native AER services:
> > 
> > static int get_port_device_capability(struct pci_dev *dev)
> > ...
> > if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
> > pci_aer_available() &&
> > (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
> > services |= PCIE_PORT_SERVICE_DPC;
> > 
> > Newer firmware may not grant OSPM DPC control if, for instance, it expects to
> > use Error Disconnect Recovery. However, it looks like ACPI will use DPC
> > services via the EDR driver, without binding the full DPC port service driver.
> > 
> > 
> > If we change portdrv to probe based on host->native_dpc and not AER, then we
> > break instances with legacy firmware where OSPM will clear host->native_dpc
> > solely due to _OSC bits being reserved:
> > 
> > struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
> > ...
> > if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
> > host_bridge->native_dpc = 0;
> > 
> > 
> > 
> > So my assumption instead is that host->native_dpc can be 0 while Native DPC
> > services are still expected when AER is in use. In other words, if and only
> > if the DPC probe is invoked from portdrv, it needs to rely on the AER
> > dependency; otherwise it should be assumed that ACPI set up DPC via EDR.
> > This covers legacy firmware.
> > 
> > However, it seems like that could cause trouble with newer firmware that
> > might give OSPM control of AER but not DPC, which would result in both
> > Native DPC and EDR being in effect.
> > 
> > 
> > Anyway, here are two patches that gate control of AER and DPC on the results
> > of _OSC. They don't mess with the HEST parser, as I expect those paths to be
> > removed at some point. I need these for VMD support, which doesn't even rely
> > on _OSC, but I suspect this won't be the last effort as we disentangle
> > Firmware First.
> > 
> > [1] https://members.pcisig.com/wg/PCI-SIG/document/12888
> 
> Hi Jon, I think we need to sort out the _OSC/FIRMWARE_FIRST patches
> from Alex and Sathy first, then see what needs to be done on top of
> those, so I'm going to push these off for a few days and they'll
> probably need a refresh.
> 
> Bjorn


Agreed, no need to merge now. Just wanted to bring up the DPC
ambiguity, which I think was first addressed by dpc-native:

commit 35a0b2378c199d4f26e458b2ca38ea56aaf2d9b8
Author: Olof Johansson 
Date:   Wed Oct 23 12:22:05 2019 -0700

PCI/DPC: Add "pcie_ports=dpc-native" to allow DPC without AER control

Prior to eed85ff4c0da7 ("PCI/DPC: Enable DPC only if AER is available"),
Linux handled DPC events regardless of whether firmware had granted it
ownership of AER or DPC, e.g., via _OSC.

PCIe r5.0, sec 6.2.10, recommends that the OS link control of DPC to
control of AER, so after eed85ff4c0da7, Linux handles DPC events only if it
has control of AER.

On platforms that do not grant OS control of AER via _OSC, Linux DPC
handling worked before eed85ff4c0da7 but not after.

To make Linux DPC handling work on those platforms the same way they did
before, add a "pcie_ports=dpc-native" kernel parameter that makes Linux
handle DPC events regardless of whether it has control of AER.

[bhelgaas: commit log, move pcie_ports_dpc_native to drivers/pci/]
Link: https://lore.kernel.org/r/20191023192205.97024-1-o...@lixom.net
Signed-off-by: Olof Johansson 
Signed-off-by: Bjorn Helgaas 


Thanks,
Jon


Re: [PATCH v1 1/1] PCI/AER: Use _OSC negotiation to determine AER ownership

2020-04-28 Thread Derrick, Jonathan
Sorry I didn't see this before my comments yesterday

For either individual or split set,
Reviewed-by: Jon Derrick 

Thanks Kuppuswamy

On Mon, 2020-04-27 at 19:02 -0500, Bjorn Helgaas wrote:
> [+cc Jon, Alexandru]
> 
> On Sun, Apr 26, 2020 at 11:30:06AM -0700, 
> sathyanarayanan.kuppusw...@linux.intel.com wrote:
> > From: Kuppuswamy Sathyanarayanan 
> > 
> > 
> > Currently PCIe AER driver uses HEST FIRMWARE_FIRST bit to
> > determine the PCIe AER Capability ownership between OS and
> > firmware. This support is added based on following spec
> > reference.
> > 
> > Per ACPI spec r6.3, table 18-387, 18-388, 18-389, HEST table
> > flags field BIT-0 and BIT-1 can be used to expose the
> > ownership of error source between firmware and OSPM.
> > 
> > Bit [0] - FIRMWARE_FIRST: If set, indicates that system
> >   firmware will handle errors from this source
> >   first.
> > Bit [1] – GLOBAL: If set, indicates that the settings
> >   contained in this structure apply globally to all
> >   PCI Express Bridges.
> > 
> > Although above spec reference states that setting
> > FIRMWARE_FIRST bit means firmware will handle the error source
> > first, it does not explicitly state anything about AER
> > ownership. So using HEST to determine AER ownership is
> > incorrect.
> > 
> > Also, as per following specification references, _OSC can be
> > used to negotiate the AER ownership between firmware and OS.
> > Details are,
> > 
> > Per ACPI spec r6.3, sec 6.2.11.3, table titled “Interpretation
> > of _OSC Control Field” and as per PCI firmware specification r3.2,
> > sec 4.5.1, table 4-5, OS can set bit 3 of _OSC control field
> > to request control over PCI Express AER. If the OS successfully
> > receives control of this feature, it must handle error reporting
> > through the AER Capability as described in the PCI Express Base
> > Specification.
> > 
> > Since above spec references clearly states _OSC can be used to
> > determine AER ownership, don't depend on HEST FIRMWARE_FIRST bit.
> 
> I split this up a bit and applied the first part to pci/error to get
> it into -next so we can start seeing what breaks.  I won't be too
> surprised if we trip over something.
> 
> Here's the first part (entire original patch is at
> https://lore.kernel.org/r/67af2931705bed9a588b5a39d369cb70b9942190.1587925636.git.sathyanarayanan.kuppusw...@linux.intel.com).
> 
> commit 8f8e42e7c2dd ("PCI/AER: Use only _OSC to determine AER ownership")
> Author: Kuppuswamy Sathyanarayanan 
> 
> Date:   Mon Apr 27 18:25:13 2020 -0500
> 
> PCI/AER: Use only _OSC to determine AER ownership
> 
> Per the PCI Firmware spec, r3.2, sec 4.5.1, the OS can request control of
> AER via bit 3 of the _OSC Control Field.  In the returned value of the
> Control Field:
> 
>   The firmware sets [bit 3] to 1 to grant control over PCI Express Advanced
>   Error Reporting.  ...  after control is transferred to the operating
>   system, firmware must not modify the Advanced Error Reporting Capability.
>   If control of this feature was requested and denied or was not requested,
>   firmware returns this bit set to 0.
> 
> Previously the pci_root driver looked at the HEST FIRMWARE_FIRST bit to
> determine whether to request ownership of the AER Capability.  This was
> based on ACPI spec v6.3, sec 18.3.2.4, and similar sections, which say
> things like:
> 
>   Bit [0] - FIRMWARE_FIRST: If set, indicates that system firmware will
> handle errors from this source first.
> 
>   Bit [1] - GLOBAL: If set, indicates that the settings contained in this
> structure apply globally to all PCI Express Devices.
> 
> These ACPI references don't say anything about ownership of the AER
> Capability.
> 
> Remove use of the FIRMWARE_FIRST bit and rely only on the _OSC bit to
> determine whether we have control of the AER Capability.
> 
> Link: https://lore.kernel.org/r/67af2931705bed9a588b5a39d369cb70b9942190.1587925636.git.sathyanarayanan.kuppusw...@linux.intel.com
> [bhelgaas: commit log, split patches]
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Bjorn Helgaas 
> 
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index ac8ad6cb82aa..9e235c1a75ff 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -483,13 +483,8 @@ static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm,
>   if (IS_ENABLED(CONFIG_HOTPLUG_PCI_SHPC))
>   control |= OSC_PCI_SHPC_NATIVE_HP_CONTROL;
>  
> - if (pci_aer_available()) {
> - if (aer_acpi_firmware_first())
> - dev_info(&device->dev,
> -  "PCIe AER handled by firmware\n");
> - else
> - control |= OSC_PCI_EXPRESS_AER_CONTROL;
> - }
> + if (pci_aer_available())
> + control 

Re: [PATCH 2/2] block: sed-opal: fix sparse warning: convert __be64 data

2019-10-03 Thread Derrick, Jonathan
On Thu, 2019-10-03 at 11:40 -0400, Scott Bauer wrote:
> On Wed, Oct 02, 2019 at 07:23:15PM -0700, Randy Dunlap wrote:
> > From: Randy Dunlap 
> > 
> > sparse warns about incorrect type when using __be64 data.
> > It is not being converted to CPU-endian but it should be.
> > 
> > Fixes these sparse warnings:
> > 
> > ../block/sed-opal.c:375:20: warning: incorrect type in assignment (different base types)
> > ../block/sed-opal.c:375:20:expected unsigned long long [usertype] align
> > ../block/sed-opal.c:375:20:got restricted __be64 const [usertype] alignment_granularity
> > ../block/sed-opal.c:376:25: warning: incorrect type in assignment (different base types)
> > ../block/sed-opal.c:376:25:expected unsigned long long [usertype] lowest_lba
> > ../block/sed-opal.c:376:25:got restricted __be64 const [usertype] lowest_aligned_lba
> > 
> > Fixes: 455a7b238cd6 ("block: Add Sed-opal library")
> > Signed-off-by: Randy Dunlap 
> > Cc: Scott Bauer 
> > Cc: Rafael Antognolli 
> > Cc: Jens Axboe 
> > Cc: linux-bl...@vger.kernel.org
> 
> + Jon and Revanth,
> 
> 
> These look fine. They're currently unused, but may be useful in the future
> for sysfs or whatever else we add in.

I imagine modern devices with logical/physical indirection would
probably report gran=1 and lowest=0 regardless.

Either way,
Reviewed-by: Jon Derrick 


Re: [PATCH 1/2] block: sed-opal: fix sparse warning: obsolete array init.

2019-10-03 Thread Derrick, Jonathan
On Wed, 2019-10-02 at 19:23 -0700, Randy Dunlap wrote:
> From: Randy Dunlap 
> 
> Fix sparse warning: (missing '=')
> ../block/sed-opal.c:133:17: warning: obsolete array initializer, use C99 
> syntax
> 
> Fixes: ff91064ea37c ("block: sed-opal: check size of shadow mbr")
> Signed-off-by: Randy Dunlap 
> Cc: Jens Axboe 
> Cc: linux-bl...@vger.kernel.org
> Cc: Jonas Rabenstein 
> Cc: David Kozub 
> ---
>  block/sed-opal.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- lnx-54-rc1.orig/block/sed-opal.c
> +++ lnx-54-rc1/block/sed-opal.c
> @@ -129,7 +129,7 @@ static const u8 opaluid[][OPAL_UID_LENGT
>   { 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x84, 0x01 },
>  
>   /* tables */
> - [OPAL_TABLE_TABLE]
> + [OPAL_TABLE_TABLE] =
>   { 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01 },
>   [OPAL_LOCKINGRANGE_GLOBAL] =
>   { 0x00, 0x00, 0x08, 0x02, 0x00, 0x00, 0x00, 0x01 },
> 

Reviewed-by: Jon Derrick 


Re: [PATCH 2/2] PCI: vmd: Fix shadow offsets to reflect spec changes

2019-09-17 Thread Derrick, Jonathan
On Tue, 2019-09-17 at 17:37 +0100, Lorenzo Pieralisi wrote:
> On Tue, Sep 17, 2019 at 03:51:39PM +0000, Derrick, Jonathan wrote:
> 
> [...]
> 
> > Sorry for the confusion.
> > 
> > These changes only affect systems with VMD devices with 8086:28C0
> > device IDs, but these won't be production hardware for some time.
> > 
> > Systems with VMD devices exist in the wild with 8086:201D device IDs.
> > These don't support the guest passthrough mode and this code won't
> > break anything with them. Additionally, patch 1/2 (bus numbering) only
> > affects 8086:28C0.
> > 
> > So on existing HW, these patches won't affect anything
> 
> It is me who created confusion, apologies. I read the code properly and
> I understand that both patches are fixes for HW that is still not
> available (and they can't create an issue with current kernel because
> HAS_MEMBAR_SHADOW and HAS_BUS_RESTRICTIONS features are not implemented
> on 8086:201D), we should take these patches and trickle them to stable
> kernels as soon as possible so that when HW _is_ available mainline and
> stable kernels are fixed.
> 
> Correct ?
> 
> Lorenzo

That's correct. It will apply to 5.2 stable but is missing a few deps
for 4.19 that I wouldn't consider stable material. I can backport to
4.19 fairly easily.


Thanks,
Jon


Re: [PATCH 2/2] PCI: vmd: Fix shadow offsets to reflect spec changes

2019-09-17 Thread Derrick, Jonathan
On Tue, 2019-09-17 at 16:15 +0100, Lorenzo Pieralisi wrote:
> On Tue, Sep 17, 2019 at 02:45:03PM +0000, Derrick, Jonathan wrote:
> > On Tue, 2019-09-17 at 15:05 +0100, Lorenzo Pieralisi wrote:
> > > On Tue, Sep 17, 2019 at 01:55:59PM +, Derrick, Jonathan wrote:
> > > > On Tue, 2019-09-17 at 11:41 +0100, Lorenzo Pieralisi wrote:
> > > > > On Mon, Sep 16, 2019 at 07:54:35AM -0600, Jon Derrick wrote:
> > > > > > The shadow offset scratchpad was moved to 0x2000-0x2010. Update the
> > > > > > location to get the correct shadow offset.
> > > > > 
> > > > > Hi Jon,
> > > > > 
> > > > > what does "was moved" mean ? Would this code still work on previous 
> > > > > HW ?
> > > > > 
> > > > The previous code won't work on (not yet released) hw. Guests using the
> > > > domain will see the wrong offset and enumerate the domain incorrectly.
> > > 
> > > That's true also for new kernels on _current_ hardware, isn't it ?
> > > 
> > > What I am saying is that there must be a way to detect the right
> > > offset from HW probing or firmware otherwise things will break
> > > one way of another.
> > > 
> > I think this is basically that, but the spec changed which register
> > addresses contained the offset. The offset was still discoverable
> > either way, but is now within 0x2000-0x2010, with 0x2010-0x2090 as oob
> > interface.
> > 
> > 
> > 
> > > > > We must make sure that the address move is managed seamlessly by the
> > > > > kernel.
> > > > If we need to avoid changing addressing within stable, then that's
> > > > fine. But otherwise is there an issue merging with 5.4?
> > > 
> > > See above. Would 5.4 with this patch applied work on systems where
> > > the offset is the same as the _current_ one without this patch
> > > applied ?
> > I understand your concern, but these systems with wrong addressing
> > won't exist because the hardware isn't released yet.
> > 
> > In the future, the hardware will be released and users will inevitably
> > load some unfixed kernel, and we would like to point to stable for the
> > fix.
> 
> I am sorry for being blunt but I need to understand. If we apply
> this patch, are you telling me that the _current_ HW would fail ?
> 
> I assume the current HW+kernel set-up is working, maybe that's
> what I got wrong.
> 
> Reworded: on existing HW, is this patch fixing anything ?
> 
> This patch when it hits the mainline will trickle into stable
> kernel unchanged.
Sorry for the confusion.

These changes only affect systems with VMD devices with 8086:28C0
device IDs, but these won't be production hardware for some time.

Systems with VMD devices exist in the wild with 8086:201D device IDs.
These don't support the guest passthrough mode and this code won't
break anything with them. Additionally, patch 1/2 (bus numbering) only
affects 8086:28C0.

So on existing HW, these patches won't affect anything



> 
> > > > > For the time being I have to drop this fix and it is extremely
> > > > > tight to get it in v5.4, I can't send stable changes that we may
> > > > > have to revert.
> > > > Aren't we in the beginning of the merge window?
> > > 
> > > Yes and that's the problem, patches for v5.4 should have already
> > > being queued a while ago, I do not know when Bjorn will send the
> > > PR for v5.4 but that's going to happen shortly, I am making an
> > > exception to squeeze these patches in since it is vmd only code
> > > but still it is very very tight.
> > > 
> > If you feel there's a risk, then I think it can be staged for v5.5.
> > Hardware will not be available for some time.
> 
> I do not feel it is risky, I feel it would be much better if the
> scratchpad address could be detected at runtime through versioning
> of sorts, either HW- or firmware-based.
> 
> If we can't probe it inevitably we will have systems where kernels
> would break and that's something we should avoid.
> 
I agree that it might have been nicer if it were an ACPI/EFI var, but I
think there were some complexities with teaching hypervisors to expose
it to the guests for use when enumerating the domain from the passed-
through endpoint. The method that exists in 8086:28C0 hardware divorces
the firmware descriptors from the device so that the guest driver only
needs to read the host-to-guest physical offset from the device itself.


Best regards,
Jon


> Lorenzo


Re: [PATCH 2/2] PCI: vmd: Fix shadow offsets to reflect spec changes

2019-09-17 Thread Derrick, Jonathan
On Tue, 2019-09-17 at 15:05 +0100, Lorenzo Pieralisi wrote:
> On Tue, Sep 17, 2019 at 01:55:59PM +0000, Derrick, Jonathan wrote:
> > On Tue, 2019-09-17 at 11:41 +0100, Lorenzo Pieralisi wrote:
> > > On Mon, Sep 16, 2019 at 07:54:35AM -0600, Jon Derrick wrote:
> > > > The shadow offset scratchpad was moved to 0x2000-0x2010. Update the
> > > > location to get the correct shadow offset.
> > > 
> > > Hi Jon,
> > > 
> > > what does "was moved" mean ? Would this code still work on previous HW ?
> > > 
> > The previous code won't work on (not yet released) hw. Guests using the
> > domain will see the wrong offset and enumerate the domain incorrectly.
> 
> That's true also for new kernels on _current_ hardware, isn't it ?
> 
> What I am saying is that there must be a way to detect the right
> offset from HW probing or firmware otherwise things will break
> one way of another.
> 
I think this is basically that, but the spec changed which register
addresses contained the offset. The offset was still discoverable
either way, but is now within 0x2000-0x2010, with 0x2010-0x2090 as oob
interface.



> > > We must make sure that the address move is managed seamlessly by the
> > > kernel.
> > If we need to avoid changing addressing within stable, then that's
> > fine. But otherwise is there an issue merging with 5.4?
> 
> See above. Would 5.4 with this patch applied work on systems where
> the offset is the same as the _current_ one without this patch
> applied ?
I understand your concern, but these systems with wrong addressing
won't exist because the hardware isn't released yet.

In the future, the hardware will be released and users will inevitably
load some unfixed kernel, and we would like to point to stable for the
fix.



> 
> > > For the time being I have to drop this fix and it is extremely
> > > tight to get it in v5.4, I can't send stable changes that we may
> > > have to revert.
> > Aren't we in the beginning of the merge window?
> 
> Yes and that's the problem, patches for v5.4 should have already
> being queued a while ago, I do not know when Bjorn will send the
> PR for v5.4 but that's going to happen shortly, I am making an
> exception to squeeze these patches in since it is vmd only code
> but still it is very very tight.
> 
If you feel there's a risk, then I think it can be staged for v5.5.
Hardware will not be available for some time.



Best regards,
Jon


> Thanks,
> Lorenzo
> 
> > > Thanks,
> > > Lorenzo
> > > 
> > > > Cc: sta...@vger.kernel.org # v5.2+
> > > > Fixes: 6788958e ("PCI: vmd: Assign membar addresses from shadow 
> > > > registers")
> > > > Signed-off-by: Jon Derrick 
> > > > ---
> > > >  drivers/pci/controller/vmd.c | 9 ++---
> > > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > > index 2e4da3f51d6b..a35d3f3996d7 100644
> > > > --- a/drivers/pci/controller/vmd.c
> > > > +++ b/drivers/pci/controller/vmd.c
> > > > @@ -31,6 +31,9 @@
> > > >  #define PCI_REG_VMLOCK 0x70
> > > >  #define MB2_SHADOW_EN(vmlock)  (vmlock & 0x2)
> > > >  
> > > > +#define MB2_SHADOW_OFFSET  0x2000
> > > > +#define MB2_SHADOW_SIZE16
> > > > +
> > > >  enum vmd_features {
> > > > /*
> > > >  * Device may contain registers which hint the physical location of the
> > > > @@ -578,7 +581,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> > > > u32 vmlock;
> > > > int ret;
> > > >  
> > > > -   membar2_offset = 0x2018;
> > > > +   membar2_offset = MB2_SHADOW_OFFSET + MB2_SHADOW_SIZE;
> > > > ret = pci_read_config_dword(vmd->dev, PCI_REG_VMLOCK, &vmlock);
> > > > if (ret || vmlock == ~0)
> > > > return -ENODEV;
> > > > @@ -590,9 +593,9 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> > > > if (!membar2)
> > > > return -ENOMEM;
> > > > offset[0] = vmd->dev->resource[VMD_MEMBAR1].start -
> > > > -   readq(membar2 + 0x2008);
> > > > +   readq(membar2 + MB2_SHADOW_OFFSET);
> > > > offset[1] = vmd->dev->resource[VMD_MEMBAR2].start -
> > > > -   readq(membar2 + 0x2010);
> > > > +   readq(membar2 + MB2_SHADOW_OFFSET + 8);
> > > > pci_iounmap(vmd->dev, membar2);
> > > > }
> > > > }
> > > > -- 
> > > > 2.20.1
> > > > 


Re: [PATCH 2/2] PCI: vmd: Fix shadow offsets to reflect spec changes

2019-09-17 Thread Derrick, Jonathan
On Tue, 2019-09-17 at 11:41 +0100, Lorenzo Pieralisi wrote:
> On Mon, Sep 16, 2019 at 07:54:35AM -0600, Jon Derrick wrote:
> > The shadow offset scratchpad was moved to 0x2000-0x2010. Update the
> > location to get the correct shadow offset.
> 
> Hi Jon,
> 
> what does "was moved" mean ? Would this code still work on previous HW ?
> 
The previous code won't work on (not yet released) hw. Guests using the
domain will see the wrong offset and enumerate the domain incorrectly.


> We must make sure that the address move is managed seamlessly by the
> kernel.
If we need to avoid changing addressing within stable, then that's
fine. But otherwise is there an issue merging with 5.4?


> 
> For the time being I have to drop this fix and it is extremely
> tight to get it in v5.4, I can't send stable changes that we may
> have to revert.
Aren't we in the beginning of the merge window?


> 
> Thanks,
> Lorenzo
> 
> > Cc: sta...@vger.kernel.org # v5.2+
> > Fixes: 6788958e ("PCI: vmd: Assign membar addresses from shadow registers")
> > Signed-off-by: Jon Derrick 
> > ---
> >  drivers/pci/controller/vmd.c | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > index 2e4da3f51d6b..a35d3f3996d7 100644
> > --- a/drivers/pci/controller/vmd.c
> > +++ b/drivers/pci/controller/vmd.c
> > @@ -31,6 +31,9 @@
> >  #define PCI_REG_VMLOCK 0x70
> >  #define MB2_SHADOW_EN(vmlock)  (vmlock & 0x2)
> >  
> > +#define MB2_SHADOW_OFFSET  0x2000
> > +#define MB2_SHADOW_SIZE16
> > +
> >  enum vmd_features {
> > /*
> >  * Device may contain registers which hint the physical location of the
> > @@ -578,7 +581,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, 
> > unsigned long features)
> > u32 vmlock;
> > int ret;
> >  
> > -   membar2_offset = 0x2018;
> > +   membar2_offset = MB2_SHADOW_OFFSET + MB2_SHADOW_SIZE;
> > ret = pci_read_config_dword(vmd->dev, PCI_REG_VMLOCK, &vmlock);
> > if (ret || vmlock == ~0)
> > return -ENODEV;
> > @@ -590,9 +593,9 @@ static int vmd_enable_domain(struct vmd_dev *vmd, 
> > unsigned long features)
> > if (!membar2)
> > return -ENOMEM;
> > offset[0] = vmd->dev->resource[VMD_MEMBAR1].start -
> > -   readq(membar2 + 0x2008);
> > +   readq(membar2 + MB2_SHADOW_OFFSET);
> > offset[1] = vmd->dev->resource[VMD_MEMBAR2].start -
> > -   readq(membar2 + 0x2010);
> > +   readq(membar2 + MB2_SHADOW_OFFSET + 8);
> > pci_iounmap(vmd->dev, membar2);
> > }
> > }
> > -- 
> > 2.20.1
> > 


Re: [PATCH V6 1/2] genirq/affinity: Improve __irq_build_affinity_masks()

2019-08-22 Thread Derrick, Jonathan
lgtm

Reviewed-by: Jon Derrick 

On Mon, 2019-08-19 at 20:49 +0800, Ming Lei wrote:
> One invariant of __irq_build_affinity_masks() is that all CPUs in the
> specified masks( cpu_mask AND node_to_cpumask for each node) should be
> covered during the spread. Even though all requested vectors have been
> reached, we still need to spread vectors among remained CPUs. The similar
> policy has been taken in case of 'numvecs <= nodes' already:
> 
> So remove the following check inside the loop:
> 
>   if (done >= numvecs)
>   break;
> 
> Meantime assign at least 1 vector for remained nodes if 'numvecs' vectors
> have been handled already.
> 
> Also, if the specified cpumask for one numa node is empty, simply not
> spread vectors on this node.
> 
> Cc: Christoph Hellwig 
> Cc: Keith Busch 
> Cc: linux-n...@lists.infradead.org,
> Cc: Jon Derrick 
> Cc: Jens Axboe 
> Signed-off-by: Ming Lei 
> ---
>  kernel/irq/affinity.c | 26 ++
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index 6fef48033f96..c7cca942bd8a 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -129,14 +129,26 @@ static int __irq_build_affinity_masks(unsigned int 
> startvec,
>   for_each_node_mask(n, nodemsk) {
>   unsigned int ncpus, v, vecs_to_assign, vecs_per_node;
>  
> - /* Spread the vectors per node */
> - vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;
> -
>   /* Get the cpus on this node which are in the mask */
>   cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
> -
> - /* Calculate the number of cpus per vector */
>   ncpus = cpumask_weight(nmsk);
> + if (!ncpus)
> + continue;
> +
> + /*
> +  * Calculate the number of cpus per vector
> +  *
> +  * Spread the vectors evenly per node. If the requested
> +  * vector number has been reached, simply allocate one
> +  * vector for each remaining node so that all nodes can
> +  * be covered
> +  */
> + if (numvecs > done)
> + vecs_per_node = max_t(unsigned,
> + (numvecs - done) / nodes, 1);
> + else
> + vecs_per_node = 1;
> +
>   vecs_to_assign = min(vecs_per_node, ncpus);
>  
>   /* Account for rounding errors */
> @@ -156,13 +168,11 @@ static int __irq_build_affinity_masks(unsigned int 
> startvec,
>   }
>  
>   done += v;
> - if (done >= numvecs)
> - break;
>   if (curvec >= last_affv)
>   curvec = firstvec;
>   --nodes;
>   }
> - return done;
> + return done < numvecs ? done : numvecs;
>  }
>  
>  /*




Re: [PATCH V6 2/2] genirq/affinity: Spread vectors on node according to nr_cpu ratio

2019-08-22 Thread Derrick, Jonathan
lgtm. Don't know how often we'll see these configurations, but it looks
like they'll be handled gracefully.
Thanks for the effort.

Reviewed-by: Jon Derrick 

On Mon, 2019-08-19 at 20:49 +0800, Ming Lei wrote:
> Now __irq_build_affinity_masks() spreads vectors evenly per node, and
> all vectors may not be spread in case that each numa node has different
> CPU number, then the warning in irq_build_affinity_masks() can
> be triggered.
> 
> Improve current spreading algorithm by assigning vectors according to
> the ratio of node's nr_cpu to nr_remaining_cpus, meantime running the
> assignment from smaller nodes to bigger nodes to guarantee that every
> active node gets allocated at least one vector, then we can avoid
> cross-node spread in normal situation.
> 
> Meantime the reported warning can be fixed.
> 
> Another big goodness is that the spread approach becomes more fair if
> node has different CPU number.
> 
> For example, on the following machine:
>   [root@ktest-01 ~]# lscpu
>   ...
>   CPU(s):  16
>   On-line CPU(s) list: 0-15
>   Thread(s) per core:  1
>   Core(s) per socket:  8
>   Socket(s):   2
>   NUMA node(s):2
>   ...
>   NUMA node0 CPU(s):   0,1,3,5-9,11,13-15
>   NUMA node1 CPU(s):   2,4,10,12
> 
> When driver requests to allocate 8 vectors, the following spread can
> be got:
>   irq 31, cpu list 2,4
>   irq 32, cpu list 10,12
>   irq 33, cpu list 0-1
>   irq 34, cpu list 3,5
>   irq 35, cpu list 6-7
>   irq 36, cpu list 8-9
>   irq 37, cpu list 11,13
>   irq 38, cpu list 14-15
> 
> Without this patch, kernel warning is triggered on above situation, and
> allocation result was supposed to be 4 vectors for each node.
> 
> Cc: Christoph Hellwig 
> Cc: Keith Busch 
> Cc: linux-n...@lists.infradead.org,
> Cc: Jon Derrick 
> Cc: Jens Axboe 
> Reported-by: Jon Derrick 
> Reviewed-by: Jon Derrick 
> Reviewed-by: Keith Busch 
> Signed-off-by: Ming Lei 
> ---
>  kernel/irq/affinity.c | 239 +++---
>  1 file changed, 200 insertions(+), 39 deletions(-)
> 
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index c7cca942bd8a..d905e844bf3a 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -7,6 +7,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>   unsigned int cpus_per_vec)
> @@ -94,6 +95,155 @@ static int get_nodes_in_cpumask(cpumask_var_t 
> *node_to_cpumask,
>   return nodes;
>  }
>  
> +struct node_vectors {
> + unsigned id;
> +
> + union {
> + unsigned nvectors;
> + unsigned ncpus;
> + };
> +};
> +
> +static int ncpus_cmp_func(const void *l, const void *r)
> +{
> + const struct node_vectors *ln = l;
> + const struct node_vectors *rn = r;
> +
> + return ln->ncpus - rn->ncpus;
> +}
> +
> +/*
> + * Allocate vector number for each node, so that for each node:
> + *
> + * 1) the allocated number is >= 1
> + *
> + * 2) the allocated number is <= active CPU number of this node
> + *
> + * The actual allocated total vectors may be less than @numvecs when
> + * active total CPU number is less than @numvecs.
> + *
> + * Active CPUs means the CPUs in '@cpu_mask AND @node_to_cpumask[]'
> + * for each node.
> + */
> +static void alloc_nodes_vectors(unsigned int numvecs,
> + const cpumask_var_t *node_to_cpumask,
> + const struct cpumask *cpu_mask,
> + const nodemask_t nodemsk,
> + struct cpumask *nmsk,
> + struct node_vectors *node_vectors)
> +{
> + unsigned n, remaining_ncpus = 0;
> +
> + for (n = 0; n < nr_node_ids; n++) {
> + node_vectors[n].id = n;
> + node_vectors[n].ncpus = UINT_MAX;
> + }
> +
> + for_each_node_mask(n, nodemsk) {
> + unsigned ncpus;
> +
> + cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
> + ncpus = cpumask_weight(nmsk);
> +
> + if (!ncpus)
> + continue;
> + remaining_ncpus += ncpus;
> + node_vectors[n].ncpus = ncpus;
> + }
> +
> + numvecs = min_t(unsigned, remaining_ncpus, numvecs);
> +
> + sort(node_vectors, nr_node_ids, sizeof(node_vectors[0]),
> +  ncpus_cmp_func, NULL);
> +
> + /*
> +  * Allocate vectors for each node according to the ratio of this
> +  * node's nr_cpus to remaining un-assigned ncpus. 'numvecs' is
> +  * bigger than the number of active NUMA nodes. Always start the
> +  * allocation from the node with the fewest nr_cpus.
> +  *
> +  * This guarantees that each active node gets allocated at
> +  * least one vector, and the theory is simple: over-allocation
> +  * is only done when this node is assigned by one 

Re: [PATCH V5 2/2] genirq/affinity: Spread vectors on node according to nr_cpu ratio

2019-08-16 Thread Derrick, Jonathan
On Fri, 2019-08-16 at 09:53 -0600, Keith Busch wrote:
> On Thu, Aug 15, 2019 at 07:28:49PM -0700, Ming Lei wrote:
> > Now __irq_build_affinity_masks() spreads vectors evenly per node, and
> > all vectors may not be spread in case that each numa node has different
> > CPU number, then the warning in irq_build_affinity_masks() can
> > be triggered.
> > 
> > Improve current spreading algorithm by assigning vectors according to
> > the ratio of node's nr_cpu to nr_remaining_cpus, meantime running the
> > assignment from smaller nodes to bigger nodes to guarantee that every
> > active node gets allocated at least one vector, then we can avoid
> > cross-node spread in normal situation.
> > 
> > Meanwhile, the reported warning is fixed.
> > 
> > Another benefit is that the spread approach becomes fairer when
> > nodes have different CPU counts.
> > 
> > For example, on the following machine:
> > [root@ktest-01 ~]# lscpu
> > ...
> > CPU(s):  16
> > On-line CPU(s) list: 0-15
> > Thread(s) per core:  1
> > Core(s) per socket:  8
> > Socket(s):   2
> > NUMA node(s):2
> > ...
> > NUMA node0 CPU(s):   0,1,3,5-9,11,13-15
> > NUMA node1 CPU(s):   2,4,10,12
> > 
> > When driver requests to allocate 8 vectors, the following spread can
> > be got:
> > irq 31, cpu list 2,4
> > irq 32, cpu list 10,12
> > irq 33, cpu list 0-1
> > irq 34, cpu list 3,5
> > irq 35, cpu list 6-7
> > irq 36, cpu list 8-9
> > irq 37, cpu list 11,13
> > irq 38, cpu list 14-15
> > 
> > Without this patch, a kernel warning is triggered in the above
> > situation, and the allocation result was supposed to be 4 vectors
> > for each node.
> > 
> > Cc: Christoph Hellwig 
> > Cc: Keith Busch 
> > Cc: linux-n...@lists.infradead.org,
> > Cc: Jon Derrick 
> > Cc: Jens Axboe 
> > Reported-by: Jon Derrick 
> > Signed-off-by: Ming Lei 
> 
> I had every intention to thoroughly test this on imbalanced node
> configurations, but that's not going to happen anytime soon. It looks
> correct to me, so I'll append my review here.
> 
I can only test this with 2 nodes but I have varied nr_cpus as well as
using different devices with fewer and more vectors than CPUs. Spread
looks good.

Thank you

Reviewed-by: Jon Derrick 


[snip]




Re: [PATCH V3 1/3] genirq/affinity: Enhance warning check

2019-08-13 Thread Derrick, Jonathan
Hi Ming,

On Tue, 2019-08-13 at 16:14 +0800, Ming Lei wrote:
> The two-stage spread is done on the same irq vectors, and we just need
> either stage to cover all vectors on its own, not the two stages
> working together to cover them.
> 
> So enhance the warning check to make sure all vectors are spread.
> 
> Cc: Christoph Hellwig 
> Cc: Keith Busch 
> Cc: linux-n...@lists.infradead.org,
> Cc: Jon Derrick 
> Cc: Jens Axboe 
> Fixes: 6da4b3ab9a6 ("genirq/affinity: Add support for allocating interrupt 
> sets")
> Signed-off-by: Ming Lei 
> ---
>  kernel/irq/affinity.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index 6fef48033f96..265b3076f16b 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -215,8 +215,7 @@ static int irq_build_affinity_masks(unsigned int 
> startvec, unsigned int numvecs,
>  npresmsk, nmsk, masks);
>   put_online_cpus();
>  
> - if (nr_present < numvecs)
> - WARN_ON(nr_present + nr_others < numvecs);
> + WARN_ON(max(nr_present, nr_others) < numvecs);

I think the patch description assumes the first condition
"The two-stage spread is done on same irq vectors"

/*
 * Spread on non present CPUs starting from the next vector to be
 * handled. If the spreading of present CPUs already exhausted the
 * vector space, assign the non present CPUs to the already spread
 * out vectors.
 */
if (nr_present >= numvecs)
curvec = firstvec;

But doesn't the following condition imply the nr_others spread is
potentially a different vector set?

else
curvec = firstvec + nr_present;

>  
>   free_node_to_cpumask(node_to_cpumask);
>  




Re: [PATCH] genirq/affinity: report extra vectors on uneven nodes

2019-08-08 Thread Derrick, Jonathan
On Thu, 2019-08-08 at 10:32 -0600, Keith Busch wrote:
> On Thu, Aug 08, 2019 at 09:04:28AM +0200, Thomas Gleixner wrote:
> > On Wed, 7 Aug 2019, Jon Derrick wrote:
> > > The current irq spreading algorithm spreads vectors amongst cpus evenly
> > > per node. If a node has more cpus than another node, the extra vectors
> > > being spread may not be reported back to the caller.
> > > 
> > > This is most apparent with the NVMe driver and nr_cpus < vectors, where
> > > the underreporting results in the caller's WARN being triggered:
> > > 
> > > irq_build_affinity_masks()
> > > ...
> > >   if (nr_present < numvecs)
> > >   WARN_ON(nr_present + nr_others < numvecs);
> > > 
> > > Signed-off-by: Jon Derrick 
> > > ---
> > >  kernel/irq/affinity.c | 7 +--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> > > index 4352b08ae48d..9beafb8c7e92 100644
> > > --- a/kernel/irq/affinity.c
> > > +++ b/kernel/irq/affinity.c
> > > @@ -127,7 +127,8 @@ static int __irq_build_affinity_masks(unsigned int 
> > > startvec,
> > >   }
> > >  
> > >   for_each_node_mask(n, nodemsk) {
> > > - unsigned int ncpus, v, vecs_to_assign, vecs_per_node;
> > > + unsigned int ncpus, v, vecs_to_assign, total_vecs_to_assign,
> > > + vecs_per_node;
> > >  
> > >   /* Spread the vectors per node */
> > >   vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;
> > > @@ -141,14 +142,16 @@ static int __irq_build_affinity_masks(unsigned int 
> > > startvec,
> > >  
> > >   /* Account for rounding errors */
> > >   extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
> > > + total_vecs_to_assign = vecs_to_assign + extra_vecs;
> > >  
> > > - for (v = 0; curvec < last_affv && v < vecs_to_assign;
> > > + for (v = 0; curvec < last_affv && v < total_vecs_to_assign;
> > >curvec++, v++) {
> > >   cpus_per_vec = ncpus / vecs_to_assign;
> > >  
> > >   /* Account for extra vectors to compensate rounding 
> > > errors */
> > >   if (extra_vecs) {
> > >   cpus_per_vec++;
> > > + v++;
> > >   --extra_vecs;
> > >   }
> > > >   irq_spread_init_one(&masks[curvec].mask, nmsk,
> > > -- 
> 
> This looks like it will break the spread to non-present CPUs since
> it's not accurately reporting how many vectors were assigned for the
> present spread.
> 
> I think the real problem is the spread's vecs_per_node doesn't account
> which nodes contribute more CPUs than others. For example:
> 
>   Node 0 has 32 CPUs
>   Node 1 has 8 CPUs
>   Assign 32 vectors
> 
> The current algorithm assigns 16 vectors to node 0 because vecs_per_node
> is calculated as 32 vectors / 2 nodes on the first iteration. The
> subsequent iteration for node 1 gets 8 vectors because it has only 8
> CPUs, leaving 8 vectors unassigned.
> 
> A more fair spread would give node 0 the remaining 8 vectors. This
> optimization, however, is a bit more complex than the current algorithm,
> which is probably why it wasn't done, so I think the warning should just
> be removed.

It does get a bit complex for this rare scenario.
Maybe just an informational warning rather than a stack-dumping warning?




Re: [PATCH 1/3] block: sed-opal: add ioctl for done-mark of shadow mbr

2019-05-06 Thread Derrick, Jonathan
LGTM

Reviewed-by: Jon Derrick 

On Wed, 2019-05-01 at 01:20 +0200, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Enable users to mark the shadow mbr as done without completely
> deactivating the shadow mbr feature. This may be useful on reboots,
> when the power to the disk is not disconnected in between and the
> shadow
> mbr stores the required boot files. Of course, this saves also the
> (few) commands required to enable the feature if it is already
> enabled
> and one only wants to mark the shadow mbr as done.
> 
> Co-authored-by: David Kozub 
> Signed-off-by: Jonas Rabenstein <
> jonas.rabenst...@studium.uni-erlangen.de>
> Signed-off-by: David Kozub 
> ---
>  block/sed-opal.c  | 27 +++
>  include/linux/sed-opal.h  |  1 +
>  include/uapi/linux/sed-opal.h | 12 
>  3 files changed, 40 insertions(+)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index b1aa0cc25803..f1eb9c18e335 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -1986,6 +1986,30 @@ static int
> opal_enable_disable_shadow_mbr(struct opal_dev *dev,
>   return ret;
>  }
>  
> +static int opal_set_mbr_done(struct opal_dev *dev,
> +  struct opal_mbr_done *mbr_done)
> +{
> + u8 mbr_done_tf = mbr_done->done_flag == OPAL_MBR_DONE ?
> + OPAL_TRUE : OPAL_FALSE;
> +
> + const struct opal_step mbr_steps[] = {
> + { start_admin1LSP_opal_session, &mbr_done->key },
> + { set_mbr_done, &mbr_done_tf },
> + { end_opal_session, }
> + };
> + int ret;
> +
> + if (mbr_done->done_flag != OPAL_MBR_DONE &&
> + mbr_done->done_flag != OPAL_MBR_NOT_DONE)
> + return -EINVAL;
> +
> + mutex_lock(&dev->dev_lock);
> + setup_opal_dev(dev);
> + ret = execute_steps(dev, mbr_steps, ARRAY_SIZE(mbr_steps));
> + mutex_unlock(&dev->dev_lock);
> + return ret;
> +}
> +
>  static int opal_save(struct opal_dev *dev, struct opal_lock_unlock
> *lk_unlk)
>  {
>   struct opal_suspend_data *suspend;
> @@ -2299,6 +2323,9 @@ int sed_ioctl(struct opal_dev *dev, unsigned
> int cmd, void __user *arg)
>   case IOC_OPAL_ENABLE_DISABLE_MBR:
>   ret = opal_enable_disable_shadow_mbr(dev, p);
>   break;
> + case IOC_OPAL_MBR_DONE:
> + ret = opal_set_mbr_done(dev, p);
> + break;
>   case IOC_OPAL_ERASE_LR:
>   ret = opal_erase_locking_range(dev, p);
>   break;
> diff --git a/include/linux/sed-opal.h b/include/linux/sed-opal.h
> index 04b124fca51e..42b2ce5da7b3 100644
> --- a/include/linux/sed-opal.h
> +++ b/include/linux/sed-opal.h
> @@ -47,6 +47,7 @@ static inline bool is_sed_ioctl(unsigned int cmd)
>   case IOC_OPAL_ENABLE_DISABLE_MBR:
>   case IOC_OPAL_ERASE_LR:
>   case IOC_OPAL_SECURE_ERASE_LR:
> + case IOC_OPAL_MBR_DONE:
>   return true;
>   }
>   return false;
> diff --git a/include/uapi/linux/sed-opal.h b/include/uapi/linux/sed-
> opal.h
> index e092e124dd16..81dd0e8886a1 100644
> --- a/include/uapi/linux/sed-opal.h
> +++ b/include/uapi/linux/sed-opal.h
> @@ -29,6 +29,11 @@ enum opal_mbr {
>   OPAL_MBR_DISABLE = 0x01,
>  };
>  
> +enum opal_mbr_done_flag {
> + OPAL_MBR_NOT_DONE = 0x0,
> + OPAL_MBR_DONE = 0x01
> +};
> +
>  enum opal_user {
>   OPAL_ADMIN1 = 0x0,
>   OPAL_USER1 = 0x01,
> @@ -104,6 +109,12 @@ struct opal_mbr_data {
>   __u8 __align[7];
>  };
>  
> +struct opal_mbr_done {
> + struct opal_key key;
> + __u8 done_flag;
> + __u8 __align[7];
> +};
> +
>  #define IOC_OPAL_SAVE _IOW('p', 220, struct
> opal_lock_unlock)
>  #define IOC_OPAL_LOCK_UNLOCK _IOW('p', 221, struct
> opal_lock_unlock)
>  #define IOC_OPAL_TAKE_OWNERSHIP  _IOW('p', 222, struct
> opal_key)
> @@ -116,5 +127,6 @@ struct opal_mbr_data {
>  #define IOC_OPAL_ENABLE_DISABLE_MBR _IOW('p', 229, struct
> opal_mbr_data)
>  #define IOC_OPAL_ERASE_LR   _IOW('p', 230, struct
> opal_session_info)
>  #define IOC_OPAL_SECURE_ERASE_LR _IOW('p', 231, struct
> opal_session_info)
> +#define IOC_OPAL_MBR_DONE   _IOW('p', 232, struct
> opal_mbr_done)
>  
>  #endif /* _UAPI_SED_OPAL_H */




Re: [PATCH 2/2] x86/pci: Clean up usage of X86_DEV_DMA_OPS

2019-04-10 Thread Derrick, Jonathan
On Wed, 2019-04-10 at 16:45 -0500, Bjorn Helgaas wrote:
> [+cc Keith, Jonathan (VMD guys)]
> 
> I'm OK with this from a PCI perspective.  It would be nice if
> 
>   dma_domain_list
>   dma_domain_list_lock
>   add_dma_domain()
>   del_dma_domain()
>   set_dma_domain_ops()
> 
> could all be moved to vmd.c, since they're really only used there.
> 
> But we don't really have a good way to call host bridge-specific code
> like set_dma_domain_ops() except via pcibios_add_device().  Someday
> maybe pcibios_add_device() will become a host bridge method.
I think these could have been contained to vmd.c with the host bridge
add_device callback [1], but I can't seem to find whatever came out of
that discussion.

[1] https://marc.info/?l=linux-pci=153337765811720=2



Either way, this looks fine
Reviewed-by: Jon Derrick 


> 
> Acked-by: Bjorn Helgaas 
> 
> 
> On Wed, Apr 10, 2019 at 10:16:17AM +0200, Ingo Molnar wrote:
> > (+Cc. Patch quoted below. Acked-by from an x86 perspective.)
> > 
> > * Christoph Hellwig  wrote:
> > 
> > > We have supported per-device dma_map_ops in generic code for a
> > > long
> > > time, and this symbol just guards the inclusion of the
> > > dma_map_ops
> > > registry used for vmd.  Stop enabling it for anything but vmd.
> > > 
> > > Signed-off-by: Christoph Hellwig 
> > > ---
> > >  arch/x86/Kconfig   | 3 ---
> > >  drivers/misc/mic/Kconfig   | 4 ++--
> > >  drivers/pci/controller/Kconfig | 1 +
> > >  drivers/pci/controller/vmd.c   | 7 ---
> > >  4 files changed, 3 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > index 38c62ff8a3f0..d8e2e6519a61 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -28,7 +28,6 @@ config X86_64
> > >   select MODULES_USE_ELF_RELA
> > >   select NEED_DMA_MAP_STATE
> > >   select SWIOTLB
> > > - select X86_DEV_DMA_OPS
> > >   select ARCH_HAS_SYSCALL_WRAPPER
> > >  
> > >  #
> > > @@ -703,7 +702,6 @@ config STA2X11
> > >   bool "STA2X11 Companion Chip Support"
> > >   depends on X86_32_NON_STANDARD && PCI
> > >   select ARCH_HAS_PHYS_TO_DMA
> > > - select X86_DEV_DMA_OPS
> > >   select SWIOTLB
> > >   select MFD_STA2X11
> > >   select GPIOLIB
> > > @@ -2877,7 +2875,6 @@ config HAVE_ATOMIC_IOMAP
> > >  
> > >  config X86_DEV_DMA_OPS
> > >   bool
> > > - depends on X86_64 || STA2X11
> > >  
> > >  config HAVE_GENERIC_GUP
> > >   def_bool y
> > > diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
> > > index 242dcee14689..6736f72cc14a 100644
> > > --- a/drivers/misc/mic/Kconfig
> > > +++ b/drivers/misc/mic/Kconfig
> > > @@ -4,7 +4,7 @@ comment "Intel MIC Bus Driver"
> > >  
> > >  config INTEL_MIC_BUS
> > >   tristate "Intel MIC Bus Driver"
> > > - depends on 64BIT && PCI && X86 && X86_DEV_DMA_OPS
> > > + depends on 64BIT && PCI && X86
> > >   help
> > > This option is selected by any driver which registers a
> > > device or driver on the MIC Bus, such as
> > > CONFIG_INTEL_MIC_HOST,
> > > @@ -21,7 +21,7 @@ comment "SCIF Bus Driver"
> > >  
> > >  config SCIF_BUS
> > >   tristate "SCIF Bus Driver"
> > > - depends on 64BIT && PCI && X86 && X86_DEV_DMA_OPS
> > > + depends on 64BIT && PCI && X86
> > >   help
> > > This option is selected by any driver which registers a
> > > device or driver on the SCIF Bus, such as
> > > CONFIG_INTEL_MIC_HOST
> > > diff --git a/drivers/pci/controller/Kconfig
> > > b/drivers/pci/controller/Kconfig
> > > index 6012f3059acd..011c57cae4b0 100644
> > > --- a/drivers/pci/controller/Kconfig
> > > +++ b/drivers/pci/controller/Kconfig
> > > @@ -267,6 +267,7 @@ config PCIE_TANGO_SMP8759
> > >  
> > >  config VMD
> > >   depends on PCI_MSI && X86_64 && SRCU
> > > + select X86_DEV_DMA_OPS
> > >   tristate "Intel Volume Management Device Driver"
> > >   ---help---
> > > Adds support for the Intel Volume Management Device (VMD).
> > > VMD is a
> > > diff --git a/drivers/pci/controller/vmd.c
> > > b/drivers/pci/controller/vmd.c
> > > index cf6816b55b5e..999a5509e57e 100644
> > > --- a/drivers/pci/controller/vmd.c
> > > +++ b/drivers/pci/controller/vmd.c
> > > @@ -95,10 +95,8 @@ struct vmd_dev {
> > >   struct irq_domain   *irq_domain;
> > >   struct pci_bus  *bus;
> > >  
> > > -#ifdef CONFIG_X86_DEV_DMA_OPS
> > >   struct dma_map_ops  dma_ops;
> > >   struct dma_domain   dma_domain;
> > > -#endif
> > >  };
> > >  
> > >  static inline struct vmd_dev *vmd_from_bus(struct pci_bus *bus)
> > > @@ -293,7 +291,6 @@ static struct msi_domain_info
> > > vmd_msi_domain_info = {
> > >   .chip   = _msi_controller,
> > >  };
> > >  
> > > -#ifdef CONFIG_X86_DEV_DMA_OPS
> > >  /*
> > >   * VMD replaces the requester ID with its own.  DMA mappings for
> > > devices in a
> > >   * VMD domain need to be mapped for the VMD, not the device
> > > requiring
> > > @@ -438,10 +435,6 @@ static void vmd_setup_dma_ops(struct vmd_dev
> > > *vmd)
> > >   add_dma_domain(domain);
> > >  }
> > >  #undef ASSIGN_VMD_DMA_OPS
> > > 

Re: [PATCH v4 13/16] block: sed-opal: check size of shadow mbr

2019-02-11 Thread Derrick, Jonathan
On Sun, 2019-02-10 at 21:05 +0100, David Kozub wrote:
> On Fri, 8 Feb 2019, Derrick, Jonathan wrote:
> 
> > On Fri, 2019-02-01 at 21:50 +0100, David Kozub wrote:
> > > From: Jonas Rabenstein 
> > > 
> > > Check whether the shadow mbr does fit in the provided space on
> > > the
> > > target. While a proper firmware should handle this case and
> > > return an error, we may prevent problems or even damage with
> > > crappy firmwares.
> > > 
> > > Signed-off-by: Jonas Rabenstein 
> > > Reviewed-by: Scott Bauer 
> > > ---
> > >  block/opal_proto.h | 16 
> > >  block/sed-opal.c   | 39 +++
> > >  2 files changed, 55 insertions(+)
> > > 
> > > diff --git a/block/opal_proto.h b/block/opal_proto.h
> > > index b6e352cfe982..5e8df3245eb0 100644
> > > --- a/block/opal_proto.h
> > > +++ b/block/opal_proto.h
> > > @@ -106,6 +106,7 @@ enum opal_uid {
> > >   OPAL_ENTERPRISE_BANDMASTER0_UID,
> > >   OPAL_ENTERPRISE_ERASEMASTER_UID,
> > >   /* tables */
> > > + OPAL_TABLE_TABLE,
> > >   OPAL_LOCKINGRANGE_GLOBAL,
> > >   OPAL_LOCKINGRANGE_ACE_RDLOCKED,
> > >   OPAL_LOCKINGRANGE_ACE_WRLOCKED,
> > > @@ -160,6 +161,21 @@ enum opal_token {
> > >   OPAL_STARTCOLUMN = 0x03,
> > >   OPAL_ENDCOLUMN = 0x04,
> > >   OPAL_VALUES = 0x01,
> > > + /* table table */
> > > + OPAL_TABLE_UID = 0x00,
> > > + OPAL_TABLE_NAME = 0x01,
> > > + OPAL_TABLE_COMMON = 0x02,
> > > + OPAL_TABLE_TEMPLATE = 0x03,
> > > + OPAL_TABLE_KIND = 0x04,
> > > + OPAL_TABLE_COLUMN = 0x05,
> > > + OPAL_TABLE_COLUMNS = 0x06,
> > > + OPAL_TABLE_ROWS = 0x07,
> > > + OPAL_TABLE_ROWS_FREE = 0x08,
> > > + OPAL_TABLE_ROW_BYTES = 0x09,
> > > + OPAL_TABLE_LASTID = 0x0A,
> > > + OPAL_TABLE_MIN = 0x0B,
> > > + OPAL_TABLE_MAX = 0x0C,
> > > +
> > >   /* authority table */
> > >   OPAL_PIN = 0x03,
> > >   /* locking tokens */
> > > diff --git a/block/sed-opal.c b/block/sed-opal.c
> > > index 2459ac4d523b..3493bb979978 100644
> > > --- a/block/sed-opal.c
> > > +++ b/block/sed-opal.c
> > > @@ -139,6 +139,8 @@ static const u8 opaluid[][OPAL_UID_LENGTH] =
> > > {
> > > 
> > >   /* tables */
> > > 
> > > + [OPAL_TABLE_TABLE] =
> > > + { 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01
> > > },
> > >   [OPAL_LOCKINGRANGE_GLOBAL] =
> > >   { 0x00, 0x00, 0x08, 0x02, 0x00, 0x00, 0x00, 0x01
> > > },
> > >   [OPAL_LOCKINGRANGE_ACE_RDLOCKED] =
> > > @@ -1120,6 +1122,29 @@ static int generic_get_column(struct
> > > opal_dev *dev, const u8 *table,
> > >   return finalize_and_send(dev, parse_and_check_status);
> > >  }
> > > 
> > > +/*
> > > + * see TCG SAS 5.3.2.3 for a description of the available
> > > columns
> > > + *
> > > + * the result is provided in dev->resp->tok[4]
> > > + */
> > > +static int generic_get_table_info(struct opal_dev *dev, enum
> > > opal_uid table,
> > > +   u64 column)
> > > +{
> > > + u8 uid[OPAL_UID_LENGTH];
> > > + const unsigned int half = OPAL_UID_LENGTH/2;
> > > +
> > > + /* sed-opal UIDs can be split in two halves:
> > > +  *  first:  actual table index
> > > +  *  second: relative index in the table
> > > +  * so we have to get the first half of the
> > > OPAL_TABLE_TABLE and use the
> > > +  * first part of the target table as relative index into
> > > that table
> > > +  */
> > > + memcpy(uid, opaluid[OPAL_TABLE_TABLE], half);
> > > + memcpy(uid+half, opaluid[table], half);
> > > +
> > > + return generic_get_column(dev, uid, column);
> > > +}
> > > +
> > >  static int gen_key(struct opal_dev *dev, void *data)
> > >  {
> > >   u8 uid[OPAL_UID_LENGTH];
> > > @@ -1535,6 +1560,20 @@ static int write_shadow_mbr(struct
> > > opal_dev *dev, void *data)
> > >   u64 len;
> > >   int err = 0;
> > > 
> > > + /* do we fit in the available shadow mbr space? */
> > > + err = generic_get_table_info(dev, OPAL_MBR,
> > > OPAL_TABLE_ROWS);
> > 
> > Wouldn't you need to multiply this by result from
> > OPAL_TABLE_ROWBYTES?
> > What does ROWBYTES return for you

Re: [PATCH v4 15/16] block: sed-opal: don't repeat opal_discovery0 in each steps array

2019-02-11 Thread Derrick, Jonathan
Hi David,

On Sun, 2019-02-10 at 18:46 +0100, David Kozub wrote:
> On Fri, 8 Feb 2019, Derrick, Jonathan wrote:
> 
> > On Mon, 2019-02-04 at 23:44 +0100, David Kozub wrote:
> > > On Mon, 4 Feb 2019, Christoph Hellwig wrote:
> > > 
> > > > > + /* first do a discovery0 */
> > > > > + error = opal_discovery0_step(dev);
> > > > > 
> > > > > + for (state = 0; !error && state < n_steps; state++)
> > > > > + error = execute_step(dev, &steps[state], state);
This was implemented in v4's 14/16, rather than this patch (15/16)

> > > > > +
> > > > > + /*
> > > > > +  * For each OPAL command the first step in steps starts some 
> > > > > sort of
> > > > > +  * session. If an error occurred in the initial discovery0 or 
> > > > > if an
> > > > > +  * error occurred in the first step (and thus stopping the loop 
> > > > > with
> > > > > +  * state == 1) then there was an error before or during the 
> > > > > attempt to
> > > > > +  * start a session. Therefore we shouldn't attempt to terminate 
> > > > > a
> > > > > +  * session, as one has not yet been created.
> > > > > +  */
> > > > > + if (error && state > 1)
> > > > > + end_opal_session_error(dev);
> > > > > 
> > > > >   return error;
> > > > 
> > > > The flow here is a little too condensed for my taste.  Why not the
> > > > plain obvoious, if a little longer:
> > > > 
> > > > error = opal_discovery0_step(dev);
> > > > if (error)
> > > > return error;
> > > > 
> > > > for (state = 0; state < n_steps; state++) {
> > > > error = execute_step(dev, &steps[state], state);
> > > > if (error)
> > > > goto out_error;
> > > > }
> > > > 
> > > > return 0;
> > > > 
> > > > out_error:
> > > > if (state > 1)
> > > > end_opal_session_error(dev);
> > > > return error;
> > > 
> > > No problem, I can use this version. But I think there is a minor issue -
> > > the same one I hit in my original change, just from the other direction:
> > > 
> > > If the loop succeds for the 0-th element of steps, and then fails for the
> > > 1st element, then state equals 1 yet the session has been started, so we
> > > should close it.
> > > 
> > > I think the condition in out_error should be if (state > 0).
> > > 
> > > Best regards,
> > > David
> > 
> > Looks good with Christoph's suggestion (for 14/16) and your state check fix
> > 
> > 
> > Reviewed-by: Jon Derrick 
> 
> Hi Jon,
> 
> What suggestion by Christoph do you have in mind? I don't see any for 14/16. 
> Currently, in my git repo, for this patch, I applied Christoph suggestion 
> for this (15/16) patch + the "state > 0" fix. Is this what you mean?
> 
> Best regards,
> David



Re: [PATCH v4 13/16] block: sed-opal: check size of shadow mbr

2019-02-08 Thread Derrick, Jonathan
On Fri, 2019-02-01 at 21:50 +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Check whether the shadow mbr does fit in the provided space on the
> target. While a proper firmware should handle this case and return an
> error, we may prevent problems or even damage with crappy firmwares.
> 
> Signed-off-by: Jonas Rabenstein 
> Reviewed-by: Scott Bauer 
> ---
>  block/opal_proto.h | 16 
>  block/sed-opal.c   | 39 +++
>  2 files changed, 55 insertions(+)
> 
> diff --git a/block/opal_proto.h b/block/opal_proto.h
> index b6e352cfe982..5e8df3245eb0 100644
> --- a/block/opal_proto.h
> +++ b/block/opal_proto.h
> @@ -106,6 +106,7 @@ enum opal_uid {
>   OPAL_ENTERPRISE_BANDMASTER0_UID,
>   OPAL_ENTERPRISE_ERASEMASTER_UID,
>   /* tables */
> + OPAL_TABLE_TABLE,
>   OPAL_LOCKINGRANGE_GLOBAL,
>   OPAL_LOCKINGRANGE_ACE_RDLOCKED,
>   OPAL_LOCKINGRANGE_ACE_WRLOCKED,
> @@ -160,6 +161,21 @@ enum opal_token {
>   OPAL_STARTCOLUMN = 0x03,
>   OPAL_ENDCOLUMN = 0x04,
>   OPAL_VALUES = 0x01,
> + /* table table */
> + OPAL_TABLE_UID = 0x00,
> + OPAL_TABLE_NAME = 0x01,
> + OPAL_TABLE_COMMON = 0x02,
> + OPAL_TABLE_TEMPLATE = 0x03,
> + OPAL_TABLE_KIND = 0x04,
> + OPAL_TABLE_COLUMN = 0x05,
> + OPAL_TABLE_COLUMNS = 0x06,
> + OPAL_TABLE_ROWS = 0x07,
> + OPAL_TABLE_ROWS_FREE = 0x08,
> + OPAL_TABLE_ROW_BYTES = 0x09,
> + OPAL_TABLE_LASTID = 0x0A,
> + OPAL_TABLE_MIN = 0x0B,
> + OPAL_TABLE_MAX = 0x0C,
> +
>   /* authority table */
>   OPAL_PIN = 0x03,
>   /* locking tokens */
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index 2459ac4d523b..3493bb979978 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -139,6 +139,8 @@ static const u8 opaluid[][OPAL_UID_LENGTH] = {
>  
>   /* tables */
>  
> + [OPAL_TABLE_TABLE] =
> + { 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01 },
>   [OPAL_LOCKINGRANGE_GLOBAL] =
>   { 0x00, 0x00, 0x08, 0x02, 0x00, 0x00, 0x00, 0x01 },
>   [OPAL_LOCKINGRANGE_ACE_RDLOCKED] =
> @@ -1120,6 +1122,29 @@ static int generic_get_column(struct opal_dev *dev, 
> const u8 *table,
>   return finalize_and_send(dev, parse_and_check_status);
>  }
>  
> +/*
> + * see TCG SAS 5.3.2.3 for a description of the available columns
> + *
> + * the result is provided in dev->resp->tok[4]
> + */
> +static int generic_get_table_info(struct opal_dev *dev, enum opal_uid table,
> +   u64 column)
> +{
> + u8 uid[OPAL_UID_LENGTH];
> + const unsigned int half = OPAL_UID_LENGTH/2;
> +
> + /* sed-opal UIDs can be split in two halves:
> +  *  first:  actual table index
> +  *  second: relative index in the table
> +  * so we have to get the first half of the OPAL_TABLE_TABLE and use the
> +  * first part of the target table as relative index into that table
> +  */
> + memcpy(uid, opaluid[OPAL_TABLE_TABLE], half);
> + memcpy(uid+half, opaluid[table], half);
> +
> + return generic_get_column(dev, uid, column);
> +}
> +
>  static int gen_key(struct opal_dev *dev, void *data)
>  {
>   u8 uid[OPAL_UID_LENGTH];
> @@ -1535,6 +1560,20 @@ static int write_shadow_mbr(struct opal_dev *dev, void 
> *data)
>   u64 len;
>   int err = 0;
>  
> + /* do we fit in the available shadow mbr space? */
> + err = generic_get_table_info(dev, OPAL_MBR, OPAL_TABLE_ROWS);
Wouldn't you need to multiply this by result from OPAL_TABLE_ROWBYTES?
What does ROWBYTES return for you?

[snip]




Re: [PATCH v4 16/16] block: sed-opal: rename next to execute_steps

2019-02-08 Thread Derrick, Jonathan
Looks good

Reviewed-by: Jon Derrick 

On Fri, 2019-02-01 at 21:50 +0100, David Kozub wrote:
> As the function is responsible for executing the individual steps supplied
> in the steps argument, execute_steps is a more descriptive name than the
> rather generic next.
> 
> Signed-off-by: David Kozub 
> Reviewed-by: Scott Bauer 
> ---
>  block/sed-opal.c | 37 +++--
>  1 file changed, 19 insertions(+), 18 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index 3362741dd198..c09149dd5e0a 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -401,8 +401,8 @@ static int execute_step(struct opal_dev *dev,
>   return error;
>  }
>  
> -static int next(struct opal_dev *dev, const struct opal_step *steps,
> - size_t n_steps)
> +static int execute_steps(struct opal_dev *dev,
> +  const struct opal_step *steps, size_t n_steps)
>  {
>   size_t state;
>   int error;
> @@ -2034,7 +2034,7 @@ static int opal_secure_erase_locking_range(struct 
> opal_dev *dev,
>  
>   mutex_lock(&dev->dev_lock);
>   setup_opal_dev(dev);
> - ret = next(dev, erase_steps, ARRAY_SIZE(erase_steps));
> + ret = execute_steps(dev, erase_steps, ARRAY_SIZE(erase_steps));
>   mutex_unlock(&dev->dev_lock);
>   return ret;
>  }
> @@ -2051,7 +2051,7 @@ static int opal_erase_locking_range(struct opal_dev 
> *dev,
>  
>   mutex_lock(&dev->dev_lock);
>   setup_opal_dev(dev);
> - ret = next(dev, erase_steps, ARRAY_SIZE(erase_steps));
> + ret = execute_steps(dev, erase_steps, ARRAY_SIZE(erase_steps));
>   mutex_unlock(&dev->dev_lock);
>   return ret;
>  }
> @@ -2077,7 +2077,7 @@ static int opal_enable_disable_shadow_mbr(struct 
> opal_dev *dev,
>  
>   mutex_lock(&dev->dev_lock);
>   setup_opal_dev(dev);
> - ret = next(dev, mbr_steps, ARRAY_SIZE(mbr_steps));
> + ret = execute_steps(dev, mbr_steps, ARRAY_SIZE(mbr_steps));
>   mutex_unlock(&dev->dev_lock);
>   return ret;
>  }
> @@ -2099,7 +2099,7 @@ static int opal_mbr_status(struct opal_dev *dev, struct 
> opal_mbr_data *opal_mbr)
>  
>   mutex_lock(&dev->dev_lock);
>   setup_opal_dev(dev);
> - ret = next(dev, mbr_steps, ARRAY_SIZE(mbr_steps));
> + ret = execute_steps(dev, mbr_steps, ARRAY_SIZE(mbr_steps));
>   mutex_unlock(&dev->dev_lock);
>   return ret;
>  }
> @@ -2122,7 +2122,7 @@ static int opal_write_shadow_mbr(struct opal_dev *dev,
>  
>   mutex_lock(&dev->dev_lock);
>   setup_opal_dev(dev);
> - ret = next(dev, mbr_steps, ARRAY_SIZE(mbr_steps));
> + ret = execute_steps(dev, mbr_steps, ARRAY_SIZE(mbr_steps));
>   mutex_unlock(&dev->dev_lock);
>   return ret;
>  }
> @@ -2174,7 +2174,7 @@ static int opal_add_user_to_lr(struct opal_dev *dev,
>  
>   mutex_lock(&dev->dev_lock);
>   setup_opal_dev(dev);
> - ret = next(dev, steps, ARRAY_SIZE(steps));
> + ret = execute_steps(dev, steps, ARRAY_SIZE(steps));
>   mutex_unlock(&dev->dev_lock);
>   return ret;
>  }
> @@ -2189,7 +2189,7 @@ static int opal_reverttper(struct opal_dev *dev, struct 
> opal_key *opal)
>  
>   mutex_lock(&dev->dev_lock);
>   setup_opal_dev(dev);
> - ret = next(dev, revert_steps, ARRAY_SIZE(revert_steps));
> + ret = execute_steps(dev, revert_steps, ARRAY_SIZE(revert_steps));
>   mutex_unlock(&dev->dev_lock);
>  
>   /*
> @@ -2217,10 +2217,11 @@ static int __opal_lock_unlock(struct opal_dev *dev,
>   };
>  
>   if (lk_unlk->session.sum)
> - return next(dev, unlock_sum_steps,
> - ARRAY_SIZE(unlock_sum_steps));
> + return execute_steps(dev, unlock_sum_steps,
> +  ARRAY_SIZE(unlock_sum_steps));
>   else
> - return next(dev, unlock_steps, ARRAY_SIZE(unlock_steps));
> + return execute_steps(dev, unlock_steps,
> +  ARRAY_SIZE(unlock_steps));
>  }
>  
>  static int __opal_set_mbr_done(struct opal_dev *dev, struct opal_key *key)
> @@ -2232,7 +2233,7 @@ static int __opal_set_mbr_done(struct opal_dev *dev, 
> struct opal_key *key)
>   { end_opal_session, }
>   };
>  
> - return next(dev, mbrdone_step, ARRAY_SIZE(mbrdone_step));
> + return execute_steps(dev, mbrdone_step, ARRAY_SIZE(mbrdone_step));
>  }
>  
>  static int opal_lock_unlock(struct opal_dev *dev,
> @@ -2267,7 +2268,7 @@ static int opal_take_ownership(struct opal_dev *dev, 
> struct opal_key *opal)
>  
>   mutex_lock(&dev->dev_lock);
>   setup_opal_dev(dev);
> - ret = next(dev, owner_steps, ARRAY_SIZE(owner_steps));
> + ret = execute_steps(dev, owner_steps, ARRAY_SIZE(owner_steps));
>   mutex_unlock(&dev->dev_lock);
>   return ret;
>  }
> @@ -2288,7 +2289,7 @@ static int opal_activate_lsp(struct opal_dev *dev,
>  
>   mutex_lock(&dev->dev_lock);
>   setup_opal_dev(dev);
> - ret = next(dev, active_steps, ARRAY_SIZE(active_steps));
> + ret = execute_steps(dev, active_steps, ARRAY_SIZE(active_steps));

Re: [PATCH v4 15/16] block: sed-opal: don't repeat opal_discovery0 in each steps array

2019-02-08 Thread Derrick, Jonathan
On Mon, 2019-02-04 at 23:44 +0100, David Kozub wrote:
> On Mon, 4 Feb 2019, Christoph Hellwig wrote:
> 
> > > + /* first do a discovery0 */
> > > + error = opal_discovery0_step(dev);
> > > 
> > > + for (state = 0; !error && state < n_steps; state++)
> > > + error = execute_step(dev, &steps[state], state);
> > > +
> > > + /*
> > > +  * For each OPAL command the first step in steps starts some sort of
> > > +  * session. If an error occurred in the initial discovery0 or if an
> > > +  * error occurred in the first step (and thus stopping the loop with
> > > +  * state == 1) then there was an error before or during the attempt to
> > > +  * start a session. Therefore we shouldn't attempt to terminate a
> > > +  * session, as one has not yet been created.
> > > +  */
> > > + if (error && state > 1)
> > > + end_opal_session_error(dev);
> > > 
> > >   return error;
> > 
> > The flow here is a little too condensed for my taste.  Why not the
> > plain obvious, if a little longer:
> > 
> > error = opal_discovery0_step(dev);
> > if (error)
> > return error;
> > 
> > for (state = 0; state < n_steps; state++) {
> > error = execute_step(dev, &steps[state], state);
> > if (error)
> > goto out_error;
> > }
> > 
> > return 0;
> > 
> > out_error:
> > if (state > 1)
> > end_opal_session_error(dev);
> > return error;
> 
> No problem, I can use this version. But I think there is a minor issue - 
> the same one I hit in my original change, just from the other direction:
> 
> If the loop succeds for the 0-th element of steps, and then fails for the 
> 1st element, then state equals 1 yet the session has been started, so we 
> should close it.
> 
> I think the condition in out_error should be if (state > 0).
> 
> Best regards,
> David
Looks good with Christoph's suggestion (for 14/16) and your state check fix


Reviewed-by: Jon Derrick 
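Putting Christoph's unrolled control flow together with David's `state > 0` session check, the agreed-upon helper comes out roughly as below. This is an illustrative userspace sketch: the `opal_dev`/`opal_step` types and the step functions are simplified stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (illustrative only). */
struct opal_dev { int session_open; };
struct opal_step { int (*fn)(struct opal_dev *dev); };

static int opal_discovery0_step(struct opal_dev *dev) { (void)dev; return 0; }
static void end_opal_session_error(struct opal_dev *dev) { dev->session_open = 0; }

/* Discovery0 first, then the steps; only tear down a session if the
 * first step (which is what opens it) already succeeded. */
static int execute_steps(struct opal_dev *dev,
                         const struct opal_step *steps, size_t n_steps)
{
    size_t state;
    int error;

    error = opal_discovery0_step(dev);
    if (error)
        return error;

    for (state = 0; state < n_steps; state++) {
        error = steps[state].fn(dev);
        if (error)
            goto out_error;
    }
    return 0;

out_error:
    if (state > 0)  /* David's fix: step 0 passed, so a session exists */
        end_opal_session_error(dev);
    return error;
}

/* Toy steps to exercise both error paths. */
static int open_session(struct opal_dev *dev) { dev->session_open = 1; return 0; }
static int failing_step(struct opal_dev *dev) { (void)dev; return -1; }
```

The second scenario below mirrors the case David raises from the other direction: when the very first step fails, no session was opened, so none is closed.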

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v4 12/16] block: sed-opal: unify retrieval of table columns

2019-02-08 Thread Derrick, Jonathan
Looks good

Reviewed-by: Jon Derrick 

On Fri, 2019-02-01 at 21:50 +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Instead of having multiple places defining the same argument list to get
> a specific column of a sed-opal table, provide a generic version and
> call it from those functions.
> 
> Signed-off-by: Jonas Rabenstein 
> Reviewed-by: Scott Bauer 
> ---
>  block/opal_proto.h |   2 +
>  block/sed-opal.c   | 132 +
>  2 files changed, 50 insertions(+), 84 deletions(-)
> 
> diff --git a/block/opal_proto.h b/block/opal_proto.h
> index e20be8258854..b6e352cfe982 100644
> --- a/block/opal_proto.h
> +++ b/block/opal_proto.h
> @@ -170,6 +170,8 @@ enum opal_token {
>   OPAL_READLOCKED = 0x07,
>   OPAL_WRITELOCKED = 0x08,
>   OPAL_ACTIVEKEY = 0x0A,
> + /* lockingsp table */
> + OPAL_LIFECYCLE = 0x06,
>   /* locking info table */
>   OPAL_MAXRANGES = 0x04,
>/* mbr control */
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index 88c84906ce98..2459ac4d523b 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -1089,6 +1089,37 @@ static int finalize_and_send(struct opal_dev *dev, 
> cont_fn cont)
>   return opal_send_recv(dev, cont);
>  }
>  
> +/*
> + * request @column from table @table on device @dev. On success, the column
> + * data will be available in dev->resp->tok[4]
> + */
> +static int generic_get_column(struct opal_dev *dev, const u8 *table,
> +   u64 column)
> +{
> + int err;
> +
> + err = cmd_start(dev, table, opalmethod[OPAL_GET]);
> +
> + add_token_u8(&err, dev, OPAL_STARTLIST);
> +
> + add_token_u8(&err, dev, OPAL_STARTNAME);
> + add_token_u8(&err, dev, OPAL_STARTCOLUMN);
> + add_token_u64(&err, dev, column);
> + add_token_u8(&err, dev, OPAL_ENDNAME);
> +
> + add_token_u8(&err, dev, OPAL_STARTNAME);
> + add_token_u8(&err, dev, OPAL_ENDCOLUMN);
> + add_token_u64(&err, dev, column);
> + add_token_u8(&err, dev, OPAL_ENDNAME);
> +
> + add_token_u8(&err, dev, OPAL_ENDLIST);
> +
> + if (err)
> + return err;
> +
> + return finalize_and_send(dev, parse_and_check_status);
> +}
> +
>  static int gen_key(struct opal_dev *dev, void *data)
>  {
>   u8 uid[OPAL_UID_LENGTH];
> @@ -1143,23 +1174,11 @@ static int get_active_key(struct opal_dev *dev, void 
> *data)
>   if (err)
>   return err;
>  
> - err = cmd_start(dev, uid, opalmethod[OPAL_GET]);
> - add_token_u8(&err, dev, OPAL_STARTLIST);
> - add_token_u8(&err, dev, OPAL_STARTNAME);
> - add_token_u8(&err, dev, 3); /* startCloumn */
> - add_token_u8(&err, dev, 10); /* ActiveKey */
> - add_token_u8(&err, dev, OPAL_ENDNAME);
> - add_token_u8(&err, dev, OPAL_STARTNAME);
> - add_token_u8(&err, dev, 4); /* endColumn */
> - add_token_u8(&err, dev, 10); /* ActiveKey */
> - add_token_u8(&err, dev, OPAL_ENDNAME);
> - add_token_u8(&err, dev, OPAL_ENDLIST);
> - if (err) {
> - pr_debug("Error building get active key command\n");
> + err = generic_get_column(dev, uid, OPAL_ACTIVEKEY);
> + if (err)
>   return err;
> - }
>  
> - return finalize_and_send(dev, get_active_key_cont);
> + return get_active_key_cont(dev);
>  }
>  
>  static int generic_lr_enable_disable(struct opal_dev *dev,
> @@ -1820,17 +1839,19 @@ static int activate_lsp(struct opal_dev *dev, void 
> *data)
>   return finalize_and_send(dev, parse_and_check_status);
>  }
>  
> -static int get_lsp_lifecycle_cont(struct opal_dev *dev)
> +/* Determine if we're in the Manufactured Inactive or Active state */
> +static int get_lsp_lifecycle(struct opal_dev *dev, void *data)
>  {
>   u8 lc_status;
> - int error = 0;
> + int err;
>  
> - error = parse_and_check_status(dev);
> - if (error)
> - return error;
> + err = generic_get_column(dev, opaluid[OPAL_LOCKINGSP_UID],
> +  OPAL_LIFECYCLE);
> + if (err)
> + return err;
>  
>   lc_status = response_get_u64(&dev->parsed, 4);
> - /* 0x08 is Manufacured Inactive */
> + /* 0x08 is Manufactured Inactive */
>   /* 0x09 is Manufactured */
>   if (lc_status != OPAL_MANUFACTURED_INACTIVE) {
>   pr_debug("Couldn't determine the status of the Lifecycle 
> state\n");
> @@ -1840,49 +1861,19 @@ static int get_lsp_lifecycle_cont(struct opal_dev 
> *dev)
>   return 0;
>  }
>  
> -/* Determine if we're in the Manufactured Inactive or Active state */
> -static int get_lsp_lifecycle(struct opal_dev *dev, void *data)
> -{
> - int err;
> -
> - err = cmd_start(dev, opaluid[OPAL_LOCKINGSP_UID],
> - opalmethod[OPAL_GET]);
> -
> - add_token_u8(&err, dev, OPAL_STARTLIST);
> -
> - add_token_u8(&err, dev, OPAL_STARTNAME);
> - add_token_u8(&err, dev, 3); /* Start Column */
> - add_token_u8(&err, dev, 6); /* Lifecycle Column */
> - add_token_u8(&err, dev, OPAL_ENDNAME);
> -
> - add_token_u8(&err, dev, OPAL_STARTNAME);
> - 

Re: [PATCH v4 01/16] block: sed-opal: fix typos and formatting

2019-02-08 Thread Derrick, Jonathan
Looks good otherwise

Reviewed-by: Jon Derrick 

On Mon, 2019-02-04 at 21:28 +0100, David Kozub wrote:
> On Mon, 4 Feb 2019, Christoph Hellwig wrote:
> 
> > On Fri, Feb 01, 2019 at 09:50:08PM +0100, David Kozub wrote:
> > > This should make no change in functionality.
> > > The formatting changes were triggered by checkpatch.pl.
> > > 
> > > Signed-off-by: David Kozub 
> > > Reviewed-by: Scott Bauer 
> > > ---
> > >  block/sed-opal.c | 19 +++
> > >  1 file changed, 11 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/block/sed-opal.c b/block/sed-opal.c
> > > index e0de4dd448b3..c882a193e162 100644
> > > --- a/block/sed-opal.c
> > > +++ b/block/sed-opal.c
> > > @@ -11,8 +11,8 @@
> > >   *
> > >   * This program is distributed in the hope it will be useful, but WITHOUT
> > >   * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > - * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License 
> > > for
> > > - * more details.
> > > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > > + * for more details.
> > 
> > What exactly is the fix here?
> > 
> > If we want to fix the licence boilerplate we should switch it to an
> > SPDX tag instead.
> > 
> > Otherwise this looks fine to me.
> 
> I thought checkpatch.pl -f block/sed-opal.c complained about the line 
> being too long. But when I try that again now (with the original version), 
> it does not complain. So I probably saw a ghost.
> 
> I'll undo this change.

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v4 09/16] block: sed-opal: split generation of bytestring header and content

2019-02-08 Thread Derrick, Jonathan
Looks good

Reviewed-by: Jon Derrick 

On Fri, 2019-02-01 at 21:50 +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Split the header generation from the (normal) memcpy part if a
> bytestring is copied into the command buffer. This allows in-place
> generation of the bytestring content. For example, copy_from_user may be
> used without an intermediate buffer.
> 
> Signed-off-by: Jonas Rabenstein 
> Reviewed-by: Scott Bauer 
> ---
>  block/sed-opal.c | 23 +++
>  1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index 4225f23b2165..4b0a63b9d7c9 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -586,14 +586,11 @@ static void add_token_u64(int *err, struct opal_dev 
> *cmd, u64 number)
>   add_token_u8(err, cmd, number >> (len * 8));
>  }
>  
> -static void add_token_bytestring(int *err, struct opal_dev *cmd,
> -  const u8 *bytestring, size_t len)
> +static u8 *add_bytestring_header(int *err, struct opal_dev *cmd, size_t len)
>  {
>   size_t header_len = 1;
>   bool is_short_atom = true;
> -
> - if (*err)
> - return;
> + char *start;
>  
>   if (len & ~SHORT_ATOM_LEN_MASK) {
>   header_len = 2;
> @@ -602,17 +599,27 @@ static void add_token_bytestring(int *err, struct 
> opal_dev *cmd,
>  
>   if (!can_add(err, cmd, header_len + len)) {
>   pr_debug("Error adding bytestring: end of buffer.\n");
> - return;
> + return NULL;
>   }
>  
>   if (is_short_atom)
>   add_short_atom_header(cmd, true, false, len);
>   else
>   add_medium_atom_header(cmd, true, false, len);
> + start = &cmd->cmd[cmd->pos];
> + return start;
> +}
>  
> - memcpy(&cmd->cmd[cmd->pos], bytestring, len);
> - cmd->pos += len;
> +static void add_token_bytestring(int *err, struct opal_dev *cmd,
> +  const u8 *bytestring, size_t len)
> +{
> + u8 *start;
>  
> + start = add_bytestring_header(err, cmd, len);
> + if (!start)
> + return;
> + memcpy(start, bytestring, len);
> + cmd->pos += len;
>  }
>  
>  static int build_locking_range(u8 *buffer, size_t length, u8 lr)

smime.p7s
Description: S/MIME cryptographic signature
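The sizing rule that add_bytestring_header inherits from add_token_bytestring is compact enough to check in isolation. A sketch, assuming the kernel's 4-bit short-atom length field (SHORT_ATOM_LEN_MASK = 0x0f) and a 2-byte medium-atom header; total buffer space needed is then header plus payload, which is what can_add is asked for:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: short atoms carry a 4-bit length (mask 0x0f); anything
 * longer needs a 2-byte medium-atom header, mirroring sed-opal.c. */
#define SHORT_ATOM_LEN_MASK 0x0fu

/* Return the header size in bytes for a bytestring of `len` payload
 * bytes, and report via *is_short which atom encoding applies. */
static size_t bytestring_header_len(size_t len, bool *is_short)
{
    *is_short = (len & ~(size_t)SHORT_ATOM_LEN_MASK) == 0;
    return *is_short ? 1 : 2;
}
```

So a 15-byte bytestring still fits a short atom, while 16 bytes tips over into a medium atom and costs one extra header byte.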


Re: [PATCH v4 11/16] block: sed-opal: ioctl for writing to shadow mbr

2019-02-08 Thread Derrick, Jonathan
On Fri, 2019-02-01 at 21:50 +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Allow modification of the shadow mbr. If the shadow mbr is not marked as
> done, this data will be presented read only as the device content. Only
> after marking the shadow mbr as done and unlocking a locking range the
> actual content is accessible.
> 
> Co-authored-by: David Kozub 
> Signed-off-by: Jonas Rabenstein 
> Signed-off-by: David Kozub 
> Reviewed-by: Scott Bauer 
> ---
>  block/sed-opal.c  | 89 ++-
>  include/linux/sed-opal.h  |  1 +
>  include/uapi/linux/sed-opal.h |  8 
>  3 files changed, 97 insertions(+), 1 deletion(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index e03838cfd31b..88c84906ce98 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -34,6 +34,9 @@
>  #define IO_BUFFER_LENGTH 2048
>  #define MAX_TOKS 64
>  
> +/* Number of bytes needed by cmd_finalize. */
> +#define CMD_FINALIZE_BYTES_NEEDED 7
> +
>  struct opal_step {
>   int (*fn)(struct opal_dev *dev, void *data);
>   void *data;
> @@ -668,7 +671,11 @@ static int cmd_finalize(struct opal_dev *cmd, u32 hsn, 
> u32 tsn)
>   struct opal_header *hdr;
>   int err = 0;
>  
> - /* close the parameter list opened from cmd_start */
> + /*
> +  * Close the parameter list opened from cmd_start.
> +  * The number of bytes added must be equal to
> +  * CMD_FINALIZE_BYTES_NEEDED.
> +  */
>   add_token_u8(&err, cmd, OPAL_ENDLIST);
>  
>   add_token_u8(&err, cmd, OPAL_ENDOFDATA);
> @@ -1500,6 +1507,58 @@ static int set_mbr_enable_disable(struct opal_dev 
> *dev, void *data)
>   return finalize_and_send(dev, parse_and_check_status);
>  }
>  
> +static int write_shadow_mbr(struct opal_dev *dev, void *data)
> +{
> + struct opal_shadow_mbr *shadow = data;
> + const u8 __user *src;
> + u8 *dst;
> + size_t off = 0;
> + u64 len;
> + int err = 0;
> +
> + /* do the actual transmission(s) */
> + src = (u8 *) shadow->data;
> + while (off < shadow->size) {
> + err = cmd_start(dev, opaluid[OPAL_MBR], opalmethod[OPAL_SET]);
> + add_token_u8(&err, dev, OPAL_STARTNAME);
> + add_token_u8(&err, dev, OPAL_WHERE);
> + add_token_u64(&err, dev, shadow->offset + off);
> + add_token_u8(&err, dev, OPAL_ENDNAME);
> +
> + add_token_u8(&err, dev, OPAL_STARTNAME);
> + add_token_u8(&err, dev, OPAL_VALUES);
> +
> + /*
> +  * The bytestring header is either 1 or 2 bytes, so assume 2.
> +  * There also needs to be enough space to accommodate the
> +  * trailing OPAL_ENDNAME (1 byte) and tokens added by
> +  * cmd_finalize.
> +  */
> + len = min(remaining_size(dev) - (2+1+CMD_FINALIZE_BYTES_NEEDED),
> +   (size_t)(shadow->size - off));
> + pr_debug("MBR: write bytes %zu+%llu/%llu\n",
> +  off, len, shadow->size);
> +
> + dst = add_bytestring_header(&err, dev, len);
> + if (!dst)
> + break;
> + if (copy_from_user(dst, src + off, len))
> + err = -EFAULT;
> + dev->pos += len;
> +
> + add_token_u8(&err, dev, OPAL_ENDNAME);
> + if (err)
> + break;
> +
> + err = finalize_and_send(dev, parse_and_check_status);
> + if (err)
> + break;
> +
> + off += len;
> + }
> + return err;
> +}
> +
>  static int generic_pw_cmd(u8 *key, size_t key_len, u8 *cpin_uid,
> struct opal_dev *dev)
>  {
> @@ -2045,6 +2104,31 @@ static int opal_mbr_status(struct opal_dev *dev, 
> struct opal_mbr_data *opal_mbr)
>   return ret;
>  }
>  
> +static int opal_write_shadow_mbr(struct opal_dev *dev,
> +  struct opal_shadow_mbr *info)
> +{
> + const struct opal_step mbr_steps[] = {
> + { opal_discovery0, },
> + { start_admin1LSP_opal_session, &info->key },
> + { write_shadow_mbr, info },
> + { end_opal_session, },
> + { NULL, }
> + };
> + int ret;
> +
> + if (info->size == 0)
> + return 0;
> +
> + if (!access_ok(info->data, info->size))
> + return -EINVAL;
-EFAULT?

> +
> + mutex_lock(&dev->dev_lock);
> + setup_opal_dev(dev, mbr_steps);
> + ret = next(dev);
> + mutex_unlock(&dev->dev_lock);
> + return ret;
> +}
> +
>  static int opal_save(struct opal_dev *dev, struct opal_lock_unlock *lk_unlk)
>  {
>   struct opal_suspend_data *suspend;
> @@ -2378,6 +2462,9 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, 
> void __user *arg)
>   case IOC_OPAL_MBR_STATUS:
>   ret = opal_mbr_status(dev, p);
>   break;
> + case IOC_OPAL_WRITE_SHADOW_MBR:
> + ret = opal_write_shadow_mbr(dev, p);
> +
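The per-chunk arithmetic in write_shadow_mbr above (2-byte worst-case bytestring header, 1 byte for the trailing OPAL_ENDNAME, plus CMD_FINALIZE_BYTES_NEEDED) can be sanity-checked with a small model. A sketch, under the simplifying assumption that remaining_size() returns the same value for every SET command:

```c
#include <stddef.h>

/* From the patch: bytes cmd_finalize appends after the parameter list. */
#define CMD_FINALIZE_BYTES_NEEDED 7

/* Worst-case non-payload bytes reserved per SET command:
 * 2-byte bytestring header + 1-byte OPAL_ENDNAME + cmd_finalize tokens. */
#define PER_CHUNK_OVERHEAD (2 + 1 + CMD_FINALIZE_BYTES_NEEDED)

/* How many SET commands does it take to push `size` bytes of shadow-MBR
 * data when each command has `remaining` buffer bytes available?
 * (Simplified model: `remaining` must exceed PER_CHUNK_OVERHEAD and is
 * treated as constant across chunks.) */
static size_t shadow_mbr_chunks(size_t remaining, size_t size)
{
    size_t payload = remaining - PER_CHUNK_OVERHEAD;

    return (size + payload - 1) / payload;  /* ceiling division */
}
```

With 110 bytes of room per command, each chunk carries 100 payload bytes, so a 301-byte image needs four SET commands.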

Re: [PATCH v4 02/16] block: sed-opal: use correct macro for method length

2019-02-08 Thread Derrick, Jonathan
Looks good

Reviewed-by: Jon Derrick 

On Fri, 2019-02-01 at 21:50 +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Although the values of OPAL_UID_LENGTH and OPAL_METHOD_LENGTH are the same,
> it is weird to use OPAL_UID_LENGTH for the definition of the methods.
> 
> Signed-off-by: Jonas Rabenstein 
> Reviewed-by: Scott Bauer 
> ---
>  block/sed-opal.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index c882a193e162..5c123a5b4ab1 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -181,7 +181,7 @@ static const u8 opaluid[][OPAL_UID_LENGTH] = {
>   * Derived from: TCG_Storage_Architecture_Core_Spec_v2.01_r1.00
>   * Section: 6.3 Assigned UIDs
>   */
> -static const u8 opalmethod[][OPAL_UID_LENGTH] = {
> +static const u8 opalmethod[][OPAL_METHOD_LENGTH] = {
>   [OPAL_PROPERTIES] =
>   { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x01 },
>   [OPAL_STARTSESSION] =

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v4 05/16] block: sed-opal: unify cmd start

2019-02-08 Thread Derrick, Jonathan
Looks fine with 4/16

Acked-by: Jon Derrick 

On Fri, 2019-02-01 at 21:50 +0100, David Kozub wrote:
> Every step starts with resetting the cmd buffer as well as the comid and
> constructs the appropriate OPAL_CALL command. Consequently, those
> actions may be combined into one generic function. One should take care
> that the opening and closing tokens for the argument list are already
> emitted by cmd_start and cmd_finalize respectively and thus must not be
> additionally added.
> 
> Co-authored-by: Jonas Rabenstein 
> Signed-off-by: David Kozub 
> Signed-off-by: Jonas Rabenstein 
> Reviewed-by: Scott Bauer 
> ---
>  block/sed-opal.c | 228 ++-
>  1 file changed, 69 insertions(+), 159 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index 35b1747b650f..e29cb2f445ff 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -661,7 +661,7 @@ static int cmd_finalize(struct opal_dev *cmd, u32 hsn, 
> u32 tsn)
>   struct opal_header *hdr;
>   int err = 0;
>  
> - /* close parameter list */
> + /* close the parameter list opened from cmd_start */
>   add_token_u8(&err, cmd, OPAL_ENDLIST);
>  
>   add_token_u8(&err, cmd, OPAL_ENDOFDATA);
> @@ -1006,6 +1006,27 @@ static void clear_opal_cmd(struct opal_dev *dev)
>   memset(dev->cmd, 0, IO_BUFFER_LENGTH);
>  }
>  
> +static int cmd_start(struct opal_dev *dev, const u8 *uid, const u8 *method)
> +{
> + int err = 0;
> +
> + clear_opal_cmd(dev);
> + set_comid(dev, dev->comid);
> +
> + add_token_u8(&err, dev, OPAL_CALL);
> + add_token_bytestring(&err, dev, uid, OPAL_UID_LENGTH);
> + add_token_bytestring(&err, dev, method, OPAL_METHOD_LENGTH);
> +
> + /*
> +  * Every method call is followed by its parameters enclosed within
> +  * OPAL_STARTLIST and OPAL_ENDLIST tokens. We automatically open the
> +  * parameter list here and close it later in cmd_finalize.
> +  */
> + add_token_u8(&err, dev, OPAL_STARTLIST);
> +
> + return err;
> +}
> +
>  static int start_opal_session_cont(struct opal_dev *dev)
>  {
>   u32 hsn, tsn;
> @@ -1068,20 +1089,13 @@ static int finalize_and_send(struct opal_dev *dev, 
> cont_fn cont)
>  static int gen_key(struct opal_dev *dev, void *data)
>  {
>   u8 uid[OPAL_UID_LENGTH];
> - int err = 0;
> -
> - clear_opal_cmd(dev);
> - set_comid(dev, dev->comid);
> + int err;
>  
>   memcpy(uid, dev->prev_data, min(sizeof(uid), dev->prev_d_len));
>   kfree(dev->prev_data);
>   dev->prev_data = NULL;
>  
> - add_token_u8(&err, dev, OPAL_CALL);
> - add_token_bytestring(&err, dev, uid, OPAL_UID_LENGTH);
> - add_token_bytestring(&err, dev, opalmethod[OPAL_GENKEY],
> -  OPAL_UID_LENGTH);
> - add_token_u8(&err, dev, OPAL_STARTLIST);
> + err = cmd_start(dev, uid, opalmethod[OPAL_GENKEY]);
>  
>   if (err) {
>   pr_debug("Error building gen key command\n");
> @@ -1119,21 +1133,14 @@ static int get_active_key_cont(struct opal_dev *dev)
>  static int get_active_key(struct opal_dev *dev, void *data)
>  {
>   u8 uid[OPAL_UID_LENGTH];
> - int err = 0;
> + int err;
>   u8 *lr = data;
>  
> - clear_opal_cmd(dev);
> - set_comid(dev, dev->comid);
> -
>   err = build_locking_range(uid, sizeof(uid), *lr);
>   if (err)
>   return err;
>  
> - err = 0;
> - add_token_u8(&err, dev, OPAL_CALL);
> - add_token_bytestring(&err, dev, uid, OPAL_UID_LENGTH);
> - add_token_bytestring(&err, dev, opalmethod[OPAL_GET], OPAL_UID_LENGTH);
> - add_token_u8(&err, dev, OPAL_STARTLIST);
> + err = cmd_start(dev, uid, opalmethod[OPAL_GET]);
>   add_token_u8(&err, dev, OPAL_STARTLIST);
>   add_token_u8(&err, dev, OPAL_STARTNAME);
>   add_token_u8(&err, dev, 3); /* startCloumn */
> @@ -1156,13 +1163,10 @@ static int generic_lr_enable_disable(struct opal_dev 
> *dev,
>u8 *uid, bool rle, bool wle,
>bool rl, bool wl)
>  {
> - int err = 0;
> + int err;
>  
> - add_token_u8(&err, dev, OPAL_CALL);
> - add_token_bytestring(&err, dev, uid, OPAL_UID_LENGTH);
> - add_token_bytestring(&err, dev, opalmethod[OPAL_SET], OPAL_UID_LENGTH);
> + err = cmd_start(dev, uid, opalmethod[OPAL_SET]);
>  
> - add_token_u8(&err, dev, OPAL_STARTLIST);
>   add_token_u8(&err, dev, OPAL_STARTNAME);
>   add_token_u8(&err, dev, OPAL_VALUES);
>   add_token_u8(&err, dev, OPAL_STARTLIST);
> @@ -1209,10 +1213,7 @@ static int setup_locking_range(struct opal_dev *dev, 
> void *data)
>   u8 uid[OPAL_UID_LENGTH];
>   struct opal_user_lr_setup *setup = data;
>   u8 lr;
> - int err = 0;
> -
> - clear_opal_cmd(dev);
> - set_comid(dev, dev->comid);
> + int err;
>  
>   lr = setup->session.opal_key.lr;
>   err = build_locking_range(uid, sizeof(uid), lr);
> @@ -1222,12 +1223,8 @@ static int setup_locking_range(struct opal_dev *dev, 
> void *data)
>   if (lr == 0)
> 

Re: [PATCH v4 07/16] block: sed-opal: reuse response_get_token to decrease code duplication

2019-02-08 Thread Derrick, Jonathan
Looks good

Reviewed-by: Jon Derrick 

On Fri, 2019-02-01 at 21:50 +0100, David Kozub wrote:
> response_get_token had already been in place, its functionality had
> been duplicated within response_get_{u64,bytestring} with the same error
> handling. Unify the handling by reusing response_get_token within the
> other functions.
> 
> Co-authored-by: Jonas Rabenstein 
> Signed-off-by: David Kozub 
> Signed-off-by: Jonas Rabenstein 
> Reviewed-by: Scott Bauer 
> ---
>  block/sed-opal.c | 46 +++---
>  1 file changed, 15 insertions(+), 31 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index 537cd73ea88a..1332547e5a99 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -889,27 +889,19 @@ static size_t response_get_string(const struct 
> parsed_resp *resp, int n,
> const char **store)
>  {
>   u8 skip;
> - const struct opal_resp_tok *token;
> + const struct opal_resp_tok *tok;
>  
>   *store = NULL;
> - if (!resp) {
> - pr_debug("Response is NULL\n");
> - return 0;
> - }
> -
> - if (n >= resp->num) {
> - pr_debug("Response has %d tokens. Can't access %d\n",
> -  resp->num, n);
> + tok = response_get_token(resp, n);
> + if (IS_ERR(tok))
>   return 0;
> - }
>  
> - token = &resp->toks[n];
> - if (token->type != OPAL_DTA_TOKENID_BYTESTRING) {
> + if (tok->type != OPAL_DTA_TOKENID_BYTESTRING) {
>   pr_debug("Token is not a byte string!\n");
>   return 0;
>   }
>  
> - switch (token->width) {
> + switch (tok->width) {
>   case OPAL_WIDTH_TINY:
>   case OPAL_WIDTH_SHORT:
>   skip = 1;
> @@ -925,37 +917,29 @@ static size_t response_get_string(const struct 
> parsed_resp *resp, int n,
>   return 0;
>   }
>  
> - *store = token->pos + skip;
> - return token->len - skip;
> + *store = tok->pos + skip;
> + return tok->len - skip;
>  }
>  
>  static u64 response_get_u64(const struct parsed_resp *resp, int n)
>  {
> - if (!resp) {
> - pr_debug("Response is NULL\n");
> - return 0;
> - }
> + const struct opal_resp_tok *tok;
>  
> - if (n >= resp->num) {
> - pr_debug("Response has %d tokens. Can't access %d\n",
> -  resp->num, n);
> + tok = response_get_token(resp, n);
> + if (IS_ERR(tok))
>   return 0;
> - }
>  
> - if (resp->toks[n].type != OPAL_DTA_TOKENID_UINT) {
> - pr_debug("Token is not unsigned it: %d\n",
> -  resp->toks[n].type);
> + if (tok->type != OPAL_DTA_TOKENID_UINT) {
> + pr_debug("Token is not unsigned int: %d\n", tok->type);
>   return 0;
>   }
>  
> - if (!(resp->toks[n].width == OPAL_WIDTH_TINY ||
> -   resp->toks[n].width == OPAL_WIDTH_SHORT)) {
> - pr_debug("Atom is not short or tiny: %d\n",
> -  resp->toks[n].width);
> + if (tok->width != OPAL_WIDTH_TINY && tok->width != OPAL_WIDTH_SHORT) {
> + pr_debug("Atom is not short or tiny: %d\n", tok->width);
>   return 0;
>   }
>  
> - return resp->toks[n].stored.u;
> + return tok->stored.u;
>  }
>  
>  static bool response_token_matches(const struct opal_resp_tok *token, u8 
> match)

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v4 04/16] block: sed-opal: close parameter list in cmd_finalize

2019-02-08 Thread Derrick, Jonathan
Normally I wouldn't like decreasing the readability (having a STARTLIST
without an ENDLIST in the same function), but this makes a lot of sense
with 5/16

Acked-by: Jon Derrick 

On Fri, 2019-02-01 at 21:50 +0100, David Kozub wrote:
> Every step ends by calling cmd_finalize (via finalize_and_send)
> yet every step adds the token OPAL_ENDLIST on its own. Moving
> this into cmd_finalize decreases code duplication.
> 
> Co-authored-by: Jonas Rabenstein 
> Signed-off-by: David Kozub 
> Signed-off-by: Jonas Rabenstein 
> Reviewed-by: Scott Bauer 
> ---
>  block/sed-opal.c | 25 +++--
>  1 file changed, 3 insertions(+), 22 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index 980705681806..35b1747b650f 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -661,6 +661,9 @@ static int cmd_finalize(struct opal_dev *cmd, u32 hsn, 
> u32 tsn)
>   struct opal_header *hdr;
>   int err = 0;
>  
> + /* close parameter list */
> + add_token_u8(&err, cmd, OPAL_ENDLIST);
> +
>   add_token_u8(&err, cmd, OPAL_ENDOFDATA);
>   add_token_u8(&err, cmd, OPAL_STARTLIST);
>   add_token_u8(&err, cmd, 0);
> @@ -1079,7 +1082,6 @@ static int gen_key(struct opal_dev *dev, void *data)
>   add_token_bytestring(&err, dev, opalmethod[OPAL_GENKEY],
>OPAL_UID_LENGTH);
>   add_token_u8(&err, dev, OPAL_STARTLIST);
> - add_token_u8(&err, dev, OPAL_ENDLIST);
>  
>   if (err) {
>   pr_debug("Error building gen key command\n");
> @@ -1142,7 +1144,6 @@ static int get_active_key(struct opal_dev *dev, void 
> *data)
>   add_token_u8(&err, dev, 10); /* ActiveKey */
>   add_token_u8(&err, dev, OPAL_ENDNAME);
>   add_token_u8(&err, dev, OPAL_ENDLIST);
> - add_token_u8(&err, dev, OPAL_ENDLIST);
>   if (err) {
>   pr_debug("Error building get active key command\n");
>   return err;
> @@ -1188,7 +1189,6 @@ static int generic_lr_enable_disable(struct opal_dev 
> *dev,
>  
>   add_token_u8(&err, dev, OPAL_ENDLIST);
>   add_token_u8(&err, dev, OPAL_ENDNAME);
> - add_token_u8(&err, dev, OPAL_ENDLIST);
>   return err;
>  }
>  
> @@ -1254,8 +1254,6 @@ static int setup_locking_range(struct opal_dev *dev, 
> void *data)
>  
>   add_token_u8(&err, dev, OPAL_ENDLIST);
>   add_token_u8(&err, dev, OPAL_ENDNAME);
> - add_token_u8(&err, dev, OPAL_ENDLIST);
> -
>   }
>   if (err) {
>   pr_debug("Error building Setup Locking range command.\n");
> @@ -1295,7 +1293,6 @@ static int start_generic_opal_session(struct opal_dev 
> *dev,
>  
>   switch (auth) {
>   case OPAL_ANYBODY_UID:
> - add_token_u8(&err, dev, OPAL_ENDLIST);
>   break;
>   case OPAL_ADMIN1_UID:
>   case OPAL_SID_UID:
> @@ -1308,7 +1305,6 @@ static int start_generic_opal_session(struct opal_dev 
> *dev,
>   add_token_bytestring(&err, dev, opaluid[auth],
>OPAL_UID_LENGTH);
>   add_token_u8(&err, dev, OPAL_ENDNAME);
> - add_token_u8(&err, dev, OPAL_ENDLIST);
>   break;
>   default:
>   pr_debug("Cannot start Admin SP session with auth %d\n", auth);
> @@ -1406,7 +1402,6 @@ static int start_auth_opal_session(struct opal_dev 
> *dev, void *data)
>   add_token_u8(&err, dev, 3);
>   add_token_bytestring(&err, dev, lk_ul_user, OPAL_UID_LENGTH);
>   add_token_u8(&err, dev, OPAL_ENDNAME);
> - add_token_u8(&err, dev, OPAL_ENDLIST);
>  
>   if (err) {
>   pr_debug("Error building STARTSESSION command.\n");
> @@ -1429,7 +1424,6 @@ static int revert_tper(struct opal_dev *dev, void *data)
>   add_token_bytestring(&err, dev, opalmethod[OPAL_REVERT],
>OPAL_UID_LENGTH);
>   add_token_u8(&err, dev, OPAL_STARTLIST);
> - add_token_u8(&err, dev, OPAL_ENDLIST);
>   if (err) {
>   pr_debug("Error building REVERT TPER command.\n");
>   return err;
> @@ -1463,7 +1457,6 @@ static int internal_activate_user(struct opal_dev *dev, 
> void *data)
>   add_token_u8(&err, dev, OPAL_ENDNAME);
>   add_token_u8(&err, dev, OPAL_ENDLIST);
>   add_token_u8(&err, dev, OPAL_ENDNAME);
> - add_token_u8(&err, dev, OPAL_ENDLIST);
>  
>   if (err) {
>   pr_debug("Error building Activate UserN command.\n");
> @@ -1490,7 +1483,6 @@ static int erase_locking_range(struct opal_dev *dev, 
> void *data)
>   add_token_bytestring(&err, dev, opalmethod[OPAL_ERASE],
>OPAL_UID_LENGTH);
>   add_token_u8(&err, dev, OPAL_STARTLIST);
> - add_token_u8(&err, dev, OPAL_ENDLIST);
>  
>   if (err) {
>   pr_debug("Error building Erase Locking Range Command.\n");
> @@ -1521,7 +1513,6 @@ static int set_mbr_done(struct opal_dev *dev, void 
> *data)
>   add_token_u8(&err, dev, OPAL_ENDNAME);
>   add_token_u8(&err, dev, OPAL_ENDLIST);
>   add_token_u8(&err, dev, OPAL_ENDNAME);
> - add_token_u8(&err, dev, OPAL_ENDLIST);
>  
>   if (err) {
> 

Re: [PATCH v4 03/16] block: sed-opal: unify space check in add_token_*

2019-02-08 Thread Derrick, Jonathan
On Mon, 2019-02-04 at 22:07 +0100, David Kozub wrote:
> On Mon, 4 Feb 2019, Christoph Hellwig wrote:
> 
> > On Fri, Feb 01, 2019 at 09:50:10PM +0100, David Kozub wrote:
> > > From: Jonas Rabenstein 
> > > 
> > > All add_token_* functions have a common set of conditions that have to
> > > be checked. Use a common function for those checks in order to avoid
> > > different behaviour as well as code duplication.
> > > 
> > > Co-authored-by: David Kozub 
> > > Signed-off-by: Jonas Rabenstein 
> > > Signed-off-by: David Kozub 
> > > Reviewed-by: Scott Bauer 
> > > ---
> > >  block/sed-opal.c | 30 +-
> > >  1 file changed, 21 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/block/sed-opal.c b/block/sed-opal.c
> > > index 5c123a5b4ab1..980705681806 100644
> > > --- a/block/sed-opal.c
> > > +++ b/block/sed-opal.c
> > > @@ -510,15 +510,29 @@ static int opal_discovery0(struct opal_dev *dev, 
> > > void *data)
> > >   return opal_discovery0_end(dev);
> > >  }
> > > 
> > > -static void add_token_u8(int *err, struct opal_dev *cmd, u8 tok)
> > > +static size_t remaining_size(struct opal_dev *cmd)
> > > +{
> > > + return IO_BUFFER_LENGTH - cmd->pos;
> > > +}
> > 
> > This function seem a little pointless to me, at least as of this patch
> > where it only has a single user just below.
> 
> It is eventually used for the second time in 11/16 block: sed-opal: ioctl 
> for writing to shadow mbr.
> 
> If you feel strongly about this I can exclude it from this commit and 
> introduce it in 11/16 (where it then will called from here and from 
> write_shadow_mbr).
> 
> Best regards,
> David
I'd prefer this option where we refactor later

Otherwise looks good
Reviewed-by: Jon Derrick 
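For reference, the factored-out check under discussion amounts to the following userspace model: remaining_size plus the common bail-out that the add_token_* helpers share. This is a sketch, assuming an -ERANGE error latch like the kernel's can_add; the types are simplified stand-ins.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define IO_BUFFER_LENGTH 2048

/* Minimal stand-in for struct opal_dev's command-buffer state. */
struct opal_dev { size_t pos; };

static size_t remaining_size(struct opal_dev *cmd)
{
    return IO_BUFFER_LENGTH - cmd->pos;
}

/* The condition common to all add_token_* helpers: do nothing once an
 * error is latched, and latch one when the buffer cannot hold `len`
 * more bytes. */
static bool can_add(int *err, struct opal_dev *cmd, size_t len)
{
    if (*err)
        return false;
    if (remaining_size(cmd) < len) {
        *err = -ERANGE;
        return false;
    }
    return true;
}
```

Because the error is latched in *err, every later add_token_* call becomes a no-op, which is what lets the builders chain calls and check err once at the end.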

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v4 10/16] block: sed-opal: add ioctl for done-mark of shadow mbr

2019-02-07 Thread Derrick, Jonathan
On Thu, 2019-02-07 at 23:56 +0100, David Kozub wrote:
> On Mon, 4 Feb 2019, Christoph Hellwig wrote:
> 
> > On Fri, Feb 01, 2019 at 09:50:17PM +0100, David Kozub wrote:
> > > From: Jonas Rabenstein 
> > > 
> > > Enable users to mark the shadow mbr as done without completely
> > > deactivating the shadow mbr feature. This may be useful on reboots,
> > > when the power to the disk is not disconnected in between and the shadow
> > > mbr stores the required boot files. Of course, this saves also the
> > > (few) commands required to enable the feature if it is already enabled
> > > and one only wants to mark the shadow mbr as done.
> > > 
> > > Signed-off-by: Jonas Rabenstein 
> > > Reviewed-by: Scott Bauer 
> > > ---
> > >  block/sed-opal.c  | 33 +++--
> > >  include/linux/sed-opal.h  |  1 +
> > >  include/uapi/linux/sed-opal.h |  1 +
> > >  3 files changed, 33 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/block/sed-opal.c b/block/sed-opal.c
> > > index 4b0a63b9d7c9..e03838cfd31b 100644
> > > --- a/block/sed-opal.c
> > > +++ b/block/sed-opal.c
> > > @@ -1996,13 +1996,39 @@ static int opal_erase_locking_range(struct opal_dev *dev,
> > >  static int opal_enable_disable_shadow_mbr(struct opal_dev *dev,
> > > struct opal_mbr_data *opal_mbr)
> > >  {
> > > + u8 token = opal_mbr->enable_disable == OPAL_MBR_ENABLE
> > > + ? OPAL_TRUE : OPAL_FALSE;
> > >  	const struct opal_step mbr_steps[] = {
> > >  		{ opal_discovery0, },
> > >  		{ start_admin1LSP_opal_session, &opal_mbr->key },
> > > -		{ set_mbr_done, &opal_mbr->enable_disable },
> > > +		{ set_mbr_done, &token },
> > >  		{ end_opal_session, },
> > >  		{ start_admin1LSP_opal_session, &opal_mbr->key },
> > > -		{ set_mbr_enable_disable, &opal_mbr->enable_disable },
> > > +		{ set_mbr_enable_disable, &token },
> > > +		{ end_opal_session, },
> > > +		{ NULL, }
> > 
> > This seems to be a change of what we pass to set_mbr_done /
> > set_mbr_enable_disable and not really related to the new functionality
> > here, so it should be split into a separate patch.
> > 
> > That being said if we really care about this translation between
> > the two sets of constants, why not do it inside
> > set_mbr_done and set_mbr_enable_disable?
> 
> Hi Christoph,
> 
> I agree, this should be split. Furthermore I think I found an issue here: 
> OPAL_MBR_ENABLE and OPAL_MBR_DISABLE are defined as follows:
> 
> enum opal_mbr {
>   OPAL_MBR_ENABLE = 0x0,
>   OPAL_MBR_DISABLE = 0x01,
> };
> 
> ... while OPAL_TRUE and OPAL_FALSE tokens are:
> 
>   OPAL_TRUE = 0x01,
>   OPAL_FALSE = 0x00,
> 
> so in the current code in kernel, when the IOCTL input is directly passed 
> in place of the TRUE/FALSE tokens (in opal_enable_disable_shadow_mbr), 
> passing OPAL_MBR_ENABLE (0) to IOC_OPAL_ENABLE_DISABLE_MBR ends up being 
> interpreted as OPAL_FALSE (0) and passing OPAL_MBR_DISABLE (1) ended up 
> being interpreted as OPAL_TRUE (1). So the behavior is:
> 
> OPAL_MBR_ENABLE: set MBR enable to OPAL_FALSE and done to OPAL_FALSE
> OPAL_MBR_DISABLE: set MBR enable to OPAL_TRUE and done to OPAL_TRUE
> 
> Am I missing something here? This seems wrong to me. And I think this 
> patch actually changes it by introducing:
> 
> +u8 token = opal_mbr->enable_disable == OPAL_MBR_ENABLE
> +? OPAL_TRUE : OPAL_FALSE;
> 
> which is essentially a negation (map 0 to 1 and 1 to 0).
> 
> I had a strange feeling of IOC_OPAL_ENABLE_DISABLE_MBR behaving 
> incorrectly when I first tried it. But when I checked later I was not able 
> to reproduce it - probably originally I tested without this patch.
> 
> With regard to the new IOC_OPAL_MBR_STATUS: I find the usage of 
> OPAL_MBR_ENABLE/DISABLE for this confusing: what should passing 
> OPAL_MBR_ENABLE do? Should it enable the shadow MBR? Or should it 
> enable the MBR-done flag? I think the implementation in this patch 
> interprets OPAL_MBR_ENABLE as 'set the "done" flag to true', thus hiding 
> the shadow MBR. But this is not obvious looking at the IOCTL name.
> 
> What if I introduced two new constants for this? OPAL_MBR_DONE and 
> OPAL_MBR_NOT_DONE? Maybe the IOCTL could be renamed too - 
> IOC_OPAL_MBR_DONE? Or is it only me who finds this confusing?
> 
> Best regards,
> David

Hi David,

Based on the spec and appnote [1], it does look like sed-opal-temp is
providing the inverted value for shadow mbr enable:

if (cfg.enable_mbr)
mbr.enable_disable = OPAL_MBR_ENABLE;
else
mbr.enable_disable = OPAL_MBR_DISABLE;

where
enum opal_mbr {
OPAL_MBR_ENABLE = 0x0,
OPAL_MBR_DISABLE = 0x01,
};

The appnote says as much:
3.2.9.4
Enable the MBR Shadowing feature
session[TSN:HSN] -> MBRControl_UID.Set[Values = [Enable = TRUE]]
  07FE  
0010 0048 1001 0001 
0020   0030 

Re: [PATCH 14/15] vmd: use the proper dma_* APIs instead of direct methods calls

2018-12-14 Thread Derrick, Jonathan
Looks good to me
Thanks Christoph

Acked-by: Jon Derrick 

On Fri, 2018-12-14 at 15:17 -0600, Bjorn Helgaas wrote:
> Conventional spelling in subject is
> 
>   PCI: vmd: Use dma_* APIs instead of direct method calls
> 
> On Fri, Dec 07, 2018 at 11:07:19AM -0800, Christoph Hellwig wrote:
> > With the bypass support for the direct mapping we might not always have
> > methods to call, so use the proper APIs instead.  The only downside is
> > that we will create two dma-debug entries for each mapping if
> > CONFIG_DMA_DEBUG is enabled.
> > 
> > Signed-off-by: Christoph Hellwig 
> 
> You cc'd the VMD maintainers already, and I have no objection to this
> from a PCI core point of view, so:
> 
> Acked-by: Bjorn Helgaas 
> 
> > ---
> >  drivers/pci/controller/vmd.c | 42 +++-
> >  1 file changed, 17 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > index 98ce79eac128..3890812cdf87 100644
> > --- a/drivers/pci/controller/vmd.c
> > +++ b/drivers/pci/controller/vmd.c
> > @@ -307,39 +307,32 @@ static struct device *to_vmd_dev(struct device *dev)
> > > 	return &vmd->dev->dev;
> >  }
> >  
> > -static const struct dma_map_ops *vmd_dma_ops(struct device *dev)
> > -{
> > -   return get_dma_ops(to_vmd_dev(dev));
> > -}
> > -
> >  static void *vmd_alloc(struct device *dev, size_t size, dma_addr_t *addr,
> >gfp_t flag, unsigned long attrs)
> >  {
> > -   return vmd_dma_ops(dev)->alloc(to_vmd_dev(dev), size, addr, flag,
> > -  attrs);
> > +   return dma_alloc_attrs(to_vmd_dev(dev), size, addr, flag, attrs);
> >  }
> >  
> >  static void vmd_free(struct device *dev, size_t size, void *vaddr,
> >  dma_addr_t addr, unsigned long attrs)
> >  {
> > -   return vmd_dma_ops(dev)->free(to_vmd_dev(dev), size, vaddr, addr,
> > - attrs);
> > +   return dma_free_attrs(to_vmd_dev(dev), size, vaddr, addr, attrs);
> >  }
> >  
> >  static int vmd_mmap(struct device *dev, struct vm_area_struct *vma,
> > void *cpu_addr, dma_addr_t addr, size_t size,
> > unsigned long attrs)
> >  {
> > -   return vmd_dma_ops(dev)->mmap(to_vmd_dev(dev), vma, cpu_addr, addr,
> > - size, attrs);
> > +   return dma_mmap_attrs(to_vmd_dev(dev), vma, cpu_addr, addr, size,
> > +   attrs);
> >  }
> >  
> >  static int vmd_get_sgtable(struct device *dev, struct sg_table *sgt,
> >void *cpu_addr, dma_addr_t addr, size_t size,
> >unsigned long attrs)
> >  {
> > -   return vmd_dma_ops(dev)->get_sgtable(to_vmd_dev(dev), sgt, cpu_addr,
> > -addr, size, attrs);
> > +   return dma_get_sgtable_attrs(to_vmd_dev(dev), sgt, cpu_addr, addr, size,
> > +   attrs);
> >  }
> >  
> >  static dma_addr_t vmd_map_page(struct device *dev, struct page *page,
> > @@ -347,61 +340,60 @@ static dma_addr_t vmd_map_page(struct device *dev, 
> > struct page *page,
> >enum dma_data_direction dir,
> >unsigned long attrs)
> >  {
> > -   return vmd_dma_ops(dev)->map_page(to_vmd_dev(dev), page, offset, size,
> > - dir, attrs);
> > +   return dma_map_page_attrs(to_vmd_dev(dev), page, offset, size, dir,
> > +   attrs);
> >  }
> >  
> > >  static void vmd_unmap_page(struct device *dev, dma_addr_t addr, size_t size,
> >enum dma_data_direction dir, unsigned long attrs)
> >  {
> > -   vmd_dma_ops(dev)->unmap_page(to_vmd_dev(dev), addr, size, dir, attrs);
> > +   dma_unmap_page_attrs(to_vmd_dev(dev), addr, size, dir, attrs);
> >  }
> >  
> > >  static int vmd_map_sg(struct device *dev, struct scatterlist *sg, int nents,
> >   enum dma_data_direction dir, unsigned long attrs)
> >  {
> > -   return vmd_dma_ops(dev)->map_sg(to_vmd_dev(dev), sg, nents, dir, attrs);
> > +   return dma_map_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs);
> >  }
> >  
> > >  static void vmd_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
> >  enum dma_data_direction dir, unsigned long attrs)
> >  {
> > -   vmd_dma_ops(dev)->unmap_sg(to_vmd_dev(dev), sg, nents, dir, attrs);
> > +   dma_unmap_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs);
> >  }
> >  
> >  static void vmd_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
> > size_t size, enum dma_data_direction dir)
> >  {
> > -   vmd_dma_ops(dev)->sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir);
> > +   dma_sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir);
> >  }
> >  
> >  static void vmd_sync_single_for_device(struct device *dev, dma_addr_t addr,
> >size_t size, enum dma_data_direction dir)
> >  {
> > -   

Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-07 Thread Derrick, Jonathan
I found the same issue:
https://patchwork.ozlabs.org/patch/989272/

Tested-by: Jon Derrick 

On Mon, 2018-11-05 at 18:32 -0600, Alex G. wrote:
> ping
> 
> On 09/18/2018 05:15 PM, Alexandru Gagniuc wrote:
> > When a PCI device is gone, we don't want to send IO to it if we can
> > avoid it. We expose functionality via the irq_chip structure. As
> > users of that structure may not know about the underlying PCI device,
> > it's our responsibility to guard against removed devices.
> > 
> > .irq_write_msi_msg() is already guarded inside __pci_write_msi_msg().
> > .irq_mask/unmask() are not. Guard them for completeness.
> > 
> > For example, surprise removal of a PCIe device triggers teardown. This
> > touches the irq_chip ops at some point to disable the interrupts. I/O
> > generated here can crash the system on firmware-first machines.
> > Not triggering the IO in the first place greatly reduces the
> > possibility of the problem occurring.
> > Signed-off-by: Alexandru Gagniuc 
> > ---
> >   drivers/pci/msi.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index f2ef896464b3..f31058fd2260 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -227,6 +227,9 @@ static void msi_set_mask_bit(struct irq_data *data, u32 flag)
> >  {
> >  	struct msi_desc *desc = irq_data_get_msi_desc(data);
> >  
> > +	if (pci_dev_is_disconnected(msi_desc_to_pci_dev(desc)))
> > +		return;
> > +
> >  	if (desc->msi_attrib.is_msix) {
> >  		msix_mask_irq(desc, flag);
> >  		readl(desc->mask_base);	/* Flush write to device */
> > 



Re: [PATCH] PCI/portdrv: Enable error reporting on managed ports

2018-10-09 Thread Derrick, Jonathan
Hi Bjorn,

On Tue, 2018-10-09 at 12:56 -0500, Bjorn Helgaas wrote:
> On Tue, Sep 04, 2018 at 12:33:09PM -0600, Jon Derrick wrote:
> > During probe, the port driver will disable error reporting and
> > assumes
> > it will be enabled later by the AER driver's pci_walk_bus()
> > sequence.
> > This may not be the case for host-bridge enabled root ports, who
> > will
> > enable first error reporting on the bus during the root port probe,
> > and
> > then disable error reporting on downstream devices during
> > subsequent
> > probing of the bus.
> 
> I understand the hotplug case (see below), but help me understand
> this
> "host-bridge enabled root ports" thing.  I'm not sure what that
> means.
Sorry for the confusion. I meant a device which doesn't expose the root
ports with firmware but has to expose the root ports using
pci_create_root_bus or similar methods. These methods use a host bridge
aperture on the backend.


> 
> We run pcie_portdrv_probe() for every root port, switch upstream
> port,
> and switch downstream port, and it always disables error reporting
> for
> the port:
> 
>   pcie_portdrv_probe  # pci_driver .probe
> pcie_port_device_register
>   get_port_device_capability
> services |= PCIE_PORT_SERVICE_AER
> pci_disable_pcie_error_reporting
>   # clear DEVCTL Error Reporting Enables
> 
> For root ports, we call aer_probe(), and it enables error reporting
> for the entire tree below the root port:
> 
>   aer_probe   # pcie_port_service .probe
> aer_enable_rootport
>   set_downstream_devices_error_reporting(dev, true)
> pci_walk_bus(dev->subordinate, set_device_error_reporting)
>   set_device_error_reporting
> if (Root Port || Upstream Port || Downstream Port)
>   pci_enable_pcie_error_reporting
> # set DEVCTL Error Reporting Enables
> 
> This is definitely broken for hot-added switches because aer_probe()
> is the only place we enable error reporting, and it's only run when
> we
> enumerate a root port, not when we hot-add things below that root
> port.

I don't currently have the hardware to test hotplugging a switch,
although I think it should be possible to test with Thunderbolt. Mika? 
:)

> 
> > A hotplugged port device may also fail to enable error reporting as
> > the
> > AER driver has already run on the root bus.
> > Check for these conditions and enable error reporting during
> > portdrv
> > probing.
> > 
> > Example case:
> 
> pcie_portdrv_probe(1:00:00.0):
> > [  343.790573] pcieport 1:00:00.0: pci_disable_pcie_error_reporting
> 
> aer_probe(1:00:00.0):
> > [  343.809812] pcieport 1:00:00.0: pci_enable_pcie_error_reporting
> > [  343.819506] pci 1:01:00.0: pci_enable_pcie_error_reporting
> > [  343.828814] pci 1:02:00.0: pci_enable_pcie_error_reporting
> > [  343.838089] pci 1:02:01.0: pci_enable_pcie_error_reporting
> > [  343.847478] pci 1:02:02.0: pci_enable_pcie_error_reporting
> > [  343.856659] pci 1:02:03.0: pci_enable_pcie_error_reporting
> > [  343.865794] pci 1:02:04.0: pci_enable_pcie_error_reporting
> > [  343.874875] pci 1:02:05.0: pci_enable_pcie_error_reporting
> > [  343.883918] pci 1:02:06.0: pci_enable_pcie_error_reporting
> > [  343.892922] pci 1:02:07.0: pci_enable_pcie_error_reporting
> 
> pcie_portdrv_probe(1:01:00.0):
> > [  343.918900] pcieport 1:01:00.0: pci_disable_pcie_error_reporting
> 
> pcie_portdrv_probe(1:02:00.0):
> > [  343.968426] pcieport 1:02:00.0: pci_disable_pcie_error_reporting
> 
> ...
> > [  344.028179] pcieport 1:02:01.0: pci_disable_pcie_error_reporting
> > [  344.091269] pcieport 1:02:02.0: pci_disable_pcie_error_reporting
> > [  344.156473] pcieport 1:02:03.0: pci_disable_pcie_error_reporting
> > [  344.238042] pcieport 1:02:04.0: pci_disable_pcie_error_reporting
> > [  344.321864] pcieport 1:02:05.0: pci_disable_pcie_error_reporting
> > [  344.411601] pcieport 1:02:06.0: pci_disable_pcie_error_reporting
> > [  344.505332] pcieport 1:02:07.0: pci_disable_pcie_error_reporting
> > [  344.621824] nvme 1:06:00.0: pci_enable_pcie_error_reporting

The main issue leading to this sequence of events is that the root port
and tree is enumerated, then the tree has drivers attached in-order.
During driver probe, the root port services are attached which enables
AER on the entire bus. Then the port's service drivers are attached,
which disables AER during initialization.


It looks like I don't see this issue in the non-host-bridge case
because this statement isn't executed:
#ifdef CONFIG_PCIEAER
	if (dev->aer_cap && pci_aer_available() &&
	    (pcie_ports_native || host->native_aer)) {
		services |= PCIE_PORT_SERVICE_AER;

		/*
		 * Disable AER on this port in case it's been enabled by the
		 * BIOS (the AER service driver 

Re: [PATCH v2] PCI hotplug Eq v2

2018-09-17 Thread Derrick, Jonathan
On Mon, 2018-09-17 at 15:53 -0500, Bjorn Helgaas wrote:
> On Thu, Aug 30, 2018 at 04:11:59PM -0600, Jon Derrick wrote:
> > Hi Bjorn,
> > 
> > Sorry for the delay on this one and pushing it after RC1.
> > Feel free to queue it up for 4.20 if it looks fine.
> > 
> > I've added comments to the git log and source explaining why
> > calculate_iosize was left unchanged. Basically I could not
> > synthesize a condition where it would have affected the topology.
> 
> In other words, the only reason you didn't change the
> calculate_iosize() path was because you couldn't test it?
> 
I did try, unsuccessfully, to synthesize it in hardware and qemu. The
firmwares didn't provide the necessary topology to hit the flexible IO
provisioning conditions.


> I appreciate your desire to avoid untested changes, but I think it's
> very important to preserve and even improve the symmetry between
> calculate_memsize() and calculate_iosize().  For example, it's not
> obvious why the order is different here:
> 
>   calculate_iosize():
> size = ALIGN(size + size1, align);
> if (size < old_size)
>   size = old_size;
> 
I agree this part didn't make that much sense to me, which was another
reason I left it as-is. Looking at it again, I think it's a harmless
calculation that bounds IO size tightly, but it could also be reordered as
below to provide for the additional IO (assuming this code ever runs).

>   calculate_memsize():
> if (size < old_size)
>   size = old_size;
> size = ALIGN(size + size1, align);
> 
> So I don't want to diverge them further unless there's a real
> functional reason why we need to handle I/O port space differently
> than MMIO space.
> 
> You've tested the MMIO path, and I'm willing to take the risk of
> doing the same thing in the I/O port path.
> 
> Bjorn
Great! I'll follow-up with a patch as soon as I can

Jon




Re: [PATCH] PCI/AER: Fix an AER enabling/disabling race

2018-09-03 Thread Derrick, Jonathan
Hi,

After giving this a few days thought, I think the right way is to call
pci_enable_pcie_error_reporting after portdrv probe, and prevent AER's
pci_walk_bus from enabling err reporting if the port hasn't been
probed.

I'm going to Self-NAK this and follow-up

Sorry for the noise

On Sat, 2018-09-01 at 19:06 -0600, Jon Derrick wrote:
> There is a sequence with non-ACPI root ports where the AER driver can
> enable error reporting on the tree before port drivers have bound to
> ports on the tree. The port driver assumes the AER driver will set up
> error reporting afterwards, so instead add a check if error reporting
> was set up first.
> 
> Example:
> [  343.790573] pcieport 1:00:00.0: pci_disable_pcie_error_reporting
> [  343.809812] pcieport 1:00:00.0: pci_enable_pcie_error_reporting
> [  343.819506] pci 1:01:00.0: pci_enable_pcie_error_reporting
> [  343.828814] pci 1:02:00.0: pci_enable_pcie_error_reporting
> [  343.838089] pci 1:02:01.0: pci_enable_pcie_error_reporting
> [  343.847478] pci 1:02:02.0: pci_enable_pcie_error_reporting
> [  343.856659] pci 1:02:03.0: pci_enable_pcie_error_reporting
> [  343.865794] pci 1:02:04.0: pci_enable_pcie_error_reporting
> [  343.874875] pci 1:02:05.0: pci_enable_pcie_error_reporting
> [  343.883918] pci 1:02:06.0: pci_enable_pcie_error_reporting
> [  343.892922] pci 1:02:07.0: pci_enable_pcie_error_reporting
> [  343.918900] pcieport 1:01:00.0: pci_disable_pcie_error_reporting
> [  343.968426] pcieport 1:02:00.0: pci_disable_pcie_error_reporting
> [  344.028179] pcieport 1:02:01.0: pci_disable_pcie_error_reporting
> [  344.091269] pcieport 1:02:02.0: pci_disable_pcie_error_reporting
> [  344.156473] pcieport 1:02:03.0: pci_disable_pcie_error_reporting
> [  344.238042] pcieport 1:02:04.0: pci_disable_pcie_error_reporting
> [  344.321864] pcieport 1:02:05.0: pci_disable_pcie_error_reporting
> [  344.411601] pcieport 1:02:06.0: pci_disable_pcie_error_reporting
> [  344.505332] pcieport 1:02:07.0: pci_disable_pcie_error_reporting
> [  344.621824] nvme 1:06:00.0: pci_enable_pcie_error_reporting
> 
> Signed-off-by: Jon Derrick 
> ---
>  drivers/pci/pcie/aer.c  | 1 +
>  drivers/pci/pcie/portdrv_core.c | 5 -
>  include/linux/pci.h | 1 +
>  3 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 83180ed..a4e36b6 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1333,6 +1333,7 @@ static int set_device_error_reporting(struct pci_dev *dev, void *data)
>  	if (enable)
>  		pcie_set_ecrc_checking(dev);
>  
> +	dev->aer_configured = 1;
>  	return 0;
>  }
>  
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 7c37d81..f5de554 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -224,8 +224,11 @@ static int get_port_device_capability(struct pci_dev *dev)
>  	/*
>  	 * Disable AER on this port in case it's been enabled by the
>  	 * BIOS (the AER service driver will enable it when necessary).
> +	 * Don't disable it if the AER service driver has already
> +	 * enabled it from the root port bus walking
>  	 */
> -	pci_disable_pcie_error_reporting(dev);
> +	if (!dev->aer_configured)
> +		pci_disable_pcie_error_reporting(dev);
>  	}
>  #endif
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e72ca8d..c071622 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -402,6 +402,7 @@ struct pci_dev {
>  	unsigned int	has_secondary_link:1;
>  	unsigned int	non_compliant_bars:1;	/* Broken BARs; ignore them */
>  	unsigned int	is_probed:1;		/* Device probing in progress */
> +	unsigned int	aer_configured:1;	/* AER configured for device */
>  	pci_dev_flags_t	dev_flags;
>  	atomic_t	enable_cnt;	/* pci_enable_device has been called */
>  



Re: [PATCH] PCI/AER: Fix an AER enabling/disabling race

2018-09-03 Thread Derrick, Jonathan
Hi,

After giving this a few days thought, I think the right way is to call
pci_enable_pcie_error_reporting after portdrv probe, and prevent AER's
pci_walk_bus from enabling err reporting if the port hasn't been
probed.

I'm going to Self-NAK this and follow-up

Sorry for the noise

On Sat, 2018-09-01 at 19:06 -0600, Jon Derrick wrote:
> There is a sequence with non-ACPI root ports where the AER driver can
> enable error reporting on the tree before port drivers have bound to
> ports on the tree. The port driver assumes the AER driver will set up
> error reporting afterwards, so instead add a check if error reporting
> was set up first.
> 
> Example:
> [  343.790573] pcieport 1:00:00.0:
> pci_disable_pcie_error_reporting
> [  343.809812] pcieport 1:00:00.0:
> pci_enable_pcie_error_reporting
> [  343.819506] pci 1:01:00.0: pci_enable_pcie_error_reporting
> [  343.828814] pci 1:02:00.0: pci_enable_pcie_error_reporting
> [  343.838089] pci 1:02:01.0: pci_enable_pcie_error_reporting
> [  343.847478] pci 1:02:02.0: pci_enable_pcie_error_reporting
> [  343.856659] pci 1:02:03.0: pci_enable_pcie_error_reporting
> [  343.865794] pci 1:02:04.0: pci_enable_pcie_error_reporting
> [  343.874875] pci 1:02:05.0: pci_enable_pcie_error_reporting
> [  343.883918] pci 1:02:06.0: pci_enable_pcie_error_reporting
> [  343.892922] pci 1:02:07.0: pci_enable_pcie_error_reporting
> [  343.918900] pcieport 1:01:00.0:
> pci_disable_pcie_error_reporting
> [  343.968426] pcieport 1:02:00.0:
> pci_disable_pcie_error_reporting
> [  344.028179] pcieport 1:02:01.0:
> pci_disable_pcie_error_reporting
> [  344.091269] pcieport 1:02:02.0:
> pci_disable_pcie_error_reporting
> [  344.156473] pcieport 1:02:03.0:
> pci_disable_pcie_error_reporting
> [  344.238042] pcieport 1:02:04.0:
> pci_disable_pcie_error_reporting
> [  344.321864] pcieport 1:02:05.0:
> pci_disable_pcie_error_reporting
> [  344.411601] pcieport 1:02:06.0:
> pci_disable_pcie_error_reporting
> [  344.505332] pcieport 1:02:07.0:
> pci_disable_pcie_error_reporting
> [  344.621824] nvme 1:06:00.0: pci_enable_pcie_error_reporting
> 
> Signed-off-by: Jon Derrick 
> ---
>  drivers/pci/pcie/aer.c  | 1 +
>  drivers/pci/pcie/portdrv_core.c | 5 -
>  include/linux/pci.h | 1 +
>  3 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 83180ed..a4e36b6 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1333,6 +1333,7 @@ static int set_device_error_reporting(struct pci_dev *dev, void *data)
>   if (enable)
>   pcie_set_ecrc_checking(dev);
>  
> + dev->aer_configured = 1;
>   return 0;
>  }
>  
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 7c37d81..f5de554 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -224,8 +224,11 @@ static int get_port_device_capability(struct pci_dev *dev)
>   /*
>    * Disable AER on this port in case it's been enabled by the
>    * BIOS (the AER service driver will enable it when necessary).
> +  * Don't disable it if the AER service driver has already
> +  * enabled it from the root port bus walk.
>    */
> - pci_disable_pcie_error_reporting(dev);
> + if (!dev->aer_configured)
> +     pci_disable_pcie_error_reporting(dev);
>   }
>  #endif
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e72ca8d..c071622 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -402,6 +402,7 @@ struct pci_dev {
>   unsigned int    has_secondary_link:1;
>   unsigned int    non_compliant_bars:1;   /* Broken BARs; ignore them */
>   unsigned int    is_probed:1;            /* Device probing in progress */
> + unsigned int    aer_configured:1;       /* AER configured for device */
>   pci_dev_flags_t dev_flags;
>   atomic_t        enable_cnt;     /* pci_enable_device has been called */
> 

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 1/2] PCI/DPC: Add 'nodpc' parameter

2018-08-16 Thread Derrick, Jonathan
On Thu, 2018-08-16 at 08:49 -0700, Matthew Wilcox wrote:
> On Wed, Aug 15, 2018 at 03:26:39PM -0600, Jon Derrick wrote:
> > Some users may want to disable downstream port containment (DPC),
> > so
> > give them this option
> 
> Is it possible they might only want to disable DPC on a subset of the
> hierarchy rather than globally?

Absolutely. I was hoping Logan's PCI dev_str series would land, because
I have a few others I want to convert to that API for granular tuning.




Re: [PATCH] PCI: Equalize hotplug memory for non/occupied slots

2018-08-14 Thread Derrick, Jonathan
It's been a few weeks. Thoughts on this one?

On Wed, 2018-07-25 at 17:02 -0600, Jon Derrick wrote:
> Currently, a hotplug bridge will be given hpmemsize additional memory
> if available, in order to satisfy any future hotplug allocation
> requirements.
> 
> These calculations don't consider the current memory size of the
> hotplug bridge/slot, so hotplug bridges/slots which have downstream
> devices will get their current allocation in addition to the
> hpmemsize value.
> 
> This makes for possibly undesirable results with a mix of unoccupied
> and occupied slots (e.g., with hpmemsize=2M):
> 
> 02:03.0 PCI bridge: <-- Occupied
>   Memory behind bridge: d620-d64f [size=3M]
> 02:04.0 PCI bridge: <-- Unoccupied
>   Memory behind bridge: d650-d66f [size=2M]
> 
> This change considers the current allocation size when using the
> hpmemsize parameter to make the reservations predictable for the mix
> of unoccupied and occupied slots:
> 
> 02:03.0 PCI bridge: <-- Occupied
>   Memory behind bridge: d620-d63f [size=2M]
> 02:04.0 PCI bridge: <-- Unoccupied
>   Memory behind bridge: d640-d65f [size=2M]
> 
> Signed-off-by: Jon Derrick 
> ---
> Original RFC here:
> https://patchwork.ozlabs.org/patch/945374/
> 
> I split this bit out from the RFC while awaiting the PCI string
> handling enhancements to handle per-device settings.
> 
> Changed from the RFC is a simpler algorithm.
> 
>  drivers/pci/setup-bus.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 79b1824..5ae39e6 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -831,7 +831,8 @@ static resource_size_t calculate_iosize(resource_size_t size,
>  
>  static resource_size_t calculate_memsize(resource_size_t size,
>   resource_size_t min_size,
> - resource_size_t size1,
> + resource_size_t add_size,
> + resource_size_t children_add_size,
>   resource_size_t old_size,
>   resource_size_t align)
>  {
> @@ -841,7 +842,7 @@ static resource_size_t calculate_memsize(resource_size_t size,
>   old_size = 0;
>   if (size < old_size)
>   size = old_size;
> - size = ALIGN(size + size1, align);
> + size = ALIGN(max(size, add_size) + children_add_size, align);
>   return size;
>  }
>  
> @@ -1079,12 +1080,10 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
>  
>   min_align = calculate_mem_align(aligns, max_order);
>   min_align = max(min_align, window_alignment(bus, b_res->flags));
> - size0 = calculate_memsize(size, min_size, 0, resource_size(b_res), min_align);
> + size0 = calculate_memsize(size, min_size, 0, 0, resource_size(b_res), min_align);
>   add_align = max(min_align, add_align);
> - if (children_add_size > add_size)
> - add_size = children_add_size;
> - size1 = (!realloc_head || (realloc_head && !add_size)) ? size0 :
> - calculate_memsize(size, min_size, add_size,
> + size1 = (!realloc_head || (realloc_head && !add_size && !children_add_size)) ? size0 :
> + calculate_memsize(size, min_size, add_size, children_add_size,
>   resource_size(b_res), add_align);
>   if (!size0 && !size1) {
>   if (b_res->start || b_res->end)





Re: [PATCH][RESEND] block: sed-opal: fix response string extraction

2018-03-06 Thread Derrick, Jonathan
This looks correct.

Adding my Ack unless Scott has objections

Acked-by: Jonathan Derrick 

On Thu, 2018-03-01 at 14:26 +0100, Jonas Rabenstein wrote:
> Tokens are prefixed by a variable number of length bytes. If a
> bytestring is not stored in a tiny or short atom, we have to skip
> more than one byte so that the returned bytes are not prefixed by the
> bytes describing the actual length of the string.
> 
> Signed-off-by: Jonas Rabenstein  n.de>
> ---
>  block/sed-opal.c | 26 +++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index 525506bed399..33052d0111de 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -876,6 +876,9 @@ static int response_parse(const u8 *buf, size_t length,
>  static size_t response_get_string(const struct parsed_resp *resp, int n,
>   const char **store)
>  {
> + u8 skip;
> + const struct opal_resp_tok *token;
> +
>   *store = NULL;
>   if (!resp) {
>   pr_debug("Response is NULL\n");
> @@ -888,13 +891,30 @@ static size_t response_get_string(const struct parsed_resp *resp, int n,
>   return 0;
>   }
>  
> - if (resp->toks[n].type != OPAL_DTA_TOKENID_BYTESTRING) {
> + token = &resp->toks[n];
> + if (token->type != OPAL_DTA_TOKENID_BYTESTRING) {
>   pr_debug("Token is not a byte string!\n");
>   return 0;
>   }
>  
> - *store = resp->toks[n].pos + 1;
> - return resp->toks[n].len - 1;
> + switch (token->width) {
> + case OPAL_WIDTH_TINY:
> + case OPAL_WIDTH_SHORT:
> + skip = 1;
> + break;
> + case OPAL_WIDTH_MEDIUM:
> + skip = 2;
> + break;
> + case OPAL_WIDTH_LONG:
> + skip = 4;
> + break;
> + default:
> + pr_debug("Token has invalid width!\n");
> + return 0;
> + }
> +
> + *store = token->pos + skip;
> + return token->len - skip;
>  }
>  
>  static u64 response_get_u64(const struct parsed_resp *resp, int n)





Re: [PATCH][RESEND] block: sed-opal: fix u64 short atom length

2018-03-06 Thread Derrick, Jonathan
Hi Jonas,

On Thu, 2018-03-01 at 14:27 +0100, Jonas Rabenstein wrote:
> The length must be given in bytes, not in 4-bit nibbles.
> 
> Signed-off-by: Jonas Rabenstein  n.de>
> ---
>  block/sed-opal.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index 36842bfa572e..d5f565e1557a 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -562,7 +562,7 @@ static void add_token_u64(int *err, struct opal_dev *cmd, u64 number)
>   }
>  
>   msb = fls(number);
> - len = DIV_ROUND_UP(msb, 4);
> + len = DIV_ROUND_UP(msb, 8);

This change looks partially correct, but I believe we should be doing
fls64() on 'number' as well.

It looks like it currently coincidentally works with u64 numbers
falling in 32-bit ranges.


>  
>   if (cmd->pos >= IO_BUFFER_LENGTH - len - 1) {
>   pr_debug("Error adding u64: end of buffer.\n");


