Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: > > Introduce a new pcibios function pcibios_ignore_alignment_request > which allows the PCI core to defer to platform-specific code to > determine whether or not to ignore alignment requests for PCI resources. > > The existing behavior is to simply ignore alignment requests when > PCI_PROBE_ONLY is set. This is behavior is maintained by the > default implementation of pcibios_ignore_alignment_request. > > Signed-off-by: Shawn Anastasio > --- > drivers/pci/pci.c | 9 +++-- > include/linux/pci.h | 1 + > 2 files changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 8abc843b1615..8207a09085d1 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) > return 0; > } > > +int __weak pcibios_ignore_alignment_request(void) > +{ > + return pci_has_flag(PCI_PROBE_ONLY); > +} > + > #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE > static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; > static DEFINE_SPINLOCK(resource_alignment_lock); > @@ -5906,9 +5911,9 @@ static resource_size_t > pci_specified_resource_alignment(struct pci_dev *dev, > p = resource_alignment_param; > if (!*p && !align) > goto out; > - if (pci_has_flag(PCI_PROBE_ONLY)) { > + if (pcibios_ignore_alignment_request()) { > align = 0; > - pr_info_once("PCI: Ignoring requested alignments > (PCI_PROBE_ONLY)\n"); > + pr_info_once("PCI: Ignoring requested alignments\n"); > goto out; > } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... IIRC Sam Bobroff was looking at hotplug under pseries recently so he might have something to add. He's sick at the moment, but I'll ask him to take a look at this once he's back among the living > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 4a5a84d7bdd4..47471dcdbaf9 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, > int active) {} > int pcibios_alloc_irq(struct pci_dev *dev); > void pcibios_free_irq(struct pci_dev *dev); > resource_size_t pcibios_default_alignment(void); > +int pcibios_ignore_alignment_request(void); > > #ifdef CONFIG_HIBERNATE_CALLBACKS > extern struct dev_pm_ops pcibios_pm_ops; > -- > 2.20.1 >
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 5/28/19 12:36 AM, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... Interesting, I wasn't aware that hotplugged devices are configured by the hypervisor on PowerVM. That at least means that this patch is wrong as-is since it won't handle that properly. Definitely seems like there will need to be different behavior here depending on the hypervisor. That being said, wouldn't PCI_PROBE_ONLY still be set on pseries/kvm (at least for initial boot) to observe SLOF's original BAR assignments? Perhaps it should be un-set after initial PCI init? IIRC Sam Bobroff was looking at hotplug under pseries recently so he might have something to add. He's sick at the moment, but I'll ask him to take a look at this once he's back among the living Good to know. I'll await his comments before continuing here. diff --git a/include/linux/pci.h b/include/linux/pci.h index 4a5a84d7bdd4..47471dcdbaf9 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {} int pcibios_alloc_irq(struct pci_dev *dev); void pcibios_free_irq(struct pci_dev *dev); resource_size_t pcibios_default_alignment(void); +int pcibios_ignore_alignment_request(void); #ifdef CONFIG_HIBERNATE_CALLBACKS extern struct dev_pm_ops pcibios_pm_ops; -- 2.20.1
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 28/05/2019 15:36, Oliver wrote: > On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: >> >> Introduce a new pcibios function pcibios_ignore_alignment_request >> which allows the PCI core to defer to platform-specific code to >> determine whether or not to ignore alignment requests for PCI resources. >> >> The existing behavior is to simply ignore alignment requests when >> PCI_PROBE_ONLY is set. This is behavior is maintained by the >> default implementation of pcibios_ignore_alignment_request. >> >> Signed-off-by: Shawn Anastasio >> --- >> drivers/pci/pci.c | 9 +++-- >> include/linux/pci.h | 1 + >> 2 files changed, 8 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> index 8abc843b1615..8207a09085d1 100644 >> --- a/drivers/pci/pci.c >> +++ b/drivers/pci/pci.c >> @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) >> return 0; >> } >> >> +int __weak pcibios_ignore_alignment_request(void) >> +{ >> + return pci_has_flag(PCI_PROBE_ONLY); >> +} >> + >> #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE >> static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; >> static DEFINE_SPINLOCK(resource_alignment_lock); >> @@ -5906,9 +5911,9 @@ static resource_size_t >> pci_specified_resource_alignment(struct pci_dev *dev, >> p = resource_alignment_param; >> if (!*p && !align) >> goto out; >> - if (pci_has_flag(PCI_PROBE_ONLY)) { >> + if (pcibios_ignore_alignment_request()) { >> align = 0; >> - pr_info_once("PCI: Ignoring requested alignments >> (PCI_PROBE_ONLY)\n"); >> + pr_info_once("PCI: Ignoring requested alignments\n"); >> goto out; >> } > > I think the logic here is questionable to begin with. If the user has > explicitly requested re-aligning a resource via the command line then > we should probably do it even if PCI_PROBE_ONLY is set. When it breaks > they get to keep the pieces. > > That said, the real issue here is that PCI_PROBE_ONLY probably > shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) > hotplugged devices are configured by firmware before it's passed to > the guest and we need to keep the FW assignments otherwise things > break. QEMU however doesn't do any BAR assignments and relies on that > being handled by the guest. At boot time this is done by SLOF, but > Linux only keeps SLOF around until it's extracted the device-tree. > Once that's done SLOF gets blown away and the kernel needs to do it's > own BAR assignments. I'm guessing there's a hack in there to make it > work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: [c6e6f960] [c05f62d4] pci_assign_resource+0x44/0x360 [c6e6fa10] [c05f8b54] assign_requested_resources_sorted+0x84/0x110 [c6e6fa60] [c05f9540] __assign_resources_sorted+0xd0/0x750 [c6e6fb40] [c05fb2e0] __pci_bus_assign_resources+0x80/0x280 [c6e6fc00] [c05fb95c] pci_assign_unassigned_bus_resources+0xbc/0x100 [c6e6fc60] [c05e3d74] pci_rescan_bus+0x34/0x60 [c6e6fc90] [c05f1ef4] rescan_store+0x84/0xc0 [c6e6fcd0] [c068060c] bus_attr_store+0x3c/0x60 [c6e6fcf0] [c037853c] sysfs_kf_write+0x5c/0x80 > > IIRC Sam Bobroff was looking at hotplug under pseries recently so he > might have something to add. He's sick at the moment, but I'll ask him > to take a look at this once he's back among the living > >> diff --git a/include/linux/pci.h b/include/linux/pci.h >> index 4a5a84d7bdd4..47471dcdbaf9 100644 >> --- a/include/linux/pci.h >> +++ b/include/linux/pci.h >> @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, >> int active) {} >> int pcibios_alloc_irq(struct pci_dev *dev); >> void pcibios_free_irq(struct pci_dev *dev); >> resource_size_t pcibios_default_alignment(void); >> +int pcibios_ignore_alignment_request(void); >> >> #ifdef CONFIG_HIBERNATE_CALLBACKS >> extern struct dev_pm_ops pcibios_pm_ops; >> -- >> 2.20.1 >> -- Alexey
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: On 28/05/2019 15:36, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: Interesting. Does this mean that the PHYP hotplug path doesn't call pci_assign_resource? If so it means the patch may not break that platform after all, though it still may not be the correct way of doing things. [c6e6f960] [c05f62d4] pci_assign_resource+0x44/0x360 [c6e6fa10] [c05f8b54] assign_requested_resources_sorted+0x84/0x110 [c6e6fa60] [c05f9540] __assign_resources_sorted+0xd0/0x750 [c6e6fb40] [c05fb2e0] __pci_bus_assign_resources+0x80/0x280 [c6e6fc00] [c05fb95c] pci_assign_unassigned_bus_resources+0xbc/0x100 [c6e6fc60] [c05e3d74] pci_rescan_bus+0x34/0x60 [c6e6fc90] [c05f1ef4] rescan_store+0x84/0xc0 [c6e6fcd0] [c068060c] bus_attr_store+0x3c/0x60 [c6e6fcf0] [c037853c] sysfs_kf_write+0x5c/0x80 IIRC Sam Bobroff was looking at hotplug under pseries recently so he might have something to add. He's sick at the moment, but I'll ask him to take a look at this once he's back among the living diff --git a/include/linux/pci.h b/include/linux/pci.h index 4a5a84d7bdd4..47471dcdbaf9 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {} int pcibios_alloc_irq(struct pci_dev *dev); void pcibios_free_irq(struct pci_dev *dev); resource_size_t pcibios_default_alignment(void); +int pcibios_ignore_alignment_request(void); #ifdef CONFIG_HIBERNATE_CALLBACKS extern struct dev_pm_ops pcibios_pm_ops; -- 2.20.1
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On Tue, May 28, 2019 at 03:36:34PM +1000, Oliver wrote: > On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: > > > > Introduce a new pcibios function pcibios_ignore_alignment_request > > which allows the PCI core to defer to platform-specific code to > > determine whether or not to ignore alignment requests for PCI resources. > > > > The existing behavior is to simply ignore alignment requests when > > PCI_PROBE_ONLY is set. This is behavior is maintained by the > > default implementation of pcibios_ignore_alignment_request. > > > > Signed-off-by: Shawn Anastasio > > --- > > drivers/pci/pci.c | 9 +++-- > > include/linux/pci.h | 1 + > > 2 files changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index 8abc843b1615..8207a09085d1 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -5882,6 +5882,11 @@ resource_size_t __weak > > pcibios_default_alignment(void) > > return 0; > > } > > > > +int __weak pcibios_ignore_alignment_request(void) > > +{ > > + return pci_has_flag(PCI_PROBE_ONLY); > > +} > > + > > #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE > > static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; > > static DEFINE_SPINLOCK(resource_alignment_lock); > > @@ -5906,9 +5911,9 @@ static resource_size_t > > pci_specified_resource_alignment(struct pci_dev *dev, > > p = resource_alignment_param; > > if (!*p && !align) > > goto out; > > - if (pci_has_flag(PCI_PROBE_ONLY)) { > > + if (pcibios_ignore_alignment_request()) { > > align = 0; > > - pr_info_once("PCI: Ignoring requested alignments > > (PCI_PROBE_ONLY)\n"); > > + pr_info_once("PCI: Ignoring requested alignments\n"); > > goto out; > > } > > I think the logic here is questionable to begin with. If the user has > explicitly requested re-aligning a resource via the command line then > we should probably do it even if PCI_PROBE_ONLY is set. When it breaks > they get to keep the pieces. I agree. I don't like PCI_PROBE_ONLY in the first place. It's a sledgehammer approach that doesn't tell us which resource assignments need to be preserved or why. I'd rather use IORESOURCE_PCI_FIXED and set it for the BARs where there's actually some sort of hypervisor/firmware/OS dependency. If there's a way to avoid another pciobios_*() weak function, that would also be better. Bjorn
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 28/05/2019 17:39, Shawn Anastasio wrote: > > > On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: >> >> >> On 28/05/2019 15:36, Oliver wrote: >>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio >>> wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } >>> >>> I think the logic here is questionable to begin with. If the user has >>> explicitly requested re-aligning a resource via the command line then >>> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks >>> they get to keep the pieces. >>> >>> That said, the real issue here is that PCI_PROBE_ONLY probably >>> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) >>> hotplugged devices are configured by firmware before it's passed to >>> the guest and we need to keep the FW assignments otherwise things >>> break. QEMU however doesn't do any BAR assignments and relies on that >>> being handled by the guest. At boot time this is done by SLOF, but >>> Linux only keeps SLOF around until it's extracted the device-tree. >>> Once that's done SLOF gets blown away and the kernel needs to do it's >>> own BAR assignments. I'm guessing there's a hack in there to make it >>> work today, but it's a little surprising that it works at all... >> >> >> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the >> guest which receives an event from qemu (RAS_EPOW from >> /proc/interrupts), fetches device tree chunks (and as I understand it - >> they come with BARs from phyp but without from qemu) and writes "1" to >> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: > > Interesting. Does this mean that the PHYP hotplug path doesn't > call pci_assign_resource? I'd expect dlpar_add_slot() to be called under phyp and eventually pci_device_add() which (I think) may or may not trigger later reassignment. > If so it means the patch may not > break that platform after all, though it still may not be > the correct way of doing things. We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems that (unless resource_alignment= is used) the pseries guest should just walk through all allocated resources and leave them unchanged. >> [c6e6f960] [c05f62d4] pci_assign_resource+0x44/0x360 >> >> [c6e6fa10] [c05f8b54] >> assign_requested_resources_sorted+0x84/0x110 >> [c6e6fa60] [c05f9540] >> __assign_resources_sorted+0xd0/0x750 >> [c6e6fb40] [c05fb2e0] >> __pci_bus_assign_resources+0x80/0x280 >> [c6e6fc00] [c05fb95c] >> pci_assign_unassigned_bus_resources+0xbc/0x100 >> [c6e6fc60] [c05e3d74] pci_rescan_bus+0x34/0x60 >> >> [c6e6fc90] [c05f1ef4] rescan_store+0x84/0xc0 >> >> [c6e6fcd0] [c068060c] bus_attr_store+0x3c/0x60 >> >> [c6e6fcf0] [c037853c] sysfs_kf_write+0x5c/0x80 >> >> >> >> >> >>> >>> IIRC Sam Bobroff was looking at hotplug under pseries recently so he >>> might have something to add. He's sick at the moment, but I'll ask him >>> to take a look at this once he's back among the living >>> diff --git a/include/linux/pci.h b/include/linux/pci.h index 4a5a84d7bdd4..47471dcdbaf9 100644 --- a/include/linux/pci.h
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: On 28/05/2019 17:39, Shawn Anastasio wrote: On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: On 28/05/2019 15:36, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: Interesting. Does this mean that the PHYP hotplug path doesn't call pci_assign_resource? I'd expect dlpar_add_slot() to be called under phyp and eventually pci_device_add() which (I think) may or may not trigger later reassignment. If so it means the patch may not break that platform after all, though it still may not be the correct way of doing things. We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems that (unless resource_alignment= is used) the pseries guest should just walk through all allocated resources and leave them unchanged. If we add a pcibios_default_alignment() implementation like was suggested earlier, then it will behave as if the user has specified resource_alignment= by default and SLOF's assignments won't be honored (I think). I guess it boils down to one question - is it important that we observe SLOF's initial BAR assignments? If not, the device tree modification that Sam found would work fine here. Otherwise, we need a way to honor the initial assignments from SLOF while still allowing custom alignments for hotplugged devices, either by deferring to the platform code like I do here, unsetting PCI_PROBE_ONLY in certain cases or by using IORESOURCE_PCI_FIXED like Bjorn suggested. [c6e6f960] [c05f62d4] pci_assign_resource+0x44/0x360 [c6e6fa10] [c05f8b54] assign_requested_resources_sorted+0x84/0x110 [c6e6fa60] [c05f9540] __assign_resources_sorted+0xd0/0x750 [c6e6fb40] [c05fb2e0] __pci_bus_assign_resources+0x80/0x280 [c6e6fc00] [c05fb95c] pci_assign_unassigned_bus_resources+0xbc/0x100 [c6e6fc60] [c05e3d74] pci_rescan_bus+0x34/0x60 [c6e6fc90] [c05f1ef4] rescan_store+0x84/0xc0 [c6e6fcd0] [c068060c] bus_attr_store+0x3c/0x60 [c6e6fcf0] [c037853c] sysfs_kf_write+0x5c/0x80
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 31/05/2019 08:49, Shawn Anastasio wrote: > On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: >> >> >> On 28/05/2019 17:39, Shawn Anastasio wrote: >>> >>> >>> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: On 28/05/2019 15:36, Oliver wrote: > On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio > wrote: >> >> Introduce a new pcibios function pcibios_ignore_alignment_request >> which allows the PCI core to defer to platform-specific code to >> determine whether or not to ignore alignment requests for PCI >> resources. >> >> The existing behavior is to simply ignore alignment requests when >> PCI_PROBE_ONLY is set. This is behavior is maintained by the >> default implementation of pcibios_ignore_alignment_request. >> >> Signed-off-by: Shawn Anastasio >> --- >> drivers/pci/pci.c | 9 +++-- >> include/linux/pci.h | 1 + >> 2 files changed, 8 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> index 8abc843b1615..8207a09085d1 100644 >> --- a/drivers/pci/pci.c >> +++ b/drivers/pci/pci.c >> @@ -5882,6 +5882,11 @@ resource_size_t __weak >> pcibios_default_alignment(void) >> return 0; >> } >> >> +int __weak pcibios_ignore_alignment_request(void) >> +{ >> + return pci_has_flag(PCI_PROBE_ONLY); >> +} >> + >> #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE >> static char >> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; >> static DEFINE_SPINLOCK(resource_alignment_lock); >> @@ -5906,9 +5911,9 @@ static resource_size_t >> pci_specified_resource_alignment(struct pci_dev *dev, >> p = resource_alignment_param; >> if (!*p && !align) >> goto out; >> - if (pci_has_flag(PCI_PROBE_ONLY)) { >> + if (pcibios_ignore_alignment_request()) { >> align = 0; >> - pr_info_once("PCI: Ignoring requested alignments >> (PCI_PROBE_ONLY)\n"); >> + pr_info_once("PCI: Ignoring requested alignments\n"); >> goto out; >> } > > I think the logic here is questionable to begin with. If the user has > explicitly requested re-aligning a resource via the command line then > we should probably do it even if PCI_PROBE_ONLY is set. When it breaks > they get to keep the pieces. > > That said, the real issue here is that PCI_PROBE_ONLY probably > shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) > hotplugged devices are configured by firmware before it's passed to > the guest and we need to keep the FW assignments otherwise things > break. QEMU however doesn't do any BAR assignments and relies on that > being handled by the guest. At boot time this is done by SLOF, but > Linux only keeps SLOF around until it's extracted the device-tree. > Once that's done SLOF gets blown away and the kernel needs to do it's > own BAR assignments. I'm guessing there's a hack in there to make it > work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: >>> >>> Interesting. Does this mean that the PHYP hotplug path doesn't >>> call pci_assign_resource? >> >> >> I'd expect dlpar_add_slot() to be called under phyp and eventually >> pci_device_add() which (I think) may or may not trigger later >> reassignment. >> >> >>> If so it means the patch may not >>> break that platform after all, though it still may not be >>> the correct way of doing things. >> >> >> We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems >> that (unless resource_alignment= is used) the pseries guest should just >> walk through all allocated resources and leave them unchanged. > > If we add a pcibios_default_alignment() implementation like was > suggested earlier, then it will behave as if the user has > specified resource_alignment= by default and SLOF's assignments > won't be honored (I think). I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and tried booting with and without pci=resource_alignment= and I can see no difference - BARs are still aligned to 64K as programmed in SLOF; if I hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves them unchanged. > I guess it boils down to one question - is it important that we > observe SLOF's initial BAR assignments? It isn't if it's SLOF but it is if it's phyp. It used to not allow/support BAR reassignment and even if it d
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote: On 31/05/2019 08:49, Shawn Anastasio wrote: On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: On 28/05/2019 17:39, Shawn Anastasio wrote: On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: On 28/05/2019 15:36, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: Interesting. Does this mean that the PHYP hotplug path doesn't call pci_assign_resource? I'd expect dlpar_add_slot() to be called under phyp and eventually pci_device_add() which (I think) may or may not trigger later reassignment. If so it means the patch may not break that platform after all, though it still may not be the correct way of doing things. We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems that (unless resource_alignment= is used) the pseries guest should just walk through all allocated resources and leave them unchanged. If we add a pcibios_default_alignment() implementation like was suggested earlier, then it will behave as if the user has specified resource_alignment= by default and SLOF's assignments won't be honored (I think). I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and tried booting with and without pci=resource_alignment= and I can see no difference - BARs are still aligned to 64K as programmed in SLOF; if I hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves them unchanged. I guess it boils down to one question - is it important that we observe SLOF's initial BAR assignments? It isn't if it's SLOF but it is if it's phyp. It used to not allow/support BAR reassignment and even if it does not, I'd rather avoid touching them. A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which worked, but if I add an implementation of pcibios_default_alignment which simply returns PAGE_SIZE, my VM fails to boot and many errors from the virtio disk driver are printed to the console. After some investigation, it seems that with pcibios_default_alignment present, Linux will reallocate all resources provided by SLOF on boot. I'm still not sure why exactly t
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 03/06/2019 12:23, Shawn Anastasio wrote: > > > On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote: >> >> >> On 31/05/2019 08:49, Shawn Anastasio wrote: >>> On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: On 28/05/2019 17:39, Shawn Anastasio wrote: > > > On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: >> >> >> On 28/05/2019 15:36, Oliver wrote: >>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio >>> wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } >>> >>> I think the logic here is questionable to begin with. If the user >>> has >>> explicitly requested re-aligning a resource via the command line >>> then >>> we should probably do it even if PCI_PROBE_ONLY is set. When it >>> breaks >>> they get to keep the pieces. >>> >>> That said, the real issue here is that PCI_PROBE_ONLY probably >>> shouldn't be set under qemu/kvm. Under the other hypervisor >>> (PowerVM) >>> hotplugged devices are configured by firmware before it's passed to >>> the guest and we need to keep the FW assignments otherwise things >>> break. QEMU however doesn't do any BAR assignments and relies on >>> that >>> being handled by the guest. At boot time this is done by SLOF, but >>> Linux only keeps SLOF around until it's extracted the device-tree. >>> Once that's done SLOF gets blown away and the kernel needs to do >>> it's >>> own BAR assignments. I'm guessing there's a hack in there to make it >>> work today, but it's a little surprising that it works at all... >> >> >> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the >> guest which receives an event from qemu (RAS_EPOW from >> /proc/interrupts), fetches device tree chunks (and as I understand >> it - >> they come with BARs from phyp but without from qemu) and writes >> "1" to >> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: > > Interesting. Does this mean that the PHYP hotplug path doesn't > call pci_assign_resource? I'd expect dlpar_add_slot() to be called under phyp and eventually pci_device_add() which (I think) may or may not trigger later reassignment. > If so it means the patch may not > break that platform after all, though it still may not be > the correct way of doing things. We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems that (unless resource_alignment= is used) the pseries guest should just walk through all allocated resources and leave them unchanged. >>> >>> If we add a pcibios_default_alignment() implementation like was >>> suggested earlier, then it will behave as if the user has >>> specified resource_alignment= by default and SLOF's assignments >>> won't be honored (I think). >> >> >> I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and >> tried booting with and without pci=
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 03/06/2019 15:02, Alexey Kardashevskiy wrote: > > > On 03/06/2019 12:23, Shawn Anastasio wrote: >> >> >> On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote: >>> >>> >>> On 31/05/2019 08:49, Shawn Anastasio wrote: On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: > > > On 28/05/2019 17:39, Shawn Anastasio wrote: >> >> >> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: >>> >>> >>> On 28/05/2019 15:36, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: > > Introduce a new pcibios function pcibios_ignore_alignment_request > which allows the PCI core to defer to platform-specific code to > determine whether or not to ignore alignment requests for PCI > resources. > > The existing behavior is to simply ignore alignment requests when > PCI_PROBE_ONLY is set. This is behavior is maintained by the > default implementation of pcibios_ignore_alignment_request. > > Signed-off-by: Shawn Anastasio > --- > drivers/pci/pci.c | 9 +++-- > include/linux/pci.h | 1 + > 2 files changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 8abc843b1615..8207a09085d1 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -5882,6 +5882,11 @@ resource_size_t __weak > pcibios_default_alignment(void) > return 0; > } > > +int __weak pcibios_ignore_alignment_request(void) > +{ > + return pci_has_flag(PCI_PROBE_ONLY); > +} > + > #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE > static char > resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; > static DEFINE_SPINLOCK(resource_alignment_lock); > @@ -5906,9 +5911,9 @@ static resource_size_t > pci_specified_resource_alignment(struct pci_dev *dev, > p = resource_alignment_param; > if (!*p && !align) > goto out; > - if (pci_has_flag(PCI_PROBE_ONLY)) { > + if (pcibios_ignore_alignment_request()) { > align = 0; > - pr_info_once("PCI: Ignoring requested alignments > (PCI_PROBE_ONLY)\n"); > + pr_info_once("PCI: Ignoring requested > alignments\n"); > goto out; > } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... >>> >>> >>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the >>> guest which receives an event from qemu (RAS_EPOW from >>> /proc/interrupts), fetches device tree chunks (and as I understand >>> it - >>> they come with BARs from phyp but without from qemu) and writes >>> "1" to >>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: >> >> Interesting. Does this mean that the PHYP hotplug path doesn't >> call pci_assign_resource? > > > I'd expect dlpar_add_slot() to be called under phyp and eventually > pci_device_add() which (I think) may or may not trigger later > reassignment. > > >> If so it means the patch may not >> break that platform after all, though it still may not be >> the correct way of doing things. > > > We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems > that (unless resource_alignment= is used) the pseries guest should just > walk through all allocated resources and leave them unchanged. If we add a pcibios_default_alignment() implementation like was suggested earlier, then it will behave as if the user has specified resource_alignment= by defa
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 6/3/19 3:35 AM, Alexey Kardashevskiy wrote: On 03/06/2019 15:02, Alexey Kardashevskiy wrote: On 03/06/2019 12:23, Shawn Anastasio wrote: On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote: On 31/05/2019 08:49, Shawn Anastasio wrote: On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: On 28/05/2019 17:39, Shawn Anastasio wrote: On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: On 28/05/2019 15:36, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: Interesting. Does this mean that the PHYP hotplug path doesn't call pci_assign_resource? I'd expect dlpar_add_slot() to be called under phyp and eventually pci_device_add() which (I think) may or may not trigger later reassignment. If so it means the patch may not break that platform after all, though it still may not be the correct way of doing things. We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems that (unless resource_alignment= is used) the pseries guest should just walk through all allocated resources and leave them unchanged. If we add a pcibios_default_alignment() implementation like was suggested earlier, then it will behave as if the user has specified resource_alignment= by default and SLOF's assignments won't be honored (I think). I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and tried booting with and without pci=resource_alignment= and I can see no difference - BARs are still aligned to 64K as programmed in SLOF; if I hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves them unchanged. I guess it boils down to one question - is it important that we observe SLOF's initial BAR assignments? It isn't if it's SLOF but it is if it's phyp. It used to not allow/support BAR reassignment and even if it does not, I'd rather avoid touching them. A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which worked, but if I add an implementation of pcibios_default_alignment which simply returns PAGE_SIZE, my VM fails to boot and many errors from the virtio disk driver are printed to the console. After some
Re: [EXTERNAL] Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On Tue, May 28, 2019 at 03:36:34PM +1000, Oliver wrote: > On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: > > > > Introduce a new pcibios function pcibios_ignore_alignment_request > > which allows the PCI core to defer to platform-specific code to > > determine whether or not to ignore alignment requests for PCI resources. > > > > The existing behavior is to simply ignore alignment requests when > > PCI_PROBE_ONLY is set. This is behavior is maintained by the > > default implementation of pcibios_ignore_alignment_request. > > > > Signed-off-by: Shawn Anastasio > > --- > > drivers/pci/pci.c | 9 +++-- > > include/linux/pci.h | 1 + > > 2 files changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index 8abc843b1615..8207a09085d1 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -5882,6 +5882,11 @@ resource_size_t __weak > > pcibios_default_alignment(void) > > return 0; > > } > > > > +int __weak pcibios_ignore_alignment_request(void) > > +{ > > + return pci_has_flag(PCI_PROBE_ONLY); > > +} > > + > > #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE > > static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; > > static DEFINE_SPINLOCK(resource_alignment_lock); > > @@ -5906,9 +5911,9 @@ static resource_size_t > > pci_specified_resource_alignment(struct pci_dev *dev, > > p = resource_alignment_param; > > if (!*p && !align) > > goto out; > > - if (pci_has_flag(PCI_PROBE_ONLY)) { > > + if (pcibios_ignore_alignment_request()) { > > align = 0; > > - pr_info_once("PCI: Ignoring requested alignments > > (PCI_PROBE_ONLY)\n"); > > + pr_info_once("PCI: Ignoring requested alignments\n"); > > goto out; > > } > > I think the logic here is questionable to begin with. If the user has > explicitly requested re-aligning a resource via the command line then > we should probably do it even if PCI_PROBE_ONLY is set. When it breaks > they get to keep the pieces. > > That said, the real issue here is that PCI_PROBE_ONLY probably > shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) > hotplugged devices are configured by firmware before it's passed to > the guest and we need to keep the FW assignments otherwise things > break. QEMU however doesn't do any BAR assignments and relies on that > being handled by the guest. At boot time this is done by SLOF, but > Linux only keeps SLOF around until it's extracted the device-tree. > Once that's done SLOF gets blown away and the kernel needs to do it's > own BAR assignments. I'm guessing there's a hack in there to make it > work today, but it's a little surprising that it works at all... > > IIRC Sam Bobroff was looking at hotplug under pseries recently so he > might have something to add. He's sick at the moment, but I'll ask him > to take a look at this once he's back among the living There seems to be some code already in the kernel that will disable PCI_PROBE_ONLY based on a device tree property, so I did a quick test today and it seems to work. Only a trivial tweak is needed in QEMU to do it (have spapr_dt_chosen() add a node called "linux,pci-probe-only" with a value of 0), and that would allow us to set it only for QEMU (and not PowerVM) if that's what we want to do. Is that useful? (I haven't done any real testing yet but the guest booted up OK.) > > diff --git a/include/linux/pci.h b/include/linux/pci.h > > index 4a5a84d7bdd4..47471dcdbaf9 100644 > > --- a/include/linux/pci.h > > +++ b/include/linux/pci.h > > @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, > > int active) {} > > int pcibios_alloc_irq(struct pci_dev *dev); > > void pcibios_free_irq(struct pci_dev *dev); > > resource_size_t pcibios_default_alignment(void); > > +int pcibios_ignore_alignment_request(void); > > > > #ifdef CONFIG_HIBERNATE_CALLBACKS > > extern struct dev_pm_ops pcibios_pm_ops; > > -- > > 2.20.1 > > > signature.asc Description: PGP signature
Re: [EXTERNAL] Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 5/30/19 1:55 AM, Sam Bobroff wrote: On Tue, May 28, 2019 at 03:36:34PM +1000, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... IIRC Sam Bobroff was looking at hotplug under pseries recently so he might have something to add. He's sick at the moment, but I'll ask him to take a look at this once he's back among the living There seems to be some code already in the kernel that will disable PCI_PROBE_ONLY based on a device tree property, so I did a quick test today and it seems to work. Only a trivial tweak is needed in QEMU to do it (have spapr_dt_chosen() add a node called "linux,pci-probe-only" with a value of 0), and that would allow us to set it only for QEMU (and not PowerVM) if that's what we want to do. Is that useful? (I haven't done any real testing yet but the guest booted up OK.) It was my understanding that PCI_PROBE_ONLY should actually be set initially so that Linux uses SLOF's BAR assignments. The issue here is that PCI_PROBE_ONLY shouldn't be honored after initial bringup on KVM so that hotplugged PCI devices can have custom BAR alignments. Of course, if there's no need to honor SLOF's initial assignments, I assume disabling PCI_PROBE_ONLY would work fine. In fact, I'm not entirely sure why it's done in the first place. Does anybody know? If there is actually a valid reason for preserving SLOF's initial assignments, then it seems like the correct solution is to disable PCI_PROBE_ONLY after initial PCI bringup or ignore it in pci_specified_resource_alignment() like I do in this patch set. Bjorn Helgaas also suggested marking individual resources provided by SLOF/PHYP with IORESOURCE_PCI_FIXED which would remove the need to use PCI_PROBE_ONLY altogether. Any thoughts? - Shawn