Re: x86/kdump: crashkernel=X try to reserve below 896M first then below 4G and MAXMEM

2017-10-21 Thread Yinghai Lu
On Thu, Oct 19, 2017 at 10:52 PM, Dave Young  wrote:
> Now crashkernel=X will fail if there's not enough memory at low region
> (below 896M) when trying to reserve large memory size.  One can use
> crashkernel=xM,high to reserve it at high region (>4G) but it is more
> convenient to improve crashkernel=X to:
>
>  - First try to reserve X below 896M (for being compatible with old
>    kexec-tools).
>  - If that fails, try to reserve X below 4G (swiotlb needs to stay below 4G).
>  - If fails, try to reserve X from MAXMEM top down.
>
> It's more transparent and user-friendly.

OK with me.

But it looks like Vivek did not like this idea last time.
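For illustration, the proposed fallback order can be modeled in userspace C. This is a hypothetical sketch, not the kernel's memblock-based reservation code: the `free_region` table, `find_top_down()` and the 16M alignment are assumptions made for the example, and MAXMEM is simply passed in as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1M 0x100000ULL

/* Hypothetical free physical memory range [start, end). */
struct free_region {
	uint64_t start;
	uint64_t end;
};

/*
 * Find the highest @align-aligned base below @limit where @size bytes
 * fit inside a single free region. Returns 0 on failure (base 0 is
 * never handed out in this model).
 */
static uint64_t find_top_down(const struct free_region *r, size_t n,
			      uint64_t size, uint64_t align, uint64_t limit)
{
	uint64_t best = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = r[i].end < limit ? r[i].end : limit;

		if (end < size || end - size < r[i].start)
			continue;	/* does not fit below the limit */
		uint64_t base = (end - size) & ~(align - 1);
		if (base >= r[i].start && base > best)
			best = base;
	}
	return best;
}

/* crashkernel=X: try below 896M, then below 4G, then below MAXMEM. */
static uint64_t reserve_crashkernel_base(const struct free_region *r,
					 size_t n, uint64_t size,
					 uint64_t maxmem)
{
	const uint64_t limits[] = { 896 * SZ_1M, 4096 * SZ_1M, maxmem };

	for (size_t i = 0; i < sizeof(limits) / sizeof(limits[0]); i++) {
		uint64_t base = find_top_down(r, n, size, 16 * SZ_1M,
					      limits[i]);
		if (base)
			return base;
	}
	return 0;
}
```

With free ranges at 16M-512M, 1G-3G and 5G-8G, a 256M request lands below 896M, a 1G request falls back to the below-4G window (at 2G), and a 3G request only fits above 4G (at 5G).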


Re: [PATCH V13 1/4] PCI: Don't ignore valid response before CRS timeout

2017-09-03 Thread Yinghai Lu
On Tue, Aug 29, 2017 at 12:53 PM, Bjorn Helgaas  wrote:

>
> Applied this series to pci/enumeration for v4.14.  You didn't include a
> cover letter, but the series includes:
>
>   [V13 1/4] PCI: Don't ignore valid response before CRS timeout
>   [V13 2/4] PCI: Factor out pci_bus_wait_crs()
>   [V13 3/4] PCI: Handle CRS ('device not ready') returned by device af
>   [V13 4/4] PCI: Warn periodically while waiting for device to become
>
> I made some changes:
>
>   - Renamed pci_bus_crs_visibility_pending() to pci_bus_crs_vendor_id()
> because the CRS completion is not "pending".  It is not waiting
> somewhere for us to do something about it.  The CRS completion has
> already occurred and is over.  All we have now is a magic Vendor ID
> value that tells us that it happened.
>
>   - Split addition of pci_bus_crs_vendor_id() to a separate patch, move
> it to probe.c, and make it static.

The call to pci_bus_crs_vendor_id() in pci_bus_read_dev_vendor_id()
could be removed, since pci_bus_wait_crs() already makes that call internally.

-Yinghai
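To illustrate why the extra check is redundant, here is a userspace model in which the wait helper performs the CRS Vendor ID test itself, so a caller that tests it first gains nothing. The helper names, the polling budget and the fake device are hypothetical; only the 0x0001 Vendor ID test corresponds to what pci_bus_crs_vendor_id() checks, and delays between reads are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Vendor ID 0x0001 is what a Root Port returns for a CRS completion
 * when CRS Software Visibility is enabled. */
static bool crs_vendor_id(uint32_t l)
{
	return (l & 0xffff) == 0x0001;
}

/* Hypothetical config-read hook: returns the Vendor/Device ID dword. */
typedef uint32_t (*read_id_fn)(void *ctx);

/*
 * Model of a wait helper: it re-reads while the CRS value is seen and
 * does the CRS check itself, so a caller does not need to test
 * crs_vendor_id() before calling it.
 */
static bool wait_crs(read_id_fn read_id, void *ctx, uint32_t *l,
		     int max_polls)
{
	int polls = 0;

	while (crs_vendor_id(*l)) {
		if (polls++ >= max_polls)
			return false;	/* timed out, device never ready */
		*l = read_id(ctx);
	}
	return true;
}

/* Fake device: answers CRS for the first *ctx reads, then its IDs. */
static uint32_t fake_read(void *ctx)
{
	int *crs_left = ctx;

	if (*crs_left > 0) {
		(*crs_left)--;
		return 0xffff0001;	/* CRS value as modeled here */
	}
	return 0x1c2d8086;	/* device 1c2d, vendor 8086 */
}
```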


[PATCH 03/10] PCI: export symbol for PCI_TEST module

2017-08-05 Thread Yinghai Lu
These functions are needed by the pci_test module, so export them.

Signed-off-by: Yinghai Lu <ying...@kernel.org>
---
 arch/x86/pci/i386.c     | 1 +
 drivers/pci/setup-bus.c | 1 +
 kernel/resource.c       | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 7b43071..9065c58 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -383,6 +383,7 @@ void pcibios_resource_survey_bus(struct pci_bus *bus)
if (!(pci_probe & PCI_ASSIGN_ROMS))
pcibios_allocate_rom_resources(bus);
 }
+EXPORT_SYMBOL_GPL(pcibios_resource_survey_bus);
 
 void __init pcibios_resource_survey(void)
 {
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 1c30102..24292e9 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1839,6 +1839,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
/* dump the resource on buses */
pci_bus_dump_resources(bus);
 }
+EXPORT_SYMBOL_GPL(pci_assign_unassigned_root_bus_resources);
 
 void __init pci_assign_unassigned_resources(void)
 {
diff --git a/kernel/resource.c b/kernel/resource.c
index 4174020..0d40d9a 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -301,6 +301,8 @@ void release_child_resources(struct resource *r)
write_unlock(&resource_lock);
 }
 
+EXPORT_SYMBOL_GPL(release_child_resources);
+
 /**
  * request_resource_conflict - request and reserve an I/O or memory resource
  * @root: root resource descriptor
-- 
2.9.4



[PATCH 00/10] PCI: pci resource allocation test module

2017-08-05 Thread Yinghai Lu
Read a data file and a mask file to build simulated data structures, and
provide pci_ops that use them.

Exercise pci_create_root_bus, scan_child_bus, resource survey, resource
assignment, etc. to check whether those functions work as expected with
the simulated data.

The mask holds the read-write bits of the PCI registers, so PCI BAR sizing works.

It also supports bus number assign-all.

Only tested on x86_64.
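A minimal model of how a read-write mask makes BAR sizing work on simulated config space: a write only changes the bits the mask marks writable, so the usual write-all-ones/read-back sequence yields the BAR size. The struct and function names are hypothetical and not taken from pci_test.c; the mask bytes used in the check (`00 fc ff ff` at offset 0x10) are the ones from pci_test_mask.txt, which make BAR0 decode as 1 KB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated 256-byte config space plus a per-byte read-write mask
 * (1 bits are writable, 0 bits are read-only). */
struct sim_dev {
	uint8_t cfg[256];
	uint8_t mask[256];
};

static void sim_init(struct sim_dev *d)
{
	memset(d, 0, sizeof(*d));
}

static uint32_t sim_read32(const struct sim_dev *d, unsigned int where)
{
	return (uint32_t)d->cfg[where] | (uint32_t)d->cfg[where + 1] << 8 |
	       (uint32_t)d->cfg[where + 2] << 16 |
	       (uint32_t)d->cfg[where + 3] << 24;
}

/* A write only changes the bits the mask marks as read-write. */
static void sim_write32(struct sim_dev *d, unsigned int where, uint32_t val)
{
	for (int i = 0; i < 4; i++) {
		uint8_t m = d->mask[where + i];
		uint8_t v = (uint8_t)(val >> (8 * i));

		d->cfg[where + i] = (uint8_t)((d->cfg[where + i] & ~m) | (v & m));
	}
}

/* 32-bit memory BAR sizing: save, write all ones, read back, restore.
 * Returns the decoded size, or 0 if the BAR is not implemented. */
static uint32_t sim_mem_bar_size(struct sim_dev *d, unsigned int bar)
{
	uint32_t orig = sim_read32(d, bar);
	uint32_t sz;

	sim_write32(d, bar, 0xffffffff);
	sz = sim_read32(d, bar) & ~0xfU;	/* drop memory BAR flag bits */
	sim_write32(d, bar, orig);
	return sz ? ~sz + 1 : 0;
}
```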

  # insmod pci_test.ko data_file=pci_test_data.txt mask_file=pci_test_mask.txt
  # lspci -tv
  # cat /proc/ioports_test
  # cat /proc/iomem_test
  # rmmod pci_test

Also available in the for_pci_v4.14_pci_next branch of
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git

Thanks

Yinghai

Yinghai Lu (10):
  PCI: avoid arch_remove_reservations() for PCI_TEST
  PCI: introduce ioport_res/iomem_res for PCI_TEST
  PCI: export symbol for PCI_TEST module
  PCI: extend pci device match_driver state
  PCI: Add PCI_TEST module for resource allocation
  PCI: PCI_TEST simple data
  PCI: PCI_TEST data from x5-8
  PCI: PCI_TEST data from x5-8 with zeroed bus number
  PCI: PCI_TEST data from x2-8
  PCI: PCI_TEST data from x2-8 with zeroed bus number

 arch/x86/kernel/resource.c             |    15 +-
 arch/x86/pci/i386.c                    |     1 +
 drivers/iommu/amd_iommu_init.c         |     2 +-
 drivers/pci/Kconfig                    |     6 +
 drivers/pci/Makefile                   |     2 +
 drivers/pci/bus.c                      |     3 +-
 drivers/pci/pci-driver.c               |     2 +-
 drivers/pci/pci_test.c                 |  1281 ++
 drivers/pci/pci_test_data.txt          |    24 +
 drivers/pci/pci_test_data_x2-8.txt     | 22818 +++
 drivers/pci/pci_test_data_x2-8_bus.txt | 22818 +++
 drivers/pci/pci_test_data_x5-8.txt     |  5656
 drivers/pci/pci_test_data_x5-8_bus.txt |  5656
 drivers/pci/pci_test_mask.txt          |     5 +
 drivers/pci/pci_test_mask_x2-8.txt     |   319 +
 drivers/pci/pci_test_mask_x5-8.txt     |   176 +
 drivers/pci/probe.c                    |     4 +-
 drivers/pci/quirks.c                   |     2 +-
 drivers/pci/setup-bus.c                |     3 +-
 drivers/pci/setup-res.c                |     4 +-
 include/linux/ioport.h                 |     3 +-
 include/linux/pci.h                    |    15 +-
 kernel/resource.c                      |     7 +-
 23 files changed, 58808 insertions(+), 14 deletions(-)
 create mode 100644 drivers/pci/pci_test.c
 create mode 100644 drivers/pci/pci_test_data.txt
 create mode 100644 drivers/pci/pci_test_data_x2-8.txt
 create mode 100644 drivers/pci/pci_test_data_x2-8_bus.txt
 create mode 100644 drivers/pci/pci_test_data_x5-8.txt
 create mode 100644 drivers/pci/pci_test_data_x5-8_bus.txt
 create mode 100644 drivers/pci/pci_test_mask.txt
 create mode 100644 drivers/pci/pci_test_mask_x2-8.txt
 create mode 100644 drivers/pci/pci_test_mask_x5-8.txt

-- 
2.9.4



[PATCH 01/10] PCI: avoid arch_remove_reservations() for PCI_TEST

2017-08-05 Thread Yinghai Lu
arch_remove_reservations() clips out the e820 regions of the host the
kernel is running on, which makes PCI_TEST fail with simulated data.

PCI_TEST uses its own iomem resource instead of iomem_resource, so check
whether the allocation root belongs to the iomem_resource tree, and only
trim the BIOS area and e820 regions in arch_remove_reservations() when it does.

Signed-off-by: Yinghai Lu <ying...@kernel.org>
---
 arch/x86/kernel/resource.c | 15 +--
 include/linux/ioport.h     |  3 ++-
 kernel/resource.c          |  5 +++--
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
index 5ab3895..f19b0f6 100644
--- a/arch/x86/kernel/resource.c
+++ b/arch/x86/kernel/resource.c
@@ -35,14 +35,25 @@ static void remove_e820_regions(struct resource *avail)
}
 }
 
-void arch_remove_reservations(struct resource *avail)
+static int is_from_iomem_resource(struct resource *root)
+{
+   while (root->parent)
+   root = root->parent;
+
+   if (root == &iomem_resource)
+   return 1;
+
+   return 0;
+}
+
+void arch_remove_reservations(struct resource *root, struct resource *avail)
 {
/*
 * Trim out BIOS area (high 2MB) and E820 regions. We do not remove
 * the low 1MB unconditionally, as this area is needed for some ISA
 * cards requiring a memory range, e.g. the i82365 PCMCIA controller.
 */
-   if (avail->flags & IORESOURCE_MEM) {
+   if ((avail->flags & IORESOURCE_MEM) && is_from_iomem_resource(root)) {
resource_clip(avail, BIOS_ROM_BASE, BIOS_ROM_END);
 
remove_e820_regions(avail);
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6230064..4e9272d 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -177,7 +177,8 @@ extern struct resource *insert_resource_conflict(struct resource *parent, struct
 extern int insert_resource(struct resource *parent, struct resource *new);
 extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new);
 extern int remove_resource(struct resource *old);
-extern void arch_remove_reservations(struct resource *avail);
+extern void arch_remove_reservations(struct resource *root,
+   struct resource *avail);
 extern int allocate_resource(struct resource *root, struct resource *new,
 resource_size_t size, resource_size_t min,
 resource_size_t max, resource_size_t align,
diff --git a/kernel/resource.c b/kernel/resource.c
index 9b5f044..4174020 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -570,7 +570,8 @@ int region_intersects(resource_size_t start, size_t size, unsigned long flags,
 }
 EXPORT_SYMBOL_GPL(region_intersects);
 
-void __weak arch_remove_reservations(struct resource *avail)
+void __weak arch_remove_reservations(struct resource *root,
+struct resource *avail)
 {
 }
 
@@ -622,7 +623,7 @@ static int __find_resource(struct resource *root, struct resource *old,
goto next;
 
resource_clip(&tmp, constraint->min, constraint->max);
-   arch_remove_reservations(&tmp);
+   arch_remove_reservations(root, &tmp);
 
/* Check for overflow after ALIGN() */
avail.start = ALIGN(tmp.start, constraint->align);
-- 
2.9.4
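The parent walk in is_from_iomem_resource() above can be exercised in userspace with a cut-down struct resource. This is only a sketch: the structure keeps just the name and parent fields, and the helper returns bool instead of int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for struct resource: only the parent link matters. */
struct resource {
	const char *name;
	struct resource *parent;
};

static struct resource iomem_resource = { "PCI mem", NULL };

/* Walk up to the top of the tree and compare with the real root. */
static bool is_from_iomem_resource(struct resource *root)
{
	while (root->parent)
		root = root->parent;

	return root == &iomem_resource;
}
```

A BAR under a window of iomem_resource is treated as host memory, while anything under a separate test root (such as the tree behind /proc/iomem_test) skips the e820 clipping.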



[PATCH 04/10] PCI: extend pci device match_driver state

2017-08-05 Thread Yinghai Lu
Change it from false/true to -1/0/1.

If it is set to -1, it will never be changed to 1 later.

For PCI_TEST, the simulated devices have no real functionality beyond
bus numbers and BAR settings, so they should never be matched to a driver.

Signed-off-by: Yinghai Lu <ying...@kernel.org>
---
 drivers/iommu/amd_iommu_init.c | 2 +-
 drivers/pci/bus.c              | 3 ++-
 drivers/pci/pci-driver.c       | 2 +-
 drivers/pci/probe.c            | 2 +-
 include/linux/pci.h            | 2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 3723037..6d706ca 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1558,7 +1558,7 @@ static int iommu_init_pci(struct amd_iommu *iommu)
return -ENODEV;
 
/* Prevent binding other PCI device drivers to IOMMU devices */
-   iommu->dev->match_driver = false;
+   iommu->dev->match_driver = 0;
 
pci_read_config_dword(iommu->dev, cap_ptr + MMIO_CAP_HDR_OFFSET,
  &iommu->cap);
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index bc56cf1..e2c0bda 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -322,7 +322,8 @@ void pci_bus_add_device(struct pci_dev *dev)
pci_proc_attach_device(dev);
pci_bridge_d3_update(dev);
 
-   dev->match_driver = true;
+   if (dev->match_driver == 0)
+   dev->match_driver = 1;
retval = device_attach(&dev->dev);
if (retval < 0 && retval != -EPROBE_DEFER) {
dev_warn(&dev->dev, "device attach failed (%d)\n", retval);
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index d51e873..c67b872 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1373,7 +1373,7 @@ static int pci_bus_match(struct device *dev, struct device_driver *drv)
struct pci_driver *pci_drv;
const struct pci_device_id *found_id;
 
-   if (!pci_dev->match_driver)
+   if (pci_dev->match_driver != 1)
return 0;
 
pci_drv = to_pci_driver(drv);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 1811016..34ffbdf 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2021,7 +2021,7 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
pci_set_msi_domain(dev);
 
/* Notifier could use PCI capabilities */
-   dev->match_driver = false;
+   dev->match_driver = 0;
ret = device_add(&dev->dev);
WARN_ON(ret < 0);
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c58a635..6e81d64 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -344,7 +344,7 @@ struct pci_dev {
unsigned int irq;
struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
 
-   bool match_driver;  /* Skip attaching driver */
+   int match_driver;   /* Skip attaching driver */
/* These fields are used by common fixups */
unsigned int transparent:1;  /* Subtractive decode PCI bridge */
unsigned int multifunction:1; /* Part of multi-function device */
-- 
2.9.4
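The three match_driver states can be modeled as follows. The enum names are hypothetical (the patch stores the bare values -1, 0 and 1 in an int), and fake_pci_dev stands in for struct pci_dev; the two helpers mirror the pci_bus_add_device() and pci_bus_match() hunks above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the -1/0/1 values the patch uses. */
enum {
	MATCH_NEVER = -1,	/* never bind a driver (e.g. simulated device) */
	MATCH_NOT_YET = 0,	/* added, not yet allowed to bind */
	MATCH_ALLOWED = 1,	/* bus match may bind a driver */
};

struct fake_pci_dev {
	int match_driver;
};

/* Mirrors the pci_bus_add_device() hunk: only 0 is promoted to 1. */
static void bus_add_device(struct fake_pci_dev *dev)
{
	if (dev->match_driver == MATCH_NOT_YET)
		dev->match_driver = MATCH_ALLOWED;
}

/* Mirrors the pci_bus_match() hunk: only 1 allows matching. */
static bool bus_match(const struct fake_pci_dev *dev)
{
	return dev->match_driver == MATCH_ALLOWED;
}
```

The design keeps the old boolean behaviour for values 0 and 1, while -1 acts as a sticky opt-out that pci_bus_add_device() cannot override.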



[PATCH 06/10] PCI: PCI_TEST simple data

2017-08-05 Thread Yinghai Lu
Signed-off-by: Yinghai Lu <ying...@kernel.org>
---
 drivers/pci/pci_test_data.txt | 24 
 drivers/pci/pci_test_mask.txt |  5 +
 2 files changed, 29 insertions(+)
 create mode 100644 drivers/pci/pci_test_data.txt
 create mode 100644 drivers/pci/pci_test_mask.txt

diff --git a/drivers/pci/pci_test_data.txt b/drivers/pci/pci_test_data.txt
new file mode 100644
index 0000000..a3cfd5d
--- /dev/null
+++ b/drivers/pci/pci_test_data.txt
@@ -0,0 +1,24 @@
+pci_bus 0000:00: root bus resource [bus 00-fe]
+pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
+pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
+pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
+pci_bus 0000:00: root bus resource [mem 0xbfa00000-0xfebfffff]
+pci_bus 0000:00: root bus resource [mem 0xfed40000-0xfed4bfff]
+
+00:1a.0 0c03: 8086:1c2d (rev 04) (prog-if 20 [EHCI])
+00: 86 80 2d 1c 06 00 90 02 04 20 03 0c 00 00 00 00
+10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+20: 00 00 00 00 00 00 00 00 00 00 00 00 aa 17 cf 21
+30: 00 00 00 00 50 00 00 00 00 00 00 00 0b 01 00 00
+40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+50: 01 58 c2 c9 00 00 00 00 0a 98 a0 20 00 00 00 00
+60: 20 20 a7 07 00 00 00 00 01 00 00 00 00 00 08 c0
+70: 00 00 df 3f 00 00 00 00 00 00 00 00 00 00 00 00
+80: 00 00 80 00 11 88 0c 93 30 0d 00 24 00 00 00 00
+90: 00 00 00 00 00 00 00 00 13 00 06 03 00 00 00 00
+a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+d0: 00 00 00 00 00 aa ff 00 00 00 00 00 00 00 00 00
+e0: 00 00 00 00 01 44 00 9c 06 40 0b 28 b8 a1 18 09
+f0: 00 00 00 00 88 85 80 00 87 0f 06 08 08 17 5b 20
diff --git a/drivers/pci/pci_test_mask.txt b/drivers/pci/pci_test_mask.txt
new file mode 100644
index 0000000..824afab
--- /dev/null
+++ b/drivers/pci/pci_test_mask.txt
@@ -0,0 +1,5 @@
+mask: 8086:1c2d : 1:rw 0: readonly
+00: 00 00 00 00 ff ff ff 02 04 20 03 0c 00 00 00 00
+10: 00 fc ff ff 00 00 00 00 00 00 00 00 00 00 00 00
+20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-- 
2.9.4
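The data file pairs an lspci-style header line with hex rows of the form `offset: 16 bytes`. A possible parser for one row is sketched below, assuming every row starts with a hex offset followed by `: `; parse_cfg_line() and cfg_read16() are hypothetical names, not the functions in pci_test.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse one "OO: b0 b1 ... b15" row (OO = hex offset) into a 256-byte
 * config-space buffer. Returns the number of bytes stored, or -1 if the
 * line is not a hex row (e.g. the lspci header or a pci_bus line).
 */
static int parse_cfg_line(const char *line, uint8_t cfg[256])
{
	unsigned int off, byte;
	int n = 0, used = 0;

	/* require "<hex>: " so "00:1a.0 ..." header lines are rejected */
	if (sscanf(line, "%x:%n", &off, &used) != 1 || used == 0 ||
	    line[used] != ' ' || off > 0xff)
		return -1;
	line += used;

	while (n < 16 && off + n < 256 &&
	       sscanf(line, " %2x%n", &byte, &used) == 1) {
		cfg[off + n++] = (uint8_t)byte;
		line += used;
	}
	return n;
}

/* Config space is little-endian: low byte first. */
static uint16_t cfg_read16(const uint8_t cfg[256], unsigned int where)
{
	return (uint16_t)(cfg[where] | cfg[where + 1] << 8);
}
```

Feeding it the first row of pci_test_data.txt yields Vendor ID 0x8086 and Device ID 0x1c2d, matching the `8086:1c2d` header line, and revision 0x04 at offset 0x08.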



[PATCH 02/10] PCI: introduce ioport_res/iomem_res for PCI_TEST

2017-08-05 Thread Yinghai Lu
Make every bus carry pointers to the correct iomem/ioport root resources
instead of using iomem_resource and ioport_resource directly.

This lets PCI_TEST use different root resources later.

Signed-off-by: Yinghai Lu <ying...@kernel.org>
---
 drivers/pci/probe.c     |  2 ++
 drivers/pci/quirks.c    |  2 +-
 drivers/pci/setup-bus.c |  2 +-
 drivers/pci/setup-res.c |  4 ++--
 include/linux/pci.h     | 13 +
 5 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c31310d..1811016 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -881,6 +881,8 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
child->msi = parent->msi;
child->sysdata = parent->sysdata;
child->bus_flags = parent->bus_flags;
+   child->iomem_res = parent->iomem_res;
+   child->ioport_res = parent->ioport_res;
 
/* initialize some portions of the bus device, but don't register it
 * now as the parent is not properly set up yet.
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6967c6b..dc31098 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1617,7 +1617,7 @@ static void quirk_alder_ioapic(struct pci_dev *pdev)
 * not touch this (and it's already covered by the fixmap), so
 * forcibly insert it into the resource tree */
if (pci_resource_start(pdev, 0) && pci_resource_len(pdev, 0))
-   insert_resource(&iomem_resource, &pdev->resource[0]);
+   insert_resource(iomem_res(pdev->bus), &pdev->resource[0]);
 
/* The next five BARs all seem to be rubbish, so just clean
 * them out */
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 958da7d..1c30102 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -805,7 +805,7 @@ static struct resource *find_free_bus_resource(struct pci_bus *bus,
struct resource *r;
 
pci_bus_for_each_resource(bus, r, i) {
-   if (r == &ioport_resource || r == &iomem_resource)
+   if (r == ioport_res(bus) || r == iomem_res(bus))
continue;
if (r && (r->flags & type_mask) == type && !r->parent)
return r;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 85774b7..43921a4 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -215,9 +215,9 @@ static int pci_revert_fw_address(struct resource *res, struct pci_dev *dev,
root = pci_find_parent_resource(dev, res);
if (!root) {
if (res->flags & IORESOURCE_IO)
-   root = &ioport_resource;
+   root = ioport_res(dev->bus);
else
-   root = &iomem_resource;
+   root = iomem_res(dev->bus);
}
 
dev_info(&dev->dev, "BAR %d: trying firmware assignment %pR\n",
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4869e66..c58a635 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -524,6 +524,9 @@ struct pci_bus {
void *sysdata;   /* hook for sys-specific extension */
struct proc_dir_entry *procdir; /* directory entry in /proc/bus/pci */
 
+   struct resource *iomem_res; /* pointer to root iomem_resource */
+   struct resource *ioport_res; /* pointer to root ioport_resource */
+
unsigned char   number; /* bus number */
unsigned char   primary;/* number of primary bridge */
unsigned char   max_bus_speed;  /* enum pci_bus_speed */
@@ -545,6 +548,16 @@ struct pci_bus {
 
 #define to_pci_bus(n)  container_of(n, struct pci_bus, dev)
 
+static inline struct resource *iomem_res(struct pci_bus *bus)
+{
+   return bus->iomem_res ? : &iomem_resource;
+}
+
+static inline struct resource *ioport_res(struct pci_bus *bus)
+{
+   return bus->ioport_res ? : &ioport_resource;
+}
+
 /*
  * Returns true if the PCI bus is root (behind host-PCI bridge),
  * false otherwise
-- 
2.9.4
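The iomem_res()/ioport_res() helpers above rely on the GNU `?:` extension. A portable userspace sketch of the same fallback, including the pointer inheritance done in pci_alloc_child_bus(), might look like this; the struct layouts are reduced stand-ins and init_child_bus() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

struct resource {
	const char *name;
};

/* Stand-ins for the global roots in kernel/resource.c. */
static struct resource iomem_resource = { "PCI mem" };
static struct resource ioport_resource = { "PCI IO" };

struct pci_bus {
	struct pci_bus *parent;
	struct resource *iomem_res;	/* NULL means "use the global root" */
	struct resource *ioport_res;
};

/* Same fallback as the patch's helpers, written without GNU "?:". */
static struct resource *iomem_res(struct pci_bus *bus)
{
	return bus->iomem_res ? bus->iomem_res : &iomem_resource;
}

static struct resource *ioport_res(struct pci_bus *bus)
{
	return bus->ioport_res ? bus->ioport_res : &ioport_resource;
}

/* Like the pci_alloc_child_bus() hunk: children inherit the pointers. */
static void init_child_bus(struct pci_bus *child, struct pci_bus *parent)
{
	child->parent = parent;
	child->iomem_res = parent->iomem_res;
	child->ioport_res = parent->ioport_res;
}
```

Because a NULL pointer falls back to the global root, every existing bus keeps its old behaviour; only a test host bridge that sets its own root changes where its children allocate.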



[PATCH 05/10] PCI: Add PCI_TEST module for resource allocation

2017-08-05 Thread Yinghai Lu
Read a data file and a mask file to build simulated data structures, and
provide pci_ops that use them.

Call pci_create_root_bus, scan_child_bus, resource survey, resource
assignment, etc. to check whether those functions work as expected with the
simulated data.

The mask holds the read/write bits of the PCI registers, so that PCI BAR
sizing works.

It also supports bus number assign-all.

Only tested on x86 64bit arch.
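The mask idea can be illustrated with a small user-space model of a 32-bit memory BAR. This is a hypothetical sketch, not code from pci_test.c. struct sim_reg, sim_write() and sim_bar_size() are made-up names. The model assumes a 32-bit (not 64-bit) memory BAR whose low 4 bits are read-only flag bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of one simulated 32-bit config register: "value" is the
 * current content, "rw_mask" has a 1 for every bit software may change.
 */
struct sim_reg {
	uint32_t value;
	uint32_t rw_mask;
};

/* A write only changes the bits that the mask marks as read/write. */
static void sim_write(struct sim_reg *r, uint32_t val)
{
	r->value = (r->value & ~r->rw_mask) | (val & r->rw_mask);
}

/*
 * Usual 32-bit memory BAR sizing: save the BAR, write all ones, read back,
 * restore. The low 4 bits of a memory BAR are flags, not address bits.
 */
static uint32_t sim_bar_size(struct sim_reg *bar)
{
	uint32_t saved = bar->value;
	uint32_t probe;

	sim_write(bar, 0xffffffffu);
	probe = bar->value & ~0xfu;	/* drop the flag bits */
	sim_write(bar, saved);		/* restore the original address */

	return probe ? ~probe + 1 : 0;	/* lowest writable bit = BAR size */
}
```

With rw_mask = 0xfff00000, writing all ones reads back 0xfff00000, so the BAR decodes 1 MiB. A register with a zero mask ignores writes entirely. This is what makes sizing against the simulated data meaningful.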

Signed-off-by: Yinghai Lu <ying...@kernel.org>
---
 drivers/pci/Kconfig|6 +
 drivers/pci/Makefile   |2 +
 drivers/pci/pci_test.c | 1281 
 3 files changed, 1289 insertions(+)
 create mode 100644 drivers/pci/pci_test.c

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index c32a77f..3183331 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -134,6 +134,12 @@ config PCI_HYPERV
   The PCI device frontend driver allows the kernel to import arbitrary
   PCI devices from a PCI backend to support PCI driver domains.
 
+config PCI_TEST
+   tristate "PCI test"
+   depends on PCI && X86_64
+   help
+ PCI test module to verify PCI resource allocation with simulated data
+
 source "drivers/pci/hotplug/Kconfig"
 source "drivers/pci/dwc/Kconfig"
 source "drivers/pci/host/Kconfig"
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 66a21ac..043b979 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -51,6 +51,8 @@ obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
 
 obj-$(CONFIG_OF) += of.o
 
+obj-$(CONFIG_PCI_TEST) += pci_test.o
+
 ccflags-$(CONFIG_PCI_DEBUG) := -DDEBUG
 
 # PCI host controller drivers
diff --git a/drivers/pci/pci_test.c b/drivers/pci/pci_test.c
new file mode 100644
index 000..02ab2d2
--- /dev/null
+++ b/drivers/pci/pci_test.c
@@ -0,0 +1,1281 @@
+/*
+ *     drivers/pci/pci_test.c
+ * by Yinghai Lu <ying...@kernel.org>
+ *
+ * # insmod pci_test.ko data_file=pci_test_data.txt mask_file=pci_test_mask.txt
+ * # lspci -tv
+ * # cat /proc/ioports_test
+ * # cat /proc/iomem_test
+ * # rmmod pci_test
+ *
+ * data_file: root bus resources from the boot log plus "lspci -vv" output
+ * mask_file: modified from "lspci -vv" output, with a mask header.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct test_pci_root {
+   struct list_head node;
+   u16 segment;
+   u8 busnumber;
+   struct resource secondary;
+   struct resource c_pri; /* holder for child primary */
+   struct list_head root_res_list;
+   struct resource secondary_os;
+   struct pci_bus *bus;
+};
+
+struct test_pci_dev {
+   struct list_head node;
+   u32 mbdf;
+   int space_size; /* 256 or 4096 */
+   unsigned char *reg;
+
+   int mask_size;
+   unsigned char *mask;
+
+   int invisiable;
+   struct test_pci_dev *parent;  /* device on root bus has no parent */
+   struct list_head dev_list; /* node in children devices */
+   int bridge;
+   struct resource secondary;
+   struct resource c_pri; /* holder for child primary */
+   struct list_head children;  /* bridge to point to children devices */
+};
+
+struct test_pci_mask {
+   struct list_head node;
+   u32 vendor_device_id;
+   int space_size; /* 256 or 4096 */
+   unsigned char *reg; /* for mask: bit set means rw */
+};
+
+static LIST_HEAD(test_pci_root_list);
+static LIST_HEAD(test_pci_dev_list);
+static LIST_HEAD(test_pci_mask_list);
+
+static struct test_pci_root *test_pci_get_root(int mn, int bn)
+{
+   struct test_pci_root *root;
+
+   list_for_each_entry(root, &test_pci_root_list, node)
+   if (root->segment == (mn & 0xffff) &&
+   root->busnumber == (bn & 0xff))
+   return root;
+
+   return NULL;
+}
+
+static struct test_pci_root *test_pci_add_root(int mn, int bn, int bn1, int bn2)
+{
+   struct test_pci_root *root;
+
+   if (bn > bn1 || bn1 > bn2)
+   return NULL;
+
+   root = test_pci_get_root(mn, bn);
+   if (root) /* bus come last ? */
+   goto set_secondary;
+
+   root = kzalloc(sizeof(struct test_pci_root), GFP_KERNEL);
+   if (!root)
+   return NULL;
+
+   INIT_LIST_HEAD(&root->root_res_list);
+   list_add_tail(&root->node, &test_pci_root_list);
+
+   root->segment = mn & 0xffff;
+   root->busnumber = bn & 0xff;
+
+set_secondary:
+   if (root->c_pri.parent)
+   release_resource(&root->c_pri);
+   root->secondary.flags = IORESOURCE_BUS;
+   root->secondary.start = bn1 & 0xff;
+   root->secondary.end = bn2 & 0xff;
+   root->c_pri.flags = IORESOURCE_BUS;
+   root->c_pri.start = root->secondary.start;
+   root->c_pri.end = root->secondary.start;
+   request_resource(&root->secondary, &root->

Re: [tip:x86/platform] x86/PCI/mmcfg: Switch to ECAM config mode if possible

2017-06-29 Thread Yinghai Lu
On Wed, Jun 28, 2017 at 11:45 PM, tip-bot for Thomas Gleixner
 wrote:
> Commit-ID:  b5b0f00c760b6e9673ab79b88ede2f3c7a039f74
> Gitweb: http://git.kernel.org/tip/b5b0f00c760b6e9673ab79b88ede2f3c7a039f74
> Author: Thomas Gleixner 
> AuthorDate: Thu, 16 Mar 2017 22:50:09 +0100
> Committer:  Thomas Gleixner 
> CommitDate: Thu, 29 Jun 2017 08:41:54 +0200
>
> x86/PCI/mmcfg: Switch to ECAM config mode if possible
>
> To allow lockless access to the whole PCI configuration space the mmconfig
> based accessor functions need to be propagated to the pci_root_ops.
>
> Unfortunately this cannot be done before the PCI subsystem initialization
> happens even if mmconfig access is already available. The reason is that
> some of the special platform PCI implementations must be able to overrule
> that setting before further accesses happen.
>
> The earliest possible point is after x86_init.pci.init() has been run. This
> is at a point in the boot process where nothing actually uses the PCI
> devices so the accessor function pointers can be updated lockless w/o risk.
>
> The switch to full ECAM mode depends on the availability of mmconfig and
> unchanged default accessors.
>
> Signed-off-by: Thomas Gleixner 
> Acked-by: Bjorn Helgaas 
> Cc: Andi Kleen 
> Cc: Peter Zijlstra 
> Cc: Stephane Eranian 
> Cc: Borislav Petkov 
> Cc: linux-...@vger.kernel.org
> Link: http://lkml.kernel.org/r/20170316215057.452220...@linutronix.de
> Signed-off-by: Thomas Gleixner 
> ---
>  arch/x86/include/asm/pci_x86.h | 20 
>  arch/x86/pci/common.c  | 16 
>  arch/x86/pci/legacy.c  |  1 +
>  arch/x86/pci/mmconfig-shared.c | 30 ++
>  arch/x86/pci/mmconfig_32.c |  8 
>  arch/x86/pci/mmconfig_64.c |  8 
>  6 files changed, 67 insertions(+), 16 deletions(-)
>
>  /*
> + * Called after the last possible modification to raw_pci_[ext_]ops.
> + *
> + * Verify that root_pci_ops have not been overwritten by any implementation
> + * of x86_init.pci.arch_init() and x86_init.pci.init().
> + *
> + * If not, let the mmconfig code decide whether the ops can be switched
> + * over to the ECAM accessor functions.
> + */
> +void __init pcibios_select_ops(void)
> +{
> +   if (pci_root_ops.read != pci_read || pci_root_ops.write != pci_write)
> +   return;
> +   pci_mmcfg_select_ops();
> +}
> +
> +/*
>   *  Called after each bus is probed, but before its children
>   *  are examined.
>   */
> diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
> index 1cb01ab..80ea40e 100644
> --- a/arch/x86/pci/legacy.c
> +++ b/arch/x86/pci/legacy.c
> @@ -65,6 +65,7 @@ static int __init pci_subsys_init(void)
> }
> }
>
> +   pcibios_select_ops();
> pcibios_fixup_peer_bridges();
> x86_init.pci.init_irq();
> pcibios_init();
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index d1b47d5..6af6351 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -816,3 +816,33 @@ int pci_mmconfig_delete(u16 seg, u8 start, u8 end)
>
> return -ENOENT;
>  }
> +
> +static int pci_ecam_read(struct pci_bus *bus, unsigned int devfn, int reg,
> +int size, u32 *value)
> +{
> +   return pci_mmcfg_read(pci_domain_nr(bus), bus->number, devfn, reg,
> + size, value);
> +}
> +
> +static int pci_ecam_write(struct pci_bus *bus, unsigned int devfn, int reg,
> + int size, u32 value)
> +{
> +   return pci_mmcfg_write(pci_domain_nr(bus), bus->number, devfn, reg,
> +  size, value);
> +}
> +
> +void __init pci_mmcfg_select_ops(void)
> +{
> +   if (raw_pci_ext_ops != &pci_mmcfg)
> +   return;
> +
> +   /*
> +* The pointer to root_pci_ops has been handed in to ACPI already
> +* and is already set in the busses.
> +*
> +* Switch the functions over to ECAM for all config space accesses.
> +*/
> +   pci_root_ops.read = pci_ecam_read;
> +   pci_root_ops.write = pci_ecam_write;
> +   pr_info("PCI: Switch to ECAM configuration mode\n");
> +}

Hi Thomas,

Would this patch effectively undo the following commit:

commit a0ca9909609470ad779b9b9cc68ce96e975afff7
Author: Ivan Kokshaysky 
Date:   Mon Jan 14 17:31:09 2008 -0500

PCI x86: always use conf1 to access config space below 256 bytes

Thanks to Loic Prylli, who originally proposed
this idea.

Always using legacy configuration mechanism for the legacy config space
and extended mechanism (mmconf) for the extended config space is
a simple and very logical approach. It's supposed to resolve all
known mmconf problems. It still allows per-device quirks (tweaking
dev->cfg_size). It also allows to get rid of mmconf fallback code.

Signed-off-by: Ivan Kokshaysky 
Signed-off-by: Matthew Wilcox 
Signed-off-by: Linus Torvalds 
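The question can be sketched as two routing policies plus the two address encodings they use. This is an illustrative model only, and the function names are hypothetical. conf1_address() uses the standard Type 1 CONFIG_ADDRESS layout without the AMD extended-register bits. ecam_offset() is the offset from a segment's MMCONFIG base.

```c
#include <assert.h>
#include <stdint.h>

enum cfg_mech { CFG_CONF1, CFG_ECAM };

/* Type 1 CONFIG_ADDRESS value written to port 0xcf8 (legacy mechanism). */
static uint32_t conf1_address(unsigned int bus, unsigned int devfn,
			      unsigned int reg)
{
	return 0x80000000u | (bus << 16) | (devfn << 8) | (reg & 0xfcu);
}

/* Offset of a register inside the ECAM/MMCONFIG window of one segment. */
static uint64_t ecam_offset(unsigned int bus, unsigned int devfn,
			    unsigned int reg)
{
	return ((uint64_t)bus << 20) | ((uint64_t)devfn << 12) | (reg & 0xfffu);
}

/* Policy of commit a0ca990: conf1 below 256 bytes in domain 0, else mmconf. */
static enum cfg_mech route_split(int domain, unsigned int reg)
{
	return (domain == 0 && reg < 256) ? CFG_CONF1 : CFG_ECAM;
}

/* Once pci_root_ops point at the ECAM accessors, every offset uses mmconfig. */
static enum cfg_mech route_ecam_only(int domain, unsigned int reg)
{
	(void)domain;
	(void)reg;
	return CFG_ECAM;
}
```

Under route_split(), a read of offset 0x40 in domain 0 goes through conf1 (ports 0xcf8/0xcfc). Under route_ecam_only(), it goes through the memory-mapped window instead, which is the behavior change in question.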

Re: [PATCH v3] PCI: Workaround wrong flags completions for IDT switch

2017-06-13 Thread Yinghai Lu
On Mon, Jun 12, 2017 at 2:48 PM, Bjorn Helgaas <helg...@kernel.org> wrote:
> On Fri, Jun 09, 2017 at 04:16:17PM -0700, Yinghai Lu wrote:
>> From: James Puthukattukaran <james.puthukattuka...@oracle.com>
>>
>> The IDT switch incorrectly flags an ACS source violation on a read config
>> request to an end point device on the completion (IDT 89H32H8G3-YC,
>> errata #36) even though the PCI Express spec states that completions are
>> never affected by ACS source violation (PCI Spec 3.1, Section 6.12.1.1).
>
> Can you include a URL where this erratum is published?  If not, can
> you include the actual erratum text here?

Ok.

>
> Have you considered ways to make this patch apply only to the affected
> IDT switches?  Currently it applies to *all* devices.

But we need to apply the workaround before we know the vendor/device ID
of the device under that root port or downstream port.

>
> The purpose of the pci_bus_read_dev_vendor_id() path is to support the
> Configuration Request Retry Status feature (see PCIe r3.1, sec 2.3.2),
> which works by special handling of config reads of the Vendor ID after
> a reset.  Normally, that Vendor ID read would be the first access to
> a device when we enumerate it.
>
> This patch adds config reads and writes of the ACS capability *before*
> the Vendor ID read.  At that point we don't even know whether the
> device exists.  If it doesn't exist, pci_find_ext_capability() would
> read 0x data, and it probably fails reasonably gracefully.
>
> But if the device *does* exist, I think this patch breaks the CRS
> Software Visibility feature.  Without this patch, we try to read
> Vendor ID, and the device may return a CRS Completion Status.  If CRS
> visibility is enabled, the root complex may complete the request by
> returning 0x0001 for the Vendor ID, in which case we sleep and try
> again later.
>
> With this patch, we first try to read the ACS capability.  If the
> device returns a CRS Completion Status, the root complex is required
> to reissue the config request.  This is the required behavior
> regardless of whether CRS Software Visibility is enabled, so I think
> this effectively disables that feature.

The workaround (reading the ACS capability, etc.) is applied to the root
port or downstream port, while pci_bus_read_dev_vendor_id() reads the
vendor ID of the device under that root port or downstream port.

Thanks

Yinghai
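The CRS Software Visibility behavior Bjorn describes can be modeled as a polling loop with exponential backoff. This is a hedged sketch, not the kernel's pci_bus_read_dev_vendor_id(). struct sim_dev and the helpers are hypothetical, and sleeping is only accounted for, not performed. The model treats a Vendor ID of 0x0001 as "retry" and all ones as "no device".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CRS_VENDOR_ID	0x0001u	/* Vendor ID returned for a CRS completion */

/* Hypothetical device that answers with CRS for its first few reads. */
struct sim_dev {
	int crs_reads_left;
	uint32_t id;		/* (Device ID << 16) | Vendor ID once ready */
};

static uint32_t sim_read_id(struct sim_dev *d)
{
	if (d->crs_reads_left > 0) {
		d->crs_reads_left--;
		return 0xffff0000u | CRS_VENDOR_ID;
	}
	return d->id;
}

/* Retry after 1, 2, 4, ... ms until the device answers or time runs out. */
static bool wait_for_vendor_id(struct sim_dev *d, uint32_t *id,
			       int timeout_ms, int *waited_ms)
{
	int delay = 1;

	*waited_ms = 0;
	*id = sim_read_id(d);
	while ((*id & 0xffffu) == CRS_VENDOR_ID) {
		if (*waited_ms + delay > timeout_ms)
			return false;		/* gave up: never became ready */
		*waited_ms += delay;		/* a driver would sleep here */
		delay *= 2;
		*id = sim_read_id(d);
	}
	return *id != 0xffffffffu;		/* all ones: no device */
}
```

In Bjorn's reading, this loop has to see the device's first config request. Any request that reaches the device earlier would be reissued by the root complex rather than completed with 0x0001.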


[PATCH v3] PCI: Workaround wrong flags completions for IDT switch

2017-06-09 Thread Yinghai Lu
From: James Puthukattukaran <james.puthukattuka...@oracle.com>

The IDT switch (IDT 89H32H8G3-YC, erratum #36) incorrectly flags an ACS
source violation on the completion of a config read request to an endpoint
device, even though the PCI Express spec states that completions are never
affected by ACS source violations (PCI Spec 3.1, Section 6.12.1.1).

IDT's suggested workaround is to issue a configuration write to the
downstream device before issuing the first config read. This allows the
downstream device to capture its bus number, thus avoiding the ACS
violation on the completion.

The patch does the following:

1. Disable ACS source validation if it is enabled
2. Wait for config space access to become available by reading the vendor ID
3. Do a config write to the endpoint (erratum workaround)
4. Enable ACS source validation (if it was enabled to begin with)
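The four steps and the bit logic of pci_std_enable_acs_sv() can be sketched against simulated registers. This is a user-space model under stated assumptions, not the patch itself. struct sim_port, struct sim_ep, set_acs_sv() and probe_with_workaround() are hypothetical, and the vendor ID poll is reduced to a ready flag. ACS_SV uses the value of PCI_ACS_SV (0x0001) from include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ACS_SV	0x0001u	/* Source Validation bit, same value as PCI_ACS_SV */

/* Hypothetical model of a switch port's ACS capability/control registers. */
struct sim_port {
	uint16_t acs_cap;
	uint16_t acs_ctrl;
};

/* Hypothetical endpoint: "ready" stands in for a successful vendor ID read. */
struct sim_ep {
	bool ready;
	int cfg_writes;
};

/*
 * Mirrors the bit logic of pci_std_enable_acs_sv(): only touch SV if the
 * port advertises it, and return whether SV was enabled before the call.
 */
static int set_acs_sv(struct sim_port *p, bool enable)
{
	int was_enabled = !!(p->acs_ctrl & p->acs_cap & ACS_SV);

	if (enable)
		p->acs_ctrl |= p->acs_cap & ACS_SV;
	else
		p->acs_ctrl &= ~(p->acs_cap & ACS_SV);
	return was_enabled;
}

static bool probe_with_workaround(struct sim_port *port, struct sim_ep *ep)
{
	int prev = set_acs_sv(port, false);	/* 1. disable SV, remember state */
	bool found = ep->ready;			/* 2. vendor ID read succeeded */

	if (found)
		ep->cfg_writes++;		/* 3. dummy config write (erratum) */
	if (prev > 0)
		set_acs_sv(port, true);		/* 4. restore SV if it was on */
	return found;
}
```

Because set_acs_sv() masks with the capability bits, a port that does not implement Source Validation is left untouched and reports 0, so step 4 is skipped.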

-v2: move workaround to pci_bus_read_dev_vendor_id() from pci_bus_check_dev()
 and move enable_acs_sv to drivers/pci/pci.c -- by Yinghai
-v3: add a bus->self check to skip the root bus and the virtual buses of SR-IOV VFs.

Signed-off-by: James Puthukattukaran <james.puthukattuka...@oracle.com>
Signed-off-by: Yinghai Lu <ying...@kernel.org>

--

 drivers/pci/pci.c   |   33 +
 drivers/pci/pci.h   |1 +
 drivers/pci/probe.c |   38 --
 3 files changed, 70 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/probe.c
===
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -1763,8 +1763,8 @@ struct pci_dev *pci_alloc_dev(struct pci
 }
 EXPORT_SYMBOL(pci_alloc_dev);
 
-bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
-   int crs_timeout)
+static bool __pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn,
+u32 *l, int crs_timeout)
 {
int delay = 1;
 
@@ -1801,6 +1801,40 @@ bool pci_bus_read_dev_vendor_id(struct p
 
return true;
 }
+
+bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
+   int crs_timeout)
+{
+   int found;
+   int enable = -1;
+
+   /*
+* Some IDT switches flag an ACS violation for config reads
+* even though the PCI spec allows for it (PCIe 3.1, 6.12.1.1).
+* It flags it because the bus number is not properly set in the
+* completion. The workaround is to do a dummy write to properly
+* latch the bus number once the device is ready for config operations.
+*/
+
+   if (bus->self)
+   enable = pci_std_enable_acs_sv(bus->self, false);
+
+   found = __pci_bus_read_dev_vendor_id(bus, devfn, l, crs_timeout);
+
+   /*
+* The fact that we can read the vendor id indicates that the device
+* is ready for config operations. Do the write as part of the errata
+* workaround.
+*/
+   if (bus->self) {
+   if (found)
+   pci_bus_write_config_word(bus, devfn, PCI_VENDOR_ID, 0);
+   if (enable > 0)
+   pci_std_enable_acs_sv(bus->self, enable);
+   }
+
+   return found;
+}
 EXPORT_SYMBOL(pci_bus_read_dev_vendor_id);
 
 /*
Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -2838,6 +2838,39 @@ static bool pci_acs_flags_enabled(struct
 }
 
 /**
+ *  pci_std_enable_acs_sv - enable/disable ACS source validation if supported by the switch
+ *  @dev: PCIe switch/RP
+ *  @enable: enable (1) or disable (0) source validation
+ *
+ *  Returns: -ENODEV if the ACS capability is missing, otherwise the
+ *  previous acs_sv state
+ */
+int pci_std_enable_acs_sv(struct pci_dev *dev, bool enable)
+{
+   int pos;
+   u16 cap;
+   u16 ctrl;
+   int retval;
+
+   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+   if (!pos)
+   return -ENODEV;
+
+   pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
+   pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+
+   retval = !!(ctrl & cap & PCI_ACS_SV);
+   if (enable)
+   ctrl |= (cap & PCI_ACS_SV);
+   else
+   ctrl &= ~(cap & PCI_ACS_SV);
+
+   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
+
+   return retval;
+}
+
+/**
  * pci_acs_enabled - test ACS against required flags for a given device
  * @pdev: device to test
  * @acs_flags: required PCI ACS flags
Index: linux-2.6/drivers/pci/pci.h
===
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -343,6 +343,7 @@ static inline resource_size_t pci_resour
 }
 
 void pci_enable_acs(struct pci_dev *dev);
+int pci_std_enable_acs_sv(struct pci_dev *dev, bool enable);
 
 #ifdef CONFIG_PCIE_PTM
 void pci_ptm_init(struct pci_dev *dev);


Re: [PATCH] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds

2017-05-01 Thread Yinghai Lu
 ba92c783f9b8
> [9.988962] CR2: 9387bfff
> [9.989022] ---[ end trace fe34c0fc0fe685ab ]---
> [9.998690] Kernel panic - not syncing: Fatal exception
> [   10.004708] Kernel Offset: 0x1100 from 0x8100 (relocation 
> range: 0x8000-0xbfff)
>
> Reported-by: Jeff Moyer <jmo...@redhat.com>
> Signed-off-by: Baoquan He <b...@redhat.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: "H. Peter Anvin" <h...@zytor.com>
> Cc: x...@kernel.org
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Thomas Garnier <thgar...@google.com>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Yasuaki Ishimatsu <yasu.isim...@gmail.com>
> Cc: Jinbum Park <jinb.pa...@gmail.com>
> Cc: Dave Hansen <dave.han...@linux.intel.com>
> Cc: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com>
> Cc: Yinghai Lu <ying...@kernel.org>
> Cc: Dan Williams <dan.j.willi...@intel.com>
> Cc: Dave Young <dyo...@redhat.com>
> ---
>  arch/x86/mm/init_64.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 15173d3..dbf4f00 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -94,12 +94,14 @@ __setup("noexec32=", nonx32_setup);
>   */
>  void sync_global_pgds(unsigned long start, unsigned long end)
>  {
> -   unsigned long address;
> +   unsigned long address, address_next;
>
> -   for (address = start; address <= end; address += PGDIR_SIZE) {
> +   for (address = start; address <= end; address = address_next) {
> const pgd_t *pgd_ref = pgd_offset_k(address);
> struct page *page;
>
> +   address_next = (address & PGDIR_MASK) + PGDIR_SIZE;
> +
> if (pgd_none(*pgd_ref))
> continue;
>

This one is better than V2.

It would be better if you could rename 'address' to 'addr', as Ingo suggested.

Acked-by: Yinghai Lu <ying...@kernel.org>


Re: [PATCH v2] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds

2017-05-01 Thread Yinghai Lu
On Mon, May 1, 2017 at 12:32 PM, Ingo Molnar  wrote:
>
> * Baoquan He  wrote:
>
>>  arch/x86/mm/init_64.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 15173d3..dfa9edb 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -96,7 +96,9 @@ void sync_global_pgds(unsigned long start, unsigned long end)
>>  {
>>   unsigned long address;
>>
>> - for (address = start; address <= end; address += PGDIR_SIZE) {
>> + for (address = start; address <= end;
>> +  address = ALIGN(address + 1, PGDIR_SIZE)) {
>> +
>>   const pgd_t *pgd_ref = pgd_offset_k(address);
>>   struct page *page;
>
> This patch does not apply cleanly to tip:master.
>
> You can avoid the col80 problems by renaming 'address' to the canonical 'addr'
> name, the loop will become:
>
> for (addr = start; addr <= end; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
>
> ... which fits into 80 cols.

Would it be more readable to make the sync_global_pgds() loop look more
like the one in kernel_physical_mapping_init()?

	vaddr_start = vaddr;

	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
		...
		vaddr_next = (vaddr & PGDIR_MASK) + PGDIR_SIZE;

Yinghai


Re: [tip:x86/boot] x86/boot/e820: Basic cleanup of e820.c

2017-04-25 Thread Yinghai Lu
On Tue, Apr 11, 2017 at 12:37 AM, tip-bot for Ingo Molnar
 wrote:
> Commit-ID:  640e1b38b00550990cecd809021cd37716e45922
> Gitweb: http://git.kernel.org/tip/640e1b38b00550990cecd809021cd37716e45922
> Author: Ingo Molnar 
> AuthorDate: Sat, 28 Jan 2017 11:13:08 +0100
> Committer:  Ingo Molnar 
> CommitDate: Sat, 28 Jan 2017 14:42:27 +0100
>

> x86/boot/e820: Basic cleanup of e820.c

> @@ -951,49 +924,42 @@ void __init finish_e820_parsing(void)
>  static const char *__init e820_type_to_string(int e820_type)
>  {
> switch (e820_type) {
> -   case E820_RESERVED_KERN:
> -   case E820_RAM:  return "System RAM";
> -   case E820_ACPI: return "ACPI Tables";
> -   case E820_NVS:  return "ACPI Non-volatile Storage";
> -   case E820_UNUSABLE: return "Unusable memory";
> -   case E820_PRAM: return "Persistent Memory (legacy)";
> -   case E820_PMEM: return "Persistent Memory";
> -   default:return "reserved";
> +   case E820_RESERVED_KERN: /* Fall-through: */
> +   case E820_RAM:   return "System RAM";
> +   case E820_ACPI:  return "ACPI Tables";
> +   case E820_NVS:   return "ACPI Non-volatile Storage";
> +   case E820_UNUSABLE:  return "Unusable memory";
> +   case E820_PRAM:  return "Persistent Memory (legacy)";
> +   case E820_PMEM:  return "Persistent Memory";
> +   default: return "Reserved";
> }
>  }
>
...

Hi Ingo,

The "reserved" ==> "Reserved" change causes kexec warnings:

Unknown type (Reserved) while parsing /sys/firmware/memmap/18/type.
Please report this as bug. Using RANGE_RESERVED now.
Unknown type (Reserved) while parsing /sys/firmware/memmap/16/type.
Please report this as bug. Using RANGE_RESERVED now.
Unknown type (Reserved) while parsing /sys/firmware/memmap/14/type.
Please report this as bug. Using RANGE_RESERVED now.
Unknown type (Reserved) while parsing /sys/firmware/memmap/22/type.
Please report this as bug. Using RANGE_RESERVED now.
Unknown type (Reserved) while parsing /sys/firmware/memmap/9/type.
Please report this as bug. Using RANGE_RESERVED now.
add_buffer: base:43fff6000 bufsz:80e0 memsz:a000
add_buffer: base:43fff1000 bufsz:44ce memsz:44ce
add_buffer: base:43c00 bufsz:f4c5c0 memsz:3581000
add_buffer: base:439d0d000 bufsz:22f2060 memsz:22f2060
add_buffer: base:43fff bufsz:70 memsz:70
add_buffer: base:43ffef000 bufsz:230 memsz:230
10:~/k # cat /sys/firmware/memmap/14/type
Reserved

/proc/iomem shows the changed string too.

Yinghai


Re: [PATCH 1/2] x86/mm/ident_map: Add PUD level 1GB page support

2017-04-25 Thread Yinghai Lu
On Tue, Apr 25, 2017 at 2:13 AM, Xunlei Pang  wrote:
> The current kernel_ident_mapping_init() creates the identity
> mapping using 2MB page(PMD level), this patch adds the 1GB
> page(PUD level) support.
>
> This is useful on large machines to save some reserved memory
> (as paging structures) in the kdump case when kexec setups up
> identity mappings before booting into the new kernel.
>
> We will utilize this new support in the following patch.
>
> Signed-off-by: Xunlei Pang 
> ---
>  arch/x86/boot/compressed/pagetable.c |  2 +-
>  arch/x86/include/asm/init.h  |  3 ++-
>  arch/x86/kernel/machine_kexec_64.c   |  2 +-
>  arch/x86/mm/ident_map.c  | 13 ++++++++++++-
>  arch/x86/power/hibernate_64.c|  2 +-
>  5 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
> index 56589d0..1d78f17 100644
> --- a/arch/x86/boot/compressed/pagetable.c
> +++ b/arch/x86/boot/compressed/pagetable.c
> @@ -70,7 +70,7 @@ static void *alloc_pgt_page(void *context)
>   * Due to relocation, pointers must be assigned at run time not build time.
>   */
>  static struct x86_mapping_info mapping_info = {
> -   .pmd_flag   = __PAGE_KERNEL_LARGE_EXEC,
> +   .page_flag   = __PAGE_KERNEL_LARGE_EXEC,
>  };
>
>  /* Locates and clears a region for a new top level page table. */
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index 737da62..46eab1a 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -4,8 +4,9 @@
>  struct x86_mapping_info {
> void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
> void *context;   /* context for alloc_pgt_page */
> -   unsigned long pmd_flag;  /* page flag for PMD entry */
> +   unsigned long page_flag; /* page flag for PMD or PUD entry */
> unsigned long offset;/* ident mapping offset */
> +   bool use_pud_page;  /* PUD level 1GB page support */

How about using direct_gbpages instead?
The name use_pud_page is confusing.

>  };
>
>  int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 085c3b3..1d4f2b0 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -113,7 +113,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
> struct x86_mapping_info info = {
> .alloc_pgt_page = alloc_pgt_page,
> .context= image,
> -   .pmd_flag   = __PAGE_KERNEL_LARGE_EXEC,
> +   .page_flag  = __PAGE_KERNEL_LARGE_EXEC,
> };
> unsigned long mstart, mend;
> pgd_t *level4p;
> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> index 04210a2..0ad0280 100644
> --- a/arch/x86/mm/ident_map.c
> +++ b/arch/x86/mm/ident_map.c
> @@ -13,7 +13,7 @@ static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
> if (pmd_present(*pmd))
> continue;
>
> -   set_pmd(pmd, __pmd((addr - info->offset) | info->pmd_flag));
> +   set_pmd(pmd, __pmd((addr - info->offset) | info->page_flag));
> }
>  }
>
> @@ -30,6 +30,17 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
> if (next > end)
> next = end;
>
> +   if (info->use_pud_page) {
> +   pud_t pudval;
> +
> +   if (pud_present(*pud))
> +   continue;
> +
> +   pudval = __pud((addr - info->offset) | info->page_flag);
> +   set_pud(pud, pudval);

Should mask addr with PUD_MASK:
	addr &= PUD_MASK;
	set_pud(pud, __pud((addr - info->offset) | info->page_flag));



> +   continue;
> +   }
> +
> if (pud_present(*pud)) {
> pmd = pmd_offset(pud, 0);
> ident_pmd_init(info, pmd, addr, next);
> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> index 6a61194..a6e21fe 100644
> --- a/arch/x86/power/hibernate_64.c
> +++ b/arch/x86/power/hibernate_64.c
> @@ -104,7 +104,7 @@ static int set_up_temporary_mappings(void)
>  {
> struct x86_mapping_info info = {
> .alloc_pgt_page = alloc_pgt_page,
> -   .pmd_flag   = __PAGE_KERNEL_LARGE_EXEC,
> +   .page_flag  = __PAGE_KERNEL_LARGE_EXEC,
> .offset = __PAGE_OFFSET,
> };
> unsigned long mstart, mend;
> --
> 1.8.3.1
>


[PATCH 05/13] sparc/PCI: Keep resource idx order with bridge register number

2017-04-20 Thread Yinghai Lu
On one system we found a strange "no compatible bridge window" warning,
even though we already have pref_compat support that adds an extra pref
bit to the device resource.

PCI: Claiming :00:01.0: Resource 14: 00020001..000200010fff [10220c]
PCI: Claiming :01:00.0: Resource 1: 00020001..00020001 [100214]
pci :01:00.0: can't claim BAR 1 [mem 0x20001-0x20001 64bit]: no compatible bridge window

It turns out that pci_resource_compatible()/pci_up_path_over_pref_mem64()
only checks the bridge resource at the pref mmio register index 15, while
of_scan_pci_bridge() put the resource at the mmio register index 14
because the bridge does not have a non-pref mmio resource.

We have already fixed pci_up_path_over_pref_mem64() to check all bus
resources.

At the same time, this patch makes the resources follow the same order as
on other arches, or as when read directly by pci_read_bridge_bases(), even
when the non-pref mmio window is missing or the firmware reports the ranges
out of order.

Reserve i = 1 for non-pref mmio and i = 2 for pref mmio.

Signed-off-by: Yinghai Lu <ying...@kernel.org>
Tested-by: Khalid Aziz <khalid.a...@oracle.com>
Cc: sparcli...@vger.kernel.org
---
 arch/sparc/kernel/pci.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index adb9653..887441e 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -481,7 +481,7 @@ static void of_scan_pci_bridge(struct pci_pbm_info *pbm,
pci_read_bridge_bases(bus);
goto after_ranges;
}
-   i = 1;
+   i = 3;
for (; len >= 32; len -= 32, ranges += 8) {
u64 start;
 
@@ -513,6 +513,12 @@ static void of_scan_pci_bridge(struct pci_pbm_info *pbm,
   " for bridge %s\n", node->full_name);
continue;
}
+   } else if ((flags & IORESOURCE_PREFETCH) &&
+  !bus->resource[2]->flags) {
+   res = bus->resource[2];
+   } else if (((flags & (IORESOURCE_MEM | IORESOURCE_PREFETCH)) ==
+   IORESOURCE_MEM) && !bus->resource[1]->flags) {
+   res = bus->resource[1];
} else {
if (i >= PCI_NUM_RESOURCES - PCI_BRIDGE_RESOURCES) {
printk(KERN_ERR "PCI: too many memory ranges"
-- 
2.9.3



[PATCH 00/13] PCI: sparc related 64bit resource fixup

2017-04-20 Thread Yinghai Lu
Hi Bjorn,

Please check these sparc-related 64-bit resource handling patches.

Patches 1-8: parse MEM64 for sparc and other systems that use OF,
so that 64-bit device resources can find their parent resources.

Patches 9-12: MMIO64 handling enhancements:
treat non-pref mmio64 as pref mmio64 if all bridges up to the root are PCIe.

Patch 13: restore the old pref allocation logic if the host bridge does not
support mmio64.

These patches can be applied on top of today's pci/next.

Thanks

Yinghai


Yinghai Lu (13):
  sparc/PCI: Use correct offset for bus address to resource
  PCI: Add pci_find_bus_resource()
  sparc/PCI: Reserve legacy mmio after PCI mmio
  sparc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
  sparc/PCI: Keep resource idx order with bridge register number
  powerpc/PCI: Keep resource idx order with bridge register number
  powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
  OF/PCI: Add IORESOURCE_MEM_64 for 64-bit resource
  PCI: Check pref compatible bit for mem64 resource of PCIe device
  PCI: Only treat non-pref mmio64 as pref if all bridges have MEM_64
  PCI: Add has_mem64 for struct host_bridge
  PCI: Only treat non-pref mmio64 as pref if host bridge has mmio64
  PCI: Restore pref MMIO allocation logic for host bridge without mmio64

 arch/powerpc/kernel/pci_of_scan.c | 12 +-
 arch/sparc/kernel/of_device_32.c  |  5 ++-
 arch/sparc/kernel/of_device_64.c  |  5 ++-
 arch/sparc/kernel/pci.c   | 15 +--
 arch/sparc/kernel/pci_common.c| 91 +++
 arch/sparc/kernel/pci_impl.h  |  5 +++
 drivers/of/address.c  |  4 +-
 drivers/pci/bus.c |  4 +-
 drivers/pci/pci.c | 31 +++--
 drivers/pci/pci.h |  2 +
 drivers/pci/probe.c   | 40 +
 drivers/pci/setup-bus.c   | 65 
 drivers/pci/setup-res.c   | 13 --
 include/linux/pci.h   |  4 ++
 14 files changed, 224 insertions(+), 72 deletions(-)

-- 
2.9.3



[PATCH 07/13] powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing

2017-04-20 Thread Yinghai Lu
For device resources under a bridge's 64-bit pref window, we need to make
sure that the PREF bit is only set for 64-bit resources.

This patch sets IORESOURCE_MEM_64 for 64-bit resources while parsing OF
device resource flags.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96261
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96241
Signed-off-by: Yinghai Lu <ying...@kernel.org>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Gavin Shan <gws...@linux.vnet.ibm.com>
Cc: Yijing Wang <wangyij...@huawei.com>
Cc: Anton Blanchard <an...@samba.org>
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/kernel/pci_of_scan.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_of_scan.c 
b/arch/powerpc/kernel/pci_of_scan.c
index 9581e00..24714d4 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -44,8 +44,10 @@ static unsigned int pci_parse_of_flags(u32 addr0, int bridge)
 
if (addr0 & 0x0200) {
flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
-   flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
+   if (addr0 & 0x0100)
+   flags |= IORESOURCE_MEM_64
+| PCI_BASE_ADDRESS_MEM_TYPE_64;
if (addr0 & 0x4000)
flags |= IORESOURCE_PREFETCH
 | PCI_BASE_ADDRESS_MEM_PREFETCH;
-- 
2.9.3
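For illustration only, here is a standalone C sketch of the memory branch of pci_parse_of_flags() after this change. It assumes the Open Firmware PCI binding layout of the phys.hi cell (npt000ss bbbbbbbb dddddfff rrrrrrrr), where ss = 11 means 64-bit memory, t is bit 29 and p is bit 30. The kernel constants are redefined locally with values assumed to match include/linux/ioport.h and pci_regs.h. sketch_parse_of_flags() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed copies of kernel constants, redefined so the sketch compiles alone. */
#define IORESOURCE_MEM                0x00000200UL
#define IORESOURCE_PREFETCH           0x00002000UL
#define IORESOURCE_MEM_64             0x00100000UL
#define PCI_BASE_ADDRESS_SPACE_MEMORY 0x00UL
#define PCI_BASE_ADDRESS_MEM_TYPE_1M  0x02UL
#define PCI_BASE_ADDRESS_MEM_TYPE_64  0x04UL
#define PCI_BASE_ADDRESS_MEM_PREFETCH 0x08UL

/* OF phys.hi bits: ss occupies bits 24-25, t is bit 29, p is bit 30. */
#define OF_PHYS_HI_MEM      0x02000000u /* ss = 1x: memory space */
#define OF_PHYS_HI_MEM64    0x01000000u /* together with bit 25: ss = 11 */
#define OF_PHYS_HI_BELOW_1M 0x20000000u /* t bit */
#define OF_PHYS_HI_PREFETCH 0x40000000u /* p bit */

/* Hypothetical model of the memory branch after the patch: a 64-bit BAR
 * gets IORESOURCE_MEM_64 in addition to PCI_BASE_ADDRESS_MEM_TYPE_64.
 * Non-memory spaces simply return 0 in this sketch. */
static unsigned long sketch_parse_of_flags(uint32_t addr0)
{
	unsigned long flags;

	if (!(addr0 & OF_PHYS_HI_MEM))
		return 0;

	flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
	if (addr0 & OF_PHYS_HI_BELOW_1M)
		flags |= PCI_BASE_ADDRESS_MEM_TYPE_1M;
	if (addr0 & OF_PHYS_HI_MEM64)
		flags |= IORESOURCE_MEM_64 | PCI_BASE_ADDRESS_MEM_TYPE_64;
	if (addr0 & OF_PHYS_HI_PREFETCH)
		flags |= IORESOURCE_PREFETCH | PCI_BASE_ADDRESS_MEM_PREFETCH;
	return flags;
}
```

Inside the memory branch, testing bit 24 directly is equivalent to the removed (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64 shift. The difference is that it lets both flag families be set in one place.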



[PATCH 01/13] sparc/PCI: Use correct offset for bus address to resource

2017-04-20 Thread Yinghai Lu
After we added 64-bit mmio parsing, we got some "no compatible bridge window"
warnings on another new model that supports 64-bit resources.

It turns out that we cannot use mem_space.start as the 64-bit mem space
offset, i.e. mem_space.start != offset.

Use child_phys_addr to calculate the exact offset and record the offset
in pbm.

After the patch we get the correct offsets:

/pci@305: PCI IO [io  0x2007e-0x2007e0fff] offset 2007e
/pci@305: PCI MEM [mem 0x20010-0x27eff] offset 2
/pci@305: PCI MEM64 [mem 0x20001-0x2000d] offset 2
...
pci_sun4v f02ae7f8: PCI host bridge to bus :00
pci_bus :00: root bus resource [io  0x2007e-0x2007e0fff] (bus address [0x-0xfff])
pci_bus :00: root bus resource [mem 0x20010-0x27eff] (bus address [0x0010-0x7eff])
pci_bus :00: root bus resource [mem 0x20001-0x2000d] (bus address [0x1-0xd])

-v3: put back mem64_offset, as we found T4 has mem_offset != mem64_offset;
 check for overlap between mem64_space and mem_space.
-v7: update after the new pci_mmap_page_range patches.
-v8: remove the change in pci_resource_to_user().

Signed-off-by: Yinghai Lu <ying...@kernel.org>
Tested-by: Khalid Aziz <khalid.a...@oracle.com>
Cc: sparcli...@vger.kernel.org
---
 arch/sparc/kernel/pci.c|  6 +++---
 arch/sparc/kernel/pci_common.c | 32 
 arch/sparc/kernel/pci_impl.h   |  4 
 3 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index 7eceaa1..c5cf813 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -663,12 +663,12 @@ struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm,
printk("PCI: Scanning PBM %s\n", node->full_name);
 
pci_add_resource_offset(&resources, &pbm->io_space,
-   pbm->io_space.start);
+   pbm->io_offset);
pci_add_resource_offset(&resources, &pbm->mem_space,
-   pbm->mem_space.start);
+   pbm->mem_offset);
if (pbm->mem64_space.flags)
pci_add_resource_offset(&resources, &pbm->mem64_space,
-   pbm->mem_space.start);
+   pbm->mem64_offset);
pbm->busn.start = pbm->pci_first_busno;
pbm->busn.end   = pbm->pci_last_busno;
pbm->busn.flags = IORESOURCE_BUS;
diff --git a/arch/sparc/kernel/pci_common.c b/arch/sparc/kernel/pci_common.c
index 33524c1..76998f8 100644
--- a/arch/sparc/kernel/pci_common.c
+++ b/arch/sparc/kernel/pci_common.c
@@ -410,13 +410,16 @@ void pci_determine_mem_io_space(struct pci_pbm_info *pbm)
 
for (i = 0; i < num_pbm_ranges; i++) {
const struct linux_prom_pci_ranges *pr = &pbm_ranges[i];
-   unsigned long a, size;
+   unsigned long a, size, region_a;
u32 parent_phys_hi, parent_phys_lo;
+   u32 child_phys_mid, child_phys_lo;
u32 size_hi, size_lo;
int type;
 
parent_phys_hi = pr->parent_phys_hi;
parent_phys_lo = pr->parent_phys_lo;
+   child_phys_mid = pr->child_phys_mid;
+   child_phys_lo = pr->child_phys_lo;
if (tlb_type == hypervisor)
parent_phys_hi &= 0x0fff;
 
@@ -426,6 +429,8 @@ void pci_determine_mem_io_space(struct pci_pbm_info *pbm)
type = (pr->child_phys_hi >> 24) & 0x3;
a = (((unsigned long)parent_phys_hi << 32UL) |
 ((unsigned long)parent_phys_lo  <<  0UL));
+   region_a = (((unsigned long)child_phys_mid << 32UL) |
+((unsigned long)child_phys_lo  <<  0UL));
size = (((unsigned long)size_hi << 32UL) |
((unsigned long)size_lo  <<  0UL));
 
@@ -440,6 +445,7 @@ void pci_determine_mem_io_space(struct pci_pbm_info *pbm)
pbm->io_space.start = a;
pbm->io_space.end = a + size - 1UL;
pbm->io_space.flags = IORESOURCE_IO;
+   pbm->io_offset = a - region_a;
saw_io = 1;
break;
 
@@ -448,6 +454,7 @@ void pci_determine_mem_io_space(struct pci_pbm_info *pbm)
pbm->mem_space.start = a;
pbm->mem_space.end = a + size - 1UL;
pbm->mem_space.flags = IORESOURCE_MEM;
+   pbm->mem_offset = a - region_a;
saw_mem = 1;
break;
 
@@ -456,6 +463,7 @@ void pci_determine_mem_io_space(struct pci_pbm_info *pbm)
pbm->mem64_space.start = a;
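Here is a minimal sketch, under assumptions, of the offset arithmetic this patch introduces. The offset handed to pci_add_resource_offset() is a window's CPU (parent) address minus its PCI bus (child) address, computed per window rather than borrowed from mem_space.start. The struct below is a hypothetical, trimmed-down stand-in for linux_prom_pci_ranges (no size cells), and the hypervisor masking of parent_phys_hi is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down model of one OF "ranges" entry: the child
 * (PCI bus) address and the parent (CPU physical) address, split into cells. */
struct sketch_ranges_entry {
	uint32_t child_phys_hi, child_phys_mid, child_phys_lo;
	uint32_t parent_phys_hi, parent_phys_lo;
};

static uint64_t cells_to_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Per-window offset, i.e. "a - region_a" in the patch. */
static uint64_t sketch_window_offset(const struct sketch_ranges_entry *pr)
{
	uint64_t a = cells_to_u64(pr->parent_phys_hi, pr->parent_phys_lo);
	uint64_t region_a = cells_to_u64(pr->child_phys_mid, pr->child_phys_lo);

	return a - region_a;
}

/* A bus address inside the window maps to CPU address bus_addr + offset. */
static uint64_t sketch_bus_to_resource(uint64_t bus_addr, uint64_t offset)
{
	return bus_addr + offset;
}
```

Take a made-up 64-bit window with parent 0x20100000000 and child 0x100000000. Its start (0x20100000000) differs from its offset (0x20000000000), which is the mem_space.start != offset situation the changelog describes.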
 

[PATCH 12/13] PCI: Only treat non-pref mmio64 as pref if host bridge has mmio64

2017-04-20 Thread Yinghai Lu
If the host bridge does not have mmio64 above 4G, we don't need to
treat device non-pref mmio64 as pref mmio64.

Signed-off-by: Yinghai Lu <ying...@kernel.org>
Tested-by: Khalid Aziz <khalid.a...@oracle.com>
---
 drivers/pci/setup-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index b3fd314..7a0e59b 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -738,7 +738,7 @@ int pci_claim_bridge_resource(struct pci_dev *bridge, int i)
 static bool pci_up_path_over_pref_mem64(struct pci_bus *bus)
 {
if (pci_is_root_bus(bus))
-   return true;
+   return to_pci_host_bridge(bus->bridge)->has_mem64;
 
if (bus->self) {
int i;
-- 
2.9.3



[PATCH 02/13] PCI: Add pci_find_bus_resource()

2017-04-20 Thread Yinghai Lu
Add pci_find_bus_resource() to return the bus resource that contains
a given resource.

In some cases we only have a bus instead of a dev. It is the same as
pci_find_parent_resource(), but takes a bus as input.

Signed-off-by: Yinghai Lu <ying...@kernel.org>
---
 drivers/pci/pci.c   | 27 ---
 include/linux/pci.h |  2 ++
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4ffa152..deb828f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -437,18 +437,9 @@ int pci_find_ht_capability(struct pci_dev *dev, int ht_cap)
 }
 EXPORT_SYMBOL_GPL(pci_find_ht_capability);
 
-/**
- * pci_find_parent_resource - return resource region of parent bus of given region
- * @dev: PCI device structure contains resources to be searched
- * @res: child resource record for which parent is sought
- *
- *  For given resource region of given device, return the resource
- *  region of parent bus the given region is contained in.
- */
-struct resource *pci_find_parent_resource(const struct pci_dev *dev,
- struct resource *res)
+struct resource *pci_find_bus_resource(const struct pci_bus *bus,
+   struct resource *res)
 {
-   const struct pci_bus *bus = dev->bus;
struct resource *r;
int i;
 
@@ -478,6 +469,20 @@ struct resource *pci_find_parent_resource(const struct pci_dev *dev,
}
return NULL;
 }
+
+/**
+ * pci_find_parent_resource - return resource region of parent bus of given region
+ * @dev: PCI device structure contains resources to be searched
+ * @res: child resource record for which parent is sought
+ *
+ *  For given resource region of given device, return the resource
+ *  region of parent bus the given region is contained in.
+ */
+struct resource *pci_find_parent_resource(const struct pci_dev *dev,
+ struct resource *res)
+{
+   return pci_find_bus_resource(dev->bus, res);
+}
 EXPORT_SYMBOL(pci_find_parent_resource);
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 88185ff..817786b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -836,6 +836,8 @@ void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region,
 struct resource *res);
 void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
 struct pci_bus_region *region);
+struct resource *pci_find_bus_resource(const struct pci_bus *bus,
+   struct resource *res);
 void pcibios_scan_specific_bus(int busn);
 struct pci_bus *pci_find_bus(int domain, int busnr);
 void pci_bus_add_devices(const struct pci_bus *bus);
-- 
2.9.3
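Here is a standalone sketch of the lookup pci_find_bus_resource() performs once it takes a bus instead of a dev. It is modeled on the existing pci_find_parent_resource() body: a type match, a containment check, and the rule that a non-prefetchable BAR must not sit in a prefetchable window. The simplified struct and the sketch_* names are hypothetical. IORESOURCE_TYPE_BITS and the other values are assumed copies of the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed copies of kernel flag values. */
#define IORESOURCE_TYPE_BITS 0x00001f00UL
#define IORESOURCE_IO        0x00000100UL
#define IORESOURCE_MEM       0x00000200UL
#define IORESOURCE_PREFETCH  0x00002000UL

struct sketch_res { uint64_t start, end; unsigned long flags; };

/* Same resource type and fully inside the window. */
static int sketch_contains(const struct sketch_res *win, const struct sketch_res *r)
{
	if ((win->flags & IORESOURCE_TYPE_BITS) != (r->flags & IORESOURCE_TYPE_BITS))
		return 0;
	return win->start <= r->start && win->end >= r->end;
}

/* Bus-based lookup: the caller passes the bus windows, so no pci_dev is needed. */
static const struct sketch_res *
sketch_find_bus_resource(const struct sketch_res *wins, size_t nwins,
			 const struct sketch_res *res)
{
	for (size_t i = 0; i < nwins; i++) {
		const struct sketch_res *r = &wins[i];

		if (!sketch_contains(r, res))
			continue;
		/* Prefetchable window holding a non-prefetchable BAR: allocator error. */
		if ((r->flags & IORESOURCE_PREFETCH) && !(res->flags & IORESOURCE_PREFETCH))
			return NULL;
		return r;
	}
	return NULL;
}
```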



[PATCH 09/13] PCI: Check pref compatible bit for mem64 resource of PCIe device

2017-04-20 Thread Yinghai Lu
We still get the "no compatible bridge window" warning on sparc T5-8
after adding support for 64-bit resource parsing on the root bus.

 PCI: scan_bus[/pci@300/pci@1/pci@0/pci@6] bus no 8
 PCI: Claiming :00:01.0: Resource 15: 8001..8004afff [220c]
 PCI: Claiming :01:00.0: Resource 15: 8001..8004afff [220c]
 PCI: Claiming :02:04.0: Resource 15: 8001..80012fff [220c]
 PCI: Claiming :03:00.0: Resource 15: 8001..80012fff [220c]
 PCI: Claiming :04:06.0: Resource 14: 8001..80010fff [220c]
 PCI: Claiming :05:00.0: Resource 0: 8001..80011fff [204]
 pci :05:00.0: can't claim BAR 0 [mem 0x8001-0x80011fff]: no compatible bridge window

All the bridges' 64-bit resources have the pref bit set, but the device
resource does not, so we cannot find a parent for the device resource:
non-pref mmio cannot be placed under pref mmio.

According to the PCIe spec errata
https://www.pcisig.com/specifications/pciexpress/base2/PCIe_Base_r2.1_Errata_08Jun10.pdf
page 13, in some cases it is acceptable to treat such resources as pref.

Record whether the entire path from the host to the adapter is PCI
Express, and set the pref-compatible bit when claiming, sizing and
assigning 64-bit mem resources of such PCIe devices.

-v2: set pref for mmio 64 when the whole path is PCI Express, according to David Miller.
-v3: don't set pref directly; change to UNDER_PREF, set PREF before
 sizing and assigning resources, and clear PREF afterwards, as requested by BenH.
-v4: use on_all_pcie_path device flag instead.
-v6: update after pci_find_bus_resource() change

Link: http://lkml.kernel.org/r/cae9fiqu1gjy1lyrxs+ma5lcteee4xmtjrg0axj9k_tsu+m9...@mail.gmail.com
Reported-by: David Ahern <david.ah...@oracle.com>
Tested-by: David Ahern <david.ah...@oracle.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=81431
Tested-by: TJ <li...@iam.tj>
Signed-off-by: Yinghai Lu <ying...@kernel.org>
Tested-by: Khalid Aziz <khalid.a...@oracle.com>
Cc: sparcli...@vger.kernel.org
---
 arch/sparc/kernel/pci_common.c |  2 +-
 drivers/pci/pci.c  |  8 +---
 drivers/pci/pci.h  |  2 ++
 drivers/pci/probe.c| 33 +
 drivers/pci/setup-bus.c| 23 +++
 drivers/pci/setup-res.c|  4 
 include/linux/pci.h|  3 ++-
 7 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/arch/sparc/kernel/pci_common.c b/arch/sparc/kernel/pci_common.c
index 1ebc7ff..6f206a1 100644
--- a/arch/sparc/kernel/pci_common.c
+++ b/arch/sparc/kernel/pci_common.c
@@ -343,7 +343,7 @@ static void pci_register_region(struct pci_bus *bus, const char *name,
region.start = rstart;
region.end = rstart + size - 1UL;
pcibios_bus_to_resource(bus, res, &region);
-   bus_res = pci_find_bus_resource(bus, res);
+   bus_res = pci_find_bus_resource(bus, res, res->flags);
if (!bus_res) {
kfree(res);
return;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index deb828f..bdb70b7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -438,7 +438,7 @@ int pci_find_ht_capability(struct pci_dev *dev, int ht_cap)
 EXPORT_SYMBOL_GPL(pci_find_ht_capability);
 
 struct resource *pci_find_bus_resource(const struct pci_bus *bus,
-   struct resource *res)
+   struct resource *res, int flags)
 {
struct resource *r;
int i;
@@ -453,7 +453,7 @@ struct resource *pci_find_bus_resource(const struct pci_bus *bus,
 * not, the allocator made a mistake.
 */
if (r->flags & IORESOURCE_PREFETCH &&
-   !(res->flags & IORESOURCE_PREFETCH))
+   !(flags & IORESOURCE_PREFETCH))
return NULL;
 
/*
@@ -481,7 +481,9 @@ struct resource *pci_find_bus_resource(const struct pci_bus *bus,
 struct resource *pci_find_parent_resource(const struct pci_dev *dev,
  struct resource *res)
 {
-   return pci_find_bus_resource(dev->bus, res);
+   int flags = pci_resource_pref_compatible(dev, res);
+
+   return pci_find_bus_resource(dev->bus, res, flags);
 }
 EXPORT_SYMBOL(pci_find_parent_resource);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 586e63f..eb57780 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -368,4 +368,6 @@ int acpi_get_rc_resources(struct device *dev, const char *hid, u16 segment,
  struct resource *res);
 #endif
 
+int pci_resource_pref_compatible(const struct pci_dev *dev,
+struct resource *res);
 #endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5548044..676b55f 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1920,6 +1920,36 @@ stat
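The following sketch shows why it helps to pass flags separately. The names are hypothetical and the constant values are assumed to mirror the kernel's. It models the adjustment described above: a 64-bit MEM BAR on an all-PCIe path is matched against parent windows as if it were prefetchable, while the resource's own flags stay untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed copies of kernel flag values. */
#define IORESOURCE_MEM      0x00000200UL
#define IORESOURCE_PREFETCH 0x00002000UL
#define IORESOURCE_MEM_64   0x00100000UL

/* Flags used only for matching against a parent window, in the spirit of
 * pci_resource_pref_compatible(); res_flags itself is not modified. */
static unsigned long sketch_pref_compatible(unsigned long res_flags,
					    bool on_all_pcie_path)
{
	if ((res_flags & IORESOURCE_MEM) && (res_flags & IORESOURCE_MEM_64) &&
	    on_all_pcie_path)
		return res_flags | IORESOURCE_PREFETCH;
	return res_flags;
}

/* The prefetch check from pci_find_bus_resource(), fed the caller's flags. */
static bool sketch_window_accepts(unsigned long win_flags, unsigned long flags)
{
	return !((win_flags & IORESOURCE_PREFETCH) && !(flags & IORESOURCE_PREFETCH));
}
```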

[PATCH 10/13] PCI: Only treat non-pref mmio64 as pref if all bridges have MEM_64

2017-04-20 Thread Yinghai Lu
If any bridge on the path up to the root only has 32-bit pref mmio, we
don't need to treat device non-pref mmio64 as pref mmio64.

We need to call pci_bridge_check_ranges() earlier. A parent bridge's
pref mmio BAR may not have been allocated by the BIOS, so its res flags
are still 0; they must be set correctly before we check them for child
device resources.

-v2: check all bus resources instead of just res[15].

Signed-off-by: Yinghai Lu <ying...@kernel.org>
Tested-by: Khalid Aziz <khalid.a...@oracle.com>
---
 drivers/pci/setup-bus.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 3de66e6..b3fd314 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -735,6 +735,29 @@ int pci_claim_bridge_resource(struct pci_dev *bridge, int i)
return -EINVAL;
 }
 
+static bool pci_up_path_over_pref_mem64(struct pci_bus *bus)
+{
+   if (pci_is_root_bus(bus))
+   return true;
+
+   if (bus->self) {
+   int i;
+   bool found = false;
+   struct resource *res;
+
+   pci_bus_for_each_resource(bus, res, i)
+   if (res->flags & IORESOURCE_MEM_64) {
+   found = true;
+   break;
+   }
+
+   if (!found)
+   return false;
+   }
+
+   return pci_up_path_over_pref_mem64(bus->parent);
+}
+
 int pci_resource_pref_compatible(const struct pci_dev *dev,
 struct resource *res)
 {
@@ -743,7 +766,8 @@ int pci_resource_pref_compatible(const struct pci_dev *dev,
 
if ((res->flags & IORESOURCE_MEM) &&
(res->flags & IORESOURCE_MEM_64) &&
-   dev->on_all_pcie_path)
+   dev->on_all_pcie_path &&
+   pci_up_path_over_pref_mem64(dev->bus))
return res->flags | IORESOURCE_PREFETCH;
 
return res->flags;
@@ -1236,6 +1260,10 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
struct resource *b_res;
int ret;
 
+   if (!pci_is_root_bus(bus) &&
+   (bus->self->class >> 8) == PCI_CLASS_BRIDGE_PCI)
+   pci_bridge_check_ranges(bus);
+
list_for_each_entry(dev, &bus->devices, bus_list) {
struct pci_bus *b = dev->subordinate;
if (!b)
@@ -1263,7 +1291,6 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
break;
 
case PCI_CLASS_BRIDGE_PCI:
-   pci_bridge_check_ranges(bus);
if (bus->self->is_hotplug_bridge) {
additional_io_size  = pci_hotplug_io_size;
additional_mem_size = pci_hotplug_mem_size;
-- 
2.9.3
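Here is a sketch, on a hypothetical flattened bus model, of the upward check pci_up_path_over_pref_mem64() performs: every bridge between the device's bus and the root must have some window with IORESOURCE_MEM_64. In this patch the root case simply returns true. The host_has_mem64 field models the has_mem64 check that PATCH 12/13 puts there, so setting it to true reproduces this patch alone. The recursion is written as a loop, and buses without a bridge device (bus->self == NULL) are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IORESOURCE_MEM_64 0x00100000UL /* assumed kernel value */

/* Hypothetical bus model: parent is NULL for a root bus; up to four
 * window flag words per bus. */
struct sketch_bus {
	const struct sketch_bus *parent;
	unsigned long win_flags[4];
	bool host_has_mem64; /* only consulted on the root bus */
};

/* Every non-root bus on the way up needs at least one MEM_64 window. */
static bool sketch_up_path_over_mem64(const struct sketch_bus *bus)
{
	for (; bus->parent; bus = bus->parent) {
		bool found = false;

		for (size_t i = 0; i < 4; i++) {
			if (bus->win_flags[i] & IORESOURCE_MEM_64) {
				found = true;
				break;
			}
		}
		if (!found)
			return false;
	}
	return bus->host_has_mem64;
}
```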



[PATCH 06/13] powerpc/PCI: Keep resource idx order with bridge register number

2017-04-20 Thread Yinghai Lu
Same as the sparc version.

Keep the resource order consistent with other architectures and with
pci_read_bridge_bases(), even when non-pref mmio is missing or the
firmware reports the ranges out of order.

Reserve i = 1 for non-pref mmio and i = 2 for pref mmio.

Signed-off-by: Yinghai Lu <ying...@kernel.org>
Cc: linuxppc-...@lists.ozlabs.org
---
 arch/powerpc/kernel/pci_of_scan.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index ea3d981..9581e00 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -252,7 +252,7 @@ void of_scan_pci_bridge(struct pci_dev *dev)
bus->resource[i] = res;
++res;
}
-   i = 1;
+   i = 3;
for (; len >= 32; len -= 32, ranges += 8) {
flags = pci_parse_of_flags(of_read_number(ranges, 1), 1);
size = of_read_number(&ranges[6], 2);
@@ -265,6 +265,12 @@ void of_scan_pci_bridge(struct pci_dev *dev)
   " for bridge %s\n", node->full_name);
continue;
}
+   } else if ((flags & IORESOURCE_PREFETCH) &&
+  !bus->resource[2]->flags) {
+   res = bus->resource[2];
+   } else if (((flags & (IORESOURCE_MEM | IORESOURCE_PREFETCH)) ==
+   IORESOURCE_MEM) && !bus->resource[1]->flags) {
+   res = bus->resource[1];
} else {
if (i >= PCI_NUM_RESOURCES - PCI_BRIDGE_RESOURCES) {
printk(KERN_ERR "PCI: too many memory ranges"
-- 
2.9.3
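The slot policy can be sketched as follows. The helper is hypothetical and the constants are assumed. I/O goes to slot 0, the first pref range to slot 2, the first non-pref mem range to slot 1, and anything else to the overflow slots starting at 3. The overflow is capped at PCI_NUM_RESOURCES - PCI_BRIDGE_RESOURCES, which this sketch assumes is 4.

```c
#include <assert.h>

/* Assumed values, redefined so the sketch compiles alone. */
#define SKETCH_NUM_BRIDGE_SLOTS 4 /* PCI_NUM_RESOURCES - PCI_BRIDGE_RESOURCES */
#define IORESOURCE_IO       0x00000100UL
#define IORESOURCE_MEM      0x00000200UL
#define IORESOURCE_PREFETCH 0x00002000UL

/* Hypothetical model of the slot choice in of_scan_pci_bridge() after this
 * patch. slots[] holds flags already placed (0 = free); *next is the next
 * overflow slot and starts at 3. Returns the chosen slot, or -1 if ignored. */
static int sketch_pick_bridge_slot(unsigned long slots[SKETCH_NUM_BRIDGE_SLOTS],
				   int *next, unsigned long flags)
{
	int idx;

	if (flags & IORESOURCE_IO)
		idx = slots[0] ? -1 : 0;	/* multiple I/O ranges are ignored */
	else if ((flags & IORESOURCE_PREFETCH) && !slots[2])
		idx = 2;
	else if ((flags & (IORESOURCE_MEM | IORESOURCE_PREFETCH)) == IORESOURCE_MEM &&
		 !slots[1])
		idx = 1;
	else if (*next < SKETCH_NUM_BRIDGE_SLOTS)
		idx = (*next)++;
	else
		idx = -1;			/* "too many memory ranges" */

	if (idx >= 0)
		slots[idx] = flags;
	return idx;
}
```

In the test below the firmware lists the pref range first, yet the non-pref range still lands in slot 1.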



[PATCH 04/13] sparc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing

2017-04-20 Thread Yinghai Lu
When setting the PREF bit on a device resource under a bridge's 64-bit
prefetchable window, we need to make sure PREF is only set for 64-bit
resources.

So this patch sets IORESOURCE_MEM_64 for 64-bit resources while parsing
OF device resource flags.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96261
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96241
Signed-off-by: Yinghai Lu <ying...@kernel.org>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: sparcli...@vger.kernel.org
Tested-by: Khalid Aziz <khalid.a...@oracle.com>
---
 arch/sparc/kernel/of_device_32.c | 5 +++--
 arch/sparc/kernel/of_device_64.c | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/kernel/of_device_32.c b/arch/sparc/kernel/of_device_32.c
index 185aa96..3e9f273 100644
--- a/arch/sparc/kernel/of_device_32.c
+++ b/arch/sparc/kernel/of_device_32.c
@@ -83,11 +83,12 @@ static unsigned long of_bus_pci_get_flags(const u32 *addr, unsigned long flags)
case 0x01:
flags |= IORESOURCE_IO;
break;
-
case 0x02: /* 32 bits */
-   case 0x03: /* 64 bits */
flags |= IORESOURCE_MEM;
break;
+   case 0x03: /* 64 bits */
+   flags |= IORESOURCE_MEM | IORESOURCE_MEM_64;
+   break;
}
if (w & 0x40000000)
flags |= IORESOURCE_PREFETCH;
diff --git a/arch/sparc/kernel/of_device_64.c b/arch/sparc/kernel/of_device_64.c
index 7bbdc26..defee61 100644
--- a/arch/sparc/kernel/of_device_64.c
+++ b/arch/sparc/kernel/of_device_64.c
@@ -146,11 +146,12 @@ static unsigned long of_bus_pci_get_flags(const u32 *addr, unsigned long flags)
case 0x01:
flags |= IORESOURCE_IO;
break;
-
case 0x02: /* 32 bits */
-   case 0x03: /* 64 bits */
flags |= IORESOURCE_MEM;
break;
+   case 0x03: /* 64 bits */
+   flags |= IORESOURCE_MEM | IORESOURCE_MEM_64;
+   break;
}
if (w & 0x40000000)
flags |= IORESOURCE_PREFETCH;
-- 
2.9.3
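The flag decoding changed by this patch (and by the matching drivers/of patch) can be exercised in user space. The sketch assumes the OF PCI binding layout for the first address cell: space code in bits 25:24 and the prefetchable bit at bit 30. The function name pci_phys_hi_to_flags() is hypothetical. The flag values are copied from include/linux/ioport.h.

```c
#include <stdint.h>

/* User-space copies of the flag bits from include/linux/ioport.h. */
#define IORESOURCE_IO		0x00000100UL
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_PREFETCH	0x00002000UL
#define IORESOURCE_MEM_64	0x00100000UL

/*
 * Decode the phys.hi cell of an OF PCI address: bits 25:24 are the
 * space code (1 = I/O, 2 = 32-bit memory, 3 = 64-bit memory) and
 * bit 30 marks prefetchable memory.
 */
static unsigned long pci_phys_hi_to_flags(uint32_t w)
{
	unsigned long flags = 0;

	switch ((w >> 24) & 0x03) {
	case 0x01:			/* I/O space */
		flags |= IORESOURCE_IO;
		break;
	case 0x02:			/* 32-bit memory space */
		flags |= IORESOURCE_MEM;
		break;
	case 0x03:			/* 64-bit memory: now also MEM_64 */
		flags |= IORESOURCE_MEM | IORESOURCE_MEM_64;
		break;
	}
	if (w & 0x40000000)		/* prefetchable bit */
		flags |= IORESOURCE_PREFETCH;
	return flags;
}
```

Here is how the two prefetchable cases differ:

- A 64-bit prefetchable range (phys.hi 0x43000000) now decodes to IORESOURCE_MEM | IORESOURCE_MEM_64 | IORESOURCE_PREFETCH.
- A 32-bit prefetchable range (0x42000000) still yields only IORESOURCE_MEM | IORESOURCE_PREFETCH.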



[PATCH 08/13] OF/PCI: Add IORESOURCE_MEM_64 for 64-bit resource

2017-04-20 Thread Yinghai Lu
For a device resource with the PREF bit set under a bridge's 64-bit
prefetchable window, we need to make sure PREF is only set for 64-bit
resources.

This patch sets IORESOURCE_MEM_64 for 64-bit resources during OF device
resource flag parsing.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96261
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96241
Signed-off-by: Yinghai Lu <ying...@kernel.org>
Cc: Grant Likely <grant.lik...@linaro.org>
Cc: Rob Herring <robh...@kernel.org>
Cc: devicet...@vger.kernel.org
Tested-by: Khalid Aziz <khalid.a...@oracle.com>
---
 drivers/of/address.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..d1bb76c 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -131,9 +131,11 @@ static unsigned int of_bus_pci_get_flags(const __be32 *addr)
flags |= IORESOURCE_IO;
break;
case 0x02: /* 32 bits */
-   case 0x03: /* 64 bits */
flags |= IORESOURCE_MEM;
break;
+   case 0x03: /* 64 bits */
+   flags |= IORESOURCE_MEM | IORESOURCE_MEM_64;
+   break;
}
if (w & 0x40000000)
flags |= IORESOURCE_PREFETCH;
-- 
2.9.3



[PATCH 13/13] PCI: Restore pref MMIO allocation logic for host bridge without mmio64

2017-04-20 Thread Yinghai Lu
Since commit 5b2854155 ("PCI: Restrict 64-bit prefetchable bridge windows
to 64-bit resources"), prefetchable MMIO allocation works as follows:
when a bridge's prefetchable window supports 64-bit MMIO, we only put
child prefetchable resources that support 64-bit MMIO into it, and we
put child 32-bit prefetchable resources into the bridge's
non-prefetchable 32-bit window.

That can leave the bridge's prefetchable window unused when it is 64-bit
and the child resources are all 32-bit. It can also cause allocation
failures when the non-prefetchable 32-bit window is not big enough for
those child 32-bit prefetchable resources.

That makes no sense when the host bridge has no 64-bit MMIO above 4G
at all.

This patch restores the old logic: when the host bridge does not have
has_mem64 set, put both 64-bit and 32-bit child prefetchable MMIO
resources under the bridge's prefetchable window.

Signed-off-by: Yinghai Lu <ying...@kernel.org>
Tested-by: Khalid Aziz <khalid.a...@oracle.com>
---
 drivers/pci/bus.c   |  4 +++-
 drivers/pci/setup-bus.c | 13 +
 drivers/pci/setup-res.c |  9 ++---
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index bc56cf1..79205fb 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -233,8 +233,10 @@ int pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 {
 #ifdef CONFIG_PCI_BUS_ADDR_T_64BIT
int rc;
+   unsigned long mmio64 = pci_find_host_bridge(bus)->has_mem64 ?
+   IORESOURCE_MEM_64 : 0;
 
-   if (res->flags & IORESOURCE_MEM_64) {
+   if (res->flags & mmio64) {
rc = pci_bus_alloc_from_region(bus, res, size, align, min,
   type_mask, alignf, alignf_data,
   &pci_high);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7a0e59b..f29cf5d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1308,7 +1308,8 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
b_res = &bus->self->resource[PCI_BRIDGE_RESOURCES];
mask = IORESOURCE_MEM;
prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
-   if (b_res[2].flags & IORESOURCE_MEM_64) {
+   if ((b_res[2].flags & IORESOURCE_MEM_64) &&
+   pci_find_host_bridge(bus)->has_mem64) {
prefmask |= IORESOURCE_MEM_64;
ret = pbus_size_mem(bus, prefmask, prefmask,
  prefmask, prefmask,
@@ -1578,17 +1579,21 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 *io port.
 * 2. if there is non pref mmio assign fail, release bridge
 *nonpref mmio.
-* 3. if there is 64bit pref mmio assign fail, and bridge pref
+* 3. if there is pref mmio assign fail, and host bridge does
+*not have 64bit mmio, release bridge pref mmio.
+* 4. if there is 64bit pref mmio assign fail, and bridge pref
 *is 64bit, release bridge pref mmio.
-* 4. if there is pref mmio assign fail, and bridge pref is
+* 5. if there is pref mmio assign fail, and bridge pref is
 *32bit mmio, release bridge pref mmio
-* 5. if there is pref mmio assign fail, and bridge pref is not
+* 6. if there is pref mmio assign fail, and bridge pref is not
 *assigned, release bridge nonpref mmio.
 */
if (type & IORESOURCE_IO)
idx = 0;
else if (!(type & IORESOURCE_PREFETCH))
idx = 1;
+   else if (!pci_find_host_bridge(bus)->has_mem64)
+   idx = 2;
else if ((type & IORESOURCE_MEM_64) &&
 (b_res[2].flags & IORESOURCE_MEM_64))
idx = 2;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 2aeb4bc..49cfb55 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -240,6 +240,8 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
struct resource *res = dev->resource + resno;
resource_size_t min;
int ret;
+   unsigned long mmio64 = pci_find_host_bridge(bus)->has_mem64 ?
+   IORESOURCE_MEM_64 : 0;
 
min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
 
@@ -251,7 +253,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 * things differently than they were sized, not everything will fit.
 */
ret = pci_bus_alloc_resource(bus, res, size, align, min,
-IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
+IORESOURCE_PREFETCH | mmio64,
 pcibios_align_resource, dev);
if (ret == 0)
return 0;
@@ -260,7 +262,8 @@ static int __pci_assign_resource(struct pci_bus *bus, struct
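
The rules this patch touches can be modelled in a few lines of user-space C. The sketch is hypothetical: the function names are not kernel APIs, and the flag values are copied from include/linux/ioport.h.

- try_above_4g() mirrors the mmio64 test added to pci_bus_alloc_resource().
- release_idx() follows the numbered rules in the pci_bridge_release_resources() comment, including the new has_mem64 check.

```c
#include <stdbool.h>

/* User-space copies of flag bits from include/linux/ioport.h. */
#define IORESOURCE_IO		0x00000100UL
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_PREFETCH	0x00002000UL
#define IORESOURCE_MEM_64	0x00100000UL

/* The mmio64 mask: MEM_64 only counts if the host bridge has it. */
static unsigned long mmio64_mask(bool host_has_mem64)
{
	return host_has_mem64 ? IORESOURCE_MEM_64 : 0UL;
}

/* Try the above-4G region only when the resource is 64-bit and the
 * host bridge has 64-bit MMIO. */
static bool try_above_4g(unsigned long res_flags, bool host_has_mem64)
{
	return (res_flags & mmio64_mask(host_has_mem64)) != 0;
}

/*
 * Bridge window (0 = I/O, 1 = non-pref MMIO, 2 = pref MMIO) to release
 * after a failed assignment of a resource of @type.
 */
static int release_idx(unsigned long type, bool host_has_mem64,
		       unsigned long bridge_pref_flags)
{
	if (type & IORESOURCE_IO)
		return 0;				/* rule 1 */
	if (!(type & IORESOURCE_PREFETCH))
		return 1;				/* rule 2 */
	if (!host_has_mem64)
		return 2;				/* rule 3 (new) */
	if ((type & IORESOURCE_MEM_64) &&
	    (bridge_pref_flags & IORESOURCE_MEM_64))
		return 2;				/* rule 4 */
	if (!(bridge_pref_flags & IORESOURCE_MEM_64) &&
	    (bridge_pref_flags & IORESOURCE_PREFETCH))
		return 2;				/* rule 5 */
	return 1;					/* rule 6 */
}
```

Consider a failed 32-bit prefetchable BAR under a host bridge without 64-bit MMIO. It now releases window 2, the prefetchable window, and never reaches the 64-bit checks.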

[PATCH 11/13] PCI: Add has_mem64 for struct host_bridge

2017-04-20 Thread Yinghai Lu
Add has_mem64 to struct pci_host_bridge. It is left unset for root
buses that do not support 64-bit MMIO above 4G.

We will use that information in the following two patches:
1. Don't treat non-prefetchable 64-bit MMIO as prefetchable MMIO, so it
   is not put under the bridge's prefetchable window when the devices
   are rescanned.
2. Keep both 64-bit and 32-bit prefetchable MMIO under the bridge's
   prefetchable window.

Signed-off-by: Yinghai Lu <ying...@kernel.org>
Tested-by: Khalid Aziz <khalid.a...@oracle.com>
---
 drivers/pci/probe.c | 7 +++
 include/linux/pci.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 676b55f..8f439e0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -818,6 +818,13 @@ int pci_register_host_bridge(struct pci_host_bridge *bridge)
addr[0] = '\0';
 
dev_info(&bus->dev, "root bus resource %pR%s\n", res, addr);
+
+   if (resource_type(res) == IORESOURCE_MEM) {
+   if ((res->end - offset) > 0xffffffff)
+   bridge->has_mem64 = 1;
+   if ((res->start - offset) > 0xffffffff)
+   res->flags |= IORESOURCE_MEM_64;
+   }
}
 
down_write(&pci_bus_sem);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b14dd94..a3693ef 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -436,6 +436,7 @@ struct pci_host_bridge {
void *release_data;
struct msi_controller *msi;
unsigned int ignore_reset_delay:1;  /* for entire hierarchy */
+   unsigned int has_mem64:1;
/* Resource alignment requirements */
resource_size_t (*align_resource)(struct pci_dev *dev,
const struct resource *res,
-- 
2.9.3
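The check added to pci_register_host_bridge() can be modelled in user space. The model is a sketch:

- struct window and scan_windows() are hypothetical.
- Only the MEM bit is tested, where the kernel compares resource_type(res) with IORESOURCE_MEM.
- The addresses in the example are made up. The CPU addresses are far above 4G while the first window's bus addresses are not, which is why the offset must be subtracted before comparing with 4G.

```c
#include <stdbool.h>
#include <stdint.h>

/* User-space copies of flag bits from include/linux/ioport.h. */
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_MEM_64	0x00100000UL

/* Hypothetical root bus window: CPU range plus CPU-to-bus offset. */
struct window {
	uint64_t start, end;	/* CPU addresses */
	uint64_t offset;	/* CPU address - bus address */
	unsigned long flags;
};

/*
 * A window whose bus-address end is above 4G means the host bridge has
 * 64-bit MMIO; a window whose bus-address start is above 4G is tagged
 * IORESOURCE_MEM_64.  Returns the resulting has_mem64 value.
 */
static bool scan_windows(struct window *w, int n)
{
	bool has_mem64 = false;

	for (int i = 0; i < n; i++) {
		if (!(w[i].flags & IORESOURCE_MEM))
			continue;
		if (w[i].end - w[i].offset > 0xffffffffULL)
			has_mem64 = true;
		if (w[i].start - w[i].offset > 0xffffffffULL)
			w[i].flags |= IORESOURCE_MEM_64;
	}
	return has_mem64;
}
```

Take a window at CPU 0x20000000000-0x2007effffff with offset 0x20000000000. It sits at bus 0x0-0x7effffff, so it is neither 64-bit nor sets has_mem64.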



[PATCH 03/13] sparc/PCI: Reserve legacy mmio after PCI mmio

2017-04-20 Thread Yinghai Lu
On one system, we found a bunch of resource claim failures for PCI devices:
pci_sun4v f02b894c: PCI host bridge to bus :00
pci_bus :00: root bus resource [io  0x2007e-0x2007e0fff] (bus address [0x-0xfff])
pci_bus :00: root bus resource [mem 0x2-0x27eff] (bus address [0x-0x7eff])
pci_bus :00: root bus resource [mem 0x20001-0x20007] (bus address [0x1-0x7])
...
PCI: Claiming :00:02.0: Resource 14: 0002..0002004f [200]
pci :00:02.0: can't claim BAR 14 [mem 0x2-0x2004f]: address conflict with Video RAM area [??? 0x2000a-0x2000b flags 0x8000]
pci :02:00.0: can't claim BAR 0 [mem 0x2-0x2000f]: no compatible bridge window
PCI: Claiming :02:00.0: Resource 3: 00020010..000200103fff [200]
pci :02:00.0: can't claim BAR 3 [mem 0x20010-0x200103fff]: no compatible bridge window
PCI: Claiming :02:00.1: Resource 0: 00020020..0002002f [200]
pci :02:00.1: can't claim BAR 0 [mem 0x20020-0x2002f]: no compatible bridge window
PCI: Claiming :02:00.1: Resource 3: 000200104000..000200107fff [200]
pci :02:00.1: can't claim BAR 3 [mem 0x200104000-0x200107fff]: no compatible bridge window
PCI: Claiming :02:00.2: Resource 0: 00020030..0002003f [200]
pci :02:00.2: can't claim BAR 0 [mem 0x20030-0x2003f]: no compatible bridge window
PCI: Claiming :02:00.2: Resource 3: 000200108000..00020010bfff [200]
pci :02:00.2: can't claim BAR 3 [mem 0x200108000-0x20010bfff]: no compatible bridge window
PCI: Claiming :02:00.3: Resource 0: 00020040..0002004f [200]
pci :02:00.3: can't claim BAR 0 [mem 0x20040-0x2004f]: no compatible bridge window
PCI: Claiming :02:00.3: Resource 3: 00020010c000..00020010 [200]
pci :02:00.3: can't claim BAR 3 [mem 0x20010c000-0x20010]: no compatible bridge window

The resource of bridge 00:02.0 does not get reserved because the Video
RAM area takes its position early, and as a result the reservations of
all the following child resources fail.

Move the Video RAM area reservation to after the PCI MMIO resources are
reserved, so that those regions are left for the PCI devices.

-v5: merge the simplified version and use pcibios_bus_to_resource()

-v6: use pci_find_bus_resource()

Signed-off-by: Yinghai Lu <ying...@kernel.org>
Tested-by: Khalid Aziz <khalid.a...@oracle.com>
Cc: sparcli...@vger.kernel.org
---
 arch/sparc/kernel/pci.c|  1 +
 arch/sparc/kernel/pci_common.c | 59 ++
 arch/sparc/kernel/pci_impl.h   |  1 +
 3 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index c5cf813..adb9653 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -686,6 +686,7 @@ struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm,
pci_bus_register_of_sysfs(bus);
 
pci_claim_bus_resources(bus);
+   pci_register_legacy_regions(bus);
pci_bus_add_devices(bus);
return bus;
 }
diff --git a/arch/sparc/kernel/pci_common.c b/arch/sparc/kernel/pci_common.c
index 76998f8..1ebc7ff 100644
--- a/arch/sparc/kernel/pci_common.c
+++ b/arch/sparc/kernel/pci_common.c
@@ -328,41 +328,46 @@ void pci_get_pbm_props(struct pci_pbm_info *pbm)
}
 }
 
-static void pci_register_legacy_regions(struct resource *io_res,
-   struct resource *mem_res)
+static void pci_register_region(struct pci_bus *bus, const char *name,
+   resource_size_t rstart, resource_size_t size)
 {
-   struct resource *p;
+   struct resource *res, *conflict, *bus_res;
+   struct pci_bus_region region;
 
-   /* VGA Video RAM. */
-   p = kzalloc(sizeof(*p), GFP_KERNEL);
-   if (!p)
+   res = kzalloc(sizeof(*res), GFP_KERNEL);
+   if (!res)
return;
 
-   p->name = "Video RAM area";
-   p->start = mem_res->start + 0xa0000UL;
-   p->end = p->start + 0x1ffffUL;
-   p->flags = IORESOURCE_BUSY;
-   request_resource(mem_res, p);
+   res->flags = IORESOURCE_MEM;
 
-   p = kzalloc(sizeof(*p), GFP_KERNEL);
-   if (!p)
+   region.start = rstart;
+   region.end = rstart + size - 1UL;
+   pcibios_bus_to_resource(bus, res, &region);
+   bus_res = pci_find_bus_resource(bus, res);
+   if (!bus_res) {
+   kfree(res);
return;
+   }
+
+   res->name = name;
+   res->flags |= IORESOURCE_BUSY;
+   conflict = request_resource_conflict(bus_res, res);
+   if (conflict) {
+   dev_printk(KERN_DEBUG, &bus->dev,
+   " can't claim %s %pR: address conflict with %s %pR\n",
+   res->name, res, conflict->name, conflict);
+   kfree(res);
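
The new pci_register_region() converts a bus-address range into a CPU resource and only reserves it if a root bus window covers it. The sketch below models that lookup in user space. Its assumptions:

- bus_region_to_cpu() and the two structs are hypothetical.
- Each window translates with one constant offset (CPU address = bus address + offset).
- The example region is the legacy VGA range 0xa0000-0xbffff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical root bus window: CPU range plus CPU-to-bus offset. */
struct bus_window {
	uint64_t start, end;	/* CPU addresses */
	uint64_t offset;	/* CPU address - bus address */
};

/* Hypothetical address range, used for both bus and CPU addresses. */
struct region {
	uint64_t start, end;
};

/*
 * Translate a bus region to CPU addresses through the first window
 * that fully contains it.  Returns false if no window covers it, in
 * which case the patch frees the resource instead of reserving it.
 */
static bool bus_region_to_cpu(const struct bus_window *w, int n,
			      struct region bus, struct region *cpu)
{
	for (int i = 0; i < n; i++) {
		uint64_t s = bus.start + w[i].offset;
		uint64_t e = bus.end + w[i].offset;

		if (s >= w[i].start && e <= w[i].end) {
			cpu->start = s;
			cpu->end = e;
			return true;
		}
	}
	return false;
}
```

With a window offset of 0x20000000000, the VGA region maps to CPU 0x200000a0000-0x200000bffff. Because the reservation now happens after pci_claim_bus_resources(), it no longer blocks a bridge window that was claimed first.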

Re: [PATCH v2] PCI: disable SERR for kdump kernel

2017-04-20 Thread Yinghai Lu
On Thu, Apr 20, 2017 at 10:14 AM, Sinan Kaya <ok...@codeaurora.org> wrote:
> On 4/18/2017 8:31 PM, Yinghai Lu wrote:
>> * pci_setup_device - fill in class and map information of a device
>>   * @dev: the device structure to fill
>> @@ -1572,6 +1592,9 @@ int pci_setup_device(struct pci_dev *dev
>>   /* device class may be changed after fixup */
>>   class = dev->class >> 8;
>>
>> + if (is_kdump_kernel())
>> + pci_disable_serr(dev);
>> +
>
> This sounds like something that needs to be done while shutting down
> the first kernel as part of the kdump procedure rather than boot of
> the kdump kernel in pci setup.

In the kdump path, the first kernel's shutdown path is not called.

We have to do something in the second kernel instead.

Thanks

Yinghai


[PATCH v2] PCI: disable SERR for kdump kernel

2017-04-18 Thread Yinghai Lu
On one system with InfiniBand and SR-IOV enabled, SR-IOV BAR probing in
the kdump kernel triggers a PCI fatal error. That asserts the error pin,
and the host gets reset by the BMC.

We can just ignore that error so that the kernel can go on and kdump
can create the vmcore.

-v2: add debug printouts

Signed-off-by: Yinghai Lu <ying...@kernel.org>

---
 drivers/pci/probe.c |   23 +++
 1 file changed, 23 insertions(+)

Index: linux-2.6/drivers/pci/probe.c
===
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include <linux/crash_dump.h>
+
 #include "pci.h"
 
 #define CARDBUS_LATENCY_TIMER  176 /* secondary latency timer */
@@ -1515,6 +1517,24 @@ static void pci_msi_setup_pci_dev(struct
pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
 }
 
+static void pci_disable_serr(struct pci_dev *dev)
+{
+   u16 pci_cmd, pci_bctl;
+
+   pci_read_config_word(dev, PCI_COMMAND, &pci_cmd);
+   pci_cmd &= ~PCI_COMMAND_SERR;
+   pci_write_config_word(dev, PCI_COMMAND, pci_cmd);
+   dev_printk(KERN_DEBUG, &dev->dev, "SERR cleared\n");
+
+   /* Program bridge control value */
+   if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI) {
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &pci_bctl);
+   pci_bctl &= ~PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, pci_bctl);
+   dev_printk(KERN_DEBUG, &dev->dev, "BRIDGE SERR cleared\n");
+   }
+}
+
 /**
  * pci_setup_device - fill in class and map information of a device
  * @dev: the device structure to fill
@@ -1572,6 +1592,9 @@ int pci_setup_device(struct pci_dev *dev
/* device class may be changed after fixup */
class = dev->class >> 8;
 
+   if (is_kdump_kernel())
+   pci_disable_serr(dev);
+
if (dev->non_compliant_bars) {
pci_read_config_word(dev, PCI_COMMAND, &cmd);
if (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) {
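
What pci_disable_serr() does to the two config registers can be shown without hardware. In this sketch the config space is a hypothetical struct rather than pci_read_config_word()/pci_write_config_word(). The bit values are copied from include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>

/* Bit values copied from include/uapi/linux/pci_regs.h. */
#define PCI_COMMAND_SERR	0x100	/* Command register: SERR# enable */
#define PCI_BRIDGE_CTL_SERR	0x02	/* Bridge Control: SERR# forwarding */

/* Hypothetical snapshot of the two config registers the patch touches. */
struct cfg_regs {
	uint16_t command;		/* PCI_COMMAND */
	uint16_t bridge_control;	/* PCI_BRIDGE_CONTROL (bridges only) */
	int is_pci_bridge;		/* class == PCI_CLASS_BRIDGE_PCI */
};

/* Same read-modify-write as pci_disable_serr(), on plain memory. */
static void disable_serr(struct cfg_regs *r)
{
	r->command &= (uint16_t)~PCI_COMMAND_SERR;
	if (r->is_pci_bridge)
		r->bridge_control &= (uint16_t)~PCI_BRIDGE_CTL_SERR;
}
```

Only the SERR# bits change. Other Command bits, such as memory and bus-master enable, are preserved, and the Bridge Control value of a non-bridge device is left untouched.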


Re: [PATCH] x86/boot: Support uncompressed kernel

2017-03-23 Thread Yinghai Lu
On Thu, Mar 23, 2017 at 5:51 AM, Chao Peng  wrote:
> Compressed kernel has its own drawback: uncompressing takes time. Even
> though the time is short enough to ignore for most cases but for cases that
> time is critical this is still a big number. In our on-going optimization
> for kernel boot time, the measured overall kernel boot time is ~90ms while
> the uncompressing takes ~50ms with gzip.
>
> The patch adds a 'CONFIG_KERNEL_RAW' configure choice so the built binary
> can have no uncompressing at all. The experiment shows:
>
> kernel   kernel sizetime in decompress_kernel
> compressed (gzip)3.3M   53ms
> uncompressed 14M3ms

What about the difference in the time it takes the bootloader to read
the kernel from flash/disk/network into RAM?
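The question points at a trade-off: the compressed image is read faster but decompressed slower, so which one wins depends on the bandwidth of the boot medium. A back-of-the-envelope sketch using only the figures quoted above, assuming "M" means megabytes and that read time and decompression time are the only costs that differ:

```c
/* Estimated load cost in ms: time to read the image plus time to decompress it. */
static double load_ms(double size_mb, double read_mb_per_s, double decompress_ms)
{
	return size_mb / read_mb_per_s * 1000.0 + decompress_ms;
}

/*
 * Read bandwidth (MB/s) at which both images cost the same:
 * solve small/B + t_small = big/B + t_big for B.
 */
static double break_even_mb_per_s(double small_mb, double small_ms,
				  double big_mb, double big_ms)
{
	return (big_mb - small_mb) / ((small_ms - big_ms) / 1000.0);
}
```

With the quoted numbers the break-even is (14 - 3.3) MB / 0.050 s, about 214 MB/s: on a medium slower than that, the gzip image would finish loading first under these assumptions.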

Thanks

Yinghai


Re: [tip:x86/asm] x86/asm: Optimize clear_page()

2017-03-07 Thread Yinghai Lu
On Mon, Mar 6, 2017 at 11:30 PM, Ingo Molnar <mi...@kernel.org> wrote:
>
> * Yinghai Lu <ying...@kernel.org> wrote:
>
>> On Wed, Mar 1, 2017 at 1:47 AM, tip-bot for Borislav Petkov
>> <tip...@zytor.com> wrote:
>> > Commit-ID:  49ca7bb328c630dd43be626534b49e19513296fd
>> > Gitweb: 
>> > http://git.kernel.org/tip/49ca7bb328c630dd43be626534b49e19513296fd
>> > Author: Borislav Petkov <b...@suse.de>
>> > AuthorDate: Thu, 9 Feb 2017 01:34:49 +0100
>> > Committer:  Ingo Molnar <mi...@kernel.org>
>> > CommitDate: Wed, 1 Mar 2017 10:18:32 +0100
>> >
>> > x86/asm: Optimize clear_page()
>> >
>> > Currently, we CALL clear_page() which then JMPs to the proper function
>> > chosen by the alternatives.
>> >
>> > What we should do instead is CALL the proper function directly. (This
>> > was something Ingo suggested a while ago). So let's do that.
>>
>> looks like this one broke the kexec.
>> after revert it back, kexec work again.
>
> Ok, this should be fixed in the new version I just pushed out:
>
>   f25d38475519 x86/asm: Optimize clear_page()
>
> Please let me know if it doesn't.

Yes, the new commit works with kexec.

Thanks

Yinghai


Re: [tip:x86/asm] x86/asm: Optimize clear_page()

2017-03-06 Thread Yinghai Lu
On Wed, Mar 1, 2017 at 1:47 AM, tip-bot for Borislav Petkov
 wrote:
> Commit-ID:  49ca7bb328c630dd43be626534b49e19513296fd
> Gitweb: http://git.kernel.org/tip/49ca7bb328c630dd43be626534b49e19513296fd
> Author: Borislav Petkov 
> AuthorDate: Thu, 9 Feb 2017 01:34:49 +0100
> Committer:  Ingo Molnar 
> CommitDate: Wed, 1 Mar 2017 10:18:32 +0100
>
> x86/asm: Optimize clear_page()
>
> Currently, we CALL clear_page() which then JMPs to the proper function
> chosen by the alternatives.
>
> What we should do instead is CALL the proper function directly. (This
> was something Ingo suggested a while ago). So let's do that.

Looks like this one broke kexec.
After reverting it, kexec works again.

10:~/k # sh kk
add_buffer: base:43fff6000 bufsz:80e0 memsz:a000
add_buffer: base:43fff1000 bufsz:44ce memsz:44ce
add_buffer: base:43c00 bufsz:eb2360 memsz:352e000
add_buffer: base:439d0d000 bufsz:22f2060 memsz:22f2060
add_buffer: base:43fff bufsz:70 memsz:70
add_buffer: base:43ffef000 bufsz:140 memsz:140
10:~/k # [   79.250483] BUG: unable to handle kernel paging request at c467661dc038
[   79.251562] IP: __handle_mm_fault+0x256/0x910
[   79.252157] PGD 0
[   79.252159]
[   79.252733] Oops:  [#1] SMP
[   79.253243] Modules linked in:
[   79.253718] CPU: 4 PID: 5593 Comm: hald-addon-stor Not tainted 4.11.0-rc1-yh-00100-g00db9e3-dirty #175
[   79.255054] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
[   79.256069] task: 8b43794c task.stack: b30dc6dac000
[   79.256887] RIP: 0010:__handle_mm_fault+0x256/0x910
[   79.257545] RSP: :b30dc6dafdd0 EFLAGS: 00010282
[   79.258225] RAX: 3928261dc000 RBX: 8b417a38dcf0 RCX: 3000
[   79.259175] RDX: 09cc3928261dcc7c RSI: 09cc3928261dcc7c RDI: b30dc6dafe48
[   79.260126] RBP: b30dc6dafe70 R08: 0001 R09: 8b43794c0c60
[   79.261095] R10: 3638e619 R11: 0001 R12: 8b427a72a538
[   79.261963] R13: c467661dc038 R14: b30dc6dafde0 R15: 0154
[   79.262903] FS:  7f29c1ce4740() GS:8b427ba0() knlGS:
[   79.263973] CS:  0010 DS:  ES:  CR0: 80050033
[   79.264741] CR2: c467661dc038 CR3: 00033a512000 CR4: 06e0
[   79.265679] Call Trace:
[   79.266003]  ? handle_mm_fault+0x138/0x320
[   79.266431]  handle_mm_fault+0x247/0x320
[   79.266968]  ? handle_mm_fault+0x47/0x320
[   79.267491]  __do_page_fault+0x49f/0x500
[   79.268039]  do_page_fault+0x65/0x80
[   79.268508]  page_fault+0x22/0x30
[   79.268975] RIP: 0033:0x7f29c0ed53e8
[   79.269443] RSP: 002b:7ffe63a0e080 EFLAGS: 00010246
[   79.271605] RAX:  RBX: 07c7 RCX: 7f29c0ed53e8
[   79.272794] RDX: 07c7 RSI: 0002 RDI: 0060d0e0
[   79.273741] RBP: 0002 R08: 7f29c1457de0 R09: 
[   79.274698] R10: 0001 R11: 0246 R12: 0060ac20
[   79.275648] R13: 0060d0e0 R14: 0060ac28 R15: 7f29c1457de0
[   79.276596] Code: 3f 00 00 41 81 e5 f8 0f 00 00 f6 c2 80 48 0f 44
c1 4c 03 2d 25 9d ca 01 48 21 d0 49 01 c5 4d 85 ed 4c 89 6d 90 0f 84
d1 04 00 00 <49> 8b 75 00 48 f7 c6 9f ff ff ff 75 6a 48 8b 05 be 35 eb
01 a8
[   79.279121] RIP: __handle_mm_fault+0x256/0x910 RSP: b30dc6dafdd0
[   79.279965] CR2: c467661dc038
[   79.280403] ---[ end trace 7bd128a831f77757 ]---
[   79.298303] general protection fault:  [#2] SMP
[   79.298997] Modules linked in:
[   79.299402] CPU: 4 PID: 5593 Comm: hald-addon-stor Tainted: G      D 4.11.0-rc1-yh-00100-g00db9e3-dirty #175
[   79.300794] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
[   79.301707] task: 8b43794c task.stack: b30dc6dac000
[   79.302502] RIP: 0010:__wake_up_common+0x4a/0x90
[   79.303133] RSP: :8b427ba03de0 EFLAGS: 00010006
[   79.303807] RAX: b30dc6263da0 RBX: 765622af RCX: 
[   79.304769] RDX:  RSI: 0001 RDI: b30dc6263da0
[   79.305730] RBP: 8b427ba03e18 R08:  R09: 0001
[   79.306691] R10:  R11: 0e2e7ae4 R12: afe71d08
[   79.307642] R13: 58e0432d872b20f9 R14:  R15: 0001
[   79.308571] FS:  7f29c1ce4740() GS:8b427ba0() knlGS:
[   79.309653] CS:  0010 DS:  ES:  CR0: 80050033
[   79.310434] CR2: c467661dc038 CR3: 00033a512000 CR4: 06e0
[   79.311398] Call Trace:
[   79.311724]  
[   79.311998]  __wake_up+0x39/0x50
[   79.312458]  wake_up_klogd_work_func+0x52/0x60
[   79.313119]  irq_work_run_list+0x43/0x70
[   79.313634]  ? tick_sched_handle.isra.16+0x50/0x50
[   79.314289]  irq_work_tick+0x40/0x50
[   79.314754]  update_process_times+0x42/0x60
[   79.315332]  tick_sched_handle.isra.16+0x41/0x50
[   79.315933]  tick_sched_timer+0x3d/0x70
[   79.316472]  __hrtimer_run_queues+0x264/0x440
[   

[PATCH] PCI/aspm: Fix link->downstream setting

2017-03-01 Thread Yinghai Lu
~ # echo 1 > /sys/bus/pci/devices/\:0b\:00.0/remove
...
 BUG: unable to handle kernel NULL pointer dereference at 0080
 IP: pcie_config_aspm_link+0x5d/0x2b0
 Call Trace:
  pcie_aspm_exit_link_state+0x75/0x130
  pci_stop_bus_device+0xa4/0xb0
  pci_stop_and_remove_bus_device_locked+0x1a/0x30
  remove_store+0x50/0x70
  dev_attr_store+0x18/0x30
  sysfs_kf_write+0x44/0x60
  kernfs_fop_write+0x10e/0x190
  __vfs_write+0x28/0x110
  ? rcu_read_lock_sched_held+0x5d/0x80
  ? rcu_sync_lockdep_assert+0x2c/0x60
  ? __sb_start_write+0x173/0x1a0
  ? vfs_write+0xb3/0x180
  vfs_write+0xc4/0x180
  SyS_write+0x49/0xa0
  do_syscall_64+0xa6/0x1c0
  entry_SYSCALL64_slow_path+0x25/0x25
 ---[ end trace bd187ee0267df5d9 ]---

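Set link->downstream even on the blacklist path: assign it when the link
state is allocated instead of in pcie_aspm_cap_init(), so that
pcie_config_aspm_link() in the trace above no longer dereferences a NULL
pointer when the device is removed.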

Signed-off-by: Yinghai Lu <ying...@kernel.org>

---
 drivers/pci/pcie/aspm.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Index: linux-2.6/drivers/pci/pcie/aspm.c
===
--- linux-2.6.orig/drivers/pci/pcie/aspm.c
+++ linux-2.6/drivers/pci/pcie/aspm.c
@@ -478,7 +478,7 @@ static void aspm_calc_l1ss_info(struct p
 
 static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
 {
-   struct pci_dev *child, *parent = link->pdev;
+   struct pci_dev *child = link->downstream, *parent = link->pdev;
struct pci_bus *linkbus = parent->subordinate;
struct aspm_register_info upreg, dwreg;
 
@@ -491,9 +491,7 @@ static void pcie_aspm_cap_init(struct pc
 
/* Get upstream/downstream components' register state */
pcie_get_aspm_reg(parent, );
-   child = pci_function_0(linkbus);
pcie_get_aspm_reg(child, );
-   link->downstream = child;
 
/*
 * If ASPM not supported, don't mess with the clocks and link,
@@ -800,6 +798,7 @@ static struct pcie_link_state *alloc_pci
INIT_LIST_HEAD(&link->children);
INIT_LIST_HEAD(&link->link);
link->pdev = pdev;
+   link->downstream = pci_function_0(pdev->subordinate);
 
/*
 * Root Ports and PCI/PCI-X to PCIe Bridges are roots of PCIe

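The bug pattern is an initializer skipped by an early return. A simplified sketch, with hypothetical `struct fake_link`, `fake_alloc()`, `fake_cap_init()` and `fake_exit()` standing in for the ASPM link state, its allocation, pcie_aspm_cap_init() and the teardown path; it shows why assigning the downstream pointer at allocation time keeps it valid on the blacklist path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct fake_dev { int devfn; };

/* Hypothetical, much-reduced stand-in for struct pcie_link_state. */
struct fake_link {
	struct fake_dev *pdev;		/* upstream port */
	struct fake_dev *downstream;	/* function 0 below the link */
	bool aspm_disabled;
};

/* Fixed ordering: downstream is set as soon as the link is allocated. */
static struct fake_link *fake_alloc(struct fake_dev *up, struct fake_dev *fn0)
{
	struct fake_link *link = calloc(1, sizeof(*link));

	if (!link)
		return NULL;
	link->pdev = up;
	link->downstream = fn0;
	return link;
}

/* Like pcie_aspm_cap_init(), returns early for blacklisted links. */
static void fake_cap_init(struct fake_link *link, bool blacklist)
{
	if (blacklist) {
		link->aspm_disabled = true;
		return;	/* anything assigned after this point is skipped */
	}
	/* ... read upstream/downstream capability registers here ... */
}

/* Teardown that, like pcie_config_aspm_link(), uses link->downstream. */
static int fake_exit(struct fake_link *link)
{
	return link->downstream->devfn;	/* NULL dereference if never set */
}
```

Moving the assignment into the allocator makes it independent of which return path the capability setup takes.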

Re: [PATCH] PCI, pciehp: Reuse set_slot_off()

2017-02-24 Thread Yinghai Lu
On Fri, Feb 24, 2017 at 9:17 AM, Raj, Ashok <ashok@intel.com> wrote:
> On Thu, Feb 23, 2017 at 10:54:35PM -0800, Yinghai Lu wrote:
>> +++ linux-2.6/drivers/pci/hotplug/pciehp_ctrl.c
>> @@ -71,9 +71,6 @@ static void set_slot_off(struct controll
>>*/
>>   msleep(1000);
>>   }
>> -
>> - pciehp_green_led_off(pslot);
>> - pciehp_set_attention_status(pslot, 1);
>
> Re using set_slot_off() in remove_board() make sense.. but i'm not sure
> why these are pulled out? It seems to be functionally complete
> when these are done in set_slot_off().

I just don't want the LED operations buried too deep inside a helper.

board_added() already does the LED operations directly on the success
path, so doing them directly on the error path too makes the two paths
more symmetric.

>
>>  }
>>
>>  /**
>> @@ -126,6 +123,8 @@ static int board_added(struct slot *p_sl
>>
>>  err_exit:
>>   set_slot_off(ctrl, p_slot);
>> + pciehp_green_led_off(p_slot);
>> + pciehp_set_attention_status(p_slot, 1);
>>   return retval;
>>  }
>>
>> @@ -142,16 +141,7 @@ static int remove_board(struct slot *p_s
>>   if (retval)
>>   return retval;
>>
>> - if (POWER_CTRL(ctrl)) {
>> - pciehp_power_off_slot(p_slot);
>> -
>> - /*
>> -  * After turning power off, we must wait for at least 1 second
>> -  * before taking any action that relies on power having been
>> -  * removed from the slot/adapter.
>> -  */
>> - msleep(1000);
>> - }
>> + set_slot_off(ctrl, p_slot);
>>
>>   /* turn off Green LED */
>>   pciehp_green_led_off(p_slot);
> Don't we need the pciehp_set_attention_status() here?

The attention LED could be on if the previous power-on did not complete successfully.

Thanks

Yinghai.


[PATCH] PCI, pciehp: Reuse set_slot_off()

2017-02-23 Thread Yinghai Lu
Currently set_slot_off() is used in the board_added() error path.

We can reuse it in remove_board() as well.

That requires moving the green LED and attention status updates out of
set_slot_off(), which also makes the code more readable.

Signed-off-by: Yinghai Lu <ying...@kernel.org>

---
 drivers/pci/hotplug/pciehp_ctrl.c |   16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_ctrl.c
===
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_ctrl.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_ctrl.c
@@ -71,9 +71,6 @@ static void set_slot_off(struct controll
 */
msleep(1000);
}
-
-   pciehp_green_led_off(pslot);
-   pciehp_set_attention_status(pslot, 1);
 }
 
 /**
@@ -126,6 +123,8 @@ static int board_added(struct slot *p_sl
 
 err_exit:
set_slot_off(ctrl, p_slot);
+   pciehp_green_led_off(p_slot);
+   pciehp_set_attention_status(p_slot, 1);
return retval;
 }
 
@@ -142,16 +141,7 @@ static int remove_board(struct slot *p_s
if (retval)
return retval;
 
-   if (POWER_CTRL(ctrl)) {
-   pciehp_power_off_slot(p_slot);
-
-   /*
-* After turning power off, we must wait for at least 1 second
-* before taking any action that relies on power having been
-* removed from the slot/adapter.
-*/
-   msleep(1000);
-   }
+   set_slot_off(ctrl, p_slot);
 
/* turn off Green LED */
pciehp_green_led_off(p_slot);

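After this change the LED handling lives in the callers. A minimal model of the resulting control flow, assuming a hypothetical `struct fake_slot` that only records power and indicator state; the real functions also program the hardware and sleep for one second after power-off.

```c
#include <stdbool.h>

/* Hypothetical slot state; the real driver drives Slot Control bits instead. */
struct fake_slot {
	bool power_ctrl;	/* slot has a power controller (POWER_CTRL) */
	bool powered;
	bool green_led;
	bool attention_led;
};

/* set_slot_off() after the patch: power only, no LED changes. */
static void fake_set_slot_off(struct fake_slot *s)
{
	if (s->power_ctrl)
		s->powered = false;	/* real code then waits >= 1 s (msleep(1000)) */
}

/* board_added() error path: power off, then signal the failure. */
static void fake_board_added_err(struct fake_slot *s)
{
	fake_set_slot_off(s);
	s->green_led = false;
	s->attention_led = true;
}

/* remove_board(): power off, then turn off the green LED only. */
static void fake_remove_board(struct fake_slot *s)
{
	fake_set_slot_off(s);
	s->green_led = false;
}
```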

[PATCH] PCI,pciehp: Skip not changed command write

2017-02-23 Thread Yinghai Lu
I noticed that two systems with different CPUs print different messages
when powering on slots:
First one:
 pciehp :60:03.2:pcie004: pciehp_green_led_on: SLOTCTRL a8 write cmd 100
 pciehp :60:03.2:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
 pciehp :60:03.2:pcie004: pending interrupts 0x0010 from Slot Status

Second one:
 pciehp :73:00.0:pcie004: pciehp_green_led_on: SLOTCTRL a8 write cmd 100
 pciehp :73:00.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
 pciehp :73:00.0:pcie004: pending interrupts 0x0010 from Slot Status
 pciehp :73:00.0:pcie004: pending interrupts 0x0010 from Slot Status

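The attention indicator does not actually change on either system.

The first system does not generate a Command Completed (CC) interrupt
when a write does not change the command; the second one generates CC
even when the write does not change it.

To avoid this difference in interrupts, check whether we are about to
write the value that is already in Slot Control, and if so, skip the
write.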

Signed-off-by: Yinghai Lu <ying...@kernel.org>
---
 drivers/pci/hotplug/pciehp_hpc.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
===
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
@@ -186,7 +186,7 @@ static void pcie_do_write_cmd(struct con
  u16 mask, bool wait)
 {
struct pci_dev *pdev = ctrl_dev(ctrl);
-   u16 slot_ctrl;
+   u16 slot_ctrl, old_slot_ctrl;
 
mutex_lock(&ctrl->ctrl_lock);
 
@@ -201,8 +201,14 @@ static void pcie_do_write_cmd(struct con
goto out;
}
 
+   old_slot_ctrl = slot_ctrl;
slot_ctrl &= ~mask;
slot_ctrl |= (cmd & mask);
+   if (slot_ctrl == old_slot_ctrl) {
+   ctrl_dbg(ctrl, "%s: mask/cmd %x/%x no change\n", __func__,
+mask, cmd & mask);
+   goto out;
+   }
ctrl->cmd_busy = 1;
smp_mb();
pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);

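The check added to pcie_do_write_cmd() is a read-modify-write that bails out when the merged value equals what is already in the register. A stand-alone sketch of just that step; `slot_ctrl_merge()` is a hypothetical helper, not a kernel function, while the field values come from the kernel's pci_regs.h and match the "write cmd 100" and "write cmd c0" lines in the logs above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Control field masks/values as in the kernel's pci_regs.h. */
#define PCI_EXP_SLTCTL_AIC		0x00c0	/* Attention Indicator Control */
#define PCI_EXP_SLTCTL_ATTN_IND_OFF	0x00c0
#define PCI_EXP_SLTCTL_PIC		0x0300	/* Power Indicator Control */
#define PCI_EXP_SLTCTL_PWR_IND_ON	0x0100
#define PCI_EXP_SLTCTL_PWR_IND_BLINK	0x0200

/*
 * Merge the masked bits of cmd into *slot_ctrl. Returns true if the
 * register value changes (so a write would follow); returns false and
 * leaves *slot_ctrl untouched otherwise.
 */
static bool slot_ctrl_merge(uint16_t *slot_ctrl, uint16_t cmd, uint16_t mask)
{
	uint16_t new_val = (uint16_t)((*slot_ctrl & ~mask) | (cmd & mask));

	if (new_val == *slot_ctrl)
		return false;	/* same value: skip the write */
	*slot_ctrl = new_val;
	return true;
}
```

Whether an identical write raises Command Completed is exactly the behavior that differed between the two systems, so skipping the write removes the difference instead of relying on either behavior.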

Re: [PATCH] PCI,pciehp: Move printout before write_cmd

2017-02-23 Thread yinghai . lu

yes.


On 02/23/2017 01:55 PM, James Puthukattukaran wrote:

So, the issue is that you could get the following sequence -

write command
print out in ISR for pending interrupt
ctrl_dbg message for write command

and this makes it look like the write command occurred after the 
pending interrupt message?
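The ordering described here arises because the interrupt triggered by the write can be logged before the caller reaches its own debug print. A single-threaded sketch, in which a hypothetical `fake_write_cmd()` "raises" the interrupt synchronously by logging the ISR message itself, shows how printing before the write fixes the order:

```c
#define LOG_MAX 8

enum log_event { EV_WRITE_CMD_DBG, EV_PENDING_IRQ };

/* Hypothetical in-memory log standing in for the kernel log buffer. */
static enum log_event log_buf[LOG_MAX];
static int log_len;

static void log_event(enum log_event ev)
{
	if (log_len < LOG_MAX)
		log_buf[log_len++] = ev;
}

/* Simulated hardware: completing the command immediately "fires" the ISR. */
static void fake_write_cmd(void)
{
	log_event(EV_PENDING_IRQ);
}

static void old_order(void)	/* write first, then print */
{
	fake_write_cmd();
	log_event(EV_WRITE_CMD_DBG);
}

static void new_order(void)	/* print first, then write */
{
	log_event(EV_WRITE_CMD_DBG);
	fake_write_cmd();
}
```

In the real driver the ISR runs asynchronously, so the old order only sometimes produced the confusing sequence; printing before the write guarantees the debug line is logged before any interrupt the write itself causes.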




On 02/23/2017 03:28 PM, Yinghai Lu wrote:

Bjorn complained about some strange printouts for pending interrupts.

They are caused by printing the command after write_cmd, so the
pending-interrupt message from the ISR can appear before it.

Adjust the sequence to get the right order for the debug printouts.

Signed-off-by: Yinghai Lu <ying...@kernel.org>

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 026830a..94df18f99 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -475,9 +475,9 @@ void pciehp_set_attention_status(struct slot *slot, u8 value)
  default:
  return;
  }
-pcie_write_cmd_nowait(ctrl, slot_cmd, PCI_EXP_SLTCTL_AIC);
  ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
   pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd);
+pcie_write_cmd_nowait(ctrl, slot_cmd, PCI_EXP_SLTCTL_AIC);
  }
void pciehp_green_led_on(struct slot *slot)
@@ -487,11 +487,11 @@ void pciehp_green_led_on(struct slot *slot)
  if (!PWR_LED(ctrl))
  return;

-pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_ON,
-  PCI_EXP_SLTCTL_PIC);
  ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
   pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
   PCI_EXP_SLTCTL_PWR_IND_ON);
+pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_ON,
+  PCI_EXP_SLTCTL_PIC);
  }
void pciehp_green_led_off(struct slot *slot)
@@ -501,11 +501,11 @@ void pciehp_green_led_off(struct slot *slot)
  if (!PWR_LED(ctrl))
  return;

-pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF,
-  PCI_EXP_SLTCTL_PIC);
  ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
   pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
   PCI_EXP_SLTCTL_PWR_IND_OFF);
+pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF,
+  PCI_EXP_SLTCTL_PIC);
  }
void pciehp_green_led_blink(struct slot *slot)
@@ -515,11 +515,11 @@ void pciehp_green_led_blink(struct slot *slot)
  if (!PWR_LED(ctrl))
  return;

-pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_BLINK,
-  PCI_EXP_SLTCTL_PIC);
  ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
   pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
   PCI_EXP_SLTCTL_PWR_IND_BLINK);
+pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_BLINK,
+  PCI_EXP_SLTCTL_PIC);
  }
int pciehp_power_on_slot(struct slot *slot)
@@ -536,10 +536,10 @@ int pciehp_power_on_slot(struct slot *slot)
 PCI_EXP_SLTSTA_PFD);
  ctrl->power_fault_detected = 0;

-pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC);
  ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
   pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
   PCI_EXP_SLTCTL_PWR_ON);
+pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC);
retval = pciehp_link_enable(ctrl);
  if (retval)
@@ -552,10 +552,10 @@ void pciehp_power_off_slot(struct slot *slot)
  {
  struct controller *ctrl = slot->ctrl;

-pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_OFF, PCI_EXP_SLTCTL_PCC);
  ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
   pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
   PCI_EXP_SLTCTL_PWR_OFF);
+pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_OFF, PCI_EXP_SLTCTL_PCC);
  }
static irqreturn_t pciehp_isr(int irq, void *dev_id)
@@ -701,9 +701,9 @@ void pcie_enable_notification(struct controller *ctrl)
  PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE |
  PCI_EXP_SLTCTL_DLLSCE);

-pcie_write_cmd_nowait(ctrl, cmd, mask);
  ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
   pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, cmd);
+pcie_write_cmd_nowait(ctrl, cmd, mask);
  }
static void pcie_disable_notification(struct controller *ctrl)
@@ -714,9 +714,9 @@ static void pcie_disable_notification(struct controller *ctrl)
  PCI_EXP_SLTCTL_MRLSCE | PCI_EXP_SLTCTL_PFDE |
  PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE |
  PCI_EXP_SLTCTL_DLLSCE);
-pcie_write_cmd(ctrl, 0, mask);
  ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
   pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
+pcie_write_cmd(ctrl, 0, mask);
  }
/*
@@ -743,18 +743,18 @@ int pciehp_reset_slot(struct slot *slot, int probe)
  ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE;
  s

[PATCH -v2] PCI,pciehp: Not write linkctrl register if val is not changed

2017-02-23 Thread Yinghai Lu
Most systems have the port link enabled by default, so they should not
get a confusing printout when nothing changes.

Also move the printout before the actual write, so that debug messages are printed in order.

-v2: inline __pciehp_link_set into pciehp_link_enable

Signed-off-by: Yinghai Lu <ying...@kernel.org>

---
 drivers/pci/hotplug/pciehp_hpc.c |   19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
===
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
@@ -333,28 +333,23 @@ int pciehp_check_link_status(struct cont
return 0;
 }
 
-static int __pciehp_link_set(struct controller *ctrl, bool enable)
+static int pciehp_link_enable(struct controller *ctrl)
 {
struct pci_dev *pdev = ctrl_dev(ctrl);
-   u16 lnk_ctrl;
+   u16 lnk_ctrl, old_lnk_ctrl;
 
pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnk_ctrl);
+   old_lnk_ctrl = lnk_ctrl;
 
-   if (enable)
-   lnk_ctrl &= ~PCI_EXP_LNKCTL_LD;
-   else
-   lnk_ctrl |= PCI_EXP_LNKCTL_LD;
+   lnk_ctrl &= ~PCI_EXP_LNKCTL_LD;
+   if (old_lnk_ctrl == lnk_ctrl)
+   return 0;
 
-   pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnk_ctrl);
ctrl_dbg(ctrl, "%s: lnk_ctrl = %x\n", __func__, lnk_ctrl);
+   pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnk_ctrl);
return 0;
 }
 
-static int pciehp_link_enable(struct controller *ctrl)
-{
-   return __pciehp_link_set(ctrl, true);
-}
-
 int pciehp_get_raw_indicator_status(struct hotplug_slot *hotplug_slot,
u8 *status)
 {
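The read, compare, log-then-write pattern in pciehp_link_enable() can be sketched in userspace C. This is only an illustration under stated assumptions: the register, the debug log, and every name starting with fake_ or ending in _sketch are hypothetical stand-ins, not kernel APIs; only the Link Disable bit value (0x0010) is taken from PCI_EXP_LNKCTL_LD.

```c
#include <stdint.h>

/* Link Disable bit of the PCIe Link Control register (PCI_EXP_LNKCTL_LD). */
#define LNKCTL_LD 0x0010

/* Hypothetical stand-ins for the port's Link Control register and debug log. */
static uint16_t fake_lnkctl;
static int fake_writes;
static int fake_dbg_msgs;

static void fake_read(uint16_t *val) { *val = fake_lnkctl; }
static void fake_write(uint16_t val) { fake_lnkctl = val; fake_writes++; }
static void fake_dbg(void) { fake_dbg_msgs++; }

/* Clear Link Disable; skip both the message and the write if nothing changes. */
static int link_enable_sketch(void)
{
	uint16_t lnk_ctrl, old_lnk_ctrl;

	fake_read(&lnk_ctrl);
	old_lnk_ctrl = lnk_ctrl;

	lnk_ctrl &= (uint16_t)~LNKCTL_LD;
	if (lnk_ctrl == old_lnk_ctrl)
		return 0;		/* link already enabled: stay quiet */

	fake_dbg();			/* log first ... */
	fake_write(lnk_ctrl);		/* ... then write */
	return 0;
}
```

With fake_lnkctl = 0x0040 (Link Disable already clear) the sketch neither logs nor writes; with 0x0050 it logs once and writes 0x0040.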


[PATCH] PCI,pciehp: Not write linkctrl register if val is changed

2017-02-23 Thread Yinghai Lu
Most systems have the port link enabled by default, so they should not
get a confusing printout when nothing changes.

Also move the printout before the actual write, so that debug messages are printed in order.

Signed-off-by: Yinghai Lu <ying...@kernel.org>

---
 drivers/pci/hotplug/pciehp_hpc.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
===
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
@@ -336,17 +336,21 @@ int pciehp_check_link_status(struct cont
 static int __pciehp_link_set(struct controller *ctrl, bool enable)
 {
struct pci_dev *pdev = ctrl_dev(ctrl);
-   u16 lnk_ctrl;
+   u16 lnk_ctrl, old_lnk_ctrl;
 
pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnk_ctrl);
+   old_lnk_ctrl = lnk_ctrl;
 
if (enable)
lnk_ctrl &= ~PCI_EXP_LNKCTL_LD;
else
lnk_ctrl |= PCI_EXP_LNKCTL_LD;
 
-   pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnk_ctrl);
+   if (old_lnk_ctrl == lnk_ctrl)
+   return 0;
+
ctrl_dbg(ctrl, "%s: lnk_ctrl = %x\n", __func__, lnk_ctrl);
+   pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnk_ctrl);
return 0;
 }
 


[PATCH] PCI,pciehp: Move printout before write_cmd

2017-02-23 Thread Yinghai Lu
Bjorn complained about some strange printouts for pending interrupts.

They are actually caused by printing the command after write_cmd.

Adjust the sequence so that the debug printouts come out in the right order.

Signed-off-by: Yinghai Lu <ying...@kernel.org>

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 026830a..94df18f99 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -475,9 +475,9 @@ void pciehp_set_attention_status(struct slot *slot, u8 value)
default:
return;
}
-   pcie_write_cmd_nowait(ctrl, slot_cmd, PCI_EXP_SLTCTL_AIC);
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd);
+   pcie_write_cmd_nowait(ctrl, slot_cmd, PCI_EXP_SLTCTL_AIC);
 }
 
 void pciehp_green_led_on(struct slot *slot)
@@ -487,11 +487,11 @@ void pciehp_green_led_on(struct slot *slot)
if (!PWR_LED(ctrl))
return;
 
-   pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_ON,
- PCI_EXP_SLTCTL_PIC);
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
 PCI_EXP_SLTCTL_PWR_IND_ON);
+   pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_ON,
+ PCI_EXP_SLTCTL_PIC);
 }
 
 void pciehp_green_led_off(struct slot *slot)
@@ -501,11 +501,11 @@ void pciehp_green_led_off(struct slot *slot)
if (!PWR_LED(ctrl))
return;
 
-   pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF,
- PCI_EXP_SLTCTL_PIC);
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
 PCI_EXP_SLTCTL_PWR_IND_OFF);
+   pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF,
+ PCI_EXP_SLTCTL_PIC);
 }
 
 void pciehp_green_led_blink(struct slot *slot)
@@ -515,11 +515,11 @@ void pciehp_green_led_blink(struct slot *slot)
if (!PWR_LED(ctrl))
return;
 
-   pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_BLINK,
- PCI_EXP_SLTCTL_PIC);
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
 PCI_EXP_SLTCTL_PWR_IND_BLINK);
+   pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_BLINK,
+ PCI_EXP_SLTCTL_PIC);
 }
 
 int pciehp_power_on_slot(struct slot *slot)
@@ -536,10 +536,10 @@ int pciehp_power_on_slot(struct slot *slot)
   PCI_EXP_SLTSTA_PFD);
ctrl->power_fault_detected = 0;
 
-   pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC);
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
 PCI_EXP_SLTCTL_PWR_ON);
+   pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC);
 
retval = pciehp_link_enable(ctrl);
if (retval)
@@ -552,10 +552,10 @@ void pciehp_power_off_slot(struct slot *slot)
 {
struct controller *ctrl = slot->ctrl;
 
-   pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_OFF, PCI_EXP_SLTCTL_PCC);
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
 PCI_EXP_SLTCTL_PWR_OFF);
+   pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_OFF, PCI_EXP_SLTCTL_PCC);
 }
 
 static irqreturn_t pciehp_isr(int irq, void *dev_id)
@@ -701,9 +701,9 @@ void pcie_enable_notification(struct controller *ctrl)
PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE |
PCI_EXP_SLTCTL_DLLSCE);
 
-   pcie_write_cmd_nowait(ctrl, cmd, mask);
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, cmd);
+   pcie_write_cmd_nowait(ctrl, cmd, mask);
 }
 
 static void pcie_disable_notification(struct controller *ctrl)
@@ -714,9 +714,9 @@ static void pcie_disable_notification(struct controller *ctrl)
PCI_EXP_SLTCTL_MRLSCE | PCI_EXP_SLTCTL_PFDE |
PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE |
PCI_EXP_SLTCTL_DLLSCE);
-   pcie_write_cmd(ctrl, 0, mask);
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
+   pcie_write_cmd(ctrl, 0, mask);
 }
 
 /*
@@ -743,18 +743,18 @@ int pciehp_reset_slot(struct slot *slot, int probe)
ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE;
stat_mask |= PCI_EXP_SLTSTA_DLLSC;
 
-   pcie_write_cmd(ctrl, 0, ctrl_m
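A minimal sketch of why the order matters, assuming the worst case in which the command-completed interrupt is handled, and logged, before the caller reaches its own ctrl_dbg(). In the real driver the ISR runs asynchronously, so the old order only sometimes produces the confusing sequence; here a hypothetical fake_write_cmd() calls a hypothetical fake_isr() synchronously to force it. All names are illustrative, not pciehp functions.

```c
#include <string.h>

/* Hypothetical in-memory log standing in for the kernel log buffer. */
#define LOG_MAX 8
static const char *log_buf[LOG_MAX];
static int log_len;

static void log_msg(const char *msg)
{
	if (log_len < LOG_MAX)
		log_buf[log_len++] = msg;
}

/* Returns 1 if the log entry at idx equals msg. */
static int log_is(int idx, const char *msg)
{
	return idx < log_len && strcmp(log_buf[idx], msg) == 0;
}

/* Hypothetical ISR: reports the Command Completed (0x0010) event. */
static void fake_isr(void)
{
	log_msg("pending interrupts 0x0010");
}

/* Hypothetical write_cmd: the command completes and the ISR runs at once. */
static void fake_write_cmd(void)
{
	fake_isr();
}

/* Old order: write the command, then print it. */
static void power_off_old_order(void)
{
	fake_write_cmd();
	log_msg("write cmd 400");
}

/* New order: print the command, then write it. */
static void power_off_new_order(void)
{
	log_msg("write cmd 400");
	fake_write_cmd();
}
```

With the old order the interrupt message lands first, which reads as if the interrupt preceded the command; with the new order the command is logged before its completion.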

Re: [PATCH] PCI,pciehp: Don't handle PDC for cards with attention button

2017-02-17 Thread Yinghai Lu
On Fri, Feb 17, 2017 at 2:39 PM, Bjorn Helgaas <helg...@kernel.org> wrote:
> On Thu, Feb 16, 2017 at 10:12:47PM -0800, Yinghai Lu wrote:
>
> I don't think it really makes sense to tie PDC handling with the
> attention button.  It might happen to avoid the problem on your
> system, but I don't see the logical connection between them.

But in pcie_enable_notification() we don't enable PDCE when an attention button is present:

cmd = PCI_EXP_SLTCTL_DLLSCE;
if (ATTN_BUTTN(ctrl))
cmd |= PCI_EXP_SLTCTL_ABPE;
else
cmd |= PCI_EXP_SLTCTL_PDCE;
if (!pciehp_poll_mode)
cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;

mask = (PCI_EXP_SLTCTL_PDCE | PCI_EXP_SLTCTL_ABPE |
PCI_EXP_SLTCTL_PFDE |
PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE |
PCI_EXP_SLTCTL_DLLSCE);

pcie_write_cmd_nowait(ctrl, cmd, mask);

Should we remove that check there?
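The cmd selection quoted above can be checked in isolation. A small sketch, assuming the Slot Control bit values from include/uapi/linux/pci_regs.h and using hypothetical SLTCTL_* names and a hypothetical notification_cmd() helper, shows that PDCE is set only when there is no attention button:

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Control enable bits (values as in include/uapi/linux/pci_regs.h). */
#define SLTCTL_ABPE   0x0001	/* Attention Button Pressed Enable */
#define SLTCTL_PDCE   0x0008	/* Presence Detect Changed Enable */
#define SLTCTL_CCIE   0x0010	/* Command Completed Interrupt Enable */
#define SLTCTL_HPIE   0x0020	/* Hot-Plug Interrupt Enable */
#define SLTCTL_DLLSCE 0x1000	/* Data Link Layer State Changed Enable */

/* Mirrors the cmd selection quoted from pcie_enable_notification(). */
static uint16_t notification_cmd(bool has_attn_button, bool poll_mode)
{
	uint16_t cmd = SLTCTL_DLLSCE;

	if (has_attn_button)
		cmd |= SLTCTL_ABPE;	/* button present: no PDCE */
	else
		cmd |= SLTCTL_PDCE;
	if (!poll_mode)
		cmd |= SLTCTL_HPIE | SLTCTL_CCIE;
	return cmd;
}
```

With interrupts enabled this yields 0x1031 when an attention button is present and 0x1038 when it is not.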

>
> Can you reproduce this by disabling pciehp and driving this sequence
> manually with setpci?  I suspect that we are tripping over our own
> feet because we read PCI_EXP_SLTSTA once, clear it (probably too
> early), then queue multiple events, then process those events possibly
> simultaneously.

sca15-2243-0a818158:~ # echo 1 > /sys/bus/pci/devices/\:3b\:00.0/remove
[  171.769322] pci :3b:00.0: PME# disabled
[  171.774459] iommu: Removing device :3b:00.0 from group 36
[  171.780984] pci :3b:00.0: freeing pci_dev info
sca15-2243-0a818158:~ # setpci -s :3a:00.0 0xa8.w
01cb
sca15-2243-0a818158:~ # setpci -s :3a:00.0 0xaa.w
0050
sca15-2243-0a818158:~ # setpci -s :3a:00.0 0xa8.w=0x05cb
sca15-2243-0a818158:~ # setpci -s :3a:00.0 0xaa.w
0158

So after power off, status bit 3 (PDC) gets set.
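The setpci values can be decoded by hand. As a sketch (bit values from the Slot Control/Status definitions in include/uapi/linux/pci_regs.h; the macro and helper names are hypothetical), writing 0x05cb over 0x01cb sets only Power Controller Control (0x0400, power off), and the Slot Status bits that became set going from 0x0050 to 0x0158 are PDC (0x0008) and DLLSC (0x0100):

```c
#include <stdint.h>

/* Slot Status bits (values as in include/uapi/linux/pci_regs.h). */
#define SLTSTA_PDC   0x0008	/* Presence Detect Changed */
#define SLTSTA_PDS   0x0040	/* Presence Detect State */
#define SLTSTA_DLLSC 0x0100	/* Data Link Layer State Changed */

/* Slot Control: Power Controller Control, 1 = power off. */
#define SLTCTL_PCC   0x0400

/* Bits that are set in 'after' but were clear in 'before'. */
static uint16_t newly_set(uint16_t before, uint16_t after)
{
	return (uint16_t)(after & ~before);
}
```

Presence Detect State (0x0040) stays set in 0x0158, so the card is still physically present even though PDC fired.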


[PATCH] PCI,pciehp: Don't handle PDC for cards with attention button

2017-02-16 Thread Yinghai Lu
On one system, when powering off a slot, we found that the card gets powered on again right away:
 # echo 0 > /sys/bus/pci/slots/1/power
 pci_hotplug: power_write_file: power = 0
 pciehp :16:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 11f1
 pciehp :16:00.0:pcie004: pciehp_unconfigure_device: domain:bus:dev = :17:00
 pci :17:00.0: PME# disabled
 pci :17:00.0: freeing pci_dev info
 pciehp :16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
 pciehp :16:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
 pciehp :16:00.0:pcie004: pending interrupts 0x0108 from Slot Status
 pciehp :16:00.0:pcie004: Slot(1): Link Down
 pciehp :16:00.0:pcie004: Slot(1): Link Down event ignored; already powering off
 pciehp :16:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
 pciehp :16:00.0:pcie004: pending interrupts 0x0018 from Slot Status  <==
 pciehp :16:00.0:pcie004: Slot(1): Card present
 pciehp :16:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1
 pciehp :16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
 pciehp :16:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0
 pciehp :16:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
 pciehp :16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
 pciehp :16:00.0:pcie004: pciehp_check_link_active: lnk_status = f103
 pciehp :16:00.0:pcie004: pending interrupts 0x0108 from Slot Status
 pciehp :16:00.0:pcie004: Slot(1): Link Up
...

Somehow the PDC bit gets set, and while handling the interrupt caused by
CC, the PDC event also gets handled, so the card gets powered on again.

pcie_enable_notification() already enables PDC notification only when
there is no attention button, so we can safely add a check in pciehp_isr()
to skip PDC handling when an attention button is present.

Signed-off-by: Yinghai Lu <ying...@kernel.org>

---
 drivers/pci/hotplug/pciehp_hpc.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
===
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
@@ -618,8 +618,9 @@ static irqreturn_t pciehp_isr(int irq, v
present = !!(status & PCI_EXP_SLTSTA_PDS);
ctrl_info(ctrl, "Slot(%s): Card %spresent\n", slot_name(slot),
  present ? "" : "not ");
-   pciehp_queue_interrupt_event(slot, present ? INT_PRESENCE_ON :
-INT_PRESENCE_OFF);
+   if (!ATTN_BUTTN(ctrl))
+   pciehp_queue_interrupt_event(slot, present ?
+   INT_PRESENCE_ON : INT_PRESENCE_OFF);
}
 
/* Check Power Fault Detected */
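The effect of the pciehp_isr() change can be modeled as a pure decision function. This sketch assumes the PDC/PDS bit values from include/uapi/linux/pci_regs.h; enum sketch_event and pdc_event() are hypothetical stand-ins for pciehp_queue_interrupt_event() with INT_PRESENCE_ON/OFF, and the other status bits the real ISR handles are ignored:

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Status bits (values as in include/uapi/linux/pci_regs.h). */
#define SLTSTA_PDC 0x0008	/* Presence Detect Changed */
#define SLTSTA_PDS 0x0040	/* Presence Detect State */

/* Hypothetical stand-ins for INT_PRESENCE_ON / INT_PRESENCE_OFF. */
enum sketch_event { EV_NONE, EV_PRESENCE_ON, EV_PRESENCE_OFF };

/* Which presence event, if any, the patched ISR would queue for 'status'. */
static enum sketch_event pdc_event(uint16_t status, bool has_attn_button)
{
	bool present;

	if (!(status & SLTSTA_PDC))
		return EV_NONE;

	present = (status & SLTSTA_PDS) != 0;
	/* "Card present/not present" is still logged at this point. */
	if (has_attn_button)
		return EV_NONE;		/* PDCE was never enabled: ignore the PDC */
	return present ? EV_PRESENCE_ON : EV_PRESENCE_OFF;
}
```

For a slot with an attention button, a stray PDC such as the one in the log above is still reported but no longer queues a presence event that would power the card back on.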


[PATCH] PCI/PME: Restore pcie_pme_driver.remove

2017-02-14 Thread Yinghai Lu
Found that on 4.9 and later, removing the PCI device for a PCIe port via /sys fails:
[ cut here ]
kernel BUG at drivers/pci/msi.c:370!
invalid opcode:  [#1] SMP
Modules linked in:
CPU: 1 PID: 14509 Comm: sh Tainted: GW  4.8.0-rc1-yh-00012-gd29438d
RIP: 0010:[]  free_msi_irqs+0x65/0x190
...
Call Trace:
 [] pci_disable_msi+0x34/0x40
 [] cleanup_service_irqs+0x27/0x30
 [] pcie_port_device_remove+0x2a/0x40
 [] pcie_portdrv_remove+0x40/0x50
 [] pci_device_remove+0x4b/0xc0
 [] __device_release_driver+0xb6/0x150
 [] device_release_driver+0x25/0x40
 [] pci_stop_bus_device+0x74/0xa0
 [] pci_stop_and_remove_bus_device_locked+0x1a/0x30
 [] remove_store+0x50/0x70
 [] dev_attr_store+0x18/0x30
 [] sysfs_kf_write+0x44/0x60
 [] kernfs_fop_write+0x10e/0x190
 [] __vfs_write+0x28/0x110
 [] ? percpu_down_read+0x44/0x80
 [] ? __sb_start_write+0xa7/0xe0
 [] ? __sb_start_write+0xa7/0xe0
 [] vfs_write+0xc4/0x180
 [] SyS_write+0x49/0xa0
 [] do_syscall_64+0xa6/0x1b0
 [] entry_SYSCALL64_slow_path+0x25/0x25
...
 RIP  [] free_msi_irqs+0x65/0x190
 RSP 
---[ end trace f4505e1dac5b95d3 ]---
Segmentation fault

Bisected to commit d7def2040077 ("PCI/PME: Make explicitly non-modular").
That commit did an extra thing: it removed the .remove callback from pcie_pme_driver.

Putting pcie_pme_remove back and restoring it in pcie_pme_driver fixes the problem.

Fixes: d7def2040077 ("PCI/PME: Make explicitly non-modular")
Cc: <sta...@vger.kernel.org>
Signed-off-by: Yinghai Lu <ying...@kernel.org>

diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c
index 7175293..2dd1c68 100644
--- a/drivers/pci/pcie/pme.c
+++ b/drivers/pci/pcie/pme.c
@@ -433,6 +433,17 @@ static int pcie_pme_resume(struct pcie_device *srv)
return 0;
 }
 
+/**
+ * pcie_pme_remove - Prepare PCIe PME service device for removal.
+ * @srv - PCIe service device to remove.
+ */
+static void pcie_pme_remove(struct pcie_device *srv)
+{
+   pcie_pme_suspend(srv);
+   free_irq(srv->irq, srv);
+   kfree(get_service_data(srv));
+}
+
 static struct pcie_port_service_driver pcie_pme_driver = {
.name   = "pcie_pme",
.port_type  = PCI_EXP_TYPE_ROOT_PORT,
@@ -441,6 +452,7 @@ static struct pcie_port_service_driver pcie_pme_driver = {
.probe  = pcie_pme_probe,
.suspend= pcie_pme_suspend,
.resume = pcie_pme_resume,
+   .remove = pcie_pme_remove,
 };
 
 /**
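Why a missing .remove ends in a BUG can be sketched with a toy model. As far as we can tell from the trace, pcie_pme_probe() installs an IRQ handler that nothing frees once .remove is gone, and free_msi_irqs() BUG()s if an MSI vector still has a handler attached when the port driver disables MSI. The struct and functions below are hypothetical and only track whether a handler is still attached:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a port service that owns one IRQ handler. */
struct sketch_service {
	bool irq_has_handler;
	void (*remove)(struct sketch_service *srv);
};

/* Like pcie_pme_probe(): installs a handler (request_irq()). */
static void sketch_probe(struct sketch_service *srv)
{
	srv->irq_has_handler = true;
}

/* Like the restored pcie_pme_remove(): drops the handler (free_irq()). */
static void sketch_remove(struct sketch_service *srv)
{
	srv->irq_has_handler = false;
}

/*
 * Port removal: run the service's .remove if it has one, then tear down MSI.
 * Returns false where free_msi_irqs() would BUG() on a still-attached handler.
 */
static bool sketch_port_remove(struct sketch_service *srv)
{
	if (srv->remove != NULL)
		srv->remove(srv);
	return !srv->irq_has_handler;
}
```

A service registered without .remove fails the teardown check, while one with the restored callback passes.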


Re: [GIT PULL] PCI fixes for v4.10

2017-02-10 Thread Yinghai Lu
On Fri, Feb 10, 2017 at 6:39 PM, Yinghai Lu <ying...@kernel.org> wrote:
> Ashok,
>
> Can ask your QA guys check only attached patch and commit 68db9bc ?

Cleaner patches: I split that into two small patches.

Thanks

Yinghai
From 68db9bc814362e7f24371c27d12a4f34477d9356 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lu...@wunner.de>
Date: Fri, 28 Oct 2016 10:52:06 +0200
Subject: PCI: pciehp: Add runtime PM support for PCIe hotplug ports

Linux 4.8 added support for runtime suspending PCIe ports to D3hot with
commit 006d44e49a25 ("PCI: Add runtime PM support for PCIe ports"), but
excluded hotplug ports.  Those are now afforded runtime PM by the present
commit.

Hotplug ports require a few extra considerations:

- The configuration space of the port remains accessible in D3hot, so all
  the functions to read or modify the Slot Status and Slot Control
  registers need not be modified.  Even turning on slot power doesn't seem
  to require the port to be in D0, at least the PCIe spec doesn't say so
  and I confirmed that by testing with a Thunderbolt controller.

- However D0 is required to access devices on the secondary bus.  This
  happens in pciehp_check_link_status() and pciehp_configure_device() (both
  called from board_added()) and in pciehp_unconfigure_device() (called
  from remove_board()), so acquire a runtime PM ref for their invocation.

- The hotplug port stays active as long as it has active children.  If all
  hotplugged devices below the port runtime suspend, the port is allowed to
  runtime suspend as well.  Plug and unplug detection continues to work in
  D3hot.

- Hotplug interrupts are delivered in-band, so while the hotplug port
  itself is allowed to go to D3hot, its parent ports must stay in D0 for
  interrupts to come through.  Add a corresponding restriction to
  pci_dev_check_d3cold().

- Runtime PM may only be allowed if the hotplug port is handled natively by
  the OS.  On ACPI systems, the port may alternatively be handled by the
  firmware and things break if the OS puts the port into D3 behind the
  firmware's back:  E.g. Thunderbolt hotplug ports on non-Macs are handled
  by Intel's firmware in System Management Mode and the firmware is known
  to access devices on the port's secondary bus without checking first if
  the port is in D0: https://bugzilla.kernel.org/show_bug.cgi?id=53811

Signed-off-by: Lukas Wunner <lu...@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
CC: Mika Westerberg <mika.westerb...@linux.intel.com>

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index efe69e8..ffd3fe6 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "../pci.h"
 #include "pciehp.h"
@@ -98,6 +99,7 @@ static int board_added(struct slot *p_slot)
 	pciehp_green_led_blink(p_slot);
 
 	/* Check link training status */
+	pm_runtime_get_sync(&ctrl->pcie->port->dev);
 	retval = pciehp_check_link_status(ctrl);
 	if (retval) {
 		ctrl_err(ctrl, "Failed to check link status\n");
@@ -118,12 +120,14 @@ static int board_added(struct slot *p_slot)
 		if (retval != -EEXIST)
 			goto err_exit;
 	}
+	pm_runtime_put(&ctrl->pcie->port->dev);
 
 	pciehp_green_led_on(p_slot);
 	pciehp_set_attention_status(p_slot, 0);
 	return 0;
 
 err_exit:
+	pm_runtime_put(&ctrl->pcie->port->dev);
 	set_slot_off(ctrl, p_slot);
 	return retval;
 }
@@ -137,7 +141,9 @@ static int remove_board(struct slot *p_slot)
 	int retval;
 	struct controller *ctrl = p_slot->ctrl;
 
+	pm_runtime_get_sync(&ctrl->pcie->port->dev);
 	retval = pciehp_unconfigure_device(p_slot);
+	pm_runtime_put(&ctrl->pcie->port->dev);
 	if (retval)
 		return retval;
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d86351a..1eb622c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2245,13 +2245,10 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
 			return false;
 
 		/*
-		 * Hotplug interrupts cannot be delivered if the link is down,
-		 * so parents of a hotplug port must stay awake. In addition,
-		 * hotplug ports handled by firmware in System Management Mode
+		 * Hotplug ports handled by firmware in System Management Mode
 		 * may not be put into D3 by the OS (Thunderbolt on non-Macs).
-		 * For simplicity, disallow in general for now.
 		 */
-		if (bridge->is_hotplug_bridge)
+		if (bridge->is_hotplug_bridge && !pciehp_is_native(bridge))
 			return false;
 
 		if (pci_bridge_d3_force)
@@ -2283,7 +2280,10 @@ static int pci_dev_check_d3cold(struct pci_dev *dev, void *data)
 	 !pci_pme_capable(dev, PCI_D3cold)) ||
 
 	/* If it is a bridge it must be allowed to go to D3. */
-	!pci_power_manageable(dev))
+	!pci_power_manageable(dev) ||
+
+	/* Hotplug interrupts cannot be delivered if the link is down. */
+	dev->is_hotplug_bridge)
 
 		*d3cold_ok = false;

Subject: [PATCH] PCI, pciehp: clean and reuse set_slot_off

Move out led

Re: [GIT PULL] PCI fixes for v4.10

2017-02-10 Thread Yinghai Lu
00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 6118.023880] pciehp :16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 6118.602855] pciehp :16:00.0:pcie004: pciehp_check_link_active: lnk_status = f103
[ 6118.611507] pciehp :16:00.0:pcie004: pending interrupts 0x0108 from Slot Status
[ 6118.620057] pciehp :16:00.0:pcie004: Slot(1): Link Up
[ 6118.626151] pciehp :16:00.0:pcie004: pciehp_check_link_active: lnk_status = f103
[ 6118.634828] pciehp :16:00.0:pcie004: Slot(1): Link Up event ignored; already powering on
[ 6118.741520] pciehp :16:00.0:pcie004: pciehp_check_link_status: lnk_status = f103
[ 6118.750201] pci :17:00.0: [108e:2088] type 00 class 0x020700
...

That means the assumption in commit 68db9bc about powering the slot on/off in D3hot is not right:
- The configuration space of the port remains accessible in D3hot, so all
  the functions to read or modify the Slot Status and Slot Control
  registers need not be modified.  Even turning on slot power doesn't seem
  to require the port to be in D0, at least the PCIe spec doesn't say so
  and I confirmed that by testing with a Thunderbolt controller.

This patch puts the port back in D0 while powering the slot on or off.

Signed-off-by: Yinghai Lu <ying...@kernel.org>

---
 drivers/pci/hotplug/pciehp_ctrl.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_ctrl.c
===
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_ctrl.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_ctrl.c
@@ -89,17 +89,17 @@ static int board_added(struct slot *p_sl
 	struct controller *ctrl = p_slot->ctrl;
 	struct pci_bus *parent = ctrl->pcie->port->subordinate;
 
+	pm_runtime_get_sync(&ctrl->pcie->port->dev);
 	if (POWER_CTRL(ctrl)) {
 		/* Power on slot */
 		retval = pciehp_power_on_slot(p_slot);
 		if (retval)
-			return retval;
+			goto err_exit;
 	}
 
 	pciehp_green_led_blink(p_slot);
 
 	/* Check link training status */
-	pm_runtime_get_sync(&ctrl->pcie->port->dev);
 	retval = pciehp_check_link_status(ctrl);
 	if (retval) {
 		ctrl_err(ctrl, "Failed to check link status\n");
@@ -143,9 +143,10 @@ static int remove_board(struct slot *p_s
 
 	pm_runtime_get_sync(&ctrl->pcie->port->dev);
 	retval = pciehp_unconfigure_device(p_slot);
-	pm_runtime_put(&ctrl->pcie->port->dev);
-	if (retval)
+	if (retval) {
+		pm_runtime_put(&ctrl->pcie->port->dev);
 		return retval;
+	}
 
 	if (POWER_CTRL(ctrl)) {
 		pciehp_power_off_slot(p_slot);
@@ -157,6 +158,7 @@ static int remove_board(struct slot *p_s
 		 */
 		msleep(1000);
 	}
+	pm_runtime_put(&ctrl->pcie->port->dev);
 
 	/* turn off Green LED */
 	pciehp_green_led_off(p_slot);


Re: pciehp is broken from 4.10-rc1

2017-02-04 Thread Yinghai Lu
On Sat, Feb 4, 2017 at 8:22 PM, Yinghai Lu <ying...@kernel.org> wrote:
> On Sat, Feb 4, 2017 at 3:34 PM, Lukas Wunner <lu...@wunner.de> wrote:
>> On Sat, Feb 04, 2017 at 01:44:34PM -0800, Yinghai Lu wrote:
>>> On Sat, Feb 4, 2017 at 10:56 AM, Lukas Wunner <lu...@wunner.de> wrote:
>>> > On Sat, Feb 04, 2017 at 09:12:54AM +0100, Lukas Wunner wrote:
>>> > Section 6.7.3.4 of the PCIe Base spec seems to support the theory above,
>>> > so here's a tentative patch.
>>> >
>>> >
>>> > -- >8 --
>>> > Subject: [PATCH] PCI: pciehp: Don't enable PME on runtime suspend
>>>
>>> it works:
>>
>> Thanks a lot for the report and for testing the patch!
>
> Wait, Commit 68db9bc still has problem with another server (skylake
> based), and this patch does not help.
>
>
> sca05-0a81fd8d:~ # echo 0 > /sys/bus/pci/slots/11/power
> [  362.721197] pci_hotplug: power_write_file: power = 0
> [  362.726887] pciehp :b3:00.0:pcie004: pciehp_get_power_status:
> SLOTCTRL a8 value read 11f1
> [  362.736431] pciehp :b3:00.0:pcie004: pciehp_unconfigure_device:
> domain:bus:dev = :b4:00
> [  362.746160] mlx4_core :b4:00.0: PME# disabled
> [  364.494033] pcieport :b3:00.0:   root_bridge ACPI_HANDLE
> 9e56b8811550 : pci:b3
> [  364.503274] pcieport :b3:00.0:  pciehp is native
> [  364.508863] pci :b4:00.0: freeing pci_dev info
> [  364.514718] pciehp :b3:00.0:pcie004: pending interrupts 0x0010
> from Slot Status
> [  364.523443] pciehp :b3:00.0:pcie004: pciehp_power_off_slot:
> SLOTCTRL a8 write cmd 400
> [  364.587047] pciehp :b3:00.0:pcie004: pending interrupts 0x0108
> from Slot Status
> [  364.595592] pciehp :b3:00.0:pcie004: Slot(11): Link Down
> [  364.602325] pciehp :b3:00.0:pcie004: Slot(11): Link Down event
> ignored; already powering off
> [  365.568415] pciehp :b3:00.0:pcie004: pciehp_green_led_off:
> SLOTCTRL a8 write cmd 300
> [  365.569338] pciehp :b3:00.0:pcie004: pending interrupts 0x0010
> from Slot Status
>
> sca05-0a81fd8d:~ # echo 1 > /sys/bus/pci/slots/11/power
> [  375.376609] pci_hotplug: power_write_file: power = 1
> [  375.382175] pciehp :b3:00.0:pcie004: pciehp_get_power_status:
> SLOTCTRL a8 value read 17f1
> [  375.392695] pciehp :b3:00.0:pcie004: pending interrupts 0x0010
> from Slot Status
> [  375.401370] pciehp :b3:00.0:pcie004: pciehp_power_on_slot:
> SLOTCTRL a8 write cmd 0
> [  375.410231] pciehp :b3:00.0:pcie004: pciehp_green_led_blink:
> SLOTCTRL a8 write cmd 200
> [  375.411071] pciehp :b3:00.0:pcie004: pending interrupts 0x0010
> from Slot Status
> [  375.445222] pciehp :b3:00.0:pcie004: pending interrupts 0x0010
> from Slot Status
> [  377.00] pciehp :b3:00.0:pcie004: Data Link Layer Link
> Active not set in 1000 msec
> [  378.960364] pci :b4:00.0 id reading try 50 times with interval
> 20 ms to get 
> [  378.969406] pciehp :b3:00.0:pcie004: pciehp_check_link_status:
> lnk_status = 5001
> [  378.978059] pciehp :b3:00.0:pcie004: link training error: status 0x5001
> [  378.985834] pciehp :b3:00.0:pcie004: Failed to check link status
> [  378.987185] pciehp :b3:00.0:pcie004: pending interrupts 0x0010
> from Slot Status
> [  378.987253] pciehp :b3:00.0:pcie004: pciehp_power_off_slot:
> SLOTCTRL a8 write cmd 400
> [  380.000409] pciehp :b3:00.0:pcie004: pciehp_green_led_off:
> SLOTCTRL a8 write cmd 300
> [  380.000674] pciehp :b3:00.0:pcie004: pending interrupts 0x0010
> from Slot Status
> [  380.018020] pciehp :b3:00.0:pcie004:
> pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
> [  380.019053] pciehp :b3:00.0:pcie004: pending interrupts 0x0010
> from Slot Status
> -bash: echo: write error: Operation not permitted
>
> revert commit 68db9bc, also make it working again.

Output after reverting 68db9bc:

sca05-0a81fd8d:~ # echo 0 > /sys/bus/pci/slots/11/power
[  359.966115] pci_hotplug: power_write_file: power = 0
[  359.971759] pciehp :b3:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 11f1
[  359.981284] pciehp :b3:00.0:pcie004: pciehp_unconfigure_device: domain:bus:dev = :b4:00
[  359.991017] mlx4_core :b4:00.0: PME# disabled
[  361.579571] pci :b4:00.0: freeing pci_dev info
[  361.585390] pciehp :b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[  361.594116] pciehp :b3:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
[  361.657705] pciehp :b3:00.0:pcie004: pending interrupts 0x0108 from Slot Status
[  361.666268] pciehp :b3:00.0:pcie004: Slot(11): Link Down
[  361.673076] pciehp :b3:00.0:pcie004: Slot(11): Link Down event ignored; already powering off
[  362.621894] pciehp :b3:00.0:pcie004: p

Re: pciehp is broken from 4.10-rc1

2017-02-04 Thread Yinghai Lu
On Sat, Feb 4, 2017 at 3:34 PM, Lukas Wunner <lu...@wunner.de> wrote:
> On Sat, Feb 04, 2017 at 01:44:34PM -0800, Yinghai Lu wrote:
>> On Sat, Feb 4, 2017 at 10:56 AM, Lukas Wunner <lu...@wunner.de> wrote:
>> > On Sat, Feb 04, 2017 at 09:12:54AM +0100, Lukas Wunner wrote:
>> > Section 6.7.3.4 of the PCIe Base spec seems to support the theory above,
>> > so here's a tentative patch.
>> >
>> >
>> > -- >8 --
>> > Subject: [PATCH] PCI: pciehp: Don't enable PME on runtime suspend
>>
>> it works:
>
> Thanks a lot for the report and for testing the patch!

Wait, commit 68db9bc still has a problem with another server (Skylake-based), and this patch does not help.


sca05-0a81fd8d:~ # echo 0 > /sys/bus/pci/slots/11/power
[  362.721197] pci_hotplug: power_write_file: power = 0
[  362.726887] pciehp :b3:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 11f1
[  362.736431] pciehp :b3:00.0:pcie004: pciehp_unconfigure_device: domain:bus:dev = :b4:00
[  362.746160] mlx4_core :b4:00.0: PME# disabled
[  364.494033] pcieport :b3:00.0:   root_bridge ACPI_HANDLE 9e56b8811550 : pci:b3
[  364.503274] pcieport :b3:00.0:  pciehp is native
[  364.508863] pci :b4:00.0: freeing pci_dev info
[  364.514718] pciehp :b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[  364.523443] pciehp :b3:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
[  364.587047] pciehp :b3:00.0:pcie004: pending interrupts 0x0108 from Slot Status
[  364.595592] pciehp :b3:00.0:pcie004: Slot(11): Link Down
[  364.602325] pciehp :b3:00.0:pcie004: Slot(11): Link Down event ignored; already powering off
[  365.568415] pciehp :b3:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
[  365.569338] pciehp :b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status

sca05-0a81fd8d:~ # echo 1 > /sys/bus/pci/slots/11/power
[  375.376609] pci_hotplug: power_write_file: power = 1
[  375.382175] pciehp :b3:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1
[  375.392695] pciehp :b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[  375.401370] pciehp :b3:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0
[  375.410231] pciehp :b3:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[  375.411071] pciehp :b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[  375.445222] pciehp :b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[  377.00] pciehp :b3:00.0:pcie004: Data Link Layer Link Active not set in 1000 msec
[  378.960364] pci :b4:00.0 id reading try 50 times with interval 20 ms to get
[  378.969406] pciehp :b3:00.0:pcie004: pciehp_check_link_status: lnk_status = 5001
[  378.978059] pciehp :b3:00.0:pcie004: link training error: status 0x5001
[  378.985834] pciehp :b3:00.0:pcie004: Failed to check link status
[  378.987185] pciehp :b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[  378.987253] pciehp :b3:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
[  380.000409] pciehp :b3:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
[  380.000674] pciehp :b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[  380.018020] pciehp :b3:00.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
[  380.019053] pciehp :b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status
-bash: echo: write error: Operation not permitted

Reverting commit 68db9bc also makes it work again.


Thanks


Yinghai

