Re: [PATCH v2 0/3] PCIe Host request to reserve IOVA

2018-12-13 Thread poza

On 2018-12-13 16:02, Srinath Mannam wrote:

Few SOCs have limitation that their PCIe host can't allow few inbound
address ranges.
Allowed inbound address ranges are listed in dma-ranges DT property and
this address ranges are required to do IOVA mapping.
Remaining address ranges have to be reserved in IOVA mapping.

PCIe Host driver of those SOCs has to list all address ranges which 
have
to reserve their IOVA address into PCIe host bridge resource entry 
list.
IOMMU framework will reserve these IOVAs while initializing IOMMU 
domain.


This patch set is based on Linux-4.19-rc1.

Changes from v1:
  - Addressed Oza review comments.

Srinath Mannam (3):
  PCI: Add dma-resv window list
  iommu/dma: IOVA reserve for PCI host reserve address list
  PCI: iproc: Add dma reserve resources to host

 drivers/iommu/dma-iommu.c   |  8 ++
 drivers/pci/controller/pcie-iproc.c | 51 
-

 drivers/pci/probe.c |  3 +++
 include/linux/pci.h |  1 +
 4 files changed, 62 insertions(+), 1 deletion(-)


Looks good to me.

Reviewed-by: Oza Pawandeep 


Re: [RFC PATCH 3/3] PCI: iproc: Add dma reserve resources to host

2018-12-13 Thread poza

On 2018-12-13 14:47, Srinath Mannam wrote:

Hi Oza,

Thank you for the review.
Please find my comments in lined.

On Thu, Dec 13, 2018 at 11:33 AM  wrote:


On 2018-12-12 11:16, Srinath Mannam wrote:
> IPROC host has the limitation that it can use
> only those address ranges given by dma-ranges
> property as inbound address.
> So that the memory address holes in dma-ranges
> should be reserved to allocate as DMA address.
>
> All such reserved addresses are created as resource
> entries and add to dma_resv list of pci host bridge.
>
> These dma reserve resources created by parsing
> dma-ranges parameter.
>
> Ex:
> dma-ranges = < \
>   0x4300 0x00 0x8000 0x00 0x8000 0x00 0x8000 \
>   0x4300 0x08 0x 0x08 0x 0x08 0x \
>   0x4300 0x80 0x 0x80 0x 0x40 0x>
>
> In the above example of dma-ranges, memory address from
> 0x0 - 0x8000,
> 0x1 - 0x8,
> 0x10 - 0x80 and
> 0x100 - 0x.
> are not allowed to use as inbound addresses.
> So that we need to add these address range to dma_resv
> list to reserve their IOVA address ranges.
>
> Signed-off-by: Srinath Mannam 
> ---
>  drivers/pci/controller/pcie-iproc.c | 49
> +
>  1 file changed, 49 insertions(+)
>
> diff --git a/drivers/pci/controller/pcie-iproc.c
> b/drivers/pci/controller/pcie-iproc.c
> index 3160e93..43e465a 100644
> --- a/drivers/pci/controller/pcie-iproc.c
> +++ b/drivers/pci/controller/pcie-iproc.c
> @@ -1154,25 +1154,74 @@ static int iproc_pcie_setup_ib(struct
> iproc_pcie *pcie,
>   return ret;
>  }
>
> +static int
> +iproc_pcie_add_dma_resv_range(struct device *dev, struct list_head
> *resources,
> +   uint64_t start, uint64_t end)
> +{
> + struct resource *res;
> +
> + res = devm_kzalloc(dev, sizeof(struct resource), GFP_KERNEL);
> + if (!res)
> + return -ENOMEM;
> +
> + res->start = (resource_size_t)start;
> + res->end = (resource_size_t)end;
> + pci_add_resource_offset(resources, res, 0);
> +
> + return 0;
> +}
> +
>  static int iproc_pcie_map_dma_ranges(struct iproc_pcie *pcie)
>  {
> + struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
>   struct of_pci_range range;
>   struct of_pci_range_parser parser;
>   int ret;
> + uint64_t start, end;
> + LIST_HEAD(resources);
>
>   /* Get the dma-ranges from DT */
>   ret = of_pci_dma_range_parser_init(, pcie->dev->of_node);
>   if (ret)
>   return ret;
>
> + start = 0;
>   for_each_of_pci_range(, ) {
> + end = range.pci_addr;
> + /* dma-ranges list expected in sorted order */
> + if (end < start) {
> + ret = -EINVAL;
> + goto out;
> + }
>   /* Each range entry corresponds to an inbound mapping region */
>   ret = iproc_pcie_setup_ib(pcie, , IPROC_PCIE_IB_MAP_MEM);
>   if (ret)
>   return ret;
> +
> + if (end - start) {
> + ret = iproc_pcie_add_dma_resv_range(pcie->dev,
> + ,
> + start, end);
> + if (ret)
> + goto out;
> + }
> + start = range.pci_addr + range.size;
>   }
>
> + end = ~0;
Hi Srinath,

this series is based on following patch sets.

https://lkml.org/lkml/2017/5/16/19
https://lkml.org/lkml/2017/5/16/23
https://lkml.org/lkml/2017/5/16/21,


Yes, this patch series is done based on the inputs of the patches you
sent earlier.


some comments to be adapted from the patch-set I did.

end = ~0;
you should consider DMA_MASK, to see iproc controller is in 32 bit or 
64

bit system.
please check following code snippet.

if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad,
+ DMA_BIT_MASK(sizeof(dma_addr_t) 
* 8) - 1);

+   reserve_iova(iovad, lo, hi);
+   }

Also if this controller is integrated to 64bit platform, but decide to
restrict DMA to 32 bit for some reason, the code should address such
scenarios.
so it is always safe to do

#define BITS_PER_BYTE 8
DMA_BIT_MASK(sizeof(dma_addr_t) * BITS_PER_BYTE)
so please use kernle macro to find the end of DMA region.


this change done with the assumption, that end_address is max bus
address(~0) instead
pcie RC dma mask.
Even dma-ranges has 64bit size dma-mask of PCIe host is forced to 
32bit.

// in of_dma_configure function
dev->coherent_dma_mask = DMA_BIT_MASK(32);
And dma-mask of endpoint was set to 64bit in their drivers. also SMMU 
supported

dma mask is 48-bit.
But here requirement is all address ranges except 

Re: [RFC PATCH 3/3] PCI: iproc: Add dma reserve resources to host

2018-12-12 Thread poza

On 2018-12-12 11:16, Srinath Mannam wrote:

IPROC host has the limitation that it can use
only those address ranges given by dma-ranges
property as inbound address.
So that the memory address holes in dma-ranges
should be reserved to allocate as DMA address.

All such reserved addresses are created as resource
entries and add to dma_resv list of pci host bridge.

These dma reserve resources created by parsing
dma-ranges parameter.

Ex:
dma-ranges = < \
  0x4300 0x00 0x8000 0x00 0x8000 0x00 0x8000 \
  0x4300 0x08 0x 0x08 0x 0x08 0x \
  0x4300 0x80 0x 0x80 0x 0x40 0x>

In the above example of dma-ranges, memory address from
0x0 - 0x8000,
0x1 - 0x8,
0x10 - 0x80 and
0x100 - 0x.
are not allowed to use as inbound addresses.
So that we need to add these address range to dma_resv
list to reserve their IOVA address ranges.

Signed-off-by: Srinath Mannam 
---
 drivers/pci/controller/pcie-iproc.c | 49 
+

 1 file changed, 49 insertions(+)

diff --git a/drivers/pci/controller/pcie-iproc.c
b/drivers/pci/controller/pcie-iproc.c
index 3160e93..43e465a 100644
--- a/drivers/pci/controller/pcie-iproc.c
+++ b/drivers/pci/controller/pcie-iproc.c
@@ -1154,25 +1154,74 @@ static int iproc_pcie_setup_ib(struct 
iproc_pcie *pcie,

return ret;
 }

+static int
+iproc_pcie_add_dma_resv_range(struct device *dev, struct list_head 
*resources,

+ uint64_t start, uint64_t end)
+{
+   struct resource *res;
+
+   res = devm_kzalloc(dev, sizeof(struct resource), GFP_KERNEL);
+   if (!res)
+   return -ENOMEM;
+
+   res->start = (resource_size_t)start;
+   res->end = (resource_size_t)end;
+   pci_add_resource_offset(resources, res, 0);
+
+   return 0;
+}
+
 static int iproc_pcie_map_dma_ranges(struct iproc_pcie *pcie)
 {
+   struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
struct of_pci_range range;
struct of_pci_range_parser parser;
int ret;
+   uint64_t start, end;
+   LIST_HEAD(resources);

/* Get the dma-ranges from DT */
ret = of_pci_dma_range_parser_init(, pcie->dev->of_node);
if (ret)
return ret;

+   start = 0;
for_each_of_pci_range(, ) {
+   end = range.pci_addr;
+   /* dma-ranges list expected in sorted order */
+   if (end < start) {
+   ret = -EINVAL;
+   goto out;
+   }
/* Each range entry corresponds to an inbound mapping region */
ret = iproc_pcie_setup_ib(pcie, , IPROC_PCIE_IB_MAP_MEM);
if (ret)
return ret;
+
+   if (end - start) {
+   ret = iproc_pcie_add_dma_resv_range(pcie->dev,
+   ,
+   start, end);
+   if (ret)
+   goto out;
+   }
+   start = range.pci_addr + range.size;
}

+   end = ~0;

Hi Srinath,

this series is based on following patch sets.

https://lkml.org/lkml/2017/5/16/19
https://lkml.org/lkml/2017/5/16/23
https://lkml.org/lkml/2017/5/16/21,

some comments to be adapted from the patch-set I did.

end = ~0;
you should consider DMA_MASK, to see iproc controller is in 32 bit or 64 
bit system.

please check following code snippet.

if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+   lo = iova_pfn(iovad, tmp_dma_addr);
+   hi = iova_pfn(iovad,
+ DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+   reserve_iova(iovad, lo, hi);
+   }

Also if this controller is integrated to 64bit platform, but decide to 
restrict DMA to 32 bit for some reason, the code should address such 
scenarios.

so it is always safe to do

#define BITS_PER_BYTE 8
DMA_BIT_MASK(sizeof(dma_addr_t) * BITS_PER_BYTE)
so please use kernle macro to find the end of DMA region.


Also ideally according to SBSA v5

8.3 PCI Express device view of memory

Transactions from a PCI express device will either directly address the 
memory system of the base server system
or be presented to a SMMU for optional address translation and 
permission policing.
In systems that are compatible with level 3 or above of the SBSA, the 
addresses sent by PCI express devices
must be presented to the memory system or SMMU unmodified. In a system 
where the PCI express does not use
an SMMU, the PCI express devices have the same view of physical memory 
as the PEs. In a system with a
SMMU for PCI express there are no transformations to addresses being 
sent by PCI express devices before they

are presented as an input address to the SMMU.


ASPEED graphics card: ast_pci_probe causes RCU stalls

2018-11-21 Thread poza

Hi,


we have on-board ASPEED Graphics card on PCIe.

kernel version: 4.16

I select following drive to enable ast graphics support.
Symbol: DRM_AST [=y] 
 
  \u2502
  AST server chips   
 
\u2502
  Location:  
 
  \u2502
  -> Device Drivers  
 
\u2502
-> Graphics support  
 
\u2502
  Defined at drivers/gpu/drm/ast/Kconfig:1   
 
  \u2502
  Depends on: HAS_IOMEM [=y] && DRM [=y] && PCI [=y] && MMU [=y] 
 
  \u2502

  Selects: DRM_TTM [=y] && DRM_KMS_HELPER [=y] && DRM_TTM [=y]

lspci -vvv output.
-

0007:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED 
Graphics Family (rev 41) (prog-if 00 [VGA controller])

Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
SERR- 
Latency: 0
Interrupt: pin A routed to IRQ 255
Region 0: Memory at e7010100 (32-bit, non-prefetchable) 
[size=16M]
Region 1: Memory at e7010080 (32-bit, non-prefetchable) 
[size=128K]

Region 2: I/O ports at 6 [size=128]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA 
PME(D0+,D1+,D2+,D3hot+,D3cold+)

Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
Address:   Data: 


it seems to me that; ast_pci_probe seems to stuck in set_clock

[   38.293239] INFO: rcu_sched self-detected stall on CPU
[   38.300808]  0-: (35 ticks this GP) 
idle=256/1/4611686018427387906 softirq=183/183 fqs=187

[   38.313653]   (t=421 jiffies g=-232 c=-233 q=322)
[   38.320592] Task dump for CPU 0:
[   38.325566] kworker/0:0 R  running task0 3  2 
0x0002

[   38.335989] Workqueue: events work_for_cpu_fn
[   38.342409] Call trace:
[   38.346025]  dump_backtrace+0x0/0x170
[   38.351413]  show_stack+0x14/0x20
[   38.356297]  sched_show_task+0x104/0x128
[   38.362173]  dump_cpu_task+0x40/0x50
[   38.367441]  rcu_dump_cpu_stacks+0x94/0xd4
[   38.373480]  rcu_check_callbacks+0x574/0x7b0
[   38.379785]  update_process_times+0x2c/0x58
[   38.385946]  tick_sched_handle.isra.5+0x30/0x50
[   38.393830]  tick_sched_timer+0x40/0x90
[   38.399480]  __hrtimer_run_queues+0x120/0x1b8
[   38.405895]  hrtimer_interrupt+0xd4/0x250
[   38.411815]  arch_timer_handler_phys+0x28/0x40
[   38.418361]  handle_percpu_devid_irq+0x80/0x138
[   38.425152]  generic_handle_irq+0x24/0x38
[   38.431057]  __handle_domain_irq+0x5c/0xb0
[   38.437104]  gic_handle_irq+0x7c/0x184
[   38.442639]  el1_irq+0xb0/0x140
[   38.447265]  ast_get_index_reg_mask+0x4/0x38
[   38.453553]  __i2c_bit_add_bus+0x54/0x3e0
[   38.459532]  i2c_bit_add_bus+0x14/0x20
[   38.465057]  ast_mode_init+0x230/0x358
[   38.470584]  ast_driver_load+0x5a4/0x968
[   38.476368]  drm_dev_register+0x154/0x1d8
[   38.482283]  drm_get_pci_dev+0x94/0x160
[   38.488047]  ast_pci_probe+0x18/0x20
[   38.493318]  local_pci_probe+0x28/0x80
[   38.498830]  work_for_cpu_fn+0x18/0x28
[   38.504367]  process_one_work+0x1d4/0x310
[   38.510271]  worker_thread+0x230/0x470
[   38.515804]  kthread+0x128/0x130
[   38.520777]  ret_from_fork+0x10/0x18

Regards,
Oza.



ASPEED graphics card: ast_pci_probe causes RCU stalls

2018-11-21 Thread poza

Hi,


we have on-board ASPEED Graphics card on PCIe.

kernel version: 4.16

I select following drive to enable ast graphics support.
Symbol: DRM_AST [=y] 
 
  \u2502
  AST server chips   
 
\u2502
  Location:  
 
  \u2502
  -> Device Drivers  
 
\u2502
-> Graphics support  
 
\u2502
  Defined at drivers/gpu/drm/ast/Kconfig:1   
 
  \u2502
  Depends on: HAS_IOMEM [=y] && DRM [=y] && PCI [=y] && MMU [=y] 
 
  \u2502

  Selects: DRM_TTM [=y] && DRM_KMS_HELPER [=y] && DRM_TTM [=y]

lspci -vvv output.
-

0007:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED 
Graphics Family (rev 41) (prog-if 00 [VGA controller])

Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
SERR- 
Latency: 0
Interrupt: pin A routed to IRQ 255
Region 0: Memory at e7010100 (32-bit, non-prefetchable) 
[size=16M]
Region 1: Memory at e7010080 (32-bit, non-prefetchable) 
[size=128K]

Region 2: I/O ports at 6 [size=128]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA 
PME(D0+,D1+,D2+,D3hot+,D3cold+)

Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
Address:   Data: 


it seems to me that; ast_pci_probe seems to stuck in set_clock

[   38.293239] INFO: rcu_sched self-detected stall on CPU
[   38.300808]  0-: (35 ticks this GP) 
idle=256/1/4611686018427387906 softirq=183/183 fqs=187

[   38.313653]   (t=421 jiffies g=-232 c=-233 q=322)
[   38.320592] Task dump for CPU 0:
[   38.325566] kworker/0:0 R  running task0 3  2 
0x0002

[   38.335989] Workqueue: events work_for_cpu_fn
[   38.342409] Call trace:
[   38.346025]  dump_backtrace+0x0/0x170
[   38.351413]  show_stack+0x14/0x20
[   38.356297]  sched_show_task+0x104/0x128
[   38.362173]  dump_cpu_task+0x40/0x50
[   38.367441]  rcu_dump_cpu_stacks+0x94/0xd4
[   38.373480]  rcu_check_callbacks+0x574/0x7b0
[   38.379785]  update_process_times+0x2c/0x58
[   38.385946]  tick_sched_handle.isra.5+0x30/0x50
[   38.393830]  tick_sched_timer+0x40/0x90
[   38.399480]  __hrtimer_run_queues+0x120/0x1b8
[   38.405895]  hrtimer_interrupt+0xd4/0x250
[   38.411815]  arch_timer_handler_phys+0x28/0x40
[   38.418361]  handle_percpu_devid_irq+0x80/0x138
[   38.425152]  generic_handle_irq+0x24/0x38
[   38.431057]  __handle_domain_irq+0x5c/0xb0
[   38.437104]  gic_handle_irq+0x7c/0x184
[   38.442639]  el1_irq+0xb0/0x140
[   38.447265]  ast_get_index_reg_mask+0x4/0x38
[   38.453553]  __i2c_bit_add_bus+0x54/0x3e0
[   38.459532]  i2c_bit_add_bus+0x14/0x20
[   38.465057]  ast_mode_init+0x230/0x358
[   38.470584]  ast_driver_load+0x5a4/0x968
[   38.476368]  drm_dev_register+0x154/0x1d8
[   38.482283]  drm_get_pci_dev+0x94/0x160
[   38.488047]  ast_pci_probe+0x18/0x20
[   38.493318]  local_pci_probe+0x28/0x80
[   38.498830]  work_for_cpu_fn+0x18/0x28
[   38.504367]  process_one_work+0x1d4/0x310
[   38.510271]  worker_thread+0x230/0x470
[   38.515804]  kthread+0x128/0x130
[   38.520777]  ret_from_fork+0x10/0x18

Regards,
Oza.



Re: [PATCH] PCI/AER: Clear uncorrectable error status for device

2018-09-28 Thread poza

On 2018-09-27 03:38, Bjorn Helgaas wrote:

[+cc Sinan, LKML]

On Tue, Sep 18, 2018 at 04:20:29AM -0400, Oza Pawandeep wrote:

PCI based device drivers handles ERR_NONFATAL  by registering
pci_error_handlers. some of the drivers clear AER uncorrectable status
in slot_reset while some in resume.

Drivers should not have responsibility of clearing the AER status, 
instead

shall be done by error and recovery framework defined in err.c


Agreed, and Keith's patch 43c9a34fe04e ("PCI/ERR: Always use the first
downstream port") [1], which is queued on pci/hotplug for v4.20, does
call pci_cleanup_aer_uncorrect_error_status() at the end of
pcie_do_recovery().

1) Does that seem like the right place?

2) I guess all we need now would be to remove the calls from the
   drivers?

3) If we remove all the calls from the drivers, we should remove the
   declaration from include/linux/aer.h, too.

I can take care of these updates if we agree they're the right thing
to do.


sure Bjorn. this patch already removes all the calls from drivers and 
adds

call to pci_cleanup_aer_uncorrect_error_status().

Please feel free to modify or adapt and take care.

Regards,
Oza.



[1]
http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?id=43c9a34fe04e


Clear the status while resuming, after reset_link was successful.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/crypto/qat/qat_common/adf_aer.c 
b/drivers/crypto/qat/qat_common/adf_aer.c

index da8a2d3..61ded36 100644
--- a/drivers/crypto/qat/qat_common/adf_aer.c
+++ b/drivers/crypto/qat/qat_common/adf_aer.c
@@ -198,7 +198,6 @@ static pci_ers_result_t adf_slot_reset(struct 
pci_dev *pdev)

pr_err("QAT: Can't find acceleration device\n");
return PCI_ERS_RESULT_DISCONNECT;
}
-   pci_cleanup_aer_uncorrect_error_status(pdev);
if (adf_dev_aer_schedule_reset(accel_dev, ADF_DEV_RESET_SYNC))
return PCI_ERS_RESULT_DISCONNECT;

diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
index 4fa4c06..80c475f 100644
--- a/drivers/dma/ioat/init.c
+++ b/drivers/dma/ioat/init.c
@@ -1267,12 +1267,6 @@ static pci_ers_result_t 
ioat_pcie_error_slot_reset(struct pci_dev *pdev)

pci_wake_from_d3(pdev, false);
}

-   err = pci_cleanup_aer_uncorrect_error_status(pdev);
-   if (err) {
-   dev_err(>dev,
-   "AER uncorrect error status clear failed: %#x\n", err);
-   }
-
return result;
 }

diff --git a/drivers/infiniband/hw/hfi1/pcie.c 
b/drivers/infiniband/hw/hfi1/pcie.c

index baf7c32..38bc804 100644
--- a/drivers/infiniband/hw/hfi1/pcie.c
+++ b/drivers/infiniband/hw/hfi1/pcie.c
@@ -655,7 +655,6 @@ pci_resume(struct pci_dev *pdev)
struct hfi1_devdata *dd = pci_get_drvdata(pdev);

dd_dev_info(dd, "HFI1 resume function called\n");
-   pci_cleanup_aer_uncorrect_error_status(pdev);
/*
 * Running jobs will fail, since it's asynchronous
 * unlike sysfs-requested reset.   Better than
diff --git a/drivers/infiniband/hw/qib/qib_pcie.c 
b/drivers/infiniband/hw/qib/qib_pcie.c

index 5ac7b31..30595b3 100644
--- a/drivers/infiniband/hw/qib/qib_pcie.c
+++ b/drivers/infiniband/hw/qib/qib_pcie.c
@@ -597,7 +597,6 @@ qib_pci_resume(struct pci_dev *pdev)
struct qib_devdata *dd = pci_get_drvdata(pdev);

qib_devinfo(pdev, "QIB resume function called\n");
-   pci_cleanup_aer_uncorrect_error_status(pdev);
/*
 * Running jobs will fail, since it's asynchronous
 * unlike sysfs-requested reset.   Better than
diff --git a/drivers/net/ethernet/atheros/alx/main.c 
b/drivers/net/ethernet/atheros/alx/main.c

index 567ee54..0d0b6a4 100644
--- a/drivers/net/ethernet/atheros/alx/main.c
+++ b/drivers/net/ethernet/atheros/alx/main.c
@@ -1960,8 +1960,6 @@ static pci_ers_result_t 
alx_pci_error_slot_reset(struct pci_dev *pdev)

if (!alx_reset_mac(hw))
rc = PCI_ERS_RESULT_RECOVERED;
 out:
-   pci_cleanup_aer_uncorrect_error_status(pdev);
-
rtnl_unlock();

return rc;
diff --git a/drivers/net/ethernet/broadcom/bnx2.c 
b/drivers/net/ethernet/broadcom/bnx2.c

index 122fdb8..bbb2471 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -8793,13 +8793,6 @@ static pci_ers_result_t 
bnx2_io_slot_reset(struct pci_dev *pdev)

if (!(bp->flags & BNX2_FLAG_AER_ENABLED))
return result;

-   err = pci_cleanup_aer_uncorrect_error_status(pdev);
-   if (err) {
-   dev_err(>dev,
-   "pci_cleanup_aer_uncorrect_error_status failed 0x%0x\n",
-err); /* non-fatal, continue */
-   }
-
return result;
 }

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c

index 5b1ed24..cfb6c89 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ 

Re: [PATCH] PCI/AER: Clear uncorrectable error status for device

2018-09-28 Thread poza

On 2018-09-27 03:38, Bjorn Helgaas wrote:

[+cc Sinan, LKML]

On Tue, Sep 18, 2018 at 04:20:29AM -0400, Oza Pawandeep wrote:

PCI based device drivers handles ERR_NONFATAL  by registering
pci_error_handlers. some of the drivers clear AER uncorrectable status
in slot_reset while some in resume.

Drivers should not have responsibility of clearing the AER status, 
instead

shall be done by error and recovery framework defined in err.c


Agreed, and Keith's patch 43c9a34fe04e ("PCI/ERR: Always use the first
downstream port") [1], which is queued on pci/hotplug for v4.20, does
call pci_cleanup_aer_uncorrect_error_status() at the end of
pcie_do_recovery().

1) Does that seem like the right place?

2) I guess all we need now would be to remove the calls from the
   drivers?

3) If we remove all the calls from the drivers, we should remove the
   declaration from include/linux/aer.h, too.

I can take care of these updates if we agree they're the right thing
to do.


sure Bjorn. this patch already removes all the calls from drivers and 
adds

call to pci_cleanup_aer_uncorrect_error_status().

Please feel free to modify or adapt and take care.

Regards,
Oza.



[1]
http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?id=43c9a34fe04e


Clear the status while resuming, after reset_link was successful.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/crypto/qat/qat_common/adf_aer.c 
b/drivers/crypto/qat/qat_common/adf_aer.c

index da8a2d3..61ded36 100644
--- a/drivers/crypto/qat/qat_common/adf_aer.c
+++ b/drivers/crypto/qat/qat_common/adf_aer.c
@@ -198,7 +198,6 @@ static pci_ers_result_t adf_slot_reset(struct 
pci_dev *pdev)

pr_err("QAT: Can't find acceleration device\n");
return PCI_ERS_RESULT_DISCONNECT;
}
-   pci_cleanup_aer_uncorrect_error_status(pdev);
if (adf_dev_aer_schedule_reset(accel_dev, ADF_DEV_RESET_SYNC))
return PCI_ERS_RESULT_DISCONNECT;

diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
index 4fa4c06..80c475f 100644
--- a/drivers/dma/ioat/init.c
+++ b/drivers/dma/ioat/init.c
@@ -1267,12 +1267,6 @@ static pci_ers_result_t 
ioat_pcie_error_slot_reset(struct pci_dev *pdev)

pci_wake_from_d3(pdev, false);
}

-   err = pci_cleanup_aer_uncorrect_error_status(pdev);
-   if (err) {
-   dev_err(>dev,
-   "AER uncorrect error status clear failed: %#x\n", err);
-   }
-
return result;
 }

diff --git a/drivers/infiniband/hw/hfi1/pcie.c 
b/drivers/infiniband/hw/hfi1/pcie.c

index baf7c32..38bc804 100644
--- a/drivers/infiniband/hw/hfi1/pcie.c
+++ b/drivers/infiniband/hw/hfi1/pcie.c
@@ -655,7 +655,6 @@ pci_resume(struct pci_dev *pdev)
struct hfi1_devdata *dd = pci_get_drvdata(pdev);

dd_dev_info(dd, "HFI1 resume function called\n");
-   pci_cleanup_aer_uncorrect_error_status(pdev);
/*
 * Running jobs will fail, since it's asynchronous
 * unlike sysfs-requested reset.   Better than
diff --git a/drivers/infiniband/hw/qib/qib_pcie.c 
b/drivers/infiniband/hw/qib/qib_pcie.c

index 5ac7b31..30595b3 100644
--- a/drivers/infiniband/hw/qib/qib_pcie.c
+++ b/drivers/infiniband/hw/qib/qib_pcie.c
@@ -597,7 +597,6 @@ qib_pci_resume(struct pci_dev *pdev)
struct qib_devdata *dd = pci_get_drvdata(pdev);

qib_devinfo(pdev, "QIB resume function called\n");
-   pci_cleanup_aer_uncorrect_error_status(pdev);
/*
 * Running jobs will fail, since it's asynchronous
 * unlike sysfs-requested reset.   Better than
diff --git a/drivers/net/ethernet/atheros/alx/main.c 
b/drivers/net/ethernet/atheros/alx/main.c

index 567ee54..0d0b6a4 100644
--- a/drivers/net/ethernet/atheros/alx/main.c
+++ b/drivers/net/ethernet/atheros/alx/main.c
@@ -1960,8 +1960,6 @@ static pci_ers_result_t 
alx_pci_error_slot_reset(struct pci_dev *pdev)

if (!alx_reset_mac(hw))
rc = PCI_ERS_RESULT_RECOVERED;
 out:
-   pci_cleanup_aer_uncorrect_error_status(pdev);
-
rtnl_unlock();

return rc;
diff --git a/drivers/net/ethernet/broadcom/bnx2.c 
b/drivers/net/ethernet/broadcom/bnx2.c

index 122fdb8..bbb2471 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -8793,13 +8793,6 @@ static pci_ers_result_t 
bnx2_io_slot_reset(struct pci_dev *pdev)

if (!(bp->flags & BNX2_FLAG_AER_ENABLED))
return result;

-   err = pci_cleanup_aer_uncorrect_error_status(pdev);
-   if (err) {
-   dev_err(>dev,
-   "pci_cleanup_aer_uncorrect_error_status failed 0x%0x\n",
-err); /* non-fatal, continue */
-   }
-
return result;
 }

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c

index 5b1ed24..cfb6c89 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ 

Re: [PATCH] PCI/AER: Clear uncorrectable error status for device

2018-09-18 Thread poza

On 2018-09-18 20:00, Sinan Kaya wrote:

On 9/18/2018 4:20 AM, Oza Pawandeep wrote:

+++ b/drivers/pci/pcie/err.c
@@ -265,6 +265,8 @@ static pci_ers_result_t 
broadcast_error_message(struct pci_dev *dev,

 * The error is non fatal so the bus is ok; just invoke
 * the callback for the function that logged the error.
 */
+   if (cb == report_resume)
+   pci_cleanup_aer_uncorrect_error_status(dev);
cb(dev, _data);
}


In order to follow the existing behavior (drivers are calling
pci_cleanup_aer_uncorrect_error_status() right before return),
you should probably move the pci_cleanup_aer_uncorrect_error_status
after

cb(dev, _data);

line.


some drivers are calling it in slot_reset, which is before resume,
while some are calling in beginning of resume (e.g. netxen_io_resume)

hence I have kept it before calling resume()   (e.g. before cb(dev, 
_data))


Regards,
Oza.


Re: [PATCH] PCI/AER: Clear uncorrectable error status for device

2018-09-18 Thread poza

On 2018-09-18 20:00, Sinan Kaya wrote:

On 9/18/2018 4:20 AM, Oza Pawandeep wrote:

+++ b/drivers/pci/pcie/err.c
@@ -265,6 +265,8 @@ static pci_ers_result_t 
broadcast_error_message(struct pci_dev *dev,

 * The error is non fatal so the bus is ok; just invoke
 * the callback for the function that logged the error.
 */
+   if (cb == report_resume)
+   pci_cleanup_aer_uncorrect_error_status(dev);
cb(dev, _data);
}


In order to follow the existing behavior (drivers are calling
pci_cleanup_aer_uncorrect_error_status() right before return),
you should probably move the pci_cleanup_aer_uncorrect_error_status
after

cb(dev, _data);

line.


some drivers are calling it in slot_reset, which is before resume,
while some are calling in beginning of resume (e.g. netxen_io_resume)

hence I have kept it before calling resume()   (e.g. before cb(dev, 
_data))


Regards,
Oza.


Re: [PATCH v2] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-09-10 Thread poza

On 2018-09-06 01:40, Bjorn Helgaas wrote:

On Thu, Aug 09, 2018 at 08:41:27PM +0530, p...@codeaurora.org wrote:

On 2018-08-09 20:27, Bharat Kumar Gogada wrote:
> As per Figure 6-3 in PCIe r4.0, sec 6.2.6, ERR_ messages
> will be forwarded from the secondary interface to the primary interface,
> if the SERR# Enable bit in the Bridge Control register is set.
> Currently PCI_BRIDGE_CTL_SERR is being enabled only in
> ACPI flow.
> This patch enables PCI_BRIDGE_CTL_SERR for Type-1 PCI device.
>
> Signed-off-by: Bharat Kumar Gogada 
> ---
>  drivers/pci/pcie/aer.c |   23 +++
>  1 files changed, 23 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index a2e8838..4fb0d24 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -343,6 +343,20 @@ int pci_enable_pcie_error_reporting(struct pci_dev
> *dev)
>if (!dev->aer_cap)
>return -EIO;
>
> +  if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +  u16 control;
> +
> +  /*
> +   * A Type-1 PCI bridge will not forward ERR_ messages coming
> +   * from an endpoint if SERR# forwarding is not enabled.
> +   */
> +  pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
> +  if (!(control & PCI_BRIDGE_CTL_SERR)) {
> +  control |= PCI_BRIDGE_CTL_SERR;
> +  pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
> +  }
> +  }
> +
>return pcie_capability_set_word(dev, PCI_EXP_DEVCTL,
> PCI_EXP_AER_FLAGS);
>  }
>  EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
> @@ -352,6 +366,15 @@ int pci_disable_pcie_error_reporting(struct pci_dev
> *dev)
>if (pcie_aer_get_firmware_first(dev))
>return -EIO;
>
> +  if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +  u16 control;
> +
> +  /* Clear SERR Forwarding */
> +  pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
> +  control &= ~PCI_BRIDGE_CTL_SERR;
> +  pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
> +  }
> +
>return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>  PCI_EXP_AER_FLAGS);
>  }


Hi Bjorn,

I made some comments on patchv1., same I am putting it across here.

what about hot-plug case ?
should not aer_init() call pci_enable_pcie_error_reporting() for all
the downstream pci_dev ?
and remove all the calls from drivers..
aer_init will be called for each device (pci_dev) while pciehp does
re-enumeration.
so probable we might want to call pci_enable_pcie_error_reporting()
but that dictates the design where AER framework is taking decision to
enable error reporting on behalf of drivers as well.
but thats fine I think, if drivers do not want to participate then 
they have

to call  pci_disable_pcie_error_reporting explicitly.
does this make sense ?


I just replied to the original patch; sorry, I forgot to add you to
the cc list.  Bharat, when people respond to your v patch, can you
please add them (and anybody else *they* added) to the cc list when
you post your v patch?

If we set PCI_BRIDGE_CTL_SERR in the pci_configure_device() path,
would that address your comments?


yes, that should do.



There's still a separate question of where and how we should configure
the error bits in PCI_EXP_DEVCTL (currently done in
pci_enable_pcie_error_reporting()).

Bjorn


Re: [PATCH v2] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-09-10 Thread poza

On 2018-09-06 01:40, Bjorn Helgaas wrote:

On Thu, Aug 09, 2018 at 08:41:27PM +0530, p...@codeaurora.org wrote:

On 2018-08-09 20:27, Bharat Kumar Gogada wrote:
> As per Figure 6-3 in PCIe r4.0, sec 6.2.6, ERR_ messages
> will be forwarded from the secondary interface to the primary interface,
> if the SERR# Enable bit in the Bridge Control register is set.
> Currently PCI_BRIDGE_CTL_SERR is being enabled only in
> ACPI flow.
> This patch enables PCI_BRIDGE_CTL_SERR for Type-1 PCI device.
>
> Signed-off-by: Bharat Kumar Gogada 
> ---
>  drivers/pci/pcie/aer.c |   23 +++
>  1 files changed, 23 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index a2e8838..4fb0d24 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -343,6 +343,20 @@ int pci_enable_pcie_error_reporting(struct pci_dev
> *dev)
>if (!dev->aer_cap)
>return -EIO;
>
> +  if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +  u16 control;
> +
> +  /*
> +   * A Type-1 PCI bridge will not forward ERR_ messages coming
> +   * from an endpoint if SERR# forwarding is not enabled.
> +   */
> +  pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
> +  if (!(control & PCI_BRIDGE_CTL_SERR)) {
> +  control |= PCI_BRIDGE_CTL_SERR;
> +  pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
> +  }
> +  }
> +
>return pcie_capability_set_word(dev, PCI_EXP_DEVCTL,
> PCI_EXP_AER_FLAGS);
>  }
>  EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
> @@ -352,6 +366,15 @@ int pci_disable_pcie_error_reporting(struct pci_dev
> *dev)
>if (pcie_aer_get_firmware_first(dev))
>return -EIO;
>
> +  if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +  u16 control;
> +
> +  /* Clear SERR Forwarding */
> +  pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
> +  control &= ~PCI_BRIDGE_CTL_SERR;
> +  pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
> +  }
> +
>return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>  PCI_EXP_AER_FLAGS);
>  }


Hi Bjorn,

I made some comments on patchv1., same I am putting it across here.

what about hot-plug case ?
should not aer_init() call pci_enable_pcie_error_reporting() for all
the downstream pci_dev ?
and remove all the calls from drivers..
aer_init will be called for each device (pci_dev) while pciehp does
re-enumeration.
so probable we might want to call pci_enable_pcie_error_reporting()
but that dictates the design where AER framework is taking decision to
enable error reporting on behalf of drivers as well.
but thats fine I think, if drivers do not want to participate then 
they have

to call  pci_disable_pcie_error_reporting explicitly.
does this make sense ?


I just replied to the original patch; sorry, I forgot to add you to
the cc list.  Bharat, when people respond to your v patch, can you
please add them (and anybody else *they* added) to the cc list when
you post your v patch?

If we set PCI_BRIDGE_CTL_SERR in the pci_configure_device() path,
would that address your comments?


yes, that should do.



There's still a separate question of where and how we should configure
the error bits in PCI_EXP_DEVCTL (currently done in
pci_enable_pcie_error_reporting()).

Bjorn


Re: [PATCH v2] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-08-09 Thread poza

On 2018-08-09 20:27, Bharat Kumar Gogada wrote:

As per Figure 6-3 in PCIe r4.0, sec 6.2.6, ERR_ messages
will be forwarded from the secondary interface to the primary 
interface,

if the SERR# Enable bit in the Bridge Control register is set.
Currently PCI_BRIDGE_CTL_SERR is being enabled only in
ACPI flow.
This patch enables PCI_BRIDGE_CTL_SERR for Type-1 PCI device.

Signed-off-by: Bharat Kumar Gogada 
---
 drivers/pci/pcie/aer.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..4fb0d24 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -343,6 +343,20 @@ int pci_enable_pcie_error_reporting(struct pci_dev 
*dev)

if (!dev->aer_cap)
return -EIO;

+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /*
+* A Type-1 PCI bridge will not forward ERR_ messages coming
+* from an endpoint if SERR# forwarding is not enabled.
+*/
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   if (!(control & PCI_BRIDGE_CTL_SERR)) {
+   control |= PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+   }
+
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
PCI_EXP_AER_FLAGS);

 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
@@ -352,6 +366,15 @@ int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)

if (pcie_aer_get_firmware_first(dev))
return -EIO;

+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /* Clear SERR Forwarding */
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control &= ~PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
  PCI_EXP_AER_FLAGS);
 }



Hi Bjorn,

I made some comments on patchv1., same I am putting it across here.

what about hot-plug case ?
should not aer_init() call pci_enable_pcie_error_reporting() for all
the downstream pci_dev ?
and remove all the calls from drivers..
aer_init will be called for each device (pci_dev) while pciehp does 
re-enumeration.

so probable we might want to call pci_enable_pcie_error_reporting()
but that dictates the design where AER framework is taking decision to 
enable error reporting on behalf of drivers as well.
but thats fine I think, if drivers do not want to participate then they 
have to call  pci_disable_pcie_error_reporting explicitly.

does this make sense ?








Re: [PATCH v2] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-08-09 Thread poza

On 2018-08-09 20:27, Bharat Kumar Gogada wrote:

As per Figure 6-3 in PCIe r4.0, sec 6.2.6, ERR_ messages
will be forwarded from the secondary interface to the primary 
interface,

if the SERR# Enable bit in the Bridge Control register is set.
Currently PCI_BRIDGE_CTL_SERR is being enabled only in
ACPI flow.
This patch enables PCI_BRIDGE_CTL_SERR for Type-1 PCI device.

Signed-off-by: Bharat Kumar Gogada 
---
 drivers/pci/pcie/aer.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..4fb0d24 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -343,6 +343,20 @@ int pci_enable_pcie_error_reporting(struct pci_dev 
*dev)

if (!dev->aer_cap)
return -EIO;

+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /*
+* A Type-1 PCI bridge will not forward ERR_ messages coming
+* from an endpoint if SERR# forwarding is not enabled.
+*/
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   if (!(control & PCI_BRIDGE_CTL_SERR)) {
+   control |= PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+   }
+
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
PCI_EXP_AER_FLAGS);

 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
@@ -352,6 +366,15 @@ int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)

if (pcie_aer_get_firmware_first(dev))
return -EIO;

+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /* Clear SERR Forwarding */
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control &= ~PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
  PCI_EXP_AER_FLAGS);
 }



Hi Bjorn,

I made some comments on patchv1., same I am putting it across here.

what about hot-plug case ?
should not aer_init() call pci_enable_pcie_error_reporting() for all
the downstream pci_dev ?
and remove all the calls from drivers..
aer_init will be called for each device (pci_dev) while pciehp does 
re-enumeration.

so probable we might want to call pci_enable_pcie_error_reporting()
but that dictates the design where AER framework is taking decision to 
enable error reporting on behalf of drivers as well.
but thats fine I think, if drivers do not want to participate then they 
have to call  pci_disable_pcie_error_reporting explicitly.

does this make sense ?








Re: [PATCH] PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition

2018-08-02 Thread poza

On 2018-08-01 04:20, Bjorn Helgaas wrote:

From: Bjorn Helgaas 

PCI_EXP_AER_FLAGS was defined twice (with identical definitions), once
under #ifdef CONFIG_ACPI_APEI, and again at the top level.  This looks 
like

my merge error from these commits:

  fd3362cb73de ("PCI/AER: Squash aerdrv_core.c into aerdrv.c")
  41cbc9eb1a82 ("PCI/AER: Squash ecrc.c into aerdrv.c")

Remove the duplicate PCI_EXP_AER_FLAGS definition.

Fixes: 41cbc9eb1a82 ("PCI/AER: Squash ecrc.c into aerdrv.c")
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 2b344c9e2d46..c6cc855bfa22 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -329,8 +329,6 @@ int pcie_aer_get_firmware_first(struct pci_dev 
*dev)

aer_set_firmware_first(dev);
return dev->__aer_firmware_first;
 }
-#define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE 
| \

-PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)

 static bool aer_firmware_first;



Reviewed-by: Oza Pawandeep 


Re: [PATCH] PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition

2018-08-02 Thread poza

On 2018-08-01 04:20, Bjorn Helgaas wrote:

From: Bjorn Helgaas 

PCI_EXP_AER_FLAGS was defined twice (with identical definitions), once
under #ifdef CONFIG_ACPI_APEI, and again at the top level.  This looks 
like

my merge error from these commits:

  fd3362cb73de ("PCI/AER: Squash aerdrv_core.c into aerdrv.c")
  41cbc9eb1a82 ("PCI/AER: Squash ecrc.c into aerdrv.c")

Remove the duplicate PCI_EXP_AER_FLAGS definition.

Fixes: 41cbc9eb1a82 ("PCI/AER: Squash ecrc.c into aerdrv.c")
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pcie/aer.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 2b344c9e2d46..c6cc855bfa22 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -329,8 +329,6 @@ int pcie_aer_get_firmware_first(struct pci_dev 
*dev)

aer_set_firmware_first(dev);
return dev->__aer_firmware_first;
 }
-#define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE 
| \

-PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)

 static bool aer_firmware_first;



Reviewed-by: Oza Pawandeep 


Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-08-02 Thread poza

On 2018-08-02 11:53, p...@codeaurora.org wrote:

On 2018-08-01 04:17, Bjorn Helgaas wrote:

On Thu, Jul 12, 2018 at 08:15:19PM +0530, Bharat Kumar Gogada wrote:

Currently PCI_BRIDGE_CTL_SERR is being enabled only in
ACPI flow.
This bit is required for forwarding errors reported
by EP devices to upstream device.
This patch enables SERR# for Type-1 PCI device.


This does seem broken.

Figure 6-3 in PCIe r4.0, sec 6.2.6, would be a helpful reference to
include in the commit log.

Semi-related question: there are about 40 drivers that call
pci_enable_pcie_error_reporting() and
pci_disable_pcie_error_reporting().  I see that the PCI core
calls pci_enable_pcie_error_reporting() for Root Ports and Switch
Ports in this path:

  aer_probe # for root ports only
aer_enable_rootport
  set_downstream_devices_error_reporting
set_device_error_reporting
  if (ROOT_PORT || UPSTREAM || DOWNSTREAM)
pci_enable_pcie_error_reporting
pci_walk_bus(..., set_device_error_reporting)

But the core doesn't call pci_enable_pcie_error_reporting() for
endpoints.  I wonder why not.  Could we?  And then remove the calls
from those drivers?  If PCI_EXP_AER_FLAGS should only be set if the
driver is prepared, the pci_driver.err_handler would be a good hint.
But I suspect we could do something sensible and at least report
errors even if the driver doesn't have err_handler callbacks.



what about hot-plug case ?
should not aer_init() call pci_enable_pcie_error_reporting() for all
the downstream pci_dev ?
and remove all the calls from drivers..



aer_init will be called for each device (pci_dev) while pciehp does 
re-enumeration.

so probable we might want to call pci_enable_pcie_error_reporting()
but that dictates the design where AER framework is taking decision to 
enable error reporting on behalf of drivers as well.
but thats fine I think, if drivers do not want to participate then they 
have to call  pci_disable_pcie_error_reporting explicitly.

does this make sense ?


On MIPS Octeon, it looks like pcibios_plat_dev_init() does already set
PCI_EXP_AER_FLAGS for every device.

But this question is obviously far beyond the scope of this current
patch.


Signed-off-by: Bharat Kumar Gogada 
---
 drivers/pci/pcie/aer.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..943e084 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct 
pci_dev *dev)

if (!dev->aer_cap)
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {


I think this test needs to be refined a little bit.  If the kernel
happens to be built with CONFIG_ACPI=y but the current platform
doesn't support ACPI, we still want to set PCI_BRIDGE_CTL_SERR,
don't we?


+   u16 control;
+
+   /*
+* A Type-1 PCI bridge will not forward ERR_ messages coming
+* from an endpoint if SERR# forwarding is not enabled.
+*/
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control |= PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
PCI_EXP_AER_FLAGS);

 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
@@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)

if (pcie_aer_get_firmware_first(dev))
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /* Clear SERR Forwarding */
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control &= ~PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
  PCI_EXP_AER_FLAGS);
 }
--
1.7.1



Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-08-02 Thread poza

On 2018-08-02 11:53, p...@codeaurora.org wrote:

On 2018-08-01 04:17, Bjorn Helgaas wrote:

On Thu, Jul 12, 2018 at 08:15:19PM +0530, Bharat Kumar Gogada wrote:

Currently PCI_BRIDGE_CTL_SERR is being enabled only in
ACPI flow.
This bit is required for forwarding errors reported
by EP devices to upstream device.
This patch enables SERR# for Type-1 PCI device.


This does seem broken.

Figure 6-3 in PCIe r4.0, sec 6.2.6, would be a helpful reference to
include in the commit log.

Semi-related question: there are about 40 drivers that call
pci_enable_pcie_error_reporting() and
pci_disable_pcie_error_reporting().  I see that the PCI core
calls pci_enable_pcie_error_reporting() for Root Ports and Switch
Ports in this path:

  aer_probe # for root ports only
aer_enable_rootport
  set_downstream_devices_error_reporting
set_device_error_reporting
  if (ROOT_PORT || UPSTREAM || DOWNSTREAM)
pci_enable_pcie_error_reporting
pci_walk_bus(..., set_device_error_reporting)

But the core doesn't call pci_enable_pcie_error_reporting() for
endpoints.  I wonder why not.  Could we?  And then remove the calls
from those drivers?  If PCI_EXP_AER_FLAGS should only be set if the
driver is prepared, the pci_driver.err_handler would be a good hint.
But I suspect we could do something sensible and at least report
errors even if the driver doesn't have err_handler callbacks.



what about hot-plug case ?
should not aer_init() call pci_enable_pcie_error_reporting() for all
the downstream pci_dev ?
and remove all the calls from drivers..



aer_init will be called for each device (pci_dev) while pciehp does 
re-enumeration.

so probable we might want to call pci_enable_pcie_error_reporting()
but that dictates the design where AER framework is taking decision to 
enable error reporting on behalf of drivers as well.
but thats fine I think, if drivers do not want to participate then they 
have to call  pci_disable_pcie_error_reporting explicitly.

does this make sense ?


On MIPS Octeon, it looks like pcibios_plat_dev_init() does already set
PCI_EXP_AER_FLAGS for every device.

But this question is obviously far beyond the scope of this current
patch.


Signed-off-by: Bharat Kumar Gogada 
---
 drivers/pci/pcie/aer.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..943e084 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct 
pci_dev *dev)

if (!dev->aer_cap)
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {


I think this test needs to be refined a little bit.  If the kernel
happens to be built with CONFIG_ACPI=y but the current platform
doesn't support ACPI, we still want to set PCI_BRIDGE_CTL_SERR,
don't we?


+   u16 control;
+
+   /*
+* A Type-1 PCI bridge will not forward ERR_ messages coming
+* from an endpoint if SERR# forwarding is not enabled.
+*/
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control |= PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
PCI_EXP_AER_FLAGS);

 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
@@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)

if (pcie_aer_get_firmware_first(dev))
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /* Clear SERR Forwarding */
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control &= ~PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
  PCI_EXP_AER_FLAGS);
 }
--
1.7.1



Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-08-02 Thread poza

On 2018-08-01 04:17, Bjorn Helgaas wrote:

On Thu, Jul 12, 2018 at 08:15:19PM +0530, Bharat Kumar Gogada wrote:

Currently PCI_BRIDGE_CTL_SERR is being enabled only in
ACPI flow.
This bit is required for forwarding errors reported
by EP devices to upstream device.
This patch enables SERR# for Type-1 PCI device.


This does seem broken.

Figure 6-3 in PCIe r4.0, sec 6.2.6, would be a helpful reference to
include in the commit log.

Semi-related question: there are about 40 drivers that call
pci_enable_pcie_error_reporting() and
pci_disable_pcie_error_reporting().  I see that the PCI core
calls pci_enable_pcie_error_reporting() for Root Ports and Switch
Ports in this path:

  aer_probe # for root ports only
aer_enable_rootport
  set_downstream_devices_error_reporting
set_device_error_reporting
  if (ROOT_PORT || UPSTREAM || DOWNSTREAM)
pci_enable_pcie_error_reporting
pci_walk_bus(..., set_device_error_reporting)

But the core doesn't call pci_enable_pcie_error_reporting() for
endpoints.  I wonder why not.  Could we?  And then remove the calls
from those drivers?  If PCI_EXP_AER_FLAGS should only be set if the
driver is prepared, the pci_driver.err_handler would be a good hint.
But I suspect we could do something sensible and at least report
errors even if the driver doesn't have err_handler callbacks.



what about hot-plug case ?
should not aer_init() call pci_enable_pcie_error_reporting() for all the 
downstream pci_dev ?

and remove all the calls from drivers..


On MIPS Octeon, it looks like pcibios_plat_dev_init() does already set
PCI_EXP_AER_FLAGS for every device.

But this question is obviously far beyond the scope of this current
patch.


Signed-off-by: Bharat Kumar Gogada 
---
 drivers/pci/pcie/aer.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..943e084 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct 
pci_dev *dev)

if (!dev->aer_cap)
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {


I think this test needs to be refined a little bit.  If the kernel
happens to be built with CONFIG_ACPI=y but the current platform
doesn't support ACPI, we still want to set PCI_BRIDGE_CTL_SERR,
don't we?


+   u16 control;
+
+   /*
+* A Type-1 PCI bridge will not forward ERR_ messages coming
+* from an endpoint if SERR# forwarding is not enabled.
+*/
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control |= PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
PCI_EXP_AER_FLAGS);

 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
@@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)

if (pcie_aer_get_firmware_first(dev))
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /* Clear SERR Forwarding */
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control &= ~PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
  PCI_EXP_AER_FLAGS);
 }
--
1.7.1



Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-08-02 Thread poza

On 2018-08-01 04:17, Bjorn Helgaas wrote:

On Thu, Jul 12, 2018 at 08:15:19PM +0530, Bharat Kumar Gogada wrote:

Currently PCI_BRIDGE_CTL_SERR is being enabled only in
ACPI flow.
This bit is required for forwarding errors reported
by EP devices to upstream device.
This patch enables SERR# for Type-1 PCI device.


This does seem broken.

Figure 6-3 in PCIe r4.0, sec 6.2.6, would be a helpful reference to
include in the commit log.

Semi-related question: there are about 40 drivers that call
pci_enable_pcie_error_reporting() and
pci_disable_pcie_error_reporting().  I see that the PCI core
calls pci_enable_pcie_error_reporting() for Root Ports and Switch
Ports in this path:

  aer_probe # for root ports only
aer_enable_rootport
  set_downstream_devices_error_reporting
set_device_error_reporting
  if (ROOT_PORT || UPSTREAM || DOWNSTREAM)
pci_enable_pcie_error_reporting
pci_walk_bus(..., set_device_error_reporting)

But the core doesn't call pci_enable_pcie_error_reporting() for
endpoints.  I wonder why not.  Could we?  And then remove the calls
from those drivers?  If PCI_EXP_AER_FLAGS should only be set if the
driver is prepared, the pci_driver.err_handler would be a good hint.
But I suspect we could do something sensible and at least report
errors even if the driver doesn't have err_handler callbacks.



what about hot-plug case ?
should not aer_init() call pci_enable_pcie_error_reporting() for all the 
downstream pci_dev ?

and remove all the calls from drivers..


On MIPS Octeon, it looks like pcibios_plat_dev_init() does already set
PCI_EXP_AER_FLAGS for every device.

But this question is obviously far beyond the scope of this current
patch.


Signed-off-by: Bharat Kumar Gogada 
---
 drivers/pci/pcie/aer.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..943e084 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct 
pci_dev *dev)

if (!dev->aer_cap)
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {


I think this test needs to be refined a little bit.  If the kernel
happens to be built with CONFIG_ACPI=y but the current platform
doesn't support ACPI, we still want to set PCI_BRIDGE_CTL_SERR,
don't we?


+   u16 control;
+
+   /*
+* A Type-1 PCI bridge will not forward ERR_ messages coming
+* from an endpoint if SERR# forwarding is not enabled.
+*/
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control |= PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
PCI_EXP_AER_FLAGS);

 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
@@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)

if (pcie_aer_get_firmware_first(dev))
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /* Clear SERR Forwarding */
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control &= ~PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
  PCI_EXP_AER_FLAGS);
 }
--
1.7.1



Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset

2018-07-25 Thread poza

On 2018-07-21 11:37, Sinan Kaya wrote:

On 7/20/2018 7:58 PM, Sinan Kaya wrote:

We need to figure out how to gracefully return inside hotplug driver
if link down happened and there is an error pending.


How about adding the following into the hotplug ISR?

1. check if firmware first is disabled
2. check if there is a fatal error pending in the device_status 
register

of the PCI Express capability on the root port.
3. bail out from hotplug routine if this is the case.
4. otherwise, existing behavior.


This makes sense.

from Lukas's text

"
The user may turn the slot on/off via sysfs.  If an Attention Button
is present, the user may also press that button to turn the slot on/off
after 5 seconds.  Either way, it may cause pciehp's IRQ thread to run
concurrently to a reset initiated by the AER driver, independently of
any events signalled by the slot.
"

so if device gets removed and re-enumerated other than hotplug ISR,
or any other similar path has to take care of checking ERR_FATAL status.

Regards,
Oza.


Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset

2018-07-25 Thread poza

On 2018-07-21 11:37, Sinan Kaya wrote:

On 7/20/2018 7:58 PM, Sinan Kaya wrote:

We need to figure out how to gracefully return inside hotplug driver
if link down happened and there is an error pending.


How about adding the following into the hotplug ISR?

1. check if firmware first is disabled
2. check if there is a fatal error pending in the device_status 
register

of the PCI Express capability on the root port.
3. bail out from hotplug routine if this is the case.
4. otherwise, existing behavior.


This makes sense.

from Lukas's text

"
The user may turn the slot on/off via sysfs.  If an Attention Button
is present, the user may also press that button to turn the slot on/off
after 5 seconds.  Either way, it may cause pciehp's IRQ thread to run
concurrently to a reset initiated by the AER driver, independently of
any events signalled by the slot.
"

so if device gets removed and re-enumerated other than hotplug ISR,
or any other similar path has to take care of checking ERR_FATAL status.

Regards,
Oza.


[RFC] SERR# handling by Linux

2018-07-23 Thread poza

Hi Bjorn and Keith,

This discussion is to extend the idea of follwing patch.
[PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

PCIe Spec
7.6.2.1.3 Command Register (Offset 04h)

SERR# Enable – See Section 7.6.2.1.14.
When Set, this bit enables reporting upstream of Non-fatal and Fatal 
errors detected by the Function to the Root Complex. Note that errors 
are reported if enabled either through this bit or through the PCI 
Express specific bits in the Device Control register (see Section 
7.6.3.4).
In addition, for Functions with Type 1 Configuration Space headers, this 
bit controls transmission by the primary interface of ERR_NONFATAL and 
ERR_FATAL error Messages forwarded from the secondary interface. This 
bit does not affect the transmission of forwarded ERR_COR messages.
A Root Complex Integrated Endpoint that is not associated with a Root 
Complex Event Collector is permitted to hardwire this bit to 0b.

Default value of this bit is 0b.


7.6.2.3.13 Bridge Control Register
SERR# Enable – See Section 7.6.2.1.147.5.1.8.
This bit controls forwarding of ERR_COR, ERR_NONFATAL and ERR_FATAL from 
secondary to primary.


6.2.3.2.2
The transmission of these error Messages by class (correctable, 
non-fatal, fatal) is enabled using the Reporting Enable bits of the 
Device Control register (see Section 7.6.3.4) or the SERR# Enable bit in 
the PCI Command register (see Section 7.6.2.1.3).



AER driver touches device control (and choose not to touch PCI_COMMAND)
On the hand SERR# of Bridge Control Register is not set either.

The meaning of both the SERR# for type-1 configuration space seems to me 
the same.
both essentially says that ERR_NONFATAL and ERR_FATAL from secondary to 
primary.
except that bridge control setting, also forwards ERR_COR messages while 
Command Register settings affect only ERR_NONFATAL and ERR_FATAL.


there are 2 cases,
1)hotplug Enabled slot is inserted with type-1 configuration space 
(bridge) and
2) hot plug disabled, where on our platform we typically set #SERR by 
firmware


So yes it makes sense to set #SERR bit by AER driver if it fins bridge.
but we not only have do
[PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

but also we have to cover hotplug case and hence
pci_aer_init() should call
 pci_enable_pcie_error_reporting(dev);


something like below.

int pci_aer_init(struct pci_dev *dev)
{
int rc;

dev->aer_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);

pci_enable_pcie_error_reporting(dev);
return pci_cleanup_aer_error_status_regs(dev);
}


int pci_enable_pcie_error_reporting(struct pci_dev *dev)
{
int ret;

if (pcie_aer_get_firmware_first(dev))
return -EIO;

if (!dev->aer_cap)
return -EIO;

if (!IS_ENABLED(CONFIG_ACPI) &&
dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
u16 control;

/*
 * A Type-1 PCI bridge will not forward ERR_ messages coming
 * from an endpoint if SERR# forwarding is not enabled.
 */
pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
control |= PCI_BRIDGE_CTL_SERR;
pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
}

return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
PCI_EXP_AER_FLAGS);

}
EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);

also we have to remove pci_enable_pcie_error_reporting() call from the 
drivers.

because aer_init will do it for all the devices.

although I am not very sure is it safe to detect enable error reporting 
by default for all the error devices ?

e.g. setting PCI_EXP_DEVCTL.

probably drivers might want to call pci_disable_pcie_error_reporting() 
who doesnt want to participate in error reporting.


Regards,
Oza.













[RFC] SERR# handling by Linux

2018-07-23 Thread poza

Hi Bjorn and Keith,

This discussion is to extend the idea of follwing patch.
[PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

PCIe Spec
7.6.2.1.3 Command Register (Offset 04h)

SERR# Enable – See Section 7.6.2.1.14.
When Set, this bit enables reporting upstream of Non-fatal and Fatal 
errors detected by the Function to the Root Complex. Note that errors 
are reported if enabled either through this bit or through the PCI 
Express specific bits in the Device Control register (see Section 
7.6.3.4).
In addition, for Functions with Type 1 Configuration Space headers, this 
bit controls transmission by the primary interface of ERR_NONFATAL and 
ERR_FATAL error Messages forwarded from the secondary interface. This 
bit does not affect the transmission of forwarded ERR_COR messages.
A Root Complex Integrated Endpoint that is not associated with a Root 
Complex Event Collector is permitted to hardwire this bit to 0b.

Default value of this bit is 0b.


7.6.2.3.13 Bridge Control Register
SERR# Enable – See Section 7.6.2.1.147.5.1.8.
This bit controls forwarding of ERR_COR, ERR_NONFATAL and ERR_FATAL from 
secondary to primary.


6.2.3.2.2
The transmission of these error Messages by class (correctable, 
non-fatal, fatal) is enabled using the Reporting Enable bits of the 
Device Control register (see Section 7.6.3.4) or the SERR# Enable bit in 
the PCI Command register (see Section 7.6.2.1.3).



AER driver touches device control (and choose not to touch PCI_COMMAND)
On the hand SERR# of Bridge Control Register is not set either.

The meaning of both the SERR# for type-1 configuration space seems to me 
the same.
both essentially says that ERR_NONFATAL and ERR_FATAL from secondary to 
primary.
except that bridge control setting, also forwards ERR_COR messages while 
Command Register settings affect only ERR_NONFATAL and ERR_FATAL.


there are 2 cases,
1)hotplug Enabled slot is inserted with type-1 configuration space 
(bridge) and
2) hot plug disabled, where on our platform we typically set #SERR by 
firmware


So yes it makes sense to set #SERR bit by AER driver if it fins bridge.
but we not only have do
[PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

but also we have to cover hotplug case and hence
pci_aer_init() should call
 pci_enable_pcie_error_reporting(dev);


something like below.

int pci_aer_init(struct pci_dev *dev)
{
int rc;

dev->aer_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);

pci_enable_pcie_error_reporting(dev);
return pci_cleanup_aer_error_status_regs(dev);
}


int pci_enable_pcie_error_reporting(struct pci_dev *dev)
{
int ret;

if (pcie_aer_get_firmware_first(dev))
return -EIO;

if (!dev->aer_cap)
return -EIO;

if (!IS_ENABLED(CONFIG_ACPI) &&
dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
u16 control;

/*
 * A Type-1 PCI bridge will not forward ERR_ messages coming
 * from an endpoint if SERR# forwarding is not enabled.
 */
pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
control |= PCI_BRIDGE_CTL_SERR;
pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
}

return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
PCI_EXP_AER_FLAGS);

}
EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);

also we have to remove pci_enable_pcie_error_reporting() call from the 
drivers.

because aer_init will do it for all the devices.

although I am not very sure is it safe to detect enable error reporting 
by default for all the error devices ?

e.g. setting PCI_EXP_DEVCTL.

probably drivers might want to call pci_disable_pcie_error_reporting() 
who doesnt want to participate in error reporting.


Regards,
Oza.













Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-07-23 Thread poza

On 2018-07-18 19:04, Bharat Kumar Gogada wrote:

On 2018-07-13 19:25, Bharat Kumar Gogada wrote:
>> > Currently PCI_BRIDGE_CTL_SERR is being enabled only in ACPI flow.
>> > This bit is required for forwarding errors reported by EP devices
>> > to upstream device.
>> > This patch enables SERR# for Type-1 PCI device.
>> >
>> > Signed-off-by: Bharat Kumar Gogada

>> > ---
>> >  drivers/pci/pcie/aer.c |   23 +++
>> >  1 files changed, 23 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index
>> > a2e8838..943e084 100644
>> > --- a/drivers/pci/pcie/aer.c
>> > +++ b/drivers/pci/pcie/aer.c
>> > @@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct
>> > pci_dev
>> > *dev)
>> >   if (!dev->aer_cap)
>> >   return -EIO;
>> >
>> > + if (!IS_ENABLED(CONFIG_ACPI) &&
>> > + dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
>> > + u16 control;
>> > +
>> > + /*
>> > +  * A Type-1 PCI bridge will not forward ERR_ messages
>> coming
>> > +  * from an endpoint if SERR# forwarding is not enabled.
>> > +  */
>> > + pci_read_config_word(dev, PCI_BRIDGE_CONTROL,
>> );
>> > + control |= PCI_BRIDGE_CTL_SERR;
>> > + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
>> > + }
>> > +
>> >   return pcie_capability_set_word(dev, PCI_EXP_DEVCTL,
>> > PCI_EXP_AER_FLAGS);  }
>> > EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
>> > @@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct
>> > pci_dev *dev)
>> >   if (pcie_aer_get_firmware_first(dev))
>> >   return -EIO;
>> >
>> > + if (!IS_ENABLED(CONFIG_ACPI) &&
>> > + dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
>> > + u16 control;
>> > +
>> > + /* Clear SERR Forwarding */
>> > + pci_read_config_word(dev, PCI_BRIDGE_CONTROL,
>> );
>> > + control &= ~PCI_BRIDGE_CTL_SERR;
>> > + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
>> > + }
>> > +
>> >   return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>> > PCI_EXP_AER_FLAGS);
>> >  }
>>
>>
>> Should this configuration no be set by Firmware ? why should Linux
>> dictate it ?
> Hi Oza, Can you please let us know why this should be set by firmware ?
> Spec clearly states ERR_CORR,ERR_FATAL/NON FATAL will be forwarded
> only if this bit is set.
> If linux AER service is being enabled without checking/setting this
> bit, then AER service will not do anything even ERR_* is seen on link.
>
> Regards,
> Bharat


The ERR_* to be forwarded or not to be forwarded could be decision of 
the

platform.
hence I think it is best left to firmware to decide if it want to 
enable this for

particular platform.

I'm not aware of other platforms, can you please give an example of a 
platform

how it decides to set this in firmware ?


although,
There are 2 cases
Hotplug capable bridge and otherwise.


Yes, what about an RP which supports only AER but doesn't support 
Hotplug ?

If we have this patch we can set this bit without firmware also.


1) If Firmware sets them, I do not think during enumeraion linux will 
loose

those settings.


2) I do not see any integration of hotplug with AER currently, so if 
the PCIe
switch is plugged into Hotplug capable RP, I am not very sure if this 
functions

get called.

Keith, Lukas and Bjorn any comments ?

Hi all, do you have any inputs on this ?



I have thought more on this..but needs separate discussion hence will be 
forking the separate thread.
If I choose to make some more patches based on this patch, this patch 
can remain as is and will be included with your authorship.


Regards,
Oza.


Regards,
Bharat


Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-07-23 Thread poza

On 2018-07-18 19:04, Bharat Kumar Gogada wrote:

On 2018-07-13 19:25, Bharat Kumar Gogada wrote:
>> > Currently PCI_BRIDGE_CTL_SERR is being enabled only in ACPI flow.
>> > This bit is required for forwarding errors reported by EP devices
>> > to upstream device.
>> > This patch enables SERR# for Type-1 PCI device.
>> >
>> > Signed-off-by: Bharat Kumar Gogada

>> > ---
>> >  drivers/pci/pcie/aer.c |   23 +++
>> >  1 files changed, 23 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index
>> > a2e8838..943e084 100644
>> > --- a/drivers/pci/pcie/aer.c
>> > +++ b/drivers/pci/pcie/aer.c
>> > @@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct
>> > pci_dev
>> > *dev)
>> >   if (!dev->aer_cap)
>> >   return -EIO;
>> >
>> > + if (!IS_ENABLED(CONFIG_ACPI) &&
>> > + dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
>> > + u16 control;
>> > +
>> > + /*
>> > +  * A Type-1 PCI bridge will not forward ERR_ messages
>> coming
>> > +  * from an endpoint if SERR# forwarding is not enabled.
>> > +  */
>> > + pci_read_config_word(dev, PCI_BRIDGE_CONTROL,
>> );
>> > + control |= PCI_BRIDGE_CTL_SERR;
>> > + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
>> > + }
>> > +
>> >   return pcie_capability_set_word(dev, PCI_EXP_DEVCTL,
>> > PCI_EXP_AER_FLAGS);  }
>> > EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
>> > @@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct
>> > pci_dev *dev)
>> >   if (pcie_aer_get_firmware_first(dev))
>> >   return -EIO;
>> >
>> > + if (!IS_ENABLED(CONFIG_ACPI) &&
>> > + dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
>> > + u16 control;
>> > +
>> > + /* Clear SERR Forwarding */
>> > + pci_read_config_word(dev, PCI_BRIDGE_CONTROL,
>> );
>> > + control &= ~PCI_BRIDGE_CTL_SERR;
>> > + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
>> > + }
>> > +
>> >   return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>> > PCI_EXP_AER_FLAGS);
>> >  }
>>
>>
>> Should this configuration no be set by Firmware ? why should Linux
>> dictate it ?
> Hi Oza, Can you please let us know why this should be set by firmware ?
> Spec clearly states ERR_CORR,ERR_FATAL/NON FATAL will be forwarded
> only if this bit is set.
> If linux AER service is being enabled without checking/setting this
> bit, then AER service will not do anything even ERR_* is seen on link.
>
> Regards,
> Bharat


The ERR_* to be forwarded or not to be forwarded could be decision of 
the

platform.
hence I think it is best left to firmware to decide if it want to 
enable this for

particular platform.

I'm not aware of other platforms, can you please give an example of a 
platform

how it decides to set this in firmware ?


although,
There are 2 cases
Hotplug capable bridge and otherwise.


Yes, what about an RP which supports only AER but doesn't support 
Hotplug ?

If we have this patch we can set this bit without firmware also.


1) If Firmware sets them, I do not think during enumeraion linux will 
loose

those settings.


2) I do not see any integration of hotplug with AER currently, so if 
the PCIe
switch is plugged into Hotplug capable RP, I am not very sure if this 
functions

get called.

Keith, Lukas and Bjorn any comments ?

Hi all, do you have any inputs on this ?



I have thought more on this..but needs separate discussion hence will be 
forking the separate thread.
If I choose to make some more patches based on this patch, this patch 
can remain as is and will be included with your authorship.


Regards,
Oza.


Regards,
Bharat


Re: [PATCH v3 0/7] Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL

2018-07-19 Thread poza

On 2018-07-19 01:14, Bjorn Helgaas wrote:

This is a v3 of Oza's patches [1].  It's available at [2] if you prefer
git.

v3 changes:
  - Add pci_aer_clear_fatal_status() to clear ERR_FATAL bits, only 
called

from pcie_do_fatal_recovery().  Moved to first in series to avoid a
window where ERR_FATAL recovery only clears ERR_NONFATAL bits.  
Visible

only inside the PCI core.
  - Instead of having pci_cleanup_aer_uncorrect_error_status() do 
different
things based on dev->error_state, use this only for ERR_NONFATAL 
bits.

I didn't change the name because it's used by many drivers.
  - Rename pci_cleanup_aer_error_device_status() to
pci_aer_clear_device_status(), make it void, and make it visible 
only

inside the PCI core.
  - Remove pcie_portdrv_err_handler.slot_reset altogether instead of 
making
it a stub function.  Possibly pcie_portdrv_err_handler could be 
removed

completely?

[1]
https://lkml.kernel.org/r/1529661494-20936-1-git-send-email-p...@codeaurora.org
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/?h=pci/06-22-oza-aer

---

Bjorn Helgaas (1):
  PCI/AER: Clear only ERR_FATAL status bits during fatal recovery

Oza Pawandeep (6):
  PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
  PCI/AER: Factor out ERR_NONFATAL status bit clearing
  PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
  PCI/AER: Clear device status bits during ERR_FATAL and 
ERR_NONFATAL

  PCI/AER: Clear device status bits during ERR_COR handling
  PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset


 drivers/pci/pci.h  |5 
 drivers/pci/pcie/aer.c |   47 
+++-

 drivers/pci/pcie/err.c |   15 +
 drivers/pci/pcie/portdrv_pci.c |   25 -
 4 files changed, 43 insertions(+), 49 deletions(-)



Hi Bjorn,

I am planning on some things to do after this series.


your text
"
1) I don't think the driver slot_reset callbacks should be responsible
for clearing these AER status bits.  Can we clear them somewhere in
the pcie_do_nonfatal_recovery() path and remove these calls from the
drivers?
"

Oza: We can do following
broadcast_error_message()
  if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
should do
  pci_walk_bus(dev->subordinate, 
pci_cleanup_aer_uncorrect_error_status, NULL);


and update all the drivers and remove the call 
pci_cleanup_aer_uncorrect_error_status()



2) In principle, we should only read PCI_ERR_UNCOR_STATUS *once* per
device when handling an error.  We currently read it three times:

  aer_isr
aer_isr_one_error
  find_source_device
find_device_iter
  is_error_source
read PCI_ERR_UNCOR_STATUS  # 1
Oza: this is the first legitimate read
  aer_process_err_devices
get_device_error_info(e_info->dev[i])
  read PCI_ERR_UNCOR_STATUS# 2
Oza: I see this read used to check if link is healthy so the purpose of 
this read looks different to me.

handle_error_source
  pcie_do_nonfatal_recovery
...
  report_slot_reset
driver->err_handler->slot_reset
  pci_cleanup_aer_uncorrect_error_status
read PCI_ERR_UNCOR_STATUS  # 3
Oza: pci_cleanup_aer_uncorrect_error_status() is generic and able to 
clear status.

for e.g. in point 4 as I suggested if we have to do
pci_walk_bus(dev->subordinate, pci_cleanup_aer_uncorrect_error_status, 
NULL); then we have to read them.



3) we need to get rid of pci_channel_io_frozen permanently.

Regards,
Oza.


















Re: [PATCH v3 0/7] Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL

2018-07-19 Thread poza

On 2018-07-19 01:14, Bjorn Helgaas wrote:

This is a v3 of Oza's patches [1].  It's available at [2] if you prefer
git.

v3 changes:
  - Add pci_aer_clear_fatal_status() to clear ERR_FATAL bits, only 
called

from pcie_do_fatal_recovery().  Moved to first in series to avoid a
window where ERR_FATAL recovery only clears ERR_NONFATAL bits.  
Visible

only inside the PCI core.
  - Instead of having pci_cleanup_aer_uncorrect_error_status() do 
different
things based on dev->error_state, use this only for ERR_NONFATAL 
bits.

I didn't change the name because it's used by many drivers.
  - Rename pci_cleanup_aer_error_device_status() to
pci_aer_clear_device_status(), make it void, and make it visible 
only

inside the PCI core.
  - Remove pcie_portdrv_err_handler.slot_reset altogether instead of 
making
it a stub function.  Possibly pcie_portdrv_err_handler could be 
removed

completely?

[1]
https://lkml.kernel.org/r/1529661494-20936-1-git-send-email-p...@codeaurora.org
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/?h=pci/06-22-oza-aer

---

Bjorn Helgaas (1):
  PCI/AER: Clear only ERR_FATAL status bits during fatal recovery

Oza Pawandeep (6):
  PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
  PCI/AER: Factor out ERR_NONFATAL status bit clearing
  PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
  PCI/AER: Clear device status bits during ERR_FATAL and 
ERR_NONFATAL

  PCI/AER: Clear device status bits during ERR_COR handling
  PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset


 drivers/pci/pci.h  |5 
 drivers/pci/pcie/aer.c |   47 
+++-

 drivers/pci/pcie/err.c |   15 +
 drivers/pci/pcie/portdrv_pci.c |   25 -
 4 files changed, 43 insertions(+), 49 deletions(-)



Hi Bjorn,

I am planning on some things to do after this series.


your text
"
1) I don't think the driver slot_reset callbacks should be responsible
for clearing these AER status bits.  Can we clear them somewhere in
the pcie_do_nonfatal_recovery() path and remove these calls from the
drivers?
"

Oza: We can do following
broadcast_error_message()
  if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
should do
  pci_walk_bus(dev->subordinate, 
pci_cleanup_aer_uncorrect_error_status, NULL);


and update all the drivers and remove the call 
pci_cleanup_aer_uncorrect_error_status()



2) In principle, we should only read PCI_ERR_UNCOR_STATUS *once* per
device when handling an error.  We currently read it three times:

  aer_isr
aer_isr_one_error
  find_source_device
find_device_iter
  is_error_source
read PCI_ERR_UNCOR_STATUS  # 1
Oza: this is the first legitimate read
  aer_process_err_devices
get_device_error_info(e_info->dev[i])
  read PCI_ERR_UNCOR_STATUS# 2
Oza: I see this read used to check if link is healthy so the purpose of 
this read looks different to me.

handle_error_source
  pcie_do_nonfatal_recovery
...
  report_slot_reset
driver->err_handler->slot_reset
  pci_cleanup_aer_uncorrect_error_status
read PCI_ERR_UNCOR_STATUS  # 3
Oza: pci_cleanup_aer_uncorrect_error_status() is generic and able to 
clear status.

for e.g. in point 4 as I suggested if we have to do
pci_walk_bus(dev->subordinate, pci_cleanup_aer_uncorrect_error_status, 
NULL); then we have to read them.



3) we need to get rid of pci_channel_io_frozen permanently.

Regards,
Oza.


















Re: [PATCH v3 0/7] Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL

2018-07-18 Thread poza

On 2018-07-19 01:14, Bjorn Helgaas wrote:

This is a v3 of Oza's patches [1].  It's available at [2] if you prefer
git.

v3 changes:
  - Add pci_aer_clear_fatal_status() to clear ERR_FATAL bits, only 
called

from pcie_do_fatal_recovery().  Moved to first in series to avoid a
window where ERR_FATAL recovery only clears ERR_NONFATAL bits.  
Visible

only inside the PCI core.
  - Instead of having pci_cleanup_aer_uncorrect_error_status() do 
different
things based on dev->error_state, use this only for ERR_NONFATAL 
bits.

I didn't change the name because it's used by many drivers.
  - Rename pci_cleanup_aer_error_device_status() to
pci_aer_clear_device_status(), make it void, and make it visible 
only

inside the PCI core.
  - Remove pcie_portdrv_err_handler.slot_reset altogether instead of 
making
it a stub function.  Possibly pcie_portdrv_err_handler could be 
removed

completely?

[1]
https://lkml.kernel.org/r/1529661494-20936-1-git-send-email-p...@codeaurora.org
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/?h=pci/06-22-oza-aer

---

Bjorn Helgaas (1):
  PCI/AER: Clear only ERR_FATAL status bits during fatal recovery

Oza Pawandeep (6):
  PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
  PCI/AER: Factor out ERR_NONFATAL status bit clearing
  PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
  PCI/AER: Clear device status bits during ERR_FATAL and 
ERR_NONFATAL

  PCI/AER: Clear device status bits during ERR_COR handling
  PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset


 drivers/pci/pci.h  |5 
 drivers/pci/pcie/aer.c |   47 
+++-

 drivers/pci/pcie/err.c |   15 +
 drivers/pci/pcie/portdrv_pci.c |   25 -
 4 files changed, 43 insertions(+), 49 deletions(-)


looks good to me.
Thanks for the corrections.
some x86 compilation errors, you want me to to fix it and push v4 ?

Regards,
Oza.







Re: [PATCH v3 0/7] Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL

2018-07-18 Thread poza

On 2018-07-19 01:14, Bjorn Helgaas wrote:

This is a v3 of Oza's patches [1].  It's available at [2] if you prefer
git.

v3 changes:
  - Add pci_aer_clear_fatal_status() to clear ERR_FATAL bits, only 
called

from pcie_do_fatal_recovery().  Moved to first in series to avoid a
window where ERR_FATAL recovery only clears ERR_NONFATAL bits.  
Visible

only inside the PCI core.
  - Instead of having pci_cleanup_aer_uncorrect_error_status() do 
different
things based on dev->error_state, use this only for ERR_NONFATAL 
bits.

I didn't change the name because it's used by many drivers.
  - Rename pci_cleanup_aer_error_device_status() to
pci_aer_clear_device_status(), make it void, and make it visible 
only

inside the PCI core.
  - Remove pcie_portdrv_err_handler.slot_reset altogether instead of 
making
it a stub function.  Possibly pcie_portdrv_err_handler could be 
removed

completely?

[1]
https://lkml.kernel.org/r/1529661494-20936-1-git-send-email-p...@codeaurora.org
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/?h=pci/06-22-oza-aer

---

Bjorn Helgaas (1):
  PCI/AER: Clear only ERR_FATAL status bits during fatal recovery

Oza Pawandeep (6):
  PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
  PCI/AER: Factor out ERR_NONFATAL status bit clearing
  PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
  PCI/AER: Clear device status bits during ERR_FATAL and 
ERR_NONFATAL

  PCI/AER: Clear device status bits during ERR_COR handling
  PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset


 drivers/pci/pci.h  |5 
 drivers/pci/pcie/aer.c |   47 
+++-

 drivers/pci/pcie/err.c |   15 +
 drivers/pci/pcie/portdrv_pci.c |   25 -
 4 files changed, 43 insertions(+), 49 deletions(-)


looks good to me.
Thanks for the corrections.
some x86 compilation errors, you want me to to fix it and push v4 ?

Regards,
Oza.







Re: [PATCH v2 1/6] PCI/AER: Take severity mask into account while clearing error bits

2018-07-18 Thread poza

On 2018-07-18 03:06, Bjorn Helgaas wrote:

On Tue, Jul 17, 2018 at 02:03:29PM -0500, Bjorn Helgaas wrote:

Hi Oza,

Thanks for doing this!

On Fri, Jun 22, 2018 at 05:58:09AM -0400, Oza Pawandeep wrote:
> pci_cleanup_aer_uncorrect_error_status() is called by different slot_reset
> callbacks in case of ERR_NONFATAL.

IIRC, the current strategy is:

  ERR_COR: log only
  ERR_NONFATAL: call driver callbacks (pci_error_handlers)
  ERR_FATAL: remove device and re-enumerate

So these slot_reset callbacks are only used for ERR_NONFATAL, which
are all uncorrectable errors, of course.

This patch makes it so that when the slot_reset callbacks call
pci_cleanup_aer_uncorrect_error_status(), we only clear the
bits set by ERR_NONFATAL events (this is determined by
PCI_ERR_UNCOR_SEVER).

That makes good sense to me.  All these status bits are RW1CS, so they
will be preserved across a reset but will be cleared when we
re-enumerate, in this path:

  pci_init_capabilities
pci_aer_init
  pci_cleanup_aer_error_status_regs
  # clear all PCI_ERR_UNCOR_STATUS and PCI_ERR_COR_STATUS bits

> AER uncorrectable error status should take severity into account in order
> to clear the bits, so that ERR_NONFATAL path does not clear the bit which
> are marked with severity fatal.

Two comments:

1) Can you split this into two patches, one that changes
pci_cleanup_aer_uncorrect_error_status() so it looks like the error
clearing code in aer_error_resume(), and a second that factors out the
duplicate code?

2) Maybe use "sev" or "sever" instead of "mask" for the local
variable, since there is also a separate PCI_ERR_UNCOR_MASK register,
which is not involved here.


Let me back up a little here: I'm not asking you to do the things
below here.  They're just possible future things, so we can think
about them after this series.  And the things above are things I can
easily do myself.  So no action required from you, unless you think
I'm on the wrong track :)


I agree with your points, and have taken them into account for future 
series reference as well.


what about PATCH-2 of this series ?
that clears ERR_FATAL bits, but as you said, during re-enumeration
pci_init_capabilities
 pci_aer_init
   pci_cleanup_aer_error_status_regs
   # clear all PCI_ERR_UNCOR_STATUS and PCI_ERR_COR_STATUS bits

but that should clear the ERR_FATAL of the devices beneath.

PATCH2: we are doing it for BRIDGE where we think where ERR_FATAL was 
reported by bridge and the problem is with downstream link.

if ((service == PCIE_PORT_SERVICE_AER) &&
(dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)) {
/*
 * If the error is reported by a bridge, we think this error
 * is related to the downstream link of the bridge, so we
 * do error recovery on all subordinates of the bridge instead
 * of the bridge and clear the error status of the bridge.
 */
pci_cleanup_aer_uncorrect_error_status(dev);
}


so overall, I think all the patches are required, if you have comments 
please let me know.

so far I see that, no action is required from me.




3) The "pci_cleanup_aer_uncorrect_error_status()" name no longer
really describes what this does.  Something like
"pci_aer_clear_nonfatal_status()" would be more descriptive.  But I
see you have a subsequent patch (which I haven't looked at yet) that
is related to this.

4) I don't think the driver slot_reset callbacks should be responsible
for clearing these AER status bits.  Can we clear them somewhere in
the pcie_do_nonfatal_recovery() path and remove these calls from the
drivers?

5) In principle, we should only read PCI_ERR_UNCOR_STATUS *once* per
device when handling an error.  We currently read it three times:

  aer_isr
aer_isr_one_error
  find_source_device
find_device_iter
  is_error_source
read PCI_ERR_UNCOR_STATUS  # 1
  aer_process_err_devices
get_device_error_info(e_info->dev[i])
  read PCI_ERR_UNCOR_STATUS# 2
handle_error_source
  pcie_do_nonfatal_recovery
...
  report_slot_reset
driver->err_handler->slot_reset
  pci_cleanup_aer_uncorrect_error_status
read PCI_ERR_UNCOR_STATUS  # 3

OK, that was more than two comments :)


Re: [PATCH v2 1/6] PCI/AER: Take severity mask into account while clearing error bits

2018-07-18 Thread poza

On 2018-07-18 03:06, Bjorn Helgaas wrote:

On Tue, Jul 17, 2018 at 02:03:29PM -0500, Bjorn Helgaas wrote:

Hi Oza,

Thanks for doing this!

On Fri, Jun 22, 2018 at 05:58:09AM -0400, Oza Pawandeep wrote:
> pci_cleanup_aer_uncorrect_error_status() is called by different slot_reset
> callbacks in case of ERR_NONFATAL.

IIRC, the current strategy is:

  ERR_COR: log only
  ERR_NONFATAL: call driver callbacks (pci_error_handlers)
  ERR_FATAL: remove device and re-enumerate

So these slot_reset callbacks are only used for ERR_NONFATAL, which
are all uncorrectable errors, of course.

This patch makes it so that when the slot_reset callbacks call
pci_cleanup_aer_uncorrect_error_status(), we only clear the
bits set by ERR_NONFATAL events (this is determined by
PCI_ERR_UNCOR_SEVER).

That makes good sense to me.  All these status bits are RW1CS, so they
will be preserved across a reset but will be cleared when we
re-enumerate, in this path:

  pci_init_capabilities
pci_aer_init
  pci_cleanup_aer_error_status_regs
  # clear all PCI_ERR_UNCOR_STATUS and PCI_ERR_COR_STATUS bits

> AER uncorrectable error status should take severity into account in order
> to clear the bits, so that ERR_NONFATAL path does not clear the bit which
> are marked with severity fatal.

Two comments:

1) Can you split this into two patches, one that changes
pci_cleanup_aer_uncorrect_error_status() so it looks like the error
clearing code in aer_error_resume(), and a second that factors out the
duplicate code?

2) Maybe use "sev" or "sever" instead of "mask" for the local
variable, since there is also a separate PCI_ERR_UNCOR_MASK register,
which is not involved here.


Let me back up a little here: I'm not asking you to do the things
below here.  They're just possible future things, so we can think
about them after this series.  And the things above are things I can
easily do myself.  So no action required from you, unless you think
I'm on the wrong track :)


I agree with your points, and have taken them into account for future 
series reference as well.


what about PATCH-2 of this series ?
that clears ERR_FATAL bits, but as you said, during re-enumeration
pci_init_capabilities
 pci_aer_init
   pci_cleanup_aer_error_status_regs
   # clear all PCI_ERR_UNCOR_STATUS and PCI_ERR_COR_STATUS bits

but that should clear the ERR_FATAL of the devices beneath.

PATCH2: we are doing it for BRIDGE where we think where ERR_FATAL was 
reported by bridge and the problem is with downstream link.

if ((service == PCIE_PORT_SERVICE_AER) &&
(dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)) {
/*
 * If the error is reported by a bridge, we think this error
 * is related to the downstream link of the bridge, so we
 * do error recovery on all subordinates of the bridge instead
 * of the bridge and clear the error status of the bridge.
 */
pci_cleanup_aer_uncorrect_error_status(dev);
}


so overall, I think all the patches are required, if you have comments 
please let me know.

so far I see that, no action is required from me.




3) The "pci_cleanup_aer_uncorrect_error_status()" name no longer
really describes what this does.  Something like
"pci_aer_clear_nonfatal_status()" would be more descriptive.  But I
see you have a subsequent patch (which I haven't looked at yet) that
is related to this.

4) I don't think the driver slot_reset callbacks should be responsible
for clearing these AER status bits.  Can we clear them somewhere in
the pcie_do_nonfatal_recovery() path and remove these calls from the
drivers?

5) In principle, we should only read PCI_ERR_UNCOR_STATUS *once* per
device when handling an error.  We currently read it three times:

  aer_isr
aer_isr_one_error
  find_source_device
find_device_iter
  is_error_source
read PCI_ERR_UNCOR_STATUS  # 1
  aer_process_err_devices
get_device_error_info(e_info->dev[i])
  read PCI_ERR_UNCOR_STATUS# 2
handle_error_source
  pcie_do_nonfatal_recovery
...
  report_slot_reset
driver->err_handler->slot_reset
  pci_cleanup_aer_uncorrect_error_status
read PCI_ERR_UNCOR_STATUS  # 3

OK, that was more than two comments :)


Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-07-14 Thread poza

On 2018-07-13 19:25, Bharat Kumar Gogada wrote:

> Currently PCI_BRIDGE_CTL_SERR is being enabled only in ACPI flow.
> This bit is required for forwarding errors reported by EP devices to
> upstream device.
> This patch enables SERR# for Type-1 PCI device.
>
> Signed-off-by: Bharat Kumar Gogada 
> ---
>  drivers/pci/pcie/aer.c |   23 +++
>  1 files changed, 23 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index
> a2e8838..943e084 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct
> pci_dev
> *dev)
>if (!dev->aer_cap)
>return -EIO;
>
> +  if (!IS_ENABLED(CONFIG_ACPI) &&
> +  dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +  u16 control;
> +
> +  /*
> +   * A Type-1 PCI bridge will not forward ERR_ messages
coming
> +   * from an endpoint if SERR# forwarding is not enabled.
> +   */
> +  pci_read_config_word(dev, PCI_BRIDGE_CONTROL,
);
> +  control |= PCI_BRIDGE_CTL_SERR;
> +  pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
> +  }
> +
>return pcie_capability_set_word(dev, PCI_EXP_DEVCTL,
> PCI_EXP_AER_FLAGS);  }
> EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
> @@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct
> pci_dev *dev)
>if (pcie_aer_get_firmware_first(dev))
>return -EIO;
>
> +  if (!IS_ENABLED(CONFIG_ACPI) &&
> +  dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +  u16 control;
> +
> +  /* Clear SERR Forwarding */
> +  pci_read_config_word(dev, PCI_BRIDGE_CONTROL,
);
> +  control &= ~PCI_BRIDGE_CTL_SERR;
> +  pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
> +  }
> +
>return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>  PCI_EXP_AER_FLAGS);
>  }


Should this configuration no be set by Firmware ? why should Linux 
dictate it

?

Hi Oza, Can you please let us know why this should be set by firmware ?
Spec clearly states ERR_CORR,ERR_FATAL/NON FATAL will be forwarded
only if this bit is set.
If linux AER service is being enabled without checking/setting this
bit, then AER service will
not do anything even ERR_* is seen on link.

Regards,
Bharat



The ERR_* to be forwarded or not to be forwarded could be decision of 
the platform.
hence I think it is best left to firmware to decide if it want to enable 
this for particular platform.


although,
There are 2 cases
Hotplug capable bridge and otherwise.

1) If Firmware sets them, I do not think during enumeraion linux will 
loose those settings.


2) I do not see any integration of hotplug with AER currently, so if the 
PCIe switch is plugged into Hotplug

capable RP, I am not very sure if this functions get called.

Keith, Lukas and Bjorn any comments ?

Regards,
Oza.













Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-07-14 Thread poza

On 2018-07-13 19:25, Bharat Kumar Gogada wrote:

> Currently PCI_BRIDGE_CTL_SERR is being enabled only in ACPI flow.
> This bit is required for forwarding errors reported by EP devices to
> upstream device.
> This patch enables SERR# for Type-1 PCI device.
>
> Signed-off-by: Bharat Kumar Gogada 
> ---
>  drivers/pci/pcie/aer.c |   23 +++
>  1 files changed, 23 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index
> a2e8838..943e084 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct
> pci_dev
> *dev)
>if (!dev->aer_cap)
>return -EIO;
>
> +  if (!IS_ENABLED(CONFIG_ACPI) &&
> +  dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +  u16 control;
> +
> +  /*
> +   * A Type-1 PCI bridge will not forward ERR_ messages
coming
> +   * from an endpoint if SERR# forwarding is not enabled.
> +   */
> +  pci_read_config_word(dev, PCI_BRIDGE_CONTROL,
);
> +  control |= PCI_BRIDGE_CTL_SERR;
> +  pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
> +  }
> +
>return pcie_capability_set_word(dev, PCI_EXP_DEVCTL,
> PCI_EXP_AER_FLAGS);  }
> EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
> @@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct
> pci_dev *dev)
>if (pcie_aer_get_firmware_first(dev))
>return -EIO;
>
> +  if (!IS_ENABLED(CONFIG_ACPI) &&
> +  dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +  u16 control;
> +
> +  /* Clear SERR Forwarding */
> +  pci_read_config_word(dev, PCI_BRIDGE_CONTROL,
);
> +  control &= ~PCI_BRIDGE_CTL_SERR;
> +  pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
> +  }
> +
>return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>  PCI_EXP_AER_FLAGS);
>  }


Should this configuration no be set by Firmware ? why should Linux 
dictate it

?

Hi Oza, Can you please let us know why this should be set by firmware ?
Spec clearly states ERR_CORR,ERR_FATAL/NON FATAL will be forwarded
only if this bit is set.
If linux AER service is being enabled without checking/setting this
bit, then AER service will
not do anything even ERR_* is seen on link.

Regards,
Bharat



The ERR_* to be forwarded or not to be forwarded could be decision of 
the platform.
hence I think it is best left to firmware to decide if it want to enable 
this for particular platform.


although,
There are 2 cases
Hotplug capable bridge and otherwise.

1) If Firmware sets them, I do not think during enumeraion linux will 
loose those settings.


2) I do not see any integration of hotplug with AER currently, so if the 
PCIe switch is plugged into Hotplug

capable RP, I am not very sure if this functions get called.

Keith, Lukas and Bjorn any comments ?

Regards,
Oza.













Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-07-12 Thread poza

On 2018-07-12 20:15, Bharat Kumar Gogada wrote:

Currently PCI_BRIDGE_CTL_SERR is being enabled only in
ACPI flow.
This bit is required for forwarding errors reported
by EP devices to upstream device.
This patch enables SERR# for Type-1 PCI device.

Signed-off-by: Bharat Kumar Gogada 
---
 drivers/pci/pcie/aer.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..943e084 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct pci_dev 
*dev)

if (!dev->aer_cap)
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /*
+* A Type-1 PCI bridge will not forward ERR_ messages coming
+* from an endpoint if SERR# forwarding is not enabled.
+*/
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control |= PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
PCI_EXP_AER_FLAGS);

 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
@@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)

if (pcie_aer_get_firmware_first(dev))
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /* Clear SERR Forwarding */
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control &= ~PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
  PCI_EXP_AER_FLAGS);
 }



Should this configuration no be set by Firmware ? why should Linux 
dictate it ?


Regards,
Oza.




Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-07-12 Thread poza

On 2018-07-12 20:15, Bharat Kumar Gogada wrote:

Currently PCI_BRIDGE_CTL_SERR is being enabled only in
ACPI flow.
This bit is required for forwarding errors reported
by EP devices to upstream device.
This patch enables SERR# for Type-1 PCI device.

Signed-off-by: Bharat Kumar Gogada 
---
 drivers/pci/pcie/aer.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..943e084 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct pci_dev 
*dev)

if (!dev->aer_cap)
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /*
+* A Type-1 PCI bridge will not forward ERR_ messages coming
+* from an endpoint if SERR# forwarding is not enabled.
+*/
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control |= PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
PCI_EXP_AER_FLAGS);

 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
@@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)

if (pcie_aer_get_firmware_first(dev))
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /* Clear SERR Forwarding */
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, );
+   control &= ~PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
  PCI_EXP_AER_FLAGS);
 }



Should this configuration no be set by Firmware ? why should Linux 
dictate it ?


Regards,
Oza.




Re: [PATCH v2 0/6] Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL

2018-07-05 Thread poza

Hi Bjorn,

Could you help to review this series ?

Regards,
Oza.

On 2018-06-22 15:28, Oza Pawandeep wrote:
These are follow up patches for the series which modifies ERR_FATAL 
handling.
although there were couple of problems existed before which, itis also 
fixing.


patch-1:
Fixes the problem where ERR_FATAL and ERR_NONFATAL status should be 
cleared

taking severity mask into account.

patch-2:
Takes care of clearing error fatal status

patch-3:
Follow up patch where no more need of handling ERR_FATAL
case.

patch-4:
Fixes clearing device status in case of uncorrectable errors.
(e.g. ERR_FATAL and ERR_NONFATAL)

patch-5:
Fixes clearing device status in case of correctable errors.

patch-6:
Follow up patch where no more need of handling pci_channel_io_frozen
in pcie_portdrv_slot_reset()

Oza Pawandeep (6):
  PCI/AER: Take severity mask into account while clearing error bits
  PCI/AER: Clear uncorrectable fatal error status bits
  PCI/ERR: Cleanup ERR_FATAL of error broadcast
  PCI/AER: Clear device error status bits during ERR_FATAL and
ERR_NONFATAL
  PCI/AER: Fix correctable status bits clearing in device register
  PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()

 drivers/pci/pcie/aer.c | 35 
+++

 drivers/pci/pcie/err.c | 15 +++
 drivers/pci/pcie/portdrv_pci.c | 18 --
 include/linux/aer.h|  5 +
 4 files changed, 35 insertions(+), 38 deletions(-)


Re: [PATCH v2 0/6] Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL

2018-07-05 Thread poza

Hi Bjorn,

Could you help to review this series ?

Regards,
Oza.

On 2018-06-22 15:28, Oza Pawandeep wrote:
These are follow up patches for the series which modifies ERR_FATAL 
handling.
although there were couple of problems existed before which, itis also 
fixing.


patch-1:
Fixes the problem where ERR_FATAL and ERR_NONFATAL status should be 
cleared

taking severity mask into account.

patch-2:
Takes care of clearing error fatal status

patch-3:
Follow up patch where no more need of handling ERR_FATAL
case.

patch-4:
Fixes clearing device status in case of uncorrectable errors.
(e.g. ERR_FATAL and ERR_NONFATAL)

patch-5:
Fixes clearing device status in case of correctable errors.

patch-6:
Follow up patch where no more need of handling pci_channel_io_frozen
in pcie_portdrv_slot_reset()

Oza Pawandeep (6):
  PCI/AER: Take severity mask into account while clearing error bits
  PCI/AER: Clear uncorrectable fatal error status bits
  PCI/ERR: Cleanup ERR_FATAL of error broadcast
  PCI/AER: Clear device error status bits during ERR_FATAL and
ERR_NONFATAL
  PCI/AER: Fix correctable status bits clearing in device register
  PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()

 drivers/pci/pcie/aer.c | 35 
+++

 drivers/pci/pcie/err.c | 15 +++
 drivers/pci/pcie/portdrv_pci.c | 18 --
 include/linux/aer.h|  5 +
 4 files changed, 35 insertions(+), 38 deletions(-)


Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset

2018-07-03 Thread poza

On 2018-07-03 20:04, Lukas Wunner wrote:

On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
@@ -308,8 +310,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, 
u32 service)

pci_dev_put(pdev);
}

+   hpsvc = pcie_port_find_service(udev, PCIE_PORT_SERVICE_HP);
+   hpdev = pcie_port_find_device(udev, PCIE_PORT_SERVICE_HP);
+
+   if (hpdev && hpsvc)
+   hpsvc->mask_irq(to_pcie_device(hpdev));
+
result = reset_link(udev, service);

+   if (hpdev && hpsvc)
+   hpsvc->unmask_irq(to_pcie_device(hpdev));
+


We've already got the ->reset_slot callback in struct hotplug_slot_ops,
I'm wondering if we really need additional ones for this use case.

If you just do...

if (!pci_probe_reset_slot(dev->slot))
pci_reset_slot(dev->slot)
else {
/* regular code path */
}

would that not be equivalent?



pcie_do_fatal_recovery() calls
  reset_link()
 which calls dpc_reset_link() or aer_root_reset() depending 
on the event.


and dpc_reset_link() and aer_root_reset() do their own cleanup stuffs.
infact, aer_root_reset() is the only function which actually triggers 
pci_reset_bridge_secondary_bus().


So if we try to fit in your suggestion:

if (!pci_probe_reset_slot(dev->slot))
{
pci_reset_slot(dev->slot)
result = reset_link(udev, service); >> in this case aer_root_reset 
must not call pci_reset_bridge_secondary_bus()

} else
result = reset_link(udev, service); >> in this case aer_root_reset 
must call pci_reset_bridge_secondary_bus() [since bridge is not hotplug 
capable)


Did I get your suggestion right ?

Regards,
Oza.


(It's worth noting though that pciehp is the only hotplug driver
implementing the ->reset_slot callback.  If hotplug is handled by
a different driver or by the platform firmware, devices may still
be removed and re-enumerated twice.)




Thanks,

Lukas


Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset

2018-07-03 Thread poza

On 2018-07-03 20:04, Lukas Wunner wrote:

On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
@@ -308,8 +310,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, 
u32 service)

pci_dev_put(pdev);
}

+   hpsvc = pcie_port_find_service(udev, PCIE_PORT_SERVICE_HP);
+   hpdev = pcie_port_find_device(udev, PCIE_PORT_SERVICE_HP);
+
+   if (hpdev && hpsvc)
+   hpsvc->mask_irq(to_pcie_device(hpdev));
+
result = reset_link(udev, service);

+   if (hpdev && hpsvc)
+   hpsvc->unmask_irq(to_pcie_device(hpdev));
+


We've already got the ->reset_slot callback in struct hotplug_slot_ops,
I'm wondering if we really need additional ones for this use case.

If you just do...

if (!pci_probe_reset_slot(dev->slot))
pci_reset_slot(dev->slot)
else {
/* regular code path */
}

would that not be equivalent?



pcie_do_fatal_recovery() calls
  reset_link()
 which calls dpc_reset_link() or aer_root_reset() depending 
on the event.


and dpc_reset_link() and aer_root_reset() do their own cleanup stuffs.
infact, aer_root_reset() is the only function which actually triggers 
pci_reset_bridge_secondary_bus().


So if we try to fit in your suggestion:

if (!pci_probe_reset_slot(dev->slot))
{
pci_reset_slot(dev->slot)
result = reset_link(udev, service); >> in this case aer_root_reset 
must not call pci_reset_bridge_secondary_bus()

} else
result = reset_link(udev, service); >> in this case aer_root_reset 
must call pci_reset_bridge_secondary_bus() [since bridge is not hotplug 
capable)


Did I get your suggestion right ?

Regards,
Oza.


(It's worth noting though that pciehp is the only hotplug driver
implementing the ->reset_slot callback.  If hotplug is handled by
a different driver or by the platform firmware, devices may still
be removed and re-enumerated twice.)




Thanks,

Lukas


Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset

2018-07-03 Thread poza

On 2018-07-03 19:42, Lukas Wunner wrote:

On Tue, Jul 03, 2018 at 07:30:28AM -0400, ok...@codeaurora.org wrote:

On 2018-07-03 04:34, Lukas Wunner wrote:
>On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>>If a bridge supports hotplug and observes a PCIe fatal error, the
>>following
>>events happen:
>>
>>1. AER driver removes the devices from PCI tree on fatal error
>>2. AER driver brings down the link by issuing a secondary bus reset
>>waits
>>for the link to come up.
>>3. Hotplug driver observes a link down interrupt
>>4. Hotplug driver tries to remove the devices waiting for the rescan
>>lock
>>but devices are already removed by the AER driver and AER driver is
>>waiting
>>for the link to come back up.
>>5. AER driver tries to re-enumerate devices after polling for the link
>>state to go up.
>>6. Hotplug driver obtains the lock and tries to remove the devices
>>again.
>>
>>If a bridge is a hotplug capable bridge, mask hotplug interrupts before
>>the
>>reset and unmask afterwards.
>
>Would it work for you if you just amended the AER driver to skip
>removal and re-enumeration of devices if the port is a hotplug bridge?
>Just check for is_hotplug_bridge in struct pci_dev.

The reason why we want to remove devices before secondary bus reset is 
to

quiesce pcie bus traffic before issuing a reset.

Skipping this step might cause transactions to be lost in the middle 
of the
reset as there will be active traffic flowing and drivers will 
suddenly

start reading ffs.


Interesting, I think that merits a code comment.

FWIW, macOS has a "PCI pause" callback to quiesce a device:
https://opensource.apple.com/source/IOPCIFamily/IOPCIFamily-239.1.2/pause.rtf

They're using it to reconfigure a device's BAR and bus number
at runtime (sic!), e.g. if mmio windows need to be moved around
on Thunderbolt hotplug if there's insufficient space:

"During pause reconfiguration, the following may be changed:
 - device BAR registers
 - the devices bus number
 - registry properties reflecting these values ("ranges",
   "assigned-addresses", "reg")
 - device MSI block values for address and value, but not the
   number of MSIs allocated"

Conceptually, "PCI pause" is similar to putting the device in a suspend
state.  I'm wondering if suspending the devices below the bridge would
make more sense than removing them in the AER driver.



the code is shared by not only AER but also DPC, where if DPC event 
happens the devices are removed.
also if the bridge is hotplug capable, then the devices beneath might 
have changed and resume might break.



Lukas


Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset

2018-07-03 Thread poza

On 2018-07-03 19:42, Lukas Wunner wrote:

On Tue, Jul 03, 2018 at 07:30:28AM -0400, ok...@codeaurora.org wrote:

On 2018-07-03 04:34, Lukas Wunner wrote:
>On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>>If a bridge supports hotplug and observes a PCIe fatal error, the
>>following
>>events happen:
>>
>>1. AER driver removes the devices from PCI tree on fatal error
>>2. AER driver brings down the link by issuing a secondary bus reset
>>waits
>>for the link to come up.
>>3. Hotplug driver observes a link down interrupt
>>4. Hotplug driver tries to remove the devices waiting for the rescan
>>lock
>>but devices are already removed by the AER driver and AER driver is
>>waiting
>>for the link to come back up.
>>5. AER driver tries to re-enumerate devices after polling for the link
>>state to go up.
>>6. Hotplug driver obtains the lock and tries to remove the devices
>>again.
>>
>>If a bridge is a hotplug capable bridge, mask hotplug interrupts before
>>the
>>reset and unmask afterwards.
>
>Would it work for you if you just amended the AER driver to skip
>removal and re-enumeration of devices if the port is a hotplug bridge?
>Just check for is_hotplug_bridge in struct pci_dev.

The reason why we want to remove devices before secondary bus reset is 
to

quiesce pcie bus traffic before issuing a reset.

Skipping this step might cause transactions to be lost in the middle 
of the
reset as there will be active traffic flowing and drivers will 
suddenly

start reading ffs.


Interesting, I think that merits a code comment.

FWIW, macOS has a "PCI pause" callback to quiesce a device:
https://opensource.apple.com/source/IOPCIFamily/IOPCIFamily-239.1.2/pause.rtf

They're using it to reconfigure a device's BAR and bus number
at runtime (sic!), e.g. if mmio windows need to be moved around
on Thunderbolt hotplug if there's insufficient space:

"During pause reconfiguration, the following may be changed:
 - device BAR registers
 - the devices bus number
 - registry properties reflecting these values ("ranges",
   "assigned-addresses", "reg")
 - device MSI block values for address and value, but not the
   number of MSIs allocated"

Conceptually, "PCI pause" is similar to putting the device in a suspend
state.  I'm wondering if suspending the devices below the bridge would
make more sense than removing them in the AER driver.



the code is shared by not only AER but also DPC, where if DPC event 
happens the devices are removed.
also if the bridge is hotplug capable, then the devices beneath might 
have changed and resume might break.



Lukas


Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset

2018-07-03 Thread poza

On 2018-07-03 19:29, Lukas Wunner wrote:

On Tue, Jul 03, 2018 at 09:31:24AM -0400, Sinan Kaya wrote:
Issue is observing hotplug link down event in the middle of AER 
recovery

as in my previous reply.

If we mask hotplug interrupts before secondary bus reset via my patch,
then hotplug driver will not observe both link up and link down 
interrupts.


If we don't mask hotplug interrupts, we have a race condition.


I assume that a bus reset not only causes a link and presence event but
also clears the Presence Detect State bit in the Slot Status register
and the Data Link Layer Link Active bit in the Link Status register
momentarily.

pciehp may access those two bits concurrently to the AER driver
performing a slot reset.  So it may not be sufficient to mask
the interrupt.


Was just wondering that you are protecting Presence Detect State bit 
with reset_lock, mainly in pciehp_ist
but with hotplug interrupt disabled, is there another way that it 
hotplug code gets activated ?




I've posted this patch to address the issue:
https://patchwork.ozlabs.org/patch/930391/

Thanks,

Lukas


Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset

2018-07-03 Thread poza

On 2018-07-03 19:29, Lukas Wunner wrote:

On Tue, Jul 03, 2018 at 09:31:24AM -0400, Sinan Kaya wrote:
Issue is observing hotplug link down event in the middle of AER 
recovery

as in my previous reply.

If we mask hotplug interrupts before secondary bus reset via my patch,
then hotplug driver will not observe both link up and link down 
interrupts.


If we don't mask hotplug interrupts, we have a race condition.


I assume that a bus reset not only causes a link and presence event but
also clears the Presence Detect State bit in the Slot Status register
and the Data Link Layer Link Active bit in the Link Status register
momentarily.

pciehp may access those two bits concurrently to the AER driver
performing a slot reset.  So it may not be sufficient to mask
the interrupt.


Was just wondering that you are protecting Presence Detect State bit 
with reset_lock, mainly in pciehp_ist
but with hotplug interrupt disabled, is there another way that it 
hotplug code gets activated ?




I've posted this patch to address the issue:
https://patchwork.ozlabs.org/patch/930391/

Thanks,

Lukas


Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset

2018-07-03 Thread poza

On 2018-07-03 17:00, ok...@codeaurora.org wrote:

On 2018-07-03 04:34, Lukas Wunner wrote:

On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
If a bridge supports hotplug and observes a PCIe fatal error, the 
following

events happen:

1. AER driver removes the devices from PCI tree on fatal error
2. AER driver brings down the link by issuing a secondary bus reset 
waits

for the link to come up.
3. Hotplug driver observes a link down interrupt
4. Hotplug driver tries to remove the devices waiting for the rescan 
lock
but devices are already removed by the AER driver and AER driver is 
waiting

for the link to come back up.
5. AER driver tries to re-enumerate devices after polling for the 
link

state to go up.
6. Hotplug driver obtains the lock and tries to remove the devices 
again.


If a bridge is a hotplug capable bridge, mask hotplug interrupts 
before the

reset and unmask afterwards.


Would it work for you if you just amended the AER driver to skip
removal and re-enumeration of devices if the port is a hotplug bridge?
Just check for is_hotplug_bridge in struct pci_dev.


The reason why we want to remove devices before secondary bus reset is
to quiesce pcie bus traffic before issuing a reset.

Skipping this step might cause transactions to be lost in the middle
of the reset as there will be active traffic flowing and drivers will
suddenly start reading ffs.

I don't think we can skip this step.



what if we only have conditional enumeration ?  (leaving removing 
devices followed by SBR as is) ?


following code is doing little more extra work than our normal ERR_FATAL 
path.


pciehp_unconfigure_device doing little more than enumeration to 
quiescence the bus.


/*
 * Ensure that no new Requests will be generated from
 * the device.
 */
if (presence) {
pci_read_config_word(dev, PCI_COMMAND, );
command &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_SERR);
command |= PCI_COMMAND_INTX_DISABLE;
pci_write_config_word(dev, PCI_COMMAND, command);
}







That would seem like a much simpler solution, given that it is known
that the link will flap on reset, causing the hotplug driver to remove
and re-enumerate devices.  That would also cover cases where hotplug 
is
handled by a different driver than pciehp, or by the platform 
firmware.


Thanks,

Lukas


Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset

2018-07-03 Thread poza

On 2018-07-03 17:00, ok...@codeaurora.org wrote:

On 2018-07-03 04:34, Lukas Wunner wrote:

On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
If a bridge supports hotplug and observes a PCIe fatal error, the 
following

events happen:

1. AER driver removes the devices from PCI tree on fatal error
2. AER driver brings down the link by issuing a secondary bus reset 
waits

for the link to come up.
3. Hotplug driver observes a link down interrupt
4. Hotplug driver tries to remove the devices waiting for the rescan 
lock
but devices are already removed by the AER driver and AER driver is 
waiting

for the link to come back up.
5. AER driver tries to re-enumerate devices after polling for the 
link

state to go up.
6. Hotplug driver obtains the lock and tries to remove the devices 
again.


If a bridge is a hotplug capable bridge, mask hotplug interrupts 
before the

reset and unmask afterwards.


Would it work for you if you just amended the AER driver to skip
removal and re-enumeration of devices if the port is a hotplug bridge?
Just check for is_hotplug_bridge in struct pci_dev.


The reason why we want to remove devices before secondary bus reset is
to quiesce pcie bus traffic before issuing a reset.

Skipping this step might cause transactions to be lost in the middle
of the reset as there will be active traffic flowing and drivers will
suddenly start reading ffs.

I don't think we can skip this step.



what if we only have conditional enumeration ?  (leaving removing 
devices followed by SBR as is) ?


following code is doing little more extra work than our normal ERR_FATAL 
path.


pciehp_unconfigure_device doing little more than enumeration to 
quiescence the bus.


/*
 * Ensure that no new Requests will be generated from
 * the device.
 */
if (presence) {
pci_read_config_word(dev, PCI_COMMAND, );
command &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_SERR);
command |= PCI_COMMAND_INTX_DISABLE;
pci_write_config_word(dev, PCI_COMMAND, command);
}







That would seem like a much simpler solution, given that it is known
that the link will flap on reset, causing the hotplug driver to remove
and re-enumerate devices.  That would also cover cases where hotplug 
is
handled by a different driver than pciehp, or by the platform 
firmware.


Thanks,

Lukas


Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset

2018-07-03 Thread poza

On 2018-07-03 14:04, Lukas Wunner wrote:

On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
If a bridge supports hotplug and observes a PCIe fatal error, the 
following

events happen:

1. AER driver removes the devices from PCI tree on fatal error
2. AER driver brings down the link by issuing a secondary bus reset 
waits

for the link to come up.
3. Hotplug driver observes a link down interrupt
4. Hotplug driver tries to remove the devices waiting for the rescan 
lock
but devices are already removed by the AER driver and AER driver is 
waiting

for the link to come back up.
5. AER driver tries to re-enumerate devices after polling for the link
state to go up.
6. Hotplug driver obtains the lock and tries to remove the devices 
again.


If a bridge is a hotplug capable bridge, mask hotplug interrupts 
before the

reset and unmask afterwards.


Would it work for you if you just amended the AER driver to skip
removal and re-enumeration of devices if the port is a hotplug bridge?
Just check for is_hotplug_bridge in struct pci_dev.



I tend to agree with you Lukas.

on this line I already have follow up patches
although I am waiting for Bjorn to review some patch-series before that.
[PATCH v2 0/6] Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL

It doesn't look to me a an entirely a race condition since its guarded 
by pci_lock_rescan_remove())
I observed that both hotplug and aer/dpc comes out of it in a quiet sane 
state.


My thinking is: Disabling hotplug interrupts during ERR_FATAL,
is something little away from natural course of link_down event 
handling, which is handled by pciehp more maturely.
so it would be just easy not to take any action e.g. removal and 
re-enumeration of devices from ERR_FATAL handling point of view.


I leave it to Bjorn.

follwing is the patch wich I am trying to set it right and under test.
so till now I am in an opinion to handle this by checking in err.c

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 410c35c..607a234 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -292,15 +292,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, 
u32 service)


parent = udev->subordinate;
pci_lock_rescan_remove();
-   list_for_each_entry_safe_reverse(pdev, temp, >devices,
-bus_list) {
-   pci_dev_get(pdev);
-   pci_dev_set_disconnected(pdev, NULL);
-   if (pci_has_subordinate(pdev))
-   pci_walk_bus(pdev->subordinate,
-pci_dev_set_disconnected, NULL);
-   pci_stop_and_remove_bus_device(pdev);
-   pci_dev_put(pdev);
+   if (!udev->is_hotplug_bridge) {
+   list_for_each_entry_safe_reverse(pdev, temp, 
>devices,

+bus_list) {
+   pci_dev_get(pdev);
+   pci_dev_set_disconnected(pdev, NULL);
+   if (pci_has_subordinate(pdev))
+   pci_walk_bus(pdev->subordinate,
+pci_dev_set_disconnected, 
NULL);

+   pci_stop_and_remove_bus_device(pdev);
+   pci_dev_put(pdev);
+   }
}

result = reset_link(udev, service);
@@ -318,7 +320,7 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 
service)

}

if (result == PCI_ERS_RESULT_RECOVERED) {
-   if (pcie_wait_for_link(udev, true))
+   if (pcie_wait_for_link(udev, true) && 
!udev->is_hotplug_bridge)

pci_rescan_bus(udev->bus);
pci_info(dev, "Device recovery from fatal error 
successful\n");

dev->error_state = pci_channel_io_normal;



That would seem like a much simpler solution, given that it is known
that the link will flap on reset, causing the hotplug driver to remove
and re-enumerate devices.  That would also cover cases where hotplug is
handled by a different driver than pciehp, or by the platform firmware.

Thanks,

Lukas













Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset

2018-07-03 Thread poza

On 2018-07-03 14:04, Lukas Wunner wrote:

On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
If a bridge supports hotplug and observes a PCIe fatal error, the 
following

events happen:

1. AER driver removes the devices from PCI tree on fatal error
2. AER driver brings down the link by issuing a secondary bus reset 
waits

for the link to come up.
3. Hotplug driver observes a link down interrupt
4. Hotplug driver tries to remove the devices waiting for the rescan 
lock
but devices are already removed by the AER driver and AER driver is 
waiting

for the link to come back up.
5. AER driver tries to re-enumerate devices after polling for the link
state to go up.
6. Hotplug driver obtains the lock and tries to remove the devices 
again.


If a bridge is a hotplug capable bridge, mask hotplug interrupts 
before the

reset and unmask afterwards.


Would it work for you if you just amended the AER driver to skip
removal and re-enumeration of devices if the port is a hotplug bridge?
Just check for is_hotplug_bridge in struct pci_dev.



I tend to agree with you Lukas.

on this line I already have follow up patches
although I am waiting for Bjorn to review some patch-series before that.
[PATCH v2 0/6] Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL

It doesn't look to me a an entirely a race condition since its guarded 
by pci_lock_rescan_remove())
I observed that both hotplug and aer/dpc comes out of it in a quiet sane 
state.


My thinking is: Disabling hotplug interrupts during ERR_FATAL,
is something little away from natural course of link_down event 
handling, which is handled by pciehp more maturely.
so it would be just easy not to take any action e.g. removal and 
re-enumeration of devices from ERR_FATAL handling point of view.


I leave it to Bjorn.

follwing is the patch wich I am trying to set it right and under test.
so till now I am in an opinion to handle this by checking in err.c

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 410c35c..607a234 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -292,15 +292,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, 
u32 service)


parent = udev->subordinate;
pci_lock_rescan_remove();
-   list_for_each_entry_safe_reverse(pdev, temp, >devices,
-bus_list) {
-   pci_dev_get(pdev);
-   pci_dev_set_disconnected(pdev, NULL);
-   if (pci_has_subordinate(pdev))
-   pci_walk_bus(pdev->subordinate,
-pci_dev_set_disconnected, NULL);
-   pci_stop_and_remove_bus_device(pdev);
-   pci_dev_put(pdev);
+   if (!udev->is_hotplug_bridge) {
+   list_for_each_entry_safe_reverse(pdev, temp, 
>devices,

+bus_list) {
+   pci_dev_get(pdev);
+   pci_dev_set_disconnected(pdev, NULL);
+   if (pci_has_subordinate(pdev))
+   pci_walk_bus(pdev->subordinate,
+pci_dev_set_disconnected, 
NULL);

+   pci_stop_and_remove_bus_device(pdev);
+   pci_dev_put(pdev);
+   }
}

result = reset_link(udev, service);
@@ -318,7 +320,7 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 
service)

}

if (result == PCI_ERS_RESULT_RECOVERED) {
-   if (pcie_wait_for_link(udev, true))
+   if (pcie_wait_for_link(udev, true) && 
!udev->is_hotplug_bridge)

pci_rescan_bus(udev->bus);
pci_info(dev, "Device recovery from fatal error 
successful\n");

dev->error_state = pci_channel_io_normal;



That would seem like a much simpler solution, given that it is known
that the link will flap on reset, causing the hotplug driver to remove
and re-enumerate devices.  That would also cover cases where hotplug is
handled by a different driver than pciehp, or by the platform firmware.

Thanks,

Lukas













Re: [PATCH] PCI/AER: Adopt lspci naming convention for AER prints

2018-06-26 Thread poza

On 2018-06-26 21:14, Tyler Baicar wrote:

lspci uses abbreviated naming for AER error strings. Adopt the
same naming convention for the AER printing so they match.

Signed-off-by: Tyler Baicar 
---
 drivers/pci/pcie/aer.c | 46 
+++---

 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..08a5219 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -459,22 +459,22 @@ int pci_aer_init(struct pci_dev *dev)
 };

 static const char *aer_correctable_error_string[] = {
-   "Receiver Error", /* Bit Position 0   */
+   "RxErr",  /* Bit Position 0   */
NULL,
NULL,
NULL,
NULL,
NULL,
-   "Bad TLP",/* Bit Position 6   */
-   "Bad DLLP",   /* Bit Position 7   */
-   "RELAY_NUM Rollover", /* Bit Position 8   */
+   "BadTLP", /* Bit Position 6   */
+   "BadDLLP",/* Bit Position 7   */
+   "Rollover",   /* Bit Position 8   */
NULL,
NULL,
NULL,
-   "Replay Timer Timeout",   /* Bit Position 12  */
-   "Advisory Non-Fatal", /* Bit Position 13  */
-   "Corrected Internal Error",   /* Bit Position 14  */
-   "Header Log Overflow",/* Bit Position 15  */
+   "Timeout",/* Bit Position 12  */
+   "NonFatalErr",/* Bit Position 13  */
+   "CorrIntErr", /* Bit Position 14  */
+   "HeaderOF",   /* Bit Position 15  */
 };

 static const char *aer_uncorrectable_error_string[] = {
@@ -482,28 +482,28 @@ int pci_aer_init(struct pci_dev *dev)
NULL,
NULL,
NULL,
-   "Data Link Protocol", /* Bit Position 4   */
-   "Surprise Down Error",/* Bit Position 5   */
+   "DLP",/* Bit Position 4   */
+   "SDES",   /* Bit Position 5   */
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
-   "Poisoned TLP",   /* Bit Position 12  */
-   "Flow Control Protocol",  /* Bit Position 13  */
-   "Completion Timeout", /* Bit Position 14  */
-   "Completer Abort",/* Bit Position 15  */
-   "Unexpected Completion",  /* Bit Position 16  */
-   "Receiver Overflow",  /* Bit Position 17  */
-   "Malformed TLP",  /* Bit Position 18  */
+   "TLP",/* Bit Position 12  */
+   "FCP",/* Bit Position 13  */
+   "CmpltTO",/* Bit Position 14  */
+   "CmpltAbrt",  /* Bit Position 15  */
+   "UnxCmplt",   /* Bit Position 16  */
+   "RxOF",   /* Bit Position 17  */
+   "MalfTLP",/* Bit Position 18  */
"ECRC",   /* Bit Position 19  */
-   "Unsupported Request",/* Bit Position 20  */
-   "ACS Violation",  /* Bit Position 21  */
-   "Uncorrectable Internal Error",   /* Bit Position 22  */
-   "MC Blocked TLP", /* Bit Position 23  */
-   "AtomicOp Egress Blocked",/* Bit Position 24  */
-   "TLP Prefix Blocked Error",   /* Bit Position 25  */
+   "UnsupReq",   /* Bit Position 20  */
+   "ACSViol",/* Bit Position 21  */
+   "UncorrIntErr",   /* Bit Position 22  */
+   "BlockedTLP", /* Bit Position 23  */
+   "AtomicOpBlocked",/* Bit Position 24  */
+   "TLPBlockedErr",  /* Bit Position 25  */
 };

 static const char *aer_agent_string[] = {



Reviewed-by: Oza Pawandeep 



Re: [PATCH] PCI/AER: Adopt lspci naming convention for AER prints

2018-06-26 Thread poza

On 2018-06-26 21:14, Tyler Baicar wrote:

lspci uses abbreviated naming for AER error strings. Adopt the
same naming convention for the AER printing so they match.

Signed-off-by: Tyler Baicar 
---
 drivers/pci/pcie/aer.c | 46 
+++---

 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..08a5219 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -459,22 +459,22 @@ int pci_aer_init(struct pci_dev *dev)
 };

 static const char *aer_correctable_error_string[] = {
-   "Receiver Error", /* Bit Position 0   */
+   "RxErr",  /* Bit Position 0   */
NULL,
NULL,
NULL,
NULL,
NULL,
-   "Bad TLP",/* Bit Position 6   */
-   "Bad DLLP",   /* Bit Position 7   */
-   "RELAY_NUM Rollover", /* Bit Position 8   */
+   "BadTLP", /* Bit Position 6   */
+   "BadDLLP",/* Bit Position 7   */
+   "Rollover",   /* Bit Position 8   */
NULL,
NULL,
NULL,
-   "Replay Timer Timeout",   /* Bit Position 12  */
-   "Advisory Non-Fatal", /* Bit Position 13  */
-   "Corrected Internal Error",   /* Bit Position 14  */
-   "Header Log Overflow",/* Bit Position 15  */
+   "Timeout",/* Bit Position 12  */
+   "NonFatalErr",/* Bit Position 13  */
+   "CorrIntErr", /* Bit Position 14  */
+   "HeaderOF",   /* Bit Position 15  */
 };

 static const char *aer_uncorrectable_error_string[] = {
@@ -482,28 +482,28 @@ int pci_aer_init(struct pci_dev *dev)
NULL,
NULL,
NULL,
-   "Data Link Protocol", /* Bit Position 4   */
-   "Surprise Down Error",/* Bit Position 5   */
+   "DLP",/* Bit Position 4   */
+   "SDES",   /* Bit Position 5   */
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
-   "Poisoned TLP",   /* Bit Position 12  */
-   "Flow Control Protocol",  /* Bit Position 13  */
-   "Completion Timeout", /* Bit Position 14  */
-   "Completer Abort",/* Bit Position 15  */
-   "Unexpected Completion",  /* Bit Position 16  */
-   "Receiver Overflow",  /* Bit Position 17  */
-   "Malformed TLP",  /* Bit Position 18  */
+   "TLP",/* Bit Position 12  */
+   "FCP",/* Bit Position 13  */
+   "CmpltTO",/* Bit Position 14  */
+   "CmpltAbrt",  /* Bit Position 15  */
+   "UnxCmplt",   /* Bit Position 16  */
+   "RxOF",   /* Bit Position 17  */
+   "MalfTLP",/* Bit Position 18  */
"ECRC",   /* Bit Position 19  */
-   "Unsupported Request",/* Bit Position 20  */
-   "ACS Violation",  /* Bit Position 21  */
-   "Uncorrectable Internal Error",   /* Bit Position 22  */
-   "MC Blocked TLP", /* Bit Position 23  */
-   "AtomicOp Egress Blocked",/* Bit Position 24  */
-   "TLP Prefix Blocked Error",   /* Bit Position 25  */
+   "UnsupReq",   /* Bit Position 20  */
+   "ACSViol",/* Bit Position 21  */
+   "UncorrIntErr",   /* Bit Position 22  */
+   "BlockedTLP", /* Bit Position 23  */
+   "AtomicOpBlocked",/* Bit Position 24  */
+   "TLPBlockedErr",  /* Bit Position 25  */
 };

 static const char *aer_agent_string[] = {



Reviewed-by: Oza Pawandeep 



Re: [PATCH v2 3/5] PCI: iproc: Disable MSI parsing in certain PAXC blocks

2018-06-12 Thread poza

On 2018-06-12 22:28, Ray Jui wrote:

On 6/12/2018 1:29 AM, p...@codeaurora.org wrote:

On 2018-06-12 05:51, Ray Jui wrote:

The internal MSI parsing logic in certain revisions of PAXC root
complexes does not work properly and can casue corruptions on the
writes. They need to be disabled

Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/pci/host/pcie-iproc.c | 34 
--

 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 680f6b1..0804aa2 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -1197,10 +1197,22 @@ static int 
iproc_pcie_paxb_v2_msi_steer(struct

iproc_pcie *pcie, u64 msi_addr)
 return ret;
 }

-static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, 
u64 msi_addr)
+static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, 
u64 msi_addr,

+ bool enable)
 {
 u32 val;

+    if (!enable) {
+    /*
+ * Disable PAXC MSI steering. All write transfers will be
+ * treated as non-MSI transfers
+ */
+    val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
+    val &= ~MSI_ENABLE_CFG;
+    iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
+    return;

can be dropped.



No it cannot be dropped. Please review the code carefully.


Ahhh, my bad, it looked like a new function to me, may e I need sleep.
sorry about that.

Reviewed-by: Oza Pawandeep 





Re: [PATCH v2 3/5] PCI: iproc: Disable MSI parsing in certain PAXC blocks

2018-06-12 Thread poza

On 2018-06-12 22:28, Ray Jui wrote:

On 6/12/2018 1:29 AM, p...@codeaurora.org wrote:

On 2018-06-12 05:51, Ray Jui wrote:

The internal MSI parsing logic in certain revisions of PAXC root
complexes does not work properly and can casue corruptions on the
writes. They need to be disabled

Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/pci/host/pcie-iproc.c | 34 
--

 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 680f6b1..0804aa2 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -1197,10 +1197,22 @@ static int 
iproc_pcie_paxb_v2_msi_steer(struct

iproc_pcie *pcie, u64 msi_addr)
 return ret;
 }

-static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, 
u64 msi_addr)
+static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, 
u64 msi_addr,

+ bool enable)
 {
 u32 val;

+    if (!enable) {
+    /*
+ * Disable PAXC MSI steering. All write transfers will be
+ * treated as non-MSI transfers
+ */
+    val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
+    val &= ~MSI_ENABLE_CFG;
+    iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
+    return;

can be dropped.



No it cannot be dropped. Please review the code carefully.


Ahhh, my bad, it looked like a new function to me, may e I need sleep.
sorry about that.

Reviewed-by: Oza Pawandeep 





Re: [PATCH v2 2/5] PCI: iproc: Fix up corrupted PAXC root complex config registers

2018-06-12 Thread poza

On 2018-06-12 13:57, p...@codeaurora.org wrote:

On 2018-06-12 05:51, Ray Jui wrote:

On certain versions of Broadcom PAXC based root complexes, certain
regions of the configuration space are corrupted. As a result, it
prevents the Linux PCIe stack from traversing the linked list of the
capability registers completely and therefore the root complex is
not advertised as "PCIe capable". This prevents the correct PCIe RID
from being parsed in the kernel PCIe stack. A correct RID is required
for mapping to a stream ID from the SMMU or the device ID from the
GICv3 ITS

This patch fixes up the issue by manually populating the related
PCIe capabilities

Signed-off-by: Ray Jui 
---
 drivers/pci/host/pcie-iproc.c | 65 
+++

 drivers/pci/host/pcie-iproc.h |  3 ++
 2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 3c76c5f..680f6b1 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -85,6 +85,8 @@
 #define IMAP_VALID_SHIFT   0
 #define IMAP_VALID BIT(IMAP_VALID_SHIFT)

+#define IPROC_PCI_PM_CAP   0x48
+#define IPROC_PCI_PM_CAP_MASK  0x
 #define IPROC_PCI_EXP_CAP  0xac

 #define IPROC_PCIE_REG_INVALID 0x
@@ -375,6 +377,17 @@ static const u16 iproc_pcie_reg_paxc_v2[] = {
[IPROC_PCIE_CFG_DATA]   = 0x1fc,
 };

+/*
+ * List of device IDs of controllers that have corrupted capability 
list that

+ * require SW fixup
+ */
+static const u16 iproc_pcie_corrupt_cap_did[] = {
+   0x16cd,
+   0x16f0,
+   0xd802,
+   0xd804
+};
+
 static inline struct iproc_pcie *iproc_data(struct pci_bus *bus)
 {
struct iproc_pcie *pcie = bus->sysdata;
@@ -495,6 +508,49 @@ static unsigned int iproc_pcie_cfg_retry(void
__iomem *cfg_data_p)
return data;
 }

+static void iproc_pcie_fix_cap(struct iproc_pcie *pcie, int where, 
u32 *val)

+{
+   u32 i, dev_id;
+
+   switch (where & ~0x3) {
+   case PCI_VENDOR_ID:
+   dev_id = *val >> 16;
+
+   /*
+* Activate fixup for those controllers that have corrupted
+* capability list registers
+*/
+   for (i = 0; i < ARRAY_SIZE(iproc_pcie_corrupt_cap_did); i++)
+   if (dev_id == iproc_pcie_corrupt_cap_did[i])
+   pcie->fix_paxc_cap = true;


and I think this code will try to fix up every time config space is 
read.

Does this get corrupted often, randomly ?
Can it not be solved by using one time Quirk ?
and if not Quirk, you dont want to be setting pcie->fix_paxc_cap =
false somewhere

besides, pcie->fix_paxc_cap = true; is set if PCI_VENDOR_ID is read 
first.
and rest cases stay with the assumption that PCI_VENDOR_ID will be read 
first.

which is infact read first during enumeration
(that is the assumption code is making), but that is safe assumption
to make I think.



ok I see that Bjorn has suggested to fix it this way instead of Quirks.
will just mark

Reviewed-by: Oza Pawandeep 


+   break;
+
+   case IPROC_PCI_PM_CAP:
+   if (pcie->fix_paxc_cap) {
+   /* advertise PM, force next capability to PCIe */
+   *val &= ~IPROC_PCI_PM_CAP_MASK;
+   *val |= IPROC_PCI_EXP_CAP << 8 | PCI_CAP_ID_PM;
+   }
+   break;
+
+   case IPROC_PCI_EXP_CAP:
+   if (pcie->fix_paxc_cap) {
+   /* advertise root port, version 2, terminate here */
+   *val = (PCI_EXP_TYPE_ROOT_PORT << 4 | 2) << 16 |
+   PCI_CAP_ID_EXP;
+   }
+   break;
+
+   case IPROC_PCI_EXP_CAP + PCI_EXP_RTCTL:
+   /* Don't advertise CRS SV support */
+   *val &= ~(PCI_EXP_RTCAP_CRSVIS << 16);
+   break;
+
+   default:
+   break;
+   }
+}
+
 static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int 
devfn,

  int where, int size, u32 *val)
 {
@@ -509,13 +565,10 @@ static int iproc_pcie_config_read(struct pci_bus
*bus, unsigned int devfn,
/* root complex access */
if (busno == 0) {
ret = pci_generic_config_read32(bus, devfn, where, size, val);
-   if (ret != PCIBIOS_SUCCESSFUL)
-   return ret;
+   if (ret == PCIBIOS_SUCCESSFUL)
+   iproc_pcie_fix_cap(pcie, where, val);

-   /* Don't advertise CRS SV support */
-   if ((where & ~0x3) == IPROC_PCI_EXP_CAP + PCI_EXP_RTCTL)
-   *val &= ~(PCI_EXP_RTCAP_CRSVIS << 16);
-   return PCIBIOS_SUCCESSFUL;
+   return ret;
}

 	cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, 
where);
diff --git 

Re: [PATCH v2 2/5] PCI: iproc: Fix up corrupted PAXC root complex config registers

2018-06-12 Thread poza

On 2018-06-12 13:57, p...@codeaurora.org wrote:

On 2018-06-12 05:51, Ray Jui wrote:

On certain versions of Broadcom PAXC based root complexes, certain
regions of the configuration space are corrupted. As a result, it
prevents the Linux PCIe stack from traversing the linked list of the
capability registers completely and therefore the root complex is
not advertised as "PCIe capable". This prevents the correct PCIe RID
from being parsed in the kernel PCIe stack. A correct RID is required
for mapping to a stream ID from the SMMU or the device ID from the
GICv3 ITS

This patch fixes up the issue by manually populating the related
PCIe capabilities

Signed-off-by: Ray Jui 
---
 drivers/pci/host/pcie-iproc.c | 65 
+++

 drivers/pci/host/pcie-iproc.h |  3 ++
 2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 3c76c5f..680f6b1 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -85,6 +85,8 @@
 #define IMAP_VALID_SHIFT   0
 #define IMAP_VALID BIT(IMAP_VALID_SHIFT)

+#define IPROC_PCI_PM_CAP   0x48
+#define IPROC_PCI_PM_CAP_MASK  0x
 #define IPROC_PCI_EXP_CAP  0xac

 #define IPROC_PCIE_REG_INVALID 0x
@@ -375,6 +377,17 @@ static const u16 iproc_pcie_reg_paxc_v2[] = {
[IPROC_PCIE_CFG_DATA]   = 0x1fc,
 };

+/*
+ * List of device IDs of controllers that have corrupted capability 
list that

+ * require SW fixup
+ */
+static const u16 iproc_pcie_corrupt_cap_did[] = {
+   0x16cd,
+   0x16f0,
+   0xd802,
+   0xd804
+};
+
 static inline struct iproc_pcie *iproc_data(struct pci_bus *bus)
 {
struct iproc_pcie *pcie = bus->sysdata;
@@ -495,6 +508,49 @@ static unsigned int iproc_pcie_cfg_retry(void
__iomem *cfg_data_p)
return data;
 }

+static void iproc_pcie_fix_cap(struct iproc_pcie *pcie, int where, 
u32 *val)

+{
+   u32 i, dev_id;
+
+   switch (where & ~0x3) {
+   case PCI_VENDOR_ID:
+   dev_id = *val >> 16;
+
+   /*
+* Activate fixup for those controllers that have corrupted
+* capability list registers
+*/
+   for (i = 0; i < ARRAY_SIZE(iproc_pcie_corrupt_cap_did); i++)
+   if (dev_id == iproc_pcie_corrupt_cap_did[i])
+   pcie->fix_paxc_cap = true;


and I think this code will try to fix up every time config space is 
read.

Does this get corrupted often, randomly ?
Can it not be solved by using one time Quirk ?
and if not Quirk, you dont want to be setting pcie->fix_paxc_cap =
false somewhere

besides, pcie->fix_paxc_cap = true; is set if PCI_VENDOR_ID is read 
first.
and rest cases stay with the assumption that PCI_VENDOR_ID will be read 
first.

which is infact read first during enumeration
(that is the assumption code is making), but that is safe assumption
to make I think.



ok I see that Bjorn has suggested to fix it this way instead of Quirks.
will just mark

Reviewed-by: Oza Pawandeep 


+   break;
+
+   case IPROC_PCI_PM_CAP:
+   if (pcie->fix_paxc_cap) {
+   /* advertise PM, force next capability to PCIe */
+   *val &= ~IPROC_PCI_PM_CAP_MASK;
+   *val |= IPROC_PCI_EXP_CAP << 8 | PCI_CAP_ID_PM;
+   }
+   break;
+
+   case IPROC_PCI_EXP_CAP:
+   if (pcie->fix_paxc_cap) {
+   /* advertise root port, version 2, terminate here */
+   *val = (PCI_EXP_TYPE_ROOT_PORT << 4 | 2) << 16 |
+   PCI_CAP_ID_EXP;
+   }
+   break;
+
+   case IPROC_PCI_EXP_CAP + PCI_EXP_RTCTL:
+   /* Don't advertise CRS SV support */
+   *val &= ~(PCI_EXP_RTCAP_CRSVIS << 16);
+   break;
+
+   default:
+   break;
+   }
+}
+
 static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int 
devfn,

  int where, int size, u32 *val)
 {
@@ -509,13 +565,10 @@ static int iproc_pcie_config_read(struct pci_bus
*bus, unsigned int devfn,
/* root complex access */
if (busno == 0) {
ret = pci_generic_config_read32(bus, devfn, where, size, val);
-   if (ret != PCIBIOS_SUCCESSFUL)
-   return ret;
+   if (ret == PCIBIOS_SUCCESSFUL)
+   iproc_pcie_fix_cap(pcie, where, val);

-   /* Don't advertise CRS SV support */
-   if ((where & ~0x3) == IPROC_PCI_EXP_CAP + PCI_EXP_RTCTL)
-   *val &= ~(PCI_EXP_RTCAP_CRSVIS << 16);
-   return PCIBIOS_SUCCESSFUL;
+   return ret;
}

 	cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, 
where);
diff --git 

Re: [PATCH 2/6] PCI: iproc: Add INTx support with better modeling

2018-06-12 Thread poza

On 2018-05-30 03:28, Ray Jui wrote:

Add PCIe legacy interrupt INTx support to the iProc PCIe driver by
modeling it with its own IRQ domain. All 4 interrupts INTA, INTB, INTC,
INTD share the same interrupt line connected to the GIC in the system,
while the status of each INTx can be obtained through the INTX CSR
register

Signed-off-by: Ray Jui 
---
 drivers/pci/host/pcie-iproc-platform.c |  2 +
 drivers/pci/host/pcie-iproc.c  | 95 
+-

 drivers/pci/host/pcie-iproc.h  |  6 +++
 3 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc-platform.c
b/drivers/pci/host/pcie-iproc-platform.c
index e764a2a..7a51e6c 100644
--- a/drivers/pci/host/pcie-iproc-platform.c
+++ b/drivers/pci/host/pcie-iproc-platform.c
@@ -70,6 +70,8 @@ static int iproc_pcie_pltfm_probe(struct
platform_device *pdev)
}
pcie->base_addr = reg.start;

+   pcie->irq = platform_get_irq(pdev, 0);
+
if (of_property_read_bool(np, "brcm,pcie-ob")) {
u32 val;

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 14f374d..0bd2c14 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -264,6 +265,7 @@ enum iproc_pcie_reg {

/* enable INTx */
IPROC_PCIE_INTX_EN,
+   IPROC_PCIE_INTX_CSR,

/* outbound address mapping */
IPROC_PCIE_OARR0,
@@ -305,6 +307,7 @@ static const u16 iproc_pcie_reg_paxb_bcma[] = {
[IPROC_PCIE_CFG_ADDR]   = 0x1f8,
[IPROC_PCIE_CFG_DATA]   = 0x1fc,
[IPROC_PCIE_INTX_EN]= 0x330,
+   [IPROC_PCIE_INTX_CSR]   = 0x334,
[IPROC_PCIE_LINK_STATUS]= 0xf0c,
 };

@@ -316,6 +319,7 @@ static const u16 iproc_pcie_reg_paxb[] = {
[IPROC_PCIE_CFG_ADDR]   = 0x1f8,
[IPROC_PCIE_CFG_DATA]   = 0x1fc,
[IPROC_PCIE_INTX_EN]= 0x330,
+   [IPROC_PCIE_INTX_CSR]   = 0x334,
[IPROC_PCIE_OARR0]  = 0xd20,
[IPROC_PCIE_OMAP0]  = 0xd40,
[IPROC_PCIE_OARR1]  = 0xd28,
@@ -332,6 +336,7 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
[IPROC_PCIE_CFG_ADDR]   = 0x1f8,
[IPROC_PCIE_CFG_DATA]   = 0x1fc,
[IPROC_PCIE_INTX_EN]= 0x330,
+   [IPROC_PCIE_INTX_CSR]   = 0x334,
[IPROC_PCIE_OARR0]  = 0xd20,
[IPROC_PCIE_OMAP0]  = 0xd40,
[IPROC_PCIE_OARR1]  = 0xd28,
@@ -782,9 +787,90 @@ static int iproc_pcie_check_link(struct iproc_pcie 
*pcie)

return link_is_active ? 0 : -ENODEV;
 }

-static void iproc_pcie_enable(struct iproc_pcie *pcie)
+static int iproc_pcie_intx_map(struct irq_domain *domain, unsigned int 
irq,

+  irq_hw_number_t hwirq)
 {
+   irq_set_chip_and_handler(irq, _irq_chip, handle_simple_irq);
+   irq_set_chip_data(irq, domain->host_data);
+
+   return 0;
+}
+
+static const struct irq_domain_ops intx_domain_ops = {
+   .map = iproc_pcie_intx_map,
+};
+
+static void iproc_pcie_isr(struct irq_desc *desc)
+{
+   struct irq_chip *chip = irq_desc_get_chip(desc);
+   struct iproc_pcie *pcie;
+   struct device *dev;
+   unsigned long status;
+   u32 bit, virq;
+
+   chained_irq_enter(chip, desc);
+   pcie = irq_desc_get_handler_data(desc);
+   dev = pcie->dev;
+
+   /* go through INTx A, B, C, D until all interrupts are handled */
+   while ((status = iproc_pcie_read_reg(pcie, IPROC_PCIE_INTX_CSR) &
+   SYS_RC_INTX_MASK) != 0) {
+   for_each_set_bit(bit, , PCI_NUM_INTX) {
+   virq = irq_find_mapping(pcie->irq_domain, bit + 1);
+   if (virq)
+   generic_handle_irq(virq);
+   else
+   dev_err(dev, "unexpected INTx%u\n", bit);
+   }
+   }
+


Are these level or edge interrupts ? although I see DT says: 
IRQ_TYPE_NONE

do you not need to clear interrupt status bits in IPROC_PCIE_INTX_CSR ?


+   chained_irq_exit(chip, desc);
+}
+
+static int iproc_pcie_intx_enable(struct iproc_pcie *pcie)
+{
+   struct device *dev = pcie->dev;
+   struct device_node *node = dev->of_node;
+   int ret;
+
iproc_pcie_write_reg(pcie, IPROC_PCIE_INTX_EN, SYS_RC_INTX_MASK);
+
+   /*
+* BCMA devices do not map INTx the same way as platform devices. All
+* BCMA needs is the above code to enable INTx
+*/
+   if (pcie->irq <= 0)
+   return 0;
+
+   /* set IRQ handler */
+   irq_set_chained_handler_and_data(pcie->irq, iproc_pcie_isr, pcie);
+
+   /* add IRQ domain for INTx */
+   pcie->irq_domain = irq_domain_add_linear(node, PCI_NUM_INTX + 1,
+  

Re: [PATCH 2/6] PCI: iproc: Add INTx support with better modeling

2018-06-12 Thread poza

On 2018-05-30 03:28, Ray Jui wrote:

Add PCIe legacy interrupt INTx support to the iProc PCIe driver by
modeling it with its own IRQ domain. All 4 interrupts INTA, INTB, INTC,
INTD share the same interrupt line connected to the GIC in the system,
while the status of each INTx can be obtained through the INTX CSR
register

Signed-off-by: Ray Jui 
---
 drivers/pci/host/pcie-iproc-platform.c |  2 +
 drivers/pci/host/pcie-iproc.c  | 95 
+-

 drivers/pci/host/pcie-iproc.h  |  6 +++
 3 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc-platform.c
b/drivers/pci/host/pcie-iproc-platform.c
index e764a2a..7a51e6c 100644
--- a/drivers/pci/host/pcie-iproc-platform.c
+++ b/drivers/pci/host/pcie-iproc-platform.c
@@ -70,6 +70,8 @@ static int iproc_pcie_pltfm_probe(struct
platform_device *pdev)
}
pcie->base_addr = reg.start;

+   pcie->irq = platform_get_irq(pdev, 0);
+
if (of_property_read_bool(np, "brcm,pcie-ob")) {
u32 val;

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 14f374d..0bd2c14 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -264,6 +265,7 @@ enum iproc_pcie_reg {

/* enable INTx */
IPROC_PCIE_INTX_EN,
+   IPROC_PCIE_INTX_CSR,

/* outbound address mapping */
IPROC_PCIE_OARR0,
@@ -305,6 +307,7 @@ static const u16 iproc_pcie_reg_paxb_bcma[] = {
[IPROC_PCIE_CFG_ADDR]   = 0x1f8,
[IPROC_PCIE_CFG_DATA]   = 0x1fc,
[IPROC_PCIE_INTX_EN]= 0x330,
+   [IPROC_PCIE_INTX_CSR]   = 0x334,
[IPROC_PCIE_LINK_STATUS]= 0xf0c,
 };

@@ -316,6 +319,7 @@ static const u16 iproc_pcie_reg_paxb[] = {
[IPROC_PCIE_CFG_ADDR]   = 0x1f8,
[IPROC_PCIE_CFG_DATA]   = 0x1fc,
[IPROC_PCIE_INTX_EN]= 0x330,
+   [IPROC_PCIE_INTX_CSR]   = 0x334,
[IPROC_PCIE_OARR0]  = 0xd20,
[IPROC_PCIE_OMAP0]  = 0xd40,
[IPROC_PCIE_OARR1]  = 0xd28,
@@ -332,6 +336,7 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
[IPROC_PCIE_CFG_ADDR]   = 0x1f8,
[IPROC_PCIE_CFG_DATA]   = 0x1fc,
[IPROC_PCIE_INTX_EN]= 0x330,
+   [IPROC_PCIE_INTX_CSR]   = 0x334,
[IPROC_PCIE_OARR0]  = 0xd20,
[IPROC_PCIE_OMAP0]  = 0xd40,
[IPROC_PCIE_OARR1]  = 0xd28,
@@ -782,9 +787,90 @@ static int iproc_pcie_check_link(struct iproc_pcie 
*pcie)

return link_is_active ? 0 : -ENODEV;
 }

-static void iproc_pcie_enable(struct iproc_pcie *pcie)
+static int iproc_pcie_intx_map(struct irq_domain *domain, unsigned int 
irq,

+  irq_hw_number_t hwirq)
 {
+   irq_set_chip_and_handler(irq, _irq_chip, handle_simple_irq);
+   irq_set_chip_data(irq, domain->host_data);
+
+   return 0;
+}
+
+static const struct irq_domain_ops intx_domain_ops = {
+   .map = iproc_pcie_intx_map,
+};
+
+static void iproc_pcie_isr(struct irq_desc *desc)
+{
+   struct irq_chip *chip = irq_desc_get_chip(desc);
+   struct iproc_pcie *pcie;
+   struct device *dev;
+   unsigned long status;
+   u32 bit, virq;
+
+   chained_irq_enter(chip, desc);
+   pcie = irq_desc_get_handler_data(desc);
+   dev = pcie->dev;
+
+   /* go through INTx A, B, C, D until all interrupts are handled */
+   while ((status = iproc_pcie_read_reg(pcie, IPROC_PCIE_INTX_CSR) &
+   SYS_RC_INTX_MASK) != 0) {
+   for_each_set_bit(bit, , PCI_NUM_INTX) {
+   virq = irq_find_mapping(pcie->irq_domain, bit + 1);
+   if (virq)
+   generic_handle_irq(virq);
+   else
+   dev_err(dev, "unexpected INTx%u\n", bit);
+   }
+   }
+


Are these level or edge interrupts ? although I see DT says: 
IRQ_TYPE_NONE

do you not need to clear interrupt status bits in IPROC_PCIE_INTX_CSR ?


+   chained_irq_exit(chip, desc);
+}
+
+static int iproc_pcie_intx_enable(struct iproc_pcie *pcie)
+{
+   struct device *dev = pcie->dev;
+   struct device_node *node = dev->of_node;
+   int ret;
+
iproc_pcie_write_reg(pcie, IPROC_PCIE_INTX_EN, SYS_RC_INTX_MASK);
+
+   /*
+* BCMA devices do not map INTx the same way as platform devices. All
+* BCMA needs is the above code to enable INTx
+*/
+   if (pcie->irq <= 0)
+   return 0;
+
+   /* set IRQ handler */
+   irq_set_chained_handler_and_data(pcie->irq, iproc_pcie_isr, pcie);
+
+   /* add IRQ domain for INTx */
+   pcie->irq_domain = irq_domain_add_linear(node, PCI_NUM_INTX + 1,
+  

Re: [PATCH v2 5/5] PCI: iproc: Reduce inbound/outbound mapping print level

2018-06-12 Thread poza

On 2018-06-12 05:51, Ray Jui wrote:

Reduce inbound/outbound mapping print level from dev_info to
dev_dbg. This reduces the console logs during Linux boot process

Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/pci/host/pcie-iproc.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 59be1e0..3160e93 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -880,14 +880,14 @@ static inline int iproc_pcie_ob_write(struct
iproc_pcie *pcie, int window_idx,
writel(lower_32_bits(pci_addr), pcie->base + omap_offset);
writel(upper_32_bits(pci_addr), pcie->base + omap_offset + 4);

-   dev_info(dev, "ob window [%d]: offset 0x%x axi %pap pci %pap\n",
-window_idx, oarr_offset, _addr, _addr);
-   dev_info(dev, "oarr lo 0x%x oarr hi 0x%x\n",
-readl(pcie->base + oarr_offset),
-readl(pcie->base + oarr_offset + 4));
-   dev_info(dev, "omap lo 0x%x omap hi 0x%x\n",
-readl(pcie->base + omap_offset),
-readl(pcie->base + omap_offset + 4));
+   dev_dbg(dev, "ob window [%d]: offset 0x%x axi %pap pci %pap\n",
+   window_idx, oarr_offset, _addr, _addr);
+   dev_dbg(dev, "oarr lo 0x%x oarr hi 0x%x\n",
+   readl(pcie->base + oarr_offset),
+   readl(pcie->base + oarr_offset + 4));
+   dev_dbg(dev, "omap lo 0x%x omap hi 0x%x\n",
+   readl(pcie->base + omap_offset),
+   readl(pcie->base + omap_offset + 4));

return 0;
 }
@@ -1054,8 +1054,8 @@ static int iproc_pcie_ib_write(struct iproc_pcie
*pcie, int region_idx,
iproc_pcie_reg_is_invalid(imap_offset))
return -EINVAL;

-   dev_info(dev, "ib region [%d]: offset 0x%x axi %pap pci %pap\n",
-region_idx, iarr_offset, _addr, _addr);
+   dev_dbg(dev, "ib region [%d]: offset 0x%x axi %pap pci %pap\n",
+   region_idx, iarr_offset, _addr, _addr);

/*
 * Program the IARR registers.  The upper 32-bit IARR register is
@@ -1065,9 +1065,9 @@ static int iproc_pcie_ib_write(struct iproc_pcie
*pcie, int region_idx,
   pcie->base + iarr_offset);
writel(upper_32_bits(pci_addr), pcie->base + iarr_offset + 4);

-   dev_info(dev, "iarr lo 0x%x iarr hi 0x%x\n",
-readl(pcie->base + iarr_offset),
-readl(pcie->base + iarr_offset + 4));
+   dev_dbg(dev, "iarr lo 0x%x iarr hi 0x%x\n",
+   readl(pcie->base + iarr_offset),
+   readl(pcie->base + iarr_offset + 4));

/*
 * Now program the IMAP registers.  Each IARR region may have one or
@@ -1081,10 +1081,10 @@ static int iproc_pcie_ib_write(struct
iproc_pcie *pcie, int region_idx,
writel(upper_32_bits(axi_addr),
   pcie->base + imap_offset + ib_map->imap_addr_offset);

-   dev_info(dev, "imap window [%d] lo 0x%x hi 0x%x\n",
-window_idx, readl(pcie->base + imap_offset),
-readl(pcie->base + imap_offset +
-  ib_map->imap_addr_offset));
+   dev_dbg(dev, "imap window [%d] lo 0x%x hi 0x%x\n",
+   window_idx, readl(pcie->base + imap_offset),
+   readl(pcie->base + imap_offset +
+ ib_map->imap_addr_offset));

imap_offset += ib_map->imap_window_offset;
axi_addr += size;


Reviewed-by: Oza Pawandeep 


Re: [PATCH v2 5/5] PCI: iproc: Reduce inbound/outbound mapping print level

2018-06-12 Thread poza

On 2018-06-12 05:51, Ray Jui wrote:

Reduce inbound/outbound mapping print level from dev_info to
dev_dbg. This reduces the console logs during Linux boot process

Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/pci/host/pcie-iproc.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 59be1e0..3160e93 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -880,14 +880,14 @@ static inline int iproc_pcie_ob_write(struct
iproc_pcie *pcie, int window_idx,
writel(lower_32_bits(pci_addr), pcie->base + omap_offset);
writel(upper_32_bits(pci_addr), pcie->base + omap_offset + 4);

-   dev_info(dev, "ob window [%d]: offset 0x%x axi %pap pci %pap\n",
-window_idx, oarr_offset, _addr, _addr);
-   dev_info(dev, "oarr lo 0x%x oarr hi 0x%x\n",
-readl(pcie->base + oarr_offset),
-readl(pcie->base + oarr_offset + 4));
-   dev_info(dev, "omap lo 0x%x omap hi 0x%x\n",
-readl(pcie->base + omap_offset),
-readl(pcie->base + omap_offset + 4));
+   dev_dbg(dev, "ob window [%d]: offset 0x%x axi %pap pci %pap\n",
+   window_idx, oarr_offset, _addr, _addr);
+   dev_dbg(dev, "oarr lo 0x%x oarr hi 0x%x\n",
+   readl(pcie->base + oarr_offset),
+   readl(pcie->base + oarr_offset + 4));
+   dev_dbg(dev, "omap lo 0x%x omap hi 0x%x\n",
+   readl(pcie->base + omap_offset),
+   readl(pcie->base + omap_offset + 4));

return 0;
 }
@@ -1054,8 +1054,8 @@ static int iproc_pcie_ib_write(struct iproc_pcie
*pcie, int region_idx,
iproc_pcie_reg_is_invalid(imap_offset))
return -EINVAL;

-   dev_info(dev, "ib region [%d]: offset 0x%x axi %pap pci %pap\n",
-region_idx, iarr_offset, _addr, _addr);
+   dev_dbg(dev, "ib region [%d]: offset 0x%x axi %pap pci %pap\n",
+   region_idx, iarr_offset, _addr, _addr);

/*
 * Program the IARR registers.  The upper 32-bit IARR register is
@@ -1065,9 +1065,9 @@ static int iproc_pcie_ib_write(struct iproc_pcie
*pcie, int region_idx,
   pcie->base + iarr_offset);
writel(upper_32_bits(pci_addr), pcie->base + iarr_offset + 4);

-   dev_info(dev, "iarr lo 0x%x iarr hi 0x%x\n",
-readl(pcie->base + iarr_offset),
-readl(pcie->base + iarr_offset + 4));
+   dev_dbg(dev, "iarr lo 0x%x iarr hi 0x%x\n",
+   readl(pcie->base + iarr_offset),
+   readl(pcie->base + iarr_offset + 4));

/*
 * Now program the IMAP registers.  Each IARR region may have one or
@@ -1081,10 +1081,10 @@ static int iproc_pcie_ib_write(struct
iproc_pcie *pcie, int region_idx,
writel(upper_32_bits(axi_addr),
   pcie->base + imap_offset + ib_map->imap_addr_offset);

-   dev_info(dev, "imap window [%d] lo 0x%x hi 0x%x\n",
-window_idx, readl(pcie->base + imap_offset),
-readl(pcie->base + imap_offset +
-  ib_map->imap_addr_offset));
+   dev_dbg(dev, "imap window [%d] lo 0x%x hi 0x%x\n",
+   window_idx, readl(pcie->base + imap_offset),
+   readl(pcie->base + imap_offset +
+ ib_map->imap_addr_offset));

imap_offset += ib_map->imap_window_offset;
axi_addr += size;


Reviewed-by: Oza Pawandeep 


Re: [PATCH v2 1/5] PCI: iproc: Activate PAXC bridge quirk for more devices

2018-06-12 Thread poza

On 2018-06-12 05:51, Ray Jui wrote:

Activate PAXC bridge quirk for more PAXC based PCIe root complex with
the following PCIe device ID:
0xd750, 0xd802, 0xd804

Signed-off-by: Ray Jui 
---
 drivers/pci/quirks.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 2990ad1..47dfea0 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2195,6 +2195,9 @@ static void quirk_paxc_bridge(struct pci_dev 
*pdev)

 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16cd, 
quirk_paxc_bridge);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0, 
quirk_paxc_bridge);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd750, 
quirk_paxc_bridge);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802, 
quirk_paxc_bridge);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804, 
quirk_paxc_bridge);

 #endif

 /* Originally in EDAC sources for i82875P:


Reviewed-by: Oza Pawandeep 


Re: [PATCH v2 1/5] PCI: iproc: Activate PAXC bridge quirk for more devices

2018-06-12 Thread poza

On 2018-06-12 05:51, Ray Jui wrote:

Activate PAXC bridge quirk for more PAXC based PCIe root complex with
the following PCIe device ID:
0xd750, 0xd802, 0xd804

Signed-off-by: Ray Jui 
---
 drivers/pci/quirks.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 2990ad1..47dfea0 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2195,6 +2195,9 @@ static void quirk_paxc_bridge(struct pci_dev 
*pdev)

 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16cd, 
quirk_paxc_bridge);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0, 
quirk_paxc_bridge);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd750, 
quirk_paxc_bridge);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802, 
quirk_paxc_bridge);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804, 
quirk_paxc_bridge);

 #endif

 /* Originally in EDAC sources for i82875P:


Reviewed-by: Oza Pawandeep 


Re: [PATCH v2 4/5] PCI: iproc: Reject unconfigured physical functions from PAXC

2018-06-12 Thread poza

On 2018-06-12 05:51, Ray Jui wrote:

PAXC is an emulated PCIe root complex internally in various Broadcom
based SoCs. PAXC internally connects to the embedded network processor
within these SoCs, with the embedeed network processor exposed as an
endpoint device

The number of physical functions from the embedded network processor
that can be accessed depend on the firmware configuration.
Unfortunately, due to an ASIC bug, unconfigured physical functions 
cannot

be properly hidden from the root complex during enumerattion. As a
result, config write access to these unconfigured physical functions
during enumeration will cause a bus lock up on the embedded network
processor

Fortunately, these unconfigured physical functions contain a very
specific, staled PCIe device ID 0x168e. By making use of this device 
ID,

one is able to terminate the enumeration early in the vendor/device ID
config read

Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/pci/host/pcie-iproc.c | 26 +-
 drivers/pci/host/pcie-iproc.h |  5 +
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 0804aa2..59be1e0 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -582,6 +582,25 @@ static int iproc_pcie_config_read(struct pci_bus
*bus, unsigned int devfn,
if (size <= 2)
*val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);

+   /*
+	 * For PAXC and PAXCv2, the total number of PFs that one can 
enumerate
+	 * depends on the firmware configuration. Unfortunately, due to an 
ASIC

+* bug, unconfigured PFs cannot be properly hidden from the root
+	 * complex. As a result, write access to these PFs will cause bus 
lock

+* up on the embedded processor
+*
+	 * Since all unconfigured PFs are left with an incorrect, staled 
device
+	 * ID of 0x168e (PCI_DEVICE_ID_NX2_57810), we try to catch those 
access

+* early here and reject them all
+*/
+#define DEVICE_ID_MASK 0x
+#define DEVICE_ID_SHIFT16
+   if (pcie->rej_unconfig_pf &&
+   (where & CFG_ADDR_REG_NUM_MASK) == PCI_VENDOR_ID)
+   if ((*val & DEVICE_ID_MASK) ==
+   (PCI_DEVICE_ID_NX2_57810 << DEVICE_ID_SHIFT))
+   return PCIBIOS_FUNC_NOT_SUPPORTED;
+
return PCIBIOS_SUCCESSFUL;
 }

@@ -681,7 +700,7 @@ static int iproc_pcie_config_read32(struct pci_bus
*bus, unsigned int devfn,
struct iproc_pcie *pcie = iproc_data(bus);

iproc_pcie_apb_err_disable(bus, true);
-   if (pcie->type == IPROC_PCIE_PAXB_V2)
+   if (pcie->iproc_cfg_read)
ret = iproc_pcie_config_read(bus, devfn, where, size, val);
else
ret = pci_generic_config_read32(bus, devfn, where, size, val);
@@ -1336,6 +1355,7 @@ static int iproc_pcie_rev_init(struct iproc_pcie 
*pcie)

break;
case IPROC_PCIE_PAXB:
regs = iproc_pcie_reg_paxb;
+   pcie->iproc_cfg_read = true;
pcie->has_apb_err_disable = true;
if (pcie->need_ob_cfg) {
pcie->ob_map = paxb_ob_map;
@@ -1358,10 +1378,14 @@ static int iproc_pcie_rev_init(struct 
iproc_pcie *pcie)

case IPROC_PCIE_PAXC:
regs = iproc_pcie_reg_paxc;
pcie->ep_is_internal = true;
+   pcie->iproc_cfg_read = true;
+   pcie->rej_unconfig_pf = true;
break;
case IPROC_PCIE_PAXC_V2:
regs = iproc_pcie_reg_paxc_v2;
pcie->ep_is_internal = true;
+   pcie->iproc_cfg_read = true;
+   pcie->rej_unconfig_pf = true;
pcie->need_msi_steer = true;
break;
default:
diff --git a/drivers/pci/host/pcie-iproc.h 
b/drivers/pci/host/pcie-iproc.h

index 9d5cfee..4f03ea5 100644
--- a/drivers/pci/host/pcie-iproc.h
+++ b/drivers/pci/host/pcie-iproc.h
@@ -58,6 +58,9 @@ struct iproc_msi;
  * @phy: optional PHY device that controls the Serdes
  * @map_irq: function callback to map interrupts
  * @ep_is_internal: indicates an internal emulated endpoint device is 
connected
+ * @iproc_cfg_read: indicates the iProc config read function should be 
used
+ * @rej_unconfig_pf: indicates the root complex needs to detect and 
reject
+ * enumeration against unconfigured physical functions emulated in the 
ASIC
  * @has_apb_err_disable: indicates the controller can be configured to 
prevent

  * unsupported request from being forwarded as an APB bus error
  * @fix_paxc_cap: indicates the controller has corrupted capability 
list in its

@@ -86,6 +89,8 @@ struct iproc_pcie {
struct phy *phy;
int (*map_irq)(const struct pci_dev *, u8, u8);
bool ep_is_internal;
+   bool iproc_cfg_read;
+   bool rej_unconfig_pf;
bool has_apb_err_disable;
bool 

Re: [PATCH v2 4/5] PCI: iproc: Reject unconfigured physical functions from PAXC

2018-06-12 Thread poza

On 2018-06-12 05:51, Ray Jui wrote:

PAXC is an emulated PCIe root complex internally in various Broadcom
based SoCs. PAXC internally connects to the embedded network processor
within these SoCs, with the embedeed network processor exposed as an
endpoint device

The number of physical functions from the embedded network processor
that can be accessed depend on the firmware configuration.
Unfortunately, due to an ASIC bug, unconfigured physical functions 
cannot

be properly hidden from the root complex during enumerattion. As a
result, config write access to these unconfigured physical functions
during enumeration will cause a bus lock up on the embedded network
processor

Fortunately, these unconfigured physical functions contain a very
specific, staled PCIe device ID 0x168e. By making use of this device 
ID,

one is able to terminate the enumeration early in the vendor/device ID
config read

Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/pci/host/pcie-iproc.c | 26 +-
 drivers/pci/host/pcie-iproc.h |  5 +
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 0804aa2..59be1e0 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -582,6 +582,25 @@ static int iproc_pcie_config_read(struct pci_bus
*bus, unsigned int devfn,
if (size <= 2)
*val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);

+   /*
+	 * For PAXC and PAXCv2, the total number of PFs that one can 
enumerate
+	 * depends on the firmware configuration. Unfortunately, due to an 
ASIC

+* bug, unconfigured PFs cannot be properly hidden from the root
+	 * complex. As a result, write access to these PFs will cause bus 
lock

+* up on the embedded processor
+*
+	 * Since all unconfigured PFs are left with an incorrect, staled 
device
+	 * ID of 0x168e (PCI_DEVICE_ID_NX2_57810), we try to catch those 
access

+* early here and reject them all
+*/
+#define DEVICE_ID_MASK 0x
+#define DEVICE_ID_SHIFT16
+   if (pcie->rej_unconfig_pf &&
+   (where & CFG_ADDR_REG_NUM_MASK) == PCI_VENDOR_ID)
+   if ((*val & DEVICE_ID_MASK) ==
+   (PCI_DEVICE_ID_NX2_57810 << DEVICE_ID_SHIFT))
+   return PCIBIOS_FUNC_NOT_SUPPORTED;
+
return PCIBIOS_SUCCESSFUL;
 }

@@ -681,7 +700,7 @@ static int iproc_pcie_config_read32(struct pci_bus
*bus, unsigned int devfn,
struct iproc_pcie *pcie = iproc_data(bus);

iproc_pcie_apb_err_disable(bus, true);
-   if (pcie->type == IPROC_PCIE_PAXB_V2)
+   if (pcie->iproc_cfg_read)
ret = iproc_pcie_config_read(bus, devfn, where, size, val);
else
ret = pci_generic_config_read32(bus, devfn, where, size, val);
@@ -1336,6 +1355,7 @@ static int iproc_pcie_rev_init(struct iproc_pcie 
*pcie)

break;
case IPROC_PCIE_PAXB:
regs = iproc_pcie_reg_paxb;
+   pcie->iproc_cfg_read = true;
pcie->has_apb_err_disable = true;
if (pcie->need_ob_cfg) {
pcie->ob_map = paxb_ob_map;
@@ -1358,10 +1378,14 @@ static int iproc_pcie_rev_init(struct 
iproc_pcie *pcie)

case IPROC_PCIE_PAXC:
regs = iproc_pcie_reg_paxc;
pcie->ep_is_internal = true;
+   pcie->iproc_cfg_read = true;
+   pcie->rej_unconfig_pf = true;
break;
case IPROC_PCIE_PAXC_V2:
regs = iproc_pcie_reg_paxc_v2;
pcie->ep_is_internal = true;
+   pcie->iproc_cfg_read = true;
+   pcie->rej_unconfig_pf = true;
pcie->need_msi_steer = true;
break;
default:
diff --git a/drivers/pci/host/pcie-iproc.h 
b/drivers/pci/host/pcie-iproc.h

index 9d5cfee..4f03ea5 100644
--- a/drivers/pci/host/pcie-iproc.h
+++ b/drivers/pci/host/pcie-iproc.h
@@ -58,6 +58,9 @@ struct iproc_msi;
  * @phy: optional PHY device that controls the Serdes
  * @map_irq: function callback to map interrupts
  * @ep_is_internal: indicates an internal emulated endpoint device is 
connected
+ * @iproc_cfg_read: indicates the iProc config read function should be 
used
+ * @rej_unconfig_pf: indicates the root complex needs to detect and 
reject
+ * enumeration against unconfigured physical functions emulated in the 
ASIC
  * @has_apb_err_disable: indicates the controller can be configured to 
prevent

  * unsupported request from being forwarded as an APB bus error
  * @fix_paxc_cap: indicates the controller has corrupted capability 
list in its

@@ -86,6 +89,8 @@ struct iproc_pcie {
struct phy *phy;
int (*map_irq)(const struct pci_dev *, u8, u8);
bool ep_is_internal;
+   bool iproc_cfg_read;
+   bool rej_unconfig_pf;
bool has_apb_err_disable;
bool 

Re: [PATCH v2 3/5] PCI: iproc: Disable MSI parsing in certain PAXC blocks

2018-06-12 Thread poza

On 2018-06-12 05:51, Ray Jui wrote:

The internal MSI parsing logic in certain revisions of PAXC root
complexes does not work properly and can casue corruptions on the
writes. They need to be disabled

Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/pci/host/pcie-iproc.c | 34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 680f6b1..0804aa2 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -1197,10 +1197,22 @@ static int iproc_pcie_paxb_v2_msi_steer(struct
iproc_pcie *pcie, u64 msi_addr)
return ret;
 }

-static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64 
msi_addr)
+static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64 
msi_addr,

+bool enable)
 {
u32 val;

+   if (!enable) {
+   /*
+* Disable PAXC MSI steering. All write transfers will be
+* treated as non-MSI transfers
+*/
+   val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
+   val &= ~MSI_ENABLE_CFG;
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
+   return;

can be dropped.

+   }
+
/*
 * Program bits [43:13] of address of GITS_TRANSLATER register into
 	 * bits [30:0] of the MSI base address register.  In fact, in all 
iProc
@@ -1254,7 +1266,7 @@ static int iproc_pcie_msi_steer(struct iproc_pcie 
*pcie,

return ret;
break;
case IPROC_PCIE_PAXC_V2:
-   iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
+   iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);
break;
default:
return -EINVAL;
@@ -1480,6 +1492,24 @@ int iproc_pcie_remove(struct iproc_pcie *pcie)
 }
 EXPORT_SYMBOL(iproc_pcie_remove);

+/*
+ * The MSI parsing logic in certain revisions of Broadcom PAXC based 
root

+ * complex does not work and needs to be disabled
+ */
+static void quirk_paxc_disable_msi_parsing(struct pci_dev *pdev)
+{
+   struct iproc_pcie *pcie = iproc_data(pdev->bus);
+
+   if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+   iproc_pcie_paxc_v2_msi_steer(pcie, 0, false);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
+   quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
+   quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
+   quirk_paxc_disable_msi_parsing);
+
 MODULE_AUTHOR("Ray Jui ");
 MODULE_DESCRIPTION("Broadcom iPROC PCIe common driver");
 MODULE_LICENSE("GPL v2");


Re: [PATCH v2 3/5] PCI: iproc: Disable MSI parsing in certain PAXC blocks

2018-06-12 Thread poza

On 2018-06-12 05:51, Ray Jui wrote:

The internal MSI parsing logic in certain revisions of PAXC root
complexes does not work properly and can casue corruptions on the
writes. They need to be disabled

Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/pci/host/pcie-iproc.c | 34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 680f6b1..0804aa2 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -1197,10 +1197,22 @@ static int iproc_pcie_paxb_v2_msi_steer(struct
iproc_pcie *pcie, u64 msi_addr)
return ret;
 }

-static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64 
msi_addr)
+static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64 
msi_addr,

+bool enable)
 {
u32 val;

+   if (!enable) {
+   /*
+* Disable PAXC MSI steering. All write transfers will be
+* treated as non-MSI transfers
+*/
+   val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
+   val &= ~MSI_ENABLE_CFG;
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
+   return;

can be dropped.

+   }
+
/*
 * Program bits [43:13] of address of GITS_TRANSLATER register into
 	 * bits [30:0] of the MSI base address register.  In fact, in all 
iProc
@@ -1254,7 +1266,7 @@ static int iproc_pcie_msi_steer(struct iproc_pcie 
*pcie,

return ret;
break;
case IPROC_PCIE_PAXC_V2:
-   iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
+   iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);
break;
default:
return -EINVAL;
@@ -1480,6 +1492,24 @@ int iproc_pcie_remove(struct iproc_pcie *pcie)
 }
 EXPORT_SYMBOL(iproc_pcie_remove);

+/*
+ * The MSI parsing logic in certain revisions of Broadcom PAXC based 
root

+ * complex does not work and needs to be disabled
+ */
+static void quirk_paxc_disable_msi_parsing(struct pci_dev *pdev)
+{
+   struct iproc_pcie *pcie = iproc_data(pdev->bus);
+
+   if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+   iproc_pcie_paxc_v2_msi_steer(pcie, 0, false);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
+   quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
+   quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
+   quirk_paxc_disable_msi_parsing);
+
 MODULE_AUTHOR("Ray Jui ");
 MODULE_DESCRIPTION("Broadcom iPROC PCIe common driver");
 MODULE_LICENSE("GPL v2");


Re: [PATCH v2 2/5] PCI: iproc: Fix up corrupted PAXC root complex config registers

2018-06-12 Thread poza

On 2018-06-12 05:51, Ray Jui wrote:

On certain versions of Broadcom PAXC based root complexes, certain
regions of the configuration space are corrupted. As a result, it
prevents the Linux PCIe stack from traversing the linked list of the
capability registers completely and therefore the root complex is
not advertised as "PCIe capable". This prevents the correct PCIe RID
from being parsed in the kernel PCIe stack. A correct RID is required
for mapping to a stream ID from the SMMU or the device ID from the
GICv3 ITS

This patch fixes up the issue by manually populating the related
PCIe capabilities

Signed-off-by: Ray Jui 
---
 drivers/pci/host/pcie-iproc.c | 65 
+++

 drivers/pci/host/pcie-iproc.h |  3 ++
 2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 3c76c5f..680f6b1 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -85,6 +85,8 @@
 #define IMAP_VALID_SHIFT   0
 #define IMAP_VALID BIT(IMAP_VALID_SHIFT)

+#define IPROC_PCI_PM_CAP   0x48
+#define IPROC_PCI_PM_CAP_MASK  0x
 #define IPROC_PCI_EXP_CAP  0xac

 #define IPROC_PCIE_REG_INVALID 0x
@@ -375,6 +377,17 @@ static const u16 iproc_pcie_reg_paxc_v2[] = {
[IPROC_PCIE_CFG_DATA]   = 0x1fc,
 };

+/*
+ * List of device IDs of controllers that have corrupted capability 
list that

+ * require SW fixup
+ */
+static const u16 iproc_pcie_corrupt_cap_did[] = {
+   0x16cd,
+   0x16f0,
+   0xd802,
+   0xd804
+};
+
 static inline struct iproc_pcie *iproc_data(struct pci_bus *bus)
 {
struct iproc_pcie *pcie = bus->sysdata;
@@ -495,6 +508,49 @@ static unsigned int iproc_pcie_cfg_retry(void
__iomem *cfg_data_p)
return data;
 }

+static void iproc_pcie_fix_cap(struct iproc_pcie *pcie, int where, u32 
*val)

+{
+   u32 i, dev_id;
+
+   switch (where & ~0x3) {
+   case PCI_VENDOR_ID:
+   dev_id = *val >> 16;
+
+   /*
+* Activate fixup for those controllers that have corrupted
+* capability list registers
+*/
+   for (i = 0; i < ARRAY_SIZE(iproc_pcie_corrupt_cap_did); i++)
+   if (dev_id == iproc_pcie_corrupt_cap_did[i])
+   pcie->fix_paxc_cap = true;


and I think this code will try to fix up every time config space is 
read.

Does this get corrupted often, randomly ?
Can it not be solved by using one time Quirk ?
and if not Quirk, you dont want to be setting pcie->fix_paxc_cap = false 
somewhere


besides, pcie->fix_paxc_cap = true; is set if PCI_VENDOR_ID is read 
first.
and rest cases stay with the assumption that PCI_VENDOR_ID will be read 
first.

which is infact read first during enumeration
(that is the assumption code is making), but that is safe assumption to 
make I think.



+   break;
+
+   case IPROC_PCI_PM_CAP:
+   if (pcie->fix_paxc_cap) {
+   /* advertise PM, force next capability to PCIe */
+   *val &= ~IPROC_PCI_PM_CAP_MASK;
+   *val |= IPROC_PCI_EXP_CAP << 8 | PCI_CAP_ID_PM;
+   }
+   break;
+
+   case IPROC_PCI_EXP_CAP:
+   if (pcie->fix_paxc_cap) {
+   /* advertise root port, version 2, terminate here */
+   *val = (PCI_EXP_TYPE_ROOT_PORT << 4 | 2) << 16 |
+   PCI_CAP_ID_EXP;
+   }
+   break;
+
+   case IPROC_PCI_EXP_CAP + PCI_EXP_RTCTL:
+   /* Don't advertise CRS SV support */
+   *val &= ~(PCI_EXP_RTCAP_CRSVIS << 16);
+   break;
+
+   default:
+   break;
+   }
+}
+
 static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int 
devfn,

  int where, int size, u32 *val)
 {
@@ -509,13 +565,10 @@ static int iproc_pcie_config_read(struct pci_bus
*bus, unsigned int devfn,
/* root complex access */
if (busno == 0) {
ret = pci_generic_config_read32(bus, devfn, where, size, val);
-   if (ret != PCIBIOS_SUCCESSFUL)
-   return ret;
+   if (ret == PCIBIOS_SUCCESSFUL)
+   iproc_pcie_fix_cap(pcie, where, val);

-   /* Don't advertise CRS SV support */
-   if ((where & ~0x3) == IPROC_PCI_EXP_CAP + PCI_EXP_RTCTL)
-   *val &= ~(PCI_EXP_RTCAP_CRSVIS << 16);
-   return PCIBIOS_SUCCESSFUL;
+   return ret;
}

cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where);
diff --git a/drivers/pci/host/pcie-iproc.h 
b/drivers/pci/host/pcie-iproc.h

index 814b600..9d5cfee 100644
--- a/drivers/pci/host/pcie-iproc.h
+++ b/drivers/pci/host/pcie-iproc.h
@@ 

Re: [PATCH v2 2/5] PCI: iproc: Fix up corrupted PAXC root complex config registers

2018-06-12 Thread poza

On 2018-06-12 05:51, Ray Jui wrote:

On certain versions of Broadcom PAXC based root complexes, certain
regions of the configuration space are corrupted. As a result, it
prevents the Linux PCIe stack from traversing the linked list of the
capability registers completely and therefore the root complex is
not advertised as "PCIe capable". This prevents the correct PCIe RID
from being parsed in the kernel PCIe stack. A correct RID is required
for mapping to a stream ID from the SMMU or the device ID from the
GICv3 ITS

This patch fixes up the issue by manually populating the related
PCIe capabilities

Signed-off-by: Ray Jui 
---
 drivers/pci/host/pcie-iproc.c | 65 
+++

 drivers/pci/host/pcie-iproc.h |  3 ++
 2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 3c76c5f..680f6b1 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -85,6 +85,8 @@
 #define IMAP_VALID_SHIFT   0
 #define IMAP_VALID BIT(IMAP_VALID_SHIFT)

+#define IPROC_PCI_PM_CAP   0x48
+#define IPROC_PCI_PM_CAP_MASK  0x
 #define IPROC_PCI_EXP_CAP  0xac

 #define IPROC_PCIE_REG_INVALID 0x
@@ -375,6 +377,17 @@ static const u16 iproc_pcie_reg_paxc_v2[] = {
[IPROC_PCIE_CFG_DATA]   = 0x1fc,
 };

+/*
+ * List of device IDs of controllers that have corrupted capability 
list that

+ * require SW fixup
+ */
+static const u16 iproc_pcie_corrupt_cap_did[] = {
+   0x16cd,
+   0x16f0,
+   0xd802,
+   0xd804
+};
+
 static inline struct iproc_pcie *iproc_data(struct pci_bus *bus)
 {
struct iproc_pcie *pcie = bus->sysdata;
@@ -495,6 +508,49 @@ static unsigned int iproc_pcie_cfg_retry(void
__iomem *cfg_data_p)
return data;
 }

+static void iproc_pcie_fix_cap(struct iproc_pcie *pcie, int where, u32 
*val)

+{
+   u32 i, dev_id;
+
+   switch (where & ~0x3) {
+   case PCI_VENDOR_ID:
+   dev_id = *val >> 16;
+
+   /*
+* Activate fixup for those controllers that have corrupted
+* capability list registers
+*/
+   for (i = 0; i < ARRAY_SIZE(iproc_pcie_corrupt_cap_did); i++)
+   if (dev_id == iproc_pcie_corrupt_cap_did[i])
+   pcie->fix_paxc_cap = true;


and I think this code will try to fix up every time config space is 
read.

Does this get corrupted often, randomly ?
Can it not be solved by using one time Quirk ?
and if not Quirk, you dont want to be setting pcie->fix_paxc_cap = false 
somewhere


besides, pcie->fix_paxc_cap = true; is set if PCI_VENDOR_ID is read 
first.
and rest cases stay with the assumption that PCI_VENDOR_ID will be read 
first.

which is infact read first during enumeration
(that is the assumption code is making), but that is safe assumption to 
make I think.



+   break;
+
+   case IPROC_PCI_PM_CAP:
+   if (pcie->fix_paxc_cap) {
+   /* advertise PM, force next capability to PCIe */
+   *val &= ~IPROC_PCI_PM_CAP_MASK;
+   *val |= IPROC_PCI_EXP_CAP << 8 | PCI_CAP_ID_PM;
+   }
+   break;
+
+   case IPROC_PCI_EXP_CAP:
+   if (pcie->fix_paxc_cap) {
+   /* advertise root port, version 2, terminate here */
+   *val = (PCI_EXP_TYPE_ROOT_PORT << 4 | 2) << 16 |
+   PCI_CAP_ID_EXP;
+   }
+   break;
+
+   case IPROC_PCI_EXP_CAP + PCI_EXP_RTCTL:
+   /* Don't advertise CRS SV support */
+   *val &= ~(PCI_EXP_RTCAP_CRSVIS << 16);
+   break;
+
+   default:
+   break;
+   }
+}
+
 static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int 
devfn,

  int where, int size, u32 *val)
 {
@@ -509,13 +565,10 @@ static int iproc_pcie_config_read(struct pci_bus
*bus, unsigned int devfn,
/* root complex access */
if (busno == 0) {
ret = pci_generic_config_read32(bus, devfn, where, size, val);
-   if (ret != PCIBIOS_SUCCESSFUL)
-   return ret;
+   if (ret == PCIBIOS_SUCCESSFUL)
+   iproc_pcie_fix_cap(pcie, where, val);

-   /* Don't advertise CRS SV support */
-   if ((where & ~0x3) == IPROC_PCI_EXP_CAP + PCI_EXP_RTCTL)
-   *val &= ~(PCI_EXP_RTCAP_CRSVIS << 16);
-   return PCIBIOS_SUCCESSFUL;
+   return ret;
}

cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where);
diff --git a/drivers/pci/host/pcie-iproc.h 
b/drivers/pci/host/pcie-iproc.h

index 814b600..9d5cfee 100644
--- a/drivers/pci/host/pcie-iproc.h
+++ b/drivers/pci/host/pcie-iproc.h
@@ 

Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()

2018-06-11 Thread poza

On 2018-06-11 15:31, p...@codeaurora.org wrote:

On 2018-06-08 03:04, Bjorn Helgaas wrote:

On Thu, Jun 07, 2018 at 07:18:03PM +0530, p...@codeaurora.org wrote:

On 2018-06-07 11:30, Oza Pawandeep wrote:
> We are handling ERR_FATAL by resetting the Link in software,skipping the
> driver pci_error_handlers callbacks, removing the devices from the PCI
> subsystem, and re-enumerating, as a result of that, no more calling
> pcie_portdrv_slot_reset in ERR_FATAL case.
>
> Signed-off-by: Oza Pawandeep 
>
> diff --git a/drivers/pci/pcie/portdrv_pci.c
> b/drivers/pci/pcie/portdrv_pci.c
> index 973f1b8..92f5d330 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>
>  /* global data */
>
> -static int pcie_portdrv_restore_config(struct pci_dev *dev)
> -{
> -  int retval;
> -
> -  retval = pci_enable_device(dev);
> -  if (retval)
> -  return retval;
> -  pci_set_master(dev);
> -  return 0;
> -}
> -
>  #ifdef CONFIG_PM
>  static int pcie_port_runtime_suspend(struct device *dev)
>  {
> @@ -162,14 +151,6 @@ static pci_ers_result_t
> pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>
>  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>  {
> -  /* If fatal, restore cfg space for possible link reset at upstream */
> -  if (dev->error_state == pci_channel_io_frozen) {
> -  dev->state_saved = true;
> -  pci_restore_state(dev);
> -  pcie_portdrv_restore_config(dev);
> -  pci_enable_pcie_error_reporting(dev);
> -  }
> -
>return PCI_ERS_RESULT_RECOVERED;
>  }


Hi Bjorn,

the above patch removes ERR_FATAL handling from 
pcie_portdrv_slot_reset()

because now we are handling ERR_FATAL differently than before.

I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL 
case

where it
restores the config space, enable device, set master and enable error
reporting
and as far as I understand this is being done for upstream link 
(bridges

etc..)

why was it done at the first point (I checked the commit description, 
but

could not really get it)
and do we need to handle the same thing in ERR_FATAL now ?


You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
commit log has no useful information.  I don't know any of the history
behind it.


Hi Bjorn and Keith,

broadcast_error_message()
if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
.
pci_walk_bus(dev->subordinate, cb, _data);


so in case of ERR_FATAL, the walk bus is happening on subordinates,
and if I understand the walk right
then, pcie_portdrv_slot_reset() is called only on BRIDGES/Switches

If is never called on Root-Ports

having said that, now since we are removing the devices (compare to
previous error callback handling in ERR_FATAL)
I dont see the need of the above code anymore.



when I say above code, I meant this patch itself which removes ERR_FATAL 
handling out of pcie_portdrv_slot_reset



because there is nothing to restore to any more. as we are initiating
re-enumeration.

Regards,
Oza.


Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()

2018-06-11 Thread poza

On 2018-06-11 15:31, p...@codeaurora.org wrote:

On 2018-06-08 03:04, Bjorn Helgaas wrote:

On Thu, Jun 07, 2018 at 07:18:03PM +0530, p...@codeaurora.org wrote:

On 2018-06-07 11:30, Oza Pawandeep wrote:
> We are handling ERR_FATAL by resetting the Link in software,skipping the
> driver pci_error_handlers callbacks, removing the devices from the PCI
> subsystem, and re-enumerating, as a result of that, no more calling
> pcie_portdrv_slot_reset in ERR_FATAL case.
>
> Signed-off-by: Oza Pawandeep 
>
> diff --git a/drivers/pci/pcie/portdrv_pci.c
> b/drivers/pci/pcie/portdrv_pci.c
> index 973f1b8..92f5d330 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>
>  /* global data */
>
> -static int pcie_portdrv_restore_config(struct pci_dev *dev)
> -{
> -  int retval;
> -
> -  retval = pci_enable_device(dev);
> -  if (retval)
> -  return retval;
> -  pci_set_master(dev);
> -  return 0;
> -}
> -
>  #ifdef CONFIG_PM
>  static int pcie_port_runtime_suspend(struct device *dev)
>  {
> @@ -162,14 +151,6 @@ static pci_ers_result_t
> pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>
>  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>  {
> -  /* If fatal, restore cfg space for possible link reset at upstream */
> -  if (dev->error_state == pci_channel_io_frozen) {
> -  dev->state_saved = true;
> -  pci_restore_state(dev);
> -  pcie_portdrv_restore_config(dev);
> -  pci_enable_pcie_error_reporting(dev);
> -  }
> -
>return PCI_ERS_RESULT_RECOVERED;
>  }


Hi Bjorn,

the above patch removes ERR_FATAL handling from 
pcie_portdrv_slot_reset()

because now we are handling ERR_FATAL differently than before.

I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL 
case

where it
restores the config space, enable device, set master and enable error
reporting
and as far as I understand this is being done for upstream link 
(bridges

etc..)

why was it done at the first point (I checked the commit description, 
but

could not really get it)
and do we need to handle the same thing in ERR_FATAL now ?


You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
commit log has no useful information.  I don't know any of the history
behind it.


Hi Bjorn and Keith,

broadcast_error_message()
if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
.
pci_walk_bus(dev->subordinate, cb, _data);


so in case of ERR_FATAL, the walk bus is happening on subordinates,
and if I understand the walk right
then, pcie_portdrv_slot_reset() is called only on BRIDGES/Switches

If is never called on Root-Ports

having said that, now since we are removing the devices (compare to
previous error callback handling in ERR_FATAL)
I dont see the need of the above code anymore.



when I say above code, I meant this patch itself which removes ERR_FATAL 
handling out of pcie_portdrv_slot_reset



because there is nothing to restore to any more. as we are initiating
re-enumeration.

Regards,
Oza.


Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()

2018-06-11 Thread poza

On 2018-06-08 03:04, Bjorn Helgaas wrote:

On Thu, Jun 07, 2018 at 07:18:03PM +0530, p...@codeaurora.org wrote:

On 2018-06-07 11:30, Oza Pawandeep wrote:
> We are handling ERR_FATAL by resetting the Link in software,skipping the
> driver pci_error_handlers callbacks, removing the devices from the PCI
> subsystem, and re-enumerating, as a result of that, no more calling
> pcie_portdrv_slot_reset in ERR_FATAL case.
>
> Signed-off-by: Oza Pawandeep 
>
> diff --git a/drivers/pci/pcie/portdrv_pci.c
> b/drivers/pci/pcie/portdrv_pci.c
> index 973f1b8..92f5d330 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>
>  /* global data */
>
> -static int pcie_portdrv_restore_config(struct pci_dev *dev)
> -{
> -  int retval;
> -
> -  retval = pci_enable_device(dev);
> -  if (retval)
> -  return retval;
> -  pci_set_master(dev);
> -  return 0;
> -}
> -
>  #ifdef CONFIG_PM
>  static int pcie_port_runtime_suspend(struct device *dev)
>  {
> @@ -162,14 +151,6 @@ static pci_ers_result_t
> pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>
>  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>  {
> -  /* If fatal, restore cfg space for possible link reset at upstream */
> -  if (dev->error_state == pci_channel_io_frozen) {
> -  dev->state_saved = true;
> -  pci_restore_state(dev);
> -  pcie_portdrv_restore_config(dev);
> -  pci_enable_pcie_error_reporting(dev);
> -  }
> -
>return PCI_ERS_RESULT_RECOVERED;
>  }


Hi Bjorn,

the above patch removes ERR_FATAL handling from 
pcie_portdrv_slot_reset()

because now we are handling ERR_FATAL differently than before.

I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL 
case

where it
restores the config space, enable device, set master and enable error
reporting
and as far as I understand this is being done for upstream link 
(bridges

etc..)

why was it done at the first point (I checked the commit description, 
but

could not really get it)
and do we need to handle the same thing in ERR_FATAL now ?


You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
commit log has no useful information.  I don't know any of the history
behind it.


Hi Bjorn and Keith,

broadcast_error_message()
if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
.
pci_walk_bus(dev->subordinate, cb, _data);


so in case of ERR_FATAL, the walk bus is happening on subordinates, and 
if I understand the walk right

then, pcie_portdrv_slot_reset() is called only on BRIDGES/Switches

If is never called on Root-Ports

having said that, now since we are removing the devices (compare to 
previous error callback handling in ERR_FATAL)

I dont see the need of the above code anymore.

because there is nothing to restore to any more. as we are initiating 
re-enumeration.


Regards,
Oza.


















Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()

2018-06-11 Thread poza

On 2018-06-08 03:04, Bjorn Helgaas wrote:

On Thu, Jun 07, 2018 at 07:18:03PM +0530, p...@codeaurora.org wrote:

On 2018-06-07 11:30, Oza Pawandeep wrote:
> We are handling ERR_FATAL by resetting the Link in software,skipping the
> driver pci_error_handlers callbacks, removing the devices from the PCI
> subsystem, and re-enumerating, as a result of that, no more calling
> pcie_portdrv_slot_reset in ERR_FATAL case.
>
> Signed-off-by: Oza Pawandeep 
>
> diff --git a/drivers/pci/pcie/portdrv_pci.c
> b/drivers/pci/pcie/portdrv_pci.c
> index 973f1b8..92f5d330 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>
>  /* global data */
>
> -static int pcie_portdrv_restore_config(struct pci_dev *dev)
> -{
> -  int retval;
> -
> -  retval = pci_enable_device(dev);
> -  if (retval)
> -  return retval;
> -  pci_set_master(dev);
> -  return 0;
> -}
> -
>  #ifdef CONFIG_PM
>  static int pcie_port_runtime_suspend(struct device *dev)
>  {
> @@ -162,14 +151,6 @@ static pci_ers_result_t
> pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>
>  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>  {
> -  /* If fatal, restore cfg space for possible link reset at upstream */
> -  if (dev->error_state == pci_channel_io_frozen) {
> -  dev->state_saved = true;
> -  pci_restore_state(dev);
> -  pcie_portdrv_restore_config(dev);
> -  pci_enable_pcie_error_reporting(dev);
> -  }
> -
>return PCI_ERS_RESULT_RECOVERED;
>  }


Hi Bjorn,

the above patch removes ERR_FATAL handling from 
pcie_portdrv_slot_reset()

because now we are handling ERR_FATAL differently than before.

I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL 
case

where it
restores the config space, enable device, set master and enable error
reporting
and as far as I understand this is being done for upstream link 
(bridges

etc..)

why was it done at the first point (I checked the commit description, 
but

could not really get it)
and do we need to handle the same thing in ERR_FATAL now ?


You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
commit log has no useful information.  I don't know any of the history
behind it.


Hi Bjorn and Keith,

broadcast_error_message()
if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
.
pci_walk_bus(dev->subordinate, cb, _data);


so in case of ERR_FATAL, the walk bus is happening on subordinates, and 
if I understand the walk right

then, pcie_portdrv_slot_reset() is called only on BRIDGES/Switches

If is never called on Root-Ports

having said that, now since we are removing the devices (compare to 
previous error callback handling in ERR_FATAL)

I dont see the need of the above code anymore.

because there is nothing to restore to any more. as we are initiating 
re-enumeration.


Regards,
Oza.


















Re: [RFC PATCH v1 0/9] PCI/portdrv: Squash into one file

2018-06-11 Thread poza

On 2018-06-09 01:42, Bjorn Helgaas wrote:
The portdrv code is scattered across several files, which makes it a 
bit of

a hassle to browse.  Consolidate it all in a single file.



Although I do not have any objection is merging the code,

It looks to me that the 2 files served purpose of keeping
portdrv device functionality in  one file > portdrv_pci.c
while, port driver helper and exported service functions for services 
are in > portdrv_core.c


If that was the original intention, it just looks okay to me.
But I am not sure the way the code is now, we can call is scattered,
and also I do not think by merging it, we can call it more organised.

although certainly it looks easier to browse as you are suggesting.
little less confusing.

This is all pure code moves; no functional changes intended.  Well, 
there

is a function rename, but it shouldn't change any behavior.

---

Bjorn Helgaas (9):
  PCI/portdrv: Rename resume_iter() to prevent name collision
  PCI/portdrv: Squash pieces of portdrv_core.c into portdrv_pci.c
  PCI/portdrv: Squash PM-related code into portdrv_pci.c
  PCI/portdrv: Squash device removal code into portdrv_pci.c
  PCI/portdrv: Squash lookup interfaces into portdrv_pci.c
  PCI/portdrv: Squash service driver registration into 
portdrv_pci.c

  PCI/portdrv: Move private definitions to portdrv_pci.c
  PCI/portdrv: Clean up whitespace
  PCI/portdrv: Rename portdrv_pci.c to portdrv.c


 drivers/pci/pcie/Makefile   |6
 drivers/pci/pcie/portdrv.c  |  822 
+++

 drivers/pci/pcie/portdrv.h  |   15 -
 drivers/pci/pcie/portdrv_core.c |  578 ---
 drivers/pci/pcie/portdrv_pci.c  |  261 
 5 files changed, 824 insertions(+), 858 deletions(-)
 create mode 100644 drivers/pci/pcie/portdrv.c
 delete mode 100644 drivers/pci/pcie/portdrv_core.c
 delete mode 100644 drivers/pci/pcie/portdrv_pci.c


Re: [RFC PATCH v1 0/9] PCI/portdrv: Squash into one file

2018-06-11 Thread poza

On 2018-06-09 01:42, Bjorn Helgaas wrote:
The portdrv code is scattered across several files, which makes it a 
bit of

a hassle to browse.  Consolidate it all in a single file.



Although I do not have any objection is merging the code,

It looks to me that the 2 files served purpose of keeping
portdrv device functionality in  one file > portdrv_pci.c
while, port driver helper and exported service functions for services 
are in > portdrv_core.c


If that was the original intention, it just looks okay to me.
But I am not sure the way the code is now, we can call is scattered,
and also I do not think by merging it, we can call it more organised.

although certainly it looks easier to browse as you are suggesting.
little less confusing.

This is all pure code moves; no functional changes intended.  Well, 
there

is a function rename, but it shouldn't change any behavior.

---

Bjorn Helgaas (9):
  PCI/portdrv: Rename resume_iter() to prevent name collision
  PCI/portdrv: Squash pieces of portdrv_core.c into portdrv_pci.c
  PCI/portdrv: Squash PM-related code into portdrv_pci.c
  PCI/portdrv: Squash device removal code into portdrv_pci.c
  PCI/portdrv: Squash lookup interfaces into portdrv_pci.c
  PCI/portdrv: Squash service driver registration into 
portdrv_pci.c

  PCI/portdrv: Move private definitions to portdrv_pci.c
  PCI/portdrv: Clean up whitespace
  PCI/portdrv: Rename portdrv_pci.c to portdrv.c


 drivers/pci/pcie/Makefile   |6
 drivers/pci/pcie/portdrv.c  |  822 
+++

 drivers/pci/pcie/portdrv.h  |   15 -
 drivers/pci/pcie/portdrv_core.c |  578 ---
 drivers/pci/pcie/portdrv_pci.c  |  261 
 5 files changed, 824 insertions(+), 858 deletions(-)
 create mode 100644 drivers/pci/pcie/portdrv.c
 delete mode 100644 drivers/pci/pcie/portdrv_core.c
 delete mode 100644 drivers/pci/pcie/portdrv_pci.c


Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()

2018-06-07 Thread poza

On 2018-06-08 03:04, Bjorn Helgaas wrote:

On Thu, Jun 07, 2018 at 07:18:03PM +0530, p...@codeaurora.org wrote:

On 2018-06-07 11:30, Oza Pawandeep wrote:
> We are handling ERR_FATAL by resetting the Link in software,skipping the
> driver pci_error_handlers callbacks, removing the devices from the PCI
> subsystem, and re-enumerating, as a result of that, no more calling
> pcie_portdrv_slot_reset in ERR_FATAL case.
>
> Signed-off-by: Oza Pawandeep 
>
> diff --git a/drivers/pci/pcie/portdrv_pci.c
> b/drivers/pci/pcie/portdrv_pci.c
> index 973f1b8..92f5d330 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>
>  /* global data */
>
> -static int pcie_portdrv_restore_config(struct pci_dev *dev)
> -{
> -  int retval;
> -
> -  retval = pci_enable_device(dev);
> -  if (retval)
> -  return retval;
> -  pci_set_master(dev);
> -  return 0;
> -}
> -
>  #ifdef CONFIG_PM
>  static int pcie_port_runtime_suspend(struct device *dev)
>  {
> @@ -162,14 +151,6 @@ static pci_ers_result_t
> pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>
>  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>  {
> -  /* If fatal, restore cfg space for possible link reset at upstream */
> -  if (dev->error_state == pci_channel_io_frozen) {
> -  dev->state_saved = true;
> -  pci_restore_state(dev);
> -  pcie_portdrv_restore_config(dev);
> -  pci_enable_pcie_error_reporting(dev);
> -  }
> -
>return PCI_ERS_RESULT_RECOVERED;
>  }


Hi Bjorn,

the above patch removes ERR_FATAL handling from 
pcie_portdrv_slot_reset()

because now we are handling ERR_FATAL differently than before.

I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL 
case

where it
restores the config space, enable device, set master and enable error
reporting
and as far as I understand this is being done for upstream link 
(bridges

etc..)

why was it done at the first point (I checked the commit description, 
but

could not really get it)
and do we need to handle the same thing in ERR_FATAL now ?


You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
commit log has no useful information.  I don't know any of the history
behind it.



Yes Bjorn thats right.
I am trying to understand it but no clue.
since it is restoring the stuffs in ERR_FATAL case, why would PCIe 
bridge loose all the settings ?  [config space, aer bits, master, device 
enable etc..)
Max we do is link_reset in ERR_FATAL case, and Secondary bus reset 
should affect downstream components (not upstream)






Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()

2018-06-07 Thread poza

On 2018-06-08 03:04, Bjorn Helgaas wrote:

On Thu, Jun 07, 2018 at 07:18:03PM +0530, p...@codeaurora.org wrote:

On 2018-06-07 11:30, Oza Pawandeep wrote:
> We are handling ERR_FATAL by resetting the Link in software,skipping the
> driver pci_error_handlers callbacks, removing the devices from the PCI
> subsystem, and re-enumerating, as a result of that, no more calling
> pcie_portdrv_slot_reset in ERR_FATAL case.
>
> Signed-off-by: Oza Pawandeep 
>
> diff --git a/drivers/pci/pcie/portdrv_pci.c
> b/drivers/pci/pcie/portdrv_pci.c
> index 973f1b8..92f5d330 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>
>  /* global data */
>
> -static int pcie_portdrv_restore_config(struct pci_dev *dev)
> -{
> -  int retval;
> -
> -  retval = pci_enable_device(dev);
> -  if (retval)
> -  return retval;
> -  pci_set_master(dev);
> -  return 0;
> -}
> -
>  #ifdef CONFIG_PM
>  static int pcie_port_runtime_suspend(struct device *dev)
>  {
> @@ -162,14 +151,6 @@ static pci_ers_result_t
> pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>
>  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>  {
> -  /* If fatal, restore cfg space for possible link reset at upstream */
> -  if (dev->error_state == pci_channel_io_frozen) {
> -  dev->state_saved = true;
> -  pci_restore_state(dev);
> -  pcie_portdrv_restore_config(dev);
> -  pci_enable_pcie_error_reporting(dev);
> -  }
> -
>return PCI_ERS_RESULT_RECOVERED;
>  }


Hi Bjorn,

the above patch removes ERR_FATAL handling from 
pcie_portdrv_slot_reset()

because now we are handling ERR_FATAL differently than before.

I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL 
case

where it
restores the config space, enable device, set master and enable error
reporting
and as far as I understand this is being done for upstream link 
(bridges

etc..)

why was it done at the first point (I checked the commit description, 
but

could not really get it)
and do we need to handle the same thing in ERR_FATAL now ?


You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
commit log has no useful information.  I don't know any of the history
behind it.



Yes Bjorn thats right.
I am trying to understand it but no clue.
since it is restoring the stuffs in ERR_FATAL case, why would PCIe 
bridge loose all the settings ?  [config space, aer bits, master, device 
enable etc..)
Max we do is link_reset in ERR_FATAL case, and Secondary bus reset 
should affect downstream components (not upstream)






Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()

2018-06-07 Thread poza

On 2018-06-08 03:04, Bjorn Helgaas wrote:

On Thu, Jun 07, 2018 at 07:18:03PM +0530, p...@codeaurora.org wrote:

On 2018-06-07 11:30, Oza Pawandeep wrote:
> We are handling ERR_FATAL by resetting the Link in software,skipping the
> driver pci_error_handlers callbacks, removing the devices from the PCI
> subsystem, and re-enumerating, as a result of that, no more calling
> pcie_portdrv_slot_reset in ERR_FATAL case.
>
> Signed-off-by: Oza Pawandeep 
>
> diff --git a/drivers/pci/pcie/portdrv_pci.c
> b/drivers/pci/pcie/portdrv_pci.c
> index 973f1b8..92f5d330 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>
>  /* global data */
>
> -static int pcie_portdrv_restore_config(struct pci_dev *dev)
> -{
> -  int retval;
> -
> -  retval = pci_enable_device(dev);
> -  if (retval)
> -  return retval;
> -  pci_set_master(dev);
> -  return 0;
> -}
> -
>  #ifdef CONFIG_PM
>  static int pcie_port_runtime_suspend(struct device *dev)
>  {
> @@ -162,14 +151,6 @@ static pci_ers_result_t
> pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>
>  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>  {
> -  /* If fatal, restore cfg space for possible link reset at upstream */
> -  if (dev->error_state == pci_channel_io_frozen) {
> -  dev->state_saved = true;
> -  pci_restore_state(dev);
> -  pcie_portdrv_restore_config(dev);
> -  pci_enable_pcie_error_reporting(dev);
> -  }
> -
>return PCI_ERS_RESULT_RECOVERED;
>  }


Hi Bjorn,

the above patch removes ERR_FATAL handling from 
pcie_portdrv_slot_reset()

because now we are handling ERR_FATAL differently than before.

I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL 
case

where it
restores the config space, enable device, set master and enable error
reporting
and as far as I understand this is being done for upstream link 
(bridges

etc..)

why was it done at the first point (I checked the commit description, 
but

could not really get it)
and do we need to handle the same thing in ERR_FATAL now ?


You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
commit log has no useful information.  I don't know any of the history
behind it.


Keith,

do you know why in ERR_FATAL case following was done ?
have a look at pcie_portdrv_slot_reset() handling (for bridges, switches 
etc..)


Regards,
Oza.





Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()

2018-06-07 Thread poza

On 2018-06-08 03:04, Bjorn Helgaas wrote:

On Thu, Jun 07, 2018 at 07:18:03PM +0530, p...@codeaurora.org wrote:

On 2018-06-07 11:30, Oza Pawandeep wrote:
> We are handling ERR_FATAL by resetting the Link in software,skipping the
> driver pci_error_handlers callbacks, removing the devices from the PCI
> subsystem, and re-enumerating, as a result of that, no more calling
> pcie_portdrv_slot_reset in ERR_FATAL case.
>
> Signed-off-by: Oza Pawandeep 
>
> diff --git a/drivers/pci/pcie/portdrv_pci.c
> b/drivers/pci/pcie/portdrv_pci.c
> index 973f1b8..92f5d330 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>
>  /* global data */
>
> -static int pcie_portdrv_restore_config(struct pci_dev *dev)
> -{
> -  int retval;
> -
> -  retval = pci_enable_device(dev);
> -  if (retval)
> -  return retval;
> -  pci_set_master(dev);
> -  return 0;
> -}
> -
>  #ifdef CONFIG_PM
>  static int pcie_port_runtime_suspend(struct device *dev)
>  {
> @@ -162,14 +151,6 @@ static pci_ers_result_t
> pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>
>  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>  {
> -  /* If fatal, restore cfg space for possible link reset at upstream */
> -  if (dev->error_state == pci_channel_io_frozen) {
> -  dev->state_saved = true;
> -  pci_restore_state(dev);
> -  pcie_portdrv_restore_config(dev);
> -  pci_enable_pcie_error_reporting(dev);
> -  }
> -
>return PCI_ERS_RESULT_RECOVERED;
>  }


Hi Bjorn,

the above patch removes ERR_FATAL handling from 
pcie_portdrv_slot_reset()

because now we are handling ERR_FATAL differently than before.

I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL 
case

where it
restores the config space, enable device, set master and enable error
reporting
and as far as I understand this is being done for upstream link 
(bridges

etc..)

why was it done at the first point (I checked the commit description, 
but

could not really get it)
and do we need to handle the same thing in ERR_FATAL now ?


You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
commit log has no useful information.  I don't know any of the history
behind it.


Keith,

do you know why in ERR_FATAL case following was done ?
have a look at pcie_portdrv_slot_reset() handling (for bridges, switches 
etc..)


Regards,
Oza.





Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()

2018-06-07 Thread poza

On 2018-06-07 11:30, Oza Pawandeep wrote:
We are handling ERR_FATAL by resetting the Link in software,skipping 
the

driver pci_error_handlers callbacks, removing the devices from the PCI
subsystem, and re-enumerating, as a result of that, no more calling
pcie_portdrv_slot_reset in ERR_FATAL case.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/pci/pcie/portdrv_pci.c 
b/drivers/pci/pcie/portdrv_pci.c

index 973f1b8..92f5d330 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);

 /* global data */

-static int pcie_portdrv_restore_config(struct pci_dev *dev)
-{
-   int retval;
-
-   retval = pci_enable_device(dev);
-   if (retval)
-   return retval;
-   pci_set_master(dev);
-   return 0;
-}
-
 #ifdef CONFIG_PM
 static int pcie_port_runtime_suspend(struct device *dev)
 {
@@ -162,14 +151,6 @@ static pci_ers_result_t
pcie_portdrv_mmio_enabled(struct pci_dev *dev)

 static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
 {
-   /* If fatal, restore cfg space for possible link reset at upstream */
-   if (dev->error_state == pci_channel_io_frozen) {
-   dev->state_saved = true;
-   pci_restore_state(dev);
-   pcie_portdrv_restore_config(dev);
-   pci_enable_pcie_error_reporting(dev);
-   }
-
return PCI_ERS_RESULT_RECOVERED;
 }



Hi Bjorn,

the above patch removes ERR_FATAL handling from 
pcie_portdrv_slot_reset()

because now we are handling ERR_FATAL differently than before.

I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL 
case where it
restores the config space, enable device, set master and enable error 
reporting
and as far as I understand this is being done for upstream link (bridges 
etc..)


why was it done at the first point (I checked the commit description, 
but could not really get it)

and do we need to handle the same thing in ERR_FATAL now ?

Regards,
Oza.
















Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()

2018-06-07 Thread poza

On 2018-06-07 11:30, Oza Pawandeep wrote:
We are handling ERR_FATAL by resetting the Link in software,skipping 
the

driver pci_error_handlers callbacks, removing the devices from the PCI
subsystem, and re-enumerating, as a result of that, no more calling
pcie_portdrv_slot_reset in ERR_FATAL case.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/pci/pcie/portdrv_pci.c 
b/drivers/pci/pcie/portdrv_pci.c

index 973f1b8..92f5d330 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);

 /* global data */

-static int pcie_portdrv_restore_config(struct pci_dev *dev)
-{
-   int retval;
-
-   retval = pci_enable_device(dev);
-   if (retval)
-   return retval;
-   pci_set_master(dev);
-   return 0;
-}
-
 #ifdef CONFIG_PM
 static int pcie_port_runtime_suspend(struct device *dev)
 {
@@ -162,14 +151,6 @@ static pci_ers_result_t
pcie_portdrv_mmio_enabled(struct pci_dev *dev)

 static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
 {
-   /* If fatal, restore cfg space for possible link reset at upstream */
-   if (dev->error_state == pci_channel_io_frozen) {
-   dev->state_saved = true;
-   pci_restore_state(dev);
-   pcie_portdrv_restore_config(dev);
-   pci_enable_pcie_error_reporting(dev);
-   }
-
return PCI_ERS_RESULT_RECOVERED;
 }



Hi Bjorn,

the above patch removes ERR_FATAL handling from 
pcie_portdrv_slot_reset()

because now we are handling ERR_FATAL differently than before.

I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL 
case where it
restores the config space, enable device, set master and enable error 
reporting
and as far as I understand this is being done for upstream link (bridges 
etc..)


why was it done at the first point (I checked the commit description, 
but could not really get it)

and do we need to handle the same thing in ERR_FATAL now ?

Regards,
Oza.
















Re: [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits

2018-06-07 Thread poza

On 2018-06-07 18:51, Bjorn Helgaas wrote:

On Thu, Jun 07, 2018 at 02:00:29AM -0400, Oza Pawandeep wrote:

PCIe ERR_NONFATAL and ERR_FATAL are uncorrectable errors, and clearing
uncorrectable error bits should take error mask into account.

Signed-off-by: Oza Pawandeep 


If/when you repost these, please include a [0/6] cover letter with an
overview of the purpose of the series.

I assume these are for v4.19, so I'll look at them after the merge
window.

If they fix issues introduced during the v4.18 merge window, we may be
able to merge them during the v4.18 -rc cycle.  In this case, I would
need specifics about what exactly the problems are.


sure Bjorn, will include cover letter.
Mostly these fixes the things which existed before 4.18 as well.
I have a question, please clarify when you get a chance.

I am posting the question on tops of PATCH-6.

Regards,
Oza.



diff --git a/drivers/pci/pcie/aer/aerdrv.c 
b/drivers/pci/pcie/aer/aerdrv.c

index 377e576..8cbc62b 100644
--- a/drivers/pci/pcie/aer/aerdrv.c
+++ b/drivers/pci/pcie/aer/aerdrv.c
@@ -341,8 +341,6 @@ static pci_ers_result_t aer_root_reset(struct 
pci_dev *dev)

  */
 static void aer_error_resume(struct pci_dev *dev)
 {
-   int pos;
-   u32 status, mask;
u16 reg16;

/* Clean up Root device status */
@@ -350,11 +348,7 @@ static void aer_error_resume(struct pci_dev *dev)
pcie_capability_write_word(dev, PCI_EXP_DEVSTA, reg16);

/* Clean AER Root Error Status */
-   pos = dev->aer_cap;
-   pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, );
-   pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, );
-   status &= ~mask; /* Clear corresponding nonfatal bits */
-   pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);
+   pci_cleanup_aer_uncorrect_error_status(dev);
 }

 /**
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
b/drivers/pci/pcie/aer/aerdrv_core.c

index 946f3f6..309f3f5 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -50,13 +50,17 @@ 
EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);

 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
 {
int pos;
-   u32 status;
+   u32 status, mask;

pos = dev->aer_cap;
if (!pos)
return -EIO;

+   /* Clean AER Root Error Status */
+   pos = dev->aer_cap;
pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, );
+   pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, );
+   status &= ~mask; /* Clear corresponding nonfatal bits */
if (status)
pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);

--
2.7.4



Re: [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits

2018-06-07 Thread poza

On 2018-06-07 18:51, Bjorn Helgaas wrote:

On Thu, Jun 07, 2018 at 02:00:29AM -0400, Oza Pawandeep wrote:

PCIe ERR_NONFATAL and ERR_FATAL are uncorrectable errors, and clearing
uncorrectable error bits should take error mask into account.

Signed-off-by: Oza Pawandeep 


If/when you repost these, please include a [0/6] cover letter with an
overview of the purpose of the series.

I assume these are for v4.19, so I'll look at them after the merge
window.

If they fix issues introduced during the v4.18 merge window, we may be
able to merge them during the v4.18 -rc cycle.  In this case, I would
need specifics about what exactly the problems are.


sure Bjorn, will include cover letter.
Mostly these fixes the things which existed before 4.18 as well.
I have a question, please clarify when you get a chance.

I am posting the question on tops of PATCH-6.

Regards,
Oza.



diff --git a/drivers/pci/pcie/aer/aerdrv.c 
b/drivers/pci/pcie/aer/aerdrv.c

index 377e576..8cbc62b 100644
--- a/drivers/pci/pcie/aer/aerdrv.c
+++ b/drivers/pci/pcie/aer/aerdrv.c
@@ -341,8 +341,6 @@ static pci_ers_result_t aer_root_reset(struct 
pci_dev *dev)

  */
 static void aer_error_resume(struct pci_dev *dev)
 {
-   int pos;
-   u32 status, mask;
u16 reg16;

/* Clean up Root device status */
@@ -350,11 +348,7 @@ static void aer_error_resume(struct pci_dev *dev)
pcie_capability_write_word(dev, PCI_EXP_DEVSTA, reg16);

/* Clean AER Root Error Status */
-   pos = dev->aer_cap;
-   pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, );
-   pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, );
-   status &= ~mask; /* Clear corresponding nonfatal bits */
-   pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);
+   pci_cleanup_aer_uncorrect_error_status(dev);
 }

 /**
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
b/drivers/pci/pcie/aer/aerdrv_core.c

index 946f3f6..309f3f5 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -50,13 +50,17 @@ 
EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);

 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
 {
int pos;
-   u32 status;
+   u32 status, mask;

pos = dev->aer_cap;
if (!pos)
return -EIO;

+   /* Clean AER Root Error Status */
+   pos = dev->aer_cap;
pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, );
+   pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, );
+   status &= ~mask; /* Clear corresponding nonfatal bits */
if (status)
pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);

--
2.7.4



Re: [PATCH INTERNAL 3/3] PCI: iproc: Disable MSI parsing in certain PAXC blocks

2018-05-18 Thread poza

On 2018-05-17 22:51, Ray Jui wrote:

The internal MSI parsing logic in certain revisions of PAXC root
complexes does not work properly and can casue corruptions on the
writes. They need to be disabled

Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/pci/host/pcie-iproc.c | 34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 3c76c5f..b906d80 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -1144,10 +1144,22 @@ static int iproc_pcie_paxb_v2_msi_steer(struct
iproc_pcie *pcie, u64 msi_addr)
return ret;
 }

-static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64 
msi_addr)
+static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64 
msi_addr,

+bool enable)
 {
u32 val;

+   if (!enable) {
+   /*
+* Disable PAXC MSI steering. All write transfers will be
+* treated as non-MSI transfers
+*/
+   val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
+   val &= ~MSI_ENABLE_CFG;
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
+   return;
+   }
+
/*
 * Program bits [43:13] of address of GITS_TRANSLATER register into
 	 * bits [30:0] of the MSI base address register.  In fact, in all 
iProc
@@ -1201,7 +1213,7 @@ static int iproc_pcie_msi_steer(struct iproc_pcie 
*pcie,

return ret;
break;
case IPROC_PCIE_PAXC_V2:
-   iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
+   iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);


Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere else with 
'false' ?
I see its getting called only from one place in current code 
iproc_pcie_msi_steer().



break;
default:
return -EINVAL;
@@ -1427,6 +1439,24 @@ int iproc_pcie_remove(struct iproc_pcie *pcie)
 }
 EXPORT_SYMBOL(iproc_pcie_remove);

+/*
+ * The MSI parsing logic in certain revisions of Broadcom PAXC based 
root

+ * complex does not work and needs to be disabled
+ */
+static void quirk_paxc_disable_msi_parsing(struct pci_dev *pdev)
+{
+   struct iproc_pcie *pcie = iproc_data(pdev->bus);
+
+   if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+   iproc_pcie_paxc_v2_msi_steer(pcie, 0, false);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
+   quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
+   quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
+   quirk_paxc_disable_msi_parsing);
+
 MODULE_AUTHOR("Ray Jui ");
 MODULE_DESCRIPTION("Broadcom iPROC PCIe common driver");
 MODULE_LICENSE("GPL v2");


Re: [PATCH INTERNAL 3/3] PCI: iproc: Disable MSI parsing in certain PAXC blocks

2018-05-18 Thread poza

On 2018-05-17 22:51, Ray Jui wrote:

The internal MSI parsing logic in certain revisions of PAXC root
complexes does not work properly and can casue corruptions on the
writes. They need to be disabled

Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/pci/host/pcie-iproc.c | 34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c 
b/drivers/pci/host/pcie-iproc.c

index 3c76c5f..b906d80 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -1144,10 +1144,22 @@ static int iproc_pcie_paxb_v2_msi_steer(struct
iproc_pcie *pcie, u64 msi_addr)
return ret;
 }

-static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64 
msi_addr)
+static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64 
msi_addr,

+bool enable)
 {
u32 val;

+   if (!enable) {
+   /*
+* Disable PAXC MSI steering. All write transfers will be
+* treated as non-MSI transfers
+*/
+   val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
+   val &= ~MSI_ENABLE_CFG;
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
+   return;
+   }
+
/*
 * Program bits [43:13] of address of GITS_TRANSLATER register into
 	 * bits [30:0] of the MSI base address register.  In fact, in all 
iProc
@@ -1201,7 +1213,7 @@ static int iproc_pcie_msi_steer(struct iproc_pcie 
*pcie,

return ret;
break;
case IPROC_PCIE_PAXC_V2:
-   iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
+   iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);


Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere else with 
'false' ?
I see its getting called only from one place in current code 
iproc_pcie_msi_steer().



break;
default:
return -EINVAL;
@@ -1427,6 +1439,24 @@ int iproc_pcie_remove(struct iproc_pcie *pcie)
 }
 EXPORT_SYMBOL(iproc_pcie_remove);

+/*
+ * The MSI parsing logic in certain revisions of Broadcom PAXC based 
root

+ * complex does not work and needs to be disabled
+ */
+static void quirk_paxc_disable_msi_parsing(struct pci_dev *pdev)
+{
+   struct iproc_pcie *pcie = iproc_data(pdev->bus);
+
+   if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+   iproc_pcie_paxc_v2_msi_steer(pcie, 0, false);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
+   quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
+   quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
+   quirk_paxc_disable_msi_parsing);
+
 MODULE_AUTHOR("Ray Jui ");
 MODULE_DESCRIPTION("Broadcom iPROC PCIe common driver");
 MODULE_LICENSE("GPL v2");


Re: [PATCH INTERNAL 2/3] PCI: iproc: Fix up corrupted PAXC root complex config registers

2018-05-18 Thread poza

On 2018-05-17 22:51, Ray Jui wrote:

On certain versions of Broadcom PAXC based root complexes, certain
regions of the configuration space are corrupted. As a result, it
prevents the Linux PCIe stack from traversing the linked list of the
capability registers completely and therefore the root complex is
not advertised as "PCIe capable". This prevents the correct PCIe RID
from being parsed in the kernel PCIe stack. A correct RID is required
for mapping to a stream ID from the SMMU or the device ID from the
GICv3 ITS

This patch fixes up the issue by manually populating the related
PCIe capabilities based on readings from the PCIe capability structure

Signed-off-by: Ray Jui 
Reviewed-by: Anup Patel 
Reviewed-by: Scott Branden 
---
 drivers/pci/quirks.c | 95 


 1 file changed, 95 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 47dfea0..0cdbd0a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2198,6 +2198,101 @@
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
quirk_paxc_bridge);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd750, 
quirk_paxc_bridge);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802, 
quirk_paxc_bridge);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804, 
quirk_paxc_bridge);

+
+/*
+ * The PCI capabilities list for certain revisions of Broadcom PAXC 
root
+ * complexes is incorrectly terminated due to corrupted configuration 
space

+ * registers in the range of 0x50 - 0x5f
+ *
+ * As a result, the capability list becomes broken and prevent 
standard PCI

+ * stack from being able to traverse to the PCIe capability structure
+ */
+static void quirk_paxc_pcie_capability(struct pci_dev *pdev)
+{
+   int pos, i = 0;
+   u8 next_cap;
+   u16 reg16, *cap;
+   struct pci_cap_saved_state *state;
+
+   /* bail out if PCIe capability can be found */
+   if (pdev->pcie_cap || pci_find_capability(pdev, PCI_CAP_ID_EXP))
+   return;
+
+   /* locate the power management capability */
+   pos = pci_find_capability(pdev, PCI_CAP_ID_PM);
+   if (!pos)
+   return;
+
+   /* bail out if the next capability pointer is not 0x50/0x58 */
+   pci_read_config_byte(pdev, pos + 1, _cap);
+   if (next_cap != 0x50 && next_cap != 0x58)
+   return;
+
+   /* bail out if we do not terminate at 0x50/0x58 */
+   pos = next_cap;
+   pci_read_config_byte(pdev, pos + 1, _cap);
+   if (next_cap != 0x00)
+   return;
+
+   /*
+* On these buggy HW, PCIe capability structure is expected to be at
+* 0xac and should terminate the list
+*
+* Borrow the similar logic from theIntel DH895xCC VFs fixup to save

:%s /theIntel /Intel

+* the PCIe capability list
+*/
+   pos = 0xac;
+   pci_read_config_word(pdev, pos, );
+   if (reg16 == (0x | PCI_CAP_ID_EXP)) {
+   u32 status;
+
+#ifndef PCI_EXP_SAVE_REGS
+#define PCI_EXP_SAVE_REGS 7
+#endif
+   int size = PCI_EXP_SAVE_REGS * sizeof(u16);
+
+   pdev->pcie_cap = pos;
+   pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, );
+   pdev->pcie_flags_reg = reg16;
+   pci_read_config_word(pdev, pos + PCI_EXP_DEVCAP, );
+   pdev->pcie_mpss = reg16 & PCI_EXP_DEVCAP_PAYLOAD;
+
+   pdev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
+   if (pci_read_config_dword(pdev, PCI_CFG_SPACE_SIZE, ) !=
+   PCIBIOS_SUCCESSFUL || (status == 0x))
+   pdev->cfg_size = PCI_CFG_SPACE_SIZE;
+
+   if (pci_find_saved_cap(pdev, PCI_CAP_ID_EXP))
+   return;
+
+   state = kzalloc(sizeof(*state) + size, GFP_KERNEL);
+   if (!state)
+   return;
+
+   state->cap.cap_nr = PCI_CAP_ID_EXP;
+   state->cap.cap_extended = 0;
+   state->cap.size = size;
+   cap = (u16 *)>cap.data[0];
+   pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, [i++]);
+   pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, [i++]);
+   pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, [i++]);
+   pcie_capability_read_word(pdev, PCI_EXP_RTCTL,  [i++]);
+   pcie_capability_read_word(pdev, PCI_EXP_DEVCTL2, [i++]);
+   pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, [i++]);
+   pcie_capability_read_word(pdev, PCI_EXP_SLTCTL2, [i++]);
+   hlist_add_head(>next, >saved_cap_space);
+   }
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 
PCI_DEVICE_ID_NX2_57810,

+   quirk_paxc_pcie_capability);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16cd,
+   quirk_paxc_pcie_capability);

Re: [PATCH INTERNAL 2/3] PCI: iproc: Fix up corrupted PAXC root complex config registers

2018-05-18 Thread poza

On 2018-05-17 22:51, Ray Jui wrote:

On certain versions of Broadcom PAXC based root complexes, certain
regions of the configuration space are corrupted. As a result, it
prevents the Linux PCIe stack from traversing the linked list of the
capability registers completely and therefore the root complex is
not advertised as "PCIe capable". This prevents the correct PCIe RID
from being parsed in the kernel PCIe stack. A correct RID is required
for mapping to a stream ID from the SMMU or the device ID from the
GICv3 ITS

This patch fixes up the issue by manually populating the related
PCIe capabilities based on readings from the PCIe capability structure

Signed-off-by: Ray Jui 
Reviewed-by: Anup Patel 
Reviewed-by: Scott Branden 
---
 drivers/pci/quirks.c | 95 


 1 file changed, 95 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 47dfea0..0cdbd0a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2198,6 +2198,101 @@
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
quirk_paxc_bridge);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd750, 
quirk_paxc_bridge);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802, 
quirk_paxc_bridge);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804, 
quirk_paxc_bridge);

+
+/*
+ * The PCI capabilities list for certain revisions of Broadcom PAXC 
root
+ * complexes is incorrectly terminated due to corrupted configuration 
space

+ * registers in the range of 0x50 - 0x5f
+ *
+ * As a result, the capability list becomes broken and prevent 
standard PCI

+ * stack from being able to traverse to the PCIe capability structure
+ */
+static void quirk_paxc_pcie_capability(struct pci_dev *pdev)
+{
+   int pos, i = 0;
+   u8 next_cap;
+   u16 reg16, *cap;
+   struct pci_cap_saved_state *state;
+
+   /* bail out if PCIe capability can be found */
+   if (pdev->pcie_cap || pci_find_capability(pdev, PCI_CAP_ID_EXP))
+   return;
+
+   /* locate the power management capability */
+   pos = pci_find_capability(pdev, PCI_CAP_ID_PM);
+   if (!pos)
+   return;
+
+   /* bail out if the next capability pointer is not 0x50/0x58 */
+   pci_read_config_byte(pdev, pos + 1, _cap);
+   if (next_cap != 0x50 && next_cap != 0x58)
+   return;
+
+   /* bail out if we do not terminate at 0x50/0x58 */
+   pos = next_cap;
+   pci_read_config_byte(pdev, pos + 1, _cap);
+   if (next_cap != 0x00)
+   return;
+
+   /*
+* On these buggy HW, PCIe capability structure is expected to be at
+* 0xac and should terminate the list
+*
+* Borrow the similar logic from theIntel DH895xCC VFs fixup to save

:%s /theIntel /Intel

+* the PCIe capability list
+*/
+   pos = 0xac;
+   pci_read_config_word(pdev, pos, );
+   if (reg16 == (0x | PCI_CAP_ID_EXP)) {
+   u32 status;
+
+#ifndef PCI_EXP_SAVE_REGS
+#define PCI_EXP_SAVE_REGS 7
+#endif
+   int size = PCI_EXP_SAVE_REGS * sizeof(u16);
+
+   pdev->pcie_cap = pos;
+   pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, );
+   pdev->pcie_flags_reg = reg16;
+   pci_read_config_word(pdev, pos + PCI_EXP_DEVCAP, );
+   pdev->pcie_mpss = reg16 & PCI_EXP_DEVCAP_PAYLOAD;
+
+   pdev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
+   if (pci_read_config_dword(pdev, PCI_CFG_SPACE_SIZE, ) !=
+   PCIBIOS_SUCCESSFUL || (status == 0x))
+   pdev->cfg_size = PCI_CFG_SPACE_SIZE;
+
+   if (pci_find_saved_cap(pdev, PCI_CAP_ID_EXP))
+   return;
+
+   state = kzalloc(sizeof(*state) + size, GFP_KERNEL);
+   if (!state)
+   return;
+
+   state->cap.cap_nr = PCI_CAP_ID_EXP;
+   state->cap.cap_extended = 0;
+   state->cap.size = size;
+   cap = (u16 *)>cap.data[0];
+   pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, [i++]);
+   pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, [i++]);
+   pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, [i++]);
+   pcie_capability_read_word(pdev, PCI_EXP_RTCTL,  [i++]);
+   pcie_capability_read_word(pdev, PCI_EXP_DEVCTL2, [i++]);
+   pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, [i++]);
+   pcie_capability_read_word(pdev, PCI_EXP_SLTCTL2, [i++]);
+   hlist_add_head(>next, >saved_cap_space);
+   }
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 
PCI_DEVICE_ID_NX2_57810,

+   quirk_paxc_pcie_capability);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16cd,
+   quirk_paxc_pcie_capability);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
+   

Re: [PATCH v16 8/9] PCI/DPC: Unify and plumb error handling into DPC

2018-05-16 Thread poza

On 2018-05-16 18:34, Bjorn Helgaas wrote:

On Wed, May 16, 2018 at 05:45:58PM +0530, p...@codeaurora.org wrote:

On 2018-05-16 16:22, Bjorn Helgaas wrote:
> On Wed, May 16, 2018 at 01:46:25PM +0530, p...@codeaurora.org wrote:
> > On 2018-05-16 05:26, Bjorn Helgaas wrote:
> > > On Fri, May 11, 2018 at 05:22:08PM +0530, p...@codeaurora.org wrote:
> > > > On 2018-05-11 16:13, Oza Pawandeep wrote:
> > > > > DPC driver implements link_reset callback, and calls
> > > > > pci_do_fatal_recovery().
> > > > >
> > > > > Which follows standard path of ERR_FATAL recovery.
> > > > >
> > > > > Signed-off-by: Oza Pawandeep 
> > > > >
> > > > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > > > index 5e8857a..6af7595 100644
> > > > > --- a/drivers/pci/pci.h
> > > > > +++ b/drivers/pci/pci.h
> > > > > @@ -354,7 +354,7 @@ static inline resource_size_t
> > > > > pci_resource_alignment(struct pci_dev *dev,
> > > > >  void pci_enable_acs(struct pci_dev *dev);
> > > > >
> > > > >  /* PCI error reporting and recovery */
> > > > > -void pcie_do_fatal_recovery(struct pci_dev *dev);
> > > > > +void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
> > > > >  void pcie_do_nonfatal_recovery(struct pci_dev *dev);
> > > > >
> > > > >  bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
> > > > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > b/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > index fdfc474..36e622d 100644
> > > > > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > @@ -254,7 +254,7 @@ static void handle_error_source(struct pcie_device
> > > > > *aerdev,
> > > > >} else if (info->severity == AER_NONFATAL)
> > > > >pcie_do_nonfatal_recovery(dev);
> > > > >else if (info->severity == AER_FATAL)
> > > > > -  pcie_do_fatal_recovery(dev);
> > > > > +  pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER);
> > > > >  }
> > > > >
> > > > >  #ifdef CONFIG_ACPI_APEI_PCIEAER
> > > > > @@ -321,7 +321,7 @@ static void aer_recover_work_func(struct 
work_struct
> > > > > *work)
> > > > >if (entry.severity == AER_NONFATAL)
> > > > >pcie_do_nonfatal_recovery(pdev);
> > > > >else if (entry.severity == AER_FATAL)
> > > > > -  pcie_do_fatal_recovery(pdev);
> > > > > +  pcie_do_fatal_recovery(pdev, 
PCIE_PORT_SERVICE_AER);
> > > > >pci_dev_put(pdev);
> > > > >}
> > > > >  }
> > > > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > > > index 80ec384..5680c13 100644
> > > > > --- a/drivers/pci/pcie/dpc.c
> > > > > +++ b/drivers/pci/pcie/dpc.c
> > > > > @@ -73,29 +73,31 @@ static void dpc_wait_link_inactive(struct dpc_dev
> > > > > *dpc)
> > > > >pcie_wait_for_link(pdev, false);
> > > > >  }
> > > > >
> > > > > -static void dpc_work(struct work_struct *work)
> > > > > +static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
> > > > >  {
> > > > > -  struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
> > > > > -  struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
> > > > > -  struct pci_bus *parent = pdev->subordinate;
> > > > > -  u16 cap = dpc->cap_pos, ctl;
> > > > > -
> > > > > -  pci_lock_rescan_remove();
> > > > > -  list_for_each_entry_safe_reverse(dev, temp, >devices,
> > > > > -   bus_list) {
> > > > > -  pci_dev_get(dev);
> > > > > -  pci_dev_set_disconnected(dev, NULL);
> > > > > -  if (pci_has_subordinate(dev))
> > > > > -  pci_walk_bus(dev->subordinate,
> > > > > -   pci_dev_set_disconnected, NULL);
> > > > > -  pci_stop_and_remove_bus_device(dev);
> > > > > -  pci_dev_put(dev);
> > > > > -  }
> > > > > -  pci_unlock_rescan_remove();
> > > > > -
> > > > > +  struct dpc_dev *dpc;
> > > > > +  struct pcie_device *pciedev;
> > > > > +  struct device *devdpc;
> > > > > +  u16 cap, ctl;
> > > > > +
> > > > > +  /*
> > > > > +   * DPC disables the Link automatically in hardware, so it has
> > > > > +   * already been reset by the time we get here.
> > > > > +   */
> > > > > +
> > > > > +  devdpc = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_DPC);
> > > > > +  pciedev = to_pcie_device(devdpc);
> > > > > +  dpc = get_service_data(pciedev);
> > > > > +  cap = dpc->cap_pos;
> > > > > +
> > > > > +  /*
> > > > > +   * Waiting until the link is inactive, then clearing DPC
> > > > > +   * trigger status to allow the port to leave DPC.
> > > > > +   */
> > > > >dpc_wait_link_inactive(dpc);
> > > > > +
> > > > >if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
> > > > > -  return;
> > > > > +  return PCI_ERS_RESULT_DISCONNECT;
> > > > >

Re: [PATCH v16 8/9] PCI/DPC: Unify and plumb error handling into DPC

2018-05-16 Thread poza

On 2018-05-16 18:34, Bjorn Helgaas wrote:

On Wed, May 16, 2018 at 05:45:58PM +0530, p...@codeaurora.org wrote:

On 2018-05-16 16:22, Bjorn Helgaas wrote:
> On Wed, May 16, 2018 at 01:46:25PM +0530, p...@codeaurora.org wrote:
> > On 2018-05-16 05:26, Bjorn Helgaas wrote:
> > > On Fri, May 11, 2018 at 05:22:08PM +0530, p...@codeaurora.org wrote:
> > > > On 2018-05-11 16:13, Oza Pawandeep wrote:
> > > > > DPC driver implements link_reset callback, and calls
> > > > > pci_do_fatal_recovery().
> > > > >
> > > > > Which follows standard path of ERR_FATAL recovery.
> > > > >
> > > > > Signed-off-by: Oza Pawandeep 
> > > > >
> > > > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > > > index 5e8857a..6af7595 100644
> > > > > --- a/drivers/pci/pci.h
> > > > > +++ b/drivers/pci/pci.h
> > > > > @@ -354,7 +354,7 @@ static inline resource_size_t
> > > > > pci_resource_alignment(struct pci_dev *dev,
> > > > >  void pci_enable_acs(struct pci_dev *dev);
> > > > >
> > > > >  /* PCI error reporting and recovery */
> > > > > -void pcie_do_fatal_recovery(struct pci_dev *dev);
> > > > > +void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
> > > > >  void pcie_do_nonfatal_recovery(struct pci_dev *dev);
> > > > >
> > > > >  bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
> > > > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > b/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > index fdfc474..36e622d 100644
> > > > > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > @@ -254,7 +254,7 @@ static void handle_error_source(struct pcie_device
> > > > > *aerdev,
> > > > >} else if (info->severity == AER_NONFATAL)
> > > > >pcie_do_nonfatal_recovery(dev);
> > > > >else if (info->severity == AER_FATAL)
> > > > > -  pcie_do_fatal_recovery(dev);
> > > > > +  pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER);
> > > > >  }
> > > > >
> > > > >  #ifdef CONFIG_ACPI_APEI_PCIEAER
> > > > > @@ -321,7 +321,7 @@ static void aer_recover_work_func(struct 
work_struct
> > > > > *work)
> > > > >if (entry.severity == AER_NONFATAL)
> > > > >pcie_do_nonfatal_recovery(pdev);
> > > > >else if (entry.severity == AER_FATAL)
> > > > > -  pcie_do_fatal_recovery(pdev);
> > > > > +  pcie_do_fatal_recovery(pdev, 
PCIE_PORT_SERVICE_AER);
> > > > >pci_dev_put(pdev);
> > > > >}
> > > > >  }
> > > > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > > > index 80ec384..5680c13 100644
> > > > > --- a/drivers/pci/pcie/dpc.c
> > > > > +++ b/drivers/pci/pcie/dpc.c
> > > > > @@ -73,29 +73,31 @@ static void dpc_wait_link_inactive(struct dpc_dev
> > > > > *dpc)
> > > > >pcie_wait_for_link(pdev, false);
> > > > >  }
> > > > >
> > > > > -static void dpc_work(struct work_struct *work)
> > > > > +static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
> > > > >  {
> > > > > -  struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
> > > > > -  struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
> > > > > -  struct pci_bus *parent = pdev->subordinate;
> > > > > -  u16 cap = dpc->cap_pos, ctl;
> > > > > -
> > > > > -  pci_lock_rescan_remove();
> > > > > -  list_for_each_entry_safe_reverse(dev, temp, >devices,
> > > > > -   bus_list) {
> > > > > -  pci_dev_get(dev);
> > > > > -  pci_dev_set_disconnected(dev, NULL);
> > > > > -  if (pci_has_subordinate(dev))
> > > > > -  pci_walk_bus(dev->subordinate,
> > > > > -   pci_dev_set_disconnected, NULL);
> > > > > -  pci_stop_and_remove_bus_device(dev);
> > > > > -  pci_dev_put(dev);
> > > > > -  }
> > > > > -  pci_unlock_rescan_remove();
> > > > > -
> > > > > +  struct dpc_dev *dpc;
> > > > > +  struct pcie_device *pciedev;
> > > > > +  struct device *devdpc;
> > > > > +  u16 cap, ctl;
> > > > > +
> > > > > +  /*
> > > > > +   * DPC disables the Link automatically in hardware, so it has
> > > > > +   * already been reset by the time we get here.
> > > > > +   */
> > > > > +
> > > > > +  devdpc = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_DPC);
> > > > > +  pciedev = to_pcie_device(devdpc);
> > > > > +  dpc = get_service_data(pciedev);
> > > > > +  cap = dpc->cap_pos;
> > > > > +
> > > > > +  /*
> > > > > +   * Waiting until the link is inactive, then clearing DPC
> > > > > +   * trigger status to allow the port to leave DPC.
> > > > > +   */
> > > > >dpc_wait_link_inactive(dpc);
> > > > > +
> > > > >if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
> > > > > -  return;
> > > > > +  return PCI_ERS_RESULT_DISCONNECT;
> > > > >if (dpc->rp_extensions 

Re: [PATCH v16 8/9] PCI/DPC: Unify and plumb error handling into DPC

2018-05-16 Thread poza

On 2018-05-16 18:34, Bjorn Helgaas wrote:

On Wed, May 16, 2018 at 05:45:58PM +0530, p...@codeaurora.org wrote:

On 2018-05-16 16:22, Bjorn Helgaas wrote:
> On Wed, May 16, 2018 at 01:46:25PM +0530, p...@codeaurora.org wrote:
> > On 2018-05-16 05:26, Bjorn Helgaas wrote:
> > > On Fri, May 11, 2018 at 05:22:08PM +0530, p...@codeaurora.org wrote:
> > > > On 2018-05-11 16:13, Oza Pawandeep wrote:
> > > > > DPC driver implements link_reset callback, and calls
> > > > > pci_do_fatal_recovery().
> > > > >
> > > > > Which follows standard path of ERR_FATAL recovery.
> > > > >
> > > > > Signed-off-by: Oza Pawandeep 
> > > > >
> > > > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > > > index 5e8857a..6af7595 100644
> > > > > --- a/drivers/pci/pci.h
> > > > > +++ b/drivers/pci/pci.h
> > > > > @@ -354,7 +354,7 @@ static inline resource_size_t
> > > > > pci_resource_alignment(struct pci_dev *dev,
> > > > >  void pci_enable_acs(struct pci_dev *dev);
> > > > >
> > > > >  /* PCI error reporting and recovery */
> > > > > -void pcie_do_fatal_recovery(struct pci_dev *dev);
> > > > > +void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
> > > > >  void pcie_do_nonfatal_recovery(struct pci_dev *dev);
> > > > >
> > > > >  bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
> > > > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > b/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > index fdfc474..36e622d 100644
> > > > > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > @@ -254,7 +254,7 @@ static void handle_error_source(struct pcie_device
> > > > > *aerdev,
> > > > >} else if (info->severity == AER_NONFATAL)
> > > > >pcie_do_nonfatal_recovery(dev);
> > > > >else if (info->severity == AER_FATAL)
> > > > > -  pcie_do_fatal_recovery(dev);
> > > > > +  pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER);
> > > > >  }
> > > > >
> > > > >  #ifdef CONFIG_ACPI_APEI_PCIEAER
> > > > > @@ -321,7 +321,7 @@ static void aer_recover_work_func(struct 
work_struct
> > > > > *work)
> > > > >if (entry.severity == AER_NONFATAL)
> > > > >pcie_do_nonfatal_recovery(pdev);
> > > > >else if (entry.severity == AER_FATAL)
> > > > > -  pcie_do_fatal_recovery(pdev);
> > > > > +  pcie_do_fatal_recovery(pdev, 
PCIE_PORT_SERVICE_AER);
> > > > >pci_dev_put(pdev);
> > > > >}
> > > > >  }
> > > > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > > > index 80ec384..5680c13 100644
> > > > > --- a/drivers/pci/pcie/dpc.c
> > > > > +++ b/drivers/pci/pcie/dpc.c
> > > > > @@ -73,29 +73,31 @@ static void dpc_wait_link_inactive(struct dpc_dev
> > > > > *dpc)
> > > > >pcie_wait_for_link(pdev, false);
> > > > >  }
> > > > >
> > > > > -static void dpc_work(struct work_struct *work)
> > > > > +static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
> > > > >  {
> > > > > -  struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
> > > > > -  struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
> > > > > -  struct pci_bus *parent = pdev->subordinate;
> > > > > -  u16 cap = dpc->cap_pos, ctl;
> > > > > -
> > > > > -  pci_lock_rescan_remove();
> > > > > -  list_for_each_entry_safe_reverse(dev, temp, >devices,
> > > > > -   bus_list) {
> > > > > -  pci_dev_get(dev);
> > > > > -  pci_dev_set_disconnected(dev, NULL);
> > > > > -  if (pci_has_subordinate(dev))
> > > > > -  pci_walk_bus(dev->subordinate,
> > > > > -   pci_dev_set_disconnected, NULL);
> > > > > -  pci_stop_and_remove_bus_device(dev);
> > > > > -  pci_dev_put(dev);
> > > > > -  }
> > > > > -  pci_unlock_rescan_remove();
> > > > > -
> > > > > +  struct dpc_dev *dpc;
> > > > > +  struct pcie_device *pciedev;
> > > > > +  struct device *devdpc;
> > > > > +  u16 cap, ctl;
> > > > > +
> > > > > +  /*
> > > > > +   * DPC disables the Link automatically in hardware, so it has
> > > > > +   * already been reset by the time we get here.
> > > > > +   */
> > > > > +
> > > > > +  devdpc = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_DPC);
> > > > > +  pciedev = to_pcie_device(devdpc);
> > > > > +  dpc = get_service_data(pciedev);
> > > > > +  cap = dpc->cap_pos;
> > > > > +
> > > > > +  /*
> > > > > +   * Waiting until the link is inactive, then clearing DPC
> > > > > +   * trigger status to allow the port to leave DPC.
> > > > > +   */
> > > > >dpc_wait_link_inactive(dpc);
> > > > > +
> > > > >if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
> > > > > -  return;
> > > > > +  return PCI_ERS_RESULT_DISCONNECT;
> > > > >

Re: [PATCH v16 8/9] PCI/DPC: Unify and plumb error handling into DPC

2018-05-16 Thread poza

On 2018-05-16 18:34, Bjorn Helgaas wrote:

On Wed, May 16, 2018 at 05:45:58PM +0530, p...@codeaurora.org wrote:

On 2018-05-16 16:22, Bjorn Helgaas wrote:
> On Wed, May 16, 2018 at 01:46:25PM +0530, p...@codeaurora.org wrote:
> > On 2018-05-16 05:26, Bjorn Helgaas wrote:
> > > On Fri, May 11, 2018 at 05:22:08PM +0530, p...@codeaurora.org wrote:
> > > > On 2018-05-11 16:13, Oza Pawandeep wrote:
> > > > > DPC driver implements link_reset callback, and calls
> > > > > pci_do_fatal_recovery().
> > > > >
> > > > > Which follows standard path of ERR_FATAL recovery.
> > > > >
> > > > > Signed-off-by: Oza Pawandeep 
> > > > >
> > > > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > > > index 5e8857a..6af7595 100644
> > > > > --- a/drivers/pci/pci.h
> > > > > +++ b/drivers/pci/pci.h
> > > > > @@ -354,7 +354,7 @@ static inline resource_size_t
> > > > > pci_resource_alignment(struct pci_dev *dev,
> > > > >  void pci_enable_acs(struct pci_dev *dev);
> > > > >
> > > > >  /* PCI error reporting and recovery */
> > > > > -void pcie_do_fatal_recovery(struct pci_dev *dev);
> > > > > +void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
> > > > >  void pcie_do_nonfatal_recovery(struct pci_dev *dev);
> > > > >
> > > > >  bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
> > > > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > b/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > index fdfc474..36e622d 100644
> > > > > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > > > > @@ -254,7 +254,7 @@ static void handle_error_source(struct pcie_device
> > > > > *aerdev,
> > > > >} else if (info->severity == AER_NONFATAL)
> > > > >pcie_do_nonfatal_recovery(dev);
> > > > >else if (info->severity == AER_FATAL)
> > > > > -  pcie_do_fatal_recovery(dev);
> > > > > +  pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER);
> > > > >  }
> > > > >
> > > > >  #ifdef CONFIG_ACPI_APEI_PCIEAER
> > > > > @@ -321,7 +321,7 @@ static void aer_recover_work_func(struct 
work_struct
> > > > > *work)
> > > > >if (entry.severity == AER_NONFATAL)
> > > > >pcie_do_nonfatal_recovery(pdev);
> > > > >else if (entry.severity == AER_FATAL)
> > > > > -  pcie_do_fatal_recovery(pdev);
> > > > > +  pcie_do_fatal_recovery(pdev, 
PCIE_PORT_SERVICE_AER);
> > > > >pci_dev_put(pdev);
> > > > >}
> > > > >  }
> > > > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > > > index 80ec384..5680c13 100644
> > > > > --- a/drivers/pci/pcie/dpc.c
> > > > > +++ b/drivers/pci/pcie/dpc.c
> > > > > @@ -73,29 +73,31 @@ static void dpc_wait_link_inactive(struct dpc_dev
> > > > > *dpc)
> > > > >pcie_wait_for_link(pdev, false);
> > > > >  }
> > > > >
> > > > > -static void dpc_work(struct work_struct *work)
> > > > > +static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
> > > > >  {
> > > > > -  struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
> > > > > -  struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
> > > > > -  struct pci_bus *parent = pdev->subordinate;
> > > > > -  u16 cap = dpc->cap_pos, ctl;
> > > > > -
> > > > > -  pci_lock_rescan_remove();
> > > > > -  list_for_each_entry_safe_reverse(dev, temp, >devices,
> > > > > -   bus_list) {
> > > > > -  pci_dev_get(dev);
> > > > > -  pci_dev_set_disconnected(dev, NULL);
> > > > > -  if (pci_has_subordinate(dev))
> > > > > -  pci_walk_bus(dev->subordinate,
> > > > > -   pci_dev_set_disconnected, NULL);
> > > > > -  pci_stop_and_remove_bus_device(dev);
> > > > > -  pci_dev_put(dev);
> > > > > -  }
> > > > > -  pci_unlock_rescan_remove();
> > > > > -
> > > > > +  struct dpc_dev *dpc;
> > > > > +  struct pcie_device *pciedev;
> > > > > +  struct device *devdpc;
> > > > > +  u16 cap, ctl;
> > > > > +
> > > > > +  /*
> > > > > +   * DPC disables the Link automatically in hardware, so it has
> > > > > +   * already been reset by the time we get here.
> > > > > +   */
> > > > > +
> > > > > +  devdpc = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_DPC);
> > > > > +  pciedev = to_pcie_device(devdpc);
> > > > > +  dpc = get_service_data(pciedev);
> > > > > +  cap = dpc->cap_pos;
> > > > > +
> > > > > +  /*
> > > > > +   * Waiting until the link is inactive, then clearing DPC
> > > > > +   * trigger status to allow the port to leave DPC.
> > > > > +   */
> > > > >dpc_wait_link_inactive(dpc);
> > > > > +
> > > > >if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
> > > > > -  return;
> > > > > +  return PCI_ERS_RESULT_DISCONNECT;
> > > > >if (dpc->rp_extensions 

Re: [PATCH v16 8/9] PCI/DPC: Unify and plumb error handling into DPC

2018-05-16 Thread poza

On 2018-05-16 16:22, Bjorn Helgaas wrote:

On Wed, May 16, 2018 at 01:46:25PM +0530, p...@codeaurora.org wrote:

On 2018-05-16 05:26, Bjorn Helgaas wrote:
> On Fri, May 11, 2018 at 05:22:08PM +0530, p...@codeaurora.org wrote:
> > On 2018-05-11 16:13, Oza Pawandeep wrote:
> > > DPC driver implements link_reset callback, and calls
> > > pci_do_fatal_recovery().
> > >
> > > Which follows standard path of ERR_FATAL recovery.
> > >
> > > Signed-off-by: Oza Pawandeep 
> > >
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index 5e8857a..6af7595 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -354,7 +354,7 @@ static inline resource_size_t
> > > pci_resource_alignment(struct pci_dev *dev,
> > >  void pci_enable_acs(struct pci_dev *dev);
> > >
> > >  /* PCI error reporting and recovery */
> > > -void pcie_do_fatal_recovery(struct pci_dev *dev);
> > > +void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
> > >  void pcie_do_nonfatal_recovery(struct pci_dev *dev);
> > >
> > >  bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
> > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> > > b/drivers/pci/pcie/aer/aerdrv_core.c
> > > index fdfc474..36e622d 100644
> > > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > > @@ -254,7 +254,7 @@ static void handle_error_source(struct pcie_device
> > > *aerdev,
> > >  } else if (info->severity == AER_NONFATAL)
> > >  pcie_do_nonfatal_recovery(dev);
> > >  else if (info->severity == AER_FATAL)
> > > -pcie_do_fatal_recovery(dev);
> > > +pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER);
> > >  }
> > >
> > >  #ifdef CONFIG_ACPI_APEI_PCIEAER
> > > @@ -321,7 +321,7 @@ static void aer_recover_work_func(struct work_struct
> > > *work)
> > >  if (entry.severity == AER_NONFATAL)
> > >  pcie_do_nonfatal_recovery(pdev);
> > >  else if (entry.severity == AER_FATAL)
> > > -pcie_do_fatal_recovery(pdev);
> > > +pcie_do_fatal_recovery(pdev, 
PCIE_PORT_SERVICE_AER);
> > >  pci_dev_put(pdev);
> > >  }
> > >  }
> > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > index 80ec384..5680c13 100644
> > > --- a/drivers/pci/pcie/dpc.c
> > > +++ b/drivers/pci/pcie/dpc.c
> > > @@ -73,29 +73,31 @@ static void dpc_wait_link_inactive(struct dpc_dev
> > > *dpc)
> > >  pcie_wait_for_link(pdev, false);
> > >  }
> > >
> > > -static void dpc_work(struct work_struct *work)
> > > +static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
> > >  {
> > > -struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
> > > -struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
> > > -struct pci_bus *parent = pdev->subordinate;
> > > -u16 cap = dpc->cap_pos, ctl;
> > > -
> > > -pci_lock_rescan_remove();
> > > -list_for_each_entry_safe_reverse(dev, temp, >devices,
> > > - bus_list) {
> > > -pci_dev_get(dev);
> > > -pci_dev_set_disconnected(dev, NULL);
> > > -if (pci_has_subordinate(dev))
> > > -pci_walk_bus(dev->subordinate,
> > > - pci_dev_set_disconnected, NULL);
> > > -pci_stop_and_remove_bus_device(dev);
> > > -pci_dev_put(dev);
> > > -}
> > > -pci_unlock_rescan_remove();
> > > -
> > > +struct dpc_dev *dpc;
> > > +struct pcie_device *pciedev;
> > > +struct device *devdpc;
> > > +u16 cap, ctl;
> > > +
> > > +/*
> > > + * DPC disables the Link automatically in hardware, so it has
> > > + * already been reset by the time we get here.
> > > + */
> > > +
> > > +devdpc = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_DPC);
> > > +pciedev = to_pcie_device(devdpc);
> > > +dpc = get_service_data(pciedev);
> > > +cap = dpc->cap_pos;
> > > +
> > > +/*
> > > + * Waiting until the link is inactive, then clearing DPC
> > > + * trigger status to allow the port to leave DPC.
> > > + */
> > >  dpc_wait_link_inactive(dpc);
> > > +
> > >  if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
> > > -return;
> > > +return PCI_ERS_RESULT_DISCONNECT;
> > >  if (dpc->rp_extensions && dpc->rp_pio_status) {
> > >  pci_write_config_dword(pdev, cap + 
PCI_EXP_DPC_RP_PIO_STATUS,
> > > dpc->rp_pio_status);
> > > @@ -108,6 +110,17 @@ static void dpc_work(struct work_struct *work)
> > >  pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, );
> > >  pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
> > >  

Re: [PATCH v16 8/9] PCI/DPC: Unify and plumb error handling into DPC

2018-05-16 Thread poza

On 2018-05-16 16:22, Bjorn Helgaas wrote:

On Wed, May 16, 2018 at 01:46:25PM +0530, p...@codeaurora.org wrote:

On 2018-05-16 05:26, Bjorn Helgaas wrote:
> On Fri, May 11, 2018 at 05:22:08PM +0530, p...@codeaurora.org wrote:
> > On 2018-05-11 16:13, Oza Pawandeep wrote:
> > > DPC driver implements link_reset callback, and calls
> > > pci_do_fatal_recovery().
> > >
> > > Which follows standard path of ERR_FATAL recovery.
> > >
> > > Signed-off-by: Oza Pawandeep 
> > >
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index 5e8857a..6af7595 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -354,7 +354,7 @@ static inline resource_size_t
> > > pci_resource_alignment(struct pci_dev *dev,
> > >  void pci_enable_acs(struct pci_dev *dev);
> > >
> > >  /* PCI error reporting and recovery */
> > > -void pcie_do_fatal_recovery(struct pci_dev *dev);
> > > +void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
> > >  void pcie_do_nonfatal_recovery(struct pci_dev *dev);
> > >
> > >  bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
> > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> > > b/drivers/pci/pcie/aer/aerdrv_core.c
> > > index fdfc474..36e622d 100644
> > > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > > @@ -254,7 +254,7 @@ static void handle_error_source(struct pcie_device
> > > *aerdev,
> > >  } else if (info->severity == AER_NONFATAL)
> > >  pcie_do_nonfatal_recovery(dev);
> > >  else if (info->severity == AER_FATAL)
> > > -pcie_do_fatal_recovery(dev);
> > > +pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER);
> > >  }
> > >
> > >  #ifdef CONFIG_ACPI_APEI_PCIEAER
> > > @@ -321,7 +321,7 @@ static void aer_recover_work_func(struct work_struct
> > > *work)
> > >  if (entry.severity == AER_NONFATAL)
> > >  pcie_do_nonfatal_recovery(pdev);
> > >  else if (entry.severity == AER_FATAL)
> > > -pcie_do_fatal_recovery(pdev);
> > > +pcie_do_fatal_recovery(pdev, 
PCIE_PORT_SERVICE_AER);
> > >  pci_dev_put(pdev);
> > >  }
> > >  }
> > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > index 80ec384..5680c13 100644
> > > --- a/drivers/pci/pcie/dpc.c
> > > +++ b/drivers/pci/pcie/dpc.c
> > > @@ -73,29 +73,31 @@ static void dpc_wait_link_inactive(struct dpc_dev
> > > *dpc)
> > >  pcie_wait_for_link(pdev, false);
> > >  }
> > >
> > > -static void dpc_work(struct work_struct *work)
> > > +static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
> > >  {
> > > -struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
> > > -struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
> > > -struct pci_bus *parent = pdev->subordinate;
> > > -u16 cap = dpc->cap_pos, ctl;
> > > -
> > > -pci_lock_rescan_remove();
> > > -list_for_each_entry_safe_reverse(dev, temp, >devices,
> > > - bus_list) {
> > > -pci_dev_get(dev);
> > > -pci_dev_set_disconnected(dev, NULL);
> > > -if (pci_has_subordinate(dev))
> > > -pci_walk_bus(dev->subordinate,
> > > - pci_dev_set_disconnected, NULL);
> > > -pci_stop_and_remove_bus_device(dev);
> > > -pci_dev_put(dev);
> > > -}
> > > -pci_unlock_rescan_remove();
> > > -
> > > +struct dpc_dev *dpc;
> > > +struct pcie_device *pciedev;
> > > +struct device *devdpc;
> > > +u16 cap, ctl;
> > > +
> > > +/*
> > > + * DPC disables the Link automatically in hardware, so it has
> > > + * already been reset by the time we get here.
> > > + */
> > > +
> > > +devdpc = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_DPC);
> > > +pciedev = to_pcie_device(devdpc);
> > > +dpc = get_service_data(pciedev);
> > > +cap = dpc->cap_pos;
> > > +
> > > +/*
> > > + * Waiting until the link is inactive, then clearing DPC
> > > + * trigger status to allow the port to leave DPC.
> > > + */
> > >  dpc_wait_link_inactive(dpc);
> > > +
> > >  if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
> > > -return;
> > > +return PCI_ERS_RESULT_DISCONNECT;
> > >  if (dpc->rp_extensions && dpc->rp_pio_status) {
> > >  pci_write_config_dword(pdev, cap + 
PCI_EXP_DPC_RP_PIO_STATUS,
> > > dpc->rp_pio_status);
> > > @@ -108,6 +110,17 @@ static void dpc_work(struct work_struct *work)
> > >  pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, );
> > >  pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
> > >ctl | 

Re: [PATCH v16 8/9] PCI/DPC: Unify and plumb error handling into DPC

2018-05-16 Thread poza

On 2018-05-16 16:22, Bjorn Helgaas wrote:

On Wed, May 16, 2018 at 01:46:25PM +0530, p...@codeaurora.org wrote:

On 2018-05-16 05:26, Bjorn Helgaas wrote:
> On Fri, May 11, 2018 at 05:22:08PM +0530, p...@codeaurora.org wrote:
> > On 2018-05-11 16:13, Oza Pawandeep wrote:
> > > DPC driver implements link_reset callback, and calls
> > > pci_do_fatal_recovery().
> > >
> > > Which follows standard path of ERR_FATAL recovery.
> > >
> > > Signed-off-by: Oza Pawandeep 
> > >
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index 5e8857a..6af7595 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -354,7 +354,7 @@ static inline resource_size_t
> > > pci_resource_alignment(struct pci_dev *dev,
> > >  void pci_enable_acs(struct pci_dev *dev);
> > >
> > >  /* PCI error reporting and recovery */
> > > -void pcie_do_fatal_recovery(struct pci_dev *dev);
> > > +void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
> > >  void pcie_do_nonfatal_recovery(struct pci_dev *dev);
> > >
> > >  bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
> > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> > > b/drivers/pci/pcie/aer/aerdrv_core.c
> > > index fdfc474..36e622d 100644
> > > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > > @@ -254,7 +254,7 @@ static void handle_error_source(struct pcie_device
> > > *aerdev,
> > >  } else if (info->severity == AER_NONFATAL)
> > >  pcie_do_nonfatal_recovery(dev);
> > >  else if (info->severity == AER_FATAL)
> > > -pcie_do_fatal_recovery(dev);
> > > +pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER);
> > >  }
> > >
> > >  #ifdef CONFIG_ACPI_APEI_PCIEAER
> > > @@ -321,7 +321,7 @@ static void aer_recover_work_func(struct work_struct
> > > *work)
> > >  if (entry.severity == AER_NONFATAL)
> > >  pcie_do_nonfatal_recovery(pdev);
> > >  else if (entry.severity == AER_FATAL)
> > > -pcie_do_fatal_recovery(pdev);
> > > +pcie_do_fatal_recovery(pdev, 
PCIE_PORT_SERVICE_AER);
> > >  pci_dev_put(pdev);
> > >  }
> > >  }
> > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > index 80ec384..5680c13 100644
> > > --- a/drivers/pci/pcie/dpc.c
> > > +++ b/drivers/pci/pcie/dpc.c
> > > @@ -73,29 +73,31 @@ static void dpc_wait_link_inactive(struct dpc_dev
> > > *dpc)
> > >  pcie_wait_for_link(pdev, false);
> > >  }
> > >
> > > -static void dpc_work(struct work_struct *work)
> > > +static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
> > >  {
> > > -struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
> > > -struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
> > > -struct pci_bus *parent = pdev->subordinate;
> > > -u16 cap = dpc->cap_pos, ctl;
> > > -
> > > -pci_lock_rescan_remove();
> > > -list_for_each_entry_safe_reverse(dev, temp, >devices,
> > > - bus_list) {
> > > -pci_dev_get(dev);
> > > -pci_dev_set_disconnected(dev, NULL);
> > > -if (pci_has_subordinate(dev))
> > > -pci_walk_bus(dev->subordinate,
> > > - pci_dev_set_disconnected, NULL);
> > > -pci_stop_and_remove_bus_device(dev);
> > > -pci_dev_put(dev);
> > > -}
> > > -pci_unlock_rescan_remove();
> > > -
> > > +struct dpc_dev *dpc;
> > > +struct pcie_device *pciedev;
> > > +struct device *devdpc;
> > > +u16 cap, ctl;
> > > +
> > > +/*
> > > + * DPC disables the Link automatically in hardware, so it has
> > > + * already been reset by the time we get here.
> > > + */
> > > +
> > > +devdpc = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_DPC);
> > > +pciedev = to_pcie_device(devdpc);
> > > +dpc = get_service_data(pciedev);
> > > +cap = dpc->cap_pos;
> > > +
> > > +/*
> > > + * Waiting until the link is inactive, then clearing DPC
> > > + * trigger status to allow the port to leave DPC.
> > > + */
> > >  dpc_wait_link_inactive(dpc);
> > > +
> > >  if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
> > > -return;
> > > +return PCI_ERS_RESULT_DISCONNECT;
> > >  if (dpc->rp_extensions && dpc->rp_pio_status) {
> > >  pci_write_config_dword(pdev, cap + 
PCI_EXP_DPC_RP_PIO_STATUS,
> > > dpc->rp_pio_status);
> > > @@ -108,6 +110,17 @@ static void dpc_work(struct work_struct *work)
> > >  pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, );
> > >  pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
> > >  

Re: [PATCH v16 8/9] PCI/DPC: Unify and plumb error handling into DPC

2018-05-16 Thread poza

On 2018-05-16 16:22, Bjorn Helgaas wrote:

On Wed, May 16, 2018 at 01:46:25PM +0530, p...@codeaurora.org wrote:

On 2018-05-16 05:26, Bjorn Helgaas wrote:
> On Fri, May 11, 2018 at 05:22:08PM +0530, p...@codeaurora.org wrote:
> > On 2018-05-11 16:13, Oza Pawandeep wrote:
> > > DPC driver implements link_reset callback, and calls
> > > pci_do_fatal_recovery().
> > >
> > > Which follows standard path of ERR_FATAL recovery.
> > >
> > > Signed-off-by: Oza Pawandeep 
> > >
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index 5e8857a..6af7595 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -354,7 +354,7 @@ static inline resource_size_t
> > > pci_resource_alignment(struct pci_dev *dev,
> > >  void pci_enable_acs(struct pci_dev *dev);
> > >
> > >  /* PCI error reporting and recovery */
> > > -void pcie_do_fatal_recovery(struct pci_dev *dev);
> > > +void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
> > >  void pcie_do_nonfatal_recovery(struct pci_dev *dev);
> > >
> > >  bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
> > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> > > b/drivers/pci/pcie/aer/aerdrv_core.c
> > > index fdfc474..36e622d 100644
> > > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > > @@ -254,7 +254,7 @@ static void handle_error_source(struct pcie_device
> > > *aerdev,
> > >  } else if (info->severity == AER_NONFATAL)
> > >  pcie_do_nonfatal_recovery(dev);
> > >  else if (info->severity == AER_FATAL)
> > > -pcie_do_fatal_recovery(dev);
> > > +pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER);
> > >  }
> > >
> > >  #ifdef CONFIG_ACPI_APEI_PCIEAER
> > > @@ -321,7 +321,7 @@ static void aer_recover_work_func(struct work_struct
> > > *work)
> > >  if (entry.severity == AER_NONFATAL)
> > >  pcie_do_nonfatal_recovery(pdev);
> > >  else if (entry.severity == AER_FATAL)
> > > -pcie_do_fatal_recovery(pdev);
> > > +pcie_do_fatal_recovery(pdev, 
PCIE_PORT_SERVICE_AER);
> > >  pci_dev_put(pdev);
> > >  }
> > >  }
> > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > index 80ec384..5680c13 100644
> > > --- a/drivers/pci/pcie/dpc.c
> > > +++ b/drivers/pci/pcie/dpc.c
> > > @@ -73,29 +73,31 @@ static void dpc_wait_link_inactive(struct dpc_dev
> > > *dpc)
> > >  pcie_wait_for_link(pdev, false);
> > >  }
> > >
> > > -static void dpc_work(struct work_struct *work)
> > > +static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
> > >  {
> > > -struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
> > > -struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
> > > -struct pci_bus *parent = pdev->subordinate;
> > > -u16 cap = dpc->cap_pos, ctl;
> > > -
> > > -pci_lock_rescan_remove();
> > > -list_for_each_entry_safe_reverse(dev, temp, >devices,
> > > - bus_list) {
> > > -pci_dev_get(dev);
> > > -pci_dev_set_disconnected(dev, NULL);
> > > -if (pci_has_subordinate(dev))
> > > -pci_walk_bus(dev->subordinate,
> > > - pci_dev_set_disconnected, NULL);
> > > -pci_stop_and_remove_bus_device(dev);
> > > -pci_dev_put(dev);
> > > -}
> > > -pci_unlock_rescan_remove();
> > > -
> > > +struct dpc_dev *dpc;
> > > +struct pcie_device *pciedev;
> > > +struct device *devdpc;
> > > +u16 cap, ctl;
> > > +
> > > +/*
> > > + * DPC disables the Link automatically in hardware, so it has
> > > + * already been reset by the time we get here.
> > > + */
> > > +
> > > +devdpc = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_DPC);
> > > +pciedev = to_pcie_device(devdpc);
> > > +dpc = get_service_data(pciedev);
> > > +cap = dpc->cap_pos;
> > > +
> > > +/*
> > > + * Waiting until the link is inactive, then clearing DPC
> > > + * trigger status to allow the port to leave DPC.
> > > + */
> > >  dpc_wait_link_inactive(dpc);
> > > +
> > >  if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
> > > -return;
> > > +return PCI_ERS_RESULT_DISCONNECT;
> > >  if (dpc->rp_extensions && dpc->rp_pio_status) {
> > >  pci_write_config_dword(pdev, cap + 
PCI_EXP_DPC_RP_PIO_STATUS,
> > > dpc->rp_pio_status);
> > > @@ -108,6 +110,17 @@ static void dpc_work(struct work_struct *work)
> > >  pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, );
> > >  pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
> > >ctl | 

Re: [PATCH v16 8/9] PCI/DPC: Unify and plumb error handling into DPC

2018-05-16 Thread poza

On 2018-05-16 05:26, Bjorn Helgaas wrote:

On Fri, May 11, 2018 at 05:22:08PM +0530, p...@codeaurora.org wrote:

On 2018-05-11 16:13, Oza Pawandeep wrote:
> DPC driver implements link_reset callback, and calls
> pci_do_fatal_recovery().
>
> Which follows standard path of ERR_FATAL recovery.
>
> Signed-off-by: Oza Pawandeep 
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 5e8857a..6af7595 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -354,7 +354,7 @@ static inline resource_size_t
> pci_resource_alignment(struct pci_dev *dev,
>  void pci_enable_acs(struct pci_dev *dev);
>
>  /* PCI error reporting and recovery */
> -void pcie_do_fatal_recovery(struct pci_dev *dev);
> +void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
>  void pcie_do_nonfatal_recovery(struct pci_dev *dev);
>
>  bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> b/drivers/pci/pcie/aer/aerdrv_core.c
> index fdfc474..36e622d 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -254,7 +254,7 @@ static void handle_error_source(struct pcie_device
> *aerdev,
>} else if (info->severity == AER_NONFATAL)
>pcie_do_nonfatal_recovery(dev);
>else if (info->severity == AER_FATAL)
> -  pcie_do_fatal_recovery(dev);
> +  pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER);
>  }
>
>  #ifdef CONFIG_ACPI_APEI_PCIEAER
> @@ -321,7 +321,7 @@ static void aer_recover_work_func(struct work_struct
> *work)
>if (entry.severity == AER_NONFATAL)
>pcie_do_nonfatal_recovery(pdev);
>else if (entry.severity == AER_FATAL)
> -  pcie_do_fatal_recovery(pdev);
> +  pcie_do_fatal_recovery(pdev, PCIE_PORT_SERVICE_AER);
>pci_dev_put(pdev);
>}
>  }
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 80ec384..5680c13 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -73,29 +73,31 @@ static void dpc_wait_link_inactive(struct dpc_dev
> *dpc)
>pcie_wait_for_link(pdev, false);
>  }
>
> -static void dpc_work(struct work_struct *work)
> +static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
>  {
> -  struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
> -  struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
> -  struct pci_bus *parent = pdev->subordinate;
> -  u16 cap = dpc->cap_pos, ctl;
> -
> -  pci_lock_rescan_remove();
> -  list_for_each_entry_safe_reverse(dev, temp, >devices,
> -   bus_list) {
> -  pci_dev_get(dev);
> -  pci_dev_set_disconnected(dev, NULL);
> -  if (pci_has_subordinate(dev))
> -  pci_walk_bus(dev->subordinate,
> -   pci_dev_set_disconnected, NULL);
> -  pci_stop_and_remove_bus_device(dev);
> -  pci_dev_put(dev);
> -  }
> -  pci_unlock_rescan_remove();
> -
> +  struct dpc_dev *dpc;
> +  struct pcie_device *pciedev;
> +  struct device *devdpc;
> +  u16 cap, ctl;
> +
> +  /*
> +   * DPC disables the Link automatically in hardware, so it has
> +   * already been reset by the time we get here.
> +   */
> +
> +  devdpc = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_DPC);
> +  pciedev = to_pcie_device(devdpc);
> +  dpc = get_service_data(pciedev);
> +  cap = dpc->cap_pos;
> +
> +  /*
> +   * Waiting until the link is inactive, then clearing DPC
> +   * trigger status to allow the port to leave DPC.
> +   */
>dpc_wait_link_inactive(dpc);
> +
>if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
> -  return;
> +  return PCI_ERS_RESULT_DISCONNECT;
>if (dpc->rp_extensions && dpc->rp_pio_status) {
>pci_write_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_STATUS,
>   dpc->rp_pio_status);
> @@ -108,6 +110,17 @@ static void dpc_work(struct work_struct *work)
>pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, );
>pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
>  ctl | PCI_EXP_DPC_CTL_INT_EN);
> +
> +  return PCI_ERS_RESULT_RECOVERED;
> +}
> +
> +static void dpc_work(struct work_struct *work)
> +{
> +  struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
> +  struct pci_dev *pdev = dpc->dev->port;
> +
> +  /* From DPC point of view error is always FATAL. */
> +  pcie_do_fatal_recovery(pdev, PCIE_PORT_SERVICE_DPC);
>  }
>
>  static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
> @@ -288,6 +301,7 @@ static struct pcie_port_service_driver dpcdriver = {
>.service= PCIE_PORT_SERVICE_DPC,
>.probe  = dpc_probe,
>.remove = dpc_remove,
> +  .reset_link = dpc_reset_link,
>  };
>
>  static int __init dpc_service_init(void)
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 33a16b1..29ff148 100644
> --- a/drivers/pci/pcie/err.c
> +++ 

Re: [PATCH v16 8/9] PCI/DPC: Unify and plumb error handling into DPC

2018-05-16 Thread poza

On 2018-05-16 05:26, Bjorn Helgaas wrote:

On Fri, May 11, 2018 at 05:22:08PM +0530, p...@codeaurora.org wrote:

On 2018-05-11 16:13, Oza Pawandeep wrote:
> DPC driver implements link_reset callback, and calls
> pci_do_fatal_recovery().
>
> Which follows standard path of ERR_FATAL recovery.
>
> Signed-off-by: Oza Pawandeep 
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 5e8857a..6af7595 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -354,7 +354,7 @@ static inline resource_size_t
> pci_resource_alignment(struct pci_dev *dev,
>  void pci_enable_acs(struct pci_dev *dev);
>
>  /* PCI error reporting and recovery */
> -void pcie_do_fatal_recovery(struct pci_dev *dev);
> +void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
>  void pcie_do_nonfatal_recovery(struct pci_dev *dev);
>
>  bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> b/drivers/pci/pcie/aer/aerdrv_core.c
> index fdfc474..36e622d 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -254,7 +254,7 @@ static void handle_error_source(struct pcie_device
> *aerdev,
>} else if (info->severity == AER_NONFATAL)
>pcie_do_nonfatal_recovery(dev);
>else if (info->severity == AER_FATAL)
> -  pcie_do_fatal_recovery(dev);
> +  pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER);
>  }
>
>  #ifdef CONFIG_ACPI_APEI_PCIEAER
> @@ -321,7 +321,7 @@ static void aer_recover_work_func(struct work_struct
> *work)
>if (entry.severity == AER_NONFATAL)
>pcie_do_nonfatal_recovery(pdev);
>else if (entry.severity == AER_FATAL)
> -  pcie_do_fatal_recovery(pdev);
> +  pcie_do_fatal_recovery(pdev, PCIE_PORT_SERVICE_AER);
>pci_dev_put(pdev);
>}
>  }
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 80ec384..5680c13 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -73,29 +73,31 @@ static void dpc_wait_link_inactive(struct dpc_dev
> *dpc)
>pcie_wait_for_link(pdev, false);
>  }
>
> -static void dpc_work(struct work_struct *work)
> +static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
>  {
> -  struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
> -  struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
> -  struct pci_bus *parent = pdev->subordinate;
> -  u16 cap = dpc->cap_pos, ctl;
> -
> -  pci_lock_rescan_remove();
> -  list_for_each_entry_safe_reverse(dev, temp, >devices,
> -   bus_list) {
> -  pci_dev_get(dev);
> -  pci_dev_set_disconnected(dev, NULL);
> -  if (pci_has_subordinate(dev))
> -  pci_walk_bus(dev->subordinate,
> -   pci_dev_set_disconnected, NULL);
> -  pci_stop_and_remove_bus_device(dev);
> -  pci_dev_put(dev);
> -  }
> -  pci_unlock_rescan_remove();
> -
> +  struct dpc_dev *dpc;
> +  struct pcie_device *pciedev;
> +  struct device *devdpc;
> +  u16 cap, ctl;
> +
> +  /*
> +   * DPC disables the Link automatically in hardware, so it has
> +   * already been reset by the time we get here.
> +   */
> +
> +  devdpc = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_DPC);
> +  pciedev = to_pcie_device(devdpc);
> +  dpc = get_service_data(pciedev);
> +  cap = dpc->cap_pos;
> +
> +  /*
> +   * Waiting until the link is inactive, then clearing DPC
> +   * trigger status to allow the port to leave DPC.
> +   */
>dpc_wait_link_inactive(dpc);
> +
>if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
> -  return;
> +  return PCI_ERS_RESULT_DISCONNECT;
>if (dpc->rp_extensions && dpc->rp_pio_status) {
>pci_write_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_STATUS,
>   dpc->rp_pio_status);
> @@ -108,6 +110,17 @@ static void dpc_work(struct work_struct *work)
>pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, );
>pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
>  ctl | PCI_EXP_DPC_CTL_INT_EN);
> +
> +  return PCI_ERS_RESULT_RECOVERED;
> +}
> +
> +static void dpc_work(struct work_struct *work)
> +{
> +  struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
> +  struct pci_dev *pdev = dpc->dev->port;
> +
> +  /* From DPC point of view error is always FATAL. */
> +  pcie_do_fatal_recovery(pdev, PCIE_PORT_SERVICE_DPC);
>  }
>
>  static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
> @@ -288,6 +301,7 @@ static struct pcie_port_service_driver dpcdriver = {
>.service= PCIE_PORT_SERVICE_DPC,
>.probe  = dpc_probe,
>.remove = dpc_remove,
> +  .reset_link = dpc_reset_link,
>  };
>
>  static int __init dpc_service_init(void)
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 33a16b1..29ff148 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ 

Re: [PATCH v16 3/9] PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices

2018-05-15 Thread poza

On 2018-05-16 05:29, Bjorn Helgaas wrote:

On Fri, May 11, 2018 at 06:43:22AM -0400, Oza Pawandeep wrote:

This patch alters the behavior of handling of ERR_FATAL, where removal
of devices is initiated, followed by reset link, followed by
re-enumeration.

So the errors are handled in a different way as follows:
ERR_NONFATAL => call driver recovery entry points
ERR_FATAL=> remove and re-enumerate

please refer to Documentation/PCI/pci-error-recovery.txt for more 
details.


Signed-off-by: Oza Pawandeep 
Reviewed-by: Keith Busch 

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
b/drivers/pci/pcie/aer/aerdrv_core.c

index 0ea5acc..649dd1f 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include "aerdrv.h"
+#include "../../pci.h"

 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE 
| \

 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
@@ -475,35 +476,84 @@ static pci_ers_result_t reset_link(struct 
pci_dev *dev)

 }

 /**
- * do_recovery - handle nonfatal/fatal error recovery process
+ * do_fatal_recovery - handle fatal error recovery process
+ * @dev: pointer to a pci_dev data structure of agent detecting an 
error

+ *
+ * Invoked when an error is fatal. Once being invoked, removes the 
devices
+ * benetah this AER agent, followed by reset link e.g. secondary bus 
reset

+ * followed by re-enumeration of devices.
+ */
+
+static void do_fatal_recovery(struct pci_dev *dev)
+{
+   struct pci_dev *udev;
+   struct pci_bus *parent;
+   struct pci_dev *pdev, *temp;
+   pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED;
+   struct aer_broadcast_data result_data;
+
+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+   udev = dev;
+   else
+   udev = dev->bus->self;
+
+   parent = udev->subordinate;
+   pci_lock_rescan_remove();
+   list_for_each_entry_safe_reverse(pdev, temp, >devices,
+bus_list) {
+   pci_dev_get(pdev);
+   pci_dev_set_disconnected(pdev, NULL);
+   if (pci_has_subordinate(pdev))
+   pci_walk_bus(pdev->subordinate,
+pci_dev_set_disconnected, NULL);
+   pci_stop_and_remove_bus_device(pdev);
+   pci_dev_put(pdev);
+   }
+
+   result = reset_link(udev);


I don't like the fact that for DPC, the link reset happens before we 
call
the driver .remove() methods, while for AER, the reset happens *after* 
the
.remove() methods.  That means the .remove() methods may work 
differently

for AER vs. DPC, e.g., they may be able to access the device if AER is
handling the error, but they would not be able to access it if DPC is
handling it.

I don't know how to fix this, and I think we can keep this patch as it 
is

for now, but I think we should fix it eventually.


point noted, will see to this eventually.




+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   /*
+* If the error is reported by a bridge, we think this error
+* is related to the downstream link of the bridge, so we
+* do error recovery on all subordinates of the bridge instead
+* of the bridge and clear the error status of the bridge.
+*/
+   pci_walk_bus(dev->subordinate, report_resume, _data);
+   pci_cleanup_aer_uncorrect_error_status(dev);
+   }
+
+   if (result == PCI_ERS_RESULT_RECOVERED) {
+   if (pcie_wait_for_link(udev, true))
+   pci_rescan_bus(udev->bus);
+   } else {
+   pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
+   pci_info(dev, "AER: Device recovery failed\n");
+   }
+
+   pci_unlock_rescan_remove();
+}
+
+/**
+ * do_nonfatal_recovery - handle nonfatal error recovery process
  * @dev: pointer to a pci_dev data structure of agent detecting an 
error

- * @severity: error severity type
  *
  * Invoked when an error is nonfatal/fatal. Once being invoked, 
broadcast
  * error detected message to all downstream drivers within a 
hierarchy in

  * question and return the returned code.
  */
-static void do_recovery(struct pci_dev *dev, int severity)
+static void do_nonfatal_recovery(struct pci_dev *dev)
 {
-   pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
+   pci_ers_result_t status;
enum pci_channel_state state;

-   if (severity == AER_FATAL)
-   state = pci_channel_io_frozen;
-   else
-   state = pci_channel_io_normal;
+   state = pci_channel_io_normal;

status = broadcast_error_message(dev,
state,
"error_detected",
report_error_detected);

-   if (severity == AER_FATAL) {
-   result 

Re: [PATCH v16 3/9] PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices

2018-05-15 Thread poza

On 2018-05-16 05:29, Bjorn Helgaas wrote:

On Fri, May 11, 2018 at 06:43:22AM -0400, Oza Pawandeep wrote:

This patch alters the behavior of handling of ERR_FATAL, where removal
of devices is initiated, followed by reset link, followed by
re-enumeration.

So the errors are handled in a different way as follows:
ERR_NONFATAL => call driver recovery entry points
ERR_FATAL=> remove and re-enumerate

please refer to Documentation/PCI/pci-error-recovery.txt for more 
details.


Signed-off-by: Oza Pawandeep 
Reviewed-by: Keith Busch 

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
b/drivers/pci/pcie/aer/aerdrv_core.c

index 0ea5acc..649dd1f 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include "aerdrv.h"
+#include "../../pci.h"

 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE 
| \

 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
@@ -475,35 +476,84 @@ static pci_ers_result_t reset_link(struct 
pci_dev *dev)

 }

 /**
- * do_recovery - handle nonfatal/fatal error recovery process
+ * do_fatal_recovery - handle fatal error recovery process
+ * @dev: pointer to a pci_dev data structure of agent detecting an 
error

+ *
+ * Invoked when an error is fatal. Once being invoked, removes the 
devices
+ * benetah this AER agent, followed by reset link e.g. secondary bus 
reset

+ * followed by re-enumeration of devices.
+ */
+
+static void do_fatal_recovery(struct pci_dev *dev)
+{
+   struct pci_dev *udev;
+   struct pci_bus *parent;
+   struct pci_dev *pdev, *temp;
+   pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED;
+   struct aer_broadcast_data result_data;
+
+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+   udev = dev;
+   else
+   udev = dev->bus->self;
+
+   parent = udev->subordinate;
+   pci_lock_rescan_remove();
+   list_for_each_entry_safe_reverse(pdev, temp, >devices,
+bus_list) {
+   pci_dev_get(pdev);
+   pci_dev_set_disconnected(pdev, NULL);
+   if (pci_has_subordinate(pdev))
+   pci_walk_bus(pdev->subordinate,
+pci_dev_set_disconnected, NULL);
+   pci_stop_and_remove_bus_device(pdev);
+   pci_dev_put(pdev);
+   }
+
+   result = reset_link(udev);


I don't like the fact that for DPC, the link reset happens before we 
call
the driver .remove() methods, while for AER, the reset happens *after* 
the
.remove() methods.  That means the .remove() methods may work 
differently

for AER vs. DPC, e.g., they may be able to access the device if AER is
handling the error, but they would not be able to access it if DPC is
handling it.

I don't know how to fix this, and I think we can keep this patch as it 
is

for now, but I think we should fix it eventually.


point noted, will see to this eventually.




+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   /*
+* If the error is reported by a bridge, we think this error
+* is related to the downstream link of the bridge, so we
+* do error recovery on all subordinates of the bridge instead
+* of the bridge and clear the error status of the bridge.
+*/
+   pci_walk_bus(dev->subordinate, report_resume, _data);
+   pci_cleanup_aer_uncorrect_error_status(dev);
+   }
+
+   if (result == PCI_ERS_RESULT_RECOVERED) {
+   if (pcie_wait_for_link(udev, true))
+   pci_rescan_bus(udev->bus);
+   } else {
+   pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
+   pci_info(dev, "AER: Device recovery failed\n");
+   }
+
+   pci_unlock_rescan_remove();
+}
+
+/**
+ * do_nonfatal_recovery - handle nonfatal error recovery process
  * @dev: pointer to a pci_dev data structure of agent detecting an 
error

- * @severity: error severity type
  *
  * Invoked when an error is nonfatal/fatal. Once being invoked, 
broadcast
  * error detected message to all downstream drivers within a 
hierarchy in

  * question and return the returned code.
  */
-static void do_recovery(struct pci_dev *dev, int severity)
+static void do_nonfatal_recovery(struct pci_dev *dev)
 {
-   pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
+   pci_ers_result_t status;
enum pci_channel_state state;

-   if (severity == AER_FATAL)
-   state = pci_channel_io_frozen;
-   else
-   state = pci_channel_io_normal;
+   state = pci_channel_io_normal;

status = broadcast_error_message(dev,
state,
"error_detected",
report_error_detected);

-   if (severity == AER_FATAL) {
-   result = reset_link(dev);
-   if 

Re: [PATCH v16 5/9] PCI/AER: Factor out error reporting from AER

2018-05-11 Thread poza

On 2018-05-11 21:24, Lukas Wunner wrote:

On Fri, May 11, 2018 at 09:04:36PM +0530, p...@codeaurora.org wrote:

On 2018-05-11 18:28, Lukas Wunner wrote:
>On Fri, May 11, 2018 at 06:43:24AM -0400, Oza Pawandeep wrote:
>>+void pcie_do_fatal_recovery(struct pci_dev *dev)
>>+{
>>+   struct pci_dev *udev;
>>+   struct pci_bus *parent;
>>+   struct pci_dev *pdev, *temp;
>>+   pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED;
>>+   struct aer_broadcast_data result_data;
>>+
>>+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
>>+   udev = dev;
>>+   else
>>+   udev = dev->bus->self;
>>+
>>+   parent = udev->subordinate;
>>+   pci_lock_rescan_remove();
>>+   list_for_each_entry_safe_reverse(pdev, temp, >devices,
>>+bus_list) {
>>+   pci_dev_get(pdev);
>>+   pci_dev_set_disconnected(pdev, NULL);
>>+   if (pci_has_subordinate(pdev))
>>+   pci_walk_bus(pdev->subordinate,
>>+pci_dev_set_disconnected, NULL);
>>+   pci_stop_and_remove_bus_device(pdev);
>>+   pci_dev_put(pdev);
>>+   }
>
>Any reason not to simply call
>
>pci_walk_bus(udev->subordinate, pci_dev_set_disconnected, NULL);
>
>before the list_for_each_entry_safe_reverse() iteration, instead of
>calling it for each device on the subordinate bus and for each
>device's children?  Should be semantically identical, saves 3 LoC
>and saves wasted cycles of acquiring pci_bus_sem over and over again
>for each device on the subordinate bus.

Well this is borrowed code from DPC driver, hence I thought to keep 
the

same.
but to me it looks like its taking care of PCIe switch where is goes 
through
all the subordinates, and which could turn out to be more swicthes 
down the

line, and son on...
it goes all the way down to the tree


... which is precisely what the one line I suggested above does.

You don't need to respin for this alone as far as I'm concerned,
but please post a follow-up refactoring patch.  I have a patch
in the pipeline which makes the same change in pciehp, hence this
caught my eye.

Thanks,

Lukas


Thanks Lukas, I will keep this in my pipeline as an optimization.
appreciate your input.

Regards,
Oza.







Re: [PATCH v16 5/9] PCI/AER: Factor out error reporting from AER

2018-05-11 Thread poza

On 2018-05-11 21:24, Lukas Wunner wrote:

On Fri, May 11, 2018 at 09:04:36PM +0530, p...@codeaurora.org wrote:

On 2018-05-11 18:28, Lukas Wunner wrote:
>On Fri, May 11, 2018 at 06:43:24AM -0400, Oza Pawandeep wrote:
>>+void pcie_do_fatal_recovery(struct pci_dev *dev)
>>+{
>>+   struct pci_dev *udev;
>>+   struct pci_bus *parent;
>>+   struct pci_dev *pdev, *temp;
>>+   pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED;
>>+   struct aer_broadcast_data result_data;
>>+
>>+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
>>+   udev = dev;
>>+   else
>>+   udev = dev->bus->self;
>>+
>>+   parent = udev->subordinate;
>>+   pci_lock_rescan_remove();
>>+   list_for_each_entry_safe_reverse(pdev, temp, >devices,
>>+bus_list) {
>>+   pci_dev_get(pdev);
>>+   pci_dev_set_disconnected(pdev, NULL);
>>+   if (pci_has_subordinate(pdev))
>>+   pci_walk_bus(pdev->subordinate,
>>+pci_dev_set_disconnected, NULL);
>>+   pci_stop_and_remove_bus_device(pdev);
>>+   pci_dev_put(pdev);
>>+   }
>
>Any reason not to simply call
>
>pci_walk_bus(udev->subordinate, pci_dev_set_disconnected, NULL);
>
>before the list_for_each_entry_safe_reverse() iteration, instead of
>calling it for each device on the subordinate bus and for each
>device's children?  Should be semantically identical, saves 3 LoC
>and saves wasted cycles of acquiring pci_bus_sem over and over again
>for each device on the subordinate bus.

Well this is borrowed code from DPC driver, hence I thought to keep 
the

same.
but to me it looks like its taking care of PCIe switch where is goes 
through
all the subordinates, and which could turn out to be more swicthes 
down the

line, and son on...
it goes all the way down to the tree


... which is precisely what the one line I suggested above does.

You don't need to respin for this alone as far as I'm concerned,
but please post a follow-up refactoring patch.  I have a patch
in the pipeline which makes the same change in pciehp, hence this
caught my eye.

Thanks,

Lukas


Thanks Lukas, I will keep this in my pipeline as an optimization.
appreciate your input.

Regards,
Oza.







Re: [PATCH v16 5/9] PCI/AER: Factor out error reporting from AER

2018-05-11 Thread poza

On 2018-05-11 18:28, Lukas Wunner wrote:

On Fri, May 11, 2018 at 06:43:24AM -0400, Oza Pawandeep wrote:

+void pcie_do_fatal_recovery(struct pci_dev *dev)
+{
+   struct pci_dev *udev;
+   struct pci_bus *parent;
+   struct pci_dev *pdev, *temp;
+   pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED;
+   struct aer_broadcast_data result_data;
+
+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+   udev = dev;
+   else
+   udev = dev->bus->self;
+
+   parent = udev->subordinate;
+   pci_lock_rescan_remove();
+   list_for_each_entry_safe_reverse(pdev, temp, >devices,
+bus_list) {
+   pci_dev_get(pdev);
+   pci_dev_set_disconnected(pdev, NULL);
+   if (pci_has_subordinate(pdev))
+   pci_walk_bus(pdev->subordinate,
+pci_dev_set_disconnected, NULL);
+   pci_stop_and_remove_bus_device(pdev);
+   pci_dev_put(pdev);
+   }


Any reason not to simply call

pci_walk_bus(udev->subordinate, pci_dev_set_disconnected, NULL);

before the list_for_each_entry_safe_reverse() iteration, instead of
calling it for each device on the subordinate bus and for each
device's children?  Should be semantically identical, saves 3 LoC
and saves wasted cycles of acquiring pci_bus_sem over and over again
for each device on the subordinate bus.

Thanks,

Lukas


Well this is borrowed code from DPC driver, hence I thought to keep the 
same.
but to me it looks like its taking care of PCIe switch where is goes 
through all the subordinates, and which could turn out to be more 
swicthes down the line, and son on...

it goes all the way down to the tree
Am I missing something here ?








Re: [PATCH v16 5/9] PCI/AER: Factor out error reporting from AER

2018-05-11 Thread poza

On 2018-05-11 18:28, Lukas Wunner wrote:

On Fri, May 11, 2018 at 06:43:24AM -0400, Oza Pawandeep wrote:

+void pcie_do_fatal_recovery(struct pci_dev *dev)
+{
+   struct pci_dev *udev;
+   struct pci_bus *parent;
+   struct pci_dev *pdev, *temp;
+   pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED;
+   struct aer_broadcast_data result_data;
+
+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+   udev = dev;
+   else
+   udev = dev->bus->self;
+
+   parent = udev->subordinate;
+   pci_lock_rescan_remove();
+   list_for_each_entry_safe_reverse(pdev, temp, >devices,
+bus_list) {
+   pci_dev_get(pdev);
+   pci_dev_set_disconnected(pdev, NULL);
+   if (pci_has_subordinate(pdev))
+   pci_walk_bus(pdev->subordinate,
+pci_dev_set_disconnected, NULL);
+   pci_stop_and_remove_bus_device(pdev);
+   pci_dev_put(pdev);
+   }


Any reason not to simply call

pci_walk_bus(udev->subordinate, pci_dev_set_disconnected, NULL);

before the list_for_each_entry_safe_reverse() iteration, instead of
calling it for each device on the subordinate bus and for each
device's children?  Should be semantically identical, saves 3 LoC
and saves wasted cycles of acquiring pci_bus_sem over and over again
for each device on the subordinate bus.

Thanks,

Lukas


Well this is borrowed code from DPC driver, hence I thought to keep the 
same.
but to me it looks like its taking care of PCIe switch where is goes 
through all the subordinates, and which could turn out to be more 
swicthes down the line, and son on...

it goes all the way down to the tree
Am I missing something here ?








Re: [PATCH v16 8/9] PCI/DPC: Unify and plumb error handling into DPC

2018-05-11 Thread poza

On 2018-05-11 16:13, Oza Pawandeep wrote:

DPC driver implements link_reset callback, and calls
pci_do_fatal_recovery().

Which follows standard path of ERR_FATAL recovery.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 5e8857a..6af7595 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -354,7 +354,7 @@ static inline resource_size_t
pci_resource_alignment(struct pci_dev *dev,
 void pci_enable_acs(struct pci_dev *dev);

 /* PCI error reporting and recovery */
-void pcie_do_fatal_recovery(struct pci_dev *dev);
+void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
 void pcie_do_nonfatal_recovery(struct pci_dev *dev);

 bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
b/drivers/pci/pcie/aer/aerdrv_core.c
index fdfc474..36e622d 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -254,7 +254,7 @@ static void handle_error_source(struct pcie_device 
*aerdev,

} else if (info->severity == AER_NONFATAL)
pcie_do_nonfatal_recovery(dev);
else if (info->severity == AER_FATAL)
-   pcie_do_fatal_recovery(dev);
+   pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER);
 }

 #ifdef CONFIG_ACPI_APEI_PCIEAER
@@ -321,7 +321,7 @@ static void aer_recover_work_func(struct 
work_struct *work)

if (entry.severity == AER_NONFATAL)
pcie_do_nonfatal_recovery(pdev);
else if (entry.severity == AER_FATAL)
-   pcie_do_fatal_recovery(pdev);
+   pcie_do_fatal_recovery(pdev, PCIE_PORT_SERVICE_AER);
pci_dev_put(pdev);
}
 }
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 80ec384..5680c13 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -73,29 +73,31 @@ static void dpc_wait_link_inactive(struct dpc_dev 
*dpc)

pcie_wait_for_link(pdev, false);
 }

-static void dpc_work(struct work_struct *work)
+static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 {
-   struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
-   struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
-   struct pci_bus *parent = pdev->subordinate;
-   u16 cap = dpc->cap_pos, ctl;
-
-   pci_lock_rescan_remove();
-   list_for_each_entry_safe_reverse(dev, temp, >devices,
-bus_list) {
-   pci_dev_get(dev);
-   pci_dev_set_disconnected(dev, NULL);
-   if (pci_has_subordinate(dev))
-   pci_walk_bus(dev->subordinate,
-pci_dev_set_disconnected, NULL);
-   pci_stop_and_remove_bus_device(dev);
-   pci_dev_put(dev);
-   }
-   pci_unlock_rescan_remove();
-
+   struct dpc_dev *dpc;
+   struct pcie_device *pciedev;
+   struct device *devdpc;
+   u16 cap, ctl;
+
+   /*
+* DPC disables the Link automatically in hardware, so it has
+* already been reset by the time we get here.
+*/
+
+   devdpc = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_DPC);
+   pciedev = to_pcie_device(devdpc);
+   dpc = get_service_data(pciedev);
+   cap = dpc->cap_pos;
+
+   /*
+* Waiting until the link is inactive, then clearing DPC
+* trigger status to allow the port to leave DPC.
+*/
dpc_wait_link_inactive(dpc);
+
if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
-   return;
+   return PCI_ERS_RESULT_DISCONNECT;
if (dpc->rp_extensions && dpc->rp_pio_status) {
pci_write_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_STATUS,
   dpc->rp_pio_status);
@@ -108,6 +110,17 @@ static void dpc_work(struct work_struct *work)
pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, );
pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
  ctl | PCI_EXP_DPC_CTL_INT_EN);
+
+   return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void dpc_work(struct work_struct *work)
+{
+   struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
+   struct pci_dev *pdev = dpc->dev->port;
+
+   /* From DPC point of view error is always FATAL. */
+   pcie_do_fatal_recovery(pdev, PCIE_PORT_SERVICE_DPC);
 }

 static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
@@ -288,6 +301,7 @@ static struct pcie_port_service_driver dpcdriver = 
{

.service= PCIE_PORT_SERVICE_DPC,
.probe  = dpc_probe,
.remove = dpc_remove,
+   .reset_link = dpc_reset_link,
 };

 static int __init dpc_service_init(void)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 33a16b1..29ff148 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -185,7 +185,7 @@ static 

  1   2   3   >