from:"George Cherian"

[PATCH net-next 0/2] Add devlink health reporters for NIX block

2021-01-19 Thread George Cherian

Devlink health reporters are added for the NIX block.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

This series is a continuation of
https://www.spinics.net/lists/netdev/msg707798.html

Added Documentation for the same.

George Cherian (2):
  octeontx2-af: Add devlink health reporters for NIX
  docs: octeontx2: Add Documentation for NIX health reporters

 .../ethernet/marvell/octeontx2.rst|  70 ++
 .../marvell/octeontx2/af/rvu_devlink.c| 652 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  27 +
 .../marvell/octeontx2/af/rvu_struct.h |  10 +
 4 files changed, 758 insertions(+), 1 deletion(-)

-- 
2.25.1

[PATCH net-next 2/2] docs: octeontx2: Add Documentation for NIX health reporters

2021-01-19 Thread George Cherian

Add devlink health reporter documentation for NIX block.

Signed-off-by: George Cherian 
---
 .../ethernet/marvell/octeontx2.rst| 70 +++
 1 file changed, 70 insertions(+)

diff --git a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
index 61e850460e18..dd5cd69467be 100644
--- a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
+++ b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
@@ -217,3 +217,73 @@ For example::
 NPA_AF_ERR:
NPA Error Interrupt Reg : 4096
AQ Doorbell Error
+
+
+NIX Reporters
+-------------
+The NIX reporters are responsible for reporting and recovering the following group of errors:
+
+1. GENERAL events
+
+   - Receive mirror/multicast packet drop due to insufficient buffer.
+   - SMQ Flush operation.
+
+2. ERROR events
+
+   - Memory Fault due to WQE read/write from multicast/mirror buffer.
+   - Receive multicast/mirror replication list error.
+   - Receive packet on an unmapped PF.
+   - Fault due to NIX_AQ_INST_S read or NIX_AQ_RES_S write.
+   - AQ Doorbell Error.
+
+3. RAS events
+
+   - RAS Error Reporting for NIX Receive Multicast/Mirror Entry Structure.
+   - RAS Error Reporting for WQE/Packet Data read from Multicast/Mirror Buffer.
+   - RAS Error Reporting for NIX_AQ_INST_S/NIX_AQ_RES_S.
+
+4. RVU events
+
+   - Error due to unmapped slot.
+
+Sample Output::
+
+   ~# ./devlink health
+   pci/0002:01:00.0:
+ reporter hw_npa_intr
+   state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
+ reporter hw_npa_gen
+   state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
+ reporter hw_npa_err
+   state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
+ reporter hw_npa_ras
+   state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
+ reporter hw_nix_intr
+   state healthy error 1121 recover 1121 last_dump_date 2021-01-19 last_dump_time 05:42:26 grace_period 0 auto_recover true auto_dump true
+ reporter hw_nix_gen
+   state healthy error 949 recover 949 last_dump_date 2021-01-19 last_dump_time 05:42:43 grace_period 0 auto_recover true auto_dump true
+ reporter hw_nix_err
+   state healthy error 1147 recover 1147 last_dump_date 2021-01-19 last_dump_time 05:42:59 grace_period 0 auto_recover true auto_dump true
+ reporter hw_nix_ras
+   state healthy error 409 recover 409 last_dump_date 2021-01-19 last_dump_time 05:43:16 grace_period 0 auto_recover true auto_dump true
+
+Each reporter dumps the
+
+ - Error Type
+ - Error Register value
+ - Reason in words
+
+For example::
+
+   ~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_intr
+NIX_AF_RVU:
+   NIX RVU Interrupt Reg : 1
+   Unmap Slot Error
+   ~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_gen
+NIX_AF_GENERAL:
+   NIX General Interrupt Reg : 1
+   Rx multicast pkt drop
+   ~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_err
+NIX_AF_ERR:
+   NIX Error Interrupt Reg : 64
+   Rx on unmapped PF_FUNC
-- 
2.25.1

[PATCH net-next 1/2] octeontx2-af: Add devlink health reporters for NIX

2021-01-19 Thread George Cherian

Add health reporters for RVU NIX block.
NIX health reporters handle the following HW event groups:
- GENERAL events
- ERROR events
- RAS events
- RVU event

Output:

 # devlink health
 pci/0002:01:00.0:
   reporter hw_npa_intr
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_npa_gen
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_npa_err
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_npa_ras
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_nix_intr
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_nix_gen
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_nix_err
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_nix_ras
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true

 # devlink health dump show pci/0002:01:00.0 reporter hw_nix_intr
  NIX_AF_RVU:
NIX RVU Interrupt Reg : 1
Unmap Slot Error
 # devlink health dump show pci/0002:01:00.0 reporter hw_nix_gen
  NIX_AF_GENERAL:
NIX General Interrupt Reg : 1
Rx multicast pkt drop

Each reporter dump shows the Register value and the description of the cause.
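
To illustrate the dump format, here is a minimal user-space sketch of how a
register value could be mapped to its reason string. The bit positions are
inferred only from the sample dumps in this thread (value 1 for "Rx multicast
pkt drop", value 64 for "Rx on unmapped PF_FUNC"). The struct, table, and
function names are hypothetical and are not the driver's.

```c
#include <stddef.h>
#include <stdint.h>

/* One (bit mask, reason) pair. Hypothetical type, not from rvu_struct.h. */
struct intr_reason {
	uint64_t mask;
	const char *reason;
};

/* Assumption: bit 0 of the general interrupt register, because the sample
 * dump prints register value 1 next to this reason.
 */
static const struct intr_reason nix_gen_reasons[] = {
	{ 1ULL << 0, "Rx multicast pkt drop" },
};

/* Assumption: bit 6 of the error interrupt register, because the sample
 * dump prints register value 64 next to this reason.
 */
static const struct intr_reason nix_err_reasons[] = {
	{ 1ULL << 6, "Rx on unmapped PF_FUNC" },
};

/* Return the reason for the first known bit set in reg_val, or NULL when
 * none of the bits in the table are set.
 */
static const char *intr_reason_lookup(const struct intr_reason *table,
				      size_t count, uint64_t reg_val)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (reg_val & table[i].mask)
			return table[i].reason;
	}
	return NULL;
}
```

With these tables, intr_reason_lookup(nix_err_reasons, 1, 64) yields
"Rx on unmapped PF_FUNC", which is the pairing shown in the hw_nix_err dump.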

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../marvell/octeontx2/af/rvu_devlink.c| 652 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  27 +
 .../marvell/octeontx2/af/rvu_struct.h |  10 +
 3 files changed, 688 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index bc0e4113370e..10a98bcb7c54 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -52,6 +52,650 @@ static bool rvu_common_request_irq(struct rvu *rvu, int offset,
return rvu->irq_allocated[offset];
 }
 
+static void rvu_nix_intr_work(struct work_struct *work)
+{
+   struct rvu_nix_health_reporters *rvu_nix_health_reporter;
+
+   rvu_nix_health_reporter = container_of(work, struct rvu_nix_health_reporters, intr_work);
+   devlink_health_report(rvu_nix_health_reporter->rvu_hw_nix_intr_reporter,
+ "NIX_AF_RVU Error",
+ rvu_nix_health_reporter->nix_event_ctx);
+}
+
+static irqreturn_t rvu_nix_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_ctx *nix_event_context;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_context = rvu_dl->rvu_nix_health_reporter->nix_event_ctx;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_RVU_INT);
+   nix_event_context->nix_af_rvu_int = intr;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT, intr);
+   rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT_ENA_W1C, ~0ULL);
+   queue_work(rvu_dl->devlink_wq, &rvu_dl->rvu_nix_health_reporter->intr_work);
+
+   return IRQ_HANDLED;
+}
+
+static void rvu_nix_gen_work(struct work_struct *work)
+{
+   struct rvu_nix_health_reporters *rvu_nix_health_reporter;
+
+   rvu_nix_health_reporter = container_of(work, struct rvu_nix_health_reporters, gen_work);
+   devlink_health_report(rvu_nix_health_reporter->rvu_hw_nix_gen_reporter,
+ "NIX_AF_GEN Error",
+ rvu_nix_health_reporter->nix_event_ctx);
+}
+
+static irqreturn_t rvu_nix_af_rvu_gen_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_ctx *nix_event_context;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_context = rvu_dl->rvu_nix_health_reporter->nix_event_ctx;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_GEN_INT);
+   nix_event_context->nix_af_rvu_gen = intr;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NIX_AF_GEN_INT, intr);
+   rvu_write64(rvu, blkaddr, NIX_AF_GEN_INT_ENA_W1C, ~0ULL);
+   queue_work(rvu_dl->devlink_wq, &rvu_dl->rvu_nix_health_reporter->gen_work);
+
+   return IRQ_HANDLED;
+}
+
+static void rvu_nix_err_work(struct work_struct *work)
+{
+   struct rvu_nix_health_reporters *rvu_nix_health_reporter;
+
+   rvu_nix_health_reporter = container_
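
The handlers in the truncated patch above all follow one pattern: latch the
interrupt register into the event context, write the value back to clear it,
mask further interrupts (the _ENA_W1C register name suggests write-1-to-clear
on the enable bits), and defer the actual devlink report to a workqueue. A
rough user-space simulation of that flow, assuming those semantics, might look
like this. Every name here is hypothetical and stands in for the real
registers and for queue_work().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one block's interrupt registers. */
struct fake_blk {
	uint64_t intr;		/* pending bits, modelled as write-1-to-clear */
	uint64_t intr_ena;	/* enabled bits */
};

/* Hypothetical stand-in for the per-block event context. */
struct fake_event_ctx {
	uint64_t saved_intr;	/* copy of the register for the later dump */
	bool work_queued;	/* stands in for a queued work item */
};

/* Top half: runs in interrupt context, so it only latches, acknowledges,
 * masks, and defers.
 */
static void fake_irq_top_half(struct fake_blk *blk, struct fake_event_ctx *ctx)
{
	uint64_t intr = blk->intr;	/* read pending bits */

	ctx->saved_intr = intr;		/* keep a copy for the reporter */
	blk->intr &= ~intr;		/* writing the bits back clears them */
	blk->intr_ena &= ~(~0ULL);	/* W1C of all ones masks every source */
	ctx->work_queued = true;	/* defer reporting to process context */
}

/* Bottom half: runs later from the workqueue and hands the saved value to
 * whatever formats the report.
 */
static uint64_t fake_irq_bottom_half(struct fake_event_ctx *ctx)
{
	ctx->work_queued = false;
	return ctx->saved_intr;
}
```

Keeping the top half this small matters because a devlink health report can
sleep, which is not allowed in hard interrupt context.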

Re: [PATCH v2] docs: octeontx2: tune rst markup

2021-01-06 Thread George Cherian

On Wed, Jan 6, 2021 at 9:51 PM Lukas Bulwahn  wrote:
>
> Commit 80b9414832a1 ("docs: octeontx2: Add Documentation for NPA health
> reporters") added new documentation with improper formatting for rst, and
> caused a few new warnings for make htmldocs in octeontx2.rst:169--202.
>
> Tune markup and formatting for better presentation in the HTML view.
>
> Signed-off-by: Lukas Bulwahn 
> ---
> v1 -> v2: minor stylistic tuning as suggested by Randy
>
> applies cleanly on current master (v5.11-rc2) and next-20210106
>
> George, please ack.
> Jonathan, please pick this minor formatting clean-up patch.
>
Acked-by: George Cherian 

Regards
-George

RE: [PATCH][next] octeontx2-af: Fix undetected unmap PF error check

2020-12-16 Thread George Cherian



> -----Original Message-----
> From: Colin King 
> Sent: Wednesday, December 16, 2020 6:06 PM
> To: Sunil Kovvuri Goutham ; Linu Cherian
> ; Geethasowjanya Akula ;
> Jerin Jacob Kollanukkaran ; David S . Miller
> ; Jakub Kicinski ; George
> Cherian ; net...@vger.kernel.org
> Cc: kernel-janit...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH][next] octeontx2-af: Fix undetected unmap PF error
> check
> 
> From: Colin Ian King 
> 
> Currently the check for an unmap PF error is always going to be false because
> intr_val is a 32 bit int and is being bit-mask checked against 1ULL << 32.
> Fix this by making intr_val a u64 to match the type it is copied from, namely
> npa_event_context->npa_af_rvu_ge.
> 
> Addresses-Coverity: ("Operands don't affect result")
> Fixes: f1168d1e207c ("octeontx2-af: Add devlink health reporters for NPA")
> Signed-off-by: Colin Ian King 
Acked-by: George Cherian 

> ---
>  drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> index 3f9d0ab6d5ae..bc0e4113370e 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> @@ -275,7 +275,8 @@ static int rvu_npa_report_show(struct devlink_fmsg
> *fmsg, void *ctx,
>  enum npa_af_rvu_health health_reporter)  {
>   struct rvu_npa_event_ctx *npa_event_context;
> - unsigned int intr_val, alloc_dis, free_dis;
> + unsigned int alloc_dis, free_dis;
> + u64 intr_val;
>   int err;
> 
>   npa_event_context = ctx;
> --
> 2.29.2

Regards,
-George
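
The truncation described in the quoted patch is easy to reproduce outside the
kernel. The sketch below is a stand-alone illustration, not driver code: the
function names are made up, and uint32_t stands in for the 32-bit
unsigned int the patch replaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy shape: the 64-bit register value is narrowed to 32 bits before the
 * test, so bit 32 is already gone and the check can never succeed.
 */
static bool bit32_set_narrow(uint64_t reg)
{
	uint32_t intr_val = (uint32_t)reg;	/* drops bits 32..63 */

	return (intr_val & (1ULL << 32)) != 0;	/* always false */
}

/* Fixed shape: keep the value 64 bits wide, as the patch does with u64. */
static bool bit32_set_wide(uint64_t reg)
{
	uint64_t intr_val = reg;

	return (intr_val & (1ULL << 32)) != 0;
}
```

Calling both with 1ULL << 32 returns false from the narrow version and true
from the wide one, which is the "Operands don't affect result" condition that
Coverity flagged.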

[PATCHv6 net-next 0/3] Add devlink and devlink health reporters to

2020-12-10 Thread George Cherian

Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA block.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

For now, I have dropped the NIX block health reporters. 
This series attempts to add health reporters only for the NPA block.
As per Jakub's suggestion, separate reporters per event are used and the
counters have been removed.

Change-log:
v6
 - Address Jakub comments
 - Add reporters per event for each block.
 - Remove the Sw counter.
 - Remove the mbox version from devlink info.

v5 
 - Address Jiri's comment
 - use devlink_fmsg_arr_pair_nest_start() for NIX blocks 

v4 
 - Rebase to net-next (no logic changes).
 
v3
 - Address Saeed's comments on v2.
 - Renamed the reporter name as hw_*.
 - Call devlink_health_report() when an event is raised.
 - Added recover op too.

v2
 - Address Willem's comments on v1.
 - Fixed the sparse issues, reported by Jakub.


George Cherian (3):
  octeontx2-af: Add devlink suppoort to af driver
  octeontx2-af: Add devlink health reporters for NPA
  docs: octeontx2: Add Documentation for NPA health reporters

 .../ethernet/marvell/octeontx2.rst|  50 ++
 .../net/ethernet/marvell/octeontx2/Kconfig|   1 +
 .../ethernet/marvell/octeontx2/af/Makefile|   2 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   4 +
 .../marvell/octeontx2/af/rvu_devlink.c| 770 ++
 .../marvell/octeontx2/af/rvu_devlink.h|  55 ++
 .../marvell/octeontx2/af/rvu_struct.h |  23 +
 8 files changed, 912 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

-- 
2.25.1

[PATCHv6 net-next 1/3] octeontx2-af: Add devlink suppoort to af driver

2020-12-10 Thread George Cherian

Add devlink support to AF driver. Basic devlink support is added.
Currently info_get is the only supported devlink ops.

devlink output looks like this:
 # devlink dev
 pci/0002:01:00.0
 # devlink dev info
 pci/0002:01:00.0:
  driver octeontx2-af
 #

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../net/ethernet/marvell/octeontx2/Kconfig|  1 +
 .../ethernet/marvell/octeontx2/af/Makefile|  2 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |  9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  4 ++
 .../marvell/octeontx2/af/rvu_devlink.c| 64 +++
 .../marvell/octeontx2/af/rvu_devlink.h| 20 ++
 6 files changed, 98 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 543a1d047567..16caa02095fe 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -9,6 +9,7 @@ config OCTEONTX2_MBOX
 config OCTEONTX2_AF
tristate "Marvell OcteonTX2 RVU Admin Function driver"
select OCTEONTX2_MBOX
+   select NET_DEVLINK
depends on (64BIT && COMPILE_TEST) || ARM64
depends on PCI
help
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index 7100d1dd856e..eb535c98ca38 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -10,4 +10,4 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 octeontx2_mbox-y := mbox.o rvu_trace.o
 octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \
  rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o rvu_npc_fs.o \
- rvu_cpt.o
+ rvu_cpt.o rvu_devlink.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 9f901c0edcbb..e8fd712860a1 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -2826,17 +2826,23 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (err)
goto err_flr;
 
+   err = rvu_register_dl(rvu);
+   if (err)
+   goto err_irq;
+
rvu_setup_rvum_blk_revid(rvu);
 
/* Enable AF's VFs (if any) */
err = rvu_enable_sriov(rvu);
if (err)
-   goto err_irq;
+   goto err_dl;
 
/* Initialize debugfs */
rvu_dbg_init(rvu);
 
return 0;
+err_dl:
+   rvu_unregister_dl(rvu);
 err_irq:
rvu_unregister_interrupts(rvu);
 err_flr:
@@ -2868,6 +2874,7 @@ static void rvu_remove(struct pci_dev *pdev)
 
rvu_dbg_exit(rvu);
rvu_unregister_interrupts(rvu);
+   rvu_unregister_dl(rvu);
rvu_flr_wq_destroy(rvu);
rvu_cgx_exit(rvu);
rvu_fwdata_exit(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index b6c0977499ab..b1a6ecfd563e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -12,7 +12,10 @@
 #define RVU_H
 
 #include 
+#include 
+
 #include "rvu_struct.h"
+#include "rvu_devlink.h"
 #include "common.h"
 #include "mbox.h"
 #include "npc.h"
@@ -422,6 +425,7 @@ struct rvu {
 #ifdef CONFIG_DEBUG_FS
struct rvu_debugfs  rvu_dbg;
 #endif
+   struct rvu_devlink  *rvu_dl;
 };
 
 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
new file mode 100644
index 000000000000..5dabca04a34b
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Devlink
+ *
+ * Copyright (C) 2020 Marvell.
+ *
+ */
+
+#include "rvu.h"
+
+#define DRV_NAME "octeontx2-af"
+
+static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+   struct netlink_ext_ack *extack)
+{
+   return devlink_info_driver_name_put(req, DRV_NAME);
+}
+
+static const struct devlink_ops rvu_devlink_ops = {
+   .info_get = rvu_devlink_info_get,
+};
+
+int rvu_register_dl(struct rvu *rvu)
+{
+   struct rvu_devlink *rvu_dl;
+   struct devlink *dl;
+   int err;
+
+   rvu_dl = kzalloc(sizeof(*rvu_dl), GFP_KERNEL);
+   if (!rvu_dl)
+   return -ENOMEM;
+
+   dl = devlink_alloc(&rvu_devlink_ops, sizeof(struct rvu_devlink));
+   if (!dl) {
+
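
The rvu_probe() hunk in this patch slots devlink registration between
interrupt registration and SR-IOV enablement, and adds a matching err_dl label
so that a later failure tears things down in reverse order. A stand-alone
sketch of that unwinding pattern follows. The step names, the trace letters,
and probe_sketch() itself are hypothetical and exist only to make the ordering
observable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical setup steps, in the order the sketch performs them. */
enum { STEP_IRQ, STEP_DL, STEP_SRIOV };

/* Append one character to a NUL-terminated trace buffer if it fits. */
static void log_step(char *log, size_t cap, char c)
{
	size_t len = strlen(log);

	if (len + 1 < cap) {
		log[len] = c;
		log[len + 1] = '\0';
	}
}

/* Run the three steps, failing at fail_at (pass -1 for success).
 * Upper-case letters record setup, lower-case letters record teardown.
 * cap must be at least 1.
 */
static int probe_sketch(int fail_at, char *log, size_t cap)
{
	log[0] = '\0';

	if (fail_at == STEP_IRQ)
		return -1;		/* nothing to undo yet */
	log_step(log, cap, 'I');	/* interrupts registered */

	if (fail_at == STEP_DL)
		goto err_irq;
	log_step(log, cap, 'D');	/* devlink registered */

	if (fail_at == STEP_SRIOV)
		goto err_dl;
	log_step(log, cap, 'S');	/* SR-IOV enabled */

	return 0;

err_dl:
	log_step(log, cap, 'd');	/* undo devlink first */
err_irq:
	log_step(log, cap, 'i');	/* then undo interrupts */
	return -1;
}
```

A failure at STEP_SRIOV leaves the trace "IDdi": devlink is unregistered
before the interrupts, mirroring the err_dl then err_irq fall-through in the
hunk.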

[PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-12-10 Thread George Cherian

Add health reporters for RVU NPA block.
NPA health reporters handle the following HW event groups:
 - GENERAL events
 - ERROR events
 - RAS events
 - RVU event

Output:
 #devlink health
 pci/0002:01:00.0:
   reporter hw_npa_intr
 state healthy error 0 recover 0 grace_period 0 auto_recover true
 auto_dump true
   reporter hw_npa_gen
 state healthy error 0 recover 0 grace_period 0 auto_recover true
 auto_dump true
   reporter hw_npa_err
 state healthy error 0 recover 0 grace_period 0 auto_recover true
 auto_dump true
   reporter hw_npa_ras
 state healthy error 0 recover 0 grace_period 0 auto_recover true
 auto_dump true

 #devlink health dump show  pci/0002:01:00.0 reporter hw_npa_err
 NPA_AF_ERR:
NPA Error Interrupt Reg : 4096
AQ Doorbell Error
 #devlink health dump show  pci/0002:01:00.0 reporter hw_npa_ras
 NPA_AF_RVU_RAS:
NPA RAS Interrupt Reg : 0

 Each reporter dump shows the Register value and the description of the
cause.

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../marvell/octeontx2/af/rvu_devlink.c| 708 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  35 +
 .../marvell/octeontx2/af/rvu_struct.h |  23 +
 3 files changed, 765 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 5dabca04a34b..3f9d0ab6d5ae 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -5,10 +5,714 @@
  *
  */
 
+#include
+
 #include "rvu.h"
+#include "rvu_reg.h"
+#include "rvu_struct.h"
 
 #define DRV_NAME "octeontx2-af"
 
+static int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
+{
+   int err;
+
+   err = devlink_fmsg_pair_nest_start(fmsg, name);
+   if (err)
+   return err;
+
+   return  devlink_fmsg_obj_nest_start(fmsg);
+}
+
+static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
+{
+   int err;
+
+   err = devlink_fmsg_obj_nest_end(fmsg);
+   if (err)
+   return err;
+
+   return devlink_fmsg_pair_nest_end(fmsg);
+}
+
+static bool rvu_common_request_irq(struct rvu *rvu, int offset,
+  const char *name, irq_handler_t fn)
+{
+   struct rvu_devlink *rvu_dl = rvu->rvu_dl;
+   int rc;
+
+   sprintf(&rvu->irq_name[offset * NAME_SIZE], name);
+   rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0,
+                    &rvu->irq_name[offset * NAME_SIZE], rvu_dl);
+   if (rc)
+   dev_warn(rvu->dev, "Failed to register %s irq\n", name);
+   else
+   rvu->irq_allocated[offset] = true;
+
+   return rvu->irq_allocated[offset];
+}
+
+static void rvu_npa_intr_work(struct work_struct *work)
+{
+   struct rvu_npa_health_reporters *rvu_npa_health_reporter;
+
+   rvu_npa_health_reporter = container_of(work, struct rvu_npa_health_reporters, intr_work);
+   devlink_health_report(rvu_npa_health_reporter->rvu_hw_npa_intr_reporter,
+ "NPA_AF_RVU Error",
+ rvu_npa_health_reporter->npa_event_ctx);
+}
+
+static irqreturn_t rvu_npa_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_npa_event_ctx *npa_event_context;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   npa_event_context = rvu_dl->rvu_npa_health_reporter->npa_event_ctx;
+   intr = rvu_read64(rvu, blkaddr, NPA_AF_RVU_INT);
+   npa_event_context->npa_af_rvu_int = intr;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT, intr);
+   rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT_ENA_W1C, ~0ULL);
+   queue_work(rvu_dl->devlink_wq, &rvu_dl->rvu_npa_health_reporter->intr_work);
+
+   return IRQ_HANDLED;
+}
+
+static void rvu_npa_gen_work(struct work_struct *work)
+{
+   struct rvu_npa_health_reporters *rvu_npa_health_reporter;
+
+   rvu_npa_health_reporter = container_of(work, struct rvu_npa_health_reporters, gen_work);
+   devlink_health_report(rvu_npa_health_reporter->rvu_hw_npa_gen_reporter,
+ "NPA_AF_GEN Error",
+ rvu_npa_health_reporter->npa_event_ctx);
+}
+
+static irqreturn_t rvu_npa_af_gen_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_npa_event_ctx *npa_event_context;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+

[PATCH 3/3] docs: octeontx2: Add Documentation for NPA health reporters

2020-12-10 Thread George Cherian

Add Documentation for devlink health reporters for NPA block.

Signed-off-by: George Cherian 
---
 .../ethernet/marvell/octeontx2.rst| 50 +++
 1 file changed, 50 insertions(+)

diff --git a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
index 88f508338c5f..d3fcf536d14e 100644
--- a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
+++ b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
@@ -12,6 +12,7 @@ Contents
 - `Overview`_
 - `Drivers`_
 - `Basic packet flow`_
+- `Devlink health reporters`_
 
 Overview
 ========
@@ -157,3 +158,52 @@ Egress
 3. The SQ descriptor ring is maintained in buffers allocated from SQ mapped pool of NPA block LF.
 4. NIX block transmits the pkt on the designated channel.
 5. NPC MCAM entries can be installed to divert pkt onto a different channel.
+
+Devlink health reporters
+========================
+
+NPA Reporters
+-------------
+The NPA reporters are responsible for reporting and recovering the following group of errors
+1. GENERAL events
+   - Error due to operation of unmapped PF.
+   - Error due to disabled alloc/free for other HW blocks (NIX, SSO, TIM, DPI and AURA).
+2. ERROR events
+   - Fault due to NPA_AQ_INST_S read or NPA_AQ_RES_S write.
+   - AQ Doorbell Error.
+3. RAS events
+   - RAS Error Reporting for NPA_AQ_INST_S/NPA_AQ_RES_S.
+4. RVU events
+   - Error due to unmapped slot.
+
+Sample Output
+-------------
+~# devlink health
+pci/0002:01:00.0:
+  reporter hw_npa_intr
+  state healthy error 2872 recover 2872 last_dump_date 2020-12-10 last_dump_time 09:39:09 grace_period 0 auto_recover true auto_dump true
+  reporter hw_npa_gen
+  state healthy error 2872 recover 2872 last_dump_date 2020-12-11 last_dump_time 04:43:04 grace_period 0 auto_recover true auto_dump true
+  reporter hw_npa_err
+  state healthy error 2871 recover 2871 last_dump_date 2020-12-10 last_dump_time 09:39:17 grace_period 0 auto_recover true auto_dump true
+   reporter hw_npa_ras
+  state healthy error 0 recover 0 last_dump_date 2020-12-10 last_dump_time 09:32:40 grace_period 0 auto_recover true auto_dump true
+
+Each reporter dumps the
+ - Error Type
+ - Error Register value
+ - Reason in words
+
+For eg:
+~# devlink health dump show  pci/0002:01:00.0 reporter hw_npa_gen
+ NPA_AF_GENERAL:
+ NPA General Interrupt Reg : 1
+ NIX0: free disabled RX
+~# devlink health dump show  pci/0002:01:00.0 reporter hw_npa_intr
+ NPA_AF_RVU:
+ NPA RVU Interrupt Reg : 1
+ Unmap Slot Error
+~# devlink health dump show  pci/0002:01:00.0 reporter hw_npa_err
+ NPA_AF_ERR:
+NPA Error Interrupt Reg : 4096
+AQ Doorbell Error
-- 
2.25.1

RE: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-30 Thread George Cherian




> -----Original Message-----
> From: George Cherian
> Sent: Tuesday, December 1, 2020 10:49 AM
> To: 'Jakub Kicinski' 
> Cc: 'net...@vger.kernel.org' ; 'linux-
> ker...@vger.kernel.org' ;
> 'da...@davemloft.net' ; Sunil Kovvuri Goutham
> ; Linu Cherian ;
> Geethasowjanya Akula ; 'masahi...@kernel.org'
> ; 'willemdebruijn.ker...@gmail.com'
> ; 'sa...@kernel.org'
> ; 'j...@resnulli.us' 
> Subject: RE: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
> reporters for NPA
> 
> Jakub,
> 
> > -----Original Message-----
> > From: George Cherian
> > Sent: Tuesday, December 1, 2020 9:06 AM
> > To: Jakub Kicinski 
> > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > da...@davemloft.net; Sunil Kovvuri Goutham
> ;
> > Linu Cherian ; Geethasowjanya Akula
> > ; masahi...@kernel.org;
> > willemdebruijn.ker...@gmail.com; sa...@kernel.org; j...@resnulli.us
> > Subject: Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
> > reporters for NPA
> >
> > Hi Jakub,
> >
> > > -----Original Message-----
> > > From: Jakub Kicinski 
> > > Sent: Tuesday, December 1, 2020 7:59 AM
> > > To: George Cherian 
> > > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > da...@davemloft.net; Sunil Kovvuri Goutham
> > ;
> > > Linu Cherian ; Geethasowjanya Akula
> > > ; masahi...@kernel.org;
> > > willemdebruijn.ker...@gmail.com; sa...@kernel.org; j...@resnulli.us
> > > Subject: Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
> > > reporters for NPA
> > >
> > > On Thu, 26 Nov 2020 19:32:50 +0530 George Cherian wrote:
> > > > Add health reporters for RVU NPA block.
> > > > NPA Health reporters handle following HW event groups
> > > >  - GENERAL events
> > > >  - ERROR events
> > > >  - RAS events
> > > >  - RVU event
> > > > An event counter per event is maintained in SW.
> > > >
> > > > Output:
> > > >  # devlink health
> > > >  pci/0002:01:00.0:
> > > >reporter hw_npa
> > > >  state healthy error 0 recover 0  # devlink  health dump show
> > > > pci/0002:01:00.0 reporter hw_npa
> > > >  NPA_AF_GENERAL:
> > > > Unmap PF Error: 0
> > > > NIX:
> > > > 0: free disabled RX: 0 free disabled TX: 0
> > > > 1: free disabled RX: 0 free disabled TX: 0
> > > > Free Disabled for SSO: 0
> > > > Free Disabled for TIM: 0
> > > > Free Disabled for DPI: 0
> > > > Free Disabled for AURA: 0
> > > > Alloc Disabled for Resvd: 0
> > > >   NPA_AF_ERR:
> > > > Memory Fault on NPA_AQ_INST_S read: 0
> > > > Memory Fault on NPA_AQ_RES_S write: 0
> > > > AQ Doorbell Error: 0
> > > > Poisoned data on NPA_AQ_INST_S read: 0
> > > > Poisoned data on NPA_AQ_RES_S write: 0
> > > > Poisoned data on HW context read: 0
> > > >   NPA_AF_RVU:
> > > > Unmap Slot Error: 0
> > >
> > > You seem to have missed the feedback Saeed and I gave you on v2.
> > >
> > > Did you test this with the errors actually triggering? Devlink
> > > should store only
> > Yes, the same was tested using devlink health test interface by
> > injecting errors.
> > The dump gets generated automatically and the counters do get out of
> > sync, in case of continuous error.
> > That wouldn't be much of an issue as the user could manually trigger a
> > dump clear and Re-dump the counters to get the exact status of the
> > counters at any point of time.
> 
> Now that recover op is added the devlink error counter and recover counter
> will be proper. The internal counter for each event is needed just to
> understand within a specific reporter, how many such events occurred.
> 
> Following is the log snippet of the devlink health test being done on hw_nix
> reporter.
> # for i in `seq 1 33` ; do  devlink health test pci/0002:01:00.0 reporter 
> hw_nix;
> done //Inject 33 errors (16  of NIX_AF_RVU and 17 of NIX_AF_RAS and
> NIX_AF_GENERAL errors) # devlink health
> pci/0002:01:00.0:
>   reporter hw_npa
> state healthy error 0 recover 0 grace_period 0 auto_recover true
> auto_dump true
>   reporter hw_nix
> state healthy error 250 recover 250 last_dump_date 1970-01-01
> last_dump_time 00:04:16 grace_period 0 auto_recover true auto_dump true
Oops,

RE: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-30 Thread George Cherian

Jakub,

> -----Original Message-----
> From: George Cherian
> Sent: Tuesday, December 1, 2020 9:06 AM
> To: Jakub Kicinski 
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> da...@davemloft.net; Sunil Kovvuri Goutham ;
> Linu Cherian ; Geethasowjanya Akula
> ; masahi...@kernel.org;
> willemdebruijn.ker...@gmail.com; sa...@kernel.org; j...@resnulli.us
> Subject: Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
> reporters for NPA
> 
> Hi Jakub,
> 
> > -----Original Message-----
> > From: Jakub Kicinski 
> > Sent: Tuesday, December 1, 2020 7:59 AM
> > To: George Cherian 
> > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > da...@davemloft.net; Sunil Kovvuri Goutham
> ;
> > Linu Cherian ; Geethasowjanya Akula
> > ; masahi...@kernel.org;
> > willemdebruijn.ker...@gmail.com; sa...@kernel.org; j...@resnulli.us
> > Subject: Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
> > reporters for NPA
> >
> > On Thu, 26 Nov 2020 19:32:50 +0530 George Cherian wrote:
> > > Add health reporters for RVU NPA block.
> > > NPA Health reporters handle following HW event groups
> > >  - GENERAL events
> > >  - ERROR events
> > >  - RAS events
> > >  - RVU event
> > > An event counter per event is maintained in SW.
> > >
> > > Output:
> > >  # devlink health
> > >  pci/0002:01:00.0:
> > >reporter hw_npa
> > >  state healthy error 0 recover 0  # devlink  health dump show
> > > pci/0002:01:00.0 reporter hw_npa
> > >  NPA_AF_GENERAL:
> > > Unmap PF Error: 0
> > > NIX:
> > > 0: free disabled RX: 0 free disabled TX: 0
> > > 1: free disabled RX: 0 free disabled TX: 0
> > > Free Disabled for SSO: 0
> > > Free Disabled for TIM: 0
> > > Free Disabled for DPI: 0
> > > Free Disabled for AURA: 0
> > > Alloc Disabled for Resvd: 0
> > >   NPA_AF_ERR:
> > > Memory Fault on NPA_AQ_INST_S read: 0
> > > Memory Fault on NPA_AQ_RES_S write: 0
> > > AQ Doorbell Error: 0
> > > Poisoned data on NPA_AQ_INST_S read: 0
> > > Poisoned data on NPA_AQ_RES_S write: 0
> > > Poisoned data on HW context read: 0
> > >   NPA_AF_RVU:
> > > Unmap Slot Error: 0
> >
> > You seem to have missed the feedback Saeed and I gave you on v2.
> >
> > Did you test this with the errors actually triggering? Devlink should
> > store only
> Yes, the same was tested using devlink health test interface by injecting
> errors.
> The dump gets generated automatically and the counters do get out of sync,
> in case of continuous error.
> That wouldn't be much of an issue as the user could manually trigger a dump
> clear and Re-dump the counters to get the exact status of the counters at
> any point of time.

Now that the recover op is added, the devlink error counter and recover counter
will be correct. The internal counter for each event is needed only to tell,
within a specific reporter, how many such events occurred.

Following is a log snippet of the devlink health test run on the hw_nix reporter.
# for i in `seq 1 33` ; do devlink health test pci/0002:01:00.0 reporter hw_nix; done
// Inject 33 errors (16 of NIX_AF_RVU and 17 of NIX_AF_RAS and NIX_AF_GENERAL errors)
# devlink health 
pci/0002:01:00.0:
  reporter hw_npa
state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
  reporter hw_nix
state healthy error 250 recover 250 last_dump_date 1970-01-01 last_dump_time 00:04:16 grace_period 0 auto_recover true auto_dump true
# devlink health dump show pci/0002:01:00.0 reporter hw_nix
NIX_AF_GENERAL:
Memory Fault on NIX_AQ_INST_S read: 1 
Memory Fault on NIX_AQ_RES_S write: 1 
AQ Doorbell error: 1 
Rx on unmapped PF_FUNC: 1 
Rx multicast replication error: 1 
Memory fault on NIX_RX_MCE_S read: 1 
Memory fault on multicast WQE read: 1 
Memory fault on mirror WQE read: 1 
Memory fault on mirror pkt write: 1 
Memory fault on multicast pkt write: 1
  NIX_AF_RAS:
Poisoned data on NIX_AQ_INST_S read: 1 
Poisoned data on NIX_AQ_RES_S write: 1 
Poisoned data on HW context read: 1 
Poisoned data on packet read from mirror buffer: 1 
Poisoned data on packet read from mcast buffer: 1 
Poisoned data on WQE read from mirror buffer: 1 
Poisoned data on WQE read from multicast buffer: 1 
Poisoned data on NIX_RX_MCE_S read: 1
  NIX_AF_RVU:
Unmap Slot Error: 0
#

Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-30 Thread George Cherian

Hi Jakub,

> -Original Message-
> From: Jakub Kicinski 
> Sent: Tuesday, December 1, 2020 7:59 AM
> To: George Cherian 
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> da...@davemloft.net; Sunil Kovvuri Goutham ;
> Linu Cherian ; Geethasowjanya Akula
> ; masahi...@kernel.org;
> willemdebruijn.ker...@gmail.com; sa...@kernel.org; j...@resnulli.us
> Subject: Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
> reporters for NPA
> 
> On Thu, 26 Nov 2020 19:32:50 +0530 George Cherian wrote:
> > Add health reporters for RVU NPA block.
> > NPA Health reporters handle following HW event groups
> >  - GENERAL events
> >  - ERROR events
> >  - RAS events
> >  - RVU event
> > An event counter per event is maintained in SW.
> >
> > Output:
> >  # devlink health
> >  pci/0002:01:00.0:
> >reporter hw_npa
> >  state healthy error 0 recover 0
> >  # devlink  health dump show pci/0002:01:00.0 reporter hw_npa
> >  NPA_AF_GENERAL:
> > Unmap PF Error: 0
> > NIX:
> > 0: free disabled RX: 0 free disabled TX: 0
> > 1: free disabled RX: 0 free disabled TX: 0
> > Free Disabled for SSO: 0
> > Free Disabled for TIM: 0
> > Free Disabled for DPI: 0
> > Free Disabled for AURA: 0
> > Alloc Disabled for Resvd: 0
> >   NPA_AF_ERR:
> > Memory Fault on NPA_AQ_INST_S read: 0
> > Memory Fault on NPA_AQ_RES_S write: 0
> > AQ Doorbell Error: 0
> > Poisoned data on NPA_AQ_INST_S read: 0
> > Poisoned data on NPA_AQ_RES_S write: 0
> > Poisoned data on HW context read: 0
> >   NPA_AF_RVU:
> > Unmap Slot Error: 0
> 
> You seem to have missed the feedback Saeed and I gave you on v2.
> 
> Did you test this with the errors actually triggering? Devlink should store only
Yes, this was tested by injecting errors through the devlink health test
interface.
The dump is generated automatically, and the counters do get out of sync under
continuous errors.
That should not be much of an issue, as the user can manually clear the dump
and re-dump the counters to get their exact state at any point in time.
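
As a rough illustration of why a stored dump can lag behind the counters, the
following userspace model assumes devlink keeps at most one stored dump per
reporter until it is cleared, and that showing the dump takes a fresh one when
none is stored. All names are hypothetical; this is a sketch, not the devlink
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct dump_model {
	uint64_t live_count;   /* driver's running event counter */
	bool     has_dump;     /* assumption: at most one stored dump */
	uint64_t dumped_count; /* counter value captured in that dump */
};

/* An event arrives; an automatic dump is taken only if none is stored. */
static void model_event(struct dump_model *m)
{
	assert(m);
	m->live_count++;
	if (!m->has_dump) {
		m->dumped_count = m->live_count;
		m->has_dump = true;
	}
}

/* Models "devlink health dump clear". */
static void model_dump_clear(struct dump_model *m)
{
	m->has_dump = false;
}

/* Models "devlink health dump show": reuse the stored dump, else take one. */
static uint64_t model_dump_show(struct dump_model *m)
{
	if (!m->has_dump) {
		m->dumped_count = m->live_count;
		m->has_dump = true;
	}
	return m->dumped_count;
}
```

After three events the stored dump still shows the value captured at the first
one; a clear followed by a show returns the current value.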

> one dump, are the counters not going to get out of sync unless something
> clears the dump every time it triggers?

Regards,
-George

[PATCHv5 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX

2020-11-26 Thread George Cherian

Add health reporters for RVU NIX block.
NIX health reporters handle the following HW event groups
 - GENERAL events
 - RAS events
 - RVU event
An event counter per event is maintained in SW.

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter hw_npa
 state healthy error 0 recover 0
   reporter hw_nix
 state healthy error 0 recover 0
 # devlink  health dump show pci/0002:01:00.0 reporter hw_nix
  NIX_AF_GENERAL:
 Memory Fault on NIX_AQ_INST_S read: 0
 Memory Fault on NIX_AQ_RES_S write: 0
 AQ Doorbell error: 0
 Rx on unmapped PF_FUNC: 0
 Rx multicast replication error: 0
 Memory fault on NIX_RX_MCE_S read: 0
 Memory fault on multicast WQE read: 0
 Memory fault on mirror WQE read: 0
 Memory fault on mirror pkt write: 0
 Memory fault on multicast pkt write: 0
   NIX_AF_RAS:
 Poisoned data on NIX_AQ_INST_S read: 0
 Poisoned data on NIX_AQ_RES_S write: 0
 Poisoned data on HW context read: 0
 Poisoned data on packet read from mirror buffer: 0
 Poisoned data on packet read from mcast buffer: 0
 Poisoned data on WQE read from mirror buffer: 0
 Poisoned data on WQE read from multicast buffer: 0
 Poisoned data on NIX_RX_MCE_S read: 0
   NIX_AF_RVU:
 Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../marvell/octeontx2/af/rvu_devlink.c| 414 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  31 ++
 .../marvell/octeontx2/af/rvu_struct.h |  10 +
 3 files changed, 453 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 377264d65d0c..2f20d8b9eef3 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -35,6 +35,131 @@ static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
return devlink_fmsg_pair_nest_end(fmsg);
 }
 
+static irqreturn_t rvu_nix_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_ctx *nix_event_context;
+   struct rvu_nix_event_cnt *nix_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_context = rvu_dl->nix_event_ctx;
+   nix_event_count = &nix_event_context->nix_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_RVU_INT);
+   nix_event_context->nix_af_rvu_int = intr;
+
+   if (intr & BIT_ULL(0))
+   nix_event_count->unmap_slot_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT, intr);
+   rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT_ENA_W1C, ~0ULL);
+   devlink_health_report(rvu_dl->rvu_nix_health_reporter, "NIX_AF_RVU Error",
+ nix_event_context);
+
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t rvu_nix_af_err_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_ctx *nix_event_context;
+   struct rvu_nix_event_cnt *nix_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_context = rvu_dl->nix_event_ctx;
+   nix_event_count = &nix_event_context->nix_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_ERR_INT);
+   nix_event_context->nix_af_rvu_err = intr;
+
+   if (intr & BIT_ULL(14))
+   nix_event_count->aq_inst_count++;
+   if (intr & BIT_ULL(13))
+   nix_event_count->aq_res_count++;
+   if (intr & BIT_ULL(12))
+   nix_event_count->aq_db_count++;
+   if (intr & BIT_ULL(6))
+   nix_event_count->rx_on_unmap_pf_count++;
+   if (intr & BIT_ULL(5))
+   nix_event_count->rx_mcast_repl_count++;
+   if (intr & BIT_ULL(4))
+   nix_event_count->rx_mcast_memfault_count++;
+   if (intr & BIT_ULL(3))
+   nix_event_count->rx_mcast_wqe_memfault_count++;
+   if (intr & BIT_ULL(2))
+   nix_event_count->rx_mirror_wqe_memfault_count++;
+   if (intr & BIT_ULL(1))
+   nix_event_count->rx_mirror_pktw_memfault_count++;
+   if (intr & BIT_ULL(0))
+   nix_event_count->rx_mcast_pktw_memfault_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT, intr);
+   rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT_ENA_W1C, ~0ULL);
+   dev

[PATCHv5 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-26 Thread George Cherian

Add health reporters for RVU NPA block.
NPA health reporters handle the following HW event groups
 - GENERAL events
 - ERROR events
 - RAS events
 - RVU event
An event counter per event is maintained in SW.

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter hw_npa
 state healthy error 0 recover 0
 # devlink  health dump show pci/0002:01:00.0 reporter hw_npa
 NPA_AF_GENERAL:
Unmap PF Error: 0
NIX:
0: free disabled RX: 0 free disabled TX: 0
1: free disabled RX: 0 free disabled TX: 0
Free Disabled for SSO: 0
Free Disabled for TIM: 0
Free Disabled for DPI: 0
Free Disabled for AURA: 0
Alloc Disabled for Resvd: 0
  NPA_AF_ERR:
Memory Fault on NPA_AQ_INST_S read: 0
Memory Fault on NPA_AQ_RES_S write: 0
AQ Doorbell Error: 0
Poisoned data on NPA_AQ_INST_S read: 0
Poisoned data on NPA_AQ_RES_S write: 0
Poisoned data on HW context read: 0
  NPA_AF_RVU:
Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../marvell/octeontx2/af/rvu_devlink.c| 498 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  31 ++
 .../marvell/octeontx2/af/rvu_struct.h |  23 +
 3 files changed, 551 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 04ef945e7e75..377264d65d0c 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -5,10 +5,504 @@
  *
  */
 
+#include
+
 #include "rvu.h"
+#include "rvu_reg.h"
+#include "rvu_struct.h"
 
 #define DRV_NAME "octeontx2-af"
 
+static int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
+{
+   int err;
+
+   err = devlink_fmsg_pair_nest_start(fmsg, name);
+   if (err)
+   return err;
+
+   return  devlink_fmsg_obj_nest_start(fmsg);
+}
+
+static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
+{
+   int err;
+
+   err = devlink_fmsg_obj_nest_end(fmsg);
+   if (err)
+   return err;
+
+   return devlink_fmsg_pair_nest_end(fmsg);
+}
+
+static bool rvu_common_request_irq(struct rvu *rvu, int offset,
+  const char *name, irq_handler_t fn)
+{
+   struct rvu_devlink *rvu_dl = rvu->rvu_dl;
+   int rc;
+
+   sprintf(&rvu->irq_name[offset * NAME_SIZE], name);
+   rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0,
+    &rvu->irq_name[offset * NAME_SIZE], rvu_dl);
+   if (rc)
+   dev_warn(rvu->dev, "Failed to register %s irq\n", name);
+   else
+   rvu->irq_allocated[offset] = true;
+
+   return rvu->irq_allocated[offset];
+}
+
+static irqreturn_t rvu_npa_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_npa_event_ctx *npa_event_context;
+   struct rvu_npa_event_cnt *npa_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   npa_event_context = rvu_dl->npa_event_ctx;
+   npa_event_count = &npa_event_context->npa_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NPA_AF_RVU_INT);
+   npa_event_context->npa_af_rvu_int = intr;
+
+   if (intr & BIT_ULL(0))
+   npa_event_count->unmap_slot_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT, intr);
+   rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT_ENA_W1C, ~0ULL);
+   devlink_health_report(rvu_dl->rvu_npa_health_reporter, "NPA_AF_RVU Error",
+ npa_event_context);
+
+   return IRQ_HANDLED;
+}
+
+static int rvu_npa_inpq_to_cnt(u16 in,
+  struct rvu_npa_event_cnt *npa_event_count)
+{
+   switch (in) {
+   case 0:
+   return 0;
+   case BIT(NPA_INPQ_NIX0_RX):
+   return npa_event_count->free_dis_nix0_rx_count++;
+   case BIT(NPA_INPQ_NIX0_TX):
+   return npa_event_count->free_dis_nix0_tx_count++;
+   case BIT(NPA_INPQ_NIX1_RX):
+   return npa_event_count->free_dis_nix1_rx_count++;
+   case BIT(NPA_INPQ_NIX1_TX):
+   return npa_event_count->free_dis_nix1_tx_count++;
+   case BIT(NPA_INPQ_SSO):
+   return npa_event_count->free_dis_sso_count++;
+   case BIT(NPA_INPQ_TIM):
+   return npa_event_count->free_dis_tim_count++;
+   case BIT(NPA_INPQ_DPI):
+   return npa_event_count->free_dis_dpi_count++;
+   case BIT(NPA_INPQ_AURA_OP):
+   return npa_event_cou

[PATCHv5 net-next 1/3] octeontx2-af: Add devlink support to af driver

2020-11-26 Thread George Cherian

Add devlink support to AF driver. Basic devlink support is added.
Currently info_get is the only supported devlink op.

devlink output looks like this:
 # devlink dev
 pci/0002:01:00.0
 # devlink dev info
 pci/0002:01:00.0:
  driver octeontx2-af
  versions:
  fixed:
mbox version: 9

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../net/ethernet/marvell/octeontx2/Kconfig|  1 +
 .../ethernet/marvell/octeontx2/af/Makefile|  2 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |  9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  4 ++
 .../marvell/octeontx2/af/rvu_devlink.c| 72 +++
 .../marvell/octeontx2/af/rvu_devlink.h| 20 ++
 6 files changed, 106 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 543a1d047567..16caa02095fe 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -9,6 +9,7 @@ config OCTEONTX2_MBOX
 config OCTEONTX2_AF
tristate "Marvell OcteonTX2 RVU Admin Function driver"
select OCTEONTX2_MBOX
+   select NET_DEVLINK
depends on (64BIT && COMPILE_TEST) || ARM64
depends on PCI
help
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index 7100d1dd856e..eb535c98ca38 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -10,4 +10,4 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 octeontx2_mbox-y := mbox.o rvu_trace.o
 octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \
  rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o rvu_npc_fs.o \
- rvu_cpt.o
+ rvu_cpt.o rvu_devlink.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 9f901c0edcbb..e8fd712860a1 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -2826,17 +2826,23 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (err)
goto err_flr;
 
+   err = rvu_register_dl(rvu);
+   if (err)
+   goto err_irq;
+
rvu_setup_rvum_blk_revid(rvu);
 
/* Enable AF's VFs (if any) */
err = rvu_enable_sriov(rvu);
if (err)
-   goto err_irq;
+   goto err_dl;
 
/* Initialize debugfs */
rvu_dbg_init(rvu);
 
return 0;
+err_dl:
+   rvu_unregister_dl(rvu);
 err_irq:
rvu_unregister_interrupts(rvu);
 err_flr:
@@ -2868,6 +2874,7 @@ static void rvu_remove(struct pci_dev *pdev)
 
rvu_dbg_exit(rvu);
rvu_unregister_interrupts(rvu);
+   rvu_unregister_dl(rvu);
rvu_flr_wq_destroy(rvu);
rvu_cgx_exit(rvu);
rvu_fwdata_exit(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index b6c0977499ab..b1a6ecfd563e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -12,7 +12,10 @@
 #define RVU_H
 
 #include 
+#include 
+
 #include "rvu_struct.h"
+#include "rvu_devlink.h"
 #include "common.h"
 #include "mbox.h"
 #include "npc.h"
@@ -422,6 +425,7 @@ struct rvu {
 #ifdef CONFIG_DEBUG_FS
struct rvu_debugfs  rvu_dbg;
 #endif
+   struct rvu_devlink  *rvu_dl;
 };
 
 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
new file mode 100644
index ..04ef945e7e75
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Devlink
+ *
+ * Copyright (C) 2020 Marvell.
+ *
+ */
+
+#include "rvu.h"
+
+#define DRV_NAME "octeontx2-af"
+
+static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+   struct netlink_ext_ack *extack)
+{
+   char buf[10];
+   int err;
+
+   err = devlink_info_driver_name_put(req, DRV_NAME);
+   if (err)
+   return err;
+
+   sprintf(buf, "%X", OTX2_MBOX_VERSION);
+   return devlink_info_version_fixed_put(req, "mbox version:", buf);
+}
+
+static const struct devlink_ops rvu_devlink_ops = {
+   .info_get = rvu_devlink_info_get,
+};
+
+int rvu_register_dl(struct rvu *rvu)
+{
+   struct

[PATCHv5 net-next 0/3] Add devlink and devlink health reporters to

2020-11-26 Thread George Cherian



Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA and NIX blocks.
These reporters report the error count in respective blocks.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

Change-log:
v5 
 - Address Jiri's comment
 - use devlink_fmsg_arr_pair_nest_start() for NIX blocks 

v4 
 - Rebase to net-next (no logic changes).
 
v3
 - Address Saeed's comments on v2.
 - Renamed the reporter name as hw_*.
 - Call devlink_health_report() when an event is raised.
 - Added recover op too.

v2
 - Address Willem's comments on v1.
 - Fixed the sparse issues, reported by Jakub.


George Cherian (3):
  octeontx2-af: Add devlink support to af driver
  octeontx2-af: Add devlink health reporters for NPA
  octeontx2-af: Add devlink health reporters for NIX

 .../net/ethernet/marvell/octeontx2/Kconfig|   1 +
 .../ethernet/marvell/octeontx2/af/Makefile|   2 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   4 +
 .../marvell/octeontx2/af/rvu_devlink.c| 978 ++
 .../marvell/octeontx2/af/rvu_devlink.h|  82 ++
 .../marvell/octeontx2/af/rvu_struct.h |  33 +
 7 files changed, 1107 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

-- 
2.25.1

Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-23 Thread George Cherian

Hi Jiri,

> -Original Message-
> From: Jiri Pirko 
> Sent: Monday, November 23, 2020 3:52 PM
> To: George Cherian 
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham
> ; Linu Cherian ;
> Geethasowjanya Akula ; masahi...@kernel.org;
> willemdebruijn.ker...@gmail.com; sa...@kernel.org
> Subject: Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health
> reporters for NPA
> 
> Mon, Nov 23, 2020 at 03:49:06AM CET, gcher...@marvell.com wrote:
> >
> >
> >> -Original Message-
> >> From: Jiri Pirko 
> >> Sent: Saturday, November 21, 2020 7:44 PM
> >> To: George Cherian 
> >> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> >> k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham
> >> ; Linu Cherian ;
> >> Geethasowjanya Akula ; masahi...@kernel.org;
> >> willemdebruijn.ker...@gmail.com; sa...@kernel.org
> >> Subject: Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health
> >> reporters for NPA
> >>
> >> Sat, Nov 21, 2020 at 05:02:00AM CET, george.cher...@marvell.com wrote:
> >> >Add health reporters for RVU NPA block.
> >> >NPA Health reporters handle following HW event groups
> >> > - GENERAL events
> >> > - ERROR events
> >> > - RAS events
> >> > - RVU event
> >> >An event counter per event is maintained in SW.
> >> >
> >> >Output:
> >> > # devlink health
> >> > pci/0002:01:00.0:
> >> >   reporter npa
> >> > state healthy error 0 recover 0  # devlink  health dump show
> >> >pci/0002:01:00.0 reporter npa
> >> > NPA_AF_GENERAL:
> >> >Unmap PF Error: 0
> >> >Free Disabled for NIX0 RX: 0
> >> >Free Disabled for NIX0 TX: 0
> >> >Free Disabled for NIX1 RX: 0
> >> >Free Disabled for NIX1 TX: 0
> >>
> >> This is for 2 ports if I'm not mistaken. Then you need to have this
> >> reporter per-port. Register ports and have reporter for each.
> >>
> >No, these are not port specific reports.
> >NIX is the Network Interface Controller co-processor block.
> >There are (max of) 2 such co-processor blocks per SoC.
> 
> Ah. I see. In that case, could you please structure the json differently.
> Don't concatenate the number with the string. Instead of that, please have 2
> subtrees, one for each NIX.
> 
NPA_AF_GENERAL:
    Unmap PF Error: 0
    Free Disabled for NIX0
        RX: 0
        TX: 0
    Free Disabled for NIX1
        RX: 0
        TX: 0

Something like this?
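
The proposed layout can be sketched in plain userspace C as below. This is only
a model of the output shape (one subtree per NIX block instead of keys with the
block number concatenated), not the devlink fmsg code from the patch.
format_free_disabled() and struct nix_free_dis are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct nix_free_dis {
	uint64_t rx;
	uint64_t tx;
};

/*
 * Emit one subtree per NIX block rather than "NIX0 RX" / "NIX1 TX" keys.
 * Returns the number of characters written (excluding the NUL), or -1 if
 * the output does not fit in the buffer.
 */
static int format_free_disabled(char *buf, size_t len,
				const struct nix_free_dis *nix, int nr_nix)
{
	int off = 0;

	assert(buf && nix && len > 0);
	buf[0] = '\0';
	for (int i = 0; i < nr_nix; i++) {
		int n = snprintf(buf + off, len - (size_t)off,
				 "Free Disabled for NIX%d\n  RX: %llu\n  TX: %llu\n",
				 i, (unsigned long long)nix[i].rx,
				 (unsigned long long)nix[i].tx);
		if (n < 0 || (size_t)n >= len - (size_t)off)
			return -1;
		off += n;
	}
	return off;
}
```

With this shape, a consumer can iterate over the NIX subtrees without parsing
the block number out of a key string.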

Regards,
-George
> 
> >
> >Moreover, this is an NPA (Network Pool/Buffer Allocator co-processor)
> >reporter.
> >This tells whether a free or alloc operation is skipped due to the
> >configurations set by other co-processor blocks (NIX,SSO,TIM etc).
> >
> >https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/marvell/octeontx2.html
> >> NAK.

Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-22 Thread George Cherian




> -Original Message-
> From: Jiri Pirko 
> Sent: Saturday, November 21, 2020 7:44 PM
> To: George Cherian 
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham
> ; Linu Cherian ;
> Geethasowjanya Akula ; masahi...@kernel.org;
> willemdebruijn.ker...@gmail.com; sa...@kernel.org
> Subject: Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health
> reporters for NPA
> 
> Sat, Nov 21, 2020 at 05:02:00AM CET, george.cher...@marvell.com wrote:
> >Add health reporters for RVU NPA block.
> >NPA Health reporters handle following HW event groups
> > - GENERAL events
> > - ERROR events
> > - RAS events
> > - RVU event
> >An event counter per event is maintained in SW.
> >
> >Output:
> > # devlink health
> > pci/0002:01:00.0:
> >   reporter npa
> > state healthy error 0 recover 0
> > # devlink  health dump show pci/0002:01:00.0 reporter npa
> > NPA_AF_GENERAL:
> >Unmap PF Error: 0
> >Free Disabled for NIX0 RX: 0
> >Free Disabled for NIX0 TX: 0
> >Free Disabled for NIX1 RX: 0
> >Free Disabled for NIX1 TX: 0
> 
> This is for 2 ports if I'm not mistaken. Then you need to have this reporter
> per-port. Register ports and have reporter for each.
> 
No, these are not port specific reports.
NIX is the Network Interface Controller co-processor block.
There are (max of) 2 such co-processor blocks per SoC.

Moreover, this is an NPA (Network Pool/Buffer Allocator co-processor) reporter.
It tells whether a free or alloc operation was skipped due to the
configuration set by other co-processor blocks (NIX, SSO, TIM, etc.).

https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/marvell/octeontx2.html
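
A minimal userspace sketch of that idea, for illustration only: each requester
block (NIX0/NIX1 RX and TX, SSO, TIM, DPI) has one bit in a "free disabled"
field, and the reporter keeps one counter per requester. The bit positions
below are placeholders, since the real enum lives in rvu_struct.h and is not
shown in this thread. Unlike the switch in the patch, which matches single-bit
values, this loop counts every bit that is set.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions; assumed for illustration, not from the driver. */
enum inpq {
	INPQ_NIX0_RX, INPQ_NIX0_TX, INPQ_NIX1_RX, INPQ_NIX1_TX,
	INPQ_SSO, INPQ_TIM, INPQ_DPI, INPQ_MAX
};

struct npa_free_dis_cnt {
	uint64_t cnt[INPQ_MAX];   /* one counter per requester block */
};

/* Count each requester whose bit is set; returns how many bits were seen. */
static int npa_count_free_disabled(uint16_t field, struct npa_free_dis_cnt *c)
{
	int hits = 0;

	assert(c);
	for (int q = 0; q < INPQ_MAX; q++) {
		if (field & (1u << q)) {
			c->cnt[q]++;
			hits++;
		}
	}
	return hits;
}
```

The counters are indexed by requester block, not by network port, which is why
a per-port reporter would not fit this data.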
> NAK.

[PATCHv4 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-20 Thread George Cherian

Add health reporters for RVU NPA block.
NPA health reporters handle the following HW event groups
 - GENERAL events
 - ERROR events
 - RAS events
 - RVU event
An event counter per event is maintained in SW.

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter npa
 state healthy error 0 recover 0
 # devlink  health dump show pci/0002:01:00.0 reporter npa
 NPA_AF_GENERAL:
Unmap PF Error: 0
Free Disabled for NIX0 RX: 0
Free Disabled for NIX0 TX: 0
Free Disabled for NIX1 RX: 0
Free Disabled for NIX1 TX: 0
Free Disabled for SSO: 0
Free Disabled for TIM: 0
Free Disabled for DPI: 0
Free Disabled for AURA: 0
Alloc Disabled for Resvd: 0
  NPA_AF_ERR:
Memory Fault on NPA_AQ_INST_S read: 0
Memory Fault on NPA_AQ_RES_S write: 0
AQ Doorbell Error: 0
Poisoned data on NPA_AQ_INST_S read: 0
Poisoned data on NPA_AQ_RES_S write: 0
Poisoned data on HW context read: 0
  NPA_AF_RVU:
Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../marvell/octeontx2/af/rvu_devlink.c| 492 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  31 ++
 .../marvell/octeontx2/af/rvu_struct.h |  23 +
 3 files changed, 545 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 04ef945e7e75..b7f0691d86b0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -5,10 +5,498 @@
  *
  */
 
+#include
+
 #include "rvu.h"
+#include "rvu_reg.h"
+#include "rvu_struct.h"
 
 #define DRV_NAME "octeontx2-af"
 
+static int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
+{
+   int err;
+
+   err = devlink_fmsg_pair_nest_start(fmsg, name);
+   if (err)
+   return err;
+
+   return  devlink_fmsg_obj_nest_start(fmsg);
+}
+
+static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
+{
+   int err;
+
+   err = devlink_fmsg_obj_nest_end(fmsg);
+   if (err)
+   return err;
+
+   return devlink_fmsg_pair_nest_end(fmsg);
+}
+
+static bool rvu_common_request_irq(struct rvu *rvu, int offset,
+  const char *name, irq_handler_t fn)
+{
+   struct rvu_devlink *rvu_dl = rvu->rvu_dl;
+   int rc;
+
+   sprintf(&rvu->irq_name[offset * NAME_SIZE], name);
+   rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0,
+    &rvu->irq_name[offset * NAME_SIZE], rvu_dl);
+   if (rc)
+   dev_warn(rvu->dev, "Failed to register %s irq\n", name);
+   else
+   rvu->irq_allocated[offset] = true;
+
+   return rvu->irq_allocated[offset];
+}
+
+static irqreturn_t rvu_npa_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_npa_event_ctx *npa_event_context;
+   struct rvu_npa_event_cnt *npa_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   npa_event_context = rvu_dl->npa_event_ctx;
+   npa_event_count = &npa_event_context->npa_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NPA_AF_RVU_INT);
+   npa_event_context->npa_af_rvu_int = intr;
+
+   if (intr & BIT_ULL(0))
+   npa_event_count->unmap_slot_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT, intr);
+   rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT_ENA_W1C, ~0ULL);
+   devlink_health_report(rvu_dl->rvu_npa_health_reporter, "NPA_AF_RVU Error",
+ npa_event_context);
+
+   return IRQ_HANDLED;
+}
+
+static int rvu_npa_inpq_to_cnt(u16 in,
+  struct rvu_npa_event_cnt *npa_event_count)
+{
+   switch (in) {
+   case 0:
+   return 0;
+   case BIT(NPA_INPQ_NIX0_RX):
+   return npa_event_count->free_dis_nix0_rx_count++;
+   case BIT(NPA_INPQ_NIX0_TX):
+   return npa_event_count->free_dis_nix0_tx_count++;
+   case BIT(NPA_INPQ_NIX1_RX):
+   return npa_event_count->free_dis_nix1_rx_count++;
+   case BIT(NPA_INPQ_NIX1_TX):
+   return npa_event_count->free_dis_nix1_tx_count++;
+   case BIT(NPA_INPQ_SSO):
+   return npa_event_count->free_dis_sso_count++;
+   case BIT(NPA_INPQ_TIM):
+   return npa_event_count->free_dis_tim_count++;
+   case BIT(NPA_INPQ_DPI):
+   return npa_event_count->free_dis_dpi_count++;
+   case BIT(NPA_INPQ_AURA_OP):

[PATCHv4 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX

2020-11-20 Thread George Cherian

Add health reporters for RVU NIX block.
NIX health reporters handle the following HW event groups
 - GENERAL events
 - RAS events
 - RVU event
An event counter per event is maintained in SW.

Output:
 # ./devlink health
 pci/0002:01:00.0:
   reporter npa
 state healthy error 0 recover 0
   reporter nix
 state healthy error 0 recover 0
 # ./devlink  health dump show pci/0002:01:00.0 reporter nix
  NIX_AF_GENERAL:
 Memory Fault on NIX_AQ_INST_S read: 0
 Memory Fault on NIX_AQ_RES_S write: 0
 AQ Doorbell error: 0
 Rx on unmapped PF_FUNC: 0
 Rx multicast replication error: 0
 Memory fault on NIX_RX_MCE_S read: 0
 Memory fault on multicast WQE read: 0
 Memory fault on mirror WQE read: 0
 Memory fault on mirror pkt write: 0
 Memory fault on multicast pkt write: 0
   NIX_AF_RAS:
 Poisoned data on NIX_AQ_INST_S read: 0
 Poisoned data on NIX_AQ_RES_S write: 0
 Poisoned data on HW context read: 0
 Poisoned data on packet read from mirror buffer: 0
 Poisoned data on packet read from mcast buffer: 0
 Poisoned data on WQE read from mirror buffer: 0
 Poisoned data on WQE read from multicast buffer: 0
 Poisoned data on NIX_RX_MCE_S read: 0
   NIX_AF_RVU:
 Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../marvell/octeontx2/af/rvu_devlink.c| 414 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  31 ++
 .../marvell/octeontx2/af/rvu_struct.h |  10 +
 3 files changed, 453 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index b7f0691d86b0..c02d0f56ae7a 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -35,6 +35,131 @@ static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
return devlink_fmsg_pair_nest_end(fmsg);
 }
 
+static irqreturn_t rvu_nix_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_ctx *nix_event_context;
+   struct rvu_nix_event_cnt *nix_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_context = rvu_dl->nix_event_ctx;
+   nix_event_count = &nix_event_context->nix_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_RVU_INT);
+   nix_event_context->nix_af_rvu_int = intr;
+
+   if (intr & BIT_ULL(0))
+   nix_event_count->unmap_slot_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT, intr);
+   rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT_ENA_W1C, ~0ULL);
+   devlink_health_report(rvu_dl->rvu_nix_health_reporter, "NIX_AF_RVU Error",
+ nix_event_context);
+
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t rvu_nix_af_err_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_ctx *nix_event_context;
+   struct rvu_nix_event_cnt *nix_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_context = rvu_dl->nix_event_ctx;
+   nix_event_count = &nix_event_context->nix_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_ERR_INT);
+   nix_event_context->nix_af_rvu_err = intr;
+
+   if (intr & BIT_ULL(14))
+   nix_event_count->aq_inst_count++;
+   if (intr & BIT_ULL(13))
+   nix_event_count->aq_res_count++;
+   if (intr & BIT_ULL(12))
+   nix_event_count->aq_db_count++;
+   if (intr & BIT_ULL(6))
+   nix_event_count->rx_on_unmap_pf_count++;
+   if (intr & BIT_ULL(5))
+   nix_event_count->rx_mcast_repl_count++;
+   if (intr & BIT_ULL(4))
+   nix_event_count->rx_mcast_memfault_count++;
+   if (intr & BIT_ULL(3))
+   nix_event_count->rx_mcast_wqe_memfault_count++;
+   if (intr & BIT_ULL(2))
+   nix_event_count->rx_mirror_wqe_memfault_count++;
+   if (intr & BIT_ULL(1))
+   nix_event_count->rx_mirror_pktw_memfault_count++;
+   if (intr & BIT_ULL(0))
+   nix_event_count->rx_mcast_pktw_memfault_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT, intr);
+   rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT_ENA_W1C, ~0ULL);
+   dev

[PATCHv4 net-next 1/3] octeontx2-af: Add devlink support to af driver

2020-11-20 Thread George Cherian

Add basic devlink support to the AF driver.
Currently, info_get is the only supported devlink op.

devlink output looks like this:
 # devlink dev
 pci/0002:01:00.0
 # devlink dev info
 pci/0002:01:00.0:
  driver octeontx2-af
  versions:
  fixed:
mbox version: 9
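
As a rough illustration of the info_get path described above, the following
user-space sketch formats a version number the way the callback does before
handing the string to devlink. The helper format_mbox_version() and the value
0x0009 are assumptions made for this example (the value is inferred from the
sample output). The driver itself uses sprintf() into a 10-byte buffer;
snprintf() is swapped in here only to make the buffer bound explicit.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed example value, inferred from the "mbox version: 9" output. */
#define MBOX_VERSION_EXAMPLE 0x0009

/*
 * Hypothetical helper: render the version as upper-case hex, as the
 * "%X" format in the patch does. Returns 0 on success and -1 if the
 * text (including the terminating NUL) does not fit in the buffer.
 */
static int format_mbox_version(char *buf, size_t len, unsigned int version)
{
	int n = snprintf(buf, len, "%X", version);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

With a 10-byte buffer, as in the patch, any 32-bit value fits, since eight hex
digits plus the terminator need nine bytes.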

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../net/ethernet/marvell/octeontx2/Kconfig|  1 +
 .../ethernet/marvell/octeontx2/af/Makefile|  2 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |  9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  4 ++
 .../marvell/octeontx2/af/rvu_devlink.c| 72 +++
 .../marvell/octeontx2/af/rvu_devlink.h| 20 ++
 6 files changed, 106 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 543a1d047567..16caa02095fe 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -9,6 +9,7 @@ config OCTEONTX2_MBOX
 config OCTEONTX2_AF
tristate "Marvell OcteonTX2 RVU Admin Function driver"
select OCTEONTX2_MBOX
+   select NET_DEVLINK
depends on (64BIT && COMPILE_TEST) || ARM64
depends on PCI
help
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index 7100d1dd856e..eb535c98ca38 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -10,4 +10,4 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 octeontx2_mbox-y := mbox.o rvu_trace.o
 octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \
  rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o rvu_npc_fs.o \
- rvu_cpt.o
+ rvu_cpt.o rvu_devlink.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 9f901c0edcbb..e8fd712860a1 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -2826,17 +2826,23 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (err)
goto err_flr;
 
+   err = rvu_register_dl(rvu);
+   if (err)
+   goto err_irq;
+
rvu_setup_rvum_blk_revid(rvu);
 
/* Enable AF's VFs (if any) */
err = rvu_enable_sriov(rvu);
if (err)
-   goto err_irq;
+   goto err_dl;
 
/* Initialize debugfs */
rvu_dbg_init(rvu);
 
return 0;
+err_dl:
+   rvu_unregister_dl(rvu);
 err_irq:
rvu_unregister_interrupts(rvu);
 err_flr:
@@ -2868,6 +2874,7 @@ static void rvu_remove(struct pci_dev *pdev)
 
rvu_dbg_exit(rvu);
rvu_unregister_interrupts(rvu);
+   rvu_unregister_dl(rvu);
rvu_flr_wq_destroy(rvu);
rvu_cgx_exit(rvu);
rvu_fwdata_exit(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index b6c0977499ab..b1a6ecfd563e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -12,7 +12,10 @@
 #define RVU_H
 
 #include 
+#include 
+
 #include "rvu_struct.h"
+#include "rvu_devlink.h"
 #include "common.h"
 #include "mbox.h"
 #include "npc.h"
@@ -422,6 +425,7 @@ struct rvu {
 #ifdef CONFIG_DEBUG_FS
struct rvu_debugfs  rvu_dbg;
 #endif
+   struct rvu_devlink  *rvu_dl;
 };
 
 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
new file mode 100644
index ..04ef945e7e75
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Devlink
+ *
+ * Copyright (C) 2020 Marvell.
+ *
+ */
+
+#include "rvu.h"
+
+#define DRV_NAME "octeontx2-af"
+
+static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+   struct netlink_ext_ack *extack)
+{
+   char buf[10];
+   int err;
+
+   err = devlink_info_driver_name_put(req, DRV_NAME);
+   if (err)
+   return err;
+
+   sprintf(buf, "%X", OTX2_MBOX_VERSION);
+   return devlink_info_version_fixed_put(req, "mbox version:", buf);
+}
+
+static const struct devlink_ops rvu_devlink_ops = {
+   .info_get = rvu_devlink_info_get,
+};
+
+int rvu_register_dl(struct rvu *rvu)
+{
+   struct

[PATCHv3 net-next 0/3] Add devlink and devlink health reporters to

2020-11-20 Thread George Cherian

Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA and NIX blocks.
These reporters report the error counts in their respective blocks.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

Change-log:
v4 
 - Rebase to net-next (no logic changes).
 
v3
 - Address Saeed's comments on v2.
 - Renamed the reporter name as hw_*.
 - Call devlink_health_report() when an event is raised.
 - Added recover op too.

v2
 - Address Willem's comments on v1.
 - Fixed the sparse issues, reported by Jakub.


George Cherian (3):
  octeontx2-af: Add devlink support to af driver
  octeontx2-af: Add devlink health reporters for NPA
  octeontx2-af: Add devlink health reporters for NIX

 .../net/ethernet/marvell/octeontx2/Kconfig|   1 +
 .../ethernet/marvell/octeontx2/af/Makefile|   2 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   4 +
 .../marvell/octeontx2/af/rvu_devlink.c| 972 ++
 .../marvell/octeontx2/af/rvu_devlink.h|  82 ++
 .../marvell/octeontx2/af/rvu_struct.h |  33 +
 7 files changed, 1101 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

-- 
2.25.1

[PATCH] octeontx2-af: Add support for RSS hashing based on Transport protocol field

2020-11-20 Thread George Cherian

Add support for choosing an RSS flow key algorithm that includes the IPv4
transport protocol field in the hashing input data. This is enabled by
default, thereby enabling 3/5-tuple hashing.
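
To illustrate what adding the protocol field to the hash input changes, here
is a minimal user-space sketch. Only the header offset (9) and the one-byte
field length are taken from this patch. The helper names ipv4_proto_byte(),
toy_hash() and hash_3tuple() are hypothetical, and the hash is 32-bit FNV-1a
standing in for the hardware hash, which this patch does not describe.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Offset of the Protocol field in an IPv4 header, matching the patch
 * (field->hdr_offset = 9, field->bytesm1 = 0, i.e. one byte).
 */
#define IPV4_PROTO_OFFSET 9

/* Hypothetical helper: fetch the protocol byte from a raw IPv4 header. */
static uint8_t ipv4_proto_byte(const uint8_t *ip_hdr)
{
	return ip_hdr[IPV4_PROTO_OFFSET];
}

/* Stand-in hash (32-bit FNV-1a); NOT the hash the NIX hardware uses. */
static uint32_t toy_hash(const uint8_t *data, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= data[i];
		h *= 16777619u;
	}
	return h;
}

/* Build a 3-tuple key (source IP, destination IP, protocol) and hash it. */
static uint32_t hash_3tuple(const uint8_t *ip_hdr)
{
	uint8_t key[9];
	size_t i;

	/* Source address is bytes 12..15, destination is bytes 16..19. */
	for (i = 0; i < 8; i++)
		key[i] = ip_hdr[12 + i];
	key[8] = ipv4_proto_byte(ip_hdr);

	return toy_hash(key, sizeof(key));
}
```

In this sketch, two packets between the same pair of addresses hash
differently when one carries TCP (protocol 6) and the other UDP (protocol 17),
which is the effect of including the protocol byte in the key.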

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: George Cherian 
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h | 1 +
 drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c  | 7 +++
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c | 3 ++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index f46de8419b77..97c8566b7da8 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -644,6 +644,7 @@ struct nix_rss_flowkey_cfg {
 #define NIX_FLOW_KEY_TYPE_INNR_SCTP BIT(16)
 #define NIX_FLOW_KEY_TYPE_INNR_ETH_DMAC BIT(17)
 #define NIX_FLOW_KEY_TYPE_VLAN BIT(20)
+#define NIX_FLOW_KEY_TYPE_IPV4_PROTO   BIT(21)
u32 flowkey_cfg; /* Flowkey types selected */
u8  group;   /* RSS context or group */
 };
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index 8bac1dd3a1c2..ef016521b277 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -2429,6 +2429,13 @@ static int set_flowkey_fields(struct nix_rx_flowkey_alg *alg, u32 flow_cfg)
/* This should be set to 1, when SEL_CHAN is set */
field->bytesm1 = 1;
break;
+   case NIX_FLOW_KEY_TYPE_IPV4_PROTO:
+   field->lid = NPC_LID_LC;
+   field->hdr_offset = 9; /* offset */
+   field->bytesm1 = 0; /* 1 byte */
+   field->ltype_match = NPC_LT_LC_IP;
+   field->ltype_mask = 0xF;
+   break;
case NIX_FLOW_KEY_TYPE_IPV4:
case NIX_FLOW_KEY_TYPE_INNR_IPV4:
field->lid = NPC_LID_LC;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 9f3d6715748e..2ab927408656 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -355,7 +355,8 @@ int otx2_rss_init(struct otx2_nic *pfvf)
rss->flowkey_cfg = rss->enable ? rss->flowkey_cfg :
   NIX_FLOW_KEY_TYPE_IPV4 | NIX_FLOW_KEY_TYPE_IPV6 |
   NIX_FLOW_KEY_TYPE_TCP | NIX_FLOW_KEY_TYPE_UDP |
-  NIX_FLOW_KEY_TYPE_SCTP | NIX_FLOW_KEY_TYPE_VLAN;
+  NIX_FLOW_KEY_TYPE_SCTP | NIX_FLOW_KEY_TYPE_VLAN |
+  NIX_FLOW_KEY_TYPE_IPV4_PROTO;
 
ret = otx2_set_flowkey_cfg(pfvf);
if (ret)
-- 
2.25.1

[PATCHv3 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-19 Thread George Cherian

Add health reporters for RVU NPA block.
NPA health reporters handle the following HW event groups:
 - GENERAL events
 - ERROR events
 - RAS events
 - RVU event
An event counter per event is maintained in SW.

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter npa
 state healthy error 0 recover 0
 # devlink  health dump show pci/0002:01:00.0 reporter npa
 NPA_AF_GENERAL:
Unmap PF Error: 0
Free Disabled for NIX0 RX: 0
Free Disabled for NIX0 TX: 0
Free Disabled for NIX1 RX: 0
Free Disabled for NIX1 TX: 0
Free Disabled for SSO: 0
Free Disabled for TIM: 0
Free Disabled for DPI: 0
Free Disabled for AURA: 0
Alloc Disabled for Resvd: 0
  NPA_AF_ERR:
Memory Fault on NPA_AQ_INST_S read: 0
Memory Fault on NPA_AQ_RES_S write: 0
AQ Doorbell Error: 0
Poisoned data on NPA_AQ_INST_S read: 0
Poisoned data on NPA_AQ_RES_S write: 0
Poisoned data on HW context read: 0
  NPA_AF_RVU:
Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../marvell/octeontx2/af/rvu_devlink.c| 492 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  31 ++
 .../marvell/octeontx2/af/rvu_struct.h |  23 +
 3 files changed, 545 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 04ef945e7e75..b7f0691d86b0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -5,10 +5,498 @@
  *
  */
 
+#include
+
 #include "rvu.h"
+#include "rvu_reg.h"
+#include "rvu_struct.h"
 
 #define DRV_NAME "octeontx2-af"
 
+static int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
+{
+   int err;
+
+   err = devlink_fmsg_pair_nest_start(fmsg, name);
+   if (err)
+   return err;
+
+   return  devlink_fmsg_obj_nest_start(fmsg);
+}
+
+static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
+{
+   int err;
+
+   err = devlink_fmsg_obj_nest_end(fmsg);
+   if (err)
+   return err;
+
+   return devlink_fmsg_pair_nest_end(fmsg);
+}
+
+static bool rvu_common_request_irq(struct rvu *rvu, int offset,
+  const char *name, irq_handler_t fn)
+{
+   struct rvu_devlink *rvu_dl = rvu->rvu_dl;
+   int rc;
+
+   sprintf(&rvu->irq_name[offset * NAME_SIZE], name);
+   rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0,
+    &rvu->irq_name[offset * NAME_SIZE], rvu_dl);
+   if (rc)
+   dev_warn(rvu->dev, "Failed to register %s irq\n", name);
+   else
+   rvu->irq_allocated[offset] = true;
+
+   return rvu->irq_allocated[offset];
+}
+
+static irqreturn_t rvu_npa_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_npa_event_ctx *npa_event_context;
+   struct rvu_npa_event_cnt *npa_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   npa_event_context = rvu_dl->npa_event_ctx;
+   npa_event_count = &npa_event_context->npa_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NPA_AF_RVU_INT);
+   npa_event_context->npa_af_rvu_int = intr;
+
+   if (intr & BIT_ULL(0))
+   npa_event_count->unmap_slot_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT, intr);
+   rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT_ENA_W1C, ~0ULL);
+   devlink_health_report(rvu_dl->rvu_npa_health_reporter, "NPA_AF_RVU Error",
+ npa_event_context);
+
+   return IRQ_HANDLED;
+}
+
+static int rvu_npa_inpq_to_cnt(u16 in,
+  struct rvu_npa_event_cnt *npa_event_count)
+{
+   switch (in) {
+   case 0:
+   return 0;
+   case BIT(NPA_INPQ_NIX0_RX):
+   return npa_event_count->free_dis_nix0_rx_count++;
+   case BIT(NPA_INPQ_NIX0_TX):
+   return npa_event_count->free_dis_nix0_tx_count++;
+   case BIT(NPA_INPQ_NIX1_RX):
+   return npa_event_count->free_dis_nix1_rx_count++;
+   case BIT(NPA_INPQ_NIX1_TX):
+   return npa_event_count->free_dis_nix1_tx_count++;
+   case BIT(NPA_INPQ_SSO):
+   return npa_event_count->free_dis_sso_count++;
+   case BIT(NPA_INPQ_TIM):
+   return npa_event_count->free_dis_tim_count++;
+   case BIT(NPA_INPQ_DPI):
+   return npa_event_count->free_dis_dpi_count++;
+   case BIT(NPA_INPQ_AURA_OP):

[PATCHv3 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX

2020-11-19 Thread George Cherian

Add health reporters for RVU NIX block.
NIX health reporters handle the following HW event groups:
 - GENERAL events
 - RAS events
 - RVU event
An event counter per event is maintained in SW.
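
The per-event counting can be pictured with the user-space sketch below. It
mirrors how the error interrupt handler in this patch tests individual bits
of the interrupt status word and bumps one software counter per bit. The bit
positions (14, 13, 12, 6) are the ones used by the handler; struct
nix_err_counters and nix_err_count() are hypothetical, trimmed-down stand-ins
and not the driver's own definitions.

```c
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Hypothetical subset of the counters; the driver keeps one per event. */
struct nix_err_counters {
	uint64_t aq_inst;		/* bit 14, aq_inst_count in the patch */
	uint64_t aq_res;		/* bit 13, aq_res_count in the patch */
	uint64_t aq_db;			/* bit 12, aq_db_count in the patch */
	uint64_t rx_on_unmap_pf;	/* bit 6, rx_on_unmap_pf_count */
};

/*
 * Decode one snapshot of the interrupt status word. Several bits may be
 * set at once, so each bit is tested independently rather than with a
 * switch statement.
 */
static void nix_err_count(uint64_t intr, struct nix_err_counters *c)
{
	if (intr & BIT_ULL(14))
		c->aq_inst++;
	if (intr & BIT_ULL(13))
		c->aq_res++;
	if (intr & BIT_ULL(12))
		c->aq_db++;
	if (intr & BIT_ULL(6))
		c->rx_on_unmap_pf++;
}
```

Because the counters only ever increase, a dump shows how often each error
has been seen since the driver was loaded, not the current hardware state.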

Output:
 # ./devlink health
 pci/0002:01:00.0:
   reporter npa
 state healthy error 0 recover 0
   reporter nix
 state healthy error 0 recover 0
 # ./devlink  health dump show pci/0002:01:00.0 reporter nix
  NIX_AF_GENERAL:
 Memory Fault on NIX_AQ_INST_S read: 0
 Memory Fault on NIX_AQ_RES_S write: 0
 AQ Doorbell error: 0
 Rx on unmapped PF_FUNC: 0
 Rx multicast replication error: 0
 Memory fault on NIX_RX_MCE_S read: 0
 Memory fault on multicast WQE read: 0
 Memory fault on mirror WQE read: 0
 Memory fault on mirror pkt write: 0
 Memory fault on multicast pkt write: 0
   NIX_AF_RAS:
 Poisoned data on NIX_AQ_INST_S read: 0
 Poisoned data on NIX_AQ_RES_S write: 0
 Poisoned data on HW context read: 0
 Poisoned data on packet read from mirror buffer: 0
 Poisoned data on packet read from mcast buffer: 0
 Poisoned data on WQE read from mirror buffer: 0
 Poisoned data on WQE read from multicast buffer: 0
 Poisoned data on NIX_RX_MCE_S read: 0
   NIX_AF_RVU:
 Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../marvell/octeontx2/af/rvu_devlink.c| 414 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  31 ++
 .../marvell/octeontx2/af/rvu_struct.h |  10 +
 3 files changed, 453 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index b7f0691d86b0..c02d0f56ae7a 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -35,6 +35,131 @@ static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
return devlink_fmsg_pair_nest_end(fmsg);
 }
 
+static irqreturn_t rvu_nix_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_ctx *nix_event_context;
+   struct rvu_nix_event_cnt *nix_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_context = rvu_dl->nix_event_ctx;
+   nix_event_count = &nix_event_context->nix_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_RVU_INT);
+   nix_event_context->nix_af_rvu_int = intr;
+
+   if (intr & BIT_ULL(0))
+   nix_event_count->unmap_slot_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT, intr);
+   rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT_ENA_W1C, ~0ULL);
+   devlink_health_report(rvu_dl->rvu_nix_health_reporter, "NIX_AF_RVU Error",
+ nix_event_context);
+
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t rvu_nix_af_err_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_ctx *nix_event_context;
+   struct rvu_nix_event_cnt *nix_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_context = rvu_dl->nix_event_ctx;
+   nix_event_count = &nix_event_context->nix_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_ERR_INT);
+   nix_event_context->nix_af_rvu_err = intr;
+
+   if (intr & BIT_ULL(14))
+   nix_event_count->aq_inst_count++;
+   if (intr & BIT_ULL(13))
+   nix_event_count->aq_res_count++;
+   if (intr & BIT_ULL(12))
+   nix_event_count->aq_db_count++;
+   if (intr & BIT_ULL(6))
+   nix_event_count->rx_on_unmap_pf_count++;
+   if (intr & BIT_ULL(5))
+   nix_event_count->rx_mcast_repl_count++;
+   if (intr & BIT_ULL(4))
+   nix_event_count->rx_mcast_memfault_count++;
+   if (intr & BIT_ULL(3))
+   nix_event_count->rx_mcast_wqe_memfault_count++;
+   if (intr & BIT_ULL(2))
+   nix_event_count->rx_mirror_wqe_memfault_count++;
+   if (intr & BIT_ULL(1))
+   nix_event_count->rx_mirror_pktw_memfault_count++;
+   if (intr & BIT_ULL(0))
+   nix_event_count->rx_mcast_pktw_memfault_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT, intr);
+   rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT_ENA_W1C, ~0ULL);
+   dev

[PATCHv3 net-next 0/3] Add devlink and devlink health reporters to

2020-11-19 Thread George Cherian

Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA and NIX blocks.
These reporters report the error counts in their respective blocks.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

Change-log:
v3
 - Address Saeed's comments on v2.
 - Renamed the reporter name as hw_*.
 - Call devlink_health_report() when an event is raised.
 - Added recover op too.

v2
 - Address Willem's comments on v1.
 - Fixed the sparse issues, reported by Jakub.

George Cherian (3):
  octeontx2-af: Add devlink support to af driver
  octeontx2-af: Add devlink health reporters for NPA
  octeontx2-af: Add devlink health reporters for NIX

 .../net/ethernet/marvell/octeontx2/Kconfig|   1 +
 .../ethernet/marvell/octeontx2/af/Makefile|   3 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   4 +
 .../marvell/octeontx2/af/rvu_devlink.c| 972 ++
 .../marvell/octeontx2/af/rvu_devlink.h|  82 ++
 .../marvell/octeontx2/af/rvu_struct.h |  33 +
 7 files changed, 1102 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

-- 
2.25.1

[PATCHv3 net-next 1/3] octeontx2-af: Add devlink support to af driver

2020-11-19 Thread George Cherian

Add basic devlink support to the AF driver.
Currently, info_get is the only supported devlink op.

devlink output looks like this:
 # devlink dev
 pci/0002:01:00.0
 # devlink dev info
 pci/0002:01:00.0:
  driver octeontx2-af
  versions:
  fixed:
mbox version: 9

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../net/ethernet/marvell/octeontx2/Kconfig|  1 +
 .../ethernet/marvell/octeontx2/af/Makefile|  3 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |  9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  4 ++
 .../marvell/octeontx2/af/rvu_devlink.c| 72 +++
 .../marvell/octeontx2/af/rvu_devlink.h| 20 ++
 6 files changed, 107 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 543a1d047567..16caa02095fe 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -9,6 +9,7 @@ config OCTEONTX2_MBOX
 config OCTEONTX2_AF
tristate "Marvell OcteonTX2 RVU Admin Function driver"
select OCTEONTX2_MBOX
+   select NET_DEVLINK
depends on (64BIT && COMPILE_TEST) || ARM64
depends on PCI
help
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index 2f7a861d0c7b..20135f1d3387 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -9,4 +9,5 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 
 octeontx2_mbox-y := mbox.o rvu_trace.o
 octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \
- rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o
+ rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o \
+ rvu_devlink.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index a28a518c0eae..67d6e05d1037 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -2816,17 +2816,23 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (err)
goto err_flr;
 
+   err = rvu_register_dl(rvu);
+   if (err)
+   goto err_irq;
+
rvu_setup_rvum_blk_revid(rvu);
 
/* Enable AF's VFs (if any) */
err = rvu_enable_sriov(rvu);
if (err)
-   goto err_irq;
+   goto err_dl;
 
/* Initialize debugfs */
rvu_dbg_init(rvu);
 
return 0;
+err_dl:
+   rvu_unregister_dl(rvu);
 err_irq:
rvu_unregister_interrupts(rvu);
 err_flr:
@@ -2858,6 +2864,7 @@ static void rvu_remove(struct pci_dev *pdev)
 
rvu_dbg_exit(rvu);
rvu_unregister_interrupts(rvu);
+   rvu_unregister_dl(rvu);
rvu_flr_wq_destroy(rvu);
rvu_cgx_exit(rvu);
rvu_fwdata_exit(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 5ac9bb12415f..282566235918 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -12,7 +12,10 @@
 #define RVU_H
 
 #include 
+#include 
+
 #include "rvu_struct.h"
+#include "rvu_devlink.h"
 #include "common.h"
 #include "mbox.h"
 
@@ -376,6 +379,7 @@ struct rvu {
 #ifdef CONFIG_DEBUG_FS
struct rvu_debugfs  rvu_dbg;
 #endif
+   struct rvu_devlink  *rvu_dl;
 };
 
 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
new file mode 100644
index ..04ef945e7e75
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Devlink
+ *
+ * Copyright (C) 2020 Marvell.
+ *
+ */
+
+#include "rvu.h"
+
+#define DRV_NAME "octeontx2-af"
+
+static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+   struct netlink_ext_ack *extack)
+{
+   char buf[10];
+   int err;
+
+   err = devlink_info_driver_name_put(req, DRV_NAME);
+   if (err)
+   return err;
+
+   sprintf(buf, "%X", OTX2_MBOX_VERSION);
+   return devlink_info_version_fixed_put(req, "mbox version:", buf);
+}
+
+static const struct devlink_ops rvu_devlink_ops = {
+   .info_get = rvu_devlink_info_get,
+};
+
+int rvu_register_dl(struct rvu *rvu)
+{
+   struct rvu_devlink *rvu_dl;
+

Re: [PATCH v2 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX

2020-11-05 Thread George Cherian

Hi Saeed,

Thanks for the review.

> -Original Message-
> From: Saeed Mahameed 
> Sent: Thursday, November 5, 2020 10:39 AM
> To: George Cherian ; net...@vger.kernel.org;
> linux-kernel@vger.kernel.org; Jiri Pirko 
> Cc: k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham
> ; Linu Cherian ;
> Geethasowjanya Akula ; masahi...@kernel.org;
> willemdebruijn.ker...@gmail.com
> Subject: Re: [PATCH v2 net-next 3/3] octeontx2-af: Add devlink health
> reporters for NIX
> 
> On Wed, 2020-11-04 at 17:57 +0530, George Cherian wrote:
> > Add health reporters for RVU NPA block.
>^^^ NIX ?
> 
Yes, it's NIX.

> Cc: Jiri
> 
> Anyway, could you please spare some words on what is NPA and what is
> NIX?
> 
> Regarding the reporters names, all drivers register well known generic names
> such as (fw,hw,rx,tx), I don't know if it is a good idea to use vendor
> specific names, if you are reporting for hw/fw units then just use "hw" or
> "fw" as the reporter name and append the unit NPA/NIX to the counter/error
> names.
Okay. These are hw units, I will rename them as hw_npa/hw_nix.
> 
> > Only reporter dump is supported.
> >
> > Output:
> >  # ./devlink health
> >  pci/0002:01:00.0:
> >reporter npa
> >  state healthy error 0 recover 0
> >reporter nix
> >  state healthy error 0 recover 0
> >  # ./devlink  health dump show pci/0002:01:00.0 reporter nix
> >   NIX_AF_GENERAL:
> >  Memory Fault on NIX_AQ_INST_S read: 0
> >  Memory Fault on NIX_AQ_RES_S write: 0
> >  AQ Doorbell error: 0
> >  Rx on unmapped PF_FUNC: 0
> >  Rx multicast replication error: 0
> >  Memory fault on NIX_RX_MCE_S read: 0
> >  Memory fault on multicast WQE read: 0
> >  Memory fault on mirror WQE read: 0
> >  Memory fault on mirror pkt write: 0
> >  Memory fault on multicast pkt write: 0
> >NIX_AF_RAS:
> >  Poisoned data on NIX_AQ_INST_S read: 0
> >  Poisoned data on NIX_AQ_RES_S write: 0
> >  Poisoned data on HW context read: 0
> >  Poisoned data on packet read from mirror buffer: 0
> >  Poisoned data on packet read from mcast buffer: 0
> >  Poisoned data on WQE read from mirror buffer: 0
> >  Poisoned data on WQE read from multicast buffer: 0
> >  Poisoned data on NIX_RX_MCE_S read: 0
> >NIX_AF_RVU:
> >  Unmap Slot Error: 0
> >
> 
> Now i am a little bit skeptic here, devlink health reporter infrastructure was
> never meant to deal with dump op only, the main purpose is to
> diagnose/dump and recover.
> 
> especially in your use case where you only report counters, i don't believe
> devlink health dump is a proper interface for this.
These are not counters. These are error interrupts raised by HW blocks.
The count is provided to show how frequently the errors are seen.
Error recovery for some of the blocks happens internally, which is why
only the dump op is currently added.
> Many of these counters if not most are data path packet based and maybe
> they should belong to ethtool.

Regards,
-George

[PATCH v2 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-04 Thread George Cherian

Add health reporters for RVU NPA block.
Only reporter dump is supported.

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter npa
 state healthy error 0 recover 0
 # devlink  health dump show pci/0002:01:00.0 reporter npa
 NPA_AF_GENERAL:
Unmap PF Error: 0
Free Disabled for NIX0 RX: 0
Free Disabled for NIX0 TX: 0
Free Disabled for NIX1 RX: 0
Free Disabled for NIX1 TX: 0
Free Disabled for SSO: 0
Free Disabled for TIM: 0
Free Disabled for DPI: 0
Free Disabled for AURA: 0
Alloc Disabled for Resvd: 0
  NPA_AF_ERR:
Memory Fault on NPA_AQ_INST_S read: 0
Memory Fault on NPA_AQ_RES_S write: 0
AQ Doorbell Error: 0
Poisoned data on NPA_AQ_INST_S read: 0
Poisoned data on NPA_AQ_RES_S write: 0
Poisoned data on HW context read: 0
  NPA_AF_RVU:
Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../marvell/octeontx2/af/rvu_devlink.c| 432 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  23 +
 .../marvell/octeontx2/af/rvu_struct.h |  23 +
 3 files changed, 477 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 596bb9c533b5..bf9efe1f6aec 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -5,10 +5,438 @@
  *
  */
 
+#include
+
 #include "rvu.h"
+#include "rvu_reg.h"
+#include "rvu_struct.h"
 
 #define DRV_NAME "octeontx2-af"
 
+static int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
+{
+   int err;
+
+   err = devlink_fmsg_pair_nest_start(fmsg, name);
+   if (err)
+   return err;
+
+   return  devlink_fmsg_obj_nest_start(fmsg);
+}
+
+static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
+{
+   int err;
+
+   err = devlink_fmsg_obj_nest_end(fmsg);
+   if (err)
+   return err;
+
+   return devlink_fmsg_pair_nest_end(fmsg);
+}
+
+static bool rvu_common_request_irq(struct rvu *rvu, int offset,
+  const char *name, irq_handler_t fn)
+{
+   struct rvu_devlink *rvu_dl = rvu->rvu_dl;
+   int rc;
+
+   sprintf(&rvu->irq_name[offset * NAME_SIZE], name);
+   rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0,
+    &rvu->irq_name[offset * NAME_SIZE], rvu_dl);
+   if (rc)
+   dev_warn(rvu->dev, "Failed to register %s irq\n", name);
+   else
+   rvu->irq_allocated[offset] = true;
+
+   return rvu->irq_allocated[offset];
+}
+
+static irqreturn_t rvu_npa_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_npa_event_cnt *npa_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   npa_event_count = rvu_dl->npa_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NPA_AF_RVU_INT);
+
+   if (intr & BIT_ULL(0))
+   npa_event_count->unmap_slot_count++;
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT, intr);
+   return IRQ_HANDLED;
+}
+
+static int rvu_npa_inpq_to_cnt(u16 in,
+  struct rvu_npa_event_cnt *npa_event_count)
+{
+   switch (in) {
+   case 0:
+   return 0;
+   case BIT(NPA_INPQ_NIX0_RX):
+   return npa_event_count->free_dis_nix0_rx_count++;
+   case BIT(NPA_INPQ_NIX0_TX):
+   return npa_event_count->free_dis_nix0_tx_count++;
+   case BIT(NPA_INPQ_NIX1_RX):
+   return npa_event_count->free_dis_nix1_rx_count++;
+   case BIT(NPA_INPQ_NIX1_TX):
+   return npa_event_count->free_dis_nix1_tx_count++;
+   case BIT(NPA_INPQ_SSO):
+   return npa_event_count->free_dis_sso_count++;
+   case BIT(NPA_INPQ_TIM):
+   return npa_event_count->free_dis_tim_count++;
+   case BIT(NPA_INPQ_DPI):
+   return npa_event_count->free_dis_dpi_count++;
+   case BIT(NPA_INPQ_AURA_OP):
+   return npa_event_count->free_dis_aura_count++;
+   case BIT(NPA_INPQ_INTERNAL_RSV):
+   return npa_event_count->free_dis_rsvd_count++;
+   }
+
+   return npa_event_count->alloc_dis_rsvd_count++;
+}
+
+static irqreturn_t rvu_npa_af_gen_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_npa_event_cnt *npa_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr, val;
+   u64 intr;
+
+   rvu = rvu_dl-&

[PATCH v2 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX

2020-11-04 Thread George Cherian

Add health reporters for RVU NPA block.
Only reporter dump is supported.

Output:
 # ./devlink health
 pci/0002:01:00.0:
   reporter npa
 state healthy error 0 recover 0
   reporter nix
 state healthy error 0 recover 0
 # ./devlink  health dump show pci/0002:01:00.0 reporter nix
  NIX_AF_GENERAL:
 Memory Fault on NIX_AQ_INST_S read: 0
 Memory Fault on NIX_AQ_RES_S write: 0
 AQ Doorbell error: 0
 Rx on unmapped PF_FUNC: 0
 Rx multicast replication error: 0
 Memory fault on NIX_RX_MCE_S read: 0
 Memory fault on multicast WQE read: 0
 Memory fault on mirror WQE read: 0
 Memory fault on mirror pkt write: 0
 Memory fault on multicast pkt write: 0
   NIX_AF_RAS:
 Poisoned data on NIX_AQ_INST_S read: 0
 Poisoned data on NIX_AQ_RES_S write: 0
 Poisoned data on HW context read: 0
 Poisoned data on packet read from mirror buffer: 0
 Poisoned data on packet read from mcast buffer: 0
 Poisoned data on WQE read from mirror buffer: 0
 Poisoned data on WQE read from multicast buffer: 0
 Poisoned data on NIX_RX_MCE_S read: 0
   NIX_AF_RVU:
 Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../marvell/octeontx2/af/rvu_devlink.c| 360 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  24 ++
 .../marvell/octeontx2/af/rvu_struct.h |  10 +
 3 files changed, 393 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index bf9efe1f6aec..49e51d1bd7d5 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -35,6 +35,110 @@ static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
return devlink_fmsg_pair_nest_end(fmsg);
 }
 
+static irqreturn_t rvu_nix_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_cnt *nix_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_count = rvu_dl->nix_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_RVU_INT);
+
+   if (intr & BIT_ULL(0))
+   nix_event_count->unmap_slot_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT, intr);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t rvu_nix_af_err_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_cnt *nix_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_count = rvu_dl->nix_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_ERR_INT);
+
+   if (intr & BIT_ULL(14))
+   nix_event_count->aq_inst_count++;
+   if (intr & BIT_ULL(13))
+   nix_event_count->aq_res_count++;
+   if (intr & BIT_ULL(12))
+   nix_event_count->aq_db_count++;
+   if (intr & BIT_ULL(6))
+   nix_event_count->rx_on_unmap_pf_count++;
+   if (intr & BIT_ULL(5))
+   nix_event_count->rx_mcast_repl_count++;
+   if (intr & BIT_ULL(4))
+   nix_event_count->rx_mcast_memfault_count++;
+   if (intr & BIT_ULL(3))
+   nix_event_count->rx_mcast_wqe_memfault_count++;
+   if (intr & BIT_ULL(2))
+   nix_event_count->rx_mirror_wqe_memfault_count++;
+   if (intr & BIT_ULL(1))
+   nix_event_count->rx_mirror_pktw_memfault_count++;
+   if (intr & BIT_ULL(0))
+   nix_event_count->rx_mcast_pktw_memfault_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT, intr);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t rvu_nix_af_ras_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_cnt *nix_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_count = rvu_dl->nix_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_RAS);
+
+   if (intr & BIT_ULL(34))
+   nix_event_count->poison_aq_inst_count++;
+   if (intr & BIT_ULL(33))
+   nix_event_count->poison_aq_res_count++;
+   if (intr & BIT_ULL(3

[PATCH v2 net-next 1/3] octeontx2-af: Add devlink suppoort to af driver

2020-11-04 Thread George Cherian

Add devlink support to AF driver. Basic devlink support is added.
Currently info_get is the only supported devlink ops.

devlink output looks like this:
 # devlink dev
 pci/0002:01:00.0
 # devlink dev info
 pci/0002:01:00.0:
  driver octeontx2-af
  versions:
  fixed:
mbox version: 9

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../net/ethernet/marvell/octeontx2/Kconfig|  1 +
 .../ethernet/marvell/octeontx2/af/Makefile|  3 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |  9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  4 ++
 .../marvell/octeontx2/af/rvu_devlink.c| 72 +++
 .../marvell/octeontx2/af/rvu_devlink.h| 20 ++
 6 files changed, 107 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 543a1d047567..16caa02095fe 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -9,6 +9,7 @@ config OCTEONTX2_MBOX
 config OCTEONTX2_AF
tristate "Marvell OcteonTX2 RVU Admin Function driver"
select OCTEONTX2_MBOX
+   select NET_DEVLINK
depends on (64BIT && COMPILE_TEST) || ARM64
depends on PCI
help
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index 2f7a861d0c7b..20135f1d3387 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -9,4 +9,5 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 
 octeontx2_mbox-y := mbox.o rvu_trace.o
 octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \
- rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o
+ rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o \
+ rvu_devlink.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index f0ce2ec0993b..cfff7d3fb705 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -2816,17 +2816,23 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (err)
goto err_flr;
 
+   err = rvu_register_dl(rvu);
+   if (err)
+   goto err_irq;
+
rvu_setup_rvum_blk_revid(rvu);
 
/* Enable AF's VFs (if any) */
err = rvu_enable_sriov(rvu);
if (err)
-   goto err_irq;
+   goto err_dl;
 
/* Initialize debugfs */
rvu_dbg_init(rvu);
 
return 0;
+err_dl:
+   rvu_unregister_dl(rvu);
 err_irq:
rvu_unregister_interrupts(rvu);
 err_flr:
@@ -2858,6 +2864,7 @@ static void rvu_remove(struct pci_dev *pdev)
 
rvu_dbg_exit(rvu);
rvu_unregister_interrupts(rvu);
+   rvu_unregister_dl(rvu);
rvu_flr_wq_destroy(rvu);
rvu_cgx_exit(rvu);
rvu_fwdata_exit(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 5ac9bb12415f..282566235918 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -12,7 +12,10 @@
 #define RVU_H
 
 #include 
+#include 
+
 #include "rvu_struct.h"
+#include "rvu_devlink.h"
 #include "common.h"
 #include "mbox.h"
 
@@ -376,6 +379,7 @@ struct rvu {
 #ifdef CONFIG_DEBUG_FS
struct rvu_debugfs  rvu_dbg;
 #endif
+   struct rvu_devlink  *rvu_dl;
 };
 
 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
new file mode 100644
index ..596bb9c533b5
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Devlink
+ *
+ * Copyright (C) 2020 Marvell International Ltd.
+ *
+ */
+
+#include "rvu.h"
+
+#define DRV_NAME "octeontx2-af"
+
+static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+   struct netlink_ext_ack *extack)
+{
+   char buf[10];
+   int err;
+
+   err = devlink_info_driver_name_put(req, DRV_NAME);
+   if (err)
+   return err;
+
+   sprintf(buf, "%X", OTX2_MBOX_VERSION);
+   return devlink_info_version_fixed_put(req, "mbox version:", buf);
+}
+
+static const struct devlink_ops rvu_devlink_ops = {
+   .info_get = rvu_devlink_info_get,
+};
+
+int rvu_register_dl(struct rvu *rvu)
+{
+   struct r

[PATCH v2 net-next 0/3] Add devlink and devlink health reporters to

2020-11-04 Thread George Cherian

Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA and NIX blocks.
These reporters report the error count in respective blocks.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

Change-log:
- Address Willem's comments on v1.
- Fixed the sparse issues, reported by Jakub.

George Cherian (3):
  octeontx2-af: Add devlink suppoort to af driver
  octeontx2-af: Add devlink health reporters for NPA
  octeontx2-af: Add devlink health reporters for NIX

 .../net/ethernet/marvell/octeontx2/Kconfig|   1 +
 .../ethernet/marvell/octeontx2/af/Makefile|   3 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   4 +
 .../marvell/octeontx2/af/rvu_devlink.c| 860 ++
 .../marvell/octeontx2/af/rvu_devlink.h|  67 ++
 .../marvell/octeontx2/af/rvu_struct.h |  33 +
 7 files changed, 975 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

-- 
2.25.4
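
The reporters in this series count errors by testing individual bits of an interrupt status register and incrementing a matching counter. A minimal userspace sketch of that idea follows, written table-driven rather than as the chain of if statements used in the patches. The bit positions (14, 13, 12, 6) are taken from the NIX_AF_ERR_INT handler in patch 3/3; the struct nix_err_counts type, the nix_err_map table and the nix_err_account() helper are hypothetical names introduced only for illustration and are not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Hypothetical counter block; field names loosely follow the patch. */
struct nix_err_counts {
	uint64_t aq_inst;
	uint64_t aq_res;
	uint64_t aq_db;
	uint64_t rx_on_unmap_pf;
};

/* One table row per status bit: which bit, and which counter it bumps. */
struct bit_to_counter {
	unsigned int bit;
	size_t offset;
};

static const struct bit_to_counter nix_err_map[] = {
	{ 14, offsetof(struct nix_err_counts, aq_inst) },
	{ 13, offsetof(struct nix_err_counts, aq_res) },
	{ 12, offsetof(struct nix_err_counts, aq_db) },
	{  6, offsetof(struct nix_err_counts, rx_on_unmap_pf) },
};

/*
 * Increment the counter for every known bit set in intr.
 * Returns the mask of bits that were accounted for, so a caller
 * could tell known events from unknown ones.
 */
static uint64_t nix_err_account(uint64_t intr, struct nix_err_counts *cnt)
{
	uint64_t handled = 0;
	size_t i;

	for (i = 0; i < sizeof(nix_err_map) / sizeof(nix_err_map[0]); i++) {
		uint64_t mask = BIT_ULL(nix_err_map[i].bit);

		if (intr & mask) {
			uint64_t *c = (uint64_t *)((char *)cnt + nix_err_map[i].offset);

			(*c)++;
			handled |= mask;
		}
	}
	return handled;
}
```

In the driver itself the handler also writes the value it read back to the register to clear the interrupt; that step is left out here because it needs real hardware access.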

Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-03 Thread George Cherian

Hi Willem,

> -Original Message-
> From: Willem de Bruijn 
> Sent: Tuesday, November 3, 2020 11:26 PM
> To: George Cherian 
> Cc: Network Development ; linux-kernel  ker...@vger.kernel.org>; Jakub Kicinski ; David Miller
> ; Sunil Kovvuri Goutham
> ; Linu Cherian ;
> Geethasowjanya Akula ; masahi...@kernel.org
> Subject: Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health
> reporters for NPA
> 
> On Tue, Nov 3, 2020 at 12:43 PM George Cherian 
> wrote:
> >
> > Hi Willem,
> >
> >
> > > -Original Message-
> > > From: Willem de Bruijn 
> > > Sent: Tuesday, November 3, 2020 7:21 PM
> > > To: George Cherian 
> > > Cc: Network Development ; linux-kernel
> > > ; Jakub Kicinski ;
> > > David Miller ; Sunil Kovvuri Goutham
> > > ; Linu Cherian ;
> > > Geethasowjanya Akula ; masahi...@kernel.org
> > > Subject: [EXT] Re: [net-next PATCH 2/3] octeontx2-af: Add devlink
> > > health reporters for NPA
> > >
> > > > > >  static int rvu_devlink_info_get(struct devlink *devlink,
> > > > > > struct
> > > > > devlink_info_req *req,
> > > > > > struct netlink_ext_ack
> > > > > > *extack)  { @@
> > > > > > -53,7 +483,8 @@ int rvu_register_dl(struct rvu *rvu)
> > > > > > rvu_dl->dl = dl;
> > > > > > rvu_dl->rvu = rvu;
> > > > > > rvu->rvu_dl = rvu_dl;
> > > > > > -   return 0;
> > > > > > +
> > > > > > +   return rvu_health_reporters_create(rvu);
> > > > >
> > > > > when would this be called with rvu->rvu_dl == NULL?
> > > >
> > > > During initialization.
> > >
> > > This is the only caller, and it is only reached if rvu_dl is non-zero.
> >
> > Did you mean to ask, where is it de-initialized?
> > If so, it should be done in rvu_unregister_dl() after freeing rvu_dl.
> 
> No, I meant that rvu_health_reporters_create does not need an
> !rvu->rvu_dl precondition test, as the only caller calls it with a
> non-zero rvu_dl.

Yes understood!!
Will fix in v2.

Thanks,
-George

Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-03 Thread George Cherian

Hi Willem,


> -Original Message-
> From: Willem de Bruijn 
> Sent: Tuesday, November 3, 2020 7:21 PM
> To: George Cherian 
> Cc: Network Development ; linux-kernel  ker...@vger.kernel.org>; Jakub Kicinski ; David Miller
> ; Sunil Kovvuri Goutham
> ; Linu Cherian ;
> Geethasowjanya Akula ; masahi...@kernel.org
> Subject: [EXT] Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health
> reporters for NPA
> 
> > > >  static int rvu_devlink_info_get(struct devlink *devlink, struct
> > > devlink_info_req *req,
> > > > struct netlink_ext_ack *extack)  { @@
> > > > -53,7 +483,8 @@ int rvu_register_dl(struct rvu *rvu)
> > > > rvu_dl->dl = dl;
> > > > rvu_dl->rvu = rvu;
> > > > rvu->rvu_dl = rvu_dl;
> > > > -   return 0;
> > > > +
> > > > +   return rvu_health_reporters_create(rvu);
> > >
> > > when would this be called with rvu->rvu_dl == NULL?
> >
> > During initialization.
> 
> This is the only caller, and it is only reached if rvu_dl is non-zero.

Yes!!! I got it, will address it in v2.

Regards
-George

Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-03 Thread George Cherian

Hi Willem,


> -Original Message-
> From: Willem de Bruijn 
> Sent: Tuesday, November 3, 2020 7:21 PM
> To: George Cherian 
> Cc: Network Development ; linux-kernel  ker...@vger.kernel.org>; Jakub Kicinski ; David Miller
> ; Sunil Kovvuri Goutham
> ; Linu Cherian ;
> Geethasowjanya Akula ; masahi...@kernel.org
> Subject: [EXT] Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health
> reporters for NPA
> 
> > > >  static int rvu_devlink_info_get(struct devlink *devlink, struct
> > > devlink_info_req *req,
> > > > struct netlink_ext_ack *extack)  { @@
> > > > -53,7 +483,8 @@ int rvu_register_dl(struct rvu *rvu)
> > > > rvu_dl->dl = dl;
> > > > rvu_dl->rvu = rvu;
> > > > rvu->rvu_dl = rvu_dl;
> > > > -   return 0;
> > > > +
> > > > +   return rvu_health_reporters_create(rvu);
> > >
> > > when would this be called with rvu->rvu_dl == NULL?
> >
> > During initialization.
> 
> This is the only caller, and it is only reached if rvu_dl is non-zero.

Did you mean to ask, where is it de-initialized?
If so, it should be done in rvu_unregister_dl() after freeing rvu_dl.

Is that what you meant?

Regards,
-George

Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-02 Thread George Cherian

Hi Willem,

Thanks for the review.

> -Original Message-
> From: Willem de Bruijn 
> Sent: Monday, November 2, 2020 7:12 PM
> To: George Cherian 
> Cc: Network Development ; linux-kernel  ker...@vger.kernel.org>; Jakub Kicinski ; David Miller
> ; Sunil Kovvuri Goutham
> ; Linu Cherian ;
> Geethasowjanya Akula ; masahi...@kernel.org
> Subject: Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health
> reporters for NPA
> 
> On Mon, Nov 2, 2020 at 12:07 AM George Cherian
>  wrote:
> >
> > Add health reporters for RVU NPA block.
> > Only reporter dump is supported
> >
> > Output:
> >  # devlink health
> >  pci/0002:01:00.0:
> >reporter npa
> >  state healthy error 0 recover 0
> >  # devlink  health dump show pci/0002:01:00.0 reporter npa
> >  NPA_AF_GENERAL:
> > Unmap PF Error: 0
> > Free Disabled for NIX0 RX: 0
> > Free Disabled for NIX0 TX: 0
> > Free Disabled for NIX1 RX: 0
> > Free Disabled for NIX1 TX: 0
> > Free Disabled for SSO: 0
> > Free Disabled for TIM: 0
> > Free Disabled for DPI: 0
> > Free Disabled for AURA: 0
> > Alloc Disabled for Resvd: 0
> >   NPA_AF_ERR:
> > Memory Fault on NPA_AQ_INST_S read: 0
> > Memory Fault on NPA_AQ_RES_S write: 0
> > AQ Doorbell Error: 0
> > Poisoned data on NPA_AQ_INST_S read: 0
> > Poisoned data on NPA_AQ_RES_S write: 0
> > Poisoned data on HW context read: 0
> >   NPA_AF_RVU:
> > Unmap Slot Error: 0
> >
> > Signed-off-by: Sunil Kovvuri Goutham 
> > Signed-off-by: Jerin Jacob 
> > Signed-off-by: George Cherian 
> 
> 
> > +static bool rvu_npa_af_request_irq(struct rvu *rvu, int blkaddr, int offset,
> > +  const char *name, irq_handler_t fn)
> > +{
> > +   struct rvu_devlink *rvu_dl = rvu->rvu_dl;
> > +   int rc;
> > +
> > +   WARN_ON(rvu->irq_allocated[offset]);
> 
> Please use WARN_ON sparingly for important unrecoverable events. This
> seems like a basic precondition. If it can happen at all, can probably catch
> in a normal branch with a netdev_err. The stacktrace in the oops is not
> likely to point at the source of the non-zero value, anyway.
Okay, will fix it in v2.
> 
> > +   rvu->irq_allocated[offset] = false;
> 
> Why initialize this here? Are these fields not zeroed on alloc? Is this here
> only to safely call rvu_npa_unregister_interrupts on partial alloc? Then it
> might be simpler to just have jump labels in this function to free the
> successfully requested irqs.

It shouldn't be initialized like this; it is zeroed on alloc.
Will fix in v2.
> 
> > +   sprintf(&rvu->irq_name[offset * NAME_SIZE], name);
> > +   rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0,
> > +&rvu->irq_name[offset * NAME_SIZE], rvu_dl);
> > +   if (rc)
> > +   dev_warn(rvu->dev, "Failed to register %s irq\n", name);
> > +   else
> > +   rvu->irq_allocated[offset] = true;
> > +
> > +   return rvu->irq_allocated[offset];
> > +}
> 
> > +static int rvu_npa_health_reporters_create(struct rvu_devlink
> > +*rvu_dl) {
> > +   struct devlink_health_reporter *rvu_npa_health_reporter;
> > +   struct rvu_npa_event_cnt *npa_event_count;
> > +   struct rvu *rvu = rvu_dl->rvu;
> > +
> > +   npa_event_count = kzalloc(sizeof(*npa_event_count), GFP_KERNEL);
> > +   if (!npa_event_count)
> > +   return -ENOMEM;
> > +
> > +   rvu_dl->npa_event_cnt = npa_event_count;
> > +   rvu_npa_health_reporter = devlink_health_reporter_create(rvu_dl->dl,
> > +&rvu_npa_hw_fault_reporter_ops,
> > +0, rvu);
> > +   if (IS_ERR(rvu_npa_health_reporter)) {
> > +   dev_warn(rvu->dev, "Failed to create npa reporter, err =%ld\n",
> > +PTR_ERR(rvu_npa_health_reporter));
> > +   return PTR_ERR(rvu_npa_health_reporter);
> > +   }
> > +
> > +   rvu_dl->rvu_npa_health_reporter = rvu_npa_health_reporter;
> > +   return 0;
> > +}
> > +
> > +static void rvu_npa_health_reporters_destroy(struct rvu_devlink
> > +*rvu_dl) {
> > +   if (!rvu_dl->rvu_npa_health_reporter)
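
The review above suggests unwinding successfully requested IRQs through an error label instead of re-initializing irq_allocated[] before each request. A minimal userspace sketch of that pattern follows. It is an illustration under stated assumptions, not the driver's code: struct fake_irqs, fake_request_irq(), fake_free_irq() and fake_register_interrupts() are hypothetical stand-ins for the kernel's request_irq()/free_irq() path, and the fail_vec parameter exists only to inject a failure.

```c
#include <errno.h>
#include <stdbool.h>

#define NUM_VECS 4

/* Hypothetical model of per-vector IRQ state; not the driver's real layout. */
struct fake_irqs {
	bool allocated[NUM_VECS];
};

/* Stand-in for request_irq(): fails for the vector index given in fail_vec. */
static int fake_request_irq(struct fake_irqs *irqs, int vec, int fail_vec)
{
	if (vec == fail_vec)
		return -EBUSY;
	irqs->allocated[vec] = true;
	return 0;
}

/* Stand-in for free_irq(). */
static void fake_free_irq(struct fake_irqs *irqs, int vec)
{
	irqs->allocated[vec] = false;
}

/*
 * Request all vectors in order. On the first failure, jump to the
 * error label and release only the vectors requested so far, in
 * reverse order, then return the error.
 */
static int fake_register_interrupts(struct fake_irqs *irqs, int fail_vec)
{
	int vec, err;

	for (vec = 0; vec < NUM_VECS; vec++) {
		err = fake_request_irq(irqs, vec, fail_vec);
		if (err)
			goto err_unwind;
	}
	return 0;

err_unwind:
	while (--vec >= 0)
		fake_free_irq(irqs, vec);
	return err;
}
```

With the unwind kept local to the function that made the requests, the caller never sees a half-registered state, so no flag needs to be cleared defensively before each request.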

Re: [net-next PATCH 1/3] octeontx2-af: Add devlink suppoort to af driver

2020-11-02 Thread George Cherian

Hi Willem,

Thanks for the review.

> -Original Message-
> From: Willem de Bruijn 
> Sent: Monday, November 2, 2020 7:01 PM
> To: George Cherian 
> Cc: Network Development ; linux-kernel  ker...@vger.kernel.org>; Jakub Kicinski ; David Miller
> ; Sunil Kovvuri Goutham
> ; Linu Cherian ;
> Geethasowjanya Akula ; masahi...@kernel.org
> Subject: Re: [net-next PATCH 1/3] octeontx2-af: Add devlink suppoort
> to af driver
> 
> On Mon, Nov 2, 2020 at 12:07 AM George Cherian
>  wrote:
> >
> > Add devlink support to AF driver. Basic devlink support is added.
> > Currently info_get is the only supported devlink ops.
> >
> > devlink output looks like this:
> >  # devlink dev
> >  pci/0002:01:00.0
> >  # devlink dev info
> >  pci/0002:01:00.0:
> >   driver octeontx2-af
> >   versions:
> >   fixed:
> >     mbox version: 9
> >
> > Signed-off-by: Sunil Kovvuri Goutham 
> > Signed-off-by: Jerin Jacob 
> > Signed-off-by: George Cherian 
> 
> > diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
> > b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
> > index 5ac9bb12415f..c112b299635d 100644
> > --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
> > +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
> > @@ -12,7 +12,10 @@
> >  #define RVU_H
> >
> >  #include 
> > +#include 
> > +
> >  #include "rvu_struct.h"
> > +#include "rvu_devlink.h"
> >  #include "common.h"
> >  #include "mbox.h"
> >
> > @@ -372,10 +375,10 @@ struct rvu {
> > struct npc_kpu_profile_adapter kpu;
> >
> > struct ptp  *ptp;
> > -
> 
> accidentally removed this line?
Yes.
> 
> >  #ifdef CONFIG_DEBUG_FS
> > struct rvu_debugfs  rvu_dbg;
> >  #endif
> > +   struct rvu_devlink  *rvu_dl;
> >  };
> 
> 
> > +int rvu_register_dl(struct rvu *rvu)
> > +{
> > +   struct rvu_devlink *rvu_dl;
> > +   struct devlink *dl;
> > +   int err;
> > +
> > +   rvu_dl = kzalloc(sizeof(*rvu_dl), GFP_KERNEL);
> > +   if (!rvu_dl)
> > +   return -ENOMEM;
> > +
> > +   dl = devlink_alloc(&rvu_devlink_ops, sizeof(struct rvu_devlink));
> > +   if (!dl) {
> > +   dev_warn(rvu->dev, "devlink_alloc failed\n");
> > +   return -ENOMEM;
> 
> rvu_dl not freed on error.
Thanks for pointing out, will address in v2.
> 
> This happens a couple of times in these patches
Will fix it.
> 
> Is the intermediate struct needed, or could you embed the fields directly into
> rvu and use container_of to get from devlink to struct rvu? Even if needed,
> perhaps easier to embed the struct into rvu rather than a pointer.
Currently only two hardware blocks are supported, NIX and NPA.
Error reporting for more HW blocks will be added; that is the reason for the
intermediate struct.
> 
> > +   }
> > +
> > +   err = devlink_register(dl, rvu->dev);
> > +   if (err) {
> > +   dev_err(rvu->dev, "devlink register failed with error %d\n", err);
> > +   devlink_free(dl);
> > +   return err;
> > +   }
> > +
> > +   rvu_dl->dl = dl;
> > +   rvu_dl->rvu = rvu;
> > +   rvu->rvu_dl = rvu_dl;
> > +   return 0;
> > +}
> > +
> > +void rvu_unregister_dl(struct rvu *rvu) {
> > +   struct rvu_devlink *rvu_dl = rvu->rvu_dl;
> > +   struct devlink *dl = rvu_dl->dl;
> > +
> > +   if (!dl)
> > +   return;
> > +
> > +   devlink_unregister(dl);
> > +   devlink_free(dl);
> 
> here too
Yes, will fix in v2.

Regards,
-George
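
The leak flagged in this review (rvu_dl not freed when a later step fails) is commonly avoided with goto-based cleanup labels. A minimal userspace sketch of that pattern follows. Everything in it is hypothetical and for illustration only: the fake_* types and functions stand in for rvu_register_dl()/rvu_unregister_dl(), calloc()/free() stand in for the kernel allocators, and the live_allocs counter and fail_at parameter exist only to make leaks observable.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel objects; not the driver's real types. */
struct fake_devlink { int registered; };
struct fake_rvu_devlink { struct fake_devlink *dl; };
struct fake_rvu { struct fake_rvu_devlink *rvu_dl; };

static int live_allocs; /* outstanding allocations, so leaks are visible */

static void *tracked_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* fail_at: 0 = succeed, 1 = fail the second allocation, 2 = fail "register" */
static int fake_register_dl(struct fake_rvu *rvu, int fail_at)
{
	struct fake_rvu_devlink *rvu_dl;
	struct fake_devlink *dl;
	int err;

	rvu_dl = tracked_alloc(sizeof(*rvu_dl));
	if (!rvu_dl)
		return -ENOMEM;

	dl = (fail_at == 1) ? NULL : tracked_alloc(sizeof(*dl));
	if (!dl) {
		err = -ENOMEM;
		goto err_free_rvu_dl;
	}

	if (fail_at == 2) {
		err = -EIO;
		goto err_free_dl;
	}

	dl->registered = 1;
	rvu_dl->dl = dl;
	rvu->rvu_dl = rvu_dl;
	return 0;

	/* Labels release resources in the reverse order of acquisition. */
err_free_dl:
	tracked_free(dl);
err_free_rvu_dl:
	tracked_free(rvu_dl);
	return err;
}

/* Teardown frees both objects, mirroring the "here too" comment above. */
static void fake_unregister_dl(struct fake_rvu *rvu)
{
	struct fake_rvu_devlink *rvu_dl = rvu->rvu_dl;

	if (!rvu_dl)
		return;

	tracked_free(rvu_dl->dl);
	tracked_free(rvu_dl);
	rvu->rvu_dl = NULL;
}
```

Each failure point jumps to the label that frees exactly what has been acquired so far, so adding a new step later only needs one more label rather than another copy of the cleanup code.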

[net-next PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA

2020-11-01 Thread George Cherian

Add health reporters for RVU NPA block.
Only reporter dump is supported.

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter npa
 state healthy error 0 recover 0
 # devlink  health dump show pci/0002:01:00.0 reporter npa
 NPA_AF_GENERAL:
Unmap PF Error: 0
Free Disabled for NIX0 RX: 0
Free Disabled for NIX0 TX: 0
Free Disabled for NIX1 RX: 0
Free Disabled for NIX1 TX: 0
Free Disabled for SSO: 0
Free Disabled for TIM: 0
Free Disabled for DPI: 0
Free Disabled for AURA: 0
Alloc Disabled for Resvd: 0
  NPA_AF_ERR:
Memory Fault on NPA_AQ_INST_S read: 0
Memory Fault on NPA_AQ_RES_S write: 0
AQ Doorbell Error: 0
Poisoned data on NPA_AQ_INST_S read: 0
Poisoned data on NPA_AQ_RES_S write: 0
Poisoned data on HW context read: 0
  NPA_AF_RVU:
Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../marvell/octeontx2/af/rvu_devlink.c| 434 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  23 +
 .../marvell/octeontx2/af/rvu_struct.h |  23 +
 3 files changed, 479 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index c9f5f66e6701..946e751fb544 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -5,10 +5,440 @@
  *
  */
 
+#include
+
 #include "rvu.h"
+#include "rvu_reg.h"
+#include "rvu_struct.h"
 
 #define DRV_NAME "octeontx2-af"
 
+void rvu_npa_unregister_interrupts(struct rvu *rvu);
+
+int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
+{
+   int err;
+
+   err = devlink_fmsg_pair_nest_start(fmsg, name);
+   if (err)
+   return err;
+
+   return  devlink_fmsg_obj_nest_start(fmsg);
+}
+
+int rvu_report_pair_end(struct devlink_fmsg *fmsg)
+{
+   int err;
+
+   err = devlink_fmsg_obj_nest_end(fmsg);
+   if (err)
+   return err;
+
+   return devlink_fmsg_pair_nest_end(fmsg);
+}
+
+static irqreturn_t rvu_npa_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_npa_event_cnt *npa_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   npa_event_count = rvu_dl->npa_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NPA_AF_RVU_INT);
+
+   if (intr & BIT_ULL(0))
+   npa_event_count->unmap_slot_count++;
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT, intr);
+   return IRQ_HANDLED;
+}
+
+static int rvu_npa_inpq_to_cnt(u16 in,
+  struct rvu_npa_event_cnt *npa_event_count)
+{
+   switch (in) {
+   case 0:
+   return 0;
+   case BIT(NPA_INPQ_NIX0_RX):
+   return npa_event_count->free_dis_nix0_rx_count++;
+   case BIT(NPA_INPQ_NIX0_TX):
+   return npa_event_count->free_dis_nix0_tx_count++;
+   case BIT(NPA_INPQ_NIX1_RX):
+   return npa_event_count->free_dis_nix1_rx_count++;
+   case BIT(NPA_INPQ_NIX1_TX):
+   return npa_event_count->free_dis_nix1_tx_count++;
+   case BIT(NPA_INPQ_SSO):
+   return npa_event_count->free_dis_sso_count++;
+   case BIT(NPA_INPQ_TIM):
+   return npa_event_count->free_dis_tim_count++;
+   case BIT(NPA_INPQ_DPI):
+   return npa_event_count->free_dis_dpi_count++;
+   case BIT(NPA_INPQ_AURA_OP):
+   return npa_event_count->free_dis_aura_count++;
+   case BIT(NPA_INPQ_INTERNAL_RSV):
+   return npa_event_count->free_dis_rsvd_count++;
+   }
+
+   return npa_event_count->alloc_dis_rsvd_count++;
+}
+
+static irqreturn_t rvu_npa_af_gen_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_npa_event_cnt *npa_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr, val;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   npa_event_count = rvu_dl->npa_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NPA_AF_GEN_INT);
+
+   if (intr & BIT_ULL(32))
+   npa_event_count->unmap_pf_count++;
+
+   val = FIELD_GET(GENMASK(31, 16), intr);
+   rvu_npa_inpq_to_cnt(val, npa_event_count);
+
+   val = FIELD_GET(GENMASK(15, 0), intr);
+   rvu_npa_inpq_to_cnt(val, npa_event_count);
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, N

[net-next PATCH 1/3] octeontx2-af: Add devlink suppoort to af driver

2020-11-01 Thread George Cherian

Add devlink support to AF driver. Basic devlink support is added.
Currently info_get is the only supported devlink ops.

devlink output looks like this:
 # devlink dev
 pci/0002:01:00.0
 # devlink dev info
 pci/0002:01:00.0:
  driver octeontx2-af
  versions:
  fixed:
mbox version: 9

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../net/ethernet/marvell/octeontx2/Kconfig|  1 +
 .../ethernet/marvell/octeontx2/af/Makefile|  3 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |  9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  5 +-
 .../marvell/octeontx2/af/rvu_devlink.c| 69 +++
 .../marvell/octeontx2/af/rvu_devlink.h| 20 ++
 6 files changed, 104 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 543a1d047567..16caa02095fe 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -9,6 +9,7 @@ config OCTEONTX2_MBOX
 config OCTEONTX2_AF
tristate "Marvell OcteonTX2 RVU Admin Function driver"
select OCTEONTX2_MBOX
+   select NET_DEVLINK
depends on (64BIT && COMPILE_TEST) || ARM64
depends on PCI
help
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index 2f7a861d0c7b..20135f1d3387 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -9,4 +9,5 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 
 octeontx2_mbox-y := mbox.o rvu_trace.o
 octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \
- rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o
+ rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o \
+ rvu_devlink.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index a28a518c0eae..58c48fa7aa72 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -2812,10 +2812,14 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (err)
goto err_mbox;
 
-   err = rvu_register_interrupts(rvu);
+   err = rvu_register_dl(rvu);
if (err)
goto err_flr;
 
+   err = rvu_register_interrupts(rvu);
+   if (err)
+   goto err_dl;
+
rvu_setup_rvum_blk_revid(rvu);
 
/* Enable AF's VFs (if any) */
@@ -2829,6 +2833,8 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
return 0;
 err_irq:
rvu_unregister_interrupts(rvu);
+err_dl:
+   rvu_unregister_dl(rvu);
 err_flr:
rvu_flr_wq_destroy(rvu);
 err_mbox:
@@ -2858,6 +2864,7 @@ static void rvu_remove(struct pci_dev *pdev)
 
rvu_dbg_exit(rvu);
rvu_unregister_interrupts(rvu);
+   rvu_unregister_dl(rvu);
rvu_flr_wq_destroy(rvu);
rvu_cgx_exit(rvu);
rvu_fwdata_exit(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 5ac9bb12415f..c112b299635d 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -12,7 +12,10 @@
 #define RVU_H
 
 #include 
+#include 
+
 #include "rvu_struct.h"
+#include "rvu_devlink.h"
 #include "common.h"
 #include "mbox.h"
 
@@ -372,10 +375,10 @@ struct rvu {
struct npc_kpu_profile_adapter kpu;
 
struct ptp  *ptp;
-
 #ifdef CONFIG_DEBUG_FS
struct rvu_debugfs  rvu_dbg;
 #endif
+   struct rvu_devlink  *rvu_dl;
 };
 
 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
new file mode 100644
index ..c9f5f66e6701
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Devlink
+ *
+ * Copyright (C) 2020 Marvell International Ltd.
+ *
+ */
+
+#include "rvu.h"
+
+#define DRV_NAME "octeontx2-af"
+
+static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+   struct netlink_ext_ack *extack)
+{
+   char buf[10];
+   int err;
+
+   err = devlink_info_driver_name_put(req, DRV_NAME);
+   if (err)
+   return err;
+
+   sprintf(buf, "%X", OTX2_MBOX_VERSION);
+   return devlink_info_version_fixed_put(req, "mbox

[net-next PATCH 0/3] Add devlink and devlink health reporters to

2020-11-01 Thread George Cherian

Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA and NIX blocks.
These reporters report the error count in respective blocks.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html


George Cherian (3):
  octeontx2-af: Add devlink suppoort to af driver
  octeontx2-af: Add devlink health reporters for NPA
  octeontx2-af: Add devlink health reporters for NIX

 .../net/ethernet/marvell/octeontx2/Kconfig|   1 +
 .../ethernet/marvell/octeontx2/af/Makefile|   3 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   5 +-
 .../marvell/octeontx2/af/rvu_devlink.c| 875 ++
 .../marvell/octeontx2/af/rvu_devlink.h|  67 ++
 .../marvell/octeontx2/af/rvu_struct.h |  33 +
 7 files changed, 990 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

-- 
2.25.1

[net-next PATCH 3/3] octeontx2-af: Add devlink health reporters for NIX

2020-11-01 Thread George Cherian

Add health reporters for RVU NIX block.
Only reporter dump is supported.

Output:
 # ./devlink health
 pci/0002:01:00.0:
   reporter npa
     state healthy error 0 recover 0
   reporter nix
     state healthy error 0 recover 0
 # ./devlink health dump show pci/0002:01:00.0 reporter nix
  NIX_AF_GENERAL:
   Memory Fault on NIX_AQ_INST_S read: 0
   Memory Fault on NIX_AQ_RES_S write: 0
   AQ Doorbell error: 0
   Rx on unmapped PF_FUNC: 0
   Rx multicast replication error: 0
   Memory fault on NIX_RX_MCE_S read: 0
   Memory fault on multicast WQE read: 0
   Memory fault on mirror WQE read: 0
   Memory fault on mirror pkt write: 0
   Memory fault on multicast pkt write: 0
  NIX_AF_RAS:
   Poisoned data on NIX_AQ_INST_S read: 0
   Poisoned data on NIX_AQ_RES_S write: 0
   Poisoned data on HW context read: 0
   Poisoned data on packet read from mirror buffer: 0
   Poisoned data on packet read from mcast buffer: 0
   Poisoned data on WQE read from mirror buffer: 0
   Poisoned data on WQE read from multicast buffer: 0
   Poisoned data on NIX_RX_MCE_S read: 0
  NIX_AF_RVU:
   Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham 
Signed-off-by: Jerin Jacob 
Signed-off-by: George Cherian 
---
 .../marvell/octeontx2/af/rvu_devlink.c| 376 +-
 .../marvell/octeontx2/af/rvu_devlink.h|  24 ++
 .../marvell/octeontx2/af/rvu_struct.h |  10 +
 3 files changed, 409 insertions(+), 1 deletion(-)
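
Note: the error-interrupt handler in this patch tests individual bits of NIX_AF_ERR_INT and bumps one counter per set bit. A minimal userspace sketch of that accounting follows. It is not the driver code: the table layout, the helper name nix_err_account() and the pairing of each bit with a dump label (taken from the order of the dump output above) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* One entry per NIX_AF_ERR_INT bit that the handler in this patch tests. */
struct nix_err_bit {
	unsigned int bit;
	const char *label;	/* assumed from the order of the dump output */
};

static const struct nix_err_bit nix_err_bits[] = {
	{ 14, "Memory Fault on NIX_AQ_INST_S read" },
	{ 13, "Memory Fault on NIX_AQ_RES_S write" },
	{ 12, "AQ Doorbell error" },
	{  6, "Rx on unmapped PF_FUNC" },
	{  5, "Rx multicast replication error" },
	{  4, "Memory fault on NIX_RX_MCE_S read" },
	{  3, "Memory fault on multicast WQE read" },
	{  2, "Memory fault on mirror WQE read" },
	{  1, "Memory fault on mirror pkt write" },
	{  0, "Memory fault on multicast pkt write" },
};

#define NIX_ERR_NBITS (sizeof(nix_err_bits) / sizeof(nix_err_bits[0]))

/*
 * Hypothetical helper: bump one counter per set bit and return the value
 * that would be written back to the register to clear the interrupts.
 * counts must point at NIX_ERR_NBITS counters.
 */
static uint64_t nix_err_account(uint64_t intr, unsigned int *counts)
{
	size_t i;

	for (i = 0; i < NIX_ERR_NBITS; i++)
		if (intr & BIT_ULL(nix_err_bits[i].bit))
			counts[i]++;

	return intr;
}
```

A table keeps the bit positions and the dump labels in one place, which may be easier to audit than a chain of if statements; the patch itself uses the open-coded form.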

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c 
b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 946e751fb544..c2dd2026c7da 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -14,6 +14,7 @@
 #define DRV_NAME "octeontx2-af"
 
 void rvu_npa_unregister_interrupts(struct rvu *rvu);
+void rvu_nix_unregister_interrupts(struct rvu *rvu);
 
 int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
 {
@@ -37,6 +38,373 @@ int rvu_report_pair_end(struct devlink_fmsg *fmsg)
return devlink_fmsg_pair_nest_end(fmsg);
 }
 
+irqreturn_t rvu_nix_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_cnt *nix_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_count = rvu_dl->nix_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_RVU_INT);
+
+   if (intr & BIT_ULL(0))
+   nix_event_count->unmap_slot_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT, intr);
+   return IRQ_HANDLED;
+}
+
+irqreturn_t rvu_nix_af_err_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_cnt *nix_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_count = rvu_dl->nix_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_ERR_INT);
+
+   if (intr & BIT_ULL(14))
+   nix_event_count->aq_inst_count++;
+   if (intr & BIT_ULL(13))
+   nix_event_count->aq_res_count++;
+   if (intr & BIT_ULL(12))
+   nix_event_count->aq_db_count++;
+   if (intr & BIT_ULL(6))
+   nix_event_count->rx_on_unmap_pf_count++;
+   if (intr & BIT_ULL(5))
+   nix_event_count->rx_mcast_repl_count++;
+   if (intr & BIT_ULL(4))
+   nix_event_count->rx_mcast_memfault_count++;
+   if (intr & BIT_ULL(3))
+   nix_event_count->rx_mcast_wqe_memfault_count++;
+   if (intr & BIT_ULL(2))
+   nix_event_count->rx_mirror_wqe_memfault_count++;
+   if (intr & BIT_ULL(1))
+   nix_event_count->rx_mirror_pktw_memfault_count++;
+   if (intr & BIT_ULL(0))
+   nix_event_count->rx_mcast_pktw_memfault_count++;
+
+   /* Clear interrupts */
+   rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT, intr);
+   return IRQ_HANDLED;
+}
+
+irqreturn_t rvu_nix_af_ras_intr_handler(int irq, void *rvu_irq)
+{
+   struct rvu_nix_event_cnt *nix_event_count;
+   struct rvu_devlink *rvu_dl = rvu_irq;
+   struct rvu *rvu;
+   int blkaddr;
+   u64 intr;
+
+   rvu = rvu_dl->rvu;
+   blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+   if (blkaddr < 0)
+   return IRQ_NONE;
+
+   nix_event_count = rvu_dl->nix_event_cnt;
+   intr = rvu_read64(rvu, blkaddr, NIX_AF_RAS);
+
+

[net-next PATCH 2/2] octeontx2-pf: Support to change VLAN based RSS hash options via ethtool

2020-09-22 Thread George Cherian

Add support to control rx-flow-hash based on VLAN.
By default VLAN plus 4-tuple based hashing is enabled.
Changes can be made at runtime using ethtool.

To enable 2-tuple plus VLAN based flow distribution
  # ethtool -N <intf> rx-flow-hash <flow-type> sdv
To enable 4-tuple plus VLAN based flow distribution
  # ethtool -N <intf> rx-flow-hash <flow-type> sdfnv

Signed-off-by: George Cherian 
Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c  | 2 +-
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)
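
Note: the letters passed to rx-flow-hash select hash fields (s = source IP, d = destination IP, f and n = L4 header bytes 0-1 and 2-3, v = VLAN tag), and the driver maps the VLAN field onto NIX_FLOW_KEY_TYPE_VLAN. A rough userspace sketch of that mapping follows. The helpers rxh_from_letters() and apply_vlan_opt() are hypothetical, and the RXH_* values are copied by hand on the assumption that they match the ethtool UAPI header.

```c
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/ethtool.h */
#define RXH_VLAN	(1u << 2)
#define RXH_IP_SRC	(1u << 4)
#define RXH_IP_DST	(1u << 5)
#define RXH_L4_B_0_1	(1u << 6)
#define RXH_L4_B_2_3	(1u << 7)

/* Value from the af patch in this series */
#define NIX_FLOW_KEY_TYPE_VLAN	(1u << 20)

/*
 * Hypothetical helper: translate rx-flow-hash letters into RXH_* bits.
 * Only the letters used in this series are handled; anything else
 * returns 0.
 */
static uint32_t rxh_from_letters(const char *s)
{
	uint32_t data = 0;

	for (; *s; s++) {
		switch (*s) {
		case 's': data |= RXH_IP_SRC; break;
		case 'd': data |= RXH_IP_DST; break;
		case 'f': data |= RXH_L4_B_0_1; break;
		case 'n': data |= RXH_L4_B_2_3; break;
		case 'v': data |= RXH_VLAN; break;
		default: return 0;
		}
	}
	return data;
}

/* Models the VLAN handling added to otx2_set_rss_hash_opts(). */
static uint32_t apply_vlan_opt(uint32_t rss_cfg, uint32_t data)
{
	if (data & RXH_VLAN)
		rss_cfg |= NIX_FLOW_KEY_TYPE_VLAN;
	else
		rss_cfg &= ~NIX_FLOW_KEY_TYPE_VLAN;
	return rss_cfg;
}
```

With this model, "sdv" sets the VLAN flow key bit and a later "sd" clears it again, which is the behaviour the two hunks below implement.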

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c 
b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 820fc660de66..d2581090f9a4 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -355,7 +355,7 @@ int otx2_rss_init(struct otx2_nic *pfvf)
rss->flowkey_cfg = rss->enable ? rss->flowkey_cfg :
   NIX_FLOW_KEY_TYPE_IPV4 | NIX_FLOW_KEY_TYPE_IPV6 |
   NIX_FLOW_KEY_TYPE_TCP | NIX_FLOW_KEY_TYPE_UDP |
-  NIX_FLOW_KEY_TYPE_SCTP;
+  NIX_FLOW_KEY_TYPE_SCTP | NIX_FLOW_KEY_TYPE_VLAN;
 
ret = otx2_set_flowkey_cfg(pfvf);
if (ret)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c 
b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
index 0341d9694e8b..662fb80dbb9d 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
@@ -428,6 +428,8 @@ static int otx2_get_rss_hash_opts(struct otx2_nic *pfvf,
 
/* Mimimum is IPv4 and IPv6, SIP/DIP */
nfc->data = RXH_IP_SRC | RXH_IP_DST;
+   if (rss->flowkey_cfg & NIX_FLOW_KEY_TYPE_VLAN)
+   nfc->data |= RXH_VLAN;
 
switch (nfc->flow_type) {
case TCP_V4_FLOW:
@@ -477,6 +479,11 @@ static int otx2_set_rss_hash_opts(struct otx2_nic *pfvf,
if (!(nfc->data & RXH_IP_SRC) || !(nfc->data & RXH_IP_DST))
return -EINVAL;
 
+   if (nfc->data & RXH_VLAN)
+   rss_cfg |=  NIX_FLOW_KEY_TYPE_VLAN;
+   else
+   rss_cfg &= ~NIX_FLOW_KEY_TYPE_VLAN;
+
switch (nfc->flow_type) {
case TCP_V4_FLOW:
case TCP_V6_FLOW:
-- 
2.25.1

[net-next PATCH 1/2] octeontx2-af: Add support for VLAN based RSS hashing

2020-09-22 Thread George Cherian

Added support for PF/VF drivers to choose RSS flow key algorithm
with VLAN tag included in hashing input data. Only CTAG is considered.

Signed-off-by: George Cherian 
Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h| 1 +
 drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c | 8 
 2 files changed, 9 insertions(+)
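
Note: the new flow key field skips the 2-byte TPID, takes the next 2 bytes and masks out the first nibble, leaving the 12-bit VLAN ID as hash input. The sketch below shows that extraction in plain C. The function name vlan_hash_input() is hypothetical, and reading fn_mask = 1 as "drop the top nibble (PCP/DEI)" is an assumption based on the comments in the hunk.

```c
#include <stdint.h>

/*
 * tag points at a 4-byte 802.1Q tag in network byte order:
 * TPID (2 bytes) followed by TCI (2 bytes).
 */
static uint16_t vlan_hash_input(const uint8_t tag[4])
{
	const unsigned int hdr_offset = 2;	/* skip TPID, as in the patch */
	uint16_t tci;

	/* bytesm1 = 1 in the patch means 2 bytes are taken */
	tci = (uint16_t)((tag[hdr_offset] << 8) | tag[hdr_offset + 1]);

	/* drop the first nibble (PCP/DEI), keep the 12-bit VLAN ID */
	return tci & 0x0fff;
}
```

Two frames with the same VLAN ID but different priority bits therefore feed the same value into the hash.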

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h 
b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 4aaef0a2b51c..aa3bda3f34be 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -625,6 +625,7 @@ struct nix_rss_flowkey_cfg {
 #define NIX_FLOW_KEY_TYPE_INNR_UDP  BIT(15)
 #define NIX_FLOW_KEY_TYPE_INNR_SCTP BIT(16)
 #define NIX_FLOW_KEY_TYPE_INNR_ETH_DMAC BIT(17)
+#define NIX_FLOW_KEY_TYPE_VLAN BIT(20)
u32 flowkey_cfg; /* Flowkey types selected */
u8  group;   /* RSS context or group */
 };
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c 
b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index 08181fc5f5d4..4bdc4baa3c59 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -2509,6 +2509,14 @@ static int set_flowkey_fields(struct nix_rx_flowkey_alg *alg, u32 flow_cfg)
field->ltype_match = NPC_LT_LE_GTPU;
field->ltype_mask = 0xF;
break;
+   case NIX_FLOW_KEY_TYPE_VLAN:
+   field->lid = NPC_LID_LB;
+   field->hdr_offset = 2; /* Skip TPID (2-bytes) */
+   field->bytesm1 = 1; /* 2 Bytes (Actually 12 bits) */
+   field->ltype_match = NPC_LT_LB_CTAG;
+   field->ltype_mask = 0xF;
+   field->fn_mask = 1; /* Mask out the first nibble */
+   break;
}
field->ena = 1;
 
-- 
2.25.1

[net-next PATCH 0/2] Add support for VLAN based flow distribution

2020-09-22 Thread George Cherian

This series adds support for VLAN based flow distribution for the
octeontx2 netdev driver, including configuration via ethtool.

The following tests have been done:
- Multi VLAN flow with same SD
- Multi VLAN flow with same SDFN
- Single VLAN flow with multi SD
- Single VLAN flow with multi SDFN
All tests were done for UDP and TCP, over both IPv4 and IPv6.


George Cherian (2):
  octeontx2-af: Add support for VLAN based RSS hashing
  octeontx2-pf: Support to change VLAN based RSS hash options via
ethtool

 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  1 +
 drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c| 10 +-
 .../net/ethernet/marvell/octeontx2/nic/otx2_common.c   |  2 +-
 .../net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c  |  7 +++
 4 files changed, 18 insertions(+), 2 deletions(-)

-- 
2.25.1

Re: [PATCH v2 3/3] asm-generic/io.h: Fix !CONFIG_GENERIC_IOMAP pci_iounmap() implementation

2020-09-18 Thread George Cherian




> -Original Message-
> From: Lorenzo Pieralisi 
> Sent: Thursday, September 17, 2020 3:00 PM
> To: Catalin Marinas 
> Cc: linux-kernel@vger.kernel.org; George Cherian ;
> Arnd Bergmann ; Will Deacon ; Bjorn
> Helgaas ; Yang Yingliang
> ; linux-...@vger.kernel.org; linux-
> a...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; David S. Miller
> 
> Subject: Re: [PATCH v2 3/3] asm-generic/io.h: Fix
> !CONFIG_GENERIC_IOMAP pci_iounmap() implementation
> 
> 
> --
> On Wed, Sep 16, 2020 at 03:51:11PM +0100, Catalin Marinas wrote:
> > On Wed, Sep 16, 2020 at 12:06:58PM +0100, Lorenzo Pieralisi wrote:
> > > For arches that do not select CONFIG_GENERIC_IOMAP, the current
> > > pci_iounmap() function does nothing causing obvious memory leaks for
> > > mapped regions that are backed by MMIO physical space.
> > >
> > > > In order to detect if a mapped pointer is IO vs MMIO, a check must
> > > > be made available to the pci_iounmap() function so that it can
> > > > actually detect whether the pointer has to be unmapped.
> > >
> > > In configurations where CONFIG_HAS_IOPORT_MAP &&
> > > !CONFIG_GENERIC_IOMAP, a mapped port is detected using an
> > > ioport_map() stub defined in asm-generic/io.h.
> > >
> > > Use the same logic to implement a stub (ie __pci_ioport_unmap())
> > > that detects if the passed in pointer in pci_iounmap() is IO vs MMIO
> > > to iounmap conditionally and call it in pci_iounmap() fixing the issue.
> > >
> > > Leave __pci_ioport_unmap() as a NOP for all other config options.
> > >
> > > Reported-by: George Cherian 
> > > > Link: https://lore.kernel.org/lkml/20200905024811.74701-1-yangyingliang@huawei.com
> > > > Link: https://lore.kernel.org/lkml/20200824132046.3114383-1-george.cherian@marvell.com
> > > Signed-off-by: Lorenzo Pieralisi 
> > > Cc: Arnd Bergmann 
> > > Cc: George Cherian 
> > > Cc: Will Deacon 
> > > Cc: Bjorn Helgaas 
> > > Cc: Catalin Marinas 
> > > Cc: Yang Yingliang 
> > > ---
> > >  include/asm-generic/io.h | 39
> > > +++
> > >  1 file changed, 27 insertions(+), 12 deletions(-)
> >
> > This works for me. The only question I have is whether pci_iomap.h is
> > better than io.h for __pci_ioport_unmap(). These headers are really
> > confusing.
> 
> Yes they are, in total honesty there is much more to do to make them sane,
> this patch is just a band-aid.
> 
> I thought about moving this stuff into pci_iomap.h, though that file is
> included _independently_ from io.h from some arches so I tried to keep
> everything in io.h to minimize disruption.
> 
> We can merge this patch - since it is a fix after all - and then I can try to
> improve the whole pci_iounmap() includes.
> 
> > Either way:
> >
> > Reviewed-by: Catalin Marinas 
> 
> Thanks a lot. I'd appreciate a tested-by from the George as he is the one who
> reported the problem.

Verified this patch and it works as expected.
Tested-by: George Cherian 
 
> Lorenzo
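
Note: the approach quoted above distinguishes port cookies from MMIO mappings so that only the latter are passed to iounmap(). A small userspace model of such a range check follows. The MODEL_* constants and both function names are hypothetical stand-ins for illustration; this is not the code from the patch, and the exact bounds used there may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PCI_IOBASE and IO_SPACE_LIMIT */
#define MODEL_PCI_IOBASE	((uintptr_t)0x10000000u)
#define MODEL_IO_SPACE_LIMIT	((uintptr_t)0xffffu)

/* Models an ioport_map() stub: port cookies live in one fixed window. */
static uintptr_t model_ioport_map(unsigned long port)
{
	return MODEL_PCI_IOBASE + (port & MODEL_IO_SPACE_LIMIT);
}

/*
 * Models the decision the unmap stub has to make: a cookie inside the
 * port window must not be unmapped, anything else is treated as MMIO.
 */
static bool model_needs_iounmap(uintptr_t cookie)
{
	if (cookie >= MODEL_PCI_IOBASE &&
	    cookie <= MODEL_PCI_IOBASE + MODEL_IO_SPACE_LIMIT)
		return false;

	return true;
}
```

The check works only because the map side hands out port cookies from a known window, so the unmap side can classify a pointer without extra bookkeeping.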

Re: [PATCH] arm64: PCI: fix memleak when calling pci_iomap/unmap()

2020-09-07 Thread George Cherian




> -Original Message-
> From: Catalin Marinas 
> Sent: Monday, September 7, 2020 4:16 PM
> To: Yang Yingliang 
> Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; will.dea...@arm.com; bhelg...@google.com;
> George Cherian ; guohan...@huawei.com
> Subject: Re: [PATCH] arm64: PCI: fix memleak when calling
> pci_iomap/unmap()
> 
> 
> --
> On Sat, Sep 05, 2020 at 10:48:11AM +0800, Yang Yingliang wrote:
> > diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c index
> > 1006ed2d7c604..ddfa1c53def48 100644
> > --- a/arch/arm64/kernel/pci.c
> > +++ b/arch/arm64/kernel/pci.c
> > @@ -217,4 +217,9 @@ void pcibios_remove_bus(struct pci_bus *bus)
> > acpi_pci_remove_bus(bus);
> >  }
> >
> > +void pci_iounmap(struct pci_dev *dev, void __iomem *addr) {
> > +   iounmap(addr);
> > +}
> > +EXPORT_SYMBOL(pci_iounmap);
> 
> So, what's wrong with the generic pci_iounmap() implementation?
> Shouldn't it call iounmap() already?
Since ARM64 selects CONFIG_GENERIC_PCI_IOMAP and not
CONFIG_GENERIC_IOMAP, the pci_iounmap function is reduced to an empty
(no-op) function. As a result, neither the managed release variants nor
explicit pci_iounmap calls actually remove the mappings, which leads to a leak.

-George
https://lkml.org/lkml/2020/8/20/28

> 
> --
> Catalin

Re: Re: [PATCH v3] PCI: Add pci_iounmap

2020-09-01 Thread George Cherian

Hi Yang,

> -Original Message-
> From: Yang Yingliang 
> Sent: Tuesday, September 1, 2020 6:59 PM
> To: George Cherian ; linux-kernel@vger.kernel.org;
> linux-a...@vger.kernel.org; linux-...@vger.kernel.org
> Cc: kbuild-...@lists.01.org; bhelg...@google.com; a...@arndb.de;
> m...@redhat.com
> Subject: Re: [PATCH v3] PCI: Add pci_iounmap
> 
> 
>
> 
> On 2020/8/25 9:25, kernel test robot wrote:
> > Hi George,
> >
> > I love your patch! Yet something to improve:
> >
> > [auto build test ERROR on pci/next]
> > [also build test ERROR on linux/master linus/master asm-generic/master
> > v5.9-rc2 next-20200824]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch]
> >
> > url:    https://github.com/0day-ci/linux/commits/George-Cherian/PCI-Add-pci_iounmap/20200824-212149
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
> > config: powerpc-allyesconfig (attached as .config)
> > compiler: powerpc64-linux-gcc (GCC) 9.3.0
> > reproduce (this is a W=1 build):
> >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> >         chmod +x ~/bin/make.cross
> >         # save the attached .config to linux build tree
> >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc
> >
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot 
> >
> > All errors (new ones prefixed by >>):
> >
> > powerpc64-linux-ld: lib/pci_iomap.o: in function `__crc_pci_iounmap':
> >>> (.rodata+0x10): multiple definition of `__crc_pci_iounmap';
> >>> lib/iomap.o:(.rodata+0x68): first defined here
> EXPORT_SYMBOL(pci_iounmap) in lib/iomap.c need be removed.
I really don't think that is the way to fix this. I have also seen your other
patch, in which iomap is moved out of lib/iomap.c to a header file.

iomap and its variants were moved to a lib because most architectures'
implementations of map were similar, whereas unmap had multiple
per-architecture implementations. So lib/iomap never implemented a generic
unmap.

I see either of the following solutions:
a. Have an arm64-specific implementation of the unmap function.
Or
b. Something along the lines of v2 [1], which accommodates all the
architectures but has the #ifdef about which Bjorn raised his concerns.

Bjorn, any comments?

Regards
-George

[1] - https://lkml.org/lkml/2020/8/20/28

[PATCH v3] PCI: Add pci_iounmap

2020-08-24 Thread George Cherian

If an architecture selects CONFIG_GENERIC_PCI_IOMAP but not
CONFIG_GENERIC_IOMAP, the pci_iounmap() function is reduced to an empty
(no-op) function. As a result, neither the managed release variants nor
explicit pci_iounmap() calls actually remove the mappings.

This issue is seen on an arm64 based system. arm64 by default selects
only CONFIG_GENERIC_PCI_IOMAP and not CONFIG_GENERIC_IOMAP since
commit cb61f6769b88 ("ARM64: use GENERIC_PCI_IOMAP").

Also, commit 66eab4df288a ("lib: add GENERIC_PCI_IOMAP") moved only the
iomap functions to lib/pci_iomap.c. pci_iounmap() was left in lib/iomap.c
because different architectures have their own pci_iounmap()
implementations. For architectures that do not implement pci_iounmap(),
this leads to a potential leak. So provide a generic pci_iounmap()
function in lib/pci_iomap.c.

A simple bind/unbind test of any PCI driver using pcim_iomap()/pci_iomap()
leads to the following error message after long-running tests:

"allocation failed: out of vmalloc space - use vmalloc= to
increase size."

Signed-off-by: George Cherian 
---
* Changes from v2
- Get rid of the #ifdefs around pci_iounmap()
* Changes from v1
- Fix the 0-day compilation error.
- Mark the lib/iomap pci_iounmap call as weak in case
  any architecture has its own implementation.
 include/asm-generic/io.h| 4 
 include/asm-generic/iomap.h | 1 -
 include/asm-generic/pci_iomap.h | 1 +
 lib/pci_iomap.c | 6 ++
 4 files changed, 11 insertions(+), 1 deletion(-)
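
Note: the failure described above can be pictured with a toy model in which every map consumes address space and an empty unmap never gives it back. The sketch below is a userspace illustration only. struct model_vmalloc and all model_* helpers are hypothetical and do not correspond to kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the vmalloc area used by ioremap-style mappings. */
struct model_vmalloc {
	size_t total;
	size_t used;
};

typedef void (*model_unmap_fn)(struct model_vmalloc *v, size_t len);

static bool model_iomap(struct model_vmalloc *v, size_t len)
{
	if (v->total - v->used < len)
		return false;	/* "out of vmalloc space" */
	v->used += len;
	return true;
}

/* Behaviour before the fix: the unmap call is an empty function. */
static void model_iounmap_noop(struct model_vmalloc *v, size_t len)
{
	(void)v;
	(void)len;
}

/* Behaviour after the fix: the mapping is actually released. */
static void model_iounmap(struct model_vmalloc *v, size_t len)
{
	v->used -= len;
}

/* Returns how many bind/unbind cycles succeeded before mapping failed. */
static unsigned int model_bind_unbind(struct model_vmalloc *v, size_t bar_len,
				      unsigned int cycles, model_unmap_fn unmap)
{
	unsigned int ok = 0;

	while (ok < cycles && model_iomap(v, bar_len)) {
		unmap(v, bar_len);
		ok++;
	}
	return ok;
}
```

With a 1024-byte area and a 256-byte BAR, the no-op variant stops after four cycles, while the real unmap survives any number of them.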

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index dabf8cb7203b..5986b37226b7 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -915,12 +915,16 @@ static inline void iowrite64_rep(volatile void __iomem *addr,
 struct pci_dev;
 extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
 
+#ifdef CONFIG_GENERIC_PCI_IOMAP
+extern void pci_iounmap(struct pci_dev *dev, void __iomem *p);
+#else
 #ifndef pci_iounmap
 #define pci_iounmap pci_iounmap
 static inline void pci_iounmap(struct pci_dev *dev, void __iomem *p)
 {
 }
 #endif
+#endif /* CONFIG_GENERIC_PCI_IOMAP */
 #endif /* CONFIG_GENERIC_IOMAP */
 
 /*
diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h
index 649224664969..68c75e26edbd 100644
--- a/include/asm-generic/iomap.h
+++ b/include/asm-generic/iomap.h
@@ -104,7 +104,6 @@ extern void ioport_unmap(void __iomem *);
 #ifdef CONFIG_PCI
 /* Destroy a virtual mapping cookie for a PCI BAR (memory or IO) */
 struct pci_dev;
-extern void pci_iounmap(struct pci_dev *dev, void __iomem *);
 #elif defined(CONFIG_GENERIC_IOMAP)
 struct pci_dev;
 static inline void pci_iounmap(struct pci_dev *dev, void __iomem *addr)
diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h
index d4f16dcc2ed7..3684307a6b44 100644
--- a/include/asm-generic/pci_iomap.h
+++ b/include/asm-generic/pci_iomap.h
@@ -18,6 +18,7 @@ extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
 extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
unsigned long offset,
unsigned long maxlen);
+extern void pci_iounmap(struct pci_dev *dev, void __iomem *p);
 /* Create a virtual mapping cookie for a port on a given PCI device.
  * Do not call this directly, it exists to make it easier for architectures
  * to override */
diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
index 2d3eb1cb73b8..e97b73995af7 100644
--- a/lib/pci_iomap.c
+++ b/lib/pci_iomap.c
@@ -134,4 +134,10 @@ void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen)
return pci_iomap_wc_range(dev, bar, 0, maxlen);
 }
 EXPORT_SYMBOL_GPL(pci_iomap_wc);
+
+void __weak pci_iounmap(struct pci_dev *dev, void __iomem *addr)
+{
+   iounmap(addr);
+}
+EXPORT_SYMBOL(pci_iounmap);
 #endif /* CONFIG_PCI */
-- 
2.25.1

Re: [PATCHv2] PCI: Add pci_iounmap

2020-08-21 Thread George Cherian

Hi Bjorn,

> -Original Message-
> From: Bjorn Helgaas 
> Sent: Friday, August 21, 2020 3:26 AM
> To: George Cherian 
> Cc: linux-kernel@vger.kernel.org; linux-a...@vger.kernel.org; linux-
> p...@vger.kernel.org; bhelg...@google.com; a...@arndb.de; Michael S.
> Tsirkin 
> Subject: [EXT] Re: [PATCHv2] PCI: Add pci_iounmap
> 
> [+cc Michael, author of 66eab4df288a ("lib: add GENERIC_PCI_IOMAP")]
> 
> On Thu, Aug 20, 2020 at 10:33:06AM +0530, George Cherian wrote:
> > In case if any architecture selects CONFIG_GENERIC_PCI_IOMAP and not
> > CONFIG_GENERIC_IOMAP, then the pci_iounmap function is reduced to a
> > NULL function. Due to this the managed release variants or even the
> > explicit pci_iounmap calls doesn't really remove the mappings.
> >
> > This issue is seen on an arm64 based system. arm64 by default selects
> > only CONFIG_GENERIC_PCI_IOMAP and not CONFIG_GENERIC_IOMAP
> from this
> > 'commit cb61f6769b88 ("ARM64: use GENERIC_PCI_IOMAP")'
> >
> > Simple bind/unbind test of any pci driver using pcim_iomap/pci_iomap,
> > would lead to the following error message after long hour tests
> >
> > "allocation failed: out of vmalloc space - use vmalloc= to
> > increase size."
> >
> > Signed-off-by: George Cherian 
> > ---
> > * Changes from v1
> > - Fix the 0-day compilation error.
> > - Mark the lib/iomap pci_iounmap call as weak incase
> >   if any architecture have there own implementation.
> >
> >  include/asm-generic/io.h |  4 
> >  lib/pci_iomap.c  | 10 ++
> >  2 files changed, 14 insertions(+)
> >
> > diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index
> > dabf8cb7203b..5986b37226b7 100644
> > --- a/include/asm-generic/io.h
> > +++ b/include/asm-generic/io.h
> > @@ -915,12 +915,16 @@ static inline void iowrite64_rep(volatile void
> > __iomem *addr,  struct pci_dev;  extern void __iomem *pci_iomap(struct
> > pci_dev *dev, int bar, unsigned long max);
> >
> > +#ifdef CONFIG_GENERIC_PCI_IOMAP
> > +extern void pci_iounmap(struct pci_dev *dev, void __iomem *p); #else
> >  #ifndef pci_iounmap
> >  #define pci_iounmap pci_iounmap
> >  static inline void pci_iounmap(struct pci_dev *dev, void __iomem *p)
> > {  }  #endif
> > +#endif /* CONFIG_GENERIC_PCI_IOMAP */
> >  #endif /* CONFIG_GENERIC_IOMAP */
> >
> >  /*
> > diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c index
> > 2d3eb1cb73b8..ecd1eb3f6c25 100644
> > --- a/lib/pci_iomap.c
> > +++ b/lib/pci_iomap.c
> > @@ -134,4 +134,14 @@ void __iomem *pci_iomap_wc(struct pci_dev
> *dev, int bar, unsigned long maxlen)
> > return pci_iomap_wc_range(dev, bar, 0, maxlen);  }
> > EXPORT_SYMBOL_GPL(pci_iomap_wc);
> > +
> > +#ifndef CONFIG_GENERIC_IOMAP
> > +#define pci_iounmap pci_iounmap
> > +void __weak pci_iounmap(struct pci_dev *dev, void __iomem *addr);
> > +void __weak pci_iounmap(struct pci_dev *dev, void __iomem *addr) {
> > +   iounmap(addr);
> > +}
> > +EXPORT_SYMBOL(pci_iounmap);
> > +#endif
> 
> I completely agree that this looks like a leak that needs to be fixed.
> 
> But my head hurts after trying to understand pci_iomap() and
> pci_iounmap().  I hate to add even more #ifdefs here.  Can't we somehow
> rationalize this and put pci_iounmap() next to pci_iomap()?

Yes, that makes more sense than having #ifdefs here.
I will re-spin and send out another version.
> 
> 66eab4df288a ("lib: add GENERIC_PCI_IOMAP") moved pci_iomap() from
> lib/iomap.c to lib/pci_iomap.c, but left pci_iounmap() in lib/iomap.c.
> There must be some good reason why they're separated, but I don't know
> what it is.
> 
> >  #endif /* CONFIG_PCI */
> > --
> > 2.25.1
> >

[PATCHv2] PCI: Add pci_iounmap

2020-08-19 Thread George Cherian

If an architecture selects CONFIG_GENERIC_PCI_IOMAP but not
CONFIG_GENERIC_IOMAP, the pci_iounmap() function is reduced to an empty
(no-op) function. As a result, neither the managed release variants nor
explicit pci_iounmap() calls actually remove the mappings.

This issue is seen on an arm64 based system. arm64 by default selects
only CONFIG_GENERIC_PCI_IOMAP and not CONFIG_GENERIC_IOMAP since
commit cb61f6769b88 ("ARM64: use GENERIC_PCI_IOMAP").

A simple bind/unbind test of any PCI driver using pcim_iomap()/pci_iomap()
leads to the following error message after long-running tests:

"allocation failed: out of vmalloc space - use vmalloc= to
increase size."
Signed-off-by: George Cherian 
---
* Changes from v1
- Fix the 0-day compilation error.
- Mark the lib/iomap pci_iounmap call as weak in case
  any architecture has its own implementation.

 include/asm-generic/io.h |  4 
 lib/pci_iomap.c  | 10 ++
 2 files changed, 14 insertions(+)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index dabf8cb7203b..5986b37226b7 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -915,12 +915,16 @@ static inline void iowrite64_rep(volatile void __iomem *addr,
 struct pci_dev;
 extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
 
+#ifdef CONFIG_GENERIC_PCI_IOMAP
+extern void pci_iounmap(struct pci_dev *dev, void __iomem *p);
+#else
 #ifndef pci_iounmap
 #define pci_iounmap pci_iounmap
 static inline void pci_iounmap(struct pci_dev *dev, void __iomem *p)
 {
 }
 #endif
+#endif /* CONFIG_GENERIC_PCI_IOMAP */
 #endif /* CONFIG_GENERIC_IOMAP */
 
 /*
diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
index 2d3eb1cb73b8..ecd1eb3f6c25 100644
--- a/lib/pci_iomap.c
+++ b/lib/pci_iomap.c
@@ -134,4 +134,14 @@ void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen)
return pci_iomap_wc_range(dev, bar, 0, maxlen);
 }
 EXPORT_SYMBOL_GPL(pci_iomap_wc);
+
+#ifndef CONFIG_GENERIC_IOMAP
+#define pci_iounmap pci_iounmap
+void __weak pci_iounmap(struct pci_dev *dev, void __iomem *addr);
+void __weak pci_iounmap(struct pci_dev *dev, void __iomem *addr)
+{
+   iounmap(addr);
+}
+EXPORT_SYMBOL(pci_iounmap);
+#endif
 #endif /* CONFIG_PCI */
-- 
2.25.1

[PATCH] PCI: Add pci_iounmap

2020-08-19 Thread George Cherian

If an architecture selects CONFIG_GENERIC_PCI_IOMAP but not
CONFIG_GENERIC_IOMAP, the pci_iounmap() function is reduced to an empty
(no-op) function. As a result, neither the managed release variants nor
explicit pci_iounmap() calls actually remove the mappings.

This issue is seen on an arm64 based system. arm64 by default selects
only CONFIG_GENERIC_PCI_IOMAP and not CONFIG_GENERIC_IOMAP since
commit cb61f6769b88 ("ARM64: use GENERIC_PCI_IOMAP").

A simple bind/unbind test of any PCI driver using pcim_iomap()/pci_iomap()
leads to the following error message after long-running tests:

"allocation failed: out of vmalloc space - use vmalloc= to
increase size."
Signed-off-by: George Cherian 
---
 include/asm-generic/io.h | 4 
 lib/pci_iomap.c  | 9 +
 2 files changed, 13 insertions(+)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index dabf8cb7203b..5986b37226b7 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -915,12 +915,16 @@ static inline void iowrite64_rep(volatile void __iomem *addr,
 struct pci_dev;
 extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
 
+#ifdef CONFIG_GENERIC_PCI_IOMAP
+extern void pci_iounmap(struct pci_dev *dev, void __iomem *p);
+#else
 #ifndef pci_iounmap
 #define pci_iounmap pci_iounmap
 static inline void pci_iounmap(struct pci_dev *dev, void __iomem *p)
 {
 }
 #endif
+#endif /* CONFIG_GENERIC_PCI_IOMAP */
 #endif /* CONFIG_GENERIC_IOMAP */
 
 /*
diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
index 2d3eb1cb73b8..36128af05e1c 100644
--- a/lib/pci_iomap.c
+++ b/lib/pci_iomap.c
@@ -134,4 +134,13 @@ void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen)
return pci_iomap_wc_range(dev, bar, 0, maxlen);
 }
 EXPORT_SYMBOL_GPL(pci_iomap_wc);
+
+#ifndef CONFIG_GENERIC_IOMAP
+#define pci_iounmap pci_iounmap
+void pci_iounmap(struct pci_dev *dev, void __iomem *addr)
+{
+   iounmap(addr);
+}
+EXPORT_SYMBOL(pci_iounmap);
+#endif
 #endif /* CONFIG_PCI */
-- 
2.25.1

Re: [EXT] Re: [PATCH] PCI: Enhance the ACS quirk for Cavium devices

2019-10-08 Thread George Cherian

Hi Bjorn,

Sorry for the late reply; I was off for a couple of days.

On 10/8/19 2:32 PM, Bjorn Helgaas wrote:
> External Email
>
> --
> On Tue, Oct 08, 2019 at 08:25:23AM +, Robert Richter wrote:
>> On 04.10.19 14:48:13, Bjorn Helgaas wrote:
>>> commit 37b22fbfec2d
>>> Author: George Cherian 
>>> Date:   Thu Sep 19 02:43:34 2019 +
>>>
>>>  PCI: Apply Cavium ACS quirk to CN99xx and CN11xxx Root Ports
>>>  
>>>  Add an array of Cavium Root Port device IDs and apply the quirk only to the
>>>  listed devices.
>>>  
>>>  Instead of applying the quirk to all Root Ports where
>>>  "(dev->device & 0xf800) == 0xa000", apply it only to CN88xx 0xa180 and
>>>  0xa170 Root Ports.

All the root ports of the CN88xx series have device IDs 0xa180 and 0xa170.

This patch currently targets only the CN88xx series and not all of CN8xxx.

For example, 83xx devices don't want the quirk applied as of today. The quirk
needs to be applied only for the TX1 series and not the octeon-tx1 series.

>> No, this can't be removed. It is a match all for all CN8xxx variants
>> (note the 3 'x', all TX1 cores). So all device ids from 0xa000 to
>> 0xa7FF are affected here and need the quirk.
> OK, I'll drop the patch and wait for a new one.  Maybe what was needed
> was to keep the "(dev->device & 0xf800) == 0xa000" part and add the
> pci_quirk_cavium_acs_ids[] array in addition?
>
>>>  Also apply the quirk to CN99xx (0xaf84) and CN11xxx (0xb884) Root Ports.

The device ID for all variants of CN99xx is 0xaf84, and for CN11xxx it will be
0xb884.

So this patch holds good for the TX2 as well as the TX3 series of processors.


>> I thought the quirk is CN8xxx specific, but I could be wrong here.
>>
>> -Robert
>>
>>>  
>>>  Link: https://lore.kernel.org/r/20190919024319.GA8792@dc5-eodlnx05.marvell.com
>>>  Fixes: f2ddaf8dfd4a ("PCI: Apply Cavium ThunderX ACS quirk to more Root Ports")
>>>  Fixes: b404bcfbf035 ("PCI: Add ACS quirk for all Cavium devices")
>>>  Signed-off-by: George Cherian 
>>>  Signed-off-by: Bjorn Helgaas 
>>>  Cc: sta...@vger.kernel.org  # v4.12+
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 320255e5e8f8..4e5048cb5ec6 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -4311,17 +4311,24 @@ static int pci_quirk_amd_sb_acs(struct pci_dev *dev, u16 acs_flags)
>>>   #endif
>>>   }
>>>   
>>> +static const u16 pci_quirk_cavium_acs_ids[] = {
>>> +   0xa180, 0xa170, /* CN88xx family of devices */
>>> +   0xaf84, /* CN99xx family of devices */
>>> +   0xb884, /* CN11xxx family of devices */
>>> +};
>>> +
>>>   static bool pci_quirk_cavium_acs_match(struct pci_dev *dev)
>>>   {
>>> -   /*
>>> -* Effectively selects all downstream ports for whole ThunderX 1
>>> -* family by 0xf800 mask (which represents 8 SoCs), while the lower
>>> -* bits of device ID are used to indicate which subdevice is used
>>> -* within the SoC.
>>> -*/
>>> -   return (pci_is_pcie(dev) &&
>>> -   (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) &&
>>> -   ((dev->device & 0xf800) == 0xa000));
>>> +   int i;
>>> +
>>> +   if (!pci_is_pcie(dev) || pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
>>> +   return false;
>>> +
>>> +   for (i = 0; i < ARRAY_SIZE(pci_quirk_cavium_acs_ids); i++)
>>> +   if (pci_quirk_cavium_acs_ids[i] == dev->device)
>>> +   return true;
>>> +
>>> +   return false;
>>>   }
>>>   
>>>   static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags)

[PATCH] PCI: Enhance the ACS quirk for Cavium devices

2019-09-18 Thread George Cherian

Enhance the ACS quirk for Cavium processors. Add the Root Port
device IDs to an array and use it in the match function.
For newer devices, add the device IDs to the array so that the
match function stays simple.

Signed-off-by: George Cherian 
---
 drivers/pci/quirks.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44c4ae1abd00..64deeaddd51c 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4241,17 +4241,27 @@ static int pci_quirk_amd_sb_acs(struct pci_dev *dev, 
u16 acs_flags)
 #endif
 }
 
+static const u16 pci_quirk_cavium_acs_ids[] = {
+   /* CN88xx family of devices */
+   0xa180, 0xa170,
+   /* CN99xx family of devices */
+   0xaf84,
+   /* CN11xxx family of devices */
+   0xb884,
+};
+
 static bool pci_quirk_cavium_acs_match(struct pci_dev *dev)
 {
-   /*
-* Effectively selects all downstream ports for whole ThunderX 1
-* family by 0xf800 mask (which represents 8 SoCs), while the lower
-* bits of device ID are used to indicate which subdevice is used
-* within the SoC.
-*/
-   return (pci_is_pcie(dev) &&
-   (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) &&
-   ((dev->device & 0xf800) == 0xa000));
+   int i;
+
+   if (!pci_is_pcie(dev) || pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
+   return false;
+
+   for (i = 0; i < ARRAY_SIZE(pci_quirk_cavium_acs_ids); i++)
+   if (pci_quirk_cavium_acs_ids[i] == dev->device)
+   return true;
+
+   return false;
 }
 
 static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags)
-- 
2.17.1
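
To make the difference between the old mask check and the new ID table concrete, here is a minimal userspace sketch. It only models the device ID comparison; the pci_is_pcie() and Root Port checks from the patch are left out, and the names match_by_mask and match_by_table are made up for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Root Port device IDs listed in the patch above. */
static const uint16_t cavium_acs_ids[] = {
	0xa180, 0xa170,	/* CN88xx family of devices */
	0xaf84,		/* CN99xx family of devices */
	0xb884,		/* CN11xxx family of devices */
};

/* Old check: mask off the low bits and compare with 0xa000. */
static bool match_by_mask(uint16_t device)
{
	return (device & 0xf800) == 0xa000;
}

/* New check: exact match against the ID table. */
static bool match_by_table(uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(cavium_acs_ids) / sizeof(cavium_acs_ids[0]); i++)
		if (cavium_acs_ids[i] == device)
			return true;

	return false;
}
```

With these definitions, 0xaf84 and 0xb884 fail the mask check (they mask to 0xa800 and 0xb800) but are found in the table, while an ID such as 0xa200 passes the mask check and is not in the table.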

Re: [RFC PATCH] cpufreq / cppc: Work around for Hisilicon CPPC cpufreq

2019-01-24 Thread George Cherian

Hi Wang,

On Thu, Jan 24, 2019 at 12:27 PM Viresh Kumar  wrote:
>
> +George/Prashanth.
>
> Guys please see if you have any objections to this patch. I am not
> very familiar with this stuff and it would be good to get some
> feedback from you guys.
>
> @Rafael: Do you have any comments on this ?
>
> On 17-01-19, 19:00, Xiongfeng Wang wrote:
> > Hisilicon chips do not support delivered performance counter register
> > and reference performance counter register. But the platform can
> > calculate the real performance using its own method. This patch provide
> > a workaround for this problem, and other platforms can also use this
> > workaround framework. We reuse the desired performance register to
> > store the real performance calculated by the platform. After the
> > platform finished the frequency adjust, it gets the real performance and
> > writes it into desired performance register. OS can use it to calculate
> > the real frequency.
Does your platform support Autonomous Selection mode?
This register is not valid when autonomous mode is enabled. In such case
how are you calculating the frequency?
> >
> > Signed-off-by: Xiongfeng Wang 
> > ---
> >  drivers/acpi/cppc_acpi.c   | 29 
> >  drivers/cpufreq/Kconfig.arm|  7 +
> >  drivers/cpufreq/cppc_cpufreq.c | 62 
> > ++
> >  include/acpi/cppc_acpi.h   |  4 +++
> >  4 files changed, 102 insertions(+)
> >
> > diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> > index 217a782..0cdaf7e 100644
> > --- a/drivers/acpi/cppc_acpi.c
> > +++ b/drivers/acpi/cppc_acpi.c
> > @@ -1050,6 +1050,35 @@ static int cpc_write(int cpu, struct 
> > cpc_register_resource *reg_res, u64 val)
> >   return ret_val;
> >  }
> >
> > +#ifdef CONFIG_HISILICON_CPPC_CPUFREQ_WORKAROUND
> > +/*
> > + * We reuse the desired performance register to store the real performance
> > + * calculated by the platform.
> > + */
> > +u64 hisi_cppc_get_real_perf(unsigned int cpunum)
> > +{
> > + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
> > + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
> > + struct cpc_register_resource *desired_reg;
> > + u64 desired_perf;
> > + int ret;
> > +
> > + /*
> > +  * Make sure the platform has finished the frequency adjust
> > +  * and wrote the real performance in desired performance register
> > +  */
> > + ret = check_pcc_chan(pcc_ss_id, false);
> > + if (ret)
> > + return 0;
If there is a pending command in the channel, returning zero
will give a bogus frequency value. You could return the previously written
value instead.
> > +
> > + desired_reg = &cpc_desc->cpc_regs[DESIRED_PERF];
> > + cpc_read(cpunum, desired_reg, &desired_perf);
> > +
> > + return desired_perf;
> > +}
> > +EXPORT_SYMBOL_GPL(hisi_cppc_get_real_perf);
> > +#endif
> > +
> >  /**
> >   * cppc_get_perf_caps - Get a CPUs performance capabilities.
> >   * @cpunum: CPU from which to get capabilities info.
> > diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
> > index 688f102..236bd07 100644
> > --- a/drivers/cpufreq/Kconfig.arm
> > +++ b/drivers/cpufreq/Kconfig.arm
> > @@ -18,6 +18,13 @@ config ACPI_CPPC_CPUFREQ
> >
> > If in doubt, say N.
> >
> > +config HISILICON_CPPC_CPUFREQ_WORKAROUND
> > + bool "Workaround for Hisilicon CPPC Cpufreq"
> > + default y
> > + depends on ACPI_CPPC_CPUFREQ && ARM64
Do you really want this applied to all ARM64 platforms, or only to the
affected HiSilicon platforms?
> > + help
> > +   This option enables a workaround for Hisilicon CPPC Cpufreq.
> > +
> >  config ARM_ARMADA_37XX_CPUFREQ
> >   tristate "Armada 37xx CPUFreq support"
> >   depends on ARCH_MVEBU && CPUFREQ_DT
> > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> > index fd25c21c..b910e84 100644
> > --- a/drivers/cpufreq/cppc_cpufreq.c
> > +++ b/drivers/cpufreq/cppc_cpufreq.c
> > @@ -33,6 +33,13 @@
> >  /* Offest in the DMI processor structure for the max frequency */
> >  #define DMI_PROCESSOR_MAX_SPEED  0x14
> >
> > +struct cppc_get_rate_workaround_info {
If your intention is to make a generic framework for future extensions
then you might need to name it differently. Something like
struct cppc_workaround_info, so that you can extend the same for any
future workarounds.
> > + char oem_id[ACPI_OEM_ID_SIZE +1];
> > + char oem_table_id[ACPI_OEM_TABLE_ID_SIZE + 1];
> > + u32 oem_revision;
> > + unsigned int (*get)(unsigned int cpu);
This can be  unsigned int (*get_rate)(unsigned int cpu);
> > +};
> > +
> >  /*
> >   * These structs contain information parsed from per CPU
> >   * ACPI _CPC structures.
> > @@ -357,6 +364,59 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int 
> > cpunum)
> >   .name = "cppc_cpufreq",
> >  };
> >
> > +#ifdef CONFIG_HISILICON_CPPC_CPUFREQ_WORKAROUND
> > +/*
> > + * When the platform does not support delivered

Re: [PATCH] xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc

2018-10-28 Thread George Cherian


Hi Alan,

Thanks for the review.
I will update the patch accordingly and send out v2.

On 10/28/2018 10:48 PM, Alan Stern wrote:
> 
> On Sat, 27 Oct 2018, Cherian, George wrote:
> 
>> Implement workaround for ThunderX2 Errata-129 (documented in
>> "CN99XX Known Issues", available at the Cavium support site).
>> As per ThunderX2 Errata-129, a USB-2.0 device may come up as USB-1.0.
>> If a connection to a USB-1.0 device is followed by another connection
>> to a USB-2.0 device, the link will come up as USB-1.0 for the USB-2.0
>> device.
>>
>> Resolution: Reset the PHY after the USB1.0 device is disconnected.
>> The PHY reset sequence is done using private registers in XHCI register
>> space. After the PHY is reset we check for the PLL lock status and retry
>> the operation if it fails. From our tests, retrying 4 times is sufficient.
>>
>> Add a new quirk flag XHCI_RESET_PLL_ON_DISCONNECT to invoke the workaround
>> in handle_xhci_port_status().
> 
> Minor nitpick (for both the patch description and the code comments):
> 
> USB 1.0 was never widely adopted and is not used any more.  The
> earliest version of USB currently used in supported devices is USB 1.1.
> Likewise, there are a few devices around that support USB 2.1, not
> USB 2.0, but they are presumably also subject to the problem described
> above.
> 
> I suggest you change the description and the comments to refer to USB 1
> and USB 2 instead of USB 1.0 and USB 2.0, as the latter are too
> restrictive and misleading.
> 
> Alan Stern
> 
Regards,
-George
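
The reset-and-retry sequence described in the quoted patch text can be sketched roughly as follows. This is not the driver code: the phy_ops hooks and the fake PHY are hypothetical stand-ins for the private xHCI register accesses, and treating "retrying 4 times" as at most four attempts is an assumption.

```c
#include <stdbool.h>

#define PLL_RESET_RETRIES 4	/* retry count quoted in the patch description */

/* Hypothetical hooks standing in for the private xHCI register accesses. */
struct phy_ops {
	void (*reset_phy)(void *ctx);
	bool (*pll_locked)(void *ctx);
};

/* Reset the PHY and retry until the PLL reports lock or attempts run out. */
static bool reset_pll_on_disconnect(const struct phy_ops *ops, void *ctx)
{
	int attempt;

	for (attempt = 0; attempt < PLL_RESET_RETRIES; attempt++) {
		ops->reset_phy(ctx);
		if (ops->pll_locked(ctx))
			return true;
	}

	return false;
}

/* Simulated PHY for illustration: the PLL locks after `locks_after` resets. */
struct fake_phy {
	int resets;
	int locks_after;
};

static void fake_reset(void *ctx)
{
	struct fake_phy *phy = ctx;

	phy->resets++;
}

static bool fake_locked(void *ctx)
{
	const struct fake_phy *phy = ctx;

	return phy->resets >= phy->locks_after;
}

static const struct phy_ops fake_ops = { fake_reset, fake_locked };
```

A fake PHY that locks after three resets makes reset_pll_on_disconnect() return true on the third attempt; one that never locks within four resets makes it return false.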

Re: [PATCH 2/2] ipmi_ssif: Fix crash seen while ipmi_unregister_smi

2018-08-26 Thread George Cherian




Hi Corey,


On 08/24/2018 06:38 PM, Corey Minyard wrote:


On 08/24/2018 06:10 AM, George Cherian wrote:

Don't set ssif_info->intf to NULL before ipmi_unregister_smi.
shutdown_ssif will free ssif_info anyway.


This is correct, but it goes a little deeper.  I just sent out a
patch yesterday that included this.


Yes I saw the patch now, 
https://sourceforge.net/p/openipmi/mailman/message/36397896/

I will test and update in that thread.



Thanks,

-corey


The following crash is observed if ssif_info->intf is set to NULL
before ipmi_unregister_smi.

  CPU: 119 PID: 7317 Comm: kssif000e Not tainted 4.18.0+ #80
  Hardware name: Cavium Inc. Saber/Saber, BIOS Cavium reference 
firmware version 7.0 08/04/2018

  pstate: 2049 (nzCv daif +PAN -UAO)
  pc : ipmi_smi_msg_received+0x44/0x3bc [ipmi_msghandler]
  lr : deliver_recv_msg+0x30/0x5c [ipmi_ssif]
  sp : 37a0fd20
  x29: 37a0fd20 x28: 
  x27: 047e08f0 x26: 800ed9375800
  x25: 37a0fe00 x24: 09073000
  x23: 0013 x22: 
  x21: 7000 x20: 800adce18400
  x19:  x18: 3742fd38
  x17: 089960f0 x16: 000e
  x15: 0007 x14: 
  x13:  x12: 0033
  x11: 0381 x10: 0ba0
  x9 :  x8 : 800ac001fc00
  x7 : 7fe003b4d800 x6 : 800adce1854b
  x5 : 0014 x4 : 0004
  x3 :  x2 : 0002
  x1 : 567cb12f8b916b00 x0 : 0002
  Process kssif000e (pid: 7317, stack limit = 0x41077d8a)
  Call trace:
   ipmi_smi_msg_received+0x44/0x3bc [ipmi_msghandler]
   deliver_recv_msg+0x30/0x5c [ipmi_ssif]
   msg_done_handler+0x2f0/0x66c [ipmi_ssif]
   ipmi_ssif_thread+0x108/0x124 [ipmi_ssif]
   kthread+0x108/0x134
   ret_from_fork+0x10/0x18
  Code: b9402280 91401e75 f90037a1 7100041f (b945bab6)
  ---[ end trace fb7d748bc7b17490 ]---
  Kernel panic - not syncing: Fatal exception
  SMP: stopping secondary CPUs
  Kernel Offset: disabled
  CPU features: 0x23800c38
  Memory Limit: none
  ---[ end Kernel panic - not syncing: Fatal exception ]---

Signed-off-by: George Cherian 
---
  drivers/char/ipmi/ipmi_ssif.c | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_ssif.c 
b/drivers/char/ipmi/ipmi_ssif.c

index ccdf6b1..1490636 100644
--- a/drivers/char/ipmi/ipmi_ssif.c
+++ b/drivers/char/ipmi/ipmi_ssif.c
@@ -1226,7 +1226,6 @@ static void shutdown_ssif(void *send_info)
  static int ssif_remove(struct i2c_client *client)
  {
  struct ssif_info *ssif_info = i2c_get_clientdata(client);
- struct ipmi_smi *intf;
  struct ssif_addr_info *addr_info;

  if (!ssif_info)
@@ -1236,9 +1235,7 @@ static int ssif_remove(struct i2c_client *client)
   * After this point, we won't deliver anything asychronously
   * to the message handler.  We can unregister ourself.
   */
- intf = ssif_info->intf;
- ssif_info->intf = NULL;
- ipmi_unregister_smi(intf);
+ ipmi_unregister_smi(ssif_info->intf);

  list_for_each_entry(addr_info, &ssif_infos, link) {
  if (addr_info->client == client) {

[PATCH v2] i2c: xlp9xx: Fix case where SSIF read transaction completes early

2018-08-09 Thread George Cherian

During IPMI stress tests we see occasional transaction failures
at boot time. This happens with I2C_M_RECV_LEN transactions when the
read transfer completes (with the initial read length of 34) before
the driver gets a chance to handle interrupts.

The current driver code expects at least 2 interrupts for I2C_M_RECV_LEN
transactions. The length is updated during the first interrupt, and the
buffer contents are only copied during subsequent interrupts. If there is
just one interrupt, we complete the transaction without copying
out the bytes from the RX FIFO.

Update the code to drain the RX fifo after the length update,
so that the transaction completes correctly in all cases.

Signed-off-by: George Cherian 
---
 drivers/i2c/busses/i2c-xlp9xx.c | 41 -
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/i2c/busses/i2c-xlp9xx.c b/drivers/i2c/busses/i2c-xlp9xx.c
index 1f41a4f..7134f72 100644
--- a/drivers/i2c/busses/i2c-xlp9xx.c
+++ b/drivers/i2c/busses/i2c-xlp9xx.c
@@ -191,28 +191,43 @@ static void xlp9xx_i2c_drain_rx_fifo(struct 
xlp9xx_i2c_dev *priv)
if (priv->len_recv) {
/* read length byte */
rlen = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
+
+   /*
+* We expect at least 2 interrupts for I2C_M_RECV_LEN
+* transactions. The length is updated during the first
+* interrupt, and the buffer contents are only copied
+* during subsequent interrupts. If in case the interrupts
+* get merged we would complete the transaction without
+* copying out the bytes from RX fifo. To avoid this now we
+* drain the fifo as and when data is available.
+* We drained the rlen byte already, decrement total length
+* by one.
+*/
+
+   len--;
if (rlen > I2C_SMBUS_BLOCK_MAX || rlen == 0) {
rlen = 0;   /*abort transfer */
priv->msg_buf_remaining = 0;
priv->msg_len = 0;
-   } else {
-   *buf++ = rlen;
-   if (priv->client_pec)
-   ++rlen; /* account for error check byte */
-   /* update remaining bytes and message length */
-   priv->msg_buf_remaining = rlen;
-   priv->msg_len = rlen + 1;
+   xlp9xx_i2c_update_rlen(priv);
+   return;
}
+
+   *buf++ = rlen;
+   if (priv->client_pec)
+   ++rlen; /* account for error check byte */
+   /* update remaining bytes and message length */
+   priv->msg_buf_remaining = rlen;
+   priv->msg_len = rlen + 1;
xlp9xx_i2c_update_rlen(priv);
priv->len_recv = false;
-   } else {
-   len = min(priv->msg_buf_remaining, len);
-   for (i = 0; i < len; i++, buf++)
-   *buf = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
-
-   priv->msg_buf_remaining -= len;
}
 
+   len = min(priv->msg_buf_remaining, len);
+   for (i = 0; i < len; i++, buf++)
+   *buf = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
+
+   priv->msg_buf_remaining -= len;
priv->msg_buf = buf;
 
if (priv->msg_buf_remaining)
-- 
1.8.3.1
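
The single-interrupt case from the commit message can be modelled in userspace as below. This is a simplified sketch, not the driver: the RX FIFO is an array, the PEC byte handling is omitted, the early return for an empty FIFO is an addition, and struct xfer, fifo_read and drain_rx_fifo are names invented for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_MAX 32	/* same value as I2C_SMBUS_BLOCK_MAX */

/* Simplified stand-in for the driver state; the FIFO is modelled as an array. */
struct xfer {
	const uint8_t *fifo;	/* bytes waiting in the simulated RX FIFO */
	size_t fifo_pos;	/* index of the next byte to pop */
	uint8_t *buf;		/* next free byte in the message buffer */
	size_t remaining;	/* data bytes still expected */
	bool len_recv;		/* true until the length byte has been read */
};

static uint8_t fifo_read(struct xfer *x)
{
	return x->fifo[x->fifo_pos++];
}

/* Handle one RX interrupt that reports `avail` bytes in the FIFO. */
static void drain_rx_fifo(struct xfer *x, size_t avail)
{
	size_t i, n;

	if (avail == 0)
		return;

	if (x->len_recv) {
		uint8_t rlen = fifo_read(x);

		avail--;	/* the length byte has been consumed */
		if (rlen == 0 || rlen > BLOCK_MAX) {
			x->remaining = 0;	/* abort the transfer */
			return;
		}
		*x->buf++ = rlen;
		x->remaining = rlen;
		x->len_recv = false;
	}

	/* No else branch: copy whatever data arrived in the same interrupt. */
	n = avail < x->remaining ? avail : x->remaining;
	for (i = 0; i < n; i++)
		*x->buf++ = fifo_read(x);
	x->remaining -= n;
}
```

If the FIFO already holds the length byte plus all data when the first interrupt is handled, one call to drain_rx_fifo() copies everything; with separate interrupts, the first call reads only the length and later calls copy the data.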

Re: Re: [PATCH] i2c: xlp9xx: Fix case where SSIF read transaction completes early

2018-08-02 Thread George Cherian


Hi Wolfram,

Thanks for the review.

I will update the patch with a small comment section above
len --;
so that there is no confusion.

On 08/01/2018 02:35 AM, Wolfram Sang wrote:

--- a/drivers/i2c/busses/i2c-xlp9xx.c
+++ b/drivers/i2c/busses/i2c-xlp9xx.c
@@ -191,28 +191,30 @@ static void xlp9xx_i2c_drain_rx_fifo(struct 
xlp9xx_i2c_dev *priv)
if (priv->len_recv) {
/* read length byte */
rlen = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
+   len--;


I don't know the HW and assume the above line is correct because of
merging two interrupts into one. However, the line looks a bit stray,
and I wonder if we shouldn't add a comment somewhere explaining the
situation similar to the second paragraph of the commit message?



Regards,
-George

[PATCH] i2c: xlp9xx: Fix case where SSIF read transaction completes early

2018-07-22 Thread George Cherian

During IPMI stress tests we see occasional transaction failures
at boot time. This happens with I2C_M_RECV_LEN transactions when the
read transfer completes (with the initial read length of 34) before
the driver gets a chance to handle interrupts.

The current driver code expects at least 2 interrupts for I2C_M_RECV_LEN
transactions. The length is updated during the first interrupt, and the
buffer contents are only copied during subsequent interrupts. If there is
just one interrupt, we complete the transaction without copying
out the bytes from the RX FIFO.

Update the code to drain the RX fifo after the length update,
so that the transaction completes correctly in all cases.

Signed-off-by: George Cherian 
---
 drivers/i2c/busses/i2c-xlp9xx.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/i2c/busses/i2c-xlp9xx.c b/drivers/i2c/busses/i2c-xlp9xx.c
index 1f41a4f..01fa04d 100644
--- a/drivers/i2c/busses/i2c-xlp9xx.c
+++ b/drivers/i2c/busses/i2c-xlp9xx.c
@@ -191,28 +191,30 @@ static void xlp9xx_i2c_drain_rx_fifo(struct 
xlp9xx_i2c_dev *priv)
if (priv->len_recv) {
/* read length byte */
rlen = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
+   len--;
if (rlen > I2C_SMBUS_BLOCK_MAX || rlen == 0) {
rlen = 0;   /*abort transfer */
priv->msg_buf_remaining = 0;
priv->msg_len = 0;
-   } else {
-   *buf++ = rlen;
-   if (priv->client_pec)
-   ++rlen; /* account for error check byte */
-   /* update remaining bytes and message length */
-   priv->msg_buf_remaining = rlen;
-   priv->msg_len = rlen + 1;
+   xlp9xx_i2c_update_rlen(priv);
+   return;
}
+
+   *buf++ = rlen;
+   if (priv->client_pec)
+   ++rlen; /* account for error check byte */
+   /* update remaining bytes and message length */
+   priv->msg_buf_remaining = rlen;
+   priv->msg_len = rlen + 1;
xlp9xx_i2c_update_rlen(priv);
priv->len_recv = false;
-   } else {
-   len = min(priv->msg_buf_remaining, len);
-   for (i = 0; i < len; i++, buf++)
-   *buf = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
-
-   priv->msg_buf_remaining -= len;
}
 
+   len = min(priv->msg_buf_remaining, len);
+   for (i = 0; i < len; i++, buf++)
+   *buf = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
+
+   priv->msg_buf_remaining -= len;
priv->msg_buf = buf;
 
if (priv->msg_buf_remaining)
-- 
1.8.3.1

[PATCH v4] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-07-12 Thread George Cherian

Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf x (delta of delivered_perf counter /
                                   delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian 
Acked-by: Viresh Kumar 
---
 drivers/cpufreq/cppc_cpufreq.c | 52 ++
 1 file changed, 52 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index a9d3eec..30f3021 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -296,10 +296,62 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
return ret;
 }
 
+static inline u64 get_delta(u64 t1, u64 t0)
+{
+   if (t1 > t0 || t0 > ~(u32)0)
+   return t1 - t0;
+
+   return (u32)t1 - (u32)t0;
+}
+
+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
+struct cppc_perf_fb_ctrs fb_ctrs_t0,
+struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+   u64 delta_reference, delta_delivered;
+   u64 reference_perf, delivered_perf;
+
+   reference_perf = fb_ctrs_t0.reference_perf;
+
+   delta_reference = get_delta(fb_ctrs_t1.reference,
+   fb_ctrs_t0.reference);
+   delta_delivered = get_delta(fb_ctrs_t1.delivered,
+   fb_ctrs_t0.delivered);
+
+   /* Check to avoid divide-by zero */
+   if (delta_reference || delta_delivered)
+   delivered_perf = (reference_perf * delta_delivered) /
+   delta_reference;
+   else
+   delivered_perf = cpu->perf_ctrls.desired_perf;
+
+   return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+   struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+   struct cppc_cpudata *cpu = all_cpu_data[cpunum];
+   int ret;
+
+   ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+   if (ret)
+   return ret;
+
+   udelay(2); /* 2usec delay between sampling */
+
+   ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+   if (ret)
+   return ret;
+
+   return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
+}
+
 static struct cpufreq_driver cppc_cpufreq_driver = {
.flags = CPUFREQ_CONST_LOOPS,
.verify = cppc_verify_policy,
.target = cppc_cpufreq_set_target,
+   .get = cppc_cpufreq_get_rate,
.init = cppc_cpufreq_cpu_init,
.stop_cpu = cppc_cpufreq_stop_cpu,
.name = "cppc_cpufreq",
-- 
1.8.3.1
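
The delivered-performance formula from the commit message can be worked through in a small userspace sketch. The struct and function names here are illustrative, not the kernel's. Unlike the check in the patch, this sketch falls back whenever the reference delta alone is zero, because that delta is the only divisor; that choice is an assumption of the sketch.

```c
#include <stdint.h>

/* Counter delta that tolerates one wrap of a 32-bit counter, as in the patch. */
static uint64_t get_delta(uint64_t t1, uint64_t t0)
{
	if (t1 > t0 || t0 > UINT32_MAX)
		return t1 - t0;

	return (uint32_t)((uint32_t)t1 - (uint32_t)t0);
}

/* One snapshot of the feedback counters (illustrative layout). */
struct fb_ctrs {
	uint64_t reference;		/* reference performance counter */
	uint64_t delivered;		/* delivered performance counter */
	uint64_t reference_perf;	/* reference performance level */
};

/*
 * delivered_perf = reference_perf * (delta delivered / delta reference).
 * Returns `fallback` when the reference counter did not advance.
 */
static uint64_t delivered_perf(struct fb_ctrs t0, struct fb_ctrs t1,
			       uint64_t fallback)
{
	uint64_t d_ref = get_delta(t1.reference, t0.reference);
	uint64_t d_del = get_delta(t1.delivered, t0.delivered);

	if (d_ref == 0)
		return fallback;

	return t0.reference_perf * d_del / d_ref;
}
```

For example, with reference_perf = 100, a reference delta of 200 and a delivered delta of 300, the result is 100 * 300 / 200 = 150, meaning the CPU delivered 1.5 times the reference performance over the sampling window.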

[PATCH v4] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-07-12 Thread George Cherian

Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf x (delta of delivered_perf counter /
                                   delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian 
Acked-by: Viresh Kumar 
---
 drivers/cpufreq/cppc_cpufreq.c | 52 ++
 1 file changed, 52 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index a9d3eec..30f3021 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -296,10 +296,62 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
return ret;
 }
 
+static inline u64 get_delta(u64 t1, u64 t0)
+{
+   if (t1 > t0 || t0 > ~(u32)0)
+   return t1 - t0;
+
+   return (u32)t1 - (u32)t0;
+}
+
+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
+struct cppc_perf_fb_ctrs fb_ctrs_t0,
+struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+   u64 delta_reference, delta_delivered;
+   u64 reference_perf, delivered_perf;
+
+   reference_perf = fb_ctrs_t0.reference_perf;
+
+   delta_reference = get_delta(fb_ctrs_t1.reference,
+   fb_ctrs_t0.reference);
+   delta_delivered = get_delta(fb_ctrs_t1.delivered,
+   fb_ctrs_t0.delivered);
+
+   /* Check to avoid divide-by zero */
+   if (delta_reference || delta_delivered)
+   delivered_perf = (reference_perf * delta_delivered) /
+   delta_reference;
+   else
+   delivered_perf = cpu->perf_ctrls.desired_perf;
+
+   return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+   struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+   struct cppc_cpudata *cpu = all_cpu_data[cpunum];
+   int ret;
+
+   ret = cppc_get_perf_ctrs(cpunum, _ctrs_t0);
+   if (ret)
+   return ret;
+
+   udelay(2); /* 2usec delay between sampling */
+
+   ret = cppc_get_perf_ctrs(cpunum, _ctrs_t1);
+   if (ret)
+   return ret;
+
+   return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
+}
+
 static struct cpufreq_driver cppc_cpufreq_driver = {
.flags = CPUFREQ_CONST_LOOPS,
.verify = cppc_verify_policy,
.target = cppc_cpufreq_set_target,
+   .get = cppc_cpufreq_get_rate,
.init = cppc_cpufreq_cpu_init,
.stop_cpu = cppc_cpufreq_stop_cpu,
.name = "cppc_cpufreq",
-- 
1.8.3.1

Re: [PATCH v3] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-07-10 Thread George Cherian


Hi Prakash,


On 07/10/2018 09:19 PM, Prakash, Prashanth wrote:


On 7/9/2018 11:42 PM, George Cherian wrote:

Hi Prakash,


On 07/09/2018 10:12 PM, Prakash, Prashanth wrote:


Hi George,


On 7/9/2018 4:10 AM, George Cherian wrote:

Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter /
delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian 
Acked-by: Viresh Kumar 
---
   drivers/cpufreq/cppc_cpufreq.c | 44 ++
   1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index a9d3eec..61132e8 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -296,10 +296,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
return ret;
   }

+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
+  struct cppc_perf_fb_ctrs fb_ctrs_t0,
+  struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+ u64 delta_reference, delta_delivered;
+ u64 reference_perf, delivered_perf;
+
+ reference_perf = fb_ctrs_t0.reference_perf;
+
+ delta_reference = (u32)fb_ctrs_t1.reference -
+ (u32)fb_ctrs_t0.reference;
+ delta_delivered = (u32)fb_ctrs_t1.delivered -
+ (u32)fb_ctrs_t0.delivered;

Why (u32)? These registers can be 64 bits wide, which is why cppc_perf_fb_ctrs
has 64-bit fields for the reference and delivered counters.

Moreover, the integer math is incorrect. You can run into a scenario where
t1.ref/del < t0.ref/del, so the subtraction yields a negative result that is
then stored in a u64! The likelihood of this is very high, especially when you
throw away the higher 32 bits.


Because of binary representation, unsigned subtraction will work even if
t1.ref/del < t0.ref/del. So essentially, the code should look like
this,

static inline u64 get_delta(u64 t1, u64 t0)
{
 if (t1 > t0 || t0 > ~(u32)0)
 return t1 - t0;

 return (u32)t1 - (u32)t0;
}

As a further optimization, I used (u32) since that also works,
as long as the momentary delta at any point is not greater than 2 ^ 32.
I don't foresee any reason for any platform to increment the counters at
an interval greater than 2 ^ 32.
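
The behaviour of the get_delta() helper quoted above can be checked in userspace. The following is only a sketch: the u64/u32 typedefs are stand-ins for the kernel types, and get_delta_examples() with its sample values is hypothetical, picked to exercise each branch.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types (assumption). */
typedef uint64_t u64;
typedef uint32_t u32;

/* Same logic as the get_delta() helper quoted in the thread. */
static inline u64 get_delta(u64 t1, u64 t0)
{
	/* No wrap, or the counter is clearly wider than 32 bits. */
	if (t1 > t0 || t0 > ~(u32)0)
		return t1 - t0;

	/* Otherwise assume a 32-bit counter that wrapped once. */
	return (u32)t1 - (u32)t0;
}

/* Hypothetical sample values, chosen only to exercise each branch. */
void get_delta_examples(void)
{
	/* Plain forward progress. */
	assert(get_delta(150, 100) == 50);

	/* 32-bit counter wrapped: 0xFFFFFFF0 -> 0x10 is 0x20 ticks. */
	assert(get_delta(0x10, 0xFFFFFFF0u) == 0x20);

	/* 64-bit counter wrapped: modulo-2^64 subtraction still works. */
	assert(get_delta(0x5, 0xFFFFFFFFFFFFFFFBull) == 0xA);

	/* Equal snapshots (counter paused): delta is zero. */
	assert(get_delta(7, 7) == 0);
}
```

Note that the helper only tolerates a single wrap between the two snapshots; it cannot tell how many times a counter went around.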


We are NOT running within any critical section that would guarantee there is
no context switch between the feedback counter reads. Thus the assumptions that
the delta always represents a very short window of time and that it is always
less than 2^32 are not accurate.

The single-overflow assumption under which the above integer math works is
also not acceptable, especially when we throw away the higher-order bits.
There is hardware out there that uses 64-bit counters and can overflow the
lower 32 bits in quite a short time. Since the spec (and some hardware)
provides 64 bits, we should use them to make our implementation more robust
instead of throwing away the higher-order bits.

I think it's ok to use the above integer math, but please add a comment about
single overflow assumption and don't throw away the higher 32bits.
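
To illustrate the concern about discarding the upper 32 bits, here is a small userspace sketch. The function names delta_low32() and delta_full64() and the snapshot values are hypothetical; the first mirrors the (u32) casts in the v3 patch, the second is plain 64-bit subtraction.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types (assumption). */
typedef uint64_t u64;
typedef uint32_t u32;

/* Delta as computed in the v3 patch: only the low 32 bits are used. */
static u64 delta_low32(u64 t1, u64 t0)
{
	return (u32)t1 - (u32)t0;
}

/* Full-width delta: correct modulo 2^64, so one 64-bit wrap is tolerated. */
static u64 delta_full64(u64 t1, u64 t0)
{
	return t1 - t0;
}

void truncation_examples(void)
{
	/* Hypothetical 64-bit counter snapshots taken far apart in time. */
	u64 t0 = 0x100000000ull;              /* 2^32 */
	u64 t1 = t0 + 0x100000000ull + 1000;  /* advanced by 2^32 + 1000 ticks */

	/* Truncated math loses the 2^32 part of the interval. */
	assert(delta_low32(t1, t0) == 1000);

	/* Full-width math keeps it. */
	assert(delta_full64(t1, t0) == 0x100000000ull + 1000);

	/* For short intervals both agree. */
	assert(delta_low32(t0 + 5, t0) == delta_full64(t0 + 5, t0));
}
```

If the reference and delivered counters are truncated differently (one crosses a 2^32 boundary more often than the other), the resulting ratio is skewed, which is the failure mode described above.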


Okay,
I will spin a v4 with the get_delta changes.
Also note that the get_delta function doesn't throw away the higher 32
bits.



To keep things simple, do something like below:

if (t1.reference <= t0.reference || t1.delivered <= t0.delivered) {
   /* At least one of them should have overflowed */
   return desired_perf;
}
else {
   /* compute the delivered perf using the counters */
}


There is no need to do it this way; the current code has been tested and
found to work across counter overruns on our platform.



+
+ /* Check to avoid divide-by zero */
+ if (delta_reference || delta_delivered)
+ delivered_perf = (reference_perf * delta_delivered) /
+ delta_reference;
+ else
+ delivered_perf = cpu->perf_ctrls.desired_perf;
+
+ return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+ struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+ struct cppc_cpudata *cpu = all_cpu_data[cpunum];
+ int ret;
+
+ ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+ if (ret)
+ return ret;
+
+ udelay(2); /* 2usec delay between sampling */
+
+ ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+ if (ret)

Re: [PATCH v3] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-07-09 Thread George Cherian


Hi Prakash,


On 07/09/2018 10:12 PM, Prakash, Prashanth wrote:


Hi George,


On 7/9/2018 4:10 AM, George Cherian wrote:

Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter /
delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian 
Acked-by: Viresh Kumar 
---
  drivers/cpufreq/cppc_cpufreq.c | 44 ++
  1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index a9d3eec..61132e8 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -296,10 +296,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
   return ret;
  }

+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
+  struct cppc_perf_fb_ctrs fb_ctrs_t0,
+  struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+ u64 delta_reference, delta_delivered;
+ u64 reference_perf, delivered_perf;
+
+ reference_perf = fb_ctrs_t0.reference_perf;
+
+ delta_reference = (u32)fb_ctrs_t1.reference -
+ (u32)fb_ctrs_t0.reference;
+ delta_delivered = (u32)fb_ctrs_t1.delivered -
+ (u32)fb_ctrs_t0.delivered;

Why (u32)? These registers can be 64 bits wide, which is why cppc_perf_fb_ctrs
has 64-bit fields for the reference and delivered counters.

Moreover, the integer math is incorrect. You can run into a scenario where
t1.ref/del < t0.ref/del, so the subtraction yields a negative result that is
then stored in a u64! The likelihood of this is very high, especially when you
throw away the higher 32 bits.


Because of binary representation, unsigned subtraction will work even if
t1.ref/del < t0.ref/del. So essentially, the code should look like
this,

static inline u64 get_delta(u64 t1, u64 t0)
{
if (t1 > t0 || t0 > ~(u32)0)
return t1 - t0;

return (u32)t1 - (u32)t0;
}

As a further optimization, I used (u32) since that also works,
as long as the momentary delta at any point is not greater than 2 ^ 32.
I don't foresee any reason for any platform to increment the counters at
an interval greater than 2 ^ 32.


To keep things simple, do something like below:

if (t1.reference <= t0.reference || t1.delivered <= t0.delivered) {
  /* At least one of them should have overflowed */
  return desired_perf;
}
else {
  /* compute the delivered perf using the counters */
}
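
The simpler fallback suggested above could look roughly like the sketch below. It is not code from any version of the patch: struct fb_sample is a cut-down stand-in for struct cppc_perf_fb_ctrs, and delivered_perf_simple() and the sample numbers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's u64 (assumption). */
typedef uint64_t u64;

/* Minimal stand-in for the two counters in struct cppc_perf_fb_ctrs. */
struct fb_sample {
	u64 reference;
	u64 delivered;
};

/*
 * Fallback-based variant: if either counter failed to advance (wrap or
 * pause), give up on the measurement and report desired_perf instead.
 */
static u64 delivered_perf_simple(struct fb_sample t0, struct fb_sample t1,
				 u64 reference_perf, u64 desired_perf)
{
	if (t1.reference <= t0.reference || t1.delivered <= t0.delivered)
		return desired_perf;

	return (reference_perf * (t1.delivered - t0.delivered)) /
	       (t1.reference - t0.reference);
}

void simple_fallback_examples(void)
{
	struct fb_sample a = { .reference = 1000, .delivered = 500 };
	struct fb_sample b = { .reference = 2000, .delivered = 1250 };

	/* Counters advanced: 100 * 750 / 1000 = 75. */
	assert(delivered_perf_simple(a, b, 100, 60) == 75);

	/* Snapshots reversed, as after a wrap: fall back to desired_perf. */
	assert(delivered_perf_simple(b, a, 100, 60) == 60);

	/* Counters paused (idle): also falls back, no divide-by-zero. */
	assert(delivered_perf_simple(a, a, 100, 60) == 60);
}
```

The trade-off is that a wrapped sample is discarded rather than corrected, so the reported value is occasionally the requested performance level instead of a measured one.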


There is no need to do it this way; the current code has been tested and
found to work across counter overruns on our platform.



+
+ /* Check to avoid divide-by zero */
+ if (delta_reference || delta_delivered)
+ delivered_perf = (reference_perf * delta_delivered) /
+ delta_reference;
+ else
+ delivered_perf = cpu->perf_ctrls.desired_perf;
+
+ return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+ struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+ struct cppc_cpudata *cpu = all_cpu_data[cpunum];
+ int ret;
+
+ ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+ if (ret)
+ return ret;
+
+ udelay(2); /* 2usec delay between sampling */
+
+ ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+ if (ret)
+ return ret;
+
+ return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
+}
+
  static struct cpufreq_driver cppc_cpufreq_driver = {
   .flags = CPUFREQ_CONST_LOOPS,
   .verify = cppc_verify_policy,
   .target = cppc_cpufreq_set_target,
+ .get = cppc_cpufreq_get_rate,
   .init = cppc_cpufreq_cpu_init,
   .stop_cpu = cppc_cpufreq_stop_cpu,
   .name = "cppc_cpufreq",

[PATCH v3] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-07-09 Thread George Cherian

Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter /
delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian 
Acked-by: Viresh Kumar 
---
 drivers/cpufreq/cppc_cpufreq.c | 44 ++
 1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index a9d3eec..61132e8 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -296,10 +296,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
return ret;
 }
 
+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
+struct cppc_perf_fb_ctrs fb_ctrs_t0,
+struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+   u64 delta_reference, delta_delivered;
+   u64 reference_perf, delivered_perf;
+
+   reference_perf = fb_ctrs_t0.reference_perf;
+
+   delta_reference = (u32)fb_ctrs_t1.reference -
+   (u32)fb_ctrs_t0.reference;
+   delta_delivered = (u32)fb_ctrs_t1.delivered -
+   (u32)fb_ctrs_t0.delivered;
+
+   /* Check to avoid divide-by zero */
+   if (delta_reference || delta_delivered)
+   delivered_perf = (reference_perf * delta_delivered) /
+   delta_reference;
+   else
+   delivered_perf = cpu->perf_ctrls.desired_perf;
+
+   return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+   struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+   struct cppc_cpudata *cpu = all_cpu_data[cpunum];
+   int ret;
+
+   ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+   if (ret)
+   return ret;
+
+   udelay(2); /* 2usec delay between sampling */
+
+   ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+   if (ret)
+   return ret;
+
+   return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
+}
+
 static struct cpufreq_driver cppc_cpufreq_driver = {
.flags = CPUFREQ_CONST_LOOPS,
.verify = cppc_verify_policy,
.target = cppc_cpufreq_set_target,
+   .get = cppc_cpufreq_get_rate,
.init = cppc_cpufreq_cpu_init,
.stop_cpu = cppc_cpufreq_stop_cpu,
.name = "cppc_cpufreq",
-- 
1.8.3.1
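
The formula from the commit message above can be tried in isolation with the userspace sketch below. One caveat: the patch tests "delta_reference || delta_delivered", which still divides when only delta_reference is zero. The sketch uses a stricter guard; this is a suggested variation, not what the patch does, and compute_delivered_perf() and its sample numbers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's u64 (assumption). */
typedef uint64_t u64;

/*
 * delivered_perf = reference_perf * delta_delivered / delta_reference,
 * falling back to desired_perf when either delta is zero. This guard is
 * stricter than the "||" test in the patch, which still divides when only
 * delta_reference is zero.
 */
static u64 compute_delivered_perf(u64 reference_perf, u64 delta_delivered,
				  u64 delta_reference, u64 desired_perf)
{
	if (!delta_reference || !delta_delivered)
		return desired_perf;

	return (reference_perf * delta_delivered) / delta_reference;
}

void delivered_perf_examples(void)
{
	/* Hypothetical numbers: the CPU ran at 80% of the reference rate. */
	assert(compute_delivered_perf(100, 800, 1000, 50) == 80);

	/* Reference counter did not move: fall back rather than divide. */
	assert(compute_delivered_perf(100, 800, 0, 50) == 50);

	/* Neither counter moved (e.g. paused in idle): fall back. */
	assert(compute_delivered_perf(100, 0, 0, 50) == 50);
}
```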

Re: [v2] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-06-20 Thread George Cherian


Hi JC,

Thanks for the review.


On 06/20/2018 02:09 AM, Jayachandran C wrote:

Hi George,

Few comments on your patch:

On Fri, Jun 15, 2018 at 03:03:15AM -0700, George Cherian wrote:

Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter /
delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian 
Acked-by: Viresh Kumar 
---
  drivers/cpufreq/cppc_cpufreq.c | 71 ++
  1 file changed, 71 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 3464580..3fe7625 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -296,10 +296,81 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
return ret;
  }
  
+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,

+struct cppc_perf_fb_ctrs fb_ctrs_t0,
+struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+   u64 delta_reference, delta_delivered;
+   u64 reference_perf, delivered_perf;
+
+   reference_perf = fb_ctrs_t0.reference_perf;
+   if (fb_ctrs_t1.reference > fb_ctrs_t0.reference) {
+   delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
+   } else {
+   /*
+* Counters would have wrapped-around
+* We also need to find whether the low level fw
+* maintains 32 bit or 64 bit counters, to calculate
+* the correct delta.
+*/
+   if (fb_ctrs_t0.reference > (~(u32)0))
+   delta_reference  = (~((u64)0) - fb_ctrs_t0.reference) +
+   fb_ctrs_t1.reference;
+   else
+   delta_reference  = (~((u32)0) - fb_ctrs_t0.reference) +
+   fb_ctrs_t1.reference;
+   }
+
+   if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered) {
+   delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
+   } else {
+   /*
+* Counters would have wrapped-around
+* We also need to find whether the low level fw
+* maintains 32 bit or 64 bit counters, to calculate
+* the correct delta.
+*/
+   if (fb_ctrs_t0.delivered > (~(u32)0))
+   delta_delivered  = (~((u64)0) - fb_ctrs_t0.delivered) +
+   fb_ctrs_t1.delivered;
+   else
+   delta_delivered  = (~((u32)0) - fb_ctrs_t0.delivered) +
+   fb_ctrs_t1.delivered;
+   }


Having this code repeated twice does not look great. Also the math here
is not correct, since (~0 - val2 + val1) is off by one. Because of
binary representation, unsigned subtraction will work even if
val2 < val1. So cleaner way would be to do:

static inline u64 ts_sub(u64 t1, u64 t0)
{
if (t1 > t0 || t0 > ~(u32)0)
return t1 - t0;

return (u32)t1 - (u32)t0;
}

And then use ts_sub in both places above.
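
The off-by-one mentioned above is easy to reproduce in userspace. This is only an illustrative sketch: delta_v2_style() copies the 32-bit wrap expression from the v2 patch, delta_modular() is the plain unsigned subtraction, and the counter values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types (assumption). */
typedef uint64_t u64;
typedef uint32_t u32;

/* Wrap handling as written in the v2 patch for a 32-bit counter. */
static u64 delta_v2_style(u64 t1, u64 t0)
{
	return (~((u32)0) - t0) + t1;
}

/* Plain modulo-2^32 subtraction, as in the second return of ts_sub(). */
static u64 delta_modular(u64 t1, u64 t0)
{
	return (u32)t1 - (u32)t0;
}

void off_by_one_examples(void)
{
	/* Counter at 0xFFFFFFFE ticks 4 times: 0xFFFFFFFF, 0, 1, 2. */
	u64 t0 = 0xFFFFFFFEu;
	u64 t1 = 0x2u;

	/* Modular subtraction counts all four ticks. */
	assert(delta_modular(t1, t0) == 4);

	/* The v2 expression misses the tick from 0xFFFFFFFF to 0. */
	assert(delta_v2_style(t1, t0) == 3);
}
```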


I was actually thinking to replace the whole comparison with a single
line irrespective of rollover or not.
It will look something like this.

delta = (u32)(((1UL << 32) - t0) + t1);

This will also take care of the value being off by one.
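
A quick userspace check of the single-line form, as a sketch only. It uses 1ULL rather than 1UL so the shift is well defined where unsigned long is 32 bits wide (an adjustment made here, not part of the mail), and delta_single_line() and its inputs are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types (assumption). */
typedef uint64_t u64;
typedef uint32_t u32;

/* The proposed one-liner; the result is always truncated to 32 bits. */
static u32 delta_single_line(u64 t1, u64 t0)
{
	return (u32)(((1ULL << 32) - t0) + t1);
}

void single_line_examples(void)
{
	/* No rollover. */
	assert(delta_single_line(150, 100) == 50);

	/* 32-bit rollover: 0xFFFFFFFE -> 2 is 4 ticks, no off-by-one. */
	assert(delta_single_line(0x2, 0xFFFFFFFEu) == 4);

	/* It gives the same result as modulo-2^32 subtraction. */
	assert(delta_single_line(0x2, 0xFFFFFFFEu) ==
	       (u32)(0x2u - 0xFFFFFFFEu));

	/* A 64-bit counter that advanced by 2^33 reads as zero. */
	assert(delta_single_line(0x300000000ull, 0x100000000ull) == 0);
}
```

The last case shows the cost of the shortcut: anything above bit 31 of the real delta is lost, so it is only safe when the counters are known to be 32 bits wide or the sampling window is short.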


JC.



Regards,
-George

Re: [PATCH v2] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-06-20 Thread George Cherian


Hi Prakash,

Thanks for the review.

On 06/19/2018 01:51 AM, Prakash, Prashanth wrote:

Hi George,

On 6/15/2018 4:03 AM, George Cherian wrote:

Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian 
Acked-by: Viresh Kumar 
---
  drivers/cpufreq/cppc_cpufreq.c | 71 ++
  1 file changed, 71 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 3464580..3fe7625 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -296,10 +296,81 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
   return ret;
  }

+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
+  struct cppc_perf_fb_ctrs fb_ctrs_t0,
+  struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+ u64 delta_reference, delta_delivered;
+ u64 reference_perf, delivered_perf;
+
+ reference_perf = fb_ctrs_t0.reference_perf;
+ if (fb_ctrs_t1.reference > fb_ctrs_t0.reference) {
+ delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
+ } else {

There should be another if () here to check whether the reference counters are
equal. We cannot assume there was an overflow when the counters are equal. As I
mentioned on the last patch, the counters *may* pause in idle states.

My bad, I somehow overlooked that point. When delta_reference is zero
there is already a check below to avoid divide-by-zero. There I returned
reference perf instead of desired perf; I will take care of that in v3.
Isn't that sufficient, or is there a need for an explicit check here for
delta = zero?


Moreover, I am planning to replace the delta calculation with a single-line
computation in v3 for both the normal and the overflow case.

+ /*
+  * Counters would have wrapped-around
+  * We also need to find whether the low level fw
+  * maintains 32 bit or 64 bit counters, to calculate
+  * the correct delta.
+  */
+ if (fb_ctrs_t0.reference > (~(u32)0))
+ delta_reference  = (~((u64)0) - fb_ctrs_t0.reference) +
+ fb_ctrs_t1.reference;
+ else
+ delta_reference  = (~((u32)0) - fb_ctrs_t0.reference) +
+ fb_ctrs_t1.reference;
+ }
+
+ if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered) {
+ delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
+ } else {
+ /*
+  * Counters would have wrapped-around
+  * We also need to find whether the low level fw
+  * maintains 32 bit or 64 bit counters, to calculate
+  * the correct delta.
+  */
+ if (fb_ctrs_t0.delivered > (~(u32)0))
+ delta_delivered  = (~((u64)0) - fb_ctrs_t0.delivered) +
+ fb_ctrs_t1.delivered;
+ else
+ delta_delivered  = (~((u32)0) - fb_ctrs_t0.delivered) +
+ fb_ctrs_t1.delivered;
+ }
+
+ if (delta_reference)  /* Check to avoid divide-by zero */
+ delivered_perf = (reference_perf * delta_delivered) /
+ delta_reference;
+ else
+ delivered_perf = reference_perf;


If we cannot compute delivered performance then we should return
desired/requested perf and not reference_perf.


Noted!!
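
To make the requested fallback concrete, here is a small userspace sketch, not the v3 code. The function name and the desired_perf parameter are hypothetical: in the patch the function has no such argument, and where the last requested performance level would come from is not shown in this thread.

```c
#include <stdint.h>

/*
 * Hypothetical helper: desired_perf is the performance level that was
 * last requested; it is an assumed parameter, not part of the patch.
 */
static inline uint64_t delivered_or_desired(uint64_t reference_perf,
					    uint64_t delta_delivered,
					    uint64_t delta_reference,
					    uint64_t desired_perf)
{
	/* Reference counter did not advance: nothing can be computed. */
	if (!delta_reference)
		return desired_perf;

	return (reference_perf * delta_delivered) / delta_reference;
}
```

Checking delta_reference for zero before dividing also covers the case where both snapshots are equal, so no separate equality test on the raw counters is needed in this sketch.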

+
+ return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+ struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+ struct cppc_cpudata *cpu = all_cpu_data[cpunum];
+ int ret;
+
+ ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+ if (ret)
+ return ret;
+
+ udelay(2); /* 2usec delay between sampling */
+
+ ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+ if (ret)
+ return ret;
+
+ return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
+}
+
  static struct cpufreq_driver cppc_cpufreq_driver = {
   .flags = CPUFREQ_CONST_LOOPS,
   .verify = cppc_verify_policy,
   .target = cppc_cpufreq_set_target,
+ .get = cppc_c

[PATCH v2] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-06-15 Thread George Cherian

Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian 
Acked-by: Viresh Kumar 
---
 drivers/cpufreq/cppc_cpufreq.c | 71 ++
 1 file changed, 71 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 3464580..3fe7625 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -296,10 +296,81 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
return ret;
 }
 
+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
+struct cppc_perf_fb_ctrs fb_ctrs_t0,
+struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+   u64 delta_reference, delta_delivered;
+   u64 reference_perf, delivered_perf;
+
+   reference_perf = fb_ctrs_t0.reference_perf;
+   if (fb_ctrs_t1.reference > fb_ctrs_t0.reference) {
+   delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
+   } else {
+   /*
+* Counters would have wrapped-around
+* We also need to find whether the low level fw
+* maintains 32 bit or 64 bit counters, to calculate
+* the correct delta.
+*/
+   if (fb_ctrs_t0.reference > (~(u32)0))
+   delta_reference  = (~((u64)0) - fb_ctrs_t0.reference) +
+   fb_ctrs_t1.reference;
+   else
+   delta_reference  = (~((u32)0) - fb_ctrs_t0.reference) +
+   fb_ctrs_t1.reference;
+   }
+
+   if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered) {
+   delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
+   } else {
+   /*
+* Counters would have wrapped-around
+* We also need to find whether the low level fw
+* maintains 32 bit or 64 bit counters, to calculate
+* the correct delta.
+*/
+   if (fb_ctrs_t0.delivered > (~(u32)0))
+   delta_delivered  = (~((u64)0) - fb_ctrs_t0.delivered) +
+   fb_ctrs_t1.delivered;
+   else
+   delta_delivered  = (~((u32)0) - fb_ctrs_t0.delivered) +
+   fb_ctrs_t1.delivered;
+   }
+
+   if (delta_reference)  /* Check to avoid divide-by zero */
+   delivered_perf = (reference_perf * delta_delivered) /
+   delta_reference;
+   else
+   delivered_perf = reference_perf;
+
+   return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+   struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+   struct cppc_cpudata *cpu = all_cpu_data[cpunum];
+   int ret;
+
+   ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+   if (ret)
+   return ret;
+
+   udelay(2); /* 2usec delay between sampling */
+
+   ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+   if (ret)
+   return ret;
+
+   return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
+}
+
 static struct cpufreq_driver cppc_cpufreq_driver = {
.flags = CPUFREQ_CONST_LOOPS,
.verify = cppc_verify_policy,
.target = cppc_cpufreq_set_target,
+   .get = cppc_cpufreq_get_rate,
.init = cppc_cpufreq_cpu_init,
.stop_cpu = cppc_cpufreq_stop_cpu,
.name = "cppc_cpufreq",
-- 
2.7.4

Re: [PATCH] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-05-31 Thread George Cherian




Hi Prashanth,
On 05/29/2018 09:14 PM, Prakash, Prashanth wrote:


On 5/28/2018 1:09 AM, George Cherian wrote:

Hi Prashanth,

On 05/26/2018 02:30 AM, Prakash, Prashanth wrote:


On 5/25/2018 12:27 AM, George Cherian wrote:

Hi Prashanth,

On 05/25/2018 12:55 AM, Prakash, Prashanth wrote:

Hi George,

On 5/22/2018 5:42 AM, George Cherian wrote:

Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian 
---
    drivers/cpufreq/cppc_cpufreq.c | 44 ++
    1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index b15115a..a046915 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -240,10 +240,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
    return ret;
    }
+static int cppc_get_rate_from_fbctrs(struct cppc_perf_fb_ctrs fb_ctrs_t0,
+                                     struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+    u64 delta_reference, delta_delivered;
+    u64 reference_perf, ratio;
+
+    reference_perf = fb_ctrs_t0.reference_perf;
+    if (fb_ctrs_t1.reference > fb_ctrs_t0.reference)
+    delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
+    else /* Counters would have wrapped-around */
+    delta_reference  = ((u64)(~((u64)0)) - fb_ctrs_t0.reference) +
+    fb_ctrs_t1.reference;
+
+    if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered)
+    delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
+    else /* Counters would have wrapped-around */
+    delta_delivered  = ((u64)(~((u64)0)) - fb_ctrs_t0.delivered) +
+    fb_ctrs_t1.delivered;

We need to check that the wraparound time is long enough to make sure that
the counters cannot wrap around more than once. We can register a get() API
only after checking that the wraparound time value is reasonably high.

I am not aware of any platforms where the wraparound time is so short, but
it wouldn't hurt to check once during init.

By design the wraparound time is a 64-bit counter; for that matter, all
the feedback counters are 64-bit counters too. I don't see any
chance of the counters wrapping around twice in back-to-back reads.
The only such situation is one in which the system itself is running at a
really high frequency. Even in that case today's spec is not sufficient to
support it.


The spec doesn't say these have to be 64bit registers.  The wraparound
counter register is in spec to communicate the worst case(shortest)
counter rollover time.


Spec says these are 32 or 64 bit registers. Spec also defines counter
wraparound time in seconds. The minimum value possible is 1, as zero means the
counters are never assumed to wrap around. Even on platforms with the value set
to 1 (1 sec) I don't really see a situation in which the counter can wrap
around twice if we are putting a delay of 2usec between sampling.

ok.

Thanks




As mentioned before, this is just a defensive check to make sure that
the platform has not set it to some very low number (which is allowed
by the spec).

It might be unnecessary to have a check like this.
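
For illustration only, a sketch of what such a one-time defensive check could look like in userspace C. The function name and its parameters are hypothetical; the units (wraparound time in seconds, zero meaning the counters are never assumed to wrap) are taken from the discussion above, and the 2 usec gap is the delay used between the two snapshots.

```c
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_SEC_SKETCH 1000000ull	/* local constant for this sketch */

/*
 * Hypothetical init-time check: accept the platform only if the counters
 * cannot wrap more than once within one sampling gap.
 */
static inline bool wraparound_time_ok(uint64_t wraparound_secs,
				      uint64_t sample_gap_usec)
{
	/* Zero: counters are never assumed to wrap around. */
	if (wraparound_secs == 0)
		return true;

	/* Compare in whole seconds; rounding down keeps the check strict. */
	return wraparound_secs > sample_gap_usec / USEC_PER_SEC_SKETCH;
}
```

With a 2 usec gap every non-zero wraparound time passes, which matches the point made above that a value of 1 second is already far longer than the sampling interval.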





+
+    if (delta_reference)  /* Check to avoid divide-by zero */
+    ratio = (delta_delivered * 1000) / delta_reference;

Why not just return the computed value here instead of *1000 and later /1000?
return (ref_per * delta_del) / delta_ref;

Yes.
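
A small sketch of why the direct form is preferable with integer math. The v1 code that later divides by 1000 is not shown in this excerpt, so perf_via_scaled_ratio below is an assumed reconstruction of that approach; both function names are made up for illustration. Callers are expected to have checked delta_ref for zero already.

```c
#include <stdint.h>

/* Assumed v1 approach: scale the ratio by 1000, then divide it back out. */
static inline uint64_t perf_via_scaled_ratio(uint64_t ref_perf,
					     uint64_t delta_del,
					     uint64_t delta_ref)
{
	uint64_t ratio = (delta_del * 1000) / delta_ref;	/* truncates here */

	return (ref_perf * ratio) / 1000;
}

/* Form suggested in the review: one multiply, one divide. */
static inline uint64_t perf_direct(uint64_t ref_perf,
				   uint64_t delta_del,
				   uint64_t delta_ref)
{
	return (ref_perf * delta_del) / delta_ref;
}
```

For ref_perf = 2500, delta_del = 1234567 and delta_ref = 1000000 the scaled version returns 3085 while the direct version returns 3086, because the intermediate ratio is truncated to three decimal digits before it is applied.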

+    else
+    return -EINVAL;

Instead of EINVAL, I think we should return the current frequency.


Sorry, I didn't get you. How do you calculate the current frequency?
Did you mean reference performance?

I mean the performance that OSPM/Linux had requested earlier,
i.e. the desired_perf.

Okay, I will make necessary changes for this in v2.




The counters can pause if CPUs are in an idle state during our sampling
interval, so if the counters did not progress, it is reasonable to assume
the delivered perf was equal to desired perf.

No, that is wrong. Here the check is for the reference performance delta.
This counter can never pause. In case of cpuidle, only the delivered counters
could pause. Delivered counters will pause only if the particular core enters
power-down mode; otherwise we would still be clocking the core and we should be
getting a delta across 2 sampling periods. In case the reference counter is
paused which

Re: [PATCH] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-05-28 Thread George Cherian


Hi Prashanth,

On 05/26/2018 02:30 AM, Prakash, Prashanth wrote:


On 5/25/2018 12:27 AM, George Cherian wrote:

Hi Prashanth,

On 05/25/2018 12:55 AM, Prakash, Prashanth wrote:

Hi George,

On 5/22/2018 5:42 AM, George Cherian wrote:

Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian <george.cher...@cavium.com>
---
   drivers/cpufreq/cppc_cpufreq.c | 44 ++
   1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index b15115a..a046915 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -240,10 +240,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
   return ret;
   }
+static int cppc_get_rate_from_fbctrs(struct cppc_perf_fb_ctrs fb_ctrs_t0,
+                                     struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+    u64 delta_reference, delta_delivered;
+    u64 reference_perf, ratio;
+
+    reference_perf = fb_ctrs_t0.reference_perf;
+    if (fb_ctrs_t1.reference > fb_ctrs_t0.reference)
+    delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
+    else /* Counters would have wrapped-around */
+    delta_reference  = ((u64)(~((u64)0)) - fb_ctrs_t0.reference) +
+    fb_ctrs_t1.reference;
+
+    if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered)
+    delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
+    else /* Counters would have wrapped-around */
+    delta_delivered  = ((u64)(~((u64)0)) - fb_ctrs_t0.delivered) +
+    fb_ctrs_t1.delivered;

We need to check that the wraparound time is long enough to make sure that
the counters cannot wrap around more than once. We can register a get() API
only after checking that the wraparound time value is reasonably high.

I am not aware of any platforms where the wraparound time is so short, but
it wouldn't hurt to check once during init.

By design the wraparound time is a 64-bit counter; for that matter, all
the feedback counters are 64-bit counters too. I don't see any
chance of the counters wrapping around twice in back-to-back reads.
The only such situation is one in which the system itself is running at a
really high frequency. Even in that case today's spec is not sufficient to
support it.


The spec doesn't say these have to be 64bit registers.  The wraparound
counter register is in spec to communicate the worst case(shortest)
counter rollover time.


Spec says these are 32 or 64 bit registers. Spec also defines counter
wraparound time in seconds. The minimum value possible is 1, as zero means the
counters are never assumed to wrap around. Even on platforms with the value set
to 1 (1 sec) I don't really see a situation in which the counter can wrap
around twice if we are putting a delay of 2usec between sampling.



As mentioned before, this is just a defensive check to make sure that
the platform has not set it to some very low number (which is allowed
by the spec).

It might be unnecessary to have a check like this.





+
+    if (delta_reference)  /* Check to avoid divide-by zero */
+    ratio = (delta_delivered * 1000) / delta_reference;

Why not just return the computed value here instead of *1000 and later /1000?
return (ref_per * delta_del) / delta_ref;

Yes.

+    else
+    return -EINVAL;

Instead of EINVAL, I think we should return the current frequency.


Sorry, I didn't get you. How do you calculate the current frequency?
Did you mean reference performance?

I mean the performance that OSPM/Linux had requested earlier,
i.e. the desired_perf.

Okay, I will make necessary changes for this in v2.




The counters can pause if CPUs are in an idle state during our sampling
interval, so if the counters did not progress, it is reasonable to assume
the delivered perf was equal to desired perf.

No, that is wrong. Here the check is for the reference performance delta.
This counter can never pause. In case of cpuidle, only the delivered counters
could pause. Delivered counters will pause only if the particular core enters
power-down mode; otherwise we would still be clocking the core and we should be
getting a delta across 2 sampling periods. In case the reference counter is
paused, which by design is not correct, then there is no point in returning
reference performance numbers. That too is

Re: [PATCH] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-05-28 Thread George Cherian


Hi Prashanth,

On 05/26/2018 02:30 AM, Prakash, Prashanth wrote:


On 5/25/2018 12:27 AM, George Cherian wrote:

Hi Prashanth,

On 05/25/2018 12:55 AM, Prakash, Prashanth wrote:

Hi George,

On 5/22/2018 5:42 AM, George Cherian wrote:

Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance
feedback via set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter / delta of 
reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian 
---
   drivers/cpufreq/cppc_cpufreq.c | 44 
++
   1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index b15115a..a046915 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -240,10 +240,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
   return ret;
   }
   +static int cppc_get_rate_from_fbctrs(struct cppc_perf_fb_ctrs fb_ctrs_t0,
+ struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+    u64 delta_reference, delta_delivered;
+    u64 reference_perf, ratio;
+
+    reference_perf = fb_ctrs_t0.reference_perf;
+    if (fb_ctrs_t1.reference > fb_ctrs_t0.reference)
+    delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
+    else /* Counters would have wrapped-around */
+    delta_reference  = ((u64)(~((u64)0)) - fb_ctrs_t0.reference) +
+    fb_ctrs_t1.reference;
+
+    if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered)
+    delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
+    else /* Counters would have wrapped-around */
+    delta_delivered  = ((u64)(~((u64)0)) - fb_ctrs_t0.delivered) +
+    fb_ctrs_t1.delivered;

We need to check that the wraparound time is long enough to make sure that
the counters cannot wrap around more than once. We can register a get() API
only after checking that the wraparound time value is reasonably high.

I am not aware of any platforms where the wraparound time is so short, but
it wouldn't hurt to check once during init.

By design the wraparound time is a 64-bit counter; for that matter, all
the feedback counters are 64-bit counters too. I don't see any way
the counters can wrap around twice in back-to-back reads.
The only such situation is one in which the system itself is running at a
really high frequency. Even in that case, today's spec is not sufficient to
support it.


The spec doesn't say these have to be 64-bit registers. The wraparound
counter register is in the spec to communicate the worst-case (shortest)
counter rollover time.


The spec says these are 32- or 64-bit registers. The spec also defines the
counter wraparound time in seconds. The minimum possible value is 1, as zero
means the counters are assumed never to wrap around. Even on platforms
with the value set to 1 (1 sec), I don't really see a situation in which
the counter can wrap around twice if we put a delay of 2 usec
between samples.



As mentioned before, this is just a defensive check to make sure that
the platform has not set it to some very low number (which is allowed
by the spec).

It might be unnecessary to have a check like this.
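
The defensive check discussed above could look roughly like the following. This is only a userspace sketch: `wraparound_time_ok` and its parameters are hypothetical names, not part of the patch or of the kernel's CPPC code. It assumes, as stated in the discussion, that the wraparound time is given in seconds and that zero means the counters are assumed never to wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when two samples taken
 * sample_gap_us microseconds apart cannot span more than one
 * counter rollover.
 */
static bool wraparound_time_ok(uint32_t wraparound_secs, uint32_t sample_gap_us)
{
	if (wraparound_secs == 0)	/* counters are assumed never to wrap */
		return true;

	/* Compare in microseconds; widen first to avoid 32-bit overflow. */
	return (uint64_t)wraparound_secs * 1000000ULL > sample_gap_us;
}
```

With the 2 usec sampling delay mentioned in the thread, even the minimum wraparound time of 1 second passes such a check, which is consistent with the view that the check may be unnecessary in practice.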





+
+    if (delta_reference)  /* Check to avoid divide-by zero */
+    ratio = (delta_delivered * 1000) / delta_reference;

Why not just return the computed value here instead of *1000 and later /1000?
return (ref_per * delta_del) / delta_ref;

Yes.
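
To illustrate the review comment, the two forms can be compared side by side. This is a sketch with hypothetical function names; both helpers assume the caller has already checked that `delta_ref` is non-zero, and neither guards against overflow of the 64-bit intermediate product.

```c
#include <stdint.h>

/* Two-step form from the patch: scale the ratio by 1000, then undo it. */
static uint64_t perf_via_scaled_ratio(uint64_t ref_perf, uint64_t delta_del,
				      uint64_t delta_ref)
{
	uint64_t ratio = (delta_del * 1000) / delta_ref;

	return (ref_perf * ratio) / 1000;
}

/* One-step form suggested in review: a single integer division. */
static uint64_t perf_direct(uint64_t ref_perf, uint64_t delta_del,
			    uint64_t delta_ref)
{
	return (ref_perf * delta_del) / delta_ref;
}
```

For `ref_perf = 2000000`, `delta_del = 1` and `delta_ref = 3`, the two-step form truncates the ratio to 333 and yields 666000, while the one-step form yields 666666, so the direct computation loses less precision.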

+    else
+    return -EINVAL;

Instead of EINVAL, I think we should return the current frequency.


Sorry, I didn't get you. How do you calculate the current frequency?
Did you mean reference performance?

I mean the performance that OSPM/Linux had requested earlier.
i.e the desired_perf

Okay, I will make necessary changes for this in v2.




The counters can pause if CPUs are in idle state during our sampling interval,
so if the counters did not progress, it is reasonable to assume the delivered
perf was equal to desired perf.

No, that is wrong. Here the check is on the reference performance delta.
This counter can never pause. With cpuidle, only the delivered counters
could pause, and they will pause only if the particular core enters
power-down mode. Otherwise we would still be clocking the core and we should
see a delta across two sampling periods. If the reference counter is
paused, which by design is not correct, then there is no point in returning
reference performance numbers. That too is wrong. In case the low

Re: [PATCH] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-05-25 Thread George Cherian


Hi Prashanth,

On 05/25/2018 12:55 AM, Prakash, Prashanth wrote:

Hi George,

On 5/22/2018 5:42 AM, George Cherian wrote:

Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter / delta of 
reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian <george.cher...@cavium.com>
---
  drivers/cpufreq/cppc_cpufreq.c | 44 ++
  1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index b15115a..a046915 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -240,10 +240,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
return ret;
  }
  
+static int cppc_get_rate_from_fbctrs(struct cppc_perf_fb_ctrs fb_ctrs_t0,

+struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+   u64 delta_reference, delta_delivered;
+   u64 reference_perf, ratio;
+
+   reference_perf = fb_ctrs_t0.reference_perf;
+   if (fb_ctrs_t1.reference > fb_ctrs_t0.reference)
+   delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
+   else /* Counters would have wrapped-around */
+   delta_reference  = ((u64)(~((u64)0)) - fb_ctrs_t0.reference) +
+   fb_ctrs_t1.reference;
+
+   if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered)
+   delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
+   else /* Counters would have wrapped-around */
+   delta_delivered  = ((u64)(~((u64)0)) - fb_ctrs_t0.delivered) +
+   fb_ctrs_t1.delivered;

We need to check that the wraparound time is long enough to make sure that
the counters cannot wrap around more than once. We can register a get() API
only after checking that the wraparound time value is reasonably high.

I am not aware of any platforms where the wraparound time is so short, but
it wouldn't hurt to check once during init.

By design the wraparound time is a 64-bit counter; for that matter, all
the feedback counters are 64-bit counters too. I don't see any way
the counters can wrap around twice in back-to-back reads.
The only such situation is one in which the system itself is running at a
really high frequency. Even in that case, today's spec is not sufficient to
support it.



+
+   if (delta_reference)  /* Check to avoid divide-by zero */
+   ratio = (delta_delivered * 1000) / delta_reference;

Why not just return the computed value here instead of *1000 and later /1000?
return (ref_per * delta_del) / delta_ref;

Yes.

+   else
+   return -EINVAL;

Instead of EINVAL, I think we should return the current frequency.


Sorry, I didn't get you. How do you calculate the current frequency?
Did you mean reference performance?


The counters can pause if CPUs are in idle state during our sampling interval,
so if the counters did not progress, it is reasonable to assume the delivered
perf was equal to desired perf.

No, that is wrong. Here the check is on the reference performance delta.
This counter can never pause. With cpuidle, only the delivered
counters could pause, and they will pause only if the
particular core enters power-down mode. Otherwise we would still be
clocking the core and we should see a delta across two sampling
periods. If the reference counter is paused, which by design is
not correct, then there is no point in returning reference performance
numbers. That too is wrong. If the low-level FW is not updating the
counters properly, that should be visible to Linux, instead of
returning a bogus frequency.


Even if the platform wanted to limit performance, since the CPUs were asleep
(idle) we could not have observed lower performance, so we will not throw off
any logic that could be driven by the returned value.

+
+   return (reference_perf * ratio) / 1000;

This should be converted to kHz, as cpufreq is not aware of the CPPC abstract
scale.

On our platform all performance registers are implemented in kHz, so we
never had an issue with conversion. I am not aware of whether ACPI mandates
any particular unit. How is that implemented on your platform? To avoid
any extra conversion, don't you feel it is better to always report in kHz
from firmware?


+}
+
+static unsigned int cppc_cpufreq_get_rate(unsi

[PATCH] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-05-22 Thread George Cherian

Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter / delta of 
reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian <george.cher...@cavium.com>
---
 drivers/cpufreq/cppc_cpufreq.c | 44 ++
 1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index b15115a..a046915 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -240,10 +240,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	return ret;
 }
 
+static int cppc_get_rate_from_fbctrs(struct cppc_perf_fb_ctrs fb_ctrs_t0,
+				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+	u64 delta_reference, delta_delivered;
+	u64 reference_perf, ratio;
+
+	reference_perf = fb_ctrs_t0.reference_perf;
+	if (fb_ctrs_t1.reference > fb_ctrs_t0.reference)
+		delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
+	else /* Counters would have wrapped-around */
+		delta_reference  = ((u64)(~((u64)0)) - fb_ctrs_t0.reference) +
+					fb_ctrs_t1.reference;
+
+	if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered)
+		delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
+	else /* Counters would have wrapped-around */
+		delta_delivered  = ((u64)(~((u64)0)) - fb_ctrs_t0.delivered) +
+					fb_ctrs_t1.delivered;
+
+	if (delta_reference)  /* Check to avoid divide-by zero */
+		ratio = (delta_delivered * 1000) / delta_reference;
+	else
+		return -EINVAL;
+
+	return (reference_perf * ratio) / 1000;
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+	struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+	int ret;
+
+	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+	if (ret)
+		return ret;
+
+	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+	if (ret)
+		return ret;
+
+	return cppc_get_rate_from_fbctrs(fb_ctrs_t0, fb_ctrs_t1);
+}
+
 static struct cpufreq_driver cppc_cpufreq_driver = {
 	.flags = CPUFREQ_CONST_LOOPS,
 	.verify = cppc_verify_policy,
 	.target = cppc_cpufreq_set_target,
+	.get = cppc_cpufreq_get_rate,
 	.init = cppc_cpufreq_cpu_init,
 	.stop_cpu = cppc_cpufreq_stop_cpu,
 	.name = "cppc_cpufreq",
-- 
1.8.3.1
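
The delivered-performance formula from the commit message can be sketched in plain userspace C as follows. The names `ctr_delta` and `delivered_perf` are hypothetical and this is not the code from the patch. It relies on the fact that unsigned subtraction is modulo 2^64, so `t1 - t0` already gives the elapsed count across a single wraparound of a full 64-bit counter; narrower counters would need masking, which is not shown. The `fallback` argument stands for whatever the caller chooses to report when the reference counter did not advance (for example the desired perf suggested in review), and is likewise an assumption.

```c
#include <stdint.h>

/* Elapsed count between two samples of a free-running 64-bit counter. */
static uint64_t ctr_delta(uint64_t t0, uint64_t t1)
{
	return t1 - t0;		/* modulo 2^64, so one wraparound is handled */
}

/*
 * delivered_perf = reference_perf * (delta delivered / delta reference).
 * Returns fallback when the reference counter did not advance.
 */
static uint64_t delivered_perf(uint64_t reference_perf, uint64_t d_delivered,
			       uint64_t d_reference, uint64_t fallback)
{
	if (d_reference == 0)	/* avoid divide-by-zero */
		return fallback;

	return (reference_perf * d_delivered) / d_reference;
}
```

Note that for a wrapped counter the explicit form `(~0 - t0) + t1` used in the patch evaluates to one less than `t1 - t0` computed modulo 2^64; with large deltas the difference is negligible.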

[PATCH 1/4] i2c: xlp9xx: Add support for SMBAlert

2018-05-16 Thread George Cherian

Add support for the SMBus alert mechanism to the i2c-xlp9xx driver.
The second interrupt is parsed for use as the SMBus alert interrupt.
The first interrupt is the i2c controller's main interrupt.

Signed-off-by: Kamlakant Patel <kamlakant.pa...@cavium.com>
Signed-off-by: George Cherian <george.cher...@cavium.com>
Reviewed-by: Jan Glauber <jglau...@cavium.com>
---
 drivers/i2c/busses/i2c-xlp9xx.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/i2c/busses/i2c-xlp9xx.c b/drivers/i2c/busses/i2c-xlp9xx.c
index eb8913e..fe54512 100644
--- a/drivers/i2c/busses/i2c-xlp9xx.c
+++ b/drivers/i2c/busses/i2c-xlp9xx.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include <linux/i2c-smbus.h>
 #include 
 #include 
 #include 
@@ -84,6 +85,8 @@ struct xlp9xx_i2c_dev {
 	struct device *dev;
 	struct i2c_adapter adapter;
 	struct completion msg_complete;
+	struct i2c_smbus_alert_setup alert_data;
+	struct i2c_client *ara;
 	int irq;
 	bool msg_read;
 	bool len_recv;
@@ -447,6 +450,19 @@ static int xlp9xx_i2c_get_frequency(struct platform_device *pdev,
 	return 0;
 }
 
+static int xlp9xx_i2c_smbus_setup(struct xlp9xx_i2c_dev *priv,
+				  struct platform_device *pdev)
+{
+	if (!priv->alert_data.irq)
+		return -EINVAL;
+
+	priv->ara = i2c_setup_smbus_alert(&priv->adapter, &priv->alert_data);
+	if (!priv->ara)
+		return -ENODEV;
+
+	return 0;
+}
+
 static int xlp9xx_i2c_probe(struct platform_device *pdev)
 {
 	struct xlp9xx_i2c_dev *priv;
@@ -467,6 +483,10 @@ static int xlp9xx_i2c_probe(struct platform_device *pdev)
 		dev_err(&pdev->dev, "invalid irq!\n");
 		return priv->irq;
 	}
+	/* SMBAlert irq */
+	priv->alert_data.irq = platform_get_irq(pdev, 1);
+	if (priv->alert_data.irq <= 0)
+		priv->alert_data.irq = 0;
 
 	xlp9xx_i2c_get_frequency(pdev, priv);
 	xlp9xx_i2c_init(priv);
@@ -493,6 +513,10 @@ static int xlp9xx_i2c_probe(struct platform_device *pdev)
 	if (err)
 		return err;
 
+	err = xlp9xx_i2c_smbus_setup(priv, pdev);
+	if (err)
+		dev_dbg(&pdev->dev, "No active SMBus alert %d\n", err);
+
 	platform_set_drvdata(pdev, priv);
 	dev_dbg(&pdev->dev, "I2C bus:%d added\n", priv->adapter.nr);
 
-- 
1.8.3.1

[PATCH 3/4] i2c: xlp9xx: Make sure the transfer size is not more than I2C_SMBUS_BLOCK_SIZE

2018-05-16 Thread George Cherian

For SMBus transactions the maximum permissible transfer size is
I2C_SMBUS_BLOCK_MAX. Some clients may occasionally violate this limit,
which would lead to stack corruption if the driver copied more than
I2C_SMBUS_BLOCK_MAX bytes. Add a check to avoid such conditions.

Signed-off-by: Jayachandran C <jn...@caviumnetworks.com>
Signed-off-by: George Cherian <george.cher...@cavium.com>
---
 drivers/i2c/busses/i2c-xlp9xx.c | 37 -
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/i2c/busses/i2c-xlp9xx.c b/drivers/i2c/busses/i2c-xlp9xx.c
index c268fde..1f41a4f 100644
--- a/drivers/i2c/busses/i2c-xlp9xx.c
+++ b/drivers/i2c/busses/i2c-xlp9xx.c
@@ -172,6 +172,8 @@ static void xlp9xx_i2c_update_rlen(struct xlp9xx_i2c_dev *priv)
 	len = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_FIFOWCNT) &
 				  XLP9XX_I2C_FIFO_WCNT_MASK;
 	len = max_t(u32, priv->msg_len, len + 4);
+	if (len >= I2C_SMBUS_BLOCK_MAX + 2)
+		return;
 	val = (val & ~XLP9XX_I2C_CTRL_MCTLEN_MASK) |
 			(len << XLP9XX_I2C_CTRL_MCTLEN_SHIFT);
 	xlp9xx_write_i2c_reg(priv, XLP9XX_I2C_CTRL, val);
@@ -189,14 +191,20 @@ static void xlp9xx_i2c_drain_rx_fifo(struct xlp9xx_i2c_dev *priv)
 	if (priv->len_recv) {
 		/* read length byte */
 		rlen = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
-		*buf++ = rlen;
-		if (priv->client_pec)
-			++rlen;
-		/* update remaining bytes and message length */
-		priv->msg_buf_remaining = rlen;
-		priv->msg_len = rlen + 1;
-		priv->len_recv = false;
+		if (rlen > I2C_SMBUS_BLOCK_MAX || rlen == 0) {
+			rlen = 0;	/*abort transfer */
+			priv->msg_buf_remaining = 0;
+			priv->msg_len = 0;
+		} else {
+			*buf++ = rlen;
+			if (priv->client_pec)
+				++rlen; /* account for error check byte */
+			/* update remaining bytes and message length */
+			priv->msg_buf_remaining = rlen;
+			priv->msg_len = rlen + 1;
+		}
 		xlp9xx_i2c_update_rlen(priv);
+		priv->len_recv = false;
 	} else {
 		len = min(priv->msg_buf_remaining, len);
 		for (i = 0; i < len; i++, buf++)
@@ -315,10 +323,6 @@ static int xlp9xx_i2c_xfer_msg(struct xlp9xx_i2c_dev *priv, struct i2c_msg *msg,
 	xlp9xx_write_i2c_reg(priv, XLP9XX_I2C_MFIFOCTRL,
 			     XLP9XX_I2C_MFIFOCTRL_RST);
 
-	/* set FIFO threshold if reading */
-	if (priv->msg_read)
-		xlp9xx_i2c_update_rx_fifo_thres(priv);
-
 	/* set slave addr */
 	xlp9xx_write_i2c_reg(priv, XLP9XX_I2C_SLAVEADDR,
 			     (msg->addr << XLP9XX_I2C_SLAVEADDR_ADDR_SHIFT) |
@@ -337,9 +341,13 @@ static int xlp9xx_i2c_xfer_msg(struct xlp9xx_i2c_dev *priv, struct i2c_msg *msg,
 		val &= ~XLP9XX_I2C_CTRL_ADDMODE;
 
 	priv->len_recv = msg->flags & I2C_M_RECV_LEN;
-	len = priv->len_recv ? XLP9XX_I2C_FIFO_SIZE : msg->len;
+	len = priv->len_recv ? I2C_SMBUS_BLOCK_MAX + 2 : msg->len;
 	priv->client_pec = msg->flags & I2C_CLIENT_PEC;
 
+	/* set FIFO threshold if reading */
+	if (priv->msg_read)
+		xlp9xx_i2c_update_rx_fifo_thres(priv);
+
 	/* set data length to be transferred */
 	val = (val & ~XLP9XX_I2C_CTRL_MCTLEN_MASK) |
 	      (len << XLP9XX_I2C_CTRL_MCTLEN_SHIFT);
@@ -393,8 +401,11 @@ static int xlp9xx_i2c_xfer_msg(struct xlp9xx_i2c_dev *priv, struct i2c_msg *msg,
 	}
 
 	/* update msg->len with actual received length */
-	if (msg->flags & I2C_M_RECV_LEN)
+	if (msg->flags & I2C_M_RECV_LEN) {
+		if (!priv->msg_len)
+			return -EPROTO;
 		msg->len = priv->msg_len;
+	}
 	return 0;
 }
 
-- 
1.8.3.1
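
The length validation added by this patch can be illustrated outside the driver. The sketch below is a userspace approximation with hypothetical names (`SMBUS_BLOCK_MAX`, `smbus_block_msg_len`); it assumes the limit is 32 bytes, matching the kernel's I2C_SMBUS_BLOCK_MAX, and mirrors only the arithmetic of the patch, not its register accesses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match I2C_SMBUS_BLOCK_MAX in the kernel's i2c headers. */
#define SMBUS_BLOCK_MAX 32

/*
 * Validate the length byte of an SMBus block read. On success returns
 * the total message length (length byte + data + optional PEC byte);
 * returns 0 when the transfer should be aborted.
 */
static unsigned int smbus_block_msg_len(uint8_t rlen, bool pec)
{
	if (rlen == 0 || rlen > SMBUS_BLOCK_MAX)
		return 0;
	if (pec)
		rlen++;		/* account for the error check byte */
	return rlen + 1u;	/* plus the length byte itself */
}
```

The largest valid result is 34, that is `SMBUS_BLOCK_MAX + 2`, which corresponds to the transfer length the patch programs for block reads.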

[PATCH 4/4] i2c: xlp9xx: Add MAINTAINERS entry

2018-05-16 Thread George Cherian

The i2c XLP9xx driver is maintained by Cavium.
Add George Cherian and Jan Glauber as the Maintainers.

Signed-off-by: George Cherian <george.cher...@cavium.com>
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index df6e9bb..68da265 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15509,6 +15509,14 @@ L: linux-kernel@vger.kernel.org
 S: Supported
 F: drivers/char/xillybus/
 
+XLP9XX I2C DRIVER
+M: George Cherian <george.cher...@cavium.com>
+M: Jan Glauber <jglau...@cavium.com>
+L: linux-...@vger.kernel.org
+W: http://www.cavium.com
+S: Supported
+F: drivers/i2c/busses/i2c-xlp9xx.c
+
 XRA1403 GPIO EXPANDER
 M: Nandor Han <nandor@ge.com>
 M: Semi Malinen <semi.mali...@ge.com>
-- 
1.8.3.1

[PATCH 2/4] i2c: xlp9xx: Fix issue seen when updating receive length

2018-05-16 Thread George Cherian

The hardware does not handle updates to the length register gracefully
if the new value is less than the number of bytes received so far. If
this happens, the i2c controller will not stop the receive transaction
properly.

Fix this by ensuring that the updated length is valid: the new length
written to the hardware must be at least a few bytes more than the number
of bytes received so far.

While at it, move the length update into a new function.

Signed-off-by: Jayachandran C <jn...@caviumnetworks.com>
Signed-off-by: George Cherian <george.cher...@cavium.com>
---
 drivers/i2c/busses/i2c-xlp9xx.c | 30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/i2c/busses/i2c-xlp9xx.c b/drivers/i2c/busses/i2c-xlp9xx.c
index fe54512..c268fde 100644
--- a/drivers/i2c/busses/i2c-xlp9xx.c
+++ b/drivers/i2c/busses/i2c-xlp9xx.c
@@ -158,9 +158,28 @@ static void xlp9xx_i2c_fill_tx_fifo(struct xlp9xx_i2c_dev *priv)
priv->msg_buf += len;
 }
 
+static void xlp9xx_i2c_update_rlen(struct xlp9xx_i2c_dev *priv)
+{
+   u32 val, len;
+
+   /*
+* Update receive length. Re-read len to get the latest value,
+* and then add 4 to have a minimum value that can be safely
+* written. This is to account for the byte read above, the
+* transfer in progress and any delays in the register I/O
+*/
+   val = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_CTRL);
+   len = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_FIFOWCNT) &
+ XLP9XX_I2C_FIFO_WCNT_MASK;
+   len = max_t(u32, priv->msg_len, len + 4);
+   val = (val & ~XLP9XX_I2C_CTRL_MCTLEN_MASK) |
+   (len << XLP9XX_I2C_CTRL_MCTLEN_SHIFT);
+   xlp9xx_write_i2c_reg(priv, XLP9XX_I2C_CTRL, val);
+}
+
 static void xlp9xx_i2c_drain_rx_fifo(struct xlp9xx_i2c_dev *priv)
 {
-   u32 len, i, val;
+   u32 len, i;
u8 rlen, *buf = priv->msg_buf;
 
len = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_FIFOWCNT) &
@@ -171,20 +190,13 @@ static void xlp9xx_i2c_drain_rx_fifo(struct xlp9xx_i2c_dev *priv)
/* read length byte */
rlen = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
*buf++ = rlen;
-   len--;
-
if (priv->client_pec)
++rlen;
/* update remaining bytes and message length */
priv->msg_buf_remaining = rlen;
priv->msg_len = rlen + 1;
priv->len_recv = false;
-
-   /* Update transfer length to read only actual data */
-   val = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_CTRL);
-   val = (val & ~XLP9XX_I2C_CTRL_MCTLEN_MASK) |
-   ((rlen + 1) << XLP9XX_I2C_CTRL_MCTLEN_SHIFT);
-   xlp9xx_write_i2c_reg(priv, XLP9XX_I2C_CTRL, val);
+   xlp9xx_i2c_update_rlen(priv);
} else {
len = min(priv->msg_buf_remaining, len);
for (i = 0; i < len; i++, buf++)
-- 
1.8.3.1
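
The length selection described in the patch above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the driver code: safe_rx_len() and RLEN_SAFETY_MARGIN are hypothetical names, and the FIFO count is passed in as a plain number instead of being read from XLP9XX_I2C_FIFOWCNT.

```c
#include <stdint.h>

/* Hypothetical name; the patch itself uses a literal 4 as the margin. */
#define RLEN_SAFETY_MARGIN 4u

/*
 * Choose the receive length to program into the controller.
 * The result is never below the bytes already counted in the FIFO plus
 * a small margin, and never below the expected message length.
 */
static uint32_t safe_rx_len(uint32_t msg_len, uint32_t fifo_count)
{
	uint32_t floor = fifo_count + RLEN_SAFETY_MARGIN;

	return msg_len > floor ? msg_len : floor;
}
```

With a short message (msg_len of 3) and 2 bytes already counted, the sketch picks 6 rather than 3, so the value written is still ahead of what the hardware has received.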

[PATCH 0/4] i2c-xlp9xx Add support for SMBAlert and minor fixes

2018-05-16 Thread George Cherian

This series adds SMBAlert support to the i2c-xlp9xx driver, along with the
following fixes.

Patch 2: Make sure we update the transfer length to a future length.

Patch 3: Restrict the transfer size to I2C_SMBUS_BLOCK_SIZE for transfers
with I2C_M_RECV_LEN set.

Patch 4: While at it, update the MAINTAINERS file to reflect the current
maintainers of the driver.

George Cherian (4):
  i2c: xlp9xx: Add support for SMBAlert
  i2c: xlp9xx: Fix issue seen when updating receive length
  i2c: xlp9xx: Make sure the transfer size is not more than
I2C_SMBUS_BLOCK_SIZE
  i2c: xlp9xx: Add MAINTAINERS entry

 MAINTAINERS |  8 
 drivers/i2c/busses/i2c-xlp9xx.c | 89 +++--
 2 files changed, 76 insertions(+), 21 deletions(-)

-- 
1.8.3.1
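
Patch 3 itself is not shown in this listing, so the following is only a sketch of the idea in its description: bounding a device-supplied length byte for I2C_M_RECV_LEN transfers. It assumes the limit is the 32-byte SMBus block maximum; SMBUS_BLOCK_LIMIT and clamp_block_len() are hypothetical names, not taken from the driver.

```c
#include <stdint.h>

/* Assumed limit: the SMBus block transfer maximum of 32 data bytes. */
#define SMBUS_BLOCK_LIMIT 32u

/*
 * Bound the length byte reported by the device so that a bogus value
 * cannot make the transfer overrun a block-sized buffer.
 */
static uint8_t clamp_block_len(uint8_t rlen)
{
	return rlen > SMBUS_BLOCK_LIMIT ? (uint8_t)SMBUS_BLOCK_LIMIT : rlen;
}
```

Whether the real patch clamps the value or rejects the transfer is not stated in the cover letter; clamping is used here only to keep the sketch short.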

[PATCH] cpufreq: cppc: Use transition_delay_us depending on the transition_latency

2018-03-23 Thread George Cherian

Since commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
value to 10 ms"), cpufreq no longer honours the delay passed via
ACPI (PCCT). As a result, on ARM-based platforms using CPPC the cpufreq
governor tries to change the CPU frequency faster than expected.

This leads to continuous error messages like the following:
" ACPI CPPC: PCC check channel failed. Status=0 "

Earlier (without the above commit) the default transition delay was
taken from the value passed by PCCT. Use the same PCCT-provided value
to set transition_delay_us.

Fixes: e948bc8fbee0 ("cpufreq: Cap the default transition delay value to 10 ms")
Signed-off-by: George Cherian <george.cher...@cavium.com>
---
 drivers/cpufreq/cppc_cpufreq.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index a1c3025..dcb1cb9 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -162,6 +163,8 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
policy->cpuinfo.max_freq = cppc_dmi_max_khz;
 
policy->cpuinfo.transition_latency = cppc_get_transition_latency(cpu_num);
+   policy->transition_delay_us = cppc_get_transition_latency(cpu_num) /
+   NSEC_PER_USEC;
policy->shared_type = cpu->shared_type;
 
if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY)
-- 
1.8.3.1
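
The unit conversion in the patch above is a plain integer division from nanoseconds to microseconds. A minimal standalone sketch follows, assuming the latency is available as a 32-bit nanosecond count; latency_ns_to_delay_us() and NSEC_PER_USEC_SKETCH are hypothetical names, with the constant mirroring the kernel's NSEC_PER_USEC value of 1000.

```c
#include <stdint.h>

/* Mirrors the value of the kernel's NSEC_PER_USEC (1000). */
#define NSEC_PER_USEC_SKETCH 1000u

/*
 * Convert a transition latency in nanoseconds to a delay in
 * microseconds. Integer division truncates, so any latency below
 * 1000 ns yields 0.
 */
static uint32_t latency_ns_to_delay_us(uint32_t latency_ns)
{
	return latency_ns / NSEC_PER_USEC_SKETCH;
}
```

For example, a 5 ms latency (5000000 ns) gives a 5000 us delay. The truncation to 0 for sub-microsecond latencies is a property of this sketch's arithmetic; how the cpufreq core treats a zero delay is not covered by the patch text.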

Re: [PATCH] PCI: Add quirk for Cavium Thunder-X2 PCIe erratum #173

2018-03-13 Thread George Cherian

Hi Bjorn,

On 02/22/2018 08:39 PM, Bjorn Helgaas wrote:

On Thu, Feb 22, 2018 at 06:43:34PM +0530, George Cherian wrote:

On 02/22/2018 04:50 AM, Bjorn Helgaas wrote:

On Wed, Feb 21, 2018 at 04:25:08PM +0530, George Cherian wrote:

On 02/21/2018 03:24 PM, Lukas Wunner wrote:

On Wed, Feb 21, 2018 at 02:58:13PM +0530, George Cherian wrote:

I will explain the setup used.
The following PLX device is connected to the Cavium ThunderX RC:
PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express
Gen 3 (8.0 GT/s) Switch
There is no device connected downstream of the PLX switch.

AFAIU the pcie_port driver probes PLX and enters autosuspend
after 100ms since pci_bridge_d3_possible() returns true.

And later pci_sysfs_init() ends up doing a config access of
PLX which fails with a "synchronous external abort"

Thanks for the details!

This one *should* be fixed by this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualization=bf6c089ee2ac67eb22c0ff0ac9cc7f9ccd619d90

Any chance you could try that out?

I did try your patch and it works fine on the above failing setup.

Thanks for testing it!

I have found another configuration where this fails.
The configuration is as follows:
1) Connect a PCIe Intel i40 card under the root port.
2) Unbind the i40 driver and bind the device to the vfio-pci driver.
3) Run lspci in a loop: "lspci -s xx:xx.xx -vvv"

I get the same synchronous external abort.
In this case the vfio-pci driver probe moves the device (i40) to
D3hot, provided disable_idle_d3 is not set. lspci then tries to do
a config access, which fails with a synchronous external abort when
the root port transitions to D3hot.

The stack trace for this issue looks like this:
[] pci_generic_config_read+0x5c/0xf0
[] pci_user_read_config_dword+0x84/0x110
[] pci_vpd_read+0x100/0x208
[] pci_read_vpd+0x50/0x68
[] read_vpd_attr+0x60/0x80
[] sysfs_kf_bin_read+0x6c/0xa8
[] kernfs_fop_read+0xa4/0x1c8
[] __vfs_read+0x60/0x170
[] vfs_read+0x8c/0x148
[] SyS_pread64+0xbc/0xd8

I have tried adding a pci_config_pm_runtime_get/put pair inside
pci_vpd_read(), which I guess might be needed in case the device goes
to D3cold. That said, it didn't fix the problem on our platform.

Your original patch avoids this problem by setting PCI_DEV_FLAGS_NO_D3
on the root port, so it seems like this must be somehow related to the
root port's state.

This seems to be another issue and is not related to $SUBJECT.
Our hardware team is looking into it internally, and I will keep you
posted on any further details.

Thanks for your time and suggestions.

I assume this VPD read is on the i40 device, right? Since you're
still seeing the problem even after calling
pci_config_pm_runtime_get(), I assume the root port is still not in
D0. Can you add a little more instrumentation to read PCI_PM_CTRL and
PCI_PM_PPB_EXTENSIONS for the root port and PCI_PM_CTRL for the i40
device right after you call pci_config_pm_runtime_get()?

I don't see anything obviously different between the pci_read_config()
path and the pci_vpd_read() path except for the
pci_config_pm_runtime_get() call that you've already added. I guess
you could try using setpci instead of lspci to see if the failure only
happens in the pci_vpd_read() path. I assume that will be the case
because lspci probably does config reads before it does the VPD read,
and those initial config reads seemed to work OK.

The VPD path does do config writes in addition to config reads. Maybe
there's something special about writes, although I don't know what
that would be. You can tell I'm running out of ideas here :)

Bjorn

-George
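
For the instrumentation suggested above, the value read from PCI_PM_CTRL carries the current power state in its two low bits. The sketch below only decodes such a value once it has been read; it does no config access itself. pm_ctrl_state_name() and PM_CTRL_STATE_MASK are hypothetical names, with the mask set to the same value (0x0003) as the kernel's PCI_PM_CTRL_STATE_MASK.

```c
#include <stdint.h>

/* Same value as the kernel's PCI_PM_CTRL_STATE_MASK. */
#define PM_CTRL_STATE_MASK 0x0003u

/*
 * Map the power-state field of a PM control/status register value to
 * a readable name. Only the two low bits are examined.
 */
static const char *pm_ctrl_state_name(uint16_t pmcsr)
{
	static const char *const names[] = { "D0", "D1", "D2", "D3hot" };

	return names[pmcsr & PM_CTRL_STATE_MASK];
}
```

Printing this name for both the root port and the endpoint right after the runtime-PM get call would show directly whether either device is still reporting D3hot at the time of the failing access.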
