[PATCH net-next 0/2] Add devlink health reporters for NIX block
Devlink health reporters are added for the NIX block.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

This series is a continuation of
https://www.spinics.net/lists/netdev/msg707798.html

Documentation for the NIX reporters is added as well.

George Cherian (2):
  octeontx2-af: Add devlink health reporters for NIX
  docs: octeontx2: Add Documentation for NIX health reporters

 .../ethernet/marvell/octeontx2.rst            |  70 ++
 .../marvell/octeontx2/af/rvu_devlink.c        | 652 +-
 .../marvell/octeontx2/af/rvu_devlink.h        |  27 +
 .../marvell/octeontx2/af/rvu_struct.h         |  10 +
 4 files changed, 758 insertions(+), 1 deletion(-)

--
2.25.1
[PATCH net-next 2/2] docs: octeontx2: Add Documentation for NIX health reporters
Add devlink health reporter documentation for NIX block.

Signed-off-by: George Cherian
---
 .../ethernet/marvell/octeontx2.rst            | 70 +++
 1 file changed, 70 insertions(+)

diff --git a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
index 61e850460e18..dd5cd69467be 100644
--- a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
+++ b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
@@ -217,3 +217,73 @@ For example::
        NPA_AF_ERR:
               NPA Error Interrupt Reg : 4096
               AQ Doorbell Error
+
+
+NIX Reporters
+-------------
+The NIX reporters are responsible for reporting and recovering the following group of errors:
+
+1. GENERAL events
+
+   - Receive mirror/multicast packet drop due to insufficient buffer.
+   - SMQ Flush operation.
+
+2. ERROR events
+
+   - Memory Fault due to WQE read/write from multicast/mirror buffer.
+   - Receive multicast/mirror replication list error.
+   - Receive packet on an unmapped PF.
+   - Fault due to NIX_AQ_INST_S read or NIX_AQ_RES_S write.
+   - AQ Doorbell Error.
+
+3. RAS events
+
+   - RAS Error Reporting for NIX Receive Multicast/Mirror Entry Structure.
+   - RAS Error Reporting for WQE/Packet Data read from Multicast/Mirror Buffer.
+   - RAS Error Reporting for NIX_AQ_INST_S/NIX_AQ_RES_S.
+
+4. RVU events
+
+   - Error due to unmapped slot.
+
+Sample Output::
+
+       ~# ./devlink health
+       pci/0002:01:00.0:
+         reporter hw_npa_intr
+           state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
+         reporter hw_npa_gen
+           state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
+         reporter hw_npa_err
+           state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
+         reporter hw_npa_ras
+           state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
+         reporter hw_nix_intr
+           state healthy error 1121 recover 1121 last_dump_date 2021-01-19 last_dump_time 05:42:26 grace_period 0 auto_recover true auto_dump true
+         reporter hw_nix_gen
+           state healthy error 949 recover 949 last_dump_date 2021-01-19 last_dump_time 05:42:43 grace_period 0 auto_recover true auto_dump true
+         reporter hw_nix_err
+           state healthy error 1147 recover 1147 last_dump_date 2021-01-19 last_dump_time 05:42:59 grace_period 0 auto_recover true auto_dump true
+         reporter hw_nix_ras
+           state healthy error 409 recover 409 last_dump_date 2021-01-19 last_dump_time 05:43:16 grace_period 0 auto_recover true auto_dump true
+
+Each reporter dumps the
+
+ - Error Type
+ - Error Register value
+ - Reason in words
+
+For example::
+
+       ~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_intr
+       NIX_AF_RVU:
+               NIX RVU Interrupt Reg : 1
+               Unmap Slot Error
+       ~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_gen
+       NIX_AF_GENERAL:
+               NIX General Interrupt Reg : 1
+               Rx multicast pkt drop
+       ~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_err
+       NIX_AF_ERR:
+               NIX Error Interrupt Reg : 64
+               Rx on unmapped PF_FUNC
--
2.25.1
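The dump format shown in the documentation patch (a register value followed by a reason in words) can be illustrated with a small user-space lookup. This is only a sketch, not the driver's code: the struct, table, and function names are hypothetical, and the table holds just the one pairing visible in the sample output (value 64, i.e. bit 6, reported as "Rx on unmapped PF_FUNC"). It says nothing about the rest of the register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of an interrupt-register bit with its reason text. */
struct intr_reason {
	uint64_t mask;
	const char *text;
};

/* Only the pairing visible in the sample dump: 64 == 1 << 6. */
const struct intr_reason nix_err_reasons[] = {
	{ UINT64_C(1) << 6, "Rx on unmapped PF_FUNC" },
};

/* Return the text of the first table entry whose bit is set in reg,
 * or NULL when no known bit is set. */
const char *first_reason(uint64_t reg, const struct intr_reason *tbl, size_t n)
{
	assert(tbl != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (reg & tbl[i].mask)
			return tbl[i].text;
	}
	return NULL;
}
```

Calling `first_reason(64, nix_err_reasons, 1)` returns the reason string from the sample dump; a register value of 0 yields NULL, which a caller would print as "no known cause".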
[PATCH net-next 1/2] octeontx2-af: Add devlink health reporters for NIX
Add health reporters for RVU NIX block.
NIX Health reporters handle following HW event groups
 - GENERAL events
 - ERROR events
 - RAS events
 - RVU event

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter hw_npa_intr
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_npa_gen
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_npa_err
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_npa_ras
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_nix_intr
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_nix_gen
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_nix_err
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_nix_ras
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true

 # devlink health dump show pci/0002:01:00.0 reporter hw_nix_intr
 NIX_AF_RVU:
        NIX RVU Interrupt Reg : 1
        Unmap Slot Error

 # devlink health dump show pci/0002:01:00.0 reporter hw_nix_gen
 NIX_AF_GENERAL:
        NIX General Interrupt Reg : 1
        Rx multicast pkt drop

Each reporter dump shows the Register value and the description of the cause.

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../marvell/octeontx2/af/rvu_devlink.c        | 652 +-
 .../marvell/octeontx2/af/rvu_devlink.h        |  27 +
 .../marvell/octeontx2/af/rvu_struct.h         |  10 +
 3 files changed, 688 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index bc0e4113370e..10a98bcb7c54 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -52,6 +52,650 @@ static bool rvu_common_request_irq(struct rvu *rvu, int offset,
 	return rvu->irq_allocated[offset];
 }

+static void rvu_nix_intr_work(struct work_struct *work)
+{
+	struct rvu_nix_health_reporters *rvu_nix_health_reporter;
+
+	rvu_nix_health_reporter = container_of(work, struct rvu_nix_health_reporters, intr_work);
+	devlink_health_report(rvu_nix_health_reporter->rvu_hw_nix_intr_reporter,
+			      "NIX_AF_RVU Error",
+			      rvu_nix_health_reporter->nix_event_ctx);
+}
+
+static irqreturn_t rvu_nix_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_nix_event_ctx *nix_event_context;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	nix_event_context = rvu_dl->rvu_nix_health_reporter->nix_event_ctx;
+	intr = rvu_read64(rvu, blkaddr, NIX_AF_RVU_INT);
+	nix_event_context->nix_af_rvu_int = intr;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT, intr);
+	rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT_ENA_W1C, ~0ULL);
+	queue_work(rvu_dl->devlink_wq, &rvu_dl->rvu_nix_health_reporter->intr_work);
+
+	return IRQ_HANDLED;
+}
+
+static void rvu_nix_gen_work(struct work_struct *work)
+{
+	struct rvu_nix_health_reporters *rvu_nix_health_reporter;
+
+	rvu_nix_health_reporter = container_of(work, struct rvu_nix_health_reporters, gen_work);
+	devlink_health_report(rvu_nix_health_reporter->rvu_hw_nix_gen_reporter,
+			      "NIX_AF_GEN Error",
+			      rvu_nix_health_reporter->nix_event_ctx);
+}
+
+static irqreturn_t rvu_nix_af_rvu_gen_handler(int irq, void *rvu_irq)
+{
+	struct rvu_nix_event_ctx *nix_event_context;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	nix_event_context = rvu_dl->rvu_nix_health_reporter->nix_event_ctx;
+	intr = rvu_read64(rvu, blkaddr, NIX_AF_GEN_INT);
+	nix_event_context->nix_af_rvu_gen = intr;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NIX_AF_GEN_INT, intr);
+	rvu_write64(rvu, blkaddr, NIX_AF_GEN_INT_ENA_W1C, ~0ULL);
+	queue_work(rvu_dl->devlink_wq, &rvu_dl->rvu_nix_health_reporter->gen_work);
+
+	return IRQ_HANDLED;
+}
+
+static void rvu_nix_err_work(struct work_struct *work)
+{
+	struct rvu_nix_health_reporters *rvu_nix_health_reporter;
+
+	rvu_nix_health_reporter = container_
Re: [PATCH v2] docs: octeontx2: tune rst markup
On Wed, Jan 6, 2021 at 9:51 PM Lukas Bulwahn wrote:
>
> Commit 80b9414832a1 ("docs: octeontx2: Add Documentation for NPA health
> reporters") added new documentation with improper formatting for rst, and
> caused a few new warnings for make htmldocs in octeontx2.rst:169--202.
>
> Tune markup and formatting for better presentation in the HTML view.
>
> Signed-off-by: Lukas Bulwahn
> ---
> v1 -> v2: minor stylistic tuning as suggested by Randy
>
> applies cleanly on current master (v5.11-rc2) and next-20210106
>
> George, please ack.
> Jonathan, please pick this minor formatting clean-up patch.
>

Acked-by: George Cherian

Regards
-George
RE: [PATCH][next] octeontx2-af: Fix undetected unmap PF error check
> -----Original Message-----
> From: Colin King
> Sent: Wednesday, December 16, 2020 6:06 PM
> To: Sunil Kovvuri Goutham; Linu Cherian; Geethasowjanya Akula;
> Jerin Jacob Kollanukkaran; David S . Miller; Jakub Kicinski;
> George Cherian; net...@vger.kernel.org
> Cc: kernel-janit...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH][next] octeontx2-af: Fix undetected unmap PF error
> check
>
> From: Colin Ian King
>
> Currently the check for an unmap PF error is always going to be false
> because intr_val is a 32 bit int and is being bit-mask checked against
> 1ULL << 32. Fix this by making intr_val a u64 to match the type it is
> copied from, namely npa_event_context->npa_af_rvu_ge.
>
> Addresses-Coverity: ("Operands don't affect result")
> Fixes: f1168d1e207c ("octeontx2-af: Add devlink health reporters for NPA")
> Signed-off-by: Colin Ian King

Acked-by: George Cherian

> ---
>  drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> index 3f9d0ab6d5ae..bc0e4113370e 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> @@ -275,7 +275,8 @@ static int rvu_npa_report_show(struct devlink_fmsg *fmsg, void *ctx,
>  			       enum npa_af_rvu_health health_reporter)
>  {
>  	struct rvu_npa_event_ctx *npa_event_context;
> -	unsigned int intr_val, alloc_dis, free_dis;
> +	unsigned int alloc_dis, free_dis;
> +	u64 intr_val;
>  	int err;
>
>  	npa_event_context = ctx;
> --
> 2.29.2

Regards,
-George
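For readers following the fix, the truncation described in the commit message can be reproduced outside the kernel. The sketch below is a stand-alone user-space illustration, not driver code: the function names and the `UNMAP_PF_ERR_BIT` macro are hypothetical (only the `1ULL << 32` mask comes from the commit message), and it assumes a 32-bit `unsigned int`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The sketch only makes sense where unsigned int is 32 bits wide. */
static_assert(sizeof(unsigned int) == 4, "sketch assumes a 32-bit unsigned int");

/* Hypothetical name; the value mirrors the mask quoted in the commit message. */
#define UNMAP_PF_ERR_BIT (1ULL << 32)

/* Buggy shape: the 64-bit register value is narrowed to 32 bits first,
 * so bit 32 is already gone when the mask is applied and the check
 * can never be true. */
bool unmap_pf_seen_u32(uint64_t reg)
{
	unsigned int intr_val = (unsigned int)reg; /* upper 32 bits lost here */

	return (intr_val & UNMAP_PF_ERR_BIT) != 0;
}

/* Fixed shape: keep the full 64-bit value, as the patch does with u64. */
bool unmap_pf_seen_u64(uint64_t reg)
{
	uint64_t intr_val = reg;

	return (intr_val & UNMAP_PF_ERR_BIT) != 0;
}
```

With a register value that has only bit 32 set, the 32-bit variant reports no error while the 64-bit variant detects it, which is the "Operands don't affect result" pattern Coverity flagged.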
[PATCHv6 net-next 0/3] Add devlink and devlink health reporters to
Add basic devlink and devlink health reporters.
Devlink health reporters are added for the NPA block.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

For now, I have dropped the NIX block health reporters.
This series adds health reporters only for the NPA block.
As per Jakub's suggestion, a separate reporter is used per event and the
counters have been removed.

Change-log:
v6
 - Address Jakub's comments
 - Add reporters per event for each block.
 - Remove the SW counter.
 - Remove the mbox version from devlink info.
v5
 - Address Jiri's comment
 - use devlink_fmsg_arr_pair_nest_start() for NIX blocks
v4
 - Rebase to net-next (no logic changes).
v3
 - Address Saeed's comments on v2.
 - Renamed the reporter name as hw_*.
 - Call devlink_health_report() when an event is raised.
 - Added recover op too.
v2
 - Address Willem's comments on v1.
 - Fixed the sparse issues, reported by Jakub.

George Cherian (3):
  octeontx2-af: Add devlink suppoort to af driver
  octeontx2-af: Add devlink health reporters for NPA
  docs: octeontx2: Add Documentation for NPA health reporters

 .../ethernet/marvell/octeontx2.rst            |  50 ++
 .../net/ethernet/marvell/octeontx2/Kconfig    |   1 +
 .../ethernet/marvell/octeontx2/af/Makefile    |   2 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   4 +
 .../marvell/octeontx2/af/rvu_devlink.c        | 770 ++
 .../marvell/octeontx2/af/rvu_devlink.h        |  55 ++
 .../marvell/octeontx2/af/rvu_struct.h         |  23 +
 8 files changed, 912 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

--
2.25.1
[PATCHv6 net-next 1/3] octeontx2-af: Add devlink suppoort to af driver
Add devlink support to AF driver. Basic devlink support is added.
Currently info_get is the only supported devlink ops.

devlink output looks like this
 # devlink dev
 pci/0002:01:00.0
 # devlink dev info
 pci/0002:01:00.0:
   driver octeontx2-af
 #

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../net/ethernet/marvell/octeontx2/Kconfig    |  1 +
 .../ethernet/marvell/octeontx2/af/Makefile    |  2 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |  9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  4 ++
 .../marvell/octeontx2/af/rvu_devlink.c        | 64 +++
 .../marvell/octeontx2/af/rvu_devlink.h        | 20 ++
 6 files changed, 98 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 543a1d047567..16caa02095fe 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -9,6 +9,7 @@ config OCTEONTX2_MBOX
 config OCTEONTX2_AF
 	tristate "Marvell OcteonTX2 RVU Admin Function driver"
 	select OCTEONTX2_MBOX
+	select NET_DEVLINK
 	depends on (64BIT && COMPILE_TEST) || ARM64
 	depends on PCI
 	help
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index 7100d1dd856e..eb535c98ca38 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -10,4 +10,4 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 octeontx2_mbox-y := mbox.o rvu_trace.o
 octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \
 		  rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o rvu_npc_fs.o \
-		  rvu_cpt.o
+		  rvu_cpt.o rvu_devlink.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 9f901c0edcbb..e8fd712860a1 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -2826,17 +2826,23 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (err)
 		goto err_flr;

+	err = rvu_register_dl(rvu);
+	if (err)
+		goto err_irq;
+
 	rvu_setup_rvum_blk_revid(rvu);

 	/* Enable AF's VFs (if any) */
 	err = rvu_enable_sriov(rvu);
 	if (err)
-		goto err_irq;
+		goto err_dl;

 	/* Initialize debugfs */
 	rvu_dbg_init(rvu);

 	return 0;
+err_dl:
+	rvu_unregister_dl(rvu);
 err_irq:
 	rvu_unregister_interrupts(rvu);
 err_flr:
@@ -2868,6 +2874,7 @@ static void rvu_remove(struct pci_dev *pdev)

 	rvu_dbg_exit(rvu);
 	rvu_unregister_interrupts(rvu);
+	rvu_unregister_dl(rvu);
 	rvu_flr_wq_destroy(rvu);
 	rvu_cgx_exit(rvu);
 	rvu_fwdata_exit(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index b6c0977499ab..b1a6ecfd563e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -12,7 +12,10 @@
 #define RVU_H

 #include
+#include
+
 #include "rvu_struct.h"
+#include "rvu_devlink.h"
 #include "common.h"
 #include "mbox.h"
 #include "npc.h"
@@ -422,6 +425,7 @@ struct rvu {
 #ifdef CONFIG_DEBUG_FS
 	struct rvu_debugfs	rvu_dbg;
 #endif
+	struct rvu_devlink	*rvu_dl;
 };

 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
new file mode 100644
index ..5dabca04a34b
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Devlink
+ *
+ * Copyright (C) 2020 Marvell.
+ *
+ */
+
+#include "rvu.h"
+
+#define DRV_NAME "octeontx2-af"
+
+static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+				struct netlink_ext_ack *extack)
+{
+	return devlink_info_driver_name_put(req, DRV_NAME);
+}
+
+static const struct devlink_ops rvu_devlink_ops = {
+	.info_get = rvu_devlink_info_get,
+};
+
+int rvu_register_dl(struct rvu *rvu)
+{
+	struct rvu_devlink *rvu_dl;
+	struct devlink *dl;
+	int err;
+
+	rvu_dl = kzalloc(sizeof(*rvu_dl), GFP_KERNEL);
+	if (!rvu_dl)
+		return -ENOMEM;
+
+	dl = devlink_alloc(&rvu_devlink_ops, sizeof(struct rvu_devlink));
+	if (!dl) {
+
[PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA
Add health reporters for RVU NPA block.
NPA Health reporters handle following HW event groups
 - GENERAL events
 - ERROR events
 - RAS events
 - RVU event

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter hw_npa_intr
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_npa_gen
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_npa_err
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
   reporter hw_npa_ras
     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true

 # devlink health dump show pci/0002:01:00.0 reporter hw_npa_err
 NPA_AF_ERR:
        NPA Error Interrupt Reg : 4096
        AQ Doorbell Error

 # devlink health dump show pci/0002:01:00.0 reporter hw_npa_ras
 NPA_AF_RVU_RAS:
        NPA RAS Interrupt Reg : 0

Each reporter dump shows the Register value and the description of the cause.

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../marvell/octeontx2/af/rvu_devlink.c        | 708 +-
 .../marvell/octeontx2/af/rvu_devlink.h        |  35 +
 .../marvell/octeontx2/af/rvu_struct.h         |  23 +
 3 files changed, 765 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 5dabca04a34b..3f9d0ab6d5ae 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -5,10 +5,714 @@
  *
  */

+#include
+
 #include "rvu.h"
+#include "rvu_reg.h"
+#include "rvu_struct.h"

 #define DRV_NAME "octeontx2-af"

+static int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	return devlink_fmsg_obj_nest_start(fmsg);
+}
+
+static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
+{
+	int err;
+
+	err = devlink_fmsg_obj_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return devlink_fmsg_pair_nest_end(fmsg);
+}
+
+static bool rvu_common_request_irq(struct rvu *rvu, int offset,
+				   const char *name, irq_handler_t fn)
+{
+	struct rvu_devlink *rvu_dl = rvu->rvu_dl;
+	int rc;
+
+	sprintf(&rvu->irq_name[offset * NAME_SIZE], name);
+	rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0,
+			 &rvu->irq_name[offset * NAME_SIZE], rvu_dl);
+	if (rc)
+		dev_warn(rvu->dev, "Failed to register %s irq\n", name);
+	else
+		rvu->irq_allocated[offset] = true;
+
+	return rvu->irq_allocated[offset];
+}
+
+static void rvu_npa_intr_work(struct work_struct *work)
+{
+	struct rvu_npa_health_reporters *rvu_npa_health_reporter;
+
+	rvu_npa_health_reporter = container_of(work, struct rvu_npa_health_reporters, intr_work);
+	devlink_health_report(rvu_npa_health_reporter->rvu_hw_npa_intr_reporter,
+			      "NPA_AF_RVU Error",
+			      rvu_npa_health_reporter->npa_event_ctx);
+}
+
+static irqreturn_t rvu_npa_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_npa_event_ctx *npa_event_context;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	npa_event_context = rvu_dl->rvu_npa_health_reporter->npa_event_ctx;
+	intr = rvu_read64(rvu, blkaddr, NPA_AF_RVU_INT);
+	npa_event_context->npa_af_rvu_int = intr;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT, intr);
+	rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT_ENA_W1C, ~0ULL);
+	queue_work(rvu_dl->devlink_wq, &rvu_dl->rvu_npa_health_reporter->intr_work);
+
+	return IRQ_HANDLED;
+}
+
+static void rvu_npa_gen_work(struct work_struct *work)
+{
+	struct rvu_npa_health_reporters *rvu_npa_health_reporter;
+
+	rvu_npa_health_reporter = container_of(work, struct rvu_npa_health_reporters, gen_work);
+	devlink_health_report(rvu_npa_health_reporter->rvu_hw_npa_gen_reporter,
+			      "NPA_AF_GEN Error",
+			      rvu_npa_health_reporter->npa_event_ctx);
+}
+
+static irqreturn_t rvu_npa_af_gen_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_npa_event_ctx *npa_event_context;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+
[PATCH 3/3] docs: octeontx2: Add Documentation for NPA health reporters
Add Documentation for devlink health reporters for NPA block.

Signed-off-by: George Cherian
---
 .../ethernet/marvell/octeontx2.rst            | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
index 88f508338c5f..d3fcf536d14e 100644
--- a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
+++ b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
@@ -12,6 +12,7 @@ Contents
 - `Overview`_
 - `Drivers`_
 - `Basic packet flow`_
+- `Devlink health reporters`_

 Overview
 ========
@@ -157,3 +158,52 @@ Egress
 3. The SQ descriptor ring is maintained in buffers allocated from SQ mapped pool of NPA block LF.
 4. NIX block transmits the pkt on the designated channel.
 5. NPC MCAM entries can be installed to divert pkt onto a different channel.
+
+Devlink health reporters
+========================
+
+NPA Reporters
+-------------
+The NPA reporters are responsible for reporting and recovering the following group of errors
+1. GENERAL events
+   - Error due to operation of unmapped PF.
+   - Error due to disabled alloc/free for other HW blocks (NIX, SSO, TIM, DPI and AURA).
+2. ERROR events
+   - Fault due to NPA_AQ_INST_S read or NPA_AQ_RES_S write.
+   - AQ Doorbell Error.
+3. RAS events
+   - RAS Error Reporting for NPA_AQ_INST_S/NPA_AQ_RES_S.
+4. RVU events
+   - Error due to unmapped slot.
+
+Sample Output
+-------------
+~# devlink health
+pci/0002:01:00.0:
+  reporter hw_npa_intr
+      state healthy error 2872 recover 2872 last_dump_date 2020-12-10 last_dump_time 09:39:09 grace_period 0 auto_recover true auto_dump true
+  reporter hw_npa_gen
+      state healthy error 2872 recover 2872 last_dump_date 2020-12-11 last_dump_time 04:43:04 grace_period 0 auto_recover true auto_dump true
+  reporter hw_npa_err
+      state healthy error 2871 recover 2871 last_dump_date 2020-12-10 last_dump_time 09:39:17 grace_period 0 auto_recover true auto_dump true
+   reporter hw_npa_ras
+      state healthy error 0 recover 0 last_dump_date 2020-12-10 last_dump_time 09:32:40 grace_period 0 auto_recover true auto_dump true
+
+Each reporter dumps the
+ - Error Type
+ - Error Register value
+ - Reason in words
+
+For eg:
+~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_gen
+ NPA_AF_GENERAL:
+         NPA General Interrupt Reg : 1
+         NIX0: free disabled RX
+~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_intr
+ NPA_AF_RVU:
+         NPA RVU Interrupt Reg : 1
+         Unmap Slot Error
+~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_err
+ NPA_AF_ERR:
+        NPA Error Interrupt Reg : 4096
+        AQ Doorbell Error
--
2.25.1
RE: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA
> -----Original Message-----
> From: George Cherian
> Sent: Tuesday, December 1, 2020 10:49 AM
> To: 'Jakub Kicinski'
> Cc: 'net...@vger.kernel.org'; 'linux-ker...@vger.kernel.org';
> 'da...@davemloft.net'; Sunil Kovvuri Goutham; Linu Cherian;
> Geethasowjanya Akula; 'masahi...@kernel.org';
> 'willemdebruijn.ker...@gmail.com'; 'sa...@kernel.org'; 'j...@resnulli.us'
> Subject: RE: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
> reporters for NPA
>
> Jakub,
>
> > -----Original Message-----
> > From: George Cherian
> > Sent: Tuesday, December 1, 2020 9:06 AM
> > To: Jakub Kicinski
> > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > da...@davemloft.net; Sunil Kovvuri Goutham; Linu Cherian;
> > Geethasowjanya Akula; masahi...@kernel.org;
> > willemdebruijn.ker...@gmail.com; sa...@kernel.org; j...@resnulli.us
> > Subject: Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
> > reporters for NPA
> >
> > Hi Jakub,
> >
> > > -----Original Message-----
> > > From: Jakub Kicinski
> > > Sent: Tuesday, December 1, 2020 7:59 AM
> > > To: George Cherian
> > > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > da...@davemloft.net; Sunil Kovvuri Goutham; Linu Cherian;
> > > Geethasowjanya Akula; masahi...@kernel.org;
> > > willemdebruijn.ker...@gmail.com; sa...@kernel.org; j...@resnulli.us
> > > Subject: Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
> > > reporters for NPA
> > >
> > > On Thu, 26 Nov 2020 19:32:50 +0530 George Cherian wrote:
> > > > Add health reporters for RVU NPA block.
> > > > NPA Health reporters handle following HW event groups
> > > >  - GENERAL events
> > > >  - ERROR events
> > > >  - RAS events
> > > >  - RVU event
> > > > An event counter per event is maintained in SW.
> > > >
> > > > Output:
> > > >  # devlink health
> > > >  pci/0002:01:00.0:
> > > >    reporter hw_npa
> > > >      state healthy error 0 recover 0
> > > >  # devlink health dump show pci/0002:01:00.0 reporter hw_npa
> > > >  NPA_AF_GENERAL:
> > > >         Unmap PF Error: 0
> > > >         NIX:
> > > >         0: free disabled RX: 0 free disabled TX: 0
> > > >         1: free disabled RX: 0 free disabled TX: 0
> > > >         Free Disabled for SSO: 0
> > > >         Free Disabled for TIM: 0
> > > >         Free Disabled for DPI: 0
> > > >         Free Disabled for AURA: 0
> > > >         Alloc Disabled for Resvd: 0
> > > >   NPA_AF_ERR:
> > > >         Memory Fault on NPA_AQ_INST_S read: 0
> > > >         Memory Fault on NPA_AQ_RES_S write: 0
> > > >         AQ Doorbell Error: 0
> > > >         Poisoned data on NPA_AQ_INST_S read: 0
> > > >         Poisoned data on NPA_AQ_RES_S write: 0
> > > >         Poisoned data on HW context read: 0
> > > >   NPA_AF_RVU:
> > > >         Unmap Slot Error: 0
> > >
> > > You seem to have missed the feedback Saeed and I gave you on v2.
> > >
> > > Did you test this with the errors actually triggering? Devlink
> > > should store only
> > Yes, the same was tested using devlink health test interface by
> > injecting errors.
> > The dump gets generated automatically and the counters do get out of
> > sync, in case of continuous error.
> > That wouldn't be much of an issue as the user could manually trigger a
> > dump clear and Re-dump the counters to get the exact status of the
> > counters at any point of time.
>
> Now that recover op is added the devlink error counter and recover counter
> will be proper. The internal counter for each event is needed just to
> understand within a specific reporter, how many such events occurred.
>
> Following is the log snippet of the devlink health test being done on hw_nix
> reporter.
> # for i in `seq 1 33` ; do devlink health test pci/0002:01:00.0 reporter hw_nix; done
> //Inject 33 errors (16 of NIX_AF_RVU and 17 of NIX_AF_RAS and NIX_AF_GENERAL errors)
> # devlink health
> pci/0002:01:00.0:
>   reporter hw_npa
>     state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
>   reporter hw_nix
>     state healthy error 250 recover 250 last_dump_date 1970-01-01 last_dump_time 00:04:16 grace_period 0 auto_recover true auto_dump true

Oops,
RE: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA
Jakub,

> -----Original Message-----
> From: George Cherian
> Sent: Tuesday, December 1, 2020 9:06 AM
> To: Jakub Kicinski
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> da...@davemloft.net; Sunil Kovvuri Goutham; Linu Cherian;
> Geethasowjanya Akula; masahi...@kernel.org;
> willemdebruijn.ker...@gmail.com; sa...@kernel.org; j...@resnulli.us
> Subject: Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
> reporters for NPA
>
> Hi Jakub,
>
> > -----Original Message-----
> > From: Jakub Kicinski
> > Sent: Tuesday, December 1, 2020 7:59 AM
> > To: George Cherian
> > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > da...@davemloft.net; Sunil Kovvuri Goutham; Linu Cherian;
> > Geethasowjanya Akula; masahi...@kernel.org;
> > willemdebruijn.ker...@gmail.com; sa...@kernel.org; j...@resnulli.us
> > Subject: Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
> > reporters for NPA
> >
> > On Thu, 26 Nov 2020 19:32:50 +0530 George Cherian wrote:
> > > Add health reporters for RVU NPA block.
> > > NPA Health reporters handle following HW event groups
> > >  - GENERAL events
> > >  - ERROR events
> > >  - RAS events
> > >  - RVU event
> > > An event counter per event is maintained in SW.
> > >
> > > Output:
> > >  # devlink health
> > >  pci/0002:01:00.0:
> > >    reporter hw_npa
> > >      state healthy error 0 recover 0
> > >  # devlink health dump show pci/0002:01:00.0 reporter hw_npa
> > >  NPA_AF_GENERAL:
> > >         Unmap PF Error: 0
> > >         NIX:
> > >         0: free disabled RX: 0 free disabled TX: 0
> > >         1: free disabled RX: 0 free disabled TX: 0
> > >         Free Disabled for SSO: 0
> > >         Free Disabled for TIM: 0
> > >         Free Disabled for DPI: 0
> > >         Free Disabled for AURA: 0
> > >         Alloc Disabled for Resvd: 0
> > >   NPA_AF_ERR:
> > >         Memory Fault on NPA_AQ_INST_S read: 0
> > >         Memory Fault on NPA_AQ_RES_S write: 0
> > >         AQ Doorbell Error: 0
> > >         Poisoned data on NPA_AQ_INST_S read: 0
> > >         Poisoned data on NPA_AQ_RES_S write: 0
> > >         Poisoned data on HW context read: 0
> > >   NPA_AF_RVU:
> > >         Unmap Slot Error: 0
> >
> > You seem to have missed the feedback Saeed and I gave you on v2.
> >
> > Did you test this with the errors actually triggering? Devlink should
> > store only
> Yes, the same was tested using devlink health test interface by injecting
> errors.
> The dump gets generated automatically and the counters do get out of sync,
> in case of continuous error.
> That wouldn't be much of an issue as the user could manually trigger a dump
> clear and Re-dump the counters to get the exact status of the counters at
> any point of time.

Now that recover op is added the devlink error counter and recover counter
will be proper. The internal counter for each event is needed just to
understand within a specific reporter, how many such events occurred.

Following is the log snippet of the devlink health test being done on hw_nix
reporter.

# for i in `seq 1 33` ; do devlink health test pci/0002:01:00.0 reporter hw_nix; done
//Inject 33 errors (16 of NIX_AF_RVU and 17 of NIX_AF_RAS and NIX_AF_GENERAL errors)
# devlink health
pci/0002:01:00.0:
  reporter hw_npa
    state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
  reporter hw_nix
    state healthy error 250 recover 250 last_dump_date 1970-01-01 last_dump_time 00:04:16 grace_period 0 auto_recover true auto_dump true
# devlink health dump show pci/0002:01:00.0 reporter hw_nix
NIX_AF_GENERAL:
        Memory Fault on NIX_AQ_INST_S read: 1
        Memory Fault on NIX_AQ_RES_S write: 1
        AQ Doorbell error: 1
        Rx on unmapped PF_FUNC: 1
        Rx multicast replication error: 1
        Memory fault on NIX_RX_MCE_S read: 1
        Memory fault on multicast WQE read: 1
        Memory fault on mirror WQE read: 1
        Memory fault on mirror pkt write: 1
        Memory fault on multicast pkt write: 1
NIX_AF_RAS:
        Poisoned data on NIX_AQ_INST_S read: 1
        Poisoned data on NIX_AQ_RES_S write: 1
        Poisoned data on HW context read: 1
        Poisoned data on packet read from mirror buffer: 1
        Poisoned data on packet read from mcast buffer: 1
        Poisoned data on WQE read from mirror buffer: 1
        Poisoned data on WQE read from multicast buffer: 1
        Poisoned data on NIX_RX_MCE_S read: 1
NIX_AF_RVU:
        Unmap Slot Error: 0
#
Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA
Hi Jakub,

> -----Original Message-----
> From: Jakub Kicinski
> Sent: Tuesday, December 1, 2020 7:59 AM
> To: George Cherian
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> da...@davemloft.net; Sunil Kovvuri Goutham; Linu Cherian;
> Geethasowjanya Akula; masahi...@kernel.org;
> willemdebruijn.ker...@gmail.com; sa...@kernel.org; j...@resnulli.us
> Subject: Re: [PATCHv5 net-next 2/3] octeontx2-af: Add devlink health
> reporters for NPA
>
> On Thu, 26 Nov 2020 19:32:50 +0530 George Cherian wrote:
> > Add health reporters for RVU NPA block.
> > NPA Health reporters handle following HW event groups
> >  - GENERAL events
> >  - ERROR events
> >  - RAS events
> >  - RVU event
> > An event counter per event is maintained in SW.
> >
> > Output:
> >  # devlink health
> >  pci/0002:01:00.0:
> >    reporter hw_npa
> >      state healthy error 0 recover 0
> >  # devlink health dump show pci/0002:01:00.0 reporter hw_npa
> >  NPA_AF_GENERAL:
> >    Unmap PF Error: 0
> >    NIX:
> >      0: free disabled RX: 0 free disabled TX: 0
> >      1: free disabled RX: 0 free disabled TX: 0
> >    Free Disabled for SSO: 0
> >    Free Disabled for TIM: 0
> >    Free Disabled for DPI: 0
> >    Free Disabled for AURA: 0
> >    Alloc Disabled for Resvd: 0
> >  NPA_AF_ERR:
> >    Memory Fault on NPA_AQ_INST_S read: 0
> >    Memory Fault on NPA_AQ_RES_S write: 0
> >    AQ Doorbell Error: 0
> >    Poisoned data on NPA_AQ_INST_S read: 0
> >    Poisoned data on NPA_AQ_RES_S write: 0
> >    Poisoned data on HW context read: 0
> >  NPA_AF_RVU:
> >    Unmap Slot Error: 0
>
> You seem to have missed the feedback Saeed and I gave you on v2.
>
> Did you test this with the errors actually triggering? Devlink should store
> only

Yes, the same was tested using devlink health test interface by injecting
errors.
The dump gets generated automatically and the counters do get out of sync,
in case of continuous error.
That wouldn't be much of an issue as the user could manually trigger a dump
clear and Re-dump the counters to get the exact status of the counters at
any point of time.

> one dump, are the counters not going to get out of sync unless something
> clears the dump every time it triggers?

Regards,
-George
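The exchange above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: it assumes a reporter keeps at most one stored dump, that a report only captures a dump when none is stored, and that showing a dump captures one on demand if none exists. All names (`reporter_model`, `model_report`, `model_dump_show`, `model_dump_clear`) are hypothetical.

```c
#include <stdbool.h>

/* Simplified, hypothetical model of one health reporter (not kernel code). */
struct reporter_model {
	unsigned int event_count;  /* driver-side counter, bumped on every event */
	bool has_dump;             /* true while a dump is stored */
	unsigned int dumped_count; /* counter value captured in the stored dump */
};

/* Capture a dump only if none is stored (assumed single-dump behaviour). */
static void model_capture(struct reporter_model *r)
{
	if (r->has_dump)
		return;
	r->dumped_count = r->event_count;
	r->has_dump = true;
}

/* An event is reported: the counter always moves, the dump may not. */
static void model_report(struct reporter_model *r)
{
	r->event_count++;
	model_capture(r);
}

/* "dump show": return the stored dump, capturing one first if needed. */
static unsigned int model_dump_show(struct reporter_model *r)
{
	model_capture(r);
	return r->dumped_count;
}

/* "dump clear": drop the stored dump so the next show is fresh. */
static void model_dump_clear(struct reporter_model *r)
{
	r->has_dump = false;
}
```

In this model, three back-to-back reports leave the stored dump showing the value captured at the first event, while a clear followed by a show returns the current counter, which is the manual workaround described in the reply.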
[PATCHv5 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX
Add health reporters for RVU NIX block.
NIX Health reporters handle following HW event groups
 - GENERAL events
 - RAS events
 - RVU event
An event counter per event is maintained in SW.

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter hw_npa
     state healthy error 0 recover 0
   reporter hw_nix
     state healthy error 0 recover 0
 # devlink health dump show pci/0002:01:00.0 reporter hw_nix
 NIX_AF_GENERAL:
   Memory Fault on NIX_AQ_INST_S read: 0
   Memory Fault on NIX_AQ_RES_S write: 0
   AQ Doorbell error: 0
   Rx on unmapped PF_FUNC: 0
   Rx multicast replication error: 0
   Memory fault on NIX_RX_MCE_S read: 0
   Memory fault on multicast WQE read: 0
   Memory fault on mirror WQE read: 0
   Memory fault on mirror pkt write: 0
   Memory fault on multicast pkt write: 0
 NIX_AF_RAS:
   Poisoned data on NIX_AQ_INST_S read: 0
   Poisoned data on NIX_AQ_RES_S write: 0
   Poisoned data on HW context read: 0
   Poisoned data on packet read from mirror buffer: 0
   Poisoned data on packet read from mcast buffer: 0
   Poisoned data on WQE read from mirror buffer: 0
   Poisoned data on WQE read from multicast buffer: 0
   Poisoned data on NIX_RX_MCE_S read: 0
 NIX_AF_RVU:
   Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../marvell/octeontx2/af/rvu_devlink.c| 414 +-
 .../marvell/octeontx2/af/rvu_devlink.h| 31 ++
 .../marvell/octeontx2/af/rvu_struct.h | 10 +
 3 files changed, 453 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 377264d65d0c..2f20d8b9eef3 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -35,6 +35,131 @@ static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
 	return devlink_fmsg_pair_nest_end(fmsg);
 }
 
+static irqreturn_t rvu_nix_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_nix_event_ctx *nix_event_context;
+	struct rvu_nix_event_cnt *nix_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	nix_event_context = rvu_dl->nix_event_ctx;
+	nix_event_count = &nix_event_context->nix_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NIX_AF_RVU_INT);
+	nix_event_context->nix_af_rvu_int = intr;
+
+	if (intr & BIT_ULL(0))
+		nix_event_count->unmap_slot_count++;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT, intr);
+	rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT_ENA_W1C, ~0ULL);
+	devlink_health_report(rvu_dl->rvu_nix_health_reporter, "NIX_AF_RVU Error",
+			      nix_event_context);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t rvu_nix_af_err_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_nix_event_ctx *nix_event_context;
+	struct rvu_nix_event_cnt *nix_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	nix_event_context = rvu_dl->nix_event_ctx;
+	nix_event_count = &nix_event_context->nix_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NIX_AF_ERR_INT);
+	nix_event_context->nix_af_rvu_err = intr;
+
+	if (intr & BIT_ULL(14))
+		nix_event_count->aq_inst_count++;
+	if (intr & BIT_ULL(13))
+		nix_event_count->aq_res_count++;
+	if (intr & BIT_ULL(12))
+		nix_event_count->aq_db_count++;
+	if (intr & BIT_ULL(6))
+		nix_event_count->rx_on_unmap_pf_count++;
+	if (intr & BIT_ULL(5))
+		nix_event_count->rx_mcast_repl_count++;
+	if (intr & BIT_ULL(4))
+		nix_event_count->rx_mcast_memfault_count++;
+	if (intr & BIT_ULL(3))
+		nix_event_count->rx_mcast_wqe_memfault_count++;
+	if (intr & BIT_ULL(2))
+		nix_event_count->rx_mirror_wqe_memfault_count++;
+	if (intr & BIT_ULL(1))
+		nix_event_count->rx_mirror_pktw_memfault_count++;
+	if (intr & BIT_ULL(0))
+		nix_event_count->rx_mcast_pktw_memfault_count++;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT, intr);
+	rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT_ENA_W1C, ~0ULL);
+	dev
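The per-bit accounting done by the NIX_AF_ERR_INT handler in this patch can be modelled in user space. The sketch below is a hypothetical, table-driven restatement for illustration: the bit positions (14, 13, 12, 6..0) are taken from the handler above, while the enum, the table and `nix_err_account()` are invented names and not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Counter slots, named after the fields the handler increments. */
enum nix_err_event {
	AQ_INST, AQ_RES, AQ_DB, RX_ON_UNMAP_PF, RX_MCAST_REPL,
	RX_MCAST_MEMFAULT, RX_MCAST_WQE_MEMFAULT, RX_MIRROR_WQE_MEMFAULT,
	RX_MIRROR_PKTW_MEMFAULT, RX_MCAST_PKTW_MEMFAULT, NIX_ERR_EVENT_MAX
};

/* Bit position in the interrupt word for each event (from the patch). */
static const unsigned int nix_err_bit[NIX_ERR_EVENT_MAX] = {
	[AQ_INST] = 14, [AQ_RES] = 13, [AQ_DB] = 12,
	[RX_ON_UNMAP_PF] = 6, [RX_MCAST_REPL] = 5,
	[RX_MCAST_MEMFAULT] = 4, [RX_MCAST_WQE_MEMFAULT] = 3,
	[RX_MIRROR_WQE_MEMFAULT] = 2, [RX_MIRROR_PKTW_MEMFAULT] = 1,
	[RX_MCAST_PKTW_MEMFAULT] = 0,
};

/*
 * Bump one counter per set bit that maps to a known event and
 * return how many counters were bumped. Unknown bits are ignored.
 */
static unsigned int nix_err_account(uint64_t intr,
				    unsigned int counts[NIX_ERR_EVENT_MAX])
{
	unsigned int hits = 0;
	size_t i;

	for (i = 0; i < NIX_ERR_EVENT_MAX; i++) {
		if (intr & BIT_ULL(nix_err_bit[i])) {
			counts[i]++;
			hits++;
		}
	}
	return hits;
}
```

A single interrupt word can carry several events at once, so one call may bump more than one counter; that is one reason the per-event counters and the reporter-level error count need not move in step.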
[PATCHv5 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA
Add health reporters for RVU NPA block.
NPA Health reporters handle following HW event groups
 - GENERAL events
 - ERROR events
 - RAS events
 - RVU event
An event counter per event is maintained in SW.

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter hw_npa
     state healthy error 0 recover 0
 # devlink health dump show pci/0002:01:00.0 reporter hw_npa
 NPA_AF_GENERAL:
   Unmap PF Error: 0
   NIX:
     0: free disabled RX: 0 free disabled TX: 0
     1: free disabled RX: 0 free disabled TX: 0
   Free Disabled for SSO: 0
   Free Disabled for TIM: 0
   Free Disabled for DPI: 0
   Free Disabled for AURA: 0
   Alloc Disabled for Resvd: 0
 NPA_AF_ERR:
   Memory Fault on NPA_AQ_INST_S read: 0
   Memory Fault on NPA_AQ_RES_S write: 0
   AQ Doorbell Error: 0
   Poisoned data on NPA_AQ_INST_S read: 0
   Poisoned data on NPA_AQ_RES_S write: 0
   Poisoned data on HW context read: 0
 NPA_AF_RVU:
   Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../marvell/octeontx2/af/rvu_devlink.c| 498 +-
 .../marvell/octeontx2/af/rvu_devlink.h| 31 ++
 .../marvell/octeontx2/af/rvu_struct.h | 23 +
 3 files changed, 551 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 04ef945e7e75..377264d65d0c 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -5,10 +5,504 @@
  *
  */
 
+#include
+
 #include "rvu.h"
+#include "rvu_reg.h"
+#include "rvu_struct.h"
 
 #define DRV_NAME "octeontx2-af"
 
+static int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	return devlink_fmsg_obj_nest_start(fmsg);
+}
+
+static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
+{
+	int err;
+
+	err = devlink_fmsg_obj_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return devlink_fmsg_pair_nest_end(fmsg);
+}
+
+static bool rvu_common_request_irq(struct rvu *rvu, int offset,
+				   const char *name, irq_handler_t fn)
+{
+	struct rvu_devlink *rvu_dl = rvu->rvu_dl;
+	int rc;
+
+	sprintf(&rvu->irq_name[offset * NAME_SIZE], name);
+	rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0,
+			 &rvu->irq_name[offset * NAME_SIZE], rvu_dl);
+	if (rc)
+		dev_warn(rvu->dev, "Failed to register %s irq\n", name);
+	else
+		rvu->irq_allocated[offset] = true;
+
+	return rvu->irq_allocated[offset];
+}
+
+static irqreturn_t rvu_npa_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_npa_event_ctx *npa_event_context;
+	struct rvu_npa_event_cnt *npa_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	npa_event_context = rvu_dl->npa_event_ctx;
+	npa_event_count = &npa_event_context->npa_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NPA_AF_RVU_INT);
+	npa_event_context->npa_af_rvu_int = intr;
+
+	if (intr & BIT_ULL(0))
+		npa_event_count->unmap_slot_count++;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT, intr);
+	rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT_ENA_W1C, ~0ULL);
+	devlink_health_report(rvu_dl->rvu_npa_health_reporter, "NPA_AF_RVU Error",
+			      npa_event_context);
+
+	return IRQ_HANDLED;
+}
+
+static int rvu_npa_inpq_to_cnt(u16 in,
+			       struct rvu_npa_event_cnt *npa_event_count)
+{
+	switch (in) {
+	case 0:
+		return 0;
+	case BIT(NPA_INPQ_NIX0_RX):
+		return npa_event_count->free_dis_nix0_rx_count++;
+	case BIT(NPA_INPQ_NIX0_TX):
+		return npa_event_count->free_dis_nix0_tx_count++;
+	case BIT(NPA_INPQ_NIX1_RX):
+		return npa_event_count->free_dis_nix1_rx_count++;
+	case BIT(NPA_INPQ_NIX1_TX):
+		return npa_event_count->free_dis_nix1_tx_count++;
+	case BIT(NPA_INPQ_SSO):
+		return npa_event_count->free_dis_sso_count++;
+	case BIT(NPA_INPQ_TIM):
+		return npa_event_count->free_dis_tim_count++;
+	case BIT(NPA_INPQ_DPI):
+		return npa_event_count->free_dis_dpi_count++;
+	case BIT(NPA_INPQ_AURA_OP):
+		return npa_event_cou
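In `rvu_common_request_irq()` above, the IRQ name is written into a fixed-width slot of a flat name array, with the caller's string passed as the format argument to `sprintf()`. A more defensive variant could look like the user-space sketch below. It is only an illustration: `set_irq_name()`, the slot width `NAME_SIZE` of 32 and `MAX_VECTORS` are assumptions made here, not values or helpers taken from the driver.

```c
#include <stddef.h>
#include <stdio.h>

#define NAME_SIZE   32 /* hypothetical slot width per vector */
#define MAX_VECTORS 4  /* hypothetical number of vectors */

/*
 * Copy `name` into slot `offset` of a flat name array.
 * Passing "%s" as the format keeps any '%' in `name` literal, and
 * snprintf() bounds the write to one slot.
 * Returns 0 on success, -1 if the slot index is out of range or the
 * name did not fit (the slot then holds a truncated copy).
 */
static int set_irq_name(char *irq_names, size_t nvec, size_t offset,
			const char *name)
{
	int n;

	if (offset >= nvec)
		return -1;

	n = snprintf(&irq_names[offset * NAME_SIZE], NAME_SIZE, "%s", name);
	if (n < 0 || n >= NAME_SIZE)
		return -1;

	return 0;
}
```

The design choice is simply to treat the name as data rather than as a format string, and to make the slot boundary explicit in the call.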
[PATCHv5 net-next 1/3] octeontx2-af: Add devlink support to af driver
Add devlink support to AF driver. Basic devlink support is added.
Currently info_get is the only supported devlink ops.

devlink output looks like this
 # devlink dev
 pci/0002:01:00.0
 # devlink dev info
 pci/0002:01:00.0:
   driver octeontx2-af
   versions:
     fixed:
       mbox version: 9

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../net/ethernet/marvell/octeontx2/Kconfig| 1 +
 .../ethernet/marvell/octeontx2/af/Makefile| 2 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c | 9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.h | 4 ++
 .../marvell/octeontx2/af/rvu_devlink.c| 72 +++
 .../marvell/octeontx2/af/rvu_devlink.h| 20 ++
 6 files changed, 106 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 543a1d047567..16caa02095fe 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -9,6 +9,7 @@ config OCTEONTX2_MBOX
 config OCTEONTX2_AF
 	tristate "Marvell OcteonTX2 RVU Admin Function driver"
 	select OCTEONTX2_MBOX
+	select NET_DEVLINK
 	depends on (64BIT && COMPILE_TEST) || ARM64
 	depends on PCI
 	help
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index 7100d1dd856e..eb535c98ca38 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -10,4 +10,4 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 octeontx2_mbox-y := mbox.o rvu_trace.o
 octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \
 		  rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o rvu_npc_fs.o \
-		  rvu_cpt.o
+		  rvu_cpt.o rvu_devlink.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 9f901c0edcbb..e8fd712860a1 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -2826,17 +2826,23 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (err)
 		goto err_flr;
 
+	err = rvu_register_dl(rvu);
+	if (err)
+		goto err_irq;
+
 	rvu_setup_rvum_blk_revid(rvu);
 
 	/* Enable AF's VFs (if any) */
 	err = rvu_enable_sriov(rvu);
 	if (err)
-		goto err_irq;
+		goto err_dl;
 
 	/* Initialize debugfs */
 	rvu_dbg_init(rvu);
 
 	return 0;
+err_dl:
+	rvu_unregister_dl(rvu);
 err_irq:
 	rvu_unregister_interrupts(rvu);
 err_flr:
@@ -2868,6 +2874,7 @@ static void rvu_remove(struct pci_dev *pdev)
 
 	rvu_dbg_exit(rvu);
 	rvu_unregister_interrupts(rvu);
+	rvu_unregister_dl(rvu);
 	rvu_flr_wq_destroy(rvu);
 	rvu_cgx_exit(rvu);
 	rvu_fwdata_exit(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index b6c0977499ab..b1a6ecfd563e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -12,7 +12,10 @@
 #define RVU_H
 
 #include
+#include
+
 #include "rvu_struct.h"
+#include "rvu_devlink.h"
 #include "common.h"
 #include "mbox.h"
 #include "npc.h"
@@ -422,6 +425,7 @@ struct rvu {
 #ifdef CONFIG_DEBUG_FS
 	struct rvu_debugfs	rvu_dbg;
 #endif
+	struct rvu_devlink	*rvu_dl;
 };
 
 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
new file mode 100644
index ..04ef945e7e75
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Devlink
+ *
+ * Copyright (C) 2020 Marvell.
+ *
+ */
+
+#include "rvu.h"
+
+#define DRV_NAME "octeontx2-af"
+
+static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+				struct netlink_ext_ack *extack)
+{
+	char buf[10];
+	int err;
+
+	err = devlink_info_driver_name_put(req, DRV_NAME);
+	if (err)
+		return err;
+
+	sprintf(buf, "%X", OTX2_MBOX_VERSION);
+	return devlink_info_version_fixed_put(req, "mbox version:", buf);
+}
+
+static const struct devlink_ops rvu_devlink_ops = {
+	.info_get = rvu_devlink_info_get,
+};
+
+int rvu_register_dl(struct rvu *rvu)
+{
+	struct
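The `rvu_probe()` hunk in this patch inserts devlink registration between two existing steps and adds an `err_dl` label so that a later failure unwinds it. The ordering can be checked with a small user-space model. This is a sketch with hypothetical names (`model_probe`, `log_step`, the `step` enum); it assumes the `goto err_flr` just before the new call belongs to interrupt registration, which the hunk context suggests but does not show.

```c
#include <stddef.h>
#include <string.h>

enum step { STEP_IRQ, STEP_DEVLINK, STEP_SRIOV, STEP_MAX };

/* Append one character to a NUL-terminated trace, if it fits. */
static void log_step(char *log, size_t len, char c)
{
	size_t n = strlen(log);

	if (n + 1 < len) {
		log[n] = c;
		log[n + 1] = '\0';
	}
}

/*
 * Simulated probe. `fail_at` names the step that fails (STEP_MAX: none).
 * Upper-case letters trace setup (I = interrupts, D = devlink, S = SR-IOV),
 * lower-case letters trace the matching teardown.
 */
static int model_probe(enum step fail_at, char *log, size_t len)
{
	if (fail_at == STEP_IRQ)
		goto err_flr;
	log_step(log, len, 'I');

	if (fail_at == STEP_DEVLINK)
		goto err_irq;
	log_step(log, len, 'D');

	if (fail_at == STEP_SRIOV)
		goto err_dl;
	log_step(log, len, 'S');

	return 0;

err_dl:
	log_step(log, len, 'd');
err_irq:
	log_step(log, len, 'i');
err_flr:
	return -1;
}
```

With this layout, a failure after devlink registration falls through `err_dl` into `err_irq`, so teardown runs in the reverse order of setup.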
[PATCHv5 net-next 0/3] Add devlink and devlink health reporters to
Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA and NIX blocks.
These reporters report the error count in respective blocks.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

Change-log:
v5
 - Address Jiri's comment
 - use devlink_fmsg_arr_pair_nest_start() for NIX blocks
v4
 - Rebase to net-next (no logic changes).
v3
 - Address Saeed's comments on v2.
 - Renamed the reporter name as hw_*.
 - Call devlink_health_report() when an event is raised.
 - Added recover op too.
v2
 - Address Willem's comments on v1.
 - Fixed the sparse issues, reported by Jakub.

George Cherian (3):
  octeontx2-af: Add devlink support to af driver
  octeontx2-af: Add devlink health reporters for NPA
  octeontx2-af: Add devlink health reporters for NIX

 .../net/ethernet/marvell/octeontx2/Kconfig| 1 +
 .../ethernet/marvell/octeontx2/af/Makefile| 2 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c | 9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h | 4 +
 .../marvell/octeontx2/af/rvu_devlink.c| 978 ++
 .../marvell/octeontx2/af/rvu_devlink.h| 82 ++
 .../marvell/octeontx2/af/rvu_struct.h | 33 +
 7 files changed, 1107 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

--
2.25.1
Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA
Hi Jiri,

> -----Original Message-----
> From: Jiri Pirko
> Sent: Monday, November 23, 2020 3:52 PM
> To: George Cherian
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham;
> Linu Cherian; Geethasowjanya Akula; masahi...@kernel.org;
> willemdebruijn.ker...@gmail.com; sa...@kernel.org
> Subject: Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health
> reporters for NPA
>
> Mon, Nov 23, 2020 at 03:49:06AM CET, gcher...@marvell.com wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jiri Pirko
> >> Sent: Saturday, November 21, 2020 7:44 PM
> >> To: George Cherian
> >> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> >> k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham;
> >> Linu Cherian; Geethasowjanya Akula; masahi...@kernel.org;
> >> willemdebruijn.ker...@gmail.com; sa...@kernel.org
> >> Subject: Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health
> >> reporters for NPA
> >>
> >> Sat, Nov 21, 2020 at 05:02:00AM CET, george.cher...@marvell.com wrote:
> >> >Add health reporters for RVU NPA block.
> >> >NPA Health reporters handle following HW event groups
> >> > - GENERAL events
> >> > - ERROR events
> >> > - RAS events
> >> > - RVU event
> >> >An event counter per event is maintained in SW.
> >> >
> >> >Output:
> >> > # devlink health
> >> > pci/0002:01:00.0:
> >> >   reporter npa
> >> >     state healthy error 0 recover 0
> >> > # devlink health dump show pci/0002:01:00.0 reporter npa
> >> > NPA_AF_GENERAL:
> >> >    Unmap PF Error: 0
> >> >    Free Disabled for NIX0 RX: 0
> >> >    Free Disabled for NIX0 TX: 0
> >> >    Free Disabled for NIX1 RX: 0
> >> >    Free Disabled for NIX1 TX: 0
> >>
> >> This is for 2 ports if I'm not mistaken. Then you need to have this
> >> reporter per-port. Register ports and have reporter for each.
> >>
> >No, these are not port specific reports.
> >NIX is the Network Interface Controller co-processor block.
> >There are (max of) 2 such co-processor blocks per SoC.
>
> Ah. I see. In that case, could you please structure the json differently.
> Don't concatenate the number with the string. Instead of that, please have 2
> subtrees, one for each NIX.
>

NPA_AF_GENERAL:
    Unmap PF Error: 0
    Free Disabled for NIX0
        RX: 0
        TX: 0
    Free Disabled for NIX1
        RX: 0
        TX: 0

Something like this?

Regards,
-George

>
> >
> >Moreover, this is an NPA (Network Pool/Buffer Allocator co-processor)
> >reporter.
> >This tells whether a free or alloc operation is skipped due to the
> >configurations set by other co-processor blocks (NIX,SSO,TIM etc).
> >
> >https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/marvell/octeontx2.html
> >> NAK.
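The request in this exchange, one subtree per NIX block instead of a block number concatenated into each label, can be illustrated with a plain-text renderer in user space. This is only a sketch: `struct nix_free_dis` and `render_free_disabled()` are hypothetical names, and the layout is one possible reading of the request, not the format the driver ended up emitting through the devlink fmsg API.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-NIX "free disabled" counters. */
struct nix_free_dis {
	unsigned int rx;
	unsigned int tx;
};

/*
 * Render one subtree per NIX block, keyed by its index, into `buf`.
 * Returns the number of characters written, or -1 if `buf` is too small.
 */
static int render_free_disabled(char *buf, size_t len,
				const struct nix_free_dis *nix, size_t count)
{
	size_t off = 0;
	size_t i;
	int n;

	n = snprintf(buf, len, "NIX:\n");
	if (n < 0 || (size_t)n >= len)
		return -1;
	off += (size_t)n;

	for (i = 0; i < count; i++) {
		n = snprintf(buf + off, len - off,
			     "  %zu:\n    RX: %u\n    TX: %u\n",
			     i, nix[i].rx, nix[i].tx);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```

Keeping the index as its own key means a consumer of the structured output can iterate over NIX blocks without parsing label strings.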
Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA
> -----Original Message-----
> From: Jiri Pirko
> Sent: Saturday, November 21, 2020 7:44 PM
> To: George Cherian
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham;
> Linu Cherian; Geethasowjanya Akula; masahi...@kernel.org;
> willemdebruijn.ker...@gmail.com; sa...@kernel.org
> Subject: Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health
> reporters for NPA
>
> Sat, Nov 21, 2020 at 05:02:00AM CET, george.cher...@marvell.com wrote:
> >Add health reporters for RVU NPA block.
> >NPA Health reporters handle following HW event groups
> > - GENERAL events
> > - ERROR events
> > - RAS events
> > - RVU event
> >An event counter per event is maintained in SW.
> >
> >Output:
> > # devlink health
> > pci/0002:01:00.0:
> >   reporter npa
> >     state healthy error 0 recover 0
> > # devlink health dump show pci/0002:01:00.0 reporter npa
> > NPA_AF_GENERAL:
> >    Unmap PF Error: 0
> >    Free Disabled for NIX0 RX: 0
> >    Free Disabled for NIX0 TX: 0
> >    Free Disabled for NIX1 RX: 0
> >    Free Disabled for NIX1 TX: 0
>
> This is for 2 ports if I'm not mistaken. Then you need to have this reporter
> per-port. Register ports and have reporter for each.
>

No, these are not port specific reports.
NIX is the Network Interface Controller co-processor block.
There are (max of) 2 such co-processor blocks per SoC.

Moreover, this is an NPA (Network Pool/Buffer Allocator co-processor) reporter.
This tells whether a free or alloc operation is skipped due to the
configurations set by other co-processor blocks (NIX,SSO,TIM etc).

https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/marvell/octeontx2.html

> NAK.
[PATCHv4 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA
Add health reporters for RVU NPA block.
NPA Health reporters handle following HW event groups
 - GENERAL events
 - ERROR events
 - RAS events
 - RVU event
An event counter per event is maintained in SW.

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter npa
     state healthy error 0 recover 0
 # devlink health dump show pci/0002:01:00.0 reporter npa
 NPA_AF_GENERAL:
   Unmap PF Error: 0
   Free Disabled for NIX0 RX: 0
   Free Disabled for NIX0 TX: 0
   Free Disabled for NIX1 RX: 0
   Free Disabled for NIX1 TX: 0
   Free Disabled for SSO: 0
   Free Disabled for TIM: 0
   Free Disabled for DPI: 0
   Free Disabled for AURA: 0
   Alloc Disabled for Resvd: 0
 NPA_AF_ERR:
   Memory Fault on NPA_AQ_INST_S read: 0
   Memory Fault on NPA_AQ_RES_S write: 0
   AQ Doorbell Error: 0
   Poisoned data on NPA_AQ_INST_S read: 0
   Poisoned data on NPA_AQ_RES_S write: 0
   Poisoned data on HW context read: 0
 NPA_AF_RVU:
   Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../marvell/octeontx2/af/rvu_devlink.c| 492 +-
 .../marvell/octeontx2/af/rvu_devlink.h| 31 ++
 .../marvell/octeontx2/af/rvu_struct.h | 23 +
 3 files changed, 545 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 04ef945e7e75..b7f0691d86b0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -5,10 +5,498 @@
  *
  */
 
+#include
+
 #include "rvu.h"
+#include "rvu_reg.h"
+#include "rvu_struct.h"
 
 #define DRV_NAME "octeontx2-af"
 
+static int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	return devlink_fmsg_obj_nest_start(fmsg);
+}
+
+static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
+{
+	int err;
+
+	err = devlink_fmsg_obj_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return devlink_fmsg_pair_nest_end(fmsg);
+}
+
+static bool rvu_common_request_irq(struct rvu *rvu, int offset,
+				   const char *name, irq_handler_t fn)
+{
+	struct rvu_devlink *rvu_dl = rvu->rvu_dl;
+	int rc;
+
+	sprintf(&rvu->irq_name[offset * NAME_SIZE], name);
+	rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0,
+			 &rvu->irq_name[offset * NAME_SIZE], rvu_dl);
+	if (rc)
+		dev_warn(rvu->dev, "Failed to register %s irq\n", name);
+	else
+		rvu->irq_allocated[offset] = true;
+
+	return rvu->irq_allocated[offset];
+}
+
+static irqreturn_t rvu_npa_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_npa_event_ctx *npa_event_context;
+	struct rvu_npa_event_cnt *npa_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	npa_event_context = rvu_dl->npa_event_ctx;
+	npa_event_count = &npa_event_context->npa_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NPA_AF_RVU_INT);
+	npa_event_context->npa_af_rvu_int = intr;
+
+	if (intr & BIT_ULL(0))
+		npa_event_count->unmap_slot_count++;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT, intr);
+	rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT_ENA_W1C, ~0ULL);
+	devlink_health_report(rvu_dl->rvu_npa_health_reporter, "NPA_AF_RVU Error",
+			      npa_event_context);
+
+	return IRQ_HANDLED;
+}
+
+static int rvu_npa_inpq_to_cnt(u16 in,
+			       struct rvu_npa_event_cnt *npa_event_count)
+{
+	switch (in) {
+	case 0:
+		return 0;
+	case BIT(NPA_INPQ_NIX0_RX):
+		return npa_event_count->free_dis_nix0_rx_count++;
+	case BIT(NPA_INPQ_NIX0_TX):
+		return npa_event_count->free_dis_nix0_tx_count++;
+	case BIT(NPA_INPQ_NIX1_RX):
+		return npa_event_count->free_dis_nix1_rx_count++;
+	case BIT(NPA_INPQ_NIX1_TX):
+		return npa_event_count->free_dis_nix1_tx_count++;
+	case BIT(NPA_INPQ_SSO):
+		return npa_event_count->free_dis_sso_count++;
+	case BIT(NPA_INPQ_TIM):
+		return npa_event_count->free_dis_tim_count++;
+	case BIT(NPA_INPQ_DPI):
+		return npa_event_count->free_dis_dpi_count++;
+	case BIT(NPA_INPQ_AURA_OP):
[PATCHv4 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX
Add health reporters for RVU NIX block.
NIX Health reporters handle following HW event groups
 - GENERAL events
 - RAS events
 - RVU event
An event counter per event is maintained in SW.

Output:
 # ./devlink health
 pci/0002:01:00.0:
   reporter npa
     state healthy error 0 recover 0
   reporter nix
     state healthy error 0 recover 0
 # ./devlink health dump show pci/0002:01:00.0 reporter nix
 NIX_AF_GENERAL:
   Memory Fault on NIX_AQ_INST_S read: 0
   Memory Fault on NIX_AQ_RES_S write: 0
   AQ Doorbell error: 0
   Rx on unmapped PF_FUNC: 0
   Rx multicast replication error: 0
   Memory fault on NIX_RX_MCE_S read: 0
   Memory fault on multicast WQE read: 0
   Memory fault on mirror WQE read: 0
   Memory fault on mirror pkt write: 0
   Memory fault on multicast pkt write: 0
 NIX_AF_RAS:
   Poisoned data on NIX_AQ_INST_S read: 0
   Poisoned data on NIX_AQ_RES_S write: 0
   Poisoned data on HW context read: 0
   Poisoned data on packet read from mirror buffer: 0
   Poisoned data on packet read from mcast buffer: 0
   Poisoned data on WQE read from mirror buffer: 0
   Poisoned data on WQE read from multicast buffer: 0
   Poisoned data on NIX_RX_MCE_S read: 0
 NIX_AF_RVU:
   Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../marvell/octeontx2/af/rvu_devlink.c| 414 +-
 .../marvell/octeontx2/af/rvu_devlink.h| 31 ++
 .../marvell/octeontx2/af/rvu_struct.h | 10 +
 3 files changed, 453 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index b7f0691d86b0..c02d0f56ae7a 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -35,6 +35,131 @@ static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
 	return devlink_fmsg_pair_nest_end(fmsg);
 }
 
+static irqreturn_t rvu_nix_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_nix_event_ctx *nix_event_context;
+	struct rvu_nix_event_cnt *nix_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	nix_event_context = rvu_dl->nix_event_ctx;
+	nix_event_count = &nix_event_context->nix_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NIX_AF_RVU_INT);
+	nix_event_context->nix_af_rvu_int = intr;
+
+	if (intr & BIT_ULL(0))
+		nix_event_count->unmap_slot_count++;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT, intr);
+	rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT_ENA_W1C, ~0ULL);
+	devlink_health_report(rvu_dl->rvu_nix_health_reporter, "NIX_AF_RVU Error",
+			      nix_event_context);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t rvu_nix_af_err_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_nix_event_ctx *nix_event_context;
+	struct rvu_nix_event_cnt *nix_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	nix_event_context = rvu_dl->nix_event_ctx;
+	nix_event_count = &nix_event_context->nix_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NIX_AF_ERR_INT);
+	nix_event_context->nix_af_rvu_err = intr;
+
+	if (intr & BIT_ULL(14))
+		nix_event_count->aq_inst_count++;
+	if (intr & BIT_ULL(13))
+		nix_event_count->aq_res_count++;
+	if (intr & BIT_ULL(12))
+		nix_event_count->aq_db_count++;
+	if (intr & BIT_ULL(6))
+		nix_event_count->rx_on_unmap_pf_count++;
+	if (intr & BIT_ULL(5))
+		nix_event_count->rx_mcast_repl_count++;
+	if (intr & BIT_ULL(4))
+		nix_event_count->rx_mcast_memfault_count++;
+	if (intr & BIT_ULL(3))
+		nix_event_count->rx_mcast_wqe_memfault_count++;
+	if (intr & BIT_ULL(2))
+		nix_event_count->rx_mirror_wqe_memfault_count++;
+	if (intr & BIT_ULL(1))
+		nix_event_count->rx_mirror_pktw_memfault_count++;
+	if (intr & BIT_ULL(0))
+		nix_event_count->rx_mcast_pktw_memfault_count++;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT, intr);
+	rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT_ENA_W1C, ~0ULL);
+	dev
[PATCHv4 net-next 1/3] octeontx2-af: Add devlink support to af driver
Add devlink support to AF driver. Basic devlink support is added.
Currently info_get is the only supported devlink ops.

devlink output looks like this
 # devlink dev
 pci/0002:01:00.0
 # devlink dev info
 pci/0002:01:00.0:
  driver octeontx2-af
  versions:
      fixed:
        mbox version: 9

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../net/ethernet/marvell/octeontx2/Kconfig    |  1 +
 .../ethernet/marvell/octeontx2/af/Makefile    |  2 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |  9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  4 ++
 .../marvell/octeontx2/af/rvu_devlink.c        | 72 +++
 .../marvell/octeontx2/af/rvu_devlink.h        | 20 ++
 6 files changed, 106 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 543a1d047567..16caa02095fe 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -9,6 +9,7 @@ config OCTEONTX2_MBOX
 config OCTEONTX2_AF
 	tristate "Marvell OcteonTX2 RVU Admin Function driver"
 	select OCTEONTX2_MBOX
+	select NET_DEVLINK
 	depends on (64BIT && COMPILE_TEST) || ARM64
 	depends on PCI
 	help
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index 7100d1dd856e..eb535c98ca38 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -10,4 +10,4 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
 octeontx2_mbox-y := mbox.o rvu_trace.o
 octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \
 		  rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o rvu_npc_fs.o \
-		  rvu_cpt.o
+		  rvu_cpt.o rvu_devlink.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 9f901c0edcbb..e8fd712860a1 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -2826,17 +2826,23 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (err)
 		goto err_flr;

+	err = rvu_register_dl(rvu);
+	if (err)
+		goto err_irq;
+
 	rvu_setup_rvum_blk_revid(rvu);

 	/* Enable AF's VFs (if any) */
 	err = rvu_enable_sriov(rvu);
 	if (err)
-		goto err_irq;
+		goto err_dl;

 	/* Initialize debugfs */
 	rvu_dbg_init(rvu);

 	return 0;
+err_dl:
+	rvu_unregister_dl(rvu);
 err_irq:
 	rvu_unregister_interrupts(rvu);
 err_flr:
@@ -2868,6 +2874,7 @@ static void rvu_remove(struct pci_dev *pdev)

 	rvu_dbg_exit(rvu);
 	rvu_unregister_interrupts(rvu);
+	rvu_unregister_dl(rvu);
 	rvu_flr_wq_destroy(rvu);
 	rvu_cgx_exit(rvu);
 	rvu_fwdata_exit(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index b6c0977499ab..b1a6ecfd563e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -12,7 +12,10 @@
 #define RVU_H

 #include
+#include
+
 #include "rvu_struct.h"
+#include "rvu_devlink.h"
 #include "common.h"
 #include "mbox.h"
 #include "npc.h"
@@ -422,6 +425,7 @@ struct rvu {
 #ifdef CONFIG_DEBUG_FS
 	struct rvu_debugfs	rvu_dbg;
 #endif
+	struct rvu_devlink	*rvu_dl;
 };

 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
new file mode 100644
index ..04ef945e7e75
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Devlink
+ *
+ * Copyright (C) 2020 Marvell.
+ *
+ */
+
+#include "rvu.h"
+
+#define DRV_NAME "octeontx2-af"
+
+static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+				struct netlink_ext_ack *extack)
+{
+	char buf[10];
+	int err;
+
+	err = devlink_info_driver_name_put(req, DRV_NAME);
+	if (err)
+		return err;
+
+	sprintf(buf, "%X", OTX2_MBOX_VERSION);
+	return devlink_info_version_fixed_put(req, "mbox version:", buf);
+}
+
+static const struct devlink_ops rvu_devlink_ops = {
+	.info_get = rvu_devlink_info_get,
+};
+
+int rvu_register_dl(struct rvu *rvu)
+{
+	struct
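The reworked error path in rvu_probe() relies on goto labels that undo setup steps in reverse order. The sketch below models that ordering in user space. It is only an illustration: probe_sketch(), step_fails() and the 'd'/'i' markers are hypothetical names, not driver code, and the three steps stand in for interrupt registration, devlink registration and SR-IOV enablement.

```c
#include <stddef.h>

/* Hypothetical helper: step `id` fails when it equals `fail_at` (0 = no failure). */
static int step_fails(int id, int fail_at)
{
	return id == fail_at ? -1 : 0;
}

/*
 * Hypothetical model of the probe sequence. On failure, `undo` records which
 * teardown steps ran and in which order: 'd' = devlink unregistered,
 * 'i' = interrupts unregistered. `len` must be at least 1.
 */
int probe_sketch(int fail_at, char *undo, size_t len)
{
	size_t n = 0;
	int err;

	undo[0] = '\0';

	err = step_fails(1, fail_at);	/* register interrupts */
	if (err)
		goto err_out;

	err = step_fails(2, fail_at);	/* register devlink */
	if (err)
		goto err_irq;

	err = step_fails(3, fail_at);	/* enable SR-IOV */
	if (err)
		goto err_dl;

	return 0;

err_dl:
	if (n + 1 < len)
		undo[n++] = 'd';
err_irq:
	if (n + 1 < len)
		undo[n++] = 'i';
err_out:
	undo[n] = '\0';
	return err;
}
```

A failure in the last step falls through both labels, so devlink is torn down before the interrupts, mirroring the err_dl/err_irq order in the patch.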
[PATCHv3 net-next 0/3] Add devlink and devlink health reporters to
Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA and NIX blocks.
These reporters report the error count in respective blocks.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

Change-log:
v4
 - Rebase to net-next (no logic changes).
v3
 - Address Saeed's comments on v2.
 - Renamed the reporter name as hw_*.
 - Call devlink_health_report() when an event is raised.
 - Added recover op too.
v2
 - Address Willem's comments on v1.
 - Fixed the sparse issues, reported by Jakub.

George Cherian (3):
  octeontx2-af: Add devlink suppoort to af driver
  octeontx2-af: Add devlink health reporters for NPA
  octeontx2-af: Add devlink health reporters for NIX

 .../net/ethernet/marvell/octeontx2/Kconfig    |   1 +
 .../ethernet/marvell/octeontx2/af/Makefile    |   2 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   4 +
 .../marvell/octeontx2/af/rvu_devlink.c        | 972 ++
 .../marvell/octeontx2/af/rvu_devlink.h        |  82 ++
 .../marvell/octeontx2/af/rvu_struct.h         |  33 +
 7 files changed, 1101 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

--
2.25.1
[PATCH] octeontx2-af: Add support for RSS hashing based on Transport protocol field
Add support to choose RSS flow key algorithm with IPv4 transport
protocol field included in hashing input data. This will be enabled
by default, thereby enabling 3/5 tuple hash.

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: George Cherian
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h         | 1 +
 drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c      | 7 +++
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c | 3 ++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index f46de8419b77..97c8566b7da8 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -644,6 +644,7 @@ struct nix_rss_flowkey_cfg {
 #define NIX_FLOW_KEY_TYPE_INNR_SCTP     BIT(16)
 #define NIX_FLOW_KEY_TYPE_INNR_ETH_DMAC BIT(17)
 #define NIX_FLOW_KEY_TYPE_VLAN          BIT(20)
+#define NIX_FLOW_KEY_TYPE_IPV4_PROTO    BIT(21)
 	u32	flowkey_cfg; /* Flowkey types selected */
 	u8	group;       /* RSS context or group */
 };
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index 8bac1dd3a1c2..ef016521b277 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -2429,6 +2429,13 @@ static int set_flowkey_fields(struct nix_rx_flowkey_alg *alg, u32 flow_cfg)
 			/* This should be set to 1, when SEL_CHAN is set */
 			field->bytesm1 = 1;
 			break;
+		case NIX_FLOW_KEY_TYPE_IPV4_PROTO:
+			field->lid = NPC_LID_LC;
+			field->hdr_offset = 9; /* offset */
+			field->bytesm1 = 0; /* 1 byte */
+			field->ltype_match = NPC_LT_LC_IP;
+			field->ltype_mask = 0xF;
+			break;
 		case NIX_FLOW_KEY_TYPE_IPV4:
 		case NIX_FLOW_KEY_TYPE_INNR_IPV4:
 			field->lid = NPC_LID_LC;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 9f3d6715748e..2ab927408656 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -355,7 +355,8 @@ int otx2_rss_init(struct otx2_nic *pfvf)
 	rss->flowkey_cfg = rss->enable ? rss->flowkey_cfg :
 			   NIX_FLOW_KEY_TYPE_IPV4 | NIX_FLOW_KEY_TYPE_IPV6 |
 			   NIX_FLOW_KEY_TYPE_TCP | NIX_FLOW_KEY_TYPE_UDP |
-			   NIX_FLOW_KEY_TYPE_SCTP | NIX_FLOW_KEY_TYPE_VLAN;
+			   NIX_FLOW_KEY_TYPE_SCTP | NIX_FLOW_KEY_TYPE_VLAN |
+			   NIX_FLOW_KEY_TYPE_IPV4_PROTO;

 	ret = otx2_set_flowkey_cfg(pfvf);
 	if (ret)
--
2.25.1
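The new flow key field selects one byte at offset 9 of the IPv4 header, which is where the Protocol field sits. The sketch below shows, in plain user-space C, what "hdr_offset = 9, bytesm1 = 0" selects from a header. struct key_field_sketch and extract_key_bytes() are hypothetical names for illustration only. The hardware performs this extraction itself, and the real descriptor has more fields than shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced view of a flow key field descriptor. */
struct key_field_sketch {
	unsigned int hdr_offset;	/* byte offset from the start of the layer header */
	unsigned int bytesm1;		/* number of bytes to extract, minus one */
};

/*
 * Copy the bytes selected by `f` out of `hdr` into `out`.
 * Returns the number of bytes copied, or 0 if the selection does not fit.
 */
size_t extract_key_bytes(const uint8_t *hdr, size_t hdr_len,
			 const struct key_field_sketch *f,
			 uint8_t *out, size_t out_len)
{
	size_t n = (size_t)f->bytesm1 + 1;

	if (f->hdr_offset + n > hdr_len || n > out_len)
		return 0;

	memcpy(out, hdr + f->hdr_offset, n);
	return n;
}
```

With a 20-byte IPv4 header carrying TCP, the extracted byte is 6; for UDP it is 17. Adding this byte to the hash input means flows that differ only in transport protocol can land in different RSS buckets.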
[PATCHv3 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA
Add health reporters for RVU NPA block.
NPA Health reporters handle following HW event groups
 - GENERAL events
 - ERROR events
 - RAS events
 - RVU event
An event counter per event is maintained in SW.

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter npa
     state healthy error 0 recover 0
 # devlink health dump show pci/0002:01:00.0 reporter npa
 NPA_AF_GENERAL:
        Unmap PF Error: 0
        Free Disabled for NIX0 RX: 0
        Free Disabled for NIX0 TX: 0
        Free Disabled for NIX1 RX: 0
        Free Disabled for NIX1 TX: 0
        Free Disabled for SSO: 0
        Free Disabled for TIM: 0
        Free Disabled for DPI: 0
        Free Disabled for AURA: 0
        Alloc Disabled for Resvd: 0
  NPA_AF_ERR:
        Memory Fault on NPA_AQ_INST_S read: 0
        Memory Fault on NPA_AQ_RES_S write: 0
        AQ Doorbell Error: 0
        Poisoned data on NPA_AQ_INST_S read: 0
        Poisoned data on NPA_AQ_RES_S write: 0
        Poisoned data on HW context read: 0
  NPA_AF_RVU:
        Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../marvell/octeontx2/af/rvu_devlink.c        | 492 +-
 .../marvell/octeontx2/af/rvu_devlink.h        |  31 ++
 .../marvell/octeontx2/af/rvu_struct.h         |  23 +
 3 files changed, 545 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 04ef945e7e75..b7f0691d86b0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -5,10 +5,498 @@
  *
  */

+#include
+
 #include "rvu.h"
+#include "rvu_reg.h"
+#include "rvu_struct.h"

 #define DRV_NAME "octeontx2-af"

+static int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	return devlink_fmsg_obj_nest_start(fmsg);
+}
+
+static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
+{
+	int err;
+
+	err = devlink_fmsg_obj_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return devlink_fmsg_pair_nest_end(fmsg);
+}
+
+static bool rvu_common_request_irq(struct rvu *rvu, int offset,
+				   const char *name, irq_handler_t fn)
+{
+	struct rvu_devlink *rvu_dl = rvu->rvu_dl;
+	int rc;
+
+	sprintf(&rvu->irq_name[offset * NAME_SIZE], name);
+	rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0,
+			 &rvu->irq_name[offset * NAME_SIZE], rvu_dl);
+	if (rc)
+		dev_warn(rvu->dev, "Failed to register %s irq\n", name);
+	else
+		rvu->irq_allocated[offset] = true;
+
+	return rvu->irq_allocated[offset];
+}
+
+static irqreturn_t rvu_npa_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_npa_event_ctx *npa_event_context;
+	struct rvu_npa_event_cnt *npa_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	npa_event_context = rvu_dl->npa_event_ctx;
+	npa_event_count = &npa_event_context->npa_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NPA_AF_RVU_INT);
+	npa_event_context->npa_af_rvu_int = intr;
+
+	if (intr & BIT_ULL(0))
+		npa_event_count->unmap_slot_count++;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT, intr);
+	rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT_ENA_W1C, ~0ULL);
+	devlink_health_report(rvu_dl->rvu_npa_health_reporter, "NPA_AF_RVU Error",
+			      npa_event_context);
+
+	return IRQ_HANDLED;
+}
+
+static int rvu_npa_inpq_to_cnt(u16 in,
+			       struct rvu_npa_event_cnt *npa_event_count)
+{
+	switch (in) {
+	case 0:
+		return 0;
+	case BIT(NPA_INPQ_NIX0_RX):
+		return npa_event_count->free_dis_nix0_rx_count++;
+	case BIT(NPA_INPQ_NIX0_TX):
+		return npa_event_count->free_dis_nix0_tx_count++;
+	case BIT(NPA_INPQ_NIX1_RX):
+		return npa_event_count->free_dis_nix1_rx_count++;
+	case BIT(NPA_INPQ_NIX1_TX):
+		return npa_event_count->free_dis_nix1_tx_count++;
+	case BIT(NPA_INPQ_SSO):
+		return npa_event_count->free_dis_sso_count++;
+	case BIT(NPA_INPQ_TIM):
+		return npa_event_count->free_dis_tim_count++;
+	case BIT(NPA_INPQ_DPI):
+		return npa_event_count->free_dis_dpi_count++;
+	case BIT(NPA_INPQ_AURA_OP):
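rvu_npa_inpq_to_cnt() switches on a one-hot value, so only inputs with exactly one known bit set reach a named counter. The user-space sketch below models that behaviour with three sources. The bit positions and the names BIT_SKETCH, inpq_counts and inpq_count() are hypothetical; the real NPA_INPQ_* values are defined in rvu_struct.h and are not shown in this message.

```c
#include <stdint.h>

#define BIT_SKETCH(n) (1u << (n))

/* Hypothetical bit positions, chosen only for this illustration. */
enum { INPQ_NIX0_RX = 0, INPQ_NIX0_TX = 1, INPQ_SSO = 2 };

struct inpq_counts {
	unsigned int nix0_rx;
	unsigned int nix0_tx;
	unsigned int sso;
	unsigned int other;	/* catch-all, like the fall-through counter */
};

/* Bump the counter matching a one-hot input mask; anything else is "other". */
void inpq_count(uint16_t in, struct inpq_counts *c)
{
	switch (in) {
	case 0:
		return;			/* nothing reported */
	case BIT_SKETCH(INPQ_NIX0_RX):
		c->nix0_rx++;
		return;
	case BIT_SKETCH(INPQ_NIX0_TX):
		c->nix0_tx++;
		return;
	case BIT_SKETCH(INPQ_SSO):
		c->sso++;
		return;
	default:
		c->other++;		/* unknown bit or several bits at once */
		return;
	}
}
```

In this model a value with two bits set is accounted once in the catch-all counter rather than once per source, which is worth keeping in mind when reading the per-source numbers.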
[PATCHv3 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX
Add health reporters for RVU NIX block.
NIX Health reporter handle following HW event groups
 - GENERAL events
 - RAS events
 - RVU event
An event counter per event is maintained in SW.

Output:
 # ./devlink health
 pci/0002:01:00.0:
   reporter npa
     state healthy error 0 recover 0
   reporter nix
     state healthy error 0 recover 0
 # ./devlink health dump show pci/0002:01:00.0 reporter nix
  NIX_AF_GENERAL:
         Memory Fault on NIX_AQ_INST_S read: 0
         Memory Fault on NIX_AQ_RES_S write: 0
         AQ Doorbell error: 0
         Rx on unmapped PF_FUNC: 0
         Rx multicast replication error: 0
         Memory fault on NIX_RX_MCE_S read: 0
         Memory fault on multicast WQE read: 0
         Memory fault on mirror WQE read: 0
         Memory fault on mirror pkt write: 0
         Memory fault on multicast pkt write: 0
   NIX_AF_RAS:
         Poisoned data on NIX_AQ_INST_S read: 0
         Poisoned data on NIX_AQ_RES_S write: 0
         Poisoned data on HW context read: 0
         Poisoned data on packet read from mirror buffer: 0
         Poisoned data on packet read from mcast buffer: 0
         Poisoned data on WQE read from mirror buffer: 0
         Poisoned data on WQE read from multicast buffer: 0
         Poisoned data on NIX_RX_MCE_S read: 0
   NIX_AF_RVU:
         Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../marvell/octeontx2/af/rvu_devlink.c        | 414 +-
 .../marvell/octeontx2/af/rvu_devlink.h        |  31 ++
 .../marvell/octeontx2/af/rvu_struct.h         |  10 +
 3 files changed, 453 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index b7f0691d86b0..c02d0f56ae7a 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -35,6 +35,131 @@ static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
 	return devlink_fmsg_pair_nest_end(fmsg);
 }

+static irqreturn_t rvu_nix_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_nix_event_ctx *nix_event_context;
+	struct rvu_nix_event_cnt *nix_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	nix_event_context = rvu_dl->nix_event_ctx;
+	nix_event_count = &nix_event_context->nix_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NIX_AF_RVU_INT);
+	nix_event_context->nix_af_rvu_int = intr;
+
+	if (intr & BIT_ULL(0))
+		nix_event_count->unmap_slot_count++;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT, intr);
+	rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT_ENA_W1C, ~0ULL);
+	devlink_health_report(rvu_dl->rvu_nix_health_reporter, "NIX_AF_RVU Error",
+			      nix_event_context);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t rvu_nix_af_err_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_nix_event_ctx *nix_event_context;
+	struct rvu_nix_event_cnt *nix_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	nix_event_context = rvu_dl->nix_event_ctx;
+	nix_event_count = &nix_event_context->nix_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NIX_AF_ERR_INT);
+	nix_event_context->nix_af_rvu_err = intr;
+
+	if (intr & BIT_ULL(14))
+		nix_event_count->aq_inst_count++;
+	if (intr & BIT_ULL(13))
+		nix_event_count->aq_res_count++;
+	if (intr & BIT_ULL(12))
+		nix_event_count->aq_db_count++;
+	if (intr & BIT_ULL(6))
+		nix_event_count->rx_on_unmap_pf_count++;
+	if (intr & BIT_ULL(5))
+		nix_event_count->rx_mcast_repl_count++;
+	if (intr & BIT_ULL(4))
+		nix_event_count->rx_mcast_memfault_count++;
+	if (intr & BIT_ULL(3))
+		nix_event_count->rx_mcast_wqe_memfault_count++;
+	if (intr & BIT_ULL(2))
+		nix_event_count->rx_mirror_wqe_memfault_count++;
+	if (intr & BIT_ULL(1))
+		nix_event_count->rx_mirror_pktw_memfault_count++;
+	if (intr & BIT_ULL(0))
+		nix_event_count->rx_mcast_pktw_memfault_count++;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT, intr);
+	rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT_ENA_W1C, ~0ULL);
+	dev
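The NIX_AF_ERR_INT handler tests ten bits one by one. The same bit-to-counter mapping can be written as a small table, sketched below in user-space C. The bit numbers (14, 13, 12, 6 to 0) are the ones used in the handler above; the enum, the array layout and nix_err_account() are hypothetical and are not how the patch stores its counters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter indices, named after the counters in the handler. */
enum nix_err_idx {
	AQ_INST,
	AQ_RES,
	AQ_DB,
	RX_ON_UNMAP_PF,
	RX_MCAST_REPL,
	RX_MCAST_MEMFAULT,
	RX_MCAST_WQE_MEMFAULT,
	RX_MIRROR_WQE_MEMFAULT,
	RX_MIRROR_PKTW_MEMFAULT,
	RX_MCAST_PKTW_MEMFAULT,
	NIX_ERR_NUM
};

/* Interrupt bit tested for each counter, as in the handler. */
static const unsigned int nix_err_bit[NIX_ERR_NUM] = {
	[AQ_INST]			= 14,
	[AQ_RES]			= 13,
	[AQ_DB]				= 12,
	[RX_ON_UNMAP_PF]		= 6,
	[RX_MCAST_REPL]			= 5,
	[RX_MCAST_MEMFAULT]		= 4,
	[RX_MCAST_WQE_MEMFAULT]		= 3,
	[RX_MIRROR_WQE_MEMFAULT]	= 2,
	[RX_MIRROR_PKTW_MEMFAULT]	= 1,
	[RX_MCAST_PKTW_MEMFAULT]	= 0,
};

/* Increment every counter whose bit is set in the interrupt value. */
void nix_err_account(uint64_t intr, unsigned int counts[NIX_ERR_NUM])
{
	for (size_t i = 0; i < NIX_ERR_NUM; i++)
		if (intr & (UINT64_C(1) << nix_err_bit[i]))
			counts[i]++;
}
```

Unlike a one-hot switch, this form counts every event when several bits are raised in the same interrupt, and bits without a table entry are ignored.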
[PATCHv3 net-next 0/3] Add devlink and devlink health reporters to
Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA and NIX blocks.
These reporters report the error count in respective blocks.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

Change-log:
v3
 - Address Saeed's comments on v2.
 - Renamed the reporter name as hw_*.
 - Call devlink_health_report() when an event is raised.
 - Added recover op too.
v2
 - Address Willem's comments on v1.
 - Fixed the sparse issues, reported by Jakub.

George Cherian (3):
  octeontx2-af: Add devlink suppoort to af driver
  octeontx2-af: Add devlink health reporters for NPA
  octeontx2-af: Add devlink health reporters for NIX

 .../net/ethernet/marvell/octeontx2/Kconfig    |   1 +
 .../ethernet/marvell/octeontx2/af/Makefile    |   3 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   4 +
 .../marvell/octeontx2/af/rvu_devlink.c        | 972 ++
 .../marvell/octeontx2/af/rvu_devlink.h        |  82 ++
 .../marvell/octeontx2/af/rvu_struct.h         |  33 +
 7 files changed, 1102 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

--
2.25.1
[PATCHv3 net-next 1/3] octeontx2-af: Add devlink suppoort to af driver
Add devlink support to AF driver. Basic devlink support is added.
Currently info_get is the only supported devlink ops.

devlink output looks like this
 # devlink dev
 pci/0002:01:00.0
 # devlink dev info
 pci/0002:01:00.0:
  driver octeontx2-af
  versions:
      fixed:
        mbox version: 9

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../net/ethernet/marvell/octeontx2/Kconfig    |  1 +
 .../ethernet/marvell/octeontx2/af/Makefile    |  3 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |  9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  4 ++
 .../marvell/octeontx2/af/rvu_devlink.c        | 72 +++
 .../marvell/octeontx2/af/rvu_devlink.h        | 20 ++
 6 files changed, 107 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 543a1d047567..16caa02095fe 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -9,6 +9,7 @@ config OCTEONTX2_MBOX
 config OCTEONTX2_AF
 	tristate "Marvell OcteonTX2 RVU Admin Function driver"
 	select OCTEONTX2_MBOX
+	select NET_DEVLINK
 	depends on (64BIT && COMPILE_TEST) || ARM64
 	depends on PCI
 	help
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index 2f7a861d0c7b..20135f1d3387 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -9,4 +9,5 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o

 octeontx2_mbox-y := mbox.o rvu_trace.o
 octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \
-		  rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o
+		  rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o \
+		  rvu_devlink.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index a28a518c0eae..67d6e05d1037 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -2816,17 +2816,23 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (err)
 		goto err_flr;

+	err = rvu_register_dl(rvu);
+	if (err)
+		goto err_irq;
+
 	rvu_setup_rvum_blk_revid(rvu);

 	/* Enable AF's VFs (if any) */
 	err = rvu_enable_sriov(rvu);
 	if (err)
-		goto err_irq;
+		goto err_dl;

 	/* Initialize debugfs */
 	rvu_dbg_init(rvu);

 	return 0;
+err_dl:
+	rvu_unregister_dl(rvu);
 err_irq:
 	rvu_unregister_interrupts(rvu);
 err_flr:
@@ -2858,6 +2864,7 @@ static void rvu_remove(struct pci_dev *pdev)

 	rvu_dbg_exit(rvu);
 	rvu_unregister_interrupts(rvu);
+	rvu_unregister_dl(rvu);
 	rvu_flr_wq_destroy(rvu);
 	rvu_cgx_exit(rvu);
 	rvu_fwdata_exit(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 5ac9bb12415f..282566235918 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -12,7 +12,10 @@
 #define RVU_H

 #include
+#include
+
 #include "rvu_struct.h"
+#include "rvu_devlink.h"
 #include "common.h"
 #include "mbox.h"

@@ -376,6 +379,7 @@ struct rvu {
 #ifdef CONFIG_DEBUG_FS
 	struct rvu_debugfs	rvu_dbg;
 #endif
+	struct rvu_devlink	*rvu_dl;
 };

 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
new file mode 100644
index ..04ef945e7e75
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Devlink
+ *
+ * Copyright (C) 2020 Marvell.
+ *
+ */
+
+#include "rvu.h"
+
+#define DRV_NAME "octeontx2-af"
+
+static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+				struct netlink_ext_ack *extack)
+{
+	char buf[10];
+	int err;
+
+	err = devlink_info_driver_name_put(req, DRV_NAME);
+	if (err)
+		return err;
+
+	sprintf(buf, "%X", OTX2_MBOX_VERSION);
+	return devlink_info_version_fixed_put(req, "mbox version:", buf);
+}
+
+static const struct devlink_ops rvu_devlink_ops = {
+	.info_get = rvu_devlink_info_get,
+};
+
+int rvu_register_dl(struct rvu *rvu)
+{
+	struct rvu_devlink *rvu_dl;
+
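rvu_devlink_info_get() formats the mailbox version into a 10-byte buffer with sprintf() and "%X". A 32-bit value needs at most eight hex digits, so the buffer is large enough; the user-space sketch below only shows a bounded variant of the same formatting that reports truncation instead of assuming the size. format_mbox_version() is a hypothetical helper, and the value 0x9 in the usage note is an assumption taken from the "mbox version: 9" sample output above.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format `version` as upper-case hex into `buf`.
 * Returns 0 on success, -1 if the text (plus terminator) does not fit.
 */
int format_mbox_version(char *buf, size_t len, unsigned int version)
{
	int n = snprintf(buf, len, "%X", version);

	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

Calling format_mbox_version(buf, sizeof(buf), 0x9) with a 10-byte buffer leaves "9" in buf, matching the sample output.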
Re: [PATCH v2 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX
Hi Saeed,

Thanks for the review.

> -----Original Message-----
> From: Saeed Mahameed
> Sent: Thursday, November 5, 2020 10:39 AM
> To: George Cherian ; net...@vger.kernel.org;
> linux-kernel@vger.kernel.org; Jiri Pirko
> Cc: k...@kernel.org; da...@davemloft.net; Sunil Kovvuri Goutham
> ; Linu Cherian ;
> Geethasowjanya Akula ; masahi...@kernel.org;
> willemdebruijn.ker...@gmail.com
> Subject: Re: [PATCH v2 net-next 3/3] octeontx2-af: Add devlink health
> reporters for NIX
>
> On Wed, 2020-11-04 at 17:57 +0530, George Cherian wrote:
> > Add health reporters for RVU NPA block.
>                                ^^^ NIX ?
>
Yes, it's NIX.

> Cc: Jiri
>
> Anyway, could you please spare some words on what is NPA and what is
> NIX?
>
> Regarding the reporters names, all drivers register well known generic names
> such as (fw,hw,rx,tx), I don't know if it is a good idea to use vendor
> specific
> names, if you are reporting for hw/fw units then just use "hw" or "fw" as the
> reporter name and append the unit NPA/NIX to the counter/error names.

Okay. These are hw units, I will rename them as hw_npa/hw_nix.

>
> > Only reporter dump is supported.
> >
> > Output:
> >  # ./devlink health
> >  pci/0002:01:00.0:
> >    reporter npa
> >      state healthy error 0 recover 0
> >    reporter nix
> >      state healthy error 0 recover 0
> >  # ./devlink health dump show pci/0002:01:00.0 reporter nix
> >   NIX_AF_GENERAL:
> >          Memory Fault on NIX_AQ_INST_S read: 0
> >          Memory Fault on NIX_AQ_RES_S write: 0
> >          AQ Doorbell error: 0
> >          Rx on unmapped PF_FUNC: 0
> >          Rx multicast replication error: 0
> >          Memory fault on NIX_RX_MCE_S read: 0
> >          Memory fault on multicast WQE read: 0
> >          Memory fault on mirror WQE read: 0
> >          Memory fault on mirror pkt write: 0
> >          Memory fault on multicast pkt write: 0
> >    NIX_AF_RAS:
> >          Poisoned data on NIX_AQ_INST_S read: 0
> >          Poisoned data on NIX_AQ_RES_S write: 0
> >          Poisoned data on HW context read: 0
> >          Poisoned data on packet read from mirror buffer: 0
> >          Poisoned data on packet read from mcast buffer: 0
> >          Poisoned data on WQE read from mirror buffer: 0
> >          Poisoned data on WQE read from multicast buffer: 0
> >          Poisoned data on NIX_RX_MCE_S read: 0
> >    NIX_AF_RVU:
> >          Unmap Slot Error: 0
> >
>
> Now i am a little bit skeptic here, devlink health reporter infrastructure was
> never meant to deal with dump op only, the main purpose is to
> diagnose/dump and recover.
>
> especially in your use case where you only report counters, i don't believe
> devlink health dump is a proper interface for this.

These are not counters. These are error interrupts raised by HW blocks.
The count is provided to understand on how frequently the errors are seen.
Error recovery for some of the blocks happen internally. That is the reason,
Currently only dump op is added.

> Many of these counters if not most are data path packet based and maybe
> they should belong to ethtool.

Regards,
-George
[PATCH v2 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA
Add health reporters for RVU NPA block.
Only reporter dump is supported

Output:
 # devlink health
 pci/0002:01:00.0:
   reporter npa
     state healthy error 0 recover 0
 # devlink health dump show pci/0002:01:00.0 reporter npa
 NPA_AF_GENERAL:
        Unmap PF Error: 0
        Free Disabled for NIX0 RX: 0
        Free Disabled for NIX0 TX: 0
        Free Disabled for NIX1 RX: 0
        Free Disabled for NIX1 TX: 0
        Free Disabled for SSO: 0
        Free Disabled for TIM: 0
        Free Disabled for DPI: 0
        Free Disabled for AURA: 0
        Alloc Disabled for Resvd: 0
  NPA_AF_ERR:
        Memory Fault on NPA_AQ_INST_S read: 0
        Memory Fault on NPA_AQ_RES_S write: 0
        AQ Doorbell Error: 0
        Poisoned data on NPA_AQ_INST_S read: 0
        Poisoned data on NPA_AQ_RES_S write: 0
        Poisoned data on HW context read: 0
  NPA_AF_RVU:
        Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../marvell/octeontx2/af/rvu_devlink.c        | 432 +-
 .../marvell/octeontx2/af/rvu_devlink.h        |  23 +
 .../marvell/octeontx2/af/rvu_struct.h         |  23 +
 3 files changed, 477 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 596bb9c533b5..bf9efe1f6aec 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -5,10 +5,438 @@
  *
  */

+#include
+
 #include "rvu.h"
+#include "rvu_reg.h"
+#include "rvu_struct.h"

 #define DRV_NAME "octeontx2-af"

+static int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	return devlink_fmsg_obj_nest_start(fmsg);
+}
+
+static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
+{
+	int err;
+
+	err = devlink_fmsg_obj_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return devlink_fmsg_pair_nest_end(fmsg);
+}
+
+static bool rvu_common_request_irq(struct rvu *rvu, int offset,
+				   const char *name, irq_handler_t fn)
+{
+	struct rvu_devlink *rvu_dl = rvu->rvu_dl;
+	int rc;
+
+	sprintf(&rvu->irq_name[offset * NAME_SIZE], name);
+	rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0,
+			 &rvu->irq_name[offset * NAME_SIZE], rvu_dl);
+	if (rc)
+		dev_warn(rvu->dev, "Failed to register %s irq\n", name);
+	else
+		rvu->irq_allocated[offset] = true;
+
+	return rvu->irq_allocated[offset];
+}
+
+static irqreturn_t rvu_npa_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_npa_event_cnt *npa_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	npa_event_count = rvu_dl->npa_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NPA_AF_RVU_INT);
+
+	if (intr & BIT_ULL(0))
+		npa_event_count->unmap_slot_count++;
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT, intr);
+	return IRQ_HANDLED;
+}
+
+static int rvu_npa_inpq_to_cnt(u16 in,
+			       struct rvu_npa_event_cnt *npa_event_count)
+{
+	switch (in) {
+	case 0:
+		return 0;
+	case BIT(NPA_INPQ_NIX0_RX):
+		return npa_event_count->free_dis_nix0_rx_count++;
+	case BIT(NPA_INPQ_NIX0_TX):
+		return npa_event_count->free_dis_nix0_tx_count++;
+	case BIT(NPA_INPQ_NIX1_RX):
+		return npa_event_count->free_dis_nix1_rx_count++;
+	case BIT(NPA_INPQ_NIX1_TX):
+		return npa_event_count->free_dis_nix1_tx_count++;
+	case BIT(NPA_INPQ_SSO):
+		return npa_event_count->free_dis_sso_count++;
+	case BIT(NPA_INPQ_TIM):
+		return npa_event_count->free_dis_tim_count++;
+	case BIT(NPA_INPQ_DPI):
+		return npa_event_count->free_dis_dpi_count++;
+	case BIT(NPA_INPQ_AURA_OP):
+		return npa_event_count->free_dis_aura_count++;
+	case BIT(NPA_INPQ_INTERNAL_RSV):
+		return npa_event_count->free_dis_rsvd_count++;
+	}
+
+	return npa_event_count->alloc_dis_rsvd_count++;
+}
+
+static irqreturn_t rvu_npa_af_gen_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_npa_event_cnt *npa_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr, val;
+	u64 intr;
+
+	rvu = rvu_dl->
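The handlers above acknowledge an interrupt by writing the value they just read back to the same register. The sketch below models why that pattern is used, under the assumption that the interrupt register has write-one-to-clear (W1C) semantics, which the "Clear interrupts" comment suggests but this message does not state. struct w1c_reg and the helper functions are hypothetical user-space stand-ins, not driver or hardware APIs.

```c
#include <stdint.h>

/* Hypothetical model of a write-one-to-clear interrupt status register. */
struct w1c_reg {
	uint64_t pending;
};

/* Reading returns the currently pending bits. */
uint64_t w1c_read(const struct w1c_reg *r)
{
	return r->pending;
}

/* Writing a 1 to a bit clears it; writing 0 leaves it untouched. */
void w1c_write(struct w1c_reg *r, uint64_t val)
{
	r->pending &= ~val;
}

/* Acknowledge exactly the events that were observed and return them. */
uint64_t w1c_ack_pending(struct w1c_reg *r)
{
	uint64_t intr = w1c_read(r);

	w1c_write(r, intr);
	return intr;
}
```

Because only the bits that were read are written back, an event raised between the read and the write stays pending and is seen on the next interrupt; writing all ones would silently discard it.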
[PATCH v2 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX
Add health reporters for RVU NPA block.
Only reporter dump is supported.

Output:
 # ./devlink health
 pci/0002:01:00.0:
   reporter npa
     state healthy error 0 recover 0
   reporter nix
     state healthy error 0 recover 0
 # ./devlink health dump show pci/0002:01:00.0 reporter nix
  NIX_AF_GENERAL:
         Memory Fault on NIX_AQ_INST_S read: 0
         Memory Fault on NIX_AQ_RES_S write: 0
         AQ Doorbell error: 0
         Rx on unmapped PF_FUNC: 0
         Rx multicast replication error: 0
         Memory fault on NIX_RX_MCE_S read: 0
         Memory fault on multicast WQE read: 0
         Memory fault on mirror WQE read: 0
         Memory fault on mirror pkt write: 0
         Memory fault on multicast pkt write: 0
   NIX_AF_RAS:
         Poisoned data on NIX_AQ_INST_S read: 0
         Poisoned data on NIX_AQ_RES_S write: 0
         Poisoned data on HW context read: 0
         Poisoned data on packet read from mirror buffer: 0
         Poisoned data on packet read from mcast buffer: 0
         Poisoned data on WQE read from mirror buffer: 0
         Poisoned data on WQE read from multicast buffer: 0
         Poisoned data on NIX_RX_MCE_S read: 0
   NIX_AF_RVU:
         Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../marvell/octeontx2/af/rvu_devlink.c        | 360 +-
 .../marvell/octeontx2/af/rvu_devlink.h        |  24 ++
 .../marvell/octeontx2/af/rvu_struct.h         |  10 +
 3 files changed, 393 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index bf9efe1f6aec..49e51d1bd7d5 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -35,6 +35,110 @@ static int rvu_report_pair_end(struct devlink_fmsg *fmsg)
 	return devlink_fmsg_pair_nest_end(fmsg);
 }

+static irqreturn_t rvu_nix_af_rvu_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_nix_event_cnt *nix_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	nix_event_count = rvu_dl->nix_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NIX_AF_RVU_INT);
+
+	if (intr & BIT_ULL(0))
+		nix_event_count->unmap_slot_count++;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT, intr);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t rvu_nix_af_err_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_nix_event_cnt *nix_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	nix_event_count = rvu_dl->nix_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NIX_AF_ERR_INT);
+
+	if (intr & BIT_ULL(14))
+		nix_event_count->aq_inst_count++;
+	if (intr & BIT_ULL(13))
+		nix_event_count->aq_res_count++;
+	if (intr & BIT_ULL(12))
+		nix_event_count->aq_db_count++;
+	if (intr & BIT_ULL(6))
+		nix_event_count->rx_on_unmap_pf_count++;
+	if (intr & BIT_ULL(5))
+		nix_event_count->rx_mcast_repl_count++;
+	if (intr & BIT_ULL(4))
+		nix_event_count->rx_mcast_memfault_count++;
+	if (intr & BIT_ULL(3))
+		nix_event_count->rx_mcast_wqe_memfault_count++;
+	if (intr & BIT_ULL(2))
+		nix_event_count->rx_mirror_wqe_memfault_count++;
+	if (intr & BIT_ULL(1))
+		nix_event_count->rx_mirror_pktw_memfault_count++;
+	if (intr & BIT_ULL(0))
+		nix_event_count->rx_mcast_pktw_memfault_count++;
+
+	/* Clear interrupts */
+	rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT, intr);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t rvu_nix_af_ras_intr_handler(int irq, void *rvu_irq)
+{
+	struct rvu_nix_event_cnt *nix_event_count;
+	struct rvu_devlink *rvu_dl = rvu_irq;
+	struct rvu *rvu;
+	int blkaddr;
+	u64 intr;
+
+	rvu = rvu_dl->rvu;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0);
+	if (blkaddr < 0)
+		return IRQ_NONE;
+
+	nix_event_count = rvu_dl->nix_event_cnt;
+	intr = rvu_read64(rvu, blkaddr, NIX_AF_RAS);
+
+	if (intr & BIT_ULL(34))
+		nix_event_count->poison_aq_inst_count++;
+	if (intr & BIT_ULL(33))
+		nix_event_count->poison_aq_res_count++;
+	if (intr & BIT_ULL(3
[PATCH v2 net-next 1/3] octeontx2-af: Add devlink suppoort to af driver
Add devlink support to AF driver. Basic devlink support is added.
Currently info_get is the only supported devlink ops.

devlink output looks like this
 # devlink dev
 pci/0002:01:00.0
 # devlink dev info
 pci/0002:01:00.0:
  driver octeontx2-af
  versions:
      fixed:
        mbox version: 9

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../net/ethernet/marvell/octeontx2/Kconfig    |  1 +
 .../ethernet/marvell/octeontx2/af/Makefile    |  3 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |  9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  4 ++
 .../marvell/octeontx2/af/rvu_devlink.c        | 72 +++
 .../marvell/octeontx2/af/rvu_devlink.h        | 20 ++
 6 files changed, 107 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 543a1d047567..16caa02095fe 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -9,6 +9,7 @@ config OCTEONTX2_MBOX
 config OCTEONTX2_AF
 	tristate "Marvell OcteonTX2 RVU Admin Function driver"
 	select OCTEONTX2_MBOX
+	select NET_DEVLINK
 	depends on (64BIT && COMPILE_TEST) || ARM64
 	depends on PCI
 	help
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
index 2f7a861d0c7b..20135f1d3387 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile
@@ -9,4 +9,5 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o

 octeontx2_mbox-y := mbox.o rvu_trace.o
 octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \
-		  rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o
+		  rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o \
+		  rvu_devlink.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index f0ce2ec0993b..cfff7d3fb705 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -2816,17 +2816,23 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (err)
 		goto err_flr;

+	err = rvu_register_dl(rvu);
+	if (err)
+		goto err_irq;
+
 	rvu_setup_rvum_blk_revid(rvu);

 	/* Enable AF's VFs (if any) */
 	err = rvu_enable_sriov(rvu);
 	if (err)
-		goto err_irq;
+		goto err_dl;

 	/* Initialize debugfs */
 	rvu_dbg_init(rvu);

 	return 0;
+err_dl:
+	rvu_unregister_dl(rvu);
 err_irq:
 	rvu_unregister_interrupts(rvu);
 err_flr:
@@ -2858,6 +2864,7 @@ static void rvu_remove(struct pci_dev *pdev)

 	rvu_dbg_exit(rvu);
 	rvu_unregister_interrupts(rvu);
+	rvu_unregister_dl(rvu);
 	rvu_flr_wq_destroy(rvu);
 	rvu_cgx_exit(rvu);
 	rvu_fwdata_exit(rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 5ac9bb12415f..282566235918 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -12,7 +12,10 @@
 #define RVU_H

 #include
+#include
+
 #include "rvu_struct.h"
+#include "rvu_devlink.h"
 #include "common.h"
 #include "mbox.h"

@@ -376,6 +379,7 @@ struct rvu {
 #ifdef CONFIG_DEBUG_FS
 	struct rvu_debugfs	rvu_dbg;
 #endif
+	struct rvu_devlink	*rvu_dl;
 };

 static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
new file mode 100644
index ..596bb9c533b5
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell OcteonTx2 RVU Devlink
+ *
+ * Copyright (C) 2020 Marvell International Ltd.
+ *
+ */
+
+#include "rvu.h"
+
+#define DRV_NAME "octeontx2-af"
+
+static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+				struct netlink_ext_ack *extack)
+{
+	char buf[10];
+	int err;
+
+	err = devlink_info_driver_name_put(req, DRV_NAME);
+	if (err)
+		return err;
+
+	sprintf(buf, "%X", OTX2_MBOX_VERSION);
+	return devlink_info_version_fixed_put(req, "mbox version:", buf);
+}
+
+static const struct devlink_ops rvu_devlink_ops = {
+	.info_get = rvu_devlink_info_get,
+};
+
+int rvu_register_dl(struct rvu *rvu)
+{
+	struct r
[PATCH v2 net-next 0/3] Add devlink and devlink health reporters to
Add basic devlink support and devlink health reporters. Health reporters
are added for the NPA and NIX blocks; each reporter reports the error
counts of its block.

This addresses Jakub's comment to add devlink support for error reporting:
https://www.spinics.net/lists/netdev/msg670712.html

Change-log:
- Address Willem's comments on v1.
- Fix the sparse issues reported by Jakub.

George Cherian (3):
  octeontx2-af: Add devlink support to af driver
  octeontx2-af: Add devlink health reporters for NPA
  octeontx2-af: Add devlink health reporters for NIX

 .../net/ethernet/marvell/octeontx2/Kconfig    |   1 +
 .../ethernet/marvell/octeontx2/af/Makefile    |   3 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   4 +
 .../marvell/octeontx2/af/rvu_devlink.c        | 860 ++
 .../marvell/octeontx2/af/rvu_devlink.h        |  67 ++
 .../marvell/octeontx2/af/rvu_struct.h         |  33 +
 7 files changed, 975 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

--
2.25.4
Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA
Hi Willem, > -Original Message- > From: Willem de Bruijn > Sent: Tuesday, November 3, 2020 11:26 PM > To: George Cherian > Cc: Network Development ; linux-kernel ker...@vger.kernel.org>; Jakub Kicinski ; David Miller > ; Sunil Kovvuri Goutham > ; Linu Cherian ; > Geethasowjanya Akula ; masahi...@kernel.org > Subject: Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health > reporters for NPA > > On Tue, Nov 3, 2020 at 12:43 PM George Cherian > wrote: > > > > Hi Willem, > > > > > > > -Original Message- > > > From: Willem de Bruijn > > > Sent: Tuesday, November 3, 2020 7:21 PM > > > To: George Cherian > > > Cc: Network Development ; linux-kernel > > > ; Jakub Kicinski ; > > > David Miller ; Sunil Kovvuri Goutham > > > ; Linu Cherian ; > > > Geethasowjanya Akula ; masahi...@kernel.org > > > Subject: [EXT] Re: [net-next PATCH 2/3] octeontx2-af: Add devlink > > > health reporters for NPA > > > > > > External Email > > > > > > > > > -- > > > > > > static int rvu_devlink_info_get(struct devlink *devlink, > > > > > > struct > > > > > devlink_info_req *req, > > > > > > struct netlink_ext_ack > > > > > > *extack) { @@ > > > > > > -53,7 +483,8 @@ int rvu_register_dl(struct rvu *rvu) > > > > > > rvu_dl->dl = dl; > > > > > > rvu_dl->rvu = rvu; > > > > > > rvu->rvu_dl = rvu_dl; > > > > > > - return 0; > > > > > > + > > > > > > + return rvu_health_reporters_create(rvu); > > > > > > > > > > when would this be called with rvu->rvu_dl == NULL? > > > > > > > > During initialization. > > > > > > This is the only caller, and it is only reached if rvu_dl is non-zero. > > > > Did you mean to ask, where is it de-initialized? > > If so, it should be done in rvu_unregister_dl() after freeing rvu_dl. > > No, I meant that rvu_health_reporters_create does not need an !rvu- > >rvu_dl precondition test, as the only callers calls with with a non-zero > rvu_dl. Yes understood!! Will fix in v2. Thanks, -George
Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA
Hi Willem, > -Original Message- > From: Willem de Bruijn > Sent: Tuesday, November 3, 2020 7:21 PM > To: George Cherian > Cc: Network Development ; linux-kernel ker...@vger.kernel.org>; Jakub Kicinski ; David Miller > ; Sunil Kovvuri Goutham > ; Linu Cherian ; > Geethasowjanya Akula ; masahi...@kernel.org > Subject: [EXT] Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health > reporters for NPA > > > > > static int rvu_devlink_info_get(struct devlink *devlink, struct > > > devlink_info_req *req, > > > > struct netlink_ext_ack *extack) { @@ > > > > -53,7 +483,8 @@ int rvu_register_dl(struct rvu *rvu) > > > > rvu_dl->dl = dl; > > > > rvu_dl->rvu = rvu; > > > > rvu->rvu_dl = rvu_dl; > > > > - return 0; > > > > + > > > > + return rvu_health_reporters_create(rvu); > > > > > > when would this be called with rvu->rvu_dl == NULL? > > > > During initialization. > > This is the only caller, and it is only reached if rvu_dl is non-zero. Yes!!! I got it, will address it in v2. Regards -George
Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA
Hi Willem, > -Original Message- > From: Willem de Bruijn > Sent: Tuesday, November 3, 2020 7:21 PM > To: George Cherian > Cc: Network Development ; linux-kernel ker...@vger.kernel.org>; Jakub Kicinski ; David Miller > ; Sunil Kovvuri Goutham > ; Linu Cherian ; > Geethasowjanya Akula ; masahi...@kernel.org > Subject: [EXT] Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health > reporters for NPA > > External Email > > -- > > > > static int rvu_devlink_info_get(struct devlink *devlink, struct > > > devlink_info_req *req, > > > > struct netlink_ext_ack *extack) { @@ > > > > -53,7 +483,8 @@ int rvu_register_dl(struct rvu *rvu) > > > > rvu_dl->dl = dl; > > > > rvu_dl->rvu = rvu; > > > > rvu->rvu_dl = rvu_dl; > > > > - return 0; > > > > + > > > > + return rvu_health_reporters_create(rvu); > > > > > > when would this be called with rvu->rvu_dl == NULL? > > > > During initialization. > > This is the only caller, and it is only reached if rvu_dl is non-zero. Did you mean to ask, where is it de-initialized? If so, it should be done in rvu_unregister_dl() after freeing rvu_dl. Is that what you meant? Regards, -George
Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA
Hi Willem, Thanks for the review. > -Original Message- > From: Willem de Bruijn > Sent: Monday, November 2, 2020 7:12 PM > To: George Cherian > Cc: Network Development ; linux-kernel ker...@vger.kernel.org>; Jakub Kicinski ; David Miller > ; Sunil Kovvuri Goutham > ; Linu Cherian ; > Geethasowjanya Akula ; masahi...@kernel.org > Subject: Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health > reporters for NPA > > On Mon, Nov 2, 2020 at 12:07 AM George Cherian > wrote: > > > > Add health reporters for RVU NPA block. > > Only reporter dump is supported > > > > Output: > > # devlink health > > pci/0002:01:00.0: > >reporter npa > > state healthy error 0 recover 0 > > # devlink health dump show pci/0002:01:00.0 reporter npa > > NPA_AF_GENERAL: > > Unmap PF Error: 0 > > Free Disabled for NIX0 RX: 0 > > Free Disabled for NIX0 TX: 0 > > Free Disabled for NIX1 RX: 0 > > Free Disabled for NIX1 TX: 0 > > Free Disabled for SSO: 0 > > Free Disabled for TIM: 0 > > Free Disabled for DPI: 0 > > Free Disabled for AURA: 0 > > Alloc Disabled for Resvd: 0 > > NPA_AF_ERR: > > Memory Fault on NPA_AQ_INST_S read: 0 > > Memory Fault on NPA_AQ_RES_S write: 0 > > AQ Doorbell Error: 0 > > Poisoned data on NPA_AQ_INST_S read: 0 > > Poisoned data on NPA_AQ_RES_S write: 0 > > Poisoned data on HW context read: 0 > > NPA_AF_RVU: > > Unmap Slot Error: 0 > > > > Signed-off-by: Sunil Kovvuri Goutham > > Signed-off-by: Jerin Jacob > > Signed-off-by: George Cherian > > > > +static bool rvu_npa_af_request_irq(struct rvu *rvu, int blkaddr, int > > offset, > > + const char *name, irq_handler_t fn) > > +{ > > + struct rvu_devlink *rvu_dl = rvu->rvu_dl; > > + int rc; > > + > > + WARN_ON(rvu->irq_allocated[offset]); > > Please use WARN_ON sparingly for important unrecoverable events. This > seems like a basic precondition. If it can happen at all, can probably catch > in a > normal branch with a netdev_err. The stacktrace in the oops is not likely to > point at the source of the non-zero value, anyway. 
Okay, will fix it in v2. > > > + rvu->irq_allocated[offset] = false; > > Why initialize this here? Are these fields not zeroed on alloc? Is this here > only > to safely call rvu_npa_unregister_interrupts on partial alloc? Then it might > be > simpler to just have jump labels in this function to free the successfully > requested irqs. It shouldn't be initialized like this; it is zeroed on alloc. Will fix in v2. > > > + sprintf(>irq_name[offset * NAME_SIZE], name); > > + rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0, > > +>irq_name[offset * NAME_SIZE], rvu_dl); > > + if (rc) > > + dev_warn(rvu->dev, "Failed to register %s irq\n", name); > > + else > > + rvu->irq_allocated[offset] = true; > > + > > + return rvu->irq_allocated[offset]; } > > > +static int rvu_npa_health_reporters_create(struct rvu_devlink > > +*rvu_dl) { > > + struct devlink_health_reporter *rvu_npa_health_reporter; > > + struct rvu_npa_event_cnt *npa_event_count; > > + struct rvu *rvu = rvu_dl->rvu; > > + > > + npa_event_count = kzalloc(sizeof(*npa_event_count), GFP_KERNEL); > > + if (!npa_event_count) > > + return -ENOMEM; > > + > > + rvu_dl->npa_event_cnt = npa_event_count; > > + rvu_npa_health_reporter = devlink_health_reporter_create(rvu_dl- > >dl, > > + > > _npa_hw_fault_reporter_ops, > > +0, rvu); > > + if (IS_ERR(rvu_npa_health_reporter)) { > > + dev_warn(rvu->dev, "Failed to create npa reporter, err > > =%ld\n", > > +PTR_ERR(rvu_npa_health_reporter)); > > + return PTR_ERR(rvu_npa_health_reporter); > > + } > > + > > + rvu_dl->rvu_npa_health_reporter = rvu_npa_health_reporter; > > + return 0; > > +} > > + > > +static void rvu_npa_health_reporters_destroy(struct rvu_devlink > > +*rvu_dl) { > > + if (!rvu_dl->rvu_npa_health_reporter)
Re: [net-next PATCH 1/3] octeontx2-af: Add devlink support to af driver
Hi Willem, Thanks for the review. > -Original Message- > From: Willem de Bruijn > Sent: Monday, November 2, 2020 7:01 PM > To: George Cherian > Cc: Network Development ; linux-kernel ker...@vger.kernel.org>; Jakub Kicinski ; David Miller > ; Sunil Kovvuri Goutham > ; Linu Cherian ; > Geethasowjanya Akula ; masahi...@kernel.org > Subject: Re: [net-next PATCH 1/3] octeontx2-af: Add devlink suppoort > to af driver > > On Mon, Nov 2, 2020 at 12:07 AM George Cherian > wrote: > > > > Add devlink support to AF driver. Basic devlink support is added. > > Currently info_get is the only supported devlink ops. > > > > devlink ouptput looks like this > > # devlink dev > > pci/0002:01:00.0 > > # devlink dev info > > pci/0002:01:00.0: > > driver octeontx2-af > > versions: > > fixed: > > mbox version: 9 > > > > Signed-off-by: Sunil Kovvuri Goutham > > Signed-off-by: Jerin Jacob > > Signed-off-by: George Cherian > > > diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h > > b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h > > index 5ac9bb12415f..c112b299635d 100644 > > --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h > > +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h > > @@ -12,7 +12,10 @@ > > #define RVU_H > > > > #include > > +#include > > + > > #include "rvu_struct.h" > > +#include "rvu_devlink.h" > > #include "common.h" > > #include "mbox.h" > > > > @@ -372,10 +375,10 @@ struct rvu { > > struct npc_kpu_profile_adapter kpu; > > > > struct ptp *ptp; > > - > > accidentally removed this line? Yes. 
> > > #ifdef CONFIG_DEBUG_FS > > struct rvu_debugfs rvu_dbg; > > #endif > > + struct rvu_devlink *rvu_dl; > > }; > > > > +int rvu_register_dl(struct rvu *rvu) > > +{ > > + struct rvu_devlink *rvu_dl; > > + struct devlink *dl; > > + int err; > > + > > + rvu_dl = kzalloc(sizeof(*rvu_dl), GFP_KERNEL); > > + if (!rvu_dl) > > + return -ENOMEM; > > + > > + dl = devlink_alloc(_devlink_ops, sizeof(struct rvu_devlink)); > > + if (!dl) { > > + dev_warn(rvu->dev, "devlink_alloc failed\n"); > > + return -ENOMEM; > > rvu_dl not freed on error. Thanks for pointing out, will address in v2. > > This happens a couple of times in these patches Will fix it. > > Is the intermediate struct needed, or could you embed the fields directly into > rvu and use container_of to get from devlink to struct rvu? Even if needed, > perhaps easier to embed the struct into rvu rather than a pointer. Currently only 2 hardware blocks are supported NIX and NPA. Error reporting for more HW blocks will be added, that’s the reason for the intermediate struct. > > > + } > > + > > + err = devlink_register(dl, rvu->dev); > > + if (err) { > > + dev_err(rvu->dev, "devlink register failed with error > > %d\n", err); > > + devlink_free(dl); > > + return err; > > + } > > + > > + rvu_dl->dl = dl; > > + rvu_dl->rvu = rvu; > > + rvu->rvu_dl = rvu_dl; > > + return 0; > > +} > > + > > +void rvu_unregister_dl(struct rvu *rvu) { > > + struct rvu_devlink *rvu_dl = rvu->rvu_dl; > > + struct devlink *dl = rvu_dl->dl; > > + > > + if (!dl) > > + return; > > + > > + devlink_unregister(dl); > > + devlink_free(dl); > > here too Yes, will fix in v2. Regards, -George
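The leak Willem points out above (rvu_dl not freed when a later step fails) is usually handled with goto-based unwinding. The following is only a minimal userspace sketch of that pattern, not the v2 patch: the `fake_*` types and the `fail_step` parameter are hypothetical stand-ins for the devlink calls, and `calloc`/`free` stand in for `kzalloc`/`kfree`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct devlink, struct rvu_devlink, struct rvu. */
struct fake_devlink { int registered; };
struct fake_rvu_devlink { struct fake_devlink *dl; };
struct fake_rvu { struct fake_rvu_devlink *rvu_dl; };

/*
 * fail_step selects which step fails, for demonstration only:
 * 0 = nothing fails, 1 = devlink allocation fails, 2 = registration fails.
 */
static int fake_register_dl(struct fake_rvu *rvu, int fail_step)
{
	struct fake_rvu_devlink *rvu_dl;
	struct fake_devlink *dl;
	int err;

	rvu_dl = calloc(1, sizeof(*rvu_dl));
	if (!rvu_dl)
		return -ENOMEM;

	dl = (fail_step == 1) ? NULL : calloc(1, sizeof(*dl));
	if (!dl) {
		err = -ENOMEM;
		goto err_free_rvu_dl;	/* rvu_dl must not leak here */
	}

	if (fail_step == 2) {
		err = -EIO;
		goto err_free_dl;	/* undo both allocations */
	}

	dl->registered = 1;
	rvu_dl->dl = dl;
	rvu->rvu_dl = rvu_dl;
	return 0;

err_free_dl:
	free(dl);
err_free_rvu_dl:
	free(rvu_dl);
	return err;
}

/* Tear down in reverse order and free the wrapper struct as well. */
static void fake_unregister_dl(struct fake_rvu *rvu)
{
	struct fake_rvu_devlink *rvu_dl = rvu->rvu_dl;

	if (!rvu_dl)
		return;

	free(rvu_dl->dl);
	free(rvu_dl);
	rvu->rvu_dl = NULL;
}
```

Each label releases exactly what was acquired before the failing step, so every early exit leaves nothing allocated and `rvu->rvu_dl` untouched.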
[net-next PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA
Add health reporters for RVU NPA block. Only reporter dump is supported Output: # devlink health pci/0002:01:00.0: reporter npa state healthy error 0 recover 0 # devlink health dump show pci/0002:01:00.0 reporter npa NPA_AF_GENERAL: Unmap PF Error: 0 Free Disabled for NIX0 RX: 0 Free Disabled for NIX0 TX: 0 Free Disabled for NIX1 RX: 0 Free Disabled for NIX1 TX: 0 Free Disabled for SSO: 0 Free Disabled for TIM: 0 Free Disabled for DPI: 0 Free Disabled for AURA: 0 Alloc Disabled for Resvd: 0 NPA_AF_ERR: Memory Fault on NPA_AQ_INST_S read: 0 Memory Fault on NPA_AQ_RES_S write: 0 AQ Doorbell Error: 0 Poisoned data on NPA_AQ_INST_S read: 0 Poisoned data on NPA_AQ_RES_S write: 0 Poisoned data on HW context read: 0 NPA_AF_RVU: Unmap Slot Error: 0 Signed-off-by: Sunil Kovvuri Goutham Signed-off-by: Jerin Jacob Signed-off-by: George Cherian --- .../marvell/octeontx2/af/rvu_devlink.c| 434 +- .../marvell/octeontx2/af/rvu_devlink.h| 23 + .../marvell/octeontx2/af/rvu_struct.h | 23 + 3 files changed, 479 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c index c9f5f66e6701..946e751fb544 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c @@ -5,10 +5,440 @@ * */ +#include + #include "rvu.h" +#include "rvu_reg.h" +#include "rvu_struct.h" #define DRV_NAME "octeontx2-af" +void rvu_npa_unregister_interrupts(struct rvu *rvu); + +int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name) +{ + int err; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + return devlink_fmsg_obj_nest_start(fmsg); +} + +int rvu_report_pair_end(struct devlink_fmsg *fmsg) +{ + int err; + + err = devlink_fmsg_obj_nest_end(fmsg); + if (err) + return err; + + return devlink_fmsg_pair_nest_end(fmsg); +} + +static irqreturn_t rvu_npa_af_rvu_intr_handler(int irq, void *rvu_irq) +{ + struct 
rvu_npa_event_cnt *npa_event_count; + struct rvu_devlink *rvu_dl = rvu_irq; + struct rvu *rvu; + int blkaddr; + u64 intr; + + rvu = rvu_dl->rvu; + blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0); + if (blkaddr < 0) + return IRQ_NONE; + + npa_event_count = rvu_dl->npa_event_cnt; + intr = rvu_read64(rvu, blkaddr, NPA_AF_RVU_INT); + + if (intr & BIT_ULL(0)) + npa_event_count->unmap_slot_count++; + /* Clear interrupts */ + rvu_write64(rvu, blkaddr, NPA_AF_RVU_INT, intr); + return IRQ_HANDLED; +} + +static int rvu_npa_inpq_to_cnt(u16 in, + struct rvu_npa_event_cnt *npa_event_count) +{ + switch (in) { + case 0: + return 0; + case BIT(NPA_INPQ_NIX0_RX): + return npa_event_count->free_dis_nix0_rx_count++; + case BIT(NPA_INPQ_NIX0_TX): + return npa_event_count->free_dis_nix0_tx_count++; + case BIT(NPA_INPQ_NIX1_RX): + return npa_event_count->free_dis_nix1_rx_count++; + case BIT(NPA_INPQ_NIX1_TX): + return npa_event_count->free_dis_nix1_tx_count++; + case BIT(NPA_INPQ_SSO): + return npa_event_count->free_dis_sso_count++; + case BIT(NPA_INPQ_TIM): + return npa_event_count->free_dis_tim_count++; + case BIT(NPA_INPQ_DPI): + return npa_event_count->free_dis_dpi_count++; + case BIT(NPA_INPQ_AURA_OP): + return npa_event_count->free_dis_aura_count++; + case BIT(NPA_INPQ_INTERNAL_RSV): + return npa_event_count->free_dis_rsvd_count++; + } + + return npa_event_count->alloc_dis_rsvd_count++; +} + +static irqreturn_t rvu_npa_af_gen_intr_handler(int irq, void *rvu_irq) +{ + struct rvu_npa_event_cnt *npa_event_count; + struct rvu_devlink *rvu_dl = rvu_irq; + struct rvu *rvu; + int blkaddr, val; + u64 intr; + + rvu = rvu_dl->rvu; + blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0); + if (blkaddr < 0) + return IRQ_NONE; + + npa_event_count = rvu_dl->npa_event_cnt; + intr = rvu_read64(rvu, blkaddr, NPA_AF_GEN_INT); + + if (intr & BIT_ULL(32)) + npa_event_count->unmap_pf_count++; + + val = FIELD_GET(GENMASK(31, 16), intr); + rvu_npa_inpq_to_cnt(val, npa_event_count); + + val = 
FIELD_GET(GENMASK(15, 0), intr); + rvu_npa_inpq_to_cnt(val, npa_event_count); + + /* Clear interrupts */ + rvu_write64(rvu, blkaddr, N
[net-next PATCH 1/3] octeontx2-af: Add devlink support to af driver
Add devlink support to AF driver. Basic devlink support is added. Currently info_get is the only supported devlink ops. devlink ouptput looks like this # devlink dev pci/0002:01:00.0 # devlink dev info pci/0002:01:00.0: driver octeontx2-af versions: fixed: mbox version: 9 Signed-off-by: Sunil Kovvuri Goutham Signed-off-by: Jerin Jacob Signed-off-by: George Cherian --- .../net/ethernet/marvell/octeontx2/Kconfig| 1 + .../ethernet/marvell/octeontx2/af/Makefile| 3 +- .../net/ethernet/marvell/octeontx2/af/rvu.c | 9 ++- .../net/ethernet/marvell/octeontx2/af/rvu.h | 5 +- .../marvell/octeontx2/af/rvu_devlink.c| 69 +++ .../marvell/octeontx2/af/rvu_devlink.h| 20 ++ 6 files changed, 104 insertions(+), 3 deletions(-) create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig index 543a1d047567..16caa02095fe 100644 --- a/drivers/net/ethernet/marvell/octeontx2/Kconfig +++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig @@ -9,6 +9,7 @@ config OCTEONTX2_MBOX config OCTEONTX2_AF tristate "Marvell OcteonTX2 RVU Admin Function driver" select OCTEONTX2_MBOX + select NET_DEVLINK depends on (64BIT && COMPILE_TEST) || ARM64 depends on PCI help diff --git a/drivers/net/ethernet/marvell/octeontx2/af/Makefile b/drivers/net/ethernet/marvell/octeontx2/af/Makefile index 2f7a861d0c7b..20135f1d3387 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/Makefile +++ b/drivers/net/ethernet/marvell/octeontx2/af/Makefile @@ -9,4 +9,5 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o octeontx2_mbox-y := mbox.o rvu_trace.o octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \ - rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o + rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o \ + rvu_devlink.o diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c index 
a28a518c0eae..58c48fa7aa72 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c @@ -2812,10 +2812,14 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (err) goto err_mbox; - err = rvu_register_interrupts(rvu); + err = rvu_register_dl(rvu); if (err) goto err_flr; + err = rvu_register_interrupts(rvu); + if (err) + goto err_dl; + rvu_setup_rvum_blk_revid(rvu); /* Enable AF's VFs (if any) */ @@ -2829,6 +2833,8 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id) return 0; err_irq: rvu_unregister_interrupts(rvu); +err_dl: + rvu_unregister_dl(rvu); err_flr: rvu_flr_wq_destroy(rvu); err_mbox: @@ -2858,6 +2864,7 @@ static void rvu_remove(struct pci_dev *pdev) rvu_dbg_exit(rvu); rvu_unregister_interrupts(rvu); + rvu_unregister_dl(rvu); rvu_flr_wq_destroy(rvu); rvu_cgx_exit(rvu); rvu_fwdata_exit(rvu); diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h index 5ac9bb12415f..c112b299635d 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h @@ -12,7 +12,10 @@ #define RVU_H #include +#include + #include "rvu_struct.h" +#include "rvu_devlink.h" #include "common.h" #include "mbox.h" @@ -372,10 +375,10 @@ struct rvu { struct npc_kpu_profile_adapter kpu; struct ptp *ptp; - #ifdef CONFIG_DEBUG_FS struct rvu_debugfs rvu_dbg; #endif + struct rvu_devlink *rvu_dl; }; static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c new file mode 100644 index ..c9f5f66e6701 --- /dev/null +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c @@ -0,0 +1,69 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Marvell OcteonTx2 RVU Devlink + * + * Copyright (C) 2020 Marvell International Ltd. 
+ * + */ + +#include "rvu.h" + +#define DRV_NAME "octeontx2-af" + +static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req, + struct netlink_ext_ack *extack) +{ + char buf[10]; + int err; + + err = devlink_info_driver_name_put(req, DRV_NAME); + if (err) + return err; + + sprintf(buf, "%X", OTX2_MBOX_VERSION); + return devlink_info_version_fixed_put(req, "mbox
[net-next PATCH 0/3] Add devlink and devlink health reporters to
Add basic devlink support and devlink health reporters. Health reporters
are added for the NPA and NIX blocks; each reporter reports the error
counts of its block.

This addresses Jakub's comment to add devlink support for error reporting:
https://www.spinics.net/lists/netdev/msg670712.html

George Cherian (3):
  octeontx2-af: Add devlink support to af driver
  octeontx2-af: Add devlink health reporters for NPA
  octeontx2-af: Add devlink health reporters for NIX

 .../net/ethernet/marvell/octeontx2/Kconfig    |   1 +
 .../ethernet/marvell/octeontx2/af/Makefile    |   3 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   5 +-
 .../marvell/octeontx2/af/rvu_devlink.c        | 875 ++
 .../marvell/octeontx2/af/rvu_devlink.h        |  67 ++
 .../marvell/octeontx2/af/rvu_struct.h         |  33 +
 7 files changed, 990 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

--
2.25.1
[net-next PATCH 3/3] octeontx2-af: Add devlink health reporters for NIX
Add health reporters for RVU NIX block.
Only reporter dump is supported.

Output:
 # ./devlink health
 pci/0002:01:00.0:
   reporter npa
     state healthy error 0 recover 0
   reporter nix
     state healthy error 0 recover 0
 # ./devlink health dump show pci/0002:01:00.0 reporter nix
  NIX_AF_GENERAL:
     Memory Fault on NIX_AQ_INST_S read: 0
     Memory Fault on NIX_AQ_RES_S write: 0
     AQ Doorbell error: 0
     Rx on unmapped PF_FUNC: 0
     Rx multicast replication error: 0
     Memory fault on NIX_RX_MCE_S read: 0
     Memory fault on multicast WQE read: 0
     Memory fault on mirror WQE read: 0
     Memory fault on mirror pkt write: 0
     Memory fault on multicast pkt write: 0
   NIX_AF_RAS:
     Poisoned data on NIX_AQ_INST_S read: 0
     Poisoned data on NIX_AQ_RES_S write: 0
     Poisoned data on HW context read: 0
     Poisoned data on packet read from mirror buffer: 0
     Poisoned data on packet read from mcast buffer: 0
     Poisoned data on WQE read from mirror buffer: 0
     Poisoned data on WQE read from multicast buffer: 0
     Poisoned data on NIX_RX_MCE_S read: 0
   NIX_AF_RVU:
     Unmap Slot Error: 0

Signed-off-by: Sunil Kovvuri Goutham
Signed-off-by: Jerin Jacob
Signed-off-by: George Cherian
---
 .../marvell/octeontx2/af/rvu_devlink.c        | 376 +-
 .../marvell/octeontx2/af/rvu_devlink.h        |  24 ++
 .../marvell/octeontx2/af/rvu_struct.h         |  10 +
 3 files changed, 409 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 946e751fb544..c2dd2026c7da 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -14,6 +14,7 @@
 #define DRV_NAME "octeontx2-af"
 
 void rvu_npa_unregister_interrupts(struct rvu *rvu);
+void rvu_nix_unregister_interrupts(struct rvu *rvu);
 
 int rvu_report_pair_start(struct devlink_fmsg *fmsg, const char *name)
 {
@@ -37,6 +38,373 @@ int rvu_report_pair_end(struct devlink_fmsg *fmsg)
 	return devlink_fmsg_pair_nest_end(fmsg);
 }
 
+irqreturn_t rvu_nix_af_rvu_intr_handler(int irq, void
*rvu_irq) +{ + struct rvu_nix_event_cnt *nix_event_count; + struct rvu_devlink *rvu_dl = rvu_irq; + struct rvu *rvu; + int blkaddr; + u64 intr; + + rvu = rvu_dl->rvu; + blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0); + if (blkaddr < 0) + return IRQ_NONE; + + nix_event_count = rvu_dl->nix_event_cnt; + intr = rvu_read64(rvu, blkaddr, NIX_AF_RVU_INT); + + if (intr & BIT_ULL(0)) + nix_event_count->unmap_slot_count++; + + /* Clear interrupts */ + rvu_write64(rvu, blkaddr, NIX_AF_RVU_INT, intr); + return IRQ_HANDLED; +} + +irqreturn_t rvu_nix_af_err_intr_handler(int irq, void *rvu_irq) +{ + struct rvu_nix_event_cnt *nix_event_count; + struct rvu_devlink *rvu_dl = rvu_irq; + struct rvu *rvu; + int blkaddr; + u64 intr; + + rvu = rvu_dl->rvu; + blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0); + if (blkaddr < 0) + return IRQ_NONE; + + nix_event_count = rvu_dl->nix_event_cnt; + intr = rvu_read64(rvu, blkaddr, NIX_AF_ERR_INT); + + if (intr & BIT_ULL(14)) + nix_event_count->aq_inst_count++; + if (intr & BIT_ULL(13)) + nix_event_count->aq_res_count++; + if (intr & BIT_ULL(12)) + nix_event_count->aq_db_count++; + if (intr & BIT_ULL(6)) + nix_event_count->rx_on_unmap_pf_count++; + if (intr & BIT_ULL(5)) + nix_event_count->rx_mcast_repl_count++; + if (intr & BIT_ULL(4)) + nix_event_count->rx_mcast_memfault_count++; + if (intr & BIT_ULL(3)) + nix_event_count->rx_mcast_wqe_memfault_count++; + if (intr & BIT_ULL(2)) + nix_event_count->rx_mirror_wqe_memfault_count++; + if (intr & BIT_ULL(1)) + nix_event_count->rx_mirror_pktw_memfault_count++; + if (intr & BIT_ULL(0)) + nix_event_count->rx_mcast_pktw_memfault_count++; + + /* Clear interrupts */ + rvu_write64(rvu, blkaddr, NIX_AF_ERR_INT, intr); + return IRQ_HANDLED; +} + +irqreturn_t rvu_nix_af_ras_intr_handler(int irq, void *rvu_irq) +{ + struct rvu_nix_event_cnt *nix_event_count; + struct rvu_devlink *rvu_dl = rvu_irq; + struct rvu *rvu; + int blkaddr; + u64 intr; + + rvu = rvu_dl->rvu; + blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, 0); 
+ if (blkaddr < 0) + return IRQ_NONE; + + nix_event_count = rvu_dl->nix_event_cnt; + intr = rvu_read64(rvu, blkaddr, NIX_AF_RAS); + +
[net-next PATCH 2/2] octeontx2-pf: Support to change VLAN based RSS hash options via ethtool
Add support to control rx-flow-hash based on VLAN.
By default, VLAN plus 4-tuple based hashing is enabled.
The setting can be changed at runtime using ethtool.

To enable 2-tuple plus VLAN based flow distribution:
  # ethtool -N rx-flow-hash sdv
To enable 4-tuple plus VLAN based flow distribution:
  # ethtool -N rx-flow-hash sdfnv

Signed-off-by: George Cherian
Signed-off-by: Sunil Goutham
---
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c  | 2 +-
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 820fc660de66..d2581090f9a4 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -355,7 +355,7 @@ int otx2_rss_init(struct otx2_nic *pfvf)
 	rss->flowkey_cfg = rss->enable ? rss->flowkey_cfg :
 			   NIX_FLOW_KEY_TYPE_IPV4 | NIX_FLOW_KEY_TYPE_IPV6 |
 			   NIX_FLOW_KEY_TYPE_TCP | NIX_FLOW_KEY_TYPE_UDP |
-			   NIX_FLOW_KEY_TYPE_SCTP;
+			   NIX_FLOW_KEY_TYPE_SCTP | NIX_FLOW_KEY_TYPE_VLAN;
 
 	ret = otx2_set_flowkey_cfg(pfvf);
 	if (ret)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
index 0341d9694e8b..662fb80dbb9d 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
@@ -428,6 +428,8 @@ static int otx2_get_rss_hash_opts(struct otx2_nic *pfvf,
 
 	/* Mimimum is IPv4 and IPv6, SIP/DIP */
 	nfc->data = RXH_IP_SRC | RXH_IP_DST;
+	if (rss->flowkey_cfg & NIX_FLOW_KEY_TYPE_VLAN)
+		nfc->data |= RXH_VLAN;
 
 	switch (nfc->flow_type) {
 	case TCP_V4_FLOW:
@@ -477,6 +479,11 @@ static int otx2_set_rss_hash_opts(struct otx2_nic *pfvf,
 	if (!(nfc->data & RXH_IP_SRC) || !(nfc->data & RXH_IP_DST))
 		return -EINVAL;
 
+	if (nfc->data & RXH_VLAN)
+		rss_cfg |= NIX_FLOW_KEY_TYPE_VLAN;
+	else
+		rss_cfg &= ~NIX_FLOW_KEY_TYPE_VLAN;
+
 	switch (nfc->flow_type) {
 	case TCP_V4_FLOW:
 	case TCP_V6_FLOW:
--
2.25.1
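The get/set logic in the patch above boils down to mirroring one request bit into one flow-key bit and back. Below is a minimal standalone sketch of that mapping. The `DEMO_*` constants and `demo_*` helpers are hypothetical; only the BIT(20) position is taken from the AF patch of this series, and the request-flag value is arbitrary rather than the real RXH_VLAN.

```c
#include <stdint.h>

/* Hypothetical request flag; the real RXH_VLAN lives in linux/ethtool.h. */
#define DEMO_RXH_VLAN            (1u << 0)
/* Same bit position as NIX_FLOW_KEY_TYPE_VLAN, BIT(20), in the AF patch. */
#define DEMO_FLOW_KEY_TYPE_VLAN  (1u << 20)

/* "set" direction: the request decides whether the VLAN key bit is kept. */
static uint32_t demo_apply_vlan_opt(uint32_t rss_cfg, uint32_t req)
{
	if (req & DEMO_RXH_VLAN)
		rss_cfg |= DEMO_FLOW_KEY_TYPE_VLAN;
	else
		rss_cfg &= ~DEMO_FLOW_KEY_TYPE_VLAN;

	return rss_cfg;
}

/* "get" direction: report the VLAN flag only if the key bit is set. */
static uint32_t demo_report_vlan_opt(uint32_t rss_cfg, uint32_t data)
{
	if (rss_cfg & DEMO_FLOW_KEY_TYPE_VLAN)
		data |= DEMO_RXH_VLAN;

	return data;
}
```

Note that the set direction clears the bit when the flag is absent, so a request without the VLAN option turns VLAN hashing off rather than leaving the previous state.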
[net-next PATCH 1/2] octeontx2-af: Add support for VLAN based RSS hashing
Added support for PF/VF drivers to choose RSS flow key algorithm
with VLAN tag included in hashing input data. Only CTAG is considered.

Signed-off-by: George Cherian
Signed-off-by: Sunil Goutham
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h    | 1 +
 drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c | 8
 2 files changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 4aaef0a2b51c..aa3bda3f34be 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -625,6 +625,7 @@ struct nix_rss_flowkey_cfg {
 #define NIX_FLOW_KEY_TYPE_INNR_UDP BIT(15)
 #define NIX_FLOW_KEY_TYPE_INNR_SCTP BIT(16)
 #define NIX_FLOW_KEY_TYPE_INNR_ETH_DMAC BIT(17)
+#define NIX_FLOW_KEY_TYPE_VLAN BIT(20)
 	u32 flowkey_cfg; /* Flowkey types selected */
 	u8 group; /* RSS context or group */
 };
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index 08181fc5f5d4..4bdc4baa3c59 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -2509,6 +2509,14 @@ static int set_flowkey_fields(struct nix_rx_flowkey_alg *alg, u32 flow_cfg)
 			field->ltype_match = NPC_LT_LE_GTPU;
 			field->ltype_mask = 0xF;
 			break;
+		case NIX_FLOW_KEY_TYPE_VLAN:
+			field->lid = NPC_LID_LB;
+			field->hdr_offset = 2; /* Skip TPID (2-bytes) */
+			field->bytesm1 = 1; /* 2 Bytes (Actually 12 bits) */
+			field->ltype_match = NPC_LT_LB_CTAG;
+			field->ltype_mask = 0xF;
+			field->fn_mask = 1; /* Mask out the first nibble */
+			break;
 		}
 		field->ena = 1;
--
2.25.1
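The field settings in the hunk above (skip the 2-byte TPID, take 2 bytes, mask out the first nibble) amount to selecting the 12-bit VLAN ID from the tag. The following is only a software illustration of that selection, assuming a standard 4-byte 802.1Q tag layout; `demo_vlan_hash_input` is a hypothetical helper and says nothing about how the NIX hardware performs the extraction internally.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * tag points at a 4-byte 802.1Q tag: TPID (2 bytes) followed by
 * TCI (2 bytes: PCP 3 bits, DEI 1 bit, VLAN ID 12 bits).
 */
static uint16_t demo_vlan_hash_input(const uint8_t *tag)
{
	const size_t hdr_offset = 2;	/* skip the TPID, as in the patch */
	uint16_t tci;

	/* Two bytes in network (big-endian) order, i.e. bytesm1 = 1. */
	tci = (uint16_t)((tag[hdr_offset] << 8) | tag[hdr_offset + 1]);

	/* Drop the first nibble (PCP + DEI); only the VLAN ID remains. */
	return tci & 0x0FFF;
}
```

With this selection, two frames that differ only in priority bits contribute the same VLAN value to the hash input.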
[net-next PATCH 0/2] Add support for VLAN based flow distribution
This series adds support for VLAN based flow distribution to the
octeontx2 netdev driver, along with support for configuring it via
ethtool.

The following tests have been done:
- Multi VLAN flow with same SD
- Multi VLAN flow with same SDFN
- Single VLAN flow with multi SD
- Single VLAN flow with multi SDFN
All tests were done for UDP and TCP, over both IPv4 and IPv6.

George Cherian (2):
  octeontx2-af: Add support for VLAN based RSS hashing
  octeontx2-pf: Support to change VLAN based RSS hash options via
    ethtool

 drivers/net/ethernet/marvell/octeontx2/af/mbox.h       |  1 +
 drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c    | 10 +-
 .../net/ethernet/marvell/octeontx2/nic/otx2_common.c   |  2 +-
 .../net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c  |  7 +++
 4 files changed, 18 insertions(+), 2 deletions(-)

--
2.25.1
Re: [PATCH v2 3/3] asm-generic/io.h: Fix !CONFIG_GENERIC_IOMAP pci_iounmap() implementation
> -----Original Message-----
> From: Lorenzo Pieralisi
> Sent: Thursday, September 17, 2020 3:00 PM
> To: Catalin Marinas
> Cc: linux-kernel@vger.kernel.org; George Cherian; Arnd Bergmann;
> Will Deacon; Bjorn Helgaas; Yang Yingliang; linux-...@vger.kernel.org;
> linux-a...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> David S. Miller
> Subject: Re: [PATCH v2 3/3] asm-generic/io.h: Fix
> !CONFIG_GENERIC_IOMAP pci_iounmap() implementation
>
> On Wed, Sep 16, 2020 at 03:51:11PM +0100, Catalin Marinas wrote:
> > On Wed, Sep 16, 2020 at 12:06:58PM +0100, Lorenzo Pieralisi wrote:
> > > For arches that do not select CONFIG_GENERIC_IOMAP, the current
> > > pci_iounmap() function does nothing causing obvious memory leaks for
> > > mapped regions that are backed by MMIO physical space.
> > >
> > > In order to detect if a mapped pointer is IO vs MMIO, a check must
> > > be made available to the pci_iounmap() function so that it can actually
> > > detect whether the pointer has to be unmapped.
> > >
> > > In configurations where CONFIG_HAS_IOPORT_MAP &&
> > > !CONFIG_GENERIC_IOMAP, a mapped port is detected using an
> > > ioport_map() stub defined in asm-generic/io.h.
> > >
> > > Use the same logic to implement a stub (ie __pci_ioport_unmap())
> > > that detects if the passed in pointer in pci_iounmap() is IO vs MMIO
> > > to iounmap conditionally and call it in pci_iounmap() fixing the issue.
> > >
> > > Leave __pci_ioport_unmap() as a NOP for all other config options.
> > >
> > > Reported-by: George Cherian
> > > Link: https://lore.kernel.org/lkml/20200905024811.74701-1-yangyingliang@huawei.com
> > > Link: https://lore.kernel.org/lkml/20200824132046.3114383-1-george.cherian@marvell.com
> > > Signed-off-by: Lorenzo Pieralisi
> > > Cc: Arnd Bergmann
> > > Cc: George Cherian
> > > Cc: Will Deacon
> > > Cc: Bjorn Helgaas
> > > Cc: Catalin Marinas
> > > Cc: Yang Yingliang
> > > ---
> > >  include/asm-generic/io.h | 39 +++
> > >  1 file changed, 27 insertions(+), 12 deletions(-)
> >
> > This works for me. The only question I have is whether pci_iomap.h is
> > better than io.h for __pci_ioport_unmap(). These headers are really
> > confusing.
>
> Yes they are, in total honesty there is much more to do to make them sane,
> this patch is just a band-aid.
>
> I thought about moving this stuff into pci_iomap.h, though that file is
> included _independently_ from io.h from some arches so I tried to keep
> everything in io.h to minimize disruption.
>
> We can merge this patch - since it is a fix after all - and then I can try to
> improve the whole pci_iounmap() includes.
>
> > Either way:
> >
> > Reviewed-by: Catalin Marinas
>
> Thanks a lot. I'd appreciate a tested-by from George as he is the one who
> reported the problem.

Verified this patch and it works as expected.

Tested-by: George Cherian

> Lorenzo
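The IO vs MMIO check described in the quoted patch can be modelled in
userspace roughly as follows. This is only a sketch under stated assumptions:
MODEL_IO_BASE, MODEL_IO_SPACE_LIMIT and every model_* helper are hypothetical
stand-ins invented for illustration, not the kernel's definitions, and the
real __pci_ioport_unmap() may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the window that holds I/O port cookies. */
#define MODEL_IO_BASE        ((uintptr_t)0x10000000u)
#define MODEL_IO_SPACE_LIMIT ((uintptr_t)0xffffu)

static int model_iounmap_calls; /* counts MMIO unmaps actually performed */

/* True when the cookie lies inside the (modelled) I/O port window. */
static bool model_is_ioport(uintptr_t cookie)
{
	return cookie >= MODEL_IO_BASE &&
	       cookie <= MODEL_IO_BASE + MODEL_IO_SPACE_LIMIT;
}

/* Stand-in for iounmap(): only records that it was called. */
static void model_iounmap(uintptr_t cookie)
{
	(void)cookie;
	model_iounmap_calls++;
}

/* Model of pci_iounmap(): skip port cookies, unmap everything else. */
static void model_pci_iounmap(uintptr_t cookie)
{
	if (model_is_ioport(cookie))
		return;
	model_iounmap(cookie);
}
```

In this model a cookie inside the port window is left alone, while any other
cookie is treated as MMIO and handed to the unmap routine.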
Re: [PATCH] arm64: PCI: fix memleak when calling pci_iomap/unmap()
> -----Original Message-----
> From: Catalin Marinas
> Sent: Monday, September 7, 2020 4:16 PM
> To: Yang Yingliang
> Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; will.dea...@arm.com;
> bhelg...@google.com; George Cherian; guohan...@huawei.com
> Subject: Re: [PATCH] arm64: PCI: fix memleak when calling
> pci_iomap/unmap()
>
> On Sat, Sep 05, 2020 at 10:48:11AM +0800, Yang Yingliang wrote:
> > diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
> > index 1006ed2d7c604..ddfa1c53def48 100644
> > --- a/arch/arm64/kernel/pci.c
> > +++ b/arch/arm64/kernel/pci.c
> > @@ -217,4 +217,9 @@ void pcibios_remove_bus(struct pci_bus *bus)
> >  	acpi_pci_remove_bus(bus);
> >  }
> >
> > +void pci_iounmap(struct pci_dev *dev, void __iomem *addr) {
> > +	iounmap(addr);
> > +}
> > +EXPORT_SYMBOL(pci_iounmap);
>
> So, what's wrong with the generic pci_iounmap() implementation?
> Shouldn't it call iounmap() already?

Since arm64 selects CONFIG_GENERIC_PCI_IOMAP and not CONFIG_GENERIC_IOMAP,
pci_iounmap() is reduced to an empty function. As a result, neither the
managed release variants nor explicit pci_iounmap() calls actually remove
the mappings, which leads to a leak.

-George

https://lkml.org/lkml/2020/8/20/28

> --
> Catalin
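To illustrate why an empty pci_iounmap() eventually exhausts the mapping
space under repeated bind/unbind, here is a small userspace model. It is a
sketch, not kernel code: the slot budget MODEL_VMALLOC_SLOTS and all model_*
functions are hypothetical names chosen for this example.

```c
#include <assert.h>

/* Hypothetical budget standing in for the vmalloc/ioremap address space. */
#define MODEL_VMALLOC_SLOTS 8

static int model_slots_in_use;

/* Stand-in for pci_iomap(): takes one slot, fails when none are left. */
static int model_map(void)
{
	if (model_slots_in_use == MODEL_VMALLOC_SLOTS)
		return -1; /* models "out of vmalloc space" */
	model_slots_in_use++;
	return 0;
}

/* Empty unmap: what the no-op inline stub effectively does. */
static void model_unmap_stub(void)
{
}

/* Unmap that really releases the slot, as iounmap() would. */
static void model_unmap_real(void)
{
	assert(model_slots_in_use > 0);
	model_slots_in_use--;
}

/* Run bind/unbind cycles and return how many binds succeeded. */
static int model_bind_unbind(int cycles, void (*unmap)(void))
{
	int ok = 0;

	model_slots_in_use = 0;
	for (int i = 0; i < cycles; i++) {
		if (model_map() != 0)
			break;
		unmap();
		ok++;
	}
	return ok;
}
```

With the stub, the model stops binding once the budget is used up; with the
real unmap, every cycle succeeds.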
Re: Re: [PATCH v3] PCI: Add pci_iounmap
Hi Yang,

> -----Original Message-----
> From: Yang Yingliang
> Sent: Tuesday, September 1, 2020 6:59 PM
> To: George Cherian; linux-kernel@vger.kernel.org;
> linux-a...@vger.kernel.org; linux-...@vger.kernel.org
> Cc: kbuild-...@lists.01.org; bhelg...@google.com; a...@arndb.de;
> m...@redhat.com
> Subject: Re: [PATCH v3] PCI: Add pci_iounmap
>
> On 2020/8/25 9:25, kernel test robot wrote:
> > Hi George,
> >
> > I love your patch! Yet something to improve:
> >
> > [auto build test ERROR on pci/next]
> > [also build test ERROR on linux/master linus/master asm-generic/master
> > v5.9-rc2 next-20200824]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch ]
> >
> > url:    https://github.com/0day-ci/linux/commits/George-Cherian/PCI-Add-pci_iounmap/20200824-212149
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
> > config: powerpc-allyesconfig (attached as .config)
> > compiler: powerpc64-linux-gcc (GCC) 9.3.0
> > reproduce (this is a W=1 build):
> >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> >         chmod +x ~/bin/make.cross
> >         # save the attached .config to linux build tree
> >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc
> >
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot
> >
> > All errors (new ones prefixed by >>):
> >
> >    powerpc64-linux-ld: lib/pci_iomap.o: in function `__crc_pci_iounmap':
> >>> (.rodata+0x10): multiple definition of `__crc_pci_iounmap';
> >>> lib/iomap.o:(.rodata+0x68): first defined here
>
> EXPORT_SYMBOL(pci_iounmap) in lib/iomap.c need be removed.

I really don't think that is the way to fix this.
I have also seen your other patch, in which iomap is moved out of
lib/iomap.c into a header file. There was a reason for moving iomap and its
variants into a lib: most architectures' implementations of map were
similar, whereas unmap had multiple per-architecture implementations.
So lib/iomap never implemented the generic unmap.

I see either of the following solutions:
a. Have an arm64 specific implementation for the unmap function.
Or
b. Something along the lines of v2[1], which accommodates all the
   architectures but has the #ifdef about which Bjorn raised his concerns.

Bjorn, any comments?

Regards
-George

[1] - https://lkml.org/lkml/2020/8/20/28
[PATCH v3] PCI: Add pci_iounmap
If an architecture selects CONFIG_GENERIC_PCI_IOMAP but not
CONFIG_GENERIC_IOMAP, pci_iounmap() is reduced to an empty function.
As a result, neither the managed release variants nor explicit
pci_iounmap() calls actually remove the mappings.

This issue is seen on an arm64 based system. arm64 by default selects
only CONFIG_GENERIC_PCI_IOMAP and not CONFIG_GENERIC_IOMAP since
commit cb61f6769b88 ("ARM64: use GENERIC_PCI_IOMAP").

Also, commit 66eab4df288a ("lib: add GENERIC_PCI_IOMAP") moved only the
iomap functions to lib/pci_iomap.c. pci_iounmap() was left in
lib/iomap.c because different architectures have their own pci_iounmap()
implementations. For architectures that do not implement pci_iounmap(),
this leads to a potential leak. So provide a generic pci_iounmap()
function in lib/pci_iomap.c.

A simple bind/unbind test of any PCI driver using pcim_iomap()/pci_iomap()
leads to the following error message after long-running tests:

  "allocation failed: out of vmalloc space - use vmalloc=<size> to increase size."

Signed-off-by: George Cherian
---
* Changes from v2
	- Get rid of the #ifdefs around pci_iounmap()

* Changes from v1
	- Fix the 0-day compilation error.
	- Mark the lib/iomap pci_iounmap call as weak in case any
	  architecture has its own implementation.

include/asm-generic/io.h| 4 include/asm-generic/iomap.h | 1 - include/asm-generic/pci_iomap.h | 1 + lib/pci_iomap.c | 6 ++ 4 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index dabf8cb7203b..5986b37226b7 100644 --- a/include/asm-generic/io.h +++ b/include/asm-generic/io.h @@ -915,12 +915,16 @@ static inline void iowrite64_rep(volatile void __iomem *addr, struct pci_dev; extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max); +#ifdef CONFIG_GENERIC_PCI_IOMAP +extern void pci_iounmap(struct pci_dev *dev, void __iomem *p); +#else #ifndef pci_iounmap #define pci_iounmap pci_iounmap static inline void pci_iounmap(struct pci_dev *dev, void __iomem *p) { } #endif +#endif /* CONFIG_GENERIC_PCI_IOMAP */ #endif /* CONFIG_GENERIC_IOMAP */ /* diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h index 649224664969..68c75e26edbd 100644 --- a/include/asm-generic/iomap.h +++ b/include/asm-generic/iomap.h @@ -104,7 +104,6 @@ extern void ioport_unmap(void __iomem *); #ifdef CONFIG_PCI /* Destroy a virtual mapping cookie for a PCI BAR (memory or IO) */ struct pci_dev; -extern void pci_iounmap(struct pci_dev *dev, void __iomem *); #elif defined(CONFIG_GENERIC_IOMAP) struct pci_dev; static inline void pci_iounmap(struct pci_dev *dev, void __iomem *addr) diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h index d4f16dcc2ed7..3684307a6b44 100644 --- a/include/asm-generic/pci_iomap.h +++ b/include/asm-generic/pci_iomap.h @@ -18,6 +18,7 @@ extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar, extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar, unsigned long offset, unsigned long maxlen); +extern void pci_iounmap(struct pci_dev *dev, void __iomem *p); /* Create a virtual mapping cookie for a port on a given PCI device. 
* Do not call this directly, it exists to make it easier for architectures * to override */ diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c index 2d3eb1cb73b8..e97b73995af7 100644 --- a/lib/pci_iomap.c +++ b/lib/pci_iomap.c @@ -134,4 +134,10 @@ void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen) return pci_iomap_wc_range(dev, bar, 0, maxlen); } EXPORT_SYMBOL_GPL(pci_iomap_wc); + +void __weak pci_iounmap(struct pci_dev *dev, void __iomem *addr) +{ + iounmap(addr); +} +EXPORT_SYMBOL(pci_iounmap); #endif /* CONFIG_PCI */ -- 2.25.1
Re: [PATCHv2] PCI: Add pci_iounmap
Hi Bjorn, > -Original Message- > From: Bjorn Helgaas > Sent: Friday, August 21, 2020 3:26 AM > To: George Cherian > Cc: linux-kernel@vger.kernel.org; linux-a...@vger.kernel.org; linux- > p...@vger.kernel.org; bhelg...@google.com; a...@arndb.de; Michael S. > Tsirkin > Subject: [EXT] Re: [PATCHv2] PCI: Add pci_iounmap > > [+cc Michael, author of 66eab4df288a ("lib: add GENERIC_PCI_IOMAP")] > > On Thu, Aug 20, 2020 at 10:33:06AM +0530, George Cherian wrote: > > In case if any architecture selects CONFIG_GENERIC_PCI_IOMAP and not > > CONFIG_GENERIC_IOMAP, then the pci_iounmap function is reduced to a > > NULL function. Due to this the managed release variants or even the > > explicit pci_iounmap calls doesn't really remove the mappings. > > > > This issue is seen on an arm64 based system. arm64 by default selects > > only CONFIG_GENERIC_PCI_IOMAP and not CONFIG_GENERIC_IOMAP > from this > > 'commit cb61f6769b88 ("ARM64: use GENERIC_PCI_IOMAP")' > > > > Simple bind/unbind test of any pci driver using pcim_iomap/pci_iomap, > > would lead to the following error message after long hour tests > > > > "allocation failed: out of vmalloc space - use vmalloc= to > > increase size." > > > > Signed-off-by: George Cherian > > --- > > * Changes from v1 > > - Fix the 0-day compilation error. > > - Mark the lib/iomap pci_iounmap call as weak incase > > if any architecture have there own implementation. 
> > > > include/asm-generic/io.h | 4 > > lib/pci_iomap.c | 10 ++ > > 2 files changed, 14 insertions(+) > > > > diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index > > dabf8cb7203b..5986b37226b7 100644 > > --- a/include/asm-generic/io.h > > +++ b/include/asm-generic/io.h > > @@ -915,12 +915,16 @@ static inline void iowrite64_rep(volatile void > > __iomem *addr, struct pci_dev; extern void __iomem *pci_iomap(struct > > pci_dev *dev, int bar, unsigned long max); > > > > +#ifdef CONFIG_GENERIC_PCI_IOMAP > > +extern void pci_iounmap(struct pci_dev *dev, void __iomem *p); #else > > #ifndef pci_iounmap > > #define pci_iounmap pci_iounmap > > static inline void pci_iounmap(struct pci_dev *dev, void __iomem *p) > > { } #endif > > +#endif /* CONFIG_GENERIC_PCI_IOMAP */ > > #endif /* CONFIG_GENERIC_IOMAP */ > > > > /* > > diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c index > > 2d3eb1cb73b8..ecd1eb3f6c25 100644 > > --- a/lib/pci_iomap.c > > +++ b/lib/pci_iomap.c > > @@ -134,4 +134,14 @@ void __iomem *pci_iomap_wc(struct pci_dev > *dev, int bar, unsigned long maxlen) > > return pci_iomap_wc_range(dev, bar, 0, maxlen); } > > EXPORT_SYMBOL_GPL(pci_iomap_wc); > > + > > +#ifndef CONFIG_GENERIC_IOMAP > > +#define pci_iounmap pci_iounmap > > +void __weak pci_iounmap(struct pci_dev *dev, void __iomem *addr); > > +void __weak pci_iounmap(struct pci_dev *dev, void __iomem *addr) { > > + iounmap(addr); > > +} > > +EXPORT_SYMBOL(pci_iounmap); > > +#endif > > I completely agree that this looks like a leak that needs to be fixed. > > But my head hurts after trying to understand pci_iomap() and > pci_iounmap(). I hate to add even more #ifdefs here. Can't we somehow > rationalize this and put pci_iounmap() next to pci_iomap()? Yes, that makes more sense than having #ifdefs here. I will re-spin and send out another version. > > 66eab4df288a ("lib: add GENERIC_PCI_IOMAP") moved pci_iomap() from > lib/iomap.c to lib/pci_iomap.c, but left pci_iounmap() in lib/iomap.c. 
> There must be some good reason why they're separated, but I don't know > what it is. > > > #endif /* CONFIG_PCI */ > > -- > > 2.25.1 > >
[PATCHv2] PCI: Add pci_iounmap
In case if any architecture selects CONFIG_GENERIC_PCI_IOMAP and not CONFIG_GENERIC_IOMAP, then the pci_iounmap function is reduced to a NULL function. Due to this the managed release variants or even the explicit pci_iounmap calls doesn't really remove the mappings. This issue is seen on an arm64 based system. arm64 by default selects only CONFIG_GENERIC_PCI_IOMAP and not CONFIG_GENERIC_IOMAP from this 'commit cb61f6769b88 ("ARM64: use GENERIC_PCI_IOMAP")' Simple bind/unbind test of any pci driver using pcim_iomap/pci_iomap, would lead to the following error message after long hour tests "allocation failed: out of vmalloc space - use vmalloc= to increase size." Signed-off-by: George Cherian --- * Changes from v1 - Fix the 0-day compilation error. - Mark the lib/iomap pci_iounmap call as weak incase if any architecture have there own implementation. include/asm-generic/io.h | 4 lib/pci_iomap.c | 10 ++ 2 files changed, 14 insertions(+) diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index dabf8cb7203b..5986b37226b7 100644 --- a/include/asm-generic/io.h +++ b/include/asm-generic/io.h @@ -915,12 +915,16 @@ static inline void iowrite64_rep(volatile void __iomem *addr, struct pci_dev; extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max); +#ifdef CONFIG_GENERIC_PCI_IOMAP +extern void pci_iounmap(struct pci_dev *dev, void __iomem *p); +#else #ifndef pci_iounmap #define pci_iounmap pci_iounmap static inline void pci_iounmap(struct pci_dev *dev, void __iomem *p) { } #endif +#endif /* CONFIG_GENERIC_PCI_IOMAP */ #endif /* CONFIG_GENERIC_IOMAP */ /* diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c index 2d3eb1cb73b8..ecd1eb3f6c25 100644 --- a/lib/pci_iomap.c +++ b/lib/pci_iomap.c @@ -134,4 +134,14 @@ void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen) return pci_iomap_wc_range(dev, bar, 0, maxlen); } EXPORT_SYMBOL_GPL(pci_iomap_wc); + +#ifndef CONFIG_GENERIC_IOMAP +#define pci_iounmap pci_iounmap +void 
__weak pci_iounmap(struct pci_dev *dev, void __iomem *addr); +void __weak pci_iounmap(struct pci_dev *dev, void __iomem *addr) +{ + iounmap(addr); +} +EXPORT_SYMBOL(pci_iounmap); +#endif #endif /* CONFIG_PCI */ -- 2.25.1
[PATCH] PCI: Add pci_iounmap
In case if any architecture selects CONFIG_GENERIC_PCI_IOMAP and not CONFIG_GENERIC_IOMAP, then the pci_iounmap function is reduced to a NULL function. Due to this the managed release variants or even the explicit pci_iounmap calls doesn't really remove the mappings. This issue is seen on an arm64 based system. arm64 by default selects only CONFIG_GENERIC_PCI_IOMAP and not CONFIG_GENERIC_IOMAP from this 'commit cb61f6769b88 ("ARM64: use GENERIC_PCI_IOMAP")' Simple bind/unbind test of any pci driver using pcim_iomap/pci_iomap, would lead to the following error message after long hour tests "allocation failed: out of vmalloc space - use vmalloc= to increase size." Signed-off-by: George Cherian --- include/asm-generic/io.h | 4 lib/pci_iomap.c | 9 + 2 files changed, 13 insertions(+) diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index dabf8cb7203b..5986b37226b7 100644 --- a/include/asm-generic/io.h +++ b/include/asm-generic/io.h @@ -915,12 +915,16 @@ static inline void iowrite64_rep(volatile void __iomem *addr, struct pci_dev; extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max); +#ifdef CONFIG_GENERIC_PCI_IOMAP +extern void pci_iounmap(struct pci_dev *dev, void __iomem *p); +#else #ifndef pci_iounmap #define pci_iounmap pci_iounmap static inline void pci_iounmap(struct pci_dev *dev, void __iomem *p) { } #endif +#endif /* CONFIG_GENERIC_PCI_IOMAP */ #endif /* CONFIG_GENERIC_IOMAP */ /* diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c index 2d3eb1cb73b8..36128af05e1c 100644 --- a/lib/pci_iomap.c +++ b/lib/pci_iomap.c @@ -134,4 +134,13 @@ void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen) return pci_iomap_wc_range(dev, bar, 0, maxlen); } EXPORT_SYMBOL_GPL(pci_iomap_wc); + +#ifndef CONFIG_GENERIC_IOMAP +#define pci_iounmap pci_iounmap +void pci_iounmap(struct pci_dev *dev, void __iomem *addr) +{ + iounmap(addr); +} +EXPORT_SYMBOL(pci_iounmap); +#endif #endif /* CONFIG_PCI */ -- 2.25.1
Re: [EXT] Re: [PATCH] PCI: Enhance the ACS quirk for Cavium devices
Hi Bjorn,

Sorry for the late reply, I was off for a couple of days.

On 10/8/19 2:32 PM, Bjorn Helgaas wrote:
> On Tue, Oct 08, 2019 at 08:25:23AM +, Robert Richter wrote:
>> On 04.10.19 14:48:13, Bjorn Helgaas wrote:
>>> commit 37b22fbfec2d
>>> Author: George Cherian
>>> Date:   Thu Sep 19 02:43:34 2019 +
>>>
>>>     PCI: Apply Cavium ACS quirk to CN99xx and CN11xxx Root Ports
>>>
>>>     Add an array of Cavium Root Port device IDs and apply the quirk only
>>>     to the listed devices.
>>>
>>>     Instead of applying the quirk to all Root Ports where
>>>     "(dev->device & 0xf800) == 0xa000", apply it only to CN88xx 0xa180 and
>>>     0xa170 Root Ports.

All the Root Ports of the CN88xx series have device IDs 0xa180 and 0xa170.
This patch currently targets only the CN88xx series and not all of CN8xxx.
For example, 83xx devices do not want the quirk applied as of today.
The quirk needs to be applied only to the TX1 series and not to the
octeon-tx1 series.

>> No, this can't be removed. It is a match all for all CN8xxx variants
>> (note the 3 'x', all TX1 cores). So all device ids from 0xa000 to
>> 0xa7FF are affected here and need the quirk.
> OK, I'll drop the patch and wait for a new one. Maybe what was needed
> was to keep the "(dev->device & 0xf800) == 0xa000" part and add the
> pci_quirk_cavium_acs_ids[] array in addition?
>
>>>     Also apply the quirk to CN99xx (0xaf84) and CN11xxx (0xb884) Root
>>>     Ports.

The device ID for all variants of CN99xx is 0xaf84, and for CN11xxx it will
be 0xb884. So this patch holds good for the TX2 as well as the TX3 series
of processors.

>> I thought the quirk is CN8xxx specific, but I could be wrong here.
>>
>> -Robert
>>
>>>
>>>     Link: https://lore.kernel.org/r/20190919024319.GA8792@dc5-eodlnx05.marvell.com
>>>     Fixes: f2ddaf8dfd4a ("PCI: Apply Cavium ThunderX ACS quirk to more Root Ports")
>>>     Fixes: b404bcfbf035 ("PCI: Add ACS quirk for all Cavium devices")
>>>     Signed-off-by: George Cherian
>>>     Signed-off-by: Bjorn Helgaas
>>>     Cc: sta...@vger.kernel.org # v4.12+
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 320255e5e8f8..4e5048cb5ec6 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -4311,17 +4311,24 @@ static int pci_quirk_amd_sb_acs(struct pci_dev *dev, u16 acs_flags)
>>>  #endif
>>>  }
>>>
>>> +static const u16 pci_quirk_cavium_acs_ids[] = {
>>> +	0xa180, 0xa170,		/* CN88xx family of devices */
>>> +	0xaf84,			/* CN99xx family of devices */
>>> +	0xb884,			/* CN11xxx family of devices */
>>> +};
>>> +
>>>  static bool pci_quirk_cavium_acs_match(struct pci_dev *dev)
>>>  {
>>> -	/*
>>> -	 * Effectively selects all downstream ports for whole ThunderX 1
>>> -	 * family by 0xf800 mask (which represents 8 SoCs), while the lower
>>> -	 * bits of device ID are used to indicate which subdevice is used
>>> -	 * within the SoC.
>>> -	 */
>>> -	return (pci_is_pcie(dev) &&
>>> -		(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) &&
>>> -		((dev->device & 0xf800) == 0xa000));
>>> +	int i;
>>> +
>>> +	if (!pci_is_pcie(dev) || pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
>>> +		return false;
>>> +
>>> +	for (i = 0; i < ARRAY_SIZE(pci_quirk_cavium_acs_ids); i++)
>>> +		if (pci_quirk_cavium_acs_ids[i] == dev->device)
>>> +			return true;
>>> +
>>> +	return false;
>>>  }
>>>
>>>  static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags)
[PATCH] PCI: Enhance the ACS quirk for Cavium devices
Enhance the ACS quirk for Cavium processors. Add the Root Port device IDs
to an array and use it in the match function. For newer devices, add their
device IDs to the array so that the match function stays simple.

Signed-off-by: George Cherian
---
 drivers/pci/quirks.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44c4ae1abd00..64deeaddd51c 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4241,17 +4241,27 @@ static int pci_quirk_amd_sb_acs(struct pci_dev *dev, u16 acs_flags)
 #endif
 }
 
+static const u16 pci_quirk_cavium_acs_ids[] = {
+	/* CN88xx family of devices */
+	0xa180, 0xa170,
+	/* CN99xx family of devices */
+	0xaf84,
+	/* CN11xxx family of devices */
+	0xb884,
+};
+
 static bool pci_quirk_cavium_acs_match(struct pci_dev *dev)
 {
-	/*
-	 * Effectively selects all downstream ports for whole ThunderX 1
-	 * family by 0xf800 mask (which represents 8 SoCs), while the lower
-	 * bits of device ID are used to indicate which subdevice is used
-	 * within the SoC.
-	 */
-	return (pci_is_pcie(dev) &&
-		(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) &&
-		((dev->device & 0xf800) == 0xa000));
+	int i;
+
+	if (!pci_is_pcie(dev) || pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
+		return false;
+
+	for (i = 0; i < ARRAY_SIZE(pci_quirk_cavium_acs_ids); i++)
+		if (pci_quirk_cavium_acs_ids[i] == dev->device)
+			return true;
+
+	return false;
 }
 
 static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags)
--
2.17.1
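The difference between the old mask-based match and the table-based match in
this patch can be seen in a small userspace model. This is a sketch only: the
model_* names are invented here, the PCIe Root Port type check is left out,
and 0xa200 in the usage note is an arbitrary ID picked from the 0xa000-0xa7ff
range, not a claim about any real device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Root Port device IDs listed in the patch. */
static const uint16_t model_acs_ids[] = {
	0xa180, 0xa170,	/* CN88xx */
	0xaf84,		/* CN99xx */
	0xb884,		/* CN11xxx */
};

/* Old rule: any device ID in 0xa000..0xa7ff matches. */
static bool model_match_mask(uint16_t device)
{
	return (device & 0xf800) == 0xa000;
}

/* New rule: only IDs present in the table match. */
static bool model_match_table(uint16_t device)
{
	size_t n = sizeof(model_acs_ids) / sizeof(model_acs_ids[0]);

	for (size_t i = 0; i < n; i++)
		if (model_acs_ids[i] == device)
			return true;
	return false;
}
```

In this model 0xa180 matches under both rules, 0xaf84 matches only the table,
and an ID such as 0xa200 matches only the mask, which is the behaviour change
discussed in the review.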
Re: [RFC PATCH] cpufreq / cppc: Work around for Hisilicon CPPC cpufreq
Hi Wang, On Thu, Jan 24, 2019 at 12:27 PM Viresh Kumar wrote: > > +George/Prashanth. > > Guys please see if you have any objections to this patch. I am not > very familiar with this stuff and it would be good to get some > feedback from you guys. > > @Rafael: Do you have any comments on this ? > > On 17-01-19, 19:00, Xiongfeng Wang wrote: > > Hisilicon chips do not support delivered performance counter register > > and reference performance counter register. But the platform can > > calculate the real performance using its own method. This patch provide > > a workaround for this problem, and other platforms can also use this > > workaround framework. We reuse the desired performance register to > > store the real performance calculated by the platform. After the > > platform finished the frequency adjust, it gets the real performance and > > writes it into desired performance register. OS can use it to calculate > > the real frequency. Does your platform support Autonomous Selection mode? This register is not valid when autonomous mode is enabled. In such case how are you calculating the frequency? > > > > Signed-off-by: Xiongfeng Wang > > --- > > drivers/acpi/cppc_acpi.c | 29 > > drivers/cpufreq/Kconfig.arm| 7 + > > drivers/cpufreq/cppc_cpufreq.c | 62 > > ++ > > include/acpi/cppc_acpi.h | 4 +++ > > 4 files changed, 102 insertions(+) > > > > diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c > > index 217a782..0cdaf7e 100644 > > --- a/drivers/acpi/cppc_acpi.c > > +++ b/drivers/acpi/cppc_acpi.c > > @@ -1050,6 +1050,35 @@ static int cpc_write(int cpu, struct > > cpc_register_resource *reg_res, u64 val) > > return ret_val; > > } > > > > +#ifdef CONFIG_HISILICON_CPPC_CPUFREQ_WORKAROUND > > +/* > > + * We reuse the desired performance register to store the real performance > > + * calculated by the platform. 
> > + */
> > +u64 hisi_cppc_get_real_perf(unsigned int cpunum)
> > +{
> > +	int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
> > +	struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
> > +	struct cpc_register_resource *desired_reg;
> > +	u64 desired_perf;
> > +	int ret;
> > +
> > +	/*
> > +	 * Make sure the platform has finished the frequency adjust
> > +	 * and wrote the real performance in desired performance register
> > +	 */
> > +	ret = check_pcc_chan(pcc_ss_id, false);
> > +	if (ret)
> > +		return 0;

If there is a pending command in the channel, then returning zero will give
a bogus frequency value. You may return the previously written value instead.

> > +
> > +	desired_reg = &cpc_desc->cpc_regs[DESIRED_PERF];
> > +	cpc_read(cpunum, desired_reg, &desired_perf);
> > +
> > +	return desired_perf;
> > +}
> > +EXPORT_SYMBOL_GPL(hisi_cppc_get_real_perf);
> > +#endif
> > +
> >  /**
> >   * cppc_get_perf_caps - Get a CPUs performance capabilities.
> >   * @cpunum: CPU from which to get capabilities info.
> > diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
> > index 688f102..236bd07 100644
> > --- a/drivers/cpufreq/Kconfig.arm
> > +++ b/drivers/cpufreq/Kconfig.arm
> > @@ -18,6 +18,13 @@ config ACPI_CPPC_CPUFREQ
> >
> >  	  If in doubt, say N.
> >
> > +config HISILICON_CPPC_CPUFREQ_WORKAROUND
> > +	bool "Workaround for Hisilicon CPPC Cpufreq"
> > +	default y
> > +	depends on ACPI_CPPC_CPUFREQ && ARM64

Do you really want this applied to all of ARM64, or only to the affected
HiSilicon platforms?

> > +	help
> > +	  This option enables a workaround for Hisilicon CPPC Cpufreq.
> > + > > config ARM_ARMADA_37XX_CPUFREQ > > tristate "Armada 37xx CPUFreq support" > > depends on ARCH_MVEBU && CPUFREQ_DT > > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c > > index fd25c21c..b910e84 100644 > > --- a/drivers/cpufreq/cppc_cpufreq.c > > +++ b/drivers/cpufreq/cppc_cpufreq.c > > @@ -33,6 +33,13 @@ > > /* Offest in the DMI processor structure for the max frequency */ > > #define DMI_PROCESSOR_MAX_SPEED 0x14 > > > > +struct cppc_get_rate_workaround_info { If your intention is to make a generic framework for future extensions then you might need to name it differently. Something like struct cppc_workaround_info, so that you can extend the same for any future workarounds. > > + char oem_id[ACPI_OEM_ID_SIZE +1]; > > + char oem_table_id[ACPI_OEM_TABLE_ID_SIZE + 1]; > > + u32 oem_revision; > > + unsigned int (*get)(unsigned int cpu); This can be unsigned int (*get_rate)(unsigned int cpu); > > +}; > > + > > /* > > * These structs contain information parsed from per CPU > > * ACPI _CPC structures. > > @@ -357,6 +364,59 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int > > cpunum) > > .name = "cppc_cpufreq", > > }; > > > > +#ifdef CONFIG_HISILICON_CPPC_CPUFREQ_WORKAROUND > > +/* > > + * When the platform does not support delivered
Re: [PATCH] xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc
Hi Alan,

Thanks for the review.
I will update the patch accordingly and send out v2.

On 10/28/2018 10:48 PM, Alan Stern wrote:
>
> On Sat, 27 Oct 2018, Cherian, George wrote:
>
>> Implement workaround for ThunderX2 Errata-129 (documented in
>> "CN99XX Known Issues" available at Cavium support site).
>> As per ThunderX2 Errata-129, a USB-2.0 device may come up as USB-1.0.
>> If a connection to a USB-1.0 device is followed by another connection
>> to a USB-2.0 device, the link will come up as USB-1.0 for the USB-2.0
>> device.
>>
>> Resolution: Reset the PHY after the USB1.0 device is disconnected.
>> The PHY reset sequence is done using private registers in XHCI register
>> space. After the PHY is reset we check for the PLL lock status and retry
>> the operation if it fails. From our tests, retrying 4 times is sufficient.
>>
>> Add a new quirk flag XHCI_RESET_PLL_ON_DISCONNECT to invoke the workaround
>> in handle_xhci_port_status().
>
> Minor nitpick (for both the patch description and the code comments):
>
> USB 1.0 was never widely adopted and is not used any more. The
> earliest version of USB currently used in supported devices is USB 1.1.
> Likewise, there are a few devices around that support USB 2.1, not
> USB 2.0, but they are presumably also subject to the problem described
> above.
>
> I suggest you change the description and the comments to refer to USB 1
> and USB 2 instead of USB 1.0 and USB 2.0, as the latter are too
> restrictive and misleading.
>
> Alan Stern
>

Regards,
-George
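The "reset the PHY, check the PLL lock, retry on failure" sequence from the
quoted patch description could be sketched as below. This is a userspace
illustration under stated assumptions, not the driver code: struct
model_phy_ops, the fake PHY and all model_* names are hypothetical, and the
sketch treats the quoted figure of 4 as the total number of attempts.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PLL_RETRIES 4 /* figure quoted in the patch description */

/* Hypothetical hooks; a real driver would access PHY registers here. */
struct model_phy_ops {
	void (*reset)(void *ctx);
	bool (*pll_locked)(void *ctx);
};

/*
 * Reset the PHY and check the PLL lock, retrying on failure.
 * Returns the number of resets issued, or -1 if the PLL never locked.
 */
static int model_reset_pll(const struct model_phy_ops *ops, void *ctx)
{
	assert(ops && ops->reset && ops->pll_locked);

	for (int attempt = 1; attempt <= MODEL_PLL_RETRIES; attempt++) {
		ops->reset(ctx);
		if (ops->pll_locked(ctx))
			return attempt;
	}
	return -1;
}

/* Fake PHY for demonstration: reports lock after `locks_after` resets. */
struct model_fake_phy {
	int resets;
	int locks_after;
};

static void model_fake_reset(void *ctx)
{
	struct model_fake_phy *phy = ctx;

	phy->resets++;
}

static bool model_fake_locked(void *ctx)
{
	struct model_fake_phy *phy = ctx;

	return phy->resets >= phy->locks_after;
}

static const struct model_phy_ops model_fake_ops = {
	.reset = model_fake_reset,
	.pll_locked = model_fake_locked,
};
```

Bounding the loop keeps the port status handler from spinning forever if the
PLL never locks; the caller can then decide how to report the failure.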
Re: [PATCH 2/2] ipmi_ssif: Fix crash seen while ipmi_unregister_smi
Hi Corey,

On 08/24/2018 06:38 PM, Corey Minyard wrote:
> On 08/24/2018 06:10 AM, George Cherian wrote:
>> Don't set ssif_info->intf to NULL before ipmi_unregister_smi.
>> shutdown_ssif will free ssif_info anyway.
>
> This is correct, but it goes a little deeper. I just sent out a patch
> yesterday that included this.

Yes, I saw the patch now:
https://sourceforge.net/p/openipmi/mailman/message/36397896/
I will test and update in that thread.

> Thanks,
>
> -corey
>
>> The following crash is observed if ssif_info->intf is set to NULL
>> before ipmi_unregister_smi.
>>
>> CPU: 119 PID: 7317 Comm: kssif000e Not tainted 4.18.0+ #80
>> Hardware name: Cavium Inc. Saber/Saber, BIOS Cavium reference firmware version 7.0 08/04/2018
>> pstate: 2049 (nzCv daif +PAN -UAO)
>> pc : ipmi_smi_msg_received+0x44/0x3bc [ipmi_msghandler]
>> lr : deliver_recv_msg+0x30/0x5c [ipmi_ssif]
>> sp : 37a0fd20
>> x29: 37a0fd20 x28:
>> x27: 047e08f0 x26: 800ed9375800
>> x25: 37a0fe00 x24: 09073000
>> x23: 0013 x22:
>> x21: 7000 x20: 800adce18400
>> x19: x18: 3742fd38
>> x17: 089960f0 x16: 000e
>> x15: 0007 x14:
>> x13: x12: 0033
>> x11: 0381 x10: 0ba0
>> x9 : x8 : 800ac001fc00
>> x7 : 7fe003b4d800 x6 : 800adce1854b
>> x5 : 0014 x4 : 0004
>> x3 : x2 : 0002
>> x1 : 567cb12f8b916b00 x0 : 0002
>> Process kssif000e (pid: 7317, stack limit = 0x41077d8a)
>> Call trace:
>>  ipmi_smi_msg_received+0x44/0x3bc [ipmi_msghandler]
>>  deliver_recv_msg+0x30/0x5c [ipmi_ssif]
>>  msg_done_handler+0x2f0/0x66c [ipmi_ssif]
>>  ipmi_ssif_thread+0x108/0x124 [ipmi_ssif]
>>  kthread+0x108/0x134
>>  ret_from_fork+0x10/0x18
>> Code: b9402280 91401e75 f90037a1 7100041f (b945bab6)
>> ---[ end trace fb7d748bc7b17490 ]---
>> Kernel panic - not syncing: Fatal exception
>> SMP: stopping secondary CPUs
>> Kernel Offset: disabled
>> CPU features: 0x23800c38
>> Memory Limit: none
>> ---[ end Kernel panic - not syncing: Fatal exception ]---
>>
>> Signed-off-by: George Cherian
>> ---
>>  drivers/char/ipmi/ipmi_ssif.c | 5 +
>>  1 file changed, 1 insertion(+), 4 deletions(-)
>>
>> diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
>> index ccdf6b1..1490636 100644
>> --- a/drivers/char/ipmi/ipmi_ssif.c
>> +++ b/drivers/char/ipmi/ipmi_ssif.c
>> @@ -1226,7 +1226,6 @@ static void shutdown_ssif(void *send_info)
>>  static int ssif_remove(struct i2c_client *client)
>>  {
>>  	struct ssif_info *ssif_info = i2c_get_clientdata(client);
>> -	struct ipmi_smi *intf;
>>  	struct ssif_addr_info *addr_info;
>>
>>  	if (!ssif_info)
>> @@ -1236,9 +1235,7 @@ static int ssif_remove(struct i2c_client *client)
>>  	 * After this point, we won't deliver anything asychronously
>>  	 * to the message handler. We can unregister ourself.
>>  	 */
>> -	intf = ssif_info->intf;
>> -	ssif_info->intf = NULL;
>> -	ipmi_unregister_smi(intf);
>> +	ipmi_unregister_smi(ssif_info->intf);
>>
>>  	list_for_each_entry(addr_info, &ssif_infos, link) {
>>  		if (addr_info->client == client) {
[PATCH v2] i2c: xlp9xx: Fix case where SSIF read transaction completes early
During IPMI stress tests we see occasional transaction failures at boot
time. This happens with I2C_M_RECV_LEN transactions when the read
transfer completes (with the initial read length of 34) before the
driver gets a chance to handle interrupts.

The current driver code expects at least 2 interrupts for
I2C_M_RECV_LEN transactions. The length is updated during the first
interrupt, and the buffer contents are only copied during subsequent
interrupts. If there is just one interrupt, we complete the transaction
without copying out the bytes from the RX FIFO.

Update the code to drain the RX FIFO after the length update, so that
the transaction completes correctly in all cases.

Signed-off-by: George Cherian
---
 drivers/i2c/busses/i2c-xlp9xx.c | 41 -
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/i2c/busses/i2c-xlp9xx.c b/drivers/i2c/busses/i2c-xlp9xx.c
index 1f41a4f..7134f72 100644
--- a/drivers/i2c/busses/i2c-xlp9xx.c
+++ b/drivers/i2c/busses/i2c-xlp9xx.c
@@ -191,28 +191,43 @@ static void xlp9xx_i2c_drain_rx_fifo(struct xlp9xx_i2c_dev *priv)
 	if (priv->len_recv) {
 		/* read length byte */
 		rlen = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
+
+		/*
+		 * We expect at least 2 interrupts for I2C_M_RECV_LEN
+		 * transactions. The length is updated during the first
+		 * interrupt, and the buffer contents are only copied
+		 * during subsequent interrupts. If in case the interrupts
+		 * get merged we would complete the transaction without
+		 * copying out the bytes from RX fifo. To avoid this now we
+		 * drain the fifo as and when data is available.
+		 * We drained the rlen byte already, decrement total length
+		 * by one.
+		 */
+
+		len--;
 		if (rlen > I2C_SMBUS_BLOCK_MAX || rlen == 0) {
 			rlen = 0;	/*abort transfer */
 			priv->msg_buf_remaining = 0;
 			priv->msg_len = 0;
-		} else {
-			*buf++ = rlen;
-			if (priv->client_pec)
-				++rlen; /* account for error check byte */
-			/* update remaining bytes and message length */
-			priv->msg_buf_remaining = rlen;
-			priv->msg_len = rlen + 1;
+			xlp9xx_i2c_update_rlen(priv);
+			return;
 		}
+
+		*buf++ = rlen;
+		if (priv->client_pec)
+			++rlen; /* account for error check byte */
+		/* update remaining bytes and message length */
+		priv->msg_buf_remaining = rlen;
+		priv->msg_len = rlen + 1;
 		xlp9xx_i2c_update_rlen(priv);
 		priv->len_recv = false;
-	} else {
-		len = min(priv->msg_buf_remaining, len);
-		for (i = 0; i < len; i++, buf++)
-			*buf = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
-
-		priv->msg_buf_remaining -= len;
 	}
 
+	len = min(priv->msg_buf_remaining, len);
+	for (i = 0; i < len; i++, buf++)
+		*buf = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
+
+	priv->msg_buf_remaining -= len;
 	priv->msg_buf = buf;
 
 	if (priv->msg_buf_remaining)
-- 
1.8.3.1
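The effect of this change can be modelled outside the kernel. What follows is a minimal sketch under stated assumptions, not the driver code: `sim_fifo`, `sim_xfer`, `fifo_pop` and `sim_drain_rx_fifo` are hypothetical stand-ins for the controller RX FIFO, the per-transfer state and `xlp9xx_i2c_drain_rx_fifo()`, `BLOCK_MAX` is assumed to mirror `I2C_SMBUS_BLOCK_MAX`, and PEC handling and the register update are left out. It shows that once the length byte has been consumed, falling through to the copy loop drains the payload even when everything arrives in a single interrupt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_MAX 32 /* assumed to mirror I2C_SMBUS_BLOCK_MAX */

/* Hypothetical stand-in for the controller RX FIFO. */
struct sim_fifo {
	const uint8_t *data;
	size_t pos;
};

/* Hypothetical stand-in for the per-transfer driver state. */
struct sim_xfer {
	bool len_recv;    /* still waiting for the length byte */
	uint8_t *buf;     /* next free slot in the message buffer */
	size_t remaining; /* payload bytes still expected */
};

static uint8_t fifo_pop(struct sim_fifo *f)
{
	return f->data[f->pos++];
}

/* Handle one RX interrupt that found `avail` bytes in the FIFO. */
static void sim_drain_rx_fifo(struct sim_xfer *x, struct sim_fifo *f,
			      size_t avail)
{
	if (avail == 0)
		return;

	if (x->len_recv) {
		uint8_t rlen = fifo_pop(f);

		avail--; /* the length byte has been consumed */
		if (rlen == 0 || rlen > BLOCK_MAX) {
			x->remaining = 0; /* abort the transfer */
			return;
		}
		*x->buf++ = rlen;
		x->remaining = rlen;
		x->len_recv = false;
		/* No early return: fall through and drain the payload. */
	}

	if (avail > x->remaining)
		avail = x->remaining;
	for (size_t i = 0; i < avail; i++)
		*x->buf++ = fifo_pop(f);
	x->remaining -= avail;
}
```

Calling `sim_drain_rx_fifo()` once with the length byte and the full payload already queued leaves `remaining` at zero, which is the merged-interrupt case the patch describes.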
Re: Re: [PATCH] i2c: xlp9xx: Fix case where SSIF read transaction completes early
Hi Wolfram,

Thanks for the review. I will update the patch with a small comment
section above len--; so that there is no confusion.

On 08/01/2018 02:35 AM, Wolfram Sang wrote:
>> --- a/drivers/i2c/busses/i2c-xlp9xx.c
>> +++ b/drivers/i2c/busses/i2c-xlp9xx.c
>> @@ -191,28 +191,30 @@ static void xlp9xx_i2c_drain_rx_fifo(struct xlp9xx_i2c_dev *priv)
>>  	if (priv->len_recv) {
>>  		/* read length byte */
>>  		rlen = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
>> +		len--;
>
> I don't know the HW and assume the above line is correct because of
> merging two interrupts into one. However, the line looks a bit stray,
> and I wonder if we shouldn't add a comment somewhere explaining the
> situation similar to the second paragraph of the commit message?

Regards,
-George
[PATCH] i2c: xlp9xx: Fix case where SSIF read transaction completes early
During IPMI stress tests we see occasional transaction failures at boot
time. This happens with I2C_M_RECV_LEN transactions when the read
transfer completes (with the initial read length of 34) before the
driver gets a chance to handle interrupts.

The current driver code expects at least 2 interrupts for
I2C_M_RECV_LEN transactions. The length is updated during the first
interrupt, and the buffer contents are only copied during subsequent
interrupts. If there is just one interrupt, we complete the transaction
without copying out the bytes from the RX FIFO.

Update the code to drain the RX FIFO after the length update, so that
the transaction completes correctly in all cases.

Signed-off-by: George Cherian
---
 drivers/i2c/busses/i2c-xlp9xx.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/i2c/busses/i2c-xlp9xx.c b/drivers/i2c/busses/i2c-xlp9xx.c
index 1f41a4f..01fa04d 100644
--- a/drivers/i2c/busses/i2c-xlp9xx.c
+++ b/drivers/i2c/busses/i2c-xlp9xx.c
@@ -191,28 +191,30 @@ static void xlp9xx_i2c_drain_rx_fifo(struct xlp9xx_i2c_dev *priv)
 	if (priv->len_recv) {
 		/* read length byte */
 		rlen = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
+		len--;
 		if (rlen > I2C_SMBUS_BLOCK_MAX || rlen == 0) {
 			rlen = 0;	/*abort transfer */
 			priv->msg_buf_remaining = 0;
 			priv->msg_len = 0;
-		} else {
-			*buf++ = rlen;
-			if (priv->client_pec)
-				++rlen; /* account for error check byte */
-			/* update remaining bytes and message length */
-			priv->msg_buf_remaining = rlen;
-			priv->msg_len = rlen + 1;
+			xlp9xx_i2c_update_rlen(priv);
+			return;
 		}
+
+		*buf++ = rlen;
+		if (priv->client_pec)
+			++rlen; /* account for error check byte */
+		/* update remaining bytes and message length */
+		priv->msg_buf_remaining = rlen;
+		priv->msg_len = rlen + 1;
 		xlp9xx_i2c_update_rlen(priv);
 		priv->len_recv = false;
-	} else {
-		len = min(priv->msg_buf_remaining, len);
-		for (i = 0; i < len; i++, buf++)
-			*buf = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
-
-		priv->msg_buf_remaining -= len;
 	}
 
+	len = min(priv->msg_buf_remaining, len);
+	for (i = 0; i < len; i++, buf++)
+		*buf = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
+
+	priv->msg_buf_remaining -= len;
 	priv->msg_buf = buf;
 
 	if (priv->msg_buf_remaining)
-- 
1.8.3.1
[PATCH v4] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance
feedback via set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian
Acked-by: Viresh Kumar
---
 drivers/cpufreq/cppc_cpufreq.c | 52 ++
 1 file changed, 52 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index a9d3eec..30f3021 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -296,10 +296,62 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	return ret;
 }
 
+static inline u64 get_delta(u64 t1, u64 t0)
+{
+	if (t1 > t0 || t0 > ~(u32)0)
+		return t1 - t0;
+
+	return (u32)t1 - (u32)t0;
+}
+
+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
+				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
+				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+	u64 delta_reference, delta_delivered;
+	u64 reference_perf, delivered_perf;
+
+	reference_perf = fb_ctrs_t0.reference_perf;
+
+	delta_reference = get_delta(fb_ctrs_t1.reference,
+				    fb_ctrs_t0.reference);
+	delta_delivered = get_delta(fb_ctrs_t1.delivered,
+				    fb_ctrs_t0.delivered);
+
+	/* Check to avoid divide-by zero */
+	if (delta_reference || delta_delivered)
+		delivered_perf = (reference_perf * delta_delivered) /
+					delta_reference;
+	else
+		delivered_perf = cpu->perf_ctrls.desired_perf;
+
+	return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+	struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+	struct cppc_cpudata *cpu = all_cpu_data[cpunum];
+	int ret;
+
+	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+	if (ret)
+		return ret;
+
+	udelay(2); /* 2usec delay between sampling */
+
+	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+	if (ret)
+		return ret;
+
+	return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
+}
+
 static struct cpufreq_driver cppc_cpufreq_driver = {
 	.flags = CPUFREQ_CONST_LOOPS,
 	.verify = cppc_verify_policy,
 	.target = cppc_cpufreq_set_target,
+	.get = cppc_cpufreq_get_rate,
 	.init = cppc_cpufreq_cpu_init,
 	.stop_cpu = cppc_cpufreq_stop_cpu,
 	.name = "cppc_cpufreq",
-- 
1.8.3.1
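The counter arithmetic in this patch can be tried in userspace. The sketch below is an illustration under stated assumptions rather than the kernel code: it uses `<stdint.h>` types in place of `u64`/`u32`, the helper name `delivered_perf_from_counters` and its `fallback` parameter are hypothetical, and it guards the division on the reference delta alone, which is a deliberate simplification and differs from the condition in the posted patch. `get_delta` follows the same single-wraparound logic as the patch.

```c
#include <stdint.h>

/*
 * Counter delta that tolerates a single wraparound.
 * If t0 does not fit in 32 bits the counter is treated as 64 bits wide,
 * otherwise a wrapped value is assumed to come from a 32-bit counter.
 */
static uint64_t get_delta(uint64_t t1, uint64_t t0)
{
	if (t1 > t0 || t0 > UINT32_MAX)
		return t1 - t0; /* wraps modulo 2^64 when t1 < t0 */

	return (uint32_t)((uint32_t)t1 - (uint32_t)t0); /* modulo 2^32 */
}

/*
 * delivered_perf = reference_perf * (delta delivered / delta reference).
 * `fallback` (hypothetical) is returned when the reference counter did
 * not advance, so the division is never attempted with a zero divisor.
 */
static uint64_t delivered_perf_from_counters(uint64_t reference_perf,
					     uint64_t ref_t0, uint64_t ref_t1,
					     uint64_t del_t0, uint64_t del_t1,
					     uint64_t fallback)
{
	uint64_t delta_reference = get_delta(ref_t1, ref_t0);
	uint64_t delta_delivered = get_delta(del_t1, del_t0);

	if (delta_reference == 0)
		return fallback;

	return reference_perf * delta_delivered / delta_reference;
}
```

For example, with `reference_perf` of 100 and a delivered counter that advanced half as far as the reference counter between the two snapshots, the helper yields 50.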
Re: [PATCH v3] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
Hi Prakash,

On 07/10/2018 09:19 PM, Prakash, Prashanth wrote:
> On 7/9/2018 11:42 PM, George Cherian wrote:
>> Hi Prakash,
>>
>> On 07/09/2018 10:12 PM, Prakash, Prashanth wrote:
>>> Hi George,
>>>
>>> On 7/9/2018 4:10 AM, George Cherian wrote:
>>>> Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance
>>>> feedback via set of performance counters. To determine the actual
>>>> performance level delivered over time, OSPM may read a set of
>>>> performance counters from the Reference Performance Counter Register
>>>> and the Delivered Performance Counter Register.
>>>>
>>>> OSPM calculates the delivered performance over a given time period by
>>>> taking a beginning and ending snapshot of both the reference and
>>>> delivered performance counters, and calculating:
>>>>
>>>> delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).
>>>>
>>>> Implement the above and hook this to the cpufreq->get method.
>>>>
>>>> Signed-off-by: George Cherian
>>>> Acked-by: Viresh Kumar
>>>> ---
>>>>  drivers/cpufreq/cppc_cpufreq.c | 44 ++
>>>>  1 file changed, 44 insertions(+)
>>>>
>>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>>>> index a9d3eec..61132e8 100644
>>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>>> @@ -296,10 +296,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>>>>  	return ret;
>>>>  }
>>>>
>>>> +static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
>>>> +				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
>>>> +				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
>>>> +{
>>>> +	u64 delta_reference, delta_delivered;
>>>> +	u64 reference_perf, delivered_perf;
>>>> +
>>>> +	reference_perf = fb_ctrs_t0.reference_perf;
>>>> +
>>>> +	delta_reference = (u32)fb_ctrs_t1.reference -
>>>> +			  (u32)fb_ctrs_t0.reference;
>>>> +	delta_delivered = (u32)fb_ctrs_t1.delivered -
>>>> +			  (u32)fb_ctrs_t0.delivered;
>>>
>>> Why (u32)? These registers can be 64 bits, and that's why
>>> cppc_perf_fb_ctrs has 64b fields for the reference and delivered
>>> counters.
>>>
>>> Moreover, the integer math is incorrect. You can run into a scenario
>>> where t1.ref/del < t0.ref/del, thus setting a negative number to u64!
>>> The likelihood of this is very high, especially when you throw away
>>> the higher 32 bits.
>>
>> Because of binary representation, unsigned subtraction will work even
>> if t1.ref/del < t0.ref/del. So essentially, the code should look like
>> this:
>>
>> static inline u64 get_delta(u64 t1, u64 t0)
>> {
>> 	if (t1 > t0 || t0 > ~(u32)0)
>> 		return t1 - t0;
>>
>> 	return (u32)t1 - (u32)t0;
>> }
>>
>> As a further optimization, I used (u32) since that also works, as long
>> as the momentary delta at any point is not greater than 2 ^ 32. I
>> don't foresee any reason for any platform to increment the counters
>> at an interval greater than 2 ^ 32.
>
> We are NOT running within any critical section to make sure that there
> will be no context switch between feedback counter reads. Thus the
> assumptions that the delta always represents a very short momentary
> window of time and that it is always less than 2^32 are not accurate.
>
> The single overflow assumption about when the above integer math will
> work is also not acceptable, especially when we throw away the higher
> order bits. There is hardware out there that uses 64b counters and can
> overflow the lower 32b in a fairly short time. Since the spec (and
> some hardware) provides 64 bits, we should use it to make our
> implementation more robust instead of throwing away the higher order
> bits.
>
> I think it's ok to use the above integer math, but please add a
> comment about the single overflow assumption and don't throw away the
> higher 32 bits.

Okay, I will spin a v4 with the get_delta changes. Also note that the
get_delta function doesn't throw away the higher 32 bits.

>>> To keep things simple, do something like below:
>>>
>>> if (t1.reference <= t0.reference || t1.delivered <= t0.delivered) {
>>> 	/* At least one of them should have overflowed */
>>> 	return desired_perf;
>>> } else {
>>> 	compute the delivered perf using the counters.
>>> }
>>
>> No need to do it like this, as this is tested and found working
>> across counter overruns on our platform.
>>
>>>> +
>>>> +	/* Check to avoid divide-by zero */
>>>> +	if (delta_reference || delta_delivered)
>>>> +		delivered_perf = (reference_perf * delta_delivered) /
>>>> +					delta_reference;
>>>> +	else
>>>> +		delivered_perf = cpu->perf_ctrls.desired_perf;
>>>> +
>>>> +	return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
>>>> +}
>>>> +
>>>> +static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
>>>> +{
>>>> +	struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
>>>> +	struct cppc_cpudata *cpu = all_cpu_data[cpunum];
>>>> +	int ret;
>>>> +
>>>> +	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	udelay(2); /* 2usec delay between sampling */
>>>> +
>>>> +	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
>>>> +	if (ret)
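The disagreement in this thread about discarding the upper 32 bits can be made concrete with a small standalone sketch. It is only an illustration: `delta_truncated` and `delta_full` are hypothetical helper names, with the first mirroring the `(u32)` arithmetic of the v3 patch and the second using plain 64-bit unsigned subtraction. Both agree when the counter advances by less than 2^32 across a 32-bit boundary, and they diverge once a 64-bit counter advances by 2^32 or more between the two snapshots.

```c
#include <stdint.h>

/* Delta computed on the low 32 bits only, as in the v3 patch. */
static uint64_t delta_truncated(uint64_t t1, uint64_t t0)
{
	return (uint32_t)((uint32_t)t1 - (uint32_t)t0);
}

/* Delta computed on the full 64-bit counter values. */
static uint64_t delta_full(uint64_t t1, uint64_t t0)
{
	return t1 - t0; /* unsigned arithmetic wraps modulo 2^64 */
}
```

With `t0 = 0xFFFFFFF0` and `t1 = 0x100000010` both helpers return 0x20, while with `t0 = 0x10` and `t1 = 0x200000010` the full delta is 0x200000000 and the truncated one is 0.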
Re: [PATCH v3] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
Hi Prakash,

On 07/09/2018 10:12 PM, Prakash, Prashanth wrote:
> Hi George,
>
> On 7/9/2018 4:10 AM, George Cherian wrote:
>> Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance
>> feedback via set of performance counters. To determine the actual
>> performance level delivered over time, OSPM may read a set of
>> performance counters from the Reference Performance Counter Register
>> and the Delivered Performance Counter Register.
>>
>> OSPM calculates the delivered performance over a given time period by
>> taking a beginning and ending snapshot of both the reference and
>> delivered performance counters, and calculating:
>>
>> delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).
>>
>> Implement the above and hook this to the cpufreq->get method.
>>
>> Signed-off-by: George Cherian
>> Acked-by: Viresh Kumar
>> ---
>>  drivers/cpufreq/cppc_cpufreq.c | 44 ++
>>  1 file changed, 44 insertions(+)
>>
>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>> index a9d3eec..61132e8 100644
>> --- a/drivers/cpufreq/cppc_cpufreq.c
>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>> @@ -296,10 +296,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>>  	return ret;
>>  }
>>
>> +static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
>> +				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
>> +				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
>> +{
>> +	u64 delta_reference, delta_delivered;
>> +	u64 reference_perf, delivered_perf;
>> +
>> +	reference_perf = fb_ctrs_t0.reference_perf;
>> +
>> +	delta_reference = (u32)fb_ctrs_t1.reference -
>> +				(u32)fb_ctrs_t0.reference;
>> +	delta_delivered = (u32)fb_ctrs_t1.delivered -
>> +				(u32)fb_ctrs_t0.delivered;
>
> Why (u32)? These registers can be 64bits and that's why cppc_perf_fb_ctrs
> have 64b fields for reference and delivered counters.
>
> Moreover, the integer math is incorrect. You can run into a scenario where
> t1.ref/del < t0.ref/del, thus setting a negative number to u64!
>
> The likelihood of this is very high especially when you throw away the
> higher 32bits.

Because of binary representation, unsigned subtraction will work even if
t1.ref/del < t0.ref/del. So essentially, the code should look like this,

static inline u64 get_delta(u64 t1, u64 t0)
{
	if (t1 > t0 || t0 > ~(u32)0)
		return t1 - t0;

	return (u32)t1 - (u32)t0;
}

As a further optimization, I used (u32) since that also works, as long as
the momentary delta at any point is not greater than 2 ^ 32.
I don't foresee any reason for any platform to increment the counters at
an interval greater than 2 ^ 32.

> To keep things simple, do something like below:
>
> if (t1.reference <= t0.reference || t1.delivered <= t0.delivered) {
>	/* At least one of them should have overflowed */
>	return desired_perf;
> } else {
>	compute the delivered perf using the counters.
> }

No need to do like this as this is tested and found working across counter
overruns in our platform.

>> +
>> +	/* Check to avoid divide-by zero */
>> +	if (delta_reference || delta_delivered)
>> +		delivered_perf = (reference_perf * delta_delivered) /
>> +					delta_reference;
>> +	else
>> +		delivered_perf = cpu->perf_ctrls.desired_perf;
>> +
>> +	return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
>> +}
>> +
>> +static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
>> +{
>> +	struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
>> +	struct cppc_cpudata *cpu = all_cpu_data[cpunum];
>> +	int ret;
>> +
>> +	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
>> +	if (ret)
>> +		return ret;
>> +
>> +	udelay(2); /* 2usec delay between sampling */
>> +
>> +	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
>> +}
>> +
>>  static struct cpufreq_driver cppc_cpufreq_driver = {
>>  	.flags = CPUFREQ_CONST_LOOPS,
>>  	.verify = cppc_verify_policy,
>>  	.target = cppc_cpufreq_set_target,
>> +	.get = cppc_cpufreq_get_rate,
>>  	.init = cppc_cpufreq_cpu_init,
>>  	.stop_cpu = cppc_cpufreq_stop_cpu,
>>  	.name = "cppc_cpufreq",
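The wraparound argument in the reply above can be checked outside the kernel. What follows is a minimal user-space sketch, not the driver code: it assumes `uint64_t`/`uint32_t` from `<stdint.h>` in place of the kernel's `u64`/`u32`, and reuses the name `get_delta` from the reply purely for illustration.

```c
#include <stdint.h>

/*
 * Illustrative user-space copy of the get_delta() helper proposed in the
 * reply; uint64_t/uint32_t stand in for the kernel's u64/u32.
 */
static inline uint64_t get_delta(uint64_t t1, uint64_t t0)
{
	/* No wrap, or the earlier sample already needs more than 32 bits */
	if (t1 > t0 || t0 > UINT32_MAX)
		return t1 - t0;

	/* Both samples fit in 32 bits: subtract modulo 2^32 */
	return (uint32_t)((uint32_t)t1 - (uint32_t)t0);
}
```

With a 32-bit counter sampled at 0xFFFFFFF0 and then at 5, the helper yields 21, which is the number of increments between the two samples. As the reply notes, this only holds while fewer than 2^32 increments occur between the samples.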
[PATCH v3] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance
feedback via set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian
Acked-by: Viresh Kumar
---
 drivers/cpufreq/cppc_cpufreq.c | 44 ++
 1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index a9d3eec..61132e8 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -296,10 +296,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	return ret;
 }

+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
+				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
+				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+	u64 delta_reference, delta_delivered;
+	u64 reference_perf, delivered_perf;
+
+	reference_perf = fb_ctrs_t0.reference_perf;
+
+	delta_reference = (u32)fb_ctrs_t1.reference -
+				(u32)fb_ctrs_t0.reference;
+	delta_delivered = (u32)fb_ctrs_t1.delivered -
+				(u32)fb_ctrs_t0.delivered;
+
+	/* Check to avoid divide-by zero */
+	if (delta_reference || delta_delivered)
+		delivered_perf = (reference_perf * delta_delivered) /
+					delta_reference;
+	else
+		delivered_perf = cpu->perf_ctrls.desired_perf;
+
+	return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+	struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+	struct cppc_cpudata *cpu = all_cpu_data[cpunum];
+	int ret;
+
+	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+	if (ret)
+		return ret;
+
+	udelay(2); /* 2usec delay between sampling */
+
+	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+	if (ret)
+		return ret;
+
+	return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
+}
+
 static struct cpufreq_driver cppc_cpufreq_driver = {
 	.flags = CPUFREQ_CONST_LOOPS,
 	.verify = cppc_verify_policy,
 	.target = cppc_cpufreq_set_target,
+	.get = cppc_cpufreq_get_rate,
 	.init = cppc_cpufreq_cpu_init,
 	.stop_cpu = cppc_cpufreq_stop_cpu,
 	.name = "cppc_cpufreq",
-- 
1.8.3.1
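The formula from the commit message, delivered_perf = reference_perf X (delta delivered / delta reference), can be exercised with plain integers. The sketch below is a user-space illustration only: `struct fb_sample` and `delivered_perf()` are hypothetical stand-ins for the kernel's `struct cppc_perf_fb_ctrs` and the patch's helper, and the fallback value is passed in as a parameter. One deliberate difference from the patch: the condition `delta_reference || delta_delivered` is true whenever either delta is non-zero, so it would still reach the division if only `delta_reference` were zero; the sketch guards the divisor alone.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct cppc_perf_fb_ctrs */
struct fb_sample {
	uint64_t reference;      /* reference performance counter */
	uint64_t delivered;      /* delivered performance counter */
	uint64_t reference_perf; /* reference performance level */
};

/*
 * Compute delivered performance from two counter snapshots.
 * Falls back to desired_perf when the reference counter did not advance
 * (for example because the counters paused), to avoid dividing by zero.
 */
static uint64_t delivered_perf(struct fb_sample t0, struct fb_sample t1,
			       uint64_t desired_perf)
{
	/* 32-bit modular deltas, as in the v3 patch */
	uint32_t delta_reference = (uint32_t)t1.reference - (uint32_t)t0.reference;
	uint32_t delta_delivered = (uint32_t)t1.delivered - (uint32_t)t0.delivered;

	if (!delta_reference)
		return desired_perf;

	return (t0.reference_perf * delta_delivered) / delta_reference;
}
```

For example, with reference_perf = 100, a reference delta of 1000 and a delivered delta of 1500, the result is 150, i.e. the CPU delivered 1.5 times the reference performance level over the sampling window.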
Re: [v2] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
Hi JC,

Thanks for the review.

On 06/20/2018 02:09 AM, Jayachandran C wrote:
> Hi George,
>
> Few comments on your patch:
>
> On Fri, Jun 15, 2018 at 03:03:15AM -0700, George Cherian wrote:
>> Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance
>> feedback via set of performance counters. To determine the actual
>> performance level delivered over time, OSPM may read a set of
>> performance counters from the Reference Performance Counter Register
>> and the Delivered Performance Counter Register.
>>
>> OSPM calculates the delivered performance over a given time period by
>> taking a beginning and ending snapshot of both the reference and
>> delivered performance counters, and calculating:
>>
>> delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).
>>
>> Implement the above and hook this to the cpufreq->get method.
>>
>> Signed-off-by: George Cherian
>> Acked-by: Viresh Kumar
>> ---
>>  drivers/cpufreq/cppc_cpufreq.c | 71 ++
>>  1 file changed, 71 insertions(+)
>>
>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>> index 3464580..3fe7625 100644
>> --- a/drivers/cpufreq/cppc_cpufreq.c
>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>> @@ -296,10 +296,81 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>>  	return ret;
>>  }
>>
>> +static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
>> +				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
>> +				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
>> +{
>> +	u64 delta_reference, delta_delivered;
>> +	u64 reference_perf, delivered_perf;
>> +
>> +	reference_perf = fb_ctrs_t0.reference_perf;
>> +	if (fb_ctrs_t1.reference > fb_ctrs_t0.reference) {
>> +		delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
>> +	} else {
>> +		/*
>> +		 * Counters would have wrapped-around
>> +		 * We also need to find whether the low level fw
>> +		 * maintains 32 bit or 64 bit counters, to calculate
>> +		 * the correct delta.
>> +		 */
>> +		if (fb_ctrs_t0.reference > (~(u32)0))
>> +			delta_reference = (~((u64)0) - fb_ctrs_t0.reference) +
>> +					fb_ctrs_t1.reference;
>> +		else
>> +			delta_reference = (~((u32)0) - fb_ctrs_t0.reference) +
>> +					fb_ctrs_t1.reference;
>> +	}
>> +
>> +	if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered) {
>> +		delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
>> +	} else {
>> +		/*
>> +		 * Counters would have wrapped-around
>> +		 * We also need to find whether the low level fw
>> +		 * maintains 32 bit or 64 bit counters, to calculate
>> +		 * the correct delta.
>> +		 */
>> +		if (fb_ctrs_t0.delivered > (~(u32)0))
>> +			delta_delivered = (~((u64)0) - fb_ctrs_t0.delivered) +
>> +					fb_ctrs_t1.delivered;
>> +		else
>> +			delta_delivered = (~((u32)0) - fb_ctrs_t0.delivered) +
>> +					fb_ctrs_t1.delivered;
>> +	}
>
> Having this code repeated twice does not look great. Also the math here
> is not correct, since (~0 - val2 + val1) is off by one. Because of binary
> representation, unsigned subtraction will work even if val2 < val1.
>
> So cleaner way would be to do:
>
> static inline u64 ts_sub(u64 t1, u64 t0)
> {
>	if (t1 > t0 || t0 > ~(u32)0)
>		return t1 - t0;
>
>	return (u32)t1 - (u32)t0;
> }
>
> And then use ts_sub in both places above.

I was actually thinking to replace the whole comparison with a single line
irrespective of rollover or not. It will look something like this.

delta = (u32)(((1UL << 32) - t0) + t1);

This will also take care of the value being off by one.

> JC.

Regards,
-George
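The off-by-one raised in the review is easy to see with concrete numbers. The following user-space sketch compares three ways of computing the delta for a 32-bit counter; the function names are hypothetical and chosen for illustration, and `1ULL` is used instead of `1UL` so that the shift is well defined even where `long` is 32 bits wide.

```c
#include <stdint.h>

/*
 * v2-style wrap handling for a 32-bit counter: (~0 - t0) + t1.
 * This misses the increment that takes the counter from 0xFFFFFFFF to 0.
 */
static uint64_t delta_v2_style(uint32_t t1, uint32_t t0)
{
	return (uint64_t)(UINT32_MAX - t0) + t1;
}

/* Plain unsigned subtraction modulo 2^32: exact across a single wrap */
static uint64_t delta_modular(uint32_t t1, uint32_t t0)
{
	return (uint32_t)(t1 - t0);
}

/*
 * Single-line form from the reply; the truncation to 32 bits makes it
 * equivalent to the modular subtraction, wrapped or not.
 */
static uint64_t delta_one_liner(uint64_t t1, uint64_t t0)
{
	return (uint32_t)(((1ULL << 32) - t0) + t1);
}
```

Sampling 0xFFFFFFFE and then 1 spans three increments (to 0xFFFFFFFF, to 0, to 1). The v2-style expression gives 2, while both the modular subtraction and the single-line form give 3. Note that the single-line form truncates to 32 bits, so it is only suitable if the counters are known to be 32 bits wide or the true delta stays below 2^32.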
Re: [PATCH v2] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
Hi Prakash,

Thanks for the review.

On 06/19/2018 01:51 AM, Prakash, Prashanth wrote:
> Hi George,
>
> On 6/15/2018 4:03 AM, George Cherian wrote:
>> Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance
>> feedback via set of performance counters. To determine the actual
>> performance level delivered over time, OSPM may read a set of
>> performance counters from the Reference Performance Counter Register
>> and the Delivered Performance Counter Register.
>>
>> OSPM calculates the delivered performance over a given time period by
>> taking a beginning and ending snapshot of both the reference and
>> delivered performance counters, and calculating:
>>
>> delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).
>>
>> Implement the above and hook this to the cpufreq->get method.
>>
>> Signed-off-by: George Cherian
>> Acked-by: Viresh Kumar
>> ---
>>  drivers/cpufreq/cppc_cpufreq.c | 71 ++
>>  1 file changed, 71 insertions(+)
>>
>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>> index 3464580..3fe7625 100644
>> --- a/drivers/cpufreq/cppc_cpufreq.c
>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>> @@ -296,10 +296,81 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>>  	return ret;
>>  }
>>
>> +static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
>> +				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
>> +				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
>> +{
>> +	u64 delta_reference, delta_delivered;
>> +	u64 reference_perf, delivered_perf;
>> +
>> +	reference_perf = fb_ctrs_t0.reference_perf;
>> +	if (fb_ctrs_t1.reference > fb_ctrs_t0.reference) {
>> +		delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
>> +	} else {
>
> There should be another if () here to check if the reference counters
> are equal. We cannot assume there was an overflow when the counters are
> equal. As I mentioned on last patch, the counters *may* pause in idle
> states.

My bad... I somehow overlooked that point.
In case of delta_reference being zero there is actually a check below to
avoid divide-by-zero. There I returned reference perf instead of desired
perf; I will take care of that in v3. Isn't that sufficient, or is there a
need for an explicit check here for delta = zero?

Moreover, I am planning to replace the delta calculation with a single-line
comparison in v3 for both the normal and overflow case.

>> +		/*
>> +		 * Counters would have wrapped-around
>> +		 * We also need to find whether the low level fw
>> +		 * maintains 32 bit or 64 bit counters, to calculate
>> +		 * the correct delta.
>> +		 */
>> +		if (fb_ctrs_t0.reference > (~(u32)0))
>> +			delta_reference = (~((u64)0) - fb_ctrs_t0.reference) +
>> +					fb_ctrs_t1.reference;
>> +		else
>> +			delta_reference = (~((u32)0) - fb_ctrs_t0.reference) +
>> +					fb_ctrs_t1.reference;
>> +	}
>> +
>> +	if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered) {
>> +		delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
>> +	} else {
>> +		/*
>> +		 * Counters would have wrapped-around
>> +		 * We also need to find whether the low level fw
>> +		 * maintains 32 bit or 64 bit counters, to calculate
>> +		 * the correct delta.
>> +		 */
>> +		if (fb_ctrs_t0.delivered > (~(u32)0))
>> +			delta_delivered = (~((u64)0) - fb_ctrs_t0.delivered) +
>> +					fb_ctrs_t1.delivered;
>> +		else
>> +			delta_delivered = (~((u32)0) - fb_ctrs_t0.delivered) +
>> +					fb_ctrs_t1.delivered;
>> +	}
>> +
>> +	if (delta_reference)	/* Check to avoid divide-by zero */
>> +		delivered_perf = (reference_perf * delta_delivered) /
>> +					delta_reference;
>> +	else
>> +		delivered_perf = reference_perf;
>
> If we cannot compute delivered performance then we should return
> desired/requested perf and not reference_perf.

Noted!!

>> +
>> +	return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
>> +}
>> +
>> +static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
>> +{
>> +	struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
>> +	struct cppc_cpudata *cpu = all_cpu_data[cpunum];
>> +	int ret;
>> +
>> +	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
>> +	if (ret)
>> +		return ret;
>> +
>> +	udelay(2); /* 2usec delay between sampling */
>> +
>> +	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
>> +}
>> +
>>  static struct cpufreq_driver cppc_cpufreq_driver = {
>>  	.flags = CPUFREQ_CONST_LOOPS,
>>  	.verify = cppc_verify_policy,
>>  	.target = cppc_cpufreq_set_target,
>> +	.get = cppc_c
[PATCH v2] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance
feedback via set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian
Acked-by: Viresh Kumar
---
 drivers/cpufreq/cppc_cpufreq.c | 71 ++
 1 file changed, 71 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 3464580..3fe7625 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -296,10 +296,81 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	return ret;
 }

+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
+				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
+				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+	u64 delta_reference, delta_delivered;
+	u64 reference_perf, delivered_perf;
+
+	reference_perf = fb_ctrs_t0.reference_perf;
+	if (fb_ctrs_t1.reference > fb_ctrs_t0.reference) {
+		delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
+	} else {
+		/*
+		 * Counters would have wrapped-around
+		 * We also need to find whether the low level fw
+		 * maintains 32 bit or 64 bit counters, to calculate
+		 * the correct delta.
+		 */
+		if (fb_ctrs_t0.reference > (~(u32)0))
+			delta_reference = (~((u64)0) - fb_ctrs_t0.reference) +
+					fb_ctrs_t1.reference;
+		else
+			delta_reference = (~((u32)0) - fb_ctrs_t0.reference) +
+					fb_ctrs_t1.reference;
+	}
+
+	if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered) {
+		delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
+	} else {
+		/*
+		 * Counters would have wrapped-around
+		 * We also need to find whether the low level fw
+		 * maintains 32 bit or 64 bit counters, to calculate
+		 * the correct delta.
+		 */
+		if (fb_ctrs_t0.delivered > (~(u32)0))
+			delta_delivered = (~((u64)0) - fb_ctrs_t0.delivered) +
+					fb_ctrs_t1.delivered;
+		else
+			delta_delivered = (~((u32)0) - fb_ctrs_t0.delivered) +
+					fb_ctrs_t1.delivered;
+	}
+
+	if (delta_reference)	/* Check to avoid divide-by zero */
+		delivered_perf = (reference_perf * delta_delivered) /
+					delta_reference;
+	else
+		delivered_perf = reference_perf;
+
+	return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+	struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+	struct cppc_cpudata *cpu = all_cpu_data[cpunum];
+	int ret;
+
+	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+	if (ret)
+		return ret;
+
+	udelay(2); /* 2usec delay between sampling */
+
+	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+	if (ret)
+		return ret;
+
+	return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
+}
+
 static struct cpufreq_driver cppc_cpufreq_driver = {
 	.flags = CPUFREQ_CONST_LOOPS,
 	.verify = cppc_verify_policy,
 	.target = cppc_cpufreq_set_target,
+	.get = cppc_cpufreq_get_rate,
 	.init = cppc_cpufreq_cpu_init,
 	.stop_cpu = cppc_cpufreq_stop_cpu,
 	.name = "cppc_cpufreq",
-- 
2.7.4
Re: [PATCH] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
Hi Prashanth, On 05/29/2018 09:14 PM, Prakash, Prashanth wrote: On 5/28/2018 1:09 AM, George Cherian wrote: Hi Prashanth, On 05/26/2018 02:30 AM, Prakash, Prashanth wrote: On 5/25/2018 12:27 AM, George Cherian wrote: Hi Prashanth, On 05/25/2018 12:55 AM, Prakash, Prashanth wrote: Hi George, On 5/22/2018 5:42 AM, George Cherian wrote: Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance feedback via set of performance counters. To determine the actual performance level delivered over time, OSPM may read a set of performance counters from the Reference Performance Counter Register and the Delivered Performance Counter Register. OSPM calculates the delivered performance over a given time period by taking a beginning and ending snapshot of both the reference and delivered performance counters, and calculating: delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter). Implement the above and hook this to the cpufreq->get method. Signed-off-by: George Cherian --- drivers/cpufreq/cppc_cpufreq.c | 44 ++ 1 file changed, 44 insertions(+) diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index b15115a..a046915 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -240,10 +240,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) return ret; } +static int cppc_get_rate_from_fbctrs(struct cppc_perf_fb_ctrs fb_ctrs_t0, + struct cppc_perf_fb_ctrs fb_ctrs_t1) +{ + u64 delta_reference, delta_delivered; + u64 reference_perf, ratio; + + reference_perf = fb_ctrs_t0.reference_perf; + if (fb_ctrs_t1.reference > fb_ctrs_t0.reference) + delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference; + else /* Counters would have wrapped-around */ + delta_reference = ((u64)(~((u64)0)) - fb_ctrs_t0.reference) + + fb_ctrs_t1.reference; + + if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered) + delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered; + else /* 
Counters would have wrapped-around */ + delta_delivered = ((u64)(~((u64)0)) - fb_ctrs_t0.delivered) + + fb_ctrs_t1.delivered; We need to check that the wraparound time is long enough to make sure that the counters cannot wrap around more than once. We can register a get() api only after checking that wraparound time value is reasonably high. I am not aware of any platforms where wraparound time is soo short, but wouldn't hurt to check once during init. By design the wraparound time is a 64 bit counter, for that matter even all the feedback counters too are 64 bit counters. I don't see any chance in which the counters can wraparound twice in back to back reads. The only situation is in which system itself is running at a really high frequency. Even in that case today's spec is not sufficient to support the same. The spec doesn't say these have to be 64bit registers. The wraparound counter register is in spec to communicate the worst case(shortest) counter rollover time. Spec says these are 32 or 64 bit registers. Spec also defines counter wraparound time in seconds. The minimum value possible is 1 as zero means the counters are never assumed to wrap around. Even in platforms with value set as 1 (1 sec) I dont really see a situation in which the counter can wraparound twice if we are putting a delay of 2usec between sampling. ok. Thanks As as mentioned before this is just a defensive check to make sure that the platform has not set it to some very low number (which is allowed by the spec). It might be unnecessary to have a check like this. + + if (delta_reference) /* Check to avoid divide-by zero */ + ratio = (delta_delivered * 1000) / delta_reference; Why not just return the computed value here instead of *1000 and later /1000? return (ref_per * delta_del) / delta_ref; Yes. + else + return -EINVAL; Instead of EINVAL, i think we should return current frequency. Sorry, I didn't get you, How do you calculate the current frequency? Did you mean reference performance? 
I mean the performance that OSPM/Linux had requested earlier. i.e the desired_perf Okay, I will make necessary changes for this in v2. The counters can pause if CPUs are in idle state during our sampling interval, so If the counters did not progress, it is reasonable to assume the delivered perf was equal to desired perf. No, that is wrong. Here the check is for reference performance delta. This counter can never pause. In case of cpuidle only the delivered counters could pause. Delivered counters will pause only if the particular core enters power down mode, Otherwise we would be still clocking the core and we should be getting a delta across 2 sampling periods. In case if the reference counter is paused which
Re: [PATCH] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
Hi Prashanth, On 05/26/2018 02:30 AM, Prakash, Prashanth wrote: On 5/25/2018 12:27 AM, George Cherian wrote: Hi Prashanth, On 05/25/2018 12:55 AM, Prakash, Prashanth wrote: Hi George, On 5/22/2018 5:42 AM, George Cherian wrote: Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance feedback via set of performance counters. To determine the actual performance level delivered over time, OSPM may read a set of performance counters from the Reference Performance Counter Register and the Delivered Performance Counter Register. OSPM calculates the delivered performance over a given time period by taking a beginning and ending snapshot of both the reference and delivered performance counters, and calculating: delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter). Implement the above and hook this to the cpufreq->get method. Signed-off-by: George Cherian <george.cher...@cavium.com> --- drivers/cpufreq/cppc_cpufreq.c | 44 ++ 1 file changed, 44 insertions(+) diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index b15115a..a046915 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -240,10 +240,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) return ret; } +static int cppc_get_rate_from_fbctrs(struct cppc_perf_fb_ctrs fb_ctrs_t0, + struct cppc_perf_fb_ctrs fb_ctrs_t1) +{ + u64 delta_reference, delta_delivered; + u64 reference_perf, ratio; + + reference_perf = fb_ctrs_t0.reference_perf; + if (fb_ctrs_t1.reference > fb_ctrs_t0.reference) + delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference; + else /* Counters would have wrapped-around */ + delta_reference = ((u64)(~((u64)0)) - fb_ctrs_t0.reference) + + fb_ctrs_t1.reference; + + if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered) + delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered; + else /* Counters would have wrapped-around */ + delta_delivered = ((u64)(~((u64)0)) - 
fb_ctrs_t0.delivered) + + fb_ctrs_t1.delivered; We need to check that the wraparound time is long enough to make sure that the counters cannot wrap around more than once. We can register a get() api only after checking that wraparound time value is reasonably high. I am not aware of any platforms where wraparound time is soo short, but wouldn't hurt to check once during init. By design the wraparound time is a 64 bit counter, for that matter even all the feedback counters too are 64 bit counters. I don't see any chance in which the counters can wraparound twice in back to back reads. The only situation is in which system itself is running at a really high frequency. Even in that case today's spec is not sufficient to support the same. The spec doesn't say these have to be 64bit registers. The wraparound counter register is in spec to communicate the worst case(shortest) counter rollover time. Spec says these are 32 or 64 bit registers. Spec also defines counter wraparound time in seconds. The minimum value possible is 1 as zero means the counters are never assumed to wrap around. Even in platforms with value set as 1 (1 sec) I dont really see a situation in which the counter can wraparound twice if we are putting a delay of 2usec between sampling. As as mentioned before this is just a defensive check to make sure that the platform has not set it to some very low number (which is allowed by the spec). It might be unnecessary to have a check like this. + + if (delta_reference) /* Check to avoid divide-by zero */ + ratio = (delta_delivered * 1000) / delta_reference; Why not just return the computed value here instead of *1000 and later /1000? return (ref_per * delta_del) / delta_ref; Yes. + else + return -EINVAL; Instead of EINVAL, i think we should return current frequency. Sorry, I didn't get you, How do you calculate the current frequency? Did you mean reference performance? I mean the performance that OSPM/Linux had requested earlier. 
i.e the desired_perf Okay, I will make necessary changes for this in v2. The counters can pause if CPUs are in idle state during our sampling interval, so If the counters did not progress, it is reasonable to assume the delivered perf was equal to desired perf. No, that is wrong. Here the check is for reference performance delta. This counter can never pause. In case of cpuidle only the delivered counters could pause. Delivered counters will pause only if the particular core enters power down mode, Otherwise we would be still clocking the core and we should be getting a delta across 2 sampling periods. In case if the reference counter is paused which by design is not correct then there is no point in returning reference performance numbers. That too is
Re: [PATCH] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
Hi Prashanth,

On 05/25/2018 12:55 AM, Prakash, Prashanth wrote:
> Hi George,
>
> On 5/22/2018 5:42 AM, George Cherian wrote:
>> Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance
>> feedback via set of performance counters. To determine the actual
>> performance level delivered over time, OSPM may read a set of
>> performance counters from the Reference Performance Counter Register
>> and the Delivered Performance Counter Register.
>>
>> OSPM calculates the delivered performance over a given time period by
>> taking a beginning and ending snapshot of both the reference and
>> delivered performance counters, and calculating:
>>
>>   delivered_perf = reference_perf X (delta of delivered_perf counter /
>>                    delta of reference_perf counter).
>>
>> Implement the above and hook this to the cpufreq->get method.
>>
>> Signed-off-by: George Cherian <george.cher...@cavium.com>
>> ---
>>  drivers/cpufreq/cppc_cpufreq.c | 44 ++
>>  1 file changed, 44 insertions(+)
>>
>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>> index b15115a..a046915 100644
>> --- a/drivers/cpufreq/cppc_cpufreq.c
>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>> @@ -240,10 +240,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>>  	return ret;
>>  }
>>
>> +static int cppc_get_rate_from_fbctrs(struct cppc_perf_fb_ctrs fb_ctrs_t0,
>> +				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
>> +{
>> +	u64 delta_reference, delta_delivered;
>> +	u64 reference_perf, ratio;
>> +
>> +	reference_perf = fb_ctrs_t0.reference_perf;
>> +	if (fb_ctrs_t1.reference > fb_ctrs_t0.reference)
>> +		delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
>> +	else /* Counters would have wrapped-around */
>> +		delta_reference = ((u64)(~((u64)0)) - fb_ctrs_t0.reference) +
>> +					fb_ctrs_t1.reference;
>> +
>> +	if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered)
>> +		delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
>> +	else /* Counters would have wrapped-around */
>> +		delta_delivered = ((u64)(~((u64)0)) - fb_ctrs_t0.delivered) +
>> +					fb_ctrs_t1.delivered;
>
> We need to check that the wraparound time is long enough to make sure
> that the counters cannot wrap around more than once. We can register a
> get() api only after checking that wraparound time value is reasonably
> high.
>
> I am not aware of any platforms where wraparound time is soo short,
> but wouldn't hurt to check once during init.

By design the wraparound time is a 64 bit counter, for that matter even
all the feedback counters too are 64 bit counters. I don't see any
chance in which the counters can wraparound twice in back to back
reads. The only situation is in which system itself is running at a
really high frequency. Even in that case today's spec is not sufficient
to support the same.

>> +
>> +	if (delta_reference)  /* Check to avoid divide-by zero */
>> +		ratio = (delta_delivered * 1000) / delta_reference;
>
> Why not just return the computed value here instead of *1000 and later
> /1000?
>     return (ref_per * delta_del) / delta_ref;

Yes.

>> +	else
>> +		return -EINVAL;
>
> Instead of EINVAL, i think we should return current frequency.

Sorry, I didn't get you, How do you calculate the current frequency?
Did you mean reference performance?

> The counters can pause if CPUs are in idle state during our sampling
> interval, so If the counters did not progress, it is reasonable to
> assume the delivered perf was equal to desired perf.

No, that is wrong. Here the check is for reference performance delta.
This counter can never pause. In case of cpuidle only the delivered
counters could pause. Delivered counters will pause only if the
particular core enters power down mode, Otherwise we would be still
clocking the core and we should be getting a delta across 2 sampling
periods. In case if the reference counter is paused which by design is
not correct then there is no point in returning reference performance
numbers. That too is wrong. In case the low level FW is not updating
the counters properly then it should be evident till Linux, instead of
returning a bogus frequency.

> Even if platform wanted to limit, since the CPUs were asleep(idle) we
> could not have observed lower performance, so we will not throw off
> any logic that could be driven using the returned value.

>> +
>> +	return (reference_perf * ratio) / 1000;
>
> This should be converted to KHz as cpufreq is not aware of CPPC
> abstract scale

In our platform all performance registers are implemented in KHz.
Because of which we never had an issue with conversion. I am not aware
whether ACPI mandates to use any particular unit. How is that
implemented in your platform? Just to avoid any extra conversion don't
you feel it is better to always report in KHz from firmware.

>> +}
>> +
>> +static unsigned int cppc_cpufreq_get_rate(unsi
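The review comment about dropping the "*1000 ... /1000" step can be illustrated with a small userspace sketch. The function names perf_via_scaled_ratio() and perf_direct() are hypothetical and the input values are made up for illustration; the two bodies mirror the arithmetic of the original patch and of the suggested one-step form respectively.

```c
#include <stdint.h>

/*
 * Arithmetic of the first patch revision: scale the counter ratio by
 * 1000, truncate it to an integer, then scale the result back down.
 */
static uint64_t perf_via_scaled_ratio(uint64_t reference_perf,
				      uint64_t delta_delivered,
				      uint64_t delta_reference)
{
	uint64_t ratio = (delta_delivered * 1000) / delta_reference;

	return (reference_perf * ratio) / 1000;
}

/*
 * Arithmetic suggested in review: multiply first and divide once, so
 * only a single truncation happens.
 */
static uint64_t perf_direct(uint64_t reference_perf,
			    uint64_t delta_delivered,
			    uint64_t delta_reference)
{
	return (reference_perf * delta_delivered) / delta_reference;
}
```

With reference_perf = 2000000, delta_delivered = 1 and delta_reference = 3, the scaled-ratio form truncates the ratio to 333 and returns 666000, while the direct form returns 666666. Both helpers assume delta_reference is non-zero; the caller is expected to check that first, as the patch does.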
[PATCH] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance
feedback via set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

  delivered_perf = reference_perf X (delta of delivered_perf counter /
                   delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian <george.cher...@cavium.com>
---
 drivers/cpufreq/cppc_cpufreq.c | 44 ++
 1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index b15115a..a046915 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -240,10 +240,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	return ret;
 }
 
+static int cppc_get_rate_from_fbctrs(struct cppc_perf_fb_ctrs fb_ctrs_t0,
+				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+	u64 delta_reference, delta_delivered;
+	u64 reference_perf, ratio;
+
+	reference_perf = fb_ctrs_t0.reference_perf;
+	if (fb_ctrs_t1.reference > fb_ctrs_t0.reference)
+		delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
+	else /* Counters would have wrapped-around */
+		delta_reference = ((u64)(~((u64)0)) - fb_ctrs_t0.reference) +
+					fb_ctrs_t1.reference;
+
+	if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered)
+		delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
+	else /* Counters would have wrapped-around */
+		delta_delivered = ((u64)(~((u64)0)) - fb_ctrs_t0.delivered) +
+					fb_ctrs_t1.delivered;
+
+	if (delta_reference)  /* Check to avoid divide-by zero */
+		ratio = (delta_delivered * 1000) / delta_reference;
+	else
+		return -EINVAL;
+
+	return (reference_perf * ratio) / 1000;
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+	struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+	int ret;
+
+	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+	if (ret)
+		return ret;
+
+	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+	if (ret)
+		return ret;
+
+	return cppc_get_rate_from_fbctrs(fb_ctrs_t0, fb_ctrs_t1);
+}
+
 static struct cpufreq_driver cppc_cpufreq_driver = {
 	.flags = CPUFREQ_CONST_LOOPS,
 	.verify = cppc_verify_policy,
 	.target = cppc_cpufreq_set_target,
+	.get = cppc_cpufreq_get_rate,
 	.init = cppc_cpufreq_cpu_init,
 	.stop_cpu = cppc_cpufreq_stop_cpu,
 	.name = "cppc_cpufreq",
-- 
1.8.3.1
[PATCH] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via a set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian
---
 drivers/cpufreq/cppc_cpufreq.c | 44 ++
 1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index b15115a..a046915 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -240,10 +240,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	return ret;
 }
 
+static int cppc_get_rate_from_fbctrs(struct cppc_perf_fb_ctrs fb_ctrs_t0,
+				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+	u64 delta_reference, delta_delivered;
+	u64 reference_perf, ratio;
+
+	reference_perf = fb_ctrs_t0.reference_perf;
+	if (fb_ctrs_t1.reference > fb_ctrs_t0.reference)
+		delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
+	else /* Counters would have wrapped-around */
+		delta_reference = ((u64)(~((u64)0)) - fb_ctrs_t0.reference) +
+					fb_ctrs_t1.reference;
+
+	if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered)
+		delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
+	else /* Counters would have wrapped-around */
+		delta_delivered = ((u64)(~((u64)0)) - fb_ctrs_t0.delivered) +
+					fb_ctrs_t1.delivered;
+
+	if (delta_reference) /* Check to avoid divide-by zero */
+		ratio = (delta_delivered * 1000) / delta_reference;
+	else
+		return -EINVAL;
+
+	return (reference_perf * ratio) / 1000;
+}
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+	struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+	int ret;
+
+	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+	if (ret)
+		return ret;
+
+	ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+	if (ret)
+		return ret;
+
+	return cppc_get_rate_from_fbctrs(fb_ctrs_t0, fb_ctrs_t1);
+}
+
 static struct cpufreq_driver cppc_cpufreq_driver = {
 	.flags = CPUFREQ_CONST_LOOPS,
 	.verify = cppc_verify_policy,
 	.target = cppc_cpufreq_set_target,
+	.get = cppc_cpufreq_get_rate,
 	.init = cppc_cpufreq_cpu_init,
 	.stop_cpu = cppc_cpufreq_stop_cpu,
 	.name = "cppc_cpufreq",
--
1.8.3.1
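The calculation described in the commit message above can be sketched outside the kernel roughly as follows. This is only an illustration, not the driver code: struct fb_snapshot and delivered_perf() are hypothetical names. The sketch relies on unsigned 64-bit subtraction being modulo 2^64, so a single counter wrap between the two snapshots needs no special case, and it multiplies before dividing, which assumes reference_perf * delta_delivered fits in 64 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for one snapshot of the feedback counters. */
struct fb_snapshot {
	uint64_t reference;      /* reference performance counter */
	uint64_t delivered;      /* delivered performance counter */
	uint64_t reference_perf; /* reference performance level */
};

/*
 * delivered_perf = reference_perf * (delta delivered / delta reference).
 * Returns 0 and stores the result in *perf, or -1 if the reference
 * counter did not advance (which would mean dividing by zero).
 */
int delivered_perf(const struct fb_snapshot *t0,
		   const struct fb_snapshot *t1, uint64_t *perf)
{
	/* Unsigned subtraction wraps modulo 2^64, covering one wrap-around. */
	uint64_t delta_reference = t1->reference - t0->reference;
	uint64_t delta_delivered = t1->delivered - t0->delivered;

	if (delta_reference == 0)
		return -1;

	/* Assumes the product does not overflow 64 bits. */
	*perf = (t0->reference_perf * delta_delivered) / delta_reference;
	return 0;
}
```

With a reference_perf of 1000, a reference delta of 100 and a delivered delta of 50, the sketch yields 500, i.e. the CPU delivered half of the reference performance over the sampling window.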
[PATCH 1/4] i2c: xlp9xx: Add support for SMBAlert
Add support for SMBus alert mechanism to i2c-xlp9xx driver.
The second interrupt is parsed to use for SMBus alert.
The first interrupt is the i2c controller main interrupt.

Signed-off-by: Kamlakant Patel <kamlakant.pa...@cavium.com>
Signed-off-by: George Cherian <george.cher...@cavium.com>
Reviewed-by: Jan Glauber <jglau...@cavium.com>
---
 drivers/i2c/busses/i2c-xlp9xx.c | 24
 1 file changed, 24 insertions(+)

diff --git a/drivers/i2c/busses/i2c-xlp9xx.c b/drivers/i2c/busses/i2c-xlp9xx.c
index eb8913e..fe54512 100644
--- a/drivers/i2c/busses/i2c-xlp9xx.c
+++ b/drivers/i2c/busses/i2c-xlp9xx.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -84,6 +85,8 @@ struct xlp9xx_i2c_dev {
 	struct device *dev;
 	struct i2c_adapter adapter;
 	struct completion msg_complete;
+	struct i2c_smbus_alert_setup alert_data;
+	struct i2c_client *ara;
 	int irq;
 	bool msg_read;
 	bool len_recv;
@@ -447,6 +450,19 @@ static int xlp9xx_i2c_get_frequency(struct platform_device *pdev,
 	return 0;
 }
 
+static int xlp9xx_i2c_smbus_setup(struct xlp9xx_i2c_dev *priv,
+				  struct platform_device *pdev)
+{
+	if (!priv->alert_data.irq)
+		return -EINVAL;
+
+	priv->ara = i2c_setup_smbus_alert(&priv->adapter, &priv->alert_data);
+	if (!priv->ara)
+		return -ENODEV;
+
+	return 0;
+}
+
 static int xlp9xx_i2c_probe(struct platform_device *pdev)
 {
 	struct xlp9xx_i2c_dev *priv;
@@ -467,6 +483,10 @@ static int xlp9xx_i2c_probe(struct platform_device *pdev)
 		dev_err(&pdev->dev, "invalid irq!\n");
 		return priv->irq;
 	}
+	/* SMBAlert irq */
+	priv->alert_data.irq = platform_get_irq(pdev, 1);
+	if (priv->alert_data.irq <= 0)
+		priv->alert_data.irq = 0;
 
 	xlp9xx_i2c_get_frequency(pdev, priv);
 	xlp9xx_i2c_init(priv);
@@ -493,6 +513,10 @@ static int xlp9xx_i2c_probe(struct platform_device *pdev)
 	if (err)
 		return err;
 
+	err = xlp9xx_i2c_smbus_setup(priv, pdev);
+	if (err)
+		dev_dbg(&pdev->dev, "No active SMBus alert %d\n", err);
+
 	platform_set_drvdata(pdev, priv);
 	dev_dbg(&pdev->dev, "I2C bus:%d added\n", priv->adapter.nr);
--
1.8.3.1
[PATCH 3/4] i2c: xlp9xx: Make sure the transfer size is not more than I2C_SMBUS_BLOCK_SIZE
For SMBus transactions the max permissible transfer size is
I2C_SMBUS_BLOCK_SIZE. It is possible that some clients might not follow
it strictly occasionally. This would lead to stack corruption if the
driver copies more than I2C_SMBUS_BLOCK_SIZE bytes.
Add a check to avoid such conditions.

Signed-off-by: Jayachandran C <jn...@caviumnetworks.com>
Signed-off-by: George Cherian <george.cher...@cavium.com>
---
 drivers/i2c/busses/i2c-xlp9xx.c | 37 -
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/i2c/busses/i2c-xlp9xx.c b/drivers/i2c/busses/i2c-xlp9xx.c
index c268fde..1f41a4f 100644
--- a/drivers/i2c/busses/i2c-xlp9xx.c
+++ b/drivers/i2c/busses/i2c-xlp9xx.c
@@ -172,6 +172,8 @@ static void xlp9xx_i2c_update_rlen(struct xlp9xx_i2c_dev *priv)
 	len = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_FIFOWCNT) &
 				  XLP9XX_I2C_FIFO_WCNT_MASK;
 	len = max_t(u32, priv->msg_len, len + 4);
+	if (len >= I2C_SMBUS_BLOCK_MAX + 2)
+		return;
 	val = (val & ~XLP9XX_I2C_CTRL_MCTLEN_MASK) |
 			(len << XLP9XX_I2C_CTRL_MCTLEN_SHIFT);
 	xlp9xx_write_i2c_reg(priv, XLP9XX_I2C_CTRL, val);
@@ -189,14 +191,20 @@ static void xlp9xx_i2c_drain_rx_fifo(struct xlp9xx_i2c_dev *priv)
 	if (priv->len_recv) {
 		/* read length byte */
 		rlen = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
-		*buf++ = rlen;
-		if (priv->client_pec)
-			++rlen;
-		/* update remaining bytes and message length */
-		priv->msg_buf_remaining = rlen;
-		priv->msg_len = rlen + 1;
-		priv->len_recv = false;
+		if (rlen > I2C_SMBUS_BLOCK_MAX || rlen == 0) {
+			rlen = 0; /*abort transfer */
+			priv->msg_buf_remaining = 0;
+			priv->msg_len = 0;
+		} else {
+			*buf++ = rlen;
+			if (priv->client_pec)
+				++rlen; /* account for error check byte */
+			/* update remaining bytes and message length */
+			priv->msg_buf_remaining = rlen;
+			priv->msg_len = rlen + 1;
+		}
 		xlp9xx_i2c_update_rlen(priv);
+		priv->len_recv = false;
 	} else {
 		len = min(priv->msg_buf_remaining, len);
 		for (i = 0; i < len; i++, buf++)
@@ -315,10 +323,6 @@ static int xlp9xx_i2c_xfer_msg(struct xlp9xx_i2c_dev *priv, struct i2c_msg *msg,
 	xlp9xx_write_i2c_reg(priv, XLP9XX_I2C_MFIFOCTRL,
 			     XLP9XX_I2C_MFIFOCTRL_RST);
 
-	/* set FIFO threshold if reading */
-	if (priv->msg_read)
-		xlp9xx_i2c_update_rx_fifo_thres(priv);
-
 	/* set slave addr */
 	xlp9xx_write_i2c_reg(priv, XLP9XX_I2C_SLAVEADDR,
 			     (msg->addr << XLP9XX_I2C_SLAVEADDR_ADDR_SHIFT) |
@@ -337,9 +341,13 @@ static int xlp9xx_i2c_xfer_msg(struct xlp9xx_i2c_dev *priv, struct i2c_msg *msg,
 		val &= ~XLP9XX_I2C_CTRL_ADDMODE;
 
 	priv->len_recv = msg->flags & I2C_M_RECV_LEN;
-	len = priv->len_recv ? XLP9XX_I2C_FIFO_SIZE : msg->len;
+	len = priv->len_recv ? I2C_SMBUS_BLOCK_MAX + 2 : msg->len;
 	priv->client_pec = msg->flags & I2C_CLIENT_PEC;
 
+	/* set FIFO threshold if reading */
+	if (priv->msg_read)
+		xlp9xx_i2c_update_rx_fifo_thres(priv);
+
 	/* set data length to be transferred */
 	val = (val & ~XLP9XX_I2C_CTRL_MCTLEN_MASK) |
 		(len << XLP9XX_I2C_CTRL_MCTLEN_SHIFT);
@@ -393,8 +401,11 @@ static int xlp9xx_i2c_xfer_msg(struct xlp9xx_i2c_dev *priv, struct i2c_msg *msg,
 	}
 
 	/* update msg->len with actual received length */
-	if (msg->flags & I2C_M_RECV_LEN)
+	if (msg->flags & I2C_M_RECV_LEN) {
+		if (!priv->msg_len)
+			return -EPROTO;
 		msg->len = priv->msg_len;
+	}
 
 	return 0;
 }
--
1.8.3.1
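The length check in the patch above can be modelled in isolation along these lines. This is a rough standalone sketch, not driver code: smbus_block_msg_len() and SMBUS_BLOCK_MAX are hypothetical names, with SMBUS_BLOCK_MAX set to 32 to mirror the kernel's I2C_SMBUS_BLOCK_MAX. It returns the total number of bytes the message will occupy (length byte, data, optional PEC byte), or -1 when the length byte reported by the client is out of range.

```c
#include <stdint.h>

#define SMBUS_BLOCK_MAX 32 /* mirrors the kernel's I2C_SMBUS_BLOCK_MAX */

/*
 * Validate the length byte of an SMBus block read.
 * Returns length byte + data bytes + optional PEC byte,
 * or -1 if rlen is 0 or larger than SMBUS_BLOCK_MAX.
 */
int smbus_block_msg_len(uint8_t rlen, int pec)
{
	if (rlen == 0 || rlen > SMBUS_BLOCK_MAX)
		return -1;

	return 1 + rlen + (pec ? 1 : 0);
}
```

The largest valid result is 34, for a 32-byte block with PEC, which matches the I2C_SMBUS_BLOCK_MAX + 2 bound the patch programs as the initial transfer length.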
[PATCH 4/4] i2c: xlp9xx: Add MAINTAINERS entry
The i2c XLP9xx driver is maintained by Cavium.
Add George Cherian and Jan Glauber as the Maintainers.

Signed-off-by: George Cherian <george.cher...@cavium.com>
---
 MAINTAINERS | 8
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index df6e9bb..68da265 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15509,6 +15509,14 @@ L: linux-kernel@vger.kernel.org
 S:	Supported
 F:	drivers/char/xillybus/
 
+XLP9XX I2C DRIVER
+M:	George Cherian <george.cher...@cavium.com>
+M:	Jan Glauber <jglau...@cavium.com>
+L:	linux-...@vger.kernel.org
+W:	http://www.cavium.com
+S:	Supported
+F:	drivers/i2c/busses/i2c-xlp9xx.c
+
 XRA1403 GPIO EXPANDER
 M:	Nandor Han <nandor@ge.com>
 M:	Semi Malinen <semi.mali...@ge.com>
--
1.8.3.1
[PATCH 2/4] i2c: xlp9xx: Fix issue seen when updating receive length
The hardware does not handle updates to the length register gracefully
if the new value is less than the number of bytes received so far. If
this happens, the i2c controller will not stop the receive transaction
properly.

Fix this by ensuring that the updated length is ok. This is done by
making sure that the new length written to hardware is at least a few
bytes more than the bytes received so far. While at it, refactor the
length update into a new function.

Signed-off-by: Jayachandran C <jn...@caviumnetworks.com>
Signed-off-by: George Cherian <george.cher...@cavium.com>
---
 drivers/i2c/busses/i2c-xlp9xx.c | 30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/i2c/busses/i2c-xlp9xx.c b/drivers/i2c/busses/i2c-xlp9xx.c
index fe54512..c268fde 100644
--- a/drivers/i2c/busses/i2c-xlp9xx.c
+++ b/drivers/i2c/busses/i2c-xlp9xx.c
@@ -158,9 +158,28 @@ static void xlp9xx_i2c_fill_tx_fifo(struct xlp9xx_i2c_dev *priv)
 	priv->msg_buf += len;
 }
 
+static void xlp9xx_i2c_update_rlen(struct xlp9xx_i2c_dev *priv)
+{
+	u32 val, len;
+
+	/*
+	 * Update receive length. Re-read len to get the latest value,
+	 * and then add 4 to have a minimum value that can be safely
+	 * written. This is to account for the byte read above, the
+	 * transfer in progress and any delays in the register I/O
+	 */
+	val = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_CTRL);
+	len = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_FIFOWCNT) &
+				  XLP9XX_I2C_FIFO_WCNT_MASK;
+	len = max_t(u32, priv->msg_len, len + 4);
+	val = (val & ~XLP9XX_I2C_CTRL_MCTLEN_MASK) |
+			(len << XLP9XX_I2C_CTRL_MCTLEN_SHIFT);
+	xlp9xx_write_i2c_reg(priv, XLP9XX_I2C_CTRL, val);
+}
+
 static void xlp9xx_i2c_drain_rx_fifo(struct xlp9xx_i2c_dev *priv)
 {
-	u32 len, i, val;
+	u32 len, i;
 	u8 rlen, *buf = priv->msg_buf;
 
 	len = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_FIFOWCNT) &
@@ -171,20 +190,13 @@ static void xlp9xx_i2c_drain_rx_fifo(struct xlp9xx_i2c_dev *priv)
 		/* read length byte */
 		rlen = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_MRXFIFO);
 		*buf++ = rlen;
-		len--;
-
 		if (priv->client_pec)
 			++rlen;
 		/* update remaining bytes and message length */
 		priv->msg_buf_remaining = rlen;
 		priv->msg_len = rlen + 1;
 		priv->len_recv = false;
-
-		/* Update transfer length to read only actual data */
-		val = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_CTRL);
-		val = (val & ~XLP9XX_I2C_CTRL_MCTLEN_MASK) |
-			((rlen + 1) << XLP9XX_I2C_CTRL_MCTLEN_SHIFT);
-		xlp9xx_write_i2c_reg(priv, XLP9XX_I2C_CTRL, val);
+		xlp9xx_i2c_update_rlen(priv);
 	} else {
 		len = min(priv->msg_buf_remaining, len);
 		for (i = 0; i < len; i++, buf++)
--
1.8.3.1
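The two steps in xlp9xx_i2c_update_rlen() above, picking a length that is never behind the bytes already received and then folding it into a register field, can be sketched without hardware as below. The helper names safe_rx_len() and set_field() are hypothetical, and the mask and shift are passed in as parameters rather than taken from the driver's register layout. Unlike the driver, set_field() also masks the shifted value, which is an extra guard added for this sketch.

```c
#include <stdint.h>

#define RX_LEN_MARGIN 4u /* the "+ 4" headroom described in the patch comment */

/* Never program a length smaller than what is already in the FIFO plus a margin. */
uint32_t safe_rx_len(uint32_t msg_len, uint32_t fifo_wcnt)
{
	uint32_t min_len = fifo_wcnt + RX_LEN_MARGIN;

	return msg_len > min_len ? msg_len : min_len;
}

/* Read-modify-write of one bit field inside a 32-bit register value. */
uint32_t set_field(uint32_t reg, uint32_t mask, unsigned int shift,
		   uint32_t value)
{
	return (reg & ~mask) | ((value << shift) & mask);
}
```

For example, with a message length of 6 and 5 bytes already counted in the FIFO, safe_rx_len() returns 9 rather than 6, so the value written stays ahead of the transfer in progress.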
[PATCH 0/4] i2c-xlp9xx Add support for SMBAlert and minor fixes
This series adds the SMBAlert support for i2c-xlp9xx driver and the
following fixes.

Patch 2: Make sure we update the transfer length to a future length.
Patch 3: Restrict the transfer size to I2C_SMBUS_BLOCK_SIZE for transfers
         with I2C_M_RECV_LEN set.
Patch 4: While at it, update the MAINTAINERS file to reflect the current
         maintainers of the driver.

George Cherian (4):
  i2c: xlp9xx: Add support for SMBAlert
  i2c: xlp9xx: Fix issue seen when updating receive length
  i2c: xlp9xx: Make sure the transfer size is not more than
    I2C_SMBUS_BLOCK_SIZE
  i2c: xlp9xx: Add MAINTAINERS entry

 MAINTAINERS                     |  8
 drivers/i2c/busses/i2c-xlp9xx.c | 89 +++--
 2 files changed, 76 insertions(+), 21 deletions(-)

--
1.8.3.1
[PATCH] cpufreq: cppc: Use transition_delay_us depending on the transition_latency
With commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
value to 10 ms") the cpufreq was not honouring the delay passed via
ACPI (PCCT). As a result, on ARM based platforms using CPPC the cpufreq
governor tries to change the frequency of the CPU faster than expected.

This leads to continuous error messages like the following.
" ACPI CPPC: PCC check channel failed. Status=0 "

Earlier (without the above commit) the default transition delay was
taken from the value passed from PCCT. Use the same value provided by
PCCT to set the transition_delay_us.

Fixes: e948bc8fbee0 (cpufreq: Cap the default transition delay value to 10 ms)
Signed-off-by: George Cherian <george.cher...@cavium.com>
---
 drivers/cpufreq/cppc_cpufreq.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index a1c3025..dcb1cb9 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -162,6 +163,8 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	policy->cpuinfo.max_freq = cppc_dmi_max_khz;
 	policy->cpuinfo.transition_latency =
 		cppc_get_transition_latency(cpu_num);
+	policy->transition_delay_us = cppc_get_transition_latency(cpu_num) /
+		NSEC_PER_USEC;
 	policy->shared_type = cpu->shared_type;
 
 	if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY)
--
1.8.3.1
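The unit conversion the patch above relies on is small enough to show on its own. The following is a standalone sketch with a hypothetical helper name, latency_ns_to_delay_us(); NSEC_PER_USEC is redefined locally with the same value as the kernel constant. Note that integer division truncates, so any latency below one microsecond maps to a delay of 0.

```c
#include <stdint.h>

#define NSEC_PER_USEC 1000u /* same value as the kernel constant */

/* Convert a transition latency in nanoseconds to a delay in microseconds. */
uint32_t latency_ns_to_delay_us(uint32_t latency_ns)
{
	return latency_ns / NSEC_PER_USEC;
}
```

A platform that reports a latency of 5,000,000 ns would therefore end up with a transition_delay_us of 5000.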
Re: [PATCH] PCI: Add quirk for Cavium Thunder-X2 PCIe erratum #173
Hi Bjorn,

On 02/22/2018 08:39 PM, Bjorn Helgaas wrote:
On Thu, Feb 22, 2018 at 06:43:34PM +0530, George Cherian wrote:
On 02/22/2018 04:50 AM, Bjorn Helgaas wrote:
On Wed, Feb 21, 2018 at 04:25:08PM +0530, George Cherian wrote:
On 02/21/2018 03:24 PM, Lukas Wunner wrote:
On Wed, Feb 21, 2018 at 02:58:13PM +0530, George Cherian wrote:

I will explain the setup used.
To the Cavium ThunderX RC the following PLX device is connected.
PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch
There is no device connected downstream to the PLX switch.

AFAIU the pcie_port driver probes PLX and enters autosuspend after 100ms
since pci_bridge_d3_possible() returns true. And later pci_sysfs_init()
ends up doing a config access of PLX which fails with a
"synchronous external abort"

Thanks for the details! This one *should* be fixed by this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualization&id=bf6c089ee2ac67eb22c0ff0ac9cc7f9ccd619d90

Any chance you could try that out?

I did try your patch and it works fine on the above failing setup.

Thanks for testing it!

I have found another configuration where this fails.
Following is the configuration:
1) Connected a PCIe Intel i40 card under the root port.
2) unbind the i40 driver and bind with vfio-pci driver.
3) Run lspci in a loop. "lspci -s xx:xx.xx -vvv"

I get the same synchronous external abort.
In this case, during probe the vfio-pci driver moves the device (i40) to
D3hot, provided disable_idle_d3 is not set. lspci tries to do the
config_access which fails with synchronous external abort when the root
port transitions to D3hot.

The stack trace for this issue looks like this:
[] pci_generic_config_read+0x5c/0xf0
[] pci_user_read_config_dword+0x84/0x110
[] pci_vpd_read+0x100/0x208
[] pci_read_vpd+0x50/0x68
[] read_vpd_attr+0x60/0x80
[] sysfs_kf_bin_read+0x6c/0xa8
[] kernfs_fop_read+0xa4/0x1c8
[] __vfs_read+0x60/0x170
[] vfs_read+0x8c/0x148
[] SyS_pread64+0xbc/0xd8

I have tried adding pci_config_pm_runtime_get/put pair inside
pci_vpd_read(), which I guess might be needed, in case the device goes
to D3cold. But having said that it didn't fix the problem in our
platform.

Your original patch avoids this problem by setting PCI_DEV_FLAGS_NO_D3
on the root port, so it seems like this must be somehow related to the
root port's state.

This seems to be another issue and is not related to $SUBJECT. Our
Hardware team is internally looking into the same and will keep you
posted of any further details. Thanks for your time and suggestions.

I assume this VPD read is on the i40 device, right? Since you're still
seeing the problem even after calling pci_config_pm_runtime_get(), I
assume the root port is still not in D0. Can you add a little more
instrumentation to read PCI_PM_CTRL and PCI_PM_PPB_EXTENSIONS for the
root port and PCI_PM_CTRL for the i40 device right after you call
pci_config_pm_runtime_get()?

I don't see anything obviously different between the pci_read_config()
path and the pci_vpd_read() path except for the
pci_config_pm_runtime_get() call that you've already added.

I guess you could try using setpci instead of lspci to see if the
failure only happens in the pci_vpd_read() path. I assume that will be
the case because lspci probably does config reads before it does the
VPD read, and those initial config reads seemed to work OK.

The VPD path does do config writes in addition to config reads. Maybe
there's something special about writes, although I don't know what that
would be. You can tell I'm running out of ideas here :)

Bjorn

-George
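For the PCI_PM_CTRL instrumentation suggested in the thread above, the value read back still has to be interpreted. A small decoding sketch, independent of the kernel, could look like this. pm_state_name() and PM_CTRL_STATE_MASK are hypothetical names for illustration; the only fact assumed is that the power state lives in the low two bits of the power management control/status register, which is what the kernel's PCI_PM_CTRL_STATE_MASK (0x0003) selects.

```c
#include <stdint.h>

#define PM_CTRL_STATE_MASK 0x0003u /* low two bits of PMCSR hold the power state */

/* Map a raw PMCSR value to the name of the power state it encodes. */
const char *pm_state_name(uint16_t pmcsr)
{
	static const char *const names[] = { "D0", "D1", "D2", "D3hot" };

	return names[pmcsr & PM_CTRL_STATE_MASK];
}
```

Logging this for both the root port and the endpoint right after the runtime PM get call would show whether either device is still outside D0 when the config access is attempted.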