Re: ieee80211.h virtual_map splat

2024-06-25 Thread Jakub Kicinski
On Tue, 25 Jun 2024 09:56:35 +0300 Kalle Valo wrote:
> > Adding netdev to the initial message in the thread.
> > https://lore.kernel.org/all/CAPh3n83zb1PwFBFijJKChBqY95zzpYh=2iPf8tmh=yts6e3...@mail.gmail.com/
> >
> > There was some discussion in the thread, with the observation that the 
> > splat 
> > is fixed by:
> > 2ae5c9248e06 ("wifi: mac80211: Use flexible array in struct 
> > ieee80211_tim_ie")
> >
> > Followed by discussion if this should be backported.
> >
> > Kees said that "netdev [...] maintainers have asked that contributors not 
> > include "Cc: stable" tags, as they want to evaluate for themselves whether 
> > patches should go to stable or not"  
> 
> BTW this rule doesn't apply to wireless subsystem. For wireless patches
> it's ok to "Cc: stable" in patches or anyone can send a request to
> stable maintainers to pick a patch.

It's an old rule. Quoting documentation:

  Stable tree
  ~~~
  
  While it used to be the case that netdev submissions were not supposed
  to carry explicit ``CC: sta...@vger.kernel.org`` tags that is no longer
  the case today. Please follow the standard stable rules in
  :ref:`Documentation/process/stable-kernel-rules.rst `,
  and make sure you include appropriate Fixes tags!
  
See: 
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#stable-tree



Re: [PATCH net-next v6 03/10] net: create a dummy net_device allocator

2024-04-12 Thread Jakub Kicinski
On Thu, 11 Apr 2024 06:59:27 -0700 Breno Leitao wrote:
> +/**
> + * alloc_netdev_dummy - Allocate and initialize a dummy net device.
> + * @sizeof_priv: size of private data to allocate space for
> + */
> +struct net_device *alloc_netdev_dummy(int sizeof_priv)

Sorry, one more round :)

We started using -Wall for kdoc (./scripts/kernel-doc -Wall $files)
recently and it now complains about missing return values...



Re: [PATCH net-next v5 00/10] allocate dummy device dynamically

2024-04-11 Thread Jakub Kicinski
On Wed, 10 Apr 2024 06:13:41 -0700 Breno Leitao wrote:
>   wifi: ath11k: allocate dummy net_device dynamically

Sorry Breno, I didn't notice earlier, patch 10 didn't make it
to the list. The series wasn't ingested by CI and tested because 
of this. Could you repost?
-- 
pw-bot: cr



Re: [PATCH net-next v4 1/9] net: free_netdev: exit earlier if dummy

2024-04-09 Thread Jakub Kicinski
On Tue,  9 Apr 2024 05:57:15 -0700 Breno Leitao wrote:
> For dummy devices, exit earlier at free_netdev() instead of executing
> the whole function. This is necessary, because dummy devices are
> special, and shouldn't have the second part of the function executed.
> 
> Otherwise reg_state, which is NETREG_DUMMY for dummy devices, will be
> overwritten and there will be no way to identify that this is a dummy
> device. Also, this device do not need the final put_device(), since
> dummy devices are not registered (through register_netdevice()), where
> the device reference is increased (at netdev_register_kobject() ->
> device_add()).

There's a small fuzz when applying due to the phy topo changes
landing, please rebase, the CI didn't ingest it right.
-- 
pw-bot: cr



Re: [PATCH] ath10k: allocate dummy net_device dynamically

2024-03-27 Thread Jakub Kicinski
On Wed, 27 Mar 2024 08:42:55 -0700 Jeff Johnson wrote:
> On 3/27/2024 7:45 AM, Jakub Kicinski wrote:
> > On Wed, 27 Mar 2024 07:38:05 -0700 Breno Leitao wrote:  
> >>   -void init_dummy_netdev(struct net_device *dev)
> >>   +void init_dummy_netdev_core(struct net_device *dev)  
> > 
> > Can init_dummy_netdev_core() be a static function (and no export)?
> > alloc_netdev_dummy() is probably going to be the only user.
> > 
> > I'd also lean towards squashing the two commits into one.  
> 
> And once all of the conversions are complete, won't init_dummy_netdev() no
> longer be used and can be removed?

That's right, it should be removed. But it may take another release
cycle until we can do that, depending on how quickly the patches
propagate :(



Re: [PATCH] ath10k: allocate dummy net_device dynamically

2024-03-27 Thread Jakub Kicinski
On Wed, 27 Mar 2024 07:38:05 -0700 Breno Leitao wrote:
>   -void init_dummy_netdev(struct net_device *dev)
>   +void init_dummy_netdev_core(struct net_device *dev)

Can init_dummy_netdev_core() be a static function (and no export)?
alloc_netdev_dummy() is probably going to be the only user.

I'd also lean towards squashing the two commits into one.



Re: [PATCH] ath10k: allocate dummy net_device dynamically

2024-03-22 Thread Jakub Kicinski
On Fri, 22 Mar 2024 07:58:02 -0700 Breno Leitao wrote:
> > Looks like init_dummy_netdev wipes the netdev structure clean, so I
> > don't think we can use it directly as the setup function, Breno :(  
> 
> Before my patch,  init_dummy_netdev was being also used. The patch was
> basically replacing the init_dummy_netdev by alloc_netdev() with will
> call "setup(dev);" later. 
> 
> -   init_dummy_netdev(&irq_grp->napi_ndev);
> +   irq_grp->napi_ndev = alloc_netdev(0, "dummy", 
> NET_NAME_UNKNOWN,
> + init_dummy_netdev);
> 
> I am wondering if alloc_netdev() is messing with something instead of
> init_dummy_netdev().

alloc_netdev() allocates some memory and initializes lists which
free_netdev() wants to free, basically. But init_dummy_netdev() does:

/* Clear everything. Note we don't initialize spinlocks
 * are they aren't supposed to be taken by any of the
 * NAPI code and this dummy netdev is supposed to be
 * only ever used for NAPI polls
 */
memset(dev, 0, sizeof(struct net_device));

so all those pointers and init alloc_netdev() did before calling setup
will get wiped.

> Also, Kalle's crash is during rmmod, and not during initialization.
> getting NULL after free_netdev() is called.
> 
> > Maybe we should add a new helper to "alloc dummy netdev" which can
> > call alloc_netdev() with right arguments and do necessary init?  
> 
> What are the right arguments in this case?

I'm not sure we have a noop setup() callback today. If you define a
wrapper to allocate a dummy netdev you can define a new empty function
next to it and pass that as init? Hope I got the question right.



Re: [PATCH] ath10k: allocate dummy net_device dynamically

2024-03-21 Thread Jakub Kicinski
On Thu, 21 Mar 2024 15:02:39 -0700 Jeff Johnson wrote:
> >> As suggested there we should just use kmalloc/kfree to match the existing 
> >> logic.  
> > 
> > Please no. There is no magic here. alloc + free must match whether
> > you're using magic object alloc wrapper (alloc_netdev()) or straight
> > up kzalloc().  
> 
> Based upon the ath11k patch there must be something going on with
> alloc_netdev()/free_netdev() that doesn't occur when these aren't used.

Looks like init_dummy_netdev wipes the netdev structure clean, so I
don't think we can use it directly as the setup function, Breno :(
Maybe we should add a new helper to "alloc dummy netdev" which can
call alloc_netdev() with right arguments and do necessary init?

> So I'm just suggesting that instead we use kmalloc() and kfree(), which are
> matching functions, and which, like the existing code, are not subject to
> whatever is happening in alloc_netdev()/free_netdev().
> 
> I don't understand your objection.

Using subsystem APIs to allocate its objects is preferable to ad hoc
kmalloc().

We're working upstream, basic alloc/free of an object should work.
Took me 5 min to realize what the problem is.



Re: [PATCH] ath10k: allocate dummy net_device dynamically

2024-03-21 Thread Jakub Kicinski
On Wed, 20 Mar 2024 08:12:46 -0700 Jeff Johnson wrote:
> NAK this based upon the ath11k patch results.

The ath11 patch is much more complex, I'd wager this one is fine.

> As suggested there we should just use kmalloc/kfree to match the existing 
> logic.

Please no. There is no magic here. alloc + free must match whether
you're using magic object alloc wrapper (alloc_netdev()) or straight
up kzalloc().



Re: [net-next PATCH v2 3/4] netdev: replace napi_reschedule with napi_schedule

2023-10-05 Thread Jakub Kicinski
On Thu, 5 Oct 2023 18:11:56 +0200 Eric Dumazet wrote:
> OK, but I suspect some users of napi_reschedule() might not be race-free...

What's the race you're thinking of?

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [net-next PATCH 1/4] netdev: replace simple napi_schedule_prep/__napi_schedule to napi_schedule

2023-10-02 Thread Jakub Kicinski
On Mon,  2 Oct 2023 17:10:20 +0200 Christian Marangi wrote:
>   queue_work(priv->xfer_wq, &priv->rx_work);
> - else if (napi_schedule_prep(&priv->napi))
> - __napi_schedule(&priv->napi);
> + else
> + napi_schedule(&priv->napi)

Missing semi-colon, please make sure each patch builds cleanly 
with allmodconfig.
-- 
pw-bot: cr

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2] wifi: ath10k: Fix return value in ath10k_pci_init()

2022-11-09 Thread Jakub Kicinski
On Thu, 10 Nov 2022 11:12:28 +0800 Xiu Jianfeng wrote:
> To: , , ,  
> , , 
> CC: , ,  
> 

How did you come up with this CC list? If you're CCing netdev
maintainers you should also CC net...@vger.kernel.org 
Somehow you managed to CC LMKL but not netdev :S

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH wireless-next 2/3] wifi: ath10k: remove a copy of the NAPI_POLL_WEIGHT define

2022-04-29 Thread Jakub Kicinski
On Fri, 29 Apr 2022 22:06:20 +0300 Kalle Valo wrote:
> This failed to build as recently usb.c got napi poll as well, I fixed it
> in the pending branch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=pending&id=89fc2e14d3e50cad605104572228d3740df7ca77

Sorry & thanks!

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH wireless-next 2/3] wifi: ath10k: remove a copy of the NAPI_POLL_WEIGHT define

2022-04-29 Thread Jakub Kicinski
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Signed-off-by: Jakub Kicinski 
---
CC: kv...@kernel.org
CC: ath10k@lists.infradead.org
CC: linux-wirel...@vger.kernel.org
---
 drivers/net/wireless/ath/ath10k/core.h | 3 ---
 drivers/net/wireless/ath/ath10k/pci.c  | 2 +-
 drivers/net/wireless/ath/ath10k/sdio.c | 2 +-
 drivers/net/wireless/ath/ath10k/snoc.c | 2 +-
 4 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.h 
b/drivers/net/wireless/ath/ath10k/core.h
index 9f6680b3be0a..8bfabbcfdb14 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -59,9 +59,6 @@
 #define ATH10K_KEEPALIVE_MAX_IDLE 3895
 #define ATH10K_KEEPALIVE_MAX_UNRESPONSIVE 3900
 
-/* NAPI poll budget */
-#define ATH10K_NAPI_BUDGET  64
-
 /* SMBIOS type containing Board Data File Name Extension */
 #define ATH10K_SMBIOS_BDF_EXT_TYPE 0xF8
 
diff --git a/drivers/net/wireless/ath/ath10k/pci.c 
b/drivers/net/wireless/ath/ath10k/pci.c
index 4d4e2f91e15c..bf1c938be7d0 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -3216,7 +3216,7 @@ static void ath10k_pci_free_irq(struct ath10k *ar)
 void ath10k_pci_init_napi(struct ath10k *ar)
 {
netif_napi_add(&ar->napi_dev, &ar->napi, ath10k_pci_napi_poll,
-  ATH10K_NAPI_BUDGET);
+  NAPI_POLL_WEIGHT);
 }
 
 static int ath10k_pci_init_irq(struct ath10k *ar)
diff --git a/drivers/net/wireless/ath/ath10k/sdio.c 
b/drivers/net/wireless/ath/ath10k/sdio.c
index 73693c66cef1..24283c02a5ef 100644
--- a/drivers/net/wireless/ath/ath10k/sdio.c
+++ b/drivers/net/wireless/ath/ath10k/sdio.c
@@ -2532,7 +2532,7 @@ static int ath10k_sdio_probe(struct sdio_func *func,
}
 
netif_napi_add(&ar->napi_dev, &ar->napi, ath10k_sdio_napi_poll,
-  ATH10K_NAPI_BUDGET);
+  NAPI_POLL_WEIGHT);
 
ath10k_dbg(ar, ATH10K_DBG_BOOT,
   "sdio new func %d vendor 0x%x device 0x%x block 0x%x/0x%x\n",
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c 
b/drivers/net/wireless/ath/ath10k/snoc.c
index 8328966a0471..607e8164bf98 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -1243,7 +1243,7 @@ static int ath10k_snoc_napi_poll(struct napi_struct *ctx, 
int budget)
 static void ath10k_snoc_init_napi(struct ath10k *ar)
 {
netif_napi_add(&ar->napi_dev, &ar->napi, ath10k_snoc_napi_poll,
-  ATH10K_NAPI_BUDGET);
+  NAPI_POLL_WEIGHT);
 }
 
 static int ath10k_snoc_request_irq(struct ath10k *ar)
-- 
2.34.1


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: enable threaded napi on ath10k driver

2021-12-14 Thread Jakub Kicinski
On Tue, 14 Dec 2021 22:39:36 + Abhishek Kumar wrote:
> NAPI poll can be done in threaded context along with soft irq
> context. Threaded context can be scheduled efficiently, thus
> creating less of bottleneck during Rx processing. This patch is
> to enable threaded NAPI on ath10k driver.

You need to explain in more detail what you mean by "can be scheduled
efficiently". mt76 had an issue where Rx and Tx would use the same IRQ 
and threaded NAPI allowed them to be run on separate cores. What's the
challenge for ath10k HW?

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [Bug 215129] New: Linux kernel hangs during power down

2021-11-25 Thread Jakub Kicinski
On Thu, 25 Nov 2021 08:32:18 +0100 Heiner Kallweit wrote:
> I think the reference to ath10k_pci is misleading, Kalle isn't needed here.
> The actual issue is a RTNL deadlock in igb_resume(). See log snippet:
> 
> Nov 24 18:56:19 MartinsPc kernel:  igb_resume+0xff/0x1e0 [igb 
> 21bf6a00cb1f20e9b0e8434f7f8748a0504e93f8]
> Nov 24 18:56:19 MartinsPc kernel:  pci_pm_runtime_resume+0xa7/0xd0
> Nov 24 18:56:19 MartinsPc kernel:  ? pci_pm_freeze_noirq+0x110/0x110
> Nov 24 18:56:19 MartinsPc kernel:  __rpm_callback+0x41/0x120
> Nov 24 18:56:19 MartinsPc kernel:  ? pci_pm_freeze_noirq+0x110/0x110
> Nov 24 18:56:19 MartinsPc kernel:  rpm_callback+0x35/0x70
> Nov 24 18:56:19 MartinsPc kernel:  rpm_resume+0x567/0x810
> Nov 24 18:56:19 MartinsPc kernel:  __pm_runtime_resume+0x4a/0x80
> Nov 24 18:56:19 MartinsPc kernel:  dev_ethtool+0xd4/0x2d80
> 
> We have at least two places in net core where runtime_resume() is called
> under RTNL. This conflicts with the current structure in few Intel drivers
> that have something like the following in their resume path.
> 
>   rtnl_lock();
>   if (!err && netif_running(netdev))
>   err = __igb_open(netdev, true);
> 
>   if (!err)
>   netif_device_attach(netdev);
>   rtnl_unlock();
> 
> Other drivers don't do this, so it's the question whether it's actually
> needed here to take RTNL. Some discussion was started [0], but it ended
> w/o tangible result and since then it has been surprisingly quiet.
> 
> [0] https://www.spinics.net/lists/netdev/msg736880.html

Ah, that makes perfect sense, I didn't see that stack trace. 
Dropping Kalle from CC. Let's hear what Intel folks have to say..

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [Bug 215129] New: Linux kernel hangs during power down

2021-11-24 Thread Jakub Kicinski
Adding Kalle and Hainer.

On Wed, 24 Nov 2021 14:45:05 -0800 Stephen Hemminger wrote:
> Begin forwarded message:
> 
> Date: Wed, 24 Nov 2021 21:14:53 +
> From: bugzilla-dae...@bugzilla.kernel.org
> To: step...@networkplumber.org
> Subject: [Bug 215129] New: Linux kernel hangs during power down
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=215129
> 
> Bug ID: 215129
>Summary: Linux kernel hangs during power down
>Product: Networking
>Version: 2.5
> Kernel Version: 5.15
>   Hardware: All
> OS: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: normal
>   Priority: P1
>  Component: Other
>   Assignee: step...@networkplumber.org
>   Reporter: martin.sto...@gmail.com
> Regression: No
> 
> Created attachment 299703
>   --> https://bugzilla.kernel.org/attachment.cgi?id=299703&action=edit
> Kernel log after timeout occured
> 
> On my system the kernel is waiting for a task during shutdown which doesn't
> complete.
> 
> The commit which causes this behavior is:
> [f32a213765739f2a1db319346799f130a3d08820] ethtool: runtime-resume netdev
> parent before ethtool ioctl ops
> 
> This bug causes also that the system gets unresponsive after starting Steam:
> https://steamcommunity.com/app/221410/discussions/2/3194736442566303600/
> 


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH net-next 1/2] soc: qcom: ipa: Constify static qmi structs

2020-11-25 Thread Jakub Kicinski
On Wed, 25 Nov 2020 15:45:05 -0600 Alex Elder wrote:
> On 11/22/20 5:40 PM, Rikard Falkeborn wrote:
> > These are only used as input arguments to qmi_handle_init() which
> > accepts const pointers to both qmi_ops and qmi_msg_handler. Make them
> > const to allow the compiler to put them in read-only memory.
> > 
> > Signed-off-by: Rikard Falkeborn   
> 
> Acked-by: Alex Elder 

Applied to net-next, thanks!

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH net-next 1/2] soc: qcom: ipa: Constify static qmi structs

2020-11-24 Thread Jakub Kicinski
On Mon, 23 Nov 2020 00:40:30 +0100 Rikard Falkeborn wrote:
> These are only used as input arguments to qmi_handle_init() which
> accepts const pointers to both qmi_ops and qmi_msg_handler. Make them
> const to allow the compiler to put them in read-only memory.
> 
> Signed-off-by: Rikard Falkeborn 

I can take this one if Alex acks it.

The other patch is probably best handled by Kalle.

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH net v2 00/21] net: avoid to remove module when its debugfs is being used

2020-11-07 Thread Jakub Kicinski
On Sat,  7 Nov 2020 17:21:31 + Taehee Yoo wrote:
> When debugfs file is opened, its module should not be removed until
> it's closed.
> Because debugfs internally uses the module's data.
> So, it could access freed memory.
> 
> In order to avoid panic, it just sets .owner to THIS_MODULE.
> So that all modules will be held when its debugfs file is opened.

Hm, looks like some of the patches need to be revised because
.owner is already set in the ops, and a warning gets generated.

Also it'd be good to mention why Johannes's approach was abandoned.

When you repost please separate out all the patches for
drivers/net/wireless/ and send that to Kalle's wireless drivers tree.
Patch 1 needs to be split in two. Patches 2 and 3 would go via Johannes.
The wimax patch needs to go to staging (wimax code has been moved).
The remaining patches can be posted individually, not as a series.

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 0/3]

2020-07-31 Thread Jakub Kicinski
On Fri, 31 Jul 2020 23:57:19 +0530 Rakesh Pillai wrote:
> The history recording will be compiled only if
> ATH10K_DEBUG is enabled, and also enabled via
> the module parameter. Once the history recording
> is enabled via module parameter, it can be enabled
> or disabled runtime via debugfs.

Have you seen the trace_devlink_hwmsg() interface?
Could it be used here?

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC 0/7] Add support to process rx packets in thread

2020-07-22 Thread Jakub Kicinski
On Tue, 21 Jul 2020 19:25:14 +0200 Andrew Lunn wrote:
> On Tue, Jul 21, 2020 at 10:44:19PM +0530, Rakesh Pillai wrote:
> > NAPI gets scheduled on the CPU core which got the
> > interrupt. The linux scheduler cannot move it to a
> > different core, even if the CPU on which NAPI is running
> > is heavily loaded. This can lead to degraded wifi
> > performance when running traffic at peak data rates.
> > 
> > A thread on the other hand can be moved to different
> > CPU cores, if the one on which its running is heavily
> > loaded. During high incoming data traffic, this gives
> > better performance, since the thread can be moved to a
> > less loaded or sometimes even a more powerful CPU core
> > to account for the required CPU performance in order
> > to process the incoming packets.
> > 
> > This patch series adds the support to use a high priority
> > thread to process the incoming packets, as opposed to
> > everything being done in NAPI context.  
> 
> I don't see why this problem is limited to the ath10k driver. I expect
> it applies to all drivers using NAPI. So shouldn't you be solving this
> in the NAPI core? Allow a driver to request the NAPI core uses a
> thread?

Agreed, this is a problem we have with all drivers today.
We see seriously sub-optimal behavior in data center workloads, 
because kernel overloads the cores doing packet processing.

I think the fix may actually be in the scheduler. AFAIU the scheduler
counts the softIRQ time towards the interrupted process, and on top
of that tries to move processes to the cores handling their IO. In the
end the configuration which works somewhat okay is when each core has
its own IRQ and queues, which is seriously sub-optimal.

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC 1/2] devlink: add simple fw crash helpers

2020-05-25 Thread Jakub Kicinski
On Fri, 22 May 2020 22:46:07 +0200 Johannes Berg wrote:
> > The irony is you have a problem with a networking device and all the
> > devices your initial set touched are networking. Two of the drivers 
> > you touched either have or will soon have devlink health reporters
> > implemented.  
> 
> Like I said above, do you think it'd be feasible to make a devcoredump
> out of devlink health reports? And can the report be in a way that we
> control the file format, or are there limits? I guess I should read the
> code to find out, but I figure you probably just know. But feel free to
> tell me to read it :)
> 
> The reason I'm asking is that it's starting to sound like we really
> ought to be implementing devlink, but we've got a bunch of
> infrastructure that uses the devcoredump, and it'll take time
> (significantly so) to change all that...

In devlink world pure FW core dumps are exposed by devlink regions.
An API allowing reading device memory, registers, etc., but also 
creating dumps of memory regions when things go wrong. It should be
a fairly straightforward migration.

Devlink health is more targeted, the dump is supposed to contain only
relevant information, selected and formatted by the driver. When device
misbehaves driver reads the relevant registers and FW state and creates
a formatted state dump. I believe each element of the dump must fit
into a netlink message (but there may be multiple elements, see
devlink_fmsg_prepare_skb()).

We should be able to convert dl-regions dumps and dl-health dumps into
devcoredumps, but since health reporters are supposed to be more
targeted there's usually multiple of them per device.

Conversely devcoredumps can be trivially exposed as dl-region dumps,
but I believe dl-health would require driver changes to format the
information appropriately.

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC 1/2] devlink: add simple fw crash helpers

2020-05-22 Thread Jakub Kicinski
On Fri, 22 May 2020 05:20:46 + Luis Chamberlain wrote:
> > diff --git a/net/core/Makefile b/net/core/Makefile
> > index 3e2c378e5f31..6f1513781c17 100644
> > --- a/net/core/Makefile
> > +++ b/net/core/Makefile
> > @@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o
> >  obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o
> >  obj-$(CONFIG_DST_CACHE) += dst_cache.o
> >  obj-$(CONFIG_HWBM) += hwbm.o
> > -obj-$(CONFIG_NET_DEVLINK) += devlink.o
> > +obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o  
> 
> This was looking super sexy up to here. This is networking specific.
> We want something generic for *anything* that requests firmware.

You can't be serious. It's network specific because of how the Kconfig
is named?

Working for a company operating large data centers I would strongly
prefer if we didn't have ten different ways of reporting firmware
problems in the fleet.

> I'm afraid this won't work for something generic. I don't think its
> throw-away work though, the idea to provide a generic interface to
> dump firmware through netlink might be nice for networking, or other
> things.
> 
> But I have a feeling we'll want something still more generic than this.

Please be specific. Saying generic a lot is not helpful. The code (as
you can see in this patch) is in no way network specific. Or are you
saying there are machines out there running without netlink sockets?

> So networking may want to be aware that a firmware crash happened as
> part of this network device health thing, but firmware crashing is a
> generic thing.
> 
> I have now extended my patch set to include uvents and I am more set on
> that we need the taint now more than ever.

Please expect my nack if you're trying to add this to networking
drivers.

The irony is you have a problem with a networking device and all the
devices your initial set touched are networking. Two of the drivers 
you touched either have or will soon have devlink health reporters
implemented.

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[RFC 2/2] i2400m: use devlink health reporter

2020-05-19 Thread Jakub Kicinski
It builds.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/wimax/i2400m/rx.c  | 2 ++
 drivers/net/wimax/i2400m/usb.c | 5 +
 2 files changed, 7 insertions(+)

diff --git a/drivers/net/wimax/i2400m/rx.c b/drivers/net/wimax/i2400m/rx.c
index c9fb619a9e01..cc7fe78f2df0 100644
--- a/drivers/net/wimax/i2400m/rx.c
+++ b/drivers/net/wimax/i2400m/rx.c
@@ -144,6 +144,7 @@
  *   i2400m_msg_size_check
  *   wimax_msg
  */
+#include 
 #include 
 #include 
 #include 
@@ -712,6 +713,7 @@ void __i2400m_roq_queue(struct i2400m *i2400m, struct 
i2400m_roq *roq,
dev_err(dev, "SW BUG? failed to insert packet\n");
dev_err(dev, "ERX: roq %p [ws %u] skb %p nsn %d sn %u\n",
roq, roq->ws, skb, nsn, roq_data->sn);
+   devlink_simple_fw_reporter_report_crash(dev);
skb_queue_walk(&roq->queue, skb_itr) {
roq_data_itr = (struct i2400m_roq_data *) &skb_itr->cb;
nsn_itr = __i2400m_roq_nsn(roq, roq_data_itr->sn);
diff --git a/drivers/net/wimax/i2400m/usb.c b/drivers/net/wimax/i2400m/usb.c
index 9659f9e1aaa6..5c811dccbf1d 100644
--- a/drivers/net/wimax/i2400m/usb.c
+++ b/drivers/net/wimax/i2400m/usb.c
@@ -49,6 +49,7 @@
  *   usb_reset_device()
  */
 #include "i2400m-usb.h"
+#include 
 #include 
 #include 
 #include 
@@ -423,6 +424,8 @@ int i2400mu_probe(struct usb_interface *iface,
if (usb_dev->speed != USB_SPEED_HIGH)
dev_err(dev, "device not connected as high speed\n");
 
+   devlink_simple_fw_reporter_prepare(dev);
+
/* Allocate instance [calls i2400m_netdev_setup() on it]. */
result = -ENOMEM;
net_dev = alloc_netdev(sizeof(*i2400mu), "wmx%d", NET_NAME_UNKNOWN,
@@ -506,6 +509,7 @@ int i2400mu_probe(struct usb_interface *iface,
usb_put_dev(i2400mu->usb_dev);
free_netdev(net_dev);
 error_alloc_netdev:
+   devlink_simple_fw_reporter_cleanup(dev);
return result;
 }
 
@@ -532,6 +536,7 @@ void i2400mu_disconnect(struct usb_interface *iface)
usb_set_intfdata(iface, NULL);
usb_put_dev(i2400mu->usb_dev);
free_netdev(net_dev);
+   devlink_simple_fw_reporter_cleanup(dev);
d_fnend(3, dev, "(iface %p i2400m %p) = void\n", iface, i2400m);
 }
 
-- 
2.25.4


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[RFC 1/2] devlink: add simple fw crash helpers

2020-05-19 Thread Jakub Kicinski
Add infra for creating devlink instances for a device to report
fw crashes. This patch expects the devlink instance to be registered
at probe time. I belive to be the cleanest. We can also add a devm
version of the helpers, so that we don't have to do the clean up.
Or we can go even further and register the devlink instance only
once error has happened (for the first time, then we can just
find out if already registered by traversing the list like we
do here).

With the patch applied and a sample driver converted we get:

$ devlink dev
pci/:07:00.0

Then monitor for errors:

$ devlink mon health
[health,status] pci/:07:00.0:
  reporter fw
state error error 1 recover 0
[health,status] pci/:07:00.0:
  reporter fw
state error error 2 recover 0

These are the events I triggered on purpose. One can also inspect
the health of all devices capable of reporting fw errors:

$ devlink health
pci/:07:00.0:
  reporter fw
state error error 7 recover 0

Obviously drivers may upgrade to the full devlink health API
which includes state dump, state dump auto-collect and automatic
error recovery control.

Signed-off-by: Jakub Kicinski 
---
 include/linux/devlink.h   |  11 +++
 net/core/Makefile |   2 +-
 net/core/devlink_simple_fw_reporter.c | 101 ++
 3 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/devlink.h
 create mode 100644 net/core/devlink_simple_fw_reporter.c

diff --git a/include/linux/devlink.h b/include/linux/devlink.h
new file mode 100644
index ..2b73987eefca
--- /dev/null
+++ b/include/linux/devlink.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_DEVLINK_H_
+#define _LINUX_DEVLINK_H_
+
+struct device;
+
+void devlink_simple_fw_reporter_prepare(struct device *dev);
+void devlink_simple_fw_reporter_cleanup(struct device *dev);
+void devlink_simple_fw_reporter_report_crash(struct device *dev);
+
+#endif
diff --git a/net/core/Makefile b/net/core/Makefile
index 3e2c378e5f31..6f1513781c17 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o
 obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o
 obj-$(CONFIG_DST_CACHE) += dst_cache.o
 obj-$(CONFIG_HWBM) += hwbm.o
-obj-$(CONFIG_NET_DEVLINK) += devlink.o
+obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o
 obj-$(CONFIG_GRO_CELLS) += gro_cells.o
 obj-$(CONFIG_FAILOVER) += failover.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_sk_storage.o
diff --git a/net/core/devlink_simple_fw_reporter.c 
b/net/core/devlink_simple_fw_reporter.c
new file mode 100644
index ..48dde9123c3c
--- /dev/null
+++ b/net/core/devlink_simple_fw_reporter.c
@@ -0,0 +1,101 @@
+#include 
+#include 
+#include 
+#include 
+
+struct devlink_simple_fw_reporter {
+   struct list_head list;
+   struct devlink_health_reporter *reporter;
+};
+
+
+static LIST_HEAD(devlink_simple_fw_reporters);
+static DEFINE_MUTEX(devlink_simple_fw_reporters_mutex);
+
+static const struct devlink_health_reporter_ops simple_devlink_health = {
+   .name = "fw",
+};
+
+static const struct devlink_ops simple_devlink_ops = {
+};
+
+static struct devlink_simple_fw_reporter *
+devlink_simple_fw_reporter_find_for_dev(struct device *dev)
+{
+   struct devlink_simple_fw_reporter *simple_devlink, *ret = NULL;
+   struct devlink *devlink;
+
+   mutex_lock(&devlink_simple_fw_reporters_mutex);
+   list_for_each_entry(simple_devlink, &devlink_simple_fw_reporters,
+   list) {
+   devlink = priv_to_devlink(simple_devlink);
+   if (devlink->dev == dev) {
+   ret = simple_devlink;
+   break;
+   }
+   }
+   mutex_unlock(&devlink_simple_fw_reporters_mutex);
+
+   return ret;
+}
+
+void devlink_simple_fw_reporter_report_crash(struct device *dev)
+{
+   struct devlink_simple_fw_reporter *simple_devlink;
+
+   simple_devlink = devlink_simple_fw_reporter_find_for_dev(dev);
+   if (!simple_devlink)
+   return;
+
+   devlink_health_report(simple_devlink->reporter, "firmware crash", NULL);
+}
+EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_report_crash);
+
+void devlink_simple_fw_reporter_prepare(struct device *dev)
+{
+   struct devlink_simple_fw_reporter *simple_devlink;
+   struct devlink *devlink;
+
+   devlink = devlink_alloc(&simple_devlink_ops,
+   sizeof(struct devlink_simple_fw_reporter));
+   if (!devlink)
+   return;
+
+   if (devlink_register(devlink, dev))
+   goto err_free;
+
+   simple_devlink = devlink_priv(devlink);
+   simple_devlink->reporter =
+   devlink_health_reporter_create(devlink, &simple_devlink_health,
+  0, NULL);
+   if (IS_ERR(simple_d

Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-18 Thread Jakub Kicinski
On Mon, 18 May 2020 21:22:02 + Luis Chamberlain wrote:
> Indeed my issue with devlink is that it did not seem generic enough for
> all devices which use firmware and for which firmware can crash. Support
> should not have to be "add devlink support" + "now use this new hook",
> but rather a very lighweight devlink_crash(device) call we can sprinkly
> with only the device as a functional requirement.

We can provide a lightweight devlink_crash(device) which only generates
the notification, without the need to register the health reporter or a
devlink instance upfront. But then we loose the ability to control the
recovery, count errors, etc. So I'd think that's not the direction we
want to go in.

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-18 Thread Jakub Kicinski
On Mon, 18 May 2020 22:41:48 +0200 Johannes Berg wrote:
> On Mon, 2020-05-18 at 13:35 -0700, Jakub Kicinski wrote:
> > It's intended to be a generic netlink channel for configuring devices.
> > 
> > All the firmware-related interfaces have no dependencies on netdevs,
> > in fact that's one of the reasons we moved to devlink - we don't want
> > to hold rtnl lock just for talking to device firmware.  
> 
> Sounds good :)
> 
> So I guess Luis just has to add some way in devlink to hook up devlink
> health in a simple way to drivers, perhaps? I mean, many drivers won't
> really want to use devlink for anything else, so I guess it should be as
> simple as the API that Luis proposed ("firmware crashed for this struct
> device"), if nothing more interesting is done with devlink?
> 
> Dunno. But anyway sounds like it should somehow integrate there rather
> than the way this patchset proposed?

Right, that'd be great. Simple API to register a devlink instance with
whatever number of reporters the device would need. I'm happy to help.

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-18 Thread Jakub Kicinski
On Mon, 18 May 2020 22:29:53 +0200 Johannes Berg wrote:
> On Mon, 2020-05-18 at 13:28 -0700, Jakub Kicinski wrote:
> > On Mon, 18 May 2020 21:25:09 +0200 Johannes Berg wrote:  
> > > It's pretty clear, but even then, first of all I doubt this is the case
> > > for many of the places that you've sprinkled the annotation on, and
> > > secondly it actually hides useful information.
> > > 
> > > Regardless of the support issue, I think this hiding of information is
> > > also problematic.
> > > 
> > > I really think we'd all be better off if you just made a sysfs file (I
> > > mistyped debugfs in some other email, sorry, apparently you didn't see
> > > the correction in time) that listed which device(s) crashed and how many
> > > times. That would actually be useful. Because honestly, if a random
> > > device crashed for some random reason, that's pretty much a non-event.
> > > If it keeps happening, then we might even want to know about it.  
> > 
> > Johannes - have you seen devlink health? I think we should just use
> > that interface, since it supports all the things you're requesting,
> > rather than duplicate it in sysfs.  
> 
> I haven't, and I'm glad to hear that's there, sounds good!
> 
> I suspect that Luis wants something more generic though, that isn't just
> applicable to netdevices, unless devlink grew some kind of non-netdev
> stuff while I wasn't looking? :)

It's intended to be a generic netlink channel for configuring devices.

All the firmware-related interfaces have no dependencies on netdevs,
in fact that's one of the reasons we moved to devlink - we don't want
to hold rtnl lock just for talking to device firmware.

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-18 Thread Jakub Kicinski
On Mon, 18 May 2020 21:25:09 +0200 Johannes Berg wrote:
> It's pretty clear, but even then, first of all I doubt this is the case
> for many of the places that you've sprinkled the annotation on, and
> secondly it actually hides useful information.
> 
> Regardless of the support issue, I think this hiding of information is
> also problematic.
> 
> I really think we'd all be better off if you just made a sysfs file (I
> mistyped debugfs in some other email, sorry, apparently you didn't see
> the correction in time) that listed which device(s) crashed and how many
> times. That would actually be useful. Because honestly, if a random
> device crashed for some random reason, that's pretty much a non-event.
> If it keeps happening, then we might even want to know about it.

Johannes - have you seen devlink health? I think we should just use
that interface, since it supports all the things you're requesting,
rather than duplicate it in sysfs.

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 1/3] ath9k: remove stray backslash in Makefile

2017-11-27 Thread Jakub Kicinski
On Mon, 27 Nov 2017 18:56:21 +0100, Matthias Schiffer wrote:
> Signed-off-by: Matthias Schiffer 
> ---
>  drivers/net/wireless/ath/ath9k/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/ath/ath9k/Makefile 
> b/drivers/net/wireless/ath/ath9k/Makefile
> index 36a40ffdce15..90e4a341076c 100644
> --- a/drivers/net/wireless/ath/ath9k/Makefile
> +++ b/drivers/net/wireless/ath/ath9k/Makefile
> @@ -59,7 +59,7 @@ obj-$(CONFIG_ATH9K_HW) += ath9k_hw.o
>  obj-$(CONFIG_ATH9K_COMMON) += ath9k_common.o
>  ath9k_common-y:= common.o \
>   common-init.o \
> - common-beacon.o \
> + common-beacon.o
>  

It's not necessarily stray, there is nothing on the next line so it's
OK, and if you add \ at the end of all lines, you don't have to touch
the last line every time you add/remove something.  Sort of like
putting a , after last enum value.

Just my $.02, not that I mind either way :)

>  ath9k_common-$(CONFIG_ATH9K_COMMON_DEBUG) += common-debug.o \
>common-spectral.o


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k