On Tue, Sep 16, 2025 at 03:22:54PM +0200, Paolo Abeni wrote:
> On 9/15/25 5:58 AM, Erni Sri Satya Vennela wrote:
> > Report standard counter stats->rx_missed_errors
> > using hc_rx_discards_no_wqe from the hardware.
> >
> > Add a dedicated workqueue to periodically run
> > mana_query_gf_stats every 2 seconds to get the latest
> > info in eth_stats and define a driver capability flag
> > to notify hardware of the periodic queries.
> >
> > To avoid repeated failures and log flooding, the workqueue
> > is not rescheduled if mana_query_gf_stats fails.
>
> Can the failure root cause be a "transient" one? If so, this looks like
> a dangerous strategy; is such scenario, AFAICS, stats will be broken
> until the device is removed and re-probed.
>
> /P

After internal discussion, we are planning to fix this issue with the
following approach:
Stop rescheduling the work queue only upon detecting HWC timeout.
In this case:
1. Reset all stats to zero to avoid stale reporting.
2. Introduce a driver flag to detect the first occurrence of HWC timeout.
3. Log a warn_once during subsequent calls to mana_get_stats64 to signal
   the issue.

Thanks,
Vennela
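P.S. To make the intended behavior concrete, here is a rough userspace
model of the plan above. It is only a sketch and not the actual patch:
struct model_port, model_stats_work(), model_get_stats64() and the flag
names are hypothetical, and it assumes that an HWC timeout surfaces as
-ETIMEDOUT from the stats query. Any other error is treated as transient,
so the last good values are kept and the work is rescheduled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-port state kept by the driver. */
struct model_port {
	uint64_t rx_missed_errors; /* would mirror hc_rx_discards_no_wqe */
	bool hwc_timed_out;        /* hypothetical flag: first HWC timeout seen */
	bool warned;               /* emulates warn_once semantics */
	unsigned int warnings;     /* number of warnings actually emitted */
};

/*
 * Body of the periodic stats work.
 * query_err is the result of the stats query: 0 on success,
 * -ETIMEDOUT for an HWC timeout (assumption), other negative
 * values for failures treated as transient.
 * Returns true if the work should be rescheduled.
 */
static bool model_stats_work(struct model_port *p, int query_err,
			     uint64_t hw_discards)
{
	if (query_err == -ETIMEDOUT) {
		if (!p->hwc_timed_out) {
			p->hwc_timed_out = true;  /* step 2: record first timeout */
			p->rx_missed_errors = 0;  /* step 1: avoid stale values */
		}
		return false;                     /* stop rescheduling */
	}

	if (query_err == 0)
		p->rx_missed_errors = hw_discards;

	/* Success or transient failure: keep polling. */
	return true;
}

/* Stats read path: warn once after a timeout, then report the counter. */
static uint64_t model_get_stats64(struct model_port *p)
{
	if (p->hwc_timed_out && !p->warned) {
		p->warned = true;                 /* step 3: warn only once */
		p->warnings++;
		fprintf(stderr, "stats unavailable after HWC timeout\n");
	}
	return p->rx_missed_errors;
}
```

With this model, a transient failure (for example -EIO) leaves the
counter untouched and keeps the work running, while a timeout zeroes the
counter, stops the work, and produces a single warning on the next read.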
