> -----Original Message----- > From: Paul Menzel <pmen...@molgen.mpg.de> > Sent: Monday, June 10, 2024 11:45 AM > To: Loktionov, Aleksandr <aleksandr.loktio...@intel.com>; Kang, > Kelvin <kelvin.k...@intel.com> > Cc: intel-wired-...@lists.osuosl.org; Nguyen, Anthony L > <anthony.l.ngu...@intel.com>; net...@vger.kernel.org; Kubalewski, > Arkadiusz <arkadiusz.kubalew...@intel.com>; Sokolowski, Jan > <jan.sokolow...@intel.com>; Leon Romanovsky <leo...@nvidia.com> > Subject: Re: [Intel-wired-lan] [PATCH iwl-net] i40e: fix hot issue > NVM content is corrupted after nvmupdate > > Dear Aleksandr, dear Kelvin, > > > Thank you for your patch. > > > Am 10.06.24 um 11:20 schrieb Aleksandr Loktionov: > > After 230f3d53a547 patch all I/O errors are being converted into > > EAGAIN which leads to retries until timeout so nvmupdate > sometimes > > fails after more than 20 minutes! > > > > Remove misleading EIO to EGAIN conversion and pass all errors as > is. > > > > Fixes: 230f3d53a547 ("i40e: remove i40e_status") > > This commit is present since v6.6-rc1, released September last year > (2023). So until now, nobody noticed this? > Really, really. The regression affects users only when they update F/W, and not all F/W are affected, only that generate I/O errors while update. Not cars are affected, but the consequences are serous as in subj.
> > Co-developed-by: Kelvin Kang <kelvin.k...@intel.com> > > Signed-off-by: Kelvin Kang <kelvin.k...@intel.com> > > Reviewed-by: Arkadiusz Kubalewski > <arkadiusz.kubalew...@intel.com> > > Signed-off-by: Aleksandr Loktionov > <aleksandr.loktio...@intel.com> > > Please give more details about your test setup. For me it’s also > not clear, how the NVM content gets corrupted as stated in the > summary/title. Could you please elaborate that in the commit > message. > > > --- > > drivers/net/ethernet/intel/i40e/i40e_adminq.h | 4 ---- > > 1 file changed, 4 deletions(-) > > > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h > > b/drivers/net/ethernet/intel/i40e/i40e_adminq.h > > index ee86d2c..55b5bb8 100644 > > --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h > > +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h > > @@ -109,10 +109,6 @@ static inline int i40e_aq_rc_to_posix(int > aq_ret, int aq_rc) > > -EFBIG, /* I40E_AQ_RC_EFBIG */ > > }; > > > > - /* aq_rc is invalid if AQ timed out */ > > - if (aq_ret == -EIO) > > - return -EAGAIN; > > - > > if (!((u32)aq_rc < (sizeof(aq_to_posix) / > sizeof((aq_to_posix)[0])))) > > return -ERANGE; > > The referenced commit 230f3d53a547 does: > > ``` > diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h > b/drivers/net/ethernet/intel/i40e/i40e_adminq.h > index ee394aacef4d..267f2e0a21ce 100644 > --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h > +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h > @@ -5,7 +5,6 @@ > #define _I40E_ADMINQ_H_ > > #include "i40e_osdep.h" > -#include "i40e_status.h" > #include "i40e_adminq_cmd.h" > > #define I40E_ADMINQ_DESC(R, i) \ > @@ -117,7 +116,7 @@ static inline int i40e_aq_rc_to_posix(int > aq_ret, int aq_rc) > }; > > /* aq_rc is invalid if AQ timed out */ > - if (aq_ret == I40E_ERR_ADMIN_QUEUE_TIMEOUT) > + if (aq_ret == -EIO) > return -EAGAIN; > > if (!((u32)aq_rc < (sizeof(aq_to_posix) / > sizeof((aq_to_posix)[0])))) > ``` > > So I do not see yet, why removing the whole hunk is the solution. > > > Kind regards, > > Paul