On Tue, Jan 17, 2017 at 10:15 AM, Luis R. Rodriguez <mcg...@kernel.org> wrote: > On Tue, Jan 17, 2017 at 03:35:05PM +0000, Jakub Kicinski wrote: >> Commit 5d47ec02c37e ("firmware: Correct handling of fw_state_wait() >> return value") made the assumption that any error returned from >> fw_state_wait_timeout() means FW load has to be aborted. This is >> incorrect, FW load only has to be aborted when load timed out or >> has been interrupted. Otherwise, if wait exited because of a wake >> up, the waking thread (in firmware_loading_store()) had already >> performed the clean up - deleted fw_buf from the pending list and >> set fw_priv->buf to NULL. >> >> Example NULL dereference which occurs because requsted firmware >> could not be found by UMH: >> >> nfp 0000:02:00.0: Direct firmware load for AMDA0081-0001.cat failed with >> error -2 >> nfp 0000:02:00.0: Falling back to user helper >> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 >> IP: _request_firmware+0x7fd/0x910 >> PGD 0 >> >> Oops: 0000 [#1] PREEMPT SMP >> CPU: 4 PID: 1981 Comm: insmod Tainted: G O >> 4.10.0-rc3-perf-00404-g575a284323c2 #78 >> Hardware name: Dell Inc. PowerEdge T630/0W9WXC, BIOS 1.2.10 03/10/2015 >> task: ffff919b81a99a80 task.stack: ffffb3bac5668000 >> RIP: 0010:_request_firmware+0x7fd/0x910 >> RSP: 0018:ffffb3bac566b978 EFLAGS: 00010246 >> RAX: 00000000fffffffe RBX: ffff919b9b2becc0 RCX: ffff919b9b2bece8 >> RDX: ffff919b81a99a80 RSI: 0000000000000206 RDI: 0000000000000000 >> RBP: ffffb3bac566b9e0 R08: ffff919b9f30f4b8 R09: 0000000000000002 >> R10: 00000120d2b47b48 R11: 0000000000000000 R12: ffffb3bac566ba18 >> R13: ffff919b823f4780 R14: ffff919b907c5000 R15: 0000000000003a98 >> FS: 00007fae4d803740(0000) GS:ffff919b9f300000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000000000000038 CR3: 00000008507b8000 CR4: 00000000001406e0 >> Call Trace: >> request_firmware+0x37/0x50 >> ... >> >> Fixes: 5d47ec02c37e ("firmware: Correct handling of fw_state_wait() return >> value") >> Signed-off-by: Jakub Kicinski <jakub.kicin...@netronome.com> >> --- >> v2: >> - improve commit message. >> --- >> drivers/base/firmware_class.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c >> index 4497d263209f..ce142e6b2c72 100644 >> --- a/drivers/base/firmware_class.c >> +++ b/drivers/base/firmware_class.c >> @@ -1020,7 +1020,7 @@ static int _request_firmware_load(struct firmware_priv >> *fw_priv, >> } >> >> retval = fw_state_wait_timeout(&buf->fw_st, timeout); >> - if (retval < 0) { >> + if (retval == -ETIMEDOUT || retval == -ERESTARTSYS) { >> mutex_lock(&fw_lock); >> fw_load_abort(fw_priv); >> mutex_unlock(&fw_lock); > > This is a bit messy, two other similar issues were reported before > and upon review I suggested Patrick Bruenn's fix with a better commit > log seems best fit. Patrick sent a patch Jan 4, 2017 but never followed up > despite my feedback on a small change on the commit log message [0]. Can you > try that and if that fixes it can you adjust the commit log accordingly? > Please > note the preferred solution would be: > > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c > index b9ac348e8d33..c530f8b4af01 100644 > --- a/drivers/base/firmware_class.c > +++ b/drivers/base/firmware_class.c > @@ -542,6 +542,8 @@ static struct firmware_priv *to_firmware_priv(struct > device *dev) > > static void __fw_load_abort(struct firmware_buf *buf) > { > + if (!buf) > + return; > /* > * There is a small window in which user can write to 'loading' > * between loading done and disappearance of 'loading' > > [0] https://http://lkml.kernel.org/r/20170104174227.go13...@wotan.suse.de
Actually Patrick did follow up with a new patch today but I just saw it was not as I recommended above. Patrick can you send a new one with your good commit log with the above change ? Can you both confirm it fixes the issue? Luis