Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
Mathias Nyman writes: >> We set CMD_RING_STATE_ABORTED state under locking. I'm not checking what >> is for taking lock for register though, I guess it should be enough just >> lock around of read=>write of ->cmd_ring if need lock. > > After your patch it should be enough to have the lock only while > reading and writing the cmd_ring register. > > If we want a locking fix that applies more easily to older stable > releases before your change then the lock needs to cover set > CMD_RING_STATE_ABORT, read cmd_reg, write cmd_reg and busiloop > checking CRR bit. Otherwise the stop cmd ring interrupt handler may > restart the ring just before we start checing CRR. The stop cmd ring > interrupt will set the CMD_RING_STATE_ABORTED to > CMD_RING_STATE_RUNNING so ring will really restart in the interrupt > handler. Just for record (no chance to make patch I myself for now, sorry), while checking locking slightly, I noticed unrelated missing locking. xhci_cleanup_command_queue() We are calling it without locking, but we need to lock for accessing list. Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
Mathias Nyman writes: >> Below is the latest code. I put my comments in line. >> >> 322 static int xhci_abort_cmd_ring(struct xhci_hcd *xhci) >> 323 { >> 324 u64 temp_64; >> 325 int ret; >> 326 >> 327 xhci_dbg(xhci, "Abort command ring\n"); >> 328 >> 329 reinit_completion(&xhci->cmd_ring_stop_completion); >> 330 >> 331 temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring); >> 332 xhci_write_64(xhci, temp_64 | CMD_RING_ABORT, >> 333 &xhci->op_regs->cmd_ring); >> >> We should hold xhci->lock when we are modifying xhci registers >> at runtime. >> > > Makes sense, but we need to unlock it before sleeping or waiting for > completion. I need to look into that in more detail. > > But this was an issue already before these changes. We set CMD_RING_STATE_ABORTED state under locking. I'm not checking what is for taking lock for register though, I guess it should be enough just lock around of read=>write of ->cmd_ring if need lock. [Rather ->cmd_ring user should check CMD_RING_STATE_ABORTED state.] > But then again I really like OGAWA Hiroumi's solution that separates the > command ring stopping from aborting commands and restarting the ring. > > The current way of always restarting the command ring as a response to > a stop command ring command really limits its usage. > > So, with this in mind most reasonable would be to > 1. fix the lock to cover abort+CRR check, and send it to usb-linus +stable > 2. rebase OGAWA Hirofumi's changes on top of that, and send to usb-linus only > 3. remove unnecessary second abort try as a separate patch, send to usb-next > 4. remove polling for the Command ring running (CRR), waiting for completion > is enough, if completion times out then we can check CRR. for usb-next I think we should check both of CRR and even of stop completion. Because CRR and stop completion was not same time (both can be first). Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] xhci: Remove busy loop from xhci_abort_cmd_ring()
Mathias Nyman writes: > On 14.12.2016 01:40, OGAWA Hirofumi wrote: >> ping about [PATCH 1/3, 2/3, 3/3]? >> > > 1/3 and 2/3 will be sent to 4.10 usb-linus after rc1, > 3/3 maybe to usb-next after that Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] xhci: Remove busy loop from xhci_abort_cmd_ring()
ping about [PATCH 1/3, 2/3, 3/3]? OGAWA Hirofumi writes: > Mathias Nyman writes: > >>> - Add xhci_handshake_sleep(), and use it. >> >> This seems a but overkill, I'd rather don't have xhci_handshake(), >> xhci_handshake_sleep() and __xhci_handshake() to maintain. > > I agree about it. However, on other hand, I thought about the > possibility/effort to decreasing use-cases of xhci_handshake() > busyloop. (there are several places to use more than e.g. 500ms > timeout.) Because long busyloop (especially with interrupt-off) has > really bad effect to whole system of non-usb. > >> As we now can sleep in xhci_abort_cmd_ring() I would rather just first >> wait for the completion and then maybe handshake check the register. >> At that point it shouldn't need to busyloop anymore, one read should >> do it. > > I worried about in the case of real internal confusing, and consider > about chip that doesn't have no stop/abort event. > > To handle both case, timeout choice would not be straight-forward. > >> But this is all just optimizing an error path, I'm actually fine with >> taking just patch 1/3 and 2/3 v3, and skipping 3/3. the udelay(1000) >> to msleep(1) I can add myself > > Right. The main point of patchset is 1/3 and 2/3. > >>> As related change, current xhci_handshake() is strange behavior, >>> E.g. xhci_handshake(ptr, mask, done, 1) does >>> >>> result = readl(ptr); >>> /* check result */ >>> udelay(1); <= meaningless delay >>> return -ETIMEDOUT; >> >> Only in the timeout case, so if we after 3 or 5 million register reads >> + udelays(1) still don't get the value we want then there is an >> unnecessary udelay(1). >> >> Not worth putting much effort on it now. > > True. But actual effort for it was very small. > > @@ -65,16 +65,18 @@ int xhci_handshake(void __iomem *ptr, u3 > { > u32 result; > > - do { > + while (1) { > result = readl(ptr); > if (result == ~(u32)0) /* card removed */ > return -ENODEV; > result &= mask; > if (result == done) > return 0; > + if (usec <= 0) > + break; > udelay(1); > usec--; > - } while (usec > 0); > + } > return -ETIMEDOUT; > } > > Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
Vincent Pelletier writes: > Problem is, I could not reproduce the "Command completion event does not > match command" error either (which used to happen even on successful > boots), even without your patch. Checking my syslog, I see it did > happen with my previous kernel (vanilla 4.6.0). > Could something else since 4.6.0 have improved xhci initialisation on > 00:14.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx, > Celeron N2000 Series USB xHCI (rev 0e) > From a naive "git log drivers/usb/host/xhci*" read, > commit 3425aa03f484d45dc21e0e791c2f6c74ea656421 > Author: Mathias Nyman > Date: Wed Jun 1 18:09:08 2016 +0300 > xhci: Fix handling timeouted commands on hosts in weird states. > caught my eye, along with some quirk reworks but I can't tell if they > could be related to the improvement. > FWIW the following log line > did not change since (at least) 4.4.0 on this machine: > xhci_hcd :00:14.0: hcc params 0x200077c1 hci version 0x100 quirks > 0x9810 > > So I cannot tell for sure whether your patch set improves the situation. > > What I can say though is that after about 5 boots *with* your patch is > that I had no boot failure nor xhci warnings, so at least no obvious > regression either. Thanks for testing. Looks like that root abort cause itself was fixed somewhere. I think that workaround is not needed anymore though, Mathias will decide whether removing or not. Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
Vincent Pelletier writes: >> I couldn't reproduce that "no CRC" >> issue with debian kernel's .config > > I did not diff recently my .config with debian's. I think I based it on > Debian's 4.2 and then let it evolve "naturally" following vanilla. > I checked all diffs against Debian's 4.8 and reduced it a lot (mostly > new drivers which I did not enable). Doing so, I discovered that I > somehow did not have "CONFIG_MODVERSIONS" set at all (not even the > "is not set" comment). > > 4.9-rc7 is done building, there is no warning about CRC anymore - > I'm not sure whether this comes from rc7 or MODVERSIONS. I can check if > you think it is valuable. At 4.9-rc7, MODVERSIONS was marked as BROKEN. It might fix that issue. Well, anyway, it doesn't matter anymore. > Resulting kernel booted successfully once, but I must do several more > tries before being able to tell if it reliably does so. > I also want to check rc7 without your patches, to compare boot failure > rate. > > For reference, the I pushed the source I built in > https://github.com/vpelletier/linux/tree/ts651_4.9-rc7 > Please do check I did not do mistakes when resolving conflicts (the 2 > last commits). Your ts651_4.9-rc7 (latest commit) is broken somehow. The following incremental patch for ts651_4.9-rc7 is to fix it. With this, xhci will become same with mine. Thanks. -- OGAWA Hirofumi --- drivers/usb/host/xhci-mem.c |2 +- drivers/usb/host/xhci-ring.c | 16 +--- 2 files changed, 6 insertions(+), 12 deletions(-) diff -puN drivers/usb/host/xhci-ring.c~ts651_4.9-rc7-fix drivers/usb/host/xhci-ring.c --- linux/drivers/usb/host/xhci-ring.c~ts651_4.9-rc7-fix2016-11-29 01:58:29.103923074 +0900 +++ linux-hirofumi/drivers/usb/host/xhci-ring.c 2016-11-29 02:02:48.958281665 +0900 @@ -327,16 +327,9 @@ static int xhci_abort_cmd_ring(struct xh xhci_dbg(xhci, "Abort command ring\n"); - temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring); - xhci->cmd_ring_state = CMD_RING_STATE_ABORTED; + reinit_completion(&xhci->stop_completion); - /* -* Writing the CMD_RING_ABORT bit should cause a cmd completion event, -* however on some host hw the CMD_RING_RUNNING bit is correctly cleared -* but the completion event in never sent. Use the cmd timeout timer to -* handle those cases. Use twice the time to cover the bit polling retry -*/ - xhci_mod_cmd_timer(xhci, 2 * XHCI_CMD_DEFAULT_TIMEOUT); + temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring); xhci_write_64(xhci, temp_64 | CMD_RING_ABORT, &xhci->op_regs->cmd_ring); @@ -357,6 +350,7 @@ static int xhci_abort_cmd_ring(struct xh xhci_halt(xhci); return -ESHUTDOWN; } + /* * Writing the CMD_RING_ABORT bit should cause a cmd completion event, * however on some host hw the CMD_RING_RUNNING bit is correctly cleared @@ -1318,7 +1312,7 @@ void xhci_handle_command_timeout(struct /* command timeout on stopped ring, ring can't be aborted */ xhci_dbg(xhci, "Command timeout on stopped ring\n"); - complete_all(&xhci->stop_completion); + xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd); spin_unlock_irqrestore(&xhci->lock, flags); return; } @@ -1359,7 +1353,7 @@ static void handle_cmd_completion(struct /* If CMD ring stopped we own the trbs between enqueue and dequeue */ if (cmd_comp_code == COMP_CMD_STOP) { - xhci_handle_stopped_cmd_ring(xhci, cmd); + complete_all(&xhci->stop_completion); return; } diff -puN drivers/usb/host/xhci-mem.c~ts651_4.9-rc7-fix drivers/usb/host/xhci-mem.c --- linux/drivers/usb/host/xhci-mem.c~ts651_4.9-rc7-fix 2016-11-29 02:01:48.178963612 +0900 +++ linux-hirofumi/drivers/usb/host/xhci-mem.c 2016-11-29 02:02:02.341037706 +0900 @@ -1796,7 +1796,7 @@ void xhci_mem_cleanup(struct xhci_hcd *x int size; int i, j, num_ports; - cancel_delayed_work_sync(&xhci->cmd_timer);; + cancel_delayed_work_sync(&xhci->cmd_timer); /* Free the Event Ring Segment Table and the actual Event Ring */ size = sizeof(struct xhci_erst_entry)*(xhci->erst.num_entries); _ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
OGAWA Hirofumi writes: >> I succeed to build 4.9-rc9 vanilla (starting from the same .config as >> with 4.8.9) but it then fails to load produced modules. During the >> build I have the following warnings: >> WARNING: "_copy_from_user" [drivers/hid/hid.ko] has no CRC! > > Looks like, CONFIG_MODVERSIONS is broken. CONFIG_MODVERSIONS=n may fix it. Hm, this issue might be same with 4efca4ed05cbdfd13ec3e8cb623fb77d6e4ab187. But it was already including into 4.9-rc7. 4.9-rc9 was actually -rc1? Well, can you try 4.9-rc7 if older one? Then if not working, can you send .config? I couldn't reproduce that "no CRC" issue with debian kernel's .config (BTW, compiler is debian/testing gcc-6.2.0-13). Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
Vincent Pelletier writes: > I fail to build 4.8.9 vanilla: > Cannot use CONFIG_CC_STACKPROTECTOR_REGULAR: -fstack-protector not > supported by compiler Probably, debian's gcc default-pie issue. Below patch may fix it. > I succeed to build 4.9-rc9 vanilla (starting from the same .config as > with 4.8.9) but it then fails to load produced modules. During the > build I have the following warnings: > WARNING: "_copy_from_user" [drivers/hid/hid.ko] has no CRC! Looks like, CONFIG_MODVERSIONS is broken. CONFIG_MODVERSIONS=n may fix it. Thanks. -- OGAWA Hirofumi 8ae94224c9d72fc4d9aaac93b2d7833cf46d7141 82031ea29e454b574bc6f49a33683a693ca5d907 90944e40ba1838de4b2a9290cf273f9d76bd3bdd c6a385539175ebc603da53aafb7753d39089f32e Signed-off-by: OGAWA Hirofumi --- Makefile |5 +++-- arch/x86/purgatory/Makefile |1 + scripts/gcc-x86_64-has-stack-protector.sh |2 +- 3 files changed, 5 insertions(+), 3 deletions(-) diff -puN Makefile~debian-pie-default-fix Makefile --- linux/Makefile~debian-pie-default-fix 2016-11-26 22:57:00.681881071 +0900 +++ linux-hirofumi/Makefile 2016-11-26 22:57:00.683881081 +0900 @@ -399,11 +399,12 @@ KBUILD_CFLAGS := -Wall -Wundef -Wstric -fno-strict-aliasing -fno-common \ -Werror-implicit-function-declaration \ -Wno-format-security \ - -std=gnu89 + -std=gnu89 $(call cc-option,-fno-PIE) + KBUILD_AFLAGS_KERNEL := KBUILD_CFLAGS_KERNEL := -KBUILD_AFLAGS := -D__ASSEMBLY__ +KBUILD_AFLAGS := -D__ASSEMBLY__ $(call cc-option,-fno-PIE) KBUILD_AFLAGS_MODULE := -DMODULE KBUILD_CFLAGS_MODULE := -DMODULE KBUILD_LDFLAGS_MODULE := -T $(srctree)/scripts/module-common.lds diff -puN scripts/gcc-x86_64-has-stack-protector.sh~debian-pie-default-fix scripts/gcc-x86_64-has-stack-protector.sh --- linux/scripts/gcc-x86_64-has-stack-protector.sh~debian-pie-default-fix 2016-11-26 22:57:00.681881071 +0900 +++ linux-hirofumi/scripts/gcc-x86_64-has-stack-protector.sh2016-11-26 22:57:00.683881081 +0900 @@ -1,6 +1,6 @@ #!/bin/sh -echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -O0 -mcmodel=kernel -fstack-protector - -o - 2> /dev/null | grep -q "%gs" +echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -O0 -mcmodel=kernel -fno-PIE -fstack-protector - -o - 2> /dev/null | grep -q "%gs" if [ "$?" -eq "0" ] ; then echo y else diff -puN arch/x86/purgatory/Makefile~debian-pie-default-fix arch/x86/purgatory/Makefile --- linux/arch/x86/purgatory/Makefile~debian-pie-default-fix2016-11-26 22:57:00.682881076 +0900 +++ linux-hirofumi/arch/x86/purgatory/Makefile 2016-11-26 22:57:00.683881081 +0900 @@ -16,6 +16,7 @@ KCOV_INSTRUMENT := n KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes -fno-zero-initialized-in-bss -fno-builtin -ffreestanding -c -MD -Os -mcmodel=large KBUILD_CFLAGS += -m$(BITS) +KBUILD_CFLAGS += $(call cc-option,-fno-PIE) $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE $(call if_changed,ld) _ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] xhci: Remove busy loop from xhci_abort_cmd_ring()
Mathias Nyman writes: >> - Add xhci_handshake_sleep(), and use it. > > This seems a but overkill, I'd rather don't have xhci_handshake(), > xhci_handshake_sleep() and __xhci_handshake() to maintain. I agree about it. However, on other hand, I thought about the possibility/effort to decreasing use-cases of xhci_handshake() busyloop. (there are several places to use more than e.g. 500ms timeout.) Because long busyloop (especially with interrupt-off) has really bad effect to whole system of non-usb. > As we now can sleep in xhci_abort_cmd_ring() I would rather just first > wait for the completion and then maybe handshake check the register. > At that point it shouldn't need to busyloop anymore, one read should > do it. I worried about in the case of real internal confusing, and consider about chip that doesn't have no stop/abort event. To handle both case, timeout choice would not be straight-forward. > But this is all just optimizing an error path, I'm actually fine with > taking just patch 1/3 and 2/3 v3, and skipping 3/3. the udelay(1000) > to msleep(1) I can add myself Right. The main point of patchset is 1/3 and 2/3. >> As related change, current xhci_handshake() is strange behavior, >> E.g. xhci_handshake(ptr, mask, done, 1) does >> >> result = readl(ptr); >> /* check result */ >> udelay(1); <= meaningless delay >> return -ETIMEDOUT; > > Only in the timeout case, so if we after 3 or 5 million register reads > + udelays(1) still don't get the value we want then there is an > unnecessary udelay(1). > > Not worth putting much effort on it now. True. But actual effort for it was very small. @@ -65,16 +65,18 @@ int xhci_handshake(void __iomem *ptr, u3 { u32 result; - do { + while (1) { result = readl(ptr); if (result == ~(u32)0) /* card removed */ return -ENODEV; result &= mask; if (result == done) return 0; + if (usec <= 0) + break; udelay(1); usec--; - } while (usec > 0); + } return -ETIMEDOUT; } Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
Vincent Pelletier writes: > On Wed, 23 Nov 2016 22:48:35 +0900, OGAWA Hirofumi > wrote: >> ping? > > FWIW, I intend to run this patch on the hardware which caused the > issue (thanks Mathias for the CC !). > > So far, in the very short attempt I made, I failed to even build the > kernel (some stack protection feature error when being probed in > gcc...). I should have time to try again this week-end, but please do > not wait for me. If you need my help for something, let me know either publicly or privately what way you want. Well, I can't see why he is waiting something. My patchset is fix to race, and your stuff is to remove unnecessary code, i.e. patchset has no dependency actually. Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
ping? OGAWA Hirofumi writes: > Current abort operation has race. > > xhci_handle_command_timeout() > xhci_abort_cmd_ring() > xhci_write_64(CMD_RING_ABORT) > xhci_handshake(5s) > do { > check CMD_RING_RUNNING > udelay(1) > ... > COMP_CMD_ABORT event > COMP_CMD_STOP event > xhci_handle_stopped_cmd_ring() >restart cmd_ring > CMD_RING_RUNNING become 1 again > } while () > return -ETIMEDOUT > xhci_write_64(CMD_RING_ABORT) > /* can abort random command */ > > To do abort operation correctly, we have to wait both of COMP_CMD_STOP > event and negation of CMD_RING_RUNNING. > > But like above, while timeout handler is waiting negation of > CMD_RING_RUNNING, event handler can restart cmd_ring. So timeout > handler never be notice negation of CMD_RING_RUNNING, and retry of > CMD_RING_ABORT can abort random command (BTW, I guess retry of > CMD_RING_ABORT was workaround of this race). > > To fix this race, this moves xhci_handle_stopped_cmd_ring() to > xhci_abort_cmd_ring(). And timeout handler waits COMP_CMD_STOP event. > > At this point, timeout handler is owner of cmd_ring, and safely > restart cmd_ring by using xhci_handle_stopped_cmd_ring(). > > [FWIW, as bonus, this way would be easily extend to add CMD_RING_PAUSE > operation] > > Signed-off-by: OGAWA Hirofumi > --- > > drivers/usb/host/xhci-mem.c |1 > drivers/usb/host/xhci-ring.c | 163 > ++ > drivers/usb/host/xhci.h |1 > 3 files changed, 88 insertions(+), 77 deletions(-) > > diff -puN drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 > drivers/usb/host/xhci-mem.c > --- xhci/drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 2016-11-16 > 18:38:52.551146764 +0900 > +++ xhci-hirofumi/drivers/usb/host/xhci-mem.c 2016-11-17 01:04:11.402014445 > +0900 > @@ -2344,6 +2344,7 @@ int xhci_mem_init(struct xhci_hcd *xhci, > > /* init command timeout work */ > INIT_DELAYED_WORK(&xhci->cmd_timer, xhci_handle_command_timeout); > + init_completion(&xhci->cmd_ring_stop_completion); > > page_size = readl(&xhci->op_regs->page_size); > xhci_dbg_trace(xhci, trace_xhci_dbg_init, > diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 > drivers/usb/host/xhci-ring.c > --- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race22016-11-16 > 18:38:52.552146764 +0900 > +++ xhci-hirofumi/drivers/usb/host/xhci-ring.c2016-11-17 > 01:04:06.393018485 +0900 > @@ -284,6 +284,60 @@ static bool xhci_mod_cmd_timer(struct xh > return mod_delayed_work(system_wq, &xhci->cmd_timer, delay); > } > > +static struct xhci_command *xhci_next_queued_cmd(struct xhci_hcd *xhci) > +{ > + return list_first_entry_or_null(&xhci->cmd_list, struct xhci_command, > + cmd_list); > +} > + > +/* > + * Turn all commands on command ring with status set to "aborted" to no-op > trbs. > + * If there are other commands waiting then restart the ring and kick the > timer. > + * This must be called with command ring stopped and xhci->lock held. > + */ > +static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci, > + struct xhci_command *cur_cmd) > +{ > + struct xhci_command *i_cmd; > + u32 cycle_state; > + > + /* Turn all aborted commands in list to no-ops, then restart */ > + list_for_each_entry(i_cmd, &xhci->cmd_list, cmd_list) { > + > + if (i_cmd->status != COMP_CMD_ABORT) > + continue; > + > + i_cmd->status = COMP_CMD_STOP; > + > + xhci_dbg(xhci, "Turn aborted command %p to no-op\n", > + i_cmd->command_trb); > + /* get cycle state from the original cmd trb */ > + cycle_state = le32_to_cpu( > + i_cmd->command_trb->generic.field[3]) & TRB_CYCLE; > + /* modify the command trb to no-op command */ > + i_cmd->command_trb->generic.field[0] = 0; > + i_cmd->command_trb->generic.field[1] = 0; > + i_cmd->command_trb->generic.field[2] = 0; > + i_cmd->command_trb->generic.field[3] = cpu_to_le32( > + TRB_TYPE(TRB_CMD_NOOP) | cycle_state); > + &g
[PATCH 2/3 v3] xhci: Fix race related to abort operation
Current abort operation has race. xhci_handle_command_timeout() xhci_abort_cmd_ring() xhci_write_64(CMD_RING_ABORT) xhci_handshake(5s) do { check CMD_RING_RUNNING udelay(1) ... COMP_CMD_ABORT event COMP_CMD_STOP event xhci_handle_stopped_cmd_ring() restart cmd_ring CMD_RING_RUNNING become 1 again } while () return -ETIMEDOUT xhci_write_64(CMD_RING_ABORT) /* can abort random command */ To do abort operation correctly, we have to wait both of COMP_CMD_STOP event and negation of CMD_RING_RUNNING. But like above, while timeout handler is waiting negation of CMD_RING_RUNNING, event handler can restart cmd_ring. So timeout handler never be notice negation of CMD_RING_RUNNING, and retry of CMD_RING_ABORT can abort random command (BTW, I guess retry of CMD_RING_ABORT was workaround of this race). To fix this race, this moves xhci_handle_stopped_cmd_ring() to xhci_abort_cmd_ring(). And timeout handler waits COMP_CMD_STOP event. At this point, timeout handler is owner of cmd_ring, and safely restart cmd_ring by using xhci_handle_stopped_cmd_ring(). [FWIW, as bonus, this way would be easily extend to add CMD_RING_PAUSE operation] Signed-off-by: OGAWA Hirofumi --- drivers/usb/host/xhci-mem.c |1 drivers/usb/host/xhci-ring.c | 163 ++ drivers/usb/host/xhci.h |1 3 files changed, 88 insertions(+), 77 deletions(-) diff -puN drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 drivers/usb/host/xhci-mem.c --- xhci/drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 2016-11-16 18:38:52.551146764 +0900 +++ xhci-hirofumi/drivers/usb/host/xhci-mem.c 2016-11-17 01:04:11.402014445 +0900 @@ -2344,6 +2344,7 @@ int xhci_mem_init(struct xhci_hcd *xhci, /* init command timeout work */ INIT_DELAYED_WORK(&xhci->cmd_timer, xhci_handle_command_timeout); + init_completion(&xhci->cmd_ring_stop_completion); page_size = readl(&xhci->op_regs->page_size); xhci_dbg_trace(xhci, trace_xhci_dbg_init, diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 drivers/usb/host/xhci-ring.c --- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 2016-11-16 18:38:52.552146764 +0900 +++ xhci-hirofumi/drivers/usb/host/xhci-ring.c 2016-11-17 01:04:06.393018485 +0900 @@ -284,6 +284,60 @@ static bool xhci_mod_cmd_timer(struct xh return mod_delayed_work(system_wq, &xhci->cmd_timer, delay); } +static struct xhci_command *xhci_next_queued_cmd(struct xhci_hcd *xhci) +{ + return list_first_entry_or_null(&xhci->cmd_list, struct xhci_command, + cmd_list); +} + +/* + * Turn all commands on command ring with status set to "aborted" to no-op trbs. + * If there are other commands waiting then restart the ring and kick the timer. + * This must be called with command ring stopped and xhci->lock held. + */ +static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci, +struct xhci_command *cur_cmd) +{ + struct xhci_command *i_cmd; + u32 cycle_state; + + /* Turn all aborted commands in list to no-ops, then restart */ + list_for_each_entry(i_cmd, &xhci->cmd_list, cmd_list) { + + if (i_cmd->status != COMP_CMD_ABORT) + continue; + + i_cmd->status = COMP_CMD_STOP; + + xhci_dbg(xhci, "Turn aborted command %p to no-op\n", +i_cmd->command_trb); + /* get cycle state from the original cmd trb */ + cycle_state = le32_to_cpu( + i_cmd->command_trb->generic.field[3]) & TRB_CYCLE; + /* modify the command trb to no-op command */ + i_cmd->command_trb->generic.field[0] = 0; + i_cmd->command_trb->generic.field[1] = 0; + i_cmd->command_trb->generic.field[2] = 0; + i_cmd->command_trb->generic.field[3] = cpu_to_le32( + TRB_TYPE(TRB_CMD_NOOP) | cycle_state); + + /* +* caller waiting for completion is called when command +* completion event is received for these no-op commands +*/ + } + + xhci->cmd_ring_state = CMD_RING_STATE_RUNNING; + + /* ring command ring doorbell to restart the command ring */ + if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) && + !(xhci->xhc_state & XHCI_STATE_DYING)) { + xhci->current_c
Re: [PATCH 2/3 v2] xhci: Fix race related to abort operation
Mathias Nyman writes: > The retry was a workaround for a case triggered on v4.1.4 by > Vincent Pelletier (added to CC) > >http://marc.info/?l=linux-usb&m=144031185222108&w=2 > > The race could very well explain that issue. >From logs of that thread, it would has possibility about this race. >> @@ -320,15 +367,29 @@ static int xhci_abort_cmd_ring(struct xh >> udelay(1000); >> ret = xhci_handshake(&xhci->op_regs->cmd_ring, >> CMD_RING_RUNNING, 0, 3 * 1000 * 1000); >> -if (ret == 0) >> -return 0; >> +if (ret < 0) { >> +xhci_err(xhci, "Stopped the command ring failed, " >> + "maybe the host is dead\n"); >> +xhci->xhc_state |= XHCI_STATE_DYING; >> +xhci_halt(xhci); >> +return -ESHUTDOWN; >> +} >> +} > > So if we can verify that the race fix solves the old issue for Vincent > Pelletier then we could get rid of the abort retry above as well. Right. However, I'm not sure (and maybe you neither). So I didn't remove it in this patchset. (After this patchset, I guess the retry will hit only when actual chip internal confusion. I.e. retry will fail, then will chip halted.) Well, anyway, if you decide to try to remove the retry, I also think it would worth to try. However, it would be [PATCH 4/4] or as another patchset. >> +/* >> + * Writing the CMD_RING_ABORT bit should cause a cmd completion event, >> + * however on some host hw the CMD_RING_RUNNING bit is correctly cleared >> + * but the completion event in never sent. Wait 2 secs (arbitrary >> + * number) to handle those cases after negation of CMD_RING_RUNNING. >> + */ > > I'm speculating a bit here, but as we now can sleep, and if we could > get rid of the abort retry couldn't we swap the order of the > xhci_handshake(CMD_RING_RUNNING) busywait and > wait_for_completion_timeout(). We could also use a standard command > timeout time while waiting for the completion > > something like: > > hci_write_64(CMD_RING_ABORT) > timed_out = wait_for_completion_timeout(XHCI_CMD_DEFAULT_TIMEOUT) > xhci_handshake(CMD_RING_RUNNING) > if (handshake fail) { > xhci_halt(), etc.. >return -ESHUTDOWN > } > if (timed out) { >xhci_cleanup_command_queue(xhci); >return > } > > It would reduce the time we spend in the xhci_handshake() busyloop BTW, busyloop removal was done by "[PATCH 3/3] ...". By [PATCH 3/3], we can simply use straight forward coding without busyloop. >> diff -puN drivers/usb/host/xhci.h~xhci-fix-abort-race2 >> drivers/usb/host/xhci.h >> --- xhci/drivers/usb/host/xhci.h~xhci-fix-abort-race22016-11-16 >> 18:38:52.552146764 +0900 >> +++ xhci-hirofumi/drivers/usb/host/xhci.h2016-11-16 18:38:52.554146762 >> +0900 >> @@ -1574,6 +1574,7 @@ struct xhci_hcd { >> struct list_head cmd_list; >> unsigned intcmd_ring_reserved_trbs; >> struct delayed_work cmd_timer; >> +struct completion stop_completion; > > Tiny thing, but maybe change stop_completion to > cmd_ring_stop_completion. to avoid mixing it with stopping host > controller, stop endpoint, stop device etc OK. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3 v2] xhci: Fix race related to abort operation
Current abort operation has race. xhci_handle_command_timeout() xhci_abort_cmd_ring() xhci_write_64(CMD_RING_ABORT) xhci_handshake(5s) do { check CMD_RING_RUNNING udelay(1) ... COMP_CMD_ABORT event COMP_CMD_STOP event xhci_handle_stopped_cmd_ring() restart cmd_ring CMD_RING_RUNNING become 1 again } while () return -ETIMEDOUT xhci_write_64(CMD_RING_ABORT) /* can abort random command */ To do abort operation correctly, we have to wait both of COMP_CMD_STOP event and negation of CMD_RING_RUNNING. But like above, while timeout handler is waiting negation of CMD_RING_RUNNING, event handler can restart cmd_ring. So timeout handler never be notice negation of CMD_RING_RUNNING, and retry of CMD_RING_ABORT can abort random command (BTW, I guess retry of CMD_RING_ABORT was workaround of this race). To fix this race, this moves xhci_handle_stopped_cmd_ring() to xhci_abort_cmd_ring(). And timeout handler waits COMP_CMD_STOP event. At this point, timeout handler is owner of cmd_ring, and safely restart cmd_ring by using xhci_handle_stopped_cmd_ring(). [FWIW, as bonus, this way would be easily extend to add CMD_RING_PAUSE operation] Signed-off-by: OGAWA Hirofumi --- drivers/usb/host/xhci-mem.c |1 drivers/usb/host/xhci-ring.c | 162 ++ drivers/usb/host/xhci.h |1 3 files changed, 87 insertions(+), 77 deletions(-) diff -puN drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 drivers/usb/host/xhci-mem.c --- xhci/drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 2016-11-16 18:38:52.551146764 +0900 +++ xhci-hirofumi/drivers/usb/host/xhci-mem.c 2016-11-16 18:38:52.553146763 +0900 @@ -2344,6 +2344,7 @@ int xhci_mem_init(struct xhci_hcd *xhci, /* init command timeout work */ INIT_DELAYED_WORK(&xhci->cmd_timer, xhci_handle_command_timeout); + init_completion(&xhci->stop_completion); page_size = readl(&xhci->op_regs->page_size); xhci_dbg_trace(xhci, trace_xhci_dbg_init, diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 drivers/usb/host/xhci-ring.c --- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 2016-11-16 18:38:52.552146764 +0900 +++ xhci-hirofumi/drivers/usb/host/xhci-ring.c 2016-11-16 18:39:07.027136278 +0900 @@ -284,6 +284,60 @@ static bool xhci_mod_cmd_timer(struct xh return mod_delayed_work(system_wq, &xhci->cmd_timer, delay); } +static struct xhci_command *xhci_next_queued_cmd(struct xhci_hcd *xhci) +{ + return list_first_entry_or_null(&xhci->cmd_list, struct xhci_command, + cmd_list); +} + +/* + * Turn all commands on command ring with status set to "aborted" to no-op trbs. + * If there are other commands waiting then restart the ring and kick the timer. + * This must be called with command ring stopped and xhci->lock held. + */ +static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci, +struct xhci_command *cur_cmd) +{ + struct xhci_command *i_cmd; + u32 cycle_state; + + /* Turn all aborted commands in list to no-ops, then restart */ + list_for_each_entry(i_cmd, &xhci->cmd_list, cmd_list) { + + if (i_cmd->status != COMP_CMD_ABORT) + continue; + + i_cmd->status = COMP_CMD_STOP; + + xhci_dbg(xhci, "Turn aborted command %p to no-op\n", +i_cmd->command_trb); + /* get cycle state from the original cmd trb */ + cycle_state = le32_to_cpu( + i_cmd->command_trb->generic.field[3]) & TRB_CYCLE; + /* modify the command trb to no-op command */ + i_cmd->command_trb->generic.field[0] = 0; + i_cmd->command_trb->generic.field[1] = 0; + i_cmd->command_trb->generic.field[2] = 0; + i_cmd->command_trb->generic.field[3] = cpu_to_le32( + TRB_TYPE(TRB_CMD_NOOP) | cycle_state); + + /* +* caller waiting for completion is called when command +* completion event is received for these no-op commands +*/ + } + + xhci->cmd_ring_state = CMD_RING_STATE_RUNNING; + + /* ring command ring doorbell to restart the command ring */ + if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) && + !(xhci->xhc_state & XHCI_STATE_DYING)) { + xhci->current_cmd =
Re: [PATCH 2/3] xhci: Fix race related to abort operation
Felipe Balbi writes: > Hi, Hi, > OGAWA Hirofumi writes: >> diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 >> drivers/usb/host/xhci-ring.c >> --- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 2016-11-16 >> 13:36:07.219329211 +0900 >> +++ xhci-hirofumi/drivers/usb/host/xhci-ring.c 2016-11-16 >> 13:36:07.221329211 +0900 >> @@ -284,6 +284,61 @@ static bool xhci_mod_cmd_timer(struct xh >> return mod_delayed_work(system_wq, &xhci->cmd_timer, delay); >> } >> >> +static struct xhci_command *xhci_next_queued_cmd(struct xhci_hcd *xhci) >> +{ >> +if (list_empty(&xhci->cmd_list)) >> +return NULL; >> +return list_first_entry(&xhci->cmd_list, struct xhci_command, cmd_list); > > could use list_first_entry_or_null() here OK. Thanks for pointing out. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] xhci: Remove busy loop from xhci_abort_cmd_ring()
Now, xhci_abort_cmd_ring() is sleepable. So no reason to use busy loop anymore. - Convert udelay(1000) => msleep(1). - Add xhci_handshake_sleep(), and use it. As related change, current xhci_handshake() is strange behavior, E.g. xhci_handshake(ptr, mask, done, 1) does result = readl(ptr); /* check result */ udelay(1); <= meaningless delay return -ETIMEDOUT; Instead of above, this changes to do result = readl(ptr); /* check result */ udelay(1); result = readl(ptr); /* check result */ return -ETIMEDOUT; Signed-off-by: OGAWA Hirofumi --- drivers/usb/host/xhci-ring.c | 10 drivers/usb/host/xhci.c | 52 +- drivers/usb/host/xhci.h |1 3 files changed, 48 insertions(+), 15 deletions(-) diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race3 drivers/usb/host/xhci-ring.c --- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race3 2016-11-16 13:36:08.787328641 +0900 +++ xhci-hirofumi/drivers/usb/host/xhci-ring.c 2016-11-16 13:36:08.789328640 +0900 @@ -359,15 +359,15 @@ static int xhci_abort_cmd_ring(struct xh * seconds), then it should assume that the there are * larger problems with the xHC and assert HCRST. */ - ret = xhci_handshake(&xhci->op_regs->cmd_ring, - CMD_RING_RUNNING, 0, 5 * 1000 * 1000); + ret = xhci_handshake_sleep(&xhci->op_regs->cmd_ring, + CMD_RING_RUNNING, 0, 5 * 1000 * 1000); if (ret < 0) { /* we are about to kill xhci, give it one more chance */ xhci_write_64(xhci, temp_64 | CMD_RING_ABORT, &xhci->op_regs->cmd_ring); - udelay(1000); - ret = xhci_handshake(&xhci->op_regs->cmd_ring, -CMD_RING_RUNNING, 0, 3 * 1000 * 1000); + msleep(1); + ret = xhci_handshake_sleep(&xhci->op_regs->cmd_ring, + CMD_RING_RUNNING, 0, 3 * 1000 * 1000); if (ret < 0) { xhci_err(xhci, "Stopped the command ring failed, " "maybe the host is dead\n"); diff -puN drivers/usb/host/xhci.c~xhci-fix-abort-race3 drivers/usb/host/xhci.c --- xhci/drivers/usb/host/xhci.c~xhci-fix-abort-race3 2016-11-16 13:36:08.787328641 +0900 +++ xhci-hirofumi/drivers/usb/host/xhci.c 2016-11-16 13:36:08.790328639 +0900 @@ -61,21 +61,53 @@ MODULE_PARM_DESC(quirks, "Bit flags for * handshake done). There are two failure modes: "usec" have passed (major * hardware flakeout), or the register reads as all-ones (hardware removed). */ +static int __xhci_handshake(void __iomem *ptr, u32 mask, u32 done) +{ + u32 result; + + result = readl(ptr); + if (result == ~(u32)0) /* card removed */ + return -ENODEV; + result &= mask; + if (result == done) + return 0; + return 1; +} + int xhci_handshake(void __iomem *ptr, u32 mask, u32 done, int usec) { - u32 result; + int ret; - do { - result = readl(ptr); - if (result == ~(u32)0) /* card removed */ - return -ENODEV; - result &= mask; - if (result == done) - return 0; + while (1) { + ret = __xhci_handshake(ptr, mask, done); + if (ret <= 0) + goto out; + if (usec <= 0) + break; udelay(1); usec--; - } while (usec > 0); - return -ETIMEDOUT; + } + ret = -ETIMEDOUT; +out: + return ret; +} + +int xhci_handshake_sleep(void __iomem *ptr, u32 mask, u32 done, int usec) +{ + unsigned long timeout = jiffies + usecs_to_jiffies(usec); + int ret; + + while (1) { + ret = __xhci_handshake(ptr, mask, done); + if (ret <= 0) + goto out; + if (time_after(jiffies, timeout)) + break; + usleep_range(1, 10); + } + ret = -ETIMEDOUT; +out: + return ret; } /* diff -puN drivers/usb/host/xhci.h~xhci-fix-abort-race3 drivers/usb/host/xhci.h --- xhci/drivers/usb/host/xhci.h~xhci-fix-abort-race3 2016-11-16 13:36:08.788328640 +0900 +++ xhci-hirofumi/drivers/usb/host/xhci.h 2016-11-16 13:36:08.790328639 +0900 @@ -1844,6 +1844,7 @@ void xhci_free_command(struct xhci_hcd * /* xHCI host controller glue */ typedef void (*xhci_get_quirks_t)(struct device *, struct xhci_hcd *); int xhci_handshake(void __iomem *ptr, u32 mask, u32 done, int usec); +int xhci_handshake_sleep(void __iomem *ptr, u32 mask, u32 done, int usec); v
[PATCH 2/3] xhci: Fix race related to abort operation
Current abort operation has race. xhci_handle_command_timeout() xhci_abort_cmd_ring() xhci_write_64(CMD_RING_ABORT) xhci_handshake(5s) do { check CMD_RING_RUNNING udelay(1) ... COMP_CMD_ABORT event COMP_CMD_STOP event xhci_handle_stopped_cmd_ring() restart cmd_ring CMD_RING_RUNNING become 1 again } while () return -ETIMEDOUT xhci_write_64(CMD_RING_ABORT) /* can abort random command */ To do abort operation correctly, we have to wait both of COMP_CMD_STOP event and negation of CMD_RING_RUNNING. But like above, while timeout handler is waiting negation of CMD_RING_RUNNING, event handler can restart cmd_ring. So timeout handler never be notice negation of CMD_RING_RUNNING, and retry of CMD_RING_ABORT can abort random command (BTW, I guess retry of CMD_RING_ABORT was workaround of this race). To fix this race, this moves xhci_handle_stopped_cmd_ring() to xhci_abort_cmd_ring(). And timeout handler waits COMP_CMD_STOP event. At this point, timeout handler is owner of cmd_ring, and safely restart cmd_ring by using xhci_handle_stopped_cmd_ring(). [FWIW, as bonus, this way would be easily extend to add CMD_RING_PAUSE operation] Signed-off-by: OGAWA Hirofumi --- drivers/usb/host/xhci-mem.c |1 drivers/usb/host/xhci-ring.c | 163 ++ drivers/usb/host/xhci.h |1 3 files changed, 88 insertions(+), 77 deletions(-) diff -puN drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 drivers/usb/host/xhci-mem.c --- xhci/drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 2016-11-16 13:36:07.219329211 +0900 +++ xhci-hirofumi/drivers/usb/host/xhci-mem.c 2016-11-16 13:36:07.221329211 +0900 @@ -2344,6 +2344,7 @@ int xhci_mem_init(struct xhci_hcd *xhci, /* init command timeout work */ INIT_DELAYED_WORK(&xhci->cmd_timer, xhci_handle_command_timeout); + init_completion(&xhci->stop_completion); page_size = readl(&xhci->op_regs->page_size); xhci_dbg_trace(xhci, trace_xhci_dbg_init, diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 drivers/usb/host/xhci-ring.c --- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 2016-11-16 13:36:07.219329211 +0900 +++ xhci-hirofumi/drivers/usb/host/xhci-ring.c 2016-11-16 13:36:07.221329211 +0900 @@ -284,6 +284,61 @@ static bool xhci_mod_cmd_timer(struct xh return mod_delayed_work(system_wq, &xhci->cmd_timer, delay); } +static struct xhci_command *xhci_next_queued_cmd(struct xhci_hcd *xhci) +{ + if (list_empty(&xhci->cmd_list)) + return NULL; + return list_first_entry(&xhci->cmd_list, struct xhci_command, cmd_list); +} + +/* + * Turn all commands on command ring with status set to "aborted" to no-op trbs. + * If there are other commands waiting then restart the ring and kick the timer. + * This must be called with command ring stopped and xhci->lock held. + */ +static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci, +struct xhci_command *cur_cmd) +{ + struct xhci_command *i_cmd; + u32 cycle_state; + + /* Turn all aborted commands in list to no-ops, then restart */ + list_for_each_entry(i_cmd, &xhci->cmd_list, cmd_list) { + + if (i_cmd->status != COMP_CMD_ABORT) + continue; + + i_cmd->status = COMP_CMD_STOP; + + xhci_dbg(xhci, "Turn aborted command %p to no-op\n", +i_cmd->command_trb); + /* get cycle state from the original cmd trb */ + cycle_state = le32_to_cpu( + i_cmd->command_trb->generic.field[3]) & TRB_CYCLE; + /* modify the command trb to no-op command */ + i_cmd->command_trb->generic.field[0] = 0; + i_cmd->command_trb->generic.field[1] = 0; + i_cmd->command_trb->generic.field[2] = 0; + i_cmd->command_trb->generic.field[3] = cpu_to_le32( + TRB_TYPE(TRB_CMD_NOOP) | cycle_state); + + /* +* caller waiting for completion is called when command +* completion event is received for these no-op commands +*/ + } + + xhci->cmd_ring_state = CMD_RING_STATE_RUNNING; + + /* ring command ring doorbell to restart the command ring */ + if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) && + !(xhci->xhc_state & XHCI_STATE_DYING)) { +
[PATCH 1/3] xhci: Use delayed_work instead of timer for command timeout
This is preparation to fix abort operation race (See "xhci: Fix race related to abort operation"). To make timeout sleepable, use delayed_work instead of timer. Signed-off-by: OGAWA Hirofumi --- drivers/usb/host/xhci-mem.c |7 +++ drivers/usb/host/xhci-ring.c | 24 +++- drivers/usb/host/xhci.h |4 ++-- 3 files changed, 20 insertions(+), 15 deletions(-) diff -puN drivers/usb/host/xhci-mem.c~xhci-fix-abort-race1 drivers/usb/host/xhci-mem.c --- xhci/drivers/usb/host/xhci-mem.c~xhci-fix-abort-race1 2016-11-16 13:36:04.132330334 +0900 +++ xhci-hirofumi/drivers/usb/host/xhci-mem.c 2016-11-16 13:36:04.135330333 +0900 @@ -1795,7 +1795,7 @@ void xhci_mem_cleanup(struct xhci_hcd *x int size; int i, j, num_ports; - del_timer_sync(&xhci->cmd_timer); + cancel_delayed_work_sync(&xhci->cmd_timer); /* Free the Event Ring Segment Table and the actual Event Ring */ size = sizeof(struct xhci_erst_entry)*(xhci->erst.num_entries); @@ -2342,9 +2342,8 @@ int xhci_mem_init(struct xhci_hcd *xhci, INIT_LIST_HEAD(&xhci->cmd_list); - /* init command timeout timer */ - setup_timer(&xhci->cmd_timer, xhci_handle_command_timeout, - (unsigned long)xhci); + /* init command timeout work */ + INIT_DELAYED_WORK(&xhci->cmd_timer, xhci_handle_command_timeout); page_size = readl(&xhci->op_regs->page_size); xhci_dbg_trace(xhci, trace_xhci_dbg_init, diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race1 drivers/usb/host/xhci-ring.c --- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race1 2016-11-16 13:36:04.10334 +0900 +++ xhci-hirofumi/drivers/usb/host/xhci-ring.c 2016-11-16 13:36:04.135330333 +0900 @@ -279,6 +279,11 @@ void xhci_ring_cmd_db(struct xhci_hcd *x readl(&xhci->dba->doorbell[0]); } +static bool xhci_mod_cmd_timer(struct xhci_hcd *xhci, unsigned long delay) +{ + return mod_delayed_work(system_wq, &xhci->cmd_timer, delay); +} + static int xhci_abort_cmd_ring(struct xhci_hcd *xhci) { u64 temp_64; @@ -295,7 +300,7 @@ static int xhci_abort_cmd_ring(struct xh * but the completion event in never sent. Use the cmd timeout timer to * handle those cases. Use twice the time to cover the bit polling retry */ - mod_timer(&xhci->cmd_timer, jiffies + (2 * XHCI_CMD_DEFAULT_TIMEOUT)); + xhci_mod_cmd_timer(xhci, 2 * XHCI_CMD_DEFAULT_TIMEOUT); xhci_write_64(xhci, temp_64 | CMD_RING_ABORT, &xhci->op_regs->cmd_ring); @@ -320,7 +325,7 @@ static int xhci_abort_cmd_ring(struct xh xhci_err(xhci, "Stopped the command ring failed, " "maybe the host is dead\n"); - del_timer(&xhci->cmd_timer); + cancel_delayed_work(&xhci->cmd_timer); xhci->xhc_state |= XHCI_STATE_DYING; xhci_halt(xhci); return -ESHUTDOWN; @@ -1251,21 +1256,22 @@ static void xhci_handle_stopped_cmd_ring if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) && !(xhci->xhc_state & XHCI_STATE_DYING)) { xhci->current_cmd = cur_cmd; - mod_timer(&xhci->cmd_timer, jiffies + XHCI_CMD_DEFAULT_TIMEOUT); + xhci_mod_cmd_timer(xhci, XHCI_CMD_DEFAULT_TIMEOUT); xhci_ring_cmd_db(xhci); } return; } -void xhci_handle_command_timeout(unsigned long data) +void xhci_handle_command_timeout(struct work_struct *work) { struct xhci_hcd *xhci; int ret; unsigned long flags; u64 hw_ring_state; bool second_timeout = false; - xhci = (struct xhci_hcd *) data; + + xhci = container_of(to_delayed_work(work), struct xhci_hcd, cmd_timer); /* mark this command to be cancelled */ spin_lock_irqsave(&xhci->lock, flags); @@ -1333,7 +1339,7 @@ static void handle_cmd_completion(struct cmd = list_entry(xhci->cmd_list.next, struct xhci_command, cmd_list); - del_timer(&xhci->cmd_timer); + cancel_delayed_work(&xhci->cmd_timer); trace_xhci_cmd_completion(cmd_trb, (struct xhci_generic_trb *) event); @@ -1421,7 +1427,7 @@ static void handle_cmd_completion(struct if (cmd->cmd_list.next != &xhci->cmd_list) { xhci->current_cmd = list_entry(cmd->cmd_list.next, struct xhci_command, cmd_list); - mod_timer(&xhci->cmd_timer, jiffies + XHCI_CMD_DEFAULT_TIMEOUT); + xhci_mod_cmd_timer(xhci, XHCI_CMD_DEFAULT_TIMEOUT); } event_handled: @@ -3790,9 +3796,9 @@ static int queue_command(struct xhci_hcd /* if the
Re: [PATCH] xhci: Fix dereference from empty xhci->cmd_list
Mathias Nyman writes: >> OGAWA Hirofumi writes: >>> If race with timeout or stop event on empty cmd_list was happened, >>> handle_cmd_completion() deferences wrong pointer from xhci->cmd_list. >>> >>> So this makes sure we don't dereference wrong pointer from empty >>> xhci->cmd_list. >>> >>> Signed-off-by: OGAWA Hirofumi >>> --- > > Thanks, nice catch. > > What kernel is this based on, There is a partial fix in 4.8: > > commit 33be126510974e2eb9679f1ca9bca4f67ee4c4c7 > xhci: always handle "Command Ring Stopped" events > > Looking at the existing code the case you describe could be possible if > handle_cmd_completion is racing with: > - command timed out twice and xhci_cleanup_command_queue() is called > - or command timed out and we fail to restart command ring (assume host is > dead) > > It would be interesting to know how this was triggered, were there any error > messages like: > "Abort command ring failed" > or with xhci debugging enabled something like: > "command timed out twice, ring start fail?" Ah, sorry for confusing. I wrote this for older kernel (3.8.x, and not for vanilla). So 4.8 might not have race what I saw actually. (33be126510974e2eb9679f1ca9bca4f67ee4c4c7 seems to be prevent it) However, even if 4.8 fixed that case, if there is spurious event or driver state confusing, fetch and dereference empty list is possible and bad thing. So, IMO, it would be better to make sure to avoid it. >>> + if (list_empty(&xhci->cmd_list)) >>> + cmd = NULL; >>> + else { >>> + cmd = list_first_entry(&xhci->cmd_list, struct xhci_command, >>> + cmd_list); >>> + } >> >> These few lines could become a nice helper. Something like: >> >> static inline struct xhci_command *next_command(struct xhci_hcd *xhci) >> { >> if (list_empty(&xhci->cmd_list)) >> return NULL; >> return list_first_entry(&xhci->cmd_list, struct xhci_command, cmd_list); >> } >> > > Both works for me, helper function, or code in this patch. There are some opencode list_head usages too. It might be better to change at once as list cleanup if need. E.g. /* restart timer if this wasn't the last command */ if (cmd->cmd_list.next != &xhci->cmd_list) { is if (!list_is_last(&cmd->cmd_list, &xhci->cmd_list)) { Well, so, if next_command() cleanup is not required, I'd like to use as is, in this patch. Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] xhci: Fix dereference from empty xhci->cmd_list
If race with timeout or stop event on empty cmd_list was happened, handle_cmd_completion() deferences wrong pointer from xhci->cmd_list. So this makes sure we don't dereference wrong pointer from empty xhci->cmd_list. Signed-off-by: OGAWA Hirofumi --- drivers/usb/host/xhci-ring.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-empty-list drivers/usb/host/xhci-ring.c --- linux/drivers/usb/host/xhci-ring.c~xhci-fix-empty-list 2016-10-18 21:20:07.973055018 +0900 +++ linux-hirofumi/drivers/usb/host/xhci-ring.c 2016-10-18 21:21:35.949060365 +0900 @@ -1336,7 +1336,12 @@ static void handle_cmd_completion(struct return; } - cmd = list_entry(xhci->cmd_list.next, struct xhci_command, cmd_list); + if (list_empty(&xhci->cmd_list)) + cmd = NULL; + else { + cmd = list_first_entry(&xhci->cmd_list, struct xhci_command, + cmd_list); + } del_timer(&xhci->cmd_timer); @@ -1350,9 +1355,10 @@ static void handle_cmd_completion(struct return; } - if (cmd->command_trb != xhci->cmd_ring->dequeue) { + if (cmd == NULL || cmd->command_trb != xhci->cmd_ring->dequeue) { xhci_err(xhci, -"Command completion event does not match command\n"); +"Command completion event %s\n", +cmd ? "does not match command" : "on empty list"); return; } _ -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html