Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
Vincent Pelletierwrites: > Problem is, I could not reproduce the "Command completion event does not > match command" error either (which used to happen even on successful > boots), even without your patch. Checking my syslog, I see it did > happen with my previous kernel (vanilla 4.6.0). > Could something else since 4.6.0 have improved xhci initialisation on > 00:14.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx, > Celeron N2000 Series USB xHCI (rev 0e) > From a naive "git log drivers/usb/host/xhci*" read, > commit 3425aa03f484d45dc21e0e791c2f6c74ea656421 > Author: Mathias Nyman > Date: Wed Jun 1 18:09:08 2016 +0300 > xhci: Fix handling timeouted commands on hosts in weird states. > caught my eye, along with some quirk reworks but I can't tell if they > could be related to the improvement. > FWIW the following log line > did not change since (at least) 4.4.0 on this machine: > xhci_hcd :00:14.0: hcc params 0x200077c1 hci version 0x100 quirks > 0x9810 > > So I cannot tell for sure whether your patch set improves the situation. > > What I can say though is that after about 5 boots *with* your patch is > that I had no boot failure nor xhci warnings, so at least no obvious > regression either. Thanks for testing. Looks like that root abort cause itself was fixed somewhere. I think that workaround is not needed anymore though, Mathias will decide whether removing or not. Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
On Tue, 29 Nov 2016 02:11:48 +0900, OGAWA Hirofumiwrote: > Your ts651_4.9-rc7 (latest commit) is broken somehow. The following > incremental patch for ts651_4.9-rc7 is to fix it. With this, xhci will > become same with mine. Thanks, applied. As a test, I tried to get the xhci driver to give up on initialise the controller (the bug I reported originally on 4.1.4) I failed to reproduce this bug, despite having applied the following revert: commit 4cfad5b7601a46b8f3c5b267abe52a9cbf77bdaf Author: Vincent Pelletier Date: Sun Nov 27 01:23:46 2016 + Revert "xhci: give command abortion one more chance before killing xhci" This reverts commit a6809ffd1687b3a8c192960e69add559b9d32649. I gave up after about 10 boots, with a mix of cold boots and warm reboots. Problem is, I could not reproduce the "Command completion event does not match command" error either (which used to happen even on successful boots), even without your patch. Checking my syslog, I see it did happen with my previous kernel (vanilla 4.6.0). Could something else since 4.6.0 have improved xhci initialisation on 00:14.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI (rev 0e) From a naive "git log drivers/usb/host/xhci*" read, commit 3425aa03f484d45dc21e0e791c2f6c74ea656421 Author: Mathias Nyman Date: Wed Jun 1 18:09:08 2016 +0300 xhci: Fix handling timeouted commands on hosts in weird states. caught my eye, along with some quirk reworks but I can't tell if they could be related to the improvement. FWIW the following log line did not change since (at least) 4.4.0 on this machine: xhci_hcd :00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x9810 So I cannot tell for sure whether your patch set improves the situation. What I can say though is that after about 5 boots *with* your patch is that I had no boot failure nor xhci warnings, so at least no obvious regression either. Regards, -- Vincent Pelletier -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
Vincent Pelletierwrites: >> I couldn't reproduce that "no CRC" >> issue with debian kernel's .config > > I did not diff recently my .config with debian's. I think I based it on > Debian's 4.2 and then let it evolve "naturally" following vanilla. > I checked all diffs against Debian's 4.8 and reduced it a lot (mostly > new drivers which I did not enable). Doing so, I discovered that I > somehow did not have "CONFIG_MODVERSIONS" set at all (not even the > "is not set" comment). > > 4.9-rc7 is done building, there is no warning about CRC anymore - > I'm not sure whether this comes from rc7 or MODVERSIONS. I can check if > you think it is valuable. At 4.9-rc7, MODVERSIONS was marked as BROKEN. It might fix that issue. Well, anyway, it doesn't matter anymore. > Resulting kernel booted successfully once, but I must do several more > tries before being able to tell if it reliably does so. > I also want to check rc7 without your patches, to compare boot failure > rate. > > For reference, the I pushed the source I built in > https://github.com/vpelletier/linux/tree/ts651_4.9-rc7 > Please do check I did not do mistakes when resolving conflicts (the 2 > last commits). Your ts651_4.9-rc7 (latest commit) is broken somehow. The following incremental patch for ts651_4.9-rc7 is to fix it. With this, xhci will become same with mine. Thanks. -- OGAWA Hirofumi --- drivers/usb/host/xhci-mem.c |2 +- drivers/usb/host/xhci-ring.c | 16 +--- 2 files changed, 6 insertions(+), 12 deletions(-) diff -puN drivers/usb/host/xhci-ring.c~ts651_4.9-rc7-fix drivers/usb/host/xhci-ring.c --- linux/drivers/usb/host/xhci-ring.c~ts651_4.9-rc7-fix2016-11-29 01:58:29.103923074 +0900 +++ linux-hirofumi/drivers/usb/host/xhci-ring.c 2016-11-29 02:02:48.958281665 +0900 @@ -327,16 +327,9 @@ static int xhci_abort_cmd_ring(struct xh xhci_dbg(xhci, "Abort command ring\n"); - temp_64 = xhci_read_64(xhci, >op_regs->cmd_ring); - xhci->cmd_ring_state = CMD_RING_STATE_ABORTED; + reinit_completion(>stop_completion); - /* -* Writing the CMD_RING_ABORT bit should cause a cmd completion event, -* however on some host hw the CMD_RING_RUNNING bit is correctly cleared -* but the completion event in never sent. Use the cmd timeout timer to -* handle those cases. Use twice the time to cover the bit polling retry -*/ - xhci_mod_cmd_timer(xhci, 2 * XHCI_CMD_DEFAULT_TIMEOUT); + temp_64 = xhci_read_64(xhci, >op_regs->cmd_ring); xhci_write_64(xhci, temp_64 | CMD_RING_ABORT, >op_regs->cmd_ring); @@ -357,6 +350,7 @@ static int xhci_abort_cmd_ring(struct xh xhci_halt(xhci); return -ESHUTDOWN; } + /* * Writing the CMD_RING_ABORT bit should cause a cmd completion event, * however on some host hw the CMD_RING_RUNNING bit is correctly cleared @@ -1318,7 +1312,7 @@ void xhci_handle_command_timeout(struct /* command timeout on stopped ring, ring can't be aborted */ xhci_dbg(xhci, "Command timeout on stopped ring\n"); - complete_all(>stop_completion); + xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd); spin_unlock_irqrestore(>lock, flags); return; } @@ -1359,7 +1353,7 @@ static void handle_cmd_completion(struct /* If CMD ring stopped we own the trbs between enqueue and dequeue */ if (cmd_comp_code == COMP_CMD_STOP) { - xhci_handle_stopped_cmd_ring(xhci, cmd); + complete_all(>stop_completion); return; } diff -puN drivers/usb/host/xhci-mem.c~ts651_4.9-rc7-fix drivers/usb/host/xhci-mem.c --- linux/drivers/usb/host/xhci-mem.c~ts651_4.9-rc7-fix 2016-11-29 02:01:48.178963612 +0900 +++ linux-hirofumi/drivers/usb/host/xhci-mem.c 2016-11-29 02:02:02.341037706 +0900 @@ -1796,7 +1796,7 @@ void xhci_mem_cleanup(struct xhci_hcd *x int size; int i, j, num_ports; - cancel_delayed_work_sync(>cmd_timer);; + cancel_delayed_work_sync(>cmd_timer); /* Free the Event Ring Segment Table and the actual Event Ring */ size = sizeof(struct xhci_erst_entry)*(xhci->erst.num_entries); _ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
On Mon, 28 Nov 2016 21:57:11 +0900, OGAWA Hirofumiwrote: > OGAWA Hirofumi writes: > Hm, this issue might be same with 4efca4ed05cbdfd13ec3e8cb623fb77d6e4ab187. > But it was already including into 4.9-rc7. My bad, it was rc6, not rc9. > can you try 4.9-rc7 if older one? Updated & rebased, building. > I couldn't reproduce that "no CRC" > issue with debian kernel's .config I did not diff recently my .config with debian's. I think I based it on Debian's 4.2 and then let it evolve "naturally" following vanilla. I checked all diffs against Debian's 4.8 and reduced it a lot (mostly new drivers which I did not enable). Doing so, I discovered that I somehow did not have "CONFIG_MODVERSIONS" set at all (not even the "is not set" comment). 4.9-rc7 is done building, there is no warning about CRC anymore - I'm not sure whether this comes from rc7 or MODVERSIONS. I can check if you think it is valuable. Resulting kernel booted successfully once, but I must do several more tries before being able to tell if it reliably does so. I also want to check rc7 without your patches, to compare boot failure rate. For reference, the I pushed the source I built in https://github.com/vpelletier/linux/tree/ts651_4.9-rc7 Please do check I did not do mistakes when resolving conflicts (the 2 last commits). Regards, -- Vincent Pelletier -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
OGAWA Hirofumiwrites: >> I succeed to build 4.9-rc9 vanilla (starting from the same .config as >> with 4.8.9) but it then fails to load produced modules. During the >> build I have the following warnings: >> WARNING: "_copy_from_user" [drivers/hid/hid.ko] has no CRC! > > Looks like, CONFIG_MODVERSIONS is broken. CONFIG_MODVERSIONS=n may fix it. Hm, this issue might be same with 4efca4ed05cbdfd13ec3e8cb623fb77d6e4ab187. But it was already including into 4.9-rc7. 4.9-rc9 was actually -rc1? Well, can you try 4.9-rc7 if older one? Then if not working, can you send .config? I couldn't reproduce that "no CRC" issue with debian kernel's .config (BTW, compiler is debian/testing gcc-6.2.0-13). Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
Vincent Pelletierwrites: > I fail to build 4.8.9 vanilla: > Cannot use CONFIG_CC_STACKPROTECTOR_REGULAR: -fstack-protector not > supported by compiler Probably, debian's gcc default-pie issue. Below patch may fix it. > I succeed to build 4.9-rc9 vanilla (starting from the same .config as > with 4.8.9) but it then fails to load produced modules. During the > build I have the following warnings: > WARNING: "_copy_from_user" [drivers/hid/hid.ko] has no CRC! Looks like, CONFIG_MODVERSIONS is broken. CONFIG_MODVERSIONS=n may fix it. Thanks. -- OGAWA Hirofumi 8ae94224c9d72fc4d9aaac93b2d7833cf46d7141 82031ea29e454b574bc6f49a33683a693ca5d907 90944e40ba1838de4b2a9290cf273f9d76bd3bdd c6a385539175ebc603da53aafb7753d39089f32e Signed-off-by: OGAWA Hirofumi --- Makefile |5 +++-- arch/x86/purgatory/Makefile |1 + scripts/gcc-x86_64-has-stack-protector.sh |2 +- 3 files changed, 5 insertions(+), 3 deletions(-) diff -puN Makefile~debian-pie-default-fix Makefile --- linux/Makefile~debian-pie-default-fix 2016-11-26 22:57:00.681881071 +0900 +++ linux-hirofumi/Makefile 2016-11-26 22:57:00.683881081 +0900 @@ -399,11 +399,12 @@ KBUILD_CFLAGS := -Wall -Wundef -Wstric -fno-strict-aliasing -fno-common \ -Werror-implicit-function-declaration \ -Wno-format-security \ - -std=gnu89 + -std=gnu89 $(call cc-option,-fno-PIE) + KBUILD_AFLAGS_KERNEL := KBUILD_CFLAGS_KERNEL := -KBUILD_AFLAGS := -D__ASSEMBLY__ +KBUILD_AFLAGS := -D__ASSEMBLY__ $(call cc-option,-fno-PIE) KBUILD_AFLAGS_MODULE := -DMODULE KBUILD_CFLAGS_MODULE := -DMODULE KBUILD_LDFLAGS_MODULE := -T $(srctree)/scripts/module-common.lds diff -puN scripts/gcc-x86_64-has-stack-protector.sh~debian-pie-default-fix scripts/gcc-x86_64-has-stack-protector.sh --- linux/scripts/gcc-x86_64-has-stack-protector.sh~debian-pie-default-fix 2016-11-26 22:57:00.681881071 +0900 +++ linux-hirofumi/scripts/gcc-x86_64-has-stack-protector.sh2016-11-26 22:57:00.683881081 +0900 @@ -1,6 +1,6 @@ #!/bin/sh -echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -O0 -mcmodel=kernel -fstack-protector - -o - 2> /dev/null | grep -q "%gs" +echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -O0 -mcmodel=kernel -fno-PIE -fstack-protector - -o - 2> /dev/null | grep -q "%gs" if [ "$?" -eq "0" ] ; then echo y else diff -puN arch/x86/purgatory/Makefile~debian-pie-default-fix arch/x86/purgatory/Makefile --- linux/arch/x86/purgatory/Makefile~debian-pie-default-fix2016-11-26 22:57:00.682881076 +0900 +++ linux-hirofumi/arch/x86/purgatory/Makefile 2016-11-26 22:57:00.683881081 +0900 @@ -16,6 +16,7 @@ KCOV_INSTRUMENT := n KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes -fno-zero-initialized-in-bss -fno-builtin -ffreestanding -c -MD -Os -mcmodel=large KBUILD_CFLAGS += -m$(BITS) +KBUILD_CFLAGS += $(call cc-option,-fno-PIE) $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE $(call if_changed,ld) _ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
On Thu, 24 Nov 2016 19:42:04 +0900, OGAWA Hirofumiwrote: > If you need my help for something, let me know either publicly or > privately what way you want. So, I fail a lot. And in ways totally unrelated to this patchset. My goal is to get a kernel image built with a trivial platform patch for my hardware (declare a few GPIO with leds and keys), a revert of "xhci: give command abortion one more chance before killing xhci" a6809ffd1687b3a8c192960e69add559b9d32649 to be able to tell if your changes improve the situation, plus a backport of changes 1 & 2 (which conflict with the revert), as I saw 3 is not essential and maybe on the way out. I fail to build 4.8.9 vanilla: Cannot use CONFIG_CC_STACKPROTECTOR_REGULAR: -fstack-protector not supported by compiler I succeed to build 4.9-rc9 vanilla (starting from the same .config as with 4.8.9) but it then fails to load produced modules. During the build I have the following warnings: WARNING: "_copy_from_user" [drivers/hid/hid.ko] has no CRC! Any idea on either ? I am building from Debian sid, build & target are x86_64. $ gcc --version gcc (Debian 6.2.0-13) 6.2.0 20161109 Build command line: $ make deb-pkg Regards, -- Vincent Pelletier -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
Vincent Pelletierwrites: > On Wed, 23 Nov 2016 22:48:35 +0900, OGAWA Hirofumi > wrote: >> ping? > > FWIW, I intend to run this patch on the hardware which caused the > issue (thanks Mathias for the CC !). > > So far, in the very short attempt I made, I failed to even build the > kernel (some stack protection feature error when being probed in > gcc...). I should have time to try again this week-end, but please do > not wait for me. If you need my help for something, let me know either publicly or privately what way you want. Well, I can't see why he is waiting something. My patchset is fix to race, and your stuff is to remove unnecessary code, i.e. patchset has no dependency actually. Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
On Wed, 23 Nov 2016 22:48:35 +0900, OGAWA Hirofumiwrote: > ping? FWIW, I intend to run this patch on the hardware which caused the issue (thanks Mathias for the CC !). So far, in the very short attempt I made, I failed to even build the kernel (some stack protection feature error when being probed in gcc...). I should have time to try again this week-end, but please do not wait for me. Regards, -- Vincent Pelletier -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation
ping? OGAWA Hirofumiwrites: > Current abort operation has race. > > xhci_handle_command_timeout() > xhci_abort_cmd_ring() > xhci_write_64(CMD_RING_ABORT) > xhci_handshake(5s) > do { > check CMD_RING_RUNNING > udelay(1) > ... > COMP_CMD_ABORT event > COMP_CMD_STOP event > xhci_handle_stopped_cmd_ring() >restart cmd_ring > CMD_RING_RUNNING become 1 again > } while () > return -ETIMEDOUT > xhci_write_64(CMD_RING_ABORT) > /* can abort random command */ > > To do abort operation correctly, we have to wait both of COMP_CMD_STOP > event and negation of CMD_RING_RUNNING. > > But like above, while timeout handler is waiting negation of > CMD_RING_RUNNING, event handler can restart cmd_ring. So timeout > handler never be notice negation of CMD_RING_RUNNING, and retry of > CMD_RING_ABORT can abort random command (BTW, I guess retry of > CMD_RING_ABORT was workaround of this race). > > To fix this race, this moves xhci_handle_stopped_cmd_ring() to > xhci_abort_cmd_ring(). And timeout handler waits COMP_CMD_STOP event. > > At this point, timeout handler is owner of cmd_ring, and safely > restart cmd_ring by using xhci_handle_stopped_cmd_ring(). > > [FWIW, as bonus, this way would be easily extend to add CMD_RING_PAUSE > operation] > > Signed-off-by: OGAWA Hirofumi > --- > > drivers/usb/host/xhci-mem.c |1 > drivers/usb/host/xhci-ring.c | 163 > ++ > drivers/usb/host/xhci.h |1 > 3 files changed, 88 insertions(+), 77 deletions(-) > > diff -puN drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 > drivers/usb/host/xhci-mem.c > --- xhci/drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 2016-11-16 > 18:38:52.551146764 +0900 > +++ xhci-hirofumi/drivers/usb/host/xhci-mem.c 2016-11-17 01:04:11.402014445 > +0900 > @@ -2344,6 +2344,7 @@ int xhci_mem_init(struct xhci_hcd *xhci, > > /* init command timeout work */ > INIT_DELAYED_WORK(>cmd_timer, xhci_handle_command_timeout); > + init_completion(>cmd_ring_stop_completion); > > page_size = readl(>op_regs->page_size); > xhci_dbg_trace(xhci, trace_xhci_dbg_init, > diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 > drivers/usb/host/xhci-ring.c > --- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race22016-11-16 > 18:38:52.552146764 +0900 > +++ xhci-hirofumi/drivers/usb/host/xhci-ring.c2016-11-17 > 01:04:06.393018485 +0900 > @@ -284,6 +284,60 @@ static bool xhci_mod_cmd_timer(struct xh > return mod_delayed_work(system_wq, >cmd_timer, delay); > } > > +static struct xhci_command *xhci_next_queued_cmd(struct xhci_hcd *xhci) > +{ > + return list_first_entry_or_null(>cmd_list, struct xhci_command, > + cmd_list); > +} > + > +/* > + * Turn all commands on command ring with status set to "aborted" to no-op > trbs. > + * If there are other commands waiting then restart the ring and kick the > timer. > + * This must be called with command ring stopped and xhci->lock held. > + */ > +static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci, > + struct xhci_command *cur_cmd) > +{ > + struct xhci_command *i_cmd; > + u32 cycle_state; > + > + /* Turn all aborted commands in list to no-ops, then restart */ > + list_for_each_entry(i_cmd, >cmd_list, cmd_list) { > + > + if (i_cmd->status != COMP_CMD_ABORT) > + continue; > + > + i_cmd->status = COMP_CMD_STOP; > + > + xhci_dbg(xhci, "Turn aborted command %p to no-op\n", > + i_cmd->command_trb); > + /* get cycle state from the original cmd trb */ > + cycle_state = le32_to_cpu( > + i_cmd->command_trb->generic.field[3]) & TRB_CYCLE; > + /* modify the command trb to no-op command */ > + i_cmd->command_trb->generic.field[0] = 0; > + i_cmd->command_trb->generic.field[1] = 0; > + i_cmd->command_trb->generic.field[2] = 0; > + i_cmd->command_trb->generic.field[3] = cpu_to_le32( > + TRB_TYPE(TRB_CMD_NOOP) | cycle_state); > + > + /* > + * caller waiting for completion is called when command > + * completion event is received for these no-op commands > + */ > + } > + > + xhci->cmd_ring_state = CMD_RING_STATE_RUNNING; > + > + /* ring command ring doorbell to restart the command ring */ > + if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) && > + !(xhci->xhc_state