Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

2016-12-21 Thread OGAWA Hirofumi
Mathias Nyman  writes:

>> We set CMD_RING_STATE_ABORTED state under locking. I'm not checking what
>> is for taking lock for register though, I guess it should be enough just
>> lock around of read=>write of ->cmd_ring if need lock.
>
> After your patch it should be enough to have the lock only while
> reading and writing the cmd_ring register.
>
> If we want a locking fix that applies more easily to older stable
> releases before your change then the lock needs to cover set
> CMD_RING_STATE_ABORT, read cmd_reg, write cmd_reg and busiloop
> checking CRR bit.  Otherwise the stop cmd ring interrupt handler may
> restart the ring just before we start checing CRR. The stop cmd ring
> interrupt will set the CMD_RING_STATE_ABORTED to
> CMD_RING_STATE_RUNNING so ring will really restart in the interrupt
> handler.

Just for record (no chance to make patch I myself for now, sorry), while
checking locking slightly, I noticed unrelated missing locking.

xhci_cleanup_command_queue()

We are calling it without locking, but we need to lock for accessing list.

Thanks.
-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

2016-12-21 Thread OGAWA Hirofumi
Mathias Nyman  writes:

>> Below is the latest code. I put my comments in line.
>>
>>   322 static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>>   323 {
>>   324 u64 temp_64;
>>   325 int ret;
>>   326
>>   327 xhci_dbg(xhci, "Abort command ring\n");
>>   328
>>   329 reinit_completion(&xhci->cmd_ring_stop_completion);
>>   330
>>   331 temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>>   332 xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
>>   333 &xhci->op_regs->cmd_ring);
>>
>> We should hold xhci->lock when we are modifying xhci registers
>> at runtime.
>>
>
> Makes sense, but we need to unlock it before sleeping or waiting for
> completion.  I need to look into that in more detail.
>
> But this was an issue already before these changes.

We set CMD_RING_STATE_ABORTED state under locking. I'm not checking what
is for taking lock for register though, I guess it should be enough just
lock around of read=>write of ->cmd_ring if need lock.

[Rather ->cmd_ring user should check CMD_RING_STATE_ABORTED state.]

> But then again I really like OGAWA Hiroumi's solution that separates the
> command ring stopping from aborting commands and restarting the ring.
>
> The current way of always restarting the command ring as a response to
> a stop command ring command really limits its usage.
>
> So, with this in mind most reasonable would be to
> 1. fix the lock to cover abort+CRR check, and send it to usb-linus +stable
> 2. rebase OGAWA Hirofumi's changes on top of that, and send to usb-linus only
> 3. remove unnecessary second abort try as a separate patch, send to usb-next
> 4. remove polling for the Command ring running (CRR), waiting for completion
>     is enough, if completion times out then we can check CRR. for usb-next

I think we should check both of CRR and even of stop completion. Because
CRR and stop completion was not same time (both can be first).

Thanks.
-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] xhci: Remove busy loop from xhci_abort_cmd_ring()

2016-12-14 Thread OGAWA Hirofumi
Mathias Nyman  writes:

> On 14.12.2016 01:40, OGAWA Hirofumi wrote:
>> ping about [PATCH 1/3, 2/3, 3/3]?
>>
>
> 1/3 and 2/3 will be sent to 4.10 usb-linus after rc1,
> 3/3 maybe to usb-next after that

Thanks.
-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] xhci: Remove busy loop from xhci_abort_cmd_ring()

2016-12-13 Thread OGAWA Hirofumi
ping about [PATCH 1/3, 2/3, 3/3]?

OGAWA Hirofumi  writes:

> Mathias Nyman  writes:
>
>>> - Add xhci_handshake_sleep(), and use it.
>>
>> This seems a but overkill, I'd rather don't have xhci_handshake(),
>> xhci_handshake_sleep() and __xhci_handshake() to maintain.
>
> I agree about it. However, on other hand, I thought about the
> possibility/effort to decreasing use-cases of xhci_handshake()
> busyloop. (there are several places to use more than e.g. 500ms
> timeout.)  Because long busyloop (especially with interrupt-off) has
> really bad effect to whole system of non-usb.
>
>> As we now can sleep in xhci_abort_cmd_ring() I would rather just first
>> wait for the completion and then maybe handshake check the register.
>> At that point it shouldn't need to busyloop anymore, one read should
>> do it.
>
> I worried about in the case of real internal confusing, and consider
> about chip that doesn't have no stop/abort event.
>
> To handle both case, timeout choice would not be straight-forward.
>
>> But this is all just optimizing an error path, I'm actually fine with
>> taking just patch 1/3 and 2/3 v3, and skipping 3/3. the udelay(1000)
>> to msleep(1) I can add myself
>
> Right. The main point of patchset is 1/3 and 2/3.
>
>>> As related change, current xhci_handshake() is strange behavior,
>>> E.g. xhci_handshake(ptr, mask, done, 1) does
>>>
>>>  result = readl(ptr);
>>>  /* check result */
>>>  udelay(1); <= meaningless delay
>>>  return -ETIMEDOUT;
>>
>> Only in the timeout case, so if we after 3 or 5 million register reads
>> + udelays(1) still don't get the value we want then there is an
>> unnecessary udelay(1).
>>
>> Not worth putting much effort on it now.
>
> True. But actual effort for it was very small.
>
> @@ -65,16 +65,18 @@ int xhci_handshake(void __iomem *ptr, u3
>  {
>   u32 result;
>  
> - do {
> + while (1) {
>   result = readl(ptr);
>   if (result == ~(u32)0)  /* card removed */
>   return -ENODEV;
>       result &= mask;
>   if (result == done)
>   return 0;
> + if (usec <= 0)
> + break;
>   udelay(1);
>   usec--;
> - } while (usec > 0);
> + }
>   return -ETIMEDOUT;
>  }
>
> Thanks.

-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation

2016-12-05 Thread OGAWA Hirofumi
Vincent Pelletier  writes:

> Problem is, I could not reproduce the "Command completion event does not
> match command" error either (which used to happen even on successful
> boots), even without your patch. Checking my syslog, I see it did
> happen with my previous kernel (vanilla 4.6.0).
> Could something else since 4.6.0 have improved xhci initialisation on
>   00:14.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx, 
> Celeron N2000 Series USB xHCI (rev 0e)
> From a naive "git log drivers/usb/host/xhci*" read,
>   commit 3425aa03f484d45dc21e0e791c2f6c74ea656421
>   Author: Mathias Nyman 
>   Date:   Wed Jun 1 18:09:08 2016 +0300
>   xhci: Fix handling timeouted commands on hosts in weird states.
> caught my eye, along with some quirk reworks but I can't tell if they
> could be related to the improvement.
> FWIW the following log line
> did not change since (at least) 4.4.0 on this machine:
>   xhci_hcd :00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 
> 0x9810
>
> So I cannot tell for sure whether your patch set improves the situation.
>
> What I can say though is that after about 5 boots *with* your patch is
> that I had no boot failure nor xhci warnings, so at least no obvious
> regression either.

Thanks for testing. Looks like that root abort cause itself was fixed
somewhere.  I think that workaround is not needed anymore though,
Mathias will decide whether removing or not.

Thanks.
-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation

2016-11-28 Thread OGAWA Hirofumi
Vincent Pelletier  writes:

>> I couldn't reproduce that "no CRC"
>> issue with debian kernel's .config
>
> I did not diff recently my .config with debian's. I think I based it on
> Debian's 4.2 and then let it evolve "naturally" following vanilla.
> I checked all diffs against Debian's 4.8 and reduced it a lot (mostly
> new drivers which I did not enable). Doing so, I discovered that I
> somehow did not have "CONFIG_MODVERSIONS" set at all (not even the
> "is not set" comment).
>
> 4.9-rc7 is done building, there is no warning about CRC anymore -
> I'm not sure whether this comes from rc7 or MODVERSIONS. I can check if
> you think it is valuable.

At 4.9-rc7, MODVERSIONS was marked as BROKEN. It might fix that
issue. Well, anyway, it doesn't matter anymore.

> Resulting kernel booted successfully once, but I must do several more
> tries before being able to tell if it reliably does so.
> I also want to check rc7 without your patches, to compare boot failure
> rate.
>
> For reference, the I pushed the source I built in
>   https://github.com/vpelletier/linux/tree/ts651_4.9-rc7
> Please do check I did not do mistakes when resolving conflicts (the 2
> last commits).

Your ts651_4.9-rc7 (latest commit) is broken somehow. The following
incremental patch for ts651_4.9-rc7 is to fix it. With this, xhci will
become same with mine.

Thanks.
-- 
OGAWA Hirofumi 



---

 drivers/usb/host/xhci-mem.c  |2 +-
 drivers/usb/host/xhci-ring.c |   16 +---
 2 files changed, 6 insertions(+), 12 deletions(-)

diff -puN drivers/usb/host/xhci-ring.c~ts651_4.9-rc7-fix 
drivers/usb/host/xhci-ring.c
--- linux/drivers/usb/host/xhci-ring.c~ts651_4.9-rc7-fix2016-11-29 
01:58:29.103923074 +0900
+++ linux-hirofumi/drivers/usb/host/xhci-ring.c 2016-11-29 02:02:48.958281665 
+0900
@@ -327,16 +327,9 @@ static int xhci_abort_cmd_ring(struct xh
 
xhci_dbg(xhci, "Abort command ring\n");
 
-   temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
-   xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
+   reinit_completion(&xhci->stop_completion);
 
-   /*
-* Writing the CMD_RING_ABORT bit should cause a cmd completion event,
-* however on some host hw the CMD_RING_RUNNING bit is correctly cleared
-* but the completion event in never sent. Use the cmd timeout timer to
-* handle those cases. Use twice the time to cover the bit polling retry
-*/
-   xhci_mod_cmd_timer(xhci, 2 * XHCI_CMD_DEFAULT_TIMEOUT);
+   temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
&xhci->op_regs->cmd_ring);
 
@@ -357,6 +350,7 @@ static int xhci_abort_cmd_ring(struct xh
xhci_halt(xhci);
return -ESHUTDOWN;
}
+
/*
 * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
 * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
@@ -1318,7 +1312,7 @@ void xhci_handle_command_timeout(struct
 
/* command timeout on stopped ring, ring can't be aborted */
xhci_dbg(xhci, "Command timeout on stopped ring\n");
-   complete_all(&xhci->stop_completion);
+   xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
spin_unlock_irqrestore(&xhci->lock, flags);
return;
 }
@@ -1359,7 +1353,7 @@ static void handle_cmd_completion(struct
 
/* If CMD ring stopped we own the trbs between enqueue and dequeue */
if (cmd_comp_code == COMP_CMD_STOP) {
-   xhci_handle_stopped_cmd_ring(xhci, cmd);
+   complete_all(&xhci->stop_completion);
return;
}
 
diff -puN drivers/usb/host/xhci-mem.c~ts651_4.9-rc7-fix 
drivers/usb/host/xhci-mem.c
--- linux/drivers/usb/host/xhci-mem.c~ts651_4.9-rc7-fix 2016-11-29 
02:01:48.178963612 +0900
+++ linux-hirofumi/drivers/usb/host/xhci-mem.c  2016-11-29 02:02:02.341037706 
+0900
@@ -1796,7 +1796,7 @@ void xhci_mem_cleanup(struct xhci_hcd *x
int size;
int i, j, num_ports;
 
-   cancel_delayed_work_sync(&xhci->cmd_timer);;
+   cancel_delayed_work_sync(&xhci->cmd_timer);
 
/* Free the Event Ring Segment Table and the actual Event Ring */
size = sizeof(struct xhci_erst_entry)*(xhci->erst.num_entries);
_
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation

2016-11-28 Thread OGAWA Hirofumi
OGAWA Hirofumi  writes:

>> I succeed to build 4.9-rc9 vanilla (starting from the same .config as
>> with 4.8.9) but it then fails to load produced modules. During the
>> build I have the following warnings:
>>   WARNING: "_copy_from_user" [drivers/hid/hid.ko] has no CRC!
>
> Looks like, CONFIG_MODVERSIONS is broken. CONFIG_MODVERSIONS=n may fix it.

Hm, this issue might be same with 4efca4ed05cbdfd13ec3e8cb623fb77d6e4ab187.
But it was already including into 4.9-rc7.

4.9-rc9 was actually -rc1? Well, can you try 4.9-rc7 if older one? Then
if not working, can you send .config? I couldn't reproduce that "no CRC"
issue with debian kernel's .config (BTW, compiler is debian/testing
gcc-6.2.0-13).

Thanks.
-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation

2016-11-28 Thread OGAWA Hirofumi
Vincent Pelletier  writes:

> I fail to build 4.8.9 vanilla:
>   Cannot use CONFIG_CC_STACKPROTECTOR_REGULAR: -fstack-protector not 
> supported by compiler

Probably, debian's gcc default-pie issue. Below patch may fix it.

> I succeed to build 4.9-rc9 vanilla (starting from the same .config as
> with 4.8.9) but it then fails to load produced modules. During the
> build I have the following warnings:
>   WARNING: "_copy_from_user" [drivers/hid/hid.ko] has no CRC!

Looks like, CONFIG_MODVERSIONS is broken. CONFIG_MODVERSIONS=n may fix it.

Thanks.
-- 
OGAWA Hirofumi 


8ae94224c9d72fc4d9aaac93b2d7833cf46d7141
82031ea29e454b574bc6f49a33683a693ca5d907
90944e40ba1838de4b2a9290cf273f9d76bd3bdd
c6a385539175ebc603da53aafb7753d39089f32e

Signed-off-by: OGAWA Hirofumi 
---

 Makefile  |5 +++--
 arch/x86/purgatory/Makefile   |1 +
 scripts/gcc-x86_64-has-stack-protector.sh |2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff -puN Makefile~debian-pie-default-fix Makefile
--- linux/Makefile~debian-pie-default-fix   2016-11-26 22:57:00.681881071 
+0900
+++ linux-hirofumi/Makefile 2016-11-26 22:57:00.683881081 +0900
@@ -399,11 +399,12 @@ KBUILD_CFLAGS   := -Wall -Wundef -Wstric
   -fno-strict-aliasing -fno-common \
   -Werror-implicit-function-declaration \
   -Wno-format-security \
-  -std=gnu89
+  -std=gnu89 $(call cc-option,-fno-PIE)
+
 
 KBUILD_AFLAGS_KERNEL :=
 KBUILD_CFLAGS_KERNEL :=
-KBUILD_AFLAGS   := -D__ASSEMBLY__
+KBUILD_AFLAGS   := -D__ASSEMBLY__ $(call cc-option,-fno-PIE)
 KBUILD_AFLAGS_MODULE  := -DMODULE
 KBUILD_CFLAGS_MODULE  := -DMODULE
 KBUILD_LDFLAGS_MODULE := -T $(srctree)/scripts/module-common.lds
diff -puN scripts/gcc-x86_64-has-stack-protector.sh~debian-pie-default-fix 
scripts/gcc-x86_64-has-stack-protector.sh
--- linux/scripts/gcc-x86_64-has-stack-protector.sh~debian-pie-default-fix  
2016-11-26 22:57:00.681881071 +0900
+++ linux-hirofumi/scripts/gcc-x86_64-has-stack-protector.sh2016-11-26 
22:57:00.683881081 +0900
@@ -1,6 +1,6 @@
 #!/bin/sh
 
-echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -O0 
-mcmodel=kernel -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -O0 
-mcmodel=kernel -fno-PIE -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
 if [ "$?" -eq "0" ] ; then
echo y
 else
diff -puN arch/x86/purgatory/Makefile~debian-pie-default-fix 
arch/x86/purgatory/Makefile
--- linux/arch/x86/purgatory/Makefile~debian-pie-default-fix2016-11-26 
22:57:00.682881076 +0900
+++ linux-hirofumi/arch/x86/purgatory/Makefile  2016-11-26 22:57:00.683881081 
+0900
@@ -16,6 +16,7 @@ KCOV_INSTRUMENT := n
 
 KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes 
-fno-zero-initialized-in-bss -fno-builtin -ffreestanding -c -MD -Os 
-mcmodel=large
 KBUILD_CFLAGS += -m$(BITS)
+KBUILD_CFLAGS += $(call cc-option,-fno-PIE)
 
 $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
_
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] xhci: Remove busy loop from xhci_abort_cmd_ring()

2016-11-24 Thread OGAWA Hirofumi
Mathias Nyman  writes:

>> - Add xhci_handshake_sleep(), and use it.
>
> This seems a but overkill, I'd rather don't have xhci_handshake(),
> xhci_handshake_sleep() and __xhci_handshake() to maintain.

I agree about it. However, on other hand, I thought about the
possibility/effort to decreasing use-cases of xhci_handshake()
busyloop. (there are several places to use more than e.g. 500ms
timeout.)  Because long busyloop (especially with interrupt-off) has
really bad effect to whole system of non-usb.

> As we now can sleep in xhci_abort_cmd_ring() I would rather just first
> wait for the completion and then maybe handshake check the register.
> At that point it shouldn't need to busyloop anymore, one read should
> do it.

I worried about in the case of real internal confusing, and consider
about chip that doesn't have no stop/abort event.

To handle both case, timeout choice would not be straight-forward.

> But this is all just optimizing an error path, I'm actually fine with
> taking just patch 1/3 and 2/3 v3, and skipping 3/3. the udelay(1000)
> to msleep(1) I can add myself

Right. The main point of patchset is 1/3 and 2/3.

>> As related change, current xhci_handshake() is strange behavior,
>> E.g. xhci_handshake(ptr, mask, done, 1) does
>>
>>  result = readl(ptr);
>>  /* check result */
>>  udelay(1);  <= meaningless delay
>>  return -ETIMEDOUT;
>
> Only in the timeout case, so if we after 3 or 5 million register reads
> + udelays(1) still don't get the value we want then there is an
> unnecessary udelay(1).
>
> Not worth putting much effort on it now.

True. But actual effort for it was very small.

@@ -65,16 +65,18 @@ int xhci_handshake(void __iomem *ptr, u3
 {
u32 result;
 
-   do {
+   while (1) {
result = readl(ptr);
if (result == ~(u32)0)  /* card removed */
return -ENODEV;
result &= mask;
if (result == done)
return 0;
+   if (usec <= 0)
+   break;
    udelay(1);
usec--;
-   } while (usec > 0);
+   }
return -ETIMEDOUT;
 }

Thanks.
-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation

2016-11-24 Thread OGAWA Hirofumi
Vincent Pelletier  writes:

> On Wed, 23 Nov 2016 22:48:35 +0900, OGAWA Hirofumi
>  wrote:
>> ping?
>
> FWIW, I intend to run this patch on the hardware which caused the
> issue (thanks Mathias for the CC !).
>
> So far, in the very short attempt I made, I failed to even build the
> kernel (some stack protection feature error when being probed in
> gcc...). I should have time to try again this week-end, but please do
> not wait for me.

If you need my help for something, let me know either publicly or
privately what way you want.

Well, I can't see why he is waiting something. My patchset is fix to
race, and your stuff is to remove unnecessary code, i.e. patchset has no
dependency actually.

Thanks.
-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3 v3] xhci: Fix race related to abort operation

2016-11-23 Thread OGAWA Hirofumi

ping?

OGAWA Hirofumi  writes:

> Current abort operation has race.
>
> xhci_handle_command_timeout()
>   xhci_abort_cmd_ring()
> xhci_write_64(CMD_RING_ABORT)
> xhci_handshake(5s)
> do {
>   check CMD_RING_RUNNING
> udelay(1)
>  ...
>  COMP_CMD_ABORT event
>  COMP_CMD_STOP event
>  xhci_handle_stopped_cmd_ring()
>restart cmd_ring
>  CMD_RING_RUNNING become 1 again
> } while ()
>   return -ETIMEDOUT
> xhci_write_64(CMD_RING_ABORT)
> /* can abort random command */
>
> To do abort operation correctly, we have to wait both of COMP_CMD_STOP
> event and negation of CMD_RING_RUNNING.
>
> But like above, while timeout handler is waiting negation of
> CMD_RING_RUNNING, event handler can restart cmd_ring. So timeout
> handler never be notice negation of CMD_RING_RUNNING, and retry of
> CMD_RING_ABORT can abort random command (BTW, I guess retry of
> CMD_RING_ABORT was workaround of this race).
>
> To fix this race, this moves xhci_handle_stopped_cmd_ring() to
> xhci_abort_cmd_ring().  And timeout handler waits COMP_CMD_STOP event.
>
> At this point, timeout handler is owner of cmd_ring, and safely
> restart cmd_ring by using xhci_handle_stopped_cmd_ring().
>
> [FWIW, as bonus, this way would be easily extend to add CMD_RING_PAUSE
> operation]
>
> Signed-off-by: OGAWA Hirofumi 
> ---
>
>  drivers/usb/host/xhci-mem.c  |1 
>  drivers/usb/host/xhci-ring.c |  163 
> ++
>  drivers/usb/host/xhci.h  |1 
>  3 files changed, 88 insertions(+), 77 deletions(-)
>
> diff -puN drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 
> drivers/usb/host/xhci-mem.c
> --- xhci/drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 2016-11-16 
> 18:38:52.551146764 +0900
> +++ xhci-hirofumi/drivers/usb/host/xhci-mem.c 2016-11-17 01:04:11.402014445 
> +0900
> @@ -2344,6 +2344,7 @@ int xhci_mem_init(struct xhci_hcd *xhci,
>  
>   /* init command timeout work */
>   INIT_DELAYED_WORK(&xhci->cmd_timer, xhci_handle_command_timeout);
> + init_completion(&xhci->cmd_ring_stop_completion);
>  
>   page_size = readl(&xhci->op_regs->page_size);
>   xhci_dbg_trace(xhci, trace_xhci_dbg_init,
> diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 
> drivers/usb/host/xhci-ring.c
> --- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race22016-11-16 
> 18:38:52.552146764 +0900
> +++ xhci-hirofumi/drivers/usb/host/xhci-ring.c2016-11-17 
> 01:04:06.393018485 +0900
> @@ -284,6 +284,60 @@ static bool xhci_mod_cmd_timer(struct xh
>   return mod_delayed_work(system_wq, &xhci->cmd_timer, delay);
>  }
>  
> +static struct xhci_command *xhci_next_queued_cmd(struct xhci_hcd *xhci)
> +{
> + return list_first_entry_or_null(&xhci->cmd_list, struct xhci_command,
> + cmd_list);
> +}
> +
> +/*
> + * Turn all commands on command ring with status set to "aborted" to no-op 
> trbs.
> + * If there are other commands waiting then restart the ring and kick the 
> timer.
> + * This must be called with command ring stopped and xhci->lock held.
> + */
> +static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
> +  struct xhci_command *cur_cmd)
> +{
> + struct xhci_command *i_cmd;
> + u32 cycle_state;
> +
> + /* Turn all aborted commands in list to no-ops, then restart */
> + list_for_each_entry(i_cmd, &xhci->cmd_list, cmd_list) {
> +
> + if (i_cmd->status != COMP_CMD_ABORT)
> + continue;
> +
> + i_cmd->status = COMP_CMD_STOP;
> +
> + xhci_dbg(xhci, "Turn aborted command %p to no-op\n",
> +  i_cmd->command_trb);
> + /* get cycle state from the original cmd trb */
> + cycle_state = le32_to_cpu(
> + i_cmd->command_trb->generic.field[3]) & TRB_CYCLE;
> + /* modify the command trb to no-op command */
> + i_cmd->command_trb->generic.field[0] = 0;
> + i_cmd->command_trb->generic.field[1] = 0;
> + i_cmd->command_trb->generic.field[2] = 0;
> + i_cmd->command_trb->generic.field[3] = cpu_to_le32(
> + TRB_TYPE(TRB_CMD_NOOP) | cycle_state);
> +
&g

[PATCH 2/3 v3] xhci: Fix race related to abort operation

2016-11-16 Thread OGAWA Hirofumi
Current abort operation has race.

xhci_handle_command_timeout()
  xhci_abort_cmd_ring()
xhci_write_64(CMD_RING_ABORT)
xhci_handshake(5s)
  do {
check CMD_RING_RUNNING
udelay(1)
   ...
   COMP_CMD_ABORT event
   COMP_CMD_STOP event
   xhci_handle_stopped_cmd_ring()
 restart cmd_ring
 CMD_RING_RUNNING become 1 again
  } while ()
  return -ETIMEDOUT
xhci_write_64(CMD_RING_ABORT)
/* can abort random command */

To do abort operation correctly, we have to wait both of COMP_CMD_STOP
event and negation of CMD_RING_RUNNING.

But like above, while timeout handler is waiting negation of
CMD_RING_RUNNING, event handler can restart cmd_ring. So timeout
handler never be notice negation of CMD_RING_RUNNING, and retry of
CMD_RING_ABORT can abort random command (BTW, I guess retry of
CMD_RING_ABORT was workaround of this race).

To fix this race, this moves xhci_handle_stopped_cmd_ring() to
xhci_abort_cmd_ring().  And timeout handler waits COMP_CMD_STOP event.

At this point, timeout handler is owner of cmd_ring, and safely
restart cmd_ring by using xhci_handle_stopped_cmd_ring().

[FWIW, as bonus, this way would be easily extend to add CMD_RING_PAUSE
operation]

Signed-off-by: OGAWA Hirofumi 
---

 drivers/usb/host/xhci-mem.c  |1 
 drivers/usb/host/xhci-ring.c |  163 ++
 drivers/usb/host/xhci.h  |1 
 3 files changed, 88 insertions(+), 77 deletions(-)

diff -puN drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 
drivers/usb/host/xhci-mem.c
--- xhci/drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2   2016-11-16 
18:38:52.551146764 +0900
+++ xhci-hirofumi/drivers/usb/host/xhci-mem.c   2016-11-17 01:04:11.402014445 
+0900
@@ -2344,6 +2344,7 @@ int xhci_mem_init(struct xhci_hcd *xhci,
 
/* init command timeout work */
INIT_DELAYED_WORK(&xhci->cmd_timer, xhci_handle_command_timeout);
+   init_completion(&xhci->cmd_ring_stop_completion);
 
page_size = readl(&xhci->op_regs->page_size);
xhci_dbg_trace(xhci, trace_xhci_dbg_init,
diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 
drivers/usb/host/xhci-ring.c
--- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2  2016-11-16 
18:38:52.552146764 +0900
+++ xhci-hirofumi/drivers/usb/host/xhci-ring.c  2016-11-17 01:04:06.393018485 
+0900
@@ -284,6 +284,60 @@ static bool xhci_mod_cmd_timer(struct xh
return mod_delayed_work(system_wq, &xhci->cmd_timer, delay);
 }
 
+static struct xhci_command *xhci_next_queued_cmd(struct xhci_hcd *xhci)
+{
+   return list_first_entry_or_null(&xhci->cmd_list, struct xhci_command,
+   cmd_list);
+}
+
+/*
+ * Turn all commands on command ring with status set to "aborted" to no-op 
trbs.
+ * If there are other commands waiting then restart the ring and kick the 
timer.
+ * This must be called with command ring stopped and xhci->lock held.
+ */
+static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
+struct xhci_command *cur_cmd)
+{
+   struct xhci_command *i_cmd;
+   u32 cycle_state;
+
+   /* Turn all aborted commands in list to no-ops, then restart */
+   list_for_each_entry(i_cmd, &xhci->cmd_list, cmd_list) {
+
+   if (i_cmd->status != COMP_CMD_ABORT)
+   continue;
+
+   i_cmd->status = COMP_CMD_STOP;
+
+   xhci_dbg(xhci, "Turn aborted command %p to no-op\n",
+i_cmd->command_trb);
+   /* get cycle state from the original cmd trb */
+   cycle_state = le32_to_cpu(
+   i_cmd->command_trb->generic.field[3]) & TRB_CYCLE;
+   /* modify the command trb to no-op command */
+   i_cmd->command_trb->generic.field[0] = 0;
+   i_cmd->command_trb->generic.field[1] = 0;
+   i_cmd->command_trb->generic.field[2] = 0;
+   i_cmd->command_trb->generic.field[3] = cpu_to_le32(
+   TRB_TYPE(TRB_CMD_NOOP) | cycle_state);
+
+   /*
+* caller waiting for completion is called when command
+*  completion event is received for these no-op commands
+*/
+   }
+
+   xhci->cmd_ring_state = CMD_RING_STATE_RUNNING;
+
+   /* ring command ring doorbell to restart the command ring */
+   if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) &&
+   !(xhci->xhc_state & XHCI_STATE_DYING)) {
+   xhci->current_c

Re: [PATCH 2/3 v2] xhci: Fix race related to abort operation

2016-11-16 Thread OGAWA Hirofumi
Mathias Nyman  writes:

> The retry was a workaround for a case triggered on v4.1.4 by
> Vincent Pelletier (added to CC)
>
>http://marc.info/?l=linux-usb&m=144031185222108&w=2
>
> The race could very well explain that issue.

>From logs of that thread, it would has possibility about this race.

>> @@ -320,15 +367,29 @@ static int xhci_abort_cmd_ring(struct xh
>>  udelay(1000);
>>  ret = xhci_handshake(&xhci->op_regs->cmd_ring,
>>   CMD_RING_RUNNING, 0, 3 * 1000 * 1000);
>> -if (ret == 0)
>> -return 0;
>> +if (ret < 0) {
>> +xhci_err(xhci, "Stopped the command ring failed, "
>> + "maybe the host is dead\n");
>> +xhci->xhc_state |= XHCI_STATE_DYING;
>> +xhci_halt(xhci);
>> +return -ESHUTDOWN;
>> +}
>> +}
>
> So if we can verify that the race fix solves the old issue for Vincent
> Pelletier then we could get rid of the abort retry above as well.

Right. However, I'm not sure (and maybe you neither).  So I didn't
remove it in this patchset.

(After this patchset, I guess the retry will hit only when actual chip
internal confusion. I.e. retry will fail, then will chip halted.)

Well, anyway, if you decide to try to remove the retry, I also think it
would worth to try. However, it would be [PATCH 4/4] or as another
patchset.

>> +/*
>> + * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
>> + * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
>> + * but the completion event in never sent. Wait 2 secs (arbitrary
>> + * number) to handle those cases after negation of CMD_RING_RUNNING.
>> + */
>
> I'm speculating a bit here, but as we now can sleep, and if we could
> get rid of the abort retry couldn't we swap the order of the
> xhci_handshake(CMD_RING_RUNNING) busywait and
> wait_for_completion_timeout(). We could also use a standard command
> timeout time while waiting for the completion
>
> something like:
>
> hci_write_64(CMD_RING_ABORT)
> timed_out = wait_for_completion_timeout(XHCI_CMD_DEFAULT_TIMEOUT)
> xhci_handshake(CMD_RING_RUNNING)
> if (handshake fail) {
> xhci_halt(), etc..
>return -ESHUTDOWN
> } 
> if (timed out) {
>xhci_cleanup_command_queue(xhci);
>return
> }
>
> It would reduce the time we spend in the xhci_handshake() busyloop

BTW, busyloop removal was done by "[PATCH 3/3] ...". By [PATCH 3/3], we
can simply use straight forward coding without busyloop.

>> diff -puN drivers/usb/host/xhci.h~xhci-fix-abort-race2 
>> drivers/usb/host/xhci.h
>> --- xhci/drivers/usb/host/xhci.h~xhci-fix-abort-race22016-11-16 
>> 18:38:52.552146764 +0900
>> +++ xhci-hirofumi/drivers/usb/host/xhci.h2016-11-16 18:38:52.554146762 
>> +0900
>> @@ -1574,6 +1574,7 @@ struct xhci_hcd {
>>  struct list_head    cmd_list;
>>  unsigned intcmd_ring_reserved_trbs;
>>  struct delayed_work cmd_timer;
>> +struct completion   stop_completion;
>
> Tiny thing, but maybe change stop_completion to
> cmd_ring_stop_completion.  to avoid mixing it with stopping host
> controller, stop endpoint, stop device etc

OK.
-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3 v2] xhci: Fix race related to abort operation

2016-11-16 Thread OGAWA Hirofumi
Current abort operation has race.

xhci_handle_command_timeout()
  xhci_abort_cmd_ring()
xhci_write_64(CMD_RING_ABORT)
xhci_handshake(5s)
  do {
check CMD_RING_RUNNING
udelay(1)
   ...
   COMP_CMD_ABORT event
   COMP_CMD_STOP event
   xhci_handle_stopped_cmd_ring()
 restart cmd_ring
 CMD_RING_RUNNING become 1 again
  } while ()
  return -ETIMEDOUT
xhci_write_64(CMD_RING_ABORT)
/* can abort random command */

To do abort operation correctly, we have to wait both of COMP_CMD_STOP
event and negation of CMD_RING_RUNNING.

But like above, while timeout handler is waiting negation of
CMD_RING_RUNNING, event handler can restart cmd_ring. So timeout
handler never be notice negation of CMD_RING_RUNNING, and retry of
CMD_RING_ABORT can abort random command (BTW, I guess retry of
CMD_RING_ABORT was workaround of this race).

To fix this race, this moves xhci_handle_stopped_cmd_ring() to
xhci_abort_cmd_ring().  And timeout handler waits COMP_CMD_STOP event.

At this point, timeout handler is owner of cmd_ring, and safely
restart cmd_ring by using xhci_handle_stopped_cmd_ring().

[FWIW, as bonus, this way would be easily extend to add CMD_RING_PAUSE
operation]

Signed-off-by: OGAWA Hirofumi 
---

 drivers/usb/host/xhci-mem.c  |1 
 drivers/usb/host/xhci-ring.c |  162 ++
 drivers/usb/host/xhci.h  |1 
 3 files changed, 87 insertions(+), 77 deletions(-)

diff -puN drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 
drivers/usb/host/xhci-mem.c
--- xhci/drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2   2016-11-16 
18:38:52.551146764 +0900
+++ xhci-hirofumi/drivers/usb/host/xhci-mem.c   2016-11-16 18:38:52.553146763 
+0900
@@ -2344,6 +2344,7 @@ int xhci_mem_init(struct xhci_hcd *xhci,
 
/* init command timeout work */
INIT_DELAYED_WORK(&xhci->cmd_timer, xhci_handle_command_timeout);
+   init_completion(&xhci->stop_completion);
 
page_size = readl(&xhci->op_regs->page_size);
xhci_dbg_trace(xhci, trace_xhci_dbg_init,
diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 
drivers/usb/host/xhci-ring.c
--- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2  2016-11-16 
18:38:52.552146764 +0900
+++ xhci-hirofumi/drivers/usb/host/xhci-ring.c  2016-11-16 18:39:07.027136278 
+0900
@@ -284,6 +284,60 @@ static bool xhci_mod_cmd_timer(struct xh
return mod_delayed_work(system_wq, &xhci->cmd_timer, delay);
 }
 
+static struct xhci_command *xhci_next_queued_cmd(struct xhci_hcd *xhci)
+{
+   return list_first_entry_or_null(&xhci->cmd_list, struct xhci_command,
+   cmd_list);
+}
+
+/*
+ * Turn all commands on command ring with status set to "aborted" to no-op 
trbs.
+ * If there are other commands waiting then restart the ring and kick the 
timer.
+ * This must be called with command ring stopped and xhci->lock held.
+ */
+static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
+struct xhci_command *cur_cmd)
+{
+   struct xhci_command *i_cmd;
+   u32 cycle_state;
+
+   /* Turn all aborted commands in list to no-ops, then restart */
+   list_for_each_entry(i_cmd, &xhci->cmd_list, cmd_list) {
+
+   if (i_cmd->status != COMP_CMD_ABORT)
+   continue;
+
+   i_cmd->status = COMP_CMD_STOP;
+
+   xhci_dbg(xhci, "Turn aborted command %p to no-op\n",
+i_cmd->command_trb);
+   /* get cycle state from the original cmd trb */
+   cycle_state = le32_to_cpu(
+   i_cmd->command_trb->generic.field[3]) & TRB_CYCLE;
+   /* modify the command trb to no-op command */
+   i_cmd->command_trb->generic.field[0] = 0;
+   i_cmd->command_trb->generic.field[1] = 0;
+   i_cmd->command_trb->generic.field[2] = 0;
+   i_cmd->command_trb->generic.field[3] = cpu_to_le32(
+   TRB_TYPE(TRB_CMD_NOOP) | cycle_state);
+
+   /*
+* caller waiting for completion is called when command
+*  completion event is received for these no-op commands
+*/
+   }
+
+   xhci->cmd_ring_state = CMD_RING_STATE_RUNNING;
+
+   /* ring command ring doorbell to restart the command ring */
+   if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) &&
+   !(xhci->xhc_state & XHCI_STATE_DYING)) {
+   xhci->current_cmd =

Re: [PATCH 2/3] xhci: Fix race related to abort operation

2016-11-16 Thread OGAWA Hirofumi
Felipe Balbi  writes:

> Hi,

Hi,

> OGAWA Hirofumi  writes:
>> diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 
>> drivers/usb/host/xhci-ring.c
>> --- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2   2016-11-16 
>> 13:36:07.219329211 +0900
>> +++ xhci-hirofumi/drivers/usb/host/xhci-ring.c   2016-11-16 
>> 13:36:07.221329211 +0900
>> @@ -284,6 +284,61 @@ static bool xhci_mod_cmd_timer(struct xh
>>  return mod_delayed_work(system_wq, &xhci->cmd_timer, delay);
>>  }
>>  
>> +static struct xhci_command *xhci_next_queued_cmd(struct xhci_hcd *xhci)
>> +{
>> +if (list_empty(&xhci->cmd_list))
>> +return NULL;
>> +return list_first_entry(&xhci->cmd_list, struct xhci_command, cmd_list);
>
> could use list_first_entry_or_null() here

OK. Thanks for pointing out.
-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] xhci: Remove busy loop from xhci_abort_cmd_ring()

2016-11-15 Thread OGAWA Hirofumi
Now, xhci_abort_cmd_ring() is sleepable. So no reason to use busy loop
anymore.

- Convert udelay(1000) => msleep(1).
- Add xhci_handshake_sleep(), and use it.

As related change, current xhci_handshake() is strange behavior,
E.g. xhci_handshake(ptr, mask, done, 1) does

result = readl(ptr);
/* check result */
udelay(1);  <= meaningless delay
return -ETIMEDOUT;

Instead of above, this changes to do

result = readl(ptr);
/* check result */
udelay(1);
result = readl(ptr);
/* check result */
return -ETIMEDOUT;

Signed-off-by: OGAWA Hirofumi 
---

 drivers/usb/host/xhci-ring.c |   10 
 drivers/usb/host/xhci.c  |   52 +-
 drivers/usb/host/xhci.h  |1 
 3 files changed, 48 insertions(+), 15 deletions(-)

diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race3 
drivers/usb/host/xhci-ring.c
--- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race3  2016-11-16 
13:36:08.787328641 +0900
+++ xhci-hirofumi/drivers/usb/host/xhci-ring.c  2016-11-16 13:36:08.789328640 
+0900
@@ -359,15 +359,15 @@ static int xhci_abort_cmd_ring(struct xh
 * seconds), then it should assume that the there are
 * larger problems with the xHC and assert HCRST.
 */
-   ret = xhci_handshake(&xhci->op_regs->cmd_ring,
-   CMD_RING_RUNNING, 0, 5 * 1000 * 1000);
+   ret = xhci_handshake_sleep(&xhci->op_regs->cmd_ring,
+  CMD_RING_RUNNING, 0, 5 * 1000 * 1000);
if (ret < 0) {
/* we are about to kill xhci, give it one more chance */
xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
  &xhci->op_regs->cmd_ring);
-   udelay(1000);
-   ret = xhci_handshake(&xhci->op_regs->cmd_ring,
-CMD_RING_RUNNING, 0, 3 * 1000 * 1000);
+   msleep(1);
+   ret = xhci_handshake_sleep(&xhci->op_regs->cmd_ring,
+   CMD_RING_RUNNING, 0, 3 * 1000 * 1000);
if (ret < 0) {
xhci_err(xhci, "Stopped the command ring failed, "
 "maybe the host is dead\n");
diff -puN drivers/usb/host/xhci.c~xhci-fix-abort-race3 drivers/usb/host/xhci.c
--- xhci/drivers/usb/host/xhci.c~xhci-fix-abort-race3   2016-11-16 
13:36:08.787328641 +0900
+++ xhci-hirofumi/drivers/usb/host/xhci.c   2016-11-16 13:36:08.790328639 
+0900
@@ -61,21 +61,53 @@ MODULE_PARM_DESC(quirks, "Bit flags for
  * handshake done).  There are two failure modes:  "usec" have passed (major
  * hardware flakeout), or the register reads as all-ones (hardware removed).
  */
+static int __xhci_handshake(void __iomem *ptr, u32 mask, u32 done)
+{
+   u32 result;
+
+   result = readl(ptr);
+   if (result == ~(u32)0)  /* card removed */
+   return -ENODEV;
+   result &= mask;
+   if (result == done)
+   return 0;
+   return 1;
+}
+
 int xhci_handshake(void __iomem *ptr, u32 mask, u32 done, int usec)
 {
-   u32 result;
+   int ret;
 
-   do {
-   result = readl(ptr);
-   if (result == ~(u32)0)  /* card removed */
-   return -ENODEV;
-   result &= mask;
-   if (result == done)
-   return 0;
+   while (1) {
+   ret = __xhci_handshake(ptr, mask, done);
+   if (ret <= 0)
+   goto out;
+   if (usec <= 0)
+   break;
udelay(1);
usec--;
-   } while (usec > 0);
-   return -ETIMEDOUT;
+   }
+   ret = -ETIMEDOUT;
+out:
+   return ret;
+}
+
+int xhci_handshake_sleep(void __iomem *ptr, u32 mask, u32 done, int usec)
+{
+   unsigned long timeout = jiffies + usecs_to_jiffies(usec);
+   int ret;
+
+   while (1) {
+   ret = __xhci_handshake(ptr, mask, done);
+   if (ret <= 0)
+   goto out;
+   if (time_after(jiffies, timeout))
+   break;
+   usleep_range(1, 10);
+   }
+   ret = -ETIMEDOUT;
+out:
+   return ret;
 }
 
 /*
diff -puN drivers/usb/host/xhci.h~xhci-fix-abort-race3 drivers/usb/host/xhci.h
--- xhci/drivers/usb/host/xhci.h~xhci-fix-abort-race3   2016-11-16 
13:36:08.788328640 +0900
+++ xhci-hirofumi/drivers/usb/host/xhci.h   2016-11-16 13:36:08.790328639 
+0900
@@ -1844,6 +1844,7 @@ void xhci_free_command(struct xhci_hcd *
 /* xHCI host controller glue */
 typedef void (*xhci_get_quirks_t)(struct device *, struct xhci_hcd *);
 int xhci_handshake(void __iomem *ptr, u32 mask, u32 done, int usec);
+int xhci_handshake_sleep(void __iomem *ptr, u32 mask, u32 done, int usec);
 v

[PATCH 2/3] xhci: Fix race related to abort operation

2016-11-15 Thread OGAWA Hirofumi
Current abort operation has race.

xhci_handle_command_timeout()
  xhci_abort_cmd_ring()
xhci_write_64(CMD_RING_ABORT)
xhci_handshake(5s)
  do {
check CMD_RING_RUNNING
udelay(1)
   ...
   COMP_CMD_ABORT event
   COMP_CMD_STOP event
   xhci_handle_stopped_cmd_ring()
 restart cmd_ring
 CMD_RING_RUNNING become 1 again
  } while ()
  return -ETIMEDOUT
xhci_write_64(CMD_RING_ABORT)
/* can abort random command */

To do abort operation correctly, we have to wait both of COMP_CMD_STOP
event and negation of CMD_RING_RUNNING.

But like above, while timeout handler is waiting negation of
CMD_RING_RUNNING, event handler can restart cmd_ring. So timeout
handler never be notice negation of CMD_RING_RUNNING, and retry of
CMD_RING_ABORT can abort random command (BTW, I guess retry of
CMD_RING_ABORT was workaround of this race).

To fix this race, this moves xhci_handle_stopped_cmd_ring() to
xhci_abort_cmd_ring().  And timeout handler waits COMP_CMD_STOP event.

At this point, timeout handler is owner of cmd_ring, and safely
restart cmd_ring by using xhci_handle_stopped_cmd_ring().

[FWIW, as bonus, this way would be easily extend to add CMD_RING_PAUSE
operation]

Signed-off-by: OGAWA Hirofumi 
---

 drivers/usb/host/xhci-mem.c  |1 
 drivers/usb/host/xhci-ring.c |  163 ++
 drivers/usb/host/xhci.h  |1 
 3 files changed, 88 insertions(+), 77 deletions(-)

diff -puN drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2 
drivers/usb/host/xhci-mem.c
--- xhci/drivers/usb/host/xhci-mem.c~xhci-fix-abort-race2   2016-11-16 
13:36:07.219329211 +0900
+++ xhci-hirofumi/drivers/usb/host/xhci-mem.c   2016-11-16 13:36:07.221329211 
+0900
@@ -2344,6 +2344,7 @@ int xhci_mem_init(struct xhci_hcd *xhci,
 
/* init command timeout work */
INIT_DELAYED_WORK(&xhci->cmd_timer, xhci_handle_command_timeout);
+   init_completion(&xhci->stop_completion);
 
page_size = readl(&xhci->op_regs->page_size);
xhci_dbg_trace(xhci, trace_xhci_dbg_init,
diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2 
drivers/usb/host/xhci-ring.c
--- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race2  2016-11-16 
13:36:07.219329211 +0900
+++ xhci-hirofumi/drivers/usb/host/xhci-ring.c  2016-11-16 13:36:07.221329211 
+0900
@@ -284,6 +284,61 @@ static bool xhci_mod_cmd_timer(struct xh
return mod_delayed_work(system_wq, &xhci->cmd_timer, delay);
 }
 
+static struct xhci_command *xhci_next_queued_cmd(struct xhci_hcd *xhci)
+{
+   if (list_empty(&xhci->cmd_list))
+   return NULL;
+   return list_first_entry(&xhci->cmd_list, struct xhci_command, cmd_list);
+}
+
+/*
+ * Turn all commands on command ring with status set to "aborted" to no-op 
trbs.
+ * If there are other commands waiting then restart the ring and kick the 
timer.
+ * This must be called with command ring stopped and xhci->lock held.
+ */
+static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
+struct xhci_command *cur_cmd)
+{
+   struct xhci_command *i_cmd;
+   u32 cycle_state;
+
+   /* Turn all aborted commands in list to no-ops, then restart */
+   list_for_each_entry(i_cmd, &xhci->cmd_list, cmd_list) {
+
+   if (i_cmd->status != COMP_CMD_ABORT)
+   continue;
+
+   i_cmd->status = COMP_CMD_STOP;
+
+   xhci_dbg(xhci, "Turn aborted command %p to no-op\n",
+i_cmd->command_trb);
+   /* get cycle state from the original cmd trb */
+   cycle_state = le32_to_cpu(
+   i_cmd->command_trb->generic.field[3]) & TRB_CYCLE;
+   /* modify the command trb to no-op command */
+   i_cmd->command_trb->generic.field[0] = 0;
+   i_cmd->command_trb->generic.field[1] = 0;
+   i_cmd->command_trb->generic.field[2] = 0;
+   i_cmd->command_trb->generic.field[3] = cpu_to_le32(
+   TRB_TYPE(TRB_CMD_NOOP) | cycle_state);
+
+   /*
+* caller waiting for completion is called when command
+*  completion event is received for these no-op commands
+*/
+   }
+
+   xhci->cmd_ring_state = CMD_RING_STATE_RUNNING;
+
+   /* ring command ring doorbell to restart the command ring */
+   if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) &&
+   !(xhci->xhc_state & XHCI_STATE_DYING)) {
+ 

[PATCH 1/3] xhci: Use delayed_work instead of timer for command timeout

2016-11-15 Thread OGAWA Hirofumi
This is preparation to fix abort operation race (See "xhci: Fix race
related to abort operation"). To make timeout sleepable, use
delayed_work instead of timer.

Signed-off-by: OGAWA Hirofumi 
---

 drivers/usb/host/xhci-mem.c  |7 +++
 drivers/usb/host/xhci-ring.c |   24 +++-
 drivers/usb/host/xhci.h  |4 ++--
 3 files changed, 20 insertions(+), 15 deletions(-)

diff -puN drivers/usb/host/xhci-mem.c~xhci-fix-abort-race1 
drivers/usb/host/xhci-mem.c
--- xhci/drivers/usb/host/xhci-mem.c~xhci-fix-abort-race1   2016-11-16 
13:36:04.132330334 +0900
+++ xhci-hirofumi/drivers/usb/host/xhci-mem.c   2016-11-16 13:36:04.135330333 
+0900
@@ -1795,7 +1795,7 @@ void xhci_mem_cleanup(struct xhci_hcd *x
int size;
int i, j, num_ports;
 
-   del_timer_sync(&xhci->cmd_timer);
+   cancel_delayed_work_sync(&xhci->cmd_timer);
 
/* Free the Event Ring Segment Table and the actual Event Ring */
size = sizeof(struct xhci_erst_entry)*(xhci->erst.num_entries);
@@ -2342,9 +2342,8 @@ int xhci_mem_init(struct xhci_hcd *xhci,
 
INIT_LIST_HEAD(&xhci->cmd_list);
 
-   /* init command timeout timer */
-   setup_timer(&xhci->cmd_timer, xhci_handle_command_timeout,
-   (unsigned long)xhci);
+   /* init command timeout work */
+   INIT_DELAYED_WORK(&xhci->cmd_timer, xhci_handle_command_timeout);
 
page_size = readl(&xhci->op_regs->page_size);
xhci_dbg_trace(xhci, trace_xhci_dbg_init,
diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-abort-race1 
drivers/usb/host/xhci-ring.c
--- xhci/drivers/usb/host/xhci-ring.c~xhci-fix-abort-race1  2016-11-16 
13:36:04.10334 +0900
+++ xhci-hirofumi/drivers/usb/host/xhci-ring.c  2016-11-16 13:36:04.135330333 
+0900
@@ -279,6 +279,11 @@ void xhci_ring_cmd_db(struct xhci_hcd *x
readl(&xhci->dba->doorbell[0]);
 }
 
+static bool xhci_mod_cmd_timer(struct xhci_hcd *xhci, unsigned long delay)
+{
+   return mod_delayed_work(system_wq, &xhci->cmd_timer, delay);
+}
+
 static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
 {
u64 temp_64;
@@ -295,7 +300,7 @@ static int xhci_abort_cmd_ring(struct xh
 * but the completion event in never sent. Use the cmd timeout timer to
 * handle those cases. Use twice the time to cover the bit polling retry
 */
-   mod_timer(&xhci->cmd_timer, jiffies + (2 * XHCI_CMD_DEFAULT_TIMEOUT));
+   xhci_mod_cmd_timer(xhci, 2 * XHCI_CMD_DEFAULT_TIMEOUT);
xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
&xhci->op_regs->cmd_ring);
 
@@ -320,7 +325,7 @@ static int xhci_abort_cmd_ring(struct xh
 
xhci_err(xhci, "Stopped the command ring failed, "
"maybe the host is dead\n");
-   del_timer(&xhci->cmd_timer);
+   cancel_delayed_work(&xhci->cmd_timer);
xhci->xhc_state |= XHCI_STATE_DYING;
xhci_halt(xhci);
return -ESHUTDOWN;
@@ -1251,21 +1256,22 @@ static void xhci_handle_stopped_cmd_ring
if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) &&
!(xhci->xhc_state & XHCI_STATE_DYING)) {
xhci->current_cmd = cur_cmd;
-   mod_timer(&xhci->cmd_timer, jiffies + XHCI_CMD_DEFAULT_TIMEOUT);
+   xhci_mod_cmd_timer(xhci, XHCI_CMD_DEFAULT_TIMEOUT);
xhci_ring_cmd_db(xhci);
}
return;
 }
 
 
-void xhci_handle_command_timeout(unsigned long data)
+void xhci_handle_command_timeout(struct work_struct *work)
 {
struct xhci_hcd *xhci;
int ret;
unsigned long flags;
u64 hw_ring_state;
bool second_timeout = false;
-   xhci = (struct xhci_hcd *) data;
+
+   xhci = container_of(to_delayed_work(work), struct xhci_hcd, cmd_timer);
 
/* mark this command to be cancelled */
spin_lock_irqsave(&xhci->lock, flags);
@@ -1333,7 +1339,7 @@ static void handle_cmd_completion(struct
 
cmd = list_entry(xhci->cmd_list.next, struct xhci_command, cmd_list);
 
-   del_timer(&xhci->cmd_timer);
+   cancel_delayed_work(&xhci->cmd_timer);
 
trace_xhci_cmd_completion(cmd_trb, (struct xhci_generic_trb *) event);
 
@@ -1421,7 +1427,7 @@ static void handle_cmd_completion(struct
if (cmd->cmd_list.next != &xhci->cmd_list) {
xhci->current_cmd = list_entry(cmd->cmd_list.next,
   struct xhci_command, cmd_list);
-   mod_timer(&xhci->cmd_timer, jiffies + XHCI_CMD_DEFAULT_TIMEOUT);
+   xhci_mod_cmd_timer(xhci, XHCI_CMD_DEFAULT_TIMEOUT);
}
 
 event_handled:
@@ -3790,9 +3796,9 @@ static int queue_command(struct xhci_hcd
 
/* if the

Re: [PATCH] xhci: Fix dereference from empty xhci->cmd_list

2016-10-18 Thread OGAWA Hirofumi
Mathias Nyman  writes:

>> OGAWA Hirofumi  writes:
>>> If race with timeout or stop event on empty cmd_list was happened,
>>> handle_cmd_completion() deferences wrong pointer from xhci->cmd_list.
>>>
>>> So this makes sure we don't dereference wrong pointer from empty
>>> xhci->cmd_list.
>>>
>>> Signed-off-by: OGAWA Hirofumi 
>>> ---
>
> Thanks, nice catch.
>
> What kernel is this based on, There is a partial fix in 4.8:
>
> commit 33be126510974e2eb9679f1ca9bca4f67ee4c4c7
>  xhci: always handle "Command Ring Stopped" events
>
> Looking at the existing code the case you describe could be possible if
> handle_cmd_completion is racing with:
> -  command timed out twice and xhci_cleanup_command_queue() is called
> -  or command timed out and we fail to restart command ring (assume host is 
> dead)
>
> It would be interesting to know how this was triggered, were there any error 
> messages like:
> "Abort command ring failed"
> or with xhci debugging enabled something like:
> "command timed out twice, ring start fail?"

Ah, sorry for confusing. I wrote this for older kernel (3.8.x, and not
for vanilla). So 4.8 might not have race what I saw actually.
(33be126510974e2eb9679f1ca9bca4f67ee4c4c7 seems to be prevent it)

However, even if 4.8 fixed that case, if there is spurious event or
driver state confusing, fetch and dereference empty list is possible and
bad thing. So, IMO, it would be better to make sure to avoid it.

>>> +   if (list_empty(&xhci->cmd_list))
>>> +   cmd = NULL;
>>> +   else {
>>> +   cmd = list_first_entry(&xhci->cmd_list, struct xhci_command,
>>> +  cmd_list);
>>> +   }
>>
>> These few lines could become a nice helper. Something like:
>>
>> static inline struct xhci_command *next_command(struct xhci_hcd *xhci)
>> {
>>  if (list_empty(&xhci->cmd_list))
>>  return NULL;
>>  return list_first_entry(&xhci->cmd_list, struct xhci_command, cmd_list);
>> }
>>
>
> Both works for me, helper function, or code in this patch.

There are some opencode list_head usages too. It might be better to
change at once as list cleanup if need.

E.g.

/* restart timer if this wasn't the last command */
if (cmd->cmd_list.next != &xhci->cmd_list) {

is

if (!list_is_last(&cmd->cmd_list, &xhci->cmd_list)) {

Well, so, if next_command() cleanup is not required, I'd like to use as
is, in this patch.

Thanks.
-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] xhci: Fix dereference from empty xhci->cmd_list

2016-10-18 Thread OGAWA Hirofumi

If race with timeout or stop event on empty cmd_list was happened,
handle_cmd_completion() deferences wrong pointer from xhci->cmd_list.

So this makes sure we don't dereference wrong pointer from empty
xhci->cmd_list.

Signed-off-by: OGAWA Hirofumi 
---

 drivers/usb/host/xhci-ring.c |   12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff -puN drivers/usb/host/xhci-ring.c~xhci-fix-empty-list 
drivers/usb/host/xhci-ring.c
--- linux/drivers/usb/host/xhci-ring.c~xhci-fix-empty-list  2016-10-18 
21:20:07.973055018 +0900
+++ linux-hirofumi/drivers/usb/host/xhci-ring.c 2016-10-18 21:21:35.949060365 
+0900
@@ -1336,7 +1336,12 @@ static void handle_cmd_completion(struct
return;
}
 
-   cmd = list_entry(xhci->cmd_list.next, struct xhci_command, cmd_list);
+   if (list_empty(&xhci->cmd_list))
+   cmd = NULL;
+   else {
+   cmd = list_first_entry(&xhci->cmd_list, struct xhci_command,
+  cmd_list);
+   }
 
del_timer(&xhci->cmd_timer);
 
@@ -1350,9 +1355,10 @@ static void handle_cmd_completion(struct
return;
}
 
-   if (cmd->command_trb != xhci->cmd_ring->dequeue) {
+   if (cmd == NULL || cmd->command_trb != xhci->cmd_ring->dequeue) {
xhci_err(xhci,
-"Command completion event does not match command\n");
+"Command completion event %s\n",
+cmd ? "does not match command" : "on empty list");
return;
}
 
_

-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html