Mathias Nyman <mathias.ny...@linux.intel.com> writes:

> The retry was a workaround for a case triggered on v4.1.4 by
> Vincent Pelletier (added to CC)
>
>    http://marc.info/?l=linux-usb&m=144031185222108&w=2
>
> The race could very well explain that issue.

>From logs of that thread, it would has possibility about this race.

>> @@ -320,15 +367,29 @@ static int xhci_abort_cmd_ring(struct xh
>>              udelay(1000);
>>              ret = xhci_handshake(&xhci->op_regs->cmd_ring,
>>                                   CMD_RING_RUNNING, 0, 3 * 1000 * 1000);
>> -            if (ret == 0)
>> -                    return 0;
>> +            if (ret < 0) {
>> +                    xhci_err(xhci, "Stopped the command ring failed, "
>> +                             "maybe the host is dead\n");
>> +                    xhci->xhc_state |= XHCI_STATE_DYING;
>> +                    xhci_halt(xhci);
>> +                    return -ESHUTDOWN;
>> +            }
>> +    }
>
> So if we can verify that the race fix solves the old issue for Vincent
> Pelletier then we could get rid of the abort retry above as well.

Right. However, I'm not sure (and maybe you neither).  So I didn't
remove it in this patchset.

(After this patchset, I guess the retry will hit only when actual chip
internal confusion. I.e. retry will fail, then will chip halted.)

Well, anyway, if you decide to try to remove the retry, I also think it
would worth to try. However, it would be [PATCH 4/4] or as another
patchset.

>> +    /*
>> +     * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
>> +     * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
>> +     * but the completion event in never sent. Wait 2 secs (arbitrary
>> +     * number) to handle those cases after negation of CMD_RING_RUNNING.
>> +     */
>
> I'm speculating a bit here, but as we now can sleep, and if we could
> get rid of the abort retry couldn't we swap the order of the
> xhci_handshake(CMD_RING_RUNNING) busywait and
> wait_for_completion_timeout(). We could also use a standard command
> timeout time while waiting for the completion
>
> something like:
>
> hci_write_64(CMD_RING_ABORT)
> timed_out = wait_for_completion_timeout(XHCI_CMD_DEFAULT_TIMEOUT)
> xhci_handshake(CMD_RING_RUNNING)
> if (handshake fail) {
>     xhci_halt(), etc..
>    return -ESHUTDOWN
> }     
> if (timed out) {
>    xhci_cleanup_command_queue(xhci);
>    return
> }
>
> It would reduce the time we spend in the xhci_handshake() busyloop

BTW, busyloop removal was done by "[PATCH 3/3] ...". By [PATCH 3/3], we
can simply use straight forward coding without busyloop.

>> diff -puN drivers/usb/host/xhci.h~xhci-fix-abort-race2 
>> drivers/usb/host/xhci.h
>> --- xhci/drivers/usb/host/xhci.h~xhci-fix-abort-race2        2016-11-16 
>> 18:38:52.552146764 +0900
>> +++ xhci-hirofumi/drivers/usb/host/xhci.h    2016-11-16 18:38:52.554146762 
>> +0900
>> @@ -1574,6 +1574,7 @@ struct xhci_hcd {
>>      struct list_head        cmd_list;
>>      unsigned int            cmd_ring_reserved_trbs;
>>      struct delayed_work     cmd_timer;
>> +    struct completion       stop_completion;
>
> Tiny thing, but maybe change stop_completion to
> cmd_ring_stop_completion.  to avoid mixing it with stopping host
> controller, stop endpoint, stop device etc

OK.
-- 
OGAWA Hirofumi <hirof...@mail.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to