On 2013.04.04 19:47, Pete Batard wrote:
>> The problem is that the backend private information for transfers wasn't
>> getting correctly cleared before the transfer completion callback was
>> called.

OK, I had a chance to look more closely into it at last. I think the 
patch is close to what we want, but needs just a little bit of tweaking.

>> The problematic sequence is along the lines of the following:
>>
>> 1. Client creates a transfer using libusb_alloc_transfer. Let's call
>> this transfer A.
>> 2. Transfer A is submitted, e.g. reading from bulk endpoint 0x81.
>> 3. The backend assigns a fake fd of 1 to transfer A and then submits it
>> to the OS.

Yup. We create a new fake fd for each transfer, at submission time (so 
that even if a transfer is reused, it will have a new fd).

>> 4. The OS completes the request.
>> 5. The backend detects this when next handling events and the client has
>> the callbacks called.

OK. Bearing in mind that before we call on the callbacks, 
windows_clear_transfer_priv -> usbi_free_fd -> _free_index which sets 
our poll_fd[] entry for that transfer to "INVALID_WINFD" (i.e. no active 
transfer) in a critical section. The index of a free entry from that 
table is what we use as fake fd right now.

>> 6. Without the fix in this patch, transfer_priv->pollable_fd.fd still
>> has a value of one.

Yep.

>> 7. Client creates a new transfer using libusb_alloc_transfer. Let's call
>> this transfer B.
>> 8. Transer B is submitted, e.g. reading from bulk endpoing 0x81 with
>> infinite timeout.
>> 9. The backend assigns a fake fd of 1 to transfer B and then submits it
>> to the OS.

With you so far. The poll_fd[] entry was cleared, so we're free to 
reassign the same fake fd, and the changes added for WinCE removed the 
calls we used to issue to the OS, for an actual fd, so that we now use 
the index of the first free element in our table as an fd value => we're 
most likely going to get an fd value of 1 again.

>> 10. The client then submits transfer A, reading from bulk endpoint 0x82
>> with a timeout of 2 seconds.

If client reuses transfer A and submits, a new fd should be created and 
assigned to A's transfer_priv part, in which case we probably have an fd 
value of 2 in transfer A's priv part at this stage, since 1 has just 
been assigned. A new OVERLAPPED should also be assigned.
Thus if a new transfer that reuses A has been fully submitted, 
OVERLAPPED should not be NULL and the fake fd should not be 1.
As such, I don't think a fully completed 10 will result in the issue 
described (but a partially completed 10 probably will).

>> 11. Before the backend is given transfer A it is added to the flying
>> transfer list.
>> 12. The OS completes transfer B.
>> 13. A separate thread starts to run and asks libusb to handle events.

OK.

>> 14. The windows backend notices that fake fd 1 has been signalled and
>> starts to look through the flying transfer list.

OK.

>> 15. The windows backend finds transfer A as it is earlier in the flying
>> transfer list as it has an earlier timeout and has the old fake fd value
>> of 1

The only way this should happen is if we are going through the flying 
transfer list before the transfer that reuses A has been fully submitted 
(or even initiated at all)
Otherwise we should have an fd value of 2 for transfer A, not 1.

> and thinks it's been completed. However the OVERLAPPED is NULL so
>> it causes a NULL pointer dereference.

Same story. This can only happen if A's submission for a new transfer 
has not been completed yet, as the OVERLAPPED wouldn't be NULL otherwise.

On the other hand, the following line:
https://github.com/libusbx/libusbx/blob/master/libusb/os/windows_usb.c#L2116
makes it clear that if we have an inactive or semi inactive transfer in 
our flying list, with an fd that hasn't been cleared and is being 
reused, we're going to have an issue indeed, so regardless of what 10/15 
should read, your explanation makes sense.

I think the reason we haven't picked on this issue so far is that before 
the inclusion of WinCE, we were requesting an actual fd from the OS (See 
the _open() line that was removed with [1]), so during the course of a 
normal run, we would never see the same fd value, even for fake fds.

Right now, I think the fd cleanup when we process the original transfer 
is what we want, but the piece that is missing would be to surround your 
patch with a flying_transfers_lock or pass a pointer to the fd in 
usbi_free_fd() and do the fd cleanup there (since we have a critical 
section), to avoid any risks of having an OVERLAPPED that is NULL with 
an fd that we may still be trying to process. This may be needed in case 
we switch to processing the flying transfer list in a multithreaded app, 
right before we clean the fd.

I still want to give it some more thought as well as hear any comments 
people may have. However this looks important enough to have fixed for 
1.0.15, especially as it's due to a change we added after 1.0.14.
Right now, my preference is with passing the fd as a pointer to 
usbi_free_fd() and reset it to INVALID_WINFD there, so if there aren't 
any comments, that's probably the patch I'll apply before release.

Regards,

/Pete

[1] 
https://github.com/libusbx/libusbx/commit/790ffc78b008a03c95d10899f53997b504f55c72#L1L347


------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
libusbx-devel mailing list
libusbx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libusbx-devel

Reply via email to