On 2013.04.04 19:47, Pete Batard wrote: >> The problem is that the backend private information for transfers wasn't >> getting correctly cleared before the transfer completion callback was >> called.
OK, I had a chance to look more closely into it at last. I think the patch is close to what we want, but needs just a little bit of tweaking. >> The problematic sequence is along the lines of the following: >> >> 1. Client creates a transfer using libusb_alloc_transfer. Let's call >> this transfer A. >> 2. Transfer A is submitted, e.g. reading from bulk endpoint 0x81. >> 3. The backend assigns a fake fd of 1 to transfer A and then submits it >> to the OS. Yup. We create a new fake fd for each transfer, at submission time (so that even if a transfer is reused, it will have a new fd). >> 4. The OS completes the request. >> 5. The backend detects this when next handling events and the client has >> the callbacks called. OK. Bearing in mind that before we call on the callbacks, windows_clear_transfer_priv -> usbi_free_fd -> _free_index which sets our poll_fd[] entry for that transfer to "INVALID_WINFD" (i.e. no active transfer) in a critical section. The index of a free entry from that table is what we use as fake fd right now. >> 6. Without the fix in this patch, transfer_priv->pollable_fd.fd still >> has a value of one. Yep. >> 7. Client creates a new transfer using libusb_alloc_transfer. Let's call >> this transfer B. >> 8. Transer B is submitted, e.g. reading from bulk endpoing 0x81 with >> infinite timeout. >> 9. The backend assigns a fake fd of 1 to transfer B and then submits it >> to the OS. With you so far. The poll_fd[] entry was cleared, so we're free to reassign the same fake fd, and the changes added for WinCE removed the calls we used to issue to the OS, for an actual fd, so that we now use the index of the first free element in our table as an fd value => we're most likely going to get an fd value of 1 again. >> 10. The client then submits transfer A, reading from bulk endpoint 0x82 >> with a timeout of 2 seconds. If client reuses transfer A and submits, a new fd should be created and assigned to A's transfer_priv part, in which case we probably have an fd value of 2 in transfer A's priv part at this stage, since 1 has just been assigned. A new OVERLAPPED should also be assigned. Thus if a new transfer that reuses A has been fully submitted, OVERLAPPED should not be NULL and the fake fd should not be 1. As such, I don't think a fully completed 10 will result in the issue described (but a partially completed 10 probably will). >> 11. Before the backend is given transfer A it is added to the flying >> transfer list. >> 12. The OS completes transfer B. >> 13. A separate thread starts to run and asks libusb to handle events. OK. >> 14. The windows backend notices that fake fd 1 has been signalled and >> starts to look through the flying transfer list. OK. >> 15. The windows backend finds transfer A as it is earlier in the flying >> transfer list as it has an earlier timeout and has the old fake fd value >> of 1 The only way this should happen is if we are going through the flying transfer list before the transfer that reuses A has been fully submitted (or even initiated at all) Otherwise we should have an fd value of 2 for transfer A, not 1. > and thinks it's been completed. However the OVERLAPPED is NULL so >> it causes a NULL pointer dereference. Same story. This can only happen if A's submission for a new transfer has not been completed yet, as the OVERLAPPED wouldn't be NULL otherwise. On the other hand, the following line: https://github.com/libusbx/libusbx/blob/master/libusb/os/windows_usb.c#L2116 makes it clear that if we have an inactive or semi inactive transfer in our flying list, with an fd that hasn't been cleared and is being reused, we're going to have an issue indeed, so regardless of what 10/15 should read, your explanation makes sense. I think the reason we haven't picked on this issue so far is that before the inclusion of WinCE, we were requesting an actual fd from the OS (See the _open() line that was removed with [1]), so during the course of a normal run, we would never see the same fd value, even for fake fds. Right now, I think the fd cleanup when we process the original transfer is what we want, but the piece that is missing would be to surround your patch with a flying_transfers_lock or pass a pointer to the fd in usbi_free_fd() and do the fd cleanup there (since we have a critical section), to avoid any risks of having an OVERLAPPED that is NULL with an fd that we may still be trying to process. This may be needed in case we switch to processing the flying transfer list in a multithreaded app, right before we clean the fd. I still want to give it some more thought as well as hear any comments people may have. However this looks important enough to have fixed for 1.0.15, especially as it's due to a change we added after 1.0.14. Right now, my preference is with passing the fd as a pointer to usbi_free_fd() and reset it to INVALID_WINFD there, so if there aren't any comments, that's probably the patch I'll apply before release. Regards, /Pete [1] https://github.com/libusbx/libusbx/commit/790ffc78b008a03c95d10899f53997b504f55c72#L1L347 ------------------------------------------------------------------------------ Precog is a next-generation analytics platform capable of advanced analytics on semi-structured data. The platform includes APIs for building apps and a phenomenal toolset for data science. Developers can use our toolset for easy data analysis & visualization. Get a free account! http://www2.precog.com/precogplatform/slashdotnewsletter _______________________________________________ libusbx-devel mailing list libusbx-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/libusbx-devel