On Sun, 2019-12-22 at 15:37 -0800, Ryan Schmitt wrote: > I think there are actually two field stores where we can race: one is > `subscription`, and the other is `requests`. Also, I think that the > simplest thing to do is to mark the `flushToSubscriber` method > `synchronized`. (The `flushInProgress` field is used to implement > non-reentrancy, rather than mutual exclusion; it turns out we need > both.) >
I actually had a fix for this defect ready yesterday but it was already too late at night (I am at GMT+5 right now) and I was too tired to prepare a proper pull request and re-test everything. I am sorry you had to have duplicated all my work. I have prepared a fix similar to yours that also removes extra synchronization on Queue<ByteBuffer> instance variable. https://github.com/ok2c/httpcomponents-core/commit/5fd4d1596e12c3acb0952d2f53f925b59cc3f524 Feel free to apply whatever fix you like better. > Also, is there a reason why httpcore5 doesn't have an Slf4j > dependency? I'd > like to keep my Reactive debug logging (at DEBUG or TRACE) level, but > I had > to add an Slf4j just to write it in the first place. > This goes back to the infamous logging wars. I actually like it a lot that core components do not depend on any particular logging framework but feel free to add it to the reactive module if you deem it justified. Cheers Oleg > On Sun, Dec 22, 2019 at 3:21 PM Ryan Schmitt <[email protected]> > wrote: > > > I think I figured it out. I added debug logging to > > ReactiveDataConsumer > > and ReactiveEntityConsumer, and then I injected randomized sleeps > > at > > various points in the code in order to try to induce races. Check > > out the > > following: > > > > > > https://gist.githubusercontent.com/rschmitt/51ba439f5141e6cac8b6e148e29c61fd/raw/2f4e13012451e93c90ccda29ad9f1c37c051f1ed/reactive-race.txt > > > > Here, the two threads (an RxJava thread and a client thread) race > > in such > > a way that the call to #streamEnd never results in a call to > > Subscriber#onComplete from either thread: > > > > 1. The client calls #streamEnd > > 2. The client acquires the flush lock > > 3. The client sees that the subscriber is `null` and decides to > > return > > 4. RxJava subscribes > > 5. RxJava sees that the lock is already held and returns > > 6. The client releases the lock > > > > In the logs, (3) and (4) are reported in reverse order. The > > `subscriber` > > field is volatile, so we are dealing with a race condition, not a > > memory > > model issue. > > > > I'll need to give some thought to how to fix this. The obvious > > solution is > > for ReactiveDataConsumer#subscribe to acquire (spinwait on?) the > > flush lock > > before the store to the `subscriber` field takes place. > > > > On Sun, Dec 22, 2019 at 5:10 AM Oleg Kalnichevski <[email protected] > > > > > wrote: > > > > > On Sun, 2019-12-22 at 10:28 +0100, Oleg Kalnichevski wrote: > > > > On Sat, 2019-12-21 at 10:58 +0100, Oleg Kalnichevski wrote: > > > > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > It appears that the defect only occurs with TLS transport and > > > > > entity > > > > > enclosing requests. It also looks to be a race condition of > > > > > some > > > > > sort. > > > > > It is nearly impossible to reproduce with context / wire > > > > > logging > > > > > on. > > > > > > > > > > I will continue looking into it tomorrow. > > > > > > > > > > Oleg > > > > > > > > > > > > > Hi Ryan > > > > > > > > I can now reliably reproduce the issue with context / wire > > > > logging > > > > on. > > > > > > > > So far it looks quite certain there is a race condition of a > > > > sort in > > > > the reactive binding layer. > > > > > > > > All failing tests get stuck here > > > > > > > > > > > > > > https://github.com/apache/httpcomponents-client/pull/181/commits/c1b060703ac57fc4806d1249b906a385648862b5#diff-4aa21c6723799de37acda71cbe858faaR254 > > > > > > > > However all message exchanges clearly look completed in the > > > > context > > > > log > > > > and there appear to be no issues in the protocol handling > > > > layer. > > > > > > > > It would be great if you could take a look. I will continue > > > > debugging > > > > reactive binding code in the meantime. > > > > > > > > Cheers > > > > > > > > Oleg > > > > > > > > > > > > > > Without #publisherToByteArray conversion all tests pass > > > consistently > > > without a single failure after many repetitions. > > > > > > I will start looking into ReactiveDataConsumer. Any help would be > > > appreciated. > > > > > > Oleg > > > > > > > > > > > > --------------------------------------------------------------- > > > ------ > > > To unsubscribe, e-mail: [email protected] > > > For additional commands, e-mail: [email protected] > > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
