On Sun, 2019-12-22 at 15:37 -0800, Ryan Schmitt wrote:
> I think there are actually two field stores where we can race: one is
> `subscription`, and the other is `requests`. Also, I think that the
> simplest thing to do is to mark the `flushToSubscriber` method
> `synchronized`. (The `flushInProgress` field is used to implement
> non-reentrancy, rather than mutual exclusion; it turns out we need
> both.)
> 

I actually had a fix for this defect ready yesterday but it was already
too late at night (I am at GMT+5 right now) and I was too tired to
prepare a proper pull request and re-test everything. I am sorry you
had to have duplicated all my work.

I have prepared a fix similar to yours that also removes extra
synchronization on Queue<ByteBuffer> instance variable. 

https://github.com/ok2c/httpcomponents-core/commit/5fd4d1596e12c3acb0952d2f53f925b59cc3f524

Feel free to apply whatever fix you like better.


> Also, is there a reason why httpcore5 doesn't have an Slf4j
> dependency? I'd
> like to keep my Reactive debug logging (at DEBUG or TRACE) level, but
> I had
> to add an Slf4j just to write it in the first place.
> 

This goes back to the infamous logging wars. 

I actually like it a lot that core components do not depend on any
particular logging framework but feel free to add it to the reactive
module if you deem it justified.

Cheers

Oleg



> On Sun, Dec 22, 2019 at 3:21 PM Ryan Schmitt <[email protected]>
> wrote:
> 
> > I think I figured it out. I added debug logging to
> > ReactiveDataConsumer
> > and ReactiveEntityConsumer, and then I injected randomized sleeps
> > at
> > various points in the code in order to try to induce races. Check
> > out the
> > following:
> > 
> > 
> > 
https://gist.githubusercontent.com/rschmitt/51ba439f5141e6cac8b6e148e29c61fd/raw/2f4e13012451e93c90ccda29ad9f1c37c051f1ed/reactive-race.txt
> > 
> > Here, the two threads (an RxJava thread and a client thread) race
> > in such
> > a way that the call to #streamEnd never results in a call to
> > Subscriber#onComplete from either thread:
> > 
> > 1. The client calls #streamEnd
> > 2. The client acquires the flush lock
> > 3. The client sees that the subscriber is `null` and decides to
> > return
> > 4. RxJava subscribes
> > 5. RxJava sees that the lock is already held and returns
> > 6. The client releases the lock
> > 
> > In the logs, (3) and (4) are reported in reverse order. The
> > `subscriber`
> > field is volatile, so we are dealing with a race condition, not a
> > memory
> > model issue.
> > 
> > I'll need to give some thought to how to fix this. The obvious
> > solution is
> > for ReactiveDataConsumer#subscribe to acquire (spinwait on?) the
> > flush lock
> > before the store to the `subscriber` field takes place.
> > 
> > On Sun, Dec 22, 2019 at 5:10 AM Oleg Kalnichevski <[email protected]
> > >
> > wrote:
> > 
> > > On Sun, 2019-12-22 at 10:28 +0100, Oleg Kalnichevski wrote:
> > > > On Sat, 2019-12-21 at 10:58 +0100, Oleg Kalnichevski wrote:
> > > > > 
> > > > 
> > > > 
> > > > ...
> > > > 
> > > > > 
> > > > > It appears that the defect only occurs with TLS transport and
> > > > > entity
> > > > > enclosing requests. It also looks to be a race condition of
> > > > > some
> > > > > sort.
> > > > > It is nearly impossible to reproduce with context / wire
> > > > > logging
> > > > > on.
> > > > > 
> > > > > I will continue looking into it tomorrow.
> > > > > 
> > > > > Oleg
> > > > > 
> > > > 
> > > > Hi Ryan
> > > > 
> > > > I can now reliably reproduce the issue with context / wire
> > > > logging
> > > > on.
> > > > 
> > > > So far it looks quite certain there is a race condition of a
> > > > sort in
> > > > the reactive binding layer.
> > > > 
> > > > All failing tests get stuck here
> > > > 
> > > > 
> > > 
> > > 
https://github.com/apache/httpcomponents-client/pull/181/commits/c1b060703ac57fc4806d1249b906a385648862b5#diff-4aa21c6723799de37acda71cbe858faaR254
> > > > 
> > > > However all message exchanges clearly look completed in the
> > > > context
> > > > log
> > > > and there appear to be no issues in the protocol handling
> > > > layer.
> > > > 
> > > > It would be great if you could take a look. I will continue
> > > > debugging
> > > > reactive binding code in the meantime.
> > > > 
> > > > Cheers
> > > > 
> > > > Oleg
> > > > 
> > > > 
> > > 
> > > Without #publisherToByteArray conversion all tests pass
> > > consistently
> > > without a single failure after many repetitions.
> > > 
> > > I will start looking into ReactiveDataConsumer. Any help would be
> > > appreciated.
> > > 
> > > Oleg
> > > 
> > > 
> > > 
> > > ---------------------------------------------------------------
> > > ------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > > 
> > > 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to