I don't even know what GMT+5 is. Yekaterinburg?

I have prepared a fix similar to yours that also removes extra
> synchronization on Queue<ByteBuffer> instance variable.


I don't believe your fix is correct, because you appear to have removed the
reentrancy checks in `flushToSubscriber`. Those checks are there to prevent
*the same thread* from calling into `flushToSubscriber` multiple times, and
racing against itself. This is more straightforward than it sounds; the
call stack looks like:

`flushToSubscriber` -> `Subscriber#onNext` -> `Subscription#request` ->
`flushToSubscriber`

This can cause all kinds of nightmares, such as ByteBuffers being sent
out-of-order to the subscriber. If I'm right about this, and your changes
passed all the tests (including the reactive tests in the client), then I
clearly need even more test coverage.

I actually like it a lot that core components do not depend on any
> particular logging framework but feel free to add it to the reactive
> module if you deem it justified.


I strongly agree that core should not depend on a logging _runtime_, but
depending on a logging API is completely reasonable. I personally use
log4j2 as my logging framework, but I routinely install bridges for
JakartaCommons-logging, java.util.logging, log4j 1.2, and Slf4j 1.7, so the
API doesn't really matter terribly. I'm surprised to hear that there was
apparently some sort of religious war about this in the past.

On Mon, Dec 23, 2019 at 12:46 AM Oleg Kalnichevski <[email protected]> wrote:

> On Sun, 2019-12-22 at 15:37 -0800, Ryan Schmitt wrote:
> > I think there are actually two field stores where we can race: one is
> > `subscription`, and the other is `requests`. Also, I think that the
> > simplest thing to do is to mark the `flushToSubscriber` method
> > `synchronized`. (The `flushInProgress` field is used to implement
> > non-reentrancy, rather than mutual exclusion; it turns out we need
> > both.)
> >
>
> I actually had a fix for this defect ready yesterday but it was already
> too late at night (I am at GMT+5 right now) and I was too tired to
> prepare a proper pull request and re-test everything. I am sorry you
> had to have duplicated all my work.
>
> I have prepared a fix similar to yours that also removes extra
> synchronization on Queue<ByteBuffer> instance variable.
>
>
> https://github.com/ok2c/httpcomponents-core/commit/5fd4d1596e12c3acb0952d2f53f925b59cc3f524
>
> Feel free to apply whatever fix you like better.
>
>
> > Also, is there a reason why httpcore5 doesn't have an Slf4j
> > dependency? I'd
> > like to keep my Reactive debug logging (at DEBUG or TRACE) level, but
> > I had
> > to add an Slf4j just to write it in the first place.
> >
>
> This goes back to the infamous logging wars.
>
> I actually like it a lot that core components do not depend on any
> particular logging framework but feel free to add it to the reactive
> module if you deem it justified.
>
> Cheers
>
> Oleg
>
>
>
> > On Sun, Dec 22, 2019 at 3:21 PM Ryan Schmitt <[email protected]>
> > wrote:
> >
> > > I think I figured it out. I added debug logging to
> > > ReactiveDataConsumer
> > > and ReactiveEntityConsumer, and then I injected randomized sleeps
> > > at
> > > various points in the code in order to try to induce races. Check
> > > out the
> > > following:
> > >
> > >
> > >
>
> https://gist.githubusercontent.com/rschmitt/51ba439f5141e6cac8b6e148e29c61fd/raw/2f4e13012451e93c90ccda29ad9f1c37c051f1ed/reactive-race.txt
> > >
> > > Here, the two threads (an RxJava thread and a client thread) race
> > > in such
> > > a way that the call to #streamEnd never results in a call to
> > > Subscriber#onComplete from either thread:
> > >
> > > 1. The client calls #streamEnd
> > > 2. The client acquires the flush lock
> > > 3. The client sees that the subscriber is `null` and decides to
> > > return
> > > 4. RxJava subscribes
> > > 5. RxJava sees that the lock is already held and returns
> > > 6. The client releases the lock
> > >
> > > In the logs, (3) and (4) are reported in reverse order. The
> > > `subscriber`
> > > field is volatile, so we are dealing with a race condition, not a
> > > memory
> > > model issue.
> > >
> > > I'll need to give some thought to how to fix this. The obvious
> > > solution is
> > > for ReactiveDataConsumer#subscribe to acquire (spinwait on?) the
> > > flush lock
> > > before the store to the `subscriber` field takes place.
> > >
> > > On Sun, Dec 22, 2019 at 5:10 AM Oleg Kalnichevski <[email protected]
> > > >
> > > wrote:
> > >
> > > > On Sun, 2019-12-22 at 10:28 +0100, Oleg Kalnichevski wrote:
> > > > > On Sat, 2019-12-21 at 10:58 +0100, Oleg Kalnichevski wrote:
> > > > > >
> > > > >
> > > > >
> > > > > ...
> > > > >
> > > > > >
> > > > > > It appears that the defect only occurs with TLS transport and
> > > > > > entity
> > > > > > enclosing requests. It also looks to be a race condition of
> > > > > > some
> > > > > > sort.
> > > > > > It is nearly impossible to reproduce with context / wire
> > > > > > logging
> > > > > > on.
> > > > > >
> > > > > > I will continue looking into it tomorrow.
> > > > > >
> > > > > > Oleg
> > > > > >
> > > > >
> > > > > Hi Ryan
> > > > >
> > > > > I can now reliably reproduce the issue with context / wire
> > > > > logging
> > > > > on.
> > > > >
> > > > > So far it looks quite certain there is a race condition of a
> > > > > sort in
> > > > > the reactive binding layer.
> > > > >
> > > > > All failing tests get stuck here
> > > > >
> > > > >
> > > >
> > > >
>
> https://github.com/apache/httpcomponents-client/pull/181/commits/c1b060703ac57fc4806d1249b906a385648862b5#diff-4aa21c6723799de37acda71cbe858faaR254
> > > > >
> > > > > However all message exchanges clearly look completed in the
> > > > > context
> > > > > log
> > > > > and there appear to be no issues in the protocol handling
> > > > > layer.
> > > > >
> > > > > It would be great if you could take a look. I will continue
> > > > > debugging
> > > > > reactive binding code in the meantime.
> > > > >
> > > > > Cheers
> > > > >
> > > > > Oleg
> > > > >
> > > > >
> > > >
> > > > Without #publisherToByteArray conversion all tests pass
> > > > consistently
> > > > without a single failure after many repetitions.
> > > >
> > > > I will start looking into ReactiveDataConsumer. Any help would be
> > > > appreciated.
> > > >
> > > > Oleg
> > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------
> > > > ------
> > > > To unsubscribe, e-mail: [email protected]
> > > > For additional commands, e-mail: [email protected]
> > > >
> > > >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Reply via email to