Is there a specific type of corruption or inconsistency that the IO Reactor
is prone to? My concern is that if the IO Reactor can die under _any_
circumstance, we have to code for that eventuality under _all_
circumstances, on pain of a serious production outage. To my knowledge,
other async HTTP clients (e.g. those based on Netty) don't have this
failure mode.

On Fri, Jun 26, 2020 at 2:55 AM Oleg Kalnichevski <[email protected]> wrote:

> On Thu, 2020-06-25 at 16:17 -0700, xzer wrote:
> > Hi,
> >
> > I spent some time in the past week to read the code, then I found
> > several points:
> >
> > # An Error in FutureCallback or AsyncResponseConsumer will stop the
> > IOReactorWorker as there is not Error catch
> >
> > This can be easily reproduced and confirmed in the code, thus I am
> > not
> > providing the reproduce code, but I can submit it later if required.
> >
> > # suspicion on current Exception handling
> >
> > The main event loop is located at SingleCoreIOReactor#doExecute, the
> > main process is like following:
> >
> >
>
> https://github.com/apache/httpcomponents-core/blob/74632227c1d2ae4e994f68b0c9e904ebd1dc7406/httpcore5/src/main/java/org/apache/hc/core5/reactor/SingleCoreIOReactor.java#L111
> >
> > while(...){
> >   closePendingChannels();
> >   processEvents();
> >   validateActiveChannels();
> >   processClosedSessions();
> >   processPendingChannels();
> >   processPendingConnectionRequests();
> > }
> >
> > I noticed that there is no exception handling at the first layer of
> > the event loop, and inside the process methods, the Exception
> > handling
> > is performed on important points but not covering all execution
> > paths.
> > I tried several days to generate an Exception out of the existing
> > Exception handling coverage but failed, but I am still not very
> > confident on the possibility of uncaught Exceptions.
> >
> > I don't know the reason why there is no explicit Exception handling
> > inside the event loop to guard the loop, and am also curious about
> > why
> > the Error cannot be caught too inside the event loop to keep the
> > IOReactor running.
> >
> > I assume that the Errors and not covered Exceptions may cause
> > inconsistency on the IO reactor status (not sure), thus I am
> > proposing
> > a robust wrapper on the IO reactor so that we can abandon a stopped
> > IO
> > reactor and create a new one to replace it.
> >
> > Which wrapper, let me call it RobustReactor, will be extended from
> > SingleCoreIOReactor or AbstractSingleCoreIOReactor depends if we want
> > to handle the SingleCoreListeningIOReactor too. RobustReactor will
> > accept a SingleCoreIOReactor supplier by constructor. thus it will be
> > transparent to IOReactorWorker. RobustReactor will override the
> > execute to catch all Throwable and recreate the IOReactor instance.
> >
> > public RobustReactor(Supplier<SingleCoreIOReactor> reactorSupplier){}
> >
> > public void execute(){
> >   try{
> >       this.delegatee.execute();
> >   } catch (Throwable t){
> >       safeclose(this.delegatee);
> >       this.delegatee = supplier.get();
> >   }
> > }
> >
> > I am pursuing the suggestion for the possible better approaches.
> >
>
> I personally think that catching `Error`s is a _terribly_ _bad_ idea.
> It does not actually make things any more robust but makes damn sure
> the i/o layer will end up in a corrupt state in case something abnormal
> happens.
>
> I am fine with having one global try-catch block over the i/o reactor
> as long as it catches `Exception`s instead of `Throwable`s or at the
> very least re-throws `Error`s by default.
>
> You also need to consider the case when attempt to re-create I/O
> reactors in case of an OOM condition would likely result in more OOMs
> or, worse, a corrupt I/O reactor instance.
>
> Cheers
>
> Oleg
>
>
>
> > Best Regards
> > Rui Liu
> > Sr SDE | Amazon IN Core CX Tech
> >
> > xzer <[email protected]> 于2020年6月18日周四 下午2:11写道:
> > >
> > > Hi Oleg,
> > >
> > > (Thanks Ryan helped me forwarding the mail)
> > >
> > > Thanks for your reply, I had a glance of current 5.0 codebase,
> > > looks
> > > like the way of exception handling has been changed and
> > > IOReactorExceptionHandler is not required any more, which is a
> > > pretty
> > > good change.
> > >
> > > I agree with you it is not likely a problem to the library itself
> > > as
> > > there is no more alternatives in hand as I mentioned in my first
> > > mail.
> > > The only problem is that we left the robustness challenge to the
> > > users
> > > and most users may even not realize it, the other ones who realized
> > > this issue then will be implementing their own recovery mechanism
> > > to
> > > guard the client instance availability.
> > >
> > > I will take some to learn the current implementation and see if
> > > there
> > > is anything we should do more, as you said, the error handling may
> > > still be a challenge.
> > >
> > > Best Regards
> > > Rui Liu
> > > Sr SDE | Amazon IN Core CX Tech
> > >
> > > Oleg Kalnichevski <[email protected]> 于2020年6月18日周四 下午1:24写道:
> > > >
> > > > On Thu, 2020-06-18 at 12:55 -0700, Ryan Schmitt wrote:
> > > > > Oleg, is there anything else we should know about the details
> > > > > of this
> > > > > problem on the 4.x line? It seems like there's more going on
> > > > > here
> > > > > than just
> > > > > a missing `catch` block.
> > > > >
> > > >
> > > > I would not really define it as a problem. It is just a
> > > > fundamental
> > > > principle that is is generally better to terminate i/o endpoints
> > > > in
> > > > case of an unexpected condition the framework does not know how
> > > > to
> > > > gracefully recover from than leaving them running in a half-
> > > > broken or
> > > > inconsistent state. If you want to keep endpoints running in case
> > > > of a
> > > > unexpected runtime exception in the application layer, so be it,
> > > > but
> > > > just apply that it consistently across the entire i/o and
> > > > protocol
> > > > layers.
> > > >
> > > > Oleg
> > > >
> > > >
> > > > > On Thu, Jun 18, 2020 at 11:48 AM Oleg Kalnichevski <
> > > > > [email protected]>
> > > > > wrote:
> > > > >
> > > > > > On Thu, 2020-06-18 at 10:18 -0700, Ryan Schmitt wrote:
> > > > > > > Forwarding this along, since it's not making it to the list
> > > > > > > for
> > > > > > > some
> > > > > > > reason.
> > > > > > >
> > > > > > > ---------- Forwarded message ---------
> > > > > > > From: xzer <[email protected]>
> > > > > > > Date: 2020/6/17/ 10:59PM
> > > > > > > Subject: About HttpAsyncClient exception handling mechanism
> > > > > > > To: <[email protected]>
> > > > > > >
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I am writing this mail to pursue the possible improvement
> > > > > > > on the
> > > > > > > current
> > > > > > > HttpAsyncClient exception handling mechanism.
> > > > > > >
> > > > > > > We observed many service failures inside our company with
> > > > > > > the
> > > > > > > error
> > > > > > > of “I/O
> > > > > > > reactor status: STOPPED” when using Apache HttpAsyncClient,
> > > > > > > we
> > > > > > > also
> > > > > > > know
> > > > > > > the reason is because the IOReactorExceptionHandler is not
> > > > > > > set
> > > > > > > correctly.
> > > > > > >
> > > > > > > For reference:
> > > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
>
> https://hc.apache.org/httpcomponents-core-ga/httpcore-nio/apidocs/org/apache/http/impl/nio/reactor/AbstractMultiworkerIOReactor.html
> > > > > > >
> > > > > > > We noticed that there are several things should be taken
> > > > > > > into
> > > > > > > consideration
> > > > > > > as this kind of "mistake" keeps happening:
> > > > > > >
> > > > > > > - There is almost no guidance information about
> > > > > > > IOReactorExceptionHandler
> > > > > > > at the official site except a small paragraph in tutorial.
> > > > > > > The
> > > > > > > official
> > > > > > > example of how to customize and configure the client
> > > > > > > building
> > > > > > > does
> > > > > > > not have
> > > > > > > IOReactorExceptionHandler configured too.
> > > > > > >
> > > > > > > - The example of quick start is using
> > > > > > > "HttpAsyncClients.createDefault()" to
> > > > > > > illustrate the out-of-box usage, but the client instance
> > > > > > > created
> > > > > > > by
> > > > > > > which
> > > > > > > "quick way" does not have IOReactorExceptionHandler
> > > > > > > configured
> > > > > > > too.
> > > > > > >
> > > > > > > - Even we configure the IOReactorExceptionHandler to handle
> > > > > > > the
> > > > > > > Exceptions,
> > > > > > > we still have several concerns:
> > > > > > >   - Is that safe to resume the IO Reactor unconditionally
> > > > > > > on any
> > > > > > > Exception?
> > > > > > >   - It is still impossible to recover the IO Reactor if
> > > > > > > there is
> > > > > > > an
> > > > > > > Error
> > > > > > > rather than Exception, which is typically happening when
> > > > > > > the
> > > > > > > service
> > > > > > > is
> > > > > > > under traffic pressure.
> > > > > > >
> > > > > > > For the final point of Exception/Error recovery, we
> > > > > > > understand
> > > > > > > that
> > > > > > > it is
> > > > > > > difficult to decide how to handle the Exception/Error at
> > > > > > > library
> > > > > > > level as
> > > > > > > we have less knowledge about the runtime context. However,
> > > > > > > it is
> > > > > > > a
> > > > > > > painful
> > > > > > > task to developers who have to take the robustness into
> > > > > > > account.
> > > > > > > We
> > > > > > > believe
> > > > > > > that HttpAsyncClient should provide extra robustness
> > > > > > > mechanism to
> > > > > > > simplify
> > > > > > > the usage.
> > > > > > >
> > > > > > > We have a proposal like following:
> > > > > > >
> > > > > > > - The client instance created by
> > > > > > > "HttpAsyncClients.createDefault()"
> > > > > > > should
> > > > > > > have a default IOReactorExceptionHandler configured.
> > > > > > >
> > > > > > > - The guidance information of setting
> > > > > > > IOReactorExceptionHandler
> > > > > > > should be
> > > > > > > added into the example of customizing.
> > > > > > >
> > > > > > > - We also propose a default strategy of Exception handling
> > > > > > > as:
> > > > > > > resume
> > > > > > > on
> > > > > > > RuntimeException and crash on IOException, which is based
> > > > > > > on a
> > > > > > > hypothesis
> > > > > > > that RuntimeException is usually caused by user code and
> > > > > > > IOException
> > > > > > > is
> > > > > > > more likely suggesting a underlying network communication
> > > > > > > failure.
> > > > > > >
> > > > > > > - We also propose a mechanism which will check the
> > > > > > > IOReactor
> > > > > > > status
> > > > > > > and
> > > > > > > then reinitialize the client instance entirely when
> > > > > > > uncaught
> > > > > > > Exception/Error happens.
> > > > > > >
> > > > > > > As the proposal still requires polish and discussion, thus
> > > > > > > we are
> > > > > > > sending
> > > > > > > this mail to ask the opinion from you about it.
> > > > > > >
> > > > > >
> > > > > > Hi Rui
> > > > > >
> > > > > > I _think_ most of the technical issues have already been
> > > > > > addressed
> > > > > > in
> > > > > > HC 5.0. I am not sure HC 4.x is worth any major refactoring
> > > > > > efforts
> > > > > > at
> > > > > > this point but you are certainly welcome to propose a PR for
> > > > > > the
> > > > > > 4.x
> > > > > > release branch. The only potentially contentious issue might
> > > > > > be
> > > > > > handing
> > > > > > of Errors but I am sure we can work something out.
> > > > > >
> > > > > > Please do however consider upgrading to HC 5.0
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > Oleg
> > > > > >
> > > > > >
> > > > > >
> > > > > > -----------------------------------------------------------
> > > > > > ------
> > > > > > ----
> > > > > > To unsubscribe, e-mail: [email protected]
> > > > > > For additional commands, e-mail: [email protected]
> > > > > >
> > > > > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Reply via email to