Hey Zsolt,

The same argument could be generalized to many phases of query
execution: if something goes wrong when compiling or running the
query, there is a risk that the whole HS2 could go down.

In many cases, adding asynchronous execution seems to alleviate the
problem, but at the same time it makes the code more complex and
error-prone. Moreover, whenever we add more thread pools we have to be
mindful that this can affect the performance of the whole process. If
a hook goes wild and takes a lot of time, then even if it runs on a
separate thread it will have an impact on the whole server, and it may
be even more difficult to diagnose what is going wrong. Furthermore,
adding a timeout assumes that the hooks are interruptible, which might
not always be the case.
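For what it's worth, here is a minimal sketch of what the timeout
option could look like and where the interruptibility caveat bites.
This is plain Java, not the actual Hive hook interfaces; the class and
method names are illustrative only:

```java
import java.util.concurrent.*;

public class HookTimeoutSketch {

    // Runs a (stand-in) hook on a separate thread with a timeout.
    // Returns true if the hook had to be cancelled.
    static boolean runWithTimeout(Runnable hook, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> f = pool.submit(hook);
        try {
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            // cancel(true) only *interrupts* the thread; a hook that
            // never checks its interrupt status keeps running in the
            // background and still consumes resources.
            f.cancel(true);
            return true;
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulates a hook stuck in retry logic; real Hive hooks
        // implement ExecuteWithHookContext and receive a HookContext.
        Runnable slowHook = () -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // cooperative stop
            }
        };
        System.out.println(runWithTimeout(slowHook, 200)); // prints "true"
    }
}
```

Note that the sketch only works cleanly because the stand-in hook
reacts to interruption; a hook doing blocking I/O without honoring
interrupts would survive the cancel.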

The bottom line is that there are trade-offs to consider, and with the
information I have so far I am neither for nor against this proposal.

Best,
Stamatis


On Wed, Sep 25, 2024 at 10:21 AM Zsolt Miskolczi
<zsolt.miskol...@gmail.com> wrote:
>
> Hi folks!
>
> At this point, Hive hooks are running synchronously:
>
> for (ExecuteWithHookContext hook : hooks) {
>   perfLogger.perfLogBegin(CLASS_NAME, prefix + hook.getClass().getName());
>   hook.run(hookContext);
>   perfLogger.perfLogEnd(CLASS_NAME, prefix + hook.getClass().getName());
> }
>
> My current problem is that if anything goes wrong inside a hook, it
> slows Hive down.
> In our current situation, we have a hook with retry logic in it, and it
> consumed a lot of time on the Hive side.
>
> I'm thinking about two possible solutions:
> - running the hooks asynchronously, so that Hive would not have to care
> about how long the hooks take to run
> - adding a timeout (e.g. 2 seconds) for running each hook.
>
> What are your thoughts?
>
> Thank you,
> Zsolt