Hi Andrey,

It's hard to say how much this affects users in practice, but I agree
that it sounds concerning.  Definitely worth auditing our existing
uses of 'submit' IMO.

Some cases might be tough to handle.  If an API call is intentionally
queueing up an operation to run asynchronously and then returns back
to a user, it's hard to surface that information to them after the
fact.  But we should make sure that the exception gets logged at an
absolute minimum - it sounds like that's not the case by default for
these usages of "submit" that don't hold onto the 'Future'?

Best,

Jason

On Tue, Jul 23, 2024 at 12:55 PM Andrey Bozhko <andyboz...@gmail.com> wrote:
>
> Hi all,
>
> I'd like to bring up for discussion how Solr handles failures of various
> background tasks.
>
>
> Typically with an ExecutorService, the task can be offloaded to a
> background thread via `execute(...)` or `submit(...)` methods:
> - if using `execute(Runnable)` method, any exception thrown by the task
> (assuming that the task doesn't have a try-catch) is intercepted and
> handled by the thread's UncaughtExceptionHandler - e.g., printed to console;
> - if using `submit(Callable<T>)` method, the caller *must* hold on to the
> future instance returned from the method, as the task result (or the task
> failure) can only be
> retrieved by invoking `get()` on the future instance.
>
>
> That said, there are quite a few places in Solr codebase where the
> background task is created by invoking `submit(...)` method but which do
> not retan any reference to the returned future.
> So if the background task fails for any reason, the failure will go
> completely unnoticed.
>
> Some of these places are:
> - CoreContainer#runAsync (
> https://github.com/apache/solr/blob/06950c656f21577db624102b913fb659ef1f0306/solr/core/src/java/org/apache/solr/core/CoreContainer.java#L2588
> )
> - SolrZkClient.ProcessWatchWithExecutor#process (
> https://github.com/apache/solr/blob/06950c656f21577db624102b913fb659ef1f0306/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L1077-L1085
> )
>
> For example, CoreContainer#runAsync may be used to asynchronously reload
> the collection schema in certain cases - so if the reloading fails, I
> imagine the users would want to be aware of the failure and not let it go
> unnoticed.
>
>
> Does any of the above describe a real issue? Well, so far I tried searching
> the codebase for usage of ExecutorService `submit(...)` methods and
> replacing them with `execute(...)` where it makes sense - and then running
> the tests. Doing so broke 200+ tests due to uncaught exceptions in
> background threads. But I did not go through those uncaught exceptions to
> see which ones indicate a real issue and which ones are harmless.
>
> Thoughts?
>
>
> Best,
> Andrey Bozhko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org

Reply via email to