[
https://issues.apache.org/jira/browse/SOLR-17744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris M. Hostetter updated SOLR-17744:
--------------------------------------
Attachment: SOLR-17744.patch
Assignee: Chris M. Hostetter
Status: Open (was: Open)
I'm attaching a patch that basically re-creates Jetty-12's {{graceful.mod}} but
using {{StatisticsHandler}}.
So far in my (minimal) testing this patch does the job I hoped it would:
* clients with non-distrib in-flight requests to a solr node that is being
shutdown:
** no longer get errors – the requests finish successfully
** true for either non-distrib requests, or single-shard collections
* clients with (multi-shard) distributed in-flight requests to a solr node
that is being shutdown:
** _ALSO_ no longer get connection errors – the requests finish successfully
** I do sometimes see {{org.eclipse.jetty.io.EofException: Closed}} errors
in the logs, but it seems like it only happens when the sub-shard requests are
being sent to the same node that's being shutdown?
*** but the requests still seem to finish successfully which is weird.
** No such errors seem to be logged when sub-shard requests are in-flight to
other nodes (not being shutdown)
* clients with (multi-shard) distributed in-flight requests to a solr node
that is _NOT_ being shutdown, but is sending sub-shard requests to a node being
shutdown:
** _Still_ don't get connection errors – there's no reason they ever would
** But they _CAN_ still get 500 errors
*** It looks like internally {{solr.SearchHandler}} doesn't deal well
w/rejection when a sub-shard request gets a {{java.net.ConnectException:
Connection refused}}
While there are certainly other improvements we can add to Solr to make our own
internal code work better on graceful shutdown (including re-thinking how early
we de-register from live nodes / cluster state) I really think this jetty level
improvement is worth putting in.
(And FWIW: the jetty-12 {{GracefulHandler}} looks like it would be pretty easy
to clone/backport if folks feel like the "stats" part of {{StatisticsHandler}}
is too much overhead)
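
For readers less familiar with the idea, the "refuse new requests while letting existing ones finish" behavior discussed above can be sketched roughly as follows. This is a JDK-only illustration and not the attached patch, nor Jetty's actual implementation; the class name {{GracefulGate}} and its methods are hypothetical and exist only for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of graceful draining; not Jetty or Solr code. */
final class GracefulGate {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicBoolean shuttingDown = new AtomicBoolean();
    private final CompletableFuture<Void> drained = new CompletableFuture<>();

    /** Called when a request arrives; returns false if it must be refused. */
    boolean tryEnter() {
        // Count first, then check the flag, so shutdown() cannot miss us.
        inFlight.incrementAndGet();
        if (shuttingDown.get()) {
            exit(); // undo the count; this request is refused
            return false;
        }
        return true;
    }

    /** Called when an admitted request has finished writing its response. */
    void exit() {
        if (inFlight.decrementAndGet() == 0 && shuttingDown.get()) {
            drained.complete(null);
        }
    }

    /** Starts shutdown; the returned future completes once nothing is in flight. */
    CompletableFuture<Void> shutdown() {
        shuttingDown.set(true);
        if (inFlight.get() == 0) {
            drained.complete(null);
        }
        return drained;
    }
}
```

A caller would typically wait on the future returned by {{shutdown()}} with a timeout before closing connections, so that a stuck request cannot block the stop indefinitely.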
> Solr shutdown does not graceful close Jetty requests/connections
> ----------------------------------------------------------------
>
> Key: SOLR-17744
> URL: https://issues.apache.org/jira/browse/SOLR-17744
> Project: Solr
> Issue Type: New Feature
> Reporter: Chris M. Hostetter
> Assignee: Chris M. Hostetter
> Priority: Major
> Attachments: SOLR-17744.patch
>
>
> Solr does a lot of work internally (via things like SolrCore reference
> counting) to ensure that we "finish" in-flight requests on orderly shutdown
> (ie: when the user has issued a "stop" command) – but it does not appear that
> we are doing anything to ensure that *Jetty* managed resources will also wait
> for in process requests to finish.
> In particular, Jetty seems to abruptly close any existing & active network
> connections to clients, even as Solr continues to process those requests and
> try to write out the responses.
> There are Jetty features to ensure that shutdown is genuinely "graceful"
> (refusing new requests while letting existing ones finish) but Solr doesn't
> appear to use/enable these features:
> * In Jetty 10 & 11, this is apparently done using the {{StatisticsHandler}}
> (as a wrapper around the main handler collection I think?)
> ** [https://github.com/jetty/jetty.project/issues/2076#issuecomment-353578130]
> ** [https://javadoc.jetty.org/jetty-11/org/eclipse/jetty/server/handler/StatisticsHandler.html]
> ** [https://jetty.org/docs/jetty/11/programming-guide/server/http.html#handler-use-util-stats-handler]
> * In Jetty 12+ there is a {{graceful}} module that provides a
> {{GracefulHandler}} (which seems like a slightly more robust version of what
> {{StatisticsHandler}} does in jetty-10, but with less statistics tracking
> overhead)
> ** [https://jetty.org/docs/jetty/12/operations-guide/start/index.html#stop-graceful]
> ** [https://jetty.org/docs/jetty/12/operations-guide/modules/standard.html#graceful]
>
> The net result is that even during planned shutdown (or restart) of Solr
> nodes, clients can get lots of errors.