[ https://issues.apache.org/jira/browse/SOLR-17744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944850#comment-17944850 ]
Chris M. Hostetter commented on SOLR-17744:
-------------------------------------------
Reproducing this problem is pretty easy.
* spin up the {{-e cloud}} example
* index a few million docs
** I used a few randomly generated long fields
* use some shell scripting so that a handful of concurrent loops issue the
same expensive/slow query over and over to {{localhost:7574}} (exiting on
failure)
** I used function queries & sorts wrapped around some nested {{scale()}}
functions
* run {{./bin/solr stop -p 7574}} while those curl loops are running
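The loops above were shell/curl. If a Java equivalent is easier to rerun, something like the following is a rough sketch (not the script actually used). It runs N concurrent loops that send the same slow query to {{localhost:7574}} and stops each loop on its first failure. Assumptions: the {{gettingstarted}} collection (taken from the log excerpt), a {{q}}/{{sort}} adapted from the logged shard request, and the hypothetical class/method names {{SlowQueryLoops}} / {{selectUri}}.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SlowQueryLoops {
    // Assumption: the -e cloud example's "gettingstarted" collection on the node being stopped.
    static final String BASE = "http://localhost:7574/solr/gettingstarted/select";

    // Stand-in query/sort adapted from the logged shard request, not the original script.
    static final String Q =
        "{!func}scale(product(scale(e_l,-88888,1234567),scale(f_l,-25,99999999)),-12345678,987654321)";
    static final String SORT =
        "scale(product(scale(b_l,-88888,1234567),scale(a_l,-25,99999999)),-12345678,987654321) asc,"
        + "scale(product(scale(c_l,-88888,1234567),scale(d_l,-25,99999999)),-12345678,987654321) desc";

    /** Builds a /select URI, URL-encoding the q and sort values. */
    static URI selectUri(String base, String q, String sort) {
        return URI.create(base
            + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
            + "&sort=" + URLEncoder.encode(sort, StandardCharsets.UTF_8)
            + "&fl=id,score");
    }

    public static void main(String[] args) throws InterruptedException {
        int loops = args.length > 0 ? Integer.parseInt(args[0]) : 4;
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(selectUri(BASE, Q, SORT)).GET().build();
        ExecutorService pool = Executors.newFixedThreadPool(loops);
        for (int i = 0; i < loops; i++) {
            final int id = i;
            pool.submit(() -> {
                // Repeat the same slow query until the first failure, then end this loop.
                for (long n = 1; ; n++) {
                    try {
                        HttpResponse<String> rsp = client.send(req, HttpResponse.BodyHandlers.ofString());
                        if (rsp.statusCode() != 200) {
                            System.out.printf("loop %d: HTTP %d on request %d%n", id, rsp.statusCode(), n);
                            return;
                        }
                    } catch (IOException e) {
                        // e.g. ConnectException once the port is closed, or an abruptly closed connection
                        System.out.printf("loop %d: %s on request %d%n", id, e, n);
                        return;
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }
}
```

Start it, then run {{./bin/solr stop -p 7574}} while the loops are busy and see which failure each loop reports.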
Examples of the types of errors you might get from your curl loops...
{noformat}
curl: (52) Empty reply from server
curl: (18) transfer closed with outstanding read data remaining
curl: (7) Failed to connect to localhost port 7574: Connection refused
{noformat}
...that last one, "Failed to connect", is really the only valid error curl
should report if Solr+Jetty are both genuinely doing a "graceful" shutdown.
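To spell out why only that one is acceptable: curl exit code 7 means nothing accepted the TCP connection, so the request never reached Solr. Codes 18 and 52 mean the connection was accepted and then dropped partway through (or before any reply), which a graceful shutdown should never do. A tiny sketch of that classification (the class and method names are hypothetical; the code meanings are curl's documented exit codes):

```java
import java.util.Map;

public class CurlExitCodes {
    // Meanings of the three curl exit codes quoted above.
    static final Map<Integer, String> MEANING = Map.of(
        7, "Failed to connect: nothing accepted the TCP connection",
        18, "Partial transfer: connection closed before the full response arrived",
        52, "Empty reply: connection closed before any response bytes arrived");

    /**
     * Only 7 means the request never reached the server. 18 and 52 mean the server
     * accepted the request and then dropped the connection mid-flight.
     */
    static boolean consistentWithGracefulShutdown(int curlExitCode) {
        return curlExitCode == 7;
    }

    public static void main(String[] args) {
        for (int code : new int[] {52, 18, 7}) {
            System.out.printf("curl (%d) graceful-ok=%b  %s%n",
                code, consistentWithGracefulShutdown(code), MEANING.get(code));
        }
    }
}
```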
Here's an example of what you'll see in the Solr logs...
{noformat}
2025-04-15 00:09:40.129 INFO (ShutdownMonitor) [c: s: r: x: t:] o.e.j.s.Server Stopped Server@64337702{STOPPING}[10.0.20,sto=0]
2025-04-15 00:09:40.135 INFO (ShutdownMonitor) [c: s: r: x: t:] o.e.j.s.AbstractConnector Stopped ServerConnector@470a696f{HTTP/1.1, (http/1.1, h2c)}{127.0.0.1:7574}
2025-04-15 00:09:40.161 INFO (qtp1631119258-39-localhost-58) [c:gettingstarted s:shard2 r:core_node5 x:gettingstarted_shard2_replica_n2 t:localhost-58] o.a.s.c.S.Request webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=16388&start=0&fsv=true&sort=scale(product(scale(b_l,-88888,1234567),scale(a_l,-25,99999999)),-12345678,987654321)+asc,+scale(product(scale(c_l,-88888,1234567),scale(d_l,-25,99999999)),-12345678,987654321)+desc&rows=10000&rid=localhost-58&version=2&q={!func}scale(product(scale(e_l,-88888,1234567),scale(f_l,-25,99999999)),-12345678,987654321)&omitHeader=false&NOW=1744675779392&isShard=true&wt=javabin} hits=500400 status=0 QTime=767
2025-04-15 00:09:40.232 INFO (qtp1631119258-39-localhost-58) [c:gettingstarted s:shard2 r:core_node5 x:gettingstarted_shard2_replica_n2 t:localhost-58] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down => org.eclipse.jetty.io.EofException: Closed
	at org.eclipse.jetty.server.HttpOutput.checkWritable(HttpOutput.java:756)
org.eclipse.jetty.io.EofException: Closed
	at org.eclipse.jetty.server.HttpOutput.checkWritable(HttpOutput.java:756) ~[jetty-server-10.0.20.jar:10.0.20]
	at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:780) ~[jetty-server-10.0.20.jar:10.0.20]
	at org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:157) ~[?:?]
	at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:207) ~[?:?]
	at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:200) ~[?:?]
	at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:170) ~[?:?]
	at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:58) ~[?:?]
	at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:59) ~[?:?]
	at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:1031) ~[?:?]
	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:621) ~[?:?]
...
2025-04-15 00:09:40.240 INFO (ShutdownMonitor) [c: s: r: x: t:] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1562912969
2025-04-15 00:09:40.241 INFO (ShutdownMonitor) [c: s: r: x: t:] o.a.s.c.ZkController Remove node as live in ZooKeeper:/live_nodes/localhost:7574_solr
2025-04-15 00:09:40.254 INFO (ShutdownMonitor) [c: s: r: x: t:] o.a.s.c.ZkController Publish this node as DOWN...
2025-04-15 00:09:40.254 INFO (ShutdownMonitor) [c: s: r: x: t:] o.a.s.c.ZkController Publish node=localhost:7574_solr as DOWN
...
{noformat}
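The timestamps in that log excerpt are enough to put rough numbers on the problem. A small sketch (hypothetical class name) computes the gaps, assuming the log timestamps are UTC and that the {{NOW}} param on the logged shard request ({{rid=localhost-58}}, the same thread that later fails to write) approximates when the request started:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class ShutdownTimeline {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    /** Parses a timestamp in the format used by the Solr log lines. */
    static LocalDateTime log(String ts) {
        return LocalDateTime.parse(ts, FMT);
    }

    /** Converts the request's NOW param (epoch millis) to UTC; assumes the log is in UTC. */
    static LocalDateTime fromNowParam(long epochMillis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
    }

    static long millisBetween(LocalDateTime from, LocalDateTime to) {
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        LocalDateTime requestNow = fromNowParam(1744675779392L);         // NOW= on the shard request
        LocalDateTime connectorStopped = log("2025-04-15 00:09:40.135"); // Stopped ServerConnector
        LocalDateTime writeFailed = log("2025-04-15 00:09:40.232");      // Unable to write response
        LocalDateTime liveNodeRemoved = log("2025-04-15 00:09:40.241");  // Remove node as live in ZooKeeper
        LocalDateTime publishedDown = log("2025-04-15 00:09:40.254");    // Publish node=... as DOWN

        System.out.println("request NOW -> connector stopped: " + millisBetween(requestNow, connectorStopped) + " ms");
        System.out.println("connector stopped -> write failed: " + millisBetween(connectorStopped, writeFailed) + " ms");
        System.out.println("connector stopped -> live_nodes removed: " + millisBetween(connectorStopped, liveNodeRemoved) + " ms");
        System.out.println("connector stopped -> published DOWN: " + millisBetween(connectorStopped, publishedDown) + " ms");
    }
}
```

Run as-is, it reports 743, 97, 106 and 119 ms. Under those assumptions the request had been in flight for roughly three quarters of a second when the connector stopped, its response write failed about 100 ms later, and the node was only removed from {{/live_nodes}} after that.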
Note that Jetty logs {{Stopped ServerConnector}} almost immediately, but Solr is
still using Jetty request threads like {{qtp1631119258-39-localhost-58}} to
spend time/CPU processing requests, only for Jetty's {{HttpOutput}} to be
unable to write the response to the client because the connection has already
been closed.
(Heck: Solr hasn't even had a chance to update the node's status in ZK when
{{Stopped ServerConnector}} happens, so not only are in-flight connections
being aborted, but we're still advertising this node as available to SolrJ
clients.)
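For contrast, here is a minimal sketch of the bookkeeping a genuinely "graceful" shutdown needs: count in-flight requests, refuse new ones once shutdown begins, and wait (bounded by a timeout) for the count to drain before connections are closed. It is a hypothetical, self-contained model; {{GracefulGate}} is not a Jetty or Solr class. Per the issue description, in Jetty itself this role belongs to {{StatisticsHandler}} (Jetty 10/11) or {{GracefulHandler}} (Jetty 12).

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical model (not Jetty or Solr code) of graceful-shutdown bookkeeping:
 * count in-flight requests, refuse new ones once shutdown starts, and let the
 * shutdown wait, up to a timeout, for the in-flight count to drain to zero.
 */
public class GracefulGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition drained = lock.newCondition();
    private int inFlight = 0;
    private boolean shuttingDown = false;

    /** Call when a request arrives; false means "refuse it, we are shutting down". */
    public boolean tryEnter() {
        lock.lock();
        try {
            if (shuttingDown) return false;
            inFlight++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Call once the response is fully written; must pair with a successful tryEnter(). */
    public void exit() {
        lock.lock();
        try {
            if (--inFlight == 0) drained.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Stop admitting requests, then wait for in-flight ones; true if fully drained in time. */
    public boolean shutdownAndAwait(long timeout, TimeUnit unit) throws InterruptedException {
        lock.lock();
        try {
            shuttingDown = true;
            long nanos = unit.toNanos(timeout);
            while (inFlight > 0) {
                if (nanos <= 0) return false;
                nanos = drained.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    public int inFlight() {
        lock.lock();
        try {
            return inFlight;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GracefulGate gate = new GracefulGate();
        gate.tryEnter(); // one slow request is in flight when "stop" is issued
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(200); // stand-in for query execution + response write
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                gate.exit();
            }
        });
        worker.start();
        boolean drained = gate.shutdownAndAwait(5, TimeUnit.SECONDS);
        // Only now would it be safe to close the connector and its connections.
        System.out.println("drained=" + drained + ", new request admitted=" + gate.tryEnter());
    }
}
```

The demo prints {{drained=true, new request admitted=false}}. In this model the connector would only be closed after {{shutdownAndAwait}} returns, so a request like {{localhost-58}} would finish writing its response instead of hitting {{EofException}}.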
> Solr shutdown does not graceful close Jetty requests/connections
> ----------------------------------------------------------------
>
> Key: SOLR-17744
> URL: https://issues.apache.org/jira/browse/SOLR-17744
> Project: Solr
> Issue Type: New Feature
> Reporter: Chris M. Hostetter
> Priority: Major
>
> Solr does a lot of work internally (via things like SolrCore reference
> counting) to ensure that we "finish" in-flight requests on orderly shutdown
> (i.e., when the user has issued a "stop" command), but it does not appear that
> we are doing anything to ensure that *Jetty*-managed resources will also wait
> for in-flight requests to finish.
> In particular, Jetty seems to abruptly close any existing & active network
> connections to clients, even as Solr continues to process those requests and
> try to write out the responses.
> There are Jetty features to ensure that shutdown is genuinely "graceful"
> (refusing new requests while letting existing ones finish), but Solr doesn't
> appear to use/enable these features:
> * In Jetty 10 & 11, this is apparently done using the {{StatisticsHandler}}
> (as a wrapper around the main handler collection, I think?)
> ** [https://github.com/jetty/jetty.project/issues/2076#issuecomment-353578130]
> ** [https://javadoc.jetty.org/jetty-11/org/eclipse/jetty/server/handler/StatisticsHandler.html]
> ** [https://jetty.org/docs/jetty/11/programming-guide/server/http.html#handler-use-util-stats-handler]
> * In Jetty 12+, there is a {{graceful}} module that provides a
> {{GracefulHandler}} (which seems like a slightly more robust version of what
> {{StatisticsHandler}} does in Jetty 10, but with less statistics-tracking
> overhead)
> ** [https://jetty.org/docs/jetty/12/operations-guide/start/index.html#stop-graceful]
> ** [https://jetty.org/docs/jetty/12/operations-guide/modules/standard.html#graceful]
>
> The net result is that even during planned shutdown (or restart) of Solr
> nodes, clients can get lots of errors.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)