Jason Gerlowski created SOLR-17515:
--------------------------------------
Summary: Recovery fails in Solr 9.7.0 if basic-auth is enabled
Key: SOLR-17515
URL: https://issues.apache.org/jira/browse/SOLR-17515
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 9.7
Reporter: Jason Gerlowski
Several reporters on the users@ list recently shared a bug they noticed on
upgrading to Solr 9.7. Replicas would try to recover, but fail with a
NullPointerException:
{code}
2024-09-18 09:36:31.238 ERROR (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8983_solr dovecot_fts_shard5_replica_n61 dovecot_fts shard5 core_node62) [c:dovecot_fts s:shard5 r:core_node62 x:dovecot_fts_shard5_replica_n61 t:] o.a.s.c.RecoveryStrategy Error while trying to recover. core=dovecot_fts_shard5_replica_n61 => java.lang.NullPointerException: Cannot invoke "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" because "this.authenticationStore" is null
    at org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318)
java.lang.NullPointerException: Cannot invoke "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" because "this.authenticationStore" is null
    at org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
    at org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:97) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
    at org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:85) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
    at org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.httpClientBuilderSetup(Http2SolrClient.java:1093) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
    at org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.build(Http2SolrClient.java:1062) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:907) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
    at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:633) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:309) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:212) ~[metrics-core-4.2.26.jar:4.2.26]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$1(ExecutorUtil.java:449) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
2024-09-18 09:36:31.238 ERROR (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8983_solr dovecot_fts_shard5_replica_n61 dovecot_fts shard5 core_node62) [c:dovecot_fts s:shard5 r:core_node62 x:dovecot_fts_shard5_replica_n61 t:] o.a.s.c.RecoveryStrategy Recovery failed - trying again... (0)
2024-09-18 09:36:31.238 INFO (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8983_solr dovecot_fts_shard5_replica_n61 dovecot_fts shard5 core_node62) [c:dovecot_fts s:shard5 r:core_node62 x:dovecot_fts_shard5_replica_n61 t:] o.a.s.c.RecoveryStrategy Wait [4] seconds before trying to recover again (attempt=1)
{code}
It turns out that the issue isn't specific to upgrading clusters: any 9.7.0
cluster (new or existing/upgrading) that uses basic-auth will hit this NPE
during replica recovery. The result is that replicas will fail to recover, and
sit marked as "recovering" indefinitely.
The issue can be reproduced locally in a source-checkout using the following
steps:
{code}
git checkout branch_9_7
./gradlew clean assemble
cd solr/packaging/build/solr-9.7.0-SNAPSHOT
# At prompts, I chose: 4 nodes, "gettingstarted", 1 shard, 2 replicas, "_default" configset
bin/solr start -e cloud
bin/solr post -c gettingstarted example/exampledocs/books.json
# Stop the node containing the non-leader replica
bin/solr stop -p <port>
bin/solr post -c gettingstarted example/exampledocs/books.csv
# Enable auth and trigger recovery by turning the node back on
bin/solr auth enable -type basicAuth -credentials solr:solrRocks -blockUnknown true
# This line will need to be tweaked based on which Solr node was previously stopped
"bin/solr" start --cloud -p <port> -s "example/cloud/<node>/solr" -z 127.0.0.1:9983
{code}
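After the last step, one way to confirm the symptom is to look at replica
states in the output of the Collections API CLUSTERSTATUS action and list any
replica still reported as "recovering". The following is only a rough sketch,
not part of the original report: the helper name stuck_replicas and the sample
data (node names, core node names) are made up for illustration, and it assumes
the usual cluster -> collections -> shards -> replicas layout of the
CLUSTERSTATUS JSON response.

```python
def stuck_replicas(cluster_status, collection, state="recovering"):
    """Return (shard, replica, node_name) tuples for replicas in `state`.

    `cluster_status` is the parsed JSON body of a CLUSTERSTATUS response.
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    found = []
    for shard_name, shard in sorted(shards.items()):
        for replica_name, replica in sorted(shard["replicas"].items()):
            if replica.get("state") == state:
                found.append((shard_name, replica_name, replica.get("node_name")))
    return found


# Hypothetical, trimmed-down response for the "gettingstarted" collection.
# Node names and core node names below are invented for this example.
sample = {
    "cluster": {
        "collections": {
            "gettingstarted": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node2": {
                                "state": "active",
                                "node_name": "127.0.0.1:8983_solr",
                                "leader": "true",
                            },
                            "core_node4": {
                                "state": "recovering",
                                "node_name": "127.0.0.1:7574_solr",
                            },
                        }
                    }
                }
            }
        }
    }
}

print(stuck_replicas(sample, "gettingstarted"))
```

In a real run the input would come from something like
/solr/admin/collections?action=CLUSTERSTATUS&collection=gettingstarted on a
running node, fetched with the basic-auth credentials enabled above. If the bug
is present, the restarted replica would be expected to keep showing up in the
result on repeated checks.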