[
https://issues.apache.org/jira/browse/SOLR-18144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074969#comment-18074969
]
Chris M. Hostetter commented on SOLR-18144:
-------------------------------------------
{{TestSchemaDesignerConfigSetHelper.testPersistSampleDocs}} has had a fairly
consistent 10% Jenkins failure rate since roughly the first week of March.
When it fails, it always fails on the 9x branch, and the Apache Jenkins job
retry logic says the seeds reproduce reliably.
git bisect blames
[8299d43bdf3|https://gitbox.apache.org/repos/asf?p=solr.git;h=8299d43bdf3] for
the cause...
{noformat}
$ git bisect log
# bad: [fa0547434f464a306e5433776415fd2e954fdc6d] SOLR-18192: Pin all GitHub Actions to full commit SHAs as per ASF policy (branch_9x) (#4290)
# good: [da35e1f6a288c549930194e82068d25d7ae1854b] SOLR-18107: Test fix testLogLevelHandlerOutput (#4146)
git bisect start 'fa0547434f464a306e5433776415fd2e954fdc6d' 'da35e1f6a288c549930194e82068d25d7ae1854b'
# bad: [4ee15bfcb88917114b775a2aa5c9d07a84cb6378] TikaServerExtractionBackendTest: ignore on s390x
git bisect bad 4ee15bfcb88917114b775a2aa5c9d07a84cb6378
# bad: [8299d43bdf37bc2a51d93842497327bbe86299eb] SOLR-18144: fix schema designer to auto-create .system in 9x (#4202)
git bisect bad 8299d43bdf37bc2a51d93842497327bbe86299eb
# good: [2b37abf2cd9617833c79b8491ef4e41452bddec5] Default to MockDirectoryFactory in test configs (#2598)
git bisect good 2b37abf2cd9617833c79b8491ef4e41452bddec5
# good: [bcc6ccad1a416cfa6adf8d73a112d880d31dca4b] SOLR-18146: Fix race in CircuitBreakerRegistry (#4189)
git bisect good bcc6ccad1a416cfa6adf8d73a112d880d31dca4b
# first bad commit: [8299d43bdf37bc2a51d93842497327bbe86299eb] SOLR-18144: fix schema designer to auto-create .system in 9x (#4202)
{noformat}
{noformat}
2> 611801 INFO (TEST-TestSchemaDesignerConfigSetHelper.testPersistSampleDocs-seed#[53E20FA28F998680]) [] o.a.s.SolrTestCaseJ4 ###Ending testPersistSampleDocs
> org.apache.solr.common.SolrException: Collection not found: .system
> at __randomizedtesting.SeedInfo.seed([53E20FA28F998680:7BB99F0E84BDA927]:0)
> at app//org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:938)
> at app//org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:891)
> at app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:296)
> at app//org.apache.solr.handler.designer.SchemaDesignerConfigSetHelper.postDataToBlobStore(SchemaDesignerConfigSetHelper.java:516)
> at app//org.apache.solr.handler.designer.TestSchemaDesignerConfigSetHelper.testPersistSampleDocs(TestSchemaDesignerConfigSetHelper.java:253)
> at [email protected]/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at [email protected]/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at [email protected]/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at [email protected]/java.lang.reflect.Method.invoke(Method.java:566)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at app//com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:80)
> at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at app//com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:80)
> at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at app//org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at app//org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at app//org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
> at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
> at [email protected]/java.lang.Thread.run(Thread.java:829)
2> NOTE: reproduce with: gradlew test --tests TestSchemaDesignerConfigSetHelper.testPersistSampleDocs -Dtests.seed=53E20FA28F998680 -Dtests.multiplier=2 -Dtests.locale=gu-IN -Dtests.timezone=Africa/Johannesburg -Dtests.asserts=true -Dtests.file.encoding=UTF-8
{noformat}
> Regression in Schema Designer and .system collection
> ----------------------------------------------------
>
> Key: SOLR-18144
> URL: https://issues.apache.org/jira/browse/SOLR-18144
> Project: Solr
> Issue Type: Bug
> Components: Schema Designer
> Affects Versions: 9.9, 9.10, 9.10.1
> Reporter: Jan Høydahl
> Assignee: David Smiley
> Priority: Major
> Labels: pull-request-available
> Fix For: 9.11
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> When using Schema Designer in Solr 9.9 or 9.10, it complains with an error
> message "Collection not found: .system". We first thought it was related to
> SOLR-15832 but it turns out to be a new bug. Here is an analysis by Claude
> Code which I believe is accurate.
> h2. Root Cause
> The regression was introduced by commit 82c50e4291e (SOLR-16503: Use
> Http2SolrClient in SolrClientCache, SchemaDesigner, cherry-picked to
> branch_9x on Feb 10, 2025).
> h3. What Changed
> SchemaDesignerConfigSetHelper was migrated from using raw HTTP via
> CloudLegacySolrClient.getHttpClient() to using GenericSolrRequest +
> CloudSolrClient.
> Before (CloudLegacySolrClient.getHttpClient().execute(httpPost)):
> - Bypassed CloudSolrClient's ZooKeeper collection validation entirely
> - The raw HTTP POST arrived at the Solr servlet (HttpSolrCall)
> - HttpSolrCall.autoCreateSystemColl() detected: POST + .system not in ZK →
> auto-created the collection
> - getStoredSampleDocs used raw HTTP GET; HTTP 404 was explicitly handled as
> "not found is OK"
> After (request.process(cloudClient, BLOB_STORE_ID)):
> - CloudSolrClient performs a ZooKeeper lookup before sending any HTTP request
> - If .system doesn't exist in ZK → immediately throws
> SolrException("Collection not found: .system")
> - HttpSolrCall.autoCreateSystemColl() is never reached → .system is never
> auto-created
> - getStoredSampleDocs now only catches SolrServerException, not
> SolrException — so "Collection not found" propagates up
> h3. Fix Options
> # Best for branch_9x: Before any blob operation in
> SchemaDesignerConfigSetHelper, check if .system exists and explicitly create
> it (CollectionAdminRequest.createCollection(SYSTEM_COLL, null, 1, 1)). Also
> fix getStoredSampleDocs to catch SolrException for the "not found" case (like
> the old HTTP 404 handling).
> # Also fix getStoredSampleDocs: Add SolrException to the catch block and
> return Collections.emptyList() when the collection doesn't exist, matching
> the old behavior where HTTP 404 was silently treated as "no documents stored".
> # Long-term/main: The main branch already has the proper fix — commit
> 242c1fc4e8a (SOLR-17852) migrated Schema Designer entirely to the FileStore
> API, eliminating .system / BlobStore dependency. This backport would be more
> work but more correct.
> The most targeted fix would be to ensure .system is created lazily in
> SchemaDesignerConfigSetHelper before blob reads/writes, and to also fix
> getStoredSampleDocs to handle the "collection not found" case gracefully.
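As a rough illustration of the "most targeted fix" described in the issue above, here is a minimal, self-contained Java sketch of the two behaviors: create .system lazily before a blob write, and treat "collection not found" on a read as "no documents stored". It deliberately does not use SolrJ. FakeCluster, Helper, and CollectionNotFoundException are hypothetical in-memory stand-ins for the cluster, SchemaDesignerConfigSetHelper, and SolrException respectively, and the blob name films_sample is made up. It is not the code from #4202.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LazySystemCollectionSketch {
  static final String SYSTEM_COLL = ".system";

  /** Hypothetical stand-in for the "Collection not found" SolrException. */
  static class CollectionNotFoundException extends RuntimeException {
    CollectionNotFoundException(String collection) {
      super("Collection not found: " + collection);
    }
  }

  /** Hypothetical in-memory stand-in for cluster state plus client; not a SolrJ type. */
  static class FakeCluster {
    private final Map<String, Map<String, List<String>>> collections = new HashMap<>();

    boolean exists(String collection) {
      return collections.containsKey(collection);
    }

    void create(String collection) {
      collections.putIfAbsent(collection, new HashMap<>());
    }

    // Mimics the client-side lookup: fail before any request would be sent.
    private Map<String, List<String>> require(String collection) {
      Map<String, List<String>> blobs = collections.get(collection);
      if (blobs == null) {
        throw new CollectionNotFoundException(collection);
      }
      return blobs;
    }

    void post(String collection, String blobName, List<String> docs) {
      require(collection).put(blobName, new ArrayList<>(docs));
    }

    List<String> get(String collection, String blobName) {
      return require(collection).getOrDefault(blobName, Collections.emptyList());
    }
  }

  /** Hypothetical stand-in for the helper's blob read/write paths. */
  static class Helper {
    private final FakeCluster cluster;

    Helper(FakeCluster cluster) {
      this.cluster = cluster;
    }

    // Create .system on demand, since nothing server-side will do it for us.
    void ensureSystemCollection() {
      if (!cluster.exists(SYSTEM_COLL)) {
        cluster.create(SYSTEM_COLL);
      }
    }

    void storeSampleDocs(String blobName, List<String> docs) {
      ensureSystemCollection();
      cluster.post(SYSTEM_COLL, blobName, docs);
    }

    // A missing collection on read means "nothing stored yet", like the old HTTP 404 path.
    List<String> getStoredSampleDocs(String blobName) {
      try {
        return cluster.get(SYSTEM_COLL, blobName);
      } catch (CollectionNotFoundException e) {
        return Collections.emptyList();
      }
    }
  }

  public static void main(String[] args) {
    FakeCluster cluster = new FakeCluster();
    Helper helper = new Helper(cluster);
    System.out.println(helper.getStoredSampleDocs("films_sample")); // read before .system exists
    helper.storeSampleDocs("films_sample", List.of("{\"id\":\"1\"}"));
    System.out.println(cluster.exists(SYSTEM_COLL));
    System.out.println(helper.getStoredSampleDocs("films_sample"));
  }
}
```

Note that the check-then-create step in this sketch is not atomic. In a real cluster, two concurrent callers could both see the collection as missing, so a real implementation would presumably need to tolerate an "already exists" response or wait for the collection to become active before writing. Timing of that kind is only a guess at what might explain an intermittent failure like the one above.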