kotman12 commented on code in PR #4279:
URL: https://github.com/apache/solr/pull/4279#discussion_r3110855110
##########
solr/core/src/test/org/apache/solr/handler/admin/UpgradeCoreIndexActionTest.java:
##########
@@ -365,10 +365,192 @@ public void testUpgradeCoreIndexFailsWithNestedDocuments() throws Exception {
coreName),
resp));
- // Verify the exception message indicates nested documents are not supported
+ // Verify the exception message indicates child documents are not supported
assertThat(
thrown.getMessage(),
- containsString("does not support indexes containing nested documents"));
+ containsString("does not support indexes containing child documents"));
+ } finally {
+ admin.shutdown();
+ admin.close();
+ }
+ }
+
+ // --- Child docs detection tests ---
+ //
+ // These tests verify that the child document detection in the upgrade path
+ // correctly distinguishes between genuine child docs and non-child docs,
+ // even in the presence of updates and deletes that leave deleted documents
+ // in segments (since NoMergePolicy prevents segment merges from purging them).
+
+ @Test
+ public void testChildDocsDetection_noChildDocsJustAdd() throws Exception {
+ for (int i = 0; i < 10; i++) {
+ assertU(adoc("id", String.valueOf(i), "title", "doc" + i));
+ }
+ assertU(commit("openSearcher", "true"));
+
+ assertUpgradeDoesNotDetectChildDocs();
+ }
+
+ @Test
+ public void testChildDocsDetection_withChildDocsJustAdd() throws Exception {
+ addChildDoc("100", "101");
+ addChildDoc("200", "201");
+ assertU(commit("openSearcher", "true"));
+
+ assertUpgradeDetectsChildDocs();
+ }
+
+ @Test
+ public void testChildDocsDetection_noChildDocsWithWithinCommitUpdates() throws Exception {
+ // Add docs and then update some of them BEFORE committing, so both the old
+ // (deleted) and new versions end up in the same flushed segment.
+ // With NoMergePolicy and a 100MB RAM buffer (from SolrIndexConfig defaults),
+ // no flush or merge occurs mid-batch, guaranteeing co-location.
+ //
+ // In the resulting segment, _root_ Terms stats will show:
+ // Terms.size() = N (unique _root_ values, one per unique id)
+ // Terms.getDocCount() = N + updates (includes deleted doc entries)
+ //
+ // A naive check (uniqueRootValues < docsWithRoot) may false-positive here
+ // because multiple docs share the same _root_ value within the segment.
+ for (int i = 0; i < 10; i++) {
+ assertU(adoc("id", String.valueOf(i), "title", "doc" + i));
+ }
+ // Re-add a few docs with the same ids (within-commit updates)
+ for (int i = 0; i < 3; i++) {
+ assertU(adoc("id", String.valueOf(i), "title", "updated_doc" + i));
+ }
+ assertU(commit("openSearcher", "true"));
+
+ // 10 live docs — the updates replaced 3 docs in-place
+ assertQ(req("q", "*:*"), "//result[@numFound='10']");
+ assertUpgradeDoesNotDetectChildDocs();
+ }
+
+ @Test
+ public void testChildDocsDetection_withChildDocsWithWithinCommitUpdates() throws Exception {
Review Comment:
Consider three statistics that look similar but differ subtly for
"id-like" fields: `terms.size`, `terms.docCount`, and `unique(id)` (i.e. a facet
computed through the searcher). When every deletion comes from an update,
meaning each deletion of a Solr id is paired with an addition of that same id,
you have:
`terms.docCount > terms.size = unique(id)`
When instead none of the deletions are part of an update, you have:
`terms.docCount = terms.size > unique(id)`
I want to add tests that protect against the wrong kind of refactor, which
IMO is an easy mistake to make given the subtleties of the different kinds of
counts. They confused me a bit until I considered all the cases mentioned
above.
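
To make the two cases concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene or Solr code: `TermStatsSketch`, `Segment`, and `Doc` are hypothetical names made up for illustration. It assumes a single segment that never merges, so deleted docs stay in place and still count toward the term-level statistics, while `uniqueLive()` stands in for `unique(id)` as seen through the searcher.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TermStatsSketch {

  /** Hypothetical stand-in for one indexed document; not a Lucene or Solr type. */
  static final class Doc {
    final String id;
    boolean deleted;

    Doc(String id) {
      this.id = id;
    }
  }

  /** Toy segment that never merges, so deleted docs are never purged. */
  static final class Segment {
    final List<Doc> docs = new ArrayList<>();

    /** Appends a new doc without touching existing ones. */
    void add(String id) {
      docs.add(new Doc(id));
    }

    /** Marks every live doc with this id as deleted; the entry stays in the segment. */
    void delete(String id) {
      for (Doc d : docs) {
        if (!d.deleted && d.id.equals(id)) {
          d.deleted = true;
        }
      }
    }

    /** An update is modeled as delete-then-add of the same id. */
    void update(String id) {
      delete(id);
      add(id);
    }

    /** Models terms.size: distinct id values, deleted docs included. */
    long termsSize() {
      Set<String> ids = new HashSet<>();
      for (Doc d : docs) {
        ids.add(d.id);
      }
      return ids.size();
    }

    /** Models terms.docCount: docs carrying an id, deleted docs included. */
    long docCount() {
      return docs.size();
    }

    /** Models unique(id): distinct id values among live docs only. */
    long uniqueLive() {
      Set<String> ids = new HashSet<>();
      for (Doc d : docs) {
        if (!d.deleted) {
          ids.add(d.id);
        }
      }
      return ids.size();
    }
  }

  /** Case 1: ten adds, then three of them updated (every delete paired with an add). */
  static Segment updatesOnly() {
    Segment seg = new Segment();
    for (int i = 0; i < 10; i++) {
      seg.add(String.valueOf(i));
    }
    for (int i = 0; i < 3; i++) {
      seg.update(String.valueOf(i));
    }
    return seg;
  }

  /** Case 2: ten adds, then three plain deletes (no delete paired with an add). */
  static Segment deletesOnly() {
    Segment seg = new Segment();
    for (int i = 0; i < 10; i++) {
      seg.add(String.valueOf(i));
    }
    for (int i = 0; i < 3; i++) {
      seg.delete(String.valueOf(i));
    }
    return seg;
  }

  public static void main(String[] args) {
    for (Segment seg : List.of(updatesOnly(), deletesOnly())) {
      System.out.println(
          "docCount=" + seg.docCount()
              + " size=" + seg.termsSize()
              + " unique=" + seg.uniqueLive());
    }
  }
}
```

In this model the updates-only segment gives 13 > 10 = 10 and the deletes-only segment gives 10 = 10 > 7, matching the two orderings above. A check that compares only two of the three counts would pass one case and silently break on the other, which is the kind of refactor the extra tests are meant to catch.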
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]