kotman12 commented on code in PR #4279:
URL: https://github.com/apache/solr/pull/4279#discussion_r3087264699


##########
solr/core/src/test/org/apache/solr/handler/admin/UpgradeCoreIndexActionTest.java:
##########
@@ -365,10 +365,192 @@ public void 
testUpgradeCoreIndexFailsWithNestedDocuments() throws Exception {
                           coreName),
                       resp));
 
-      // Verify the exception message indicates nested documents are not 
supported
+      // Verify the exception message indicates child documents are not 
supported
       assertThat(
           thrown.getMessage(),
-          containsString("does not support indexes containing nested 
documents"));
+          containsString("does not support indexes containing child 
documents"));
+    } finally {
+      admin.shutdown();
+      admin.close();
+    }
+  }
+
+  // --- Child docs detection tests ---
+  //
+  // These tests verify that the child document detection in the upgrade path
+  // correctly distinguishes between genuine child docs and non-child docs,
+  // even in the presence of updates and deletes that leave deleted documents
+  // in segments (since NoMergePolicy prevents segment merges from purging 
them).
+
+  @Test
+  public void testChildDocsDetection_noChildDocsJustAdd() throws Exception {
+    for (int i = 0; i < 10; i++) {
+      assertU(adoc("id", String.valueOf(i), "title", "doc" + i));
+    }
+    assertU(commit("openSearcher", "true"));
+
+    assertUpgradeDoesNotDetectChildDocs();
+  }
+
+  @Test
+  public void testChildDocsDetection_withChildDocsJustAdd() throws Exception {
+    addChildDoc("100", "101");
+    addChildDoc("200", "201");
+    assertU(commit("openSearcher", "true"));
+
+    assertUpgradeDetectsChildDocs();
+  }
+
+  @Test
+  public void testChildDocsDetection_noChildDocsWithWithinCommitUpdates() 
throws Exception {
+    // Add docs and then update some of them BEFORE committing, so both the old
+    // (deleted) and new versions end up in the same flushed segment.
+    // With NoMergePolicy and a 100MB RAM buffer (from SolrIndexConfig 
defaults),
+    // no flush or merge occurs mid-batch, guaranteeing co-location.
+    //
+    // In the resulting segment, _root_ Terms stats will show:
+    //   Terms.size()     = N  (unique _root_ values, one per unique id)
+    //   Terms.getDocCount() = N + updates  (includes deleted doc entries)
+    //
+    // A naive check (uniqueRootValues < docsWithRoot) may false-positive here
+    // because multiple docs share the same _root_ value within the segment.
+    for (int i = 0; i < 10; i++) {
+      assertU(adoc("id", String.valueOf(i), "title", "doc" + i));
+    }
+    // Re-add a few docs with the same ids (within-commit updates)
+    for (int i = 0; i < 3; i++) {
+      assertU(adoc("id", String.valueOf(i), "title", "updated_doc" + i));
+    }
+    assertU(commit("openSearcher", "true"));
+
+    // 10 live docs — the updates replaced 3 docs in-place
+    assertQ(req("q", "*:*"), "//result[@numFound='10']");
+    assertUpgradeDoesNotDetectChildDocs();
+  }
+
+  @Test
+  public void testChildDocsDetection_withChildDocsWithWithinCommitUpdates() 
throws Exception {

Review Comment:
   In the current algorithm they are equivalent but things can change so this 
is more for regression prevention. I think it's possible to imagine some future 
code path that only gets exercised when `unique(id) == idTerms.size()` (which 
is the case only if all deletes are via update) or vice versa. We can also 
randomize the counts of docs added vs deleted vs updated if we are really 
worried about the incremental build time here. All that being said, it feels 
like overkill for the `UpgradeCoreIndex` tests because this isn't even the main 
point of this class. 
   
   That's why I proposed moving the child doc check to a central location 
(where they can also be exposed by some info endpoint as well). I know @dsmiley 
expressed interest in exposing the child doc check via some info endpoint in 
the original PR (luke or maybe in core/collection status?). If we were to go 
that direction then we'd probably want to move this check to some centralized 
utility.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to