kotman12 commented on code in PR #4279:
URL: https://github.com/apache/solr/pull/4279#discussion_r3110855110


##########
solr/core/src/test/org/apache/solr/handler/admin/UpgradeCoreIndexActionTest.java:
##########
@@ -365,10 +365,192 @@ public void testUpgradeCoreIndexFailsWithNestedDocuments() throws Exception {
                           coreName),
                       resp));
 
-      // Verify the exception message indicates nested documents are not supported
+      // Verify the exception message indicates child documents are not supported
       assertThat(
           thrown.getMessage(),
-          containsString("does not support indexes containing nested documents"));
+          containsString("does not support indexes containing child documents"));
+    } finally {
+      admin.shutdown();
+      admin.close();
+    }
+  }
+
+  // --- Child docs detection tests ---
+  //
+  // These tests verify that the child document detection in the upgrade path
+  // correctly distinguishes between genuine child docs and non-child docs,
+  // even in the presence of updates and deletes that leave deleted documents
+  // in segments (since NoMergePolicy prevents segment merges from purging them).
+
+  @Test
+  public void testChildDocsDetection_noChildDocsJustAdd() throws Exception {
+    for (int i = 0; i < 10; i++) {
+      assertU(adoc("id", String.valueOf(i), "title", "doc" + i));
+    }
+    assertU(commit("openSearcher", "true"));
+
+    assertUpgradeDoesNotDetectChildDocs();
+  }
+
+  @Test
+  public void testChildDocsDetection_withChildDocsJustAdd() throws Exception {
+    addChildDoc("100", "101");
+    addChildDoc("200", "201");
+    assertU(commit("openSearcher", "true"));
+
+    assertUpgradeDetectsChildDocs();
+  }
+
+  @Test
+  public void testChildDocsDetection_noChildDocsWithWithinCommitUpdates() throws Exception {
+    // Add docs and then update some of them BEFORE committing, so both the old
+    // (deleted) and new versions end up in the same flushed segment.
+    // With NoMergePolicy and a 100MB RAM buffer (from SolrIndexConfig defaults),
+    // no flush or merge occurs mid-batch, guaranteeing co-location.
+    //
+    // In the resulting segment, _root_ Terms stats will show:
+    //   Terms.size()     = N  (unique _root_ values, one per unique id)
+    //   Terms.getDocCount() = N + updates  (includes deleted doc entries)
+    //
+    // A naive check (uniqueRootValues < docsWithRoot) may false-positive here
+    // because multiple docs share the same _root_ value within the segment.
+    for (int i = 0; i < 10; i++) {
+      assertU(adoc("id", String.valueOf(i), "title", "doc" + i));
+    }
+    // Re-add a few docs with the same ids (within-commit updates)
+    for (int i = 0; i < 3; i++) {
+      assertU(adoc("id", String.valueOf(i), "title", "updated_doc" + i));
+    }
+    assertU(commit("openSearcher", "true"));
+
+    // 10 live docs — the updates replaced 3 docs in-place
+    assertQ(req("q", "*:*"), "//result[@numFound='10']");
+    assertUpgradeDoesNotDetectChildDocs();
+  }
+
+  @Test
+  public void testChildDocsDetection_withChildDocsWithWithinCommitUpdates() throws Exception {

Review Comment:
   Consider three statistics that are similar but subtly different for "id-like" fields: `terms.size`, `terms.docCount`, and `unique(id)` (i.e. a facet via the searcher). When all your deletions come from updates, meaning every deletion of a Solr doc id is paired with an addition of that same Solr doc id, then you have:
   
   `terms.docCount > terms.size = unique(id)`
   
   However, when none of the deletions are part of updates, then you have:
   
   `terms.docCount = terms.size > unique(id)`
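   The two relations above can be checked with the following hypothetical single-segment model of the three statistics for an id-like field. It is plain Java with no Lucene dependency. `IdStatsModel`, `segment(...)`, and the helper methods are illustrative names, not Solr or Lucene APIs. The model makes two assumptions. First, deleted documents keep their terms until a merge, as with NoMergePolicy. Second, an updated document and its original version land in the same segment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical single-segment model of three "id-like" statistics; not Lucene/Solr code. */
public class IdStatsModel {

  /** A document slot in the segment; deleted docs keep their id term until a merge. */
  record Doc(String id, boolean live) {}

  /** Analogue of Terms.size(): distinct indexed ids, deleted docs included. */
  static int termsSize(List<Doc> seg) {
    Set<String> ids = new HashSet<>();
    for (Doc d : seg) {
      ids.add(d.id());
    }
    return ids.size();
  }

  /** Analogue of Terms.getDocCount(): docs with an id term, deleted docs included. */
  static int termsDocCount(List<Doc> seg) {
    return seg.size();
  }

  /** Analogue of unique(id) from a searcher-based facet: distinct ids among live docs only. */
  static int uniqueLiveIds(List<Doc> seg) {
    Set<String> ids = new HashSet<>();
    for (Doc d : seg) {
      if (d.live()) {
        ids.add(d.id());
      }
    }
    return ids.size();
  }

  /** n adds, then the first k ids are either updated (re-added) or purely deleted. */
  static List<Doc> segment(int n, int k, boolean viaUpdates) {
    List<Doc> seg = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      // the first k original slots are dead in both regimes
      seg.add(new Doc(String.valueOf(i), i >= k));
    }
    if (viaUpdates) {
      for (int i = 0; i < k; i++) {
        seg.add(new Doc(String.valueOf(i), true)); // re-add of the same id
      }
    }
    return seg;
  }

  public static void main(String[] args) {
    for (boolean viaUpdates : new boolean[] {true, false}) {
      List<Doc> seg = segment(10, 3, viaUpdates);
      System.out.printf("%s: docCount=%d size=%d unique=%d%n",
          viaUpdates ? "updates-only" : "deletes-only",
          termsDocCount(seg), termsSize(seg), uniqueLiveIds(seg));
    }
  }
}
```

   Running `main` prints `updates-only: docCount=13 size=10 unique=10` and `deletes-only: docCount=10 size=10 unique=7`. These outputs match the two relations.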
   
   I want to add tests that protect against the wrong kind of refactor, which IMO is not that hard to make given the subtleties of the different types of counts. I know they confused me a bit until I considered all the cases mentioned above.



-- 
This is an automated message from the Apache Git Service.
