kotman12 opened a new pull request, #4279:
URL: https://github.com/apache/solr/pull/4279

   # Description
   
   The upgrade core index endpoint does not currently support child docs, so 
it refuses to run on an index that contains them. The check for child docs, 
which applies only when the `_root_` field exists, can be improved. The current 
behavior compares the doc count of `_root_` with the count of unique `_root_` 
values. That comparison falsely flags any segment with updated docs (where the 
update happened within one segment) as containing child docs, even when it has 
none. I ran into this scenario while devising my own version of this check.
   
   # Solution
   
   Compare the terms size of the id field with the terms size of the `_root_` 
field to ensure that no documents share a root. As with the previous check, 
this is done for each segment separately.
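
To illustrate the difference between the two comparisons, here is a minimal sketch in plain Java. It is a toy model, not the code in this PR and not the Lucene API: the class `NestedDocsCheckSketch`, the `Doc` type, and the method names are all hypothetical, and a segment is modelled as a list of entries in which a deleted entry stays present until a merge. It also assumes every doc carries a `_root_` value, equal to its own id when it has no parent.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a single segment; hypothetical names, no Solr or Lucene calls. */
class NestedDocsCheckSketch {

  /** One entry in a segment. A replaced (deleted) entry is still listed. */
  static final class Doc {
    final String id;
    final String root;

    Doc(String id, String root) {
      this.id = id;
      this.root = root;
    }
  }

  /** Number of distinct values of a field, standing in for its terms size. */
  static long termsSize(List<Doc> segment, boolean rootField) {
    Set<String> terms = new HashSet<>();
    for (Doc d : segment) {
      terms.add(rootField ? d.root : d.id);
    }
    return terms.size();
  }

  /** Previous comparison: entries carrying _root_ vs. distinct _root_ values. */
  static boolean oldCheckFlagsNested(List<Doc> segment) {
    return segment.size() != termsSize(segment, true);
  }

  /** Proposed comparison: distinct id values vs. distinct _root_ values. */
  static boolean newCheckFlagsNested(List<Doc> segment) {
    return termsSize(segment, false) != termsSize(segment, true);
  }

  public static void main(String[] args) {
    // Doc 1 is added and then re-added before commit, so it appears twice.
    List<Doc> updated = List.of(new Doc("1", "1"), new Doc("1", "1"), new Doc("2", "2"));
    // Parent 10 with children 11 and 12; all three share the root 10.
    List<Doc> nested = List.of(new Doc("10", "10"), new Doc("11", "10"), new Doc("12", "10"));

    System.out.println("old check, updated docs: " + oldCheckFlagsNested(updated));
    System.out.println("new check, updated docs: " + newCheckFlagsNested(updated));
    System.out.println("old check, nested docs:  " + oldCheckFlagsNested(nested));
    System.out.println("new check, nested docs:  " + newCheckFlagsNested(nested));
  }
}
```

In this model the re-added doc contributes a duplicate `_root_` entry, which trips the previous comparison, while the id and `_root_` fields still have the same number of distinct values. Only real children, whose `_root_` points at a parent id, make the two term counts diverge.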
   
   # Tests
   
     1. testNestedDocsDetection_nonNestedJustAdd — Plain docs, no updates. 
Verifies no false positive.
     2. testNestedDocsDetection_nestedJustAdd — Actual nested docs. Verifies 
detection works.
     3. testNestedDocsDetection_nonNestedWithWithinCommitUpdates — Non-nested 
docs where some are re-added (same id) before commit. The deleted and re-added 
docs share `_root_` values within the segment, which previously caused false 
positives.
     4. testNestedDocsDetection_nestedWithWithinCommitUpdates — Same 
within-commit update pattern but with real nested docs present.
     5. testNestedDocsDetection_nonNestedWithWithinCommitDeletesAndUpdates — 
Non-nested docs with deletes AND updates before commit. Deleted entries in the 
segment can also trigger false positives.
     6. testNestedDocsDetection_nestedWithWithinCommitDeletesAndUpdates — Same 
delete+update pattern with real nested docs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

