Re: [PR] SOLR-18189: Skip indexing duplicate docs via a content hash URP [solr]

via GitHub Fri, 10 Apr 2026 20:02:36 -0700


dsmiley commented on PR #4263:
URL: https://github.com/apache/solr/pull/4263#issuecomment-4228005667


   I replaced the String/base64 usage with bytes & BinaryField.
   
   I asked Claude:  Compare this feature to DocBasedVersionConstraintsProcessor 
in terms of implementation.  As a code reviewer, are we missing something here? 
 
   
   ----
   
    1. Missing isNotLeader check (significant)
   
     DocBasedVersionConstraintsProcessor.processAdd immediately bails on 
non-leader nodes:
     if (isNotLeader(cmd)) {
         super.processAdd(cmd);
         return;
     }
     ContentHashVersionProcessor has no such guard. In a distributed cluster, 
every replica would also call RealTimeGetComponent.getInputDocument on every 
update — wasteful and potentially wrong
     during REPLAY/PEER_SYNC phases.
   
     2. Double ++i in retry loop (bug)
   
     Current code:
     for (int i = 0; ; i++) {         // i++ here
         ...
         } catch (SolrException e) {
             if (e.code() != 409) throw e;
             ++i;                      // second increment — wrong
         }
     }
     DocBasedVersionConstraintsProcessor uses continue instead, letting only 
the loop header increment. The logOverlyFailedRetries mask (i & 255) == 255 
fires at wrong cadence.
   
     3. Minor: hashField not from actual schema
   
     DocBasedVersionConstraintsProcessor does schema.getField(...) so the 
SchemaField reflects actual configured properties. ContentHashVersionProcessor 
constructs new SchemaField(hashFieldName, new
     BinaryField()) with default properties. Fine for now since only .getName() 
is used, but if someone later uses hashField for type dispatch they'll get 
wrong results.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] SOLR-18189: Skip indexing duplicate docs via a content hash URP [solr]

Reply via email to