Andrzej Bialecki created SOLR-18077:
---------------------------------------

             Summary: CrossDC Consumer - out-of-order Kafka partition processing
                 Key: SOLR-18077
                 URL: https://issues.apache.org/jira/browse/SOLR-18077
             Project: Solr
          Issue Type: Bug
          Components: module - crossDC
    Affects Versions: 9.10
            Reporter: Andrzej Bialecki
            Assignee: Andrzej Bialecki


When mirrored requests are submitted to Kafka in {{KafkaMirroringSink}}, the
default partitioner ({{BuiltInPartitioner}}) is used, which submits
messages to partitions in batches, switching between partitions in a
round-robin fashion.

The same partitioner is used (see below) by MirrorMaker when adding
messages to the target Kafka topic. The
{{KafkaCrossDcConsumer.pollAndProcessRequests()}} method then retrieves new
records, but it iterates over partitions in an essentially arbitrary order
because {{ConsumerRecords.partitions}} is a HashMap.

This means that the batches of messages retrieved from multiple partitions are 
no longer necessarily in the same order as they were submitted. If requests in 
these batches from multiple partitions refer to the same collection then they 
may be applied out of order, leading to data divergence.
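
To illustrate the effect, here is a minimal, dependency-free sketch (not the actual crossDC code). It assumes a simplified producer that switches partitions strictly round-robin after every {{batchSize}} messages, and a consumer that drains one partition at a time. The class and method names are hypothetical. A sorted map keeps the output deterministic; an unordered map only makes the resulting order less predictable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionOrderSketch {

    /**
     * Assigns sequence numbers 0..count-1 to partitions in batches,
     * switching to the next partition after every batchSize messages.
     * This is a simplification of what a batching partitioner does.
     */
    public static Map<Integer, List<Integer>> produceInBatches(
            int count, int batchSize, int numPartitions) {
        Map<Integer, List<Integer>> partitions = new TreeMap<>();
        for (int seq = 0; seq < count; seq++) {
            int partition = (seq / batchSize) % numPartitions;
            partitions.computeIfAbsent(partition, p -> new ArrayList<>()).add(seq);
        }
        return partitions;
    }

    /** Drains all records of one partition before moving to the next one. */
    public static List<Integer> consumePerPartition(Map<Integer, List<Integer>> partitions) {
        List<Integer> processed = new ArrayList<>();
        for (List<Integer> records : partitions.values()) {
            processed.addAll(records);
        }
        return processed;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> partitions = produceInBatches(6, 2, 2);
        // prints: per partition: {0=[0, 1, 4, 5], 1=[2, 3]}
        System.out.println("per partition: " + partitions);
        // prints: processed:     [0, 1, 4, 5, 2, 3]
        System.out.println("processed:     " + consumePerPartition(partitions));
    }
}
```

Requests 4 and 5 are processed before 2 and 3. If all six requests target the same collection, the later updates are applied first.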

One possible solution is to explicitly use a different partitioning scheme when
submitting messages from {{KafkaMirroringSink}}. This happens automatically
when the {{ProducerRecord}} key is explicitly set, and we can use the
{{collection}} name as the key - this way all requests for the same collection
will end up in the same partition, thus preserving the ordering.
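
A rough sketch of the idea, kept dependency-free so it runs on its own. In the real producer this would amount to constructing the record as {{new ProducerRecord<>(topic, collection, request)}}, after which Kafka hashes the serialized key (murmur2) modulo the partition count. Here {{String.hashCode()}} stands in for that hash, and {{MirroredRequest}}, {{partitionFor}} and {{assignByKey}} are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyedPartitioningSketch {

    /** Hypothetical stand-in for a mirrored request: target collection plus submission order. */
    public static final class MirroredRequest {
        public final String collection;
        public final int seq;

        public MirroredRequest(String collection, int seq) {
            this.collection = collection;
            this.seq = seq;
        }

        @Override
        public String toString() {
            return collection + "#" + seq;
        }
    }

    /**
     * Maps a key to a partition. Assumption: String.hashCode() replaces
     * Kafka's murmur2 hash of the serialized key; the principle is the same.
     */
    public static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Appends each request to the partition chosen by its collection name. */
    public static Map<Integer, List<MirroredRequest>> assignByKey(
            List<MirroredRequest> requests, int numPartitions) {
        Map<Integer, List<MirroredRequest>> partitions = new TreeMap<>();
        for (MirroredRequest r : requests) {
            int partition = partitionFor(r.collection, numPartitions);
            partitions.computeIfAbsent(partition, p -> new ArrayList<>()).add(r);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<MirroredRequest> requests = new ArrayList<>();
        for (int seq = 0; seq < 8; seq++) {
            // Interleave requests for two collections, in submission order.
            requests.add(new MirroredRequest(seq % 2 == 0 ? "products" : "users", seq));
        }
        assignByKey(requests, 4).forEach(
                (partition, rs) -> System.out.println("partition " + partition + ": " + rs));
    }
}
```

Each collection lands in exactly one partition, and Kafka preserves order within a partition, so per-collection ordering survives however the consumer iterates over partitions. Two caveats: the key-to-partition mapping only stays stable while the partition count of the topic is unchanged, and ordering across different collections is still not guaranteed.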



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
