sajjad-moradi commented on code in PR #8321:
URL: https://github.com/apache/pinot/pull/8321#discussion_r903853686


##########
pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/src/main/java/org/apache/pinot/plugin/stream/kafka20/KafkaPartitionLevelConsumer.java:
##########
@@ -55,7 +58,12 @@ public MessageBatch<byte[]> fetchMessages(long startOffset, long endOffset, int
       LOGGER.debug("poll consumer: {}, startOffset: {}, endOffset:{} timeout: {}ms", _topicPartition, startOffset,
           endOffset, timeoutMillis);
     }
-    _consumer.seek(_topicPartition, startOffset);
+    Map<TopicPartition, Long> beginningOffsets = _consumer.beginningOffsets(Lists.newArrayList(_topicPartition));
+    Long beginningOffset = beginningOffsets.values().iterator().next();
+    // explicitly check for OutOfRange, where startOffset < beginningOffset
+    // without this, _consumer.poll will auto offset reset to latest, resulting in data loss
+    _consumer.seek(_topicPartition, Math.max(startOffset, beginningOffset));

Review Comment:
   We're adding one more call to Kafka in the execution path of every happy
case in order to fix a rare edge case. If the Kafka consumer doesn't throw an
exception in the out-of-range scenario, maybe we could check the fetched
messages instead and, only when no message comes back, get the beginning
offset, seek to it, and then fetch again?
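
   A rough sketch of the control flow suggested above, for discussion only. It
does not use the Kafka client: `PartitionLog` and `InMemoryLog` are
hypothetical stand-ins invented for this example, and the sketch assumes that
a fetch from an out-of-range offset comes back empty, which is what the
suggestion presumes but which is not verified here against the real consumer.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyOffsetResetSketch {

  /** Hypothetical stand-in for the two consumer operations the sketch needs. */
  interface PartitionLog {
    long beginningOffset();

    /** Returns the offsets of the records fetched, starting at fromOffset. */
    List<Long> poll(long fromOffset, int maxRecords);
  }

  /** In-memory log holding offsets [begin, end); counts beginning-offset lookups. */
  static final class InMemoryLog implements PartitionLog {
    private final long _begin;
    private final long _end;
    int beginningOffsetCalls = 0;

    InMemoryLog(long begin, long end) {
      _begin = begin;
      _end = end;
    }

    @Override
    public long beginningOffset() {
      beginningOffsetCalls++;
      return _begin;
    }

    @Override
    public List<Long> poll(long fromOffset, int maxRecords) {
      List<Long> records = new ArrayList<>();
      if (fromOffset < _begin) {
        // Assumption: an out-of-range start offset yields an empty batch.
        return records;
      }
      for (long offset = fromOffset; offset < _end && records.size() < maxRecords; offset++) {
        records.add(offset);
      }
      return records;
    }
  }

  /** Fetch first; look up the beginning offset only when nothing came back. */
  static List<Long> fetch(PartitionLog log, long startOffset, int maxRecords) {
    List<Long> records = log.poll(startOffset, maxRecords);
    if (!records.isEmpty()) {
      return records; // happy path: no extra lookup
    }
    long beginning = log.beginningOffset();
    if (startOffset < beginning) {
      // startOffset fell off the head of the log: retry from the earliest offset.
      return log.poll(beginning, maxRecords);
    }
    return records; // in range but no new data yet
  }
}
```

   One consequence visible in the sketch: the lookup is skipped whenever
messages are returned, but it still runs on every empty fetch, including when
the consumer is simply caught up with the partition.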



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

