sajjad-moradi commented on code in PR #8321:
URL: https://github.com/apache/pinot/pull/8321#discussion_r903853686

##########
pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/src/main/java/org/apache/pinot/plugin/stream/kafka20/KafkaPartitionLevelConsumer.java:
##########
@@ -55,7 +58,12 @@ public MessageBatch<byte[]> fetchMessages(long startOffset, long endOffset, int
       LOGGER.debug("poll consumer: {}, startOffset: {}, endOffset:{} timeout: {}ms", _topicPartition, startOffset, endOffset, timeoutMillis);
     }
-    _consumer.seek(_topicPartition, startOffset);
+    Map<TopicPartition, Long> beginningOffsets = _consumer.beginningOffsets(Lists.newArrayList(_topicPartition));
+    Long beginningOffset = beginningOffsets.values().iterator().next();
+    // explicitly check for OutOfRange, where startOffset < beginningOffset
+    // without this, _consumer.poll will auto offset reset to latest, resulting in data loss
+    _consumer.seek(_topicPartition, Math.max(startOffset, beginningOffset));

Review Comment:
   We're adding one more call to Kafka in the execution path of every happy case in order to fix a rare edge case. If the Kafka consumer doesn't throw an exception in the out-of-range scenario, maybe we should check the fetched messages instead, and only when no messages come back get the beginning offset, seek to it, and fetch again?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
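
The fallback suggested in the review comment above (fetch first, and look up the beginning offset only when nothing comes back) could look roughly like the sketch below. It is not the Pinot or Kafka code: `PartitionLog`, `InMemoryLog`, and `LazyOffsetResetSketch` are hypothetical names introduced only for illustration, and the in-memory log merely imitates the behaviour described in the diff comment by returning nothing when asked for an offset below the retained range. The extra `startOffset < beginning` check is an added assumption, so that an empty batch from a caught-up consumer is not mistaken for an out-of-range offset.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the few consumer operations the sketch needs; not the Kafka API. */
interface PartitionLog {
  /** Earliest offset that is still retained. */
  long beginningOffset();

  /** Offsets in [fromOffset, endOffset); empty when fromOffset is below the retained range. */
  List<Long> read(long fromOffset, long endOffset);
}

/** In-memory fake that retains offsets [firstRetained, logEnd) and counts beginning-offset lookups. */
final class InMemoryLog implements PartitionLog {
  private final long firstRetained;
  private final long logEnd;
  int beginningOffsetLookups = 0;

  InMemoryLog(long firstRetained, long logEnd) {
    this.firstRetained = firstRetained;
    this.logEnd = logEnd;
  }

  @Override
  public long beginningOffset() {
    beginningOffsetLookups++;
    return firstRetained;
  }

  @Override
  public List<Long> read(long fromOffset, long endOffset) {
    List<Long> offsets = new ArrayList<>();
    if (fromOffset < firstRetained) {
      // Imitates the out-of-range case: the caller gets an empty batch, not an exception.
      return offsets;
    }
    long limit = Math.min(endOffset, logEnd);
    for (long offset = fromOffset; offset < limit; offset++) {
      offsets.add(offset);
    }
    return offsets;
  }
}

final class LazyOffsetResetSketch {
  private LazyOffsetResetSketch() {
  }

  /** Fetches from startOffset; pays for the beginning-offset lookup only after an empty batch. */
  static List<Long> fetch(PartitionLog log, long startOffset, long endOffset) {
    List<Long> batch = log.read(startOffset, endOffset);
    if (!batch.isEmpty()) {
      return batch; // happy path: no extra call
    }
    long beginning = log.beginningOffset();
    if (startOffset < beginning) {
      // startOffset was purged: retry from the earliest retained offset.
      return log.read(beginning, endOffset);
    }
    return batch; // genuinely nothing to read yet
  }
}
```

With a log retaining offsets 100 to 109, `LazyOffsetResetSketch.fetch(log, 105, 108)` returns its batch without touching `beginningOffset()`, while `fetch(log, 50, 103)` comes back empty on the first read and is retried from offset 100. The trade-off is that every empty batch, including the caught-up case, now costs one lookup; whether that is cheaper overall than the unconditional `beginningOffsets` call in the diff depends on how often empty batches occur.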