leesf commented on a change in pull request #1039: [HUDI-340]: made max events to read from kafka source configurable
URL: https://github.com/apache/incubator-hudi/pull/1039#discussion_r349321525

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
##########
@@ -229,7 +229,9 @@ public KafkaOffsetGen(TypedProperties props) {
         new HashMap(ScalaHelpers.toJavaMap(cluster.getLatestLeaderOffsets(topicPartitions).right().get()));
 
     // Come up with final set of OffsetRanges to read (account for new partitions, limit number of events)
-    long numEvents = Math.min(DEFAULT_MAX_EVENTS_TO_READ, sourceLimit);
+    long maxEventsToReadFromKafka = props.getLong(Config.MAX_EVENTS_FROM_KAFKA_SOURCE_PROP,
+        Config.DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE);
+    long numEvents = sourceLimit == Long.MAX_VALUE ? maxEventsToReadFromKafka : sourceLimit;

Review comment:
Should we also handle the case where `Config.MAX_EVENTS_FROM_KAFKA_SOURCE_PROP` is set to `Long.MAX_VALUE` in props? That would also scan the entire Kafka topic. If `sourceLimit` and `Config.MAX_EVENTS_FROM_KAFKA_SOURCE_PROP` are both set to `Long.MAX_VALUE`, just fall back to `Config.DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE`. WDYT? @pratyakshsharma @vinothchandar
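
To make the suggestion concrete, here is a minimal standalone sketch of the fallback, not the actual `KafkaOffsetGen` code. The class name `KafkaEventLimitSketch`, the helper `resolveNumEvents`, and the value of `DEFAULT_MAX_EVENTS` are hypothetical and exist only for illustration; in the PR the configured value would come from `props.getLong(...)` and the default from `Config.DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE`.

```java
// Hypothetical sketch of the fallback proposed in the review comment.
final class KafkaEventLimitSketch {

    // Assumed placeholder value; the real default is Config.DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE.
    static final long DEFAULT_MAX_EVENTS = 1_000_000L;

    /**
     * Picks the number of events to read.
     * An explicit sourceLimit wins. If sourceLimit is unset (Long.MAX_VALUE),
     * use the configured maximum, unless that is also Long.MAX_VALUE, in which
     * case fall back to the default so the whole topic is not scanned.
     */
    static long resolveNumEvents(long sourceLimit, long configuredMaxEvents) {
        if (sourceLimit != Long.MAX_VALUE) {
            return sourceLimit;
        }
        return configuredMaxEvents == Long.MAX_VALUE ? DEFAULT_MAX_EVENTS : configuredMaxEvents;
    }

    public static void main(String[] args) {
        // Explicit source limit is respected.
        System.out.println(resolveNumEvents(500L, 2_000L));
        // No source limit: the configured maximum applies.
        System.out.println(resolveNumEvents(Long.MAX_VALUE, 2_000L));
        // Neither is bounded: fall back to the default.
        System.out.println(resolveNumEvents(Long.MAX_VALUE, Long.MAX_VALUE));
    }
}
```

With this shape, the only way to read an unbounded number of events would be to change the default itself, which seems to be the intent of the comment.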