leesf commented on a change in pull request #1039: [HUDI-340]: made max events to read from kafka source configurable
URL: https://github.com/apache/incubator-hudi/pull/1039#discussion_r349321525
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##########
 @@ -229,7 +229,9 @@ public KafkaOffsetGen(TypedProperties props) {
         new HashMap(ScalaHelpers.toJavaMap(cluster.getLatestLeaderOffsets(topicPartitions).right().get()));
 
     // Come up with final set of OffsetRanges to read (account for new partitions, limit number of events)
-    long numEvents = Math.min(DEFAULT_MAX_EVENTS_TO_READ, sourceLimit);
+    long maxEventsToReadFromKafka = props.getLong(Config.MAX_EVENTS_FROM_KAFKA_SOURCE_PROP,
+        Config.DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE);
+    long numEvents = sourceLimit == Long.MAX_VALUE ? maxEventsToReadFromKafka : sourceLimit;
 
 Review comment:
   Should we also handle the case where `Config.MAX_EVENTS_FROM_KAFKA_SOURCE_PROP` is set to `Long.MAX_VALUE` in props? That would also scan the entire Kafka topic. If `sourceLimit` and `Config.MAX_EVENTS_FROM_KAFKA_SOURCE_PROP` are both set to `Long.MAX_VALUE`, just fall back to `Config.DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE`. WDYT? @pratyakshsharma @vinothchandar
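A minimal Java sketch of the fallback order proposed above, assuming `Long.MAX_VALUE` is the sentinel for "no limit" (as the comparison in the diff suggests). `NumEventsResolver`, `resolveNumEvents`, and its parameters are hypothetical names for illustration, not Hudi's API, and the default passed in `main` is an illustrative value, not the actual `Config.DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE`.

```java
/**
 * Hypothetical helper illustrating the suggested fallback; not part of Hudi.
 */
public final class NumEventsResolver {

    private NumEventsResolver() {}

    /**
     * @param sourceLimit   limit passed by the caller (Long.MAX_VALUE means "no limit")
     * @param configuredMax value read from the max-events property
     * @param defaultMax    fallback used when neither bound is finite
     */
    public static long resolveNumEvents(long sourceLimit, long configuredMax, long defaultMax) {
        if (sourceLimit != Long.MAX_VALUE) {
            // An explicit source limit wins, matching the diff's behavior.
            return sourceLimit;
        }
        if (configuredMax != Long.MAX_VALUE) {
            // No source limit: use the configured cap.
            return configuredMax;
        }
        // Both unbounded: fall back to the default instead of reading the whole topic.
        return defaultMax;
    }

    public static void main(String[] args) {
        long illustrativeDefault = 1_000_000L; // assumed value, for illustration only
        System.out.println(resolveNumEvents(1000L, Long.MAX_VALUE, illustrativeDefault));          // 1000
        System.out.println(resolveNumEvents(Long.MAX_VALUE, 200L, illustrativeDefault));            // 200
        System.out.println(resolveNumEvents(Long.MAX_VALUE, Long.MAX_VALUE, illustrativeDefault)); // 1000000
    }
}
```

Compared with the diff as written, only the last branch differs: when both values are `Long.MAX_VALUE`, the diff would yield `Long.MAX_VALUE` for `numEvents`, while this sketch returns the default.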

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
