[ 
https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140999#comment-17140999
 ] 

Pratyaksh Sharma commented on HUDI-340:
---------------------------------------

Hi [~wangxianghu], the idea behind these checks is that one should not try to
scan the entire Kafka topic in one go. I agree you can still set the value to
Long.MAX_VALUE - 1, but I do not think anyone would set such a source limit
after reading the logic you pointed out. Scanning the entire Kafka topic means
reading a potentially very large chunk of data at once, which might cause
issues.

If you want to tune this logic further, we can discuss setting a hard upper
limit on sourceLimit so that values like Long.MAX_VALUE - 1 cannot be
configured, but I am not sure that would be a good idea.
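
To make the hard-limit option concrete, here is a rough sketch of what such a
clamp could look like. This is not the actual KafkaOffsetGen code: the class
name, the method name and the HARD_CAP value are hypothetical, and it assumes
the existing check treats Long.MAX_VALUE as "no limit configured" and falls
back to the 1M default mentioned in the issue description.

```java
// Illustrative sketch only; names and the HARD_CAP value are hypothetical
// and are not taken from the Hudi code base.
final class SourceLimitSketch {

    // 1M default, as stated in the issue description.
    static final long DEFAULT_MAX_EVENTS_TO_READ = 1_000_000L;

    // Hypothetical hard upper bound on a user-supplied sourceLimit.
    static final long HARD_CAP = 50_000_000L;

    // Returns the number of events to read for one batch.
    // Assumption: Long.MAX_VALUE means the user did not configure a limit.
    static long resolveMaxEvents(long sourceLimit) {
        if (sourceLimit == Long.MAX_VALUE) {
            return DEFAULT_MAX_EVENTS_TO_READ;
        }
        // Without this clamp, Long.MAX_VALUE - 1 would pass through unchanged.
        return Math.min(sourceLimit, HARD_CAP);
    }

    public static void main(String[] args) {
        System.out.println(resolveMaxEvents(Long.MAX_VALUE));     // falls back to the default
        System.out.println(resolveMaxEvents(Long.MAX_VALUE - 1)); // clamped to HARD_CAP
        System.out.println(resolveMaxEvents(5_000_000L));         // below the cap, kept as-is
    }
}
```

The trade-off is that any fixed cap is arbitrary and could block users who
legitimately need larger batches, which is why I am hesitant about it.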

> Increase Default max events to read from kafka source
> -----------------------------------------------------
>
>                 Key: HUDI-340
>                 URL: https://issues.apache.org/jira/browse/HUDI-340
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: DeltaStreamer
>            Reporter: Pratyaksh Sharma
>            Assignee: Pratyaksh Sharma
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.5.1
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, DEFAULT_MAX_EVENTS_TO_READ is set to 1M for the Kafka source in
> the KafkaOffsetGen.java class. DeltaStreamer can handle many more incoming
> records than this.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
