[ https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140999#comment-17140999 ]
Pratyaksh Sharma commented on HUDI-340:
---------------------------------------
Hi [~wangxianghu], the idea behind these checks is that one should not try to
scan an entire Kafka topic in one go. I agree that you can still set the value
to Long.MAX_VALUE - 1, but I do not think anyone would set such a source limit
after reading the logic you pointed out. If you try to scan the entire Kafka
topic, you may in effect be reading a very large chunk of data, which might
cause issues.
If you want to tune this logic further, we can discuss setting a hard limit on
sourceLimit so that values like Long.MAX_VALUE - 1 cannot be configured, but I
am not sure that is a good idea.
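
To make the two options concrete, here is a minimal sketch. It is not the actual KafkaOffsetGen code: the class name, method names, and the cap value are hypothetical, and only the 1M default comes from the issue description. It assumes the existing check treats exactly Long.MAX_VALUE as "not set", which is why Long.MAX_VALUE - 1 currently passes through unchanged.

```java
final class SourceLimitSketch {
    // 1M default, as stated in the issue description.
    static final long DEFAULT_MAX_EVENTS_TO_READ = 1_000_000L;
    // Hypothetical hard cap, chosen only for illustration.
    static final long HYPOTHETICAL_HARD_CAP = 50_000_000L;

    // Assumed current behaviour: only the exact sentinel Long.MAX_VALUE
    // falls back to the default; any other value is used as given.
    static long resolveNumEvents(long sourceLimit) {
        return sourceLimit == Long.MAX_VALUE ? DEFAULT_MAX_EVENTS_TO_READ : sourceLimit;
    }

    // Variant under discussion: additionally clamp the result to a hard cap,
    // so near-sentinel values cannot request the whole topic.
    static long resolveNumEventsWithCap(long sourceLimit) {
        return Math.min(resolveNumEvents(sourceLimit), HYPOTHETICAL_HARD_CAP);
    }

    public static void main(String[] args) {
        System.out.println(resolveNumEvents(Long.MAX_VALUE));            // 1000000
        System.out.println(resolveNumEvents(Long.MAX_VALUE - 1));        // 9223372036854775806
        System.out.println(resolveNumEventsWithCap(Long.MAX_VALUE - 1)); // 50000000
    }
}
```

The trade-off with the capped variant is that any fixed cap is arbitrary and would silently override a user's explicit setting, which is the reason for the hesitation above.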
> Increase Default max events to read from kafka source
> -----------------------------------------------------
>
> Key: HUDI-340
> URL: https://issues.apache.org/jira/browse/HUDI-340
> Project: Apache Hudi
> Issue Type: Improvement
> Components: DeltaStreamer
> Reporter: Pratyaksh Sharma
> Assignee: Pratyaksh Sharma
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.1
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Right now, DEFAULT_MAX_EVENTS_TO_READ for the Kafka source is set to 1M in
> KafkaOffsetGen.java. DeltaStreamer can handle many more incoming records
> than this.
>
>