Arpit Varshney created GOBBLIN-2118:
---------------------------------------
Summary: Reduce no of network calls while fetching kafka offsets
during startup
Key: GOBBLIN-2118
URL: https://issues.apache.org/jira/browse/GOBBLIN-2118
Project: Apache Gobblin
Issue Type: Improvement
Reporter: Arpit Varshney
During startup, while creating work units, KafkaSource makes network calls to
fetch the Kafka offsets (both earliest and latest) in order to determine the
watermark, that is, the offsets from which the Gobblin job will start
consuming.
These offsets are fetched for each topic and for each partition in the topic.
Each partition results in a separate call to the Kafka client, which increases
the number of network calls. With cross-colo calls (calls to datacenters in
different regions), this increases the fetch time and can result in timeouts.
A timeout causes the affected topic partition to be skipped, which leads to
starvation.
This ticket aims to reduce the number of network calls. Rather than making one
call per partition, KafkaSource should fetch the offsets for all the
partitions at once from Kafka.
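
As a rough illustration of the difference in call counts, here is a minimal
sketch. It does not use Gobblin or the Kafka client: OffsetClient,
CountingClient, fetchOne and fetchAll are hypothetical names, and each method
invocation stands in for one network round trip. It also assumes a single call
returns both the earliest and latest offset, which the ticket does not
specify. For reference, the Kafka consumer API does offer batch lookups via
beginningOffsets(Collection<TopicPartition>) and
endOffsets(Collection<TopicPartition>); whether the eventual change uses them
is not stated here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OffsetFetchSketch {

    /** Hypothetical client: every method invocation models one network round trip. */
    interface OffsetClient {
        /** Returns {earliest, latest} for a single partition. */
        long[] fetchOne(String partition);

        /** Returns {earliest, latest} for every requested partition in one call. */
        Map<String, long[]> fetchAll(List<String> partitions);
    }

    /** Fake client that returns fixed offsets and counts round trips. */
    static class CountingClient implements OffsetClient {
        int calls = 0;

        @Override
        public long[] fetchOne(String partition) {
            calls++;
            return new long[] {0L, 100L};
        }

        @Override
        public Map<String, long[]> fetchAll(List<String> partitions) {
            calls++;
            Map<String, long[]> result = new LinkedHashMap<>();
            for (String partition : partitions) {
                result.put(partition, new long[] {0L, 100L});
            }
            return result;
        }
    }

    /** Behaviour described in the ticket: one call per partition. */
    static Map<String, long[]> perPartition(OffsetClient client, List<String> partitions) {
        Map<String, long[]> offsets = new LinkedHashMap<>();
        for (String partition : partitions) {
            offsets.put(partition, client.fetchOne(partition));
        }
        return offsets;
    }

    /** Proposed behaviour: a single call covering all partitions. */
    static Map<String, long[]> batched(OffsetClient client, List<String> partitions) {
        return client.fetchAll(partitions);
    }

    public static void main(String[] args) {
        List<String> partitions = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            partitions.add("events-" + i); // hypothetical topic name
        }

        CountingClient perPartitionClient = new CountingClient();
        perPartition(perPartitionClient, partitions);

        CountingClient batchedClient = new CountingClient();
        batched(batchedClient, partitions);

        System.out.println("per-partition calls: " + perPartitionClient.calls);
        System.out.println("batched calls: " + batchedClient.calls);
    }
}
```

In this sketch the per-partition path makes one round trip per partition,
while the batched path makes one in total, so the cost of a slow cross-colo
link is paid once rather than once per partition.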