Arpit Varshney created GOBBLIN-2118:
---------------------------------------

             Summary: Reduce no of network calls while fetching kafka offsets 
during startup
                 Key: GOBBLIN-2118
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2118
             Project: Apache Gobblin
          Issue Type: Improvement
            Reporter: Arpit Varshney


During startup, while creating work units, KafkaSource makes network calls to 
fetch the Kafka offsets (both earliest and latest) in order to determine the 
watermark, i.e. the offsets from which the Gobblin job will start consuming.

These offsets are fetched for each topic and each partition in the topic. Each 
partition results in a separate call to the Kafka client, which increases the 
number of network calls. With cross-colo calls (calls to datacenters in 
different regions), this increases the fetch time and can result in timeouts, 
which cause the affected topic partitions to be skipped and lead to 
starvation.

This ticket aims to reduce the number of network calls: rather than making one 
call per partition, KafkaSource should fetch the offsets for all the 
partitions at once from Kafka.
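
The difference between the two approaches can be illustrated with a minimal 
sketch. Everything in it is an assumption made for illustration: OffsetClient, 
CountingClient and the two fetch methods are hypothetical names, not Gobblin's 
actual classes, and the client is an in-memory fake that only counts simulated 
round trips. For reference, the Apache Kafka Java consumer does expose 
beginningOffsets and endOffsets methods that accept a collection of 
partitions; whether the eventual change uses them is not stated in this 
ticket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OffsetFetchSketch {

    /** Hypothetical stand-in for a Kafka client wrapper; not Gobblin's API. */
    interface OffsetClient {
        long earliestOffset(String topic, int partition);
        long latestOffset(String topic, int partition);
        Map<Integer, Long> earliestOffsets(String topic, List<Integer> partitions);
        Map<Integer, Long> latestOffsets(String topic, List<Integer> partitions);
    }

    /** In-memory fake: every method call counts as one simulated round trip. */
    static class CountingClient implements OffsetClient {
        int roundTrips = 0;

        public long earliestOffset(String topic, int partition) {
            roundTrips++;
            return 0L;
        }

        public long latestOffset(String topic, int partition) {
            roundTrips++;
            return 100L + partition;
        }

        public Map<Integer, Long> earliestOffsets(String topic, List<Integer> partitions) {
            roundTrips++;
            Map<Integer, Long> result = new HashMap<>();
            for (int p : partitions) {
                result.put(p, 0L);
            }
            return result;
        }

        public Map<Integer, Long> latestOffsets(String topic, List<Integer> partitions) {
            roundTrips++;
            Map<Integer, Long> result = new HashMap<>();
            for (int p : partitions) {
                result.put(p, 100L + p);
            }
            return result;
        }
    }

    /** Current behaviour as described in the ticket: two calls per partition. */
    static Map<Integer, long[]> fetchPerPartition(OffsetClient client, String topic,
                                                  List<Integer> partitions) {
        Map<Integer, long[]> offsets = new HashMap<>();
        for (int p : partitions) {
            offsets.put(p, new long[] {
                client.earliestOffset(topic, p), client.latestOffset(topic, p)});
        }
        return offsets;
    }

    /** Proposed behaviour: two calls per topic, regardless of partition count. */
    static Map<Integer, long[]> fetchBatched(OffsetClient client, String topic,
                                             List<Integer> partitions) {
        Map<Integer, Long> earliest = client.earliestOffsets(topic, partitions);
        Map<Integer, Long> latest = client.latestOffsets(topic, partitions);
        Map<Integer, long[]> offsets = new HashMap<>();
        for (int p : partitions) {
            offsets.put(p, new long[] {earliest.get(p), latest.get(p)});
        }
        return offsets;
    }

    public static void main(String[] args) {
        List<Integer> partitions = new ArrayList<>();
        for (int p = 0; p < 8; p++) {
            partitions.add(p);
        }
        CountingClient perPartition = new CountingClient();
        fetchPerPartition(perPartition, "example-topic", partitions);
        CountingClient batched = new CountingClient();
        fetchBatched(batched, "example-topic", partitions);
        System.out.println("per-partition round trips: " + perPartition.roundTrips);
        System.out.println("batched round trips: " + batched.roundTrips);
    }
}
```

In this sketch the per-partition path costs two round trips per partition, 
while the batched path costs two per topic, so the saving grows with the 
partition count and matters most when each round trip crosses datacenters.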



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
