[ 
https://issues.apache.org/jira/browse/SAMZA-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Maes updated SAMZA-964:
----------------------------
    Attachment: SAMZA-964_2.patch

> Improve the performance of the continuous OFFSET checkpointing for logged 
> stores
> --------------------------------------------------------------------------------
>
>                 Key: SAMZA-964
>                 URL: https://issues.apache.org/jira/browse/SAMZA-964
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jake Maes
>            Assignee: Jake Maes
>         Attachments: SAMZA-964_1.patch, SAMZA-964_2.patch
>
>
> SAMZA-905 added the capability to write the OFFSET file on every commit().
> Unfortunately, the performance was a hindrance for one of our larger jobs at 
> LinkedIn. The job has 10 stores, each with hundreds of partitions in their 
> changelog topics. The performance problem came from the 
> KafkaSystemAdmin.getSystemStreamMetadata() method, which:
> 1. Periodically refetches the topic metadata
> 2. Always fetches two offsets (oldest, upcoming) for every partition
> Calling this method to fetch the offsets for just a couple of tasks is 
> wasteful. Metadata should only be fetched if there's a problem. Doing it 
> periodically doesn't help. The total number of offset fetches is S*2*T^2, 
> where S is the number of stores and T is the number of tasks/changelog 
> partitions. Since we only need the newest offset, this should require only 
> S*T offset requests. Ideally, we'd also parallelize these requests, but that 
> will be an exercise for another time. 
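
To make the fetch-count arithmetic above concrete, here is a minimal sketch in Java. The class and method names are hypothetical and not part of Samza, and T = 200 is an assumed value standing in for "hundreds of partitions". One reading of the S*2*T^2 figure is that each of the T tasks triggers a call that fetches 2 offsets for all T partitions of each store; that reading is an interpretation of the description, not something confirmed by the patch.

```java
// Hypothetical helper for illustration only; not part of Samza.
class OffsetFetchCount {
    /** Offset fetches per commit before the fix: S * 2 * T^2, per the issue description. */
    static long before(long stores, long tasks) {
        return stores * 2 * tasks * tasks;
    }

    /** Offset fetches actually needed: one newest offset per store and task, S * T. */
    static long needed(long stores, long tasks) {
        return stores * tasks;
    }

    public static void main(String[] args) {
        long stores = 10;  // from the issue description
        long tasks = 200;  // assumption: stands in for "hundreds of partitions"
        System.out.println("before fix: " + before(stores, tasks));
        System.out.println("needed:     " + needed(stores, tasks));
    }
}
```

With these assumed numbers, that is 800,000 fetches versus 2,000, a factor of 2*T.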
> The fix has 3 components:
> 1. Cache metadata more aggressively. Only expire metadata if we get a Kafka 
> NotLeaderForPartitionException.
> 2. Reduce excessive offset fetching. 
> 3. Do not allow unbounded exponential backoff for offset checkpointing; just 
> skip the offset file instead. Exponential backoff can balloon the commit 
> time and stall the event loop, so we will only retry up to 3 times, for a 
> max delay of 400ms.
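
The third component, bounded retry with a capped backoff, might look roughly like the Java sketch below. This is not the code from the attached patch: the class name, the Supplier-based signature, the 100 ms initial delay, and the doubling factor are all assumptions, chosen so that the longest single wait is 400 ms after at most 3 retries. When every attempt fails, the caller gets an empty result and would skip writing the OFFSET file for that commit.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not Samza's actual retry code.
class BoundedOffsetFetch {
    static final int MAX_RETRIES = 3;            // from the issue description
    static final long INITIAL_DELAY_MS = 100;    // assumption
    static final long MAX_DELAY_MS = 400;        // from the issue description

    /**
     * Runs the fetch once, then retries up to MAX_RETRIES times on failure.
     * Returns an empty Optional if all attempts fail (or the thread is
     * interrupted), so the caller can skip the OFFSET file.
     */
    static Optional<String> fetchWithBoundedRetry(Supplier<String> fetch) {
        long delayMs = INITIAL_DELAY_MS;
        for (int retry = 0; ; retry++) {
            try {
                return Optional.ofNullable(fetch.get());
            } catch (RuntimeException e) {
                if (retry == MAX_RETRIES) {
                    return Optional.empty(); // give up instead of backing off further
                }
            }
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return Optional.empty();
            }
            // Double the wait, but never exceed the cap: 100, 200, 400 ms.
            delayMs = Math.min(delayMs * 2, MAX_DELAY_MS);
        }
    }
}
```

Returning an empty result rather than throwing keeps the commit path moving, which matches the stated goal of not stalling the event loop.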



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
