[
https://issues.apache.org/jira/browse/SAMZA-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Maes updated SAMZA-964:
----------------------------
Attachment: SAMZA-964_5.patch
> Improve the performance of the continuous OFFSET checkpointing for logged
> stores
> --------------------------------------------------------------------------------
>
> Key: SAMZA-964
> URL: https://issues.apache.org/jira/browse/SAMZA-964
> Project: Samza
> Issue Type: Bug
> Reporter: Jake Maes
> Assignee: Jake Maes
> Attachments: SAMZA-964_1.patch, SAMZA-964_2.patch, SAMZA-964_3.patch,
> SAMZA-964_4.patch, SAMZA-964_5.patch
>
>
> SAMZA-905 added the capability to write the OFFSET file on every commit().
> Unfortunately, the performance was a hindrance for one of our larger jobs at
> LinkedIn. The job has 10 stores, each with hundreds of partitions in their
> changelog topics. The performance problem came from the
> KafkaSystemAdmin.getSystemStreamMetadata() method, which:
> 1. Periodically refetches the topic metadata
> 2. Always fetches offsets twice (oldest, upcoming) for every partition
> Calling this method to fetch the offsets for just a couple of tasks is
> wasteful. Metadata should only be fetched if there's a problem. Doing it
> periodically doesn't help. The total number of offset fetches is S*2*T^2,
> where S is the number of stores and T is the number of tasks/changelog
> partitions. Since we only need the newest offset, this should require only
> S*T offset requests. Ideally, we'd also parallelize these requests, but that
> will be an exercise for another time.
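
To put the two formulas above side by side, here is a small arithmetic sketch. The store count of 10 comes from the description; the task count of 200 is only an assumption, since the description just says "hundreds". The class and method names are made up for illustration and are not part of Samza.

```java
// Illustrative arithmetic only; not Samza code.
class OffsetFetchCount {
    // Current behavior per the description: S * 2 * T^2 offset fetches.
    static long fetchesBefore(long stores, long tasks) {
        return stores * 2 * tasks * tasks;
    }

    // Target behavior per the description: S * T offset fetches
    // (newest offset only).
    static long fetchesAfter(long stores, long tasks) {
        return stores * tasks;
    }

    public static void main(String[] args) {
        long stores = 10;  // from the issue description
        long tasks = 200;  // ASSUMED value; the description only says "hundreds"
        long before = fetchesBefore(stores, tasks);
        long after = fetchesAfter(stores, tasks);
        System.out.println("before=" + before + " after=" + after
                + " ratio=" + (before / after));
    }
}
```

Under these assumed numbers that is 800,000 fetches versus 2,000, a factor of 2*T.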
> The fix has 3 components:
> 1. Cache metadata more aggressively. Only expire metadata if we get a Kafka
> NotLeaderForPartitionException.
> 2. Reduce excessive offset fetching.
> 3. Do not allow unbounded exponential backoff for offset checkpointing; if
> the retries are exhausted, just skip the offset file. Exponential backoff can
> balloon the commit time and stall the event loop, so we will only retry up to
> 3 times for a max delay of 400ms.
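
For readers who want the shape of component 3, here is a minimal sketch of a bounded retry that gives up instead of backing off forever. It is not the code from the attached patch. The names BoundedRetry and tryWithBackoff, and the initial delay parameter, are hypothetical. With an assumed initial delay of 100ms and 3 retries, the longest single sleep would be 400ms, which matches one reading of the description.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical helper; not taken from the SAMZA-964 patch.
class BoundedRetry {
    // Runs op once, then retries at most maxRetries times, doubling the
    // sleep between attempts. Returns Optional.empty() when every attempt
    // fails (or the thread is interrupted), so the caller can skip the
    // work instead of stalling.
    static <T> Optional<T> tryWithBackoff(Callable<T> op, int maxRetries,
                                          long initialDelayMs) {
        long delayMs = initialDelayMs;
        for (int attempt = 0; ; attempt++) {
            try {
                return Optional.ofNullable(op.call());
            } catch (Exception e) {
                if (attempt >= maxRetries) {
                    return Optional.empty(); // retries exhausted: give up
                }
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the flag
                    return Optional.empty();
                }
                delayMs *= 2; // exponential, but bounded by maxRetries
            }
        }
    }
}
```

A caller in commit() would then write the OFFSET file only when the result is present, and otherwise skip it for that commit, for example `BoundedRetry.tryWithBackoff(fetchNewestOffset, 3, 100)` where fetchNewestOffset is a placeholder for whatever performs the fetch.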