[ https://issues.apache.org/jira/browse/SAMZA-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shanthoosh Venkataraman updated SAMZA-2284:
-------------------------------------------
    Issue Type: Improvement  (was: New Feature)

> Remove redundant stream metadata API invocations in SamzaContainer startup
> sequence.
> ------------------------------------------------------------------------------------
>
>                 Key: SAMZA-2284
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2284
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SAMZA-2122 added support for initializing the lag of input streams during task
> initialization. To accomplish that, the metadata of an input stream is
> fetched for every system stream partition assigned to a task instance. As a
> result, the SamzaContainer startup sequence fetches the metadata of the same
> input streams multiple times. Fetching the metadata of a stream requires a
> remote call to the underlying messaging broker and is very expensive. These
> redundant fetch-input-stream-metadata API invocations significantly delayed
> the start of actual message processing by the Samza job.
> Impact:
> 1. For some Samza jobs at LinkedIn, we observed that this
> fetch-input-stream-metadata loop took around 1.5 hours to complete.
> 2. The redundant fetch-input-stream-metadata remote API calls significantly
> increase the load on the underlying messaging broker.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
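A minimal sketch of the deduplication this issue describes, written in Java and not taken from the actual Samza patch. It assumes hypothetical stand-in types (`Ssp` for a system stream partition, `StreamKey` for a system/stream pair) and a hypothetical `remoteFetch` function standing in for the expensive broker metadata call. The per-partition loop issues one remote call per assigned partition. The per-stream version caches by stream, so each stream's metadata is fetched only once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class StreamMetadataDedupSketch {
  // Hypothetical stand-in for a system stream partition: (system, stream, partition).
  record Ssp(String system, String stream, int partition) {}

  // Hypothetical stand-in for a stream identity: (system, stream), without the partition.
  record StreamKey(String system, String stream) {}

  // Redundant pattern: one remote metadata call for every assigned partition,
  // even when many partitions belong to the same stream.
  static Map<Ssp, String> fetchPerPartition(List<Ssp> ssps, Function<StreamKey, String> remoteFetch) {
    Map<Ssp, String> result = new HashMap<>();
    for (Ssp ssp : ssps) {
      result.put(ssp, remoteFetch.apply(new StreamKey(ssp.system(), ssp.stream())));
    }
    return result;
  }

  // Deduplicated pattern: cache metadata per stream so the remote call runs
  // once per distinct stream, then reuse it for every partition of that stream.
  static Map<Ssp, String> fetchPerStream(List<Ssp> ssps, Function<StreamKey, String> remoteFetch) {
    Map<StreamKey, String> cache = new HashMap<>();
    Map<Ssp, String> result = new HashMap<>();
    for (Ssp ssp : ssps) {
      StreamKey key = new StreamKey(ssp.system(), ssp.stream());
      result.put(ssp, cache.computeIfAbsent(key, remoteFetch));
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical assignment: 3 streams x 4 partitions = 12 partitions.
    List<Ssp> ssps = new ArrayList<>();
    for (String stream : List.of("stream-a", "stream-b", "stream-c")) {
      for (int p = 0; p < 4; p++) {
        ssps.add(new Ssp("kafka", stream, p));
      }
    }

    // Counts how many times the simulated remote call is made.
    int[] calls = {0};
    Function<StreamKey, String> countingFetch = key -> {
      calls[0]++;
      return "metadata(" + key.stream() + ")";
    };

    fetchPerPartition(ssps, countingFetch);
    System.out.println("per-partition remote calls: " + calls[0]);

    calls[0] = 0;
    fetchPerStream(ssps, countingFetch);
    System.out.println("per-stream remote calls: " + calls[0]);
  }
}
```

With this hypothetical assignment the first loop makes 12 remote calls and the second makes 3. The saving grows with the number of partitions per stream, which is why per-partition fetching becomes costly for jobs consuming high-partition-count topics.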