[ 
https://issues.apache.org/jira/browse/SAMZA-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman updated SAMZA-2284:
-------------------------------------------
    Issue Type: Improvement  (was: New Feature)

> Remove redundant stream metadata API invocations in SamzaContainer startup 
> sequence.
> ------------------------------------------------------------------------------------
>
>                 Key: SAMZA-2284
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2284
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SAMZA-2122 added support for initializing the lag of input streams during
> task initialization. To accomplish that, the metadata of the input stream was
> fetched for every system stream partition assigned to a task instance.
> As a result, the SamzaContainer startup sequence fetches the metadata of the
> same input streams multiple times. Fetching the metadata of a stream entails
> making a remote call to the underlying messaging broker and is very expensive.
> These redundant fetch-input-stream-metadata API invocations incurred
> significant delays in the start of actual message processing by the Samza job.
> Impact:
> 1. With some Samza jobs at LinkedIn, we observed that this
> fetch-input-stream-metadata loop took around 1.5 hrs to complete.
> 2. The redundant fetch-input-stream-metadata remote API calls significantly
> increase the load on the underlying messaging broker.
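
One way to picture the fix described above is to memoize metadata per stream, so that many system stream partitions of the same stream trigger only one remote call. The following is a minimal sketch under stated assumptions, not the change made for this issue: the class name MemoizedStreamMetadata, the String stream key, and the fetcher function standing in for the broker call are all hypothetical and are not Samza APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not a Samza class): remembers the metadata fetched
 * for each stream so that repeated lookups do not repeat the remote call.
 */
final class MemoizedStreamMetadata<M> {
    // Stand-in for the expensive remote call to the messaging broker (assumption).
    private final Function<String, M> fetcher;
    private final Map<String, M> cache = new ConcurrentHashMap<>();

    MemoizedStreamMetadata(Function<String, M> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the metadata for a stream, invoking the fetcher only on a cache miss. */
    M get(String streamName) {
        return cache.computeIfAbsent(streamName, fetcher);
    }
}
```

With a helper of this shape, a startup loop over every assigned partition would call get() with the partition's stream name, and the number of remote calls would be bounded by the number of distinct input streams rather than the number of partitions.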



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
