[jira] [Commented] (SAMZA-1979) StreamPartitionCountMonitor bug.

Prateek Maheshwari (JIRA) Tue, 06 Nov 2018 15:44:17 -0800


    [ 
https://issues.apache.org/jira/browse/SAMZA-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677420#comment-16677420
 ]


Prateek Maheshwari commented on SAMZA-1979:
-------------------------------------------

Can you clarify why the job restarting and picking up the new partition count 
is an issue? Isn't that what the StreamPartitionCountMonitor is for?

> StreamPartitionCountMonitor bug.
> --------------------------------
>
>                 Key: SAMZA-1979
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1979
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major
>
> Samza relies on a daemon thread named StreamPartitionCountMonitor which runs 
> in the JobCoordinator to determine any change in the partition count of input 
> streams.
>  
> Here's the control flow of StreamPartitionCountMonitor.
>  * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
> partition count of the input streams.
>  * StreamPartitionCountMonitor periodically fetches the partition count of 
> the input streams and compares it against the initialized partition count.
>  * In case of stateful jobs, if there's a difference in partition count, then 
> it stops the JobCoordinator. 
>  
>  However for stateful jobs, the above solution might not work in the 
> following scenarios: 
> A. In between the scheduling interval of the StreamPartitionCountMonitor, 
> let's say that there's a change in partition count of input stream. Before 
> the StreamPartitionCountMonitor's successive scheduled run, let's say the 
> JobCoordinator process is killed(due to nodemanager failure). Yarn will 
> reschedule the Jobcoordinator process to run in a different nodemanager 
> machine. When the StreamPartitionCountMonitor is run in a different 
> application attempt of Jobcoordinator, then it would have missed the signal 
> of input partition count change.
>  B. Let's say that when the input stream partition count changes, the user 
> stops the samza job. The daemon thread spawned in the subsequent run of the 
> samza job would not have the adequate state to determine that the partition 
> count of the input streams has changed.
> The general pattern here is that before the monitor detects the partition 
> count change, if the JobCoordinator process is killed then we would lose the 
> in-memory state.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (SAMZA-1979) StreamPartitionCountMonitor bug.

Reply via email to