[jira] [Resolved] (SAMZA-1979) StreamPartitionCountMonitor bug.

2018-11-06 Thread Shanthoosh Venkataraman (JIRA)


 [ 
https://issues.apache.org/jira/browse/SAMZA-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman resolved SAMZA-1979.

Resolution: Fixed

> StreamPartitionCountMonitor bug.
> 
>
> Key: SAMZA-1979
> URL: https://issues.apache.org/jira/browse/SAMZA-1979
> Project: Samza
>  Issue Type: Bug
>Reporter: Shanthoosh Venkataraman
>Assignee: Shanthoosh Venkataraman
>Priority: Major
>
> Samza relies on a daemon thread named StreamPartitionCountMonitor which runs 
> in the JobCoordinator to determine any change in the partition count of input 
> streams.
>  
> Here's the control flow of StreamPartitionCountMonitor:
>  * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
> partition count of the input streams.
>  * StreamPartitionCountMonitor periodically fetches the partition count of 
> the input streams and compares it against the initialized partition count.
>  * In case of stateful jobs, if there's a difference in partition count, then 
> it stops the JobCoordinator. 
>  
> However, for stateful jobs, this approach can fail in the following 
> scenarios: 
> A. Suppose the partition count of an input stream changes in between two 
> scheduled runs of the StreamPartitionCountMonitor, and the JobCoordinator 
> process is killed (e.g., due to a NodeManager failure) before the monitor's 
> next run. YARN will reschedule the JobCoordinator process on a different 
> NodeManager machine, and the StreamPartitionCountMonitor running in the new 
> application attempt will have missed the signal of the input partition count 
> change.
>  B. Suppose the user stops the Samza job after the input stream's partition 
> count changes. The daemon thread spawned in the subsequent run of the Samza 
> job would not have adequate state to determine that the partition count of 
> the input streams has changed.
>  
> The general pattern: if the partition count of the input streams changes and 
> the JobCoordinator process is killed before the monitor's next scheduled run, 
> we fail to detect the change.
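The control flow described above can be sketched as a small daemon-thread loop. This is an illustrative sketch only, not Samza's actual StreamPartitionCountMonitor API; the class, constructor, and method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

// Illustrative sketch of the monitor's control flow (hypothetical names, not
// Samza's actual API): capture a baseline partition count per stream at
// startup, then periodically re-fetch and compare on a daemon thread.
public class PartitionCountMonitorSketch {
    private final Map<String, Integer> baseline;        // counts captured when the JobCoordinator starts
    private final Function<String, Integer> fetchCount; // fetches the current partition count of a stream
    private final Consumer<String> onChange;            // e.g. stop the JobCoordinator for stateful jobs
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "partition-count-monitor");
            t.setDaemon(true); // daemon thread, as in the description above
            return t;
        });

    public PartitionCountMonitorSketch(Map<String, Integer> initialCounts,
                                       Function<String, Integer> fetchCount,
                                       Consumer<String> onChange) {
        this.baseline = new HashMap<>(initialCounts);
        this.fetchCount = fetchCount;
        this.onChange = onChange;
    }

    // Schedule periodic checks; the baseline exists only in this process's memory.
    public void start(long intervalSeconds) {
        scheduler.scheduleAtFixedRate(this::checkOnce, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    // One scheduled run: compare each stream's current count against the baseline.
    public void checkOnce() {
        for (Map.Entry<String, Integer> e : baseline.entrySet()) {
            if (!e.getValue().equals(fetchCount.apply(e.getKey()))) {
                onChange.accept(e.getKey());
            }
        }
    }
}
```

Note that `baseline` lives only in memory, which is exactly why a new application attempt cannot observe a change that happened while the process was down.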



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SAMZA-1776) Refactor KafkaSystemConsumer to remove the usage of deprecated SimpleConsumer client

2018-11-06 Thread Yi Pan (Data Infrastructure) (JIRA)


[ 
https://issues.apache.org/jira/browse/SAMZA-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677431#comment-16677431
 ] 

Yi Pan (Data Infrastructure) commented on SAMZA-1776:
-

[~zhjohnzzh], thanks for checking in on the updates. Please watch for the vote 
on d...@samza.apache.org. The Samza 1.0 release RC4 has been out for a while, 
and I would expect the vote to close very soon.

Best!

> Refactor KafkaSystemConsumer to remove the usage of deprecated SimpleConsumer 
> client
> 
>
> Key: SAMZA-1776
> URL: https://issues.apache.org/jira/browse/SAMZA-1776
> Project: Samza
>  Issue Type: Improvement
>Reporter: Yi Pan (Data Infrastructure)
>Assignee: Boris Shkolnik
>Priority: Major
> Fix For: 1.0
>
>
> After Kafka 0.11, SimpleConsumer clients are deprecated, and we have also 
> discovered bugs in the deprecated old consumer client library (SAMZA-1371).





[jira] [Commented] (SAMZA-1979) StreamPartitionCountMonitor bug.

2018-11-06 Thread Shanthoosh Venkataraman (JIRA)


[ 
https://issues.apache.org/jira/browse/SAMZA-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677432#comment-16677432
 ] 

Shanthoosh Venkataraman commented on SAMZA-1979:


This is a problem only for stateful Samza jobs. Our existing implementation of 
StreamPartitionCountMonitor intends to stop a stateful job when it detects a 
partition count change in the input streams.
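The restart gap behind scenarios A and B reduces to a small arithmetic sketch (the values below are hypothetical): because the baseline partition count lives only in the monitor's memory, a new run re-captures it from the already-expanded stream and sees no difference.

```java
// Hypothetical values illustrating why a restart loses the signal: the
// baseline is re-read from live stream metadata on each (re)start.
public class RestartGapSketch {
    public static boolean detectsChange(int baselineCount, int currentCount) {
        return baselineCount != currentCount;
    }

    public static void main(String[] args) {
        int countAtFirstStart = 4;   // baseline captured by the first application attempt
        int countAfterExpansion = 8; // partitions added while the JobCoordinator is down

        // Had the first attempt survived, its next scheduled run would detect it:
        System.out.println(detectsChange(countAtFirstStart, countAfterExpansion)); // true

        // The second attempt re-initializes its baseline from the expanded
        // stream, so the comparison sees nothing:
        int baselineAtSecondStart = countAfterExpansion;
        System.out.println(detectsChange(baselineAtSecondStart, countAfterExpansion)); // false
    }
}
```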

 






[jira] [Updated] (SAMZA-1979) StreamPartitionCountMonitor bug.

2018-11-06 Thread Shanthoosh Venkataraman (JIRA)


 [ 
https://issues.apache.org/jira/browse/SAMZA-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman updated SAMZA-1979:
---
Description: 
Samza relies on a daemon thread named StreamPartitionCountMonitor which runs in 
the JobCoordinator to determine any change in the partition count of input 
streams.

 

Here's the control flow of StreamPartitionCountMonitor.
 * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
partition count of the input streams.

 * StreamPartitionCountMonitor periodically fetches the partition count of the 
input streams and compares it against the initialized partition count.

 * In case of stateful jobs, if there's a difference in partition count, then 
it stops the JobCoordinator. 

 

However, for stateful jobs, this approach can fail in the following 
scenarios: 

A. Suppose the partition count of an input stream changes in between two 
scheduled runs of the StreamPartitionCountMonitor, and the JobCoordinator 
process is killed (e.g., due to a NodeManager failure) before the monitor's 
next run. YARN will reschedule the JobCoordinator process on a different 
NodeManager machine, and the StreamPartitionCountMonitor running in the new 
application attempt will have missed the signal of the input partition count 
change.
B. Suppose the user stops the Samza job after the input stream's partition 
count changes. The daemon thread spawned in the subsequent run of the Samza 
job would not have adequate state to determine that the partition count of 
the input streams has changed.

 

The general pattern: if the partition count of the input streams changes and 
the JobCoordinator process is killed before the monitor's next scheduled run, 
we fail to detect the change.

  was:
Samza relies on a daemon thread named StreamPartitionCountMonitor which runs in 
the JobCoordinator to determine any change in the partition count of input 
streams.

 

Here's the control flow of StreamPartitionCountMonitor.
 * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
partition count of the input streams.

 * StreamPartitionCountMonitor periodically fetches the partition count of the 
input streams and compares it against the initialized partition count.

 * In case of stateful jobs, if there's a difference in partition count, then 
it stops the JobCoordinator. 

 

However for stateful jobs, the above solution might not work in the following 
scenarios: 

A. In between the scheduling interval of the StreamPartitionCountMonitor, let's 
say that there's a change in partition count of input stream. Before the 
StreamPartitionCountMonitor's successive scheduled run, let's say the 
JobCoordinator process is killed(due to nodemanager failure). Yarn will 
reschedule the Jobcoordinator process to run in a different nodemanager 
machine. When the StreamPartitionCountMonitor is run in a different application 
attempt of Jobcoordinator, then it would have missed the signal of input 
partition count change.
 B. Let's say that when the input stream partition count changes, the user 
stops the samza job. The daemon thread spawned in the subsequent run of the 
samza job would not have the adequate state to determine that the partition 
count of the input streams has changed.

 

The general pattern here is that before the monitor detects the partition count 
change, if the JobCoordinator process is killed then the monitor would fail to 
detect the partition count change.



[jira] [Updated] (SAMZA-1979) StreamPartitionCountMonitor bug.

2018-11-06 Thread Shanthoosh Venkataraman (JIRA)


 [ 
https://issues.apache.org/jira/browse/SAMZA-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman updated SAMZA-1979:
---
Description: 
Samza relies on a daemon thread named StreamPartitionCountMonitor which runs in 
the JobCoordinator to determine any change in the partition count of input 
streams.

 

Here's the control flow of StreamPartitionCountMonitor:
 * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
partition count of the input streams.

 * StreamPartitionCountMonitor periodically fetches the partition count of the 
input streams and compares it against the initialized partition count.

 * In case of stateful jobs, if there's a difference in partition count, then 
it stops the JobCoordinator. 

 

However, for stateful jobs, this approach can fail in the following 
scenarios: 

A. Suppose the partition count of an input stream changes in between two 
scheduled runs of the StreamPartitionCountMonitor, and the JobCoordinator 
process is killed (e.g., due to a NodeManager failure) before the monitor's 
next run. YARN will reschedule the JobCoordinator process on a different 
NodeManager machine, and the StreamPartitionCountMonitor running in the new 
application attempt will have missed the signal of the input partition count 
change.
B. Suppose the user stops the Samza job after the input stream's partition 
count changes. The daemon thread spawned in the subsequent run of the Samza 
job would not have adequate state to determine that the partition count of 
the input streams has changed.

 

The general pattern: if the partition count of the input streams changes and 
the JobCoordinator process is killed before the monitor's next scheduled run, 
we fail to detect the change.

  was:
Samza relies on a daemon thread named StreamPartitionCountMonitor which runs in 
the JobCoordinator to determine any change in the partition count of input 
streams.

 

Here's the control flow of StreamPartitionCountMonitor.
 * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
partition count of the input streams.

 * StreamPartitionCountMonitor periodically fetches the partition count of the 
input streams and compares it against the initialized partition count.

 * In case of stateful jobs, if there's a difference in partition count, then 
it stops the JobCoordinator. 

 

However for stateful jobs, the above solution might not work in the following 
scenarios: 

A. In between the scheduling interval of the StreamPartitionCountMonitor, let's 
say that there's a change in partition count of input stream. Before the 
StreamPartitionCountMonitor's successive scheduled run, let's say the 
JobCoordinator process is killed(due to nodemanager failure). Yarn will 
reschedule the Jobcoordinator process to run in a different nodemanager 
machine. When the StreamPartitionCountMonitor is run in a different application 
attempt of Jobcoordinator, then it would have missed the signal of input 
partition count change.
B. Let's say that when the input stream partition count changes, the user stops 
the samza job. The daemon thread spawned in the subsequent run of the samza job 
would not have the adequate state to determine that the partition count of the 
input streams has changed.

 

The general pattern here is that if there's a change in partition count of 
input streams and JobCoordinator process is killed before the scheduled to run 
then we would fail to detect the partition count change.



[jira] [Commented] (SAMZA-1979) StreamPartitionCountMonitor bug.

2018-11-06 Thread Prateek Maheshwari (JIRA)


[ 
https://issues.apache.org/jira/browse/SAMZA-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677420#comment-16677420
 ] 

Prateek Maheshwari commented on SAMZA-1979:
---

Can you clarify why the job restarting and picking up the new partition count 
is an issue? Isn't that what the StreamPartitionCountMonitor is for?

> StreamPartitionCountMonitor bug.
> 
>
> Key: SAMZA-1979
> URL: https://issues.apache.org/jira/browse/SAMZA-1979
> Project: Samza
>  Issue Type: Bug
>Reporter: Shanthoosh Venkataraman
>Assignee: Shanthoosh Venkataraman
>Priority: Major
>
> Samza relies on a daemon thread named StreamPartitionCountMonitor which runs 
> in the JobCoordinator to determine any change in the partition count of input 
> streams.
>  
> Here's the control flow of StreamPartitionCountMonitor.
>  * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
> partition count of the input streams.
>  * StreamPartitionCountMonitor periodically fetches the partition count of 
> the input streams and compares it against the initialized partition count.
>  * In case of stateful jobs, if there's a difference in partition count, then 
> it stops the JobCoordinator. 
>  
>  However, for stateful jobs, this approach can fail in the following 
> scenarios: 
> A. Suppose the partition count of an input stream changes in between two 
> scheduled runs of the StreamPartitionCountMonitor, and the JobCoordinator 
> process is killed (e.g., due to a NodeManager failure) before the monitor's 
> next run. YARN will reschedule the JobCoordinator process on a different 
> NodeManager machine, and the StreamPartitionCountMonitor running in the new 
> application attempt will have missed the signal of the input partition count 
> change.
>  B. Suppose the user stops the Samza job after the input stream's partition 
> count changes. The daemon thread spawned in the subsequent run of the Samza 
> job would not have adequate state to determine that the partition count of 
> the input streams has changed.
> The general pattern here is that if the JobCoordinator process is killed 
> before the monitor detects the partition count change, the in-memory state 
> is lost.





[jira] [Updated] (SAMZA-1979) StreamPartitionCountMonitor bug.

2018-11-06 Thread Shanthoosh Venkataraman (JIRA)


 [ 
https://issues.apache.org/jira/browse/SAMZA-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman updated SAMZA-1979:
---
Description: 
Samza relies on a daemon thread named StreamPartitionCountMonitor which runs in 
the JobCoordinator to determine any change in the partition count of input 
streams.

 

Here's the control flow of StreamPartitionCountMonitor.
 * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
partition count of the input streams.

 * StreamPartitionCountMonitor periodically fetches the partition count of the 
input streams and compares it against the initialized partition count.

 * In case of stateful jobs, if there's a difference in partition count, then 
it stops the JobCoordinator. 

 

However, for stateful jobs, this approach can fail in the following 
scenarios: 

A. Suppose the partition count of an input stream changes in between two 
scheduled runs of the StreamPartitionCountMonitor, and the JobCoordinator 
process is killed (e.g., due to a NodeManager failure) before the monitor's 
next run. YARN will reschedule the JobCoordinator process on a different 
NodeManager machine, and the StreamPartitionCountMonitor running in the new 
application attempt will have missed the signal of the input partition count 
change.
B. Suppose the user stops the Samza job after the input stream's partition 
count changes. The daemon thread spawned in the subsequent run of the Samza 
job would not have adequate state to determine that the partition count of 
the input streams has changed.

The general pattern here is that if the JobCoordinator process is killed 
before the monitor detects the partition count change, the in-memory state 
is lost.

 

  was:
Samza relies on a daemon thread named StreamPartitionCountMonitor which runs in 
the JobCoordinator to determine any change in the partition count of input 
streams.

 

Here's the control flow of StreamPartitionCountMonitor.
 * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
partition count of the input streams.

 * StreamPartitionCountMonitor periodically fetches the partition count of the 
input streams and compares it against the initialized partition count.

 * In case of stateful jobs, if there's a difference in partition count, then 
it stops the JobCoordinator. 

 

 However for stateful jobs, the above solution might not work in the following 
scenarios: 

A. In between the scheduling interval of the monitor, let's say that there's a 
change in partition count of input stream. Before the 
StreamPartitionCountMonitor's successive scheduled run, let's say the 
JobCoordinator process is killed(due to nodemanager failure). Yarn will 
reschedule the Jobcoordinator process to run in a different nodemanager 
machine. When the StreamPartitionCountMonitor is run in a different application 
attempt of Jobcoordinator, then it would have missed the signal of input 
partition count change.
 B. Let's say that when the input stream partition count changes, the user 
stops the samza job. The daemon thread spawned in the subsequent run of the 
samza job would not have the adequate state to determine that the partition 
count of the input streams has changed.

The general pattern here is that before the monitor detects the partition count 
change, if the JobCoordinator process is killed then we would lose the 
in-memory state.

 



[jira] [Updated] (SAMZA-1979) StreamPartitionCountMonitor bug.

2018-11-06 Thread Shanthoosh Venkataraman (JIRA)


 [ 
https://issues.apache.org/jira/browse/SAMZA-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman updated SAMZA-1979:
---
Description: 
Samza relies on a daemon thread named StreamPartitionCountMonitor which runs in 
the JobCoordinator to determine any change in the partition count of input 
streams.

 

Here's the control flow of StreamPartitionCountMonitor.
 * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
partition count of the input streams.

 * StreamPartitionCountMonitor periodically fetches the partition count of the 
input streams and compares it against the initialized partition count.

 * In case of stateful jobs, if there's a difference in partition count, then 
it stops the JobCoordinator. 

 

However, for stateful jobs, this approach can fail in the following 
scenarios: 

A. Suppose the partition count of an input stream changes in between two 
scheduled runs of the StreamPartitionCountMonitor, and the JobCoordinator 
process is killed (e.g., due to a NodeManager failure) before the monitor's 
next run. YARN will reschedule the JobCoordinator process on a different 
NodeManager machine, and the StreamPartitionCountMonitor running in the new 
application attempt will have missed the signal of the input partition count 
change.
B. Suppose the user stops the Samza job after the input stream's partition 
count changes. The daemon thread spawned in the subsequent run of the Samza 
job would not have adequate state to determine that the partition count of 
the input streams has changed.

 

The general pattern here is that if the JobCoordinator process is killed 
before the monitor detects a partition count change, the change goes 
undetected.

  was:
Samza relies on a daemon thread named StreamPartitionCountMonitor which runs in 
the JobCoordinator to determine any change in the partition count of input 
streams.

 

Here's the control flow of StreamPartitionCountMonitor.
 * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
partition count of the input streams.

 * StreamPartitionCountMonitor periodically fetches the partition count of the 
input streams and compares it against the initialized partition count.

 * In case of stateful jobs, if there's a difference in partition count, then 
it stops the JobCoordinator. 

 

 However for stateful jobs, the above solution might not work in the following 
scenarios: 

A. In between the scheduling interval of the StreamPartitionCountMonitor, let's 
say that there's a change in partition count of input stream. Before the 
StreamPartitionCountMonitor's successive scheduled run, let's say the 
JobCoordinator process is killed(due to nodemanager failure). Yarn will 
reschedule the Jobcoordinator process to run in a different nodemanager 
machine. When the StreamPartitionCountMonitor is run in a different application 
attempt of Jobcoordinator, then it would have missed the signal of input 
partition count change.
 B. Let's say that when the input stream partition count changes, the user 
stops the samza job. The daemon thread spawned in the subsequent run of the 
samza job would not have the adequate state to determine that the partition 
count of the input streams has changed.

The general pattern here is that before the monitor detects the partition count 
change, if the JobCoordinator process is killed then the monitor would fail to 
detect the partition count change.



[jira] [Updated] (SAMZA-1979) StreamPartitionCountMonitor bug.

2018-11-06 Thread Shanthoosh Venkataraman (JIRA)


 [ 
https://issues.apache.org/jira/browse/SAMZA-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman updated SAMZA-1979:
---
Description: 
Samza relies on a daemon thread named StreamPartitionCountMonitor which runs in 
the JobCoordinator to determine any change in the partition count of input 
streams.

 

Here's the control flow of StreamPartitionCountMonitor.
 * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
partition count of the input streams.

 * StreamPartitionCountMonitor periodically fetches the partition count of the 
input streams and compares it against the initialized partition count.

 * In case of stateful jobs, if there's a difference in partition count, then 
it stops the JobCoordinator. 

 

However, for stateful jobs, this approach can fail in the following 
scenarios: 

A. Suppose the partition count of an input stream changes in between two 
scheduled runs of the StreamPartitionCountMonitor, and the JobCoordinator 
process is killed (e.g., due to a NodeManager failure) before the monitor's 
next run. YARN will reschedule the JobCoordinator process on a different 
NodeManager machine, and the StreamPartitionCountMonitor running in the new 
application attempt will have missed the signal of the input partition count 
change.
B. Suppose the user stops the Samza job after the input stream's partition 
count changes. The daemon thread spawned in the subsequent run of the Samza 
job would not have adequate state to determine that the partition count of 
the input streams has changed.

The general pattern here is that if the JobCoordinator process is killed 
before the monitor detects a partition count change, the change goes 
undetected.

  was:
Samza relies on a daemon thread named StreamPartitionCountMonitor which runs in 
the JobCoordinator to determine any change in the partition count of input 
streams.

 

Here's the control flow of StreamPartitionCountMonitor.
 * JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
partition count of the input streams.

 * StreamPartitionCountMonitor periodically fetches the partition count of the 
input streams and compares it against the initialized partition count.

 * In case of stateful jobs, if there's a difference in partition count, then 
it stops the JobCoordinator. 

 

However, for stateful jobs, the above solution might not work in the following 
scenarios:

A. Suppose the partition count of an input stream changes between two 
scheduled runs of the StreamPartitionCountMonitor, and the JobCoordinator 
process is killed (for example, due to a NodeManager failure) before the 
monitor's next run. YARN will reschedule the JobCoordinator process on a 
different NodeManager machine, and the StreamPartitionCountMonitor in that new 
application attempt will have missed the partition-count change.
B. Suppose the user stops the Samza job after the input stream's partition 
count changes. The daemon thread spawned in the subsequent run of the Samza 
job would not have adequate state to determine that the partition count of the 
input streams has changed.

The general pattern: if the JobCoordinator process is killed before the 
monitor detects a partition-count change, the in-memory state is lost.

 



[jira] [Updated] (SAMZA-1979) StreamPartitionCountMonitor bug.

2018-11-06 Thread Shanthoosh Venkataraman (JIRA)


 [ 
https://issues.apache.org/jira/browse/SAMZA-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman updated SAMZA-1979:
---
Issue Type: Bug  (was: New Feature)




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SAMZA-1979) StreamPartitionCountMonitor bug.

2018-11-06 Thread Shanthoosh Venkataraman (JIRA)
Shanthoosh Venkataraman created SAMZA-1979:
--

 Summary: StreamPartitionCountMonitor bug.
 Key: SAMZA-1979
 URL: https://issues.apache.org/jira/browse/SAMZA-1979
 Project: Samza
  Issue Type: New Feature
Reporter: Shanthoosh Venkataraman
Assignee: Shanthoosh Venkataraman


Samza relies on a daemon thread named StreamPartitionCountMonitor which runs in 
the JobCoordinator to determine any change in the partition count of input 
streams.

JobCoordinator initializes the StreamPartitionCountMonitor thread with the 
partition count of the input streams.

StreamPartitionCountMonitor periodically fetches the partition count of the 
input streams and compares it against the initialized partition count. In case 
of stateful jobs, if there's a difference in partition count, then it stops the 
JobCoordinator. 

However, for stateful jobs, the above solution might not work in the following 
scenarios:

A. Suppose the partition count of an input stream changes between two 
scheduled runs of the monitor, and the JobCoordinator process is killed (for 
example, due to a NodeManager failure) before the monitor's next run. YARN 
will reschedule the JobCoordinator process on a different NodeManager machine, 
and the StreamPartitionCountMonitor in that new application attempt will have 
missed the partition-count change.
B. Suppose the user stops the Samza job after the input stream's partition 
count changes. The daemon thread spawned in the subsequent run of the Samza 
job would not have adequate state to determine that the partition count of the 
input streams has changed.

The general pattern: if the JobCoordinator process is killed before the 
monitor detects a partition-count change, the in-memory state is lost.

 





[jira] [Commented] (SAMZA-1979) StreamPartitionCountMonitor bug.

2018-11-06 Thread Shanthoosh Venkataraman (JIRA)


[ 
https://issues.apache.org/jira/browse/SAMZA-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677406#comment-16677406
 ] 

Shanthoosh Venkataraman commented on SAMZA-1979:


The proposal is to rely on the task-to-partition assignments in the JobModel, 
rather than on in-memory state in StreamPartitionCountMonitor.
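A sketch of that proposal: derive the expected per-stream partition counts from the JobModel's task-to-partition assignments, which survive a JobCoordinator restart. The names and the "stream:partition" encoding here are illustrative, not Samza's actual API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: instead of caching partition counts in the monitor's
// memory, recompute the expected count per stream from the durable
// task-to-partition assignments in the JobModel.
public class JobModelPartitionCounts {
    // taskToSsps: each task mapped to the "stream:partition" entries it owns.
    public static Map<String, Integer> expectedCounts(Map<String, Set<String>> taskToSsps) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> ssps : taskToSsps.values()) {
            for (String ssp : ssps) {
                String stream = ssp.substring(0, ssp.indexOf(':'));
                counts.merge(stream, 1, Integer::sum); // one partition per SSP
            }
        }
        return counts;
    }
}
```

Comparing these recomputed counts against freshly fetched ones works across application attempts and job restarts, closing both scenarios A and B.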






[jira] [Commented] (SAMZA-1817) Long classpath support

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SAMZA-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677233#comment-16677233
 ] 

ASF GitHub Bot commented on SAMZA-1817:
---

GitHub user Sanil15 opened a pull request:

https://github.com/apache/samza/pull/797

SAMZA-1817: Generating relative path from absolute paths

- We removed wildcarding to shorten the classpath, since it introduces 
nondeterminism in the classpath across consecutive deploys.
- This patch constructs the classpath (in run-class.sh) by generating the 
relative path from the directory executing the script to the target directory 
where the jars are present.
- From deploying jobs on YARN and locally, we observed that switching to 
relative paths saves over 65% of the classpath length compared to absolute 
paths.
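The relative-path computation the patch describes can be sketched with `java.nio.file` (illustrative only; the actual change is in the run-class.sh bash script, and Unix-style paths are assumed):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the SAMZA-1817 idea: express each jar's absolute path relative
// to the directory the launch script runs from, shortening every classpath
// entry while still pointing at the same file.
public class RelativeClasspath {
    public static String relativize(String scriptDir, String jarPath) {
        Path base = Paths.get(scriptDir);
        return base.relativize(Paths.get(jarPath)).toString();
    }
}
```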



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sanil15/samza SAMZA-1817-reopen

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/797.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #797


commit 218ab2cb8bd8f9095a14c339427cb623c3fb9a91
Author: Sanil15 
Date:   2018-11-06T19:39:37Z

Generating relative path from absolute paths




> Long classpath support
> --
>
> Key: SAMZA-1817
> URL: https://issues.apache.org/jira/browse/SAMZA-1817
> Project: Samza
>  Issue Type: Bug
>Reporter: Sanil Jain
>Assignee: Sanil Jain
>Priority: Major
>
> The Samza team could also address the long-classpath problem by implementing 
> something that reduces the classpath length.
> Some potential ideas to fix this:
> - Change the generated classpath (built in Samza's bash scripts) to be a 
> list of directory wildcard entries instead of a list of every individual 
> jar. Wildcards in the classpath are supported in JDK >= 6.
> This should be a relatively simple change: collect the parent directory of 
> every jar file into a list, sort and de-duplicate the list, then append a 
> wildcard to each entry to pick up all jars under each directory in the list 
> (ask Jon Bringhurst if you have any questions).
> See 
> https://github.com/apache/samza/blob/92ae4c628abb3d113520ec47ca82f08c480123ad/samza-shell/src/main/bash/run-class.sh#L63
>  for where the code change for this should probably be.
> - Use the manifest in a special "uber-jar" to list all dependencies in the 
> classpath. Then just include that one "uber-jar" in the classpath on the cli. 
> This would probably be tough to do, and seems like it would be annoying to 
> manage, but apparently IntelliJ uses this strategy.
> Note that this is not an issue with the number of arguments (argc limit, 
> which is based on a multiple of the ulimit -s stack size), it is a problem 
> with the length of a single argument (strlen(*argv) limit based on 
> MAX_ARG_STRLEN, which is a multiple of the PAGE_SIZE). So, the only way to 
> get around this limit is to recompile the kernel with a different page size 
> (which obviously isn't practical).
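The wildcard idea from the first bullet in the quoted issue can be sketched as follows (hypothetical helper, assuming Unix-style paths; the real change would live in run-class.sh):

```java
import java.io.File;
import java.util.List;
import java.util.TreeSet;

// Sketch of the classpath-wildcard idea: collect the unique parent
// directories of all jars, sort them, and emit one "dir/*" entry per
// directory (the JVM expands classpath wildcards since JDK 6) instead of
// listing every individual jar.
public class WildcardClasspath {
    public static String build(List<String> jarPaths) {
        TreeSet<String> dirs = new TreeSet<>(); // sorted, de-duplicated
        for (String jar : jarPaths) {
            dirs.add(new File(jar).getParent());
        }
        StringBuilder cp = new StringBuilder();
        for (String dir : dirs) {
            if (cp.length() > 0) {
                cp.append(File.pathSeparator);
            }
            cp.append(dir).append("/*");
        }
        return cp.toString();
    }
}
```

This keeps the classpath length proportional to the number of jar directories rather than the number of jars, at the cost of the nondeterministic expansion order the PR description mentions.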





[jira] [Assigned] (SAMZA-1393) Asynchronous SystemProducer API

2018-11-06 Thread Jake Maes (JIRA)


 [ 
https://issues.apache.org/jira/browse/SAMZA-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Maes reassigned SAMZA-1393:


Assignee: Daniel Nishimura

> Asynchronous SystemProducer API
> ---
>
> Key: SAMZA-1393
> URL: https://issues.apache.org/jira/browse/SAMZA-1393
> Project: Samza
>  Issue Type: Bug
>Reporter: Jake Maes
>Assignee: Daniel Nishimura
>Priority: Major
>
> We've encountered a number of issues with the current SystemProducer 
> interface when used with async processors. For example, see SAMZA-1392. 
> Most of the issues arise from the fact that there's no way to notify the 
> user when there is an async error from the producer. Consider the 
> KafkaSystemProducer: if the KafkaProducer callback returns an exception, the 
> KafkaSystemProducer has to track it and find an appropriate time to handle 
> it. Depending on where the exception is handled and whether it is swallowed, 
> the OffsetManager needs to be overly conservative to ensure we do not 
> checkpoint offsets until the corresponding send() was successful. 
> Sometimes whole batches of messages fail, but only a single exception is 
> thrown, with no context about which send() the exception originated from. 
> This will always be unintuitive to the user unless they are synchronously 
> notified of the exception. 
> There hasn't been much planning on this yet, but the idea is to have a send() 
> method that takes a callback which is called directly from the producer 
> callback. So the callback would flow something like this:
> producer.callback -> user.callback -> asynctask.callback -> (throw error | 
> offsetmanager.update)
> Some care needs to be taken to ensure that there are no gaps in the offsets 
> recorded by the OffsetManager. Also, it's not yet clear how to enable users 
> to retry a failed send.
> NOTE: depending on the implementation, the changes in SAMZA-1384 may need to 
> be reverted after this feature is implemented to ensure that the latest 
> offsets are checkpointed.





[jira] [Commented] (SAMZA-1776) Refactor KafkaSystemConsumer to remove the usage of deprecated SimpleConsumer client

2018-11-06 Thread Zihao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SAMZA-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677082#comment-16677082
 ] 

Zihao Zhang commented on SAMZA-1776:


Hi [~nickpan47],

We are really excited to try the new 1.0 version of Samza, especially the fix 
for the stuck-container issue. Do you have an estimate of when this new version 
will be released? Thank you.

> Refactor KafkaSystemConsumer to remove the usage of deprecated SimpleConsumer 
> client
> 
>
> Key: SAMZA-1776
> URL: https://issues.apache.org/jira/browse/SAMZA-1776
> Project: Samza
>  Issue Type: Improvement
>Reporter: Yi Pan (Data Infrastructure)
>Assignee: Boris Shkolnik
>Priority: Major
> Fix For: 1.0
>
>
> After Kafka 0.11, SimpleConsumer clients are deprecated, and we have also 
> discovered bugs in the deprecated old consumer client library (SAMZA-1371).


