[ 
https://issues.apache.org/jira/browse/SAMZA-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180226#comment-15180226
 ] 

Yi Pan (Data Infrastructure) commented on SAMZA-882:
----------------------------------------------------

[~jarradk], thanks for your input. IMO, the "dormant partition" problem you 
described is better handled by choosing a good hash function for 
partitioning. In your example, with user id as the partition key, the hash 
function maps some number M of users to each partition. Hence, a partition 
can only go "dormant" if *all* M of its users "leave the company". With a 
well-distributed hash function, the probability that every user id mapped to 
a single partition becomes invalid is very low.
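
To make the argument concrete, here is a minimal sketch in Python. It is only an illustration, not Samza or Kafka code: the MD5-based partitioner, the function names, and the assumption that each user becomes inactive independently with probability p_inactive are all hypothetical choices made for the example.

```python
import hashlib
from collections import Counter


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user id to a partition with a stable, well-mixed hash (MD5 here)."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


def users_per_partition(user_ids, num_partitions: int):
    """Return a list whose entry p is the number of users hashed to partition p."""
    counts = Counter(partition_for(u, num_partitions) for u in user_ids)
    return [counts.get(p, 0) for p in range(num_partitions)]


def dormant_probability(users_in_partition: int, p_inactive: float) -> float:
    """Probability that all M users of one partition are inactive.

    Assumes (hypothetically) that users become inactive independently,
    each with probability p_inactive, so the result is p_inactive ** M.
    """
    return p_inactive ** users_in_partition


if __name__ == "__main__":
    ids = [f"user-{i}" for i in range(1000)]
    counts = users_per_partition(ids, 8)
    print("users per partition:", counts)
    # Even a pessimistic 50% inactivity rate makes a fully dormant
    # partition vanishingly unlikely once M is in the dozens.
    print("P(dormant) for smallest partition:",
          dormant_probability(min(counts), 0.5))
```

Under these assumptions the chance of a fully dormant partition shrinks exponentially in M, which is why the spread of keys across partitions matters more than any individual user leaving.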

> Detect partition count changes in input streams
> -----------------------------------------------
>
>                 Key: SAMZA-882
>                 URL: https://issues.apache.org/jira/browse/SAMZA-882
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>            Reporter: Navina Ramesh
>            Assignee: Navina Ramesh
>             Fix For: 0.10.1
>
>
> This is a known issue where any change in the partition count in the upstream 
> affects the Samza job and it needs to be restarted. In such scenarios, we 
> experience data loss or incorrect processing because the application logic 
> depends on the partitioning strategy. It is worsened by the fact that we 
> don't even have a good mechanism to detect such a change. 
> As a first step towards detection, I propose that we modify the stream 
> metadata cache maintained in Samza such that when there is a change in 
> partition count, we increment a gauge metric. This way we can at least 
> attach a hook to monitor when this happens and take necessary actions. 
> However, in the long-term, we need to come up with a better strategy for 
> handling this. 
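
The detection idea in the description above could be sketched roughly as follows. This is a hypothetical Python illustration, not Samza's actual stream metadata cache or metrics API: the class name PartitionCountMonitor, the observe method, and the plain integer standing in for a gauge are all assumptions.

```python
class PartitionCountMonitor:
    """Remembers the last seen partition count per stream and counts changes."""

    def __init__(self):
        self._last_counts = {}   # stream name -> last observed partition count
        self.change_gauge = 0    # stand-in for a real gauge metric

    def observe(self, stream: str, partition_count: int) -> bool:
        """Record a fresh metadata reading; return True if the count changed."""
        previous = self._last_counts.get(stream)
        self._last_counts[stream] = partition_count
        if previous is not None and previous != partition_count:
            # A monitoring hook or alert could be attached to this gauge.
            self.change_gauge += 1
            return True
        return False


if __name__ == "__main__":
    monitor = PartitionCountMonitor()
    monitor.observe("page-views", 8)    # first reading, nothing to compare
    monitor.observe("page-views", 8)    # unchanged
    if monitor.observe("page-views", 16):
        print("partition count changed; gauge =", monitor.change_gauge)
```

The first reading for a stream is treated as a baseline rather than a change, so the gauge only moves when a later metadata refresh disagrees with an earlier one.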



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
