[ https://issues.apache.org/jira/browse/SAMZA-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206302#comment-14206302 ]
Martin Kleppmann commented on SAMZA-353:
----------------------------------------
Clarification of my last comment. I think what I'm looking for here is a
SystemStreamPartitionGrouper which generates the *product* of the input streams.
Say you have two input streams: a (with partitions a1, a2, a3, a4) and b (with
partitions b1, b2, b3, b4). In that case I would like to create sixteen
StreamTask instances, one for every combination of partitions: (a1, b1), (a1,
b2), (a1, b3), (a1, b4), (a2, b1), ..., (a4, b4).
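A minimal sketch of such a product grouper, written in Python purely for illustration. It assumes partitions can be modeled as plain (stream, partition) tuples and returns a mapping from task name to the set of partitions that task consumes. The function name `product_grouper`, its input shape and the "Partition a1 b1" naming scheme are all hypothetical; this is not Samza's `SystemStreamPartitionGrouper` interface.

```python
from itertools import product


def product_grouper(partition_counts):
    """Create one task per combination of partitions, one partition from each stream.

    partition_counts: dict of stream name -> number of partitions (hypothetical input shape).
    Returns a dict of task name -> frozenset of (stream, partition) pairs.
    """
    streams = sorted(partition_counts)  # fixed stream order keeps task names deterministic
    # Partitions are numbered from 1 to match the a1..a4 / b1..b4 naming in the comment.
    per_stream = [[(s, p) for p in range(1, partition_counts[s] + 1)] for s in streams]
    groups = {}
    for combo in product(*per_stream):
        # Hypothetical naming scheme: "Partition a1 b1", "Partition a1 b2", ...
        task_name = "Partition " + " ".join(f"{s}{p}" for s, p in combo)
        groups[task_name] = frozenset(combo)
    return groups


groups = product_grouper({"a": 4, "b": 4})
print(len(groups))  # 16
```

Note that each partition of a ends up in four tasks (once per partition of b), which is exactly the same-SSP-in-multiple-tasknames situation this issue is about.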
This would do exactly the right thing if one input stream carries queries and the
other carries documents: every document would be matched against all the queries,
yet the document stream could still be partitioned and thus scaled (each document
partition gets its own set of query processors). Another way of looking at this
is that it enables scatter-gather data flows.
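To illustrate the scatter-gather argument, here is a small simulation sketch (all names and data hypothetical): one "task" runs per (document partition, query partition) pair, each task only sees its own two partitions, and the union of the per-task results is the gather step. The point it shows is that this yields the same matches as checking every document against every query without any partitioning.

```python
from itertools import product


def match_all(doc_partitions, query_partitions, matches):
    """Simulate one task per (document partition, query partition) pair.

    doc_partitions, query_partitions: lists of lists of items (hypothetical data).
    matches: predicate (doc, query) -> bool.
    Returns the set of matching (doc, query) pairs gathered from all tasks.
    """
    results = set()
    for docs, queries in product(doc_partitions, query_partitions):
        # Scatter: this task sees only one document partition and one query partition.
        for doc in docs:
            for query in queries:
                if matches(doc, query):
                    results.add((doc, query))  # Gather: merge per-task results.
    return results


def word_match(doc, query):
    return query in doc.split()


docs = [["apple pie", "banana"], ["cherry tart"]]
queries = [["apple"], ["tart", "pie"]]
print(sorted(match_all(docs, queries, word_match)))
# [('apple pie', 'apple'), ('apple pie', 'pie'), ('cherry tart', 'tart')]
```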
This doesn't answer the question of how multi-subscriber partitions should be
implemented (the main problem is checkpointing, since different StreamTasks in the
same container may each be at a different consumer offset for the same stream
partition). But I think it's quite a reasonable use case.
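One possible way to handle the checkpointing problem, sketched here only as an assumption and not as Samza's actual checkpoint format: key checkpoints by (task name, SSP) rather than by SSP alone. If the container keeps a single consumer per SSP, that consumer has to restart from the lowest offset among the tasks sharing the SSP, and tasks that were further ahead skip the replayed messages they already processed. The sketch assumes integer offsets and treats a checkpoint as "the next offset this task should process"; `resume_plan` and `should_deliver` are hypothetical names.

```python
def resume_plan(checkpoints):
    """checkpoints: dict of (task_name, ssp) -> next offset that task should process.

    Returns (start_offsets, should_deliver):
      start_offsets: ssp -> offset the single shared consumer must start from
      should_deliver(task, ssp, offset): True if a replayed message is new for that task
    """
    start_offsets = {}
    for (task, ssp), offset in checkpoints.items():
        # Rewind far enough for the task that is furthest behind on this SSP.
        start_offsets[ssp] = min(offset, start_offsets.get(ssp, offset))

    def should_deliver(task, ssp, offset):
        # Tasks that were ahead drop messages they processed before the restart.
        return offset >= checkpoints[(task, ssp)]

    return start_offsets, should_deliver


start, should_deliver = resume_plan({("t1", "a1"): 10, ("t2", "a1"): 7, ("t2", "b1"): 3})
print(start)  # {'a1': 7, 'b1': 3}
```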
> Support assigning the same SSP to multiple tasknames
> ----------------------------------------------------
>
> Key: SAMZA-353
> URL: https://issues.apache.org/jira/browse/SAMZA-353
> Project: Samza
> Issue Type: Bug
> Components: container
> Affects Versions: 0.8.0
> Reporter: Jakob Homan
> Labels: design
> Fix For: 0.8.0
>
> Attachments: DESIGN-SAMZA-353-0.md, DESIGN-SAMZA-353-0.pdf
>
>
> Post SAMZA-123, it is possible to add the same SSP to multiple tasknames,
> although currently we check for this and error out if this is done. We
> should think through the implications of having the same SSP appear in
> multiple tasknames and support this if it makes sense.
> This could be used as a broadcast stream that is either added by Samza itself
> to each taskname, or added by individual groupers where that makes sense. Right
> now the container maintains a map of SSP to TaskInstance and delivers messages
> from each SSP to that task instance. With this change, we'd need to change the
> map to SSP -> Set[TaskInstance] and deliver each message to every TI in the set.
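The routing change described in the issue can be sketched as follows. This is a Python stand-in for the container's Scala map, with SSPs and task instances reduced to plain strings; `build_routing` and `deliver` are hypothetical names, not existing Samza methods. It inverts a grouper's output (task name to set of SSPs) into SSP to set of task names, allowing an SSP such as a broadcast stream to appear under several tasks, and then hands each incoming message to every task in the set.

```python
from collections import defaultdict


def build_routing(task_to_ssps):
    """Invert task name -> set of SSPs into SSP -> set of task names.

    Unlike a one-to-one map, an SSP may now appear under several task names.
    """
    routing = defaultdict(set)
    for task, ssps in task_to_ssps.items():
        for ssp in ssps:
            routing[ssp].add(task)
    return routing


def deliver(routing, ssp, message, process):
    """Hand one incoming message to every task instance subscribed to its SSP."""
    for task in sorted(routing.get(ssp, ())):  # sorted only for deterministic order
        process(task, message)


routing = build_routing({"t1": {"a1", "broadcast"}, "t2": {"a2", "broadcast"}})
deliver(routing, "broadcast", "msg", lambda task, m: print(task, m))
# t1 msg
# t2 msg
```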