[https://issues.apache.org/jira/browse/SAMZA-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205216#comment-14205216]
Martin Kleppmann commented on SAMZA-353:
----------------------------------------
Today, while exploring SAMZA-423, I had a use case where it would have been
great if all StreamTasks could consume the same partition of a particular input
stream.
The context is [full-text search on
streams|https://github.com/romseygeek/samza-luwak]. Users can register queries,
and those queries are run against each incoming document. When a document
matches one of a user's registered queries, that user can be notified. (Think
Google Alerts, or following a hashtag stream on Twitter. ElasticSearch's
[percolator|http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html]
is similar to this.)
In this case, co-partitioning of queries and documents is not possible, because
in general you don't know in advance which query will match which document. So
you have to either partition the set of queries and send each document to all
query partitions, or you have to partition the documents and send new queries
to all document partitions. Either way, one of the inputs has to be broadcast
to all partitions.
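To make the first option concrete (partition the queries, broadcast the documents), here is a minimal in-memory sketch. It is only an illustration: the class and method names are made up for this example, plain substring matching stands in for a real query engine such as Luwak, and nothing here uses the Samza or Kafka APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of Samza or samza-luwak.
class BroadcastMatchSketch {
    /** One query partition: holds only the queries that were routed to it. */
    static final class QueryPartition {
        private final Map<String, String> termByQueryId = new HashMap<>();

        void register(String queryId, String term) {
            termByQueryId.put(queryId, term.toLowerCase(Locale.ROOT));
        }

        /** Returns the ids of the local queries whose term occurs in the document. */
        List<String> match(String document) {
            String text = document.toLowerCase(Locale.ROOT);
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, String> entry : termByQueryId.entrySet()) {
                if (text.contains(entry.getValue())) {
                    hits.add(entry.getKey());
                }
            }
            return hits;
        }
    }

    private final List<QueryPartition> partitions = new ArrayList<>();

    BroadcastMatchSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new QueryPartition());
        }
    }

    /** Queries are partitioned: each query lives in exactly one partition. */
    void registerQuery(String queryId, String term) {
        int index = Math.floorMod(queryId.hashCode(), partitions.size());
        partitions.get(index).register(queryId, term);
    }

    /** Documents are broadcast: every partition sees every document. */
    List<String> publishDocument(String document) {
        List<String> hits = new ArrayList<>();
        for (QueryPartition partition : partitions) {
            hits.addAll(partition.match(document));
        }
        Collections.sort(hits); // stable order regardless of partition layout
        return hits;
    }

    public static void main(String[] args) {
        BroadcastMatchSketch job = new BroadcastMatchSketch(4);
        job.registerQuery("alice-1", "samza");
        job.registerQuery("bob-1", "kafka");
        job.registerQuery("carol-1", "storm");
        System.out.println(job.publishDocument("Samza reads its input from Kafka"));
    }
}
```

The second option is the mirror image: hash documents to a single partition and loop over all partitions in registerQuery instead. In both cases one of the two inputs has to reach every partition.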
Global state (SAMZA-402) doesn't really do what we need, because we need the
process() method to be called when a message on the broadcast stream comes in.
And we don't want to add an RPC channel, because Kafka is ideally suited for
message transport in this use case.
> Support assigning the same SSP to multiple tasknames
> ----------------------------------------------------
>
> Key: SAMZA-353
> URL: https://issues.apache.org/jira/browse/SAMZA-353
> Project: Samza
> Issue Type: Bug
> Components: container
> Affects Versions: 0.8.0
> Reporter: Jakob Homan
> Labels: design
> Fix For: 0.8.0
>
> Attachments: DESIGN-SAMZA-353-0.md, DESIGN-SAMZA-353-0.pdf
>
>
> Post SAMZA-123, it is possible to add the same SSP (SystemStreamPartition)
> to multiple tasknames, although we currently check for this and error out.
> We should think through the implications of having the same SSP appear in
> multiple tasknames and support this if it makes sense.
> This could be used as a broadcast stream, added either by Samza itself to
> each taskname or by individual groupers where it makes sense. Right now the
> container maintains a map of SSP to TaskInstance and delivers each message
> from an SSP to that one task instance. With this change, we'd need the map
> to go from SSP to Set[TaskInstance] and deliver each message to every
> TaskInstance in the set.
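The change of map shape described in the issue can be sketched roughly as follows. This is an assumption-laden illustration in plain Java, not the actual container code: Ssp and Task are hypothetical stand-ins for SystemStreamPartition and TaskInstance, and messages are plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration only; not the real Samza container classes.
class SspDispatchSketch {
    /** Stand-in for a system/stream/partition triple. */
    static final class Ssp {
        final String system;
        final String stream;
        final int partition;

        Ssp(String system, String stream, int partition) {
            this.system = system;
            this.stream = stream;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Ssp)) return false;
            Ssp that = (Ssp) other;
            return partition == that.partition
                    && system.equals(that.system)
                    && stream.equals(that.stream);
        }

        @Override
        public int hashCode() {
            return Objects.hash(system, stream, partition);
        }
    }

    /** Stand-in for a task instance; it just records what it was given. */
    static final class Task {
        final String name;
        final List<String> received = new ArrayList<>();

        Task(String name) {
            this.name = name;
        }

        void process(Ssp ssp, String message) {
            received.add(ssp.stream + "/" + ssp.partition + ":" + message);
        }
    }

    // Previously a one-to-one map (SSP to task); now one SSP may fan out to many tasks.
    private final Map<Ssp, Set<Task>> tasksBySsp = new HashMap<>();

    void assign(Ssp ssp, Task task) {
        tasksBySsp.computeIfAbsent(ssp, key -> new LinkedHashSet<>()).add(task);
    }

    /** Delivers the message to every task registered for the SSP; returns the fan-out. */
    int deliver(Ssp ssp, String message) {
        Set<Task> tasks = tasksBySsp.getOrDefault(ssp, Collections.<Task>emptySet());
        for (Task task : tasks) {
            task.process(ssp, message);
        }
        return tasks.size();
    }

    public static void main(String[] args) {
        SspDispatchSketch container = new SspDispatchSketch();
        Task first = new Task("task-0");
        Task second = new Task("task-1");
        Ssp broadcast = new Ssp("kafka", "queries", 0);
        container.assign(broadcast, first);
        container.assign(broadcast, second);
        container.assign(new Ssp("kafka", "documents", 0), first);
        container.assign(new Ssp("kafka", "documents", 1), second);
        System.out.println("fan-out: " + container.deliver(broadcast, "new query"));
    }
}
```

The sketch leaves out everything that makes the real change non-trivial, such as how offsets and checkpoints are tracked once several tasks consume the same SSP.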