[ https://issues.apache.org/jira/browse/SAMZA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129068#comment-14129068 ]
Martin Kleppmann commented on SAMZA-402:
----------------------------------------
bq. I was thinking of the input streams as being partitioned by key, so (1)
doesn't seem to be a problem to me. I hadn't much considered (2).
As long as we're restricting ourselves to a plain key-value data model, that's
probably ok. However, I still have a mild preference for using a single
partition, as I think it could be less confusing in some edge cases. Say you
write to the stream using a client (in some other system) which doesn't do key
partitioning in the same way, or which accidentally omits the partitioning key.
If we're consuming a single partition, that write will either take effect (if
it's in partition 0) or be ignored, but the outcome is deterministic. If we're
consuming all partitions, the store ends up in a nondeterministic state which
could change when it is rebuilt. So it seems to me that a single partition is
less error-prone, and I can't see a compelling advantage of using multiple
partitions.
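To make the determinism argument concrete, here is a minimal sketch in Python. It is a toy model, not Samza code: the function names, the `read_order` parameter and the sample data are all hypothetical, and a real consumer would interleave individual messages across partitions rather than read each partition to the end. It assumes a last-write-wins key-value store rebuilt by replaying the stream, with one write to the same key accidentally landing in the wrong partition.

```python
from itertools import permutations


def rebuild_from_partition_zero(partitions):
    """Rebuild the store by replaying partition 0 only (hypothetical helper)."""
    store = {}
    for key, value in partitions[0]:
        store[key] = value  # last write wins, in partition order
    return store


def rebuild_from_all(partitions, read_order):
    """Rebuild the store by replaying every partition.

    read_order stands in for the ordering across partitions, which is
    not defined (assumption of this toy model).
    """
    store = {}
    for index in read_order:
        for key, value in partitions[index]:
            store[key] = value
    return store


# Hypothetical data: the same key was written twice, once by a client
# that partitioned it differently and so landed in partition 1.
partitions = [
    [("color", "red")],   # partition 0
    [("color", "blue")],  # partition 1: stray write
]

# Single partition: the stray write is ignored on every rebuild.
single = rebuild_from_partition_zero(partitions)

# All partitions: the final value depends on the cross-partition order.
outcomes = {
    rebuild_from_all(partitions, order)["color"]
    for order in permutations(range(len(partitions)))
}
```

In this model `single` is the same on every rebuild, whereas `outcomes` contains more than one value, which is the nondeterminism described above.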
> Provide a "shared state" store among StreamTasks
> ------------------------------------------------
>
> Key: SAMZA-402
> URL: https://issues.apache.org/jira/browse/SAMZA-402
> Project: Samza
> Issue Type: Bug
> Components: container, kv
> Affects Versions: 0.8.0
> Reporter: Chris Riccomini
> Attachments: DESIGN-SAMZA-402-0.md, DESIGN-SAMZA-402-0.pdf,
> DESIGN-SAMZA-402-1.md, DESIGN-SAMZA-402-1.pdf
>
>
> There has been a lot of discussion about shared state stores in SAMZA-353.
> Initially, it seemed as though we might implement them through SAMZA-353, but
> now it seems preferable to implement them separately. As such, this
> ticket is to discuss global state/shared state (terms that are being used
> interchangeably) between StreamTasks.