[
https://issues.apache.org/jira/browse/SAMZA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128720#comment-14128720
]
Chris Riccomini commented on SAMZA-402:
---------------------------------------
bq. from which changelog partition(s) should a local shared store be
restored/updated
I was thinking all of them.
bq. If several partitions are consumed, this introduces an ordering problem.
I think this ordering problem only matters in two scenarios:
# There are writes for the same key in different partitions.
# The store implementation is somehow order-dependent (that is, write k1, write k2
results in a different state than write k2, write k1).
I was thinking of the input streams as being partitioned by key, so (1) doesn't
seem to be a problem to me. I hadn't much considered (2).
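To make scenario (1) concrete, here is a minimal sketch, assuming a last-write-wins store modeled as a plain HashMap. The class name OrderingDemo and the helpers replay and concat are hypothetical and are not part of Samza; concatenating partitions in either order stands in for two possible interleavings during restore.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderingDemo {
    /** Replays changelog writes in the given order into a map standing in for the store (assumption: last write wins). */
    static Map<String, String> replay(List<Map.Entry<String, String>> writes) {
        Map<String, String> store = new HashMap<>();
        for (Map.Entry<String, String> w : writes) {
            store.put(w.getKey(), w.getValue());
        }
        return store;
    }

    /** Concatenates two partitions, first then second, as one possible consumption order. */
    static List<Map.Entry<String, String>> concat(
            List<Map.Entry<String, String>> first,
            List<Map.Entry<String, String>> second) {
        List<Map.Entry<String, String>> all = new ArrayList<>(first);
        all.addAll(second);
        return all;
    }

    public static void main(String[] args) {
        // Partitioned by key: k1 appears only in p0, k2 only in p1.
        List<Map.Entry<String, String>> p0 = List.of(Map.entry("k1", "a"), Map.entry("k1", "b"));
        List<Map.Entry<String, String>> p1 = List.of(Map.entry("k2", "x"));
        // Both consumption orders yield the same store contents.
        System.out.println(replay(concat(p0, p1)).equals(replay(concat(p1, p0)))); // true

        // Scenario (1): the same key is written in two different partitions.
        List<Map.Entry<String, String>> q0 = List.of(Map.entry("k1", "a"));
        List<Map.Entry<String, String>> q1 = List.of(Map.entry("k1", "b"));
        // The final value of k1 now depends on which partition is consumed last.
        System.out.println(replay(concat(q0, q1)).equals(replay(concat(q1, q0)))); // false
    }
}
```

As long as per-key order within a partition is preserved, the interleaving across partitions does not change the result for a store like this one. Scenario (2) would need a store whose state depends on the order of writes to different keys, which this sketch does not model.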
bq. I am inclined to say that this stream should be created with one
partition, and if there are several partitions, the job only reads from
partition 0 and ignores the others. As shared local state is intended for
fairly small data volumes, it shouldn't be a problem to put it all in one
partition.
I was kind of thinking the reverse: consume from everything by default, and
assume that the stream is partitioned by key and the store doesn't care about
ordering of writes for different keys. The developer can always create their
input stream to have a single partition, which should automatically solve both
problems (1) and (2).
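The "partitioned by key" assumption can be sketched as follows. This is only an illustration: partitionFor is a hypothetical hash-based partitioner, not the one Kafka or Samza actually uses. It shows that a given key always maps to the same partition, and that with a single partition every key lands in partition 0, so all writes are totally ordered.

```java
public class KeyPartitioning {
    /** Hypothetical partitioner: picks a partition from the key's hash (assumption, not Samza's API). */
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String[] keys = {"k1", "k2", "k3"};
        for (String key : keys) {
            // The same key is always assigned to the same partition for a fixed partition count.
            System.out.println(key + " -> partition " + partitionFor(key, 4) + " of 4, partition "
                    + partitionFor(key, 1) + " of 1");
        }
    }
}
```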
> Provide a "shared state" store among StreamTasks
> ------------------------------------------------
>
> Key: SAMZA-402
> URL: https://issues.apache.org/jira/browse/SAMZA-402
> Project: Samza
> Issue Type: Bug
> Components: container, kv
> Affects Versions: 0.8.0
> Reporter: Chris Riccomini
> Attachments: DESIGN-SAMZA-402-0.md, DESIGN-SAMZA-402-0.pdf,
> DESIGN-SAMZA-402-1.md, DESIGN-SAMZA-402-1.pdf
>
>
> There has been a lot of discussion about shared state stores in SAMZA-353.
> Initially, it seemed as though we might implement them through SAMZA-353, but
> now it seems preferable to implement them separately. As such, this
> ticket is to discuss global state/shared state (terms that are being used
> interchangeably) between StreamTasks.