[ 
https://issues.apache.org/jira/browse/SAMZA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128720#comment-14128720
 ] 

Chris Riccomini commented on SAMZA-402:
---------------------------------------

bq. from which changelog partition(s) should a local shared store be 
restored/updated

I was thinking all of them.

bq. If several partitions are consumed, this introduces an ordering problem.

I think this ordering problem only matters in two scenarios:

# There are writes for the same key in different partitions. 
# The store implementation is somehow order-dependent (write k1, write k2 
results in a different state than write k2, write k1).

I was thinking of the input streams as being partitioned by key, so (1) doesn't 
seem to be a problem to me. I hadn't much considered (2).
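
To make scenario (1) concrete, here is a minimal Python sketch. It is not Samza code: `interleavings`, `restore`, `distinct_states`, and the sample partitions are all hypothetical names made up for illustration, and the store is assumed to be a plain last-write-wins key-value map. It replays two changelog partitions in every interleaving that preserves per-partition order and counts how many distinct final stores result.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of lists a and b that preserves the order within each list."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        pos = set(positions)
        ai, bi = iter(a), iter(b)
        yield [next(ai) if i in pos else next(bi) for i in range(n)]

def restore(writes):
    """Replay (key, value) writes into a last-write-wins store (assumed semantics)."""
    store = {}
    for key, value in writes:
        store[key] = value
    return store

def distinct_states(p0, p1):
    """Return the set of distinct final stores over all interleavings of p0 and p1."""
    return {tuple(sorted(restore(w).items())) for w in interleavings(p0, p1)}

# Partitioned by key: k1 appears only in partition 0, k2 only in partition 1.
keyed_p0 = [("k1", "a"), ("k1", "b")]
keyed_p1 = [("k2", "x"), ("k2", "y")]

# Not partitioned by key: k1 is written in both partitions.
mixed_p0 = [("k1", "a")]
mixed_p1 = [("k1", "z")]

print(len(distinct_states(keyed_p0, keyed_p1)))
print(len(distinct_states(mixed_p0, mixed_p1)))
```

Under these assumptions the key-partitioned case always converges to a single state, while the case with the same key in two partitions can end in more than one state depending on the read order.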

bq.  I am inclined to say that this stream should be created with one 
partition, and if there are several partitions, the job only reads from 
partition 0 and ignores the others. As shared local state is intended for 
fairly small data volumes, it shouldn't be a problem to put it all in one 
partition.

I was kind of thinking the reverse: consume from everything by default, and 
assume that the stream is partitioned by key and the store doesn't care about 
ordering of writes for different keys. The developer can always create their 
input stream to have a single partition, which should automatically solve both 
problems (1) and (2).
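
For scenario (2), a small Python sketch of an order-dependent store may help. Everything here is an assumption for illustration: `BoundedStore` is a hypothetical store that keeps a fixed number of keys and evicts the oldest-inserted one, and `replay` is a made-up helper, not a Samza API. Even with each key confined to one partition, the final state depends on which partition is read first, whereas a single partition fixes the order.

```python
from collections import OrderedDict

class BoundedStore:
    """Hypothetical store that keeps at most `capacity` keys, evicting the oldest-inserted one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def put(self, key, value):
        if key in self.data:
            del self.data[key]  # re-insert so the key becomes the newest entry
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the oldest entry

def replay(writes, capacity=1):
    """Replay (key, value) writes into a fresh BoundedStore and return its contents."""
    store = BoundedStore(capacity)
    for key, value in writes:
        store.put(key, value)
    return dict(store.data)

# Two partitions, each holding a different key, so scenario (1) does not apply.
p0 = [("k1", "a")]
p1 = [("k2", "x")]

state_p0_first = replay(p0 + p1)  # partition 0 is read before partition 1
state_p1_first = replay(p1 + p0)  # partition 1 is read before partition 0

# The same writes in a single partition have exactly one order.
single = [("k1", "a"), ("k2", "x")]
state_single = replay(single)

print(state_p0_first, state_p1_first, state_single)
```

With a plain last-write-wins map this difference disappears, which is why consuming from all partitions seems like a reasonable default to me, with the single-partition stream as the fallback for stores like the one sketched here.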

> Provide a "shared state" store among StreamTasks
> ------------------------------------------------
>
>                 Key: SAMZA-402
>                 URL: https://issues.apache.org/jira/browse/SAMZA-402
>             Project: Samza
>          Issue Type: Bug
>          Components: container, kv
>    Affects Versions: 0.8.0
>            Reporter: Chris Riccomini
>         Attachments: DESIGN-SAMZA-402-0.md, DESIGN-SAMZA-402-0.pdf, 
> DESIGN-SAMZA-402-1.md, DESIGN-SAMZA-402-1.pdf
>
>
> There has been a lot of discussion about shared state stores in SAMZA-353. 
> Initially, it seemed as though we might implement them through SAMZA-353, but 
> now it seems preferable to implement them separately. As such, this 
> ticket is to discuss global state/shared state (terms that are being used 
> interchangeably) between StreamTasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
