[ 
https://issues.apache.org/jira/browse/SAMZA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134149#comment-14134149
 ] 

Chris Riccomini commented on SAMZA-402:
---------------------------------------

bq. Are there other use case examples besides ML weights for unpartitioned 
local state?

Yes. We kind of whittled things down to two different use cases:

# Read-only global state that functions as a table to do a stream-table join.
# Read-write global state that functions as a way for tasks to share their 
state with each other.

(1) is usually used when you're computing some static data set and pushing it 
to the job periodically. This could be done via a Hadoop to Kafka push, or 
simply by consuming the state from another Samza job's changelog stream.
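To make (1) concrete, here is a minimal sketch of a stream-table join against a read-only table rebuilt from a changelog. It is plain Python rather than the Samza API; `build_table`, `join`, and the sample profile data are hypothetical names chosen for illustration, and the changelog is assumed to be an ordered list of (key, value) pairs.

```python
from types import MappingProxyType


def build_table(changelog):
    """Replay (key, value) pairs in order; the last value per key wins."""
    table = {}
    for key, value in changelog:
        table[key] = value
    # Hand tasks a read-only view so they cannot mutate the global state.
    return MappingProxyType(table)


def join(events, table):
    """Enrich each (member_id, action) event with the table row, if any."""
    for member_id, action in events:
        yield (member_id, action, table.get(member_id))


# Hypothetical data: a periodically pushed data set of member -> country.
profile_changelog = [("u1", "US"), ("u2", "DE"), ("u1", "CA")]
profiles = build_table(profile_changelog)

joined = list(join([("u1", "click"), ("u3", "view")], profiles))
```

In this sketch every task would hold the same full copy of `profiles`, which is what distinguishes it from partitioned local state.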

(2) is useful in the ML case you've described. We have opted not to directly 
support read-write (ML-style) state. As you've said, for iterative algorithms, 
it's definitely possible to have calculations done locally, and just 
periodically sync the global state via some secondary Samza job. The counting 
example in the design doc could be achieved simply by having every StreamTask 
periodically send its local counts, and having a second aggregator job 
calculate the full count.
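A rough sketch of that counting pattern, in plain Python rather than the Samza API. `CountingTask`, `Aggregator`, and `flush` are hypothetical names, and the sketch assumes each task sends a full snapshot of its local counts (not deltas), so the aggregator can simply keep the latest snapshot per task.

```python
from collections import Counter


class CountingTask:
    """Stand-in for one StreamTask: counts keys locally, no shared state."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.local_counts = Counter()

    def process(self, key):
        self.local_counts[key] += 1

    def flush(self):
        # Would be called periodically; emits a snapshot of the local counts.
        return (self.task_id, dict(self.local_counts))


class Aggregator:
    """Stand-in for the second job: keeps the latest snapshot per task."""

    def __init__(self):
        self.latest = {}

    def process(self, message):
        task_id, snapshot = message
        # Overwrite rather than add, so a replayed snapshot is harmless.
        self.latest[task_id] = snapshot

    def total(self, key):
        return sum(s.get(key, 0) for s in self.latest.values())


tasks = [CountingTask(0), CountingTask(1)]
for i, key in enumerate(["a", "b", "a", "a", "b"]):
    tasks[i % 2].process(key)  # round-robin stands in for partitioning

agg = Aggregator()
for task in tasks:
    agg.process(task.flush())
```

Each key in the aggregator's state has exactly one writer (the task that owns that snapshot), which is why this approach avoids the races that a shared read-write store would introduce.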

In addition, (2) can be achieved indirectly by having StreamTasks send messages 
directly to the read-only store's change log stream. This is "advanced", since 
race conditions pop up when you don't have single-writers for a given key in 
the store. We've opted to make this possible, but not directly supported. 
Basically, it's discouraged.
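The race is easy to see in a small deterministic simulation. This is plain Python, not Samza code; it assumes the changelog is an ordered list of (key, value) writes and that the store applies them with last-write-wins semantics.

```python
store = {"count": 10}
changelog = []

# Two tasks read the same key before either task's write has been applied.
read_by_task_a = store["count"]
read_by_task_b = store["count"]

# Each task does a read-modify-write and sends the result to the changelog.
changelog.append(("count", read_by_task_a + 1))
changelog.append(("count", read_by_task_b + 5))

# The store replays the changelog in order; the last write wins.
for key, value in changelog:
    store[key] = value

final_count = store["count"]  # task A's increment has been overwritten
```

With a single writer per key the two increments would be applied one after the other and neither would be lost.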

In short, I agree that (2) can be achieved in several alternative ways, so we 
aren't directly supporting it. The current proposal is for read-only global 
state.

[~martinkl] might have some more ideas for how (2) could be used, as well.

> Provide a "shared state" store among StreamTasks
> ------------------------------------------------
>
>                 Key: SAMZA-402
>                 URL: https://issues.apache.org/jira/browse/SAMZA-402
>             Project: Samza
>          Issue Type: Bug
>          Components: container, kv
>    Affects Versions: 0.8.0
>            Reporter: Chris Riccomini
>              Labels: design, project
>         Attachments: DESIGN-SAMZA-402-0.md, DESIGN-SAMZA-402-0.pdf, 
> DESIGN-SAMZA-402-1.md, DESIGN-SAMZA-402-1.pdf
>
>
> There has been a lot of discussion about shared state stores in SAMZA-353. 
> Initially, it seemed as though we might implement them through SAMZA-353, but 
> now it seems preferable to implement them separately. As such, this 
> ticket is to discuss global state/shared state (terms that are being used 
> interchangeably) between StreamTasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)