[
https://issues.apache.org/jira/browse/SAMZA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134647#comment-14134647
]
Yan Fang commented on SAMZA-402:
--------------------------------
+1 for the proposal.
1. Two other use cases for the global state that I can think of:
* Continuously updating an ML model. This is related to the ML weights example
in the proposal. Since the state can be updated, it becomes possible to update
the ML model without stopping the Samza job. Users retrain the model in a batch
process and then push the result to the state stream. Samza then reads the
updated state (that is, the new model) and keeps processing. The limitation is
that this only works when users change the model's weights but not the number
of features or the algorithm. But I think changing the parameters of a model
happens a lot. This even makes it possible to tune the model in Samza!
* Controlling the Samza job. If the state can be updated, controlling the Samza
job through the state stream seems very feasible: we can send commands to the
Samza job.
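
To make the model-update case concrete, here is a rough sketch of what a task
could do with messages from the state stream. It is plain Java, not Samza API:
the class and method names (HotSwappableModel, onModelUpdate, score) are made
up for illustration, and I assume a simple linear model whose weights arrive as
a double array. An update with a different feature count is rejected, which is
the limitation mentioned above.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not Samza API: a linear model whose weights can be
// replaced while messages keep being scored.
class HotSwappableModel {
    // Current weights; swapped atomically when a new model arrives.
    private final AtomicReference<double[]> weights;

    HotSwappableModel(double[] initialWeights) {
        this.weights = new AtomicReference<>(initialWeights.clone());
    }

    // Would be called for each message read from the (assumed) state stream.
    // Returns false and keeps the old model if the feature count changed.
    boolean onModelUpdate(double[] newWeights) {
        if (newWeights.length != weights.get().length) {
            return false;
        }
        weights.set(newWeights.clone());
        return true;
    }

    // Would be called for each message on the regular input stream.
    double score(double[] features) {
        double[] w = weights.get();
        if (features.length != w.length) {
            throw new IllegalArgumentException("unexpected feature count");
        }
        double sum = 0.0;
        for (int i = 0; i < w.length; i++) {
            sum += w[i] * features[i];
        }
        return sum;
    }
}
```

The same pattern would apply to control commands: the task keeps a small piece
of state that is only ever changed by messages from the state stream.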
2. A question about the "read-only" concept. Do we plan to provide a mechanism
that really guarantees "read-only"? The state stream can actually be updated,
even by the same Samza job, so it is more like "write" is discouraged. It may
be related to what Martin brought up in SAMZA-300, some lock-like stuff.
3. How do we localize the "global state"? I'm not sure whether we leave this to
the users or do it ourselves. If we do it, users' lives become easier because
they can simply look up the "global key" and get the value. The question for us
is then where to store the "global state". Putting it in the local state store
seems better than putting it in a HashMap, but that requires every "global key"
to be different from any "local key"; otherwise it may occasionally be
overwritten by local operations. If we leave this to the users, they need to
write an extra task to process the global state, which I think may be too much.
Of course, we can always provide a default setting and an API to override it.
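
One possible way to keep global keys apart from local keys in the same store is
to reserve a key prefix. The sketch below is only an illustration under that
assumption: NamespacedStore, GLOBAL_PREFIX and the method names are
hypothetical, and an in-memory map stands in for the local state store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Samza API: global and local entries share one
// store, separated by a reserved key prefix.
class NamespacedStore {
    // Assumed reserved prefix; local keys may not start with it.
    static final String GLOBAL_PREFIX = "__global__/";

    // Stand-in for the task's local state store.
    private final Map<String, String> store = new HashMap<>();

    // Local writes are rejected if they would touch the global namespace.
    void putLocal(String key, String value) {
        if (key.startsWith(GLOBAL_PREFIX)) {
            throw new IllegalArgumentException("key uses the reserved global prefix");
        }
        store.put(key, value);
    }

    String getLocal(String key) {
        return store.get(key);
    }

    // Would be called only when replaying messages from the state stream.
    void applyGlobalUpdate(String key, String value) {
        store.put(GLOBAL_PREFIX + key, value);
    }

    // What user code would call to read a global value.
    String getGlobal(String key) {
        return store.get(GLOBAL_PREFIX + key);
    }
}
```

With this layout, a local put of "threshold" and a global update of "threshold"
end up under different physical keys, so neither overwrites the other.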
> Provide a "shared state" store among StreamTasks
> ------------------------------------------------
>
> Key: SAMZA-402
> URL: https://issues.apache.org/jira/browse/SAMZA-402
> Project: Samza
> Issue Type: Bug
> Components: container, kv
> Affects Versions: 0.8.0
> Reporter: Chris Riccomini
> Labels: design, project
> Attachments: DESIGN-SAMZA-402-0.md, DESIGN-SAMZA-402-0.pdf,
> DESIGN-SAMZA-402-1.md, DESIGN-SAMZA-402-1.pdf
>
>
> There has been a lot of discussion about shared state stores in SAMZA-353.
> Initially, it seemed as though we might implement them through SAMZA-353, but
> now it seems more preferable to implement them separately. As such, this
> ticket is to discuss global state/shared state (terms that are being used
> interchangeably) between StreamTasks.