[ 
https://issues.apache.org/jira/browse/SAMZA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134647#comment-14134647
 ] 

Yan Fang commented on SAMZA-402:
--------------------------------

+1 for the proposal. 

1. Two other use cases for the global state that I can think of:
  * Continuously updating an ML model. This is related to the ML weights 
example in the proposal. Since the state can be updated, it becomes possible 
to update the ML model without stopping the Samza job. Users retrain the model 
in a batch process and push the result to the state stream. Samza then reads 
the updated state (that is, the new model) and keeps processing. The 
limitation is that this only works when users change the model's weights but 
not the number of features or the algorithm. Still, I think changing a 
model's parameters happens a lot. This even makes it possible to tune the 
model inside Samza!
  * Controlling the Samza job. If the state can be updated, controlling the 
Samza job through the state stream seems very feasible: we can send commands 
to the job.
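
To make the two use cases concrete, here is a minimal sketch in plain Python. Nothing in it is a Samza API: the class `ScoringTask`, the method names, and the keys "model.weights" and "control.command" are all hypothetical, and the model is assumed to be a simple linear one. It only illustrates a task that swaps in new weights, or pauses, when a message arrives on the state stream.

```python
# Hypothetical sketch: none of these names are Samza APIs.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoringTask:
    """Scores feature vectors with a linear model held in shared state."""
    weights: List[float]
    paused: bool = False

    def on_state_message(self, key: str, value) -> None:
        """Apply an update read from the (assumed) state stream."""
        if key == "model.weights":
            # Only the weights may change; the feature count must stay fixed.
            if len(value) != len(self.weights):
                raise ValueError("feature count changed; redeploy the job")
            self.weights = list(value)
        elif key == "control.command":
            # Control use case: commands arrive over the same state stream.
            if value == "pause":
                self.paused = True
            elif value == "resume":
                self.paused = False

    def process(self, features: List[float]) -> Optional[float]:
        """Score one input message, or skip it while the job is paused."""
        if self.paused:
            return None
        return sum(w * x for w, x in zip(self.weights, features))
```

With weights [1.0, 2.0], the input [3.0, 4.0] scores 11.0. After a "model.weights" update to [0.5, 0.5], the same input scores 3.5, with no restart in between. The length check reflects the limitation above: a change in the number of features is rejected rather than applied.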

2. A question about the "read-only" concept. Do we plan to provide a mechanism 
to really guarantee "read-only"? The state stream can actually be updated, 
even by the same Samza job, so it is more like "write" is discouraged. It may 
be related to what Martin brought up in SAMZA-300, some lock-like stuff.
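
For illustration only, one weak form of "read-only" is an API-level guard: tasks are handed a view that can look values up but refuses writes. The sketch below is plain Python with hypothetical names (`ReadOnlyStateView`, `get`, `put`), not anything from Samza. It does not address the concern above, since the job could still write to the underlying state stream directly.

```python
# Hypothetical sketch: an API-level guard, not a real guarantee.
from typing import Any, Dict, Optional


class ReadOnlyStateView:
    """Exposes lookups on a backing store but rejects writes from tasks."""

    def __init__(self, store: Dict[str, Any]) -> None:
        # The framework keeps the only writable reference to the store.
        self._store = store

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        return self._store.get(key, default)

    def put(self, key: str, value: Any) -> None:
        # Writes through the view are refused outright.
        raise PermissionError("global state is read-only for tasks")
```

Updates made by whoever owns the backing store remain visible through the view, so this only discourages writes on the task side.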

3. How do we localize the "global state"? I am not sure whether we leave this 
to the users or do it ourselves. If we do it, users' lives become easier 
because they can simply look up the "global key" and get the value. The 
question for us is then where to store the "global state". Putting the global 
state in the local state store seems better than putting it in a HashMap, but 
it then requires every "global key" to be different from any "local key". 
Otherwise, it may occasionally be overwritten by local operations. If we leave 
this to the users, they need to write an extra task to process the global 
state, which I think may be too much. Of course, we can always provide a 
default setting and an API to override it.
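
One possible way to keep global and local keys apart in a single store is a reserved key prefix. The sketch below is plain Python under that assumption. The prefix "__global__/", the class `NamespacedStore`, and its methods are hypothetical and not part of Samza or of the proposal.

```python
# Hypothetical sketch: a reserved prefix separates global from local keys.
from typing import Any, Dict, Optional

GLOBAL_PREFIX = "__global__/"  # assumed reserved prefix, not a Samza convention


class NamespacedStore:
    """A single key-value store holding both local and global entries."""

    def __init__(self) -> None:
        self._kv: Dict[str, Any] = {}

    def put_local(self, key: str, value: Any) -> None:
        # Local writes may not use the reserved prefix, so they can never
        # overwrite a global entry.
        if key.startswith(GLOBAL_PREFIX):
            raise ValueError("local keys must not start with " + GLOBAL_PREFIX)
        self._kv[key] = value

    def get_local(self, key: str) -> Optional[Any]:
        return self._kv.get(key)

    def apply_global_update(self, key: str, value: Any) -> None:
        # Intended to be called by the framework when it reads the state stream.
        self._kv[GLOBAL_PREFIX + key] = value

    def get_global(self, key: str) -> Optional[Any]:
        return self._kv.get(GLOBAL_PREFIX + key)
```

Here a local "weights" entry and a global "weights" entry coexist without overwriting each other, and users only ever call `get_global` with the bare key.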


> Provide a "shared state" store among StreamTasks
> ------------------------------------------------
>
>                 Key: SAMZA-402
>                 URL: https://issues.apache.org/jira/browse/SAMZA-402
>             Project: Samza
>          Issue Type: Bug
>          Components: container, kv
>    Affects Versions: 0.8.0
>            Reporter: Chris Riccomini
>              Labels: design, project
>         Attachments: DESIGN-SAMZA-402-0.md, DESIGN-SAMZA-402-0.pdf, 
> DESIGN-SAMZA-402-1.md, DESIGN-SAMZA-402-1.pdf
>
>
> There has been a lot of discussion about shared state stores in SAMZA-353. 
> Initially, it seemed as though we might implement them through SAMZA-353, but 
> now it seems more preferable to implement them separately. As such, this 
> ticket is to discuss global state/shared state (terms that are being used 
> interchangeably) between StreamTasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
