GitHub user xintongsong added a comment to the discussion: Replay-based Per
Action State Consistency
@letaoj Thanks for updating the design doc based on our offline discussion. I
think the overall design is quite good now. I just have a few more comments on
the details.
1. For the request-response map, I think we should use a unique identifier of
the action execution as the key, rather than the hash of the event, because
one event may trigger multiple actions. `TaskActionState` looks right, since it
tries to capture the execution state of an action. But the execution flow shows
event hashes being used as part of the map key.
2. I'd suggest not rebuilding the short-term memory at the beginning, but
rebuilding it while replaying the actions. To be specific, when recovering from
a checkpoint, the short-term memory (state) should be restored to how it was
when the checkpoint was made. Then we replay the inputs and check whether each
action has already been performed. If it has, we skip the action, apply any
state changes it made, and get its output (events). This ensures that
re-executed actions see the same state as when they were executed for the
first time.
3. `<message_key>-<event_hash_1>: {"request": request, "short-term-memory":
short_term_memory.dump_json()}` Does this mean we are storing the whole
short-term memory for each request-response pair? That should be unnecessary.
Since the full short-term memory is already persisted with the checkpoint, we
only need to persist the incremental changes to the short-term memory since the
checkpoint.
4. `TaskActionState.output_event` should be a list, because each action may
emit multiple events.
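
To make the four points above concrete, here is a minimal Python sketch. It is
not the flink-agents API: apart from `TaskActionState`, every name in it
(`ActionKey`, `memory_updates`, `output_events`, `run_action`, `process`, the
two demo actions, and the `(message key, event hash, action name)` key layout)
is a hypothetical choice made for illustration. Plain dicts stand in for the
short-term memory and for the persisted action-state store, and memory
deletions are not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical key for one action execution: (message key, event hash, action name).
ActionKey = Tuple[str, str, str]
Memory = Dict[str, Any]
Action = Callable[[str, Memory], Tuple[Memory, List[str]]]


@dataclass
class TaskActionState:
    """Recorded effects of one action execution (field names are illustrative)."""
    memory_updates: Memory = field(default_factory=dict)    # delta only, not a full dump
    output_events: List[str] = field(default_factory=list)  # an action may emit several events


executed: List[ActionKey] = []  # tracks real executions, for the demo only


def run_action(key: ActionKey, action: Action, event: str,
               memory: Memory, store: Dict[ActionKey, TaskActionState]) -> List[str]:
    recorded = store.get(key)
    if recorded is None:
        # Not performed yet: run against a copy of the current memory and record the effects.
        updates, outputs = action(event, dict(memory))
        recorded = TaskActionState(updates, outputs)
        store[key] = recorded
        executed.append(key)
    # First run and replay both apply the recorded delta, so later actions see the same state.
    memory.update(recorded.memory_updates)
    return list(recorded.output_events)


def count_action(event: str, memory: Memory) -> Tuple[Memory, List[str]]:
    n = memory.get("count", 0) + 1
    return {"count": n}, [f"{event}:count={n}"]


def tag_action(event: str, memory: Memory) -> Tuple[Memory, List[str]]:
    return {"last": event}, [f"{event}:tagged", f"{event}:done"]


# One event triggers both actions, so the event hash alone cannot be the key.
ACTIONS: Dict[str, Action] = {"count": count_action, "tag": tag_action}


def process(events: List[str], memory: Memory,
            store: Dict[ActionKey, TaskActionState]) -> List[str]:
    outputs: List[str] = []
    for event in events:
        for name, action in ACTIONS.items():
            outputs += run_action(("msg-key", event, name), action, event, memory, store)
    return outputs


checkpoint_memory: Memory = {"count": 10}      # state persisted with the checkpoint
store: Dict[ActionKey, TaskActionState] = {}   # per-action state, survives the failure

memory = dict(checkpoint_memory)
first_outputs = process(["e1", "e2"], memory, store)

# Simulated failure: restore memory as of the checkpoint, then replay the same inputs.
recovered_memory = dict(checkpoint_memory)
replayed_outputs = process(["e1", "e2"], recovered_memory, store)
```

In this sketch the replay skips all four recorded action executions, yet
`recovered_memory` ends up equal to `memory` and `replayed_outputs` equals
`first_outputs`, because each skipped action still contributes its recorded
delta and its list of output events.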
GitHub link:
https://github.com/apache/flink-agents/discussions/108#discussioncomment-14209491