neels has posted comments on this change. ( https://gerrit.osmocom.org/c/libosmocore/+/15660 )
Change subject: add osmo_fsm_inst_watch() ...................................................................... Patch Set 2: I wish my reply could be shorter, but this is a quite important aspect of osmo_fsm design, so I am taking the time necessary for this. There are two proposals, by Pau and Vadim. In short, I disagree with Pau, but find Vadim's idea interesting. But even if Vadim's idea solves most cases, I have another use case that needs to "watch" an FSM instance past main loops. I still have another idea to solve that differently, too. So I think I will now try Vadim's idea to replace this patch. In long: * Pau's callback idea: > TBH I don't like this API, I think it's too specific. > > I'd rather add a free_cb, which is more common for readers and more generic > (can be reused for other purposes) (BTW, your proposal lacks a osmo_fsm_free_cb_remove(), I'll just assume that is included) IMHO use cases that need a callback function should rather be solved by event dispatch. If you're aiming at arbitrary cleanup actions, I'd rather use the FSM instance cleanup() functions. This here is really aiming only at the use case "Skip some action if an instance was freed". Usually we notify parent / child FSM instances of deallocation by events. But this doesn't work well in functions that are already running. Even if the term event sets a flag in the fi->priv struct, that may also be deallocated along. I am really aiming at only a 1bit information whether an FSM instance with its storage still exists, within a running function. A callback would also work, but would make implementing it more complex. Let's compare in pseudocode: Your proposal: my_free_callback() { complex_action1(); complex_action2(); complex_action3(); } safe_event_handling() { add_free_callback(fi2, my_free_callback, NULL); event_dispatch(fi1); remove_free_callback(fi2, my_free_callback); } But since this calling my_free_callback() is in essence a 1bit information, we might as well do: safe_event_handling() { fi_watch(&watch, fi2); event_dispatch(fi1); if (!watch.exists) { complex_action1(); complex_action2(); complex_action3(); } fi_unwatch(&watch); } I like this better because we can read the code in the flow of a single function. Furthermore, I am not aiming at adding more cleanup actions, but rather want to *skip* some usually normal action if a free has occured due to error handling. For that I ultimately need a bool flag to be evaluated in a running function. Your proposal solves it this way: my_free_callback(void *data) { bool *flag = data; *flag = false; } safe_event_handling() { bool flag = true; add_free_callback(fi2, my_free_callback, &flag); event_dispatch(fi1); if (!flag) return; normal_action(); } This would work, but since a bool is all we really need to be notified of a free, my patch is simpler. safe_event_handling() { fi_watch(&watch, fi2); event_dispatch(fi1); fi_unwatch(&watch); if (!watch.exists) return; normal_action(); } * Vadim's idea to free FSM instances in the main loop: We already have this osmo_fsm_inst_term_safely() feature, where terminated FSM instances get collected in a talloc pool, and only get deallocated when the last one is done. This solves various use-after-free problems in the same way as proposed, but indeed deferring deallocation to the main loop would more generally solve the problem in a simpler fashion. Implementation: We could talloc_reparent terminating FSM instances to this OTC_SELECT context, and then FSM instances would be deallocated by the next main loop iteration. It would be only a slight modification of the existing osmo_fsm_inst_term_safely() mechanisms. If we can be sure that terminated FSM instances are never deallocated before a function call returns back to osmo_select_main*(), we may not even need any code changes, if we drop events when the target inst is already terminating, or we could also do this: event_dispatch(fi1); if (fi2->proc.terminated) return; That would indeed be much simpler than my 'watcher' proposal. I like that. The only problem I see is that so far osmo_fsm API does not necessarily require an osmo_select_main_ctx(), or any select at all. But this is easily solved by adding another flag to the API that changes behavior only when set to true. A program needing deferred allocation can set the flag to true and then knows that it should use osmo_select_main_ctx(). * Other use case for "watcher": osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c: this FSM instance handles MGCP and the state of an MGW endpoint with its individual connections. There we have the feature to notify an arbitrary FSM instance of failure or success: void osmo_mgcpc_ep_ci_request(..., struct osmo_fsm_inst *notify, uint32_t event_success, uint32_t event_failure, void *notify_data); This triggers e.g. a CRCX or MDCX, and once the response comes back, dispatches an arbitrary event to an arbitrary FSM instance. So far we had no check whether that 'notify' FSM instance still exists when the MGCP response is received. Obviously the MGCP response will come in a later main loop select iteration. So far there would be the need to somehow tell the osmo_mgcpc_ep_ci FSM that a 'notify' target has disappeared, but since we so far lack such API, we basically hope that the instance is still around when the reply comes in. I now I realized that this 'notify' is still vulnerable to use-after-free, by the order of messages coming in. Say a CC Release comes in in-between an MDCX and its OK response, so the CC Release deallocates, then the OK response dispatches to deallocated 'notify'. One way to solve that would be the proposed "watcher", which I have already implemented in a patch. That would work generally, and the calling FSM instance would not need to take any more action to prevent use-after-free during accessing the 'notify' FSM instance. Another way would be to add an osmo_mgcpc_ep_cancel_notify() API that makes sure no 'notify' are dispatched. Here a calling FSM instance would have to make sure to call this function before deallocation. That would be fair enough. I think it would be nice to have a generalized way of telling whether an FSM instance is still there or not. But I also noticed that it can be hard to make sure to call the osmo_fsm_inst_unwatch() in all cases. So I will go for Vadim's solution and the osmo_mgcpc_ep_cancel_notify(). -- To view, visit https://gerrit.osmocom.org/c/libosmocore/+/15660 To unsubscribe, or for help writing mail filters, visit https://gerrit.osmocom.org/settings Gerrit-Project: libosmocore Gerrit-Branch: master Gerrit-Change-Id: I4d8306488506c60b4c2fc1c4cb3ac04654db9c43 Gerrit-Change-Number: 15660 Gerrit-PatchSet: 2 Gerrit-Owner: neels <nhofm...@sysmocom.de> Gerrit-Reviewer: Jenkins Builder Gerrit-Reviewer: neels <nhofm...@sysmocom.de> Gerrit-CC: fixeria <axilira...@gmail.com> Gerrit-CC: pespin <pes...@sysmocom.de> Gerrit-Comment-Date: Fri, 04 Oct 2019 17:41:22 +0000 Gerrit-HasComments: No Gerrit-Has-Labels: No Gerrit-MessageType: comment