Change in ...libosmocore[master]: add osmo_fsm_inst_watch()

neels Fri, 04 Oct 2019 10:41:32 -0700

neels has posted comments on this change. ( 
https://gerrit.osmocom.org/c/libosmocore/+/15660 )


Change subject: add osmo_fsm_inst_watch()
......................................................................


Patch Set 2:

I wish my reply could be shorter, but this is a quite important aspect of 
osmo_fsm design, so I am taking the time necessary for this.

There are two proposals, by Pau and Vadim.

In short, I disagree with Pau, but find Vadim's idea interesting.

But even if Vadim's idea solves most cases,
I have another use case that needs to "watch" an FSM instance past main loops.
I still have another idea to solve that differently, too.

So I think I will now try Vadim's idea to replace this patch.

In long:


* Pau's callback idea:

> TBH I don't like this API, I think it's too specific.
>
> I'd rather add a free_cb, which is more common for readers and more generic 
> (can be reused for other purposes)

(BTW, your proposal lacks a osmo_fsm_free_cb_remove(), I'll just assume that is 
included)

IMHO use cases that need a callback function should rather be solved by event 
dispatch.
If you're aiming at arbitrary cleanup actions, I'd rather use the FSM instance 
cleanup() functions.
This here is really aiming only at the use case "Skip some action if an 
instance was freed".

Usually we notify parent / child FSM instances of deallocation by events.
But this doesn't work well in functions that are already running.
Even if the term event sets a flag in the fi->priv struct, that may also be 
deallocated along.

I am really aiming at only a 1bit information whether an FSM instance with its 
storage still exists,
within a running function.

A callback would also work, but would make implementing it more complex.

Let's compare in pseudocode:

Your proposal:

  my_free_callback()
  {
       complex_action1();
       complex_action2();
       complex_action3();
  }

  safe_event_handling()
  {
       add_free_callback(fi2, my_free_callback, NULL);
       event_dispatch(fi1);
       remove_free_callback(fi2, my_free_callback);
  }

But since this calling my_free_callback() is in essence a 1bit information, we 
might as well do:

  safe_event_handling()
  {
       fi_watch(&watch, fi2);
       event_dispatch(fi1);
       if (!watch.exists) {
         complex_action1();
         complex_action2();
         complex_action3();
       }
       fi_unwatch(&watch);
  }

I like this better because we can read the code in the flow of a single 
function.

Furthermore, I am not aiming at adding more cleanup actions, but rather want to 
*skip* some usually normal action if a free has occured due to error handling. 
For that I ultimately need a bool flag to be evaluated in a running function.

Your proposal solves it this way:

  my_free_callback(void *data)
  {
       bool *flag = data;
       *flag = false;
  }

  safe_event_handling()
  {
       bool flag = true;
       add_free_callback(fi2, my_free_callback, &flag);
       event_dispatch(fi1);
       if (!flag)
          return;
       normal_action();
  }

This would work, but since a bool is all we really need to be notified of a 
free, my patch is simpler.

  safe_event_handling()
  {
       fi_watch(&watch, fi2);
       event_dispatch(fi1);
       fi_unwatch(&watch);
       if (!watch.exists)
           return;
       normal_action();
  }


* Vadim's idea to free FSM instances in the main loop:

We already have this osmo_fsm_inst_term_safely() feature, where terminated FSM 
instances get collected in a talloc pool, and only get deallocated when the 
last one is done. This solves various use-after-free problems in the same way 
as proposed, but indeed deferring deallocation to the main loop would more 
generally solve the problem in a simpler fashion.
Implementation: We could talloc_reparent terminating FSM instances to this 
OTC_SELECT context, and then FSM instances would be deallocated by the next 
main loop iteration.
It would be only a slight modification of the existing 
osmo_fsm_inst_term_safely() mechanisms.

If we can be sure that terminated FSM instances are never deallocated before a 
function call returns back to osmo_select_main*(),
we may not even need any code changes, if we drop events when the target inst 
is already terminating,
or we could also do this:

   event_dispatch(fi1);
   if (fi2->proc.terminated)
        return;

That would indeed be much simpler than my 'watcher' proposal. I like that.

The only problem I see is that so far osmo_fsm API does not necessarily require 
an osmo_select_main_ctx(), or any select at all.
But this is easily solved by adding another flag to the API that changes 
behavior only when set to true.
A program needing deferred allocation can set the flag to true and then knows 
that it should use osmo_select_main_ctx().


* Other use case for "watcher":

osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c: this FSM instance 
handles MGCP and the state of an MGW endpoint with its individual connections. 
There we have the feature to notify an arbitrary FSM instance of failure or 
success:

  void osmo_mgcpc_ep_ci_request(...,
                              struct osmo_fsm_inst *notify,
                              uint32_t event_success, uint32_t event_failure,
                              void *notify_data);

This triggers e.g. a CRCX or MDCX, and once the response comes back, dispatches 
an arbitrary event to an arbitrary FSM instance.
So far we had no check whether that 'notify' FSM instance still exists when the 
MGCP response is received.
Obviously the MGCP response will come in a later main loop select iteration.

So far there would be the need to somehow tell the osmo_mgcpc_ep_ci FSM that a 
'notify' target has disappeared,
but since we so far lack such API, we basically hope that the instance is still 
around when the reply comes in.
I now I realized that this 'notify' is still vulnerable to use-after-free, by 
the order of messages coming in.
Say a CC Release comes in in-between an MDCX and its OK response, so the CC 
Release deallocates,
then the OK response dispatches to deallocated 'notify'.

One way to solve that would be the proposed "watcher", which I have already 
implemented in a patch.
That would work generally, and the calling FSM instance would not need to take 
any more action to prevent
use-after-free during accessing the 'notify' FSM instance.

Another way would be to add an osmo_mgcpc_ep_cancel_notify() API that makes 
sure no 'notify' are dispatched.
Here a calling FSM instance would have to make sure to call this function 
before deallocation.
That would be fair enough.

I think it would be nice to have a generalized way of telling whether an FSM 
instance is still there or not.
But I also noticed that it can be hard to make sure to call the 
osmo_fsm_inst_unwatch() in all cases.

So I will go for Vadim's solution and the osmo_mgcpc_ep_cancel_notify().


--
To view, visit https://gerrit.osmocom.org/c/libosmocore/+/15660
To unsubscribe, or for help writing mail filters, visit 
https://gerrit.osmocom.org/settings

Gerrit-Project: libosmocore
Gerrit-Branch: master
Gerrit-Change-Id: I4d8306488506c60b4c2fc1c4cb3ac04654db9c43
Gerrit-Change-Number: 15660
Gerrit-PatchSet: 2
Gerrit-Owner: neels <nhofm...@sysmocom.de>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: neels <nhofm...@sysmocom.de>
Gerrit-CC: fixeria <axilira...@gmail.com>
Gerrit-CC: pespin <pes...@sysmocom.de>
Gerrit-Comment-Date: Fri, 04 Oct 2019 17:41:22 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: No
Gerrit-MessageType: comment

Change in ...libosmocore[master]: add osmo_fsm_inst_watch()

Reply via email to