Motivated by the recent submission of an ALUA checker for multipath-tools
by Brian Bunker, I am proposing a generic framework for asynchronous
checker threads in multipath-tools.

The first patch is a small fix I came up with while testing this. 2/5 is
the actual implementation, 3/5 adds test code, and 4/5 converts the TUR
checker to the new library code. 5/5 is new; it simplifies testing the
timeout handling code.

The conversion makes the logic of the TUR checker easier to understand,
and should make it easier for Brian to implement the ALUA checker based
on the same framework.

Further improvements on top of this are possible. The new TUR code,
except for the tur_check() function itself, is pretty generic and would
allow abstracting an "async checker" model with just a few changes. This
would also make it possible to switch the legacy checkers to an
asynchronous mode of operation.

Comments and reviews welcome.

Martin

Changes v2 -> v3 (based on the review by Ben Marzinski):

- 02/05: fix handling of cancellation in idle state: don't attempt to
  increase refcount or free context in runner_thread(). Set refcount
  to 2 to begin with.
- 03/05: treat "-t 0" option as minimal, not infinite, timeout, to
  enable testing early cancellation.
- add compile time option RUNNER_START_DELAY_US=<N> for testing races
  between thread cancellation and startup

Changes v1 -> v2 (based on the review by Ben Marzinski):

- The design of the runner code has been changed such that no
  caller-side references are dropped automatically any more. The caller
  needs to call release_runner() explicitly to drop its reference,
  independent of the runner status. This simplifies the code and allows
  implementing the MAX_TIMEOUTS logic in tur.c as it used to be. The
  overall API is not more complex, as release_runner() replaces
  cancel_runner().
- An additional state RUNNER_DEAD is introduced, denoting a state in
  which the callback result is unavailable, but the thread has finished.
- The naming of variables and struct members has been cleaned up.
- 02/05: specified alignment for struct runner_context.data
- 02/05: removed most assert() statements
- 02/05: removed superfluous cmm_smp_wmb()
- 02/05: perform all state modifications in the runner's cleanup
  function
- 03/05: fixed signal handler name
- 03/05: reduced the run time of the runner test script
- 04/05: revert to the original MAX_TIMEOUTS logic, keeping a ref to
  the "last" runner context until the thread eventually terminates
- 04/05: consistent handling of state and checker message
- 05/05: new

Martin Wilck (5):
  multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN
  libmpathutil: add generic implementation for checker thread runners
  multipath-tools tests: add test program for thread runners
  libmultipath: TUR checker: use runner threads
  libmultipath: tur checker: improve tur_deep_sleep() test

 libmpathutil/Makefile             |   2 +-
 libmpathutil/libmpathutil.version |   7 +-
 libmpathutil/runner.c             | 229 ++++++++++++
 libmpathutil/runner.h             | 102 ++++++
 libmultipath/checkers/tur.c       | 369 ++++++-------------
 multipathd/main.c                 |   4 +-
 tests/Makefile                    |  15 +-
 tests/runner-test.sh              |  57 +++
 tests/runner-test.supp            |  15 +
 tests/runner.c                    | 564 ++++++++++++++++++++++++++++++
 10 files changed, 1103 insertions(+), 261 deletions(-)
 create mode 100644 libmpathutil/runner.c
 create mode 100644 libmpathutil/runner.h
 create mode 100755 tests/runner-test.sh
 create mode 100644 tests/runner-test.supp
 create mode 100644 tests/runner.c

-- 
2.53.0
