On Wed, May 19, 2021 at 02:36:03PM -0400, Andrew Dunstan wrote:
> Yeah, this area needs substantial improvement. I have seen similar sorts
> of nasty hangs, where the script is waiting forever for some process
> that hasn't got the shutdown message. At least we probably need some way
> of making sure the END handler doesn't abort early. Maybe
> PostgresNode::stop() needs a mode that handles failure more gracefully.
> Maybe it needs to try shutting down all the nodes and only calling
> BAIL_OUT after trying all of them and getting a failure. But that might
> still leave us work to do on failures occurring pre-END.
For that, we could just make the END block call run_log() directly as
well, as it captures stderr and the exit code.

By the way, what about making the shutdown two-phase?  Trigger an
immediate stop, and if that fails, fall back to kill9() to be on the
safe side.

Have you seen this being a problem even in cases where all the tests
passed?  If yes, it may be worth using the more aggressive flow even
when the tests pass.
--
Michael
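This is a minimal sketch of the two-phase, all-nodes-first shutdown discussed above. It is written in Python only to show the control flow and is not PostgresNode code. `Node`, `stop_immediate`, `kill9`, `_attempt` and `shutdown_all` are hypothetical names. The callables stand in for something like `pg_ctl -m immediate stop` and a SIGKILL of the postmaster. Failures are collected rather than raised, so every node gets a shutdown attempt, and any bail-out happens once, after the loop.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a test cluster handle."""
    name: str
    # Phase 1: immediate stop (e.g. wrapping "pg_ctl -m immediate stop").
    # Returns True on success.
    stop_immediate: Callable[[], bool]
    # Phase 2: fallback kill (e.g. SIGKILL to the postmaster).
    # Returns True on success.
    kill9: Callable[[], bool]


def _attempt(action: Callable[[], bool]) -> bool:
    """Run one shutdown phase, treating an exception as a failure."""
    try:
        return bool(action())
    except Exception:
        return False


def shutdown_all(nodes: List[Node]) -> List[str]:
    """Try to stop every node, escalating to kill9 only when the
    immediate stop fails. Nothing here raises, so one bad node cannot
    prevent the others from being shut down. Returns the names of the
    nodes that survived both phases."""
    failed = []
    for node in nodes:
        if _attempt(node.stop_immediate):
            continue
        if not _attempt(node.kill9):
            failed.append(node.name)
    return failed


if __name__ == "__main__":
    nodes = [
        Node("primary", lambda: True, lambda: True),
        Node("standby", lambda: False, lambda: True),  # needs the fallback
        Node("stuck", lambda: False, lambda: False),   # both phases fail
    ]
    failed = shutdown_all(nodes)
    print(failed)  # ['stuck']
    if failed:
        # Report (or bail out) once, after every node has had its chance.
        print("bail out: could not stop " + ", ".join(failed))
```

Keeping the per-phase error handling in one helper means a hang or error on one node only marks that node as failed. It does not skip the remaining nodes, which is the property the quoted proposal asks for.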