[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out
Regarding systemd-specific logic, AFAICT we already have some systemd-specific code for socket activation (see lxd/util/http.go:GetListeners). I think that conditionally leveraging what systemd offers (when LXD is managed by systemd) wouldn't be a bad idea, and the strategy proposed by Dimitri sounds robust.
-- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1765699 Title: lxd fails to start main process, yet waitready doesn't bail out To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/lxd/+bug/1765699/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out
It seems that the "failed to open cluster database" error is related to https://github.com/lxc/lxd/issues/4485, which should be fixed by https://github.com/lxc/lxd/pull/4518. Issue #4485 also contains some info about how to get you unstuck right now, but you'll want to upgrade to an LXD binary that includes the PR above. It should be released as a snap in the next few days and as a deb for 18.04 probably a bit later.
** Bug watch added: LXD bug tracker #4485 https://github.com/lxc/lxd/issues/4485
[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out
The DB upgrade issue is something we've fixed but needs manual intervention if the user ran into it when installing one of our betas. I've helped xnox get back online on IRC by restoring a DB backup and then re-importing some missing containers. This allowed getting back online with all containers present and running (they didn't get stopped during the failure).
** Changed in: lxd (Ubuntu) Status: New => Fix Released
[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out
Start-Date: 2018-04-03 22:22:21
Commandline: apt full-upgrade
Requested-By: xnox (1000)
... lxc:amd64 (3.0.0~beta3-0ubuntu1, 3.0.0-0ubuntu2), lxd:amd64 (3.0.0~beta5-0ubuntu2, 3.0.0-0ubuntu1), ...

Meanwhile:

> Apr 04 09:45:04 sochi lxd[27714]: Error: failed to open cluster database: failed to ensure schema: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation

...
End-Date: 2018-04-04 09:50:09

Meaning it was kind of stuck, and I guess I had to kill things, possibly lxd, at about this time. And I guess I tried to upgrade again straight after, because I also have:

Start-Date: 2018-04-04 09:52:08
Commandline: apt full-upgrade
Requested-By: xnox (1000)
... lxd:amd64 (3.0.0-0ubuntu1, 3.0.0-0ubuntu2) ...
Apr 04 09:57:43 sochi lxd[15068]: Error: failed to open cluster database: failed to ensure schema: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation
...
End-Date: 2018-04-04 10:07:02

And many more "failed to open cluster database" errors since, including a few on Apr 20.
[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out
We don't want systemd logic inside LXD itself. If there's a way to tell systemd in its unit to kill any PostStart tasks when the main process exits non-zero, then we'd definitely add that to the unit.
[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out
INVOCATION_ID will be set in the environment when the process is executed as part of an invocation of a systemd unit. (This can also be reliably verified from the process keyring.) When it is set, waitready can check whether that invocation ID is present on the current system's systemd by talking to /run/systemd/private. It could then establish a watch on the main process of the matching invocation ID, if found. If all of the above checks out, waitready knows for a fact that this invocation ID is the one it is blocked on. It can then establish a job status watch and, after establishing the watch, check whether the main process has already died. By definition, waitready is spawned after the main process is spawned, so this should not be racy. waitready would then receive the event that the main PID died and bail out with an error earlier than the 10-minute timeout. If something like that would be welcome, I could work on a patch that does this inside the waitready function. Let me dig through my logs w.r.t. the database stuff.
[Bug 1765699] Re: lxd fails to start main process, yet waitready doesn't bail out
LXD itself doesn't interact with systemd, so I'm not sure how waitready would know that the main systemd unit failed. What waitready does is attempt to talk to the LXD API (which will block due to LXD's socket activation), calling an internal REST API endpoint to wait for LXD's early initialization to complete. As for your database being unhappy, when did that happen? Was that on the initial upgrade from 2.21 to 3.0, or what exactly was the upgrade sequence involved here? It should be reasonably easy to have LXD re-convert the database, likely fixing the problem, but I'd like to figure out when it broke to see if it's one of the issues that we have since resolved.