[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
This bug was fixed in the package mysql-5.5 - 5.5.20-0ubuntu1

---
mysql-5.5 (5.5.20-0ubuntu1) precise; urgency=low

  * New upstream release.
  * d/mysql-server-5.5.mysql.upstart: Fix stop on to make sure mysql is
    fully stopped before shutdown commences. (LP: #688541) Also simplify
    start on as it is redundant.
  * d/control: Depend on upstart version which has apparmor profile load
    script to prevent failure on upgrade from lucid to precise.
    (LP: #907465)
  * d/apparmor-profile: need to allow /run since that is the true path of
    /var/run files. (LP: #917542)
  * d/control: mysql-server-5.5 has files in it that used to be owned by
    libmysqlclient-dev, so it must break/replace it. (LP: #912487)
  * d/rules, d/control: 5.5.20 Fixes segfault on tests with gcc 4.6, change
    compiler back to system default.
  * d/rules: Turn off embedded libedit/readline. (Closes: #659566)

 -- Clint Byrum  Tue, 14 Feb 2012 23:59:22 -0800

** Changed in: mysql-5.5 (Ubuntu Precise)
       Status: Invalid => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/688541

Title:
  race condition on shutdown (leads to corrupted fs)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mysql-5.1/+bug/688541/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
This bug was fixed in the package sysvinit - 2.88dsf-13.10ubuntu4.1

---
sysvinit (2.88dsf-13.10ubuntu4.1) oneiric-proposed; urgency=low

  * d/src/initscripts/etc/init.d/sendsigs: wait up to 300 extra seconds
    for upstart jobs that have been killed. They will be sent SIGKILL by
    upstart when their 'kill timeout' has been reached, so we should trust
    the job's author to give the service a reasonable amount of time to
    shut down. (LP: #688541)
  * also omit pids of stop/killed upstart jobs since we know they've been
    killed already.
  * d/src/initscripts/etc/init.d/umountroot: Check for init.upgraded file
    in /var/run before clearing out /var/run. (LP: #886439)

 -- Clint Byrum  Mon, 12 Dec 2011 16:08:10 -0800

** Changed in: sysvinit (Ubuntu Oneiric)
       Status: Fix Committed => Fix Released
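The sendsigs change in this changelog is a shell edit to the initscripts; what follows is only a rough Python sketch of the same idea, not the actual patch. It assumes `initctl list` prints lines shaped like `mysql stop/killed, process 1234` (optionally with an instance name in parentheses after the job name), and the helpers `killed_job_pids`, `pid_alive` and `wait_for_pids` are hypothetical names. The PIDs it collects are the ones sendsigs would leave out of its kill pass and then wait on, with the wait capped at 300 seconds.

```python
import os
import re
import time

# Assumed `initctl list` line shape (hypothetical parser, not sendsigs code):
#   "mysql stop/killed, process 1234"
#   "network-interface (eth0) start/running"
_LINE = re.compile(r"^(\S+)(?:\s+\([^)]*\))?\s+(\w+)/(\w+)(?:,\s+process\s+(\d+))?")


def killed_job_pids(initctl_output):
    """Return PIDs of upstart jobs reported in the stop/killed state."""
    pids = []
    for line in initctl_output.splitlines():
        m = _LINE.match(line.strip())
        if m and m.group(2) == "stop" and m.group(3) == "killed" and m.group(4):
            pids.append(int(m.group(4)))
    return pids


def pid_alive(pid):
    """Probe for a process with signal 0, which checks existence only."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def wait_for_pids(pids, max_wait=300.0, poll=1.0):
    """Poll until every pid has exited or max_wait seconds pass.

    Returns the pids that are still alive when the wait ends.
    """
    deadline = time.monotonic() + max_wait
    remaining = [p for p in pids if pid_alive(p)]
    while remaining and time.monotonic() < deadline:
        time.sleep(poll)
        remaining = [p for p in remaining if pid_alive(p)]
    return remaining
```

Since upstart itself sends SIGKILL once a job's `kill timeout` expires, the cap only matters for jobs whose timeout is longer than the cap.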
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
Alan, you are my stable release update hero this week. :) The update should land in about 5 days (minimum 7 in -proposed just in case we missed a major regression).

** Tags removed: verification-needed
** Tags added: verification-done
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
Tested the Python "test case" from the bug description using the initscripts in Oneiric -proposed. Prior to the patch, I did see "Killing all remaining processes... fail" as described. After the patch, I saw "All processes ended within 16 seconds". According to the test case, this is a successful fix.
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
Hello Michael, or anyone else affected,

Accepted sysvinit into oneiric-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

** Tags added: verification-needed
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
Fix is waiting in the oneiric-proposed queue.

** Description changed:

+ == SRU JUSTIFICATION ==
+
+ IMPACT: potential data loss or extension of downtime. MySQL, for
+ example, if sent a SIGKILL before it is done flushing its buffers into
+ MyISAM tables, will lose that data. If using InnoDB, the transactions
+ will have to be replayed from the transaction log at startup, which can
+ take far longer than completing the flush procedure which the 300 second
+ kill timeout in its job file allows for.
+
+ TEST CASE:
+
+ 1. Create a script, /usr/local/bin/15seconds.py, with this as the
+ content:
+
+ # BEGIN COPY/PASTE #
+ #!/usr/bin/python
+
+ import time
+ import signal
+ import logging
+ import sys
+
+ logging.basicConfig(level=logging.INFO, format="TEST: %(asctime)s: %(message)s")
+
+ def shutdown_process(sig, frame):
+     logging.info("sleeping 15 seconds...")
+     time.sleep(15)
+     logging.info("now exitting...")
+     sys.exit(0)
+
+ signal.signal(signal.SIGTERM, shutdown_process)
+
+ logging.info("Entering infinite loop")
+ while True:
+     time.sleep(1)
+ # END COPY/PASTE #
+
+ chmod +x /usr/local/bin/15seconds.py
+
+ 2. Create an upstart job file, /etc/init/15sec.conf, to run this:
+
+ # BEGIN COPY/PASTE #
+ start on runlevel [2345]
+ stop on runlevel [016]
+
+ respawn
+
+ kill timeout 17
+
+ console output
+
+ exec /usr/local/bin/15seconds.py
+ # END COPY/PASTE #
+
+ 3. sudo initctl start 15sec
+ 4. sudo shutdown -h now
+
+ On an affected system, the job will be sent SIGKILL before the 15 second
+ kill timeout, so your shutdown log will look something like this:
+
+ Checking for running unattended-upgrades:
+ TEST: 2011-12-13 01:02:54,638: sleeping 15 seconds...
+ * Asking all remaining processes to terminate... TEST: 2011-12-13 01:02:54,818: now exitting...
+ TEST: 2011-12-13 01:02:54,819: sleeping 15 seconds...
+ [ OK ]
+ * Killing all remaining processes... [fail]
+ * Deconfiguring network interfaces... [ OK ]
+ * Deactivating swap... [ OK ]
+ * Will now halt
+ [ 68.020383] System halted.
+
+ An unaffected system will look like this:
+
+ Checking for running unattended-upgrades:
+ TEST: 2011-12-13 00:52:30,476: sleeping 15 seconds...
+ * Asking all remaining processes to terminate...[ OK ]
+ TEST: 2011-12-13 00:52:45,497: now exitting...
+ * All processes ended within 16 seconds [ OK ]
+ * Deconfiguring network interfaces... [ OK ]
+ * Deactivating swap... [ OK ]
+ * Will now halt
+ [ 356.481556] System halted.
+
+ Note that the 15sec job is waited on once the bug is fixed, whereas in
+ the unpatched version it is killed immediately.
+
+ DEV FIX: The sendsigs script has not been changed in precise other than
+ for this patch.
+
+ REGRESSION POTENTIAL: There may be scenarios and jobs that have very
+ high kill timeouts which will cause system shutdowns to wait for up to
+ 300 seconds instead of the previous 10. This is considered a good
+ balance between waiting long enough for any reasonable application to
+ flush its buffers and short enough that we won't run up against any
+ battery backup systems running out of battery power.
+
+ ==
+
  I'm using mysql-server-5.1 on a 10.04 LTS installation.
  The mysql db is around 27GB and on a separate partition mounted as
  /var/lib/mysql.

  On shutdown I get the following error message:

- Checking for running unattended-upgrades: * Asking all remaining processes to terminate...
-
+ Checking for running unattended-upgrades: * Asking all remaining processes to terminate...
+
  [ OK ]
- * All processes ended within 1 seconds
-
+ * All processes ended within 1 seconds
+
  [ OK ]
- * Deconfiguring network interfaces...
-
+ * Deconfiguring network interfaces...
+
  [ OK ]
- * Deactivating swap...
-
+ * Deactivating swap...
+
  [ OK ]
- * Unmounting local filesystems...
+ * Unmounting local filesystems...
  umount2: Device or resource busy
  umount: /var/lib/mysql: device is busy.
- (In some cases useful info about processes that use
- the device is found by lsof(8) or fuser(1))
+ (In some cases useful info about processes that use
+ the device is found by lsof(8) or fuser(1))
  umount2: Device or resource busy
  umount2: Device or resource busy
  umount: /tmp: device is busy.
- (In some cases useful info about processes that use
- the device is found by lsof(8) or fuser(1))
+ (In some cases useful info about processes that
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
** Also affects: mysql-5.1 (Ubuntu Oneiric)
   Importance: Undecided
       Status: New

** Also affects: mysql-5.5 (Ubuntu Oneiric)
   Importance: Undecided
       Status: New

** Also affects: sysvinit (Ubuntu Oneiric)
   Importance: Undecided
       Status: New

** Changed in: mysql-5.1 (Ubuntu Oneiric)
       Status: New => Invalid

** Changed in: mysql-5.5 (Ubuntu Oneiric)
       Status: New => Invalid

** Changed in: sysvinit (Ubuntu Oneiric)
       Status: New => In Progress

** Changed in: sysvinit (Ubuntu Oneiric)
     Assignee: (unassigned) => Clint Byrum (clint-fewbar)
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
This bug was fixed in the package sysvinit - 2.88dsf-13.10ubuntu8

---
sysvinit (2.88dsf-13.10ubuntu8) precise; urgency=low

  * d/src/initscripts/etc/init.d/sendsigs: wait up to 300 extra seconds
    for upstart jobs that have been killed. They will be sent SIGKILL by
    upstart when their 'kill timeout' has been reached, so we should trust
    the job's author to give the service a reasonable amount of time to
    shut down. (LP: #688541)
  * also omit pids of stop/killed upstart jobs since we know they've been
    killed already.
  * d/src/initscripts/etc/init.d/umountroot: Check for init.upgraded file
    in /var/run before clearing out /var/run. (LP: #886439)

 -- Clint Byrum  Mon, 12 Dec 2011 16:16:37 -0800

** Branch linked: lp:ubuntu/sysvinit

** Changed in: sysvinit (Ubuntu Precise)
       Status: In Progress => Fix Released
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
** Branch linked: lp:~clint-fewbar/ubuntu/precise/sysvinit/wait-for-long-shutdown-jobs
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
** Changed in: mysql-5.5 (Ubuntu Precise)
       Status: In Progress => Invalid

** Changed in: mysql-5.1 (Ubuntu Precise)
       Status: Triaged => Invalid

** Changed in: sysvinit (Ubuntu Precise)
       Status: Triaged => In Progress

** Changed in: sysvinit (Ubuntu Precise)
     Assignee: Canonical Foundations Team (canonical-foundations) => Clint Byrum (clint-fewbar)
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
I think ultimately no, mysql-5.5 shouldn't be changed for this, and agreed, the change should be straightforward. I was not certain if we were willing to tackle the sysvinit change in precise, but on second thought, of course we should be. So I'll hold back on the change to mysql-5.5 and take a look at sysvinit.
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
I agree that this should be fixed in sysvinit. Is it really appropriate to change mysql-5.5 at all? It should be a straightforward change to sysvinit, and the mysql change should be reverted afterwards.

** Tags added: rls-p-tracking

** Also affects: mysql-5.1 (Ubuntu Precise)
   Importance: High
     Assignee: Clint Byrum (clint-fewbar)
       Status: Triaged

** Also affects: mysql-5.5 (Ubuntu Precise)
   Importance: High
     Assignee: Clint Byrum (clint-fewbar)
       Status: In Progress

** Also affects: sysvinit (Ubuntu Precise)
   Importance: High
     Assignee: Canonical Foundations Team (canonical-foundations)
       Status: Triaged

** Changed in: sysvinit (Ubuntu Precise)
    Milestone: None => ubuntu-12.04
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
** Changed in: mysql-5.5 (Ubuntu)
     Assignee: (unassigned) => Clint Byrum (clint-fewbar)

** Changed in: mysql-5.1 (Ubuntu)
       Status: In Progress => Triaged
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
** Branch linked: lp:~clint-fewbar/ubuntu/precise/mysql-5.5/merge-from-ddebian
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
** Also affects: mysql-5.5 (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: mysql-5.5 (Ubuntu)
       Status: New => In Progress

** Changed in: mysql-5.5 (Ubuntu)
   Importance: Undecided => High
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
So, I believe the right way to handle this is to wait a long time for any upstart job that has a status of 'stop/killed'. We can't be finding all of the "services that are slow to shutdown" one by one. Authors of upstart jobs will know how long to wait before sending kill -9. Once kill -9 has been sent, the job's state actually changes to post-stop, so sendsigs wouldn't wait any longer anyway, but we should cap the wait at something longer than 10 seconds. I would suggest 5 minutes.

Anyway, because of this, I don't think we should just fix this in mysql; we should fix it in sysvinit. However, until it's fixed in sysvinit, I'll change mysql's stop on to be 'stop on starting rc RUNLEVEL...'

** Changed in: mysql-5.1 (Ubuntu)
     Assignee: Canonical Foundations Team (canonical-foundations) => Clint Byrum (clint-fewbar)

** Changed in: mysql-5.1 (Ubuntu)
       Status: Triaged => In Progress
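To illustrate the 'kill timeout' contract relied on here (upstart sends SIGTERM, waits the job's kill timeout, then sends SIGKILL), below is a small self-contained Python simulation. It is a sketch, not upstart code: `CHILD`, `stop_job` and `run` are hypothetical names, the child mimics 15seconds.py from the test case on a shorter time scale, and `Popen.kill()` stands in for kill -9 (it sends SIGKILL on POSIX).

```python
import signal
import subprocess
import sys

# Child program: on SIGTERM it "flushes" for `delay` seconds, then exits 0,
# like the 15seconds.py test job but with a configurable delay.
CHILD = """
import signal, sys, time
delay = float(sys.argv[1])
def on_term(sig, frame):
    time.sleep(delay)
    sys.exit(0)
signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def stop_job(proc, kill_timeout):
    """Send SIGTERM, wait up to kill_timeout seconds, then SIGKILL.

    Returns "exited" if the process finished on its own, else "killed".
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=kill_timeout)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: the handler never gets to finish
        proc.wait()
        return "killed"


def run(delay, kill_timeout):
    """Start a child whose SIGTERM handler takes `delay` seconds, then stop it."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD, str(delay)],
                            stdout=subprocess.PIPE, text=True)
    try:
        proc.stdout.readline()  # block until the SIGTERM handler is installed
        return stop_job(proc, kill_timeout)
    finally:
        proc.stdout.close()


if __name__ == "__main__":
    print(run(delay=0.5, kill_timeout=2.0))  # handler finishes within the timeout
    print(run(delay=2.0, kill_timeout=0.5))  # killed mid-"flush"
```

When the handler finishes inside the timeout the job exits cleanly; otherwise it is killed partway through its shutdown work, which is the data-loss case the SRU justification describes.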
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
** Changed in: sysvinit (Ubuntu)
   Importance: Undecided => High

** Changed in: sysvinit (Ubuntu)
     Assignee: (unassigned) => Canonical Foundations Team (canonical-foundations)
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
I am unsubscribing from this bug for the time being. It does not make sense to deal with the symptoms until the root of the evil, Bug #672177, is fixed.
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
And here is the output of initctl list after the reinstall of libc6. It is amazing that even though I selected "root shell without network" in the maintenance system, a lot of services, including network, are up and running (I used scp to copy the output to my PC).

** Attachment added: ""initctl list" output after reinstall of libc6"
   https://bugs.launchpad.net/ubuntu/+source/mysql-5.1/+bug/688541/+attachment/1775392/+files/initctl.out
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
I first tried to grep within an xterm, but there I get faulty output regarding gvfs. I suppose it's not needed. So I booted into a maintenance root shell (without network) and did:

lsof -n | grep deleted
-> nothing reported

apt-get install --reinstall libc6

and afterwards:

lsof -n | grep deleted
-> nothing reported

Rebooting brings up the 8 orphaned inodes. May I conclude that the reinstall of the libc6 package performs correctly and the file-system corruption is caused by the shutdown process?

I am prepared to do more tests, just advise (I am not an expert), and please consider local time: I am living east of Greenwich.

Merry Christmas,
Ingo
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
On Thu, 2010-12-23 at 22:07 +, ingo wrote:
> I took you literally and changed all [!2345], not the others:
>
> The remaining now are:
>
> fgrep "stop on runlevel" /etc/init/*.conf
> /etc/init/rc.conf:stop on runlevel [!$RUNLEVEL]
> /etc/init/rcS.conf:stop on runlevel [!S]
> /etc/init/rc-sysinit.conf:stop on runlevel
> /etc/init/tty2.conf:stop on runlevel [!23]
> /etc/init/tty3.conf:stop on runlevel [!23]
> /etc/init/tty4.conf:stop on runlevel [!23]
> /etc/init/tty5.conf:stop on runlevel [!23]
> /etc/init/tty6.conf:stop on runlevel [!23]
> /etc/init/ufw.conf:stop on runlevel [!023456]
>
> I still get the orphaned inodes. Shall I also convert the ttys?
>

You can, but I doubt they're the problem. Can you paste the output of

lsof -n | grep deleted

after the reinstall? Thanks.
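The `lsof -n | grep deleted` check looks for processes that still hold files which have since been unlinked, such as the old libc after `apt-get install --reinstall libc6`. Such an inode can only be freed once its last user is gone, which is presumably where the orphaned inodes come from when the filesystem is not cleanly unmounted. As a rough Python equivalent restricted to memory-mapped files, one can scan `/proc/<pid>/maps` for the kernel's ` (deleted)` suffix. This is a sketch under that assumption: `deleted_mappings` and `scan_processes` are hypothetical helpers, and unlike lsof they do not inspect open file descriptors under `/proc/<pid>/fd`.

```python
import os

DELETED_SUFFIX = " (deleted)"


def deleted_mappings(maps_text):
    """Return sorted paths marked ' (deleted)' in one /proc/<pid>/maps dump."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]. Splitting at most
        # five times keeps a pathname that contains spaces in one piece.
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping, no pathname
        name = parts[5].rstrip()
        if name.endswith(DELETED_SUFFIX):
            paths.add(name[: -len(DELETED_SUFFIX)])
    return sorted(paths)


def scan_processes(proc_root="/proc"):
    """Map pid -> deleted mapped files for every process we can read."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc_root, entry, "maps"), errors="replace") as f:
                found = deleted_mappings(f.read())
        except OSError:
            continue  # process exited, or permission denied
        if found:
            result[int(entry)] = found
    return result


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, paths in sorted(scan_processes().items()):
        print(pid, ", ".join(paths))
```

Run as root it sees every process; as a normal user it silently skips the maps it cannot read.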
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
I took you literally and changed all [!2345], not the others.

The remaining ones now are:

fgrep "stop on runlevel" /etc/init/*.conf
/etc/init/rc.conf:stop on runlevel [!$RUNLEVEL]
/etc/init/rcS.conf:stop on runlevel [!S]
/etc/init/rc-sysinit.conf:stop on runlevel
/etc/init/tty2.conf:stop on runlevel [!23]
/etc/init/tty3.conf:stop on runlevel [!23]
/etc/init/tty4.conf:stop on runlevel [!23]
/etc/init/tty5.conf:stop on runlevel [!23]
/etc/init/tty6.conf:stop on runlevel [!23]
/etc/init/ufw.conf:stop on runlevel [!023456]

I still get the orphaned inodes. Shall I also convert the ttys?
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
On Thu, 2010-12-23 at 12:40 +, ingo wrote:
> @Clint:
>
> I did test your proposal in Maverick.
>
> Before editing the stop scripts:
>
> fgrep "stop on runlevel" /etc/init/*.conf
> /etc/init/acpid.conf:stop on runlevel [!2345]
> /etc/init/anacron.conf:stop on runlevel [!2345]
> /etc/init/apport.conf:stop on runlevel [!2345]
> /etc/init/atd.conf:stop on runlevel [!2345]
> /etc/init/cron.conf:stop on runlevel [!2345]
> /etc/init/cups.conf:stop on runlevel [016]
> /etc/init/dbus.conf:stop on runlevel [06]
> /etc/init/failsafe-x.conf:stop on runlevel [06]
> /etc/init/gdm.conf:stop on runlevel [016]
> /etc/init/irqbalance.conf:stop on runlevel [!2345]
> /etc/init/mountall-shell.conf:stop on runlevel [06]
> /etc/init/rc.conf:stop on runlevel [!$RUNLEVEL]
> /etc/init/rcS.conf:stop on runlevel [!S]
> /etc/init/rc-sysinit.conf:stop on runlevel
> /etc/init/rsyslog.conf:stop on runlevel [06]
> /etc/init/tty1.conf:stop on runlevel [!2345]
> /etc/init/tty2.conf:stop on runlevel [!23]
> /etc/init/tty3.conf:stop on runlevel [!23]
> /etc/init/tty4.conf:stop on runlevel [!23]
> /etc/init/tty5.conf:stop on runlevel [!23]
> /etc/init/tty6.conf:stop on runlevel [!23]
> /etc/init/udev.conf:stop on runlevel [06]
> /etc/init/ufw.conf:stop on runlevel [!023456]
>
> After editing the stop scripts:
>
> fgrep "stop on starting" /etc/init/*.conf
> /etc/init/cups.conf:stop on starting rc RUNLEVEL=[016]
> /etc/init/dbus.conf:stop on starting rc RUNLEVEL=[06]
> /etc/init/failsafe-x.conf:stop on starting rc RUNLEVEL=[06]
> /etc/init/gdm.conf:stop on starting rc RUNLEVEL=[016]
> /etc/init/mountall.conf:stop on starting rcS
> /etc/init/mountall-shell.conf:stop on starting rc RUNLEVEL=[06]
> /etc/init/rsyslog.conf:stop on starting rc RUNLEVEL=[06]
> /etc/init/udev.conf:stop on starting rc RUNLEVEL=[06]
>
> Then execute
>
> apt-get install --reinstall libc6
> and reboot:
>
> I still get the 8 orphaned inodes as reported already.
>
> Did I forget to change the other scripts as well, like this?
> stop on runlevel [!2345] -> stop on starting rc RUNLEVEL=[016]
>

Yes, and some of those are probably the most likely to have libc open. If doing the same to all of the !2345's does not fix the corruption, can you do:

apt-get install --reinstall libc6
lsof -n | grep deleted
initctl list

And paste or upload the output of that here?

> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/688541
>
> Title:
>   race condition on shutdown (leads to corrupted fs)
>
> Status in “mysql-5.1” package in Ubuntu:
>   Triaged
> Status in “sysvinit” package in Ubuntu:
>   Triaged
>
> Bug description:
>   I'm using mysql-server-5.1 on a 10.04 LTS installation.
>   The mysql db is around 27GB and on a separate partition mounted as
>   /var/lib/mysql.
>
>   On shutdown I get the following error message:
>
>   Checking for running unattended-upgrades: * Asking all remaining processes
>   to terminate...
>   [ OK ]
>   * All processes ended within 1 seconds
>   [ OK ]
>   * Deconfiguring network interfaces...
>   [ OK ]
>   * Deactivating swap...
>   [ OK ]
>   * Unmounting local filesystems...
>   umount2: Device or resource busy
>   umount: /var/lib/mysql: device is busy.
>   (In some cases useful info about processes that use
>   the device is found by lsof(8) or fuser(1))
>   umount2: Device or resource busy
>   umount2: Device or resource busy
>   umount: /tmp: device is busy.
>   (In some cases useful info about processes that use
>   the device is found by lsof(8) or fuser(1))
>   umount2: Device or resource busy
>   [fail]
>   mount: / is busy
>   * Will now restart
>   [ 3369.429751] Restarting system.
>
>   On the next reboot the file system is corrupt and needs to be fsck-ed.
>
>   I think the problem is that mysql uses an upstart job (/etc/init/mysql.conf)
>   and has
>   stop on runlevel [016]
>
>   The rc.conf job is also triggered on runlevel 0 and 6, so they basically run
>   at the same time.
>
>   When /etc/rc0.d/S20sendsigs is run, it deliberately does not wait for or kill
>   any upstart jobs.
>
>   As my mysqld process takes some time to shut down, S40umountfs and
>   S60umountroot are run before mysqld has quit.
>
>   This leads to the fs not being properly unmounted. It is even possible that
>   mysqld is forcefully killed by halt in S90halt if it hasn't stopped by then.
>
>   This is a serious issue, as it can (and will) lead to data loss.
>
>   Other upstart jobs, like rsyslog.conf, use the same "stop on runlevel [016]"
>   stanza, so they are probably affected too.
>
>   ProblemType: Bug
>   DistroRelease: Ubuntu 10.10
>   Package: mysql-server-5.1 5.1.49-1ubuntu8.1
>   Uname: Linux 2.6.32-5-686 i686
>   NonfreeKernelModules: michael_mic arc4 ecb lib80211_crypt_tkip aes_i586
>   aes_generic lib80211_crypt_ccmp sco bnep rfcomm l2cap binfmt_misc
>   acpi_cpufreq ppdev lp cpuf
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
@Clint:

I did test your proposal in Maverick.

Before editing the stop scripts:

fgrep "stop on runlevel" /etc/init/*.conf
/etc/init/acpid.conf:stop on runlevel [!2345]
/etc/init/anacron.conf:stop on runlevel [!2345]
/etc/init/apport.conf:stop on runlevel [!2345]
/etc/init/atd.conf:stop on runlevel [!2345]
/etc/init/cron.conf:stop on runlevel [!2345]
/etc/init/cups.conf:stop on runlevel [016]
/etc/init/dbus.conf:stop on runlevel [06]
/etc/init/failsafe-x.conf:stop on runlevel [06]
/etc/init/gdm.conf:stop on runlevel [016]
/etc/init/irqbalance.conf:stop on runlevel [!2345]
/etc/init/mountall-shell.conf:stop on runlevel [06]
/etc/init/rc.conf:stop on runlevel [!$RUNLEVEL]
/etc/init/rcS.conf:stop on runlevel [!S]
/etc/init/rc-sysinit.conf:stop on runlevel
/etc/init/rsyslog.conf:stop on runlevel [06]
/etc/init/tty1.conf:stop on runlevel [!2345]
/etc/init/tty2.conf:stop on runlevel [!23]
/etc/init/tty3.conf:stop on runlevel [!23]
/etc/init/tty4.conf:stop on runlevel [!23]
/etc/init/tty5.conf:stop on runlevel [!23]
/etc/init/tty6.conf:stop on runlevel [!23]
/etc/init/udev.conf:stop on runlevel [06]
/etc/init/ufw.conf:stop on runlevel [!023456]

After editing the stop scripts:

fgrep "stop on starting" /etc/init/*.conf
/etc/init/cups.conf:stop on starting rc RUNLEVEL=[016]
/etc/init/dbus.conf:stop on starting rc RUNLEVEL=[06]
/etc/init/failsafe-x.conf:stop on starting rc RUNLEVEL=[06]
/etc/init/gdm.conf:stop on starting rc RUNLEVEL=[016]
/etc/init/mountall.conf:stop on starting rcS
/etc/init/mountall-shell.conf:stop on starting rc RUNLEVEL=[06]
/etc/init/rsyslog.conf:stop on starting rc RUNLEVEL=[06]
/etc/init/udev.conf:stop on starting rc RUNLEVEL=[06]

Then I executed

apt-get install --reinstall libc6

and rebooted: I still get the 8 orphaned inodes as reported already.

Did I forget to change the other scripts as well, like this?

stop on runlevel [!2345] -> stop on starting rc RUNLEVEL=[016]
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
I was working with someone on another issue recently, and he pointed out a situation where someone had used this:

start on starting rc RUNLEVEL=[06]

to run a specific task before the system shut down. It got me thinking: should we instead just transition services that need to stop before shutdown to

stop on starting rc RUNLEVEL=[016]

That would cause these jobs to stop fully before any of the bits of the shutdown run. They'd still shut down in parallel, so it wouldn't make the shutdown slower.

I do think you have to do *all* services like this. Even one left holding deleted libraries open can still ruin the shutdown process.

Anyway, I like this even shorter-term solution because it allows us to SRU individual problem daemons such as mysql without creating a new event.
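As a sketch of the conversion proposed here, the hypothetical `convert_stop_stanza` helper below rewrites a `stop on runlevel [...]` stanza into the `stop on starting rc RUNLEVEL=[...]` form. A negated set is expanded to its complement over runlevels 0-6, so `[!2345]` becomes `[016]`, the same mapping confirmed elsewhere in this thread. Stanzas that use `$RUNLEVEL` or the `S` runlevel (rc.conf, rcS.conf, rc-sysinit.conf) are deliberately left alone; that cut-off is an assumption of this sketch, not something the thread specifies.

```python
# Runlevels 0-6; the single-user "S" level is not handled by this sketch.
RUNLEVELS = "0123456"


def expand(spec):
    """Expand a runlevel glob like '[016]' or '[!2345]' to a sorted string."""
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError("expected a bracketed runlevel set: %r" % spec)
    body = spec[1:-1]
    if body.startswith("!"):
        # Negated set: keep every runlevel that is NOT listed.
        excluded = set(body[1:])
        return "".join(r for r in RUNLEVELS if r not in excluded)
    return "".join(r for r in RUNLEVELS if r in body)


def convert_stop_stanza(line):
    """Rewrite 'stop on runlevel [X]' as 'stop on starting rc RUNLEVEL=[Y]'.

    Anything else, including stanzas using $RUNLEVEL or the S runlevel,
    is returned unchanged.
    """
    prefix = "stop on runlevel "
    stripped = line.strip()
    if not stripped.startswith(prefix):
        return line
    spec = stripped[len(prefix):].strip()
    if "$" in spec or "S" in spec:
        return line
    try:
        levels = expand(spec)
    except ValueError:
        return line
    if not levels:
        return line  # an empty set would never match; leave it alone
    return "stop on starting rc RUNLEVEL=[%s]" % levels


if __name__ == "__main__":
    for stanza in ["stop on runlevel [!2345]", "stop on runlevel [06]",
                   "stop on runlevel [!23]", "stop on runlevel [!$RUNLEVEL]"]:
        print(stanza, "->", convert_stop_stanza(stanza))
```

Note that `[!23]` (the tty2-tty6 jobs) expands to `[01456]`, so those jobs would also stop when rc starts for runlevels 1, 4 or 5, just as the original negated stanza would.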
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
> This doesn't affect Debian as the upstart package in Debian still uses
> plain sysv compat and there are no native upstart jobs yet.

A wise decision, good to know.

Thanks, Ingo
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
2010/12/21 ingo <688...@bugs.launchpad.net>:
> On Tue, 2010-12-21 at 12:41 +, Scott James Remnant wrote:
>> General note: ubuntu-devel is *NOT* the correct list to discuss Upstart
>> changes unless they're unique to Ubuntu.
>
> Wouldn't it be fair to inform Debian about those problems before they release
> Squeeze?
> (though I never observed it on Squeeze till now)

This doesn't affect Debian as the upstart package in Debian still uses plain sysv compat and there are no native upstart jobs yet.

Michael (upstart maintainer in Debian)

-- Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
On Tue, 2010-12-21 at 12:41 +, Scott James Remnant wrote:
> General note: ubuntu-devel is *NOT* the correct list to discuss Upstart
> changes unless they're unique to Ubuntu.

Wouldn't it be fair to inform Debian about those problems before they release Squeeze? (though I never observed it on Squeeze till now)
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
@Michael: yes, this should be "stop on unmount-filesystem or single-user" (we can create a new event for single-user to make the logic clearer).

@Clint: I agree that full migration sounds like the best approach. I have had a few discussions previously with Scott on the idea of abstract jobs. There is quite a lot of scope here. Aside from network-services, we could introduce jobs such as:

- "network-manager" (not the application; it could also refer to connman, wicd, etc.)
- "firewall" (iptables, ufw, etc.)
- "display-manager" (gdm, kdm, xdm, etc.)
- "ssh" (openssh, dropbear)
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
On Tue, 2010-12-21 at 12:41 +, Scott James Remnant wrote:
> On 20/12/10 18:22, Clint Byrum wrote:
> > In a message to ubuntu-devel I suggested that we have an abstract job,
> > 'network-services', which most normal (non boot-critical) services
> > should follow.
> >
> > https://lists.ubuntu.com/archives/ubuntu-devel/2010-December/032254.html
> >
> General note: ubuntu-devel is *NOT* the correct list to discuss Upstart
> changes unless they're unique to Ubuntu.
>
> Thanks, Scott

In this case, I don't know whether this would be unique to Ubuntu or not. I am not suggesting a code change in upstart with that message, but rather a change in the way upstart is used and packaged in Ubuntu. Though it would be rather nice if everybody used upstart the same way.
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
On Mon, 2010-12-20 at 12:50 +, James Hunt wrote:
> After discussion with Scott, the best short-term solution would seem to
> be:
>
> 1) Modify /etc/init.d/umountfs to call the following in do_stop before
> calling umount/swapoff:
>
> "initctl emit unmount-filesystem"
>
> 2) Modify /etc/init.d/umountroot to call the following in do_stop before
> calling umount:
>
> "initctl emit unmount-root-filesystem"
>
>
> 3) Modify all upstart configs for services which are "slow" to stop such that
> they "stop on unmount-filesystem",
> rather than "stop on runlevel [016]".
>
> 4) Test!
>
> The overall effect of this being that when /etc/init.d/umountfs emits
> the unmount-filesystem event, it will block until any Upstart jobs which
> "stop on" those events have completed. Thus, /etc/init.d/umountfs will
> wait for the mysql Upstart job to finish before unmounting its
> filesystems.

Not much happens between rc-sysinit starting and sendsigs/umountfs. Does "slow" even mean as much as 1 second between SIGTERM and exiting? Shouldn't we just make sure that everything with 'stop on runlevel [!2345]' or 'stop on runlevel [016]' stops before we umount?

bug #672177 may very well be caused simply by killing the last service that had the deleted libc.so.6 open, which forces the filesystem to finish the deletion right then; that can mean waiting on a sync and on many other files being flushed, etc., on a busy rotational disk. This will cause something very tiny to take a second to die.

I think we must transition *everything* that stops on runlevel [016] to 'stop on unmounting-filesystems', or get clever and find a way to wait until upstart is done stopping everything it already wants to stop. I do think that initctl list is flawed for this task, but it might be the best chance we have at catching stragglers.

In a message to ubuntu-devel I suggested that we have an abstract job, 'network-services', which most normal (non boot-critical) services should follow.

https://lists.ubuntu.com/archives/ubuntu-devel/2010-December/032254.html

By taking this approach, we can at least amend this fix if it has unintended consequences.

There's also still the issue (which probably should be its own bug report) that sendsigs will kill the children of already-stopping jobs, which it shouldn't do, and which it would still do with the suggested fix, since sendsigs runs before umountfs.
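As a starting point for the "transition *everything*" step, a minimal audit sketch could list the jobs whose stop stanza is still tied to `runlevel [016]` or `runlevel [!2345]`. The function name is hypothetical, it only looks for the two patterns named in this message (the regular expression would need extending for `[06]` and similar), and it assumes each `stop on` stanza sits on one line.

```sh
# Hypothetical audit helper: print (sorted) the names of jobs in DIR
# (default /etc/init) whose "stop on" stanza still reads
# "runlevel [016]" or "runlevel [!2345]".
jobs_stopping_on_runlevel() {
    dir=${1:-/etc/init}
    grep -lE '^[[:space:]]*stop on runlevel \[(016|!2345)\]' "$dir"/*.conf 2>/dev/null |
        sed -e 's#.*/##' -e 's#\.conf$##' |
        sort
}
```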
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
2010/12/20 James Hunt <688...@bugs.launchpad.net>:
>
> 3) Modify all upstart configs for services which are "slow" to stop such that
> they "stop on unmount-filesystem",
> rather than "stop on runlevel [016]".

- What about single-user mode? I guess when switching to runlevel 1 we want to stop services like mysql?
- How do you decide whether a service is "slow" to stop? IMHO that depends heavily on the hardware, the local configuration and the amount of data you are dealing with. A general approach would be preferable.
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
After discussion with Scott, the best short-term solution would seem to be:

1) Modify /etc/init.d/umountfs to call the following in do_stop before calling umount/swapoff:

"initctl emit unmount-filesystem"

2) Modify /etc/init.d/umountroot to call the following in do_stop before calling umount:

"initctl emit unmount-root-filesystem"

3) Modify all upstart configs for services which are "slow" to stop such that they "stop on unmount-filesystem", rather than "stop on runlevel [016]".

4) Test!

The overall effect is that when /etc/init.d/umountfs emits the unmount-filesystem event, it will block until any Upstart jobs which "stop on" that event have completed. Thus, /etc/init.d/umountfs will wait for the mysql Upstart job to finish before unmounting its filesystems.
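Step 1 can be sketched as a reduced `do_stop`. This is a hypothetical simplification, not the real `/etc/init.d/umountfs`: the `INITCTL` and `UMOUNT` variables exist only so the ordering can be exercised with stubs. It relies on `initctl emit` (without `--no-wait`) not returning until the jobs affected by the event have finished changing state; a job would opt in with a `stop on unmount-filesystem` stanza.

```sh
# Hypothetical, reduced do_stop for /etc/init.d/umountfs (sketch only).
# INITCTL and UMOUNT are indirections so the ordering can be tested.
INITCTL=${INITCTL:-initctl}
UMOUNT=${UMOUNT:-umount}

do_stop() {
    # initctl emit (without --no-wait) returns only once the jobs with
    # "stop on unmount-filesystem" have finished stopping. A failure is
    # deliberately not fatal: shutdown must go on regardless.
    $INITCTL emit unmount-filesystem
    # Only after that are the filesystems passed as arguments unmounted.
    $UMOUNT "$@"
}
```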
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
On Thu, 2010-12-16 at 15:45 +, Michael Biebl wrote:
> 2010/12/16 Clint Byrum :
> >
> >
> > I'm attaching a debdiff that solves the race as far as I can tell,
> > though I think it needs a good long look, since it could mean shutdowns
> > hang for a long time waiting (I'm especially curious if the pre-stop
> > /post-stop's are subject to kill timeout)
>
> This code is still racy, afaics. What about upstart jobs, which are
> not stopped by "stop on runlevel [016]"? They could receive their stop
> signal at a point when your loop has already been run.
>

Indeed, now that I dig through upstart's code a bit, I think there is still a race. If a job in a stop state other than stop/waiting has other jobs that 'stop on stopped' it, those jobs only begin stopping once its stopped event is emitted, and that happens *after* its transition to stop/waiting:

thread A (upstart job foo):
  start/running -> stop/pre-stop
  sends TERM to owned process
  stop/pre-stop -> stop/killed
  process dies
  stop/killed -> stop/waiting
  emit stopped JOB=foo

thread B (upstart job baz):
  start/running -> stop/pre-stop
  sends kill to owned process
  stop/pre-stop -> stop/killed
  process dies
  stop/killed -> stop/waiting

thread C (sleep loop):
  runs initctl list
  greps
  sleeps
  runs initctl list
  greps
  sleeps

initctl list works by first issuing a "get all jobs" command and then an individual status command for each job, so it's entirely possible that we will ask for the status of baz and it will say start/running, then foo finishes its transition, then we ask for foo's status and it is stop/waiting, and we think we're done.

This race would probably be solved by having a "list all jobs with status" command, as long as the stopped event is guaranteed to be consumed before any commands, which I believe it is.

One delicate issue is that if an upstart-managed process dies for any reason other than being stopped, upstart will try to respawn it (for jobs using respawn), so we can't just go sending SIGTERM/SIGKILL to all pids, as upstart will fight us on those. We actually have to stop everything.

> If you don't want to change existing jobs, we probably have to pick up
> Ante's suggestion, and do the following in sendsigs:
>
> 1) run a for loop to wait for *all* running upstart jobs to stop.
> upstart jobs which need to keep running past sendsigs (e.g. plymouth)
> need to signal that using a similar mechanism like the killall5
> sendsigs.d omit interface. I'd at least give upstart jobs 60secs time
> to stop, so big databases etc have enough time to cleanly shutdown

IMO, leaving out a valid "stop on" stanza that gets the job stopped at or before runlevel [016] is the equivalent of the omit interface: you started it, and you said exactly when upstart should or should not stop it. However, if you've wandered into the scenario mentioned above with stop on stopped foo, then we need to handle that.

> 2.) run a for loop and send SIGTERM all remaining processes, but do
> *not* add upstart pids to $OMITPIDS

See above: you'd have to send 'stop' commands to upstart for them, instead of omitting them.

> 3.) send a final SIGKILL if any processes are left.
>

I'd say "let upstart do that", but how do we know when we can continue on to unmounting? I suppose after a lengthy timeout (60s does seem long enough, though mysql can take longer) this makes sense.

>
> Regarding 1.), it would be nice to have a native C implementation in
> upstart, instead of running initctl, grep and sleep manually.
>

I agree, but I'm having trouble envisioning exactly what one would ask for. Maybe "Block until all current goals are reached." would work.
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
** Tags added: patch
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
2010/12/16 Clint Byrum :
>
> /etc/init.d/sendsigs has this code:
>
>
> # Upstart jobs have their own "stop on" clauses that sends
> # SIGTERM/SIGKILL just like this, so if they're still running,
> # they're supposed to be
> for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
> OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
> done
>
>
> It uses this to determine which pids not to kill because, presumably, upstart
> should be managing them.
>
> However, this code is flawed. killall5 will kill the children of all of
> these if they are multi process daemons or scripts running things.

This observation is correct. On the other hand, isn't this exactly what the sendsigs script is for: cleaning up any remaining stray processes which have not been stopped by their corresponding sysv init script or upstart job (or which were e.g. started by the user)? But I guess you are right: we should first stop all upstart jobs, give them time to finish stopping, and then let sendsigs clean up anything remaining afterwards.

> However, this technique can actually be used to determine if there are
> still jobs that are supposed to be stopped, but haven't finished
> stopping yet. Since they should be listed as stop/(pre-stop|post-stop|killed),
> we can determine exactly which pids we expect to go away.
> Since upstart has its own idea of how long to wait before it kills
> these, we should actually wait indefinitely.
>
> I'm attaching a debdiff that solves the race as far as I can tell,
> though I think it needs a good long look, since it could mean shutdowns
> hang for a long time waiting (I'm especially curious if the pre-stop
> /post-stop's are subject to kill timeout)

This code is still racy, AFAICS. What about upstart jobs which are not stopped by "stop on runlevel [016]"? They could receive their stop signal at a point when your loop has already run.

If you don't want to change existing jobs, we probably have to pick up Ante's suggestion and do the following in sendsigs:

1) Run a for loop to wait for *all* running upstart jobs to stop. Upstart jobs which need to keep running past sendsigs (e.g. plymouth) need to signal that using a mechanism similar to the killall5 sendsigs.d omit interface. I'd give upstart jobs at least 60 seconds to stop, so big databases etc. have enough time to shut down cleanly.
2) Run a for loop and send SIGTERM to all remaining processes, but do *not* add upstart pids to $OMITPIDS.
3) Send a final SIGKILL if any processes are left.

Regarding 1), it would be nice to have a native C implementation in upstart, instead of running initctl, grep and sleep manually.
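Step 1 boils down to a poll loop with an upper bound. The sketch below is hypothetical: `wait_for_jobs` takes the timeout (60 seconds in the proposal above) and a command that prints whatever is still pending, for instance a filter over `initctl list` output. Because each poll is a fresh snapshot, it inherits the race discussed in this thread: a job that only starts stopping after a poll has come back empty is missed.

```sh
# Hypothetical sketch of step 1: run the given command once per second
# until it prints nothing (no upstart job still stopping) or TIMEOUT
# seconds have elapsed. Returns 0 if everything stopped, 1 on timeout.
wait_for_jobs() {
    timeout=$1
    shift
    waited=0
    while [ -n "$("$@")" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A call would look like `wait_for_jobs 60 list_stopping_jobs`, where `list_stopping_jobs` stands for whichever `initctl list` filter is chosen (a hypothetical name here).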
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
Also attaching the bash script that I used to test this, which simulates a process taking a long time to exit on SIGTERM, without forking. It *should* work with sleep too, given the sendsigs change I posted, but without that change sendsigs kills the sleeps and ruins all the fun.

Below is the upstart job I used to run it. I tested this on lucid (10.04.1): without the sendsigs change, the script would continue to run right up to the umounts and beyond, despite having been "stopped". With the sendsigs change to wait, the test script would be sent SIGKILL well before the end of the halt.

start on filesystem and net-device-up
stop on runlevel [016]
console output
kill timeout 20
exec /home/clint/test_dies_slowly.bash

** Attachment added: "test_dies_slowly.bash"
https://bugs.launchpad.net/ubuntu/+source/mysql-5.1/+bug/688541/+attachment/1767468/+files/test_dies_slowly.bash
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
** Patch added: "lp688541.diff"
https://bugs.launchpad.net/ubuntu/+source/mysql-5.1/+bug/688541/+attachment/1767453/+files/lp688541.diff

** Changed in: sysvinit (Ubuntu)
Status: New => Triaged
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
So I've done some more thinking about this, and I had a bit of an aha! moment. While we *should* in fact stop using 'stop on runlevel [016]' or 'stop on runlevel [!2345]', I think we can solve this without touching all of those jobs.

/etc/init.d/sendsigs has this code:

# Upstart jobs have their own "stop on" clauses that sends
# SIGTERM/SIGKILL just like this, so if they're still running,
# they're supposed to be
for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
    OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
done

It uses this to determine which pids not to kill because, presumably, upstart should be managing them.

However, this code is flawed. killall5 will kill the children of all of these if they are multi-process daemons or scripts running things. This could only be solved by walking through /proc looking for these as parent pids (and then doing the same again with the new list, and so on).

However, this technique can actually be used to determine whether there are still jobs that are supposed to be stopped but haven't finished stopping yet. Since they should be listed as stop/(pre-stop|post-stop|killed), we can determine exactly which pids we expect to go away. Since upstart has its own idea of how long to wait before it kills these, we should actually wait indefinitely.

I'm attaching a debdiff that solves the race as far as I can tell, though I think it needs a good long look, since it could mean shutdowns hang for a long time waiting (I'm especially curious whether the pre-stop/post-stop processes are subject to kill timeout).
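The filter described here, picking out jobs listed as stop/(pre-stop|post-stop|killed), can be sketched with awk. `stopping_pids` is a hypothetical name; it assumes `initctl list` lines of the form `job goal/state, process PID` (the same shape the sendsigs sed relies on), reads them on stdin so it can be tried without Upstart, and prints only main-process PIDs, not pre-stop/post-stop ones.

```sh
# Hypothetical helper: read "initctl list" output on stdin and print the
# main PID of every job whose goal is stop but which has not reached
# stop/waiting yet, i.e. the processes we still expect to exit.
stopping_pids() {
    awk '/ stop\/(pre-stop|post-stop|killed), process [0-9]+$/ { print $NF }'
}
```

On a live system the input would come from `initctl list | stopping_pids`.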
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
Hmm, I am wondering now if this bug is the same thing. https://bugs.launchpad.net/ubuntu/+source/sysvinit/+bug/616287
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
Great point, thanks for pointing that out! rc-sysinit does not start until filesystem and net-device-up IFACE=lo, and so runlevel 2, which is reached by calling rc-sysinit, implies all of the services you mention. It is important to point out that we must include any of those *implied* to be started up by filesystem or local-filesystems.

Before I go off and throw one together, I wonder if there is a tool that reads through /etc/init/*.conf and would simulate each event and the resulting chaos^H^H^H^H^Hstarted jobs? Such a thing would be massively useful.
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
2010/12/14 Clint Byrum :
>
> I do think the appropriate fix is to have umountfs emit an 'unmounting-filesystems'
> event and anything that does a 'start on local-filesystems'
> or 'start on filesystem' should also 'stop on unmounting-filesystems',

What do you do about services which have "start on runlevel [2345]" and whose binary is in /usr? There are quite a few examples here (acpid, atd, cron, irqbalance, etc.) which all have:

start on runlevel [2345]
stop on runlevel [!2345]

Either those jobs are buggy for not specifying the "start on (local-)filesystems" dependency, or your criterion is not sufficient.

IMHO the major problem here is that there is a mix-up between the dependencies that need to be satisfied to be able to run a job and when (in which runlevels) to start a job.

Michael
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
And, whoops, I just re-read that: it's using killall5's -o to still omit those processes. Please disregard that last message, then.
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
Note that there is one incorrect assumption, which is that sendsigs will never kill any upstart jobs. In fact, it does make one attempt to kill -9 any still-running upstart jobs:

if [ -z "$alldead" ] ; then
    log_action_begin_msg "Killing all remaining processes"
    #report_unkillable
    killall5 -9 $OMITPIDS # SIGKILL
    log_action_end_msg 1

Unfortunately, it doesn't actually wait for this kill -9 to finish, so it's still possible to have running processes there corrupting the system.

I do think the appropriate fix is to have umountfs emit an 'unmounting-filesystems' event, and anything that does a 'start on local-filesystems' or 'start on filesystem' should also 'stop on unmounting-filesystems', causing this to wait for upstart to give up on its jobs (which is nice, as they can have their own well-defined kill timeout). What I don't know yet is whether upstart will check that its SIGKILL actually ended the job, or just report that it sent it and move on.
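Closing the gap pointed out here (nothing waits after `killall5 -9`) could look like the hypothetical sketch below, which polls with `kill -0` (it sends no signal, it only tests whether the PID exists) until the given PIDs are gone or a timeout expires. It assumes it runs as root, since for other users' processes `kill -0` fails with a permission error and the process would wrongly count as gone, and that killed processes get reaped, because a zombie still answers `kill -0`.

```sh
# Hypothetical sketch: after "killall5 -9", wait up to TIMEOUT seconds
# until none of the given PIDs exists any more. "kill -0" delivers no
# signal; it only reports whether the process exists (and may be signalled).
# Returns 0 once all are gone, 1 on timeout.
wait_pids_gone() {
    timeout=$1
    shift
    waited=0
    while :; do
        alive=''
        for pid in "$@"; do
            if kill -0 "$pid" 2>/dev/null; then
                alive="$alive $pid"
            fi
        done
        if [ -z "$alive" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```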
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
James, could you take a look at this?
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
Ante, good eyes there. That comment is a little misleading ("if they're still running, they're supposed to be"), as it assumes there was an event somewhere between the running system and runlevel [016], which, to my knowledge, there isn't.

I'm a little confused as to why umountfs is still running as part of rc, and not in an upstart job. I'm actually pretty sure that this is related to bug #616287, which I originally thought was mountall's fault, but now it seems is in fact sysvinit's.

In any case, one solution could be to have umountfs emit 'unmounting-filesystems' before it starts, and then change 'stop on runlevel [016]' to 'stop on unmounting-filesystems'. If I understand initctl correctly, it will wait for all of the triggered stops to complete before continuing.

I also think that we should look at abstracting the events a bit more for generic services, so job writers don't have to become boot experts to know when to start on / stop on.

Adding sysvinit task as well.

** Also affects: sysvinit (Ubuntu)
Importance: Undecided
Status: New
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
2010/12/10 Ante Karamatić :
> Suggestion: make umountfs wait for all upstart jobs to finish.

Doesn't that conflict, though, with what is written in /etc/init.d/sendsigs:

# Upstart jobs have their own "stop on" clauses that sends
# SIGTERM/SIGKILL just like this, so if they're still running,
# they're supposed to be
for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
    OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
done

or

# did an upstart job start since we last polled initctl? check
# again on each loop and add any new jobs (e.g., plymouth) to
# the list. If we did miss one starting up, this beats waiting
# 10 seconds before shutting down.
for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
    OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
done
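To see exactly what the quoted loop passes to killall5, here is the same loop wrapped in a function that reads `initctl list` output from stdin instead of calling `initctl` (the wrapper and its name are illustrative, not part of sendsigs). Every number after the word `process` becomes a `-o` omit option, which is how sendsigs spares Upstart-managed processes, though not their children.

```sh
# Illustrative wrapper (hypothetical name) around the sendsigs loop:
# reads "initctl list" output on stdin and prints the resulting
# killall5 omit options, e.g. "-o 1234 -o 890".
build_omitpids() {
    OMITPIDS=''
    for pid in $(sed -n -e "/process [0-9]/s/.*process //p"); do
        OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
    done
    printf '%s\n' "$OMITPIDS"
}
```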
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
Suggestion: make umountfs wait for all upstart jobs to finish.
[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
What would be the general approach to express "shut down on runlevel 0/1/6 before the disks go away" in terms of upstart triggers? Once there's an approach, please hand over to canonical-server. Thanks!

** Changed in: mysql-5.1 (Ubuntu)
Status: New => Triaged

** Changed in: mysql-5.1 (Ubuntu)
Assignee: (unassigned) => Canonical Foundations Team (canonical-foundations)

** Changed in: mysql-5.1 (Ubuntu)
Importance: Undecided => High