Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-23 Thread Clint Byrum

On Thu, 2010-12-23 at 22:07 +, ingo wrote:
 I took you literally and changed all [!2345], not the others:
 
 The remaining now are:
 
 fgrep "stop on runlevel" /etc/init/*.conf
 /etc/init/rc.conf:stop on runlevel [!$RUNLEVEL]
 /etc/init/rcS.conf:stop on runlevel [!S]
 /etc/init/rc-sysinit.conf:stop on runlevel
 /etc/init/tty2.conf:stop on runlevel [!23]
 /etc/init/tty3.conf:stop on runlevel [!23]
 /etc/init/tty4.conf:stop on runlevel [!23]
 /etc/init/tty5.conf:stop on runlevel [!23]
 /etc/init/tty6.conf:stop on runlevel [!23]
 /etc/init/ufw.conf:stop on runlevel [!023456]
 
 I still get the orphaned inodes. Shall I also convert the tty's?
 

You can, but I doubt they're the problem.

Can you paste the output of

lsof -n |grep deleted

After the reinstall?

Thanks.

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to mysql-5.1 in ubuntu.
https://bugs.launchpad.net/bugs/688541

Title:
  race condition on shutdown (leads to corrupted fs)

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs

Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-23 Thread Clint Byrum

On Thu, 2010-12-23 at 12:40 +, ingo wrote:
 @Clint:
 
 I did test your proposal in Maverick.
 
 Before editing the stop scripts:
 
 fgrep "stop on runlevel" /etc/init/*.conf
 /etc/init/acpid.conf:stop on runlevel [!2345]
 /etc/init/anacron.conf:stop on runlevel [!2345]
 /etc/init/apport.conf:stop on runlevel [!2345]
 /etc/init/atd.conf:stop on runlevel [!2345]
 /etc/init/cron.conf:stop on runlevel [!2345]
 /etc/init/cups.conf:stop on runlevel [016]
 /etc/init/dbus.conf:stop on runlevel [06]
 /etc/init/failsafe-x.conf:stop on runlevel [06]
 /etc/init/gdm.conf:stop on runlevel [016]
 /etc/init/irqbalance.conf:stop on runlevel [!2345]
 /etc/init/mountall-shell.conf:stop on runlevel [06]
 /etc/init/rc.conf:stop on runlevel [!$RUNLEVEL]
 /etc/init/rcS.conf:stop on runlevel [!S]
 /etc/init/rc-sysinit.conf:stop on runlevel
 /etc/init/rsyslog.conf:stop on runlevel [06]
 /etc/init/tty1.conf:stop on runlevel [!2345]
 /etc/init/tty2.conf:stop on runlevel [!23]
 /etc/init/tty3.conf:stop on runlevel [!23]
 /etc/init/tty4.conf:stop on runlevel [!23]
 /etc/init/tty5.conf:stop on runlevel [!23]
 /etc/init/tty6.conf:stop on runlevel [!23]
 /etc/init/udev.conf:stop on runlevel [06]
 /etc/init/ufw.conf:stop on runlevel [!023456]
 
 After editing the stop scripts:
 
 fgrep "stop on starting" /etc/init/*.conf
 /etc/init/cups.conf:stop on starting rc RUNLEVEL=[016]
 /etc/init/dbus.conf:stop on starting rc RUNLEVEL=[06]
 /etc/init/failsafe-x.conf:stop on starting rc RUNLEVEL=[06]
 /etc/init/gdm.conf:stop on starting rc RUNLEVEL=[016]
 /etc/init/mountall.conf:stop on starting rcS
 /etc/init/mountall-shell.conf:stop on starting rc RUNLEVEL=[06]
 /etc/init/rsyslog.conf:stop on starting rc RUNLEVEL=[06]
 /etc/init/udev.conf:stop on starting rc RUNLEVEL=[06]
 
 Then execute
 
 apt-get install --reinstall libc6
 and reboot:
 
 I still get the 8 orphaned inodes as reported already.
 
 Did I also need to change the other scripts, like this?
  stop on runlevel [!2345]  ->  stop on starting rc RUNLEVEL=[016]
 

Yes, and some of those are probably the most likely to have libc open.
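
The rewrite ingo describes can be applied mechanically. What follows is a minimal sketch, assuming each job file carries the stanza exactly as "stop on runlevel [!2345]" on a line of its own. The function name convert_stop_stanza is hypothetical, and the filter only reads stdin and writes stdout, so nothing under /etc/init is modified.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the stop stanza discussed in this thread.
# Reads an Upstart job file on stdin and writes the converted text to stdout.
convert_stop_stanza() {
    sed -e 's/^stop on runlevel \[!2345]$/stop on starting rc RUNLEVEL=[016]/'
}

# Example: preview the change for one job without modifying it.
# convert_stop_stanza < /etc/init/cron.conf | diff -u /etc/init/cron.conf -
```

Stanzas with other runlevel sets (such as [06] or [!23]) pass through unchanged, so they would still have to be reviewed by hand.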

If doing the same to all of the !2345's does not fix the corruption, can
you do:

apt-get install --reinstall libc6
lsof -n |grep deleted
initctl list

And paste or upload the output of that here?

 -- 
 You received this bug notification because you are a direct subscriber
 of the bug.
 https://bugs.launchpad.net/bugs/688541
 
 Title:
   race condition on shutdown (leads to corrupted fs)
 
 Status in “mysql-5.1” package in Ubuntu:
   Triaged
 Status in “sysvinit” package in Ubuntu:
   Triaged
 
 Bug description:
   I'm using mysql-server-5.1 on a 10.04 LTS installation.
 The mysql db is around 27GB and on a separate partition mounted as 
 /var/lib/mysql.
 
 On shutdown I get the following error message:
 
 Checking for running unattended-upgrades:
  * Asking all remaining processes to terminate...   [ OK ]
  * All processes ended within 1 seconds   [ OK ]
  * Deconfiguring network interfaces...   [ OK ]
  * Deactivating swap...   [ OK ]
  * Unmounting local filesystems...
 umount2: Device or resource busy
 umount: /var/lib/mysql: device is busy.
 (In some cases useful info about processes that use
  the device is found by lsof(8) or fuser(1))
 umount2: Device or resource busy
 umount2: Device or resource busy
 umount: /tmp: device is busy.
 (In some cases useful info about processes that use
  the device is found by lsof(8) or fuser(1))
 umount2: Device or resource busy
 [fail]
 mount: / is busy
  * Will now restart
 [ 3369.429751] Restarting system.
 
 
 On the next reboot the file system is corrupt and needs to be fsck-ed.
 
 I think the problem is that mysql uses an upstart job (/etc/init/mysql.conf) 
 and has
 stop on runlevel [016]
 
 The rc.conf job is also triggered on runlevel 0 and 6, so they basically run 
 at the same time.
 
 When /etc/rc0.d/S20sendsigs is run, it deliberately does not wait for or kill 
 any upstart jobs.
 
 As my mysqld process takes some time to shut down, S40umountfs and 
 S60umountroot are run before mysqld has quit.
 
 This leads to the fs not being properly unmounted. It is even possible that 
 mysqld is forcefully killed by halt in S90halt if it hasn't stopped by then.
 
 This is a serious issue, as it can (and will) lead to data loss.
 
 Other upstart jobs, like rsyslog.conf, use the same stop on runlevel [016] 
 stanza, so they are probably affected too.
 
 ProblemType: Bug
 DistroRelease: Ubuntu 10.10
 Package: mysql-server-5.1 5.1.49-1ubuntu8.1
 Uname: Linux 2.6.32-5-686 i686
 NonfreeKernelModules: michael_mic arc4 ecb lib80211_crypt_tkip aes_i586 
 aes_generic lib80211_crypt_ccmp sco bnep rfcomm l2cap binfmt_misc 
 acpi_cpufreq ppdev lp cpufreq_userspace cpufreq_stats vboxnetadp 
 cpufreq_powersave vboxnetflt cpufreq_conservative vboxdrv fuse pcmcia 
 snd_intel8x0m snd_intel8x0

Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-21 Thread Clint Byrum

On Tue, 2010-12-21 at 12:41 +, Scott James Remnant wrote:
 On 20/12/10 18:22, Clint Byrum wrote:
  In a message to ubuntu-devel I suggested that we have an abstract job,
  'network-services', which most normal (non boot-critical) services
  should follow.
 
  https://lists.ubuntu.com/archives/ubuntu-devel/2010-December/032254.html
 
 General note:  ubuntu-devel is *NOT* the correct list to discuss Upstart 
 changes unless they're unique to Ubuntu.
 

Thanks, Scott

In this case, I don't know if this would be unique to Ubuntu or not. I
am not suggesting a code change in upstart with that message, but rather
a change in the way upstart is used and packaged in Ubuntu. Though, it
would be rather nice if everybody used upstart the same way.

Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-21 Thread Michael Biebl

2010/12/21 ingo 688...@bugs.launchpad.net:
 On Tue, 2010-12-21 at 12:41 +, Scott James Remnant wrote:
 General note: ubuntu-devel is *NOT* the correct list to discuss Upstart
 changes unless they're unique to Ubuntu.

 Wouldn't it be fair to inform Debian about those problems before they release 
 Squeeze?
 (though I never observed it on Squeeze until now)

This doesn't affect Debian as the upstart package in Debian still uses
plain sysv compat and there are no native upstart jobs yet.

Michael
(upstart maintainer in Debian)

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/688541

Title:
  race condition on shutdown (leads to corrupted fs)

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-20 Thread Michael Biebl

2010/12/20 James Hunt 688...@bugs.launchpad.net:

 3) Modify all upstart configs for services which are slow to stop such that 
 they stop on unmount-filesystem,
    rather than stop on runlevel [016].

- What about single user mode? I guess when switching to runlevel 1 we
want to stop services like mysql?
- How do you decide if a service is 'slow to stop'? IMHO that
highly depends on the given hardware, local configuration and the
amount of data you are dealing with. A general approach would be
preferable.

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-20 Thread Clint Byrum

On Mon, 2010-12-20 at 12:50 +, James Hunt wrote:
 After discussion with Scott, the best short-term solution would seem to
 be:
 
 1) Modify /etc/init.d/umountfs to call the following in do_stop before
 calling umount/swapoff:
 
  initctl emit unmount-filesystem
 
 2) Modify /etc/init.d/umountroot to call the following in do_stop before
 calling umount:
 
  initctl emit unmount-root-filesystem
 
 
 3) Modify all upstart configs for services which are slow to stop such that 
 they stop on unmount-filesystem,
 rather than stop on runlevel [016].
 
 4) Test!
 
 The overall effect of this being that when /etc/init.d/umountfs emits
 the unmount-filesystem event, it will block until any Upstart jobs which
 stop on those events have completed. Thus, /etc/init.d/umountfs will
 wait for the mysql Upstart job to finish before unmounting its
 filesystems.
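
Step 1 of the plan above can be sketched as follows. This is only an illustration under stated assumptions: do_stop_sketch is a hypothetical stand-in for the real do_stop in /etc/init.d/umountfs, and the INITCTL variable is introduced here so the sketch can be tried without Upstart. The event name is the one James proposes.

```shell
#!/bin/sh
# INITCTL is overridable so the sketch can be exercised without Upstart.
INITCTL=${INITCTL:-initctl}

# Hypothetical excerpt of a do_stop function in /etc/init.d/umountfs.
do_stop_sketch() {
    # "initctl emit" waits for the event to be handled unless --no-wait
    # is given, so jobs that "stop on unmount-filesystem" get to finish first.
    "$INITCTL" emit unmount-filesystem ||
        echo "umountfs: could not emit unmount-filesystem" >&2

    # The real script's swapoff/umount calls would follow here.
    echo "unmounting"
}
```

The blocking behaviour of the emit call is what the proposal relies on: the unmount step is not reached until every job stopping on that event has finished.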


Not much happens between rc-sysinit starting and sendsigs/umountfs. Is
slow even 1 second between SIGTERM and exiting? Shouldn't we just make
sure everything that is 'stop on runlevel [!2345]' or 'stop on runlevel
[016]' stops before we umount? bug #672177 may very well be caused
simply by killing the last service that had the deleted libc.so.6 open,
causing the fs to need to finish the deletion right then, which could be
waiting on a sync and many other files being flushed/etc. on a busy
rotational disk. This will cause something very tiny to take a second to
die.

I think we must transition *everything* that stops on runlevel [016] to
'stop on unmounting-filesystems', or get clever and find a way to wait
until upstart is done stopping everything it already wants to stop. I do
think that initctl list is flawed for this task, but it might be the
best chance at catching stragglers that we have.

In a message to ubuntu-devel I suggested that we have an abstract job,
'network-services', which most normal (non boot-critical) services
should follow.

https://lists.ubuntu.com/archives/ubuntu-devel/2010-December/032254.html

By taking this approach, we can at least amend this fix if it has
unintended consequences.

There's also still the issue (which probably should be its own bug
report) that sendsigs will kill the children of already stopping jobs,
which it shouldn't do, and which it would still do in the suggested fix
since sendsigs runs before umountfs.

Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-16 Thread Michael Biebl

2010/12/16 Clint Byrum cl...@fewbar.com:

 /etc/init.d/sendsigs has this code:


        # Upstart jobs have their own "stop on" clauses that sends
        # SIGTERM/SIGKILL just like this, so if they're still running,
        # they're supposed to be
        for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
                OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
        done


 It uses this to determine which pids not to kill because, presumably, upstart 
 should be managing them.

 However, this code is flawed. killall5 will kill the children of all of
 these if they are multi process daemons or scripts running things.
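
To make the limitation concrete, here is a small sketch of what that loop produces. It assumes "initctl list" output of the usual form ("mysql start/running, process 1234"); the function names upstart_pids and build_omitpids are hypothetical, and the text is read from stdin so it can be run without Upstart.

```shell
#!/bin/sh
# Same sed expression as in sendsigs: keep only the PID that follows "process".
upstart_pids() {
    sed -n -e '/process [0-9]/s/.*process //p'
}

# Build the killall5 omit list from "initctl list"-style text on stdin.
build_omitpids() {
    OMITPIDS=""
    for pid in $(upstart_pids); do
        OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
    done
    echo "$OMITPIDS"
}
```

Only the PIDs that Upstart itself reports end up in the list. Any process forked by one of them is absent, which is why killall5 would still signal the children.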

This observation is correct. On the other hand, isn't this exactly
what the sendsigs script is for: clean up any remaining, stray
processes  which have not been stopped by its corresponding sysv init
script or upstart job (or have been e.g. started by the user)?

But I guess you are right, we should first stop all upstart jobs, give
them time to finish stopping, and then let sendsigs clean up anything
remaining afterwards.

 However, this technique can actually be used to determine if there are
 still jobs that are supposed to be stopped, but haven't finished
 stopping yet. Since they should be listed as stop/(pre-stop|post-
 stop|killed), we can determine exactly which pids we expect to go away.
 Since upstart has its own idea of how long to wait before it kills
 these, we should actually wait indefinitely.

 I'm attaching a debdiff that solves the race as far as I can tell,
 though I think it needs a good long look, since it could mean shutdowns
 hang for a long time waiting (I'm especially curious if the pre-stop
 /post-stop's are subject to kill timeout)

This code is still racy, afaics. What about upstart jobs, which are
not stopped by stop on runlevel [016]? They could receive their stop
signal at a point when your loop has already been run.

If you don't want to change existing jobs, we probably have to pick up
Ante's suggestion, and do the following in sendsigs:

1) run a for loop to wait for *all* running upstart jobs to stop.
upstart jobs which need to keep running past sendsigs (e.g. plymouth)
need to signal that using a mechanism similar to the killall5
sendsigs.d omit interface. I'd give upstart jobs at least 60 secs
to stop, so big databases etc. have enough time to shut down cleanly
2) run a for loop and send SIGTERM to all remaining processes, but do
*not* add upstart pids to $OMITPIDS
3) send a final SIGKILL if any processes are left.


Regarding 1), it would be nice to have a native C implementation in
upstart, instead of running initctl, grep and sleep manually.
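
For reference, the manual initctl/grep/sleep variant of step 1 might look like the sketch below. It is not the attached debdiff. LIST_CMD is a hypothetical hook added so the loop can be tried without Upstart, and the state names assume the "goal/state" form printed by "initctl list". A polling loop like this cannot see jobs that receive their stop request after it returns, which is the race raised earlier in this message.

```shell
#!/bin/sh
# LIST_CMD is a hypothetical hook; by default it runs the real "initctl list".
LIST_CMD=${LIST_CMD:-initctl list}

# Print lines for jobs whose goal is "stop" but which have not yet
# reached stop/waiting (for example stop/pre-stop or stop/killed).
stopping_jobs() {
    grep ' stop/' | grep -v ' stop/waiting'
}

# Poll once per second for up to $1 seconds.
# Returns 0 once no job is mid-stop, 1 if the timeout expires first.
wait_for_stopping_jobs() {
    remaining=$1
    while [ "$remaining" -gt 0 ]; do
        if [ -z "$($LIST_CMD | stopping_jobs)" ]; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}
```

A call such as "wait_for_stopping_jobs 60" would correspond to the 60 second allowance suggested in step 1.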


-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-16 Thread Clint Byrum

On Thu, 2010-12-16 at 15:45 +, Michael Biebl wrote:
 2010/12/16 Clint Byrum cl...@fewbar.com:
 
  
  I'm attaching a debdiff that solves the race as far as I can tell,
  though I think it needs a good long look, since it could mean shutdowns
  hang for a long time waiting (I'm especially curious if the pre-stop
  /post-stop's are subject to kill timeout)
 
 This code is still racy, afaics. What about upstart jobs, which are
 not stopped by stop on runlevel [016]? They could receive their stop
 signal at a point when your loop has already been run.
 

Indeed, there is still a race I think now that I dig through upstart's
code a bit. If any of the jobs in the stop/!waiting state have 'stop on
stopped' jobs that will be stopped after they stop, the event isn't
emitted until *after* the transition to stop/waiting.

thread A (upstart job foo):

start/running -> stop/pre-stop
sends TERM to owned process
stop/pre-stop -> stop/killed
process dies
stop/killed -> stop/waiting
emit stopped JOB=foo

thread B (upstart job baz):

start/running -> stop/pre-stop
sends kill to owned process
stop/pre-stop -> stop/killed
process dies
stop/killed -> stop/waiting

thread C (sleep loop):

runs initctl list
greps
sleeps
runs initctl list
greps
sleeps

list is handled by doing a get all jobs command first, and then
individual status commands for each job, so it's entirely possible that
we will ask for the status of baz and it will say start/running, and
then foo finishes its transition, then we ask for foo's status and it is
stop/waiting, and we think we're done.

This race would probably be solved by having a single 'list all jobs
with status' command, as long as the stopped event is guaranteed to be
consumed before any commands, which, I believe it will.
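
The difference between per-job queries and a single snapshot can be shown with a toy model. Everything in it is hypothetical: the two-step timeline is made up to match the foo/baz description, and it assumes the stopped event has already been consumed by time 1.

```shell
#!/bin/sh
# Hypothetical timeline taken from the thread A / thread B description:
# at time 0 foo is being killed and baz still runs; at time 1 foo has
# reached stop/waiting and its "stopped" event has begun stopping baz.
state_at() {
    case "$1:$2" in
        0:foo) echo "stop/killed" ;;
        0:baz) echo "start/running" ;;
        1:foo) echo "stop/waiting" ;;
        1:baz) echo "stop/pre-stop" ;;
    esac
}

# True when a state means "goal is stop, but not finished yet".
is_stopping() {
    case "$1" in
        stop/waiting) return 1 ;;
        stop/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Per-job queries: baz is asked at time 0, foo at time 1.
per_job_sees_stopping() {
    is_stopping "$(state_at 0 baz)" || is_stopping "$(state_at 1 foo)"
}

# Single snapshot: both jobs are read at the same time $1.
snapshot_sees_stopping() {
    is_stopping "$(state_at "$1" baz)" || is_stopping "$(state_at "$1" foo)"
}
```

In this model the per-job variant reports nothing left to wait for, although baz is in the middle of stopping, while a snapshot taken at either time still sees a job with unfinished work.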

One delicate issue is that if an upstart managed process dies for any
other reason than being stopped, upstart will try to respawn it, so we
can't just go sending SIGTERM/SIGKILL to all pids, as upstart will fight
us on those. We actually have to stop everything.

 If you don't want to change existing jobs, we probably have to pick up
 Ante's suggestion, and do the following in sendsigs:
 
 1) run a for loop to wait for *all* running upstart jobs to stop.
 upstart jobs which need to keep running past sendsigs (e.g. plymouth)
 need to signal that using a similar mechanism like the killall5
 sendsigs.d omit interface. I'd at least give upstart jobs 60secs time
 to stop, so big databases etc have enough time to cleanly shutdown

IMO, leaving out a valid 'stop on' that gets it stopped at or before
runlevel [016] is the equivalent of the omit interface. You've started
it, saying exactly when upstart should or should not stop it. However,
if you've wandered into the scenario mentioned above with stop on
stopped foo, then we need to handle that.

 2.) run a for loop and send SIGTERM all remaining processes, but do
 *not* add upstart pids to $OMITPIDS

See above, you'd have to send 'stop' commands to upstart for them,
instead of omitting them.
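
Asking Upstart to stop the jobs, rather than signalling their PIDs, could be sketched like this. The function names and the INITCTL variable are hypothetical. Instance jobs (lines such as "name (instance) start/running") are deliberately skipped for simplicity, and a real version would also need to leave out jobs that must outlive sendsigs, such as plymouth.

```shell
#!/bin/sh
# INITCTL is overridable so the sketch can be exercised without Upstart.
INITCTL=${INITCTL:-initctl}

# Print the names of non-instance jobs reported as start/running.
running_job_names() {
    awk '$2 == "start/running," || $2 == "start/running" { print $1 }'
}

# Ask Upstart to stop each running job so that it changes the job's goal
# and does not try to respawn the process.
stop_running_jobs() {
    "$INITCTL" list | running_job_names | while read -r job; do
        "$INITCTL" stop "$job" < /dev/null ||
            echo "could not stop $job" >&2
    done
}
```

Because the stop request goes through Upstart, the job's goal changes to stop, which avoids the respawn fight described above.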

 3.) send a final SIGKILL if any processes are left.
 

I'd say let upstart do that.. but how do we know when we can continue
on to unmounting? I suppose after a lengthy timeout (60s does seem long
enough, though mysql can take longer) this makes sense.

 
 Regarding 1.), it would be nice to have a native C implementation in
 upstart, instead of running initctl, grep and sleep manually.
 

I agree, but I'm having trouble envisioning exactly what one would ask
for. 'Block until all current goals are reached' would work, maybe.

Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-16 Thread Michael Biebl

2010/12/16 Clint Byrum cl...@fewbar.com:

 /etc/init.d/sendsigs has this code:


        # Upstart jobs have their own stop on clauses that sends
        # SIGTERM/SIGKILL just like this, so if they're still running,
        # they're supposed to be
        for pid in $(initctl list | sed -n -e /process [0-9]/s/.*process 
 //p); do
                OMITPIDS=${OMITPIDS:+$OMITPIDS }-o $pid
        done


 It uses this to determine which pids not to kill because, presumably, upstart 
 should be managing them.

 However, this code is flawed. killall5 will kill the children of all of
 these if they are multi process daemons or scripts running things.

This observation is correct. On the other hand, isn't this exactly
what the sendsigs script is for: clean up any remaining, stray
processes  which have not been stopped by its corresponding sysv init
script or upstart job (or have been e.g. started by the user)?

But I guess you are right, we should first stop all upstart jobs, give
them time to finish stopping, and then let sendsigs clean up anything
remaining afterwards.

 However, this technique can actually be used to determine if there are
 still jobs that are supposed to be stopped, but haven't finished
 stopping yet. Since they should be listed as stop/(pre-stop|post-
 stop|killed), we can determine exactly which pids we expect to go away.
 Since upstart has its own idea of how long to wait before it kills
 these, we should actually wait indefinitely.

 I'm attaching a debdiff that solves the race as far as I can tell,
 though I think it needs a good long look, since it could mean shutdowns
 hang for a long time waiting (I'm especially curious if the pre-stop
 /post-stop's are subject to kill timeout)

This code is still racy, afaics. What about upstart jobs, which are
not stopped by stop on runlevel [016]? They could receive their stop
signal at a point when your loop has already been run.

If you don't want to change existing jobs, we probably have to pick up
Ante's suggestion, and do the following in sendsigs:

1) run a for loop to wait for *all* running upstart jobs to stop.
upstart jobs which need to keep running past sendsigs (e.g. plymouth)
need to signal that using a similar mechanism like the killall5
sendsigs.d omit interface. I'd at least give upstart jobs 60secs time
to stop, so big databases etc have enough time to cleanly shutdown
2.) run a for loop and send SIGTERM all remaining processes, but do
*not* add upstart pids to $OMITPIDS
3.) send a final SIGKILL if any processes are left.


Regarding 1.), it would be nice to have a native C implementation in
upstart, instead of running initctl, grep and sleep manually.


-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/688541

Title:
  race condition on shutdown (leads to corrupted fs)

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-16 Thread Clint Byrum

On Thu, 2010-12-16 at 15:45 +, Michael Biebl wrote:
 2010/12/16 Clint Byrum cl...@fewbar.com:
 
  
  I'm attaching a debdiff that solves the race as far as I can tell,
  though I think it needs a good long look, since it could mean shutdowns
  hang for a long time waiting (I'm especially curious if the pre-stop
  /post-stop's are subject to kill timeout)
 
 This code is still racy, afaics. What about upstart jobs, which are
 not stopped by stop on runlevel [016]? They could receive their stop
 signal at a point when your loop has already been run.
 

Indeed, there is still a race I think now that I dig through upstart's
code a bit. If any of the jobs in the stop/!waiting state have 'stop on
stopped' jobs that will be stopped after they stop, the event isn't
emitted until *after* the transition to stop/waiting.

thread A (upstart job foo):

start/running - stop/pre-stop
sends TERM to owned process
stop/pre-stop - stop/killed
process dies
stop/killed - stop/waiting
emit stopped JOB=foo

thread B (upstart job baz)
start/running - stop/pre-stop
sends kill to owned process
stop/pre-stop - stop/killed
process dies
stop/killed - stop/waiting

thread C (sleep loop)

runs initctl list
greps
sleeps
runs initctl list
greps
sleeps

list is handled by doing a get all jobs command first, and then
individual status commands for each job, so its entirely possible that
we will ask for the status of baz and it will say start/running, and
then foo finishes its transition, then we ask for foo's status and it is
stop/waiting, and we think we're done.

This race would probably be solved by having a list all jobs with
status command, as long as the stopped event is guaranteed to be
consumed before any commands, which, I believe it will.

One delicate issue is that if an upstart managed process dies for any
other reason than being stopped, upstart will try to respawn it, so we
can't just go sending SIGTERM/SIGKILL to all pids, as upstart will fight
us on those. We actually have to stop everything.

 If you don't want to change existing jobs, we probably have to pick up
 Ante's suggestion, and do the following in sendsigs:
 
 1) run a for loop to wait for *all* running upstart jobs to stop.
 upstart jobs which need to keep running past sendsigs (e.g. plymouth)
 need to signal that using a similar mechanism like the killall5
 sendsigs.d omit interface. I'd at least give upstart jobs 60secs time
 to stop, so big databases etc have enough time to cleanly shutdown

IMO, leaving out a valid 'stop on' that gets it stopped at or before
runlevel [016] is the equivalent of the omit interface. You've started
it, saying exactly when upstart should or should not stop it. However,
if you've wandered into the scenario mentioned above with 'stop on
stopped foo', then we need to handle that.

 2.) run a for loop and send SIGTERM all remaining processes, but do
 *not* add upstart pids to $OMITPIDS

See above, you'd have to send 'stop' commands to upstart for them,
instead of omitting them.
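
A rough sketch of what sending 'stop' commands could look like follows.
It is only an illustration under assumptions: LIST_CMD, STOP_CMD and
KEEP_JOBS are hypothetical names, the keep list ("rc plymouth") is a
guess at jobs that must survive this step, and jobs with multiple
instances would need their instance variables passed as well.

```shell
#!/bin/sh
# Hypothetical hooks so the sketch can be tried without upstart.
LIST_CMD=${LIST_CMD:-"initctl list"}
STOP_CMD=${STOP_CMD:-"initctl stop"}
# Hypothetical list of jobs that must keep running past this point.
KEEP_JOBS=${KEEP_JOBS:-"rc plymouth"}

# Ask upstart to stop every job whose goal is still start, instead of
# signalling its pids directly (upstart could respawn those).
stop_remaining_jobs() {
    $LIST_CMD | awk '/ start\// {print $1}' | sort -u |
    while read -r job; do
        case " $KEEP_JOBS " in
            *" $job "*) continue ;;   # skip jobs on the keep list
        esac
        # Keep going if one job cannot be stopped; report it on stderr.
        $STOP_CMD "$job" </dev/null || echo "could not stop $job" >&2
    done
}
```

By default 'initctl stop' waits for the job to finish stopping, so the
loop handles one job at a time. The final SIGKILL for anything left over
is still up to upstart's own kill timeout.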

 3.) send a final SIGKILL if any processes are left.
 

I'd say let upstart do that, but how do we know when we can continue
on to unmounting? I suppose this makes sense after a lengthy timeout
(60s does seem long enough, though mysql can take longer).

 
 Regarding 1.), it would be nice to have a native C implementation in
 upstart, instead of running initctl, grep and sleep manually.
 

I agree, but I'm having trouble envisioning exactly what one would ask
for. 'Block until all current goals are reached' would work, maybe.


Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-13 Thread Michael Biebl

2010/12/14 Clint Byrum cl...@fewbar.com:

 I do think the appropriate fix is to have umountfs emit an 'unmounting-
 filesystems' event and anything that does a 'start on local-filesystems'
 or 'start on filesystem' should also 'stop on unmounting-filesystems',

What do you do about services which have
'start on runlevel [2345]' and whose binary is in /usr?

There are quite a few examples here: acpid, atd, cron, irqbalance, etc.,
which all have:

start on runlevel [2345]
stop on runlevel [!2345]

Either those jobs are buggy for not specifying the 'start on
(local-)filesystems' dependency, or your criterion is not sufficient.

IMHO the major problem here is that there is a mix-up between the
dependencies that need to be satisfied to be able to run a job and
when (in which runlevels) to start a job.

Michael


-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?


Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-10 Thread Michael Biebl

2010/12/10 Ante Karamatić iv...@grad.hr:
 Suggestion: make umountfs wait for all upstart jobs to finish.

Doesn't that conflict though with what is written in
/etc/init.d/sendsigs:

# Upstart jobs have their own "stop on" clauses that sends
# SIGTERM/SIGKILL just like this, so if they're still running,
# they're supposed to be
for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
    OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
done

or

# did an upstart job start since we last polled initctl? check
# again on each loop and add any new jobs (e.g., plymouth) to
# the list.  If we did miss one starting up, this beats waiting
# 10 seconds before shutting down.
for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
    OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
done


-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

