ShutdownWaitLength vs. 'restart' in init scripts

2009-06-22 Thread Bill McGonigle
Hi folks,

I noticed a problem with the init script I have with the tor package on
Fedora 10.  The 'restart' command (just a start and stop) sends a -INT
to the running process, but doesn't account for ShutdownWaitLength.  It
looks like the old server instance unbinds, so the new one can start up,
but then when the old one is really ready to die, it takes them all out,
which leaves an inconsistent lockfile state and no tor running.  I first
noticed this on a version upgrade (which runs a 'restart').

Looks like this:
  Jun 22 22:09:39.260 [notice] Performing bandwidth self-test...done.


  Jun 22 22:09:57.620 [notice] Clean shutdown

So, I'm curious what other folks are doing to handle this.  I'm thinking
in order of preference:

 * wait for the pid file to disappear
 * extract ShutdownWaitLength from the config and wait that long
 * send a double -INT on stop()
 * wait 30 seconds

or, perhaps even better: fixing the server shutdown process so the old
server can't take out the new server.
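
A minimal sketch of the first option above (waiting for the pid file to
disappear), assuming tor removes its pid file on a clean exit. The function
name wait_for_pidfile_gone, the example path, and the 60-second cap are
illustrative assumptions, not taken from the Fedora init script.

```shell
#!/bin/sh
# Hypothetical helper: poll once per second until the pid file is gone,
# giving up after a timeout. Returns 0 if the file disappeared, 1 otherwise.
wait_for_pidfile_gone() {
    pidfile=$1
    timeout=${2:-60}          # assumed cap, in seconds
    waited=0
    while [ -e "$pidfile" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1          # file still present after the timeout
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A stop() routine could call it as
"wait_for_pidfile_gone /var/run/tor/tor.pid 60" (the path is a guess) and
refuse to start a new instance when it returns non-zero.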

But today is the first time I've ever run a tor relay, and I don't know
the codebase or what I don't know, so pointers appreciated from those
who may have already figured this out.

Thanks,
-Bill

-- 
Bill McGonigle, Owner           Work: 603.448.4440
BFC Computing, LLC              Home: 603.448.1668
http://www.bfccomputing.com/    Cell: 603.252.2606
Twitter, etc.: bill_mcgonigle   Page: 603.442.1833
Email, IM, VOIP: b...@bfccomputing.com
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf


Re: ShutdownWaitLength vs. 'restart' in init scripts

2009-06-23 Thread m
If the package came from Fedora's repository, they may use their own
init scripts. Report it to Fedora's bug tracker.

M




Re: ShutdownWaitLength vs. 'restart' in init scripts

2009-06-25 Thread Roger Dingledine
On Tue, Jun 23, 2009 at 12:47:56AM -0400, Bill McGonigle wrote:
> I noticed a problem with the init script I have with the tor package on
> Fedora 10.

Is this with the tor rpm shipped by Fedora? We don't support (or use,
or like, or recommend) the fedora tor rpm.

>  The 'restart' command (just a start and stop) sends a -INT
> to the running process, but doesn't account for ShutdownWaitLength.

Check out how the torctl that we ship in the rpms handles that:
https://git.torproject.org/checkout/tor/master/contrib/torctl.in
It basically kills it, and then tries to kill it even harder if it
doesn't die. That's probably not the right answer.

The right answer imo is how the deb package does it:
https://git.torproject.org/checkout/tor/master/debian/tor.init
Check out the wait_for_deaddaemon function: it basically checks each
second whether the process is still around, and returns when it's gone
(or 60 seconds have passed).
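
The polling approach described here can be sketched roughly as follows.
This is not the Debian wait_for_deaddaemon code; the name wait_for_exit and
its arguments are made up for illustration, and only the "check each second,
give up after 60 seconds" behaviour comes from the description above.

```shell
#!/bin/sh
# Hypothetical stand-in for a wait_for_deaddaemon-style helper:
# check once per second whether the pid is still alive.
# Returns 0 once the process is gone, 1 if it outlives max_wait seconds.
wait_for_exit() {
    pid=$1
    max_wait=${2:-60}
    elapsed=0
    # kill -0 sends no signal; it only tests whether the process
    # exists and can be signalled by the caller.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$max_wait" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

Polling the process rather than sleeping for a fixed time means a relay
that exits quickly does not hold up the restart.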

So I guess if you raise your ShutdownWaitLength, you'll want to tweak
the script. But that still seems better than the
"kill -INT, sleep 1, kill -9" strategy the rpm uses.

Patches to the torctl script greatly appreciated. :)

> or, perhaps even better: fixing the server shutdown process so the old
> server can't take out the new server.

Can you clarify what happens here? 'tor stop' finishes but Tor is still
running, so then 'tor start' fails to launch a new Tor, and then the
old Tor exits, and then you have no Tor running but you think you do?

Thanks!
--Roger



Re: ShutdownWaitLength vs. 'restart' in init scripts

2009-06-25 Thread Bill McGonigle
On 06/25/2009 04:39 AM, Roger Dingledine wrote:

> Is this with the tor rpm shipped by Fedora?

yes

> We don't support (or use,
> or like, or recommend) the fedora tor rpm.

OK.  Is this ideological, or because it's no good?  I can work to fix
the latter (I don't think anybody at Fedora wants a bad RPM).  In my
experience automatic security updates [aside: which are currently borked
for the Fedora tor package, but that's another story, which I think I've
gotten the right people to resolve at this point] are worth many
trade-offs.  People just don't do security updates the way we wish they
would.

> The right answer imo is how the deb package does it:
> https://git.torproject.org/checkout/tor/master/debian/tor.init
> Check out the wait_for_deaddaemon function: it basically checks each
> second whether the process is still around, and returns when it's gone
> (or 60 seconds have passed).

this makes sense.

> So I guess if you raise your ShutdownWaitLength, you'll want to tweak
> the script. But that still seems better than the
> "kill -INT, sleep 1, kill -9" strategy the rpm uses.

agreed, do you see any reason not to extract ShutdownWaitLength from the
config file?

>> or, perhaps even better: fixing the server shutdown process so the old
>> server can't take out the new server.
> 
> Can you clarify what happens here? 'tor stop' finishes but Tor is still
> running, so then 'tor start' fails to launch a new Tor, and then the
> old Tor exits, and then you have no Tor running but you think you do?

OK, so to be more clear:

Let's call the old tor process we're taking down torA and the new one we
want torB.

1) torA is sent an INT to tell it to stop.  It begins its shutdown process.
2) The init script isn't waiting or watching, so it starts torB.
Because torA is no longer bound to its listener port, torB can start up
just fine. The init script is out of the picture now.
3) torA reaches ShutdownWaitLength time.  It kills itself.  <---guess
4) torB gets taken out by torA's final shutdown.

At this point the init script's lockfiles reflect torB running, when
actually no tor is running.  So it behaves inappropriately.

I realize there's no point in addressing the current Fedora RPM init
script here, but assuming 4) is correct, it would seem that however torA
is finding torB, it shouldn't do it that way.

-Bill


Re: ShutdownWaitLength vs. 'restart' in init scripts

2009-06-26 Thread m

I would go with the Debian init script. I have used it for years without
any problems. It is much better than Tor's own contributed init script. It
is much nicer to wait for the process to die than to kill -9 it and cut
users' active connections.

I recommend that you take a look at the Debian tor init script and fix
Fedora's init script using Debian's as an example.

You can find Debian Lenny's tor-init at
http://tor-proxy.piirakka.com/debian-tor-init

Oh yeah, the script contains a small local change of mine, but it's all in
one place: the start is marked by # START OF LOCAL CHANGES and the end is
marked by # END OF LOCAL CHANGES.

I had some problems with tor eating all the memory and slowing the server
to a crawl until it was impossible even to ssh to the server. I played
with ulimit and found working values.


M

ps: I'd really want to run a tor exit-node but I have to think about my
family. I don't want cops all over the place again, scared the shit out
of my wife. If I lived alone it would be much easier.


Re: ShutdownWaitLength vs. 'restart' in init scripts

2009-07-14 Thread Roger Dingledine
On Thu, Jun 25, 2009 at 09:50:16PM -0400, Bill McGonigle wrote:
> > Is this with the tor rpm shipped by Fedora?
> 
> yes
> 
> > We don't support (or use,
> > or like, or recommend) the fedora tor rpm.
> 
> OK.  Is this ideological, or because it's no good?

Mostly the latter.

There was a little issue at the beginning where there were two people
offering to be maintainer, and I had been working with one and he had
a fine rpm spec file, and for some reason they picked the other and
ignored us both. I haven't interacted with them at all since; they seem
happy in their world ignoring upstream. :(

>  I can work to fix
> the latter (I don't think anybody at Fedora wants a bad RPM).  In my
> experience automatic security updates [aside: which are currently borked
> for the Fedora tor package, but that's another story, which I think I've
> gotten the right people to resolve at this point] are worth many
> trade-offs.  People just don't do security updates the way we wish they
> would.

Right. We've been thinking of setting up an rpm.torproject.org
repository, and putting our rpms into it. That would be similar to the
mirror.noreply.org deb repository that our Debian maintainer maintains.

Then we would have better control over what people think of as our rpms.

But the even better answer would be to somehow get the fedora folks
to improve their spec file. It needs to set ulimit -n like the debian
init script does; understand how to shut down relays cleanly; create
a separate user and run Tor as that user; and I really haven't looked
deep enough lately to know what else it's missing.

> > The right answer imo is how the deb package does it:
> > https://git.torproject.org/checkout/tor/master/debian/tor.init
> > Check out the wait_for_deaddaemon function: it basically checks each
> > second whether the process is still around, and returns when it's gone
> > (or 60 seconds have passed).
> 
> this makes sense.
> 
> > So I guess if you raise your ShutdownWaitLength, you'll want to tweak
> > the script. But that still seems better than the
> > "kill -INT, sleep 1, kill -9" strategy the rpm uses.
> 
> agreed, do you see any reason not to extract ShutdownWaitLength from the
> config file?

The main reason against would be 'complexity'.

For example, if you run a controller that 'setconf's a new
ShutdownWaitLength value via the control port, but the Tor process can't
saveconf because it can't write to its torrc (arguably a feature not a
bug), then Tor would be using a different value of ShutdownWaitLength
than you'd find in its torrc file. That's an unlikely edge case, but it
illustrates how it might not be that simple.
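
For reference, extracting the value from the config could look roughly like
the sketch below. The helper name get_shutdown_wait is hypothetical, and it
assumes the value is written as a plain number of seconds and that the last
occurrence in the file is the one that applies. It falls back to 30 seconds
(Tor's documented default) when the option is absent. As noted above, it
cannot see a value changed at runtime via the control port.

```shell
#!/bin/sh
# Hypothetical helper: print the last ShutdownWaitLength value found in a
# torrc, or a fallback when the option is absent or the file is unreadable.
# Assumes the value is a plain number of seconds (no unit handling).
get_shutdown_wait() {
    torrc=$1
    fallback=${2:-30}
    # Option names are matched case-insensitively; commented-out lines
    # do not match because their first field starts with '#'.
    value=$(awk 'tolower($1) == "shutdownwaitlength" { v = $2 }
                 END { print v }' "$torrc" 2>/dev/null)
    case $value in
        ''|*[!0-9]*) echo "$fallback" ;;   # missing or not a number
        *)           echo "$value" ;;
    esac
}
```

An init script would probably want to wait a few seconds longer than the
value returned, so that tor has time to finish exiting after the wait ends.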

I would say that just taking Debian's "up to 60 seconds while it exits"
strategy would get us most of the way there.

> >> or, perhaps even better: fixing the server shutdown process so the old
> >> server can't take out the new server.
> > 
> > Can you clarify what happens here? 'tor stop' finishes but Tor is still
> > running, so then 'tor start' fails to launch a new Tor, and then the
> > old Tor exits, and then you have no Tor running but you think you do?
> 
> OK, so to be more clear:
> 
> Let's call the old tor process we're taking down torA and the new one we
> want torB.
> 
> 1) torA is sent an INT to tell it to stop.  It begins its shutdown process.
> 2) The init script isn't waiting or watching, so it starts torB.
> Because torA is no longer bound to its listener port, torB can start up
> just fine. The init script is out of the picture now.
> 3) torA reaches ShutdownWaitLength time.  It kills itself.  <---guess
> 4) torB gets taken out by torA's final shutdown.
> 
> At this point the init script's lockfiles reflect torB running, when
> actually no tor is running.  So it behaves inappropriately.

Ah. I'm not sure whether that's the exact order of events (I don't think
TorA can kill TorB just by exiting), but in any case it's certainly a
big mess. The thing to do is to make sure that TorA has exited before
you launch TorB, and that has to be done by the init script.
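
A rough sketch of what that could look like in an init script's restart
path follows. It is an illustration only: restart_tor and start_tor are
hypothetical names, start_tor is a placeholder for whatever the real script
does to launch the daemon, and the pid file location is left to the caller.

```shell
#!/bin/sh
# Placeholder: replace with the real command that launches tor.
start_tor() {
    echo "starting tor"
}

# Hypothetical restart: signal the old instance, wait until it is really
# gone, and only then start the new one. Refuses to start a second tor
# if the old one outlives max_wait seconds.
restart_tor() {
    pidfile=$1
    max_wait=${2:-60}
    if [ -r "$pidfile" ]; then
        pid=$(cat "$pidfile")
        # Ask the old instance to shut down; ignore the error if it is
        # already gone (stale pid file).
        kill -INT "$pid" 2>/dev/null || true
        waited=0
        while kill -0 "$pid" 2>/dev/null; do
            if [ "$waited" -ge "$max_wait" ]; then
                echo "old tor (pid $pid) still running; not starting a new one" >&2
                return 1
            fi
            sleep 1
            waited=$((waited + 1))
        done
    fi
    start_tor
}
```

With this ordering the new instance is never launched while the old one is
still counting down, which sidesteps the torA/torB overlap entirely.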

> I realize there's no point in addressing the current Fedora RPM init
> script here, but assuming 4) is correct, it would seem that however torA
> is finding torB, it shouldn't do it that way.

Tor just exits cleanly when it's counted through its ShutdownWaitLength
time. It doesn't go hunting for other instances of Tor to kill them too.

Perhaps TorB dies early on (but after writing its pid file) when it
realizes that TorA is still around?

In any case, launching Tor while Tor is still running is not a supported
operation. We should make it not do that. :)

Can you take the lead in making a patch, and in either getting Fedora to
believe in it, or helping us maintain our own rpms better?

Thanks!
--Roger