Re: rc(8) script -- waiting for the network to become usable

2010-04-27 Thread Brandon Gooch
On Mon, Apr 26, 2010 at 10:02 PM, Jeremy Chadwick
free...@jdc.parodius.com wrote:
 On Tue, Apr 27, 2010 at 09:48:41AM +1000, Phil wrote:
 Jeremy,
 A good proposal to improve start-up robustness. If I may suggest,
 waitnetwork_ip should include a short list of alternate IPs in
 the event of a local network outage, DoS, etc.  Something like:
 waitnetwork_ip="IP1 IP2 IP3"

 Having multiple target IPs will improve the likelihood of timely
 booting when silly/nasty things happen on the wider network.

 Good idea to have incorporated into the base system.

 Phil,

 I brought this point up in my post on -rc and -net, actually.  I've
 since extended the script to support multiple IPs in $waitnetwork_ip
 (wasn't that hard).  The logic is that if any of the IPs pass the ping
 test, then the network connection is considered usable.

 Attached is the modified script.  I'll be updating the version on my
 server (HTTP) momentarily.

 --
 | Jeremy Chadwick                                   j...@parodius.com |
 | Parodius Networking                       http://www.parodius.com/ |
 | UNIX Systems Administrator                  Mountain View, CA, USA |
 | Making life hard for others since 1977.              PGP: 4BD6C0CB |


 #!/bin/sh
 #
 # $FreeBSD: $
 #

 # PROVIDE: waitnetwork
 # REQUIRE: NETWORKING
 # BEFORE: mountcritremote
 # KEYWORD: nojail

 # XXX - Once/if committed to base, it's better to have mountcritremote
 # XXX - REQUIRE waitnetwork, rather than use the above BEFORE line.

 . /etc/rc.subr

 name=waitnetwork
 rcvar=`set_rcvar`

 start_cmd=waitnetwork_start
 stop_cmd=:

 # XXX - Once/if committed to base, the following defaults should
 # XXX - be placed into src/etc/defaults/rc.conf instead of here
 # XXX - Also be sure to keep waitnetwork_ip= commented out!

 waitnetwork_enable=NO         # Wait for network availability before
                                # continuing with NETWORKING rc scripts
 #waitnetwork_ip=              # IP address to ping
 waitnetwork_count=5           # ping count (see ping(8) -c flag)
 waitnetwork_timeout=60        # ping timeout (see ping(8) -t flag)

 waitnetwork_start()
 {
         local ip rc success

         success=0

         # Nothing to ping: warn and let startup continue.
         if [ -z "${waitnetwork_ip}" ]; then
                 warn "You must define one or more IP addresses in waitnetwork_ip"
                 return
         fi

         # Try each address in turn; the first one that answers wins.
         for ip in ${waitnetwork_ip}; do
                 echo "Waiting for ${ip} to respond to ICMP..."

                 if [ -z "${waitnetwork_timeout}" ]; then
                         /sbin/ping -c ${waitnetwork_count} ${ip} >/dev/null 2>&1
                         rc=$?
                 else
                         info "Using timeout of ${waitnetwork_timeout} seconds"
                         /sbin/ping -t ${waitnetwork_timeout} -c ${waitnetwork_count} ${ip} >/dev/null 2>&1
                         rc=$?
                 fi

                 if [ $rc -eq 0 ]; then
                         echo "Host reachable; network considered available."
                         return
                 else
                         echo "No response from host."
                 fi
         done

         echo "Exhausted IP list.  Continuing with startup, but be aware you may"
         echo "not have a fully functional networking layer at this point."
 }

 load_rc_config $name
 run_rc_command "$1"

Not to hijack the thread, but this type of clean, quality work (even
though some consider it a hack) really helps a lot of people out.

I wonder, has anyone ever brought up the idea of an rc repository or
something similar, for rc scripts and/or configs that may help many,
but for whatever reason, will not be included in the base system?

I'm thinking of something more official, hosted at the freebsd.org
domain. Maybe in the same vein as:

http://www.sun.com/bigadmin/home/index.jsp

Shields up,

-Brandon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: rc(8) script -- waiting for the network to become usable

2010-04-26 Thread Jeremy Chadwick
On Mon, Apr 19, 2010 at 11:54:46AM -0700, Jeremy Chadwick wrote:
 On Mon, Apr 19, 2010 at 11:05:17AM -0700, Doug Barton wrote:
  On 4/18/2010 4:24 PM, Andrew Reilly wrote:
   By way of discussion, I'd just like to re-iterate what I
   said the first time around: it must be understood that this
   sort of thing is a (necessary) hacky-workaround that should
   ultimately be unnecessary. 
  
  While I find the discussion about launchd-type facilities interesting,
  we have a real problem at the moment and now we have a real solution for it.
  
  Jeremy, since no one has criticized your idea on a technical basis why
  don't you run it by the -net and -rc lists to be sure that it's
  technically sound, then I would be inclined to move forward with it.
 
 Doug and Garrett -- thanks.  I'll send a shorter mail to said lists
 referencing the discussion here on -stable and let folks weigh in.
 
 Much appreciated!

I've sent said mail to -rc and -net.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



RE: rc(8) script -- waiting for the network to become usable

2010-04-26 Thread Phil
Jeremy, 
A good proposal to improve start-up robustness. If I may suggest, 
waitnetwork_ip should include a short list of alternate IPs in
the event of a local network outage, DoS, etc.  Something like:
waitnetwork_ip="IP1 IP2 IP3"

Having multiple target IPs will improve the likelihood of timely
booting when silly/nasty things happen on the wider network.

Good idea to have incorporated into the base system.

Andrew,
I agree that the problems should be corrected at the source,
and my preference is to fail properly (b) so that other mitigation
may occur.  Done in parallel, the two approaches would eventually
provide a belt-and-braces start-up: wait for the network, and fail
properly for network-dependent processes.
(I can't speak to desktops that resume from a suspend when the
network has changed state.)
Regards, Phil



Re: rc(8) script -- waiting for the network to become usable

2010-04-26 Thread Jeremy Chadwick
On Tue, Apr 27, 2010 at 09:48:41AM +1000, Phil wrote:
 Jeremy, 
 A good proposal to improve start-up robustness. If I may suggest, 
 waitnetwork_ip should include a short list of alternate IPs in
 the event of a local network outage, DoS, etc.  Something like:
 waitnetwork_ip="IP1 IP2 IP3"
 
 Having multiple target IPs will improve the likelihood of timely
 booting when silly/nasty things happen on the wider network.
 
 Good idea to have incorporated into the base system.

Phil,

I brought this point up in my post on -rc and -net, actually.  I've
since extended the script to support multiple IPs in $waitnetwork_ip
(wasn't that hard).  The logic is that if any of the IPs pass the ping
test, then the network connection is considered usable.
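The "any address passes" rule can be sketched on its own, separate from
ping(8), so it can be exercised without a network.  This is only an
illustration: any_reachable and fake_probe are hypothetical names, and the
192.0.2.x addresses are documentation placeholders.

```sh
#!/bin/sh
# Sketch only: succeed as soon as any address in a space-separated list
# passes a probe.  any_reachable and fake_probe are hypothetical helpers.

any_reachable()
{
        # $1 = probe command taking one address, $2 = space-separated list
        local probe ip
        probe=$1

        for ip in $2; do
                if $probe "$ip" >/dev/null 2>&1; then
                        echo "$ip"      # report which address answered
                        return 0
                fi
        done
        return 1                        # list exhausted, nothing answered
}

# Stand-in probe for demonstration: only 192.0.2.2 "answers".
fake_probe()
{
        [ "$1" = "192.0.2.2" ]
}
```

In the script itself the probe is /sbin/ping with the -c and -t flags, and
the list comes from rc.conf, e.g. waitnetwork_ip="192.0.2.1 192.0.2.2".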

Attached is the modified script.  I'll be updating the version on my
server (HTTP) momentarily.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |


#!/bin/sh
#
# $FreeBSD: $
#

# PROVIDE: waitnetwork
# REQUIRE: NETWORKING
# BEFORE: mountcritremote
# KEYWORD: nojail

# XXX - Once/if committed to base, it's better to have mountcritremote
# XXX - REQUIRE waitnetwork, rather than use the above BEFORE line.

. /etc/rc.subr

name=waitnetwork
rcvar=`set_rcvar`

start_cmd=waitnetwork_start
stop_cmd=:

# XXX - Once/if committed to base, the following defaults should
# XXX - be placed into src/etc/defaults/rc.conf instead of here
# XXX - Also be sure to keep waitnetwork_ip= commented out!

waitnetwork_enable=NO         # Wait for network availability before
                              # continuing with NETWORKING rc scripts
#waitnetwork_ip=              # IP address(es) to ping
waitnetwork_count=5           # ping count (see ping(8) -c flag)
waitnetwork_timeout=60        # ping timeout (see ping(8) -t flag)

waitnetwork_start()
{
        local ip rc success

        success=0

        # Nothing to ping: warn and let startup continue.
        if [ -z "${waitnetwork_ip}" ]; then
                warn "You must define one or more IP addresses in waitnetwork_ip"
                return
        fi

        # Try each address in turn; the first one that answers wins.
        for ip in ${waitnetwork_ip}; do
                echo "Waiting for ${ip} to respond to ICMP..."

                if [ -z "${waitnetwork_timeout}" ]; then
                        /sbin/ping -c ${waitnetwork_count} ${ip} >/dev/null 2>&1
                        rc=$?
                else
                        info "Using timeout of ${waitnetwork_timeout} seconds"
                        /sbin/ping -t ${waitnetwork_timeout} -c ${waitnetwork_count} ${ip} >/dev/null 2>&1
                        rc=$?
                fi

                if [ $rc -eq 0 ]; then
                        echo "Host reachable; network considered available."
                        return
                else
                        echo "No response from host."
                fi
        done

        echo "Exhausted IP list.  Continuing with startup, but be aware you may"
        echo "not have a fully functional networking layer at this point."
}

load_rc_config $name
run_rc_command "$1"


Re: rc(8) script -- waiting for the network to become usable

2010-04-19 Thread Daniel Braniss
 On Sun, Apr 18, 2010 at 02:37:27PM -0700, Jeremy Chadwick wrote:
  I'd like to discuss the possibility of introducing a new script into
  the base system's /etc/rc.d which, when enabled, would provide a way
  to wait until the IP networking layer is up and usable (checked with
  ping(8)) before continuing with daemon startup.
  
  Let's discuss.  :-)
  
  
  HISTORY
  =======
  The situation which brought this debacle to my attention:
  
  I found that on reboot of some of our systems, ntpdate (used to sync the
  clock initially before ntpd would be started) wouldn't work.  The daemon
  would report that it couldn't resolve any of the FQDNs within ntp.conf,
  and would therefore act as a no-op before continuing on.
 
 By way of discussion, I'd just like to re-iterate what I
 said the first time around: it must be understood that this
 sort of thing is a (necessary) hacky-workaround that should
 ultimately be unnecessary.  In preference, we should work on
 the failing daemons or hassle up-stream daemon authors so
 that the daemons in question either (a) retry until they *do*
 get the information they're after or (b) fail properly, so
 that they can be restarted by an external process monitoring
 framework like sysutils/daemontools or launchd.  The reasoning
 is simple: network outage is something that can happen even
 after startup, and when network connectivity returns, the
 routing and addresses that are visible won't necessarily be the
 same.  Consider laptops that suspend, as a particular example.
 Or mobile devices that switch from wi-fi to cellular networking
 to no connectivity on a regular basis.  The "get it right at
 boot time" model is important and traditional, but (I think)
 a fragile and diminishing fraction of use cases.  Our rc-ng
 framework favours solution (a).  I'm more a fan of approach (b),
 myself: I use daemontools for many services, and I like the way
 that launchd works on my Mac laptops.

I think that rc is being overloaded yet again(*), and a launchDaemon
kind of solution should be followed, maybe even as a replacement for
inetd?
/blah
(*): in the beginning rc would do everything, but life was simple - no internet.
Then it got complicated, with too many daemons, so inetd was invented and things
were back under control, for a while. Then sysv invented init.d/init levels, then
rc-ng came along. Though it solves many problems, namely 1) the order of things and
2) easy configuration of services, it's getting complicated to get 1 'correctly'.
blah/

danny




Re: rc(8) script -- waiting for the network to become usable

2010-04-19 Thread Doug Barton
On 4/18/2010 4:24 PM, Andrew Reilly wrote:
 By way of discussion, I'd just like to re-iterate what I
 said the first time around: it must be understood that this
 sort of thing is a (necessary) hacky-workaround that should
 ultimately be unnecessary. 

While I find the discussion about launchd-type facilities interesting,
we have a real problem at the moment and now we have a real solution for it.

Jeremy, since no one has criticized your idea on a technical basis, why
don't you run it by the -net and -rc lists to be sure that it's
technically sound?  Then I would be inclined to move forward with it.


Doug

-- 

... and that's just a little bit of history repeating.
-- Propellerheads

Improve the effectiveness of your Internet presence with
a domain name makeover!http://SupersetSolutions.com/



Re: rc(8) script -- waiting for the network to become usable

2010-04-19 Thread Garrett Cooper
On Mon, Apr 19, 2010 at 11:05 AM, Doug Barton do...@freebsd.org wrote:
 On 4/18/2010 4:24 PM, Andrew Reilly wrote:
 By way of discussion, I'd just like to re-iterate what I
 said the first time around: it must be understood that this
 sort of thing is a (necessary) hacky-workaround that should
 ultimately be unnecessary.

 While I find the discussion about launchd-type facilities interesting,
 we have a real problem at the moment and now we have a real solution for it.

 Jeremy, since no one has criticized your idea on a technical basis why
 don't you run it by the -net and -rc lists to be sure that it's
 technically sound, then I would be inclined to move forward with it.

Agreed.
Thanks,
-Garrett


Re: rc(8) script -- waiting for the network to become usable

2010-04-19 Thread Jeremy Chadwick
On Mon, Apr 19, 2010 at 11:05:17AM -0700, Doug Barton wrote:
 On 4/18/2010 4:24 PM, Andrew Reilly wrote:
  By way of discussion, I'd just like to re-iterate what I
  said the first time around: it must be understood that this
  sort of thing is a (necessary) hacky-workaround that should
  ultimately be unnecessary. 
 
 While I find the discussion about launchd-type facilities interesting,
 we have a real problem at the moment and now we have a real solution for it.
 
 Jeremy, since no one has criticized your idea on a technical basis why
 don't you run it by the -net and -rc lists to be sure that it's
 technically sound, then I would be inclined to move forward with it.

Doug and Garrett -- thanks.  I'll send a shorter mail to said lists
referencing the discussion here on -stable and let folks weigh in.

Much appreciated!

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



rc(8) script -- waiting for the network to become usable

2010-04-18 Thread Jeremy Chadwick
I'd like to discuss the possibility of introducing a new script into
the base system's /etc/rc.d which, when enabled, would provide a way
to wait until the IP networking layer is up and usable (checked with
ping(8)) before continuing with daemon startup.

I've written a script that's in use on all of our RELENG_8 systems (I
have not tested RELENG_7) which works reliably; I'll include that script
at the bottom of my mail, and also a link to it[1].

Let's discuss.  :-)


HISTORY
=======
The situation which brought this debacle to my attention:

I found that on reboot of some of our systems, ntpdate (used to sync the
clock initially before ntpd would be started) wouldn't work.  The daemon
would report that it couldn't resolve any of the FQDNs within ntp.conf,
and would therefore act as a no-op before continuing on.

This failure had dire consequences -- Dovecot (at least with older
versions; newer seems to behave better[2]) would refuse to start up,
citing "time moved backwards".  Dovecot not starting had a trickle-down
effect on Postfix (which was compiled to use Dovecot for SMTP AUTH),
where Postfix would start but all inbound mail would fail due to
Dovecot's SMTP AUTH mech not listening on a domain socket.  Ouch.

Since DNS failure was the root issue, I dug around rc.d/named and found
that Doug had introduced a feature to rc.d/named called named_wait,
which calls "host $named_wait_host" repeatedly (sleeping 1 second
between calls), waiting until resolution succeeds before continuing
onwards.  This worked (e.g. set named_wait_host to www.google.com or
something Internet-bound).
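
For anyone unfamiliar with that style of wait loop, the idea looks roughly
like the sketch below.  This is not the code from rc.d/named:
wait_for_lookup, the attempt cap and the swappable lookup command are all
assumptions added so the loop can be tried without a resolver.

```sh
#!/bin/sh
# Sketch only: retry a lookup until it succeeds, sleeping between tries.
# wait_for_lookup and its arguments are hypothetical, not rc.d/named's.

wait_for_lookup()
{
        # $1 = name to resolve, $2 = max attempts (default 30),
        # $3 = seconds between attempts (default 1),
        # $4 = lookup command (default host(1))
        local target max_tries delay lookup tries
        target=$1
        max_tries=${2:-30}
        delay=${3:-1}
        lookup=${4:-host}
        tries=0

        while [ "$tries" -lt "$max_tries" ]; do
                if $lookup "$target" >/dev/null 2>&1; then
                        return 0        # resolution worked, carry on
                fi
                tries=$((tries + 1))
                sleep "$delay"
        done
        return 1                        # gave up after max_tries attempts
}
```

The attempt cap is a design choice of the sketch: it keeps a dead resolver
from stalling boot forever, at the cost of sometimes continuing without DNS.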

However, named itself still complains during startup, logging "host
unreachable resolving XXX" messages with regards to the root servers.
These errors were visible in logs, etc... and could cause confusion or
unnecessary worry (they did in my case).

The root cause should be fairly obvious: the physical networking layer
hadn't fully come up by the time named had started.  In other cases, the
physical network was available but layer 2 (ARP) hadn't finished.

So I wrote this.


USE
===
1) Install script as /usr/local/etc/rc.d/waitnetwork
2) chmod 755 /usr/local/etc/rc.d/waitnetwork
3) Set the following in rc.conf:

waitnetwork_enable="yes"
waitnetwork_ip="some_ip_addr_to_ping"

Note that this does need to be an IP address and *not* an FQDN.  I've
discussed this reasoning with some others and they agree.  Don't pick
something like 127.0.0.1 either (meaning don't be silly).  :-)

Other parameters you can adjust:

waitnetwork_count   -- passed as ping(8) -c flag  (default 5)
waitnetwork_timeout -- passed as ping(8) -t flag  (default 60)
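
To make the mapping from those two knobs to ping(8) concrete, here is a
small sketch that only prints the command line that would be run.
build_ping_cmd is a hypothetical helper, not part of the script; the flags
are the ping(8) -c (count) and -t (overall timeout in seconds) named above.

```sh
#!/bin/sh
# Sketch only: show how the rc.conf knobs turn into a ping(8) command line.
# build_ping_cmd is a hypothetical helper used for illustration.

build_ping_cmd()
{
        # $1 = address, $2 = waitnetwork_count, $3 = waitnetwork_timeout
        if [ -z "$3" ]; then
                # No timeout configured: rely on the packet count alone.
                echo "/sbin/ping -c $2 $1"
        else
                echo "/sbin/ping -t $3 -c $2 $1"
        fi
}
```

With the defaults, build_ping_cmd 192.0.2.1 5 60 prints
"/sbin/ping -t 60 -c 5 192.0.2.1" (192.0.2.1 is a placeholder address).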


CAVEATS / POINTS OF INTEREST
============================
1) This script requires the $waitnetwork_ip box/router/whatever respond
to ICMP ECHO requests.  Please do not bikeshed on this point; we need
something that works, and this requirement shouldn't be that bad to deal
with (firewall/ACL-wise).  For most folks (co-located in particular),
this could be your default gateway, but you can use whatever you want.

2) The needs of some folks may vary depending upon configuration: "we
have two NICs, dual-homed, so what exactly do I put in waitnetwork_ip?"
Yes, I understand the confusion -- hopefully these folks, given their
topologies, can figure out a way to make this work reliably for them.

3) Other stuff I probably haven't thought of.
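
On caveat 1, if you are unsure what your default gateway is, something
like the following can pull it out of route(8) output.  A sketch only:
parse_gateway is a hypothetical helper, and it assumes the output of
"route -n get default" contains a "gateway:" line, as it does on the
systems I have looked at.

```sh
#!/bin/sh
# Sketch only: extract the gateway address from "route -n get default"
# output supplied on stdin.  parse_gateway is a hypothetical helper.

parse_gateway()
{
        # Print the second field of the first "gateway:" line, then stop.
        awk '$1 == "gateway:" { print $2; exit }'
}
```

Usage would be "route -n get default | parse_gateway".  Note that it prints
nothing if no default route is installed yet, so it is better suited to
picking a value for rc.conf by hand than to being run at boot.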

For those considering arguing that we should just wait for the NIC to
come up, that won't work -- what's needed is a way to verify layer 3/4
is usable, not layer 1.

I admit there's no universal way to cover every single person's needs,
but providing a simple framework to at least wait until something is
pingable would be a good starting point; it's better than nothing!


NOTES BEFORE COMMITTING
=======================
The script also contains some XXX comments which should be reviewed by
anyone willing to commit this into the base system.


REFERENCES
==========
[1]: http://jdc.parodius.com/freebsd/waitnetwork
[2]: http://wiki.dovecot.org/TimeMovedBackwards

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |


#!/bin/sh
#
# $FreeBSD: $
#

# PROVIDE: waitnetwork
# REQUIRE: NETWORKING
# BEFORE: mountcritremote
# KEYWORD: nojail

# XXX - once/if committed to base, it's better to have mountcritremote
# XXX - REQUIRE waitnetwork, rather than use the above BEFORE line.

. /etc/rc.subr

name=waitnetwork
rcvar=`set_rcvar`

start_cmd=waitnetwork_start
stop_cmd=:

# XXX - once/if committed to base, the following defaults should
# XXX - be placed into src/etc/defaults/rc.conf instead of here

waitnetwork_enable=NO # Wait for network availability before
# continuing with NETWORKING rc scripts
waitnetwork_ip=  

Re: rc(8) script -- waiting for the network to become usable

2010-04-18 Thread Andrew Reilly
On Sun, Apr 18, 2010 at 02:37:27PM -0700, Jeremy Chadwick wrote:
 I'd like to discuss the possibility of introducing a new script into
 the base system's /etc/rc.d which, when enabled, would provide a way
 to wait until the IP networking layer is up and usable (checked with
 ping(8)) before continuing with daemon startup.
 
 Let's discuss.  :-)
 
 
 HISTORY
 =======
 The situation which brought this debacle to my attention:
 
 I found that on reboot of some of our systems, ntpdate (used to sync the
 clock initially before ntpd would be started) wouldn't work.  The daemon
 would report that it couldn't resolve any of the FQDNs within ntp.conf,
 and would therefore act as a no-op before continuing on.

By way of discussion, I'd just like to re-iterate what I
said the first time around: it must be understood that this
sort of thing is a (necessary) hacky-workaround that should
ultimately be unnecessary.  In preference, we should work on
the failing daemons or hassle up-stream daemon authors so
that the daemons in question either (a) retry until they *do*
get the information they're after or (b) fail properly, so
that they can be restarted by an external process monitoring
framework like sysutils/daemontools or launchd.  The reasoning
is simple: network outage is something that can happen even
after startup, and when network connectivity returns, the
routing and addresses that are visible won't necessarily be the
same.  Consider laptops that suspend, as a particular example.
Or mobile devices that switch from wi-fi to cellular networking
to no connectivity on a regular basis.  The "get it right at
boot time" model is important and traditional, but (I think)
a fragile and diminishing fraction of use cases.  Our rc-ng
framework favours solution (a).  I'm more a fan of approach (b),
myself: I use daemontools for many services, and I like the way
that launchd works on my Mac laptops.
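
To illustrate what I mean by (b), a supervised service's start-up wrapper
can be as simple as the sketch below: check a precondition, and if it does
not hold, exit non-zero straight away so the supervisor retries later.
require_or_fail is a hypothetical name, and the check and daemon commands
are whatever you pass in; this is not taken from daemontools or launchd.

```sh
#!/bin/sh
# Sketch of approach (b): fail fast and let a supervisor restart us.
# require_or_fail is a hypothetical helper used for illustration.

require_or_fail()
{
        # $1 = precondition command; remaining args = daemon command line
        local check
        check=$1
        shift

        if ! $check >/dev/null 2>&1; then
                echo "precondition failed; exiting so the supervisor can retry" >&2
                return 1
        fi
        "$@"    # precondition holds: run the real service
}
```

The point of the design is that the same path handles a network that is
missing at boot and one that disappears later: the service dies, and the
supervisor brings it back when conditions improve.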

Cheers,

-- 
Andrew



Re: rc(8) script -- waiting for the network to become usable

2010-04-18 Thread Garrett Cooper
On Sun, Apr 18, 2010 at 4:24 PM, Andrew Reilly arei...@bigpond.net.au wrote:
 On Sun, Apr 18, 2010 at 02:37:27PM -0700, Jeremy Chadwick wrote:
 I'd like to discuss the possibility of introducing a new script into
 the base system's /etc/rc.d which, when enabled, would provide a way
 to wait until the IP networking layer is up and usable (checked with
 ping(8)) before continuing with daemon startup.

 Let's discuss.  :-)


 HISTORY
 =======
 The situation which brought this debacle to my attention:

 I found that on reboot of some of our systems, ntpdate (used to sync the
 clock initially before ntpd would be started) wouldn't work.  The daemon
 would report that it couldn't resolve any of the FQDNs within ntp.conf,
 and would therefore act as a no-op before continuing on.

 By way of discussion, I'd just like to re-iterate what I
 said the first time around: it must be understood that this
 sort of thing is a (necessary) hacky-workaround that should
 ultimately be unnecessary.  In preference, we should work on
 the failing daemons or hassle up-stream daemon authors so
 that the daemons in question either (a) retry until they *do*
 get the information they're after or (b) fail properly, so
 that they can be restarted by an external process monitoring
 framework like sysutils/daemontools or launchd.  The reasoning
 is simple: network outage is something that can happen even
 after startup, and when network connectivity returns, the
 routing and addresses that are visible won't necessarily be the
 same.  Consider laptops that suspend, as a particular example.
 Or mobile devices that switch from wi-fi to cellular networking
 to no connectivity on a regular basis.  The "get it right at
 boot time" model is important and traditional, but (I think)
 a fragile and diminishing fraction of use cases.  Our rc-ng
 framework favours solution (a).  I'm more a fan of approach (b),
 myself: I use daemontools for many services, and I like the way
 that launchd works on my Mac laptops.

I agree with Andrew. This is the model that Mac has been on for a
while now with launchd, and it is the one we should be migrating
towards (IMO), as it does a better job of detecting asynchronous system
events and could improve the overall init / rc model used in FreeBSD.
Whatever happened to this work: http://wiki.freebsd.org/launchd ?
I remember that Apple moved to a more OS X-centric set of APIs in
Leopard+, but it might be worthwhile to start with the older version
of launchd, and migrate from there.
Thanks,
-Garrett