> +# get the link status on $NIC
> +# returns UP or DOWN or whatever ip reports (UNKNOWN?)
> +get_link_status () {
> +     $IP2UTIL -o link show dev "$NIC" \
> +             | sed 's/.* state \([^ ]*\) .*/\1/'
> +}
> +
> +# returns the number of received rx packets on $NIC
> +get_rx_packets () {
> +     ocf_log debug "$IP2UTIL -o -s link show dev $NIC"
> +     $IP2UTIL -o -s link show dev "$NIC" \
> +             | sed 's/.* RX: [^0-9]*[0-9]* *\([0-9]*\) .*/\1/'
> +             # the first number after RX: ist the # of bytes ,
> +             # the second is the # of packets received
> +}
> +
> +# watch for packet counter changes for max. OCF_RESKEY_pktcnt_timeout seconds
> +# returns immedeately with return code 0 if any packets were received
> +# otherwise 1 is returned
> +watch_pkt_counter () {
> +     local RX_PACKETS_NEW
> +     local RX_PACKETS_OLD
> +     RX_PACKETS_NEW="`get_rx_packets`"
> +     RX_PACKETS_OLD="$RX_PACKETS_NEW"

^^^ assign old here, only

> +     for n in `seq $OCF_RESKEY_pktcnt_timeout`; do
> +             sleep 1

maybe even sub second sleep?
likely we are busy anyways,
so we should see some packets very quickly

> +             RX_PACKETS_OLD="$RX_PACKETS_NEW"

not here. because, if it changed, you return anyways.
if not, well, it has not changed right?

> +             RX_PACKETS_NEW="`get_rx_packets`"
> +             ocf_log debug "RX_PACKETS_OLD: $RX_PACKETS_OLD    
> RX_PACKETS_NEW: $RX_PACKETS_NEW"
> +             if [ "$RX_PACKETS_OLD" -ne "$RX_PACKETS_NEW" ]; then
> +                     ocf_log debug "we received some packages."

s/packages/packets/

> +                     return

explicitly return 0 here, or you will be returning the result of
ocf_log, and you don't want to cause recovery action because the logging
failed somewhere.

> +             fi
> +     done
> +     return 1
> +}

I do like this packet counter monitoring.

> +# returns list of chached ARP entries for $NIC

s/chached/cached

> +# sorted by age ("last confirmed")
> +# max. OCF_RESKEY_arping_cache_entries entries
> +get_arp_list () {
> +     $IP2UTIL -s neighbour show dev $NIC \
> +             | grep -v "^$BASEIP " \

my own ip is not included anyways,
at least on my box?

do you want to grep REACHABLE here, too?
not sure.

> +             | tr '/' ' ' | sort -n -k8 | cut -d' ' -f1 \

 | sort -t/ -k2,2n | cut -d' ' -f1 \

> +             | head -n $OCF_RESKEY_arping_cache_entries | tr '\n' ' '

what for do you tr \n ' ' ?
no need.

> +             # omit our own IP
> +             # the "used" entries in `ip -s neighbour show` are:
> +             # "last used"/"last confirmed"/"last updated"
> +}
> +
> +# arping the IP given as argument $1 on $NIC
> +# until OCF_RESKEY_arping_count answers are received
> +do_arping () {
> +     if $DEBUG ; then arping -c $OCF_RESKEY_arping_count -I $NIC $1; return 
> $?; fi
> +     arping -q -c $OCF_RESKEY_arping_count -I $NIC $1

I think you should add the source IP.
Is this the only arping out there?  I think there are at least two
variants, with slightly different options.

If your invokation is compatible, add a comment that it is.
If not, find a way to deal with both.

> +     return $?

return $? can go, functions return with the exit code of the last command 
anyways.

> +}
> +
> +#
> +#    Check the interface depending on the level given as parameter:
> +#    (this will be startup_check_level on start or OCF_CHECK_LEVEL on 
> monitor)
> +#
> +# 09: check for nonempty ARP cache
> +# 10: watch for packet counter changes
> +#
> +# 19: check arping_ip_list
> +# 20: check arping ARP cache entries
> +# 
> +# 30:  watch for packet counter changes in promiscios mode
> +# 
> +# If unsuccessfull in levels 18 and above,
> +# the tests for higher check levels are run.
> +#
> +ip_monitor_if_check () {
> +     local CURRENT_CHECK_LEVEL
> +     CURRENT_CHECK_LEVEL=$1
> +     # always check link status first
> +     link_status="`get_link_status`"
> +     ocf_log debug "link_status: $link_status"
> +     case $link_status in
> +             UP)
> +                     if [ "$CURRENT_CHECK_LEVEL" -eq 0 ]; then
> +                             return $OCF_SUCCESS
> +                     fi
> +                     ;;
> +             DOWN)
> +                     # remove address from NIC
> +                     return $OCF_NOT_RUNNING
> +                     ;;
> +             *) # this should not happen.
> +                     return $OCF_ERR_GENERIC
> +                     ;;
> +     esac

# $CURRENT_CHECK_LEVEL > 0 and link_status UP is implicit from now on

> +
> +     # check for nonempty ARP cache
> +     if [ $CURRENT_CHECK_LEVEL -le 9 ]; then
> +             ocf_log debug "check for nonempty ARP cache" 
> +             if [ "`get_arp_list`" != "" ]; then
> +                     return $OCF_SUCCESS
> +             fi
> +     fi
> +
> +     # watch for packet counter changes
> +     if [ "$CURRENT_CHECK_LEVEL" -eq 10 ]; then

Do you want -le here?
if not, add a comment why you want -eq.

> +             ocf_log debug "watch for packet counter changes" 
> +             watch_pkt_counter && return $OCF_SUCCESS
> +     fi
> +
> +     # check arping_ip_list
> +     if [ $CURRENT_CHECK_LEVEL -le 19 ]; then
> +             ocf_log debug "check arping_ip_list" 
> +             if [ "$OCF_RESKEY_arping_ip_list" != "" ]; then

No need to check for != "",
the for loop will just not execute if it is.

> +                     for ip in $OCF_RESKEY_arping_ip_list; do
> +                             do_arping $ip && return $OCF_SUCCESS
> +                     done
> +             fi
> +     fi
> +
> +     # check arping ARP cache entries
> +     if [ $CURRENT_CHECK_LEVEL -le 20 ]; then
> +             ocf_log debug "check arping ARP cache entries" 
> +             if [ "$OCF_RESKEY_arping_ip_list" != "" ]; then

no need here either, probably even wrong.
You want this even if there is no arping_ip_list explicitly set, right?

> +                     for ip in `get_arp_list`; do
> +                             do_arping $ip && return $OCF_SUCCESS
> +                     done
> +             fi
> +     fi
> +
> +     # watch for packet counter changes in promiscios mode
> +     if [ $CURRENT_CHECK_LEVEL -le 30 ]; then
> +             ocf_log debug "watch for packet counter changes in promiscios 
> mode" 
> +             # be sure switch off promiscios mode in any case
> +             trap "$IP2UTIL link set dev $NIC promisc off; exit" INT TERM 
> EXIT

I don't like this.
You assume that the dev does not operate in promisc mode.
This will break the (presumably minority of) cases that actually
run promisc intentionally during "normal operation".


> +                     $IP2UTIL link set dev $NIC promisc on
> +                     watch_pkt_counter && return $OCF_SUCCESS
> +                     $IP2UTIL link set dev $NIC promisc off
> +             trap - INT TERM EXIT
> +     fi

How about you remember the packet stats first thing,
then do all the other checks,
finally you do a broadcast ping or something,
then go again waiting for packets,
comparing with the first remembered count?

If you really want the promisc check,
maybe first check if it was on,
and if so, leave it on.
Or make that an advanced option: keep_promisc_on...

Maybe I'm just wrong here, and it's ok as you have it.

> +     # looks like it's not working (for whatever reason)
> +     # remove address from NIC
> +     return $OCF_NOT_RUNNING
> +}
> +

Enough for today...

Thanks again for this valuable contribution.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

Reply via email to