Hi, just bumping this to be not forgotten. RA runs fine for almost a month, several simulated network outages were passed with full success, so it could be included in some package. I think pacemaker, because this RA uses pacemaker-specific calls.
Andrew? (I can support this one) 23.02.2011 11:53, Vladislav Bogdanov wrote: > Hi Lars, > > thank you for your time and for so detailed review. > > Just to dot half of i's (where it is about coding style): > 1. I strongly prefer to cleanly separate data access from main logic by API. > 2. I prefer to have non-void functions to return result explicitly > ("main" too). This will prevent "correct" return code from being lost on > subsequent code modifications. > 3. I insist on all variables inside functions to be local. This helps to > avoid some hardly-debugable logic errors. > 4. I prefer not to touch global variables inside of lower-layer > functions. To be honest, I prefer to pass everything needed by function > in its parameters. This sometimes is hard to do, so sometimes I prefer > to use global variables relatively high at stack rather then pass them > down through several more functions. > 5) I prefer to use sub-shells for recursive function calls. > > Please look at one more attached revision, and also in comments inline. > > > ... >>> <longdesc lang="en"> >>> Every time the monitor action is run, this resource agent records (in the >>> CIB) speed of active network interfaces from a list. >> >> >> Where "active" is ... >> what, exactly? >> Add some hint on the intended use case and purpose, >> maybe add an example or two. This is the long description, after all. > > Done. > >> >> Also note that this is >> - linux specific >> - requires kernel >= 2.6.33, >> afaict no /sys/class/net/*/speed before that > > Ahm, was not aware about that. I need to look again at this because I > need this to run on RHEL6 too. Anybody knows, does it has this sysfs? > Let's delay with this for a bit. > >> >>> </longdesc> >>> <shortdesc lang="en">Network interface status</shortdesc> >> >> again, "interface status" may be a bit misleading. > > Fixed. > >> >>> Weight of each 10Mbps in interface speed (1Gbps = 100 * 10Mbps). >> >> The way you implemented it, you get the speed, then do some math with >> it, just to arrive at the same value you just read... >> Why not just give one point per Mbps by default? > > Rewrote. >> >>> With default value 1Gbps interface will be counted as 1000. >>> </longdesc> >>> <shortdesc lang="en">Weight of 10Mbps interface</shortdesc> >> >> can someone come up with better wording here? > > I tried. > >> >>> <parameter name="dampen" unique="0"> >>> <longdesc lang="en"> >>> The time to wait (dampening) further changes occur >> >> This english apparently needs fixing >> already wherever it was copied from ;-) > > Hopefully done (although english is not my native lang). > >> >> >>> </longdesc> >>> <shortdesc lang="en">Dampening interval</shortdesc> >>> <content type="integer" default="${OCF_RESKEY_dampen_default}"/> >>> </parameter> >>> >>> <parameter name="debug" unique="0"> >>> <longdesc lang="en"> >>> Enables to use default attrd_updater verbose logging on every call. >> >> "If enabled, ..." > > Re-worded. >> >>> </longdesc> >>> <shortdesc lang="en">Verbose logging</shortdesc> >>> <content type="string" default="false"/> >>> </parameter> >>> >>> </parameters> >>> >>> <actions> >>> <action name="start" timeout="30" /> >>> <action name="stop" timeout="30" /> >>> <action name="reload" timeout="30" /> >>> <action name="monitor" depth="0" timeout="30" interval="10"/> >>> <action name="meta-data" timeout="5" /> >>> <action name="validate-all" timeout="30" /> >>> </actions> >>> </resource-agent> >>> END >>> } >>> >>> usage() { >>> cat <<END >>> usage: $0 >>> {start|stop|monitor|migrate_to|migrate_from|validate-all|meta-data} >>> >>> Expects to have a fully populated OCF RA-compliant environment set. >>> END >> >> BTW: >> cat does an extra fork/exec, and "here documents" go via tmpfiles, >> which can break things, or slow things down quite a bit. >> most of the time, I prefer the typically built in echo " >> ... >> .... " >> >> But that has nothing to do with this RA, even less with its usage(). >> Actually it comes down to a matter of taste entirely, >> so just ignore this statement ;-) > > Let's leave it as-is. > >> >>> } >>> >>> start() { >>> monitor >>> if [ $? -eq $OCF_SUCCESS ]; then >>> return $OCF_SUCCESS >>> fi >> >> monitor && return > > Ahm, I'd leave it as-is. It is a bit more readable and probably does not > cost any extra CPU cycles. > >> >>> ha_pseudo_resource ${ha_pseudo_resource_name} start >>> update >> >> the update has been done in monitor already? > > Nope. ha_pseudo_resource ${ha_pseudo_resource_name} monitor returns 7 > there so update is not called. > >> >>> } >>> >>> stop() { >>> ha_pseudo_resource ${ha_pseudo_resource_name} stop >>> attrd_updater -D -n ${OCF_RESKEY_name} -d ${OCF_RESKEY_dampen} >>> ${attrd_options} >>> return $OCF_SUCCESS >> >> do we want to consider the exit code of attrd_updater, or not? >> if not, what do we do about non-zero exit code? >> does it have a useful reliable exit code, at all? >> you don't ignore it in update. >> Though you ignore the return value of update ... >> >> Yeah, right, ping does it the same way probably (I did not check; does >> it?) -- good to have a reason to review that, too ;-) > > We did our best, and failure of attr_updater means much more serious > problems than this RA han handle. > >> >>> } >>> >>> monitor() { >>> local ret >>> ha_pseudo_resource ${ha_pseudo_resource_name} monitor >>> ret=$? >>> if [ ${ret} -eq $OCF_SUCCESS ] ; then >>> update >>> fi >>> return ${ret} >> >> ha_pseudo_resource $ha_pseudo_resource_name monitor || return >> update >> >> and, if that's really necessary, ignore the return code of update: >> return $OCF_SUCCESS > > No, we are interested in NOT_RUNNING too. > Let's leave as-is. > >> >>> } >>> >>> validate() { >> >> check for linux, >> and kernel >= 2.6.33, >> and sysfs present, >> and if not, exit with not installed or something? > > Again, let's delay it for a bit. > >> >> Ah. just noticed you do that explicitly early on, anyways. >>> # Check the check interval >>> if ocf_is_decimal "${OCF_RESKEY_CRM_meta_interval}" && [ >>> ${OCF_RESKEY_CRM_meta_interval} -gt 0 ]; then >>> : >>> else >>> ocf_log err "Invalid check interval ${OCF_RESKEY_interval}. It >>> should be positive integer!" >>> exit $OCF_ERR_CONFIGURED >>> fi >>> >>> # Check the intarfaces list >> >> s/tar/ter > > Thanks. > >> >>> if [ "x" = "x${OCF_RESKEY_iface}" ]; then >> >> You declared yourself as bash. >> no need for this "x" = "x$VAR" nonsense in bash... >> in fact, even in sh, I find test -z preferable. > > Blindly copied it. Done. > >> >>> ocf_log err "Empty iface parameter. Please specify some network >>> interface to check" >> >> also check for strange characters in iface, >> it will be used as regex/sed/awk expression later. > > Let it be TODO. > >> >>> exit $OCF_ERR_CONFIGURED >>> fi >>> >>> return $OCF_SUCCESS >>> } >>> >>> iface_get_speed() { >>> local iface=$1 >>> local operstate >>> local carrier >>> local speed >>> >>> if [ ! -e "/sys/class/net/${iface}" ] ; then >>> echo 0 >>> elif iface_is_bridge ${iface} ; then >>> bridge_get_speed ${iface} >>> elif iface_is_bond ${iface} ; then >>> bond_get_speed ${iface} >>> elif iface_is_vlan ${iface} ; then >>> iface_get_speed $( vlan_get_phy ${iface} ) >>> else >>> read operstate < "/sys/class/net/${iface}/operstate" >>> read carrier < "/sys/class/net/${iface}/carrier" >>> if [ "${operstate}" != "up" ] || [ "${carrier}" != "1" ] ; then >>> speed="0" >>> else >>> read speed < "/sys/class/net/${iface}/speed" >>> fi >>> echo ${speed} >> >> I'd prefer to have speed non-local, > > I'd not. Let's be safe with recursions. > >> and be a return value. > > Does bash already support 10000 to be function return code? ;) > >> >> then you can do away with the $(iface_get_speed) later as well. >> >>> fi >>> } >>> >>> iface_is_vlan() { >>> local iface=$1 >>> [ -e "/proc/net/vlan/${iface}" ] && return 0 || return 1 >>> } >>> >>> iface_is_bridge() { >>> local iface=$1 >>> [ -e "/sys/class/net/${iface}/bridge" ] && return 0 || return 1 >>> } >>> >>> iface_is_bond() { >>> local iface=$1 >>> [ -e "/sys/class/net/${iface}/bonding" ] && return 0 || return 1 >>> } >> >> All these functions: >> iface_is_vlan() { [ -e "/proc/net/vlan/$1" ]; } >> iface_is_bridge() { [ -e "/sys/class/net/$1/bridge" ]; } >> iface_is_bond() { [ -e "/sys/class/net/$1/bonding" ]; } > > Please see above about implicit return codes. This should not take any > CPU cycles. > >> >>> vlan_get_phy() { >>> local iface=$1 >>> grep "^${iface} " "/proc/net/vlan/config" | sed -r 's/.*\| +(.*)/\1/' >> >> probably safe to assume that on a box running linux kernel >= 2.6.33 >> the sed supports -r, but you never know ;-) >> also, if you use sed aleady, have it do the pattern matching as well? >> >> vlan_get_phy() { sed -ne "s/^$1 .*| *//p" < /proc/net/vlan/config; } > > Ahm, good point, thanks. > >> >>> } >>> >>> bridge_is_stp_enabled() { >>> local iface=$1 >>> local stp >>> read stp < "/sys/class/net/${iface}/bridge/stp_state" >>> [ "${stp}" = "1" ] && return 0 || return 1 >> >> how about >> bridge_is_stp_enabled() { grep 1 < /sys/class/net/$1/bridge/stp_state; } > > This will be slower. > >> >> maybe add 2>/dev/null ? >> >> or, as we are bash: >> bridge_is_stp_enabled() { [[ $(</sys/class/net/$1/bridge/stp_state) = 1 >> ]]; } > > I'd say this is unreadable for me. I'd avoid non-trivial constructs with > no performance gain. And again, implicit vs explicit. > >> >> but some may prefer the more verbose thing. >> if so, at least do away with the && return || return. >> shell function return value is the "exit status" of the last command >> executed, so [[ $stp = 1 ]] is sufficient. > > But more error-prone on subsequent code modifications. > >> >>> } >>> >>> bridge_get_root_ports() { >>> local bridge=$1 >>> local root_id >>> local root_ports="" >>> local bridge_id >>> >>> read root_id < "/sys/class/net/${bridge}/bridge/root_id" >>> >>> for port in /sys/class/net/${bridge}/brif/* ; do >>> read bridge_id < "${port}/designated_bridge" >>> if [ "${bridge_id}" = "${root_id}" ] ; then >>> root_ports="${root_ports} ${port##*/}" >>> fi >>> done >> >> Yes, I think here the more verbose style is appropriate ;-) >> >>> echo "${root_ports# }" > > I'd leave this as-is. And, I remade this chunk of code, so this > construct is (probably) really needed now. Anyways, I hate implicit > assumptions. > >> >> Though this is not necessary: echo $root_ports would do this just fine. >> and, again, I'd prefer a non-local variable to pass the value >> over some $(subshell) thingy. > > Did it another way. > >> >>> } >>> >>> # From /inlude/linux/if_bridge.h: >>> #define BR_STATE_DISABLED 0 >>> #define BR_STATE_LISTENING 1 >>> #define BR_STATE_LEARNING 2 >>> #define BR_STATE_FORWARDING 3 >>> #define BR_STATE_BLOCKING 4 >>> >>> bridge_get_active_ports() { >>> local bridge=$1 >>> shift 1 >>> local ports="$*" >>> local active_ports="" >>> local port_state >>> local stp_state=bridge_is_stp_enabled ${bridge} >> >> some double quote missing there? > > I should be temporarily insane while writing that ;) > Fixed. > >> >>> local warn=0 >>> >>> if [ -z "${ports}" ] || [ "${ports}" = "detect" ] ; then >>> ports=$( bridge_get_root_ports ${bridge} ) >> >> if you did non-local root_ports, this would become >> bridge_get_root_ports $bridge >> # root ports now in $root_ports, >> # so you can assign port=$root_ports, >> # or just use $root_ports as is. > > Rewrote this. > >> >>> fi >>> >>> for port in $ports ; do >> >> there. no need to trim white space anyways, as I thought. > > I'd leave it. > >> >>> if [ ! -e "/sys/class/net/${bridge}/brif/${port}" ] ; then >>> ocf_log warning "Port ${port} doesn't belong to bridge >>> ${bridge}" >>> continue >>> fi >>> read port_state < "/sys/class/net/${bridge}/brif/${port}/state" >>> if [ "${port_state}" = "3" ] ; then >>> if [ -n "${active_ports}" ] && ${stp_state} ; then >> >> no need to re-check stp state for each iteration. >> do that once, and reference the result here. > > This nonsense is fixed. > >> >>> warn=1 >>> fi >>> active_ports="${active_ports} ${port}" >>> fi >>> done >>> if [ ${warn} -eq 1 ] ; then >>> ocf_log warning "More then one upstream port in bridge '${bridge}' >>> is in forwarding state while STP is enabled: ${active_ports}" >>> fi >>> echo "${active_ports# }" >> >> again, no need to trim white space, and I prefer non-local documented >> variables over $(subshell echo) style assignments. >> >>> } >>> >>> bridge_get_speed() { >>> local $iface=$1 >>> >>> if ! iface_is_bridge ${iface} ; then >>> echo 0 >>> return >>> fi >>> >>> local ports=$( bridge_get_active_ports ${iface} >>> ${OCF_RESKEY_bridge_ports} ) > > Still leaving this as subshell, will rethink later. > >>> for port in ${ports} ; do >>> : $(( aggregate_speed += $( iface_get_speed ${port} ) )) >>> done >>> echo ${aggregate_speed} >> >> This can be rewritten without subshells. > > Error-prone because of recursions. > I'd leave it as-is. > >> >> It could even be done without bashisms, >> (there is only one bashism in there, afaics) >> but who cares, we are bash anyways ;-) >> >>> } >>> >>> bond_get_slaves() { >>> local iface=$1 >>> local slaves >>> read slaves < "/sys/class/net/${iface}/bonding/slaves" >>> echo ${slaves} >> >> if that is what you wanted, why not just say cat? > > It is slower. > >> or, drop this function completely, as it only adds noise, >> and later do >> slaves=$(< /sys/class/net/${iface}/bonding/slaves) >> avoiding any subshells? > > Data access separation. > >> >>> } >>> >>> bond_get_active_iface() { >>> local iface=$1 >>> local active >>> read active < "/sys/class/net/${iface}/bonding/active_slave" >>> echo ${active} >>> } >> >> similar. >> >>> >>> bond_is_balancing() { >>> local iface=$1 >>> read mode mode_index < "/sys/class/net/${iface}/bonding/mode" >> >> >> You declare all variables in all your tiny to-be-used-as-subshell >> functions as local (needlessly, as they are visible in that subshell >> only anyways... but keep them local, we want to get rid of the subshell >> echo style assignments), but then here omit the local? how come? > > Having function variables local is simply a good habit I think. Like > having curly braces around one-statement blocks in C. > I rewrote some functions so they can be used without subshell at a cost > of additional variable. > >> >> not that it matters, really, they are used only here, anyways. >> >>> case ${mode} in >>> "balance-rr"|"balance-xor"|"802.3ad"|"balance-tlb"|"balance-alb") >>> return 0 >>> ;; >>> *) >>> return 1 >>> ;; >>> esac >>> } >>> >>> bond_get_speed() { >>> local iface=$1 >>> local aggregate_speed=0 >>> >>> if ! iface_is_bond ${iface} ; then >>> echo 0 >>> return >>> fi >>> >>> local slaves=$( bond_get_slaves ${iface} ) >>> if bond_is_balancing ${iface} ; then >>> for slave in ${slaves} ; do >>> : $(( aggregate_speed += $( iface_get_speed ${slave} ) )) >>> done >>> # Bonding is unable to get speed*n >>> : $(( aggregate_speed = aggregate_speed*8/10 )) >>> else >>> : $(( aggregate_speed = $( iface_get_speed $( bond_get_active_iface >>> ${iface} ) ) )) >>> fi >>> echo ${aggregate_speed} >> >> Again, I'd prefer this to be handled without subshells. > > Recursion. > >> >>> } >>> >>> update() { >>> local speed=$( iface_get_speed ${OCF_RESKEY_iface} ) >>> >>> : $(( score = speed * ${OCF_RESKEY_weight_base} / 10 )) >> >> hmmm... >> score = speed * 10 / 10 >> but I complained about that earlier already ;-) >> >>> attrd_updater -n ${OCF_RESKEY_name} -v ${score} -d ${OCF_RESKEY_dampen} >>> ${attrd_options} >>> rc=$? >>> case ${rc} in >>> 0) >>> ocf_is_true ${OCF_RESKEY_debug} && ocf_log debug "Updated >>> ${OCF_RESKEY_name} = ${score}" >>> ;; >>> *) >>> ocf_log warn "Could not update ${OCF_RESKEY_name} = ${score}: >>> rc=${rc}" >>> ;; >>> esac >>> return ${rc} >> >> all that fun just to ignore the return value of this function anyways? > > No. It was not ignored. > But I fixed implicit to explicit return in start(). > >> >> >>> } >>> >>> if [ `uname` != "Linux" ] ; then >>> ocf_log err "This RA works only on linux." >>> exit $OCF_ERR_INSTALLED >>> fi >>> >>> if ! ocf_is_true ${OCF_RESKEY_CRM_meta_globally_unique} ; then >>> : ${ha_pseudo_resource_name:="ifspeed-${OCF_RESKEY_name}"} >>> else >>> : ${ha_pseudo_resource_name:="ifspeed-${OCF_RESOURCE_INSTANCE}"} >>> fi >> >> At least one of those clone possibilities is probably nonsense, >> but wellm, that's how this ha_pseudo_resource stuff works. > > Dunno, just blindly copied this. > >> >>> attrd_options='-q' >>> if ocf_is_true ${OCF_RESKEY_debug} ; then >>> attrd_options='' >>> fi >>> >>> case $__OCF_ACTION in >>> meta-data) >>> meta_data >>> exit $OCF_SUCCESS >>> ;; >>> start) >>> start >>> ;; >>> stop) >>> stop >>> ;; >>> monitor) >>> monitor >>> ;; >>> reload) >>> start >>> ;; >>> validate-all) >>> validate >>> ;; >>> usage|help) >>> usage >>> exit $OCF_SUCCESS >>> ;; >>> *) >>> usage >>> exit $OCF_ERR_UNIMPLEMENTED >>> ;; >>> esac >>> exit $? >> >> The $? is not necessary there. > > I prefer to have it explicitly. > >> >> >> >> Good job! >> >> >> _______________________________________________ >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >> >> Project Home: http://www.clusterlabs.org >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: >> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker _______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker