Hello community,

here is the log from the commit of package resource-agents for openSUSE:Factory checked in at 2016-02-11 12:33:00
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/resource-agents (Old)
 and      /work/SRC/openSUSE:Factory/.resource-agents.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "resource-agents"

Changes:
--------
--- /work/SRC/openSUSE:Factory/resource-agents/resource-agents.changes 2016-01-23 01:04:16.000000000 +0100
+++ /work/SRC/openSUSE:Factory/.resource-agents.new/resource-agents.changes 2016-02-11 12:33:01.000000000 +0100
@@ -1,0 +2,16 @@
+Wed Feb 03 14:48:30 UTC 2016 - kgronl...@suse.com
+
+- Update to version 3.9.7+git.1454497075.e697f43:
+  + Medium: nfsserver: fix monitor for systemd
+  + galera: force crash recovery if needed during last commit detection
+  + galera: prevent recovered nodes from bootstrapping cluster when possible
+  + galera: remove bashism
+  + Add portal check to open_iscsi_get_session_id()
+
+-------------------------------------------------------------------
+Thu Jan 28 14:08:43 UTC 2016 - kgronl...@suse.com
+
+- Update to version 3.9.7~rc1+git.1453889774.3446b99:
+  + Low: ldirectord: Fix unset failcount error (bsc#962795)
+
+-------------------------------------------------------------------

Old:
----
  resource-agents-3.9.6+git.1452867140.fc8ace0.tar.xz

New:
----
  resource-agents-3.9.7+git.1454497075.e697f43.tar.xz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ resource-agents.spec ++++++
--- /var/tmp/diff_new_pack.rphIgX/_old 2016-02-11 12:33:02.000000000 +0100
+++ /var/tmp/diff_new_pack.rphIgX/_new 2016-02-11 12:33:02.000000000 +0100
@@ -48,7 +48,7 @@
 Summary:        Open Source HA Reusable Cluster Resource Scripts
 License:        GPL-2.0 and LGPL-2.1+ and GPL-3.0+
 Group:          Productivity/Clustering/HA
-Version:        3.9.6+git.1452867140.fc8ace0
+Version:        3.9.7+git.1454497075.e697f43
 Release:        0
 Url:            http://linux-ha.org/
 Source:         resource-agents-%{version}.tar.xz

++++++ _service ++++++
--- /var/tmp/diff_new_pack.rphIgX/_old 2016-02-11 12:33:02.000000000 +0100
+++ /var/tmp/diff_new_pack.rphIgX/_new 2016-02-11 12:33:02.000000000 +0100
@@ -4,7 +4,7 @@
     <param name="scm">git</param>
     <param name="exclude">.git</param>
     <param name="filename">resource-agents</param>
-    <param name="versionformat">3.9.6+git.%ct.%h</param>
+    <param name="versionformat">3.9.7+git.%ct.%h</param>
     <param name="revision">master</param>
     <param name="changesgenerate">enable</param>
   </service>

++++++ _servicedata ++++++
--- /var/tmp/diff_new_pack.rphIgX/_old 2016-02-11 12:33:02.000000000 +0100
+++ /var/tmp/diff_new_pack.rphIgX/_new 2016-02-11 12:33:02.000000000 +0100
@@ -1,4 +1,4 @@
 <servicedata>
 <service name="tar_scm">
             <param name="url">git://github.com/ClusterLabs/resource-agents.git</param>
-          <param name="changesrevision">fc8ace0c3eb428990d4373e0833c604a27d3db8c</param></service></servicedata>
\ No newline at end of file
+          <param name="changesrevision">e697f43c4e59a47bd0dc7c093b7d46174035c2dd</param></service></servicedata>
\ No newline at end of file

++++++ resource-agents-3.9.6+git.1452867140.fc8ace0.tar.xz -> resource-agents-3.9.7+git.1454497075.e697f43.tar.xz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/resource-agents-3.9.6+git.1452867140.fc8ace0/ChangeLog new/resource-agents-3.9.7+git.1454497075.e697f43/ChangeLog
--- old/resource-agents-3.9.6+git.1452867140.fc8ace0/ChangeLog 2016-01-18 08:16:19.000000000 +0100
+++ new/resource-agents-3.9.7+git.1454497075.e697f43/ChangeLog 2016-02-03 15:48:30.000000000 +0100
@@ -1,3 +1,101 @@
+* Wed Feb 3 2016 resource-agents contributors
+- stable release 3.9.7
+- ldirectord: fix unset failcount error
+- iscsi: add portal check to open_iscsi_get_session_id()
+- galera: use mysql's --tc-heuristic-recover if crash recovery is needed
+- nfsserver: fix monitor for systemd
+
+* Wed Jan 20 2016 resource-agents contributors
+- release candidate 3.9.7 rc1
+- nfsserver.sh: add hostname attribute for NFS export (required for NFSv4+Kerberos support)
+- oradg.sh: new RA for Oracle Data Guard
+- ocf_shellfuncs: suppress bash specific trace_ra log on dash
+- sg_persist: remove uncalled for ocf_run calls
+- multiple RA: replace error log messages with calls to ocf_exit_reason
+- nfsserver: only do redhat specific stuff on redhat
+- exportfs: don't increment fsid for single directory
+- Filesystem: add tmpfs support
+- netfs.sh: move defaults to metadata
+- nfsserver: /var/lock/subsys is non-standard, check for it first
+- nagios: new RA
+- docker: check for errors in the container name
+- mysql: fix grep failure on MySQL 5.6 or higher when checking read_only variable
+- VirtualDomain: new attributes migration_speed and migration_downtime
+- fs: remove not-working tmpfs support
+- vm.sh: add migrate_options parameter
+- nfsserver: Use rpc-statd.service for NFS locking in EXEC_MODE=3 (bsc#955114)
+- nfsserver: Add EXEC_MODE for systemd without nfs-lock.service (bsc#955114)
+- IPaddr2: Add IPv6 DAD collision detection
+- Filesystem: add overlay as supported filesystem
+- ldirectord: dns_check and fallbackcommand enhancements
+- IPaddr2: fix potential syntax error on if-then-else
+- SAPDatabase: add Oracle 12 to list of supported databases (bsc#953991)
+- mysql-common.sh: fix issue where "removing old PID file" wasnt logged
+- mysql-common.sh: when mysql has been stopped, mysql stop returns success
+- mysql.sh: wait up to startup_wait seconds before failing if mysqld startup is slow
+- orainstance.sh: fix 90s wait/killing of databases containing the name of the database being killed, and added cleanup code to kill remaining listener process
+- ip.sh: Use DAD to check for IPv6 address collision
+- iSCSITarget: fix to only create one IQN and add portals to it
+- galera: document the bootstrap flow
+- galera: start joining nodes during 'monitor' to allow long-running SST
+- galera: add support for MYSQL_HOST and MYSQL_PORT from /etc/sysconfig/clustercheck
+- redis: fix password parser
+- pgsql fix exec_sql errors like "unknown variable select pg_ " in dash
+- pgsql: fix get_my_location() sql regression
+- docker: fix image variable name
+- pgsql: Fix return code override in pgsql_real_start()
+- slapd: add "maxfiles" parameter to set max number of open files (for ulimit -n)
+- redis: use required client password when set
+- send_arp: fix for infiniband, re-merge from upstream iputils arping
+- CTDB: Preserve smb.conf permissions (bsc#935253)
+- lxc: fix emergency stop functionality on 1.0
+- tomcat: use runuser instead of su for SELinux enforcing mode
+- pgsql: use runuser intead of su command for SELinux enforcing mode
+- docker: image name check fixes
+- iSCSITarget: properly create portals for lio-t implementation
+- iSCSILogicalUnit: when deleting a LUN or initiator fails with lio-t, proceed with warning
+- iSCSILogicalUnit: return OCF_NOT_RUNNING on monitor if backing path does not exist
+- iSCSILogicalUnit: add check for leftover target/core entries for lio-t
+- pgsql: delete old replication slot when creating a new slot.
+- Filesystem: support RozoFS
+- orainstance.sh: interpret listener stop results correctly
+- dhcpd: use correct default chroot for RHEL based systems
+- LVM: allow vgck failures if partial_activation is true
+- redis: avoid 0 byte dump.rdb start failures
+- docker: fix container_exist test
+- redis: fixed start operation if replication sync takes > 20 seconds
+- ethmonitor: add link_status_only option for skipping RX counter and arping tests
+- clvm: fix issue with only first option of daemon_options being used
+- IPsrcaddr: return correct error code during stop when misconfigured
+- clvm: activate_vgs option for enable/disable of automatic vg activation
+- galera: properly redetect bootstrap after demote
+- galera: clear last know sequence number any time promote is even attempted
+- asterisk: fix return code
+- galera: retrieve last sequence number without using read-only mode
+- redis: add wait_last_known_master option
+- redis: only connect to active master instances
+- redis: do not attempt to demote if redis is dead
+- redis: reliable shutdown.
+- pgsql: add support for replication slots
+- redis: set executable bit to be able to greate docs (make rpm)
+- rabbitmq-cluster: fix rmq_join_list() to only return online nodes
+- rabbitmq-cluster: new RA
+- Filesystem: support overlayfs
+- sg_persist: use default binary setting in meta-data
+- dnsupdate: use nsupdate_opts parameter
+- nfsserver: merge options into existing /etc/sysconfig/nfs
+- portblock: portno param can be a string like 137,138
+- portblock: replace ancient heartbeat config with crm configure
+- portblock: clarify TCP RST vs ICMP port unreachable
+- VirtualDomain: enforce C locale in force_stop
+- redis: retry on unknown error when starting
+- redis: remove stop timeout and add placeholder master during election period
+- CTDB: Change default socket location to CTDB's expected default.
+- multiple RA: make sure that the pidfile directory exist
+- multiple RA: create state-directory writable by the application
+- orainstance.sh: Handle ORA-* error messages
+- redis: new RA
+
 * Thu Jan 29 2015 resource-agents contributors
 - stable release 3.9.6
 - VirtualDomain: add migrate_options parameter
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/resource-agents-3.9.6+git.1452867140.fc8ace0/heartbeat/README.galera new/resource-agents-3.9.7+git.1454497075.e697f43/heartbeat/README.galera
--- old/resource-agents-3.9.6+git.1452867140.fc8ace0/heartbeat/README.galera 2016-01-18 08:16:19.000000000 +0100
+++ new/resource-agents-3.9.7+git.1454497075.e697f43/heartbeat/README.galera 2016-02-03 15:48:30.000000000 +0100
@@ -25,7 +25,7 @@

 ### Bootstrap the cluster with the right node

-When synced, the nodes of a galera clusters have in common a last seqno,
+When synced, the nodes of a galera cluster have in common a last seqno,
 which identifies the last transaction considered successful by a
 majority of nodes in the cluster (think quorum).
@@ -130,3 +130,20 @@
   node started and entered the Galera cluster
 - Deleted: during recurring slave monitor in `check_sync_status()`
   as soon as the Galera code reports to be SYNC-ed.
+
+### heuristic-recovered
+
+If a galera node was unexpectedly killed in a middle of a replication,
+InnoDB can retain the equivalent of a XA transaction in prepared state
+in its redo log. If so, mysqld cannot recover state (nor last seqno)
+automatically, and special recovery heuristic has to be used to
+unblock the node.
+
+This attribute is used to keep track of forced recoveries to prevent
+bootstrapping a cluster from a recovered node when possible.
+
+- Used : during `detect_first_master()` to elect the bootstrap node
+- Created: in `detect_last_commit()` if the node has a pending XA
+  transaction to recover in the redo log
+- Deleted: when a node is promoted to Master. This attribute is
+  kept in the CIB if a node in stopped.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/resource-agents-3.9.6+git.1452867140.fc8ace0/heartbeat/galera new/resource-agents-3.9.7+git.1454497075.e697f43/heartbeat/galera
--- old/resource-agents-3.9.6+git.1452867140.fc8ace0/heartbeat/galera 2016-01-18 08:16:19.000000000 +0100
+++ new/resource-agents-3.9.7+git.1454497075.e697f43/heartbeat/galera 2016-02-03 15:48:30.000000000 +0100
@@ -279,6 +279,22 @@
 }

+set_heuristic_recovered()
+{
+    ${HA_SBIN_DIR}/crm_attribute -N $NODENAME -l reboot --name "${INSTANCE_ATTR_NAME}-heuristic-recovered" -v "true"
+}
+
+clear_heuristic_recovered()
+{
+    ${HA_SBIN_DIR}/crm_attribute -N $NODENAME -l reboot --name "${INSTANCE_ATTR_NAME}-heuristic-recovered" -D
+}
+
+is_heuristic_recovered()
+{
+    local node=$1
+    ${HA_SBIN_DIR}/crm_attribute -N $node -l reboot --name "${INSTANCE_ATTR_NAME}-heuristic-recovered" -Q 2>/dev/null
+}
+
 clear_last_commit()
 {
     ${HA_SBIN_DIR}/crm_attribute -N $NODENAME -l reboot --name "${INSTANCE_ATTR_NAME}-last-committed" -D
@@ -337,7 +353,7 @@
         return $OCF_ERR_GENERIC
     fi

-    if [ "$state" == "4" -a "$ready" == "ON" ]; then
+    if [ "$state" = "4" -a "$ready" = "ON" ]; then
         ocf_log info "local node synced with the cluster"
         # when sync is finished, we are ready to switch to Master
         clear_sync_needed
@@ -429,8 +445,19 @@
     local best_node="$NODENAME"
     local last_commit=0
     local missing_nodes=0
+    local nodes=""
+    local nodes_recovered=""

+    # avoid selecting a recovered node as bootstrap if possible
     for node in $(echo "$OCF_RESKEY_wsrep_cluster_address" | sed 's/gcomm:\/\///g' | tr -d ' ' | tr -s ',' ' '); do
+        if is_heuristic_recovered $node; then
+            nodes_recovered="$nodes_recovered $node"
+        else
+            nodes="$nodes $node"
+        fi
+    done
+
+    for node in $nodes_recovered $nodes; do
         last_commit=$(get_last_commit $node)

         if [ -z "$last_commit" ]; then
@@ -517,14 +544,77 @@

     if ocf_is_true $bootstrap; then
         clear_bootstrap_node
+        # clear attribute heuristic-recovered. if last shutdown was
+        # not clean, we cannot be extra-cautious by requesting a SST
+        # since this is the bootstrap node
+        clear_heuristic_recovered
     else
         set_sync_needed
+        # attribute heuristic-recovered will be cleared once the joiner
+        # has finished syncing and is promoted to Master
     fi

     ocf_log info "Galera started"
     return $OCF_SUCCESS
 }

+detect_last_commit()
+{
+    local last_commit
+    local recover_args="--defaults-file=$OCF_RESKEY_config \
+                        --pid-file=$OCF_RESKEY_pid \
+                        --socket=$OCF_RESKEY_socket \
+                        --datadir=$OCF_RESKEY_datadir \
+                        --user=$OCF_RESKEY_user"
+    local recovered_position_regex='s/.*WSREP\:\s*[R|r]ecovered\s*position.*\:\(.*\)\s*$/\1/p'
+
+    ocf_log info "attempting to detect last commit version by reading ${OCF_RESKEY_datadir}/grastate.dat"
+    last_commit="$(cat ${OCF_RESKEY_datadir}/grastate.dat | sed -n 's/^seqno.\s*\(.*\)\s*$/\1/p')"
+    if [ -z "$last_commit" ] || [ "$last_commit" = "-1" ]; then
+        local tmp=$(mktemp)
+        local tmperr=$(mktemp)
+
+        ocf_log info "now attempting to detect last commit version using 'mysqld_safe --wsrep-recover'"
+
+        ${OCF_RESKEY_binary} $recover_args --wsrep-recover > $tmp 2> $tmperr
+
+        last_commit="$(cat $tmp | sed -n $recovered_position_regex)"
+        if [ -z "$last_commit" ]; then
+            # Galera uses InnoDB's 2pc transactions internally. If
+            # server was stopped in the middle of a replication, the
+            # recovery may find a "prepared" XA transaction in the
+            # redo log, and mysql won't recover automatically
+
+            cat $tmperr | grep -q -E '\[ERROR\]\s+Found\s+[0-9]+\s+prepared\s+transactions!' 2>/dev/null
+            if [ $? -eq 0 ]; then
+                # we can only rollback the transaction, but that's OK
+                # since the DB will get resynchronized anyway
+                ocf_log warn "local node <${NODENAME}> was not shutdown properly. Rollback stuck transaction with --tc-heuristic-recover"
+                ${OCF_RESKEY_binary} $recover_args --wsrep-recover \
+                                     --tc-heuristic-recover=rollback > $tmp 2>/dev/null
+
+                last_commit="$(cat $tmp | sed -n $recovered_position_regex)"
+                if [ ! -z "$last_commit" ]; then
+                    ocf_log warn "State recovered. force SST at next restart for full resynchronization"
+                    rm -f ${OCF_RESKEY_datadir}/grastate.dat
+                    # try not to use this node if bootstrap is needed
+                    set_heuristic_recovered
+                fi
+            fi
+        fi
+        rm -f $tmp $tmperr
+    fi
+
+    if [ ! -z "$last_commit" ]; then
+        ocf_log info "Last commit version found: $last_commit"
+        set_last_commit $last_commit
+        return $OCF_SUCCESS
+    else
+        ocf_exit_reason "Unable to detect last known write sequence number"
+        clear_last_commit
+        return $OCF_ERR_GENERIC
+    fi
+}

 galera_promote()
 {
@@ -547,6 +637,8 @@
     # promoting other masters only performs sanity checks
     # as the joining nodes were started during the "monitor" op
     if ! check_sync_needed; then
+        # sync is done, clear info about last recovery
+        clear_heuristic_recovered
         return $OCF_SUCCESS
     else
         ocf_exit_reason "Attempted to promote local node while sync was still needed."
@@ -569,13 +661,15 @@
     clear_last_commit
     clear_sync_needed

-    # record last commit by "starting" galera. start is just detection of the last sequence number
-    galera_start
+    # record last commit for next promotion
+    detect_last_commit
+    rc=$?
+    return $rc
 }

 galera_start()
 {
-    local last_commit
+    local rc

     echo $OCF_RESKEY_wsrep_cluster_address | grep -q $NODENAME
     if [ $? -ne 0 ]; then
@@ -591,34 +685,11 @@

     mysql_common_prepare_dirs

-    ocf_log info "attempting to detect last commit version by reading ${OCF_RESKEY_datadir}/grastate.dat"
-    last_commit="$(cat ${OCF_RESKEY_datadir}/grastate.dat | sed -n 's/^seqno.\s*\(.*\)\s*$/\1/p')"
-    if [ -z "$last_commit" ] || [ "$last_commit" = "-1" ]; then
-        ocf_log info "now attempting to detect last commit version using 'mysqld_safe --wsrep-recover'"
-        local tmp=$(mktemp)
-        ${OCF_RESKEY_binary} --defaults-file=$OCF_RESKEY_config \
-            --pid-file=$OCF_RESKEY_pid \
-            --socket=$OCF_RESKEY_socket \
-            --datadir=$OCF_RESKEY_datadir \
-            --user=$OCF_RESKEY_user \
-            --wsrep-recover > $tmp 2>&1
-
-        last_commit="$(cat $tmp | sed -n 's/.*WSREP\:\s*[R|r]ecovered\s*position.*\:\(.*\)\s*$/\1/p')"
-        rm -f $tmp
-
-        if [ "$last_commit" = "-1" ]; then
-            last_commit="0"
-        fi
-    fi
-
-    if [ -z "$last_commit" ]; then
-        ocf_exit_reason "Unable to detect last known write sequence number"
-        clear_last_commit
-        return $OCF_ERR_GENERIC
+    detect_last_commit
+    rc=$?
+    if [ $rc -ne $OCF_SUCCESS ]; then
+        return $rc
     fi
-    ocf_log info "Last commit version found: $last_commit"
-
-    set_last_commit $last_commit

     master_exists
     if [ $? -eq 0 ]; then
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/resource-agents-3.9.6+git.1452867140.fc8ace0/heartbeat/iscsi new/resource-agents-3.9.7+git.1454497075.e697f43/heartbeat/iscsi
--- old/resource-agents-3.9.6+git.1452867140.fc8ace0/heartbeat/iscsi 2016-01-18 08:16:19.000000000 +0100
+++ new/resource-agents-3.9.7+git.1454497075.e697f43/heartbeat/iscsi 2016-02-03 15:48:30.000000000 +0100
@@ -268,14 +268,16 @@
 }
 open_iscsi_get_session_id() {
     local target="$1"
+    local portal="$2"
     $iscsiadm -m session 2>/dev/null |
         grep -E "$target($|[[:space:]])" |
+        grep -E "] $portal" |
         awk '{print $2}' | tr -d '[]'
 }
 open_iscsi_remove() {
     local target="$1"
     local session_id
-    session_id=`open_iscsi_get_session_id "$target"`
+    session_id=`open_iscsi_get_session_id "$target" "$OCF_RESKEY_portal"`
     if [ "$session_id" ]; then
         $iscsiadm -m session -r $session_id -u
     else
@@ -296,7 +298,7 @@
     local recov

     recov=${2:-$OCF_RESKEY_try_recovery}
-    session_id=`open_iscsi_get_session_id "$target"`
+    session_id=`open_iscsi_get_session_id "$target" "$OCF_RESKEY_portal"`
     prev_state=""
     if [ -z "$session_id" ]; then
         if $iscsiadm -m node -p $OCF_RESKEY_portal -T $target >/dev/null 2>&1; then
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/resource-agents-3.9.6+git.1452867140.fc8ace0/heartbeat/nfsserver new/resource-agents-3.9.7+git.1454497075.e697f43/heartbeat/nfsserver
--- old/resource-agents-3.9.6+git.1452867140.fc8ace0/heartbeat/nfsserver 2016-01-18 08:16:19.000000000 +0100
+++ new/resource-agents-3.9.7+git.1454497075.e697f43/heartbeat/nfsserver 2016-02-03 15:48:30.000000000 +0100
@@ -293,10 +293,42 @@
     fi
 }

+nfsserver_systemd_monitor()
+{
+    local threads_num
+    local rc
+
+    nfs_exec is-active
+    rc=$?
+
+    # Now systemctl is-active can't detect the failure of kernel process like nfsd.
+    # So, if the return value of systemctl is-active is 0, check the threads number
+    # to make sure the process is running really.
+    # /proc/fs/nfsd/threads has the numbers of the nfsd threads.
+    if [ $rc -eq 0 ]; then
+        threads_num=`cat /proc/fs/nfsd/threads 2>/dev/null`
+        if [ $? -eq 0 ]; then
+            if [ $threads_num -gt 0 ]; then
+                return $OCF_SUCCESS
+            else
+                return 3
+            fi
+        else
+            return $OCF_ERR_GENERIC
+        fi
+    fi
+
+    return $rc
+}
+
 nfsserver_monitor ()
 {
+    set_exec_mode
     fn=`mktemp`
-    nfs_exec status > $fn 2>&1
+    case $EXEC_MODE in
+    1) nfs_exec status > $fn 2>&1;;
+    [23]) nfsserver_systemd_monitor > $fn 2>&1;;
+    esac
     rc=$?
     ocf_log debug "$(cat $fn)"
     rm -f $fn
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/resource-agents-3.9.6+git.1452867140.fc8ace0/ldirectord/ldirectord.in new/resource-agents-3.9.7+git.1454497075.e697f43/ldirectord/ldirectord.in
--- old/resource-agents-3.9.6+git.1452867140.fc8ace0/ldirectord/ldirectord.in 2016-01-18 08:16:19.000000000 +0100
+++ new/resource-agents-3.9.7+git.1454497075.e697f43/ldirectord/ldirectord.in 2016-02-03 15:48:30.000000000 +0100
@@ -2108,7 +2108,7 @@
     my $new_rsrv;
     my $rsrv;

-    $new_rsrv = {"server"=>$ip, "port"=>$port};
+    $new_rsrv = {"server"=>$ip, "port"=>$port, "failcount"=>0};

     $flags =~ /(\w+)(.*)/ && ($1 eq "gate" || $1 eq "masq" || $1 eq "ipip")
         or &config_error($line, "forward method must be gate, masq or ipip");
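
Reading aid for the nfsserver hunk above: nfsserver_systemd_monitor() does not trust a successful "systemctl is-active" on its own and additionally requires a positive nfsd thread count. The following is a minimal standalone sketch of that decision, assuming a POSIX sh. The function name check_nfsd_threads and its optional file argument are hypothetical, introduced only so the logic can be tried against a scratch file. The agent itself reads /proc/fs/nfsd/threads directly and uses the OCF_* constants, which are written here as plain numbers.

```shell
#!/bin/sh
# Sketch only, not part of the commit. Mirrors the thread-count check in
# nfsserver_systemd_monitor(). check_nfsd_threads and its file argument
# are hypothetical names introduced for illustration.

check_nfsd_threads() {
    # Default to the path used in the diff; allow an override for testing.
    threads_file=${1:-/proc/fs/nfsd/threads}

    # If the file cannot be read, report a generic error
    # (1, the value of OCF_ERR_GENERIC).
    threads_num=$(cat "$threads_file" 2>/dev/null) || return 1

    # One or more nfsd threads: the kernel server is up (0, OCF_SUCCESS).
    if [ "${threads_num:-0}" -gt 0 ] 2>/dev/null; then
        return 0
    fi

    # No threads: the same literal the diff returns for this case.
    return 3
}
```

Called without an argument the function inspects the live system; passing a path, as in `check_nfsd_threads /tmp/threads`, checks a scratch file instead. Unlike the hunk, the sketch also treats empty or non-numeric file content as "no threads".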