[Bug 1688613] Re: pgsql RA has problems with pacemaker version

2017-07-22 Thread Michael Banck
Any update here?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1688613

Title:
  pgsql RA has problems with pacemaker version

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/1688613/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1688613] Re: pgsql RA has problems with pacemaker version

2017-06-23 Thread Michael Banck
Sorry that you had to do work now, but I wrote in comment #4:

"About later versions, my understanding is that the resource-agents
source package was neglected for years in Debian until a colleague of
mine updated it about a year ago. So the non-LTS releases after trusty
likely have the same issues, but they are EOL. xenial has 3.9.7 which is
from 2016 and is fine."

That was a bit convoluted, but has the same conclusion you arrived at I
think.

Also note that I posted a debdiff, so packaging should be mostly dealt
with.

I built the package on trusty and tested it on my 2-node virt-manager
Pacemaker test-system. I understand that this might not be enough, but
note that you really need a full two-node corosync/pacemaker setup to
reproduce the bug and test the updated package. I am happy to offer
assistance here if needed.



[Bug 1688613] Re: pgsql RA has problems with pacemaker version

2017-06-23 Thread ChristianEhrhardt
Clearly no one wants to dump work on you, so I checked the commits vs
the git repo and the latest that got upstream is
b7911abce27889becc8a4637e003bfcf5ef1b15e in 3.9.7

1:3.9.7-1 is in Xenial - so marking tasks correctly.

** Changed in: resource-agents (Ubuntu)
   Status: New => Fix Released

** Tags added: bitesize



[Bug 1688613] Re: pgsql RA has problems with pacemaker version

2017-06-23 Thread ChristianEhrhardt
Since you already worked out the patch (and thanks for the update BTW) the 
remaining task is packaging, building and testing - I call that less effort 
than usual and tag it bitesize for that.
In some sense this should make it even more likely to be picked up rather soon.



[Bug 1688613] Re: pgsql RA has problems with pacemaker version

2017-06-23 Thread ChristianEhrhardt
Hi Michael,
I have to apologize on behalf of our team for not getting to your bug; there is 
no particular reason it was neglected other than our being fully busy with even 
more pressing things.
OTOH we clearly want to encourage and support very responsive and active 
reporters.
Adding server-next tag which will give it a higher visibility.

The only thing left you can help with is to clarify the last part of Nish's update:
"Can you also help me understand if the issues exist in later versions of 
resource-agents?"

Because for SRU purposes things have to be fixed in the current
version and then backported (otherwise an upgrading user will regress).
So understanding if it is fixed in later versions (and which ones) will
help a lot to check which versions to SRU to and/or if we need to start
with the current development release.

** Tags added: server-next



[Bug 1688613] Re: pgsql RA has problems with pacemaker version

2017-06-22 Thread Michael Banck
Any news on this? Should I be adding more information or can I help
otherwise?



[Bug 1688613] Re: pgsql RA has problems with pacemaker version

2017-05-30 Thread Michael Banck
Attached is an updated debdiff with dep3 header.

** Patch added: "updated debdiff with DEP3"
   
https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/1688613/+attachment/4886061/+files/resource-agents_3.9.3+git20121009-3ubuntu3.debdiff

** Description changed:

+ [Impact]
+ 
+ The pgsql Pacemaker resource agent
+ (/usr/lib/ocf/resource.d/heartbeat/pgsql from the resource-agents
+ package) implements a Pacemaker Master/Slave set. Besides regular
+ actions like starting/stopping/monitoring the resource, this also
+ includes monitoring of the transaction log position on each node and
+ assigning a score to a node and implementing promote/demote actions. In
+ the case of failed monitoring of the Master, Pacemaker may decide to
+ failover to a Slave based on the Slave's score.
+ 
+ Due to version skew in 14.04 between the version of Pacemaker (1.1.10)
+ and resource-agents (3.9.3), the Pacemaker output of various status
+ commands differs slightly from what the pgsql resource agent expects,
+ so the agent parses it incorrectly. In particular, since Pacemaker 1.1.8,
+ the so-called instance number is no longer appended to the resource name
+ (like pgsql:1) if the property globally-unique is set to false.
+ 
+ This leads to the following two problems:
+ 
+ - The call to crm_attribute fails because the agent appends the instance
+ number to the resource name.  It first tries to read the current score,
+ but as the requested resource name:instance does not exist, it gets back
+ an error message and subsequently leaves the score of the Standby at the
+ default of 1000 and not at 100 as it should be.
+ 
+ To wit:
+ 
+ # crm_attribute -l reboot -N h1db2 -n "master-pgsql:1" -G -q
+ Error performing operation: No such device or address
+ # crm_attribute -l reboot -N h1db2 -n "master-pgsql" -G -q
+ 100
+ 
+ - The output of crm_mon no longer includes a colon, so the resource
+ agent on the Standby believes no Master is present and is not able to
+ get the Master's transaction log position.
+ 
+ To wit:
+ 
+ # crm_mon -n1 | grep Master
+ pgsql   (ocf::heartbeat:pgsql): Master
+ # crm_mon -n1 | tr -d "\t" | grep Master
+ pgsql(ocf::heartbeat:pgsql):Master
+ 
+ The pgsql resource agent's original grep was running 'grep -q
+ "^${RESOURCE_NAME}:.* Master"' (where $RESOURCE_NAME=pgsql) on the last
+ line, which turned up no hits (or rather, a non-zero exit status).
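
As an illustration of the grep mismatch described above, here is a minimal
sketch that runs without a cluster. It is not the upstream fix. The first
sample line is the crm_mon output quoted in this report; the second, with an
instance number appended, is an assumed older format used only for
comparison. The function names and the relaxed pattern are hypothetical.

```shell
#!/bin/sh
# Sketch only: reproduces the pattern mismatch without a running cluster.
RESOURCE_NAME=pgsql

# crm_mon line as quoted in this report (no instance number).
new_line='pgsql   (ocf::heartbeat:pgsql): Master'
# Assumed older format with an instance number, for comparison only.
old_line='pgsql:1 (ocf::heartbeat:pgsql): Master'

# Pattern quoted in the report: requires a colon right after the name.
matches_original() {
    printf '%s\n' "$1" | grep -q "^${RESOURCE_NAME}:.* Master"
}

# Hypothetical relaxed pattern: the instance number is optional.
matches_relaxed() {
    printf '%s\n' "$1" |
        grep -Eq "^${RESOURCE_NAME}(:[0-9]+)?[[:space:]]*\(.*Master"
}

for line in "$old_line" "$new_line"; do
    if matches_original "$line"; then o=match; else o="no match"; fi
    if matches_relaxed "$line"; then r=match; else r="no match"; fi
    printf '%s\n  original: %s, relaxed: %s\n' "$line" "$o" "$r"
done
```

The original pattern only matches the line that carries an instance number,
which is consistent with the Standby logging "Master does not exist."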
+ 
+ This gives Pacemaker wrong input to decide in failover situations,
+ resulting in possibly spurious failovers. As Pacemaker is typically
+ deployed in business-critical setups, any unneeded failover implies a
+ (possibly short but) unwanted downtime.  Fixing them will make it
+ possible to use PostgreSQL streaming replication in a high-availability
+ fashion on 14.04.
+ 
+ The problems have since been fixed in the upstream resource-agents
+ repository (https://github.com/ClusterLabs/resource-agents/). The
+ appropriate upstream commits have been squashed into a single patch.
+ 
+ [Test Case]
+ 
+ Set up a two-node PostgreSQL Pacemaker cluster on 14.04 according to e.g.
+ https://github.com/gocardless/our-postgresql-setup/blob/master/postgresql-cluster-setup.sh
+ Note that gocardless ship a patched version of the pgsql resource agent as
+ well, so revert commit
+ https://github.com/gocardless/our-postgresql-setup/commit/2511b9441d43996a3e45604080dedfac9a490c28
+ or comment out the deployment of the patched pgsql.
+ 
+ After setup, the score of the Standby will be 1000 with the current
+ resource-agents package, and after installation of the proposed SRU
+ package, it will be 100.
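
The score check from the test case could be scripted roughly as follows.
This is a sketch under assumptions: strip_instance, get_master_score and the
CRM_ATTRIBUTE override are hypothetical helpers and not part of the resource
agent; only the crm_attribute options are taken from the commands quoted in
this report.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from the pgsql RA.

# Drop a trailing ":N" clone instance number, e.g. pgsql:1 -> pgsql.
strip_instance() {
    printf '%s\n' "$1" | sed 's/:[0-9][0-9]*$//'
}

# Query the master score of a resource on a node, using the attribute
# name without the instance number. CRM_ATTRIBUTE may name a stand-in
# command for dry runs (an assumption of this sketch).
get_master_score() {
    node=$1
    rsc=$(strip_instance "$2")
    "${CRM_ATTRIBUTE:-crm_attribute}" -l reboot -N "$node" \
        -n "master-${rsc}" -G -q
}
```

Calling get_master_score h1db2 pgsql:1 on a cluster node would issue the
second crm_attribute command quoted in the [Impact] section, i.e. the one
that queries master-pgsql.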
+ 
+ [Regression Potential]
+ 
+ As the commits are from upstream and fix currently broken behaviour in a
+ localized fashion, there should be no regressions.
+ 
+ The patch has been deployed by our customer for three weeks now and they
+ reported no problems.
+ 
+ [Other Info]
+ 
+ I am happy to answer further questions.
+ 
+ [Original Description]
+ 
  We were debugging an unexpected failover of a PostgreSQL-9.3 Pacemaker
  cluster running on 14.04 LTS at a client. As a timeline, the client put
  back the second node (h1db2) into the cluster at around 9:10 AM, and the
  unexpected failover occurred at 10:32 AM.
  
  What exactly led to the failover could not be exactly figured out, but
  two problems were apparent from the logs:
  
  1. The standby monitor action thought there was no master running:
  
  May  4 09:10:59 h1db2 pgsql(pgsql)[2460]: INFO: Master does not exist.
  [Message repeats 175 times]
  May  4 10:32:04 h1db2 pgsql(pgsql)[2611]: INFO: Master does not exist.
  [...]
  May  4 10:32:04 h1db2 pgsql(pgsql)[2611]: INFO: I have a master right.
  
  At this point, Pacemaker decided to promote h1db2.
  
  2. Between 9:10 and 10:32, the score of the standby was -INFINITY,
  at 10:32 it was then set to the same score as the master (1000) while it
  should be 100 for standbys.
  
  Both problems were debugged and traced back to bugs in the pgsql
  resource agent version in 

[Bug 1688613] Re: pgsql RA has problems with pacemaker version

2017-05-24 Thread Michael Banck
@nacc: Thanks for the comments, but I probably won't have time to work
on the SRU template this week as I'm on vacation, or is there a point
release deadline coming up?

About later versions, my understanding is that the resource-agents
source package was neglected for years in Debian until a colleague of
mine updated it about a year ago. So the non-LTS releases after trusty
likely have the same issues, but they are EOL. xenial has 3.9.7 which is
from 2016 and is fine.

About DEP3 headers, I'll look into it; I didn't do it because I thought
the other patches did not have headers either, but I guess it makes
sense for an SRU.



[Bug 1688613] Re: pgsql RA has problems with pacemaker version

2017-05-23 Thread Nish Aravamudan
@mbanck: thank you very much for your extensive analysis. Note that for
SRU purposes we will eventually need to provide an SRU template in the
Description per: https://wiki.ubuntu.com/StableReleaseUpdates. Would you
be able to start doing that?

Can you also help me understand if the issues exist in later versions of
resource-agents?

Finally, and this may not be trivial because of the number of commits,
but would it be possible to add DEP3 headers to the debdiff?
dep.debian.net/deps/dep3/

** Also affects: resource-agents (Ubuntu Trusty)
   Importance: Undecided
   Status: New

** Changed in: resource-agents (Ubuntu Trusty)
Milestone: None => ubuntu-14.04.5

** Changed in: resource-agents (Ubuntu)
Milestone: ubuntu-14.04.5 => None

** Changed in: resource-agents (Ubuntu Trusty)
   Status: New => Triaged

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to resource-agents in Ubuntu.
https://bugs.launchpad.net/bugs/1688613

Title:
  pgsql RA has problems with pacemaker version

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/1688613/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1688613] Re: pgsql RA has problems with pacemaker version

2017-05-05 Thread Ubuntu Foundations Team Bug Bot
** Tags added: patch



[Bug 1688613] Re: pgsql RA has problems with pacemaker version

2017-05-05 Thread Michael Banck
Debdiff attached.

** Patch added: "Proposed debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/1688613/+attachment/4872345/+files/resource-agents_3.9.3+git20121009-3ubuntu3.debdiff
