Re: [ClusterLabs] corosync doesn't start any resource
21.06.2018 16:04, Stefan Krueger wrote:
> Hi Ken,
>
>> Can you attach the pe-input file listed just above here?
> done ;)
>
> And thank you for your patience!

You deleted all context, which makes it hard to answer. This is not a web forum where users can simply scroll up to see the previous reply.

Both your logs and pe-input show that nfs-server and vm-storage wait for each other. My best guess is that you have incorrect ordering for start and stop, which causes a loop in pacemaker's decision. Your start order is "nfs-server vm-storage" and your stop order is also "nfs-server vm-storage", while stop should normally be symmetrical (i.e. the reverse). Reversing the order in one of the sets makes it work as intended (verified).

I would actually expect that the asymmetrical configuration should still work, so I leave it to the pacemaker developers to comment on whether this is a bug or a feature :)

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
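[For reference, the fix described above can be expressed as one symmetrical ordering constraint in crmsh syntax. This is only a sketch: the resource names come from the thread, but the constraint id is made up.]

```
# With symmetrical=true (the default), pacemaker starts nfs-server before
# vm-storage and stops them in the reverse order automatically, so separate
# (and potentially conflicting) start and stop orderings are not needed.
order nfs-before-vm Mandatory: nfs-server vm-storage
```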
Re: [ClusterLabs] Fencing on 2-node cluster
On 2018-06-20 11:52 PM, Andrei Borzenkov wrote:
> 21.06.2018 00:50, Digimer wrote:
>> On 2018-06-20 05:46 PM, Jehan-Guillaume de Rorthais wrote:
>>> On Wed, 20 Jun 2018 17:24:41 -0400 Digimer wrote:
>>>> Make sure quorum is disabled. Quorum doesn't work on 2-node clusters.
>>>
>>> It does with the "two_node" parameter enabled in corosync.conf... as far as I
>>> understand it anyway...
>>
>> It doesn't, that option disables quorum in corosync.
>
> This option does not disable quorum - this option fakes quorum so
> corosync continues to report "in quorum" even when one node is lost. It
> is quite possible that pacemaker quorum does not map one-to-one to
> corosync quorum, though.

Technically correct, which is the best kind of correct. I didn't go into that detail as the results are the same (and consistent with pacemaker's quorum=false language).

>> Quorum is floor(($nodes / 2) + 1). So in a 3-node cluster, that is
>> 3 -> 1.5 -> 2.5 -> 2 votes needed for quorum. In a 2-node cluster, that is
>> 2 -> 1 -> 2 -> 2 votes needed for quorum, meaning you can't lose a node and
>> keep operating (which is kinda not HA :) ).
>
> Yes, but that assumes a normal, non-two_node configuration. As said,
> two_node makes corosync always pretend quorum is available (after the
> initial implicit wait_for_all).

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein's brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops."
 - Stephen Jay Gould
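[The quorum arithmetic quoted in the message above can be sketched in a few lines of Python. This is a standalone illustration, not corosync code; the function name is made up.]

```python
import math

def votes_needed(nodes: int) -> int:
    """Quorum = floor(nodes / 2) + 1: a strict majority of configured votes."""
    return math.floor(nodes / 2) + 1

# A 3-node cluster tolerates losing one node (2 of 3 votes remain); a 2-node
# cluster needs both votes, so it tolerates none -- hence corosync's special
# two_node mode for 2-node clusters.
for n in (2, 3, 5):
    print(f"{n}-node cluster needs {votes_needed(n)} votes for quorum")
```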
[ClusterLabs] Upgrade corosync problem
Hi,

I upgraded my PostgreSQL/Pacemaker cluster to these versions:

Pacemaker 1.1.14 -> 1.1.18
Corosync 2.3.5 -> 2.4.4
Crmsh 2.2.0 -> 3.0.1
Resource agents 3.9.7 -> 4.1.1

I started on a first node (I am trying a one-node-at-a-time upgrade). On a PostgreSQL slave node I did:

crm node standby
service pacemaker stop
service corosync stop

Then I built the tools above as described on their GitHub.com pages:

./autogen.sh (where required)
./configure
make (where required)
make install

Everything went OK; I expected the new files to overwrite the old ones. I left the dependencies I had with the old software because I noticed ./configure didn't complain.

I started corosync:

service corosync start

To verify corosync works properly I used the following commands:

corosync-cfgtool -s
corosync-cmapctl | grep members

Everything seemed OK and I verified my node joined the cluster (at least this is my impression). Here I hit a problem. Running:

corosync-quorumtool -ps

I got the following error:

Cannot initialise CFG service

If I try to start pacemaker, I only see the pacemakerd process running and pacemaker.log containing the following lines:

Jun 21 15:09:38 [17115] pg1 pacemakerd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: get_cluster_type: Detected an active 'corosync' cluster
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: mcp_read_config: Reading configure for stack: corosync
Jun 21 15:09:38 [17115] pg1 pacemakerd: notice: main: Starting Pacemaker 1.1.18 | build=2b07d5c5a9 features: libqb-logging libqb-ipc lha-fencing nagios corosync-native atomic-attrd acls
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: main: Maximum core file size is: 18446744073709551615
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: qb_ipcs_us_publish: server name: pacemakerd
Jun 21 15:09:53 [17115] pg1 pacemakerd: warning: corosync_node_name: Could not connect to Cluster Configuration Database API, error CS_ERR_TRY_AGAIN
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: corosync_node_name: Unable to get node name for nodeid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd: notice: get_node_name: Could not obtain a node name for corosync nodeid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_get_peer: Created entry 1aeef8ac-643b-44f7-8ce3-d82bbf40bbc1/0x557dc7f05d30 for node (null)/1 (1 total)
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_get_peer: Node 1 has uuid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1] - corosync-cpg is now online
Jun 21 15:09:53 [17115] pg1 pacemakerd: error: cluster_connect_quorum: Could not connect to the Quorum API: 2
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: qb_ipcs_us_withdraw: withdrawing server sockets
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: main: Exiting pacemakerd
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2

What is wrong in my procedure?
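[One thing worth double-checking when both corosync-quorumtool and pacemaker fail to reach the quorum API is that the rebuilt corosync is still loading the votequorum provider from corosync.conf. The fragment below is a generic corosync 2.x sketch; whether a missing provider, rather than e.g. stale 2.3.5 libraries left behind by `make install`, is the cause here is only a guess.]

```
quorum {
    provider: corosync_votequorum
    # expected_votes: 3   # or let it be derived from the nodelist section
}
```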
Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)
On Thu, Jun 21, 2018 at 9:49 AM Jan Pokorný wrote:
>
> On 21/06/18 07:05 -0400, Jason Gauthier wrote:
>> On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield wrote:
>>> On 19/06/18 18:47, Jason Gauthier wrote:
>>>> Attached!
>>>
>>> That's very odd. I can see communication with the server and corosync in
>>> there (so it's doing something) but no logging at all. When I start
>>> qdevice on my systems it logs loads of messages even if it doesn't
>>> manage to contact the server. Do you have any logging entries in
>>> corosync.conf that might be stopping it?
>>
>> I haven't checked the corosync logs for any entries before, but I just
>> did. There isn't anything logged.
>
> What about syslog entries (may boil down to /var/log/messages,
> journald log, or whatever sink is configured)?

I took a look, since both you and Chrissie mentioned that. There aren't any new entries added to any of the /var/log files.

# corosync-qdevice -f -d
# date
Thu Jun 21 10:36:06 EDT 2018
# ls -lt | head
total 152072
-rw-r----- 1 root adm     68018 Jun 21 10:34 auth.log
-rw-rw-r-- 1 root utmp 18704352 Jun 21 10:34 lastlog
-rw-rw-r-- 1 root utmp   107136 Jun 21 10:34 wtmp
-rw-r----- 1 root adm    248444 Jun 21 10:34 daemon.log
-rw-r----- 1 root adm    160899 Jun 21 10:34 syslog
-rw-r----- 1 root adm   1119856 Jun 21 09:46 kern.log

I did look through daemon, messages, and syslog just to be sure.

>>> Where did the binary come from? Did you build it yourself or is it from
>>> a package? I wonder if it's got corrupted or is a bad version. Possibly
>>> linked against a 'dodgy' libqb - there have been some things going on
>>> there that could cause logging to go missing in some circumstances.
>>>
>>> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
>>> here anyway!
>>
>> Hmm. Interesting. I installed the debian package. When it didn't
>> work, I grabbed the source from github. They both act the same way,
>> but if there is an underlying library issue then that will continue to
>> be a problem.
>>
>> It doesn't say much:
>> /usr/lib/x86_64-linux-gnu/libqb.so.0.18.1
>
> You are likely using libqb v1.0.1.

Correct. I didn't even think to look at the output of dpkg -l for the package version. Debian 9 also packages binutils 2.28.

> Ability to figure out the proper package version is one of the most
> basic skills to provide useful diagnostics about issues with
> distro-provided packages.
>
> With Debian, the proper incantation seems to be
>
> dpkg -s libqb-dev | grep -i version
>
> or
>
> apt list libqb-dev
>
> (or substitute libqb0 for libqb-dev).
>
> As Chrissie mentioned, there is some fishiness possible if you happen
> to use the ld linker from binutils 2.29+ when building with this old
> libqb in the mix, so if the issues persist and logging seems to be
> missing, try recompiling with the binutils package downgraded below
> said breakage point.

Since the system already has a lower-numbered binutils (2.28), I wonder if I should attempt to build a newer version of the libqb library. As Chrissie suggested, I will open a bug with Debian in the interim, but I don't believe I will see a resolution to that any time soon. :)
Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)
On 21/06/18 14:44 +0100, Christine Caulfield wrote:
> On 21/06/18 14:27, Christine Caulfield wrote:
>>
>> I just tried this on my Debian VM and it does exactly the same as yours.
>> So I think you should report it to the Debian maintainer, as it doesn't
>> happen on my Fedora or RHEL systems
>
> ahh, more light here. I still don't understand why Debian doesn't log
> to stderr, but I'm getting messages in /var/log/syslog

Exactly what I coincidentally mentioned in the parallel response :-) That's also the stock behaviour of RHEL 7 and derived distros, IIRC.

> (fedora is different, that's why I missed them) about the security
> keys (on my system). Are you getting any system log errors on yours?

-- 
Jan (Poki)
Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)
On 21/06/18 07:05 -0400, Jason Gauthier wrote:
> On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield wrote:
>> On 19/06/18 18:47, Jason Gauthier wrote:
>>> Attached!
>>
>> That's very odd. I can see communication with the server and corosync in
>> there (so it's doing something) but no logging at all. When I start
>> qdevice on my systems it logs loads of messages even if it doesn't
>> manage to contact the server. Do you have any logging entries in
>> corosync.conf that might be stopping it?
>
> I haven't checked the corosync logs for any entries before, but I just
> did. There isn't anything logged.

What about syslog entries (may boil down to /var/log/messages, journald log, or whatever sink is configured)?

>> Where did the binary come from? Did you build it yourself or is it from
>> a package? I wonder if it's got corrupted or is a bad version. Possibly
>> linked against a 'dodgy' libqb - there have been some things going on
>> there that could cause logging to go missing in some circumstances.
>>
>> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
>> here anyway!
>
> Hmm. Interesting. I installed the debian package. When it didn't
> work, I grabbed the source from github. They both act the same way,
> but if there is an underlying library issue then that will continue to
> be a problem.
>
> It doesn't say much:
> /usr/lib/x86_64-linux-gnu/libqb.so.0.18.1

You are likely using libqb v1.0.1. Ability to figure out the proper package version is one of the most basic skills to provide useful diagnostics about issues with distro-provided packages.

With Debian, the proper incantation seems to be

dpkg -s libqb-dev | grep -i version

or

apt list libqb-dev

(or substitute libqb0 for libqb-dev).

As Chrissie mentioned, there is some fishiness possible if you happen to use the ld linker from binutils 2.29+ when building with this old libqb in the mix, so if the issues persist and logging seems to be missing, try recompiling with the binutils package downgraded below said breakage point.

Hope this helps.

-- 
Jan (Poki)
Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)
On 21/06/18 14:27, Christine Caulfield wrote:
> On 21/06/18 12:05, Jason Gauthier wrote:
>> On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield wrote:
>>>
>>> On 19/06/18 18:47, Jason Gauthier wrote:
>>>> On Tue, Jun 19, 2018 at 6:58 AM Christine Caulfield wrote:
>>>>>
>>>>> On 19/06/18 11:44, Jason Gauthier wrote:
>>>>>> On Tue, Jun 19, 2018 at 3:25 AM Christine Caulfield wrote:
>>>>>>>
>>>>>>> On 19/06/18 02:46, Jason Gauthier wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I've just discovered corosync-qdevice and corosync-qnet. (Thanks Ken
>>>>>>>> Gaillot.) Set up was pretty quick.
>>>>>>>>
>>>>>>>> I enabled qnet off cluster. I followed the steps presented by
>>>>>>>> corosync-qdevice-net-certutil. However, when running corosync-qdevice
>>>>>>>> it exits. Even with -f -d there isn't a single output presented.
>>>>>>>
>>>>>>> It sounds like the first time you ran it (without -d -f)
>>>>>>> corosync-qdevice started up and daemonised itself. The second time you
>>>>>>> tried (with -d -f) it couldn't run because there was already one
>>>>>>> running. There's a good argument for it printing an error if it's
>>>>>>> already running, I think!
>>>>>>
>>>>>> The process doesn't stay running. I've shown in the output of qnet below
>>>>>> that it launches, connects, and disconnects. I've rebooted several
>>>>>> times since then (testing stonith). I can provide strace output if
>>>>>> it's helpful.
>>>>>
>>>>> yes please
>>>>
>>>> Attached!
>>>
>>> That's very odd. I can see communication with the server and corosync in
>>> there (so it's doing something) but no logging at all. When I start
>>> qdevice on my systems it logs loads of messages even if it doesn't
>>> manage to contact the server. Do you have any logging entries in
>>> corosync.conf that might be stopping it?
>>
>> I haven't checked the corosync logs for any entries before, but I just
>> did. There isn't anything logged.
>>
>>> Where did the binary come from? Did you build it yourself or is it from
>>> a package? I wonder if it's got corrupted or is a bad version. Possibly
>>> linked against a 'dodgy' libqb - there have been some things going on
>>> there that could cause logging to go missing in some circumstances.
>>>
>>> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
>>> here anyway!
>>
>> Hmm. Interesting. I installed the debian package. When it didn't
>> work, I grabbed the source from github. They both act the same way,
>> but if there is an underlying library issue then that will continue to
>> be a problem.
>>
>> It doesn't say much:
>> /usr/lib/x86_64-linux-gnu/libqb.so.0.18.1
>
> I just tried this on my Debian VM and it does exactly the same as yours.
> So I think you should report it to the Debian maintainer, as it doesn't
> happen on my Fedora or RHEL systems

ahh, more light here. I still don't understand why Debian doesn't log to stderr, but I'm getting messages in /var/log/syslog (fedora is different, that's why I missed them) about the security keys (on my system). Are you getting any system log errors on yours?

Chrissie
Re: [ClusterLabs] Resource agents differences from 1.1.14 and 1.1.18
Hi, thanks for the reply.

> On 21 Jun 2018, at 15:09, Jan Pokorný wrote:
>
> Hello Salvatore,
>
> On 21/06/18 12:44 +0200, Salvatore D'angelo wrote:
>> I am trying to upgrade my PostgreSQL cluster managed by pacemaker
>> to pacemaker 1.1.18 or 2.0.0. I have some resource agents that I
>> patched to have them working with my cluster.
>>
>> Can someone tell me if something has changed in the OCF interface
>> between the 1.1.14 release and 1.1.18/2.0.0?
>
> You can consider the OCF specification/interface stable; no
> breakages are really imminent.

Good to know.

> There are admittedly some parts with less than well-defined semantics
> (if defined at all; for instance, questions on the proper
> interpretation of "unique" and reloadable parameters were raised in
> the past [1,2]).
>
> This stability is moreover enforced by the requirement of cross
> compatibility between various OCF-conformant agent vs. resource
> manager implementations (say those maintained in the resource-agents
> project vs. pacemaker, plus various versions thereof, without any
> a priori defined ways to negotiate any further interface
> specifics, but see [3], for instance).
>
>> I am using the following resource agents:
>>
>> /usr/lib/ocf/resource.d/heartbeat/Filesystem
>> /usr/lib/ocf/resource.d/heartbeat/ethmonitor
>> /usr/lib/ocf/resource.d/heartbeat/pgsql (patched)
>
> ^ these are contained within the resource-agents project, and as
> mentioned, nothing pushes you to update this piece of software
> even if you intend to update pacemaker (granted, keeping step
> with overall evolutionary "time snapshots" is always wise)
>
>> /usr/lib/ocf/resource.d/pacemaker/HealthCPU (patched)
>> /usr/lib/ocf/resource.d/pacemaker/ping (patched)
>> /usr/lib/ocf/resource.d/pacemaker/SysInfo (patched)
>
> ^ and these are from pacemaker's realms, so there's naturally
> a closer coupling possibly beyond what the standard mandates, but
> again, OCF forms a "fixed point", a basis upon which the graph
> connecting the functionality user(s) and providers is formed,
> so presumably you can mix and match various versions even if
> the bits come from the very same project
>
>> I am doing some tests to verify this but I would like to know if
>> there is, at a high level, something I should be aware of.
>
> Nothing comes to my mind, though you are always best served by
> your own investigation (since you are modifying the agents anyway).
>
> As a rule of thumb, I'd start by checking the changelogs of the
> mentioned projects; deeper concerns can ultimately be resolved
> with a review of cross-version changes at the source code level,
> e.g.:
>
> git clone https://github.com/ClusterLabs/resource-agents.git
> pushd resource-agents
> # let's say you start with agents from the v3.9.7 release
> git diff v3.9.7 v4.1.1 -- heartbeat/{Filesystem,ethmonitor,pgsql}
> popd
>
> git clone https://github.com/ClusterLabs/pacemaker.git
> pushd pacemaker
> git diff Pacemaker-1.1.14 Pacemaker-1.1.18 -- \
>     extra/resources/{HealthCPU,SysInfo,ping}
> popd
>
> It's more like showing how to fish than serving you a meal,
> but hopefully this helps regardless (perhaps even more than
> the latter would do).

Yes, that's exactly what I did. I just double-checked.

> [1] https://lists.clusterlabs.org/pipermail/users/2016-June/010635.html
> [2] https://lists.clusterlabs.org/pipermail/users/2017-September/013743.html
> [3] https://github.com/ClusterLabs/OCF-spec/issues/17
>
> --
> Jan (Poki)
Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)
On 21/06/18 12:05, Jason Gauthier wrote:
> On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield wrote:
>>
>> On 19/06/18 18:47, Jason Gauthier wrote:
>>> On Tue, Jun 19, 2018 at 6:58 AM Christine Caulfield wrote:
>>>>
>>>> On 19/06/18 11:44, Jason Gauthier wrote:
>>>>> On Tue, Jun 19, 2018 at 3:25 AM Christine Caulfield wrote:
>>>>>>
>>>>>> On 19/06/18 02:46, Jason Gauthier wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I've just discovered corosync-qdevice and corosync-qnet. (Thanks Ken
>>>>>>> Gaillot.) Set up was pretty quick.
>>>>>>>
>>>>>>> I enabled qnet off cluster. I followed the steps presented by
>>>>>>> corosync-qdevice-net-certutil. However, when running corosync-qdevice
>>>>>>> it exits. Even with -f -d there isn't a single output presented.
>>>>>>
>>>>>> It sounds like the first time you ran it (without -d -f)
>>>>>> corosync-qdevice started up and daemonised itself. The second time you
>>>>>> tried (with -d -f) it couldn't run because there was already one
>>>>>> running. There's a good argument for it printing an error if it's
>>>>>> already running, I think!
>>>>>
>>>>> The process doesn't stay running. I've shown in the output of qnet below
>>>>> that it launches, connects, and disconnects. I've rebooted several
>>>>> times since then (testing stonith). I can provide strace output if
>>>>> it's helpful.
>>>>
>>>> yes please
>>>
>>> Attached!
>>
>> That's very odd. I can see communication with the server and corosync in
>> there (so it's doing something) but no logging at all. When I start
>> qdevice on my systems it logs loads of messages even if it doesn't
>> manage to contact the server. Do you have any logging entries in
>> corosync.conf that might be stopping it?
>
> I haven't checked the corosync logs for any entries before, but I just
> did. There isn't anything logged.
>
>> Where did the binary come from? Did you build it yourself or is it from
>> a package? I wonder if it's got corrupted or is a bad version. Possibly
>> linked against a 'dodgy' libqb - there have been some things going on
>> there that could cause logging to go missing in some circumstances.
>>
>> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
>> here anyway!
>
> Hmm. Interesting. I installed the debian package. When it didn't
> work, I grabbed the source from github. They both act the same way,
> but if there is an underlying library issue then that will continue to
> be a problem.
>
> It doesn't say much:
> /usr/lib/x86_64-linux-gnu/libqb.so.0.18.1

I just tried this on my Debian VM and it does exactly the same as yours. So I think you should report it to the Debian maintainer, as it doesn't happen on my Fedora or RHEL systems.

Chrissie
Re: [ClusterLabs] Resource agents differences from 1.1.14 and 1.1.18
Hello Salvatore,

On 21/06/18 12:44 +0200, Salvatore D'angelo wrote:
> I am trying to upgrade my PostgreSQL cluster managed by pacemaker
> to pacemaker 1.1.18 or 2.0.0. I have some resource agents that I
> patched to have them working with my cluster.
>
> Can someone tell me if something has changed in the OCF interface
> between the 1.1.14 release and 1.1.18/2.0.0?

You can consider the OCF specification/interface stable; no breakages are really imminent. There are admittedly some parts with less than well-defined semantics (if defined at all; for instance, questions on the proper interpretation of "unique" and reloadable parameters were raised in the past [1,2]).

This stability is moreover enforced by the requirement of cross compatibility between various OCF-conformant agent vs. resource manager implementations (say those maintained in the resource-agents project vs. pacemaker, plus various versions thereof, without any a priori defined ways to negotiate any further interface specifics, but see [3], for instance).

> I am using the following resource agents:
>
> /usr/lib/ocf/resource.d/heartbeat/Filesystem
> /usr/lib/ocf/resource.d/heartbeat/ethmonitor
> /usr/lib/ocf/resource.d/heartbeat/pgsql (patched)

^ these are contained within the resource-agents project, and as mentioned, nothing pushes you to update this piece of software even if you intend to update pacemaker (granted, keeping step with overall evolutionary "time snapshots" is always wise)

> /usr/lib/ocf/resource.d/pacemaker/HealthCPU (patched)
> /usr/lib/ocf/resource.d/pacemaker/ping (patched)
> /usr/lib/ocf/resource.d/pacemaker/SysInfo (patched)

^ and these are from pacemaker's realms, so there's naturally a closer coupling possibly beyond what the standard mandates, but again, OCF forms a "fixed point", a basis upon which the graph connecting the functionality user(s) and providers is formed, so presumably you can mix and match various versions even if the bits come from the very same project

> I am doing some tests to verify this but I would like to know if
> there is, at a high level, something I should be aware of.

Nothing comes to my mind, though you are always best served by your own investigation (since you are modifying the agents anyway).

As a rule of thumb, I'd start by checking the changelogs of the mentioned projects; deeper concerns can ultimately be resolved with a review of cross-version changes at the source code level, e.g.:

git clone https://github.com/ClusterLabs/resource-agents.git
pushd resource-agents
# let's say you start with agents from the v3.9.7 release
git diff v3.9.7 v4.1.1 -- heartbeat/{Filesystem,ethmonitor,pgsql}
popd

git clone https://github.com/ClusterLabs/pacemaker.git
pushd pacemaker
git diff Pacemaker-1.1.14 Pacemaker-1.1.18 -- \
    extra/resources/{HealthCPU,SysInfo,ping}
popd

It's more like showing how to fish than serving you a meal, but hopefully this helps regardless (perhaps even more than the latter would do).

[1] https://lists.clusterlabs.org/pipermail/users/2016-June/010635.html
[2] https://lists.clusterlabs.org/pipermail/users/2017-September/013743.html
[3] https://github.com/ClusterLabs/OCF-spec/issues/17

-- 
Jan (Poki)
Re: [ClusterLabs] corosync doesn't start any resource
Hi Ken,

> Can you attach the pe-input file listed just above here?

done ;)

And thank you for your patience!

best regards
Stefan

Attachment: pre-input-228.bz2 (application/bzip)
Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)
On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield wrote:
>
> On 19/06/18 18:47, Jason Gauthier wrote:
>> On Tue, Jun 19, 2018 at 6:58 AM Christine Caulfield wrote:
>>>
>>> On 19/06/18 11:44, Jason Gauthier wrote:
>>>> On Tue, Jun 19, 2018 at 3:25 AM Christine Caulfield wrote:
>>>>>
>>>>> On 19/06/18 02:46, Jason Gauthier wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I've just discovered corosync-qdevice and corosync-qnet. (Thanks Ken
>>>>>> Gaillot.) Set up was pretty quick.
>>>>>>
>>>>>> I enabled qnet off cluster. I followed the steps presented by
>>>>>> corosync-qdevice-net-certutil. However, when running corosync-qdevice
>>>>>> it exits. Even with -f -d there isn't a single output presented.
>>>>>
>>>>> It sounds like the first time you ran it (without -d -f)
>>>>> corosync-qdevice started up and daemonised itself. The second time you
>>>>> tried (with -d -f) it couldn't run because there was already one
>>>>> running. There's a good argument for it printing an error if it's
>>>>> already running, I think!
>>>>
>>>> The process doesn't stay running. I've shown in the output of qnet below
>>>> that it launches, connects, and disconnects. I've rebooted several
>>>> times since then (testing stonith). I can provide strace output if
>>>> it's helpful.
>>>
>>> yes please
>>
>> Attached!
>
> That's very odd. I can see communication with the server and corosync in
> there (so it's doing something) but no logging at all. When I start
> qdevice on my systems it logs loads of messages even if it doesn't
> manage to contact the server. Do you have any logging entries in
> corosync.conf that might be stopping it?

I haven't checked the corosync logs for any entries before, but I just did. There isn't anything logged.

> Where did the binary come from? Did you build it yourself or is it from
> a package? I wonder if it's got corrupted or is a bad version. Possibly
> linked against a 'dodgy' libqb - there have been some things going on
> there that could cause logging to go missing in some circumstances.
>
> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
> here anyway!

Hmm. Interesting. I installed the debian package. When it didn't work, I grabbed the source from github. They both act the same way, but if there is an underlying library issue then that will continue to be a problem.

It doesn't say much:

/usr/lib/x86_64-linux-gnu/libqb.so.0.18.1

> Chrissie
[ClusterLabs] Resource agents differences from 1.1.14 and 1.1.18
Hi all,

I am trying to upgrade my PostgreSQL cluster managed by pacemaker to pacemaker 1.1.18 or 2.0.0. I have some resource agents that I patched to have them working with my cluster.

Can someone tell me if something has changed in the OCF interface between the 1.1.14 release and 1.1.18/2.0.0?

I am using the following resource agents:

/usr/lib/ocf/resource.d/heartbeat/Filesystem
/usr/lib/ocf/resource.d/heartbeat/ethmonitor
/usr/lib/ocf/resource.d/heartbeat/pgsql (patched)
/usr/lib/ocf/resource.d/pacemaker/HealthCPU (patched)
/usr/lib/ocf/resource.d/pacemaker/ping (patched)
/usr/lib/ocf/resource.d/pacemaker/SysInfo (patched)

I am doing some tests to verify this but I would like to know if there is, at a high level, something I should be aware of.

Thanks in advance for your help.
Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)
On 19/06/18 18:47, Jason Gauthier wrote:
> On Tue, Jun 19, 2018 at 6:58 AM Christine Caulfield wrote:
>>
>> On 19/06/18 11:44, Jason Gauthier wrote:
>>> On Tue, Jun 19, 2018 at 3:25 AM Christine Caulfield wrote:
>>>>
>>>> On 19/06/18 02:46, Jason Gauthier wrote:
>>>>> Greetings,
>>>>>
>>>>> I've just discovered corosync-qdevice and corosync-qnet. (Thanks Ken
>>>>> Gaillot.) Set up was pretty quick.
>>>>>
>>>>> I enabled qnet off cluster. I followed the steps presented by
>>>>> corosync-qdevice-net-certutil. However, when running corosync-qdevice
>>>>> it exits. Even with -f -d there isn't a single output presented.
>>>>
>>>> It sounds like the first time you ran it (without -d -f)
>>>> corosync-qdevice started up and daemonised itself. The second time you
>>>> tried (with -d -f) it couldn't run because there was already one
>>>> running. There's a good argument for it printing an error if it's
>>>> already running, I think!
>>>
>>> The process doesn't stay running. I've shown in the output of qnet below
>>> that it launches, connects, and disconnects. I've rebooted several
>>> times since then (testing stonith). I can provide strace output if
>>> it's helpful.
>>
>> yes please
>
> Attached!

That's very odd. I can see communication with the server and corosync in there (so it's doing something) but no logging at all. When I start qdevice on my systems it logs loads of messages even if it doesn't manage to contact the server. Do you have any logging entries in corosync.conf that might be stopping it?

Where did the binary come from? Did you build it yourself or is it from a package? I wonder if it's got corrupted or is a bad version. Possibly linked against a 'dodgy' libqb - there have been some things going on there that could cause logging to go missing in some circumstances.

Honza (the qdevice expert) is away at the moment, so I'm guessing a bit here anyway!

Chrissie