Re: [ClusterLabs] corosync doesn't start any resource

2018-06-21 Thread Andrei Borzenkov
21.06.2018 16:04, Stefan Krueger wrote:
> Hi Ken,
> 
>> Can you attach the pe-input file listed just above here?
> done ;) 
> 
> And thank you for your patience!
> 

You deleted all context, which makes it hard to answer. This is not a web
forum where users can simply scroll up to see the previous reply.

Both your logs and pe-input show that nfs-server and vm-storage wait for
each other.

My best guess is that you have incorrect ordering for start and stop,
which causes a loop in pacemaker's decision. Your start order is "nfs-server
vm-storage" and your stop order is "nfs-server vm-storage", while the stop
order should normally be the reverse of the start order (symmetrical).
Reversing the order in one of the sets makes it work as intended (verified).
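
To make "symmetrical" concrete, here is a minimal sketch (not taken from the
poster's configuration) that derives the stop order by reversing the start
order. The helper name reverse_order is hypothetical and only illustrates
the idea; Pacemaker order constraints are symmetrical by default, so
normally the cluster derives the reverse stop order on its own.

```shell
#!/bin/sh
# Hypothetical helper: print the given resource names in reverse order,
# separated by single spaces.
reverse_order() {
    reversed=""
    for res in "$@"; do
        # Prepend each name; add a separator only once something is stored.
        reversed="$res${reversed:+ $reversed}"
    done
    echo "$reversed"
}

start_order="nfs-server vm-storage"
# Word splitting of $start_order is intentional here.
stop_order=$(reverse_order $start_order)

echo "start: $start_order"
echo "stop:  $stop_order"
```

With the start order quoted above, the stop order comes out as vm-storage
first and nfs-server second.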

I would actually expect an asymmetrical configuration to still work, so
I leave it to the pacemaker developers to comment on whether this is a
bug or a feature :)

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Fencing on 2-node cluster

2018-06-21 Thread Digimer
On 2018-06-20 11:52 PM, Andrei Borzenkov wrote:
> 21.06.2018 00:50, Digimer wrote:
>> On 2018-06-20 05:46 PM, Jehan-Guillaume de Rorthais wrote:
>>> On Wed, 20 Jun 2018 17:24:41 -0400
>>> Digimer  wrote:
>>>
>>>> Make sure quorum is disabled. Quorum doesn't work on 2-node clusters.
>>>
>>> It does with the "two_node" parameter enabled in corosync.conf...as far as I
>>> understand it anyway...
>>
>> It doesn't, that option disables quorum in corosync.
>>
> 
> This option does not disable quorum - this option fakes quorum so
> corosync continues to report "in quorum" even when one node is lost. It
> is quite possible that pacemaker quorum does not map one-to-one to
> corosync quorum though.

Technically correct, which is the best kind of correct.

I didn't go into that detail as the results are the same (and consistent
with pacemaker's quorum=false language).

>> Quorum is floor(($nodes / 2) + 1). So in a 3-node, that is 3 -> 1.5 ->
>> 2.5 -> 2 votes needed for quorum. In a 2-node, that is 2 -> 1 -> 2 -> 2
>> votes needed for quorum, meaning you can't lose a node to operate (which
>> is kinda not HA :) ).
>>
> 
> Yes, but that assumes a normal, non-two_node configuration. As said,
> two_node makes corosync always pretend quorum is available (after the
> initial implicit wait_for_all).
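
The arithmetic discussed above can be sketched in a few lines of shell. This
is a toy model under stated assumptions, not corosync's implementation: one
vote per node, and the function names quorum_votes and is_quorate are made
up for illustration. The two_node branch only mimics the "pretend quorum"
behaviour described above and ignores the initial wait_for_all phase.

```shell
#!/bin/sh
# Votes needed for quorum with one vote per node: floor(nodes / 2 + 1).
# Shell integer division already rounds down, so nodes / 2 + 1 is enough.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# Toy model: print "yes" if a partition would report quorum, else "no".
# Arguments: total nodes, nodes still alive, two_node flag (0 or 1).
is_quorate() {
    total=$1
    alive=$2
    two_node=$3
    if [ "$two_node" -eq 1 ] && [ "$total" -eq 2 ] && [ "$alive" -ge 1 ]; then
        echo yes
    elif [ "$alive" -ge "$(quorum_votes "$total")" ]; then
        echo yes
    else
        echo no
    fi
}

for n in 2 3 4 5; do
    echo "$n nodes: $(quorum_votes "$n") votes needed"
done
```

For example, "is_quorate 2 1 0" prints "no" (a plain 2-node cluster cannot
lose a node), while "is_quorate 2 1 1" prints "yes".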


-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould


[ClusterLabs] Upgrade corosync problem

2018-06-21 Thread Salvatore D'angelo
Hi,

I upgraded my PostgreSQL/Pacemaker cluster with these versions.
Pacemaker 1.1.14 -> 1.1.18
Corosync 2.3.5 -> 2.4.4
Crmsh 2.2.0 -> 3.0.1
Resource agents 3.9.7 -> 4.1.1

I started with the first node (I am upgrading one node at a time).
On a PostgreSQL slave node I did:

crm node standby 
service pacemaker stop
service corosync stop

Then I built the tools above as described on their GitHub pages.

./autogen.sh (where required)
./configure
make (where required)
make install

Everything went OK. I expected the new files to overwrite the old ones. I left
the dependencies I had for the old software in place because I noticed that
./configure didn't complain.
I started corosync.

service corosync start

To verify that corosync works properly I used the following commands:
corosync-cfgtool -s
corosync-cmapctl | grep members

Everything seemed ok and I verified my node joined the cluster (at least this 
is my impression).

Here I ran into a problem. Running the command:
corosync-quorumtool -ps

I got the following error:
Cannot initialise CFG service

If I try to start pacemaker, I only see the pacemaker process running, and
pacemaker.log contains the following lines:

Jun 21 15:09:38 [17115] pg1 pacemakerd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: get_cluster_type: Detected an active 'corosync' cluster
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: mcp_read_config: Reading configure for stack: corosync
Jun 21 15:09:38 [17115] pg1 pacemakerd: notice: main: Starting Pacemaker 1.1.18 | build=2b07d5c5a9 features: libqb-logging libqb-ipc lha-fencing nagios corosync-native atomic-attrd acls
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: main: Maximum core file size is: 18446744073709551615
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: qb_ipcs_us_publish: server name: pacemakerd
Jun 21 15:09:53 [17115] pg1 pacemakerd: warning: corosync_node_name: Could not connect to Cluster Configuration Database API, error CS_ERR_TRY_AGAIN
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: corosync_node_name: Unable to get node name for nodeid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd: notice: get_node_name: Could not obtain a node name for corosync nodeid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_get_peer: Created entry 1aeef8ac-643b-44f7-8ce3-d82bbf40bbc1/0x557dc7f05d30 for node (null)/1 (1 total)
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_get_peer: Node 1 has uuid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1] - corosync-cpg is now online
Jun 21 15:09:53 [17115] pg1 pacemakerd: error: cluster_connect_quorum: Could not connect to the Quorum API: 2
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: qb_ipcs_us_withdraw: withdrawing server sockets
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: main: Exiting pacemakerd
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2

What is wrong in my procedure?
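
One thing that could be checked here, offered as a sketch and not as a
confirmed diagnosis: an autotools ./configure run without --prefix installs
under /usr/local by default, so a source build may sit next to the packaged
files instead of overwriting them. The helper below (find_all_on_path is a
hypothetical name) lists every copy of a tool found on $PATH, in lookup
order. More than one hit for the same tool would suggest that old and new
installs coexist.

```shell
#!/bin/sh
# Hypothetical helper: list every executable named $1 found in the
# directories of $PATH, in lookup order. The body runs in a subshell
# (parentheses) so the IFS change does not leak into the caller.
find_all_on_path() (
    name=$1
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
)

for tool in corosync corosync-cfgtool corosync-quorumtool; do
    find_all_on_path "$tool"
done
```

If two paths show up for one tool, comparing their versions and the
libraries they load would be a reasonable next step.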





Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Jason Gauthier
On Thu, Jun 21, 2018 at 9:49 AM Jan Pokorný  wrote:
>
> On 21/06/18 07:05 -0400, Jason Gauthier wrote:
> > On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield  
> > wrote:
> >> On 19/06/18 18:47, Jason Gauthier wrote:
> >>> Attached!
> >>
> >> That's very odd. I can see communication with the server and corosync in
> >> there (so it's doing something) but no logging at all. When I start
> >> qdevice on my systems it logs loads of messages even if it doesn't
> >> manage to contact the server. Do you have any logging entries in
> >> corosync.conf that might be stopping it?
> >
> > I haven't checked the corosync logs for any entries before, but I just
> > did.  There isn't anything logged.
>
> What about syslog entries (may boil down to /var/log/messages,
> journald log, or whatever sink is configured)?

I took a look, since both you and Chrissie mentioned that.

There aren't any new entries added to any of the /var/log files.

# corosync-qdevice -f -d
# date
Thu Jun 21 10:36:06 EDT 2018

# ls -lt|head
total 152072
-rw-r----- 1 root adm      68018 Jun 21 10:34 auth.log
-rw-rw-r-- 1 root utmp  18704352 Jun 21 10:34 lastlog
-rw-rw-r-- 1 root utmp    107136 Jun 21 10:34 wtmp
-rw-r----- 1 root adm     248444 Jun 21 10:34 daemon.log
-rw-r----- 1 root adm     160899 Jun 21 10:34 syslog
-rw-r----- 1 root adm    1119856 Jun 21 09:46 kern.log

I did look through daemon, messages, and syslog just to be sure.

> >> Where did the binary come from? did you build it yourself or is it from
> >> a package? I wonder if it's got corrupted or is a bad version. Possibly
> >> linked against a 'dodgy' libqb - there have been some things going on
> >> there that could cause logging to go missing in some circumstances.
> >>
> >> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
> >> here anyway!
> >
> > Hmm. Interesting.  I installed the debian package.  When it didn't
> > work, I grabbed the source from github.  They both act the same way,
> > but if there is an underlying library issue then that will continue to
> > be a problem.
> >
> > It doesn't say much:
> > /usr/lib/x86_64-linux-gnu/libqb.so.0.18.1
>
> You are likely using libqb v1.0.1.

Correct. I didn't even think to look at the output of dpkg -l for the
package version.
Debian 9 also packages binutils-2.28

> Ability to figure out the proper package version is one of the most
> basic skills to provide useful diagnostics about the issues with
> distro-provided packages.
>
> With Debian, the proper incantation seems to be
>
>   dpkg -s libqb-dev | grep -i version
>
> or
>
>   apt list libqb-dev
>
> (or substitute libqb0 for libqb-dev).
>
> As Chrissie mentioned, there is some fishiness possible if you happen
> to use ld linker from binutils 2.29+ for the building with this old
> libqb in the mix, so if the issues persist and logging seems to be
> missing, try recompiling with the downgraded binutils package below
> said breakage point.

Since the system already has a lower numbered binutils (2.28) I wonder
if I should attempt to build a newer version of the libqb library.

As Chrissie mentioned, I will open a bug with Debian in the interim.
But I don't believe I will see resolution to that any time soon. :)


Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Jan Pokorný
On 21/06/18 14:44 +0100, Christine Caulfield wrote:
> On 21/06/18 14:27, Christine Caulfield wrote:
>> 
>> I just tried this on my Debian VM and it does exactly the same as yours.
>> So I think you should report it to the Debian maintainer as it doesn't
>> happen on my Fedora or RHEL systems
>> 
> 
> ahh more light here. I still don't understand why Debian doesn't log
> to stderr, but I'm getting messages in /var/log/syslog

Exactly what I coincidentally mentioned in the parallel response :-)
That's also the stock behaviour of RHEL 7 and derived distros, IIRC.

> (fedora is different, that's why I missed them) about the security
> keys (on my system). are you getting any system log errors on yours?

-- 
Jan (Poki)




Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Jan Pokorný
On 21/06/18 07:05 -0400, Jason Gauthier wrote:
> On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield  
> wrote:
>> On 19/06/18 18:47, Jason Gauthier wrote:
>>> Attached!
>> 
>> That's very odd. I can see communication with the server and corosync in
>> there (so it's doing something) but no logging at all. When I start
>> qdevice on my systems it logs loads of messages even if it doesn't
>> manage to contact the server. Do you have any logging entries in
>> corosync.conf that might be stopping it?
> 
> I haven't checked the corosync logs for any entries before, but I just
> did.  There isn't anything logged.

What about syslog entries (may boil down to /var/log/messages,
journald log, or whatever sink is configured)?

>> Where did the binary come from? did you build it yourself or is it from
>> a package? I wonder if it's got corrupted or is a bad version. Possibly
>> linked against a 'dodgy' libqb - there have been some things going on
>> there that could cause logging to go missing in some circumstances.
>> 
>> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
>> here anyway!
> 
> Hmm. Interesting.  I installed the debian package.  When it didn't
> work, I grabbed the source from github.  They both act the same way,
> but if there is an underlying library issue then that will continue to
> be a problem.
> 
> It doesn't say much:
> /usr/lib/x86_64-linux-gnu/libqb.so.0.18.1

You are likely using libqb v1.0.1.

Ability to figure out the proper package version is one of the most
basic skills to provide useful diagnostics about the issues with
distro-provided packages.

With Debian, the proper incantation seems to be

  dpkg -s libqb-dev | grep -i version

or

  apt list libqb-dev

(or substitute libqb0 for libqb-dev).

As Chrissie mentioned, there is some fishiness possible if you happen
to use ld linker from binutils 2.29+ for the building with this old
libqb in the mix, so if the issues persist and logging seems to be
missing, try recompiling with the downgraded binutils package below
said breakage point.
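
A quick way to tell on which side of that breakage point a given linker
sits is sketched below. The function names version_ge and check_ld_version
are hypothetical, the comparison relies on "sort -V" from GNU coreutils, and
the assumption that the version is the last field of the first line of
"ld --version" should be checked on your distro.

```shell
#!/bin/sh
# Return success if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort) as provided by GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Classify an ld version against the 2.29 breakage point mentioned above.
check_ld_version() {
    if version_ge "$1" 2.29; then
        echo "ld $1: at or above 2.29, may be affected with old libqb"
    else
        echo "ld $1: below 2.29"
    fi
}

# Assumption: the last field of the first line of "ld --version" is the
# version number (e.g. "GNU ld (GNU Binutils for Debian) 2.28").
if command -v ld >/dev/null 2>&1; then
    check_ld_version "$(ld --version | head -n 1 | awk '{print $NF}')"
fi
```

For the Debian 9 toolchain (binutils 2.28), check_ld_version 2.28 reports
"below 2.29".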

Hope this helps.

-- 
Jan (Poki)




Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Christine Caulfield
On 21/06/18 14:27, Christine Caulfield wrote:
> On 21/06/18 12:05, Jason Gauthier wrote:
>> On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield  
>> wrote:
>>>
>>> On 19/06/18 18:47, Jason Gauthier wrote:
>>>> On Tue, Jun 19, 2018 at 6:58 AM Christine Caulfield wrote:
>>>>>
>>>>> On 19/06/18 11:44, Jason Gauthier wrote:
>>>>>> On Tue, Jun 19, 2018 at 3:25 AM Christine Caulfield wrote:
>>>>>>>
>>>>>>> On 19/06/18 02:46, Jason Gauthier wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>>    I've just discovered corosync-qdevice and corosync-qnet.
>>>>>>>> (Thanks Ken Gaillot). Set up was pretty quick.
>>>>>>>>
>>>>>>>> I enabled qnet off cluster.  I followed the steps presented by
>>>>>>>> corosync-qdevice-net-certutil.  However, when running
>>>>>>>> corosync-qdevice it exits.  Even with -f -d there isn't a single
>>>>>>>> output presented.
>>>>>>>>
>>>>>>>
>>>>>>> It sounds like the first time you ran it (without -d -f)
>>>>>>> corosync-qdevice started up and daemonised itself. The second time you
>>>>>>> tried (with -d -f) it couldn't run because there was already one
>>>>>>> running. There's a good argument for it printing an error if it's
>>>>>>> already running I think!
>>>>>>>
>>>>>>
>>>>>> The process doesn't stay running.  I've showed in output of qnet below
>>>>>> that it launches, connected, and disconnects. I've rebooted several
>>>>>> times since then (testing stonith). I can provide strace output if
>>>>>> it's helpful.
>>>>>>
>>>>>
>>>>> yes please
>>>>
>>>> Attached!
>>>>
>>>
>>> That's very odd. I can see communication with the server and corosync in
>>> there (so it's doing something) but no logging at all. When I start
>>> qdevice on my systems it logs loads of messages even if it doesn't
>>> manage to contact the server. Do you have any logging entries in
>>> corosync.conf that might be stopping it?
>>
>> I haven't checked the corosync logs for any entries before, but I just
>> did.  There isn't anything logged.
>>
>>> Where did the binary come from? did you build it yourself or is it from
>>> a package? I wonder if it's got corrupted or is a bad version. Possibly
>>> linked against a 'dodgy' libqb - there have been some things going on
>>> there that could cause logging to go missing in some circumstances.
>>>
>>> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
>>> here anyway!
>>
>> Hmm. Interesting.  I installed the debian package.  When it didn't
>> work, I grabbed the source from github.  They both act the same way,
>> but if there is an underlying library issue then that will continue to
>> be a problem.
>>
>> It doesn't say much:
>> /usr/lib/x86_64-linux-gnu/libqb.so.0.18.1
>>
>>
> 
> I just tried this on my Debian VM and it does exactly the same as yours.
> So I think you should report it to the Debian maintainer as it doesn't
> happen on my Fedora or RHEL systems
> 

Ahh, more light here. I still don't understand why Debian doesn't log
to stderr, but I'm getting messages in /var/log/syslog (Fedora is
different, that's why I missed them) about the security keys (on my
system). Are you getting any system log errors on yours?
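
A small sketch for scanning the usual file-based syslog sinks for such
messages follows. The function name find_qdevice_logs is hypothetical and
the list of log files is an assumption (the sink depends on the syslog
configuration); on journald-only systems the journal would have to be
queried instead.

```shell
#!/bin/sh
# Hypothetical helper: print lines mentioning "qdevice" (case-insensitive)
# from each readable log file given as an argument, prefixed with the file
# name. Missing or unreadable files are skipped silently.
find_qdevice_logs() {
    for log in "$@"; do
        if [ -r "$log" ]; then
            # /dev/null as an extra operand makes grep print the file name.
            grep -i 'qdevice' /dev/null "$log" || true
        fi
    done
    return 0
}

# Assumed locations; adjust to wherever your syslog daemon writes.
find_qdevice_logs /var/log/syslog /var/log/daemon.log /var/log/messages
```

Reading these files normally requires root or membership in the adm group
on Debian.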

Chrissie


Re: [ClusterLabs] Resource agents differences from 1.1.14 and 1.1.18

2018-06-21 Thread Salvatore D'angelo
Hi, thanks for the reply.

> On 21 Jun 2018, at 15:09, Jan Pokorný  wrote:
> 
> Hello Salvatore,
> 
> On 21/06/18 12:44 +0200, Salvatore D'angelo wrote:
>> I am trying to upgrade my PostgreSQL cluster managed by pacemaker
>> to pacemaker 1.1.18 or 2.0.0.  I have some resource agents that I
>> patched to have them working with my cluster.
>> 
>> Can someone tell me if something has changed in the OCF interface
>> between the 1.1.14 release and 1.1.18/2.0.0?
> 
> You can consider the OCF specification/interface stable and no
> breakages are really imminent.

Good to know

>  There are admittedly some parts with
> less than well-defined semantics (if defined at all; for instance,
> questions on what's the proper interpretation of "unique" slash
> reloadable parameters were raised in the past [1,2]).
> 
> This stability is moreover enforced by the requirement of cross
> compatibility between various OCF-conformant agent vs. resource
> manager implementations (say those maintained in the resource-agents
> project vs. pacemaker, plus various versions thereof, without any
> a priori defined ways of negotiating any further interface
> specifics, but see [3], for instance).
> 
>> I am using the following resource agents:
>> 
>> /usr/lib/ocf/resource.d/heartbeat/Filesystem
>> /usr/lib/ocf/resource.d/heartbeat/ethmonitor
>> /usr/lib/ocf/resource.d/heartbeat/pgsql (patched)
> 
> ^ this is really contained within resource-agents project, and as
>  mentioned, nothing pushes you to update this piece of software
>  even if you intend to update pacemaker (granted, keeping step
>  with overall evolutionary "time snapshots" is always wise)
> 
>> /usr/lib/ocf/resource.d/pacemaker/HealthCPU (patched)
>> /usr/lib/ocf/resource.d/pacemaker/ping (patched)
>> /usr/lib/ocf/resource.d/pacemaker/SysInfo (patched)
> 
> ^ and these are from pacemaker's realms, so there's naturally
>  a closer coupling possibly beyond what standard mandates, but
>  again, OCF forms a "fixed point", basis upon which the graph
>  connecting the functionality user(s) and providers is formed,
>  so presumably you can mix and match various versions even if
>  the bits come from the very same project
> 
>> I am doing some tests to verify this but I would like to know if
>> there is something I should be aware of at a high level.
> 
> Nothing comes to my mind, though you are always best served by
> your own investigation (since you are modifying the agents anyway).
> 
> As a rule of thumb, I'd start with checking the changelogs of the
> mentioned projects, and deeper concerns can ultimately be resolved
> with the review of cross-version changes on the source code level,
> e.g.:
> 
>  git clone https://github.com/ClusterLabs/resource-agents.git
>  pushd resource-agents
>  # let's say you start with agents from v3.9.7 release
>  git diff v3.9.7 v4.1.1 -- heartbeat/{Filesystem,ethmonitor,pgsql}
>  popd
> 
>  git clone https://github.com/ClusterLabs/pacemaker.git
>  pushd pacemaker
>  git diff Pacemaker-1.1.14 Pacemaker-1.1.18 -- \
>  extra/resources/{HealthCPU,SysInfo,ping}
>  popd
> 
> It's more like showing how to fish than serving you a meal,
> but hopefully this helps regardless (perhaps even more than
> latter would do).
> 

Yes, that’s exactly what I did. I just double checked.

> 
> [1] https://lists.clusterlabs.org/pipermail/users/2016-June/010635.html
> [2] https://lists.clusterlabs.org/pipermail/users/2017-September/013743.html
> [3] https://github.com/ClusterLabs/OCF-spec/issues/17
> 
> -- 
> Jan (Poki)



Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Christine Caulfield
On 21/06/18 12:05, Jason Gauthier wrote:
> On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield  
> wrote:
>>
>> On 19/06/18 18:47, Jason Gauthier wrote:
>>> On Tue, Jun 19, 2018 at 6:58 AM Christine Caulfield  
>>> wrote:

>>>> On 19/06/18 11:44, Jason Gauthier wrote:
>>>>> On Tue, Jun 19, 2018 at 3:25 AM Christine Caulfield wrote:
>>>>>>
>>>>>> On 19/06/18 02:46, Jason Gauthier wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>>    I've just discovered corosync-qdevice and corosync-qnet.
>>>>>>> (Thanks Ken Gaillot). Set up was pretty quick.
>>>>>>>
>>>>>>> I enabled qnet off cluster.  I followed the steps presented by
>>>>>>> corosync-qdevice-net-certutil.  However, when running
>>>>>>> corosync-qdevice it exits.  Even with -f -d there isn't a single
>>>>>>> output presented.
>>>>>>>
>>>>>>
>>>>>> It sounds like the first time you ran it (without -d -f)
>>>>>> corosync-qdevice started up and daemonised itself. The second time you
>>>>>> tried (with -d -f) it couldn't run because there was already one
>>>>>> running. There's a good argument for it printing an error if it's
>>>>>> already running I think!
>>>>>>
>>>>>
>>>>> The process doesn't stay running.  I've showed in output of qnet below
>>>>> that it launches, connected, and disconnects. I've rebooted several
>>>>> times since then (testing stonith). I can provide strace output if
>>>>> it's helpful.
>>>>>
>>>>
>>>> yes please
>>>
>>> Attached!
>>>
>>
>> That's very odd. I can see communication with the server and corosync in
>> there (so it's doing something) but no logging at all. When I start
>> qdevice on my systems it logs loads of messages even if it doesn't
>> manage to contact the server. Do you have any logging entries in
>> corosync.conf that might be stopping it?
> 
> I haven't checked the corosync logs for any entries before, but I just
> did.  There isn't anything logged.
> 
>> Where did the binary come from? did you build it yourself or is it from
>> a package? I wonder if it's got corrupted or is a bad version. Possibly
>> linked against a 'dodgy' libqb - there have been some things going on
>> there that could cause logging to go missing in some circumstances.
>>
>> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
>> here anyway!
> 
> Hmm. Interesting.  I installed the debian package.  When it didn't
> work, I grabbed the source from github.  They both act the same way,
> but if there is an underlying library issue then that will continue to
> be a problem.
> 
> It doesn't say much:
> /usr/lib/x86_64-linux-gnu/libqb.so.0.18.1
> 
> 

I just tried this on my Debian VM and it does exactly the same as yours.
So I think you should report it to the Debian maintainer as it doesn't
happen on my Fedora or RHEL systems

Chrissie




Re: [ClusterLabs] Resource agents differences from 1.1.14 and 1.1.18

2018-06-21 Thread Jan Pokorný
Hello Salvatore,

On 21/06/18 12:44 +0200, Salvatore D'angelo wrote:
> I am trying to upgrade my PostgreSQL cluster managed by pacemaker
> to pacemaker 1.1.18 or 2.0.0.  I have some resource agents that I
> patched to have them working with my cluster.
> 
> Can someone tell me if something has changed in the OCF interface
> between the 1.1.14 release and 1.1.18/2.0.0?

You can consider the OCF specification/interface stable and no
breakages are really imminent.  There are admittedly some parts with
less than well-defined semantics (if defined at all; for instance,
questions on what's the proper interpretation of "unique" slash
reloadable parameters were raised in the past [1,2]).

This stability is moreover enforced by the requirement of cross
compatibility between various OCF-conformant agent vs. resource
manager implementations (say those maintained in the resource-agents
project vs. pacemaker, plus various versions thereof, without any
a priori defined ways of negotiating any further interface
specifics, but see [3], for instance).

> I am using the following resource agents:
> 
> /usr/lib/ocf/resource.d/heartbeat/Filesystem
> /usr/lib/ocf/resource.d/heartbeat/ethmonitor
> /usr/lib/ocf/resource.d/heartbeat/pgsql (patched)

^ this is really contained within resource-agents project, and as
  mentioned, nothing pushes you to update this piece of software
  even if you intend to update pacemaker (granted, keeping step
  with overall evolutionary "time snapshots" is always wise)

> /usr/lib/ocf/resource.d/pacemaker/HealthCPU (patched)
> /usr/lib/ocf/resource.d/pacemaker/ping (patched)
> /usr/lib/ocf/resource.d/pacemaker/SysInfo (patched)

^ and these are from pacemaker's realms, so there's naturally
  a closer coupling possibly beyond what standard mandates, but
  again, OCF forms a "fixed point", basis upon which the graph
  connecting the functionality user(s) and providers is formed,
  so presumably you can mix and match various versions even if
  the bits come from the very same project

> I am doing some tests to verify this but I would like to know if
> there is something I should be aware of at a high level.

Nothing comes to my mind, though you are always best served by
your own investigation (since you are modifying the agents anyway).

As a rule of thumb, I'd start with checking the changelogs of the
mentioned projects, and deeper concerns can ultimately be resolved
with the review of cross-version changes on the source code level,
e.g.:

  git clone https://github.com/ClusterLabs/resource-agents.git
  pushd resource-agents
  # let's say you start with agents from v3.9.7 release
  git diff v3.9.7 v4.1.1 -- heartbeat/{Filesystem,ethmonitor,pgsql}
  popd

  git clone https://github.com/ClusterLabs/pacemaker.git
  pushd pacemaker
  git diff Pacemaker-1.1.14 Pacemaker-1.1.18 -- \
  extra/resources/{HealthCPU,SysInfo,ping}
  popd

It's more like showing how to fish than serving you a meal,
but hopefully this helps regardless (perhaps even more than
latter would do).


[1] https://lists.clusterlabs.org/pipermail/users/2016-June/010635.html
[2] https://lists.clusterlabs.org/pipermail/users/2017-September/013743.html
[3] https://github.com/ClusterLabs/OCF-spec/issues/17

-- 
Jan (Poki)




Re: [ClusterLabs] corosync doesn't start any resource

2018-06-21 Thread Stefan Krueger
Hi Ken,

> Can you attach the pe-input file listed just above here?
done ;) 

And thank you for your patience!

best regards
Stefan

pre-input-228.bz2
Description: application/bzip


Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Jason Gauthier
On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield  wrote:
>
> On 19/06/18 18:47, Jason Gauthier wrote:
> > On Tue, Jun 19, 2018 at 6:58 AM Christine Caulfield  
> > wrote:
> >>
> >> On 19/06/18 11:44, Jason Gauthier wrote:
> >>> On Tue, Jun 19, 2018 at 3:25 AM Christine Caulfield  
> >>> wrote:
> >>>>
> >>>> On 19/06/18 02:46, Jason Gauthier wrote:
> >>>>> Greetings,
> >>>>>
> >>>>>    I've just discovered corosync-qdevice and corosync-qnet.
> >>>>> (Thanks Ken Gaillot). Set up was pretty quick.
> >>>>>
> >>>>> I enabled qnet off cluster.  I followed the steps presented by
> >>>>> corosync-qdevice-net-certutil.  However, when running
> >>>>> corosync-qdevice it exits.  Even with -f -d there isn't a single
> >>>>> output presented.
> >>>>>
> >>>>
> >>>> It sounds like the first time you ran it (without -d -f)
> >>>> corosync-qdevice started up and daemonised itself. The second time you
> >>>> tried (with -d -f) it couldn't run because there was already one
> >>>> running. There's a good argument for it printing an error if it's
> >>>> already running I think!
> >>>>
> >>>
> >>> The process doesn't stay running.  I've showed in output of qnet below
> >>> that it launches, connected, and disconnects. I've rebooted several
> >>> times since then (testing stonith). I can provide strace output if
> >>> it's helpful.
> >>>
> >>
> >> yes please
> >
> > Attached!
> >
>
> That's very odd. I can see communication with the server and corosync in
> there (so it's doing something) but no logging at all. When I start
> qdevice on my systems it logs loads of messages even if it doesn't
> manage to contact the server. Do you have any logging entries in
> corosync.conf that might be stopping it?

I haven't checked the corosync logs for any entries before, but I just
did.  There isn't anything logged.

> Where did the binary come from? did you build it yourself or is it from
> a package? I wonder if it's got corrupted or is a bad version. Possibly
> linked against a 'dodgy' libqb - there have been some things going on
> there that could cause logging to go missing in some circumstances.
>
> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
> here anyway!

Hmm. Interesting.  I installed the debian package.  When it didn't
work, I grabbed the source from github.  They both act the same way,
but if there is an underlying library issue then that will continue to
be a problem.

It doesn't say much:
/usr/lib/x86_64-linux-gnu/libqb.so.0.18.1


> Chrissie
>


[ClusterLabs] Resource agents differences from 1.1.14 and 1.1.18

2018-06-21 Thread Salvatore D'angelo
Hi all,

I am trying to upgrade my PostgreSQL cluster managed by pacemaker to pacemaker 
1.1.18 or 2.0.0.
I have some resource agents that I patched to have them working with my cluster.

Can someone tell me if something has changed in the OCF interface between the 
1.1.14 release and 1.1.18/2.0.0?
I am using the following resource agents:

/usr/lib/ocf/resource.d/heartbeat/Filesystem
/usr/lib/ocf/resource.d/heartbeat/ethmonitor
/usr/lib/ocf/resource.d/heartbeat/pgsql (patched)
/usr/lib/ocf/resource.d/pacemaker/HealthCPU (patched)
/usr/lib/ocf/resource.d/pacemaker/ping (patched)
/usr/lib/ocf/resource.d/pacemaker/SysInfo (patched)

I am doing some tests to verify this but I would like to know if there is 
something I should be aware of at a high level.
Thanks in advance for your help.


Re: [ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

2018-06-21 Thread Christine Caulfield
On 19/06/18 18:47, Jason Gauthier wrote:
> On Tue, Jun 19, 2018 at 6:58 AM Christine Caulfield  
> wrote:
>>
>> On 19/06/18 11:44, Jason Gauthier wrote:
>>> On Tue, Jun 19, 2018 at 3:25 AM Christine Caulfield  
>>> wrote:

>>>> On 19/06/18 02:46, Jason Gauthier wrote:
>>>>> Greetings,
>>>>>
>>>>>    I've just discovered corosync-qdevice and corosync-qnet.
>>>>> (Thanks Ken Gaillot). Set up was pretty quick.
>>>>>
>>>>> I enabled qnet off cluster.  I followed the steps presented by
>>>>> corosync-qdevice-net-certutil.  However, when running
>>>>> corosync-qdevice it exits.  Even with -f -d there isn't a single
>>>>> output presented.
>>>>>
>>>>
>>>> It sounds like the first time you ran it (without -d -f)
>>>> corosync-qdevice started up and daemonised itself. The second time you
>>>> tried (with -d -f) it couldn't run because there was already one
>>>> running. There's a good argument for it printing an error if it's
>>>> already running I think!
>>>>
>>>
>>> The process doesn't stay running.  I've showed in output of qnet below
>>> that it launches, connected, and disconnects. I've rebooted several
>>> times since then (testing stonith). I can provide strace output if
>>> it's helpful.
>>>
>>
>> yes please
> 
> Attached!
> 

That's very odd. I can see communication with the server and corosync in
there (so it's doing something) but no logging at all. When I start
qdevice on my systems it logs loads of messages even if it doesn't
manage to contact the server. Do you have any logging entries in
corosync.conf that might be stopping it?

Where did the binary come from? did you build it yourself or is it from
a package? I wonder if it's got corrupted or is a bad version. Possibly
linked against a 'dodgy' libqb - there have been some things going on
there that could cause logging to go missing in some circumstances.

Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
here anyway!

Chrissie
