Re: [ClusterLabs] pacemaker startup problem

2020-07-27 Thread Gabriele Bulfon
Solved this: it turns out I don't need the heartbeat component and service running at all.
I just use corosync and pacemaker, and this seems to work.
Now moving on with the crm configuration.
 
Thanks!
Gabriele
 
 
Sonicle S.r.l.: http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics: http://www.cdbaby.com/cd/gabrielebulfon

Re: [ClusterLabs] pacemaker startup problem

2020-07-26 Thread Reid Wahl
Illumos might have getpeerucred, which can also set errno to ENOTSUP.
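As a quick check, here is a minimal standalone probe (a sketch for illumos/Solaris, assuming the <ucred.h> interface; not pacemaker code) that shows whether getpeerucred() itself fails with ENOTSUP on a plain AF_UNIX socket:

/* probe.c -- build with: cc probe.c -o probe (illumos/Solaris only) */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/socket.h>
#include <ucred.h>

int main(void)
{
    int fds[2];
    ucred_t *uc = NULL;   /* allocated by getpeerucred(), freed below */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        perror("socketpair");
        return 1;
    }
    if (getpeerucred(fds[0], &uc) < 0) {
        /* ENOTSUP here would line up with the -48 in the pacemaker logs */
        printf("getpeerucred: %s (errno %d)\n", strerror(errno), errno);
        return 1;
    }
    printf("peer uid=%d gid=%d pid=%d\n",
           (int)ucred_geteuid(uc), (int)ucred_getegid(uc),
           (int)ucred_getpid(uc));
    ucred_free(uc);
    return 0;
}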


Re: [ClusterLabs] pacemaker startup problem

2020-07-26 Thread Reid Wahl
Hmm. If it's reading PCMK_ipc_type and matching the server type to
QB_IPC_SOCKET, then the only other place I see it could be coming from is
qb_ipc_auth_creds.

qb_ipcs_run -> qb_ipcs_us_publish -> qb_ipcs_us_connection_acceptor ->
qb_ipcs_uc_recv_and_auth -> process_auth -> qb_ipc_auth_creds ->

static int32_t
qb_ipc_auth_creds(struct ipc_auth_data *data)
{
...
#ifdef HAVE_GETPEERUCRED
        /*
         * Solaris and some BSD systems
...
#elif defined(HAVE_GETPEEREID)
        /*
         * Usually MacOSX systems
...
#elif defined(SO_PASSCRED)
        /*
         * Usually Linux systems
...
#else /* no credentials */
        data->ugp.pid = 0;
        data->ugp.uid = 0;
        data->ugp.gid = 0;
        res = -ENOTSUP;
#endif /* no credentials */

        return res;
}

I'll leave it to Ken to say whether that's likely and what it implies if so.
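If it helps narrow it down, here is a tiny compile-time reporter that mirrors the same #ifdef ladder (only a sketch; HAVE_GETPEERUCRED and HAVE_GETPEEREID come from libqb's config.h, so compile it against libqb's config.h or the equivalent -D flags for a meaningful answer):

#include <stdio.h>
#include <sys/socket.h>   /* SO_PASSCRED, where the platform defines it */

int main(void)
{
#ifdef HAVE_GETPEERUCRED
    puts("auth path: getpeerucred (Solaris/illumos, some BSDs)");
#elif defined(HAVE_GETPEEREID)
    puts("auth path: getpeereid (usually MacOSX)");
#elif defined(SO_PASSCRED)
    puts("auth path: SO_PASSCRED (usually Linux)");
#else
    puts("no credential path: qb_ipc_auth_creds() returns -ENOTSUP");
#endif
    return 0;
}

If libqb's configure did not detect getpeerucred on this illumos build, every connection attempt falls into that last branch.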

Re: [ClusterLabs] pacemaker startup problem

2020-07-26 Thread Gabriele Bulfon
Sorry, actually the problem is not gone yet.
Corosync and pacemaker are now running happily, but those IPC errors come out 
of heartbeat and crmd as soon as I start it.
The pacemakerd process has PCMK_ipc_type=socket; what's wrong with heartbeat 
or crmd?
 
Here's the env of the process:
 
sonicle@xstorage1:/sonicle/etc/cluster/ha.d# penv 4222
4222: /usr/sbin/pacemakerd
envp[0]: PCMK_respawned=true
envp[1]: PCMK_watchdog=false
envp[2]: HA_LOGFACILITY=none
envp[3]: HA_logfacility=none
envp[4]: PCMK_logfacility=none
envp[5]: HA_logfile=/sonicle/var/log/cluster/corosync.log
envp[6]: PCMK_logfile=/sonicle/var/log/cluster/corosync.log
envp[7]: HA_debug=0
envp[8]: PCMK_debug=0
envp[9]: HA_quorum_type=corosync
envp[10]: PCMK_quorum_type=corosync
envp[11]: HA_cluster_type=corosync
envp[12]: PCMK_cluster_type=corosync
envp[13]: HA_use_logd=off
envp[14]: PCMK_use_logd=off
envp[15]: HA_mcp=true
envp[16]: PCMK_mcp=true
envp[17]: HA_LOGD=no
envp[18]: LC_ALL=C
envp[19]: PCMK_service=pacemakerd
envp[20]: PCMK_ipc_type=socket
envp[21]: SMF_ZONENAME=global
envp[22]: PWD=/
envp[23]: SMF_FMRI=svc:/sonicle/xstream/cluster/pacemaker:default
envp[24]: _=/usr/sbin/pacemakerd
envp[25]: TZ=Europe/Rome
envp[26]: LANG=en_US.UTF-8
envp[27]: SMF_METHOD=start
envp[28]: SHLVL=2
envp[29]: PATH=/usr/sbin:/usr/bin
envp[30]: SMF_RESTARTER=svc:/system/svc/restarter:default
envp[31]: A__z="*SHLVL
 
 
Here are crmd complaints:
 
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: Node 
xstorage1 state is now member
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Could not 
start crmd IPC server: Operation not supported (-48)
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Failed to 
create IPC server: shutting down and inhibiting respawn
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: The 
local CRM is operational
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Input 
I_ERROR received in state S_STARTING from do_started
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: State 
transition S_STARTING -> S_RECOVERY
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.warning] warning: 
Fast-tracking shutdown in response to errors
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.warning] warning: Input 
I_PENDING received in state S_RECOVERY from do_started
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Input 
I_TERMINATE received in state S_RECOVERY from do_recover
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: 
Disconnected from the LRM
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Child 
process pengine exited (pid=4316, rc=100)
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Could not 
recover from internal error
Jul 26 11:39:07 xstorage1 heartbeat: [ID 996084 daemon.warning] [4275]: WARN: 
Managed /usr/libexec/pacemaker/crmd process 4315 exited with return code 201.
 
 

Re: [ClusterLabs] pacemaker startup problem

2020-07-26 Thread Gabriele Bulfon
Sorry, I was using the wrong hostnames for those networks; using the debug log I 
found it was not finding "this node" in the conf file.
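For anyone hitting the same thing: corosync matches each nodelist ring0_addr against the local interfaces, so a name that doesn't resolve to an address on the ring network means "this node" is never found. A quick resolver check along these lines (a hypothetical little helper, not part of corosync) would have caught it:

/* resolve.c -- usage: ./resolve xstorage1 */
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
    struct addrinfo hints, *res, *p;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <ring0_addr>\n", argv[0]);
        return 2;
    }
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;     /* the ring here is IPv4 (10.100.100.0/24) */
    hints.ai_socktype = SOCK_DGRAM;

    int rc = getaddrinfo(argv[1], NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "%s: %s\n", argv[1], gai_strerror(rc));
        return 1;
    }
    for (p = res; p != NULL; p = p->ai_next) {
        char buf[INET_ADDRSTRLEN];
        struct sockaddr_in *sin = (struct sockaddr_in *)p->ai_addr;
        inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf));
        printf("%s -> %s\n", argv[1], buf);   /* should be on the ring subnet */
    }
    freeaddrinfo(res);
    return 0;
}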
 
Gabriele
 
 

Re: [ClusterLabs] pacemaker startup problem

2020-07-26 Thread Gabriele Bulfon
Thanks. I had run it manually, which is why I got those errors; run from the 
service script, it correctly sets PCMK_ipc_type to socket.
 
But now I see this:
Jul 26 11:08:16 [4039] pacemakerd: info: crm_log_init: Changed active directory 
to /sonicle/var/cluster/lib/pacemaker/cores
Jul 26 11:08:16 [4039] pacemakerd: info: mcp_read_config: cmap connection setup 
failed: CS_ERR_LIBRARY. Retrying in 1s
Jul 26 11:08:17 [4039] pacemakerd: info: mcp_read_config: cmap connection setup 
failed: CS_ERR_LIBRARY. Retrying in 2s
Jul 26 11:08:19 [4039] pacemakerd: info: mcp_read_config: cmap connection setup 
failed: CS_ERR_LIBRARY. Retrying in 3s
Jul 26 11:08:22 [4039] pacemakerd: info: mcp_read_config: cmap connection setup 
failed: CS_ERR_LIBRARY. Retrying in 4s
Jul 26 11:08:26 [4039] pacemakerd: info: mcp_read_config: cmap connection setup 
failed: CS_ERR_LIBRARY. Retrying in 5s
Jul 26 11:08:31 [4039] pacemakerd: warning: mcp_read_config: Could not connect 
to Cluster Configuration Database API, error 2
Jul 26 11:08:31 [4039] pacemakerd: notice: main: Could not obtain corosync 
config data, exiting
Jul 26 11:08:31 [4039] pacemakerd: info: crm_xml_cleanup: Cleaning up memory 
from libxml2
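For what it's worth, connectivity to corosync's cmap service can be checked outside pacemaker with a few lines against the corosync 2.x cmap API (a sketch; link with -lcmap). CS_ERR_LIBRARY is the status pacemakerd is retrying on, and the final "error 2" in the log appears to be the same code:

#include <stdio.h>
#include <corosync/cmap.h>

int main(void)
{
    cmap_handle_t handle;
    cs_error_t err = cmap_initialize(&handle);  /* fails unless corosync is up */

    if (err != CS_OK) {
        printf("cmap_initialize failed: %d (CS_ERR_LIBRARY = %d)\n",
               (int)err, (int)CS_ERR_LIBRARY);
        return 1;
    }
    puts("connected to corosync cmap");
    cmap_finalize(handle);
    return 0;
}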
 
So I think I need to start corosync first (right?), but it dies with this:
 
Jul 26 11:07:06 [4027] xstorage1 corosync notice [MAIN ] Corosync Cluster 
Engine ('2.4.1'): started and ready to provide service.
Jul 26 11:07:06 [4027] xstorage1 corosync info [MAIN ] Corosync built-in 
features: bindnow
Jul 26 11:07:06 [4027] xstorage1 corosync notice [TOTEM ] Initializing 
transport (UDP/IP Multicast).
Jul 26 11:07:06 [4027] xstorage1 corosync notice [TOTEM ] Initializing 
transmit/receive security (NSS) crypto: none hash: none
Jul 26 11:07:06 [4027] xstorage1 corosync notice [TOTEM ] The network interface 
[10.100.100.1] is now up.
Jul 26 11:07:06 [4027] xstorage1 corosync notice [SERV ] Service engine loaded: 
corosync configuration map access [0]
Jul 26 11:07:06 [4027] xstorage1 corosync notice [YKD ] Service engine loaded: 
corosync configuration service [1]
Jul 26 11:07:06 [4027] xstorage1 corosync notice [YKD ] Service engine loaded: 
corosync cluster closed process group service v1.01 [2]
Jul 26 11:07:06 [4027] xstorage1 corosync notice [YKD ] Service engine loaded: 
corosync profile loading service [4]
Jul 26 11:07:06 [4027] xstorage1 corosync notice [QUORUM] Using quorum provider 
corosync_votequorum
Jul 26 11:07:06 [4027] xstorage1 corosync crit [QUORUM] Quorum provider: 
corosync_votequorum failed to initialize.
Jul 26 11:07:06 [4027] xstorage1 corosync error [SERV ] Service engine 
'corosync_quorum' failed to load for reason 'configuration error: nodelist or 
quorum.expected_votes must be configured!'
Jul 26 11:07:06 [4027] xstorage1 corosync error [MAIN ] Corosync Cluster Engine 
exiting with status 20 at 
/data/sources/sonicle/xstream-storage-gate/components/cluster/corosync/corosync-2.4.1/exec/service.c:356.
My corosync conf has nodelist configured! Here it is:
 
service {
        ver: 1
        name: pacemaker
        use_mgmtd: no
        use_logd: no
}
totem {
        version: 2
        crypto_cipher: none
        crypto_hash: none
        interface {
                ringnumber: 0
                bindnetaddr: 10.100.100.0
                mcastaddr: 239.255.1.1
                mcastport: 5405
                ttl: 1
        }
}
nodelist {
        node {
                ring0_addr: xstorage1
                nodeid: 1
        }
        node {
                ring0_addr: xstorage2
                nodeid: 2
        }
}
quorum {
        provider: corosync_votequorum
        two_node: 1
}
logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /sonicle/var/log/cluster/corosync.log
        to_syslog: no
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}
 
 
 
 

Re: [ClusterLabs] pacemaker startup problem

2020-07-24 Thread Ken Gaillot
On Fri, 2020-07-24 at 18:34 +0200, Gabriele Bulfon wrote:
> Hello,
>  
> after a long time I'm back to run heartbeat/pacemaker/corosync on our
> XStreamOS/illumos distro.
> I rebuilt the original components I did in 2016 on our latest release
> (probably a bit outdated, but I want to start from where I left).
> Looks like pacemaker is having trouble starting up, showing these logs:
> 
> Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
> Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
> Jul 24 18:21:32 [971] crmd: info: crm_log_init: Changed active
> directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [971] crmd: info: main: CRM Git Version: 1.1.15
> (e174ec8)
> Jul 24 18:21:32 [971] crmd: info: do_log: Input I_STARTUP received in
> state S_STARTING from crmd_init
> Jul 24 18:21:32 [969] lrmd: info: crm_log_init: Changed active
> directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [968] stonith-ng: info: crm_log_init: Changed active
> directory to /sonicle/var/cluster/lib/pacemaker/cores
> Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Verifying
> cluster type: 'heartbeat'
> Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Assuming an
> active 'heartbeat' cluster
> Jul 24 18:21:32 [968] stonith-ng: notice: crm_cluster_connect:
> Connecting to cluster infrastructure: heartbeat


> Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not
> start lrmd IPC server: Operation not supported (-48)

This is repeated for all the subdaemons ... the error is coming from
qb_ipcs_run(), which looks like the issue is an invalid PCMK_ipc_type
for illumos. If you set it to "socket" it should work.
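Roughly, pacemaker maps PCMK_ipc_type onto libqb's server types before it creates any IPC server; a sketch of that mapping (not a verbatim copy of pacemaker's code) looks like:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <qb/qbipc_common.h>   /* enum qb_ipc_type */

/* Sketch of the PCMK_ipc_type -> libqb mapping, going by the values the
 * pacemaker 1.1 documentation lists ("socket", "shared-mem", ...). */
static enum qb_ipc_type
pick_ipc_type_sketch(void)
{
    const char *type = getenv("PCMK_ipc_type");

    if (type == NULL || *type == '\0') {
        return QB_IPC_NATIVE;        /* let libqb choose a platform default */
    }
    if (strcmp(type, "socket") == 0) {
        return QB_IPC_SOCKET;        /* the value that works on illumos */
    }
    if (strcmp(type, "shared-mem") == 0 || strcmp(type, "shm") == 0) {
        return QB_IPC_SHM;
    }
    return QB_IPC_NATIVE;
}

int main(void)
{
    printf("libqb ipc type: %d\n", (int)pick_ipc_type_sketch());
    return 0;
}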



[ClusterLabs] pacemaker startup problem

2020-07-24 Thread Gabriele Bulfon
Hello,
 
after a long time I'm back to running heartbeat/pacemaker/corosync on our 
XStreamOS/illumos distro.
I rebuilt the original components I did in 2016 on our latest release (probably 
a bit outdated, but I want to start from where I left off).
Looks like pacemaker is having trouble starting up, showing these logs:
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Jul 24 18:21:32 [971] crmd: info: crm_log_init: Changed active directory to 
/sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [971] crmd: info: main: CRM Git Version: 1.1.15 (e174ec8)
Jul 24 18:21:32 [971] crmd: info: do_log: Input I_STARTUP received in state 
S_STARTING from crmd_init
Jul 24 18:21:32 [969] lrmd: info: crm_log_init: Changed active directory to 
/sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: crm_log_init: Changed active directory 
to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Verifying cluster 
type: 'heartbeat'
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Assuming an active 
'heartbeat' cluster
Jul 24 18:21:32 [968] stonith-ng: notice: crm_cluster_connect: Connecting to 
cluster infrastructure: heartbeat
Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not start 
lrmd IPC server: Operation not supported (-48)
Jul 24 18:21:32 [969] lrmd: error: main: Failed to create IPC server: shutting 
down and inhibiting respawn
Jul 24 18:21:32 [969] lrmd: info: crm_xml_cleanup: Cleaning up memory from 
libxml2
Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Verifying cluster type: 
'heartbeat'
Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Assuming an active 
'heartbeat' cluster
Jul 24 18:21:32 [971] crmd: info: start_subsystem: Starting sub-system "pengine"
Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Created entry 
25bc5492-a49e-40d7-ae60-fd8f975a294a/80886f0 for node xstorage1/0 (1 total)
Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Node 0 has uuid 
d426a730-5229-6758-853a-99d4d491514a
Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn: Hostname: 
xstorage1
Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn: UUID: 
d426a730-5229-6758-853a-99d4d491514a
Jul 24 18:21:32 [970] attrd: notice: crm_cluster_connect: Connecting to cluster 
infrastructure: heartbeat
Jul 24 18:21:32 [970] attrd: error: mainloop_add_ipc_server: Could not start 
attrd IPC server: Operation not supported (-48)
Jul 24 18:21:32 [970] attrd: error: attrd_ipc_server_init: Failed to create 
attrd servers: exiting and inhibiting respawn.
Jul 24 18:21:32 [970] attrd: warning: attrd_ipc_server_init: Verify pacemaker 
and pacemaker_remote are not both enabled.
Jul 24 18:21:32 [972] pengine: info: crm_log_init: Changed active directory to 
/sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [972] pengine: error: mainloop_add_ipc_server: Could not start 
pengine IPC server: Operation not supported (-48)
Jul 24 18:21:32 [972] pengine: error: main: Failed to create IPC server: 
shutting down and inhibiting respawn
Jul 24 18:21:32 [972] pengine: info: crm_xml_cleanup: Cleaning up memory from 
libxml2
Jul 24 18:21:33 [971] crmd: info: do_cib_control: Could not connect to the CIB 
service: Transport endpoint is not connected
Jul 24 18:21:33 [971] crmd: warning: do_cib_control: Couldn't complete CIB 
registration 1 times... pause and retry
Jul 24 18:21:33 [971] crmd: error: crmd_child_exit: Child process pengine 
exited (pid=972, rc=100)
Jul 24 18:21:35 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just 
popped (2000ms)
Jul 24 18:21:36 [971] crmd: info: do_cib_control: Could not connect to the CIB 
service: Transport endpoint is not connected
Jul 24 18:21:36 [971] crmd: warning: do_cib_control: Couldn't complete CIB 
registration 2 times... pause and retry
Jul 24 18:21:38 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just 
popped (2000ms)
Jul 24 18:21:39 [971] crmd: info: do_cib_control: Could not connect to the CIB 
service: Transport endpoint is not connected
Jul 24 18:21:39 [971] crmd: warning: do_cib_control: Couldn't complete CIB 
registration 3 times... pause and retry
Jul 24 18:21:41 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just 
popped (2000ms)
Jul 24 18:21:42 [971] crmd: info: do_cib_control: Could not connect to the CIB 
service: Transport endpoint is not connected
Jul 24 18:21:42 [971] crmd: warning: do_cib_control: Couldn't complete CIB 
registration 4 times... pause and retry
Jul 24 18:21:42 [968] stonith-ng: error: setup_cib: Could not connect to the 
CIB service: Transport endpoint is not connected (-134)
Jul 24 18:21:42 [968] stonith-ng: error: mainloop_add_ipc_server: Could not 
start stonith-ng IPC server: Operation not supported (-48)
Jul 24 18:21:42 [968] stonith-ng: error: stonith_ipc_server_init: Failed to 
create stonith-ng servers: exiting and inhibiting respawn.
Jul