Illumos might have getpeerucred, which can also set errno to ENOTSUP.

On Sun, Jul 26, 2020 at 3:25 AM Reid Wahl <nw...@redhat.com> wrote:
> Hmm. If it's reading PCMK_ipc_type and matching the server type to
> QB_IPC_SOCKET, then the only other place I see it could be coming from is
> qb_ipc_auth_creds.
>
> qb_ipcs_run -> qb_ipcs_us_publish -> qb_ipcs_us_connection_acceptor ->
> qb_ipcs_uc_recv_and_auth -> process_auth -> qb_ipc_auth_creds ->
>
> static int32_t
> qb_ipc_auth_creds(struct ipc_auth_data *data)
> {
> ...
> #ifdef HAVE_GETPEERUCRED
> /*
>  * Solaris and some BSD systems
> ...
> #elif defined(HAVE_GETPEEREID)
> /*
>  * Usually MacOSX systems
> ...
> #elif defined(SO_PASSCRED)
> /*
>  * Usually Linux systems
> ...
> #else /* no credentials */
> data->ugp.pid = 0;
> data->ugp.uid = 0;
> data->ugp.gid = 0;
> res = -ENOTSUP;
> #endif /* no credentials */
>
> return res;
>
> I'll leave it to Ken to say whether that's likely and what it implies if
> so.
>
> On Sun, Jul 26, 2020 at 2:53 AM Gabriele Bulfon <gbul...@sonicle.com>
> wrote:
>
>> Sorry, actually the problem is not gone yet.
>> Now corosync and pacemaker are running happily, but those IPC errors are
>> coming out of heartbeat and crmd as soon as I start it.
>> The pacemakerd process has PCMK_ipc_type=socket, what's wrong with
>> heartbeat or crmd?
>>
>> Here's the env of the process:
>>
>> sonicle@xstorage1:/sonicle/etc/cluster/ha.d# penv 4222
>> 4222: /usr/sbin/pacemakerd
>> envp[0]: PCMK_respawned=true
>> envp[1]: PCMK_watchdog=false
>> envp[2]: HA_LOGFACILITY=none
>> envp[3]: HA_logfacility=none
>> envp[4]: PCMK_logfacility=none
>> envp[5]: HA_logfile=/sonicle/var/log/cluster/corosync.log
>> envp[6]: PCMK_logfile=/sonicle/var/log/cluster/corosync.log
>> envp[7]: HA_debug=0
>> envp[8]: PCMK_debug=0
>> envp[9]: HA_quorum_type=corosync
>> envp[10]: PCMK_quorum_type=corosync
>> envp[11]: HA_cluster_type=corosync
>> envp[12]: PCMK_cluster_type=corosync
>> envp[13]: HA_use_logd=off
>> envp[14]: PCMK_use_logd=off
>> envp[15]: HA_mcp=true
>> envp[16]: PCMK_mcp=true
>> envp[17]: HA_LOGD=no
>> envp[18]: LC_ALL=C
>> envp[19]: PCMK_service=pacemakerd
>> envp[20]: PCMK_ipc_type=socket
>> envp[21]: SMF_ZONENAME=global
>> envp[22]: PWD=/
>> envp[23]: SMF_FMRI=svc:/sonicle/xstream/cluster/pacemaker:default
>> envp[24]: _=/usr/sbin/pacemakerd
>> envp[25]: TZ=Europe/Rome
>> envp[26]: LANG=en_US.UTF-8
>> envp[27]: SMF_METHOD=start
>> envp[28]: SHLVL=2
>> envp[29]: PATH=/usr/sbin:/usr/bin
>> envp[30]: SMF_RESTARTER=svc:/system/svc/restarter:default
>> envp[31]: A__z="*SHLVL
>>
>>
>> Here are crmd complaints:
>>
>> Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: Node xstorage1 state is now member
>> Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Could not start crmd IPC server: Operation not supported (-48)
>> Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Failed to create IPC server: shutting down and inhibiting respawn
>> Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: The local CRM is operational
>> Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Input I_ERROR received in state S_STARTING from do_started
>> Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: State transition S_STARTING -> S_RECOVERY
>> Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.warning] warning: Fast-tracking shutdown in response to errors
>> Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.warning] warning: Input I_PENDING received in state S_RECOVERY from do_started
>> Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Input I_TERMINATE received in state S_RECOVERY from do_recover
>> Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: Disconnected from the LRM
>> Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Child process pengine exited (pid=4316, rc=100)
>> Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Could not recover from internal error
>> Jul 26 11:39:07 xstorage1 heartbeat: [ID 996084 daemon.warning] [4275]: WARN: Managed /usr/libexec/pacemaker/crmd process 4315 exited with return code 201.
>>
>>
>> *Sonicle S.r.l. *: http://www.sonicle.com
>> *Music: *http://www.gabrielebulfon.com
>> *Quantum Mechanics : *http://www.cdbaby.com/cd/gabrielebulfon
>>
>>
>> ----------------------------------------------------------------------------------
>>
>> From: Ken Gaillot <kgail...@redhat.com>
>> To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>> Date: 25 July 2020, 0:46:52 CEST
>> Subject: Re: [ClusterLabs] pacemaker startup problem
>>
>> On Fri, 2020-07-24 at 18:34 +0200, Gabriele Bulfon wrote:
>> > Hello,
>> >
>> > after a long time I'm back to run heartbeat/pacemaker/corosync on our
>> > XStreamOS/illumos distro.
>> > I rebuilt the original components I did in 2016 on our latest release
>> > (probably a bit outdated, but I want to start from where I left).
>> > Looks like pacemaker is having trouble starting up, showing these logs:
>> >
>> > Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
>> > Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
>> > Jul 24 18:21:32 [971] crmd: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
>> > Jul 24 18:21:32 [971] crmd: info: main: CRM Git Version: 1.1.15 (e174ec8)
>> > Jul 24 18:21:32 [971] crmd: info: do_log: Input I_STARTUP received in state S_STARTING from crmd_init
>> > Jul 24 18:21:32 [969] lrmd: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
>> > Jul 24 18:21:32 [968] stonith-ng: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
>> > Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Verifying cluster type: 'heartbeat'
>> > Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Assuming an active 'heartbeat' cluster
>> > Jul 24 18:21:32 [968] stonith-ng: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
>>
>>
>> > Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Operation not supported (-48)
>>
>> This is repeated for all the subdaemons ... the error is coming from
>> qb_ipcs_run(), so the issue looks like an invalid PCMK_ipc_type
>> for illumos. If you set it to "socket" it should work.
>>
>>
>> > Jul 24 18:21:32 [969] lrmd: error: main: Failed to create IPC server: shutting down and inhibiting respawn
>> > Jul 24 18:21:32 [969] lrmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
>> > Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Verifying cluster type: 'heartbeat'
>> > Jul 24 18:21:32 [971] crmd: info: get_cluster_type: Assuming an active 'heartbeat' cluster
>> > Jul 24 18:21:32 [971] crmd: info: start_subsystem: Starting sub-system "pengine"
>> > Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Created entry 25bc5492-a49e-40d7-ae60-fd8f975a294a/80886f0 for node xstorage1/0 (1 total)
>> > Jul 24 18:21:32 [968] stonith-ng: info: crm_get_peer: Node 0 has uuid d426a730-5229-6758-853a-99d4d491514a
>> > Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn: Hostname: xstorage1
>> > Jul 24 18:21:32 [968] stonith-ng: info: register_heartbeat_conn: UUID: d426a730-5229-6758-853a-99d4d491514a
>> > Jul 24 18:21:32 [970] attrd: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
>> > Jul 24 18:21:32 [970] attrd: error: mainloop_add_ipc_server: Could not start attrd IPC server: Operation not supported (-48)
>> > Jul 24 18:21:32 [970] attrd: error: attrd_ipc_server_init: Failed to create attrd servers: exiting and inhibiting respawn.
>> > Jul 24 18:21:32 [970] attrd: warning: attrd_ipc_server_init: Verify pacemaker and pacemaker_remote are not both enabled.
>> > Jul 24 18:21:32 [972] pengine: info: crm_log_init: Changed active directory to /sonicle/var/cluster/lib/pacemaker/cores
>> > Jul 24 18:21:32 [972] pengine: error: mainloop_add_ipc_server: Could not start pengine IPC server: Operation not supported (-48)
>> > Jul 24 18:21:32 [972] pengine: error: main: Failed to create IPC server: shutting down and inhibiting respawn
>> > Jul 24 18:21:32 [972] pengine: info: crm_xml_cleanup: Cleaning up memory from libxml2
>> > Jul 24 18:21:33 [971] crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
>> > Jul 24 18:21:33 [971] crmd: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
>> > Jul 24 18:21:33 [971] crmd: error: crmd_child_exit: Child process pengine exited (pid=972, rc=100)
>> > Jul 24 18:21:35 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
>> > Jul 24 18:21:36 [971] crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
>> > Jul 24 18:21:36 [971] crmd: warning: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
>> > Jul 24 18:21:38 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
>> > Jul 24 18:21:39 [971] crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
>> > Jul 24 18:21:39 [971] crmd: warning: do_cib_control: Couldn't complete CIB registration 3 times... pause and retry
>> > Jul 24 18:21:41 [971] crmd: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
>> > Jul 24 18:21:42 [971] crmd: info: do_cib_control: Could not connect to the CIB service: Transport endpoint is not connected
>> > Jul 24 18:21:42 [971] crmd: warning: do_cib_control: Couldn't complete CIB registration 4 times... pause and retry
>> > Jul 24 18:21:42 [968] stonith-ng: error: setup_cib: Could not connect to the CIB service: Transport endpoint is not connected (-134)
>> > Jul 24 18:21:42 [968] stonith-ng: error: mainloop_add_ipc_server: Could not start stonith-ng IPC server: Operation not supported (-48)
>> > Jul 24 18:21:42 [968] stonith-ng: error: stonith_ipc_server_init: Failed to create stonith-ng servers: exiting and inhibiting respawn.
>> > Jul 24 18:21:42 [968] stonith-ng: warning: stonith_ipc_server_init: Verify pacemaker and pacemaker_remote are not both enabled.
>> >
>> > Any idea what's happening?
>> > Gabriele
>> >
>> >
>> > Sonicle S.r.l. : http://www.sonicle.com
>> > Music: http://www.gabrielebulfon.com
>> > Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon
>> > _______________________________________________
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/
>> --
>> Ken Gaillot <kgail...@redhat.com>
>>
>
>
> --
> Regards,
>
> Reid Wahl, RHCA
> Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA
>

--
Regards,

Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
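
P.S. For anyone following along, the compile-time fall-through quoted above from qb_ipc_auth_creds can be sketched as a small standalone function. This is only an illustration under stated assumptions, not libqb's code: peer_creds_lookup and struct peer_creds are made-up names, HAVE_GETPEERUCRED and HAVE_GETPEEREID are assumed to come from a configure step, and the Linux branch uses SO_PEERCRED as a stand-in because the quoted source elides the body of its SO_PASSCRED branch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc only exposes struct ucred with this */
#endif
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#ifdef HAVE_GETPEERUCRED
#include <ucred.h>             /* illumos/Solaris getpeerucred(3C) */
#endif

/* Hypothetical result type; libqb keeps these fields in data->ugp. */
struct peer_creds {
    pid_t pid;
    uid_t uid;
    gid_t gid;
};

/*
 * Hypothetical helper: look up the credentials of the peer connected to
 * the Unix socket fd. Returns 0 on success or a negative errno value.
 * Exactly one branch is compiled in, chosen at build time.
 */
int peer_creds_lookup(int fd, struct peer_creds *out)
{
    assert(out != NULL);
    out->pid = 0;
    out->uid = 0;
    out->gid = 0;

#if defined(HAVE_GETPEERUCRED)
    /* illumos/Solaris: any errno set by getpeerucred() is passed through. */
    ucred_t *uc = NULL;
    if (getpeerucred(fd, &uc) != 0) {
        return -errno;
    }
    out->pid = ucred_getpid(uc);
    out->uid = ucred_geteuid(uc);
    out->gid = ucred_getegid(uc);
    ucred_free(uc);
    return 0;
#elif defined(HAVE_GETPEEREID)
    /* BSD/macOS: getpeereid() reports uid and gid only, so pid stays 0. */
    if (getpeereid(fd, &out->uid, &out->gid) != 0) {
        return -errno;
    }
    return 0;
#elif defined(SO_PEERCRED)
    /* Linux: ask the kernel for the credentials of the connected peer. */
    struct ucred cred;
    socklen_t len = sizeof(cred);
    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0) {
        return -errno;
    }
    out->pid = cred.pid;
    out->uid = cred.uid;
    out->gid = cred.gid;
    return 0;
#else
    /* No credential mechanism was detected at build time. */
    (void)fd;
    return -ENOTSUP;
#endif
}
```

With this shape there are two ways to end up with -ENOTSUP: the build detected none of the mechanisms, so the last branch is the only one compiled in, or getpeerucred itself fails and its errno is passed through. Either would be consistent with the "Operation not supported (-48)" messages in the logs, so checking which HAVE_* macros the 2016 build actually defined may help narrow it down.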