Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-03-30 Thread Ken Gaillot
On Wed, 2020-02-19 at 18:21 +0100, Maverick wrote:
> How is it possible that pacemaker reports that it takes 4.2 minutes
> (254930ms) to start the httpd systemd unit?

Sorry I didn't get a chance to look into this sooner.

Fedora 31 introduced a change where the ftime() call that pacemaker had
been using for operation timing was no longer available. We implemented
clock_gettime()-based timing in a rush because it happened right before
the release of 2.0.3. We enabled that code only for systems like Fedora
31 that didn't support ftime().

The clock_gettime()-based code turned out to have a bug that was
recently fixed. The fixes will be in 2.0.4 (the first release candidate
should come out in a couple of weeks) which will then be packaged for
Fedora 31 and 32.
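
To tell whether a node already has the fixed release, something like the
following could be used. It is only a sketch: the helper name
version_at_least is made up for illustration, it assumes pacemaker was
installed as an RPM package and that GNU sort (with -V) is available, and a
release-candidate or patched distribution build may report its version
differently.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Relies on GNU sort's -V (version sort) option.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: pacemaker was installed from an RPM package, as on Fedora.
# Fall back to "0" when rpm or the package is missing.
installed=$(rpm -q --queryformat '%{VERSION}' pacemaker 2>/dev/null) || installed=0

if version_at_least "$installed" 2.0.4; then
    echo "pacemaker $installed should include the timing fixes"
else
    echo "pacemaker $installed predates 2.0.4; systemd timeouts may be misreported"
fi
```

The comparison uses version sort rather than string comparison so that, for
example, 2.0.10 is treated as newer than 2.0.4.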

> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (log_execute) info: executing - rsc:apache action:start call_id:25
> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec) debug: Performing asynchronous start op on systemd unit httpd named 'apache'
> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec_with_unit) debug: Calling StartUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (action_complete) notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms, remaining=-154930ms)
> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (log_finished) debug: finished - rsc:apache action:monitor call_id:25  exit-code:198 exec-time:254935ms queue-time:235ms
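
The log timestamps are only one second apart (17:04:09 to 17:04:10), so the
elapsed value cannot be real. The remaining value is at least consistent
with it: it looks like the configured start timeout minus the bogus elapsed
time. A quick check of that arithmetic, assuming the timeout=100 from the
resource configuration is in seconds:

```shell
#!/bin/sh
# Values copied from the log above and from the resource configuration.
timeout_s=100        # start timeout (assumed to be in seconds)
elapsed_ms=254930    # elapsed= value reported by action_complete

timeout_ms=$((timeout_s * 1000))
remaining_ms=$((timeout_ms - elapsed_ms))
echo "remaining=${remaining_ms}ms"

# Elapsed time in minutes, two decimal places, via awk for floating point.
elapsed_min=$(LC_ALL=C awk -v ms="$elapsed_ms" 'BEGIN { printf "%.2f", ms / 60000 }')
echo "elapsed=${elapsed_min} minutes"
```

This reproduces the remaining=-154930ms from the log and shows where the
roughly 4.2 minutes in the question come from.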
> 
> 
> Starting manually works fine and fast:
> 
> # time systemctl start httpd
> real    0m0.144s
> user    0m0.005s
> sys     0m0.008s
> 
> 
> On 17/02/2020 22:47, Mvrk wrote:
> > The pacemaker.log is attached. In the log I can see that the
> > cluster tries to start, the start fails, then it tries to stop, and
> > the stop also fails.
> > 
> > One more thing: my cluster was working fine on Fedora 28; I started
> > having this problem after upgrading to Fedora 31.
> > 
> > On 17/02/2020 21:30, Ricardo Esteves wrote:
> > > Hi,
> > > 
> > > Yes, I also don't understand why it is trying to stop them first.
> > > 
> > > SELinux is disabled:
> > > 
> > > # getenforce
> > > Disabled
> > > 
> > > All systemd services controlled by the cluster are disabled from
> > > starting at boot:
> > > 
> > > # systemctl is-enabled httpd
> > > disabled
> > > 
> > > # systemctl is-enabled openvpn-server@01-server
> > > disabled
> > > 
> > > 
> > > On 17/02/2020 20:28, Ken Gaillot wrote:
> > > > On Mon, 2020-02-17 at 17:35 +, Maverick wrote:
> > > > > Hi,
> > > > > 
> > > > > > When I start my cluster, most of my systemd resources won't
> > > > > > start:
> > > > > 
> > > > > Failed Resource Actions:
> > > > >   * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82,
> > > > > status='Timed Out', exitreason='', last-rc-change='1970-01-01
> > > > > 01:00:54 +01:00', queued=29ms, exec=197799ms
> > > > >   * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61,
> > > > > status='Timed Out', exitreason='', last-rc-change='1970-01-01
> > > > > 01:00:54 +01:00', queued=1805ms, exec=198841ms
> > > > 
> > > > These show that attempts to stop failed, rather than start.
> > > > 
> > > > > > So every time I reboot my node, I need to start the resources
> > > > > > manually using systemctl, for example:
> > > > > > 
> > > > > > systemctl start httpd
> > > > > > 
> > > > > > and then run pcs resource cleanup.
> > > > > 
> > > > > Resources configuration:
> > > > > 
> > > > > > Clone: apache-clone
> > > > > >   Meta Attrs: maintenance=false
> > > > > >   Resource: apache (class=systemd type=httpd)
> > > > > >    Meta Attrs: maintenance=false
> > > > > >    Operations: monitor interval=60 timeout=100 (apache-monitor-interval-60)
> > > > > >                start interval=0s timeout=100 (apache-start-interval-0s)
> > > > > >                stop interval=0s timeout=100 (apache-stop-interval-0s)
> > > > > 
> > > > > 
> > > > > 
> > > > > > Resource: openvpn (class=systemd type=openvpn-server@01-server)
> > > > > >    Meta Attrs: maintenance=false
> > > > > >    Operations: monitor interval=60 timeout=100 (openvpn-monitor-interval-60)
> > > > > >                start interval=0s timeout=100 (openvpn-start-interval-0s)
> > > > > >                stop interval=0s timeout=100 (openvpn-stop-interval-0s)
> > > > > 
> > > > > 
> > > > > 
> > > > > > Btw, if I try a debug-start / debug-stop, the mentioned
> > > > > > resources start and stop ok.
> > > > 
> > > > > Based on that, my first guess would be SELinux. Check the
> > > > > SELinux logs for denials.
> > > > > 
> > > > > Also, make sure your systemd services are not enabled in
> > > > > systemd itself (e.g. via systemctl enable). Clustered systemd
> > > > > services should be managed by the cluster only.
> 
> 

Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-22 Thread Maverick
Hi,

As I don't have much time to dig into this pacemaker vs systemd problem,
I decided to drop systemd.

I replaced the apache resource with ocf::heartbeat:apache and the openvpn
resource with ocf::heartbeat:anything. For the other resources that need a
more elaborate start/stop script, I created /etc/init.d/ scripts and used
the lsb resource type.

Everything is working perfectly now.
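
For anyone attempting the same workaround, the replacement could be sketched
roughly as follows. This is a dry run that only prints the commands. The
resource IDs, the configfile path, the OpenVPN options, and the
zabbix-server init script name are assumptions for illustration, not the
configuration actually used here. Note that pcs writes the agent name with
single colons (ocf:heartbeat:apache).

```shell
#!/bin/sh
# Dry run: print each command instead of executing it, so the plan can be
# reviewed before touching a live cluster.
run() { echo "+ $*"; }

plan() {
    # Apache via the OCF agent (configfile path is the Fedora default).
    run pcs resource create apache ocf:heartbeat:apache \
        configfile=/etc/httpd/conf/httpd.conf

    # OpenVPN via the generic "anything" agent; the options are illustrative.
    run pcs resource create openvpn ocf:heartbeat:anything \
        binfile=/usr/sbin/openvpn \
        cmdline_options="--cd /etc/openvpn/server --config 01-server.conf"

    # A service with a custom init script, here a hypothetical
    # /etc/init.d/zabbix-server.
    run pcs resource create zabbix-server lsb:zabbix-server
}

plan
```

Existing systemd-class resources with the same IDs would have to be removed
before commands like these could be run for real.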


Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-20 Thread Maverick
Hi,

I'm using Fedora 31 (x86_64).

For apache I can use the OCF agent, sure, but I have other resources for
which no OCF agent exists, so for them I need to use systemd.

All ocf and lsb type resources start ok on boot; only systemd resources
have this problem.

I already enabled debug for the httpd and openvpn-server systemd units, but
I don't see any debug output in /var/log/messages or the journal for any of
these units.

Here are some of the systemd units:

Apache:

[Unit]
Description=The Apache HTTP Server
Wants=httpd-init.service
After=network.target remote-fs.target nss-lookup.target httpd-init.service
Documentation=man:httpd.service(8)

[Service]
Type=notify
Environment=LANG=C
Environment=SYSTEMD_LOG_LEVEL=debug

ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful
# Send SIGWINCH for graceful stop
KillSignal=SIGWINCH
KillMode=mixed
PrivateTmp=true

[Install]
WantedBy=multi-user.target

-

OpenVPN:

[Unit]
Description=OpenVPN service for %I
After=syslog.target network-online.target
Wants=network-online.target
Documentation=man:openvpn(8)
Documentation=https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
Documentation=https://community.openvpn.net/openvpn/wiki/HOWTO

[Service]
Type=notify
PrivateTmp=true
WorkingDirectory=/etc/openvpn/server
Environment=SYSTEMD_LOG_LEVEL=debug
ExecStart=/usr/sbin/openvpn --status %t/openvpn-server/status-%i.log \
    --status-version 2 --suppress-timestamps --cipher AES-256-GCM \
    --ncp-ciphers AES-256-GCM:AES-128-GCM:AES-256-CBC:AES-128-CBC:BF-CBC \
    --config %i.conf
CapabilityBoundingSet=CAP_IPC_LOCK CAP_NET_ADMIN CAP_NET_BIND_SERVICE \
    CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SYS_CHROOT CAP_DAC_OVERRIDE \
    CAP_AUDIT_WRITE
LimitNPROC=10
DeviceAllow=/dev/null rw
DeviceAllow=/dev/net/tun rw
ProtectSystem=true
ProtectHome=true
KillMode=process
RestartSec=5s
Restart=on-failure

[Install]
WantedBy=multi-user.target

-

Zabbix Server:

[Unit]
Description=Zabbix Server with Oracle DB
After=syslog.target network.target

[Service]
Type=simple
Environment="LD_LIBRARY_PATH=/opt/oracle/lib"
ExecStart=/usr/sbin/zabbix_server -f
User=zabbixsrv

[Install]
WantedBy=multi-user.target




Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-20 Thread Strahil Nikolov
On February 20, 2020 10:29:54 PM GMT+02:00, Maverick  wrote:
>
>> Hi Maverick,
>>
>>
>> According this thread:
>>
>https://lists.clusterlabs.org/pipermail/users/2016-December/021053.html
>>
>> You have 'startup-fencing' is set  to false.
>>
>> Check it out - maybe this is your reason.
>>
>> Best Regards,
>> Strahil Nikolov
>
>Yes, i have stonith disabled, because as soon as the resources startup
>fail on boot, node was rebooted.
>
>
>Anyway, i was checking the pacemaker logs and the journal log, and i
>see
>that the service actually starts ok but for some reason pacemaker
>thinks
>it has timeout and then because of that tries to stop and also thinks
>it
>has timeout but actually stops it:
>
>pacemaker.log:
>
>Feb 20 19:39:52 boss1 pacemaker-execd [1499] (log_execute)  info:
>executing - rsc:apache action:start call_id:25
>Feb 20 19:39:52 boss1 pacemaker-execd [1499] (systemd_unit_exec)   
>debug: Performing asynchronous start op on systemd unit httpd named
>'apache'
>Feb 20 19:39:52 boss1 pacemaker-execd [1499]
>(systemd_unit_exec_with_unit)  debug: Calling StartUnit for apache:
>/org/freedesktop/systemd1/unit/httpd_2eservice
>Feb 20 19:39:52 boss1 pacemaker-execd [1499] (action_complete) 
>notice: Giving up on apache start (rc=0): timeout (elapsed=248199ms,
>remaining=-148199ms)
>Feb 20 19:39:52 boss1 pacemaker-execd [1499] (log_finished)
>debug: finished - rsc:apache action:monitor call_id:25  exit-code:198
>exec-time:248205ms queue-time:216ms
>
>Feb 20 19:40:00 boss1 pacemaker-execd [1499] (log_execute)  info:
>executing - rsc:apache action:stop call_id:81
>Feb 20 19:40:00 boss1 pacemaker-execd [1499] (systemd_unit_exec)   
>debug: Performing asynchronous stop op on systemd unit httpd named
>'apache'
>Feb 20 19:40:00 boss1 pacemaker-execd [1499]
>(systemd_unit_exec_with_unit)  debug: Calling StopUnit for apache:
>/org/freedesktop/systemd1/unit/httpd_2eservice
>Feb 20 19:40:01 boss1 pacemaker-execd [1499] (action_complete) 
>notice: Giving up on apache stop (rc=0): timeout (elapsed=304539ms,
>remaining=-204539ms)
>Feb 20 19:40:01 boss1 pacemaker-execd [1499] (log_finished)
>debug: finished - rsc:apache action:monitor call_id:81  exit-code:198
>exec-time:304545ms queue-time:240ms
>
>
>system journal:
>
>Feb 20 19:39:52 boss1 systemd[1]: Starting Cluster Controlled httpd...
>Feb 20 19:39:53 boss1 systemd[1]: Started Cluster Controlled httpd.
>Feb 20 19:39:53 boss1 httpd[2145]: Server configured, listening on:
>port
>443, port 80
>
>Feb 20 19:40:01 boss1 systemd[1]: Stopping The Apache HTTP Server...
>Feb 20 19:40:02 boss1 systemd[1]: httpd.service: Succeeded.
>Feb 20 19:40:02 boss1 systemd[1]: Stopped The Apache HTTP Server.
>
>
>
>

Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-20 Thread Maverick

> Hi Maverick,
>
>
> According to this thread:
> https://lists.clusterlabs.org/pipermail/users/2016-December/021053.html
>
> You have 'startup-fencing' set to false.
>
> Check it out - maybe this is your reason.
>
> Best Regards,
> Strahil Nikolov

Yes, I have stonith disabled, because as soon as the resources failed to
start on boot, the node was rebooted.


Anyway, I was checking the pacemaker logs and the journal, and I see
that the service actually starts ok, but for some reason pacemaker thinks
the start has timed out. Because of that it tries to stop the service,
thinks that has timed out too, but actually stops it:

pacemaker.log:

Feb 20 19:39:52 boss1 pacemaker-execd [1499] (log_execute) info: executing - rsc:apache action:start call_id:25
Feb 20 19:39:52 boss1 pacemaker-execd [1499] (systemd_unit_exec) debug: Performing asynchronous start op on systemd unit httpd named 'apache'
Feb 20 19:39:52 boss1 pacemaker-execd [1499] (systemd_unit_exec_with_unit) debug: Calling StartUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
Feb 20 19:39:52 boss1 pacemaker-execd [1499] (action_complete) notice: Giving up on apache start (rc=0): timeout (elapsed=248199ms, remaining=-148199ms)
Feb 20 19:39:52 boss1 pacemaker-execd [1499] (log_finished) debug: finished - rsc:apache action:monitor call_id:25  exit-code:198 exec-time:248205ms queue-time:216ms

Feb 20 19:40:00 boss1 pacemaker-execd [1499] (log_execute) info: executing - rsc:apache action:stop call_id:81
Feb 20 19:40:00 boss1 pacemaker-execd [1499] (systemd_unit_exec) debug: Performing asynchronous stop op on systemd unit httpd named 'apache'
Feb 20 19:40:00 boss1 pacemaker-execd [1499] (systemd_unit_exec_with_unit) debug: Calling StopUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
Feb 20 19:40:01 boss1 pacemaker-execd [1499] (action_complete) notice: Giving up on apache stop (rc=0): timeout (elapsed=304539ms, remaining=-204539ms)
Feb 20 19:40:01 boss1 pacemaker-execd [1499] (log_finished) debug: finished - rsc:apache action:monitor call_id:81  exit-code:198 exec-time:304545ms queue-time:240ms


system journal:

Feb 20 19:39:52 boss1 systemd[1]: Starting Cluster Controlled httpd...
Feb 20 19:39:53 boss1 systemd[1]: Started Cluster Controlled httpd.
Feb 20 19:39:53 boss1 httpd[2145]: Server configured, listening on: port 443, port 80

Feb 20 19:40:01 boss1 systemd[1]: Stopping The Apache HTTP Server...
Feb 20 19:40:02 boss1 systemd[1]: httpd.service: Succeeded.
Feb 20 19:40:02 boss1 systemd[1]: Stopped The Apache HTTP Server.
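
To put the two logs side by side, the start duration seen by systemd can be
estimated from the journal timestamps and compared with what pacemaker
reported. A rough sketch, assuming GNU date and the year 2020 (the journal
lines carry no year), with only one-second resolution on the journal side:

```shell
#!/bin/sh
# Timestamps copied from the system journal excerpt above.
start_requested="2020-02-20 19:39:52"   # "Starting Cluster Controlled httpd..."
start_done="2020-02-20 19:39:53"        # "Started Cluster Controlled httpd."

# Assumption: GNU date, whose -d option parses a timestamp string.
# TZ=UTC keeps the subtraction independent of local daylight-saving rules.
t0=$(TZ=UTC date -d "$start_requested" +%s)
t1=$(TZ=UTC date -d "$start_done" +%s)
systemd_ms=$(( (t1 - t0) * 1000 ))

pacemaker_ms=248199   # elapsed= value from pacemaker.log for the same start

echo "systemd start took about ${systemd_ms}ms"
echo "pacemaker reported ${pacemaker_ms}ms"
echo "difference: $((pacemaker_ms - systemd_ms))ms"
```

So systemd needed about one second, while pacemaker's own accounting is off
by roughly four minutes for the same operation.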





Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-20 Thread Strahil Nikolov
On February 20, 2020 9:35:07 PM GMT+02:00, Maverick  wrote:
>
>Manually it starts ok, no problems:
>
>pcs resource debug-start apache --full
>(unpack_config)     warning: Blind faith: not fencing unseen nodes
>Operation start for apache (systemd::httpd) returned: 'ok' (0)
>
>
Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-20 Thread Maverick

Manually it starts ok, no problems:

pcs resource debug-start apache --full
(unpack_config)     warning: Blind faith: not fencing unseen nodes
Operation start for apache (systemd::httpd) returned: 'ok' (0)


On 20/02/2020 16:46, Strahil Nikolov wrote:
> On February 20, 2020 12:49:43 PM GMT+02:00, Maverick  wrote:
>>> You really need to debug the start & stop of the resource.
>>>
>>> Please try the debug procedure and provide the output:
>>> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>
>> Hi,
>>
>> Correct me if I'm wrong, but I think that procedure doesn't work for
>> systemd class resources; I don't know which OCF script is responsible
>> for handling systemd class resources.
>>
>> Also, the crm command doesn't exist in RHEL/Fedora; I've seen the crm
>> command only in SUSE.
>>
>>
>>

Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-20 Thread Strahil Nikolov
On February 20, 2020 12:49:43 PM GMT+02:00, Maverick  wrote:
>
>> You really need to debug the start & stop of the resource.
>>
>> Please try the debug procedure and provide the output:
>> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures
>>
>> Best Regards,
>> Strahil Nikolov
>
>
>Hi,
>
>Correct me if I'm wrong, but I think that procedure doesn't work for
>systemd class resources; I don't know which OCF script is responsible
>for handling systemd class resources.
>
>Also, the crm command doesn't exist on RHEL/Fedora; I've only seen it
>on SUSE.
>
>
>
>On 19/02/2020 19:23, Strahil Nikolov wrote:
>> On February 19, 2020 7:21:12 PM GMT+02:00, Maverick 
>wrote:
>>> How is it possible that pacemaker is reporting that takes 4.2
>minutes
>>> (254930ms) to execute the start of httpd systemd unit?
>>>
>>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (log_execute)    
>>> info:
>>> executing - rsc:apache action:start call_id:25
>>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec)
>>>    
>>> debug: Performing asynchronous start op on systemd unit httpd named
>>> 'apache'
>>> Feb 19 17:04:09 boss1 pacemaker-execd [1514]
>>> (systemd_unit_exec_with_unit)     debug: Calling StartUnit for
>apache:
>>> /org/freedesktop/systemd1/unit/httpd_2eservice
>>> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (action_complete)
>   
>>> notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms,
>>> remaining=-154930ms)
>>> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (log_finished)    
>>> debug: finished - rsc:apache action:monitor call_id:25 
>exit-code:198
>>> exec-time:254935ms queue-time:235ms
>>>
>>>
>>> Starting manually works fine and fast:
>>>
>>> # time systemctl start httpd
>>> real    0m0.144s
>>> user    0m0.005s
>>> sys    0m0.008s
>>>
>>>
>>> On 17/02/2020 22:47, Mvrk wrote:
 In attachment the pacemaker.log. On the log i can see that the
>>> cluster
 tries to start, the start fails, then tries to stop, and the stop
>>> also
 fails also.

 One more thing, my cluster was working fine on Fedora 28, i started
 having this problem after upgrade to Fedora 31.

 On 17/02/2020 21:30, Ricardo Esteves wrote:
> Hi,
>
> Yes, i also don't understand why is trying to stop them first.
>
> SELinux is disabled:
>
> # getenforce
> Disabled
>
> All systemd services controlled by the cluster are disabled from
> starting at boot:
>
> # systemctl is-enabled httpd
> disabled
>
> # systemctl is-enabled openvpn-server@01-server
> disabled
>
>
> On 17/02/2020 20:28, Ken Gaillot wrote:
>> On Mon, 2020-02-17 at 17:35 +, Maverick wrote:
>>> Hi,
>>>
>>> When i start my cluster, most of my systemd resources won't
>start:
>>>
>>> Failed Resource Actions:
>>>   * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82,
>>> status='Timed Out', exitreason='', last-rc-change='1970-01-01
>>> 01:00:54 +01:00', queued=29ms, exec=197799ms
>>>   * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61,
>>> status='Timed Out', exitreason='', last-rc-change='1970-01-01
>>> 01:00:54 +01:00', queued=1805ms, exec=198841ms
>> These show that attempts to stop failed, rather than start.
>>
>>> So everytime i reboot my node, i need to start the resources
>>> manually
>>> using systemd, for example:
>>>
>>> systemd start apache
>>>
>>> and then pcs resource cleanup
>>>
>>> Resources configuration:
>>>
>>> Clone: apache-clone
>>>   Meta Attrs: maintenance=false
>>>   Resource: apache (class=systemd type=httpd)
>>>Meta Attrs: maintenance=false
>>>Operations: monitor interval=60 timeout=100 (apache-monitor-
>>> interval-60)
>>>start interval=0s timeout=100
>>> (apache-start-interval-
>>> 0s)
>>>stop interval=0s timeout=100
>>> (apache-stop-interval-0s)
>>>
>>>
>>> Resource: openvpn (class=systemd type=openvpn-server@01-server)
>>>Meta Attrs: maintenance=false
>>>Operations: monitor interval=60 timeout=100 (openvpn-monitor-
>>> interval-60)
>>>start interval=0s timeout=100
>>> (openvpn-start-interval-
>>> 0s)
>>>stop interval=0s timeout=100
>>> (openvpn-stop-interval-
>>> 0s)
>>>
>>>
>>>
>>> Btw, if i try a debug-start / debug-stop the mentioned resources
>>> start and stop ok.
>> Based on that, my first guess would be SELinux. Check the SELinux
>>> logs
>> for denials.
>>
>> Also, make sure your systemd services are not enabled in systemd
>>> itself
>> (e.g. via systemctl enable). Clustered systemd services should be
>> managed by the cluster only.
>>> ___
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs 

Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-20 Thread Maverick

> You really need to debug the start & stop of the resource.
>
> Please try the debug procedure and provide the output:
> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures
>
> Best Regards,
> Strahil Nikolov


Hi,

Correct me if I'm wrong, but I think that procedure doesn't work for
systemd class resources; I don't know which OCF script is responsible
for handling systemd class resources.

Also, the crm command doesn't exist on RHEL/Fedora; I've only seen it
on SUSE.



On 19/02/2020 19:23, Strahil Nikolov wrote:
> On February 19, 2020 7:21:12 PM GMT+02:00, Maverick  wrote:
>> How is it possible that pacemaker is reporting that takes 4.2 minutes
>> (254930ms) to execute the start of httpd systemd unit?
>>
>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (log_execute)    
>> info:
>> executing - rsc:apache action:start call_id:25
>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec)
>>    
>> debug: Performing asynchronous start op on systemd unit httpd named
>> 'apache'
>> Feb 19 17:04:09 boss1 pacemaker-execd [1514]
>> (systemd_unit_exec_with_unit)     debug: Calling StartUnit for apache:
>> /org/freedesktop/systemd1/unit/httpd_2eservice
>> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (action_complete)    
>> notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms,
>> remaining=-154930ms)
>> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (log_finished)    
>> debug: finished - rsc:apache action:monitor call_id:25  exit-code:198
>> exec-time:254935ms queue-time:235ms
>>
>>
>> Starting manually works fine and fast:
>>
>> # time systemctl start httpd
>> real    0m0.144s
>> user    0m0.005s
>> sys    0m0.008s
>>
>>
>> On 17/02/2020 22:47, Mvrk wrote:
>>> In attachment the pacemaker.log. On the log i can see that the
>> cluster
>>> tries to start, the start fails, then tries to stop, and the stop
>> also
>>> fails also.
>>>
>>> One more thing, my cluster was working fine on Fedora 28, i started
>>> having this problem after upgrade to Fedora 31.
>>>
>>> On 17/02/2020 21:30, Ricardo Esteves wrote:
 Hi,

 Yes, i also don't understand why is trying to stop them first.

 SELinux is disabled:

 # getenforce
 Disabled

 All systemd services controlled by the cluster are disabled from
 starting at boot:

 # systemctl is-enabled httpd
 disabled

 # systemctl is-enabled openvpn-server@01-server
 disabled


 On 17/02/2020 20:28, Ken Gaillot wrote:
> On Mon, 2020-02-17 at 17:35 +, Maverick wrote:
>> Hi,
>>
>> When i start my cluster, most of my systemd resources won't start:
>>
>> Failed Resource Actions:
>>   * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82,
>> status='Timed Out', exitreason='', last-rc-change='1970-01-01
>> 01:00:54 +01:00', queued=29ms, exec=197799ms
>>   * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61,
>> status='Timed Out', exitreason='', last-rc-change='1970-01-01
>> 01:00:54 +01:00', queued=1805ms, exec=198841ms
> These show that attempts to stop failed, rather than start.
>
>> So everytime i reboot my node, i need to start the resources
>> manually
>> using systemd, for example:
>>
>> systemd start apache
>>
>> and then pcs resource cleanup
>>
>> Resources configuration:
>>
>> Clone: apache-clone
>>   Meta Attrs: maintenance=false
>>   Resource: apache (class=systemd type=httpd)
>>Meta Attrs: maintenance=false
>>Operations: monitor interval=60 timeout=100 (apache-monitor-
>> interval-60)
>>start interval=0s timeout=100
>> (apache-start-interval-
>> 0s)
>>stop interval=0s timeout=100
>> (apache-stop-interval-0s)
>>
>>
>> Resource: openvpn (class=systemd type=openvpn-server@01-server)
>>Meta Attrs: maintenance=false
>>Operations: monitor interval=60 timeout=100 (openvpn-monitor-
>> interval-60)
>>start interval=0s timeout=100
>> (openvpn-start-interval-
>> 0s)
>>stop interval=0s timeout=100
>> (openvpn-stop-interval-
>> 0s)
>>
>>
>>
>> Btw, if i try a debug-start / debug-stop the mentioned resources
>> start and stop ok.
> Based on that, my first guess would be SELinux. Check the SELinux
>> logs
> for denials.
>
> Also, make sure your systemd services are not enabled in systemd
>> itself
> (e.g. via systemctl enable). Clustered systemd services should be
> managed by the cluster only.
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-19 Thread Maverick

How is it possible that Pacemaker reports that it takes 4.2 minutes
(254930ms) to execute the start of the httpd systemd unit?

Feb 19 17:04:09 boss1 pacemaker-execd [1514] (log_execute) info: executing - rsc:apache action:start call_id:25
Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec) debug: Performing asynchronous start op on systemd unit httpd named 'apache'
Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec_with_unit) debug: Calling StartUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
Feb 19 17:04:10 boss1 pacemaker-execd [1514] (action_complete) notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms, remaining=-154930ms)
Feb 19 17:04:10 boss1 pacemaker-execd [1514] (log_finished) debug: finished - rsc:apache action:monitor call_id:25 exit-code:198 exec-time:254935ms queue-time:235ms
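
A quick sanity check on the numbers in the "Giving up" line, as a minimal
sketch. It assumes that remaining is simply the operation timeout minus
elapsed, which is an inference from the log and not something confirmed
from the Pacemaker source. The helper name parse_timing is made up for
illustration.

```python
import re

# The action_complete message copied from the log above.
LOG_LINE = ("notice: Giving up on apache start (rc=0): timeout "
            "(elapsed=254930ms, remaining=-154930ms)")


def parse_timing(line):
    """Return (elapsed_ms, remaining_ms) parsed from a 'Giving up' log line."""
    match = re.search(r"elapsed=(-?\d+)ms, remaining=(-?\d+)ms", line)
    if match is None:
        raise ValueError("no elapsed/remaining fields found")
    return int(match.group(1)), int(match.group(2))


elapsed_ms, remaining_ms = parse_timing(LOG_LINE)

# Assumption: remaining = timeout - elapsed, so the sum is the timeout budget.
budget_ms = elapsed_ms + remaining_ms

# Elapsed time expressed in minutes, for comparison with the "4.2 minutes".
elapsed_min = elapsed_ms / 60000

print(budget_ms, round(elapsed_min, 2))
```

The sum comes to 100000 ms, which matches the configured timeout=100
(seconds) on the start operation, while the syslog timestamps on the
surrounding lines are only about one second apart. So the timeout budget
looks right and the elapsed figure is the suspicious value.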


Starting manually works fine and fast:

# time systemctl start httpd
real    0m0.144s
user    0m0.005s
sys    0m0.008s
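
For an independent cross-check of how long a command really takes, the
following is a rough sketch that times a child process with a monotonic
clock. It has nothing to do with how pacemaker-execd measures operations
internally; timed_run is a hypothetical helper, and the harmless
interpreter call stands in for "systemctl start httpd", which needs root.

```python
import subprocess
import sys
import time


def timed_run(argv):
    """Run argv and return (exit_code, elapsed_ms) using a monotonic clock."""
    start = time.monotonic()              # not affected by wall-clock jumps
    completed = subprocess.run(argv, check=False)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return completed.returncode, elapsed_ms


# Harmless demo command; on the cluster node one could pass
# ["systemctl", "start", "httpd"] instead (as root).
rc, elapsed_ms = timed_run([sys.executable, "-c", "pass"])
print(rc, f"{elapsed_ms:.1f}ms")
```

A monotonic clock is used here because it cannot go backwards or jump
when the system time is adjusted, which keeps the subtraction meaningful.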


On 17/02/2020 22:47, Mvrk wrote:
> Attached is the pacemaker.log. In the log I can see that the cluster
> tries to start, the start fails, then it tries to stop, and the stop
> also fails.
>
> One more thing: my cluster was working fine on Fedora 28; I started
> having this problem after upgrading to Fedora 31.
>
> On 17/02/2020 21:30, Ricardo Esteves wrote:
>> Hi,
>>
>> Yes, i also don't understand why is trying to stop them first.
>>
>> SELinux is disabled:
>>
>> # getenforce
>> Disabled
>>
>> All systemd services controlled by the cluster are disabled from
>> starting at boot:
>>
>> # systemctl is-enabled httpd
>> disabled
>>
>> # systemctl is-enabled openvpn-server@01-server
>> disabled
>>
>>
>> On 17/02/2020 20:28, Ken Gaillot wrote:
>>> On Mon, 2020-02-17 at 17:35 +, Maverick wrote:
>>>> Hi,
>>>>
>>>> When i start my cluster, most of my systemd resources won't start:
>>>>
>>>> Failed Resource Actions:
>>>>   * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82,
>>>> status='Timed Out', exitreason='', last-rc-change='1970-01-01
>>>> 01:00:54 +01:00', queued=29ms, exec=197799ms
>>>>   * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61,
>>>> status='Timed Out', exitreason='', last-rc-change='1970-01-01
>>>> 01:00:54 +01:00', queued=1805ms, exec=198841ms
>>> These show that attempts to stop failed, rather than start.
>>>
>>>> So everytime i reboot my node, i need to start the resources manually
>>>> using systemd, for example:
>>>>
>>>> systemd start apache
>>>>
>>>> and then pcs resource cleanup
>>>>
>>>> Resources configuration:
>>>>
>>>> Clone: apache-clone
>>>>   Meta Attrs: maintenance=false
>>>>   Resource: apache (class=systemd type=httpd)
>>>>    Meta Attrs: maintenance=false
>>>>    Operations: monitor interval=60 timeout=100 (apache-monitor-interval-60)
>>>>    start interval=0s timeout=100 (apache-start-interval-0s)
>>>>    stop interval=0s timeout=100 (apache-stop-interval-0s)
>>>>
>>>> Resource: openvpn (class=systemd type=openvpn-server@01-server)
>>>>    Meta Attrs: maintenance=false
>>>>    Operations: monitor interval=60 timeout=100 (openvpn-monitor-interval-60)
>>>>    start interval=0s timeout=100 (openvpn-start-interval-0s)
>>>>    stop interval=0s timeout=100 (openvpn-stop-interval-0s)
>>>>
>>>> Btw, if i try a debug-start / debug-stop the mentioned resources
>>>> start and stop ok.
>>> Based on that, my first guess would be SELinux. Check the SELinux logs
>>> for denials.
>>>
>>> Also, make sure your systemd services are not enabled in systemd itself
>>> (e.g. via systemctl enable). Clustered systemd services should be
>>> managed by the cluster only.

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-17 Thread Ricardo Esteves
Hi,

Yes, I also don't understand why it is trying to stop them first.

SELinux is disabled:

# getenforce
Disabled

All systemd services controlled by the cluster are disabled from
starting at boot:

# systemctl is-enabled httpd
disabled

# systemctl is-enabled openvpn-server@01-server
disabled
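
The same check can be scripted for every clustered unit. This is only a
sketch: the function names are made up, and treating "enabled" and
"enabled-runtime" as the states that conflict with cluster management is
my reading of the advice, not an official Pacemaker rule.

```python
import subprocess

# States reported by "systemctl is-enabled" that mean the unit is
# started by systemd itself at boot.
BOOT_STATES = {"enabled", "enabled-runtime"}


def query_state(unit):
    """Return the text printed by 'systemctl is-enabled UNIT'."""
    result = subprocess.run(["systemctl", "is-enabled", unit],
                            capture_output=True, text=True, check=False)
    return result.stdout.strip()


def boot_enabled_units(states):
    """Given {unit: state}, list the units systemd would start at boot."""
    return sorted(unit for unit, state in states.items()
                  if state in BOOT_STATES)


# States as reported in this message; on a live node they could be
# collected with {unit: query_state(unit) for unit in units}.
reported = {"httpd": "disabled", "openvpn-server@01-server": "disabled"}
print(boot_enabled_units(reported))
```

An empty list means none of the units are enabled in systemd, which is
the situation on this node.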


On 17/02/2020 20:28, Ken Gaillot wrote:
> On Mon, 2020-02-17 at 17:35 +, Maverick wrote:
>> Hi,
>>
>> When i start my cluster, most of my systemd resources won't start:
>>
>> Failed Resource Actions:
>>   * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82,
>> status='Timed Out', exitreason='', last-rc-change='1970-01-01
>> 01:00:54 +01:00', queued=29ms, exec=197799ms
>>   * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61,
>> status='Timed Out', exitreason='', last-rc-change='1970-01-01
>> 01:00:54 +01:00', queued=1805ms, exec=198841ms
> These show that attempts to stop failed, rather than start.
>
>> So everytime i reboot my node, i need to start the resources manually
>> using systemd, for example:
>>
>> systemd start apache
>>
>> and then pcs resource cleanup
>>
>> Resources configuration:
>>
>> Clone: apache-clone
>>   Meta Attrs: maintenance=false
>>   Resource: apache (class=systemd type=httpd)
>>Meta Attrs: maintenance=false
>>Operations: monitor interval=60 timeout=100 (apache-monitor-
>> interval-60)
>>start interval=0s timeout=100 (apache-start-interval-
>> 0s)
>>stop interval=0s timeout=100 (apache-stop-interval-0s)
>>
>>
>>
>> Resource: openvpn (class=systemd type=openvpn-server@01-server)
>>Meta Attrs: maintenance=false
>>Operations: monitor interval=60 timeout=100 (openvpn-monitor-
>> interval-60)
>>start interval=0s timeout=100 (openvpn-start-interval-
>> 0s)
>>stop interval=0s timeout=100 (openvpn-stop-interval-
>> 0s)
>>
>>
>>
>> Btw, if i try a debug-start / debug-stop the mentioned resources
>> start and stop ok.
> Based on that, my first guess would be SELinux. Check the SELinux logs
> for denials.
>
> Also, make sure your systemd services are not enabled in systemd itself
> (e.g. via systemctl enable). Clustered systemd services should be
> managed by the cluster only.



Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-17 Thread Ken Gaillot
On Mon, 2020-02-17 at 17:35 +, Maverick wrote:
> 
> Hi,
> 
> When i start my cluster, most of my systemd resources won't start:
> 
> Failed Resource Actions:
>   * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82,
> status='Timed Out', exitreason='', last-rc-change='1970-01-01
> 01:00:54 +01:00', queued=29ms, exec=197799ms
>   * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61,
> status='Timed Out', exitreason='', last-rc-change='1970-01-01
> 01:00:54 +01:00', queued=1805ms, exec=198841ms

These show that attempts to stop failed, rather than start.

> 
> So everytime i reboot my node, i need to start the resources manually
> using systemd, for example:
> 
> systemd start apache
> 
> and then pcs resource cleanup
> 
> Resources configuration:
> 
> Clone: apache-clone
>   Meta Attrs: maintenance=false
>   Resource: apache (class=systemd type=httpd)
>Meta Attrs: maintenance=false
>Operations: monitor interval=60 timeout=100 (apache-monitor-
> interval-60)
>start interval=0s timeout=100 (apache-start-interval-
> 0s)
>stop interval=0s timeout=100 (apache-stop-interval-0s)
> 
> 
> 
> Resource: openvpn (class=systemd type=openvpn-server@01-server)
>Meta Attrs: maintenance=false
>Operations: monitor interval=60 timeout=100 (openvpn-monitor-
> interval-60)
>start interval=0s timeout=100 (openvpn-start-interval-
> 0s)
>stop interval=0s timeout=100 (openvpn-stop-interval-
> 0s)
> 
> 
> 
> Btw, if i try a debug-start / debug-stop the mentioned resources
> start and stop ok.

Based on that, my first guess would be SELinux. Check the SELinux logs
for denials.

Also, make sure your systemd services are not enabled in systemd itself
(e.g. via systemctl enable). Clustered systemd services should be
managed by the cluster only.
-- 
Ken Gaillot 



[ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-17 Thread Maverick

Hi,

When I start my cluster, most of my systemd resources won't start:

Failed Resource Actions:
  * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82, status='Timed Out', exitreason='', last-rc-change='1970-01-01 01:00:54 +01:00', queued=29ms, exec=197799ms
  * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61, status='Timed Out', exitreason='', last-rc-change='1970-01-01 01:00:54 +01:00', queued=1805ms, exec=198841ms
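
Two things in these entries can be checked with simple arithmetic, shown
here as a minimal sketch. It assumes the operation timeout=100 from the
resource configuration is in seconds; epoch_seconds is a hypothetical
helper name.

```python
from datetime import datetime


def epoch_seconds(stamp):
    """Convert 'YYYY-MM-DD HH:MM:SS +HH:MM' to seconds since the Unix epoch."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z").timestamp()


# last-rc-change as printed for both failed actions.
last_rc_change = epoch_seconds("1970-01-01 01:00:54 +01:00")

# Assumption: timeout=100 on the stop operations means 100 seconds.
TIMEOUT_MS = 100 * 1000

# exec= values copied from the failed actions.
exec_ms = {"apache_stop_0": 197799, "openvpn_stop_0": 198841}
overrun_ms = {name: ms - TIMEOUT_MS for name, ms in exec_ms.items()}

print(last_rc_change)
print(overrun_ms)
```

The last-rc-change stamp is only 54 seconds after the Unix epoch, and
both reported exec times are nearly twice the timeout. Neither number
looks like a real measurement, which may point at a timekeeping problem
rather than at the services themselves, though that is a guess at this
point.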


So every time I reboot my node, I need to start the resources manually
using systemctl, for example:

systemctl start httpd

and then run pcs resource cleanup.

Resources configuration:

Clone: apache-clone
  Meta Attrs: maintenance=false
  Resource: apache (class=systemd type=httpd)
   Meta Attrs: maintenance=false
   Operations: monitor interval=60 timeout=100 (apache-monitor-interval-60)
   start interval=0s timeout=100 (apache-start-interval-0s)
   stop interval=0s timeout=100 (apache-stop-interval-0s)

Resource: openvpn (class=systemd type=openvpn-server@01-server)
   Meta Attrs: maintenance=false
   Operations: monitor interval=60 timeout=100 (openvpn-monitor-interval-60)
   start interval=0s timeout=100 (openvpn-start-interval-0s)
   stop interval=0s timeout=100 (openvpn-stop-interval-0s)

Btw, if I try a debug-start / debug-stop, the mentioned resources start
and stop OK.


 