Re: [ClusterLabs] info: mcp_cpg_deliver: Ignoring process list sent by peer for local node

2019-05-29 Thread Ken Gaillot
On Wed, 2019-05-29 at 17:28 +0100, lejeczek wrote:
> hi guys,
> 
> I have a 3-nodes cluster but one node is a freaking mystery to me. I
> see
> this:
> 
> May 29 17:21:45 [51617] rider.private pacemakerd: info:
> pcmk_cpg_membership: Node 3 still member of group pacemakerd
> (peer=rider.private, counter=0.2)
> May 29 17:21:45 [51617] rider.private pacemakerd: info:
> mcp_cpg_deliver: Ignoring process list sent by peer for local node
> May 29 17:21:45 [51617] rider.private pacemakerd: info:
> mcp_cpg_deliver: Ignoring process list sent by peer for local node

These are harmless and unrelated.

> and I wonder if it in any way relates to the fact that the node says:
> 
> $ crm_mon --one-shot
> Connection to cluster failed: Transport endpoint is not connected
> $ pcs status --all
> Error: cluster is not currently running on this node

What user are you running as? That's expected if the user isn't either
root or in the haclient group.
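
If it does turn out to be a group problem, it's quick to check and fix
(a minimal sketch; replace "someuser" with whatever account you ran
those commands as):

$ id -nG someuser | grep -qw haclient || echo "someuser is not in haclient"
# usermod -a -G haclient someuser   # takes effect at the next login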

> 
> and:
> $ systemctl status -l pacemaker.service 
> ● pacemaker.service - Pacemaker High Availability Cluster Manager
>Loaded: loaded (/usr/lib/systemd/system/pacemaker.service;
> disabled; vendor preset: disabled)
>Active: active (running) since Wed 2019-05-29 17:21:45 BST; 7s ago
>  Docs: man:pacemakerd
>
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html
>  Main PID: 51617 (pacemakerd)
> Tasks: 1
>Memory: 3.3M
>CGroup: /system.slice/pacemaker.service
>└─51617 /usr/sbin/pacemakerd -f
> 
> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking
> existing pengine process (pid=51528)
> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking
> existing lrmd process (pid=51542)
> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking
> existing stonithd process (pid=51558)
> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking
> existing attrd process (pid=51559)
> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking
> existing cib process (pid=51560)
> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking
> existing crmd process (pid=51566)
> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Quorum
> acquired
> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Node
> whale.private state is now member
> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Node
> swir.private state is now member
> May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Node
> rider.private state is now member
-- 
Ken Gaillot 


Re: [ClusterLabs] Antw: Re: Q: ocf:pacemaker:NodeUtilization monitor

2019-05-29 Thread Andrei Borzenkov
On 29.05.2019 11:12, Ulrich Windl wrote:
> >>> Jan Pokorný  wrote on 28.05.2019 at 16:31 in
> message
> <20190528143145.ga29...@redhat.com>:
>> On 27/05/19 08:28 +0200, Ulrich Windl wrote:
>>> I configured ocf:pacemaker:NodeUtilization more or less for fun, and I
>> realized that the cluster reports no problems, but in syslog I have these
> 
>> unusual messages:
>>> 2019‑05‑27T08:21:07.748149+02:00 h06 lrmd[16599]:   notice: 
>> prm_node_util_monitor_30:15028:stderr [ info: Writing node (dir)Top...
> ]
>>> 2019‑05‑27T08:21:07.748546+02:00 h06 lrmd[16599]:   notice: 
>> prm_node_util_monitor_30:15028:stderr [ info: Cannot find node `(dir)GNU
> 
>> Free Documentation License'. ]
>>> 2019‑05‑27T08:21:07.748799+02:00 h06 lrmd[16599]:   notice: 
>> prm_node_util_monitor_30:15028:stderr [ info: Done. ]
>>>
>>>
>>> "(dir)" looks a lot like Documentation. What has the monitor to do with 
>> documentation?
>>
>> The above looks as if you run "info" without arguments (it will try
>> to display the initial page '(dir)Top' -- and moreover perhaps when it is
>> not found).
>>
>> I have no idea how this could happen, since there's only one reference
>> to "info" and it seems basic-sanity guarded:
>>
>> https://github.com/ClusterLabs/resource-agents/blob/v4.2.0/heartbeat/NodeUtilization#L119
>>
>>> 118 if [ -x $xentool ]; then
>>> 119 $xentool info | awk '/total_memory/{printf("%d\n",$3);exit(0)}'
>>> 120 else
>>> 121 ocf_log warn "Can only set hv_memory for Xen hypervisor"
>>> 122 echo "0"
>>
>> So kind of a mystery :‑)
> 
> Except when $xentool is undefined ;-)

How can the condition [ -x $xentool ] be true then? Unless there actually
is a binary named "]" and it happened to be in the local directory
(whatever the local directory is when this script is executed).

[ClusterLabs] info: mcp_cpg_deliver: Ignoring process list sent by peer for local node

2019-05-29 Thread lejeczek
hi guys,

I have a 3-nodes cluster but one node is a freaking mystery to me. I see
this:

May 29 17:21:45 [51617] rider.private pacemakerd: info:
pcmk_cpg_membership:    Node 3 still member of group pacemakerd
(peer=rider.private, counter=0.2)
May 29 17:21:45 [51617] rider.private pacemakerd: info:
mcp_cpg_deliver:    Ignoring process list sent by peer for local node
May 29 17:21:45 [51617] rider.private pacemakerd: info:
mcp_cpg_deliver:    Ignoring process list sent by peer for local node

and I wonder if it in any way relates to the fact that the node says:

$ crm_mon --one-shot
Connection to cluster failed: Transport endpoint is not connected
$ pcs status --all
Error: cluster is not currently running on this node

and:
$ systemctl status -l pacemaker.service 
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor 
preset: disabled)
   Active: active (running) since Wed 2019-05-29 17:21:45 BST; 7s ago
 Docs: man:pacemakerd
   
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html
 Main PID: 51617 (pacemakerd)
Tasks: 1
   Memory: 3.3M
   CGroup: /system.slice/pacemaker.service
   └─51617 /usr/sbin/pacemakerd -f

May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking existing 
pengine process (pid=51528)
May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking existing 
lrmd process (pid=51542)
May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking existing 
stonithd process (pid=51558)
May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking existing 
attrd process (pid=51559)
May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking existing 
cib process (pid=51560)
May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Tracking existing 
crmd process (pid=51566)
May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Quorum acquired
May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Node whale.private 
state is now member
May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Node swir.private 
state is now member
May 29 17:21:45 rider.private pacemakerd[51617]:   notice: Node rider.private 
state is now member




Re: [ClusterLabs] VirtualDomain and Resource_is_Too_Active ?? - problem/error

2019-05-29 Thread Ken Gaillot
On Wed, 2019-05-29 at 11:42 +0100, lejeczek wrote:
> hi guys,
> 
> I'm doing something which I believe is fairly simple, namely:
> 
> $ pcs resource create HA-work9-win10-kvm VirtualDomain
> hypervisor="qemu:///system"
> config="/0-ALL.SYSDATA/QEMU_VMs/HA-work9-win10.qcow2"
> migration_transport=ssh --disable
> 
> the virt guest is good, it runs okay under libvirt, yet pacemaker fails:
> 
> ...
> 
> 
>   notice: State transition S_IDLE -> S_POLICY_ENGINE
>error: Invalid recurring action
> chenbro0.1-raid5-mnt-start-interval-90 wth name: 'start'
>error: Invalid recurring action chenbro0.1-raid5-mnt-stop-
> interval-90
> wth name: 'stop'

The "start" and "stop" actions in the configuration must have interval
0 (which is the default if you just omit it). Configuring start/stop is
just a way to be able to set the timeout etc. used with those actions.
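
For example, something along these lines should clean up those two ops
(a sketch going by the op IDs in your log; the timeout values are only
illustrative):

$ pcs resource op remove chenbro0.1-raid5-mnt-start-interval-90
$ pcs resource op remove chenbro0.1-raid5-mnt-stop-interval-90
$ pcs resource op add chenbro0.1-raid5-mnt start interval=0 timeout=90
$ pcs resource op add chenbro0.1-raid5-mnt stop interval=0 timeout=120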

>   notice: Calculated transition 1864, saving inputs in
> /var/lib/pacemaker/pengine/pe-input-2022.bz2
>   notice: Configuration ERRORs found during PE processing.  Please
> run
> "crm_verify -L" to identify issues.
>   notice: Initiating monitor operation HA-work9-win10-kvm_monitor_0
> locally on whale.private
>   notice: Initiating monitor operation HA-work9-win10-kvm_monitor_0
> on
> swir.private
>   notice: Initiating monitor operation HA-work9-win10-kvm_monitor_0
> on
> rider.private
>  warning: HA-work9-win10-kvm_monitor_0 process (PID 2103512) timed
> out
>  warning: HA-work9-win10-kvm_monitor_0:2103512 - timed out after
> 3ms
>   notice: HA-work9-win10-kvm_monitor_0:2103512:stderr [
> /usr/lib/ocf/resource.d/heartbeat/VirtualDomain: line 981: [: too
> many
> arguments ]

This looks like a bug in the resource agent, probably due to some
unexpected configuration value. Double-check your resource
configuration for what values the various parameters can have. (Or it
may just be a side effect of the interval issue above, so try fixing
that first.)
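
FWIW, that particular shell error is almost always an unquoted variable
expanding to several words inside a test. A minimal reproduction in
plain shell (not the agent's actual line 981, just the failure mode):

$ val="two words"
$ [ $val = "two words" ] && echo match
bash: [: too many arguments
$ [ "$val" = "two words" ] && echo match
match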

>error: Result of probe operation for HA-work9-win10-kvm on
> whale.private: Timed Out
>   notice: whale.private-HA-work9-win10-kvm_monitor_0:204 [
> /usr/lib/ocf/resource.d/heartbeat/VirtualDomain: line 981: [: too
> many
> arguments\n ]
>  warning: Action 15 (HA-work9-win10-kvm_monitor_0) on rider.private
> failed (target: 7 vs. rc: 1): Error
>   notice: Transition aborted by operation HA-work9-win10-
> kvm_monitor_0
> 'modify' on rider.private: Event failed
>  warning: Action 17 (HA-work9-win10-kvm_monitor_0) on whale.private
> failed (target: 7 vs. rc: 1): Error
>  warning: Action 16 (HA-work9-win10-kvm_monitor_0) on swir.private
> failed (target: 7 vs. rc: 1): Error
>   notice: Transition 1864 (Complete=3, Pending=0, Fired=0, Skipped=0,
> Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2022.bz2):
> Complete
>  warning: Processing failed probe of HA-work9-win10-kvm on
> whale.private: unknown error
>   notice: If it is not possible for HA-work9-win10-kvm to run on
> whale.private, see the resource-discovery option for location
> constraints
>  warning: Processing failed probe of HA-work9-win10-kvm on
> whale.private: unknown error
>   notice: If it is not possible for HA-work9-win10-kvm to run on
> whale.private, see the resource-discovery option for location
> constraints
>  warning: Processing failed probe of HA-work9-win10-kvm on
> swir.private:
> unknown error
>   notice: If it is not possible for HA-work9-win10-kvm to run on
> swir.private, see the resource-discovery option for location
> constraints
>  warning: Processing failed probe of HA-work9-win10-kvm on
> swir.private:
> unknown error
>   notice: If it is not possible for HA-work9-win10-kvm to run on
> swir.private, see the resource-discovery option for location
> constraints
>  warning: Processing failed probe of HA-work9-win10-kvm on
> rider.private: unknown error
>   notice: If it is not possible for HA-work9-win10-kvm to run on
> rider.private, see the resource-discovery option for location
> constraints
>  warning: Processing failed probe of HA-work9-win10-kvm on
> rider.private: unknown error
>   notice: If it is not possible for HA-work9-win10-kvm to run on
> rider.private, see the resource-discovery option for location
> constraints
>error: Invalid recurring action
> chenbro0.1-raid5-mnt-start-interval-90 wth name: 'start'
>error: Invalid recurring action chenbro0.1-raid5-mnt-stop-
> interval-90
> wth name: 'stop'
>error: Resource HA-work9-win10-kvm is active on 3 nodes
> (attempting
> recovery)
>   notice: See
> https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more
> information
>   notice:  * Stop   HA-work9-win10-kvm   ( whale.private )   due to node availability
>   notice:  * Stop   HA-work9-win10-kvm   (  swir.private )   due to node availability
>   notice:  * Stop   HA-work9-win10-kvm   ( rider.private )   due to node availability

Re: [ClusterLabs] Resource-agents log is not output to /var/log/pacemaker/pacemaker.log on RHEL8

2019-05-29 Thread Ken Gaillot
On Wed, 2019-05-29 at 16:53 +0900, 飯田雄介 wrote:
> Hi Ken and Jan,
> 
> Thank you for your comment.
> 
> I understand that the solution is to set PCMK_logfile in the sysconfig
> file.
> 
> As a permanent fix, if you use the default values inside Pacemaker,
> how about setting environment variables using set_daemon_option()
> there?

That would be better. I was just going to change the shipped sysconfig
file because it's easy to do immediately, but changing the code would
also handle cases where users auto-generate a sysconfig that doesn't
include it, launch Pacemaker manually for testing, etc. However, that
will have to wait for the next release.
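
In the meantime the workaround is a single line in the sysconfig file
(path as shipped on RHEL 8; adjust for other distributions):

# /etc/sysconfig/pacemaker
PCMK_logfile=/var/log/pacemaker/pacemaker.log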

> For example, as PCMK_logfacility does.
> 
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.2-rc2/lib/common/logging.c#L806
> 
> BTW, Pacemaker writes to /var/log/pacemaker/pacemaker.log via libqb.
> RA writes to this file with echo redirect.
> If writing occurs at the same time, is there a risk that the file may
> be corrupted or the written log may disappear?
> I have never actually had a problem, but I'm interested in how this
> might happen.
> 
> Regards,
> Yusuke
> 
> On Tue, May 28, 2019 at 23:56, Jan Pokorný  wrote:
> > On 28/05/19 09:29 -0500, Ken Gaillot wrote:
> > > On Mon, 2019-05-27 at 14:12 +0900, 飯田雄介 wrote:
> > >> By the way, when /var/log/pacemaker/pacemaker.log is explicitly
> > set
> > >> in the PCMK_logfile, it is confirmed that the resource-agents
> > log is
> > >> output to the file set in the PCMK_logfile.
> > > 
> > > Interesting ... the resource-agents library must look for
> > PCMK_logfile
> > > as well as HA_logfile. In that case, the easiest solution will be
> > for
> > > us to set PCMK_logfile explicitly in the shipped sysconfig file.
> > I can
> > > squeeze that into the soon-to-be-released 2.0.2 since it's not a
> > code
> > > change.
> > 
> > Solution remains the same, only meant to note that presence of
> > either:
> > 
> >   PCMK_logfile
> >   HA_logfile (likely on the way towards deprecation, preferably
> > avoid)

Yep, which brings up the question of what OCF should do. Currently
neither is part of the standard.

> > in the environment (from respective sysconfig/default/conf.d file
> > for
> > pacemaker) will trigger export of HA_LOGFILE environment variable
> > propagated subsequently towards the agent processes, and everything
> > then works as expected.  IOW. OCF and/or resource-agents are still
> > reasonably decoupled, thankfully.
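
For illustration, the consuming end in the agents boils down to something
like this (a simplified, hypothetical sketch of an agent-side logger
honouring HA_LOGFILE, not the verbatim resource-agents code):

# hypothetical sketch, not the shipped ocf-shellfuncs implementation
log_to_ha_logfile() {
    if [ -n "$HA_LOGFILE" ]; then
        # plain append via redirect -- which is why the question about
        # concurrent writers alongside libqb comes up above
        echo "$(date '+%b %d %T') $*" >> "$HA_LOGFILE"
    else
        echo "$*" >&2
    fi
}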
-- 
Ken Gaillot 


[ClusterLabs] VirtualDomain and Resource_is_Too_Active ?? - problem/error

2019-05-29 Thread lejeczek
hi guys,

I'm doing something which I believe is fairly simple, namely:

$ pcs resource create HA-work9-win10-kvm VirtualDomain
hypervisor="qemu:///system"
config="/0-ALL.SYSDATA/QEMU_VMs/HA-work9-win10.qcow2"
migration_transport=ssh --disable

the virt guest is good, it runs okay under libvirt, yet pacemaker fails:

...


  notice: State transition S_IDLE -> S_POLICY_ENGINE
   error: Invalid recurring action
chenbro0.1-raid5-mnt-start-interval-90 wth name: 'start'
   error: Invalid recurring action chenbro0.1-raid5-mnt-stop-interval-90
wth name: 'stop'
  notice: Calculated transition 1864, saving inputs in
/var/lib/pacemaker/pengine/pe-input-2022.bz2
  notice: Configuration ERRORs found during PE processing.  Please run
"crm_verify -L" to identify issues.
  notice: Initiating monitor operation HA-work9-win10-kvm_monitor_0
locally on whale.private
  notice: Initiating monitor operation HA-work9-win10-kvm_monitor_0 on
swir.private
  notice: Initiating monitor operation HA-work9-win10-kvm_monitor_0 on
rider.private
 warning: HA-work9-win10-kvm_monitor_0 process (PID 2103512) timed out
 warning: HA-work9-win10-kvm_monitor_0:2103512 - timed out after 3ms
  notice: HA-work9-win10-kvm_monitor_0:2103512:stderr [
/usr/lib/ocf/resource.d/heartbeat/VirtualDomain: line 981: [: too many
arguments ]
   error: Result of probe operation for HA-work9-win10-kvm on
whale.private: Timed Out
  notice: whale.private-HA-work9-win10-kvm_monitor_0:204 [
/usr/lib/ocf/resource.d/heartbeat/VirtualDomain: line 981: [: too many
arguments\n ]
 warning: Action 15 (HA-work9-win10-kvm_monitor_0) on rider.private
failed (target: 7 vs. rc: 1): Error
  notice: Transition aborted by operation HA-work9-win10-kvm_monitor_0
'modify' on rider.private: Event failed
 warning: Action 17 (HA-work9-win10-kvm_monitor_0) on whale.private
failed (target: 7 vs. rc: 1): Error
 warning: Action 16 (HA-work9-win10-kvm_monitor_0) on swir.private
failed (target: 7 vs. rc: 1): Error
  notice: Transition 1864 (Complete=3, Pending=0, Fired=0, Skipped=0,
Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2022.bz2):
Complete
 warning: Processing failed probe of HA-work9-win10-kvm on
whale.private: unknown error
  notice: If it is not possible for HA-work9-win10-kvm to run on
whale.private, see the resource-discovery option for location constraints
 warning: Processing failed probe of HA-work9-win10-kvm on
whale.private: unknown error
  notice: If it is not possible for HA-work9-win10-kvm to run on
whale.private, see the resource-discovery option for location constraints
 warning: Processing failed probe of HA-work9-win10-kvm on swir.private:
unknown error
  notice: If it is not possible for HA-work9-win10-kvm to run on
swir.private, see the resource-discovery option for location constraints
 warning: Processing failed probe of HA-work9-win10-kvm on swir.private:
unknown error
  notice: If it is not possible for HA-work9-win10-kvm to run on
swir.private, see the resource-discovery option for location constraints
 warning: Processing failed probe of HA-work9-win10-kvm on
rider.private: unknown error
  notice: If it is not possible for HA-work9-win10-kvm to run on
rider.private, see the resource-discovery option for location constraints
 warning: Processing failed probe of HA-work9-win10-kvm on
rider.private: unknown error
  notice: If it is not possible for HA-work9-win10-kvm to run on
rider.private, see the resource-discovery option for location constraints
   error: Invalid recurring action
chenbro0.1-raid5-mnt-start-interval-90 wth name: 'start'
   error: Invalid recurring action chenbro0.1-raid5-mnt-stop-interval-90
wth name: 'stop'
   error: Resource HA-work9-win10-kvm is active on 3 nodes (attempting
recovery)
  notice: See
https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more
information
  notice:  * Stop   HA-work9-win10-kvm   ( whale.private )   due to node availability
  notice:  * Stop   HA-work9-win10-kvm   (  swir.private )   due to node availability
  notice:  * Stop   HA-work9-win10-kvm   ( rider.private )   due to node availability
   error: Calculated transition 1865 (with errors), saving inputs in
/var/lib/pacemaker/pengine/pe-error-56.bz2
  notice: Configuration ERRORs found during PE processing.  Please run
"crm_verify -L" to identify issues.
  notice: Initiating stop operation HA-work9-win10-kvm_stop_0 on
rider.private
  notice: Initiating stop operation HA-work9-win10-kvm_stop_0 on
swir.private
  notice: Initiating stop operation HA-work9-win10-kvm_stop_0 locally on
whale.private
 warning: Action 17 (HA-work9-win10-kvm_stop_0) on rider.private failed
(target: 0 vs. rc: 1): Error
  notice: Transition aborted by operation HA-work9-win10-kvm_stop_0
'modify' on rider.private: Event failed
  notice: Transition aborted by
status-3-fail-count-HA-work9-win10-kvm.stop_0 doing create
fail

[ClusterLabs] Antw: Re: Q: ocf:pacemaker:NodeUtilization monitor

2019-05-29 Thread Ulrich Windl
>>> Jan Pokorný  wrote on 28.05.2019 at 16:31 in
message
<20190528143145.ga29...@redhat.com>:
> On 27/05/19 08:28 +0200, Ulrich Windl wrote:
>> I configured ocf:pacemaker:NodeUtilization more or less for fun, and I
> realized that the cluster reports no problems, but in syslog I have these

> unusual messages:
>> 2019‑05‑27T08:21:07.748149+02:00 h06 lrmd[16599]:   notice: 
> prm_node_util_monitor_30:15028:stderr [ info: Writing node (dir)Top...
]
>> 2019‑05‑27T08:21:07.748546+02:00 h06 lrmd[16599]:   notice: 
> prm_node_util_monitor_30:15028:stderr [ info: Cannot find node `(dir)GNU

> Free Documentation License'. ]
>> 2019‑05‑27T08:21:07.748799+02:00 h06 lrmd[16599]:   notice: 
> prm_node_util_monitor_30:15028:stderr [ info: Done. ]
>> 
>> 
>> "(dir)" looks a lot like Documentation. What has the monitor to do with 
> documentation?
> 
> The above looks as if you run "info" without arguments (it will try
> to display the initial page '(dir)Top' -- and moreover perhaps when it is
> not found).
> 
> I have no idea how this could happen, since there's only one reference
> to "info" and it seems basic-sanity guarded:
> 
> https://github.com/ClusterLabs/resource-agents/blob/v4.2.0/heartbeat/NodeUtilization#L119
> 
>> 118 if [ -x $xentool ]; then
>> 119 $xentool info | awk '/total_memory/{printf("%d\n",$3);exit(0)}'
>> 120 else
>> 121 ocf_log warn "Can only set hv_memory for Xen hypervisor"
>> 122 echo "0"
> 
> So kind of a mystery :‑)

Except when $xentool is undefined ;-)
Actually on my system this command creates an empty line:
echo $(which xl 2> /dev/null || which xm 2> /dev/null)
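
Which would explain the messages: with an empty, unquoted $xentool the
guard degenerates to "[ -x ]", and a one-argument test merely checks that
the argument is a non-empty string, so it succeeds and the bare word
"info" gets executed. A possible hardening (just a sketch mirroring the
agent lines 118-122 quoted above, not the shipped agent code) would be to
require a non-empty value and quote it:

xentool=$(which xl 2>/dev/null || which xm 2>/dev/null)
if [ -n "$xentool" ] && [ -x "$xentool" ]; then
    "$xentool" info | awk '/total_memory/{printf("%d\n",$3);exit(0)}'
else
    ocf_log warn "Can only set hv_memory for Xen hypervisor"
    echo "0"
fi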


My configuration is:
# crm configure show prm_node_util
primitive prm_node_util ocf:pacemaker:NodeUtilization \
op start interval=0 timeout=90 \
op stop interval=0 timeout=120 \
op monitor interval=300 timeout=90

> 
>> Despite that, the RA seems to work. (SLES12 SP4 with current
>> patches applied)
> 
> Are you sure the resource's provider is pacemaker and not heartbeat?

Genuine SUSE:
# rpm -qf /usr/lib/ocf/resource.d/pacemaker/NodeUtilization
pacemaker-cli-1.1.19+20181105.ccd6b5b10-3.10.1.x86_64
# ll /usr/lib/ocf/resource.d/pacemaker/NodeUtilization
lrwxrwxrwx 1 root root 28 Apr 29 14:58
/usr/lib/ocf/resource.d/pacemaker/NodeUtilization ->
../heartbeat/NodeUtilization

> Got stuck for a bit trying to look up that agent on pacemaker side
> (booth is admittedly also a bit misleading in this regard).
> 
> ‑‑ 
> Jan (Poki)




Re: [ClusterLabs] Resource-agents log is not output to /var/log/pacemaker/pacemaker.log on RHEL8

2019-05-29 Thread 飯田雄介
Hi Ken and Jan,

Thank you for your comment.

I understand that the solution is to set PCMK_logfile in the sysconfig file.

As a permanent fix, if you use the default values inside Pacemaker, how
about setting environment variables using set_daemon_option() there?
For example, as PCMK_logfacility does.
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.2-rc2/lib/common/logging.c#L806

BTW, Pacemaker writes to /var/log/pacemaker/pacemaker.log via libqb.
RA writes to this file with echo redirect.
If writing occurs at the same time, is there a risk that the file may be
corrupted or the written log may disappear?
I have never actually had a problem, but I'm interested in how this might
happen.

Regards,
Yusuke

On Tue, May 28, 2019 at 23:56, Jan Pokorný  wrote:

> On 28/05/19 09:29 -0500, Ken Gaillot wrote:
> > On Mon, 2019-05-27 at 14:12 +0900, 飯田雄介 wrote:
> >> By the way, when /var/log/pacemaker/pacemaker.log is explicitly set
> >> in the PCMK_logfile, it is confirmed that the resource-agents log is
> >> output to the file set in the PCMK_logfile.
> >
> > Interesting ... the resource-agents library must look for PCMK_logfile
> > as well as HA_logfile. In that case, the easiest solution will be for
> > us to set PCMK_logfile explicitly in the shipped sysconfig file. I can
> > squeeze that into the soon-to-be-released 2.0.2 since it's not a code
> > change.
>
> Solution remains the same, only meant to note that presence of either:
>
>   PCMK_logfile
>   HA_logfile (likely on the way towards deprecation, preferably avoid)
>
> in the environment (from respective sysconfig/default/conf.d file for
> pacemaker) will trigger export of HA_LOGFILE environment variable
> propagated subsequently towards the agent processes, and everything
> then works as expected.  IOW. OCF and/or resource-agents are still
> reasonably decoupled, thankfully.
>
> --
> Jan (Poki)



-- 

DNP Metro Systems Co., Ltd.
Yusuke Iida (iida...@mail.dnp.co.jp)
141-8001
DNP Gotanda Building, 3-5-20 Nishi-Gotanda, Shinagawa-ku, Tokyo
Mobile: 070-3186-0919
