Re: [ClusterLabs] How is fencing and unfencing suppose to work?

2018-09-28 Thread digimer

On 2018-09-04 8:49 p.m., Ken Gaillot wrote:

On Tue, 2018-08-21 at 10:23 -0500, Ryan Thomas wrote:

I’m seeing unexpected behavior when using “unfencing” – I don’t think
I’m understanding it correctly.  I configured a resource that
“requires unfencing” and have a custom fencing agent which “provides
unfencing”.  I perform a simple test where I set up the cluster and
then run “pcs stonith fence node2”, and I see that node2 is
successfully fenced by sending an “off” action to my fencing agent.
But immediately after this, I see an “on” action sent to my fencing
agent.  My fence agent doesn’t implement the “reboot” action, so
perhaps it’s trying to reboot by running an off action followed by an
on action.  Prior to adding “provides unfencing” to the fencing
agent, I didn’t see the on action.  It seems unsafe to say “node2, you
can’t run” and then immediately “you can run”.

I'm not as familiar with unfencing as I'd like, but I believe the basic
idea is:

- the fence agent's off action cuts the machine off from something
essential needed to run resources (generally shared storage or network
access)

- the fencing works such that a fenced host is not able to request
rejoining the cluster without manual intervention by a sysadmin

- when the sysadmin allows the host back into the cluster, and it
contacts the other nodes to rejoin, the cluster will call the fence
agent's on action, which is expected to re-enable the host's access

How that works in practice, I have only vague knowledge.


This is correct. Consider fabric fencing, where fiber channel ports are 
disconnected: unfencing restores the connection. It is similar with a 
pure 'off' fence call to switched PDUs, as you mention above: unfencing 
powers the outlets back up.
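To make the off/on split concrete, here is a minimal sketch of how such a fence agent might dispatch actions. This is illustrative only: real fence agents receive name=value options on stdin and must implement the full fencing API, and the port-disable/enable helpers here are stand-ins for whatever actually cuts the node's access (an FC switch port, a PDU outlet, etc.).

```shell
# Hypothetical sketch of a fabric-style fence agent's action dispatch.
# "off" isolates the target node from shared storage; "on" (unfencing)
# restores its access. Helpers are stand-ins for real switch/PDU calls.
fence_sketch() {
    action=$1    # off | on | status
    port=$2      # switch port / outlet of the target node (assumption)

    case $action in
        off)    echo "disabling port $port" ;;   # fence: cut access
        on)     echo "enabling port $port" ;;    # unfence: restore access
        status) echo "port $port: unknown"; return 1 ;;
        *)      echo "usage: fence_sketch {off|on|status} port" >&2
                return 2 ;;
    esac
}

fence_sketch off 7    # what "pcs stonith fence node2" ends up invoking
fence_sketch on 7     # called later, when the node is allowed back in
```

The key property is that "on" is a separate, deliberate step, not an automatic follow-up to "off".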



I don’t think I’m understanding this aspect of fencing/stonith.  I
thought that the fence agent acted as a proxy to a node: when the
node was fenced, it was isolated from shared storage by some means
(power, fabric, etc.).  It seems like it shouldn’t become unfenced
until connectivity between the nodes is repaired.  Yet, the node is
turned “off” (isolated) and then “on” (unisolated) immediately.  This
(kind of) makes sense for a fencing agent that uses power to isolate,
since when it’s turned back on, pacemaker will not start any
resources on that node until it sees the other nodes (due to the
wait_for_all setting).  However, for other types of fencing agents,
it doesn’t make sense.  Does the “off” action not mean isolate from
shared storage? And the “on” action not mean unisolate?  What is the
correct way to understand fencing/stonith?

I think the key idea is that "on" will be called when the fenced node
asks to rejoin the cluster. So stopping that from happening until a
sysadmin has intervened is an important part (if I'm not missing
something).

Note that if the fenced node still has network connectivity to the
cluster, and the fenced node is actually operational, it will be
notified by the cluster that it was fenced, and it will stop its
pacemaker, thus fulfilling the requirement. But you obviously can't
rely on that because fencing may be called precisely because network
connectivity is lost or the host is not fully operational.


The behavior I wanted to see was, when pacemaker lost connectivity to
a node, it would run the off action for that node.  If this
succeeded, it could continue running resources.  Later, when
pacemaker saw the node again it would run the “on” action on the
fence agent (knowing that it was no longer split-brained).  Node2
would try to do the same thing, but once it was fenced, it would no
longer attempt to fence node1.  It also wouldn’t attempt to start any
resources.  I thought that adding “requires unfencing” to the
resource would make this happen.  Is there a way to get this
behavior?

That is basically what happens; the question is how "pacemaker saw the
node again" becomes possible.


Thanks!

btw, here's the cluster configuration:

pcs cluster auth node1 node2
pcs cluster setup --name ataCluster node1 node2
pcs cluster start --all
pcs property set stonith-enabled=true
pcs resource defaults migration-threshold=1
pcs resource create Jaws ocf:atavium:myResource op stop on-fail=fence
meta requires=unfencing
pcs stonith create myStonith fence_custom op monitor interval=0 meta
provides=unfencing
pcs property set symmetric-cluster=true

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] How is fencing and unfencing suppose to work?

2018-09-28 Thread Ryan Thomas
Update:  It seems that fencing does work as I expected; the problem was
with how I was testing it.  I was seeing the node turned “off”
(isolated) and then “on” (unisolated) immediately, which seemed wrong.
This was because the way I was turning the node off in my testing was
to kill some processes, including the pacemaker and corosync processes.
However, the systemd unit file for pacemaker/corosync is configured to
restart the service immediately if it dies.  So I was seeing the “on”
call immediately after the “off” because the pacemaker/corosync service
was restarted, and the node I had just killed appeared to come back
immediately.
Thanks,
Ryan
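For anyone reproducing this test: on systemd-based distributions the pacemaker unit commonly ships with Restart=on-failure, so a SIGKILLed pacemakerd is resurrected almost instantly. Either stop the service cleanly (`systemctl stop pacemaker`) instead of killing processes, or temporarily disable the auto-restart with a standard systemd drop-in along these lines (path and contents are a sketch of the override mechanism, not a shipped file):

```ini
# /etc/systemd/system/pacemaker.service.d/no-restart.conf
# Temporarily disable automatic restart while testing fencing behavior;
# remove this drop-in again once testing is done.
[Service]
Restart=no
```

Run `systemctl daemon-reload` after creating or removing the drop-in so it takes effect.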

On Tue, Sep 4, 2018 at 7:49 PM Ken Gaillot  wrote:

> [quoted text trimmed; it duplicates the exchange in Ken Gaillot's
> reply earlier in this digest]

Re: [ClusterLabs] Which effective user is calling OCF agents for querying meta-data?

2018-09-28 Thread Ken Gaillot
On Wed, 2018-09-26 at 13:26 +, cfpubl...@verimatrix.com wrote:
> Hi all,
>  
> we have been using pacemaker 1.1.7 for many years on RedHat 6.
> Recently, we moved to RedHat 7.3 and pacemaker 1.1.17.
> Note that we build pacemaker from source RPMs and don’t use the
> packages supplied by RedHat.
>  
> With pacemaker 1.1.17, we observe the following messages during
> startup of pacemaker:
> 2018-09-18T11:58:18.452951+03:00 p12-0001-bcsm03 crmd[2871]: 
> warning: Cannot execute
> '/usr/lib/ocf/resource.d/verimatrix/anything4': Permission denied
> (13)
> 2018-09-18T11:58:18.453179+03:00 p12-0001-bcsm03 crmd[2871]:   
> error: Failed to retrieve meta-data for ocf:verimatrix:anything4
> 2018-09-18T11:58:18.453291+03:00 p12-0001-bcsm03 crmd[2871]:   
> error: No metadata for ocf::verimatrix:anything4
>  
> However, apart from that, we can control the respective cluster
> resource (start, stop, move, etc.) as expected.
>  
> crmd is running as user ‘hacluster’, both on the old pacemaker 1.1.7
> deployment on RHEL6 and on the new pacemaker 1.1.17 deployment on
> RHEL7.
>  
> It seems that on startup, crmd is querying the meta-data on the OCF
> agents using a non-root user (hacluster?) while the regular resource
> control activity seems to be done as root.
> The OCF resource in question intentionally resides in a directory
> that is inaccessible to non-root users.
>  
> Is this behavior of using different users intended? If yes, any clue
> why was it working with pacemaker 1.1.7 under RHEL6?
>  
> Thanks,
>   Carsten

This was answered elsewhere, but for anyone searching who ends up here:

The crmd executes meta-data actions as hacluster, while the lrmd
executes all other resource agent actions as root. It is a long-term
goal to make crmd go through the lrmd to get meta-data, to fix this and
other issues.

As a best practice, resource agents' meta-data action should not
require any permissions or have any side effects, as normal users
should be able to query meta-data.
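That best practice can be sketched as an action-dispatch skeleton where meta-data is answered before anything that needs root (abbreviated: a real agent must emit the full OCF meta-data XML schema; the agent name is borrowed from the log above purely for illustration):

```shell
# Sketch of OCF resource agent dispatch: meta-data is handled first,
# with no side effects and no privileged access, so crmd can query it
# as the unprivileged hacluster user.
ra_action() {
    case $1 in
        meta-data)
            # abbreviated XML; a real agent emits the full OCF schema
            cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="anything4">
  <version>0.1</version>
</resource-agent>
EOF
            ;;
        start|stop|monitor)
            # privileged work goes here; the lrmd runs these as root
            return 0 ;;
        *)
            return 3 ;;   # OCF_ERR_UNIMPLEMENTED
    esac
}
```

Note that the meta-data branch never touches the protected directory or any cluster state.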

I'm not sure why it worked under 1.1.7. My best guess is that you were
using the old corosync 1 plugin (as opposed to the CMAN layer more
commonly used on RHEL 6), and that the plugin launched all processes as
root.

Giving hacluster access to the protected directory, using setfacl or
group membership, might be a useful workaround.
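Sketched as commands (the path is taken from the log above; the exact ACL mode is an assumption), that workaround might look like:

```shell
# Allow the hacluster user to traverse and read the protected agent
# directory without opening it up to all non-root users:
setfacl -m u:hacluster:rx /usr/lib/ocf/resource.d/verimatrix
setfacl -m u:hacluster:rx /usr/lib/ocf/resource.d/verimatrix/anything4
getfacl /usr/lib/ocf/resource.d/verimatrix    # verify the ACL entries
```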
-- 
Ken Gaillot 


Re: [ClusterLabs] Understanding the behavior of pacemaker crash

2018-09-28 Thread Ken Gaillot
On Fri, 2018-09-28 at 15:26 +0530, Prasad Nagaraj wrote:
> Hi Ken - Only if I turn off corosync on the node [where I crashed
> pacemaker] are the other nodes able to detect this and mark the node
> as OFFLINE.
> Do you have any other guidance or insights into this ?

Yes, corosync is the cluster membership layer -- if corosync is
successfully running, then the node is a member of the cluster.
Pacemaker's crmd provides a higher level of membership; typically, with
corosync but no crmd, the node shows up as "pending" in status.
However, I am not sure how it worked with the old corosync plugin.
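For anyone landing here from a search, the two membership levels can be inspected separately. The commands below are illustrative and assume a corosync 2.x stack; the corosync 1 plugin discussed in this thread ships different tooling.

```shell
# Corosync-level membership (quorum layer): list member nodes
corosync-quorumtool -l

# Pacemaker-level membership: with corosync up but crmd down, the node
# typically appears as "pending" rather than "Online" in the status
crm_mon -1
```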

> 
> Thanks
> Prasad
> 
> > On Thu, Sep 27, 2018 at 9:33 PM Prasad Nagaraj wrote:
> > Hi Ken - Thanks for the response. Pacemaker is still not running on
> > that node. So I am still wondering what could be the issue ? Any
> > other configurations or logs should I be sharing to understand this
> > more ?
> > 
> > Thanks!
> > 
> > On Thu, Sep 27, 2018 at 8:08 PM Ken Gaillot 
> > wrote:
> > > On Thu, 2018-09-27 at 13:45 +0530, Prasad Nagaraj wrote:
> > > > Hello - I was trying to understand the behavior of the cluster when
> > > > pacemaker crashes on one of the nodes. So I hard killed
> > > pacemakerd
> > > > and its related processes.
> > > > 
> > > > ---
> > > > [root@SG-mysqlold-907 azureuser]# ps -ef | grep pacemaker
> > > > root      74022      1  0 07:53 pts/0    00:00:00 pacemakerd
> > > > 189       74028  74022  0 07:53 ?        00:00:00
> > > > /usr/libexec/pacemaker/cib
> > > > root      74029  74022  0 07:53 ?        00:00:00
> > > > /usr/libexec/pacemaker/stonithd
> > > > root      74030  74022  0 07:53 ?        00:00:00
> > > > /usr/libexec/pacemaker/lrmd
> > > > 189       74031  74022  0 07:53 ?        00:00:00
> > > > /usr/libexec/pacemaker/attrd
> > > > 189       74032  74022  0 07:53 ?        00:00:00
> > > > /usr/libexec/pacemaker/pengine
> > > > 189       74033  74022  0 07:53 ?        00:00:00
> > > > /usr/libexec/pacemaker/crmd
> > > > 
> > > > root      75228  50092  0 07:54 pts/0    00:00:00 grep
> > > pacemaker
> > > > [root@SG-mysqlold-907 azureuser]# kill -9 74022
> > > > 
> > > > [root@SG-mysqlold-907 azureuser]# ps -ef | grep pacemaker
> > > > root      74030      1  0 07:53 ?        00:00:00
> > > > /usr/libexec/pacemaker/lrmd
> > > > 189       74032      1  0 07:53 ?        00:00:00
> > > > /usr/libexec/pacemaker/pengine
> > > > 
> > > > root      75303  50092  0 07:55 pts/0    00:00:00 grep
> > > pacemaker
> > > > [root@SG-mysqlold-907 azureuser]# kill -9 74030
> > > > [root@SG-mysqlold-907 azureuser]# kill -9 74032
> > > > [root@SG-mysqlold-907 azureuser]# ps -ef | grep pacemaker
> > > > root      75332  50092  0 07:55 pts/0    00:00:00 grep
> > > pacemaker
> > > > 
> > > > [root@SG-mysqlold-907 azureuser]# crm satus
> > > > ERROR: status: crm_mon (rc=107): Connection to cluster failed:
> > > > Transport endpoint is not connected
> > > > ---
> > > > 
> > > > However, this does not seem to be having any effect on the
> > > cluster
> > > > status from other nodes
> > > > ---
> > > > 
> > > > 
> > > > [root@SG-mysqlold-909 azureuser]# crm status
> > > > Last updated: Thu Sep 27 07:56:17 2018          Last change:
> > > Thu Sep
> > > > 27 07:53:43 2018 by root via crm_attribute on SG-mysqlold-909
> > > > Stack: classic openais (with plugin)
> > > > Current DC: SG-mysqlold-908 (version 1.1.14-8.el6_8.1-70404b0)
> > > -
> > > > partition with quorum
> > > > 3 nodes and 3 resources configured, 3 expected votes
> > > > 
> > > > Online: [ SG-mysqlold-907 SG-mysqlold-908 SG-mysqlold-909 ]
> > > 
> > > It most definitely would make the node offline, and if fencing
> > > were
> > > configured, the rest of the cluster would fence the node to make
> > > sure
> > > it's safely down.
> > > 
> > > I see you're using the old corosync 1 plugin. I suspect what
> > > happened
> > > in this case is that corosync noticed the plugin died and
> > > restarted it
> > > quickly enough that it had rejoined by the time you checked the
> > > status
> > > elsewhere.
> > > 
> > > > 
> > > > Full list of resources:
> > > > 
> > > >  Master/Slave Set: ms_mysql [p_mysql]
> > > >      Masters: [ SG-mysqlold-909 ]
> > > >      Slaves: [ SG-mysqlold-907 SG-mysqlold-908 ]
> > > > 
> > > > 
> > > > [root@SG-mysqlold-908 azureuser]# crm status
> > > > Last updated: Thu Sep 27 07:56:08 2018          Last change:
> > > Thu Sep
> > > > 27 07:53:43 2018 by root via crm_attribute on SG-mysqlold-909
> > > > Stack: classic openais (with plugin)
> > > > Current DC: SG-mysqlold-908 (version 1.1.14-8.el6_8.1-70404b0)
> > > -
> > > > partition with quorum
> > > > 3 nodes a

Re: [ClusterLabs] Understanding the behavior of pacemaker crash

2018-09-28 Thread Prasad Nagaraj
Hi Ken - Only if I turn off corosync on the node [where I crashed
pacemaker] are the other nodes able to detect this and mark the node as
OFFLINE.
Do you have any other guidance or insights into this ?

Thanks
Prasad

On Thu, Sep 27, 2018 at 9:33 PM Prasad Nagaraj 
wrote:

> [quoted exchange trimmed; it duplicates Ken Gaillot's reply earlier
> in this digest]

[ClusterLabs] Which effective user is calling OCF agents for querying meta-data?

2018-09-28 Thread cfpubl...@verimatrix.com
Hi all,

we have been using pacemaker 1.1.7 for many years on RedHat 6. Recently, we 
moved to RedHat 7.3 and pacemaker 1.1.17.
Note that we build pacemaker from source RPMs and don’t use the packages 
supplied by RedHat.

With pacemaker 1.1.17, we observe the following messages during startup of 
pacemaker:
2018-09-18T11:58:18.452951+03:00 p12-0001-bcsm03 crmd[2871]:  warning: Cannot 
execute '/usr/lib/ocf/resource.d/verimatrix/anything4': Permission denied (13)
2018-09-18T11:58:18.453179+03:00 p12-0001-bcsm03 crmd[2871]:error: Failed 
to retrieve meta-data for ocf:verimatrix:anything4
2018-09-18T11:58:18.453291+03:00 p12-0001-bcsm03 crmd[2871]:error: No 
metadata for ocf::verimatrix:anything4

However, apart from that, we can control the respective cluster resource 
(start, stop, move, etc.) as expected.

crmd is running as user ‘hacluster’, both on the old pacemaker 1.1.7 deployment 
on RHEL6 and on the new pacemaker 1.1.17 deployment on RHEL7.

It seems that on startup, crmd is querying the meta-data on the OCF agents 
using a non-root user (hacluster?) while the regular resource control activity 
seems to be done as root.
The OCF resource in question intentionally resides in a directory that is 
inaccessible to non-root users.

Is this behavior of using different users intended? If yes, any clue why was it 
working with pacemaker 1.1.7 under RHEL6?

Thanks,
  Carsten


Re: [ClusterLabs] Corosync 3 release plans?

2018-09-28 Thread Christine Caulfield
On 27/09/18 20:16, Ferenc Wágner wrote:
> Christine Caulfield  writes:
> 
>> I'm also looking into high-res timestamps for logfiles too.
> 
> Wouldn't that be a useful option for the syslog output as well?  I'm
> sometimes concerned by the batching effect added by the transport
> between the application and the (local) log server (rsyslog or systemd).
> Reliably merging messages from different channels can prove impossible
> without internal timestamps (even considering a single machine only).
> 
> Another interesting feature could be structured, direct journal output
> (if you're looking for challenges).
> 


I'm inclined to leave syslog timestamps to syslog - rsyslog has the
option for hi-res timestamps (yes, I know it stamps them on receipt and
all that) if you need them. Adding 'proper' journal output sounds like
a good idea to me, though.
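For reference, the rsyslog option mentioned here is a one-line template switch (legacy directive syntax shown; an equivalent RainerScript form also exists):

```
# rsyslog.conf: use RFC 3339 high-precision timestamps for file outputs
# instead of the traditional low-resolution syslog timestamp format:
$ActionFileDefaultTemplate RSYSLOG_FileFormat
```

As noted above, rsyslog stamps messages on receipt, so this improves resolution but not the batching delay between the application and the log daemon.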

I'm not so much looking for challenges as looking to make libqb more
useful for the people using it :)


Chrissie