Re: [ClusterLabs] resource management of standby node

2020-12-08 Thread Roger Zhou



On 12/1/20 4:03 PM, Ulrich Windl wrote:

Ken Gaillot wrote on 30.11.2020 at 19:52:

...


Though there's nothing wrong with putting all nodes in standby. Another
alternative would be to set the stop-all-resources cluster property.


Hi Ken,

thanks for the valuable feedback!

I was looking for that, but unfortunately crm shell cannot set that from the 
resource (or node) context; only from the configure context.
I don't know what a good syntax would be: "resource stop all" / "resource start all",
or "resource stop-all" / "resource unstop-all".
(The asymmetry is that after a "stop all" you cannot start a single resource (I
guess), but have to use "start-all", which, in turn, does not start resources
that already had a stopped role (I guess).)

So maybe "resource set stop-all" / "resource unset stop-all" / "resource clear
stop-all"?



1.
Well, letting `crm resource stop|start all` change the cluster property
`stop-all-resources` might contaminate the syntax of the resource level.
To avoid that, the user interface needs to communicate the internals clearly
enough up front to head off misunderstandings and questions.
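
As Ulrich notes, today the property is only reachable from the configure
context, e.g. (a sketch; exact syntax may vary by crmsh version):

   # stop everything via the cluster property
   crm configure property stop-all-resources=true
   # let resources run again
   crm configure property stop-all-resources=false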


2.
On the other hand, people might naturally read `crm resource stop all` as
setting `target-role=Stopped` on every resource. Technically that seems a bit
awkward, and it has no obvious benefit compared to stop-all-resources. The
pacemaker developers could comment more on the internals around this.
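
Per resource, that reading would amount to what a plain stop already does
today, i.e. roughly (illustrative resource name "r1"):

   # "crm resource stop r1" effectively sets the meta attribute
   crm resource meta r1 set target-role Stopped
   # "crm resource start r1" sets target-role=Started again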


3.
`resource set|unset` adds more commands under `resource`; that would confuse
some users and should be avoided, in my view.


I feel more discussion is needed, though my gut feeling is that approach 1 is
the better one.


Anyway, a good topic indeed. Feedback from more users would help shape the
better UI/UX. I can imagine some people might even suggest an "--all" flag, btw.


Thanks,
Roger



Re: [ClusterLabs] Antw: [EXT] Re: Q: high-priority messages from DLM?

2020-12-08 Thread Roger Zhou



On 12/8/20 6:48 PM, Strahil Nikolov wrote:

Nope,

but if you don't use clustered FS, you could also use plain LVM + tags.
As far as I know you need dlm and clvmd for clustered FS.



FYI, clvmd was dropped in lvm2 v2.03 and replaced by lvmlockd. BTW, lvmlockd
(or its predecessor clvmd) is optional here in theory, though practically
useful.
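
For anyone moving off clvmd, the lvmlockd path looks roughly like this (a
sketch; see lvmlockd(8) for the authoritative steps):

   # /etc/lvm/lvm.conf
   use_lvmlockd = 1

   # with dlm_controld running, create a shared VG and start its lockspace
   vgcreate --shared vg1 /dev/disk/by-id/<shared-disk>
   vgchange --lock-start vg1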



On Fri, Dec 4, 2020 at 5:32 AM Ulrich Windl


Offtopic: Are you using DLM with OCFS2?


Hi!

I'm using OCFS2, but I tend to ask "Can I use OCFS2 _without_ DLM?". ;-)



As Strahil said, DLM is a must-have for OCFS2. As a side note, however, the
in-kernel "o2cb" component is an alternative stack that can replace
corosync/pacemaker.
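
The stack is chosen when the filesystem is created, e.g. (illustrative device
and cluster names; see mkfs.ocfs2(8)):

   # pacemaker-based stack (needs corosync/pacemaker + dlm):
   mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster /dev/<dev>
   # in-kernel o2cb stack instead:
   mkfs.ocfs2 --cluster-stack=o2cb --cluster-name=ocfs2cluster /dev/<dev>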


BR,
Roger



Re: [ClusterLabs] Antw: [EXT] sbd v1.4.2

2020-12-08 Thread Roger Zhou

First of all, great news about the new release!

On 12/8/20 8:12 PM, Klaus Wenninger wrote:

On 12/8/20 11:51 AM, Klaus Wenninger wrote:

On 12/3/20 9:29 AM, Reid Wahl wrote:

On Thu, Dec 3, 2020 at 12:03 AM Ulrich Windl


[...]


‑ add robustness against misconfiguration / improve documentation

   * add environment section to man‑page previously just available in
 template‑config
   * inform the user to restart the sbd service after disk‑initialization

I thought with adding UUIDs sbd automatically detects a header change.

You have a valid point here.
Actually a disk-init on an operational cluster should be
quite safe. (A very small race between header and slot
read does exist.)
Might make sense to think over taking the message back
or revising it.

Yan Gao just pointed me to the timeout configuration not being
updated if it changes in the header.
Guess until that is tackled one way or another the message
is a good idea.



Indeed, users may want to tune sbd at runtime without restarting the whole
cluster stack, e.g. the watchdog timeout, the msgwait timeout, etc. Currently,
changing these timeouts forces users to recreate the sbd disk, which is a
rather strange user experience. And having to restart the whole cluster leaves
an even worse impression.
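
Today that means re-initializing the header and restarting sbd on all nodes,
roughly (illustrative device path and values):

   sbd -d /dev/disk/by-id/<sbd-disk> dump                 # inspect current timeouts
   sbd -d /dev/disk/by-id/<sbd-disk> -1 30 -4 60 create   # watchdog=30s, msgwait=60s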


I can understand there are gaps currently, e.g. reinitializing the watchdog
driver timeout, followed by a script to refresh pacemaker's
stonith-watchdog-timeout, stonith-timeout, etc.
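
e.g. something like this on the pacemaker side (hypothetical helper steps,
values are examples only):

   # after changing sbd timeouts, the cluster properties need matching updates
   crm configure property stonith-timeout=90            # larger than msgwait
   crm configure property stonith-watchdog-timeout=60   # ~2x SBD_WATCHDOG_TIMEOUT (diskless sbd)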


Furthermore, can we even change SBD_DEVICE at runtime?

All in all, such flexibility would ease management activities and make for a
better user experience overall.


Thanks,
Roger



Re: [ClusterLabs] Antw: [EXT] Re: Q: high-priority messages from DLM?

2020-12-08 Thread Strahil Nikolov
Nope,

but if you don't use clustered FS, you could also use plain LVM + tags.
As far as I know you need dlm and clvmd for clustered FS.
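
For reference, the tag-based variant is roughly the classic HA-LVM scheme
(names are illustrative):

   # /etc/lvm/lvm.conf: only tagged or listed VGs may activate
   volume_list = [ "@node1", "vg_local" ]

   # on failover, re-tag the shared VG to the new node and activate it
   vgchange --deltag node1 vg_shared
   vgchange --addtag node2 vg_shared
   vgchange -ay vg_shared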

Best Regards,
Strahil Nikolov






On Tuesday, 8 December 2020 at 10:15:39 GMT+2, Ulrich Windl wrote:

> Hi!
>
> I'm using OCFS2, but I tend to ask "Can I use OCFS2 _without_ DLM?". ;-)
>
> [...]


Re: [ClusterLabs] Antw: [EXT] sbd v1.4.2

2020-12-08 Thread Klaus Wenninger
On 12/8/20 11:51 AM, Klaus Wenninger wrote:
> On 12/3/20 9:29 AM, Reid Wahl wrote:
>> On Thu, Dec 3, 2020 at 12:03 AM Ulrich Windl wrote:
>>> [...]
>>>>   * inform the user to restart the sbd service after disk-initialization
>>> I thought with adding UUIDs sbd automatically detects a header change.
> You have a valid point here.
> Actually a disk-init on an operational cluster should be
> quite safe. (A very small race between header and slot
> read does exist.)
> Might make sense to think over taking the message back
> or revising it.
Yan Gao just pointed me to the timeout configuration not being
updated if it changes in the header.
Guess until that is tackled one way or another the message
is a good idea.

Klaus
> [...]



Re: [ClusterLabs] Antw: [EXT] sbd v1.4.2

2020-12-08 Thread Klaus Wenninger
On 12/3/20 9:29 AM, Reid Wahl wrote:
> On Thu, Dec 3, 2020 at 12:03 AM Ulrich Windl
>  wrote:
>> Hi!
>>
>> See comments inline...
>>
>> Klaus Wenninger wrote on 02.12.2020 at 22:05 in message
>> <1b29fa92-b1b7-2315-fbcf-0787ec0e1...@redhat.com>:
>>> Hi sbd ‑ developers & users!
>>>
>>> Thanks to everybody for contributing to tests and
>>> further development.
>>>
>>> Improvements in build/CI-friendliness and
>>> added robustness against misconfiguration
>>> justify labeling the repo v1.4.2.
>>>
>>> I tried to quickly summarize the changes in the
>>> repo since it was labeled v1.4.1:
>>>
>>> - improve build/CI-friendliness
>>>
>>>   * travis: switch to F32 as build‑host
>>> switch to F32 & leap‑15.2
>>> changes for mock‑2.0
>>> turn off loop‑devices & device‑mapper on x86_64 targets because
>>> of changes in GCE
>>>   * regressions.sh: get timeouts from disk‑header to go with proper
>> defaults
>>> for architecture
>>>   * use configure for watchdog‑default‑timeout & others
>>>   * ship sbd.pc with basic sbd build information for downstream packages
>>> to use
>>>   * add number of commits since version‑tag to build‑counter
>>>
>>> ‑ add robustness against misconfiguration / improve documentation
>>>
>>>   * add environment section to man‑page previously just available in
>>> template‑config
>>>   * inform the user to restart the sbd service after disk‑initialization
>> I thought with adding UUIDs sbd automatically detects a header change.
You have a valid point here.
Actually a disk-init on an operational cluster should be
quite safe. (A very small race between header and slot
read does exist.)
Might make sense to think over taking the message back
or revising it.
>>
>>>   * refuse to start if any of the configured device names is invalid
>> Is this a good idea? Assume you configured two devices, and one device fails.
>> Do you really want to prevent sbd startup then?
> AFAICT, it's just making sure the device name is of a valid format.
>
> https://github.com/ClusterLabs/sbd/blob/master/src/sbd-inquisitor.c#L830-L833
> -> https://github.com/ClusterLabs/sbd/blob/master/src/sbd-inquisitor.c#L65-L78
> -- --> 
> https://github.com/ClusterLabs/sbd/blob/master/src/sbd-common.c#L1189-L1220
>
>>>   * add handshake to sync startup/shutdown with pacemakerd
>>> Previously sbd just waited for the cib-connection to show up/go away
>>> which isn't robust at all.
>>> The new feature needs new pacemakerd‑api as counterpart.
>>> Thus build checks for presence of pacemakerd‑api.
>>> To simplify downstream adoption behavior is configurable at runtime
>>> via configure‑file with a build‑time‑configurable default.
>>>   * refuse to start if qdevice‑sync_timeout doesn't match watchdog‑timeout
>>> Needed in particular as qdevice‑sync_timeout delays quorum‑state‑update
>>> and has a default of 30s that doesn't match the 5s watchdog‑timeout
>>> default.
>>>
>>> ‑ Fix: sbd‑pacemaker: handle new no_quorum_demote + robustness against new
>>>   policies added
>>> ‑ Fix: agent: correctly compare string values when calculating timeout
>>> ‑ Fix: scheduling: overhaul the whole thing
>>>   * prevent possible lockup when format in proc changes
>>>   * properly get and handle scheduler policy & prio
>>>   * on SCHED_RR failing push to the max with SCHED_OTHER
>> Do you also mess with ioprio/ionice?
Yes, IOPRIO_CLASS_RT.
But it's a good reminder to check to what extent the hacky code doing this is
still state of the art, and whether it is even effective when using AIO.

Klaus
>>
>> Regards,
>> Ulrich
>>
>>> Regards,
>>> Klaus



Re: [ClusterLabs] Antw: [EXT] Re: Q: high-priority messages from DLM?

2020-12-08 Thread Reid Wahl
Yeah, I agree with you on all points, unless the author had a reason
for those decisions 15 years ago. Most of us do our best to avoid
touching DLM :)

If anyone wants to make changes to the DLM logging macros, I'm in
favor of it. I'm just not gonna lead the charge on it.
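
For context, the macro behind those links boils down to something like this
(paraphrased from fs/dlm/dlm_internal.h, so treat it as a sketch):

   /* today: every log_print() goes out at KERN_ERR */
   #define log_print(fmt, args...) \
           printk(KERN_ERR "dlm: "fmt"\n", ##args)

   /* a possible refactor along Ulrich's suggestion (not actual kernel
      code): let the caller pass the severity */
   #define log_print_level(lvl, fmt, args...) \
           printk(lvl "dlm: "fmt"\n", ##args)

   /* e.g.: log_print_level(KERN_INFO, "Using SCTP for communications"); */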

On Tue, Dec 8, 2020 at 12:01 AM Ulrich Windl
 wrote:
>
> >>> Reid Wahl wrote on 04.12.2020 at 19:33:
> > [...]
> So everything log_print() outputs is an error? IMHO log_print is missing the
> priority/severity parameter...
> Comparing log_print(), log_error() and log_debug(), I think that logging code
> could benefit from some refactoring.
>
> Back on the subject:
> I think "Using SCTP for communications" is informational, not error, just as
> "closing connection to node 118" is probably notice, while "connecting to" /
> "connected to" is probably info or notice, too.
>
> Regards,
> Ulrich
>
> [...]



-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA



Re: [ClusterLabs] Q: "crm node status" display

2020-12-08 Thread Roger Zhou

Could you create a GitHub issue before we lose track of it? Thank you, Ulrich!
https://github.com/ClusterLabs/crmsh/issues
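
(In the meantime, "crm node show" should give a human-readable node list,
if I remember correctly.)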

BR,
Roger


On 11/20/20 2:50 PM, Ulrich Windl wrote:

Hi!

Setting up a new cluster with SLES15 SP2, I'm wondering: "crm node status" 
displays XML. Is that the way it should be?
h16:~ # crm node
crm(live/rksaph16)node# status
[XML output elided by the list archive]

crmsh-4.2.0+git.1604052559.2a348644-5.26.1.noarch

Regards,
Ulrich







[ClusterLabs] Antw: [EXT] Re: Q: high-priority messages from DLM?

2020-12-08 Thread Ulrich Windl
>>> Strahil Nikolov wrote on 05.12.2020 at 18:51:
> It's more interesting why you got the connection close...
> Are you sure you didn't get network issues? What is corosync saying in
> the logs?
>
> Offtopic: Are you using DLM with OCFS2?

Hi!

I'm using OCFS2, but I tend to ask "Can I use OCFS2 _without_ DLM?". ;-)

Regards,
Ulrich

> 
> Best Regards,
> Strahil Nikolov
> 
> At 10:33 -0800 on 04.12.2020 (Fri), Reid Wahl wrote:
>> On Fri, Dec 4, 2020 at 10:32 AM Reid Wahl  wrote:
>> > I'm inclined to agree, although maybe there's a good reason. These
>> > get
>> > logged with KERN_ERR priority.
>> 
>> I hit Enter and that email sent instead of line-breaking... anyway.
>> 
>> https://github.com/torvalds/linux/blob/master/fs/dlm/dlm_internal.h#L61-L62

>> https://github.com/torvalds/linux/blob/master/fs/dlm/lowcomms.c#L1250 
>> 
>> > On Fri, Dec 4, 2020 at 5:32 AM Ulrich Windl
>> >  wrote:
>> > > Hi!
>> > > 
>> > > Logging into a server via iDRAC, I see several messages from
>> > > "dlm:" at the console screen. My obvious explanation is that they
>> > > are on the screen because journald (SLES15 SP2) treats them as
>> > > high-priority messages that should go to the screen. However, IMHO
>> > > they are not:
>> > > 
>> > > [83035.82] dlm: closing connection to node 118
>> > > [84756.045008] dlm: closing connection to node 118
>> > > [160906.211673] dlm: Using SCTP for communications
>> > > [160906.239357] dlm: connecting to 118
>> > > [160906.239807] dlm: connecting to 116
>> > > [160906.241432] dlm: connected to 116
>> > > [160906.241448] dlm: connected to 118
>> > > [174464.522831] dlm: closing connection to node 116
>> > > [174670.058912] dlm: connecting to 116
>> > > [174670.061373] dlm: connected to 116
>> > > [175561.816821] dlm: closing connection to node 118
>> > > [175617.654995] dlm: connecting to 118
>> > > [175617.665153] dlm: connected to 118
>> > > [175695.310971] dlm: closing connection to node 118
>> > > [175695.311039] dlm: closing connection to node 116
>> > > [175695.311084] dlm: closing connection to node 119
>> > > [175759.045564] dlm: Using SCTP for communications
>> > > [175759.052075] dlm: connecting to 118
>> > > [175759.052623] dlm: connecting to 116
>> > > [175759.052917] dlm: connected to 116
>> > > [175759.053847] dlm: connected to 118
>> > > [432217.637844] dlm: closing connection to node 119
>> > > [432217.637912] dlm: closing connection to node 118
>> > > [432217.637953] dlm: closing connection to node 116
>> > > [438872.495086] dlm: Using SCTP for communications
>> > > [438872.499832] dlm: connecting to 118
>> > > [438872.500340] dlm: connecting to 116
>> > > [438872.500600] dlm: connected to 116
>> > > [438872.500642] dlm: connected to 118
>> > > [779424.346316] dlm: closing connection to node 116
>> > > [780017.597844] dlm: connecting to 116
>> > > [780017.616321] dlm: connected to 116
>> > > [783118.476060] dlm: closing connection to node 116
>> > > [783318.744036] dlm: connecting to 116
>> > > [783318.756923] dlm: connected to 116
>> > > [784893.793366] dlm: closing connection to node 118
>> > > [785082.619709] dlm: connecting to 118
>> > > [785082.633263] dlm: connected to 118
>> > > 
>> > > Regards,
>> > > Ulrich





[ClusterLabs] Antw: [EXT] Re: Q: high-priority messages from DLM?

2020-12-08 Thread Ulrich Windl
>>> Reid Wahl wrote on 04.12.2020 at 19:33:
> On Fri, Dec 4, 2020 at 10:32 AM Reid Wahl  wrote:
>>
>> I'm inclined to agree, although maybe there's a good reason. These get
>> logged with KERN_ERR priority.
> 
> I hit Enter and that email sent instead of line‑breaking... anyway.
> 
> https://github.com/torvalds/linux/blob/master/fs/dlm/dlm_internal.h#L61‑L62

> https://github.com/torvalds/linux/blob/master/fs/dlm/lowcomms.c#L1250 

So everything log_print() outputs is an error? IMHO log_print is missing the
priority/severity parameter...
Comparing log_print(), log_error() and log_debug(), I think that logging code
could benefit from some refactoring.

Back on the subject:
I think "Using SCTP for communications" is informational, not error, just as
"closing connection to node 118" is probably notice, while "connecting to" /
"connected to" is probably info or notice, too.

Regards,
Ulrich

> [...]