Hello Ulrich,
See my comments inline.
On 2021/10/8 16:38, Ulrich Windl wrote:
Hi!
I just noticed these two messages on two nodes of a 3-node cluster:
Oct 08 10:00:14 h18 kernel: dlm: 790F9C237C2A45758135FE4945B7A744:
send_repeat_remove dir 119 O09d835
Oct 08 10:00:14
On 2021/9/29 16:20, Lentes, Bernd wrote:
- On Sep 29, 2021, at 4:37 AM, Gang He g...@suse.com wrote:
Hi Lentes,
Thanks for your feedback.
I have some questions as below,
1) How do you clone these VM images from each ocfs2 node via reflink?
Did you encounter any problems during this step?
Gang He wrote on 11.07.2021 at 10:55 in message
Hi Ulrich,
Thanks for your update.
Based on some feedback from the upstream, there is a patch (ocfs2: initialize
ip_next_orphan), which should fix this problem.
I can confirm the patch looks very similar to your problem.
I will verify it next week, then let you know the result.
Thanks
Gang
Gang He wrote on 02.06.2021 at 08:34 in message
Hi Ulrich,
The hang problem looks related to a fix (90bd070aae6c4fb5d302f9c4b9c88be60c8197ec
ocfs2: fix deadlock between setattr and dio_end_io_write), but it is not 100%
certain.
If possible, could you report a bug to SUSE, then we can work on it
further.
Thanks
Gang
Hi Ulrich,
On 2021/5/18 18:52, Ulrich Windl wrote:
Hi!
I thought using the reflink feature of OCFS2 would be just a nice way to make
crash-consistent VM snapshots while they are running.
As it is a bit tricky to find out how much data is shared between snapshots, I
started to write an
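[The archived message is cut off here. For reference, a minimal sketch of taking a
crash-consistent copy of a running VM image with the reflink utility from
ocfs2-tools; the paths and file names are only examples, not from the original mail:]
  # create a space-efficient, crash-consistent copy of a running VM image
  reflink /mnt/ocfs2/vm1.qcow2 /mnt/ocfs2/snapshots/vm1-20211001.qcow2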
On 2021/1/22 16:17, Ulrich Windl wrote:
Gang He wrote on 22.01.2021 at 09:13 in message
<1fd1c07d-d12c-fea9-4b17-90a977fe7...@suse.com>:
Hi Ulrich,
I reviewed the crm configuration file; I have some comments below.
1) The lvmlockd resource is used for the shared VG, if you do no
That means this order is probably wrong.
order ord_lockspace_fs__lvmlockd Mandatory: cln_lockspace_ocfs2 cln_lvmlock
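[For reference, a hedged sketch of the usual startup chain on a shared VG:
dlm first, then lvmlockd, then VG activation, then the OCFS2 file system.
The resource names below are hypothetical, not taken from the original configuration:]
order ord_dlm_lvmlockd Mandatory: cln_dlm cln_lvmlockd
order ord_lvmlockd_vg Mandatory: cln_lvmlockd cln_vg_activate
order ord_vg_fs Mandatory: cln_vg_activate cln_fs_ocfs2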
Thanks
Gang
On 2021/1/21 20:08, Ulrich Windl wrote:
Gang He wrote on 21.01.2021 at 11:30 in message
<59b543ee-0824-6b91-d0af-48f66922b...@suse.com>:
Hi Ulrich,
Hi Ulrich,
Is the problem reproducible stably? Could you share your
pacemaker crm configuration and OS/lvm2/resource-agents related version
information?
I feel the problem was probably caused by the lvmlockd resource agent script,
which did not handle this corner case correctly.
Thanks
Gang
Hi Ulrich,
Which Linux distribution/version do you use? Could you share the whole crm
configuration?
There is a crm configuration demo for your reference.
primitive dlm ocf:pacemaker:controld \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
op monitor interval=60 timeout=60
You can refer to the SUSE HA guide for how to set up the pacemaker/corosync
cluster stack.
Then add a dlm resource clone and an ocfs2 resource clone, as sketched below.
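[A minimal sketch of those two clones; the Filesystem primitive, device path
and timeouts below are placeholders, not from the original mail:]
primitive ocfs2-fs ocf:heartbeat:Filesystem \
    params device="/dev/disk/by-id/example-lun" directory="/mnt/ocfs2" fstype=ocfs2 \
    op monitor interval=20 timeout=40
clone cln_dlm dlm meta interleave=true
clone cln_ocfs2 ocfs2-fs meta interleave=true
order ord_dlm_ocfs2 Mandatory: cln_dlm cln_ocfs2
colocation col_ocfs2_dlm inf: cln_ocfs2 cln_dlm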
Thanks
Gang
Best regards,
On 23.09.2020 10:11, Gang He wrote:
Hello Michael,
The ocfs2:o2cb resource is provided by resource-agents on the SLES 11.x series.
For the newer SLES series (e.g. 12 or 15), there is no o2cb resource agent
in the resource-agents rpm, since this resource is not needed.
You can refer to the new SUSE HA guide for how to set up ocfs2 on pacemaker
Hello Strahil,
This kind of configuration is not recommended.
Why?
Because the SBD partition needs to be accessed by the cluster nodes stably/frequently,
but the other partition (for the XFS file system) is probably under extreme
I/O pressure conditions;
in that case, the SBD partition IO requests will
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] DLM in the cluster can tolerate more than one node
> failure at the same time?
>
> On 22/10/2019 07:15, Gang He wrote:
> > Hi List,
> >
> > I remember that master node has the full copy for one DLM lock
Hi Bob,
> -Original Message-
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Bob
> Peterson
> Sent: October 21, 2019 21:02
> To: Cluster Labs - All topics related to open-source clustering welcomed
>
> Subject: Re: [ClusterLabs] gfs2: fsid=:work.3: fatal: filesystem
>
Hello Lentes,
In a cluster environment, we usually need to fence (or dynamically add/delete)
nodes;
the full stack provided by pacemaker/corosync can help to complete this
automatically and integrally.
Thanks
Gang
From: Users on behalf of Lentes, Bernd
Sent:
> -Original Message-
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Lentes,
> Bernd
> Sent: October 14, 2019 20:04
> To: Pacemaker ML
> Subject: Re: [ClusterLabs] trace of Filesystem RA does not log
>
>
> >> -Original Message-
> >> From: Users
Hello Lentes,
> -Original Message-
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Lentes,
> Bernd
> Sent: October 11, 2019 22:32
> To: Pacemaker ML
> Subject: [ClusterLabs] trace of Filesystem RA does not log
>
> Hi,
>
> occasionally the stop of a Filesystem resource
Hello Ulrich,
Cluster MD belongs to the SLE HA extension product.
The related doc link is here, e.g.
https://documentation.suse.com/sle-ha/15-SP1/single-html/SLE-HA-guide/#cha-ha-cluster-md
Thanks
Gang
> -Original Message-
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of
>
> Regards,
>
>
> Indivar Nair
>
>
>
> On Tue, Jun 11, 2019 at 10:18 AM Gang He wrote:
>>
>> Hi Indivar,
>>
>> See my comments inline.
>>
>> >>> On 6/11/2019 at 12:10 pm, in message
>> , Indivar
Hi Indivar,
See my comments inline.
>>> On 6/11/2019 at 12:10 pm, in message
, Indivar
Nair wrote:
> Hello ...,
>
> I have an Active-Passive cluster with two nodes hosting an XFS
> Filesystem over a CLVM Volume.
>
> If a failover happens, the volume is mounted on the other node without
> a
Hello Guys,
As the subject says, I want to download the source code of libdlm to see its
git log changes.
libdlm is used to build dlm_controld, dlm_stonith, dlm_tool, etc.
Thanks
Gang
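[For what it's worth, a short sketch of fetching and inspecting it, assuming the
upstream userspace dlm tree (containing libdlm, dlm_controld and dlm_tool) is the
one hosted at pagure.io and that the libdlm sources live in a libdlm/ subdirectory:]
  git clone https://pagure.io/dlm.git
  cd dlm
  git log --oneline -- libdlm/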
Hello Ulrich,
Could you reproduce this issue reliably? If yes, please share your steps.
We also encountered a similar issue; it looks like cmirrord cannot join
the CPG (a corosync-related concept), then the resource times out and the node is
fenced.
Thanks
Gang
>>> On 2018/11/12 at 15:46, in
Hello Lentes,
>>> On 2018/9/11 at 20:50, in message
<584818902.7776848.1536670226935.javamail.zim...@helmholtz-muenchen.de>,
"Lentes, Bernd" wrote:
>
> - On Sep 11, 2018, at 4:29 AM, Gang He g...@suse.com wrote:
>
>> Hello Lentes,
>>
>>
Hello Lentes,
It does not look like an OCFS2 or pacemaker problem; it looks more like a virtualization
problem.
From the OCFS2/LVM2 perspective, if you use one LV per VirtualDomain, the
guest VM on that VirtualDomain cannot occupy the other LVs' storage
space.
If you use OCFS2 on one LV for
It looks like a bug, since the
tests hung and we have to reboot that node manually.
Thanks
Gang
>>>
> On Thu, Apr 12, 2018 at 09:31:49PM -0600, Gang He wrote:
>> During this period, could we allow tcp protocol work (rather than return
> error directly) under two-ring cluste
Hi Lentes,
>>>
>
> - On Mar 15, 2018, at 3:47 AM, Gang He g...@suse.com wrote:
>> Just one comments, you have to make sure the VM file integrity before
> calling
>> reflink.
>>
>
> Hi Gang,
>
> how could i achieve that ? sync ? The d
Hello Lentes,
>>>
> Hi,
>
> i have a 2-node-cluster with my services (web, db) running in VirtualDomain
> resources.
> I have a SAN with cLVM, each guest lies in a dedicated logical volume with
> an ext3 fs.
>
> Currently i'm thinking about snapshoting the guests to make a backup in the
>
normally under that situation.
Yan/Bin, do you have any comments about the two-node cluster? Which configuration
settings will affect corosync quorum/DLM?
Thanks
Gang
>
>
> --
> Regards,
> Muhammad Sharfuddin
>
> On 3/12/2018 10:59 AM, Gang He wrote:
>> Hello Muhammad,
, please watch where the mount.ocfs2 process is hung via the
"cat /proc/xxx/stack" command.
If the back trace stops in the DLM kernel module, usually the root cause is a
cluster configuration problem.
Thanks
Gang
>>>
> On 3/12/2018 7:32 AM, Gang He wrote:
>> Hello Muhammad
Hello Muhammad,
I think this problem is not in ocfs2; it looks like cluster quorum is
lost.
For a two-node cluster (not a three-node cluster), if one node is offline,
quorum will be lost by default.
So you should configure the two-node related quorum settings according to the
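[A minimal sketch of the usual two-node quorum settings in corosync.conf; the
values are illustrative, check your distribution's HA guide for details:]
quorum {
    provider: corosync_votequorum
    two_node: 1
    # two_node: 1 implicitly enables wait_for_all
}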
Hello Digimer,
>>>
> On 2018-03-08 12:10 PM, David Teigland wrote:
>>> I use active rrp_mode in corosync.conf and reboot the cluster to let the
> configuration effective.
>>> But, the about 5 mins hang in new_lockspace() function is still here.
>>
>> The last time I tested connection
Hello David,
If the sctp implementation does not fix this problem, is there any workaround for a
two-ring cluster?
Could we use the TCP protocol in DLM under a two-ring cluster to bypass the
connection channel switch issue?
Thanks
Gang
>>>
>> I use active rrp_mode in corosync.conf and reboot the
completed immediately.
Yes, the behavior does not follow the O_NONBLOCK flag; 5 minutes is too
long.
Thanks
Gang
Regards,
Ulrich
>>> "Gang He" <g...@suse.com> schrieb am 08.03.2018 um 10:48 in Nachricht
<5aa1776502f9000ad...@prv-mh.provo.novell.com>:
>
> /sle_ha/book_sleha/data/sec_ha_installation_terms.html
>
> That fixes I saw in 4.14.*
>
>> On 8 Mar 2018, at 09:12, Gang He <g...@suse.com> wrote:
>>
>> Hi Feldhost,
>>
>>
>>>>>
>>> Hello Gang He,
>>>
>>
Hi Feldhost,
>>>
> Hello Gang He,
>
> which type of corosync rrp_mode you use? Passive or Active?
clvm1:/etc/corosync # cat corosync.conf | grep rrp_mode
rrp_mode: passive
Did you try test both?
No, only this mode.
Also, what kernel version you use? I s
Hello list and David Teigland,
I got a problem under a two-ring cluster; the problem can be reproduced with
the steps below.
1) Set up a two-ring cluster with two nodes,
e.g.
clvm1(nodeid 172204569) addr_list eth0 10.67.162.25 eth1 192.168.152.240
clvm2(nodeid 172204570) addr_list eth0
Hello David,
If you want to use OCFS2 with the Pacemaker stack, you do not need ocfs2_controld
in the new version.
You do not need to configure an o2cb resource either.
I can give you a crm demo in a SLE12SP3 environment (actually there has not been any
change since SLE12SP1):
crm(live/tb-node1)configure# show
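[The archived message is cut off before the demo output; a minimal sketch of what
such a configuration typically looks like — resource names, device path and
timeouts are illustrative, not the original demo:]
primitive dlm ocf:pacemaker:controld \
    op monitor interval=60 timeout=60
primitive ocfs2-fs ocf:heartbeat:Filesystem \
    params device="/dev/disk/by-id/example-lun" directory="/mnt/ocfs2" fstype=ocfs2 \
    op monitor interval=20 timeout=40
group base-group dlm ocfs2-fs
clone base-clone base-group \
    meta interleave=true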
Hello Kyle,
From the ocfs2 code, ocfs2 supports the fstrim operation in a cluster
environment.
If a file system corruption was encountered, it should be a bug.
By the way, it is recommended to run the fstrim operation from only one node
in the cluster.
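[For example, run on a single cluster node; the mount point is an assumption:]
  fstrim -v /mnt/ocfs2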
Thanks
Gang
>>>
> Hello,
>
>
Hi Kristoffer,
The meeting looks very attractive.
Just one question: does the meeting have a website that archives the previous
topics/presentations/materials?
Thanks
Gang
>>>
> Hi everyone!
>
> The last time we had an HA summit was in 2015, and the intention then
> was to have SUSE