[ceph-users] Re: Cephfs IO halt on Node failure

2020-05-25 Thread Yoann Moulin

Hello,

> Sorry for the late reply.
> I have pasted crush map in below url : https://pastebin.com/ASPpY2VB
> and this my osd tree output and this issue are only when i use it with
> filelayout.

Could you send the output of "ceph osd pool ls detail", please?

Yoann


[ceph-users] Re: Cephfs IO halt on Node failure

2020-05-25 Thread Eugen Block


Hi,

you didn't really clear things up, so I'll just summarize what I
understood so far. Please also share 'ceph osd pool ls detail' and
'ceph fs status'.


One of the pools is configured with min_size 2 and size 2. This will
pause IO if one node goes down, because that node very likely holds
PGs from that pool. IO will resume as soon as those PGs are recovered
on the remaining nodes. To serve IO despite the failure, you can set
min_size to 1 for that pool.
If you take down the node with the active MDS, the failover can of
course take some time.
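
As a rough illustration of the commands mentioned above, the following Python sketch only builds and prints the command lines; it does not contact a cluster. The helper names and the pool name "cephfs_data_rep2" are hypothetical and do not come from this thread. Note that min_size 1 on a size 2 pool trades safety for availability.

```python
import shlex
from typing import List


def diagnostic_cmds() -> List[List[str]]:
    """Commands whose output was requested in this thread."""
    return [
        ["ceph", "osd", "pool", "ls", "detail"],
        ["ceph", "fs", "status"],
    ]


def set_min_size_cmd(pool: str, min_size: int) -> List[str]:
    """Build the 'ceph osd pool set <pool> min_size <n>' argument list."""
    if not pool:
        raise ValueError("pool name must not be empty")
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    return ["ceph", "osd", "pool", "set", pool, "min_size", str(min_size)]


# Dry run: print the commands instead of executing them.
# "cephfs_data_rep2" is a made-up pool name used for illustration only.
for cmd in diagnostic_cmds() + [set_min_size_cmd("cephfs_data_rep2", 1)]:
    print(shlex.join(cmd))
```

To actually run them on a node with an admin keyring, each list could be passed to subprocess.run(cmd, check=True).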





[ceph-users] Re: Cephfs IO halt on Node failure

2020-05-24 Thread Amudhan P

Sorry for the late reply.
I have pasted the CRUSH map here: https://pastebin.com/ASPpY2VB
My OSD tree output is below. The issue occurs only when I use file
layouts.

ID  CLASS  WEIGHT     TYPE NAME           STATUS  REWEIGHT  PRI-AFF
-1         327.48047  root default
-3         109.16016      host strgsrv01
 0  hdd      5.45799          osd.0       up      1.0       1.0
 2  hdd      5.45799          osd.2       up      1.0       1.0
 3  hdd      5.45799          osd.3       up      1.0       1.0
 4  hdd      5.45799          osd.4       up      1.0       1.0
 5  hdd      5.45799          osd.5       up      1.0       1.0
 6  hdd      5.45799          osd.6       up      1.0       1.0
 7  hdd      5.45799          osd.7       up      1.0       1.0
19  hdd      5.45799          osd.19      up      1.0       1.0
20  hdd      5.45799          osd.20      up      1.0       1.0
21  hdd      5.45799          osd.21      up      1.0       1.0
22  hdd      5.45799          osd.22      up      1.0       1.0
23  hdd      5.45799          osd.23      up      1.0       1.0
-5         109.16016      host strgsrv02
 1  hdd      5.45799          osd.1       up      1.0       1.0
 8  hdd      5.45799          osd.8       up      1.0       1.0
 9  hdd      5.45799          osd.9       up      1.0       1.0
10  hdd      5.45799          osd.10      up      1.0       1.0
11  hdd      5.45799          osd.11      up      1.0       1.0
12  hdd      5.45799          osd.12      up      1.0       1.0
24  hdd      5.45799          osd.24      up      1.0       1.0
25  hdd      5.45799          osd.25      up      1.0       1.0
26  hdd      5.45799          osd.26      up      1.0       1.0
27  hdd      5.45799          osd.27      up      1.0       1.0
28  hdd      5.45799          osd.28      up      1.0       1.0
29  hdd      5.45799          osd.29      up      1.0       1.0
-7         109.16016      host strgsrv03
13  hdd      5.45799          osd.13      up      1.0       1.0
14  hdd      5.45799          osd.14      up      1.0       1.0
15  hdd      5.45799          osd.15      up      1.0       1.0
16  hdd      5.45799          osd.16      up      1.0       1.0
17  hdd      5.45799          osd.17      up      1.0       1.0
18  hdd      5.45799          osd.18      up      1.0       1.0
30  hdd      5.45799          osd.30      up      1.0       1.0
31  hdd      5.45799          osd.31      up      1.0       1.0
32  hdd      5.45799          osd.32      up      1.0       1.0
33  hdd      5.45799          osd.33      up      1.0       1.0
34  hdd      5.45799          osd.34      up      1.0       1.0
35  hdd      5.45799          osd.35      up      1.0       1.0


[ceph-users] Re: Cephfs IO halt on Node failure

2020-05-18 Thread Eugen Block

Was that a typo, and did you mean you changed min_size to 1? An I/O
pause with min_size 1 and size 2 is unexpected. Can you share more
details, such as your CRUSH map and your OSD tree?



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Cephfs IO halt on Node failure

2020-05-18 Thread Amudhan P

The behaviour is the same even after setting min_size to 2.


[ceph-users] Re: Cephfs IO halt on Node failure

2020-05-18 Thread Eugen Block

If your pool has min_size 2 and size 2 (always a bad idea), it will
pause IO in case of a failure until the recovery has finished. So the
described behaviour is expected.
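
The rule above can be sketched in a few lines of Python. This is a simplified model and not Ceph's actual peering logic: it assumes a host-level failure domain, so a single node failure costs a PG at most one copy, and the function name is made up for illustration.

```python
def pg_serves_io(size: int, min_size: int, copies_lost: int) -> bool:
    """Return True if a PG still has at least min_size copies available.

    size        -- number of copies the pool keeps of each object
    min_size    -- minimum number of available copies required to serve IO
    copies_lost -- copies of this PG that sat on the failed node(s)
    """
    if not 1 <= min_size <= size:
        raise ValueError("min_size must be between 1 and size")
    if not 0 <= copies_lost <= size:
        raise ValueError("copies_lost must be between 0 and size")
    return size - copies_lost >= min_size


# One node down, one copy lost per affected PG (assumed failure domain: host).
for size, min_size in [(2, 2), (2, 1), (3, 2)]:
    ok = pg_serves_io(size, min_size, copies_lost=1)
    print(f"size={size} min_size={min_size}: {'IO continues' if ok else 'IO pauses'}")
```

In this model a size 2 / min_size 2 pool stops serving the affected PGs as soon as one copy is missing, while size 2 / min_size 1 and size 3 / min_size 2 keep going.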


[ceph-users] Re: Cephfs IO halt on Node failure

2020-05-17 Thread Amudhan P

Hi,

The CRUSH rule is "replicated" and min_size is actually 2. I am trying to
test multiple volume configurations in a single filesystem using file
layouts.

I have created a metadata pool with rep 3 (min_size 2, replicated CRUSH
rule) and a data pool with rep 3 (min_size 2, replicated CRUSH rule). I
have also created several more pools (replica 2, ec2-1 and ec4-2) and added
them to the filesystem.

Using file layouts, I have assigned a different data pool to each folder so
that I can test different configurations in the same filesystem. The
min_size of every data pool is set to handle a single node failure.
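
For readers unfamiliar with file layouts, here is a minimal Python sketch of one common way to pin directories to data pools: setting the CephFS virtual extended attribute ceph.dir.layout.pool on each directory. The thread does not show how the layouts were actually set, and the function name, mount paths and pool names below are hypothetical. The example is a dry run that records the calls instead of touching a filesystem.

```python
import os
from typing import Callable, Dict, List, Optional, Tuple

# CephFS virtual xattr that selects the data pool for new files in a directory.
LAYOUT_POOL_XATTR = "ceph.dir.layout.pool"


def assign_data_pools(
    dir_to_pool: Dict[str, str],
    setxattr: Optional[Callable[[str, str, bytes], None]] = None,
) -> None:
    """Set the layout pool xattr on each directory.

    'setxattr' defaults to os.setxattr (Linux only); pass a stand-in to dry-run.
    """
    if setxattr is None:
        setxattr = os.setxattr
    for directory, pool in dir_to_pool.items():
        setxattr(directory, LAYOUT_POOL_XATTR, pool.encode())


# Hypothetical mapping of test folders to pools (names are illustrative only).
mapping = {
    "/mnt/cephfs/rep2": "cephfs_data_rep2",
    "/mnt/cephfs/ec21": "cephfs_data_ec21",
}

# Dry run: collect the calls that would be made.
calls: List[Tuple[str, str, bytes]] = []
assign_data_pools(mapping, setxattr=lambda p, a, v: calls.append((p, a, v)))
for path, attr, value in calls:
    print(f"{path}: {attr} = {value.decode()}")
```

Calling assign_data_pools(mapping) without the stand-in would apply the layouts on a mounted CephFS. A layout only affects files created after it is set, and the pool must already be attached to the filesystem as a data pool.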

A single node failure is handled properly when the filesystem has only the
metadata pool and one data pool (rep3).

After adding the additional data pools to the filesystem, the single node
failure scenario no longer works.

regards
Amudhan P


[ceph-users] Re: Cephfs IO halt on Node failure

2020-05-16 Thread Eugen Block


What’s your pool configuration wrt min_size and crush rules?


Quoting Amudhan P:


Hi,

I am using a Ceph Nautilus cluster with the configuration below.

3 nodes (Ubuntu 18.04), each with 12 OSDs; MDS, MON and MGR are running
in shared mode.

The client is mounted through the Ceph kernel client.

I was trying to emulate a node failure while a write and a read were in
progress on the (replica2) pool.

I expected reads and writes to continue after a short pause following the
node failure, but IO halts and never resumes until the failed node is back
up.

I remember testing the same scenario on Ceph Mimic, where IO continued
after a short pause.

regards
Amudhan P
