Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread 13605702...@163.com
hi Yan

you are right, the data didn't get lost. the gap was caused by the write stall.

thanks



13605702...@163.com
 
From: Yan, Zheng
Date: 2017-12-18 12:01
To: 13605702...@163.com
CC: John Spray; ceph-users
Subject: Re: Re: [ceph-users] cephfs miss data for 15s when master mds rebooting
On Mon, Dec 18, 2017 at 11:34 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi Yan
>
>> Mon Dec 18 03:07:47 UTC 2017  <-- reboot
>> Mon Dec 18 03:08:05 UTC 2017  <-- mds failover works
>
> this is caused by write stall
>
> but the data below got lost, is this normal?
 
your script never wrote the data below to the file. try this script:
 
while true; do date=$(date); echo "$date"; echo "$date" >> time.txt; sync; sleep 1; done
 
 
 
> Mon Dec 18 03:07:48 UTC 2017
> Mon Dec 18 03:07:49 UTC 2017
> Mon Dec 18 03:07:50 UTC 2017
> Mon Dec 18 03:07:51 UTC 2017
> Mon Dec 18 03:07:52 UTC 2017
> Mon Dec 18 03:07:53 UTC 2017
> Mon Dec 18 03:07:54 UTC 2017
> Mon Dec 18 03:07:55 UTC 2017
> Mon Dec 18 03:07:56 UTC 2017
> Mon Dec 18 03:07:57 UTC 2017
> Mon Dec 18 03:07:58 UTC 2017
> Mon Dec 18 03:07:59 UTC 2017
> Mon Dec 18 03:08:00 UTC 2017
> Mon Dec 18 03:08:01 UTC 2017
> Mon Dec 18 03:08:02 UTC 2017
> Mon Dec 18 03:08:03 UTC 2017
> Mon Dec 18 03:08:04 UTC 2017

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread David Turner
The lines might not be in the file, but did the thing writing to the file
say the write succeeded, or did it report a failure? I'm guessing the latter,
which means you should check that each write was successful rather than
assuming it was before continuing on.
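
A minimal sketch of what checking each write could look like in the shell. The helper name `append_checked` is made up for illustration and is not from the thread. Note that a write which only blocks for a while and then completes still returns success, so this catches real failures, not stalls.

```sh
#!/bin/sh
# Sketch only: append_checked is an illustrative helper name.

# Append LINE to FILE and return non-zero if the write did not succeed.
append_checked() {
    line=$1
    file=$2
    if ! echo "$line" >> "$file"; then
        echo "write to $file failed: $line" >&2
        return 1
    fi
    return 0
}
```

In a test loop this could be called as `append_checked "$(date)" /root/cephfs/time.txt || exit 1`, using the path from the test script in this thread.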

On Sun, Dec 17, 2017, 10:07 PM Wei Jin  wrote:

> On Fri, Dec 15, 2017 at 6:08 PM, John Spray  wrote:
> > On Fri, Dec 15, 2017 at 1:45 AM, 13605702...@163.com
> > <13605702...@163.com> wrote:
> >> hi
> >>
> >> i used 3 nodes to deploy mds (each node also has mon on it)
> >>
> >> my config:
> >> [mds.ceph-node-10-101-4-17]
> >> mds_standby_replay = true
> >> mds_standby_for_rank = 0
> >>
> >> [mds.ceph-node-10-101-4-21]
> >> mds_standby_replay = true
> >> mds_standby_for_rank = 0
> >>
> >> [mds.ceph-node-10-101-4-22]
> >> mds_standby_replay = true
> >> mds_standby_for_rank = 0
> >>
> >> the mds stat:
> >> e29: 1/1/1 up {0=ceph-node-10-101-4-22=up:active}, 1 up:standby-replay, 1 up:standby
> >>
> >> i mount the cephfs on the ceph client, and run the test script to write data
> >> into a file under the cephfs dir.
> >> when i reboot the master mds, i find that the data is not written into the file.
> >> after 15 seconds, data can be written into the file again
> >>
> >> so my questions are:
> >> is this normal when rebooting the master mds?
> >> when will the up:standby-replay mds take over the cephfs?
> >
> > The standby takes over after the active daemon has not reported to the
> > monitors for `mds_beacon_grace` seconds, which as you have noticed is
> > 15s by default.
> >
> > If you know you are rebooting something, you can pre-empt the timeout
> > mechanism by using "ceph mds fail" on the active daemon, to cause
> > another to take over right away.
>
> Why must a rebooting mds wait for the grace time?
> Would it be possible, or reasonable, for the daemon itself to tell the
> monitor that it is rebooting?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread Yan, Zheng
On Mon, Dec 18, 2017 at 11:34 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi Yan
>
>> Mon Dec 18 03:07:47 UTC 2017  <-- reboot
>> Mon Dec 18 03:08:05 UTC 2017  <-- mds failover works
>
> this is caused by write stall
>
> but the data below got lost, is this normal?

your script never wrote the data below to the file. try this script:

while true; do date=$(date); echo "$date"; echo "$date" >> time.txt; sync; sleep 1; done
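
A note on why the one-liner above helps: as far as this thread establishes, each loop iteration of the original script blocks while the MDS is unavailable, so the timestamps for the stalled seconds are never generated at all, rather than being written and then lost. The sketch below follows the same idea and also reports how long each append blocked. The function name `log_with_stall_report` and its arguments are made up for illustration.

```sh
#!/bin/sh
# Sketch only: log_with_stall_report and its arguments are illustrative names,
# not part of Ceph or of the scripts posted in this thread.

# Append one timestamp to FILE per iteration, COUNT times, pausing INTERVAL
# seconds between iterations (default 1), and report slow appends on stderr.
log_with_stall_report() {
    file=$1
    count=$2
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        stamp=$(date)          # taken once, before the write can block
        start=$(date +%s)
        echo "$stamp"          # shown on the terminal before the write is attempted
        echo "$stamp" >> "$file"
        sync
        end=$(date +%s)
        blocked=$((end - start))
        if [ "$blocked" -gt 1 ]; then
            echo "write of '$stamp' blocked for ${blocked}s" >&2
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, `log_with_stall_report /root/cephfs/time.txt 60` would run for about a minute against the file used in this thread. A long "blocked for" message with no missing terminal output would point to a stall rather than lost writes.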



> Mon Dec 18 03:07:48 UTC 2017
> Mon Dec 18 03:07:49 UTC 2017
> Mon Dec 18 03:07:50 UTC 2017
> Mon Dec 18 03:07:51 UTC 2017
> Mon Dec 18 03:07:52 UTC 2017
> Mon Dec 18 03:07:53 UTC 2017
> Mon Dec 18 03:07:54 UTC 2017
> Mon Dec 18 03:07:55 UTC 2017
> Mon Dec 18 03:07:56 UTC 2017
> Mon Dec 18 03:07:57 UTC 2017
> Mon Dec 18 03:07:58 UTC 2017
> Mon Dec 18 03:07:59 UTC 2017
> Mon Dec 18 03:08:00 UTC 2017
> Mon Dec 18 03:08:01 UTC 2017
> Mon Dec 18 03:08:02 UTC 2017
> Mon Dec 18 03:08:03 UTC 2017
> Mon Dec 18 03:08:04 UTC 2017

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread 13605702...@163.com
hi Yan

> Mon Dec 18 03:07:47 UTC 2017  <-- reboot
> Mon Dec 18 03:08:05 UTC 2017  <-- mds failover works
 
this is caused by write stall

but the data below got lost, is this normal?
Mon Dec 18 03:07:48 UTC 2017
Mon Dec 18 03:07:49 UTC 2017
Mon Dec 18 03:07:50 UTC 2017
Mon Dec 18 03:07:51 UTC 2017
Mon Dec 18 03:07:52 UTC 2017
Mon Dec 18 03:07:53 UTC 2017
Mon Dec 18 03:07:54 UTC 2017
Mon Dec 18 03:07:55 UTC 2017
Mon Dec 18 03:07:56 UTC 2017
Mon Dec 18 03:07:57 UTC 2017
Mon Dec 18 03:07:58 UTC 2017
Mon Dec 18 03:07:59 UTC 2017
Mon Dec 18 03:08:00 UTC 2017
Mon Dec 18 03:08:01 UTC 2017
Mon Dec 18 03:08:02 UTC 2017
Mon Dec 18 03:08:03 UTC 2017
Mon Dec 18 03:08:04 UTC 2017



13605702...@163.com
 
From: Yan, Zheng
Date: 2017-12-18 11:27
To: 13605702...@163.com
CC: John Spray; ceph-users
Subject: Re: Re: [ceph-users] cephfs miss data for 15s when master mds rebooting
On Mon, Dec 18, 2017 at 11:11 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi Yan
>
> my test script:
>
> #!/bin/sh
>
> rm -f /root/cephfs/time.txt
>
> while true
> do
> echo `date` >> /root/cephfs/time.txt
> sync
> sleep 1
> done
>
> i ran this script and then rebooted the master mds
>
> from the file /root/cephfs/time.txt, i can see that more than 15 lines
> got lost:
> Mon Dec 18 03:07:43 UTC 2017
> Mon Dec 18 03:07:44 UTC 2017
> Mon Dec 18 03:07:45 UTC 2017
> Mon Dec 18 03:07:47 UTC 2017  <-- reboot
> Mon Dec 18 03:08:05 UTC 2017  <-- mds failover works
 
this is caused by write stall
 
> Mon Dec 18 03:08:06 UTC 2017
> Mon Dec 18 03:08:07 UTC 2017
> Mon Dec 18 03:08:08 UTC 2017
> Mon Dec 18 03:08:09 UTC 2017
> Mon Dec 18 03:08:10 UTC 2017

Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread Yan, Zheng
On Mon, Dec 18, 2017 at 11:11 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi Yan
>
> my test script:
>
> #!/bin/sh
>
> rm -f /root/cephfs/time.txt
>
> while true
> do
> echo `date` >> /root/cephfs/time.txt
> sync
> sleep 1
> done
>
> i ran this script and then rebooted the master mds
>
> from the file /root/cephfs/time.txt, i can see that more than 15 lines
> got lost:
> Mon Dec 18 03:07:43 UTC 2017
> Mon Dec 18 03:07:44 UTC 2017
> Mon Dec 18 03:07:45 UTC 2017
> Mon Dec 18 03:07:47 UTC 2017  <-- reboot
> Mon Dec 18 03:08:05 UTC 2017  <-- mds failover works

this is caused by write stall

> Mon Dec 18 03:08:06 UTC 2017
> Mon Dec 18 03:08:07 UTC 2017
> Mon Dec 18 03:08:08 UTC 2017
> Mon Dec 18 03:08:09 UTC 2017
> Mon Dec 18 03:08:10 UTC 2017
>
> 
> 13605702...@163.com
>


Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread 13605702...@163.com
hi Yan

my test script: 

#!/bin/sh

rm -f /root/cephfs/time.txt

while true
do
echo `date` >> /root/cephfs/time.txt
sync
sleep 1
done

i ran this script and then rebooted the master mds

from the file /root/cephfs/time.txt, i can see that more than 15 lines got lost:
Mon Dec 18 03:07:43 UTC 2017
Mon Dec 18 03:07:44 UTC 2017
Mon Dec 18 03:07:45 UTC 2017
Mon Dec 18 03:07:47 UTC 2017  <-- reboot
Mon Dec 18 03:08:05 UTC 2017  <-- mds failover works
Mon Dec 18 03:08:06 UTC 2017
Mon Dec 18 03:08:07 UTC 2017
Mon Dec 18 03:08:08 UTC 2017
Mon Dec 18 03:08:09 UTC 2017
Mon Dec 18 03:08:10 UTC 2017
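
As a rough check on the listing above, the gap between the two annotated lines can be computed directly. The helper name `hms_to_seconds` is made up for illustration. The result is in line with the 15 s `mds_beacon_grace` mentioned elsewhere in this thread plus some takeover time, although that split is an assumption and was not measured here.

```sh
#!/bin/sh
# Sketch only: hms_to_seconds is an illustrative helper name.

# Convert a two-digit HH:MM:SS string to seconds since midnight.
hms_to_seconds() {
    h=${1%%:*}
    rest=${1#*:}
    m=${rest%%:*}
    s=${rest#*:}
    # Strip one leading zero so values such as 08 are not parsed as octal.
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

reboot=$(hms_to_seconds 03:07:47)     # line marked "reboot"
failover=$(hms_to_seconds 03:08:05)   # line marked "mds failover works"
echo "gap: $((failover - reboot))s"   # prints "gap: 18s"
```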



13605702...@163.com
 
From: Yan, Zheng
Date: 2017-12-18 10:59
To: 13605702...@163.com
CC: John Spray; ceph-users
Subject: Re: Re: [ceph-users] cephfs miss data for 15s when master mds rebooting
On Mon, Dec 18, 2017 at 10:10 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi Yan
>
> 1. run "ceph mds fail" before rebooting host
> 2. host reboot by itself for some reason
>
> you mean no data gets lost in BOTH conditions?
>
> in my test, i echo the date string once per second into a file under the
> cephfs dir.
> when i reboot the master mds, 15 lines get lost.
>

what do you mean by 15 lines got lost? are you sure it's not caused by the write stall?
 
 


Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread Wei Jin
On Fri, Dec 15, 2017 at 6:08 PM, John Spray  wrote:
> On Fri, Dec 15, 2017 at 1:45 AM, 13605702...@163.com
> <13605702...@163.com> wrote:
>> hi
>>
>> i used 3 nodes to deploy mds (each node also has mon on it)
>>
>> my config:
>> [mds.ceph-node-10-101-4-17]
>> mds_standby_replay = true
>> mds_standby_for_rank = 0
>>
>> [mds.ceph-node-10-101-4-21]
>> mds_standby_replay = true
>> mds_standby_for_rank = 0
>>
>> [mds.ceph-node-10-101-4-22]
>> mds_standby_replay = true
>> mds_standby_for_rank = 0
>>
>> the mds stat:
>> e29: 1/1/1 up {0=ceph-node-10-101-4-22=up:active}, 1 up:standby-replay, 1 up:standby
>>
>> i mount the cephfs on the ceph client, and run the test script to write data
>> into a file under the cephfs dir.
>> when i reboot the master mds, i find that the data is not written into the file.
>> after 15 seconds, data can be written into the file again
>>
>> so my questions are:
>> is this normal when rebooting the master mds?
>> when will the up:standby-replay mds take over the cephfs?
>
> The standby takes over after the active daemon has not reported to the
> monitors for `mds_beacon_grace` seconds, which as you have noticed is
> 15s by default.
>
> If you know you are rebooting something, you can pre-empt the timeout
> mechanism by using "ceph mds fail" on the active daemon, to cause
> another to take over right away.

Why must a rebooting mds wait for the grace time?
Would it be possible, or reasonable, for the daemon itself to tell the
monitor that it is rebooting?
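
For a planned reboot, the pre-emptive step John describes could be scripted roughly as follows. This is a sketch under assumptions: the wrapper name `fail_active_mds` and the `CEPH` variable are made up for illustration, and it assumes the daemon name shown in the `mds stat` output is accepted as the argument to `ceph mds fail`. It does not cover the unplanned case asked about above.

```sh
#!/bin/sh
# Sketch only: fail_active_mds and the CEPH variable are illustrative;
# "ceph mds fail" is the command named in this thread.

CEPH=${CEPH:-ceph}

# Ask the monitors to mark the given MDS as failed so that a standby can
# take over without waiting for the beacon grace period.
fail_active_mds() {
    mds_name=$1
    "$CEPH" mds fail "$mds_name"
}

# Example for a planned reboot of the host running the active MDS:
#   fail_active_mds ceph-node-10-101-4-22 && reboot
```

Setting `CEPH=echo` before calling the function prints the command instead of running it, which is a cheap way to dry-run the step.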


Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread Yan, Zheng
On Mon, Dec 18, 2017 at 10:10 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi Yan
>
> 1. run "ceph mds fail" before rebooting host
> 2. host reboot by itself for some reason
>
> you mean no data gets lost in BOTH conditions?
>
> in my test, i echo the date string once per second into a file under the
> cephfs dir.
> when i reboot the master mds, 15 lines get lost.
>

what do you mean by 15 lines got lost? are you sure it's not caused by the write stall?


> thanks
>
> 
> 13605702...@163.com
>
>
> From: Yan, Zheng
> Date: 2017-12-18 09:55
> To: 13605702...@163.com
> CC: John Spray; ceph-users
> Subject: Re: [ceph-users] cephfs miss data for 15s when master mds rebooting
> On Mon, Dec 18, 2017 at 9:24 AM, 13605702...@163.com
> <13605702...@163.com> wrote:
>> hi John
>>
>> thanks for your answer.
>>
>> under normal conditions, i can run "ceph mds fail" before the reboot.
>> but if the host reboots by itself for some reason, i can do nothing!
>> if this happens, data must be lost.
>>
>> so, is there any other way to stop data from being lost?
>>
>
> no data gets lost in this condition. IO just stalls for a few seconds
>


Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread 13605702...@163.com
hi Yan

> was the cephfs client also on the rebooted host?

NO, the cephfs client is an independent vm



13605702...@163.com
 
From: Yan, Zheng
Date: 2017-12-18 10:36
To: 13605702...@163.com
CC: John Spray; ceph-users
Subject: Re: Re: [ceph-users] cephfs miss data for 15s when master mds rebooting
On Mon, Dec 18, 2017 at 10:10 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi Yan
>
> 1. run "ceph mds fail" before rebooting host
> 2. host reboot by itself for some reason
>
 
was the cephfs client also on the rebooted host?
 
> you mean no data gets lost in BOTH conditions?
>
> in my test, i echo the date string once per second into a file under the
> cephfs dir.
> when i reboot the master mds, 15 lines get lost.
>
 
 
 


Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread Yan, Zheng
On Mon, Dec 18, 2017 at 10:10 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi Yan
>
> 1. run "ceph mds fail" before rebooting host
> 2. host reboot by itself for some reason
>

was the cephfs client also on the rebooted host?

> you mean no data gets lost in BOTH conditions?
>
> in my test, i echo the date string once per second into a file under the
> cephfs dir.
> when i reboot the master mds, 15 lines get lost.
>



> thanks
>
> 
> 13605702...@163.com
>
>
> From: Yan, Zheng
> Date: 2017-12-18 09:55
> To: 13605702...@163.com
> CC: John Spray; ceph-users
> Subject: Re: [ceph-users] cephfs miss data for 15s when master mds rebooting
> On Mon, Dec 18, 2017 at 9:24 AM, 13605702...@163.com
> <13605702...@163.com> wrote:
>> hi John
>>
>> thanks for your answer.
>>
>> under normal conditions, i can run "ceph mds fail" before the reboot.
>> but if the host reboots by itself for some reason, i can do nothing!
>> if this happens, data must be lost.
>>
>> so, is there any other way to stop data from being lost?
>>
>
> No data gets lost in this condition; IO just stalls for a few seconds.
>
>> thanks
>>
>> ________________
>> 13605702...@163.com
>>
>>
>> From: John Spray
>> Date: 2017-12-15 18:08
>> To: 13605702...@163.com
>> CC: ceph-users
>> Subject: Re: [ceph-users] cephfs miss data for 15s when master mds
>> rebooting
>> On Fri, Dec 15, 2017 at 1:45 AM, 13605702...@163.com
>> <13605702...@163.com> wrote:
>>> hi
>>>
>>> i used 3 nodes to deploy mds (each node also has mon on it)
>>>
>>> my config:
>>> [mds.ceph-node-10-101-4-17]
>>> mds_standby_replay = true
>>> mds_standby_for_rank = 0
>>>
>>> [mds.ceph-node-10-101-4-21]
>>> mds_standby_replay = true
>>> mds_standby_for_rank = 0
>>>
>>> [mds.ceph-node-10-101-4-22]
>>> mds_standby_replay = true
>>> mds_standby_for_rank = 0
>>>
>>> the mds stat:
>>> e29: 1/1/1 up {0=ceph-node-10-101-4-22=up:active}, 1 up:standby-replay, 1
>>> up:standby
>>>
>>> i mount the cephfs on the ceph client, and run the test script to write
>>> data into a file under the cephfs dir.
>>> when i reboot the master mds, the data is not written into the file.
>>> after 15 seconds, data can be written into the file again
>>>
>>> so my question is:
>>> is this normal when rebooting the master mds?
>>> when will the up:standby-replay mds take over the cephfs?
>>
>> The standby takes over after the active daemon has not reported to the
>> monitors for `mds_beacon_grace` seconds, which as you have noticed is
>> 15s by default.
>>
>> If you know you are rebooting something, you can pre-empt the timeout
>> mechanism by using "ceph mds fail" on the active daemon, to cause
>> another to take over right away.
>>
>> John
>>
>>> thanks
>>>
>>> 
>>> 13605702...@163.com
>>>


Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread 13605702...@163.com
hi Yan

1. run "ceph mds fail" before rebooting host
2. host reboots by itself for some reason

do you mean that no data gets lost in BOTH conditions?

in my test, i echo the date string once per second into a file under the cephfs dir.
when i reboot the master mds, 15 lines get lost.

thanks



13605702...@163.com
 
From: Yan, Zheng
Date: 2017-12-18 09:55
To: 13605702...@163.com
CC: John Spray; ceph-users
Subject: Re: [ceph-users] cephfs miss data for 15s when master mds rebooting
On Mon, Dec 18, 2017 at 9:24 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi John
>
> thanks for your answer.
>
> under normal conditions, i can run "ceph mds fail" before the reboot.
> but if the host reboots by itself for some reason, i can do nothing!
> if this happens, data must be lost.
>
> so, is there any other way to stop data from being lost?
>
 
No data gets lost in this condition; IO just stalls for a few seconds.
 
> thanks
>
> 
> 13605702...@163.com
>
>
> From: John Spray
> Date: 2017-12-15 18:08
> To: 13605702...@163.com
> CC: ceph-users
> Subject: Re: [ceph-users] cephfs miss data for 15s when master mds rebooting
> On Fri, Dec 15, 2017 at 1:45 AM, 13605702...@163.com
> <13605702...@163.com> wrote:
>> hi
>>
>> i used 3 nodes to deploy mds (each node also has mon on it)
>>
>> my config:
>> [mds.ceph-node-10-101-4-17]
>> mds_standby_replay = true
>> mds_standby_for_rank = 0
>>
>> [mds.ceph-node-10-101-4-21]
>> mds_standby_replay = true
>> mds_standby_for_rank = 0
>>
>> [mds.ceph-node-10-101-4-22]
>> mds_standby_replay = true
>> mds_standby_for_rank = 0
>>
>> the mds stat:
>> e29: 1/1/1 up {0=ceph-node-10-101-4-22=up:active}, 1 up:standby-replay, 1
>> up:standby
>>
>> i mount the cephfs on the ceph client, and run the test script to write
>> data into a file under the cephfs dir.
>> when i reboot the master mds, the data is not written into the file.
>> after 15 seconds, data can be written into the file again
>>
>> so my question is:
>> is this normal when rebooting the master mds?
>> when will the up:standby-replay mds take over the cephfs?
>
> The standby takes over after the active daemon has not reported to the
> monitors for `mds_beacon_grace` seconds, which as you have noticed is
> 15s by default.
>
> If you know you are rebooting something, you can pre-empt the timeout
> mechanism by using "ceph mds fail" on the active daemon, to cause
> another to take over right away.
>
> John
>
>> thanks
>>
>> 
>> 13605702...@163.com
>>


Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread Yan, Zheng
On Mon, Dec 18, 2017 at 9:24 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi John
>
> thanks for your answer.
>
> under normal conditions, i can run "ceph mds fail" before the reboot.
> but if the host reboots by itself for some reason, i can do nothing!
> if this happens, data must be lost.
>
> so, is there any other way to stop data from being lost?
>

No data gets lost in this condition; IO just stalls for a few seconds.
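
One way to tell a stall from real loss is to log timestamps that are easy to compare and then look for gaps. The following is only a minimal sketch: it assumes `date +%s` is available (it is common but not strictly POSIX), and `write_ticks` and `report_gaps` are hypothetical helper names, not part of the original test script.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not from the original test script.

# write_ticks FILE COUNT
# Append one epoch timestamp per iteration and sync after each write.
# If the write or sync blocks (e.g. during MDS failover), the loop simply
# pauses, so no timestamps are generated for that interval.
write_ticks() {
    file="$1"
    count="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        date +%s >> "$file"
        sync
        sleep 1
        i=$((i + 1))
    done
}

# report_gaps FILE [MAX]
# Print every pair of consecutive timestamps more than MAX seconds apart
# (MAX defaults to 2).
report_gaps() {
    awk -v max="${2:-2}" '
        NR > 1 && $1 - prev > max {
            printf "gap of %ds after %d\n", $1 - prev, prev
        }
        { prev = $1 }
    ' "$1"
}
```

For example, run `write_ticks /root/cephfs/time.txt 60` while rebooting the active MDS, then `report_gaps /root/cephfs/time.txt`. A single gap with intact lines on both sides suggests the writer was blocked for that period, rather than that data which had been written was later discarded.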

> thanks
>
> 
> 13605702...@163.com
>
>
> From: John Spray
> Date: 2017-12-15 18:08
> To: 13605702...@163.com
> CC: ceph-users
> Subject: Re: [ceph-users] cephfs miss data for 15s when master mds rebooting
> On Fri, Dec 15, 2017 at 1:45 AM, 13605702...@163.com
> <13605702...@163.com> wrote:
>> hi
>>
>> i used 3 nodes to deploy mds (each node also has mon on it)
>>
>> my config:
>> [mds.ceph-node-10-101-4-17]
>> mds_standby_replay = true
>> mds_standby_for_rank = 0
>>
>> [mds.ceph-node-10-101-4-21]
>> mds_standby_replay = true
>> mds_standby_for_rank = 0
>>
>> [mds.ceph-node-10-101-4-22]
>> mds_standby_replay = true
>> mds_standby_for_rank = 0
>>
>> the mds stat:
>> e29: 1/1/1 up {0=ceph-node-10-101-4-22=up:active}, 1 up:standby-replay, 1
>> up:standby
>>
>> i mount the cephfs on the ceph client, and run the test script to write
>> data into a file under the cephfs dir.
>> when i reboot the master mds, the data is not written into the file.
>> after 15 seconds, data can be written into the file again
>>
>> so my question is:
>> is this normal when rebooting the master mds?
>> when will the up:standby-replay mds take over the cephfs?
>
> The standby takes over after the active daemon has not reported to the
> monitors for `mds_beacon_grace` seconds, which as you have noticed is
> 15s by default.
>
> If you know you are rebooting something, you can pre-empt the timeout
> mechanism by using "ceph mds fail" on the active daemon, to cause
> another to take over right away.
>
> John
>
>> thanks
>>
>> 
>> 13605702...@163.com
>>


Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-17 Thread 13605702...@163.com
hi John

thanks for your answer.

under normal conditions, i can run "ceph mds fail" before the reboot.
but if the host reboots by itself for some reason, i can do nothing!
if this happens, data must be lost.

so, is there any other way to stop data from being lost?

thanks



13605702...@163.com
 
From: John Spray
Date: 2017-12-15 18:08
To: 13605702...@163.com
CC: ceph-users
Subject: Re: [ceph-users] cephfs miss data for 15s when master mds rebooting
On Fri, Dec 15, 2017 at 1:45 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi
>
> i used 3 nodes to deploy mds (each node also has mon on it)
>
> my config:
> [mds.ceph-node-10-101-4-17]
> mds_standby_replay = true
> mds_standby_for_rank = 0
>
> [mds.ceph-node-10-101-4-21]
> mds_standby_replay = true
> mds_standby_for_rank = 0
>
> [mds.ceph-node-10-101-4-22]
> mds_standby_replay = true
> mds_standby_for_rank = 0
>
> the mds stat:
> e29: 1/1/1 up {0=ceph-node-10-101-4-22=up:active}, 1 up:standby-replay, 1
> up:standby
>
> i mount the cephfs on the ceph client, and run the test script to write data
> into a file under the cephfs dir.
> when i reboot the master mds, the data is not written into the file.
> after 15 seconds, data can be written into the file again
>
> so my question is:
> is this normal when rebooting the master mds?
> when will the up:standby-replay mds take over the cephfs?
 
The standby takes over after the active daemon has not reported to the
monitors for `mds_beacon_grace` seconds, which as you have noticed is
15s by default.
 
If you know you are rebooting something, you can pre-empt the timeout
mechanism by using "ceph mds fail" on the active daemon, to cause
another to take over right away.
 
John
 
> thanks
>
> 
> 13605702...@163.com
>


Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-15 Thread John Spray
On Fri, Dec 15, 2017 at 1:45 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi
>
> i used 3 nodes to deploy mds (each node also has mon on it)
>
> my config:
> [mds.ceph-node-10-101-4-17]
> mds_standby_replay = true
> mds_standby_for_rank = 0
>
> [mds.ceph-node-10-101-4-21]
> mds_standby_replay = true
> mds_standby_for_rank = 0
>
> [mds.ceph-node-10-101-4-22]
> mds_standby_replay = true
> mds_standby_for_rank = 0
>
> the mds stat:
> e29: 1/1/1 up {0=ceph-node-10-101-4-22=up:active}, 1 up:standby-replay, 1
> up:standby
>
> i mount the cephfs on the ceph client, and run the test script to write data
> into a file under the cephfs dir.
> when i reboot the master mds, the data is not written into the file.
> after 15 seconds, data can be written into the file again
>
> so my question is:
> is this normal when rebooting the master mds?
> when will the up:standby-replay mds take over the cephfs?

The standby takes over after the active daemon has not reported to the
monitors for `mds_beacon_grace` seconds, which as you have noticed is
15s by default.

If you know you are rebooting something, you can pre-empt the timeout
mechanism by using "ceph mds fail" on the active daemon, to cause
another to take over right away.
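
For a planned reboot, that sequence might look like the sketch below. It assumes the `ceph` CLI is on the PATH with admin credentials; `fail_mds_before_reboot` is a hypothetical helper name used only for illustration, and the daemon name in the usage note is the active MDS from the `mds stat` output quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper for illustration; assumes the ceph CLI and admin
# credentials are available on this host.

# fail_mds_before_reboot MDS_NAME
# Mark the named active MDS as failed so a standby takes over immediately,
# instead of waiting for the mds_beacon_grace timeout, then print the
# resulting MDS map summary so the takeover can be checked.
fail_mds_before_reboot() {
    mds_name="$1"
    if [ -z "$mds_name" ]; then
        echo "usage: fail_mds_before_reboot <mds-name>" >&2
        return 2
    fi
    ceph mds fail "$mds_name" || return 1
    ceph mds stat
}
```

Usage would be something like `fail_mds_before_reboot ceph-node-10-101-4-22`, followed by the reboot once `mds stat` shows another daemon as up:active for rank 0.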

John

> thanks
>
> 
> 13605702...@163.com
>


[ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-14 Thread 13605702...@163.com
hi

i used 3 nodes to deploy mds (each node also has mon on it)

my config:
[mds.ceph-node-10-101-4-17]
mds_standby_replay = true
mds_standby_for_rank = 0

[mds.ceph-node-10-101-4-21]
mds_standby_replay = true
mds_standby_for_rank = 0

[mds.ceph-node-10-101-4-22]
mds_standby_replay = true
mds_standby_for_rank = 0

the mds stat:
e29: 1/1/1 up {0=ceph-node-10-101-4-22=up:active}, 1 up:standby-replay, 1 
up:standby

i mount the cephfs on the ceph client, and run the test script to write data
into a file under the cephfs dir.
when i reboot the master mds, the data is not written into the file.
after 15 seconds, data can be written into the file again

so my question is:
is this normal when rebooting the master mds?
when will the up:standby-replay mds take over the cephfs?

thanks



13605702...@163.com