Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-08-30 Thread Gregory Farnum
Yes, this is a consequence of co-locating the MDS and monitors — if the MDS
reports to its co-located monitor and both fail, the monitor cluster has to
go through its own failure detection and then wait for a full MDS timeout
period after that before it marks the MDS down. :(

We might conceivably be able to optimize for this, but there's not a
general solution. If you need to co-locate, one thing that would make it
better without being a lot of work is trying to have the MDS connect to one
of the monitors on a different host. You can do that by just restricting
the list of monitors you feed it in ceph.conf, although that is not
guaranteed to *prevent* it from connecting to its own monitor if there are
failures or reconnects after first startup.
-Greg
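
As a concrete sketch of that ceph.conf restriction (illustrative only: the
addresses are the monitor IPs from the logs below, and it assumes for the sake
of the example that the MDS is co-located with the mon on dub-sitv-ceph-03, so
only the other two monitors are listed for it):

    [mds]
        # Hypothetical example: point the MDS only at monitors on other
        # hosts so its initial connection never goes to the co-located mon.
        # As noted above, this is not guaranteed to hold across reconnects
        # after a failure.
        mon host = 10.18.53.155:6789, 10.18.186.208:6789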

On Thu, Aug 30, 2018 at 8:38 AM William Lawton wrote:

> Hi.
>
>
>
> We have a 5 node Ceph cluster (refer to ceph -s output at bottom of
> email). During resiliency tests we have an occasional problem when we
> reboot the active MDS instance and a MON instance together, i.e.
> dub-sitv-ceph-02 and dub-sitv-ceph-04. We expect the MDS to fail over to
> the standby instance dub-sitv-ceph-01, which is in standby-replay mode, and
> 80% of the time it does with no problems. However, 20% of the time it
> doesn’t and the MDS_ALL_DOWN health check is not cleared until 30 seconds
> later when the rebooted dub-sitv-ceph-02 and dub-sitv-ceph-04 instances
> come back up.
>
>
>
> When the MDS successfully fails over to the standby we see in the ceph.log
> the following:
>
>
>
> 2018-08-25 00:30:02.231811 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 50 : cluster [ERR] Health check failed: 1 filesystem is offline
> (MDS_ALL_DOWN)
>
> 2018-08-25 00:30:02.237389 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 52 : cluster [INF] Standby daemon mds.dub-sitv-ceph-01 assigned to
> filesystem cephfs as rank 0
>
> 2018-08-25 00:30:02.237528 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 54 : cluster [INF] Health check cleared: MDS_ALL_DOWN (was: 1 filesystem is
> offline)
>
>
>
> When the active MDS role does not fail over to the standby, the MDS_ALL_DOWN
> check is not cleared until after the rebooted instances have come back up
> e.g.:
>
>
>
> 2018-08-25 03:30:02.936554 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 55 : cluster [ERR] Health check failed: 1 filesystem is offline
> (MDS_ALL_DOWN)
>
> 2018-08-25 03:30:04.235703 mon.dub-sitv-ceph-05 mon.2 10.18.186.208:6789/0
> 226 : cluster [INF] mon.dub-sitv-ceph-05 calling monitor election
>
> 2018-08-25 03:30:04.238672 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 56 : cluster [INF] mon.dub-sitv-ceph-03 calling monitor election
>
> 2018-08-25 03:30:09.242595 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 57 : cluster [INF] mon.dub-sitv-ceph-03 is new leader, mons
> dub-sitv-ceph-03,dub-sitv-ceph-05 in quorum (ranks 0,2)
>
> 2018-08-25 03:30:09.252804 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 62 : cluster [WRN] Health check failed: 1/3 mons down, quorum
> dub-sitv-ceph-03,dub-sitv-ceph-05 (MON_DOWN)
>
> 2018-08-25 03:30:09.258693 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 63 : cluster [WRN] overall HEALTH_WARN 2 osds down; 2 hosts (2 osds) down;
> 1/3 mons down, quorum dub-sitv-ceph-03,dub-sitv-ceph-05
>
> 2018-08-25 03:30:10.254162 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 64 : cluster [WRN] Health check failed: Reduced data availability: 2 pgs
> inactive, 115 pgs peering (PG_AVAILABILITY)
>
> 2018-08-25 03:30:12.429145 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 66 : cluster [WRN] Health check failed: Degraded data redundancy: 712/2504
> objects degraded (28.435%), 86 pgs degraded (PG_DEGRADED)
>
> 2018-08-25 03:30:16.137408 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 67 : cluster [WRN] Health check update: Reduced data availability: 1 pg
> inactive, 69 pgs peering (PG_AVAILABILITY)
>
> 2018-08-25 03:30:17.193322 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 68 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data
> availability: 1 pg inactive, 69 pgs peering)
>
> 2018-08-25 03:30:18.432043 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 69 : cluster [WRN] Health check update: Degraded data redundancy: 1286/2572
> objects degraded (50.000%), 166 pgs degraded (PG_DEGRADED)
>
> 2018-08-25 03:30:26.139491 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 71 : cluster [WRN] Health check update: Degraded data redundancy: 1292/2584
> objects degraded (50.000%), 166 pgs degraded (PG_DEGRADED)
>
> 2018-08-25 03:30:31.355321 mon.dub-sitv-ceph-04 mon.1 10.18.53.155:6789/0
> 1 : cluster [INF] mon.dub-sitv-ceph-04 calling monitor election
>
> 2018-08-25 03:30:31.371519 mon.dub-sitv-ceph-04 mon.1 10.18.53.155:6789/0
> 2 : cluster [WRN] message from mon.0 was stamped 0.817433s in the future,
> clocks not synchronized
>
> 2018-08-25 03:30:32.175677 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0
> 72 : cluster [INF] mon.dub-sitv-ceph-03 calling monitor election
>
> 2018-08-25 03:30:32.17

Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-08-30 Thread William Lawton
Thanks for the response, Greg. We did originally have co-located MDS and mon 
daemons but realised this wasn't a good idea early on and separated them out 
onto different hosts. So our MDS daemons are on ceph-01 and ceph-02, and our 
mons are on ceph-03, 04 and 05. Unfortunately we see this issue occurring when 
we reboot ceph-02 (MDS) and ceph-04 (mon) together. We expect ceph-01 to 
become the active MDS but often it doesn't.
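
A quick way to confirm which daemon currently holds rank 0 and which one is in 
standby-replay after each reboot test is something like:

    # illustrative; the filesystem name "cephfs" is taken from the log
    # lines earlier in this thread
    ceph fs status cephfs
    ceph mds stat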

Sent from my iPhone


Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-08-30 Thread Gregory Farnum
Okay, well that will be the same reason then. If the active MDS is
connected to a monitor and they fail at the same time, the monitors can’t
replace the MDS until they’ve been through their own election and a full
MDS timeout window.
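
For reference, mapping that onto the 03:30 log quoted earlier in the thread,
the election alone accounts for the first several seconds, and the MDS timeout
window only runs after it:

    03:30:02  MDS_ALL_DOWN raised
    03:30:04  surviving mons call an election
    03:30:09  new quorum formed (dub-sitv-ceph-03, dub-sitv-ceph-05); only
              now can the full MDS timeout window start to elapse
    03:30:31  the rebooted nodes are already returning, before any standby
              promotion has been logged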

Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-08-30 Thread William Lawton
Oh, I see. We’d taken steps to reduce the risk of losing the active MDS and mon 
leader instances at the same time in the hope that it would prevent this issue. 
Do you know if the MDS always connects to a specific mon instance, i.e. the mon 
provider, and can it be determined which mon instance that is? Or is it ad hoc?

Sent from my iPhone


Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-08-30 Thread Gregory Farnum
On Thu, Aug 30, 2018 at 12:46 PM William Lawton wrote:

> Oh i see. We’d taken steps to reduce the risk of losing the active mds and
> mon leader instances at the same time in the hope that it would prevent
> this issue. Do you know if the mds always connects to a specific mon
> instance i.e. the mon provider and can it be determined which mon instance
> that is? Or is it adhoc?
>



On Thu, Aug 30, 2018 at 9:45 AM Gregory Farnum  wrote:

> If you need to co-locate, one thing that would make it better without
> being a lot of work is trying to have the MDS connect to one of the
> monitors on a different host. You can do that by just restricting the list
> of monitors you feed it in the ceph.conf, although it's not a guarantee
> that will *prevent* it from connecting to its own monitor if there are
> failures or reconnects after first startup.
>

:)



Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-09-01 Thread Bryan Henderson
> If the active MDS is connected to a monitor and they fail at the same time,
> the monitors can't replace the mds until they've been through their own
> election and a full mds timeout window.

So how long are we talking?

-- 
Bryan Henderson   San Jose, California
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-09-03 Thread William Lawton
Which configuration option determines the MDS timeout period?

William Lawton


Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-09-04 Thread John Spray
It's mds_beacon_grace. Set that on the monitor to control how quickly laggy
MDS daemons are replaced, and usually also set it to the same value on the
MDS daemon, where it is used to make the daemon hold off on certain tasks if
it hasn't seen a mon beacon recently.
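
As a sketch of how that typically looks in ceph.conf (the 15 second value shown
here is just the usual default, for illustration rather than as a
recommendation):

    [mon]
        # how long the monitors wait without a beacon before treating the
        # active MDS as laggy and promoting a standby
        mds beacon grace = 15

    [mds]
        # keep the daemon's own view of the grace period in sync with the mons
        mds beacon grace = 15

To check the value currently in effect, something like
"ceph daemon mon.dub-sitv-ceph-03 config get mds_beacon_grace" run on the
monitor host should show it.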

John