Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO [EXT]

2019-05-14 Thread Tarek Zegar

https://github.com/ceph/ceph-ansible/issues/3961   <--- created ticket

Thanks
Tarek



From:   Matthew Vernon 
To: Tarek Zegar , solarflo...@gmail.com
Cc: ceph-users@lists.ceph.com
Date:   05/14/2019 04:41 AM
Subject: [EXTERNAL] Re: [ceph-users] Rolling upgrade fails with flag
    norebalance with background IO [EXT]



On 14/05/2019 00:36, Tarek Zegar wrote:
> It's not just mimic to nautilus
> I confirmed with luminous to mimic
>
> They check for clean PGs while the flags are still set. They should unset
> the flags, then check, then set the flags again and move on to the next OSD.

I think I'm inclined to agree that "norebalance" is likely to get in the
way when upgrading a cluster - our rolling upgrade playbook omits it.

OTOH, you might want to raise this on the ceph-ansible list (
ceph-ansi...@lists.ceph.com ) and/or as a github issue - I don't think
the ceph-ansible maintainers routinely watch this list.

HTH,

Matthew


--
 The Wellcome Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO [EXT]

2019-05-14 Thread Matthew Vernon
On 14/05/2019 00:36, Tarek Zegar wrote:
> It's not just mimic to nautilus
> I confirmed with luminous to mimic
>  
> They check for clean PGs while the flags are still set. They should unset
> the flags, then check, then set the flags again and move on to the next OSD.

I think I'm inclined to agree that "norebalance" is likely to get in the
way when upgrading a cluster - our rolling upgrade playbook omits it.

OTOH, you might want to raise this on the ceph-ansible list (
ceph-ansi...@lists.ceph.com ) and/or as a github issue - I don't think
the ceph-ansible maintainers routinely watch this list.

HTH,

Matthew




Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO

2019-05-13 Thread Tarek Zegar
It's not just mimic to nautilus
I confirmed with luminous to mimic
 
They check for clean PGs while the flags are still set. They should unset the flags, then check, then set the flags again and move on to the next OSD.
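
The ordering proposed above could be sketched roughly as follows, in Python. This is only an illustration, not what ceph-ansible actually does: the helper names (run_ceph, all_pgs_clean, wait_for_clean_with_flag_lifted), the retry count and the delay are made up for this example. It assumes that `ceph -s --format json` reports `pgmap.num_pgs` and a `pgmap.pgs_by_state` list of `state_name` / `count` entries, and that the flag is toggled with `ceph osd set` / `ceph osd unset`.

```python
import json
import subprocess
import time
from typing import Callable, List


def run_ceph(cmd: List[str]) -> str:
    """Run a ceph CLI command and return its stdout (hypothetical helper)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def all_pgs_clean(status: dict) -> bool:
    """True when every PG is in a state that starts with active+clean.

    Assumes the layout of `ceph -s --format json`: pgmap.num_pgs plus
    pgmap.pgs_by_state, a list of {"state_name": ..., "count": ...}.
    """
    pgmap = status.get("pgmap", {})
    clean = sum(
        entry["count"]
        for entry in pgmap.get("pgs_by_state", [])
        if entry["state_name"].startswith("active+clean")
    )
    return clean == pgmap.get("num_pgs", 0)


def wait_for_clean_with_flag_lifted(
    run: Callable[[List[str]], str] = run_ceph,
    flag: str = "norebalance",
    retries: int = 40,    # illustrative value, not taken from the playbook
    delay: float = 30.0,  # illustrative value, not taken from the playbook
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Unset the flag, poll until all PGs are clean, then set the flag again."""
    run(["ceph", "osd", "unset", flag])
    try:
        for _ in range(retries):
            status = json.loads(run(["ceph", "-s", "--format", "json"]))
            if all_pgs_clean(status):
                return True
            sleep(delay)
        return False
    finally:
        # Put the flag back before moving on to the next OSD, even on timeout.
        run(["ceph", "osd", "set", flag])
```

Called once after each OSD restart, wait_for_clean_with_flag_lifted() returns True when the PGs settled and False on timeout. The `run` and `sleep` parameters are injectable so the logic can be exercised without a cluster.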
 
----- Original message -----
From: solarflow99
To: Tarek Zegar
Cc: Ceph Users
Subject: [EXTERNAL] Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO
Date: Mon, May 13, 2019 6:36 PM

Are you sure you can really use 3.2 for nautilus?

On Fri, May 10, 2019 at 7:23 AM Tarek Zegar  wrote:
Ceph-ansible 3.2, rolling upgrade mimic -> nautilus. The ansible file sets flag "norebalance". When there is *no* I/O to the cluster, the upgrade works fine. When upgrading with I/O running in the background, some PGs become `active+undersized+remapped+backfilling`.

Flag norebalance prevents them from backfilling / recovering and the upgrade fails. I'm uncertain why those OSDs are "backfilling" instead of "recovering", but I guess it doesn't matter: norebalance halts the process. Setting ceph tell osd.* injectargs '--osd_max_backfills=2' made no difference.

https://github.com/ceph/ceph-ansible/commit/08d94324545b3c4e0f6a1caf6224f37d1c2b36db  <-- did anyone other than the author verify this?

Tarek
 



Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO

2019-05-13 Thread solarflow99
Are you sure you can really use 3.2 for nautilus?

On Fri, May 10, 2019 at 7:23 AM Tarek Zegar  wrote:

> Ceph-ansible 3.2, rolling upgrade mimic -> nautilus. The ansible file sets
> flag "norebalance". When there is *no* I/O to the cluster, the upgrade works
> fine. When upgrading with I/O running in the background, some PGs become
> `active+undersized+remapped+backfilling`.
> Flag norebalance prevents them from backfilling / recovering and the upgrade
> fails. I'm uncertain why those OSDs are "backfilling" instead of
> "recovering", but I guess it doesn't matter: norebalance halts the process.
> Setting ceph tell osd.* injectargs '--osd_max_backfills=2' made no
> difference.
>
>
> https://github.com/ceph/ceph-ansible/commit/08d94324545b3c4e0f6a1caf6224f37d1c2b36db
> <-- did anyone other than the author verify this?
>
> Tarek