Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-25 Thread Adrien Gillard
The issue is finally resolved.
Upgrading to Luminous was the way to go. Unfortunately, we did not set
'ceph osd require-osd-release luminous' immediately, so we did not
activate the Luminous functionality that saved us.

I think the new mechanisms to manage and prune past intervals [1]
allowed the OSDs to start without consuming enormous amounts of memory
(around 1.5 GB for most of them, up to 10 GB for a few).
The PGs took the better part of the day to peer and activate, and the
cluster eventually ended up in HEALTH_OK. We are running as many scrubs
as possible over the weekend to detect any inconsistencies that may have
appeared during the flapping.
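
In command form, those closing steps look roughly like this (a sketch; osd.36 and the osd_max_scrubs value are only illustrative, the rest are standard Luminous CLI commands):

$ ceph osd require-osd-release luminous            # enable the Luminous-only OSD features
$ ceph osd dump | grep require_osd_release         # verify the flag took effect
$ ceph osd unset noscrub && ceph osd unset nodeep-scrub
$ ceph tell osd.* injectargs '--osd_max_scrubs 2'  # optionally allow more concurrent scrubs
$ ceph osd deep-scrub 36                           # queue deep scrubs on a given OSD, repeated per OSD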

Thank you again for your help !

[1] https://ceph.com/community/new-luminous-pg-overdose-protection/
On Thu, Aug 23, 2018 at 10:04 PM Adrien Gillard
 wrote:
>
> Sending back, forgot the plain text for ceph-devel.
>
> Sorry about that.
>
> On Thu, Aug 23, 2018 at 9:57 PM Adrien Gillard  
> wrote:
> >
> > We are running CentOS 7.5 with upstream Ceph packages, no remote syslog, 
> > just default local logging.
> >
> > After looking a bit deeper into pprof, --alloc_space seems to represent 
> > allocations that happened since the program started which goes along with 
> > the quick deallocation of the memory. --inuse_space (the default) indicates 
> > allocation at collection. std::_Rb_tree::_M_copy uses the most memory here 
> > but I have other samples (well, most of the others actually) where 
> > ceph::logging::Log::create_entry has the lead.
> > I will look at the admin socket.
> >
> > I'm adding ceph-devel, as someone might have a clue as to how to better 
> > interpret these results.
> >
> > Thank you for keeping up with this :)
> >
> >
> > On Thu, Aug 23, 2018 at 6:58 PM Gregory Farnum  wrote:
> >>
> >> On Thu, Aug 23, 2018 at 8:42 AM Adrien Gillard  
> >> wrote:
> >>>
> >>> With a bit of profiling, it seems all the memory is allocated to  
> >>> ceph::logging::Log::create_entry (see below)
> >>>
> >>> Should this be normal? Is it because some OSDs are down and it logs the
> >>> results of its osd_ping?
> >>
> >>
> >> Hmm, is that where the memory is actually *staying* or was it merely 
> >> allocated and then quickly deallocated? If you have debugging turned on 
> >> the volume of log output can be quite large, but it's a small amount kept 
> >> resident in memory at a time. Given that you don't seem to have any logs 
> >> over level 5 (except monc at 10) that does seem like a very strange ratio, 
> >> though.
> >>
> >> Generally log messages are put into a circular buffer and then removed, 
> >> and if the buffer gets full the thread trying to log just gets blocked. 
> >> I'm not sure how you could get very large arbitrary amounts of data like 
> >> that. Are you using syslog or something? I don't think that should do it, 
> >> but that's all I can imagine.
> >> That or somehow the upgrade has broken the way the OSD releases memory 
> >> back to the OS. Have you looked at the heap stats available through the 
> >> admin socket? What distro are you running and what's the source of your 
> >> packages?
> >> -Greg
> >>
> >>>
> >>>
> >>> The debug level of the OSD is below also.
> >>>
> >>> Thanks,
> >>>
> >>> Adrien
> >>>
> >>>  $ pprof /usr/bin/ceph-osd osd.36.profile.0042-first.heap
> >>> Using local file /usr/bin/ceph-osd.
> >>> Using local file osd.36.profile.0042-first.heap.
> >>> Welcome to pprof!  For help, type 'help'.
> >>> (pprof) top
> >>> Total: 2468.0 MB
> >>>519.5  21.0%  21.0%   1000.4  40.5% std::_Rb_tree::_M_copy
> >>>481.0  19.5%  40.5%481.0  19.5% std::_Rb_tree::_M_create_node
> >>>384.3  15.6%  56.1%384.3  15.6% 
> >>> std::_Rb_tree::_M_emplace_hint_unique
> >>>374.0  15.2%  71.3%374.0  15.2% 
> >>> ceph::buffer::create_aligned_in_mempool
> >>>305.6  12.4%  83.6%305.6  12.4% ceph::logging::Log::create_entry
> >>>217.3   8.8%  92.4%220.2   8.9% std::vector::_M_emplace_back_aux
> >>>114.4   4.6%  97.1%115.6   4.7% ceph::buffer::list::append@a51900
> >>> 21.4   0.9%  98.0%210.0   8.5% OSD::heartbeat
> >>> 21.2   0.9%  98.8% 21.2   0.9% std::string::_Rep::_S_create
> >>>  4.7   0.2%  99.0%266.2  10.8% AsyncConnection::send_message
> >>> (pprof)
> >>>
> >>>
> >>>  $ pprof /usr/bin/ceph-osd --alloc_space osd.36.profile.0042-first.heap
> >>> Using local file /usr/bin/ceph-osd.
> >>> Using local file osd.36.profile.0042.heap.
> >>> Welcome to pprof!  For help, type 'help'.
> >>> (pprof) top
> >>> Total: 16519.3 MB
> >>>  11915.7  72.1%  72.1%  11915.7  72.1% ceph::logging::Log::create_entry
> >>>745.0   4.5%  76.6%745.0   4.5% std::string::_Rep::_S_create
> >>>716.9   4.3%  81.0%716.9   4.3% 
> >>> ceph::buffer::create_aligned_in_mempool
> >>>700.2   4.2%  85.2%703.8   4.3% std::vector::_M_emplace_back_aux
> >>>671.9   4.1%  89.3%671.9   4.1% 
> >>> std::_Rb_tree::_M_emplace_hint_unique
> >>>557.8   3.4%  92.7%   1075.2   6.5% 

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-23 Thread Adrien Gillard
Sending back, forgot the plain text for ceph-devel.

Sorry about that.

On Thu, Aug 23, 2018 at 9:57 PM Adrien Gillard  wrote:
>
> We are running CentOS 7.5 with upstream Ceph packages, no remote syslog, just 
> default local logging.
>
> After looking a bit deeper into pprof, --alloc_space seems to represent 
> allocations that happened since the program started which goes along with the 
> quick deallocation of the memory. --inuse_space (the default) indicates 
> allocation at collection. std::_Rb_tree::_M_copy uses the most memory here 
> but I have other samples (well, most of the others actually) where 
> ceph::logging::Log::create_entry has the lead.
> I will look at the admin socket.
>
> I'm adding ceph-devel, as someone might have a clue as to how to better 
> interpret these results.
>
> Thank you for keeping up with this :)
>
>
> On Thu, Aug 23, 2018 at 6:58 PM Gregory Farnum  wrote:
>>
>> On Thu, Aug 23, 2018 at 8:42 AM Adrien Gillard  
>> wrote:
>>>
>>> With a bit of profiling, it seems all the memory is allocated to  
>>> ceph::logging::Log::create_entry (see below)
>>>
>>> Should this be normal? Is it because some OSDs are down and it logs the
>>> results of its osd_ping?
>>
>>
>> Hmm, is that where the memory is actually *staying* or was it merely 
>> allocated and then quickly deallocated? If you have debugging turned on the 
>> volume of log output can be quite large, but it's a small amount kept 
>> resident in memory at a time. Given that you don't seem to have any logs 
>> over level 5 (except monc at 10) that does seem like a very strange ratio, 
>> though.
>>
>> Generally log messages are put into a circular buffer and then removed, and 
>> if the buffer gets full the thread trying to log just gets blocked. I'm not 
>> sure how you could get very large arbitrary amounts of data like that. Are 
>> you using syslog or something? I don't think that should do it, but that's 
>> all I can imagine.
>> That or somehow the upgrade has broken the way the OSD releases memory back 
>> to the OS. Have you looked at the heap stats available through the admin 
>> socket? What distro are you running and what's the source of your packages?
>> -Greg
>>
>>>
>>>
>>> The debug level of the OSD is below also.
>>>
>>> Thanks,
>>>
>>> Adrien
>>>
>>>  $ pprof /usr/bin/ceph-osd osd.36.profile.0042-first.heap
>>> Using local file /usr/bin/ceph-osd.
>>> Using local file osd.36.profile.0042-first.heap.
>>> Welcome to pprof!  For help, type 'help'.
>>> (pprof) top
>>> Total: 2468.0 MB
>>>519.5  21.0%  21.0%   1000.4  40.5% std::_Rb_tree::_M_copy
>>>481.0  19.5%  40.5%481.0  19.5% std::_Rb_tree::_M_create_node
>>>384.3  15.6%  56.1%384.3  15.6% std::_Rb_tree::_M_emplace_hint_unique
>>>374.0  15.2%  71.3%374.0  15.2% 
>>> ceph::buffer::create_aligned_in_mempool
>>>305.6  12.4%  83.6%305.6  12.4% ceph::logging::Log::create_entry
>>>217.3   8.8%  92.4%220.2   8.9% std::vector::_M_emplace_back_aux
>>>114.4   4.6%  97.1%115.6   4.7% ceph::buffer::list::append@a51900
>>> 21.4   0.9%  98.0%210.0   8.5% OSD::heartbeat
>>> 21.2   0.9%  98.8% 21.2   0.9% std::string::_Rep::_S_create
>>>  4.7   0.2%  99.0%266.2  10.8% AsyncConnection::send_message
>>> (pprof)
>>>
>>>
>>>  $ pprof /usr/bin/ceph-osd --alloc_space osd.36.profile.0042-first.heap
>>> Using local file /usr/bin/ceph-osd.
>>> Using local file osd.36.profile.0042.heap.
>>> Welcome to pprof!  For help, type 'help'.
>>> (pprof) top
>>> Total: 16519.3 MB
>>>  11915.7  72.1%  72.1%  11915.7  72.1% ceph::logging::Log::create_entry
>>>745.0   4.5%  76.6%745.0   4.5% std::string::_Rep::_S_create
>>>716.9   4.3%  81.0%716.9   4.3% 
>>> ceph::buffer::create_aligned_in_mempool
>>>700.2   4.2%  85.2%703.8   4.3% std::vector::_M_emplace_back_aux
>>>671.9   4.1%  89.3%671.9   4.1% std::_Rb_tree::_M_emplace_hint_unique
>>>557.8   3.4%  92.7%   1075.2   6.5% std::_Rb_tree::_M_copy
>>>517.4   3.1%  95.8%517.4   3.1% std::_Rb_tree::_M_create_node
>>>468.2   2.8%  98.6%469.6   2.8% ceph::buffer::list::append@a51900
>>>122.8   0.7%  99.4%122.8   0.7% ceph::buffer::list::append@a51d20
>>> 33.3   0.2%  99.6%   7042.7  42.6% OSD::heartbeat
>>> (pprof)
>>>
>>>
>>>  $ ceph --admin-daemon /var/run/ceph/ceph-osd.36.asok config show | grep 
>>> debug
>>> "debug_none": "0/5",
>>> "debug_lockdep": "0/1",
>>> "debug_context": "0/1",
>>> "debug_crush": "1/1",
>>> "debug_mds": "1/5",
>>> "debug_mds_balancer": "1/5",
>>> "debug_mds_locker": "1/5",
>>> "debug_mds_log": "1/5",
>>> "debug_mds_log_expire": "1/5",
>>> "debug_mds_migrator": "1/5",
>>> "debug_buffer": "0/1",
>>> "debug_timer": "0/1",
>>> "debug_filer": "0/1",
>>> "debug_striper": "0/1",
>>> "debug_objecter": "0/1",
>>> "debug_rados": "0/5",
>>> "debug_rbd": "0/5",
>>> "debug_rbd_mirror": "0/5",
>>> "debug_rbd_replay": 

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-23 Thread Adrien Gillard
We are running CentOS 7.5 with upstream Ceph packages, no remote syslog,
just default local logging.

After looking a bit deeper into pprof, --alloc_space seems to represent
all allocations made since the program started, which is consistent with
the memory being quickly deallocated. --inuse_space (the default) shows
the allocations still live at the time the profile was collected.
std::_Rb_tree::_M_copy uses the most memory here, but I have other samples
(well, most of the others actually) where ceph::logging::Log::create_entry
has the lead.
I will look at the admin socket.
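
For context, a sketch of the two pprof views being compared here (standard google-perftools flags; the heap file name is the one used later in this thread):

$ pprof --inuse_space /usr/bin/ceph-osd osd.36.profile.0042-first.heap   # memory still held when the dump was taken (default)
$ pprof --alloc_space /usr/bin/ceph-osd osd.36.profile.0042-first.heap   # everything allocated since profiling began, including memory already freed
$ pprof --text --inuse_space /usr/bin/ceph-osd osd.36.profile.0042-first.heap   # non-interactive 'top'-style summary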

I'm adding ceph-devel, as someone might have a clue as to how to better
interpret these results.

Thank you for keeping up with this :)


On Thu, Aug 23, 2018 at 6:58 PM Gregory Farnum  wrote:

> On Thu, Aug 23, 2018 at 8:42 AM Adrien Gillard 
> wrote:
>
>> With a bit of profiling, it seems all the memory is allocated to
>> ceph::logging::Log::create_entry (see below)
>>
>> Should this be normal? Is it because some OSDs are down and it logs the
>> results of its osd_ping?
>>
>
> Hmm, is that where the memory is actually *staying* or was it merely
> allocated and then quickly deallocated? If you have debugging turned on the
> volume of log output can be quite large, but it's a small amount kept
> resident in memory at a time. Given that you don't seem to have any logs
> over level 5 (except monc at 10) that does seem like a very strange ratio,
> though.
>
> Generally log messages are put into a circular buffer and then removed,
> and if the buffer gets full the thread trying to log just gets blocked. I'm
> not sure how you could get very large arbitrary amounts of data like that.
> Are you using syslog or something? I don't think that should do it, but
> that's all I can imagine.
> That or somehow the upgrade has broken the way the OSD releases memory
> back to the OS. Have you looked at the heap stats available through the
> admin socket? What distro are you running and what's the source of your
> packages?
> -Greg
>
>
>>
>> The debug level of the OSD is below also.
>>
>> Thanks,
>>
>> Adrien
>>
>>  $ pprof /usr/bin/ceph-osd osd.36.profile.0042-first.heap
>> Using local file /usr/bin/ceph-osd.
>> Using local file osd.36.profile.0042-first.heap.
>> Welcome to pprof!  For help, type 'help'.
>> (pprof) top
>> Total: 2468.0 MB
>>519.5  21.0%  21.0%   1000.4  40.5% std::_Rb_tree::_M_copy
>>481.0  19.5%  40.5%481.0  19.5% std::_Rb_tree::_M_create_node
>>384.3  15.6%  56.1%384.3  15.6%
>> std::_Rb_tree::_M_emplace_hint_unique
>>374.0  15.2%  71.3%374.0  15.2%
>> ceph::buffer::create_aligned_in_mempool
>>305.6  12.4%  83.6%305.6  12.4% ceph::logging::Log::create_entry
>>217.3   8.8%  92.4%220.2   8.9% std::vector::_M_emplace_back_aux
>>114.4   4.6%  97.1%115.6   4.7% ceph::buffer::list::append@a51900
>> 21.4   0.9%  98.0%210.0   8.5% OSD::heartbeat
>> 21.2   0.9%  98.8% 21.2   0.9% std::string::_Rep::_S_create
>>  4.7   0.2%  99.0%266.2  10.8% AsyncConnection::send_message
>> (pprof)
>>
>>
>>  $ pprof /usr/bin/ceph-osd --alloc_space osd.36.profile.0042-first.heap
>> Using local file /usr/bin/ceph-osd.
>> Using local file osd.36.profile.0042.heap.
>> Welcome to pprof!  For help, type 'help'.
>> (pprof) top
>> Total: 16519.3 MB
>>  11915.7  72.1%  72.1%  11915.7  72.1% ceph::logging::Log::create_entry
>>745.0   4.5%  76.6%745.0   4.5% std::string::_Rep::_S_create
>>716.9   4.3%  81.0%716.9   4.3%
>> ceph::buffer::create_aligned_in_mempool
>>700.2   4.2%  85.2%703.8   4.3% std::vector::_M_emplace_back_aux
>>671.9   4.1%  89.3%671.9   4.1%
>> std::_Rb_tree::_M_emplace_hint_unique
>>557.8   3.4%  92.7%   1075.2   6.5% std::_Rb_tree::_M_copy
>>517.4   3.1%  95.8%517.4   3.1% std::_Rb_tree::_M_create_node
>>468.2   2.8%  98.6%469.6   2.8% ceph::buffer::list::append@a51900
>>122.8   0.7%  99.4%122.8   0.7% ceph::buffer::list::append@a51d20
>> 33.3   0.2%  99.6%   7042.7  42.6% OSD::heartbeat
>> (pprof)
>>
>>
>>  $ ceph --admin-daemon /var/run/ceph/ceph-osd.36.asok config show | grep
>> debug
>> "debug_none": "0/5",
>> "debug_lockdep": "0/1",
>> "debug_context": "0/1",
>> "debug_crush": "1/1",
>> "debug_mds": "1/5",
>> "debug_mds_balancer": "1/5",
>> "debug_mds_locker": "1/5",
>> "debug_mds_log": "1/5",
>> "debug_mds_log_expire": "1/5",
>> "debug_mds_migrator": "1/5",
>> "debug_buffer": "0/1",
>> "debug_timer": "0/1",
>> "debug_filer": "0/1",
>> "debug_striper": "0/1",
>> "debug_objecter": "0/1",
>> "debug_rados": "0/5",
>> "debug_rbd": "0/5",
>> "debug_rbd_mirror": "0/5",
>> "debug_rbd_replay": "0/5",
>> "debug_journaler": "0/5",
>> "debug_objectcacher": "0/5",
>> "debug_client": "0/5",
>> "debug_osd": "0/0",
>> "debug_optracker": "0/5",
>> "debug_objclass": "0/5",
>> "debug_filestore": "0/0",
>> "debug_journal": "0/0",
>> "debug_ms": 

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-23 Thread Gregory Farnum
On Thu, Aug 23, 2018 at 8:42 AM Adrien Gillard 
wrote:

> With a bit of profiling, it seems all the memory is allocated to
> ceph::logging::Log::create_entry (see below)
>
> Should this be normal? Is it because some OSDs are down and it logs the
> results of its osd_ping?
>

Hmm, is that where the memory is actually *staying* or was it merely
allocated and then quickly deallocated? If you have debugging turned on the
volume of log output can be quite large, but it's a small amount kept
resident in memory at a time. Given that you don't seem to have any logs
over level 5 (except monc at 10) that does seem like a very strange ratio,
though.

Generally log messages are put into a circular buffer and then removed, and
if the buffer gets full the thread trying to log just gets blocked. I'm not
sure how you could get very large arbitrary amounts of data like that. Are
you using syslog or something? I don't think that should do it, but that's
all I can imagine.
That or somehow the upgrade has broken the way the OSD releases memory back
to the OS. Have you looked at the heap stats available through the admin
socket? What distro are you running and what's the source of your packages?
-Greg
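
For reference, a sketch of the heap commands Greg is referring to (osd.36 is just an example id; dump_mempools only applies once the daemon itself runs Luminous):

$ ceph daemon osd.36 heap stats        # tcmalloc view: bytes in use vs. bytes held in freelists
$ ceph tell osd.36 heap release        # ask tcmalloc to return unused pages to the OS
$ ceph daemon osd.36 dump_mempools     # per-subsystem memory accounting (Luminous OSDs only)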


>
> The debug level of the OSD is below also.
>
> Thanks,
>
> Adrien
>
>  $ pprof /usr/bin/ceph-osd osd.36.profile.0042-first.heap
> Using local file /usr/bin/ceph-osd.
> Using local file osd.36.profile.0042-first.heap.
> Welcome to pprof!  For help, type 'help'.
> (pprof) top
> Total: 2468.0 MB
>519.5  21.0%  21.0%   1000.4  40.5% std::_Rb_tree::_M_copy
>481.0  19.5%  40.5%481.0  19.5% std::_Rb_tree::_M_create_node
>384.3  15.6%  56.1%384.3  15.6%
> std::_Rb_tree::_M_emplace_hint_unique
>374.0  15.2%  71.3%374.0  15.2%
> ceph::buffer::create_aligned_in_mempool
>305.6  12.4%  83.6%305.6  12.4% ceph::logging::Log::create_entry
>217.3   8.8%  92.4%220.2   8.9% std::vector::_M_emplace_back_aux
>114.4   4.6%  97.1%115.6   4.7% ceph::buffer::list::append@a51900
> 21.4   0.9%  98.0%210.0   8.5% OSD::heartbeat
> 21.2   0.9%  98.8% 21.2   0.9% std::string::_Rep::_S_create
>  4.7   0.2%  99.0%266.2  10.8% AsyncConnection::send_message
> (pprof)
>
>
>  $ pprof /usr/bin/ceph-osd --alloc_space osd.36.profile.0042-first.heap
> Using local file /usr/bin/ceph-osd.
> Using local file osd.36.profile.0042.heap.
> Welcome to pprof!  For help, type 'help'.
> (pprof) top
> Total: 16519.3 MB
>  11915.7  72.1%  72.1%  11915.7  72.1% ceph::logging::Log::create_entry
>745.0   4.5%  76.6%745.0   4.5% std::string::_Rep::_S_create
>716.9   4.3%  81.0%716.9   4.3%
> ceph::buffer::create_aligned_in_mempool
>700.2   4.2%  85.2%703.8   4.3% std::vector::_M_emplace_back_aux
>671.9   4.1%  89.3%671.9   4.1%
> std::_Rb_tree::_M_emplace_hint_unique
>557.8   3.4%  92.7%   1075.2   6.5% std::_Rb_tree::_M_copy
>517.4   3.1%  95.8%517.4   3.1% std::_Rb_tree::_M_create_node
>468.2   2.8%  98.6%469.6   2.8% ceph::buffer::list::append@a51900
>122.8   0.7%  99.4%122.8   0.7% ceph::buffer::list::append@a51d20
> 33.3   0.2%  99.6%   7042.7  42.6% OSD::heartbeat
> (pprof)
>
>
>  $ ceph --admin-daemon /var/run/ceph/ceph-osd.36.asok config show | grep
> debug
> "debug_none": "0/5",
> "debug_lockdep": "0/1",
> "debug_context": "0/1",
> "debug_crush": "1/1",
> "debug_mds": "1/5",
> "debug_mds_balancer": "1/5",
> "debug_mds_locker": "1/5",
> "debug_mds_log": "1/5",
> "debug_mds_log_expire": "1/5",
> "debug_mds_migrator": "1/5",
> "debug_buffer": "0/1",
> "debug_timer": "0/1",
> "debug_filer": "0/1",
> "debug_striper": "0/1",
> "debug_objecter": "0/1",
> "debug_rados": "0/5",
> "debug_rbd": "0/5",
> "debug_rbd_mirror": "0/5",
> "debug_rbd_replay": "0/5",
> "debug_journaler": "0/5",
> "debug_objectcacher": "0/5",
> "debug_client": "0/5",
> "debug_osd": "0/0",
> "debug_optracker": "0/5",
> "debug_objclass": "0/5",
> "debug_filestore": "0/0",
> "debug_journal": "0/0",
> "debug_ms": "0/5",
> "debug_mon": "1/5",
> "debug_monc": "0/10",
> "debug_paxos": "1/5",
> "debug_tp": "0/5",
> "debug_auth": "1/5",
> "debug_crypto": "1/5",
> "debug_finisher": "1/1",
> "debug_reserver": "1/1",
> "debug_heartbeatmap": "1/5",
> "debug_perfcounter": "1/5",
> "debug_rgw": "1/5",
> "debug_civetweb": "1/10",
> "debug_javaclient": "1/5",
> "debug_asok": "1/5",
> "debug_throttle": "1/1",
> "debug_refs": "0/0",
> "debug_xio": "1/5",
> "debug_compressor": "1/5",
> "debug_bluestore": "1/5",
> "debug_bluefs": "1/5",
> "debug_bdev": "1/3",
> "debug_kstore": "1/5",
> "debug_rocksdb": "4/5",
> "debug_leveldb": "4/5",
> "debug_memdb": "4/5",
> "debug_kinetic": "1/5",
> "debug_fuse": "1/5",
> "debug_mgr": "1/5",
> "debug_mgrc": "1/5",
> 

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-23 Thread Adrien Gillard
With a bit of profiling, it seems all the memory is allocated to
ceph::logging::Log::create_entry (see below)

Should this be normal? Is it because some OSDs are down and it logs the
results of its osd_ping?

The debug level of the OSD is below also.

Thanks,

Adrien

 $ pprof /usr/bin/ceph-osd osd.36.profile.0042-first.heap
Using local file /usr/bin/ceph-osd.
Using local file osd.36.profile.0042-first.heap.
Welcome to pprof!  For help, type 'help'.
(pprof) top
Total: 2468.0 MB
   519.5  21.0%  21.0%   1000.4  40.5% std::_Rb_tree::_M_copy
   481.0  19.5%  40.5%481.0  19.5% std::_Rb_tree::_M_create_node
   384.3  15.6%  56.1%384.3  15.6% std::_Rb_tree::_M_emplace_hint_unique
   374.0  15.2%  71.3%374.0  15.2%
ceph::buffer::create_aligned_in_mempool
   305.6  12.4%  83.6%305.6  12.4% ceph::logging::Log::create_entry
   217.3   8.8%  92.4%220.2   8.9% std::vector::_M_emplace_back_aux
   114.4   4.6%  97.1%115.6   4.7% ceph::buffer::list::append@a51900
21.4   0.9%  98.0%210.0   8.5% OSD::heartbeat
21.2   0.9%  98.8% 21.2   0.9% std::string::_Rep::_S_create
 4.7   0.2%  99.0%266.2  10.8% AsyncConnection::send_message
(pprof)


 $ pprof /usr/bin/ceph-osd --alloc_space osd.36.profile.0042-first.heap
Using local file /usr/bin/ceph-osd.
Using local file osd.36.profile.0042.heap.
Welcome to pprof!  For help, type 'help'.
(pprof) top
Total: 16519.3 MB
 11915.7  72.1%  72.1%  11915.7  72.1% ceph::logging::Log::create_entry
   745.0   4.5%  76.6%745.0   4.5% std::string::_Rep::_S_create
   716.9   4.3%  81.0%716.9   4.3%
ceph::buffer::create_aligned_in_mempool
   700.2   4.2%  85.2%703.8   4.3% std::vector::_M_emplace_back_aux
   671.9   4.1%  89.3%671.9   4.1% std::_Rb_tree::_M_emplace_hint_unique
   557.8   3.4%  92.7%   1075.2   6.5% std::_Rb_tree::_M_copy
   517.4   3.1%  95.8%517.4   3.1% std::_Rb_tree::_M_create_node
   468.2   2.8%  98.6%469.6   2.8% ceph::buffer::list::append@a51900
   122.8   0.7%  99.4%122.8   0.7% ceph::buffer::list::append@a51d20
33.3   0.2%  99.6%   7042.7  42.6% OSD::heartbeat
(pprof)


 $ ceph --admin-daemon /var/run/ceph/ceph-osd.36.asok config show | grep
debug
"debug_none": "0/5",
"debug_lockdep": "0/1",
"debug_context": "0/1",
"debug_crush": "1/1",
"debug_mds": "1/5",
"debug_mds_balancer": "1/5",
"debug_mds_locker": "1/5",
"debug_mds_log": "1/5",
"debug_mds_log_expire": "1/5",
"debug_mds_migrator": "1/5",
"debug_buffer": "0/1",
"debug_timer": "0/1",
"debug_filer": "0/1",
"debug_striper": "0/1",
"debug_objecter": "0/1",
"debug_rados": "0/5",
"debug_rbd": "0/5",
"debug_rbd_mirror": "0/5",
"debug_rbd_replay": "0/5",
"debug_journaler": "0/5",
"debug_objectcacher": "0/5",
"debug_client": "0/5",
"debug_osd": "0/0",
"debug_optracker": "0/5",
"debug_objclass": "0/5",
"debug_filestore": "0/0",
"debug_journal": "0/0",
"debug_ms": "0/5",
"debug_mon": "1/5",
"debug_monc": "0/10",
"debug_paxos": "1/5",
"debug_tp": "0/5",
"debug_auth": "1/5",
"debug_crypto": "1/5",
"debug_finisher": "1/1",
"debug_reserver": "1/1",
"debug_heartbeatmap": "1/5",
"debug_perfcounter": "1/5",
"debug_rgw": "1/5",
"debug_civetweb": "1/10",
"debug_javaclient": "1/5",
"debug_asok": "1/5",
"debug_throttle": "1/1",
"debug_refs": "0/0",
"debug_xio": "1/5",
"debug_compressor": "1/5",
"debug_bluestore": "1/5",
"debug_bluefs": "1/5",
"debug_bdev": "1/3",
"debug_kstore": "1/5",
"debug_rocksdb": "4/5",
"debug_leveldb": "4/5",
"debug_memdb": "4/5",
"debug_kinetic": "1/5",
"debug_fuse": "1/5",
"debug_mgr": "1/5",
"debug_mgrc": "1/5",
"debug_dpdk": "1/5",
"debug_eventtrace": "1/5",


On Thu, Aug 23, 2018 at 4:27 PM Adrien Gillard 
wrote:

> After upgrading to luminous, we see the exact same behaviour, with OSDs
> eating as much as 80/90 GB of memory.
>
> We'll try some memory profiling but at this point we're a bit lost. Are
> there any specific logs that could help us?
>
> On Thu, Aug 23, 2018 at 2:34 PM Adrien Gillard 
> wrote:
>
>> Well, after a few hours, still nothing new in the behaviour. With half of
>> the OSDs (so 6 per host) up and peering and the nodown flag set to limit
>> the creation of new maps, all the memory is consumed and OSDs get killed by
>> the OOM killer.
>> We observe a lot of threads being created for each OSD (roughly between
>> 500 and 900). As there are around 200 PGs per OSD, does this seem like a lot to
>> you? We see tons being created each second, most probably to replace the
>> threads that time out or disappear with the OSD processes.
>>
>> We are preparing to upgrade the OSDs to Luminous.
>>
>> On Thu, Aug 23, 2018 at 4:05 AM Gregory Farnum 
>> wrote:
>>
>>>
>>>
>>> On Wed, Aug 22, 2018 at 6:02 PM Adrien Gillard 
>>> wrote:
>>>
 We 'paused' the cluster early in our investigation to avoid 

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-23 Thread Adrien Gillard
After upgrading to luminous, we see the exact same behaviour, with OSDs
eating as much as 80/90 GB of memory.

We'll try some memory profiling but at this point we're a bit lost. Are
there any specific logs that could help us?
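
A sketch of how that profiling can be started with the tcmalloc heap profiler built into the OSD (osd.36 and the dump path are illustrative; by default the .heap files land in the OSD's log directory):

$ ceph tell osd.36 heap start_profiler    # begin writing osd.36.profile.NNNN.heap files
$ ceph tell osd.36 heap dump              # force an immediate dump
$ ceph tell osd.36 heap stop_profiler     # stop once enough samples are collected
$ pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.36.profile.0042.heap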

On Thu, Aug 23, 2018 at 2:34 PM Adrien Gillard 
wrote:

> Well, after a few hours, still nothing new in the behaviour. With half of
> the OSDs (so 6 per host) up and peering and the nodown flag set to limit
> the creation of new maps, all the memory is consumed and OSDs get killed by
> the OOM killer.
> We observe a lot of threads being created for each OSD (roughly between
> 500 and 900). As there are around 200 PGs per OSD, does this seem like a lot to
> you? We see tons being created each second, most probably to replace the
> threads that time out or disappear with the OSD processes.
>
> We are preparing to upgrade the OSDs to Luminous.
>
> On Thu, Aug 23, 2018 at 4:05 AM Gregory Farnum  wrote:
>
>>
>>
>> On Wed, Aug 22, 2018 at 6:02 PM Adrien Gillard 
>> wrote:
>>
>>> We 'paused' the cluster early in our investigation to avoid unnecessary
>>> IO.
>>> We also set the nodown flag but the OOM rate was really sustained and we
>>> got servers that stopped responding from time to time, so we decided to
>>> lower the number of OSDs up and let them peer.
>>>
>>
>> Right but this often also means that they need to deal with changing
>> intervals or trying to set up recovery and things, plus the change in which
>> OSDs are up creates new maps that need to be processed. If you set “nodown”
>> you can incrementally bring up OSDs while doing a better job of throttling
>> the map creation process and preventing a bunch of new IO when/if they OOM.
>>
>>
>> I don't know if it is the best solution but it is the middle of the night
>>> here, so we'll see tomorrow morning and ponder the completion of the
>>> upgrade.
>>>
>>> I agree with you Greg, it does not make sense. The cluster was running
>>> great for around 6 hours after the mons upgrade...
>>>
>>> Finally, regarding the cache-tiering, I know it is not used a lot nor
>>> will it be maintained over time. One of the main goals of the luminous
>>> upgrade is to remove it completely and switch to ec rbd.
>>>
>>> Thank you for your inputs !
>>>
>>>
>>> On Wed, Aug 22, 2018 at 11:57 PM Gregory Farnum 
>>> wrote:
>>>
 On Wed, Aug 22, 2018 at 6:35 AM Adrien Gillard <
 gillard.adr...@gmail.com> wrote:

> Hi everyone,
>
> We have a hard time figuring out a behaviour encountered after
> upgrading the monitors of one of our cluster from Jewel to Luminous
> yesterday.
>
> The cluster is composed of 14 OSDs hosts (2xE5-2640 v3 and 64 GB of
> RAM), each containing 12x4TB OSD with journals on DC grade SSDs). The
> cluster was installed in Infernalis and upgraded to Jewel. We mainly use 
> it
> for RBD with a cache tier in front of an EC pool.
>
> We started the upgrade from Jewel 10.2.10 to Luminous 12.2.7 yesterday
> afternoon with the monitors and the provisioning of the MGR. We were
> planning on upgrading the OSD servers today.
> All went well until the cluster received the following commands, a few
> hours after the upgrade :
>
> ceph osd pool set pool-cache-01 target_max_bytes 0
> ceph osd pool set pool-cache-02 target_max_bytes 0
>
> (The context here is that we use this cluster for backups and set
> target_max_bytes to 0 to stop the flush and give maximum resources to the
> backups IOs)
>
> Below is an extract from the logs of one of the monitors.
> The cluster started exchanging a lot over the cluster network,
> apparently backfill/recovery and multiple Gbits/s, in an amount which 
> we've
> never seen before. Due to some network uplinks sub-sizing, the traffic
> could not be handled well and this lead to OSDs crashing.
>

 This really doesn't make a ton of sense — you're saying that by telling
 the cluster to *stop* doing flushing IO, everything went bad? In terms of
 root causing the issue you'll obviously want to focus on that, although it
 may or may not be helpful in recovering things at this point.

 Do keep in mind that users running cache-tiering are increasingly
 uncommon, and while we still test it we aren't expanding that coverage and
 would really like to remove and replace that functionality, so you're out
 on the far edge of things here. :/

 Finally, if you're still experiencing OOM issues, you may want to set
 the "nodown" flag — that will prevent OSDs being marked down, which
 obviously will stall IO and peering if they go down anyway, but will
 prevent spamming map updates that cause the cluster to OOM again.
 -Greg


>
> At the moment we are still investigating the logs to try to understand
> what happened exactly. The cluster is down. When we restart the OSDs, PG
> start activating and peering but at some 

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-23 Thread Adrien Gillard
Well, after a few hours, still nothing new in the behaviour. With half of
the OSDs (so 6 per host) up and peering and the nodown flag set to limit
the creation of new maps, all the memory is consumed and OSDs get killed by
the OOM killer.
We observe a lot of threads being created for each OSD (roughly between
500 and 900). As there are around 200 PGs per OSD, does this seem like a lot to
you? We see tons being created each second, most probably to replace the
threads that time out or disappear with the OSD processes.
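
Two quick ways to put numbers on those thread counts (plain Linux tooling, nothing Ceph-specific; the process name ceph-osd is assumed):

$ for pid in $(pgrep -x ceph-osd); do echo "$pid: $(ls /proc/$pid/task | wc -l) threads"; done
$ ps -o pid,nlwp,rss,cmd -C ceph-osd     # NLWP = thread count, RSS = resident memory in KB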

We are preparing to upgrade the OSDs to Luminous.

On Thu, Aug 23, 2018 at 4:05 AM Gregory Farnum  wrote:

>
>
> On Wed, Aug 22, 2018 at 6:02 PM Adrien Gillard 
> wrote:
>
>> We 'paused' the cluster early in our investigation to avoid unnecessary
>> IO.
>> We also set the nodown flag but the OOM rate was really sustained and we
>> got servers that stopped responding from time to time, so we decided to
>> lower the number of OSDs up and let them peer.
>>
>
> Right but this often also means that they need to deal with changing
> intervals or trying to set up recovery and things, plus the change in which
> OSDs are up creates new maps that need to be processed. If you set “nodown”
> you can incrementally bring up OSDs while doing a better job of throttling
> the map creation process and preventing a bunch of new IO when/if they OOM.
>
>
> I don't know if it is the best solution but it is the middle of the night
>> here, so we'll see tomorrow morning and ponder the completion of the
>> upgrade.
>>
>> I agree with you Greg, it does not make sense. The cluster was running
>> great for around 6 hours after the mons upgrade...
>>
>> Finally, regarding the cache-tiering, I know it is not used a lot nor
>> will it be maintained over time. One of the main goals of the luminous
>> upgrade is to remove it completely and switch to ec rbd.
>>
>> Thank you for your inputs !
>>
>>
>> On Wed, Aug 22, 2018 at 11:57 PM Gregory Farnum 
>> wrote:
>>
>>> On Wed, Aug 22, 2018 at 6:35 AM Adrien Gillard 
>>> wrote:
>>>
 Hi everyone,

 We have a hard time figuring out a behaviour encountered after
 upgrading the monitors of one of our cluster from Jewel to Luminous
 yesterday.

 The cluster is composed of 14 OSDs hosts (2xE5-2640 v3 and 64 GB of
 RAM), each containing 12x4TB OSD with journals on DC grade SSDs). The
 cluster was installed in Infernalis and upgraded to Jewel. We mainly use it
 for RBD with a cache tier in front of an EC pool.

 We started the upgrade from Jewel 10.2.10 to Luminous 12.2.7 yesterday
 afternoon with the monitors and the provisioning of the MGR. We were
 planning on upgrading the OSD servers today.
 All went well until the cluster received the following commands, a few
 hours after the upgrade :

 ceph osd pool set pool-cache-01 target_max_bytes 0
 ceph osd pool set pool-cache-02 target_max_bytes 0

 (The context here is that we use this cluster for backups and set
 target_max_bytes to 0 to stop the flush and give maximum resources to the
 backups IOs)

 Below is an extract from the logs of one of the monitors.
 The cluster started exchanging a lot over the cluster network,
 apparently backfill/recovery and multiple Gbits/s, in an amount which we've
 never seen before. Due to some network uplinks sub-sizing, the traffic
 could not be handled well and this lead to OSDs crashing.

>>>
>>> This really doesn't make a ton of sense — you're saying that by telling
>>> the cluster to *stop* doing flushing IO, everything went bad? In terms of
>>> root causing the issue you'll obviously want to focus on that, although it
>>> may or may not be helpful in recovering things at this point.
>>>
>>> Do keep in mind that users running cache-tiering are increasingly
>>> uncommon, and while we still test it we aren't expanding that coverage and
>>> would really like to remove and replace that functionality, so you're out
>>> on the far edge of things here. :/
>>>
>>> Finally, if you're still experiencing OOM issues, you may want to set
>>> the "nodown" flag — that will prevent OSDs being marked down, which
>>> obviously will stall IO and peering if they go down anyway, but will
>>> prevent spamming map updates that cause the cluster to OOM again.
>>> -Greg
>>>
>>>

 At the moment we are still investigating the logs to try to understand
 what happened exactly. The cluster is down. When we restart the OSDs, PG
 start activating and peering but at some point all the RAM and swap of the
 servers is consumed and OSDs crash.
 I tried to set the obvious cluster parameters
 (osd_recovery_max_active=1, osd_recovery_threads=1, osd_max_backfills=1)
 and norecover, nobackfill, but when the OSDs start, we still see 4 or 5
 Gbits/s over the cluster network.

 We are currently reconfiguring the network to try to remove the
 bandwidth problem and are preparing to add 

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-22 Thread Gregory Farnum
On Wed, Aug 22, 2018 at 6:02 PM Adrien Gillard 
wrote:

> We 'paused' the cluster early in our investigation to avoid unnecessary
> IO.
> We also set the nodown flag but the OOM rate was really sustained and we
> got servers that stopped responding from time to time, so we decided to
> lower the number of OSDs up and let them peer.
>

Right but this often also means that they need to deal with changing
intervals or trying to set up recovery and things, plus the change in which
OSDs are up creates new maps that need to be processed. If you set “nodown”
you can incrementally bring up OSDs while doing a better job of throttling
the map creation process and preventing a bunch of new IO when/if they OOM.
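
A hedged sketch of the sequence Greg is describing (the flags are standard; the unit names assume the upstream systemd packaging on CentOS 7, and the OSD ids are arbitrary):

$ ceph osd set nodown          # stop down-markings from generating a new map per flap
$ ceph osd set norecover && ceph osd set nobackfill
$ systemctl start ceph-osd@0 ceph-osd@1 ceph-osd@2    # bring up a few OSDs at a time...
$ ceph -s                      # ...and wait for their PGs to finish peering before the next batch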


I don't know if it is the best solution but it is the middle of the night
> here, so we'll see tomorrow morning and ponder the completion of the
> upgrade.
>
> I agree with you Greg, it does not make sense. The cluster was running
> great for around 6 hours after the mons upgrade...
>
> Finally, regarding the cache-tiering, I know it is not used a lot nor will
> it be maintained over time. One of the main goals of the luminous upgrade is
> to remove it completely and switch to ec rbd.
>
> Thank you for your inputs !
>
>
> On Wed, Aug 22, 2018 at 11:57 PM Gregory Farnum 
> wrote:
>
>> On Wed, Aug 22, 2018 at 6:35 AM Adrien Gillard 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> We have a hard time figuring out a behaviour encountered after upgrading
>>> the monitors of one of our cluster from Jewel to Luminous yesterday.
>>>
>>> The cluster is composed of 14 OSDs hosts (2xE5-2640 v3 and 64 GB of
>>> RAM), each containing 12x4TB OSD with journals on DC grade SSDs). The
>>> cluster was installed in Infernalis and upgraded to Jewel. We mainly use it
>>> for RBD with a cache tier in front of an EC pool.
>>>
>>> We started the upgrade from Jewel 10.2.10 to Luminous 12.2.7 yesterday
>>> afternoon with the monitors and the provisioning of the MGR. We were
>>> planning on upgrading the OSD servers today.
>>> All went well until the cluster received the following commands, a few
>>> hours after the upgrade :
>>>
>>> ceph osd pool set pool-cache-01 target_max_bytes 0
>>> ceph osd pool set pool-cache-02 target_max_bytes 0
>>>
>>> (The context here is that we use this cluster for backups and set
>>> target_max_bytes to 0 to stop the flush and give maximum resources to the
>>> backups IOs)
>>>
>>> Below is an extract from the logs of one of the monitors.
>>> The cluster started exchanging a lot over the cluster network,
>>> apparently backfill/recovery and multiple Gbits/s, in an amount which we've
>>> never seen before. Due to some network uplinks sub-sizing, the traffic
>>> could not be handled well and this lead to OSDs crashing.
>>>
>>
>> This really doesn't make a ton of sense — you're saying that by telling
>> the cluster to *stop* doing flushing IO, everything went bad? In terms of
>> root causing the issue you'll obviously want to focus on that, although it
>> may or may not be helpful in recovering things at this point.
>>
>> Do keep in mind that users running cache-tiering are increasingly
>> uncommon, and while we still test it we aren't expanding that coverage and
>> would really like to remove and replace that functionality, so you're out
>> on the far edge of things here. :/
>>
>> Finally, if you're still experiencing OOM issues, you may want to set the
>> "nodown" flag — that will prevent OSDs being marked down, which obviously
>> will stall IO and peering if they go down anyway, but will prevent spamming
>> map updates that cause the cluster to OOM again.
>> -Greg
>>
>>
>>>
>>> At the moment we are still investigating the logs to try to understand
>>> what happened exactly. The cluster is down. When we restart the OSDs, PG
>>> start activating and peering but at some point all the RAM and swap of the
>>> servers is consumed and OSDs crash.
>>> I tried to set the obvious cluster parameters (osd_recovery_max_active=1,
>>> osd_recovery_threads=1, osd_max_backfills=1) and norecover, nobackfill, but
>>> when the OSDs start, we still see 4 or 5 Gbits/s over the cluster network.
>>>
>>> We are currently reconfiguring the network to try to remove the bandwidth
>>> problem and are preparing to add more RAM to the servers.
>>>
>>> But still, I have a couple of questions :
>>>
>>>  - Has anyone encountered the issue we have with the cache? Or have a
>>> possible explanation ?
>>>  - Is it normal to observe so much traffic on the cluster network when
>>> the PG are activating / peering with norecover and nobackfill set ?
>>>
>>> Thank you for reading and for your input !
>>>
>>>
>>> 2018-08-21 21:05:02.766969 7f10de377700  0 log_channel(cluster) log
>>> [INF] : pgmap 6208 pgs: 6208 active+clean; 261 TB data, 385 TB used, 225 TB
>>> / 611 TB avail; 298 MB/s rd, 212 MB/s wr, 456 op/s
>>> 2018-08-21 21:05:03.805124 7f10de377700  0 log_channel(cluster) log
>>> [INF] : pgmap 6208 pgs: 6208 active+clean; 261 TB data, 385 TB used, 225 

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-22 Thread Adrien Gillard
We 'paused' the cluster early in our investigation to avoid unnecessary IO.
We also set the nodown flag but the OOM rate was really sustained and we
got servers that stopped responding from time to time, so we decided to
lower the number of OSDs up and let them peer.
I don't know if it is the best solution but it is the middle of the night
here, so we'll see tomorrow morning and ponder the completion of the
upgrade.

I agree with you Greg, it does not make sense. The cluster was running
great for around 6 hours after the mons upgrade...

Finally, regarding cache tiering, I know it is not used a lot nor will
it be maintained over time. One of the main goals of the Luminous upgrade is
to remove it completely and switch to EC RBD.

Thank you for your inputs !


On Wed, Aug 22, 2018 at 11:57 PM Gregory Farnum  wrote:

> On Wed, Aug 22, 2018 at 6:35 AM Adrien Gillard 
> wrote:
>
>> Hi everyone,
>>
>> We have a hard time figuring out a behaviour encountered after upgrading
>> the monitors of one of our cluster from Jewel to Luminous yesterday.
>>
>> The cluster is composed of 14 OSDs hosts (2xE5-2640 v3 and 64 GB of RAM),
>> each containing 12x4TB OSD with journals on DC grade SSDs). The cluster was
>> installed in Infernalis and upgraded to Jewel. We mainly use it for RBD
>> with a cache tier in front of an EC pool.
>>
>> We started the upgrade from Jewel 10.2.10 to Luminous 12.2.7 yesterday
>> afternoon with the monitors and the provisioning of the MGR. We were
>> planning on upgrading the OSD servers today.
>> All went well until the cluster received the following commands, a few
>> hours after the upgrade :
>>
>> ceph osd pool set pool-cache-01 target_max_bytes 0
>> ceph osd pool set pool-cache-02 target_max_bytes 0
>>
>> (The context here is that we use this cluster for backups and set
>> target_max_bytes to 0 to stop the flush and give maximum resources to the
>> backups IOs)
>>
>> Below is an extract from the logs of one of the monitors.
>> The cluster started exchanging a lot over the cluster network, apparently
>> backfill/recovery and multiple Gbits/s, in an amount which we've never seen
>> before. Due to some network uplinks sub-sizing, the traffic could not be
>> handled well and this lead to OSDs crashing.
>>
>
> This really doesn't make a ton of sense — you're saying that by telling
> the cluster to *stop* doing flushing IO, everything went bad? In terms of
> root causing the issue you'll obviously want to focus on that, although it
> may or may not be helpful in recovering things at this point.
>
> Do keep in mind that users running cache-tiering are increasingly
> uncommon, and while we still test it we aren't expanding that coverage and
> would really like to remove and replace that functionality, so you're out
> on the far edge of things here. :/
>
> Finally, if you're still experiencing OOM issues, you may want to set the
> "nodown" flag — that will prevent OSDs being marked down, which obviously
> will stall IO and peering if they go down anyway, but will prevent spamming
> map updates that cause the cluster to OOM again.
> -Greg
>
>
>>
>> At the moment we are still investigating the logs to try to understand
>> what happened exactly. The cluster is down. When we restart the OSDs, PG
>> start activating and peering but at some point all the RAM and swap of the
>> servers is consumed and OSDs crash.
>> I tried to set the obvious cluster parameters (osd_recovery_max_active=1,
>> osd_recovery_threads=1, osd_max_backfills=1) and norecover, nobackfill, but
>> when the OSDs start, we still see 4 or 5 Gbits/s over the cluster network.
>>
>> We are currently reconfiguring the network to try to remove the bandwidth
>> problem and are preparing to add more RAM to the servers.
>>
>> But still, I have a couple of questions :
>>
>>  - Has anyone encountered the issue we have with the cache? Or have a
>> possible explanation ?
>>  - Is it normal to observe so much traffic on the cluster network when
>> the PG are activating / peering with norecover and nobackfill set ?
>>
>> Thank you for reading and for your input !
>>
>>
>> 2018-08-21 21:05:02.766969 7f10de377700  0 log_channel(cluster) log [INF]
>> : pgmap 6208 pgs: 6208 active+clean; 261 TB data, 385 TB used, 225 TB / 611
>> TB avail; 298 MB/s rd, 212 MB/s wr, 456 op/s
>> 2018-08-21 21:05:03.805124 7f10de377700  0 log_channel(cluster) log [INF]
>> : pgmap 6208 pgs: 6208 active+clean; 261 TB data, 385 TB used, 225 TB / 611
>> TB avail; 296 MB/s rd, 193 MB/s wr, 433 op/s
>> 2018-08-21 21:05:03.870690 7f10d83cc700  0 
>> mon.clb-cph-frpar2-mon-01@0(leader)
>> e6 handle_command mon_command({"var": "target_max_bytes", "prefix": "osd
>> pool set", "pool": "pool-cache-02", "val": "0"} v 0
>> ) v1
>> 2018-08-21 21:05:03.870749 7f10d83cc700  0 log_channel(audit) log [INF] :
>> from='client.146529236 -' entity='client.admin' cmd=[{"var":
>> "target_max_bytes", "prefix": "osd pool set", "pool": "pool-cache-02
>> ", "val": "0"}]: dispatch
>> 2018-08-21 

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-22 Thread Gregory Farnum
On Wed, Aug 22, 2018 at 6:35 AM Adrien Gillard 
wrote:

> Hi everyone,
>
> We have a hard time figuring out a behaviour encountered after upgrading
> the monitors of one of our cluster from Jewel to Luminous yesterday.
>
> The cluster is composed of 14 OSDs hosts (2xE5-2640 v3 and 64 GB of RAM),
> each containing 12x4TB OSD with journals on DC grade SSDs). The cluster was
> installed in Infernalis and upgraded to Jewel. We mainly use it for RBD
> with a cache tier in front of an EC pool.
>
> We started the upgrade from Jewel 10.2.10 to Luminous 12.2.7 yesterday
> afternoon with the monitors and the provisioning of the MGR. We were
> planning on upgrading the OSD servers today.
> All went well until the cluster received the following commands, a few
> hours after the upgrade :
>
> ceph osd pool set pool-cache-01 target_max_bytes 0
> ceph osd pool set pool-cache-02 target_max_bytes 0
>
> (The context here is that we use this cluster for backups and set
> target_max_bytes to 0 to stop the flush and give maximum resources to the
> backups IOs)
>
> Below is an extract from the logs of one of the monitors.
> The cluster started exchanging a lot over the cluster network, apparently
> backfill/recovery and multiple Gbits/s, in an amount which we've never seen
> before. Due to some network uplinks sub-sizing, the traffic could not be
> handled well and this lead to OSDs crashing.
>

This really doesn't make a ton of sense — you're saying that by telling the
cluster to *stop* doing flushing IO, everything went bad? In terms of root
causing the issue you'll obviously want to focus on that, although it may
or may not be helpful in recovering things at this point.

Do keep in mind that users running cache-tiering are increasingly uncommon,
and while we still test it we aren't expanding that coverage and would
really like to remove and replace that functionality, so you're out on the
far edge of things here. :/

Finally, if you're still experiencing OOM issues, you may want to set the
"nodown" flag — that will prevent OSDs being marked down, which obviously
will stall IO and peering if they go down anyway, but will prevent spamming
map updates that cause the cluster to OOM again.
-Greg


>
> At the moment we are still investigating the logs to try to understand
> what happened exactly. The cluster is down. When we restart the OSDs, PG
> start activating and peering but at some point all the RAM and swap of the
> servers is consumed and OSDs crash.
> I tried to set the obvious cluster parameters (osd_recovery_max_active=1,
> osd_recovery_threads=1, osd_max_backfills=1) and norecover, nobackfill, but
> when the OSDs start, we still see 4 or 5 Gbits/s over the cluster network.
>
> We are currently reconfiguring the network to try to remove the bandwidth
> problem and are preparing to add more RAM to the servers.
>
> But still, I have a couple of questions :
>
>  - Has anyone encountered the issue we have with the cache? Or have a
> possible explanation ?
>  - Is it normal to observe so much traffic on the cluster network when the
> PG are activating / peering with norecover and nobackfill set ?
>
> Thank you for reading and for your input !
>
>
> 2018-08-21 21:05:02.766969 7f10de377700  0 log_channel(cluster) log [INF]
> : pgmap 6208 pgs: 6208 active+clean; 261 TB data, 385 TB used, 225 TB / 611
> TB avail; 298 MB/s rd, 212 MB/s wr, 456 op/s
> 2018-08-21 21:05:03.805124 7f10de377700  0 log_channel(cluster) log [INF]
> : pgmap 6208 pgs: 6208 active+clean; 261 TB data, 385 TB used, 225 TB / 611
> TB avail; 296 MB/s rd, 193 MB/s wr, 433 op/s
> 2018-08-21 21:05:03.870690 7f10d83cc700  0 mon.clb-cph-frpar2-mon-01@0(leader)
> e6 handle_command mon_command({"var": "target_max_bytes", "prefix": "osd
> pool set", "pool": "pool-cache-02", "val": "0"} v 0
> ) v1
> 2018-08-21 21:05:03.870749 7f10d83cc700  0 log_channel(audit) log [INF] :
> from='client.146529236 -' entity='client.admin' cmd=[{"var":
> "target_max_bytes", "prefix": "osd pool set", "pool": "pool-cache-02
> ", "val": "0"}]: dispatch
> 2018-08-21 21:05:03.913663 7f10d83cc700  0 mon.clb-cph-frpar2-mon-01@0(leader)
> e6 handle_command mon_command({"prefix": "osd pool stats", "format":
> "json"} v 0) v1
> 2018-08-21 21:05:03.913745 7f10d83cc700  0 log_channel(audit) log [DBG] :
> from='client.? 10.25.25.3:0/4101432770' entity='client.admin'
> cmd=[{"prefix": "osd pool stats", "format": "json"}]: dispatch
> 2018-08-21 21:05:04.042835 7f10d83cc700  0 mon.clb-cph-frpar2-mon-01@0(leader)
> e6 handle_command mon_command({"var": "target_max_bytes", "prefix": "osd
> pool set", "pool": "pool-cache-01", "val": "0"} v 0
> ) v1
> 2018-08-21 21:05:04.042894 7f10d83cc700  0 log_channel(audit) log [INF] :
> from='client.146565481 -' entity='client.admin' cmd=[{"var":
> "target_max_bytes", "prefix": "osd pool set", "pool": "pool-cache-01
> ", "val": "0"}]: dispatch
> 2018-08-21 21:05:04.813217 7f10de377700  0 log_channel(cluster) log [INF]
> : pgmap 6208 pgs: 6208 active+clean; 261 TB 

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-22 Thread David Turner
I swear I remember a thread on the ML that talked about someone having
increased memory usage on their OSDs after upgrading their MONs to Luminous
as well, but I can't seem to find it.  Iirc the problem for them was
resolved when they finished the upgrade to Luminous.  It might not be a bad
idea for you to run `ceph osd pause` to stop all IO on the cluster and get
through the upgrade without worrying about which OSDs are up or not.  When
all of the peering is done for the OSDs and the upgrade is complete, `ceph
osd unpause` and you should be good to go.  That works especially well if
you're in a situation where you can't serve data to clients anyway since
it'll just help things get through faster.  If you haven't incurred down
time yet, then you might want to try something less drastic.
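
In command form, the approach David describes would look roughly like this (note that the pause flags block all client reads and writes, so this only makes sense when the cluster is already unavailable anyway):

$ ceph osd pause               # sets the pauserd,pausewr flags: no client IO
$ # ... upgrade and restart the remaining OSDs, wait for peering to settle ...
$ ceph osd unpause             # clear the flags once health is back to something sane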

On Wed, Aug 22, 2018 at 5:44 PM Adrien Gillard 
wrote:

> Thank you so much for your feedback. It always helps with this kind of
> situation.
>
> We fixed the network issue, added as much RAM and as much swap as possible
> but
> were still far from anything with OOM killer decimating the OSDs which at
> times
> used more than 35GB of memory.
>
> We decided to shut half of the OSDs and let the remaining peer and
> recover. We'll
> see in a few hours.
>
> We'll keep investigating the root cause but nothing really interesting so
> far :/
>
>
> On Wed, Aug 22, 2018 at 10:04 PM Lothar Gesslein 
> wrote:
>
>> Hi Adrien,
>>
>> I don't expect I can fully explain what happened to your cluster, but
>> since you got no other feedback so far I'll try my best.
>>
>> So you have 517 million RADOS objects. Assuming at least 3 copies each
>> for normal replication or 5 "shards" for EC pools, there are somewhere
>> between 1.5 and 2.5 billion objects stored on 168 osds, which gives each
>> osd 9-15 million objects to store and manage.
>>
>> I am not surprised that this could consume a lot of memory for each osd,
>> and needs a lot of bandwidth when the peering osds exchange info about
>> all objects. When peering they basically iterate over all objects to
>> compare metadata and state.
>>
>> I would not try to downgrade the mons again, which AFAIK is not really
>> possible, but rather upgrade all OSDs.
>>
>> As long as you see reasonable progress in the recovery after the memory
>> and bandwidth increase I would give it some time. The most important part
>> in my opinion is that the number of pgs still peering is going down,
>> while all osds remain up.
>>
>> If random osds still go down, I'd try to start only a few osds across
>> systems at a time to keep the number of concurrently peering pgs low and
>> a chance to carefully increase and monitor system and network load.
>>
>> Regards,
>> Lothar
>>
>>
>> On 08/22/2018 08:34 PM, Adrien Gillard wrote:
>> > Some follow-up.
>> >
>> > After doubling the RAM (so 128GB for 12x4TB OSD), all the RAM is still
>> > consumed a bit after the OSD restart.
>> >
>> >
>> > We are considering going through with the update of the OSDs to luminous
>> > (or maybe go back to jewel on the mons...) but the
>> > cluster is in bad shape...
>> >
>> > health: HEALTH_ERR
>> > 5054 pgs are stuck inactive for more than 60 seconds
>> > 73 pgs backfill_wait
>> > 23 pgs backfilling
>> > 835 pgs degraded
>> > 1456 pgs down
>> > 4632 pgs peering
>> > 11 pgs recovering
>> > 11 pgs recovery_wait
>> > 1341 pgs stale
>> > 293 pgs stuck degraded
>> > 5051 pgs stuck inactive
>> > 3 pgs stuck stale
>> > 5719 pgs stuck unclean
>> > 284 pgs stuck undersized
>> > 820 pgs undersized
>> > 4 requests are blocked > 32 sec
>> > recovery 3408607/517873855 objects degraded (0.658%)
>> > recovery 173949/517873855 objects misplaced (0.034%)
>> > recovery 3/67978083 objects unfound (0.000%)
>> > 22 osds down
>> > pauserd,pausewr,nobackfill,norecover,noscrub,nodeep-scrub
>> > flag(s) set
>> >
>> >
>> > On Wed, Aug 22, 2018 at 3:35 PM Adrien Gillard <
>> gillard.adr...@gmail.com>
>> > wrote:
>> >
>> >> Hi everyone,
>> >>
>> >> We have a hard time figuring out a behaviour encountered after
>> upgrading
>> >> the monitors of one of our cluster from Jewel to Luminous yesterday.
>> >>
>> >> The cluster is composed of 14 OSDs hosts (2xE5-2640 v3 and 64 GB of
>> RAM),
>> >> each containing 12x4TB OSD with journals on DC grade SSDs). The
>> cluster was
>> >> installed in Infernalis and upgraded to Jewel. We mainly use it for RBD
>> >> with a cache tier in front of an EC pool.
>> >>
>> >> We started the upgrade from Jewel 10.2.10 to Luminous 12.2.7 yesterday
>> >> afternoon with the monitors and the provisioning of the MGR. We were
>> >> planning on upgrading the OSD servers today.
>> >> All went well until the cluster received the following commands, a few
>> >> hours after the upgrade :
>> >>
>> >> 

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-22 Thread Adrien Gillard
Thank you so much for your feedback. It always helps with this kind of
situation.

We fixed the network issue, added as much RAM and as much swap as possible,
but were still nowhere near stable, with the OOM killer decimating the OSDs,
which at times used more than 35 GB of memory.
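
For completeness, the kind of stopgap used to add temporary swap on CentOS 7 looks roughly like this (a sketch; the size and path are arbitrary):

$ fallocate -l 64G /swapfile       # or dd if=/dev/zero on filesystems without fallocate support
$ chmod 600 /swapfile
$ mkswap /swapfile && swapon /swapfile
$ cat /proc/swaps                  # verify the new swap area is active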

We decided to shut half of the OSDs and let the remaining peer and recover.
We'll
see in a few hours.

We'll keep investigating the root cause but nothing really interesting so
far :/


On Wed, Aug 22, 2018 at 10:04 PM Lothar Gesslein 
wrote:

> Hi Adrien,
>
> I don't expect I can fully explain what happened to your cluster, but
> since you got no other feedback so far I'll try my best.
>
> So you have 517 million RADOS objects. Assuming at least 3 copies each
> for normal replication or 5 "shards" for EC pools, there are somewhere
> between 1.5 and 2.5 billion objects stored on 168 osds, which gives each
> osd 9-15 million objects to store and manage.
>
> I am not surprised that this could consume a lot of memory for each osd,
> and needs a lot of bandwidth when the peering osds exchange info about
> all objects. When peering they basically iterate over all objects to
> compare metadata and state.
>
> I would not try to downgrade the mons again, which AFAIK is not really
> possible, but rather upgrade all OSDs.
>
> As long as you see reasonable progress in the recovery after the memory
> and bandwidth increase I would give it some time. The most important part
> in my opinion is that the number of pgs still peering is going down,
> while all osds remain up.
>
> If random osds still go down, I'd try to start only a few osds across
> systems at a time to keep the number of concurrently peering pgs low and
> a chance to carefully increase and monitor system and network load.
>
> Regards,
> Lothar
>
>
> On 08/22/2018 08:34 PM, Adrien Gillard wrote:
> > Some follow-up.
> >
> > After doubling the RAM (so 128GB for 12x4TB OSD), all the RAM is still
> > consumed a bit after the OSD restart.
> >
> >
> > We are considering going through with the update of the OSDs to luminous
> > (or maybe go back to jewel on the mons...) but the
> > cluster is in bad shape...
> >
> > health: HEALTH_ERR
> > 5054 pgs are stuck inactive for more than 60 seconds
> > 73 pgs backfill_wait
> > 23 pgs backfilling
> > 835 pgs degraded
> > 1456 pgs down
> > 4632 pgs peering
> > 11 pgs recovering
> > 11 pgs recovery_wait
> > 1341 pgs stale
> > 293 pgs stuck degraded
> > 5051 pgs stuck inactive
> > 3 pgs stuck stale
> > 5719 pgs stuck unclean
> > 284 pgs stuck undersized
> > 820 pgs undersized
> > 4 requests are blocked > 32 sec
> > recovery 3408607/517873855 objects degraded (0.658%)
> > recovery 173949/517873855 objects misplaced (0.034%)
> > recovery 3/67978083 objects unfound (0.000%)
> > 22 osds down
> > pauserd,pausewr,nobackfill,norecover,noscrub,nodeep-scrub
> > flag(s) set
> >
> >
> > On Wed, Aug 22, 2018 at 3:35 PM Adrien Gillard  >
> > wrote:
> >
> >> Hi everyone,
> >>
> >> We have a hard time figuring out a behaviour encountered after upgrading
> >> the monitors of one of our cluster from Jewel to Luminous yesterday.
> >>
> >> The cluster is composed of 14 OSDs hosts (2xE5-2640 v3 and 64 GB of
> RAM),
> >> each containing 12x4TB OSD with journals on DC grade SSDs). The cluster
> was
> >> installed in Infernalis and upgraded to Jewel. We mainly use it for RBD
> >> with a cache tier in front of an EC pool.
> >>
> >> We started the upgrade from Jewel 10.2.10 to Luminous 12.2.7 yesterday
> >> afternoon with the monitors and the provisionning of the MGR. We were
> >> planning on upgrading the OSD servers today.
> >> All went well until the cluster received the following commands, a few
> >> hours after the upgrade :
> >>
> >> ceph osd pool set pool-cache-01 target_max_bytes 0
> >> ceph osd pool set pool-cache-02 target_max_bytes 0
> >>
> >> (The context here is that we use this cluster for backups and set
> >> target_max_bytes to 0 to stop the flush and give maximum ressources to
> the
> >> backups IOs)
> >>
> >> Below is an extract from the logs of one of the monitors.
> >> The cluster started exchanging a lot over the cluster network,
> apparently
> >> backfill/recovery and multiple Gbits/s, in an amount which we've never
> seen
> >> before. Due to some network uplinks sub-sizing, the traffic could not be
> >> handled well and this lead to OSDs crashing.
> >>
> >> At the moment we are still investigating the logs to try to understand
> >> what happened exactly. The cluster is down. When we restart the OSDs, PG
> >> start activating and peering but at some point all the RAM and swap of
> the
> >> servers is consumed and OSDs crash.
> >> I tried to set the ovious cluster parameters (osd_recovery_max_active=1,
> >> 

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-22 Thread Lothar Gesslein
Hi Adrien,

I don't expect I can fully explain what happened to your cluster, but
since you got no other feedback so far I'll try my best.

So you have 517 million RADOS objects. Assuming at least 3 copies each
for normal replication or 5 "shards" for EC pools, there are somewhere
between 1.5 and 2.5 billion object copies stored on 168 OSDs, which gives
each OSD 9-15 million objects to store and manage.
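
As a quick back-of-the-envelope check (3x and 5 shards are the assumptions
above, not the actual pool settings):

# ~517 million RADOS objects spread over 168 OSDs
echo $(( 517000000 * 3 / 168 ))   # ~9.2 million copies per OSD with 3x replication
echo $(( 517000000 * 5 / 168 ))   # ~15.4 million per OSD with 5-shard erasure coding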

I am not surprised that this could consume a lot of memory on each OSD
and need a lot of bandwidth when the peering OSDs exchange info about
all objects. When peering, they basically iterate over all objects to
compare metadata and state.

I would not try to downgrade the mons again, which AFAIK is not really
possible, but rather upgrade all OSDs.

As long as you see reasonable progress in the recovery after the memory
and bandwidth increase, I would give it some time. The most important thing,
in my opinion, is that the number of PGs still peering is going down
while all OSDs remain up.

If random OSDs still go down, I'd try to start only a few OSDs across
systems at a time, to keep the number of concurrently peering PGs low and
get a chance to carefully increase and monitor system and network load.
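
A sketch of what that could look like, assuming systemd units named
ceph-osd@<id> (the ids, batch size and pause are arbitrary):

# start a small batch of OSDs and give each one time to join and start peering
for id in 0 1 2; do
    systemctl start ceph-osd@"$id"
    sleep 60
done

# then watch peering progress (and host memory / network) before the next batch
watch -n 10 ceph pg stat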

Regards,
Lothar


On 08/22/2018 08:34 PM, Adrien Gillard wrote:

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-22 Thread Adrien Gillard
Some follow-up.

After doubling the RAM (so 128 GB for 12x 4 TB OSDs), all the RAM is still
consumed shortly after the OSDs restart.


We are considering going through with the upgrade of the OSDs to Luminous
(or maybe going back to Jewel on the mons...) but the cluster is in bad
shape...

health: HEALTH_ERR
5054 pgs are stuck inactive for more than 60 seconds
73 pgs backfill_wait
23 pgs backfilling
835 pgs degraded
1456 pgs down
4632 pgs peering
11 pgs recovering
11 pgs recovery_wait
1341 pgs stale
293 pgs stuck degraded
5051 pgs stuck inactive
3 pgs stuck stale
5719 pgs stuck unclean
284 pgs stuck undersized
820 pgs undersized
4 requests are blocked > 32 sec
recovery 3408607/517873855 objects degraded (0.658%)
recovery 173949/517873855 objects misplaced (0.034%)
recovery 3/67978083 objects unfound (0.000%)
22 osds down
pauserd,pausewr,nobackfill,norecover,noscrub,nodeep-scrub
flag(s) set
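
For anyone following along, the detail behind a summary like this can be
pulled with standard commands (nothing here is specific to our setup):

ceph health detail                  # expands the HEALTH_ERR summary
ceph pg dump_stuck inactive         # list the PGs stuck inactive
ceph pg dump_stuck unclean          # list the PGs stuck unclean
ceph pg dump pgs_brief | grep down  # quick look at the down PGs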


On Wed, Aug 22, 2018 at 3:35 PM Adrien Gillard 
wrote:

> Hi everyone,
>
> We are having a hard time figuring out a behaviour we encountered after
> upgrading the monitors of one of our clusters from Jewel to Luminous yesterday.
>
> The cluster is composed of 14 OSD hosts (2x E5-2640 v3 and 64 GB of RAM),
> each containing 12x 4 TB OSDs with journals on DC-grade SSDs. The cluster was
> installed on Infernalis and upgraded to Jewel. We mainly use it for RBD
> with a cache tier in front of an EC pool.
>
> We started the upgrade from Jewel 10.2.10 to Luminous 12.2.7 yesterday
> afternoon with the monitors and the provisioning of the MGR. We were
> planning on upgrading the OSD servers today.
> All went well until the cluster received the following commands, a few
> hours after the upgrade:
>
> ceph osd pool set pool-cache-01 target_max_bytes 0
> ceph osd pool set pool-cache-02 target_max_bytes 0
>
> (The context here is that we use this cluster for backups and set
> target_max_bytes to 0 to stop the flush and give maximum resources to the
> backup IOs.)
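>
> As a sketch of how flushing gets re-enabled afterwards (the byte value below
> is purely illustrative, not this cluster's actual setting):
>
> ceph osd pool set pool-cache-01 target_max_bytes 2199023255552   # ~2 TiB, example value
> ceph osd pool get pool-cache-01 cache_target_dirty_ratio
> ceph osd pool get pool-cache-01 cache_target_full_ratio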
>
> Below is an extract from the logs of one of the monitors.
> The cluster started exchanging a lot of traffic over the cluster network,
> apparently backfill/recovery at multiple Gbit/s, in an amount we had never
> seen before. Due to some undersized network uplinks, the traffic could not
> be handled well and this led to OSDs crashing.
>
> At the moment we are still investigating the logs to try to understand
> what happened exactly. The cluster is down. When we restart the OSDs, PGs
> start activating and peering, but at some point all the RAM and swap of the
> servers is consumed and the OSDs crash.
> I tried to set the obvious cluster parameters (osd_recovery_max_active=1,
> osd_recovery_threads=1, osd_max_backfills=1) and norecover, nobackfill, but
> when the OSDs start, we still see 4 or 5 Gbit/s over the cluster network.
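>
> For reference, a sketch of one way to apply those settings at runtime (the
> exact method used here may have differed):
>
> ceph tell osd.* injectargs '--osd-recovery-max-active 1 --osd-max-backfills 1'
> ceph osd set norecover
> ceph osd set nobackfill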
>
> We are currently reconfiguring the network to try to remove the bandwidth
> problem and are preparing to add more RAM to the servers.
>
> But still, I have a couple of questions:
>
>  - Has anyone encountered the issue we had with the cache? Or does anyone
> have a possible explanation?
>  - Is it normal to observe so much traffic on the cluster network when the
> PGs are activating / peering with norecover and nobackfill set?
>
> Thank you for reading and for your input!
>
>
> 2018-08-21 21:05:02.766969 7f10de377700  0 log_channel(cluster) log [INF]
> : pgmap 6208 pgs: 6208 active+clean; 261 TB data, 385 TB used, 225 TB / 611
> TB avail; 298 MB/s rd, 212 MB/s wr, 456 op/s
> 2018-08-21 21:05:03.805124 7f10de377700  0 log_channel(cluster) log [INF]
> : pgmap 6208 pgs: 6208 active+clean; 261 TB data, 385 TB used, 225 TB / 611
> TB avail; 296 MB/s rd, 193 MB/s wr, 433 op/s
> 2018-08-21 21:05:03.870690 7f10d83cc700  0 mon.clb-cph-frpar2-mon-01@0(leader)
> e6 handle_command mon_command({"var": "target_max_bytes", "prefix": "osd
> pool set", "pool": "pool-cache-02", "val": "0"} v 0
> ) v1
> 2018-08-21 21:05:03.870749 7f10d83cc700  0 log_channel(audit) log [INF] :
> from='client.146529236 -' entity='client.admin' cmd=[{"var":
> "target_max_bytes", "prefix": "osd pool set", "pool": "pool-cache-02
> ", "val": "0"}]: dispatch
> 2018-08-21 21:05:03.913663 7f10d83cc700  0 mon.clb-cph-frpar2-mon-01@0(leader)
> e6 handle_command mon_command({"prefix": "osd pool stats", "format":
> "json"} v 0) v1
> 2018-08-21 21:05:03.913745 7f10d83cc700  0 log_channel(audit) log [DBG] :
> from='client.? 10.25.25.3:0/4101432770' entity='client.admin'
> cmd=[{"prefix": "osd pool stats", "format": "json"}]: dispatch
> 2018-08-21 21:05:04.042835 7f10d83cc700  0 mon.clb-cph-frpar2-mon-01@0(leader)
> e6 handle_command mon_command({"var": "target_max_bytes", "prefix": "osd
> pool set", "pool": "pool-cache-01", "val": "0"} v 0
> ) v1
> 2018-08-21 21:05:04.042894