The issue is finally resolved.
Upgrading to Luminous was the way to go. Unfortunately, we did not set
'ceph osd require-osd-release luminous' immediately, so we did not
activate right away the Luminous functionality that ended up saving us.
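
For reference, the sequence looks roughly like this (a sketch, assuming
every daemon already reports a 12.2.x version):

 $ ceph versions                              # confirm all daemons run luminous
 $ ceph osd require-osd-release luminous
 $ ceph osd dump | grep require_osd_release   # should now report 'luminous'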

I think the new mechanisms to manage and prune past intervals[1]
allowed the OSDs to start without consuming enormous amounts of memory
(around 1.5GB for the majority, up to 10GB for a few).
The PGs took the better part of the day to peer and activate, and the
cluster eventually ended up in HEALTH_OK. We are running as many scrubs
as possible over the weekend to detect inconsistencies that could have
appeared with the flapping.
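
Something along these lines queues the scrubs and surfaces any
inconsistencies (a sketch; <pool> and <pgid> are placeholders):

 $ for osd in $(ceph osd ls); do ceph osd deep-scrub "$osd"; done
 $ ceph health detail | grep -i inconsistent
 $ rados list-inconsistent-pg <pool>
 $ rados list-inconsistent-obj <pgid> --format=json-pretty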

Thank you again for your help!

[1] https://ceph.com/community/new-luminous-pg-overdose-protection/
On Thu, Aug 23, 2018 at 10:04 PM Adrien Gillard
<gillard.adr...@gmail.com> wrote:
>
> Sending back, forgot the plain text for ceph-devel.
>
> Sorry about that.
>
> On Thu, Aug 23, 2018 at 9:57 PM Adrien Gillard <gillard.adr...@gmail.com> 
> wrote:
> >
> > We are running CentOS 7.5 with upstream Ceph packages, no remote syslog, 
> > just default local logging.
> >
> > After looking a bit deeper into pprof, --alloc_space seems to represent all
> > allocations that happened since the program started, which goes along with
> > the quick deallocation of the memory. --inuse_space (the default) shows what
> > is still allocated at collection time. std::_Rb_tree::_M_copy uses the most
> > memory here, but I have other samples (well, most of the others, actually)
> > where ceph::logging::Log::create_entry has the lead.
> > I will look at the admin socket.
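> >
> > For completeness, the two views come from the same heap dump, e.g. (a quick
> > sketch, using the dump file from below):
> >
> >  $ pprof --inuse_space /usr/bin/ceph-osd osd.36.profile.0042-first.heap  # memory still held at dump time
> >  $ pprof --alloc_space /usr/bin/ceph-osd osd.36.profile.0042-first.heap  # everything allocated since start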
> >
> > I'm adding ceph-devel, as someone might have a clue as to how to better
> > interpret these results.
> >
> > Thank you for keeping up with this :)
> >
> >
> > On Thu, Aug 23, 2018 at 6:58 PM Gregory Farnum <gfar...@redhat.com> wrote:
> >>
> >> On Thu, Aug 23, 2018 at 8:42 AM Adrien Gillard <gillard.adr...@gmail.com> 
> >> wrote:
> >>>
> >>> With a bit of profiling, it seems all the memory is attributed to
> >>> ceph::logging::Log::create_entry (see below).
> >>>
> >>> Is this normal? Is it because some OSDs are down and it logs the
> >>> results of its osd_ping?
> >>
> >>
> >> Hmm, is that where the memory is actually *staying* or was it merely 
> >> allocated and then quickly deallocated? If you have debugging turned on 
> >> the volume of log output can be quite large, but it's a small amount kept 
> >> resident in memory at a time. Given that you don't seem to have any logs 
> >> over level 5 (except monc at 10) that does seem like a very strange ratio, 
> >> though.
> >>
> >> Generally log messages are put into a circular buffer and then removed, 
> >> and if the buffer gets full the thread trying to log just gets blocked. 
> >> I'm not sure how you could get very large arbitrary amounts of data like 
> >> that. Are you using syslog or something? I don't think that should do it, 
> >> but that's all I can imagine.
> >> That or somehow the upgrade has broken the way the OSD releases memory 
> >> back to the OS. Have you looked at the heap stats available through the 
> >> admin socket? What distro are you running and what's the source of your 
> >> packages?
> >> -Greg
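> >>
> >> For reference, the heap stats mentioned above can be pulled like this (a
> >> sketch, assuming a tcmalloc build and osd.36 as in the profiles below):
> >>
> >>  $ ceph daemon osd.36 heap stats     # or: ceph tell osd.36 heap stats
> >>  $ ceph daemon osd.36 heap release   # ask tcmalloc to return freed pages to the OS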
> >>
> >>>
> >>>
> >>> The debug level of the OSD is below also.
> >>>
> >>> Thanks,
> >>>
> >>> Adrien
> >>>
> >>>  $ pprof /usr/bin/ceph-osd osd.36.profile.0042-first.heap
> >>> Using local file /usr/bin/ceph-osd.
> >>> Using local file osd.36.profile.0042-first.heap.
> >>> Welcome to pprof!  For help, type 'help'.
> >>> (pprof) top
> >>> Total: 2468.0 MB
> >>>    519.5  21.0%  21.0%   1000.4  40.5% std::_Rb_tree::_M_copy
> >>>    481.0  19.5%  40.5%    481.0  19.5% std::_Rb_tree::_M_create_node
> >>>    384.3  15.6%  56.1%    384.3  15.6% std::_Rb_tree::_M_emplace_hint_unique
> >>>    374.0  15.2%  71.3%    374.0  15.2% ceph::buffer::create_aligned_in_mempool
> >>>    305.6  12.4%  83.6%    305.6  12.4% ceph::logging::Log::create_entry
> >>>    217.3   8.8%  92.4%    220.2   8.9% std::vector::_M_emplace_back_aux
> >>>    114.4   4.6%  97.1%    115.6   4.7% ceph::buffer::list::append@a51900
> >>>     21.4   0.9%  98.0%    210.0   8.5% OSD::heartbeat
> >>>     21.2   0.9%  98.8%     21.2   0.9% std::string::_Rep::_S_create
> >>>      4.7   0.2%  99.0%    266.2  10.8% AsyncConnection::send_message
> >>> (pprof)
> >>>
> >>>
> >>>  $ pprof /usr/bin/ceph-osd --alloc_space osd.36.profile.0042-first.heap
> >>> Using local file /usr/bin/ceph-osd.
> >>> Using local file osd.36.profile.0042.heap.
> >>> Welcome to pprof!  For help, type 'help'.
> >>> (pprof) top
> >>> Total: 16519.3 MB
> >>>  11915.7  72.1%  72.1%  11915.7  72.1% ceph::logging::Log::create_entry
> >>>    745.0   4.5%  76.6%    745.0   4.5% std::string::_Rep::_S_create
> >>>    716.9   4.3%  81.0%    716.9   4.3% ceph::buffer::create_aligned_in_mempool
> >>>    700.2   4.2%  85.2%    703.8   4.3% std::vector::_M_emplace_back_aux
> >>>    671.9   4.1%  89.3%    671.9   4.1% std::_Rb_tree::_M_emplace_hint_unique
> >>>    557.8   3.4%  92.7%   1075.2   6.5% std::_Rb_tree::_M_copy
> >>>    517.4   3.1%  95.8%    517.4   3.1% std::_Rb_tree::_M_create_node
> >>>    468.2   2.8%  98.6%    469.6   2.8% ceph::buffer::list::append@a51900
> >>>    122.8   0.7%  99.4%    122.8   0.7% ceph::buffer::list::append@a51d20
> >>>     33.3   0.2%  99.6%   7042.7  42.6% OSD::heartbeat
> >>> (pprof)
> >>>
> >>>
> >>>  $ ceph --admin-daemon /var/run/ceph/ceph-osd.36.asok config show | grep debug
> >>>     "debug_none": "0/5",
> >>>     "debug_lockdep": "0/1",
> >>>     "debug_context": "0/1",
> >>>     "debug_crush": "1/1",
> >>>     "debug_mds": "1/5",
> >>>     "debug_mds_balancer": "1/5",
> >>>     "debug_mds_locker": "1/5",
> >>>     "debug_mds_log": "1/5",
> >>>     "debug_mds_log_expire": "1/5",
> >>>     "debug_mds_migrator": "1/5",
> >>>     "debug_buffer": "0/1",
> >>>     "debug_timer": "0/1",
> >>>     "debug_filer": "0/1",
> >>>     "debug_striper": "0/1",
> >>>     "debug_objecter": "0/1",
> >>>     "debug_rados": "0/5",
> >>>     "debug_rbd": "0/5",
> >>>     "debug_rbd_mirror": "0/5",
> >>>     "debug_rbd_replay": "0/5",
> >>>     "debug_journaler": "0/5",
> >>>     "debug_objectcacher": "0/5",
> >>>     "debug_client": "0/5",
> >>>     "debug_osd": "0/0",
> >>>     "debug_optracker": "0/5",
> >>>     "debug_objclass": "0/5",
> >>>     "debug_filestore": "0/0",
> >>>     "debug_journal": "0/0",
> >>>     "debug_ms": "0/5",
> >>>     "debug_mon": "1/5",
> >>>     "debug_monc": "0/10",
> >>>     "debug_paxos": "1/5",
> >>>     "debug_tp": "0/5",
> >>>     "debug_auth": "1/5",
> >>>     "debug_crypto": "1/5",
> >>>     "debug_finisher": "1/1",
> >>>     "debug_reserver": "1/1",
> >>>     "debug_heartbeatmap": "1/5",
> >>>     "debug_perfcounter": "1/5",
> >>>     "debug_rgw": "1/5",
> >>>     "debug_civetweb": "1/10",
> >>>     "debug_javaclient": "1/5",
> >>>     "debug_asok": "1/5",
> >>>     "debug_throttle": "1/1",
> >>>     "debug_refs": "0/0",
> >>>     "debug_xio": "1/5",
> >>>     "debug_compressor": "1/5",
> >>>     "debug_bluestore": "1/5",
> >>>     "debug_bluefs": "1/5",
> >>>     "debug_bdev": "1/3",
> >>>     "debug_kstore": "1/5",
> >>>     "debug_rocksdb": "4/5",
> >>>     "debug_leveldb": "4/5",
> >>>     "debug_memdb": "4/5",
> >>>     "debug_kinetic": "1/5",
> >>>     "debug_fuse": "1/5",
> >>>     "debug_mgr": "1/5",
> >>>     "debug_mgrc": "1/5",
> >>>     "debug_dpdk": "1/5",
> >>>     "debug_eventtrace": "1/5",
> >>>
> >>>
> >>> On Thu, Aug 23, 2018 at 4:27 PM Adrien Gillard <gillard.adr...@gmail.com> 
> >>> wrote:
> >>>>
> >>>> After upgrading to Luminous, we see the exact same behaviour, with OSDs
> >>>> eating as much as 80-90 GB of memory.
> >>>>
> >>>> We'll try some memory profiling, but at this point we're a bit lost. Are
> >>>> there any specific logs that could help us?
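> >>>>
> >>>> For reference, the tcmalloc heap profiler can be driven through the OSDs
> >>>> like this (a sketch; osd.36 is just an example, and the dumps end up next
> >>>> to the OSD logs):
> >>>>
> >>>>  $ ceph tell osd.36 heap start_profiler
> >>>>  $ ceph tell osd.36 heap dump           # writes osd.36.profile.NNNN.heap
> >>>>  $ ceph tell osd.36 heap stop_profiler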
> >>>>
> >>>> On Thu, Aug 23, 2018 at 2:34 PM Adrien Gillard 
> >>>> <gillard.adr...@gmail.com> wrote:
> >>>>>
> >>>>> Well, after a few hours, still nothing new in the behaviour. With half
> >>>>> of the OSDs (so 6 per host) up and peering and the nodown flag set to
> >>>>> limit the creation of new maps, all the memory is consumed and OSDs get
> >>>>> killed by the OOM killer.
> >>>>> We observe a lot of threads being created for each OSD (roughly
> >>>>> between 500 and 900). As there are around 200 PGs per OSD, does this
> >>>>> seem like a lot to you? We see tons being created each second, most
> >>>>> probably to replace the threads that time out or disappear with the OSD
> >>>>> processes.
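> >>>>>
> >>>>> For reference, per-OSD thread counts can be checked with plain /proc,
> >>>>> something like:
> >>>>>
> >>>>>  $ for pid in $(pidof ceph-osd); do echo "$pid: $(grep Threads /proc/$pid/status)"; done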
> >>>>>
> >>>>> We are preparing to upgrade the OSDs to Luminous.
> >>>>>
> >>>>> On Thu, Aug 23, 2018 at 4:05 AM Gregory Farnum <gfar...@redhat.com> 
> >>>>> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Aug 22, 2018 at 6:02 PM Adrien Gillard 
> >>>>>> <gillard.adr...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> We 'paused' the cluster early in our investigation to avoid 
> >>>>>>> unnecessary IO.
> >>>>>>> We also set the nodown flag but the OOM rate was really sustained and 
> >>>>>>> we got servers that stopped responding from time to time, so we 
> >>>>>>> decided to lower the number of OSDs up and let them peer.
> >>>>>>
> >>>>>>
> >>>>>> Right but this often also means that they need to deal with changing 
> >>>>>> intervals or trying to set up recovery and things, plus the change in 
> >>>>>> which OSDs are up creates new maps that need to be processed. If you 
> >>>>>> set “nodown” you can incrementally bring up OSDs while doing a better 
> >>>>>> job of throttling the map creation process and preventing a bunch of 
> >>>>>> new IO when/if they OOM.
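> >>>>>>
> >>>>>> Concretely, something along these lines (a sketch, not commands validated
> >>>>>> for this exact situation): set the flags once, then start OSDs one host at
> >>>>>> a time and let peering settle before the next batch.
> >>>>>>
> >>>>>>  $ ceph osd set nodown
> >>>>>>  $ ceph osd set norecover
> >>>>>>  $ ceph osd set nobackfill
> >>>>>>  $ systemctl start ceph-osd@36   # one OSD (or one host) at a time, then watch 'ceph -s'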
> >>>>>>
> >>>>>>
> >>>>>>> I don't know if it is the best solution but it is the middle of the 
> >>>>>>> night here, so we'll see tomorrow morning and ponder the completion 
> >>>>>>> of the upgrade.
> >>>>>>>
> >>>>>>> I agree with you, Greg, it does not make sense. The cluster was
> >>>>>>> running great for around 6 hours after the mon upgrade...
> >>>>>>>
> >>>>>>> Finally, regarding the cache tiering, I know it is not used a lot, nor
> >>>>>>> will it be maintained over time. One of the main goals of the Luminous
> >>>>>>> upgrade is to remove it completely and switch to EC RBD.
> >>>>>>>
> >>>>>>> Thank you for your input!
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Aug 22, 2018 at 11:57 PM Gregory Farnum <gfar...@redhat.com> 
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> On Wed, Aug 22, 2018 at 6:35 AM Adrien Gillard 
> >>>>>>>> <gillard.adr...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi everyone,
> >>>>>>>>>
> >>>>>>>>> We have a hard time figuring out a behaviour encountered after
> >>>>>>>>> upgrading the monitors of one of our clusters from Jewel to Luminous
> >>>>>>>>> yesterday.
> >>>>>>>>>
> >>>>>>>>> The cluster is composed of 14 OSD hosts (2x E5-2640 v3 and 64 GB of
> >>>>>>>>> RAM), each containing 12x 4 TB OSDs with journals on DC-grade SSDs.
> >>>>>>>>> The cluster was installed with Infernalis and upgraded to Jewel. We
> >>>>>>>>> mainly use it for RBD with a cache tier in front of an EC pool.
> >>>>>>>>>
> >>>>>>>>> We started the upgrade from Jewel 10.2.10 to Luminous 12.2.7
> >>>>>>>>> yesterday afternoon with the monitors and the provisioning of the
> >>>>>>>>> MGR. We were planning on upgrading the OSD servers today.
> >>>>>>>>> All went well until the cluster received the following commands, a
> >>>>>>>>> few hours after the upgrade:
> >>>>>>>>>
> >>>>>>>>> ceph osd pool set pool-cache-01 target_max_bytes 0
> >>>>>>>>> ceph osd pool set pool-cache-02 target_max_bytes 0
> >>>>>>>>>
> >>>>>>>>> (The context here is that we use this cluster for backups and set
> >>>>>>>>> target_max_bytes to 0 to stop the flush and give maximum resources
> >>>>>>>>> to the backup IOs.)
> >>>>>>>>>
> >>>>>>>>> Below is an extract from the logs of one of the monitors.
> >>>>>>>>> The cluster started exchanging a lot over the cluster network,
> >>>>>>>>> apparently backfill/recovery at multiple Gbit/s, in an amount
> >>>>>>>>> we had never seen before. Due to some undersized network uplinks,
> >>>>>>>>> the traffic could not be handled well and this led to OSDs
> >>>>>>>>> crashing.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> This really doesn't make a ton of sense — you're saying that by 
> >>>>>>>> telling the cluster to *stop* doing flushing IO, everything went 
> >>>>>>>> bad? In terms of root causing the issue you'll obviously want to 
> >>>>>>>> focus on that, although it may or may not be helpful in recovering 
> >>>>>>>> things at this point.
> >>>>>>>>
> >>>>>>>> Do keep in mind that users running cache-tiering are increasingly 
> >>>>>>>> uncommon, and while we still test it we aren't expanding that 
> >>>>>>>> coverage and would really like to remove and replace that 
> >>>>>>>> functionality, so you're out on the far edge of things here. :/
> >>>>>>>>
> >>>>>>>> Finally, if you're still experiencing OOM issues, you may want to 
> >>>>>>>> set the "nodown" flag — that will prevent OSDs being marked down, 
> >>>>>>>> which obviously will stall IO and peering if they go down anyway, 
> >>>>>>>> but will prevent spamming map updates that cause the cluster to OOM 
> >>>>>>>> again.
> >>>>>>>> -Greg
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> At the moment we are still investigating the logs to try to
> >>>>>>>>> understand what happened exactly. The cluster is down. When we
> >>>>>>>>> restart the OSDs, PGs start activating and peering, but at some point
> >>>>>>>>> all the RAM and swap of the servers is consumed and the OSDs crash.
> >>>>>>>>> I tried to set the obvious cluster parameters
> >>>>>>>>> (osd_recovery_max_active=1, osd_recovery_threads=1,
> >>>>>>>>> osd_max_backfills=1) and norecover, nobackfill, but when the OSDs
> >>>>>>>>> start, we still see 4 or 5 Gbit/s over the cluster network.
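> >>>>>>>>>
> >>>>>>>>> For reference, a sketch of how those settings can be applied at
> >>>>>>>>> runtime, with the values mentioned above:
> >>>>>>>>>
> >>>>>>>>>  $ ceph tell osd.* injectargs '--osd_recovery_max_active=1 --osd_max_backfills=1'
> >>>>>>>>>  $ ceph osd set norecover
> >>>>>>>>>  $ ceph osd set nobackfill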
> >>>>>>>>>
> >>>>>>>>> We are currently reconfiguring the network to try to remove the
> >>>>>>>>> bandwidth problem and are preparing to add more RAM to the servers.
> >>>>>>>>>
> >>>>>>>>> But still, I have a couple of questions:
> >>>>>>>>>
> >>>>>>>>>  - Has anyone encountered the issue we have with the cache, or does
> >>>>>>>>> anyone have a possible explanation?
> >>>>>>>>>  - Is it normal to observe so much traffic on the cluster network
> >>>>>>>>> when the PGs are activating/peering with norecover and nobackfill
> >>>>>>>>> set?
> >>>>>>>>>
> >>>>>>>>> Thank you for reading and for your input!
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> 2018-08-21 21:05:02.766969 7f10de377700  0 log_channel(cluster) log 
> >>>>>>>>> [INF] : pgmap 6208 pgs: 6208 active+clean; 261 TB data, 385 TB 
> >>>>>>>>> used, 225 TB / 611 TB avail; 298 MB/s rd, 212 MB/s wr, 456 op/s
> >>>>>>>>> 2018-08-21 21:05:03.805124 7f10de377700  0 log_channel(cluster) log 
> >>>>>>>>> [INF] : pgmap 6208 pgs: 6208 active+clean; 261 TB data, 385 TB 
> >>>>>>>>> used, 225 TB / 611 TB avail; 296 MB/s rd, 193 MB/s wr, 433 op/s
> >>>>>>>>> 2018-08-21 21:05:03.870690 7f10d83cc700  0 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader) e6 handle_command 
> >>>>>>>>> mon_command({"var": "target_max_bytes", "prefix": "osd pool set", 
> >>>>>>>>> "pool": "pool-cache-02", "val": "0"} v 0
> >>>>>>>>> ) v1
> >>>>>>>>> 2018-08-21 21:05:03.870749 7f10d83cc700  0 log_channel(audit) log 
> >>>>>>>>> [INF] : from='client.146529236 -' entity='client.admin' 
> >>>>>>>>> cmd=[{"var": "target_max_bytes", "prefix": "osd pool set", "pool": 
> >>>>>>>>> "pool-cache-02
> >>>>>>>>> ", "val": "0"}]: dispatch
> >>>>>>>>> 2018-08-21 21:05:03.913663 7f10d83cc700  0 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader) e6 handle_command 
> >>>>>>>>> mon_command({"prefix": "osd pool stats", "format": "json"} v 0) v1
> >>>>>>>>> 2018-08-21 21:05:03.913745 7f10d83cc700  0 log_channel(audit) log 
> >>>>>>>>> [DBG] : from='client.? 10.25.25.3:0/4101432770' 
> >>>>>>>>> entity='client.admin' cmd=[{"prefix": "osd pool stats", "format": 
> >>>>>>>>> "json"}]: dispatch
> >>>>>>>>> 2018-08-21 21:05:04.042835 7f10d83cc700  0 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader) e6 handle_command 
> >>>>>>>>> mon_command({"var": "target_max_bytes", "prefix": "osd pool set", 
> >>>>>>>>> "pool": "pool-cache-01", "val": "0"} v 0
> >>>>>>>>> ) v1
> >>>>>>>>> 2018-08-21 21:05:04.042894 7f10d83cc700  0 log_channel(audit) log 
> >>>>>>>>> [INF] : from='client.146565481 -' entity='client.admin' 
> >>>>>>>>> cmd=[{"var": "target_max_bytes", "prefix": "osd pool set", "pool": 
> >>>>>>>>> "pool-cache-01
> >>>>>>>>> ", "val": "0"}]: dispatch
> >>>>>>>>> 2018-08-21 21:05:04.813217 7f10de377700  0 log_channel(cluster) log 
> >>>>>>>>> [INF] : pgmap 6208 pgs: 6208 active+clean; 261 TB data, 385 TB 
> >>>>>>>>> used, 225 TB / 611 TB avail; 271 MB/s rd, 181 MB/s wr, 415 op/s
> >>>>>>>>> 2018-08-21 21:05:04.821486 7f10de377700  1 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader).osd e114693 e114693: 168 total, 
> >>>>>>>>> 168 up, 168 in
> >>>>>>>>> 2018-08-21 21:05:04.821711 7f10de377700  0 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader).osd e114693 crush map has 
> >>>>>>>>> features 432365734501548032, adjusting msgr requires
> >>>>>>>>> 2018-08-21 21:05:04.823032 7f10de377700  1 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader).osd e114693 create_pending 
> >>>>>>>>> setting backfillfull_ratio = 0.9
> >>>>>>>>> 2018-08-21 21:05:04.823156 7f10de377700  0 log_channel(audit) log 
> >>>>>>>>> [INF] : from='client.146529236 -' entity='client.admin' 
> >>>>>>>>> cmd='[{"var": "target_max_bytes", "prefix": "osd pool set", "pool": 
> >>>>>>>>> "pool-cache-0
> >>>>>>>>> 2", "val": "0"}]': finished
> >>>>>>>>> 2018-08-21 21:05:04.823280 7f10de377700  0 log_channel(audit) log 
> >>>>>>>>> [INF] : from='client.146565481 -' entity='client.admin' 
> >>>>>>>>> cmd='[{"var": "target_max_bytes", "prefix": "osd pool set", "pool": 
> >>>>>>>>> "pool-cache-0
> >>>>>>>>> 1", "val": "0"}]': finished
> >>>>>>>>> 2018-08-21 21:05:04.825528 7f10de377700  0 log_channel(cluster) log 
> >>>>>>>>> [DBG] : osdmap e114693: 168 total, 168 up, 168 in
> >>>>>>>>> 2018-08-21 21:05:04.844932 7f10de377700  0 log_channel(cluster) log 
> >>>>>>>>> [INF] : pgmap 6208 pgs: 6208 active+clean; 261 TB data, 385 TB 
> >>>>>>>>> used, 225 TB / 611 TB avail; 275 MB/s rd, 192 MB/s wr, 429 op/s
> >>>>>>>>> 2018-08-21 21:05:05.873682 7f10de377700  0 log_channel(cluster) log 
> >>>>>>>>> [INF] : pgmap 6208 pgs: 187 peering, 6021 active+clean; 261 TB 
> >>>>>>>>> data, 385 TB used, 225 TB / 611 TB avail; 220 MB/s rd, 188 MB/s wr, 
> >>>>>>>>> 383
> >>>>>>>>> op/s
> >>>>>>>>> 2018-08-21 21:05:05.895414 7f10de377700  1 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader).osd e114694 e114694: 168 total, 
> >>>>>>>>> 168 up, 168 in
> >>>>>>>>> 2018-08-21 21:05:05.895721 7f10de377700  0 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader).osd e114694 crush map has 
> >>>>>>>>> features 432365734501548032, adjusting msgr requires
> >>>>>>>>> 2018-08-21 21:05:05.905473 7f10de377700  1 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader).osd e114694 create_pending 
> >>>>>>>>> setting backfillfull_ratio = 0.9
> >>>>>>>>> 2018-08-21 21:05:05.940729 7f10de377700  0 log_channel(cluster) log 
> >>>>>>>>> [DBG] : osdmap e114694: 168 total, 168 up, 168 in
> >>>>>>>>> 2018-08-21 21:05:05.947921 7f10de377700  0 log_channel(cluster) log 
> >>>>>>>>> [INF] : pgmap 6208 pgs: 204 peering, 6004 active+clean; 261 TB 
> >>>>>>>>> data, 385 TB used, 225 TB / 611 TB avail; 201 MB/s rd, 200 MB/s wr, 
> >>>>>>>>> 355
> >>>>>>>>> op/s
> >>>>>>>>> 2018-08-21 21:05:06.912752 7f10de377700  1 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader).osd e114695 e114695: 168 total, 
> >>>>>>>>> 168 up, 168 in
> >>>>>>>>> 2018-08-21 21:05:06.913104 7f10de377700  0 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader).osd e114695 crush map has 
> >>>>>>>>> features 432365734501548032, adjusting msgr requires
> >>>>>>>>> 2018-08-21 21:05:06.928521 7f10de377700  1 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader).osd e114695 create_pending 
> >>>>>>>>> setting backfillfull_ratio = 0.9
> >>>>>>>>> 2018-08-21 21:05:06.968793 7f10de377700  0 log_channel(cluster) log 
> >>>>>>>>> [DBG] : osdmap e114695: 168 total, 168 up, 168 in
> >>>>>>>>> 2018-08-21 21:05:07.001112 7f10d93ce700  1 heartbeat_map 
> >>>>>>>>> reset_timeout 'Monitor::cpu_tp thread 0x7f10d93ce700' had timed out 
> >>>>>>>>> after 0
> >>>>>>>>> 2018-08-21 21:05:07.004274 7f10de377700  0 log_channel(cluster) log 
> >>>>>>>>> [INF] : pgmap 6208 pgs: 12 unknown, 114 remapped+peering, 5 
> >>>>>>>>> activating, 481 peering, 5596 active+clean; 261 TB data, 385 TB 
> >>>>>>>>> used, 225 T
> >>>>>>>>> B / 611 TB avail; 177 MB/s rd, 173 MB/s wr, 317 op/s
> >>>>>>>>> 2018-08-21 21:05:07.025200 7f10d8bcd700  1 heartbeat_map 
> >>>>>>>>> reset_timeout 'Monitor::cpu_tp thread 0x7f10d8bcd700' had timed out 
> >>>>>>>>> after 0
> >>>>>>>>> 2018-08-21 21:05:07.948138 7f10de377700  1 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader).osd e114696 e114696: 168 total, 
> >>>>>>>>> 168 up, 168 in
> >>>>>>>>> 2018-08-21 21:05:07.948487 7f10de377700  0 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader).osd e114696 crush map has 
> >>>>>>>>> features 432365734501548032, adjusting msgr requires
> >>>>>>>>> 2018-08-21 21:05:07.965671 7f10de377700  1 
> >>>>>>>>> mon.clb-cph-frpar2-mon-01@0(leader).osd e114696 create_pending 
> >>>>>>>>> setting backfillfull_ratio = 0.9
> >>>>>>>>> 2018-08-21 21:05:08.007585 7f10de377700  0 log_channel(cluster) log 
> >>>>>>>>> [DBG] : osdmap e114696: 168 total, 168 up, 168 in
> >>>>>>>>> 2018-08-21 21:05:08.052153 7f10de377700  0 log_channel(cluster) log 
> >>>>>>>>> [INF] : pgmap 6208 pgs: 46 activating+remapped, 15 unknown, 305 
> >>>>>>>>> remapped+peering, 11 activating, 697 peering, 8 
> >>>>>>>>> active+remapped+backfil
> >>>>>>>>> ling, 5126 active+clean; 261 TB data, 385 TB used, 225 TB / 611 TB 
> >>>>>>>>> avail; 115 MB/s rd, 182 MB/s wr, 292 op/s; 26681/519327262 objects 
> >>>>>>>>> misplaced (0.005%); 4856 kB/s, 3 objects/s recovering
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> ceph-users mailing list
> >>>>>>>>> ceph-users@lists.ceph.com
> >>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com