Hello Lukas, Unfortunately, I'm all out of ideas at the moment. There are some memory profiling techniques which can help identify what is causing the memory utilization, but it's a bit beyond what I typically work on. Others on the list may have experience with this (or otherwise have ideas) and may chip in...
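For reference, the memory profiling mentioned here is typically done with Ceph's built-in tcmalloc heap profiler, driven through 'ceph tell'. A rough sketch only, assuming osd.10 is the suspect OSD and that ceph-osd is linked against tcmalloc (the usual default); the dump file name and path below are illustrative and will vary:

    # start the heap profiler on the suspect OSD
    ceph tell osd.10 heap start_profiler

    # once memory has grown, write a heap dump and print a short summary
    ceph tell osd.10 heap dump
    ceph tell osd.10 heap stats

    # stop profiling when finished
    ceph tell osd.10 heap stop_profiler

    # inspect the dump with the gperftools pprof tool (example path)
    google-pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.10.profile.0001.heap

The profiler adds some overhead of its own, so it is usually left running only long enough to capture one growth episode.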
Wish I could be more help.. Michael J. Kidd Sr. Storage Consultant Inktank Professional Services - by Red Hat On Thu, Oct 30, 2014 at 11:00 AM, Lukáš Kubín <lukas.ku...@gmail.com> wrote: > Thanks Michael, still no luck. > > Taking the problematic OSD.10 down has no effect. Within minutes, more > OSDs fail with the same issue after consuming ~50GB of memory. Also, I can see > two of those cache-tier OSDs on separate hosts which remain at almost > 200% CPU utilization all the time. > > I've upgraded the whole cluster to 0.80.7. It did not help. > > I have also tried to unset the norecover+nobackfill flags to let the > recovery complete. No luck; several OSDs fail with the same issue, > preventing the recovery from completing. I've performed your fix steps from the > start again and currently I'm past the "unset noin" step. > > I could temporarily get some of the pools to a state with no degraded objects. > Then (within minutes) some OSD fails and it's degraded again. > > I have also tried to let the OSD processes get restarted automatically to > keep them up as much as possible. > > I'm considering disabling the tiering pool "volumes-cache", as that's something > I can do without:
>
> pool name         category     KB           objects      clones    degraded
> backups           -            0            0            0         0
> data              -            0            0            0         0
> images            -            777989590    95027        0         8883
> metadata          -            0            0            0         0
> rbd               -            0            0            0         0
> volumes           -            115608693    25965        179       3307
> volumes-cache     -            649577103    16708730     9894      1144650
>
> Can I just switch it into forward mode and let it empty > (cache-flush-evict-all) to see if that changes anything? > > Could you or any of your colleagues provide anything else to try? > > Thank you, > > Lukas > > > On Thu, Oct 30, 2014 at 3:05 PM, Michael J. Kidd <michael.k...@inktank.com > > wrote: > >> Hello Lukas, >> The 'slow request' logs are expected while the cluster is in such a >> state.. the OSD processes simply aren't able to respond quickly to client >> IO requests. >> >> I would recommend trying to recover without the most problematic disk ( >> seems to be OSD.10? ).. Simply shut it down and see if the other OSDs >> settle down. You should also take a look at the kernel logs for any >> indications of a problem with the disks themselves, or possibly do an FIO >> test against the drive with the OSD shut down (to a file on the OSD >> filesystem, not the raw drive.. this would be destructive). >> >> Also, you could upgrade to 0.80.7. There are some bug fixes, but I'm not >> sure if any would specifically help this situation.. not likely to hurt, >> though. >> >> The desired state is for the cluster to be steady-state before the next >> move (unsetting the next flag). Hopefully this can be achieved without >> needing to take down OSDs on multiple hosts. >> >> I'm also unsure about the cache tiering and how it could relate to the >> load being seen. >> >> Hope this helps... >> >> Michael J. Kidd >> Sr.
Storage Consultant >> Inktank Professional Services >> - by Red Hat >> >> On Thu, Oct 30, 2014 at 4:00 AM, Lukáš Kubín <lukas.ku...@gmail.com> >> wrote: >> >>> Hi, >>> I've noticed that the following messages always accumulate in the OSD log before >>> it exhausts all memory:
>>>
>>> 2014-10-30 08:48:42.994190 7f80a2019700 0 log [WRN] : slow request 38.901192 seconds old, received at 2014-10-30 08:48:04.092889: osd_op(osd.29.3076:207644827 rbd_data.2e4ee3ba663be.000000000000363b@17 [copy-get max 8388608] 7.af87e887 ack+read+ignore_cache+ignore_overlay+map_snap_clone e3359) v4 currently reached pg
>>>
>>> Note this is always from the most frequently failing osd.10 (sata tier), >>> referring to osd.29 (ssd cache tier). That osd.29 is consuming huge CPU and >>> memory resources, but keeps running without failures. >>> >>> Could this be e.g. a bug? Or some erroneous I/O request which initiated >>> this behaviour? Can I e.g. attempt to upgrade Ceph to a more recent >>> release while the cluster is in its current unhealthy state? Can I e.g. try >>> disabling the caching tier? Or just somehow evacuate the problematic OSD? >>> >>> I'd welcome any ideas. Currently, I'm keeping osd.10 in an >>> automatic restart loop with a 60-second pause before starting it again. >>> >>> Thanks and greetings, >>> >>> Lukas >>> >>> On Wed, Oct 29, 2014 at 8:04 PM, Lukáš Kubín <lukas.ku...@gmail.com> >>> wrote: >>> >>>> I should have figured that out myself since I did that recently. Thanks. >>>> >>>> Unfortunately, I'm still at the step "ceph osd unset noin". After >>>> setting all the OSDs in, the original issue reappears, preventing me from >>>> proceeding with recovery. It now appears mostly at a single OSD - osd.10 - which >>>> consumes ~200% CPU and all memory within 45 seconds and is then killed by Linux:
>>>>
>>>> Oct 29 18:24:38 q09 kernel: Out of memory: Kill process 17202 (ceph-osd) score 912 or sacrifice child
>>>> Oct 29 18:24:38 q09 kernel: Killed process 17202, UID 0, (ceph-osd) total-vm:62713176kB, anon-rss:62009772kB, file-rss:328kB
>>>>
>>>> I've tried to restart it several times with the same result. Similar >>>> situation with OSDs 0 and 13. >>>> >>>> Also, I've noticed one of the SSD cache tier's OSDs - osd.29 - generating high >>>> CPU utilization, around 180%. >>>> >>>> All the problematic OSDs have been the same ones all the time - OSDs >>>> 0, 8, 10, 13 and 29 - they are the ones which I found to be down this morning.
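Regarding the question earlier in the thread about switching the cache tier to forward mode and emptying it: the usual sequence is roughly the one sketched below, using the pool names from this thread (volumes as the base pool, volumes-cache as the cache tier). This is only a sketch, not validated against this cluster's current state, and flushing roughly 16 million cached objects will itself generate significant load:

    # stop caching new writes; client IO is forwarded to the base pool
    ceph osd tier cache-mode volumes-cache forward

    # flush dirty objects and evict everything from the cache pool
    rados -p volumes-cache cache-flush-evict-all

    # only once the cache pool is empty, optionally detach it completely
    ceph osd tier remove-overlay volumes
    ceph osd tier remove volumes volumes-cache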
>>>> >>>> There is some minor load coming from client - Openstack instances, I >>>> preferred not to kill them: >>>> >>>> [root@q04 ceph-recovery]# ceph -s >>>> cluster ec433b4a-9dc0-4d08-bde4-f1657b1fdb99 >>>> health HEALTH_ERR 31 pgs backfill; 241 pgs degraded; 62 pgs down; >>>> 193 pgs incomplete; 13 pgs inconsistent; 62 pgs peering; 12 pgs recovering; >>>> 205 pgs recovery_wait; 93 pgs stuck inactive; 608 pgs stuck unclean; 381138 >>>> requests are blocked > 32 sec; recovery 1162468/35207488 objects degraded >>>> (3.302%); 466/17112963 unfound (0.003%); 13 scrub errors; 1/34 in osds are >>>> down; nobackfill,norecover,noscrub,nodeep-scrub flag(s) set >>>> monmap e2: 3 mons at {q03= >>>> 10.255.253.33:6789/0,q04=10.255.253.34:6789/0,q05=10.255.253.35:6789/0}, >>>> election epoch 92, quorum 0,1,2 q03,q04,q05 >>>> osdmap e2782: 34 osds: 33 up, 34 in >>>> flags nobackfill,norecover,noscrub,nodeep-scrub >>>> pgmap v7440374: 5632 pgs, 7 pools, 1449 GB data, 16711 kobjects >>>> 3148 GB used, 15010 GB / 18158 GB avail >>>> 1162468/35207488 objects degraded (3.302%); 466/17112963 >>>> unfound (0.003%) >>>> 13 active >>>> 22 active+recovery_wait+remapped >>>> 1 active+recovery_wait+inconsistent >>>> 4794 active+clean >>>> 193 incomplete >>>> 62 down+peering >>>> 9 active+degraded+remapped+wait_backfill >>>> 182 active+recovery_wait >>>> 74 active+remapped >>>> 12 active+recovering >>>> 12 active+clean+inconsistent >>>> 22 active+remapped+wait_backfill >>>> 4 active+clean+replay >>>> 232 active+degraded >>>> client io 0 B/s rd, 1048 kB/s wr, 184 op/s >>>> >>>> >>>> Below I'm sending the requested output. >>>> >>>> Do you have any other ideas how to recover from this? >>>> >>>> Thanks a lot. >>>> >>>> Lukas >>>> >>>> >>>> >>>> >>>> [root@q04 ceph-recovery]# ceph osd crush rule dump >>>> [ >>>> { "rule_id": 0, >>>> "rule_name": "replicated_ruleset", >>>> "ruleset": 0, >>>> "type": 1, >>>> "min_size": 1, >>>> "max_size": 10, >>>> "steps": [ >>>> { "op": "take", >>>> "item": -1, >>>> "item_name": "default"}, >>>> { "op": "chooseleaf_firstn", >>>> "num": 0, >>>> "type": "host"}, >>>> { "op": "emit"}]}, >>>> { "rule_id": 1, >>>> "rule_name": "ssd", >>>> "ruleset": 1, >>>> "type": 1, >>>> "min_size": 1, >>>> "max_size": 10, >>>> "steps": [ >>>> { "op": "take", >>>> "item": -5, >>>> "item_name": "ssd"}, >>>> { "op": "chooseleaf_firstn", >>>> "num": 0, >>>> "type": "host"}, >>>> { "op": "emit"}]}, >>>> { "rule_id": 2, >>>> "rule_name": "sata", >>>> "ruleset": 2, >>>> "type": 1, >>>> "min_size": 1, >>>> "max_size": 10, >>>> "steps": [ >>>> { "op": "take", >>>> "item": -4, >>>> "item_name": "sata"}, >>>> { "op": "chooseleaf_firstn", >>>> "num": 0, >>>> "type": "host"}, >>>> { "op": "emit"}]}] >>>> >>>> [root@q04 ceph-recovery]# ceph osd dump | grep pool >>>> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 2 object_hash >>>> rjenkins pg_num 512 pgp_num 512 last_change 630 flags hashpspool >>>> crash_replay_interval 45 stripe_width 0 >>>> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 2 >>>> object_hash rjenkins pg_num 512 pgp_num 512 last_change 632 flags >>>> hashpspool stripe_width 0 >>>> pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 2 object_hash >>>> rjenkins pg_num 512 pgp_num 512 last_change 634 flags hashpspool >>>> stripe_width 0 >>>> pool 7 'volumes' replicated size 2 min_size 2 crush_ruleset 0 >>>> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 1517 flags >>>> hashpspool tiers 14 read_tier 14 write_tier 14 stripe_width 0 >>>> pool 8 'images' replicated 
size 2 min_size 2 crush_ruleset 0 >>>> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 1519 flags >>>> hashpspool stripe_width 0 >>>> pool 12 'backups' replicated size 2 min_size 1 crush_ruleset 0 >>>> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 862 flags >>>> hashpspool stripe_width 0 >>>> pool 14 'volumes-cache' replicated size 2 min_size 1 crush_ruleset 1 >>>> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 1517 flags >>>> hashpspool tier_of 7 cache_mode writeback target_bytes 1000000000000 >>>> hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} >>>> 3600s x1 stripe_width 0 >>>> >>>> On Wed, Oct 29, 2014 at 6:43 PM, Michael J. Kidd < >>>> michael.k...@inktank.com> wrote: >>>> >>>>> Ah, sorry... since they were set out manually, they'll need to be set >>>>> in manually.. >>>>> >>>>> for i in $(ceph osd tree | grep osd | awk '{print $3}'); do ceph osd >>>>> in $i; done >>>>> >>>>> >>>>> >>>>> Michael J. Kidd >>>>> Sr. Storage Consultant >>>>> Inktank Professional Services >>>>> - by Red Hat >>>>> >>>>> On Wed, Oct 29, 2014 at 12:33 PM, Lukáš Kubín <lukas.ku...@gmail.com> >>>>> wrote: >>>>> >>>>>> I've ended up at the step "ceph osd unset noin". My OSDs are up, but not >>>>>> in, even after an hour: >>>>>> >>>>>> [root@q04 ceph-recovery]# ceph osd stat >>>>>> osdmap e2602: 34 osds: 34 up, 0 in >>>>>> flags nobackfill,norecover,noscrub,nodeep-scrub >>>>>> >>>>>> >>>>>> There seems to be no activity generated by the OSD processes; >>>>>> occasionally they show 0.3%, which I believe is just some basic >>>>>> communication processing. No load on the network interfaces. >>>>>> >>>>>> Is there some other step needed to bring the OSDs in? >>>>>> >>>>>> Thank you. >>>>>> >>>>>> Lukas >>>>>> >>>>>> On Wed, Oct 29, 2014 at 3:58 PM, Michael J. Kidd < >>>>>> michael.k...@inktank.com> wrote: >>>>>> >>>>>>> Hello Lukas, >>>>>>> Please try the following process for getting all your OSDs up and >>>>>>> operational... >>>>>>> >>>>>>> * Set the following flags: noup, noin, noscrub, nodeep-scrub, >>>>>>> norecover, nobackfill >>>>>>> for i in noup noin noscrub nodeep-scrub norecover nobackfill; do >>>>>>> ceph osd set $i; done >>>>>>> >>>>>>> * Stop all OSDs (I know, this seems counterproductive) >>>>>>> * Set all OSDs down / out >>>>>>> for i in $(ceph osd tree | grep osd | awk '{print $3}'); do ceph osd >>>>>>> down $i; ceph osd out $i; done >>>>>>> * Set recovery / backfill throttles as well as heartbeat and OSD map >>>>>>> processing tweaks in the /etc/ceph/ceph.conf file under the [osd] >>>>>>> section: >>>>>>> [osd] >>>>>>> osd_max_backfills = 1 >>>>>>> osd_recovery_max_active = 1 >>>>>>> osd_recovery_max_single_start = 1 >>>>>>> osd_backfill_scan_min = 8 >>>>>>> osd_heartbeat_interval = 36 >>>>>>> osd_heartbeat_grace = 240 >>>>>>> osd_map_message_max = 1000 >>>>>>> osd_map_cache_size = 3136 >>>>>>> >>>>>>> * Start all OSDs >>>>>>> * Monitor 'top' for 0% CPU on all OSD processes.. it may take a >>>>>>> while.. I usually issue 'top' and then the keys M c >>>>>>> - M = Sort by memory usage >>>>>>> - c = Show command arguments >>>>>>> - This makes it easy to monitor the OSD processes and see which OSDs >>>>>>> have settled, etc.. >>>>>>> * Once all OSDs have hit 0% CPU utilization, remove the 'noup' flag >>>>>>> - ceph osd unset noup >>>>>>> * Again, wait for 0% CPU utilization (may be immediate, may take a >>>>>>> while..
just gotta wait) >>>>>>> * Once all OSDs have hit 0% CPU again, remove the 'noin' flag >>>>>>> - ceph osd unset noin >>>>>>> - All OSDs should now appear up/in, and will go through peering.. >>>>>>> * Once ceph -s shows no further activity, and OSDs are back at 0% >>>>>>> CPU again, unset 'nobackfill' >>>>>>> - ceph osd unset nobackfill >>>>>>> * Once ceph -s shows no further activity, and OSDs are back at 0% >>>>>>> CPU again, unset 'norecover' >>>>>>> - ceph osd unset norecover >>>>>>> * Monitor OSD memory usage... some OSDs may get killed off again, >>>>>>> but their subsequent restart should consume less memory and allow more >>>>>>> recovery to occur between each step above.. and ultimately, hopefully... >>>>>>> your entire cluster will come back online and be usable. >>>>>>> >>>>>>> ## Clean-up: >>>>>>> * Remove all of the above set options from ceph.conf >>>>>>> * Reset the running OSDs to their defaults: >>>>>>> ceph tell osd.\* injectargs '--osd_max_backfills 10 >>>>>>> --osd_recovery_max_active 15 --osd_recovery_max_single_start 5 >>>>>>> --osd_backfill_scan_min 64 --osd_heartbeat_interval 6 >>>>>>> --osd_heartbeat_grace >>>>>>> 36 --osd_map_message_max 100 --osd_map_cache_size 500' >>>>>>> * Unset the noscrub and nodeep-scrub flags: >>>>>>> - ceph osd unset noscrub >>>>>>> - ceph osd unset nodeep-scrub >>>>>>> >>>>>>> ## For help identifying why memory usage was so high, please provide: >>>>>>> * ceph osd dump | grep pool >>>>>>> * ceph osd crush rule dump >>>>>>> >>>>>>> Let us know if this helps... I know it looks extreme, but it's >>>>>>> worked for me in the past.. >>>>>>> >>>>>>> Michael J. Kidd >>>>>>> Sr. Storage Consultant >>>>>>> Inktank Professional Services >>>>>>> - by Red Hat >>>>>>> >>>>>>> On Wed, Oct 29, 2014 at 8:51 AM, Lukáš Kubín <lukas.ku...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hello, >>>>>>>> I've found my ceph v0.80.3 cluster in a state with 5 of 34 OSDs >>>>>>>> being down through the night after months of running without change. From >>>>>>>> the Linux >>>>>>>> logs I found out that the OSD processes were killed because they consumed >>>>>>>> all >>>>>>>> available memory. >>>>>>>> >>>>>>>> Those 5 failed OSDs were from different hosts of my 4-node cluster >>>>>>>> (see below). Two hosts act as an SSD cache tier for some of my pools. The >>>>>>>> other >>>>>>>> two hosts are the default rotational-drive storage. >>>>>>>> >>>>>>>> After checking that Linux was not out of memory, I attempted to >>>>>>>> restart those failed OSDs. Most of those OSD daemons exhausted all memory
Most of those OSD daemon exhaust all memory >>>>>>>> in >>>>>>>> seconds and got killed by Linux again: >>>>>>>> >>>>>>>> Oct 28 22:16:34 q07 kernel: Out of memory: Kill process 24207 >>>>>>>> (ceph-osd) score 867 or sacrifice child >>>>>>>> Oct 28 22:16:34 q07 kernel: Killed process 24207, UID 0, (ceph-osd) >>>>>>>> total-vm:59974412kB, anon-rss:59076880kB, file-rss:512kB >>>>>>>> >>>>>>>> >>>>>>>> On the host I've found lots of similar "slow request" messages >>>>>>>> preceding the crash: >>>>>>>> >>>>>>>> 2014-10-28 22:11:20.885527 7f25f84d1700 0 log [WRN] : slow request >>>>>>>> 31.117125 seconds old, received at 2014-10-28 22:10:49.768291: >>>>>>>> osd_sub_op(client.168752.0:2197931 14.2c7 >>>>>>>> 888596c7/rbd_data.293272f8695e4.000000000000006f/head//14 [] v >>>>>>>> 1551'377417 >>>>>>>> snapset=0=[]:[] snapc=0=[]) v10 currently no flag points reached >>>>>>>> 2014-10-28 22:11:21.885668 7f25f84d1700 0 log [WRN] : 67 slow >>>>>>>> requests, 1 included below; oldest blocked for > 9879.304770 secs >>>>>>>> >>>>>>>> >>>>>>>> Apparently I can't get the cluster fixed by restarting the OSDs all >>>>>>>> over again. Is there any other option then? >>>>>>>> >>>>>>>> Thank you. >>>>>>>> >>>>>>>> Lukas Kubin >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> [root@q04 ~]# ceph -s >>>>>>>> cluster ec433b4a-9dc0-4d08-bde4-f1657b1fdb99 >>>>>>>> health HEALTH_ERR 9 pgs backfill; 1 pgs backfilling; 521 pgs >>>>>>>> degraded; 425 pgs incomplete; 13 pgs inconsistent; 20 pgs recovering; >>>>>>>> 50 >>>>>>>> pgs recovery_wait; 151 pgs stale; 425 pgs stuck inactive; 151 pgs stuck >>>>>>>> stale; 1164 pgs stuck unclean; 12070270 requests are blocked > 32 sec; >>>>>>>> recovery 887322/35206223 objects degraded (2.520%); 119/17131232 >>>>>>>> unfound >>>>>>>> (0.001%); 13 scrub errors >>>>>>>> monmap e2: 3 mons at {q03= >>>>>>>> 10.255.253.33:6789/0,q04=10.255.253.34:6789/0,q05=10.255.253.35:6789/0}, >>>>>>>> election epoch 90, quorum 0,1,2 q03,q04,q05 >>>>>>>> osdmap e2194: 34 osds: 31 up, 31 in >>>>>>>> pgmap v7429812: 5632 pgs, 7 pools, 1446 GB data, 16729 >>>>>>>> kobjects >>>>>>>> 2915 GB used, 12449 GB / 15365 GB avail >>>>>>>> 887322/35206223 objects degraded (2.520%); 119/17131232 >>>>>>>> unfound (0.001%) >>>>>>>> 38 active+recovery_wait+remapped >>>>>>>> 4455 active+clean >>>>>>>> 65 stale+incomplete >>>>>>>> 3 active+recovering+remapped >>>>>>>> 359 incomplete >>>>>>>> 12 active+recovery_wait >>>>>>>> 139 active+remapped >>>>>>>> 86 stale+active+degraded >>>>>>>> 16 active+recovering >>>>>>>> 1 active+remapped+backfilling >>>>>>>> 13 active+clean+inconsistent >>>>>>>> 9 active+remapped+wait_backfill >>>>>>>> 434 active+degraded >>>>>>>> 1 remapped+incomplete >>>>>>>> 1 active+recovering+degraded+remapped >>>>>>>> client io 0 B/s rd, 469 kB/s wr, 48 op/s >>>>>>>> >>>>>>>> [root@q04 ~]# ceph osd tree >>>>>>>> # id weight type name up/down reweight >>>>>>>> -5 3.24 root ssd >>>>>>>> -6 1.62 host q06 >>>>>>>> 16 0.18 osd.16 up 1 >>>>>>>> 17 0.18 osd.17 up 1 >>>>>>>> 18 0.18 osd.18 up 1 >>>>>>>> 19 0.18 osd.19 up 1 >>>>>>>> 20 0.18 osd.20 up 1 >>>>>>>> 21 0.18 osd.21 up 1 >>>>>>>> 22 0.18 osd.22 up 1 >>>>>>>> 23 0.18 osd.23 up 1 >>>>>>>> 24 0.18 osd.24 up 1 >>>>>>>> -7 1.62 host q07 >>>>>>>> 25 0.18 osd.25 up 1 >>>>>>>> 26 0.18 osd.26 up 1 >>>>>>>> 27 0.18 osd.27 up 1 >>>>>>>> 28 0.18 osd.28 up 1 >>>>>>>> 29 0.18 osd.29 up 1 >>>>>>>> 30 0.18 osd.30 up 1 >>>>>>>> 31 0.18 osd.31 up 1 >>>>>>>> 32 0.18 osd.32 up 1 >>>>>>>> 33 0.18 osd.33 up 1 >>>>>>>> -1 14.56 root default >>>>>>>> -4 14.56 root sata >>>>>>>> -2 7.28 
host q08 >>>>>>>> 0 0.91 osd.0 up 1 >>>>>>>> 1 0.91 osd.1 up 1 >>>>>>>> 2 0.91 osd.2 up 1 >>>>>>>> 3 0.91 osd.3 up 1 >>>>>>>> 11 0.91 osd.11 up 1 >>>>>>>> 12 0.91 osd.12 up 1 >>>>>>>> 13 0.91 osd.13 down 0 >>>>>>>> 14 0.91 osd.14 up 1 >>>>>>>> -3 7.28 host q09 >>>>>>>> 4 0.91 osd.4 up 1 >>>>>>>> 5 0.91 osd.5 up 1 >>>>>>>> 6 0.91 osd.6 up 1 >>>>>>>> 7 0.91 osd.7 up 1 >>>>>>>> 8 0.91 osd.8 down 0 >>>>>>>> 9 0.91 osd.9 up 1 >>>>>>>> 10 0.91 osd.10 down 0 >>>>>>>> 15 0.91 osd.15 up 1 >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> ceph-users mailing list >>>>>>>> ceph-users@lists.ceph.com >>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
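A side note on the recurring "slow request ... currently reached pg" entries quoted above: blocked and recently completed ops can usually be inspected live through the OSD admin socket, which sometimes shows which step an op is stuck on. A rough sketch, run on the host carrying the OSD; osd.10 and the socket path are just the example id and the usual default location:

    # ops currently blocked inside the OSD
    ceph daemon osd.10 dump_ops_in_flight

    # recently completed slow ops, with per-event timestamps
    ceph daemon osd.10 dump_historic_ops

    # equivalent form via the admin socket path, if 'ceph daemon' is unavailable
    ceph --admin-daemon /var/run/ceph/ceph-osd.10.asok dump_historic_ops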
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com