It happened right after I installed the ceph cluster and started it. I have not done
anything else with it.
I use one monitor per server; that is to say, there are 10 monitors in the ceph
cluster.
I tried to reduce the number of monitors, but the problem remained.
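(For reference, reducing the monitor count can be done roughly like this; a minimal sketch, assuming the monitor names 1-10 from the monmap below:)

    # stop the extra monitor daemon on its host, then remove it from the monmap
    service ceph stop mon.10
    ceph mon remove 10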


Here is the output of some commands.


1. command: ceph -s

cluster 084265e5-1dc0-46ca-912b-81b9b0127461
     health HEALTH_WARN 374 pgs degraded; 11053 pgs down; 6805 pgs incomplete; 
51729 pgs peering; 62838 pgs stale; 187069 pgs stuck inactive; 62838 pgs stuck 
stale; 187153 pgs stuck unclean; 45 requests are blocked > 32 sec; 1/15 in osds 
are down
     monmap e1: 10 mons at 
{1=122.228.248.168:6789/0,10=122.228.248.177:6789/0,2=122.228.248.169:6789/0,3=122.228.248.170:6789/0,4=122.228.248.171:6789/0,5=122.228.248.172:6789/0,6=122.228.248.173:6789/0,7=122.228.248.174:6789/0,8=122.228.248.175:6789/0,9=122.228.248.176:6789/0},
 election epoch 110, quorum 0,1,2,3,4,5,6,7,8,9 1,2,3,4,5,6,7,8,9,10
     mdsmap e46: 1/1/1 up {0=1=up:creating}, 9 up:standby
     osdmap e786: 50 osds: 14 up, 15 in
      pgmap v1293: 193152 pgs, 3 pools, 0 bytes data, 0 objects
            16997 MB used, 44673 GB / 44690 GB avail
              122757 creating
                   2 stale+active+degraded+remapped
                6480 peering
                6496 stale+incomplete
                  89 stale+replay+degraded
               10022 stale+down+peering
                   5 stale+active+replay+degraded
                 152 stale+remapped
                   1 stale+active+remapped
                 293 incomplete
                1858 stale+remapped+peering
                3802 stale+replay
                 774 down+peering
                 201 stale+degraded
                3892 stale+active+clean+replay
                  76 stale+active+degraded
                  10 remapped+peering
                  16 stale+remapped+incomplete
                1525 stale
                   1 stale+replay+degraded+remapped
                 257 stale+down+remapped+peering
                2107 stale+active+clean
               32328 stale+peering
                   8 stale+replay+remapped


2. command: ceph osd tree
# id    weight  type name           up/down  reweight
-1      50      root default
-3      50        rack unknownrack
-2      5           host yun177
1001    1             osd.1001      down     0
1002    1             osd.1002      down     0
1003    1             osd.1003      down     0
1004    1             osd.1004      down     0
1005    1             osd.1005      down     0
-4      5           host yun168
101     1             osd.101       down     0
102     1             osd.102       down     0
103     1             osd.103       down     0
104     1             osd.104       down     0
105     1             osd.105       down     0
-5      5           host yun169
201     1             osd.201       down     0
202     1             osd.202       down     0
203     1             osd.203       down     1
204     1             osd.204       up       1
205     1             osd.205       down     0
-6      5           host yun170
301     1             osd.301       down     0
302     1             osd.302       down     0
303     1             osd.303       up       1
304     1             osd.304       down     0
305     1             osd.305       down     0
-7      5           host yun171
401     1             osd.401       down     0
402     1             osd.402       down     0
403     1             osd.403       up       1
404     1             osd.404       up       1
405     1             osd.405       down     0
-8      5           host yun172
501     1             osd.501       down     0
502     1             osd.502       down     0
503     1             osd.503       down     0
504     1             osd.504       down     0
505     1             osd.505       up       1
-9      5           host yun173
601     1             osd.601       down     0
602     1             osd.602       down     0
603     1             osd.603       down     0
604     1             osd.604       down     0
605     1             osd.605       down     0
-10     5           host yun174
701     1             osd.701       down     0
702     1             osd.702       down     0
703     1             osd.703       down     0
704     1             osd.704       down     0
705     1             osd.705       down     0
-11     5           host yun175
801     1             osd.801       down     0
802     1             osd.802       up       1
803     1             osd.803       up       1
804     1             osd.804       up       1
805     1             osd.805       up       1
-12     5           host yun176
901     1             osd.901       up       1
902     1             osd.902       up       1
903     1             osd.903       up       1
904     1             osd.904       up       1
905     1             osd.905       up       1


3. command: ceph health detail
HEALTH_WARN 374 pgs degraded; 11053 pgs down; 6805 pgs incomplete; 51729 pgs 
peering; 62838 pgs stale; 187069 pgs stuck inactive; 62838 pgs stuck stale; 
187153 pgs stuck unclean; 45 requests are blocked > 32 sec; 10 osds have slow 
requests; 1/15 in osds are down
pg 0.fb7f is stuck inactive since forever, current state down+peering, last 
acting [404]
pg 1.fb7e is stuck inactive since forever, current state stale+down+peering, 
last acting [202]
pg 2.fb7d is stuck inactive since forever, current state creating, last acting 
[204,902,505]
pg 0.fb7e is stuck inactive since forever, current state stale+peering, last 
acting [704,1004,504]
pg 1.fb7f is stuck inactive since forever, current state creating, last acting 
[404,204,804]
pg 2.fb7c is stuck inactive since forever, current state creating, last acting 
[805,901,404]
pg 0.fb7d is stuck inactive since forever, current state creating, last acting 
[903,204,803]
pg 1.fb7c is stuck inactive since forever, current state creating, last acting 
[802,905,505]
pg 2.fb7f is stuck inactive since forever, current state creating, last acting 
[804,902]
pg 1.fb7d is stuck inactive since forever, current state creating, last acting 
[803,404,204]
pg 2.fb7e is stuck inactive since forever, current state creating, last acting 
[404,905,802]
pg 0.fb7c is stuck inactive since forever, current state stale+peering, last 
acting [904,405]
pg 2.fb79 is stuck inactive since forever, current state creating, last acting 
[804,901]
pg 1.fb7a is stuck inactive since forever, current state creating, last acting 
[903,505,803]
pg 0.fb7b is stuck inactive since forever, current state stale+peering, last 
acting [801,503]
pg 2.fb78 is stuck inactive since forever, current state creating, last acting 
[903,404]
pg 1.fb7b is stuck inactive since forever, current state creating, last acting 
[505,803,303]
pg 0.fb7a is stuck inactive since forever, current state 
stale+remapped+peering, last acting [1003]
pg 2.fb7b is stuck inactive since forever, current state creating, last acting 
[303,505,802]
pg 1.fb78 is stuck inactive since forever, current state creating, last acting 
[803,303,905]
pg 0.fb79 is stuck inactive since forever, current state creating, last acting 
[901,403,805]
pg 0.fb78 is stuck inactive since forever, current state creating, last acting 
[404,901]
pg 2.fb7a is stuck inactive since forever, current state creating, last acting 
[403,303,805]
pg 1.fb79 is stuck inactive since forever, current state creating, last acting 
[803,901,404]
pg 0.fb77 is stuck inactive for 24155.756030, current state stale+peering, last 
acting [101,1005]
pg 1.fb76 is stuck inactive since forever, current state creating, last acting 
[901,505,403]
pg 2.fb75 is stuck inactive since forever, current state creating, last acting 
[905,403,204]
pg 1.fb77 is stuck inactive since forever, current state creating, last acting 
[905,805,204]
pg 2.fb74 is stuck inactive since forever, current state creating, last acting 
[901,404]
pg 0.fb75 is stuck inactive since forever, current state creating, last acting 
[903,403]
pg 1.fb74 is stuck inactive since forever, current state creating, last acting 
[901,204,403]
pg 2.fb77 is stuck inactive since forever, current state creating, last acting 
[505,802,905]
pg 0.fb74 is stuck inactive for 24042.660267, current state stale+incomplete, 
last acting [101]
pg 1.fb75 is stuck inactive since forever, current state creating, last acting 
[905,403,804]





At 2014-06-19 04:31:09, "Craig Lewis" <[email protected]> wrote:

I haven't seen behavior like that.  I have seen my OSDs use a lot of RAM while 
they're doing a recovery, but it goes back down when they're done.


Your OSD is doing something; it's using 126% CPU. What do `ceph osd tree` and 
`ceph health detail` say?




When you say you're installing Ceph on 10 servers, are you running a monitor on 
all 10 servers?







On Wed, Jun 18, 2014 at 4:18 AM, wsnote <[email protected]> wrote:

If I install ceph on 10 servers with one disk per server, the problem remains.
This is the memory usage of ceph-osd:
ceph-osd VIRT: 10.2G, RES: 4.2G
The memory usage of ceph-osd is far too high!
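(If it helps to see where the memory goes, the OSD heap can be inspected; a minimal sketch, assuming the default tcmalloc allocator and osd.805 as an example daemon:)

    # ask one running OSD to report its heap usage (requires tcmalloc)
    ceph tell osd.805 heap stats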



At 2014-06-18 16:51:02, wsnote <[email protected]> wrote:

Hi, Lewis!
I have run into a problem that I don't know how to solve, so I'm asking you for help.
I can successfully install ceph on a cluster of 3 or 4 servers, but I fail to do it 
with 10 servers.
After I install and start it, the memory usage on one of the servers rises to 100% 
and that server crashes, so I have to restart it.
All of the configs are the same. I don't know what the problem is.
Can you give me some suggestions?
Thanks!

ceph.conf:
[global]
        auth supported = none


        ;auth_service_required = cephx
        ;auth_client_required = cephx
        ;auth_cluster_required = cephx
        filestore_xattr_use_omap = true


        max open files = 131072
        log file = /var/log/ceph/$name.log
        pid file = /var/run/ceph/$name.pid
        keyring = /etc/ceph/keyring.admin
        
        ;mon_clock_drift_allowed = 1 ;clock skew detected


[mon]
        mon data = /data/mon$id
        keyring = /etc/ceph/keyring.$name
[mds]
        mds data = /data/mds$id
        keyring = /etc/ceph/keyring.$name
[osd]
        osd data = /data/osd$id
        osd journal = /data/osd$id/journal
        osd journal size = 1024
        keyring = /etc/ceph/keyring.$name
        osd mkfs type = xfs    
        osd mount options xfs = rw,noatime
        osd mkfs options xfs = -f
        filestore fiemap = false


On every server there is one mds, one mon, and 11 osds with 4 TB of space each.
The mon address is a public IP, and each osd has both a public IP and a cluster IP.
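(For reference, the public/cluster split is configured along these lines; a minimal sketch with example subnets only, not the exact ranges used here:)

[global]
        ; example subnets -- substitute the real public and cluster ranges
        public network = 122.228.248.0/24
        cluster network = 10.0.0.0/24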


wsnote




_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
