This happens right after I install the Ceph cluster and start it; I have not done
anything else with it.
I use one monitor per server, that is to say, there are 10 monitors in the Ceph
cluster.
I tried reducing the number of monitors, but the problem remained.
Here is the output of some commands.
1. command: ceph -s
cluster 084265e5-1dc0-46ca-912b-81b9b0127461
health HEALTH_WARN 374 pgs degraded; 11053 pgs down; 6805 pgs incomplete;
51729 pgs peering; 62838 pgs stale; 187069 pgs stuck inactive; 62838 pgs stuck
stale; 187153 pgs stuck unclean; 45 requests are blocked > 32 sec; 1/15 in osds
are down
monmap e1: 10 mons at
{1=122.228.248.168:6789/0,10=122.228.248.177:6789/0,2=122.228.248.169:6789/0,3=122.228.248.170:6789/0,4=122.228.248.171:6789/0,5=122.228.248.172:6789/0,6=122.228.248.173:6789/0,7=122.228.248.174:6789/0,8=122.228.248.175:6789/0,9=122.228.248.176:6789/0},
election epoch 110, quorum 0,1,2,3,4,5,6,7,8,9 1,2,3,4,5,6,7,8,9,10
mdsmap e46: 1/1/1 up {0=1=up:creating}, 9 up:standby
osdmap e786: 50 osds: 14 up, 15 in
pgmap v1293: 193152 pgs, 3 pools, 0 bytes data, 0 objects
16997 MB used, 44673 GB / 44690 GB avail
122757 creating
2 stale+active+degraded+remapped
6480 peering
6496 stale+incomplete
89 stale+replay+degraded
10022 stale+down+peering
5 stale+active+replay+degraded
152 stale+remapped
1 stale+active+remapped
293 incomplete
1858 stale+remapped+peering
3802 stale+replay
774 down+peering
201 stale+degraded
3892 stale+active+clean+replay
76 stale+active+degraded
10 remapped+peering
16 stale+remapped+incomplete
1525 stale
1 stale+replay+degraded+remapped
257 stale+down+remapped+peering
2107 stale+active+clean
32328 stale+peering
8 stale+replay+remapped
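As a sanity check, the per-state counts above do sum to the pgmap total of 193152 pgs. With the state lines saved to a file (pg-states.txt is a hypothetical name for the listing above), the total can be verified with:

```shell
# sum the leading count of each "<count> <state>" line pasted from `ceph -s`
# (pg-states.txt is a hypothetical file holding the state lines above)
awk '{sum += $1} END {print sum}' pg-states.txt
```

For the listing above this prints 193152, matching the pgmap line, so no pgs are unaccounted for.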
2. command: ceph osd tree
# id    weight  type name         up/down  reweight
-1      50      root default
-3      50      rack unknownrack
-2      5       host yun177
1001    1       osd.1001          down     0
1002    1       osd.1002          down     0
1003    1       osd.1003          down     0
1004    1       osd.1004          down     0
1005    1       osd.1005          down     0
-4      5       host yun168
101     1       osd.101           down     0
102     1       osd.102           down     0
103     1       osd.103           down     0
104     1       osd.104           down     0
105     1       osd.105           down     0
-5      5       host yun169
201     1       osd.201           down     0
202     1       osd.202           down     0
203     1       osd.203           down     1
204     1       osd.204           up       1
205     1       osd.205           down     0
-6      5       host yun170
301     1       osd.301           down     0
302     1       osd.302           down     0
303     1       osd.303           up       1
304     1       osd.304           down     0
305     1       osd.305           down     0
-7      5       host yun171
401     1       osd.401           down     0
402     1       osd.402           down     0
403     1       osd.403           up       1
404     1       osd.404           up       1
405     1       osd.405           down     0
-8      5       host yun172
501     1       osd.501           down     0
502     1       osd.502           down     0
503     1       osd.503           down     0
504     1       osd.504           down     0
505     1       osd.505           up       1
-9      5       host yun173
601     1       osd.601           down     0
602     1       osd.602           down     0
603     1       osd.603           down     0
604     1       osd.604           down     0
605     1       osd.605           down     0
-10     5       host yun174
701     1       osd.701           down     0
702     1       osd.702           down     0
703     1       osd.703           down     0
704     1       osd.704           down     0
705     1       osd.705           down     0
-11     5       host yun175
801     1       osd.801           down     0
802     1       osd.802           up       1
803     1       osd.803           up       1
804     1       osd.804           up       1
805     1       osd.805           up       1
-12     5       host yun176
901     1       osd.901           up       1
902     1       osd.902           up       1
903     1       osd.903           up       1
904     1       osd.904           up       1
905     1       osd.905           up       1
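Tallying the up/down column of the tree above agrees with the "14 up" reported by `ceph -s` (osd.203 is down but has reweight 1, which accounts for "15 in"). A quick sketch against a saved dump (osd-tree.txt is a hypothetical file name):

```shell
# count up vs. down OSDs in a saved `ceph osd tree` dump
# (osd-tree.txt is a hypothetical file holding the tree output above)
awk '$3 ~ /^osd\./ {count[$4]++} END {for (s in count) print s, count[s]}' osd-tree.txt
```

For the tree above this gives 14 up and 36 down out of 50 OSDs.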
3. command: ceph health detail
HEALTH_WARN 374 pgs degraded; 11053 pgs down; 6805 pgs incomplete; 51729 pgs
peering; 62838 pgs stale; 187069 pgs stuck inactive; 62838 pgs stuck stale;
187153 pgs stuck unclean; 45 requests are blocked > 32 sec; 10 osds have slow
requests; 1/15 in osds are down
pg 0.fb7f is stuck inactive since forever, current state down+peering, last
acting [404]
pg 1.fb7e is stuck inactive since forever, current state stale+down+peering,
last acting [202]
pg 2.fb7d is stuck inactive since forever, current state creating, last acting
[204,902,505]
pg 0.fb7e is stuck inactive since forever, current state stale+peering, last
acting [704,1004,504]
pg 1.fb7f is stuck inactive since forever, current state creating, last acting
[404,204,804]
pg 2.fb7c is stuck inactive since forever, current state creating, last acting
[805,901,404]
pg 0.fb7d is stuck inactive since forever, current state creating, last acting
[903,204,803]
pg 1.fb7c is stuck inactive since forever, current state creating, last acting
[802,905,505]
pg 2.fb7f is stuck inactive since forever, current state creating, last acting
[804,902]
pg 1.fb7d is stuck inactive since forever, current state creating, last acting
[803,404,204]
pg 2.fb7e is stuck inactive since forever, current state creating, last acting
[404,905,802]
pg 0.fb7c is stuck inactive since forever, current state stale+peering, last
acting [904,405]
pg 2.fb79 is stuck inactive since forever, current state creating, last acting
[804,901]
pg 1.fb7a is stuck inactive since forever, current state creating, last acting
[903,505,803]
pg 0.fb7b is stuck inactive since forever, current state stale+peering, last
acting [801,503]
pg 2.fb78 is stuck inactive since forever, current state creating, last acting
[903,404]
pg 1.fb7b is stuck inactive since forever, current state creating, last acting
[505,803,303]
pg 0.fb7a is stuck inactive since forever, current state
stale+remapped+peering, last acting [1003]
pg 2.fb7b is stuck inactive since forever, current state creating, last acting
[303,505,802]
pg 1.fb78 is stuck inactive since forever, current state creating, last acting
[803,303,905]
pg 0.fb79 is stuck inactive since forever, current state creating, last acting
[901,403,805]
pg 0.fb78 is stuck inactive since forever, current state creating, last acting
[404,901]
pg 2.fb7a is stuck inactive since forever, current state creating, last acting
[403,303,805]
pg 1.fb79 is stuck inactive since forever, current state creating, last acting
[803,901,404]
pg 0.fb77 is stuck inactive for 24155.756030, current state stale+peering, last
acting [101,1005]
pg 1.fb76 is stuck inactive since forever, current state creating, last acting
[901,505,403]
pg 2.fb75 is stuck inactive since forever, current state creating, last acting
[905,403,204]
pg 1.fb77 is stuck inactive since forever, current state creating, last acting
[905,805,204]
pg 2.fb74 is stuck inactive since forever, current state creating, last acting
[901,404]
pg 0.fb75 is stuck inactive since forever, current state creating, last acting
[903,403]
pg 1.fb74 is stuck inactive since forever, current state creating, last acting
[901,204,403]
pg 2.fb77 is stuck inactive since forever, current state creating, last acting
[505,802,905]
pg 0.fb74 is stuck inactive for 24042.660267, current state stale+incomplete,
last acting [101]
pg 1.fb75 is stuck inactive since forever, current state creating, last acting
[905,403,804]
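With thousands of such lines, it helps to tally the "current state" field rather than read them one by one; a sketch against a saved dump (health-detail.txt is a hypothetical file name):

```shell
# count pgs per "current state" in a saved `ceph health detail` dump
# (health-detail.txt is a hypothetical file holding the output above)
grep -o 'current state [a-z+]*' health-detail.txt | sort | uniq -c | sort -rn
```

For the excerpt above, "creating" dominates, which matches the 122757 creating pgs shown by `ceph -s`.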
At 2014-06-19 04:31:09,"Craig Lewis" <[email protected]> wrote:
I haven't seen behavior like that. I have seen my OSDs use a lot of RAM while
they're doing a recovery, but it goes back down when they're done.
Your OSD is doing something, it's using 126% CPU. What does `ceph osd tree` and
`ceph health detail` say?
When you say you're installing Ceph on 10 servers, are you running a monitor on
all 10 servers?
On Wed, Jun 18, 2014 at 4:18 AM, wsnote <[email protected]> wrote:
If I install Ceph on 10 servers with one disk per server, the problem remains.
This is the memory usage of ceph-osd:
ceph-osd VIRT: 10.2G, RES: 4.2G
The memory usage of ceph-osd is far too high!
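For reference, the total resident memory of all ceph-osd processes on a node can be checked with standard ps/awk (nothing Ceph-specific; `ps -C` assumes Linux procps):

```shell
# sum resident set size (RSS, reported by ps in KB) of all ceph-osd processes
ps -C ceph-osd -o rss= | awk '{kb += $1} END {printf "%.1f MB\n", kb/1024}'
```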
At 2014-06-18 16:51:02,wsnote <[email protected]> wrote:
Hi, Lewis!
I have run into a problem that I don't know how to solve, so I am asking you for help.
I can successfully install Ceph on a cluster of 3 or 4 servers, but fail to do it
with 10 servers.
After I install and start it, one server's memory usage rises to 100% and that
server crashes; I have to restart it.
All the configs are the same, so I don't know what the problem is.
Can you give me some suggestions?
Thanks!
ceph.conf:
[global]
auth supported = none
;auth_service_required = cephx
;auth_client_required = cephx
;auth_cluster_required = cephx
filestore_xattr_use_omap = true
max open files = 131072
log file = /var/log/ceph/$name.log
pid file = /var/run/ceph/$name.pid
keyring = /etc/ceph/keyring.admin
;mon_clock_drift_allowed = 1 ;clock skew detected
[mon]
mon data = /data/mon$id
keyring = /etc/ceph/keyring.$name
[mds]
mds data = /data/mds$id
keyring = /etc/ceph/keyring.$name
[osd]
osd data = /data/osd$id
osd journal = /data/osd$id/journal
osd journal size = 1024
keyring = /etc/ceph/keyring.$name
osd mkfs type = xfs
osd mount options xfs = rw,noatime
osd mkfs options xfs = -f
filestore fiemap = false
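For completeness, the $id and $name metavariables in the paths above expand per daemon. A hypothetical sketch of the matching per-daemon sections (mkcephfs-style; ids and hosts assumed from the osd tree, the mon address from the monmap):

```ini
; hypothetical per-daemon sections -- ids, hosts, and addresses
; assumed from the `ceph osd tree` and monmap output above
[mon.1]
    host = yun168
    mon addr = 122.228.248.168:6789

[osd.101]
    host = yun168

[osd.102]
    host = yun168
```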
On every server there is one MDS, one monitor, and 11 OSDs with 4 TB of space each.
The monitor address is a public IP, and each OSD has both a public IP and a
cluster IP.
wsnote
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com