Re: [ceph-users] stopped backfilling process

2013-11-06 Thread Dominik Mostowiec
I hope this helps.

crush: https://www.dropbox.com/s/inrmq3t40om26vf/crush.txt
ceph osd dump: https://www.dropbox.com/s/jsbt7iypyfnnbqm/ceph_osd_dump.txt
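For reference, a decoded crush map and OSD map like the ones linked above are typically produced with something along these lines (the file names here are only examples):

ceph osd getcrushmap -o crush.bin    # fetch the compiled crush map from the monitors
crushtool -d crush.bin -o crush.txt  # decompile it into readable text
ceph osd dump > ceph_osd_dump.txt    # dump the current OSD map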

--
Regards
Dominik

2013/11/6 yy-nm yxdyours...@gmail.com:
 On 2013/11/5 22:02, Dominik Mostowiec wrote:

 Hi,
 After removing OSDs (ceph osd out X) from one server (11 OSDs), Ceph
 started the data migration process.
 It stopped at:
 32424 pgs: 30635 active+clean, 191 active+remapped, 1596
 active+degraded, 2 active+clean+scrubbing;
 degraded (1.718%)

 All OSDs with reweight == 1 are up.

 ceph -v
 ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)

 health details:
 https://www.dropbox.com/s/149zvee2ump1418/health_details.txt

 pg active+degraded query:
 https://www.dropbox.com/s/46emswxd7s8xce1/pg_11.39_query.txt
 pg active+remapped query:
 https://www.dropbox.com/s/wij4uqh8qoz60fd/pg_16.2172_query.txt

 Please help - how can we fix it?

 Can you show your decoded crush map and the output of ceph osd dump?





-- 
Regards
Dominik


Re: [ceph-users] stopped backfilling process

2013-11-06 Thread Bohdan Sydor
On Tue, Nov 5, 2013 at 3:02 PM, Dominik Mostowiec
dominikmostow...@gmail.com wrote:
 After removing OSDs (ceph osd out X) from one server (11 OSDs), Ceph
 started the data migration process.
 It stopped at:
 32424 pgs: 30635 active+clean, 191 active+remapped, 1596
 active+degraded, 2 active+clean+scrubbing;
 degraded (1.718%)

 All OSDs with reweight == 1 are up.

 ceph -v
 ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)

Hi,

Below, I'm pasting some more information on this issue.

The cluster status hasn't changed for more than 24 hours:
# ceph health
HEALTH_WARN 1596 pgs degraded; 1787 pgs stuck unclean; recovery
2142704/123949567 degraded (1.729%)

I parsed the output of ceph pg dump and can see three types of PG states there:

1. *Two* OSDs are up and *two* are acting:

16.11   [42, 92][42, 92]active+degraded
17.10   [42, 92][42, 92]active+degraded

2. *Three* OSDs are up and *three* are acting:

12.d[114, 138, 5]   [114, 138, 5]   active+clean
15.e[13, 130, 142]  [13, 130, 142]  active+clean

3. *Two* OSDs are up and *three* are acting:

16.2256 [63, 109]   [63, 109, 40]   active+remapped
16.220b [129, 22]   [129, 22, 47]   active+remapped
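Instead of eyeballing the full ceph pg dump, the stuck PGs can usually be listed directly; a sketch, assuming the installed version supports dump_stuck:

ceph pg dump_stuck unclean                                   # PGs stuck in a non-clean state
ceph pg dump | grep -E 'active\+(degraded|remapped)' | less  # same information, filtered from the full dump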

A part of the crush map:

rack rack1 {
id -5   # do not change unnecessarily
# weight 60.000
alg straw
hash 0  # rjenkins1
item storinodfs1 weight 12.000
item storinodfs11 weight 12.000
item storinodfs6 weight 12.000
item storinodfs9 weight 12.000
item storinodfs8 weight 12.000
}

rack rack2 {
id -7   # do not change unnecessarily
# weight 48.000
alg straw
hash 0  # rjenkins1
item storinodfs3 weight 12.000
item storinodfs4 weight 12.000
item storinodfs2 weight 12.000
item storinodfs10 weight 12.000
}

rack rack3 {
id -10  # do not change unnecessarily
# weight 36.000
alg straw
hash 0  # rjenkins1
item storinodfs5 weight 12.000   # <=== all OSDs on this node have been marked out with ceph osd out
item storinodfs7 weight 12.000
item storinodfs12 weight 12.000
}

rule data {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type rack
step emit
}

The command ceph osd out has been invoked on all OSDs on storinodfs5,
and I can see them all down in the output of ceph osd tree:

-11 12  host storinodfs5
48  1   osd.48  down0
49  1   osd.49  down0
50  1   osd.50  down0
51  1   osd.51  down0
52  1   osd.52  down0
53  1   osd.53  down0
54  1   osd.54  down0
55  1   osd.55  down0
56  1   osd.56  down0
57  1   osd.57  down0
58  1   osd.58  down0
59  1   osd.59  down0



I wonder whether the current cluster state might be related to the fact
that the crush map still lists storinodfs5 with weight 12.
We're unable to make Ceph recover from this faulty state.
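If the stale weight is indeed the cause, one way to take storinodfs5 out of the crush calculation entirely would be to zero or remove its items; a sketch only, using osd.48 as an example and to be repeated for each OSD on that host:

ceph osd crush reweight osd.48 0   # drop the crush weight of the already-out OSD
ceph osd crush remove osd.48       # or, once the data is safe elsewhere, remove it from the crush map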

Any hints are much appreciated.

-- 
Regards,
Bohdan Sydor


[ceph-users] stopped backfilling process

2013-11-05 Thread Dominik Mostowiec
Hi,
After removing OSDs (ceph osd out X) from one server (11 OSDs), Ceph
started the data migration process.
It stopped at:
32424 pgs: 30635 active+clean, 191 active+remapped, 1596
active+degraded, 2 active+clean+scrubbing;
degraded (1.718%)

All OSDs with reweight == 1 are up.

ceph -v
ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)

health details:
https://www.dropbox.com/s/149zvee2ump1418/health_details.txt

pg active+degraded query:
https://www.dropbox.com/s/46emswxd7s8xce1/pg_11.39_query.txt
pg active+remapped query:
https://www.dropbox.com/s/wij4uqh8qoz60fd/pg_16.2172_query.txt
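For reference, the two query dumps linked above were presumably produced with something like:

ceph pg 11.39 query > pg_11.39_query.txt
ceph pg 16.2172 query > pg_16.2172_query.txt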

Please help - how can we fix it?

-- 
Regards
Dominik


Re: [ceph-users] stopped backfilling process

2013-11-05 Thread Dominik Mostowiec
Hi,
This is an S3/Ceph cluster; .rgw.buckets keeps 3 copies of the data.
Many PGs are only on 2 OSDs and are marked as 'degraded'.
Can scrubbing fix the degraded objects?

I haven't set tunables in the crush map; maybe that could help (is it safe?)?
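For reference, on a bobtail cluster the tunables are usually set by editing the decompiled crush map and injecting it back; a sketch only, using the commonly cited bobtail values and assuming crush.txt is the decompiled map. Note that older kernel clients may not understand these tunables, so whether it is safe depends on the clients in use.

# add near the top of crush.txt:
#   tunable choose_local_tries 0
#   tunable choose_local_fallback_tries 0
#   tunable choose_total_tries 50
#   tunable chooseleaf_descend_once 1
crushtool -c crush.txt -o crush.new    # recompile the edited map
ceph osd setcrushmap -i crush.new      # inject it into the cluster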

--
Regards
Dominik



2013/11/5 Dominik Mostowiec dominikmostow...@gmail.com:
 Hi,
 After removing OSDs (ceph osd out X) from one server (11 OSDs), Ceph
 started the data migration process.
 It stopped at:
 32424 pgs: 30635 active+clean, 191 active+remapped, 1596
 active+degraded, 2 active+clean+scrubbing;
 degraded (1.718%)

 All OSDs with reweight == 1 are up.

 ceph -v
 ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)

 health details:
 https://www.dropbox.com/s/149zvee2ump1418/health_details.txt

 pg active+degraded query:
 https://www.dropbox.com/s/46emswxd7s8xce1/pg_11.39_query.txt
 pg active+remapped query:
 https://www.dropbox.com/s/wij4uqh8qoz60fd/pg_16.2172_query.txt

 Please help - how can we fix it?

 --
 Regards
 Dominik



-- 
Regards
Dominik