Re: ceph cluster hangs when rebooting one node
On Mon, 12 Nov 2012, Stefan Priebe - Profihost AG wrote:
> Hello list,
>
> I was checking what happens if I reboot a ceph node.
>
> Sadly, if I reboot one node, the whole ceph cluster hangs and no I/O is
> possible.

If you are using the current master, the new 'min_size' may be biting you;
run ceph osd dump | grep ^pool and see if you see min_size for your pools.
You can change that back to the normal behavior with

  ceph osd pool set <poolname> min_size 1

sage

> ceph -w:
> Looks like this:
> 2012-11-12 16:03:58.191106 mon.0 [INF] pgmap v19013: 7032 pgs: 7032 active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
> 2012-11-12 16:04:08.365557 mon.0 [INF] mon.a calling new monitor election
> 2012-11-12 16:04:13.422682 mon.0 [INF] mon.a@0 won leader election with quorum 0,2
> 2012-11-12 16:04:13.708045 mon.0 [INF] pgmap v19014: 7032 pgs: 7032 active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
> 2012-11-12 16:04:13.708059 mon.0 [INF] mdsmap e1: 0/0/1 up
> 2012-11-12 16:04:13.708070 mon.0 [INF] osdmap e4582: 20 osds: 20 up, 20 in
> 2012-11-12 16:04:08.242688 mon.2 [INF] mon.c calling new monitor election
> 2012-11-12 16:04:13.708089 mon.0 [INF] monmap e1: 3 mons at {a=10.255.0.100:6789/0,b=10.255.0.101:6789/0,c=10.255.0.102:6789/0}
> 2012-11-12 16:04:14.070593 mon.0 [INF] pgmap v19015: 7032 pgs: 7032 active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
> 2012-11-12 16:04:15.283954 mon.0 [INF] pgmap v19016: 7032 pgs: 7032 active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
> 2012-11-12 16:04:18.506812 mon.0 [INF] osd.21 10.255.0.101:6800/5049 failed (3 reports from 3 peers after 20.339769 >= grace 20.00)
> 2012-11-12 16:04:18.890003 mon.0 [INF] osdmap e4583: 20 osds: 19 up, 20 in
> 2012-11-12 16:04:19.137936 mon.0 [INF] pgmap v19017: 7032 pgs: 6720 active+clean, 312 stale+active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
> 2012-11-12 16:04:20.024595 mon.0 [INF] osdmap e4584: 20 osds: 19 up, 20 in
> 2012-11-12 16:04:20.330149 mon.0 [INF] pgmap v19018: 7032 pgs: 6720 active+clean, 312 stale+active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
> 2012-11-12 16:04:21.535471 mon.0 [INF] pgmap v19019: 7032 pgs: 6720 active+clean, 312 stale+active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
> 2012-11-12 16:04:24.181292 mon.0 [INF] osd.22 10.255.0.101:6803/5153 failed (3 reports from 3 peers after 23.013550 >= grace 20.00)
> 2012-11-12 16:04:24.182208 mon.0 [INF] osd.23 10.255.0.101:6806/5276 failed (3 reports from 3 peers after 21.000834 >= grace 20.00)
> 2012-11-12 16:04:24.671373 mon.0 [INF] pgmap v19020: 7032 pgs: 6637 active+clean, 208 stale+active+clean, 187 incomplete; 91615 MB data, 174 GB used, 4295 GB / 4469 GB avail
> 2012-11-12 16:04:24.829022 mon.0 [INF] osdmap e4585: 20 osds: 17 up, 20 in
> 2012-11-12 16:04:24.870969 mon.0 [INF] osd.24 10.255.0.101:6809/5397 failed (3 reports from 3 peers after 20.688672 >= grace 20.00)
> 2012-11-12 16:04:25.522333 mon.0 [INF] pgmap v19021: 7032 pgs: 5912 active+clean, 933 stale+active+clean, 187 incomplete; 91615 MB data, 174 GB used, 4295 GB / 4469 GB avail
> 2012-11-12 16:04:25.596927 mon.0 [INF] osd.24 10.255.0.101:6809/5397 failed (3 reports from 3 peers after 21.708444 >= grace 20.00)
> 2012-11-12 16:04:26.077545 mon.0 [INF] osdmap e4586: 20 osds: 16 up, 20 in
> 2012-11-12 16:04:26.606475 mon.0 [INF] pgmap v19022: 7032 pgs: 5394 active+clean, 1094 stale+active+clean, 544 incomplete; 91615 MB data, 173 GB used, 4296 GB / 4469 GB avail
> 2012-11-12 16:04:27.162034 mon.0 [INF] osdmap e4587: 20 osds: 16 up, 20 in
> 2012-11-12 16:04:27.656974 mon.0 [INF] pgmap v19023: 7032 pgs: 5394 active+clean, 1094 stale+active+clean, 544 incomplete; 91615 MB data, 173 GB used, 4296 GB / 4469 GB avail
> 2012-11-12 16:04:30.229958 mon.0 [INF] pgmap v19024: 7032 pgs: 5394 active+clean, 1094 stale+active+clean, 544 incomplete; 91615 MB data, 172 GB used, 4296 GB / 4469 GB avail
> 2012-11-12 16:04:31.411989 mon.0 [INF] pgmap v19025: 7032 pgs: 5394 active+clean, 1094 stale+active+clean, 544 incomplete; 91615 MB data, 172 GB used, 4296 GB / 4469 GB avail
> 2012-11-12 16:04:32.617576 mon.0 [INF] pgmap v19026: 7032 pgs: 4660 active+clean, 2372 incomplete; 91615 MB data, 171 GB used, 4298 GB / 4469 GB avail
> 2012-11-12 16:04:35.172861 mon.0 [INF] pgmap v19027: 7032 pgs: 4660 active+clean, 2372 incomplete; 91615 MB data, 171 GB used, 4298 GB / 4469 GB avail
> 2012-11-12 16:04:30.505872 osd.53 [WRN] 6 slow requests, 6 included below; oldest blocked for > 30.247691 secs
> 2012-11-12 16:04:30.505875 osd.53 [WRN] slow request 30.247691 seconds old, received at 2012-11-12 16:04:00.258118: osd_op(client.131626.0:771962 rb.0.107a.734602d5.0bce [write 2478080~4096] 3.562a9efc) v4 currently reached pg
> 2012-11-12 16:04:30.505879 osd.53 [WRN] slow request 30.238016 seconds old, received at 2012-11-12 16:04:00.2
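A minimal sketch of the check-and-fix Sage describes above; whether min_size is printed by 'ceph osd dump' depends on the ceph version in use:

  # list pool definitions and look for a min_size field
  ceph osd dump | grep ^pool
  # if a pool shows e.g. 'min_size 2', restore the old behavior for it
  ceph osd pool set <poolname> min_size 1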
Re: ceph cluster hangs when rebooting one node
On 12.11.2012 16:11, Sage Weil wrote:
> On Mon, 12 Nov 2012, Stefan Priebe - Profihost AG wrote:
>> Hello list,
>>
>> I was checking what happens if I reboot a ceph node.
>>
>> Sadly, if I reboot one node, the whole ceph cluster hangs and no I/O is
>> possible.
>
> If you are using the current master, the new 'min_size' may be biting you;
> run ceph osd dump | grep ^pool and see if you see min_size for your pools.
> You can change that back to the normal behavior with

No, I don't see any min_size:

# ceph osd dump | grep ^pool
pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
pool 3 'kvmpool1' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 3000 pgp_num 3000 last_change 958 owner 0

> ceph osd pool set <poolname> min_size 1

Yes, this helps! But min_size is still not shown in ceph osd dump. Also, when I reboot a node it takes up to 10-20 seconds until all OSDs from that node are marked failed and I/O starts again. Should I issue a ceph osd out command before the reboot?

But I already had

  min_size 1
  max_size 2

set for each rule in my crushmap.

Stefan
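A hypothetical sketch of the 'ceph osd out' step Stefan asks about, assuming osd.21 through osd.24 are the OSDs on the node being rebooted (as the failure messages in the log above suggest); it only illustrates the command, not whether it is the right thing to do for a short reboot:

  # mark the node's OSDs out before the planned shutdown so their
  # placement groups are remapped instead of waiting for the OSDs to
  # be reported failed (assumed OSD ids 21-24)
  for id in 21 22 23 24; do
      ceph osd out $id
  done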
Re: ceph cluster hangs when rebooting one node
Hello!

I have the same problem: after switching off the second node, the cluster hangs. Is there a solution?

All the best, Alex!

2012/11/12 Stefan Priebe - Profihost AG:
> On 12.11.2012 16:11, Sage Weil wrote:
>
>> On Mon, 12 Nov 2012, Stefan Priebe - Profihost AG wrote:
>>>
>>> Hello list,
>>>
>>> I was checking what happens if I reboot a ceph node.
>>>
>>> Sadly, if I reboot one node, the whole ceph cluster hangs and no I/O is
>>> possible.
>>
>> If you are using the current master, the new 'min_size' may be biting you;
>> run ceph osd dump | grep ^pool and see if you see min_size for your pools.
>> You can change that back to the normal behavior with
>
> No, I don't see any min_size:
>
> # ceph osd dump | grep ^pool
> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
> pool 3 'kvmpool1' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 3000 pgp_num 3000 last_change 958 owner 0
>
>> ceph osd pool set <poolname> min_size 1
>
> Yes, this helps! But min_size is still not shown in ceph osd dump. Also, when I reboot a node it takes up to 10-20 seconds until all OSDs from that node are marked failed and I/O starts again. Should I issue a ceph osd out command before the reboot?
>
> But I already had
>   min_size 1
>   max_size 2
> set for each rule in my crushmap.
>
> Stefan
Re: ceph cluster hangs when rebooting one node
On Wed, 14 Nov 2012, Aleksey Samarin wrote:
> Hello!
>
> I have the same problem: after switching off the second node, the
> cluster hangs. Is there a solution?
>
> All the best, Alex!

I suspect this is min_size; the latest master has a few changes and will also print it out so you can tell what is going on.

min_size is the minimum number of replicas a PG must have before the OSDs will go active (handle reads and writes). Setting it to 1 gets you the old behavior, while increasing it protects you from the case where a write reaches only a single replica that then fails, which would force the admin to make a difficult decision about losing data.

You can adjust it with

  ceph osd pool set <poolname> min_size <n>

sage

> 2012/11/12 Stefan Priebe - Profihost AG:
> > On 12.11.2012 16:11, Sage Weil wrote:
> >
> >> On Mon, 12 Nov 2012, Stefan Priebe - Profihost AG wrote:
> >>>
> >>> Hello list,
> >>>
> >>> I was checking what happens if I reboot a ceph node.
> >>>
> >>> Sadly, if I reboot one node, the whole ceph cluster hangs and no I/O is
> >>> possible.
> >>
> >> If you are using the current master, the new 'min_size' may be biting you;
> >> run ceph osd dump | grep ^pool and see if you see min_size for your pools.
> >> You can change that back to the normal behavior with
> >
> > No, I don't see any min_size:
> >
> > # ceph osd dump | grep ^pool
> > pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0 crash_replay_interval 45
> > pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
> > pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
> > pool 3 'kvmpool1' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 3000 pgp_num 3000 last_change 958 owner 0
> >
> >> ceph osd pool set <poolname> min_size 1
> >
> > Yes, this helps! But min_size is still not shown in ceph osd dump. Also, when I reboot a node it takes up to 10-20 seconds until all OSDs from that node are marked failed and I/O starts again. Should I issue a ceph osd out command before the reboot?
> >
> > But I already had
> >   min_size 1
> >   max_size 2
> > set for each rule in my crushmap.
> >
> > Stefan
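A short sketch of the adjustment described above, assuming the four pool names from Stefan's 'ceph osd dump' output earlier in the thread:

  # set min_size back to 1 on every pool to restore the old behavior
  for pool in data metadata rbd kvmpool1; do
      ceph osd pool set $pool min_size 1
  done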