Re: [ceph-users] [PG] Slow request *** seconds old,v4 currently waiting for pg to exist locally
Hi, it looks like some OSDs are down?! What is the output of ceph osd tree?

Udo

On 25.09.2014 04:29, Aegeaner wrote:
The cluster health state is WARN:

health HEALTH_WARN 118 pgs degraded; 8 pgs down; 59 pgs incomplete; 28 pgs peering; 292 pgs stale; 87 pgs stuck inactive; 292 pgs stuck stale; 205 pgs stuck unclean; 22 requests are blocked 32 sec; recovery 12474/46357 objects degraded (26.909%)
monmap e3: 3 mons at {CVM-0-mon01=172.18.117.146:6789/0,CVM-0-mon02=172.18.117.152:6789/0,CVM-0-mon03=172.18.117.153:6789/0}, election epoch 24, quorum 0,1,2 CVM-0-mon01,CVM-0-mon02,CVM-0-mon03
osdmap e421: 9 osds: 9 up, 9 in
pgmap v2261: 292 pgs, 4 pools, 91532 MB data, 23178 objects
330 MB used, 3363 GB / 3363 GB avail
12474/46357 objects degraded (26.909%)
20 stale+peering
87 stale+active+clean
8 stale+down+peering
59 stale+incomplete
118 stale+active+degraded

What do these errors mean? Can these PGs be recovered?

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
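For anyone debugging a similar state, a few standard commands narrow down which PGs are stuck and where they map. This is a generic sketch; <pgid> is a placeholder to be replaced with an id reported by ceph health detail:

 # show the warnings together with the affected PG ids
 ceph health detail
 # group stuck PGs by the reason they are stuck
 ceph pg dump_stuck stale
 ceph pg dump_stuck inactive
 ceph pg dump_stuck unclean
 # for a single PG, show its mapping and ask the primary OSD what it is waiting on
 ceph pg map <pgid>
 ceph pg <pgid> query
 # confirm the OSD tree really matches "9 osds: 9 up, 9 in"
 ceph osd tree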
Re: [ceph-users] [PG] Slow request *** seconds old,v4 currently waiting for pg to exist locally
Hi again, sorry - forget my previous post... see "osdmap e421: 9 osds: 9 up, 9 in" - it shows that all your 9 OSDs are up! Do you have trouble with your journal/filesystem?

Udo

On 25.09.2014 08:01, Udo Lembke wrote:
Hi, looks that some osds are down?! What is the output of ceph osd tree Udo

On 25.09.2014 04:29, Aegeaner wrote:
The cluster healthy state is WARN: [...] What does these errors mean? Can these PGs be recovered?

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [PG] Slow request *** seconds old,v4 currently waiting for pg to exist locally
osd_op(client.4625.1:9005787) ... This is due to external factors - for example, the network settings.

2014-09-25 10:05 GMT+04:00 Udo Lembke ulem...@polarzone.de:
Hi again, sorry - forgot my post... see osdmap e421: 9 osds: 9 up, 9 in shows that all your 9 osds are up! Do you have trouble with your journal/filesystem? Udo

On 25.09.2014 08:01, Udo Lembke wrote:
Hi, looks that some osds are down?! What is the output of ceph osd tree Udo

On 25.09.2014 04:29, Aegeaner wrote:
The cluster healthy state is WARN: [...] What does these errors mean? Can these PGs be recovered?

--
Best regards, Irek Fasikhov (Фасихов Ирек Нургаязович)
Mob.: +79229045757

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [PG] Slow request *** seconds old,v4 currently waiting for pg to exist locally
Yeah, three of the nine OSDs went down; I recreated them, but the PGs could not be recovered. I didn't know how to erase all the PGs, so I deleted all the OSD pools, including data and metadata ... Now all PGs are active and clean. I'm not sure if there are more elegant ways to deal with this.

===
Aegeaner

On 2014-09-25 14:11, Irek Fasikhov wrote:
osd_op(client.4625.1:9005787) . This is due to external factors. For example, the network settings.

2014-09-25 10:05 GMT+04:00 Udo Lembke ulem...@polarzone.de:
Hi again, sorry - forgot my post... see osdmap e421: 9 osds: 9 up, 9 in shows that all your 9 osds are up! Do you have trouble with your journal/filesystem? Udo

On 25.09.2014 08:01, Udo Lembke wrote:
Hi, looks that some osds are down?! What is the output of ceph osd tree Udo

On 25.09.2014 04:29, Aegeaner wrote:
The cluster healthy state is WARN: [...] What does these errors mean? Can these PGs be recovered?

--
Best regards, Irek Fasikhov
Mob.: +79229045757

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
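For reference, the "delete and recreate the pools" approach described above maps to commands like the following. This is only a sketch: the pool name and PG count are examples, deleting a pool irreversibly destroys its objects, and force_create_pg should only be used on PGs whose data is already considered lost:

 # destroy and recreate a pool (128 PGs is just an example value)
 ceph osd pool delete data data --yes-i-really-really-mean-it
 ceph osd pool create data 128 128
 # less drastic alternative: recreate only the PGs that cannot be recovered
 ceph pg force_create_pg <pgid>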
Re: [ceph-users] [Ceph-community] Pgs are in stale+down+peering state
Replies Inline : Sahana Lokeshappa Test Development Engineer I SanDisk Corporation 3rd Floor, Bagmane Laurel, Bagmane Tech Park C V Raman nagar, Bangalore 560093 T: +918042422283 sahana.lokesha...@sandisk.com -Original Message- From: Sage Weil [mailto:sw...@redhat.com] Sent: Wednesday, September 24, 2014 6:10 PM To: Sahana Lokeshappa Cc: Varada Kari; ceph-us...@ceph.com Subject: RE: [Ceph-community] Pgs are in stale+down+peering state On Wed, 24 Sep 2014, Sahana Lokeshappa wrote: 2.a9518 0 0 0 0 2172649472 3001 3001active+clean2014-09-22 17:49:35.357586 6826'35762 17842:72706 [12,7,28] 12 [12,7,28] 12 6826'35762 2014-09-22 11:33:55.985449 0'0 2014-09-16 20:11:32.693864 Can you verify that 2.a9 exists in teh data directory for 12, 7, and/or 28? If so the next step would be to enable logging (debug osd = 20, debug ms = 1) and see wy peering is stuck... Yes 2.a9 directories are present in osd.12, 7 ,28 and 0.49 0.4d and 0.1c directories are not present in respective acting osds. Here are the logs I can see when debugs were raised to 20 2014-09-24 18:38:41.706566 7f92e2dc8700 7 osd.12 pg_epoch: 17850 pg[2.738( v 6870'28894 (4076'25093,6870'28894] local-les=17723 n=537 ec=188 les/c 17723/17725 17722/17722/17709) [57,12,48] r=1 lpr=17722 pi=17199-17721/6 luod=0'0 crt=0'0 lcod 0'0 active] replica_scrub 2014-09-24 18:38:41.706586 7f92e2dc8700 10 osd.12 pg_epoch: 17850 pg[2.738( v 6870'28894 (4076'25093,6870'28894] local-les=17723 n=537 ec=188 les/c 17723/17725 17722/17722/17709) [57,12,48] r=1 lpr=17722 pi=17199-17721/6 luod=0'0 crt=0'0 lcod 0'0 active] build_scrub_map 2014-09-24 18:38:41.706592 7f92e2dc8700 20 osd.12 pg_epoch: 17850 pg[2.738( v 6870'28894 (4076'25093,6870'28894] local-les=17723 n=537 ec=188 les/c 17723/17725 17722/17722/17709) [57,12,48] r=1 lpr=17722 pi=17199-17721/6 luod=0'0 crt=0'0 lcod 0'0 active] scrub_map_chunk [476de738//0//-1,f38//0//-1) 2014-09-24 18:38:41.711778 7f92e2dc8700 10 osd.12 pg_epoch: 17850 pg[2.738( v 6870'28894 (4076'25093,6870'28894] local-les=17723 n=537 ec=188 les/c 17723/17725 17722/17722/17709) [57,12,48] r=1 lpr=17722 pi=17199-17721/6 luod=0'0 crt=0'0 lcod 0'0 active] _scan_list scanning 23 objects deeply 2014-09-24 18:38:41.730881 7f92ed5dd700 20 osd.12 17850 share_map_peer 0x89cda20 already has epoch 17850 2014-09-24 18:38:41.73 7f92eede0700 20 osd.12 17850 share_map_peer 0x89cda20 already has epoch 17850 2014-09-24 18:38:41.822444 7f92ed5dd700 20 osd.12 17850 share_map_peer 0xd2eb080 already has epoch 17850 2014-09-24 18:38:41.822519 7f92eede0700 20 osd.12 17850 share_map_peer 0xd2eb080 already has epoch 17850 2014-09-24 18:38:41.878894 7f92eede0700 20 osd.12 17850 share_map_peer 0xd5cd5a0 already has epoch 17850 2014-09-24 18:38:41.878921 7f92ed5dd700 20 osd.12 17850 share_map_peer 0xd5cd5a0 already has epoch 17850 2014-09-24 18:38:41.918307 7f92ed5dd700 20 osd.12 17850 share_map_peer 0x1161bde0 already has epoch 17850 2014-09-24 18:38:41.918426 7f92eede0700 20 osd.12 17850 share_map_peer 0x1161bde0 already has epoch 17850 2014-09-24 18:38:41.951678 7f92ed5dd700 20 osd.12 17850 share_map_peer 0x7fc5700 already has epoch 17850 2014-09-24 18:38:41.951709 7f92eede0700 20 osd.12 17850 share_map_peer 0x7fc5700 already has epoch 17850 2014-09-24 18:38:42.064759 7f92e2dc8700 10 osd.12 pg_epoch: 17850 pg[2.738( v 6870'28894 (4076'25093,6870'28894] local-les=17723 n=537 ec=188 les/c 17723/17725 17722/17722/17709) [57,12,48] r=1 lpr=17722 pi=17199-17721/6 luod=0'0 crt=0'0 lcod 0'0 active] build_scrub_map_chunk done. 
2014-09-24 18:38:42.107016 7f92ed5dd700 20 osd.12 17850 share_map_peer 0x10377b80 already has epoch 17850 2014-09-24 18:38:42.107032 7f92eede0700 20 osd.12 17850 share_map_peer 0x10377b80 already has epoch 17850 2014-09-24 18:38:42.109356 7f92f15e5700 10 osd.12 17850 do_waiters -- start 2014-09-24 18:38:42.109372 7f92f15e5700 10 osd.12 17850 do_waiters -- finish 2014-09-24 18:38:42.109373 7f92f15e5700 20 osd.12 17850 _dispatch 0xeb0d900 replica scrub(pg: 2.738,from:0'0,to:6489'28646,epoch:17850,start:f38//0//-1,end:92371f38//0//-1,chunky:1,deep:1,version:5) v5 2014-09-24 18:38:42.109378 7f92f15e5700 10 osd.12 17850 queueing MOSDRepScrub replica scrub(pg: 2.738,from:0'0,to:6489'28646,epoch:17850,start:f38//0//-1,end:92371f38//0//-1,chunky:1,deep:1,version:5) v5 2014-09-24 18:38:42.109395 7f92f15e5700 10 osd.12 17850 do_waiters -- start 2014-09-24 18:38:42.109396 7f92f15e5700 10 osd.12 17850 do_waiters -- finish 2014-09-24 18:38:42.109456 7f92e2dc8700 7 osd.12 pg_epoch: 17850 pg[2.738( v 6870'28894 (4076'25093,6870'28894] local-les=17723 n=537 ec=188 les/c 17723/17725 17722/17722/17709) [57,12,48] r=1 lpr=17722 pi=17199-17721/6 luod=0'0 crt=0'0 lcod 0'0 active] replica_scrub 2014-09-24 18:38:42.109522 7f92e2dc8700 10 osd.12 pg_epoch: 17850 pg[2.738( v 6870'28894 (4076'25093,6870'28894] local-les=17723 n=537 ec=188 les/c
Re: [ceph-users] [Ceph-community] Pgs are in stale+down+peering state
Hi Craig,

Sorry for the late response. Somehow missed this mail. All OSDs are up and running. There were no specific logs related to this activity, and there are no IOs running right now. A few OSDs were marked in and out, removed fully and recreated before these PGs came to this stage. I had tried restarting the OSDs. It didn't work.

Thanks
Sahana Lokeshappa Test Development Engineer I SanDisk Corporation 3rd Floor, Bagmane Laurel, Bagmane Tech Park C V Raman nagar, Bangalore 560093 T: +918042422283 sahana.lokesha...@sandisk.com

From: Craig Lewis [mailto:cle...@centraldesktop.com] Sent: Wednesday, September 24, 2014 5:44 AM To: Sahana Lokeshappa Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] [Ceph-community] Pgs are in stale+down+peering state

Is osd.12 doing anything strange? Is it consuming lots of CPU or IO? Is it flapping? Writing any interesting logs? Have you tried restarting it? If that doesn't help, try the other involved osds: 56, 27, 6, 25, 23. I doubt that it will help, but it won't hurt.

On Mon, Sep 22, 2014 at 11:21 AM, Varada Kari varada.k...@sandisk.com wrote:
Hi Sage,

To give more context on this problem, this cluster has two pools, rbd and user-created. Osd.12 is a primary for some other PGs, but the problem happens for these three PGs.

$ sudo ceph osd lspools
0 rbd,2 pool1,

$ sudo ceph -s
cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean; 1 requests are blocked 32 sec
monmap e1: 3 mons at {rack2-ram-1=10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0}, election epoch 2008, quorum 0,1,2 rack2-ram-1,rack2-ram-2,rack2-ram-3
osdmap e17842: 64 osds: 64 up, 64 in
pgmap v79729: 2148 pgs, 2 pools, 4135 GB data, 1033 kobjects
12504 GB used, 10971 GB / 23476 GB avail
2145 active+clean
3 stale+down+peering

Snippet from pg dump:
2.a9518 0 0 0 0 2172649472 30013001 active+clean2014-09-22 17:49:35.357586 6826'35762 17842:72706 [12,7,28] 12 [12,7,28] 12 6826'35762 2014-09-22 11:33:55.985449 0'0 2014-09-16 20:11:32.693864
0.590 0 0 0 0 0 0 0 active+clean2014-09-22 17:50:00.751218 0'0 17842:4472 [12,41,2] 12 [12,41,2] 12 0'0 2014-09-22 16:47:09.315499 0'0 2014-09-16 12:20:48.618726
0.4d0 0 0 0 0 0 4 4 stale+down+peering 2014-09-18 17:51:10.038247 186'4 11134:498 [12,56,27] 12 [12,56,27] 12 186'42014-09-18 17:30:32.393188 0'0 2014-09-16 12:20:48.615322
0.490 0 0 0 0 0 0 0 stale+down+peering 2014-09-18 17:44:52.681513 0'0 11134:498 [12,6,25] 12 [12,6,25] 12 0'0 2014-09-18 17:16:12.986658 0'0 2014-09-16 12:20:48.614192
0.1c0 0 0 0 0 0 12 12 stale+down+peering 2014-09-18 17:51:16.735549 186'12 11134:522 [12,25,23] 12 [12,25,23] 12 186'12 2014-09-18 17:16:04.457863 186'10 2014-09-16 14:23:58.731465
2.17510 0 0 0 0 2139095040 30013001 active+clean2014-09-22 17:52:20.364754 6784'30742 17842:72033 [12,27,23] 12 [12,27,23] 12 6784'30742 2014-09-22 00:19:39.905291 0'0 2014-09-16 20:11:17.016299
2.7e8 508 0 0 0 0 2130706432 34333433 active+clean2014-09-22 17:52:20.365083 6702'21132 17842:64769 [12,25,23] 12 [12,25,23] 12 6702'21132 2014-09-22 17:01:20.546126 0'0 2014-09-16 14:42:32.079187
2.6a5 528 0 0 0 0 2214592512 28402840 active+clean2014-09-22 22:50:38.092084 6775'34416 17842:83221 [12,58,0] 12 [12,58,0] 12 6775'34416 2014-09-22 22:50:38.091989 0'0 2014-09-16 20:11:32.703368

And we couldn't observe any peering events happening
on the primary osd.

$ sudo ceph pg 0.49 query
Error ENOENT: i don't have pgid 0.49
$ sudo ceph pg 0.4d query
Error ENOENT: i don't have pgid 0.4d
$ sudo ceph pg 0.1c query
Error ENOENT: i don't have pgid 0.1c

Not able to explain why the peering was stuck. BTW, Rbd pool doesn’t contain any data.

Varada

From: Ceph-community [mailto:ceph-community-boun...@lists.ceph.com] On Behalf Of Sage Weil Sent: Monday, September 22, 2014 10:44 PM To: Sahana Lokeshappa;
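In case it helps anyone reading the archive, the logging Sage asked for can be turned on at runtime without restarting the daemon; osd.12 here simply matches the primary OSD discussed in this thread:

 # raise the debug levels on the primary OSD of the stuck PGs
 ceph tell osd.12 injectargs '--debug-osd 20 --debug-ms 1'
 # ... reproduce the peering problem, then drop the levels back to the defaults
 ceph tell osd.12 injectargs '--debug-osd 0/5 --debug-ms 0/5'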
Re: [ceph-users] [Ceph-community] Pgs are in stale+down+peering state
Hi All,

Here are the steps I followed to get all PGs back to the active+clean state. I still don't know the root cause of this PG state.

1. Force create the PGs which are in stale+down+peering
2. Stop osd.12
3. Mark osd.12 as lost
4. Start osd.12
5. All PGs were back to the active+clean state

Thanks
Sahana Lokeshappa Test Development Engineer I SanDisk Corporation 3rd Floor, Bagmane Laurel, Bagmane Tech Park C V Raman nagar, Bangalore 560093 T: +918042422283 sahana.lokesha...@sandisk.com

-Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sahana Lokeshappa Sent: Thursday, September 25, 2014 1:26 PM To: Sage Weil Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] [Ceph-community] Pgs are in stale+down+peering state

Replies Inline :

-Original Message- From: Sage Weil [mailto:sw...@redhat.com] Sent: Wednesday, September 24, 2014 6:10 PM To: Sahana Lokeshappa Cc: Varada Kari; ceph-us...@ceph.com Subject: RE: [Ceph-community] Pgs are in stale+down+peering state

On Wed, 24 Sep 2014, Sahana Lokeshappa wrote:
2.a9518 0 0 0 0 2172649472 3001 3001active+clean2014-09-22 17:49:35.357586 6826'35762 17842:72706 [12,7,28] 12 [12,7,28] 12 6826'35762 2014-09-22 11:33:55.985449 0'0 2014-09-16 20:11:32.693864

Can you verify that 2.a9 exists in teh data directory for 12, 7, and/or 28? If so the next step would be to enable logging (debug osd = 20, debug ms = 1) and see wy peering is stuck...

Yes, the 2.a9 directories are present in osd.12, 7, 28, and the 0.49, 0.4d and 0.1c directories are not present in the respective acting OSDs. Here are the logs I can see when debugs were raised to 20:

[...]
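Expressed as commands, the steps listed at the top of this mail would look roughly like the sketch below. This is not an official procedure: force_create_pg and ceph osd lost can throw away data, so they are only appropriate once the PGs are considered unrecoverable, and the service commands depend on the init system in use:

 # 1. force-create the PGs stuck in stale+down+peering
 ceph pg force_create_pg 0.49
 ceph pg force_create_pg 0.4d
 ceph pg force_create_pg 0.1c
 # 2. stop the primary OSD they all map to
 stop ceph-osd id=12            # Ubuntu/upstart; e.g. "service ceph stop osd.12" on sysvinit
 # 3. mark it lost so peering can proceed without it
 ceph osd lost 12 --yes-i-really-mean-it
 # 4. start it again and watch the PGs return to active+clean
 start ceph-osd id=12
 ceph -w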
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
Hi,

That's strange. 3.13 is way before any changes that could have had any such effect. Can you by any chance try with older kernels to see where it starts misbehaving for you? 3.12? 3.10? 3.8?

My crush tunables are set to bobtail, so I can't go below 3.9. I will try 3.12 tomorrow and report back.

Ok, I have tested 3.12.9 and it also hangs. I have no other pre-built kernels to test :-(. If I have to compile kernels anyway I will test 3.16.3 as well :-/.

Micha Krause

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
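As a side note on the tunables/kernel interaction mentioned above, the active CRUSH tunables profile can be inspected and changed with the commands below. Treat this as a sketch rather than a recommendation, since switching profiles triggers data movement and changes which kernel clients can connect:

 # show which tunables the cluster currently uses
 ceph osd crush show-tunables
 # switch profiles if older kernel clients need to connect
 ceph osd crush tunables bobtail     # or: legacy, optimal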
Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS
As Dieter asked, what replication level is this, I guess 1?

Yes, replication x1 for these benchmarks.

Now at 3 nodes and 6 OSDs you're getting about the performance of a single SSD, food for thought.

Yes, sure. I don't have more nodes to test, but I would like to know if it scales beyond 20k iops with more nodes. But clearly, the CPU is the limit.

- Original message -
From: Christian Balzer ch...@gol.com
To: ceph-users@lists.ceph.com
Sent: Thursday, 25 September 2014 06:50:31
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS

On Wed, 24 Sep 2014 20:49:21 +0200 (CEST) Alexandre DERUMIER wrote:
What about writes with Giant? I'm around
- 4k iops (4k random) with 1 osd (1 node - 1 osd)
- 8k iops (4k random) with 2 osd (1 node - 2 osd)
- 16K iops (4k random) with 4 osd (2 nodes - 2 osd by node)
- 22K iops (4k random) with 6 osd (3 nodes - 2 osd by node)
Seem to scale, but I'm cpu bound on node (8 cores E5-2603 v2 @ 1.80GHz 100% cpu for 2 osd)

You don't even need a full SSD cluster to see that Ceph has a lot of room for improvements, see my Slow IOPS on RBD compared to journal and backing devices thread in May. As Dieter asked, what replication level is this, I guess 1? Now at 3 nodes and 6 OSDs you're getting about the performance of a single SSD, food for thought.

Christian

- Original message -
From: Sebastien Han sebastien@enovance.com
To: Jian Zhang jian.zh...@intel.com
Cc: Alexandre DERUMIER aderum...@odiso.com, ceph-users@lists.ceph.com
Sent: Tuesday, 23 September 2014 17:41:38
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS

What about writes with Giant?

On 18 Sep 2014, at 08:12, Zhang, Jian jian.zh...@intel.com wrote:
Has anyone ever tested multi-volume performance on a *FULL* SSD setup? We are able to get ~18K IOPS for 4K random read on a single volume with fio (with rbd engine) on a 12x DC3700 setup, but only able to get ~23K (peak) IOPS even with multiple volumes. Seems the maximum random write performance we can get on the entire cluster is quite close to single volume performance.

Thanks
Jian

-Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sebastien Han Sent: Tuesday, September 16, 2014 9:33 PM To: Alexandre DERUMIER Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS

Hi, Thanks for keeping us updated on this subject. dsync is definitely killing the ssd. I don't have much to add, I'm just surprised that you're only getting 5299 with 0.85 since I've been able to get 6,4K, well I was using the 200GB model, that might explain this.

On 12 Sep 2014, at 16:32, Alexandre DERUMIER aderum...@odiso.com wrote:
here the results for the intel s3500 max performance is with ceph 0.85 + optracker disabled. intel s3500 don't have d_sync problem like crucial %util show almost 100% for read and write, so maybe the ssd disk performance is the limit. I have some stec zeusram 8GB in stock (I used them for zfs zil), I'll try to bench them next week.
INTEL s3500 --- raw disk randread: fio --filename=/dev/sdb --direct=1 --rw=randread --bs=4k --iodepth=32 --group_reporting --invalidate=0 --name=abc --ioengine=aio bw=288207KB/s, iops=72051 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdb 0,00 0,00 73454,00 0,00 293816,00 0,00 8,00 30,96 0,42 0,42 0,00 0,01 99,90 randwrite: fio --filename=/dev/sdb --direct=1 --rw=randwrite --bs=4k --iodepth=32 --group_reporting --invalidate=0 --name=abc --ioengine=aio --sync=1 bw=48131KB/s, iops=12032 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdb 0,00 0,00 0,00 24120,00 0,00 48240,00 4,00 2,08 0,09 0,00 0,09 0,04 100,00 ceph 0.80 - randread: no tuning: bw=24578KB/s, iops=6144 randwrite: bw=10358KB/s, iops=2589 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdb 0,00 373,00 0,00 8878,00 0,00 34012,50 7,66 1,63 0,18 0,00 0,18 0,06 50,90 ceph 0.85 : - randread : bw=41406KB/s, iops=10351 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdb 2,00 0,00 10425,00 0,00 41816,00 0,00 8,02 1,36 0,13 0,13 0,00 0,07 75,90 randwrite : bw=17204KB/s, iops=4301 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdb 0,00 333,00 0,00 9788,00 0,00 57909,00 11,83 1,46 0,15 0,00 0,15 0,07 67,80 ceph 0.85 tuning op_tracker=false randread :
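For completeness, the "fio with rbd engine" runs referred to above are typically driven like the sketch below. It assumes an fio build with rbd support; the pool and image names are placeholders for an existing test image, not values taken from this thread:

 fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=fio-test \
     --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
     --group_reporting --name=rbd-4k-randread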
Re: [ceph-users] bug: ceph-deploy does not support jumbo frame
Thanks. I had not configured the switch - I just learned about that now.

On 2014-09-25 12:38:48, Irek Fasikhov malm...@gmail.com wrote:
Have you configured the switch?

2014-09-25 5:07 GMT+04:00 yuelongguang fasts...@163.com:
Hi all, after I set mtu=9000, ceph-deploy waits for a reply all the time at 'detecting platform for host.' How can I find out what commands ceph-deploy needs that OSD host to run? Thanks

--
Best regards, Irek Fasikhov
Mob.: +79229045757

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
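For others hitting this: jumbo frames have to be enabled end to end (the NICs on every node and the switch ports in between), and it is worth verifying that large frames actually pass before running ceph-deploy. The interface and host names below are examples:

 # set the MTU on the cluster-facing interface
 ip link set dev eth0 mtu 9000
 # verify a full-size frame passes without fragmentation
 # (8972 = 9000 minus 20 bytes IP header and 8 bytes ICMP header)
 ping -M do -s 8972 -c 3 <other-osd-host>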
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
Guys, Have done some testing with 3.16.3-031603-generic downloaded from Ubuntu utopic branch. The hang task problem is gone when using large block size (tested with 1M and 4M) and I could no longer preproduce the hang tasks while doing 100 dd tests in a for loop. However, I can confirm that I am still getting hang tasks while working with a 4K block size. The hang tasks start after about an hour, but they do not cause the server crash. After a while the dd test times out and continues with the loop. This is what I was running: for i in {1..100} ; do time dd if=/dev/zero of=/tmp/mount/1G bs=4K count=25K oflag=direct ; done The following test definately produces the hang tasks like these: [23160.549785] INFO: task dd:2033 blocked for more than 120 seconds. [23160.588364] Tainted: G OE 3.16.3-031603-generic #201409171435 [23160.627998] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [23160.706856] dd D 000b 0 2033 23859 0x [23160.706861] 88011cec78c8 0082 88011cec78d8 88011cec7fd8 [23160.706865] 000143c0 000143c0 88048661bcc0 880113441440 [23160.706868] 88011cec7898 88067fd54cc0 880113441440 880113441440 [23160.706871] Call Trace: [23160.706883] [81791f69] schedule+0x29/0x70 [23160.706887] [8179203f] io_schedule+0x8f/0xd0 [23160.706893] [81219e74] dio_await_completion+0x54/0xd0 [23160.706897] [8121c6a8] do_blockdev_direct_IO+0x958/0xcc0 [23160.706903] [810ba81e] ? wake_up_bit+0x2e/0x40 [23160.706908] [812aa865] ? jbd2_journal_dirty_metadata+0xc5/0x260 [23160.706914] [81265320] ? ext4_get_block_write+0x20/0x20 [23160.706919] [8121ca5c] __blockdev_direct_IO+0x4c/0x50 [23160.706922] [81265320] ? ext4_get_block_write+0x20/0x20 [23160.706928] [8129f44e] ext4_ind_direct_IO+0xce/0x410 [23160.706931] [81265320] ? ext4_get_block_write+0x20/0x20 [23160.706935] [81261fbb] ext4_ext_direct_IO+0x1bb/0x2a0 [23160.706938] [81290158] ? __ext4_journal_stop+0x78/0xa0 [23160.706942] [812627fc] ext4_direct_IO+0xec/0x1e0 [23160.706946] [8120a003] ? __mark_inode_dirty+0x53/0x2d0 [23160.706952] [8116d39b] generic_file_direct_write+0xbb/0x180 [23160.706957] [811ffbe2] ? mnt_clone_write+0x12/0x30 [23160.706960] [8116d707] __generic_file_write_iter+0x2a7/0x350 [23160.706963] [8125c2b1] ext4_file_write_iter+0x111/0x3d0 [23160.706969] [81192fd4] ? iov_iter_init+0x14/0x40 [23160.706976] [811e0c8b] new_sync_write+0x7b/0xb0 [23160.706978] [811e19a7] vfs_write+0xc7/0x1f0 [23160.706980] [811e1eaf] SyS_write+0x4f/0xb0 [23160.706985] [81795ded] system_call_fastpath+0x1a/0x1f [23280.705400] INFO: task dd:2033 blocked for more than 120 seconds. [23280.745358] Tainted: G OE 3.16.3-031603-generic #201409171435 [23280.785069] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [23280.864158] dd D 000b 0 2033 23859 0x [23280.864164] 88011cec78c8 0082 88011cec78d8 88011cec7fd8 [23280.864167] 000143c0 000143c0 88048661bcc0 880113441440 [23280.864170] 88011cec7898 88067fd54cc0 880113441440 880113441440 [23280.864173] Call Trace: [23280.864185] [81791f69] schedule+0x29/0x70 [23280.864197] [8179203f] io_schedule+0x8f/0xd0 [23280.864203] [81219e74] dio_await_completion+0x54/0xd0 [23280.864207] [8121c6a8] do_blockdev_direct_IO+0x958/0xcc0 [23280.864213] [810ba81e] ? wake_up_bit+0x2e/0x40 [23280.864218] [812aa865] ? jbd2_journal_dirty_metadata+0xc5/0x260 [23280.864224] [81265320] ? ext4_get_block_write+0x20/0x20 [23280.864229] [8121ca5c] __blockdev_direct_IO+0x4c/0x50 [23280.864239] [81265320] ? 
ext4_get_block_write+0x20/0x20 [23280.864244] [8129f44e] ext4_ind_direct_IO+0xce/0x410 [23280.864247] [81265320] ? ext4_get_block_write+0x20/0x20 [23280.864251] [81261fbb] ext4_ext_direct_IO+0x1bb/0x2a0 [23280.864254] [81290158] ? __ext4_journal_stop+0x78/0xa0 [23280.864258] [812627fc] ext4_direct_IO+0xec/0x1e0 [23280.864263] [8120a003] ? __mark_inode_dirty+0x53/0x2d0 [23280.864268] [8116d39b] generic_file_direct_write+0xbb/0x180 [23280.864273] [811ffbe2] ? mnt_clone_write+0x12/0x30 [23280.864284] [8116d707] __generic_file_write_iter+0x2a7/0x350 [23280.864289] [8125c2b1] ext4_file_write_iter+0x111/0x3d0 [23280.864295] [81192fd4] ? iov_iter_init+0x14/0x40 [23280.864300] [811e0c8b] new_sync_write+0x7b/0xb0 [23280.864302] [811e19a7] vfs_write+0xc7/0x1f0 [23280.864307] [811e1eaf] SyS_write+0x4f/0xb0 [23280.864314] [81795ded] system_call_fastpath+0x1a/0x1f [23400.861043] INFO: task dd:2033 blocked for
[ceph-users] pgs stuck in active+clean+replay state
Hi!

16 pgs in our ceph cluster have been in the active+clean+replay state for more than one day. All clients are working fine. Is this ok?

root@bastet-mon1:/# ceph -w
cluster fffeafa2-a664-48a7-979a-517e3ffa0da1
health HEALTH_OK
monmap e3: 3 mons at {1=10.92.8.80:6789/0,2=10.92.8.81:6789/0,3=10.92.8.82:6789/0}, election epoch 2570, quorum 0,1,2 1,2,3
osdmap e3108: 16 osds: 16 up, 16 in
pgmap v1419232: 8704 pgs, 6 pools, 513 GB data, 125 kobjects
2066 GB used, 10879 GB / 12945 GB avail
8688 active+clean
16 active+clean+replay
client io 3237 kB/s wr, 68 op/s

root@bastet-mon1:/# ceph pg dump | grep replay
dumped all in format plain
0.fd0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:29.902766 0'0 3108:2628 [0,7,14,8] [0,7,14,8] 0 0'0 2014-09-23 02:23:49.463704 0'0 2014-09-23 02:23:49.463704
0.e80 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:21.945082 0'0 3108:1823 [2,7,9,10] [2,7,9,10] 2 0'0 2014-09-22 14:37:32.910787 0'0 2014-09-22 14:37:32.910787
0.aa0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:29.326607 0'0 3108:2451 [0,7,15,12][0,7,15,12] 0 0'0 2014-09-23 00:39:10.717363 0'0 2014-09-23 00:39:10.717363
0.9c0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:29.325229 0'0 3108:1917 [0,7,9,12] [0,7,9,12] 0 0'0 2014-09-22 14:40:06.694479 0'0 2014-09-22 14:40:06.694479
0.9a0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:29.325074 0'0 3108:2486 [0,7,14,11][0,7,14,11] 0 0'0 2014-09-23 01:14:55.825900 0'0 2014-09-23 01:14:55.825900
0.910 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.839148 0'0 3108:1962 [0,7,9,10] [0,7,9,10] 0 0'0 2014-09-22 14:37:44.652796 0'0 2014-09-22 14:37:44.652796
0.8c0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.838683 0'0 3108:2635 [0,2,9,11] [0,2,9,11] 0 0'0 2014-09-23 01:52:52.390529 0'0 2014-09-23 01:52:52.390529
0.8b0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:21.215964 0'0 3108:1636 [2,0,8,14] [2,0,8,14] 2 0'0 2014-09-23 01:31:38.134466 0'0 2014-09-23 01:31:38.134466
0.500 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:35.869160 0'0 3108:1801 [7,2,15,10][7,2,15,10] 7 0'0 2014-09-20 08:38:53.963779 0'0 2014-09-13 10:27:26.977929
0.440 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:35.871409 0'0 3108:1819 [7,2,15,10][7,2,15,10] 7 0'0 2014-09-20 11:59:05.208164 0'0 2014-09-20 11:59:05.208164
0.390 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.653190 0'0 3108:1827 [0,2,9,10] [0,2,9,10] 0 0'0 2014-09-22 14:40:50.697850 0'0 2014-09-22 14:40:50.697850
0.320 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:10.970515 0'0 3108:1719 [2,0,14,9] [2,0,14,9] 2 0'0 2014-09-20 12:06:23.716480 0'0 2014-09-20 12:06:23.716480
0.2c0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.647268 0'0 3108:2540 [0,7,12,8] [0,7,12,8] 0 0'0 2014-09-22 23:44:53.387815 0'0 2014-09-22 23:44:53.387815
0.1f0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.651059 0'0 3108:2522 [0,2,14,11][0,2,14,11] 0 0'0 2014-09-22 23:38:16.315755 0'0 2014-09-22 23:38:16.315755
0.7 0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:35.848797 0'0 3108:1739 [7,0,12,10][7,0,12,10] 7 0'0 2014-09-22 14:43:38.224718 0'0 2014-09-22 14:43:38.224718
0.3 0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:08.885066 0'0 3108:1640 [2,0,11,15][2,0,11,15] 2 0'0 2014-09-20 06:18:55.987318 0'0 2014-09-20 06:18:55.987318

With best regards, Pavel.

___ ceph-users mailing list
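If someone wants to dig into one of these PGs, querying it and, if the replay flag never clears, restarting its primary OSD are low-impact things to try. osd.0 below is simply the primary of pg 0.fd in the dump above; this is a suggestion, not a documented fix:

 ceph pg 0.fd query          # what does the primary report for this PG?
 ceph pg map 0.fd            # confirms the up/acting set, e.g. [0,7,14,8]
 restart ceph-osd id=0       # Ubuntu/upstart; use your init system's equivalent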
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
Right, I've stopped the tests because it is just getting ridiculous. Without rbd cache enabled, dd tests run extremely slow: dd if=/dev/zero of=/tmp/mount/1G bs=1M count=1000 oflag=direct 230+0 records in 230+0 records out 241172480 bytes (241 MB) copied, 929.71 s, 259 kB/s Any thoughts why I am getting 250kb/s instead of expected 100MB/s+ with large block size? How do I investigate what's causing this crappy performance? Cheers Andrei - Original Message - From: Andrei Mikhailovsky and...@arhont.com To: Micha Krause mi...@krausam.de Cc: ceph-users@lists.ceph.com Sent: Thursday, 25 September, 2014 10:58:07 AM Subject: Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server Guys, Have done some testing with 3.16.3-031603-generic downloaded from Ubuntu utopic branch. The hang task problem is gone when using large block size (tested with 1M and 4M) and I could no longer preproduce the hang tasks while doing 100 dd tests in a for loop. However, I can confirm that I am still getting hang tasks while working with a 4K block size. The hang tasks start after about an hour, but they do not cause the server crash. After a while the dd test times out and continues with the loop. This is what I was running: for i in {1..100} ; do time dd if=/dev/zero of=/tmp/mount/1G bs=4K count=25K oflag=direct ; done The following test definately produces the hang tasks like these: [23160.549785] INFO: task dd:2033 blocked for more than 120 seconds. [23160.588364] Tainted: G OE 3.16.3-031603-generic #201409171435 [23160.627998] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [23160.706856] dd D 000b 0 2033 23859 0x [23160.706861] 88011cec78c8 0082 88011cec78d8 88011cec7fd8 [23160.706865] 000143c0 000143c0 88048661bcc0 880113441440 [23160.706868] 88011cec7898 88067fd54cc0 880113441440 880113441440 [23160.706871] Call Trace: [23160.706883] [81791f69] schedule+0x29/0x70 [23160.706887] [8179203f] io_schedule+0x8f/0xd0 [23160.706893] [81219e74] dio_await_completion+0x54/0xd0 [23160.706897] [8121c6a8] do_blockdev_direct_IO+0x958/0xcc0 [23160.706903] [810ba81e] ? wake_up_bit+0x2e/0x40 [23160.706908] [812aa865] ? jbd2_journal_dirty_metadata+0xc5/0x260 [23160.706914] [81265320] ? ext4_get_block_write+0x20/0x20 [23160.706919] [8121ca5c] __blockdev_direct_IO+0x4c/0x50 [23160.706922] [81265320] ? ext4_get_block_write+0x20/0x20 [23160.706928] [8129f44e] ext4_ind_direct_IO+0xce/0x410 [23160.706931] [81265320] ? ext4_get_block_write+0x20/0x20 [23160.706935] [81261fbb] ext4_ext_direct_IO+0x1bb/0x2a0 [23160.706938] [81290158] ? __ext4_journal_stop+0x78/0xa0 [23160.706942] [812627fc] ext4_direct_IO+0xec/0x1e0 [23160.706946] [8120a003] ? __mark_inode_dirty+0x53/0x2d0 [23160.706952] [8116d39b] generic_file_direct_write+0xbb/0x180 [23160.706957] [811ffbe2] ? mnt_clone_write+0x12/0x30 [23160.706960] [8116d707] __generic_file_write_iter+0x2a7/0x350 [23160.706963] [8125c2b1] ext4_file_write_iter+0x111/0x3d0 [23160.706969] [81192fd4] ? iov_iter_init+0x14/0x40 [23160.706976] [811e0c8b] new_sync_write+0x7b/0xb0 [23160.706978] [811e19a7] vfs_write+0xc7/0x1f0 [23160.706980] [811e1eaf] SyS_write+0x4f/0xb0 [23160.706985] [81795ded] system_call_fastpath+0x1a/0x1f [23280.705400] INFO: task dd:2033 blocked for more than 120 seconds. [23280.745358] Tainted: G OE 3.16.3-031603-generic #201409171435 [23280.785069] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. 
[23280.864158] dd D 000b 0 2033 23859 0x [23280.864164] 88011cec78c8 0082 88011cec78d8 88011cec7fd8 [23280.864167] 000143c0 000143c0 88048661bcc0 880113441440 [23280.864170] 88011cec7898 88067fd54cc0 880113441440 880113441440 [23280.864173] Call Trace: [23280.864185] [81791f69] schedule+0x29/0x70 [23280.864197] [8179203f] io_schedule+0x8f/0xd0 [23280.864203] [81219e74] dio_await_completion+0x54/0xd0 [23280.864207] [8121c6a8] do_blockdev_direct_IO+0x958/0xcc0 [23280.864213] [810ba81e] ? wake_up_bit+0x2e/0x40 [23280.864218] [812aa865] ? jbd2_journal_dirty_metadata+0xc5/0x260 [23280.864224] [81265320] ? ext4_get_block_write+0x20/0x20 [23280.864229] [8121ca5c] __blockdev_direct_IO+0x4c/0x50 [23280.864239] [81265320] ? ext4_get_block_write+0x20/0x20 [23280.864244] [8129f44e] ext4_ind_direct_IO+0xce/0x410 [23280.864247] [81265320] ? ext4_get_block_write+0x20/0x20 [23280.864251] [81261fbb] ext4_ext_direct_IO+0x1bb/0x2a0 [23280.864254] [81290158] ? __ext4_journal_stop+0x78/0xa0 [23280.864258]
[ceph-users] ceph debian systemd
Hi,

I'm using ceph version 0.80.5. I am trying to get a Ceph cluster working using Debian and systemd. I have already managed to install a Ceph cluster on Debian with sysvinit without any problem. But after installing everything using ceph-deploy without errors, after rebooting not all my OSDs start (they are not mounted), and what is more strange, at each reboot it's not the same OSDs that start, and some start 10 minutes later. I have this in the log:

Sep 25 12:18:23 addceph3 systemd-udevd[437]: '/usr/sbin/ceph-disk-activate /dev/sdh1' [1005] terminated by signal 9 (Killed)
Sep 25 12:18:23 addceph3 systemd-udevd[476]: timeout: killing '/usr/sbin/ceph-disk-activate /dev/sdq1' [1142]
Sep 25 12:18:23 addceph3 systemd-udevd[486]: timeout: killing '/usr/sbin/ceph-disk-activate /dev/sdg1' [998]
Sep 25 12:18:23 addceph3 systemd-udevd[486]: '/usr/sbin/ceph-disk-activate /dev/sdg1' [998] terminated by signal 9 (Killed)
Sep 25 12:18:23 addceph3 systemd-udevd[476]: '/usr/sbin/ceph-disk-activate /dev/sdq1' [1142] terminated by signal 9 (Killed)
Sep 25 12:18:23 addceph3 systemd-udevd[458]: timeout: killing '/usr/sbin/ceph-disk-activate /dev/sdi1' [1001]
Sep 25 12:18:23 addceph3 systemd-udevd[458]: '/usr/sbin/ceph-disk-activate /dev/sdi1' [1001] terminated by signal 9 (Killed)
Sep 25 12:18:23 addceph3 systemd-udevd[444]: timeout: killing '/usr/sbin/ceph-disk-activate /dev/sdj1' [1006]
Sep 25 12:18:23 addceph3 systemd-udevd[460]: timeout: killing '/usr/sbin/ceph-disk-activate /dev/sdk1' [1152]
Sep 25 12:18:23 addceph3 systemd-udevd[444]: '/usr/sbin/ceph-disk-activate /dev/sdj1' [1006] terminated by signal 9 (Killed)
Sep 25 12:18:23 addceph3 systemd-udevd[460]: '/usr/sbin/ceph-disk-activate /dev/sdk1' [1152] terminated by signal 9 (Killed)
Sep 25 12:18:23 addceph3 systemd-udevd[469]: timeout: killing '/usr/sbin/ceph-disk-activate /dev/sdm1' [1110]
Sep 25 12:18:23 addceph3 systemd-udevd[470]: timeout: killing '/usr/sbin/ceph-disk-activate /dev/sdp1' [1189]
Sep 25 12:18:23 addceph3 systemd-udevd[469]: '/usr/sbin/ceph-disk-activate /dev/sdm1' [1110] terminated by signal 9 (Killed)
Sep 25 12:18:23 addceph3 systemd-udevd[470]: '/usr/sbin/ceph-disk-activate /dev/sdp1' [1189] terminated by signal 9 (Killed)
Sep 25 12:18:23 addceph3 systemd-udevd[468]: timeout: killing '/usr/sbin/ceph-disk-activate /dev/sdl1' [1177]
Sep 25 12:18:23 addceph3 systemd-udevd[447]: timeout: killing '/usr/sbin/ceph-disk-activate /dev/sdo1' [1181]
Sep 25 12:18:23 addceph3 systemd-udevd[468]: '/usr/sbin/ceph-disk-activate /dev/sdl1' [1177] terminated by signal 9 (Killed)
Sep 25 12:18:23 addceph3 systemd-udevd[447]: '/usr/sbin/ceph-disk-activate /dev/sdo1' [1181] terminated by signal 9 (Killed)
Sep 25 12:18:23 addceph3 systemd-udevd[490]: timeout: killing '/usr/sbin/ceph-disk-activate /dev/sdr1' [1160]
Sep 25 12:18:23 addceph3 systemd-udevd[490]: '/usr/sbin/ceph-disk-activate /dev/sdr1' [1160] terminated by signal 9 (Killed)
Sep 25 12:18:23 addceph3 systemd-udevd[445]: timeout: killing '/usr/sbin/ceph-disk-activate /dev/sdn1' [1202]
Sep 25 12:18:23 addceph3 systemd-udevd[445]: '/usr/sbin/ceph-disk-activate /dev/sdn1' [1202] terminated by signal 9 (Killed)
Sep 25 12:18:23 addceph3 kernel: [ 39.813701] XFS (sdo1): Mounting Filesystem
Sep 25 12:18:23 addceph3 kernel: [ 39.854510] XFS (sdo1): Ending clean mount
Sep 25 12:22:59 addceph3 systemd[1]: ceph.service operation timed out. Terminating.
Sep 25 12:22:59 addceph3 systemd[1]: Failed to start LSB: Start Ceph distributed file system daemons at boot time.
I'm not actually very experienced with systemd and don't really know how Ceph handles systemd. If someone can give me a bit of information, thanks.

--
probeSys - GNU/Linux specialist
web site: http://www.probesys.com

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
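Until the udev timeout itself is understood, a workaround that at least shows whether the OSDs are otherwise healthy is to activate them by hand after boot. The device name is taken from the log excerpt above and will differ per node; activate-all exists in recent ceph-disk versions, otherwise activate each partition individually:

 # re-run activation for all prepared OSD partitions
 ceph-disk activate-all
 # or activate a single device that udev gave up on
 ceph-disk activate /dev/sdh1
 # then check that they are mounted and registered as up/in
 mount | grep ceph
 ceph osd tree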
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
On Thu, Sep 25, 2014 at 1:58 PM, Andrei Mikhailovsky and...@arhont.com wrote: Guys, Have done some testing with 3.16.3-031603-generic downloaded from Ubuntu utopic branch. The hang task problem is gone when using large block size (tested with 1M and 4M) and I could no longer preproduce the hang tasks while doing 100 dd tests in a for loop. However, I can confirm that I am still getting hang tasks while working with a 4K block size. The hang tasks start after about an hour, but they do not cause the server crash. After a while the dd test times out and continues with the loop. This is what I was running: for i in {1..100} ; do time dd if=/dev/zero of=/tmp/mount/1G bs=4K count=25K oflag=direct ; done The following test definately produces the hang tasks like these: [23160.549785] INFO: task dd:2033 blocked for more than 120 seconds. [23160.588364] Tainted: G OE 3.16.3-031603-generic #201409171435 [23160.627998] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [23160.706856] dd D 000b 0 2033 23859 0x [23160.706861] 88011cec78c8 0082 88011cec78d8 88011cec7fd8 [23160.706865] 000143c0 000143c0 88048661bcc0 880113441440 [23160.706868] 88011cec7898 88067fd54cc0 880113441440 880113441440 [23160.706871] Call Trace: [23160.706883] [81791f69] schedule+0x29/0x70 [23160.706887] [8179203f] io_schedule+0x8f/0xd0 [23160.706893] [81219e74] dio_await_completion+0x54/0xd0 [23160.706897] [8121c6a8] do_blockdev_direct_IO+0x958/0xcc0 [23160.706903] [810ba81e] ? wake_up_bit+0x2e/0x40 [23160.706908] [812aa865] ? jbd2_journal_dirty_metadata+0xc5/0x260 [23160.706914] [81265320] ? ext4_get_block_write+0x20/0x20 [23160.706919] [8121ca5c] __blockdev_direct_IO+0x4c/0x50 [23160.706922] [81265320] ? ext4_get_block_write+0x20/0x20 [23160.706928] [8129f44e] ext4_ind_direct_IO+0xce/0x410 [23160.706931] [81265320] ? ext4_get_block_write+0x20/0x20 [23160.706935] [81261fbb] ext4_ext_direct_IO+0x1bb/0x2a0 [23160.706938] [81290158] ? __ext4_journal_stop+0x78/0xa0 [23160.706942] [812627fc] ext4_direct_IO+0xec/0x1e0 [23160.706946] [8120a003] ? __mark_inode_dirty+0x53/0x2d0 [23160.706952] [8116d39b] generic_file_direct_write+0xbb/0x180 [23160.706957] [811ffbe2] ? mnt_clone_write+0x12/0x30 [23160.706960] [8116d707] __generic_file_write_iter+0x2a7/0x350 [23160.706963] [8125c2b1] ext4_file_write_iter+0x111/0x3d0 [23160.706969] [81192fd4] ? iov_iter_init+0x14/0x40 [23160.706976] [811e0c8b] new_sync_write+0x7b/0xb0 [23160.706978] [811e19a7] vfs_write+0xc7/0x1f0 [23160.706980] [811e1eaf] SyS_write+0x4f/0xb0 [23160.706985] [81795ded] system_call_fastpath+0x1a/0x1f [23280.705400] INFO: task dd:2033 blocked for more than 120 seconds. [23280.745358] Tainted: G OE 3.16.3-031603-generic #201409171435 [23280.785069] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [23280.864158] dd D 000b 0 2033 23859 0x [23280.864164] 88011cec78c8 0082 88011cec78d8 88011cec7fd8 [23280.864167] 000143c0 000143c0 88048661bcc0 880113441440 [23280.864170] 88011cec7898 88067fd54cc0 880113441440 880113441440 [23280.864173] Call Trace: [23280.864185] [81791f69] schedule+0x29/0x70 [23280.864197] [8179203f] io_schedule+0x8f/0xd0 [23280.864203] [81219e74] dio_await_completion+0x54/0xd0 [23280.864207] [8121c6a8] do_blockdev_direct_IO+0x958/0xcc0 [23280.864213] [810ba81e] ? wake_up_bit+0x2e/0x40 [23280.864218] [812aa865] ? jbd2_journal_dirty_metadata+0xc5/0x260 [23280.864224] [81265320] ? ext4_get_block_write+0x20/0x20 [23280.864229] [8121ca5c] __blockdev_direct_IO+0x4c/0x50 [23280.864239] [81265320] ? 
ext4_get_block_write+0x20/0x20 [23280.864244] [8129f44e] ext4_ind_direct_IO+0xce/0x410 [23280.864247] [81265320] ? ext4_get_block_write+0x20/0x20 [23280.864251] [81261fbb] ext4_ext_direct_IO+0x1bb/0x2a0 [23280.864254] [81290158] ? __ext4_journal_stop+0x78/0xa0 [23280.864258] [812627fc] ext4_direct_IO+0xec/0x1e0 [23280.864263] [8120a003] ? __mark_inode_dirty+0x53/0x2d0 [23280.864268] [8116d39b] generic_file_direct_write+0xbb/0x180 [23280.864273] [811ffbe2] ? mnt_clone_write+0x12/0x30 [23280.864284] [8116d707] __generic_file_write_iter+0x2a7/0x350 [23280.864289] [8125c2b1] ext4_file_write_iter+0x111/0x3d0 [23280.864295] [81192fd4] ? iov_iter_init+0x14/0x40 [23280.864300] [811e0c8b] new_sync_write+0x7b/0xb0 [23280.864302]
Re: [ceph-users] [ceph-calamari] Setting up Ceph calamari :: Made Simple
Karan, Thanks for the tutorial, great stuff. Please note that in order to get the graphs working, I had to install ipvsadm and create a symlink from /sbin/ipvsadm to /usr/bin/ipvsadm (CentOS 6). On Wed, Sep 24, 2014 at 10:16 AM, Karan Singh karan.si...@csc.fi wrote: Hello Cepher’s Now here comes my new blog on setting up Ceph Calamari. I hope you would like this step-by-step guide http://karan-mj.blogspot.fi/2014/09/ceph-calamari-survival-guide.html - Karan - ___ ceph-calamari mailing list ceph-calam...@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com -- Met vriendelijke groeten / With kind regards, Johan Kooijman ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
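In case it saves someone a search, the workaround Johan describes amounts to the following on CentOS 6 (as stated above):

 yum install -y ipvsadm
 ln -s /sbin/ipvsadm /usr/bin/ipvsadm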
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
Ilya, I've not used rbd map on older kernels. Just experimenting with rbd map to have an iscsi and nfs gateway service for hypervisors such as xenserver and vmware. I've tried it with the latest ubuntu LTS kernel 3.13 I believe and noticed the issue. Can you not reproduce the hang tasks when doing dd testing? have you tried 4K block sizes and running it for sometime, like I have done? Thanks Andrei - Original Message - From: Ilya Dryomov ilya.dryo...@inktank.com To: Andrei Mikhailovsky and...@arhont.com Cc: Micha Krause mi...@krausam.de, ceph-users@lists.ceph.com Sent: Thursday, 25 September, 2014 12:04:37 PM Subject: Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server On Thu, Sep 25, 2014 at 1:58 PM, Andrei Mikhailovsky and...@arhont.com wrote: Guys, Have done some testing with 3.16.3-031603-generic downloaded from Ubuntu utopic branch. The hang task problem is gone when using large block size (tested with 1M and 4M) and I could no longer preproduce the hang tasks while doing 100 dd tests in a for loop. However, I can confirm that I am still getting hang tasks while working with a 4K block size. The hang tasks start after about an hour, but they do not cause the server crash. After a while the dd test times out and continues with the loop. This is what I was running: for i in {1..100} ; do time dd if=/dev/zero of=/tmp/mount/1G bs=4K count=25K oflag=direct ; done The following test definately produces the hang tasks like these: [23160.549785] INFO: task dd:2033 blocked for more than 120 seconds. [23160.588364] Tainted: G OE 3.16.3-031603-generic #201409171435 [23160.627998] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [23160.706856] dd D 000b 0 2033 23859 0x [23160.706861] 88011cec78c8 0082 88011cec78d8 88011cec7fd8 [23160.706865] 000143c0 000143c0 88048661bcc0 880113441440 [23160.706868] 88011cec7898 88067fd54cc0 880113441440 880113441440 [23160.706871] Call Trace: [23160.706883] [81791f69] schedule+0x29/0x70 [23160.706887] [8179203f] io_schedule+0x8f/0xd0 [23160.706893] [81219e74] dio_await_completion+0x54/0xd0 [23160.706897] [8121c6a8] do_blockdev_direct_IO+0x958/0xcc0 [23160.706903] [810ba81e] ? wake_up_bit+0x2e/0x40 [23160.706908] [812aa865] ? jbd2_journal_dirty_metadata+0xc5/0x260 [23160.706914] [81265320] ? ext4_get_block_write+0x20/0x20 [23160.706919] [8121ca5c] __blockdev_direct_IO+0x4c/0x50 [23160.706922] [81265320] ? ext4_get_block_write+0x20/0x20 [23160.706928] [8129f44e] ext4_ind_direct_IO+0xce/0x410 [23160.706931] [81265320] ? ext4_get_block_write+0x20/0x20 [23160.706935] [81261fbb] ext4_ext_direct_IO+0x1bb/0x2a0 [23160.706938] [81290158] ? __ext4_journal_stop+0x78/0xa0 [23160.706942] [812627fc] ext4_direct_IO+0xec/0x1e0 [23160.706946] [8120a003] ? __mark_inode_dirty+0x53/0x2d0 [23160.706952] [8116d39b] generic_file_direct_write+0xbb/0x180 [23160.706957] [811ffbe2] ? mnt_clone_write+0x12/0x30 [23160.706960] [8116d707] __generic_file_write_iter+0x2a7/0x350 [23160.706963] [8125c2b1] ext4_file_write_iter+0x111/0x3d0 [23160.706969] [81192fd4] ? iov_iter_init+0x14/0x40 [23160.706976] [811e0c8b] new_sync_write+0x7b/0xb0 [23160.706978] [811e19a7] vfs_write+0xc7/0x1f0 [23160.706980] [811e1eaf] SyS_write+0x4f/0xb0 [23160.706985] [81795ded] system_call_fastpath+0x1a/0x1f [23280.705400] INFO: task dd:2033 blocked for more than 120 seconds. [23280.745358] Tainted: G OE 3.16.3-031603-generic #201409171435 [23280.785069] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. 
[23280.864158] dd D 000b 0 2033 23859 0x [23280.864164] 88011cec78c8 0082 88011cec78d8 88011cec7fd8 [23280.864167] 000143c0 000143c0 88048661bcc0 880113441440 [23280.864170] 88011cec7898 88067fd54cc0 880113441440 880113441440 [23280.864173] Call Trace: [23280.864185] [81791f69] schedule+0x29/0x70 [23280.864197] [8179203f] io_schedule+0x8f/0xd0 [23280.864203] [81219e74] dio_await_completion+0x54/0xd0 [23280.864207] [8121c6a8] do_blockdev_direct_IO+0x958/0xcc0 [23280.864213] [810ba81e] ? wake_up_bit+0x2e/0x40 [23280.864218] [812aa865] ? jbd2_journal_dirty_metadata+0xc5/0x260 [23280.864224] [81265320] ? ext4_get_block_write+0x20/0x20 [23280.864229] [8121ca5c] __blockdev_direct_IO+0x4c/0x50 [23280.864239] [81265320] ? ext4_get_block_write+0x20/0x20 [23280.864244] [8129f44e] ext4_ind_direct_IO+0xce/0x410 [23280.864247]
[ceph-users] v0.67.11 dumpling released
v0.67.11 Dumpling
===

This stable update for Dumpling fixes several important bugs that affect a small set of users. We recommend that all Dumpling users upgrade at their convenience. If none of these issues are affecting your deployment there is no urgency.

Notable Changes
---

* common: fix sending dup cluster log items (#9080 Sage Weil)
* doc: several doc updates (Alfredo Deza)
* libcephfs-java: fix build against older JNI headers (Greg Farnum)
* librados: fix crash in op timeout path (#9362 Matthias Kiefer, Sage Weil)
* librbd: fix crash using clone of flattened image (#8845 Josh Durgin)
* librbd: fix error path cleanup when failing to open image (#8912 Josh Durgin)
* mon: fix crash when adjusting pg_num before any OSDs are added (#9052 Sage Weil)
* mon: reduce log noise from paxos (Aanchal Agrawal, Sage Weil)
* osd: allow scrub and snap trim thread pool IO priority to be adjusted (Sage Weil)
* osd: fix mount/remount sync race (#9144 Sage Weil)

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.67.11.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
On Thu, Sep 25, 2014 at 7:06 PM, Andrei Mikhailovsky and...@arhont.com wrote: Ilya, I've not used rbd map on older kernels. Just experimenting with rbd map to have an iscsi and nfs gateway service for hypervisors such as xenserver and vmware. I've tried it with the latest ubuntu LTS kernel 3.13 I believe and noticed the issue. Can you not reproduce the hang tasks when doing dd testing? have you tried 4K block sizes and running it for sometime, like I have done? I forget which block size I tried, but it was one that you reported on the tracker, I didn't make up my own. I'll try it exactly the way you described in your previous mail. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Icehouse Ceph -- live migration fails?
Hi!

We have an Icehouse system running with librbd based Cinder and Glance configurations, storing images and volumes in Ceph. Configuration is (apart from network setup details, of course) by the book / OpenStack setup guide. Works very nicely, including regular migration, but live migration of virtual machines fails.

I created a simple machine booting from a volume based off the Ubuntu 14.04.1 cloud image for testing. Using Horizon, I can move this VM from host to host, but when I try to Live Migrate it from one baremetal host to another, I get an error message "Failed to live migrate instance to host 'node02'". The only related log entry I recognize is in the controller's nova-api.log:

2014-09-25 17:15:47.679 3616 INFO nova.api.openstack.wsgi [req-f3dc3c2e-d366-40c5-a1f1-31db71afd87a f833f8e2d1104e66b9abe9923751dcf2 a908a95a87cc42cd87ff97da4733c414] HTTP exception thrown: Compute service of node02.baremetal.clusterb.centerdevice.local is unavailable at this time.
2014-09-25 17:15:47.680 3616 INFO nova.osapi_compute.wsgi.server [req-f3dc3c2e-d366-40c5-a1f1-31db71afd87a f833f8e2d1104e66b9abe9923751dcf2 a908a95a87cc42cd87ff97da4733c414] 10.102.6.8 POST /v2/a908a95a87cc42cd87ff97da4733c414/servers/0f762f35-64ee-461f-baa4-30f5de4d5ddf/action HTTP/1.1 status: 400 len: 333 time: 0.1479030

I cannot see anything of value on the destination host itself. New machines get scheduled there, so the compute service cannot really be down.

In this thread http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-March/019944.html Travis describes a similar situation, however that was on Folsom, so I wonder if it is still applicable.

Would be great to get some outside opinion :)

Thanks!
Daniel

--
Daniel Schneller Mobile Development Lead
CenterDevice GmbH | Merscheider Straße 1 | 42699 Solingen
tel: +49 1754155711 | Deutschland
daniel.schnel...@centerdevice.com | www.centerdevice.com

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
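Two things are worth checking that often explain exactly this symptom. The commands are standard OpenStack tooling; the nova.conf flags are a common Icehouse/libvirt live-migration setup and an assumption about the environment, not something taken from Daniel's configuration:

 # does the controller really see nova-compute as up on node02?
 nova service-list
 nova-manage service list | grep node02
 # a typical [libvirt] section for live migration with shared (RBD-backed) storage
 [libvirt]
 live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST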
Re: [ceph-users] v0.67.11 dumpling released
On 9/25/2014 11:09 AM, Sage Weil wrote: v0.67.11 Dumpling === This stable update for Dumpling fixes several important bugs that affect a small set of users. We recommend that all Dumpling users upgrade at their convenience. If none of these issues are affecting your deployment there is no urgency. Notable Changes --- * common: fix sending dup cluster log items (#9080 Sage Weil) * doc: several doc updates (Alfredo Deza) * libcephfs-java: fix build against older JNI headesr (Greg Farnum) * librados: fix crash in op timeout path (#9362 Matthias Kiefer, Sage Weil) * librbd: fix crash using clone of flattened image (#8845 Josh Durgin) * librbd: fix error path cleanup when failing to open image (#8912 Josh Durgin) * mon: fix crash when adjusting pg_num before any OSDs are added (#9052 Sage Weil) * mon: reduce log noise from paxos (Aanchal Agrawal, Sage Weil) * osd: allow scrub and snap trim thread pool IO priority to be adjusted (Sage Weil) Sage, Thanks for the great work! Could you provide any links describing how to tune the scrub and snap trim thread pool IO priority? I couldn't find these settings in the docs. IIUC, 0.67.11 does not include the proposed changes to address #9487 or #9503, right? Thanks, Mike Dawson * osd: fix mount/remount sync race (#9144 Sage Weil) Getting Ceph * Git at git://github.com/ceph/ceph.git * Tarball at http://ceph.com/download/ceph-0.67.11.tar.gz * For packages, see http://ceph.com/docs/master/install/get-packages * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] v0.67.11 dumpling released
On Thu, 25 Sep 2014, Mike Dawson wrote: On 9/25/2014 11:09 AM, Sage Weil wrote: v0.67.11 Dumpling === This stable update for Dumpling fixes several important bugs that affect a small set of users. We recommend that all Dumpling users upgrade at their convenience. If none of these issues are affecting your deployment there is no urgency. Notable Changes --- * common: fix sending dup cluster log items (#9080 Sage Weil) * doc: several doc updates (Alfredo Deza) * libcephfs-java: fix build against older JNI headesr (Greg Farnum) * librados: fix crash in op timeout path (#9362 Matthias Kiefer, Sage Weil) * librbd: fix crash using clone of flattened image (#8845 Josh Durgin) * librbd: fix error path cleanup when failing to open image (#8912 Josh Durgin) * mon: fix crash when adjusting pg_num before any OSDs are added (#9052 Sage Weil) * mon: reduce log noise from paxos (Aanchal Agrawal, Sage Weil) * osd: allow scrub and snap trim thread pool IO priority to be adjusted (Sage Weil) Sage, Thanks for the great work! Could you provide any links describing how to tune the scrub and snap trim thread pool IO priority? I couldn't find these settings in the docs. It's osd disk thread ioprio class = idle osd disk thread ioprio priority = 0 Note that this is a short-term solution; we eventaully want to send all IO through the same queue so that we can prioritize things more carefully. This setting will most likely go away in the future. IIUC, 0.67.11 does not include the proposed changes to address #9487 or #9503, right? Correct. That will come later once it's gone through more testing. Thanks! sage ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
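As a quick sketch of trying this out (the runtime-injection form is an assumption on my part, the spelling of the class value differs between releases as discussed further down the thread, and the settings should also go into the [osd] section of ceph.conf to survive restarts):

# inject on one OSD first and watch the effect on client latency
ceph tell osd.0 injectargs '--osd_disk_thread_ioprio_class idle --osd_disk_thread_ioprio_priority 0'

# ceph.conf
[osd]
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 0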
Re: [ceph-users] v0.67.11 dumpling released
Hi Mike, On 25 Sep 2014, at 17:47, Mike Dawson mike.daw...@cloudapt.com wrote: On 9/25/2014 11:09 AM, Sage Weil wrote: v0.67.11 Dumpling === This stable update for Dumpling fixes several important bugs that affect a small set of users. We recommend that all Dumpling users upgrade at their convenience. If none of these issues are affecting your deployment there is no urgency. Notable Changes --- * common: fix sending dup cluster log items (#9080 Sage Weil) * doc: several doc updates (Alfredo Deza) * libcephfs-java: fix build against older JNI headesr (Greg Farnum) * librados: fix crash in op timeout path (#9362 Matthias Kiefer, Sage Weil) * librbd: fix crash using clone of flattened image (#8845 Josh Durgin) * librbd: fix error path cleanup when failing to open image (#8912 Josh Durgin) * mon: fix crash when adjusting pg_num before any OSDs are added (#9052 Sage Weil) * mon: reduce log noise from paxos (Aanchal Agrawal, Sage Weil) * osd: allow scrub and snap trim thread pool IO priority to be adjusted (Sage Weil) Sage, Thanks for the great work! Could you provide any links describing how to tune the scrub and snap trim thread pool IO priority? I couldn't find these settings in the docs. I use: [osd] osd disk thread ioprio class = 3 osd disk thread ioprio priority = 0 You’ll need to use the cfq io scheduler for those to have an effect. FYI, I can make scrubs generally transparent by also adding: osd scrub sleep = .1 osd scrub chunk max = 5 osd deep scrub stride = 1048576 Your mileage may vary. IIUC, 0.67.11 does not include the proposed changes to address #9487 or #9503, right? Those didn’t make it. Cheers, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
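On the cfq requirement: a quick way to check and switch the I/O scheduler on an OSD data disk at runtime (sdb is only an example device; make the change persistent via a udev rule or your boot scripts):

cat /sys/block/sdb/queue/scheduler
echo cfq > /sys/block/sdb/queue/scheduler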
Re: [ceph-users] [Ceph-maintainers] v0.67.11 dumpling released
Hi, On 25/09/2014 17:53, Sage Weil wrote: On Thu, 25 Sep 2014, Mike Dawson wrote: On 9/25/2014 11:09 AM, Sage Weil wrote: v0.67.11 Dumpling === This stable update for Dumpling fixes several important bugs that affect a small set of users. We recommend that all Dumpling users upgrade at their convenience. If none of these issues are affecting your deployment there is no urgency. Notable Changes --- * common: fix sending dup cluster log items (#9080 Sage Weil) * doc: several doc updates (Alfredo Deza) * libcephfs-java: fix build against older JNI headesr (Greg Farnum) * librados: fix crash in op timeout path (#9362 Matthias Kiefer, Sage Weil) * librbd: fix crash using clone of flattened image (#8845 Josh Durgin) * librbd: fix error path cleanup when failing to open image (#8912 Josh Durgin) * mon: fix crash when adjusting pg_num before any OSDs are added (#9052 Sage Weil) * mon: reduce log noise from paxos (Aanchal Agrawal, Sage Weil) * osd: allow scrub and snap trim thread pool IO priority to be adjusted (Sage Weil) Sage, Thanks for the great work! Could you provide any links describing how to tune the scrub and snap trim thread pool IO priority? I couldn't find these settings in the docs. It's osd disk thread ioprio class = idle osd disk thread ioprio priority = 0 Note that this is a short-term solution; we eventaully want to send all IO through the same queue so that we can prioritize things more carefully. This setting will most likely go away in the future. The documentation for these can be found at http://ceph.com/docs/giant/rados/configuration/osd-config-ref/#operations Control-f ioprio Cheers IIUC, 0.67.11 does not include the proposed changes to address #9487 or #9503, right? Correct. That will come later once it's gone through more testing. Thanks! sage ___ Ceph-maintainers mailing list ceph-maintain...@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-maintainers-ceph.com -- Loïc Dachary, Artisan Logiciel Libre signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] v0.67.11 dumpling released
On Thu, 25 Sep 2014, Dan Van Der Ster wrote: Hi Mike, On 25 Sep 2014, at 17:47, Mike Dawson mike.daw...@cloudapt.com wrote: On 9/25/2014 11:09 AM, Sage Weil wrote: v0.67.11 Dumpling === This stable update for Dumpling fixes several important bugs that affect a small set of users. We recommend that all Dumpling users upgrade at their convenience. If none of these issues are affecting your deployment there is no urgency. Notable Changes --- * common: fix sending dup cluster log items (#9080 Sage Weil) * doc: several doc updates (Alfredo Deza) * libcephfs-java: fix build against older JNI headesr (Greg Farnum) * librados: fix crash in op timeout path (#9362 Matthias Kiefer, Sage Weil) * librbd: fix crash using clone of flattened image (#8845 Josh Durgin) * librbd: fix error path cleanup when failing to open image (#8912 Josh Durgin) * mon: fix crash when adjusting pg_num before any OSDs are added (#9052 Sage Weil) * mon: reduce log noise from paxos (Aanchal Agrawal, Sage Weil) * osd: allow scrub and snap trim thread pool IO priority to be adjusted (Sage Weil) Sage, Thanks for the great work! Could you provide any links describing how to tune the scrub and snap trim thread pool IO priority? I couldn't find these settings in the docs. I use: [osd] osd disk thread ioprio class = 3 Sigh.. it looks like the version that went into master and firefly uses the string names for classes while the dumpling patch takes the numeric ID. Oops. You'll need to take some care to adjust this setting when you upgrade. sage ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
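So, assuming the numeric values follow the usual Linux ioprio classes (1 = realtime, 2 = best effort, 3 = idle), the same intent would be written differently depending on the release, roughly:

# dumpling (numeric form)
[osd]
osd disk thread ioprio class = 3
osd disk thread ioprio priority = 0

# firefly / master (string form)
[osd]
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 0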
Re: [ceph-users] [ceph-calamari] Setting up Ceph calamari :: Made Simple
Can you explain this a little more, Johan? I've never even heard of ipvsadmin or its facilities before today, and it ought not be required... On Sep 25, 2014 7:04 AM, Johan Kooijman m...@johankooijman.com wrote: Karan, Thanks for the tutorial, great stuff. Please note that in order to get the graphs working, I had to install ipvsadm and create a symlink from /sbin/ipvsadm to /usr/bin/ipvsadm (CentOS 6). On Wed, Sep 24, 2014 at 10:16 AM, Karan Singh karan.si...@csc.fi wrote: Hello Cepher’s Now here comes my new blog on setting up Ceph Calamari. I hope you would like this step-by-step guide http://karan-mj.blogspot.fi/2014/09/ceph-calamari-survival-guide.html - Karan - ___ ceph-calamari mailing list ceph-calam...@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com -- Met vriendelijke groeten / With kind regards, Johan Kooijman ___ ceph-calamari mailing list ceph-calam...@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
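For reference, the workaround Johan describes would be roughly the following on CentOS 6 (apparently some part of the Calamari graphing pipeline calls ipvsadm and expects to find it in /usr/bin):

yum install -y ipvsadm
ln -s /sbin/ipvsadm /usr/bin/ipvsadm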
Re: [ceph-users] pgs stuck in active+clean+replay state
I imagine you aren't actually using the data/metadata pool that these PGs are in, but it's a previously-reported bug we haven't identified: http://tracker.ceph.com/issues/8758 They should go away if you restart the OSDs that host them (or just remove those pools), but it's not going to hurt anything as long as you aren't using them. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Thu, Sep 25, 2014 at 3:37 AM, Pavel V. Kaygorodov pa...@inasan.ru wrote: Hi! 16 pgs in our ceph cluster are in active+clean+replay state more then one day. All clients are working fine. Is this ok? root@bastet-mon1:/# ceph -w cluster fffeafa2-a664-48a7-979a-517e3ffa0da1 health HEALTH_OK monmap e3: 3 mons at {1=10.92.8.80:6789/0,2=10.92.8.81:6789/0,3=10.92.8.82:6789/0}, election epoch 2570, quorum 0,1,2 1,2,3 osdmap e3108: 16 osds: 16 up, 16 in pgmap v1419232: 8704 pgs, 6 pools, 513 GB data, 125 kobjects 2066 GB used, 10879 GB / 12945 GB avail 8688 active+clean 16 active+clean+replay client io 3237 kB/s wr, 68 op/s root@bastet-mon1:/# ceph pg dump | grep replay dumped all in format plain 0.fd0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:29.902766 0'0 3108:2628 [0,7,14,8] [0,7,14,8] 0 0'0 2014-09-23 02:23:49.463704 0'0 2014-09-23 02:23:49.463704 0.e80 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:21.945082 0'0 3108:1823 [2,7,9,10] [2,7,9,10] 2 0'0 2014-09-22 14:37:32.910787 0'0 2014-09-22 14:37:32.910787 0.aa0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:29.326607 0'0 3108:2451 [0,7,15,12][0,7,15,12] 0 0'0 2014-09-23 00:39:10.717363 0'0 2014-09-23 00:39:10.717363 0.9c0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:29.325229 0'0 3108:1917 [0,7,9,12] [0,7,9,12] 0 0'0 2014-09-22 14:40:06.694479 0'0 2014-09-22 14:40:06.694479 0.9a0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:29.325074 0'0 3108:2486 [0,7,14,11][0,7,14,11] 0 0'0 2014-09-23 01:14:55.825900 0'0 2014-09-23 01:14:55.825900 0.910 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.839148 0'0 3108:1962 [0,7,9,10] [0,7,9,10] 0 0'0 2014-09-22 14:37:44.652796 0'0 2014-09-22 14:37:44.652796 0.8c0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.838683 0'0 3108:2635 [0,2,9,11] [0,2,9,11] 0 0'0 2014-09-23 01:52:52.390529 0'0 2014-09-23 01:52:52.390529 0.8b0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:21.215964 0'0 3108:1636 [2,0,8,14] [2,0,8,14] 2 0'0 2014-09-23 01:31:38.134466 0'0 2014-09-23 01:31:38.134466 0.500 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:35.869160 0'0 3108:1801 [7,2,15,10][7,2,15,10] 7 0'0 2014-09-20 08:38:53.963779 0'0 2014-09-13 10:27:26.977929 0.440 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:35.871409 0'0 3108:1819 [7,2,15,10][7,2,15,10] 7 0'0 2014-09-20 11:59:05.208164 0'0 2014-09-20 11:59:05.208164 0.390 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.653190 0'0 3108:1827 [0,2,9,10] [0,2,9,10] 0 0'0 2014-09-22 14:40:50.697850 0'0 2014-09-22 14:40:50.697850 0.320 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:10.970515 0'0 3108:1719 [2,0,14,9] [2,0,14,9] 2 0'0 2014-09-20 12:06:23.716480 0'0 2014-09-20 12:06:23.716480 0.2c0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.647268 0'0 3108:2540 [0,7,12,8] [0,7,12,8] 0 0'0 2014-09-22 23:44:53.387815 0'0 2014-09-22 23:44:53.387815 0.1f0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.651059 0'0 3108:2522 [0,2,14,11][0,2,14,11] 0 0'0 2014-09-22 23:38:16.315755 0'0 2014-09-22 23:38:16.315755 0.7 0 0 0 0 0 0 0
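A minimal sketch of the restart Greg suggests, assuming a sysvinit setup and taking the acting set from the ceph pg dump output in the quoted message (pg 0.fd maps to OSDs [0,7,14,8]):

ceph pg map 0.fd
# then, on the node hosting each OSD in the acting set:
service ceph restart osd.0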
Re: [ceph-users] v0.67.11 dumpling released
Looks like the packages have partially hit the repo, but at least the following are missing: Failed to fetch http://ceph.com/debian-dumpling/pool/main/c/ceph/librbd1_0.67.11-1precise_amd64.deb 404 Not Found Failed to fetch http://ceph.com/debian-dumpling/pool/main/c/ceph/librados2_0.67.11-1precise_amd64.deb 404 Not Found Failed to fetch http://ceph.com/debian-dumpling/pool/main/c/ceph/python-ceph_0.67.11-1precise_amd64.deb 404 Not Found Failed to fetch http://ceph.com/debian-dumpling/pool/main/c/ceph/ceph_0.67.11-1precise_amd64.deb 404 Not Found Failed to fetch http://ceph.com/debian-dumpling/pool/main/c/ceph/libcephfs1_0.67.11-1precise_amd64.deb 404 Not Found Based on the timestamps of the files that made it, it looks like the process to publish the packages isn't still in process, but rather failed yesterday. Thanks, Mike Dawson On 9/25/2014 11:09 AM, Sage Weil wrote: v0.67.11 Dumpling === This stable update for Dumpling fixes several important bugs that affect a small set of users. We recommend that all Dumpling users upgrade at their convenience. If none of these issues are affecting your deployment there is no urgency. Notable Changes --- * common: fix sending dup cluster log items (#9080 Sage Weil) * doc: several doc updates (Alfredo Deza) * libcephfs-java: fix build against older JNI headesr (Greg Farnum) * librados: fix crash in op timeout path (#9362 Matthias Kiefer, Sage Weil) * librbd: fix crash using clone of flattened image (#8845 Josh Durgin) * librbd: fix error path cleanup when failing to open image (#8912 Josh Durgin) * mon: fix crash when adjusting pg_num before any OSDs are added (#9052 Sage Weil) * mon: reduce log noise from paxos (Aanchal Agrawal, Sage Weil) * osd: allow scrub and snap trim thread pool IO priority to be adjusted (Sage Weil) * osd: fix mount/remount sync race (#9144 Sage Weil) Getting Ceph * Git at git://github.com/ceph/ceph.git * Tarball at http://ceph.com/download/ceph-0.67.11.tar.gz * For packages, see http://ceph.com/docs/master/install/get-packages * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] v0.67.11 dumpling released
On Thu, Sep 25, 2014 at 1:27 PM, Mike Dawson mike.daw...@cloudapt.com wrote: Looks like the packages have partially hit the repo, but at least the following are missing: Failed to fetch http://ceph.com/debian-dumpling/pool/main/c/ceph/librbd1_0.67.11-1precise_amd64.deb 404 Not Found Failed to fetch http://ceph.com/debian-dumpling/pool/main/c/ceph/librados2_0.67.11-1precise_amd64.deb 404 Not Found Failed to fetch http://ceph.com/debian-dumpling/pool/main/c/ceph/python-ceph_0.67.11-1precise_amd64.deb 404 Not Found Failed to fetch http://ceph.com/debian-dumpling/pool/main/c/ceph/ceph_0.67.11-1precise_amd64.deb 404 Not Found Failed to fetch http://ceph.com/debian-dumpling/pool/main/c/ceph/libcephfs1_0.67.11-1precise_amd64.deb 404 Not Found Based on the timestamps of the files that made it, it looks like the process to publish the packages isn't still in process, but rather failed yesterday. That is odd. I just went ahead and re-pushed the packages and they are now showing up. Thanks for letting us know! Thanks, Mike Dawson On 9/25/2014 11:09 AM, Sage Weil wrote: v0.67.11 Dumpling === This stable update for Dumpling fixes several important bugs that affect a small set of users. We recommend that all Dumpling users upgrade at their convenience. If none of these issues are affecting your deployment there is no urgency. Notable Changes --- * common: fix sending dup cluster log items (#9080 Sage Weil) * doc: several doc updates (Alfredo Deza) * libcephfs-java: fix build against older JNI headesr (Greg Farnum) * librados: fix crash in op timeout path (#9362 Matthias Kiefer, Sage Weil) * librbd: fix crash using clone of flattened image (#8845 Josh Durgin) * librbd: fix error path cleanup when failing to open image (#8912 Josh Durgin) * mon: fix crash when adjusting pg_num before any OSDs are added (#9052 Sage Weil) * mon: reduce log noise from paxos (Aanchal Agrawal, Sage Weil) * osd: allow scrub and snap trim thread pool IO priority to be adjusted (Sage Weil) * osd: fix mount/remount sync race (#9144 Sage Weil) Getting Ceph * Git at git://github.com/ceph/ceph.git * Tarball at http://ceph.com/download/ceph-0.67.11.tar.gz * For packages, see http://ceph.com/docs/master/install/get-packages * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] pgs stuck in active+clean+replay state
Hi! I imagine you aren't actually using the data/metadata pool that these PGs are in, but it's a previously-reported bug we haven't identified: http://tracker.ceph.com/issues/8758 They should go away if you restart the OSDs that host them (or just remove those pools), but it's not going to hurt anything as long as you aren't using them. Thanks a lot, restarting of osds helps! BTW, I tried to delete data and metadata pools just after setup, but ceph refused me to do this. With best regards, Pavel. On Thu, Sep 25, 2014 at 3:37 AM, Pavel V. Kaygorodov pa...@inasan.ru wrote: Hi! 16 pgs in our ceph cluster are in active+clean+replay state more then one day. All clients are working fine. Is this ok? root@bastet-mon1:/# ceph -w cluster fffeafa2-a664-48a7-979a-517e3ffa0da1 health HEALTH_OK monmap e3: 3 mons at {1=10.92.8.80:6789/0,2=10.92.8.81:6789/0,3=10.92.8.82:6789/0}, election epoch 2570, quorum 0,1,2 1,2,3 osdmap e3108: 16 osds: 16 up, 16 in pgmap v1419232: 8704 pgs, 6 pools, 513 GB data, 125 kobjects 2066 GB used, 10879 GB / 12945 GB avail 8688 active+clean 16 active+clean+replay client io 3237 kB/s wr, 68 op/s root@bastet-mon1:/# ceph pg dump | grep replay dumped all in format plain 0.fd0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:29.902766 0'0 3108:2628 [0,7,14,8] [0,7,14,8] 0 0'0 2014-09-23 02:23:49.463704 0'0 2014-09-23 02:23:49.463704 0.e80 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:21.945082 0'0 3108:1823 [2,7,9,10] [2,7,9,10] 2 0'0 2014-09-22 14:37:32.910787 0'0 2014-09-22 14:37:32.910787 0.aa0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:29.326607 0'0 3108:2451 [0,7,15,12][0,7,15,12] 0 0'0 2014-09-23 00:39:10.717363 0'0 2014-09-23 00:39:10.717363 0.9c0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:29.325229 0'0 3108:1917 [0,7,9,12] [0,7,9,12] 0 0'0 2014-09-22 14:40:06.694479 0'0 2014-09-22 14:40:06.694479 0.9a0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:29.325074 0'0 3108:2486 [0,7,14,11][0,7,14,11] 0 0'0 2014-09-23 01:14:55.825900 0'0 2014-09-23 01:14:55.825900 0.910 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.839148 0'0 3108:1962 [0,7,9,10] [0,7,9,10] 0 0'0 2014-09-22 14:37:44.652796 0'0 2014-09-22 14:37:44.652796 0.8c0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.838683 0'0 3108:2635 [0,2,9,11] [0,2,9,11] 0 0'0 2014-09-23 01:52:52.390529 0'0 2014-09-23 01:52:52.390529 0.8b0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:21.215964 0'0 3108:1636 [2,0,8,14] [2,0,8,14] 2 0'0 2014-09-23 01:31:38.134466 0'0 2014-09-23 01:31:38.134466 0.500 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:35.869160 0'0 3108:1801 [7,2,15,10][7,2,15,10] 7 0'0 2014-09-20 08:38:53.963779 0'0 2014-09-13 10:27:26.977929 0.440 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:35.871409 0'0 3108:1819 [7,2,15,10][7,2,15,10] 7 0'0 2014-09-20 11:59:05.208164 0'0 2014-09-20 11:59:05.208164 0.390 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.653190 0'0 3108:1827 [0,2,9,10] [0,2,9,10] 0 0'0 2014-09-22 14:40:50.697850 0'0 2014-09-22 14:40:50.697850 0.320 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:10.970515 0'0 3108:1719 [2,0,14,9] [2,0,14,9] 2 0'0 2014-09-20 12:06:23.716480 0'0 2014-09-20 12:06:23.716480 0.2c0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.647268 0'0 3108:2540 [0,7,12,8] [0,7,12,8] 0 0'0 2014-09-22 23:44:53.387815 0'0 2014-09-22 23:44:53.387815 0.1f0 0 0 0 0 0 0 active+clean+replay 2014-09-24 02:38:28.651059 0'0 3108:2522 [0,2,14,11][0,2,14,11] 0 0'0 2014-09-22 23:38:16.315755
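On the pool-deletion refusal: the monitors require the pool name twice plus an explicit confirmation flag, and depending on the release the default data/metadata pools may additionally be protected while an MDS/CephFS still references them, so something like the following may or may not be accepted:

ceph osd pool delete data data --yes-i-really-really-mean-it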
Re: [ceph-users] Any way to remove possible orphaned files in a federated gateway configuration
Thanks Yehuda for your response, much appreciated. Using the radosgw-admin object stat option I was able to reconcile the objects on master and slave. There are 10 objects on the master that have replicated to the slave, for these 10 objects I was able to confirm by pulling the tag prefix from object stat, verifying size, name, etc. There are still a large number of shadow files in .region-1.zone-2.rgw.buckets pool which have no corresponding object to cross reference using object stat command. These files are taking up several hundred GB from OSD's on the region-2 cluster. What would be the correct way to remove these shadow files that no longer have objects associated? Is there a process that will clean these orphaned objects? Any steps anyone can provide to remove these files would greatly appreciated. BTW - Since my original post several objects have been copied via s3 client to the master and everything appears to be replicating without issue. Objects have been deleted as well, the sync looks fine, objects are being removed from master and slave. I'm pretty sure the large number of orphaned shadow files that are currently in the .region-1.zone-2.rgw.buckets pool are from the original sync performed back on Sept. 15. Thanks in advance, MLM -Original Message- From: yehud...@gmail.com [mailto:yehud...@gmail.com] On Behalf Of Yehuda Sadeh Sent: Tuesday, September 23, 2014 5:30 PM To: lyn_mitch...@bellsouth.net Cc: ceph-users; ceph-commun...@lists.ceph.com Subject: Re: [ceph-users] Any way to remove possible orphaned files in a federated gateway configuration On Tue, Sep 23, 2014 at 3:05 PM, Lyn Mitchell mitc...@bellsouth.net wrote: Is anyone aware of a way to either reconcile or remove possible orphaned “shadow” files in a federated gateway configuration? The issue we’re seeing is the number of chunks/shadow files on the slave has many more “shadow” files than the master, the breakdown is as follows: master zone: .region-1.zone-1.rgw.buckets = 1737 “shadow” files of which there are 10 distinct sets of tags, an example of 1 distinct set is: alph-1.80907.1__shadow_.VTZYW5ubV53wCHAKcnGwrD_yGkyGDuG_1 through alph-1.80907.1__shadow_.VTZYW5ubV53wCHAKcnGwrD_yGkyGDuG_516 slave zone: .region-1.zone-2.rgw.buckets = 331961 “shadow” files, of which there are 652 distinct sets of tags, examples: 1 set having 516 “shadow” files: alph-1.80907.1__shadow_.yPT037fjWhTi_UtHWSYPcRWBanaN9Oy_1 through alph-1.80907.1__shadow_.yPT037fjWhTi_UtHWSYPcRWBanaN9Oy_516 236 sets having 515 “shadow” files apiece: alph-1.80907.1__shadow_.RA9KCc_U5T9kBN_ggCUx8VLJk36RSiw_1 through alph-1.80907.1__shadow_.RA9KCc_U5T9kBN_ggCUx8VLJk36RSiw_515 alph-1.80907.1__shadow_.aUWuanLbJD5vbBSD90NWwjkuCxQmvbQ_1 through alph-1.80907.1__shadow_.aUWuanLbJD5vbBSD90NWwjkuCxQmvbQ_515 These are all part of the same bucket (prefixed by alph-1.80907.1). …. The number of shadow files in zone-2 is taking quite a bit of space from the OSD’s in the cluster. Without being able to trace back to the original file name from an s3 or rados tag, I have no way of knowing which files these are. Is it possible that the same file may have been replicated multiple times, due to network or connectivity issues? I can provide any logs or other information that may provide some help, however at this point we’re not seeing any real errors. 
Thanks in advance for any help that can be provided, You can also run the following command on the existing objects within that specific bucket: $ radosgw-admin object stat --bucket=bucket --object=object This will show the mapping from the rgw object to the rados objects that construct it. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
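For what it's worth, the reconciliation described above can be scripted roughly as below. The bucket and object names are placeholders, and nothing should be removed until it has been confirmed unreferenced by every surviving object's manifest, since deleting a shadow object that is still in use will corrupt the RGW object it belongs to:

rados -p .region-1.zone-2.rgw.buckets ls | grep __shadow_ > shadow_objects.txt
radosgw-admin object stat --bucket=mybucket --object=myobject   # note the tag used in the manifest
# only once a shadow object is confirmed orphaned:
rados -p .region-1.zone-2.rgw.buckets rm 'alph-1.80907.1__shadow_.yPT037fjWhTi_UtHWSYPcRWBanaN9Oy_1'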
Re: [ceph-users] RBD import slow
On 09/24/2014 04:57 PM, Brian Rak wrote: I've been doing some testing of importing virtual machine images, and I've found that 'rbd import' is at least 2x as slow as 'qemu-img convert'. Is there anything I can do to speed this process up? I'd like to use rbd import because it gives me a little additional flexibility. My test setup was a 40960MB LVM volume, and I used the following two commands: rbd import /dev/lvmtest/testvol test qemu-img convert /dev/lvmtest/testvol rbd:test/test rbd import took 13 minutes, qemu-img took 5. I'm at a loss to explain this, I would have expected rbd import to be faster. This is with ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6) rbd import was doing one synchronous I/O after another. Recently import and export were parallelized according to --rbd-concurrent-management-ops (default 10), which helps quite a bit. This will be in giant. Josh ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
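For reference, on giant the option can be passed straight on the command line; the pool/image names below follow the qemu-img example, and 10 is the stated default, so higher values are worth testing:

rbd import --rbd-concurrent-management-ops 20 /dev/lvmtest/testvol test/test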
[ceph-users] Best practice about using multiple disks on one single OSD
Hi, I have several servers and each server has 4 disks. Now I am going to set up Ceph on these servers and use all 4 disks, but it seems one OSD instance can only be configured with one backend storage device. So there seem to be two options: 1. Make the 4 disks into a RAID0 and set up the OSD to use it, but obviously this is not good because one disk failure will ruin the entire storage. 2. Build a filesystem on each disk and start 4 OSD instances on the server. Neither option looks good to me, so I am wondering what the best practice is for setting up multiple disks on a single OSD server with Ceph. Thanks! Best Regards, James Jiaming Pan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Best practice about using multiple disks on one single OSD
Hi James, the best practice is to set up 1 OSD daemon per physical disk drive. In your case, each OSD node would hence be 4 OSD daemons using one physical drive per daemon, and deploying a minimum of 3 servers so each object copy resides on a separate physical server. JC On Sep 25, 2014, at 20:42, James Pan dev...@yahoo.com wrote: Hi, I have several servers and each server has 4 disks. Now I am going to setup Ceph on these servers and use all the 4 disks but it seems one OSD instance can be configured with one backend storage. So there seems two options to me: 1. Make the 4 disks into a raid0 then setup OSD to use this raid0 but obviously this is not good because one disk failure will ruin the entire storage. 2. Build FS on each disk and start 4 OSD instances on the server. Both options are not good. So I am wondering what's the best practice of setting up multiple didks on one OSD for Ceph. Thanks! Best Regards, James Jiaming Pan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
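A minimal sketch of that layout with ceph-deploy, assuming four data disks sdb through sde per host (host and device names are placeholders; repeat for at least three servers):

ceph-deploy osd create server1:sdb server1:sdc server1:sdd server1:sde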
[ceph-users] iptables
Hello, On my Ceph cluster's OSD node there is a rule to REJECT all traffic. As per the documentation, I added a rule to allow traffic on the full range of ports, but the cluster will not come into a clean state. Can you please share your experience with the iptables configuration? Following are the INPUT rules: 5 ACCEPT tcp -- 10.108.240.192/26 0.0.0.0/0 multiport dports 6800:7100 6 REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited Thanks, ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
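In case it helps, a sketch of the rules usually needed, inserted above the catch-all REJECT. The source network is taken from the existing ACCEPT rule; monitor nodes additionally need 6789/tcp open, and newer releases use ports up to 7300 for OSDs, so the range may need widening. If you run a separate cluster network, the same openings are needed there too:

iptables -I INPUT 5 -p tcp -s 10.108.240.192/26 --dport 6789 -j ACCEPT
iptables -I INPUT 6 -p tcp -s 10.108.240.192/26 -m multiport --dports 6800:7300 -j ACCEPT
service iptables save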