[ceph-users] the first write some dd more slowly (This RBD-test is based on other RBD)
Hi everyone,

I have a problem with slow write speed for the first few dd runs on my root disk (the root-disk RBD is a clone of another RBD image). The first writes run slowly, but later runs are faster. (I use writeback caching on the RBD client side and physical RAID.)

# dd if=/dev/zero of=bigfile01 bs=1M count=500 conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 5,51315 s, 19,0 MB/s

Then:

# dd if=/dev/zero of=bigfile03 bs=1M count=500 conv=fdatasync
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 4,16175 s, 75,6 MB/s

Please explain this problem to me. Thanks!

--
Tuantaba
Ha Noi-VietNam

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
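A likely explanation (the thread itself does not confirm it) is copy-on-write: when an RBD image is a clone of another image, the first write to each object triggers a copy-up of that whole object from the parent, so first writes pay an extra read-and-write. A minimal sketch of the arithmetic, assuming the default 4 MB RBD object size:

```shell
# Assuming the default 4 MB RBD object size, a 100 MB first write to a
# fresh clone touches 25 objects, and each one must first be copied up
# from the parent image before the write can complete.
awk 'BEGIN {
    write_mb = 100; object_mb = 4
    printf "objects copied up on first write: %d\n", write_mb / object_mb
}'
# prints: objects copied up on first write: 25
```

Once every object has been copied up (or after `rbd flatten` removes the parent dependency entirely), writes no longer pay this penalty, which would match the speed-up seen on later runs.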
[ceph-users] [Cache-tier] librbd: error finding source object: (2) No such file or directory
Hi everyone,

I have been using a cache tier on a data pool. After a long time, many RBD images are no longer displayed by rbd -p data ls, although those images still show up via the rbd info and rados ls commands:

# rbd -p data info volume-008ae4f7-3464-40c0-80b0-51140d8b95a8
rbd image 'volume-008ae4f7-3464-40c0-80b0-51140d8b95a8':
        size 128 GB in 32768 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.10c1c102eb141f2
        format: 2
        features: layering
        flags:

And, grepping by block_name_prefix:

# rados -p data ls | grep 10c1c102eb141f2
rbd_header.10c1c102eb141f2

Or:

# rados -p data ls | grep volume-008ae4f7-3464-40c0-80b0-51140d8b95a8
rbd_id.volume-008ae4f7-3464-40c0-80b0-51140d8b95a8

Everything seems normal. But when I tried to move (rename) the image above, I received the following error:

# rbd mv data/volume-008ae4f7-3464-40c0-80b0-51140d8b95a8 data/volume-008ae4f7-3464-40c0-80b0-51140d8b95a8_new
rbd: rename error: (2) No such file or directory
2015-08-19 10:46:07.175525 7fb8b0985840 -1 librbd: error finding source object: (2) No such file or directory

The rename action spawned a new RBD image and did not delete the original. And when deleting the image (the snapshot removal still succeeds, but the final step fails):

deleting data/volume-32e1fa85-2e03-4cbe-be36-09358aa6e7f4
Removing all snapshots: 100% complete...done.
Removing image: 99% complete...failed.
rbd: delete error: (2) No such file or directory
2015-08-19 11:27:17.904695 7f9c32217840 -1 librbd: error removing img from new-style directory: (2) No such file or directory

What happened to those RBDs, and how can I fix this error? Thanks so much!

--
Tuan
HaNoi, VietNam
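For context: `rbd ls` reads the pool's image directory object rather than scanning the pool, so an image whose directory entry is lost disappears from the listing even though its `rbd_id.*` and `rbd_header.*` objects still exist, and rename/delete can then fail when librbd consults the directory. A sketch for spotting such images, using canned sample listings (the image names `vol-a`/`vol-b` are hypothetical) in place of live `rados -p data ls` and `rbd -p data ls` output:

```shell
# Sample data standing in for real cluster output: the pool contains two
# rbd_id.* objects, but `rbd ls` (the directory listing) shows only one.
printf 'rbd_id.vol-a\nrbd_id.vol-b\nrbd_header.123\n' \
    | sed -n 's/^rbd_id\.//p' | sort > /tmp/ids    # images with id objects
printf 'vol-a\n' | sort > /tmp/listed              # images shown by `rbd ls`

# Images present in the pool but absent from the directory listing:
comm -23 /tmp/ids /tmp/listed
# prints: vol-b
```

On a live cluster the two inputs would come from `rados -p data ls` and `rbd -p data ls`; any image printed here has fallen out of the image directory index, matching the symptom reported above.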
Re: [ceph-users] How to unset lfor setting (from cache pool)
I understand. Thank you, Gregory Farnum, for your explanation.

--
Tuantaba
Ha Noi-VietNam

On 07/04/2015 00:54, Gregory Farnum wrote:

On Mon, Apr 6, 2015 at 2:21 AM, Ta Ba Tuan tua...@vccloud.vn wrote:

Hi all, I once set up a cache pool for my pool, but had some problems with it running, so I removed the cache pool from my Ceph cluster. The data pool no longer uses a cache pool, but the lfor setting still appears. lfor seems to be a setting, not a flag.

pool 3 'data_pool' replicated size 3 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 4098 pgp_num 4098 last_change 145235 lfor 75731 flags hashpspool stripe_width 0

How do I unset lfor, and after unsetting it, will my pool operate OK?

lfor is not a config option; it's "last force op resend" and the value is an OSDMap epoch. You can see some details about exactly what it means in the v0.80.2 changelog.
-Greg

Thanks!

--
Tuantaba
Ha Noi-VietNam.
[ceph-users] How to unset lfor setting (from cache pool)
Hi all,

I once set up a cache pool for my pool, but had some problems with it running, so I removed the cache pool from my Ceph cluster. The data pool no longer uses a cache pool, but the lfor setting still appears. lfor seems to be a setting, not a flag.

pool 3 'data_pool' replicated size 3 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 4098 pgp_num 4098 last_change 145235 lfor 75731 flags hashpspool stripe_width 0

How do I unset lfor, and after unsetting it, will my pool operate OK?

Thanks!

--
Tuantaba
Ha Noi-VietNam.
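As noted elsewhere in this thread, lfor is not a flag but bookkeeping: "last force op resend", the OSDMap epoch at which clients were last told to resend in-flight operations (left behind by the removed cache tier), so there is nothing to unset. A small sketch that extracts it from the pool line quoted above, as printed by `ceph osd dump`:

```shell
# The pool line quoted above, as printed by `ceph osd dump`:
line="pool 3 'data_pool' replicated size 3 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 4098 pgp_num 4098 last_change 145235 lfor 75731 flags hashpspool stripe_width 0"

# lfor ("last force op resend") is an OSDMap epoch, not a flag --
# here, epoch 75731. It is harmless residual bookkeeping.
echo "$line" | sed -n 's/.*lfor \([0-9]*\).*/\1/p'
# prints: 75731
```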
[ceph-users] How to detect degraded objects
Hi everyone,

111/57706299 objects degraded (0.001%)
    14918 active+clean
        1 active+clean+scrubbing+deep
       52 active+recovery_wait+degraded
        2 active+recovering+degraded

Ceph's state shows 111/57706299 objects degraded. Some missing object(s) cause one OSD daemon to crash. How can I list the degraded objects? Guide me, please. Thanks!

-2196 2014-11-07 16:04:23.063584 7fe1aed83700 10 osd.21 pg_epoch: 107789 pg[6.9f0( v 107789'7058293 lc 107786'7058229 (107617'7055096,107789'7058293] local-les=107788 n=4506 ec=164 les/c 107788/107785 107787/107787/105273) [101,21,78] r=1 lpr=107787 pi=106418-107786/36 luod=0'0 crt=107786'7058241 lcod 107786'7058222 active m=1] got missing 1f7c69f0/rbd_data.885435b2bbeeb.59c2/head//6 v 107786'7058230

0 2014-11-07 16:14:57.024605 7f8602e3d700 -1 *** Caught signal (Aborted) **
 in thread 7f8602e3d700

 ceph version 0.87-6-gdba7def (dba7defc623474ad17263c9fccfec60fe7a439f0)
 1: /usr/bin/ceph-osd() [0x9b6725]
 2: (()+0xfcb0) [0x7f8626439cb0]
 3: (gsignal()+0x35) [0x7f8624d3e0d5]
 4: (abort()+0x17b) [0x7f8624d4183b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f862569069d]
 6: (()+0xb5846) [0x7f862568e846]
 7: (()+0xb5873) [0x7f862568e873]
 8: (()+0xb596e) [0x7f862568e96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0xaa0089]
 10: (ReplicatedPG::trim_object(hobject_t const&)+0x222d) [0x8139ed]
 11: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x43e) [0x82b9be]
 12: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xc0) [0x870ce0]
 13: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xfb) [0x85618b]
 14: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x85633e]
 15: (ReplicatedPG::snap_trimmer()+0x4f8) [0x7d5ef8]
 16: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x673ab4]
 17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xa8fade]
 18: (ThreadPool::WorkThread::entry()+0x10) [0xa92870]
 19: (()+0x7e9a) [0x7f8626431e9a]
 20: (clone()+0x6d) [0x7f8624dfc31d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--
Tuan
HaNoi-VietNam
Re: [ceph-users] How to detect degraded objects
Hi Sahana,

Thanks for your reply. But how do I list the objects of those PGs? :D

Thanks!
Tuan
--
HaNoi-VietNam

On 11/07/2014 04:22 PM, Sahana Lokeshappa wrote:

Hi Tuan,

14918 active+clean
    1 active+clean+scrubbing+deep
   52 active+recovery_wait+degraded
    2 active+recovering+degraded

This says that 2 + 52 PGs are degraded. You can run the command: ceph pg dump | grep degraded. You will get the list of PGs which are in the degraded state. The objects included in those PGs are in the degraded state.

Thanks
Sahana Lokeshappa

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ta Ba Tuan
Sent: Friday, November 07, 2014 2:49 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] How to detect degraded objects

Hi everyone,

111/57706299 objects degraded (0.001%)
    14918 active+clean
        1 active+clean+scrubbing+deep
       52 active+recovery_wait+degraded
        2 active+recovering+degraded

Ceph's state shows 111/57706299 objects degraded. Some missing object(s) cause one OSD daemon to crash. How can I list the degraded objects? Guide me, please. Thanks!
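Sahana's `ceph pg dump | grep degraded` identifies the degraded PGs. A sketch of that filtering step on a canned sample line (the real command needs a running cluster, so a stand-in line in the shape of `ceph pg dump` output is used here):

```shell
# One sample line standing in for real `ceph pg dump` output:
sample='6.9f0  4506  1  0  111  active+recovery_wait+degraded'

# Keep only PGs whose state column mentions "degraded", print the PG id:
echo "$sample" | awk '$NF ~ /degraded/ { print $1 }'
# prints: 6.9f0
```

To answer the follow-up question of listing the objects themselves: on a live cluster, the missing/degraded objects of a specific PG can then be inspected with `ceph pg <pgid> list_missing` (substituting the PG id found above).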
Re: [ceph-users] Ceph Giant not fixed RepllicatedPG:NotStrimming?
Hi David,

I re-uploaded the entire log at http://123.30.41.138/ceph-osd.21.log. Many, many entries like these appear :|

2014-11-04 18:24:38.641529 7f0fda7ac780 15 read_log missing 106395'4837671 (106395'4837670) modify 5479e128/rbd_data.74ae9c3be03aff.0b01/head//6 by client.7912580.0:19413835 2014-11-04 15:06:55.874814 (have 106384'4836070)
2014-11-04 18:24:38.641581 7f0fda7ac780 15 filestore(/var/lib/ceph/osd/cloud-21) getattr 6.128_head/573c4128/rbd_data.75551c509c613c.0780/head//6 '_'
...

Thank you
--
Tuan
HaNoi-VietNam

On 11/05/2014 11:36 AM, David Zafman wrote:

Can you upload the entire log file?

David

On Nov 4, 2014, at 1:03 AM, Ta Ba Tuan tua...@vccloud.vn wrote:

Hi Sam,

I resend the logs with the debug options: http://123.30.41.138/ceph-osd.21.log (sorry about my spam :D). I see many missing objects :|

2014-11-04 15:26:02.205607 7f3ab11a8700 10 osd.21 pg_epoch: 106407 pg[24.7d7( v 106407'491583 lc 106401'491579 (105805'487042,106407'491583] local-les=106403 n=179 ec=25000 les/c 106403/106390 106402/106402/106402) [21,28,4] r=0 lpr=106402 pi=106377-106401/4 rops=1 crt=106401'491581 mlcod 106393'491097 active+recovering+degraded m=2 snaptrimq=[306~1,312~1]] recover_primary 675ea7d7/rbd_data.4930222ae8944a.0001/head//24 106401'491580 (missing) (missing head) (recovering) (recovering head)
2014-11-04 15:26:02.205642 7f3ab11a8700 10 osd.21 pg_epoch: 106407 pg[24.7d7( v 106407'491583 lc 106401'491579 (105805'487042,106407'491583] local-les=106403 n=179 ec=25000 les/c 106403/106390 106402/106402/106402) [21,28,4] r=0 lpr=106402 pi=106377-106401/4 rops=1 crt=106401'491581 mlcod 106393'491097 active+recovering+degraded m=2 snaptrimq=[306~1,312~1]] recover_primary d4d4bfd7/rbd_data.c6964d30a28220.035f/head//24 106401'491581 (missing) (missing head)
2014-11-04 15:26:02.237994 7f3ab29ab700 10 osd.21 pg_epoch: 106407 pg[24.7d7( v 106407'491583 lc 106401'491579 (105805'487042,106407'491583] local-les=106403 n=179 ec=25000 les/c 106403/106390 106402/106402/106402) [21,28,4] r=0 lpr=106402 pi=106377-106401/4 rops=2 crt=106401'491581 mlcod 106393'491097 active+recovering+degraded m=2 snaptrimq=[306~1,312~1]] got missing d4d4bfd7/rbd_data.c6964d30a28220.035f/head//24 v 106401'491581

Thanks Sam and All,
--
Tuan
HaNoi-Vietnam

On 11/04/2014 04:54 AM, Samuel Just wrote:

Can you reproduce with

  debug osd = 20
  debug filestore = 20
  debug ms = 1

in the [osd] section of that osd's ceph.conf?
-Sam

On Sun, Nov 2, 2014 at 9:10 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

Hi Sage, Samuel, and all,

I upgraded to Giant, but those errors still appear :| I'm trying to delete the related objects/volumes, but it is very hard to verify the missing objects :(. Please guide me to resolve it! (I sent a detailed log as an attachment.)

2014-11-03 11:37:57.730820 7f28fb812700 0 osd.21 105950 do_command r=0
2014-11-03 11:37:57.856578 7f28fc013700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f28fc013700

 ceph version 0.87-6-gdba7def (dba7defc623474ad17263c9fccfec60fe7a439f0)
 1: /usr/bin/ceph-osd() [0x9b6725]
 2: (()+0xfcb0) [0x7f291fc2acb0]
 3: (ReplicatedPG::trim_object(hobject_t const&)+0x395) [0x811b55]
 4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x43e) [0x82b9be]
 5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xc0) [0x870ce0]
 6: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xfb) [0x85618b]
 7: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x85633e]
 8: (ReplicatedPG::snap_trimmer()+0x4f8) [0x7d5ef8]
 9: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x673ab4]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xa8fade]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xa92870]
 12: (()+0x7e9a) [0x7f291fc22e9a]
 13: (clone()+0x6d) [0x7f291e5ed31d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

-9993 2014-11-03 11:37:47.689335 7f28fc814700 1 -- 172.30.5.2:6803/7606 --> 172.30.5.1:6886/3511 -- MOSDPGPull(6.58e 105950 [PullOp(87f82d8e/rbd_data.45e62779c99cf1.22b5/head//6, recovery_info: ObjectRecoveryInfo(87f82d8e/rbd_data.45e62779c99cf1.22b5/head//6@105938'11622009, copy_subset: [0~18446744073709551615], clone_subset: {}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0
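Samuel Just's suggested debug settings above would look like this in the OSD's ceph.conf (a sketch; the OSD needs a restart, or the values can be injected at runtime, for them to take effect):

```ini
[osd]
    debug osd = 20
    debug filestore = 20
    debug ms = 1
```

These settings make the OSD log extremely verbose, so they are best reverted once the problem has been captured.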
Re: [ceph-users] Ceph Giant not fixed RepllicatedPG:NotStrimming?
Hi Sam,

I resend the logs with the debug options: http://123.30.41.138/ceph-osd.21.log (sorry about my spam :D). I see many missing objects :|

2014-11-04 15:26:02.205607 7f3ab11a8700 10 osd.21 pg_epoch: 106407 pg[24.7d7( v 106407'491583 lc 106401'491579 (105805'487042,106407'491583] local-les=106403 n=179 ec=25000 les/c 106403/106390 106402/106402/106402) [21,28,4] r=0 lpr=106402 pi=106377-106401/4 rops=1 crt=106401'491581 mlcod 106393'491097 active+recovering+degraded m=2 snaptrimq=[306~1,312~1]] recover_primary 675ea7d7/rbd_data.4930222ae8944a.0001/head//24 106401'491580 (missing) (missing head) (recovering) (recovering head)
2014-11-04 15:26:02.205642 7f3ab11a8700 10 osd.21 pg_epoch: 106407 pg[24.7d7( v 106407'491583 lc 106401'491579 (105805'487042,106407'491583] local-les=106403 n=179 ec=25000 les/c 106403/106390 106402/106402/106402) [21,28,4] r=0 lpr=106402 pi=106377-106401/4 rops=1 crt=106401'491581 mlcod 106393'491097 active+recovering+degraded m=2 snaptrimq=[306~1,312~1]] recover_primary d4d4bfd7/rbd_data.c6964d30a28220.035f/head//24 106401'491581 (missing) (missing head)
2014-11-04 15:26:02.237994 7f3ab29ab700 10 osd.21 pg_epoch: 106407 pg[24.7d7( v 106407'491583 lc 106401'491579 (105805'487042,106407'491583] local-les=106403 n=179 ec=25000 les/c 106403/106390 106402/106402/106402) [21,28,4] r=0 lpr=106402 pi=106377-106401/4 rops=2 crt=106401'491581 mlcod 106393'491097 active+recovering+degraded m=2 snaptrimq=[306~1,312~1]] got missing d4d4bfd7/rbd_data.c6964d30a28220.035f/head//24 v 106401'491581

Thanks Sam and All,
--
Tuan
HaNoi-Vietnam

On 11/04/2014 04:54 AM, Samuel Just wrote:

Can you reproduce with

  debug osd = 20
  debug filestore = 20
  debug ms = 1

in the [osd] section of that osd's ceph.conf?
-Sam

On Sun, Nov 2, 2014 at 9:10 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

Hi Sage, Samuel, and all,

I upgraded to Giant, but those errors still appear :| I'm trying to delete the related objects/volumes, but it is very hard to verify the missing objects :(. Please guide me to resolve it! (I sent a detailed log as an attachment.)

2014-11-03 11:37:57.730820 7f28fb812700 0 osd.21 105950 do_command r=0
2014-11-03 11:37:57.856578 7f28fc013700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f28fc013700

 ceph version 0.87-6-gdba7def (dba7defc623474ad17263c9fccfec60fe7a439f0)
 1: /usr/bin/ceph-osd() [0x9b6725]
 2: (()+0xfcb0) [0x7f291fc2acb0]
 3: (ReplicatedPG::trim_object(hobject_t const&)+0x395) [0x811b55]
 4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x43e) [0x82b9be]
 5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xc0) [0x870ce0]
 6: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xfb) [0x85618b]
 7: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x85633e]
 8: (ReplicatedPG::snap_trimmer()+0x4f8) [0x7d5ef8]
 9: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x673ab4]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xa8fade]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xa92870]
 12: (()+0x7e9a) [0x7f291fc22e9a]
 13: (clone()+0x6d) [0x7f291e5ed31d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

-9993 2014-11-03 11:37:47.689335 7f28fc814700 1 -- 172.30.5.2:6803/7606 --> 172.30.5.1:6886/3511 -- MOSDPGPull(6.58e 105950 [PullOp(87f82d8e/rbd_data.45e62779c99cf1.22b5/head//6, recovery_info: ObjectRecoveryInfo(87f82d8e/rbd_data.45e62779c99cf1.22b5/head//6@105938'11622009, copy_subset: [0~18446744073709551615], clone_subset: {}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 -- ?+0 0x26c59000 con 0x22fbc420
-2 2014-11-03 11:37:57.853585 7f2902820700 5 osd.21 pg_epoch: 105950 pg[24.9e4( v 105946'113392 lc 105946'113391 (103622'109598,105946'113392] local-les=105948 n=88 ec=25000 les/c 105948/105943 105947/105947/105947) [21,112,33] r=0 lpr=105947 pi=105933-105946/4 crt=105946'113392 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded m=1 snaptrimq=[303~3,307~1]] enter Started/Primary/Active/Recovering
-1 2014-11-03 11:37:57.853735 7f28fc814700 1 -- 172.30.5.2:6803/7606 --> 172.30.5.9:6806/24552 -- MOSDPGPull(24.9e4 105950 [PullOp(5abb99e4/rbd_data.5dd32f2ae8944a.0165/head//24, recovery_info
Re: [ceph-users] Ceph Giant not fixed RepllicatedPG:NotStrimming?
Hi Sage Weil,

Thanks for your reply. Yes, I'm using Ceph v0.86. I'm reporting some related bugs; I hope you can help me.

2014-10-31 15:34:52.927965 7f85efb6b700 0 osd.21 104744 do_command r=0
2014-10-31 15:34:53.105533 7f85f036c700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f85f036c700

 ceph version 0.86-106-g6f8524e (6f8524ef7673ab4448de2e0ff76638deaf03cae8)
 1: /usr/bin/ceph-osd() [0x9b6655]
 2: (()+0xfcb0) [0x7f8615726cb0]
 3: (ReplicatedPG::trim_object(hobject_t const&)+0x395) [0x811c25]
 4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x43e) [0x82baae]
 5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xc0) [0x870c30]
 6: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xfb) [0x8560db]
 7: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x8562ae]
 8: (ReplicatedPG::snap_trimmer()+0x4f8) [0x7d5f48]
 9: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x6739b4]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xa8fa0e]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xa927a0]
 12: (()+0x7e9a) [0x7f861571ee9a]
 13: (clone()+0x6d) [0x7f86140e931d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

-9523 2014-10-31 15:34:45.571962 7f85e3ee0700 5 -- op tracker -- seq: 6937, time: 2014-10-31 15:34:45.531887, event: header_read, op: MOSDPGPush(6.749 104744 [PushOp(d2106749/rbd_data.a2e6185b9a8ef8.0803/head//6, version: 104736'7736506, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(d2106749/rbd_data.a2e6185b9a8ef8.00000803/head//6@104736'7736506, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(60940749/rbd_data.3435875ff78f67.1408/head//6, version: 104736'7736579, data_included: [0~335360], data_size: 335360, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(60940749/rbd_data.3435875ff78f67.1408/head//6@104736'7736579, copy_subset: [0~335360], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:335360, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(922b1749/rbd_data.1c3dade6cdc10.14c5/head//6, version: 104736'7736866, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(922b1749/rbd_data.1c3dade6cdc10.14c5/head//6@104736'7736866, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])
-6933 2014-10-31 15:34tha7.611229 7f85f737a700 5 osd.21 pg_epoch: 104744 pg[6.749( v 104744'7741801 (104665'7732106,104744'7741801] lb 14886749/rbd_data.3955b9640616f2.f5e2/head//6 local-les=104661 n=1780 ec=164 les/c 104742/104735 104740/104741/103210) [74,112,21]/[74,112] r=-1 lpr=104741 pi=64005-104740/278 luod=0'0 crt=104744'7741798 active+remapped] enter Started/ReplicaActive/RepNotRecovering

I think there are some missing objects; I can't start the one OSD to which the objects above should be pushed. Do Ceph versions older than 0.86 have this bug? Should I upgrade to Giant to resolve it?

Thank you,
--
Tuan
HaNoi-VietNam

On 10/30/2014 10:02 PM, Sage Weil wrote:

On Thu, 30 Oct 2014, Ta Ba Tuan wrote:

Hi Everyone, I upgraded Ceph to Giant by installing the *.tar.gz package, but some errors related to object trimming or snap trimming appeared. I think there are some missing objects that are not being recovered.

Note that this isn't giant, which is 0.87, but something a few weeks older. There were a few bugs fixed in this code, but we can't tell
Re: [ceph-users] Ceph Giant not fixed RepllicatedPG:NotStrimming?
Hi Samuel and Sage,

I will upgrade to Giant soon. Thank you so much.

--
Tuan
HaNoi-VietNam

On 11/01/2014 01:10 AM, Samuel Just wrote:

You should start by upgrading to giant; many, many bug fixes went in between .86 and giant.
-Sam

On Fri, Oct 31, 2014 at 8:54 AM, Ta Ba Tuan tua...@vccloud.vn wrote:

Hi Sage Weil,

Thanks for your reply. Yes, I'm using Ceph v0.86. I'm reporting some related bugs; I hope you can help me. [The quoted crash log and recovery log are the same as in the previous message in this thread.]
[ceph-users] Ceph Giant not fixed RepllicatedPG:NotStrimming?
Hi Everyone, I upgraded Ceph to Giant by installing *tar.gz package, but appeared some errors related Object Trimming or Snap Trimming: I think having some missing objects and be not recovered. * ceph version 0.86*-106-g6f8524e (6f8524ef7673ab4448de2e0ff76638deaf03cae8) 1: /usr/bin/ceph-osd() [0x9b6655] 2: (()+0xfcb0) [0x7fa52c471cb0] 3: (ReplicatedPG::trim_object(hobject_t const)+0x395) [0x811c25] 4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const)+0x43e) [0x82baae] 5: (boost::statechart::simple_stateReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::listmpl_::na, mpl_::na, mpl_::na, mpl _::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na , mpl_::na, mpl_::na, (boost::statechart::history_mode)0::react_impl(boost::statechart::event_base const, void const*)+0xc0) [0x870c30] 6: (boost::statechart::state_machineReplicatedPG::*SnapTrimmer, ReplicatedPG::NotTrimming, *std::allocatorvoid, boost::statechart::null_excepti on_translator::process_queued_events()+0xfb) [0x8560db] 7: (boost::statechart::state_machine*ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, *std::allocatorvoid, boost::statechart::null_excepti on_translator::process_event(boost::statechart::event_base const)+0x1e) [0x8562ae] 8: (ReplicatedPG::snap_trimmer()+0x4f8) [0x7d5f48] 9: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x6739b4] 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xa8fa0e] 11: (ThreadPool::WorkThread::entry()+0x10) [0xa927a0] 12: (()+0x7e9a) [0x7fa52c469e9a] 13: (clone()+0x6d) [0x7fa52ae3431d] NOTE: a copy of the executable, or `objdump -rdS executable` is needed to interpret this. 
-128 2014-10-29 13:51:23.049357 7fa50ed9d700 5 osd.21 pg_epoch: 104445 pg[6.9d8( v 104445'7857889 (103730'7852406,104445'7857889] local-les=10 n=4345 ec=164 les/c 10/104272 104443/104443/104443) [21,93,49] r=0 lpr=104443 pi=103787-104442/16 crt=104442'7857887 mlcod 104445'7857888 active snaptrimq=[1907~1,1941~4,1946~1,19ef~2,19f2~1,19f4~3,19fa~5]] exit Started/Primary/Active/Recovered 0.000084 0 0.00 -127 2014-10-29 13:51:23.049392 7fa50ed9d700 5 osd.21 pg_epoch: 104445 pg[6.9d8( v 104445'7857889 (103730'7852406,104445'7857889] local-les=10 n=4345 ec=164 les/c 10/104272 104443/104443/104443) [21,93,49] r=0 lpr=104443 pi=103787-104442/16 crt=104442'7857887 mlcod 104445'7857888 active snaptrimq=[1907~1,1941~4,1946~1,19ef~2,19f2~1,19f4~3,19fa~5]] enter Started/Primary/Active/Clean -126 2014-10-29 13:51:23.049582 7fa50ed9d700 1 -- 172.30.5.2:6838/22980 -- 172.30.5.4:6859/8884 -- pg_info(1 pgs e104445:6.9d8) v4 -- ?+0 0x30d41c00 con 0x26c6ac60 Thank you! -- Tuan HaNoi-VietNam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
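For readers of these logs: the snaptrimq shown in the pg line above is an interval_set of snapshot ids, printed as hexadecimal start~length pairs. A minimal sketch (not a Ceph tool, just an aid for reading the queue) that expands such a set into individual snap ids:

```shell
# Expand a snaptrimq interval_set (hex start~length pairs, as printed
# in OSD pg log lines) into one hex snap id per line.
expand_snaptrimq() {
  echo "$1" | tr -d '[]' | tr ',' '\n' | while IFS='~' read -r start len; do
    s=$((16#$start)); l=$((16#$len))
    for ((i = 0; i < l; i++)); do printf '%x\n' $((s + i)); done
  done
}

# First three intervals from the log line above:
expand_snaptrimq "1907~1,1941~4,1946~1"
```

The three intervals expand to snap ids 1907, 1941 through 1944, and 1946, which gives a quick sense of how many snapshots the trimmer still has queued.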
Re: [ceph-users] Can't start osd - one osd is always down.
I'm sending some related logs (osd.21 cannot be started): -8705 2014-10-25 14:41:04.345727 7f12bac2f700 5 osd.21 pg_epoch: 102843 pg[6.5e1( v 102843'11832159 (102377'11822991,102843'11832159] lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000170 1 0.000296 -1637 2014-10-25 14:41:14.326580 7f12bac2f700 5 osd.21 pg_epoch: 102843 pg[2.23b( v 102839'91984 (91680'88526,102839'91984] local-les=102841 n=85 ec=25000 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 active] enter Started/ReplicaActive/RepNotRecovering -437 2014-10-25 14:41:15.042174 7f12ba42e700 5 osd.21 pg_epoch: 102843 pg[27.239( v 102808'38419 (81621'35409,102808'38419] local-les=102841 n=23 ec=25085 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 luod=0'0 crt=102808'38419 active] enter Started/ReplicaActive/RepNotRecovering Thanks! On 10/25/2014 11:26 AM, Ta Ba Tuan wrote: Hi Craig, Thanks for replying. When I started that osd, the ceph -w log warned that pgs 7.9d8, 23.596, 23.9c6, 23.63 can't recover, as in the pasted log. Those pgs are in active+degraded state. #ceph pg map 7.9d8 osdmap e102808 pg 7.9d8 (7.9d8) - up [93,49] acting [93,49] (When osd.21 starts, pg 7.9d8 and the three remaining pgs change to state active+recovering). 
osd.21 is still down after the following logs: 2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(7.9d8 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached 2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(23.596 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached 2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(23.9c6 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: 
ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached 2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included below; oldest blocked for 54.967456 secs 2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(23.63c 102803 [PushOp(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24, version: 102748'145637, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24@102748'145637, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false
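When slow requests like the ones above pile up, it helps to list which pgs the stuck MOSDPGPush messages belong to before checking each with ceph pg map. A minimal sketch, assuming the cluster-log format shown in this thread (pg id as the first token inside MOSDPGPush(...)):

```shell
# List the unique pg ids appearing in MOSDPGPush slow-request warnings
# read from stdin. Assumes the ceph.log format quoted in this thread.
extract_push_pgs() {
  grep -o 'MOSDPGPush([0-9]*\.[0-9a-f]*' | sed 's/MOSDPGPush(//' | sort -u
}

# Example with two of the warnings from above (trimmed):
extract_push_pgs <<'EOF'
2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old: MOSDPGPush(7.9d8 102803 [PushOp(...
2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old: MOSDPGPush(23.596 102803 [PushOp(...
EOF
```

In practice you would feed it the live log, e.g. `extract_push_pgs < /var/log/ceph/ceph.log`, then run `ceph pg map` on each reported pg.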
Re: [ceph-users] Can't start osd - one osd is always down.
My Ceph was hung, and osd.21 172.30.5.2:6870/8047 879 : [ERR] 6.9d8 has 4 objects unfound and apparently lost. After I restarted all ceph-data nodes, I can't start osd.21; there are many logs about pg 6.9d8 such as: -440 2014-10-25 19:28:17.468161 7fec5731d700 5 -- op tracker -- seq: 3083, time: 2014-10-25 19:28:17.468161, event: reached_pg, op: MOSDPGPush(6.9d8 102856 [PushOp(e8de59d8/rbd_data.4d091f7304c844.e871/head//6, version: 102853'7800592, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e8de59d8/rbd_data.4d091f7304c844.e871/head//6@102853'7800592, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) I think some objects are corrupted. What must I do, please? Thanks! 
-- Tuan HaNoi-VietNam On 10/25/2014 03:01 PM, Ta Ba Tuan wrote: I send some related bugs: (osd.21 not be able started) ...
Re: [ceph-users] Can't start osd - one osd is always down.
#ceph pg 6.9d8 query ... peer_info: [ { peer: 49, pgid: 6.9d8, last_update: 102889'7801917, last_complete: 102889'7801917, log_tail: 102377'7792649, last_user_version: 7801879, last_backfill: MAX, purged_snaps: [1~7,9~44b,455~1f8,64f~63,6b3~3a,6ee~12f,81f~10,830~8,839~69b,ed7~7,edf~4,ee4~6f5,15da~f9,16d4~1f,16f5~7,16fd~4,1705~5e,1764~7,1771~78,17eb~12,1800~2,1803~d,1812~3,181a~1,181c~a,1827~3b,1863~1,1865~1,1867~1,186b~e,187a~3,1881~1,1884~7,188c~1,188f~3,1894~5,189f~2,18ab~1,18c6~1,1922~13,193d~1,1940~1,194a~1,1968~5,1975~1,1979~4,197e~4,1984~1,1987~11,199c~1,19a0~1,19a3~9,19ad~3,19b2~1,19b6~27,19de~8], history: { epoch_created: 164, last_epoch_started: 102888, last_epoch_clean: 102888, last_epoch_split: 0, parent_split_bits: 0, last_scrub: 91654'7460936, last_scrub_stamp: 2014-10-10 10:36:25.433016, last_deep_scrub: 81667'5815892, last_deep_scrub_stamp: 2014-08-29 09:44:14.012219, last_clean_scrub_stamp: 2014-10-10 10:36:25.433016, log_size: 9229, ondisk_log_size: 9229, stats_invalid: 1, stat_sum: { num_bytes: 17870536192, num_objects: 4327, num_object_clones: 29, num_object_copies: 12981, num_objects_missing_on_primary: 4, num_objects_degraded: 4, num_objects_unfound: 0, num_objects_dirty: 1092, num_whiteouts: 0, num_read: 4820626, num_read_kb: 59073045, num_write: 12748709, num_write_kb: 181630845, num_scrub_errors: 0, num_shallow_scrub_errors: 0, num_deep_scrub_errors: 0, num_objects_recovered: 135847, num_bytes_recovered: 562255538176, num_keys_recovered: 0, num_objects_omap: 0, num_objects_hit_set_archive: 0}, On 10/25/2014 07:40 PM, Ta Ba Tuan wrote: My Ceph was hung, and osd.21 172.30.5.2:6870/8047 879 : [ERR] 6.9d8 has 4 objects unfound and apparently lost. 
Re: [ceph-users] Can't start osd - one osd is always down.
Hi Craig, Thanks for replying. When I started that osd, the ceph -w log warned that pgs 7.9d8, 23.596, 23.9c6, 23.63 can't recover, as in the pasted log. Those pgs are in active+degraded state. #ceph pg map 7.9d8 osdmap e102808 pg 7.9d8 (7.9d8) - up [93,49] acting [93,49] (When osd.21 starts, pg 7.9d8 and the three remaining pgs change to state active+recovering). osd.21 is still down after the following logs: [the same MOSDPGPush slow-request warnings quoted earlier in this thread] Thanks! -- Tuan HaNoi-VietNam On 10/25/2014 05:07 AM, Craig Lewis wrote: It looks like you're running into http://tracker.ceph.com/issues/5699 You're running 0.80.7, which has a fix for that bug. From my reading of the code, I believe the fix only prevents the issue from occurring. It doesn't work around or repair bad snapshots created on older versions of Ceph. 
Were any of the snapshots you're removing created on older versions of Ceph? If they were all created on Firefly, then you should open a new tracker issue, and try to get some help on IRC or the developers mailing list. On Thu, Oct 23, 2014 at 10:21 PM, Ta Ba Tuan tua...@vccloud.vn wrote: Dear everyone, I can't start osd.21 (log file attached); some pgs can't be repaired ... -- Tuan Ha Noi - VietNam
[ceph-users] Can't start osd - one osd is always down.
Dear everyone, I can't start osd.21 (log file attached). Some pgs can't be repaired. I'm using replica 3 for my data pool. I suspect some objects in those pgs are corrupted; I tried deleting the data related to those objects, but osd.21 still won't start. I also removed osd.21, but then other osds went down (e.g. osd.86 is down and won't start). Please guide me in debugging this! Thanks! -- Tuan Ha Noi - VietNam 2014-10-24 11:10:53.036094 7f86c6fcb780 0 xfsfilestorebackend(/var/lib/ceph/osd/cloud-21) detect_feature: extsize is disabled by conf 2014-10-24 11:10:53.181392 7f86c6fcb780 0 filestore(/var/lib/ceph/osd/cloud-21) mount: WRITEAHEAD journal mode explicitly enabled in conf 2014-10-24 11:10:53.191499 7f86c6fcb780 1 journal _open /dev/sdi1 fd 24: 11998855168 bytes, block size 4096 bytes, directio = 1, aio = 1 2014-10-24 11:11:03.794632 7f86c6fcb780 1 journal _open /dev/sdi1 fd 24: 11998855168 bytes, block size 4096 bytes, directio = 1, aio = 1 2014-10-24 11:11:03.845410 7f86c6fcb780 0 cls cls/hello/cls_hello.cc:271: loading cls_hello 2014-10-24 11:11:04.174302 7f86c6fcb780 0 osd.21 101773 crush map has features 1107558400, adjusting msgr requires for clients 2014-10-24 11:11:04.174360 7f86c6fcb780 0 osd.21 101773 crush map has features 1107558400 was 8705, adjusting msgr requires for mons 2014-10-24 11:11:04.174373 7f86c6fcb780 0 osd.21 101773 crush map has features 1107558400, adjusting msgr requires for osds 2014-10-24 11:11:04.174402 7f86c6fcb780 0 osd.21 101773 load_pgs 2014-10-24 11:11:22.986057 7f86c6fcb780 0 osd.21 101773 load_pgs opened 281 pgs 2014-10-24 11:11:23.039971 7f86b6d2e700 0 osd.21 101773 ignoring osdmap until we have initialized 2014-10-24 11:11:23.040818 7f86b6d2e700 0 osd.21 101773 ignoring osdmap until we have initialized 2014-10-24 11:11:23.276236 7f86c6fcb780 0 osd.21 101773 done with init, starting boot process 2014-10-24 11:12:44.346474 7f865ca3c700 0 -- 192.168.1.2:6840/28594 172.30.1.81:0/4234900213 pipe(0x23c15000 sd=66 :6840 s=0 pgs=0 cs=0 l=0 
c=0x246f96e0).accept peer addr is really 172.30.1.81:0/4234900213 (socket is 172.30.1.81:47697/0) 2014-10-24 11:15:27.767594 7f86a2505700 -1 *** Caught signal (Segmentation fault) ** in thread 7f86a2505700 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3) 1: /usr/bin/ceph-osd() [0x9c830a] 2: (()+0xfcb0) [0x7f86c6009cb0] 3: (ReplicatedPG::trim_object(hobject_t const&)+0x395) [0x8079e5] 4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x44c) [0x82215c] 5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xc0) [0x867390] 6: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xfb) [0x84d70b] 7: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x84d8de] 8: (ReplicatedPG::snap_trimmer()+0x588) [0x7cc118] 9: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x675f14] 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0xa9a366] 11: (ThreadPool::WorkThread::entry()+0x10) [0xa9c380] 12: (()+0x7e9a) [0x7f86c6001e9a] 13: (clone()+0x6d) [0x7f86c4efb31d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 
--- begin dump of recent events --- -10000 2014-10-24 11:15:20.324218 7f869c4f9700 1 -- 192.168.1.2:0/28594 -- 192.168.1.3:6853/4658 -- osd_ping(ping e101783 stamp 2014-10-24 11:15:20.322565) v2 -- ?+0 0xe955e00 con 0x20af2b00 -9999 2014-10-24 11:15:20.324268 7f869c4f9700 1 -- 192.168.1.2:0/28594 -- 192.168.1.3:6862/22372 -- osd_ping(ping e101783 stamp 2014-10-24 11:15:20.322565) v2 -- ?+0 0x22e2de00 con 0x20af2840 -9998 2014-10-24 11:15:20.324313 7f869c4f9700 1 -- 192.168.1.2:0/28594 -- 192.168.1.3:6863/22372 -- osd_ping(ping e101783 stamp 2014-10-24 11:15:20.322565) v2 -- ?+0 0x22e2d700 con 0x20af26e0 -9713 2014-10-24 11:15:20.365153 7f86ae51d700 5 -- op tracker -- , seq: 18573, time: 2014-10-24 11:15:20.365153, event: done, request: osd_op(client.7869019.0:6944380 rbd_data.451e822ae8944a.0128 [set-alloc-hint object_size 4194304 write_size 4194304,write 479232~4096] 6.b4cc39f6 snapc 18ee=[18ee] ack+ondisk+write e101783) v4 -9712 2014-10-24 11:15:20.365266 7f86ae51d700 5 -- op tracker -- , seq: 18576, time: 2014-10-24 11:15:20.365266, event: done, request: osd_sub_op_reply(client.7869019.0:6944380 6.9f6 b4cc39f6/rbd_data.451e822ae8944a.0128/head//6 [] ondisk, result = 0) v2 -9711 2014-10-24
[ceph-users] urgent - object unfound
Hi everyone, I use replica 3; there are many unfound objects and Ceph is very slow. pg 6.9d8 is active+recovery_wait+degraded+remapped, acting [22,93], 4 unfound pg 6.766 is active+recovery_wait+degraded+remapped, acting [21,36], 1 unfound pg 6.73f is active+recovery_wait+degraded+remapped, acting [19,84], 2 unfound pg 6.63c is active+recovery_wait+degraded+remapped, acting [10,37], 2 unfound pg 6.56c is active+recovery_wait+degraded+remapped, acting [124,93], 2 unfound pg 6.4d3 is active+recovering+degraded+remapped, acting [33,94], 2 unfound pg 6.4a5 is active+recovery_wait+degraded+remapped, acting [11,94], 2 unfound pg 6.2f9 is active+recovery_wait+degraded+remapped, acting [22,34], 2 unfound recovery 535673/52672768 objects degraded (1.017%); 17/17470639 unfound (0.000%) ceph pg map 6.766 osdmap e94990 pg 6.766 (6.766) - up [49,36,21] acting [21,36] I can't resolve it, and I need the data in those objects. Please guide me! Thank you! -- Tuan
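The unfound-object situation above is normally worked through with the standard pg troubleshooting commands. A sketch of the sequence (pg 6.766 taken from the listing above; mark_unfound_lost discards data, so it is strictly a last resort):

```shell
# 1. See which pgs report unfound objects.
ceph health detail | grep unfound

# 2. List the missing/unfound objects of one pg.
ceph pg 6.766 list_missing

# 3. Check which OSDs were probed. Bringing back one of the down or
#    'querying' OSDs is the only way to recover the data itself.
ceph pg 6.766 query

# 4. Last resort: give up on the unfound objects, reverting each to a
#    prior version where one exists (data loss for the rest).
# ceph pg 6.766 mark_unfound_lost revert
```

Since the up set here is [49,36,21] but acting is only [21,36], getting osd.49 (and the crashed osd.21 in the related thread) back online is the path that avoids data loss.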
[ceph-users] add new data host
Hi all, I'm adding a new ceph-data host, but: #ceph -s -k /etc/ceph/ceph.client.admin.keyring 2014-06-09 17:39:51.686082 7fade4f14700 0 librados: client.admin authentication error (1) Operation not permitted Error connecting to cluster: PermissionError my ceph.conf: [global] auth cluster required = cephx auth service required = cephx auth client required = cephx keyring = /etc/ceph/ceph.client.admin.keyring Any suggestions? Thanks all -- TABA
Re: [ceph-users] add new data host
I solved this by exporting the key with ceph auth export... :D In the question above, I was using a key in the old format. On 06/09/2014 05:44 PM, Ta Ba Tuan wrote: Hi all, I'm adding a new ceph-data host, but I get client.admin authentication error (1) Operation not permitted ... -- TABA
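What the fix amounts to, as a sketch (the monitor host and target path are assumptions): re-export the admin key in the current keyring format from a working admin node and copy the file to the new host, instead of reusing a keyring written in the old bare-key format:

```shell
# On an existing admin/monitor node, export the admin key in the
# current keyring format:
ceph auth get client.admin -o /tmp/ceph.client.admin.keyring
# (equivalently: ceph auth export client.admin > /tmp/ceph.client.admin.keyring)

# Copy it to the new data host, then verify:
# scp /tmp/ceph.client.admin.keyring newhost:/etc/ceph/
# ceph -s -k /etc/ceph/ceph.client.admin.keyring

# The exported file should look roughly like (key value elided):
# [client.admin]
#     key = AQ...
#     caps mds = "allow"
#     caps mon = "allow *"
#     caps osd = "allow *"
```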
[ceph-users] OSD not up
Dear all, I'm using Firefly. One disk failed; I replaced the failed disk and started that osd, but the osd is still down. Help me, thank you. 2014-05-30 17:01:56.090314 7f9387516780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway 2014-05-30 17:01:56.090344 7f9387516780 1 journal _open /mnt/osd26/journal fd 20: 1048576 bytes, block size 4096 bytes, directio = 1, aio = 0 2014-05-30 17:01:56.108431 7f9387516780 1 journal _open /mnt/osd26/journal fd 20: 1048576 bytes, block size 4096 bytes, directio = 1, aio = 0 2014-05-30 17:01:56.109711 7f9387516780 1 journal close /mnt/osd26/journal 2014-05-30 17:01:56.51 7f9387516780 0 filestore(/var/lib/ceph/osd/cloud-26) mount detected xfs (libxfs) 2014-05-30 17:01:56.115571 7f9387516780 0 genericfilestorebackend(/var/lib/ceph/osd/cloud-26) detect_features: FIEMAP ioctl is supported and appears to work 2014-05-30 17:01:56.115597 7f9387516780 0 genericfilestorebackend(/var/lib/ceph/osd/cloud-26) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2014-05-30 17:01:56.116528 7f9387516780 0 genericfilestorebackend(/var/lib/ceph/osd/cloud-26) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) 2014-05-30 17:01:56.116634 7f9387516780 0 xfsfilestorebackend(/var/lib/ceph/osd/cloud-26) detect_feature: extsize is supported 2014-05-30 17:01:56.119424 7f9387516780 0 filestore(/var/lib/ceph/osd/cloud-26) mount: WRITEAHEAD journal mode explicitly enabled in conf 2014-05-30 17:01:56.119733 7f9387516780 -1 journal FileJournal::_open: disabling aio for non-block journal. 
Use journal_force_aio to force use of aio anyway 2014-05-30 17:01:56.119765 7f9387516780 1 journal _open /mnt/osd26/journal fd 21: 1048576 bytes, block size 4096 bytes, directio = 1, aio = 0 2014-05-30 17:01:56.119927 7f9387516780 1 journal _open /mnt/osd26/journal fd 21: 1048576 bytes, block size 4096 bytes, directio = 1, aio = 0 2014-05-30 17:01:56.178011 7f9387516780 0 cls cls/hello/cls_hello.cc:271: loading cls_hello 2014-05-30 17:01:56.211731 7f9387516780 0 osd.26 45313 crush map has features 33816576, adjusting msgr requires for clients 2014-05-30 17:01:56.211761 7f9387516780 0 osd.26 45313 crush map has features 33816576, adjusting msgr requires for osds 2014-05-30 17:01:56.211777 7f9387516780 0 osd.26 45313 load_pgs 2014-05-30 17:01:56.211832 7f9387516780 0 osd.26 45313 load_pgs opened 0 pgs 2014-05-30 17:01:56.220766 7f9377e06700 0 osd.26 45313 ignoring osdmap until we have initialized 2014-05-30 17:01:56.221522 7f9377e06700 0 osd.26 45313 ignoring osdmap until we have initialized 2014-05-30 17:01:56.221741 7f9387516780 0 osd.26 45313 done with init, starting boot process
Re: [ceph-users] OSD not up
Thanks Craig, I removed the osd as follows and re-added it. That solved it. ceph osd out 26 /etc/init.d/ceph stop osd.26 ceph osd crush remove osd.26 ceph auth del osd.26 ceph osd down 26 ceph osd rm 26 On 05/31/2014 04:16 AM, Craig Lewis wrote: On 5/30/14 03:08 , Ta Ba Tuan wrote: Dear all, I'm using Firefly. One disk failed; I replaced the failed disk and started that osd, but the osd is still down. Help me, thank you. You need to re-initialize the disk after replacing it. Ceph stores cluster information on the disk, and ceph-osd needs that information to start. The process is pretty much removing the osd, then adding it again. This blog walks you through the details: http://karan-mj.blogspot.com/2014/03/admin-guide-replacing-failed-disk-in.html Or you can search through the mailing list for replace osd for more discussions. -- Craig Lewis, Senior Systems Engineer, Central Desktop. Office +1.714.602.1309 Email cle...@centraldesktop.com
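For reference, the replace-a-failed-disk sequence from the blog linked above can be sketched as below (osd id 26 as in this thread; the daemon should be stopped before the osd is removed from the crush map, and the re-creation commands are Firefly-era assumptions):

```shell
# Remove the failed OSD from the cluster:
ceph osd out 26                  # stop sending it data, trigger rebalance
/etc/init.d/ceph stop osd.26     # stop the daemon before removing it
ceph osd crush remove osd.26     # remove from the crush map
ceph auth del osd.26             # delete its cephx key
ceph osd rm 26                   # remove the osd id itself

# Re-create the OSD on the replacement disk (hypothetical device name;
# exact tooling depends on how the cluster was deployed):
# ceph-disk prepare /dev/sdX
# or, manually: ceph osd create && ceph-osd -i 26 --mkfs --mkkey
```

The reason this is needed, per Craig's explanation: a freshly replaced disk has none of the cluster metadata ceph-osd expects (hence "load_pgs opened 0 pgs" in the log above), so the osd must be re-initialized, not just restarted.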
Re: [ceph-users] How do I do deep-scrub manually?
Dear Yang, I plan to toggle nodeep-scrub nightly via crontab. As for the HEALTH_WARN nodeep-scrub flag(s) set warning: I only watch messages from the monitoring tool (e.g. nagios), so I rewrote the nagios check script to return code 0 for the message HEALTH_WARN nodeep-scrub flag(s) set. On 05/20/2014 10:47 AM, Jianing Yang wrote: I found that deep scrub has a significant impact on my cluster. I've used ceph osd set nodeep-scrub to disable it. But I got an error HEALTH_WARN nodeep-scrub flag(s) set. What is the proper way to disable deep scrub? And how can I run it manually?
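The commands involved, as a sketch (the crontab times are assumptions, illustrating the "deep scrubs allowed only at night" scheme described above):

```shell
# Disable deep scrubbing cluster-wide (this is what raises the
# HEALTH_WARN nodeep-scrub flag), and re-enable it again:
ceph osd set nodeep-scrub
ceph osd unset nodeep-scrub

# Run a deep scrub manually, per pg or per osd:
ceph pg deep-scrub 6.9d8
ceph osd deep-scrub 21

# Example crontab on an admin node: allow deep scrubs 01:00-07:00 only.
# 0 1 * * * ceph osd unset nodeep-scrub
# 0 7 * * * ceph osd set nodeep-scrub
```

Note that the HEALTH_WARN while the flag is set is expected behavior, which is why the check script has to treat that specific message as OK.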
[ceph-users] pg incomplete in .rgw.buckets pool
Dear everyone, I lost 2 OSDs, and my '.rgw.buckets' pool uses 2 replicas, so it now has some incomplete PGs: cluster health HEALTH_WARN 88 pgs backfill; 1 pgs backfilling; 89 pgs degraded; *5 pgs incomplete*
*14.aa8* 3993 0 0 0 1457965487 1309 1309 incomplete 2014-04-18 11:25:12.806908 25968'6407229 42833:8886 [22,82] [22,82] 25968'6407229 2014-04-07 06:22:35.668600 25968'6407229 2014-04-07 06:22:35.668600
*14.a5a* 0 0 0 0 0 0 0 incomplete 2014-04-18 11:25:12.924637 0'0 42833:36 [82,23] [82,23] 25968'55952310 2014-04-04 09:51:10.317932 25968'55952310 2014-04-04 09:51:10.317932
#ceph osd dump|grep .rgw.buckets
pool 14 '.rgw.buckets' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 2800 pgp_num 2800 last_change 14373 (incomplete pgs are in the .rgw.buckets pool (id=14)) How can I resolve this error? Thanks, everyone. -- Tuan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] pg incomplete in .rgw.buckets pool
Thank you, Ирек Фасихов, for your reply. I restarted the OSDs that contain the incomplete PGs, but they are still incomplete :( On 04/18/2014 03:16 PM, Ирек Фасихов wrote: Ceph detects that a placement group is missing a necessary period of history from its log. If you see this state, report a bug, and try to start any failed OSDs that may contain the needed information. 2014-04-18 12:15 GMT+04:00 Ирек Фасихов malm...@gmail.com: Oh, sorry, I confused it with inconsistent. :) 2014-04-18 12:13 GMT+04:00 Ирек Фасихов malm...@gmail.com: You need to repair the PG. This is the first sign that your hard drive is failing. ceph pg repair *14.a5a* ceph pg repair *14.aa8* 2014-04-18 12:09 GMT+04:00 Ta Ba Tuan tua...@vccloud.vn: Dear everyone, I lost 2 OSDs, and my '.rgw.buckets' pool uses 2 replicas, so it now has some incomplete PGs. [...] How can I resolve this error? Thanks, everyone. -- Tuan -- Best regards, Фасихов Ирек Нургаязович Mob.: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] pg incomplete in .rgw.buckets pool
Yes, I restarted all the ceph-osds (22, 23, 82). But: cluster health HEALTH_WARN 75 pgs backfill; 1 pgs backfilling; 76 pgs degraded; *5 pgs incomplete*; .. On 04/18/2014 03:42 PM, Ирек Фасихов wrote: Did you restart the OSDs of all disks holding your unfinished PGs? (22, 23, 82) 2014-04-18 12:35 GMT+04:00 Ta Ba Tuan tua...@vccloud.vn: Thank you, Ирек Фасихов, for your reply. I restarted the OSDs that contain the incomplete PGs, but they are still incomplete :( [...] -- Best regards, Фасихов Ирек Нургаязович Mob.: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] pg incomplete in .rgw.buckets pool
Hi Ирек Фасихов I send it to you :D, Thank you! { state: incomplete, epoch: 42880, up: [ 82, 26], acting: [ 82, 26], info: { pgid: 14.7c8, last_update: 0'0, last_complete: 0'0, log_tail: 0'0, last_user_version: 0, last_backfill: MAX, purged_snaps: [], history: { epoch_created: 1225, last_epoch_started: 33655, last_epoch_clean: 31852, last_epoch_split: 0, same_up_since: 42851, same_interval_since: 42851, same_primary_since: 42799, last_scrub: 25968'6408501, last_scrub_stamp: 2014-04-04 11:00:42.392406, last_deep_scrub: 25968'6408501, last_deep_scrub_stamp: 2014-04-04 11:00:42.392406, last_clean_scrub_stamp: 2014-04-04 11:00:42.392406}, stats: { version: 0'0, reported_seq: 86, reported_epoch: 42880, state: incomplete, last_fresh: 2014-04-18 17:32:50.786806, last_change: 2014-04-18 15:29:38.116110, last_active: 0.00, last_clean: 0.00, last_became_active: 0.00, last_unstale: 2014-04-18 17:32:50.786806, log_start: 0'0, ondisk_log_start: 0'0, created: 1225, last_epoch_clean: 31852, parent: 0.0, parent_split_bits: 0, last_scrub: 25968'6408501, last_scrub_stamp: 2014-04-04 11:00:42.392406, last_deep_scrub: 25968'6408501, last_deep_scrub_stamp: 2014-04-04 11:00:42.392406, last_clean_scrub_stamp: 2014-04-04 11:00:42.392406, log_size: 0, ondisk_log_size: 0, stats_invalid: 0, stat_sum: { num_bytes: 0, num_objects: 0, num_object_clones: 0, num_object_copies: 0, num_objects_missing_on_primary: 0, num_objects_degraded: 0, num_objects_unfound: 0, num_read: 0, num_read_kb: 0, num_write: 0, num_write_kb: 0, num_scrub_errors: 0, num_shallow_scrub_errors: 0, num_deep_scrub_errors: 0, num_objects_recovered: 0, num_bytes_recovered: 0, num_keys_recovered: 0}, stat_cat_sum: {}, up: [ 82, 26], acting: [ 82, 26]}, empty: 1, dne: 0, incomplete: 0, last_epoch_started: 0}, recovery_state: [ { name: Started\/Primary\/Peering, enter_time: 2014-04-18 15:29:38.053872, past_intervals: [ { first: 31851, last: 31936, maybe_went_rw: 1, up: [ 82], acting: [ 82]}, { first: 31937, last: 33639, 
maybe_went_rw: 1, up: [ 82, 26], acting: [ 82, 26]}, { first: 33640, last: 33653, maybe_went_rw: 1, up: [ 26], acting: [ 26]}, { first: 33654, last: 33668, maybe_went_rw: 1, up: [ 82, 26], acting: [ 82, 26]}, { first: 33669, last: 34084, maybe_went_rw: 1, up: [ 26], acting: [ 26]}, { first: 34085, last: 42332, maybe_went_rw: 1, up: [ 82, 26], acting: [ 82, 26]}, { first: 42333, last: 42380, maybe_went_rw: 1, up: [ 82], acting: [ 82]}, { first: 42381, last: 42392, maybe_went_rw: 1, up: [ 82, 26], acting: [ 82, 26]}, { first: 42393, last: 42420, maybe_went_rw: 1, up: [ 26], acting: [ 26]}, { first: 42421, last: 42696, maybe_went_rw: 1, up: [ 82,
Re: [ceph-users] pg incomplete in .rgw.buckets pool
Hi Ирек Фасихов, *#ls -lsa /var/lib/ceph/osd/cloud-26/current/14.7c8_*/* total 16 0 drwxr-xr-x 2 root root 6 Apr 9 22:49 . 16 drwxr-xr-x 443 root root 12288 Apr 18 18:46 .. *# ls -lsa /var/lib/ceph/osd/cloud-82/current/14.7c8_*/* total 16 0 drwxr-xr-x 2 root root 6 Apr 18 11:25 . 16 drwxr-xr-x 445 root root 12288 Apr 18 19:17 .. Thanks! On 04/18/2014 06:11 PM, Ирек Фасихов wrote: Is there any data to: ls -lsa /var/lib/ceph/osd/ceph-82/current/14.7c8_*/ ls -lsa /var/lib/ceph/osd/ceph-26/current/14.7c8_*/ 2014-04-18 14:36 GMT+04:00 Ta Ba Tuan tua...@vccloud.vn mailto:tua...@vccloud.vn: Hi Ирек Фасихов I send it to you :D, Thank you! { state: incomplete, epoch: 42880, up: [ 82, 26], acting: [ 82, 26], info: { pgid: 14.7c8, last_update: 0'0, last_complete: 0'0, log_tail: 0'0, last_user_version: 0, last_backfill: MAX, purged_snaps: [], history: { epoch_created: 1225, last_epoch_started: 33655, last_epoch_clean: 31852, last_epoch_split: 0, same_up_since: 42851, same_interval_since: 42851, same_primary_since: 42799, last_scrub: 25968'6408501, last_scrub_stamp: 2014-04-04 11:00:42.392406, last_deep_scrub: 25968'6408501, last_deep_scrub_stamp: 2014-04-04 11:00:42.392406, last_clean_scrub_stamp: 2014-04-04 11:00:42.392406}, stats: { version: 0'0, reported_seq: 86, reported_epoch: 42880, state: incomplete, last_fresh: 2014-04-18 17:32:50.786806, last_change: 2014-04-18 15:29:38.116110, last_active: 0.00, last_clean: 0.00, last_became_active: 0.00, last_unstale: 2014-04-18 17:32:50.786806, log_start: 0'0, ondisk_log_start: 0'0, created: 1225, last_epoch_clean: 31852, parent: 0.0, parent_split_bits: 0, last_scrub: 25968'6408501, last_scrub_stamp: 2014-04-04 11:00:42.392406, last_deep_scrub: 25968'6408501, last_deep_scrub_stamp: 2014-04-04 11:00:42.392406, last_clean_scrub_stamp: 2014-04-04 11:00:42.392406, log_size: 0, ondisk_log_size: 0, stats_invalid: 0, stat_sum: { num_bytes: 0, num_objects: 0, num_object_clones: 0, num_object_copies: 0, num_objects_missing_on_primary: 0, 
num_objects_degraded: 0, num_objects_unfound: 0, num_read: 0, num_read_kb: 0, num_write: 0, num_write_kb: 0, num_scrub_errors: 0, num_shallow_scrub_errors: 0, num_deep_scrub_errors: 0, num_objects_recovered: 0, num_bytes_recovered: 0, num_keys_recovered: 0}, stat_cat_sum: {}, up: [ 82, 26], acting: [ 82, 26]}, empty: 1, dne: 0, incomplete: 0, last_epoch_started: 0}, recovery_state: [ { name: Started\/Primary\/Peering, enter_time: 2014-04-18 15:29:38.053872, past_intervals: [ { first: 31851, last: 31936, maybe_went_rw: 1, up: [ 82], acting: [ 82]}, { first: 31937, last: 33639, maybe_went_rw: 1, up: [ 82, 26], acting: [ 82, 26]}, { first: 33640, last: 33653, maybe_went_rw: 1, up: [ 26], acting: [ 26]}, { first: 33654, last: 33668, maybe_went_rw: 1, up: [ 82, 26], acting: [ 82, 26]}, { first: 33669, last: 34084, maybe_went_rw: 1, up: [ 26], acting: [ 26
Re: [ceph-users] Change object size from 4MB (default) to 16MB
Hi Wido, Thank you for your help. I'm using a 10 Gbps network for my Ceph cluster, and I want to optimize it to read/write images faster with an object size of 16/32 MB. Any ideas? If you use the 'rbd' tool you can set the --order flag to set it to anything other than the default 22 (4MB). Besides the approach you suggested, can I use the configuration options below, without passing --order to rbd? rbd_default_stripe_count: 1, rbd_default_stripe_unit: *4194304*, to rbd_default_stripe_count: 1, rbd_default_stripe_unit: *16777216*, #(16 MB) Thanks, Wido! -- Tuan On 04/06/2014 04:34 AM, Wido den Hollander wrote: On 04/05/2014 07:15 AM, Ta Ba Tuan wrote: Hi everyone, My Ceph cluster is running, and I'm planning to tune its performance. I want to increase the object size from 4 MB to 16 MB (maybe 32 MB). With the formula stripe_unit * stripe_count = object_size, I'm thinking of changing the following option: rbd_default_stripe_unit from *4194304 (4 MB) to 16777216 (16 MB)*, keeping stripe_count at the default (ceph --admin-daemon /var/run/ceph/ceph-osd.74.asok config set *rbd_default_stripe_unit* *16777216* on every OSD). Right? Please guide me. No, not completely. The striping/object size settings have to be set when creating an RBD image. If you use the 'rbd' tool you can set the --order flag to set it to anything other than the default 22 (4MB). The OSDs don't care about the object size, they simply serve objects. Any reason why you want to go for 16 or 32MB? Just wondering. The 4MB default seems to work fine, but I can imagine situations where a larger object size works better. Wido Thanks, everyone! -- Tuan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
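As Wido notes, the object size is 2^order bytes and must be chosen at image-creation time, e.g. `rbd create --order 24 --size 10240 mypool/myimage` for 16 MiB objects (the pool and image names here are placeholders). A quick check of the order arithmetic:

```python
# RBD object size is 2**order bytes; `rbd create --order N` sets it per image.
def object_size(order: int) -> int:
    return 2 ** order

# order 22 -> 4 MiB (the default), order 24 -> 16 MiB, order 25 -> 32 MiB
for order in (22, 24, 25):
    print(order, object_size(order))
```

Note that 16 MiB is 16777216 bytes, not 8388608 (which is 8 MiB), so any stripe_unit-based setting has to use 16777216 to match order 24.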
[ceph-users] Change object size from 4MB (default) to 16MB
Hi everyone, My Ceph cluster is running, and I'm planning to tune its performance. I want to increase the object size from 4 MB to 16 MB (maybe 32 MB). With the formula stripe_unit * stripe_count = object_size, I'm thinking of changing the following option: rbd_default_stripe_unit from *4194304 (4 MB) to 16777216 (16 MB)*, keeping stripe_count at the default (ceph --admin-daemon /var/run/ceph/ceph-osd.74.asok config set *rbd_default_stripe_unit* *16777216* on every OSD). Right? Please guide me. Thanks, everyone! -- Tuan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] [Big Problem?] Why not using Device'UUID in ceph.conf
Hi all, I have 3 OSDs, on sdb, sdc, and sdd. Suppose the OSD on */dev/sdc* dies: my server then has only sdb and sdc, because the old /dev/sdd is renamed to /dev/sdc. I have the following configuration: [osd.0] host = data-01 devs = /dev/sdb1 [osd.1] host = data-01 devs = /dev/sdc1 [osd.2] host = data-02 devs = /dev/sdd1 .. So when rebooting the server (or restarting the ceph service), I think the OSD configured with /dev/sdd1 will fail. Everyone: why not use the device UUID in ceph.conf, as one does in /etc/fstab? In a larger storage environment this would be a big problem. Thanks, all! -- tuantaba ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [Big Problem?] Why not using Device'UUID in ceph.conf
Hi James, My question is: why doesn't Ceph recommend using the device UUID in ceph.conf, given that the error above can occur? -- TuanTaBa On 11/26/2013 04:04 PM, James Harper wrote: Hi all, I have 3 OSDs, named sdb, sdc, sdd. Suppose the OSD on /dev/sdc dies: my server then has only sdb and sdc, because /dev/sdd is renamed to /dev/sdc. Can you just use one of the /dev/disk/by-something/identifier symlinks? E.g. /dev/disk/by-uuid/153cf32b-e46b-4d31-95ef-749db3a88d02 /dev/disk/by-id/scsi-SATA_WDC_WD10EACS-00D_WD-WCAU66606660 Your distribution should allow for such things automatically, and if not you should be able to add some udev rules to do it. James ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
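Following James's suggestion, the persistent /dev/disk symlinks can go straight into ceph.conf in place of the raw /dev/sdX names; a sketch using the example identifiers from his mail (illustrative values, not real devices):

```ini
[osd.0]
host = data-01
devs = /dev/disk/by-uuid/153cf32b-e46b-4d31-95ef-749db3a88d02

[osd.1]
host = data-01
devs = /dev/disk/by-id/scsi-SATA_WDC_WD10EACS-00D_WD-WCAU66606660
```

Unlike /dev/sdX names, these symlinks follow the disk itself, so they survive the device renumbering that happens when a drive dies or is replaced.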
Re: [ceph-users] Optimize Ceph cluster (kernel, osd, rbd)
Please help me! On 07/20/2013 02:11 AM, Ta Ba Tuan wrote: Hi everyone, I have *3 nodes (running MON and MDS)* and *6 data nodes (84 OSDs)*. Each data node has this configuration: - CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz, 24 cores - RAM: 32 GB - Disks: 14 × 4 TB (14 disks × 6 data nodes = 84 OSDs) To optimize the Ceph cluster, *I adjusted some kernel parameters* (nr_requests per queue, and increased read throughput):

#Adjust nr_requests in the queue (kept in memory; default is 128)
echo 1024 > /sys/block/sdb/queue/nr_requests
echo noop > /sys/block/sda/queue/scheduler (default = noop deadline [cfq])
#Increase read throughput (default: 128)
echo 512 > /sys/block/*/queue/read_ahead_kb

And I am *tuning the Ceph configuration options below:*

[client]
rbd cache = true
rbd cache size = 536870912
rbd cache max dirty = 134217728
rbd cache target dirty = 33554432
rbd cache max dirty age = 5

[osd]
osd data = /var/lib/ceph/osd/cloud-$id
osd journal = /var/lib/ceph/osd/cloud-$id/journal
osd journal size = 1
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = rw,noatime,inode64,logbsize=250k
keyring = /var/lib/ceph/osd/cloud-$id/keyring.osd.$id
#Increasing this number may increase the request processing rate
osd op threads = 24
#The number of disk threads, used for background disk-intensive OSD operations such as scrubbing and snap trimming
osd disk threads = 24
#The number of active recovery requests per OSD at one time. More requests accelerate recovery, but place an increased load on the cluster.
osd recovery max active = 1
#Write directly to the journal: allow use of libaio to do asynchronous writes
journal dio = true
journal aio = true
#Synchronization interval: the maximum/minimum interval in seconds for synchronizing the filestore.
filestore max sync interval = 100
filestore min sync interval = 50
#The maximum number of in-progress operations the file store accepts before blocking on queueing new operations.
filestore queue max ops = 2000
#The maximum number of bytes for an operation
filestore queue max bytes = 536870912
#The maximum number of operations the filestore can commit.
filestore queue committing max ops = 2000 (default = 500)
#The maximum number of bytes the filestore can commit.
filestore queue committing max bytes = 536870912
#When you add or remove Ceph OSD Daemons, the CRUSH algorithm rebalances the cluster by moving placement groups to or from Ceph OSD Daemons. Migrating placement groups and the objects they contain can reduce the cluster's operational performance considerably. To maintain operational performance, Ceph performs this migration with 'backfilling', which sets backfill operations to a lower priority than requests to read or write data.
osd max backfills = 1

Tomorrow I'm going to deploy the Ceph cluster. I have very little experience managing Ceph, so I hope someone can give me advice about the settings above and guide me on how best to optimize the cluster. Thank you so much! --tuantaba ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph -w warning I don't have pgid 0.2c8?
Hi Samuel, Output of ceph pg dump | grep 'stale\|creating':
0.f4f 0 0 0 0 0 0 0 stale+creating 2013-07-17 16:35:06.882419 0'0 0'0 [] [68,12] 0'0 0.00 0'0 0.00
2.f4d 0 0 0 0 0 0 0 stale+creating 2013-07-17 16:35:22.826552 0'0 0'0 [] [68,12] 0'0 0.00 0'0 0.00
0.2c8 0 0 0 0 0 0 0 stale+creating 2013-07-17 14:30:54.280454 0'0 0'0 [] [68,5] 0'0 0.00 0'0 0.00
2.2c6 0 0 0 0 0 0 0 stale+creating 2013-07-17 16:35:28.445878 0'0 0'0 [] [68,5] 0'0 0.00 0'0 0.00
Thanks! On 07/19/2013 01:16 AM, Samuel Just wrote: ceph pg dump | grep 'stale\|creating' ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph -w warning I don't have pgid 0.2c8?
Hi everyone, I converted every OSD from 2 TB to 4 TB disks, and when the move completed, the real-time Ceph log (ceph -w) displayed the error: *I don't have pgid 0.2c8*. After that, I ran: ceph pg force_create_pg 0.2c8. Ceph then warned: pgmap v55175: 22944 pgs: 1 creating, 22940 active+clean, 3 stale+active+degraded. Now I can't read/write data to the mounted CephFS on the client side; the client reports: Operation not permitted. ceph -w still shows 22944 pgs: 1 creating, 22940 active+clean, 3 stale+active+degraded, and I don't understand what is happening. Please, help me!!! Thanks to everyone. --tuantaba ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph -w warning I don't have pgid 0.2c8?
I'm using Ceph 0.61.4. I removed each OSD (2 TB) on the data hosts and re-created it with a 4 TB disk. When the conversion finished, Ceph warned that 4 PGs were in the stale state, with the message: i don't have pgid <pgid>. I then created those 4 PGs with: ceph pg force_create_pg <pgid>. Now (a long time later), Ceph still warns: pgmap v57451: 22944 pgs: *4 creating*, 22940 active+clean. I don't know how to remove those PGs; please help me with this error! Thank you! --tuantaba TA BA TUAN On 07/18/2013 01:16 AM, Samuel Just wrote: What version are you running? How did you move the osds from 2TB to 4TB? -Sam On Wed, Jul 17, 2013 at 12:59 AM, Ta Ba Tuan tua...@vccloud.vn wrote: Hi everyone, I converted every OSD from 2 TB to 4 TB disks, and when the move completed, the real-time Ceph log (ceph -w) displayed the error: I don't have pgid 0.2c8. [...] Please, help me!!! Thanks to everyone. --tuantaba ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Capacity proplem when mount CephFS with Ubuntu 14.04
Hi everyone, The total OSD capacity is 144 TB, but *when I mount CephFS on Ubuntu 14.04 it only shows **576 GB*. (Currently, I'm using replica count 3 for the data pools.) (Using: mount -t ceph Monitor_IP:/ /ceph -o name=admin,secret=xx) I don't think the capacity should be this small; please explain this to me! *on Ubuntu 14.04* Filesystem Size Used Avail Use% Mounted on 192.168.32.90:/ 576GB 20G 556GB 1% /tmp/ceph_mount But when mounting on Ubuntu 12.04: *on Ubuntu 12.04* Filesystem Size Used Avail Use% Mounted on 192.168.32.90:/ 144T 800G 113.6T 1% /tmp/ceph_mount Thanks to everyone. --tuantaba ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Capacity proplem when mount CephFS with Ubuntu 14.04
Hi Wido, The client OS I'm using is Ubuntu 14.04, 64-bit. I have notified the dev list about this bug. On 07/16/2013 03:44 PM, Wido den Hollander wrote: Hi, On 07/16/2013 10:35 AM, Ta Ba Tuan wrote: Hi everyone, The total OSD capacity is 144 TB, but *when I mount CephFS on Ubuntu 14.04 it only shows **576 GB*. (Currently, I'm using replica count 3 for the data pools.) (Using: mount -t ceph Monitor_IP:/ /ceph -o name=admin,secret=xx) I don't think the capacity should be this small; please explain this to me! Just to be sure, the client isn't running a 32-bit system, is it? P.S.: Please send a message to either the users or the dev list. The users list would have been the place for this message. Thanks! *on Ubuntu 14.04* Filesystem Size Used Avail Use% Mounted on 192.168.32.90:/ 576GB 20G 556GB 1% /tmp/ceph_mount But when mounting on Ubuntu 12.04: *on Ubuntu 12.04* Filesystem Size Used Avail Use% Mounted on 192.168.32.90:/ 144T 800G 113.6T 1% /tmp/ceph_mount Thanks to everyone. --tuantaba ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] how to limit access to pools ?
Hi Markus, You can limit access to a specific pool through key authentication. For example, I have a pool 'instances', and set its permissions like this: #ceph auth get-or-create client.instances mon 'allow r' osd 'allow rwx pool=instances' --tuantaba TA BA TUAN On 07/16/2013 08:04 PM, Markus Goldberg wrote: Hi, I created a few pools with 'ceph osd pool create poolname 100 100' and set a relation to corresponding directories with 'cephfs /mnt/myceph/dirname set_layout -p poolname'. I can list the pools with 'ceph osd pools'. I can mount the dirs/subdirs at the client with 'mount -t ceph xxx.xxx.xxx.xxx:6789:/dir1/dir2 /mnt/myceph -v -o name=admin,secretfile=/etc/ceph/admin.secret' (admin.secret is the key for the data root dir (/)). How can I give specific clients read/write access to only a subset of the pools? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Capacity proplem when mount CephFS with Ubuntu 14.04
Thank you, David and Markus. I think that is the solution for me; I will try it. Thanks, David, for the explanation. On 07/16/2013 07:50 PM, Markus Goldberg wrote: Hi, upgrading to a 3.9 kernel is the solution. It is only needed on the client side. Bye, Markus On 16.07.2013 12:35, David McBride wrote: On 16/07/13 09:35, Ta Ba Tuan wrote: Hi everyone, The total OSD capacity is 144 TB, but *when I mount CephFS on Ubuntu 14.04 it only shows **576 GB*. [...] This is most likely: http://tracker.ceph.com/issues/3793 This is caused by using a more modern version of coreutils; that, coupled with the numbers reported by Ceph for filesystem block sizes in older kernels, results in the above error. To fix, either use a recent version of the fuse driver or upgrade your kernel to a newer one that includes the commit "ceph: fix statvfs fr_size": https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=92a49fb0f79f3300e6e50ddf56238e70678e4202 (Kernels 3.9-rc1 and later include it.) Cheers, David ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
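David's explanation can be checked numerically: `df` computes the size as f_blocks × f_frsize from statvfs, so a mis-reported f_frsize shrinks the result by a constant factor, and the figures in this thread differ by exactly such a factor. (This is my own back-of-the-envelope arithmetic, not something taken from the bug report.)

```python
# df reports size = f_blocks * f_frsize (statvfs); a wrong f_frsize
# therefore scales the reported size by a constant factor.
correct_gb = 144 * 1024   # the 144 TB shown by the 12.04 client, in GB
buggy_gb = 576            # the 576 GB shown by the 14.04 client
factor = correct_gb / buggy_gb
print(factor)  # -> 256.0
```

A clean power-of-two ratio like this is the signature of a block-size accounting bug rather than actual missing capacity.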
Re: [ceph-users] Proplem about capacity when mount using CephFS?
Thanks, Sage. tuantaba On 07/16/2013 09:24 PM, Sage Weil wrote: On Tue, 16 Jul 2013, Ta Ba Tuan wrote: Thanks Sage, I was worried about the capacity reported when mounting CephFS: when the disk is full, will it show 50% or 100% used? 100%. sage On 07/16/2013 11:01 AM, Sage Weil wrote: [...] statfs/df show the raw capacity of the cluster, not the usable capacity. How much data you can store is a (potentially) complex function of your CRUSH rules and replication layout. If you store 1TB, you'll notice the available space will go down by about 2TB (if you're using the default 2x). sage ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Proplem about capacity when mount using CephFS?
Hi everyone. I have 83 OSDs, each of 2 TB (total capacity 166 TB). I'm using replica count 3 for the pools ('data', 'metadata'). But when mounting the Ceph filesystem (using: mount -t ceph Monitor_IP:/ /ceph -o name=admin,secret=xx), the/*capacity shown is 160 TB; since I use 3 replicas, shouldn't it report about 160 TB / 3 ≈ 53 TB?*/ Filesystem Size Used Avail Use% Mounted on 192.168.32.90:/ 160T 500G 156T 1% /tmp/ceph_mount Please explain this to me. Thanks to everyone so much!!! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
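As Sage explains in the replies, statfs/df shows the cluster's raw capacity, while the usable figure depends on replication. A quick sanity check of the numbers above:

```python
# statfs/df shows raw capacity; with N-way replication, net capacity is raw / N,
# and every 1 TB stored consumes about N TB raw.
raw_tb = 160.0
replicas = 3
usable_tb = raw_tb / replicas        # ~53.3 TB net with 3x replication
raw_per_tb_stored = replicas * 1.0   # storing 1 TB consumes ~3 TB raw
print(round(usable_tb, 1), raw_per_tb_stored)
```

So the 160 TB figure is not wrong, it is just raw; the available number shrinks roughly three times as fast as data is written.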
Re: [ceph-users] Proplem about capacity when mount using CephFS?
Thanks Sage, I was worried about the capacity reported when mounting CephFS: when the disk is full, will it show 50% or 100% used? On 07/16/2013 11:01 AM, Sage Weil wrote: On Tue, 16 Jul 2013, Ta Ba Tuan wrote: Hi everyone. I have 83 OSDs, each of 2 TB (total capacity 166 TB). I'm using replica count 3 for the pools ('data', 'metadata'). But when mounting the Ceph filesystem (using: mount -t ceph Monitor_IP:/ /ceph -o name=admin,secret=xx), the capacity shown is 160 TB; since I use 3 replicas, shouldn't it report about 160 TB / 3 ≈ 53 TB? Filesystem Size Used Avail Use% Mounted on 192.168.32.90:/ 160T 500G 156T 1% /tmp/ceph_mount Please explain this to me. statfs/df show the raw capacity of the cluster, not the usable capacity. How much data you can store is a (potentially) complex function of your CRUSH rules and replication layout. If you store 1TB, you'll notice the available space will go down by about 2TB (if you're using the default 2x). sage ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [TuanTB] I want to join maillistsceph-delvel, ceph-users
Hi Joao, Thanks for replying. I hope I can contribute my knowledge to Ceph; to me, Ceph is very nice!! Thank you! --TuanTB On 05/29/2013 10:17 PM, Joao Eduardo Luis wrote: On 05/29/2013 05:26 AM, Ta Ba Tuan wrote: Hi Majordomo, I am TuanTB (full name: Tuan Ta Ba, and I come from Vietnam). I'm working on cloud computing, and of course we are using Ceph. I'm a new Ceph member, so I hope to join the ceph-devel and ceph-users mailing lists. Thank you so much. Regards! --TuanTB Hello Tuan, I suspect you might want to subscribe to both ceph-devel and ceph-users. For that, you should do the following: - send an email with no subject to 'ceph-de...@vger.kernel.org' just containing 'subscribe' as the message - do the same as above, but send it to 'ceph-us...@ceph.com' instead That should do the trick. Hope this helps. -Joao ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] (no subject)
subscribe ceph-users ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] [TuanTB] I want to join maillistsceph-delvel, ceph-users
Hi Majordomo, I am TuanTB (full name: Tuan Ta Ba, and I come from Vietnam). I'm working on cloud computing, and of course we are using Ceph. I'm a new Ceph member, so I hope to join the ceph-devel and ceph-users mailing lists. Thank you so much. Regards! --TuanTB ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com