Re: ceph status reporting non-existing osd
On Monday, July 16, 2012 at 11:55 AM, Andrey Korolyov wrote: On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com (mailto:g...@inktank.com) wrote: ceph pg set_full_ratio 0.95 ceph pg set_nearfull_ratio 0.94 On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote: On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com (mailto:g...@inktank.com) wrote: On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote: On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com (mailto:s...@inktank.com) wrote: On Fri, 13 Jul 2012, Gregory Farnum wrote: On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru (mailto:and...@xdel.ru) wrote: Hi, Recently I`ve reduced my test suite from 6 to 4 osds at ~60% usage on six-node, and I have removed a bunch of rbd objects during recovery to avoid overfill. Right now I`m constantly receiving a warn about nearfull state on non-existing osd: health HEALTH_WARN 1 near full osd(s) monmap e3: 3 mons at {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0}, election epoch 240, quorum 0,1,2 0,1,2 osdmap e2098: 4 osds: 4 up, 4 in pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB used, 143 GB / 324 GB avail mdsmap e181: 1/1/1 up {0=a=up:active} HEALTH_WARN 1 near full osd(s) osd.4 is near full at 89% Needless to say, osd.4 remains only in ceph.conf, but not at crushmap. Reducing has been done 'on-line', e.g. without restart entire cluster. Whoops! It looks like Sage has written some patches to fix this, but for now you should be good if you just update your ratios to a larger number, and then bring them back down again. :) Restarting ceph-mon should also do the trick. Thanks for the bug report! sage Should I restart mons simultaneously? I don't think restarting will actually do the trick for you — you actually will need to set the ratios again. Restarting one by one has no effect, same as filling up data pool up to ~95 percent(btw, when I deleted this 50Gb file on cephfs, mds was stuck permanently and usage remained same until I dropped and recreated data pool - hope it`s one of known posix layer bugs). I also deleted entry from config, and then restarted mons, with no effect. Any suggestions? I'm not sure what you're asking about here? -Greg Oh, sorry, I have mislooked and thought that you suggested filling up osds. How do I can set full/nearfull ratios correctly? $ceph injectargs '--mon_osd_full_ratio 96' parsed options $ ceph injectargs '--mon_osd_near_full_ratio 94' parsed options ceph pg dump | grep 'full' full_ratio 0.95 nearfull_ratio 0.85 Setting parameters in the ceph.conf and then restarting mons does not affect ratios either. Thanks, it worked, but setting values back result to turn warning back. Hrm. That shouldn't be possible if the OSD has been removed. How did you take it out? It sounds like maybe you just marked it in the OUT state (and turned it off quite quickly) without actually taking it out of the cluster? -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
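For reference, a minimal sketch of the workaround described above: temporarily raise the ratios and then bring them back down so the monitor re-evaluates them. The intermediate values here are only an example, not a recommendation:
$ ceph pg set_full_ratio 0.97        # temporarily raise above the current setting
$ ceph pg set_nearfull_ratio 0.96
$ ceph pg set_full_ratio 0.95        # then set them back to the values you actually want
$ ceph pg set_nearfull_ratio 0.85
$ ceph pg dump | grep full_ratio     # confirm the new ratios took effect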
Re: ceph status reporting non-existing osd
On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com wrote: [...] Hrm. That shouldn't be possible if the OSD has been removed. How did you take it out? It sounds like maybe you just marked it in the OUT state (and turned it off quite quickly) without actually taking it out of the cluster? -Greg As I did the removal, it was definitely not like that - in the first place, I marked the osds (4 and 5, on the same host) out, then rebuilt the crushmap and then killed the osd processes. As I mentioned before, osd.4 does not exist in the crushmap and therefore it shouldn't be reported at all (theoretically). -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
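A quick way to check where a supposedly removed OSD is still registered, using standard CLI queries (ceph osd dump and ceph health detail appear later in this thread; ceph osd tree is the usual view of the CRUSH hierarchy, and the grep patterns are only illustrative):
$ ceph osd dump | grep osd.4    # is it still listed in the osdmap?
$ ceph osd tree | grep osd.4    # is it still referenced by the CRUSH hierarchy?
$ ceph health detail            # does the health warning still name osd.4?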
Re: How to compile Java-Rados.
Hi Noah, After reinstalling java-rados, when I run ant test I now get the following error in the terminal: Buildfile: /home/vutp/java-rados/build.xml makedir: compile-rados: compile-tests: [javac] Compiling 1 source file to /home/vutp/java-rados/build/test jar: test: [junit] Running ClusterStatsTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.044 sec [junit] Running ClusterTest [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 6.569 sec BUILD FAILED /home/vutp/java-rados/build.xml:134: Test ClusterTest failed Total time: 10 seconds Two txt files were also generated in the java-rados directory. One is TEST-ClusterStatsTest.txt, which contains: Testsuite: ClusterStatsTest Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.044 sec Testcase: test_ClusterStats took 0.027 sec The other is TEST-ClusterTest.txt, which contains: Testsuite: ClusterTest Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 6.569 sec Testcase: test_ConfigOption took 0.026 sec FAILED junit.framework.AssertionFailedError: at ClusterTest.test_ConfigOption(Unknown Source) Testcase: test_getClusterStats took 0.005 sec Testcase: test_getInstancePointer took 0.004 sec Testcase: test_getVersion took 0.005 sec Testcase: test_PoolOperations took 1.821 sec Testcase: test_openIOContext took 2.134 sec Testcase: test_PoolList took 2.543 sec Thanks, Ramu. -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ceph status reporting non-existing osd
On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote: On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com (mailto:g...@inktank.com) wrote: [...] Hrm. That shouldn't be possible if the OSD has been removed. How did you take it out? It sounds like maybe you just marked it in the OUT state (and turned it off quite quickly) without actually taking it out of the cluster? -Greg As I did the removal, it was definitely not like that - in the first place, I marked the osds (4 and 5, on the same host) out, then rebuilt the crushmap and then killed the osd processes. As I mentioned before, osd.4 does not exist in the crushmap and therefore it shouldn't be reported at all (theoretically). 
Okay, that's what happened — marking an OSD out in the CRUSH map means all the data gets moved off it, but that doesn't remove it from all the places where it's registered in the monitor and in the map, for a couple reasons: 1) You might want to mark an OSD out before taking it down, to allow for more orderly data movement. 2) OSDs can get marked out automatically, but the system shouldn't be able to forget about them on its own. 3) You might want to remove an OSD from the CRUSH map in the process of placing it somewhere else (perhaps you moved the physical machine to a new location). etc. You want to run ceph osd rm 4 5 and that should unregister both of them from everything[1]. :) -Greg [1]: Except for the full lists, which have a bug in the version of code you're running — remove the OSDs, then adjust the full ratios again, and all will be well.
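For completeness, a typical removal sequence that ends with the ceph osd rm step mentioned above looks roughly like this (argonaut-era command names, shown for osd.4 only; double-check the syntax against your release before running it):
$ ceph osd out 4                # stop placing data on it and wait for recovery to finish
$ /etc/init.d/ceph stop osd.4   # on the OSD's host, stop the daemon
$ ceph osd crush remove osd.4   # drop it from the CRUSH map
$ ceph auth del osd.4           # remove its key, if cephx is in use
$ ceph osd rm 4                 # finally unregister it from the osdmap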
Re: ceph status reporting non-existing osd
On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote: On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote: On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com (mailto:g...@inktank.com) wrote: [...] Hrm. That shouldn't be possible if the OSD has been removed. How did you take it out? It sounds like maybe you just marked it in the OUT state (and turned it off quite quickly) without actually taking it out of the cluster? -Greg As I did the removal, it was definitely not like that - in the first place, I marked the osds (4 and 5, on the same host) out, then rebuilt the crushmap and then killed the osd processes. As I mentioned before, osd.4 does not exist in the crushmap and therefore it shouldn't be reported at all (theoretically). 
Okay, that's what happened — marking an OSD out in the CRUSH map means all the data gets moved off it, but that doesn't remove it from all the places where it's registered in the monitor and in the map, for a couple reasons: 1) You might want to mark an OSD out before taking it down, to allow for more orderly data movement. 2) OSDs can get marked out automatically, but the system shouldn't be able to forget about them on its own. 3) You might want to remove an OSD from the CRUSH map in the process of placing it somewhere else (perhaps you moved the physical machine to a new location). etc. You want to run ceph osd rm 4 5 and that should unregister both of them from everything[1]. :) -Greg [1]: Except for the full lists, which have a bug in the version of code you're running — remove the OSDs, then adjust the full ratios again, and all will be well.
Puppet modules for Ceph
Hi, I'm currently working on writing a Puppet module for Ceph. Since some research turned up no existing module, I'll start from scratch, but I would be glad to hear from anyone who has already started working on this, or who has any ideas or pointers regarding this subject. Thanks, [ By the way, I'm fc on #ceph ! ] -- François Charlier Software Engineer // eNovance labs http://labs.enovance.com // ✉ francois.charl...@enovance.com ☎ +33 1 49 70 99 81 -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Puppet modules for Ceph
On 7/18/12 8:58 AM, François Charlier wrote: Hi, I'm currently working on writing a Puppet module for Ceph. As after some research I found no existing module, I'll start from scratch but I would be glad to hear from people who would already have started working or this or having any idea or pointers regarding this subject. Thanks, [ By the way, I'm fc on #ceph ! ] Hi Francois, That's great! You might want to look at the chef work that has been done as a base to start from. I'm not very familiar with what is in place, but Tommi or Dan may chime in later with more details. Some of the folks from Mediawiki were actually just talking about puppet modules yesterday on the IRC channel so they may be interested in collaborating too. Thanks, Mark -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Poor read performance in KVM
On 07/17/2012 10:46 PM, Vladimir Bashkirtsev wrote: On 16/07/12 15:46, Josh Durgin wrote: On 07/15/2012 06:13 AM, Vladimir Bashkirtsev wrote: Hello, Lately I was trying to get KVM to perform well on RBD. But it still appears elusive. [root@alpha etc]# rados -p rbd bench 120 seq -t 8 Total time run:16.873277 Total reads made: 302 Read size:4194304 Bandwidth (MB/sec):71.592 Average Latency: 0.437984 Max latency: 3.26817 Min latency: 0.015786 Fairly good performance. But when I run in KVM: [root@mail ~]# hdparm -tT /dev/vda /dev/vda: Timing cached reads: 8808 MB in 2.00 seconds = 4411.49 MB/sec This is just the guest page cache - it's reading the first two megabytes of the device repeatedly. Just to make sure there no issue with VM itself. Timing buffered disk reads: 10 MB in 6.21 seconds = 1.61 MB/sec This is a sequential read, so readahead in the guest should help here. Should but obviously does not. Not even close to what rados bench show! I even seen 900KB/sec performance. Such slow read performance of course affecting guests. Any ideas where to start to look for performance boost? Do you have rbd caching enabled? rbd_cache=true:rbd_cache_size=134217728:rbd_cache_max_dirty=125829120 It would also be interesting to see how the guest reads are translating to rados reads. hdparm is doing 2MiB sequential reads of the block device. If you add admin_socket=/var/run/ceph/kvm.asok to the rbd device on the qemu command line) you can see number of requests, latency, and request size info while the guest is running via: ceph --admin-daemon /var/run/ceph/kvm.asok perf dump Done that. Waited for VM to fully boot then got perf dump. It would be nice to get output in human readable format instead of JSON - I remember some other part of ceph had relevant command line switch. Does it exist for perf dump? 
{librbd-rbd/kvm1:{rd:0,rd_bytes:0,rd_latency:{avgcount:0,sum:0},wr:0,wr_bytes:0,wr_latency:{avgcount:0,sum:0},discard:0,discard_bytes:0,discard_latency:{avgcount:0,sum:0},flush:0,aio_rd:3971,aio_rd_bytes:64750592,aio_rd_latency:{avgcount:3971,sum:803.656},aio_wr:91,aio_wr_bytes:652288,aio_wr_latency:{avgcount:91,sum:0.002977},aio_discard:0,aio_discard_bytes:0,aio_discard_latency:{avgcount:0,sum:0},snap_create:0,snap_remove:0,snap_rollback:0,notify:0,resize:0},objectcacher-librbd-rbd/kvm1:{cache_ops_hit:786,cache_ops_miss:3189,cache_bytes_hit:72186880,cache_bytes_miss:61276672,data_read:64750592,data_written:652288,data_flushed:648192,data_overwritten_while_flushing:8192,write_ops_blocked:0,write_bytes_blocked:0,write_time_blocked:0},objecter:{op_active:0,op_laggy:0,op_send:3271,op_send_bytes:0,op_resend:0,op_ack:3270,op_commit:78,op:3271,op_r:3194,op_w:77,op_rmw: 0,op_pg:0,osdop_stat:1,osdop_create:0,osdop_read:3191,osdop_write:77,osdop_writefull:0,osdop_append:0,osdop_zero:0,osdop_truncate:0,osdop_delete:0,osdop_mapext:0,osdop_sparse_read:0,osdop_clonerange:0,osdop_getxattr:0,osdop_setxattr:0,osdop_cmpxattr:0,osdop_rmxattr:0,osdop_resetxattrs:0,osdop_tmap_up:0,osdop_tmap_put:0,osdop_tmap_get:0,osdop_call:1,osdop_watch:1,osdop_notify:0,osdop_src_cmpxattr:0,osdop_pgls:0,osdop_pgls_filter:0,osdop_other:0,linger_active:1,linger_send:1,linger_resend:0,poolop_active:0,poolop_send:0,poolop_resend:0,poolstat_active:0,poolstat_send:0,poolstat_resend:0,statfs_active:0,statfs_send:0,statfs_resend:0,map_epoch:0,map_full:0,map_inc:0,osd_sessions:10,osd_session_open:4,osd_session_close:0,osd_laggy:1},throttle-msgr_dispatch_throttler-radosclient:{val:0,max:104857600,get:3292,get_sum:61673502,get_or_fail_fail:0,get_or_fail_success:0,take:0,take_sum :0,put:3292,put_sum:61673502,wait:{avgcount:0,sum:0}},throttle-objecter_bytes:{val:0,max:104857600,get:3271,get_sum:61928960,get_or_fail_fail:0,get_or_fail_success:3271,take:0,take_sum:0,put:3268,put_sum:61928960,wait:{avgcount:0,sum:0}},throttle-objecter_ops:{val:0,max:1024,get:3271,get_sum:3271,get_or_fail_fail:0,get_or_fail_success:3271,take:0,take_sum:0,put:3271,put_sum:3271,wait:{avgcount:0,sum:0}}} If my understanding is correct aio_rd is asynchrous read, latency in millisecons? Average read latency of 800ms is quite high! I remember in 1991 my 80MB HDD had similar read times - surely we are in 2012! :) It's actually the sum of the latencies of all 3971 asynchronous reads, in seconds, so the average latency was ~200ms, which is still pretty high. Write latency appears to be excellent. Latency measured between KVM and librbd or between librbd and OSDs or between KVM and OSDs? Something tells me it is latter and thus it does not sched any light on where the problem is. Notably rados has max latency of just over 3ms. Does it mean that latency of 800ms comes from qemu-rbd driver?! That's latency between KVM and the OSDs. The extra latency could be from the callback to qemu or an artifact of this workload on the osds. You can use the admin socket on the osds for 'perf dump' as well, and
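As a worked example of reading those counters: each *_latency entry is a running sum of latencies in seconds plus an avgcount, so the mean per-operation latency is the sum divided by the count. For the aio_rd figures above:
$ python -c 'print(803.656 / 3971)'
# ~0.202 seconds per asynchronous read, i.e. the ~200 ms average noted above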
Re: osd/OSDMap.h: 330: FAILED assert(is_up(osd))
On 07/17/2012 06:03 PM, Samuel Just wrote: master should now have a fix for that, let me know how it goes. I opened bug #2798 for this issue. Hmmm, it seems handle_osd_ping() now runs into a case where for the first ping it gets, service.osdmap can be empty? 0 2012-07-18 09:17:23.977497 7fffe6ec6700 -1 *** Caught signal (Segmentation fault) ** in thread 7fffe6ec6700 ceph version 0.48argonaut-419-g4e1d973 (commit:4e1d973e466cd45138f004e84ab8631d9b2a60fa) 1: /usr/bin/ceph-osd() [0x723c39] 2: (()+0xf4a0) [0x776584a0] 3: (OSD::handle_osd_ping(MOSDPing*)+0x7d4) [0x5d7894] 4: (OSD::heartbeat_dispatch(Message*)+0x71) [0x5d8111] 5: (SimpleMessenger::DispatchQueue::entry()+0x583) [0x7d5103] 6: (SimpleMessenger::dispatch_entry()+0x15) [0x7d6485] 7: (SimpleMessenger::DispatchThread::entry()+0xd) [0x79523d] 8: (()+0x77f1) [0x776507f1] 9: (clone()+0x6d) [0x76aa1ccd] gdb has this to say: (gdb) bt #0 0x7765836b in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42 #1 0x00724067 in reraise_fatal (signum=11) at global/signal_handler.cc:58 #2 handle_fatal_signal (signum=11) at global/signal_handler.cc:104 #3 signal handler called #4 get_epoch (this=0x15d, m=0x1587000) at ./osd/OSDMap.h:210 #5 OSD::handle_osd_ping (this=0x15d, m=0x1587000) at osd/OSD.cc:1711 #6 0x005d8111 in OSD::heartbeat_dispatch (this=0x15d, m=0x1587000) at osd/OSD.cc:2769 #7 0x007d5103 in ms_deliver_dispatch (this=0x1472960) at msg/Messenger.h:504 #8 SimpleMessenger::DispatchQueue::entry (this=0x1472960) at msg/SimpleMessenger.cc:367 #9 0x007d6485 in SimpleMessenger::dispatch_entry (this=0x1472880) at msg/SimpleMessenger.cc:384 #10 0x0079523d in SimpleMessenger::DispatchThread::entry (this=<value optimized out>) at ./msg/SimpleMessenger.h:807 #11 0x776507f1 in start_thread (arg=0x7fffe6ec6700) at pthread_create.c:301 #12 0x76aa1ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 (gdb) f 5 #5 OSD::handle_osd_ping (this=0x15d, m=0x1587000) at osd/OSD.cc:1711 1711 m->stamp); (gdb) l 1706 } 1707 } 1708 Message *r = new MOSDPing(monc->get_fsid(), 1709 curmap->get_epoch(), 1710 MOSDPing::PING_REPLY, 1711 m->stamp); 1712 hbserver_messenger->send_message(r, m->get_connection()); 1713 1714 if (curmap->is_up(from)) { 1715 note_peer_epoch(from, m->map_epoch); (gdb) p curmap $1 = std::tr1::shared_ptr (empty) 0x0 -- Jim Thanks for the info! -Sam On Tue, Jul 17, 2012 at 2:54 PM, Jim Schutt jasc...@sandia.gov wrote: On 07/17/2012 03:44 PM, Samuel Just wrote: Not quite. OSDService::get_osdmap() returns the most recently published osdmap. Generally, OSD::osdmap is safe to use when you are holding the osd lock. Otherwise, OSDService::get_osdmap() should be used. There are a few other things that should be fixed surrounding this issue as well, I'll put some time into it today. The map_lock should probably be removed altogether. Thanks for taking a look. Let me know when you get something, and I'll take it for a spin. Thanks -- Jim -Sam -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Poor read performance in KVM
On 07/17/2012 10:46 PM, Vladimir Bashkirtsev wrote: ceph --admin-daemon /var/run/ceph/kvm.asok perf dump Done that. Waited for VM to fully boot then got perf dump. It would be nice to get output in human readable format instead of JSON - I remember some other part of ceph had relevant command line switch. Does it exist for perf dump? I forgot to mention you can pipe that to 'python -mjson.tool' for more readable output. It's intended to be used by monitoring tools, hence json, but some kind of more plain output could be added. Josh -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
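Concretely, that is just a pipe added to the command shown earlier in the thread:
$ ceph --admin-daemon /var/run/ceph/kvm.asok perf dump | python -mjson.tool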
Re: How to compile Java-Rados.
Please 'git pull' to grab the following change, which solved the same problem in my local tree:
diff --git a/src/test/ClusterTest.java b/src/test/ClusterTest.java
index 9b6bcb6..8b83bdd 100644
--- a/src/test/ClusterTest.java
+++ b/src/test/ClusterTest.java
@@ -25,13 +25,13 @@ public class ClusterTest {
     String val1, val2;
 
     /* set option to "2" and check that it set */
-    val1 = "2";
+    val1 = "true";
     cluster.setConfigOption(opt, val1);
     val2 = cluster.getConfigOption(opt);
     assertTrue(val1.compareTo(val2) == 0);
 
     /* make sure the option wasn't already "2" */
-    val1 = "1";
+    val1 = "false";
     cluster.setConfigOption(opt, val1);
     val2 = cluster.getConfigOption(opt);
     assertTrue(val1.compareTo(val2) == 0);
On Tue, Jul 17, 2012 at 11:24 PM, ramu ramu.freesyst...@gmail.com wrote: [junit] Running ClusterStatsTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.044 sec [junit] Running ClusterTest [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 6.569 sec good, no errors. BUILD FAILED /home/vutp/java-rados/build.xml:134: Test ClusterTest failed Total time: 10 seconds Two txt files were also generated in the java-rados directory. One is TEST-ClusterStatsTest.txt, which contains: Testsuite: ClusterStatsTest Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.044 sec no errors here either. Testcase: test_ClusterStats took 0.027 sec The other is TEST-ClusterTest.txt, which contains: Testsuite: ClusterTest Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 6.569 sec Testcase: test_ConfigOption took 0.026 sec FAILED junit.framework.AssertionFailedError: at ClusterTest.test_ConfigOption(Unknown Source) So, it looks like we are down to one error? Testcase: test_getClusterStats took 0.005 sec Testcase: test_getInstancePointer took 0.004 sec Testcase: test_getVersion took 0.005 sec Testcase: test_PoolOperations took 1.821 sec Testcase: test_openIOContext took 2.134 sec Testcase: test_PoolList took 2.543 sec Thanks, Ramu. -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
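After pulling that change, re-running the suite should confirm the ConfigOption failure is gone; roughly (paths as in the output quoted above):
$ cd java-rados
$ git pull
$ ant test
$ cat TEST-ClusterTest.txt    # should now report Failures: 0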
Re: Puppet modules for Ceph
On Wed, Jul 18, 2012 at 6:58 AM, François Charlier francois.charl...@enovance.com wrote: I'm currently working on writing a Puppet module for Ceph. As after some research I found no existing module, I'll start from scratch but I would be glad to hear from people who would already have started working or this or having any idea or pointers regarding this subject. Hi. I don't remember anyone actively working on puppet modules for Ceph. A quick search gives me just this: http://git.sans.ethz.ch/?p=puppet-modules/ceph;a=summary The Chef cookbook at https://github.com/ceph/ceph-cookbooks is starting to get into a pretty good stage. It radically changes how we do deployment and management, so I'd recommend you look at it in detail, and don't imitate mkcephfs. We've been actively changing core Ceph to make deployment and management simpler; I think the best proof of that is that the cookbook is already shorter than the mkcephfs shell script, and will probably just become a thinner layer in the future. The Juju charms for Ceph are also adopting a model quite close to what the Chef cookbook does. -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: osd/OSDMap.h: 330: FAILED assert(is_up(osd))
Sorry, master has a fix now for that also. 76efd9772c60b93bbf632e3ecc3b9117dc081427 -Sam On Wed, Jul 18, 2012 at 8:29 AM, Jim Schutt jasc...@sandia.gov wrote: [...] Hmmm, it seems handle_osd_ping() now runs into a case where for the first ping it gets, service.osdmap can be empty? [...] -- Jim -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
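To pick up that commit, something along these lines should work, assuming an existing clone of the ceph repository (branch handling is up to you):
$ git fetch origin
$ git log --oneline -1 76efd9772c60b93bbf632e3ecc3b9117dc081427   # confirm the fix is present
$ git checkout origin/master                                      # then rebuild and reinstall ceph-osd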
Re: ceph status reporting non-existing osd
On Wed, Jul 18, 2012 at 12:47 AM, Andrey Korolyov and...@xdel.ru wrote: On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote: On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote: On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com (mailto:g...@inktank.com) wrote: Hrm. That shouldn't be possible if the OSD has been removed. How did you take it out? It sounds like maybe you just marked it in the OUT state (and turned it off quite quickly) without actually taking it out of the cluster? -Greg As I have did removal, it was definitely not like that - at first place, I have marked osds(4 and 5 on same host) out, then rebuilt crushmap and then kill osd processes. As I mentioned before, osd.4 doest not exist in crushmap and therefore it shouldn`t be reported at all(theoretically). Okay, that's what happened — marking an OSD out in the CRUSH map means all the data gets moved off it, but that doesn't remove it from all the places where it's registered in the monitor and in the map, for a couple reasons: 1) You might want to mark an OSD out before taking it down, to allow for more orderly data movement. 2) OSDs can get marked out automatically, but the system shouldn't be able to forget about them on its own. 3) You might want to remove an OSD from the CRUSH map in the process of placing it somewhere else (perhaps you moved the physical machine to a new location). etc. You want to run ceph osd rm 4 5 and that should unregister both of them from everything[1]. :) -Greg [1]: Except for the full lists, which have a bug in the version of code you're running — remove the OSDs, then adjust the full ratios again, and all will be well. $ ceph osd rm 4 osd.4 does not exist $ ceph -s health HEALTH_WARN 1 near full osd(s) monmap e3: 3 mons at {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0}, election epoch 58, quorum 0,1,2 0,1,2 osdmap e2198: 4 osds: 4 up, 4 in pgmap v586056: 464 pgs: 464 active+clean; 66645 MB data, 231 GB used, 95877 MB / 324 GB avail mdsmap e207: 1/1/1 up {0=a=up:active} $ ceph health detail HEALTH_WARN 1 near full osd(s) osd.4 is near full at 89% $ ceph osd dump max_osd 4 osd.0 up in weight 1 up_from 2183 up_thru 2187 down_at 2172 last_clean_interval [2136,2171) 192.168.10.128:6800/4030 192.168.10.128:6801/4030 192.168.10.128:6802/4030 exists,up 68b3deec-e80a-48b7-9c29-1b98f5de4f62 osd.1 up in weight 1 up_from 2136 up_thru 2186 down_at 2135 last_clean_interval [2115,2134) 192.168.10.129:6800/2980 192.168.10.129:6801/2980 192.168.10.129:6802/2980 exists,up b2a26fe9-aaa8-445f-be1f-fa7d2a283b57 osd.2 up in weight 1 up_from 2181 up_thru 2187 down_at 2172 last_clean_interval [2136,2171) 192.168.10.128:6803/4128 192.168.10.128:6804/4128 192.168.10.128:6805/4128 exists,up 378d367a-f7fb-4892-9ec9-db8ffdd2eb20 osd.3 up in weight 1 up_from 2136 up_thru 2186 down_at 2135 last_clean_interval [2115,2134) 192.168.10.129:6803/3069 192.168.10.129:6804/3069 192.168.10.129:6805/3069 exists,up faf8eda8-55fc-4a0e-899f-47dbd32b81b8 Hrm. How did you create your new crush map? All the normal avenues of removing an OSD from the map set a flag which the PGMap uses to delete its records (which would prevent it reappearing in the full list), and I can't see how setcrushmap would remove an OSD from the map (although there might be a code path I haven't found). -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
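For context, hand-editing a CRUSH map is normally done by decompiling, editing, recompiling and injecting it, roughly as below; the file names are placeholders, and the exact crushtool/ceph syntax should be checked against your version:
$ ceph osd getcrushmap -o crush.bin      # grab the current map
$ crushtool -d crush.bin -o crush.txt    # decompile to text
$ $EDITOR crush.txt                      # e.g. remove the osd.4/osd.5 entries, adjust weights
$ crushtool -c crush.txt -o crush.new    # recompile
$ ceph osd setcrushmap -i crush.new      # inject the edited map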
Re: osd/OSDMap.h: 330: FAILED assert(is_up(osd))
On 07/18/2012 12:03 PM, Samuel Just wrote: Sorry, master has a fix now for that also. 76efd9772c60b93bbf632e3ecc3b9117dc081427 -Sam That got things running for me. Thanks for the quick reply. -- Jim -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ceph status reporting non-existing osd
On Wed, Jul 18, 2012 at 10:30 PM, Gregory Farnum g...@inktank.com wrote: [...] Hrm. How did you create your new crush map? All the normal avenues of removing an OSD from the map set a flag which the PGMap uses to delete its records (which would prevent it reappearing in the full list), and I can't see how setcrushmap would remove an OSD from the map (although there might be a code path I haven't found). Manually, by deleting the osd.4 and osd.5 entries and reweighting the remaining nodes. 
Re: ceph status reporting non-existing osd
On Wed, Jul 18, 2012 at 12:07 PM, Andrey Korolyov and...@xdel.ru wrote: On Wed, Jul 18, 2012 at 10:30 PM, Gregory Farnum g...@inktank.com wrote: [...] Hrm. How did you create your new crush map? All the normal avenues of removing an OSD from the map set a flag which the PGMap uses to delete its records (which would prevent it reappearing in the full list), and I can't see how setcrushmap would remove an OSD from the map (although there might be a code path I haven't found). 
Manually, by deleting the osd.4 and osd.5 entries and reweighting the remaining nodes. So you extracted the CRUSH map, edited it, and injected it using ceph osd setcrushmap? -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Puppet modules for Ceph
On Wed, Jul 18, 2012 at 2:59 PM, Teyo Tyree t...@puppetlabs.com wrote: As you probably know, Puppet Labs is based in Portland. Are you attending OScon? It might be a good opportunity for us to have some face to face hacking time on a Puppet module. Let me know if you would like for us to arrange sometime to get together if you are in town. Sorry, I'm not at OScon, I'm intentionally limiting my travel right now. A large chunk of Inktank is based in Los Angeles, so we're not far away even outside of conferences. Frankly, we still have a bit of cleanup work to do on the chef cookbook side, and you'd probably be most productive writing puppet modules once that stuff is all flushed out. Soon, we'll start to de-emphasize mkcephfs in favor of other, more flexible, deployment mechanisms; I think bringing together some Puppet, Juju and Chef experts at that point would be most beneficial. -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Puppet modules for Ceph
On Wed, Jul 18, 2012 at 3:26 PM, Teyo Tyree t...@puppetlabs.com wrote: Ha, that would be an interesting experiment indeed. I think Francois would like to have the Puppet module done sooner rather than later. Are the current Chef cookbooks functional enough for us to get started with them as a reference? I think so. They Work For Me(tm). The ugly stuff is mostly things like needing to wait a few rounds due to Chef's asynchronous data store, and it's missing just about all internal documentation; the user-visible aspects have a decent write-up, but nothing explains e.g. what exactly /var/lib/ceph/bootstrap-osd/ is about. I'd love to help you work through that though, so please keep talking to me and make me explain everything in enough detail. It's just that I don't have anything except source to give you right now. The end user Chef deployment docs are at http://ceph.com/docs/master/install/chef/ http://ceph.com/docs/master/config-cluster/chef/ The cookbook is at https://github.com/ceph/ceph-cookbooks and currently it assumes Ubuntu 12.04. -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
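For anyone who wants to read the cookbook alongside those docs while writing the Puppet module, grabbing it is just (repository URL as given above):
$ git clone https://github.com/ceph/ceph-cookbooks.git
$ cd ceph-cookbooks && ls    # browse the recipes and templates for reference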
Re: How to compile Java-Rados.
Hi Noah, Thank you for your reply. It is working fine now, but I am getting one more error in the terminal: BUILD FAILED /home/vu/java-rados/build.xml:134: Test IOContextTest failed Total time: 43 seconds and in the TEST-IOContextTest.txt file the error is: Testsuite: IOContextTest Tests run: 11, Failures: 1, Errors: 0, Time elapsed: 32.302 sec Testcase: test_toString took 1.791 sec Testcase: test_getCluster took 2.116 sec Testcase: test_getPoolStats took 2.364 sec Testcase: test_setLocatorKey took 2.046 sec Testcase: test_write took 3.109 sec Testcase: test_writeFull took 3.188 sec Testcase: test_getLastVersion took 2.46 sec Testcase: test_append took 3.505 sec Testcase: test_truncate took 3.423 sec Testcase: test_getsetAttribute took 3.161 sec Testcase: test_getObjects took 5.101 sec FAILED junit.framework.AssertionFailedError: at IOContextTest.test_getObjects(Unknown Source) Thanks, Ramu. -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html