Re: Poor read performance in KVM
On 07/15/2012 06:13 AM, Vladimir Bashkirtsev wrote:
> Hello,
>
> Lately I have been trying to get KVM to perform well on RBD, but good performance still appears elusive.
>
> [root@alpha etc]# rados -p rbd bench 120 seq -t 8
> Total time run:        16.873277
> Total reads made:      302
> Read size:             4194304
> Bandwidth (MB/sec):    71.592
> Average Latency:       0.437984
> Max latency:           3.26817
> Min latency:           0.015786
>
> Fairly good performance. But when I run in KVM:
>
> [root@mail ~]# hdparm -tT /dev/vda
> /dev/vda:
>  Timing cached reads: 8808 MB in 2.00 seconds = 4411.49 MB/sec

This is just the guest page cache - it's reading the first two megabytes of the device repeatedly.

>  Timing buffered disk reads: 10 MB in 6.21 seconds = 1.61 MB/sec

This is a sequential read, so readahead in the guest should help here.

> Not even close to what rados bench shows! I have even seen 900KB/sec performance. Such slow read performance is of course affecting guests. Any ideas where to start looking for a performance boost?

Do you have rbd caching enabled? It would also be interesting to see how the guest reads are translating to rados reads. hdparm is doing 2MiB sequential reads of the block device. If you add admin_socket=/var/run/ceph/kvm.asok to the rbd device on the qemu command line, you can see number of requests, latency, and request size info while the guest is running via:

    ceph --admin-daemon /var/run/ceph/kvm.asok perf dump

Josh
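[A minimal sketch of wiring up the admin socket Josh describes. The pool/image name rbd/vm-disk and the drive flags are assumptions; only the admin_socket option and the perf dump command come from the thread. Ceph options are appended to the rbd: spec separated by ':'.]

    # Sketch: expose a ceph admin socket for the guest's rbd drive
    # (pool/image "rbd/vm-disk" and "if=virtio" are hypothetical).
    qemu-system-x86_64 \
      -drive file=rbd:rbd/vm-disk:admin_socket=/var/run/ceph/kvm.asok,if=virtio \
      ...   # rest of the guest configuration

    # While the guest is running, inspect request counts, latency, and sizes:
    ceph --admin-daemon /var/run/ceph/kvm.asok perf dump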
Re: How to compile Java-Rados.
Ramu,

I receive the same error code in my installation when the CEPH_CONF_FILE environment variable contains an invalid path. Could you please verify that the path you are using points to a valid Ceph configuration?

Thanks,
- Noah

On Sun, Jul 15, 2012 at 10:32 PM, ramu ramu.freesyst...@gmail.com wrote:
> Hi Noah,
>
> I tried, but I am getting the following error in the terminal:
>
> Buildfile: /home/vu/java-rados/build.xml
> makedir:
> compile-rados:
> compile-tests:
> jar:
> test:
>     [junit] Running ClusterStatsTest
>     [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.04 sec
>
> BUILD FAILED
> /home/vu/java-rados/build.xml:134: Test ClusterStatsTest failed
>
> Total time: 1 second
>
> There is also an error file, /home/vu/java-rados/TEST-ClusterStatsTest.txt, containing:
>
> Testsuite: ClusterStatsTest
> Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.04 sec
> Testcase: test_ClusterStats took 0.022 sec
>         Caused an ERROR
> conf_read_file: ret=-22
> net.newdream.ceph.rados.RadosException: conf_read_file: ret=-22
>         at net.newdream.ceph.rados.Cluster.native_conf_read_file(Native Method)
>         at net.newdream.ceph.rados.Cluster.readConfigFile(Unknown Source)
>         at ClusterStatsTest.setup(Unknown Source)
>
> Thanks,
> Ramu.
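[A quick shell-level check of Noah's theory, assuming the test really does pick the path up from the CEPH_CONF_FILE environment variable as described above; ret=-22 is -EINVAL, which is consistent with a bad path.]

    # Sketch: verify the file the variable points at exists and is readable.
    echo "CEPH_CONF_FILE=$CEPH_CONF_FILE"
    test -r "$CEPH_CONF_FILE" && echo "config is readable" \
                              || echo "config missing or unreadable"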
Re: How to compile Java-Rados.
Hi Noah,

I printed the CONF_FILE and the connection state from ClusterStatsTest.java; the output is as follows:

test:
    [junit] Running ClusterStatsTest
    [junit] conffile---/etc/ceph/ceph.conf
    [junit] state--CONFIGURING
    [junit] file---/etc/ceph/ceph.conf
    [junit] state--CONFIGURING
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.123 sec

BUILD FAILED
/home/ramu/java-rados/build.xml:134: Test ClusterStatsTest failed
kernel hangs when trying to remove an rbd device
Hi,

The kernel hangs when trying to remove an rbd device. Detailed steps:

1. Create an rbd image and map it on a client.
2. Stop the ceph cluster via '/etc/init.d/ceph -a stop'.
3. On the client, run 'echo <id> > /sys/bus/rbd/remove'; this command does not return. Checking dmesg, it seems to enter an endless loop, trying to re-connect to the osds and mons.
4. Press 'CTRL + C' to send an INT signal to the echo; the kernel then hangs.

Can I use rados in this way?

With the following patch the kernel does not hang, but this patch is not good either, as there is a transaction that has not finished; if we just delete it, the data may be inconsistent. It seems there is no way to stop this transaction safely, i.e. cancel the transaction (avoiding data inconsistency) and tell its caller that the transaction has failed and been canceled. (If anyone knows a way, or several, please tell me, thanks.) Also, if there are plans to do these things, I am very glad to join in and do some work. Or are there any other plans to resolve this?

Thanks a lot for your reply!

Signed-off-by: Guanjun He hegua...@gmail.com
---
 net/ceph/osd_client.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 1ffebed..4dba062 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -688,11 +688,20 @@ static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
 static void remove_all_osds(struct ceph_osd_client *osdc)
 {
+	struct list_head *pos, *q;
+	struct ceph_osd_request *req;
 	dout("__remove_old_osds %p\n", osdc);
 	mutex_lock(&osdc->request_mutex);
 	while (!RB_EMPTY_ROOT(&osdc->osds)) {
 		struct ceph_osd *osd = rb_entry(rb_first(&osdc->osds),
 						struct ceph_osd, o_node);
+		list_for_each_safe(pos, q, &osd->o_requests) {
+			req = list_entry(pos, struct ceph_osd_request,
+					 r_osd_item);
+			list_del(pos);
+			__unregister_request(osdc, req);
+			kfree(req);
+		}
 		__remove_osd(osdc, osd);
 	}
 	mutex_unlock(&osdc->request_mutex);

--
best,
Guanjun
g...@suse.com
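[For anyone wanting to reproduce this, a minimal sketch of the steps above as shell commands; the image name, size, and device id are made up for illustration.]

    # Sketch of the reproduction (names/ids are hypothetical).
    rbd create test-img --size 1024     # create an rbd image
    rbd map test-img                    # map it on the client, e.g. /dev/rbd0
    /etc/init.d/ceph -a stop            # stop the whole ceph cluster
    echo 0 > /sys/bus/rbd/remove        # hangs: the osd client retries forever
    # Ctrl-C on the echo then wedges the kernel, per the report above.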
Re: [PATCH] libceph: trivial fix for the incorrect debug output
On 07/15/2012 01:45 AM, Jiaju Zhang wrote:
> This is a trivial fix for the debug output, as it is inconsistent with the function name, so it may confuse people when debugging.
>
> Signed-off-by: Jiaju Zhang jjzh...@suse.de

I have been converting these to use __func__ whenever I touch code nearby, i.e. dout("%s %p\n", __func__, osdc). Mind if I do that here as well?

Reviewed-by: Alex Elder el...@inktank.com

> ---
>  net/ceph/osd_client.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 1ffebed..ad6d745 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -688,7 +688,7 @@ static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
>  static void remove_all_osds(struct ceph_osd_client *osdc)
>  {
> -	dout("__remove_old_osds %p\n", osdc);
> +	dout("__remove_all_osds %p\n", osdc);
>  	mutex_lock(&osdc->request_mutex);
>  	while (!RB_EMPTY_ROOT(&osdc->osds)) {
>  		struct ceph_osd *osd = rb_entry(rb_first(&osdc->osds),
Re: [PATCH] libceph: trivial fix for the incorrect debug output
On Mon, 2012-07-16 at 07:55 -0500, Alex Elder wrote:
> On 07/15/2012 01:45 AM, Jiaju Zhang wrote:
> > This is a trivial fix for the debug output, as it is inconsistent with the function name, so it may confuse people when debugging.
> >
> > Signed-off-by: Jiaju Zhang jjzh...@suse.de
>
> I have been converting these to use __func__ whenever I touch code nearby, i.e. dout("%s %p\n", __func__, osdc). Mind if I do that here as well?
>
> Reviewed-by: Alex Elder el...@inktank.com

Oh, please do ;) Using __func__ would be good. Thanks for the review.

Thanks,
Jiaju

> > ---
> >  net/ceph/osd_client.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> > index 1ffebed..ad6d745 100644
> > --- a/net/ceph/osd_client.c
> > +++ b/net/ceph/osd_client.c
> > @@ -688,7 +688,7 @@ static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
> >  static void remove_all_osds(struct ceph_osd_client *osdc)
> >  {
> > -	dout("__remove_old_osds %p\n", osdc);
> > +	dout("__remove_all_osds %p\n", osdc);
> >  	mutex_lock(&osdc->request_mutex);
> >  	while (!RB_EMPTY_ROOT(&osdc->osds)) {
> >  		struct ceph_osd *osd = rb_entry(rb_first(&osdc->osds),
Re: How to compile Java-Rados.
On Mon, 16 Jul 2012, Noah Watkins wrote:
> Ramu,
>
> I receive the same error code in my installation when the CEPH_CONF_FILE environment variable contains an invalid path. Could you please verify that the path you are using points to a valid Ceph configuration?

BTW it's CEPH_CONF for the config file (not CEPH_CONF_FILE). Alternatively, you can stick config options, in the same format as the command line arguments, in CEPH_ARGS. E.g.,

 CEPH_ARGS="--debug-ms 1 --log-file foo" some_command ...

sage

> Thanks,
> - Noah
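[Putting Sage's two options side by side as shell commands; both variable names come straight from his message, while "some_command" and the log path are placeholders.]

    # Option 1: point librados at a specific config file.
    export CEPH_CONF=/etc/ceph/ceph.conf
    some_command

    # Option 2: pass config options directly, command-line style.
    CEPH_ARGS="--debug-ms 1 --log-file /tmp/foo.log" some_command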
obsync crashes when src file is unreadable
Hello.

I'm trying to synchronize 2 Amazon S3 buckets, but obsync keeps crashing when it tries to read a certain file. Here is the message displayed when run with --verbose --more-verbose:

Mon, 16 Jul 2012 15:24:48 GMT /src-bucket/1356/file
Traceback (most recent call last):
  File "/usr/bin/obsync", line 1165, in <module>
    sobj = src_all_objects.next()
  File "/usr/bin/obsync", line 636, in next
    k = self.bucket.get_key(key.name)
  File "/usr/lib/python2.7/dist-packages/boto/s3/bucket.py", line 195, in get_key
    response.status, response.reason, '')
S3ResponseError: S3ResponseError: 403 Forbidden
ERROR TYPE: unknown, ORIGIN: source

I checked the state of the file with other S3 clients, and indeed it cannot be read, and I can't change its ACLs. From what I gather, this is one of those very rare cases of unreadable Amazon S3 files caused by a failure of Amazon's underlying storage (I think the file will be automatically recovered by S3 at some later point).

It would be nice if obsync would get over this error, continue with the other files, and just display a warning that the file couldn't be read. Is there something I can do right now to make obsync continue copying (maybe a quick hack in the code or something)?

This is happening on Ubuntu 12.10 2012-07-05, obsync version 0.47.2-0ubuntu2.

Thanks.
--
Fita Adrian
Re: How to compile Java-Rados.
On Mon, Jul 16, 2012 at 8:26 AM, Sage Weil s...@inktank.com wrote:
> On Mon, 16 Jul 2012, Noah Watkins wrote:
> > Ramu,
> >
> > I receive the same error code in my installation when the CEPH_CONF_FILE environment variable contains an invalid path. Could you please verify that the path you are using points to a valid Ceph configuration?
>
> BTW it's CEPH_CONF for the config file (not CEPH_CONF_FILE).

It turns out to be quite awkward to tell the unit tests about the configuration location, and using an environment variable is convenient. However, I wasn't aware of CEPH_CONF, and in fact the unit test framework looks for CEPH_CONFIG_FILE. The latter should definitely be removed (Ramu, can you try CEPH_CONF?). This isn't an issue in the libcephfs wrappers -- the rados wrappers are incredibly old.

Thanks,
Noah

> Alternatively, you can stick config options, in the same format as the command line arguments, in CEPH_ARGS. E.g.,
>
>  CEPH_ARGS="--debug-ms 1 --log-file foo" some_command ...
>
> sage
Re: How to compile Java-Rados.
Disregard that last message.

OK, I'm not sure where else EINVAL gets returned in the configuration path, but I can look into it this evening. I tested the wrappers on a clean install last night and they seem to be working for me. Can you turn on debug logging with CEPH_ARGS (as per Sage's last email)?

On Mon, Jul 16, 2012 at 12:23 AM, ramu ramu.freesyst...@gmail.com wrote:
> Hi Noah,
>
> I printed the CONF_FILE and the connection state from ClusterStatsTest.java; the output is as follows:
>
> test:
>     [junit] Running ClusterStatsTest
>     [junit] conffile---/etc/ceph/ceph.conf
>     [junit] state--CONFIGURING
>     [junit] file---/etc/ceph/ceph.conf
>     [junit] state--CONFIGURING
>     [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.123 sec
>
> BUILD FAILED
> /home/ramu/java-rados/build.xml:134: Test ClusterStatsTest failed
Re: ceph status reporting non-existing osd
On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
> On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com wrote:
> > On Fri, 13 Jul 2012, Gregory Farnum wrote:
> > > On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru wrote:
> > > > Hi,
> > > >
> > > > Recently I`ve reduced my test suite from 6 to 4 osds at ~60% usage on a six-node setup, and I removed a bunch of rbd objects during recovery to avoid overfill. Right now I`m constantly receiving a warning about nearfull state on a non-existing osd:
> > > >
> > > >    health HEALTH_WARN 1 near full osd(s)
> > > >    monmap e3: 3 mons at {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0}, election epoch 240, quorum 0,1,2 0,1,2
> > > >    osdmap e2098: 4 osds: 4 up, 4 in
> > > >    pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB used, 143 GB / 324 GB avail
> > > >    mdsmap e181: 1/1/1 up {0=a=up:active}
> > > >
> > > >    HEALTH_WARN 1 near full osd(s)
> > > >    osd.4 is near full at 89%
> > > >
> > > > Needless to say, osd.4 remains only in ceph.conf, not in the crushmap. The reduction was done 'on-line', i.e. without restarting the entire cluster.
> > >
> > > Whoops! It looks like Sage has written some patches to fix this, but for now you should be good if you just update your ratios to a larger number, and then bring them back down again. :)
> >
> > Restarting ceph-mon should also do the trick.
> >
> > Thanks for the bug report!
> > sage
>
> Should I restart mons simultaneously?

I don't think restarting will actually do the trick for you — you actually will need to set the ratios again.

> Restarting one by one has no effect, same as filling up the data pool to ~95 percent (btw, when I deleted this 50Gb file on cephfs, the mds was stuck permanently and usage remained the same until I dropped and recreated the data pool - hope it`s one of the known posix layer bugs). I also deleted the entry from the config and then restarted the mons, with no effect. Any suggestions?

I'm not sure what you're asking about here?
-Greg
Re: ceph status reporting non-existing osd
ceph pg set_full_ratio 0.95
ceph pg set_nearfull_ratio 0.94

On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:
> On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com wrote:
> > [snip]
> > I'm not sure what you're asking about here?
> > -Greg
>
> Oh, sorry, I misread and thought that you were suggesting filling up the osds. How can I set the full/nearfull ratios correctly?
>
> $ ceph injectargs '--mon_osd_full_ratio 96'
> parsed options
> $ ceph injectargs '--mon_osd_near_full_ratio 94'
> parsed options
> $ ceph pg dump | grep 'full'
> full_ratio 0.95
> nearfull_ratio 0.85
>
> Setting the parameters in ceph.conf and then restarting the mons does not affect the ratios either.
Re: ceph status reporting non-existing osd
On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com wrote:
> ceph pg set_full_ratio 0.95
> ceph pg set_nearfull_ratio 0.94
>
> [snip]

Thanks, it worked, but setting the values back brings the warning back.
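[Pulling the working sequence out of this thread into one place. The commands and values are the ones Greg gave; the verification step is from Andrey's transcript. injectargs and ceph.conf edits did not take effect here, only the pg monitor commands did.]

    ceph pg set_full_ratio 0.95
    ceph pg set_nearfull_ratio 0.94
    ceph pg dump | grep full_ratio    # should now report the new values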
can rbd unmap detect if device is mounted?
Hi folks,

I've made this mistake a couple of times now (completely my fault, when will I learn?), and am wondering if a bit of protection can be put in place against user errors.

I mapped a device (rbd map), then formatted and mounted it (mkfs.ext4 /dev/rbd0..., mount /dev/rbd0...). Sometime later, I want to remove the RBD device. Stupidly, I run the rbd unmap command before I unmount the device. The kernel doesn't really care for this. Or more accurately, I can't remap that same RBD because I run into:

kernel: [2248653.941688] sysfs: cannot create duplicate filename '/devices/virtual/block/rbd0'
kernel: [2248653.941833] kobject_add_internal failed for rbd0 with -EEXIST, don't try to register things with the same name in the same directory.

At this point, the rbd map command hangs indefinitely (producing the logs above). Ctrl-C does exit out, though. But if I try to fix my mistake by doing the unmount now, I get the error:

umount: device is busy.

So I really get stuck: I can't unmount without the device, and I can't remap the device to the old block device. I have to reboot to clean up and move on.

I imagine other bad things can happen when the block device goes away out from under the mount point. Is there any way the rbd unmap command can detect when the device is in use or mounted and inform the user?

Thanks,
- Travis
Re: can rbd unmap detect if device is mounted?
On 07/16/2012 12:59 PM, Travis Rhoden wrote:
> Hi folks,
>
> I've made this mistake a couple of times now (completely my fault, when will I learn?), and am wondering if a bit of protection can be put in place against user errors.

Yeah, we've been working on advisory locking. The first step is just adding an option to lock via the rbd command line tool, so you could script lock/map and unmap/unlock. This is described a little more in this thread:

http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/7094

> I mapped a device (rbd map), then formatted and mounted it. [snip] So I really get stuck: I can't unmount without the device, and I can't remap the device to the old block device. I have to reboot to clean up and move on.

A similar issue was fixed in 3.4 (see http://tracker.newdream.net/issues/1907). What kernel are you using? 3.2 had a nasty possibility of preventing further operations if mapping hung while trying to connect to the monitors.

> I imagine other bad things can happen when the block device goes away out from under the mount point. Is there any way the rbd unmap command can detect when the device is in use or mounted and inform the user?

Before actually unmapping the device, rbd unmap could check if it was present in mtab. If it's being used as a raw block device and not mounted, or you created and used your own device node, this wouldn't help, but it would be better than nothing.

Josh
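[Until something like that lands in the tool itself, the same check is easy to script around rbd unmap. A minimal sketch; the wrapper is hypothetical, and as Josh notes it only catches mounted filesystems, not raw block-device users.]

    #!/bin/sh
    # safe-unmap.sh: refuse to unmap an rbd device that is still mounted.
    # (Checks /proc/mounts rather than mtab to also catch mounts made with -n.)
    dev="$1"
    if grep -qs "^$dev " /proc/mounts; then
        echo "$dev is mounted; unmount it before unmapping" >&2
        exit 1
    fi
    rbd unmap "$dev"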
Re: can rbd unmap detect if device is mounted?
Thanks for the response, Josh. Sorry I didn't send my version info with the initial message.

On Mon, Jul 16, 2012 at 6:43 PM, Josh Durgin josh.dur...@inktank.com wrote:
> Yeah, we've been working on advisory locking. The first step is just adding an option to lock via the rbd command line tool, so you could script lock/map and unmap/unlock. This is described a little more in this thread:
>
> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/7094

I had in fact seen this. Looks like the advisory locking is targeted for 0.49. That's great! Thanks for reminding me of it; I'll look forward to it.

> A similar issue was fixed in 3.4 (see http://tracker.newdream.net/issues/1907). What kernel are you using? 3.2 had a nasty possibility of preventing further operations if mapping hung while trying to connect to the monitors.

I am using the stock Ubuntu 12.04 kernel, which is in fact 3.2. Good point. So the only way for me to get the updates you mentioned is to upgrade to a 3.4 kernel, correct?

> Before actually unmapping the device, rbd unmap could check if it was present in mtab. If it's being used as a raw block device and not mounted, or you created and used your own device node, this wouldn't help, but it would be better than nothing.

That would be awesome. Every little bit helps.
Re: can rbd unmap detect if device is mounted?
On Mon, Jul 16, 2012 at 3:43 PM, Josh Durgin josh.dur...@inktank.com wrote:
> > I've made this mistake a couple of times now (completely my fault, when will I learn?), and am wondering if a bit of protection can be put in place against user errors.
>
> Yeah, we've been working on advisory locking. The first step is just adding an option to lock via the rbd command line tool, so you could script lock/map and unmap/unlock.

Is his problem really about the locking? It sounded to me like he has something (the mount) referencing a block device, and we're letting the block device disappear. The locking you guys have been talking about sounds like that lock would be held whenever the image is mapped, regardless of whether it's mounted or not (think mkfs).

Should unmap even be possible while the block device is open? Shouldn't there be a refcount and an -EBUSY? That's what other block device providers do:

[0 tv@dreamer ~]$ dd if=/dev/zero of=foo bs=1M count=40
40+0 records in
40+0 records out
41943040 bytes (42 MB) copied, 0.167171 s, 251 MB/s
[1 tv@dreamer ~]$ sudo losetup --show -f foo
/dev/loop0
[0 tv@dreamer ~]$ sudo mkfs /dev/loop0
mke2fs 1.42 (29-Nov-2011)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
10240 inodes, 40960 blocks
2048 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=41943040
5 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
	8193, 24577

Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done

[0 tv@dreamer ~]$ sudo mount /dev/loop0 /mnt
[0 tv@dreamer ~]$ sudo losetup -d /dev/loop0
loop: can't delete device /dev/loop0: Device or resource busy
[1 tv@dreamer ~]$ sudo umount /mnt
[0 tv@dreamer ~]$ sudo losetup -d /dev/loop0
[0 tv@dreamer ~]$
Re: can rbd unmap detect if device is mounted?
On 07/16/2012 04:37 PM, Tommi Virtanen wrote:
> Is his problem really about the locking? It sounded to me like he has something (the mount) referencing a block device, and we're letting the block device disappear. The locking you guys have been talking about sounds like that lock would be held whenever the image is mapped, regardless of whether it's mounted or not (think mkfs).
>
> Should unmap even be possible while the block device is open? Shouldn't there be a refcount and an -EBUSY? That's what other block device providers do:

That would be the best solution. Looking into it, the rbd driver is already keeping a refcount just like the loop driver (in block_device_operations .open/.release). The rbd driver is only using it for the struct device, though, instead of struct rbd_device. This shouldn't be too hard to fix.

Josh
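[Once that refcount guards unmap, rbd should behave like the loop device in Tommi's transcript. A hypothetical session under that assumption; the image name, device node, and mountpoint are made up:]

    rbd map rbd/test            # maps to /dev/rbd0, say
    mkfs.ext4 /dev/rbd0
    mount /dev/rbd0 /mnt
    rbd unmap /dev/rbd0         # desired: fails with "Device or resource busy"
    umount /mnt
    rbd unmap /dev/rbd0         # succeeds once nothing holds the device open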
Re: [PATCH] Robustify ceph-rbdnamer and adapt udev rules
On 07/12/2012 12:49 AM, Pascal de Bruijn | Unilogic Networks B.V. wrote:
> On Wed, 2012-07-11 at 09:28 -0700, Josh Durgin wrote:
> > On 07/11/2012 06:23 AM, Pascal de Bruijn | Unilogic Networks B.V. wrote:
> > > Below is a patch which makes the ceph-rbdnamer script more robust and fixes a problem with the rbd udev rules.
> > >
> > > On our setup we encountered a symlink which was linked to the wrong rbd:
> > >
> > > /dev/rbd/mypool/myrbd -> /dev/rbd1
> > >
> > > while that link should have gone to /dev/rbd3 (on which a partition /dev/rbd3p1 was present).
> > >
> > > Now the old udev rule passes %n to the ceph-rbdnamer script. The problem with %n is that %n results in a value of 3 for rbd3, but in a value of 1 for rbd3p1, so it can't be depended upon for rbd naming.
> > >
> > > In the patch below the ceph-rbdnamer script is made more robust, and now it can be called in various ways:
> > >
> > > /usr/bin/ceph-rbdnamer /dev/rbd3
> > > /usr/bin/ceph-rbdnamer /dev/rbd3p1
> > > /usr/bin/ceph-rbdnamer rbd3
> > > /usr/bin/ceph-rbdnamer rbd3p1
> > > /usr/bin/ceph-rbdnamer 3
> > >
> > > Even with all these different styles of calling the modified script, it should now return the same rbd name. This change has to be combined with calling it from udev with %k though.
> > >
> > > With that fixed, we hit the second problem. We ended up with:
> > >
> > > /dev/rbd/mypool/myrbd -> /dev/rbd3p1
> > >
> > > So the rbd name was symlinked to the partition on the rbd instead of the rbd itself. What probably went wrong is udev discovering the disk and running ceph-rbdnamer, which resolved it to myrbd, so the following symlink was created:
> > >
> > > /dev/rbd/mypool/myrbd -> /dev/rbd3
> > >
> > > However, partitions would be discovered next, and ceph-rbdnamer would be run with rbd3p1 (%k) as parameter, resulting in the name myrbd too, with the previous correct symlink being overwritten with a faulty one:
> > >
> > > /dev/rbd/mypool/myrbd -> /dev/rbd3p1
> > >
> > > The solution to the problem is differentiating between disks and partitions in udev and handling them slightly differently. So with the patch below, partitions now get their own symlinks in the following style (which is fairly consistent with other udev rules):
> > >
> > > /dev/rbd/mypool/myrbd-part1 -> /dev/rbd3p1
> > >
> > > Please let me know any feedback you have on this patch or the approach used.
> >
> > This all makes sense, but maybe we should put the -part suffix in another namespace to avoid colliding with images that happen to have -partN in their name, e.g.:
> >
> > /dev/rbd/mypool/myrbd/part1 -> /dev/rbd3p1
>
> Well, my current patch changes the udev rules in a way that's consistent with other udev bits. For example:
>
> /dev/disk/by-id/cciss-3600508b100103835322020202026
> /dev/disk/by-id/cciss-3600508b100103835322020202026-part1
> /dev/disk/by-id/cciss-3600508b100103835322020202026-part2
>
> There is no namespacing there either. That said, those rules tend to use serials/unique ids for naming (and not user-specified strings), so there is little risk of conflicting with the -part%n bit.
>
> Also, having a namespace as suggested:
>
> /dev/rbd/mypool/myrbd/part1 -> /dev/rbd3p1
>
> precludes:
>
> /dev/rbd/mypool/myrbd -> /dev/rbd3
>
> from existing, as myrbd can't be both a device file and a directory at the same time :)
>
> Assuming you'd want to continue with this approach, the disk udev link should probably be something like:
>
> /dev/rbd/mypool/myrbd/disk -> /dev/rbd3
>
> Please do note that this would change the udev rules in a way that could potentially break people's existing scripts which might assume the old udev scheme (whereas my current patch does not break the old scheme).

Good point.
> Maybe it's worth considering applying my patch as-is to the 0.48.x stable tree, and experimenting with other udev schemes in newer development releases?

That sounds best. I've applied your patch to the stable, next, and master branches.

Thanks!
Josh

> Regards,
> Pascal de Bruijn
Intermittent loss of connectivity with KVM-Ceph-Network (solved)
Hello,

I just want to share my recent experience with KVM backed by RBD. Ceph appears not to be at fault, but I'm posting it here for others to read, as my configuration is something other users of Ceph may set up.

Over the last three weeks I was battling an elusive issue: KVM guests backed by RBD intermittently lost network connectivity under some consistent (but relatively low) load. It was driving me nuts, as nothing appeared in the logs and everything was seemingly OK, with one exception: pings to a nearby host sometimes came back with a "No buffer space available" error, and pings could be delayed by 20-30 seconds (obviously such a delay causes a lot of timeouts).

The KVM guest has two virtio network interfaces, with the vhost_net module running on the host. One interface is publicly available; the second is connected to a private network. I tried changing virtio to e1000 and increasing network buffers - all in vain. I also noticed that when a hold-up happened on the interface, pings came back one second apart: i.e. they piled up on the interface, were suddenly all sent through at once, and the remote host returned all of them pretty much simultaneously.

Another observation I made: this behaviour was clearly evident when network/disk activity was reasonable - during backup. I stopped the backup for one day, but it did not help (though the loss of connectivity did not happen as much).

Being unable to identify the cause, I started to pull things apart. I moved the image from RBD to qcow and magically everything became normal. Back on RBD, the issue manifested itself again. On the other hand, I had a number of freshly installed VMs, also backed by RBD, that did not have this issue. The VMs with the fault were different: they had been migrated from hardware hosts into the VM environment. Fresh VMs and migrated VMs are distro-synced FC17, so I did not expect any difference. The only difference left was that the migrated VMs were 32 bit and the freshly installed ones were 64 bit.

So in the end I upgraded the kernel in one faulty VM to 64 bit (while leaving the rest of the system 32 bit) and the problem disappeared! The next day I upgraded another VM the same way and it also became problem free. I am now sure the problem lies in a 32 bit kernel running on a 64 bit host. My guess is there is a race condition, likely in the virtio_net driver or in the tcp stack, which is apparently triggered by context switching from 32 bit to 64 bit and the io delays introduced by the QEMU-RBD driver. Only when a 32 bit VM runs on a 64 bit host and is backed by an RBD image does this issue appear.

Being unable to identify the exact spot in the kernel where the problem is, I'm not even sure where exactly I should report it, so I decided to post it here, as this is where people with VMs backed by RBD will most likely look for a solution.

Regards,
Vladimir
Re: can rbd unmap detect if device is mounted?
On Mon, Jul 16, 2012 at 7:37 PM, Tommi Virtanen t...@inktank.com wrote:
> [0 tv@dreamer ~]$ sudo mount /dev/loop0 /mnt
> [0 tv@dreamer ~]$ sudo losetup -d /dev/loop0
> loop: can't delete device /dev/loop0: Device or resource busy
> [1 tv@dreamer ~]$ sudo umount /mnt
> [0 tv@dreamer ~]$ sudo losetup -d /dev/loop0
> [0 tv@dreamer ~]$

Thanks for the ingenious loop device example, Tommi. You illustrated my point better than I ever could have. Sounds like you guys have a handle on the situation.

- Travis
Re: How to compile Java-Rados.
Hi Noah,

I hardcoded CONF_FILE=/etc/ceph/ceph.conf; and commented out //System.getProperty(CEPH_CONF_FILE); in RadosTestUtils.java. Then I ran ant, and after that ant test, but I am still getting an error:

Buildfile: /home/vu/java-rados/build.xml
makedir:
compile-rados:
compile-tests:
    [javac] Compiling 1 source file to /home/vu/java-rados/build/test
jar:
test:
    [junit] Running ClusterStatsTest
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.068 sec

BUILD FAILED
/home/vu/java-rados/build.xml:134: Test ClusterStatsTest failed

Total time: 2 seconds

The TEST-ClusterStatsTest.txt file shows:

Testsuite: ClusterStatsTest
Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.068 sec
Testcase: test_ClusterStats took 0.036 sec
        Caused an ERROR
rados_connect: ret=-1
net.newdream.ceph.rados.RadosException: rados_connect: ret=-1
        at net.newdream.ceph.rados.Cluster.native_connect(Native Method)
        at net.newdream.ceph.rados.Cluster.connect(Unknown Source)
        at ClusterStatsTest.setup(Unknown Source)

How do I give CEPH_ARGS?

Thanks,
Ramu.
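[To the CEPH_ARGS question: it is an environment variable read by librados, so it can be set inline when invoking the test runner. A minimal sketch, assuming ant test is still the entry point and using an arbitrary log path; note the error has changed from conf_read_file: ret=-22 to rados_connect: ret=-1, so the config is now being read and the failure is at connect time.]

    # Sketch: pass extra ceph options to librados through the environment
    # (the log path is arbitrary; --debug-ms 1 enables messenger debugging).
    CEPH_ARGS="--debug-ms 1 --log-file /tmp/java-rados-debug.log" ant test

    # Or export it for the whole shell session:
    export CEPH_ARGS="--debug-ms 1 --log-file /tmp/java-rados-debug.log"
    ant test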