Jerasure 1.2A plugin for Ceph
Hi James,

The first version of the jerasure 1.2A plugin for Ceph is complete at https://github.com/ceph/ceph/pull/538#commits-pushed-763275e

This commit introduces the main part, "ErasureCodeJerasure: base class for jerasure ErasureCodeInterface": https://github.com/dachary/ceph/commit/76d2842358465e560a4929d60131762f8c93804f

Each technique is then derived from it in six successive commits, starting from "ErasureCodeJerasure: define technique ReedSolomonVandermonde".

It would be great if you could take a look and let us know if you see anything odd.

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
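For readers who don't open the pull request: the shape of the series is a base class that owns the common jerasure plumbing, with one subclass per technique supplying its own matrix setup. The C++ sketch below is only an approximation for illustration; the class and hook names are not the exact ErasureCodeJerasure interface from the commits.

    // Rough sketch of the base-class/technique split described above.
    // The real code lives in the linked commits; names here are approximations.
    #include <map>
    #include <string>

    class ErasureCodeJerasureSketch {              // stands in for ErasureCodeJerasure
    public:
      int k = 0;   // data chunks
      int m = 0;   // coding chunks
      int w = 0;   // word size used by jerasure
      virtual ~ErasureCodeJerasureSketch() {}
      virtual void parse(const std::map<std::string, std::string> &parameters) = 0;
      virtual void prepare() = 0;                  // build technique-specific matrices
      virtual void jerasure_encode(char **data, char **coding, int blocksize) = 0;
      virtual int  jerasure_decode(int *erasures, char **data, char **coding,
                                   int blocksize) = 0;
    };

    // One subclass per technique, e.g. Reed-Solomon with a Vandermonde matrix.
    class ReedSolomonVandermondeSketch : public ErasureCodeJerasureSketch {
      int *matrix = nullptr;
    public:
      void parse(const std::map<std::string, std::string> &parameters) override {
        // read k, m, w from the plugin parameters, falling back to defaults
      }
      void prepare() override {
        // matrix = reed_sol_vandermonde_coding_matrix(k, m, w);   // jerasure 1.2A call
      }
      void jerasure_encode(char **data, char **coding, int blocksize) override {
        // jerasure_matrix_encode(k, m, w, matrix, data, coding, blocksize);
      }
      int jerasure_decode(int *erasures, char **data, char **coding,
                          int blocksize) override {
        // return jerasure_matrix_decode(k, m, w, matrix, 1, erasures,
        //                               data, coding, blocksize);
        return 0;
      }
    };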
Re: radosgw 0.67.2 update - ERROR: failed to initialize watch
Hi,

> I just pushed a fix to wip-6161, can you verify that it fixes the issue for you?
> Thanks,

I'll give it a shot on Monday, I'm out of the office at the moment.

Cheers,
Sylvain
libvirt: Using rbd_create3 to create format 2 images
Hi,

I created the attached patch to have libvirt create images with format 2 by default. This would simplify the CloudStack code and could also help other projects.

The problem with libvirt is that there is no mechanism to supply information like order, features, stripe unit and count to the rbd_create3 method, so it's now hardcoded in libvirt.

Any comments on this patch before I fire it off to the libvirt guys?

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on

From 2731f7c131d938ed5029bf8343877fcc4d950a0f Mon Sep 17 00:00:00 2001
From: Wido den Hollander <w...@widodh.nl>
Date: Fri, 30 Aug 2013 10:50:25 +0200
Subject: [PATCH] rbd: Use rbd_create3 to create RBD format 2 images by default

This new RBD format supports snapshotting and cloning. By having libvirt
create images in format 2, end-users of the created images can benefit
from the new RBD format.

Signed-off-by: Wido den Hollander <w...@widodh.nl>
---
 src/storage/storage_backend_rbd.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c
index d9e1789..e5d720e 100644
--- a/src/storage/storage_backend_rbd.c
+++ b/src/storage/storage_backend_rbd.c
@@ -443,6 +443,9 @@ static int virStorageBackendRBDCreateVol(virConnectPtr conn,
     ptr.cluster = NULL;
     ptr.ioctx = NULL;
     int order = 0;
+    uint64_t features = 3;
+    uint64_t stripe_count = 1;
+    uint64_t stripe_unit = 4194304;
     int ret = -1;

     VIR_DEBUG("Creating RBD image %s/%s with size %llu",
@@ -467,7 +470,8 @@ static int virStorageBackendRBDCreateVol(virConnectPtr conn,
         goto cleanup;
     }

-    if (rbd_create(ptr.ioctx, vol->name, vol->capacity, &order) < 0) {
+    if (rbd_create3(ptr.ioctx, vol->name, vol->capacity, features, &order,
+                    stripe_count, stripe_unit) < 0) {
         virReportError(VIR_ERR_INTERNAL_ERROR,
                        _("failed to create volume '%s/%s'"),
                        pool->def->source.name,
--
1.7.9.5
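As an aside on the hardcoded values in the patch: features = 3 is the bitwise OR of the first two RBD feature bits, and 4194304 bytes is 4 MiB, matching the default object size of an order-22 image. A small illustrative fragment (not part of the patch) spelling that out with the named constants from librbd.h:

    #include <rbd/librbd.h>
    #include <cstdint>

    // features = 3 == RBD_FEATURE_LAYERING (bit 0) | RBD_FEATURE_STRIPINGV2 (bit 1):
    // a format 2 image that supports cloning plus the v2 striping layout.
    uint64_t features = RBD_FEATURE_LAYERING | RBD_FEATURE_STRIPINGV2;

    // order = 0 lets librbd pick its default order (22, i.e. 4 MiB objects);
    // a 4194304-byte stripe unit with stripe_count = 1 matches that layout.
    uint64_t stripe_unit = 1ULL << 22;   // 4194304
    uint64_t stripe_count = 1;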
RE: debugging librbd async - valgrind memtest hit
I finally got a valgrind memtest hit... output attached below email. I recompiled all of tapdisk and ceph without any -O options (thought I had already...) and it seems to have done the trick.

Basically it looks like an instance of AioRead is being accessed after being free'd. I need some hints on what api behaviour by the tapdisk driver could be causing this to happen in librbd...

thanks

James

==25078== Memcheck, a memory error detector
==25078== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==25078== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==25078== Command: /usr/bin/tapdisk.clean
==25078== Parent PID: 25077
==25078==
==25078==
==25078== HEAP SUMMARY:
==25078==     in use at exit: 6,808 bytes in 7 blocks
==25078==   total heap usage: 7 allocs, 0 frees, 6,808 bytes allocated
==25078==
==25078== For a detailed leak analysis, rerun with: --leak-check=full
==25078==
==25078== For counts of detected and suppressed errors, rerun with: -v
==25078== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
==25081== Warning: noted but unhandled ioctl 0xd0 with no size/direction hints
==25081==    This could cause spurious value errors to appear.
==25081==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25081== Syscall param ioctl(FIBMAP) points to unaddressable byte(s)
==25081==    at 0x75F1AC7: ioctl (syscall-template.S:82)
==25081==    by 0x4088DF: tapdisk_blktap_complete_request (tapdisk-blktap.c:150)
==25081==    by 0x40802C: tapdisk_vbd_kick (tapdisk-vbd.c:1441)
==25081==    by 0x40E684: tapdisk_server_iterate (tapdisk-server.c:211)
==25081==    by 0x40E864: tapdisk_server_run (tapdisk-server.c:334)
==25081==    by 0x4039BF: main (tapdisk2.c:150)
==25081== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==25081==
==25081== Invalid read of size 8
==25081==    at 0x7044DB6: librbd::AioRead::send() (AioRequest.cc:106)
==25081==    by 0x7076EDE: librbd::aio_read(librbd::ImageCtx*, std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3096)
==25081==    by 0x7076330: librbd::aio_read(librbd::ImageCtx*, unsigned long, unsigned long, char*, ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3032)
==25081==    by 0x703EF75: rbd_aio_read (librbd.cc:1117)
==25081==    by 0x41FDA4: tdrbd_submit_request (block-rbd.c:540)
==25081==    by 0x42004A: tdrbd_queue_request (block-rbd.c:659)
==25081==    by 0x40602A: tapdisk_vbd_issue_request (tapdisk-vbd.c:1244)
==25081==    by 0x4062FA: tapdisk_vbd_issue_new_requests (tapdisk-vbd.c:1340)
==25081==    by 0x407C27: tapdisk_vbd_issue_requests (tapdisk-vbd.c:1403)
==25081==    by 0x407DBA: tapdisk_vbd_check_state (tapdisk-vbd.c:891)
==25081==    by 0x40E62C: tapdisk_server_iterate (tapdisk-server.c:220)
==25081==    by 0x40E864: tapdisk_server_run (tapdisk-server.c:334)
==25081== Address 0xfe79b38 is 8 bytes inside a block of size 248 free'd
==25081==    at 0x4C279DC: operator delete(void*) (vg_replace_malloc.c:457)
==25081==    by 0x7046859: librbd::AioRead::~AioRead() (AioRequest.h:74)
==25081==    by 0x70426E6: librbd::AioRequest::complete(int) (AioRequest.h:41)
==25081==    by 0x7074323: librbd::rados_req_cb(void*, void*) (internal.cc:2751)
==25081==    by 0x5FD191A: librados::C_AioComplete::finish(int) (AioCompletionImpl.h:181)
==25081==    by 0x5F907E0: Context::complete(int) (Context.h:42)
==25081==    by 0x6066CEF: Finisher::finisher_thread_entry() (Finisher.cc:56)
==25081==    by 0x5FB81D3: Finisher::FinisherThread::entry() (Finisher.h:46)
==25081==    by 0x62C89E0: Thread::_entry_func(void*) (Thread.cc:41)
==25081==    by 0x7308B4F: start_thread (pthread_create.c:304)
==25081==    by 0x75F8A7C: clone (clone.S:112)
==25081==
==25081== Invalid read of size 8
==25081==    at 0x7044DBA: librbd::AioRead::send() (AioRequest.cc:106)
==25081==    by 0x7076EDE: librbd::aio_read(librbd::ImageCtx*, std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3096)
==25081==    by 0x7076330: librbd::aio_read(librbd::ImageCtx*, unsigned long, unsigned long, char*, ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3032)
==25081==    by 0x703EF75: rbd_aio_read (librbd.cc:1117)
==25081==    by 0x41FDA4: tdrbd_submit_request (block-rbd.c:540)
==25081==    by 0x42004A: tdrbd_queue_request (block-rbd.c:659)
==25081==    by 0x40602A: tapdisk_vbd_issue_request (tapdisk-vbd.c:1244)
==25081==    by 0x4062FA: tapdisk_vbd_issue_new_requests (tapdisk-vbd.c:1340)
==25081==    by 0x407C27: tapdisk_vbd_issue_requests (tapdisk-vbd.c:1403)
==25081==    by 0x407DBA: tapdisk_vbd_check_state (tapdisk-vbd.c:891)
==25081==    by 0x40E62C: tapdisk_server_iterate (tapdisk-server.c:220)
==25081==    by 0x40E864: tapdisk_server_run
collectd plugin with cuttlefish
Hi,

Has anything changed with the admin socket that would prevent the collectd plugin (compiled against 5.3.1 using the patches submitted to the collectd ML) from gathering stats?

I've recompiled collectd with --enable-debug and receive the following output in the log:

ceph_init name=mon_ceph1, asok_path=/var/run/ceph/ceph-mon.ceph1.asok
entering cconn_main_loop(request_type = 0)
did cconn_prepare(name=mon_ceph1,i=0,st=1)
cconn_handle_event(name=mon_ceph1,state=1,amt=0,ret=4)
did cconn_prepare(name=mon_ceph1,i=0,st=2)
did cconn_prepare(name=mon_ceph1,i=0,st=2)
ERROR: cconn_main_loop: timed out.
cconn_main_loop: reached all Ceph daemons :)
Initialization of plugin `ceph' failed with status -110. Plugin will be unloaded.
plugin_unregister_read: Marked `ceph' for removal.

Thanks in advance!
ceph s3 allowed characters
Hi,

I got an error (400) from radosgw on this request:

2013-08-30 08:09:19.396812 7f3b307c0700 2 req 3070:0.000150::POST /dysk/files/test.test%40op.pl/DOMIWENT%202013/DW%202013_03_27/PROJEKTY%202012/ZB%20KROL/Szko%C5%82a%20%C5%81aziska%20ZB%20KROL/sala-%A3aziska_Dolne_PB-0_went_15_11_06%20Layout1%20%283%29.pdf::http status=400
2013-08-30 08:09:34.851892 7f3b55ffb700 10 s->object=files/test.t...@op.pl/DOMIWENT 2013/Damian DW/dw/Specyfikacja istotnych warunkF3w zamF3wienia.doc s->bucket=dysk

What is the allowed range of characters in a URL in radosgw?

--
Regards
Dominik
Re: [ceph-users] ceph s3 allowed characters
(echo -n 'GET /dysk/files/test.test%40op.pl/DOMIWENT%202013/Damian%20DW/dw/Specyfikacja%20istotnych%20warunk%F3w%20zam%F3wienia.doc HTTP/1.0'; printf "\r\n\r\n") | nc localhost 88

HTTP/1.1 400 Bad Request
Date: Fri, 30 Aug 2013 14:10:07 GMT
Server: Apache/2.2.22 (Ubuntu)
Accept-Ranges: bytes
Content-Length: 83
Connection: close
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidObjectName</Code></Error>

Full log from radosgw for another failing request:

2013-08-30 14:32:52.166321 7f42e77d6700 1 == starting new request req=0x12cff20 =
2013-08-30 14:32:52.166385 7f42e77d6700 2 req 33246:0.65initializing
2013-08-30 14:32:52.166410 7f42e77d6700 10 meta HTTP_X_AMZ_ACL=public-read
2013-08-30 14:32:52.166419 7f42e77d6700 10 x x-amz-acl:public-read
2013-08-30 14:32:52.166497 7f42e77d6700 10 s->object=files/test.t...@op.pl/DOMIWENT 2013/DW 2013_03_27/PROJEKTY 2012/ZB KROL/Szkoła Łaziska ZB KROL/sala-A3aziska_Dolne_PB-0_went_15_11_06 Layout1 (4).pdf s->bucket=dysk
2013-08-30 14:32:52.166563 7f42e77d6700 2 req 33246:0.000242::POST /dysk/files/test.test%40op.pl/DOMIWENT%202013/DW%202013_03_27/PROJEKTY%202012/ZB%20KROL/Szko%C5%82a%20%C5%81aziska%20ZB%20KROL/sala-%A3aziska_Dolne_PB-0_went_15_11_06%20Layout1%20%284%29.pdf::http status=400
2013-08-30 14:32:52.166653 7f42e77d6700 1 == req done req=0x12cff20 http_status=400 ==

--
Dominik

2013/8/30 Alfredo Deza alfredo.d...@inktank.com:
> On Fri, Aug 30, 2013 at 9:52 AM, Dominik Mostowiec dominikmostow...@gmail.com wrote:
> > Hi,
> >
> > I got an error (400) from radosgw on this request:
> >
> > 2013-08-30 08:09:19.396812 7f3b307c0700 2 req 3070:0.000150::POST /dysk/files/test.test%40op.pl/DOMIWENT%202013/DW%202013_03_27/PROJEKTY%202012/ZB%20KROL/Szko%C5%82a%20%C5%81aziska%20ZB%20KROL/sala-%A3aziska_Dolne_PB-0_went_15_11_06%20Layout1%20%283%29.pdf::http status=400
> > 2013-08-30 08:09:34.851892 7f3b55ffb700 10 s->object=files/test.t...@op.pl/DOMIWENT 2013/Damian DW/dw/Specyfikacja istotnych warunkF3w zamF3wienia.doc s->bucket=dysk
> >
> > What is the allowed range of characters in a URL in radosgw?
>
> Can you post the full HTTP headers for the response? The output you are pasting is not entirely clear to me; is that a single log line for the whole request? Maybe it is just the formatting that is throwing me off.

--
Regards
Dominik
Re: [ceph-users] ceph s3 allowed characters
On Fri, Aug 30, 2013 at 7:44 AM, Dominik Mostowiec dominikmostow...@gmail.com wrote:
> (echo -n 'GET /dysk/files/test.test%40op.pl/DOMIWENT%202013/Damian%20DW/dw/Specyfikacja%20istotnych%20warunk%F3w%20zam%F3wienia.doc HTTP/1.0'; printf "\r\n\r\n") | nc localhost 88
>
> HTTP/1.1 400 Bad Request
> Date: Fri, 30 Aug 2013 14:10:07 GMT
> Server: Apache/2.2.22 (Ubuntu)
> Accept-Ranges: bytes
> Content-Length: 83
> Connection: close
> Content-Type: application/xml
>
> <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidObjectName</Code></Error>

The object name needs to be utf8 encoded.

Yehuda
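The 400s above are consistent with that: %F3 and %A3 are single-byte Latin-2/Windows-1250 code points ("ó" and "Ł") and are not valid UTF-8, while other segments of the same URL (e.g. Szko%C5%82a) are already UTF-8-encoded. A small self-contained C++ sketch (not radosgw or any client-library code) showing what UTF-8 percent-encoding of such a name looks like:

    // Illustrative only: percent-encode a UTF-8 string for use as an S3 object
    // name in a URL. Shows why "%F3" (Latin-2 "ó") is rejected while the UTF-8
    // form "%C3%B3" is accepted.
    #include <cstdio>
    #include <string>

    static std::string percent_encode_utf8(const std::string &utf8) {
      static const char *hex = "0123456789ABCDEF";
      std::string out;
      for (unsigned char c : utf8) {
        // Keep unreserved characters as-is; encode every other byte.
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '~') {
          out += static_cast<char>(c);
        } else {
          out += '%';
          out += hex[c >> 4];
          out += hex[c & 0x0F];
        }
      }
      return out;
    }

    int main() {
      // "ó" in UTF-8 is the two bytes 0xC3 0xB3, so the result contains
      // "%C3%B3" rather than the legacy single-byte "%F3".
      std::printf("%s\n",
                  percent_encode_utf8("warunk\xC3\xB3w zam\xC3\xB3wienia.doc").c_str());
    }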
RE: debugging librbd async - valgrind memtest hit
On Fri, 30 Aug 2013, James Harper wrote:
> I finally got a valgrind memtest hit... output attached below email. I recompiled all of tapdisk and ceph without any -O options (thought I had already...) and it seems to have done the trick

What version is this? The line numbers don't seem to match up with my source tree.

> Basically it looks like an instance of AioRead is being accessed after being free'd. I need some hints on what api behaviour by the tapdisk driver could be causing this to happen in librbd...

It looks like refcounting for the AioCompletion is off. My first guess would be premature (or extra) calls to rados_aio_release or AioCompletion::release(). I did a quick look at the code and it looks like aio_read() is carrying a ref for the AioCompletion for the entire duration of the function, so it should not be disappearing (and taking the AioRead request struct with it) until well after where the invalid read is. Maybe there is an error path somewhere that is dropping a ref it shouldn't?

sage

> thanks
>
> James
>
> [...]
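As a reference point for the completion lifecycle Sage describes, here is a minimal caller-side sketch using the public librbd C API; the helper function and control flow are illustrative, not taken from the tapdisk driver. The key property is that the caller releases each completion exactly once, only after it has completed, and never touches it again afterwards.

    // Sketch of rbd_completion_t usage from a caller such as a tapdisk driver;
    // releasing the completion early (or twice) would drop the last ref while
    // librbd's own AioRead object may still be in flight.
    #include <rbd/librbd.h>
    #include <cstdio>

    static void read_done(rbd_completion_t c, void *arg) {
      (void)arg;
      // Runs in librbd's finisher thread; just record the result here.
      std::printf("read returned %zd\n", rbd_aio_get_return_value(c));
    }

    int submit_one_read(rbd_image_t image, uint64_t off, size_t len, char *buf) {
      rbd_completion_t c;
      int r = rbd_aio_create_completion(nullptr, read_done, &c);
      if (r < 0)
        return r;

      r = rbd_aio_read(image, off, len, buf, c);
      if (r < 0) {
        rbd_aio_release(c);              // release on the error path too, but only once
        return r;
      }

      rbd_aio_wait_for_complete(c);      // or poll from the driver's event loop
      ssize_t ret = rbd_aio_get_return_value(c);
      rbd_aio_release(c);                // exactly one release, after completion
      return ret < 0 ? (int)ret : 0;
    }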
Re: libvirt: Using rbd_create3 to create format 2 images
On 08/30/2013 02:42 AM, Wido den Hollander wrote:
> Hi,
>
> I created the attached patch to have libvirt create images with format 2 by default. This would simplify the CloudStack code and could also help other projects.
>
> The problem with libvirt is that there is no mechanism to supply information like order, features, stripe unit and count to the rbd_create3 method, so it's now hardcoded in libvirt.
>
> Any comments on this patch before I fire it off to the libvirt guys?

Seems ok to me. They might want you to detect whether the function is there and compile without it if librbd doesn't support it (rbd_create3 first appeared in bobtail).
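A rough illustration of that kind of compile-time fallback; HAVE_RBD_CREATE3 is a hypothetical macro standing in for whatever configure-time or LIBRBD_VERSION_CODE check the libvirt maintainers would prefer, and the helper below is not part of the actual patch:

    #include <rados/librados.h>
    #include <rbd/librbd.h>
    #include <stdint.h>

    /* Assume the build system defines HAVE_RBD_CREATE3 when librbd provides
     * rbd_create3() (first shipped in bobtail). */
    static int create_rbd_volume(rados_ioctx_t ioctx, const char *name,
                                 uint64_t capacity)
    {
        int order = 0;                    /* 0 lets librbd choose its default */
    #ifdef HAVE_RBD_CREATE3
        uint64_t features = 3;            /* layering + striping v2 */
        uint64_t stripe_unit = 4194304;   /* librbd.h takes stripe_unit before stripe_count */
        uint64_t stripe_count = 1;
        return rbd_create3(ioctx, name, capacity, features, &order,
                           stripe_unit, stripe_count);
    #else
        /* Pre-bobtail librbd: only format 1 images can be created. */
        return rbd_create(ioctx, name, capacity, &order);
    #endif
    }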
Deep-Scrub and High Read Latency with QEMU/RBD
We've been struggling with an issue of spikes of high i/o latency with qemu/rbd guests. As we've been chasing this bug, we've greatly improved the methods we use to monitor our infrastructure. It appears that our RBD performance chokes in two situations:

- Deep-Scrub
- Backfill/recovery

In this email, I want to focus on deep-scrub.

Graphing '% Util' from 'iostat -x' on my hosts with OSDs, I can see Deep-Scrub take my disks from around 10% utilized to complete saturation during a scrub. RBD writeback cache appears to cover the issue nicely, but occasionally suffers drops in performance (presumably when it flushes). But reads appear to suffer greatly, with multiple seconds of 0B/s of reads accomplished (see the log fragment below).

If I make the assumption that deep-scrub isn't intended to create massive spindle contention, this appears to be a problem. What should happen here?

Looking at the settings around deep-scrub, I don't see an obvious way to say "don't saturate my drives". Are there any settings in Ceph or otherwise (readahead?) that might lower the burden of deep-scrub? If not, perhaps reads could be remapped to avoid waiting on saturated disks during scrub. Any ideas?

2013-08-30 15:47:20.166149 mon.0 [INF] pgmap v9853931: 20672 pgs: 20665 active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64556 GB / 174 TB avail; 0B/s rd, 5058KB/s wr, 217op/s
2013-08-30 15:47:21.945948 mon.0 [INF] pgmap v9853932: 20672 pgs: 20665 active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64556 GB / 174 TB avail; 0B/s rd, 5553KB/s wr, 229op/s
2013-08-30 15:47:23.205843 mon.0 [INF] pgmap v9853933: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64556 GB / 174 TB avail; 0B/s rd, 6580KB/s wr, 246op/s
2013-08-30 15:47:24.843308 mon.0 [INF] pgmap v9853934: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64556 GB / 174 TB avail; 0B/s rd, 3795KB/s wr, 224op/s
2013-08-30 15:47:25.862722 mon.0 [INF] pgmap v9853935: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64556 GB / 174 TB avail; 1414B/s rd, 3799KB/s wr, 181op/s
2013-08-30 15:47:26.887516 mon.0 [INF] pgmap v9853936: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64556 GB / 174 TB avail; 1541B/s rd, 8138KB/s wr, 160op/s
2013-08-30 15:47:27.933629 mon.0 [INF] pgmap v9853937: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64556 GB / 174 TB avail; 0B/s rd, 14458KB/s wr, 304op/s
2013-08-30 15:47:29.127847 mon.0 [INF] pgmap v9853938: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64556 GB / 174 TB avail; 0B/s rd, 15300KB/s wr, 345op/s
2013-08-30 15:47:30.344837 mon.0 [INF] pgmap v9853939: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64556 GB / 174 TB avail; 0B/s rd, 13128KB/s wr, 218op/s
2013-08-30 15:47:31.380089 mon.0 [INF] pgmap v9853940: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64556 GB / 174 TB avail; 0B/s rd, 13299KB/s wr, 241op/s
2013-08-30 15:47:32.388303 mon.0 [INF] pgmap v9853941: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64556 GB / 174 TB avail; 4951B/s rd, 8147KB/s wr, 192op/s
2013-08-30 15:47:33.858382 mon.0 [INF] pgmap v9853942: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64556 GB / 174 TB avail; 7029B/s rd, 3254KB/s wr, 190op/s
2013-08-30 15:47:35.279691 mon.0 [INF] pgmap v9853943: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64555 GB / 174 TB avail; 1651B/s rd, 2476KB/s wr, 207op/s
2013-08-30 15:47:36.309078 mon.0 [INF] pgmap v9853944: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64555 GB / 174 TB avail; 0B/s rd, 3788KB/s wr, 239op/s
2013-08-30 15:47:38.120343 mon.0 [INF] pgmap v9853945: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64555 GB / 174 TB avail; 0B/s rd, 4671KB/s wr, 239op/s
2013-08-30 15:47:39.546980 mon.0 [INF] pgmap v9853946: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64555 GB / 174 TB avail; 0B/s rd, 13487KB/s wr, 444op/s
2013-08-30 15:47:40.561203 mon.0 [INF] pgmap v9853947: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64555 GB / 174 TB avail; 0B/s rd, 15265KB/s wr, 489op/s
2013-08-30 15:47:41.794355 mon.0 [INF] pgmap v9853948: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 64555 GB / 174 TB avail; 0B/s rd, 7157KB/s wr, 240op/s
2013-08-30 15:47:44.661000 mon.0 [INF] pgmap v9853949: 20672 pgs: 20664 active+clean, 8 active+clean+scrubbing+deep; 38136 GB
Re: Deep-Scrub and High Read Latency with QEMU/RBD
You may want to reduce the number of scrubbing PGs per OSD to 1 using the config option and check the results.

On Fri, Aug 30, 2013 at 8:03 PM, Mike Dawson mike.daw...@cloudapt.com wrote:
> We've been struggling with an issue of spikes of high i/o latency with qemu/rbd guests. As we've been chasing this bug, we've greatly improved the methods we use to monitor our infrastructure. It appears that our RBD performance chokes in two situations:
>
> - Deep-Scrub
> - Backfill/recovery
>
> In this email, I want to focus on deep-scrub.
>
> Graphing '% Util' from 'iostat -x' on my hosts with OSDs, I can see Deep-Scrub take my disks from around 10% utilized to complete saturation during a scrub. RBD writeback cache appears to cover the issue nicely, but occasionally suffers drops in performance (presumably when it flushes). But reads appear to suffer greatly, with multiple seconds of 0B/s of reads accomplished (see the log fragment below).
>
> If I make the assumption that deep-scrub isn't intended to create massive spindle contention, this appears to be a problem. What should happen here?
>
> Looking at the settings around deep-scrub, I don't see an obvious way to say "don't saturate my drives". Are there any settings in Ceph or otherwise (readahead?) that might lower the burden of deep-scrub? If not, perhaps reads could be remapped to avoid waiting on saturated disks during scrub. Any ideas?
>
> [...]
Re: Deep-Scrub and High Read Latency with QEMU/RBD
Andrey,

I use all the defaults:

# ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep scrub
  osd_scrub_thread_timeout: 60,
  osd_scrub_finalize_thread_timeout: 600,
  osd_max_scrubs: 1,
  osd_scrub_load_threshold: 0.5,
  osd_scrub_min_interval: 86400,
  osd_scrub_max_interval: 604800,
  osd_scrub_chunk_min: 5,
  osd_scrub_chunk_max: 25,
  osd_deep_scrub_interval: 604800,
  osd_deep_scrub_stride: 524288,

Which value are you referring to?

Does anyone know exactly how osd scrub load threshold works? The manual states "The maximum CPU load. Ceph will not scrub when the CPU load is higher than this number. Default is 50%." So on a system with multiple processors and cores, what happens? Is the threshold .5 load (meaning half a core), or 50% of max load, meaning anything less than 8 if you have 16 cores?

Thanks,
Mike Dawson

On 8/30/2013 1:34 PM, Andrey Korolyov wrote:
> You may want to reduce the number of scrubbing PGs per OSD to 1 using the config option and check the results.
>
> On Fri, Aug 30, 2013 at 8:03 PM, Mike Dawson mike.daw...@cloudapt.com wrote:
> > We've been struggling with an issue of spikes of high i/o latency with qemu/rbd guests. [...]
Re: Deep-Scrub and High Read Latency with QEMU/RBD
On Fri, Aug 30, 2013 at 9:44 PM, Mike Dawson mike.daw...@cloudapt.com wrote:
> Andrey,
>
> I use all the defaults:
>
> # ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep scrub
>   osd_scrub_thread_timeout: 60,
>   osd_scrub_finalize_thread_timeout: 600,
>   osd_max_scrubs: 1,

This one. I would suggest increasing max_interval and writing some kind of script that scrubs PGs one at a time with low intensity, so that you have at most one scrubbing PG at any moment and can wait a while before starting the next; that way they will not all begin scrubbing at once when max_interval expires. I discussed some throttling mechanisms for scrubbing a few months ago here or on ceph-devel, but there is still no such implementation (it is ultimately a low-priority task since it can be handled by something as simple as the proposal above).

>   osd_scrub_load_threshold: 0.5,
>   osd_scrub_min_interval: 86400,
>   osd_scrub_max_interval: 604800,
>   osd_scrub_chunk_min: 5,
>   osd_scrub_chunk_max: 25,
>   osd_deep_scrub_interval: 604800,
>   osd_deep_scrub_stride: 524288,
>
> Which value are you referring to?
>
> Does anyone know exactly how osd scrub load threshold works? The manual states "The maximum CPU load. Ceph will not scrub when the CPU load is higher than this number. Default is 50%." So on a system with multiple processors and cores, what happens? Is the threshold .5 load (meaning half a core), or 50% of max load, meaning anything less than 8 if you have 16 cores?
>
> Thanks,
> Mike Dawson
>
> [...]
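For reference, the knobs being discussed above are normally set in the [osd] section of ceph.conf (or injected at runtime). The snippet below is only an illustration of the kind of tuning being proposed in this thread, not a recommendation; the interval values are arbitrary examples.

    [osd]
        # at most one concurrent scrub per OSD (already the default shown above)
        osd max scrubs = 1
        # skip scheduled scrubs while the load average is above this value
        osd scrub load threshold = 0.5
        # spread (deep) scrubs out further, e.g. two weeks instead of one
        osd scrub max interval = 1209600
        osd deep scrub interval = 1209600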
Re: libvirt: Using rbd_create3 to create format 2 images
On 08/30/2013 05:26 PM, Josh Durgin wrote:
> On 08/30/2013 02:42 AM, Wido den Hollander wrote:
> > Hi,
> >
> > I created the attached patch to have libvirt create images with format 2 by default. This would simplify the CloudStack code and could also help other projects.
> >
> > The problem with libvirt is that there is no mechanism to supply information like order, features, stripe unit and count to the rbd_create3 method, so it's now hardcoded in libvirt.
> >
> > Any comments on this patch before I fire it off to the libvirt guys?
>
> Seems ok to me. They might want you to detect whether the function is there and compile without it if librbd doesn't support it (rbd_create3 first appeared in bobtail).

Good one. Although I don't think anybody is still running Argonaut, I'll do a version check of librbd and switch to rbd_create if needed.

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
[ceph-users] ceph install
Hello ceph-users,

I am new to Ceph and would like to bring up a 5-node cluster for my PoC. I am doing an installation from the link below and ran into a problem. I am not sure how to deal with it. Can someone please shed some light?

http://ceph.com/docs/master/install/rpm/

[root@cleverloadgen16 ceph]# ceph auth add client.radosgw.gateway --in-file=/etc/ceph/keyring.radosgw.gateway
unable to find any monitors in conf. please specify monitors via -m monaddr or -c ceph.conf
Error connecting to cluster: ObjectNotFound

[root@cleverloadgen16 ceph]# cat keyring.radosgw.gateway
[client.radosgw.gateway]
        key = AQCC4yBSyMWQGBAADS7j7DnIZeGAZiaJFaM8Xw==
        caps mon = "allow rw"
        caps osd = "allow rwx"
[root@cleverloadgen16 ceph]#

Thanks,
jimmy
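The error in the first command means the ceph CLI could not find any monitor address, either from a -m flag or from /etc/ceph/ceph.conf. As a minimal illustration of the two options the message suggests (the address below is a placeholder, not taken from the original post), either run the command with an explicit monitor, e.g. ceph -m 192.168.0.11:6789 auth add client.radosgw.gateway --in-file=/etc/ceph/keyring.radosgw.gateway, or give ceph.conf a monitor entry such as:

    [global]
        # placeholder: list one or more of your monitor addresses here
        mon host = 192.168.0.11:6789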
RE: debugging librbd async - valgrind memtest hit
> On Fri, 30 Aug 2013, James Harper wrote:
> > I finally got a valgrind memtest hit... output attached below email. I recompiled all of tapdisk and ceph without any -O options (thought I had already...) and it seems to have done the trick
>
> What version is this? The line numbers don't seem to match up with my source tree.

0.67.2, but I've peppered it with debug prints.

> > Basically it looks like an instance of AioRead is being accessed after being free'd. I need some hints on what api behaviour by the tapdisk driver could be causing this to happen in librbd...
>
> It looks like refcounting for the AioCompletion is off. My first guess would be premature (or extra) calls to rados_aio_release or AioCompletion::release(). I did a quick look at the code and it looks like aio_read() is carrying a ref for the AioCompletion for the entire duration of the function, so it should not be disappearing (and taking the AioRead request struct with it) until well after where the invalid read is. Maybe there is an error path somewhere that is dropping a ref it shouldn't?
>
> sage

I'll see if I can find a way to track that. It's the c->get() and c->put() that track this, right?

The crash seems a little bit different every time, so it could still be something stomping on memory, eg overwriting the ref count or something.

Thanks

James
Re: collectd plugin with cuttlefish
It's a bit surprising that it broke with cuttlefish; something might have happened in dumpling, but we wouldn't expect changes in cuttlefish. It looks like collectd just couldn't talk to the monitor properly. Maybe look at the mon's log and see what it thinks it saw?

On 08/30/2013 05:04 AM, Damien Churchill wrote:
> Hi,
>
> Has anything changed with the admin socket that would prevent the collectd plugin (compiled against 5.3.1 using the patches submitted to the collectd ML) from gathering stats?
>
> I've recompiled collectd with --enable-debug and receive the following output in the log:
>
> ceph_init name=mon_ceph1, asok_path=/var/run/ceph/ceph-mon.ceph1.asok
> entering cconn_main_loop(request_type = 0)
> did cconn_prepare(name=mon_ceph1,i=0,st=1)
> cconn_handle_event(name=mon_ceph1,state=1,amt=0,ret=4)
> did cconn_prepare(name=mon_ceph1,i=0,st=2)
> did cconn_prepare(name=mon_ceph1,i=0,st=2)
> ERROR: cconn_main_loop: timed out.
> cconn_main_loop: reached all Ceph daemons :)
> Initialization of plugin `ceph' failed with status -110. Plugin will be unloaded.
> plugin_unregister_read: Marked `ceph' for removal.
>
> Thanks in advance!