RE: debugging librbd async - valgrind memtest hit
I finally got a valgrind memtest hit... output attached below email. I recompiled all of tapdisk and ceph without any -O options (thought I had already...) and it seems to have done the trick.

Basically it looks like an instance of AioRead is being accessed after being free'd. I need some hints on what api behaviour by the tapdisk driver could be causing this to happen in librbd...

thanks

James

==25078== Memcheck, a memory error detector
==25078== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==25078== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==25078== Command: /usr/bin/tapdisk.clean
==25078== Parent PID: 25077
==25078==
==25078==
==25078== HEAP SUMMARY:
==25078==     in use at exit: 6,808 bytes in 7 blocks
==25078==   total heap usage: 7 allocs, 0 frees, 6,808 bytes allocated
==25078==
==25078== For a detailed leak analysis, rerun with: --leak-check=full
==25078==
==25078== For counts of detected and suppressed errors, rerun with: -v
==25078== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
==25081== Warning: noted but unhandled ioctl 0xd0 with no size/direction hints
==25081==    This could cause spurious value errors to appear.
==25081==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25081== Syscall param ioctl(FIBMAP) points to unaddressable byte(s)
==25081==    at 0x75F1AC7: ioctl (syscall-template.S:82)
==25081==    by 0x4088DF: tapdisk_blktap_complete_request (tapdisk-blktap.c:150)
==25081==    by 0x40802C: tapdisk_vbd_kick (tapdisk-vbd.c:1441)
==25081==    by 0x40E684: tapdisk_server_iterate (tapdisk-server.c:211)
==25081==    by 0x40E864: tapdisk_server_run (tapdisk-server.c:334)
==25081==    by 0x4039BF: main (tapdisk2.c:150)
==25081==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==25081==
==25081== Invalid read of size 8
==25081==    at 0x7044DB6: librbd::AioRead::send() (AioRequest.cc:106)
==25081==    by 0x7076EDE: librbd::aio_read(librbd::ImageCtx*, std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3096)
==25081==    by 0x7076330: librbd::aio_read(librbd::ImageCtx*, unsigned long, unsigned long, char*, ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3032)
==25081==    by 0x703EF75: rbd_aio_read (librbd.cc:1117)
==25081==    by 0x41FDA4: tdrbd_submit_request (block-rbd.c:540)
==25081==    by 0x42004A: tdrbd_queue_request (block-rbd.c:659)
==25081==    by 0x40602A: tapdisk_vbd_issue_request (tapdisk-vbd.c:1244)
==25081==    by 0x4062FA: tapdisk_vbd_issue_new_requests (tapdisk-vbd.c:1340)
==25081==    by 0x407C27: tapdisk_vbd_issue_requests (tapdisk-vbd.c:1403)
==25081==    by 0x407DBA: tapdisk_vbd_check_state (tapdisk-vbd.c:891)
==25081==    by 0x40E62C: tapdisk_server_iterate (tapdisk-server.c:220)
==25081==    by 0x40E864: tapdisk_server_run (tapdisk-server.c:334)
==25081==  Address 0xfe79b38 is 8 bytes inside a block of size 248 free'd
==25081==    at 0x4C279DC: operator delete(void*) (vg_replace_malloc.c:457)
==25081==    by 0x7046859: librbd::AioRead::~AioRead() (AioRequest.h:74)
==25081==    by 0x70426E6: librbd::AioRequest::complete(int) (AioRequest.h:41)
==25081==    by 0x7074323: librbd::rados_req_cb(void*, void*) (internal.cc:2751)
==25081==    by 0x5FD191A: librados::C_AioComplete::finish(int) (AioCompletionImpl.h:181)
==25081==    by 0x5F907E0: Context::complete(int) (Context.h:42)
==25081==    by 0x6066CEF: Finisher::finisher_thread_entry() (Finisher.cc:56)
==25081==    by 0x5FB81D3: Finisher::FinisherThread::entry() (Finisher.h:46)
==25081==    by 0x62C89E0: Thread::_entry_func(void*) (Thread.cc:41)
==25081==    by 0x7308B4F: start_thread (pthread_create.c:304)
==25081==    by 0x75F8A7C: clone (clone.S:112)
==25081==
==25081== Invalid read of size 8
==25081==    at 0x7044DBA: librbd::AioRead::send() (AioRequest.cc:106)
==25081==    by 0x7076EDE: librbd::aio_read(librbd::ImageCtx*, std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3096)
==25081==    by 0x7076330: librbd::aio_read(librbd::ImageCtx*, unsigned long, unsigned long, char*, ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3032)
==25081==    by 0x703EF75: rbd_aio_read (librbd.cc:1117)
==25081==    by 0x41FDA4: tdrbd_submit_request (block-rbd.c:540)
==25081==    by 0x42004A: tdrbd_queue_request (block-rbd.c:659)
==25081==    by 0x40602A: tapdisk_vbd_issue_request (tapdisk-vbd.c:1244)
==25081==    by 0x4062FA: tapdisk_vbd_issue_new_requests (tapdisk-vbd.c:1340)
==25081==    by 0x407C27: tapdisk_vbd_issue_requests (tapdisk-vbd.c:1403)
==25081==    by 0x407DBA: tapdisk_vbd_check_state (tapdisk-vbd.c:891)
==25081==    by 0x40E62C: tapdisk_server_iterate (tapdisk-server.c:220)
==25081==    by 0x40E864: tapdisk_server_run
RE: debugging librbd async - valgrind memtest hit
On Fri, 30 Aug 2013, James Harper wrote:
> I finally got a valgrind memtest hit... output attached below email. I recompiled all of tapdisk and ceph without any -O options (thought I had already...) and it seems to have done the trick.

What version is this?  The line numbers don't seem to match up with my source tree.

> Basically it looks like an instance of AioRead is being accessed after being free'd. I need some hints on what api behaviour by the tapdisk driver could be causing this to happen in librbd...

It looks like refcounting for the AioCompletion is off.  My first guess would be premature (or extra) calls to rados_aio_release or AioCompletion::release().

I did a quick look at the code and it looks like aio_read() is carrying a ref for the AioCompletion for the entire duration of the function, so it should not be disappearing (and taking the AioRead request struct with it) until well after where the invalid read is.  Maybe there is an error path somewhere that is dropping a ref it shouldn't?

sage

> thanks
>
> James
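To illustrate the lifetime rule Sage describes, here is a minimal sketch of completion handling with the public rbd C API. This is not the actual block-rbd.c code; issue_read, on_complete and td_request_state are invented names and error handling is reduced to the bare minimum. The point is that the completion is released exactly once, from the callback, never right after rbd_aio_read() returns; releasing it early (or twice) can drop the last reference while librbd is still working with the request, which is the kind of premature release being discussed.

#include <rbd/librbd.h>
#include <cstdint>
#include <cstddef>
#include <cstdio>

/* Hypothetical per-request state for an async driver (a stand-in for
 * tapdisk's own request struct, not the real block-rbd.c code). */
struct td_request_state {
    char *buf;
    uint64_t off;
    size_t len;
};

/* Completion callback: invoked from librados's finisher thread once the
 * read has finished. */
static void on_complete(rbd_completion_t c, void *arg)
{
    td_request_state *req = static_cast<td_request_state *>(arg);
    ssize_t r = rbd_aio_get_return_value(c);
    std::printf("read at %llu finished: %zd\n", (unsigned long long)req->off, r);
    rbd_aio_release(c);   /* release exactly once, only after librbd is done */
    delete req;
}

/* Issue path: create a completion, hand it to rbd_aio_read(), then leave it
 * alone.  Calling rbd_aio_release() here, right after rbd_aio_read()
 * returns successfully, would be the premature release Sage is describing. */
static int issue_read(rbd_image_t image, td_request_state *req)
{
    rbd_completion_t c;
    int r = rbd_aio_create_completion(req, on_complete, &c);
    if (r < 0)
        return r;

    r = rbd_aio_read(image, req->off, req->len, req->buf, c);
    if (r < 0) {
        rbd_aio_release(c);   /* submission failed, no callback will fire */
        delete req;
        return r;
    }
    return 0;
}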
RE: debugging librbd async - valgrind memtest hit
> On Fri, 30 Aug 2013, James Harper wrote:
> > I finally got a valgrind memtest hit... output attached below email. I recompiled all of tapdisk and ceph without any -O options (thought I had already...) and it seems to have done the trick.
>
> What version is this?  The line numbers don't seem to match up with my source tree.

0.67.2, but I've peppered it with debug prints.

> > Basically it looks like an instance of AioRead is being accessed after being free'd. I need some hints on what api behaviour by the tapdisk driver could be causing this to happen in librbd...
>
> It looks like refcounting for the AioCompletion is off.  My first guess would be premature (or extra) calls to rados_aio_release or AioCompletion::release().
>
> I did a quick look at the code and it looks like aio_read() is carrying a ref for the AioCompletion for the entire duration of the function, so it should not be disappearing (and taking the AioRead request struct with it) until well after where the invalid read is.  Maybe there is an error path somewhere that is dropping a ref it shouldn't?

I'll see if I can find a way to track that. It's the c->get() and c->put() that track this, right?

The crash seems a little bit different every time, so it could still be something stomping on memory, eg overwriting the ref count or something.

Thanks

James
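For readers following along, here is a minimal sketch of the kind of intrusive get()/put() refcounting being discussed. It is a simplified illustration, not the actual librbd AioCompletion code: the object is destroyed by whichever put() drops the count to zero, so one premature or extra put() is enough to leave every other code path holding a dangling pointer.

#include <atomic>

/* Simplified stand-in for a refcounted completion; librbd's AioCompletion
 * is more involved, but the lifetime rule is the same. */
struct Completion {
    std::atomic<int> ref{1};      /* the creator starts with one reference */
    int result = 0;

    void get() { ref.fetch_add(1); }
    void put() {
        if (ref.fetch_sub(1) == 1)
            delete this;          /* last reference gone: object destroyed */
    }
};

int main()
{
    Completion *c = new Completion;

    c->get();   /* e.g. taken by aio_read() for the duration of the call  */
    c->put();   /* dropped when that code path no longer needs the object */

    c->put();   /* the caller's release: count hits zero, *c is deleted   */

    /* Any use of *c from here on is exactly the use-after-free valgrind
     * reported; an extra or premature put() earlier has the same effect. */
    return 0;
}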
RE: debugging librbd async
On Wed, 28 Aug 2013, James Harper wrote:
> Still having crashes with the rbd module for blktap.
>
> I think I can't get consistent debugging info out of librbd. When it writes to a file the logging is buffered so the tail is always missing. When it logs to syslog I thought I was getting everything but now I'm not so sure.

You can disable buffering with

 log max recent = 0

That should catch everything!

> What is the best way to ensure that each log message gets flushed to the logfile? I'm looking now but a hint would get the job done faster!

sage

> Thanks
>
> James
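Putting the pieces from this thread together, the client-side logging setup would look something like the ceph.conf fragment below. The debug levels and log file path are the ones suggested later in the thread, and 'log max recent = 0' is the unbuffered-logging suggestion above; adjust paths and levels to taste.

 [client]
  debug rbd = 20
  debug rados = 20
  debug ms = 1
  log file = /var/log/ceph/client.$name.$pid.log
  log max recent = 0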
RE: debugging librbd async
I have set the logfile to be opened with SYNC and that seems to be giving me more consistent output.

I see the crash is mostly happening around queue_aio_write. Most of the time the last thing I see is this entry:

 librados: queue_aio_write 0x7f0928004390 completion 0x1ea65d0 write_seq 147

I've never seen it happen in a read operation, and always where there are lots of writes queued. From adding further debug statements, I can see that the exact point of the crash is very soon after that, but not at a constant point.

I'm thinking that the crash is actually happening somewhere in the callback chain, although definitely before the librbd callbacks are invoked, as debug prints in the rados end of the callbacks show nothing. Where is the 'top' of the callback chain on a write? I can see that librados calls librbd (which then calls the callback in tapdisk), but what calls librados?

Thanks

James
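Going by the stack traces in the valgrind output earlier in this thread, the 'top' of the chain on the client side is librados's Finisher thread: when the OSD reply arrives, a C_AioComplete context is queued on the Finisher, and the Finisher thread calls its complete(), which runs librbd's rados_req_cb and eventually the callback registered by tapdisk. Below is a heavily simplified sketch of that dispatch pattern; it mirrors the shape of the trace only and is not the actual Ceph code.

#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>
#include <thread>

/* Simplified stand-in for Ceph's Context: a one-shot completion callback. */
struct Context {
    virtual ~Context() {}
    virtual void finish(int r) = 0;
    void complete(int r) { finish(r); delete this; }  /* runs, then frees itself */
};

/* Simplified stand-in for librados's Finisher: one thread draining a queue
 * of Contexts (Finisher::finisher_thread_entry() in the trace). */
class Finisher {
    std::mutex m;
    std::condition_variable cond;
    std::deque<Context*> q;
    bool stop = false;
    std::thread thr;

    void run() {
        std::unique_lock<std::mutex> l(m);
        while (!stop || !q.empty()) {
            if (q.empty()) { cond.wait(l); continue; }
            Context *c = q.front();
            q.pop_front();
            l.unlock();
            /* -> C_AioComplete::finish -> rados_req_cb -> librbd completion
             *    -> the callback registered by tapdisk */
            c->complete(0);
            l.lock();
        }
    }

public:
    Finisher() : thr([this] { run(); }) {}
    ~Finisher() {
        { std::lock_guard<std::mutex> g(m); stop = true; }
        cond.notify_one();
        thr.join();
    }

    /* Called from the messenger/objecter side when an OSD reply arrives. */
    void queue(Context *c) {
        { std::lock_guard<std::mutex> g(m); q.push_back(c); }
        cond.notify_one();
    }
};

/* Toy context standing in for C_AioComplete. */
struct PrintDone : Context {
    void finish(int r) override { std::printf("aio complete, r=%d\n", r); }
};

int main()
{
    Finisher f;
    f.queue(new PrintDone);  /* analogous to queueing a completion on reply */
}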
RE: debugging librbd async
Still having crashes with the rbd module for blktap.

I think I can't get consistent debugging info out of librbd. When it writes to a file the logging is buffered so the tail is always missing. When it logs to syslog I thought I was getting everything but now I'm not so sure.

What is the best way to ensure that each log message gets flushed to the logfile? I'm looking now but a hint would get the job done faster!

Thanks

James
RE: debugging librbd async
I can now reliably reproduce this with fio (see config following email), but never under valgrind so far.

James

[global]
directory=/tmp/fio
size=128M
ioengine=libaio

[randwrite1]
rw=randwrite
iodepth=32

[randread1]
rw=randread
iodepth=32

[randwrite2]
rw=randwrite
iodepth=32

[randread2]
rw=randread
iodepth=32
RE: debugging librbd async
> > > Of course, the old standby is to just crank up the logging detail and try to narrow down where the crash happens.  Have you tried that yet?
> >
> > I haven't touched the rbd code. Is increased logging a compile-time option or a config option?
>
> That is probably the first you should try then.  In the [client] section of ceph.conf on the node where tapdisk is running add something like
>
>  [client]
>   debug rbd = 20
>   debug rados = 20
>   debug ms = 1
>   log file = /var/log/ceph/client.$name.$pid.log
>
> and make sure the log directory is writeable.

Excellent. How noisy are those levels likely to be? Is it the consumer of librbd that reads those values? I mean, all I need to do is restart the tapdisk process and the logging should happen, right?

> > > There is a probable issue with aio_flush and caching enabled that Mike Dawson is trying to reproduce.  Are you running with caching on or off?
> >
> > I have not enabled caching, and I believe it's disabled by default.
>
> There is a fix for an aio hang that just hit the cuttlefish branch today that could conceivably be the issue.  It causes a hang on qemu but maybe tapdisk is more sensitive?  I'd make sure you're running with that in any case to rule it out.

I switched to dumpling in the last few days to see if the problem existed there. Is the fix you mention in dumpling? I'm not yet running mission critical production code on ceph, just a secondary windows domain controller, secondary spam filter, and a few other machines that don't affect production if they crash.

I'm also testing valgrind at the moment, just basic memtest, but suddenly everything is quite stable even though it's under reasonable load right now. Stupid heisenbugs.

Thanks

James
RE: debugging librbd async
On Fri, 16 Aug 2013, James Harper wrote:
> > > > Of course, the old standby is to just crank up the logging detail and try to narrow down where the crash happens.  Have you tried that yet?
> > >
> > > I haven't touched the rbd code. Is increased logging a compile-time option or a config option?
> >
> > That is probably the first you should try then.  In the [client] section of ceph.conf on the node where tapdisk is running add something like
> >
> >  [client]
> >   debug rbd = 20
> >   debug rados = 20
> >   debug ms = 1
> >   log file = /var/log/ceph/client.$name.$pid.log
> >
> > and make sure the log directory is writeable.
>
> Excellent. How noisy are those levels likely to be? Is it the consumer of librbd that reads those values? I mean, all I need to do is restart the tapdisk process and the logging should happen, right?

That should do it, yeah.

> > > > There is a probable issue with aio_flush and caching enabled that Mike Dawson is trying to reproduce.  Are you running with caching on or off?
> > >
> > > I have not enabled caching, and I believe it's disabled by default.
> >
> > There is a fix for an aio hang that just hit the cuttlefish branch today that could conceivably be the issue.  It causes a hang on qemu but maybe tapdisk is more sensitive?  I'd make sure you're running with that in any case to rule it out.
>
> I switched to dumpling in the last few days to see if the problem existed there. Is the fix you mention in dumpling? I'm not yet running mission critical production code on ceph, just a secondary windows domain controller, secondary spam filter, and a few other machines that don't affect production if they crash.

The fix is in the dumpling branch, but not in v0.67.  It will be in v0.67.1.

> I'm also testing valgrind at the moment, just basic memtest, but suddenly everything is quite stable even though it's under reasonable load right now. Stupid heisenbugs.

Valgrind makes things go very slow (~10x?), which can have a huge effect on timing.  Sometimes that reveals new races, other times it hides others. :/

sage
RE: debugging librbd async
> > I'm also testing valgrind at the moment, just basic memtest, but suddenly everything is quite stable even though it's under reasonable load right now. Stupid heisenbugs.
>
> Valgrind makes things go very slow (~10x?), which can have a huge effect on timing.  Sometimes that reveals new races, other times it hides others. :/

That's what I figured. It's run overnight with no hiccups so far under valgrind (helgrind - but no crashes under memtest either). The helgrind process on one tapdisk instance is running pretty hard, but performance is tolerable. I'll leave it going and poke at it occasionally to see if I can reproduce an actual crash.

So far helgrind has identified a bunch of possible races like:

==12478== Possible data race during write of size 4 at 0x913F250 by thread #1
==12478== Locks held: 2, at addresses 0x9119B80 0x913EF88
==12478==    at 0x5821D57: Pipe::Pipe(SimpleMessenger*, int, Connection*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x5838FC7: SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*, Message*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x5839414: SimpleMessenger::get_connection(entity_inst_t const&) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56D062F: Objecter::get_session(int) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56D0FAA: Objecter::recalc_op_target(Objecter::Op*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56D6140: Objecter::_op_submit(Objecter::Op*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56C4640: librados::IoCtxImpl::operate_read(object_t const&, ObjectOperation*, ceph::buffer::list*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56C53D1: librados::IoCtxImpl::stat(object_t const&, unsigned long*, long*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x569B83E: librados::IoCtx::stat(std::string const&, unsigned long*, long*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x63E45BF: librbd::detect_format(librados::IoCtx&, std::string const&, bool*, unsigned long*) (in /usr/lib/librbd.so.1.0.0)
==12478==    by 0x63DCF6E: librbd::ImageCtx::init() (in /usr/lib/librbd.so.1.0.0)
==12478==    by 0x63F45C7: librbd::open_image(librbd::ImageCtx*) (in /usr/lib/librbd.so.1.0.0)
==12478==
==12478== This conflicts with a previous read of size 4 by thread #9
==12478== Locks held: none
==12478==    at 0x581E6A1: Pipe::tcp_read_wait() (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x581E96F: Pipe::tcp_read(char*, int) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x581F965: Pipe::read_message(Message**) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x583138B: Pipe::reader() (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x5834D5C: Pipe::Reader::entry() (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x4C2B5AD: mythread_wrapper (hg_intercepts.c:219)
==12478==    by 0x6650B4F: start_thread (pthread_create.c:304)
==12478==    by 0x6940A7C: clone (clone.S:112)

But I don't think they are correct. There are also some races that appear to relate to memcpy. Also, I believe tapdisk serialises all callbacks and I don't know that helgrind is aware of that completely, as it identifies some cases that just can't happen.

The last thing logged is:

==12478== More than 1000 total errors detected.  I'm not reporting any more.
==12478== Final error counts will be inaccurate.  Go fix your program!
==12478== Rerun with --error-limit=no to disable this cutoff.  Note
==12478== that errors may occur in your program without prior warning from
==12478== Valgrind, because errors are no longer being displayed.

So I think maybe it's stopped being useful in that instance. I can post the full output if you want. It's only about 40kb.

James
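If tapdisk really does serialise all completion callbacks through its own event loop, one way to quiet helgrind's false positives is to tell it about that external ordering using the client-request annotations from <valgrind/helgrind.h>. The sketch below is only an illustration of the idea under that assumption; the struct and function names are invented and this is not tapdisk code, but ANNOTATE_HAPPENS_BEFORE/ANNOTATE_HAPPENS_AFTER are the standard helgrind annotations for a hand-off that helgrind cannot otherwise see.

#include <valgrind/helgrind.h>

/* Invented example request type, not from tapdisk. */
struct request {
    int done;
    int result;
};

/* Side that completes the request (e.g. a librbd callback thread):
 * publish the ordering before handing the request back. */
void complete_request(struct request *req, int result)
{
    req->result = result;
    req->done = 1;
    ANNOTATE_HAPPENS_BEFORE(req);
}

/* Side that later consumes the request (e.g. the single-threaded event
 * loop): acknowledge the ordering before touching the fields. */
void reap_request(struct request *req)
{
    ANNOTATE_HAPPENS_AFTER(req);
    /* req->done and req->result may now be read without helgrind
     * reporting a race for this particular hand-off. */
}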
Re: debugging librbd async
On Fri, 16 Aug 2013, James Harper wrote:
> I'm testing out the tapdisk rbd that Sylvain wrote under Xen, and have been having all sorts of problems as the tapdisk process is segfaulting. To make matters worse, any attempt to use gdb on the resulting core just tells me it can't find the threads ('generic error'). Google tells me that I can get around this error by linking the main exe (tapdisk) with libpthread, but that doesn't help.
>
> With strategic printf's I have confirmed that in most cases the crash happens after a call to rbd_aio_read or rbd_aio_write and before the callback is called. Given the async nature of tapdisk it's impossible to be sure but I'm confident that the crash is not happening in any of the tapdisk code. It's possible that there is an off-by-one error in a buffer somewhere with the corruption showing up later, but there really isn't a lot of code there and I've been over it very closely and it appears quite sound. I have also tested for multiple complete's for the same request, and corrupt pointers being passed into the completion routine, and nothing shows up there either.
>
> In most cases there is nothing pre-empting the crash, aside from a tendency to seemingly crash more often when the cluster is disturbed (eg a mon node is rebooted). I have one VM which will be unbootable for long periods of time with the crash happening during boot, typically when postgres starts. This can be reproduced for hours and is useful for debugging, but then suddenly the problem goes away spontaneously and I can no longer reproduce it even after hundreds of reboots.
>
> I'm using Debian and the problem exists with both the latest cuttlefish and dumpling deb's.
>
> So... does librbd have any internal self-checking options I can enable? If I'm going to start injecting printf's around the place, can anyone suggest what code paths are most likely to be causing the above?

This is usually about the time when we try running things under valgrind.  Is that an option with tapdisk?

Of course, the old standby is to just crank up the logging detail and try to narrow down where the crash happens.  Have you tried that yet?

There is a probable issue with aio_flush and caching enabled that Mike Dawson is trying to reproduce.  Are you running with caching on or off?

sage
RE: debugging librbd async
> On Fri, 16 Aug 2013, James Harper wrote:
> > I'm testing out the tapdisk rbd that Sylvain wrote under Xen, and have been having all sorts of problems as the tapdisk process is segfaulting. [...]
> >
> > So... does librbd have any internal self-checking options I can enable? If I'm going to start injecting printf's around the place, can anyone suggest what code paths are most likely to be causing the above?
>
> This is usually about the time when we try running things under valgrind.  Is that an option with tapdisk?

Never used it before. I guess I can find out pretty easy, I'll try that next.

> Of course, the old standby is to just crank up the logging detail and try to narrow down where the crash happens.  Have you tried that yet?

I haven't touched the rbd code. Is increased logging a compile-time option or a config option?

> There is a probable issue with aio_flush and caching enabled that Mike Dawson is trying to reproduce.  Are you running with caching on or off?

I have not enabled caching, and I believe it's disabled by default.

Thanks

James
RE: debugging librbd async
On Fri, 16 Aug 2013, James Harper wrote:
> > On Fri, 16 Aug 2013, James Harper wrote:
> > > I'm testing out the tapdisk rbd that Sylvain wrote under Xen, and have been having all sorts of problems as the tapdisk process is segfaulting. [...]
> > >
> > > So... does librbd have any internal self-checking options I can enable? If I'm going to start injecting printf's around the place, can anyone suggest what code paths are most likely to be causing the above?
> >
> > This is usually about the time when we try running things under valgrind.  Is that an option with tapdisk?
>
> Never used it before. I guess I can find out pretty easy, I'll try that next.
>
> > Of course, the old standby is to just crank up the logging detail and try to narrow down where the crash happens.  Have you tried that yet?
>
> I haven't touched the rbd code. Is increased logging a compile-time option or a config option?

That is probably the first you should try then.  In the [client] section of ceph.conf on the node where tapdisk is running add something like

 [client]
  debug rbd = 20
  debug rados = 20
  debug ms = 1
  log file = /var/log/ceph/client.$name.$pid.log

and make sure the log directory is writeable.

> > There is a probable issue with aio_flush and caching enabled that Mike Dawson is trying to reproduce.  Are you running with caching on or off?
>
> I have not enabled caching, and I believe it's disabled by default.

There is a fix for an aio hang that just hit the cuttlefish branch today that could conceivably be the issue.  It causes a hang on qemu but maybe tapdisk is more sensitive?  I'd make sure you're running with that in any case to rule it out.

sage