RE: debugging librbd async - valgrind memtest hit

2013-08-30 Thread James Harper
I finally got a valgrind memtest hit... output attached below email. I 
recompiled all of tapdisk and ceph without any -O options (thought I had 
already...) and it seems to have done the trick

Basically it looks like an instance of AioRead is being accessed after being 
free'd. I need some hints on what API behaviour from the tapdisk driver could be 
causing this to happen in librbd...

thanks

James

==25078== Memcheck, a memory error detector
==25078== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==25078== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==25078== Command: /usr/bin/tapdisk.clean
==25078== Parent PID: 25077
==25078==
==25078==
==25078== HEAP SUMMARY:
==25078== in use at exit: 6,808 bytes in 7 blocks
==25078==   total heap usage: 7 allocs, 0 frees, 6,808 bytes allocated
==25078==
==25078== For a detailed leak analysis, rerun with: --leak-check=full
==25078==
==25078== For counts of detected and suppressed errors, rerun with: -v
==25078== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
==25081== Warning: noted but unhandled ioctl 0xd0 with no size/direction hints
==25081==    This could cause spurious value errors to appear.
==25081==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25081== Syscall param ioctl(FIBMAP) points to unaddressable byte(s)
==25081==    at 0x75F1AC7: ioctl (syscall-template.S:82)
==25081==    by 0x4088DF: tapdisk_blktap_complete_request (tapdisk-blktap.c:150)
==25081==    by 0x40802C: tapdisk_vbd_kick (tapdisk-vbd.c:1441)
==25081==    by 0x40E684: tapdisk_server_iterate (tapdisk-server.c:211)
==25081==    by 0x40E864: tapdisk_server_run (tapdisk-server.c:334)
==25081==    by 0x4039BF: main (tapdisk2.c:150)
==25081==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==25081==
==25081== Invalid read of size 8
==25081==    at 0x7044DB6: librbd::AioRead::send() (AioRequest.cc:106)
==25081==    by 0x7076EDE: librbd::aio_read(librbd::ImageCtx*, std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3096)
==25081==    by 0x7076330: librbd::aio_read(librbd::ImageCtx*, unsigned long, unsigned long, char*, ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3032)
==25081==    by 0x703EF75: rbd_aio_read (librbd.cc:1117)
==25081==    by 0x41FDA4: tdrbd_submit_request (block-rbd.c:540)
==25081==    by 0x42004A: tdrbd_queue_request (block-rbd.c:659)
==25081==    by 0x40602A: tapdisk_vbd_issue_request (tapdisk-vbd.c:1244)
==25081==    by 0x4062FA: tapdisk_vbd_issue_new_requests (tapdisk-vbd.c:1340)
==25081==    by 0x407C27: tapdisk_vbd_issue_requests (tapdisk-vbd.c:1403)
==25081==    by 0x407DBA: tapdisk_vbd_check_state (tapdisk-vbd.c:891)
==25081==    by 0x40E62C: tapdisk_server_iterate (tapdisk-server.c:220)
==25081==    by 0x40E864: tapdisk_server_run (tapdisk-server.c:334)
==25081==  Address 0xfe79b38 is 8 bytes inside a block of size 248 free'd
==25081==    at 0x4C279DC: operator delete(void*) (vg_replace_malloc.c:457)
==25081==    by 0x7046859: librbd::AioRead::~AioRead() (AioRequest.h:74)
==25081==    by 0x70426E6: librbd::AioRequest::complete(int) (AioRequest.h:41)
==25081==    by 0x7074323: librbd::rados_req_cb(void*, void*) (internal.cc:2751)
==25081==    by 0x5FD191A: librados::C_AioComplete::finish(int) (AioCompletionImpl.h:181)
==25081==    by 0x5F907E0: Context::complete(int) (Context.h:42)
==25081==    by 0x6066CEF: Finisher::finisher_thread_entry() (Finisher.cc:56)
==25081==    by 0x5FB81D3: Finisher::FinisherThread::entry() (Finisher.h:46)
==25081==    by 0x62C89E0: Thread::_entry_func(void*) (Thread.cc:41)
==25081==    by 0x7308B4F: start_thread (pthread_create.c:304)
==25081==    by 0x75F8A7C: clone (clone.S:112)
==25081==
==25081== Invalid read of size 8
==25081==    at 0x7044DBA: librbd::AioRead::send() (AioRequest.cc:106)
==25081==    by 0x7076EDE: librbd::aio_read(librbd::ImageCtx*, std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3096)
==25081==    by 0x7076330: librbd::aio_read(librbd::ImageCtx*, unsigned long, unsigned long, char*, ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3032)
==25081==    by 0x703EF75: rbd_aio_read (librbd.cc:1117)
==25081==    by 0x41FDA4: tdrbd_submit_request (block-rbd.c:540)
==25081==    by 0x42004A: tdrbd_queue_request (block-rbd.c:659)
==25081==    by 0x40602A: tapdisk_vbd_issue_request (tapdisk-vbd.c:1244)
==25081==    by 0x4062FA: tapdisk_vbd_issue_new_requests (tapdisk-vbd.c:1340)
==25081==    by 0x407C27: tapdisk_vbd_issue_requests (tapdisk-vbd.c:1403)
==25081==    by 0x407DBA: tapdisk_vbd_check_state (tapdisk-vbd.c:891)
==25081==    by 0x40E62C: tapdisk_server_iterate (tapdisk-server.c:220)
==25081==    by 0x40E864: tapdisk_server_run 

RE: debugging librbd async - valgrind memtest hit

2013-08-30 Thread Sage Weil
On Fri, 30 Aug 2013, James Harper wrote:
 I finally got a valgrind memtest hit... output attached below email. I 
 recompiled all of tapdisk and ceph without any -O options (thought I had 
 already...) and it seems to have done the trick

What version is this?  The line numbers don't seem to match up with my 
source tree.
 
 Basically it looks like an instance of AioRead is being accessed after 
 being free'd. I need some hints on what API behaviour from the tapdisk 
 driver could be causing this to happen in librbd...

It looks like refcounting for the AioCompletion is off.  My first guess 
would be premature (or extra) calls to rados_aio_release or 
AioCompletion::release().

I took a quick look at the code and it looks like aio_read() is carrying a 
ref on the AioCompletion for the entire duration of the function, so it 
should not be disappearing (and taking the AioRead request struct with it) 
until well after where the invalid read is.  Maybe there is an error path 
somewhere that is dropping a ref it shouldn't?
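
Roughly, the lifecycle I'd expect through the librbd C API (a minimal 
sketch, not the actual block-rbd.c code):

#include <stdint.h>
#include <sys/types.h>
#include <rbd/librbd.h>

/* Sketch only: each completion gets exactly one rbd_aio_release(), and
 * never before the I/O is done with it. An extra or premature release
 * drops the ref that keeps the request alive while it is in flight,
 * which would produce exactly this kind of use-after-free. */
static void read_done(rbd_completion_t c, void *arg)
{
    ssize_t ret = rbd_aio_get_return_value(c);
    if (ret < 0) {
        /* report the I/O error to the caller here */
    }
    rbd_aio_release(c);        /* the only release on this path */
}

static int submit_read(rbd_image_t image, uint64_t off, size_t len, char *buf)
{
    rbd_completion_t c;
    int r = rbd_aio_create_completion(buf, read_done, &c);
    if (r < 0)
        return r;
    r = rbd_aio_read(image, off, len, buf, c);
    if (r < 0)
        rbd_aio_release(c);    /* submit failed: callback will never fire */
    return r;
}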

sage


 
 [rest of quoted message and valgrind output snipped]

RE: debugging librbd async - valgrind memtest hit

2013-08-30 Thread James Harper
 
 On Fri, 30 Aug 2013, James Harper wrote:
  I finally got a valgrind memtest hit... output attached below email. I
  recompiled all of tapdisk and ceph without any -O options (thought I had
  already...) and it seems to have done the trick
 
 What version is this?  The line numbers don't seem to match up with my
 source tree.

0.67.2, but I've peppered it with debug prints

  Basically it looks like an instance of AioRead is being accessed after
  being free'd. I need some hints on what API behaviour from the tapdisk
  driver could be causing this to happen in librbd...
 
 It looks like refcounting for the AioCompletion is off.  My first guess
 would be premature (or extra) calls to rados_aio_release or
 AioCompletion::release().
 
 I took a quick look at the code and it looks like aio_read() is carrying a
 ref on the AioCompletion for the entire duration of the function, so it
 should not be disappearing (and taking the AioRead request struct with it)
 until well after where the invalid read is.  Maybe there is an error path
 somewhere that is dropping a ref it shouldn't?
 

I'll see if I can find a way to track that. It's the c->get() and c->put() 
calls that track this, right?
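
My mental model of it is something like this (a toy sketch, not the actual 
ceph code):

#include <stdlib.h>

/* Toy get()/put() refcounting: put() frees the object on the last ref,
 * so one put() too many, or any use after the last put(), gives exactly
 * the use-after-free valgrind reported. The count is guarded by a lock
 * in real code. */
struct completion {
    int ref;
};

static void get(struct completion *c) { c->ref++; }

static void put(struct completion *c)
{
    if (--c->ref == 0)
        free(c);               /* last ref: object destroyed here */
}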
 
The crash seems a little bit different every time, so it could still be 
something stomping on memory, e.g. overwriting the ref count or something.

Thanks

James



RE: debugging librbd async

2013-08-28 Thread Sage Weil
On Wed, 28 Aug 2013, James Harper wrote:
 Still having crashes with the rbd module for blktap. I think I can't get 
 consistent debugging info out of librbd. When it writes to a file the 
 logging is buffered so the tail is always missing. When it logs to 
 syslog I thought I was getting everything but now I'm not so sure.

You can disable buffering with

 log max recent = 0

That should catch everything!
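
Folded into the [client] section suggested earlier in the thread, that would 
look something like:

 [client]
  debug rbd = 20
  debug rados = 20
  debug ms = 1
  log file = /var/log/ceph/client.$name.$pid.log
  log max recent = 0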

 What is the best way to ensure that each log message gets flushed to the 
 logfile? I'm looking now but a hint would get the job done faster!

sage


 


RE: debugging librbd async

2013-08-28 Thread James Harper
I have set the logfile to be opened with O_SYNC, and that seems to be giving me 
more consistent output.
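
The change amounts to something like this (a sketch of the idea only, not the 
actual patch to ceph's log code):

#include <fcntl.h>

/* O_SYNC makes each write(2) reach stable storage before returning, so
 * the tail of the log survives a segfault. Illustrative only. */
int open_log_sync(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
}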

I see the crash is mostly happening around queue_aio_write. Most of the time 
the last thing I see is this entry: 'librados: queue_aio_write 0x7f0928004390 
completion 0x1ea65d0 write_seq 147'. I've never seen it happen in a read 
operation, and always when there are lots of writes queued.

From adding further debug statements, I can see that the exact point of the 
crash is very soon after that but not at a constant point.

I'm thinking that the crash is actually happening somewhere in the callback 
chain, although definitely before the librbd callbacks are invoked, as debug 
prints at the rados end of the callbacks show nothing.

Where is the 'top' of the callback chain on a write? I can see that librados 
calls librbd (which then calls the callback in tapdisk), but what calls 
librados?

Thanks

James



RE: debugging librbd async

2013-08-27 Thread James Harper
Still having crashes with the rbd module for blktap. I think I can't get 
consistent debugging info out of librbd. When it writes to a file the logging 
is buffered so the tail is always missing. When it logs to syslog I thought I 
was getting everything but now I'm not so sure.

What is the best way to ensure that each log message gets flushed to the 
logfile? I'm looking now but a hint would get the job done faster!

Thanks

James


RE: debugging librbd async

2013-08-17 Thread James Harper
I can now reliably reproduce this with fio (see config following email), but 
never under valgrind so far.

James

[global]
directory=/tmp/fio
size=128M
ioengine=libaio

[randwrite1]
rw=randwrite
iodepth=32

[randread1]
rw=randread
iodepth=32

[randwrite2]
rw=randwrite
iodepth=32

[randread2]
rw=randread
iodepth=32


RE: debugging librbd async

2013-08-16 Thread James Harper
   Of course, the old standby is to just crank up the logging detail and try
   to narrow down where the crash happens.  Have you tried that yet?
 
  I haven't touched the rbd code. Is increased logging a compile-time
  option or a config option?
 
 That is probably the first thing you should try, then.  In the [client] section
 of ceph.conf on the node where tapdisk is running, add something like
 
  [client]
   debug rbd = 20
   debug rados = 20
   debug ms = 1
   log file = /var/log/ceph/client.$name.$pid.log
 
 and make sure the log directory is writeable.
 

Excellent. How noisy are those levels likely to be?

Is it the consumer of librbd that reads those values? I mean, all I need to do 
is restart the tapdisk process and the logging should happen, right?

   There is a probable issue with aio_flush and caching enabled that Mike
   Dawson is trying to reproduce.  Are you running with caching on or off?
 
  I have not enabled caching, and I believe it's disabled by default.
 
 There is a fix for an aio hang that just hit the cuttlefish branch today
 that could conceivably be the issue.  It causes a hang on qemu but maybe
 tapdisk is more sensitive?  I'd make sure you're running with that in any
 case to rule it out.
 

I switched to dumpling in the last few days to see if the problem existed 
there. Is the fix you mention in dumpling? I'm not yet running mission-critical 
production code on ceph, just a secondary Windows domain controller, a secondary 
spam filter, and a few other machines that don't affect production if they 
crash.

I'm also testing valgrind at the moment, just basic memtest, but suddenly 
everything is quite stable even though it's under reasonable load right now. 
Stupid heisenbugs.

Thanks

James





RE: debugging librbd async

2013-08-16 Thread Sage Weil
On Fri, 16 Aug 2013, James Harper wrote:
Of course, the old standby is to just crank up the logging detail and 
try
to narrow down where the crash happens.  Have you tried that yet?
  
   I haven't touched the rbd code. Is increased logging a compile-time
   option or a config option?
  
  That is probably the first thing you should try, then.  In the [client] section
  of ceph.conf on the node where tapdisk is running, add something like
  
   [client]
debug rbd = 20
debug rados = 20
debug ms = 1
log file = /var/log/ceph/client.$name.$pid.log
  
  and make sure the log directory is writeable.
  
 
 Excellent. How noisy are those levels likely to be?
 
 Is it the consumer of librbd that reads those values? I mean, all I need 
 to do is restart the tapdisk process and the logging should happen, right?

That should do it, yeah.

There is a probable issue with aio_flush and caching enabled that Mike
Dawson is trying to reproduce.  Are you running with caching on or off?
  
   I have not enabled caching, and I believe it's disabled by default.
  
  There is a fix for an aio hang that just hit the cuttlefish branch today
  that could conceivably be the issue.  It causes a hang on qemu but maybe
  tapdisk is more sensitive?  I'd make sure you're running with that in any
  case to rule it out.
  
 
 I switched to dumpling in the last few days to see if the problem existed 
 there. Is the fix you mention in dumpling? I'm not yet running mission-critical 
 production code on ceph, just a secondary Windows domain controller, a secondary 
 spam filter, and a few other machines that don't affect production if they 
 crash.

The fix is in the dumpling branch, but not in v0.67.  It will be in 
v0.67.1.
 
 I'm also testing valgrind at the moment, just basic memtest, but suddenly 
 everything is quite stable even though it's under reasonable load right now. 
 Stupid heisenbugs.

Valgrind makes things go very slow (~10x?), which can have a huge effect 
on timing. Sometimes that reveals new races, other times it hides others.  
:/

sage


RE: debugging librbd async

2013-08-16 Thread James Harper
  I'm also testing valgrind at the moment, just basic memtest, but suddenly
  everything is quite stable even though it's under reasonable load right now.
  Stupid heisenbugs.
 
 Valgrind makes things go very slow (~10x?), which can have a huge effect
 on timing. Sometimes that reveals new races, other times it hides others.
 :/
 

That's what I figured. It's run overnight with no hiccups so far under valgrind 
(helgrind, but no crashes under memtest either). The helgrind process on one 
tapdisk instance is running pretty hard, but performance is tolerable. I'll 
leave it running and poke at it occasionally to see if I can reproduce an actual 
crash.

So far helgrind has identified a bunch of possible races like:

==12478== Possible data race during write of size 4 at 0x913F250 by thread #1
==12478== Locks held: 2, at addresses 0x9119B80 0x913EF88
==12478==    at 0x5821D57: Pipe::Pipe(SimpleMessenger*, int, Connection*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x5838FC7: SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*, Message*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x5839414: SimpleMessenger::get_connection(entity_inst_t const&) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56D062F: Objecter::get_session(int) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56D0FAA: Objecter::recalc_op_target(Objecter::Op*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56D6140: Objecter::_op_submit(Objecter::Op*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56C4640: librados::IoCtxImpl::operate_read(object_t const&, ObjectOperation*, ceph::buffer::list*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56C53D1: librados::IoCtxImpl::stat(object_t const&, unsigned long*, long*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x569B83E: librados::IoCtx::stat(std::string const&, unsigned long*, long*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x63E45BF: librbd::detect_format(librados::IoCtx&, std::string const&, bool*, unsigned long*) (in /usr/lib/librbd.so.1.0.0)
==12478==    by 0x63DCF6E: librbd::ImageCtx::init() (in /usr/lib/librbd.so.1.0.0)
==12478==    by 0x63F45C7: librbd::open_image(librbd::ImageCtx*) (in /usr/lib/librbd.so.1.0.0)
==12478==
==12478== This conflicts with a previous read of size 4 by thread #9
==12478== Locks held: none
==12478==    at 0x581E6A1: Pipe::tcp_read_wait() (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x581E96F: Pipe::tcp_read(char*, int) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x581F965: Pipe::read_message(Message**) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x583138B: Pipe::reader() (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x5834D5C: Pipe::Reader::entry() (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x4C2B5AD: mythread_wrapper (hg_intercepts.c:219)
==12478==    by 0x6650B4F: start_thread (pthread_create.c:304)
==12478==    by 0x6940A7C: clone (clone.S:112)

But I don't think they are correct. There are also some races that appear to 
relate to memcpy.

Also, I believe tapdisk serialises all callbacks, and I don't think helgrind is 
fully aware of that, as it flags some cases that just can't happen.
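
If that serialisation is real, the ordering could in principle be declared to 
helgrind with the annotations from valgrind/helgrind.h; a sketch with 
hypothetical names, not actual tapdisk code:

#include <valgrind/helgrind.h>

/* The two annotations declare the happens-before edge that the event
 * loop provides, so helgrind stops reporting races the serialisation
 * precludes. The struct/function names here are made up. */
struct td_request { int done; };

void complete_request(struct td_request *req)   /* librbd callback side */
{
    req->done = 1;
    ANNOTATE_HAPPENS_BEFORE(req);
}

void reap_request(struct td_request *req)       /* event-loop side */
{
    ANNOTATE_HAPPENS_AFTER(req);
    /* reading req->done here no longer draws a (false) race report */
}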

The last thing logged is:

==12478== More than 1000 total errors detected.  I'm not reporting any more.
==12478== Final error counts will be inaccurate.  Go fix your program!
==12478== Rerun with --error-limit=no to disable this cutoff.  Note
==12478== that errors may occur in your program without prior warning from
==12478== Valgrind, because errors are no longer being displayed.

So I think maybe it's stopped being useful in that instance. I can post the 
full output if you want; it's only about 40kB.

James


Re: debugging librbd async

2013-08-15 Thread Sage Weil
On Fri, 16 Aug 2013, James Harper wrote:
 I'm testing out the tapdisk rbd that Sylvain wrote under Xen, and have been 
 having all sorts of problems as the tapdisk process is segfaulting. To make 
 matters worse, any attempt to use gdb on the resulting core just tells me it 
 can't find the threads ('generic error'). Google tells me that I can get 
 around this error by linking the main exe (tapdisk) with libpthread, but that 
 doesn't help.
 
 With strategic printf's I have confirmed that in most cases the crash happens 
 after a call to rbd_aio_read or rbd_aio_write and before the callback is 
 called. Given the async nature of tapdisk it's impossible to be sure but I'm 
 confident that the crash is not happening in any of the tapdisk code. It's 
 possible that there is an off-by-one error in a buffer somewhere with the 
 corruption showing up later but there really isn't a lot of code there and 
 I've been over it very closely and it appears quite sound.
 
 I have also tested for multiple completions of the same request, and corrupt 
 pointers being passed into the completion routine, and nothing shows up there 
 either.
 
 In most cases there is nothing preceding the crash, aside from a seeming 
 tendency to crash more often when the cluster is disturbed (e.g. a mon node is 
 rebooted). I have one VM which will be unbootable for long periods of time 
 with the crash happening during boot, typically when postgres starts. This 
 can be reproduced for hours and is useful for debugging, but then suddenly 
 the problem goes away spontaneously and I can no longer reproduce it even 
 after hundreds of reboots.
 
 I'm using Debian and the problem exists with both the latest cuttlefish and 
 dumpling debs.
 
 So... does librbd have any internal self-checking options I can enable? If 
 I'm going to start injecting printf's around the place, can anyone suggest 
 what code paths are most likely to be causing the above?

This is usually about the time when we try running things under 
valgrind.  Is that an option with tapdisk?

Of course, the old standby is to just crank up the logging detail and try 
to narrow down where the crash happens.  Have you tried that yet?

There is a probable issue with aio_flush and caching enabled that Mike 
Dawson is trying to reproduce.  Are you running with caching on or off?

sage


RE: debugging librbd async

2013-08-15 Thread James Harper
 
 On Fri, 16 Aug 2013, James Harper wrote:
  [original bug description snipped; quoted in full earlier in the thread]
 
 This is usually about the time when we try running things under
 valgrind.  Is that an option with tapdisk?

Never used it before. I guess I can find out pretty easily; I'll try that next.

 Of course, the old standby is to just crank up the logging detail and try
 to narrow down where the crash happens.  Have you tried that yet?

I haven't touched the rbd code. Is increased logging a compile-time option or a 
config option?

 
 There is a probable issue with aio_flush and caching enabled that Mike
 Dawson is trying to reproduce.  Are you running with caching on or off?

I have not enabled caching, and I believe it's disabled by default.

Thanks

James


RE: debugging librbd async

2013-08-15 Thread Sage Weil
On Fri, 16 Aug 2013, James Harper wrote:
  
  On Fri, 16 Aug 2013, James Harper wrote:
   [original bug description snipped; quoted in full earlier in the thread]
  
  This is usually about the time when we try running things under
  valgrind.  Is that an option with tapdisk?
 
 Never used it before. I guess I can find out pretty easily; I'll try that next.
 
  Of course, the old standby is to just crank up the logging detail and try
  to narrow down where the crash happens.  Have you tried that yet?
 
 I haven't touched the rbd code. Is increased logging a compile-time 
 option or a config option?

That is probably the first thing you should try, then.  In the [client] section 
of ceph.conf on the node where tapdisk is running, add something like

 [client]
  debug rbd = 20
  debug rados = 20
  debug ms = 1
  log file = /var/log/ceph/client.$name.$pid.log

and make sure the log directory is writeable.

  There is a probable issue with aio_flush and caching enabled that Mike
  Dawson is trying to reproduce.  Are you running with caching on or off?
 
 I have not enabled caching, and I believe it's disabled by default.

There is a fix for an aio hang that just hit the cuttlefish branch today 
that could conceivably be the issue.  It causes a hang on qemu but maybe 
tapdisk is more sensitive?  I'd make sure you're running with that in any 
case to rule it out.

sage