[ceph-users] Deadlock in ceph journal

2014-08-19 Thread Somnath Roy
Hi Sage/Sam,
During our testing we found a potential deadlock scenario in the filestore 
journal code base. This is happening because of two reason.


1.   This is because code is not signaling aio_cond from 
check_aio_completion() in case seq = 0

2.   Following changes in the write_thread_entry() is allowing a very first 
header write with seq = 0.

   if (writeq.empty()  !must_write_header) {



Now, during ceph-deploy activate this is what happening.


1. The very first write of header with seq = 0 issued and it is waiting for aio 
completion. So, aio_num = 1.

2. superblock write came in and got into while (aio_num  0) block of 
write_thread_entry() and waiting on the aio_cond

3. The seq = 0 aio completed but not setting completed_something = true and as 
a result aio_cond is not signaled.

4. write_thread_entry() is getting into deadlock.

This is a timing problem and if header write is returned before superblock 
write this will not happen and will be happening in case of block journal 
device only (aio is enabled).

Here is the log snippet we are getting.

2014-08-19 12:59:10.029363 7f60fa33b700 10 journal write_thread_entry start
2014-08-19 12:59:10.029395 7f60fa33b700 20 journal prepare_multi_write 
queue_pos now 4096
2014-08-19 12:59:10.029427 7f60fa33b700 15 journal do_aio_write writing 4096~0 
+ header
2014-08-19 12:59:10.029439 7f60fa33b700 20 journal write_aio_bl 0~4096 seq 0
2014-08-19 12:59:10.029442 7f60f9339700 10 journal write_finish_thread_entry 
enter
2014-08-19 12:59:10.029466 7f60fa33b700 20 journal write_aio_bl .. 0~4096 in 1
2014-08-19 12:59:10.029498 7f60fa33b700 20 journal write_aio_bl 4096~0 seq 0
2014-08-19 12:59:10.029505 7f60fa33b700  5 journal put_throttle finished 0 ops 
and 0 bytes, now 0 ops and 0 bytes
2014-08-19 12:59:10.029510 7f60fa33b700 20 journal write_thread_entry going to 
sleep
2014-08-19 12:59:10.029538 7f60ff178800 10 journal journal_start
2014-08-19 12:59:10.029566 7f60f9339700 20 journal write_finish_thread_entry 
waiting for aio(s)
2014-08-19 12:59:10.029726 7f60ff178800 15 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) read meta/23c2fcde/osd_superblock/0//-1 
0~0
2014-08-19 12:59:10.029793 7f60ff178800 -1 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) could not find 
23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2014-08-19 12:59:10.029815 7f60ff178800 10 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) 
FileStore::read(meta/23c2fcde/osd_superblock/0//-1) open error: (2) No such 
file or directory
2014-08-19 12:59:10.029892 7f60ff178800  5 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions new osr(default 
0x42ea9f0)/0x42ea9f0
2014-08-19 12:59:10.029922 7f60ff178800 10 journal op_submit_start 2
2014-08-19 12:59:10.030009 7f60ff178800  5 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions (writeahead) 2 
0x7fff6e817080
2014-08-19 12:59:10.030028 7f60ff178800 10 journal op_journal_transactions 2 
0x7fff6e817080
2014-08-19 12:59:10.030039 7f60ff178800  5 journal submit_entry seq 2 len 505 
(0x42a76f0)
2014-08-19 12:59:10.030065 7f60fa33b700 20 journal write_thread_entry woke up
2014-08-19 12:59:10.030070 7f60fa33b700 20 journal write_thread_entry aio 
throttle: aio num 1 bytes 4096 ... exp 2 min_new 4 ... pending 0
2014-08-19 12:59:10.030076 7f60fa33b700 20 journal write_thread_entry deferring 
until more aios complete: 1 aios with 4096 bytes needs 4 bytes to start a new 
aio (currently 0 pending)
2014-08-19 12:59:10.030084 7f60ff178800 10 journal op_submit_finish 2
2014-08-19 12:59:10.030389 7f60f9339700 10 journal write_finish_thread_entry 
aio 0~4096 done
2014-08-19 12:59:10.030402 7f60f9339700 20 journal check_aio_completion
2014-08-19 12:59:10.030406 7f60f9339700 20 journal check_aio_completion 
completed seq 0 0~4096
2014-08-19 12:59:10.030412 7f60f9339700 20 journal write_finish_thread_entry 
sleeping
2014-08-19 12:59:15.026609 7f60fab3c700 20 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after 5.000459
2014-08-19 12:59:15.026659 7f60fab3c700 10 journal commit_start max_applied_seq 
1, open_ops 0
2014-08-19 12:59:15.026665 7f60fab3c700 10 journal commit_start blocked, all 
open_ops have completed
2014-08-19 12:59:15.026670 7f60fab3c700 10 journal commit_start nothing to do
2014-08-19 12:59:15.026676 7f60fab3c700 10 journal commit_start
2014-08-19 12:59:15.026691 7f60fab3c700 20 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry waiting for max_interval 
5.00
2014-08-19 12:59:20.026826 7f60fab3c700 20 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after 5.000135
2014-08-19 12:59:20.026870 7f60fab3c700 10 journal commit_start max_applied_seq 
1, open_ops 0
2014-08-19 12:59:20.026876 7f60fab3c700 10 journal commit_start blocked, all 
open_ops have completed
2014-08-19 12:59:20.026879 7f60fab3c700 10 journal commit_start nothing to do
2014-08-19 12:59:20.026891 7f60fab3c700 10 journal commit_start


Could you please confirm this as a valid defect ?

If so, sending a signal on aio_cond in 

Re: [ceph-users] Deadlock in ceph journal

2014-08-19 Thread Somnath Roy
I think this is the issue..

http://tracker.ceph.com/issues/9073

Thanks  Regards
Somnath

From: Somnath Roy
Sent: Tuesday, August 19, 2014 6:25 PM
To: Sage Weil (s...@inktank.com); Samuel Just (sam.j...@inktank.com)
Cc: ceph-users@lists.ceph.com
Subject: Deadlock in ceph journal

Hi Sage/Sam,
During our testing we found a potential deadlock scenario in the filestore 
journal code base. This is happening because of two reason.


1.   This is because code is not signaling aio_cond from 
check_aio_completion() in case seq = 0

2.   Following changes in the write_thread_entry() is allowing a very first 
header write with seq = 0.

   if (writeq.empty()  !must_write_header) {



Now, during ceph-deploy activate this is what happening.


1. The very first write of header with seq = 0 issued and it is waiting for aio 
completion. So, aio_num = 1.

2. superblock write came in and got into while (aio_num  0) block of 
write_thread_entry() and waiting on the aio_cond

3. The seq = 0 aio completed but not setting completed_something = true and as 
a result aio_cond is not signaled.

4. write_thread_entry() is getting into deadlock.

This is a timing problem and if header write is returned before superblock 
write this will not happen and will be happening in case of block journal 
device only (aio is enabled).

Here is the log snippet we are getting.

2014-08-19 12:59:10.029363 7f60fa33b700 10 journal write_thread_entry start
2014-08-19 12:59:10.029395 7f60fa33b700 20 journal prepare_multi_write 
queue_pos now 4096
2014-08-19 12:59:10.029427 7f60fa33b700 15 journal do_aio_write writing 4096~0 
+ header
2014-08-19 12:59:10.029439 7f60fa33b700 20 journal write_aio_bl 0~4096 seq 0
2014-08-19 12:59:10.029442 7f60f9339700 10 journal write_finish_thread_entry 
enter
2014-08-19 12:59:10.029466 7f60fa33b700 20 journal write_aio_bl .. 0~4096 in 1
2014-08-19 12:59:10.029498 7f60fa33b700 20 journal write_aio_bl 4096~0 seq 0
2014-08-19 12:59:10.029505 7f60fa33b700  5 journal put_throttle finished 0 ops 
and 0 bytes, now 0 ops and 0 bytes
2014-08-19 12:59:10.029510 7f60fa33b700 20 journal write_thread_entry going to 
sleep
2014-08-19 12:59:10.029538 7f60ff178800 10 journal journal_start
2014-08-19 12:59:10.029566 7f60f9339700 20 journal write_finish_thread_entry 
waiting for aio(s)
2014-08-19 12:59:10.029726 7f60ff178800 15 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) read meta/23c2fcde/osd_superblock/0//-1 
0~0
2014-08-19 12:59:10.029793 7f60ff178800 -1 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) could not find 
23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2014-08-19 12:59:10.029815 7f60ff178800 10 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) 
FileStore::read(meta/23c2fcde/osd_superblock/0//-1) open error: (2) No such 
file or directory
2014-08-19 12:59:10.029892 7f60ff178800  5 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions new osr(default 
0x42ea9f0)/0x42ea9f0
2014-08-19 12:59:10.029922 7f60ff178800 10 journal op_submit_start 2
2014-08-19 12:59:10.030009 7f60ff178800  5 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions (writeahead) 2 
0x7fff6e817080
2014-08-19 12:59:10.030028 7f60ff178800 10 journal op_journal_transactions 2 
0x7fff6e817080
2014-08-19 12:59:10.030039 7f60ff178800  5 journal submit_entry seq 2 len 505 
(0x42a76f0)
2014-08-19 12:59:10.030065 7f60fa33b700 20 journal write_thread_entry woke up
2014-08-19 12:59:10.030070 7f60fa33b700 20 journal write_thread_entry aio 
throttle: aio num 1 bytes 4096 ... exp 2 min_new 4 ... pending 0
2014-08-19 12:59:10.030076 7f60fa33b700 20 journal write_thread_entry deferring 
until more aios complete: 1 aios with 4096 bytes needs 4 bytes to start a new 
aio (currently 0 pending)
2014-08-19 12:59:10.030084 7f60ff178800 10 journal op_submit_finish 2
2014-08-19 12:59:10.030389 7f60f9339700 10 journal write_finish_thread_entry 
aio 0~4096 done
2014-08-19 12:59:10.030402 7f60f9339700 20 journal check_aio_completion
2014-08-19 12:59:10.030406 7f60f9339700 20 journal check_aio_completion 
completed seq 0 0~4096
2014-08-19 12:59:10.030412 7f60f9339700 20 journal write_finish_thread_entry 
sleeping
2014-08-19 12:59:15.026609 7f60fab3c700 20 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after 5.000459
2014-08-19 12:59:15.026659 7f60fab3c700 10 journal commit_start max_applied_seq 
1, open_ops 0
2014-08-19 12:59:15.026665 7f60fab3c700 10 journal commit_start blocked, all 
open_ops have completed
2014-08-19 12:59:15.026670 7f60fab3c700 10 journal commit_start nothing to do
2014-08-19 12:59:15.026676 7f60fab3c700 10 journal commit_start
2014-08-19 12:59:15.026691 7f60fab3c700 20 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry waiting for max_interval 
5.00
2014-08-19 12:59:20.026826 7f60fab3c700 20 
filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after 5.000135
2014-08-19 12:59:20.026870 7f60fab3c700 10 journal commit_start max_applied_seq 
1, open_ops 0
2014-08-19 12:59:20.026876 7f60fab3c700 10 journal