On Mon, 19 May 2014, Benjamin LaHaise wrote: > On Mon, May 19, 2014 at 07:38:51PM +0200, Sebastian Ott wrote: > > Hello, > > > > on the latest kernel a fio job with 4 workers using libaio hangs. > > Is more than one process stuck in state D when the hang occurs? If so, > what does a backtrace show for the stuck processes (or are there any > hung process warnings issued)?
I've seen both - just one or multiple processes in D state. Here it are 2: ./fio ../../test.job file1: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 file2: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 file3: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 file4: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 fio-2.1.8 Starting 4 processes Jobs: 2 (f=0): [m__m] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s] [ 58.125227] fio D 00000000006a2dfe 0 804 800 0x00000001 [ 58.125229] 000000007b52fb88 0000000000ab3800 0000000000ab3800 00000000009e60b0 [ 58.125229] 0000000000ab3800 00000000789284d8 0000000002e0f800 0000000078928000 [ 58.125229] 00000000797aec00 0000000000ab3800 0000000000ab3800 00000000009e60b0 [ 58.125229] 0000000000ab3800 0000000000000000 0000000002e0f800 0000000078928000 [ 58.125229] 00000000006b39b0 00000000006a377a 000000007b52fbc8 000000007b52fd20 [ 58.125238] Call Trace: [ 58.125240] ([<00000000006a377a>] __schedule+0x562/0xcc8) [ 58.125241] [<00000000006a2dfe>] schedule_timeout+0x1ee/0x270 [ 58.125243] [<00000000006a4acc>] wait_for_common+0x100/0x1d0 [ 58.125246] [<00000000002ca3fe>] SyS_io_destroy+0x9a/0xdc [ 58.125247] [<00000000006a80f8>] sysc_nr_ok+0x22/0x28 [ 58.125248] [<000003fffd21c6d2>] 0x3fffd21c6d2 [ 58.125250] fio D 00000000006a2dfe 0 811 800 0x00000001 [ 58.125252] 000000007d82bb88 0000000000ab3800 0000000000ab3800 00000000009e60b0 [ 58.125252] 0000000000ab3800 000000007b0869f8 0000000002e0f800 000000007b086520 [ 58.125252] 000000007ea0f600 0000000000ab3800 0000000000ab3800 00000000009e60b0 [ 58.125252] 0000000000ab3800 0000000000000000 0000000002e0f800 000000007b086520 [ 58.125252] 00000000006b39b0 00000000006a377a 000000007d82bbc8 000000007d82bd20 [ 58.125261] Call Trace: [ 58.125263] ([<00000000006a377a>] __schedule+0x562/0xcc8) [ 58.125264] [<00000000006a2dfe>] schedule_timeout+0x1ee/0x270 [ 58.125266] [<00000000006a4acc>] wait_for_common+0x100/0x1d0 [ 58.125267] [<00000000002ca3fe>] SyS_io_destroy+0x9a/0xdc [ 58.125269] [<00000000006a80f8>] sysc_nr_ok+0x22/0x28 [ 58.125270] [<000003fffd21c6d2>] 0x3fffd21c6d2 > It is entirely possible the bug isn't > caused by the referenced commit, as the commit you're pointing to merely > makes io_destroy() syscall wait for all aio outstanding to complete > before returning. I cannot reproduce this when I revert said commit (on top of 14186fe). If that matters - the arch is s390. Regards, Sebastian > > -ben > > > git bisect points to: > > commit e02ba72aabfade4c9cd6e3263e9b57bf890ad25c > > Author: Anatol Pomozov <anatol.pomo...@gmail.com> > > Date: Tue Apr 15 11:31:33 2014 -0700 > > > > aio: block io_destroy() until all context requests are completed > > > > > > The fio workers are on the wait_for_completion in sys_io_destroy. > > > > Regards, > > Sebastian > > [global] > > blocksize=4K > > size=256M > > rw=randrw > > verify=md5 > > iodepth=32 > > ioengine=libaio > > direct=1 > > end_fsync=1 > > > > [file1] > > filename=/dev/scma > > > > [file2] > > filename=/dev/scmbw > > > > [file3] > > filename=/dev/scmc > > > > [file4] > > filename=/dev/scmx > > > -- > "Thought is the essence of where you are now." > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/