Re: hanging aio process

Sebastian Ott Tue, 20 May 2014 01:10:29 -0700

On Mon, 19 May 2014, Benjamin LaHaise wrote:
> On Mon, May 19, 2014 at 07:38:51PM +0200, Sebastian Ott wrote:
> > Hello,
> > 
> > on the latest kernel a fio job with 4 workers using libaio hangs.
> 
> Is more than one process stuck in state D when the hang occurs?  If so, 
> what does a backtrace show for the stuck processes (or are there any 
> hung process warnings issued)?


I've seen both: just one process or multiple processes in D state. Here there are 2:

./fio ../../test.job 
file1: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
file2: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
file3: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
file4: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.1.8
Starting 4 processes
Jobs: 2 (f=0): [m__m] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]



[   58.125227] fio             D 00000000006a2dfe     0   804    800 0x00000001
[   58.125229]        000000007b52fb88 0000000000ab3800 0000000000ab3800 00000000009e60b0
[   58.125229]        0000000000ab3800 00000000789284d8 0000000002e0f800 0000000078928000
[   58.125229]        00000000797aec00 0000000000ab3800 0000000000ab3800 00000000009e60b0
[   58.125229]        0000000000ab3800 0000000000000000 0000000002e0f800 0000000078928000
[   58.125229]        00000000006b39b0 00000000006a377a 000000007b52fbc8 000000007b52fd20
[   58.125238] Call Trace:
[   58.125240] ([<00000000006a377a>] __schedule+0x562/0xcc8)
[   58.125241]  [<00000000006a2dfe>] schedule_timeout+0x1ee/0x270
[   58.125243]  [<00000000006a4acc>] wait_for_common+0x100/0x1d0
[   58.125246]  [<00000000002ca3fe>] SyS_io_destroy+0x9a/0xdc
[   58.125247]  [<00000000006a80f8>] sysc_nr_ok+0x22/0x28
[   58.125248]  [<000003fffd21c6d2>] 0x3fffd21c6d2
[   58.125250] fio             D 00000000006a2dfe     0   811    800 0x00000001
[   58.125252]        000000007d82bb88 0000000000ab3800 0000000000ab3800 00000000009e60b0
[   58.125252]        0000000000ab3800 000000007b0869f8 0000000002e0f800 000000007b086520
[   58.125252]        000000007ea0f600 0000000000ab3800 0000000000ab3800 00000000009e60b0
[   58.125252]        0000000000ab3800 0000000000000000 0000000002e0f800 000000007b086520
[   58.125252]        00000000006b39b0 00000000006a377a 000000007d82bbc8 000000007d82bd20
[   58.125261] Call Trace:
[   58.125263] ([<00000000006a377a>] __schedule+0x562/0xcc8)
[   58.125264]  [<00000000006a2dfe>] schedule_timeout+0x1ee/0x270
[   58.125266]  [<00000000006a4acc>] wait_for_common+0x100/0x1d0
[   58.125267]  [<00000000002ca3fe>] SyS_io_destroy+0x9a/0xdc
[   58.125269]  [<00000000006a80f8>] sysc_nr_ok+0x22/0x28
[   58.125270]  [<000003fffd21c6d2>] 0x3fffd21c6d2
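
In case it helps with triage, here is a rough Python sketch that pulls the D-state tasks and their kernel frames out of a log like the one above and groups tasks by identical stack. It is not part of fio or the kernel tree; the names blocked_tasks and summarize are made up for illustration, and the regular expressions only assume the line layout seen in this particular dump (other architectures or kernel versions may format it differently).

```python
import re
from collections import Counter

# Task header as seen in this dump: comm, state D, PC, free stack, pid, ppid, flags.
TASK_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\S+)\s+D\s+[0-9a-f]+\s+\d+\s+(\d+)\s+\d+\s+0x[0-9a-f]+\s*$"
)
# Kernel frame such as "[<00000000002ca3fe>] SyS_io_destroy+0x9a/0xdc".
# Frames without a symbol name (the user-space address) do not match.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Abbreviated copy of the traces above (raw stack words omitted).
SAMPLE = """\
[   58.125227] fio             D 00000000006a2dfe     0   804    800 0x00000001
[   58.125238] Call Trace:
[   58.125240] ([<00000000006a377a>] __schedule+0x562/0xcc8)
[   58.125241]  [<00000000006a2dfe>] schedule_timeout+0x1ee/0x270
[   58.125243]  [<00000000006a4acc>] wait_for_common+0x100/0x1d0
[   58.125246]  [<00000000002ca3fe>] SyS_io_destroy+0x9a/0xdc
[   58.125247]  [<00000000006a80f8>] sysc_nr_ok+0x22/0x28
[   58.125248]  [<000003fffd21c6d2>] 0x3fffd21c6d2
[   58.125250] fio             D 00000000006a2dfe     0   811    800 0x00000001
[   58.125261] Call Trace:
[   58.125263] ([<00000000006a377a>] __schedule+0x562/0xcc8)
[   58.125264]  [<00000000006a2dfe>] schedule_timeout+0x1ee/0x270
[   58.125266]  [<00000000006a4acc>] wait_for_common+0x100/0x1d0
[   58.125267]  [<00000000002ca3fe>] SyS_io_destroy+0x9a/0xdc
[   58.125269]  [<00000000006a80f8>] sysc_nr_ok+0x22/0x28
[   58.125270]  [<000003fffd21c6d2>] 0x3fffd21c6d2
"""


def blocked_tasks(log_text):
    """Return one dict per D-state task: comm, pid and list of kernel symbols."""
    tasks = []
    current = None
    for line in log_text.splitlines():
        header = TASK_RE.match(line)
        if header:
            current = {"comm": header.group(1), "pid": int(header.group(2)), "frames": []}
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append(frame.group(1))
    return tasks


def summarize(tasks):
    """Count how many tasks share each distinct kernel stack."""
    return Counter(tuple(t["frames"]) for t in tasks)


if __name__ == "__main__":
    for stack, count in summarize(blocked_tasks(SAMPLE)).items():
        print(count, "task(s) blocked in:", " <- ".join(stack))
```

For the two traces above, both tasks end up in the same group, since they share the stack through SyS_io_destroy.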


> It is entirely possible the bug isn't 
> caused by the referenced commit, as the commit you're pointing to merely 
> makes the io_destroy() syscall wait for all outstanding aio to complete 
> before returning.

I cannot reproduce this when I revert said commit (on top of 14186fe). In
case it matters: the arch is s390.
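
To make the behaviour described in the quoted paragraph concrete, here is a toy userspace model in Python of "destroy waits until all outstanding requests have completed". It is only a sketch and not kernel code: the class ToyIoContext and its methods are invented for illustration, and it does not claim to show what actually goes wrong in this hang. It only shows that with such a wait, a single request whose completion never arrives keeps the destroy call blocked.

```python
import threading


class ToyIoContext:
    """Hypothetical userspace stand-in for an aio context (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0          # requests submitted but not yet completed
        self._dead = False           # set once destroy() has been called
        self._all_done = threading.Event()

    def submit(self):
        """Account for one new outstanding request."""
        with self._lock:
            if self._dead:
                raise RuntimeError("context already destroyed")
            self._in_flight += 1

    def complete(self):
        """Account for one finished request; wake destroy() on the last one."""
        with self._lock:
            self._in_flight -= 1
            if self._dead and self._in_flight == 0:
                self._all_done.set()

    def destroy(self, timeout=None):
        """Block until no requests are outstanding.

        Returns True once everything has completed, False if the optional
        timeout expires first. With timeout=None it can block forever.
        """
        with self._lock:
            self._dead = True
            if self._in_flight == 0:
                self._all_done.set()
        return self._all_done.wait(timeout)


if __name__ == "__main__":
    ctx = ToyIoContext()
    ctx.submit()
    ctx.submit()
    ctx.complete()                   # only one of the two requests completes
    print("destroy finished:", ctx.destroy(timeout=0.1))
```

In this model destroy(timeout=0.1) returns False because one request is still outstanding; without the timeout the caller would wait indefinitely, which is the shape of the wait seen in the traces.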

Regards,
Sebastian
> 
>               -ben
> 
> > git bisect points to:
> >     commit e02ba72aabfade4c9cd6e3263e9b57bf890ad25c
> >     Author: Anatol Pomozov <anatol.pomo...@gmail.com>
> >     Date:   Tue Apr 15 11:31:33 2014 -0700
> > 
> >         aio: block io_destroy() until all context requests are completed
> > 
> > 
> > The fio workers are on the wait_for_completion in sys_io_destroy.
> > 
> > Regards,
> > Sebastian
> > [global]
> > blocksize=4K
> > size=256M
> > rw=randrw
> > verify=md5
> > iodepth=32
> > ioengine=libaio
> > direct=1
> > end_fsync=1
> > 
> > [file1]
> > filename=/dev/scma
> > 
> > [file2]
> > filename=/dev/scmbw
> > 
> > [file3]
> > filename=/dev/scmc
> > 
> > [file4]
> > filename=/dev/scmx
> 
> 
> -- 
> "Thought is the essence of where you are now."
> 
> 

