On Fri 28-08-20 14:07:12, Xianting Tian wrote:
> The normal aio wait path (read_events() ->
> wait_event_interruptible_hrtimeout()) doesn't account iowait time, so
> make it account iowait time, which truly reflects the system io
> situation when using a tool like 'top'.
> 
> The test result as below.
> 
> Test environment:
>       Architecture:          x86_64
>       CPU op-mode(s):        32-bit, 64-bit
>       Byte Order:            Little Endian
>       CPU(s):                32
>       On-line CPU(s) list:   0-31
>       Thread(s) per core:    2
>       Core(s) per socket:    8
>       Socket(s):             2
>       NUMA node(s):          2
>       Vendor ID:             GenuineIntel
>       CPU family:            6
>       Model:                 85
>       Model name:            Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz
>       Stepping:              4
>       CPU MHz:               801.660
> 
> AIO test command:
>       fio -ioengine=libaio -bs=8k -direct=1 -numjobs 32 -rw=read -size=10G
>           -filename=/dev/sda3 -name="Max throughput" -iodepth=128 -runtime=60
> 
> Before the test, set nr_requests to 512 (the default is 256) to keep the
> backend device busy handling io requests, so that io_getevents() has more
> chances to wait for io completion:
>       echo 512 >  /sys/block/sda/queue/nr_requests
> 
> The fio test result with the AIO iowait time accounting patch is shown
> below; almost all fio threads are in 'D' state waiting for io completion,
> and the iowait time is accounted as 48.0%:
>       top - 19:19:23 up 29 min,  2 users,  load average: 14.60, 9.45, 9.45
>       Tasks: 456 total,   4 running, 247 sleeping,   0 stopped,   0 zombie
>       %Cpu(s):  0.4 us,  1.0 sy,  0.0 ni, 50.6 id, 48.0 wa,  0.0 hi,  0.0 si,  0.0 st
>       KiB Mem : 19668915+total, 19515264+free,   866476 used,   670028 buff/cache
>       KiB Swap:  4194300 total,  4194300 free,        0 used. 19449948+avail Mem
> 
>         PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>       16135 root      20   0  294092  63724  63060 S   1.7  0.0   0:03.31 fio
>       16173 root      20   0  272352   3540   1792 D   1.7  0.0   0:03.85 fio
>       16175 root      20   0  272360   3544   1796 D   1.7  0.0   0:03.85 fio
>       16185 root      20   0  272400   3556   1808 D   1.7  0.0   0:03.84 fio
>       16187 root      20   0  272408   3552   1804 D   1.7  0.0   0:03.82 fio
>       16190 root      20   0  272420   3500   1804 R   1.7  0.0   0:03.88 fio
>       16169 root      20   0  272336   3444   1740 D   1.3  0.0   0:03.75 fio
>       16170 root      20   0  272340   3504   1804 R   1.3  0.0   0:03.80 fio
>       16172 root      20   0  272348   3500   1800 D   1.3  0.0   0:03.86 fio
>       16174 root      20   0  272356   3544   1796 D   1.3  0.0   0:03.77 fio
>       16179 root      20   0  272376   3528   1780 D   1.3  0.0   0:03.79 fio
>       16180 root      20   0  272380   3500   1800 D   1.3  0.0   0:03.85 fio
>       16181 root      20   0  272384   3552   1804 D   1.3  0.0   0:03.87 fio
>       16182 root      20   0  272388   3520   1772 D   1.3  0.0   0:03.80 fio
>       16183 root      20   0  272392   3552   1804 D   1.3  0.0   0:03.77 fio
>       16186 root      20   0  272404   3500   1804 D   1.3  0.0   0:03.88 fio
>       16188 root      20   0  272412   3500   1800 D   1.3  0.0   0:03.89 fio
>       16191 root      20   0  272424   3500   1800 D   1.3  0.0   0:03.92 fio
>       16192 root      20   0  272428   3500   1800 D   1.3  0.0   0:03.87 fio
>       16194 root      20   0  272436   3500   1804 D   1.3  0.0   0:03.82 fio
>       16195 root      20   0  272440   3500   1800 R   1.3  0.0   0:03.82 fio
>       16196 root      20   0  272444   3552   1804 D   1.3  0.0   0:03.84 fio
>       16198 root      20   0  272452   3500   1804 D   1.3  0.0   0:03.89 fio
>       16199 root      20   0  272456   3504   1800 D   1.3  0.0   0:03.84 fio
>       16200 root      20   0  272460   3552   1804 D   1.3  0.0   0:03.85 fio
>       16171 root      20   0  272344   3504   1800 D   1.0  0.0   0:03.84 fio
>       16176 root      20   0  272364   3520   1772 D   1.0  0.0   0:03.76 fio
>       16177 root      20   0  272368   3556   1808 D   1.0  0.0   0:03.74 fio
>       16178 root      20   0  272372   3500   1804 D   1.0  0.0   0:03.90 fio
>       16184 root      20   0  272396   3500   1800 D   1.0  0.0   0:03.83 fio
>       16189 root      20   0  272416   3500   1804 D   1.0  0.0   0:03.86 fio
>       16193 root      20   0  272432   3500   1804 D   1.0  0.0   0:03.85 fio
>       16197 root      20   0  272448   3556   1808 D   1.0  0.0   0:03.75 fio
> 
> The fio test result without the AIO iowait time accounting patch is shown
> below; almost all fio threads are in 'S' state waiting for io completion,
> and the iowait time is not accounted: iowait is 0.0%.
>       top - 19:20:44 up 31 min,  2 users,  load average: 12.50, 10.15, 9.72
>       Tasks: 458 total,   2 running, 249 sleeping,   0 stopped,   0 zombie
>       %Cpu(s):  0.4 us,  0.9 sy,  0.0 ni, 98.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>       KiB Mem : 19668915+total, 19513945+free,   879652 used,   670040 buff/cache
>       KiB Swap:  4194300 total,  4194300 free,        0 used. 19448636+avail Mem
> 
>         PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>       16243 root      20   0  294092  63736  63068 S   1.7  0.0   0:03.06 fio
>       16277 root      20   0  272336   3568   1868 S   1.7  0.0   0:03.59 fio
>       16287 root      20   0  272376   3564   1864 S   1.7  0.0   0:03.64 fio
>       16291 root      20   0  272392   3620   1868 S   1.7  0.0   0:03.63 fio
>       16298 root      20   0  272420   3564   1868 S   1.7  0.0   0:03.61 fio
>       16302 root      20   0  272436   3560   1868 S   1.7  0.0   0:03.61 fio
>       16303 root      20   0  272440   3552   1800 S   1.7  0.0   0:03.62 fio
>       16308 root      20   0  272460   3568   1864 S   1.7  0.0   0:03.60 fio
>       16278 root      20   0  272340   3568   1868 S   1.3  0.0   0:03.59 fio
>       16279 root      20   0  272344   3508   1800 S   1.3  0.0   0:03.60 fio
>       16280 root      20   0  272348   3564   1864 S   1.3  0.0   0:03.60 fio
>       16281 root      20   0  272352   3624   1872 S   1.3  0.0   0:03.57 fio
>       16283 root      20   0  272360   3612   1860 S   1.3  0.0   0:03.60 fio
>       16285 root      20   0  272368   3592   1840 S   1.3  0.0   0:03.62 fio
>       16286 root      20   0  272372   3580   1828 S   1.3  0.0   0:03.61 fio
>       16288 root      20   0  272380   3620   1868 S   1.3  0.0   0:03.55 fio
>       16289 root      20   0  272384   3564   1868 S   1.3  0.0   0:03.59 fio
>       16292 root      20   0  272396   3536   1836 S   1.3  0.0   0:03.62 fio
>       16293 root      20   0  272400   3624   1872 S   1.3  0.0   0:03.63 fio
>       16295 root      20   0  272408   3620   1868 S   1.3  0.0   0:03.61 fio
>       16297 root      20   0  272416   3568   1868 S   1.3  0.0   0:03.62 fio
>       16300 root      20   0  272428   3564   1864 R   1.3  0.0   0:03.61 fio
>       16304 root      20   0  272444   3564   1864 S   1.3  0.0   0:03.59 fio
>       16305 root      20   0  272448   3456   1760 S   1.3  0.0   0:03.65 fio
>       16307 root      20   0  272456   3568   1864 S   1.3  0.0   0:03.64 fio
>       16282 root      20   0  272356   3556   1860 S   1.0  0.0   0:03.55 fio
>       16284 root      20   0  272364   3612   1860 S   1.0  0.0   0:03.57 fio
>       16290 root      20   0  272388   3616   1864 S   1.0  0.0   0:03.54 fio
>       16294 root      20   0  272404   3624   1872 S   1.0  0.0   0:03.60 fio
>       16296 root      20   0  272412   3564   1864 S   1.0  0.0   0:03.60 fio
>       16299 root      20   0  272424   3540   1840 S   1.0  0.0   0:03.62 fio
>       16301 root      20   0  272432   3568   1868 S   1.0  0.0   0:03.63 fio
>       16306 root      20   0  272452   3624   1872 S   1.0  0.0   0:03.60 fio
> 
> Signed-off-by: Xianting Tian <tian.xiant...@h3c.com>

Nice! The patch looks good to me. You can add:

Reviewed-by: Jan Kara <j...@suse.cz>

                                                                Honza

> ---
>  fs/aio.c             |  2 +-
>  include/linux/wait.h | 26 ++++++++++++++++++++++----
>  2 files changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 5736bff48..8d00548e0 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1290,7 +1290,7 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
>  	if (until == 0)
>  		aio_read_events(ctx, min_nr, nr, event, &ret);
>  	else
> -		wait_event_interruptible_hrtimeout(ctx->wait,
> +		io_wait_event_hrtimeout(ctx->wait,
>  				aio_read_events(ctx, min_nr, nr, event, &ret),
>  				until);
>  	return ret;
> diff --git a/include/linux/wait.h b/include/linux/wait.h
> index 898c890fc..fb5902a25 100644
> --- a/include/linux/wait.h
> +++ b/include/linux/wait.h
> @@ -312,6 +312,13 @@ do {									\
>  	(void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0,	\
>  			    io_schedule())
>  
> +#define __io_wait_event_hrtimeout(wq_head, condition, timeout)		\
> +({									\
> +	int __ret = 0;							\
> +	__ret = __wait_event_hrtimeout(wq_head, condition, timeout,	\
> +			    TASK_UNINTERRUPTIBLE, io_schedule());	\
> +})
> +
>  /*
>   * io_wait_event() -- like wait_event() but with io_schedule()
>   */
> @@ -323,6 +330,15 @@ do {									\
>  	__io_wait_event(wq_head, condition);				\
>  } while (0)
>  
> +
> +#define io_wait_event_hrtimeout(wq_head, condition, timeout)		\
> +do {									\
> +	might_sleep();							\
> +	if (condition)							\
> +		break;							\
> +	__io_wait_event_hrtimeout(wq_head, condition, timeout);		\
> +} while (0)
> +
>  #define __wait_event_freezable(wq_head, condition)			\
>  	___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0,	\
>  			    freezable_schedule())
> @@ -500,7 +516,7 @@ do {									\
>  	__ret;								\
>  })
>  
> -#define __wait_event_hrtimeout(wq_head, condition, timeout, state)	\
> +#define __wait_event_hrtimeout(wq_head, condition, timeout, state, cmd)	\
>  ({									\
>  	int __ret = 0;							\
>  	struct hrtimer_sleeper __t;					\
> @@ -517,7 +533,7 @@ do {									\
>  			__ret = -ETIME;					\
>  			break;						\
>  		}							\
> -		schedule());						\
> +		cmd);							\
>  									\
>  	hrtimer_cancel(&__t.timer);					\
>  	destroy_hrtimer_on_stack(&__t.timer);				\
> @@ -546,7 +562,8 @@ do {									\
>  	might_sleep();							\
>  	if (!(condition))						\
>  		__ret = __wait_event_hrtimeout(wq_head, condition, timeout,	\
> -					       TASK_UNINTERRUPTIBLE);	\
> +					       TASK_UNINTERRUPTIBLE,	\
> +					       schedule());		\
>  	__ret;								\
>  })
>  
> @@ -572,7 +589,8 @@ do {									\
>  	might_sleep();							\
>  	if (!(condition))						\
>  		__ret = __wait_event_hrtimeout(wq, condition, timeout,	\
> -					       TASK_INTERRUPTIBLE);	\
> +					       TASK_INTERRUPTIBLE,	\
> +					        schedule());		\
>  	__ret;								\
>  })
>  
> -- 
> 2.17.1
> 
-- 
Jan Kara <j...@suse.com>
SUSE Labs, CR
