RE: [PATCH v2] libgfortran: Replace mutex with rwlock
> > Hi Lipeng, > > > > > This patch try to introduce the rwlock and split the read/write to > > > unit_root tree and unit_cache with rwlock instead of the mutex to > > > increase CPU efficiency. In the get_gfc_unit function, the > > > percentage to step into the insert_unit function is around 30%, in > > > most instances, we can get the unit in the phase of reading the > > > unit_cache or unit_root tree. So split the read/write phase by > > > rwlock would be an approach to make it more parallel. > > > > No comment on the code itself, as yet... but I'd like to know how > > throroughly you tested it, using which tools, and on which programs. > > Did you use valgrind --tool=helgrind or --tool=drd? Since it is prone to > > race conditions, did you also test Fortran's asynchronous I/O? > > > > Best regards > > > > Thomas > > Hi Thomas, > > I didn’t use valgrind and make check-fortran succeed in local. > And the tools/programs I used is pts/neatbench > https://openbenchmarking.org/test/pts/neatbench > > Best Regards, > Lipeng Zhu Hi Thomas, Thanks for your comments, I just use valgrind --tool=helgrind or --tool=drd to test make check-fortran, no errors occurred. ==2609780== Helgrind, a thread error detector ==2609780== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al. ==2609780== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info ==2609780== Command: make check-fortran -j150 ==2609780== ==2609780== ==2609780== Use --history-level=approx or =none to gain increased speed, at ==2609780== the cost of reduced accuracy of conflicting-access information ==2609780== For lists of detected and suppressed errors, rerun with: -s ==2609780== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ==2976599== drd, a thread error detector ==2976599== Copyright (C) 2006-2020, and GNU GPL'd, by Bart Van Assche. ==2976599== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info ==2976599== Command: make check-fortran -j150 ==2976599== ==2976599== ==2976599== For lists of detected and suppressed errors, rerun with: -s ==2976599== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) BTW, can you give me some guidance of how to test Fortran's asynchronous I/O? Best Regards, Lipeng Zhu
RE: [PATCH v2] libgfortran: Replace mutex with rwlock
> Hi Lipeng, > > This patch try to introduce the rwlock and split the read/write to > > unit_root tree and unit_cache with rwlock instead of the mutex to > > increase CPU efficiency. In the get_gfc_unit function, the percentage > > to step into the insert_unit function is around 30%, in most > > instances, we can get the unit in the phase of reading the unit_cache > > or unit_root tree. So split the read/write phase by rwlock would be an > > approach to make it more parallel. > > No comment on the code itself, as yet... but I'd like to know how throroughly > you tested it, using which tools, and on which programs. > Did you use valgrind --tool=helgrind or --tool=drd? Since it is prone to > race conditions, did you also test Fortran's asynchronous I/O? > > Best regards > > Thomas Hi Thomas, I didn’t use valgrind and make check-fortran succeed in local. And the tools/programs I used is pts/neatbench https://openbenchmarking.org/test/pts/neatbench Best Regards, Lipeng Zhu
Re: [PATCH v2] libgfortran: Replace mutex with rwlock
Hi Lipeng, This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. No comment on the code itself, as yet... but I'd like to know how throroughly you tested it, using which tools, and on which programs. Did you use valgrind --tool=helgrind or --tool=drd? Since it is prone to race conditions, did you also test Fortran's asynchronous I/O? Best regards Thomas
[PATCH v2] libgfortran: Replace mutex with rwlock
This patch try to introduce the rwlock and split the read/write to unit_root tree and unit_cache with rwlock instead of the mutex to increase CPU efficiency. In the get_gfc_unit function, the percentage to step into the insert_unit function is around 30%, in most instances, we can get the unit in the phase of reading the unit_cache or unit_root tree. So split the read/write phase by rwlock would be an approach to make it more parallel. BTW, the IPC metrics can gain around 9x in our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro (__gthrw): New function (__gthread_rwlock_rdlock): New function (__gthread_rwlock_tryrdlock): New function (__gthread_rwlock_wrlock): New function (__gthread_rwlock_trywrlock): New function (__gthread_rwlock_unlock): New function libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New * io/async.h (RWLOCK_DEBUG_ADD): New macro (CHECK_RDLOCK): New macro (CHECK_WRLOCK): New macro (TAIL_RWLOCK_DEBUG_QUEUE): New macro (IN_RWLOCK_DEBUG_QUEUE): New macro (RDLOCK): New macro (WRLOCK): New macro (RWUNLOCK): New macro (RD_TO_WRLOCK): New macro (INTERN_RDLOCK): New macro (INTERN_WRLOCK): New macro (INTERN_RWUNLOCK): New macro * io/io.h (internal_proto): Define unit_rwlock * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock (st_write_done_worker): Relace unit_lock with unit_rwlock * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock (if): Relace unit_lock with unit_rwlock (close_unit_1): Relace unit_lock with unit_rwlock (close_units): Relace unit_lock with unit_rwlock (newunit_alloc): Relace unit_lock with unit_rwlock * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock --- v1 -> v2: Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined. Signed-off-by: Lipeng Zhu --- libgcc/gthr-posix.h | 60 +++ libgfortran/io/async.c| 4 + libgfortran/io/async.h| 151 ++ libgfortran/io/io.h | 15 ++-- libgfortran/io/transfer.c | 8 +- libgfortran/io/unit.c | 65 libgfortran/io/unix.c | 16 ++-- 7 files changed, 273 insertions(+), 46 deletions(-) diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h index f1a5ab8e075..cbe7cba20d0 100644 --- a/libgcc/gthr-posix.h +++ b/libgcc/gthr-posix.h @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t; typedef pthread_key_t __gthread_key_t; typedef pthread_once_t __gthread_once_t; typedef pthread_mutex_t __gthread_mutex_t; +#ifndef __cplusplus +typedef pthread_rwlock_t __gthread_rwlock_t; +#endif typedef pthread_mutex_t __gthread_recursive_mutex_t; typedef pthread_cond_t __gthread_cond_t; typedef struct timespec __gthread_time_t; @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t; #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function +#ifndef __cplusplus +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER +#endif #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER) #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER @@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init) __gthrw(pthread_mutexattr_settype) __gthrw(pthread_mutexattr_destroy) +#ifndef __cplusplus +__gthrw(pthread_rwlock_rdlock) +__gthrw(pthread_rwlock_tryrdlock) +__gthrw(pthread_rwlock_wrlock) +__gthrw(pthread_rwlock_trywrlock) +__gthrw(pthread_rwlock_unlock) +#endif #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK) /* Objective-C. */ @@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond) return __gthrw_(pthread_cond_destroy) (__cond); } +#ifndef __cplusplus +static inline int +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_rdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_wrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_trywrlock) (__rwlock); + else +return 0; +} + +static inline int +__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock) +{ + if (__gthread_active_p ()) +return __gthrw_(pthread_rwlock_unlock) (__rwlock); + else +return 0; +} +#endif + #endif /* _LIBOBJC */ #endif /* ! GCC_GTHR_POSIX_H */ diff --git a/libgfortran/io/async.c b/libgfortran/io/async.c in