On 09/12/2023 15:39, Lipeng Zhu wrote:
> This patch try to introduce the rwlock and split the read/write to
> unit_root tree and unit_cache with rwlock instead of the mutex to
> increase CPU efficiency. In the get_gfc_unit function, the percentage
> to step into the insert_unit function is around 30%, in most instances,
> we can get the unit in the phase of reading the unit_cache or unit_root
> tree. So split the read/write phase by rwlock would be an approach to
> make it more parallel.
> 
> BTW, the IPC metrics can gain around 9x in our test
> server with 220 cores. The benchmark we used is
> https://github.com/rwesson/NEAT
> 
> libgcc/ChangeLog:
> 
>       * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
>       (__gthrw): New function.
>       (__gthread_rwlock_rdlock): New function.
>       (__gthread_rwlock_tryrdlock): New function.
>       (__gthread_rwlock_wrlock): New function.
>       (__gthread_rwlock_trywrlock): New function.
>       (__gthread_rwlock_unlock): New function.
> 
> libgfortran/ChangeLog:
> 
>       * io/async.c (DEBUG_LINE): New macro.
>       * io/async.h (RWLOCK_DEBUG_ADD): New macro.
>       (CHECK_RDLOCK): New macro.
>       (CHECK_WRLOCK): New macro.
>       (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
>       (IN_RWLOCK_DEBUG_QUEUE): New macro.
>       (RDLOCK): New macro.
>       (WRLOCK): New macro.
>       (RWUNLOCK): New macro.
>       (RD_TO_WRLOCK): New macro.
>       (INTERN_RDLOCK): New macro.
>       (INTERN_WRLOCK): New macro.
>       (INTERN_RWUNLOCK): New macro.
>       * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
>       a comment.
>       (unit_lock): Remove including associated internal_proto.
>       (unit_rwlock): New declarations including associated internal_proto.
>       (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
>       instead of __gthread_mutex_lock and __gthread_mutex_unlock on
>       unit_lock.
>       * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
>       unit_rwlock instead of LOCK and UNLOCK on unit_lock.
>       (st_write_done_worker): Likewise.
>       * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
>       comment. Use unit_rwlock variable instead of unit_lock variable.
>       (get_gfc_unit_from_unit_root): New function.
>       (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
>       instead of LOCK and UNLOCK on unit_lock.
>       (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
>       LOCK and UNLOCK on unit_lock.
>       (close_units): Likewise.
>       (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
>       unit_lock.
>       * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
>       instead of LOCK and UNLOCK on unit_lock.
>       (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
>       of LOCK and UNLOCK on unit_lock.
> 

It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran
/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function 
‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error
: implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration]
 1023 |   WRLOCK (&unit_rwlock);
      |   ^~~~~~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error
: implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration]
 1025 |   RWUNLOCK (&unit_rwlock);
      |   ^~~~~~~~


R.

> ---
> v1 -> v2:
> Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined.
> 
> v2 -> v3:
> Rebase the patch with trunk branch.
> 
> v3 -> v4:
> Update the comments.
> 
> v4 -> v5:
> Fix typos and code formatter.
> 
> v5 -> v6:
> Add unit tests.
> 
> v6 -> v7:
> Update ChangeLog and code formatter.
> 
> Reviewed-by: Hongjiu Lu <hongjiu...@intel.com>
> Reviewed-by: Bernhard Reutner-Fischer <rep.dot....@gmail.com>
> Reviewed-by: Thomas Koenig <tkoe...@netcologne.de>
> Reviewed-by: Jakub Jelinek <ja...@redhat.com>
> Signed-off-by: Lipeng Zhu <lipeng....@intel.com>
> ---
>  libgcc/gthr-posix.h                           |  60 +++++++
>  libgfortran/io/async.c                        |   4 +
>  libgfortran/io/async.h                        | 151 ++++++++++++++++++
>  libgfortran/io/io.h                           |  15 +-
>  libgfortran/io/transfer.c                     |   8 +-
>  libgfortran/io/unit.c                         | 117 +++++++++-----
>  libgfortran/io/unix.c                         |  16 +-
>  .../testsuite/libgomp.fortran/rwlock_1.f90    |  33 ++++
>  .../testsuite/libgomp.fortran/rwlock_2.f90    |  22 +++
>  .../testsuite/libgomp.fortran/rwlock_3.f90    |  18 +++
>  10 files changed, 386 insertions(+), 58 deletions(-)
>  create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_1.f90
>  create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_2.f90
>  create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_3.f90
> 
> diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h
> index aebcfdd9f4c..73283082997 100644
> --- a/libgcc/gthr-posix.h
> +++ b/libgcc/gthr-posix.h
> @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t;
>  typedef pthread_key_t __gthread_key_t;
>  typedef pthread_once_t __gthread_once_t;
>  typedef pthread_mutex_t __gthread_mutex_t;
> +#ifndef __cplusplus
> +typedef pthread_rwlock_t __gthread_rwlock_t;
> +#endif
>  typedef pthread_mutex_t __gthread_recursive_mutex_t;
>  typedef pthread_cond_t __gthread_cond_t;
>  typedef struct timespec __gthread_time_t;
> @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t;
>  
>  #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER
>  #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function
> +#ifndef __cplusplus
> +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER
> +#endif
>  #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT
>  #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER)
>  #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER
> @@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init)
>  __gthrw(pthread_mutexattr_settype)
>  __gthrw(pthread_mutexattr_destroy)
>  
> +#ifndef __cplusplus
> +__gthrw(pthread_rwlock_rdlock)
> +__gthrw(pthread_rwlock_tryrdlock)
> +__gthrw(pthread_rwlock_wrlock)
> +__gthrw(pthread_rwlock_trywrlock)
> +__gthrw(pthread_rwlock_unlock)
> +#endif
>  
>  #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK)
>  /* Objective-C.  */
> @@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond)
>    return __gthrw_(pthread_cond_destroy) (__cond);
>  }
>  
> +#ifndef __cplusplus
> +static inline int
> +__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock)
> +{
> +  if (__gthread_active_p ())
> +    return __gthrw_(pthread_rwlock_rdlock) (__rwlock);
> +  else
> +    return 0;
> +}
> +
> +static inline int
> +__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock)
> +{
> +  if (__gthread_active_p ())
> +    return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock);
> +  else
> +    return 0;
> +}
> +
> +static inline int
> +__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock)
> +{
> +  if (__gthread_active_p ())
> +    return __gthrw_(pthread_rwlock_wrlock) (__rwlock);
> +  else
> +    return 0;
> +}
> +
> +static inline int
> +__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock)
> +{
> +  if (__gthread_active_p ())
> +    return __gthrw_(pthread_rwlock_trywrlock) (__rwlock);
> +  else
> +    return 0;
> +}
> +
> +static inline int
> +__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock)
> +{
> +  if (__gthread_active_p ())
> +    return __gthrw_(pthread_rwlock_unlock) (__rwlock);
> +  else
> +    return 0;
> +}
> +#endif
> +
>  #endif /* _LIBOBJC */
>  
>  #endif /* ! GCC_GTHR_POSIX_H */
> diff --git a/libgfortran/io/async.c b/libgfortran/io/async.c
> index 8fa1f0d4ce0..91bf397105d 100644
> --- a/libgfortran/io/async.c
> +++ b/libgfortran/io/async.c
> @@ -42,6 +42,10 @@ DEBUG_LINE (__thread const char *aio_prefix = MPREFIX);
>  
>  DEBUG_LINE (__gthread_mutex_t debug_queue_lock = __GTHREAD_MUTEX_INIT;)
>  DEBUG_LINE (aio_lock_debug *aio_debug_head = NULL;)
> +#ifdef __GTHREAD_RWLOCK_INIT
> +DEBUG_LINE (aio_rwlock_debug *aio_rwlock_debug_head = NULL;)
> +DEBUG_LINE (__gthread_rwlock_t debug_queue_rwlock = __GTHREAD_RWLOCK_INIT;)
> +#endif
>  
>  /* Current unit for asynchronous I/O.  Needed for error reporting.  */
>  
> diff --git a/libgfortran/io/async.h b/libgfortran/io/async.h
> index ad226c8e856..f112f6870bb 100644
> --- a/libgfortran/io/async.h
> +++ b/libgfortran/io/async.h
> @@ -210,6 +210,128 @@
>      DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", 
> aio_prefix, #mutex, mutex); \
>    } while (0)
>  
> +#ifdef __GTHREAD_RWLOCK_INIT
> +#define RWLOCK_DEBUG_ADD(rwlock) do {                \
> +    aio_rwlock_debug *n;                             \
> +    n = xmalloc (sizeof (aio_rwlock_debug)); \
> +    n->prev = TAIL_RWLOCK_DEBUG_QUEUE;                       \
> +    if (n->prev)                             \
> +      n->prev->next = n;                     \
> +    n->next = NULL;                          \
> +    n->line = __LINE__;                              \
> +    n->func = __FUNCTION__;                  \
> +    n->rw = rwlock;                          \
> +    if (!aio_rwlock_debug_head) {                    \
> +      aio_rwlock_debug_head = n;                     \
> +    }                                                \
> +  } while (0)
> +
> +#define CHECK_RDLOCK(rwlock, status) do {                                    
> \
> +    aio_rwlock_debug *curr;                                          \
> +    INTERN_WRLOCK (&debug_queue_rwlock);                                     
> \
> +    if (__gthread_rwlock_tryrdlock (rwlock)) {                               
> \
> +      if ((curr = IN_RWLOCK_DEBUG_QUEUE (rwlock))) {                         
> \
> +     sprintf (status, DEBUG_RED "%s():%d" DEBUG_NORM, curr->func, 
> curr->line); \
> +      } else                                                         \
> +     sprintf (status, DEBUG_RED "unknown" DEBUG_NORM);                       
> \
> +    }                                                                        
> \
> +    else {                                                           \
> +      __gthread_rwlock_unlock (rwlock);                                      
> \
> +      sprintf (status, DEBUG_GREEN "rwunlocked" DEBUG_NORM);                 
> \
> +    }                                                                        
> \
> +    INTERN_RWUNLOCK (&debug_queue_rwlock);                                   
> \
> +  }while (0)
> +
> +#define CHECK_WRLOCK(rwlock, status) do {                                    
> \
> +    aio_rwlock_debug *curr;                                          \
> +    INTERN_WRLOCK (&debug_queue_rwlock);                                     
> \
> +    if (__gthread_rwlock_trywrlock (rwlock)) {                               
> \
> +      if ((curr = IN_RWLOCK_DEBUG_QUEUE (rwlock))) {                         
> \
> +     sprintf (status, DEBUG_RED "%s():%d" DEBUG_NORM, curr->func, 
> curr->line); \
> +      } else                                                         \
> +     sprintf (status, DEBUG_RED "unknown" DEBUG_NORM);                       
> \
> +    }                                                                        
> \
> +    else {                                                           \
> +      __gthread_rwlock_unlock (rwlock);                                      
> \
> +      sprintf (status, DEBUG_GREEN "rwunlocked" DEBUG_NORM);                 
> \
> +    }                                                                        
> \
> +    INTERN_RWUNLOCK (&debug_queue_rwlock);                                   
> \
> +  }while (0)
> +
> +#define TAIL_RWLOCK_DEBUG_QUEUE ({                   \
> +      aio_rwlock_debug *curr = aio_rwlock_debug_head;        \
> +      while (curr && curr->next) {           \
> +     curr = curr->next;                      \
> +      }                                              \
> +      curr;                                  \
> +    })
> +
> +#define IN_RWLOCK_DEBUG_QUEUE(rwlock) ({             \
> +      __label__ end;                         \
> +      aio_rwlock_debug *curr = aio_rwlock_debug_head;        \
> +      while (curr) {                         \
> +     if (curr->rw == rwlock) {                       \
> +       goto end;                             \
> +     }                                       \
> +     curr = curr->next;                      \
> +      }                                              \
> +    end:;                                    \
> +      curr;                                  \
> +    })
> +
> +#define RDLOCK(rwlock) do {                                          \
> +    char status[200];                                                        
> \
> +    CHECK_RDLOCK (rwlock, status);                                           
> \
> +    DEBUG_PRINTF ("%s%-42s prev: %-35s %20s():%-5d %18p\n", aio_prefix,      
> \
> +              DEBUG_RED "RDLOCK: " DEBUG_NORM #rwlock, status, __FUNCTION__, 
> __LINE__, (void *) rwlock); \
> +    INTERN_RDLOCK (rwlock);                                                  
> \
> +    INTERN_WRLOCK (&debug_queue_rwlock);                                     
> \
> +    RWLOCK_DEBUG_ADD (rwlock);                                               
> \
> +    INTERN_RWUNLOCK (&debug_queue_rwlock);                                   
> \
> +    DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", 
> aio_prefix, #rwlock, rwlock); \
> +  } while (0)
> +
> +#define WRLOCK(rwlock) do {                                          \
> +    char status[200];                                                        
> \
> +    CHECK_WRLOCK (rwlock, status);                                           
> \
> +    DEBUG_PRINTF ("%s%-42s prev: %-35s %20s():%-5d %18p\n", aio_prefix,      
> \
> +              DEBUG_RED "WRLOCK: " DEBUG_NORM #rwlock, status, __FUNCTION__, 
> __LINE__, (void *) rwlock); \
> +    INTERN_WRLOCK (rwlock);                                                  
> \
> +    INTERN_WRLOCK (&debug_queue_rwlock);                                     
> \
> +    RWLOCK_DEBUG_ADD (rwlock);                                               
> \
> +    INTERN_RWUNLOCK (&debug_queue_rwlock);                                   
> \
> +    DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", 
> aio_prefix, #rwlock, rwlock); \
> +  } while (0)
> +
> +#define RWUNLOCK(rwlock) do {                                                
> \
> +    aio_rwlock_debug *curr;                                          \
> +    DEBUG_PRINTF ("%s%-75s %20s():%-5d %18p\n", aio_prefix, DEBUG_GREEN 
> "RWUNLOCK: " DEBUG_NORM #rwlock, \
> +              __FUNCTION__, __LINE__, (void *) rwlock);              \
> +    INTERN_WRLOCK (&debug_queue_rwlock);                                     
> \
> +    curr = IN_RWLOCK_DEBUG_QUEUE (rwlock);                                   
> \
> +    if (curr)                                                                
> \
> +      {                                                                      
> \
> +     if (curr->prev)                                                 \
> +       curr->prev->next = curr->next;                                \
> +     if (curr->next) {                                               \
> +       curr->next->prev = curr->prev;                                \
> +       if (curr == aio_rwlock_debug_head)                                    
> \
> +         aio_rwlock_debug_head = curr->next;                         \
> +     } else {                                                        \
> +       if (curr == aio_rwlock_debug_head)                                    
> \
> +         aio_rwlock_debug_head = NULL;                                       
> \
> +     }                                                               \
> +     free (curr);                                                    \
> +      }                                                                      
> \
> +    INTERN_RWUNLOCK (&debug_queue_rwlock);                                   
> \
> +    INTERN_RWUNLOCK (rwlock);                                                
> \
> +  } while (0)
> +
> +#define RD_TO_WRLOCK(rwlock) \
> +  RWUNLOCK (rwlock); \
> +  WRLOCK (rwlock);
> +#endif
> +
>  #define DEBUG_LINE(...) __VA_ARGS__
>  
>  #else
> @@ -221,12 +343,31 @@
>  #define LOCK(mutex) INTERN_LOCK (mutex)
>  #define UNLOCK(mutex) INTERN_UNLOCK (mutex)
>  #define TRYLOCK(mutex) (__gthread_mutex_trylock (mutex))
> +#ifdef __GTHREAD_RWLOCK_INIT
> +#define RDLOCK(rwlock) INTERN_RDLOCK (rwlock)
> +#define WRLOCK(rwlock) INTERN_WRLOCK (rwlock)
> +#define RWUNLOCK(rwlock) INTERN_RWUNLOCK (rwlock)
> +#define RD_TO_WRLOCK(rwlock) \
> +  RWUNLOCK (rwlock); \
> +  WRLOCK (rwlock);
> +#endif
> +#endif
> +
> +#ifndef __GTHREAD_RWLOCK_INIT
> +#define RDLOCK(rwlock) LOCK (rwlock)
> +#define WRLOCK(rwlock) LOCK (rwlock)
> +#define RWUNLOCK(rwlock) UNLOCK (rwlock)
> +#define RD_TO_WRLOCK(rwlock) do {} while (0)
>  #endif
>  
>  #define INTERN_LOCK(mutex) T_ERROR (__gthread_mutex_lock, mutex);
>  
>  #define INTERN_UNLOCK(mutex) T_ERROR (__gthread_mutex_unlock, mutex);
>  
> +#define INTERN_RDLOCK(rwlock) T_ERROR (__gthread_rwlock_rdlock, rwlock)
> +#define INTERN_WRLOCK(rwlock) T_ERROR (__gthread_rwlock_wrlock, rwlock)
> +#define INTERN_RWUNLOCK(rwlock) T_ERROR (__gthread_rwlock_unlock, rwlock)
> +
>  #if ASYNC_IO
>  
>  /* au->lock has to be held when calling this macro.  */
> @@ -288,8 +429,18 @@ DEBUG_LINE (typedef struct aio_lock_debug{
>    struct aio_lock_debug *prev;
>  } aio_lock_debug;)
>  
> +DEBUG_LINE (typedef struct aio_rwlock_debug{
> +  __gthread_rwlock_t *rw;
> +  int line;
> +  const char *func;
> +  struct aio_rwlock_debug *next;
> +  struct aio_rwlock_debug *prev;
> +} aio_rwlock_debug;)
> +
>  DEBUG_LINE (extern aio_lock_debug *aio_debug_head;)
>  DEBUG_LINE (extern __gthread_mutex_t debug_queue_lock;)
> +DEBUG_LINE (extern aio_rwlock_debug *aio_rwlock_debug_head;)
> +DEBUG_LINE (extern __gthread_rwlock_t debug_queue_rwlock;)
>  
>  /* Thread - local storage of the current unit we are looking at. Needed for
>     error reporting.  */
> diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
> index ecdf1dd3f05..15daa0995b1 100644
> --- a/libgfortran/io/io.h
> +++ b/libgfortran/io/io.h
> @@ -690,7 +690,7 @@ typedef struct gfc_unit
>       from the UNIT_ROOT tree, but doesn't free it and the
>       last of the waiting threads will do that.
>       This must be either atomically increased/decreased, or
> -     always guarded by UNIT_LOCK.  */
> +     always guarded by UNIT_RWLOCK.  */
>    int waiting;
>    /* Flag set by close_unit if the unit as been closed.
>       Must be manipulated under unit's lock.  */
> @@ -769,8 +769,13 @@ internal_proto(default_recl);
>  extern gfc_unit *unit_root;
>  internal_proto(unit_root);
>  
> -extern __gthread_mutex_t unit_lock;
> -internal_proto(unit_lock);
> +#ifdef __GTHREAD_RWLOCK_INIT
> +extern __gthread_rwlock_t unit_rwlock;
> +internal_proto(unit_rwlock);
> +#else
> +extern __gthread_mutex_t unit_rwlock;
> +internal_proto(unit_rwlock);
> +#endif
>  
>  extern int close_unit (gfc_unit *);
>  internal_proto(close_unit);
> @@ -1015,9 +1020,9 @@ dec_waiting_unlocked (gfc_unit *u)
>  #ifdef HAVE_ATOMIC_FETCH_ADD
>    (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
>  #else
> -  __gthread_mutex_lock (&unit_lock);
> +  WRLOCK (&unit_rwlock);
>    u->waiting--;
> -  __gthread_mutex_unlock (&unit_lock);
> +  RWUNLOCK (&unit_rwlock);
>  #endif
>  }
>  
> diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
> index 500db90c828..00d516adcb0 100644
> --- a/libgfortran/io/transfer.c
> +++ b/libgfortran/io/transfer.c
> @@ -4538,9 +4538,9 @@ st_read_done_worker (st_parameter_dt *dtp, bool unlock)
>     if (free_newunit)
>       {
>         /* Avoid inverse lock issues by placing after unlock_unit.  */
> -       LOCK (&unit_lock);
> +       WRLOCK (&unit_rwlock);
>         newunit_free (dtp->common.unit);
> -       UNLOCK (&unit_lock);
> +       RWUNLOCK (&unit_rwlock);
>       }
>  }
>  
> @@ -4634,9 +4634,9 @@ st_write_done_worker (st_parameter_dt *dtp, bool unlock)
>     if (free_newunit)
>       {
>         /* Avoid inverse lock issues by placing after unlock_unit.  */
> -       LOCK (&unit_lock);
> +       WRLOCK (&unit_rwlock);
>         newunit_free (dtp->common.unit);
> -       UNLOCK (&unit_lock);
> +       RWUNLOCK (&unit_rwlock);
>       }
>  }
>  
> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
> index 36d025949c2..0c8c35e464e 100644
> --- a/libgfortran/io/unit.c
> +++ b/libgfortran/io/unit.c
> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  
>  
>  /* IO locking rules:
> -   UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   UNIT_RWLOCK is a master rw lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> +   Using an rwlock improves efficiency by allowing us to separate readers
> +   and writers of both UNIT_ROOT and UNIT_CACHE.
>     Concurrent use of different units should be supported, so
>     each unit has its own lock, LOCK.
>     Open should be atomic with its reopening of units and list_read.c
>     in several places needs find_unit another unit while holding stdin
> -   unit's lock, so it must be possible to acquire UNIT_LOCK while holding
> +   unit's lock, so it must be possible to acquire UNIT_RWLOCK while holding
>     some unit's lock.  Therefore to avoid deadlocks, it is forbidden
> -   to acquire unit's private locks while holding UNIT_LOCK, except
> +   to acquire unit's private locks while holding UNIT_RWLOCK, except
>     for freshly created units (where no other thread can get at their
>     address yet) or when using just trylock rather than lock operation.
>     In addition to unit's private lock each unit has a WAITERS counter
>     and CLOSED flag.  WAITERS counter must be either only
>     atomically incremented/decremented in all places (if atomic builtins
> -   are supported), or protected by UNIT_LOCK in all places (otherwise).
> +   are supported), or protected by UNIT_RWLOCK in all places (otherwise).
>     CLOSED flag must be always protected by unit's LOCK.
> -   After finding a unit in UNIT_CACHE or UNIT_ROOT with UNIT_LOCK held,
> +   After finding a unit in UNIT_CACHE or UNIT_ROOT with UNIT_RWLOCK held,
>     WAITERS must be incremented to avoid concurrent close from freeing
> -   the unit between unlocking UNIT_LOCK and acquiring unit's LOCK.
> -   Unit freeing is always done under UNIT_LOCK.  If close_unit sees any
> +   the unit between unlocking UNIT_RWLOCK and acquiring unit's LOCK.
> +   Unit freeing is always done under UNIT_RWLOCK.  If close_unit sees any
>     WAITERS, it doesn't free the unit but instead sets the CLOSED flag
>     and the thread that decrements WAITERS to zero while CLOSED flag is
> -   set is responsible for freeing it (while holding UNIT_LOCK).
> +   set is responsible for freeing it (while holding UNIT_RWLOCK).
>     flush_all_units operation is iterating over the unit tree with
> -   increasing UNIT_NUMBER while holding UNIT_LOCK and attempting to
> +   increasing UNIT_NUMBER while holding UNIT_RWLOCK and attempting to
>     flush each unit (and therefore needs the unit's LOCK held as well).
>     To avoid deadlocks, it just trylocks the LOCK and if unsuccessful,
> -   remembers the current unit's UNIT_NUMBER, unlocks UNIT_LOCK, acquires
> -   unit's LOCK and after flushing reacquires UNIT_LOCK and restarts with
> +   remembers the current unit's UNIT_NUMBER, unlocks UNIT_RWLOCK, acquires
> +   unit's LOCK and after flushing reacquires UNIT_RWLOCK and restarts with
>     the smallest UNIT_NUMBER above the last one flushed.
>  
>     If find_unit/find_or_create_unit/find_file/get_unit routines return
> @@ -101,10 +103,14 @@ gfc_offset max_offset;
>  gfc_offset default_recl;
>  
>  gfc_unit *unit_root;
> +#ifdef __GTHREAD_RWLOCK_INIT
> +__gthread_rwlock_t unit_rwlock = __GTHREAD_RWLOCK_INIT;
> +#else
>  #ifdef __GTHREAD_MUTEX_INIT
> -__gthread_mutex_t unit_lock = __GTHREAD_MUTEX_INIT;
> +__gthread_mutex_t unit_rwlock = __GTHREAD_MUTEX_INIT;
>  #else
> -__gthread_mutex_t unit_lock;
> +__gthread_mutex_t unit_rwlock;
> +#endif
>  #endif
>  
>  /* We use these filenames for error reporting.  */
> @@ -317,6 +323,28 @@ delete_unit (gfc_unit *old)
>    unit_root = delete_treap (old, unit_root);
>  }
>  
> +/* get_gfc_unit_from_root()-- Given an integer, return a pointer
> +   to the unit structure. Returns NULL if the unit does not exist,
> +   otherwise returns a locked unit. */
> +
> +static inline gfc_unit *
> +get_gfc_unit_from_unit_root (int n)
> +{
> +  gfc_unit *p;
> +  int c = 0;
> +  p = unit_root;
> +  while (p != NULL)
> +    {
> +      c = compare (n, p->unit_number);
> +      if (c < 0)
> +        p = p->left;
> +      if (c > 0)
> +        p = p->right;
> +      if (c == 0)
> +        break;
> +    }
> +  return p;
> +}
>  
>  /* get_gfc_unit()-- Given an integer, return a pointer to the unit
>     structure.  Returns NULL if the unit does not exist,
> @@ -329,7 +357,7 @@ get_gfc_unit (int n, int do_create)
>    int c, created = 0;
>  
>    NOTE ("Unit n=%d, do_create = %d", n, do_create);
> -  LOCK (&unit_lock);
> +  RDLOCK (&unit_rwlock);
>  
>  retry:
>    for (c = 0; c < CACHE_SIZE; c++)
> @@ -339,18 +367,25 @@ retry:
>       goto found;
>        }
>  
> -  p = unit_root;
> -  while (p != NULL)
> -    {
> -      c = compare (n, p->unit_number);
> -      if (c < 0)
> -     p = p->left;
> -      if (c > 0)
> -     p = p->right;
> -      if (c == 0)
> -     break;
> -    }
> -
> +  p = get_gfc_unit_from_unit_root (n);
> +
> +  /* We did not find a unit in the cache nor in the unit list,
> +    create a new (locked) unit and insert into the unit list and
> +    cache. Manipulating either or both the unit list and the unit
> +    cache requires to hold a write-lock [for obvious reasons]:
> +    By separating the read/write lock, we will greatly reduce
> +    the contention on the read part, while the write part is
> +    unlikely once the unit hits the cache. */
> +  RD_TO_WRLOCK (&unit_rwlock);
> +
> +  /* In the case of high concurrency, when multiple threads want
> +    to find or create the same unit, the unit number may not
> +    exist in cache nor in the unit list during read phase, then
> +    threads will acquire the write-lock to insert the same unit
> +    number to unit list. To avoid duplicate insert, we need to
> +    find unit list once again to ensure that the unit number
> +    not exist. */
> +  p = get_gfc_unit_from_unit_root (n);
>    if (p == NULL && do_create)
>      {
>        p = insert_unit (n);
> @@ -368,8 +403,8 @@ retry:
>    if (created)
>      {
>        /* Newly created units have their lock held already
> -      from insert_unit.  Just unlock UNIT_LOCK and return.  */
> -      UNLOCK (&unit_lock);
> +      from insert_unit. Just unlock UNIT_RWLOCK and return. */
> +      RWUNLOCK (&unit_rwlock);
>        return p;
>      }
>  
> @@ -380,7 +415,7 @@ found:
>        if (! TRYLOCK (&p->lock))
>       {
>         /* assert (p->closed == 0); */
> -       UNLOCK (&unit_lock);
> +       RWUNLOCK (&unit_rwlock);
>         return p;
>       }
>  
> @@ -388,14 +423,14 @@ found:
>      }
>  
>  
> -  UNLOCK (&unit_lock);
> +  RWUNLOCK (&unit_rwlock);
>  
>    if (p != NULL && (p->child_dtio == 0))
>      {
>        LOCK (&p->lock);
>        if (p->closed)
>       {
> -       LOCK (&unit_lock);
> +       WRLOCK (&unit_rwlock);
>         UNLOCK (&p->lock);
>         if (predec_waiting_locked (p) == 0)
>           destroy_unit_mutex (p);
> @@ -594,8 +629,8 @@ init_units (void)
>  #endif
>  #endif
>  
> -#ifndef __GTHREAD_MUTEX_INIT
> -  __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock);
> +#if (!defined(__GTHREAD_RWLOCK_INIT) && !defined(__GTHREAD_MUTEX_INIT))
> +  __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock);
>  #endif
>  
>    if (sizeof (max_offset) == 8)
> @@ -732,7 +767,7 @@ close_unit_1 (gfc_unit *u, int locked)
>  
>    u->closed = 1;
>    if (!locked)
> -    LOCK (&unit_lock);
> +    WRLOCK (&unit_rwlock);
>  
>    for (i = 0; i < CACHE_SIZE; i++)
>      if (unit_cache[i] == u)
> @@ -759,7 +794,7 @@ close_unit_1 (gfc_unit *u, int locked)
>      destroy_unit_mutex (u);
>  
>    if (!locked)
> -    UNLOCK (&unit_lock);
> +    RWUNLOCK (&unit_rwlock);
>  
>    return rc;
>  }
> @@ -796,10 +831,10 @@ close_unit (gfc_unit *u)
>  void
>  close_units (void)
>  {
> -  LOCK (&unit_lock);
> +  WRLOCK (&unit_rwlock);
>    while (unit_root != NULL)
>      close_unit_1 (unit_root, 1);
> -  UNLOCK (&unit_lock);
> +  RWUNLOCK (&unit_rwlock);
>  
>    free (newunits);
>  
> @@ -906,7 +941,7 @@ finish_last_advance_record (gfc_unit *u)
>  int
>  newunit_alloc (void)
>  {
> -  LOCK (&unit_lock);
> +  WRLOCK (&unit_rwlock);
>    if (!newunits)
>      {
>        newunits = xcalloc (16, 1);
> @@ -920,7 +955,7 @@ newunit_alloc (void)
>          {
>            newunits[ii] = true;
>            newunit_lwi = ii + 1;
> -       UNLOCK (&unit_lock);
> +       RWUNLOCK (&unit_rwlock);
>            return -ii + NEWUNIT_START;
>          }
>      }
> @@ -933,12 +968,12 @@ newunit_alloc (void)
>    memset (newunits + old_size, 0, old_size);
>    newunits[old_size] = true;
>    newunit_lwi = old_size + 1;
> -    UNLOCK (&unit_lock);
> +    RWUNLOCK (&unit_rwlock);
>    return -old_size + NEWUNIT_START;
>  }
>  
>  
> -/* Free a previously allocated newunit= unit number.  unit_lock must
> +/* Free a previously allocated newunit= unit number.  unit_rwlock must
>     be held when calling.  */
>  
>  void
> diff --git a/libgfortran/io/unix.c b/libgfortran/io/unix.c
> index d466df979df..dcae051744d 100644
> --- a/libgfortran/io/unix.c
> +++ b/libgfortran/io/unix.c
> @@ -1773,7 +1773,7 @@ find_file (const char *file, gfc_charlen_type file_len)
>    id = id_from_path (path);
>  #endif
>  
> -  LOCK (&unit_lock);
> +  RDLOCK (&unit_rwlock);
>  retry:
>    u = find_file0 (unit_root, FIND_FILE0_ARGS);
>    if (u != NULL)
> @@ -1782,19 +1782,19 @@ retry:
>        if (! __gthread_mutex_trylock (&u->lock))
>       {
>         /* assert (u->closed == 0); */
> -       UNLOCK (&unit_lock);
> +       RWUNLOCK (&unit_rwlock);
>         goto done;
>       }
>  
>        inc_waiting_locked (u);
>      }
> -  UNLOCK (&unit_lock);
> +  RWUNLOCK (&unit_rwlock);
>    if (u != NULL)
>      {
>        LOCK (&u->lock);
>        if (u->closed)
>       {
> -       LOCK (&unit_lock);
> +       RDLOCK (&unit_rwlock);
>         UNLOCK (&u->lock);
>         if (predec_waiting_locked (u) == 0)
>           free (u);
> @@ -1838,13 +1838,13 @@ flush_all_units (void)
>    gfc_unit *u;
>    int min_unit = 0;
>  
> -  LOCK (&unit_lock);
> +  WRLOCK (&unit_rwlock);
>    do
>      {
>        u = flush_all_units_1 (unit_root, min_unit);
>        if (u != NULL)
>       inc_waiting_locked (u);
> -      UNLOCK (&unit_lock);
> +      RWUNLOCK (&unit_rwlock);
>        if (u == NULL)
>       return;
>  
> @@ -1855,13 +1855,13 @@ flush_all_units (void)
>        if (u->closed == 0)
>       {
>         sflush (u->s);
> -       LOCK (&unit_lock);
> +       WRLOCK (&unit_rwlock);
>         UNLOCK (&u->lock);
>         (void) predec_waiting_locked (u);
>       }
>        else
>       {
> -       LOCK (&unit_lock);
> +       WRLOCK (&unit_rwlock);
>         UNLOCK (&u->lock);
>         if (predec_waiting_locked (u) == 0)
>           free (u);
> diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 
> b/libgomp/testsuite/libgomp.fortran/rwlock_1.f90
> new file mode 100644
> index 00000000000..f90ecbeb00f
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/rwlock_1.f90
> @@ -0,0 +1,33 @@
> +! { dg-do run }
> +! Multiple threads call open/write/read/close in concurrency with different 
> unit number,
> +! threads can acquire read lock concurrently, to find unit from cache or 
> unit list very frequently,
> +! if not found, threads will acquire the write lock exclusively to insert 
> unit to cache and unit list.
> +! This test case is used to stress both the read and write lock when access 
> unit cache and list.
> +program main
> +  use omp_lib
> +  implicit none
> +  integer:: unit_number, v1, v2, i
> +  character(11) :: file_name
> +  character(3) :: async = "no"
> +  !$omp parallel private (unit_number, v1, v2, file_name, async, i)
> +    do i = 0, 100
> +      unit_number = 10 + omp_get_thread_num ()
> +      write (file_name, "(I3, A)") unit_number, "_tst.dat"
> +      file_name = adjustl(file_name)
> +      open (unit_number, file=file_name, asynchronous="yes")
> +      ! call inquire with file parameter to test find_file in unix.c
> +      inquire (file=file_name, asynchronous=async)
> +      if (async /= "YES") stop 1
> +      write (unit_number, *, asynchronous="yes") unit_number
> +      write (unit_number, *, asynchronous="yes") unit_number + 1
> +      close(unit_number)
> +
> +      open (unit_number, file = file_name, asynchronous="yes")
> +      read (unit_number, *, asynchronous="yes") v1
> +      read (unit_number, *, asynchronous="yes") v2
> +      wait (unit_number)
> +      if ((v1 /= unit_number) .or. (v2 /= unit_number + 1)) stop 2
> +      close(unit_number, status="delete")
> +    end do
> +  !$omp end parallel
> +end program
> diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_2.f90 
> b/libgomp/testsuite/libgomp.fortran/rwlock_2.f90
> new file mode 100644
> index 00000000000..08c80d14cfb
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/rwlock_2.f90
> @@ -0,0 +1,22 @@
> +! { dg-do run }
> +! Insert a unit into cache at the beginning, then start multiple
> +! threads to access the same unit concurrency, unit will be found in unit 
> cache during the read lock phase.
> +! This test case is used to test the read lock when access unit cache and 
> list.
> +program main
> +  use omp_lib
> +  implicit none
> +  integer:: thread_id, total_threads, i, j
> +  total_threads = omp_get_max_threads ()
> +  open (10, file='tst.dat', asynchronous="yes")
> +  !$omp parallel private (thread_id, i, j)
> +    do i = 1, 100
> +      thread_id = omp_get_thread_num ()
> +      do j = 1, 100
> +        write (10, *, asynchronous="yes") thread_id, i
> +      end do
> +    end do
> +  !$omp end parallel
> +  ! call inquire with file parameter to test find_file in unix.c
> +  call flush ()
> +  close (10, status="delete")
> +end program
> diff --git a/libgomp/testsuite/libgomp.fortran/rwlock_3.f90 
> b/libgomp/testsuite/libgomp.fortran/rwlock_3.f90
> new file mode 100644
> index 00000000000..1906fcd7a0b
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/rwlock_3.f90
> @@ -0,0 +1,18 @@
> +! { dg-do run }
> +! Find or create the same unit number in concurrency,
> +! at beginning, threads cannot find the unit in cache or unit list,
> +! then threads will acquire the write lock to insert unit.
> +! This test case is used to ensure that no duplicate unit number will be
> +! inserted into cache nor unit list when same unit was accessed in 
> concurrency.
> +program main
> +  use omp_lib
> +  implicit none
> +  integer:: i
> +  !$omp parallel private (i)
> +    do i = 1, 100
> +      open (10, file='tst.dat', asynchronous="yes")
> +      ! Delete the unit number from cache and unit list to stress write lock.
> +      close (10, status="delete")
> +    end do
> +  !$omp end parallel
> +end program

Reply via email to