On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote:
Hi!

[please do not top-post]

Sure, thanks for the reminder.

On Thu, 20 Apr 2023 21:13:08 +0800
"Zhu, Lipeng" <lipeng....@intel.com> wrote:

Hi Bernhard,

Thanks for your questions and suggestions.
The rwlock could allow multiple threads to have concurrent read-only
access to the cache/unit list, only a single writer is allowed.

right.

A write lock will not be acquired until all read locks are released.

So I must have confused rwlock with something else, something that allows self to hold a read-lock and upgrade that to a write-lock, purposely starving all successive incoming readers. I.e. just toggle your RD_TO_WRLOCK impl, here, atomically. This proved to be beneficial in some situations in the past.
Doesn't seem to work with your rwlock, does it?

The pthread API doesn't support upgrading a rdlock to a wrlock directly: the calling thread may deadlock if, at the time the call is made, it already holds the read-write lock (whether for reading or writing), see https://linux.die.net/man/3/pthread_rwlock_wrlock. That is why we implement RD_TO_WRLOCK like this: release the read lock first, then acquire the write lock.

And I didn't change the mutex scope when refactoring the code, only made a more fine-grained distinction between reads and writes of the cache/unit list.

Yes of course, i can see you did that.

I completed the comment according to your template; I will insert it into the source code in the next version of the patch, along with other refinements based on your suggestions.
"
Now we did not get a unit in cache and unit list, so we need to create
a new unit, and update it to cache and unit list.

s/Now/By now/ or s/Now w/W/ and s/get/find/ "
We did not find a unit in the cache nor in the unit list, create a new
(locked) unit and insert into the unit list and cache.
Manipulating either or both the unit list and the unit cache requires to hold a write-lock [for obvious reasons]"

Superfluous when talking about pthread_rwlock_wrlock since that implies that even the process acquiring the wrlock has to first release its very own rdlock.

Thanks for the correction, I will update the comments in the next v4 patch.

Prior to updating the cache and list, we need to release all read locks and then immediately acquire the write lock, thus ensuring an exclusive update of the cache and unit list.
Either way, we will manipulate the cache and/or the unit list so we
must take a write lock now.
We don't take the write bit in *addition* to the read lock because:
1. It will needlessly complicate releasing the respective lock;

Under pthread_rwlock_wrlock it will deadlock, so that's wrong?
Drop that point then? If not, what's your reasoning / observation?

Under my lock, you hold the R, additionally take the W and then immediately release the R because you yourself won't read, just write.
But mine's not the pthread_rwlock you talk about, admittedly.

Yes, pthread_rwlock doesn't support upgrading a rdlock to a wrlock directly, so we can't hold the rdlock while trying to acquire the wrlock. I will drop the first point and update the comment accordingly.

2. By separating the read and write locks, we greatly reduce contention on the read side, while the write side is rarely needed once the unit hits in the cache;

We know that.

3. We try to balance implementation complexity against the performance gains in the cases we have observed.

.. by just using a pthread_rwlock. And that's the top point iff you keep it at that. That's a fair step, sure.
BTW, did you look at the RELEASE semantics, respectively the note that one day (and now is that very day), we might improve on the release semantics? Can of course be incremental AFAIC

I don't quite understand your question about looking at the RELEASE semantics. Can you be more specific?

"

If folks agree on this first step then you have my OK with a catchy malloc and the discussion recorded here on the list. A second step would be RELEASE.
And, depending on the underlying capabilities of available locks, further tweaks, obviously.

PS: and, please, don't top-post

thanks,


Best Regards,
Zhu, Lipeng

On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote:
On 19 April 2023 09:06:28 CEST, Lipeng Zhu via Fortran <fortran@gcc.gnu.org> wrote:
This patch tries to introduce the rwlock and split the read/write access to the unit_root tree and unit_cache, using a rwlock instead of the mutex, to increase CPU efficiency. In the get_gfc_unit function, the percentage of calls that step into the insert_unit function is around 30%; in most instances we can get the unit in the phase of reading the unit_cache or unit_root tree. So splitting the read/write phases with a rwlock is an approach to make it more parallel.

BTW, the IPC metrics gain around 9x on our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT
+#define RD_TO_WRLOCK(rwlock) \
+  RWUNLOCK (rwlock);\
+  WRLOCK (rwlock);
+#endif
+

diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
index 82664dc5f98..4312c5f36de 100644
--- a/libgfortran/io/unit.c
+++ b/libgfortran/io/unit.c
@@ -329,7 +335,7 @@ get_gfc_unit (int n, int do_create)
    int c, created = 0;

    NOTE ("Unit n=%d, do_create = %d", n, do_create);
-  LOCK (&unit_lock);
+  RDLOCK (&unit_rwlock);

retry:
    for (c = 0; c < CACHE_SIZE; c++)
@@ -350,6 +356,7 @@ retry:
        if (c == 0)
        break;
      }
+  RD_TO_WRLOCK (&unit_rwlock);

So I'm trying to convince myself why it's safe to unlock and only then take the write lock.

Can you please elaborate/confirm why that's ok?

I wouldn't mind a comment like:
We can release the unit and cache read lock now. We might have to allocate a (locked) unit, below in do_create.
Either way, we will manipulate the cache and/or the unit list so we have to take a write lock now.

We don't take the write bit in *addition* to the read lock because..

(that needlessly complicates releasing the respective locks / it triggers too much contention when we..
/ ...?)

thanks,

    if (p == NULL && do_create)
      {
@@ -368,8 +375,8 @@ retry:
    if (created)
      {
        /* Newly created units have their lock held already
-        from insert_unit.  Just unlock UNIT_LOCK and return.  */
-      UNLOCK (&unit_lock);
+        from insert_unit.  Just unlock UNIT_RWLOCK and return.  */
+      RWUNLOCK (&unit_rwlock);
        return p;
      }

@@ -380,7 +387,7 @@ found:
        if (! TRYLOCK (&p->lock))
        {
          /* assert (p->closed == 0); */
-         UNLOCK (&unit_lock);
+         RWUNLOCK (&unit_rwlock);
          return p;
        }

@@ -388,14 +395,14 @@ found:
      }


-  UNLOCK (&unit_lock);
+  RWUNLOCK (&unit_rwlock);

    if (p != NULL && (p->child_dtio == 0))
      {
        LOCK (&p->lock);
        if (p->closed)
        {
-         LOCK (&unit_lock);
+         WRLOCK (&unit_rwlock);
          UNLOCK (&p->lock);
          if (predec_waiting_locked (p) == 0)
            destroy_unit_mutex (p);
@@ -593,8 +600,8 @@ init_units (void)
#endif
#endif

-#ifndef __GTHREAD_MUTEX_INIT
-  __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock);
+#if (!defined(__GTHREAD_RWLOCK_INIT) && !defined(__GTHREAD_MUTEX_INIT))
+  __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock);
#endif

    if (sizeof (max_offset) == 8)
