[lock-free] Re: Experimental Read/Write Mutex..
Sorry for not working on this like I should be doing. Fwiw, I got distracted because several of my fractals are now on a very nice site: http://paulbourke.net/fractals

My work:
http://paulbourke.net/fractals/logspiral
http://paulbourke.net/fractals/multijulia
http://paulbourke.net/fractals/cubicjulia
http://paulbourke.net/fractals/fractionalpowers

The multi julia is very interesting, and creates some strange looking 3d renderings, even though the algorithm is 100% 2d in nature. Interested? To clarify, Paul used my special IFS algorithm to create the renderings. He used his own coloring algorithms.
[lock-free] Re: Experimental Read/Write Mutex..
On Saturday, March 2, 2019 at 7:28:28 AM UTC-8, Manuel Pöter wrote:
> On Friday, 1 March 2019 23:29:41 UTC+1, Chris M. Thomasson wrote:
>> On Friday, March 1, 2019 at 7:11:20 AM UTC-8, Manuel Pöter wrote:
>>> I could get my hands on an AMD EPYC 7351P system and a Xeon Phi (Knights Corner) if you are interested in more results...
>>
>> I would be grateful if you can find the time to do this Manuel: Thanks. :^)
>
> Will do! :)

Thanks. I am interested in all the results you can get.

>> Really like the "difference" between the Intel and SPARC results. Afaict, std::shared_mutex, on Solaris? Needs to be fixed? I do not like the way it seems to "tank" the reads and obtain a shi% load of writes. Humm...
>
> Yes, the SPARC system is running Solaris 11.3; the binaries were built using a self-compiled GCC 6.3. I have to admit that I did not really look at the results before posting them here, but now that you pointed it out I agree - the shared_mutex results on Solaris are definitely not what I would have expected!

Agreed. Very strange, massive writer preference. Humm...
[lock-free] Re: Experimental Read/Write Mutex..
I need to create another type of test, one that does not hammer things so much wrt creating an equal number of reader and writer threads. I need to basically model a "read-mostly" workload, instead of a hyper-hardcore read- and write-focused contention massacre...
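Not from the original post, but here is a minimal sketch of what such a read-mostly setup might look like: each worker randomly chooses a read or a write with a configurable write probability, rather than dedicating half of the threads to each role. The names (WRITE_PERCENT, g_rwmutex, shared_list) are illustrative assumptions, and std::shared_mutex stands in for whichever lock is under test.

// Hypothetical read-mostly worker sketch (not the actual benchmark code).
#include <shared_mutex>
#include <random>
#include <vector>

static std::shared_mutex g_rwmutex;        // stand-in for the mutex under test
static std::vector<int> shared_list(100);  // stand-in for the 100-node list

constexpr int WRITE_PERCENT = 5; // ~5% writes models a "read-mostly" load

void worker(unsigned iterations, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 99);

    for (unsigned i = 0; i < iterations; ++i)
    {
        if (dist(rng) < WRITE_PERCENT)
        {
            std::unique_lock<std::shared_mutex> w(g_rwmutex);
            shared_list[i % shared_list.size()] += 1; // mutate under the write lock
        }
        else
        {
            std::shared_lock<std::shared_mutex> r(g_rwmutex);
            volatile int sum = 0;
            for (int v : shared_list) sum += v;       // iterate under the read lock
        }
    }
}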
[lock-free] Re: Experimental Read/Write Mutex..
On Friday, March 1, 2019 at 7:11:20 AM UTC-8, Manuel Pöter wrote:
> I could get my hands on an AMD EPYC 7351P system and a Xeon Phi (Knights Corner) if you are interested in more results...

I would be grateful if you can find the time to do this Manuel: Thanks. :^)

Really like the "difference" between the Intel and SPARC results. Afaict, std::shared_mutex, on Solaris? Needs to be fixed? I do not like the way it seems to "tank" the reads and obtain a shi% load of writes. Humm...
[lock-free] Re: Experimental Read/Write Mutex..
On Wednesday, February 27, 2019 at 6:34:41 AM UTC-8, Manuel Pöter wrote:
> Benchmarked this on 2 systems with `THREADS` set to 1, 2 and 4.
>
> Result 1:
>
> 8x Intel(R) Xeon(R) CPU E7-8850 @ 2.00GHz (80 cores, 2x SMT)
> [...]

Okay. Thank you. So, my work is losing wrt reads, but performing more writes. This has to be an aspect of my algorithm's strict bakery-style fairness. Writers can never starve readers, and vice versa. It has no reader or writer preference.

> Result 2:
>
> 4x SPARC-T5-4 (64 cores, 8x SMT)
> [...]
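As a side note, not from the thread itself: "bakery-style" here refers to strict FIFO ticket ordering, where arrivals are serviced in the order they take a ticket, so neither readers nor writers can jump the queue. A generic illustration of that idea is a plain ticket lock; this sketch is only meant to show what strict arrival order looks like, it is not the ct_rwmutex algorithm.

// Generic FIFO ticket lock, purely to illustrate "bakery-style" fairness.
#include <atomic>
#include <thread>

struct ticket_lock
{
    std::atomic<unsigned long> m_next{0};    // next ticket to hand out
    std::atomic<unsigned long> m_serving{0}; // ticket currently being served

    void lock()
    {
        unsigned long my = m_next.fetch_add(1, std::memory_order_relaxed);
        while (m_serving.load(std::memory_order_acquire) != my)
            std::this_thread::yield(); // strict FIFO: no overtaking
    }

    void unlock()
    {
        m_serving.fetch_add(1, std::memory_order_release);
    }
};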
[lock-free] Re: Experimental Read/Write Mutex..
Benchmarked this on 2 systems with `THREADS` set to 1, 2 and 4.

Result 1: 8x Intel(R) Xeon(R) CPU E7-8850 @ 2.00GHz (80 cores, 2x SMT), cpu_threads_n = 160, test duration = 60 seconds

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
threads_n  writers  readers  Raw Reads  Raw Writes  reads_per_tick  writes_per_tick  Ticks
160        80       80       186803     1922        3111            32               60.0334
320        160      160      336265     2559        5596            42               60.0848
640        320      320      449283     3718        7471            61               60.1302

Testing Version 0.1: std::shared_mutex
threads_n  writers  readers  Raw Reads  Raw Writes  reads_per_tick  writes_per_tick  Ticks
160        80       80       191840     784         3194            13               60.0533
320        160      160      350020     1738        5826            28               60.0688
640        320      320      706867     1830        11752           30               60.1452

Result 2: 4x SPARC-T5-4 (64 cores, 8x SMT), cpu_threads_n = 512, test duration = 60 seconds

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
threads_n  writers  readers  Raw Reads  Raw Writes  reads_per_tick  writes_per_tick  Ticks
512        256      256      640255     7999        10650           133              60.1149
1024       512      512      948097     12602       15746           209              60.2094
2048       1024     1024     1718250    23019       28402           380              60.4974

Testing Version 0.1: std::shared_mutex
threads_n  writers  readers  Raw Reads  Raw Writes  reads_per_tick  writes_per_tick  Ticks
512        256      256      4482       2166488     74              36045            60.1037
1024       512      512      1536       2093636     25              34767            60.2185
2048       1024     1024     4096       2001034     67              33130            60.3978
Re: [lock-free] Re: Experimental Read/Write Mutex..
On Tue, Feb 26, 2019 at 11:47 PM Chris M. Thomasson wrote:
> On Tuesday, February 26, 2019 at 12:10:02 AM UTC-8, Dmitry Vyukov wrote:
>> On Wed, Feb 20, 2019 at 7:51 AM Chris M. Thomasson wrote:
>>> Fwiw, I wrote a crude new benchmark that measures how many reads and writes can be performed in a given amount of time. My algorithm vs std::shared_mutex. So, we are _primarily_ looking for how many reads can be performed in this test at 60 seconds. The number of threads is variable and is determined by std::hardware_concurrency * THREADS, set at 8 in the test. So on my system the setup is:
>>> ___
>>> cpu_threads_n = 4
>>> threads_n = 32
>>> writers = 16
>>> readers = 16
>>> test duration = 60 seconds
>>> ___
>> [...]
>>
>> Benchmarked this on 2 systems (3 alternating runs for each mutex):
>> [...]
>
> Thank you! :^)
>
>> Now it is your problem to interpret this :)
>
> std::shared_mutex might have a reader priority? Wondering if it is using a distributed algorithm on the larger system?
>
> The strict bakery style interleaving in my algorithm must be doing something "interesting" on the larger system. Mine seems to allow for some more writes in the data, too fair? The mutex aspect of my algorithm might be kicking in here. It is using Alexander Terekhov's algorithm from pthread_mutex_t in pthreads-win32, actually it can use any mutual exclusion algorithm for writer access:
>
> https://www.sourceware.org/pthreads-win32/
>
> Integrating writer access wrt ct_rwmutex::m_count should be beneficial...
>
>> Another interesting data point is time output.
>> On the large system for your mutex:
>> 848.76user 10.83system 1:00.26elapsed 1426%CPU
>>
>> for std mutex:
>> 4238.27user 26.79system 1:00.18elapsed 7086%CPU
>>
>> So whatever your mutex did, it used 5 times less CPU time for that.
>
> Bakery style, and the slow paths boiling down to condvar/mutex? I wonder what std::shared_mutex is using on your end when it has to wait? futex?

It's just a wrapper around pthread_rwlock and I have glibc 2.24.
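Not part of the original exchange, but since glibc's std::shared_mutex wraps pthread_rwlock, the reader/writer preference being discussed can at least be configured on a raw pthread_rwlock via the glibc-specific rwlockattr "kind" knob. A minimal sketch, assuming a plain pthread_rwlock_t rather than the std::shared_mutex wrapper:

// Sketch: selecting pthread_rwlock preference on glibc (non-portable _np API).
#include <pthread.h>

pthread_rwlock_t rw;

void init_writer_preferring_rwlock()
{
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);

    // glibc defaults to PTHREAD_RWLOCK_PREFER_READER_NP; this asks for
    // writer preference instead (recursive read locking is then disallowed).
    pthread_rwlockattr_setkind_np(&attr,
        PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);

    pthread_rwlock_init(&rw, &attr);
    pthread_rwlockattr_destroy(&attr);
}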
[lock-free] Re: Experimental Read/Write Mutex..
Fwiw, I wrote a crude new benchmark that measures how many reads and writes can be performed in a given amount of time. My algorithm vs std::shared_mutex. So, we are _primarily_ looking for how many reads can be performed in this test at 60 seconds. The number of threads is variable and is determined by std::hardware_concurrency * THREADS, set at 8 in the test. So on my system the setup is:
___
cpu_threads_n = 4
threads_n = 32
writers = 16
readers = 16
test duration = 60 seconds
___

This is going to be different for each system. Readers iterate a shared linked list with 100 nodes in this test. Writers pop and push items, and never use new or delete.

Well, so far, the timings I am getting on my end using MSVC 2017 are:

Testing: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 4
threads_n = 32
writers = 16
readers = 16
test duration = 60 seconds
___
___
Raw Reads: 18046464
Raw Writes: 347498
reads_per_tick = 300693
writes_per_tick = 5790
Ticks = 60.0161
___

Testing: std::shared_mutex
___
cpu_threads_n = 4
threads_n = 32
writers = 16
readers = 16
test duration = 60 seconds
___
___
Raw Reads: 650006
Raw Writes: 168741
reads_per_tick = 10829
writes_per_tick = 2811
Ticks = 60.0193
___

As you can see, my algorithm is performing many more reads than std::shared_mutex. Anyway, here is my code. When you get some time, can you please give it a go? I want to see timings on other systems: the good, bad, and ugly. ;^) Thanks.

https://pastebin.com/raw/1QtPCGhV
___
/* Crude Read/Write Mutex Benchmark

   This tests my algorithm ct_rwmutex vs. std::shared_mutex.
   It shows how many reads and writes can be performed within
   a fixed amount of time.

   by: Chris M. Thomasson
__*/


#include <thread>
#include <atomic>
#include <mutex>
#include <shared_mutex>
#include <condition_variable>
#include <iostream>
#include <functional>
#include <chrono>
#include <climits>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <vector>


// undefine to test std::shared_mutex
#define CT_TEST_FAST_MUTEX 1

#define THREADS 8 // multiplied by std::hardware_concurrency
#define NODE_PRIME 100 // number of nodes
#define CRUDE_CACHE_PAD 256 // crude cache pad
#define TEST_DURATION_SEC 60 // number of seconds for the test


// bare bones mutex/condvar based semaphore
struct ct_slow_semaphore
{
    unsigned long m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_slow_semaphore(unsigned long state) : m_state(state) {}

    void inc()
    {
        m_mutex.lock();
        ++m_state;
        m_mutex.unlock();
        m_cond.notify_one();
    }

    void add(unsigned long addend)
    {
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_state += addend;
        }
        m_cond.notify_all();
    }

    void dec()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_state == 0) m_cond.wait(lock);
        --m_state;
    }
};


// bin-sema
struct ct_auto_reset_event
{
    bool m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_auto_reset_event() : m_state(false) {}

    void signal()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_state = true;
        m_cond.notify_one();
    }

    void wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_state == false) m_cond.wait(lock);
        m_state = false; // auto-reset
    }
};


// just a layer over an auto-reset event
struct ct_fast_mutex
{
    std::atomic<unsigned int> m_state;
    ct_auto_reset_event m_waitset;

    ct_fast_mutex() : m_state(0) {}

    void lock()
    {
        if (m_state.exchange(1, std::memory_order_acquire))
        {
            while (m_state.exchange(2, std::memory_order_acquire))
            {
                m_waitset.wait();
            }
        }
    }

    void unlock()
    {
        if (m_state.exchange(0, std::memory_order_release) == 2)
        {
            m_waitset.signal();
        }
    }
};


// Chris M. Thomassons Experimental Read/Write Mutex
// Yeah, it is pretty damn fat wrt the state, however
// it has some interesting properties...
// The state can be compressed a bit...
// btw, it has no loops...
// Take a look at the lock_shared and unlock_shared functions

// The number of readers _must_ never exceed LONG_MAX!

#define RWMUTEX_COUNT_MAX LONG_MAX

struct ct_rwmutex
{
    unsigned char m_crude_cache_pad_0[CRUDE_CACHE_PAD];

    // shared state
    std::atomic<long> m_count;
    unsigned char m_crude_cache_pad_1[CRUDE_CACHE_PAD];

    std::atomic<long> m_rdwake;
    unsigned char
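The listing above is cut off in the archive before the actual test harness. Based on the description in the post (readers iterate a 100-node shared list, writers pop and push nodes without new or delete), a reader/writer loop pair might look roughly like the following. This is a sketch, not the code behind the pastebin link; the node, simple_list, and g_rwmutex names are assumptions, and std::shared_mutex merely stands in for the lock under test.

// Hypothetical sketch of the reader/writer loops described above.
#include <atomic>
#include <shared_mutex>

struct node { node* m_next = nullptr; unsigned int m_tid = 0; };

struct simple_list
{
    node* m_head = nullptr;
    void push(node* n) { n->m_next = m_head; m_head = n; }
    node* pop() { node* n = m_head; if (n) m_head = n->m_next; return n; }
};

// Stand-ins: the real benchmark presumably uses ct_rwmutex here.
static std::shared_mutex g_rwmutex;
static simple_list g_shared;

void reader_loop(std::atomic<bool>& run, unsigned long& raw_reads)
{
    while (run.load(std::memory_order_relaxed))
    {
        g_rwmutex.lock_shared();
        for (node* n = g_shared.m_head; n != nullptr; n = n->m_next)
        {
            // just walk the 100-node list under the read lock
        }
        g_rwmutex.unlock_shared();
        ++raw_reads;
    }
}

void writer_loop(std::atomic<bool>& run, unsigned long& raw_writes)
{
    while (run.load(std::memory_order_relaxed))
    {
        g_rwmutex.lock();
        node* n = g_shared.pop();   // pop and push an existing node:
        if (n) g_shared.push(n);    // mutates the list, never new/delete
        g_rwmutex.unlock();
        ++raw_writes;
    }
}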
[lock-free] Re: Experimental Read/Write Mutex..
On Monday, February 18, 2019 at 8:50:21 PM UTC-8, Chris M. Thomasson wrote:
> Fwiw, here is a more realistic benchmark, in the Relacy stage for verification purposes:
>
> https://pastebin.com/raw/j4wrg050

Fwiw, I have almost completed my new benchmark in pure C++11. The timings that I am getting measure a different thing: how many reads and writes can be completed within a given time frame. Well, here are some initial results:

Testing: Chris M. Thomasson's Experimental Read/Write Mutex
Raw Reads: 3086724
Raw Writes: 59651
reads_per_tick = 308309
writes_per_tick = 5958
Ticks = 10.0118

Testing: std::shared_mutex
Raw Reads: 110922
Raw Writes: 29502
reads_per_tick = 11071
writes_per_tick = 2944
Ticks = 10.019

For a simple test of 10 seconds, my algorithm allows for 3086724 reads and 59651 writes to be completed, while std::shared_mutex allows for only 110922 reads and 29502 writes. Damn. Look at the difference in the read rate per tick! Wow.

I can't wait for you, and others, to test it and tell me the results, good or bad. Heck, std::shared_mutex should beat my work, wouldn't ya think? Humm...
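For orientation, the per-tick figures above are just the raw counts divided by the elapsed wall-clock time (e.g. 3086724 / 10.0118 is roughly 308309). A minimal sketch of a reporting helper that could produce output in this shape follows; it is an assumption, not the actual harness, and the counter plumbing is left out.

// Minimal sketch of a reporting helper for figures like those above.
#include <chrono>
#include <cstdio>

void report(unsigned long raw_reads, unsigned long raw_writes,
            std::chrono::steady_clock::time_point start,
            std::chrono::steady_clock::time_point stop)
{
    double ticks = std::chrono::duration<double>(stop - start).count();

    std::printf("Raw Reads: %lu\n", raw_reads);
    std::printf("Raw Writes: %lu\n", raw_writes);
    std::printf("reads_per_tick = %lu\n", (unsigned long)(raw_reads / ticks));
    std::printf("writes_per_tick = %lu\n", (unsigned long)(raw_writes / ticks));
    std::printf("Ticks = %g\n", ticks);
}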
[lock-free] Re: Experimental Read/Write Mutex..
Fwiw, here is a more realistic benchmark, in the Relacy stage for verification purposes:

https://pastebin.com/raw/j4wrg050

Looking good, and passing tests so far. Here is the code, using Relacy:
___
// Experimental Read-Write Mutex Test
// More "realistic" test, in Relacy...
// By: Chris M. Thomasson
//___


//#define RL_DEBUGBREAK_ON_ASSERT
//#define RL_MSVC_OUTPUT
//#define RL_FORCE_SEQ_CST
//#define RL_GC


#include <relacy/relacy_std.hpp>
#include <climits>


// Simple macro based redirection of the verbose std membars.
#define CT_MB_ACQ std::memory_order_acquire
#define CT_MB_REL std::memory_order_release
#define CT_MB_RLX std::memory_order_relaxed
#define CT_MB_ACQ_REL std::memory_order_acq_rel
#define CT_MB_SEQ_CST std::memory_order_seq_cst
#define CT_MB(mp_0) std::atomic_thread_fence(mp_0)

#define READERS 4
#define WRITERS 2
#define THREADS (READERS + WRITERS)
#define ITERS 3


// bare bones mutex/condvar based semaphore
struct ct_slow_semaphore
{
    VAR_T(unsigned long) m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_slow_semaphore(unsigned long state) : m_state(state) {}

    void inc()
    {
        m_mutex.lock($);
        ++VAR(m_state);
        m_mutex.unlock($);
        m_cond.notify_one($);
    }

    void add(unsigned long addend)
    {
        m_mutex.lock($);
        VAR(m_state) += addend;
        m_mutex.unlock($);
        m_cond.notify_all($);
    }

    void dec()
    {
        m_mutex.lock($);
        while (VAR(m_state) == 0) m_cond.wait(m_mutex, $);
        --VAR(m_state);
        m_mutex.unlock($);
    }
};


// bin-sema
struct ct_auto_reset_event
{
    bool m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_auto_reset_event() : m_state(false) {}

    void signal()
    {
        m_mutex.lock($);
        m_state = true;
        m_cond.notify_one($);
        m_mutex.unlock($);
    }

    void wait()
    {
        m_mutex.lock($);
        while (m_state == false) m_cond.wait(m_mutex, $);
        m_state = false; // auto-reset
        m_mutex.unlock($);
    }
};


// just a layer over an auto-reset event
struct ct_fast_mutex
{
    std::atomic<unsigned int> m_state;
    ct_auto_reset_event m_waitset;

    ct_fast_mutex() : m_state(0) {}

    void lock()
    {
        if (m_state.exchange(1, std::memory_order_acquire))
        {
            while (m_state.exchange(2, std::memory_order_acquire))
            {
                m_waitset.wait();
            }
        }
    }

    void unlock()
    {
        if (m_state.exchange(0, std::memory_order_release) == 2)
        {
            m_waitset.signal();
        }
    }
};


// Chris M. Thomassons Experimental Read/Write Mutex
// Yeah, it is pretty damn fat wrt the state, however
// it has some interesting properties...
// The state can be compressed a bit...
// btw, it has no loops...
// Take a look at the lock_shared and unlock_shared functions

#define RWMUTEX_COUNT_MAX LONG_MAX

struct ct_rwmutex
{
    // shared state
    std::atomic<long> m_wrstate;
    std::atomic<long> m_count;
    std::atomic<long> m_rdwake;

    ct_slow_semaphore m_rdwset;
    ct_slow_semaphore m_wrwset;
    ct_fast_mutex m_wrlock;

    ct_rwmutex() :
        m_wrstate(1),
        m_count(RWMUTEX_COUNT_MAX),
        m_rdwake(0),
        m_rdwset(0),
        m_wrwset(0)
    {
    }

    // READ, pretty slim...
    void lock_shared()
    {
        if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
        {
            m_rdwset.dec();
        }
    }

    void unlock_shared()
    {
        if (m_count.fetch_add(1, std::memory_order_release) < 0)
        {
            if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1)
            {
                m_wrwset.inc();
            }
        }
    }

    // WRITE, more hefty
    void lock()
    {
        m_wrlock.lock();

        long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX, std::memory_order_acquire);

        if (count < RWMUTEX_COUNT_MAX)
        {
            long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count, std::memory_order_acquire);

            if (rdwake + RWMUTEX_COUNT_MAX - count)
            {
                m_wrwset.dec();
            }
        }
    }

    // write unlock
    void unlock()
    {
        long count = m_count.fetch_add(RWMUTEX_COUNT_MAX, std::memory_order_release);

        if (count < 0)
        {
            m_rdwset.add(-count);
        }

        m_wrlock.unlock();
    }
};


struct ct_simple_stack
{
    struct node
    {
        VAR_T(node*) m_next;
        VAR_T(unsigned int) m_tid;

        node(unsigned int tid) : m_tid(tid) {}
    };

    VAR_T(node*) m_head;

    ct_simple_stack() : m_head(nullptr) {}

    void push(node* n)
    {
        VAR(n->m_next) = VAR(m_head);
        VAR(m_head) = n;
    }

    node* pop()
    {
        node* n = VAR(m_head);

        if (n)
        {
            VAR(m_head) = VAR(n->m_next);
        }
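The listing is truncated in the archive at this point: pop() is cut off mid-function and the Relacy test driver is missing. For completeness, a Relacy driver for this kind of test usually follows the rl::test_suite pattern sketched below. This is an assumption about what the missing part might look like, not the code from the pastebin link; the node counts, iteration count, and class name are made up for illustration.

// Sketch of a Relacy test driver (assumed, not from the original pastebin).
// Readers walk the stack under lock_shared; writers pop/push under lock.
struct ct_rwmutex_relacy_test : rl::test_suite<ct_rwmutex_relacy_test, THREADS>
{
    ct_rwmutex g_rwmutex;
    ct_simple_stack g_stack;

    void before()
    {
        for (unsigned i = 0; i < 5; ++i)
            g_stack.push(new ct_simple_stack::node(i));
    }

    void thread(unsigned tidx)
    {
        for (unsigned i = 0; i < ITERS; ++i)
        {
            if (tidx < READERS)
            {
                g_rwmutex.lock_shared();
                for (ct_simple_stack::node* n = VAR(g_stack.m_head);
                     n != nullptr; n = VAR(n->m_next)) { /* read-only walk */ }
                g_rwmutex.unlock_shared();
            }
            else
            {
                g_rwmutex.lock();
                ct_simple_stack::node* n = g_stack.pop();
                if (n) g_stack.push(n);
                g_rwmutex.unlock();
            }
        }
    }

    void after()
    {
        ct_simple_stack::node* n;
        while ((n = g_stack.pop())) delete n;
    }
};

int main()
{
    rl::test_params p;
    p.iteration_count = 1000000;
    rl::simulate<ct_rwmutex_relacy_test>(p);
    return 0;
}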