[lock-free] Re: Experimental Read/Write Mutex..

2019-03-15 Thread Chris M. Thomasson
Sorry for not working on this like I should be doing. Fwiw, I got 
distracted because several of my fractals are now on a very nice site:

http://paulbourke.net/fractals

My work:

http://paulbourke.net/fractals/logspiral

http://paulbourke.net/fractals/multijulia

http://paulbourke.net/fractals/cubicjulia

http://paulbourke.net/fractals/fractionalpowers

The multi julia is very interesting, and creates some strange-looking 3d 
renderings, even though the algorithm is 100% 2d in nature. Interested? 

To clarify, Paul used my special IFS algorithm to create the renderings. He 
used his own coloring algorithms. 



[lock-free] Re: Experimental Read/Write Mutex..

2019-03-04 Thread Chris M. Thomasson


On Saturday, March 2, 2019 at 7:28:28 AM UTC-8, Manuel Pöter wrote:
>
>
>
> On Friday, 1 March 2019 23:29:41 UTC+1, Chris M. Thomasson wrote:
>>
>> On Friday, March 1, 2019 at 7:11:20 AM UTC-8, Manuel Pöter wrote:
>>>
>>> I could get my hands on an AMD EPYC 7351P system and a Xeon Phi (Knights 
>>> Corner) if you are interested in more results...
>>>
>>
>> I would be grateful if you can find the time to do this, Manuel. Thanks. 
>> :^)
>>
>  
> Will do! :) 
>

Thanks. I am interested in all the results you can get.
 

>
>> Really like the "difference" between the Intel and SPARC results. Afaict, 
>> std::shared_mutex on Solaris needs to be fixed? I do not like the way it 
>> seems to "tank" the reads and push through a huge number of writes. Humm...
>>
>
> Yes, the SPARC system is running Solaris 11.3; the binaries were built 
> using a self-compiled GCC 6.3. I have to admit that I did not really look 
> at the results before posting them here, but now that you pointed it out I 
> agree - the shared_mutex results on Solaris are definitely not what I would 
> have expected!
>

Agreed. Very strange, massive writer preference. Humm...



[lock-free] Re: Experimental Read/Write Mutex..

2019-03-02 Thread Chris M. Thomasson
I need to create another type of test, one that does not hammer things so 
much by creating an equal number of reader and writer threads. Basically, I 
need to model a "read-mostly" workload instead of a hyper-hardcore, 
read-and-write-focused contention massacre...
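
One way to model it, as a rough sketch (the 90/10 split, the names, and the 
use of std::shared_mutex as a stand-in lock are placeholders, not the actual 
new benchmark): each thread flips a biased coin per iteration instead of 
being pinned as a pure reader or a pure writer.

// Rough sketch of a read-mostly worker (illustrative only).
#include <atomic>
#include <mutex>
#include <random>
#include <shared_mutex>

void read_mostly_worker(std::shared_mutex& lock,
                        std::atomic<bool>& run,
                        unsigned long& reads,
                        unsigned long& writes)
{
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<int> pct(0, 99);

    while (run.load(std::memory_order_relaxed))
    {
        if (pct(rng) < 90)
        {
            // read side: iterate the shared structure here...
            std::shared_lock<std::shared_mutex> read_guard(lock);
            ++reads;
        }
        else
        {
            // write side: mutate the shared structure here...
            std::unique_lock<std::shared_mutex> write_guard(lock);
            ++writes;
        }
    }
}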



[lock-free] Re: Experimental Read/Write Mutex..

2019-03-01 Thread Chris M. Thomasson
On Friday, March 1, 2019 at 7:11:20 AM UTC-8, Manuel Pöter wrote:
>
> I could get my hands on an AMD EPYC 7351P system and a Xeon Phi (Knights 
> Corner) if you are interested in more results...
>

I would be grateful if you can find the time to do this, Manuel. Thanks. :^)

Really like the "difference" between the Intel and SPARC results. Afaict, 
std::shared_mutex on Solaris needs to be fixed? I do not like the way it 
seems to "tank" the reads and push through a huge number of writes. Humm...



[lock-free] Re: Experimental Read/Write Mutex..

2019-02-28 Thread Chris M. Thomasson
On Wednesday, February 27, 2019 at 6:34:41 AM UTC-8, Manuel Pöter wrote:
>
> Benchmarked this on 2 systems with `THREADS` set to 1, 2 and 4.
>
> Result 1:
>
> 8x Intel(R) Xeon(R) CPU E7- 8850  @ 2.00GHz (80 cores, 2x SMT)
>
> Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
> ___
> cpu_threads_n = 160
> threads_n = 160
> writers = 80
> readers = 80
> test duration = 60 seconds
> ___
> ___
> Raw Reads: 186803
> Raw Writes: 1922
> reads_per_tick = 3111
> writes_per_tick = 32
> Ticks = 60.0334
> ___
>
> Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
> ___
> cpu_threads_n = 160
> threads_n = 320
> writers = 160
> readers = 160
> test duration = 60 seconds
> ___
> ___
> Raw Reads: 336265
> Raw Writes: 2559
> reads_per_tick = 5596
> writes_per_tick = 42
> Ticks = 60.0848
> ___
>
> Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
> ___
> cpu_threads_n = 160
> threads_n = 640
> writers = 320
> readers = 320
> test duration = 60 seconds
> ___
> ___
> Raw Reads: 449283
> Raw Writes: 3718
> reads_per_tick = 7471
> writes_per_tick = 61
> Ticks = 60.1302
> ___
>
> Testing Version 0.1: std::shared_mutex
> ___
> cpu_threads_n = 160
> threads_n = 160
> writers = 80
> readers = 80
> test duration = 60 seconds
> ___
> ___
> Raw Reads: 191840
> Raw Writes: 784
> reads_per_tick = 3194
> writes_per_tick = 13
> Ticks = 60.0533
> ___
>
> Testing Version 0.1: std::shared_mutex
> ___
> cpu_threads_n = 160
> threads_n = 320
> writers = 160
> readers = 160
> test duration = 60 seconds
> ___
> ___
> Raw Reads: 350020
> Raw Writes: 1738
> reads_per_tick = 5826
> writes_per_tick = 28
> Ticks = 60.0688
> ___
>
> Testing Version 0.1: std::shared_mutex
> ___
> cpu_threads_n = 160
> threads_n = 640
> writers = 320
> readers = 320
> test duration = 60 seconds
> ___
> ___
> Raw Reads: 706867
> Raw Writes: 1830
> reads_per_tick = 11752
> writes_per_tick = 30
> Ticks = 60.1452
> ___
>


Okay. Thank you. So, my work is losing wrt reads, but performing more 
writes (e.g., at 160 threads on the Xeon: 3111 vs 3194 reads_per_tick, but 
32 vs 13 writes_per_tick). This has to be an aspect of my algorithm's strict 
bakery-style fairness: writers can never starve readers, and vice versa. It 
has no reader or writer preference.

 

>
> Result 2:
>
> 4x SPARC-T5-4 (64 cores, 8x SMT)
>
> Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
> ___
> cpu_threads_n = 512
> threads_n = 512
> writers = 256
> readers = 256
> test duration = 60 seconds
> ___
> ___
> Raw Reads: 640255
> Raw Writes: 7999
> reads_per_tick = 10650
> writes_per_tick = 133
> Ticks = 60.1149
> ___
>
> Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
> ___
> cpu_threads_n = 512
> threads_n = 1024
> writers = 512
> readers = 512
> test duration = 60 seconds
> ___
> ___
> Raw Reads: 948097
> Raw Writes: 12602
> reads_per_tick = 15746
> writes_per_tick = 209
> Ticks = 60.2094
> ___
>
> Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
> ___
> cpu_threads_n = 512
> threads_n = 2048
> writers = 1024
> readers = 1024
> test duration = 60 seconds
> ___
> ___
> Raw Reads: 1718250
> Raw Writes: 23019
> reads_per_tick = 28402
> writes_per_tick = 380
> Ticks = 60.4974
> ___
>
>
> Testing Version 0.1: std::shared_mutex
> ___
> cpu_threads_n = 512
> threads_n = 512
> writers = 256
> readers = 256
> test duration = 60 seconds
> ___
> ___
> Raw Reads: 4482
> Raw Writes: 2166488
> reads_per_tick = 74
> writes_per_tick = 36045
> Ticks = 60.1037
> ___
>
> Testing Version 0.1: std::shared_mutex
> ___
> cpu_threads_n = 512
> threads_n = 1024
> writers = 512
> readers = 512
> test duration = 60 seconds
> ___
> ___
> Raw 

[lock-free] Re: Experimental Read/Write Mutex..

2019-02-27 Thread Manuel Pöter
Benchmarked this on 2 systems with `THREADS` set to 1, 2 and 4.

Result 1:

8x Intel(R) Xeon(R) CPU E7- 8850  @ 2.00GHz (80 cores, 2x SMT)

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 160
threads_n = 160
writers = 80
readers = 80
test duration = 60 seconds
___
___
Raw Reads: 186803
Raw Writes: 1922
reads_per_tick = 3111
writes_per_tick = 32
Ticks = 60.0334
___

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 160
threads_n = 320
writers = 160
readers = 160
test duration = 60 seconds
___
___
Raw Reads: 336265
Raw Writes: 2559
reads_per_tick = 5596
writes_per_tick = 42
Ticks = 60.0848
___

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 160
threads_n = 640
writers = 320
readers = 320
test duration = 60 seconds
___
___
Raw Reads: 449283
Raw Writes: 3718
reads_per_tick = 7471
writes_per_tick = 61
Ticks = 60.1302
___

Testing Version 0.1: std::shared_mutex
___
cpu_threads_n = 160
threads_n = 160
writers = 80
readers = 80
test duration = 60 seconds
___
___
Raw Reads: 191840
Raw Writes: 784
reads_per_tick = 3194
writes_per_tick = 13
Ticks = 60.0533
___

Testing Version 0.1: std::shared_mutex
___
cpu_threads_n = 160
threads_n = 320
writers = 160
readers = 160
test duration = 60 seconds
___
___
Raw Reads: 350020
Raw Writes: 1738
reads_per_tick = 5826
writes_per_tick = 28
Ticks = 60.0688
___

Testing Version 0.1: std::shared_mutex
___
cpu_threads_n = 160
threads_n = 640
writers = 320
readers = 320
test duration = 60 seconds
___
___
Raw Reads: 706867
Raw Writes: 1830
reads_per_tick = 11752
writes_per_tick = 30
Ticks = 60.1452
___

Result 2:

4x SPARC-T5-4 (64 cores, 8x SMT)

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 512
threads_n = 512
writers = 256
readers = 256
test duration = 60 seconds
___
___
Raw Reads: 640255
Raw Writes: 7999
reads_per_tick = 10650
writes_per_tick = 133
Ticks = 60.1149
___

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 512
threads_n = 1024
writers = 512
readers = 512
test duration = 60 seconds
___
___
Raw Reads: 948097
Raw Writes: 12602
reads_per_tick = 15746
writes_per_tick = 209
Ticks = 60.2094
___

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 512
threads_n = 2048
writers = 1024
readers = 1024
test duration = 60 seconds
___
___
Raw Reads: 1718250
Raw Writes: 23019
reads_per_tick = 28402
writes_per_tick = 380
Ticks = 60.4974
___


Testing Version 0.1: std::shared_mutex
___
cpu_threads_n = 512
threads_n = 512
writers = 256
readers = 256
test duration = 60 seconds
___
___
Raw Reads: 4482
Raw Writes: 2166488
reads_per_tick = 74
writes_per_tick = 36045
Ticks = 60.1037
___

Testing Version 0.1: std::shared_mutex
___
cpu_threads_n = 512
threads_n = 1024
writers = 512
readers = 512
test duration = 60 seconds
___
___
Raw Reads: 1536
Raw Writes: 2093636
reads_per_tick = 25
writes_per_tick = 34767
Ticks = 60.2185
___

Testing Version 0.1: std::shared_mutex
___
cpu_threads_n = 512
threads_n = 2048
writers = 1024
readers = 1024
test duration = 60 seconds
___
___
Raw Reads: 4096
Raw Writes: 2001034
reads_per_tick = 67
writes_per_tick = 33130
Ticks = 60.3978
___


Re: [lock-free] Re: Experimental Read/Write Mutex..

2019-02-27 Thread Dmitry Vyukov
On Tue, Feb 26, 2019 at 11:47 PM Chris M. Thomasson  wrote:
>
> On Tuesday, February 26, 2019 at 12:10:02 AM UTC-8, Dmitry Vyukov wrote:
>>
>> On Wed, Feb 20, 2019 at 7:51 AM Chris M. Thomasson  
>> wrote:
>> >
>> > Fwiw, I wrote a crude new benchmark that measures how many reads and 
>> > writes can be performed in a given amount of time: my algorithm vs 
>> > std::shared_mutex. So, we are _primarily_ looking for how many reads can 
>> > be performed in this test at 60 seconds. The number of threads is variable 
>> > and is determined by std::thread::hardware_concurrency() * THREADS, with 
>> > THREADS set to 8 in the test. So on my system the setup is:
>> > ___
>> > cpu_threads_n = 4
>> > threads_n = 32
>> > writers = 16
>> > readers = 16
>> > test duration = 60 seconds
>> > ___
>> [...]
>>
>> Benchmarked this on 2 systems (3 alternating runs for each mutex):
>>
>> [...]
>
>
> Thank you! :^)
>
>>
>>
>> Now it is your problem to interpret this :)
>
>
> std::shared_mutex might have a reader priority? Wondering if it is using a 
> distributed algorithm on the larger system?
>
> The strict bakery-style interleaving in my algorithm must be doing something 
> "interesting" on the larger system. Mine seems to allow for some more writes 
> in the data; maybe it is too fair? The mutex aspect of my algorithm might be 
> kicking in here. It is using Alexander Terekhov's algorithm from 
> pthread_mutex_t in pthreads-win32; actually, it can use any mutual exclusion 
> algorithm for writer access:
>
> https://www.sourceware.org/pthreads-win32/
>
> Integrating writer access wrt ct_rwmutex::m_count should be beneficial...
>
>
>
>>
>>
>> Another interesting data point is time output.
>> On the large system for your mutex:
>> 848.76user 10.83system 1:00.26elapsed 1426%CPU
>>
>> for std mutex:
>> 4238.27user 26.79system 1:00.18elapsed 7086%CPU
>>
>> So whatever your mutex did, it used 5 times less CPU time for that.
>
>
> Bakery style, and the slow paths boiling down to condvar/mutex? I wonder what 
> std::shared_mutex is using on your end when it has to wait? futex?

It's just a wrapper around pthread_rwlock and I have glibc 2.24.
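
Fwiw, glibc's pthread_rwlock defaults to reader preference, and the only way 
to change that is the non-portable rwlockattr "kind", which std::shared_mutex 
does not expose. A minimal sketch, assuming glibc and its *_NP extensions 
(separate from the benchmark code):

// Hypothetical probe: build the underlying pthread_rwlock_t by hand with an
// explicit writer-preferring kind, to compare against the glibc default.
#include <pthread.h>

int make_writer_preferring_rwlock(pthread_rwlock_t* rw)
{
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setkind_np(
        &attr, PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    int rc = pthread_rwlock_init(rw, &attr);
    pthread_rwlockattr_destroy(&attr);
    return rc;
}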



[lock-free] Re: Experimental Read/Write Mutex..

2019-02-19 Thread Chris M. Thomasson
Fwiw, I wrote a crude new benchmark that measures how many reads and writes 
can be performed in a given amount of time: my algorithm vs 
std::shared_mutex. So, we are _primarily_ looking for how many reads can be 
performed in this test at 60 seconds. The number of threads is variable and 
is determined by std::thread::hardware_concurrency() * THREADS, with THREADS 
set to 8 in the test. So on my system the setup is: 
___ 
cpu_threads_n = 4 
threads_n = 32 
writers = 16 
readers = 16 
test duration = 60 seconds 
___ 

This is going to be different for each system. Readers iterate a shared 
linked list with 100 nodes in this test. Writers pop and push items, 
and never use new or delete. Well, so far, the timings I am getting on my 
end using MSVC 2017 are: 


Testing: Chris M. Thomasson's Experimental Read/Write Mutex 

___ 
cpu_threads_n = 4 
threads_n = 32 
writers = 16 
readers = 16 
test duration = 60 seconds 
___ 


___ 
Raw Reads: 18046464 
Raw Writes: 347498 
reads_per_tick = 300693 
writes_per_tick = 5790 
Ticks = 60.0161 
___ 



Testing: std::shared_mutex 

___ 
cpu_threads_n = 4 
threads_n = 32 
writers = 16 
readers = 16 
test duration = 60 seconds 
___ 


___ 
Raw Reads: 650006 
Raw Writes: 168741 
reads_per_tick = 10829 
writes_per_tick = 2811 
Ticks = 60.0193 
___ 



As you can see, my algorithm is performing many more reads than 
std::shared_mutex. Anyway, here is my code: 

When you get some time, can you please give it a go? I want to see timings 
on other systems, the good, bad and ugly. ;^) Thanks. 

https://pastebin.com/raw/1QtPCGhV 
___ 
/* Crude Read/Write Mutex Benchmark 

   This tests my algorithm ct_rwmutex vs. std::shared_mutex. 

   It shows how many reads and writes can be performed within 
   a fixed amount of time. 

   by: Chris M. Thomasson 
__*/ 



#include <iostream> 
#include <atomic> 
#include <shared_mutex> 
#include <mutex> 
#include <condition_variable> 
#include <thread> 
#include <vector> 
#include <chrono> 
#include <climits> 
#include <cstdlib> 


// undefine to test std::shared_mutex 
#define CT_TEST_FAST_MUTEX 1 


#define THREADS 8 // multiplied by std::hardware_concurrency 
#define NODE_PRIME 100 // number of nodes 
#define CRUDE_CACHE_PAD 256 // crude cache pad 
#define TEST_DURATION_SEC 60 // number of seconds for the test 


// bare bones mutex/condvar based semaphore 
struct ct_slow_semaphore 
{ 
unsigned long m_state; 
std::mutex m_mutex; 
std::condition_variable m_cond; 

ct_slow_semaphore(unsigned long state) : m_state(state) {} 

void inc() 
{ 
m_mutex.lock(); 
++m_state; 
m_mutex.unlock(); 
m_cond.notify_one(); 
} 

void add(unsigned long addend) 
{ 
{ 
std::unique_lock<std::mutex> lock(m_mutex); 
m_state += addend; 
} 

m_cond.notify_all(); 
} 

void dec() 
{ 
std::unique_lock<std::mutex> lock(m_mutex); 
while (m_state == 0) m_cond.wait(lock); 
--m_state; 
} 
}; 




// bin-sema 
struct ct_auto_reset_event 
{ 
bool m_state; 
std::mutex m_mutex; 
std::condition_variable m_cond; 

ct_auto_reset_event() : m_state(false) {} 

void signal() 
{ 
std::unique_lock<std::mutex> lock(m_mutex); 
m_state = true; 
m_cond.notify_one(); 
} 

void wait() 
{ 
std::unique_lock<std::mutex> lock(m_mutex); 
while (m_state == false) m_cond.wait(lock); 
m_state = false; // auto-reset 
} 
}; 




// just a layer over an auto-reset event 
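// m_state: 0 = unlocked, 1 = locked with no waiters, 
// 2 = locked with possible waiters (contended) 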
struct ct_fast_mutex 
{ 
std::atomic<unsigned int> m_state; 
ct_auto_reset_event m_waitset; 

ct_fast_mutex() : m_state(0) {} 

void lock() 
{ 
if (m_state.exchange(1, std::memory_order_acquire)) 
{ 
while (m_state.exchange(2, std::memory_order_acquire)) 
{ 
m_waitset.wait(); 
} 
} 
} 

void unlock() 
{ 
if (m_state.exchange(0, std::memory_order_release) == 2) 
{ 
m_waitset.signal(); 
} 
} 
}; 



// Chris M. Thomasson's Experimental Read/Write Mutex 
// Yeah, it is pretty damn fat wrt the state; however, 
// it has some interesting properties... 
// The state can be compressed a bit... 
// btw, it has no loops... 
// Take a look at the lock_shared and unlock_shared functions 

// The number of readers _must_ never exceed LONG_MAX! 

#define RWMUTEX_COUNT_MAX LONG_MAX 

struct ct_rwmutex 
{ 
unsigned char m_crude_cache_pad_0[CRUDE_CACHE_PAD]; 

// shared state 
std::atomic<long> m_count; 
unsigned char m_crude_cache_pad_1[CRUDE_CACHE_PAD]; 

std::atomic<long> m_rdwake; 
unsigned char 

[lock-free] Re: Experimental Read/Write Mutex..

2019-02-19 Thread Chris M. Thomasson
On Monday, February 18, 2019 at 8:50:21 PM UTC-8, Chris M. Thomasson wrote:
>
> Fwiw, here is a more realistic benchmark, in the Relacy stage for 
> verification purposes:
>
> https://pastebin.com/raw/j4wrg050
>
>

Fwiw, I have almost completed my new benchmark in pure C++11. The timings 
that I am getting measure a different thing: how many reads and writes can 
be completed within a given time frame. Well, here are some initial results:

Testing: Chris M. Thomasson's Experimental Read/Write Mutex

Raw Reads: 3086724
Raw Writes: 59651
reads_per_tick = 308309
writes_per_tick = 5958
Ticks = 10.0118


Testing: std::shared_mutex

Raw Reads: 110922
Raw Writes: 29502
reads_per_tick = 11071
writes_per_tick = 2944
Ticks = 10.019


For a simple test of 10 seconds, my algorithm allows 3086724 reads and 
59651 writes to be completed, while std::shared_mutex allows only 110922 
reads and 29502 writes. Damn. Look at the difference in the read rate per 
tick (reads_per_tick is simply the raw count divided by the elapsed ticks)! 
Wow. I can't wait for you, and others, to test it and tell me the results, 
good or bad. Heck, std::shared_mutex should beat my work, wouldn't ya 
think? Humm... 



[lock-free] Re: Experimental Read/Write Mutex..

2019-02-18 Thread Chris M. Thomasson
Fwiw, here is a more realistic benchmark, in the Relacy stage for 
verification purposes:

https://pastebin.com/raw/j4wrg050

Looking good, and passing tests so far. Here is the code, using Relacy:
___

// Experimental Read-Write Mutex Test
// More "realistic" test, in Relacy...
// By: Chris M. Thomasson
//___


//#define RL_DEBUGBREAK_ON_ASSERT
//#define RL_MSVC_OUTPUT
//#define RL_FORCE_SEQ_CST
//#define RL_GC


#include <relacy/relacy_std.hpp>
#include <climits>


// Simple macro-based redirection of the verbose std membars.
#define CT_MB_ACQ std::memory_order_acquire
#define CT_MB_REL std::memory_order_release
#define CT_MB_RLX std::memory_order_relaxed
#define CT_MB_ACQ_REL std::memory_order_acq_rel
#define CT_MB_SEQ_CST std::memory_order_seq_cst
#define CT_MB(mp_0) std::atomic_thread_fence(mp_0)


#define READERS 4
#define WRITERS 2
#define THREADS (READERS + WRITERS)
#define ITERS 3







// bare bones mutex/condvar based semaphore
struct ct_slow_semaphore
{
VAR_T(unsigned long) m_state;
std::mutex m_mutex;
std::condition_variable m_cond;

ct_slow_semaphore(unsigned long state) : m_state(state) {}

void inc()
{
m_mutex.lock($);
++VAR(m_state);
m_mutex.unlock($);
m_cond.notify_one($);
}

void add(unsigned long addend)
{
m_mutex.lock($);
VAR(m_state) += addend;
m_mutex.unlock($);
m_cond.notify_all($);
}

void dec()
{
m_mutex.lock($);
while (VAR(m_state) == 0) m_cond.wait(m_mutex, $);
--VAR(m_state);
m_mutex.unlock($);
}
};




// bin-sema
struct ct_auto_reset_event
{
bool m_state;
std::mutex m_mutex;
std::condition_variable m_cond;

ct_auto_reset_event() : m_state(false) {}

void signal()
{
m_mutex.lock($);
m_state = true;
m_cond.notify_one($);
m_mutex.unlock($);
}

void wait()
{
m_mutex.lock($);
while (m_state == false) m_cond.wait(m_mutex, $);
m_state = false; // auto-reset
m_mutex.unlock($);
}
};




// just a layer over an auto-reset event
struct ct_fast_mutex
{
std::atomic<unsigned int> m_state;
ct_auto_reset_event m_waitset;

ct_fast_mutex() : m_state(0) {}

void lock()
{
if (m_state.exchange(1, std::memory_order_acquire))
{
while (m_state.exchange(2, std::memory_order_acquire))
{
m_waitset.wait();
}
}
}

void unlock()
{
if (m_state.exchange(0, std::memory_order_release) == 2)
{
m_waitset.signal();
}
}
};



// Chris M. Thomasson's Experimental Read/Write Mutex
// Yeah, it is pretty damn fat wrt the state; however,
// it has some interesting properties...
// The state can be compressed a bit...
// btw, it has no loops...
// Take a look at the lock_shared and unlock_shared functions
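//
// Counting scheme, as implemented below: m_count starts at
// RWMUTEX_COUNT_MAX and each reader decrements it by one, so it stays
// positive while no writer is active. A writer, after winning m_wrlock,
// subtracts RWMUTEX_COUNT_MAX, driving the count negative for any reader
// that arrives afterwards; such readers block on m_rdwset. m_rdwake tracks
// the in-flight readers the writer still has to wait for on m_wrwset, and
// the reader that brings it to zero signals the writer.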

#define RWMUTEX_COUNT_MAX LONG_MAX

struct ct_rwmutex
{
// shared state
std::atomic<unsigned long> m_wrstate;
std::atomic<long> m_count;
std::atomic<long> m_rdwake;

ct_slow_semaphore m_rdwset;
ct_slow_semaphore m_wrwset;
ct_fast_mutex m_wrlock;


ct_rwmutex() :
m_wrstate(1),
m_count(RWMUTEX_COUNT_MAX),
m_rdwake(0),
m_rdwset(0),
m_wrwset(0) {
}


// READ, pretty slim...
void lock_shared()
{
if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
{
m_rdwset.dec();
}
}

void unlock_shared()
{
if (m_count.fetch_add(1, std::memory_order_release) < 0)
{
if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1)
{
m_wrwset.inc();
}
}
}


// WRITE, more hefty
void lock()
{
m_wrlock.lock();

long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX, 
std::memory_order_acquire);

if (count < RWMUTEX_COUNT_MAX)
{
long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count, 
std::memory_order_acquire);

if (rdwake + RWMUTEX_COUNT_MAX - count)
{
m_wrwset.dec();
}
}
}

// write unlock
void unlock()
{
long count = m_count.fetch_add(RWMUTEX_COUNT_MAX, 
std::memory_order_release);

if (count < 0)
{
m_rdwset.add(-count);
}

m_wrlock.unlock();
}
};
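
// Hypothetical usage sketch (not part of the test code below):
//
//   ct_rwmutex rw;
//
//   // reader side:               // writer side:
//   rw.lock_shared();             rw.lock();
//   // ... read shared data ...   // ... mutate shared data ...
//   rw.unlock_shared();           rw.unlock();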






struct ct_simple_stack
{
struct node
{
VAR_T(node*) m_next;
VAR_T(unsigned int) m_tid;

node(unsigned int tid) : m_tid(tid) {}
};

VAR_T(node*) m_head;

ct_simple_stack() : m_head(nullptr) {}

void push(node* n)
{
VAR(n->m_next) = VAR(m_head);
VAR(m_head) = n;
}

node* pop()
{
node* n = VAR(m_head);

if (n)
{
VAR(m_head) = VAR(n->m_next);
}