[lock-free] Re: Scalable hash map

2018-12-28 Thread manuel
Hi,

sorry for resurrecting this old thread!

The link to the files (http://groups.google.com/group/lock-free/files) no 
longer works.
I've downloaded the hash_map.zip file from your homepage, but this version 
of the code only supports POD types of 4 or 8 bytes. I am particularly 
interested in your solution for supporting non-trivial key/value types. Any 
chance this code is still available somewhere?

Thanks,
Manuel

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"Scalable Synchronization Algorithms" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to lock-free+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/lock-free/47640666-7251-45ed-9a3a-141dd092713e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [lock-free] Re: Scalable hash map

2018-12-28 Thread Manuel Pöter
Hi Dmitry,

in one of your last posts you wrote:

> I've uploaded modified hash map which supports strings (and any other 
> satellite data) as keys/values: 
> http://groups.google.com/group/lock-free/files 
> There is example included which uses strings as both keys and values.

That made me think there was a version that supported non-trivial types.
But reading on, it says:

> The main idea is that satellite data (strings) must be passed through 
> PDR (Partial-copy-on-write Deferred Reclamation) system [...]

I missed that part when I read that post the first time. So basically you
would have to work with an additional indirection: allocate a pcx-node that
holds the string (or whatever type you are using) and use a pointer to that
node as key/value. And of course define a specialization of
concurrent_hash_map_traits for that node type. Correct?

Thanks,
Manuel



Re: [lock-free] Re: Scalable hash map

2018-12-29 Thread Manuel Pöter
BTW - I think I found a race condition that causes find to incorrectly
return false.
Consider the following scenario:

Thread A removes an entry with key k1 that is stored directly in the cell,
while the cell's head points to an add_item with key k2. Thread A therefore
first updates the cell's state to mark the item that contained k1 as
invalid, then copies key and value from the add_item into the item where k1
used to be, then updates the cell's state again to mark the item as valid,
and finally removes the add_item from the linked list.

Now thread B concurrently calls find with k2. It can happen that thread B
finds k2 in the item that is marked as invalid. Now assume that thread B's
reload of the cell's state executes before the update by thread A that
restores the bit marking the item as valid, i.e., the reload returns the
same value as before. Thread B would then (incorrectly) conclude that k2 is
currently being removed and return false.

IMHO, instead of returning early, find has to ignore items marked as
invalid and continue the search. If the find is faster than the remove, it
finds the matching add_item; if the remove is faster, the find encounters
an updated state and restarts.

What do you think?



[lock-free] Xenium - a C++ library with various concurrent data structures and reclamation schemes

2019-01-08 Thread Manuel Pöter
Hi all,

I am working on a C++ library called "Xenium" that provides various 
concurrent data structures and reclamation schemes: 
https://github.com/mpoeter/xenium/

The goal is to provide correct (in the C++ memory model) and efficient 
implementations that are ready to be used in real-world applications.
Since concurrency is all about performance (otherwise we could stick with a 
simple sequential program), I am aiming to provide a high degree of 
configurability and customizability, so users can pick the configuration 
that works best for their use case.

Feedback is highly welcome!

Cheers,
Manuel



[lock-free] Re: Experimental Read/Write Mutex..

2019-02-27 Thread Manuel Pöter
Benchmarked this on 2 systems with `THREADS` set to 1, 2 and 4 (so 
threads_n = THREADS * cpu_threads_n).

Result 1:

8x Intel(R) Xeon(R) CPU E7- 8850  @ 2.00GHz (80 cores, 2x SMT)

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 160
threads_n = 160
writers = 80
readers = 80
test duration = 60 seconds
___
___
Raw Reads: 186803
Raw Writes: 1922
reads_per_tick = 3111
writes_per_tick = 32
Ticks = 60.0334
___

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 160
threads_n = 320
writers = 160
readers = 160
test duration = 60 seconds
___
___
Raw Reads: 336265
Raw Writes: 2559
reads_per_tick = 5596
writes_per_tick = 42
Ticks = 60.0848
___

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 160
threads_n = 640
writers = 320
readers = 320
test duration = 60 seconds
___
___
Raw Reads: 449283
Raw Writes: 3718
reads_per_tick = 7471
writes_per_tick = 61
Ticks = 60.1302
___

Testing Version 0.1: std::shared_mutex
___
cpu_threads_n = 160
threads_n = 160
writers = 80
readers = 80
test duration = 60 seconds
___
___
Raw Reads: 191840
Raw Writes: 784
reads_per_tick = 3194
writes_per_tick = 13
Ticks = 60.0533
___

Testing Version 0.1: std::shared_mutex
___
cpu_threads_n = 160
threads_n = 320
writers = 160
readers = 160
test duration = 60 seconds
___
___
Raw Reads: 350020
Raw Writes: 1738
reads_per_tick = 5826
writes_per_tick = 28
Ticks = 60.0688
___

Testing Version 0.1: std::shared_mutex
___
cpu_threads_n = 160
threads_n = 640
writers = 320
readers = 320
test duration = 60 seconds
___
___
Raw Reads: 706867
Raw Writes: 1830
reads_per_tick = 11752
writes_per_tick = 30
Ticks = 60.1452
___

Result 2:

4x SPARC-T5-4 (64 cores, 8x SMT)

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 512
threads_n = 512
writers = 256
readers = 256
test duration = 60 seconds
___
___
Raw Reads: 640255
Raw Writes: 7999
reads_per_tick = 10650
writes_per_tick = 133
Ticks = 60.1149
___

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 512
threads_n = 1024
writers = 512
readers = 512
test duration = 60 seconds
___
___
Raw Reads: 948097
Raw Writes: 12602
reads_per_tick = 15746
writes_per_tick = 209
Ticks = 60.2094
___

Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex
___
cpu_threads_n = 512
threads_n = 2048
writers = 1024
readers = 1024
test duration = 60 seconds
___
___
Raw Reads: 1718250
Raw Writes: 23019
reads_per_tick = 28402
writes_per_tick = 380
Ticks = 60.4974
___


Testing Version 0.1: std::shared_mutex
___
cpu_threads_n = 512
threads_n = 512
writers = 256
readers = 256
test duration = 60 seconds
___
___
Raw Reads: 4482
Raw Writes: 2166488
reads_per_tick = 74
writes_per_tick = 36045
Ticks = 60.1037
___

Testing Version 0.1: std::shared_mutex
___
cpu_threads_n = 512
threads_n = 1024
writers = 512
readers = 512
test duration = 60 seconds
___
___
Raw Reads: 1536
Raw Writes: 2093636
reads_per_tick = 25
writes_per_tick = 34767
Ticks = 60.2185
___

Testing Version 0.1: std::shared_mutex
___
cpu_threads_n = 512
threads_n = 2048
writers = 1024
readers = 1024
test duration = 60 seconds
___
___
Raw Reads: 4096
Raw Writes: 2001034
reads_per_tick = 67
writes_per_tick = 33130
Ticks = 60.3978
___


[lock-free] Re: Experimental Read/Write Mutex..

2019-03-01 Thread Manuel Pöter
I could get my hands on an AMD EPYC 7351P system and a Xeon Phi (Knights 
Corner) if you are interested in more results...



[lock-free] Re: TensorFlow scheduler

2019-03-02 Thread Manuel Pöter
Thanks for sharing this!

Do you know the Chase-Lev work-stealing deque? 
(https://www.dre.vanderbilt.edu/~schmidt/PDF/work-stealing-dequeue.pdf)
I suppose you do, but I was wondering why you went with a blocking design 
for operations on the back of RunQueue. Was it only because you wanted to 
store std::function instances instead of pointers (as mentioned in the 
source comments)? Just wondering, since std::function also requires a 
dynamic allocation.



[lock-free] Re: Experimental Read/Write Mutex..

2019-03-02 Thread Manuel Pöter


On Friday, 1 March 2019 23:29:41 UTC+1, Chris M. Thomasson wrote:
>
> On Friday, March 1, 2019 at 7:11:20 AM UTC-8, Manuel Pöter wrote:
>>
>> I could get my hands on an AMD EPYC 7351P system and a Xeon Phi (Knights 
>> Corner) if you are interested in more results...
>>
>
> I would be grateful if you can find the time to do this Manuel: Thanks. :^)
>
 
Will do! :) 

> Really like the "difference" between the Intel and SPARC results. Afaict, 
> std::shared_mutex, on Solaris? Needs to be fixed? I do not like the way it 
> seems to "tank" the reads and obtain a shi% load of writes. Humm...
>

Yes, the SPARC system is running Solaris 11.3; the binaries were built 
using a self-compiled GCC 6.3. I have to admit that I did not really look 
at the results before posting them here, but now that you pointed it out I 
agree - the shared_mutex results on Solaris are definitely not what I would 
have expected!



[lock-free] Re: TensorFlow scheduler

2019-03-05 Thread Manuel Pöter
After reading the code more thoroughly I've realized that you have some 
operations that the Chase-Lev work-stealing deque does not support, like 
PushBack or PopBackHalf (although Hendler and Shavit proposed a variation 
of the Arora work-stealing deque that supports stealing ~half the entries).
FWIW, for those who are interested: my Xenium library contains an 
implementation of the Chase-Lev work-stealing deque: 
https://github.com/mpoeter/xenium/blob/master/xenium/chase_work_stealing_deque.hpp



[lock-free] Re: 3-thread consensus.

2020-01-14 Thread Manuel Pöter
The lock-free property guarantees that at any time at least one thread is 
making progress in a finite number of steps. Or to put this more generally: 
a stalled thread must not cause all other threads to stall indefinitely.
The arguments about lock-freedom (or the lack thereof) are usually based on 
the (somewhat artificial) assumption that any thread can fail at any time - 
i.e., simply stop executing. If such a failed thread causes the whole 
system to grind to a halt, then it is not lock-free.

You wrote yourself that "If a thread dies at that point, my entire system 
is broken...", so it is definitely not lock-free.
That being said, lock-freedom is a nice property, but by no means 
indispensable for scalability. Several of Dmitry's algorithms are not 
lock-free (like the bounded MPMC queue), which does not mean that they do 
not scale.

Alistarh et al. showed that lock-free algorithms are practically wait-free. 
I suppose the same could be shown for several concurrent algorithms that 
are not strictly lock-free. So it is not so much important whether an 
algorithm is lock-free or not, but whether it works well in practice for 
the use case it is designed for.


On Tuesday, 14 January 2020 03:16:23 UTC+1, bittnkr wrote:
>
>
> > Well, if producer is preempted, consumers are blocked from progressing.
>
> Sorry, but this isn't true. 
>
> The consumers are preempted only in case of an empty queue. Which isn't a 
> lock. 
>
> Meaning that there is nothing to do. If you don't have active producers, 
> there is nothing to consume.
>
> How can you see a lock there?
>
> Please
>
> On Mon, 13 Jan 2020 at 04:35, Dmitry Vyukov wrote:
>
>> +lock-free group again, please don't drop it
>>
>> On Mon, Jan 13, 2020 at 8:19 AM bittnkr > 
>> wrote:
>> >
>> > If a thread dies at that point, my entire system is broken...
>>
>> Which means it's not lock-free.
>>
>> > But It can preempts without any problem at near zero cost.
>>
>> Well, if producer is preempted, consumers are blocked from
>> progressing. This is 100% equivalent to a mutex. If a thread is
>> preempted while holding a mutex, it also does not result in any
>> correctness problems.
>>
>>
>> > On Mon, 13 Jan 2020 at 03:55, Dmitry Vyukov wrote:
>> >>
>> >> On Sun, Jan 12, 2020 at 7:33 PM bittnkr > > wrote:
>> >> >
>> >> >
>> >> > > This blocks consumers from progressing (consuming next produced 
>> items)
>> >> > and is effectively a mutex.
>> >> >
>> >> > Suppose the thread A got a local copy of tail in t then is preempted,
>> >> >
>> >> > another thread will get the same tail and get the seat normally.
>> >> >
>> >> > When the previous thread retakes the line, the CAS will fail, 
>> because the seat was taken.
>> >> >
>> >> > Restarting the while without any kind of blocking.
>> >> >
>> >> > Where do you see a mutex here?
>> >>
>> >> I mean preemption between succeeding CAS and writing element /NULL to 
>> the array.
>> >>
>> >> If a thread is terminated at that point, the whole queue is broken (no
>> >> termination-safety).
>> >>
>> >>
>> >> > On Sun, 12 Jan 2020 at 04:49, Dmitry Vyukov wrote:
>> >> >>
>> >> >> On Sat, Jan 11, 2020 at 9:26 AM bittnkr > > wrote:
>> >> >> >
>> >> >> > Good observations. Thank you.
>> >> >> >
>> >> >> > If the thread preempts on those points, the seat position will be 
>> held on local variables h and t.
>> >> >>
>> >> >> This blocks consumers from progressing (consuming next produced 
>> items)
>> >> >> and is effectively a mutex. This makes algorithm 
>> non-termination-safe
>> >> >> and non-lock-free.
>> >> >>
>> >> >>
>> >> >> > After the thread line is restored, the CAS can fail, and the loop 
>> will just restart in normal flow, without any locking.
>> >> >> >
>> >> >> > I updated the docs, I think is clearer now.
>> >> >> >
>> >> >> > On Sat, 11 Jan 2020 at 05:38, Dmitry Vyukov <dvy...@gmail.com> wrote:
>> >> >> >>
>> >> >> >> On Sat, Jan 11, 2020 at 4:09 AM bittnkr > > wrote:
>> >> >> >> >
>> >> >> >> > Hello Dmitry and fellows from the group.
>> >> >> >> >
>> >> >> >> > If you look carefully, you will see that there is no blocking, 
>> just simple CAS, even the buffer is a common buffer.
>> >> >> >>
>> >> >> >> But consider you look inside of a spin lock, or you take a
>> >> >> >> spin-lock-based algorithm and inline all spin lock code into the
>> >> >> >> algorithm. What you will see is exactly the same -- no blocking, 
>> just
>> >> >> >> a CAS. However, it does not really change anything, it's still
>> >> >> >> blocking mutex-based algorithm.
>> >> >> >> One can also combine spin lock state with some data, e.g. a 
>> particular
>> >> >> >> value of a data member means "locked" and blocks progress of 
>> other
>> >> >> >> threads. It makes spin lock even less visible, but again does not
>> >> >> >> change anything on conceptual level.
>> >> >> >>
>> >> >> >> If I am reading the algorithm correctly, if a thread is preempted
>> >> >> >> between these 2 operations:
>> >> >> >>
>> >

[lock-free] Re: 3-thread consensus.

2020-01-15 Thread Manuel Pöter


There exists a universally accepted meaning of lock-freedom. You may find 
various definitions that try to describe or formalize it in different ways, 
but the essence is always the same. Here are a few definitions from 
different sources:

- An algorithm is lock-free if, when the program threads are run for a 
  sufficiently long time, at least one of the threads makes progress (for 
  some sensible definition of progress). (Wikipedia 
  <https://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom>)
- A method is lock-free if it guarantees that infinitely often some 
  method call finishes in a finite number of steps. (Herlihy and Shavit, 
  The Art of Multiprocessor Programming 
  <https://www.amazon.com/Art-Multiprocessor-Programming-Revised-Reprint/dp/0123973376>)
- A data structure is lock-free if and only if some operation completes 
  after a finite number of steps system-wide have been executed on the 
  structure. (Keir Fraser, Practical Lock-freedom 
  <https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-579.pdf>)

The mental model that is used to reason about this guarantee assumes that *a 
thread can fail at any time anywhere in the code* - and such a failed 
thread must not prevent other threads from finishing their operations.
Or put another way: *An algorithm that requires a thread A to wait for 
another thread B to finish some operation in order to proceed, can never be 
lock-free, because we have to assume that thread B may never finish its 
operation.*

Whether this assumption is probable/realistic or not is beside the point 
and therefore irrelevant.

And since a failed thread can break your algorithm, it is not lock-free in 
the typical sense!

On Tuesday, 14 January 2020 16:03:42 UTC+1, bit...@gmail.com wrote:
>
> Hi Manuel, 
>
> How would a processing line be broken between the CAS and the release of 
> the seat?
>
> This will happen *only if* the S.O. preempt the thread on that point and 
> never get back.
>
> This is what I mean with the "my entire system is broken"
>
> *Only a S.O. scheduling failure can produce that.*
>
> Otherwise is impossible that a thread do not terminate that processing 
> line. 
>
> Can you think another possibility?
>



Re: [lock-free] Re: 3-thread consensus.

2020-01-16 Thread Manuel Pöter
Progress guarantees have nothing to do with algorithmic complexity! Take 
Harris' 
lock-free linked list <https://timharris.uk/papers/2001-disc.pdf> - the 
list operations are definitely not O(1), but it is still lock-free.
I am not sure if you really don't understand this, or if you are 
deliberately ignoring all comments trying to explain what lock-freedom 
usually means. Either way, this discussion seems rather pointless by now...

On Thursday, 16 January 2020 01:47:49 UTC+1, bittnkr wrote:
>
>
> > Your algorithm is lockless (i.e. without lock) but not lock-free 
> according to the definition of lock-freedom.
>
> You make me think if my cellphone is wireless or wirefree. :)
>
> Thinking carefully about, I got an axiom:
>
> An algorithm can be called lock-free if it's O(1).
>
> Condition necessary and sufficient.
>
>
> On Wed, 15 Jan 2020 at 07:36, Mamy Ratsimbazafy wrote:
>
>> The definition of obstruction-free, lock-free or wait-free is not a 
>> guarantee of performance, it's a guarantee of progress, in particular when 
>> thread can dies (non-blocking algorithm 
>> <https://en.wikipedia.org/wiki/Non-blocking_algorithm>)
>>
>> In computer science <https://en.wikipedia.org/wiki/Computer_science>, an 
>>> algorithm <https://en.wikipedia.org/wiki/Algorithm> is called 
>>> *non-blocking* if failure or suspension 
>>> <https://en.wikipedia.org/wiki/Scheduling_(computing)> of any thread 
>>> <https://en.wikipedia.org/wiki/Thread_(computing)> cannot cause failure 
>>> or suspension of another thread;[1] 
>>> <https://en.wikipedia.org/wiki/Non-blocking_algorithm#cite_note-1> for 
>>> some operations, these algorithms provide a useful alternative to 
>>> traditional blocking implementations 
>>> <https://en.wikipedia.org/wiki/Lock_(computer_science)>. A non-blocking 
>>> algorithm is *lock-free* if there is guaranteed system-wide progress 
>>> <https://en.wikipedia.org/wiki/Resource_starvation>, and *wait-free* if 
>>> there is also guaranteed per-thread progress. 
>>>
>>
>> Your algorithm is lockless (i.e. without lock) but not lock-free 
>> according to the definition of lock-freedom.
>>
>> As you aptly said, in the practical case wait-free or lock-free is not a 
>> necessary property to be able to scale a data-structure to multiple 
>> contending threads.
>> For example, Dmitry's famous MPSC node-based queue 
>> <http://www.1024cores.net/home/lock-free-algorithms/queues/non-intrusive-mpsc-node-based-queue>
>>  
>> is also lockless, as if a producer dies in the wrong place, the consumer 
>> cannot get to the end of the queue.
>> There are lockless (or linearizable) variants of it but AFAIK, it 
>> requires giving up some performance.
>> In practice, using this queue means assuming that threads don't die which 
>> is a reasonable assumptions most of the time.
>>
>> Lock-freedom or wait-freedom is valuable is when you need strong 
>> worst-case guarantees (real-time scheduling for example) or cannot rely on 
>> the OS (because you might be writing one or you have specific 
>> fault-tolerance requirements).
>>
>> --
>> Mamy
>>
>> On Wed, Jan 15, 2020 at 10:58 AM bittnkr > 
>> wrote:
>>
>>> > Let me ask first: what is your definition of a lock-free algorithm?
>>>
>>> When there is no locks/waits between threads/actors. 
>>>
>>> And a lock happens when some thread A cannot proceed while another 
>>> thread B do not finish its job.
>>>
>>> But for practical reasons, I believe that the most important 
>>> feature/characteristic of a Lock-free algorithm is to be O(1), The number 
>>> of elements must not impact the performance. Doesn't matter if you are 
>>> handling 2 or 10.000 threads, the cost per operation must be the same.
>>>
>>>
>>>
>>> On Wed, 15 Jan 2020 at 04:17, Dmitry Vyukov wrote:
>>>
>>>> On Tue, Jan 14, 2020 at 4:03 PM > wrote:
>>>> >
>>>> > Hi Manuel,
>>>> >
>>>> > How would a processing line be broken between the CAS and the release 
>>>> of the seat?
>>>> >
>>>> > This will happen only if the S.O. preempt the thread on that point 
>>>> and never get back.
>>>> >
>>>> > This is what I mean with the "my entire system is broken"
>>>> >
>>>> > Only a S.O. schedulin

Re: [lock-free] Re: 3-thread consensus.

2020-01-18 Thread Manuel Pöter
Hi bittnkr,

I am still not sure whether you are interested in a meaningful discussion 
or not (but most of your responses convey the feeling that you just want to 
troll around). I will make one last attempt to try to explain your 
misconception, but if you continue to ignore all arguments and references 
to other sources I'm done.

As I said before - progress guarantees (wait-freedom, lock-freedom, ...) 
have nothing to do with algorithmic complexity. And they have nothing to do 
with performance either - in many cases (fine-grained) lock-based solutions 
perform better than lock-free versions. The key difference is that in a 
lock-free data structure at least one thread will be able to make progress 
on its operation, even if other threads are blocked, preempted or otherwise 
delayed.

Take for example a sorted linked list. Searching/inserting/deleting entries 
with a specific key in the list is obviously O(n) where n is the number of 
items in that list (I hope we can at least agree on that). According to 
your definition, this operation cannot be lock-free. Harris proposed a 
lock-free solution that is *unbounded*, i.e., O(∞), yet it is still 
lock-free. In fact, most lock-free algorithms are unbounded - if they are 
bounded, they are not just lock-free, but wait-free.

Regarding the chart in your screenshot: it shows that Harris' list scales 
much better than the mutex-based one, even though the mutex-based solution 
is O(n). But you cannot derive any more information from that chart other 
than how well each implementation scales with the number of threads. In 
particular, you cannot tell whether any of the implementations are 
lock-free (nor whether their implementation is O(n), for that matter).

Take a closer look at your own code - it is definitely not O(1)!

do {
  o = out;
  while (o == in)
    sched_yield(); // if empty, wait for item
} while (isFree[o & mask] || !out.compare_exchange_weak(o, o + 1));
This piece from the pop operation has two unbounded loops in it. Why would 
you assume that this is O(1)? There is no guarantee that either loop 
terminates after a bounded number of iterations, so it is O(∞). Not only 
that, but it depends on a concurrent push operation to store the value into 
the buffer and to set isFree. But you have no guarantee *when* (or even 
*if*, for that matter) this is going to happen. And this is what makes your 
code *not lock-free*: the fact that one thread (pop) depends on some other 
thread having to finish another operation (push).

There exists a universally accepted meaning of lock-freedom (as well as all 
the other progress guarantees) in the scientific community. The canonical 
reference for most of these definitions usually is the paper "A methodology 
for implementing highly concurrent data objects 
<http://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p745-herlihy.pdf>" 
by Maurice Herlihy from 1993.

I suggest you get yourself a copy of "The Art of Multiprocessor 
Programming" and actually *read* it (well, reading alone won't do it; you 
should also try to *follow the arguments* and *understand *them). Another 
good starting point is "Is Parallel Programming Hard, And, If So, What Can 
You Do About It? <https://arxiv.org/abs/1701.00854>" by Paul McKenney.

However, if you prefer to live in your own world with your own definitions, 
you are welcome to do so. But don't expect me to participate in such a 
pointless discussion (and I suppose the same goes for the other people who 
responded in this thread).


On Friday, 17 January 2020 15:48:44 UTC+1, bittnkr wrote:
>
> Dear Manuel,
>
> Thanks for sending the Harris paper.
> Just now I had the time to read it.
>
> Attached I sent a screenshot of the performance chart presented there.
>
> Did you noticed the difference between the Mutex and the New version?
>
> What do you think is the O() of each one?
>
> And about the Valois version? 
>
> Just looking in this chart, It's possible to conclude if it's lock free or 
> no? Why?
>
> Dmitry?
>
> On Fri, 17 Jan 2020 at 10:01, bittnkr wrote:
>
>> Dear Nitsan,
>>
>> Thanks for your message. You bring me a smile this morning.
>>
>> I really don't think so...
>>
>> I'm not manipulating the words to put an alien meaning on them.
>>
>> But instead, I'm trying to simplify a problem, considered impossible, 
>> with a rather new, but simpler and concrete definition of a complex theme.
>>
>> Unfortunately, I've seen lots of different positions about the lock free 
>> structures, so many that seems everyone has it own ideas of what means to 
>> be lock free. 
>>
>> This topic is a proof of this. 
>>
>> We can debate for hours about the difference between lock-less o

Re: [lock-free] Re: 3-thread consensus.

2020-01-19 Thread Manuel Pöter
Sorry, but what you are writing does not make *any* sense!

It seems like you don't understand big O notation, or how to properly 
determine the complexity of an algorithm. You can't just *pick* whatever 
you like to be the argument of O().

BTW: your implementation contains a race condition since isFree is not 
atomic; therefore you also don't get the necessary happens-before relation 
for the operations on buffer. Just run your code with TSAN (but I suppose 
you would dismiss any reports as false positives anyway, so why bother).

You seem to lack some basic understandings, but since you also proved to be 
completely and utterly resistant to all arguments, this discussion is 
indeed pointless!

On Sunday, 19 January 2020 10:34:00 UTC+1, bittnkr wrote:
>
> Hi Manuel,
>
> I really thought this list was done... But It appears you still here. 
> Thanks for your patience.
>
> This is the first time someone call me a troll. I don't know if this is a 
> detriment or an achievement to an introspect and somewhat arrogant nerd 
> like me. 
>
> I think I found the misunderstanding...
>
> > Take for example a sorted linked list. Searching/inserting/deleting 
> entries with a specific key in the list is obviously O(n) where n is the 
> number of items in that list (I hope we can at least agree on that).
>
> When I talk about O(1), I'm referring to the number of threads (the x axis 
> of the chart) and specifically the insert and remove operations(the cost on 
> Y axis), not everything you can do with the structure. 
>
> In my point of view, that's the fundamental behavior of a lock-free 
> structure. 
>
> If a queue/stack/whatever have a constant insert/remove cost which doesn't 
> change with the number of participants, that's the proof its lock-free. 
>
> Almost all lock-free papers I read, at the end have a chart like that as a 
> proof of its lock-freedom.  
>
> In the linked list example, the search is O(n), but inserts and deletes 
> are O(1). Note how the Y values doesn't change with the number of threads 
> (almost a horizontal line). 
>
> About my code, I think it should speak for itself. I know there is no lock 
> in it. No thread will wait for another to finish its job at any point. 
>
> Except when the queue is full/empty, but this is not a lock. As I've said 
> before, if nothing has been produced, there is nothing to be 
> consumed. The loop is not a spin lock, just the normal flow of push()/pop().
>
> Anyway, maybe I'm being too simplistic, and still missing some obscure 
> nuance reserved only for higher minds than that of a brute troll like me 
> (smiles, and no offense). ;-)
>
> I was just euphoric, eager to share my joy with someone... But I think I 
> must look elsewhere.
>
> Thanks for all. 
>
> Em sáb., 18 de jan. de 2020 às 13:10, Manuel Pöter <
> man...@manuel-poeter.at > escreveu:
>
>> Hi bittnkr,
>>
>> I am still not sure whether you are interested in a meaningful discussion 
>> or not (most of your responses convey the feeling that you just want to 
>> troll around). I will make one last attempt to explain your 
>> misconception, but if you continue to ignore all arguments and references 
>> to other sources, I'm done.
>>
>> As I said before - progress guarantees (wait-freedom, lock-freedom, ...) 
>> have nothing to do with algorithmic complexity. And they have nothing to do 
>> with performance either - in many cases (fine-grained) lock-based solutions 
>> perform better than lock-free versions. The key difference is that in a 
>> lock-free data structure at least one thread will be able to make progress 
>> on its operation, even if other threads are blocked, preempted or otherwise 
>> delayed.
>>
>> Take for example a sorted linked list. Searching/inserting/deleting 
>> entries with a specific key in the list is obviously O(n) where n is the 
>> number of items in that list (I hope we can at least agree on that). 
>> According to your definition, this operation cannot be lock-free. Harris 
>> proposed a lock-free solution that is *unbounded*, i.e., O(∞), yet it is 
>> still lock-free. In fact, most lock-free algorithms are unbounded - if they 
>> are bounded, they are not just lock-free, but wait-free.
>>
>> Regarding the chart in your screenshot: it shows that Harris' list scales 
>> much better than the mutex-based one, even though the mutex-based solution 
>> is O(n). But you cannot derive any more information from that chart other 
>> than how well each implementation scales with the number of threads. In 
>> particular you cannot tell if any of the imple

Re: [lock-free] Re: 3-thread consensus.

2020-01-19 Thread Manuel Pöter
I meant to write *data race* on isFree, not *race condition*...
