Re: LOCK XADD wait-free or lock free.

2018-08-25 Thread Yichao Yu
> What prevents it is the fact that if this was possible, you've just described 
> a trivial way for you to crash any server and any laptop on earth from 
> user-mode code. One of the jobs of the hardware designers is to make it 
> impossible for you to do that, and they've usually done that job to an ok 
> enough level on the systems we run on (assuming we haven't read anything in 
> the news about "how to bring all of AWS down from a laptop").

Sorry, I couldn't resist posting this.

Is the `F00F` bug[1] (or other HCF bugs[2]) an example of what can happen
when such a property is not guaranteed by the hardware? I was actually quite
surprised that modern hardware can be designed without any such problems...

[1] https://en.wikipedia.org/wiki/Pentium_F00F_bug
[2] https://en.wikipedia.org/wiki/Halt_and_Catch_Fire

Re: LOCK XADD wait-free or lock free.

2018-08-25 Thread Gil Tene


On Saturday, August 25, 2018 at 8:18:53 AM UTC-7, Peter Veentjer wrote:
>
> Hi Gil,
>
> Thanks for your answer.
>
> However, it still feels a bit like 'it works because it does' (or my skull
> is just too thick).
>
> So what prevents a CPU from perpetually being denied the chance to execute
> the LOCK XADD? Imagine there are 2 CPUs, both executing a LOCK XADD in a
> loop; what prevents one of the CPUs from being perpetually starved of the
> resource?
>

What prevents it is the fact that if this was possible, you've just 
described a trivial way for you to crash any server and any laptop on earth 
from user-mode code. One of the jobs of the hardware designers is to make 
it impossible for you to do that, and they've usually done that job to an 
ok enough level on the systems we run on (assuming we haven't read anything 
in the news about "how to bring all of AWS down from a laptop").
 

Re: LOCK XADD wait-free or lock free.

2018-08-25 Thread Gil Tene


On Saturday, August 25, 2018 at 9:11:26 AM UTC-7, Martin Thompson wrote:
>
> To perform an update via the XADD instruction, the cache line containing
> the word must first be acquired. x86 uses the MESI cache coherence
> protocol, and to get the cache line for update an RFO (Read/Request For
> Ownership) message must be sent as a bus transaction. These requests are
> queued per core and in the uncore, which effectively avoids starvation.
> Performance events are available for observing this, e.g.
> OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO. From the perspective of individual
> instructions, each thread completes in a finite number of steps. The CAS
> equivalent would be lock-free, rather than wait-free, as the number of
> steps per thread is not bounded.
>

The CAS might never "succeed" in a capped number of steps (i.e. someone 
else could forever beat it to the memory location and the compare could 
fail in each and every attempt), but the CAS instruction will complete, one 
way or another, on all practical hardware implementations, in a capped 
number of steps. A fundamental design requirement in all hardware
implementations I'm aware of is to prevent the starvation of any (at least
user-mode, and probably all) instruction, including CAS. This might be
achieved in many different ways, often leveraging qualities of the cache
protocols and the knowledge that the number of processors and other
components (like memory controllers, coherence protocol coordination points,
queue depths in various places, etc.) in the system is capped. "How exactly"
doesn't matter. It's the job of whoever designs the hardware to make sure it
is so.

Think about it this way: In any system where it is impossible for you to 
cause a hardware watchdog reset from user code, there's a "capped number of 
steps" for executing any individual user-mode instruction, which by 
definition makes them all (individually) wait-free. There are no 
non-wait-free user mode instructions in such systems. QED.
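
To make the succeed-vs-complete distinction concrete, here is a minimal Java
sketch of the two counter flavors discussed in this thread (my own
illustration, not code from any of the posts). On x86, the JIT typically
compiles AtomicLong.getAndIncrement() down to a single LOCK XADD, while the
retry loop below issues a LOCK CMPXCHG per attempt: each individual
instruction completes in a bounded number of steps, but the loop as a whole
may retry an unbounded number of times.

import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: two ways to build a shared counter.
public final class Counters {
    private final AtomicLong value = new AtomicLong();

    // Wait-free: one atomic fetch-and-add, which maps to LOCK XADD on x86.
    // Every caller completes in a bounded number of steps, regardless of
    // what the other threads are doing.
    long incrementWaitFree() {
        return value.getAndIncrement();
    }

    // Lock-free but not wait-free: each LOCK CMPXCHG completes in a bounded
    // number of steps, yet the compare can keep failing if other threads
    // keep winning the race, so the number of retries for this particular
    // caller is not bounded.
    long incrementLockFree() {
        while (true) {
            long current = value.get();
            long next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // Lost the race: another thread changed the value; reload and retry.
        }
    }
}

At least one thread succeeds on every iteration of the CAS loop (hence
lock-free), but an unlucky thread has no bound on its retries, which is
exactly the distinction between the instruction completing and the update
succeeding.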
 




Re: LOCK XADD wait-free or lock free.

2018-08-25 Thread Martin Thompson
To perform an update via the XADD instruction, the cache line containing the
word must first be acquired. x86 uses the MESI cache coherence protocol, and
to get the cache line for update an RFO (Read/Request For Ownership) message
must be sent as a bus transaction. These requests are queued per core and in
the uncore, which effectively avoids starvation. Performance events are
available for observing this, e.g. OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO.
From the perspective of individual instructions, each thread completes in a
finite number of steps. The CAS equivalent would be lock-free, rather than
wait-free, as the number of steps per thread is not bounded.
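
As a rough, user-level way to see that difference (my own sketch, not code
from Martin's post), one can count how many CAS retries an individual
increment needs under contention; the XADD path never retries. The RFO
traffic itself can be observed with hardware performance counters such as
the event named above, where the CPU and tooling expose it.

import java.util.concurrent.atomic.AtomicLong;

// Rough contention demo (illustrative only): hammer a shared counter from
// several threads and record the worst number of CAS retries that any
// single increment needed.
public final class CasRetryDemo {
    static final AtomicLong counter = new AtomicLong();
    static final AtomicLong maxRetries = new AtomicLong();

    static void incrementWithCas() {
        long attempts = 0;
        long current;
        do {
            current = counter.get();
            attempts++;
        } while (!counter.compareAndSet(current, current + 1));
        // Record the failed attempts (attempts beyond the first) for this call.
        maxRetries.accumulateAndGet(attempts - 1, Math::max);
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1_000_000; n++) {
                    incrementWithCas();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("final count = " + counter.get());
        System.out.println("worst-case CAS retries for one increment = " + maxRetries.get());
    }
}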

On Saturday, 25 August 2018 06:00:22 UTC+1, Peter Veentjer wrote:
>
> I'm polishing up my lock-free knowledge and one question keeps on bugging
> me.
>
> The question is about the LOCK XADD and why it is considered to be wait
> free.
>
> AFAIK for wait freedom there needs to be some fairness.
>
> So imagine a concurrent counter using a spin on a CAS to increment a
> counter: at least 1 thread wins and makes progress, therefore this
> implementation is lock free. It isn't wait free because some thread could
> be spinning without bound. The problem here is that there is no fairness.
>
> Now imagine this counter were implemented using a LOCK XADD, so there is no
> need for a loop. What guarantees that every thread is going to make
> progress in a bounded number of steps? E.g. could it happen that one thread
> is continuously denied exclusive access to the memory and therefore won't
> complete in a bounded number of steps? Or do the requests get stored in
> some kind of list and processed in that order? Such an order would provide
> fairness, and then it would be wait free.
>
> I have been looking at the Intel documentation of LOCK XADD, but it remains
> unclear.
>



Re: LOCK XADD wait-free or lock free.

2018-08-25 Thread Peter Veentjer
Hi Gil,

Thanks for your answer.

However, it still feels a bit like 'it works because it does' (or my skull
is just too thick).

So what prevents a CPU from perpetually being denied the chance to execute
the LOCK XADD? Imagine there are 2 CPUs, both executing a LOCK XADD in a
loop; what prevents one of the CPUs from being perpetually starved of the
resource? And even if the starvation isn't perpetual, what guarantees that
the CPU completes in a bounded number of steps?


On Saturday, August 25, 2018 at 9:39:13 AM UTC+3, Gil Tene wrote:
>
> Wait-free, Lock-free, and Obstruction-free are well defined things. They 
> are all non-blocking, such that suspended threads cannot indefinitely 
> prevent all others from making progress, but they differ in the forward 
> progress expectations of individual threads. The "Non-blocking algorithm"
> wikipedia entry does a fair job of explaining what each means. It is
> forward progress in a
> bounded number of steps, rather than "fairness", that is key. You can be 
> extremely unfair (as in making forward progress at 1,000,000,000:1 ratios) 
> and wait-free at the same time.
>
> There is a difference between the wait-free/lock-free classification of an 
> operation/instruction and the classification of a mechanism/algorithm using 
> that operation/instruction...
>
> Both the LOCK XADD operation and LOCK CAS operations are wait free. There 
> is nothing any other processor can do to prevent your processor from making 
> forward progress and eventually (in a bounded number of steps) executing
> either instruction. In fact, [at the very least] all user-mode x86 CPU 
> instructions are trivially wait free, otherwise very bad things could be 
> done by user-mode code...
>
> The key difference between a LOCK XADD and a LOCK CAS is that the LOCK 
> XADD will actually and unconditionally force a visible change to the shared 
> memory state in a way that no other processor can prevent, while the shared 
> memory effect of a LOCK CAS operation is conditional, and will only occur
> if the compare shows that the expected value and the memory value are the 
> same. Simple (but by no means all) uses of CAS often perform retry loops in 
> lock-free, but not wait-free, mechanisms. 
>
> It is easy to see how one can use LOCK XADD operations to build wait-free 
> mechanisms. E.g. implementing a wait-free atomic counter using XADD is 
> trivial. It is similarly useful in more complicated synchronization
> primitives, like e.g. my WriterReaderPhaser, which is wait free on
> architectures that support atomic increment operations, but (in my current
> implementation) is only lock free on architectures that force us to resort
> to a CAS. It is somewhat harder to construct wait-free mechanisms using
> LOCK CAS, but it is by no means impossible, and usually just means some
> extra complexity and spending some more memory per concurrent thread or
> CPU. E.g. see "Wait-Free Queues With Multiple Enqueuers and Dequeuers",
> where wait-free queues are implemented using CAS constructs.
>
> On Friday, August 24, 2018 at 10:00:22 PM UTC-7, Peter Veentjer wrote:
>>
>> I'm polishing up my lock-free knowledge and one question keeps on bugging
>> me.
>>
>> The question is about the LOCK XADD and why it is considered to be wait
>> free.
>>
>> AFAIK for wait freedom there needs to be some fairness.
>>
>> So imagine a concurrent counter using a spin on a CAS to increment a
>> counter: at least 1 thread wins and makes progress, therefore this
>> implementation is lock free. It isn't wait free because some thread could
>> be spinning without bound. The problem here is that there is no fairness.
>>
>> Now imagine this counter were implemented using a LOCK XADD, so there is
>> no need for a loop. What guarantees that every thread is going to make
>> progress in a bounded number of steps? E.g. could it happen that one
>> thread is continuously denied exclusive access to the memory and therefore
>> won't complete in a bounded number of steps? Or do the requests get stored
>> in some kind of list and processed in that order? Such an order would
>> provide fairness, and then it would be wait free.
>>
>> I have been looking at the Intel documentation of LOCK XADD, but it
>> remains unclear.
>>
>
