Re: [lock-free] Question on relaxed atomic optimizations in MPMC queue

2015-01-28 Thread Dmitry Vyukov
On Thu, Jan 29, 2015 at 1:37 AM, Pedro Ramalhete prama...@gmail.com wrote:
 Hi Dmitry,

 I have a question regarding the MPMC queue on your site
 http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue

 On the enqueue() method, aren't you worried that the store on 'cell->data_'
 gets re-ordered and becomes visible before the
 'enqueue_pos_.compare_exchange_weak()'?
 This would only be a problem if there is a speculative load on 'pos', and if
 this were to happen, then a dequeue() on the previous round of sequences
 could end up getting the new (wrong) data.
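
[For context, the enqueue()/dequeue() pair under discussion looks roughly
like the following sketch, paraphrased from the linked 1024cores page; the
posted version may differ in details:]

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Minimal sketch of the bounded MPMC queue under discussion (paraphrased
// from the 1024cores page linked above; names and details may differ).
template <typename T>
class mpmc_queue {
public:
  // buffer_size must be a power of two
  explicit mpmc_queue(size_t buffer_size)
      : buffer_(new cell_t[buffer_size]), buffer_mask_(buffer_size - 1) {
    for (size_t i = 0; i != buffer_size; ++i)
      buffer_[i].sequence_.store(i, std::memory_order_relaxed);
    enqueue_pos_.store(0, std::memory_order_relaxed);
    dequeue_pos_.store(0, std::memory_order_relaxed);
  }
  ~mpmc_queue() { delete[] buffer_; }

  bool enqueue(T const& data) {
    cell_t* cell;
    size_t pos = enqueue_pos_.load(std::memory_order_relaxed);
    for (;;) {
      cell = &buffer_[pos & buffer_mask_];
      // Acquire load: synchronizes with the releasing store in dequeue().
      size_t seq = cell->sequence_.load(std::memory_order_acquire);
      intptr_t dif = (intptr_t)seq - (intptr_t)pos;
      if (dif == 0) {
        // Claim the slot; on success this thread owns the cell.
        if (enqueue_pos_.compare_exchange_weak(pos, pos + 1,
                                               std::memory_order_relaxed))
          break;
      } else if (dif < 0) {
        return false;  // queue full
      } else {
        pos = enqueue_pos_.load(std::memory_order_relaxed);
      }
    }
    cell->data_ = data;
    // Release store: publishes data_ to the matching acquire in dequeue().
    cell->sequence_.store(pos + 1, std::memory_order_release);
    return true;
  }

  bool dequeue(T& data) {
    cell_t* cell;
    size_t pos = dequeue_pos_.load(std::memory_order_relaxed);
    for (;;) {
      cell = &buffer_[pos & buffer_mask_];
      size_t seq = cell->sequence_.load(std::memory_order_acquire);
      intptr_t dif = (intptr_t)seq - (intptr_t)(pos + 1);
      if (dif == 0) {
        if (dequeue_pos_.compare_exchange_weak(pos, pos + 1,
                                               std::memory_order_relaxed))
          break;
      } else if (dif < 0) {
        return false;  // queue empty
      } else {
        pos = dequeue_pos_.load(std::memory_order_relaxed);
      }
    }
    data = cell->data_;
    // Free the cell for the next round of positions.
    cell->sequence_.store(pos + buffer_mask_ + 1, std::memory_order_release);
    return true;
  }

private:
  struct cell_t {
    std::atomic<size_t> sequence_;
    T data_;
  };
  cell_t* const buffer_;
  size_t const buffer_mask_;
  std::atomic<size_t> enqueue_pos_;
  std::atomic<size_t> dequeue_pos_;
};
```

[The question above is about the plain store to cell->data_ in enqueue()
moving ahead of the compare_exchange_weak on enqueue_pos_.]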

 Here's a more detailed example scenario where the buffer_size is 4:
   cell.sequence_ = [4  1  2  3 ]
   cell.data_ = [d4 d1 d2 d3]
   enqueue_pos_   = 5
   dequeue_pos_   = 2

 Suppose there are three threads, T1, T2 and T3, where T1 and T2 are doing
 enqueue() operations and T3 is calling dequeue().
 Consider the following sequence of events:

 1. T1 calls enqueue(), does a speculative load on 'pos' and it is speculated
 to be 6 and (due to re-ordering) sets cell->data_ at index 2 to d6, and goes
 to sleep:
   cell.sequence_ = [4  1  2  3 ]
   cell.data_ = [d4 d1 d6 d3]
   enqueue_pos_   = 5
   dequeue_pos_   = 2

 2. T3 calls dequeue(), increments dequeue_pos_ to 3 and returns the data at
 index 2, which is (incorrectly) d6:
   cell.sequence_ = [4  1  2  3 ]
   cell.data_ = [d4 d1 d6 d3]
   enqueue_pos_   = 5
   dequeue_pos_   = 3

 3. T2 calls enqueue(), increments enqueue_pos_, sets the data and sequence
 in index 2:
   cell.sequence_ = [4  5  2  3 ]
   cell.data_ = [d4 d5 d6 d3]
   enqueue_pos_   = 6
   dequeue_pos_   = 3

 4. T1 wakes up and continues its call to enqueue(): the speculated value 6
 for 'pos' is now confirmed since enqueue_pos_ is 6, so the CAS succeeds and
 increments enqueue_pos_ to 7, and the sequence at index 2 is set to 6:
   cell.sequence_ = [4  5  6  3 ]
   cell.data_ = [d4 d5 d6 d3]
   enqueue_pos_   = 7
   dequeue_pos_   = 3


 A somewhat similar issue may occur on the dequeue() method, where the
 relaxed load on 'pos' becomes a speculative load, and the load on
 'cell->data_' gets re-ordered above the
 'dequeue_pos_.compare_exchange_weak()'. This would cause an old value of
 'cell->data_' to be read, which would be incorrect.

 There is a very interesting paper on this kind of issue:
 http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42967.pdf

 I might have completely missed something obvious, so feel free to prove me
 wrong  :)


 Anyways, I think the idea for this queue is good, and even if my analysis is
 accurate and the current implementation is incorrect, it's just a matter of
 adding the right barriers at the right places and it will become correct.


Hi Pedro,

Accesses to data are synchronized by cell->sequence_.
enqueue_pos_/dequeue_pos_ only arbitrate concurrent enqueue/dequeue
requests.

In your scenario you missed that there is also an acquire load of
cell->sequence_ before the store. The store cannot be speculated ahead
of the load, because that would introduce a data race into a race-free
program.
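
[Dmitry's point can be illustrated with a toy single-cell model, using the
same acquire/release discipline as the queue; the struct and function names
here are made up for this sketch. Until the producer's acquire load of
sequence_ observes the value released by the previous-round consumer, that
consumer may still be reading data_, so a store to data_ hoisted above the
load would be a data race in an otherwise race-free program:]

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Toy model of one queue cell (hypothetical names, for illustration only).
struct cell {
  std::atomic<size_t> sequence_{0};
  int data_ = 0;
};

// Runs one hand-off round and returns the value the previous-round consumer
// observed. Because the consumer's plain read of data_ happens-before its
// release store to sequence_, which synchronizes-with the producer's acquire
// load, the producer's store to data_ cannot race with (or be speculated
// ahead of) the consumer's read.
inline int run_round() {
  cell c;
  c.data_ = 41;  // value written in the "previous round"
  int seen = 0;
  std::thread consumer([&] {
    seen = c.data_;  // plain read of the old value
    // Release: "cell is free"; synchronizes-with the producer's acquire.
    c.sequence_.store(1, std::memory_order_release);
  });
  std::thread producer([&] {
    // Acquire: wait until the old-round consumer is done with the cell.
    while (c.sequence_.load(std::memory_order_acquire) != 1) {}
    c.data_ = 42;  // safe: happens-after the consumer's read
    c.sequence_.store(2, std::memory_order_release);
  });
  consumer.join();
  producer.join();
  return seen;
}
```

[The consumer must always observe the old value, never the next round's.]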

-- 

--- 
You received this message because you are subscribed to the Google Groups 
Scalable Synchronization Algorithms group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to lock-free+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/lock-free/CAEeQi3sozNvJJtB%3D6DoOumybYT3xnTjPfMbVXR5yaMoeAz1bbQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

