Re: [lock-free] Re: TensorFlow scheduler

2019-03-06 Thread Dmitry Vyukov
On Wed, Mar 6, 2019 at 7:11 AM Chris M. Thomasson wrote:
>>>
>>> Hi,
>>>
>>> TensorFlow CPU task scheduler I wrote some time ago:
>>>
>>> https://bitbucket.org/eigen/eigen/src/9fdbc9e653c1adc64db1078fd2ec899c1e33ba0c/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h?at=default=file-view-default
>>>
>>> https://bitbucket.org/eigen/eigen/src/9fdbc9e653c1adc64db1078fd2ec899c1e33ba0c/unsupported/Eigen/CXX11/src/ThreadPool/RunQueue.h?at=default=file-view-default
>>>
>>> https://bitbucket.org/eigen/eigen/src/9fdbc9e653c1adc64db1078fd2ec899c1e33ba0c/unsupported/Eigen/CXX11/src/ThreadPool/EventCount.h?at=default=file-view-default
>>>
>>> This and some other fixes improved model training time on CPU by up to 3x.
>>> There is some interesting lock-free stuff in there.
>>> [...]
>>
>>
>> The code is a work of art: Well done Dmitry!

Thanks!

> Love the:
>
>   Work PushFront(Work w) {
>     unsigned front = front_.load(std::memory_order_relaxed);
>     Elem* e = &array_[front & kMask];
>     uint8_t s = e->state.load(std::memory_order_relaxed);
>     if (s != kEmpty ||
>         !e->state.compare_exchange_strong(s, kBusy, std::memory_order_acquire))
>       return w;
>     front_.store(front + 1 + (kSize << 1), std::memory_order_relaxed);
>     e->w = std::move(w);
>     e->state.store(kReady, std::memory_order_release);
>     return Work();
>   }
>
> part. It reminds me of the same basic logic in your mpmc queue:
>
> http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue
>
> Ahh, I still like my little tweak of your most excellent algorithm:
>
> https://groups.google.com/d/topic/lock-free/acjQ3-89abE/discussion
>
> ;^)

IIRC, once a thread has executed the XADD it has no way back, which implies
blocking/waiting on an empty/full queue.
Here I needed "try" versions because this is not a producer-consumer
scenario and we don't need back-pressure in a cooperative task-execution
scenario. If we fail to pop from one queue, we go and try to pop from
another. If we fail to push onto a queue, we can push onto another or
execute the task directly.
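
A minimal sketch of how such "try" operations compose in a scheduler. The
BoundedQueue below is a trivial mutex-based stand-in (the real RunQueue is
the lock-free, array-based deque linked above), and Schedule/FindWork are
illustrative names rather than the actual TF/Eigen code:

    #include <cstddef>
    #include <deque>
    #include <functional>
    #include <mutex>

    using Work = std::function<void()>;

    // Stand-in for a bounded per-thread queue with non-blocking "try" ops:
    // a failed TryPush reports failure instead of waiting for space, a
    // failed TryPop reports failure instead of waiting for work.
    class BoundedQueue {
     public:
      explicit BoundedQueue(size_t cap) : cap_(cap) {}
      bool TryPush(Work w) {
        std::lock_guard<std::mutex> lock(mu_);
        if (q_.size() == cap_) return false;  // full: caller falls back
        q_.push_back(std::move(w));
        return true;
      }
      bool TryPop(Work* w) {
        std::lock_guard<std::mutex> lock(mu_);
        if (q_.empty()) return false;  // empty: caller tries another queue
        *w = std::move(q_.front());
        q_.pop_front();
        return true;
      }
     private:
      std::mutex mu_;
      std::deque<Work> q_;
      size_t cap_;
    };

    // Submitting work: try the local queue, then the others, then run inline.
    // No back-pressure is needed since the submitter can always make progress.
    void Schedule(std::deque<BoundedQueue>& queues, size_t local, Work w) {
      if (queues[local].TryPush(w)) return;
      for (size_t i = 0; i < queues.size(); ++i)
        if (i != local && queues[i].TryPush(w)) return;
      w();  // every queue is full: execute the task directly
    }

    // A worker that fails to pop locally simply tries to steal from the others.
    bool FindWork(std::deque<BoundedQueue>& queues, size_t local, Work* w) {
      if (queues[local].TryPop(w)) return true;
      for (size_t i = 0; i < queues.size(); ++i)
        if (i != local && queues[i].TryPop(w)) return true;
      return false;  // nothing anywhere; caller can spin or park on an EventCount
    }

The point is the control flow on failure: neither the submitter nor the
worker ever has to wait on any single queue, which is exactly why blocking
push/pop (and hence back-pressure) is unnecessary here.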



Re: [lock-free] Re: TensorFlow scheduler

2019-03-06 Thread Dmitry Vyukov
On Fri, Mar 1, 2019 at 8:27 AM Chris M. Thomasson wrote:
>
> On Wednesday, February 27, 2019 at 1:35:04 AM UTC-8, Dmitry Vyukov wrote:
>>
>> Hi,
>>
>> TensorFlow CPU task scheduler I wrote some time ago:
>>
>> https://bitbucket.org/eigen/eigen/src/9fdbc9e653c1adc64db1078fd2ec899c1e33ba0c/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h?at=default=file-view-default
>>
>> https://bitbucket.org/eigen/eigen/src/9fdbc9e653c1adc64db1078fd2ec899c1e33ba0c/unsupported/Eigen/CXX11/src/ThreadPool/RunQueue.h?at=default=file-view-default
>>
>> https://bitbucket.org/eigen/eigen/src/9fdbc9e653c1adc64db1078fd2ec899c1e33ba0c/unsupported/Eigen/CXX11/src/ThreadPool/EventCount.h?at=default=file-view-default
>>
>> This and some other fixes improved model training time on CPU by up to 3x.
>> There is some interesting lock-free stuff in there.
>>
>> The main design goal is to tailor the pieces to the specific problem at
>> hand and be practical, e.g. lock-free where it matters, mutex-based
>> otherwise. A sane level of complexity (we need to get it right). Not
>> trading practical properties that matter (array-based queue) for formal
>> properties (lock-freedom).
>>
>> EventCount (aka "condvar for lock-free algos") has fast common paths
>> and minimizes contention as much as possible. Due to non-uniform/bursty
>> work, blocking/unblocking threads all the time actually turned out to
>> be one of the most critical parts of scheduling in TF.
>
>
> Gotta love the eventcount. Fwiw, I remember way back when I created one in 
> 2005:
>
> https://groups.google.com/d/topic/comp.programming.threads/qoxirQbbs4A/discussion
>
> Actually, iirc, Joe Seigh created a very interesting bakery-style rw
> spinlock; I need to find it. Iirc, it can be outfitted with an eventcount
> to remove the spins.
>
>
>>
>>
>> RunQueue (work-stealing deque) allows all operations from both sides,
>> as work can be submitted from external threads in TF. Owner ops are
>> lock-free, remote ops use a mutex. This part was somewhat unique
>> compared to similar systems. Getting Size right is tricky!
>>
>> ThreadPool (the scheduler itself) is a fairly standard design based on
>> distributed runqueues, though you can find some interesting tricks wrt
>> stealing order there. Also steal partitions (not done by me). The
>> spinning logic required quite some tuning to balance latency against
>> wasted CPU.
>
>
> Nice! Btw, did you ever find a use for work requesting?
>
> Something like:
>
> https://groups.google.com/d/topic/comp.programming.threads/YBnjd-Sqc-w/discussion


Not any real one...
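
As a footnote on the EventCount ("condvar for lock-free algos") discussed
above: the essence is a two-phase wait protocol that lets a thread re-check
a lock-free data structure between announcing its intent to sleep and
actually sleeping, so notifications are never lost. Below is a toy
mutex/condvar stand-in, not the lock-free Eigen implementation; the
Prewait/CommitWait/CancelWait/Notify names mirror the spirit of the real
thing, but the signatures here are illustrative:

    #include <condition_variable>
    #include <cstdint>
    #include <mutex>

    class ToyEventCount {
     public:
      // Announce intent to wait and capture the current notification epoch.
      uint64_t Prewait() {
        std::lock_guard<std::mutex> l(mu_);
        ++waiters_;
        return epoch_;
      }
      // Give up waiting because the re-check found work after all.
      void CancelWait() {
        std::lock_guard<std::mutex> l(mu_);
        --waiters_;
      }
      // Block until somebody notifies after our Prewait.
      void CommitWait(uint64_t ticket) {
        std::unique_lock<std::mutex> l(mu_);
        while (epoch_ == ticket) cv_.wait(l);
        --waiters_;
      }
      // Wake waiters, if any (a real implementation checks for waiters with
      // a cheap atomic load before touching any heavy machinery).
      void Notify(bool all) {
        std::lock_guard<std::mutex> l(mu_);
        if (waiters_ == 0) return;
        ++epoch_;
        if (all) cv_.notify_all(); else cv_.notify_one();
      }
     private:
      std::mutex mu_;
      std::condition_variable cv_;
      uint64_t epoch_ = 0;
      int waiters_ = 0;
    };

    // Typical worker-side usage around a lock-free queue:
    //   if (queue.TryPop(&w)) { run(w); continue; }
    //   uint64_t t = ec.Prewait();
    //   if (queue.TryPop(&w)) { ec.CancelWait(); run(w); continue; }
    //   ec.CommitWait(t);  // sleep until a producer calls ec.Notify(...)

The producer side pushes first and calls Notify afterwards; because the
consumer re-checks the queue between Prewait and CommitWait, a push that
races with the consumer going to sleep is always observed either by the
re-check or by the notification.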
