Kristian, thanks for your answers! My comments inline.
On Thu, Nov 06, 2014 at 06:48:18PM +0100, Kristian Nielsen wrote:
> Sergey Vojtovich <[email protected]> writes:
> 
> >> # First spin wait, hoping to get the mutex without yielding.
> >> os_compare_and_swap_ulint(&mutex->waiters, 0, 1);
> > Where did you see "os_compare_and_swap_ulint(&mutex->waiters, 0, 1)"?
> > FWICS waiters of InnoDB mutex are non-atomic loads/stores.
> 
> mutex_enter_func() calls mutex_spin_wait() which does
> mutex_set_waiters(mutex, 1):
> 
>     mutex_set_waiters(mutex_t *mutex, ulint n) {
>     #ifdef INNODB_RW_LOCKS_USE_ATOMICS
>         if (n) {
>             os_compare_and_swap_ulint(&mutex->waiters, 0, 1);
>         } else {
>             os_compare_and_swap_ulint(&mutex->waiters, 1, 0);
>         }
> 
> So that is where I saw it. I do not understand what you mean with
> "waiters of InnoDB mutex are non-atomic loads/stores" ?
Never seen this before. InnoDB of 5.5 and InnoDB/XtraDB of 10.0 both have:

    UNIV_INTERN
    void
    mutex_set_waiters(
    /*==============*/
        ib_mutex_t*  mutex,  /*!< in: mutex */
        ulint        n)      /*!< in: value to set */
    {
        volatile ulint*  ptr;  /* declared volatile to ensure that
                               the value is stored to memory */
        ut_ad(mutex);

        ptr = &(mutex->waiters);

        os_wmb;

        *ptr = n;  /* Here we assume that the write of a single
                   word in memory is atomic */
    }

But I confirm that I see this in XtraDB of 5.5.

> >> os_compare_and_swap_ulint() is defined as either
> >> __sync_bool_compare_and_swap() or atomic_cas_ulong(), both of which
> >> implies a full barrier. So ENTER looks ok.
> > See above: waiters loads/stores are not atomic, so no memory barriers at
> > all.
> 
> Well, the code is hopelessly complicated with various #ifdef cases. I do not
> claim to have verified all cases that they have the correct barrier for ENTER.
> 
> However, I found that in my particular build, mutex_set_waiters used the
> INNODB_RW_LOCKS_USE_ATOMICS case, which uses os_compare_and_swap_ulint().
> And that is defined in os0sync.h as either __sync_bool_compare_and_swap(),
> which is documented as having a full barrier; or atomic_cas_ulong(), which I
> think is MySQL's own atomics, which on x86 were implemented using a LOCK
> prefix last time I looked, so also a full barrier.
> 
> Why do you think there is no full barrier? Did I miss something?
You're right. I didn't check XtraDB of 5.5, I checked InnoDB instead.

> >> This is consistent with the observations in MDEV-7026. A thread is waiting
> >> for wakeup on a mutex that is already marked free.
> > Ok, so you say this hang is now repeatable with x86 because we changed
> > ACQUIRE -> RELEASE. I assume you were actually able to reproduce it on x86.
> 
> I have not tried reproducing on x86.
> 
> Note that on x86 we changed it from a full barrier (XCHG instruction) to
> acquire (which does nothing if my test program is to be believed). The
> documentation only guarantees ACQUIRE for __sync_lock_test_and_set(), true,
> but in reality it seems to have been a full barrier on x86, which would
> explain why it worked before.
> 
> I am speculating that some of the hangs that started to appear recently could
> be due to this change. At least one case, that I researched in considerable
> depth, can be explained adequately in this way. It seems likely that this is
> the explanation, but I cannot be sure.
Agree. Hangs can happen in the following situations:
- the error monitor thread is not started (that's mostly the startup case)
- the error monitor thread is stuck itself (can happen at any time)

> > Note that we also added a mysterious os_isync to mutex_exit_func, which is
> > supposed to be among other things ACQUIRE. But on x86 it is a no-op with a
> > comment:
> > /* Performance regression was observed at some conditions for Intel
> > architecture. Disable memory barrier for Intel architecture for now. */
> 
> I saw the added isync. I think this is also wrong.
> isync is about speculative
> execution; one can use it with a conditional instruction to prevent
> instructions being executed ahead of the completion of a load. But the place I
> saw, it was used incorrectly, without the conditional instruction. I did not
> dig deeper into this though, as even if used correctly it would still not be
> the full barrier that seems to be required here.
Agree. What we basically do is:

    lock_word= 0;  // atomic store with release barrier
    isync;
    if (waiters)   // non-atomic load

Here we need to ensure that the store completes before the load. But:
- there is no isync on x86
- isync doesn't guarantee StoreLoad order (there are many articles about this)

So a full memory barrier seems to be the only option.

> > Does it explain server lockups at startup or during normal operations?
> > Could you elaborate the problem you're talking about?
> 
> So my guess is that on x86, the full barriers are in place, and the mutex code
> works without losing wakeups. However, at some point there was some problem,
> maybe some specific compiler or architecture or whatever, and the
> sync_arr_wake_threads_if_sema_free() hack was implemented. I am guessing that
> things work correctly on x86 because the sync_arr_wake_threads_if_sema_free()
> is fundamentally broken. It is not active during startup (and maybe
> shutdown?). And it does not work if the server error monitor thread happens to
> also get stuck on a mutex. So I think missing barriers on x86 would have been
> discovered in commonly used binaries. And the
> sync_arr_wake_threads_if_sema_free() has been able to hide problems on rare
> architectures. That is my guess.
We're on the same wavelength.

> As to the concrete problems:
> 
> 1. MDEV-7026: there is a lockup during startup. I believe this is visible
> because the server error monitor thread is not running at this point.
Agree.

> 2.
> In a user hang I investigated in detail, a thread was waiting on
> log_sys->mutex, but no other thread was holding that mutex (consistent with a
> lost wakeup). The server error monitor thread was itself stuck on a mutex,
> which would prevent sync_arr_wake_threads_if_sema_free() from hiding the
> problem.
Strange... Monty should have fixed this. The error monitor thread should call
log_get_lsn_nowait(), which basically does a trylock. Do you happen to have a
call trace?

> > with Power8. I'd vote for putting a full memory barrier into
> > mutex_exit_func() and probably removing that ugly hack with
> > sync_arr_wake_threads_if_sema_free. :(
> 
> Yes, I agree, that is what I meant.
> 
> I agree that it was probably broken on PowerPC also before the change. But it
> looked to me like we had full barriers on x86 before the change (even if it
> was not guaranteed by documentation).
Correct. It was dead-broken on PPC and it had a barrier on x86 before.

> > Though a proper solution should never issue full memory barriers for mutex
> > lock/unlock. I suggested a different idea, but that's for 10.1:
> 
> This sounds very strange. Are you sure?
> Normally, mutex lock/unlock is assumed to be a full barrier. If my reading of
> the code is correct, this is also how it was implemented on x86. It seems
> _very_ likely that part of the InnoDB code relies on such full barriers. It
> does not seem safe to me to change it.
> 
> Posix thread operations are for example defined to imply a full barrier.
> 
> Or do you mean that there should be _one_ full barrier implied in
> mutex_enter(), and one in mutex_exit()? But there are currently more than one,
> and this should be optimised? That would make sense. Though my personal
> opinion is that the InnoDB mutex implementation has always been a piece of
> garbage, and rather than mess more with it, it should be replaced with
> something simpler and better (isn't this what the InnoDB team is already
> working on?).
Do you have a link to articles that define a full memory barrier for mutex
lock/unlock? That was probably quite outdated writing. See e.g.:
http://en.cppreference.com/w/cpp/atomic/memory_order

...Atomic load with memory_order_acquire or stronger is an acquire operation.
The lock() operation on a Mutex is also an acquire operation...

...Atomic store with memory_order_release or stronger is a release operation.
The unlock() operation on a Mutex is also a release operation...

Full memory barriers are way too heavy even for mutexes. All we need to do is
ensure that:
- all loads following lock are performed after lock (acquire)
- all stores preceding unlock are completed before unlock (release)

Regards,
Sergey

_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp

