SuperServer could hang when changing physical backup state under high load
--------------------------------------------------------------------------
                 Key: CORE-5613
                 URL: http://tracker.firebirdsql.org/browse/CORE-5613
             Project: Firebird Core
          Issue Type: Bug
          Components: Engine
    Affects Versions: 4.0 Alpha 1, 3.0.2, 3.0.1, 3.0.0
            Reporter: Vlad Khorsun

The issue was detected while testing nbackup during a TPCC run with 64 concurrent connections. The engine could hang immediately after begin\end backup, i.e. after a physical state change.

A few threads wait indefinitely in RWLock::beginRead() for BackupManager::localStateLock. The wait can never succeed, as there is no owner of localStateLock to release it. Also, the lock value is -1, which should never happen. All other threads wait for bdb latches already acquired by the threads above.

The problem is caused by a race condition:
- the backup thread acquires localStateLock in Write mode (see BackupManager::StateWriteGuard) and sets the TDBB_backup_write_locked flag (see BackupManager::lockStateWrite), then it marks the header page and sets the BDB_nbak_state_lock flag on its BufferDesc
  note, this mark does not acquire localStateLock in Read mode because of BDB_nbak_state_lock (see CCH\set_diff_page() and BackupManager::lockStateRead)
  then the backup thread releases the header page (it does not release localStateLock)
- another thread commits and flushes dirty pages; it writes the dirty header page and releases localStateLock (see CCH\clear_dirty_flag_and_nbak_state), because the BufferDesc has the BDB_nbak_state_lock flag set and its tdbb is not marked with the TDBB_backup_write_locked flag
- the backup thread releases localStateLock in Write mode (see ~StateWriteGuard)

I.e. we have an excess RWLock::endRead call, which breaks the lock state and leads to the hang.

To trigger the problem, transactions must be very short, so that one fits (from start to finish) into the small time window between the backup thread's release of the header page and its release of localStateLock.
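The interleaving described above can be replayed deterministically with a toy model. This is only a sketch: ToyRWLock, ToyThread, ToyPage and the helper functions are hypothetical stand-ins, not Firebird classes, and the counter convention (0 = free, positive = reader count, a large negative offset = one writer) is an assumption chosen for illustration. The steps run on one thread in the order the race would produce them, so nothing actually blocks.

```cpp
#include <cassert>

// Toy model only: hypothetical stand-ins, not Firebird's RWLock or BufferDesc.
// Assumed counter convention: 0 = free, N > 0 = N readers,
// -kWriterOffset = one writer holds the lock.
constexpr long kWriterOffset = 50000;

struct ToyRWLock {
    long value = 0;
    void beginWrite() { assert(value == 0); value -= kWriterOffset; }
    void endWrite()   { value += kWriterOffset; }
    // A reader may enter only while the counter is non-negative.
    bool tryBeginRead() { if (value < 0) return false; ++value; return true; }
    // Plain decrement with no ownership check.
    void endRead()    { --value; }
};

struct ToyThread { bool backupWriteLocked = false; };  // models TDBB_backup_write_locked
struct ToyPage   { bool nbakStateLock = false; };      // models BDB_nbak_state_lock

// Marking a page: the page flag is set even when the read lock is skipped.
void markPage(ToyRWLock& lock, const ToyThread& t, ToyPage& p) {
    if (!t.backupWriteLocked)
        (void) lock.tryBeginRead();
    p.nbakStateLock = true;
}

// Flushing a page: releases a read lock if the page flag is set and the
// flushing thread is not flagged as the write-lock holder.
void flushPage(ToyRWLock& lock, const ToyThread& t, ToyPage& p) {
    if (p.nbakStateLock && !t.backupWriteLocked)
        lock.endRead();
    p.nbakStateLock = false;
}

// Replays the three steps of the race in order and returns the final lock.
ToyRWLock simulateRace() {
    ToyRWLock lock;
    ToyThread backup, other;
    ToyPage header;

    lock.beginWrite();                // backup thread: value = -50000
    backup.backupWriteLocked = true;
    markPage(lock, backup, header);   // no read lock taken, flag set anyway
    // backup thread releases the header page, still holding the write lock

    flushPage(lock, other, header);   // other thread: excess endRead, value = -50001

    backup.backupWriteLocked = false;
    lock.endWrite();                  // backup thread: value = -1
    return lock;
}
```

In this model the lock returned by simulateRace() has value -1 and tryBeginRead() on it fails from then on, although no thread holds the lock. That mirrors the reported symptom of readers waiting on a lock with no owner.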
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel