SuperServer could hang when changing physical backup state under high load
--------------------------------------------------------------------------

                 Key: CORE-5613
                 URL: http://tracker.firebirdsql.org/browse/CORE-5613
             Project: Firebird Core
          Issue Type: Bug
          Components: Engine
    Affects Versions: 4.0 Alpha 1, 3.0.2, 3.0.1, 3.0.0
            Reporter: Vlad Khorsun


The issue was detected while testing nbackup during a TPCC run with 64 concurrent 
connections.
The engine could hang immediately after begin/end backup, i.e. after a physical state 
change.
A few threads wait indefinitely in RWLock::beginRead() for 
BackupManager::localStateLock.
The wait can never succeed because localStateLock has no owner.
Also, the lock value is -1, which should never happen.
All other threads wait for bdb latches already acquired by the threads above.

The problem is caused by a race condition: 

- the backup thread acquires localStateLock in Write mode (see 
BackupManager::StateWriteGuard) and sets the TDBB_backup_write_locked flag (see 
BackupManager::lockStateWrite), 
then it marks the header page and sets the BDB_nbak_state_lock flag on its BufferDesc.
Note that this mark does not acquire localStateLock in Read mode because of 
BDB_nbak_state_lock (see CCH\set_diff_page() and BackupManager::lockStateRead).
Then the backup thread releases the header page (it does not release localStateLock)

- another thread commits and flushes dirty pages; it writes the dirty header page and 
releases localStateLock (see CCH\clear_dirty_flag_and_nbak_state),
because the BufferDesc has the BDB_nbak_state_lock flag set and the tdbb is not marked 
with the TDBB_backup_write_locked flag

- the backup thread releases localStateLock in Write mode (see ~StateWriteGuard) 

I.e. there is an excess RWLock::endRead call, which corrupts the lock state and leads to 
the hang.
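
The sequence above can be replayed with a small single-threaded toy model. This 
is only a sketch, not Firebird code: ToyRWLock, ToyThread, ToyBuffer, 
toyClearNbakState, simulateRace and the WRITER_OFFSET value are all hypothetical 
names introduced for illustration. It assumes a lock whose state is one signed 
counter (positive = readers, zero = free, negative = writer present), which is 
consistent with the observed value of -1 but is not taken from the RWLock source.

```cpp
#include <cassert>

// Toy model only: none of this is Firebird source code.
// Assumption: one signed counter encodes the lock state,
// >0 = number of readers, 0 = free, <0 = writer present.
struct ToyRWLock {
    static constexpr int WRITER_OFFSET = 50000; // hypothetical value
    int state = 0;

    bool tryBeginRead() {
        if (state < 0) return false;  // looks write-locked: a reader must wait
        ++state;
        return true;
    }
    void endRead() { --state; }       // no check that the caller holds a read lock

    bool tryBeginWrite() {
        if (state != 0) return false; // readers or a writer present
        state -= WRITER_OFFSET;
        return true;
    }
    void endWrite() { state += WRITER_OFFSET; }
};

// Hypothetical stand-ins for TDBB_backup_write_locked and BDB_nbak_state_lock.
struct ToyThread { bool backupWriteLocked = false; };
struct ToyBuffer { bool nbakStateLock = false; };

// Models the release rule from the report: the read lock is released when the
// buffer carries the flag and the calling thread is not the write-lock holder.
// (Clearing the buffer flag here is an assumption of the model.)
void toyClearNbakState(ToyRWLock& lock, const ToyThread& t, ToyBuffer& b) {
    if (b.nbakStateLock) {
        b.nbakStateLock = false;
        if (!t.backupWriteLocked)
            lock.endRead();
    }
}

// Replays the three steps in order and returns the final lock state.
int simulateRace(ToyRWLock& lock) {
    ToyThread backup, committer;
    ToyBuffer header;

    // Step 1: backup thread takes the write lock and marks the header page.
    bool ok = lock.tryBeginWrite();
    assert(ok);
    (void) ok;
    backup.backupWriteLocked = true;
    header.nbakStateLock = true;      // flag set, but no read lock was taken

    // Step 2: a committing thread writes the header page: excess endRead.
    toyClearNbakState(lock, committer, header);

    // Step 3: backup thread releases the write lock.
    lock.endWrite();
    backup.backupWriteLocked = false;
    return lock.state;
}
```

In this model simulateRace() leaves the counter at -1: nobody holds the lock, 
yet every later tryBeginRead() or tryBeginWrite() fails, which corresponds to 
the ownerless lock and the endless wait described above.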

For the problem to occur, there must be very short transactions that fit (from 
start to finish) into the small time window
between the backup thread releasing the header page and releasing localStateLock.

