The branch, lock-wait has been created
        at  47d63bc9cf3885c16eec7e7f2cdef27f8b6c4e93 (commit)

- Log -----------------------------------------------------------------
commit 47d63bc9cf3885c16eec7e7f2cdef27f8b6c4e93
Author: Ronnie Sahlberg <ronniesahlb...@gmail.com>
Date:   Tue Dec 15 21:28:23 2009 +1100

    initial support for a lockdown protocol.
    
    We use a dummy file and byte-range locks on this file to make records
    sticky.
    
    With the current design, very hot database records may migrate between
    the nodes faster than the client applications can access the data.
    That is, once a client application has requested a record and asked
    ctdbd to migrate it onto the node, the record might be migrated off the
    node again before the client gets a chance to access it.
    
    This can now be prevented by using a "pindown" mechanism.
    This pindown mechanism is implemented using fcntl() locks on a file
    shared between ctdbd and the clients on the local node.
    
    Records that are pinned down cannot be migrated off the node by ctdbd.
    Instead, any such request is blocked until the pindown disappears.
    Records can, however, be migrated onto the node while there is an active
    pin-down.
    
    A client can set a pindown on a record even before it asks for the record
    to be migrated onto the node. The effect is that the record is fetched
    and then remains pinned down on the node until the client has finished
    processing it.
    
    Multiple clients can pin down the same record, in which case the record
    remains on the local node until all clients have released their pin-down.
    
    Client pindown is implemented by a read-lock on the pindown file.
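
    A minimal sketch of the client side, assuming the pindown file is already
    open and that each record maps to a one-byte region of it. The mapping
    (an FNV-1a hash of the key) and the names pin_offset(), record_pin() and
    record_unpin() are hypothetical and only illustrate the idea; the commit
    message does not say how regions are chosen.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical mapping from a record key to a byte offset in the
 * pindown file (FNV-1a hash, masked to a non-negative offset). */
off_t pin_offset(const unsigned char *key, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= key[i];
        h *= 16777619u;
    }
    return (off_t)(h & 0x7fffffffu);
}

/* Apply a lock of the given type to the one-byte region for this key.
 * F_SETLKW blocks until the lock is granted; retry if a signal
 * interrupts the wait. */
static int pin_lock_op(int fd, off_t off, short type)
{
    struct flock fl = {0};
    int rc;

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = off;
    fl.l_len = 1;

    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Pin: take a shared (read) lock, so several clients can pin at once. */
int record_pin(int fd, const unsigned char *key, size_t len)
{
    return pin_lock_op(fd, pin_offset(key, len), F_RDLCK);
}

/* Unpin: drop this process's lock on the region. */
int record_unpin(int fd, const unsigned char *key, size_t len)
{
    return pin_lock_op(fd, pin_offset(key, len), F_UNLCK);
}
```

    fcntl() locks may cover bytes beyond the end of the file, so the dummy
    file can stay empty. Because distinct keys can hash to the same offset in
    this sketch, a collision would only cause an unrelated record to be
    treated as pinned.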
    
    ctdbd tries to take a write-lock on the same region of the pindown file
    when determining whether a migrate-off-node request should be allowed or
    postponed until all clients have finished.
    
    In short: clients use read-locks to pin the record down, and ctdbd will
    not allow the record to be migrated off the node until it can take out a
    write-lock.
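
    The daemon-side check could be sketched as below, assuming a
    non-blocking F_SETLK attempt on the record's one-byte region. The
    function names migrate_off_try() and migrate_off_done() are hypothetical,
    and how ctdbd queues a postponed request is not shown.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Non-blocking lock operation on a one-byte region of the pindown file. */
static int region_lock(int fd, off_t off, short type)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = off;
    fl.l_len = 1;
    return fcntl(fd, F_SETLK, &fl);
}

/* Returns  1 if the write-lock was taken: no client holds a pin, so the
 *            record may be migrated off the node.
 * Returns  0 if another process holds a conflicting lock: the record is
 *            pinned and the request should be postponed.
 * Returns -1 on any other error. */
int migrate_off_try(int fd, off_t off)
{
    if (region_lock(fd, off, F_WRLCK) == 0) {
        return 1;
    }
    if (errno == EAGAIN || errno == EACCES) {
        return 0;
    }
    return -1;
}

/* Release the write-lock once the migration decision has been acted on. */
int migrate_off_done(int fd, off_t off)
{
    return region_lock(fd, off, F_UNLCK);
}
```

    fcntl() locks are owned per process, so the write-lock only conflicts
    with read-locks held by other processes, which is the case for ctdbd and
    its clients.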
    
    Since this requires two extra trips to the kernel and back for the
    clients, clients may try a cheaper fetch without the pin-lock for the
    first few iterations of the fetch-lock loop and only involve the heavier
    pin-down once this has failed a few times for the record. This keeps
    non-contended records as fast as possible while still allowing a client
    to use the slightly heavier pin-down when it gets tired of waiting.
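
    One way to picture this escalation, as a sketch only: the threshold
    CHEAP_ATTEMPTS, the struct fetch_ops callbacks and the demo_* stand-ins
    are all hypothetical and are not part of ctdb. The loop tries the plain
    fetch a few times and takes the pin-down only after those attempts fail.

```c
#include <stdbool.h>

#define CHEAP_ATTEMPTS 3 /* hypothetical: unpinned tries before escalating */

/* Hypothetical callbacks standing in for the real client operations. */
struct fetch_ops {
    bool (*try_fetch)(void *ctx); /* true once the record is local and locked */
    int (*pin)(void *ctx);        /* take the pin-down (read-lock) */
    int (*unpin)(void *ctx);      /* release the pin-down */
};

/* Returns 0 on success, -1 on failure. On success, *pinned tells the
 * caller whether it must unpin after processing the record. */
int fetch_lock_escalating(const struct fetch_ops *ops, void *ctx,
                          int max_attempts, bool *pinned)
{
    *pinned = false;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        /* Escalate once the cheap attempts are used up. */
        if (attempt >= CHEAP_ATTEMPTS && !*pinned) {
            if (ops->pin(ctx) != 0) {
                return -1;
            }
            *pinned = true;
        }
        if (ops->try_fetch(ctx)) {
            return 0;
        }
    }
    if (*pinned) {
        ops->unpin(ctx);
        *pinned = false;
    }
    return -1;
}

/* Stand-in record for illustration: the fetch fails `busy` times. */
struct demo_record {
    int busy;
    int pin_calls;
};

static bool demo_try_fetch(void *ctx)
{
    struct demo_record *r = ctx;
    if (r->busy > 0) {
        r->busy--;
        return false;
    }
    return true;
}

static int demo_pin(void *ctx)
{
    ((struct demo_record *)ctx)->pin_calls++;
    return 0;
}

static int demo_unpin(void *ctx)
{
    (void)ctx;
    return 0;
}

const struct fetch_ops demo_ops = { demo_try_fetch, demo_pin, demo_unpin };
```

    With these stand-ins, a record that is busy once is fetched without ever
    touching the pindown file, while a record that is busy five times is
    pinned exactly once, on the fourth attempt.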
    
    Clients using pin-down can coexist with, and access the same data and
    records as, clients not using pin-down.

-----------------------------------------------------------------------


-- 
CTDB repository
