Hi all.

After more than a year of development, testing, switching to other tasks and returning back, I'm ready to commit the shared page cache implementation into trunk. I consider it stable enough to be committed into SVN. It has run stability tests for more than 10 hours in the longest run, plus many shorter runs with different configurations.

Here I want to give an overview of the changes: what was tested, what was not, and what is work in progress.

The page cache was re-implemented looking at both the Vulcan code (thanks to Jim) and our code base. It is not a blind copy of the Vulcan code: every decision was thought over and adjusted where needed according to my understanding of and experience with the existing code.

The cache is synchronized at different levels. There are a few BufferControl (BCB) level syncs used to synchronize access to the common structures such as the LRU queue, the precedence graph, the dirty page list and so on. These common syncs should be held for as short a time as possible, as they are a kind of global lock. In addition, there are two syncs for every BufferDesc (BDB): one guards the contents of the page buffer itself, and the second one prevents partial writes of a page whose modification is not yet completed and prevents simultaneous writes of the same page by different threads.

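To give an idea of the layout described above, here is a rough sketch of where those syncs live (the member names are simplified stand-ins; only the split into BCB-level and per-BDB syncs is taken from the actual code):

    // Rough sketch only: SyncObject stands for the class ported from Vulcan,
    // declared here as an empty stand-in so the snippet is self-contained.
    class SyncObject { };

    class BufferControl            // BCB: one per page cache
    {
    public:
        SyncObject bcb_syncLRU;         // guards the LRU queue
        SyncObject bcb_syncPrecedence;  // guards the precedence graph
        SyncObject bcb_syncDirty;       // guards the dirty page list
        // These act as global locks: hold them as briefly as possible.
    };

    class BufferDesc               // BDB: one per page buffer
    {
    public:
        SyncObject bdb_syncPage;    // guards the contents of the page buffer
        SyncObject bdb_syncWrite;   // serializes writes: no partial write of a
                                    // half-edited page, no two threads writing
                                    // the same page at once
    };
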
I won't explain the cache algorithms here; if someone has a question, I'll be happy to answer it. And I'm not going to say that the page cache implementation is frozen - if necessary we will change it, of course (there is still room for improvements and bug fixing).

As you know, I already committed the SyncObject class (and related classes) ported from Vulcan. I removed some unused code and restored fairness of locking: initially it was fair in Vulcan too (i.e. all lock requests were put into a waiting queue and granted in first-in-first-out, or fair, order), but later Jim found some issue with this (unknown to me) and gave preference to SHARED locks. I found no performance issues with fair locking, so I decided to revert that and restore the original behavior. Of course it can be changed if necessary.

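To make the difference concrete, here is a purely illustrative sketch of how the two grant policies could differ (this is not the SyncObject code; the shared-preference variant is my simplified reading of it):

    #include <deque>

    enum SyncType { SYNC_SHARED, SYNC_EXCLUSIVE };
    struct Waiter { SyncType type; /* thread to wake, etc. */ };

    // Fair (restored) policy: grant strictly in FIFO order. Shared waiters
    // at the head may be woken together, but nobody overtakes an exclusive
    // waiter that queued earlier.
    void grantFair(std::deque<Waiter>& queue)
    {
        while (!queue.empty() && queue.front().type == SYNC_SHARED)
            queue.pop_front();          // wake this shared waiter
        // if the head is exclusive, it is woken alone once the lock is free
    }

    // Shared-preference policy (the behaviour I reverted): wake every shared
    // waiter, even those queued behind exclusive ones, which can starve
    // exclusive requests under a heavy shared load.
    void grantSharedPreference(std::deque<Waiter>& queue)
    {
        for (auto it = queue.begin(); it != queue.end(); )
            it = (it->type == SYNC_SHARED) ? queue.erase(it) : ++it;
    }
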
The shared page cache is not an isolated change. It affects many parts of the engine, and our synchronization model has changed significantly again. There was an agreement that we will not implement a shared metadata cache in v3, as this is risky and we could not deliver v3 in a reasonable time frame.

In shared cache mode we have a single Database instance and many Attachment instances linked to it (this is not new). All metadata objects were moved into Attachment. Metadata synchronization is now guarded by the attachment's mutex. Database::SyncGuard and company are replaced by the corresponding Attachment::XXX classes.

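In code this mostly means replacing the database-level guard with an attachment-level one. A minimal sketch of what such a guard does, assuming the attachment simply exposes a recursive mutex (the class and member names here are stand-ins, not the exact declarations):

    #include <mutex>

    struct Attachment
    {
        std::recursive_mutex att_mutex;   // stand-in for the attachment's mutex
    };

    // Before: Database::SyncGuard dsGuard(dbb);
    // Now:    Attachment::SyncGuard guard(attachment);  (roughly)
    class AttachmentSyncGuard
    {
    public:
        explicit AttachmentSyncGuard(Attachment* att) : m_att(att)
        { m_att->att_mutex.lock(); }      // metadata access is serialized here

        ~AttachmentSyncGuard()
        { m_att->att_mutex.unlock(); }

    private:
        Attachment* const m_att;
    };
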
To make ASTs work we need to release the attachment mutex sometimes. This is a very important change compared to v2.5: in v2.5 the attachment mutex is locked for the whole duration of an API call, and no other API call (except the asynchronous fb_cancel_operation) can work with a "busy" attachment. In v3 this rule no longer holds. So now we can run more than one API call on the same attachment (of course not really simultaneously). I'm not sure it is safe, but I haven't disabled it so far.

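The opposite pattern is what lets ASTs in: around a blocking wait the attachment mutex is released and re-acquired afterwards. A sketch, using the same Attachment stand-in as in the previous snippet and a hypothetical class name:

    // Release the attachment mutex for the duration of a blocking wait
    // (page latch, lock request, I/O) so ASTs and other calls can take it.
    class AttachmentCheckout
    {
    public:
        explicit AttachmentCheckout(Attachment* att) : m_att(att)
        { m_att->att_mutex.unlock(); }

        ~AttachmentCheckout()
        { m_att->att_mutex.lock(); }      // re-acquire before continuing

    private:
        Attachment* const m_att;
    };
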
To make asynchronous detach safe, I introduced an att_use_count counter, which is incremented each time an API call is entered and decremented on exit. Detach now marks the attachment as shut down and waits for att_use_count == 0 before proceeding. Parallel access to the attachment could easily be disabled by making every API call wait for att_use_count == 0 on entry, or even by introducing one more mutex to avoid a spin wait. It also seems this counter makes the att_in_use member obsolete, as the detach call should wait for att_use_count == 0 and the drop call should return "object is in use" if att_use_count != 0.

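A minimal sketch of the counter's life cycle, assuming an atomic counter and a shutdown flag (only att_use_count itself is taken from the actual change; the surrounding names are illustrative):

    #include <atomic>
    #include <thread>

    struct Attachment
    {
        std::atomic<int>  att_use_count { 0 };    // API calls currently inside
        std::atomic<bool> att_shutdown  { false };
    };

    // Taken on entry to every API call, released on exit.
    class AttachmentUseGuard
    {
    public:
        explicit AttachmentUseGuard(Attachment* att) : m_att(att)
        { ++m_att->att_use_count; }

        ~AttachmentUseGuard()
        { --m_att->att_use_count; }

    private:
        Attachment* const m_att;
    };

    void detach(Attachment* att)
    {
        att->att_shutdown = true;             // mark the attachment as shut down

        while (att->att_use_count != 0)       // wait for running calls to leave
            std::this_thread::yield();

        // now it is safe to actually detach / tear the attachment down
    }
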
All ASTs related to attachment-level objects should take the attachment mutex before accessing attachment internals. This is implemented but not tested!

The transaction inventory pages cache (TIP cache) was reworked and is now shared by all attachments.

To avoid contention on the common dbb_pool, its usage was replaced by att_pool where possible. To make this task slightly easier, jrd_rel::rel_pool was introduced, which currently points to the attachment's pool. All of a relation's "sub-objects" (such as formats, fields, etc.) are allocated using rel_pool (it was dbb_pool before). When we return metadata objects back to the Database, it will be easy to redirect rel_pool to dbb_pool in one place in the code instead of making tens of small changes again.

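The indirection itself is trivial, which is the whole point. A simplified sketch (MemoryPool and everything except the rel_pool idea are stand-ins):

    struct MemoryPool { };            // stand-in for the engine's pool class

    struct Database   { MemoryPool* dbb_pool; };
    struct Attachment { MemoryPool* att_pool; };

    struct jrd_rel
    {
        MemoryPool* rel_pool;         // formats, fields, etc. are allocated
                                      // from this pool

        void bindPools(Attachment* att, Database* dbb)
        {
            (void) dbb;               // unused for now
            rel_pool = att->att_pool; // today: the attachment's pool; later a
                                      // one-line change can point it back at
                                      // dbb->dbb_pool
        }
    };
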
About stability testing of different parts of the engine:

- page cache - tested and works
- nbackup - tested and works
- monitoring - tested (less thoroughly) and seems to work
- server stop / database shutdown - somewhat tested, no crash observed, client reaction is not perfect (mostly network write errors)
- shadow - not tested
- metadata changes in a concurrent environment - not tested
- garbage collection thread - not tested, needs review and rework of some implementation details
- cache writer thread - not tested, needs review and rework
- external connections - not tested
- external engines - not tested

In the configuration file there are two new settings:
a) SharedCache - a boolean value which controls the page cache mode
b) SharedDatabase - a boolean value which controls the database file open mode.

Currently they are common (as is the whole configuration), but soon they will become per-database settings. Below are a few examples of how they can be used:

- SharedCache = true, SharedDatabase = false (default mode)
  This is the traditional SuperServer mode, where all attachments share the page cache and the database file is opened in exclusive mode (only one server process can work with the database).

- SharedCache = false, SharedDatabase = true
  This is Classic mode, where each attachment has its own page cache and many server processes can work with the same database.
  To run SuperClassic you should use the -m switch on the firebird.exe command line (on Windows) or run fb_smp_server (on POSIX; here I'm not sure and Alex will correct me). Otherwise ClassicServer will run.

- SharedCache = true, SharedDatabase = true
  This is a completely new mode in which the database file can be opened by many server processes and each process can handle many attachments which share a page cache (i.e. a per-process page cache).
  It could be used to run a few SS processes, for example, or to run a "main" SS process while keeping the ability to attach using embedded for some special tasks.
  I must note that our lock manager is not ready to work in this mode, so we can't use it right now.
  Also, it is unknown how performance will be affected when a few SS processes with big caches work with the same database.

- SharedCache = false, SharedDatabase = false
  It looks like a single process with a single attachment will be allowed to work with the database with these settings. Probably you can find an application for it ;)

One more configuration change is that the CpuAffinityMask setting has a new default value of 0. This allows the new SS to use all available CPUs/cores by default.

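For reference, a firebird.conf fragment with the defaults described above would look like this (keeping in mind that for now these are still global, not per-database, settings):

    # Default mode: shared page cache, database file opened in exclusive mode
    SharedCache = true
    SharedDatabase = false

    # 0 = no affinity restriction, use all available CPUs/cores
    CpuAffinityMask = 0
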
Regards,
Vlad