Hi all.

After more than a year of development, testing, switching to other tasks and returning back, I'm ready to commit the shared page cache implementation into trunk. I consider it stable enough to be committed into SVN. It has run stability tests for more than 10 hours in the longest run, plus many shorter runs with different configurations.

Here I want to give an overview of the changes: what was tested, what was not, and what is work in progress.

The page cache was re-implemented looking at both the Vulcan code (thanks to Jim) and our code base. It is not a blind copy of the Vulcan code: every decision was thought over and adjusted where needed according to my understanding of and experience with the existing code.

The cache is synchronized at different levels. There are a few BufferControl (BCB) level syncs used to synchronize access to the common structures such as the LRU queue, the precedence graph, the dirty page list and so on. These common syncs should be held for as short a time as possible, as they are a kind of global lock. In addition, there are two syncs for every BufferDesc (BDB): one guards the contents of the page buffer itself, and the second one prevents partial writes of a page whose modification is not yet completed and prevents simultaneous writes of the same page by different threads.

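To give an idea of the layout described above, here is a rough sketch of where those syncs live (the member names are simplified stand-ins; only the split into BCB-level and per-BDB syncs is taken from the actual code):

    // Rough sketch only: SyncObject stands for the class ported from Vulcan,
    // declared here as an empty stand-in so the snippet is self-contained.
    class SyncObject { };

    class BufferControl            // BCB: one per page cache
    {
    public:
        SyncObject bcb_syncLRU;         // guards the LRU queue
        SyncObject bcb_syncPrecedence;  // guards the precedence graph
        SyncObject bcb_syncDirty;       // guards the dirty page list
        // These act as global locks: hold them as briefly as possible.
    };

    class BufferDesc               // BDB: one per page buffer
    {
    public:
        SyncObject bdb_syncPage;    // guards the contents of the page buffer
        SyncObject bdb_syncWrite;   // serializes writes: no partial write of a
                                    // half-edited page, no two threads writing
                                    // the same page at once
    };
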
I won't explain the cache algorithms here; if someone has a question, I'll be happy to answer it. And I'm not going to say that the page cache implementation is frozen - if necessary we will change it, of course (there is still room for improvements and bug fixing).

As you know, I already committed the SyncObject class (and related classes) ported from Vulcan. I removed some unused code and restored fairness of locking: initially it was fair in Vulcan too (i.e. all lock requests were put into a waiting queue and granted in first-in-first-out, or fair, order), but later Jim found some issue with this (unknown to me) and gave preference to SHARED locks. I found no performance issues with fair locking, so I decided to revert that and restore the original behavior. Of course it can be changed if necessary.

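To make the difference concrete, here is a purely illustrative sketch of how the two grant policies could differ (this is not the SyncObject code; the shared-preference variant is my simplified reading of it):

    #include <deque>

    enum SyncType { SYNC_SHARED, SYNC_EXCLUSIVE };
    struct Waiter { SyncType type; /* thread to wake, etc. */ };

    // Fair (restored) policy: grant strictly in FIFO order. Shared waiters
    // at the head may be woken together, but nobody overtakes an exclusive
    // waiter that queued earlier.
    void grantFair(std::deque<Waiter>& queue)
    {
        while (!queue.empty() && queue.front().type == SYNC_SHARED)
            queue.pop_front();          // wake this shared waiter
        // if the head is exclusive, it is woken alone once the lock is free
    }

    // Shared-preference policy (the behaviour I reverted): wake every shared
    // waiter, even those queued behind exclusive ones, which can starve
    // exclusive requests under a heavy shared load.
    void grantSharedPreference(std::deque<Waiter>& queue)
    {
        for (auto it = queue.begin(); it != queue.end(); )
            it = (it->type == SYNC_SHARED) ? queue.erase(it) : ++it;
    }
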
The shared page cache is not an isolated change. It affects many parts of the engine, and our synchronization model has changed significantly again. There was an agreement that we will not implement a shared metadata cache in v3, as this is risky and we could not deliver v3 in a reasonable time frame.

In shared cache mode we have a single Database instance and many Attachment instances linked to it (this is not new). All metadata objects were moved into Attachment. Metadata synchronization is now guarded by the attachment's mutex. Database::SyncGuard and company are replaced by the corresponding Attachment::XXX classes.

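In code this mostly means replacing the database-level guard with an attachment-level one. A minimal sketch of what such a guard does, assuming the attachment simply exposes a recursive mutex (the class and member names here are stand-ins, not the exact declarations):

    #include <mutex>

    struct Attachment
    {
        std::recursive_mutex att_mutex;   // stand-in for the attachment's mutex
    };

    // Before: Database::SyncGuard dsGuard(dbb);
    // Now:    Attachment::SyncGuard guard(attachment);  (roughly)
    class AttachmentSyncGuard
    {
    public:
        explicit AttachmentSyncGuard(Attachment* att) : m_att(att)
        { m_att->att_mutex.lock(); }      // metadata access is serialized here

        ~AttachmentSyncGuard()
        { m_att->att_mutex.unlock(); }

    private:
        Attachment* const m_att;
    };
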
To make ASTs work we need to release the attachment mutex sometimes. This is a very important change compared to v2.5: in v2.5 the attachment mutex is locked for the whole duration of an API call, and no other API call (except the asynchronous fb_cancel_operation) can work with a "busy" attachment. In v3 this rule no longer holds. So now we can run more than one API call on the same attachment (of course not really simultaneously). I'm not sure it is safe, but I haven't disabled it so far.

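The opposite pattern is what lets ASTs in: around a blocking wait the attachment mutex is released and re-acquired afterwards. A sketch, using the same Attachment stand-in as in the previous snippet and a hypothetical class name:

    // Release the attachment mutex for the duration of a blocking wait
    // (page latch, lock request, I/O) so ASTs and other calls can take it.
    class AttachmentCheckout
    {
    public:
        explicit AttachmentCheckout(Attachment* att) : m_att(att)
        { m_att->att_mutex.unlock(); }

        ~AttachmentCheckout()
        { m_att->att_mutex.lock(); }      // re-acquire before continuing

    private:
        Attachment* const m_att;
    };
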
To make asynchronous detach safe, I introduced an att_use_count counter, which is incremented each time an API call is entered and decremented on exit. Detach now marks the attachment as shut down and waits for att_use_count == 0 before proceeding. Parallel access to the attachment could easily be disabled by making every API call wait for att_use_count == 0 on entry, or even by introducing one more mutex to avoid a spin wait. It also seems this counter makes the att_in_use member obsolete, as the detach call should wait for att_use_count == 0 and the drop call should return "object is in use" if att_use_count != 0.

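A minimal sketch of the counter's life cycle, assuming an atomic counter and a shutdown flag (only att_use_count itself is taken from the actual change; the surrounding names are illustrative):

    #include <atomic>
    #include <thread>

    struct Attachment
    {
        std::atomic<int>  att_use_count { 0 };    // API calls currently inside
        std::atomic<bool> att_shutdown  { false };
    };

    // Taken on entry to every API call, released on exit.
    class AttachmentUseGuard
    {
    public:
        explicit AttachmentUseGuard(Attachment* att) : m_att(att)
        { ++m_att->att_use_count; }

        ~AttachmentUseGuard()
        { --m_att->att_use_count; }

    private:
        Attachment* const m_att;
    };

    void detach(Attachment* att)
    {
        att->att_shutdown = true;             // mark the attachment as shut down

        while (att->att_use_count != 0)       // wait for running calls to leave
            std::this_thread::yield();

        // now it is safe to actually detach / tear the attachment down
    }
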
All ASTs related to attachment-level objects should take the attachment mutex before accessing attachment internals. This is implemented but not tested!

The transaction inventory pages cache (TIP cache) was reworked and is now shared by all attachments.

To avoid contention on the common dbb_pool, its usage was replaced by att_pool where possible. To make this task slightly easier, jrd_rel::rel_pool was introduced, which currently points to the attachment's pool. All of a relation's "sub-objects" (such as formats, fields, etc.) are allocated using rel_pool (it was dbb_pool before). When we return metadata objects back to the Database, it will be easy to redirect rel_pool to dbb_pool in one place in the code instead of making tens of small changes again.

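The indirection itself is trivial, which is the whole point. A simplified sketch (MemoryPool and everything except the rel_pool idea are stand-ins):

    struct MemoryPool { };            // stand-in for the engine's pool class

    struct Database   { MemoryPool* dbb_pool; };
    struct Attachment { MemoryPool* att_pool; };

    struct jrd_rel
    {
        MemoryPool* rel_pool;         // formats, fields, etc. are allocated
                                      // from this pool

        void bindPools(Attachment* att, Database* dbb)
        {
            (void) dbb;               // unused for now
            rel_pool = att->att_pool; // today: the attachment's pool; later a
                                      // one-line change can point it back at
                                      // dbb->dbb_pool
        }
    };
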
About stability testing of different parts of the engine:

- page cache - tested and works
- nbackup - tested and works
- monitoring - tested (less thoroughly) and seems to work
- server stop / database shutdown - somewhat tested, no crash observed, client reaction is not perfect (mostly network write errors)
- shadow - not tested
- metadata changes in a concurrent environment - not tested
- garbage collection thread - not tested, needs review and rework of some implementation details
- cache writer thread - not tested, needs review and rework
- external connections - not tested
- external engines - not tested

In the configuration file there are two new settings:
a) SharedCache - a boolean value which controls the page cache mode
b) SharedDatabase - a boolean value which controls the database file open mode.

Currently they are common (as is the whole configuration), but soon they will become per-database settings. Below are a few examples of how they can be used:

- SharedCache = true, SharedDatabase = false (default mode)
  This is the traditional SuperServer mode, where all attachments share the page cache and the database file is opened in exclusive mode (only one server process can work with the database).

- SharedCache = false, SharedDatabase = true
  This is Classic mode, where each attachment has its own page cache and many server processes can work with the same database.
  To run SuperClassic you should use the -m switch on the firebird.exe command line (on Windows) or run fb_smp_server (on POSIX; here I'm not sure and Alex will correct me). Otherwise ClassicServer will run.

- SharedCache = true, SharedDatabase = true
  This is a completely new mode in which the database file can be opened by many server processes and each process can handle many attachments which share a page cache (i.e. a per-process page cache).
  It could be used to run a few SS processes, for example, or to run a "main" SS process while keeping the ability to attach using embedded for some special tasks.
  I must note that our lock manager is not ready to work in this mode, so we can't use it right now.
  Also, it is unknown how performance will be affected when a few SS processes with big caches work with the same database.

- SharedCache = false, SharedDatabase = false
  It looks like a single process with a single attachment will be allowed to work with the database with these settings. Probably you can find an application for it ;)

One more configuration change is that the CpuAffinityMask setting has a new default value of 0. This allows the new SS to use all available CPUs/cores by default.

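For reference, a firebird.conf fragment with the defaults described above would look like this (keeping in mind that for now these are still global, not per-database, settings):

    # Default mode: shared page cache, database file opened in exclusive mode
    SharedCache = true
    SharedDatabase = false

    # 0 = no affinity restriction, use all available CPUs/cores
    CpuAffinityMask = 0
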
Regards,
Vlad