Hi, all
I will speak about how pages are allocated and released in Firebird. The
current algorithm is well known, simple, and easy to both understand and
implement:
- we have a single bitmap in the database where every bit corresponds to
one page,
- this bitmap is stored in a sequence of Page Inventory Pages (PIPs)
distributed evenly through the database,
- pages are allocated and released one by one as necessary.
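To make the mapping concrete, here is a minimal C++ sketch of the
bit-per-page bookkeeping. The names (locatePageBit, pagesPerPip) are
illustrative assumptions, not the engine's real internals: page N lives on
PIP number N / pagesPerPip, at bit N % pagesPerPip.

#include <cstdint>

struct PipLocation
{
    uint32_t pipSequence;   // which PIP in the chain holds the bit
    uint32_t bitNumber;     // bit offset within that PIP's bitmap
};

// Map a page number to its PIP and bit position, given how many page
// bits fit on one PIP.
inline PipLocation locatePageBit(uint32_t pageNumber, uint32_t pagesPerPip)
{
    return PipLocation{ pageNumber / pagesPerPip, pageNumber % pagesPerPip };
}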
So far, so good. Still, I think we could make some improvements to this
algorithm.
The first thing is batch allocation/release, i.e. the ability to allocate
or release a group of pages at once. It would lower contention on PIP pages
and reduce the number of PIP writes when pages are allocated (because of
careful write ordering we must write the PIP page before the newly
allocated page(s)). It would also make the release of big blobs and GTTs
faster (and, as a particular and more common case, DROP TABLE).
So, the first part is the ability to allocate and release a group of pages
at once; the corresponding PIP page is changed only once.
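As a rough illustration (assumed names, not the real ODS code), batch
release could look like this: all bits of a run that live on the same PIP
are flipped in memory, and the PIP is marked dirty, and therefore written,
only once for the whole run.

#include <cstdint>
#include <vector>

struct PipPage
{
    std::vector<uint8_t> bits;   // one bit per page; 1 = page is free
    bool dirty = false;

    void setBitFree(uint32_t bit)
    {
        bits[bit / 8] |= static_cast<uint8_t>(1u << (bit % 8));
    }
};

// Release 'count' consecutive pages whose bits all belong to this PIP;
// the PIP itself is dirtied (and later written) a single time.
void releaseRun(PipPage& pip, uint32_t firstBit, uint32_t count)
{
    for (uint32_t i = 0; i < count; ++i)
        pip.setBitFree(firstBit + i);

    pip.dirty = true;
}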
The second part (built on the first one) is the implementation of a special
allocation policy for data pages. Some (or many) database engines already
use such a policy. The idea of the algorithm below was inspired by MSSQL,
but of course it carries a lot of Firebird ODS specifics.
I propose to allocate data pages not one by one (as currently) but in
groups of sequentially ordered pages. Such a group of pages is often called
an "extent". I propose to change the page allocation algorithm for tables
as follows (a small sketch of the policy follows the list below):
- if a table is empty or small (has no full extent allocated), data pages
are allocated one by one (as currently)
- if a table already has at least one full extent allocated, the next
request for a new page will allocate a whole extent of pages
- the size of an extent is 8 pages
- every such extent is aligned on an 8-page boundary
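Here is a small sketch of that policy under stated assumptions
(EXTENT_SIZE, hasFullExtent and the helper names are hypothetical, not the
actual patch): small relations keep single-page allocation, bigger ones
grab a whole aligned extent.

#include <cstdint>

constexpr uint32_t EXTENT_SIZE = 8;   // pages per extent, as proposed

// Round a candidate page number down to the start of its 8-page extent.
inline uint32_t extentStart(uint32_t pageNumber)
{
    return pageNumber & ~(EXTENT_SIZE - 1);
}

// How many pages to allocate for the next data-page request: a whole
// extent once the relation already owns at least one full extent,
// otherwise a single page, as in the current algorithm.
inline uint32_t pagesToAllocate(bool hasFullExtent)
{
    return hasFullExtent ? EXTENT_SIZE : 1;
}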
Such an algorithm will reduce page-level fragmentation (all pages in an
extent are adjacent), allow OS-level prefetch to work more efficiently (it
will read not just a bunch of pages of random objects but pages related to
the same table), and allow us in the future to read and write in large
chunks, making IO more efficient.
There have been requests to implement big pages (64KB, 128KB, etc.) to make
reading faster, but such a solution has some drawbacks:
- a big page is good for readers but bad for writers: the more data we have
on a page, the more concurrent writers will wait for each other to change
this page
- compressed index nodes are walked sequentially when a key is searched in
an index. Yes, jump nodes in ODS 11 mitigate this issue but do not
eliminate it completely. Again, big index pages are very bad for concurrent
writers
- in the Classic architecture, different processes often exchange pages
with each other, and exchanging a big page is obviously more costly than
exchanging a small one
I think that extents help to solve the problem of physical IO without
making concurrency worse at the same time. The implementation has been
ready for a few months and I consider it stable enough, so it will not
delay the release of FB3. I can provide a patch or compiled binaries (for
Windows) for testing to anyone interested.
Comments?
Vlad