Hi, all
I will speak about how pages are allocated and released in Firebird. The
current algorithm is well known, simple, and easy to both understand and
implement:
- we have a single bitmap in the database where every bit corresponds to
one page,
- this bitmap is stored in a sequence of Page Inventory Pages (PIPs)
distributed evenly through the database,
- pages are allocated and released one by one as necessary.
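To make the mapping concrete, here is a minimal C++ sketch of the
bit-per-page bookkeeping. The names (locatePageBit, pagesPerPip) are
illustrative assumptions, not the engine's real internals: page N lives on
PIP number N / pagesPerPip, at bit N % pagesPerPip.

#include <cstdint>

struct PipLocation
{
    uint32_t pipSequence;   // which PIP in the chain holds the bit
    uint32_t bitNumber;     // bit offset within that PIP's bitmap
};

// Map a page number to its PIP and bit position, given how many page
// bits fit on one PIP.
inline PipLocation locatePageBit(uint32_t pageNumber, uint32_t pagesPerPip)
{
    return PipLocation{ pageNumber / pagesPerPip, pageNumber % pagesPerPip };
}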
So far, so good. Still, I think we could make some improvements to this
algorithm.
The first thing is batch allocation/release, i.e. the ability to allocate
or release a group of pages at once. It would lower contention on PIP pages
and reduce the number of PIP writes when pages are allocated (because of
careful write ordering we must write the PIP page before the newly
allocated page(s)). It would also make the release of big blobs and GTTs
faster (and, as a particular and more common case, DROP TABLE).
So, the first part is the ability to allocate and release a group of pages
at once; the corresponding PIP page is changed only once.
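As a rough illustration (assumed names, not the real ODS code), batch
release could look like this: all bits of a run that live on the same PIP
are flipped in memory, and the PIP is marked dirty, and therefore written,
only once for the whole run.

#include <cstdint>
#include <vector>

struct PipPage
{
    std::vector<uint8_t> bits;   // one bit per page; 1 = page is free
    bool dirty = false;

    void setBitFree(uint32_t bit)
    {
        bits[bit / 8] |= static_cast<uint8_t>(1u << (bit % 8));
    }
};

// Release 'count' consecutive pages whose bits all belong to this PIP;
// the PIP itself is dirtied (and later written) a single time.
void releaseRun(PipPage& pip, uint32_t firstBit, uint32_t count)
{
    for (uint32_t i = 0; i < count; ++i)
        pip.setBitFree(firstBit + i);

    pip.dirty = true;
}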
The second part (built on the first one) is the implementation of a special
allocation policy for data pages. Some (or many) database engines already
use such a policy. The idea of the algorithm below was inspired by MSSQL,
but of course it carries a lot of Firebird ODS specifics.
I propose to allocate data pages not one by one (as currently) but in
groups of sequentially ordered pages. Such a group of pages is often called
an "extent". I propose to change the page allocation algorithm for tables
as follows (a small sketch of the policy follows the list below):
- if a table is empty or small (has no full extent allocated), data pages
are allocated one by one (as currently)
- if a table already has at least one full extent allocated, the next
request for a new page will allocate a whole extent of pages
- the size of an extent is 8 pages
- every such extent is aligned on an 8-page boundary
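Here is a small sketch of that policy under stated assumptions
(EXTENT_SIZE, hasFullExtent and the helper names are hypothetical, not the
actual patch): small relations keep single-page allocation, bigger ones
grab a whole aligned extent.

#include <cstdint>

constexpr uint32_t EXTENT_SIZE = 8;   // pages per extent, as proposed

// Round a candidate page number down to the start of its 8-page extent.
inline uint32_t extentStart(uint32_t pageNumber)
{
    return pageNumber & ~(EXTENT_SIZE - 1);
}

// How many pages to allocate for the next data-page request: a whole
// extent once the relation already owns at least one full extent,
// otherwise a single page, as in the current algorithm.
inline uint32_t pagesToAllocate(bool hasFullExtent)
{
    return hasFullExtent ? EXTENT_SIZE : 1;
}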
Such an algorithm will reduce page-level fragmentation (all pages in an
extent are adjacent), allow OS-level prefetch to work more efficiently (it
will read not just a bunch of pages of random objects but pages related to
the same table), and allow us in the future to read and write in large
chunks, making IO more efficient.
There have been requests to implement big pages (64KB, 128KB, etc.) to make
reading faster, but such a solution has some drawbacks:
- a big page is good for readers but bad for writers: the more data we have
on a page, the more concurrent writers will wait for each other to change
this page
- compressed index nodes are walked sequentially when a key is searched in
an index. Yes, jump nodes in ODS 11 mitigate this issue but do not
eliminate it completely. Again, big index pages are very bad for concurrent
writers
- in the Classic architecture, different processes often exchange pages
with each other, and exchanging a big page is obviously more costly than
exchanging a small one
I think that extents help to solve the problem of physical IO without
making concurrency worse at the same time. The implementation has been
ready for a few months and I consider it stable enough, so it will not
delay the release of FB3. I can provide a patch or compiled binaries (for
Windows) for testing to anyone interested.
Comments?
Vlad