Re: [firebird-support] Gbak and indexes

2016-02-25 Thread Hugo Eyng hugoe...@msn.com [firebird-support]

Hello Ann.

Very good. Thank you.

Hugo

Em 25/02/2016 13:47, Ann Harrison aharri...@ibphoenix.com 
[firebird-support] escreveu:
Someone on a different forum asked whether gbak restores wouldn't be 
faster if gbak backed up indexes so it didn't have to rebuild them.  
The writer wanted the
clean-up and compaction that a gbak backup and restore provides, but 
not the

time spent retrieving records, sorting, and rebuilding indexes.

There's a problem with that.  An old index is of absolutely no use in 
a restored database exactly because records are in different places.  
The index contains the physical location of records.   Change the 
location, invalidate the index.   That's the short answer.


The longer answer:

An index contains a key value and a "record id" also called a db-key. 
When an index search finds an acceptable approximation of the value 
desired, it uses the db-key to find the corresponding record.


 (It's actually more complicated than that, and depends on whether the 
index is being used to optimize a sort, but in the end, an index 
search finds a db-key and uses it to find a record.)


What's in a db-key?  Three things: a pointer page sequence number, and 
offset into the pointer page, and an index offset on the data page.  A 
pointer page is a part of the structure of a table that's not visible 
outside the database.  It's simply a page that contains a vector of 
page numbers.  Those page numbers belong to the data pages in the 
table.  The RDB$PAGES table contains the sequence number and page 
number of pointer pages for all tables, among other things.  After a 
table has been referenced, the pointer page information is kept in 
memory. The pointer page may be in cache or Firebird may need to read 
it from disk.


With the pointer page in its virtual hand, Firebird reads the page 
number at the offset indicated by the db-key to get the page number of 
the appropriate data page and finds the data page in cache or on 
disk.  The offset on the data page in the db-key is an indirect 
pointer to the record.  Each data page has an index to the records on 
it containing the actual offset and length of the record, so a page 
can be reorganized without changing the db-keys of the records on it.


A gbak restore creates a new database and populates it with the data 
it backed up from the old database. Records will be stored on 
different pages and at different offsets.  Pages that were part of one 
table in the old database may be part of a different table or some 
other structure in the new database.


Gbak restore could be made faster if Firebird recognized at a commit 
that the transaction created several indexes on the same table and 
built all the indexes in parallel rather than reading the table for 
each index, but using an old index in a new database is not a good idea.


Good luck,

Ann




--


Atenciosamente,

Hugo Eyng



[firebird-support] Gbak and indexes

2016-02-25 Thread Ann Harrison aharri...@ibphoenix.com [firebird-support]
Someone on a different forum asked whether gbak restores wouldn't be faster
if gbak backed up indexes so it didn't have to rebuild them.  The writer
wanted the
clean-up and compaction that a gbak backup and restore provides, but not the
time spent retrieving records, sorting, and rebuilding indexes.

There's a problem with that.  An old index is of absolutely no use in a
restored database exactly because records are in different places.  The
index contains the physical location of records.   Change the location,
invalidate the index.   That's the short answer.

The longer answer:

An index contains a key value and a "record id" also called a db-key. When
an index search finds an acceptable approximation of the value desired, it
uses the db-key to find the corresponding record.

 (It's actually more complicated than that, and depends on whether the
index is being used to optimize a sort, but in the end, an index search
finds a db-key and uses it to find a record.)

What's in a db-key?  Three things: a pointer page sequence number, and
offset into the pointer page, and an index offset on the data page.  A
pointer page is a part of the structure of a table that's not visible
outside the database.  It's simply a page that contains a vector of page
numbers.  Those page numbers belong to the data pages in the table.  The
RDB$PAGES table contains the sequence number and page number of pointer
pages for all tables, among other things.  After a table has been
referenced, the pointer page information is kept in memory. The pointer
page may be in cache or Firebird may need to read it from disk.

With the pointer page in its virtual hand, Firebird reads the page number
at the offset indicated by the db-key to get the page number of the
appropriate data page and finds the data page in cache or on disk.  The
offset on the data page in the db-key is an indirect pointer to the
record.  Each data page has an index to the records on it containing the
actual offset and length of the record, so a page can be reorganized
without changing the db-keys of the records on it.

A gbak restore creates a new database and populates it with the data it
backed up from the old database.  Records will be stored on different pages
and at different offsets.  Pages that were part of one table in the old
database may be part of a different table or some other structure in the
new database.

Gbak restore could be made faster if Firebird recognized at a commit that
the transaction created several indexes on the same table and built all the
indexes in parallel rather than reading the table for each index, but using
an old index in a new database is not a good idea.

Good luck,

Ann