On 22.07.2011 12:38, Alexander Korotkov wrote:
A patch with my attempt to detect ordered datasets is attached. The implemented
idea is described below.
Index tuples are divided into chunks of 128. For each chunk we measure how many
of the leaf pages where index tuples were inserted don't match those of the
previous chunk. Based on statistics over several chunks we estimate the
distribution of accesses between leaf pages (an exponential distribution law is
assumed, and that seems to be an error). From that we can estimate the portion
of index tuples which can be processed without actual IO. If this estimate
exceeds a threshold then we should switch to buffering build.
My implementation now successfully detects randomly mixed datasets and
well-ordered datasets, but it seems to be too optimistic about intermediate
cases. I believe that's due to the wrong assumption about the distribution law.
Do you think this approach is acceptable? Perhaps there is research on the
distribution law for such cases (though I didn't find anything relevant on
Google Scholar)?

Great! It would be nice to find a more scientific approach to this, but that's probably fine for now. It's time to start cleaning up the patch for eventual commit.

You got rid of the extra page pins, which is good, but I wonder why you still pre-create all the GISTLoadedPartItem structs for the whole subtree in loadTreePart()? Can't you create those structs on the fly, as you descend the tree? I understand that it's difficult to update all the parent pointers as trees are split, but it feels like there's way too much bookkeeping going on. Surely it's possible to simplify it somehow.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers