On 24.08.2011 16:57, Alexander Korotkov wrote:
I've added some testing results to the wiki page:
http://wiki.postgresql.org/wiki/Fast_GiST_index_build_GSoC_2011
These are not yet all the results I planned for the first chunk, because testing takes more time than I expected.
Some notes about it.

Now I see two causes that accelerate regular build of GiST indexes:
1) As noted before, regular index build on a fairly well-ordered dataset is fast.
2) I found that a worse index is faster to build; by "worse" I mean an index with higher overlap. The function gistchoose selects the first index tuple with zero penalty, if there is one. Thus, with high overlap in the root page, only a few of its index tuples will ever be chosen for inserts. And, recursively, only a small part of the tree will be used for actual inserts, and that part of the tree fits in the cache more easily. So high overlap makes inserts cheaper to the same degree that it makes searches more expensive.

As an extreme case, a trivial penalty function that just always returns 0 will make index build fast - but the index will be useless for querying.
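To make that mechanism concrete, here is a minimal stand-alone sketch of the "first zero-penalty entry wins" rule. The 1-D interval type and penalty here are hypothetical stand-ins for illustration, not the actual PostgreSQL gistchoose code or a real opclass penalty callback:

```c
/* Hypothetical 1-D stand-in for a GiST entry's bounding predicate. */
typedef struct
{
    double      lo;
    double      hi;
} Interval;

/* Simplified penalty: how much the interval must grow to absorb v.
 * Zero means v is already covered. */
static double
penalty(const Interval *e, double v)
{
    double      grow = 0.0;

    if (v < e->lo)
        grow += e->lo - v;
    if (v > e->hi)
        grow += v - e->hi;
    return grow;
}

/* Mimics gistchoose's short-circuit: scan entries in page order and
 * return the FIRST one with zero penalty; otherwise fall back to the
 * minimum-penalty entry.  With heavily overlapping entries, most
 * values hit a zero-penalty entry early in the scan, so inserts pile
 * into the same few subtrees -- the cache-friendliness effect
 * described above.  A penalty that always returned zero would be the
 * degenerate extreme: every insert descends into entry 0. */
static int
choose_subtree(const Interval *entries, int n, double v)
{
    int         best = 0;
    double      best_pen = penalty(&entries[0], v);

    for (int i = 0; i < n; i++)
    {
        double      p = penalty(&entries[i], v);

        if (p == 0.0)
            return i;           /* first zero-penalty entry wins */
        if (p < best_pen)
        {
            best_pen = p;
            best = i;
        }
    }
    return best;
}
```
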

In the tests on the first version of the patch, I found the index quality of regular build to be much better than that of buffering build (without neighbor relocation). Now they are similar, though that's because the index quality of regular build has become worse. As a result, in the current tests regular index build is faster than in the previous ones. I see the following possible causes:
1) I didn't save the source random data, so these tests use a new random dataset.
2) Some environment parameters of my test setup may have changed, though I doubt it.
Despite these possible explanations, it still seems quite strange to me.

That's pretty surprising. Assuming the data is truly random, I wouldn't expect a big difference in the index quality of one random data set over another. If the index quality depends so much on, say, the distribution of the first few tuples inserted into it, that's quite an interesting find on its own, and merits some further research.

In order to compare index build methods on higher-quality indexes, I've
tried building indexes with my double sorting split method (see:
http://syrcose.ispras.ru/2011/files/SYRCoSE2011_Proceedings.pdf#page=36). With
it, search on a uniform dataset is about 10 times faster! And, as
expected, regular index build becomes much slower: it has been running for
more than 60 hours and only about 50% of the index is complete (estimated from file sizes).

Also, automatic switching to buffering build shows better index quality
in all the tests, though it's hard for me to explain why.

Hmm, makes me a bit uneasy that we're testing with a modified page splitting algorithm. But if the new algorithm is that good, could you post that as a separate patch, please?

That said, I don't see any new evidence that the buffering build algorithm would be significantly worse. There's the case of ordered data that we already knew about, and will have to just accept for now.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
