Re: [HACKERS] WIP: Fast GiST index build

Heikki Linnakangas Mon, 06 Jun 2011 03:52:41 -0700

On 06.06.2011 10:42, Heikki Linnakangas wrote:

On 03.06.2011 14:02, Alexander Korotkov wrote:

Hackers,


WIP patch of fast GiST index build is attached. Code is dirty and
comments are lacking, but it works. Now it is ready for first
benchmarks, which should prove the efficiency of the selected technique.
It's time to compare fast GiST index build with repeated-insert build on
large enough datasets (datasets which don't fit in cache). The testing
has the following aims:
1) Measure acceleration of index build.
2) Measure change in index quality.
I'm going to do the first testing using synthetic datasets. Everybody
who has interesting real-life datasets for testing is welcome.


I ran another test with a simple table generated with:

CREATE TABLE pointtest (p point);
INSERT INTO pointtest SELECT point(random(), random()) FROM
generate_series(1,50000000);

Generating a gist index with:

CREATE INDEX i_pointtest ON pointtest USING gist (p);

took about 15 hours without the patch, and 2 hours with it. That's quite
dramatic.

Oops, that was a rounding error, sorry. The run took about 2.7 hours
with the patch, which of course should be rounded to 3 hours, not 2.
Anyway, it is still a very impressive improvement.
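
For anyone repeating the test, a minimal sketch of how the build time
and the resulting index could be measured from psql. The window query
and its box coordinates are illustrative assumptions, not part of the
run described above; the buffer counts it reports are only a rough
proxy for index quality.

```sql
-- psql meta-command: print the elapsed time of each statement.
\timing on

CREATE INDEX i_pointtest ON pointtest USING gist (p);

-- On-disk size of the finished index.
SELECT pg_size_pretty(pg_relation_size('i_pointtest'));

-- Illustrative window query (assumed, not from the original test).
-- Compare the "Buffers:" figures between a patched and an unpatched
-- build to see how many index pages a typical search touches.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM pointtest
WHERE p <@ box '((0.4,0.4),(0.5,0.5))';
```

Running the same statements against both builds gives directly
comparable numbers for the two aims listed in the original post.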

I'm glad you could get the patch ready for benchmarking this quickly.
Now you just need to get the patch into shape so that it can be
committed. That is always the more time-consuming part, so I'm glad you
have plenty of time left for it.

Could you please create a TODO list on the wiki page, listing all the
missing features, known bugs etc. that will need to be fixed? That'll
make it easier to see how much work there is left. It'll also help
anyone looking at the patch to know which issues are known issues.

Meanwhile, it would still be very valuable if others could test this
with different workloads. And Alexander, it would be good if at some
point you could write some benchmark scripts too, and put them on the
wiki page, just to see what kind of workloads have been taken into
consideration and tested already. Do you think there are some worst-case
data distributions where this algorithm would perform particularly badly?
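
As a starting point for such scripts, here is a sketch of two
non-uniform datasets that might be worth trying alongside the uniform
random one. The table names and the reduced row count are hypothetical,
and whether either distribution is actually bad for this algorithm is
an open question, not something that has been measured.

```sql
-- Hypothetical tables; 1 million rows keeps a first trial quick.

-- Points inserted in strictly increasing order along a diagonal.
CREATE TABLE pointtest_sorted (p point);
INSERT INTO pointtest_sorted
SELECT point(i::float8 / 1000000, i::float8 / 1000000)
FROM generate_series(1, 1000000) AS i;

-- Points packed into a tiny square, so most keys nearly coincide.
CREATE TABLE pointtest_clustered (p point);
INSERT INTO pointtest_clustered
SELECT point(0.5 + random() * 0.001, 0.5 + random() * 0.001)
FROM generate_series(1, 1000000);

CREATE INDEX i_pointtest_sorted ON pointtest_sorted USING gist (p);
CREATE INDEX i_pointtest_clustered ON pointtest_clustered USING gist (p);
```

Scaling the row count back up to the size where the data no longer fits
in cache would be needed before drawing any conclusions.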


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

