Hi,

On 09/04/2015 06:10 PM, Jeff Janes wrote:

How can we evaluate Tom's performance concerns?  I tried
turning log_planner_stats on and using the regression test as a load
generator, but I don't think that that is very demanding of a test.

I've done a bit of benchmarking today, trying to measure how expensive the additional checks are.

Using a simple table with just 4 columns and 1M rows

    CREATE TABLE t AS SELECT i AS a, i AS b, i AS c, i AS d
                        FROM generate_series(1,1000000) s(i);

with three different index sets:

  - no indexes
  - 40 regular indexes (indexes-1.sql)
  - 40 partial indexes (indexes-2.sql)

and two different query sets:

  - matching the partial indexes (queries-1.sql)
  - not matching the partial indexes (queries-2.sql)

which means 6 combinations:

   A: no indexes / queries-1
   B: no indexes / queries-2
   C: indexes-1 / queries-1
   D: indexes-1 / queries-2
   E: indexes-2 / queries-1
   F: indexes-2 / queries-2

A summary of 100 EXPLAIN timings looks like this:


master       A          B          C          D          E          F
-------------------------------------------------------------------------
min        0.10       0.10       0.30       0.29       0.66       0.23
max        1.07       1.00       2.13       1.98       4.52       1.59
median     0.49       0.52       0.31       0.33       0.68       1.12
average    0.43       0.35       0.62       0.49       1.01       0.89


patched     A          B          C          D          E          F
-------------------------------------------------------------------------
min        0.11       0.11       0.29       0.29       0.70       0.22
max        0.99       1.05       0.55       1.93       3.79       1.12
median     0.19       0.55       0.32       0.34       0.74       0.24
average    0.42       0.52       0.34       0.55       0.95       0.27


A-D should be exactly the same, because there are no partial indexes, and the results match that expectation.

E and F should be different, depending on how expensive the additional checks are. But in this benchmark that's not true - the patched version is actually a bit faster, thanks to noise.

I find that a bit strange, but I repeated the benchmark about 3x just to verify it really behaves like this. Maybe I did some stupid mistake and the results are useless, or maybe it needs to be more complex (e.g. the conditions must not be exactly the same). So if someone could rerun the benchmark and review it, that'd be nice.

Judging the cost/benefit ratio is a bit tricky. We need to identify the cases where additional planning complexity makes it measurably slower, without getting better performance at execution. And then we need to somehow argue whether those cases are frequent enough or not.

ISTM that the worst case would be a data set with many partial indexes, that however don't allow IOS. And the amount of data would have to be small, so that the queries don't take too much time (which would make the additional planning time noise).

However that was the idea of the benchmark, and I got no difference.

regards
Tomas

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: queries-1.sql
Description: application/sql

Attachment: queries-2.sql
Description: application/sql

Attachment: indexes-2.sql
Description: application/sql

Attachment: indexes-1.sql
Description: application/sql

Attachment: table.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to