I spent some time looking into the performance complaint at [1],
which for the sake of self-containedness is

CREATE TABLE t(a int, b int);

INSERT INTO t(a, b)
SELECT
    (random() * 123456)::int AS a,
    (random() * 123456)::int AS b
FROM
    generate_series(1, 12345678);

CREATE INDEX my_idx ON t USING BTREE (a, b);

VACUUM ANALYZE t;

explain (analyze, buffers) select * from t
where row(a, b) > row(123450, 123444) and a = 0
order by a, b;

This produces something like

 Index Only Scan using my_idx on t  (cost=0.43..8.46 rows=1 width=8) (actual 
time=475.713..475.713 rows=0 loops=1)
   Index Cond: ((ROW(a, b) > ROW(123450, 123444)) AND (a = 0))
   Heap Fetches: 0
   Buffers: shared hit=1 read=33731
 Planning:
   Buffers: shared read=4
 Planning Time: 0.247 ms
 Execution Time: 475.744 ms

showing that we are reading practically the whole index, which
is pretty sad considering the index conditions are visibly
mutually contradictory.  What's going on?  I find that:

1. _bt_preprocess_keys, which is responsible for detecting
mutually-contradictory index quals, fails to do so because it
really just punts on row-comparison quals: it shoves them
directly from input to output scankey array without any
comparisons to other keys.  (In particular, that causes the
row-comparison qual to appear before the a = 0 one in the
output scankey array.)

2. The initial-positioning logic in _bt_first chooses "a = 0"
as determining where to start the scan, because it always
prefers equality over inequality keys.  (This seems reasonable.)

3. We really should stop the scan once we're past the last a = 0
index entry, which'd at least limit the damage.  However, at both
a = 0 and later entries, the row-comparison qual fails causing
_bt_check_compare to return immediately, without examining the
a = 0 key which is marked as SK_BT_REQFWD, and thus it does not
clear "continuescan".  Only when we finally reach an index entry
for which the row-comparison qual succeeds do we notice that
a = 0 is failing so we could stop the scan.

So this seems pretty horrid.  It would be nice if _bt_preprocess_keys
were smart enough to notice the contradictory nature of these quals,
but I grant that (a) that routine is dauntingly complex already and
(b) this doesn't seem like a common enough kind of query to be worth
moving heaven and earth to optimize.

However, I do think we should do something about the unstated
assumption that _bt_preprocess_keys can emit the quals (for a given
column) in any random order it feels like.  This is evidently not so,
and it's probably capable of pessimizing other examples besides this
one.  Unless we want to slow down _bt_check_compare by making it
continue to examine quals after the first failure, we need to insist
that required quals appear before non-required quals, thus ensuring
that a comparison failure will clear continuescan if possible.

Even that looks a little nontrivial, because it seems like nbtree
may be making some assumptions about the order in which array keys
appear.  I see the bit about

 * ... Some reordering of the keys
 * within each attribute may be done as a byproduct of the processing here.
 * That process must leave array scan keys (within an attribute) in the same
 * order as corresponding entries from the scan's BTArrayKeyInfo array info.

which I could cope with, but then there's this down around line 2967:

                 * Note: We do things this way around so that our arrays are
                 * always in the same order as their corresponding scan keys,
                 * even with incomplete opfamilies.  _bt_advance_array_keys
                 * depends on this.

However, despite the rather over-the-top verbosity of commenting in
_bt_advance_array_keys, it's far from clear why or how it depends on
that.  So I feel a little stuck about what needs to be done here.

                        regards, tom lane

[1] 
https://www.postgresql.org/message-id/CAAdwFAxBjyrYUkH7u%2BEceTaztd1QxBtBY1Teux8K%3DvcGKe%3D%3D-A%40mail.gmail.com


Reply via email to