from:"johno"

Re: [PERFORM] Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

2014-07-22 Thread johno


 No, it doesn't.  Read it again ... or read up on row comparisons,
 if you're unfamiliar with that notation.  The above queries are
 exactly equivalent per spec.


Wow, this is great. Thanks.

[PERFORM] Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

2014-07-21 Thread johno

Hi there,

I am trying to optimize a simple query that returns first 100 rows that
have been updated since a given timestamp (ordered by timestamp and id
desc).  If there are several rows with the same timestamp I need to a
second condition, that states that I want to return rows having the given
timestamp and id  given id.

The obvious query is

SELECT * FROM register_uz_accounting_entities
 WHERE effective_on  '2014-07-11' OR (effective_on = '2014-07-11' AND
 id  1459)
 ORDER BY effective_on, id
 LIMIT 100


With a composite index on (effective_on, id)

Query plan

Limit  (cost=4613.70..4613.95 rows=100 width=1250) (actual
 time=0.122..0.130 rows=22 loops=1)
   Buffers: shared hit=28
   -  Sort  (cost=4613.70..4617.33 rows=1453 width=1250) (actual
 time=0.120..0.122 rows=22 loops=1)
 Sort Key: effective_on, id
 Sort Method: quicksort  Memory: 30kB
 Buffers: shared hit=28
 -  Bitmap Heap Scan on register_uz_accounting_entities
  (cost=35.42..4558.17 rows=1453 width=1250) (actual time=0.036..0.083
 rows=22 loops=1)
   Recheck Cond: ((effective_on  '2014-07-11'::date) OR
 ((effective_on = '2014-07-11'::date) AND (id  1459)))
   Buffers: shared hit=28
   -  BitmapOr  (cost=35.42..35.42 rows=1453 width=0) (actual
 time=0.026..0.026 rows=0 loops=1)
 Buffers: shared hit=6
 -  Bitmap Index Scan on idx2  (cost=0.00..6.49
 rows=275 width=0) (actual time=0.017..0.017 rows=15 loops=1)
   Index Cond: (effective_on  '2014-07-11'::date)
   Buffers: shared hit=3
 -  Bitmap Index Scan on idx2  (cost=0.00..28.21
 rows=1178 width=0) (actual time=0.007..0.007 rows=7 loops=1)
   Index Cond: ((effective_on =
 '2014-07-11'::date) AND (id  1459))
   Buffers: shared hit=3
 Total runtime: 0.204 ms



Everything works as expected. However if I change the constraint to a
timestamp/date more in the past (resulting in far more matching rows) the
query slows down drastically.


 SELECT * FROM register_uz_accounting_entities
 WHERE effective_on  '2014-06-11' OR (effective_on = '2014-06-11' AND id 
 1459)
 ORDER BY effective_on, id
 LIMIT 100

 Limit  (cost=0.42..649.46 rows=100 width=1250) (actual
 time=516.125..516.242 rows=100 loops=1)
   Buffers: shared hit=576201
   -  Index Scan using idx2 on register_uz_accounting_entities
  (cost=0.42..106006.95 rows=16333 width=1250) (actual time=516.122..516.229
 rows=100 loops=1)
 Filter: ((effective_on  '2014-06-11'::date) OR ((effective_on =
 '2014-06-11'::date) AND (id  1459)))
 Rows Removed by Filter: 797708
 Buffers: shared hit=576201
 Total runtime: 516.304 ms



I've tried to optimize this query by pushing down the limit and order by's
into explicit subselects.

SELECT * FROM (
SELECT * FROM register_uz_accounting_entities
WHERE effective_on  '2014-06-11'
ORDER BY effective_on, id LIMIT 100
) t1
 UNION
   SELECT * FROM (
 SELECT * FROM register_uz_accounting_entities
 WHERE effective_on = '2014-06-11' AND id  1459
 ORDER BY effective_on, id LIMIT 100
 ) t2
 ORDER BY effective_on, id
 LIMIT 100

 -- query plan
 Limit  (cost=684.29..684.54 rows=100 width=1250) (actual
 time=2.648..2.708 rows=100 loops=1)
   Buffers: shared hit=203
   -  Sort  (cost=684.29..684.79 rows=200 width=1250) (actual
 time=2.646..2.672 rows=100 loops=1)
 Sort Key: register_uz_accounting_entities.effective_on,
 register_uz_accounting_entities.id
 Sort Method: quicksort  Memory: 79kB
 Buffers: shared hit=203
 -  HashAggregate  (cost=674.65..676.65 rows=200 width=1250)
 (actual time=1.738..1.971 rows=200 loops=1)
   Buffers: shared hit=203
   -  Append  (cost=0.42..661.15 rows=200 width=1250) (actual
 time=0.054..0.601 rows=200 loops=1)
 Buffers: shared hit=203
 -  Limit  (cost=0.42..338.62 rows=100 width=1250)
 (actual time=0.053..0.293 rows=100 loops=1)
   Buffers: shared hit=101
   -  Index Scan using idx2 on
 register_uz_accounting_entities  (cost=0.42..22669.36 rows=6703 width=1250)
 (actual time=0.052..0.260 rows=100 loops=1)
 Index Cond: (effective_on 
 '2014-06-11'::date)
 Buffers: shared hit=101
 -  Limit  (cost=0.42..318.53 rows=100 width=1250)
 (actual time=0.037..0.228 rows=100 loops=1)
   Buffers: shared hit=102
   -  Index Scan using idx2 on
 register_uz_accounting_entities register_uz_accounting_entities_1
  (cost=0.42..30888.88 rows=9710 width=1250) (actual time=0.036..0.187
 rows=100 loops=1)
 Index Cond: ((effective_on =
 '2014-06-11'::date) AND (id  1459))
 Buffers: shared hit=102

Re: [PERFORM] Re: Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

2014-07-21 Thread johno

Thanks for the quick reply David!

However I am still unsure how these two queries are not relationally
equivalent. I am struggling to find a counterexample where the first and
third query (in email, not in gist) would yield different results. Any
ideas?

Jano

On Mon, Jul 21, 2014 at 11:31 PM, David G Johnston
david.g.johns...@gmail.com wrote:

johno wrote
The question is... why is the query planner unable to make this
optimization for the slow query? What am I missing?

Short answer - your first and last queries are not relationally equivalent
and the optimizer cannot change the behavior of the query which it is
optimizing. i.e. you did not make an optimization but rather choose to
reformulate the question so that it could be answered more easily while
still providing an acceptable answer.

The question main question is better phrased as:

Give me 100 updated at t(0) but only that are subsequent to a given ID. If
there are less than 100 such records give me enough additional rows having
t
t(0) so that the total number of rows returned is equal to 100.

Both queries give the same answer but only due to the final LIMIT 100. They
arrive there in different ways which necessitates generating different
plans. At a basic level it is unable to push down LIMIT into a WHERE
clause
and it cannot add additional sub-queries that do not exist in the original
plan - which includes adding a UNION node.

David J.

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Slow-query-with-indexed-ORDER-BY-and-LIMIT-when-using-OR-d-conditions-tp5812282p5812285.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Re: Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

2014-07-21 Thread johno

Oh, yes I do understand that if I remove the outer limit, the semantics of
the query would change. However I am looking for the counterexample *with*
the limit clauses. Maybe I just don't understand what relationally
equivalent means, sorry about that.

BTW this is to my understanding a very similar scenario to how partitioned
tables work and push down limit and where conditions. Why is this not
possible in this case?

Jano


On Mon, Jul 21, 2014 at 11:54 PM, David G Johnston 
david.g.johns...@gmail.com wrote:

 johno wrote
  Thanks for the quick reply David!
 
  However I am still unsure how these two queries are not relationally
  equivalent. I am struggling to find a counterexample where the first and
  third query (in email, not in gist) would yield different results. Any
  ideas?

 Remove the outer LIMIT 100 from both queries...

 The first query would return a maximal number of rows that meet the OR
 criteria while the second query would return at most 200 rows since both
 sub-queries would still have their own independent LIMIT 100 clauses.

 David J.




 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/Slow-query-with-indexed-ORDER-BY-and-LIMIT-when-using-OR-d-conditions-tp5812282p5812289.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Re: Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

2014-07-21 Thread johno



 Try following my lead and bottom-post, please.


Sorry for that.



 Anyway, the query has no clue that because of the final LIMIT 100 that the
 two different feeding queries are just going to happen to end up providing
 the same result.  Maybe, in this particular instance, it is theoretically
 possible to make such a proof but generally that is not the case and so
 such
 an optimization has not made into the codebase even if it theoretically
 could be done (I'm not convinced it could but do not know enough to explain
 to someone else why I have that impression).

 I do not know enough to answer why this situation is any different from a
 similar partitioning scenario.  An example showing exactly what a similar
 partitioning query looks like would help in this regard.

 If you are looking for considerably more insight into the planner workings
 and why it does or doesn't do something you will need to wait for others.
  I
 can, to a reasonable degree, deconstruct a pair of queries and either
 explain or guess as to why things are happening but that is mostly applied
 deductive reasoning and not because I have any particular insight into the
 codebase.



Thanks again for your time. Let's just wait for someone else and see where
this will end up going.

Jano

Re: [PERFORM] Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

2014-07-21 Thread johno

On Tue, Jul 22, 2014 at 4:53 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 johno jan.suc...@gmail.com writes:
  I am trying to optimize a simple query that returns first 100 rows that
  have been updated since a given timestamp (ordered by timestamp and id
  desc).  If there are several rows with the same timestamp I need to a
  second condition, that states that I want to return rows having the given
  timestamp and id  given id.

  The obvious query is

  SELECT * FROM register_uz_accounting_entities
  WHERE effective_on  '2014-07-11' OR (effective_on = '2014-07-11' AND
  id  1459)
  ORDER BY effective_on, id
  LIMIT 100

 A more readily optimizable query is

 SELECT * FROM register_uz_accounting_entities
 WHERE (effective_on, id)  ('2014-07-11'::date, 1459)
 ORDER BY effective_on, id
 LIMIT 100


Yes, but that query has completely different semantics - I can't change
that.



 This formulation allows the planner to match both the WHERE and ORDER BY
 clauses directly to the two-column index.


Are both fields really used? I was under the impression that only the first
column from index can be used when there is a range query.



  I've tried to optimize this query by pushing down the limit and order
 by's
  into explicit subselects.

 As noted earlier, that's unlikely to be an improvement, because on its
 face it specifies more computation.  Postgres is not terribly bright
 about UNIONs, either.



Despite the cost calculation in explain the actual query times are very
different. I get consistent sub 50ms responses from the optimized one
(union with pushing down the limits) and 500+ms for the plain one (when not
using bitmap index scan).

Is this possible optimization considered by query planner or do I have
force it?

Thanks again for your time and effort, I appreciate it.




 regards, tom lane

Re: [PERFORM] Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

[PERFORM] Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

Re: [PERFORM] Re: Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

Re: [PERFORM] Re: Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

Re: [PERFORM] Re: Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

Re: [PERFORM] Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

6 matches

Site Navigation

Mail list logo

Footer information