Hi
On Wed, Sep 25, 2013 at 6:42 PM, Sam Wong <s...@hellosam.net> wrote:
>
> ---
> Excuse me, this is the actual query.
>
> with Q AS (
>   select event
>   from event_log
>   WHERE log_id>1010000 and log_id<1050000
>   order by event
> )
> SELECT * FROM Q LIMIT 1
>
> Limit  (cost=2521.82..2521.83 rows=1 width=32) (actual time=88.342..88.342 rows=1 loops=1)
>   Output: q.event
>   Buffers: shared hit=93 read=622
>   CTE q
>     ->  Sort  (cost=2502.07..2521.82 rows=39502 width=6) (actual time=88.335..88.335 rows=1 loops=1)
>           Output: event_log.event
>           Sort Key: event_log.event
>           Sort Method: quicksort  Memory: 3486kB
>           Buffers: shared hit=93 read=622
>           ->  Index Scan using event_log_pkey on uco.event_log  (cost=0.00..1898.89 rows=39502 width=6) (actual time=13.918..65.573 rows=39999 loops=1)
>                 Output: event_log.event
>                 Index Cond: ((event_log.log_id > 1010000) AND (event_log.log_id < 1050000))
>                 Buffers: shared hit=93 read=622
>   ->  CTE Scan on q  (cost=0.00..237.01 rows=39502 width=32) (actual time=88.340..88.340 rows=1 loops=1)
>         Output: q.event
>         Buffers: shared hit=93 read=622
> Total runtime: 89.039 ms
>
> ---
> Slow Query
>
> select min(event)
> from event_log
> WHERE log_id>1010000 and log_id<1050000
>
> Result  (cost=1241.05..1241.05 rows=1 width=0) (actual time=1099.532..1099.533 rows=1 loops=1)
>   Output: $0
>   Buffers: shared hit=49029 read=57866
>   InitPlan 1 (returns $0)
>     ->  Limit  (cost=0.00..1241.05 rows=1 width=6) (actual time=1099.527..1099.527 rows=1 loops=1)
>           Output: uco.event_log.event
>           Buffers: shared hit=49029 read=57866
>           ->  Index Scan using event_data_search on uco.event_log  (cost=0.00..49024009.79 rows=39502 width=6) (actual time=1099.523..1099.523 rows=1 loops=1)
>                 Output: uco.event_log.event
>                 Index Cond: (uco.event_log.event IS NOT NULL)
>                 Filter: ((uco.event_log.log_id > 1010000) AND (uco.event_log.log_id < 1050000))
>                 Rows Removed by Filter: 303884
>                 Buffers: shared hit=49029 read=57866
> Total runtime: 1099.568 ms
>
> (Note: things got buffered, so it goes down to about 1 second, but comparing the buffer count with the query above, this is a few orders of magnitude slower.)
>
> ---
> The CTE "fast query" works: it completes with an index scan over "event_log_pkey", which is what I am expecting, and that is good.
> But it is much more straightforward to write it as the "slow query", and I would expect the planner to give the same optimal plan for both.
>
> The plan for the "slow query" has an estimated total cost of 1241.05, and the "fast query" has 2521.83, hence the planner prefers the former - the scan over the "event_data_search" index - but this choice doesn't make sense to me.
>
> This is the point I want to bring up: why would the planner do that, instead of scanning over "event_log_pkey"?

Maybe there's a bug, but I don't think so. The PostgreSQL optimizer is strongly biased toward uncorrelated data, but in your case log_id and event are highly correlated, right?

With your example it has to choose between:
1- Play safe and use event_log_pkey: scan 39502 rows and select the smallest event.
2- Use event_data_search: 4,000,000 rows in total, 40,000 of them with a log_id in the interval, so either win big and find a match within the first ~100 scanned rows, or lose big and scan all 4,000,000 rows.

With uncorrelated data, 2- is 400 times faster than 1-: 100 rows vs 40,000.

PostgreSQL is a high-stakes gambler :)

Didier
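For reference, the InitPlan in the "slow query" plan comes from PostgreSQL's min()/max() optimization, which rewrites the aggregate into an ordered LIMIT 1 scan. A sketch of the equivalent query the planner actually costs (table and column names taken from the thread):

```sql
-- Equivalent form of "select min(event) ..." after the min() rewrite:
-- walk event_data_search in event order and stop at the first row
-- whose log_id passes the filter. With event uncorrelated to log_id,
-- ~1% of rows qualify and the first hit comes quickly; with the
-- correlated data here, the matching rows sit far into the index.
SELECT event
FROM event_log
WHERE event IS NOT NULL
  AND log_id > 1010000
  AND log_id < 1050000
ORDER BY event
LIMIT 1;
```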
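To see the gamble Didier describes in the statistics themselves, one can inspect the planner's column correlation estimates. A sketch against the pg_stats view, assuming the schema/table names from the thread:

```sql
-- correlation is the planner's estimate of how well each column's
-- values track the physical row order: near +1/-1 means ordered,
-- near 0 means scattered. The cost model has no notion of the
-- cross-column correlation between log_id and event, which is why
-- it underestimates the cost of the event_data_search scan.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE schemaname = 'uco'
  AND tablename = 'event_log'
  AND attname IN ('log_id', 'event');
```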