Hi hackers, 

I want to ask two questions about the PostgreSQL optimizer.
I have the following query:

SELECT o.id AS id, s.id AS sid, o.owner, o.creator, o.parent_id AS dir_id,
       s.mime_id, m.c_type, s.p_file, s.mtime, o.ctime, o.name, o.title,
       o.size, o.deleted, la.otime, la.etime,
       uo.login AS owner_login, uc.login AS creator_login,
       (CASE WHEN f.user_id IS NULL THEN 0 ELSE 1 END) AS flagged,
       (SELECT 'userid\\:'||string_agg(user_id,' userid\\:')
          FROM get_authorized_users(o.id)) AS acl
FROM objects s
     JOIN objects o ON s.original_file = o.id
     LEFT JOIN flags f ON o.id = f.obj_id AND o.owner = f.user_id
     LEFT JOIN objects_last_activity la ON o.id = la.obj_id AND o.owner = la.user_id,
     mime m, users uc, users uo
WHERE (s.mime_id = 904 OR s.mime_id = 908)
  AND m.mime_id = o.mime_id
  AND o.owner = uo.user_id
  AND o.creator = uc.user_id
ORDER BY s.mtime LIMIT 9;

It runs for more than an hour, while the same query with "limit 8" executes in just a few seconds.
get_authorized_users is a rather expensive function that performs a recursive query through several tables.
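Schematically it looks something like the sketch below (a simplified, made-up version with a hypothetical acls table, not the real definition), just to show that every call runs a recursive CTE over several tables:

-- Hypothetical sketch only: the real function body and the acls table are not
-- shown in this mail.  The point is just that each call walks the object tree
-- with a recursive CTE, so the default per-call cost of 0.25 is far too cheap.
CREATE OR REPLACE FUNCTION get_authorized_users(p_obj_id varchar)
RETURNS TABLE(user_id varchar) AS $$
    WITH RECURSIVE parents(id, parent_id) AS (
        SELECT o.id, o.parent_id FROM objects o WHERE o.id = p_obj_id
        UNION ALL
        SELECT o.id, o.parent_id FROM objects o JOIN parents p ON o.id = p.parent_id
    )
    SELECT DISTINCT a.user_id
      FROM acls a JOIN parents p ON a.obj_id = p.id
$$ LANGUAGE sql STABLE;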

In the first case (limit 9) a sequential scan with a sort is used, while in the second
case an index scan on the "mtime" index is performed, so no sort is needed.
In the second case we can extract just 9 rows and calculate get_authorized_users
only for them. In the first case we have to calculate this function for all 65549 rows
that survive the joins (out of ~15M rows scanned) before the sort.

I investigated the plans of both queries and found out that there are three reasons for
the problem:

1. Wrong estimation of the filter selectivity: 45281 estimated vs. 154347 real.
2. Wrong estimation of the join selectivity: 284 estimated vs. 65549 real.
3. Wrong estimation of the get_authorized_users cost: 0.25 (default cost units) vs. 57.379 ms real per call.
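As an aside, estimate (1) can probably be improved just by raising the statistics target on the filtered column and re-analyzing; a sketch, not something I have verified on this table:

ALTER TABLE objects ALTER COLUMN mime_id SET STATISTICS 1000;  -- 1000 is just an example value
ANALYZE objects;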

Below is the output of explain analyze:

Limit  (cost=1595355.77..1595355.79 rows=9 width=301) (actual 
time=3823128.752..3823128.755 rows=9 loops=1)
   ->  Sort  (cost=1595355.77..1595356.48 rows=284 width=301) (actual 
time=3823128.750..3823128.753 rows=9 loops=1)
         Sort Key: s.mtime
         Sort Method: top-N heapsort  Memory: 30kB
         ->  Nested Loop  (cost=1.96..1595349.85 rows=284 width=301) (actual 
time=75.453..3822829.640 rows=65549 loops=1)
               ->  Nested Loop Left Join  (cost=1.54..1589345.66 rows=284 
width=271) (actual time=1.457..59068.107 rows=65549 loops=1)
                     ->  Nested Loop Left Join  (cost=0.98..1586923.67 rows=284 
width=255) (actual time=1.430..55661.926 rows=65549 loops=1)
                           Join Filter: (((o.id)::text = (f.obj_id)::text) AND 
((o.owner)::text = (f.user_id)::text))
                           Rows Removed by Join Filter: 132670698
                           ->  Nested Loop  (cost=0.98..1576821.09 rows=284 
width=236) (actual time=0.163..27721.566 rows=65549 loops=1)
                                 Join Filter: ((o.mime_id)::integer = 
(m.mime_id)::integer)
                                 Rows Removed by Join Filter: 48768456
                                 ->  Seq Scan on mime m  (cost=0.00..13.45 
rows=745 width=30) (actual time=0.008..1.372 rows=745 loops=1)
                                 ->  Materialize  (cost=0.98..1573634.65 
rows=284 width=214) (actual time=0.004..22.918 rows=65549 loops=745)
                                       ->  Nested Loop  (cost=0.98..1573633.23 
rows=284 width=214) (actual time=0.142..7095.554 rows=65549 loops=1)
                                             ->  Nested Loop  
(cost=0.56..1570232.37 rows=406 width=184) (actual time=0.130..6468.376 
rows=65549 loops=1)
                                                   ->  Seq Scan on objects s  
(cost=0.00..1183948.58 rows=45281 width=63) (actual time=0.098..5346.023 
rows=154347 loops=1)
                                                         Filter: 
(((mime_id)::integer = 904) OR ((mime_id)::integer = 908))
                                                         Rows Removed by 
Filter: 15191155
                                                   ->  Index Scan using 
objects_pkey on objects o  (cost=0.56..8.52 rows=1 width=140) (actual 
time=0.006..0.007 rows=0 loops=154347)
                                                         Index Cond: 
((id)::text = (s.original_file)::text)
                                             ->  Index Scan using users_pkey on 
users uc  (cost=0.42..8.37 rows=1 width=49) (actual time=0.008..0.009 rows=1 
loops=65549)
                                                   Index Cond: ((user_id)::text 
= (o.creator)::text)
                           ->  Materialize  (cost=0.00..48.36 rows=2024 
width=38) (actual time=0.001..0.133 rows=2024 loops=65549)
                                 ->  Seq Scan on flags f  (cost=0.00..38.24 
rows=2024 width=38) (actual time=0.009..0.462 rows=2024 loops=1)
                     ->  Index Scan using objects_last_activity_unique_index on 
objects_last_activity la  (cost=0.56..8.52 rows=1 width=54) (actual 
time=0.044..0.046 rows=1 loops=65549)
                           Index Cond: (((o.id)::text = (obj_id)::text) AND 
((o.owner)::text = (user_id)::text))
               ->  Index Scan using users_pkey on users uo  (cost=0.42..8.37 
rows=1 width=49) (actual time=0.015..0.019 rows=1 loops=65549)
                     Index Cond: ((user_id)::text = (o.owner)::text)
               SubPlan 1
                 ->  Aggregate  (cost=12.75..12.76 rows=1 width=32) (actual 
time=57.390..57.391 rows=1 loops=65549)
                       ->  Function Scan on get_authorized_users  
(cost=0.25..10.25 rows=1000 width=32) (actual time=57.379..57.379 rows=2 
loops=65549)
 Total runtime: 3823133.235 ms
(33 rows)

Then I tried to manually adjust the cost of the get_authorized_users function using an
"alter function ... cost" statement.
It has an effect on the plans shown by explain: now the plan with the index scan has a much
smaller cost (2.39..1911772.85) than the plan with the sort (8507669.11..8507669.13).
But the sort is still used for query execution unless I set "enable_sort=off".
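For reference, the statement I used looks roughly like this (the argument type and the concrete numbers here are only illustrative):

ALTER FUNCTION get_authorized_users(varchar) COST 10000;  -- per-call cost, the value is a guess
ALTER FUNCTION get_authorized_users(varchar) ROWS 2;      -- the plan shows ~2 rows returned, not the default 1000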

I tried to investigate how the optimizer works and found out that this choice is made
in the grouping_planner function:

        /*
         * Forget about the presorted path if it would be cheaper to sort the
         * cheapest-total path.  Here we need consider only the behavior at
         * the tuple_fraction point.  Also, limit_tuples is only relevant if
         * not grouping/aggregating, so use root->limit_tuples in the
         * cost_sort call.
         */
        if (sorted_path)
        {
            Path        sort_path;        /* dummy for result of cost_sort */

            if (root->query_pathkeys == NIL ||
                pathkeys_contained_in(root->query_pathkeys,
                                      cheapest_path->pathkeys))
            {
                /* No sort needed for cheapest path */
                sort_path.startup_cost = cheapest_path->startup_cost;
                sort_path.total_cost = cheapest_path->total_cost;
            }
            else
            {
                /* Figure cost for sorting */
                cost_sort(&sort_path, root, root->query_pathkeys,
                          cheapest_path->total_cost,
                          path_rows, path_width,
                          0.0, work_mem, root->limit_tuples);
            }

            if (compare_fractional_path_costs(sorted_path, &sort_path,
                                              tuple_fraction) > 0)
            {
                /* Presorted path is a loser */
                sorted_path = NULL;
            }
        }


In this case the cost of the sorted path is higher than that of the seq-scan path, so the
sorted path is rejected.
So here are my two questions:

1. The cost compared in grouping_planner doesn't take into account the cost of
get_authorized_users: it does not change when I alter the function's cost. Is
this the correct behavior?

2. Actually there is no need to calculate the get_authorized_users function for ALL
records before the sort. This column is not involved in any other calculation, so we
could perform the sort, extract 9 rows, and only then calculate get_authorized_users
for them. I wonder why the PostgreSQL optimizer does not try to split the target list
into columns that are needed immediately and columns that can be calculated later
(lazy evaluation of columns). It looks like it would be very efficient in the presence
of filtering and limiting operators, but for some reason it does not happen for this
query. Certainly I can rewrite the query using a nested subselect and do this
optimization myself (see the sketch below), but I find this optimization to be quite useful...
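Here is a sketch of the rewrite I have in mind (untested, columns copied from the original query): the inner query sorts and limits first, and the expensive acl column is computed only for the 9 surviving rows, since the LIMIT prevents the subquery from being pulled up.

SELECT q.*,
       (SELECT 'userid\\:'||string_agg(user_id,' userid\\:')
          FROM get_authorized_users(q.id)) AS acl
FROM (
    SELECT o.id AS id, s.id AS sid, o.owner, o.creator, o.parent_id AS dir_id,
           s.mime_id, m.c_type, s.p_file, s.mtime, o.ctime, o.name, o.title,
           o.size, o.deleted, la.otime, la.etime,
           uo.login AS owner_login, uc.login AS creator_login,
           (CASE WHEN f.user_id IS NULL THEN 0 ELSE 1 END) AS flagged
    FROM objects s
         JOIN objects o ON s.original_file = o.id
         LEFT JOIN flags f ON o.id = f.obj_id AND o.owner = f.user_id
         LEFT JOIN objects_last_activity la ON o.id = la.obj_id AND o.owner = la.user_id,
         mime m, users uc, users uo
    WHERE (s.mime_id = 904 OR s.mime_id = 908)
      AND m.mime_id = o.mime_id
      AND o.owner = uo.user_id
      AND o.creator = uc.user_id
    ORDER BY s.mtime LIMIT 9
) q
ORDER BY q.mtime;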

I can send an example reproducing the problem with the latest devel branch. It uses a
slightly different database schema and dummy data and produces different estimations
and timings, but the problems persist: the optimizer chooses the plan with the worse
cost and does not perform lazy evaluation of the target list.

Thanks in advance,
Konstantin
