Re: Inconsistent query performance based on relation hit frequency

Andrei Lepikhov Thu, 27 Jun 2024 03:32:09 -0700

On 6/27/24 07:50, Laura Hausmann wrote:

I'd appreciate any and all input on the situation. If I've left out anyinformation that would be useful in figuring this out, please tell me.

Thanks for this curious case, I like it!

At first, you can try to avoid "OR" expressions - PostgreSQL has quitelimited set of optimisation/prediction tricks on such expressions.Second - I see, postgres predicts wrong number of tuples. But using mytypical tool [1] and getting more precise estimations i don't seesignificant profit:


 Limit  (cost=10832.85..10838.69 rows=50 width=21)
   ->  Gather Merge  (cost=10832.85..10838.92 rows=52 width=21)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=9832.83..9832.90 rows=26 width=21)
               Sort Key: objects.id DESC
               Sort Method: top-N heapsort  Memory: 32kB
               Worker 0:  Sort Method: quicksort  Memory: 32kB
               Worker 1:  Sort Method: quicksort  Memory: 32kB
               ->  Parallel Seq Scan on objects
                     Filter: ((hashed SubPlan 1) OR ("userId" = 1))
                     Rows Removed by Filter: 183372
                     SubPlan 1
                       ->  Nested Loop
                             ->  Index Only Scan using users_pkey on
                                   Index Cond: (id = 1)
                                   Heap Fetches: 0

-> Index Only Scan using"relationships_followerId_followeeId_idx" on relationships

                                   Index Cond: ("followerId" = 1)
                                   Heap Fetches: 0
 Planning Time: 0.762 ms
 Execution Time: 43.816 ms

 Limit  (cost=10818.83..10819.07 rows=2 width=21)
   ->  Gather Merge  (cost=10818.83..10819.07 rows=2 width=21)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=9818.81..9818.81 rows=1 width=21)
               Sort Key: objects.id DESC
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: quicksort  Memory: 25kB
               Worker 1:  Sort Method: quicksort  Memory: 25kB
               ->  Parallel Seq Scan on objects
                     Filter: ((hashed SubPlan 1) OR ("userId" = 4))
                     Rows Removed by Filter: 183477
                     SubPlan 1
                       ->  Nested Loop  (cost=0.56..8.61 rows=1 width=4)

-> Index Only Scan using"relationships_followerId_followeeId_idx" on relationships

                                   Index Cond: ("followerId" = 4)
                                   Heap Fetches: 0
                             ->  Index Only Scan using users_pkey
                                   Index Cond: (id = 4)
                                   Heap Fetches: 0
 Planning Time: 0.646 ms
 Execution Time: 30.824 ms

But this was achieved just because of parallel workers utilisation.Disabling them we get:

Limit (cost=14635.07..14635.08 rows=2 width=21) (actualtime=75.941..75.943 rows=0 loops=1)-> Sort (cost=14635.07..14635.08 rows=2 width=21) (actualtime=75.939..75.940 rows=0 loops=1)

         Sort Key: objects.id DESC
         Sort Method: quicksort  Memory: 25kB

-> Seq Scan on objects (cost=8.61..14635.06 rows=2 width=21)(actual time=75.931..75.932 rows=0 loops=1)

               Filter: ((hashed SubPlan 1) OR ("userId" = 4))
               Rows Removed by Filter: 550430
               SubPlan 1

-> Nested Loop (cost=0.56..8.61 rows=1 width=4)(actual time=0.039..0.040 rows=0 loops=1)-> Index Only Scan using"relationships_followerId_followeeId_idx" on relationships(cost=0.28..4.29 rows=1 width=8) (actual time=0.038..0.038 rows=0 loops=1)

                             Index Cond: ("followerId" = 4)
                             Heap Fetches: 0

-> Index Only Scan using users_pkey on users(cost=0.29..4.31 rows=1 width=4) (never executed)

                             Index Cond: (id = 4)
                             Heap Fetches: 0
 Planning Time: 0.945 ms
 Execution Time: 76.123 ms

So, from the optimiser's point of view, it has done the best it could.

Theoretically, if you have a big table with indexes and must select asmall number of tuples, the ideal query plan will include parameterisedNestLoop JOINs. Unfortunately, parameterisation in PostgreSQL can't passinside a subquery. It could be a reason for new development becauseMSSQL can do such a trick, but it is a long way.You can try to rewrite your schema and query to avoid subqueries inexpressions at all.

I hope this message gave you some insights.

[1] https://github.com/postgrespro/aqo

--
regards, Andrei Lepikhov

Re: Inconsistent query performance based on relation hit frequency

Reply via email to