On 17/1/26 00:51, Andrei Lepikhov wrote:
Thoughts?
In this letter, I want to demonstrate how to use the proposed feature.Install PostgreSQL 18 and Join-Order-Benchmark [1]. We need just one pass, executing each query one-by-one. To identify issues, I use the pg_track_optimizer extension [2] to examine the ‘bad join’ criterion. For each JOIN node, the following is calculated:
jf_max = (nfiltered1 + nfiltered2 + nfiltered3) / nloopsExtension stores MAX(jf_max) for each query to detect potentially optimisable ones. So, let’s benchmark both with and without the proposed change. TOP-5 bad queries expose quite a different picture:
SELECT queryid,LEFT(query,12) AS query,floor(jf_max) AS jf_max FROM pg_track_optimizer WHERE jf_max > 0 ORDER BY jf_max DESC LIMIT 5; Without the patch: queryid | query | jf_max ----------------------±-------------±------- 4335842597099666660 | /* 27c.sql * | 490348 patched Postgres and the extension: queryid | query | jf_max ----------------------±-------------±------- -3135863217603061370 | /* 32a.sql * | 871591 6389747766960672879 | /* 27c.sql * | 491129 -6029921280260605067 | /* 6a.sql / | 218912 5025464644910963332 | / 25b.sql * | 25753 1798119524484989875 | /* 10c.sql * | 22939In the patched variant, we see more potential cases. Let’s take a look at the first three of them.
Query 27c.sql (see analysis-27c.sql in attachment) is detected as filtering too much in both cases because, on occasion, intensive filtering occurs in a NestLoop node. Index identification is a separate topic, but here I used a quite simple approach: I just asked Claude to collect join clauses that filter out many rows and propose potential indexes. It is not for production, but worked with such a simple and limited set of queries.
So, without the patch, we have extra indexes: CREATE INDEX ON movie_info(movie_id); CREATE INDEX ON movie_companies(movie_id);Patched version finds one more problematic join, and the recommendation changes a little:
CREATE INDEX ind_3 ON movie_companies(movie_id,company_id,company_type_id); After applying these indexes query speeds up 3340 ms → 32 ms (x100)Now, undetected without patch 32a.sql, exposes that we can employ the following indexes:
CREATE INDEX ON movie_keyword (keyword_id,movie_id); CREATE INDEX ON movie_link(movie_id); It speeds up query: 250ms → 16ms (x15) And the last one, 6a.sql, needs two iterations: The first one demonstrates we need the index: CREATE INDEX ON cast_info(movie_id,person_id);An additional pass shows we can reduce the number of fetched tuples with the following index:
CREATE INDEX ON movie_keyword(keyword_id,movie_id); Finally, that speeds up query: 150ms->220ms → 26ms (x7)As you can see, the main idea is to draw attention to specific queries and having specific methodology propose indexes that may reduce the size of initially fetched data.
[1] https://github.com/danolivo/jo-bench [2] https://github.com/danolivo/pg_track_optimizer/tree/nfiltered3 -- regards, Andrei Lepikhov, pgEdge
analysis-06a.sql
Description: application/sql
analysis-27c.sql
Description: application/sql
analysis-32a.sql
Description: application/sql
