On 6/21/23 20:37, Bəxtiyar Neyman wrote: > Thanks Tomas for the lengthy write-up! > > Pardon the noise in the queries (LATERAL, AND true etc): they were > autogenerated by the library we wrote. >
I know, but it makes them harder to read for people. If you want people to respond it's generally a good idea to make it easy to understand the question. Don't make them waste their time - they'll just skip the message entirely. >> Because those queries are not doing the same thing. In the first query >> you sort by t3_0 columns, while the "id = 4732455" condition is on the >> other table. And so it can't use the index scan for sorting. >> >> While in the second query it can do that, and it doesn't need to do the >> explicit sort (which needs to fetch all the rows etc.). > > Let me try to explain what both of my queries do: > 1) Get the rank of the user using its id (id = 4732455 in this example, > but it could have been one that exists, e.g. id = 500). This is LATERAL > t3_1 in the first query and subquery in the WHERE clause of the second > query. > 2) Using that rank, get the next 10 users by rank. This is t3_0. > > Thus I can't just change the first query to "ORDER BY t3_1."rank" DESC, > t3_1."id" DESC" as you suggest, because then the order of returned rows > will not be guaranteed. In fact, such a clause will have no effect > because there is going to be at most one row supplied by t3_1 anyway. > Ah, OK. I got this wrong. > My question thus still stands. The planner knows that t3_1 has at most > one row, and it knows that t3_0 can produce up to 5000 rows. Yet, it > doesn't figure out that it could have lowered the Join Filter condition > from the first plan as an Index Cond of the Index Scan of t3_1. Is there > a fundamental reason for this, or is this something worth improving in > the planner? > As I tried to explain before, I don't think the problem is in the planner not being able to do this transformation, but more likely in not being able to cost it correctly. Consider this (with 1M rows in the user_ranks table): 1) subquery case ================= Limit (cost=8.87..9.15 rows=10 width=8) (actual time=0.032..0.037 rows=10 loops=1) Output: t3_0.id, t3_0.rank InitPlan 1 (returns $0,$1) -> Index Scan using user_ranks_pkey on public.user_ranks t4_0 (cost=0.42..8.44 rows=1 width=8) (actual time=0.017..0.019 rows=1 loops=1) Output: t4_0.rank, t4_0.id Index Cond: (t4_0.id = 333333) -> Index Only Scan Backward using "by (rank, id)" on public.user_ranks t3_0 (cost=0.42..9493.75 rows=333333 width=8) (actual time=0.031..0.033 rows=10 loops=1) Output: t3_0.id, t3_0.rank Index Cond: (ROW(t3_0.rank, t3_0.id) <= ROW($0, $1)) Heap Fetches: 0 Planning Time: 0.072 ms Execution Time: 0.055 ms (12 rows) 2) join ======= Limit (cost=0.85..2.15 rows=10 width=8) (actual time=464.662..464.672 rows=10 loops=1) Output: t3_0.id, t3_0.rank -> Nested Loop (cost=0.85..43488.87 rows=333333 width=8) (actual time=464.660..464.667 rows=10 loops=1) Output: t3_0.id, t3_0.rank Inner Unique: true Join Filter: (ROW(t3_0.rank, t3_0.id) <= ROW(t4_0.rank, t4_0.id)) Rows Removed by Join Filter: 666667 -> Index Only Scan Backward using "by (rank, id)" on public.user_ranks t3_0 (cost=0.42..25980.42 rows=1000000 width=8) (actual time=0.015..93.703 rows=666677 loops=1) Output: t3_0.rank, t3_0.id Heap Fetches: 0 -> Materialize (cost=0.42..8.45 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=666677) Output: t4_0.rank, t4_0.id -> Index Scan using user_ranks_pkey on public.user_ranks t4_0 (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.011 rows=1 loops=1) Output: t4_0.rank, t4_0.id Index Cond: (t4_0.id = 333333) Planning Time: 0.092 ms Execution Time: 464.696 ms (17 rows) 3) join (with LEFT JOIN) ======================== Limit (cost=20038.73..20038.76 rows=10 width=8) (actual time=180.714..180.720 rows=10 loops=1) Output: t3_0.id, t3_0.rank -> Sort (cost=20038.73..20872.06 rows=333333 width=8) (actual time=180.712..180.715 rows=10 loops=1) Output: t3_0.id, t3_0.rank Sort Key: t3_0.rank DESC, t3_0.id DESC Sort Method: top-N heapsort Memory: 26kB -> Nested Loop Left Join (cost=0.85..12835.52 rows=333333 width=8) (actual time=0.033..122.000 rows=333333 loops=1) Output: t3_0.id, t3_0.rank -> Index Scan using user_ranks_pkey on public.user_ranks t4_0 (cost=0.42..8.44 rows=1 width=8) (actual time=0.018..0.020 rows=1 loops=1) Output: t4_0.id, t4_0.rank Index Cond: (t4_0.id = 333333) -> Index Only Scan using "by (rank, id)" on public.user_ranks t3_0 (cost=0.42..9493.75 rows=333333 width=8) (actual time=0.013..49.759 rows=333333 loops=1) Output: t3_0.rank, t3_0.id Index Cond: (ROW(t3_0.rank, t3_0.id) <= ROW(t4_0.rank, t4_0.id)) Heap Fetches: 0 Planning Time: 0.087 ms Execution Time: 180.744 ms (17 rows) So, the optimizer clearly believes the subquery case has cost 9.15, while the inner join case costs 2.15. So it believes the plan is "cheaper" than the subquery. So even if it knew how to do the transformation / build the other plan (which I'm not sure it can), it probably wouldn't do it. OTOH if you rewrite it to a left join, it costs 20038.76 - way more than the inner join, but it's actually 2x faster. AFAICS there's no chance to make this bit smarter until the estimates get much better to reality. -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company