Thanks, Tomas!

> I know, but it makes them harder to read for people. If you want people
> to respond it's generally a good idea to make it easy to understand the
> question. Don't make them waste their time - they'll just skip the
> message entirely.
Fair point.

> So, the optimizer clearly believes the subquery case has cost 9.15,
> while the inner join case costs 2.15. So it believes the plan is
> "cheaper" than the subquery. So even if it knew how to do the
> transformation / build the other plan (which I'm not sure it can), it
> probably wouldn't do it.

> AFAICS there's no chance to make this bit smarter until the estimates
> get much better to reality.

Got it. Thanks. I guess we'll have to emit correlated subqueries/CTEs.

Sincerely,
Bakhtiyar

On Wed, Jun 21, 2023 at 12:58 PM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
> On 6/21/23 20:37, Bəxtiyar Neyman wrote:
> > Thanks Tomas for the lengthy write-up!
> >
> > Pardon the noise in the queries (LATERAL, AND true etc): they were
> > autogenerated by the library we wrote.
> >
>
> I know, but it makes them harder to read for people. If you want people
> to respond it's generally a good idea to make it easy to understand the
> question. Don't make them waste their time - they'll just skip the
> message entirely.
>
> >> Because those queries are not doing the same thing. In the first query
> >> you sort by t3_0 columns, while the "id = 4732455" condition is on the
> >> other table. And so it can't use the index scan for sorting.
> >>
> >> While in the second query it can do that, and it doesn't need to do the
> >> explicit sort (which needs to fetch all the rows etc.).
> >
> > Let me try to explain what both of my queries do:
> > 1) Get the rank of the user using its id (id = 4732455 in this example,
> > but it could have been one that exists, e.g. id = 500). This is LATERAL
> > t3_1 in the first query and subquery in the WHERE clause of the second
> > query.
> > 2) Using that rank, get the next 10 users by rank. This is t3_0.
> >
> > Thus I can't just change the first query to "ORDER BY t3_1."rank" DESC,
> > t3_1."id" DESC" as you suggest, because then the order of returned rows
> > will not be guaranteed.
> > In fact, such a clause will have no effect
> > because there is going to be at most one row supplied by t3_1 anyway.
> >
>
> Ah, OK. I got this wrong.
>
> > My question thus still stands. The planner knows that t3_1 has at most
> > one row, and it knows that t3_0 can produce up to 5000 rows. Yet, it
> > doesn't figure out that it could have lowered the Join Filter condition
> > from the first plan as an Index Cond of the Index Scan of t3_1. Is there
> > a fundamental reason for this, or is this something worth improving in
> > the planner?
> >
>
> As I tried to explain before, I don't think the problem is in the
> planner not being able to do this transformation, but more likely in not
> being able to cost it correctly.
>
> Consider this (with 1M rows in the user_ranks table):
>
> 1) subquery case
> =================
>
>  Limit  (cost=8.87..9.15 rows=10 width=8) (actual time=0.032..0.037
> rows=10 loops=1)
>    Output: t3_0.id, t3_0.rank
>    InitPlan 1 (returns $0,$1)
>      ->  Index Scan using user_ranks_pkey on public.user_ranks t4_0
> (cost=0.42..8.44 rows=1 width=8) (actual time=0.017..0.019 rows=1 loops=1)
>            Output: t4_0.rank, t4_0.id
>            Index Cond: (t4_0.id = 333333)
>    ->  Index Only Scan Backward using "by (rank, id)" on
> public.user_ranks t3_0  (cost=0.42..9493.75 rows=333333 width=8) (actual
> time=0.031..0.033 rows=10 loops=1)
>          Output: t3_0.id, t3_0.rank
>          Index Cond: (ROW(t3_0.rank, t3_0.id) <= ROW($0, $1))
>          Heap Fetches: 0
>  Planning Time: 0.072 ms
>  Execution Time: 0.055 ms
> (12 rows)
>
>
> 2) join
> =======
>
>  Limit  (cost=0.85..2.15 rows=10 width=8) (actual time=464.662..464.672
> rows=10 loops=1)
>    Output: t3_0.id, t3_0.rank
>    ->  Nested Loop  (cost=0.85..43488.87 rows=333333 width=8) (actual
> time=464.660..464.667 rows=10 loops=1)
>          Output: t3_0.id, t3_0.rank
>          Inner Unique: true
>          Join Filter: (ROW(t3_0.rank, t3_0.id) <= ROW(t4_0.rank, t4_0.id))
>          Rows Removed by Join Filter: 666667
>          ->  Index Only Scan Backward using "by (rank, id)" on
> public.user_ranks
> t3_0  (cost=0.42..25980.42 rows=1000000 width=8)
> (actual time=0.015..93.703 rows=666677 loops=1)
>                Output: t3_0.rank, t3_0.id
>                Heap Fetches: 0
>          ->  Materialize  (cost=0.42..8.45 rows=1 width=8) (actual
> time=0.000..0.000 rows=1 loops=666677)
>                Output: t4_0.rank, t4_0.id
>                ->  Index Scan using user_ranks_pkey on public.user_ranks
> t4_0  (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.011 rows=1
> loops=1)
>                      Output: t4_0.rank, t4_0.id
>                      Index Cond: (t4_0.id = 333333)
>  Planning Time: 0.092 ms
>  Execution Time: 464.696 ms
> (17 rows)
>
>
> 3) join (with LEFT JOIN)
> ========================
>
>  Limit  (cost=20038.73..20038.76 rows=10 width=8) (actual
> time=180.714..180.720 rows=10 loops=1)
>    Output: t3_0.id, t3_0.rank
>    ->  Sort  (cost=20038.73..20872.06 rows=333333 width=8) (actual
> time=180.712..180.715 rows=10 loops=1)
>          Output: t3_0.id, t3_0.rank
>          Sort Key: t3_0.rank DESC, t3_0.id DESC
>          Sort Method: top-N heapsort  Memory: 26kB
>          ->  Nested Loop Left Join  (cost=0.85..12835.52 rows=333333
> width=8) (actual time=0.033..122.000 rows=333333 loops=1)
>                Output: t3_0.id, t3_0.rank
>                ->  Index Scan using user_ranks_pkey on public.user_ranks
> t4_0  (cost=0.42..8.44 rows=1 width=8) (actual time=0.018..0.020 rows=1
> loops=1)
>                      Output: t4_0.id, t4_0.rank
>                      Index Cond: (t4_0.id = 333333)
>                ->  Index Only Scan using "by (rank, id)" on
> public.user_ranks t3_0  (cost=0.42..9493.75 rows=333333 width=8) (actual
> time=0.013..49.759 rows=333333 loops=1)
>                      Output: t3_0.rank, t3_0.id
>                      Index Cond: (ROW(t3_0.rank, t3_0.id) <=
> ROW(t4_0.rank, t4_0.id))
>                      Heap Fetches: 0
>  Planning Time: 0.087 ms
>  Execution Time: 180.744 ms
> (17 rows)
>
>
> So, the optimizer clearly believes the subquery case has cost 9.15,
> while the inner join case costs 2.15. So it believes the plan is
> "cheaper" than the subquery. So even if it knew how to do the
> transformation / build the other plan (which I'm not sure it can), it
> probably wouldn't do it.
>
> OTOH if you rewrite it to a left join, it costs 20038.76 - way more than
> the inner join, but it's actually 2x faster.
>
> AFAICS there's no chance to make this bit smarter until the estimates
> get much better to reality.
>
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
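
P.S. For anyone wanting to reproduce the three plans above: the exact
queries were autogenerated and aren't shown in this excerpt, so the
statements below are only a reconstruction inferred from the plans
(schema assumed: user_ranks(id, rank) with primary key on id and a
composite index named "by (rank, id)"; the anchor id 333333 comes from
Tomas's Index Cond). Run each under EXPLAIN (ANALYZE, VERBOSE) against a
table with ~1M rows.

```
-- Assumed schema, inferred from the plans:
CREATE TABLE user_ranks (
    id   bigint PRIMARY KEY,
    rank bigint NOT NULL
);
CREATE UNIQUE INDEX "by (rank, id)" ON user_ranks (rank, id);

-- 1) subquery case: the anchor row's (rank, id) becomes an InitPlan,
--    so the row comparison is pushed into the Index Cond.
SELECT t3_0.id, t3_0.rank
FROM user_ranks t3_0
WHERE (t3_0.rank, t3_0.id) <= (SELECT t4_0.rank, t4_0.id
                               FROM user_ranks t4_0
                               WHERE t4_0.id = 333333)
ORDER BY t3_0.rank DESC, t3_0.id DESC
LIMIT 10;

-- 2) inner join: the row comparison stays a Join Filter, which the
--    planner costs at 2.15 but which scans/discards 666k rows.
SELECT t3_0.id, t3_0.rank
FROM user_ranks t3_0
JOIN user_ranks t4_0 ON t4_0.id = 333333
WHERE (t3_0.rank, t3_0.id) <= (t4_0.rank, t4_0.id)
ORDER BY t3_0.rank DESC, t3_0.id DESC
LIMIT 10;

-- 3) LEFT JOIN variant: costed far higher (20038.76) but roughly 2x
--    faster, since the condition is pushed into the inner Index Cond.
SELECT t3_0.id, t3_0.rank
FROM user_ranks t4_0
LEFT JOIN user_ranks t3_0
       ON (t3_0.rank, t3_0.id) <= (t4_0.rank, t4_0.id)
WHERE t4_0.id = 333333
ORDER BY t3_0.rank DESC, t3_0.id DESC
LIMIT 10;
```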