Thank you for looking into this. Now we can execute a still narrow family of queries!
Maybe it helps to see this as a *social network feed*. Imagine a social
network: you have a few friends, or follow a few people, and you want to
see their updates ordered by date. For each user we have a different
combination of users whose posts we have to display, but even with
hundreds of followed users we may only show the first 10 posts. (There is
a concrete SQL sketch of this query shape below, after the quoted message.)

There is low-hanging fruit in the skip scan: if we only need M rows in
total, we can stop scanning a group as soon as it has produced M rows.
Let Nx be the number of friends, M the number of posts to show, and N the
number of posts per friend. With that early stop the skip scan reads
(Nx * M) rows, followed by an (Nx * M) sort, instead of (Nx * N) rows
followed by an (Nx * N) sort. With M = 10 and N = 1000 this is a
significant improvement. But if M ~ N, the merge scan wins: it runs with
M + Nx row accesses and (M + Nx) heap operations. If everything is on the
same page the skip scan would win.

The cost estimation is probably far off. I am also not considering the
filters applied after this operator, and I don't know whether the planner
infrastructure is able to adjust for them by itself. This is where I would
like reviewers' feedback. I think the planner costs are something to be
determined experimentally.

Next I will make it slightly more general, handling (the SQL shapes are
spelled out below, after the quoted message):

* More index columns: an index on (a, b, s...) could support
  WHERE a IN (...) ORDER BY b LIMIT N (ignoring s...)
* Multi-column prefix: WHERE (a, b) IN (...) ORDER BY c
* Non-leading prefix: WHERE b IN (...) AND a = const ORDER BY c
  on index (a, b, c)

---
Kind Regards,
Alexandre

On Wed, Feb 4, 2026 at 7:13 AM Michał Kłeczek <[email protected]> wrote:
>
>
> On 3 Feb 2026, at 22:42, Ants Aasma <[email protected]> wrote:
>
> On Mon, 2 Feb 2026 at 01:54, Tomas Vondra <[email protected]> wrote:
>
> I'm also wondering how common is the targeted query pattern? How common
> it is to have an IN condition on the leading column in an index, and
> ORDER BY on the second one?
>
>
> I have seen this pattern multiple times. My nickname for it is the
> timeline view. Think of the social media timeline, showing posts from
> all followed accounts in timestamp order, returned in reasonably sized
> batches. The naive SQL query will have to scan all posts from all
> followed accounts and pass them through a top-N sort. When the total
> number of posts is much larger than the batch size this is much slower
> than what is proposed here (assuming I understand it correctly) -
> effectively equivalent to running N index scans through Merge Append.
>
>
> My workarounds I have proposed users have been either to rewrite the
> query as a UNION ALL of a set of single value prefix queries wrapped
> in an order by limit. This gives the exact needed merge append plan
> shape. But repeating the query N times can get unwieldy when the
> number of values grows, so the fallback is:
>
> SELECT * FROM unnest(:friends) id, LATERAL (
> SELECT * FROM posts
> WHERE user_id = id
> ORDER BY tstamp DESC LIMIT 100)
> ORDER BY tstamp DESC LIMIT 100;
>
> The downside of this formulation is that we still have to fetch a
> batch worth of items from scans where we otherwise would have only had
> to look at one index tuple.
>
>
> GIST can be used to handle this kind of queries as it supports multiple
> sort orders.
> The only problem is that GIST does not support ORDER BY column.
> One possible workaround is [1] but as described there it does not play
> well with partitioning.
> I’ve started drafting support for ORDER BY column in GIST - see [2].
> I think it would be easier to implement and maintain than a new IAM (but I
> don’t have enough knowledge and experience to implement it myself)
>
> [1]
> https://www.postgresql.org/message-id/3FA1E0A9-8393-41F6-88BD-62EEEA1EC21F%40kleczek.org
> [2]
> https://www.postgresql.org/message-id/B2AC13F9-6655-4E27-BFD3-068844E5DC91%40kleczek.org
>
> —
> Kind regards,
> Michal
>
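
To make the target query shape concrete, here is a rough sketch. The
posts/user_id/tstamp names are made up (loosely borrowed from the query
quoted above) and only illustrate the intended plan shape:

CREATE INDEX ON posts (user_id, tstamp);

-- Nx = number of values in the IN list, M = LIMIT.
-- Today this reads all posts of all listed users and top-N sorts them;
-- the merge scan should be able to answer it with roughly M + Nx row
-- accesses, like a Merge Append over per-user index scans.
SELECT *
FROM posts
WHERE user_id IN (1, 2, 3, 4, 5)
ORDER BY tstamp DESC
LIMIT 10;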
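
And to spell out the three generalizations listed above as SQL (again
with made-up table and column names; only the WHERE / ORDER BY shapes
matter):

-- More index columns: index on (a, b, s), trailing column ignored
SELECT * FROM t WHERE a IN (1, 2, 3) ORDER BY b LIMIT 10;

-- Multi-column prefix: index on (a, b, c)
SELECT * FROM t WHERE (a, b) IN ((1, 10), (2, 20)) ORDER BY c LIMIT 10;

-- Non-leading prefix: index on (a, b, c), leading column fixed by equality
SELECT * FROM t WHERE a = 42 AND b IN (1, 2, 3) ORDER BY c LIMIT 10;
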
0002-MERGE-SCAN-Access-method.patch
Description: Binary data
0003-MERGE-SCAN-Planner-integration.patch
Description: Binary data
0001-MERGE-SCAN-Test-the-baseline.patch
Description: Binary data
