Hi Junwang, Thanks for an interesting thread, and for picking this up — it has been a useful occasion to re-examine assumptions I had carried over from a different graph model.
My background is on AgensGraph, where labels form an inheritance hierarchy (rather than Neo4j-style multi-label tag sets), so a path pattern rewrites against a single parent table per position and the fan-out across labels is left to the planner's existing inheritance machinery. The rewriter never enumerates per-element combinations, so an N^K blow-up like Satya's was simply not on the radar there. SQL/PGQ instead designates graph identity through the schema itself, so each pattern position carries a set of candidate element tables and the rewriter has to materialize each schema-feasible join combination — a (vertex, edge, vertex, ...) sequence of element tables — as its own SELECT branch, with all such branches joined by UNION ALL. The number of *feasible* join combinations is bounded by the schema graph's connectivity and is usually small; the current rewriter's naive shape, however, is to enumerate the full N^K candidate space first and reject infeasible combinations only afterwards. The patch tackles exactly that part. To be clear, I do not mean this as a criticism of the original implementer. For sizeable features I tend to prefer a deliberately naive implementation first — getting the full functionality working end-to-end before optimizing — and that approach is even more valuable in team settings, since a working baseline lets additional contributors join much earlier. The current shape of generate_queries_for_path_pattern_recurse() reads exactly like such a baseline, and this patch is precisely the kind of focused follow-up improvement it invites. When I thought about this independently, I had reached for the dual formulation — "extend the prefix only along feasible edges" rather than "enumerate everything and prune" — and my intuition was that the two should produce the same surviving set, though I have not worked that through rigorously. Either way, your "early pruning" framing is the better one for the patch itself: it stays close to the existing code structure and lets the change read as a small refinement of generate_queries_for_path_pattern_recurse() rather than a reorganization of how candidates are walked. That makes review and incremental landing materially easier. The same reframing also explains, after the fact, why the discrepancy between Ashutosh's quick runs and Satya's 81 GB case was so large: the cost being measured is in large part infeasible enumeration, which a forward-moved check eliminates regardless of pattern length or schema size in realistic graphs. I have not gone through the patch at the code level yet, but the shape of the problem and of your fix is clear enough from the description. My personal take, with that caveat, is that early pruning, together with the CHECK_FOR_INTERRUPTS that Satya already proposed in the earlier thread [1], feels like the right pair to land first; the explicit hard limit, in turn, is probably best revisited only if a real case turns up where even those two combined are not enough. That keeps the immediate change narrow and semantics-preserving while leaving the policy question open for evidence rather than upfront estimation. There is a fairly recent precedent in PostgreSQL itself that I think supports this ordering. I have been following — and participating in — Tatsuo Ishii's Row Pattern Recognition thread [2] since earlier this year, and that work ran into a structurally similar concern: in its earlier implementation the engine could undergo a combinatorial explosion of pattern-candidate states. A memory-based limit was floated as one mitigation, but it never actually converged into a design — the conversation around an explicit limit simply faded out as a layered set of pruning and early-termination rules accumulated incrementally and brought the state space back under control. CFI and stack-depth checks, by contrast, were treated as independently necessary throughout and stayed in the design. The shape of this thread feels analogous, and the same staged approach (structural fix + CFI first, hard limit only if a residual case genuinely demands it) is, I suspect, what will hold up here as well. I'll set aside time separately to do a proper line-by-line review and follow up. Regards, Henson [1] https://www.postgresql.org/message-id/CAHg%2BQDe8JU%2BERqA2xwjrg_ZptvH_v0T6PS9_P_ZgyYzD5h-Grw%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/20230625.210509.1276733411677577841.t-ishii%40sranhm.sra.co.jp
