Has there been any progress on this? Shortest path queries seem to be something that would greatly benefit from having UNION DISTINCT ON ()
--- Hannu On Mon, Jan 5, 2026 at 2:58 PM Henson Choi <[email protected]> wrote: > > Hi hackers, > > I think it may be a bit early for this, but I wanted to share this design > document now so we can revisit and discuss when the time is right. > > ===================================================== > Shortestpath Implementation Strategies for PostgreSQL > ===================================================== > > 1. Overview > ----------- > > This document describes two implementation strategies for shortest path > queries in PostgreSQL-based graph databases. > > Shortestpath Query Example: > > MATCH p = shortestpath((a)-[*]->(b)) RETURN p > MATCH p = allshortestpaths((a)-[:knows*]->(b)) RETURN p > > Two implementation approaches: > > 1. Unoptimized: Query rewriting to WITH RECURSIVE (CTE) > 2. Optimized: Custom executor node with bidirectional BFS > > > 2. Unoptimized Approach: WITH RECURSIVE Rewriting > ------------------------------------------------- > > 2.1 Query Transformation > > Cypher query: > > MATCH p = shortestpath((a:person)-[:knows*]->(b:person)) > WHERE a.name = 'Alice' AND b.name = 'Bob' > RETURN p > > Rewritten SQL using WITH RECURSIVE: > > WITH RECURSIVE sp_paths AS ( > -- Base case: start vertex > SELECT v.id as start_vid, > v.id as end_vid, > ARRAY[v.id] as vertex_ids, > ARRAY[]::bigint[] as edge_ids, > 0 as depth > FROM person v > WHERE name = 'Alice' > > UNION ALL > > -- Recursive case: expand one hop > SELECT p.start_vid, > e.end_id, > p.vertex_ids || e.end_id, > p.edge_ids || e.id, > p.depth + 1 > FROM sp_paths p > JOIN knows e ON p.end_vid = e.start_id > WHERE e.end_id <> ALL(p.vertex_ids) -- Cycle prevention > ) > SELECT vertex_ids, edge_ids, depth > FROM sp_paths > WHERE end_vid = (SELECT id FROM person WHERE name = 'Bob') > ORDER BY depth > LIMIT 1; > > 2.2 Execution Plan > > Limit > -> Sort > Sort Key: depth > -> CTE Scan on sp_paths > Filter: (end_vid = $bob_id) > CTE sp_paths > Recursive Union > Seq Scan on person v > Filter: (name = 'Alice') > Hash Join > Hash Cond: (p.end_vid = e.start_id) > Filter: (e.end_id <> ALL(p.vertex_ids)) > WorkTable Scan on sp_paths p > Hash > Seq Scan on knows e > > 2.3 Problems > > ┌─────────────────────┬──────────────────────────────────────────┐ > │ Problem │ Description │ > ├─────────────────────┼──────────────────────────────────────────┤ > │ Unidirectional │ Search expands only from start vertex │ > │ │ Misses optimization from end vertex │ > ├─────────────────────┼──────────────────────────────────────────┤ > │ Full Exploration │ Must explore all paths up to min depth │ > │ │ before finding shortest │ > ├─────────────────────┼──────────────────────────────────────────┤ > │ No Early Termination│ Cannot stop when shortest found │ > │ │ CTE runs to completion │ > ├─────────────────────┼──────────────────────────────────────────┤ > │ Memory Usage │ O(V) visited vertices per iteration │ > │ │ All partial paths stored │ > ├─────────────────────┼──────────────────────────────────────────┤ > │ Cycle Prevention │ <> ALL(array) grows with path length │ > │ │ O(path_length) per edge check │ > └─────────────────────┴──────────────────────────────────────────┘ > > 2.4 Search Space Comparison > > Unidirectional BFS (CTE): > > Start ──────────────────────────────────────> End > ╲ ╱ > ╲ Explores entire cone ╱ > ╲ from start ╱ > ╲ ╱ > ╲ Search space: O(B^D) ╱ > ╲ ╱ > ──────────────────────── > > Bidirectional BFS (Optimized): > > Start ────────> <──────── End > ╲ ╱╲ ╱ > ╲ ╱ ╲ ╱ > ╲ ╱ ╲ ╱ > ╲ ╱ Meet ╲ ╱ > ╲╱ Point ╲ > Search space: O(2 * B^(D/2)) > > > 3. Optimized Approach: Custom Executor Node > ------------------------------------------- > > 3.1 Design Principle > > Instead of query rewriting, implement a dedicated executor node that: > > - Performs bidirectional BFS from both start and end vertices > - Uses hash tables to track visited vertices from each direction > - Detects meeting point where both searches intersect > - Terminates early when shortest path is found > > 3.2 Plan Node Structure > > ┌────────────────────────────────────────────────────────────┐ > │ ShortestpathPlan │ > │ - Bidirectional BFS with dual hash tables │ > │ - Early termination on path found │ > └────────────────────────────────────────────────────────────┘ > │ │ > left │ │ right > ▼ ▼ > ┌───────────────────────────┐ ┌───────────────────────────┐ > │ Hash2Side (Left) │ │ Hash2Side (Right) │ > │ - KeyTable + HashTable │ │ - KeyTable + HashTable │ > │ - Forward direction │ │ - Reverse direction │ > └───────────────────────────┘ └───────────────────────────┘ > │ │ > ▼ ▼ > ┌───────────────────────────┐ ┌───────────────────────────┐ > │ SubPlan (Forward) │ │ SubPlan (Reverse) │ > │ Index Cond: │ │ Index Cond: │ > │ start_id = $current │ │ end_id = $current │ > └───────────────────────────┘ └───────────────────────────┘ > > Execution Flow: > > Initialize: > Left HashTable = {start_vertex} > Right HashTable = {end_vertex} > │ > ▼ > Expansion loop (smaller side first): > │ > ├──→ Compare |Left frontier| vs |Right frontier| > │ │ > │ └──→ Expand smaller side > │ │ > │ └──→ Check intersection with other side > │ > ├──→ Repeat until intersection found > │ > └──→ Return shortest path(s) > > > 3.3 Key Concept: Dual Hash Tables > > The executor maintains two hash tables for bidirectional search: > > ┌─────────────────────────────────────────────────────────────┐ > │ Left Direction (Start → End) │ > │ ┌─────────────────────────────────────────────────────┐ │ > │ │ KeyTable: Previous hop vertices (expansion base) │ │ > │ │ HashTable: Current hop results (newly discovered) │ │ > │ └─────────────────────────────────────────────────────┘ │ > ├─────────────────────────────────────────────────────────────┤ > │ Right Direction (End → Start) │ > │ ┌─────────────────────────────────────────────────────┐ │ > │ │ KeyTable: Previous hop vertices (expansion base) │ │ > │ │ HashTable: Current hop results (newly discovered) │ │ > │ └─────────────────────────────────────────────────────┘ │ > └─────────────────────────────────────────────────────────────┘ > > After each hop: > - Swap KeyTable and HashTable (new base for next hop) > - Check for intersection between Left and Right HashTables > - If intersection found, reconstruct path through meeting point > > 3.4 Key Concept: Smaller Side Expansion > > At each hop, expand from the direction with fewer frontier vertices. > This minimizes the total number of edges scanned. > > Example (highly unbalanced tree): > > Graph structure: > - Forward (Left->Right): branching factor = 100 > - Backward (Right->Left): branching factor = 2 > - Path distance: 6 hops > > Simple alternating: > > Hop 1: expand Left (1) -> Left=100, Right=1 > Hop 2: expand Right (1) -> Left=100, Right=2 > Hop 3: expand Left (100) -> Left=10,000, Right=2 > Hop 4: expand Right (2) -> Left=10,000, Right=4 > Hop 5: expand Left (10,000) -> Left=1,000,000, Right=4 > Hop 6: expand Right (4) -> meet! > > Total vertices expanded: 1 + 1 + 100 + 2 + 10,000 + 4 = 10,108 > > With smaller side expansion: > > Hop 1: Left=1, Right=1 -> expand Left -> Left=100, Right=1 > Hop 2: 1 < 100 -> expand Right -> Left=100, Right=2 > Hop 3: 2 < 100 -> expand Right -> Left=100, Right=4 > Hop 4: 4 < 100 -> expand Right -> Left=100, Right=8 > Hop 5: 8 < 100 -> expand Right -> Left=100, Right=16 > Hop 6: 16 < 100 -> expand Right -> meet! > > Total vertices expanded: 1 + 1 + 2 + 4 + 8 + 16 = 32 > > Speedup: 10,108 / 32 = ~300x fewer vertices explored > > > 3.5 Key Concept: Cycle Detection via Hash Lookup > > Hash table insertion automatically prevents revisiting vertices. > > Two tuple types in HashTable: > > ┌─────────────────────────────────────────────────────────────┐ > │ Long Tuple (path data) │ > │ ┌─────────┬─────────┬─────────────────────────────────┐ │ > │ │ Header │ Graphid │ Edge Rowids (path to vertex) │ │ > │ └─────────┴─────────┴─────────────────────────────────┘ │ > ├─────────────────────────────────────────────────────────────┤ > │ Short Tuple (visited marker) │ > │ ┌─────────┬─────────┐ │ > │ │ Header │ Graphid │ <- No path data, just marker │ > │ └─────────┴─────────┘ │ > └─────────────────────────────────────────────────────────────┘ > > Insertion logic: > > 1. Inserting long tuple (with path): > - Check if short tuple (marker) exists for same Graphid > - If marker exists: reject insertion (already visited) > > 2. Inserting short tuple (marker): > - Insert marker first > - Remove all existing long tuples with same Graphid > - Vertex is now "locked" - no more paths to this vertex > > Example: > > Left Hop 1: Expand from A > HashTable = {B(long), C(long)} <- paths A->B, A->C > > Left and Right meet at C -> shortest path found > > Insert C(short) marker: > HashTable = {B(long), C(short)} <- C's long tuple removed > > Later, another path tries to reach C (e.g., D->C): > -> C(short) marker exists -> insertion rejected > -> Already processed at shortest hop > > Benefits: > - O(1) duplicate detection via hash lookup > - No separate "visited" set needed > - Automatic cleanup of superseded paths > - Memory efficient (markers are minimal size) > > > 3.6 Key Concept: Batch Synchronization > > When hash table exceeds memory, it splits into multiple batches. > Left and Right hash tables must have matching batch structure: > > Problem without synchronization: > > Left: nbatch=1 [all data in batch 0] > Right: nbatch=4 [batch0] [batch1] [batch2] [batch3] > > -> Cannot find intersection (different batch structures) > > Solution: > > When Right splits into 4 batches, clone structure to Left: > > Left: nbatch=4 [batch0] [batch1] [batch2] [batch3] > Right: nbatch=4 [batch0] [batch1] [batch2] [batch3] > | | | | > Join only matching batches > > This ensures correct intersection detection across batch boundaries. > > > 3.7 Execution Plan (Optimized) > > EXPLAIN MATCH p = shortestpath((a:person)-[:knows*]->(b:person)) > WHERE a.name='Alice' AND b.name='Bob' RETURN p > > Shortestpath (limit=1) > -> Hash2Side (left direction=FORWARD) > -> Index Scan on knows e > Index Cond: (start_id = $current_vertex) > -> Hash2Side (right direction=REVERSE) > -> Index Scan on knows e > Index Cond: (end_id = $current_vertex) > > > 4. Comparison Summary > --------------------- > > > ┌─────────────────────┬─────────────────────────┬─────────────────────────┐ > │ Aspect │ Unoptimized (CTE) │ Optimized (Executor) > │ > > ├─────────────────────┼─────────────────────────┼─────────────────────────┤ > │ Implementation │ Query rewriting │ Custom executor node > │ > > ├─────────────────────┼─────────────────────────┼─────────────────────────┤ > │ Search Direction │ Unidirectional │ Bidirectional > │ > > ├─────────────────────┼─────────────────────────┼─────────────────────────┤ > │ Search Space │ O(B^D) │ O(2 * B^(D/2)) > │ > > ├─────────────────────┼─────────────────────────┼─────────────────────────┤ > │ Early Termination │ Not possible │ Stop when path found > │ > > ├─────────────────────┼─────────────────────────┼─────────────────────────┤ > │ Data Structure │ WorkTable (tuplestore) │ Dual hash tables > │ > > ├─────────────────────┼─────────────────────────┼─────────────────────────┤ > │ Cycle Prevention │ Array membership check │ Hash table lookup O(1) > │ > > ├─────────────────────┼─────────────────────────┼─────────────────────────┤ > │ Code Complexity │ Low (rewriting only) │ High (new executor) > │ > > ├─────────────────────┼─────────────────────────┼─────────────────────────┤ > │ PostgreSQL Changes │ None │ New plan/executor node > │ > > └─────────────────────┴─────────────────────────┴─────────────────────────┘ > > > 5. Search Space Example > ----------------------- > > Graph: Social network, avg 100 connections per person > Query: shortestpath((Alice)-[*]->(Bob)), actual distance = 6 hops > > Unidirectional BFS (CTE): > Hop 1: 100 vertices > Hop 2: 10,000 vertices > Hop 3: 1,000,000 vertices > Hop 4: 100,000,000 vertices > Hop 5: 10,000,000,000 vertices > Hop 6: Find Bob > > Total vertices explored: ~10 billion > > Bidirectional BFS with smaller side expansion: > Hop 1: Left=1, Right=1 -> expand Left -> Left=100 > Hop 2: 1 < 100 -> expand Right -> Right=100 > Hop 3: 100 = 100 -> expand Left -> Left=10,000 > Hop 4: 100 < 10,000 -> expand Right -> Right=10,000 > Hop 5: 10,000 = 10,000 -> expand Left -> Left=1,000,000 > Hop 6: 10,000 < 1,000,000-> expand Right -> Meet! > > Total vertices explored: ~1 million (each side explores ~500K) > > Speedup: ~10,000x fewer vertices explored > > > 6. Key Techniques Summary > ------------------------- > > > ┌────┬───────────────────────────┬────────────────────────────────────────┐ > │ # │ Technique │ Benefit > │ > > ├────┼───────────────────────────┼────────────────────────────────────────┤ > │ 1 │ Bidirectional BFS │ Exponential reduction in search space > │ > │ 2 │ Dual Hash Tables │ O(1) visited check, O(1) intersection > │ > │ 3 │ Smaller Side Expansion │ Expand from side with fewer vertices > │ > │ 4 │ Early Termination │ Stop immediately when path found > │ > │ 5 │ Cycle Detection │ Hash lookup prevents revisiting vertex > │ > │ 6 │ Batch Synchronization │ Match nbatch between Left/Right hashes > │ > │ 7 │ Path Reconstruction │ Merge left and right paths at meeting > │ > │ 8 │ Parameterized Subplan │ Reuse edge scan plan per vertex > │ > │ 9 │ Batch Processing │ Handle memory overflow with batching > │ > │ 10 │ Limit Control │ shortestpath (1) vs allshortestpaths > │ > > └────┴───────────────────────────┴────────────────────────────────────────┘ > > > 7. Conclusion > ------------- > > The unoptimized CTE-based approach is simpler to implement but suffers from: > > - Unidirectional search with O(B^D) complexity > - No early termination capability > - Inefficient cycle prevention with arrays > - Must explore all paths before finding shortest > > The optimized executor-based approach requires more implementation > effort but provides: > > - Bidirectional BFS with O(B^(D/2)) complexity > - Early termination when shortest path found > - Efficient hash-based visited tracking > - Support for both shortestpath and allshortestpaths > > For production graph database systems, the bidirectional BFS approach > is essential for practical shortest path queries on large graphs. > > This document focuses on the conceptual design. Implementation details > such as hash table sizing, batch overflow handling, and path reconstruction > algorithms are left for follow-up discussion. >
