Has there been any progress on this?

Shortest path queries seem to be something that would greatly benefit
from having UNION DISTINCT ON ()

---
Hannu

On Mon, Jan 5, 2026 at 2:58 PM Henson Choi <[email protected]> wrote:
>
> Hi hackers,
>
> I think it may be a bit early for this, but I wanted to share this design
> document now so we can revisit and discuss when the time is right.
>
> =====================================================
> Shortestpath Implementation Strategies for PostgreSQL
> =====================================================
>
> 1. Overview
> -----------
>
> This document describes two implementation strategies for shortest path
> queries in PostgreSQL-based graph databases.
>
> Shortestpath Query Example:
>
>     MATCH p = shortestpath((a)-[*]->(b)) RETURN p
>     MATCH p = allshortestpaths((a)-[:knows*]->(b)) RETURN p
>
> Two implementation approaches:
>
>     1. Unoptimized: Query rewriting to WITH RECURSIVE (CTE)
>     2. Optimized:   Custom executor node with bidirectional BFS
>
>
> 2. Unoptimized Approach: WITH RECURSIVE Rewriting
> -------------------------------------------------
>
> 2.1 Query Transformation
>
> Cypher query:
>
>     MATCH p = shortestpath((a:person)-[:knows*]->(b:person))
>     WHERE a.name = 'Alice' AND b.name = 'Bob'
>     RETURN p
>
> Rewritten SQL using WITH RECURSIVE:
>
>     WITH RECURSIVE sp_paths AS (
>         -- Base case: start vertex
>         SELECT v.id as start_vid,
>                v.id as end_vid,
>                ARRAY[v.id] as vertex_ids,
>                ARRAY[]::bigint[] as edge_ids,
>                0 as depth
>         FROM person v
>         WHERE name = 'Alice'
>
>         UNION ALL
>
>         -- Recursive case: expand one hop
>         SELECT p.start_vid,
>                e.end_id,
>                p.vertex_ids || e.end_id,
>                p.edge_ids || e.id,
>                p.depth + 1
>         FROM sp_paths p
>         JOIN knows e ON p.end_vid = e.start_id
>         WHERE e.end_id <> ALL(p.vertex_ids)  -- Cycle prevention
>     )
>     SELECT vertex_ids, edge_ids, depth
>     FROM sp_paths
>     WHERE end_vid = (SELECT id FROM person WHERE name = 'Bob')
>     ORDER BY depth
>     LIMIT 1;
>
> 2.2 Execution Plan
>
>     Limit
>       ->  Sort
>             Sort Key: depth
>             ->  CTE Scan on sp_paths
>                   Filter: (end_vid = $bob_id)
>                   CTE sp_paths
>                     Recursive Union
>                       Seq Scan on person v
>                         Filter: (name = 'Alice')
>                       Hash Join
>                         Hash Cond: (p.end_vid = e.start_id)
>                         Filter: (e.end_id <> ALL(p.vertex_ids))
>                         WorkTable Scan on sp_paths p
>                         Hash
>                           Seq Scan on knows e
>
> 2.3 Problems
>
>     ┌─────────────────────┬──────────────────────────────────────────┐
>     │ Problem             │ Description                              │
>     ├─────────────────────┼──────────────────────────────────────────┤
>     │ Unidirectional      │ Search expands only from start vertex    │
>     │                     │ Misses optimization from end vertex      │
>     ├─────────────────────┼──────────────────────────────────────────┤
>     │ Full Exploration    │ Must explore all paths up to min depth   │
>     │                     │ before finding shortest                  │
>     ├─────────────────────┼──────────────────────────────────────────┤
>     │ No Early Termination│ Cannot stop when shortest found          │
>     │                     │ CTE runs to completion                   │
>     ├─────────────────────┼──────────────────────────────────────────┤
>     │ Memory Usage        │ O(V) visited vertices per iteration      │
>     │                     │ All partial paths stored                 │
>     ├─────────────────────┼──────────────────────────────────────────┤
>     │ Cycle Prevention    │ <> ALL(array) grows with path length     │
>     │                     │ O(path_length) per edge check            │
>     └─────────────────────┴──────────────────────────────────────────┘
>
> 2.4 Search Space Comparison
>
>     Unidirectional BFS (CTE):
>
>         Start ──────────────────────────────────────> End
>               ╲                                      ╱
>                ╲    Explores entire cone           ╱
>                 ╲      from start                 ╱
>                  ╲                              ╱
>                   ╲   Search space: O(B^D)    ╱
>                    ╲                        ╱
>                     ────────────────────────
>
>     Bidirectional BFS (Optimized):
>
>         Start ────────> <──────── End
>               ╲        ╱╲       ╱
>                ╲      ╱  ╲     ╱
>                 ╲    ╱    ╲   ╱
>                  ╲  ╱ Meet ╲ ╱
>                   ╲╱ Point  ╲
>                    Search space: O(2 * B^(D/2))
>
>
> 3. Optimized Approach: Custom Executor Node
> -------------------------------------------
>
> 3.1 Design Principle
>
> Instead of query rewriting, implement a dedicated executor node that:
>
>     - Performs bidirectional BFS from both start and end vertices
>     - Uses hash tables to track visited vertices from each direction
>     - Detects meeting point where both searches intersect
>     - Terminates early when shortest path is found
>
> 3.2 Plan Node Structure
>
>     ┌────────────────────────────────────────────────────────────┐
>     │                    ShortestpathPlan                        │
>     │  - Bidirectional BFS with dual hash tables                 │
>     │  - Early termination on path found                         │
>     └────────────────────────────────────────────────────────────┘
>                     │                       │
>               left  │                       │ right
>                     ▼                       ▼
>     ┌───────────────────────────┐   ┌───────────────────────────┐
>     │     Hash2Side (Left)      │   │     Hash2Side (Right)     │
>     │  - KeyTable + HashTable   │   │  - KeyTable + HashTable   │
>     │  - Forward direction      │   │  - Reverse direction      │
>     └───────────────────────────┘   └───────────────────────────┘
>                     │                       │
>                     ▼                       ▼
>     ┌───────────────────────────┐   ┌───────────────────────────┐
>     │   SubPlan (Forward)       │   │   SubPlan (Reverse)       │
>     │  Index Cond:              │   │  Index Cond:              │
>     │  start_id = $current      │   │  end_id = $current        │
>     └───────────────────────────┘   └───────────────────────────┘
>
>     Execution Flow:
>
>     Initialize:
>         Left HashTable  = {start_vertex}
>         Right HashTable = {end_vertex}
>            │
>            ▼
>     Expansion loop (smaller side first):
>            │
>            ├──→ Compare |Left frontier| vs |Right frontier|
>            │         │
>            │         └──→ Expand smaller side
>            │                   │
>            │                   └──→ Check intersection with other side
>            │
>            ├──→ Repeat until intersection found
>            │
>            └──→ Return shortest path(s)
>
>
> 3.3 Key Concept: Dual Hash Tables
>
>     The executor maintains two hash tables for bidirectional search:
>
>     ┌─────────────────────────────────────────────────────────────┐
>     │  Left Direction (Start → End)                               │
>     │  ┌─────────────────────────────────────────────────────┐   │
>     │  │ KeyTable:   Previous hop vertices (expansion base)  │   │
>     │  │ HashTable:  Current hop results (newly discovered)  │   │
>     │  └─────────────────────────────────────────────────────┘   │
>     ├─────────────────────────────────────────────────────────────┤
>     │  Right Direction (End → Start)                              │
>     │  ┌─────────────────────────────────────────────────────┐   │
>     │  │ KeyTable:   Previous hop vertices (expansion base)  │   │
>     │  │ HashTable:  Current hop results (newly discovered)  │   │
>     │  └─────────────────────────────────────────────────────┘   │
>     └─────────────────────────────────────────────────────────────┘
>
>     After each hop:
>         - Swap KeyTable and HashTable (new base for next hop)
>         - Check for intersection between Left and Right HashTables
>         - If intersection found, reconstruct path through meeting point
>
> 3.4 Key Concept: Smaller Side Expansion
>
>     At each hop, expand from the direction with fewer frontier vertices.
>     This minimizes the total number of edges scanned.
>
>     Example (highly unbalanced tree):
>
>         Graph structure:
>             - Forward (Left->Right):  branching factor = 100
>             - Backward (Right->Left): branching factor = 2
>             - Path distance: 6 hops
>
>         Simple alternating:
>
>             Hop 1: expand Left  (1)      -> Left=100,       Right=1
>             Hop 2: expand Right (1)      -> Left=100,       Right=2
>             Hop 3: expand Left  (100)    -> Left=10,000,    Right=2
>             Hop 4: expand Right (2)      -> Left=10,000,    Right=4
>             Hop 5: expand Left  (10,000) -> Left=1,000,000, Right=4
>             Hop 6: expand Right (4)      -> meet!
>
>             Total vertices expanded: 1 + 1 + 100 + 2 + 10,000 + 4 = 10,108
>
>         With smaller side expansion:
>
>             Hop 1: Left=1, Right=1 -> expand Left  -> Left=100, Right=1
>             Hop 2: 1 < 100         -> expand Right -> Left=100, Right=2
>             Hop 3: 2 < 100         -> expand Right -> Left=100, Right=4
>             Hop 4: 4 < 100         -> expand Right -> Left=100, Right=8
>             Hop 5: 8 < 100         -> expand Right -> Left=100, Right=16
>             Hop 6: 16 < 100        -> expand Right -> meet!
>
>             Total vertices expanded: 1 + 1 + 2 + 4 + 8 + 16 = 32
>
>         Speedup: 10,108 / 32 = ~300x fewer vertices explored
>
>
> 3.5 Key Concept: Cycle Detection via Hash Lookup
>
>     Hash table insertion automatically prevents revisiting vertices.
>
>     Two tuple types in HashTable:
>
>         ┌─────────────────────────────────────────────────────────────┐
>         │  Long Tuple (path data)                                     │
>         │  ┌─────────┬─────────┬─────────────────────────────────┐   │
>         │  │ Header  │ Graphid │ Edge Rowids (path to vertex)    │   │
>         │  └─────────┴─────────┴─────────────────────────────────┘   │
>         ├─────────────────────────────────────────────────────────────┤
>         │  Short Tuple (visited marker)                               │
>         │  ┌─────────┬─────────┐                                     │
>         │  │ Header  │ Graphid │  <- No path data, just marker       │
>         │  └─────────┴─────────┘                                     │
>         └─────────────────────────────────────────────────────────────┘
>
>     Insertion logic:
>
>         1. Inserting long tuple (with path):
>            - Check if short tuple (marker) exists for same Graphid
>            - If marker exists: reject insertion (already visited)
>
>         2. Inserting short tuple (marker):
>            - Insert marker first
>            - Remove all existing long tuples with same Graphid
>            - Vertex is now "locked" - no more paths to this vertex
>
>     Example:
>
>         Left Hop 1: Expand from A
>             HashTable = {B(long), C(long)}  <- paths A->B, A->C
>
>         Left and Right meet at C -> shortest path found
>
>         Insert C(short) marker:
>             HashTable = {B(long), C(short)} <- C's long tuple removed
>
>         Later, another path tries to reach C (e.g., D->C):
>             -> C(short) marker exists -> insertion rejected
>             -> Already processed at shortest hop
>
>     Benefits:
>         - O(1) duplicate detection via hash lookup
>         - No separate "visited" set needed
>         - Automatic cleanup of superseded paths
>         - Memory efficient (markers are minimal size)
>
>
> 3.6 Key Concept: Batch Synchronization
>
>     When hash table exceeds memory, it splits into multiple batches.
>     Left and Right hash tables must have matching batch structure:
>
>         Problem without synchronization:
>
>             Left:   nbatch=1  [all data in batch 0]
>             Right:  nbatch=4  [batch0] [batch1] [batch2] [batch3]
>
>             -> Cannot find intersection (different batch structures)
>
>         Solution:
>
>             When Right splits into 4 batches, clone structure to Left:
>
>             Left:   nbatch=4  [batch0] [batch1] [batch2] [batch3]
>             Right:  nbatch=4  [batch0] [batch1] [batch2] [batch3]
>                                  |        |        |        |
>                               Join only matching batches
>
>     This ensures correct intersection detection across batch boundaries.
>
>
> 3.7 Execution Plan (Optimized)
>
>     EXPLAIN MATCH p = shortestpath((a:person)-[:knows*]->(b:person))
>             WHERE a.name='Alice' AND b.name='Bob' RETURN p
>
>     Shortestpath  (limit=1)
>       ->  Hash2Side  (left direction=FORWARD)
>             ->  Index Scan on knows e
>                   Index Cond: (start_id = $current_vertex)
>       ->  Hash2Side  (right direction=REVERSE)
>             ->  Index Scan on knows e
>                   Index Cond: (end_id = $current_vertex)
>
>
> 4. Comparison Summary
> ---------------------
>
>     
> ┌─────────────────────┬─────────────────────────┬─────────────────────────┐
>     │ Aspect              │ Unoptimized (CTE)       │ Optimized (Executor)    
> │
>     
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Implementation      │ Query rewriting         │ Custom executor node    
> │
>     
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Search Direction    │ Unidirectional          │ Bidirectional           
> │
>     
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Search Space        │ O(B^D)                  │ O(2 * B^(D/2))          
> │
>     
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Early Termination   │ Not possible            │ Stop when path found    
> │
>     
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Data Structure      │ WorkTable (tuplestore)  │ Dual hash tables        
> │
>     
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Cycle Prevention    │ Array membership check  │ Hash table lookup O(1)  
> │
>     
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ Code Complexity     │ Low (rewriting only)    │ High (new executor)     
> │
>     
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
>     │ PostgreSQL Changes  │ None                    │ New plan/executor node  
> │
>     
> └─────────────────────┴─────────────────────────┴─────────────────────────┘
>
>
> 5. Search Space Example
> -----------------------
>
>     Graph: Social network, avg 100 connections per person
>     Query: shortestpath((Alice)-[*]->(Bob)), actual distance = 6 hops
>
>     Unidirectional BFS (CTE):
>         Hop 1: 100 vertices
>         Hop 2: 10,000 vertices
>         Hop 3: 1,000,000 vertices
>         Hop 4: 100,000,000 vertices
>         Hop 5: 10,000,000,000 vertices
>         Hop 6: Find Bob
>
>         Total vertices explored: ~10 billion
>
>     Bidirectional BFS with smaller side expansion:
>         Hop 1: Left=1, Right=1   -> expand Left  -> Left=100
>         Hop 2: 1 < 100           -> expand Right -> Right=100
>         Hop 3: 100 = 100         -> expand Left  -> Left=10,000
>         Hop 4: 100 < 10,000      -> expand Right -> Right=10,000
>         Hop 5: 10,000 = 10,000   -> expand Left  -> Left=1,000,000
>         Hop 6: 10,000 < 1,000,000-> expand Right -> Meet!
>
>         Total vertices explored: ~1 million (each side explores ~500K)
>
>     Speedup: ~10,000x fewer vertices explored
>
>
> 6. Key Techniques Summary
> -------------------------
>
>     
> ┌────┬───────────────────────────┬────────────────────────────────────────┐
>     │ #  │ Technique                 │ Benefit                                
> │
>     
> ├────┼───────────────────────────┼────────────────────────────────────────┤
>     │ 1  │ Bidirectional BFS         │ Exponential reduction in search space  
> │
>     │ 2  │ Dual Hash Tables          │ O(1) visited check, O(1) intersection  
> │
>     │ 3  │ Smaller Side Expansion    │ Expand from side with fewer vertices   
> │
>     │ 4  │ Early Termination         │ Stop immediately when path found       
> │
>     │ 5  │ Cycle Detection           │ Hash lookup prevents revisiting vertex 
> │
>     │ 6  │ Batch Synchronization     │ Match nbatch between Left/Right hashes 
> │
>     │ 7  │ Path Reconstruction       │ Merge left and right paths at meeting  
> │
>     │ 8  │ Parameterized Subplan     │ Reuse edge scan plan per vertex        
> │
>     │ 9  │ Batch Processing          │ Handle memory overflow with batching   
> │
>     │ 10 │ Limit Control             │ shortestpath (1) vs allshortestpaths   
> │
>     
> └────┴───────────────────────────┴────────────────────────────────────────┘
>
>
> 7. Conclusion
> -------------
>
> The unoptimized CTE-based approach is simpler to implement but suffers from:
>
>     - Unidirectional search with O(B^D) complexity
>     - No early termination capability
>     - Inefficient cycle prevention with arrays
>     - Must explore all paths before finding shortest
>
> The optimized executor-based approach requires more implementation
> effort but provides:
>
>     - Bidirectional BFS with O(B^(D/2)) complexity
>     - Early termination when shortest path found
>     - Efficient hash-based visited tracking
>     - Support for both shortestpath and allshortestpaths
>
> For production graph database systems, the bidirectional BFS approach
> is essential for practical shortest path queries on large graphs.
>
> This document focuses on the conceptual design. Implementation details
> such as hash table sizing, batch overflow handling, and path reconstruction
> algorithms are left for follow-up discussion.
>

Reply via email to