Hi Peter!

On Wed, 2025-09-24 at 12:46 +1000, Peter Smith wrote:
> On Sat, Sep 20, 2025 at 12:13 AM Timur Magomedov
> <[email protected]> wrote:
> > 
> > Hi Peter!
> > 
> > > What exactly did Valgrind report? For example, you said the
> > > VciScanState points beyond allocated memory. Do you have any more
> > > clues, like where that happened? Did you discover where that
> > > (smaller
> > > than it should be) memory was allocated in the first place?
> > 
> > Doing some experiments I've faced a segfault on a query joining
> > tables
> > filled with some amount of data. It was flaky so I used Valgrind.
> > There is a line in vci_scan.c, exec_custom_plan_enabling_vp():
> > if (!scanstate->first_fetch || (scanstate->pos.num_fetched_rows <=
> > scanstate->pos.current_row))
> > Valgrind reported that line as Invalid read of size 1, 4 and 4. So
> > all
> > three of the values checked in this line are read from some random
> > memory, possibly allocated and used by other objects already.
> > 
> > When the expression in exec_custom_plan_enabling_vp() randomly
> > evaluated to true, the following ExecClearTuple() dereferences NULL
> > in
> > slot->tts_ops.
> > The memory was originally allocated in nodeHashJoin.c, in hjstate =
> > makeNode(HashJoinState) line so it is really smaller than
> > VciScanState.
> > 
> > I did not use any table data for reproducer since asserts helps to
> > catch the original problem. I also simplified the original query
> > for a
> > reproducer.
> > 
> > > OK. I am not 100% certain about the asserts, but since the
> > > existing
> > > VCI tests are passing, I have merged your patch as-is into v24-
> > > 0002.
> > > I
> > > guess we will find out later if the bug below is due to an old
> > > code
> > > cast problem or a new code assert problem.
> > > 
> > 
> > Thanks for merging asserts. And looks like the problem is related
> > to
> > VCI join nodes.
> > There is no VCI hash join or VCI nested loop. There is a code in
> > VCI
> > planner that still puts VCI Sort or VCI Aggregate nodes on top of
> > regular join nodes which makes no sense to me. This is the cause of
> > the
> > problem. VCI Sort and VCI Aggregate then convert outer nodes to VCI
> > Scan since they know there can't be anything another. This can be
> > fixed
> > either by implementing VCI joins either by disabling them in a
> > deeper
> > way. Since we already have developer GUCs for them I would rather
> > set
> > them to disabled by default instead of removing all useful VCI
> > joins
> > related code.
> > 
> > Made a patch with a test and a simplest fix (disabling joins in
> > GUCs).
> > 
> 
> Hi Timur,
> 
> Thanks for the patch! Unfortunately, this is straying into areas with
> which I'm not familiar, so I'm taking it on faith that these are good
> changes. For now, I'm happy to merge your patch into the next VCI
> version, posted unless someone else objects.
> 
> ~
> 
> But, I still have a couple of questions for clarification:
> 
> 1. What about the original Valgrind issue?
> 
> Is that still a problem that needs to be addressed? E.g., is the bad
> allocation still lurking, and your sort avoidance patch is simply
> preventing the bad allocation from being exposed until some next
> thing
> randomly fails? Or is there no allocation problem anymore to worry
> about?

Allocations are fine, the problem was using some nodes as nodes of
another type (and bigger size) which leads to crossing boundary of
allocated memory. We are safe now and asserts guard us from repeating
the original bug.


> 2. What about your added Assert that was previously failing at
> executor/vci_sort.c:89?
> 
> That Assert is still present in vci_sort.c, but AFAICT the current
> tests are not executing that code. Do those patched GUC changes
> simply
> make that code unreachable now? In other words, should that
> previously
> failing Assert be left where it is or not? Should there be another
> test case added to execute this Assert?

Added simple test for running VCI sort node, it executes the assertion
code in vci_sort.c. No assertion fails, so VCI Sort itself is OK. Here
are both two commits on top of v25 version.


-- 
Regards,
Timur Magomedov

From de880dbdc7c7ce2559275b3a7628281633ffdc76 Mon Sep 17 00:00:00 2001
From: Timur Magomedov <[email protected]>
Date: Fri, 19 Sep 2025 16:26:40 +0300
Subject: [PATCH v25 1/2] Avoid VCI sort after non-VCI join in planner

---
 contrib/vci/expected/bugs.out | 38 +++++++++++++++++++++++++++++++++++
 contrib/vci/sql/bugs.sql      | 20 ++++++++++++++++++
 contrib/vci/vci_read_guc.c    |  4 ++--
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/contrib/vci/expected/bugs.out b/contrib/vci/expected/bugs.out
index d7e08dc6f7a..5872cfe1e02 100644
--- a/contrib/vci/expected/bugs.out
+++ b/contrib/vci/expected/bugs.out
@@ -94,3 +94,41 @@ EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE) SELEC
 (4 rows)
 
 DROP TABLE t6;
+CREATE TABLE main (id BIGSERIAL PRIMARY KEY);
+CREATE TABLE secondary (id BIGSERIAL PRIMARY KEY, main_id BIGINT REFERENCES main (id), val INTEGER);
+CREATE INDEX main_vci ON main USING vci (id);
+CREATE INDEX sec_vci ON secondary USING vci (id, main_id, val);
+EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
+SELECT *
+  FROM main m
+  JOIN secondary s
+	ON m.id = s.main_id
+ WHERE s.val in (
+		SELECT MAX(val)
+		  FROM secondary s2
+		 WHERE s2.main_id = m.id)
+ ORDER BY s.val;
+                                            QUERY PLAN                                            
+--------------------------------------------------------------------------------------------------
+ Sort (actual rows=0.00 loops=1)
+   Sort Key: s.val
+   Sort Method: quicksort  Memory: 25kB
+   ->  Nested Loop (actual rows=0.00 loops=1)
+         Join Filter: (s.val = (max(s2.val)))
+         ->  Hash Join (actual rows=0.00 loops=1)
+               Hash Cond: (s.main_id = m.id)
+               ->  Custom Scan (VCI Scan) using sec_vci on secondary s (actual rows=0.00 loops=1)
+                     Disabled: true
+               ->  Hash (never executed)
+                     ->  Index Only Scan using main_pkey on main m (never executed)
+                           Heap Fetches: 0
+                           Index Searches: 0
+         ->  Custom Scan (VCI Aggregate) (never executed)
+               Disabled: true
+               ->  Custom Scan (VCI Scan) using sec_vci on secondary s2 (never executed)
+                     Disabled: true
+                     Filter: (main_id = m.id)
+(18 rows)
+
+DROP TABLE secondary;
+DROP TABLE main;
diff --git a/contrib/vci/sql/bugs.sql b/contrib/vci/sql/bugs.sql
index 3b6b7303acb..49597abcf99 100644
--- a/contrib/vci/sql/bugs.sql
+++ b/contrib/vci/sql/bugs.sql
@@ -57,3 +57,23 @@ INSERT INTO t6 SELECT id, 'info' || id FROM generate_series(1, 500) id;
 ANALYZE t6;
 EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE) SELECT max(id) FROM t6;
 DROP TABLE t6;
+
+CREATE TABLE main (id BIGSERIAL PRIMARY KEY);
+CREATE TABLE secondary (id BIGSERIAL PRIMARY KEY, main_id BIGINT REFERENCES main (id), val INTEGER);
+
+CREATE INDEX main_vci ON main USING vci (id);
+CREATE INDEX sec_vci ON secondary USING vci (id, main_id, val);
+
+EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
+SELECT *
+  FROM main m
+  JOIN secondary s
+	ON m.id = s.main_id
+ WHERE s.val in (
+		SELECT MAX(val)
+		  FROM secondary s2
+		 WHERE s2.main_id = m.id)
+ ORDER BY s.val;
+
+DROP TABLE secondary;
+DROP TABLE main;
diff --git a/contrib/vci/vci_read_guc.c b/contrib/vci/vci_read_guc.c
index 3583944390f..6b95108824d 100644
--- a/contrib/vci/vci_read_guc.c
+++ b/contrib/vci/vci_read_guc.c
@@ -150,7 +150,7 @@ static struct config_bool VciConfigureNamesBool[] =
 			NULL,
 		},
 		&VciGuc.enable_hashjoin,
-		true,
+		false,
 		NULL, NULL, NULL,
 	},
 
@@ -162,7 +162,7 @@ static struct config_bool VciConfigureNamesBool[] =
 			NULL,
 		},
 		&VciGuc.enable_nestloop,
-		true,
+		false,
 		NULL, NULL, NULL,
 	},
 
-- 
2.43.0

From 09f34919697ab550ad6305566dbff396db4747e4 Mon Sep 17 00:00:00 2001
From: Timur Magomedov <[email protected]>
Date: Wed, 24 Sep 2025 16:29:55 +0300
Subject: [PATCH v25 2/2] Check VCI Sort node runs

---
 contrib/vci/expected/bugs.out | 14 ++++++++++++++
 contrib/vci/sql/bugs.sql      |  6 ++++++
 2 files changed, 20 insertions(+)

diff --git a/contrib/vci/expected/bugs.out b/contrib/vci/expected/bugs.out
index 5872cfe1e02..70e0534dd7c 100644
--- a/contrib/vci/expected/bugs.out
+++ b/contrib/vci/expected/bugs.out
@@ -98,6 +98,8 @@ CREATE TABLE main (id BIGSERIAL PRIMARY KEY);
 CREATE TABLE secondary (id BIGSERIAL PRIMARY KEY, main_id BIGINT REFERENCES main (id), val INTEGER);
 CREATE INDEX main_vci ON main USING vci (id);
 CREATE INDEX sec_vci ON secondary USING vci (id, main_id, val);
+-- Check VCI Sort is not put on top of non-VCI join
+-- See https://www.postgresql.org/message-id/a27f68845af78d404459fcab940bfae2ec7755e5.camel%40postgrespro.ru
 EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
 SELECT *
   FROM main m
@@ -130,5 +132,17 @@ SELECT *
                      Filter: (main_id = m.id)
 (18 rows)
 
+-- Check VCI Sort is used if suitable
+EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
+SELECT * FROM secondary s ORDER BY s.val;
+                                      QUERY PLAN                                      
+--------------------------------------------------------------------------------------
+ Custom Scan (VCI Sort) (actual rows=0.00 loops=1)
+   Sort Key: val
+   Sort Method: quicksort  Memory: 25kB
+   ->  Custom Scan (VCI Scan) using sec_vci on secondary s (actual rows=0.00 loops=1)
+         Disabled: true
+(5 rows)
+
 DROP TABLE secondary;
 DROP TABLE main;
diff --git a/contrib/vci/sql/bugs.sql b/contrib/vci/sql/bugs.sql
index 49597abcf99..d4affcd479f 100644
--- a/contrib/vci/sql/bugs.sql
+++ b/contrib/vci/sql/bugs.sql
@@ -64,6 +64,8 @@ CREATE TABLE secondary (id BIGSERIAL PRIMARY KEY, main_id BIGINT REFERENCES main
 CREATE INDEX main_vci ON main USING vci (id);
 CREATE INDEX sec_vci ON secondary USING vci (id, main_id, val);
 
+-- Check VCI Sort is not put on top of non-VCI join
+-- See https://www.postgresql.org/message-id/a27f68845af78d404459fcab940bfae2ec7755e5.camel%40postgrespro.ru
 EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
 SELECT *
   FROM main m
@@ -75,5 +77,9 @@ SELECT *
 		 WHERE s2.main_id = m.id)
  ORDER BY s.val;
 
+-- Check VCI Sort is used if suitable
+EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)
+SELECT * FROM secondary s ORDER BY s.val;
+
 DROP TABLE secondary;
 DROP TABLE main;
-- 
2.43.0

Reply via email to