Hi I was made aware of a crash in pg_stat_statements when processing IN clauses that contain both constants and variable expressions.
The original case was more complex, but I constructed a minimal repro case:
```
CREATE TABLE test_squash (id int, data int);
SELECT * FROM test_squash a, test_squash b WHERE a.id IN (1, 2, 3,
b.id, b.id + 1);
```
With asserts enabled:
```
TRAP: failed Assert("len_to_wrt >= 0"), File:
"../contrib/pg_stat_statements/pg_stat_statements.c", Line: 2906, PID:
299892
postgres: ubuntu regression_pg_stat_statements [local]
SELECT(ExceptionalCondition+0x74)[0x5bf8a012dc44]
```
The crash occurs during query normalization in pg_stat_statements and
started with
0f65f3eec4. The execution plan shows what's happening:
```
postgres=# EXPLAIN SELECT * FROM test_squash a, test_squash b WHERE
a.id IN (1, 2, 3, b.id, b.id + 1);
QUERY PLAN
----------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..121376.35 rows=126798 width=16)
Join Filter: ((a.id = ANY ('{1,2,3}'::integer[])) OR (a.id = b.id)
OR (a.id = (b.id + 1)))
-> Seq Scan on test_squash a (cost=0.00..32.60 rows=2260 width=8)
-> Materialize (cost=0.00..43.90 rows=2260 width=8)
-> Seq Scan on test_squash b (cost=0.00..32.60 rows=2260 width=8)
(5 rows)
```
When IN clauses contain both constants and variable expressions, the
optimizer transforms them into separate structures: constants become
an array expression while variables become individual OR conditions.
This transformation can create overlapping token locations that cause
pg_stat_statements query normalization to crash.
To fix, disable squashing for mixed IN expressions by detecting when
both variables and constants are present during parse transformation,
and setting the array's list_start/list_end to -1 to prevent squashing.
I also thought about fixing this in `generate_normalized_query` where
instead of the assert:
```
len_to_wrt = off - last_off;
len_to_wrt -= last_tok_len;
Assert(len_to_wrt >= 0);
memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
```
do something like this
```
len_to_wrt = off - last_off;
len_to_wrt -= last_tok_len;
if (len_to_wrt < 0)
continue;
memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
```
to find an overlapping location. But this would normalize the statement to:
```
SELECT * FROM test_squash a, test_squash b WHERE a.id IN ($1 /*, ... */);
```
Which is not correct as it will squash the variables and potentially
introduce some
silent regressions in normalizations.
So, It is better to skip squashing altogether in this case. Attached is a patch.
If there is agreement to this fix, I think it should be backpacthed to
all versions that
support squashing of IN-lists.
Thoughts?
--
Sami Imseih
Amazon Web Services (AWS)
v1-0001-pg_stat_statements-Fix-crash-in-with-mixed-IN-cla.patch
Description: Binary data
