Hello everyone!

Recently, we upgraded our AWS RDS instance from Postgres 12.14 to 15.4 and noticed extremely high disk read consumption when executing the following query:

select (exists (select 1 as "one" from "public"."indexed_commit" where "public"."indexed_commit"."repo_id" in (964992, 964994, 964999, ...)));

For some reason, the query planner starts using a Seq Scan instead of the index on the "repo_id" column when the query is executed by a user restricted by RLS. On production this happens when there are more than 316 IDs in the IN list; on stage, when there are more than 3. If we execute the query as a superuser, the planner always uses the "repo_id" index.
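
For clarity, this is roughly how we run the query in both cases; the role name and tenant UUID below are placeholders, not our real values:

-- RLS-limited user: the tenant_id policy condition is added to the query
SET ROLE app_user;  -- placeholder role name
SET app.current_tenant_id = '00000000-0000-0000-0000-000000000000';  -- placeholder tenant
EXPLAIN (ANALYZE, BUFFERS)
SELECT (EXISTS (SELECT 1 FROM public.indexed_commit
                WHERE repo_id IN (964992, 964994, 964999 /* , ... */)));

-- Superuser: RLS does not apply, so the planner only sees the repo_id condition
RESET ROLE;
EXPLAIN (ANALYZE, BUFFERS)
SELECT (EXISTS (SELECT 1 FROM public.indexed_commit
                WHERE repo_id IN (964992, 964994, 964999 /* , ... */)));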

Luckily, we can easily reproduce this on our stage database (which is smaller). If we add a multicolumn "repo_id, tenant_id" index, the planner uses it (Index Only Scan) under RLS with any number of IDs in the IN list.
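
The workaround index we tried on stage looks roughly like this (the index name is just illustrative):

CREATE INDEX indexed_commit_repo_id_tenant_id_idx
    ON public.indexed_commit (repo_id, tenant_id);

With repo_id and tenant_id both in the index, the tenant_id filter added by the RLS policy can be evaluated from the index alone, which is presumably why the planner is willing to use it in that case.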

Could you please clarify whether this is a Postgres bug? Should we include the "tenant_id" column in all our indexes to make them usable under RLS?


     Postgres version / Operating system+version


PostgreSQL 15.4 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), 64-bit


     Full Table and Index Schema

\d indexed_commit
                        Table "public.indexed_commit"
    Column     |            Type             | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+---------
 id            | bigint                      |           | not null |
 commit_hash   | character varying(40)       |           | not null |
 parent_hash   | text                        |           |          |
 created_ts    | timestamp without time zone |           | not null |
 repo_id       | bigint                      |           | not null |
 lines_added   | bigint                      |           |          |
 lines_removed | bigint                      |           |          |
 tenant_id     | uuid                        |           | not null |
 author_id     | uuid                        |           | not null |
Indexes:
    "indexed-commit-repo-idx" btree (repo_id)
    "indexed_commit_commit_hash_repo_id_key" UNIQUE CONSTRAINT, btree (commit_hash, repo_id) REPLICA IDENTITY     "indexed_commit_repo_id_without_loc_idx" btree (repo_id) WHERE lines_added IS NULL OR lines_removed IS NULL
Policies:
    POLICY "commit_isolation_policy"
      USING ((tenant_id = (current_setting('app.current_tenant_id'::text))::uuid))
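
For completeness, the policy above corresponds to DDL along these lines (a sketch; the exact statements in our migrations may differ slightly):

ALTER TABLE public.indexed_commit ENABLE ROW LEVEL SECURITY;

CREATE POLICY commit_isolation_policy ON public.indexed_commit
    USING (tenant_id = current_setting('app.current_tenant_id')::uuid);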


     Table Metadata

SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid)
FROM pg_class WHERE relname = 'indexed_commit';

    relname     | relpages |  reltuples   | relallvisible | relkind | relnatts | relhassubclass |                                                                  reloptions                                                                  | pg_table_size
----------------+----------+--------------+---------------+---------+----------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------+---------------
 indexed_commit | 18170522 | 7.451964e+08 |      18104744 | r       |        9 | f              | {autovacuum_vacuum_scale_factor=0,autovacuum_analyze_scale_factor=0,autovacuum_vacuum_threshold=200000,autovacuum_analyze_threshold=100000}  |  148903337984


     EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN

Production queries:

316 IDs under an RLS-limited user: <https://explain.depesz.com/s/X7Iq>

392 IDs under an RLS-limited user: <https://explain.depesz.com/s/lbkX>

392 IDs under a superuser: <https://explain.depesz.com/s/uKSG>


     History

The query became slow after the upgrade to 15.4; we never had any issues with it before.


     Hardware

AWS DB class db.t4g.large + GP3 400GB disk


     Maintenance Setup

Are you running autovacuum? Yes

If so, with what settings?

autovacuum_vacuum_scale_factor=0,autovacuum_analyze_scale_factor=0,autovacuum_vacuum_threshold=200000,autovacuum_analyze_threshold=100000
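
These are per-table storage parameters; they were applied roughly like this (sketch):

ALTER TABLE public.indexed_commit SET (
    autovacuum_vacuum_scale_factor  = 0,
    autovacuum_analyze_scale_factor = 0,
    autovacuum_vacuum_threshold     = 200000,
    autovacuum_analyze_threshold    = 100000
);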

SELECT * FROM pg_stat_user_tables WHERE relname='indexed_commit';
 relid | schemaname |    relname     | seq_scan | seq_tup_read | idx_scan  | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | n_ins_since_vacuum | last_vacuum |        last_autovacuum        | last_analyze |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
-------+------------+----------------+----------+--------------+-----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+--------------------+-------------+-------------------------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
 24662 | public     | indexed_commit |     2485 |  49215378424 | 374533865 |    4050928807 | 764089750 |   2191615 |  18500311 |             0 |  745241398 |        383 |               46018 |              45343 |             | 2023-10-11 23:51:29.170378+00 |              | 2023-10-11 23:50:18.922351+00 |            0 |              672 |             0 |               753


     WAL Configuration

For data writing queries: have you moved the WAL to a different disk? Changed the settings? No.


     GUC Settings

What database configuration settings have you changed? We use the default settings; we have not changed anything ourselves.

What are their values?

SELECT * FROM pg_settings WHERE name IN ('effective_cache_size', 'shared_buffers', 'work_mem');

         name         | setting | unit |               category                |                               short_desc                               |                                                                               extra_desc                                                                               |  context   | vartype |       source       | min_val |  max_val   | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart
----------------------+---------+------+---------------------------------------+-------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+---------+--------------------+---------+------------+----------+----------+-----------+------------+------------+-----------------
 effective_cache_size | 494234  | 8kB  | Query Tuning / Planner Cost Constants | Sets the planner's assumption about the total size of the data caches. | That is, the total size of the caches (kernel cache and shared buffers) used for PostgreSQL data files. This is measured in disk pages, which are normally 8 kB each. | user       | integer | configuration file | 1       | 2147483647 |          | 524288   | 494234    |            |            | f
 shared_buffers       | 247117  | 8kB  | Resource Usage / Memory               | Sets the number of shared memory buffers used by the server.           |                                                                                                                                                                        | postmaster | integer | configuration file | 16      | 1073741823 |          | 16384    | 247117    |            |            | f
 work_mem             | 4096    | kB   | Resource Usage / Memory               | Sets the maximum memory to be used for query workspaces.               | This much memory can be used by each internal sort operation and hash table before switching to temporary disk files.                                                 | user       | integer | default            | 64      | 2147483647 |          | 4096     | 4096      |            |            | f


     Statistics: n_distinct, MCV, histogram

Useful to check statistics leading to bad join plan:

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV,
       tablename, attname, inherited, null_frac, n_distinct,
       array_length(most_common_vals, 1) n_mcv,
       array_length(histogram_bounds, 1) n_hist,
       correlation
FROM pg_stats
WHERE attname = '...' AND tablename = '...'
ORDER BY 1 DESC;

Returns 0 rows.
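
Filled in for the columns involved in this query (repo_id from the IN list and tenant_id from the RLS policy), the statistics query would look like this:

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV,
       tablename, attname, inherited, null_frac, n_distinct,
       array_length(most_common_vals, 1) n_mcv,
       array_length(histogram_bounds, 1) n_hist,
       correlation
FROM pg_stats
WHERE tablename = 'indexed_commit'
  AND attname IN ('repo_id', 'tenant_id')
ORDER BY 1 DESC;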


Kind regards,

Alexander
