On Mon, 2026-04-27 at 15:52 +0000, Dirschel, Steve-CW wrote:
> Aurora Postgres version 17.4.

Aurora works quite differently, as far as storage is concerned, so you may be
suffering from some peculiarity of that closed source software.

> Table in question:
> 
> 
> \d poslog_publisher_rms_stage
>                    Table "public.poslog_publisher_rms_stage"
>       Column       |           Type           | Collation | Nullable | Default
> -------------------+--------------------------+-----------+----------+---------
>  stage_id          | uuid                     |           | not null |
>  status            | character varying(100)   |           |          |
>  message_body      | text                     |           | not null |
>  error_code        | character varying(100)   |           |          |
>  error_category    | character varying(100)   |           |          |
>  error_message     | text                     |           |          |
>  error_retry_count | integer                  |           |          | 0
>  create_date       | timestamp with time zone |           | not null | now()
>  modified_date     | timestamp with time zone |           | not null | now()
> Indexes:
>     "poslog_publisher_rms_stage_pkey" PRIMARY KEY, btree (stage_id)
>     "idx_poslog_publisher_stage_create_date_col" btree (create_date)
>     "idx_poslog_publisher_stage_status_error_retry_count_modi_date_c" btree 
> (status, error_retry_count, modified_date)
> Referenced by:
>     TABLE "poslog_publisher_rms_detail" CONSTRAINT 
> "fk_poslog_publisher_detail_stage_id" FOREIGN KEY (stage_id) REFERENCES 
> poslog_publisher_rms_stage(stage_id)
> Publications:
>     “sashpsrms_publication"
> 
> The table is constantly getting loaded into.  Rows are inserted with a status 
> ready and
> then there is a different process looking for that status and will update to 
> processed
> after processing the row.  We have multiple tables like this.
> 
> Then every 2 hours a different process runs this query looking for failed or 
> unprocessed rows:
> 
> select
>                       ppse.stage_id as stageId,
>                       ppse.status as status,
>                       ppse.message_body as messageBody
>                     from
>                       poslog_publisher_rms_stage ppse
>                     where
>                       ppse.status in ('UNPROCESSED','FAILED')
>                       and ppse.error_retry_count < 3
>                     order by
>                       ppse.create_date
>                     limit 100;
> 
> If I run that query with explain it is doing a ton of work to find 0 rows.  
> The index it
> uses seems appropriate for the query.

The index can be used, but it is far from perfect.  The ideal index would be:

  CREATE INDEX ON poslog_publisher_rms_stage (create_date)
     WHERE status in ('UNPROCESSED','FAILED') AND error_retry_count < 3;

>                                                                               
>                          QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=10.92..10.93 rows=1 width=1019) (actual 
> time=66566.823..66566.824 rows=0 loops=1)
>    Buffers: shared hit=1509768 read=2011479
>    I/O Timings: shared read=79792.017
>    ->  Sort  (cost=10.92..10.93 rows=1 width=1019) (actual 
> time=66566.821..66566.821 rows=0 loops=1)
>          Sort Key: create_date
>          Sort Method: quicksort  Memory: 25kB
>          Buffers: shared hit=1509768 read=2011479
>          I/O Timings: shared read=79792.017
>          ->  Index Scan using 
> idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on 
> poslog_publisher_rms_stage ppse  (cost=0.57..10.91 rows=1 width=1019) (actual 
> time=66566.761..66566.761 rows=0 loops=1)
>                Index Cond: (((status)::text = ANY 
> ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
>                Buffers: shared hit=1509765 read=2011479
>                I/O Timings: shared read=79792.017
>  Planning:
>    Buffers: shared hit=195 read=1
>    I/O Timings: shared read=1.038
>  Planning Time: 2.909 ms
>  Execution Time: 66581.498 ms
> 
> The query did 3.5 million block reads when scanning the index of which 1.5 
> million were
> in memory and 2 million were from disk.  5 seconds later I ran the exact same 
> query again:
> 
>                                                                               
>                       QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=10.92..10.93 rows=1 width=1019) (actual time=23.589..23.591 
> rows=0 loops=1)
>    Buffers: shared hit=18736
>    ->  Sort  (cost=10.92..10.93 rows=1 width=1019) (actual 
> time=23.588..23.589 rows=0 loops=1)
>          Sort Key: create_date
>          Sort Method: quicksort  Memory: 25kB
>          Buffers: shared hit=18736
>          ->  Index Scan using 
> idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on 
> poslog_publisher_rms_stage ppse  (cost=0.57..10.91 rows=1 width=1019) (actual 
> time=23.583..23.584 rows=0 loops=1)
>                Index Cond: (((status)::text = ANY 
> ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
>                Buffers: shared hit=18736
>  Planning Time: 0.118 ms
>  Execution Time: 23.628 ms
> 
> Now it only did 18k block reads all in memory.  It used the same index, it 
> also
> returned 0 rows.  Between those 2 runs I looked at pg_stat_user_tables and 
> could see the
> n_tup_ins increased by 13, n_tup_del increased by 13, n_live_tup increased by 
> 13, and
> n_dead_tup increased by 13.  n_live_tup was 96 million and n_dead_tup was 8.4 
> million.
> 18k logical reads to find 0 rows is still high but I believe that is most 
> likely caused
> by the 8.4 million n_dead_tups.
> 
> What is going on here where this query has to do 3+ million block reads to 
> find 0 rows?
> And how is it possible when I run the query 2 times in a row the logical 
> reads from the
> 2nd run comes down significantly?  Is this somehow related to determining if 
> rows are
> visible or something like that?

The only way I can imagine this happening in PostgreSQL is if the first 
execution
kills a lot of index tuples 
(https://www.cybertec-postgresql.com/en/killed-index-tuples/).

> When I waited 15 minutes between runs the inserted/updated rows only 
> increased by
> 10.3k yet total block reads increased by 3 million.


Now that seems to speak against the above theory, so you are suffering from 
some Amazon-
specific behavior.

Yours,
Laurenz Albe


Reply via email to