On Monday, May 29, 2023 5:22 PM Drouvot, Bertrand 
<bertranddrouvot...@gmail.com> wrote:
> 
> Hi,
> 
> On 5/26/23 9:27 AM, Yu Shi (Fujitsu) wrote:
> > Hi hackers,
> >
> > I saw a buildfarm failure on "dikkop"[1]. It failed in
> > 035_standby_logical_decoding.pl, because the slots row_removal_inactiveslot
> and
> > row_removal_activeslot are not invalidated after vacuum.
> 
> Thanks for sharing!
> 
> >
> > regress_log_035_standby_logical_decoding:
> > ```
> > [12:15:05.943](4.442s) not ok 22 - inactiveslot slot invalidation is logged 
> > with
> vacuum on pg_class
> > [12:15:05.945](0.003s)
> > [12:15:05.946](0.000s) #   Failed test 'inactiveslot slot invalidation is 
> > logged
> with vacuum on pg_class'
> > #   at t/035_standby_logical_decoding.pl line 238.
> > [12:15:05.948](0.002s) not ok 23 - activeslot slot invalidation is logged 
> > with
> vacuum on pg_class
> > [12:15:05.949](0.001s)
> > [12:15:05.950](0.000s) #   Failed test 'activeslot slot invalidation is 
> > logged with
> vacuum on pg_class'
> > #   at t/035_standby_logical_decoding.pl line 244.
> > [13:38:26.977](5001.028s) # poll_query_until timed out executing this query:
> > # select (confl_active_logicalslot = 1) from pg_stat_database_conflicts 
> > where
> datname = 'testdb'
> > # expecting this output:
> > # t
> > # last actual query output:
> > # f
> > # with stderr:
> > [13:38:26.980](0.003s) not ok 24 - confl_active_logicalslot updated
> > [13:38:26.982](0.002s)
> > [13:38:26.982](0.000s) #   Failed test 'confl_active_logicalslot updated'
> > #   at t/035_standby_logical_decoding.pl line 251.
> > Timed out waiting confl_active_logicalslot to be updated at
> t/035_standby_logical_decoding.pl line 251.
> > ```
> >
> > 035_standby_logical_decoding.pl:
> > ```
> > # This should trigger the conflict
> > $node_primary->safe_psql(
> >     'testdb', qq[
> >    CREATE TABLE conflict_test(x integer, y text);
> >    DROP TABLE conflict_test;
> >    VACUUM pg_class;
> >    INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
> > ]);
> >
> > $node_primary->wait_for_replay_catchup($node_standby);
> >
> > # Check invalidation in the logfile and in pg_stat_database_conflicts
> > check_for_invalidation('row_removal_', $logstart, 'with vacuum on 
> > pg_class');
> > ```
> >
> > Is it possible that the vacuum command didn't remove tuples and then the
> > conflict was not triggered?
> 
> The flush_wal table added by Andres should guarantee that the WAL is flushed,
> so
> the only reason I can think about is indeed that the vacuum did not remove
> tuples (
> but I don't get why/how that could be the case).
> 
> > It seems we can't confirm this because there is not
> > enough information.
> 
> Right, and looking at its status history most of the time the test is green 
> (making
> it
> even more difficult to diagnose).
> 
> > Maybe "vacuum verbose" can be used to provide more
> > information.
> 
> I can see that dikkop "belongs" to Tomas (adding Tomas to this thread).
> Tomas, do you think it would be possible to run some
> 035_standby_logical_decoding.pl
> manually with "vacuum verbose" in the test mentioned above? (or any other
> way you can think
> about that would help diagnose this random failure?).
> 

Thanks for your reply.

I saw another failure on "drongo" [1], which looks like a similar problem. 

Maybe a temporary patch can be committed to dump the result of "vacuum verbose".
And we can check this when the test fails.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2023-05-26%2018%3A05%3A57

Regards,
Shi Yu

Reply via email to