Re: pgsql: Add TAP tests for timeouts

2024-03-15 Thread Alexander Korotkov
On Fri, Mar 15, 2024 at 2:25 PM Andrey M. Borodin  wrote:
> > On 15 Mar 2024, at 16:30, Alexander Korotkov  wrote:
> >
> > Maybe, but do you see any negative side effects of the unconditionally
> > unset of flags?
>
> Nope, just expressed possible option.
>
> >  If not, I would prefer to keep the code simple.
>
> IMO that's fine. Let's incorporate wording improvement from Kyotaro to the 
> same patch? PFA draft for this.

Thank you, I've pushed this, but split into two commits for bugfix and
for improved wordings.

> Maybe it worth to wait a little to see if some other failures will pop up?

I prefer to push now because we're quite confident this fix will
improve the situation.  It should increase the chances of catching
other bugs if any, because buildfarm which was failing on this bug
will go ahead and have a chance to spot more.

--
Regards,
Alexander Korotkov




Re: pgsql: Add TAP tests for timeouts

2024-03-15 Thread Andrey M. Borodin


> On 15 Mar 2024, at 16:30, Alexander Korotkov  wrote:
> 
> Maybe, but do you see any negative side effects of the unconditionally
> unset of flags?

Nope, just expressed possible option.

>  If not, I would prefer to keep the code simple.

IMO that's fine. Let's incorporate wording improvement from Kyotaro to the same 
patch? PFA draft for this.

Maybe it worth to wait a little to see if some other failures will pop up?


Best regards, Andrey Borodin.


v2-0001-Fix-race-condition-in-transaction-timeout-TAP-tes.patch
Description: Binary data


Re: pgsql: Add TAP tests for timeouts

2024-03-15 Thread Alexander Korotkov
On Fri, Mar 15, 2024 at 1:27 PM Andrey M. Borodin  wrote:
>
> > On 15 Mar 2024, at 15:44, Alexander Korotkov  wrote:
> >
> >  We loop in the interrupt checking, given that the injection point handler 
> > checks for interrupts internally.  I propose to unset the timeout flag 
> > before the injection point (see the attached patch).
>
> Oh, cool.
> As far as I understand, this is only necessary for the test with injection 
> point.
> So, maybe unset it only when injection points are enabled? Something like 
> this is already used in GIN.
>
> #ifdef USE_INJECTION_POINTS
> 
> #endif


Maybe, but do you see any negative side effects of the unconditionally
unset of flags?  If not, I would prefer to keep the code simple.

--
Regards,
Alexander Korotkov




Re: pgsql: Add TAP tests for timeouts

2024-03-15 Thread Andrey M. Borodin



> On 15 Mar 2024, at 15:44, Alexander Korotkov  wrote:
> 
>  We loop in the interrupt checking, given that the injection point handler 
> checks for interrupts internally.  I propose to unset the timeout flag before 
> the injection point (see the attached patch).

Oh, cool.
As far as I understand, this is only necessary for the test with injection 
point.
So, maybe unset it only when injection points are enabled? Something like this 
is already used in GIN.

#ifdef USE_INJECTION_POINTS

#endif


Best regards, Andrey Borodin.



Re: pgsql: Add TAP tests for timeouts

2024-03-15 Thread Alexander Korotkov
On Fri, Mar 15, 2024 at 9:44 AM Michael Paquier  wrote:
> On Fri, Mar 15, 2024 at 11:20:49AM +0500, Andrey M. Borodin wrote:
> > And hachi sometimes pass this test too [0].
> >
> > I’ll look more on this. Do I understand right that we have only 2
buildfarm members with injection points?
>
> I have added --enable-injection-points to three of them: hachi,
> gokiburi and batta.  The switch is also enabled in the CI by defaultm
> where you may be able to capture a bit more information than the
> buildfarm.

I managed to reproduce the issue locally by setting the following extra
config via TEMP_CONFIG:

log_line_prefix = '%m [%p:%l] %q%a '
log_connections = 'true'
log_disconnections = 'true'
log_checkpoints = 'true'
log_statement = 'all'
wal_compression = 'on'
debug_parallel_query = regress
wal_compression = lz4
default_toast_compression = lz4

The backtrace is as follows.  We loop in the interrupt checking, given that
the injection point handler checks for interrupts internally.  I propose to
unset the timeout flag before the injection point (see the attached patch).

* thread #1, name = 'postgres', stop reason = signal SIGSTOP
  * frame #0: 0x7f3756f289e7 libc.so.6`epoll_wait(epfd=9,
events=0x556a302cabc8, maxevents=1, timeout=-1) at epoll_wait.c:30:10
frame #1: 0x556a2fc41aab postgres`WaitEventSetWait at latch.c:1570:7
frame #2: 0x556a2fc41e95 postgres`WaitLatch(latch=,
wakeEvents=, timeout=, wait_event_info=117440513)
at latch.c:538:6
frame #3: 0x556a2fc4f611
postgres`ConditionVariableTimedSleep(cv=0x7f3757b0e224, timeout=-1,
wait_event_info=117440513) at condition_variable.c:163:10
frame #4: 0x7f3757b10628 injection_points.so`injection_wait at
injection_points.c:153:3
frame #5: 0x556a2fdd4f54 postgres`InjectionPointRun at
injection_point.c:313:2
frame #6: 0x556a2fc6a8b3 postgres`ProcessInterrupts.part.0 at
postgres.c:3430:4
frame #7: 0x556a2fc4f70b
postgres`ConditionVariableTimedSleep(cv=0x7f3757b0e224, timeout=-1,
wait_event_info=117440513) at condition_variable.c:196:3
frame #8: 0x7f3757b10628 injection_points.so`injection_wait at
injection_points.c:153:3
frame #9: 0x556a2fdd4f54 postgres`InjectionPointRun at
injection_point.c:313:2
frame #10: 0x556a2fc6a8b3 postgres`ProcessInterrupts.part.0 at
postgres.c:3430:4
frame #11: 0x556a2fc4f70b
postgres`ConditionVariableTimedSleep(cv=0x7f3757b0e224, timeout=-1,
wait_event_info=117440513) at condition_variable.c:196:3
frame #12: 0x7f3757b10628 injection_points.so`injection_wait at
injection_points.c:153:3
frame #13: 0x556a2fdd4f54 postgres`InjectionPointRun at
injection_point.c:313:2
frame #14: 0x556a2fc6a8b3 postgres`ProcessInterrupts.part.0 at
postgres.c:3430:4
frame #15: 0x556a2fabe63d postgres`ExecGather at nodeGather.c:315:3
frame #16: 0x556a2fabe574 postgres`ExecGather at nodeGather.c:269:10
frame #17: 0x556a2fabe549
postgres`ExecGather(pstate=0x556a303d8138) at nodeGather.c:222:9
frame #18: 0x556a2faa21da postgres`standard_ExecutorRun at
executor.h:274:9
frame #19: 0x556a2fc6f3cf postgres`PortalRunSelect at pquery.c:924:4
frame #20: 0x556a2fc70b7e postgres`PortalRun at pquery.c:768:18
frame #21: 0x556a2fc6c996 postgres`exec_simple_query at
postgres.c:1274:10
frame #22: 0x556a2fc6e378 postgres`PostgresMain at postgres.c:4682:7
frame #23: 0x556a2fbcefa0 postgres`ServerLoop.isra.0 at
postmaster.c:4452:2
frame #24: 0x556a2fbcff8b postgres`PostmasterMain at
postmaster.c:1478:11
frame #25: 0x556a2f8cb5a7 postgres`main(argc=4,
argv=0x556a302ca340) at main.c:197:3
frame #26: 0x7f3756e28150
libc.so.6`__libc_start_call_main(main=(postgres`main at main.c:59:1),
argc=4, argv=0x7c17e678) at libc_start_call_main.h:58:16
frame #27: 0x7f3756e28209
libc.so.6`__libc_start_main_impl(main=(postgres`main at main.c:59:1),
argc=4, argv=0x7c17e678, init=, fini=,
rtld_fini=, stack_end=0x7c17e668) at libc-start.c:360:3
frame #28: 0x556a2f8cbb75 postgres`_start + 37

--
Regards,
Alexander Korotkov


transaction_timeout_patch_fix.patch
Description: Binary data


Re: pgsql: Add TAP tests for timeouts

2024-03-15 Thread Michael Paquier
On Fri, Mar 15, 2024 at 11:20:49AM +0500, Andrey M. Borodin wrote:
> And hachi sometimes pass this test too [0].
> 
> I’ll look more on this. Do I understand right that we have only 2 buildfarm 
> members with injection points?

I have added --enable-injection-points to three of them: hachi,
gokiburi and batta.  The switch is also enabled in the CI by defaultm
where you may be able to capture a bit more information than the
buildfarm.
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: Add TAP tests for timeouts

2024-03-15 Thread Andrey M. Borodin



> On 15 Mar 2024, at 11:20, Andrey M. Borodin  wrote:
> 
> The failure seems to be Perl-related.

It’s not Perl-related. This is how wait_for_log() times out.

timed out waiting for match: terminating connection due to transaction 
timeout at t/005_timeouts.pl line 58.

But it did not even wait for a 100ms…

2024-03-15 03:32:26.492 UTC [1405044:4] 005_timeouts.pl LOG: statement: 
SELECT injection_points_wakeup('transaction-timeout’);
/ here we start to wait.
2024-03-15 03:32:26.492 UTC [1405044:5] 005_timeouts.pl LOG: 
disconnection: session time: 0:00:00.002 user=admin database=postgres 
host=[local]
2024-03-15 03:35:26.623 UTC [1405009:4] LOG: received immediate 
shutdown request


Best regards, Andrey Borodin.



Re: pgsql: Add TAP tests for timeouts

2024-03-15 Thread Andrey M. Borodin
Kyotaro, thanks for your corrections. I agree that wordings should be improved. 
But let’s deal with failures first.

> On 15 Mar 2024, at 10:28, Michael Paquier  wrote:
> 
> On Fri, Mar 15, 2024 at 10:42:35AM +0900, Kyotaro Horiguchi wrote:
>> In 005_timeouts.pl, I found the following comment.
> 
> Note also that the test is not stable, one of my machines with
> injection points enabled has complained twice in its last three runs:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi=2024-03-14%2015%3A05%3A04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi=2024-03-15%2003%3A21%3A15

The failure is

t/005_timeouts.pl (Wstat: 65280 Tests: 0 Failed: 0)
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output
Files=5, Tests=68, 185 wallclock secs ( 0.03 usr  0.00 sys +  0.86 cusr  0.80 
csys =  1.69 CPU)
Result: FAIL

The failure seems to be Perl-related.
As far as I can see I’ve done everything akin to 041_checkpoint_at_promote.pl. 
On batta this test pass, but hachi seems to be unhappy with this test.
And hachi sometimes pass this test too [0].

I’ll look more on this. Do I understand right that we have only 2 buildfarm 
members with injection points?


Best regards, Andrey Borodin.

[0] 
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hachi=2024-03-14%2022%3A02%3A41=module-test_misc-check



Re: pgsql: Add TAP tests for timeouts

2024-03-14 Thread Michael Paquier
On Fri, Mar 15, 2024 at 10:42:35AM +0900, Kyotaro Horiguchi wrote:
> In 005_timeouts.pl, I found the following comment.

Note also that the test is not stable, one of my machines with
injection points enabled has complained twice in its last three runs:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi=2024-03-14%2015%3A05%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi=2024-03-15%2003%3A21%3A15
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: Add TAP tests for timeouts

2024-03-14 Thread Kyotaro Horiguchi
Hello.

At Thu, 14 Mar 2024 11:25:42 +, Alexander Korotkov 
 wrote in 
> Add TAP tests for timeouts
> 
> This commit adds new tests to verify that transaction_timeout,
> idle_session_timeout, and idle_in_transaction_session_timeout work as 
> expected.
> We introduce new injection points in before throwing a timeout FATAL error
> and check these injection points are reached.
> 
> Discussion: 
> https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
> Author: Andrey Borodin
> Reviewed-by: Alexander Korotkov

In 005_timeouts.pl, I found the following comment.

> # If we send \q with $psql_session->quit it can get to pump already closed.
> # So \q is in initial script, here we only finish IPC::Run.
> $psql_session->{run}->finish;

I'm not sure if "it can get to pump already closed." makes sense. I
guess that it means "the command can get to be pumped (or "can be
sent") to the session already closed" or something similar?

> # 2. Test of the sidle in transaction timeout

s/sidle/idle/ ?

> # Wait until the backend is in the timeout injection point.

I'm not sure, but it seems that "is in" meant "passes" or something like?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center