Re: Re: Queries that should be canceled will get stuck on secure_write function

2021-09-23 Thread 蔡梦娟(玊于)
Yes, it is more appropriate to use a duration to determine whether 
secure_write() is stuck, but it is difficult to decide how long that duration 
should be.
In my first patch, I added a GUC to let the user set the time. Alternatively, 
could it be hardcoded if someone can suggest a value deemed reasonable?
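
To make the duration idea concrete, here is a minimal sketch of the promotion check, assuming a timeout in milliseconds supplied by the caller. The struct, the function names, and the timeout parameter are hypothetical illustrations; they are not taken from the patch and are not existing PostgreSQL symbols or GUCs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one backend; not a PostgreSQL structure. */
typedef struct CancelPromotionState
{
    bool    cancel_pending;     /* a cancel request has arrived */
    int64_t cancel_arrived_ms;  /* time of the first pending cancel */
} CancelPromotionState;

/* Record a cancel request, keeping the earliest arrival time. */
static void
note_cancel_request(CancelPromotionState *st, int64_t now_ms)
{
    assert(st != NULL);
    if (!st->cancel_pending)
    {
        st->cancel_pending = true;
        st->cancel_arrived_ms = now_ms;
    }
}

/*
 * Return true when a pending cancel has waited at least timeout_ms and
 * should therefore be treated as a terminate request.  A timeout of 0
 * (or less) disables promotion in this sketch.
 */
static bool
should_promote_to_terminate(const CancelPromotionState *st,
                            int64_t now_ms, int64_t timeout_ms)
{
    assert(st != NULL);
    if (!st->cancel_pending || timeout_ms <= 0)
        return false;
    return (now_ms - st->cancel_arrived_ms) >= timeout_ms;
}
```

With such a check, the open question from this message remains the choice of timeout_ms: too short and a healthy but slow client gets disconnected, too long and the stuck backend keeps blocking.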



> I agree that something like the patch (i.e., introduction of promotion
> from cancel request to terminate one) is necessary for the fix. One concern
> on the patch is that the cancel request can be promoted to the terminate one
> even when secure_write() doesn't actually get stuck. Is this acceptable?
> Maybe I'm tempted to set up the duration until the promotion happens.
> Or should we introduce a dedicated timer for communication on the socket?



Re: Re: Queries that should be canceled will get stuck on secure_write function

2021-09-21 Thread Fujii Masao




On 2021/09/22 1:14, 蔡梦娟(玊于) wrote:

> Hi all, I want to know your opinion on this patch, or in what way do you think 
> we should solve this problem?


I agree that something like the patch (i.e., introduction of promotion
from cancel request to terminate one) is necessary for the fix. One concern
on the patch is that the cancel request can be promoted to the terminate one
even when secure_write() doesn't actually get stuck. Is this acceptable?
Maybe I'm tempted to set up the duration until the promotion happens.
Or should we introduce a dedicated timer for communication on the socket?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Queries that should be canceled will get stuck on secure_write function

2021-09-21 Thread 蔡梦娟(玊于)
Hi all, I would like to hear your opinion on this patch, or how you think 
we should solve this problem.
--
From: 蔡梦娟(玊于)
Sent: Thursday, September 9, 2021, 17:38
To: Robert Haas; Andres Freund; alvherre; masao.fujii
Cc: pgsql-hackers
Subject: Re: Queries that should be canceled will get stuck on secure_write function


I changed the implementation for this problem: 
a) if the query cancel interrupt comes from the database itself for some 
reason, such as a recovery conflict, handle it immediately and treat the 
cancel request as a terminate request;
b) if the query cancel interrupt comes from the client, ignore it as before.

In the attached patch, I also added a TAP test that generates a recovery 
conflict on a standby while the backend process is stuck on a client write, and 
checks whether the backend can handle the resulting query cancel request.

What do you think of it? I hope to get your reply.

Thanks & Best Regards




0001-Handle-cancel-interrupts-during-client-readwrite.patch
Description: Binary data


Re: Queries that should be canceled will get stuck on secure_write function

2021-09-09 Thread 蔡梦娟(玊于)


I changed the implementation for this problem: 
a) if the query cancel interrupt comes from the database itself for some 
reason, such as a recovery conflict, handle it immediately and treat the 
cancel request as a terminate request;
b) if the query cancel interrupt comes from the client, ignore it as before.
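
The two cases above can be sketched as a small decision function. This is only an illustration of the rule as described in this message, not code from the attached patch; the enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical classification of where a cancel request came from. */
typedef enum CancelSource
{
    CANCEL_FROM_CLIENT,     /* e.g. the user asked to cancel the query */
    CANCEL_FROM_SERVER      /* e.g. a recovery conflict on a standby */
} CancelSource;

/* Hypothetical outcome for a backend that is blocked writing to the client. */
typedef enum BlockedWriteAction
{
    KEEP_WAITING_ON_WRITE,  /* leave the cancel pending, as before */
    TERMINATE_CONNECTION    /* treat the cancel as a terminate request */
} BlockedWriteAction;

/*
 * Decide what to do with a cancel request that arrives while the backend
 * is blocked on a client write: case (a) promotes server-originated
 * cancels, case (b) keeps the original behavior for client cancels.
 */
static BlockedWriteAction
action_for_cancel_during_write(CancelSource source)
{
    assert(source == CANCEL_FROM_CLIENT || source == CANCEL_FROM_SERVER);
    if (source == CANCEL_FROM_SERVER)
        return TERMINATE_CONNECTION;
    return KEEP_WAITING_ON_WRITE;
}
```

The asymmetry reflects the reasoning in the thread: a server-originated cancel such as a recovery conflict has to take effect for the server to make progress, while a client-originated cancel only affects that client's own query.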

In the attached patch, I also added a TAP test that generates a recovery 
conflict on a standby while the backend process is stuck on a client write, and 
checks whether the backend can handle the resulting query cancel request.

What do you think of it? I hope to get your reply.

Thanks & Best Regards




0001-Handle-cancel-interrupts-during-client-readwrite.patch
Description: Binary data


Re: Queries that should be canceled will get stuck on secure_write function

2021-09-06 Thread 蔡梦娟(玊于)


I added a test to reproduce the problem; see the attachment for details.
During the last sleep in the test, use pstack to get the stack of the 
backend process, which is as follows:

#0  0x7f6ebdd744d3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x007738d2 in WaitEventSetWait ()
#2  0x00675aae in secure_write ()
#3  0x0067bfab in internal_flush ()
#4  0x0067c13a in internal_putbytes ()
#5  0x0067c217 in socket_putmessage ()
#6  0x00497f36 in printtup ()
#7  0x006301e0 in standard_ExecutorRun ()
#8  0x007985fb in PortalRunSelect ()
#9  0x00799968 in PortalRun ()
#10 0x00795866 in exec_simple_query ()
#11 0x00796cff in PostgresMain ()
#12 0x00488339 in ServerLoop ()
#13 0x00717bbc in PostmasterMain ()
#14 0x00488f26 in main ()
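
The stack above shows the backend waiting inside secure_write() for the socket to become writable. As a rough, standalone illustration of how a writer ends up in that state when the peer never reads, the following sketch fills a non-blocking socket until the kernel refuses more data and then checks writability. It is not PostgreSQL code: the helper names are hypothetical, and poll() is used here as a portable stand-in for the epoll-based wait seen in the trace.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on failure. */
static int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Keep sending until the kernel buffer is full (EAGAIN/EWOULDBLOCK).
 * Returns the number of bytes accepted, or -1 on an unexpected error.
 */
static long
fill_until_would_block(int fd)
{
    char chunk[4096] = {0};
    long total = 0;

    assert(fd >= 0);
    for (;;)
    {
        ssize_t n = send(fd, chunk, sizeof(chunk), 0);

        if (n > 0)
            total += n;
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;       /* peer is not draining the socket */
        else if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        else
            return -1;
    }
}

/* Zero-timeout poll: is the socket writable right now? */
static bool
is_writable_now(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLOUT, .revents = 0 };

    return poll(&p, 1, 0) == 1 && (p.revents & POLLOUT) != 0;
}
```

A caller could create a pair with socketpair(AF_UNIX, SOCK_STREAM, 0, sv), make sv[0] non-blocking, and call fill_until_would_block(sv[0]) without ever reading from sv[1]. A wait for writability with no timeout would then sleep until the peer reads or the connection drops, which corresponds to the situation in the trace.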



0001-Test-stuck-on-secure-write.patch
Description: Binary data


Re: Queries that should be canceled will get stuck on secure_write function

2021-08-24 Thread 蔡梦娟(玊于)

Yes, pg_terminate_backend() can terminate the connection successfully in this 
case because ProcDiePending is set to true and ProcessClientWriteInterrupt() 
can handle it.

Queries that exceed the standby delay limit can be terminated in this way, but 
what about other queries that should be canceled but get stuck in 
secure_write()? After all, there is currently no way to list all possible 
situations and then terminate these queries one by one.
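
The difference between the two kinds of request can be sketched with a simplified model of the check made while a write is blocked. This is an assumption-laden illustration of the behavior described in this thread, not the actual ProcessClientWriteInterrupt() source; the struct and function are hypothetical, and the two fields merely mirror the roles of the ProcDiePending and QueryCancelPending flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of a backend's pending interrupt flags. */
typedef struct PendingInterrupts
{
    bool proc_die_pending;      /* set by a terminate request */
    bool query_cancel_pending;  /* set by a cancel request */
} PendingInterrupts;

typedef enum BlockedWriteOutcome
{
    WRITE_KEEPS_WAITING,        /* backend stays blocked on the socket */
    WRITE_EXITS_BACKEND         /* backend gives up the write and exits */
} BlockedWriteOutcome;

/*
 * Model of the behavior discussed here: while blocked on a client write,
 * only a pending terminate request is acted upon.  A pending cancel is
 * left for later, so the backend keeps waiting for the client.
 */
static BlockedWriteOutcome
outcome_while_blocked_on_write(const PendingInterrupts *p)
{
    assert(p != NULL);
    if (p->proc_die_pending)
        return WRITE_EXITS_BACKEND;
    return WRITE_KEEPS_WAITING;
}
```

In this model a cancel request alone never unblocks the write, which is the gap this message points out.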


--
From: Fujii Masao
Sent: Tuesday, August 24, 2021, 13:15
To: Robert Haas; Alvaro Herrera
Cc: 蔡梦娟(玊于); pgsql-hackers; Andres Freund
Subject: Re: Queries that should be canceled will get stuck on secure_write function


On 2021/08/24 0:26, Alvaro Herrera wrote:
> Aren't we talking about query cancellations that occur in response to a
> standby delay limit?  Those aren't in response to user action.  What I
> mean is that if the standby delay limit is exceeded, then we send a
> query cancel; we expect the standby to cancel its query at that time and
> then the primary can move on.  But if the standby doesn't react, then we
> can have it terminate its connection.

+1


On 2021/08/24 3:45, Robert Haas wrote:
> On Mon, Aug 23, 2021 at 11:26 AM Alvaro Herrera wrote:
>> Aren't we talking about query cancellations that occur in response to a
>> standby delay limit?  Those aren't in response to user action.
> 
> Oh, you're right. But I guess a similar problem could also occur in
> response to pg_terminate_backend(), no?

There seems to be no problem in that case because pg_terminate_backend() causes
a backend to set ProcDiePending to true in the die() signal handler, and
ProcessClientWriteInterrupt(), called by secure_write(), handles ProcDiePending.
No?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION