Hello, At Thu, 20 Apr 2017 19:57:30 +0530, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote in <cafjfprdmd25lia_j0p1eikznxaxth9agnx2vcz2dbpnffno...@mail.gmail.com> > On Thu, Apr 20, 2017 at 4:08 PM, Suraj Kharage > <suraj.khar...@enterprisedb.com> wrote: > > Hello All, > > > > The query on foreign table hangs due to network down of remote server for > > near about 16 minutes before exiting. > > statement_timeout is expected to work in this case as well but when i tried > > statement_timeout, it is not working as expected. > > > > Below is test case to reproduce the issue: [I am testing this on two > > different systems] > > > > 1: Set the postgres_fdw > > > > postgres=# CREATE SERVER myserver FOREIGN DATA WRAPPER postgres_fdw OPTIONS > > (host '172.16.173.237', dbname 'postgres', port '5432',keepalives '1', > > keepalives_interval '3',keepalives_idle '3', keepalives_count '1'); > > CREATE SERVER > > postgres=# CREATE USER MAPPING FOR postgres SERVER myserver OPTIONS (user > > 'postgres', password 'edb'); > > CREATE USER MAPPING > > postgres=# CREATE FOREIGN TABLE test(id int) SERVER myserver; > > CREATE FOREIGN TABLE > > postgres=# select * from test; > > id > > ---- > > 1 > > 10 > > 2 > > 1 > > (4 rows) > > > > postgres=# \d > > List of relations > > Schema | Name | Type | Owner > > --------+------+---------------+---------- > > public | test | foreign table | postgres > > (1 row) > > postgres=# > > postgres=# select * from pg_foreign_server ; > > srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | > > srvoptions > > > > ----------+----------+--------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------ > > ------- > > myserver | 10 | 16387 | | | | > > {host=172.16.173.237,dbname=postgres,port=5432,keepalives=1,keepalives_interval=3,keepalives_idle=3,keepalives_co > > unt=1} > > (1 row) > > > > 3. Execute the insert command. > > E.g: insert into test values (generate_series(1,1000000)); > > * You need to do network down of remote server during insert command. > > > > postgres=# set statement_timeout to 6000; > > SET > > postgres=# insert into test values (generate_series(1,1000000)); > > WARNING: server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > > > CONTEXT: Remote SQL command: ABORT TRANSACTION > > ERROR: could not receive data from server: No route to host > > > > CONTEXT: Remote SQL command: INSERT INTO public.test(id) VALUES ($1) > > Time: 931427.002 ms > > > > It was in hang state for approx 15-16 minute before exit. > > > > In pg_log, i see below error for above query: > > ========= > > 2017-04-20 15:22:02 IST ERROR: could not receive data from server: No route > > to host > > 2017-04-20 15:22:02 IST CONTEXT: Remote SQL command: INSERT INTO > > public.test(id) VALUES ($1) > > 2017-04-20 15:22:02 IST STATEMENT: insert into test values > > (generate_series(1,1000000)); > > 2017-04-20 15:22:02 IST WARNING: server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > 2017-04-20 15:22:02 IST CONTEXT: Remote SQL command: ABORT TRANSACTION > > ========== > > > > I tested the patch submitted by Ashutosh Bapat on community > > [https://www.postgresql.org/message-id/CAFjFpRdcWw4h0a-zrL-EiaekkPj8O0GR2M1FwZ1useSRfRm3-g%40mail.gmail.com] > > to make the statement_timeout working and i can see statement_timeout is > > working but it is taking some extra time as statement_timeout. > > > > Thanks for testing the patch. > > > postgres=# set statement_timeout to 6000; > > SET > > postgres=# > > postgres=# \timing on > > Timing is on. > > postgres=# insert into test values (generate_series(1,1000000)); > > -- [after executing query immediately disconnect the ethernet on remote > > server ] > > 2017-04-20 07:10:51.588 IST [10467] ERROR: canceling statement due to > > statement timeout > > 2017-04-20 07:10:51.588 IST [10467] STATEMENT: insert into test values > > (generate_series(1,1000000)); > > 2017-04-20 07:11:25.590 IST [10467] WARNING: discarding connection 0xfe4260 > > because of failed cancel request: PQcancel() -- connect() failed: No route > > to host > > WARNING: discarding connection 0xfe4260 because of failed cancel request: > > PQcancel() -- connect() failed: No route to host > > > > ERROR: canceling statement due to statement timeout > > Time: 40001.765 ms (00:40.002) > > postgres=# > > postgres=# > > > > In above case, I got the error related to statement timeout after 6 seconds, > > but it it taking more time (approx 34 sec and it is varing each time if you > > see below) to terminate the connection and to exit from query. As per the > > tcp keepavlies settings for foreign server, it should take max 6 sec to > > terminate the connection. > > The logs above show that 34 seconds elapsed between starting to abort > the transaction and knowing that the foreign server is unreachable. It > looks like it took that much time for the local server to realise that > the foreign server is not reachable. Looking at PQcancel code, it > seems to be trying to connect to the foreign server to cancel the > query. But somehow it doesn't seem to honor connect_timeout setting. > Is that expected?
Yes, and No. I think PQcancel requires connection timeout, but I think it is not necessariry the same with that of a foreign server. > Irrespective of what PQcancel does, it looks like postgres_fdw should > just slam the connection if query is being aborted because of > statement_timeout. But then pgfdw_xact_callback() doesn't seem to have > a way to know whether this ABORT is because of user's request to > cancel the query, statement timeout, an abort because of some other > error or a user requested abort. Except statement timeout (may be > user's request to cancel the query?), it should try to keep the > connection around to avoid any future reconnection. But I am not able > to see how can we provide that information to pgfdw_xact_callback(). Expiration of statement_timeout doesn't mean a stall of foreign connections. If we are to keep connections by, for example, a cancel request from client, we also should keep them on statememt_timeout because it is not necessariry triggered by a stall of foreign connection. I think we can detect a stall of the channel where the foreign connections are on by a cancel request with a very short timeout, although it is a bit incorrect. I reconsider this problem and my proposal for this issue is as the follows. - Foreign servers have a new options 'stall_detection_threshold' in milliseconds, maybe defaults to connect_timeout of the foreign server setting. For many foreign servers in a local network, it could be lowered to several tens of milliseconds. -+ Invoke PQcancel with the stall_detection_threashold. (It | doesn't accept such parameter so far) | +- If PQcancel returns with timeout, slam the connection down. | +- Else we continue to send ABORT TRANSACTION to the connection. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers