Hello,

At Thu, 20 Apr 2017 19:57:30 +0530, Ashutosh Bapat 
<ashutosh.ba...@enterprisedb.com> wrote in 
<cafjfprdmd25lia_j0p1eikznxaxth9agnx2vcz2dbpnffno...@mail.gmail.com>
> On Thu, Apr 20, 2017 at 4:08 PM, Suraj Kharage
> <suraj.khar...@enterprisedb.com> wrote:
> > Hello All,
> >
> > The query on foreign table hangs due to network down of remote server for
> > near about 16 minutes before exiting.
> > statement_timeout is expected to work in this case as well but when i tried
> > statement_timeout, it is not working as expected.
> >
> > Below is test case to reproduce the issue: [I am testing this on two
> > different systems]
> >
> > 1: Set the postgres_fdw
> >
> > postgres=# CREATE SERVER myserver FOREIGN DATA WRAPPER postgres_fdw OPTIONS
> > (host '172.16.173.237', dbname 'postgres', port '5432',keepalives '1',
> > keepalives_interval '3',keepalives_idle '3', keepalives_count '1');
> > CREATE SERVER
> > postgres=# CREATE USER MAPPING FOR postgres SERVER myserver OPTIONS (user
> > 'postgres', password 'edb');
> > CREATE USER MAPPING
> > postgres=# CREATE FOREIGN TABLE test(id int) SERVER myserver;
> > CREATE FOREIGN TABLE
> > postgres=# select * from test;
> >  id
> > ----
> >   1
> >  10
> >   2
> >   1
> > (4 rows)
> >
> > postgres=# \d
> >             List of relations
> >  Schema | Name |     Type      |  Owner
> > --------+------+---------------+----------
> >  public | test | foreign table | postgres
> > (1 row)
> > postgres=#
> > postgres=# select * from pg_foreign_server ;
> >  srvname  | srvowner | srvfdw | srvtype | srvversion | srvacl |
> > srvoptions
> >
> > ----------+----------+--------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------
> > -------
> >  myserver |       10 |  16387 |         |            |        |
> > {host=172.16.173.237,dbname=postgres,port=5432,keepalives=1,keepalives_interval=3,keepalives_idle=3,keepalives_co
> > unt=1}
> > (1 row)
> >
> > 3. Execute the insert command.
> > E.g: insert into test values (generate_series(1,1000000));
> > * You need to do network down of remote server during insert command.
> >
> > postgres=# set statement_timeout to 6000;
> > SET
> > postgres=# insert into test values (generate_series(1,1000000));
> > WARNING:  server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> >
> > CONTEXT:  Remote SQL command: ABORT TRANSACTION
> > ERROR:  could not receive data from server: No route to host
> >
> > CONTEXT:  Remote SQL command: INSERT INTO public.test(id) VALUES ($1)
> > Time: 931427.002 ms
> >
> > It was in hang state for approx 15-16 minute before exit.
> >
> > In pg_log, i see below error for above query:
> > =========
> > 2017-04-20 15:22:02 IST ERROR:  could not receive data from server: No route
> > to host
> > 2017-04-20 15:22:02 IST CONTEXT:  Remote SQL command: INSERT INTO
> > public.test(id) VALUES ($1)
> > 2017-04-20 15:22:02 IST STATEMENT:  insert into test values
> > (generate_series(1,1000000));
> > 2017-04-20 15:22:02 IST WARNING:  server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > 2017-04-20 15:22:02 IST CONTEXT:  Remote SQL command: ABORT TRANSACTION
> > ==========
> >
> > I tested the patch submitted by Ashutosh Bapat on community
> > [https://www.postgresql.org/message-id/CAFjFpRdcWw4h0a-zrL-EiaekkPj8O0GR2M1FwZ1useSRfRm3-g%40mail.gmail.com]
> > to make the statement_timeout working and i can see statement_timeout is
> > working but it is taking some extra time as statement_timeout.
> >
> 
> Thanks for testing the patch.
> 
> > postgres=# set statement_timeout to 6000;
> > SET
> > postgres=#
> > postgres=# \timing on
> > Timing is on.
> > postgres=# insert into test values (generate_series(1,1000000));
> > -- [after executing query immediately disconnect the ethernet on remote
> > server ]
> > 2017-04-20 07:10:51.588 IST [10467] ERROR:  canceling statement due to
> > statement timeout
> > 2017-04-20 07:10:51.588 IST [10467] STATEMENT:  insert into test values
> > (generate_series(1,1000000));
> > 2017-04-20 07:11:25.590 IST [10467] WARNING:  discarding connection 0xfe4260
> > because of failed cancel request: PQcancel() -- connect() failed: No route
> > to host
> > WARNING:  discarding connection 0xfe4260 because of failed cancel request:
> > PQcancel() -- connect() failed: No route to host
> >
> > ERROR:  canceling statement due to statement timeout
> > Time: 40001.765 ms (00:40.002)
> > postgres=#
> > postgres=#
> >
> > In above case, I got the error related to statement timeout after 6 seconds,
> > but it it taking more time (approx 34 sec and it is varing each time if you
> > see below) to terminate the connection and to exit from query. As per the
> > tcp keepavlies settings for foreign server, it should take max 6 sec to
> > terminate the connection.
> 
> The logs above show that 34 seconds elapsed between starting to abort
> the transaction and knowing that the foreign server is unreachable. It
> looks like it took that much time for the local server to realise that
> the foreign server is not reachable. Looking at PQcancel code, it
> seems to be trying to connect to the foreign server to cancel the
> query. But somehow it doesn't seem to honor connect_timeout setting.
> Is that expected?

Yes, and No. I think PQcancel requires connection timeout, but I
think it is not necessariry the same with that of a foreign
server.

> Irrespective of what PQcancel does, it looks like postgres_fdw should
> just slam the connection if query is being aborted because of
> statement_timeout. But then pgfdw_xact_callback() doesn't seem to have
> a way to know whether this ABORT is because of user's request to
> cancel the query, statement timeout, an abort because of some other
> error or a user requested abort. Except statement timeout (may be
> user's request to cancel the query?), it should try to keep the
> connection around to avoid any future reconnection. But I am not able
> to see how can we provide that information to pgfdw_xact_callback().

Expiration of statement_timeout doesn't mean a stall of foreign
connections. If we are to keep connections by, for example, a
cancel request from client, we also should keep them on
statememt_timeout because it is not necessariry triggered by a
stall of foreign connection.

I think we can detect a stall of the channel where the foreign
connections are on by a cancel request with a very short timeout,
although it is a bit incorrect.

I reconsider this problem and my proposal for this issue is as
the follows.

- Foreign servers have a new options 'stall_detection_threshold'
  in milliseconds, maybe defaults to connect_timeout of the
  foreign server setting. For many foreign servers in a local
  network, it could be lowered to several tens of milliseconds.

-+ Invoke PQcancel with the stall_detection_threashold. (It
 | doesn't accept such parameter so far)
 |
 +- If PQcancel returns with timeout, slam the connection down.
 |
 +- Else we continue to send ABORT TRANSACTION to the connection.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to