On Thu, May 4, 2017 at 8:08 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Thu, May 4, 2017 at 7:13 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: > >>> - For bonus points, give pgfdw_exec_query() an optional timeout >>> argument, and set it to 30 seconds or so when we're doing abort >>> cleanup. If the timeout succeeds, it errors out (which again >>> re-enters the abort path, dropping the connection and forcing outer >>> transaction levels to abort as well). >> >> Why do we need this 30 seconds timeout for abort cleanup failures? >> Isn't it sufficient if we propagate the error only on failures? > > Well, the problem is that right now we're waiting for vast amounts of > time when the remote connection dies. First, the user notices that > the command is running for too long and hits ^C. At that point, the > connection is probably already dead, and the user's been waiting for a > while already. Then, we wait for PQcancel() to time out. Then, we > wait for PQexec() to time out. The result of that, as Suraj mentioned > in the original email, is that it takes about 16 minutes for the > session to recover when the connection dies, even with keepalive > settings and connection timeout set all the way to the minimum. The > goal here is to make it take a more reasonable time to recover. > Without modifying libpq, we can't do anything about the need to wait > for PQcancel() to time out, but we can improve the rest of it. If we > removed that 30-second timeout, we would win in the case where either > ABORT TRANSACTION or ROLLBACK; RELEASE eventually succeeds but takes > more than 30 seconds. In that case, the change you are proposing > would allow us to reuse the connection instead of dropping it, and > would prevent a forced toplevel abort when subtransactions are in use. > However, the cost would be that when one of those commands never > succeeds, we would wait a lot longer before finishing a transaction > abort. >
As soon as the first command fails due to timeout, we will set 'abort_cleanup_failure' which will make a toplevel transaction to abort and also won't allow other statements to execute. The patch is trying to enforce a 30-second timeout after statement execution, so it has to anyway wait till the command execution completes (irrespective of whether the command succeeds or fail). It is quite possible I am missing some point, is it possible for you to tell in somewhat more detail in which exact case 30-second timeout will allow us to abort the toplevel transaction if we already do that in the case of statement failure case? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers