On Fri, Jul 13, 2012 at 1:15 AM, Magnus Hagander <mag...@hagander.net> wrote:
> On Thu, Jul 12, 2012 at 6:07 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> On Thu, Jul 12, 2012 at 8:39 PM, Magnus Hagander <mag...@hagander.net> wrote:
>>> On Tue, Jul 10, 2012 at 7:03 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>>> On Tue, Jul 10, 2012 at 3:23 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I found several problems in pg_receivexlog, e.g., memory leaks,
>>>>> file-descriptor leaks, etc. The attached patch fixes these problems.
>>>>>
>>>>> ISTM there are still some other problems in pg_receivexlog, so I'll
>>>>> read it deeply later.
>>>>
>>>> While pg_basebackup background process is streaming WAL records,
>>>> if its replication connection is terminated (e.g., walsender in the server
>>>> is accidentally terminated by SIGTERM signal), pg_basebackup ends
>>>> up failing to include all required WAL files in the backup. The problem
>>>> is that, in this case, pg_basebackup doesn't emit any error message at all.
>>>> So a user might misunderstand that a base backup has been successfully
>>>> taken even though it doesn't include all required WAL files.
>>>
>>> Ouch. That is definitely a bug if it behaves that way.
>>>
>>>
>>>> To fix this problem, I think that, when the replication connection is
>>>> terminated, ReceiveXlogStream() should check whether we've already
>>>> reached the stop point by calling stream_stop() before returning TRUE.
>>>> If we've not yet (this means that we've not received all required WAL
>>>> files yet), ReceiveXlogStream() should return FALSE and
>>>> pg_basebackup should emit an error message. Comments?
>>>
>>> Doesn't it already return false because it detects the error of the
>>> connection? What's the codepath where we end up returning true even
>>> though we had a connection failure? Shouldn't that end up under the
>>> "could not read copy data" branch, which already returns false?
>>
>> You're right. If the error is detected, that function always returns false
>> and the error message is emitted (but I think that current error message
>> "pg_basebackup: child process exited with error 1" is confusing....),
>> so it's OK. But if walsender in the server is terminated by SIGTERM,
>> no error is detected and pg_basebackup background process gets out
>> of the loop in ReceiveXlogStream() and returns true.
>
> Oh. Because the server does a graceful shutdown. D'uh, of course.
>
> Then yes, your suggested fix seems like a good one.
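
To illustrate the check discussed above, here is a minimal, self-contained sketch. It is not the attached patch and not the real ReceiveXlogStream(): the WalPos type, the StreamStep events, the stop_check_fn callback (a simplified stand-in for stream_stop()), and the second error message are all hypothetical names made up for this example. The point it shows is only that a graceful end of the stream counts as success if, and only if, the stop point has already been reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a WAL position (the real code uses XLogRecPtr). */
typedef uint64_t WalPos;

/* Simplified stand-in for the stream_stop() callback: returns true once
 * the given position is at or past the point where streaming may stop. */
typedef bool (*stop_check_fn)(WalPos pos, void *arg);

/* Hypothetical events the receive loop can observe. */
typedef enum { EV_DATA, EV_COPY_DONE, EV_READ_ERROR } StreamEvent;

typedef struct
{
    StreamEvent kind;
    WalPos      end_pos;    /* only meaningful for EV_DATA */
} StreamStep;

/* Sketch of the receive loop: returns true only if the stop point was reached. */
static bool
receive_stream_sketch(const StreamStep *steps, int nsteps,
                      stop_check_fn stream_stop, void *arg)
{
    WalPos received = 0;

    assert(stream_stop != NULL);

    for (int i = 0; i < nsteps; i++)
    {
        switch (steps[i].kind)
        {
            case EV_DATA:
                received = steps[i].end_pos;
                if (stream_stop(received, arg))
                    return true;    /* normal completion */
                break;

            case EV_READ_ERROR:
                /* Connection failure: already detected and reported. */
                fprintf(stderr, "could not read copy data\n");
                return false;

            case EV_COPY_DONE:
                /* Graceful end of stream (e.g. walsender got SIGTERM):
                 * no error was raised, so check the stop point explicitly.
                 * The message text below is made up for this sketch. */
                if (!stream_stop(received, arg))
                {
                    fprintf(stderr, "stream ended before the stop point\n");
                    return false;
                }
                return true;
        }
    }

    /* Ran out of input: apply the same rule as for a graceful end. */
    return stream_stop(received, arg);
}

/* Example callback: the stop point is passed through arg. */
static bool
reached_stop_point(WalPos pos, void *arg)
{
    return pos >= *(const WalPos *) arg;
}
```

With a stop point of 300, feeding the sketch a data step ending at 100 followed by EV_COPY_DONE makes it return false, whereas without the extra check that graceful termination would have looked like success.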

The attached patch adds the fix. I also found that I had forgotten to set
the file descriptor to -1 at the end of ReceiveXlogStream() in my
previously committed patch. The attached patch fixes that problem as well.

Regards,

-- 
Fujii Masao
pgreceivexlog_check_stoppoint_v1.patch
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers