Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck
On Thu, May 24, 2012 at 7:02 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, May 23, 2012 at 9:25 PM, Magnus Hagander mag...@hagander.net wrote: While reviewing and cleaning this patch up a bit I noticed it actually broke pg_receivexlog in the renaming. Here is a new version of the patch, reworked based on the above so we're down to a single callback. I moved the rename last segment file even if it's not complete to be a parameter into ReceiveXlogStream() instead of trying to overload a third functionality on the callback (which is what broke pg_receivexlog). How does this look? Have I overlooked any cases? Thanks for the patch! Looks good to me except the followings: pg_basebackup.c:233: warning: passing argument 6 of 'ReceiveXlogStream' from incompatible pointer type Hmm. I could've sworn I fixed that. I think I forgot to refresh the patch :-) It seems confusing that *stream_continue()* returns TRUE when streaming *cannot continue*, i.e., its name seems to be inconsistent with what it does. What about renaming it to stream_stop? That's a pre-existing issue, but agreed, I will rename it. Similarly, it also seems confusing that *continue_streaming()* returns TRUE when streaming *cannot continue*. Yeah, I renamed that one to stop_streaming as well. Will apply the updated version. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck
On Wed, May 23, 2012 at 9:25 PM, Magnus Hagander mag...@hagander.net wrote: While reviewing and cleaning this patch up a bit I noticed it actually broke pg_receivexlog in the renaming. Here is a new version of the patch, reworked based on the above so we're down to a single callback. I moved the rename last segment file even if it's not complete to be a parameter into ReceiveXlogStream() instead of trying to overload a third functionality on the callback (which is what broke pg_receivexlog). How does this look? Have I overlooked any cases? Thanks for the patch! Looks good to me except the followings: pg_basebackup.c:233: warning: passing argument 6 of 'ReceiveXlogStream' from incompatible pointer type I got the above warning on compile. To fix this, the third argument segment_finished needs to be added to reached_end_position(). It seems confusing that *stream_continue()* returns TRUE when streaming *cannot continue*, i.e., its name seems to be inconsistent with what it does. What about renaming it to stream_stop? Similarly, it also seems confusing that *continue_streaming()* returns TRUE when streaming *cannot continue*. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck
On Fri, Mar 2, 2012 at 2:26 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Feb 28, 2012 at 09:22, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander mag...@hagander.net wrote: On Tue, Feb 7, 2012 at 12:30, Fujii Masao masao.fu...@gmail.com wrote: Hi, http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/ =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication xlog start point: 2/AC4E2600 pg_basebackup: starting background WAL receiver 692447/692447 kB (100%), 1/1 tablespace xlog end point: 2/AC4E2600 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: base backup completed real 3m56.237s user 0m0.224s sys 0m0.936s (time is long because this is only test database with no traffic, so I had to make some inserts for it to finish) The above article points out the problem of pg_basebackup from the standby: when -x stream is specified, pg_basebackup from the standby gets stuck if there is no traffic in the database. When -x stream is specified, pg_basebackup forks the background process for receiving WAL records during backup, takes an online backup and waits for the background process to end. The forked background process keeps receiving WAL records, and whenever it reaches end of WAL file, it checks whether it has already received all WAL files required for the backup, and exits if yes. Which means that at least one WAL segment switch is required for pg_basebackup with -x stream option to end. In the backup from the master, WAL file switch always occurs at both start and end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the above logic works fine even if there is no traffic. OTOH, in the backup from the standby, while there is no traffic, WAL file switch is not performed at all. So in that case, there is no chance that the background process reaches end of WAL file, check whether all required WAL arrives and exit. At the end, pg_basebackup gets stuck. To fix the problem, I'd propose to change the background process so that it checks whether all required WAL has arrived, every time data is received, even if end of WAL file is not reached. Patch attached. Comments? This seems like a good thing in general. Why does it need to modify pg_receivexlog, though? I thought only pg_basebackup had tihs issue? I guess it is because of the change of the API to stream_continue_callback only? Yes, that's the reason why I changed continue_streaming() in pg_receivexlog.c. But the reason why I changed segment_callback() in pg_receivexlog.c is not the same. I did that because previously segment_finish_callback is called only at the end of WAL segment but in the patch it can be called at the middle of segment. OTOH, segment_callback() must emit a verbose message only when current WAL segment is complete. So I had to add the check of whether current WAL segment is partial or complete into segment_callback(). Yeah, I caught that. Looking at it after your patch, stream_continue_callback and segment_finish_callback are the same. Should we perhaps just fold them into a single stream_continue_callback? Since you had to move the detect segment end to the caller anyway? No. I think we cannot do that because in pg_receivexlog they are not the same. But couldn't they be made the same by making the same check as you put in for the verbose message above? While reviewing and cleaning this patch up a bit I noticed it actually broke pg_receivexlog in the renaming. Here is a new version of the patch, reworked based on the above so we're down to a single callback. I moved the rename last segment file even if it's not complete to be a parameter into ReceiveXlogStream() instead of trying to overload a third functionality on the callback (which is what broke pg_receivexlog). How does this look? Have I overlooked any cases? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ xlog_stream2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck
On Tue, Feb 28, 2012 at 09:22, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander mag...@hagander.net wrote: On Tue, Feb 7, 2012 at 12:30, Fujii Masao masao.fu...@gmail.com wrote: Hi, http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/ =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication xlog start point: 2/AC4E2600 pg_basebackup: starting background WAL receiver 692447/692447 kB (100%), 1/1 tablespace xlog end point: 2/AC4E2600 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: base backup completed real 3m56.237s user 0m0.224s sys 0m0.936s (time is long because this is only test database with no traffic, so I had to make some inserts for it to finish) The above article points out the problem of pg_basebackup from the standby: when -x stream is specified, pg_basebackup from the standby gets stuck if there is no traffic in the database. When -x stream is specified, pg_basebackup forks the background process for receiving WAL records during backup, takes an online backup and waits for the background process to end. The forked background process keeps receiving WAL records, and whenever it reaches end of WAL file, it checks whether it has already received all WAL files required for the backup, and exits if yes. Which means that at least one WAL segment switch is required for pg_basebackup with -x stream option to end. In the backup from the master, WAL file switch always occurs at both start and end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the above logic works fine even if there is no traffic. OTOH, in the backup from the standby, while there is no traffic, WAL file switch is not performed at all. So in that case, there is no chance that the background process reaches end of WAL file, check whether all required WAL arrives and exit. At the end, pg_basebackup gets stuck. To fix the problem, I'd propose to change the background process so that it checks whether all required WAL has arrived, every time data is received, even if end of WAL file is not reached. Patch attached. Comments? This seems like a good thing in general. Why does it need to modify pg_receivexlog, though? I thought only pg_basebackup had tihs issue? I guess it is because of the change of the API to stream_continue_callback only? Yes, that's the reason why I changed continue_streaming() in pg_receivexlog.c. But the reason why I changed segment_callback() in pg_receivexlog.c is not the same. I did that because previously segment_finish_callback is called only at the end of WAL segment but in the patch it can be called at the middle of segment. OTOH, segment_callback() must emit a verbose message only when current WAL segment is complete. So I had to add the check of whether current WAL segment is partial or complete into segment_callback(). Yeah, I caught that. Looking at it after your patch, stream_continue_callback and segment_finish_callback are the same. Should we perhaps just fold them into a single stream_continue_callback? Since you had to move the detect segment end to the caller anyway? No. I think we cannot do that because in pg_receivexlog they are not the same. But couldn't they be made the same by making the same check as you put in for the verbose message above? Another question related to this - since we clearly don't need the xlog switch in this case, should we make it conditional on the master as well, so we don't switch unnecessarily there as well? Maybe. At the end of backup, we force WAL segment switch, to ensure all required WAL files have been archived. So theoretically if WAL archiving is not enabled, we can skip WAL segment switch. But some backup tools might depend on this behavior I was thinking we could keep doing it in pg_stop_backup(), but avoid doing it when using pg_basebackup only... In standby-only backup, we always skip WAL segment switch. So there is no guarantee that all WAL files required for the backup are archived at the end of backup. This limitation is documented. Right. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck
On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander mag...@hagander.net wrote: On Tue, Feb 7, 2012 at 12:30, Fujii Masao masao.fu...@gmail.com wrote: Hi, http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/ =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication xlog start point: 2/AC4E2600 pg_basebackup: starting background WAL receiver 692447/692447 kB (100%), 1/1 tablespace xlog end point: 2/AC4E2600 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: base backup completed real 3m56.237s user 0m0.224s sys 0m0.936s (time is long because this is only test database with no traffic, so I had to make some inserts for it to finish) The above article points out the problem of pg_basebackup from the standby: when -x stream is specified, pg_basebackup from the standby gets stuck if there is no traffic in the database. When -x stream is specified, pg_basebackup forks the background process for receiving WAL records during backup, takes an online backup and waits for the background process to end. The forked background process keeps receiving WAL records, and whenever it reaches end of WAL file, it checks whether it has already received all WAL files required for the backup, and exits if yes. Which means that at least one WAL segment switch is required for pg_basebackup with -x stream option to end. In the backup from the master, WAL file switch always occurs at both start and end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the above logic works fine even if there is no traffic. OTOH, in the backup from the standby, while there is no traffic, WAL file switch is not performed at all. So in that case, there is no chance that the background process reaches end of WAL file, check whether all required WAL arrives and exit. At the end, pg_basebackup gets stuck. To fix the problem, I'd propose to change the background process so that it checks whether all required WAL has arrived, every time data is received, even if end of WAL file is not reached. Patch attached. Comments? This seems like a good thing in general. Why does it need to modify pg_receivexlog, though? I thought only pg_basebackup had tihs issue? I guess it is because of the change of the API to stream_continue_callback only? Yes, that's the reason why I changed continue_streaming() in pg_receivexlog.c. But the reason why I changed segment_callback() in pg_receivexlog.c is not the same. I did that because previously segment_finish_callback is called only at the end of WAL segment but in the patch it can be called at the middle of segment. OTOH, segment_callback() must emit a verbose message only when current WAL segment is complete. So I had to add the check of whether current WAL segment is partial or complete into segment_callback(). Looking at it after your patch, stream_continue_callback and segment_finish_callback are the same. Should we perhaps just fold them into a single stream_continue_callback? Since you had to move the detect segment end to the caller anyway? No. I think we cannot do that because in pg_receivexlog they are not the same. Another question related to this - since we clearly don't need the xlog switch in this case, should we make it conditional on the master as well, so we don't switch unnecessarily there as well? Maybe. At the end of backup, we force WAL segment switch, to ensure all required WAL files have been archived. So theoretically if WAL archiving is not enabled, we can skip WAL segment switch. But some backup tools might depend on this behavior In standby-only backup, we always skip WAL segment switch. So there is no guarantee that all WAL files required for the backup are archived at the end of backup. This limitation is documented. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck
On Tue, Feb 7, 2012 at 12:30, Fujii Masao masao.fu...@gmail.com wrote: Hi, http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/ =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication xlog start point: 2/AC4E2600 pg_basebackup: starting background WAL receiver 692447/692447 kB (100%), 1/1 tablespace xlog end point: 2/AC4E2600 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: base backup completed real 3m56.237s user 0m0.224s sys 0m0.936s (time is long because this is only test database with no traffic, so I had to make some inserts for it to finish) The above article points out the problem of pg_basebackup from the standby: when -x stream is specified, pg_basebackup from the standby gets stuck if there is no traffic in the database. When -x stream is specified, pg_basebackup forks the background process for receiving WAL records during backup, takes an online backup and waits for the background process to end. The forked background process keeps receiving WAL records, and whenever it reaches end of WAL file, it checks whether it has already received all WAL files required for the backup, and exits if yes. Which means that at least one WAL segment switch is required for pg_basebackup with -x stream option to end. In the backup from the master, WAL file switch always occurs at both start and end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the above logic works fine even if there is no traffic. OTOH, in the backup from the standby, while there is no traffic, WAL file switch is not performed at all. So in that case, there is no chance that the background process reaches end of WAL file, check whether all required WAL arrives and exit. At the end, pg_basebackup gets stuck. To fix the problem, I'd propose to change the background process so that it checks whether all required WAL has arrived, every time data is received, even if end of WAL file is not reached. Patch attached. Comments? This seems like a good thing in general. Why does it need to modify pg_receivexlog, though? I thought only pg_basebackup had tihs issue? I guess it is because of the change of the API to stream_continue_callback only? Looking at it after your patch, stream_continue_callback and segment_finish_callback are the same. Should we perhaps just fold them into a single stream_continue_callback? Since you had to move the detect segment end to the caller anyway? Another question related to this - since we clearly don't need the xlog switch in this case, should we make it conditional on the master as well, so we don't switch unnecessarily there as well? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_basebackup -x stream from the standby gets stuck
Hi, http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/ =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication xlog start point: 2/AC4E2600 pg_basebackup: starting background WAL receiver 692447/692447 kB (100%), 1/1 tablespace xlog end point: 2/AC4E2600 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: base backup completed real3m56.237s user0m0.224s sys 0m0.936s (time is long because this is only test database with no traffic, so I had to make some inserts for it to finish) The above article points out the problem of pg_basebackup from the standby: when -x stream is specified, pg_basebackup from the standby gets stuck if there is no traffic in the database. When -x stream is specified, pg_basebackup forks the background process for receiving WAL records during backup, takes an online backup and waits for the background process to end. The forked background process keeps receiving WAL records, and whenever it reaches end of WAL file, it checks whether it has already received all WAL files required for the backup, and exits if yes. Which means that at least one WAL segment switch is required for pg_basebackup with -x stream option to end. In the backup from the master, WAL file switch always occurs at both start and end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the above logic works fine even if there is no traffic. OTOH, in the backup from the standby, while there is no traffic, WAL file switch is not performed at all. So in that case, there is no chance that the background process reaches end of WAL file, check whether all required WAL arrives and exit. At the end, pg_basebackup gets stuck. To fix the problem, I'd propose to change the background process so that it checks whether all required WAL has arrived, every time data is received, even if end of WAL file is not reached. Patch attached. Comments? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/src/bin/pg_basebackup/pg_basebackup.c --- b/src/bin/pg_basebackup/pg_basebackup.c *** *** 78,84 static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum); static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum); static void BaseBackup(void); ! static bool segment_callback(XLogRecPtr segendpos, uint32 timeline); #ifdef HAVE_LIBZ static const char * --- 78,84 static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum); static void BaseBackup(void); ! static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline); #ifdef HAVE_LIBZ static const char * *** *** 129,136 usage(void) /* ! * Called in the background process whenever a complete segment of WAL ! * has been received. * On Unix, we check to see if there is any data on our pipe * (which would mean we have a stop position), and if it is, check if * it is time to stop. --- 129,137 /* ! * Called in the background process every time data is received. ! * Also called when the streaming stops to check whether ! * the current log segment can be treated as a complete one. * On Unix, we check to see if there is any data on our pipe * (which would mean we have a stop position), and if it is, check if * it is time to stop. *** *** 138,144 usage(void) * time to stop. */ static bool ! segment_callback(XLogRecPtr segendpos, uint32 timeline) { if (!has_xlogendptr) { --- 139,145 * time to stop. */ static bool ! reached_end_position(XLogRecPtr segendpos, uint32 timeline) { if (!has_xlogendptr) { *** *** 231,237 LogStreamerMain(logstreamer_param * param) { if (!ReceiveXlogStream(param-bgconn, param-startptr, param-timeline, param-sysidentifier, param-xlogdir, ! segment_callback, NULL, standby_message_timeout)) /* * Any errors will already have been reported in the function process, --- 232,238 { if (!ReceiveXlogStream(param-bgconn, param-startptr, param-timeline, param-sysidentifier, param-xlogdir, ! reached_end_position, reached_end_position, standby_message_timeout)) /* * Any errors will already have been reported in the function process, *** a/src/bin/pg_basebackup/pg_receivexlog.c --- b/src/bin/pg_basebackup/pg_receivexlog.c *** *** 71,77 usage(void) static bool segment_callback(XLogRecPtr segendpos, uint32 timeline) { ! if (verbose) fprintf(stderr, _(%s: finished segment at %X/%X (timeline %u)\n), progname, segendpos.xlogid, segendpos.xrecoff, timeline); --- 71,77 static bool segment_callback(XLogRecPtr segendpos, uint32 timeline) { ! if (verbose segendpos.xrecoff % XLOG_SEG_SIZE == 0) fprintf(stderr, _(%s: finished segment