Re: Review of pg_basebackup and pg_receivexlog to use non-blocking socket communication, was: Re: [HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Tue, Jan 22, 2013 at 7:31 AM, Amit Kapila amit.kap...@huawei.com wrote: On Monday, January 21, 2013 6:22 PM Magnus Hagander On Fri, Jan 18, 2013 at 7:50 AM, Amit Kapila amit.kap...@huawei.com wrote: On Wednesday, January 16, 2013 4:02 PM Heikki Linnakangas wrote: On 07.01.2013 16:23, Boszormenyi Zoltan wrote: Since my other patch against pg_basebackup is now committed, this patch doesn't apply cleanly, patch rejects 2 hunks. The fixed up patch is attached. Now that I look at this from a high-level perspective, why are we only worried about timeouts in the Copy-mode and when connecting? The initial checkpoint could take a long time too, and if the server turns into a black hole while the checkpoint is running, pg_basebackup will still hang. Then again, a short timeout on that phase would be a bad idea, because the checkpoint can indeed take a long time. True, but IMO, if somebody wants to take a basebackup, he should do that when the server is not loaded. A lot of installations don't have such an option, because there is no time when the server is not loaded. Good to know about it. I have always heard that customers will run background maintenance activities (Reindex, Vacuum Full, etc.) when the server is less loaded. For example: a. Billing applications in telecom, at night times they can be relatively less loaded. That assumes there is a nighttime. If you're operating in enough timezones, that won't happen. b. Any databases used for Sensex transactions, they will be relatively free once the market is closed. c. Banking solutions, because transactions are done mostly in day times. True. But those are definitely very very narrow usecases ;) Don't get me wrong. There are a *lot* of people who have nighttimes to do maintenance in. They are the lucky ones :) But we can't assume this scenario. There will be many cases where the database server will be loaded all the time; if you can give some example, it will be a good learning for me. 
Most internet based businesses that do business in multiple countries. Or really, any business that has customers in multiple timezones across the world. And even more to the point, any business whose *customers* have customers in multiple timezones across the world, provided they are services-based. In streaming replication, the keep-alive messages carry additional information, the timestamps and WAL locations, so a keepalive makes sense at that level. But otherwise, aren't we just trying to reimplement TCP keepalives? TCP keepalives are not perfect, but if we want to have an application level timeout, it should be implemented in the FE/BE protocol. I don't think we need to do anything specific to pg_basebackup. The user can simply specify TCP keepalive settings in the connection string, like with any libpq program. I think currently the user has no way to specify TCP keepalive settings from pg_basebackup, please let me know if there is any such existing way? You can set it through environment variables. As was discussed elsewhere, it would be good to have the ability to do it natively in pg_basebackup as well. Sure, already modifying the existing patch to support a connection string in pg_basebackup and pg_receivexlog. Good. I think specifying TCP settings is very cumbersome for most users; that's the reason most standard interfaces (ODBC/JDBC) have such an application level timeout mechanism. By implementing it in the FE/BE protocol (do you mean to say making such non-blocking behavior part of libpq, or something else?), it might be generic and usable by others as well, but it might need a few interface changes. If it's specifying them that is cumbersome, then that's the part we should fix, rather than modifying the protocol, no? That can be done as part of point 2 of the initial proposal (2. Support recv_timeout separately to provide a way to users who are not comfortable with tcp keepalives). 
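As a concrete illustration of the connection-string route discussed above: libpq already understands TCP keepalive and connect-timeout settings as ordinary conninfo options, so once pg_basebackup and pg_receivexlog accept a connection string (as this thread proposes), nothing new is needed at the protocol level for that part. A hedged sketch; the host name, user, and timeout values are made-up examples, and the -d/--dbname connection-string form is assumed to exist:

```shell
# Hypothetical invocation: connect_timeout and the keepalives* options are
# standard libpq conninfo parameters; host/user/values are examples only.
pg_basebackup -D /backups/base \
  -d "host=primary.example.com user=replicator connect_timeout=10 \
      keepalives=1 keepalives_idle=30 keepalives_interval=10 keepalives_count=3"

# The connect timeout can also come from libpq's environment:
PGCONNECT_TIMEOUT=10 pg_receivexlog -D /backups/wal -h primary.example.com
```

Note that the keepalive parameters control detection of a dead peer at the TCP level, which is exactly the "reimplementing TCP keepalives" overlap Heikki points out.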
Looking at the bigger picture, we should in that case support those on *all* our frontend applications, and not just pg_basebackup. To me, it makes more sense to just say use the connection string method to connect when you need to set these parameters. There are always going to be some parameters that require that. To achieve this there can be 2 ways. 1. Change the FE/BE protocol - I am not sure exactly how this can be done, but as per Heikki this is the better way of implementing it. 2. Make the socket non-blocking in pg_basebackup. The advantage of Approach-1 is that if it is addressed in the lower layers (libpq), then all other apps (pg_basebackup, etc.) can use it, with no need to handle it separately in each application. So now, as the changes in Approach-1 seem to be invasive, we decided to do it later. Ok - I haven't really been following the thread, but that doesn't seem unreasonable. The thing I was objecting to is putting in special parameters to
Re: Review of pg_basebackup and pg_receivexlog to use non-blocking socket communication, was: Re: [HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Fri, Jan 18, 2013 at 7:50 AM, Amit Kapila amit.kap...@huawei.com wrote: On Wednesday, January 16, 2013 4:02 PM Heikki Linnakangas wrote: On 07.01.2013 16:23, Boszormenyi Zoltan wrote: Since my other patch against pg_basebackup is now committed, this patch doesn't apply cleanly, patch rejects 2 hunks. The fixed up patch is attached. Now that I look at this from a high-level perspective, why are we only worried about timeouts in the Copy-mode and when connecting? The initial checkpoint could take a long time too, and if the server turns into a black hole while the checkpoint is running, pg_basebackup will still hang. Then again, a short timeout on that phase would be a bad idea, because the checkpoint can indeed take a long time. True, but IMO, if somebody wants to take a basebackup, he should do that when the server is not loaded. A lot of installations don't have such an option, because there is no time when the server is not loaded. In streaming replication, the keep-alive messages carry additional information, the timestamps and WAL locations, so a keepalive makes sense at that level. But otherwise, aren't we just trying to reimplement TCP keepalives? TCP keepalives are not perfect, but if we want to have an application level timeout, it should be implemented in the FE/BE protocol. I don't think we need to do anything specific to pg_basebackup. The user can simply specify TCP keepalive settings in the connection string, like with any libpq program. I think currently the user has no way to specify TCP keepalive settings from pg_basebackup, please let me know if there is any such existing way? You can set it through environment variables. As was discussed elsewhere, it would be good to have the ability to do it natively in pg_basebackup as well. I think specifying TCP settings is very cumbersome for most users; that's the reason most standard interfaces (ODBC/JDBC) have such an application level timeout mechanism. 
By implementing it in the FE/BE protocol (do you mean to say making such non-blocking behavior part of libpq, or something else?), it might be generic and usable by others as well, but it might need a few interface changes. If it's specifying them that is cumbersome, then that's the part we should fix, rather than modifying the protocol, no? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Review of pg_basebackup and pg_receivexlog to use non-blocking socket communication, was: Re: [HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Monday, January 21, 2013 6:22 PM Magnus Hagander On Fri, Jan 18, 2013 at 7:50 AM, Amit Kapila amit.kap...@huawei.com wrote: On Wednesday, January 16, 2013 4:02 PM Heikki Linnakangas wrote: On 07.01.2013 16:23, Boszormenyi Zoltan wrote: Since my other patch against pg_basebackup is now committed, this patch doesn't apply cleanly, patch rejects 2 hunks. The fixed up patch is attached. Now that I look at this from a high-level perspective, why are we only worried about timeouts in the Copy-mode and when connecting? The initial checkpoint could take a long time too, and if the server turns into a black hole while the checkpoint is running, pg_basebackup will still hang. Then again, a short timeout on that phase would be a bad idea, because the checkpoint can indeed take a long time. True, but IMO, if somebody wants to take a basebackup, he should do that when the server is not loaded. A lot of installations don't have such an option, because there is no time when the server is not loaded. Good to know about it. I have always heard that customers will run background maintenance activities (Reindex, Vacuum Full, etc.) when the server is less loaded. For example: a. Billing applications in telecom, at night times they can be relatively less loaded. b. Any databases used for Sensex transactions, they will be relatively free once the market is closed. c. Banking solutions, because transactions are done mostly in day times. There will be many cases where the database server will be loaded all the time; if you can give some example, it will be a good learning for me. In streaming replication, the keep-alive messages carry additional information, the timestamps and WAL locations, so a keepalive makes sense at that level. But otherwise, aren't we just trying to reimplement TCP keepalives? TCP keepalives are not perfect, but if we want to have an application level timeout, it should be implemented in the FE/BE protocol. I don't think we need to do anything specific to pg_basebackup. 
The user can simply specify TCP keepalive settings in the connection string, like with any libpq program. I think currently the user has no way to specify TCP keepalive settings from pg_basebackup, please let me know if there is any such existing way? You can set it through environment variables. As was discussed elsewhere, it would be good to have the ability to do it natively in pg_basebackup as well. Sure, already modifying the existing patch to support a connection string in pg_basebackup and pg_receivexlog. I think specifying TCP settings is very cumbersome for most users; that's the reason most standard interfaces (ODBC/JDBC) have such an application level timeout mechanism. By implementing it in the FE/BE protocol (do you mean to say making such non-blocking behavior part of libpq, or something else?), it might be generic and usable by others as well, but it might need a few interface changes. If it's specifying them that is cumbersome, then that's the part we should fix, rather than modifying the protocol, no? That can be done as part of point 2 of the initial proposal (2. Support recv_timeout separately to provide a way to users who are not comfortable with tcp keepalives). To achieve this there can be 2 ways. 1. Change the FE/BE protocol - I am not sure exactly how this can be done, but as per Heikki this is the better way of implementing it. 2. Make the socket non-blocking in pg_basebackup. The advantage of Approach-1 is that if it is addressed in the lower layers (libpq), then all other apps (pg_basebackup, etc.) can use it, with no need to handle it separately in each application. So now, as the changes in Approach-1 seem to be invasive, we decided to do it later. With Regards, Amit Kapila.
Re: Review of pg_basebackup and pg_receivexlog to use non-blocking socket communication, was: Re: [HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Wednesday, January 16, 2013 4:02 PM Heikki Linnakangas wrote: On 07.01.2013 16:23, Boszormenyi Zoltan wrote: Since my other patch against pg_basebackup is now committed, this patch doesn't apply cleanly, patch rejects 2 hunks. The fixed up patch is attached. Now that I look at this from a high-level perspective, why are we only worried about timeouts in the Copy-mode and when connecting? The initial checkpoint could take a long time too, and if the server turns into a black hole while the checkpoint is running, pg_basebackup will still hang. Then again, a short timeout on that phase would be a bad idea, because the checkpoint can indeed take a long time. True, but IMO, if somebody wants to take a basebackup, he should do that when the server is not loaded. In streaming replication, the keep-alive messages carry additional information, the timestamps and WAL locations, so a keepalive makes sense at that level. But otherwise, aren't we just trying to reimplement TCP keepalives? TCP keepalives are not perfect, but if we want to have an application level timeout, it should be implemented in the FE/BE protocol. I don't think we need to do anything specific to pg_basebackup. The user can simply specify TCP keepalive settings in the connection string, like with any libpq program. I think currently the user has no way to specify TCP keepalive settings from pg_basebackup, please let me know if there is any such existing way? I think specifying TCP settings is very cumbersome for most users; that's the reason most standard interfaces (ODBC/JDBC) have such an application level timeout mechanism. By implementing it in the FE/BE protocol (do you mean to say making such non-blocking behavior part of libpq, or something else?), it might be generic and usable by others as well, but it might need a few interface changes. IMHO if such low-impact changes make pg_basebackup network-sensitive, the current approach can also be considered. With Regards, Amit Kapila. 
Re: Review of pg_basebackup and pg_receivexlog to use non-blocking socket communication, was: Re: [HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On 07.01.2013 16:23, Boszormenyi Zoltan wrote: Since my other patch against pg_basebackup is now committed, this patch doesn't apply cleanly, patch rejects 2 hunks. The fixed up patch is attached. Now that I look at this from a high-level perspective, why are we only worried about timeouts in the Copy-mode and when connecting? The initial checkpoint could take a long time too, and if the server turns into a black hole while the checkpoint is running, pg_basebackup will still hang. Then again, a short timeout on that phase would be a bad idea, because the checkpoint can indeed take a long time. In streaming replication, the keep-alive messages carry additional information, the timestamps and WAL locations, so a keepalive makes sense at that level. But otherwise, aren't we just trying to reimplement TCP keepalives? TCP keepalives are not perfect, but if we want to have an application level timeout, it should be implemented in the FE/BE protocol. I don't think we need to do anything specific to pg_basebackup. The user can simply specify TCP keepalive settings in the connection string, like with any libpq program. - Heikki
Re: Review of pg_basebackup and pg_receivexlog to use non-blocking socket communication, was: Re: [HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On January 07, 2013 7:53 PM Boszormenyi Zoltan wrote: Since my other patch against pg_basebackup is now committed, this patch doesn't apply cleanly, patch rejects 2 hunks. The fixed up patch is attached. Patch is verified. Thanks for rebasing the patch. Regards, Hari babu.
Re: Review of pg_basebackup and pg_receivexlog to use non-blocking socket communication, was: Re: [HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On 2013-01-04 13:43, Hari Babu wrote: On January 02, 2013 12:41 PM Hari Babu wrote: On January 01, 2013 10:19 PM Boszormenyi Zoltan wrote: I am reviewing your patch. • Is the patch in context diff format? Yes. Thanks for reviewing the patch. • Does it apply cleanly to the current git master? Not quite cleanly but it doesn't produce rejects or fuzz, only offset warnings: Will rebase the patch to head. • Does it include reasonable tests, necessary doc patches, etc? The test cases are not applicable. There is no test framework for testing network outage in make check. There are no documentation patches for the new --recvtimeout=INTERVAL and --conntimeout=INTERVAL options for either pg_basebackup or pg_receivexlog. I will add the documentation for the same. Per the previous comment, no. But those are for the backend to notice network breakdowns and as such, they need a separate patch. I also think it is better to handle it as a separate patch for walsender. • Are the comments sufficient and accurate? This chunk below removes a comment which seems obvious enough so it's not needed:

***************
*** 518,524 **** ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
  			goto error;
  		}

! 		/* Check the message type. */
  		if (copybuf[0] == 'k')
  		{
  			int			pos;
--- 559,568 ----
  			goto error;
  		}

! 		/* Set the last reply timestamp */
! 		last_recv_timestamp = localGetCurrentTimestamp();
! 		ping_sent = false;
!
  		if (copybuf[0] == 'k')
  		{
  			int			pos;
***************

Other comments are sufficient and accurate. I will fix and update the patch. The attached V2 patch in the mail handles all the review comments identified above. Regards, Hari babu. Since my other patch against pg_basebackup is now committed, this patch doesn't apply cleanly, patch rejects 2 hunks. The fixed up patch is attached. 
Best regards, Zoltán Böszörményi

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de http://www.postgresql.at/

diff -dcrpN postgresql.orig/doc/src/sgml/ref/pg_basebackup.sgml postgresql/doc/src/sgml/ref/pg_basebackup.sgml
*** postgresql.orig/doc/src/sgml/ref/pg_basebackup.sgml	2013-01-05 17:34:30.742135371 +0100
--- postgresql/doc/src/sgml/ref/pg_basebackup.sgml	2013-01-07 15:11:40.787007890 +0100
*************** PostgreSQL documentation
*** 400,405 ****
--- 400,425 ----
      </varlistentry>

      <varlistentry>
+      <term><option>-r <replaceable class="parameter">interval</replaceable></option></term>
+      <term><option>--recvtimeout=<replaceable class="parameter">interval</replaceable></option></term>
+      <listitem>
+       <para>
+        time that receiver waits for communication from server (in seconds).
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">interval</replaceable></option></term>
+      <term><option>--conntimeout=<replaceable class="parameter">interval</replaceable></option></term>
+      <listitem>
+       <para>
+        time that client wait for connection to establish with server (in seconds).
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>-U <replaceable>username</replaceable></option></term>
       <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
       <listitem>
diff -dcrpN postgresql.orig/doc/src/sgml/ref/pg_receivexlog.sgml postgresql/doc/src/sgml/ref/pg_receivexlog.sgml
*** postgresql.orig/doc/src/sgml/ref/pg_receivexlog.sgml	2012-11-08 13:13:04.152630639 +0100
--- postgresql/doc/src/sgml/ref/pg_receivexlog.sgml	2013-01-07 15:11:40.788007898 +0100
*************** PostgreSQL documentation
*** 164,169 ****
--- 164,189 ----
      </varlistentry>

      <varlistentry>
+      <term><option>-r <replaceable class="parameter">interval</replaceable></option></term>
+      <term><option>--recvtimeout=<replaceable class="parameter">interval</replaceable></option></term>
+      <listitem>
+       <para>
+        time that receiver waits for communication from server (in seconds).
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-t <replaceable class="parameter">interval</replaceable></option></term>
+      <term><option>--conntimeout=<replaceable class="parameter">interval</replaceable></option></term>
+      <listitem>
+       <para>
+        time that client wait for connection to establish with server (in seconds).
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>-U <replaceable>username</replaceable></option></term>
       <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
Re: Review of pg_basebackup and pg_receivexlog to use non-blocking socket communication, was: Re: [HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On January 02, 2013 12:41 PM Hari Babu wrote: On January 01, 2013 10:19 PM Boszormenyi Zoltan wrote: I am reviewing your patch. Is the patch in context diff format? Yes. Thanks for reviewing the patch. Does it apply cleanly to the current git master? Not quite cleanly but it doesn't produce rejects or fuzz, only offset warnings: Will rebase the patch to head. Does it include reasonable tests, necessary doc patches, etc? The test cases are not applicable. There is no test framework for testing network outage in make check. There are no documentation patches for the new --recvtimeout=INTERVAL and --conntimeout=INTERVAL options for either pg_basebackup or pg_receivexlog. I will add the documentation for the same. Per the previous comment, no. But those are for the backend to notice network breakdowns and as such, they need a separate patch. I also think it is better to handle it as a separate patch for walsender. Are the comments sufficient and accurate? This chunk below removes a comment which seems obvious enough so it's not needed:

***************
*** 518,524 **** ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
  			goto error;
  		}

! 		/* Check the message type. */
  		if (copybuf[0] == 'k')
  		{
  			int			pos;
--- 559,568 ----
  			goto error;
  		}

! 		/* Set the last reply timestamp */
! 		last_recv_timestamp = localGetCurrentTimestamp();
! 		ping_sent = false;
!
  		if (copybuf[0] == 'k')
  		{
  			int			pos;
***************

Other comments are sufficient and accurate. I will fix and update the patch. The attached V2 patch in the mail handles all the review comments identified above. Regards, Hari babu. [Attachment: pg_basebkup_recvxlog_noblock_comm_v2.patch]
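The last_recv_timestamp / ping_sent bookkeeping in the quoted hunk follows walreceiver's convention: after a stretch of silence, send one ping; if the full receive timeout elapses with no reply, give up. A condensed sketch of that decision logic; check_recv_timeout and the half-interval ping point are illustrative assumptions, not the patch's exact code:

```c
#include <stdbool.h>

typedef enum
{
	KEEP_WAITING,				/* traffic seen recently enough */
	SEND_PING,					/* prolonged silence: ask the server to reply once */
	CONN_DEAD					/* whole timeout elapsed: report breakdown */
} timeout_action;

/*
 * Walreceiver-style timeout bookkeeping (illustrative sketch): "now" and
 * "last_recv" are timestamps in seconds, recv_timeout the -r interval.
 * Pinging at half the interval is an assumption for this example.
 */
timeout_action
check_recv_timeout(long now, long last_recv, long recv_timeout, bool ping_sent)
{
	if (now - last_recv >= recv_timeout)
		return CONN_DEAD;
	if (!ping_sent && now - last_recv >= recv_timeout / 2)
		return SEND_PING;
	return KEEP_WAITING;
}
```

On every message received the caller resets last_recv and clears ping_sent, which is exactly what the quoted hunk does at the top of the copy loop.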
Review of pg_basebackup and pg_receivexlog to use non-blocking socket communication, was: Re: [HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Hi, On 2012-11-15 14:59, Amit kapila wrote: On Monday, November 12, 2012 8:23 PM Fujii Masao wrote: On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote: On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote: On 19.10.2012 14:42, Amit kapila wrote: On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote: Are you planning to introduce the timeout mechanism in pg_basebackup main process? Or background process? It's useful to implement both. By background process, you mean ReceiveXlogStream? For both. I think for the background process, it can be done in a way similar to what we have done for walreceiver. Yes. But I have some doubts about how to do it for the main process: Logic similar to walreceiver can not be used in case the network goes down while getting other database files from the server. The reason for the same is that to receive the data files PQgetCopyData() is called in synchronous mode, so it keeps waiting for infinite time till it gets some data. In order to solve this issue, I can think of the following options: 1. Making this call also asynchronous (but not sure about the impact of this). +1 Walreceiver already calls PQgetCopyData() asynchronously. ISTM you can solve the issue in a similar way to walreceiver's. 2. In function pqWait, instead of passing the hard-coded value -1 (i.e. infinite wait), we can send some finite time. This time can be received as a command line argument from the respective utility and set the same in the PGconn structure. Yes, I think that we should add something like a --conninfo option to pg_basebackup and pg_receivexlog. We can easily set not only connect_timeout but also sslmode, application_name, ... by using such an option accepting a conninfo string. I have prepared an attached patch to make pg_basebackup and pg_receivexlog non-blocking. 
To do so I have to add new command line parameters in pg_basebackup and pg_receivexlog; for now I added two more command line arguments: a. -r for pg_basebackup and pg_receivexlog to take the receive time-out value. Default value for this parameter is 60 sec. b. -t for pg_basebackup and pg_receivexlog to take the initial connection timeout value. Default value is infinite wait. We can change to accept --conninfo as well. I feel that apart from the above, the remaining problem is the function call PQgetResult(). 1. Wherever a query is sent from BaseBackup, it calls the function PQgetResult to receive the result of the query. As PQgetResult() is a blocking function (it calls pqWait which can hang), if the network is down before sending the query itself, then there will not be any result, so it will keep hanging in PQgetResult. IMO, it can be solved in the below ways: a. Create one corresponding non-blocking function. But this function is being called from inside some of the other libpq functions (PQexec-PQexecFinish-PQgetResult). So it can be a little tricky to solve this way. b. Add the receive_timeout variable in the PGconn structure and use it in pqWait for timeout whenever it is set. c. any other better way? BTW, IIRC the walsender has no timeout mechanism during sending backup data to pg_basebackup. So it's also useful to implement the timeout mechanism for the walsender during backup. What about using pq_putmessage_noblock()? I think maybe some more functions also need to be made noblock. I am still evaluating. I will upload the attached patch in commitfest if you don't have any objections? More Suggestions/Comments? With Regards, Amit Kapila. I am reviewing your patch. * Is the patch in context diff format http://en.wikipedia.org/wiki/Diff#Context_format? Yes. * Does it apply cleanly to the current git master? 
Not quite cleanly but it doesn't produce rejects or fuzz, only offset warnings:

[zozo@localhost postgresql]$ cat ../noblock_basebackup_and_receivexlog.patch | patch -p1
patching file src/bin/pg_basebackup/pg_basebackup.c
Hunk #1 succeeded at 41 (offset -6 lines).
Hunk #2 succeeded at 123 (offset -6 lines).
Hunk #3 succeeded at 239 (offset -6 lines).
Hunk #4 succeeded at 292 (offset -6 lines).
Hunk #5 succeeded at 470 (offset -6 lines).
Hunk #6 succeeded at 588 (offset -6 lines).
Hunk #7 succeeded at 601 (offset -6 lines).
Hunk #8 succeeded at 727 (offset -6 lines).
Hunk #9 succeeded at 779 (offset -6 lines).
Hunk #10 succeeded at 797 (offset -6 lines).
Hunk #11 succeeded at 811 (offset -6 lines).
Hunk #12 succeeded at 879 (offset -6 lines).
Hunk #13 succeeded at 1080 (offset -6 lines).
Hunk #14 succeeded at 1381 (offset -6 lines).
Hunk #15 succeeded at 1409 (offset -6 lines).
Hunk #16 succeeded at 1521 (offset -6 lines).
patching file src/bin/pg_basebackup/pg_receivexlog.c
Hunk #1 succeeded at 35 (offset -6 lines).
Hunk #2 succeeded at 65 (offset -6 lines).
Hunk #3 succeeded at 224 (offset -6 lines).
Hunk #4 succeeded at 281 (offset -6
Re: Review of pg_basebackup and pg_receivexlog to use non-blocking socket communication, was: Re: [HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On January 01, 2013 10:19 PM Boszormenyi Zoltan wrote: I am reviewing your patch. Is the patch in context diff format? Yes. Thanks for reviewing the patch. Does it apply cleanly to the current git master? Not quite cleanly but it doesn't produce rejects or fuzz, only offset warnings: Will rebase the patch to head. Does it include reasonable tests, necessary doc patches, etc? The test cases are not applicable. There is no test framework for testing network outage in make check. There are no documentation patches for the new --recvtimeout=INTERVAL and --conntimeout=INTERVAL options for either pg_basebackup or pg_receivexlog. I will add the documentation for the same. Per the previous comment, no. But those are for the backend to notice network breakdowns and as such, they need a separate patch. I also think it is better to handle it as a separate patch for walsender. Are the comments sufficient and accurate? This chunk below removes a comment which seems obvious enough so it's not needed:

***************
*** 518,524 **** ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
  			goto error;
  		}

! 		/* Check the message type. */
  		if (copybuf[0] == 'k')
  		{
  			int			pos;
--- 559,568 ----
  			goto error;
  		}

! 		/* Set the last reply timestamp */
! 		last_recv_timestamp = localGetCurrentTimestamp();
! 		ping_sent = false;
!
  		if (copybuf[0] == 'k')
  		{
  			int			pos;
***************

Other comments are sufficient and accurate. I will fix and update the patch. Please let me know if anything apart from the above needs to be taken care of. Regards, Hari babu.
Re: [HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Thursday, November 15, 2012 7:29 PM Amit kapila wrote: On Monday, November 12, 2012 8:23 PM Fujii Masao wrote: On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote: On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote: On 19.10.2012 14:42, Amit kapila wrote: On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote: Are you planning to introduce the timeout mechanism in pg_basebackup I feel that apart from the above, the remaining problem is the function call PQgetResult(). 1. Wherever a query is sent from BaseBackup, it calls the function PQgetResult to receive the result of the query. As PQgetResult() is a blocking function (it calls pqWait which can hang), if the network is down before sending the query itself, then there will not be any result, so it will keep hanging in PQgetResult. IMO, it can be solved in the below ways: a. Create one corresponding non-blocking function. But this function is being called from inside some of the other libpq functions (PQexec-PQexecFinish-PQgetResult). So it can be a little tricky to solve this way. b. Add the receive_timeout variable in the PGconn structure and use it in pqWait for timeout whenever it is set. c. any other better way? BTW, IIRC the walsender has no timeout mechanism during sending backup data to pg_basebackup. So it's also useful to implement the timeout mechanism for the walsender during backup. What about using pq_putmessage_noblock()? I think maybe some more functions also need to be made noblock. I am still evaluating. Done the analysis, and it seems that for the below APIs also we need equivalent noblock versions, otherwise the same problem can happen as they are also used in the flow. a. pq_endmessage b. EndCommand c. pq_puttextmessage d. pq_putemptymessage e. ReadyForQuery - For this, because now walsender and normal backend are same. f. 
ReadCommand - For this, because now walsender and normal backend are same. It seems the solution for it can be tricky as pq_getbyte is not called from a first level function. Suggestions/Thoughts? With Regards, Amit Kapila.
[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Monday, November 12, 2012 8:23 PM Fujii Masao wrote: On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote: On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote: On 19.10.2012 14:42, Amit kapila wrote: On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote: Are you planning to introduce the timeout mechanism in the pg_basebackup main process? Or the background process? It's useful to implement both. By background process, you mean ReceiveXlogStream? For both. I think for the background process, it can be done in a way similar to what we have done for walreceiver. Yes. But I have some doubts about how to do it for the main process: logic similar to walreceiver's cannot be used in case the network goes down while getting the other database files from the server. The reason is that, to receive the data files, PQgetCopyData() is called in synchronous mode, so it waits indefinitely until it gets some data. In order to solve this issue, I can think of the following options: 1. Making this call also asynchronous (but not sure about the impact of this). +1 Walreceiver already calls PQgetCopyData() asynchronously. ISTM you can solve the issue in a similar way to walreceiver's. 2. In function pqWait, instead of passing the hard-coded value -1 (i.e. infinite wait), we can pass some finite time. This time can be received as a command-line argument from the respective utility and set in the PGconn structure. Yes, I think that we should add something like a --conninfo option to pg_basebackup and pg_receivexlog. We can easily set not only connect_timeout but also sslmode, application_name, ... by using such an option accepting a conninfo string. I have prepared the attached patch to make pg_basebackup and pg_receivexlog non-blocking.
To do so I had to add new command-line parameters in pg_basebackup and pg_receivexlog; for now I have added two more arguments: a. -r for pg_basebackup and pg_receivexlog to take the receive-timeout value. The default value for this parameter is 60 sec. b. -t for pg_basebackup and pg_receivexlog to take the initial connection-timeout value. The default value is infinite wait. We can change this to accept --conninfo as well. I feel, apart from the above, the remaining problem is with the function call PQgetResult(): 1. Wherever a query is sent from BaseBackup, it calls PQgetResult to receive the result of the query. As PQgetResult() is a blocking function (it calls pqWait, which can hang), if the network goes down before the query is even sent there will be no result, so it will keep hanging in PQgetResult. IMO, it can be solved in the following ways: a. Create a corresponding non-blocking function. But this function is called from inside some of the other libpq functions (PQexec -> PQexecFinish -> PQgetResult), so it can be a little tricky to solve this way. b. Add a receive_timeout variable to the PGconn structure and use it in pqWait for the timeout whenever it is set. c. Any other better way? BTW, IIRC the walsender has no timeout mechanism while sending backup data to pg_basebackup. So it's also useful to implement the timeout mechanism for the walsender during backup. What about using pq_putmessage_noblock()? I think some more functions may also need to be made non-blocking; I am still evaluating. I will upload the attached patch to the commitfest if you don't have any objections. More Suggestions/Comments? With Regards, Amit Kapila. Attachment: noblock_basebackup_and_receivexlog.patch
[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Monday, November 12, 2012 8:23 PM Fujii Masao wrote: On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote: On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote: On 19.10.2012 14:42, Amit kapila wrote: On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote: Before implementing the timeout parameter, I think that it's better to change both pg_basebackup background process and pg_receivexlog so that BTW, IIRC the walsender has no timeout mechanism while sending backup data to pg_basebackup. So it's also useful to implement the timeout mechanism for the walsender during backup. Yes, it's useful, but for walsender the main problem is that it uses a blocking send call to send the data. I have tried using tcp_keepalive settings, but the send call doesn't come out in case of a network break. The only way I could get it out is to change the corresponding file /proc/sys/net/ipv4/tcp_retries2 by using the command: echo 8 > /proc/sys/net/ipv4/tcp_retries2 As per the recommendation, its value should be at least 8 (equivalent to 100 sec). Do you have any idea how it can be achieved? What about using pq_putmessage_noblock()? I will try this, but do you know why blocking mode was used to send files in the first place? I am asking as I am a little concerned that it should not break any design that was initially thought of while making the send of files blocking. With Regards, Amit Kapila.
[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote: On Wed, Oct 17, 2012 at 8:46 PM, Amit Kapila amit.kap...@huawei.com wrote: On Monday, October 15, 2012 3:43 PM Heikki Linnakangas wrote: On 13.10.2012 19:35, Fujii Masao wrote: On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Ok, thanks. Committed. I found one typo. The attached patch fixes that typo. Thanks, fixed. ISTM you need to update protocol.sgml because you added the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage. Is it worth adding the same mechanism (send back the reply immediately if walsender requests a reply) to pg_basebackup and pg_receivexlog? Good catch. Yes, they should be taught about this too. I'll look into doing that too. If you have not started and don't have any objection, I can pick this up to complete it. For both (pg_basebackup and pg_receivexlog), we need to get a timeout parameter from the user on the command line, as there is no conf file here. The new option can be -t (the parameter name can be recvtimeout). The main changes will be in the function ReceiveXlogStream(); it is a common function for both pg_basebackup and pg_receivexlog. Handling will be done the same way as we have done in walreceiver. Suggestions/Comments? Before implementing the timeout parameter, I think that it's better to change both the pg_basebackup background process and pg_receivexlog so that they send back the reply message immediately when they receive a keepalive message requesting a reply. Currently, they always ignore such keepalive messages, so the status interval parameter (-s) in them must always be set to a value less than the replication timeout. We can avoid this troublesome parameter setting by introducing the same logic of walreceiver into both the pg_basebackup background process and pg_receivexlog. Please find the patch attached to address the modification mentioned by you (send an immediate reply for keepalive).
Both pg_basebackup and pg_receivexlog use the same function ReceiveXLogStream, so a single change addresses the issue for both. Now, further to this, for introducing a timeout in pg_basebackup and pg_receivexlog: we can have a mechanism similar to the walreceiver timeout while streaming data from the server, but the same logic cannot be used in case the network goes down while getting the other database files from the server. The reason is that, to receive the data files, PQgetCopyData() is called in synchronous mode, so it waits indefinitely until it gets some data. In order to solve this issue, I can think of the following options: 1. Making this call also asynchronous (but not sure about the impact of this). 2. In function pqWait, instead of passing the hard-coded value -1 (i.e. infinite wait), we can pass some finite time. This time can be received as a command-line argument from the respective utility and set in the PGconn structure. In order to have the timeout value in PGconn, we can: a. Add a new parameter in PGconn to indicate the receive timeout. b. Use the existing parameter connect_timeout for the receive timeout also, but this may lead to confusion. 3. Any other better option? Apart from the above issue, there is a possibility that if the network goes down at connect time, the utility might hang, because connect_timeout by default will be NULL and connectDBComplete will start waiting infinitely for the connection to become successful. So shall we have a separate command-line argument for this also, or any other way you suggest? Suggestions/Comments With Regards, Amit Kapila. Attachment: pg_basebackup_keepalive_reply.patch
[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Thursday, October 11, 2012 8:22 PM Heikki Linnakangas wrote: On 11.10.2012 13:17, Amit Kapila wrote: How does this look now? The patch is fine and test results are also fine. Ok, thanks. Committed. Thank you very much. With Regards, Amit Kapila.
[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Tuesday, October 02, 2012 1:56 PM Heikki Linnakangas wrote: On 02.10.2012 10:36, Amit kapila wrote: On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote: So let's think how this should ideally work from a user's point of view. I think there should be just two settings: walsender_timeout and walreceiver_timeout. walsender_timeout specifies how long a walsender will keep a connection open if it doesn't hear from the walreceiver, and walreceiver_timeout is the same for walreceiver. The system should The Ping/Pong messages don't necessarily need to be new message types; we can use the message types we currently have, perhaps with an additional flag attached to them, to request the other side to reply immediately. Can't we make the decision to send a reply immediately based on the message type, because these message types will be unique? To clarify my understanding: 1. The heartbeat message from the walsender side will be the keepalive message ('k'), and from the walreceiver side it will be the Hot Standby feedback message ('h'). 2. The reply message from the walreceiver side will be the current reply message ('r'). Yep. I wonder why we need separate message types for Hot Standby Feedback 'h' and Reply 'r', though. Seems it would be simpler to have just one message type that includes all the fields from both messages. Moved the contents of Hot Standby Feedback 'h' to Reply 'r' and used 'h' for the heart-beat purpose. 3. Currently there is no reply kind of message from walsender, so do we need to introduce a new message for it, or can we use some existing message? If new, do we need to send any additional information along with it; if existing, can we use the keepalive message itself as the reply message, but with an additional byte to indicate it is a reply? Hmm, I think I'd prefer to use the existing Keepalive message 'k', with an additional flag. Okay. I have done it in the patch. Thank you for the suggestions. I have addressed your suggestions in the patch attached to this mail.
The following changes are done to support replication timeout in the sender as well as the receiver:
1. A new configuration parameter wal_receiver_timeout is added to detect timeout at the receiver task.
2. The existing parameter replication_timeout is renamed to wal_sender_timeout.
3. The PrimaryKeepaliveMessage structure is modified to add one more field to indicate whether the keep-alive is of type 'r' (i.e. reply) or 'h' (i.e. heart-beat).
4. The keep-alive message from the sender will be sent to the standby if the connection was idle for more than or equal to half of wal_sender_timeout. In this case it will send a keep-alive of type 'h'.
5. Once the standby receives a keep-alive, it sends an immediate reply to the primary to indicate the connection is alive.
6. The Reply message (which sends the WAL offset) and the Feedback message (which sends the oldest transaction) are merged into a single Reply message. So the structure StandbyReplyMessage is changed to add two more fields, xmin and epoch, and the StandbyHSFeedbackMessage structure is changed to remove the xmin and epoch fields (as these are moved to StandbyReplyMessage).
7. Because of the changes in step 6, once the receiver task receives some data from the primary, it will only send a Reply message.
8. The same Reply message is sent in step 5 and step 7, but in case of step 5 the reply is sent immediately, while in case of step 7 the reply is sent if wal_receiver_status_interval has elapsed (this part is the same as earlier).
9. Similar to the sender, if the receiver finds itself idle for more than or equal to half of the configured wal_receiver_timeout, it will send the hot-standby heartbeat. This heart-beat has been modified to send only sendTime.
10. Once the sender task receives a heart-beat message from the standby, it sends back the reply immediately. In this case a keep-alive message of type 'r' is sent.
11. If even after wal_sender_timeout no message is received from the standby, it will be considered a network break at the sender task.
12.
If even after wal_receiver_timeout no message is received from the primary, it will be considered a network break at the receiver task. With Regards, Amit Kapila. Attachment: replication_timeout_patch_v3.patch
[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote: On 21.09.2012 14:18, Amit kapila wrote: On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote: On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila amit.kap...@huawei.com wrote: Approach-2: Provide a variable wal_send_status_interval, such that if this is 0, the current behavior prevails, and if it is non-zero a KeepAlive message is sent at most after that time. The modified code of WALSendLoop will be as follows: snip Which way do you think is better, or do you have any other idea to handle it? I think #2 is better because it's more intuitive to a user. Please find a patch attached for the implementation of Approach-2. So let's think how this should ideally work from a user's point of view. I think there should be just two settings: walsender_timeout and walreceiver_timeout. walsender_timeout specifies how long a walsender will keep a connection open if it doesn't hear from the walreceiver, and walreceiver_timeout is the same for walreceiver. The system should figure out itself how often to send keepalive messages so that those timeouts are not reached. This implies that we should remove wal_receiver_status_interval. Currently it is also used for the reply message to data sent by the sender, which reports up to what point the receiver has flushed. So if we remove this variable, the receiver might start sending that message sooner than required. Is that okay behavior? In walsender, after half of walsender_timeout has elapsed and we haven't received anything from the client, the walsender process should send a ping message to the client. Whenever the client receives a Ping, it replies. The walreceiver does the same; when half of walreceiver_timeout has elapsed, send a Ping message to the server.
Each Ping-Pong roundtrip resets the timer on both ends, regardless of which side initiated it, so if e.g. walsender_timeout < walreceiver_timeout, the client will never have to initiate a Ping message, because walsender will always reach the walsender_timeout/2 point first and initiate the heartbeat message. Just to clarify: walsender should reset its timer after it gets a reply from the receiver to the message it sent. walreceiver should reset its timer after sending a reply to the heartbeat message. Similarly, the timers will be reset when the receiver sends the heartbeat message. The Ping/Pong messages don't necessarily need to be new message types; we can use the message types we currently have, perhaps with an additional flag attached to them, to request the other side to reply immediately. Can't we make the decision to send a reply immediately based on the message type, because these message types will be unique? To clarify my understanding: 1. The heartbeat message from the walsender side will be the keepalive message ('k'), and from the walreceiver side it will be the Hot Standby feedback message ('h'). 2. The reply message from the walreceiver side will be the current reply message ('r'). 3. Currently there is no reply kind of message from walsender, so do we need to introduce a new message for it, or can we use some existing message? If new, do we need to send any additional information along with it; if existing, can we use the keepalive message itself as the reply message, but with an additional byte to indicate it is a reply? With Regards, Amit Kapila.
[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Monday, October 01, 2012 8:36 PM Robert Haas wrote: On Mon, Oct 1, 2012 at 6:38 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Hmm, I think we need to step back a bit. I've never liked the way replication_timeout works, where it's the user's responsibility to set wal_receiver_status_interval < replication_timeout. It's not very user-friendly. I'd rather not copy that same design to this walreceiver timeout. If there are two different timeouts like that, it's even worse, because it's easy to confuse the two. I agree, but also note that wal_receiver_status_interval serves another user-visible purpose as well. By the above, do you mean that wal_receiver_status_interval is used for the reply to data sent by the server, to indicate up to what point the receiver has flushed data, or something else? With Regards, Amit Kapila.
[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote: On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila amit.kap...@huawei.com wrote: Approach-2: Provide a variable wal_send_status_interval, such that if this is 0, the current behavior prevails, and if it is non-zero a KeepAlive message is sent at most after that time. The modified code of WALSendLoop will be as follows: snip Which way do you think is better, or do you have any other idea to handle it? I think #2 is better because it's more intuitive to a user. Please find a patch attached for the implementation of Approach-2. With Regards, Amit Kapila. Attachment: replication_timeout_patch_v2.patch
[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Sunday, September 16, 2012 12:14 AM Fujii Masao wrote: On Sat, Sep 15, 2012 at 4:26 PM, Amit kapila amit.kap...@huawei.com wrote: On Saturday, September 15, 2012 11:27 AM Fujii Masao wrote: On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila amit.kap...@huawei.com wrote: On Thursday, September 13, 2012 10:57 PM Fujii Masao On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila amit.kap...@huawei.com wrote: On Wednesday, September 12, 2012 10:15 PM Fujii Masao On Wed, Sep 12, 2012 at 8:54 PM, amit.kap...@huawei.com wrote: The following bug has been logged on the website: I would like to implement such a feature for walreceiver, but there is one point of confusion: whether to use the same configuration parameter (replication_timeout) for walreceiver as for the master, or introduce a new configuration parameter (receiver_replication_timeout). I like the latter. I believe some users want to set different timeout values, for example, in the case where the master and standby servers are placed in the same room, but a cascaded standby is placed on another continent. Thank you for your suggestion. I have implemented it as per your suggestion to have a separate timeout parameter for walreceiver. The main changes are: 1. Introduce a new configuration parameter wal_receiver_replication_timeout for walreceiver. 2. In function WalReceiverMain(), check if there is no communication within wal_receiver_replication_timeout; if so, exit the walreceiver. This is the same as the walsender functionality. As this is a feature, I am uploading the attached patch in the coming CommitFest. Suggestions/Comments? You also need to change walsender so that it periodically sends the heartbeat message, like walreceiver does every wal_receiver_status_interval. Otherwise, walreceiver will wrongly detect a timeout whenever there is no traffic on the master. Won't the current keepalive message from walsender suffice for that need? No.
Though the keepalive interval should be smaller than the timeout, IIRC there is no way to specify the keepalive interval now. Currently, AFAICS in the code, on an idle system it should send a keepalive after 10s, which is a hardcoded value used as the sleep time. You are right that if it's not configurable, and somebody configures replication_timeout to a value lower than 10s, the logic will fail. So is it okay if a new config parameter similar to wal_receiver_status_interval is added and mapped directly to the sleep time in the current code? There will be no need for any new heartbeat message; the existing keepalive will suffice for that purpose. With Regards, Amit Kapila.
[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Saturday, September 15, 2012 11:27 AM Fujii Masao wrote: On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila amit.kap...@huawei.com wrote: On Thursday, September 13, 2012 10:57 PM Fujii Masao On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila amit.kap...@huawei.com wrote: On Wednesday, September 12, 2012 10:15 PM Fujii Masao On Wed, Sep 12, 2012 at 8:54 PM, amit.kap...@huawei.com wrote: The following bug has been logged on the website: I would like to implement such a feature for walreceiver, but there is one point of confusion: whether to use the same configuration parameter (replication_timeout) for walreceiver as for the master, or introduce a new configuration parameter (receiver_replication_timeout). I like the latter. I believe some users want to set different timeout values, for example, in the case where the master and standby servers are placed in the same room, but a cascaded standby is placed on another continent. Thank you for your suggestion. I have implemented it as per your suggestion to have a separate timeout parameter for walreceiver. The main changes are: 1. Introduce a new configuration parameter wal_receiver_replication_timeout for walreceiver. 2. In function WalReceiverMain(), check if there is no communication within wal_receiver_replication_timeout; if so, exit the walreceiver. This is the same as the walsender functionality. As this is a feature, I am uploading the attached patch in the coming CommitFest. Suggestions/Comments? You also need to change walsender so that it periodically sends the heartbeat message, like walreceiver does every wal_receiver_status_interval. Otherwise, walreceiver will wrongly detect a timeout whenever there is no traffic on the master. Won't the current keepalive message from walsender suffice for that need? With Regards, Amit Kapila.
[HACKERS] Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Thursday, September 13, 2012 10:57 PM Fujii Masao On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila amit.kap...@huawei.com wrote: On Wednesday, September 12, 2012 10:15 PM Fujii Masao On Wed, Sep 12, 2012 at 8:54 PM, amit.kap...@huawei.com wrote: The following bug has been logged on the website:
Bug reference: 7534
Logged by: Amit Kapila
Email address: amit.kap...@huawei.com
PostgreSQL version: 9.2.0
Operating system: Suse 10
Description: 1. Both master and standby machines are connected normally. 2. Then use the command "ifconfig ip down" to bring the network card of the master and standby down. Observation: the master can detect the connection abnormality, but the standby can't, and shows a connected channel for a long time. I would like to implement such a feature for walreceiver, but there is one point of confusion: whether to use the same configuration parameter (replication_timeout) for walreceiver as for the master, or introduce a new configuration parameter (receiver_replication_timeout). I like the latter. I believe some users want to set different timeout values, for example, in the case where the master and standby servers are placed in the same room, but a cascaded standby is placed on another continent. Thank you for your suggestion. I have implemented it as per your suggestion to have a separate timeout parameter for walreceiver. The main changes are: 1. Introduce a new configuration parameter wal_receiver_replication_timeout for walreceiver. 2. In function WalReceiverMain(), check if there is no communication within wal_receiver_replication_timeout; if so, exit the walreceiver. This is the same as the walsender functionality. As this is a feature, I am uploading the attached patch in the coming CommitFest. Suggestions/Comments?
With Regards, Amit Kapila.

*** a/src/backend/replication/walreceiver.c
--- b/src/backend/replication/walreceiver.c
***************
*** 62,67 ****
--- 62,69 ----
  walrcv_connect_type walrcv_connect = NULL;
  walrcv_receive_type walrcv_receive = NULL;
  walrcv_send_type walrcv_send = NULL;
  walrcv_disconnect_type walrcv_disconnect = NULL;
+ int wal_receiver_replication_timeout = 60 * 1000; /* maximum time to receive one
+                                                    * WAL data message */

  #define NAPTIME_PER_CYCLE 100 /* max sleep time between cycles (100ms) */
***************
*** 174,179 **** WalReceiverMain(void)
--- 176,184 ----
  	/* use volatile pointer to prevent code rearrangement */
  	volatile WalRcvData *walrcv = WalRcv;
+ 	TimestampTz last_recv_timestamp;
+ 	TimestampTz timeout = 0;
+
  	/*
  	 * WalRcv should be set up already (if we are a backend, we inherit this
  	 * by fork() or EXEC_BACKEND mechanism from the postmaster).
***************
*** 282,287 **** WalReceiverMain(void)
--- 287,295 ----
  	MemSet(reply_message, 0, sizeof(reply_message));
  	MemSet(feedback_message, 0, sizeof(feedback_message));
+
+ 	/* Initialize the last recv timestamp */
+ 	last_recv_timestamp = GetCurrentTimestamp();
+
  	/* Loop until end-of-streaming or error */
  	for (;;)
  	{
***************
*** 316,327 **** WalReceiverMain(void)
--- 324,343 ----
  		/* Wait a while for data to arrive */
  		if (walrcv_receive(NAPTIME_PER_CYCLE, &type, &buf, &len))
  		{
+ 			/* Something is received from master, so reset last receive time */
+ 			last_recv_timestamp = GetCurrentTimestamp();
+
  			/* Accept the received data, and process it */
  			XLogWalRcvProcessMsg(type, buf, len);

  			/* Receive any more data we can without sleeping */
  			while (walrcv_receive(0, &type, &buf, &len))
+ 			{
+ 				/* Something is received from master, so reset last receive time */
+ 				last_recv_timestamp = GetCurrentTimestamp();
+
  				XLogWalRcvProcessMsg(type, buf, len);
+ 			}

  			/* Let the master know that we received some data. */
  			XLogWalRcvSendReply();
***************
*** 334,339 **** WalReceiverMain(void)
--- 350,369 ----
  		}
  		else
  		{
+ 			/*
+ 			 * Check if time since last receive from standby has reached the
+ 			 * configured limit. No need to check if it is disabled by giving
+ 			 * value as 0.
+ 			 */
+ 			if (wal_receiver_replication_timeout > 0)
+ 			{
+ 				timeout = TimestampTzPlusMilliseconds(last_recv_timestamp,
+