Re: [HACKERS] WALWriter active during recovery
On 2015-07-02 14:34:48 +0100, Simon Riggs wrote: This was pushed back from last CF and I haven't worked on it at all, nor will I. Pushing back again. Let's mark it returned with feedback then, rather than moving it. Moving entries along that aren't expected to receive updates anytime soon isn't a good idea; there are more than enough entries in each CF. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WALWriter active during recovery
On 2 July 2015 at 14:38, Andres Freund and...@anarazel.de wrote: On 2015-07-02 14:34:48 +0100, Simon Riggs wrote: This was pushed back from last CF and I haven't worked on it at all, nor will I. Pushing back again. Let's mark it returned with feedback then, rather than moving it. Moving entries along that aren't expected to receive updates anytime soon isn't a good idea; there are more than enough entries in each CF. Although I agree, the interface won't let me do that, so I will leave it as-is. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] WALWriter active during recovery
On 2 July 2015 at 14:31, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Mar 5, 2015 at 5:22 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs si...@2ndquadrant.com wrote: Currently, WALReceiver writes and fsyncs data it receives. Clearly, while we are waiting for an fsync we aren't doing any other useful work. Following patch starts WALWriter during recovery and makes it responsible for fsyncing data, allowing WALReceiver to progress other useful actions. With the patch, replication didn't work properly on my machine. I started the standby server after removing all the WAL files from the standby. ISTM that the patch doesn't handle that case. That is, in that case, the standby tries to start up walreceiver and replication to retrieve the REDO-starting checkpoint record *before* starting up walwriter (IOW, before reaching the consistent point). Then, since walreceiver works without walwriter, none of the received WAL data can be fsync'd on the standby, so replication cannot advance any further. I think that walwriter needs to start before walreceiver starts. I just marked this patch as Waiting on Author. This patch was moved to the current CF with the status Needs review. But there are already some review comments which have not been addressed yet, so I marked the patch as Waiting on Author again. This was pushed back from last CF and I haven't worked on it at all, nor will I. Pushing back again. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] WALWriter active during recovery
On Thu, Mar 5, 2015 at 5:22 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs si...@2ndquadrant.com wrote: Currently, WALReceiver writes and fsyncs data it receives. Clearly, while we are waiting for an fsync we aren't doing any other useful work. Following patch starts WALWriter during recovery and makes it responsible for fsyncing data, allowing WALReceiver to progress other useful actions. With the patch, replication didn't work properly on my machine. I started the standby server after removing all the WAL files from the standby. ISTM that the patch doesn't handle that case. That is, in that case, the standby tries to start up walreceiver and replication to retrieve the REDO-starting checkpoint record *before* starting up walwriter (IOW, before reaching the consistent point). Then, since walreceiver works without walwriter, none of the received WAL data can be fsync'd on the standby, so replication cannot advance any further. I think that walwriter needs to start before walreceiver starts. I just marked this patch as Waiting on Author. This patch was moved to the current CF with the status Needs review. But there are already some review comments which have not been addressed yet, so I marked the patch as Waiting on Author again. Regards, -- Fujii Masao
Re: [HACKERS] WALWriter active during recovery
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs si...@2ndquadrant.com wrote: Currently, WALReceiver writes and fsyncs data it receives. Clearly, while we are waiting for an fsync we aren't doing any other useful work. Following patch starts WALWriter during recovery and makes it responsible for fsyncing data, allowing WALReceiver to progress other useful actions. With the patch, replication didn't work properly on my machine. I started the standby server after removing all the WAL files from the standby. ISTM that the patch doesn't handle that case. That is, in that case, the standby tries to start up walreceiver and replication to retrieve the REDO-starting checkpoint record *before* starting up walwriter (IOW, before reaching the consistent point). Then, since walreceiver works without walwriter, none of the received WAL data can be fsync'd on the standby, so replication cannot advance any further. I think that walwriter needs to start before walreceiver starts. I just marked this patch as Waiting on Author. Regards, -- Fujii Masao
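The stall Fujii reports can be seen in a toy model with two positions: how far the receiver has written, and how far WAL has been fsync'd. If only the flusher (walwriter, under the patch) advances the flush position and no flusher is running yet, that position never moves, so nothing gated on it can progress. The model also encodes the rule, raised elsewhere in this thread, that the startup process must not replay WAL that has not been fsync'd. This is a minimal single-threaded sketch; `lsn_t`, `wal_progress`, `receiver_wrote()`, `flusher_cycle()` and `replay_limit()` are hypothetical names, and real positions would live in shared memory under a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;              /* stand-in for a WAL position (XLogRecPtr) */

/* Hypothetical progress state shared by receiver, flusher and startup. */
typedef struct {
    lsn_t write_upto;                /* received and written, not yet durable */
    lsn_t flush_upto;                /* written and fsync'd */
    bool  flusher_running;           /* is a walwriter-like process up yet? */
} wal_progress;

/* Receiver: bytes up to 'end' have been written to the segment file. */
static void receiver_wrote(wal_progress *p, lsn_t end)
{
    assert(end >= p->write_upto);
    p->write_upto = end;
}

/* Flusher: fsync whatever has been written, then publish the new position. */
static void flusher_cycle(wal_progress *p)
{
    if (!p->flusher_running)
        return;                      /* nobody does the fsync: flush_upto is stuck */
    /* fsync(segment_fd) would happen here */
    p->flush_upto = p->write_upto;
}

/* Startup process: replay may not pass the durable position. */
static lsn_t replay_limit(const wal_progress *p)
{
    return p->flush_upto;
}
```

With `flusher_running` false, `receiver_wrote()` keeps advancing `write_upto` while `replay_limit()` stays at 0, which is the failure mode described above. One fallback the model suggests (an assumption, not something proposed in the thread) is for the receiver to fsync itself while no flusher is running; the thread's own suggestion is simply to start walwriter before walreceiver.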
Re: [HACKERS] WALWriter active during recovery
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs si...@2ndquadrant.com wrote: Currently, WALReceiver writes and fsyncs data it receives. Clearly, while we are waiting for an fsync we aren't doing any other useful work. Following patch starts WALWriter during recovery and makes it responsible for fsyncing data, allowing WALReceiver to progress other useful actions. +1 At present this is a WIP patch, for code comments only. Don't bother with anything other than code questions at this stage. Implementation questions are:
* How should we wake WALReceiver, since it waits on a poll()? Should we use SIGUSR1, which is already used for latch waits, or another signal? Probably we need to change libpqwalreceiver so that it uses the latch. This would also be useful for the startup process to report the replay location to the walreceiver in real time.
* Should we introduce some pacing delays if the WALreceiver gets too far ahead of apply? I don't think so for now. Instead, we could support synchronous_commit = replay, and users could use that new mode if they are worried about the delay of WAL replay.
* Other questions you may have? Who should wake the startup process so that it reads and replays the WAL data? Currently, walreceiver. But if walwriter is responsible for fsyncing WAL data, walwriter should probably do that, because the startup process should not replay WAL data which has not been fsync'd yet.
Moved this patch to CF 2015-02 so as not to lose track of it, and because it did not get any reviews. -- Michael
Re: [HACKERS] WALWriter active during recovery
On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs si...@2ndquadrant.com wrote: Currently, WALReceiver writes and fsyncs data it receives. Clearly, while we are waiting for an fsync we aren't doing any other useful work. Following patch starts WALWriter during recovery and makes it responsible for fsyncing data, allowing WALReceiver to progress other useful actions. +1 At present this is a WIP patch, for code comments only. Don't bother with anything other than code questions at this stage. Implementation questions are:
* How should we wake WALReceiver, since it waits on a poll()? Should we use SIGUSR1, which is already used for latch waits, or another signal? Probably we need to change libpqwalreceiver so that it uses the latch. This would also be useful for the startup process to report the replay location to the walreceiver in real time.
* Should we introduce some pacing delays if the WALreceiver gets too far ahead of apply? I don't think so for now. Instead, we could support synchronous_commit = replay, and users could use that new mode if they are worried about the delay of WAL replay.
* Other questions you may have? Who should wake the startup process so that it reads and replays the WAL data? Currently, walreceiver. But if walwriter is responsible for fsyncing WAL data, walwriter should probably do that, because the startup process should not replay WAL data which has not been fsync'd yet.
Regards, -- Fujii Masao
Re: [HACKERS] WALWriter active during recovery
Hi,
On Tue, Dec 16, 2014 at 6:07 PM, Simon Riggs si...@2ndquadrant.com wrote: On 16 December 2014 at 14:12, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 12/15/2014 08:51 PM, Simon Riggs wrote: Currently, WALReceiver writes and fsyncs data it receives. Clearly, while we are waiting for an fsync we aren't doing any other useful work. Following patch starts WALWriter during recovery and makes it responsible for fsyncing data, allowing WALReceiver to progress other useful actions.
On many Linux systems it may not help that much (2.6.32 and 3.2 are bad; 3.13 is better but still slows the fsync). If there's an fsync in progress, WALReceiver will:
1- slow the fsync, because its writes to the same file are grabbed by the fsync;
2- stall until the end of the fsync.
From stracing a test program simulating this pattern (two processes: one writes to a file, the other fsyncs it):
20279 11:51:24.037108 fsync(5 <unfinished ...>
20278 11:51:24.053524 <... nanosleep resumed> NULL) = 0 <0.020281>
20278 11:51:24.053691 lseek(3, 1383612416, SEEK_SET) = 1383612416 <0.000119>
20278 11:51:24.053965 write(3, ..., 8192) = 8192 <0.000111>
20278 11:51:24.054190 nanosleep({0, 20000000}, NULL) = 0 <0.020243>
20278 11:51:24.404386 lseek(3, 194772992, SEEK_SET <unfinished ...>
20279 11:51:24.754123 <... fsync resumed> ) = 0 <0.716971>
20279 11:51:24.754202 close(5 <unfinished ...>
20278 11:51:24.754232 <... lseek resumed> ) = 194772992 <0.349825>
Yes, that's a 300ms lseek...
What other useful actions can WAL receiver do while it's waiting? It doesn't do much other than receive WAL and fsync it to disk. So now it will only need to do one of those two things.
Regards
Didier
Re: [HACKERS] WALWriter active during recovery
On 17 December 2014 at 11:27, didier did...@gmail.com wrote: If there's an fsync in progress, WALReceiver will: 1- slow the fsync, because its writes to the same file are grabbed by the fsync; 2- stall until the end of the fsync. PostgreSQL already fsyncs files while they are being written to. Are you saying we should stop doing that? It would be possible to synchronize processes so that we don't write to a file while it is being fsynced. fsyncs are also made once a whole 16MB segment has been written, so in those cases there is no concurrent writing. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
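Simon's point about fsyncs at segment boundaries reduces to simple arithmetic. With segment size $S$ (16MB by default), a write covering byte positions $[start, end)$ finishes segment $k$ exactly when $start < (k+1)S \le end$, so the number of segments it completes is $\lfloor end/S \rfloor - \lfloor start/S \rfloor$; each such completion is a point where, per Simon, the fsync happens with no concurrent writer to that file. A minimal sketch, assuming the default 16MB size; `WAL_SEG_SIZE`, `seg_no()` and `segments_completed()` are hypothetical helpers, not PostgreSQL macros.

```c
#include <assert.h>
#include <stdint.h>

/* Default WAL segment size: 16MB (an assumption; it is a build-time option). */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Segment number that contains byte position 'pos'. */
static uint64_t seg_no(uint64_t pos)
{
    return pos / WAL_SEG_SIZE;
}

/*
 * Number of segments finished by a write covering [start, end):
 * segment k is finished once end reaches (k + 1) * WAL_SEG_SIZE, so the
 * count is seg_no(end) - seg_no(start).
 */
static uint64_t segments_completed(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return seg_no(end) - seg_no(start);
}
```

For example, an 8KB write ending exactly at 16MB completes one segment, while the next 8KB write starting at 16MB completes none.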
Re: [HACKERS] WALWriter active during recovery
didier wrote: On many Linux systems it may not help that much (2.6.32 and 3.2 are bad; 3.13 is better but still slows the fsync). If there's an fsync in progress, WALReceiver will: 1- slow the fsync, because its writes to the same file are grabbed by the fsync; 2- stall until the end of the fsync. Is this behavior filesystem-dependent? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] WALWriter active during recovery
Hi,
On Wed, Dec 17, 2014 at 2:39 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: didier wrote: On many Linux systems it may not help that much (2.6.32 and 3.2 are bad; 3.13 is better but still slows the fsync). If there's an fsync in progress, WALReceiver will: 1- slow the fsync, because its writes to the same file are grabbed by the fsync; 2- stall until the end of the fsync. Is this behavior filesystem-dependent?
I don't know; I only tested ext4. Attached is the trivial code I used; there's a lot of junk in it.
Didier

/*
 * Compile with: gcc testf.c -Wall -W -O0
 */
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/fcntl.h>
#include <sys/time.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/file.h>
#include <errno.h>
#include <time.h>

/* Wall-clock time in microseconds. */
static long long microseconds(void)
{
    struct timeval tv;
    long long mst;

    gettimeofday(&tv, NULL);
    mst = ((long long)tv.tv_sec) * 1000000;
    mst += tv.tv_usec;
    return mst;
}

int out = 0;    /* set to 1 to print timings */

//#define FLOCK(a,b) flock(a,b)
#define FLOCK(a,b) (0)

//==
// fsync: repeatedly fsync /tmp/foo.txt
void child(void)
{
    int fd, retval;
    long long start;

    while (1) {
        fd = open("/tmp/foo.txt", O_RDONLY);
        //usleep(3000);
        usleep(500);
        FLOCK(fd, LOCK_EX);
        if (out) {
            printf("Start sync\n");
            fflush(stdout);
            start = microseconds();
        }
        retval = fsync(fd);
        FLOCK(fd, LOCK_UN);
        if (out) {
            printf("Sync in %lld microseconds (%d)\n", microseconds() - start, retval);
            fflush(stdout);
        }
        close(fd);
    }
    exit(0);
}

char buf[8*1024];
#define f_size (2lu*1024*1024*1024)

//==
// read: random 8kB reads from /tmp/foo.txt
void child2(void)
{
    int fd, retval;
    long long start;
    off_t lfsr;

    fd = open("/tmp/foo.txt", O_RDWR /*|O_CREAT | O_SYNC*/, 0644);
    srandom(2000 + time(NULL));
    while (1) {
        if (out) {
            start = microseconds();
        }
        lfsr = random() / sizeof(buf);
        if (pread(fd, buf, sizeof(buf), sizeof(buf) * lfsr) == -1) {
            perror("read");
            exit(1);
        }
        // posix_fadvise(fd, sizeof(buf)*lfsr, sizeof(buf), POSIX_FADV_DONTNEED);
        if (out) {
            printf("read %lu in %lld microseconds\n", lfsr * sizeof(buf), microseconds() - start);
            fflush(stdout);
        }
        usleep(500);
    }
    close(fd);
    exit(0);
}

//==
// write every other 8kB block of the first 1GB
void child3(int end)
{
    int fd, retval;
    long long start;
    off_t lfsr;
    int i;
    int j = 2;

    fd = open("/tmp/foo.txt", O_RDWR /*|O_CREAT | O_SYNC*/, 0644);
    for (i = 0; i < 131072 / j; i++) {
        int u;
        lseek(fd, sizeof(buf) * (i * j), SEEK_SET);
        write(fd, buf, sizeof(buf));
    }
    close(fd);
    if (end)
        exit(0);
    sleep(60);
}

int main(void)
{
    int fd0 = open("/tmp/foo.txt", O_RDWR | O_CREAT /*| O_SYNC*/, 0644);
    int fd1 = open("/tmp/foo1.txt", O_RDWR | O_CREAT /*| O_SYNC*/, 0644);
    int fd;
    long long start;
    long long end = 0;
    off_t lfsr = 0;

    memset(buf, 'a', sizeof(buf));
    ftruncate(fd0, f_size);
    ftruncate(fd1, f_size);
    printf("%d\n", RAND_MAX);
    // child3(0);
    if (!fork()) {
        child();
        exit(1);
    }
#if 0
    if (!fork()) {
        child2();
        exit(1);
    }
    if (!fork()) {
        child3(1);
        exit(1);
    }
#endif
    srandom(1000 + time(NULL));
    /* writer: random 8kB writes while the child fsyncs the same file */
    while (1) {
        fd = fd0;
        if (FLOCK(fd, LOCK_EX | LOCK_NB) == -1) {
            if (errno == EWOULDBLOCK)
                fd = fd1;
        }
        lfsr = random() / sizeof(buf);
        if (out) {
            start = microseconds();
        }
        // if (pwrite(fd, buf, sizeof(buf), sizeof(buf)*lfsr) == -1) {
        lseek(fd, sizeof(buf) * lfsr, SEEK_SET);
        if (write(fd, buf, sizeof(buf)) == -1) {
            perror("write");
            exit(1);
        }
        if (out) {
            printf("Write %lu in %lld microseconds\n", lfsr * sizeof(buf), microseconds() - start);
            fflush(stdout);
        }
        if (fd == fd0) {
            FLOCK(fd, LOCK_UN);
        }
        usleep(20000);
    }
    close(fd);
    exit(0);
}
Re: [HACKERS] WALWriter active during recovery
On 12/15/2014 08:51 PM, Simon Riggs wrote: Currently, WALReceiver writes and fsyncs data it receives. Clearly, while we are waiting for an fsync we aren't doing any other useful work. Following patch starts WALWriter during recovery and makes it responsible for fsyncing data, allowing WALReceiver to progress other useful actions. What other useful actions can WAL receiver do while it's waiting? It doesn't do much other than receive WAL and fsync it to disk. - Heikki
Re: [HACKERS] WALWriter active during recovery
On 2014-12-16 16:12:40 +0200, Heikki Linnakangas wrote: On 12/15/2014 08:51 PM, Simon Riggs wrote: Currently, WALReceiver writes and fsyncs data it receives. Clearly, while we are waiting for an fsync we aren't doing any other useful work. Following patch starts WALWriter during recovery and makes it responsible for fsyncing data, allowing WALReceiver to progress other useful actions. What other useful actions can WAL receiver do while it's waiting? It doesn't do much other than receive WAL and fsync it to disk. It can actually receive further data from the network and write it to disk? On a relatively low-latency network the buffers aren't that large. Right now we generate quite a bursty IO pattern, with the disks alternating between idle and fully busy. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] WALWriter active during recovery
On 16 December 2014 at 14:12, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 12/15/2014 08:51 PM, Simon Riggs wrote: Currently, WALReceiver writes and fsyncs data it receives. Clearly, while we are waiting for an fsync we aren't doing any other useful work. Following patch starts WALWriter during recovery and makes it responsible for fsyncing data, allowing WALReceiver to progress other useful actions. What other useful actions can WAL receiver do while it's waiting? It doesn't do much other than receive WAL and fsync it to disk. So now it will only need to do one of those two things. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] WALWriter active during recovery
Hi, On 2014-12-15 18:51:44 +0000, Simon Riggs wrote: Currently, WALReceiver writes and fsyncs data it receives. Clearly, while we are waiting for an fsync we aren't doing any other useful work. Well, it can still buffer data on the network level, but there are definitely limits to that. So I can see this as being useful. Following patch starts WALWriter during recovery and makes it responsible for fsyncing data, allowing WALReceiver to progress other useful actions. At present this is a WIP patch, for code comments only. Don't bother with anything other than code questions at this stage. Implementation questions are:
* How should we wake WALReceiver, since it waits on a poll()? Should we use SIGUSR1, which is already used for latch waits, or another signal? It's not entirely trivial, but also not hard, to make it use the latch code for waiting. It'd probably end up requiring less code, because then we could just scrap libpqwalreceiver.c:libpq_select().
* Should we introduce some pacing delays if the WALreceiver gets too far ahead of apply? Hm. Why don't we simply start fsyncing in the receiver itself at regular intervals? If it's already synced, that's cheap; if not, it'll pace us.
Greetings, Andres Freund
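Andres's alternative to explicit pacing delays is for the receiver to fsync at regular intervals itself: if the flusher has already synced the data the call should be cheap, and if not it blocks, which throttles the receiver. A minimal sketch of that idea, assuming the caller supplies a monotonic timestamp in milliseconds; `pacer`, `pacer_after_write()` and the interval value are hypothetical, not part of any posted patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>     /* mkstemp() in the usage example */
#include <unistd.h>

/* Hypothetical receiver-side pacing state. */
typedef struct {
    int     fd;             /* WAL segment file being written */
    int64_t last_sync_ms;   /* monotonic time of the last fsync */
    int64_t interval_ms;    /* how often to fsync, e.g. 100 */
    int     syncs;          /* fsyncs issued so far */
} pacer;

/*
 * Call after each write. Issues an fsync only when interval_ms has elapsed
 * since the last one. Returns 0 on success, -1 if fsync() fails.
 */
static int pacer_after_write(pacer *p, int64_t now_ms)
{
    assert(p->interval_ms > 0);
    if (now_ms - p->last_sync_ms < p->interval_ms)
        return 0;                   /* not due yet: keep receiving */
    /* Cheap if a flusher already synced the data; otherwise it blocks,
     * and that blocking is what paces the receiver. */
    if (fsync(p->fd) != 0)
        return -1;
    p->last_sync_ms = now_ms;
    p->syncs++;
    return 0;
}
```

In practice `now_ms` would come from clock_gettime(CLOCK_MONOTONIC, ...); passing it in keeps the logic deterministic and easy to test. How cheap an fsync of already-durable data really is depends on the kernel and filesystem, as the strace measurements earlier in the thread suggest.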