Re: [HACKERS] WALWriter active during recovery

2015-07-02 Thread Andres Freund
On 2015-07-02 14:34:48 +0100, Simon Riggs wrote:
 This was pushed back from last CF and I haven't worked on it at all, nor
 will I.
 
 Pushing back again.

Let's return it with feedback, not move it, then. Moving entries along
that aren't expected to receive updates anytime soon isn't a good
idea; there are more than enough entries in each CF.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALWriter active during recovery

2015-07-02 Thread Simon Riggs
On 2 July 2015 at 14:38, Andres Freund and...@anarazel.de wrote:

 On 2015-07-02 14:34:48 +0100, Simon Riggs wrote:
  This was pushed back from last CF and I haven't worked on it at all, nor
  will I.
 
  Pushing back again.

 Let's return it with feedback, not move it, then. Moving entries along
 that aren't expected to receive updates anytime soon isn't a good
 idea; there are more than enough entries in each CF.


Although I agree, the interface won't let me do that, so will leave as-is.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] WALWriter active during recovery

2015-07-02 Thread Simon Riggs
On 2 July 2015 at 14:31, Fujii Masao masao.fu...@gmail.com wrote:

 On Thu, Mar 5, 2015 at 5:22 PM, Fujii Masao masao.fu...@gmail.com wrote:
  On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao masao.fu...@gmail.com
 wrote:
  On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs si...@2ndquadrant.com
 wrote:
  Currently, WALReceiver writes and fsyncs data it receives. Clearly,
  while we are waiting for an fsync we aren't doing any other useful
  work.
 
  Following patch starts WALWriter during recovery and makes it
  responsible for fsyncing data, allowing WALReceiver to progress other
  useful actions.
 
  With the patch, replication didn't work properly on my machine. I started
  the standby server after removing all the WAL files from the standby.
  ISTM that the patch doesn't handle that case. That is, in that case,
  the standby tries to start up walreceiver and replication to retrieve
  the REDO-starting checkpoint record *before* starting up walwriter
  (IOW, before reaching the consistent point). Then, since walreceiver works
  without walwriter, no received WAL data can be fsync'd in the standby,
  so replication cannot advance any further. I think that walwriter needs
  to start before walreceiver starts.
 
  I just marked this patch as Waiting on Author.

 This patch was moved to current CF with the status Needs review.
 But there are already some review comments which have not been addressed
 yet,
 so I marked the patch as Waiting on Author again.


This was pushed back from last CF and I haven't worked on it at all, nor
will I.

Pushing back again.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] WALWriter active during recovery

2015-07-02 Thread Fujii Masao
On Thu, Mar 5, 2015 at 5:22 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Currently, WALReceiver writes and fsyncs data it receives. Clearly,
 while we are waiting for an fsync we aren't doing any other useful
 work.

 Following patch starts WALWriter during recovery and makes it
 responsible for fsyncing data, allowing WALReceiver to progress other
 useful actions.

 With the patch, replication didn't work properly on my machine. I started
 the standby server after removing all the WAL files from the standby.
 ISTM that the patch doesn't handle that case. That is, in that case,
 the standby tries to start up walreceiver and replication to retrieve
 the REDO-starting checkpoint record *before* starting up walwriter
 (IOW, before reaching the consistent point). Then, since walreceiver works
 without walwriter, no received WAL data can be fsync'd in the standby,
 so replication cannot advance any further. I think that walwriter needs
 to start before walreceiver starts.

 I just marked this patch as Waiting on Author.

This patch was moved to current CF with the status Needs review.
But there are already some review comments which have not been addressed yet,
so I marked the patch as Waiting on Author again.

Regards,

-- 
Fujii Masao


Re: [HACKERS] WALWriter active during recovery

2015-03-05 Thread Fujii Masao
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Currently, WALReceiver writes and fsyncs data it receives. Clearly,
 while we are waiting for an fsync we aren't doing any other useful
 work.

 Following patch starts WALWriter during recovery and makes it
 responsible for fsyncing data, allowing WALReceiver to progress other
 useful actions.

With the patch, replication didn't work properly on my machine. I started
the standby server after removing all the WAL files from the standby.
ISTM that the patch doesn't handle that case. That is, in that case,
the standby tries to start up walreceiver and replication to retrieve
the REDO-starting checkpoint record *before* starting up walwriter
(IOW, before reaching the consistent point). Then, since walreceiver works
without walwriter, no received WAL data can be fsync'd in the standby,
so replication cannot advance any further. I think that walwriter needs
to start before walreceiver starts.

I just marked this patch as Waiting on Author.

Regards,

-- 
Fujii Masao


Re: [HACKERS] WALWriter active during recovery

2015-02-12 Thread Michael Paquier
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs si...@2ndquadrant.com
 wrote:
  Currently, WALReceiver writes and fsyncs data it receives. Clearly,
  while we are waiting for an fsync we aren't doing any other useful
  work.
 
  Following patch starts WALWriter during recovery and makes it
  responsible for fsyncing data, allowing WALReceiver to progress other
  useful actions.

 +1

  At present this is a WIP patch, for code comments only. Don't bother
  with anything other than code questions at this stage.
 
  Implementation questions are
 
  * How should we wake WALReceiver, since it waits on a poll(). Should
  we use SIGUSR1, which is already used for latch waits, or another
  signal?

 Probably we need to change libpqwalreceiver so that it uses the latch.
 This is useful even for the startup process to report the replay location
 to
 the walreceiver in real time.

  * Should we introduce some pacing delays if the WALreceiver gets too
  far ahead of apply?

 I don't think so for now. Instead, we can support synchronous_commit =
 replay,
 and the users can use that new mode if they are worried about the delay of
 WAL replay.

  * Other questions you may have?

 Who should wake the startup process so that it reads and replays the WAL
 data?
 Current walreceiver. But if walwriter is responsible for fsyncing WAL data,
 probably walwriter should do that. Because the startup process should not
 replay
 the WAL data which has not been fsync'd yet.


Moved this patch to CF 2015-02 to not lose track of it and because it did
not get any reviews.
-- 
Michael


Re: [HACKERS] WALWriter active during recovery

2014-12-18 Thread Fujii Masao
On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Currently, WALReceiver writes and fsyncs data it receives. Clearly,
 while we are waiting for an fsync we aren't doing any other useful
 work.

 Following patch starts WALWriter during recovery and makes it
 responsible for fsyncing data, allowing WALReceiver to progress other
 useful actions.

+1

 At present this is a WIP patch, for code comments only. Don't bother
 with anything other than code questions at this stage.

 Implementation questions are

 * How should we wake WALReceiver, since it waits on a poll(). Should
 we use SIGUSR1, which is already used for latch waits, or another
 signal?

Probably we need to change libpqwalreceiver so that it uses the latch.
This is useful even for the startup process to report the replay location to
the walreceiver in real time.

 * Should we introduce some pacing delays if the WALreceiver gets too
 far ahead of apply?

I don't think so for now. Instead, we can support synchronous_commit = replay,
and the users can use that new mode if they are worried about the delay of
WAL replay.

 * Other questions you may have?

Who should wake the startup process so that it reads and replays the WAL data?
Currently walreceiver does. But if walwriter is responsible for fsyncing WAL
data, probably walwriter should do that, because the startup process should
not replay WAL data which has not been fsync'd yet.

Regards,

-- 
Fujii Masao


Re: [HACKERS] WALWriter active during recovery

2014-12-17 Thread didier
Hi,

On Tue, Dec 16, 2014 at 6:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 16 December 2014 at 14:12, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 On 12/15/2014 08:51 PM, Simon Riggs wrote:

 Currently, WALReceiver writes and fsyncs data it receives. Clearly,
 while we are waiting for an fsync we aren't doing any other useful
 work.

 Following patch starts WALWriter during recovery and makes it
 responsible for fsyncing data, allowing WALReceiver to progress other
 useful actions.
On many Linux systems it may not do that much (2.6.32 and 3.2 are bad,
3.13 is better but still it slows the fsync).

If there's a fsync in progress WALReceiver will:
1- slow the fsync because its writes to the same file are grabbed by the fsync
2- stall until the end of fsync.

From stracing a test program simulating this pattern (two processes:
one writes to a file, the second fsyncs it):

20279 11:51:24.037108 fsync(5 <unfinished ...>
20278 11:51:24.053524 <... nanosleep resumed> NULL) = 0 <0.020281>
20278 11:51:24.053691 lseek(3, 1383612416, SEEK_SET) = 1383612416 <0.000119>
20278 11:51:24.053965 write(3, ..., 8192) = 8192 <0.000111>
20278 11:51:24.054190 nanosleep({0, 20000000}, NULL) = 0 <0.020243>

20278 11:51:24.404386 lseek(3, 194772992, SEEK_SET <unfinished ...>
20279 11:51:24.754123 <... fsync resumed> ) = 0 <0.716971>
20279 11:51:24.754202 close(5 <unfinished ...>
20278 11:51:24.754232 <... lseek resumed> ) = 194772992 <0.349825>

Yes that's a 300ms lseek...



 What other useful actions can WAL receiver do while it's waiting? It doesn't
 do much else than receive WAL, and fsync it to disk.

 So now it will only need to do one of those two things.


Regards
Didier


Re: [HACKERS] WALWriter active during recovery

2014-12-17 Thread Simon Riggs
On 17 December 2014 at 11:27, didier did...@gmail.com wrote:

 If there's a fsync in progress WALReceiver will:
 1- slow the fsync because its writes to the same file are grabbed by the fsync
 2- stall until the end of fsync.

PostgreSQL already fsyncs files while they are being written to. Are
you saying we should stop doing that?

It would be possible to synchronize processes so that we don't write
to a file while it is being fsynced.

fsyncs are also made once the whole 16MB has been written, so in those
cases there is no simultaneous action.
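
One way to picture the synchronization mentioned above is an advisory lock around the fsync, which is also what the FLOCK macro in didier's test program gestures at. The following is only a minimal sketch under that assumption, not anything from the patch: `locked_fsync` and `write_unless_syncing` are hypothetical names, and flock() is just one possible mechanism.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fsync while holding an exclusive advisory lock,
 * so cooperating writers can tell that a sync is in progress. */
static int locked_fsync(int fd)
{
    int rc;

    if (flock(fd, LOCK_EX) != 0)
        return -1;
    rc = fsync(fd);
    flock(fd, LOCK_UN);
    return rc;
}

/* Hypothetical helper: write at the given offset only if nobody else
 * holds the lock. Returns bytes written, 0 if a sync is in progress
 * (caller may retry or buffer), or -1 on error. */
static ssize_t write_unless_syncing(int fd, const void *buf, size_t len,
                                    off_t off)
{
    ssize_t n;

    if (flock(fd, LOCK_EX | LOCK_NB) != 0)
        return errno == EWOULDBLOCK ? 0 : -1;
    n = pwrite(fd, buf, len, off);
    flock(fd, LOCK_UN);
    return n;
}
```

The writer and the syncer must each use their own open() of the file, since flock() locks belong to the open file description.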

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WALWriter active during recovery

2014-12-17 Thread Alvaro Herrera
didier wrote:

 On many Linux systems it may not do that much (2.6.32 and 3.2 are bad,
 3.13 is better but still it slows the fsync).
 
 If there's a fsync in progress WALReceiver will:
 1- slow the fsync because its writes to the same file are grabbed by the fsync
 2- stall until the end of fsync.

Is this behavior filesystem-dependent?


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] WALWriter active during recovery

2014-12-17 Thread didier
Hi

On Wed, Dec 17, 2014 at 2:39 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 didier wrote:

 On many Linux systems it may not do that much (2.6.32 and 3.2 are bad,
 3.13 is better but still it slows the fsync).

 If there's a fsync in progress WALReceiver will:
 1- slow the fsync because its writes to the same file are grabbed by the 
 fsync
 2- stall until the end of fsync.

 Is this behavior filesystem-dependent?
I don't know. I only tested ext4.

Attached is the trivial code I used; there's a lot of junk in it.

Didier
/*
* Compile with: gcc testf.c -Wall -W -O0 
*/
 
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/fcntl.h>
#include <sys/time.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/file.h>
#include <errno.h>
#include <time.h>   /* time(), used to seed srandom() */
 
/* Current wall-clock time in microseconds. */
static long long microseconds(void) {
   struct timeval tv;
   long long mst;

   gettimeofday(&tv, NULL);
   mst = ((long long)tv.tv_sec)*1000000;
   mst += tv.tv_usec;
   return mst;
}
int out= 0;
//#define FLOCK(a,b) flock(a,b)
#define FLOCK(a,b) (0)
 
//==
// fsync  
void child(void) {
   int fd, retval;
   long long start = 0;

   /* Repeatedly reopen the file and fsync it. */
   while(1) {
      fd = open("/tmp/foo.txt", O_RDONLY);
      //usleep(3000);
      usleep(500);
      FLOCK(fd, LOCK_EX);
      if (out) {
         printf("Start sync\n");
         fflush(stdout);
         start = microseconds();
      }
      retval = fsync(fd);
      FLOCK(fd, LOCK_UN);
      if (out) {
         printf("Sync in %lld microseconds (%d)\n", microseconds()-start, retval);
         fflush(stdout);
      }
      close(fd);
   }
   exit(0);
}

char buf[8*1024];
#define f_size (2lu*1024*1024*1024)

//==
// read
void child2(void) {
   int fd, retval;
   long long start = 0;
   off_t lfsr;

   fd = open("/tmp/foo.txt", O_RDWR /*|O_CREAT | O_SYNC*/, 0644);
   srandom(2000 + time(NULL));
   /* Read 8kB blocks at random offsets. */
   while(1) {
      if (out) {
         start = microseconds();
      }
      lfsr = random()/sizeof(buf);
      if (pread(fd, buf, sizeof(buf), sizeof(buf)*lfsr) == -1) {
         perror("read");
         exit(1);
      }
      // posix_fadvise(fd, sizeof(buf)*lfsr, sizeof(buf), POSIX_FADV_DONTNEED);
      if (out) {
         printf("read %lu in %lld microseconds\n", lfsr*sizeof(buf), microseconds()-start);
         fflush(stdout);
      }
      usleep(500);
   }
   close(fd);
   exit(0);
}

//==
void child3(int end) {
   int fd, retval;
   long long start;
   off_t lfsr;
   int i;
   int j = 2;

   fd = open("/tmp/foo.txt", O_RDWR /*|O_CREAT | O_SYNC*/, 0644);
   /* Sequentially write every j-th 8kB block of the file. */
   for (i = 0; i < 131072/j; i++) {
      int u;
      lseek(fd, sizeof(buf)*(i*j), SEEK_SET);
      write(fd, buf, sizeof(buf));
   }
   close(fd);
   if (end)
      exit(0);
   sleep(60);
}

 
int main(void) {
   int fd0 = open("/tmp/foo.txt", O_RDWR | O_CREAT /*| O_SYNC*/, 0644);
   int fd1 = open("/tmp/foo1.txt", O_RDWR | O_CREAT /*| O_SYNC*/, 0644);

   int fd;
   long long start = 0;
   long long end = 0;
   off_t lfsr = 0;
   memset(buf, 'a', sizeof(buf));
   ftruncate(fd0, f_size);
   ftruncate(fd1, f_size);
   printf("%d\n", RAND_MAX);
//   child3(0);

   /* Fork the process that fsyncs the file in a loop. */
   if (!fork()) {
      child();
      exit(1);
   }

#if 0
   if (!fork()) {
      child2();
      exit(1);
   }
   if (!fork()) {
      child3(1);
      exit(1);
   }
#endif
   srandom(1000+time(NULL));
   /* Parent: write 8kB blocks at random offsets. */
   while(1) {
      fd = fd0;
      if (FLOCK(fd, LOCK_EX | LOCK_NB) == -1) {
         if (errno == EWOULDBLOCK)
            fd = fd1;
      }
      lfsr = random()/sizeof(buf);
      if (out) {
         start = microseconds();
      }
//    if (pwrite(fd, buf, sizeof(buf), sizeof(buf)*lfsr) == -1) {
      lseek(fd, sizeof(buf)*lfsr, SEEK_SET);
      if (write(fd, buf, sizeof(buf)) == -1) {
         perror("write");
         exit(1);
      }
      if (out) {
         printf("Write %lu in %lld microseconds\n", lfsr*sizeof(buf), microseconds()-start);
         fflush(stdout);
      }
      if (fd == fd0) {
         FLOCK(fd, LOCK_UN);
      }
      usleep(20000);   /* 20 ms, matching the nanosleep seen in the strace output */
   }
   close(fd);
   exit(0);
}


Re: [HACKERS] WALWriter active during recovery

2014-12-16 Thread Heikki Linnakangas

On 12/15/2014 08:51 PM, Simon Riggs wrote:

Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.


What other useful actions can WAL receiver do while it's waiting? It 
doesn't do much else than receive WAL, and fsync it to disk.


- Heikki


Re: [HACKERS] WALWriter active during recovery

2014-12-16 Thread Andres Freund
On 2014-12-16 16:12:40 +0200, Heikki Linnakangas wrote:
 On 12/15/2014 08:51 PM, Simon Riggs wrote:
 Currently, WALReceiver writes and fsyncs data it receives. Clearly,
 while we are waiting for an fsync we aren't doing any other useful
 work.
 
 Following patch starts WALWriter during recovery and makes it
 responsible for fsyncing data, allowing WALReceiver to progress other
 useful actions.
 
 What other useful actions can WAL receiver do while it's waiting? It doesn't
 do much else than receive WAL, and fsync it to disk.

It can actually receive further data from the network and write it to
disk? On a relatively low latency network the buffers aren't that
large. Right now we generate quite a bursty IO pattern with the disks
alternating between idle and fully busy.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WALWriter active during recovery

2014-12-16 Thread Simon Riggs
On 16 December 2014 at 14:12, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 12/15/2014 08:51 PM, Simon Riggs wrote:

 Currently, WALReceiver writes and fsyncs data it receives. Clearly,
 while we are waiting for an fsync we aren't doing any other useful
 work.

 Following patch starts WALWriter during recovery and makes it
 responsible for fsyncing data, allowing WALReceiver to progress other
 useful actions.


 What other useful actions can WAL receiver do while it's waiting? It doesn't
 do much else than receive WAL, and fsync it to disk.

So now it will only need to do one of those two things.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WALWriter active during recovery

2014-12-15 Thread Andres Freund
Hi,

On 2014-12-15 18:51:44 +0000, Simon Riggs wrote:
 Currently, WALReceiver writes and fsyncs data it receives. Clearly,
 while we are waiting for an fsync we aren't doing any other useful
 work.

Well, it can still buffer data on the network level, but there's
definitely limits to that. So I can see this as being useful.

 Following patch starts WALWriter during recovery and makes it
 responsible for fsyncing data, allowing WALReceiver to progress other
 useful actions.
 
 At present this is a WIP patch, for code comments only. Don't bother
 with anything other than code questions at this stage.
 
 Implementation questions are
 
 * How should we wake WALReceiver, since it waits on a poll(). Should
 we use SIGUSR1, which is already used for latch waits, or another
 signal?

It's not entirely trivial, but also not hard, to make it use the latch
code for waiting. It'd probably end up requiring less code because then
we could just scratch libpqwalreceiver.c:libpq_select().
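
To illustrate the general idea of waking a process that is blocked in poll(), here is a minimal sketch of a self-pipe-style wakeup: the waiter polls both its socket and the read end of a pipe, and another party writes one byte to the pipe to wake it. This is a generic pattern, not PostgreSQL's latch implementation; `wait_socket_or_wakeup` and `send_wakeup` are hypothetical names.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical result bits for wait_socket_or_wakeup(). */
#define WAIT_SOCKET_READY 1
#define WAIT_WOKEN_UP     2

/* Block until sock_fd is readable, a wakeup byte arrives on wake_fd,
 * or timeout_ms elapses. Returns a bitmask of the WAIT_* flags above,
 * 0 on timeout, or -1 on error. */
static int wait_socket_or_wakeup(int sock_fd, int wake_fd, int timeout_ms)
{
    struct pollfd fds[2];
    int result = 0;
    char c;

    fds[0].fd = sock_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    fds[1].fd = wake_fd;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    if (poll(fds, 2, timeout_ms) < 0)
        return -1;

    if (fds[0].revents & (POLLIN | POLLHUP | POLLERR))
        result |= WAIT_SOCKET_READY;

    /* Consume one wakeup byte so the next wait blocks again. */
    if ((fds[1].revents & POLLIN) && read(wake_fd, &c, 1) == 1)
        result |= WAIT_WOKEN_UP;

    return result;
}

/* Wake the waiter by writing a single byte to the pipe's write end. */
static int send_wakeup(int wake_write_fd)
{
    char c = 'w';

    return write(wake_write_fd, &c, 1) == 1 ? 0 : -1;
}
```

With something like this, the waiter no longer needs a dedicated signal just to interrupt poll(); any event source that can write to the pipe can wake it.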

 * Should we introduce some pacing delays if the WALreceiver gets too
 far ahead of apply?

Hm. Why don't we simply start fsyncing in the receiver itself at regular
intervals? If already synced that's cheap, if not, it'll pace us.
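
As a rough picture of that suggestion, the sketch below writes incoming data and issues an fsync() only once a threshold of unsynced bytes has accumulated. It is an illustration under assumptions, not code from the patch or from walreceiver: the byte-based interval, `SYNC_EVERY_BYTES`, `struct wal_sink`, `sink_open` and `sink_write` are all hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical threshold: fsync after this many unsynced bytes. */
#define SYNC_EVERY_BYTES (1024 * 1024)

struct wal_sink {
    int    fd;          /* file receiving the data */
    size_t unsynced;    /* bytes written since the last fsync */
    int    sync_calls;  /* number of fsync() calls issued so far */
};

/* Open (or create) the target file. Returns 0 on success, -1 on error. */
static int sink_open(struct wal_sink *s, const char *path)
{
    s->fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    s->unsynced = 0;
    s->sync_calls = 0;
    return s->fd >= 0 ? 0 : -1;
}

/* Write one chunk; fsync only when enough unsynced data has piled up.
 * A slow fsync blocks the caller here, which is what paces the receiver.
 * Returns 0 on success, -1 on error. */
static int sink_write(struct wal_sink *s, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(s->fd, p, len);

        if (n < 0)
            return -1;
        p += n;
        len -= (size_t) n;
        s->unsynced += (size_t) n;
    }

    if (s->unsynced >= SYNC_EVERY_BYTES) {
        if (fsync(s->fd) != 0)
            return -1;
        s->unsynced = 0;
        s->sync_calls++;
    }
    return 0;
}
```

A time-based interval would work the same way, with the threshold check replaced by a comparison against the time of the last fsync.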

Greetings,

Andres Freund

