Re: [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2010-01-19 Thread Andres Freund
Hi Greg,

On Tuesday 19 January 2010 15:52:25 Greg Stark wrote:
> On Mon, Jan 18, 2010 at 4:35 PM, Greg Stark  wrote:
> > Looking at this patch for the commitfest I have a few questions.
> 
> So I've touched this patch up a bit:
> 
> 1) moved the posix_fadvise call to a new fd.c function
> pg_fsync_start(fd,offset,nbytes) which initiates an fsync without
> waiting on it. Currently it's only implemented with
> posix_fadvise(DONT_NEED) but I want to look into using sync_file_range
> in the future -- it looks like this call might be good enough for our
> checkpoints.
Why exactly should that be tied to fsync? Sure, that's where most of the pain 
comes from now, but avoiding that cache poisoning wouldn't hurt in other 
cases either.

I would rather have it called pg_flush_cache_range or such...
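A minimal sketch of such a range-flush helper. It assumes Linux/glibc for sync_file_range() and falls back to the posix_fadvise(POSIX_FADV_DONTNEED) call the patch already uses. flush_cache_range() and write_with_flush_hint() are hypothetical names used only for illustration. They are not functions from fd.c or from the patch. The helper only starts writeback, so a real fsync must still follow for durability.

```c
#define _GNU_SOURCE             /* sync_file_range() on glibc/Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to start writeback of the range
 * [offset, offset + nbytes) without waiting.  nbytes == 0 means "to end of
 * file" for both calls below.  Returns 0 or an errno value.  This is only
 * a hint; it does not make anything durable.
 */
static int
flush_cache_range(int fd, off_t offset, off_t nbytes)
{
	assert(offset >= 0 && nbytes >= 0);
#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
	/* Linux only: queue the range's dirty pages for writeback, don't wait */
	return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) == 0 ? 0 : errno;
#elif defined(POSIX_FADV_DONTNEED)
	/* Fallback, the call the patch uses; Linux also starts writeback here */
	return posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);
#else
	(void) fd;
	(void) offset;
	(void) nbytes;
	return 0;					/* no hint available on this platform */
#endif
}

/* Example caller: write, hint early writeback, then fsync for durability. */
static int
write_with_flush_hint(const char *path, const void *data, size_t len)
{
	int			fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t) len)
	{
		close(fd);
		return -1;
	}
	(void) flush_cache_range(fd, 0, 0);	/* a failed hint is harmless */
	if (fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}
```

Keeping the hint separate from the fsync makes it usable outside the fsync path as well, e.g. to avoid cache poisoning during bulk copies.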

> 2) advised each 64k chunk as we write it which should avoid poisoning
> the cache if you do a large create database on an active system.
> 
> 3) added the promised but afaict missing fsync of the directory -- I
> think we should actually backpatch this.
I think so as well. You need it while recursing too, though (which is where I 
had added it), not only for the final directory.
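A sketch of fsyncing while recursing, written as a standalone post-order walk over an already-copied tree. It fsyncs regular files, recurses into subdirectories, and fsyncs each directory only after everything inside it. It uses plain POSIX calls rather than PostgreSQL's AllocateDir()/BasicOpenFile() wrappers. fsync_path() and fsync_tree() are hypothetical names. Directories are opened read-only because open() with O_RDWR on a directory fails with EISDIR. The patch's fsync_fname() opens with O_RDWR, so it cannot be used on a directory as-is.

```c
#define _XOPEN_SOURCE 700       /* lstat(), fsync(), PATH_MAX on glibc */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * fsync one path.  Directories are opened read-only (O_RDWR on a directory
 * fails with EISDIR); regular files read-write, since some systems refuse
 * fsync on a read-only descriptor.  Returns 0 or -1.
 */
static int
fsync_path(const char *path, bool isdir)
{
	int			fd = open(path, isdir ? O_RDONLY : O_RDWR);
	int			rc;

	if (fd < 0)
		return -1;
	rc = fsync(fd);
	if (close(fd) != 0)
		rc = -1;
	return rc;
}

/* Post-order walk: contents first, then the directory itself. */
static int
fsync_tree(const char *dir)
{
	DIR		   *d;
	struct dirent *de;
	struct stat st;
	char		path[PATH_MAX];
	int			rc = 0;

	assert(dir != NULL);
	d = opendir(dir);
	if (d == NULL)
		return -1;
	while (rc == 0 && (de = readdir(d)) != NULL)
	{
		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		if (snprintf(path, sizeof(path), "%s/%s", dir, de->d_name) >= (int) sizeof(path))
			rc = -1;			/* path too long */
		else if (lstat(path, &st) != 0)
			rc = -1;
		else if (S_ISDIR(st.st_mode))
			rc = fsync_tree(path);	/* the subdirectory syncs itself */
		else if (S_ISREG(st.st_mode))
			rc = fsync_path(path, false);
		/* other entry types are skipped, as copydir() skips them */
	}
	closedir(d);				/* copydir()'s second pass lacked the equivalent FreeDir() */
	return rc == 0 ? fsync_path(dir, true) : rc;
}
```

Syncing each directory only after its contents means that once a parent's fsync returns, every entry below it has already been flushed.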

> Barring any objections shall I commit it like this?
Other than the two things above it looks fine to me.

Thanks,

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-29 Thread Andres Freund
On Monday 28 December 2009 23:59:43 Andres Freund wrote:
> On Monday 28 December 2009 23:54:51 Andres Freund wrote:
> > On Saturday 12 December 2009 21:38:41 Andres Freund wrote:
> > > On Saturday 12 December 2009 21:36:27 Michael Clemmons wrote:
> > > > If ppl think it's worth it I'll create a ticket
> > >
> > > Thanks, no need. I will post a patch tomorrow or so.
> >
> > Well. It was a long day...
> >
> > Anyway.
> > In this patch I delay the fsync done in copy_file and simply do a second
> > pass over the directory in copydir and fsync everything in that pass,
> > including the directory, which was not done before and might actually be
> > necessary in some cases.
> > I added a posix_fadvise(..., FADV_DONTNEED) to make it more likely that
> > the copied file reaches storage before the fsync. Without it, the speed
> > benefits were quite a bit smaller and essentially random (which seems
> > sensible).
> >
> > This speeds up CREATE DATABASE from ~9 seconds to something around 0.8s
> > on my laptop.  Still slower than with fsync off (~0.25) but quite a
> > worthy improvement.
> >
> > The benefits are obviously bigger if anything has been added to the
> > template database.
> 
> Obviously the patch would be helpful.
And it would also be helpful not to have annoying oversights in there: a 
FreeDir(xldir); is missing at the end of copydir().

Andres



Re: [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-29 Thread Andres Freund
On Tuesday 29 December 2009 11:48:10 Greg Stark wrote:
> On Tue, Dec 29, 2009 at 2:05 AM, Andres Freund  wrote:
> >  Reads Completed:        2,        8KiB  Writes Completed:     2362,    29672KiB
> > New:
> >  Reads Completed:        0,        0KiB  Writes Completed:      550,     5960KiB
> 
> It looks like the new method is only doing 1/6th as much i/o. Do you
> know what's going on there?
While I was surprised by the size of the difference, I am not surprised at all 
that there is a significant one: currently the fsync writes out a whole bunch 
of unnecessary data every time it is called (all metadata, directory 
structure and so on).

This is reproducible...

6MB sounds sensible for the operation btw - the template database is around 
5MB.
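For reference, here is plain arithmetic on the counters quoted above. Nothing was measured anew. The old method wrote roughly five times as much data in a little over four times as many write requests, and the new total is close to the template database's size.

```latex
\frac{29672\ \mathrm{KiB}}{5960\ \mathrm{KiB}} \approx 4.98,
\qquad
\frac{2362\ \text{writes}}{550\ \text{writes}} \approx 4.29,
\qquad
\frac{5960\ \mathrm{KiB}}{1024\ \mathrm{KiB/MiB}} \approx 5.82\ \mathrm{MiB}
```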


I will try to analyze later what exactly causes the additional I/O.


Andres



Re: [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-29 Thread Greg Stark
On Tue, Dec 29, 2009 at 2:05 AM, Andres Freund  wrote:
>  Reads Completed:        2,        8KiB  Writes Completed:     2362,    29672KiB
> New:
>  Reads Completed:        0,        0KiB  Writes Completed:      550,     5960KiB

It looks like the new method is only doing 1/6th as much i/o. Do you
know what's going on there?


-- 
greg



Re: [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Andres Freund
On Tuesday 29 December 2009 00:06:28 Tom Lane wrote:
> Andres Freund  writes:
> > This speeds up CREATE DATABASE from ~9 seconds to something around 0.8s
> > on my laptop.  Still slower than with fsync off (~0.25) but quite a
> > worthy improvement.
> 
> I can't help wondering whether that's real or some kind of
> platform-specific artifact.  I get numbers more like 3.5s (fsync off)
> vs 4.5s (fsync on) on a machine where I believe the disks aren't lying
> about write-complete.  It makes sense that an fsync at the end would be
> a little bit faster, because it would give the kernel some additional
> freedom in scheduling the required I/O, but it isn't cutting the total
> I/O required at all.  So I find it really hard to believe a 10x speedup.
I only have comfortable access to two smaller machines without a BBU from here 
(I'm at the Hacker Jeopardy at the CCC congress ;-)), and both show this 
behaviour. I guess it's somewhat filesystem-dependent.

Andres



Re: [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Andres Freund
On Tuesday 29 December 2009 00:06:28 Tom Lane wrote:
> Andres Freund  writes:
> > This speeds up CREATE DATABASE from ~9 seconds to something around 0.8s
> > on my laptop.  Still slower than with fsync off (~0.25) but quite a
> > worthy improvement.
> I can't help wondering whether that's real or some kind of
> platform-specific artifact.  I get numbers more like 3.5s (fsync off)
> vs 4.5s (fsync on) on a machine where I believe the disks aren't lying
> about write-complete.  It makes sense that an fsync at the end would be
> a little bit faster, because it would give the kernel some additional
> freedom in scheduling the required I/O, but it isn't cutting the total
> I/O required at all.  So I find it really hard to believe a 10x speedup.
Well, a template database is about 5.5MB here; that shouldn't take too 
long when written near-sequentially?
As I said, the real benefit only appeared after adding posix_fadvise(.., 
FADV_DONTNEED), which is somewhat plausible because, e.g., the directory 
entries don't need to be scheduled for every file, and because the kernel can 
write out a whole directory nearly sequentially. Without the advice the kernel 
doesn't know in time that it should write that data back, and it won't do so 
for 5 seconds by default on Linux or such...

I looked at the strace output; the timing looks sensible to me. If you're 
interested I can send you that output.
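The mechanism described above can be combined with Greg's later revision, which advises each 64k chunk as it is written. Here is a sketch of such a copy loop. It assumes POSIX read()/write()/posix_fadvise() and ignores the hint's return value, as the patch does. copy_file_with_advice() is a hypothetical name, not the patch's copy_file(). The fsync is deliberately left to a later pass.

```c
#define _XOPEN_SOURCE 700       /* posix_fadvise() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_CHUNK (64 * 1024)	/* advise per 64 KiB chunk */

/*
 * Copy src to dst, telling the kernel after each chunk that we won't reread
 * it, so writeback starts early and the copy doesn't crowd other data out of
 * the page cache.  No fsync here: a later pass does that.  Returns 0 or -1.
 */
static int
copy_file_with_advice(const char *src, const char *dst)
{
	static char buf[COPY_CHUNK];	/* static: keeps 64 KiB off the stack, not thread-safe */
	off_t		offset = 0;
	ssize_t		nread;
	int			in;
	int			out;

	assert(src != NULL && dst != NULL);
	in = open(src, O_RDONLY);
	if (in < 0)
		return -1;
	out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (out < 0)
	{
		close(in);
		return -1;
	}
	while ((nread = read(in, buf, sizeof(buf))) > 0)
	{
		/* treat a short write as an error, as copy_file() does */
		if (write(out, buf, (size_t) nread) != nread)
		{
			nread = -1;
			break;
		}
#ifdef POSIX_FADV_DONTNEED
		/* only a hint: the result is ignored */
		(void) posix_fadvise(out, offset, (off_t) nread, POSIX_FADV_DONTNEED);
#endif
		offset += nread;
	}
	close(in);
	if (close(out) != 0 || nread < 0)
		return -1;
	return 0;
}
```

Advising per chunk instead of once per file keeps the amount of dirty, soon-useless cache bounded even when a single relation file is large.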

Andres



Re: [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Tom Lane
Andres Freund  writes:
> This speeds up CREATE DATABASE from ~9 seconds to something around 0.8s on my
> laptop.  Still slower than with fsync off (~0.25) but quite a worthy 
> improvement.

I can't help wondering whether that's real or some kind of
platform-specific artifact.  I get numbers more like 3.5s (fsync off)
vs 4.5s (fsync on) on a machine where I believe the disks aren't lying
about write-complete.  It makes sense that an fsync at the end would be
a little bit faster, because it would give the kernel some additional
freedom in scheduling the required I/O, but it isn't cutting the total
I/O required at all.  So I find it really hard to believe a 10x speedup.

regards, tom lane



Re: [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Andres Freund
On Monday 28 December 2009 23:54:51 Andres Freund wrote:
> On Saturday 12 December 2009 21:38:41 Andres Freund wrote:
> > On Saturday 12 December 2009 21:36:27 Michael Clemmons wrote:
> > > If ppl think it's worth it I'll create a ticket
> >
> > Thanks, no need. I will post a patch tomorrow or so.
> 
> Well. It was a long day...
> 
> Anyway.
> In this patch I delay the fsync done in copy_file and simply do a second
> pass over the directory in copydir and fsync everything in that pass,
> including the directory, which was not done before and might actually be
> necessary in some cases.
> I added a posix_fadvise(..., FADV_DONTNEED) to make it more likely that the
> copied file reaches storage before the fsync. Without it, the speed benefits
> were quite a bit smaller and essentially random (which seems sensible).
> 
> This speeds up CREATE DATABASE from ~9 seconds to something around 0.8s on
> my laptop.  Still slower than with fsync off (~0.25) but quite a worthy
> improvement.
> 
> The benefits are obviously bigger if anything has been added to the
> template database.
Obviously the patch would be helpful.

Andres
From bd80748883d1328a71607a447677b0bfb1f54ab0 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 28 Dec 2009 23:43:57 +0100
Subject: [PATCH] Delay fsyncing files during copying in CREATE DATABASE - this
 dramatically speeds up CREATE DATABASE on non battery backed
 rotational storage.
 Additionally fsync() the directory to ensure all metadata reaches
 storage.

---
 src/port/copydir.c |   58 +--
 1 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/src/port/copydir.c b/src/port/copydir.c
index a70477e..cde3dc7 100644
*** a/src/port/copydir.c
--- b/src/port/copydir.c
***
*** 37,42 
--- 37,43 
  
  
  static void copy_file(char *fromfile, char *tofile);
+ static void fsync_fname(char *fname);
  
  
  /*
*** copydir(char *fromdir, char *todir, bool
*** 64,69 
--- 65,73 
  				(errcode_for_file_access(),
  				 errmsg("could not open directory \"%s\": %m", fromdir)));
  
+ 	/*
+ 	 * Copy all the files
+ 	 */
  	while ((xlde = ReadDir(xldir, fromdir)) != NULL)
  	{
  		struct stat fst;
*** copydir(char *fromdir, char *todir, bool
*** 89,96 
  		else if (S_ISREG(fst.st_mode))
  			copy_file(fromfile, tofile);
  	}
- 
  	FreeDir(xldir);
  }
  
  /*
--- 93,120 
  		else if (S_ISREG(fst.st_mode))
  			copy_file(fromfile, tofile);
  	}
  	FreeDir(xldir);
+ 
+ 	/*
+ 	 * Be paranoid here and fsync all files to ensure we catch problems.
+ 	 */
+ 	xldir = AllocateDir(fromdir);
+ 	if (xldir == NULL)
+ 		ereport(ERROR,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not open directory \"%s\": %m", fromdir)));
+ 
+ 	while ((xlde = ReadDir(xldir, fromdir)) != NULL)
+ 	{
+ 		struct stat fst;
+ 
+ 		if (strcmp(xlde->d_name, ".") == 0 ||
+ 			strcmp(xlde->d_name, "..") == 0)
+ 			continue;
+ 
+ 		snprintf(tofile, MAXPGPATH, "%s/%s", todir, xlde->d_name);
+ 		fsync_fname(tofile);
+ 	}
  }
  
  /*
*** copy_file(char *fromfile, char *tofile)
*** 150,162 
  	}
  
  	/*
! 	 * Be paranoid here to ensure we catch problems.
  	 */
! 	if (pg_fsync(dstfd) != 0)
! 		ereport(ERROR,
! 				(errcode_for_file_access(),
! 				 errmsg("could not fsync file \"%s\": %m", tofile)));
! 
  	if (close(dstfd))
  		ereport(ERROR,
  				(errcode_for_file_access(),
--- 174,185 
  	}
  
  	/*
! 	 * We tell the kernel here to write the data back in order to make
! 	 * the later fsync cheaper.
  	 */
! #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
! 	posix_fadvise(dstfd, 0, 0, POSIX_FADV_DONTNEED);
! #endif
  	if (close(dstfd))
  		ereport(ERROR,
  				(errcode_for_file_access(),
*** copy_file(char *fromfile, char *tofile)
*** 166,168 
--- 189,212 
  
  	pfree(buffer);
  }
+ 
+ /*
+  * fsync a file
+  */
+ static void
+ fsync_fname(char *fname)
+ {
+ 	int	fd = BasicOpenFile(fname, O_RDWR| PG_BINARY,
+ 		  S_IRUSR | S_IWUSR);
+ 
+ 	if (fd < 0)
+ 		ereport(ERROR,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not open file \"%s\": %m", fname)));
+ 
+ 	if (pg_fsync(fd) != 0)
+ 		ereport(ERROR,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not fsync file \"%s\": %m", fname)));
+ 	close(fd);
+ }
-- 
1.6.5.12.gd65df24

