Re: [PATCHES] Load distributed checkpoint V4

2007-04-23 Thread Greg Smith

On Thu, 19 Apr 2007, Heikki Linnakangas wrote:

> In the sync phase, we sleep between each fsync until enough time/segments
> have passed, assuming that the time to fsync is proportional to the file
> length. I'm not sure that's a very good assumption.


I've been making scatter plots of fsync time vs. amount written to the 
database for a couple of months now, and while there's a trend there it's 
not a linear one based on data written.  Under Linux, to make a useful 
prediction about how long an fsync will take, you first need to consider how 
much dirty data is already in the OS cache (the "Dirty:" figure in 
/proc/meminfo) before the write begins, relative to the kernel parameters 
that control write behavior.  Combine that with some knowledge of the 
caching behavior of the controller/disk combination you're using, and it's 
just barely possible to make a reasonable estimate.  Any less information 
than all that and you really have very little basis on which to guess how 
long it's going to take.
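
If anyone wants to watch the same numbers on their own system, everything 
involved is exposed by the kernel; a minimal sketch, assuming a 2.6-era 
Linux kernel where these files exist:

   # dirty data currently sitting in the OS cache, waiting to be written
   grep ^Dirty: /proc/meminfo

   # writeback thresholds (as a % of memory) that control when background
   # flushing starts and when writers are forced to block
   cat /proc/sys/vm/dirty_background_ratio
   cat /proc/sys/vm/dirty_ratio

Watching how the Dirty figure moves relative to those two thresholds while 
the writes happen is the main input you need to even start guessing at 
fsync time.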


Other operating systems are going to give completely different behavior 
here, which of course makes the problem even worse.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-23 Thread Koichi Suzuki

Hi,

Sorry, since there are so many comments/questions, I'll reply inline.

Josh Berkus wrote:

> Hackers,


>> Writing lots of additional code simply to remove a parameter that
>> *might* be mis-interpreted doesn't sound useful to me, especially when
>> bugs may leak in that way. My take is that this is simple and useful
>> *and* we have it now; other ways don't yet exist, nor will they in time
>> for 8.3.


> How about naming the parameter wal_compressable?  That would indicate pretty 
> clearly that the parameter is intended to be used with wal_compress and 
> nothing else.


Hmm, that sounds nicer.



> However, I do agree with Andreas that anything which adds to WAL volume, even 
> 3%, seems like going in the wrong direction.  We already have higher log 
> output than any comparable database (higher than InnoDB by 3x) and we should 
> be looking for output to trim as well as compression.


> So the relevant question is whether the patch in its current form provides 
> enough benefit to make it worthwhile for 8.3, or whether we should wait for 
> 8.4.  Questions:




Before answering the questions below, I'd like to say that archive log 
optimization has to be addressed from a different point of view than the 
current (up to 8.2) settings.


1) To deal with partial/inconsistent writes to the data files at crash 
recovery, we need full page writes at the first modification to pages 
after each checkpoint.   This consumes much of the WAL space.


2) Point 1) is not necessary for archive recovery (PITR), so full page 
writes can be removed for that purpose.   However, we still need full page 
writes during hot backup to deal with partial writes by backup commands. 
This is implemented in 8.2.


3) To maintain the chance of crash recovery and reduce the amount of 
archive log, removing unnecessary full page writes from archive logs is a 
good choice.   To do this, we need both the logical log and the full page 
writes in WAL.


I don't think there should be only one setting.   It depends on how the 
database is operated.   Leaving wal_add_optimization_info = off as the 
default does not bring any change in WAL and archive log handling.   I 
understand some people may not be happy with an additional 3% or so 
increase in WAL size, especially people who don't need archive logs at 
all.   So I prefer to leave the default off.


For users, I think this is simple enough:

1) For people happy with the 8.2 settings:
   No change is needed to move to 8.3, and there's really no change.

2) For people who need to reduce archive log size but would like to keep 
full page writes in WAL (to maintain the chance of crash recovery):

   a) Add the GUC parameter: wal_add_optimization_info=on
   b) Change the archive command from "cp" to "pg_compresslog"
   c) Change the restore command from "cp" to "pg_decompresslog"

Archive logs can then be stored and restored just as in older releases.  A 
rough sketch of these settings follows below.
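
Just to make case 2) concrete, a rough sketch of the settings (the paths 
are placeholders, and the exact argument order of pg_compresslog and 
pg_decompresslog should be checked against the README in the patch):

   # postgresql.conf
   wal_add_optimization_info = on
   archive_command = 'pg_compresslog %p /mnt/archive/%f'

   # recovery.conf, when restoring from the archive
   restore_command = 'pg_decompresslog /mnt/archive/%f %p'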

> 1) Is there any throughput benefit for platforms with fast CPU but constrained 
> I/O (e.g. 2-drive webservers)?  Any penalty for servers with plentiful I/O?


I've only run benchmarks with the archive process running, because 
wal_add_optimization_info=on does not make sense if we don't archive 
WAL.   In this situation, total I/O decreases because writes to the 
archive log decrease.   Because of the 3% or so increase in WAL size 
there will be a slight increase in WAL writes, but the decrease in 
archive writes makes up for it.




> 2) Will this patch make attempts to reduce WAL volume in the future 
> significantly harder?


Yes, I'd like to continue working to reduce WAL size.   It's still 
an issue when the database grows to several hundreds of gigabytes in 
size.   Anyway, I think WAL size reduction has to be done in 
XLogInsert() or XLogWrite().   We need much more discussion on this. 
The issue will be how to maintain the chance of crash recovery despite 
inconsistent (partial) writes; with full_page_writes=off, we have to 
give that up.   On the other hand, we have to keep examining each WAL 
record.




> 3) How is this better than command-line compression for log-shipping?  e.g. 
> why do we need it in the database?


I don't fully understand what command-line compression means.   Simon 
suggested that this patch can be used with log-shipping, and I agree. 
If we compare it against gzip or other general-purpose compression, the 
compression ratio, CPU usage, and I/O of pg_compresslog are all quite a 
bit better than those of gzip.
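
If "command-line compression" means just compressing each archived segment 
with a general purpose tool, I suppose it is something like this (only an 
illustration, not tested):

   archive_command = 'gzip -c %p > /mnt/archive/%f.gz'
   restore_command = 'gunzip -c /mnt/archive/%f.gz > %p'

The difference is that pg_compresslog knows the WAL record format and can 
simply drop the full page writes which are not needed for archive 
recovery, so it does not have to look for redundancy byte by byte as gzip 
does.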


Please let me know if you intended differently.

Regards,

--
Koichi Suzuki



Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-23 Thread Josh Berkus
Hackers,

> Writing lots of additional code simply to remove a parameter that
> *might* be mis-interpreted doesn't sound useful to me, especially when
> bugs may leak in that way. My take is that this is simple and useful
> *and* we have it now; other ways don't yet exist, nor will they in time
> for 8.3.

How about naming the parameter wal_compressable?  That would indicate pretty 
clearly that the parameter is intended to be used with wal_compress and 
nothing else.

However, I do agree with Andreas that anything which adds to WAL volume, even 
3%, seems like going in the wrong direction.  We already have higher log 
output than any comparable database (higher than InnoDB by 3x) and we should 
be looking for output to trim as well as compression.

So the relevant question is whether the patch in its current form provides 
enough benefit to make it worthwhile for 8.3, or whether we should wait for 
8.4.  Questions:

1) Is there any throughput benefit for platforms with fast CPU but constrained 
I/O (e.g. 2-drive webservers)?  Any penalty for servers with plentiful I/O?

2) Will this patch make attempts to reduce WAL volume in the future 
significantly harder?

3) How is this better than command-line compression for log-shipping?  e.g. 
why do we need it in the database?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-23 Thread Zeugswetter Andreas ADI SD

> I don't insist on the name and the default of the GUC parameter.
> I'm afraid wal_fullpage_optimization = on (default) causes
> some confusion because the default behavior becomes a bit
> different on WAL itself.

Seems my wal_fullpage_optimization is not a good name if it caused
misinterpretation already :-(

> >> Amount of WAL after 60min. run of DBT-2 benchmark 
> >> wal_add_optimization_info = off (default) 3.13GB
> > 
> > how about wal_fullpage_optimization = on (default)

The meaning of wal_fullpage_optimization = on (default)
would be the same as your wal_add_optimization_info = off (default).
(Reversed name, reversed meaning of the boolean value)

It would be there to *turn off* the (default) WAL full_page optimization.
For your pg_compresslog it would need to be set to off.
"add_optimization_info" sounded like added info about/for some optimization,
which it is not.  We turn off an optimization with the flag for the benefit
of an easier pg_compresslog implementation.

As already said, I would decouple this setting from the part that sets
the "removable full page" flag in WAL and makes recovery able to skip
dummy records.  That part I would do unconditionally.

Andreas



Re: [PATCHES] fix LOCK_DEBUG

2007-04-23 Thread Neil Conway
On Mon, 2007-04-23 at 14:22 +0100, Heikki Linnakangas wrote:
> 8.2 branch doesn't compile with LOCK_DEBUG enabled

Applied, thanks.

-Neil





[PATCHES] fix LOCK_DEBUG

2007-04-23 Thread Heikki Linnakangas
The 8.2 branch doesn't compile with LOCK_DEBUG enabled because of a missing 
include.  The missing include was added to CVS HEAD in December.


Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.360
diff -c -r1.360 guc.c
*** src/backend/utils/misc/guc.c    29 Nov 2006 14:50:07 -0000    1.360
--- src/backend/utils/misc/guc.c    23 Apr 2007 13:21:47 -0000
***************
*** 26,31 ****
--- 26,32 ----
  #endif
  
+ #include "access/transam.h"
  #include "access/gin.h"
  #include "access/twophase.h"
  #include "access/xact.h"


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Dead Space Map version 3 (simplified)

2007-04-23 Thread Gregory Stark
"Hiroki Kataoka" <[EMAIL PROTECTED]> writes:

> I think there is no problem.  Bloating will create pages containing
> unnecessary areas that will not be accessed.  Soon, those pages will be
> registered in the DSM.

Except the whole point of the DSM is to let us vacuum those pages *before*
that happens...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

