Re: [HACKERS] Making pg_standby compression-friendly
Koichi Suzuki wrote:
> As Heikki pointed out, the issue is not only to decompress the compressed
> WAL, but also how we can keep the archived log compressed after it is
> handled by pg_standby.

pg_standby makes a *copy* of the segment from the archive, and need only ensure that the copy is decompressed; it has no reason to ever decompress the original version in the archive. I don't see the problem here.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
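To make the copy-vs-original distinction concrete, here is a minimal sketch (all file names hypothetical, and a short string stands in for a 16MB segment): the segment in the archive stays gzipped; only the working copy handed to the recovering server is expanded.

```shell
set -e
mkdir -p archive pg_xlog
printf 'WAL-DATA' > seg              # stand-in for a WAL segment
gzip -c seg > archive/seg.gz         # the archive holds only the compressed form
rm seg
# restore step: decompress into the copy the server will read,
# leaving archive/seg.gz untouched and still compressed
gunzip -c archive/seg.gz > pg_xlog/RECOVERYXLOG
```

The archive's storage footprint stays at the compressed size throughout; decompression cost is paid only once per restored copy.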
Re: [HACKERS] Making pg_standby compression-friendly
In the absence of further feedback from y'all (and in the presence of some positive results from internal QA), I'm adding the posted patch as-is to the 2008-11 CommitFest queue. That said, any additional feedback would be greatly appreciated.
Re: [HACKERS] Making pg_standby compression-friendly
On Thu, Oct 23, 2008 at 1:15 AM, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> Charles Duffy wrote:
>> I'm interested in compressing archived WAL segments in an environment
>> set up for PITR in the interests of reducing both network traffic and
>> storage requirements. However, pg_standby presently checks file sizes,
>> requiring that an archive segment be exactly the right size to be
>> considered valid. The idea of compressing log segments is not new --
>> the clearxlogtail project on pgFoundry provides a tool to make such
>> compression more effective, and is explicitly intended for said
>> purpose -- but as of 8.3.4, pg_standby appears not to support such
>> environments; I propose adding such support.
>
> Can't you decompress the files in whatever script you use to copy them to
> the archive location?

To be sure I understand -- you're proposing a scenario in which the archive_command on the master compresses the files, passes them over to the slave while compressed, and then decompresses them on the slave for storage in their decompressed state?

That succeeds in the goal of decreasing network bandwidth, but (1) isn't necessarily easy to implement over NFS, and (2) doesn't succeed in decreasing storage requirements on the slave. (While pg_standby's behavior is to delete segments which are no longer needed to keep a warm standby slave running, I maintain a separate archive for PITR use with hardlinked copies of those same archive segments; storage on the slave is a much bigger issue in this environment than it would be if the space used for segments were being deallocated as soon as pg_standby chose to unlink them.)

[Heikki, please accept my apologies for the initial off-list response; I wasn't paying enough attention to gmail's default reply behavior.]
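For concreteness, the scheme under discussion looks roughly like this (directory names hypothetical, a short string standing in for a segment): the master compresses before shipping, and a slave-side pickup script decompresses into the archive pg_standby reads. Transfer bandwidth is saved, but the slave's archive ends up holding full-size segments again.

```shell
set -e
mkdir -p master_wal nfs_spool slave_archive
printf 'WAL-DATA' > master_wal/000000010000000000000001   # stand-in segment
# master side: archive_command compresses before shipping, along the lines of
#   archive_command = 'gzip -c %p > /nfs/spool/%f.gz'
gzip -c master_wal/000000010000000000000001 > nfs_spool/000000010000000000000001.gz
# slave side: the pickup script decompresses into the archive pg_standby reads,
# so storage there is back to the uncompressed segment size
gunzip -c nfs_spool/000000010000000000000001.gz > slave_archive/000000010000000000000001
```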
[HACKERS] Making pg_standby compression-friendly
Howdy, all.

I'm interested in compressing archived WAL segments in an environment set up for PITR in the interests of reducing both network traffic and storage requirements. However, pg_standby presently checks file sizes, requiring that an archive segment be exactly the right size to be considered valid. The idea of compressing log segments is not new -- the clearxlogtail project on pgFoundry provides a tool to make such compression more effective, and is explicitly intended for said purpose -- but as of 8.3.4, pg_standby appears not to support such environments; I propose adding such support.

To allow pg_standby to operate in an environment where archive segments are compressed, two behaviors are necessary:

- suppressing the file-size checks. This puts the onus on the user to create these files via an atomic mechanism, but is necessary to allow compressed files to be considered.
- allowing a custom restore command to be provided. This permits the user to specify the mechanism to be used to decompress the segment. One bikeshed is determining whether the user should pass in a command suitable for use in a pipeline or a command which accepts input and output as arguments.

A sample implementation is attached, intended only to kickstart discussion; I'm not attached to either its implementation or its proposed command-line syntax. Thoughts?

--- pg_standby.c.orig	2008-07-08 10:12:04.0 -0500
+++ pg_standby.c	2008-10-22 19:05:41.0 -0500
@@ -50,9 +50,11 @@
 bool		triggered = false;	/* have we been triggered? */
 bool		need_cleanup = false;	/* do we need to remove files from
					 * archive? */
+bool		disable_size_checks = false;	/* avoid checking segment size */
 
 static volatile sig_atomic_t signaled = false;
 
+char	   *customRestore;	/* Filter or command used to restore segments */
 char	   *archiveLocation;	/* where to find the archive? */
 char	   *triggerPath;	/* where to find the trigger file? */
 char	   *xlogFilePath;	/* where we are going to restore to */
@@ -66,6 +68,8 @@
 
 #define RESTORE_COMMAND_COPY 0
 #define RESTORE_COMMAND_LINK 1
+#define RESTORE_COMMAND_PIPE 2
+#define RESTORE_COMMAND_CUST 3
 int			restoreCommandType;
 
 #define XLOG_DATA 0
@@ -112,8 +116,15 @@
 	snprintf(WALFilePath, MAXPGPATH, "%s\\%s", archiveLocation, nextWALFileName);
 	switch (restoreCommandType)
 	{
+		case RESTORE_COMMAND_PIPE:
+			snprintf(restoreCommand, MAXPGPATH, "%s <\"%s\" >\"%s\"", customRestore, WALFilePath, xlogFilePath);
+			break;
+		case RESTORE_COMMAND_CUST:
+			SET_RESTORE_COMMAND(customRestore, WALFilePath, xlogFilePath);
+			break;
 		case RESTORE_COMMAND_LINK:
 			SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
+			break;
 		case RESTORE_COMMAND_COPY:
 		default:
 			SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
@@ -123,6 +134,12 @@
 	snprintf(WALFilePath, MAXPGPATH, "%s/%s", archiveLocation, nextWALFileName);
 	switch (restoreCommandType)
 	{
+		case RESTORE_COMMAND_PIPE:
+			snprintf(restoreCommand, MAXPGPATH, "%s <\"%s\" >\"%s\"", customRestore, WALFilePath, xlogFilePath);
+			break;
+		case RESTORE_COMMAND_CUST:
+			snprintf(restoreCommand, MAXPGPATH, "%s \"%s\" \"%s\"", customRestore, WALFilePath, xlogFilePath);
+			break;
 		case RESTORE_COMMAND_LINK:
 #if HAVE_WORKING_LINK
 			SET_RESTORE_COMMAND("ln -s -f", WALFilePath, xlogFilePath);
@@ -170,7 +187,7 @@
 			nextWALFileType = XLOG_BACKUP_LABEL;
 			return true;
 		}
-		else if (stat_buf.st_size == XLOG_SEG_SIZE)
+		else if (disable_size_checks || stat_buf.st_size == XLOG_SEG_SIZE)
 		{
 #ifdef WIN32
@@ -190,7 +207,7 @@
 		/*
 		 * If still too small, wait until it is the correct size
 		 */
-		if (stat_buf.st_size > XLOG_SEG_SIZE)
+		if ((!disable_size_checks) && stat_buf.st_size > XLOG_SEG_SIZE)
 		{
 			if (debug)
 			{
@@ -432,12 +449,15 @@
 	fprintf(stderr, "note space between ARCHIVELOCATION and NEXTWALFILE\n");
 	fprintf(stderr, "with main intended use as a restore_command in the recovery.conf\n");
 	fprintf(stderr, "  restore_command = 'pg_standby [OPTION]... ARCHIVELOCATION %%f %%p %%r'\n");
-	fprintf(stderr, "e.g. restore_command = 'pg_standby -l /mnt/server/archiverdir %%f %%p %%r'\n");
+	fprintf(stderr, "e.g. restore_command = 'pg_standby -l /mnt/server/archiverdir %%f %%p %%r'\n\n");
+	fprintf(stderr, "If -C or -p are used, the archive must be populated using atomic calls (ie. rename).\n");
 	fprintf(stderr, "\nOptions:\n");
+	fprintf(stderr, "  -C COMMAND	invoke command for retrieval from the archive (as \"COMMAND source dest\")\n");
 	fprintf(stderr, "  -c		copies file from archive (default)\n");
 	fprintf(stderr, "  -d		generate lots of debugging output (testing only)\n");
 	fprintf(stderr, "  -k NUMFILESTOKEEP	if RESTARTWALFILE not used, removes files prior to limit (0 keeps all)\n");
 	fprintf(stderr, "  -l		links into archive (leaves file in archive)\n");
+	fprintf(stderr, "  -p COMMAND	pipe through command on retrieval from the archive (ie. 'gzip -c')\n");
	fp
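Since the patch drops the size check, the atomic-population requirement in its help text carries real weight. A hedged sketch of an archiving script honoring it (file names hypothetical, a short string standing in for a segment): compress to a temporary name on the same filesystem, then rename into place, since rename(2) is atomic and pg_standby can never observe a half-written file.

```shell
set -e
mkdir -p standby_archive
printf 'WAL-DATA' > segfile                      # stand-in for the %p segment
# write under a temporary name first; readers never see a partial file
gzip -c segfile > standby_archive/.seg.gz.tmp
# atomic rename makes the compressed segment visible all at once
mv standby_archive/.seg.gz.tmp standby_archive/seg.gz
```

With the patch's proposed syntax, retrieval could then be configured along the lines of restore_command = 'pg_standby -p "gunzip -c" /path/to/standby_archive %f %p %r'.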
Re: [HACKERS] [PATCHES] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()
On 7/15/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> Anyway, Qingqing's question still needs to be answered: how can a sort
> of under 30k items take so long?

It happens because (as previously suggested by Tom) the dataset for the 'short' (~10k rows, .3 sec) sort has no rows whose leftmost fields evaluate to 'equal' when passed to the qsort compare function. The 'long' sort (~30k rows, 78 sec) has plenty of rows whose first 6 columns all evaluate as 'equal' when the rows are compared. For the 'long' data, the compare moves on rightward until it encounters 'flato', which is a TEXT column with an average length of 7.5k characters (with some rows up to 400k). The first 6 columns are mostly INTEGER, so compares on them are relatively inexpensive. All the expensive compares on 'flato' account for the disproportionate difference in sort times, relative to the number of rows in each set.

As for the potential for memory leaks - thinking about it.

Thanks,
Charles Duffy.

Peter Eisentraut <[EMAIL PROTECTED]> writes:
> The merge sort is here:
> http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21&content-type=text/x-cvsweb-markup&cvsroot=glibc
> It uses alloca, so we're good here.

Uh ... but it also uses malloc, and potentially a honkin' big malloc at that (up to a quarter of physical RAM). So I'm worried again.

Anyway, Qingqing's question still needs to be answered: how can a sort of under 30k items take so long?

regards, tom lane

 Column | Type    | Modifiers
--------+---------+-----------
 record | integer |
 commr1 | integer |
 envr1  | oid     |
 docin  | integer |
 creat  | integer |
 flati  | text    |
 flato  | text    |
 doc    | text    |
 docst  | integer |
 vlord  | integer |
 vl0    | integer |
 vl1    | date    |
 vl2    | text    |
 vl3    | text    |
 vl4    | text    |
 vl5    | text    |
 vl6    | text    |
 vl7    | date    |
 vl8    | text    |
 vl9    | integer |

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster