Re: [HACKERS] log chunking broken with large queries under load
On 04/04/2012 03:09 PM, Tom Lane wrote:
> I wrote:
>> The idea I had in mind was to compensate for adding list-removal logic
>> by getting rid of the concept of an unused entry. If the removal is
>> conditional then you can't do that and you end up with the complications
>> of both methods. Anyway I've not tried to code it yet.
>
> I concluded this would probably be a loser performance-wise, because it
> would add a couple of palloc/pfree cycles to the processing of each
> multi-chunk message, whether there was any contention or not. So I
> committed the patch with just some cosmetic cleanups.

OK, thanks for doing this.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] log chunking broken with large queries under load
I wrote:
> The idea I had in mind was to compensate for adding list-removal logic
> by getting rid of the concept of an unused entry. If the removal is
> conditional then you can't do that and you end up with the complications
> of both methods. Anyway I've not tried to code it yet.

I concluded this would probably be a loser performance-wise, because it
would add a couple of palloc/pfree cycles to the processing of each
multi-chunk message, whether there was any contention or not. So I
committed the patch with just some cosmetic cleanups.

			regards, tom lane
Re: [HACKERS] log chunking broken with large queries under load
Andrew Dunstan writes:
> On 04/04/2012 12:13 PM, Tom Lane wrote:
>> Does anyone feel that it's a bad idea that list entries are never
>> reclaimed? In the worst case a transient load peak could result in
>> a long list that permanently adds search overhead. Not sure if it's
>> worth the extra complexity to delete a list cell that's no longer
>> needed, rather than leaving it present and empty.

> Me either. The logic could possibly be something simple when we free a
> node like "while the list tail is an available node prune the tail". But
> as you say, it might not be worth it.

The idea I had in mind was to compensate for adding list-removal logic
by getting rid of the concept of an unused entry. If the removal is
conditional then you can't do that and you end up with the complications
of both methods. Anyway I've not tried to code it yet.

			regards, tom lane
Re: [HACKERS] log chunking broken with large queries under load
On 04/04/2012 12:13 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 04/02/2012 01:03 PM, Tom Lane wrote:
>>> When I said "list", I meant a "List *". No fixed size.
>> Ok, like this?
>
> I think this could use a bit of editorialization (I don't think the
> "stripe" terminology is still applicable, in particular), but the
> general idea seems OK.
>
> Does anyone feel that it's a bad idea that list entries are never
> reclaimed? In the worst case a transient load peak could result in
> a long list that permanently adds search overhead. Not sure if it's
> worth the extra complexity to delete a list cell that's no longer
> needed, rather than leaving it present and empty.

Me either. The logic could possibly be something simple when we free a
node like "while the list tail is an available node prune the tail". But
as you say, it might not be worth it.

>> Do we consider this a bug fix, to be backpatched?
>
> Yes, definitely.
>
> I think I'd like to have a go at coding it the other way (with release
> of list entries), just to see if that comes out cleaner or uglier than
> this way. If you don't mind I'll pick this up and commit whichever way
> turns out better.

Go for it.

cheers

andrew
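The "prune the tail" heuristic Andrew floats above can be sketched as follows. This is a minimal standalone model, not PostgreSQL's `List`/`save_buffer` code: the `node` type and `prune_free_tail` name are hypothetical, and a plain singly-linked list stands in for the real per-stripe list.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a per-stripe chunk-buffer entry.
 * pid == 0 means the entry is unused (matching the real code's convention). */
typedef struct node
{
	int			pid;
	struct node *next;
} node;

/*
 * After an entry is freed, drop every trailing unused node, so a transient
 * load peak doesn't permanently lengthen the list.  Returns the new head.
 */
static node *
prune_free_tail(node *head)
{
	node	  **cut = &head;	/* link following the last in-use node */
	node	  **p;

	for (p = &head; *p != NULL; p = &(*p)->next)
	{
		if ((*p)->pid != 0)
			cut = &(*p)->next;
	}
	while (*cut != NULL)		/* everything past 'cut' is unused: free it */
	{
		node	   *dead = *cut;

		*cut = dead->next;
		free(dead);
	}
	return head;
}
```

As the thread notes, whether this is worth doing depends on how often the tail actually ends up free; interior unused nodes are still retained.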
Re: [HACKERS] log chunking broken with large queries under load
Andrew Dunstan writes:
> On 04/02/2012 01:03 PM, Tom Lane wrote:
>> When I said "list", I meant a "List *". No fixed size.

> Ok, like this?

I think this could use a bit of editorialization (I don't think the
"stripe" terminology is still applicable, in particular), but the
general idea seems OK.

Does anyone feel that it's a bad idea that list entries are never
reclaimed? In the worst case a transient load peak could result in
a long list that permanently adds search overhead. Not sure if it's
worth the extra complexity to delete a list cell that's no longer
needed, rather than leaving it present and empty.

> Do we consider this a bug fix, to be backpatched?

Yes, definitely.

I think I'd like to have a go at coding it the other way (with release
of list entries), just to see if that comes out cleaner or uglier than
this way. If you don't mind I'll pick this up and commit whichever way
turns out better.

			regards, tom lane
Re: [HACKERS] log chunking broken with large queries under load
On 04/02/2012 01:03 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 04/02/2012 12:44 PM, Tom Lane wrote:
>>> You could do something like having a list of pending chunks for each
>>> value of (pid mod 256). The length of each such list ought to be plenty
>>> short under ordinary circumstances.
>> Yeah, ok, that should work. How big would we make each list to start
>> with? Still 20, or smaller?
>
> When I said "list", I meant a "List *". No fixed size.

Ok, like this?

Do we consider this a bug fix, to be backpatched?

cheers

andrew

diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index a603ab4..2e4d437 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -35,6 +35,7 @@
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgtime.h"
+#include "nodes/pg_list.h"
 #include "postmaster/fork_process.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
@@ -93,10 +94,11 @@ static char *last_file_name = NULL;
 static char *last_csv_file_name = NULL;
 
 /*
- * Buffers for saving partial messages from different backends. We don't expect
- * that there will be very many outstanding at one time, so 20 seems plenty of
- * leeway. If this array gets full we won't lose messages, but we will lose
- * the protocol protection against them being partially written or interleaved.
+ * Buffers for saving partial messages from different backends.
+ *
+ * Keep 256 lists of these, the list for a given pid being pid mod 256,
+ * so that for any message we don't have to search a long list looking
+ * for a match and/or an open slot.
  *
  * An inactive buffer has pid == 0 and undefined contents of data.
  */
@@ -106,8 +108,8 @@ typedef struct
 	StringInfoData data;		/* accumulated data, as a StringInfo */
 } save_buffer;
 
-#define CHUNK_SLOTS 20
-static save_buffer saved_chunks[CHUNK_SLOTS];
+#define CHUNK_STRIPES 256
+static List *buffer_stripes[CHUNK_STRIPES];
 
 /* These must be exported for EXEC_BACKEND case ... annoying */
 #ifndef WIN32
@@ -725,6 +727,7 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
 {
 	PipeProtoHeader p;
 	int			chunklen;
+	List	   *buffer_stripe;
 
 	/* Do we have a valid header? */
 	memcpy(&p, cursor, sizeof(PipeProtoHeader));
@@ -743,53 +746,55 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
 		dest = (p.is_last == 'T' || p.is_last == 'F') ?
 			LOG_DESTINATION_CSVLOG : LOG_DESTINATION_STDERR;
 
+		buffer_stripe = buffer_stripes[p.pid % CHUNK_STRIPES];
+
 		if (p.is_last == 'f' || p.is_last == 'F')
 		{
 			/*
-			 * Save a complete non-final chunk in the per-pid buffer if
-			 * possible - if not just write it out.
+			 * Save a complete non-final chunk in the per-pid buffer
 			 */
-			int			free_slot = -1,
-						existing_slot = -1;
-			int			i;
+			ListCell   *cell;
+			save_buffer *existing_slot = NULL,
+					   *free_slot = NULL;
 			StringInfo	str;
 
-			for (i = 0; i < CHUNK_SLOTS; i++)
+			foreach (cell, buffer_stripe)
 			{
-				if (saved_chunks[i].pid == p.pid)
+				save_buffer *buffnode = cell->data.ptr_value;
+				if (buffnode->pid == p.pid)
 				{
-					existing_slot = i;
+					existing_slot = buffnode;
 					break;
 				}
-				if (free_slot < 0 && saved_chunks[i].pid == 0)
-					free_slot = i;
+				if (free_slot == NULL && buffnode->pid == 0)
+					free_slot = buffnode;
 			}
-			if (existing_slot >= 0)
+			if (existing_slot != NULL)
 			{
-				str = &(saved_chunks[existing_slot].data);
+				str = &(existing_slot->data);
 				appendBinaryStringInfo(str,
									   cursor + PIPE_HEADER_SIZE, p.len);
 			}
-			else if (free_slot >= 0)
+			else
 			{
-				saved_chunks[free_slot].pid = p.pid;
-				str = &(saved_chunks[free_slot].data);
+				if (free_slot == NULL)
+				{
+					/*
+					 * Need a free slot, but there isn't one in the list,
+					 * so create a new one and extend the list with it.
+					 */
+					free_slot = palloc(sizeof(save_buffer));
+					buffer_stripe = lappend(buffer_stripe, free_slot);
+					buffer_stripes[p.pid % CHUNK_STRIPES] = buffer_stripe;
+				}
+				free_slot->pid = p.pid;
+				str = &(free_slot->data);
 				initStringInfo(str);
 				appendBinaryStringInfo(str,
									   cursor + PIPE_HEADER_SIZE, p.len);
 			}
-			else
-			{
-				/*
-				 * If there is no free slot we'll just have to take our
-				 * chances and write out a partial message and hope that
-				 * it's not followed by something from another pid.
-				 */
-				write_syslogger_file(cursor + PIPE_HEADER_SIZE, p.len,
-									 dest);
-			}
 		}
 		else
 		{
@@ -797,26 +802,28 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
 			 * Final chunk --- add it to anything saved for that pid, and
 			 * either way write the whole thing out.
 			 */
-			int			existing_slot = -1;
-			int			i;
+			ListCell   *cell;
+			sa
Re: [HACKERS] log chunking broken with large queries under load
Andrew Dunstan writes:
> On 04/02/2012 12:44 PM, Tom Lane wrote:
>> You could do something like having a list of pending chunks for each
>> value of (pid mod 256). The length of each such list ought to be plenty
>> short under ordinary circumstances.

> Yeah, ok, that should work. How big would we make each list to start
> with? Still 20, or smaller?

When I said "list", I meant a "List *". No fixed size.

			regards, tom lane
Re: [HACKERS] log chunking broken with large queries under load
On 04/02/2012 12:44 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 04/02/2012 12:00 PM, Tom Lane wrote:
>>> This seems like it isn't actually fixing the problem, only pushing out
>>> the onset of trouble a bit. Should we not replace the fixed-size array
>>> with a dynamic data structure?
>> But maybe you're right. If we do that and stick with my two-dimensional
>> scheme to keep the number of probes per chunk down, we'd need to reorg
>> the array every time we increased it. That might be a bit messy, but
>> might be ok. Or maybe linearly searching an array of several hundred
>> slots for our pid for every log chunk that comes in would be fast enough.
>
> You could do something like having a list of pending chunks for each
> value of (pid mod 256). The length of each such list ought to be plenty
> short under ordinary circumstances.

Yeah, ok, that should work. How big would we make each list to start
with? Still 20, or smaller?

cheers

andrew
Re: [HACKERS] log chunking broken with large queries under load
Andrew Dunstan writes:
> On 04/02/2012 12:00 PM, Tom Lane wrote:
>> This seems like it isn't actually fixing the problem, only pushing out
>> the onset of trouble a bit. Should we not replace the fixed-size array
>> with a dynamic data structure?

> But maybe you're right. If we do that and stick with my two-dimensional
> scheme to keep the number of probes per chunk down, we'd need to reorg
> the array every time we increased it. That might be a bit messy, but
> might be ok. Or maybe linearly searching an array of several hundred
> slots for our pid for every log chunk that comes in would be fast enough.

You could do something like having a list of pending chunks for each
value of (pid mod 256). The length of each such list ought to be plenty
short under ordinary circumstances.

			regards, tom lane
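The scheme Tom suggests here can be sketched in a few lines of standalone C. This is an illustrative model only: it uses a plain singly-linked list and `calloc` in place of PostgreSQL's `List` and `palloc`, and the names (`chunk_node`, `get_chunk_buffer`) are hypothetical, not from the patch.

```c
#include <assert.h>
#include <stdlib.h>

#define N_STRIPES 256

/* One pending-chunk buffer per backend pid; pid == 0 marks a reusable node. */
typedef struct chunk_node
{
	int			pid;
	struct chunk_node *next;
} chunk_node;

/* One list of pending chunks per value of (pid mod 256). */
static chunk_node *stripes[N_STRIPES];

/*
 * Find the buffer for this pid in its stripe's list, reusing the first
 * free node if there is no match, or appending a new node if the list
 * has no free node either.  The list only ever grows.
 */
static chunk_node *
get_chunk_buffer(int pid)
{
	chunk_node **tail = &stripes[pid % N_STRIPES];
	chunk_node *free_node = NULL;

	for (; *tail != NULL; tail = &(*tail)->next)
	{
		if ((*tail)->pid == pid)
			return *tail;		/* existing buffer for this backend */
		if (free_node == NULL && (*tail)->pid == 0)
			free_node = *tail;	/* remember first reusable node */
	}
	if (free_node == NULL)
	{
		free_node = calloc(1, sizeof(chunk_node));
		*tail = free_node;		/* no free node: extend the list */
	}
	free_node->pid = pid;
	return free_node;
}
```

The point of the mod-256 striping is that each lookup scans only the pids that hash to the same stripe, so the per-message search stays short even with hundreds of backends logging concurrently.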
Re: [HACKERS] log chunking broken with large queries under load
On 04/02/2012 12:00 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 04/01/2012 06:34 PM, Andrew Dunstan wrote:
>>> Some of my PostgreSQL Experts colleagues have been complaining to me
>>> that servers under load with very large queries cause CSV log files
>>> that are corrupted,
>> We could just increase CHUNK_SLOTS in syslogger.c, but I opted instead
>> to stripe the slots with a two dimensional array, so we didn't have to
>> search a larger number of slots for any given message. See the attached
>> patch.
>
> This seems like it isn't actually fixing the problem, only pushing out
> the onset of trouble a bit. Should we not replace the fixed-size array
> with a dynamic data structure?

"A bit" = 10 to 20 times - more if we set CHUNK_STRIPES higher. :-)

But maybe you're right. If we do that and stick with my two-dimensional
scheme to keep the number of probes per chunk down, we'd need to reorg
the array every time we increased it. That might be a bit messy, but
might be ok. Or maybe linearly searching an array of several hundred
slots for our pid for every log chunk that comes in would be fast enough.

cheers

andrew
Re: [HACKERS] log chunking broken with large queries under load
Andrew Dunstan writes:
> On 04/01/2012 06:34 PM, Andrew Dunstan wrote:
>> Some of my PostgreSQL Experts colleagues have been complaining to me
>> that servers under load with very large queries cause CSV log files
>> that are corrupted,

> We could just increase CHUNK_SLOTS in syslogger.c, but I opted instead
> to stripe the slots with a two dimensional array, so we didn't have to
> search a larger number of slots for any given message. See the attached
> patch.

This seems like it isn't actually fixing the problem, only pushing out
the onset of trouble a bit. Should we not replace the fixed-size array
with a dynamic data structure?

			regards, tom lane
Re: [HACKERS] log chunking broken with large queries under load
On 04/01/2012 06:34 PM, Andrew Dunstan wrote:
> Some of my PostgreSQL Experts colleagues have been complaining to me
> that servers under load with very large queries cause CSV log files
> that are corrupted, because lines are apparently multiplexed. The log
> chunking protocol between the errlog routines and the syslogger is
> supposed to prevent that, so I did a little work to try to reproduce
> it in a controlled way.

Well, a little further digging jogged my memory a bit. It looks like we
fairly badly underestimated the number of messages we might get as more
than one chunk. We could just increase CHUNK_SLOTS in syslogger.c, but I
opted instead to stripe the slots with a two dimensional array, so we
didn't have to search a larger number of slots for any given message.
See the attached patch.

I'm not sure how much we want to scale this up. I set CHUNK_STRIPES to
20 to start with, and I've asked some colleagues with very heavy log
loads with very large queries to test it out if possible. If anyone else
has a similar load I'd appreciate similar testing.

cheers

andrew

*** a/src/backend/postmaster/syslogger.c
--- b/src/backend/postmaster/syslogger.c
***************
*** 93,101 ****
  static char *last_file_name = NULL;
  static char *last_csv_file_name = NULL;
  
  /*
! * Buffers for saving partial messages from different backends. We don't expect
! * that there will be very many outstanding at one time, so 20 seems plenty of
! * leeway. If this array gets full we won't lose messages, but we will lose
  * the protocol protection against them being partially written or interleaved.
  *
  * An inactive buffer has pid == 0 and undefined contents of data.
--- 93,108 ----
  static char *last_csv_file_name = NULL;
  
  /*
! * Buffers for saving partial messages from different backends.
! *
! * Under heavy load we can get quite a few of them, so we stripe them across
! * an array based on the mod of the pid, which seems an adequate hash function
! * for pids. We do this rather than just keeping a one-dimensional array so
! * we don't have to probe too many slots for any given pid.
! * 20 stripes of 20 slots each seems likely to be adequate leeway, but if there
! * are lots of overruns we might need to increase CHUNK_STRIPES a bit, or look
! * at some alternative scheme.
! * If a stripe gets full we won't lose messages, but we will lose
  * the protocol protection against them being partially written or interleaved.
  *
  * An inactive buffer has pid == 0 and undefined contents of data.
***************
*** 107,113 ****
  typedef struct
  } save_buffer;
  
  #define CHUNK_SLOTS 20
! static save_buffer saved_chunks[CHUNK_SLOTS];
  
  /* These must be exported for EXEC_BACKEND case ... annoying */
  #ifndef WIN32
--- 114,121 ----
  } save_buffer;
  
  #define CHUNK_SLOTS 20
! #define CHUNK_STRIPES 20
! static save_buffer saved_chunks[CHUNK_STRIPES][CHUNK_SLOTS];
  
  /* These must be exported for EXEC_BACKEND case ... annoying */
  #ifndef WIN32
***************
*** 725,730 **** process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
--- 733,739 ----
  {
  	PipeProtoHeader p;
  	int			chunklen;
+ 	int			stripe;
  
  	/* Do we have a valid header? */
  	memcpy(&p, cursor, sizeof(PipeProtoHeader));
***************
*** 743,748 **** process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
--- 752,759 ----
  		dest = (p.is_last == 'T' || p.is_last == 'F') ?
  			LOG_DESTINATION_CSVLOG : LOG_DESTINATION_STDERR;
  
+ 		stripe = p.pid % CHUNK_STRIPES;
+ 
  		if (p.is_last == 'f' || p.is_last == 'F')
  		{
  			/*
***************
*** 756,780 **** process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
  			for (i = 0; i < CHUNK_SLOTS; i++)
  			{
! 				if (saved_chunks[i].pid == p.pid)
  				{
  					existing_slot = i;
  					break;
  				}
! 				if (free_slot < 0 && saved_chunks[i].pid == 0)
  					free_slot = i;
  			}
  			if (existing_slot >= 0)
  			{
! 				str = &(saved_chunks[existing_slot].data);
  				appendBinaryStringInfo(str,
  									   cursor + PIPE_HEADER_SIZE, p.len);
  			}
  			else if (free_slot >= 0)
  			{
! 				saved_chunks[free_slot].pid = p.pid;
! 				str = &(saved_chunks[free_slot].data);
  				initStringInfo(str);
  				appendBinaryStringInfo(str,
  									   cursor + PIPE_HEADER_SIZE,
--- 767,791 ----
  			for (i = 0; i < CHUNK_SLOTS; i++)
  			{
! 				if (saved_chunks[stripe][i].pid == p.pid)
  				{
  					existing_slot = i;
  					break;
  				}
! 				if (free_slot < 0 && saved_chunks[stripe][i].pid == 0)
  					free_slot = i;
  			}
  			if (existing_slot >= 0)
  			{
! 				str = &(saved_chunks[stripe][existing_slot].data);
  				appendBinaryStringInfo(str,
  									   cursor + PIPE_HEADER_SIZE, p.len);
  			}
  			else if (free_slot >= 0)
  			{
! 				saved_chunks[stripe][free_slot].pid = p.pid;
! 				str = &(saved_chunks[stripe][free_slot].data);
  				i
[HACKERS] log chunking broken with large queries under load
Some of my PostgreSQL Experts colleagues have been complaining to me
that servers under load with very large queries cause CSV log files that
are corrupted, because lines are apparently multiplexed. The log
chunking protocol between the errlog routines and the syslogger is
supposed to prevent that, so I did a little work to try to reproduce it
in a controlled way.

On my dual quad xeon setup, this script:

#!/bin/sh

par=$1
seq=$2

sed 2000q /usr/share/dict/words > words

psql -q -c 'drop table if exists foo'
psql -q -c 'create table foo (t text)'

echo '\set words `cat words`' > wordsin.sql
echo 'prepare fooplan (text) as insert into foo values ($1);' >> wordsin.sql

for i in `seq 1 $seq`; do
    echo "execute fooplan(:'words');" >> wordsin.sql
done

for i in `seq 1 $par`; do
    psql -q -t -f wordsin.sql &
done
wait

called with parameters of 100 and 50 (i.e. 100 simultaneous clients each
doing 50 very large inserts) is enough to cause CSV log corruption quite
reliably on PostgreSQL 9.1.

This is a serious bug. I'm going to investigate, but it's causing major
pain, so anyone else who has any ideas is welcome to chime in.

cheers

andrew