Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
On 02/13/2014 08:44 AM, Kyotaro HORIGUCHI wrote:
>>>> Wouldn't it be better to not archive the old segment, and instead switch to a new segment after writing the end-of-recovery checkpoint, so that the segment on the new timeline is archived sooner?
>>> It would be better to zero-fill and switch segments, yes. We should NEVER be in a position of archiving two different versions of the same segment.
>> Ok, I think we're in agreement that that's the way to go for master.
> I'm almost inclined to that, but thinking about recovery up to a target timeline, the old segment turns out to be necessary for that case. Without the old segment, we would be obliged to seek to the first segment of the *next* timeline (is there any (simple) means to predict where that is?) to complete the task.

How did the server that created the new timeline get the old, partial, segment? Was it already archived? Or did the DBA copy it into pg_xlog manually? Or was it streamed by streaming replication? Whatever the mechanism, the same mechanism ought to make sure the old segment is available for PITR, too.

Hmm. If you have set up streaming replication and a WAL archive, and your master dies and you fail over to a standby, what you describe does happen. The partial old segment is not in the archive, so you cannot PITR to a point on the old timeline that falls within the partial segment (i.e. just before the failover). However, it's not guaranteed that all the preceding WAL segments on the old timeline were already archived anyway, so even if the partial segment is archived, it's not guaranteed to work.

The old master is responsible for archiving the WAL on the old timeline, and the new master is responsible for archiving all the WAL on the new timeline. That's a straightforward, easy-to-understand rule. It might be useful to have a mode where the standby also archives all the received WAL, but that would need to be a separate option.

> Is it the right way we kick the older one out of the archive?

If it's already in the archive, it's not going to be removed from the archive.

- Heikki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Hi all,

On 02/12/2014 08:27 PM, Greg Stark wrote:
> On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Greg Stark <st...@mit.edu> writes:
>>> For what it's worth I've confirmed the bug in wal-e caused the initial problem.
>> Huh? Bug in wal-e? What bug?
> WAL-E actually didn't restore a whole 1GB file due to a transient S3 problem; in fact, a bunch of them. It's remarkable that Postgres kept going with that much data missing. But the arithmetic worked out on the case I checked, which was the last one I sent the xlog record for last night. In that case there was precisely one segment missing, and the relation was extended by the number of segments you would expect if it filled in that missing segment and then jumped to the end of the relation.

Sorry for interrupting, but did we already notify WAL-E's maintainer?

Andrea

PS: cc'ed Daniel
[HACKERS] how set GUC_check_errhint_string in call_string_check_hook()
Dear all,

I need help sending a hint message to ereport() in the call_string_check_hook function [src/backend/utils/misc/guc.c:8092]. In call_string_check_hook(), the following variables are reset:

    /* Reset variables that might be set by hook */
    GUC_check_errcode_value = ERRCODE_INVALID_PARAMETER_VALUE;
    GUC_check_errmsg_string = NULL;
    GUC_check_errdetail_string = NULL;
    GUC_check_errhint_string = NULL;

In the ereport() call starting at line 8107, there is a conditional expression (lines 8113-8114):

    GUC_check_errdetail_string ?
        errdetail_internal("%s", GUC_check_errdetail_string) : 0,

I have basic questions:

1. Is the condition test of GUC_check_errdetail_string (and the other GUC_* variables) needed, even though we just reset them?
2. Can I pass a hint message in the above ereport(), and how?

Thanks in advance!

Regards,
Amul Sul
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
On 02/12/2014 01:24 PM, Christoph Berg wrote:
> Re: Heikki Linnakangas 2014-01-13 <52d3caff.3010...@vmware.com>
>> Actually, why is the partially-filled 00010002 file archived in the first place? Looking at the code, it's been like that forever, but it seems like a bad idea. If the original server is still up and running, and writing more data to that file, what will happen is that when the original server later tries to archive it, it will fail because the partial version of the file is already in the archive. Or worse, the partial version overwrites a previously archived, more complete version.
> Oh! This explains some transient errors I've seen.
>> Wouldn't it be better to not archive the old segment, and instead switch to a new segment after writing the end-of-recovery checkpoint, so that the segment on the new timeline is archived sooner? It would be better to zero-fill and switch segments, yes. We should NEVER be in a position of archiving two different versions of the same segment. Ok, I think we're in agreement that that's the way to go for master. Now, what to do about back-branches? On one hand, I'd like to apply the same fix to all stable branches, as the current behavior is silly and always has been. On the other hand, we haven't heard any complaints about it, so we probably shouldn't fix what ain't broken. Perhaps we should apply it to 9.3, as that's where we have the acute problem the OP reported. Thoughts? In summary, I propose that we change master and REL9_3_STABLE to not archive the partial segment from the previous timeline. Older branches will keep the current behavior.
> I've seen the "can't archive file from the old timeline" problem on 9.2 and 9.3 slaves after promotion. The problem is in conjunction with the proposed archive_command in the default postgresql.conf comments:
>
>   # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
>
> With 9.1, it works, but 9.2 and 9.3 don't archive anything until I remove the "test ! -f" part. (An alternative fix would be to declare the behavior ok and adjust that example in the config.)

Hmm, the behavior is the same in 9.1 and 9.2. Did you use a different archive_command in 9.1, without the test?

- Heikki
Re: [HACKERS] how set GUC_check_errhint_string in call_string_check_hook()
Amul,

On Thu, Feb 13, 2014 at 5:59 PM, amul sul <sul_a...@yahoo.co.in> wrote:
> I have basic questions:
> 1. Is the condition test of GUC_check_errdetail_string (and the other GUC_* variables) needed, even though we just reset them?

A few lines before this conditional test, there is a call to the hook function (through the check_hook function pointer), which might set these variables:

    if (!(*conf->check_hook) (newval, extra, source))

> 2. Can I pass a hint message in the above ereport(), and how?

It can be done by setting the check-hook-related global variable GUC_check_errhint_string inside your hook function.

-- 
Amit
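To make the reset-then-set dance concrete, here is a small Python model of what guc.c does around these globals. The names mirror the C variables, but this is only an illustrative sketch, not PostgreSQL code (in C, a check hook would typically set the hint through the GUC_check_errhint() macro). It answers both questions at once: the conditional test is needed because the hook runs between the reset and the ereport(), and a hint is passed by the hook setting the global.

```python
# Module-level "globals", mirroring GUC_check_* in guc.c (illustrative only).
guc_check_errdetail_string = None
guc_check_errhint_string = None

def my_check_hook(newval):
    """A check hook: reject bad values and leave a hint for the caller,
    the way a C hook would set GUC_check_errhint_string."""
    global guc_check_errhint_string
    if not newval.isalnum():
        guc_check_errhint_string = "Use only letters and digits."
        return False
    return True

def call_string_check_hook(check_hook, newval):
    global guc_check_errdetail_string, guc_check_errhint_string
    # Reset variables that might be set by the hook (guc.c does the same).
    guc_check_errdetail_string = None
    guc_check_errhint_string = None
    if not check_hook(newval):
        # The conditional test is needed because the hook may (or may not)
        # have set these between the reset above and this point.
        parts = ["invalid value: %r" % (newval,)]
        if guc_check_errhint_string is not None:
            parts.append("HINT: " + guc_check_errhint_string)
        raise ValueError("  ".join(parts))
    return newval

print(call_string_check_hook(my_check_hook, "abc123"))  # abc123
```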
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
> I think what you're arguing is that we should see WAL records filling the rest of segment 1 before we see any references to segment 2, but if that's the case then how did we get into the situation you reported? Or is it just that it was a broken base backup to start with?

The scenario I could come up with that didn't require a broken base backup was that there was an earlier truncate or vacuum. So the sequence is: high-offset reference, truncate, growth, crash. All possibly on a single database. It's possible we're better off not assuming we've thought of all possible ways this can happen, though.
[HACKERS] New hook after raw parsing, before analyze
Hello Hackers,

I work on a foreign data wrapper for a legacy system. I generally find the hook system a very useful and flexible way to extend Postgres. The post_parse_analyze_hook almost fits what I need, but I have a few use cases where I would need to tap right into the parsed queries before any catalog-based validation is done. Please find attached a trivial patch for this new hook.

One of the use cases I have is this: I have table-like data structures in the source system for the FDW I work on. These tables are sometimes too big, and the source system is able to filter and join them (with limitations), so it is not optimal to transfer the data to Postgres. At the same time I want the users to think in terms of the original tables. The idea is to rewrite SQL queries like this:

  SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND b.col2=987

to:

  SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987

This rewritten query would be handled by the FDW table that I previously added to the catalog. The reason I want this new hook is that I don't want tableA and tableB to be in the catalog.

Looking forward to hearing your thoughts, opinions, comments.

Best regards, David

Attachment: post_raw_parse.diff
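As a toy illustration of the rewrite described above — purely string-level, whereas the proposed hook would of course operate on the raw parse tree in C — the mapping could look like this (table names tableA/tableB and fdw_tableA_tableB are the examples from the mail; everything else is hypothetical):

```python
import re

# Hypothetical mapping from a joined pair of legacy tables to the
# pre-registered FDW table that serves the same data.
JOIN_MAP = {
    ("tableA", "tableB"): "fdw_tableA_tableB",
}

def rewrite(sql):
    """Rewrite 'FROM tableA a, tableB b WHERE a.id=b.id AND ...' into a
    single scan of the combined FDW table. Illustrative only: a real hook
    would inspect the raw parse tree, not apply regexes."""
    m = re.match(r"SELECT \* FROM (\w+) (\w+), (\w+) (\w+) "
                 r"WHERE \2\.id=\4\.id AND (.*)", sql)
    if not m:
        return sql
    t1, a1, t2, a2, rest = m.groups()
    target = JOIN_MAP.get((t1, t2))
    if target is None:
        return sql
    # Re-point the remaining predicates at the combined table's alias.
    for alias in (a1, a2):
        rest = re.sub(r"\b%s\." % re.escape(alias), "ab.", rest)
    return "SELECT * FROM %s ab WHERE %s" % (target, rest)

print(rewrite("SELECT * FROM tableA a, tableB b "
              "WHERE a.id=b.id AND a.col1=1234 AND b.col2=987"))
# SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987
```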
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Feb 13, 2014 at 10:20:46AM +0530, Amit Kapila wrote:
>> Why not *only* prefix/suffix?
> To represent a prefix/suffix match, we at least need a way to record the offset and length of the matched bytes, and then the length of the unmatched bytes we have copied. I agree that a simpler format could be devised if we just want to do prefix/suffix matching, but that would require much more testing during recovery to ensure everything is fine. The advantage of the LZ format is that we don't need to bother about decoding; it works without much change in the LZ decode routine.

Based on the numbers, I think prefix/suffix-only needs to be explored. Consider: if you just change one field of a row, prefix/suffix would find all the matching parts. If you change the first and last fields, you get no compression at all, but your prefix/suffix test isn't going to get that either. As I understand it, the only place prefix/suffix with LZ compression is a win over prefix/suffix-only is when you change two middle fields and there are common fields unchanged between them. If we are looking at 11% CPU overhead for that, it isn't worth it.

-- 
Bruce Momjian  <br...@momjian.us>  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
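To make the trade-off concrete, here is a quick sketch (not the patch's actual encoding) of what a prefix/suffix delta measures: compare the old and new tuple images and count how many leading and trailing bytes match; only the middle would need to be stored verbatim.

```python
def prefix_suffix_match(old, new):
    """Return (prefix_len, suffix_len, unmatched_middle) for a delta that
    stores only the bytes of `new` not covered by a common prefix/suffix
    of `old`. Illustrative only; the real WAL format differs."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]

# One middle field changed: almost everything is covered by prefix+suffix.
old = b"id=42|name=alice|balance=100|flags=0"
new = b"id=42|name=alice|balance=175|flags=0"
p, s, mid = prefix_suffix_match(old, new)
print(p, s, mid)  # prints: 26 8 b'75'
```

With a single changed middle field, the prefix and suffix cover nearly the whole tuple; change the first and last bytes and both lengths drop to zero, which is the case Bruce notes neither approach helps with.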
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
From: Andres Freund <and...@2ndquadrant.com>
> Could you try if you get more readable dumps by using disassemble/m? That might at least print line numbers if you have debug info installed.

Please find the attached file. I hope this will reveal something.

Regards
MauMau

Dump of assembler code for function LWLockRelease:
   0x00647d40 <+0>:	push   %r12
   0x00647d42 <+2>:	mov    %edi,%r12d
   0x00647d45 <+5>:	shl    $0x5,%r12
   0x00647d49 <+9>:	add    0x4f3a98(%rip),%r12        # 0xb3b7e8 <LWLockArray>
   0x00647d50 <+16>:	push   %rbp
   0x00647d51 <+17>:	mov    %edi,%ebp
   0x00647d53 <+19>:	push   %rbx
   0x00647d54 <+20>:	mov    0x4f3a86(%rip),%ebx        # 0xb3b7e0 <num_held_lwlocks>
   0x00647d5a <+26>:	nopw   0x0(%rax,%rax,1)
   0x00647d60 <+32>:	sub    $0x1,%ebx
   0x00647d63 <+35>:	js     0x647ea4 <LWLockRelease+356>
   0x00647d69 <+41>:	movslq %ebx,%rax
   0x00647d6c <+44>:	cmp    %ebp,0xb3b800(,%rax,4)
   0x00647d73 <+51>:	jne    0x647d60 <LWLockRelease+32>
   0x00647d75 <+53>:	mov    0x4f3a65(%rip),%esi        # 0xb3b7e0 <num_held_lwlocks>
   0x00647d7b <+59>:	sub    $0x1,%esi
   0x00647d7e <+62>:	cmp    %ebx,%esi
   0x00647d80 <+64>:	mov    %esi,0x4f3a5a(%rip)        # 0xb3b7e0 <num_held_lwlocks>
   0x00647d86 <+70>:	jg     0x647d92 <LWLockRelease+82>
   0x00647d88 <+72>:	jmp    0x647dad <LWLockRelease+109>
   0x00647d8a <+74>:	nopw   0x0(%rax,%rax,1)
   0x00647d90 <+80>:	mov    %ecx,%ebx
   0x00647d92 <+82>:	lea    0x1(%rbx),%ecx
   0x00647d95 <+85>:	movslq %ebx,%rax
   0x00647d98 <+88>:	movslq %ecx,%rdx
   0x00647d9b <+91>:	cmp    %ecx,%esi
   0x00647d9d <+93>:	mov    0xb3b800(,%rdx,4),%edx
   0x00647da4 <+100>:	mov    %edx,0xb3b800(,%rax,4)
   0x00647dab <+107>:	jg     0x647d90 <LWLockRelease+80>
   0x00647dad <+109>:	mov    $0x1,%eax
   0x00647db2 <+114>:	lock xchg %al,(%r12)
   0x00647db7 <+119>:	test   %al,%al
   0x00647db9 <+121>:	jne    0x647ee4 <LWLockRelease+420>
   0x00647dbf <+127>:	movzbl 0x2(%r12),%eax
   0x00647dc5 <+133>:	test   %al,%al
   0x00647dc7 <+135>:	jle    0x647f04 <LWLockRelease+452>
   0x00647dcd <+141>:	movzbl 0x2(%r12),%eax
   0x00647dd3 <+147>:	sub    $0x1,%eax
   0x00647dd6 <+150>:	mov    %al,0x2(%r12)
   0x00647ddb <+155>:	mov    0x8(%r12),%rcx
   0x00647de0 <+160>:	test   %rcx,%rcx
   0x00647de3 <+163>:	je     0x647def <LWLockRelease+175>
   0x00647de5 <+165>:	movzbl 0x2(%r12),%eax
   0x00647deb <+171>:	test   %al,%al
   0x00647ded <+173>:	je     0x647e08 <LWLockRelease+200>
   0x00647def <+175>:	movb   $0x0,(%r12)
   0x00647df4 <+180>:	pop    %rbx
   0x00647df5 <+181>:	pop    %rbp
   0x00647df6 <+182>:	mov    0x53d25c(%rip),%eax        # 0xb85058 <InterruptHoldoffCount>
   0x00647dfc <+188>:	pop    %r12
   0x00647dfe <+190>:	sub    $0x1,%eax
   0x00647e01 <+193>:	mov    %eax,0x53d251(%rip)        # 0xb85058 <InterruptHoldoffCount>
   0x00647e07 <+199>:	retq
   0x00647e08 <+200>:	mov    0x4(%r12),%eax
   0x00647e0d <+205>:	test   %eax,%eax
   0x00647e0f <+207>:	jne    0x647def <LWLockRelease+175>
   0x00647e11 <+209>:	movzbl 0x1(%r12),%eax
   0x00647e17 <+215>:	test   %al,%al
   0x00647e19 <+217>:	je     0x647def <LWLockRelease+175>
   0x00647e1b <+219>:	cmpb   $0x2,0x42(%rcx)
   0x00647e1f <+223>:	jne    0x647f66 <LWLockRelease+550>
   0x00647e25 <+229>:	mov    0x48(%rcx),%rax
   0x00647e29 <+233>:	test   %rax,%rax
   0x00647e2c <+236>:	je     0x647f66 <LWLockRelease+550>
   0x00647e32 <+242>:	mov    %rax,%rdx
   0x00647e35 <+245>:	jmp    0x647e44 <LWLockRelease+260>
   0x00647e37 <+247>:	mov    0x48(%rdx),%rdx
   0x00647e3b <+251>:	test   %rdx,%rdx
   0x00647e3e <+254>:	je     0x647f16 <LWLockRelease+470>
   0x00647e44 <+260>:	movzbl 0x42(%rdx),%esi
   0x00647e48 <+264>:	mov    %rdx,%rax
   0x00647e4b <+267>:	cmp    $0x2,%sil
   0x00647e4f <+271>:	je     0x647e37 <LWLockRelease+247>
   0x00647e51 <+273>:	mov    0x48(%rdx),%rdx
   0x00647e55 <+277>:	test   %sil,%sil
   0x00647e58 <+280>:	je     0x647f16 <LWLockRelease+470>
   0x00647e5e <+286>:	test   %rdx,%rdx
   0x00647e61 <+289>:	je     0x647f16 <LWLockRelease+470>
   0x00647e67 <+295>:	cmpb   $0x0,0x42(%rdx)
   0x00647e6b <+299>:	mov    $0x1,%edi
   0x00647e70 <+304>:	jne    0x647e8a <LWLockRelease+330>
   0x00647e72
Re: [HACKERS] Terminating pg_basebackup background streamer
On Wed, Feb 12, 2014 at 10:53 PM, Peter Eisentraut <pete...@gmx.net> wrote:
> On 2/12/14, 4:34 PM, Magnus Hagander wrote:
>> On Wed, Feb 12, 2014 at 10:28 PM, Peter Eisentraut <pete...@gmx.net> wrote:
>>> On 2/12/14, 12:47 PM, Magnus Hagander wrote:
>>>> Since there were no other objections, I've applied this patch.
>>> I'm getting a compiler warning: pg_basebackup.c:105:3: error: implicit declaration of function 'kill' [-Werror=implicit-function-declaration]
>> What platform is that? And do you know which header the declaration actually lives in? I don't see it here...
> OS X, signal.h according to man page

Are you sure you made that test after my fixup patch (the one suggested by Andres)? Because that one was at least supposed to add signal.h...

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
Hello,

I might have misunderstood your words.

At Thu, 13 Feb 2014 10:11:22 +0200, Heikki Linnakangas <hlinnakan...@vmware.com> wrote in <52fc7e2a.9060...@vmware.com>
> On 02/13/2014 08:44 AM, Kyotaro HORIGUCHI wrote:
>>>>> Wouldn't it be better to not archive the old segment, and instead switch to a new segment after writing the end-of-recovery checkpoint, so that the segment on the new timeline is archived sooner?
>>>> It would be better to zero-fill and switch segments, yes. We should NEVER be in a position of archiving two different versions of the same segment.
>>> Ok, I think we're in agreement that that's the way to go for master.

Does this mean that no mechanical solution for this situation will be given, and an operator should remove the older segment with the same segment id before starting the recovery process?

>> I'm almost inclined to that, but thinking about recovery up to a target timeline, the old segment turns out to be necessary for that case. Without the old segment, we would be obliged to seek to the first segment of the *next* timeline (is there any (simple) means to predict where that is?) to complete the task.
> How did the server that created the new timeline get the old, partial, segment? Was it already archived? Or did the DBA copy it into pg_xlog manually? Or was it streamed by streaming replication? Whatever the mechanism, the same mechanism ought to make sure the old segment is available for PITR, too.

Sure.

> Hmm. If you have set up streaming replication and a WAL archive, and your master dies and you fail over to a standby, what you describe does happen. The partial old segment is not in the archive, so you cannot PITR to a point on the old timeline that falls within the partial segment (i.e. just before the failover). However, it's not guaranteed that all the preceding WAL segments on the old timeline were already archived anyway, so even if the partial segment is archived, it's not guaranteed to work.

Yes. Putting aside insane or vanished segments in the archive, I understand that a pair of master and standby (or standby and cascaded standby, and so on) can share one WAL archive; otherwise, the archived WAL segments plus all the WAL segments not archived and left in pg_xlog of the old master should be merged into the WAL archive of the new master (the promoted old slave), to keep the availability of online backups taken from the old master. Even with a shared WAL archive, missing segments in the archive should be filled up from pg_xlog, though. Nevertheless, the process can be implemented in an automatic way. The test script at the start of this thread is for the shared-archive case, and I had unconsciously put that as the context.

> The old master is responsible for archiving the WAL on the old timeline, and the new master is responsible for archiving all the WAL on the new timeline. That's a straightforward, easy-to-understand rule.

Yes. I was somewhat confused because of my assumption of a shared archive, but it actually can be converged into a single archive, and older versions of PostgreSQL could cope with that situation.

> It might be useful to have a mode where the standby also archives all the received WAL, but that would need to be a separate option.

Perhaps such a mechanism is not demanded :)

>> Is it the right way we kick the older one out of the archive?
> If it's already in the archive, it's not going to be removed from the archive.

I had understood that the conclusion so far is to not archive the older segment on promotion, but that seems a bit of an odd story, as you suggested. If the conclusion here is "no aid", as my new understanding goes, would you let me hear the reason why recovery was changed to prefer the archive over pg_xlog? The commit abf5c5c9a4 seems to change the behavior, but I don't find the reason for the change.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] gaussian distribution pgbench
Hi Fabien,

Thank you very much for your very detailed and useful comments! I have read them, and I agree with most of your advice :) The attached patch is fixed per your comments. That is:

- Removed the redundant long option.
- We can use the --gaussian=NUM -S or --gaussian=NUM -N options.
- Added a sentence in the documentation.
- Separated the two random-generator functions, uniform and gaussian.
  - getGaussianrand() is created.
- Fixed the ranged random number more strictly, e.g. (0,1) or [0,1).
  - Please see the comments in the source code for details :)
- Fixed typos.
- Use the cos() and sin() functions when generating gaussian random numbers.
- Added a fast sqrt calculation algorithm.
- Reuse the sqrt result and pre-generate a random number, to reduce calculation cost.
  - The experiment with this method is below. It is a little bit faster than the non-reuse method, and the gaussian distribution is still good.

* Settings
  shared_buffers = 1024MB

* Test script
  pgbench -i -s 1
  pgbench --gaussian=2 -T 30 -S -c8 -j4 -n
  pgbench --gaussian=2 -T 30 -S -c8 -j4 -n
  pgbench --gaussian=2 -T 30 -S -c8 -j4 -n

* Result (tps)
  method           | try1  | try2  | try3
  -----------------+-------+-------+-------
  reuse method     | 44189 | 44453 | 44013
  non-reuse method | 43567 | 43635 | 43508

(2014/02/09 21:32), Fabien COELHO wrote:
> This is a valuable contribution to enable pgbench to generate more realistic loads, which are seldom uniform in practice.

Thanks!

> However, ISTM that other distributions such as an exponential one would make more sense.

I can easily create an exponential distribution. Here, I assume an exponential distribution, i.e. f(x) = lambda * exp(-lambda * x) in general. What do you think of the following interface?

  custom script: \setexp [varname] min max threshold
  command:       --exponential=NUM (threshold)

I don't want to use a lambda variable, to keep the implementation simple, so lambda is always 1; the distribution can be controlled well enough by the threshold alone. The threshold parameter is an f(x) value. The created distribution is projected onto 'aid' by the same method. If you think this is OK, I will implement the following tomorrow, and also create the parsing part of this function:

  do
  {
      rand = 1.0 - pg_erand48(thread->random_state);
      rand = -log(rand);
  } while (rand > exp_threshold);

  return rand / exp_threshold;

> and also the values should be further randomized so that neighboring values are not more likely to be drawn. The latter point is non-trivial.

That's right, but I worry that the gaussian randomness and benchmark reproducibility might disappear if we re-randomize the access pattern, because Postgres's storage manages records by pages, and it is difficult to realize access randomness over whole pages rather than records. To solve this problem we would need a smart shuffle/projection function that still preserves gaussian randomness. I think it will be difficult, and it would have to be implemented in another patch in the future.

> * Mathematical soundness
>
> We want to derive a discrete normal distribution from a uniform one. Well, normal distributions are for continuous variables... Anyway, this is done by computing a continuous normal distribution which is then projected onto integers. I'm basically fine with that. The system uses a Box-Muller transform (1958) to do this transformation. The Ziggurat method seems to be preferred for this purpose, *but* it would require precalculated tables which depend on the target values. So I'm fine with the Box-Muller transform for pgbench.

Yes, that's right. I selected a simple and relatively fast algorithm, the Box-Muller transform.

> The BM method uses 2 uniformly distributed numbers to derive 2 normally distributed numbers. The implementation computes one of these, and loops until one matches a threshold criterion. More explanations, at least in comments, are needed about this threshold and its meaning. It is required to be more than 2. I guess it allows to limit the number of iterations of the while loop,

Yes. This loop rarely retries, because the minimum stdev_threshold is 2: the probability of retrying the loop is under 4 percent. It should not be a problem.

> but in what proportion is unclear. The documentation does not help the user to understand this value and its meaning either.

Yes, it is a heuristic method, so I added comments about it in the documentation.

> What I think it is: it is the deviation for the FURTHEST point around the mean, that is the actual deviation associated to the min and max target values. The minimum value of 2 induces that there are at least 4 stddev lengths between min and max, with the most likely mean in the middle.

Correct!

> If the threshold test fails, one of the 2 uniform numbers is redrawn, and a new candidate value is tested. I'm not at ease about why only 1 value is redrawn and not both; some explanations would be welcome. Also, on the other hand, why not
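The two samplers being discussed can be sketched in Python: Box-Muller with the stdev threshold and retry loop, and the truncated lambda=1 exponential, each projected onto an integer range the way pgbench's \setrandom-style variables would be. This is illustrative only; the parameter names come from the mail, not from the final patch.

```python
import math
import random

def gaussian_rand(rng, lo, hi, threshold=5.0):
    # Draw a standard-normal value via the Box-Muller transform and retry
    # until it falls within [-threshold, +threshold]; with threshold >= 2
    # the retry probability is small (under ~5%), as discussed above.
    while True:
        u1 = 1.0 - rng.random()          # in (0, 1]: keeps log() finite
        u2 = rng.random()
        z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        if -threshold < z < threshold:
            break
    # Project (-threshold, threshold) onto [0, 1), then onto the range.
    r = (z + threshold) / (2.0 * threshold)
    return lo + int((hi - lo + 1) * r)

def exponential_rand(rng, lo, hi, threshold=5.0):
    # Truncated exponential with lambda = 1, as proposed in the mail:
    # redraw until the variate falls below the threshold, then rescale.
    while True:
        r = -math.log(1.0 - rng.random())
        if r < threshold:
            break
    return lo + int((hi - lo + 1) * (r / threshold))

rng = random.Random(42)
samples = [gaussian_rand(rng, 1, 100000) for _ in range(1000)]
print(min(samples) >= 1 and max(samples) <= 100000)  # True
```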
Re: [HACKERS] New hook after raw parsing, before analyze
See the discussion of the Custom-Scan API:
https://commitfest.postgresql.org/action/patch_view?id=1282
I believe my third patch is what you really want to do...

> This rewritten query would be handled by the FDW table that I previously added to the catalog. The reason I want this new hook is that I don't want tableA and tableB to be in the catalog.

I'd like to see why you want the pseudo table fdw_tableA_tableB to be in the catalog, instead of tableA and tableB. In addition, the parser shall raise a name-lookup error if the referenced columns (as part of tableA or tableB) are not in the catalog.

Thanks,

2014-02-13 19:01 GMT+09:00 David Beck <db...@starschema.net>:
> Hello Hackers,
>
> I work on a foreign data wrapper for a legacy system. [...] The idea is to rewrite the SQL queries like this:
>
>   SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND b.col2=987
>
> to:
>
>   SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987
>
> This rewritten query would be handled by the FDW table that I previously added to the catalog. The reason I want this new hook is that I don't want tableA and tableB to be in the catalog.
>
> Looking forward to hearing your thoughts, opinions, comments.

-- 
KaiGai Kohei <kai...@kaigai.gr.jp>
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
On 02/13/2014 01:37 PM, Kyotaro HORIGUCHI wrote:
> At Thu, 13 Feb 2014 10:11:22 +0200, Heikki Linnakangas <hlinnakan...@vmware.com> wrote in <52fc7e2a.9060...@vmware.com>
>> On 02/13/2014 08:44 AM, Kyotaro HORIGUCHI wrote:
>>>>>> Wouldn't it be better to not archive the old segment, and instead switch to a new segment after writing the end-of-recovery checkpoint, so that the segment on the new timeline is archived sooner?
>>>>> It would be better to zero-fill and switch segments, yes. We should NEVER be in a position of archiving two different versions of the same segment.
>>>> Ok, I think we're in agreement that that's the way to go for master.
> Does this mean that no mechanical solution for this situation will be given, and an operator should remove the older segment with the same segment id before starting the recovery process?

I don't understand. With the proposed change to never archive a partial segment, there is no segment with the same filename in the archive. There is nothing to remove. Or did you mean that you have to do that in the 9.0-9.2 back-branches, if we only back-patch this change to 9.3?

>> If it's already in the archive, it's not going to be removed from the archive.
> I had understood that the conclusion so far is to not archive the older segment on promotion, but that seems a bit of an odd story, as you suggested. If the conclusion here is "no aid", as my new understanding goes, would you let me hear the reason why recovery was changed to prefer the archive over pg_xlog? The commit abf5c5c9a4 seems to change the behavior, but I don't find the reason for the change.

Well, it didn't work too well before abf5c5c9a4 either. With your test script, and commit abf5c5c9a4 reverted [1], you got this:

### STAGE 3/3: 2/2 START SERVER IN ARCHIVE RECOVERY MODE 2ND RUN ###
waiting for server to start....
LOG: database system was interrupted; last known up at 2014-02-13 14:17:53 EET
LOG: starting archive recovery
LOG: restored log file "0002.history" from archive
/bin/cp: cannot stat ‘/home/heikki/pgsql.93stable/test/arc/00020002’: No such file or directory
LOG: restored log file "00010002" from archive
LOG: record with zero length at 0/290
LOG: invalid primary checkpoint record
/bin/cp: cannot stat ‘/home/heikki/pgsql.93stable/test/arc/00020002’: No such file or directory
LOG: restored log file "00010002" from archive
LOG: using previous checkpoint record at 0/228
LOG: consistent recovery state reached at 0/228
LOG: record with zero length at 0/290
LOG: redo is not required
/bin/cp: cannot stat ‘/home/heikki/pgsql.93stable/test/arc/00020002’: No such file or directory
LOG: restored log file "00010002" from archive
/bin/cp: cannot stat ‘/home/heikki/pgsql.93stable/test/arc/0003.history’: No such file or directory
LOG: selected new timeline ID: 3
LOG: restored log file "0002.history" from archive
LOG: archive recovery complete
LOG: database system is ready to accept connections
LOG: autovacuum launcher started

Note the "invalid primary checkpoint record" message. It restored the partial 00010002 file from the archive and tried to recover using that, instead of the 00020002 file in pg_xlog. The primary checkpoint record is not present in 00010002, so it fell back to recovering from the previous checkpoint. It was the exact same issue, even though it appeared to work if you don't look carefully. The second recovery never replayed the WAL generated on timeline 2; it branched off directly from timeline 1, at the same point where timeline 2 did. You can see that by looking at the history files.

Also, if you create a table and insert a row to it after the first recovery, just before killing the server, you'll see that the row is not there after the second recovery. That's wrong: the data files might already contain some changes from timeline 2, and you end up with a corrupt database. Thanks to commit abf5c5c9a4, the situation is at least detected, and the system refuses to start up.

The behavior where we prefer a segment from the archive with a lower TLI over a file with a higher TLI in pg_xlog actually changed in commit a068c391ab0. Arguably changing it wasn't a good idea, but the problem your test script demonstrates can be fixed by not archiving the partial segment, with no change to the preference of archive/pg_xlog. As discussed, archiving a partial segment seems like a bad idea anyway, so let's just stop doing that.

[1] Just applying commit abf5c5c9a4 in reverse doesn't apply cleanly due to further changes to the code, but I did this to simulate the old behavior:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6a4a504..352d3f7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4366,6 +4366,7 @@ readRecoveryCommandFile(void)
 	/* Enable fetching
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
Re: Heikki Linnakangas 2014-02-13 <52fc9468.4050...@vmware.com>
>> With 9.1, it works, but 9.2 and 9.3 don't archive anything until I remove the "test ! -f" part. (An alternative fix would be to declare the behavior ok and adjust that example in the config.)
> Hmm, the behavior is the same in 9.1 and 9.2. Did you use a different archive_command in 9.1, without the test?

The behavior is different: 9.1 doesn't try to re-archive the last segment from the old timeline. (I've always seen it happen in training workshops, so always on mostly idle machines. Possibly 9.1 will re-archive the last segment under load.)

I've just reproduced it here with these settings (in Debian's /etc/postgresql-common/createcluster.conf syntax):

archive_mode = on
archive_command = 'test ! -f /tmp/pg-%v/%%f && cp %%p /tmp/pg-%v/%%f'
wal_level = hot_standby
max_wal_senders = 3

... pg_basebackup ...

recovery.conf:
restore_command = 'cp /tmp/pg-9.3/%f %p'

9.1 slave log (translated from the original German):

LOG: consistent recovery state reached at 0/2D8
cp: cannot stat ‘/tmp/pg-9.1/00010003’: No such file or directory
LOG: could not open file "pg_xlog/00010003" (log file 0, segment 3): No such file or directory
LOG: redo done at 0/2D8
LOG: restored log file "00010002" from archive
cp: cannot stat ‘/tmp/pg-9.1/0002.history’: No such file or directory
LOG: selected new timeline ID: 2
cp: cannot stat ‘/tmp/pg-9.1/0001.history’: No such file or directory
LOG: archive recovery complete
LOG: incomplete startup packet
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
(eof)

ls /tmp/pg-9.1
00010001  00010002.0020.backup  00020003
00010002  00020002              0002.history

9.2 slave log (translated):

LOG: consistent recovery state reached at 0/2E0
cp: cannot stat ‘/tmp/pg-9.2/00010003’: No such file or directory
LOG: could not open file "pg_xlog/00010003" (log file 0, segment 3): No such file or directory
LOG: redo done at 0/2E0
LOG: restored log file "00010002" from archive
cp: cannot stat ‘/tmp/pg-9.2/0002.history’: No such file or directory
LOG: selected new timeline ID: 2
cp: cannot stat ‘/tmp/pg-9.2/0001.history’: No such file or directory
LOG: archive recovery complete
LOG: incomplete startup packet
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
LOG: archive command failed with exit code 1
DETAIL: The failed archive command was: test ! -f /tmp/pg-9.2/00010002 && cp pg_xlog/00010002 /tmp/pg-9.2/00010002
LOG: archive command failed with exit code 1
...

ls /tmp/pg-9.2
00010001  00010002  00010002.0020.backup

The 9.3 log/ls looks the same.

Best regards,
Christoph Berg
-- 
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970 87C6 4C5A 6BAB 12D2 A7AE
Re: [HACKERS] gaussian distribution pgbench
Sorry, the previous attached patch had a small bug. Please use the latest one.

134 - return min + (int64) (max - min + 1) * rand;
134 + return min + (int64)((max - min + 1) * rand);

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
***************
*** 176,181 ****
--- 176,183 ----
  int progress_nthreads = 0;	/* number of threads for progress report */
  bool is_connect;		/* establish connection for each transaction */
  bool is_latencies;		/* report per-command latencies */
  int main_pid;			/* main process id used in log filename */
+ double stdev_threshold = 5;	/* standard deviation threshold */
+ bool gaussian_option = false;	/* use gaussian distribution random generator */
  char *pghost = "";
  char *pgport = "";
***************
*** 338,346 ****
--- 340,390 ----
  static char *select_only = {
  	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
  };
+
+ /* --gaussian case */
+ static char *gaussian_tpc_b = {
+ 	"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ 	"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setgaussian aid 1 :naccounts :stdev_threshold\n"
+ 	"\\setrandom bid 1 :nbranches\n"
+ 	"\\setrandom tid 1 :ntellers\n"
+ 	"\\setrandom delta -5000 5000\n"
+ 	"BEGIN;\n"
+ 	"UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ 	"UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;\n"
+ 	"UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;\n"
+ 	"INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+ 	"END;\n"
+ };
+
+ /* --gaussian with -N case */
+ static char *gaussian_simple_update = {
+ 	"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ 	"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setgaussian aid 1 :naccounts :stdev_threshold\n"
+ 	"\\setrandom bid 1 :nbranches\n"
+ 	"\\setrandom tid 1 :ntellers\n"
+ 	"\\setrandom delta -5000 5000\n"
+ 	"BEGIN;\n"
+ 	"UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ 	"INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+ 	"END;\n"
+ };
+
+ /* --gaussian with -S case */
+ static char *gaussian_select_only = {
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setgaussian aid 1 :naccounts :stdev_threshold\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ };
+
  /* Function prototypes */
  static void setalarm(int seconds);
  static void *threadRun(void *arg);
+ static inline double sqrtd(const double x);

  static void usage(void)
***************
*** 381,386 **** usage(void)
--- 425,431 ----
  	       "  -v, --vacuum-all         vacuum all four standard tables before tests\n"
  	       "  --aggregate-interval=NUM aggregate data over NUM seconds\n"
  	       "  --sampling-rate=NUM      fraction of transactions to log (e.g. 0.01 for 1%%)\n"
+ 	       "  --gaussian=NUM           gaussian distribution with NUM standard deviation threshold\n"
  	       "\nCommon options:\n"
  	       "  -d, --debug              print debugging output\n"
  	       "  -h, --host=HOSTNAME      database server host or socket directory\n"
***************
*** 477,482 **** getrand(TState *thread, int64 min, int64 max)
--- 522,597 ----
  	return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
  }
+
+ /* random number generator: gaussian distribution from min to max inclusive */
+ static int64
+ getGaussianrand(TState *thread, int64 min, int64 max, double stdev_threshold)
+ {
+ 	double stdev;
+ 	double rand;
+ 	static double rand1;
+ 	static double rand2;
+ 	static double var_sqrt;
+ 	static bool reuse = false;
+
+ 	/*
+ 	 * Get a user-specified random number (-stdev_threshold <= stdev <=
+ 	 * stdev_threshold) in this loop.  The loop is executed until a number
+ 	 * in the wanted range appears.  In practice it rarely repeats: since
+ 	 * the minimum stdev_threshold is 2, the probability of a retry is
+ 	 * under 4 percent, and of a second retry under 1.6 percent.  It
+ 	 * doesn't happen frequently even if we also think about the cycle of
+ 	 * the trigonometric function.
+ 	 */
+ 	do
+ 	{
+ 		/* reuse the previous calculation result if possible */
+ 		if (!reuse)
+ 		{
+ 			/*
+ 			 * pg_erand48 generates a [0,1) random number.  However, rand1
+ 			 * needs a (0,1) random number because log(0) cannot be
+ 			 * calculated, and strictly speaking rand2 also needs a (0,1)
+ 			 * random number.  But the normalization cost is high, and we
+ 			 * can substitute (0,1] for rand1 and [0,1) for rand2, so we
+ 			 * use this approximate calculation.
+ 			 */
+ 			rand1 = 1.0 - pg_erand48(thread->random_state);
+ 			rand2 = pg_erand48(thread->random_state);
+
+ 			/*
Re: [HACKERS] New hook after raw parsing, before analyze
Thanks for the link. I want flexibility. Here is a situation: my hook knows the size of tableA and tableB on the legacy side. It should be able to decide whether to offload the join/filter onto the legacy side or not. At the same time it can start transferring the data to real Postgres tables in the background. When the data is there in the local DB, it can change the rewrite to use the local tables instead for the new queries.

Answering your question: I don't know what you mean by pseudo table. The "fdw_tableA_tableB" is a real FDW table with a catalog entry, with real catalog columns. tableA and tableB are non-real tables in my setup. My understanding is that the raw_parse - transform - analyze triple first checks the catalog in the transform phase. That's why I want the hook to be after raw_parse but before transform, so the transform phase won't see tableA, tableB and their columns, and thus I don't expect errors.

Thank you, David

Please find my answers inline.

On 2014.02.13, at 13:28, Kohei KaiGai kai...@kaigai.gr.jp wrote:

> See the discussion of Custom-Scan API.
> https://commitfest.postgresql.org/action/patch_view?id=1282
> I believe my third patch is what you really want to do...
>
> > This rewritten query would be handled by the FDW table that I previously added to the catalog. The reason I want this new hook is that I don't want tableA and tableB to be in the catalog.
>
> I'd like to see why you want the pseudo table fdw_tableA_tableB to be in the catalog, instead of tableA and tableB. In addition, the parser shall raise an error if referenced columns (as part of tableA or tableB) are not in-catalog, because of a name lookup error.
>
> Thanks,
>
> 2014-02-13 19:01 GMT+09:00 David Beck db...@starschema.net:
> > Hello Hackers,
> >
> > I work on a foreign data wrapper for a legacy system. I generally find the hook system a very useful and flexible way to extend Postgres. The post parse analyze hook almost fits what I need, but I have a few use cases where I would need to tap right into the parsed queries, before any catalog based validation is done. Please find the attached trivial patch for this new hook.
> >
> > One of the use cases I have is this: I have table-like data structures in the source system for the FDW I work on. These tables are sometimes too big, and the source system is able to filter and join them with limitations, thus it is not optimal to transfer the data to Postgres. At the same time I want the users to think in terms of the original tables. The idea is to rewrite the SQL queries like this:
> >
> >   SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND b.col2=987
> >
> > to:
> >
> >   SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987
> >
> > This rewritten query would be handled by the FDW table that I previously added to the catalog. The reason I want this new hook is that I don't want tableA and tableB to be in the catalog.
> >
> > Looking forward to hearing your thoughts, opinions, comments.
> >
> > Best regards, David
>
> --
> KaiGai Kohei kai...@kaigai.gr.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
On 02/13/2014 02:42 PM, Christoph Berg wrote:
> Re: Heikki Linnakangas 2014-02-13 52fc9468.4050...@vmware.com
> > > With 9.1, it works, but 9.2 and 9.3 don't archive anything until I remove the "test ! -f" part. (An alternative fix would be to declare the behavior ok and adjust that example in the config.)
> >
> > Hmm, the behavior is the same in 9.1 and 9.2. Did you use a different archive_command in 9.1, without the "test"?
>
> The behavior is different: 9.1 doesn't try to re-archive the last segment from the old timeline. (I've seen it happen in training workshops, so that's always on mostly idle machines. Possibly 9.1 will re-archive the last segment under load.)
>
> I've just reproduced it here with these settings (in Debian's /etc/postgresql-common/createcluster.conf syntax):

Ah, I think I see what's going on. 9.2 changed the archive recovery behavior so that the restored WAL segments are copied into pg_xlog, so that you can recover the standby back to the same point after a restart, even if the archive later stops working (this is also needed for cascading standbys). In 9.1, we only attempted to archive the last segment from the previous timeline if it wasn't restored from the archive. In 9.2, it's always archived.

I was testing this with streaming replication; 9.1 and 9.2 behave the same in that scenario. But they differ when doing archive recovery. Is this an argument for back-patching the "don't archive last segment from old timeline" patch to 9.2 and 9.3, but leaving 9.1 alone? You have the same problem with 9.1 and streaming replication, but no one has complained...

- Heikki
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On 02/12/2014 06:10 PM, Ants Aasma wrote:
> On Wed, Feb 12, 2014 at 4:04 PM, knizhnik knizh...@garret.ru wrote:
> > Even if the reordering was not done by the compiler, it can still happen during execution. There is no guarantee that two subsequent assignments will be observed by all CPU cores in the same order.
>
> The x86 memory model (total store order) provides that guarantee in this specific case.
>
> Regards, Ants Aasma

Sorry, I thought that we were talking about the general case, not just the x86 architecture.

Maybe I do not understand something in the LWLock code, but it seems to me that assigning NULL to proc->lwWaitLink is not needed at all:

    while (head != NULL)
    {
        LOG_LWDEBUG("LWLockRelease", lockid, "release waiter");
        proc = head;
        head = proc->lwWaitLink;
        proc->lwWaitLink = NULL;
        proc->lwWaiting = false;
        PGSemaphoreUnlock(&proc->sem);
    }

This part of the list is not traversed by any other processor, so nobody will inspect this field. When the awakened process needs to wait for another lock, it will just assign NULL to this field itself:

    proc->lwWaiting = 1;
    proc->lwWaitMode = mode;
    proc->lwWaitLink = NULL;
    if (lock->head == NULL)
        lock->head = proc;
    else
        lock->tail->lwWaitLink = proc;
    lock->tail = proc;

Without TSO (total store order), such an assignment of lwWaitLink in LWLockRelease outside the critical section may just corrupt the list into which the awakened process is already linked. But I am not sure that eliminating this assignment is enough to ensure the correctness of this code without TSO.
Re: [HACKERS] Prevent pg_basebackup -Fp -D -?
On Thu, Feb 13, 2014 at 4:10 AM, Bruce Momjian br...@momjian.us wrote:
> On Thu, Oct 3, 2013 at 06:50:57AM +0200, Magnus Hagander wrote:
> > On Oct 3, 2013 2:47 AM, Michael Paquier michael.paqu...@gmail.com wrote:
> > > On Wed, Oct 2, 2013 at 11:31 PM, Magnus Hagander mag...@hagander.net wrote:
> > > > Right now, if you use pg_basebackup -Ft -D - you get a tarfile, written to stdout, for redirection. However, if you use pg_basebackup -Fp -D - you get a plaintext (unpackaged) backup, in a directory called "-". I can't think of a single use case where this is a good idea. Therefore, I would suggest we simply throw an error in this case, instead of creating the directory. Only for the specific case of specifying exactly "-" as the directory. Comments?
> > >
> > > Isn't this a non-problem? This behavior is in line with the documentation, so I would suspect that if the directory name is specified as "-" in plain mode, it should create the folder with this name. Do you consider having a folder of this name an annoyance?
> >
> > Yes, that is exactly the point - I do consider that an annoyance, and I don't see the use case where you'd actually want it. I bet 100% of the users of that have been accidental, thinking they'd get the pipe, not the directory.
> >
> > > > Also, if we do that, is this something we should consider backpatchable? It's not strictly speaking a bugfix, but I'd say it fixes some seriously annoying behavior.
> > >
> > > This would change the spec of pg_basebackup, so no? Does the current behavior have potential security issues?
> >
> > No, there are no security issues that I can see. Just annoyance. And yes, I guess it would change the spec, so backpatching might be a bad idea...
>
> Has this been fixed? If so, I don't see it.

It has not. I think the thread wasn't entirely clear on whether we wanted it or not, which is why I was waiting for more input from others. And then promptly forgot about it since nobody spoke up :)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
Re: Heikki Linnakangas 2014-02-13 52fcca40.3060...@vmware.com

> I was testing this with streaming replication; 9.1 and 9.2 behave the same in that scenario. But they differ when doing archive recovery. Is this an argument for back-patching the "don't archive last segment from old timeline" patch to 9.2 and 9.3, but leaving 9.1 alone? You have the same problem with 9.1 and streaming replication, but no one has complained...

I can't say what would be better from a technical viewpoint. For me, this is literally an academic problem that people run into when they try replication during the PostgreSQL course, using the archive_command example from postgresql.conf. I guess if no one else has complained yet, there's no reason to touch the back branches.

Is removing the "test ! -f" part, and hence overwriting files in the archive, safe, i.e. are the files the same?

Kind regards,
Christoph Berg
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
On 02/13/2014 03:53 PM, Christoph Berg wrote:
> Is removing the "test ! -f" part, and hence overwriting files in the archive, safe, i.e. are the files the same?

No. Not in general, anyway. If the old master keeps running, even for a moment, after the partial file was copied, it will have created more WAL records which are present in the old master's copy of the segment, but not in the standby's.

You could perhaps use diff as part of archive_command, to check if the archived version is identical to the one about to be archived.

- Heikki
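One way the compare-before-failing idea could look in postgresql.conf (an untested sketch; `/mnt/server/archivedir` is a placeholder path, and `cmp -s` stands in for the suggested diff): succeed silently when an identical copy is already archived, copy when nothing is there, and fail only on a genuine content conflict.

```
archive_command = 'if test -f /mnt/server/archivedir/%f; then cmp -s %p /mnt/server/archivedir/%f; else cp %p /mnt/server/archivedir/%f; fi'
```

This still fails (non-zero exit, so archiving stalls) when a different version of the same segment is already present, which is arguably the right behavior for the two-versions-of-one-segment case discussed in this thread.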
Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid
On Wed, Feb 12, 2014 at 06:06:40PM -0500, Bruce Momjian wrote:
> On Wed, Sep 11, 2013 at 02:10:45PM -0400, Robert Haas wrote:
> > On Tue, Sep 10, 2013 at 11:33 PM, Noah Misch n...@leadboat.com wrote:
> > > I'm thinking to preserve postmaster.pid at immediate shutdown in all released versions, but I'm less sure about back-patching a change to make PGSharedMemoryCreate() pickier. On the one hand, allowing startup to proceed with backends still active in the same data directory is a corruption hazard. On the other hand, it could break weird shutdown/restart patterns that permit trivial lifespan overlap between backends of different postmasters. Opinions?
> >
> > I'm more sanguine about the second change than the first. Leaving postmaster.pid around seems like a clear user-visible behavior change that could break user scripts or have other consequences that we don't foresee; thus, I would vote against back-patching it. Indeed, I'm not sure it's a good idea to do that even in master. On the other hand, tightening the checks in PGSharedMemoryCreate() seems very much worth doing, and I think it might also be safe enough to back-patch.
>
> Were these changes ever applied? I don't see them.

No, I haven't gotten around to writing them.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
Re: Heikki Linnakangas 2014-02-13 52fcd02c.3060...@vmware.com

> > Is removing the "test ! -f" part, and hence overwriting files in the archive, safe, i.e. are the files the same?
>
> No. Not in general, anyway. If the old master keeps running, even for a moment, after the partial file was copied, it will have created more WAL records which are present in the old master's copy of the segment, but not in the standby's.

So why does PostgreSQL then ever try to archive something from the old timeline? Isn't overwriting files on the old timeline just a very bad idea in general? If so, it shouldn't be the responsibility of archive_command to prevent this; the server shouldn't be trying that in the first place.

> You could perhaps use diff as part of archive_command, to check if the archived version is identical to the one about to be archived.

Well, if that makes it into the default postgresql.conf (and docs)... On the other hand, it's still not nice that this could still be an archive command that fails, i.e. one that requires admin intervention before archiving can proceed.

Kind regards,
Christoph Berg
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
On 02/13/2014 04:07 PM, Christoph Berg wrote:
> Re: Heikki Linnakangas 2014-02-13 52fcd02c.3060...@vmware.com
> > > Is removing the "test ! -f" part, and hence overwriting files in the archive, safe, i.e. are the files the same?
> >
> > No. Not in general, anyway. If the old master keeps running, even for a moment, after the partial file was copied, it will have created more WAL records which are present in the old master's copy of the segment, but not in the standby's.
>
> So why does PostgreSQL then ever try to archive something from the old timeline? Isn't overwriting files on the old timeline just a very bad idea in general?

Yes.

> If so, it shouldn't be the responsibility of archive_command to prevent this; the server shouldn't be trying that in the first place.

Totally agreed. That's why I'm proposing that we stop doing that :-). It's been a bad idea from day one, but I'm a bit hesitant to back-patch until someone actually complains. There have been subtle changes to archive recovery behavior in each release, so I'm worried that the old behavior was in fact useful to someone in some corner case that we're missing.

- Heikki
Re: [HACKERS] Terminating pg_basebackup background streamer
On 2/13/14, 6:25 AM, Magnus Hagander wrote:
> Are you sure you made that test after my fixup patch (the one suggested by Andres)? Because that one was at least supposed to add signal.h...

works now
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On Feb 10, 2014, at 17:38, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-02-10 11:11:28 -0500, Tom Lane wrote:
> > Andres Freund and...@2ndquadrant.com writes:
> > > So what we need to do is to acquire a write barrier between the assignments to lwWaitLink and lwWaiting, i.e.
> > >
> > >     proc->lwWaitLink = NULL;
> > >     pg_write_barrier();
> > >     proc->lwWaiting = false;
> >
> > You didn't really explain why you think that ordering is necessary? Each proc being awoken will surely see both fields updated, and other procs won't be examining these fields at all, since we already delinked all these procs from the LWLock's queue.
>
> The problem is that one of the released backends could wake up concurrently because of an unrelated, or previous, PGSemaphoreUnlock(). It could see lwWaiting = false, and thus wake up and acquire the lock, even if the store for lwWaitLink hasn't arrived (or been performed, there's no guaranteed ordering here) yet. Now, it may well be that there's no practical consequence of that, but I am not prepared to bet on it.

AFAICS there is a potential problem if three backends are involved, since by the time the waiting backend's lwWaitLink is set to NULL after the original lock holder released the lock, the waiting backend might already have acquired the lock (due to a spurious wakeup) *and* a third backend might have already enqueued behind it.

The exact sequence for backends A, B, C that corrupts the wait queue is:

A: Releases the lock, sets B's lwWaiting to false
B: Wakes up spuriously, takes the lock
C: Enqueues behind B
A: Sets B's lwWaitLink back to NULL, thereby truncating the queue and causing C and anyone behind it to block indefinitely.

I wonder whether LWLockRelease really needs to update lwWaitLink at all. We take the backends we awake off the queue by updating the queue's head and tail, so the contents of lwWaitLink should only matter once the backend is re-inserted into some wait queue. But when doing that, we reset lwWaitLink back to NULL anyway.
best regards,
Florian Pflug
Re: [HACKERS] issue with gininsert under very high load
On 02/12/2014 04:04 PM, Heikki Linnakangas wrote:
> On 02/12/2014 10:50 PM, Andres Freund wrote:
> > On February 12, 2014 9:33:38 PM CET, Tom Lane t...@sss.pgh.pa.us wrote:
> > > Andres Freund and...@2ndquadrant.com writes:
> > > > On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:
> > > > > On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM.
> > > >
> > > > That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem, which usually makes it *far* too big; b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency.
> > >
> > > I'm not sure that what Andrew is describing can fairly be called a concurrent-performance problem. It sounds closer to a stuck lock. Are you sure you've diagnosed it correctly?
> >
> > No. But I've several times seen similar backtraces where it wasn't actually stuck, just livelocked. I'm just on my mobile right now, but AFAIR Andrew described a loop involving lots of semaphores and spinlocks; that shouldn't be the case if it were actually stuck.
>
> If there are dozens of processes waiting on the same lock, cleaning up a large amount of items one by one, it's not surprising if it's dramatically slow. Perhaps we should use a lock to enforce that only one process tries to clean up the pending list at a time.

Is that going to serialize all these inserts?

cheers

andrew
Re: [HACKERS] old warning in docs
On Mon, Oct 7, 2013 at 08:43:27AM -0700, David Fetter wrote:
> On Mon, Oct 07, 2013 at 07:51:44AM -0400, Andrew Dunstan wrote:
> > Given that we have not supported releases older than 8.3 for quite a while, do we need to keep this in extend.sgml any longer?
> >
> >   <caution>
> >    <para>
> >     Changing <varname>PG_CONFIG</varname> only works when building
> >     against <productname>PostgreSQL</productname> 8.3 or later. With
> >     older releases it does not work to set it to anything except
> >     <literal>pg_config</literal>; you must alter your
> >     <varname>PATH</varname> to select the installation to build against.
> >    </para>
> >   </caution>
> >
> > I say bin it.
>
> That reminds me. There are probably a lot of places in the docs that refer to versions of PostgreSQL a good bit older than 8.3. Will grep and patch as I get the time.

I have created the attached patch which removes many of the pre-8.0 references, and trims some of the 8.1-8.3 references. There are probably some of these that should be kept, but it is easier to show you all the possibilities and we can trim down the removal list based on feedback.

--
Bruce Momjian br...@momjian.us  http://momjian.us
EnterpriseDB             http://enterprisedb.com
+ Everyone has their own god. +

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 30fd9bb..a5b74e6
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*************** NUMERIC
*** 744,754 ****
   <note>
    <para>
!    Prior to <productname>PostgreSQL</productname> 7.4, the precision in
!    <type>float(<replaceable>p</replaceable>)</type> was taken to mean
!    so many <emphasis>decimal</emphasis> digits.  This has been corrected
!    to match the SQL standard, which specifies that the precision is
!    measured in binary digits.  The assumption that <type>real</type> and
     <type>double precision</type> have exactly 24 and 53 bits in the
     mantissa respectively is correct for IEEE-standard floating point
     implementations.  On non-IEEE platforms it might be off a little, but
--- 744,750 ----
   <note>
    <para>
!    The assumption that <type>real</type> and
     <type>double precision</type> have exactly 24 and 53 bits in the
     mantissa respectively is correct for IEEE-standard floating point
     implementations.  On non-IEEE platforms it might be off a little, but
*************** ALTER SEQUENCE <replaceable class="param
*** 844,859 ****
     </para>
    </note>

-  <note>
-   <para>
-    Prior to <productname>PostgreSQL</productname> 7.3, <type>serial</type>
-    implied <literal>UNIQUE</literal>.  This is no longer automatic.  If
-    you wish a serial column to have a unique constraint or be a
-    primary key, it must now be specified, just like
-    any other data type.
-   </para>
-  </note>
-
    <para>
     To insert the next value of the sequence into the <type>serial</type>
     column, specify that the <type>serial</type>
--- 840,845 ----
*************** SELECT E'\\xDEADBEEF';
*** 1602,1609 ****
     The SQL standard requires that writing just <type>timestamp</type>
     be equivalent to <type>timestamp without time zone</type>, and
     <productname>PostgreSQL</productname> honors that
!    behavior.  (Releases prior to 7.3 treated it as <type>timestamp
!    with time zone</type>.)  <type>timestamptz</type> is accepted as an
     abbreviation for <type>timestamp with time zone</type>; this is a
     <productname>PostgreSQL</productname> extension.
--- 1588,1594 ----
     The SQL standard requires that writing just <type>timestamp</type>
     be equivalent to <type>timestamp without time zone</type>, and
     <productname>PostgreSQL</productname> honors that
!    behavior.  <type>timestamptz</type> is accepted as an
     abbreviation for <type>timestamp with time zone</type>; this is a
     <productname>PostgreSQL</productname> extension.
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index bae2e97..8ace8bd
*** a/doc/src/sgml/ddl.sgml
--- b/doc/src/sgml/ddl.sgml
*************** CREATE TABLE circles (
*** 1106,1114 ****
     within a single transaction.  In practice this limit is not a problem
     &mdash; note that the limit is on the number of
     <acronym>SQL</acronym> commands, not the number of rows processed.
!    Also, as of <productname>PostgreSQL</productname> 8.3, only commands
!    that actually modify the database contents will consume a command
!    identifier.
   </para>
  </sect1>
--- 1106,1113 ----
     within a single transaction.  In practice this limit is not a problem
     &mdash; note that the limit is on the number of
     <acronym>SQL</acronym> commands, not the number of rows processed.
!    Also, only commands that actually modify the database contents will
!    consume a command identifier.
   </para>
  </sect1>
*************** REVOKE CREATE ON SCHEMA public FROM PUBL
*** 1873,1883 ****
   </para>

   <para>
!   In
Re: [HACKERS] issue with gininsert under very high load
On 02/13/2014 05:40 PM, Andrew Dunstan wrote:
> On 02/12/2014 04:04 PM, Heikki Linnakangas wrote:
> > On 02/12/2014 10:50 PM, Andres Freund wrote:
> > > On February 12, 2014 9:33:38 PM CET, Tom Lane t...@sss.pgh.pa.us wrote:
> > > > Andres Freund and...@2ndquadrant.com writes:
> > > > > On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:
> > > > > > On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM.
> > > > >
> > > > > That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem, which usually makes it *far* too big; b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency.
> > > >
> > > > I'm not sure that what Andrew is describing can fairly be called a concurrent-performance problem. It sounds closer to a stuck lock. Are you sure you've diagnosed it correctly?
> > >
> > > No. But I've several times seen similar backtraces where it wasn't actually stuck, just livelocked. I'm just on my mobile right now, but AFAIR Andrew described a loop involving lots of semaphores and spinlocks; that shouldn't be the case if it were actually stuck.
> >
> > If there are dozens of processes waiting on the same lock, cleaning up a large amount of items one by one, it's not surprising if it's dramatically slow. Perhaps we should use a lock to enforce that only one process tries to clean up the pending list at a time.
>
> Is that going to serialize all these inserts?

It will serialize the cleanup process, which moves entries from the pending list to the tree proper. But that's better than the current situation.
Currently, when two processes attempt it, they will both try to insert into the GIN tree, but one of them will notice that the other one already did the cleanup, and bail out. So only one process contributes to progress, while the others just waste their effort. The processes should try to get the lock, and just give up if it's already held rather than wait. If someone else is already doing the cleanup, there's no need for the current process to do it. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
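The try-then-skip behavior Heikki proposes can be sketched in miniature. This is a hypothetical illustration, not the actual GIN code; the lock and function names are invented. The key point is a non-blocking acquire: a backend that loses the race simply skips the cleanup instead of queueing up behind it.

```python
import threading

# Invented stand-in for the proposed per-index cleanup lock.
cleanup_lock = threading.Lock()

def maybe_clean_pending_list(move_entries):
    """Try the pending-list cleanup; return True only if this caller did it.

    acquire(blocking=False) returns immediately rather than waiting, so if
    someone else is already doing the cleanup we just give up, as suggested
    in the message above.
    """
    if not cleanup_lock.acquire(blocking=False):
        return False  # another process holds the lock; no need to duplicate its work
    try:
        move_entries()  # move pending-list entries into the tree proper
        return True
    finally:
        cleanup_lock.release()
```

With a blocking acquire, every backend would still serialize behind the cleanup; the non-blocking variant is what lets the losers get on with their inserts.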
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
On 02/13/2014 02:42 PM, Heikki Linnakangas wrote: The behavior where we prefer a segment from archive with lower TLI over a file with higher TLI in pg_xlog actually changed in commit a068c391ab0. Arguably changing it wasn't a good idea, but the problem your test script demonstrates can be fixed by not archiving the partial segment, with no change to the preference of archive/pg_xlog. As discussed, archiving a partial segment seems like a bad idea anyway, so let's just stop doing that. After some further thought, while not archiving the partial segment fixes your test script, it's not enough to fix all variants of the problem. Even if archive recovery doesn't archive the last, partial, segment, if the original master server is still running, it's entirely possible that it fills the segment and archives it. In that case, archive recovery will again prefer the archived segment with lower TLI over the segment with newer TLI in pg_xlog. So I agree we should commit the patch you posted (or something to that effect). The change to not archive the last segment still seems like a good idea, but perhaps we should only do that in master. Even after that patch, you can have a problem in more complicated scenarios involving both an archive and streaming replication. For example, imagine a timeline history like this:

TLI
1 ----+
      |
2     +----

Now imagine that timeline 1 has been fully archived, and there are WAL segments much higher than the point where the timeline switch occurred present in the archive. But none of the WAL segments for timeline 2 have been archived; they are only present on the master server. You want to perform recovery to timeline 2, using the archived WAL segments for timeline 1, and streaming replication to catch up to the tip of timeline 2. Whether we prefer files from pg_xlog or archive will make no difference in this case, as there are no files in pg_xlog.
But it will merrily apply all the WAL for timeline 1 from the archive that it can find, past the timeline switch point. After that, when it tries to connect to the server with streaming replication, it will fail. There's not much we can do about that in 9.2 and below, but in 9.3 the timeline history file contains the exact timeline switch points, so we could be more careful and not apply any extra WAL on the old timeline past the switch point. We could also be more exact in which files we try to restore from the archive, instead of just polling every future TLI in the history. - Heikki
[HACKERS] nextVictimBuffer in README
While reading through src/backend/storage/buffer/README and looking at the code that it describes, I noticed that the case is wrong for nextVictimBuffer. It's no big deal really, but the attached trivial patch makes the README match the code. -- Vik *** a/src/backend/storage/buffer/README --- b/src/backend/storage/buffer/README *** *** 176,183 small limit value) whenever the buffer is pinned. (This requires only the buffer header spinlock, which would have to be taken anyway to increment the buffer reference count, so it's nearly free.) ! The clock hand is a buffer index, NextVictimBuffer, that moves circularly ! through all the available buffers. NextVictimBuffer is protected by the BufFreelistLock. The algorithm for a process that needs to obtain a victim buffer is: --- 176,183 buffer header spinlock, which would have to be taken anyway to increment the buffer reference count, so it's nearly free.) ! The clock hand is a buffer index, nextVictimBuffer, that moves circularly ! through all the available buffers. nextVictimBuffer is protected by the BufFreelistLock. The algorithm for a process that needs to obtain a victim buffer is: *** *** 189,196 is pinned or has a nonzero usage count, it cannot be used; ignore it and return to the start of step 2. Otherwise, pin the buffer, release BufFreelistLock, and return the buffer. ! 3. Otherwise, select the buffer pointed to by NextVictimBuffer, and ! circularly advance NextVictimBuffer for next time. 4. If the selected buffer is pinned or has a nonzero usage count, it cannot be used. Decrement its usage count (if nonzero) and return to step 3 to --- 189,196 return to the start of step 2. Otherwise, pin the buffer, release BufFreelistLock, and return the buffer. ! 3. Otherwise, select the buffer pointed to by nextVictimBuffer, and ! circularly advance nextVictimBuffer for next time. 4. If the selected buffer is pinned or has a nonzero usage count, it cannot be used. 
Decrement its usage count (if nonzero) and return to step 3 to *** *** 254,264 Background Writer's Processing The background writer is designed to write out pages that are likely to be recycled soon, thereby offloading the writing work from active backends. To do this, it scans forward circularly from the current position of ! NextVictimBuffer (which it does not change!), looking for buffers that are dirty and not pinned nor marked with a positive usage count. It pins, writes, and releases any such buffer. ! If we can assume that reading NextVictimBuffer is an atomic action, then the writer doesn't even need to take the BufFreelistLock in order to look for buffers to write; it needs only to spinlock each buffer header for long enough to check the dirtybit. Even without that assumption, the writer --- 254,264 The background writer is designed to write out pages that are likely to be recycled soon, thereby offloading the writing work from active backends. To do this, it scans forward circularly from the current position of ! nextVictimBuffer (which it does not change!), looking for buffers that are dirty and not pinned nor marked with a positive usage count. It pins, writes, and releases any such buffer. ! If we can assume that reading nextVictimBuffer is an atomic action, then the writer doesn't even need to take the BufFreelistLock in order to look for buffers to write; it needs only to spinlock each buffer header for long enough to check the dirtybit. Even without that assumption, the writer *** *** 268,274 the contention cost of the writer compared to PG 8.0.) During a checkpoint, the writer's strategy must be to write every dirty buffer (pinned or not!). We may as well make it start this scan from ! NextVictimBuffer, however, so that the first-to-be-written pages are the ones that backends might otherwise have to write for themselves soon. 
The background writer takes shared content lock on a buffer while writing it --- 268,274 During a checkpoint, the writer's strategy must be to write every dirty buffer (pinned or not!). We may as well make it start this scan from ! nextVictimBuffer, however, so that the first-to-be-written pages are the ones that backends might otherwise have to write for themselves soon. The background writer takes shared content lock on a buffer while writing it
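As an aside, the clock-sweep steps the README describes (select the buffer at nextVictimBuffer, advance it circularly, skip pinned buffers, and decrement nonzero usage counts until a victim falls out) can be modeled in a few lines. This is an illustrative toy, not the real buffer manager: the actual code also consults the freelist first and protects nextVictimBuffer with BufFreelistLock.

```python
# Toy model of steps 3-4 of the victim-selection algorithm in the README.
class Buffer:
    def __init__(self):
        self.pinned = False
        self.usage_count = 0

def choose_victim(buffers, state):
    """state is a dict holding the shared nextVictimBuffer index."""
    while True:
        i = state["nextVictimBuffer"]
        # circularly advance nextVictimBuffer for next time
        state["nextVictimBuffer"] = (i + 1) % len(buffers)
        buf = buffers[i]
        if buf.pinned:
            continue                 # cannot be used; move the hand onward
        if buf.usage_count > 0:
            buf.usage_count -= 1     # survives one more sweep of the clock hand
            continue
        return i                     # unpinned and usage count zero: the victim
```

A recently used buffer thus gets roughly usage_count extra trips of the clock hand before it can be evicted, which is what makes the sweep approximate LRU cheaply.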
Re: [HACKERS] issue with gininsert under very high load
On 02/13/2014 10:58 AM, Heikki Linnakangas wrote: Perhaps we should use a lock to enforce that only one process tries to clean up the pending list at a time. Is that going to serialize all these inserts? It will serialize the cleanup process, which moves entries from the pending list to the tree proper. But that's better than the current situation. Currently, when two processes attempt it, they will both try to insert into the GIN tree, but one of them will notice that the other one already did the cleanup, and bail out. So only one process contributes to progress, while the others just waste their effort. The processes should try to get the lock, and just give up if it's already held rather than wait. If someone else is already doing the cleanup, there's no need for the current process to do it. Sounds good. cheers andrew
Re: [HACKERS] truncating pg_multixact/members
On 2014-02-12 17:40:44 -0300, Alvaro Herrera wrote: Also, AutoVacOpts (used as part of reloptions) gained three extra fields. Since this is in the middle of StdRdOptions, it'd be somewhat more involved to put these at the end of that struct. This might be a problem if somebody has a module calling RelationIsSecurityView(). If anyone thinks we should be concerned about such an ABI change, please shout quickly. That sounds problematic --- surely StdRdOptions might be something extensions are making use of? So can we assume that security_barrier is the only thing to be concerned about? If so, the attached patch should work around the issue by placing it in the same physical location. Aw. How instead about temporarily introducing AutoVacMXactOpts or something? Changing the name of the member variable sounds just as likely to break things. I guess if there are modules that add extra stuff beyond StdRdOptions, this wouldn't work, but I'm not really sure how likely this is given that our reloptions design hasn't proven to be the most extensible thing in the world. Hm, I don't see how it'd be problematic, even if they do. I don't really understand the design of the reloptions code, but afaics, they shouldn't do so by casting around rd_options but by parsing it anew, right? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Same double precision operations, different results
Please can someone tell me why this behavior? Thx!! I know it's calculating over coordinates on same points, but the operations of values (all double precision) differs even forcing with cast (*) Here some double precision calculations and obtaining values #Select #cos(radians(39.9826557)) as f1, #cos(radians(39.9826557)) as f2, #cos(radians(-0.0477312004383) - radians(-0.0477312004383)) as f3, #sin(radians(39.9826557)) as f4, #sin(radians(39.9826557)) as f5; f1 |f2 | f3 |f4 |f5 ---+---++---+--- 0.766238989559398 | 0.766238989559398 | 1 | 0.642555686986733 | 0.642555686986733 (1 row) (*) Here above same obtaining values involved in simple operations # select (0.766238989559398 * 0.766238989559398 * 1 + 0.642555686986733 * 0.642555686986733) calc; calc -- 0.999633651488135693 (1 row) (*) Here above queries playing together --- Result = 1 ? N! It should be 0.999633651488135693 # select (v.f1 * v.f2 * v.f3 + v.f4 * v.f5) calc from ( select cos(radians(39.9826557)) as f1, cos(radians(39.9826557)) as f2, cos(radians(-0.0477312004383) - radians(-0.0477312004383)) as f3, sin(radians(39.9826557)) as f4, sin(radians(39.9826557)) as f5 ) as v; calc -- 1 (1 row) (*) Yep, this produces problems with acos. Here acos working directly with the above values. OK this is like acos(0.999633651488135693) # select acos(0.766238989559398 * 0.766238989559398 * 1 + 0.642555686986733 * 0.642555686986733); acos -- 2.58095682795179e-08 (1 row) (*) Yep, you can see same result # select acos(0.999633651488135693); acos -- 2.58095682795179e-08 (1 row) (*) ok acos values from -1 to 1 you can see acos(1) here # select acos(1); acos -- 0 (1 row) (*) Why this calculation produces 1 and not 0.999633651488135693 ?
# select ( cos(radians(39.9826557)) * cos(radians(39.9826557)) * cos(radians(-0.0477312004383) - radians(-0.0477312004383)) + sin(radians(39.9826557)) * sin(radians(39.9826557)) ) calc; calc -- 1 (1 row) (*) Anyway, the result is 1, so if acos(1) gets a result, this must get the same result NOPS!! input is out of range # select acos( #cos(radians(39.9826557)) #* cos(radians(39.9826557)) #* cos(radians(-0.0477312004383) - radians(-0.0477312004383)) #+ sin(radians(39.9826557)) #* sin(radians(39.9826557)) # ); ERROR: input is out of range (*) You are thinking: it's some cast problem ... NOPS!! input is out of range # select acos( cast ( cos(radians(39.9826557)) * cos(radians(39.9826557)) * cos(radians(-0.0477312004383) - radians(-0.0477312004383)) + sin(radians(39.9826557)) * sin(radians(39.9826557)) as double precision ) ); ERROR: input is out of range Really, I can't get the reason!!! Some idea? Workaround? Thx
Re: [HACKERS] Same double precision operations, different results
Daniel Vázquez daniel2d2...@gmail.com writes: Please can someone tell me why this behavior? Thx!! You're confusing numeric and double precision; in particular this calculation is *not* being done in float8, but numeric: # select (0.766238989559398 * 0.766238989559398 * 1 + 0.642555686986733 * 0.642555686986733) calc; (*) Why this calculation produces 1 and not 0.999633651488135693 Actually, it's not producing 1, but a smidgen more: regression=# set extra_float_digits TO 3; SET regression=# select ( cast ( cos(radians(39.9826557)) * cos(radians(39.9826557)) * cos(radians(-0.0477312004383) - radians(-0.0477312004383)) + sin(radians(39.9826557)) * sin(radians(39.9826557)) as double precision ) ); float8 - 1.00000000000000022 (1 row) You've got roundoff error either way, but this way happens to be in the direction that makes acos() complain. regards, tom lane
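The same roundoff behavior is easy to reproduce outside SQL in any IEEE-754 double arithmetic: cos²(x)·cos(0) + sin²(x) is mathematically exactly 1, but the computed value can land a hair above 1 (whether it does depends on the platform's math library), and acos() then rejects it. The clamping workaround below is an assumption on my part, not something this thread prescribes, but it is a common fix for this class of spherical-distance formulas.

```python
import math

# Mirror the values from the queries above.
lat = math.radians(39.9826557)
# Same longitude subtracted from itself: exactly 0.0 in floating point.
dlon = math.radians(-0.0477312004383) - math.radians(-0.0477312004383)

# Mathematically 1; numerically it may be slightly above or below 1.
x = math.cos(lat) * math.cos(lat) * math.cos(dlon) \
    + math.sin(lat) * math.sin(lat)

def safe_acos(v):
    # Clamp to acos's domain so roundoff just above 1 maps to acos(1) == 0
    # instead of raising a domain error ("input is out of range" in SQL).
    return math.acos(max(-1.0, min(1.0, v)))
```

In SQL the equivalent would be wrapping the expression in greatest(-1, least(1, ...)) before calling acos.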
Re: [HACKERS] truncating pg_multixact/members
Andres Freund escribió: On 2014-02-12 17:40:44 -0300, Alvaro Herrera wrote: Also, AutoVacOpts (used as part of reloptions) gained three extra fields. Since this is in the middle of StdRdOptions, it'd be somewhat more involve to put these at the end of that struct. This might be a problem if somebody has a module calling RelationIsSecurityView(). If anyone thinks we should be concerned about such an ABI change, please shout quickly. That sounds problematic --- surely StdRdOptions might be something extensions are making use of? So can we assume that security_barrier is the only thing to be concerned about? If so, the attached patch should work around the issue by placing it in the same physical location. Aw. How instead about temporarily introducing AutoVacMXactOpts or something? Changing the name of the member variable sounds just as likely to break things. Yes, that's what I did --- see the attached patch, which I would apply on top of the code for master and would be only in 9.3. The idea here is to keep the existing bits of StdRdOpts identical, so that macros such as RelationIsSecurityView() that were compiled with the old rel.h continue to work unchanged and without requiring a recompile. I guess if there are modules that add extra stuff beyond StdRdOptions, this wouldn't work, but I'm not really sure how likely this is given that our reloptions design hasn't proven to be the most extensible thing in the world. Hm, I don't see how it'd be problematic, even if they do. I don't really understand the design of the reloptions code, but afaics, they shouldn't do so by casting around rd_options but by parsing it anew, right? Now that I think about it, I don't think adding stuff at the end of StdRdOptions has anything to do with adding nonstandard options. So if we extend that struct we're not breaking any ABI contract. 
-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] truncating pg_multixact/members
Alvaro Herrera escribió: Yes, that's what I did --- see the attached patch, which I would apply on top of the code for master and would be only in 9.3. (Of course, these changes affect other parts of the code, in particular autovacuum.c and reloptions.c. But that's not important here). -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services commit 689869c791aa7d13f408430ef438291bddb84b2f Author: Alvaro Herrera alvhe...@alvh.no-ip.org Date: Thu Feb 13 12:31:37 2014 -0300 9.3 tweaks to avoid ABI break diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index 534a9ea..6daa14a 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -1170,11 +1170,11 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind) {autovacuum_freeze_table_age, RELOPT_TYPE_INT, offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, freeze_table_age)}, {autovacuum_multixact_freeze_min_age, RELOPT_TYPE_INT, - offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, multixact_freeze_min_age)}, + offsetof(StdRdOptions, autovacuum2) +offsetof(AutoVacOpts2, multixact_freeze_min_age)}, {autovacuum_multixact_freeze_max_age, RELOPT_TYPE_INT, - offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, multixact_freeze_max_age)}, + offsetof(StdRdOptions, autovacuum2) +offsetof(AutoVacOpts2, multixact_freeze_max_age)}, {autovacuum_multixact_freeze_table_age, RELOPT_TYPE_INT, - offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, multixact_freeze_table_age)}, + offsetof(StdRdOptions, autovacuum2) +offsetof(AutoVacOpts2, multixact_freeze_table_age)}, {autovacuum_vacuum_scale_factor, RELOPT_TYPE_REAL, offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, vacuum_scale_factor)}, {autovacuum_analyze_scale_factor, RELOPT_TYPE_REAL, diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index d749f70..8ef59e6 100644 --- 
a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -2430,10 +2430,11 @@ typedef struct VacuumStmt int options; /* OR of VacuumOption flags */ int freeze_min_age; /* min freeze age, or -1 to use default */ int freeze_table_age; /* age at which to scan whole table */ - int multixact_freeze_min_age; /* min multixact freeze age, or -1 to use default */ - int multixact_freeze_table_age; /* multixact age at which to scan whole table */ RangeVar *relation; /* single table to process, or NULL */ List *va_cols; /* list of column names, or NIL for all */ + /* place these at the end, to avoid ABI break within 9.3 branch */ + int multixact_freeze_min_age; /* min multixact freeze age, or -1 to use default */ + int multixact_freeze_table_age; /* multixact age at which to scan whole table */ } VacuumStmt; /* -- diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h index 47ae106..53d5690 100644 --- a/src/include/utils/rel.h +++ b/src/include/utils/rel.h @@ -187,7 +187,10 @@ typedef struct RelationData * be applied to relations that use this format or a superset for * private options data. */ - /* autovacuum-related reloptions. */ + /* autovacuum-related reloptions. + * + * Split in two to avoid ABI break. + */ typedef struct AutoVacOpts { bool enabled; @@ -198,19 +201,24 @@ typedef struct AutoVacOpts int freeze_min_age; int freeze_max_age; int freeze_table_age; - int multixact_freeze_min_age; - int multixact_freeze_max_age; - int multixact_freeze_table_age; float8 vacuum_scale_factor; float8 analyze_scale_factor; } AutoVacOpts; +typedef struct AutoVacOpts2 +{ + int multixact_freeze_min_age; + int multixact_freeze_max_age; + int multixact_freeze_table_age; +} AutoVacOpts2; + typedef struct StdRdOptions { int32 vl_len_; /* varlena header (do not touch directly!) 
*/ int fillfactor; /* page fill factor in percent (0..100) */ AutoVacOpts autovacuum; /* autovacuum-related options */ bool security_barrier; /* for views */ + AutoVacOpts2 autovacuum2; /* rest of autovacuum options */ } StdRdOptions; #define HEAP_MIN_FILLFACTOR 10
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
On 02/13/2014 06:47 PM, Heikki Linnakangas wrote: On 02/13/2014 02:42 PM, Heikki Linnakangas wrote: The behavior where we prefer a segment from archive with lower TLI over a file with higher TLI in pg_xlog actually changed in commit a068c391ab0. Arguably changing it wasn't a good idea, but the problem your test script demonstrates can be fixed by not archiving the partial segment, with no change to the preference of archive/pg_xlog. As discussed, archiving a partial segment seems like a bad idea anyway, so let's just stop doing that. After some further thought, while not archiving the partial segment fixes your test script, it's not enough to fix all variants of the problem. Even if archive recovery doesn't archive the last, partial, segment, if the original master server is still running, it's entirely possible that it fills the segment and archives it. In that case, archive recovery will again prefer the archived segment with lower TLI over the segment with newer TLI in pg_xlog. So I agree we should commit the patch you posted (or something to that effect). The change to not archive the last segment still seems like a good idea, but perhaps we should only do that in master. To draw this to conclusion, barring any further insights to this, I'm going to commit the attached patch to master and REL9_3_STABLE. Please have a look at the patch, to see if I'm missing something. I modified the state machine to skip over XLOG_FROM_XLOG state, if reading in XLOG_FROM_ARCHIVE failed; otherwise you first scan the archive and pg_xlog together, and then pg_xlog alone, which is pointless. In master, I'm also going to remove the archive last segment on old timeline code. - Heikki From 15b0cadb0006b97f87257f4f903db8d33411ada9 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas heikki.linnakan...@iki.fi Date: Thu, 13 Feb 2014 19:16:00 +0200 Subject: [PATCH 1/1] Change the order that pg_xlog and WAL archive are polled for WAL segments. 
If there is a WAL segment with same ID but different TLI present in both the WAL archive and pg_xlog, prefer the one with higher TLI. Before this patch, the archive was polled first, for all expected TLIs, and only if no file was found was pg_xlog scanned. This was a change in behavior from 9.3, which first scanned archive and pg_xlog for the highest TLI, then archive and pg_xlog for the next highest TLI and so forth. This patch reverts the behavior back to what it was in 9.2. The reason for this is that if for example you try to do archive recovery to timeline 2, which branched off timeline 1, but the WAL for timeline 2 is not archived yet, we would replay past the timeline switch point on timeline 1 using the archived files, before even looking timeline 2's files in pg_xlog Report and patch by Kyotaro Horiguchi. Backpatch to 9.3 where the behavior was changed. --- src/backend/access/transam/xlog.c | 22 +- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 6a4a504..b2d8e1a 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9576,17 +9576,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, /*--- * Standby mode is implemented by a state machine: * - * 1. Read from archive (XLOG_FROM_ARCHIVE) - * 2. Read from pg_xlog (XLOG_FROM_PG_XLOG) - * 3. Check trigger file - * 4. Read from primary server via walreceiver (XLOG_FROM_STREAM) - * 5. Rescan timelines - * 6. Sleep 5 seconds, and loop back to 1. + * 1. Read from either archive or pg_xlog (XLOG_FROM_ARCHIVE), or just + *pg_xlog (XLOG_FROM_XLOG) + * 2. Check trigger file + * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM) + * 4. Rescan timelines + * 5. Sleep 5 seconds, and loop back to 1. * * Failure to read from the current source advances the state machine to - * the next state. 
In addition, successfully reading a file from pg_xlog - * moves the state machine from state 2 back to state 1 (we always prefer - * files in the archive over files in pg_xlog). + * the next state. * * 'currentSource' indicates the current state. There are no currentSource * values for check trigger, rescan timelines, and sleep states, @@ -9614,9 +9612,6 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, switch (currentSource) { case XLOG_FROM_ARCHIVE: - currentSource = XLOG_FROM_PG_XLOG; - break; - case XLOG_FROM_PG_XLOG: /* @@ -9781,7 +9776,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, * Try to restore the file from archive, or read an existing * file from pg_xlog. */ -readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, currentSource); +readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, + currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY : currentSource); if (readFile = 0) return true; /* success! */ --
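The selection rule the patch restores (for a given segment number, prefer the highest timeline across the archive and pg_xlog together, rather than preferring the archive as such) can be sketched as follows. The data structures here are invented for illustration; the real logic lives in XLogFileReadAnyTLI, which for each timeline tries the archive and then pg_xlog.

```python
# candidates: (tli, source) pairs available for one WAL segment number,
# where source is "archive" or "pg_xlog".
def pick_segment(candidates):
    """Return the (tli, source) pair recovery should read.

    Highest timeline wins regardless of source; between copies of the same
    timeline, the archived copy is tried first, mirroring the per-TLI
    archive-then-pg_xlog scan described in the commit message.
    """
    return max(candidates, key=lambda c: (c[0], c[1] == "archive"))
```

Before the patch, every expected TLI was polled in the archive before pg_xlog was consulted at all, which is how recovery could replay timeline 1 past the switch point while timeline 2's files sat unread.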
Re: [HACKERS] truncating pg_multixact/members
On 2014-02-13 14:40:39 -0300, Alvaro Herrera wrote: Andres Freund escribió: On 2014-02-12 17:40:44 -0300, Alvaro Herrera wrote: Also, AutoVacOpts (used as part of reloptions) gained three extra fields. Since this is in the middle of StdRdOptions, it'd be somewhat more involved to put these at the end of that struct. This might be a problem if somebody has a module calling RelationIsSecurityView(). If anyone thinks we should be concerned about such an ABI change, please shout quickly. That sounds problematic --- surely StdRdOptions might be something extensions are making use of? So can we assume that security_barrier is the only thing to be concerned about? If so, the attached patch should work around the issue by placing it in the same physical location. Aw. How instead about temporarily introducing AutoVacMXactOpts or something? Changing the name of the member variable sounds just as likely to break things. Yes, that's what I did --- see the attached patch, which I would apply on top of the code for master and would be only in 9.3. The idea here is to keep the existing bits of StdRdOptions identical, so that macros such as RelationIsSecurityView() that were compiled with the old rel.h continue to work unchanged and without requiring a recompile. What I mean is that earlier code using StdRdOptions->security_barrier directly now won't compile anymore. So you've changed an ABI breakage into an API break. That's why I suggest adding the new options into a separate struct at the end of StdRdOptions; that won't break anything. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
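The ABI point being argued here can be demonstrated with ctypes standing in for C struct layout: already-compiled code reads fields at fixed byte offsets, so inserting fields in the middle of StdRdOptions would shift security_barrier, while appending a trailing struct leaves every pre-existing offset untouched. The field lists below are trimmed stand-ins, not the real structs.

```python
import ctypes

class AutoVacOpts(ctypes.Structure):
    # abbreviated: the real struct has many more members
    _fields_ = [("enabled", ctypes.c_bool),
                ("freeze_min_age", ctypes.c_int),
                ("freeze_max_age", ctypes.c_int)]

class AutoVacOpts2(ctypes.Structure):
    # the new multixact fields, split out as in the patch
    _fields_ = [("multixact_freeze_min_age", ctypes.c_int)]

class StdRdOptionsOld(ctypes.Structure):
    _fields_ = [("vl_len_", ctypes.c_int32),
                ("fillfactor", ctypes.c_int),
                ("autovacuum", AutoVacOpts),
                ("security_barrier", ctypes.c_bool)]

class StdRdOptionsNew(ctypes.Structure):
    # append-only extension: old fields first, new struct at the end
    _fields_ = StdRdOptionsOld._fields_ + [("autovacuum2", AutoVacOpts2)]

# Modules compiled against the old layout still find security_barrier
# at the same byte offset in the new layout.
old_off = StdRdOptionsOld.security_barrier.offset
new_off = StdRdOptionsNew.security_barrier.offset
```

Had the three ints been inserted inside AutoVacOpts instead, security_barrier would move by 12 bytes and RelationIsSecurityView() compiled against the old header would read garbage, which is the breakage the thread is trying to avoid.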
Re: [HACKERS] how set GUC_check_errhint_string in call_string_check_hook()
Amit Langote amitlangot...@gmail.com writes: On Thu, Feb 13, 2014 at 5:59 PM, amul sul sul_a...@yahoo.co.in wrote: 2. Can I pass hint message in above ereport(), how? It could be done by setting the check hook related global variable GUC_check_errhint_string inside your hook function. You should use the macro GUC_check_errhint() to set up a hint inside a GUC variable's check function. See check_XactIsoLevel() in src/backend/commands/variable.c for examples. regards, tom lane
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: I think what you're arguing is that we should see WAL records filling the rest of segment 1 before we see any references to segment 2, but if that's the case then how did we get into the situation you reported? Or is it just that it was a broken base backup to start with? The scenario I could come up with that didn't require a broken base backup was that there was an earlier truncate or vacuum. So the sequence is high offset reference, truncate, growth, crash. All possibly on a single database. That's not really an issue, because then it would be OK to discard the high-offset update; we'd recognize that as safe when we replay the truncation. It's possible we're better off not assuming we've thought of all possible ways this can happen though. That's what's bothering me, too. On the other hand, if we can't think of a scenario where it'd be necessary to replay the high-offset update, then I'm disinclined to mess with the code further. regards, tom lane
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On Thu, Feb 13, 2014 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: The scenario I could come up with that didn't require a broken base backup was that there was an earlier truncate or vacuum. So the sequence is high offset reference, truncate, growth, crash. All possibly on a single database. That's not really an issue, because then it would be OK to discard the high-offset update; we'd recognize that as safe when we replay the truncation. Yeah, that's my point. It's possible we're better off not assuming we've thought of all possible ways this can happen though. That's what's bothering me, too. On the other hand, if we can't think of a scenario where it'd be necessary to replay the high-offset update, then I'm disinclined to mess with the code further. And the whole point of the undefined page error checking is to detect cases like this, so covering them up in the name of possible edge cases we haven't thought of kind of defeats the purpose. In particular I would have liked to get errors rather than soldier on when the database found these missing segments. In that vein, the other possibly open question was how we got past the undefined pages errors that we did see. Andres said he thought that was due to the bug where some piece of code was mistakenly using the presence of a snapshot, but I'm not clear how that can cause this though. -- greg
Re: [HACKERS] New hook after raw parsing, before analyze
David Beck db...@starschema.net writes: I have table like data structures in the source system for the FDW I work on. These tables are sometimes too big and the source system is able to filter and join them with limitations, thus it is not optimal to transfer the data to Postgres. At the same time I want the users to think in terms of the original tables. The idea is to rewrite the SQL queries like this: "SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND b.col2=987" to: "SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987" TBH this sounds like a spectacularly bad idea, especially in the place and way you propose to do it. You can't even do catalog access safely where you've put that hook, not to mention that there are many other places where queries can be submitted. But more generally, an FDW should not operate in the way you're describing. We do lack support for pushing joins to the foreign server, and that needs to be addressed; but we need to do it in the planner, not by kluging the query somewhere upstream of that. regards, tom lane
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Fri, Oct 11, 2013 at 03:39:51PM -0700, Kevin Grittner wrote: Josh Berkus j...@agliodbs.com wrote: On 10/11/2013 01:11 PM, Bruce Momjian wrote: In summary, I think we need to: * decide on new defaults for work_mem and maintenance_work_mem * add an initdb flag to allow users/packagers to set shared_buffers? * add an autovacuum_work_mem setting? * change the default for temp_buffers? If we're changing defaults, bgwriter_lru_maxpages and vacuum_cost_limit could also use a bump; those thresholds were set for servers with 1GB of RAM. +1 on those. Also, I have often had to bump cpu_tuple_cost into the 0.03 to 0.05 range to get a good plan. In general, this makes the exact settings of *_page_cost less fussy, and I have hit situations where I was completely unable to get a good plan to emerge without bumping cpu_tuple_cost relative to the other cpu costs. I know that it's possible to engineer a workload that shows any particular cost adjustment to make things worse, but in real-life production environments I have never seen an increase in this range make plan choice worse. So, would anyone like me to create patches for any of these items before we hit 9.4 beta? We have added autovacuum_work_mem, and increasing work_mem and maintenance_work_mem by 4x is a simple operation. Not sure about the others. Or do we just keep this all for 9.5? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
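For concreteness, the "simple" part of the proposal would amount to something like this in postgresql.conf (illustrative only; the cpu_tuple_cost line reflects Kevin's suggested range, not an agreed default):

```ini
# 4x bump of the memory defaults discussed above
work_mem = 4MB                  # previous default: 1MB
maintenance_work_mem = 64MB     # previous default: 16MB

# Kevin's production experience; default is 0.01
cpu_tuple_cost = 0.03           # suggested range 0.03 to 0.05
```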
Re: [HACKERS] truncating pg_multixact/members
Alvaro Herrera wrote: So here are two patches -- the first one, for 9.3 and HEAD, introduces the new aging variables and uses them throughout vacuum and autovacuum, including per-table options; the second one adjusts the struct declarations to avoid the ABI break in VacuumStmt and StdRdOptions. I forgot to ask: what opinions are there about vacuum_multixact_freeze_table_age's default value? Right now I have 150 million, same as for Xids. However, it might make sense to use 300 million, so that whole-table scans are not forced earlier than for Xids unless the consumption rate for multixacts is double the one for Xids. I already have set autovacuum_multixact_freeze_max_age to 400 million, i.e. double that for Xids. This means emergency vacuums will not take place for multis unless the consumption rate is double that for Xids. This seems pretty reasonable to me. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
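In postgresql.conf terms, the values under discussion look like this (illustrative; the 300 million alternative for the table age is the open question):

```ini
vacuum_multixact_freeze_table_age = 150000000    # same as vacuum_freeze_table_age; 300 million proposed
autovacuum_multixact_freeze_max_age = 400000000  # 2x autovacuum_freeze_max_age (200 million)
```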
Re: [HACKERS] [BUGS] BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding
dig...@126.com writes: select t, t::bytea from convert_from('\xeec1', 'sql_ascii') as g(t); [ fails to check that string is valid in database encoding ] Hm, yeah. Normal input to the database goes through pg_any_to_server(), which will apply a validation step if the source encoding is SQL_ASCII and the destination encoding is something else. However, pg_convert and some other places call pg_do_encoding_conversion() directly, and that function will just quietly do nothing if either encoding is SQL_ASCII. The minimum-refactoring solution to this would be to tweak pg_do_encoding_conversion() so that if the src_encoding is SQL_ASCII but the dest_encoding isn't, it does pg_verify_mbstr() rather than nothing. I'm not sure if this would break anything we need to have work, though. Thoughts? Do we want to back-patch such a change? regards, tom lane
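The proposed fix is conceptually simple: when the source is SQL_ASCII (i.e. no conversion is performed), verify that the bytes are valid in the destination encoding instead of passing them through untouched. A rough Python analogue of that behavior (not PostgreSQL's actual code; the function name is illustrative):

```python
def convert_from_sql_ascii(data: bytes, dest_encoding: str) -> bytes:
    """Mimic pg_do_encoding_conversion() with a SQL_ASCII source:
    no conversion happens, but (per the proposed fix) the bytes must
    still be valid in the destination encoding, as pg_verify_mbstr()
    would check on the server side."""
    try:
        data.decode(dest_encoding)  # validation only; result discarded
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"invalid byte sequence for encoding {dest_encoding!r}") from exc
    return data

# b'\xee\xc1' is not valid UTF-8 (0xEE starts a 3-byte sequence, but
# 0xC1 is not a continuation byte), so with the fix it is rejected
# instead of being silently stored:
try:
    convert_from_sql_ascii(b'\xee\xc1', 'utf-8')
    print("accepted")
except ValueError:
    print("rejected")  # prints "rejected"
```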
Re: [HACKERS] issue with gininsert under very high load
Heikki Linnakangas hlinnakan...@vmware.com writes: Perhaps we should use a lock to enforce that only one process tries to clean up the pending list at a time. Something like the attached? Can somebody who's seen this problem confirm this improves matters? (ginInsertCleanup's header comment also needs to be rewritten, but for testing purposes, this is fine.) regards, tom lane

diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 4a65046..38cda14 100644
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***************
*** 21,26 ****
--- 21,27 ----
  #include "access/gin_private.h"
  #include "commands/vacuum.h"
  #include "miscadmin.h"
+ #include "storage/lmgr.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
*************** ginInsertCleanup(GinState *ginstate,
*** 739,744 ****
--- 740,755 ----
  	KeyArray	datums;
  	BlockNumber	blkno;

+ 	/*
+ 	 * We use a heavyweight lock on the metapage to ensure that only one
+ 	 * backend at a time tries to clean up the pending list. While it does
+ 	 * actually work for multiple backends to run this code concurrently,
+ 	 * that turns out to be a bad idea because there's lots of locking
+ 	 * conflicts. So if someone else is already running cleanup, we just
+ 	 * do nothing.
+ 	 */
+ 	if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
+ 		return;
+
  	metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
  	LockBuffer(metabuffer, GIN_SHARE);
  	metapage = BufferGetPage(metabuffer);
*************** ginInsertCleanup(GinState *ginstate,
*** 748,753 ****
--- 759,765 ----
  	{
  		/* Nothing to do */
  		UnlockReleaseBuffer(metabuffer);
+ 		UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
  		return;
  	}
*************** ginInsertCleanup(GinState *ginstate,
*** 925,930 ****
--- 937,944 ----
  	ReleaseBuffer(metabuffer);

+ 	UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
+
  	/* Clean up temporary space */
  	MemoryContextSwitchTo(oldCtx);
  	MemoryContextDelete(opCtx);
Re: [HACKERS] truncating pg_multixact/members
Alvaro Herrera wrote: So here are two patches -- the first one, for 9.3 and HEAD, introduces the new aging variables and uses them throughout vacuum and autovacuum, including per-table options; the second one adjusts the struct declarations to avoid the ABI break in VacuumStmt and StdRdOptions. I have pushed this for both 9.3 and master. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] issue with gininsert under very high load
On 2014-02-13 16:15:42 -0500, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: Perhaps we should use a lock to enforce that only one process tries to clean up the pending list at a time. Something like the attached? Can somebody who's seen this problem confirm this improves matters? (ginInsertCleanup's header comment also needs to be rewritten, but for testing purposes, this is fine.) Hm. Won't that possibly lead to the fast tuple list growing unboundedly? I think we would at least need to stop using the fast tuple mechanism during gininsert() if it's already too big and do plain inserts. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Another pgindent gripe
I noticed that current pgindent wants to do this to a recently-added comment in psql/copy.c:

*** handleCopyIn(PGconn *conn, FILE *copystr
*** 627,633 ****
  			/*
  			 * This code erroneously assumes '\.' on a line alone
  			 * inside a quoted CSV string terminates the \copy.
! 			 * http://www.postgresql.org/message-id/e1tdnvq-0001ju...@wrigleys.postgresql.org
  			 */
  			if (strcmp(buf, "\\.\n") == 0 ||
  				strcmp(buf, "\\.\r\n") == 0)
--- 627,634 ----
  			/*
  			 * This code erroneously assumes '\.' on a line alone
  			 * inside a quoted CSV string terminates the \copy.
! 			 * http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@w
! 			 * rigleys.postgresql.org
  			 */
  			if (strcmp(buf, "\\.\n") == 0 ||
  				strcmp(buf, "\\.\r\n") == 0)

Doesn't seem particularly friendly. regards, tom lane
Re: [HACKERS] Another pgindent gripe
On Thu, Feb 13, 2014 at 06:14:17PM -0500, Tom Lane wrote: I noticed that current pgindent wants to do this to a recently-added comment in psql/copy.c:

*** handleCopyIn(PGconn *conn, FILE *copystr
*** 627,633 ****
  			/*
  			 * This code erroneously assumes '\.' on a line alone
  			 * inside a quoted CSV string terminates the \copy.
! 			 * http://www.postgresql.org/message-id/e1tdnvq-0001ju...@wrigleys.postgresql.org
  			 */
  			if (strcmp(buf, "\\.\n") == 0 ||
  				strcmp(buf, "\\.\r\n") == 0)
--- 627,634 ----
  			/*
  			 * This code erroneously assumes '\.' on a line alone
  			 * inside a quoted CSV string terminates the \copy.
! 			 * http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@w
! 			 * rigleys.postgresql.org
  			 */
  			if (strcmp(buf, "\\.\n") == 0 ||
  				strcmp(buf, "\\.\r\n") == 0)

Doesn't seem particularly friendly. Yeah, that is ugly, and I was about to add a URL to a C comment myself today. I can set the column target width, but there doesn't seem to be any flag that says to avoid breaking text to do the wrapping. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] issue with gininsert under very high load
Andres Freund and...@2ndquadrant.com writes: On 2014-02-13 16:15:42 -0500, Tom Lane wrote: Something like the attached? Can somebody who's seen this problem confirm this improves matters? Hm. Won't that possibly lead to the fast tuple list growing unboundedly? I think we would at least need to stop using the fast tuple mechanism during gininsert() if it's already too big and do plain inserts. No, because we've already got a process working on cleaning it out. In any case, this needs some testing to see if it's an improvement or not. regards, tom lane
Re: [HACKERS] issue with gininsert under very high load
On 2014-02-13 18:49:21 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-13 16:15:42 -0500, Tom Lane wrote: Something like the attached? Can somebody who's seen this problem confirm this improves matters? Hm. Won't that possibly lead to the fast tuple list growing unboundedly? I think we would at least need to stop using the fast tuple mechanism during gininsert() if it's already too big and do plain inserts. No, because we've already got a process working on cleaning it out. Well, appending to the fast tuple list will normally be fully cached I/O, cleaning it up not so much. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: On Thu, Feb 13, 2014 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: That's what's bothering me, too. On the other hand, if we can't think of a scenario where it'd be necessary to replay the high-offset update, then I'm disinclined to mess with the code further. And the whole point of the undefined page error checking is to detect cases like this, so covering them up in the name of possible edge cases we haven't thought of kind of defeats the purpose. Yeah, good point. Let's leave it alone and see if the existing fix helps. regards, tom lane
Re: [HACKERS] [PATCH] Store Extension Options
On Sun, Feb 9, 2014 at 2:22 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Sat, Jan 11, 2014 at 2:47 AM, Peter Eisentraut pete...@gmx.net wrote: On Sat, 2014-01-11 at 00:48 -0200, Fabrízio de Royes Mello wrote: Now, if bdr is installed but the validation doesn't happen unless bdr is loaded in some sense, then that is an implementation deficiency that I think we can insist be rectified before this feature is accepted. Check if extension is already installed is not enough for the first version of this feature? Elsewhere it was argued that tying this to extensions is not appropriate. I agree. It depends on how this feature is supposed to be used exactly. A replication plugin might very well be loaded via session_preload_libraries and not appear in SQL at all. In that case you need some C-level hook. In another case, an extension might want to inspect relation options from user-space triggers. So you'd need to register some SQL-level function for option validation. This could end up being two separate but overlapping features. Hi all, I taken this weekend to work on this patch and on monday or tuesday I'll send it. But I have some doubts: 1) I'm not convinced to tying this to extensions. I think this feature must enable us to just store a custom GUC. We can set custom GUCs in a backend session using SET class.variable = value, and this feature could just enable us to store it for relations/attributes. Without the complexity and overhead to register a function to validate them. That way we can use this feature to extensions and other needs too. 2) If we're implement the Robert's idea to have a function to validate the extension options then we must think about how a extension developer will register this function. Beacuse when we install a extension must have one way to get de pg_proc OID and store it in the pg_extension (or a different catalog). 
Or we'll implement some way to register this function at the SQL level, like ALTER EXTENSION bdr SET VALIDATE FUNCTION bdr_options_validate(); or another sintax of course. I don't know if you guys understood my concerns!! :-) Comments? Hi all, The attached patch implements the first option that I suggested before. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello diff --git a/doc/src/sgml/ref/alter_index.sgml b/doc/src/sgml/ref/alter_index.sgml index d210077..5e9ee9d 100644 --- a/doc/src/sgml/ref/alter_index.sgml +++ b/doc/src/sgml/ref/alter_index.sgml @@ -82,6 +82,14 @@ ALTER INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable RESE xref linkend=SQL-REINDEX to get the desired effects. /para + note + para + A custom name can be used as namespace to define a storage parameter. + Storage option pattern: namespace.option=value + (namespace=custom name, option=option name and value=option value). + See example bellow. + /para + /note /listitem /varlistentry @@ -202,6 +210,17 @@ ALTER INDEX distributors SET (fillfactor = 75); REINDEX INDEX distributors; /programlisting/para + para + To set a custom storage parameter: +programlisting +ALTER INDEX distributors + SET (bdr.do_replicate=true); +/programlisting + (bdr=custom name, do_replicate=option name and + true=option value) +/para + + /refsect1 refsect1 diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 89649a2..6fd9d67 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -213,6 +213,16 @@ ALTER TABLE [ IF EXISTS ] replaceable class=PARAMETERname/replaceable of statistics by the productnamePostgreSQL/productname query planner, refer to xref linkend=planner-stats. 
/para + + note + para +A custom name can be used as namespace to define a storage parameter. +Storage option pattern: namespace.option=value +(namespace=custom name, option=option name and value=option value). +See example bellow. + /para + /note + /listitem /varlistentry @@ -476,6 +486,10 @@ ALTER TABLE [ IF EXISTS ] replaceable class=PARAMETERname/replaceable commandALTER TABLE/ does not treat literalOIDS/ as a storage parameter. Instead use the literalSET WITH OIDS/ and literalSET WITHOUT OIDS/ forms to change OID status. + A custom name can be used as namespace to define a storage parameter. + Storage option pattern: namespace.option=value + (namespace=custom name, option=option name and value=option value). + See example bellow. /para /note /listitem @@ -1112,6 +1126,26 @@ ALTER TABLE distributors DROP CONSTRAINT
Re: [HACKERS] [bug fix] psql \copy doesn't end if backend is killed
MauMau maumau...@gmail.com writes: If the backend is terminated with SIGKILL while psql is running \copy table_name from file_name, the \copy never ends. I expected \copy to be cancelled because the corresponding server process vanished. I just noticed this CF entry pertaining to the same problem that Stephen Frost reported a couple days ago: http://www.postgresql.org/message-id/20140211205336.gu2...@tamriel.snowman.net I believe it's been adequately fixed as of commits fa4440f516 and b8f00a46bc, but if you'd test that those handle your problem cases, I'd appreciate it. [Fix] If the message transmission fails in PQputCopyEnd(), switch conn->asyncStatus back to PGASYNC_BUSY. This patch seems inappropriate to me, because it will allow libpq to exit the COPY IN state whether or not it still has a live connection. If it does, the backend will be in an inconsistent state and we'll have a mess. regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
(2014/02/13 9:51), Hiroshi Inoue wrote: (2014/02/12 12:28), Inoue, Hiroshi wrote: (2014/02/12 8:30), Tom Lane wrote: I wrote: Hiroshi Inoue in...@tpf.co.jp writes: I tried MINGW port with the attached change and successfully built src and contrib and all pararell regression tests were OK. I cleaned this up a bit (the if-nesting in Makefile.shlib was making my head hurt, not to mention that it left a bunch of dead code) and committed it. Hm ... according to buildfarm member narwhal, this doesn't work so well for plperl: gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -Wno-comment -shared -o plperl.dll plperl.o SPI.o Util.o -L../../../src/port -L../../../src/common -Wl,--allow-multiple-definition -L/mingw/lib -Wl,--as-needed -LC:/Perl/lib/CORE -lperl58 -L../../../src/backend -lpostgres -lpgcommon -lpgport -lintl -lxslt -lxml2 -lssleay32 -leay32 -lz -lm -lws2_32 -lshfolder -Wl,--export-all-symbols -Wl,--out-implib=libplperl.a Cannot export .idata$4: symbol not found Cannot export .idata$5: symbol not found Cannot export .idata$6: symbol not found Cannot export .text: symbol not found Cannot export perl58_NULL_THUNK_DATA: symbol not found Creating library file: libplperl.a collect2: ld returned 1 exit status make[3]: *** [plperl.dll] Error 1 Oops I forgot to inclule plperl, tcl or python, sorry. I would retry build operations with them. Unfortunately it would take pretty long time because build operations are pretty (or veeery in an old machine) slow. Not very clear what's going on there; could this be a problem in narwhal's admittedly-ancient toolchain? As for build, plperl and pltcl are OK on both Windows7+gcc4.6.1 machine and Windows Vista+gcc3.4.5 machine. plpython is OK on gcc4.6.1 machine but causes a *initializer element is not constant* error on gcc3.4.5 machine. I've not run regression test yet. 
Rebuild with --disable-auto-import causes errors in contrib on both machines. Errors occur in pg_buffercache, pg_stat_statements, postgres_fdw and test_shm_mq. --enable-auto-import cures all of them. regards, Hiroshi Inoue
Re: [HACKERS] narwhal and PGDLLIMPORT
Hiroshi Inoue in...@tpf.co.jp writes: Rebuild with --disable-auto-import causes errors in contrib on both machines. Errors occur in pg_buffercache, pg_stat_statements, postgres_fdw and test_shm_mq. Yeah, that's the idea: we want to get the same failures as on MSVC. I'm going to put in PGDLLIMPORT macros to fix these cases, but I'm waiting for verification that Cygwin also sees the problem before making it go away. brolga unfortunately seems to have been AWOL for the past day and a half. Andrew, could you give it a kick? regards, tom lane
[HACKERS] HBA files w/include support?
Greetings Hackers. I'm aware of how a pg_hba.conf file can refer to other files for including @lists of users, etc. But there is currently no support for being able to pull in entire file segments as can be done for postgresql.conf via the include directive. In the environment that I'm managing, we are using a makefile to stick together a common header with a custom section for any of several clusters and may extend this further to permit additional includes for hba rules common to groupings of clusters. Anyway, please advise. I don't recall hearing anything like this discussed. Has it been proposed, discussed and voted down? Or never mentioned? Thx -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consult...@comcast.net p: 312.241.7800
Re: [HACKERS] [SQL] Comparison semantics of CHAR data type
On Wed, Oct 16, 2013 at 02:17:11PM -0400, Bruce Momjian wrote: You can see the UTF8 case is fine because \n is considered greater than space, but in the C locale, where \n is less than space, the false return value shows the problem with internal_bpchar_pattern_compare() trimming the string and first comparing on lengths. This is exactly the problem you outline, where space trimming assumes everything is less than a space. For collations other than C, some of those issues that have to do with string comparisons might simply be hidden, depending on how strcoll() handles inputs of different lengths: If strcoll() applies implicit space padding to the shorter value, there won't be any visible difference in ordering between bpchar and varchar values. If strcoll() does not apply such space padding, the right-trimming of bpchar values causes very similar issues even in an en_US collation. I have added the attached C comment to explain the problem, and added a TODO item to fix it if we ever break binary upgrading. Does anyone think this warrants a doc mention? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +

diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
new file mode 100644
index 502ca44..284b5d1
*** a/src/backend/utils/adt/varchar.c
--- b/src/backend/utils/adt/varchar.c
*************** bpcharcmp(PG_FUNCTION_ARGS)
*** 846,851 ****
--- 846,863 ----
  				len2;
  	int			cmp;

+ 	/*
+ 	 * Trimming trailing spaces off of both strings can cause a string
+ 	 * with a character less than a space to compare greater than a
+ 	 * space-extended string, e.g. this returns false:
+ 	 *		SELECT E'ab\n'::CHAR(10) < E'ab '::CHAR(10);
+ 	 * even though '\n' is less than the space if CHAR(10) was
+ 	 * space-extended.  The correct solution would be to trim only
+ 	 * the longer string to be the same length as the shorter, if
+ 	 * possible, then do the comparison.  However, changing this
+ 	 * might break existing indexes, breaking binary upgrades.
+ 	 * For details, see http://www.postgresql.org/message-id/CAK+WP1xdmyswEehMuetNztM4H199Z1w9KWRHVMKzyyFM+hV=z...@mail.gmail.com
+ 	 */
  	len1 = bcTruelen(arg1);
  	len2 = bcTruelen(arg2);
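The disagreement between trim-then-compare and pad-then-compare can be reproduced outside the server. A small Python sketch (illustrative only; plain code-point comparison, which for these ASCII strings matches a C-locale byte comparison):

```python
def lt_trimmed(a: str, b: str) -> bool:
    """'<' the way bpcharcmp() effectively behaves: trailing spaces
    are stripped before comparing, and on a prefix tie the shorter
    string sorts first."""
    return a.rstrip(' ') < b.rstrip(' ')

def lt_padded(a: str, b: str, n: int) -> bool:
    """'<' as if both values were space-padded to the declared
    CHAR(n) width (the SQL-standard PAD SPACE semantics)."""
    return a.ljust(n) < b.ljust(n)

a, b = 'ab\n', 'ab '
print(lt_trimmed(a, b))     # False: trimming compares 'ab\n' vs 'ab'
print(lt_padded(a, b, 10))  # True: '\n' (0x0A) sorts before ' ' (0x20)
```

The trimmed form effectively treats every character as greater than a trailing space, which is exactly the anomaly the C comment documents.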
Re: [HACKERS] HBA files w/include support?
On Thu, Feb 13, 2014 at 08:24:27PM -0600, Jerry Sievers wrote: Greetings Hackers. I'm aware of how a pg_hba.conf file can refer to other files for including @lists of users, etc. But there is currently no support for being able to pull in entire file segments as can be done for postgresql.conf via the include directive. In the environment that I'm managing, we are using a makefile to stick together a common header with a custom section for any of several clusters and may extend this further to permit additional includes for hba rules common to groupings of clusters. Anyway, please advise. I don't recall hearing anything like this discussed. Has it been proposed, discussed and voted down? Or never mentioned? I have never heard of anyone request this. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Changeset Extraction v7.0 (was logical changeset generation)
On Sun, 2014-01-19 at 15:31 +0100, Stefan Kaltenbrunner wrote: /* followings are for client encoding only */ PG_SJIS,/* Shift JIS (Winindows-932) */ while you have that file open: s/Winindows-932/Windows-932 maybe? done
Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows
On Tue, Jan 7, 2014 at 12:33:33PM +0530, Amit Kapila wrote: On Thu, Oct 31, 2013 at 8:58 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Oct 16, 2013 at 1:44 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, that sucks. So it's a Windows bug. I agree we'll probably want to work around it in the end, but I still think it should be put to Microsoft PSS if we can. The usual - have we actually produced a self-contained example that does just this (and doesn't include the full postgres support) and submitted it to *microsoft* for comments? I have written a self contained win32 console application with which the issue can be reproduced. The application project is attached with this mail. Logged a support ticket with Microsoft, they could reproduce the issue with the sample application (it is same what I had posted on hackers in this thread) and working on it. Further update on this issue: Microsoft has suggested a workaround for stat API. Their suggestion is to use 'GetFileAttributesEx' instead of stat, when I tried their suggestion, it also gives me same problem as stat. Still they have not told anything about other API's (rmdir, RemoveDirectory) which has same problem. Where are we on this? Is there a check we should add in our code? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] HBA files w/include support?
Bruce Momjian wrote: On Thu, Feb 13, 2014 at 08:24:27PM -0600, Jerry Sievers wrote: Greetings Hackers. I'm aware of how a pg_hba.conf file can refer to other files for including @lists of users, etc. But there is currently no support for being able to pull in entire file segments as can be done for postgresql.conf via the include directive. I have never heard of anyone request this. On the contrary, I have vague memories that this has been discussed and agreed upon; we have just never implemented it. One issue with this is that pg_hba.conf is order sensitive, which could become a trap for the unwary if includes are used carelessly. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
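The ordering trap is easy to illustrate: pg_hba.conf is first-match-wins, so the same included fragment behaves differently depending on where it is spliced in. Using a hypothetical include directive (this syntax is invented, not implemented):

```
# pg_hba.conf -- first match wins

# Placed after a blanket reject, the fragment's rules for 10/8 never match:
host  all  all  10.0.0.0/8  reject
@include common_rules.conf    # hypothetical syntax

# Placed before it, the fragment's rules take precedence:
@include common_rules.conf    # hypothetical syntax
host  all  all  10.0.0.0/8  reject
```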
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
I hadn't noticed this thread. I will give this a look. Thanks for providing a patch. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] HBA files w/include support?
Alvaro Herrera alvhe...@2ndquadrant.com writes: Bruce Momjian wrote: On Thu, Feb 13, 2014 at 08:24:27PM -0600, Jerry Sievers wrote: Greetings Hackers. I'm aware of how a pg_hba.conf file can refer to other files for including @lists of users, etc. But there is currently no support for being able to pull in entire file segments as can be done for postgresql.conf via the include directive. I have never heard of anyone request this. On the contrary, I have vague memories that this has been discussed and agreed upon; we have just never implemented it. Interesting; it'd surprise me if no one has ever wanted the feature. One issue with this is that pg_hba.conf is order sensitive, which could become a trap for the unwary if includes are used carelessly. Indeed. The other thing that comes to mind is that, as opposed to postgresql.conf and the include scenario there, one can do show all or query pg_stat_activity just to see what settings they ended up with. I'm not aware of any way to probe what hba rules are loaded at runtime, and thus debugging hba config changes is not really possible. I presume that a simple scenario involving just one level of includes is not too difficult to grok, but nested includes sure might be a foot gun unless there was a way to dump the resulting configs somehow. Thus pasting hba files together externally is a more reliable approach. Thanks -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consult...@comcast.net p: 312.241.7800
Re: [HACKERS] issue with gininsert under very high load
On 14/02/14 00:49, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-13 16:15:42 -0500, Tom Lane wrote: Something like the attached? Can somebody who's seen this problem confirm this improves matters? Hm. Won't that possibly lead to the fast tuple list growing unboundedly? I think we would at least need to stop using the fast tuple mechanism during gininsert() if it's already too big and do plain inserts. No, because we've already got a process working on cleaning it out. In any case, this needs some testing to see if it's an improvement or not. I have some real-world experience with the fastupdate mechanism. Under concurrent load it behaves really badly. Random processes waiting for cleanup (or competing with cleanup) are going to see latency spikes because they magically hit that corner, so reverting to plain inserts when a process cannot add to the pending list will not remove this problem, but will make it hit only the process actually doing the cleanup. The built-in mechanism, where cleanup is a cost paid by the process that happened to fill the pending list, is really hard to deal with in production. More control is appreciated, perhaps even an explicit flush mechanism. I'd like to batch up inserts during one transaction only and flush on commit. -- Jesper (with fastupdate turned off due to the above issues)