Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
Hello, Please find the new version of the patch attached to this mail containing the above changes. This patch applies cleanly on current HEAD and the build completed successfully on both Windows and Linux. (but master needed to be rewound to some time ago because of some compile errors.) This works correctly as before. Finally, before sending it to committers, would you mind changing the name of the segment from Global/PostgreSQL.%u to Global/PostgreSQL.dsm.%u or something similar, as mentioned in my last email? However, it is a bit different thing from this patch so I have no intention of compelling the change. The orphan section handles on postmaster have become a matter of documentation. I had explained this in the function header of dsm_keep_segment(). The comment satisfies me. Thank you. I had added a new function in dsm_impl.c for platform-specific handling and explained the new behaviour in its function header. This seems quite clear to me. - Global/PostgreSQL.%u is the same literal constant as the one that occurs in dsm_impl_windows. It should be defined as a constant (or a macro). Changed to a #define. Thank you. - dsm_impl_windows uses errcode_for_dynamic_shared_memory() for ereport and it finally falls down to errcode_for_file_access(). I think it is preferable, maybe Changed as per suggestion. I saw it done. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
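For readers following along, the #define change discussed above amounts to something like the following sketch. The constant and helper names here are illustrative stand-ins, not the patch's actual identifiers:

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/*
 * Hypothetical sketch: the Windows segment-name template pulled into one
 * #define so dsm_impl_windows() and any future callers stay consistent.
 * The real macro name in the patch may differ.
 */
#define PG_DYNSHMEM_WINDOWS_NAME_TEMPLATE "Global/PostgreSQL.%u"

/* Format the name for a given dsm handle into buf. */
static void
make_segment_name(char *buf, size_t buflen, unsigned int handle)
{
	snprintf(buf, buflen, PG_DYNSHMEM_WINDOWS_NAME_TEMPLATE, handle);
}
```

With a single template in one place, renaming the segments later (e.g. to Global/PostgreSQL.dsm.%u, as proposed) becomes a one-line change.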
Re: [HACKERS] Patch: compiling the docs under Gentoo
Hi, On Tuesday 11 February 2014 16:04:30 Peter Eisentraut wrote: On 1/30/14, 2:42 AM, Christian Kruse wrote: +Since Gentoo often supports different versions of a package to be +installed you have to tell the PostgreSQL build environment where the +Docbook DTD is located: +<programlisting> +cd /path/to/postgresql/sources/doc +make DOCBOOKSTYLE=/usr/share/sgml/docbook/sgml-dtd-4.2 +</programlisting> This is wrong. To be honest, I noticed a few days ago that this is unnecessary. Just installing the right packages already solved the problem; it was a fallacy that setting DOCBOOKSTYLE helped. I just haven't had the time to send a new version of the patch yet… Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
Hi Heikki, I need a PostgreSQL 9.3 release that fixes this problem. It didn't happen in PostgreSQL 9.2, so I agree with your proposal to make the changes against REL9_3_STABLE and master. Can you fix this in the next release (9.3.3)? Tomonari Katsumata (2014/01/13 20:16), Heikki Linnakangas wrote: On 01/09/2014 10:55 PM, Josh Berkus wrote: On 01/09/2014 12:05 PM, Heikki Linnakangas wrote: Actually, why is the partially-filled 00010002 file archived in the first place? Looking at the code, it's been like that forever, but it seems like a bad idea. If the original server is still up and running, and writing more data to that file, what will happen is that when the original server later tries to archive it, it will fail because the partial version of the file is already in the archive. Or worse, the partial version overwrites a previously archived more complete version. Oh! This explains some transient errors I've seen. Wouldn't it be better to not archive the old segment, and instead switch to a new segment after writing the end-of-recovery checkpoint, so that the segment on the new timeline is archived sooner? It would be better to zero-fill and switch segments, yes. We should NEVER be in a position of archiving two different versions of the same segment. Ok, I think we're in agreement that that's the way to go for master. Now, what to do about back-branches? On one hand, I'd like to apply the same fix to all stable branches, as the current behavior is silly and always has been. On the other hand, we haven't heard any complaints about it, so we probably shouldn't fix what ain't broken. Perhaps we should apply it to 9.3, as that's where we have the acute problem the OP reported. Thoughts? In summary, I propose that we change master and REL9_3_STABLE to not archive the partial segment from previous timeline. Older branches will keep the current behavior. 
- Heikki
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
On Wed, Feb 12, 2014 at 2:02 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, Please find new version of patch attached with this mail containing above changes. This patch applies cleanly on current HEAD and build completed successfully on both Windows and Linux. (but master needed to be rewound to some time ago for some compile errors.) This works correctly as before. Finally, before sending to committers, would you mind changing the name of the segment from Global/PostgreSQL.%u to Global/PostgreSQL.dsm.%u or something mentioned in my last email? Actually in that case, to maintain consistency we would need to change even function dsm_impl_posix(), which uses the segment name /PostgreSQL.%u. For Windows, we have added Global to /PostgreSQL.%u, as it is mandated by the Windows spec. To be honest, I see no harm in changing the name as per your suggestion, as it can improve segment naming for dynamic shared memory segments; however, there is no clear problem with the current name either, so I don't want to make changes in places this patch has no relation to. I think the best thing to do here is to put it as Notes To Committer, something like: Some suggestions for Committer to consider: Change the name of dsm segments from .. to .. In general, what I see is that they consider all discussion in a thread, but putting a special note like the above reduces the chance of it getting overlooked. I have done this as a reviewer previously and it worked well. However, it is a bit different thing from this patch so I have no intention to compel to do the changing. Thanks for understanding my position. Thanks for reviewing the patch so carefully, especially the Windows part, which I think was a bit tricky for you to set up. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication
On 2014-02-11 09:15:45 -0500, Robert Haas wrote: If I understand correctly, modifying PgBackendStatus adds additional fields to the shared memory data structure that are never used and will be returned by functions like pgstat_fetch_stat_beentry() uninitialized. That seems both inefficient and a pitfall for the unwary. I don't think they will be uninitialized; pgstat_fetch_stat_beentry() will do a pgstat_read_current_status() if necessary, which will initialize them. But they do take up shared memory without really needing to. I personally don't find that too bad, it's not much memory. If we want to avoid it we could have a LocalPgBackendStatus that includes the normal PgBackendStatus. Since pgstat_read_current_status() copies all the data locally, that'd be a sensible point to fill it. While that will cause a bit of churn, I'd guess we can use the infrastructure in the not too far away future for other parts. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
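A sketch of the LocalPgBackendStatus idea Andres describes: a backend-local wrapper struct that embeds the shared-memory struct and adds the extra fields only on the local copy, so they cost no shared memory. The struct layout and field names here are assumptions for illustration, not a committed design:

```c
#include <string.h>
#include <assert.h>

typedef unsigned int TransactionId;

/* Shared-memory per-backend slot: keep this as small as possible. */
typedef struct PgBackendStatus
{
	int			st_procpid;
	/* ... existing fields ... */
} PgBackendStatus;

/*
 * Backend-local copy produced when reading current status.  The extra
 * fields live only here, so they take up no shared memory and are always
 * filled in before anyone can see them.
 */
typedef struct LocalPgBackendStatus
{
	PgBackendStatus backendStatus;	/* copied from shared memory */
	TransactionId backend_xid;		/* filled in during the local copy */
	TransactionId backend_xmin;
} LocalPgBackendStatus;

/* Fill a local entry from a shared slot plus per-backend xid info. */
static void
fill_local_entry(LocalPgBackendStatus *local,
				 const PgBackendStatus *shared,
				 TransactionId xid, TransactionId xmin)
{
	memcpy(&local->backendStatus, shared, sizeof(PgBackendStatus));
	local->backend_xid = xid;
	local->backend_xmin = xmin;
}
```

Since pgstat_read_current_status() already copies every shared slot into backend-local memory, filling the extra fields at that point adds no additional locking.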
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
Re: Heikki Linnakangas 2014-01-13 52d3caff.3010...@vmware.com Actually, why is the partially-filled 00010002 file archived in the first place? Looking at the code, it's been like that forever, but it seems like a bad idea. If the original server is still up and running, and writing more data to that file, what will happen is that when the original server later tries to archive it, it will fail because the partial version of the file is already in the archive. Or worse, the partial version overwrites a previously archived more complete version. Oh! This explains some transient errors I've seen. Wouldn't it be better to not archive the old segment, and instead switch to a new segment after writing the end-of-recovery checkpoint, so that the segment on the new timeline is archived sooner? It would be better to zero-fill and switch segments, yes. We should NEVER be in a position of archiving two different versions of the same segment. Ok, I think we're in agreement that that's the way to go for master. Now, what to do about back-branches? On one hand, I'd like to apply the same fix to all stable branches, as the current behavior is silly and always has been. On the other hand, we haven't heard any complaints about it, so we probably shouldn't fix what ain't broken. Perhaps we should apply it to 9.3, as that's where we have the acute problem the OP reported. Thoughts? In summary, I propose that we change master and REL9_3_STABLE to not archive the partial segment from previous timeline. Older branches will keep the current behavior. I've seen the "can't archive file from the old timeline" problem on 9.2 and 9.3 slaves after promotion. The problem is in conjunction with the proposed archive_command in the default postgresql.conf comments: # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' With 9.1, it works, but 9.2 and 9.3 don't archive anything until I remove the test ! -f part. 
(An alternative fix would be to declare the behavior ok and adjust that example in the config.) I've always blamed 9.2+'s cascading replication for this, but haven't investigated deeper. Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
From: Andres Freund and...@2ndquadrant.com It's x86, right? Then it's unlikely to be actual unordered memory accesses, but if the compiler reordered:

LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);

to

LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
proc->lwWaiting = false;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
PGSemaphoreUnlock(&proc->sem);

which it is permitted to do, yes, that could cause symptoms like you describe. Yes, the hang occurred with 64-bit PostgreSQL 9.2.4 running on RHEL6 for x86_64. The PostgreSQL was built with GCC. Any chance you have the binaries the customer ran back then around? Disassembling that piece of code might give you a hint whether that's a possible cause. I'm sorry I can't provide the module, but I attached the disassembled code for LWLockRelease and LWLockAcquire in the executable. I'm not sure whether this proves something. FYI, the following stack traces are the ones obtained during two instances of hang. 
#0  0x0036102eaf77 in semop () from /lib64/libc.so.6
#1  0x00614707 in PGSemaphoreLock ()
#2  0x00659d5b in LWLockAcquire ()
#3  0x0047983d in RelationGetBufferForTuple ()
#4  0x00477f86 in heap_insert ()
#5  0x005a4a12 in ExecModifyTable ()
#6  0x0058d928 in ExecProcNode ()
#7  0x0058c762 in standard_ExecutorRun ()
#8  0x7f0cb37f99cb in pgss_ExecutorRun () from /opt/symfoserver64/lib/pg_stat_statements.so
#9  0x7f0cb357f545 in explain_ExecutorRun () from /opt/symfoserver64/lib/auto_explain.so
#10 0x0066a59e in ProcessQuery ()
#11 0x0066a7ef in PortalRunMulti ()
#12 0x0066afd2 in PortalRun ()
#13 0x00666fcb in exec_simple_query ()
#14 0x00668058 in PostgresMain ()
#15 0x00622ef1 in PostmasterMain ()
#16 0x005c0723 in main ()

#0  0x0036102eaf77 in semop () from /lib64/libc.so.6
#1  0x00614707 in PGSemaphoreLock ()
#2  0x00659d5b in LWLockAcquire ()
#3  0x0047983d in RelationGetBufferForTuple ()
#4  0x00477f86 in heap_insert ()
#5  0x005a4a12 in ExecModifyTable ()
#6  0x0058d928 in ExecProcNode ()
#7  0x0058c762 in standard_ExecutorRun ()
#8  0x7f0cb37f99cb in pgss_ExecutorRun () from /opt/symfoserver64/lib/pg_stat_statements.so
#9  0x7f0cb357f545 in explain_ExecutorRun () from /opt/symfoserver64/lib/auto_explain.so
#10 0x0066a59e in ProcessQuery ()
#11 0x0066a7ef in PortalRunMulti ()
#12 0x0066afd2 in PortalRun ()
#13 0x00666fcb in exec_simple_query ()
#14 0x00668058 in PostgresMain ()
#15 0x00622ef1 in PostmasterMain ()
#16 0x005c0723 in main ()

#0  0x0036102eaf77 in semop () from /lib64/libc.so.6
#1  0x00614707 in PGSemaphoreLock ()
#2  0x00659d5b in LWLockAcquire ()
#3  0x0064bb8c in ProcArrayEndTransaction ()
#4  0x00491216 in CommitTransaction ()
#5  0x004925a5 in CommitTransactionCommand ()
#6  0x00664cf7 in finish_xact_command ()
#7  0x00667145 in exec_simple_query ()
#8  0x00668058 in PostgresMain ()
#9  0x00622ef1 in PostmasterMain ()
#10 0x005c0723 in main ()

Regards MauMau

Dump of assembler code for function LWLockRelease:
0x00647d40 <LWLockRelease+0>:   push   %r12
0x00647d42 <LWLockRelease+2>:   mov    %edi,%r12d
0x00647d45 <LWLockRelease+5>:   shl    $0x5,%r12
0x00647d49 <LWLockRelease+9>:   add    5192344(%rip),%r12        # 0xb3b7e8 <LWLockArray>
0x00647d50 <LWLockRelease+16>:  push   %rbp
0x00647d51 <LWLockRelease+17>:  mov    %edi,%ebp
0x00647d53 <LWLockRelease+19>:  push   %rbx
0x00647d54 <LWLockRelease+20>:  mov    5192326(%rip),%ebx        # 0xb3b7e0 <num_held_lwlocks>
0x00647d5a <LWLockRelease+26>:  nopw   0x0(%rax,%rax,1)
0x00647d60 <LWLockRelease+32>:  sub    $0x1,%ebx
0x00647d63 <LWLockRelease+35>:  js     0x647ea4 <LWLockRelease+356>
0x00647d69 <LWLockRelease+41>:  movslq %ebx,%rax
0x00647d6c <LWLockRelease+44>:  cmp    %ebp,0xb3b800(,%rax,4)
0x00647d73 <LWLockRelease+51>:  jne    0x647d60 <LWLockRelease+32>
0x00647d75 <LWLockRelease+53>:  mov    5192293(%rip),%esi        # 0xb3b7e0 <num_held_lwlocks>
0x00647d7b <LWLockRelease+59>:  sub    $0x1,%esi
0x00647d7e <LWLockRelease+62>:  cmp    %ebx,%esi
0x00647d80 <LWLockRelease+64>:  mov    %esi,5192282(%rip)        # 0xb3b7e0 <num_held_lwlocks>
0x00647d86 <LWLockRelease+70>:  jg     0x647d92 <LWLockRelease+82>
0x00647d88 <LWLockRelease+72>:  jmp    0x647dad <LWLockRelease+109>
0x00647d8a <LWLockRelease+74>:  nopw   0x0(%rax,%rax,1)
0x00647d90 <LWLockRelease+80>:  mov    %ecx,%ebx
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On 2014-02-12 20:55:32 +0900, MauMau wrote: Dump of assembler code for function LWLockRelease: Could you try whether you get more readable dumps by using disassemble/m? That might at least print line numbers if you have debug info installed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery
From: Andres Freund and...@2ndquadrant.com On 2014-02-02 23:50:40 +0900, Fujii Masao wrote: Right. If standby_mode is enabled, checkpoint_segments can trigger the restartpoint. But the problem is that the timing of restartpoint depends on not only the checkpoint parameters (i.e., checkpoint_timeout and checkpoint_segments) that are used during archive recovery but also the checkpoint WAL that was generated by the master. Sure. But we really *need* all the WAL since the last checkpoint's redo location locally to be safe. For example, consider the case where the master generated only one checkpoint WAL record since the last backup and it crashed with database corruption. Then the DBA decided to perform normal archive recovery by using the last backup. In this case, even if the DBA reduces both checkpoint_timeout and checkpoint_segments, only one restartpoint can occur during recovery. This low frequency of restartpoints might fill up the disk space with lots of WAL files. I am not sure I understand the point of this scenario. If the primary crashed after a checkpoint, there won't be that much WAL since it happened... If the issue is that you're not using standby_mode (if so, why?), then the fix maybe is to make that apply to a wider range of situations. I guess that he is not using standby_mode because, according to his first email in this thread, he said he would like to prevent WAL from accumulating in pg_xlog during normal archive recovery (i.e., PITR). Well, that doesn't necessarily prevent you from using standby_mode... But yes, that might be the case. I wonder if we shouldn't just always look at checkpoint_segments during !crash recovery. Maybe we could consider that direction, but there is a problem. Archive recovery slows down compared to 9.1, because of repeated restartpoints. Archive recovery should be as fast as possible, because it typically applies dozens or hundreds of WAL files, and the DBA desires immediate resumption of operation. 
So, I think we should restore the 9.1 behavior for archive recovery. The attached patch keeps restored archived WAL in pg_xlog/ only during standby recovery. It is based on Fujii-san's revision of the patch, with the AllowCascadeReplication() condition removed from two if statements. Regards MauMau
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
From: Andres Freund and...@2ndquadrant.com Could you try whether you get more readable dumps by using disassemble/m? That might at least print line numbers if you have debug info installed. OK, I'll try that tomorrow. However, the debug info is not available, because they use a PostgreSQL they built themselves, not the community RPM nor EnterpriseDB's installer. Regards MauMau
Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery
On 2014-02-12 21:23:54 +0900, MauMau wrote: Maybe we could consider that direction, but there is a problem. Archive recovery slows down compared to 9.1, because of repeated restartpoints. Archive recovery should be as fast as possible, because it typically applies dozens or hundreds of WAL files, and the DBA desires immediate resumption of operation. So, I think we should restore the 9.1 behavior for archive recovery. It's easy to be fast if it's not correct. I don't see how that behaviour is acceptable; imo the previous behaviour simply was a bug whose fix was too invasive to backpatch. I don't think you can convince me (but maybe others) that the old behaviour is acceptable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
So I think I've come up with a scenario that could cause this. I don't think it's exactly what happened here but maybe something analogous happened with our base backup restore. On the primary you extend a table a bunch, including adding new segments, but crash before committing (or checkpointing). Then some of the blocks but not all may be written to disk. Assume they're all written except for the last block of the first file. So what you have is a .999G file followed by, say, 9 1G files. (Or maybe the hot backup process could just catch the files in this state if a table is rapidly growing and it doesn't take care to avoid picking up new files that appear after it starts?) smgrnblocks() stops at the first 1GB segment and ignores the rest. This code in xlog uses it to calculate how many blocks to add but it only calls it once and then doesn't recheck where it's at as it extends the relation. As soon as it adds that one missing block the remaining files become visible. P_NEW always recalculates the position based on smgrnblocks each time (which sounds pretty inefficient, but anyway) so it will add the requested blocks to the new end. Now this isn't enough to explain things, since surely the extension records would be in the xlog in physical order. But this could have all happened after an earlier vacuum truncated the relation, and we could be replaying records that predate that. So in short, if you have a 10G table and want to overwrite the last block but the first segment is one block short, then xlog will add 9G to the end and write the block there. That sounds like what we've seen. I think the easy fix is to change the code in xlogutils to be more defensive and stop as soon as it finds BufferGetBlockNumber(buffer) == blkno (which is what it has in the assert already). -- greg
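The arithmetic behind this scenario: with the default 8kB blocks and 1GB segments, each segment holds 131072 blocks, so a first segment that is even one block short hides every later segment from smgrnblocks(). A minimal illustration (constants from the stock build options; the helper is a sketch of the reporting behaviour described above, not the actual smgr code):

```c
#include <assert.h>

#define BLCKSZ 8192
#define RELSEG_SIZE (1024 * 1024 * 1024 / BLCKSZ)	/* blocks per 1GB segment */

/*
 * Models how smgrnblocks() scans segment files: it stops at the first
 * segment shorter than RELSEG_SIZE.  So a 10GB relation whose first
 * segment is one block short reports only RELSEG_SIZE - 1 blocks,
 * hiding the other nine segments entirely.
 */
static unsigned
visible_nblocks(unsigned first_seg_blocks, unsigned later_full_segments)
{
	if (first_seg_blocks < RELSEG_SIZE)
		return first_seg_blocks;	/* later segments are invisible */
	return RELSEG_SIZE + later_full_segments * RELSEG_SIZE;
}
```

Once replay writes that one missing block, the relation suddenly "grows" by the hidden segments, which is why the extension loop must recheck its position rather than trust the initial count.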
[HACKERS] Could you include bug fix for Windows ASLR issue in the next release?
Could you include this simple patch in the upcoming minor releases 9.3.3/9.2.7/9.1.12? This is very scary. This problem could cause much random trouble among Windows users. https://commitfest.postgresql.org/action/patch_view?id=1409 Regards MauMau
Re: [HACKERS] DATE type output does not follow datestyle parameter
From: Bruce Momjian br...@momjian.us On Mon, Dec 2, 2013 at 10:22:47PM +0900, MauMau wrote: So, my suggestion is to just add the following sentence right after the above one. The Postgres style is an exception: the output of the date type is either MM-DD-YYYY (e.g. 12-17-1997) or DD-MM-YYYY (e.g. 17-12-1997), which is different from the date part of the output of the timestamp type. Could you consider and add this to the manual? Yes, I will make the change unless someone objects. Could you commit this if you feel it is okay? I'm sorry if I missed the modified article in the devel docs. Regards MauMau
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On Feb 12, 2014, at 12:55, MauMau maumau...@gmail.com wrote: From: Andres Freund and...@2ndquadrant.com It's x86, right? Then it's unlikely to be actual unordered memory accesses, but if the compiler reordered:

LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);

to

LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
proc->lwWaiting = false;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
PGSemaphoreUnlock(&proc->sem);

which it is permitted to do, yes, that could cause symptoms like you describe. Yes, the hang occurred with 64-bit PostgreSQL 9.2.4 running on RHEL6 for x86_64. The PostgreSQL was built with GCC. The relevant part of the disassembled binary you attached seems to be

Dump of assembler code for function LWLockRelease:
...
0x00647f47 <LWLockRelease+519>: lea    0x10(%rcx),%rdi
0x00647f4b <LWLockRelease+523>: movq   $0x0,0x48(%rcx)
0x00647f53 <LWLockRelease+531>: movb   $0x0,0x41(%rcx)
0x00647f57 <LWLockRelease+535>: callq  0x606210 <PGSemaphoreUnlock>

I haven't checked the offsets, but since lwWaitLink is an 8-byte quantity and lwWaiting a single-byte quantity, it's pretty much certain that the first store updates lwWaitLink and the second lwWaiting. Thus, no reordering seems to have taken place here... best regards, Florian Pflug
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On 02/12/2014 05:42 PM, Florian Pflug wrote: On Feb 12, 2014, at 12:55, MauMau maumau...@gmail.com wrote: From: Andres Freund and...@2ndquadrant.com It's x86, right? Then it's unlikely to be actual unordered memory accesses, but if the compiler reordered:

LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);

to

LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
proc->lwWaiting = false;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
PGSemaphoreUnlock(&proc->sem);

which it is permitted to do, yes, that could cause symptoms like you describe. Yes, the hang occurred with 64-bit PostgreSQL 9.2.4 running on RHEL6 for x86_64. The PostgreSQL was built with GCC. The relevant part of the disassembled binary you attached seems to be

Dump of assembler code for function LWLockRelease:
...
0x00647f47 <LWLockRelease+519>: lea    0x10(%rcx),%rdi
0x00647f4b <LWLockRelease+523>: movq   $0x0,0x48(%rcx)
0x00647f53 <LWLockRelease+531>: movb   $0x0,0x41(%rcx)
0x00647f57 <LWLockRelease+535>: callq  0x606210 <PGSemaphoreUnlock>

I haven't checked the offsets, but since lwWaitLink is an 8-byte quantity and lwWaiting a single-byte quantity, it's pretty much certain that the first store updates lwWaitLink and the second lwWaiting. Thus, no reordering seems to have taken place here... best regards, Florian Pflug

Even if reordering was not done by the compiler, it can still happen during execution. There is no guarantee that two subsequent assignments will be observed by all CPU cores in the same order. So if one thread is performing

p->x = 1;
p->y = 2;
p->x = 3;
p->y = 4;

then another thread can see the following combinations of (x,y): (1,2), (1,4), (3,2), (3,4). It is necessary to explicitly insert a write barrier to prevent such non-deterministic behaviour. 
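A sketch of what the suggested write barrier would look like in code. The field and function names are simplified stand-ins for the real LWLock code, and pg_write_barrier() is modeled here with GCC's __sync_synchronize() builtin; note that whether a hardware barrier is actually needed on x86 is exactly what the next message in this thread disputes:

```c
#include <stddef.h>
#include <assert.h>

/*
 * Stand-in for PostgreSQL's pg_write_barrier(); __sync_synchronize() is a
 * GCC builtin that emits a full memory barrier.
 */
#define pg_write_barrier() __sync_synchronize()

struct PGPROC
{
	struct PGPROC *lwWaitLink;	/* next waiter in the lock's wait queue */
	volatile int lwWaiting;		/* true while waiting on the lock */
};

static void
release_waiter(struct PGPROC *proc)
{
	proc->lwWaitLink = NULL;
	pg_write_barrier();			/* make the link update visible first */
	proc->lwWaiting = 0;		/* only now may the waiter wake and proceed */
}
```

The point of the ordering is that a waiter which observes lwWaiting == false must never see its own lwWaitLink still pointing into the queue.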
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On Wed, Feb 12, 2014 at 4:04 PM, knizhnik knizh...@garret.ru wrote: Even if reordering was not done by the compiler, it can still happen during execution. There is no guarantee that two subsequent assignments will be observed by all CPU cores in the same order. The x86 memory model (total store order) provides that guarantee in this specific case. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
So here's my attempt to rewrite this logic. I ended up refactoring a bit because I found it unnecessarily confusing having the mode branches in several places. I think it's much clearer just having two separate pieces of logic for RBM_NEW and the extension cases since all they have in common is the ReadBuffer call. I have to say, it scares the hell out of me that there are no regression tests for this code. I'm certainly not comfortable committing it without a few other people reading it if I haven't even run the code once. At least I know it compiles...

*** a/src/backend/access/transam/xlogutils.c
--- b/src/backend/access/transam/xlogutils.c
***************
*** 293,300 ****
  Buffer
  XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
                         BlockNumber blkno, ReadBufferMode mode)
  {
-     BlockNumber lastblock;
      Buffer      buffer;
      SMgrRelation smgr;
  
      Assert(blkno != P_NEW);
--- 293,300 ----
  XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
                         BlockNumber blkno, ReadBufferMode mode)
  {
      Buffer      buffer;
+     Page        page;
      SMgrRelation smgr;
  
      Assert(blkno != P_NEW);
***************
*** 312,353 ****
       */
      smgrcreate(smgr, forknum, true);
  
!     lastblock = smgrnblocks(smgr, forknum);
! 
!     if (blkno < lastblock)
!     {
!         /* page exists in file */
!         buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno,
!                                            mode, NULL);
!     }
!     else
      {
          /* hm, page doesn't exist in file */
!         if (mode == RBM_NORMAL)
          {
              log_invalid_page(rnode, forknum, blkno, false);
              return InvalidBuffer;
          }
-         /* OK to extend the file */
-         /* we do this in recovery only - no rel-extension lock needed */
-         Assert(InRecovery);
-         buffer = InvalidBuffer;
-         while (blkno >= lastblock)
-         {
-             if (buffer != InvalidBuffer)
-                 ReleaseBuffer(buffer);
-             buffer = ReadBufferWithoutRelcache(rnode, forknum,
-                                                P_NEW, mode, NULL);
-             lastblock++;
-         }
-         Assert(BufferGetBlockNumber(buffer) == blkno);
-     }
  
!     if (mode == RBM_NORMAL)
!     {
!         /* check that page has been initialized */
!         Page        page = (Page) BufferGetPage(buffer);
  
          /*
           * We assume that PageIsNew is safe without a lock. During recovery,
           * there should be no other backends that could modify the buffer at
--- 312,332 ----
       */
      smgrcreate(smgr, forknum, true);
  
!     if (mode == RBM_NORMAL)
      {
          /* hm, page doesn't exist in file */
!         if (blkno >= smgrnblocks(smgr, forknum))
          {
              log_invalid_page(rnode, forknum, blkno, false);
              return InvalidBuffer;
          }
  
!         buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno,
!                                            mode, NULL);
  
+         /* check that page has been initialized */
+         page = (Page) BufferGetPage(buffer);
+ 
          /*
           * We assume that PageIsNew is safe without a lock. During recovery,
           * there should be no other backends that could modify the buffer at
***************
*** 359,366 ****
              log_invalid_page(rnode, forknum, blkno, true);
              return InvalidBuffer;
          }
-     }
  
      return buffer;
  }
--- 338,364 ----
              log_invalid_page(rnode, forknum, blkno, true);
              return InvalidBuffer;
          }
+     } else {
+ 
+         /* If page doesn't exist extend the file */
+         while (blkno >= smgrnblocks(smgr, forknum))
+         {
+             /* we do this in recovery only - no rel-extension lock needed */
+             Assert(InRecovery);
+             buffer = ReadBufferWithoutRelcache(rnode, forknum,
+                                                P_NEW, mode, NULL);
+             /* Take care not to extend the relation past where it's needed */
+             if (BufferGetBlockNumber(buffer) == blkno)
+                 return buffer;
+             ReleaseBuffer(buffer);
+         }
+ 
+         /* page now exists in file */
+         buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno,
+                                            mode, NULL);
+     }
+ 
      return buffer;
  }
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 12, 2014 at 10:02:32AM +0530, Amit Kapila wrote: By issue, I assume you mean to say which compression algorithm is best for this patch. For this patch, currently we have 2 algorithms for which results have been posted. As far as I understand, Heikki is pretty sure that the latest algorithm (compression using prefix-suffix match in old and new tuple) used for this patch is better than the other algorithm in terms of CPU gain or overhead. The performance data taken by me for the worst case for this algorithm shows there is a CPU overhead for this algorithm as well. OTOH the other algorithm (compression using old tuple as history) can be a bigger win in terms of I/O reduction in more cases. In short, it is still not decided which algorithm to choose and whether it can be enabled by default or it is better to have a table-level switch to enable/disable it. So I think the decision to be taken here is about the below points: 1. Are we okay with I/O reduction at the expense of CPU for *worst* cases and I/O reduction without impacting CPU (better overall tps) for *favourable* cases? 2. If we are not okay with worst-case behaviour, then can we provide a table-level switch, so that it can be decided by the user? 3. If none of the above, then is there any other way to mitigate the worst-case behaviour, or shall we just reject this patch and move on? Given a choice to me, I would like to go with option 2, because I think for most cases UPDATE statements will have the same data for old and new tuples except for some part of the tuple (generally columns having large text data are not modified), so we will mostly end up in favourable cases, and surely for worst cases we don't want the user to suffer from CPU overhead, so a table-level switch is also required. I think 99.9% of users are never going to adjust this so we had better choose something we are happy to enable for effectively everyone. In my reading, prefix/suffix seemed safe for everyone. 
We can always revisit this if we think of something better later, as WAL format changes are not a problem for pg_upgrade. I also think making it user-tunable is so hard for users to know when to adjust as to be almost not worth the user interface complexity it adds. I suggest we go with always-on prefix/suffix mode, then add some check so the worst case is avoided by just giving up on compression. As I said previously, I think compressing the page images is the next big win in this area. I think here one might argue that for some users it is not feasible to decide whether their tuple data for UPDATE is going to be similar or completely different, and they are not at all ready for any risk of CPU overhead, but they would be happy to see I/O reduction, in which case it is difficult to decide what the value of the table-level switch should be. Here I think the only answer is that nothing is free in this world, so either make sure about the application's UPDATE behaviour before going to production, or just don't enable this switch and be happy with the current behaviour. Again, can't we do a minimal attempt at prefix/suffix compression so there is no measurable overhead? On the other side there will be users who will be pretty certain about their usage of UPDATE statements, or at least are ready to evaluate their application if they can get such a huge gain, so it would be quite a useful feature for such users. Can we move forward with the full-page compression patch? In my opinion, it is not certain that whatever compression algorithm gets decided for this patch (if any) can be directly used for full-page compression; some ideas could be used, or maybe the algorithm could be tweaked a bit to make it usable for full-page compression. Thanks, I understand that now. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
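The prefix/suffix match Amit and Bruce are weighing can be sketched in a few lines of C. This is an illustrative toy, not the patch's actual implementation: it merely computes how many leading and trailing bytes the old and new tuple images share, so that only the middle of the new image (plus the two lengths) would need to be written to WAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: find the shared prefix and suffix of the old and
 * new tuple images.  The suffix search is bounded so it never overlaps
 * the prefix in either image. */
static void
prefix_suffix_match(const char *oldp, size_t oldlen,
                    const char *newp, size_t newlen,
                    size_t *prefixlen, size_t *suffixlen)
{
    size_t maxcommon = oldlen < newlen ? oldlen : newlen;
    size_t p = 0, s = 0;

    while (p < maxcommon && oldp[p] == newp[p])
        p++;
    while (s < maxcommon - p &&
           oldp[oldlen - 1 - s] == newp[newlen - 1 - s])
        s++;
    *prefixlen = p;
    *suffixlen = s;
}
```

With `old = "aaaaXXXXbbbb"` and `new = "aaaaYYbbbb"` this finds a 4-byte prefix and 4-byte suffix, leaving only 2 bytes of the new tuple to log; the "minimal attempt" Bruce asks about would be capping how far these loops scan.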
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com: I reviewed all the three patches. The first two (core PostgreSQL) patches are fine. And I have comments on the third patch, related to cache scan. Thanks for your volunteering. 1. +# contrib/dbcache/Makefile The Makefile header comment does not match the file's location. Ahh, it's an old name from when I started the implementation. 2. + /* + * Estimation of average width of cached tuples - it does not make + * sense to construct a new cache if its average width is more than + * 30% of the raw data. + */ Move the estimation of the average width of cached tuples into the case where the cache is not found; otherwise it is an overhead for the cache hit scenario. You are right. If an existing cache is found and matches, its width is obviously less than 30% of the total width. 3. + if (old_cache) + attrs_used = bms_union(attrs_used, old_cache->attrs_used); don't we need a check to see whether the average width is more than 30%? During estimation it doesn't include the other existing attributes. Indeed. It should drop some attributes from the existing cache if the total average width grows beyond the threshold. Probably, we need to have a statistical variable to track how many times or how recently attributes are referenced. 4. + lchunk->right = cchunk; + lchunk->l_depth = TTREE_DEPTH(lchunk->right); I think it should be lchunk->r_depth that needs to be set in a clockwise rotation. Oops, nice catch. 5. Can you add some comments in the code about how a block is used? Sorry, I'll add it. A block is consumed from the head to store pointers to tuples, and from the tail to store the contents of the tuples. A block can hold multiple tuples as long as the tuple pointers growing from the head do not cross into the area for tuple contents. Anyway, I'll put it in the source code. 6. In the do_insert_tuple function I felt moving the tuples and rearranging their addresses is a little bit costly. How about the following way? 
Always insert the tuple from the bottom of the block, where the empty space starts, and store the corresponding reference pointers at the start of the block in an array. As new tuples are inserted, this array grows from the block start and the tuples from the block end. We just need to sort this array based on item pointers; there is no need to update the reference pointers. In this case movement is required only when a tuple is moved from one block to another, and also whenever continuous free space is not available to insert the new tuple. You can decide based on how frequently the sorting will happen in general. It seems to me a reasonable suggestion. Probably, an easier implementation is replacing an old block that has dead-space with a new block that contains only valid tuples, if and when the dead-space grows beyond a threshold of block usage. 7. In the ccache_find_tuple function this Assert(i_min + 1 < cchunk->ntups); can go wrong when only one tuple is present in the block, with an item pointer equal to the one we are searching for in the forward scan direction. It shouldn't happen, because the first or second ItemPointerCompare will handle the condition. Please assume cchunk->ntups == 1. In this case, any given ctid shall match either branch, because any ctid is less than, equal to or larger than the only cached tuple; thus, it moves to the right or left node according to the scan direction. 8. I am not able to find a protection mechanism for insert/delete etc. of a tuple in the T-tree. As this is shared memory it can cause problems. For design simplification, I put a giant lock per columnar cache. So, routines in cscan.c acquire an exclusive lwlock prior to invocation of ccache_insert_tuple / ccache_delete_tuple. 9. + /* merge chunks if this chunk has enough space to merge */ + ccache_merge_chunk(ccache, cchunk); Calling the chunk merge for every heap-page-prune callback is an overhead for vacuum. The merge may also again lead to node splits because of new data. 
OK, I'll check the condition for merging the chunks, to prevent too-frequent merge/split cycles. 10. columner is present in some places of the patch. correct it. Ahh, it should be columnar. 11. In the cache_scan_next function, in case the cache insert fails because of shared memory, the tuple pointer is not reset and the cache is NULL. Because of this, the next record fetch trips the assert cache != NULL. You are right. I have to modify the scan state to follow the normal seqscan path, not just set csstate->ccache to NULL; the old behaviour is a leftover from trial and error during implementation. 12. + if (ccache->status != CCACHE_STATUS_IN_PROGRESS) + cs_put_ccache(ccache); The cache is created with refcnt 2, and sometimes cs_put_ccache is called twice to eliminate it, and sometimes a different approach is used. It is a little bit confusing; can you explain in comments why 2 is required and how it
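The head/tail block layout described in point 5 above — tuple pointers growing from the head, tuple bodies stacked from the tail, insertion refused when the two regions would cross — can be sketched like this. Names and the block size are hypothetical, not the patch's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8192

/* Hypothetical cache block: an array of item offsets grows from the head
 * of buf while tuple bodies grow downward from the tail, much like a
 * PostgreSQL heap page. */
typedef struct CacheBlock
{
    uint32_t nitems;   /* item offsets stored at the head of buf */
    uint32_t tail;     /* start of the tuple-data region */
    char     buf[BLOCK_SIZE];
} CacheBlock;

static void
block_init(CacheBlock *b)
{
    b->nitems = 0;
    b->tail = BLOCK_SIZE;
}

/* Insert one tuple body at the tail and record its offset at the head.
 * Returns the item index, or -1 when head and tail would cross. */
static int
block_insert(CacheBlock *b, const void *tuple, uint32_t len)
{
    uint32_t head_end = (b->nitems + 1) * sizeof(uint32_t);

    if (len > b->tail || head_end > b->tail - len)
        return -1;              /* regions would cross: block is full */
    b->tail -= len;
    memcpy(b->buf + b->tail, tuple, len);
    ((uint32_t *) b->buf)[b->nitems] = b->tail;
    return (int) b->nitems++;
}
```

Sorting only the small offset array at the head (rather than moving tuple bodies) is the cost saving the reviewer is suggesting.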
Re: [HACKERS] Changeset Extraction v7.5
On 2014-02-11 11:22:24 -0500, Robert Haas wrote: + context = AllocSetContextCreate(CurrentMemoryContext, + "Changeset Extraction Context", + ALLOCSET_DEFAULT_MINSIZE, + ALLOCSET_DEFAULT_INITSIZE, + ALLOCSET_DEFAULT_MAXSIZE); I have my doubts about whether it's wise to make this the child of CurrentMemoryContext. Certainly, if we do that, then expectations around what that context is need to be documented. Short-lived contexts are presumably unsuitable. Well, it depends on the type of usage. In the walsender, yes, it needs to be a long-lived context. Not so much in the SQL case, inside the SRF we spill all the data into a tuplestore after which we are done. I don't see which context would be more suitable as a default parent; it used to be TopMemoryContext but that requires pointless cleanup routines to handle errors... So I think documenting the requirement is the best way? I'm working on the other comments, pushing individual changes to git. Will send a new version onlist once I'm through. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
Craig Ringer cr...@2ndquadrant.com writes: Great, that's what I was hoping to see - proper errors where we've omitted things, not silent miscompilation. And, just to make sure that ain't nobody happy ... brolga, our one operational Cygwin animal, is still building without complaints. We really need to understand what is going on here and why some toolchains don't seem to feel that lack of PGDLLIMPORT is a problem. Nosing around, I happened to notice this in src/template/cygwin: # --enable-auto-import gets rid of a diagnostics linker message LDFLAGS=-Wl,--allow-multiple-definition -Wl,--enable-auto-import Anybody know *exactly* what --enable-auto-import does? The name is, um, suggestive. regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-12 10:58:20 -0500, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: Great, that's what I was hoping to see - proper errors where we've omitted things, not silent miscompilation. And, just to make sure that ain't nobody happy ... brolga, our one operational Cygwin animal, is still building without complaints. Hm, isn't that pretty much expected? Cygwin's infrastructure tries to be unixoid, so it's not surprising that the toolchain doesn't require such strange things? Otherwise even larger amounts of unix software wouldn't run, right? I am pretty sure halfway recent versions of mingw can be made to behave similarly, but I don't really see the advantage in doing so, to the contrary even. # --enable-auto-import gets rid of a diagnostics linker message LDFLAGS=-Wl,--allow-multiple-definition -Wl,--enable-auto-import Anybody know *exactly* what --enable-auto-import does? The name is, um, suggestive. My ld(1)'s manpage has three screens' worth of details... Most of it seems to be on http://www.freebsd.org/cgi/man.cgi?query=ld&sektion=1 It's essentially ELF-like shared library linking in PE-COFF through dirty tricks. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 10:58:20 -0500, Tom Lane wrote: Anybody know *exactly* what --enable-auto-import does? The name is, um, suggestive. My ld(1)'s manpage has three screens' worth of details... Most of it seems to be on http://www.freebsd.org/cgi/man.cgi?query=ld&sektion=1 It's essentially ELF-like shared library linking in PE-COFF through dirty tricks. Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-12 11:26:56 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 10:58:20 -0500, Tom Lane wrote: Anybody know *exactly* what --enable-auto-import does? The name is, um, suggestive. My ld(1)'s manpage has three screens' worth of details... Most of it seems to be on http://www.freebsd.org/cgi/man.cgi?query=ld&sektion=1 It's essentially ELF-like shared library linking in PE-COFF through dirty tricks. Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? Hm, I don't see a big advantage in detecting the errors, as it won't hugely increase the buildfarm coverage. But --auto-import seems to require marking the .text sections as writable; avoiding that seems to be a good idea if we don't need it anymore. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 11:26:56 -0500, Tom Lane wrote: Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? Hm, I don't see a big advantage in detecting the errors, as it won't hugely increase the buildfarm coverage. But --auto-import seems to require marking the .text sections as writable; avoiding that seems to be a good idea if we don't need it anymore. Given the rather small number of Windows machines in the buildfarm, I'd really like them all to be on the same page about what's valid and what isn't. Having to wait ~24 hours to find out if they're all happy with something isn't my idea of a good time. In any case, as long as we have discrepancies between them, I'm not going to be entirely convinced that any of them are telling the full truth. I'm going to go remove that switch and see if brolga starts failing. If it does, I'll be satisfied that we understand the issues here. regards, tom lane
Re: [HACKERS] DATE type output does not follow datestyle parameter
On Wed, Feb 12, 2014 at 10:08:26PM +0900, MauMau wrote: From: Bruce Momjian br...@momjian.us On Mon, Dec 2, 2013 at 10:22:47PM +0900, MauMau wrote: So, my suggestion is to just add the following sentence right after the above one. The Postgres style is an exception: the output of the date type is either MM-DD-YYYY (e.g. 12-17-1997) or DD-MM-YYYY (e.g. 17-12-1997), which is different from the date part of the output of the timestamp type. Could you consider and add this to the manual? Yes, I will make the change unless someone objects. Could you commit this if you feel okay? I'm sorry if I missed the modified article in the devel doc. OK, attached doc patch applied to head and 9.3. Thanks for the report. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml new file mode 100644 index b7d7d80..30fd9bb *** a/doc/src/sgml/datatype.sgml --- b/doc/src/sgml/datatype.sgml *** January 8 04:05:06 1999 PST *** 2205,2212 historical accident.) <xref linkend="datatype-datetime-output-table"> shows examples of each output style. The output of the <type>date</type> and ! <type>time</type> types is of course only the date or time part ! in accordance with the given examples. </para> <table id="datatype-datetime-output-table"> --- 2205,2214 historical accident.) <xref linkend="datatype-datetime-output-table"> shows examples of each output style. The output of the <type>date</type> and ! <type>time</type> types is generally only the date or time part ! in accordance with the given examples. However, the ! <productname>POSTGRES</> style outputs date-only values in ! <acronym>ISO</acronym> format. </para> <table id="datatype-datetime-output-table">
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-12 11:39:43 -0500, Tom Lane wrote: I'm going to go remove that switch and see if brolga starts failing. If it does, I'll be satisfied that we understand the issues here. The manpage also has a --disable-auto-import, but doesn't document whether enabling or disabling is the default... So maybe try to be explicit? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
On Wed, Feb 12, 2014 at 11:39:43AM -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 11:26:56 -0500, Tom Lane wrote: Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? Hm, I don't see a big advantage in detecting the errors, as it won't hugely increase the buildfarm coverage. But --auto-import seems to require marking the .text sections as writable; avoiding that seems to be a good idea if we don't need it anymore. Given the rather small number of Windows machines in the buildfarm, I'd really like them all to be on the same page about what's valid and what isn't. Having to wait ~24 hours to find out if they're all happy with something isn't my idea of a good time. In any case, as long as we have discrepancies between them, I'm not going to be entirely convinced that any of them are telling the full truth. I'm going to go remove that switch and see if brolga starts failing. If it does, I'll be satisfied that we understand the issues here. Can you document the issue in case it comes up again, please? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-12 11:50:41 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 11:39:43 -0500, Tom Lane wrote: I'm going to go remove that switch and see if brolga starts failing. If it does, I'll be satisfied that we understand the issues here. The manpage also has a --disable-auto-import, but doesn't document whether enabling or disabling is the default... So maybe try to be explicit? Yeah, I was just wondering about that myself. Presumably --enable is not the default, or it would not have gotten added. However, I notice that it was added a long time ago (2004) ... is it possible the default changed since then? Some release notes I just found seem to suggest it is http://sourceware.org/binutils/docs-2.17/ld/WIN32.html : This feature is enabled with the `--enable-auto-import' command-line option, although it is enabled by default on cygwin/mingw. The `--enable-auto-import' option itself now serves mainly to suppress any warnings that are ordinarily emitted when linked objects trigger the feature's use. 2.17 is from 2007, so it's been enabled by default at least since then. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: Some release notes I just found seem to suggest it is http://sourceware.org/binutils/docs-2.17/ld/WIN32.html : This feature is enabled with the `--enable-auto-import' command-line option, although it is enabled by default on cygwin/mingw. Curious ... narwhal is mingw but evidently doesn't have that switch turned on. OTOH we also know it's an old version. I'm inclined to change template/win32 also. regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 11:39:43 -0500, Tom Lane wrote: I'm going to go remove that switch and see if brolga starts failing. If it does, I'll be satisfied that we understand the issues here. The manpage also has a --disable-auto-import, but doesn't document whether enabling or disabling is the default... So maybe try to be explicit? Yeah, I was just wondering about that myself. Presumably --enable is not the default, or it would not have gotten added. However, I notice that it was added a long time ago (2004) ... is it possible the default changed since then? regards, tom lane
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: So here's my attempt to rewrite this logic. I ended up refactoring a bit because I found it unnecessarily confusing having the mode branches in several places. I think it's much clearer just having two separate pieces of logic for RBM_NEW and the extension cases since all they have in common is the ReadBuffer call. I don't like that at all. It's a lot of unnecessary churn in what is indeed hard-to-test code, and personally I don't find it clearer. Nor, if you're going to complain about the cost of smgrnblocks, does it seem like a great idea to be doing that *twice* per page rather than once. How about the attached instead? regards, tom lane diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index 4cd82df..f1918f6 100644 *** a/src/backend/access/transam/xlogutils.c --- b/src/backend/access/transam/xlogutils.c *** XLogReadBufferExtended(RelFileNode rnode *** 338,352 /* we do this in recovery only - no rel-extension lock needed */ Assert(InRecovery); buffer = InvalidBuffer; ! while (blkno >= lastblock) { if (buffer != InvalidBuffer) ReleaseBuffer(buffer); buffer = ReadBufferWithoutRelcache(rnode, forknum, P_NEW, mode, NULL); - lastblock++; } ! Assert(BufferGetBlockNumber(buffer) == blkno); } if (mode == RBM_NORMAL) --- 338,358 /* we do this in recovery only - no rel-extension lock needed */ Assert(InRecovery); buffer = InvalidBuffer; ! do { if (buffer != InvalidBuffer) ReleaseBuffer(buffer); buffer = ReadBufferWithoutRelcache(rnode, forknum, P_NEW, mode, NULL); } ! while (BufferGetBlockNumber(buffer) < blkno); ! /* Handle the corner case that P_NEW returns non-consecutive pages */ ! if (BufferGetBlockNumber(buffer) != blkno) ! { ! ReleaseBuffer(buffer); ! buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno, ! mode, NULL); ! 
} } if (mode == RBM_NORMAL)
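Tom's do/while rework can be modelled with a toy simulation (the helper names below are invented; `ReadBufferWithoutRelcache` and `P_NEW` are the real APIs being modelled): extend the relation one page at a time until the returned block number reaches the target, then fall back to reading the target page directly if a concurrent extension made `P_NEW` overshoot it.

```c
#include <assert.h>

static int nblocks = 0;   /* toy relation size, in blocks */

/* Stands in for ReadBuffer(P_NEW); "skip" simulates another backend
 * having extended the relation concurrently, so the block number we
 * get back is not the one we expected. */
static int
extend_one(int skip)
{
    if (skip)
        nblocks++;        /* page created by someone else */
    return nblocks++;     /* block number of our new page */
}

/* Stands in for reading an existing block directly. */
static int
read_block(int blkno)
{
    assert(blkno < nblocks);
    return blkno;
}

/* The pattern from the patch: loop until we are at or past the target,
 * then reread the target page if extension overshot it. */
static int
extend_to(int target, int skip_once)
{
    int blk;

    do
    {
        blk = extend_one(skip_once);
        skip_once = 0;
    } while (blk < target);

    if (blk != target)
        blk = read_block(target);   /* non-consecutive P_NEW corner case */
    return blk;
}
```

The second test case below exercises the corner case: a concurrent extension pushes `P_NEW` one block past the target, and the direct reread recovers the wanted page.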
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: So I think I've come up with a scenario that could cause this. I don't think it's exactly what happened here but maybe something analogous happened with our base backup restore. I agree it seems like a good idea for XLogReadBufferExtended to defend itself against successive P_NEW calls possibly not returning consecutive pages. However, unless you had an operating-system-level crash on the master, this isn't sufficient to explain the problem. We'd need also a plausible theory about how a base backup could've left us with short segments in a large relation. (Or maybe the hot backup process could just catch the files in this state if a table is rapidly growing and it doesn't take care to avoid picking up new files that appear after it starts?) That's a possible explanation I guess, but it doesn't seem terribly probable from a timing standpoint. Also, you should be able to gauge the probability of this theory from knowledge of the application --- are the bloated relations all ones that would've been growing *very* rapidly during the base backup? regards, tom lane
Re: [HACKERS] Terminating pg_basebackup background streamer
On Mon, Feb 10, 2014 at 7:39 PM, Magnus Hagander mag...@hagander.netwrote: On Mon, Feb 10, 2014 at 7:29 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/09/2014 02:17 PM, Magnus Hagander wrote: If an error occurs in the foreground (backup) process of pg_basebackup, and we exit in a controlled way, the background process (streaming xlog process) would stay around and keep streaming. This can happen for example if disk space runs out and there is very low activity on the server. (If there is activity on the server, the background streamer will also run out of disk space and exit) Attached patch kills it off in disconnect_and_exit(), which seems like the right thing to do to me. Any objections to applying and backpatching that for the upcoming minor releases? Do you get a different error message with this patch than before? Is the new one better than the old one? Previously you got double error messages - one from the foreground, and a second one from the background sometime in the future (whenever it eventually failed, and for whatever reason - so if it was out of disk space, it would complain about that once it got enough xlog for it to happen). With the patch you just get the error message from the first process. The background process doesn't give an error on SIGTERM, it just exits. Since there were no other objections, I've applied this patch. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
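The fix Magnus describes amounts to signalling the background child from the foreground exit path so it cannot linger. A minimal POSIX sketch of that idea (the function and variable names here are illustrative, not pg_basebackup's actual code):

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* pid of the background streaming process, if one was started */
static pid_t bgchild = -1;

/* Sketch of the cleanup path: before the foreground process exits on
 * error, SIGTERM the background streamer and reap it.  Returns the
 * child's wait status for inspection. */
static int
disconnect_and_exit_sketch(void)
{
    int status = 0;

    if (bgchild > 0)
    {
        kill(bgchild, SIGTERM);
        waitpid(bgchild, &status, 0);
    }
    return status;
}
```

As the thread notes, the child takes the default SIGTERM disposition, so it terminates silently rather than emitting a second, later error message.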
Re: [HACKERS] truncating pg_multixact/members
In this new version, I added a couple of fields to VacuumStmt node. How strongly do we feel this would cause an ABI break? Would we be more comfortable if I put them at the end of the struct for 9.3 instead? Do we expect third-party code to be calling vacuum()? Also, AutoVacOpts (used as part of reloptions) gained three extra fields. Since this is in the middle of StdRdOptions, it'd be somewhat more involved to put these at the end of that struct. This might be a problem if somebody has a module calling RelationIsSecurityView(). If anyone thinks we should be concerned about such an ABI change, please shout quickly. Here is patch v3, which should be final or close to it. Changes from previous: Robert Haas wrote: Using Multixact capitalized just so seems odd to me. Probably should be lower case (multiple places). Changed it to be all lower case. Originally the X was also upper case, which looked even odder. This part needs some copy-editing: + <para> + Vacuum also allows removal of old files from the + <filename>pg_multixact/members</> and <filename>pg_multixact/offsets</> + subdirectories, which is why the default is a relatively low + 50 million transactions. Vacuuming multixacts also allows...? And: 50 million multixacts, not transactions. I reworded this rather completely. I was missing a change to SetMultiXactIdLimit to use the multixact value instead of the one for Xids, and passing the values computed by autovacuum to vacuum(). Per discussion, new default values are 150 million for vacuum_multixact_freeze_table_age (same as the one for Xids), and 5 million for vacuum_multixact_freeze_min_age. I decided to raise autovacuum_multixact_freeze_max_age to 400 million, i.e. double the one for Xids; so there should be no more emergency vacuuming than before unless multixact consumption is more than double that for Xids. (Now that I re-read this, the same rationale would have me setting the default for vacuum_multixact_freeze_table_age to 300 million. Any votes on that?). 
I adjusted the default values everywhere (docs and sample config), and fixed one or two typos in the docco for Xid vacuuming that I happened to notice, as well. postgresql.conf.sample contained a couple of space-before-tab which I removed. !-- I thought about using a struct to pass all four values around in multiple routines rather than 4 ints (vacuum_set_xid_limits, cluster_rel, rebuild_relation, copy_heap_data). Decided not to for the time being. Perhaps a patch for HEAD only. -- -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** *** 4730,4735 COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; --- 4730,4762 /listitem /varlistentry + varlistentry id=guc-autovacuum-multixact-freeze-max-age xreflabel=autovacuum_multixact_freeze_max_age + termvarnameautovacuum_multixact_freeze_max_age/varname (typeinteger/type)/term + indexterm +primaryvarnameautovacuum_multixact_freeze_max_age/varname configuration parameter/primary + /indexterm + listitem +para + Specifies the maximum age (in multixacts) that a table's + structnamepg_class/.structfieldrelminmxid/ field can + attain before a commandVACUUM/ operation is forced to + prevent multixact ID wraparound within the table. + Note that the system will launch autovacuum processes to + prevent wraparound even when autovacuum is otherwise disabled. +/para + +para + Vacuuming multixacts also allows removal of old files from the + filenamepg_multixact/members/ and filenamepg_multixact/offsets/ + subdirectories, which is why the default is a relatively low + 400 million multixacts. + This parameter can only be set at server start, but the setting + can be reduced for individual tables by changing storage parameters. + For more information see xref linkend=vacuum-for-multixact-wraparound. 
+/para + /listitem + /varlistentry + varlistentry id=guc-autovacuum-vacuum-cost-delay xreflabel=autovacuum_vacuum_cost_delay termvarnameautovacuum_vacuum_cost_delay/varname (typeinteger/type)/term indexterm *** *** 5138,5144 COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; structnamepg_class/.structfieldrelfrozenxid/ field has reached the age specified by this setting. The default is 150 million transactions. Although users can set this value anywhere from zero to ! one billion, commandVACUUM/ will silently limit the effective value to 95% of xref linkend=guc-autovacuum-freeze-max-age, so that a periodical manual commandVACUUM/ has a chance to run before an anti-wraparound
Re: [HACKERS] narwhal and PGDLLIMPORT
On 12/02/2014 17:26, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 10:58:20 -0500, Tom Lane wrote: Anybody know *exactly* what --enable-auto-import does? The name is, um, suggestive. My ld(1)'s manpage has three screens' worth of details... Most of it seems to be on http://www.freebsd.org/cgi/man.cgi?query=ld&sektion=1 It's essentially ELF-like shared library linking in PE-COFF through dirty tricks. Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? regards, tom lane If I am not wrong, --enable-auto-import is already the default on the cygwin build chain (binutils >= 2.19.51), so it should make no difference on latest cygwin. Not sure for your 1.7.7 build environment. About PGDLLIMPORT, my build log is full of warning: ‘optarg’ redeclared without dllimport attribute: previous dllimport ignored I suspect that removing it will also make no difference. Marco PS: we aim for unix-like builds, not Windows ones
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-12 19:13:07 +0100, Marco Atzeri wrote: On 12/02/2014 17:26, Tom Lane wrote: Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? If I am not wrong, --enable-auto-import is already the default on the cygwin build chain (binutils >= 2.19.51), so it should make no difference on latest cygwin. Not sure for your 1.7.7 build environment. We're *disabling* not *enabling* it. About PGDLLIMPORT, my build log is full of warning: ‘optarg’ redeclared without dllimport attribute: previous dllimport ignored That should be fixed then. I guess cygwin's getopt.h declares it that way? I suspect that removing it will also make no difference. The committed patch explicitly disables the functionality. PS: we aim for unix-like builds, not Windows ones Well, there are a significant number of caveats around the auto-import functionality, so there seems little benefit in using it if all the declspec's have to be there anyway. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
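The declspec annotations the thread keeps circling around look roughly like this. This is a simplified illustration, not PostgreSQL's real definition (which lives in its port headers): on native Windows toolchains a variable imported from another module must be marked `__declspec(dllimport)`, while on ELF platforms the macro expands to nothing; `--enable-auto-import` is the linker feature that papers over missing annotations.

```c
#include <assert.h>

/* Hypothetical PGDLLIMPORT-style macro, for illustration only. */
#if defined(_WIN32)
#define MY_DLLIMPORT __declspec(dllimport)
#else
#define MY_DLLIMPORT /* ELF linking needs no annotation */
#endif

/* What an extension's header would say about a variable exported by
 * the core executable... */
MY_DLLIMPORT extern int my_shared_setting;

/* ...and the definition, standing in for the core side in this
 * single-file sketch (on real Windows builds the two live in
 * different modules). */
int my_shared_setting = 42;
```

Without the annotation, MSVC and (switch-less) MinGW builds fail to link references to such variables, which is the hard error the buildfarm discussion wants all Windows animals to agree on.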
Re: [HACKERS] truncating pg_multixact/members
Alvaro Herrera alvhe...@2ndquadrant.com writes: In this new version, I added a couple of fields to VacuumStmt node. How strongly do we feel this would cause an ABI break? Would we be more comfortable if I put them at the end of the struct for 9.3 instead? In the past we've usually added such members at the end of the struct in back branches (but put them in the logical place in HEAD). I'd recommend doing that just on principle. Also, AutoVacOpts (used as part of reloptions) gained three extra fields. Since this is in the middle of StdRdOptions, it'd be somewhat more involved to put these at the end of that struct. This might be a problem if somebody has a module calling RelationIsSecurityView(). If anyone thinks we should be concerned about such an ABI change, please shout quickly. That sounds problematic --- surely StdRdOptions might be something extensions are making use of? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: How about the attached instead? This does possibly allocate an extra block past the target block. I'm not sure how surprising that would be for the rest of the code. For what it's worth I've confirmed the bug in wal-e caused the initial problem. But I think it's possible to occur without that bug so I think it still needs to be addressed. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink
On Mon, Feb 10, 2014 at 12:14 PM, Jeff Janes jeff.ja...@gmail.com wrote: Presumably whatever behavior difference you are seeing is somehow related to the use of PQconnectdbParams() rather than PQsetdbLogin(), but the fine manual does not appear to document a difference between those functions as regards password-prompting behavior or .pgpass usage. It looks like PQsetdbLogin() treats either NULL or an empty string passed to it as matching 5432 in pgpass, while with PQconnectdbParams() only NULL matches 5432 and an empty string does not. pgbench uses an empty string if no -p is specified. This makes pgbench behave the way I think is correct, but it hardly seems like the right way to fix it. [ kludge ] Well, it seems to me that the threshold issue here is whether or not we should try to change the behavior of libpq. If not, then your kludge might be the best option. But if so then it isn't necessary. However, I don't feel confident to pass judgement on what the libpq semantics should be. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
I wrote: Greg Stark st...@mit.edu writes: (Or maybe the hot backup process could just catch the files in this state if a table is rapidly growing and it doesn't take care to avoid picking up new files that appear after it starts?) That's a possible explanation I guess, but it doesn't seem terribly probable from a timing standpoint. I did a bit of arithmetic using the cases you posted previously. In the first case, where block 3634978 got written to 7141472, you can make the numbers come out right if you assume that a page was missing at the end of segment 1 --- that leads to the conclusion that EOF exclusive of that missing page had been around 28.75 GB, which squares well with the relation's size on master. However, it's fairly hard to credit that the base backup would have collected full-size or nearly full-size images of segments 2 through 28 while not seeing segment 1 at full size. You'd have to assume that the rel grew by a factor of ~14 while the base backup was in progress --- and then didn't grow very much more afterwards. (What state exactly did you measure the primary rel sizes in? Was it long after the backup/restore, or did you rewind things somehow?) The other examples seem to fit the theory a bit better, but this one is hard to explain this way. The other big problem for this theory is that you said in http://www.postgresql.org/message-id/cam-w4hpvjcbrvv3dxg8aj0wzku08dhux-xybfdyqhnrn5bn...@mail.gmail.com What's worse is we created a new standby from the same base backup and replayed the same records and it didn't reproduce the problem. If this were the explanation, it oughta be reproducible that way. I still agree that XLogReadBufferExtended shouldn't be assuming that P_NEW will not skip pages. But I think we have another bug in here somewhere. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: How about the attached instead? This does possibly allocate an extra block past the target block. I'm not sure how surprising that would be for the rest of the code. Should be fine; we could end up with an extra block after a failed extension operation in any case. For what it's worth I've confirmed the bug in wal-e caused the initial problem. Huh? Bug in wal-e? What bug? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 12/02/2014 19:19, Andres Freund wrote: On 2014-02-12 19:13:07 +0100, Marco Atzeri wrote: On 12/02/2014 17:26, Tom Lane wrote: Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? If I am not wrong, --enable-auto-import is already the default in the cygwin build chain (binutils >= 2.19.51), so it should make no difference on latest cygwin. Not sure for your 1.7.7 build environment. We're *disabling* not *enabling* it. remove is not disable, if enable is already the default inside binutils and gcc. Or am I missing something? About PGDLLIMPORT, my build log is full of warning: ‘optarg’ redeclared without dllimport attribute: previous dllimport ignored That should be fixed then. I guess cygwin's getopt.h declares it that way? from /usr/include/getopt.h
#ifndef __INSIDE_CYGWIN__
extern int __declspec(dllimport) opterr;	/* if error message should be printed */
extern int __declspec(dllimport) optind;	/* index into parent argv vector */
extern int __declspec(dllimport) optopt;	/* character checked for validity */
extern int __declspec(dllimport) optreset;	/* reset getopt */
extern char __declspec(dllimport) *optarg;	/* argument associated with option */
#endif
I suspect that removing it will also make no difference. The committed patch explicitly disables the functionality. PS: we aim for unix-like builds, not windows ones Well, there are a significant number of caveats around the auto import functionality, so there seems little benefit in using it if all the declspec's have to be there anyway. I think I am not currently using the declspec in the build anymore. But I could be wrong, as the postgresql build way is the most complicated among all the ones I am dealing with.
Greetings, Andres Freund Cheers Marco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Marco Atzeri marco.atz...@gmail.com writes: On 12/02/2014 19:19, Andres Freund wrote: On 2014-02-12 19:13:07 +0100, Marco Atzeri wrote: About PGDLLIMPORT, my build log is full of warning: ‘optarg’ redeclared without dllimport attribute: previous dllimport ignored That should be fixed then. I guess cygwin's getopt.h declares it that way? from /usr/include/getopt.h extern char __declspec(dllimport) *optarg; /* argument associated with option */ Hm. All of our files that use getopt also do
extern char *optarg;
extern int	optind, opterr;
#ifdef HAVE_INT_OPTRESET
extern int	optreset;	/* might not be declared by system headers */
#endif
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory and locks
On Mon, Feb 10, 2014 at 7:17 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: Does it cause another problem if dsm_detach() also releases lwlocks allocated on the dsm segment being released? Held lwlocks are tracked in the held_lwlocks[] array; its length is usually 100. In case dsm_detach() is called on the segment between addr ~ (addr + length), it seems to me an easy job to walk this array to find the lwlocks in that range. Oh, sure, it could be done. I just don't see the point. If you're explicitly detaching a shared memory segment while still holding lwlocks within that segment, your code is seriously buggy. Making dsm_detach() silently release those locks would probably just be hiding a bug that you'd really rather find out about. Just my rough idea. Doesn't it make sense to have an offset from the head of the DSM segment that contains the lwlock, instead of the identifier form, to indicate a cstring datum? It does not prevent allocating it later, after the postmaster starts up, and there is no problem if the number of dynamic lwlocks grows to millions or more. If the lwlock has a tranche_offset, not tranche_id, all it needs to do is: 1. find out either the DSM segment or regular SHM segment that contains the supplied lwlock. 2. Calculate mapped_address + tranche_offset; that is the local location where the cstring form is put. Well, if that offset is 8 bytes, it's going to make the LWLock structure grow past 32 bytes on common platforms, which I do not want to do. If it's 4 bytes, sure, that'll work, but now you've gotta make sure that string gets into the 4GB of the relevant shared memory segment, which sounds like an annoying requirement. How would you satisfy it with respect to the main shared memory segment, for example? I mean, one way to handle this problem is to put a hash table in the main shared memory segment with tranche ID -> name mappings. Backends can suck mappings out of there and cache them locally as needed. 
But that's a lot of mechanism for a facility with no known users. Probably, it needs a common utility routine on dsm.c to find out a particular DSM segment that contains the supplied address. (Inverse function towards dsm_segment_address) Yeah, I've thought of that. It may be useful enough to be worth doing, whether we use it for this or not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: How about the attached instead? This does possibly allocate an extra block past the target block. I'm not sure how surprising that would be for the rest of the code. Should be fine; we could end up with an extra block after a failed extension operation in any case. I know it's fine on the active database, I'm not so clear whether it's compatible with the xlog records from the primary. I suppose it'll just see an Initialize Page record and happily see the nul block and initialize it. It's still a bit scary. I of course think my code is vastly clearer than the existing code but yes, I see the unnecessary churn argument. I think that's the fundamental problem with lacking tests, it makes the code get more and more complex as we're reluctant to simplify it out of fear. For what it's worth I've confirmed the bug in wal-e caused the initial problem. Huh? Bug in wal-e? What bug? WAL-E actually didn't restore a whole 1GB file due to a transient S3 problem, in fact a bunch of them. It's remarkable that Postgres kept going with that much data missing. But the arithmetic worked out on the case I checked it on, which was the last one that I just sent the xlog record for last night. In that case there was precisely one segment missing and the relation was extended by the number of segments you would expect if it filled in that missing segment and then jumped to the end of the relation. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] issue with gininsert under very high load
A client of ours encountered this problem when upgrading (via slony) from 8.4 to 9.2, and then from 8.4 to 9.3. The application is a telephony app that inserts call records at pretty high volume in the busiest part of the day, which usually starts around 10.00 am to 11.00 am US eastern time. What happened was that the system load average shot up from, say, 6 to way over 100, and the database became largely unresponsive. This problem did not occur on 8.4. The platform is CentOS 6.4 / x86_64. On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM. I attached a debugger to the wedged process and got this backtrace:
#0  0x00386a2eaf37 in semop () from /lib64/libc.so.6
#1  0x006021a7 in PGSemaphoreLock ()
#2  0x006463a1 in LWLockAcquire ()
#3  0x00631997 in ReadBuffer_common ()
#4  0x006322ee in ReadBufferExtended ()
#5  0x00467475 in ginPrepareFindLeafPage ()
#6  0x0046663f in ginInsertItemPointers ()
#7  0x00462adc in ginEntryInsert ()
#8  0x0046cad0 in ginInsertCleanup ()
#9  0x0046d7c6 in ginHeapTupleFastInsert ()
#10 0x00462cfa in gininsert ()
#11 0x00722b33 in FunctionCall6Coll ()
#12 0x0048bdff in index_insert ()
#13 0x00587595 in ExecInsertIndexTuples ()
#14 0x005948e1 in ExecModifyTable ()
#15 0x0057e718 in ExecProcNode ()
#16 0x0057d512 in standard_ExecutorRun ()
[...]
Armed with this the client identified a single gin index (on an array of text) that could have been involved, and on removing the index the problem simply went away, and they have now been live on 9.3 for 36 hours without a mishap. 
The client is attempting to create a self-contained reproducible test, but understandably is not going to re-add the index to their production system. I'm not terribly familiar with the gin code, so I'm reporting this in the hope that other people who are more familiar with it than I am might know where to look for a potential race condition or other bug that might have caused this. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] issue with gininsert under very high load
On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote: On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM. I attached a debugger to the wedged process and got this backtrace: #0 0x00386a2eaf37 in semop () from /lib64/libc.so.6 #1 0x006021a7 in PGSemaphoreLock () #2 0x006463a1 in LWLockAcquire () #3 0x00631997 in ReadBuffer_common () #4 0x006322ee in ReadBufferExtended () #5 0x00467475 in ginPrepareFindLeafPage () #6 0x0046663f in ginInsertItemPointers () #7 0x00462adc in ginEntryInsert () #8 0x0046cad0 in ginInsertCleanup () #9 0x0046d7c6 in ginHeapTupleFastInsert () #10 0x00462cfa in gininsert () #11 0x00722b33 in FunctionCall6Coll () #12 0x0048bdff in index_insert () #13 0x00587595 in ExecInsertIndexTuples () #14 0x005948e1 in ExecModifyTable () #15 0x0057e718 in ExecProcNode () #16 0x0057d512 in standard_ExecutorRun () [...] Armed with this the client identified a single gin index (on an array of text) that could have been involved, and on removing the index the problem simply went away, and they have now been live on 9.3 for 36 hours without a mishap. The client is attempting to create a self-contained reproducible test, but understandably are not going to re-add the index to their production system. I'm not terribly familiar with the gin code, so I'm reporting this in the hope that other people who are more familiar with it than I am might know where to look for a potential race condition or other bug that might have caused this. 
That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem, which usually makes it *far* too big b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency. The fix is usually to just turn FASTUPDATE off for all indexes. If it's done after the initial creation, the table should be vacuumed afterwards to flush it. There's previously been talk about changing the limits to something more reasonable but it got stalled in bikeshedding IIRC. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: This does possibly allocate an extra block past the target block. I'm not sure how surprising that would be for the rest of the code. Should be fine; we could end up with an extra block after a failed extension operation in any case. I know it's fine on the active database, I'm not so clear whether it's compatible with the xlog records from the primary. I suppose it'll just see an Initialize Page record and happily see the nul block and initialize it. It's still a bit scary. Well, we can easily find uninitialized extra pages on the primary too, so if WAL replay were unable to cope with that, it would be a bug regardless. Huh? Bug in wal-e? What bug? WAL-E actually didn't restore a whole 1GB file due to a transient S3 problem, in fact a bunch of them. Hah. Okay, I think we can write this issue off as closed then. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] issue with gininsert under very high load
* Andres Freund (and...@2ndquadrant.com) wrote: There's previously been talk about changing the limits to something more reasonable but it got stalled in bikeshedding IIRC. As I recall, there was argument that we didn't really need a new GUC for this (which was the proposal) but rather just need to pick a reasonable (small) value and hard-code it. Are there objections to doing so, or are there cases where that would be a serious problem? How do people feel about 4MB? Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] issue with gininsert under very high load
Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote: On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM. That's a deficiency of the gin fastupdate cache: a) it bases it's size on work_mem which usually makes it *far* too big b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurreny. I'm not sure that what Andrew is describing can fairly be called a concurrent-performance problem. It sounds closer to a stuck lock. Are you sure you've diagnosed it correctly? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
I wrote: Greg Stark st...@mit.edu writes: WAL-E actually didn't restore a whole 1GB file due to a transient S3 problem, in fact a bunch of them. Hah. Okay, I think we can write this issue off as closed then. Oh, wait a minute. It's not just a matter of whether we find the right block: we also have to consider whether XLogReadBufferExtended will apply the right mode behavior. Currently, it supposes that all pages past the initially observed EOF should be assumed to be uninitialized; but if we're working with an inconsistent database, that seems like an unsafe assumption. It might be that a page is there but we've not (yet) fixed the length of some preceding segment. If we want to not get bogus WAL contains references to invalid pages failures in such scenarios, it seems like we need a more invasive change than what I just committed. I think your patch didn't cover this consideration either. What I think we probably want to do is forcibly cause the target page to exist, using a P_NEW loop like what I committed, and then decide on the basis of whether it's all-zeroes whether to consider it invalid or not. This seems sane on the grounds that it's just the extension to the page level of the existing policy of creating the file whether it existed or not. It could only result in a large amount of wasted work if the passed-in target block is insane --- but since we got it out of a CRC-checked WAL record, I think it's safe to not worry too much about that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] issue with gininsert under very high load
On 02/12/2014 02:50 PM, Andres Freund wrote: On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote: On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM. I attached a debugger to the wedged process and got this backtrace: #0 0x00386a2eaf37 in semop () from /lib64/libc.so.6 #1 0x006021a7 in PGSemaphoreLock () #2 0x006463a1 in LWLockAcquire () #3 0x00631997 in ReadBuffer_common () #4 0x006322ee in ReadBufferExtended () #5 0x00467475 in ginPrepareFindLeafPage () #6 0x0046663f in ginInsertItemPointers () #7 0x00462adc in ginEntryInsert () #8 0x0046cad0 in ginInsertCleanup () #9 0x0046d7c6 in ginHeapTupleFastInsert () #10 0x00462cfa in gininsert () #11 0x00722b33 in FunctionCall6Coll () #12 0x0048bdff in index_insert () #13 0x00587595 in ExecInsertIndexTuples () #14 0x005948e1 in ExecModifyTable () #15 0x0057e718 in ExecProcNode () #16 0x0057d512 in standard_ExecutorRun () [...] Armed with this the client identified a single gin index (on an array of text) that could have been involved, and on removing the index the problem simply went away, and they have now been live on 9.3 for 36 hours without a mishap. The client is attempting to create a self-contained reproducible test, but understandably are not going to re-add the index to their production system. I'm not terribly familiar with the gin code, so I'm reporting this in the hope that other people who are more familiar with it than I am might know where to look for a potential race condition or other bug that might have caused this. 
That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem, which usually makes it *far* too big b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency. The fix is usually to just turn FASTUPDATE off for all indexes. If it's done after the initial creation, the table should be vacuumed afterwards to flush it. There's previously been talk about changing the limits to something more reasonable but it got stalled in bikeshedding IIRC. So this doesn't work just when it might be most useful? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] truncating pg_multixact/members
Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: In this new version, I added a couple of fields to VacuumStmt node. How strongly do we feel this would cause an ABI break? Would we be more comfortable if I put them at the end of the struct for 9.3 instead? In the past we've usually added such members at the end of the struct in back branches (but put them in the logical place in HEAD). I'd recommend doing that just on principle. Okay. Also, AutoVacOpts (used as part of reloptions) gained three extra fields. Since this is in the middle of StdRdOptions, it'd be somewhat more involved to put these at the end of that struct. This might be a problem if somebody has a module calling RelationIsSecurityView(). If anyone thinks we should be concerned about such an ABI change, please shout quickly. That sounds problematic --- surely StdRdOptions might be something extensions are making use of? So can we assume that security_barrier is the only thing to be concerned about? If so, the attached patch should work around the issue by placing it in the same physical location. I guess if there are modules that add extra stuff beyond StdRdOptions, this wouldn't work, but I'm not really sure how likely this is given that our reloptions design hasn't proven to be the most extensible thing in the world. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

*** a/src/include/utils/rel.h
--- b/src/include/utils/rel.h
*************** typedef struct RelationData
*** 187,193 ****
   * be applied to relations that use this format or a superset for
   * private options data.
   */
! /* autovacuum-related reloptions. */
  typedef struct AutoVacOpts
  {
  	bool		enabled;
--- 187,203 ----
   * be applied to relations that use this format or a superset for
   * private options data.
   */
! /*
!  * autovacuum-related reloptions.
!  *
!  * In 9.3 starting from 9.3.3 we use a different struct definition,
!  * accommodating security_barrier inside the autovacuum struct, so that new
!  * fields could be added at the end.  This is so that third-party modules
!  * compiled with the original definition of the struct can continue to
!  * access the security_barrier field in its original physical location,
!  * avoiding an ABI break.  (9.4 and up use the normal definition, where
!  * security_barrier is placed outside autovacuum options.)
!  */
  typedef struct AutoVacOpts
  {
  	bool		enabled;
*************** typedef struct AutoVacOpts
*** 200,213 ****
  	int			freeze_table_age;
  	float8		vacuum_scale_factor;
  	float8		analyze_scale_factor;
  } AutoVacOpts;
  
  typedef struct StdRdOptions
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
  	int			fillfactor;		/* page fill factor in percent (0..100) */
! 	AutoVacOpts autovacuum;		/* autovacuum-related options */
! 	bool		security_barrier;	/* for views */
  } StdRdOptions;
  
  #define HEAP_MIN_FILLFACTOR			10
--- 210,226 ----
  	int			freeze_table_age;
  	float8		vacuum_scale_factor;
  	float8		analyze_scale_factor;
+ 	bool		security_barrier;
+ 	int			multixact_freeze_min_age;
+ 	int			multixact_freeze_max_age;
+ 	int			multixact_freeze_table_age;
  } AutoVacOpts;
  
  typedef struct StdRdOptions
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
  	int			fillfactor;		/* page fill factor in percent (0..100) */
! 	AutoVacOpts autovacuum;		/* autovacuum -- includes security_barrier */
  } StdRdOptions;
  
  #define HEAP_MIN_FILLFACTOR			10
*************** typedef struct StdRdOptions
*** 238,247 ****
  /*
   * RelationIsSecurityView
   *		Returns whether the relation is security view, or not
   */
  #define RelationIsSecurityView(relation) \
  	((relation)->rd_options ?		\
! 	 ((StdRdOptions *) (relation)->rd_options)->security_barrier : false)
--- 251,264 ----
  /*
   * RelationIsSecurityView
   *		Returns whether the relation is security
Re: [HACKERS] issue with gininsert under very high load
On February 12, 2014 9:33:38 PM CET, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote: On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM. That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem which usually makes it *far* too big b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency. I'm not sure that what Andrew is describing can fairly be called a concurrent-performance problem. It sounds closer to a stuck lock. Are you sure you've diagnosed it correctly? No. But I've several times seen similar backtraces where it wasn't actually stuck, just livelocked. I'm just on my mobile right now, but AFAIR Andrew described a loop involving lots of semaphores and spinlocks, which shouldn't be the case if it were actually stuck. If there are dozens of processes waiting on the same lock, cleaning up a large number of items one by one, it's not surprising if it's dramatically slow. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] issue with gininsert under very high load
On 02/12/2014 10:50 PM, Andres Freund wrote: On February 12, 2014 9:33:38 PM CET, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote: On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM. That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem, which usually makes it *far* too big b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency. I'm not sure that what Andrew is describing can fairly be called a concurrent-performance problem. It sounds closer to a stuck lock. Are you sure you've diagnosed it correctly? No. But I've several times seen similar backtraces where it wasn't actually stuck, just livelocked. I'm just on my mobile right now, but afair Andrew described a loop involving lots of semaphores and spinlocks; that shouldn't be the case if it were actually stuck. If there are dozens of processes waiting on the same lock, cleaning up a large number of items one by one, it's not surprising if it's dramatically slow. Perhaps we should use a lock to enforce that only one process tries to clean up the pending list at a time. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2/11/14, 7:04 PM, Craig Ringer wrote: I don't see any use for that with plperl, but it might be a valid thing to be doing for (e.g.) hstore.dll. Though you can't really link to it from another module anyway, you have to go through the fmgr to get access to its symbols at runtime, so we can probably just skip generation of import libraries for contribs and PLs. There are cases where one module needs symbols from another directly. Would that be affected by this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Terminating pg_basebackup background streamer
On 2/12/14, 12:47 PM, Magnus Hagander wrote: Since there were no other objections, I've applied this patch. I'm getting a compiler warning: pg_basebackup.c:105:3: error: implicit declaration of function 'kill' [-Werror=implicit-function-declaration] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On February 12, 2014 10:23:21 PM CET, Peter Eisentraut pete...@gmx.net wrote: On 2/11/14, 7:04 PM, Craig Ringer wrote: I don't see any use for that with plperl, but it might be a valid thing to be doing for (e.g.) hstore.dll. Though you can't really link to it from another module anyway, you have to go through the fmgr to get access to its symbols at runtime, so we can probably just skip generation of import libraries for contribs and PLs. There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
I wrote: What I think we probably want to do is forcibly cause the target page to exist, using a P_NEW loop like what I committed, and then decide on the basis of whether it's all-zeroes whether to consider it invalid or not. This seems sane on the grounds that it's just the extension to the page level of the existing policy of creating the file whether it existed or not. It could only result in a large amount of wasted work if the passed-in target block is insane --- but since we got it out of a CRC-checked WAL record, I think it's safe to not worry too much about that. Like the attached. A possible complaint is that if the WAL sequence contains updates against large relations that are later dropped, this will waste time and disk space replaying those updates as best it can. Doesn't seem like that's a case to optimize for, however. regards, tom lane diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index f1918f6..7a820c0 100644 *** a/src/backend/access/transam/xlogutils.c --- b/src/backend/access/transam/xlogutils.c *** XLogReadBuffer(RelFileNode rnode, BlockN *** 277,297 * XLogReadBufferExtended * Read a page during XLOG replay * ! * This is functionally comparable to ReadBufferExtended. There's some ! * differences in the behavior wrt. the mode argument: * ! * In RBM_NORMAL mode, if the page doesn't exist, or contains all-zeroes, we ! * return InvalidBuffer. In this case the caller should silently skip the ! * update on this page. (In this situation, we expect that the page was later ! * dropped or truncated. If we don't see evidence of that later in the WAL ! * sequence, we'll complain at the end of WAL replay.) * * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the * relation is extended with all-zeroes pages up to the given block number. * ! * In RBM_NORMAL_NO_LOG mode, we return InvalidBuffer if the page doesn't ! * exist, and we don't check for all-zeroes. Thus, no log entry is made ! 
* to imply that the page should be dropped or truncated later. */ Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, --- 277,307 * XLogReadBufferExtended * Read a page during XLOG replay * ! * This is functionally comparable to ReadBufferExtended, except that we ! * are willing to create the target page (and indeed the whole relation) ! * if it doesn't currently exist. This allows safe replay of WAL sequences ! * in which a relation was later dropped or truncated. * ! * The mode argument provides some control over this behavior. (See also ! * ReadBufferExtended's specification of what the modes do.) * * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the * relation is extended with all-zeroes pages up to the given block number. + * These modes should be used if the caller is going to initialize the page + * contents from scratch, and doesn't need it to be valid already. * ! * In RBM_NORMAL mode, we similarly create the page if needed, but if the ! * page contains all zeroes (including the case where we just created it), ! * we return InvalidBuffer. Then the caller should silently skip the update ! * on this page. This mode should be used for incremental updates where the ! * caller needs to see a valid page. (In this case, we expect that the page ! * later gets dropped or truncated. If we don't see evidence of that later in ! * the WAL sequence, we'll complain at the end of WAL replay.) ! * ! * RBM_NORMAL_NO_LOG mode is like RBM_NORMAL except that we will return an ! * all-zeroes page, and not log it as one that ought to get dropped later. ! * This mode is for when the caller wants to read a page that might validly ! * contain zeroes. 
*/ Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, *** XLogReadBufferExtended(RelFileNode rnode *** 299,304 --- 309,315 { BlockNumber lastblock; Buffer buffer; + bool present; SMgrRelation smgr; Assert(blkno != P_NEW); *** XLogReadBufferExtended(RelFileNode rnode *** 316,342 */ smgrcreate(smgr, forknum, true); lastblock = smgrnblocks(smgr, forknum); if (blkno < lastblock) { /* page exists in file */ buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno, mode, NULL); } else { ! /* hm, page doesn't exist in file */ ! if (mode == RBM_NORMAL) ! { ! log_invalid_page(rnode, forknum, blkno, false); ! return InvalidBuffer; ! } ! if (mode == RBM_NORMAL_NO_LOG) ! return InvalidBuffer; ! /* OK to extend the file */ /* we do this in recovery only - no rel-extension lock needed */ Assert(InRecovery); buffer = InvalidBuffer; do { --- 327,357 */ smgrcreate(smgr, forknum, true); + /* + * On the same principle, if the page doesn't already exist in the
Re: [HACKERS] Terminating pg_basebackup background streamer
On Wed, Feb 12, 2014 at 10:28 PM, Peter Eisentraut pete...@gmx.net wrote: On 2/12/14, 12:47 PM, Magnus Hagander wrote: Since there were no other objections, I've applied this patch. I'm getting a compiler warning: pg_basebackup.c:105:3: error: implicit declaration of function 'kill' [-Werror=implicit-function-declaration] What platform is that? And do you know which header the declaration actually lives in? I don't see it here... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: On February 12, 2014 10:23:21 PM CET, Peter Eisentraut pete...@gmx.net wrote: There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. No, I've not touched the PGDLLIMPORT macros. I was hoping to, but it looks like we're not getting there :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] memory usage of pg_upgrade
On Mon, Feb 3, 2014 at 09:14:10PM -0500, Bruce Momjian wrote: On Mon, Sep 9, 2013 at 07:39:00PM -0400, Bruce Momjian wrote: In the case of tablespaces, I should have thought you could keep a hash table of the names and just store an entry id in the table structure. But that's just my speculation without actually looking at the code, so don't take my word for it :-) Yes, please feel free to improve the code. I improved pg_upgrade CPU usage for a large number of objects, but never thought to look at memory usage. It would be a big win to just palloc/pfree the memory, rather than allocate tons of memory. If you don't get to it, I will in a few weeks. Thank you for pointing out this problem. I have researched the cause and the major problem was that I was allocating the maximum path length in a struct rather than allocating just the length I needed, and was not reusing string pointers that I knew were not going to change. The updated attached patch significantly decreases memory consumption:

tables   orig          patch       % decrease
1        27,168 kB     27,168 kB    0
1k       46,136 kB     27,920 kB   39
2k       65,224 kB     28,796 kB   56
4k       103,276 kB    30,472 kB   70
8k       179,512 kB    33,900 kB   81
16k      331,860 kB    40,788 kB   88
32k      636,544 kB    54,572 kB   91
64k      1,245,920 kB  81,876 kB   93

As you can see, a database with 64k tables shows a 93% decrease in memory use. I plan to apply this for PG 9.4. Patch applied. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2/12/14, 4:30 PM, Andres Freund wrote: There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. It works reasonably well on other platforms. Of course, we can barely build extension modules on Windows, so maybe this is a bit much to ask. But as long as we're dealing only with functions, not variables, it should work without any dllimport dances, right? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Peter Eisentraut pete...@gmx.net writes: There are cases where one module needs symbols from another directly. Would that be affected by this? It works reasonably well on other platforms. Of course, we can barely build extension modules on Windows, so maybe this is a bit much to ask. But as long as we're dealing only with functions, not variables, it should work without any dllimport dances, right? AFAIK we've not changed anything that would affect that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [DOCS] [HACKERS] Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
On 2/8/14, 4:41 PM, Tom Lane wrote: diff --git a/HISTORY b/HISTORY index ...360c7f6 . *** a/HISTORY --- b/HISTORY *** *** 0 --- 1,6 + Release notes for all versions of PostgreSQL can be found on-line at + http://www.postgresql.org/docs/devel/static/release.html Should be current instead of devel? + + In a distribution file set, release notes for the current version can be + found prebuilt under doc/src/sgml/html/. Visit the index.html file with + an HTML browser, then consult the Release Notes appendix. You can point them straight at doc/src/sgml/html/release.html. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On February 12, 2014 10:35:06 PM CET, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On February 12, 2014 10:23:21 PM CET, Peter Eisentraut pete...@gmx.net wrote: There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. No, I've not touched the PGDLLIMPORT macros. I was hoping to, but it looks like we're not getting there :-( Right, that was just the test patch... Then the macros we're using in fmgr.h for the magic macros (even if not strictly needed) should work for Peter's case. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Terminating pg_basebackup background streamer
On February 12, 2014 10:34:47 PM CET, Magnus Hagander mag...@hagander.net wrote: On Wed, Feb 12, 2014 at 10:28 PM, Peter Eisentraut pete...@gmx.net wrote: On 2/12/14, 12:47 PM, Magnus Hagander wrote: Since there were no other objections, I've applied this patch. I'm getting a compiler warning: pg_basebackup.c:105:3: error: implicit declaration of function 'kill' [-Werror=implicit-function-declaration] What platform is that? And do you know which header the declaration actually lives in? I don't see it here... It should be in the signal.h include you added a bit later, according to POSIX. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Terminating pg_basebackup background streamer
On 2/12/14, 4:34 PM, Magnus Hagander wrote: On Wed, Feb 12, 2014 at 10:28 PM, Peter Eisentraut pete...@gmx.net mailto:pete...@gmx.net wrote: On 2/12/14, 12:47 PM, Magnus Hagander wrote: Since there were no other objections, I've applied this patch. I'm getting a compiler warning: pg_basebackup.c:105:3: error: implicit declaration of function 'kill' [-Werror=implicit-function-declaration] What platform is that? And do you know which header the declaration actually lives in? I don't see it here... OS X, signal.h according to man page -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Partial sort
On Mon, Feb 10, 2014 at 8:59 PM, Alexander Korotkov aekorot...@gmail.com wrote: Done. Patch is splitted. Thanks! I think the 1st patch now has a bug in initial_cost_mergejoin; you still pass the presorted_keys argument to cost_sort, making it calculate a partial sort cost, but generated plans never use partial sort. I think 0 should be passed instead. Patch attached, needs to be applied on top of partial-sort-basic-1 and then reverse-applied on partial-sort-merge-1. With partial-sort-basic-1 and this fix on the same test suite, the planner overhead is now a more manageable 0.5% to 1.3%; one test is faster by 0.5%. Built with asserts disabled, ran on Intel i5-3570K. In an effort to reduce variance, I locked the server and pgbench to a single CPU core (taskset -c 3), but there are still noticeable run-to-run differences, so these numbers are a bit fuzzy. The faster result is definitely not a fluke, however; it happens every time. On Mon, Feb 10, 2014 at 2:33 PM, Marti Raudsepp ma...@juffo.org wrote: AFAICT this only happens once per plan and the overhead is O(n) to the number of pathkeys? I was of course wrong about that, it also adds extra overhead when iterating over the paths list. 
Test taskset -c 3 run2.sh from https://github.com/intgr/benchjunk/tree/master/partial_sort

Overhead percentages (between best of each 3 runs):
join5.sql 0.7
star5.sql 0.8
line5.sql 0.5
lim_join5.sql -0.5
lim_star5.sql 1.3
lim_line5.sql 0.5
limord_join5.sql 0.6
limord_star5.sql 0.5
limord_line5.sql 0.7

Raw results:
git 48870dd
join5.sql tps = 509.328070 (excluding connections establishing)
join5.sql tps = 509.772190 (excluding connections establishing)
join5.sql tps = 510.651517 (excluding connections establishing)
star5.sql tps = 499.208698 (excluding connections establishing)
star5.sql tps = 498.200314 (excluding connections establishing)
star5.sql tps = 496.269315 (excluding connections establishing)
line5.sql tps = 797.968831 (excluding connections establishing)
line5.sql tps = 797.011690 (excluding connections establishing)
line5.sql tps = 796.379258 (excluding connections establishing)
lim_join5.sql tps = 394.946024 (excluding connections establishing)
lim_join5.sql tps = 395.417689 (excluding connections establishing)
lim_join5.sql tps = 395.482958 (excluding connections establishing)
lim_star5.sql tps = 423.434393 (excluding connections establishing)
lim_star5.sql tps = 423.774305 (excluding connections establishing)
lim_star5.sql tps = 424.386099 (excluding connections establishing)
lim_line5.sql tps = 733.007330 (excluding connections establishing)
lim_line5.sql tps = 731.794731 (excluding connections establishing)
lim_line5.sql tps = 732.356280 (excluding connections establishing)
limord_join5.sql tps = 385.317921 (excluding connections establishing)
limord_join5.sql tps = 385.915870 (excluding connections establishing)
limord_join5.sql tps = 384.747848 (excluding connections establishing)
limord_star5.sql tps = 417.992615 (excluding connections establishing)
limord_star5.sql tps = 416.944685 (excluding connections establishing)
limord_star5.sql tps = 418.262647 (excluding connections establishing)
limord_line5.sql tps = 708.979203 (excluding connections establishing)
limord_line5.sql tps = 710.926866 (excluding connections establishing)
limord_line5.sql tps = 710.928907 (excluding connections establishing)

48870dd + partial-sort-basic-1.patch.gz + fix-cost_sort.patch
join5.sql tps = 505.488181 (excluding connections establishing)
join5.sql tps = 507.222759 (excluding connections establishing)
join5.sql tps = 506.549654 (excluding connections establishing)
star5.sql tps = 495.432915 (excluding connections establishing)
star5.sql tps = 494.906793 (excluding connections establishing)
star5.sql tps = 492.623808 (excluding connections establishing)
line5.sql tps = 789.315968 (excluding connections establishing)
line5.sql tps = 793.875456 (excluding connections establishing)
line5.sql tps = 790.545990 (excluding connections establishing)
lim_join5.sql tps = 396.956732 (excluding connections establishing)
lim_join5.sql tps = 397.515213 (excluding connections establishing)
lim_join5.sql tps = 397.578669 (excluding connections establishing)
lim_star5.sql tps = 417.459963 (excluding connections establishing)
lim_star5.sql tps = 418.024803 (excluding connections establishing)
lim_star5.sql tps = 418.830234 (excluding connections establishing)
lim_line5.sql tps = 729.186915 (excluding connections establishing)
lim_line5.sql tps = 726.288788 (excluding connections establishing)
lim_line5.sql tps = 728.123296 (excluding connections establishing)
limord_join5.sql tps = 383.484767 (excluding connections establishing)
limord_join5.sql tps = 383.021960 (excluding connections establishing)
limord_join5.sql tps = 383.722051 (excluding connections establishing)
limord_star5.sql tps = 414.138460 (excluding connections establishing)
limord_star5.sql tps = 414.063766 (excluding connections establishing)
limord_star5.sql
Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid
On Wed, Sep 11, 2013 at 02:10:45PM -0400, Robert Haas wrote: On Tue, Sep 10, 2013 at 11:33 PM, Noah Misch n...@leadboat.com wrote: I'm thinking to preserve postmaster.pid at immediate shutdown in all released versions, but I'm less sure about back-patching a change to make PGSharedMemoryCreate() pickier. On the one hand, allowing startup to proceed with backends still active in the same data directory is a corruption hazard. On the other hand, it could break weird shutdown/restart patterns that permit trivial lifespan overlap between backends of different postmasters. Opinions? I'm more sanguine about the second change than the first. Leaving postmaster.pid around seems like a clear user-visible behavior change that could break user scripts or have other consequences that we don't foresee; thus, I would vote against back-patching it. Indeed, I'm not sure it's a good idea to do that even in master. On the other hand, tightening the checks in PGSharedMemoryCreate() seems very much worth doing, and I think it might also be safe enough to back-patch. Were these changes ever applied? I don't see them. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/12/2014 02:31 PM, Inoue, Hiroshi wrote: Maybe this is one of the few use cases of dlltool. Because python doesn't ship with its MINGW import library, the Makefile uses dlltool to generate an import library from the python DLL. Wow. Has anyone been in touch with the Python package maintainers to see if they can fix that and bundle the .lib ? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/13/2014 12:39 AM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 11:26:56 -0500, Tom Lane wrote: Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? Hm, I don't see a big advantage in detecting the errors as It won't hugely increase the buildfarm coverage. But --auto-import seems to require marking the .text sections as writable, avoiding that seems to be a good idea if we don't need it anymore. Given the rather small number of Windows machines in the buildfarm, I'd really like them all to be on the same page about what's valid and what isn't. Having to wait ~24 hours to find out if they're all happy with something isn't my idea of a good time. In any case, as long as we have discrepancies between them, I'm not going to be entirely convinced that any of them are telling the full truth. I'm going to go remove that switch and see if brolga starts failing. If it does, I'll be satisfied that we understand the issues here. I wouldn't assume that _anything_ Cygwin does makes sense, or is consistent with anything else. Its attempts to produce UNIX-like behaviour on Windows cause some truly bizarre behaviour at times. It is not totally unfair to compare the level of weirdness of Cygwin to that of WINE, though they operate in completely different ways. I wouldn't suggest making any conclusions about the _other_ platforms based on Cygwin. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [DOCS] [HACKERS] Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
Peter Eisentraut pete...@gmx.net writes: On 2/8/14, 4:41 PM, Tom Lane wrote: + Release notes for all versions of PostgreSQL can be found on-line at + http://www.postgresql.org/docs/devel/static/release.html Should be current instead of devel? + + In a distribution file set, release notes for the current version can be + found prebuilt under doc/src/sgml/html/. Visit the index.html file with + an HTML browser, then consult the Release Notes appendix. You can point them straight at doc/src/sgml/html/release.html. Done and done. I also noticed that these instructions were wrong anyway for 8.4, which still has the embedded-tarball HTML docs, so I adjusted the text for that version. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/13/2014 05:23 AM, Peter Eisentraut wrote: On 2/11/14, 7:04 PM, Craig Ringer wrote: I don't see any use for that with plperl, but it might be a valid thing to be doing for (e.g.) hstore.dll. Though you can't really link to it from another module anyway, you have to go through the fmgr to get access to its symbols at runtime, so we can probably just skip generation of import libraries for contribs and PLs. There are cases where one module needs symbols from another directly. Would that be affected by this? Yes, in that you cannot link directly to another DLL on Windows (without hoop jumping and lots of pain); you link to the import library. So if we don't generate an import library then (e.g.) MyExtension cannot link to hstore.dll. It can still look up function exports via the fmgr. As concluded upthread, it's easier to just generate import libraries for everything since we need it for the client library, so nothing's going to change anyway. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Craig Ringer cr...@2ndquadrant.com writes: On 02/12/2014 02:31 PM, Inoue, Hiroshi wrote: Maybe this is one of the few use cases of dlltool. Because python doesn't ship with its MINGW import library, the Makefile uses dlltool to generate an import library from the python DLL. Wow. Has anyone been in touch with the Python package maintainers to see if they can fix that and bundle the .lib ? Indeed somebody should pester them about that, but it's not clear how much it'd help us even if they fixed it tomorrow. We're going to have to be able to build with existing Python distributions for a long time to come. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/13/2014 05:35 AM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On February 12, 2014 10:23:21 PM CET, Peter Eisentraut pete...@gmx.net wrote: There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. No, I've not touched the PGDLLIMPORT macros. I was hoping to, but it looks like we're not getting there :-( In theory we could now remove the __declspec(dllexport) case for BUILDING_DLL, as it should now be redundant with the fixed .DEF generator. Should. Unfortunately it looks like there's not going to be any getting around having something that can turn into __declspec(dllimport) in the headers for compiling things that link to postgres.exe, though. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com: 7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups); can go wrong when only one tuple is present in the block with an item pointer equal to the one we are searching for in the forward scan direction. It shouldn't happen, because the first or second ItemPointerCompare will handle the condition. Please assume cchunk->ntups == 1. In this case, any given ctid shall match either of them, because any ctid is less, equal or larger than the only cached tuple; thus, it moves to the right or left node according to the scan direction. Yes, you are correct. Sorry for the noise. 8. I am not able to find a protection mechanism in insert/delete etc. of a tuple in the T-tree. As this is shared memory, it can cause problems. For design simplification, I put a giant lock per columnar-cache. So, routines in cscan.c acquire an exclusive lwlock prior to invocation of ccache_insert_tuple / ccache_delete_tuple. Correct. But this lock can be a bottleneck for concurrency. Better to analyze the same once we have the performance report. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Making strxfrm() blobs in indexes work
On Sun, Feb 02, 2014 at 05:09:06PM -0800, Peter Geoghegan wrote: However, it also occurs to me that strxfrm() blobs have another useful property: We (as, say, the author of an equality operator on text, an operator intended for a btree operator class) *can* trust a strcmp()'s result on blobs, provided the result isn't 0/equal, *even if the blobs are truncated*. So maybe a better scheme, and certainly a simpler one, would be to have a pseudo-attribute in inner pages only with, say, the first 8 bytes of a strxfrm() blob formed from the logically-leading text attribute of the same indexTuple. Because we're only doing this on inner pages, there is a very good chance that that will be good enough most of the time. This also allows us to reduce bloat very effectively. (A bit late to the party). This idea has come up before and the most annoying thing is that braindead strxfrm API. Namely, to strxfrm a large string you need to strxfrm it completely even if you only want the first 8 bytes. That said, since btree index tuples are limited to 3k anyway, the overhead probably isn't that bad. I think it would make a noticeable difference if it can be made to work. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/13/2014 05:35 AM, Peter Eisentraut wrote: On 2/12/14, 4:30 PM, Andres Freund wrote: There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. It works reasonably well on other platforms. Of course, we can barely build extension modules on Windows, so maybe this is a bit much to ask. But as long as we're dealing only with functions, not variables, it should work without any dllimport dances, right? Don't think so. If you don't have __declspec(dllexport) or a .DEF file marking something as exported, it's not part of the DLL interface at all, it's like you compiled it with gcc using -fvisibility=hidden and didn't give the symbol __attribute__((visibility("default"))). If you _do_ have the symbol exported from the DLL, using __declspec(dllimport) or a generated .DEF file that exposes all externs, you can link to the symbol. However, from the reading I've done recently, I'm pretty sure that if you fail to declare __declspec(dllimport) on the importing side, you actually land up statically linking to a thunk function that in turn calls the real function in the DLL. So it works, but at a performance cost. So you should do the dance. Sorry. It gets worse, too. Say you want hstore to export a couple of symbols. Those symbols must be __declspec(dllexport) while everything else in headers must be __declspec(dllimport). This means you can't just use PGDLLIMPORT. You must define a HSTOREDLLIMPORT that's __declspec(dllexport) when compiling hstore and otherwise __declspec(dllimport). Then set a preprocessor macro like -DCOMPILING_HSTORE to trigger it.
Even though we're generating .DEF files, I don't think we can avoid that, because we at minimum need to have the hstore exported symbols *not* annotated __declspec(dllimport) when compiling hstore, but have them so annotated when importing the header into anything else. (Come to think of it, how're we dealing with this in libpq? I wonder if postgres is linking to libpq using thunk functions?) -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making strxfrm() blobs in indexes work
On Wed, Feb 12, 2014 at 3:30 PM, Martijn van Oosterhout klep...@svana.org wrote: (A bit late to the party). This idea has come up before and the most annoying thing is that braindead strxfrm api. Namely, to strxfrm a large strings you need to strxfrm it completely even if you only want the first 8 bytes. I think that in general strxfrm() must have that property. However, it does not obligate us to store the entire string, provided that we don't trust blob comparisons that indicate equality (since I believe that we cannot do so generally with a non-truncated blob, we might as well take advantage of this, particularly given we're doing this with already naturally dissimilar inner pages, where we'll mostly get away with truncation provided there is a reliable tie-breaker). Besides all this, I'm not particularly worried about the cost of calling strxfrm() if that only has to occur when a page split occurs, as we insert a downlink into the parent page. The big picture here is that we can exploit the properties of inner pages to do the smallest amount of work for the largest amount of benefit. -- Peter Geoghegan
Re: [HACKERS] narwhal and PGDLLIMPORT
Craig Ringer cr...@2ndquadrant.com writes: However, from the reading I've done recently, I'm pretty sure that if you fail to declare __declspec(dllimport) on the importing side, you actually land up statically linking to a thunk function that in turn calls the real function in the DLL. So it works, but at a performance cost. Meh. I'm not really willing to go through the pushups that would be needed to deal with that, even if it can be shown that the performance cost is noticeable. We've already kluged our source code for Windows far beyond what anybody would tolerate for any other platform, and I find that to be quite enough bending over for Redmond. regards, tom lane
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
Hello, I've marked this patch as 'Ready for committer'. To be honest, I see no harm in changing the name as per your suggestion, as it can improve segment naming for dynamic shared memory segments, however there is no clear problem with the current name either, so I don't want to change places this patch has no relation to. Okay, let's go with it as it is. I think the best thing to do here is to put it as Notes To Committer, something like: Some suggestions for Committer to consider: Change the name of dsm segments from .. to .. In general, what I see is that they consider all discussion in a thread, but putting some special notes like the above will reduce the chance of it getting overlooked by them. I have done this as a reviewer previously and it worked well. Thank you for the suggestion. However, it is a bit different thing from this patch so I have no intention to compel to do the changing. Thanks to you for understanding my position. Thanks for reviewing the patch so carefully, especially the Windows part which I think was a bit tricky for you to set up. It's my pleasure, and I learned a lot. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-13 07:58:09 +0800, Craig Ringer wrote: On 02/13/2014 05:35 AM, Peter Eisentraut wrote: On 2/12/14, 4:30 PM, Andres Freund wrote: There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. It works reasonably well on other platforms. Of course, we can barely build extension modules on Windows, so maybe this is a bit much to ask. But as long as we're dealing only with functions, not variables, it should work without any dllimport dances, right? Don't think so. If you don't have __declspec(dllexport) or a .DEF file marking something as exported, it's not part of the DLL interface at all, it's like you compiled it with gcc using -fvisibility=hidden and didn't give the symbol __attribute__((visibility (default)) . If you _do_ have the symbol exported from the DLL, using __declspec(dllimport) or a generated .DEF file that exposes all externs, you can link to the symbol. However, from the reading I've done recently, I'm pretty sure that if you fail to declare __declspec(dllimport) on the importing side, you actually land up statically linking to a thunk function that in turn calls the real function in the DLL. So it works, but at a performance cost. So you should do the dance. Sorry. I don't think the thunk function will have such a high overhead in this day and age. And it's what we essentially already do for all functions called *from* extensions, no? It gets worse, too. Say you want hstore to export a couple of symbols. Those symbols must be __declspec(dllexport) while everything else in headers must be __declspec(dllimport). This means you can't just use PGDLLIMPORT. 
You must define a HSTOREDLLIMPORT that's __declspec(dllexport) when compiling hstore and otherwise __declspec(dllimport). Then set a preprocessor macro like -DCOMPILING_HSTORE to trigger it. We actually have a macro that should do that, namely PGDLLEXPORT. I am not sure, though, why it's not dependent on BUILDING_DLL (which is absolutely horribly misnamed, being essentially inverted). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/13/2014 08:26 AM, Andres Freund wrote: It gets worse, too. Say you want hstore to export a couple of symbols. Those symbols must be __declspec(dllexport) while everything else in headers must be __declspec(dllimport). This means you can't just use PGDLLIMPORT. You must define a HSTOREDLLIMPORT that's __declspec(dllexport) when compiling hstore and otherwise __declspec(dllimport). Then set a preprocessor macro like -DCOMPILING_HSTORE to trigger it. We actually have a macro that should do that, namely PGDLLEXPORT. I am not sure, though, why it's not dependent on BUILDING_DLL (which is absolutely horribly misnamed, being essentially inverted). Yes, it does that *for building postgres*. What I'm saying is that you need another one for hstore if you wish to be able to export symbols from hstore and import them into other libs. You can't use PGDLLEXPORT because BUILDING_DLL will be unset, so it'll expand to __declspec(dllimport) while compiling hstore. Which is wrong if you want to be exporting symbols; even if you use a .DEF file, it must expand to nothing, and if you don't use a .DEF it must expand to __declspec(dllexport) ... but ONLY for those symbols exported by hstore itself. So each lib that wants to export symbols needs its own equivalent of PGDLLIMPORT, and a flag set just when compiling that lib. This is the normal way it's done on Windows builds, not stuff I'm looking into and saying how _I_ think we should do it. AFAIK this is just how it's done on Windows, horrible as that is. I'll see if I can find a couple of relevant links. BTW, BUILDING_DLL is so-named because the macro definition will be taken from examples that talk about compiling a DLL, that's all. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
(2014/02/12 12:28), Inoue, Hiroshi wrote: (2014/02/12 8:30), Tom Lane wrote: I wrote: Hiroshi Inoue in...@tpf.co.jp writes: I tried the MINGW port with the attached change and successfully built src and contrib, and all parallel regression tests were OK. I cleaned this up a bit (the if-nesting in Makefile.shlib was making my head hurt, not to mention that it left a bunch of dead code) and committed it. Hm ... according to buildfarm member narwhal, this doesn't work so well for plperl: gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -Wno-comment -shared -o plperl.dll plperl.o SPI.o Util.o -L../../../src/port -L../../../src/common -Wl,--allow-multiple-definition -L/mingw/lib -Wl,--as-needed -LC:/Perl/lib/CORE -lperl58 -L../../../src/backend -lpostgres -lpgcommon -lpgport -lintl -lxslt -lxml2 -lssleay32 -leay32 -lz -lm -lws2_32 -lshfolder -Wl,--export-all-symbols -Wl,--out-implib=libplperl.a Cannot export .idata$4: symbol not found Cannot export .idata$5: symbol not found Cannot export .idata$6: symbol not found Cannot export .text: symbol not found Cannot export perl58_NULL_THUNK_DATA: symbol not found Creating library file: libplperl.a collect2: ld returned 1 exit status make[3]: *** [plperl.dll] Error 1 Oops, I forgot to include plperl, tcl or python, sorry. I would retry build operations with them. Unfortunately it would take a pretty long time because build operations are pretty (or veeery on an old machine) slow. Not very clear what's going on there; could this be a problem in narwhal's admittedly-ancient toolchain? As for the build, plperl and pltcl are OK on both the Windows7+gcc4.6.1 machine and the Windows Vista+gcc3.4.5 machine. plpython is OK on the gcc4.6.1 machine but causes an *initializer element is not constant* error on the gcc3.4.5 machine. I've not run the regression tests yet.
regards, Hiroshi Inoue
Re: [HACKERS] psql should show disabled internal triggers
On Thu, Nov 21, 2013 at 11:59:51PM -0200, Fabrízio de Royes Mello wrote: On Fri, Oct 25, 2013 at 3:37 PM, fabriziomello fabriziome...@gmail.com wrote: On 2013-09-18 15:15:55 +0200, Bernd Helmle wrote: On 2013-09-18 15:15:55 +0200, Bernd Helmle wrote: --On 18. September 2013 13:52:29 +0200 Andres Freund andres@... wrote: If you do ALTER TABLE ... DISABLE TRIGGER ALL; and then individually re-enable the disabled triggers it's easy to miss internal triggers. A \d+ tablename will not show anything out of the ordinary for that situation since we don't show internal triggers. But foreign key checks won't work. So, how about displaying disabled internal triggers in psql? Hi, I had exactly the same concerns this morning while starting to look at the ENABLE/DISABLE constraint patch. However, I wouldn't display them as triggers, but maybe more generally as disabled constraints or such. Well, that will lead the user in the wrong direction, won't it? They haven't disabled the constraint but the trigger. Especially as we already have NOT VALID and might grow DISABLED for constraints themselves... Hi, The attached patch [1] enables psql to list internal disabled triggers in \d, only in versions >= 9.0. [1] psql-display-all-triggers-v1.patch http://postgresql.1045698.n5.nabble.com/file/n5775954/ psql-display-all-triggers-v1.patch As others, I am concerned about people being confused when funny-looking trigger names suddenly appear when they disable all table triggers. What I ended up doing is to create a user and an internal section when displaying disabled triggers: Disabled user triggers: check_update BEFORE UPDATE ON orders FOR EACH ROW EXECUTE PROCEDURE trigf() Disabled internal triggers: RI_ConstraintTrigger_c_16409 AFTER INSERT ON orders FROM customer NOT DEF ... RI_ConstraintTrigger_c_16410 AFTER UPDATE ON orders FROM customer NOT DEF ... I kept the Triggers section unchanged, showing only user triggers.
-- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c new file mode 100644 index 3cb8df2..a194ce7 *** a/src/bin/psql/describe.c --- b/src/bin/psql/describe.c *** describeOneTableDetails(const char *sche *** 2090,2104 printfPQExpBuffer(buf, SELECT t.tgname, pg_catalog.pg_get_triggerdef(t.oid%s), ! t.tgenabled\n FROM pg_catalog.pg_trigger t\n WHERE t.tgrelid = '%s' AND , (pset.sversion = 9 ? , true : ), ! oid); if (pset.sversion = 9) ! appendPQExpBufferStr(buf, NOT t.tgisinternal); else if (pset.sversion = 80300) ! appendPQExpBufferStr(buf, t.tgconstraint = 0); else appendPQExpBufferStr(buf, (NOT tgisconstraint --- 2090,2108 printfPQExpBuffer(buf, SELECT t.tgname, pg_catalog.pg_get_triggerdef(t.oid%s), ! t.tgenabled, %s\n FROM pg_catalog.pg_trigger t\n WHERE t.tgrelid = '%s' AND , (pset.sversion = 9 ? , true : ), ! (pset.sversion = 9 ? t.tgisinternal : ! pset.sversion = 80300 ? ! t.tgconstraint 0 AS tgisinternal : ! false AS tgisinternal), oid); if (pset.sversion = 9) ! /* display/warn about disabled internal triggers */ ! appendPQExpBuffer(buf, (NOT t.tgisinternal OR (t.tgisinternal AND t.tgenabled = 'D'))); else if (pset.sversion = 80300) ! appendPQExpBufferStr(buf, (t.tgconstraint = 0 OR (t.tgconstraint 0 AND t.tgenabled = 'D'))); else appendPQExpBufferStr(buf, (NOT tgisconstraint *** describeOneTableDetails(const char *sche *** 2124,2130 * disabled triggers and the two special ALWAYS and REPLICA * configurations. */ ! for (category = 0; category 4; category++) { have_heading = false; for (i = 0; i tuples; i++) --- 2128,2134 * disabled triggers and the two special ALWAYS and REPLICA * configurations. */ ! 
for (category = 0; category <= 4; category++) { have_heading = false; for (i = 0; i < tuples; i++) *** describeOneTableDetails(const char *sche *** 2133,2143 --- 2137,2149 const char *tgdef; const char *usingpos; const char *tgenabled; + const char *tgisinternal; /* * Check if this trigger falls into the current category */ tgenabled = PQgetvalue(result, i, 2); + tgisinternal = PQgetvalue(result, i, 3); list_trigger = false; switch (category) { *** describeOneTableDetails(const char *sche *** 2146,2159
Re: [HACKERS] Dead code or buggy code?
On Fri, Sep 20, 2013 at 12:13:18AM +0100, Greg Stark wrote: So I'm just going to make the code defensive and assume NULL is possible even if maybe it isn't. In case it's not clear, this is one of the things Xi Wang's tool picked up. There are not too many, but it turns out they are indeed not all in the adt code, so I might wait until after the commit fest to commit it to avoid causing bit churn. Uh, where are we on this? --- -- greg On 19 Sep 2013 12:52, Robert Haas robertmh...@gmail.com wrote: On Wed, Sep 18, 2013 at 6:20 PM, Greg Stark st...@mit.edu wrote: The following code is in ProcSleep at proc.c:1138. GetBlockingAutoVacuumPgproc() should presumably always return a vacuum pgproc entry since the deadlock state says it's blocked by autovacuum. But I'm not really familiar enough with this codepath to know whether there's not a race condition here where it can sometimes return null. The following code checks autovac != NULL but the PGXACT initializer would have seg faulted if it returned NULL, if that's possible. if (deadlock_state == DS_BLOCKED_BY_AUTOVACUUM && allow_autovacuum_cancel) { PGPROC *autovac = GetBlockingAutoVacuumPgproc(); PGXACT *autovac_pgxact = ProcGlobal->allPgXact[autovac->pgprocno]; LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); /* * Only do it if the worker is not working to protect against Xid * wraparound. */ if ((autovac != NULL) && (autovac_pgxact->vacuumFlags & PROC_IS_AUTOVACUUM) && !(autovac_pgxact->vacuumFlags & PROC_VACUUM_FOR_WRAPAROUND)) { Hmm, yeah. I remember noticing this some time ago but never got around to fixing it. +1 for rearranging things there somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On Wed, Feb 12, 2014 at 8:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Oh, wait a minute. It's not just a matter of whether we find the right block: we also have to consider whether XLogReadBufferExtended will apply the right mode behavior. Currently, it supposes that all pages past the initially observed EOF should be assumed to be uninitialized; but if we're working with an inconsistent database, that seems like an unsafe assumption. It might be that a page is there but we've not (yet) fixed the length of some preceding segment. If we want to not get bogus WAL contains references to invalid pages failures in such scenarios, it seems like we need a more invasive change than what I just committed. I think your patch didn't cover this consideration either. Hm. I *think* those cases would be handled anyways since the table would later be truncated. Arguably any reference after the short segment is a reference to an invalid page since it means it's a record which predates the records which caused the extension. Obviously it would still give the error in the case we had where files were missing but then probably it should give errors in that case. -- greg
Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones
On Fri, Nov 15, 2013 at 10:40:20AM +0200, Heikki Linnakangas wrote: The BTRFS_IOC_CLONE ioctl operates on file level and can be used to clone files anywhere in a btrfs filesystem. Hmm, you can also do cp --reflog -r data92 data-tmp I think you mean --reflink here. pg_upgrade --link --old-datadir=data92-copy --new-datadir=data-tmp rm -rf data-tmp That BTRFS_IOC_CLONE ioctl seems so hacky that I'd rather not get that in our source tree. cp --reflog is much more likely to get that magic incantation right, since it gets a lot more attention and testing than pg_upgrade. I'm not in favor of adding filesystem-specific tricks into pg_upgrade. It would be nice to list these tricks in the docs, though. I have applied the attached patch which suggests the use of file system snapshots and copy-on-write file copies. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml new file mode 100644 index 3d529b2..bb3b6a0 *** a/doc/src/sgml/pgupgrade.sgml --- b/doc/src/sgml/pgupgrade.sgml *** psql --username postgres --file script.s *** 569,575 the old server and run commandrsync/ again to update the copy with any changes to make it consistent. You might want to exclude some files, e.g. filenamepostmaster.pid/, as documented in xref !linkend=backup-lowlevel-base-backup. /para refsect2 --- 569,578 the old server and run commandrsync/ again to update the copy with any changes to make it consistent. You might want to exclude some files, e.g. filenamepostmaster.pid/, as documented in xref !linkend=backup-lowlevel-base-backup. If your file system supports !file system snapshots or copy-on-write file copying, you can use that !to make a backup of the old cluster, though the snapshot and copies !must be created simultaneously or while the database server is down. 
/para refsect2
Re: [HACKERS] Prevent pg_basebackup -Fp -D -?
On Thu, Oct 3, 2013 at 06:50:57AM +0200, Magnus Hagander wrote: On Oct 3, 2013 2:47 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Wed, Oct 2, 2013 at 11:31 PM, Magnus Hagander mag...@hagander.net wrote: Right now, if you use pg_basebackup -Ft -D - you get a tarfile, written to stdout, for redirection. However, if you use: pg_basebackup -Fp -D - you get a plaintext (unpackaged) backup, in a directory called -. I can't think of a single use case where this is a good idea. Therefore, I would suggest we simply throw an error in this case, instead of creating the directory. Only for the specific case of specifying exactly - as a directory. Comments? Isn't this a non-problem? This behavior is in line with the documentation, so I would suspect that if the directory name is specified as - in plain mode, it should create the folder with this name. Do you consider having a folder of this name an annoyance? Yes, that is exactly the point - I do consider that an annoyance, and I don't see the use case where you'd actually want it. I bet 100% of the users of that have been accidental, thinking they'd get the pipe, not the directory. Also, if we do that, is this something we should consider backpatchable? It's not strictly speaking a bugfix, but I'd say it fixes some seriously annoying behavior. This would change the spec of pg_basebackup, so no? Does the current behavior have potential security issues? No, there are no security issues that I can see. Just annoyance. And yes, I guess it would change the spec, so backpatching might be a bad idea.. Has this been fixed? If so, I don't see it. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] psql should show disabled internal triggers
On Thu, Feb 13, 2014 at 12:04 AM, Bruce Momjian br...@momjian.us wrote: On Thu, Nov 21, 2013 at 11:59:51PM -0200, Fabrízio de Royes Mello wrote: On Fri, Oct 25, 2013 at 3:37 PM, fabriziomello fabriziome...@gmail.com wrote: On 2013-09-18 15:15:55 +0200, Bernd Helmle wrote: On 2013-09-18 15:15:55 +0200, Bernd Helmle wrote: --On 18. September 2013 13:52:29 +0200 Andres Freund lt;andres@gt; wrote: If you do ALTER TABLE ... DISABLE TRIGGER ALL; and then individually re-enable the disabled triggers it's easy to miss internal triggers. A \d+ tablename will not show anything out of the ordinary for that situation since we don't show internal triggers. But foreign key checks won't work. So, how about displaying disabled internal triggers in psql? Hi had exactly the same concerns this morning while starting to look at the ENABLE/DISABLE constraint patch. However, i wouldn't display them as triggers, but maybe more generally as disabled constraints or such. Well, that will lead the user in the wrong direction, won't it? They haven't disabled the constraint but the trigger. Especially as we already have NOT VALID and might grow DISABLED for constraint themselves... Hi, The attached patch [1] enable PSQL to list internal disabled triggers in \d only in versions = 9.0. [1] psql-display-all-triggers-v1.patch http://postgresql.1045698.n5.nabble.com/file/n5775954/ psql-display-all-triggers-v1.patch As others, I am concerned about people being confused when funny-looking trigger names suddenly appearing when you disable all table triggers. What I ended up doing is to create a user and internal section when displaying disabled triggers: Disabled user triggers: check_update BEFORE UPDATE ON orders FOR EACH ROW EXECUTE PROCEDURE trigf() Disabled internal triggers: RI_ConstraintTrigger_c_16409 AFTER INSERT ON orders FROM customer NOT DEF ... RI_ConstraintTrigger_c_16410 AFTER UPDATE ON orders FROM customer NOT DEF ... I kept the Triggers section unchanged, showing only user triggers. 
I also updated the code to handle 8.3+ servers. Patch attached. Makes more sense than my previous patch... The code looks fine to me!! Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: On Wed, Feb 12, 2014 at 8:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Oh, wait a minute. It's not just a matter of whether we find the right block: we also have to consider whether XLogReadBufferExtended will apply the right mode behavior. Currently, it supposes that all pages past the initially observed EOF should be assumed to be uninitialized; but if we're working with an inconsistent database, that seems like an unsafe assumption. It might be that a page is there but we've not (yet) fixed the length of some preceding segment. If we want to not get bogus WAL contains references to invalid pages failures in such scenarios, it seems like we need a more invasive change than what I just committed. I think your patch didn't cover this consideration either. Hm. I *think* those cases would be handled anyways since the table would later be truncated. Arguably any reference after the short segment is a reference to an invalid page since it means it's a record which predates the records which caused the extension. Well, that would be the case if you assume perfectly sequential filesystem behavior, but I'm not sure the assumption holds if the starting condition is a base backup. We could be looking at a version of segment 1 that predates segment 2's existence, and yet see some data in segment 2 as well, because it's not a perfectly coherent snapshot. I think what you're arguing is that we should see WAL records filling the rest of segment 1 before we see any references to segment 2, but if that's the case then how did we get into the situation you reported? Or is it just that it was a broken base backup to start with? regards, tom lane
Re: [HACKERS] Row-security on updatable s.b. views
On 02/11/2014 08:19 PM, Yeb Havinga wrote: I compared the output of psql -ef of the minirim.sql script posted earlier in http://www.postgresql.org/message-id/52f54927.1040...@gmail.com between v4 and v7. Not everything is ok.

+psql:/home/m/minirim2.sql:409: ERROR:  attribute 6 has wrong type
+DETAIL:  Table has type tsrange, but query expects _confidentialitycode.

This looks like an issue with attribute transformations made when applying security barrier quals.

+psql:/home/m/minirim2.sql:439: connection to server was lost

That's dying with:

Program received signal SIGABRT, Aborted.
0x003f02c359e9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x003f02c359e9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x003f02c370f8 in __GI_abort () at abort.c:90
#2  0x00758d9d in ExceptionalCondition (conditionName=conditionName@entry=0x8b7bf0 "!(!bms_is_empty(upper_varnos))", errorType=errorType@entry=0x78e89c "FailedAssertion", fileName=fileName@entry=0x8b78d7 "subselect.c", lineNumber=lineNumber@entry=1394) at assert.c:54
#3  0x0061de2b in convert_EXISTS_sublink_to_join (root=<optimized out>, sublink=<optimized out>, under_not=under_not@entry=0 '\000', available_rels=0x117d038) at subselect.c:1394
#4  0x0061efa9 in pull_up_sublinks_qual_recurse (root=root@entry=0x117d058, node=0x117a0f8, jtlink1=jtlink1@entry=0x7fff6d97f708, available_rels1=available_rels1@entry=0x117d038, jtlink2=jtlink2@entry=0x0, available_rels2=available_rels2@entry=0x0) at prepjointree.c:391
#5  0x0061f339 in pull_up_sublinks_jointree_recurse (root=root@entry=0x117d058, jtnode=0x117a040, relids=relids@entry=0x7fff6d97f758) at prepjointree.c:208
#6  0x0062066f in pull_up_sublinks (root=root@entry=0x117d058) at prepjointree.c:147
#7  0x0061967e in subquery_planner (glob=0x10c3fb8, parse=parse@entry=0x1179af8, parent_root=parent_root@entry=0x1182fd0, hasRecursion=hasRecursion@entry=0 '\000', ...
#8  0x005fc093 in set_subquery_pathlist (root=root@entry=0x1182fd0, rel=rel@entry=0x1179370, rti=rti@entry=4, rte=rte@entry=0x1183980) at allpaths.c:1209
#9  0x005fc54e in set_rel_size (root=root@entry=0x1182fd0, rel=0x1179370, rti=rti@entry=4, rte=0x1183980) at allpaths.c:265
#10 0x005fc62e in set_base_rel_sizes (root=root@entry=0x1182fd0) at allpaths.c:180
#11 0x005fd584 in make_one_rel (root=root@entry=0x1182fd0, joinlist=joinlist@entry=0x1179560) at allpaths.c:138
#12 0x0061617a in query_planner (root=root@entry=0x1182fd0, tlist=tlist@entry=0x11771c8, qp_callback=qp_callback@entry=0x616fd6 <standard_qp_callback>, qp_extra=qp_extra@entry=0x7fff6d97fa00) at planmain.c:236
#13 0x006182f2 in grouping_planner (root=root@entry=0x1182fd0, tuple_fraction=tuple_fraction@entry=0) at planner.c:1280
#14 0x00619a11 in subquery_planner (glob=glob@entry=0x10c3fb8, parse=parse@entry=0x10c3d30, parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=0 '\000', tuple_fraction=0, subroot=subroot@entry=0x7fff6d97fb58) at planner.c:574
#15 0x00619c1f in standard_planner (parse=0x10c3d30, cursorOptions=0, boundParams=0x0) at planner.c:211
#16 0x00619e35 in planner (parse=parse@entry=0x10c3d30, cursorOptions=cursorOptions@entry=0, boundParams=boundParams@entry=0x0) at planner.c:139
#17 0x0068c64b in pg_plan_query (querytree=0x10c3d30, cursorOptions=cursorOptions@entry=0, boundParams=boundParams@entry=0x0) at postgres.c:759
#18 0x0068c6d3 in pg_plan_queries (querytrees=<optimized out>, cursorOptions=cursorOptions@entry=0, boundParams=boundParams@entry=0x0) at postgres.c:818
#19 0x0068c932 in exec_simple_query (query_string=query_string@entry=0x10c2d78 "SELECT * FROM hl7.test;") at postgres.c:983
#20 0x0068e46e in PostgresMain (argc=<optimized out>, argv=argv@entry=0x10cd1f0, dbname=0x10cd058 "regress", username=<optimized out>) at postgres.c:4003
#21 0x006419a7 in BackendRun (port=port@entry=0x10c7e50) at postmaster.c:4080
#22 0x006432c2 in BackendStartup (port=port@entry=0x10c7e50) at postmaster.c:3769
#23 0x00643526 in ServerLoop () at postmaster.c:1580
#24 0x0064467c in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x10a8750) at postmaster.c:1235
#25 0x005d692c in main (argc=4, argv=0x10a8750) at main.c:205
(gdb)

It's crashing while pulling up the query over emp (hl7.employee) and part (hl7.participation). Given the simplicity of what the row-security code itself is doing, I'm wondering if this is a case that isn't handled in updatable s.b. views. I'll look into it. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 12, 2014 at 8:19 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Feb 12, 2014 at 10:02:32AM +0530, Amit Kapila wrote: I think 99.9% of users are never going to adjust this, so we had better choose something we are happy to enable for effectively everyone. In my reading, prefix/suffix seemed safe for everyone. We can always revisit this if we think of something better later, as WAL format changes are not a problem for pg_upgrade. Agreed. I also think a user-tunable knob is so hard for users to know when to adjust that it is almost not worth the user-interface complexity it adds. I suggest we go with always-on prefix/suffix mode, then add some check so the worst case is avoided by just giving up on compression. As I said previously, I think compressing the page images is the next big win in this area. I think one might argue here that for some users it is not feasible to decide whether their tuple data for UPDATE is going to be similar or completely different; they are not at all ready for any risk of CPU overhead, but they would be happy to see the I/O reduction, in which case it is difficult to decide what the value of a table-level switch should be. Here I think the only answer is that nothing is free in this world, so either make sure of the application's behaviour for UPDATE statements before going to production, or just don't enable the switch and be happy with the current behaviour. Again, can't we do a minimal attempt at prefix/suffix compression so there is no measurable overhead? Yes, currently that is there at 25%: there must be at least a 25% match in prefix/suffix before we consider a tuple for compression, and that check is pretty fast with almost no overhead. The worst case here is the other way around, i.e. when the string has a 25% prefix/suffix match but there is no match (or none in the next few bytes) after that.
For example, consider this case:

Case 1:
old tuple: aaa
new tuple: bbba

Here there is a suffix match for 25% of the string, but after that there is no match, so we have to copy the remaining 75% of the bytes as-is, byte by byte. With somewhat longer tuples (800 bytes), the performance data I took shows around ~11% CPU overhead. This is a fabricated test just to see how much extra CPU the worst scenario consumes; in reality users might not see it, at least with synchronous commit on, because there is always some I/O involved at end of transaction (unless there is an error in between or the user rolls back the transaction, the chances of which are very low). The first thing that comes to mind for the above scenario is to raise the 25% minimum limit, because comparing prefix/suffix has almost negligible overhead. I tried increasing it to 35% or more, but then we start losing from the other side: for cases with, say, a 34% match we still give up on compression. One improvement that could be made here is that, after the prefix/suffix match, instead of a byte-by-byte copy as per the LZ format we could directly copy the whole remaining part of the tuple. That would require a format different from LZ, which is also not too difficult to do, but the question is whether we really need such a change to handle this kind of worst case. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
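The prefix/suffix matching and 25% minimum-match threshold discussed above can be sketched as follows. This is an illustrative sketch in C; the function name, signature, and exact threshold arithmetic are assumptions for illustration, not the patch's actual code:

```c
#include <stddef.h>

/* Sketch of the prefix/suffix matching discussed above.  Names and the
 * exact handling of the 25% threshold are illustrative assumptions. */
static int
match_prefix_suffix(const char *old_tup, size_t old_len,
                    const char *new_tup, size_t new_len,
                    size_t *prefix, size_t *suffix)
{
    size_t  max = (old_len < new_len) ? old_len : new_len;
    size_t  p = 0, s = 0;

    /* Longest common prefix. */
    while (p < max && old_tup[p] == new_tup[p])
        p++;

    /* Longest common suffix of the parts not covered by the prefix. */
    while (s < max - p &&
           old_tup[old_len - 1 - s] == new_tup[new_len - 1 - s])
        s++;

    *prefix = p;
    *suffix = s;

    /* Only consider compression when at least 25% of the new tuple
     * matches, mirroring the minimum-match heuristic in the mail. */
    return (p + s) * 4 >= new_len;
}
```

Note that the "aaa" vs. "bbba" worst case above passes this check at exactly 25%, which is why the remaining 75% then gets copied byte by byte.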
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
8. I am not able to find a protection mechanism in insert/delete etc. of a tuple in the T-tree. As this is shared memory, it can cause problems. For design simplification, I put a giant lock per columnar cache. So, routines in cscan.c acquire an exclusive lwlock prior to invocation of ccache_insert_tuple / ccache_delete_tuple. Correct. But this lock can be a bottleneck for concurrency. Better to analyze this once we have the performance report. Well, concurrent updates towards a particular table may cause lock contention due to the giant lock. On the other hand, one of my headaches is how to avoid deadlocking if we try to implement it using finer-granularity locking. Assume per-chunk locking: it also needs to take a lock on the neighbor nodes when a record is moved out. Concurrently, some other process may try to move another record in the inverse order. That is a recipe for deadlock. Is there an idea or reference for implementing concurrent tree-structure updates? Anyway, it is a good idea to measure the impact of concurrent updates on cached tables, to find out the significance of lock splitting. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -----Original Message----- From: Haribabu Kommi [mailto:kommi.harib...@gmail.com] Sent: Thursday, February 13, 2014 8:31 AM To: Kohei KaiGai Cc: Kaigai, Kouhei(海外, 浩平); Tom Lane; PgHacker; Robert Haas Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?) On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com: 7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups) can go wrong when only one tuple is present in the block, with an item pointer equal to the one we are searching for in the forward scan direction. It shouldn't happen, because the first or second ItemPointerCompare will handle the condition. Please assume cchunk->ntups == 1.
In this case, any given ctid shall match either of them, because any ctid is less than, equal to, or larger than the only cached tuple; thus it moves to the right or left node according to the scan direction. Yes, you are correct. Sorry for the noise. 8. I am not able to find a protection mechanism in insert/delete etc. of a tuple in the T-tree. As this is shared memory, it can cause problems. For design simplification, I put a giant lock per columnar cache. So, routines in cscan.c acquire an exclusive lwlock prior to invocation of ccache_insert_tuple / ccache_delete_tuple. Correct. But this lock can be a bottleneck for concurrency. Better to analyze this once we have the performance report. Regards, Hari Babu Fujitsu Australia
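The opposite-order chunk-locking deadlock discussed above is classically avoided by imposing a canonical lock-acquisition order, so that every backend locks the same pair of chunks in the same order. A minimal sketch, using pthread mutexes as stand-ins for LWLocks; the struct and function names here are hypothetical, not cache_scan's actual code:

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-chunk structure; a pthread mutex stands in for a
 * PostgreSQL per-chunk LWLock. */
typedef struct ccache_chunk
{
    pthread_mutex_t lock;       /* per-chunk lock */
    /* ... cached tuples would live here ... */
} ccache_chunk;

/*
 * Lock two chunks in a fixed (address) order.  Two backends moving
 * records between the same pair of chunks in opposite directions then
 * both contend on the lower-addressed chunk first, so neither can hold
 * one lock while waiting on the other: the cycle needed for deadlock
 * cannot form.
 */
static void
lock_chunk_pair(ccache_chunk *a, ccache_chunk *b)
{
    if ((uintptr_t) a < (uintptr_t) b)
    {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    }
    else
    {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void
unlock_chunk_pair(ccache_chunk *a, ccache_chunk *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}
```

This addresses only the two-chunk move case; rebalancing that touches a whole path of T-tree nodes would need the same ordering discipline applied along the path, which is where the real difficulty KaiGai describes comes in.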
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Feb 13, 2014 at 1:20 AM, Amit Kapila amit.kapil...@gmail.com wrote: Here one of the improvements which can be made is that, after the prefix/suffix match, instead of going byte by byte as per the LZ format we can directly copy all the remaining part of the tuple, but I think that would require us to use some format different from LZ, which is also not too difficult to do; the question is whether we really need such a change to handle the above kind of worst case. Why use LZ at all? Why not *only* prefix/suffix?
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Feb 13, 2014 at 10:07 AM, Claudio Freire klaussfre...@gmail.com wrote: On Thu, Feb 13, 2014 at 1:20 AM, Amit Kapila amit.kapil...@gmail.com wrote: Here one of the improvements which can be made is that, after the prefix/suffix match, instead of going byte by byte as per the LZ format we can directly copy all the remaining part of the tuple, but I think that would require us to use some format different from LZ, which is also not too difficult to do; the question is whether we really need such a change to handle the above kind of worst case. Why use LZ at all? We are just using the LZ *format* to represent the compressed string. Quoting some text from pg_lzcompress.c to explain exactly what we are using: the first byte after the header tells what to do the next 8 times. We call this the control byte. An unset bit in the control byte means that one uncompressed byte follows, which is copied from input to output. A set bit in the control byte means that a tag of 2-3 bytes follows. A tag contains the information needed to copy some bytes, already present in the output buffer, to the current location in the output. Why not *only* prefix/suffix? To represent a prefix/suffix match, we at least need a way to record the offset and length of the matched bytes and the length of the unmatched bytes we copied. I agree that a simpler format could be devised if we just want to do prefix/suffix matching, but that would require much more testing during recovery to ensure everything is fine. The advantage of the LZ format is that we don't need to bother about decoding; it works without much change to the LZ decode routine. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
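The control-byte/tag scheme quoted above can be illustrated with a toy decoder. This sketch is modeled on the tag layout in pg_lzcompress.c (a 12-bit back-reference offset, a 4-bit length nibble encoding length minus 3, and an optional extension byte when the nibble is saturated); it is an illustration of the format, not the actual PostgreSQL routine:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy decoder for the LZ format described above.  Assumes well-formed
 * input and a sufficiently large output buffer; returns the number of
 * bytes written to dst. */
static size_t
lz_decompress(const uint8_t *src, size_t srclen, uint8_t *dst)
{
    const uint8_t *sp = src;
    const uint8_t *srcend = src + srclen;
    uint8_t       *dp = dst;

    while (sp < srcend)
    {
        uint8_t ctrl = *sp++;
        int     bit;

        /* The control byte covers the next 8 items: bit clear means one
         * literal byte, bit set means a 2-3 byte back-reference tag. */
        for (bit = 0; bit < 8 && sp < srcend; bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                /* Tag: length 3..273, offset up to 4095. */
                int len = (sp[0] & 0x0f) + 3;
                int off = ((sp[0] & 0xf0) << 4) | sp[1];

                sp += 2;
                if (len == 18)          /* saturated nibble: extension byte */
                    len += *sp++;

                /* Byte-by-byte copy from earlier output; this naturally
                 * handles overlapping matches. */
                while (len-- > 0)
                {
                    *dp = dp[-off];
                    dp++;
                }
            }
            else
                *dp++ = *sp++;          /* literal byte */
        }
    }
    return (size_t) (dp - dst);
}
```

This also shows why Amit's prefix/suffix scheme fits the format so cheaply: a matched prefix or suffix is just one or two tags, and the unmatched middle is a run of literal bytes, so the existing decode routine needs no change.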
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 11, 2014 at 11:37 AM, Bruce Momjian br...@momjian.us wrote: Yes, that was my point. I thought the compression of full-page images was a huge win and that the compression was pretty straightforward, except for the compression algorithm. If the compression algorithm issue is resolved, can we move forward with the full-page compression patch? Discussion of the full-page compression patch properly belongs on that thread rather than this one. However, based on what we've discovered so far here, I won't be very surprised if that patch turns out to have serious problems with CPU consumption. The evidence from this thread suggests that making even relatively lame attempts at compression is extremely costly in terms of CPU overhead. Now, the issues with straight-up compression are somewhat different than for delta compression and, in particular, it's easier to bail out of straight-up compression sooner if things aren't working out. But even with all that, I expect it to be not too difficult to find cases where some compression is achieved but with a dramatic increase in runtime on CPU-bound workloads, which is basically the same problem this patch has. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company