Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
Hello, Please find the new version of the patch attached to this mail containing the above changes. This patch applies cleanly on current HEAD and the build completed successfully on both Windows and Linux. (but master needed to be rewound to some time ago because of some compile errors.) This works correctly as before. Finally, before sending it to committers, would you mind changing the name of the segment from Global/PostgreSQL.%u to Global/PostgreSQL.dsm.%u or something similar, as mentioned in my last email? However, it is a bit different thing from this patch so I have no intention of compelling the change. The orphan section handles on postmaster have become a matter of documentation. I had explained this in the function header of dsm_keep_segment(). The comment satisfies me. Thank you. I had added a new function in dsm_impl.c for platform-specific handling and explained the new behaviour in its function header. This seems quite clear to me. - Global/PostgreSQL.%u is the same literal constant as the one that occurs in dsm_impl_windows. It should be defined as a constant (or a macro). Changed to a #define. Thank you. - dsm_impl_windows uses errcode_for_dynamic_shared_memory() for ereport and it finally falls down to errcode_for_file_access(). I think it is preferable, maybe Changed as per suggestion. I saw it done. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
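For readers following along, the #define change discussed above amounts to something like the following sketch. The constant and helper names here are illustrative stand-ins, not the patch's actual identifiers:

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/*
 * Hypothetical sketch: the Windows segment-name template pulled into one
 * #define so dsm_impl_windows() and any future callers stay consistent.
 * The real macro name in the patch may differ.
 */
#define PG_DYNSHMEM_WINDOWS_NAME_TEMPLATE "Global/PostgreSQL.%u"

/* Format the name for a given dsm handle into buf. */
static void
make_segment_name(char *buf, size_t buflen, unsigned int handle)
{
	snprintf(buf, buflen, PG_DYNSHMEM_WINDOWS_NAME_TEMPLATE, handle);
}
```

With a single template in one place, renaming the segments later (e.g. to Global/PostgreSQL.dsm.%u, as proposed) becomes a one-line change.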
Re: [HACKERS] Patch: compiling the docs under Gentoo
Hi, On Tuesday 11 February 2014 16:04:30 Peter Eisentraut wrote: On 1/30/14, 2:42 AM, Christian Kruse wrote: +Since Gentoo often supports different versions of a package to be +installed you have to tell the PostgreSQL build environment where the +Docbook DTD is located: +<programlisting> +cd /path/to/postgresql/sources/doc +make DOCBOOKSTYLE=/usr/share/sgml/docbook/sgml-dtd-4.2 +</programlisting> This is wrong. To be honest, I noticed a few days ago that this is unnecessary. Just installing the right packages already solved the problem; it was a fallacy that setting DOCBOOKSTYLE helped. I just haven't had the time to send a new version of the patch yet… Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
Hi Heikki, I need a PostgreSQL 9.3 release that fixes this problem. It didn't happen in PostgreSQL 9.2, so I agree with your proposal to make the changes against REL9_3_STABLE and master. Can you fix this in the next release (9.3.3)? Tomonari Katsumata (2014/01/13 20:16), Heikki Linnakangas wrote: On 01/09/2014 10:55 PM, Josh Berkus wrote: On 01/09/2014 12:05 PM, Heikki Linnakangas wrote: Actually, why is the partially-filled 00010002 file archived in the first place? Looking at the code, it's been like that forever, but it seems like a bad idea. If the original server is still up and running, and writing more data to that file, what will happen is that when the original server later tries to archive it, it will fail because the partial version of the file is already in the archive. Or worse, the partial version overwrites a previously archived more complete version. Oh! This explains some transient errors I've seen. Wouldn't it be better to not archive the old segment, and instead switch to a new segment after writing the end-of-recovery checkpoint, so that the segment on the new timeline is archived sooner? It would be better to zero-fill and switch segments, yes. We should NEVER be in a position of archiving two different versions of the same segment. Ok, I think we're in agreement that that's the way to go for master. Now, what to do about back-branches? On one hand, I'd like to apply the same fix to all stable branches, as the current behavior is silly and always has been. On the other hand, we haven't heard any complaints about it, so we probably shouldn't fix what ain't broken. Perhaps we should apply it to 9.3, as that's where we have the acute problem the OP reported. Thoughts? In summary, I propose that we change master and REL9_3_STABLE to not archive the partial segment from previous timeline. Older branches will keep the current behavior. 
- Heikki
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
On Wed, Feb 12, 2014 at 2:02 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, Please find new version of patch attached with this mail containing above changes. This patch applies cleanly on current HEAD and build completed successfully on both Windows and Linux. (but master needed to be rewound to some time ago for some compile errors.) This works correctly as before. Finally, before sending to committers, would you mind changing the name of the segment from Global/PostgreSQL.%u to Global/PostgreSQL.dsm.%u or something mentioned in my last email? Actually in that case, to maintain consistency we would need to change even function dsm_impl_posix(), which uses the segment name /PostgreSQL.%u. For Windows, we have added Global to /PostgreSQL.%u, as it is mandated by the Windows spec. To be honest, I see no harm in changing the name as per your suggestion, as it can improve segment naming for dynamic shared memory segments; however, there is no clear problem with the current name either, so I don't want to make changes in places this patch has no relation to. I think the best thing to do here is to put it as Notes To Committer, something like: Some suggestions for Committer to consider: Change the name of dsm segments from .. to .. In general, what I see is that they consider all discussion in a thread, but putting a special note like the above reduces the chance of it getting overlooked. I have done this as a reviewer previously and it worked well. However, it is a bit different thing from this patch so I have no intention to compel to do the changing. Thanks for understanding my position. Thanks for reviewing the patch so carefully, especially the Windows part, which I think was a bit tricky for you to set up. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication
On 2014-02-11 09:15:45 -0500, Robert Haas wrote: If I understand correctly, modifying PgBackendStatus adds additional fields to the shared memory data structure that are never used and will be returned by functions like pgstat_fetch_stat_beentry() uninitialized. That seems both inefficient and a pitfall for the unwary. I don't think they will be uninitialized; pgstat_fetch_stat_beentry() will do a pgstat_read_current_status() if necessary, which will initialize them. But they do take up shared memory without really needing to. I personally don't find that too bad, it's not much memory. If we want to avoid it we could have a LocalPgBackendStatus that includes the normal PgBackendStatus. Since pgstat_read_current_status() copies all the data locally, that'd be a sensible point to fill it. While that will cause a bit of churn, I'd guess we can use the infrastructure in the not too far away future for other parts. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
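A sketch of the LocalPgBackendStatus idea Andres describes: a backend-local wrapper struct that embeds the shared-memory struct and adds the extra fields only on the local copy, so they cost no shared memory. The struct layout and field names here are assumptions for illustration, not a committed design:

```c
#include <string.h>
#include <assert.h>

typedef unsigned int TransactionId;

/* Shared-memory per-backend slot: keep this as small as possible. */
typedef struct PgBackendStatus
{
	int			st_procpid;
	/* ... existing fields ... */
} PgBackendStatus;

/*
 * Backend-local copy produced when reading current status.  The extra
 * fields live only here, so they take up no shared memory and are always
 * filled in before anyone can see them.
 */
typedef struct LocalPgBackendStatus
{
	PgBackendStatus backendStatus;	/* copied from shared memory */
	TransactionId backend_xid;		/* filled in during the local copy */
	TransactionId backend_xmin;
} LocalPgBackendStatus;

/* Fill a local entry from a shared slot plus per-backend xid info. */
static void
fill_local_entry(LocalPgBackendStatus *local,
				 const PgBackendStatus *shared,
				 TransactionId xid, TransactionId xmin)
{
	memcpy(&local->backendStatus, shared, sizeof(PgBackendStatus));
	local->backend_xid = xid;
	local->backend_xmin = xmin;
}
```

Since pgstat_read_current_status() already copies every shared slot into backend-local memory, filling the extra fields at that point adds no additional locking.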
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
Re: Heikki Linnakangas 2014-01-13 52d3caff.3010...@vmware.com Actually, why is the partially-filled 00010002 file archived in the first place? Looking at the code, it's been like that forever, but it seems like a bad idea. If the original server is still up and running, and writing more data to that file, what will happen is that when the original server later tries to archive it, it will fail because the partial version of the file is already in the archive. Or worse, the partial version overwrites a previously archived more complete version. Oh! This explains some transient errors I've seen. Wouldn't it be better to not archive the old segment, and instead switch to a new segment after writing the end-of-recovery checkpoint, so that the segment on the new timeline is archived sooner? It would be better to zero-fill and switch segments, yes. We should NEVER be in a position of archiving two different versions of the same segment. Ok, I think we're in agreement that that's the way to go for master. Now, what to do about back-branches? On one hand, I'd like to apply the same fix to all stable branches, as the current behavior is silly and always has been. On the other hand, we haven't heard any complaints about it, so we probably shouldn't fix what ain't broken. Perhaps we should apply it to 9.3, as that's where we have the acute problem the OP reported. Thoughts? In summary, I propose that we change master and REL9_3_STABLE to not archive the partial segment from previous timeline. Older branches will keep the current behavior. I've seen the "can't archive file from the old timeline" problem on 9.2 and 9.3 slaves after promotion. The problem is in conjunction with the proposed archive_command in the default postgresql.conf comments: # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' With 9.1, it works, but 9.2 and 9.3 don't archive anything until I remove the test ! -f part. 
(An alternative fix would be to declare the behavior ok and adjust that example in the config.) I've always blamed 9.2+'s cascading replication for this, but haven't investigated deeper. Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
From: Andres Freund and...@2ndquadrant.com It's x86, right? Then it's unlikely to be actual unordered memory accesses, but if the compiler reordered:

LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);

to

LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
proc->lwWaiting = false;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
PGSemaphoreUnlock(&proc->sem);

which it is permitted to do, yes, that could cause symptoms like you describe. Yes, the hang occurred with 64-bit PostgreSQL 9.2.4 running on RHEL6 for x86_64. The PostgreSQL was built with GCC. Any chance you have the binaries the customer ran back then around? Disassembling that piece of code might give you a hint whether that's a possible cause. I'm sorry I can't provide the module, but I attached the disassembled code for LWLockRelease and LWLockAcquire in the executable. I'm not sure whether this proves something. FYI, the following stack traces are the ones obtained during two instances of hang. 
#0  0x0036102eaf77 in semop () from /lib64/libc.so.6
#1  0x00614707 in PGSemaphoreLock ()
#2  0x00659d5b in LWLockAcquire ()
#3  0x0047983d in RelationGetBufferForTuple ()
#4  0x00477f86 in heap_insert ()
#5  0x005a4a12 in ExecModifyTable ()
#6  0x0058d928 in ExecProcNode ()
#7  0x0058c762 in standard_ExecutorRun ()
#8  0x7f0cb37f99cb in pgss_ExecutorRun () from /opt/symfoserver64/lib/pg_stat_statements.so
#9  0x7f0cb357f545 in explain_ExecutorRun () from /opt/symfoserver64/lib/auto_explain.so
#10 0x0066a59e in ProcessQuery ()
#11 0x0066a7ef in PortalRunMulti ()
#12 0x0066afd2 in PortalRun ()
#13 0x00666fcb in exec_simple_query ()
#14 0x00668058 in PostgresMain ()
#15 0x00622ef1 in PostmasterMain ()
#16 0x005c0723 in main ()

#0  0x0036102eaf77 in semop () from /lib64/libc.so.6
#1  0x00614707 in PGSemaphoreLock ()
#2  0x00659d5b in LWLockAcquire ()
#3  0x0047983d in RelationGetBufferForTuple ()
#4  0x00477f86 in heap_insert ()
#5  0x005a4a12 in ExecModifyTable ()
#6  0x0058d928 in ExecProcNode ()
#7  0x0058c762 in standard_ExecutorRun ()
#8  0x7f0cb37f99cb in pgss_ExecutorRun () from /opt/symfoserver64/lib/pg_stat_statements.so
#9  0x7f0cb357f545 in explain_ExecutorRun () from /opt/symfoserver64/lib/auto_explain.so
#10 0x0066a59e in ProcessQuery ()
#11 0x0066a7ef in PortalRunMulti ()
#12 0x0066afd2 in PortalRun ()
#13 0x00666fcb in exec_simple_query ()
#14 0x00668058 in PostgresMain ()
#15 0x00622ef1 in PostmasterMain ()
#16 0x005c0723 in main ()

#0  0x0036102eaf77 in semop () from /lib64/libc.so.6
#1  0x00614707 in PGSemaphoreLock ()
#2  0x00659d5b in LWLockAcquire ()
#3  0x0064bb8c in ProcArrayEndTransaction ()
#4  0x00491216 in CommitTransaction ()
#5  0x004925a5 in CommitTransactionCommand ()
#6  0x00664cf7 in finish_xact_command ()
#7  0x00667145 in exec_simple_query ()
#8  0x00668058 in PostgresMain ()
#9  0x00622ef1 in PostmasterMain ()
#10 0x005c0723 in main ()

Regards MauMau

Dump of assembler code for function LWLockRelease:
0x00647d40 <LWLockRelease+0>:   push   %r12
0x00647d42 <LWLockRelease+2>:   mov    %edi,%r12d
0x00647d45 <LWLockRelease+5>:   shl    $0x5,%r12
0x00647d49 <LWLockRelease+9>:   add    5192344(%rip),%r12        # 0xb3b7e8 <LWLockArray>
0x00647d50 <LWLockRelease+16>:  push   %rbp
0x00647d51 <LWLockRelease+17>:  mov    %edi,%ebp
0x00647d53 <LWLockRelease+19>:  push   %rbx
0x00647d54 <LWLockRelease+20>:  mov    5192326(%rip),%ebx        # 0xb3b7e0 <num_held_lwlocks>
0x00647d5a <LWLockRelease+26>:  nopw   0x0(%rax,%rax,1)
0x00647d60 <LWLockRelease+32>:  sub    $0x1,%ebx
0x00647d63 <LWLockRelease+35>:  js     0x647ea4 <LWLockRelease+356>
0x00647d69 <LWLockRelease+41>:  movslq %ebx,%rax
0x00647d6c <LWLockRelease+44>:  cmp    %ebp,0xb3b800(,%rax,4)
0x00647d73 <LWLockRelease+51>:  jne    0x647d60 <LWLockRelease+32>
0x00647d75 <LWLockRelease+53>:  mov    5192293(%rip),%esi        # 0xb3b7e0 <num_held_lwlocks>
0x00647d7b <LWLockRelease+59>:  sub    $0x1,%esi
0x00647d7e <LWLockRelease+62>:  cmp    %ebx,%esi
0x00647d80 <LWLockRelease+64>:  mov    %esi,5192282(%rip)        # 0xb3b7e0 <num_held_lwlocks>
0x00647d86 <LWLockRelease+70>:  jg     0x647d92 <LWLockRelease+82>
0x00647d88 <LWLockRelease+72>:  jmp    0x647dad <LWLockRelease+109>
0x00647d8a <LWLockRelease+74>:  nopw   0x0(%rax,%rax,1)
0x00647d90 <LWLockRelease+80>:  mov    %ecx,%ebx
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On 2014-02-12 20:55:32 +0900, MauMau wrote: Dump of assembler code for function LWLockRelease: Could you try whether you get more readable dumps by using disassemble/m? That might at least print line numbers if you have debug info installed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery
From: Andres Freund and...@2ndquadrant.com On 2014-02-02 23:50:40 +0900, Fujii Masao wrote: Right. If standby_mode is enabled, checkpoint_segments can trigger the restartpoint. But the problem is that the timing of restartpoint depends on not only the checkpoint parameters (i.e., checkpoint_timeout and checkpoint_segments) that are used during archive recovery but also the checkpoint WAL that was generated by the master. Sure. But we really *need* all the WAL since the last checkpoint's redo location locally to be safe. For example, consider the case where the master generated only one checkpoint WAL record since the last backup and it crashed with database corruption. Then the DBA decided to perform normal archive recovery by using the last backup. In this case, even if the DBA reduces both checkpoint_timeout and checkpoint_segments, only one restartpoint can occur during recovery. This low frequency of restartpoints might fill up the disk space with lots of WAL files. I am not sure I understand the point of this scenario. If the primary crashed after a checkpoint, there won't be that much WAL since it happened... If the issue is that you're not using standby_mode (if so, why?), then the fix maybe is to make that apply to a wider range of situations. I guess that he is not using standby_mode because, according to his first email in this thread, he said he would like to prevent WAL from accumulating in pg_xlog during normal archive recovery (i.e., PITR). Well, that doesn't necessarily prevent you from using standby_mode... But yes, that might be the case. I wonder if we shouldn't just always look at checkpoint_segments during !crash recovery. Maybe we could consider that direction, but there is a problem. Archive recovery slows down compared to 9.1, because of repeated restartpoints. Archive recovery should be as fast as possible, because it typically applies dozens or hundreds of WAL files, and the DBA desires immediate resumption of operation. 
So, I think we should restore the 9.1 behavior for archive recovery. The attached patch keeps restored archived WAL in pg_xlog/ only during standby recovery. It is based on Fujii-san's revision of the patch, with the AllowCascadeReplication() condition removed from two if statements. Regards MauMau
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
From: Andres Freund and...@2ndquadrant.com Could you try whether you get more readable dumps by using disassemble/m? That might at least print line numbers if you have debug info installed. OK, I'll try that tomorrow. However, the debug info is not available, because they use a PostgreSQL they built themselves, not the community RPM nor EnterpriseDB's installer. Regards MauMau
Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery
On 2014-02-12 21:23:54 +0900, MauMau wrote: Maybe we could consider that direction, but there is a problem. Archive recovery slows down compared to 9.1, because of repeated restartpoints. Archive recovery should be as fast as possible, because it typically applies dozens or hundreds of WAL files, and the DBA desires immediate resumption of operation. So, I think we should restore the 9.1 behavior for archive recovery. It's easy to be fast if it's not correct. I don't see how that behaviour is acceptable; imo the previous behaviour simply was a bug whose fix was too invasive to backpatch. I don't think you can convince me (but maybe others) that the old behaviour is acceptable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
So I think I've come up with a scenario that could cause this. I don't think it's exactly what happened here but maybe something analogous happened with our base backup restore. On the primary you extend a table a bunch, including adding new segments, but crash before committing (or checkpointing). Then some of the blocks but not all may be written to disk. Assume they're all written except for the last block of the first file. So what you have is a .999G file followed by, say, 9 1G files. (Or maybe the hot backup process could just catch the files in this state if a table is rapidly growing and it doesn't take care to avoid picking up new files that appear after it starts?) smgrnblocks() stops at the first 1GB segment and ignores the rest. This code in xlog uses it to calculate how many blocks to add but it only calls it once and then doesn't recheck where it's at as it extends the relation. As soon as it adds that one missing block the remaining files become visible. P_NEW always recalculates the position based on smgrnblocks each time (which sounds pretty inefficient, but anyway) so it will add the requested blocks to the new end. Now this isn't enough to explain things, since surely the extension records would be in the xlog in physical order. But this could have all happened after an earlier vacuum truncated the relation, and we could be replaying records that predate that. So in short, if you have a 10G table and want to overwrite the last block but the first segment is one block short, then xlog will add 9G to the end and write the block there. That sounds like what we've seen. I think the easy fix is to change the code in xlogutils to be more defensive and stop as soon as it finds BufferGetBlockNumber(buffer) == blkno (which is what it has in the assert already). -- greg
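The arithmetic behind this scenario: with the default 8kB blocks and 1GB segments, each segment holds 131072 blocks, so a first segment that is even one block short hides every later segment from smgrnblocks(). A minimal illustration (constants from the stock build options; the helper is a sketch of the reporting behaviour described above, not the actual smgr code):

```c
#include <assert.h>

#define BLCKSZ 8192
#define RELSEG_SIZE (1024 * 1024 * 1024 / BLCKSZ)	/* blocks per 1GB segment */

/*
 * Models how smgrnblocks() scans segment files: it stops at the first
 * segment shorter than RELSEG_SIZE.  So a 10GB relation whose first
 * segment is one block short reports only RELSEG_SIZE - 1 blocks,
 * hiding the other nine segments entirely.
 */
static unsigned
visible_nblocks(unsigned first_seg_blocks, unsigned later_full_segments)
{
	if (first_seg_blocks < RELSEG_SIZE)
		return first_seg_blocks;	/* later segments are invisible */
	return RELSEG_SIZE + later_full_segments * RELSEG_SIZE;
}
```

Once replay writes that one missing block, the relation suddenly "grows" by the hidden segments, which is why the extension loop must recheck its position rather than trust the initial count.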
[HACKERS] Could you include bug fix for Windows ASLR issue in the next release?
Could you include this simple patch in the upcoming minor releases 9.3.3/9.2.7/9.1.12? This is very scary. This problem could cause much random trouble among Windows users. https://commitfest.postgresql.org/action/patch_view?id=1409 Regards MauMau
Re: [HACKERS] DATE type output does not follow datestyle parameter
From: Bruce Momjian br...@momjian.us On Mon, Dec 2, 2013 at 10:22:47PM +0900, MauMau wrote: So, my suggestion is to just add the following sentence right after the above one. The Postgres style is an exception: the output of the date type is either MM-DD-YYYY (e.g. 12-17-1997) or DD-MM-YYYY (e.g. 17-12-1997), which is different from the date part of the output of the timestamp type. Could you consider and add this to the manual? Yes, I will make the change unless someone objects. Could you commit this if you feel it is okay? I'm sorry if I missed the modified article in the devel docs. Regards MauMau
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On Feb 12, 2014, at 12:55, MauMau maumau...@gmail.com wrote: From: Andres Freund and...@2ndquadrant.com It's x86, right? Then it's unlikely to be actual unordered memory accesses, but if the compiler reordered:

LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);

to

LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
proc->lwWaiting = false;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
PGSemaphoreUnlock(&proc->sem);

which it is permitted to do, yes, that could cause symptoms like you describe. Yes, the hang occurred with 64-bit PostgreSQL 9.2.4 running on RHEL6 for x86_64. The PostgreSQL was built with GCC. The relevant part of the disassembled binary you attached seems to be

Dump of assembler code for function LWLockRelease:
...
0x00647f47 <LWLockRelease+519>: lea    0x10(%rcx),%rdi
0x00647f4b <LWLockRelease+523>: movq   $0x0,0x48(%rcx)
0x00647f53 <LWLockRelease+531>: movb   $0x0,0x41(%rcx)
0x00647f57 <LWLockRelease+535>: callq  0x606210 <PGSemaphoreUnlock>

I haven't checked the offsets, but since lwWaitLink is an 8-byte quantity and lwWaiting a single-byte quantity, it's pretty much certain that the first store updates lwWaitLink and the second lwWaiting. Thus, no reordering seems to have taken place here... best regards, Florian Pflug
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On 02/12/2014 05:42 PM, Florian Pflug wrote: On Feb 12, 2014, at 12:55, MauMau maumau...@gmail.com wrote: From: Andres Freund and...@2ndquadrant.com It's x86, right? Then it's unlikely to be actual unordered memory accesses, but if the compiler reordered:

LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);

to

LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
proc->lwWaiting = false;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
PGSemaphoreUnlock(&proc->sem);

which it is permitted to do, yes, that could cause symptoms like you describe. Yes, the hang occurred with 64-bit PostgreSQL 9.2.4 running on RHEL6 for x86_64. The PostgreSQL was built with GCC. The relevant part of the disassembled binary you attached seems to be

Dump of assembler code for function LWLockRelease:
...
0x00647f47 <LWLockRelease+519>: lea    0x10(%rcx),%rdi
0x00647f4b <LWLockRelease+523>: movq   $0x0,0x48(%rcx)
0x00647f53 <LWLockRelease+531>: movb   $0x0,0x41(%rcx)
0x00647f57 <LWLockRelease+535>: callq  0x606210 <PGSemaphoreUnlock>

I haven't checked the offsets, but since lwWaitLink is an 8-byte quantity and lwWaiting a single-byte quantity, it's pretty much certain that the first store updates lwWaitLink and the second lwWaiting. Thus, no reordering seems to have taken place here... best regards, Florian Pflug

Even if reordering was not done by the compiler, it can still happen during execution. There is no guarantee that two subsequent assignments will be observed by all CPU cores in the same order. So if one thread is performing

p->x = 1;
p->y = 2;
p->x = 3;
p->y = 4;

then another thread can see the following combinations of (x,y): (1,2), (1,4), (3,2), (3,4). It is necessary to explicitly insert a write barrier to prevent such non-deterministic behaviour. 
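A sketch of what the suggested write barrier would look like in code. The field and function names are simplified stand-ins for the real LWLock code, and pg_write_barrier() is modeled here with GCC's __sync_synchronize() builtin; note that whether a hardware barrier is actually needed on x86 is exactly what the next message in this thread disputes:

```c
#include <stddef.h>
#include <assert.h>

/*
 * Stand-in for PostgreSQL's pg_write_barrier(); __sync_synchronize() is a
 * GCC builtin that emits a full memory barrier.
 */
#define pg_write_barrier() __sync_synchronize()

struct PGPROC
{
	struct PGPROC *lwWaitLink;	/* next waiter in the lock's wait queue */
	volatile int lwWaiting;		/* true while waiting on the lock */
};

static void
release_waiter(struct PGPROC *proc)
{
	proc->lwWaitLink = NULL;
	pg_write_barrier();			/* make the link update visible first */
	proc->lwWaiting = 0;		/* only now may the waiter wake and proceed */
}
```

The point of the ordering is that a waiter which observes lwWaiting == false must never see its own lwWaitLink still pointing into the queue.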
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On Wed, Feb 12, 2014 at 4:04 PM, knizhnik knizh...@garret.ru wrote: Even if reordering was not done by the compiler, it can still happen during execution. There is no guarantee that two subsequent assignments will be observed by all CPU cores in the same order. The x86 memory model (total store order) provides that guarantee in this specific case. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
So here's my attempt to rewrite this logic. I ended up refactoring a bit because I found it unnecessarily confusing having the mode branches in several places. I think it's much clearer just having two separate pieces of logic for RBM_NEW and the extension cases since all they have in common is the ReadBuffer call. I have to say, it scares the hell out of me that there are no regression tests for this code. I'm certainly not comfortable committing it without a few other people reading it if I haven't even run the code once. At least I know it compiles...

*** a/src/backend/access/transam/xlogutils.c
--- b/src/backend/access/transam/xlogutils.c
***************
*** 293,300 ****
  Buffer
  XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
                         BlockNumber blkno, ReadBufferMode mode)
  {
-     BlockNumber lastblock;
      Buffer      buffer;
      SMgrRelation smgr;
  
      Assert(blkno != P_NEW);
--- 293,300 ----
  XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
                         BlockNumber blkno, ReadBufferMode mode)
  {
      Buffer      buffer;
+     Page        page;
      SMgrRelation smgr;
  
      Assert(blkno != P_NEW);
***************
*** 312,353 ****
       */
      smgrcreate(smgr, forknum, true);
  
!     lastblock = smgrnblocks(smgr, forknum);
! 
!     if (blkno < lastblock)
!     {
!         /* page exists in file */
!         buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno,
!                                            mode, NULL);
!     }
!     else
      {
          /* hm, page doesn't exist in file */
!         if (mode == RBM_NORMAL)
          {
              log_invalid_page(rnode, forknum, blkno, false);
              return InvalidBuffer;
          }
-         /* OK to extend the file */
-         /* we do this in recovery only - no rel-extension lock needed */
-         Assert(InRecovery);
-         buffer = InvalidBuffer;
-         while (blkno >= lastblock)
-         {
-             if (buffer != InvalidBuffer)
-                 ReleaseBuffer(buffer);
-             buffer = ReadBufferWithoutRelcache(rnode, forknum,
-                                                P_NEW, mode, NULL);
-             lastblock++;
-         }
-         Assert(BufferGetBlockNumber(buffer) == blkno);
-     }
  
!     if (mode == RBM_NORMAL)
!     {
!         /* check that page has been initialized */
!         Page        page = (Page) BufferGetPage(buffer);
  
          /*
           * We assume that PageIsNew is safe without a lock. During recovery,
           * there should be no other backends that could modify the buffer at
--- 312,332 ----
       */
      smgrcreate(smgr, forknum, true);
  
!     if (mode == RBM_NORMAL)
      {
          /* hm, page doesn't exist in file */
!         if (blkno >= smgrnblocks(smgr, forknum))
          {
              log_invalid_page(rnode, forknum, blkno, false);
              return InvalidBuffer;
          }
  
!         buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno,
!                                            mode, NULL);
  
+         /* check that page has been initialized */
+         page = (Page) BufferGetPage(buffer);
+ 
          /*
           * We assume that PageIsNew is safe without a lock. During recovery,
           * there should be no other backends that could modify the buffer at
***************
*** 359,366 ****
              log_invalid_page(rnode, forknum, blkno, true);
              return InvalidBuffer;
          }
-     }
  
      return buffer;
  }
--- 338,364 ----
              log_invalid_page(rnode, forknum, blkno, true);
              return InvalidBuffer;
          }
+     } else {
+ 
+         /* If page doesn't exist extend the file */
+         while (blkno >= smgrnblocks(smgr, forknum))
+         {
+             /* we do this in recovery only - no rel-extension lock needed */
+             Assert(InRecovery);
+             buffer = ReadBufferWithoutRelcache(rnode, forknum,
+                                                P_NEW, mode, NULL);
+             /* Take care not to extend the relation past where it's needed */
+             if (BufferGetBlockNumber(buffer) == blkno)
+                 return buffer;
+             ReleaseBuffer(buffer);
+         }
+ 
+         /* page now exists in file */
+         buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno,
+                                            mode, NULL);
+     }
+ 
      return buffer;
  }
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 12, 2014 at 10:02:32AM +0530, Amit Kapila wrote: By issue, I assume you mean to say which compression algorithm is best for this patch. For this patch, currently we have 2 algorithms for which results have been posted. As far as I understand, Heikki is pretty sure that the latest algorithm (compression using prefix-suffix match in old and new tuple) used for this patch is better than the other algorithm in terms of CPU gain or overhead. The performance data taken by me for the worst case for this algorithm shows there is a CPU overhead for this algorithm as well. OTOH the other algorithm (compression using old tuple as history) can be a bigger win in terms of I/O reduction in more cases. In short, it is still not decided which algorithm to choose and whether it can be enabled by default or it is better to have a table-level switch to enable/disable it. So I think the decision to be taken here is about the below points: 1. Are we okay with I/O reduction at the expense of CPU for *worst* cases and I/O reduction without impacting CPU (better overall tps) for *favourable* cases? 2. If we are not okay with worst-case behaviour, then can we provide a table-level switch, so that it can be decided by the user? 3. If none of the above, then is there any other way to mitigate the worst-case behaviour, or shall we just reject this patch and move on? Given a choice to me, I would like to go with option 2, because I think for most cases UPDATE statements will have the same data for old and new tuples except for some part of the tuple (generally columns having large text data are not modified), so we will mostly end up in favourable cases, and surely for worst cases we don't want the user to suffer from CPU overhead, so a table-level switch is also required. I think 99.9% of users are never going to adjust this so we had better choose something we are happy to enable for effectively everyone. In my reading, prefix/suffix seemed safe for everyone. 
We can always revisit this if we think of something better later, as WAL format changes are not a problem for pg_upgrade. I also think making it user-tunable is so hard for users to know when to adjust as to be almost not worth the user interface complexity it adds. I suggest we go with always-on prefix/suffix mode, then add some check so the worst case is avoided by just giving up on compression. As I said previously, I think compressing the page images is the next big win in this area. I think here one might argue that for some users it is not feasible to decide whether their tuple data for UPDATE is going to be similar or completely different, and they are not at all ready for any risk of CPU overhead, but they would be happy to see I/O reduction, in which case it is difficult to decide what the value of the table-level switch should be. Here I think the only answer is that nothing is free in this world, so either make sure about the application's UPDATE behaviour before going to production, or just don't enable this switch and be happy with the current behaviour. Again, can't we do a minimal attempt at prefix/suffix compression so there is no measurable overhead? On the other side there will be users who will be pretty certain about their usage of UPDATE statements, or at least are ready to evaluate their application if they can get such a huge gain, so it would be quite a useful feature for such users. Can we move forward with the full-page compression patch? In my opinion, it is not certain that whatever compression algorithm gets decided for this patch (if any) can be directly used for full-page compression; some ideas could be used, or maybe the algorithm could be tweaked a bit to make it usable for full-page compression. Thanks, I understand that now. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
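The prefix/suffix match Amit and Bruce are weighing can be sketched in a few lines of C. This is an illustrative toy, not the patch's actual implementation: it merely computes how many leading and trailing bytes the old and new tuple images share, so that only the middle of the new image (plus the two lengths) would need to be written to WAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: find the shared prefix and suffix of the old and
 * new tuple images.  The suffix search is bounded so it never overlaps
 * the prefix in either image. */
static void
prefix_suffix_match(const char *oldp, size_t oldlen,
                    const char *newp, size_t newlen,
                    size_t *prefixlen, size_t *suffixlen)
{
    size_t maxcommon = oldlen < newlen ? oldlen : newlen;
    size_t p = 0, s = 0;

    while (p < maxcommon && oldp[p] == newp[p])
        p++;
    while (s < maxcommon - p &&
           oldp[oldlen - 1 - s] == newp[newlen - 1 - s])
        s++;
    *prefixlen = p;
    *suffixlen = s;
}
```

With `old = "aaaaXXXXbbbb"` and `new = "aaaaYYbbbb"` this finds a 4-byte prefix and 4-byte suffix, leaving only 2 bytes of the new tuple to log; the "minimal attempt" Bruce asks about would be capping how far these loops scan.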
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com: I reviewed all the three patches. The first two (core PostgreSQL) patches are fine. And I have comments on the third patch, related to cache scan. Thanks for your volunteering. 1. +# contrib/dbcache/Makefile The Makefile header comment does not match the file's location. Ahh, it's an old name from when I started the implementation. 2. + /* + * Estimation of average width of cached tuples - it does not make + * sense to construct a new cache if its average width is more than + * 30% of the raw data. + */ Move the estimation of the average width of cached tuples into the case where the cache is not found; otherwise it is an overhead for the cache hit scenario. You are right. If an existing cache is found and matches, its width is obviously less than 30% of the total width. 3. + if (old_cache) + attrs_used = bms_union(attrs_used, old_cache->attrs_used); don't we need a check to see whether the average width is more than 30%? During estimation it doesn't include the other existing attributes. Indeed. It should drop some attributes from the existing cache if the total average width grows beyond the threshold. Probably, we need to have a statistical variable to track how many times or how recently attributes are referenced. 4. + lchunk->right = cchunk; + lchunk->l_depth = TTREE_DEPTH(lchunk->right); I think it should be lchunk->r_depth that needs to be set in a clockwise rotation. Oops, nice catch. 5. Can you add some comments in the code about how a block is used? Sorry, I'll add it. A block is consumed from the head to store pointers to tuples, and from the tail to store the contents of the tuples. A block can hold multiple tuples as long as the tuple pointers growing from the head do not cross into the area for tuple contents. Anyway, I'll put it in the source code. 6. In the do_insert_tuple function I felt moving the tuples and rearranging their addresses is a little bit costly. How about the following way? 
Always insert the tuple from the bottom of the block, where the empty space starts, and store the corresponding reference pointers at the start of the block in an array. As new tuples are inserted, this array grows from the block start and the tuples from the block end. We just need to sort this array based on item pointers; there is no need to update the reference pointers. In this case movement is required only when a tuple is moved from one block to another, and also whenever continuous free space is not available to insert the new tuple. You can decide based on how frequently the sorting will happen in general. It seems to me a reasonable suggestion. Probably, an easier implementation is replacing an old block that has dead-space with a new block that contains only valid tuples, if and when the dead-space grows beyond a threshold of block usage. 7. In the ccache_find_tuple function this Assert(i_min + 1 < cchunk->ntups); can go wrong when only one tuple is present in the block, with an item pointer equal to the one we are searching for in the forward scan direction. It shouldn't happen, because the first or second ItemPointerCompare will handle the condition. Please assume cchunk->ntups == 1. In this case, any given ctid shall match either branch, because any ctid is less than, equal to or larger than the only cached tuple; thus, it moves to the right or left node according to the scan direction. 8. I am not able to find a protection mechanism for insert/delete etc. of a tuple in the T-tree. As this is shared memory it can cause problems. For design simplification, I put a giant lock per columnar cache. So, routines in cscan.c acquire an exclusive lwlock prior to invocation of ccache_insert_tuple / ccache_delete_tuple. 9. + /* merge chunks if this chunk has enough space to merge */ + ccache_merge_chunk(ccache, cchunk); Calling the chunk merge for every heap-page-prune callback is an overhead for vacuum. The merge may also again lead to node splits because of new data. 
OK, I'll check the condition for merging the chunks, to prevent too-frequent merge/split cycles. 10. columner is present in some places of the patch. correct it. Ahh, it should be columnar. 11. In the cache_scan_next function, in case the cache insert fails because of shared memory, the tuple pointer is not reset and the cache is NULL. Because of this, the next record fetch trips the assert cache != NULL. You are right. I have to modify the scan state to follow the normal seqscan path, not just set csstate->ccache to NULL; the old behaviour is a leftover from trial and error during implementation. 12. + if (ccache->status != CCACHE_STATUS_IN_PROGRESS) + cs_put_ccache(ccache); The cache is created with refcnt 2, and sometimes cs_put_ccache is called twice to eliminate it, and sometimes a different approach is used. It is a little bit confusing; can you explain in comments why 2 is required and how it
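The head/tail block layout described in point 5 above — tuple pointers growing from the head, tuple bodies stacked from the tail, insertion refused when the two regions would cross — can be sketched like this. Names and the block size are hypothetical, not the patch's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8192

/* Hypothetical cache block: an array of item offsets grows from the head
 * of buf while tuple bodies grow downward from the tail, much like a
 * PostgreSQL heap page. */
typedef struct CacheBlock
{
    uint32_t nitems;   /* item offsets stored at the head of buf */
    uint32_t tail;     /* start of the tuple-data region */
    char     buf[BLOCK_SIZE];
} CacheBlock;

static void
block_init(CacheBlock *b)
{
    b->nitems = 0;
    b->tail = BLOCK_SIZE;
}

/* Insert one tuple body at the tail and record its offset at the head.
 * Returns the item index, or -1 when head and tail would cross. */
static int
block_insert(CacheBlock *b, const void *tuple, uint32_t len)
{
    uint32_t head_end = (b->nitems + 1) * sizeof(uint32_t);

    if (len > b->tail || head_end > b->tail - len)
        return -1;              /* regions would cross: block is full */
    b->tail -= len;
    memcpy(b->buf + b->tail, tuple, len);
    ((uint32_t *) b->buf)[b->nitems] = b->tail;
    return (int) b->nitems++;
}
```

Sorting only the small offset array at the head (rather than moving tuple bodies) is the cost saving the reviewer is suggesting.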
Re: [HACKERS] Changeset Extraction v7.5
On 2014-02-11 11:22:24 -0500, Robert Haas wrote: + context = AllocSetContextCreate(CurrentMemoryContext, + "Changeset Extraction Context", + ALLOCSET_DEFAULT_MINSIZE, + ALLOCSET_DEFAULT_INITSIZE, + ALLOCSET_DEFAULT_MAXSIZE); I have my doubts about whether it's wise to make this the child of CurrentMemoryContext. Certainly, if we do that, then expectations around what that context is need to be documented. Short-lived contexts are presumably unsuitable. Well, it depends on the type of usage. In the walsender, yes, it needs to be a long-lived context. Not so much in the SQL case, inside the SRF we spill all the data into a tuplestore after which we are done. I don't see which context would be more suitable as a default parent; it used to be TopMemoryContext but that requires pointless cleanup routines to handle errors... So I think documenting the requirement is the best way? I'm working on the other comments, pushing individual changes to git. Will send a new version onlist once I'm through. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
Craig Ringer cr...@2ndquadrant.com writes: Great, that's what I was hoping to see - proper errors where we've omitted things, not silent miscompilation. And, just to make sure that ain't nobody happy ... brolga, our one operational Cygwin animal, is still building without complaints. We really need to understand what is going on here and why some toolchains don't seem to feel that lack of PGDLLIMPORT is a problem. Nosing around, I happened to notice this in src/template/cygwin: # --enable-auto-import gets rid of a diagnostics linker message LDFLAGS=-Wl,--allow-multiple-definition -Wl,--enable-auto-import Anybody know *exactly* what --enable-auto-import does? The name is, um, suggestive. regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-12 10:58:20 -0500, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: Great, that's what I was hoping to see - proper errors where we've omitted things, not silent miscompilation. And, just to make sure that ain't nobody happy ... brolga, our one operational Cygwin animal, is still building without complaints. Hm, isn't that pretty much expected? Cygwin's infrastructure tries to be unixoid, so it's not surprising that the toolchain doesn't require such strange things? Otherwise even larger amounts of unix software wouldn't run, right? I am pretty sure halfway recent versions of mingw can be made to behave similarly, but I don't really see the advantage in doing so, to the contrary even. # --enable-auto-import gets rid of a diagnostics linker message LDFLAGS=-Wl,--allow-multiple-definition -Wl,--enable-auto-import Anybody know *exactly* what --enable-auto-import does? The name is, um, suggestive. My ld(1)'s manpage has three screens' worth of details... Most of it seems to be on http://www.freebsd.org/cgi/man.cgi?query=ld&sektion=1 It's essentially ELF-like shared library linking in PE-COFF through dirty tricks. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 10:58:20 -0500, Tom Lane wrote: Anybody know *exactly* what --enable-auto-import does? The name is, um, suggestive. My ld(1)'s manpage has three screens' worth of details... Most of it seems to be on http://www.freebsd.org/cgi/man.cgi?query=ld&sektion=1 It's essentially ELF-like shared library linking in PE-COFF through dirty tricks. Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-12 11:26:56 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 10:58:20 -0500, Tom Lane wrote: Anybody know *exactly* what --enable-auto-import does? The name is, um, suggestive. My ld(1)'s manpage has three screens' worth of details... Most of it seems to be on http://www.freebsd.org/cgi/man.cgi?query=ld&sektion=1 It's essentially ELF-like shared library linking in PE-COFF through dirty tricks. Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? Hm, I don't see a big advantage in detecting the errors, as it won't hugely increase the buildfarm coverage. But --auto-import seems to require marking the .text sections as writable; avoiding that seems to be a good idea if we don't need it anymore. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 11:26:56 -0500, Tom Lane wrote: Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? Hm, I don't see a big advantage in detecting the errors, as it won't hugely increase the buildfarm coverage. But --auto-import seems to require marking the .text sections as writable; avoiding that seems to be a good idea if we don't need it anymore. Given the rather small number of Windows machines in the buildfarm, I'd really like them all to be on the same page about what's valid and what isn't. Having to wait ~24 hours to find out if they're all happy with something isn't my idea of a good time. In any case, as long as we have discrepancies between them, I'm not going to be entirely convinced that any of them are telling the full truth. I'm going to go remove that switch and see if brolga starts failing. If it does, I'll be satisfied that we understand the issues here. regards, tom lane
Re: [HACKERS] DATE type output does not follow datestyle parameter
On Wed, Feb 12, 2014 at 10:08:26PM +0900, MauMau wrote: From: Bruce Momjian br...@momjian.us On Mon, Dec 2, 2013 at 10:22:47PM +0900, MauMau wrote: So, my suggestion is to just add the following sentence right after the above one. The Postgres style is an exception: the output of the date type is either MM-DD-YYYY (e.g. 12-17-1997) or DD-MM-YYYY (e.g. 17-12-1997), which is different from the date part of the output of the timestamp type. Could you consider and add this to the manual? Yes, I will make the change unless someone objects. Could you commit this if you feel okay? I'm sorry if I missed the modified article in the devel doc. OK, attached doc patch applied to head and 9.3. Thanks for the report. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml new file mode 100644 index b7d7d80..30fd9bb *** a/doc/src/sgml/datatype.sgml --- b/doc/src/sgml/datatype.sgml *** January 8 04:05:06 1999 PST *** 2205,2212 historical accident.) <xref linkend="datatype-datetime-output-table"> shows examples of each output style. The output of the <type>date</type> and ! <type>time</type> types is of course only the date or time part ! in accordance with the given examples. </para> <table id="datatype-datetime-output-table"> --- 2205,2214 historical accident.) <xref linkend="datatype-datetime-output-table"> shows examples of each output style. The output of the <type>date</type> and ! <type>time</type> types is generally only the date or time part ! in accordance with the given examples. However, the ! <productname>POSTGRES</> style outputs date-only values in ! <acronym>ISO</acronym> format. </para> <table id="datatype-datetime-output-table">
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-12 11:39:43 -0500, Tom Lane wrote: I'm going to go remove that switch and see if brolga starts failing. If it does, I'll be satisfied that we understand the issues here. The manpage also has a --disable-auto-import, but doesn't document whether enabling or disabling is the default... So maybe try to be explicit? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
On Wed, Feb 12, 2014 at 11:39:43AM -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 11:26:56 -0500, Tom Lane wrote: Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? Hm, I don't see a big advantage in detecting the errors, as it won't hugely increase the buildfarm coverage. But --auto-import seems to require marking the .text sections as writable; avoiding that seems to be a good idea if we don't need it anymore. Given the rather small number of Windows machines in the buildfarm, I'd really like them all to be on the same page about what's valid and what isn't. Having to wait ~24 hours to find out if they're all happy with something isn't my idea of a good time. In any case, as long as we have discrepancies between them, I'm not going to be entirely convinced that any of them are telling the full truth. I'm going to go remove that switch and see if brolga starts failing. If it does, I'll be satisfied that we understand the issues here. Can you document the issue in case it comes up again, please? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-12 11:50:41 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 11:39:43 -0500, Tom Lane wrote: I'm going to go remove that switch and see if brolga starts failing. If it does, I'll be satisfied that we understand the issues here. The manpage also has a --disable-auto-import, but doesn't document whether enabling or disabling is the default... So maybe try to be explicit? Yeah, I was just wondering about that myself. Presumably --enable is not the default, or it would not have gotten added. However, I notice that it was added a long time ago (2004) ... is it possible the default changed since then? Some release notes I just found seem to suggest it is http://sourceware.org/binutils/docs-2.17/ld/WIN32.html : This feature is enabled with the `--enable-auto-import' command-line option, although it is enabled by default on cygwin/mingw. The `--enable-auto-import' option itself now serves mainly to suppress any warnings that are ordinarily emitted when linked objects trigger the feature's use. 2.17 is from 2007, so it's been enabled by default at least since then. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: Some release notes I just found seem to suggest it is http://sourceware.org/binutils/docs-2.17/ld/WIN32.html : This feature is enabled with the `--enable-auto-import' command-line option, although it is enabled by default on cygwin/mingw. Curious ... narwhal is mingw but evidently doesn't have that switch turned on. OTOH we also know it's an old version. I'm inclined to change template/win32 also. regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 11:39:43 -0500, Tom Lane wrote: I'm going to go remove that switch and see if brolga starts failing. If it does, I'll be satisfied that we understand the issues here. The manpage also has a --disable-auto-import, but doesn't document whether enabling or disabling is the default... So maybe try to be explicit? Yeah, I was just wondering about that myself. Presumably --enable is not the default, or it would not have gotten added. However, I notice that it was added a long time ago (2004) ... is it possible the default changed since then? regards, tom lane
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: So here's my attempt to rewrite this logic. I ended up refactoring a bit because I found it unnecessarily confusing having the mode branches in several places. I think it's much clearer just having two separate pieces of logic for RBM_NEW and the extension cases since all they have in common is the ReadBuffer call. I don't like that at all. It's a lot of unnecessary churn in what is indeed hard-to-test code, and personally I don't find it clearer. Nor, if you're going to complain about the cost of smgrnblocks, does it seem like a great idea to be doing that *twice* per page rather than once. How about the attached instead? regards, tom lane diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index 4cd82df..f1918f6 100644 *** a/src/backend/access/transam/xlogutils.c --- b/src/backend/access/transam/xlogutils.c *** XLogReadBufferExtended(RelFileNode rnode *** 338,352 /* we do this in recovery only - no rel-extension lock needed */ Assert(InRecovery); buffer = InvalidBuffer; ! while (blkno >= lastblock) { if (buffer != InvalidBuffer) ReleaseBuffer(buffer); buffer = ReadBufferWithoutRelcache(rnode, forknum, P_NEW, mode, NULL); - lastblock++; } ! Assert(BufferGetBlockNumber(buffer) == blkno); } if (mode == RBM_NORMAL) --- 338,358 /* we do this in recovery only - no rel-extension lock needed */ Assert(InRecovery); buffer = InvalidBuffer; ! do { if (buffer != InvalidBuffer) ReleaseBuffer(buffer); buffer = ReadBufferWithoutRelcache(rnode, forknum, P_NEW, mode, NULL); } ! while (BufferGetBlockNumber(buffer) < blkno); ! /* Handle the corner case that P_NEW returns non-consecutive pages */ ! if (BufferGetBlockNumber(buffer) != blkno) ! { ! ReleaseBuffer(buffer); ! buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno, ! mode, NULL); ! 
} } if (mode == RBM_NORMAL)
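Tom's do/while rework can be modelled with a toy simulation (the helper names below are invented; `ReadBufferWithoutRelcache` and `P_NEW` are the real APIs being modelled): extend the relation one page at a time until the returned block number reaches the target, then fall back to reading the target page directly if a concurrent extension made `P_NEW` overshoot it.

```c
#include <assert.h>

static int nblocks = 0;   /* toy relation size, in blocks */

/* Stands in for ReadBuffer(P_NEW); "skip" simulates another backend
 * having extended the relation concurrently, so the block number we
 * get back is not the one we expected. */
static int
extend_one(int skip)
{
    if (skip)
        nblocks++;        /* page created by someone else */
    return nblocks++;     /* block number of our new page */
}

/* Stands in for reading an existing block directly. */
static int
read_block(int blkno)
{
    assert(blkno < nblocks);
    return blkno;
}

/* The pattern from the patch: loop until we are at or past the target,
 * then reread the target page if extension overshot it. */
static int
extend_to(int target, int skip_once)
{
    int blk;

    do
    {
        blk = extend_one(skip_once);
        skip_once = 0;
    } while (blk < target);

    if (blk != target)
        blk = read_block(target);   /* non-consecutive P_NEW corner case */
    return blk;
}
```

The second test case below exercises the corner case: a concurrent extension pushes `P_NEW` one block past the target, and the direct reread recovers the wanted page.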
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: So I think I've come up with a scenario that could cause this. I don't think it's exactly what happened here but maybe something analogous happened with our base backup restore. I agree it seems like a good idea for XLogReadBufferExtended to defend itself against successive P_NEW calls possibly not returning consecutive pages. However, unless you had an operating-system-level crash on the master, this isn't sufficient to explain the problem. We'd need also a plausible theory about how a base backup could've left us with short segments in a large relation. (Or maybe the hot backup process could just catch the files in this state if a table is rapidly growing and it doesn't take care to avoid picking up new files that appear after it starts?) That's a possible explanation I guess, but it doesn't seem terribly probable from a timing standpoint. Also, you should be able to gauge the probability of this theory from knowledge of the application --- are the bloated relations all ones that would've been growing *very* rapidly during the base backup? regards, tom lane
Re: [HACKERS] Terminating pg_basebackup background streamer
On Mon, Feb 10, 2014 at 7:39 PM, Magnus Hagander mag...@hagander.netwrote: On Mon, Feb 10, 2014 at 7:29 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/09/2014 02:17 PM, Magnus Hagander wrote: If an error occurs in the foreground (backup) process of pg_basebackup, and we exit in a controlled way, the background process (streaming xlog process) would stay around and keep streaming. This can happen for example if disk space runs out and there is very low activity on the server. (If there is activity on the server, the background streamer will also run out of disk space and exit) Attached patch kills it off in disconnect_and_exit(), which seems like the right thing to do to me. Any objections to applying and backpatching that for the upcoming minor releases? Do you get a different error message with this patch than before? Is the new one better than the old one? Previously you got double error messages - one from the foreground, and a second one from the background sometime in the future (whenever it eventually failed, and for whatever reason - so if it was out of disk space, it would complain about that once it got enough xlog for it to happen). With the patch you just get the error message from the first process. The background process doesn't give an error on SIGTERM, it just exits. Since there were no other objections, I've applied this patch. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
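The fix Magnus describes amounts to signalling the background child from the foreground exit path so it cannot linger. A minimal POSIX sketch of that idea (the function and variable names here are illustrative, not pg_basebackup's actual code):

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* pid of the background streaming process, if one was started */
static pid_t bgchild = -1;

/* Sketch of the cleanup path: before the foreground process exits on
 * error, SIGTERM the background streamer and reap it.  Returns the
 * child's wait status for inspection. */
static int
disconnect_and_exit_sketch(void)
{
    int status = 0;

    if (bgchild > 0)
    {
        kill(bgchild, SIGTERM);
        waitpid(bgchild, &status, 0);
    }
    return status;
}
```

As the thread notes, the child takes the default SIGTERM disposition, so it terminates silently rather than emitting a second, later error message.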
Re: [HACKERS] truncating pg_multixact/members
In this new version, I added a couple of fields to VacuumStmt node. How strongly do we feel this would cause an ABI break? Would we be more comfortable if I put them at the end of the struct for 9.3 instead? Do we expect third-party code to be calling vacuum()? Also, AutoVacOpts (used as part of reloptions) gained three extra fields. Since this is in the middle of StdRdOptions, it'd be somewhat more involved to put these at the end of that struct. This might be a problem if somebody has a module calling RelationIsSecurityView(). If anyone thinks we should be concerned about such an ABI change, please shout quickly. Here is patch v3, which should be final or close to it. Changes from previous: Robert Haas wrote: Using Multixact capitalized just so seems odd to me. Probably should be lower case (multiple places). Changed it to be all lower case. Originally the X was also upper case, which looked even odder. This part needs some copy-editing: + <para> + Vacuum also allows removal of old files from the + <filename>pg_multixact/members</> and <filename>pg_multixact/offsets</> + subdirectories, which is why the default is a relatively low + 50 million transactions. Vacuuming multixacts also allows...? And: 50 million multixacts, not transactions. I reworded this rather completely. I was missing a change to SetMultiXactIdLimit to use the multixact value instead of the one for Xids, and passing the values computed by autovacuum to vacuum(). Per discussion, new default values are 150 million for vacuum_multixact_freeze_table_age (same as the one for Xids), and 5 million for vacuum_multixact_freeze_min_age. I decided to raise autovacuum_multixact_freeze_max_age to 400 million, i.e. double the one for Xids; so there should be no more emergency vacuuming than before unless multixact consumption is more than double that for Xids. (Now that I re-read this, the same rationale would have me setting the default for vacuum_multixact_freeze_table_age to 300 million. Any votes on that?). 
I adjusted the default values everywhere (docs and sample config), and fixed one or two typos in the docco for Xid vacuuming that I happened to notice, as well. postgresql.conf.sample contained a couple of space-before-tab which I removed. !-- I thought about using a struct to pass all four values around in multiple routines rather than 4 ints (vacuum_set_xid_limits, cluster_rel, rebuild_relation, copy_heap_data). Decided not to for the time being. Perhaps a patch for HEAD only. -- -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** *** 4730,4735 COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; --- 4730,4762 /listitem /varlistentry + varlistentry id=guc-autovacuum-multixact-freeze-max-age xreflabel=autovacuum_multixact_freeze_max_age + termvarnameautovacuum_multixact_freeze_max_age/varname (typeinteger/type)/term + indexterm +primaryvarnameautovacuum_multixact_freeze_max_age/varname configuration parameter/primary + /indexterm + listitem +para + Specifies the maximum age (in multixacts) that a table's + structnamepg_class/.structfieldrelminmxid/ field can + attain before a commandVACUUM/ operation is forced to + prevent multixact ID wraparound within the table. + Note that the system will launch autovacuum processes to + prevent wraparound even when autovacuum is otherwise disabled. +/para + +para + Vacuuming multixacts also allows removal of old files from the + filenamepg_multixact/members/ and filenamepg_multixact/offsets/ + subdirectories, which is why the default is a relatively low + 400 million multixacts. + This parameter can only be set at server start, but the setting + can be reduced for individual tables by changing storage parameters. + For more information see xref linkend=vacuum-for-multixact-wraparound. 
+/para + /listitem + /varlistentry + varlistentry id=guc-autovacuum-vacuum-cost-delay xreflabel=autovacuum_vacuum_cost_delay termvarnameautovacuum_vacuum_cost_delay/varname (typeinteger/type)/term indexterm *** *** 5138,5144 COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; structnamepg_class/.structfieldrelfrozenxid/ field has reached the age specified by this setting. The default is 150 million transactions. Although users can set this value anywhere from zero to ! one billion, commandVACUUM/ will silently limit the effective value to 95% of xref linkend=guc-autovacuum-freeze-max-age, so that a periodical manual commandVACUUM/ has a chance to run before an anti-wraparound
Re: [HACKERS] narwhal and PGDLLIMPORT
On 12/02/2014 17:26, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 10:58:20 -0500, Tom Lane wrote: Anybody know *exactly* what --enable-auto-import does? The name is, um, suggestive. My ld(1)'s manpage has three screens' worth of details... Most of it seems to be on http://www.freebsd.org/cgi/man.cgi?query=ld&sektion=1 It's essentially ELF-like shared library linking in PE-COFF through dirty tricks. Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? regards, tom lane If I am not wrong, --enable-auto-import is already the default on the cygwin build chain (binutils >= 2.19.51), so it should make no difference on latest cygwin. Not sure for your 1.7.7 build environment. About PGDLLIMPORT, my build log is full of warning: ‘optarg’ redeclared without dllimport attribute: previous dllimport ignored I suspect that removing it will also make no difference. Marco PS: we aim for unix-like builds, not Windows ones
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-12 19:13:07 +0100, Marco Atzeri wrote: On 12/02/2014 17:26, Tom Lane wrote: Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? If I am not wrong, --enable-auto-import is already the default on the cygwin build chain (binutils >= 2.19.51), so it should make no difference on latest cygwin. Not sure for your 1.7.7 build environment. We're *disabling* not *enabling* it. About PGDLLIMPORT, my build log is full of warning: ‘optarg’ redeclared without dllimport attribute: previous dllimport ignored That should be fixed then. I guess cygwin's getopt.h declares it that way? I suspect that removing it will also make no difference. The committed patch explicitly disables the functionality. PS: we aim for unix-like builds, not Windows ones Well, there are a significant number of caveats around the auto-import functionality, so there seems little benefit in using it if all the declspec's have to be there anyway. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
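The declspec annotations the thread keeps circling around look roughly like this. This is a simplified illustration, not PostgreSQL's real definition (which lives in its port headers): on native Windows toolchains a variable imported from another module must be marked `__declspec(dllimport)`, while on ELF platforms the macro expands to nothing; `--enable-auto-import` is the linker feature that papers over missing annotations.

```c
#include <assert.h>

/* Hypothetical PGDLLIMPORT-style macro, for illustration only. */
#if defined(_WIN32)
#define MY_DLLIMPORT __declspec(dllimport)
#else
#define MY_DLLIMPORT /* ELF linking needs no annotation */
#endif

/* What an extension's header would say about a variable exported by
 * the core executable... */
MY_DLLIMPORT extern int my_shared_setting;

/* ...and the definition, standing in for the core side in this
 * single-file sketch (on real Windows builds the two live in
 * different modules). */
int my_shared_setting = 42;
```

Without the annotation, MSVC and (switch-less) MinGW builds fail to link references to such variables, which is the hard error the buildfarm discussion wants all Windows animals to agree on.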
Re: [HACKERS] truncating pg_multixact/members
Alvaro Herrera alvhe...@2ndquadrant.com writes: In this new version, I added a couple of fields to VacuumStmt node. How strongly do we feel this would cause an ABI break? Would we be more comfortable if I put them at the end of the struct for 9.3 instead? In the past we've usually added such members at the end of the struct in back branches (but put them in the logical place in HEAD). I'd recommend doing that just on principle. Also, AutoVacOpts (used as part of reloptions) gained three extra fields. Since this is in the middle of StdRdOptions, it'd be somewhat more involved to put these at the end of that struct. This might be a problem if somebody has a module calling RelationIsSecurityView(). If anyone thinks we should be concerned about such an ABI change, please shout quickly. That sounds problematic --- surely StdRdOptions might be something extensions are making use of? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: How about the attached instead? This does possibly allocate an extra block past the target block. I'm not sure how surprising that would be for the rest of the code. For what it's worth I've confirmed the bug in wal-e caused the initial problem. But I think it's possible to occur without that bug so I think it still needs to be addressed. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink
On Mon, Feb 10, 2014 at 12:14 PM, Jeff Janes jeff.ja...@gmail.com wrote: Presumably whatever behavior difference you are seeing is somehow related to the use of PQconnectdbParams() rather than PQsetdbLogin(), but the fine manual does not appear to document a difference between those functions as regards password-prompting behavior or .pgpass usage. It looks like PQsetdbLogin() treats either NULL or an empty string passed to it as matching 5432 in pgpass, while with PQconnectdbParams() only NULL matches 5432 and an empty string does not. pgbench uses an empty string if no -p is specified. This makes pgbench behave the way I think is correct, but it hardly seems like the right way to fix it. [ kludge ] Well, it seems to me that the threshold issue here is whether or not we should try to change the behavior of libpq. If not, then your kludge might be the best option. But if so then it isn't necessary. However, I don't feel confident to pass judgement on what the libpq semantics should be. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
I wrote: Greg Stark st...@mit.edu writes: (Or maybe the hot backup process could just catch the files in this state if a table is rapidly growing and it doesn't take care to avoid picking up new files that appear after it starts?) That's a possible explanation I guess, but it doesn't seem terribly probable from a timing standpoint. I did a bit of arithmetic using the cases you posted previously. In the first case, where block 3634978 got written to 7141472, you can make the numbers come out right if you assume that a page was missing at the end of segment 1 --- that leads to the conclusion that EOF exclusive of that missing page had been around 28.75 GB, which squares well with the relation's size on master. However, it's fairly hard to credit that the base backup would have collected full-size or nearly full-size images of segments 2 through 28 while not seeing segment 1 at full size. You'd have to assume that the rel grew by a factor of ~14 while the base backup was in progress --- and then didn't grow very much more afterwards. (What state exactly did you measure the primary rel sizes in? Was it long after the backup/restore, or did you rewind things somehow?) The other examples seem to fit the theory a bit better, but this one is hard to explain this way. The other big problem for this theory is that you said in http://www.postgresql.org/message-id/cam-w4hpvjcbrvv3dxg8aj0wzku08dhux-xybfdyqhnrn5bn...@mail.gmail.com What's worse is we created a new standby from the same base backup and replayed the same records and it didn't reproduce the problem. If this were the explanation, it oughta be reproducible that way. I still agree that XLogReadBufferExtended shouldn't be assuming that P_NEW will not skip pages. But I think we have another bug in here somewhere. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: How about the attached instead? This does possibly allocate an extra block past the target block. I'm not sure how surprising that would be for the rest of the code. Should be fine; we could end up with an extra block after a failed extension operation in any case. For what it's worth I've confirmed the bug in wal-e caused the initial problem. Huh? Bug in wal-e? What bug? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 12/02/2014 19:19, Andres Freund wrote: On 2014-02-12 19:13:07 +0100, Marco Atzeri wrote: On 12/02/2014 17:26, Tom Lane wrote: Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? If I am not wrong, --enable-auto-import is already the default in the cygwin build chain (binutils >= 2.19.51), so it should make no difference on latest cygwin. Not sure for your 1.7.7 build environment. We're *disabling* not *enabling* it. remove is not disable, if enable is already the default inside binutils and gcc. Or am I missing something? About PGDLLIMPORT, my build log is full of warning: ‘optarg’ redeclared without dllimport attribute: previous dllimport ignored That should be fixed then. I guess cygwin's getopt.h declares it that way? from /usr/include/getopt.h
#ifndef __INSIDE_CYGWIN__
extern int __declspec(dllimport) opterr;	/* if error message should be printed */
extern int __declspec(dllimport) optind;	/* index into parent argv vector */
extern int __declspec(dllimport) optopt;	/* character checked for validity */
extern int __declspec(dllimport) optreset;	/* reset getopt */
extern char __declspec(dllimport) *optarg;	/* argument associated with option */
#endif
I suspect that removing it will also make no difference. The committed patch explicitly disables the functionality. PS: we aim for unix-like builds, not windows ones Well, there are a significant number of caveats around the auto import functionality, so there seems little benefit in using it if all the declspec's have to be there anyway. I think I am not currently using the declspec in the build anymore. But I could be wrong, as the postgresql build way is the most complicated among all the ones I am dealing with.
Greetings, Andres Freund Cheers Marco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Marco Atzeri marco.atz...@gmail.com writes: On 12/02/2014 19:19, Andres Freund wrote: On 2014-02-12 19:13:07 +0100, Marco Atzeri wrote: About PGDLLIMPORT, my build log is full of warning: ‘optarg’ redeclared without dllimport attribute: previous dllimport ignored That should be fixed then. I guess cygwin's getopt.h declares it that way? from /usr/include/getopt.h extern char __declspec(dllimport) *optarg; /* argument associated with option */ Hm. All of our files that use getopt also do
extern char *optarg;
extern int	optind, opterr;
#ifdef HAVE_INT_OPTRESET
extern int	optreset;	/* might not be declared by system headers */
#endif
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory and locks
On Mon, Feb 10, 2014 at 7:17 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: Does it cause another problem if dsm_detach() also releases lwlocks allocated on the dsm segment being released? Held lwlocks are tracked in the held_lwlocks[] array; its length is usually 100. In case dsm_detach() is called on the segment between addr ~ (addr + length), it seems to me an easy job to walk this array to find the lwlocks in that range. Oh, sure, it could be done. I just don't see the point. If you're explicitly detaching a shared memory segment while still holding lwlocks within that segment, your code is seriously buggy. Making dsm_detach() silently release those locks would probably just be hiding a bug that you'd really rather find out about. Just my rough idea. Doesn't it make sense to have an offset from the head of the DSM segment that contains the lwlock, instead of the identifier form, to indicate a cstring datum? It does not prevent allocating it later, after the postmaster starts up, and there is no problem if the number of dynamic lwlocks grows to millions or more. If the lwlock has a tranche_offset, not tranche_id, all it needs to do is: 1. find out either the DSM segment or regular SHM segment that contains the supplied lwlock. 2. Calculate mapped_address + tranche_offset; that is the local location where the cstring form is put. Well, if that offset is 8 bytes, it's going to make the LWLock structure grow past 32 bytes on common platforms, which I do not want to do. If it's 4 bytes, sure, that'll work, but now you've gotta make sure that string gets into the 4GB of the relevant shared memory segment, which sounds like an annoying requirement. How would you satisfy it with respect to the main shared memory segment, for example? I mean, one way to handle this problem is to put a hash table in the main shared memory segment with tranche ID -> name mappings. Backends can suck mappings out of there and cache them locally as needed. 
But that's a lot of mechanism for a facility with no known users. Probably, it needs a common utility routine on dsm.c to find out a particular DSM segment that contains the supplied address. (Inverse function towards dsm_segment_address) Yeah, I've thought of that. It may be useful enough to be worth doing, whether we use it for this or not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: How about the attached instead? This does possibly allocate an extra block past the target block. I'm not sure how surprising that would be for the rest of the code. Should be fine; we could end up with an extra block after a failed extension operation in any case. I know it's fine on the active database, I'm not so clear whether it's compatible with the xlog records from the primary. I suppose it'll just see an Initialize Page record and happily see the nul block and initialize it. It's still a bit scary. I of course think my code is vastly clearer than the existing code but yes, I see the unnecessary churn argument. I think that's the fundamental problem with lacking tests, it makes the code get more and more complex as we're reluctant to simplify it out of fear. For what it's worth I've confirmed the bug in wal-e caused the initial problem. Huh? Bug in wal-e? What bug? WAL-E actually didn't restore a whole 1GB file due to a transient S3 problem, in fact a bunch of them. It's remarkable that Postgres kept going with that much data missing. But the arithmetic worked out on the case I checked it on, which was the last one that I just sent the xlog record for last night. In that case there was precisely one segment missing and the relation was extended by the number of segments you would expect if it filled in that missing segment and then jumped to the end of the relation. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] issue with gininsert under very high load
A client of ours encountered this problem when upgrading (via slony) from 8.4 to 9.2, and then from 8.4 to 9.3. The application is a telephony app that inserts call records at pretty high volume in the busiest part of the day, which usually starts around 10.00 am to 11.00 am US eastern time. What happened was that the system load average shot up from, say, 6 to way over 100, and the database became largely unresponsive. This problem did not occur on 8.4. The platform is CentOS 6.4 / x86_64. On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM. I attached a debugger to the wedged process and got this backtrace:
#0  0x00386a2eaf37 in semop () from /lib64/libc.so.6
#1  0x006021a7 in PGSemaphoreLock ()
#2  0x006463a1 in LWLockAcquire ()
#3  0x00631997 in ReadBuffer_common ()
#4  0x006322ee in ReadBufferExtended ()
#5  0x00467475 in ginPrepareFindLeafPage ()
#6  0x0046663f in ginInsertItemPointers ()
#7  0x00462adc in ginEntryInsert ()
#8  0x0046cad0 in ginInsertCleanup ()
#9  0x0046d7c6 in ginHeapTupleFastInsert ()
#10 0x00462cfa in gininsert ()
#11 0x00722b33 in FunctionCall6Coll ()
#12 0x0048bdff in index_insert ()
#13 0x00587595 in ExecInsertIndexTuples ()
#14 0x005948e1 in ExecModifyTable ()
#15 0x0057e718 in ExecProcNode ()
#16 0x0057d512 in standard_ExecutorRun ()
[...]
Armed with this the client identified a single gin index (on an array of text) that could have been involved, and on removing the index the problem simply went away, and they have now been live on 9.3 for 36 hours without a mishap. 
The client is attempting to create a self-contained reproducible test, but understandably is not going to re-add the index to their production system. I'm not terribly familiar with the gin code, so I'm reporting this in the hope that other people who are more familiar with it than I am might know where to look for a potential race condition or other bug that might have caused this. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] issue with gininsert under very high load
On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote: On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM. I attached a debugger to the wedged process and got this backtrace: #0 0x00386a2eaf37 in semop () from /lib64/libc.so.6 #1 0x006021a7 in PGSemaphoreLock () #2 0x006463a1 in LWLockAcquire () #3 0x00631997 in ReadBuffer_common () #4 0x006322ee in ReadBufferExtended () #5 0x00467475 in ginPrepareFindLeafPage () #6 0x0046663f in ginInsertItemPointers () #7 0x00462adc in ginEntryInsert () #8 0x0046cad0 in ginInsertCleanup () #9 0x0046d7c6 in ginHeapTupleFastInsert () #10 0x00462cfa in gininsert () #11 0x00722b33 in FunctionCall6Coll () #12 0x0048bdff in index_insert () #13 0x00587595 in ExecInsertIndexTuples () #14 0x005948e1 in ExecModifyTable () #15 0x0057e718 in ExecProcNode () #16 0x0057d512 in standard_ExecutorRun () [...] Armed with this the client identified a single gin index (on an array of text) that could have been involved, and on removing the index the problem simply went away, and they have now been live on 9.3 for 36 hours without a mishap. The client is attempting to create a self-contained reproducible test, but understandably are not going to re-add the index to their production system. I'm not terribly familiar with the gin code, so I'm reporting this in the hope that other people who are more familiar with it than I am might know where to look for a potential race condition or other bug that might have caused this. 
That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem, which usually makes it *far* too big b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency. The fix is usually to just turn FASTUPDATE off for all indexes. If it's done after the initial creation, the table should be vacuumed afterwards to flush it. There's previously been talk about changing the limits to something more reasonable but it got stalled in bikeshedding IIRC. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: This does possibly allocate an extra block past the target block. I'm not sure how surprising that would be for the rest of the code. Should be fine; we could end up with an extra block after a failed extension operation in any case. I know it's fine on the active database, I'm not so clear whether it's compatible with the xlog records from the primary. I suppose it'll just see an Initialize Page record and happily see the nul block and initialize it. It's still a bit scary. Well, we can easily find uninitialized extra pages on the primary too, so if WAL replay were unable to cope with that, it would be a bug regardless. Huh? Bug in wal-e? What bug? WAL-E actually didn't restore a whole 1GB file due to a transient S3 problem, in fact a bunch of them. Hah. Okay, I think we can write this issue off as closed then. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] issue with gininsert under very high load
* Andres Freund (and...@2ndquadrant.com) wrote: There's previously been talk about changing the limits to something more reasonable but it got stalled in bikeshedding IIRC. As I recall, there was argument that we didn't really need a new GUC for this (which was the proposal) but rather just need to pick a reasonable (small) value and hard-code it. Are there objections to doing so, or are there cases where that would be a serious problem? How do people feel about 4MB? Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] issue with gininsert under very high load
Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote: On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM. That's a deficiency of the gin fastupdate cache: a) it bases it's size on work_mem which usually makes it *far* too big b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurreny. I'm not sure that what Andrew is describing can fairly be called a concurrent-performance problem. It sounds closer to a stuck lock. Are you sure you've diagnosed it correctly? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
I wrote: Greg Stark st...@mit.edu writes: WAL-E actually didn't restore a whole 1GB file due to a transient S3 problem, in fact a bunch of them. Hah. Okay, I think we can write this issue off as closed then. Oh, wait a minute. It's not just a matter of whether we find the right block: we also have to consider whether XLogReadBufferExtended will apply the right mode behavior. Currently, it supposes that all pages past the initially observed EOF should be assumed to be uninitialized; but if we're working with an inconsistent database, that seems like an unsafe assumption. It might be that a page is there but we've not (yet) fixed the length of some preceding segment. If we want to not get bogus WAL contains references to invalid pages failures in such scenarios, it seems like we need a more invasive change than what I just committed. I think your patch didn't cover this consideration either. What I think we probably want to do is forcibly cause the target page to exist, using a P_NEW loop like what I committed, and then decide on the basis of whether it's all-zeroes whether to consider it invalid or not. This seems sane on the grounds that it's just the extension to the page level of the existing policy of creating the file whether it existed or not. It could only result in a large amount of wasted work if the passed-in target block is insane --- but since we got it out of a CRC-checked WAL record, I think it's safe to not worry too much about that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] issue with gininsert under very high load
On 02/12/2014 02:50 PM, Andres Freund wrote: On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote: On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM. I attached a debugger to the wedged process and got this backtrace: #0 0x00386a2eaf37 in semop () from /lib64/libc.so.6 #1 0x006021a7 in PGSemaphoreLock () #2 0x006463a1 in LWLockAcquire () #3 0x00631997 in ReadBuffer_common () #4 0x006322ee in ReadBufferExtended () #5 0x00467475 in ginPrepareFindLeafPage () #6 0x0046663f in ginInsertItemPointers () #7 0x00462adc in ginEntryInsert () #8 0x0046cad0 in ginInsertCleanup () #9 0x0046d7c6 in ginHeapTupleFastInsert () #10 0x00462cfa in gininsert () #11 0x00722b33 in FunctionCall6Coll () #12 0x0048bdff in index_insert () #13 0x00587595 in ExecInsertIndexTuples () #14 0x005948e1 in ExecModifyTable () #15 0x0057e718 in ExecProcNode () #16 0x0057d512 in standard_ExecutorRun () [...] Armed with this the client identified a single gin index (on an array of text) that could have been involved, and on removing the index the problem simply went away, and they have now been live on 9.3 for 36 hours without a mishap. The client is attempting to create a self-contained reproducible test, but understandably are not going to re-add the index to their production system. I'm not terribly familiar with the gin code, so I'm reporting this in the hope that other people who are more familiar with it than I am might know where to look for a potential race condition or other bug that might have caused this. 
That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem, which usually makes it *far* too big b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency. The fix is usually to just turn FASTUPDATE off for all indexes. If it's done after the initial creation, the table should be vacuumed afterwards to flush it. There's previously been talk about changing the limits to something more reasonable but it got stalled in bikeshedding IIRC. So this doesn't work just when it might be most useful? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] truncating pg_multixact/members
Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: In this new version, I added a couple of fields to VacuumStmt node. How strongly do we feel this would cause an ABI break? Would we be more comfortable if I put them at the end of the struct for 9.3 instead? In the past we've usually added such members at the end of the struct in back branches (but put them in the logical place in HEAD). I'd recommend doing that just on principle. Okay. Also, AutoVacOpts (used as part of reloptions) gained three extra fields. Since this is in the middle of StdRdOptions, it'd be somewhat more involved to put these at the end of that struct. This might be a problem if somebody has a module calling RelationIsSecurityView(). If anyone thinks we should be concerned about such an ABI change, please shout quickly. That sounds problematic --- surely StdRdOptions might be something extensions are making use of? So can we assume that security_barrier is the only thing to be concerned about? If so, the attached patch should work around the issue by placing it in the same physical location. I guess if there are modules that add extra stuff beyond StdRdOptions, this wouldn't work, but I'm not really sure how likely this is given that our reloptions design hasn't proven to be the most extensible thing in the world. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

*** a/src/include/utils/rel.h
--- b/src/include/utils/rel.h
*************** typedef struct RelationData
*** 187,193 ****
   * be applied to relations that use this format or a superset for
   * private options data.
   */
! /* autovacuum-related reloptions. */
  typedef struct AutoVacOpts
  {
  	bool		enabled;
--- 187,203 ----
   * be applied to relations that use this format or a superset for
   * private options data.
   */
! /*
!  * autovacuum-related reloptions.
!  *
!  * In 9.3 starting from 9.3.3 we use a different struct definition,
!  * accommodating security_barrier inside the autovacuum struct, so that new
!  * fields could be added at the end.  This is so that third-party modules
!  * compiled with the original definition of the struct can continue to
!  * access the security_barrier field in its original physical location,
!  * avoiding an ABI break.  (9.4 and up use the normal definition, where
!  * security_barrier is placed outside autovacuum options.)
!  */
  typedef struct AutoVacOpts
  {
  	bool		enabled;
*************** typedef struct AutoVacOpts
*** 200,213 ****
  	int			freeze_table_age;
  	float8		vacuum_scale_factor;
  	float8		analyze_scale_factor;
  } AutoVacOpts;
  
  typedef struct StdRdOptions
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
  	int			fillfactor;		/* page fill factor in percent (0..100) */
! 	AutoVacOpts autovacuum;		/* autovacuum-related options */
! 	bool		security_barrier;	/* for views */
  } StdRdOptions;
  
  #define HEAP_MIN_FILLFACTOR			10
--- 210,226 ----
  	int			freeze_table_age;
  	float8		vacuum_scale_factor;
  	float8		analyze_scale_factor;
+ 	bool		security_barrier;
+ 	int			multixact_freeze_min_age;
+ 	int			multixact_freeze_max_age;
+ 	int			multixact_freeze_table_age;
  } AutoVacOpts;
  
  typedef struct StdRdOptions
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
  	int			fillfactor;		/* page fill factor in percent (0..100) */
! 	AutoVacOpts autovacuum;		/* autovacuum -- includes security_barrier */
  } StdRdOptions;
  
  #define HEAP_MIN_FILLFACTOR			10
*************** typedef struct StdRdOptions
*** 238,247 ****
  /*
   * RelationIsSecurityView
   *		Returns whether the relation is security view, or not
   */
  #define RelationIsSecurityView(relation) \
  	((relation)->rd_options ?		\
! 	 ((StdRdOptions *) (relation)->rd_options)->security_barrier : false)
--- 251,264 ----
  /*
   * RelationIsSecurityView
   *		Returns whether the relation is security
Re: [HACKERS] issue with gininsert under very high load
On February 12, 2014 9:33:38 PM CET, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote: On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM. That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem which usually makes it *far* too big b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency. I'm not sure that what Andrew is describing can fairly be called a concurrent-performance problem. It sounds closer to a stuck lock. Are you sure you've diagnosed it correctly? No. But I've several times seen similar backtraces where it wasn't actually stuck, just livelocked. I'm just on my mobile right now, but AFAIR Andrew described a loop involving lots of semaphores and spinlocks, which shouldn't be the case if it were actually stuck. If there are dozens of processes waiting on the same lock, cleaning up a large number of items one by one, it's not surprising if it's dramatically slow. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] issue with gininsert under very high load
On 02/12/2014 10:50 PM, Andres Freund wrote: On February 12, 2014 9:33:38 PM CET, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote: On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM. That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem, which usually makes it *far* too big b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency. I'm not sure that what Andrew is describing can fairly be called a concurrent-performance problem. It sounds closer to a stuck lock. Are you sure you've diagnosed it correctly? No. But I've several times seen similar backtraces where it wasn't actually stuck, just livelocked. I'm just on my mobile right now, but afair Andrew described a loop involving lots of semaphores and spinlocks; that shouldn't be the case if it were actually stuck. If there are dozens of processes waiting on the same lock, cleaning up a large number of items one by one, it's not surprising if it's dramatically slow. Perhaps we should use a lock to enforce that only one process tries to clean up the pending list at a time. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2/11/14, 7:04 PM, Craig Ringer wrote: I don't see any use for that with plperl, but it might be a valid thing to be doing for (e.g.) hstore.dll. Though you can't really link to it from another module anyway, you have to go through the fmgr to get access to its symbols at runtime, so we can probably just skip generation of import libraries for contribs and PLs. There are cases where one module needs symbols from another directly. Would that be affected by this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Terminating pg_basebackup background streamer
On 2/12/14, 12:47 PM, Magnus Hagander wrote: Since there were no other objections, I've applied this patch. I'm getting a compiler warning: pg_basebackup.c:105:3: error: implicit declaration of function 'kill' [-Werror=implicit-function-declaration] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On February 12, 2014 10:23:21 PM CET, Peter Eisentraut pete...@gmx.net wrote: On 2/11/14, 7:04 PM, Craig Ringer wrote: I don't see any use for that with plperl, but it might be a valid thing to be doing for (e.g.) hstore.dll. Though you can't really link to it from another module anyway, you have to go through the fmgr to get access to its symbols at runtime, so we can probably just skip generation of import libraries for contribs and PLs. There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
I wrote: What I think we probably want to do is forcibly cause the target page to exist, using a P_NEW loop like what I committed, and then decide on the basis of whether it's all-zeroes whether to consider it invalid or not. This seems sane on the grounds that it's just the extension to the page level of the existing policy of creating the file whether it existed or not. It could only result in a large amount of wasted work if the passed-in target block is insane --- but since we got it out of a CRC-checked WAL record, I think it's safe to not worry too much about that. Like the attached. A possible complaint is that if the WAL sequence contains updates against large relations that are later dropped, this will waste time and disk space replaying those updates as best it can. Doesn't seem like that's a case to optimize for, however. regards, tom lane diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index f1918f6..7a820c0 100644 *** a/src/backend/access/transam/xlogutils.c --- b/src/backend/access/transam/xlogutils.c *** XLogReadBuffer(RelFileNode rnode, BlockN *** 277,297 * XLogReadBufferExtended * Read a page during XLOG replay * ! * This is functionally comparable to ReadBufferExtended. There's some ! * differences in the behavior wrt. the mode argument: * ! * In RBM_NORMAL mode, if the page doesn't exist, or contains all-zeroes, we ! * return InvalidBuffer. In this case the caller should silently skip the ! * update on this page. (In this situation, we expect that the page was later ! * dropped or truncated. If we don't see evidence of that later in the WAL ! * sequence, we'll complain at the end of WAL replay.) * * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the * relation is extended with all-zeroes pages up to the given block number. * ! * In RBM_NORMAL_NO_LOG mode, we return InvalidBuffer if the page doesn't ! * exist, and we don't check for all-zeroes. Thus, no log entry is made ! 
* to imply that the page should be dropped or truncated later. */ Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, --- 277,307 * XLogReadBufferExtended * Read a page during XLOG replay * ! * This is functionally comparable to ReadBufferExtended, except that we ! * are willing to create the target page (and indeed the whole relation) ! * if it doesn't currently exist. This allows safe replay of WAL sequences ! * in which a relation was later dropped or truncated. * ! * The mode argument provides some control over this behavior. (See also ! * ReadBufferExtended's specification of what the modes do.) * * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the * relation is extended with all-zeroes pages up to the given block number. + * These modes should be used if the caller is going to initialize the page + * contents from scratch, and doesn't need it to be valid already. * ! * In RBM_NORMAL mode, we similarly create the page if needed, but if the ! * page contains all zeroes (including the case where we just created it), ! * we return InvalidBuffer. Then the caller should silently skip the update ! * on this page. This mode should be used for incremental updates where the ! * caller needs to see a valid page. (In this case, we expect that the page ! * later gets dropped or truncated. If we don't see evidence of that later in ! * the WAL sequence, we'll complain at the end of WAL replay.) ! * ! * RBM_NORMAL_NO_LOG mode is like RBM_NORMAL except that we will return an ! * all-zeroes page, and not log it as one that ought to get dropped later. ! * This mode is for when the caller wants to read a page that might validly ! * contain zeroes. 
*/ Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, *** XLogReadBufferExtended(RelFileNode rnode *** 299,304 --- 309,315 { BlockNumber lastblock; Buffer buffer; + bool present; SMgrRelation smgr; Assert(blkno != P_NEW); *** XLogReadBufferExtended(RelFileNode rnode *** 316,342 */ smgrcreate(smgr, forknum, true); lastblock = smgrnblocks(smgr, forknum); if (blkno < lastblock) { /* page exists in file */ buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno, mode, NULL); } else { ! /* hm, page doesn't exist in file */ ! if (mode == RBM_NORMAL) ! { ! log_invalid_page(rnode, forknum, blkno, false); ! return InvalidBuffer; ! } ! if (mode == RBM_NORMAL_NO_LOG) ! return InvalidBuffer; ! /* OK to extend the file */ /* we do this in recovery only - no rel-extension lock needed */ Assert(InRecovery); buffer = InvalidBuffer; do { --- 327,357 */ smgrcreate(smgr, forknum, true); + /* + * On the same principle, if the page doesn't already exist in the
Re: [HACKERS] Terminating pg_basebackup background streamer
On Wed, Feb 12, 2014 at 10:28 PM, Peter Eisentraut pete...@gmx.net wrote: On 2/12/14, 12:47 PM, Magnus Hagander wrote: Since there were no other objections, I've applied this patch. I'm getting a compiler warning: pg_basebackup.c:105:3: error: implicit declaration of function 'kill' [-Werror=implicit-function-declaration] What platform is that? And do you know which header the declaration actually lives in? I don't see it here... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: On February 12, 2014 10:23:21 PM CET, Peter Eisentraut pete...@gmx.net wrote: There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. No, I've not touched the PGDLLIMPORT macros. I was hoping to, but it looks like we're not getting there :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] memory usage of pg_upgrade
On Mon, Feb 3, 2014 at 09:14:10PM -0500, Bruce Momjian wrote: On Mon, Sep 9, 2013 at 07:39:00PM -0400, Bruce Momjian wrote: In the case of tablespaces, I should have thought you could keep a hash table of the names and just store an entry id in the table structure. But that's just my speculation without actually looking at the code, so don't take my word for it :-) Yes, please feel free to improve the code. I improved pg_upgrade CPU usage for a large number of objects, but never thought to look at memory usage. It would be a big win to just palloc/pfree the memory, rather than allocate tons of memory. If you don't get to it, I will in a few weeks. Thank you for pointing out this problem. I have researched the cause and the major problem was that I was allocating the maximum path length in a struct rather than allocating just the length I needed, and was not reusing string pointers that I knew were not going to change. The updated attached patch significantly decreases memory consumption:

tables   orig          patch       % decrease
1        27,168 kB     27,168 kB    0
1k       46,136 kB     27,920 kB   39
2k       65,224 kB     28,796 kB   56
4k       103,276 kB    30,472 kB   70
8k       179,512 kB    33,900 kB   81
16k      331,860 kB    40,788 kB   88
32k      636,544 kB    54,572 kB   91
64k      1,245,920 kB  81,876 kB   93

As you can see, a database with 64k tables shows a 93% decrease in memory use. I plan to apply this for PG 9.4. Patch applied. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2/12/14, 4:30 PM, Andres Freund wrote: There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. It works reasonably well on other platforms. Of course, we can barely build extension modules on Windows, so maybe this is a bit much to ask. But as long as we're dealing only with functions, not variables, it should work without any dllimport dances, right? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Peter Eisentraut pete...@gmx.net writes: There are cases where one module needs symbols from another directly. Would that be affected by this? It works reasonably well on other platforms. Of course, we can barely build extension modules on Windows, so maybe this is a bit much to ask. But as long as we're dealing only with functions, not variables, it should work without any dllimport dances, right? AFAIK we've not changed anything that would affect that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [DOCS] [HACKERS] Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
On 2/8/14, 4:41 PM, Tom Lane wrote: diff --git a/HISTORY b/HISTORY index ...360c7f6 . *** a/HISTORY --- b/HISTORY *** *** 0 --- 1,6 + Release notes for all versions of PostgreSQL can be found on-line at + http://www.postgresql.org/docs/devel/static/release.html Should be current instead of devel? + + In a distribution file set, release notes for the current version can be + found prebuilt under doc/src/sgml/html/. Visit the index.html file with + an HTML browser, then consult the Release Notes appendix. You can point them straight at doc/src/sgml/html/release.html. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On February 12, 2014 10:35:06 PM CET, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On February 12, 2014 10:23:21 PM CET, Peter Eisentraut pete...@gmx.net wrote: There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. No, I've not touched the PGDLLIMPORT macros. I was hoping to, but it looks like we're not getting there :-( Right, that was just the test patch... Then the macros we're using in fmgr.h for the magic macros (even if not strictly needed) should work for Peter's case. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Terminating pg_basebackup background streamer
On February 12, 2014 10:34:47 PM CET, Magnus Hagander mag...@hagander.net wrote: On Wed, Feb 12, 2014 at 10:28 PM, Peter Eisentraut pete...@gmx.net wrote: On 2/12/14, 12:47 PM, Magnus Hagander wrote: Since there were no other objections, I've applied this patch. I'm getting a compiler warning: pg_basebackup.c:105:3: error: implicit declaration of function 'kill' [-Werror=implicit-function-declaration] What platform is that? And do you know which header the declaration actually lives in? I don't see it here... It should be in the signal.h include you added a bit later, according to POSIX. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Terminating pg_basebackup background streamer
On 2/12/14, 4:34 PM, Magnus Hagander wrote: On Wed, Feb 12, 2014 at 10:28 PM, Peter Eisentraut pete...@gmx.net mailto:pete...@gmx.net wrote: On 2/12/14, 12:47 PM, Magnus Hagander wrote: Since there were no other objections, I've applied this patch. I'm getting a compiler warning: pg_basebackup.c:105:3: error: implicit declaration of function 'kill' [-Werror=implicit-function-declaration] What platform is that? And do you know which header the declaration actually lives in? I don't see it here... OS X, signal.h according to man page -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Partial sort
On Mon, Feb 10, 2014 at 8:59 PM, Alexander Korotkov aekorot...@gmail.com wrote: Done. Patch is splitted. Thanks! I think the 1st patch now has a bug in initial_cost_mergejoin; you still pass the presorted_keys argument to cost_sort, making it calculate a partial sort cost, but generated plans never use partial sort. I think 0 should be passed instead. Patch attached, needs to be applied on top of partial-sort-basic-1 and then reverse-applied on partial-sort-merge-1. With partial-sort-basic-1 and this fix on the same test suite, the planner overhead is now a more manageable 0.5% to 1.3%; one test is faster by 0.5%. Built with asserts disabled, ran on Intel i5-3570K. In an effort to reduce variance, I locked the server and pgbench to a single CPU core (taskset -c 3), but there are still noticeable run-to-run differences, so these numbers are a bit fuzzy. The faster result is definitely not a fluke, however; it happens every time. On Mon, Feb 10, 2014 at 2:33 PM, Marti Raudsepp ma...@juffo.org wrote: AFAICT this only happens once per plan and the overhead is O(n) to the number of pathkeys? I was of course wrong about that, it also adds extra overhead when iterating over the paths list. 
Test taskset -c 3 run2.sh from https://github.com/intgr/benchjunk/tree/master/partial_sort

Overhead percentages (between best of each 3 runs):
join5.sql 0.7
star5.sql 0.8
line5.sql 0.5
lim_join5.sql -0.5
lim_star5.sql 1.3
lim_line5.sql 0.5
limord_join5.sql 0.6
limord_star5.sql 0.5
limord_line5.sql 0.7

Raw results:
git 48870dd
join5.sql tps = 509.328070 (excluding connections establishing)
join5.sql tps = 509.772190 (excluding connections establishing)
join5.sql tps = 510.651517 (excluding connections establishing)
star5.sql tps = 499.208698 (excluding connections establishing)
star5.sql tps = 498.200314 (excluding connections establishing)
star5.sql tps = 496.269315 (excluding connections establishing)
line5.sql tps = 797.968831 (excluding connections establishing)
line5.sql tps = 797.011690 (excluding connections establishing)
line5.sql tps = 796.379258 (excluding connections establishing)
lim_join5.sql tps = 394.946024 (excluding connections establishing)
lim_join5.sql tps = 395.417689 (excluding connections establishing)
lim_join5.sql tps = 395.482958 (excluding connections establishing)
lim_star5.sql tps = 423.434393 (excluding connections establishing)
lim_star5.sql tps = 423.774305 (excluding connections establishing)
lim_star5.sql tps = 424.386099 (excluding connections establishing)
lim_line5.sql tps = 733.007330 (excluding connections establishing)
lim_line5.sql tps = 731.794731 (excluding connections establishing)
lim_line5.sql tps = 732.356280 (excluding connections establishing)
limord_join5.sql tps = 385.317921 (excluding connections establishing)
limord_join5.sql tps = 385.915870 (excluding connections establishing)
limord_join5.sql tps = 384.747848 (excluding connections establishing)
limord_star5.sql tps = 417.992615 (excluding connections establishing)
limord_star5.sql tps = 416.944685 (excluding connections establishing)
limord_star5.sql tps = 418.262647 (excluding connections establishing)
limord_line5.sql tps = 708.979203 (excluding connections establishing)
limord_line5.sql tps = 710.926866 (excluding connections establishing)
limord_line5.sql tps = 710.928907 (excluding connections establishing)

48870dd + partial-sort-basic-1.patch.gz + fix-cost_sort.patch
join5.sql tps = 505.488181 (excluding connections establishing)
join5.sql tps = 507.222759 (excluding connections establishing)
join5.sql tps = 506.549654 (excluding connections establishing)
star5.sql tps = 495.432915 (excluding connections establishing)
star5.sql tps = 494.906793 (excluding connections establishing)
star5.sql tps = 492.623808 (excluding connections establishing)
line5.sql tps = 789.315968 (excluding connections establishing)
line5.sql tps = 793.875456 (excluding connections establishing)
line5.sql tps = 790.545990 (excluding connections establishing)
lim_join5.sql tps = 396.956732 (excluding connections establishing)
lim_join5.sql tps = 397.515213 (excluding connections establishing)
lim_join5.sql tps = 397.578669 (excluding connections establishing)
lim_star5.sql tps = 417.459963 (excluding connections establishing)
lim_star5.sql tps = 418.024803 (excluding connections establishing)
lim_star5.sql tps = 418.830234 (excluding connections establishing)
lim_line5.sql tps = 729.186915 (excluding connections establishing)
lim_line5.sql tps = 726.288788 (excluding connections establishing)
lim_line5.sql tps = 728.123296 (excluding connections establishing)
limord_join5.sql tps = 383.484767 (excluding connections establishing)
limord_join5.sql tps = 383.021960 (excluding connections establishing)
limord_join5.sql tps = 383.722051 (excluding connections establishing)
limord_star5.sql tps = 414.138460 (excluding connections establishing)
limord_star5.sql tps = 414.063766 (excluding connections establishing)
limord_star5.sql
Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid
On Wed, Sep 11, 2013 at 02:10:45PM -0400, Robert Haas wrote: On Tue, Sep 10, 2013 at 11:33 PM, Noah Misch n...@leadboat.com wrote: I'm thinking to preserve postmaster.pid at immediate shutdown in all released versions, but I'm less sure about back-patching a change to make PGSharedMemoryCreate() pickier. On the one hand, allowing startup to proceed with backends still active in the same data directory is a corruption hazard. On the other hand, it could break weird shutdown/restart patterns that permit trivial lifespan overlap between backends of different postmasters. Opinions? I'm more sanguine about the second change than the first. Leaving postmaster.pid around seems like a clear user-visible behavior change that could break user scripts or have other consequences that we don't foresee; thus, I would vote against back-patching it. Indeed, I'm not sure it's a good idea to do that even in master. On the other hand, tightening the checks in PGSharedMemoryCreate() seems very much worth doing, and I think it might also be safe enough to back-patch. Were these changes ever applied? I don't see them. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/12/2014 02:31 PM, Inoue, Hiroshi wrote: Maybe this is one of the few use cases of dlltool. Because python doesn't ship with its MINGW import library, the Makefile uses dlltool to generate an import library from the python DLL. Wow. Has anyone been in touch with the Python package maintainers to see if they can fix that and bundle the .lib ? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/13/2014 12:39 AM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-12 11:26:56 -0500, Tom Lane wrote: Hm. So if we're giving up on the idea of ever getting rid of PGDLLIMPORT, we ought to actually remove that, so that the Cygwin build works more like the other Windows builds? Hm, I don't see a big advantage in detecting the errors as It won't hugely increase the buildfarm coverage. But --auto-import seems to require marking the .text sections as writable, avoiding that seems to be a good idea if we don't need it anymore. Given the rather small number of Windows machines in the buildfarm, I'd really like them all to be on the same page about what's valid and what isn't. Having to wait ~24 hours to find out if they're all happy with something isn't my idea of a good time. In any case, as long as we have discrepancies between them, I'm not going to be entirely convinced that any of them are telling the full truth. I'm going to go remove that switch and see if brolga starts failing. If it does, I'll be satisfied that we understand the issues here. I wouldn't assume that _anything_ Cygwin does makes sense, or is consistent with anything else. Its attempts to produce UNIX-like behaviour on Windows cause some truly bizarre behaviour at times. It is not totally unfair to compare the level of weirdness of Cygwin to that of WINE, though they operate in completely different ways. I wouldn't suggest making any conclusions about the _other_ platforms based on Cygwin. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [DOCS] [HACKERS] Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
Peter Eisentraut pete...@gmx.net writes: On 2/8/14, 4:41 PM, Tom Lane wrote: + Release notes for all versions of PostgreSQL can be found on-line at + http://www.postgresql.org/docs/devel/static/release.html Should be current instead of devel? + + In a distribution file set, release notes for the current version can be + found prebuilt under doc/src/sgml/html/. Visit the index.html file with + an HTML browser, then consult the Release Notes appendix. You can point them straight at doc/src/sgml/html/release.html. Done and done. I also noticed that these instructions were wrong anyway for 8.4, which still has the embedded-tarball HTML docs, so I adjusted the text for that version. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/13/2014 05:23 AM, Peter Eisentraut wrote: On 2/11/14, 7:04 PM, Craig Ringer wrote: I don't see any use for that with plperl, but it might be a valid thing to be doing for (e.g.) hstore.dll. Though you can't really link to it from another module anyway, you have to go through the fmgr to get access to its symbols at runtime, so we can probably just skip generation of import libraries for contribs and PLs. There are cases where one module needs symbols from another directly. Would that be affected by this? Yes, in that you cannot link directly to another DLL on Windows (without hoop jumping and lots of pain); you link to the import library. So if we don't generate an import library then (e.g.) MyExtension cannot link to hstore.dll. It can still look up function exports via the fmgr. As concluded upthread, it's easier to just generate import libraries for everything since we need it for the client library, so nothing's going to change anyway. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Craig Ringer cr...@2ndquadrant.com writes: On 02/12/2014 02:31 PM, Inoue, Hiroshi wrote: Maybe this is one of the few use cases of dlltool. Because python doesn't ship with its MINGW import library, the Makefile uses dlltool to generate an import library from the python DLL. Wow. Has anyone been in touch with the Python package maintainers to see if they can fix that and bundle the .lib ? Indeed somebody should pester them about that, but it's not clear how much it'd help us even if they fixed it tomorrow. We're going to have to be able to build with existing Python distributions for a long time to come. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/13/2014 05:35 AM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On February 12, 2014 10:23:21 PM CET, Peter Eisentraut pete...@gmx.net wrote: There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. No, I've not touched the PGDLLIMPORT macros. I was hoping to, but it looks like we're not getting there :-( In theory we could now remove the __declspec(dllexport) case for BUILDING_DLL, as it should now be redundant with the fixed .DEF generator. Should. Unfortunately it looks like there's not going to be any getting around having something that can turn into __declspec(dllimport) in the headers for compiling things that link to postgres.exe, though. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com: 7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups); can go wrong when only one tuple is present in the block with an item pointer equal to the one we are searching for in the forward scan direction. It shouldn't happen, because the first or second ItemPointerCompare will handle the condition. Please assume cchunk->ntups == 1. In this case, any given ctid shall match either of them, because any ctid is less, equal or larger than the only cached tuple; thus, it moves to the right or left node according to the scan direction. Yes, you are correct. Sorry for the noise. 8. I am not able to find a protection mechanism in insert/delete etc. of a tuple in the T-tree. As this is shared memory, it can cause problems. For design simplification, I put a giant lock per columnar-cache. So, routines in cscan.c acquire an exclusive lwlock prior to invocation of ccache_insert_tuple / ccache_delete_tuple. Correct. But this lock can be a bottleneck for concurrency. Better to analyze the same once we have the performance report. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Making strxfrm() blobs in indexes work
On Sun, Feb 02, 2014 at 05:09:06PM -0800, Peter Geoghegan wrote: However, it also occurs to me that strxfrm() blobs have another useful property: We (as, say, the author of an equality operator on text, an operator intended for a btree operator class) *can* trust a strcmp()'s result on blobs, provided the result isn't 0/equal, *even if the blobs are truncated*. So maybe a better scheme, and certainly a simpler one, would be to have a pseudo-attribute in inner pages only with, say, the first 8 bytes of a strxfrm() blob formed from the logically-leading text attribute of the same indexTuple. Because we're only doing this on inner pages, there is a very good chance that that will be good enough most of the time. This also allows us to reduce bloat very effectively. (A bit late to the party). This idea has come up before and the most annoying thing is that braindead strxfrm API. Namely, to strxfrm a large string you need to strxfrm it completely even if you only want the first 8 bytes. That said, since btree index tuples are limited to 3k anyway, the overhead probably isn't that bad. I think it would make a noticeable difference if it can be made to work. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/13/2014 05:35 AM, Peter Eisentraut wrote: On 2/12/14, 4:30 PM, Andres Freund wrote: There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. It works reasonably well on other platforms. Of course, we can barely build extension modules on Windows, so maybe this is a bit much to ask. But as long as we're dealing only with functions, not variables, it should work without any dllimport dances, right? Don't think so. If you don't have __declspec(dllexport) or a .DEF file marking something as exported, it's not part of the DLL interface at all, it's like you compiled it with gcc using -fvisibility=hidden and didn't give the symbol __attribute__((visibility("default"))). If you _do_ have the symbol exported from the DLL, using __declspec(dllimport) or a generated .DEF file that exposes all externs, you can link to the symbol. However, from the reading I've done recently, I'm pretty sure that if you fail to declare __declspec(dllimport) on the importing side, you actually land up statically linking to a thunk function that in turn calls the real function in the DLL. So it works, but at a performance cost. So you should do the dance. Sorry. It gets worse, too. Say you want hstore to export a couple of symbols. Those symbols must be __declspec(dllexport) while everything else in headers must be __declspec(dllimport). This means you can't just use PGDLLIMPORT. You must define a HSTOREDLLIMPORT that's __declspec(dllexport) when compiling hstore and otherwise __declspec(dllimport). Then set a preprocessor macro like -DCOMPILING_HSTORE to trigger it.
Even though we're generating .DEF files, I don't think we can avoid that, because we at minimum need to have the hstore exported symbols *not* annotated __declspec(dllimport) when compiling hstore, but have them so annotated when importing the header into anything else. (Come to think of it, how're we dealing with this in libpq? I wonder if postgres is linking to libpq using thunk functions?) -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making strxfrm() blobs in indexes work
On Wed, Feb 12, 2014 at 3:30 PM, Martijn van Oosterhout klep...@svana.org wrote: (A bit late to the party). This idea has come up before and the most annoying thing is that braindead strxfrm api. Namely, to strxfrm a large strings you need to strxfrm it completely even if you only want the first 8 bytes. I think that in general strxfrm() must have that property. However, it does not obligate us to store the entire string, provided that we don't trust blob comparisons that indicate equality (since I believe that we cannot do so generally with a non-truncated blob, we might as well take advantage of this, particularly given we're doing this with already naturally dissimilar inner pages, where we'll mostly get away with truncation provided there is a reliable tie-breaker). Besides all this, I'm not particularly worried about the cost of calling strxfrm() if that only has to occur when a page split occurs, as we insert a downlink into the parent page. The big picture here is that we can exploit the properties of inner pages to do the smallest amount of work for the largest amount of benefit. -- Peter Geoghegan
Re: [HACKERS] narwhal and PGDLLIMPORT
Craig Ringer cr...@2ndquadrant.com writes: However, from the reading I've done recently, I'm pretty sure that if you fail to declare __declspec(dllimport) on the importing side, you actually land up statically linking to a thunk function that in turn calls the real function in the DLL. So it works, but at a performance cost. Meh. I'm not really willing to go through the pushups that would be needed to deal with that, even if it can be shown that the performance cost is noticeable. We've already kluged our source code for Windows far beyond what anybody would tolerate for any other platform, and I find that to be quite enough bending over for Redmond. regards, tom lane
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
Hello, I've marked this patch as 'Ready for committer'. To be honest, I see no harm in changing the name as per your suggestion, as it can improve segment naming for dynamic shared memory segments, however there is no clear problem with the current name either, so I don't want to change places this patch has no relation to. Okay, let's go with it as it is. I think the best thing to do here is to put it as Notes To Committer, something like: Some suggestions for Committer to consider: Change the name of dsm segments from .. to .. In general, what I see is that they consider all discussion in a thread, but putting some special notes like the above will reduce the chance of it getting overlooked by them. I have done this as a reviewer previously and it worked well. Thank you for the suggestion. However, it is a bit different thing from this patch so I have no intention to compel to do the changing. Thanks to you for understanding my position. Thanks for reviewing the patch so carefully, especially the Windows part which I think was a bit tricky for you to set up. It's my pleasure, and I learned a lot. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-13 07:58:09 +0800, Craig Ringer wrote: On 02/13/2014 05:35 AM, Peter Eisentraut wrote: On 2/12/14, 4:30 PM, Andres Freund wrote: There are cases where one module needs symbols from another directly. Would that be affected by this? I don't think we have real infrastructure for that yet. Neither from the POV of loading several .so's, nor from a symbol visibility. Afaics we'd need a working definition of PGDLLIMPORT which inverts the declspecs. I think Tom just removed the remnants of that. It works reasonably well on other platforms. Of course, we can barely build extension modules on Windows, so maybe this is a bit much to ask. But as long as we're dealing only with functions, not variables, it should work without any dllimport dances, right? Don't think so. If you don't have __declspec(dllexport) or a .DEF file marking something as exported, it's not part of the DLL interface at all, it's like you compiled it with gcc using -fvisibility=hidden and didn't give the symbol __attribute__((visibility (default)) . If you _do_ have the symbol exported from the DLL, using __declspec(dllimport) or a generated .DEF file that exposes all externs, you can link to the symbol. However, from the reading I've done recently, I'm pretty sure that if you fail to declare __declspec(dllimport) on the importing side, you actually land up statically linking to a thunk function that in turn calls the real function in the DLL. So it works, but at a performance cost. So you should do the dance. Sorry. I don't think the thunk function will have such a high overhead in this day and age. And it's what we essentially already do for all functions called *from* extensions, no? It gets worse, too. Say you want hstore to export a couple of symbols. Those symbols must be __declspec(dllexport) while everything else in headers must be __declspec(dllimport). This means you can't just use PGDLLIMPORT. 
You must define a HSTOREDLLIMPORT that's __declspec(dllexport) when compiling hstore and otherwise __declspec(dllimport). Then set a preprocessor macro like -DCOMPILING_HSTORE to trigger it. We actually have a macro that should do that, namely PGDLLEXPORT. I am not sure, though, why it's not dependent on BUILDING_DLL (which is absolutely horribly misnamed, being essentially inverted). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/13/2014 08:26 AM, Andres Freund wrote: It gets worse, too. Say you want hstore to export a couple of symbols. Those symbols must be __declspec(dllexport) while everything else in headers must be __declspec(dllimport). This means you can't just use PGDLLIMPORT. You must define a HSTOREDLLIMPORT that's __declspec(dllexport) when compiling hstore and otherwise __declspec(dllimport). Then set a preprocessor macro like -DCOMPILING_HSTORE to trigger it. We actually have a macro that should do that, namely PGDLLEXPORT. I am not sure, though, why it's not dependent on BUILDING_DLL (which is absolutely horribly misnamed, being essentially inverted). Yes, it does that *for building postgres*. What I'm saying is that you need another one for hstore if you wish to be able to export symbols from hstore and import them into other libs. You can't use PGDLLEXPORT because BUILDING_DLL will be unset, so it'll expand to __declspec(dllimport) while compiling hstore. Which is wrong if you want to be exporting symbols; even if you use a .DEF file, it must expand to nothing, and if you don't use a .DEF it must expand to __declspec(dllexport) ... but ONLY for those symbols exported by hstore itself. So each lib that wants to export symbols needs its own equivalent of PGDLLIMPORT, and a flag set just when compiling that lib. This is the normal way it's done on Windows builds, not stuff I'm looking into and saying how _I_ think we should do it. AFAIK this is just how it's done on Windows, horrible as that is. I'll see if I can find a couple of relevant links. BTW, BUILDING_DLL is so-named because the macro definition will be taken from examples that talk about compiling a DLL, that's all. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
(2014/02/12 12:28), Inoue, Hiroshi wrote: (2014/02/12 8:30), Tom Lane wrote: I wrote: Hiroshi Inoue in...@tpf.co.jp writes: I tried the MINGW port with the attached change and successfully built src and contrib, and all parallel regression tests were OK. I cleaned this up a bit (the if-nesting in Makefile.shlib was making my head hurt, not to mention that it left a bunch of dead code) and committed it. Hm ... according to buildfarm member narwhal, this doesn't work so well for plperl: gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -Wno-comment -shared -o plperl.dll plperl.o SPI.o Util.o -L../../../src/port -L../../../src/common -Wl,--allow-multiple-definition -L/mingw/lib -Wl,--as-needed -LC:/Perl/lib/CORE -lperl58 -L../../../src/backend -lpostgres -lpgcommon -lpgport -lintl -lxslt -lxml2 -lssleay32 -leay32 -lz -lm -lws2_32 -lshfolder -Wl,--export-all-symbols -Wl,--out-implib=libplperl.a Cannot export .idata$4: symbol not found Cannot export .idata$5: symbol not found Cannot export .idata$6: symbol not found Cannot export .text: symbol not found Cannot export perl58_NULL_THUNK_DATA: symbol not found Creating library file: libplperl.a collect2: ld returned 1 exit status make[3]: *** [plperl.dll] Error 1 Oops, I forgot to include plperl, tcl or python, sorry. I would retry build operations with them. Unfortunately it would take a pretty long time because build operations are pretty (or veeery on an old machine) slow. Not very clear what's going on there; could this be a problem in narwhal's admittedly-ancient toolchain? As for the build, plperl and pltcl are OK on both the Windows7+gcc4.6.1 machine and the Windows Vista+gcc3.4.5 machine. plpython is OK on the gcc4.6.1 machine but causes an *initializer element is not constant* error on the gcc3.4.5 machine. I've not run the regression tests yet.
regards, Hiroshi Inoue
Re: [HACKERS] psql should show disabled internal triggers
On Thu, Nov 21, 2013 at 11:59:51PM -0200, Fabrízio de Royes Mello wrote: On Fri, Oct 25, 2013 at 3:37 PM, fabriziomello fabriziome...@gmail.com wrote: On 2013-09-18 15:15:55 +0200, Bernd Helmle wrote: On 2013-09-18 15:15:55 +0200, Bernd Helmle wrote: --On 18. September 2013 13:52:29 +0200 Andres Freund andres@... wrote: If you do ALTER TABLE ... DISABLE TRIGGER ALL; and then individually re-enable the disabled triggers it's easy to miss internal triggers. A \d+ tablename will not show anything out of the ordinary for that situation since we don't show internal triggers. But foreign key checks won't work. So, how about displaying disabled internal triggers in psql? Hi, I had exactly the same concerns this morning while starting to look at the ENABLE/DISABLE constraint patch. However, I wouldn't display them as triggers, but maybe more generally as disabled constraints or such. Well, that will lead the user in the wrong direction, won't it? They haven't disabled the constraint but the trigger. Especially as we already have NOT VALID and might grow DISABLED for constraints themselves... Hi, The attached patch [1] enables psql to list internal disabled triggers in \d, only in versions >= 9.0. [1] psql-display-all-triggers-v1.patch http://postgresql.1045698.n5.nabble.com/file/n5775954/ psql-display-all-triggers-v1.patch As others, I am concerned about people being confused when funny-looking trigger names suddenly appear when they disable all table triggers. What I ended up doing is to create a user and an internal section when displaying disabled triggers: Disabled user triggers: check_update BEFORE UPDATE ON orders FOR EACH ROW EXECUTE PROCEDURE trigf() Disabled internal triggers: RI_ConstraintTrigger_c_16409 AFTER INSERT ON orders FROM customer NOT DEF ... RI_ConstraintTrigger_c_16410 AFTER UPDATE ON orders FROM customer NOT DEF ... I kept the Triggers section unchanged, showing only user triggers.
-- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c new file mode 100644 index 3cb8df2..a194ce7 *** a/src/bin/psql/describe.c --- b/src/bin/psql/describe.c *** describeOneTableDetails(const char *sche *** 2090,2104 printfPQExpBuffer(buf, SELECT t.tgname, pg_catalog.pg_get_triggerdef(t.oid%s), ! t.tgenabled\n FROM pg_catalog.pg_trigger t\n WHERE t.tgrelid = '%s' AND , (pset.sversion = 9 ? , true : ), ! oid); if (pset.sversion = 9) ! appendPQExpBufferStr(buf, NOT t.tgisinternal); else if (pset.sversion = 80300) ! appendPQExpBufferStr(buf, t.tgconstraint = 0); else appendPQExpBufferStr(buf, (NOT tgisconstraint --- 2090,2108 printfPQExpBuffer(buf, SELECT t.tgname, pg_catalog.pg_get_triggerdef(t.oid%s), ! t.tgenabled, %s\n FROM pg_catalog.pg_trigger t\n WHERE t.tgrelid = '%s' AND , (pset.sversion = 9 ? , true : ), ! (pset.sversion = 9 ? t.tgisinternal : ! pset.sversion = 80300 ? ! t.tgconstraint 0 AS tgisinternal : ! false AS tgisinternal), oid); if (pset.sversion = 9) ! /* display/warn about disabled internal triggers */ ! appendPQExpBuffer(buf, (NOT t.tgisinternal OR (t.tgisinternal AND t.tgenabled = 'D'))); else if (pset.sversion = 80300) ! appendPQExpBufferStr(buf, (t.tgconstraint = 0 OR (t.tgconstraint 0 AND t.tgenabled = 'D'))); else appendPQExpBufferStr(buf, (NOT tgisconstraint *** describeOneTableDetails(const char *sche *** 2124,2130 * disabled triggers and the two special ALWAYS and REPLICA * configurations. */ ! for (category = 0; category 4; category++) { have_heading = false; for (i = 0; i tuples; i++) --- 2128,2134 * disabled triggers and the two special ALWAYS and REPLICA * configurations. */ ! 
for (category = 0; category <= 4; category++) { have_heading = false; for (i = 0; i < tuples; i++) *** describeOneTableDetails(const char *sche *** 2133,2143 --- 2137,2149 const char *tgdef; const char *usingpos; const char *tgenabled; + const char *tgisinternal; /* * Check if this trigger falls into the current category */ tgenabled = PQgetvalue(result, i, 2); + tgisinternal = PQgetvalue(result, i, 3); list_trigger = false; switch (category) { *** describeOneTableDetails(const char *sche *** 2146,2159
Re: [HACKERS] Dead code or buggy code?
On Fri, Sep 20, 2013 at 12:13:18AM +0100, Greg Stark wrote: So I'm just going to make the code defensive and assume NULL is possible even if maybe it isn't. In case it's not clear, this is one of the things Xi Wang's tool picked up. There are not too many, but it turns out they are indeed not all in the adt code, so I might wait until after the commit fest to commit it to avoid causing bit churn. Uh, where are we on this? --- -- greg On 19 Sep 2013 12:52, Robert Haas robertmh...@gmail.com wrote: On Wed, Sep 18, 2013 at 6:20 PM, Greg Stark st...@mit.edu wrote: The following code is in ProcSleep at proc.c:1138. GetBlockingAutoVacuumPgproc() should presumably always return a vacuum pgproc entry since the deadlock state says it's blocked by autovacuum. But I'm not really familiar enough with this codepath to know whether there's not a race condition here where it can sometimes return null. The following code checks autovac != NULL but the PGXACT initializer would have seg faulted if it returned NULL, if that's possible. if (deadlock_state == DS_BLOCKED_BY_AUTOVACUUM && allow_autovacuum_cancel) { PGPROC *autovac = GetBlockingAutoVacuumPgproc(); PGXACT *autovac_pgxact = ProcGlobal->allPgXact[autovac->pgprocno]; LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); /* * Only do it if the worker is not working to protect against Xid * wraparound. */ if ((autovac != NULL) && (autovac_pgxact->vacuumFlags & PROC_IS_AUTOVACUUM) && !(autovac_pgxact->vacuumFlags & PROC_VACUUM_FOR_WRAPAROUND)) { Hmm, yeah. I remember noticing this some time ago but never got around to fixing it. +1 for rearranging things there somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On Wed, Feb 12, 2014 at 8:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Oh, wait a minute. It's not just a matter of whether we find the right block: we also have to consider whether XLogReadBufferExtended will apply the right mode behavior. Currently, it supposes that all pages past the initially observed EOF should be assumed to be uninitialized; but if we're working with an inconsistent database, that seems like an unsafe assumption. It might be that a page is there but we've not (yet) fixed the length of some preceding segment. If we want to not get bogus WAL contains references to invalid pages failures in such scenarios, it seems like we need a more invasive change than what I just committed. I think your patch didn't cover this consideration either. Hm. I *think* those cases would be handled anyways since the table would later be truncated. Arguably any reference after the short segment is a reference to an invalid page since it means it's a record which predates the records which caused the extension. Obviously it would still give the error in the case we had where files were missing but then probably it should give errors in that case. -- greg
Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones
On Fri, Nov 15, 2013 at 10:40:20AM +0200, Heikki Linnakangas wrote: The BTRFS_IOC_CLONE ioctl operates on file level and can be used to clone files anywhere in a btrfs filesystem. Hmm, you can also do cp --reflog -r data92 data-tmp I think you mean --reflink here. pg_upgrade --link --old-datadir=data92-copy --new-datadir=data-tmp rm -rf data-tmp That BTRFS_IOC_CLONE ioctl seems so hacky that I'd rather not get that in our source tree. cp --reflog is much more likely to get that magic incantation right, since it gets a lot more attention and testing than pg_upgrade. I'm not in favor of adding filesystem-specific tricks into pg_upgrade. It would be nice to list these tricks in the docs, though. I have applied the attached patch which suggests the use of file system snapshots and copy-on-write file copies. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml new file mode 100644 index 3d529b2..bb3b6a0 *** a/doc/src/sgml/pgupgrade.sgml --- b/doc/src/sgml/pgupgrade.sgml *** psql --username postgres --file script.s *** 569,575 the old server and run commandrsync/ again to update the copy with any changes to make it consistent. You might want to exclude some files, e.g. filenamepostmaster.pid/, as documented in xref !linkend=backup-lowlevel-base-backup. /para refsect2 --- 569,578 the old server and run commandrsync/ again to update the copy with any changes to make it consistent. You might want to exclude some files, e.g. filenamepostmaster.pid/, as documented in xref !linkend=backup-lowlevel-base-backup. If your file system supports !file system snapshots or copy-on-write file copying, you can use that !to make a backup of the old cluster, though the snapshot and copies !must be created simultaneously or while the database server is down. 
/para refsect2
Re: [HACKERS] Prevent pg_basebackup -Fp -D -?
On Thu, Oct 3, 2013 at 06:50:57AM +0200, Magnus Hagander wrote: On Oct 3, 2013 2:47 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Wed, Oct 2, 2013 at 11:31 PM, Magnus Hagander mag...@hagander.net wrote: Right now, if you use pg_basebackup -Ft -D - you get a tarfile, written to stdout, for redirection. However, if you use: pg_basebackup -Fp -D - you get a plaintext (unpackaged) backup, in a directory called -. I can't think of a single use case where this is a good idea. Therefore, I would suggest we simply throw an error in this case, instead of creating the directory. Only for the specific case of specifying exactly - as a directory. Comments? Isn't this a non-problem? This behavior is in line with the documentation, so I would suspect that if the directory name is specified as - in plain mode, it should create the folder with this name. Do you consider having a folder of this name an annoyance? Yes, that is exactly the point - I do consider that an annoyance, and I don't see the use case where you'd actually want it. I bet 100% of the users of that have been accidental, thinking they'd get the pipe, not the directory. Also, if we do that, is this something we should consider backpatchable? It's not strictly speaking a bugfix, but I'd say it fixes some seriously annoying behavior. This would change the spec of pg_basebackup, so no? Does the current behavior have potential security issues? No, there are no security issues that I can see. Just annoyance. And yes, I guess it would change the spec, so backpatching might be a bad idea.. Has this been fixed? If so, I don't see it. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] psql should show disabled internal triggers
On Thu, Feb 13, 2014 at 12:04 AM, Bruce Momjian br...@momjian.us wrote: On Thu, Nov 21, 2013 at 11:59:51PM -0200, Fabrízio de Royes Mello wrote: On Fri, Oct 25, 2013 at 3:37 PM, fabriziomello fabriziome...@gmail.com wrote: On 2013-09-18 15:15:55 +0200, Bernd Helmle wrote: On 2013-09-18 15:15:55 +0200, Bernd Helmle wrote: --On 18. September 2013 13:52:29 +0200 Andres Freund lt;andres@gt; wrote: If you do ALTER TABLE ... DISABLE TRIGGER ALL; and then individually re-enable the disabled triggers it's easy to miss internal triggers. A \d+ tablename will not show anything out of the ordinary for that situation since we don't show internal triggers. But foreign key checks won't work. So, how about displaying disabled internal triggers in psql? Hi had exactly the same concerns this morning while starting to look at the ENABLE/DISABLE constraint patch. However, i wouldn't display them as triggers, but maybe more generally as disabled constraints or such. Well, that will lead the user in the wrong direction, won't it? They haven't disabled the constraint but the trigger. Especially as we already have NOT VALID and might grow DISABLED for constraint themselves... Hi, The attached patch [1] enable PSQL to list internal disabled triggers in \d only in versions = 9.0. [1] psql-display-all-triggers-v1.patch http://postgresql.1045698.n5.nabble.com/file/n5775954/ psql-display-all-triggers-v1.patch As others, I am concerned about people being confused when funny-looking trigger names suddenly appearing when you disable all table triggers. What I ended up doing is to create a user and internal section when displaying disabled triggers: Disabled user triggers: check_update BEFORE UPDATE ON orders FOR EACH ROW EXECUTE PROCEDURE trigf() Disabled internal triggers: RI_ConstraintTrigger_c_16409 AFTER INSERT ON orders FROM customer NOT DEF ... RI_ConstraintTrigger_c_16410 AFTER UPDATE ON orders FROM customer NOT DEF ... I kept the Triggers section unchanged, showing only user triggers. 
I also updated the code to handle 8.3+ servers. Patch attached. Makes more sense than my previous patch... The code looks fine to me!! Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: On Wed, Feb 12, 2014 at 8:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Oh, wait a minute. It's not just a matter of whether we find the right block: we also have to consider whether XLogReadBufferExtended will apply the right mode behavior. Currently, it supposes that all pages past the initially observed EOF should be assumed to be uninitialized; but if we're working with an inconsistent database, that seems like an unsafe assumption. It might be that a page is there but we've not (yet) fixed the length of some preceding segment. If we want to not get bogus WAL contains references to invalid pages failures in such scenarios, it seems like we need a more invasive change than what I just committed. I think your patch didn't cover this consideration either. Hm. I *think* those cases would be handled anyways since the table would later be truncated. Arguably any reference after the short segment is a reference to an invalid page since it means it's a record which predates the records which caused the extension. Well, that would be the case if you assume perfectly sequential filesystem behavior, but I'm not sure the assumption holds if the starting condition is a base backup. We could be looking at a version of segment 1 that predates segment 2's existence, and yet see some data in segment 2 as well, because it's not a perfectly coherent snapshot. I think what you're arguing is that we should see WAL records filling the rest of segment 1 before we see any references to segment 2, but if that's the case then how did we get into the situation you reported? Or is it just that it was a broken base backup to start with? regards, tom lane
Re: [HACKERS] Row-security on updatable s.b. views
On 02/11/2014 08:19 PM, Yeb Havinga wrote: I compared the output of psql -ef of the minirim.sql script posted earlier in http://www.postgresql.org/message-id/52f54927.1040...@gmail.com between v4 and v7. Not everything is ok.

+psql:/home/m/minirim2.sql:409: ERROR:  attribute 6 has wrong type
+DETAIL:  Table has type tsrange, but query expects _confidentialitycode.

This looks like an issue with attribute transformations made when applying security barrier quals.

+psql:/home/m/minirim2.sql:439: connection to server was lost

That's dying with:

Program received signal SIGABRT, Aborted.
0x003f02c359e9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x003f02c359e9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x003f02c370f8 in __GI_abort () at abort.c:90
#2  0x00758d9d in ExceptionalCondition (conditionName=conditionName@entry=0x8b7bf0 "!(!bms_is_empty(upper_varnos))", errorType=errorType@entry=0x78e89c "FailedAssertion", fileName=fileName@entry=0x8b78d7 "subselect.c", lineNumber=lineNumber@entry=1394) at assert.c:54
#3  0x0061de2b in convert_EXISTS_sublink_to_join (root=<optimized out>, sublink=<optimized out>, under_not=under_not@entry=0 '\000', available_rels=0x117d038) at subselect.c:1394
#4  0x0061efa9 in pull_up_sublinks_qual_recurse (root=root@entry=0x117d058, node=0x117a0f8, jtlink1=jtlink1@entry=0x7fff6d97f708, available_rels1=available_rels1@entry=0x117d038, jtlink2=jtlink2@entry=0x0, available_rels2=available_rels2@entry=0x0) at prepjointree.c:391
#5  0x0061f339 in pull_up_sublinks_jointree_recurse (root=root@entry=0x117d058, jtnode=0x117a040, relids=relids@entry=0x7fff6d97f758) at prepjointree.c:208
#6  0x0062066f in pull_up_sublinks (root=root@entry=0x117d058) at prepjointree.c:147
#7  0x0061967e in subquery_planner (glob=0x10c3fb8, parse=parse@entry=0x1179af8, parent_root=parent_root@entry=0x1182fd0, hasRecursion=hasRecursion@entry=0 '\000', ...
#8  0x005fc093 in set_subquery_pathlist (root=root@entry=0x1182fd0, rel=rel@entry=0x1179370, rti=rti@entry=4, rte=rte@entry=0x1183980) at allpaths.c:1209
#9  0x005fc54e in set_rel_size (root=root@entry=0x1182fd0, rel=0x1179370, rti=rti@entry=4, rte=0x1183980) at allpaths.c:265
#10 0x005fc62e in set_base_rel_sizes (root=root@entry=0x1182fd0) at allpaths.c:180
#11 0x005fd584 in make_one_rel (root=root@entry=0x1182fd0, joinlist=joinlist@entry=0x1179560) at allpaths.c:138
#12 0x0061617a in query_planner (root=root@entry=0x1182fd0, tlist=tlist@entry=0x11771c8, qp_callback=qp_callback@entry=0x616fd6 <standard_qp_callback>, qp_extra=qp_extra@entry=0x7fff6d97fa00) at planmain.c:236
#13 0x006182f2 in grouping_planner (root=root@entry=0x1182fd0, tuple_fraction=tuple_fraction@entry=0) at planner.c:1280
#14 0x00619a11 in subquery_planner (glob=glob@entry=0x10c3fb8, parse=parse@entry=0x10c3d30, parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=0 '\000', tuple_fraction=0, subroot=subroot@entry=0x7fff6d97fb58) at planner.c:574
#15 0x00619c1f in standard_planner (parse=0x10c3d30, cursorOptions=0, boundParams=0x0) at planner.c:211
#16 0x00619e35 in planner (parse=parse@entry=0x10c3d30, cursorOptions=cursorOptions@entry=0, boundParams=boundParams@entry=0x0) at planner.c:139
#17 0x0068c64b in pg_plan_query (querytree=0x10c3d30, cursorOptions=cursorOptions@entry=0, boundParams=boundParams@entry=0x0) at postgres.c:759
#18 0x0068c6d3 in pg_plan_queries (querytrees=<optimized out>, cursorOptions=cursorOptions@entry=0, boundParams=boundParams@entry=0x0) at postgres.c:818
#19 0x0068c932 in exec_simple_query (query_string=query_string@entry=0x10c2d78 "SELECT * FROM hl7.test;") at postgres.c:983
#20 0x0068e46e in PostgresMain (argc=<optimized out>, argv=argv@entry=0x10cd1f0, dbname=0x10cd058 "regress", username=<optimized out>) at postgres.c:4003
#21 0x006419a7 in BackendRun (port=port@entry=0x10c7e50) at postmaster.c:4080
#22 0x006432c2 in BackendStartup (port=port@entry=0x10c7e50) at postmaster.c:3769
#23 0x00643526 in ServerLoop () at postmaster.c:1580
#24 0x0064467c in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x10a8750) at postmaster.c:1235
#25 0x005d692c in main (argc=4, argv=0x10a8750) at main.c:205
(gdb)

It's crashing while pulling up the query over emp (hl7.employee) and part (hl7.participation). Given the simplicity of what the row-security code itself is doing, I'm wondering if this is a case that isn't handled in updatable s.b. views. I'll look into it. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 12, 2014 at 8:19 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Feb 12, 2014 at 10:02:32AM +0530, Amit Kapila wrote: I think 99.9% of users are never going to adjust this, so we had better choose something we are happy to enable for effectively everyone. In my reading, prefix/suffix seemed safe for everyone. We can always revisit this if we think of something better later, as WAL format changes are not a problem for pg_upgrade. Agreed. I also think a user-tunable knob is so hard for users to know when to adjust that it is almost not worth the user-interface complexity it adds. I suggest we go with always-on prefix/suffix mode, then add some check so the worst case is avoided by just giving up on compression. As I said previously, I think compressing the page images is the next big win in this area. I think one might argue here that for some users it is not feasible to decide whether their tuple data for UPDATE is going to be similar or completely different; they are not at all ready for any risk of CPU overhead, but they would be happy to see the I/O reduction, in which case it is difficult to decide what the value of a table-level switch should be. Here I think the only answer is that nothing is free in this world, so either make sure of the application's behaviour for UPDATE statements before going to production, or just don't enable the switch and be happy with the current behaviour. Again, can't we do a minimal attempt at prefix/suffix compression so there is no measurable overhead? Yes, currently that is there at 25%: there must be at least a 25% match in prefix/suffix before we consider a tuple for compression, and that check is pretty fast with almost no overhead. The worst case here is the other way around, i.e. when the string has a 25% prefix/suffix match but there is no match (or none in the next few bytes) after that.
For example, consider this case:

Case 1:
old tuple: aaa
new tuple: bbba

Here there is a suffix match for 25% of the string, but after that there is no match, so we have to copy the remaining 75% of the bytes as-is, byte by byte. With somewhat longer tuples (800 bytes), the performance data I took shows around ~11% CPU overhead. This is a fabricated test just to see how much extra CPU the worst scenario consumes; in reality users might not see it, at least with synchronous commit on, because there is always some I/O involved at end of transaction (unless there is an error in between or the user rolls back the transaction, the chances of which are very low). The first thing that comes to mind for the above scenario is to raise the 25% minimum limit, because comparing prefix/suffix has almost negligible overhead. I tried increasing it to 35% or more, but then we start losing from the other side: for cases with, say, a 34% match we still give up on compression. One improvement that could be made here is that, after the prefix/suffix match, instead of a byte-by-byte copy as per the LZ format we could directly copy the whole remaining part of the tuple. That would require a format different from LZ, which is also not too difficult to do, but the question is whether we really need such a change to handle this kind of worst case. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
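The prefix/suffix matching and 25% minimum-match threshold discussed above can be sketched as follows. This is an illustrative sketch in C; the function name, signature, and exact threshold arithmetic are assumptions for illustration, not the patch's actual code:

```c
#include <stddef.h>

/* Sketch of the prefix/suffix matching discussed above.  Names and the
 * exact handling of the 25% threshold are illustrative assumptions. */
static int
match_prefix_suffix(const char *old_tup, size_t old_len,
                    const char *new_tup, size_t new_len,
                    size_t *prefix, size_t *suffix)
{
    size_t  max = (old_len < new_len) ? old_len : new_len;
    size_t  p = 0, s = 0;

    /* Longest common prefix. */
    while (p < max && old_tup[p] == new_tup[p])
        p++;

    /* Longest common suffix of the parts not covered by the prefix. */
    while (s < max - p &&
           old_tup[old_len - 1 - s] == new_tup[new_len - 1 - s])
        s++;

    *prefix = p;
    *suffix = s;

    /* Only consider compression when at least 25% of the new tuple
     * matches, mirroring the minimum-match heuristic in the mail. */
    return (p + s) * 4 >= new_len;
}
```

Note that the "aaa" vs. "bbba" worst case above passes this check at exactly 25%, which is why the remaining 75% then gets copied byte by byte.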
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
8. I am not able to find a protection mechanism in insert/delete etc. of a tuple in the T-tree. As this is shared memory, it can cause problems. For design simplification, I put a giant lock per columnar cache. So, routines in cscan.c acquire an exclusive lwlock prior to invocation of ccache_insert_tuple / ccache_delete_tuple. Correct. But this lock can be a bottleneck for concurrency. Better to analyze this once we have the performance report. Well, concurrent updates towards a particular table may cause lock contention due to the giant lock. On the other hand, one of my headaches is how to avoid deadlocking if we try to implement it using finer-granularity locking. Assume per-chunk locking: it also needs to take a lock on the neighbor nodes when a record is moved out. Concurrently, some other process may try to move another record in the inverse order. That is a recipe for deadlock. Is there an idea or reference for implementing concurrent tree-structure updates? Anyway, it is a good idea to measure the impact of concurrent updates on cached tables, to find out the significance of lock splitting. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -----Original Message----- From: Haribabu Kommi [mailto:kommi.harib...@gmail.com] Sent: Thursday, February 13, 2014 8:31 AM To: Kohei KaiGai Cc: Kaigai, Kouhei(海外, 浩平); Tom Lane; PgHacker; Robert Haas Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?) On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com: 7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups) can go wrong when only one tuple is present in the block, with an item pointer equal to the one we are searching for in the forward scan direction. It shouldn't happen, because the first or second ItemPointerCompare will handle the condition. Please assume cchunk->ntups == 1.
In this case, any given ctid shall match either of them, because any ctid is less than, equal to, or larger than the only cached tuple; thus it moves to the right or left node according to the scan direction. Yes, you are correct. Sorry for the noise. 8. I am not able to find a protection mechanism in insert/delete etc. of a tuple in the T-tree. As this is shared memory, it can cause problems. For design simplification, I put a giant lock per columnar cache. So, routines in cscan.c acquire an exclusive lwlock prior to invocation of ccache_insert_tuple / ccache_delete_tuple. Correct. But this lock can be a bottleneck for concurrency. Better to analyze this once we have the performance report. Regards, Hari Babu Fujitsu Australia
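The opposite-order chunk-locking deadlock discussed above is classically avoided by imposing a canonical lock-acquisition order, so that every backend locks the same pair of chunks in the same order. A minimal sketch, using pthread mutexes as stand-ins for LWLocks; the struct and function names here are hypothetical, not cache_scan's actual code:

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-chunk structure; a pthread mutex stands in for a
 * PostgreSQL per-chunk LWLock. */
typedef struct ccache_chunk
{
    pthread_mutex_t lock;       /* per-chunk lock */
    /* ... cached tuples would live here ... */
} ccache_chunk;

/*
 * Lock two chunks in a fixed (address) order.  Two backends moving
 * records between the same pair of chunks in opposite directions then
 * both contend on the lower-addressed chunk first, so neither can hold
 * one lock while waiting on the other: the cycle needed for deadlock
 * cannot form.
 */
static void
lock_chunk_pair(ccache_chunk *a, ccache_chunk *b)
{
    if ((uintptr_t) a < (uintptr_t) b)
    {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    }
    else
    {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void
unlock_chunk_pair(ccache_chunk *a, ccache_chunk *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}
```

This addresses only the two-chunk move case; rebalancing that touches a whole path of T-tree nodes would need the same ordering discipline applied along the path, which is where the real difficulty KaiGai describes comes in.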
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Feb 13, 2014 at 1:20 AM, Amit Kapila amit.kapil...@gmail.com wrote: Here one of the improvements which can be made is that, after the prefix/suffix match, instead of going byte by byte as per the LZ format we can directly copy all the remaining part of the tuple, but I think that would require us to use some format different from LZ, which is also not too difficult to do; the question is whether we really need such a change to handle the above kind of worst case. Why use LZ at all? Why not *only* prefix/suffix?
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Feb 13, 2014 at 10:07 AM, Claudio Freire klaussfre...@gmail.com wrote: On Thu, Feb 13, 2014 at 1:20 AM, Amit Kapila amit.kapil...@gmail.com wrote: Here one of the improvements which can be made is that, after the prefix/suffix match, instead of going byte by byte as per the LZ format we can directly copy all the remaining part of the tuple, but I think that would require us to use some format different from LZ, which is also not too difficult to do; the question is whether we really need such a change to handle the above kind of worst case. Why use LZ at all? We are just using the LZ *format* to represent the compressed string. Quoting some text from pg_lzcompress.c to explain exactly what we are using: the first byte after the header tells what to do the next 8 times. We call this the control byte. An unset bit in the control byte means that one uncompressed byte follows, which is copied from input to output. A set bit in the control byte means that a tag of 2-3 bytes follows. A tag contains the information needed to copy some bytes, already present in the output buffer, to the current location in the output. Why not *only* prefix/suffix? To represent a prefix/suffix match, we at least need a way to record the offset and length of the matched bytes and the length of the unmatched bytes we copied. I agree that a simpler format could be devised if we just want to do prefix/suffix matching, but that would require much more testing during recovery to ensure everything is fine. The advantage of the LZ format is that we don't need to bother about decoding; it works without much change to the LZ decode routine. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
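The control-byte/tag scheme quoted above can be illustrated with a toy decoder. This sketch is modeled on the tag layout in pg_lzcompress.c (a 12-bit back-reference offset, a 4-bit length nibble encoding length minus 3, and an optional extension byte when the nibble is saturated); it is an illustration of the format, not the actual PostgreSQL routine:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy decoder for the LZ format described above.  Assumes well-formed
 * input and a sufficiently large output buffer; returns the number of
 * bytes written to dst. */
static size_t
lz_decompress(const uint8_t *src, size_t srclen, uint8_t *dst)
{
    const uint8_t *sp = src;
    const uint8_t *srcend = src + srclen;
    uint8_t       *dp = dst;

    while (sp < srcend)
    {
        uint8_t ctrl = *sp++;
        int     bit;

        /* The control byte covers the next 8 items: bit clear means one
         * literal byte, bit set means a 2-3 byte back-reference tag. */
        for (bit = 0; bit < 8 && sp < srcend; bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                /* Tag: length 3..273, offset up to 4095. */
                int len = (sp[0] & 0x0f) + 3;
                int off = ((sp[0] & 0xf0) << 4) | sp[1];

                sp += 2;
                if (len == 18)          /* saturated nibble: extension byte */
                    len += *sp++;

                /* Byte-by-byte copy from earlier output; this naturally
                 * handles overlapping matches. */
                while (len-- > 0)
                {
                    *dp = dp[-off];
                    dp++;
                }
            }
            else
                *dp++ = *sp++;          /* literal byte */
        }
    }
    return (size_t) (dp - dst);
}
```

This also shows why Amit's prefix/suffix scheme fits the format so cheaply: a matched prefix or suffix is just one or two tags, and the unmatched middle is a run of literal bytes, so the existing decode routine needs no change.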
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 11, 2014 at 11:37 AM, Bruce Momjian br...@momjian.us wrote: Yes, that was my point. I thought the compression of full-page images was a huge win and that the compression was pretty straightforward, except for the compression algorithm. If the compression algorithm issue is resolved, can we move forward with the full-page compression patch? Discussion of the full-page compression patch properly belongs on that thread rather than this one. However, based on what we've discovered so far here, I won't be very surprised if that patch turns out to have serious problems with CPU consumption. The evidence from this thread suggests that making even relatively lame attempts at compression is extremely costly in terms of CPU overhead. Now, the issues with straight-up compression are somewhat different than for delta compression and, in particular, it's easier to bail out of straight-up compression sooner if things aren't working out. But even with all that, I expect it to be not too difficult to find cases where some compression is achieved but with a dramatic increase in runtime on CPU-bound workloads, which is basically the same problem this patch has. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company