Re: [HACKERS] Changeset Extraction v7.6

2014-02-14 Thread Erik Rijkers
On Thu, February 13, 2014 17:12, Andres Freund wrote:
 Hi,

 On 2014-02-11 11:22:24 -0500, Robert Haas wrote:
 [loads of comments]

 I tried to address all the points you mentioned.


0001-wal_decoding-Introduce-logical-changeset-extraction.patch.gz  159 k
0002-wal_decoding-logical-changeset-extraction-walsender-.patch.gz 16 k
0003-wal_decoding-pg_recvlogical-Introduce-pg_receivexlog.patch.gz 15 k
0004-wal_decoding-Documentation-for-replication-slots-and.patch.gz 14 k
0005-wal_decoding-Temporarily-add-logical-decoding-regres.patch.gz 1.8 k

These don't apply...








-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Stephen Frost
Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
 On Thu, Feb 13, 2014 at 08:24:27PM -0600, Jerry Sievers wrote:
  I'm aware of how a pg_hba.conf file can refer to other files for
  including @lists of users, etc.
  
  But there is currently no support for being able to pull in entire file
  segments as can be done for postgresql.conf via the include directive.
  
  In the environment that I'm managing, we are using a makefile to stick
  together a common header with a custom section for any of several
  clusters and may extend this further to permit additional includes for
  hba rules common to groupings of clusters.
  
  Anyway, please advise.  I don't recall hearing anything like this
  discussed.
  
  Has it been proposed, discussed and voted down?  Or never mentioned?
 
 I have never heard of anyone requesting this.

I've brought it up on various threads before, including both the ALTER
SYSTEM thread and the postgresql.conf 'includes' thread, though I don't
feel like going back and hunting down the specific emails right now.

Having @include and directory.d-style capabilities for pg_hba.conf *and*
pg_ident.conf would make managing larger environments much better.
There has been some talk about providing those capabilities via tables
in the catalog, but I'm not aware of anyone working on it and it'd
certainly be quite a bit more work than adding include/dir.d options.
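(As an aside, to make the distinction concrete: the per-name @file syntax exists
in pg_hba.conf today, while a whole-file include is only being proposed here -
the directive and paths below are hypothetical, not implemented syntax.)

```
# Supported today: @file pulls in a list of names (users, in this case)
host    all     @admins     10.0.0.0/8      md5

# Proposed but NOT implemented: including whole rule files,
# postgresql.conf-style (hypothetical syntax and path):
# include /etc/postgresql/hba.d/common.conf
```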

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-14 Thread Kyotaro HORIGUCHI
Hello,

Before taking up the topic..

At Thu, 13 Feb 2014 19:45:38 +0200, Heikki Linnakangas wrote
 On 02/13/2014 06:47 PM, Heikki Linnakangas wrote:
  On 02/13/2014 02:42 PM, Heikki Linnakangas wrote:
   The behavior where we prefer a segment from archive with a lower TLI
   over a file with a higher TLI in pg_xlog actually changed in commit
   a068c391ab0. Arguably changing it wasn't a good idea, but the problem
   your test script demonstrates can be fixed by not archiving the
   partial segment, with no change to the preference of archive/pg_xlog.
   As discussed, archiving a partial segment seems like a bad idea
   anyway, so let's just stop doing that.

It surely makes things simple, and I rather like the idea. But as
long as the final, possibly partial segment of the lower TLI is
actually created, and the recovery mechanism lets users request
recovery operations that require such segments
(recovery_target_timeline does this), a perfect archive - an
archive that can cover all sorts of restore operations -
necessarily may contain such duplicate segments, I
believe. Besides, I suppose that policy makes operations around
archive/restore far more difficult: DBAs would be stuck with the
tedious work of picking, out of the haystack, only the segments
actually needed for the recovery at hand. It sounds somewhat gloomy..

# However, I also doubt the appropriateness of stockpiling archive
# segments spanning so many timelines; two generations are
# enough to cause this issue.

Anyway, returning to the topic,

  After some further thought, while not archiving the partial segment
  fixes your test script, it's not enough to fix all variants of the
  problem. Even if archive recovery doesn't archive the last, partial,
  segment, if the original master server is still running, it's entirely
  possible that it fills the segment and archives it. In that case,
  archive recovery will again prefer the archived segment with lower TLI
  over the segment with newer TLI in pg_xlog.

Yes, that is the generalized description of the case I
mentioned. (Though I had not reached that thought myself :)

  So I agree we should commit the patch you posted (or something to that
  effect). The change to not archive the last segment still seems like a
  good idea, but perhaps we should only do that in master.

My opinion on duplicate segments on older timelines is as
described above.

 To draw this to conclusion, barring any further insights to this, I'm
 going to commit the attached patch to master and REL9_3_STABLE. Please
 have a look at the patch, to see if I'm missing something. I modified
 the state machine to skip over XLOG_FROM_XLOG state, if reading in
 XLOG_FROM_ARCHIVE failed; otherwise you first scan the archive and
 pg_xlog together, and then pg_xlog alone, which is pointless.
 
 In master, I'm also going to remove the archive last segment on old
 timeline code.

Thank you for finishing the patch. I hadn't thought of the behavior
after an XLOG_FROM_ARCHIVE failure. With it, the state machine
seems to avoid the extra round. The recovery process becomes able
to grab the segment on the highest (expected) TLI among those with
the same segment id, regardless of their location. I think the
recovery process will now cope with the perfect archives described
above for all types of recovery operation. The state machine loop,
considering the fallback from archive to pg_xlog, now seems somewhat
more complicated than needed, but it also does no harm.

Though, this part, which was in my original patch,

  readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
currentSource == XLOG_FROM_ARCHIVE ? 
 XLOG_FROM_ANY : currentSource);

sticks out far past the line-wrapping boundary and seems somewhat
dirty :(

And the conditional operator seems to blur the meaning of
XLOG_FROM_ARCHIVE and _ANY a bit. But I failed to unify them to
either side, so it is left as is..

Finally, the attached patch differs from your last patch only in the
styling mentioned above. It applies to current HEAD, and I confirmed
that it fixes this issue, but I have not exercised the
lastSourceFailed section; simple file removal could not lead there.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 508970a..85a0ce9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11006,17 +11006,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	/*---
 	 * Standby mode is implemented by a state machine:
 	 *
-	 * 1. Read from archive (XLOG_FROM_ARCHIVE)
-	 * 2. Read from pg_xlog (XLOG_FROM_PG_XLOG)
-	 * 3. Check trigger file
-	 * 4. Read from primary server via walreceiver (XLOG_FROM_STREAM)
-	 * 5. Rescan timelines
-	 * 6. Sleep 5 seconds, and loop back to 1.
+	 * 1. Read from either archive or pg_xlog (XLOG_FROM_ARCHIVE), or just
+	 *    pg_xlog (XLOG_FROM_XLOG)
+	 * 

Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-14 Thread Hiroshi Inoue
(2014/02/12 8:30), Tom Lane wrote:
 I wrote:
 Hiroshi Inoue in...@tpf.co.jp writes:
 I tried MINGW port with the attached change and successfully built
  src and contrib and all parallel regression tests were OK.
 
 I cleaned this up a bit (the if-nesting in Makefile.shlib was making
 my head hurt, not to mention that it left a bunch of dead code) and
 committed it.
 
 Hm ... according to buildfarm member narwhal, this doesn't work so well
 for plperl:
 
 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
 -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
 -Wformat-security -fno-strict-aliasing -fwrapv -g -Wno-comment   -shared -o 
 plperl.dll  plperl.o SPI.o Util.o -L../../../src/port -L../../../src/common 
 -Wl,--allow-multiple-definition -L/mingw/lib  -Wl,--as-needed   
 -LC:/Perl/lib/CORE -lperl58 -L../../../src/backend -lpostgres -lpgcommon 
 -lpgport -lintl -lxslt -lxml2 -lssleay32 -leay32 -lz -lm  -lws2_32 -lshfolder 
 -Wl,--export-all-symbols -Wl,--out-implib=libplperl.a
 Cannot export .idata$4: symbol not found
 Cannot export .idata$5: symbol not found
 Cannot export .idata$6: symbol not found
 Cannot export .text: symbol not found
 Cannot export perl58_NULL_THUNK_DATA: symbol not found
 Creating library file: libplperl.a
 collect2: ld returned 1 exit status
 make[3]: *** [plperl.dll] Error 1
 
 Not very clear what's going on there; could this be a problem in
 narwhal's admittedly-ancient toolchain?

The error doesn't occur here, (un)fortunately.
One thing I'm wondering about is that plperl links perlxx.lib,
not libperlxx.a. I made a patch following plpython's approach, and it
also works here.
Is it worth trying?

regards,
Hiroshi Inoue

diff --git a/src/pl/plperl/GNUmakefile b/src/pl/plperl/GNUmakefile
index e0e31ec..065c5d3 100644
--- a/src/pl/plperl/GNUmakefile
+++ b/src/pl/plperl/GNUmakefile
@@ -16,6 +16,8 @@ endif
 ifeq ($(shared_libperl),yes)
 
 ifeq ($(PORTNAME), win32)
+generate_perl_implib = yes
+
 override CPPFLAGS += -DPLPERL_HAVE_UID_GID
 # Perl on win32 contains /* within comment all over the header file,
 # so disable this warning.
@@ -36,7 +38,21 @@ DATA = plperl.control plperl--1.0.sql plperl--unpackaged--1.0.sql \
 
 PERLCHUNKS = plc_perlboot.pl plc_trusted.pl
 
+# Perl on win32 ships with import libraries only for Microsoft Visual C++,
+# which may not be handled with mingw gcc. Therefore we choose to build a
+# new import library to link with.
+ifeq ($(generate_perl_implib), yes)
+	perlwithver = $(subst -l,,$(filter -l%, $(perl_embed_ldflags)))
+	OBJS += lib$(perlwithver).a
+lib$(perlwithver).a: $(perlwithver).def
+	dlltool --dllname $(perlwithver).dll --def $(perlwithver).def --output-lib  lib$(perlwithver).a
+$(perlwithver).def:
	pexports $(dir $(subst ',,$(PERL)))$(perlwithver).dll > $(perlwithver).def
+
+SHLIB_LINK = 
+else
 SHLIB_LINK = $(perl_embed_ldflags)
+endif
 
 REGRESS_OPTS = --dbname=$(PL_TESTDB) --load-extension=plperl  --load-extension=plperlu
 REGRESS = plperl plperl_lc plperl_trigger plperl_shared plperl_elog plperl_util plperl_init plperlu plperl_array
@@ -105,6 +121,9 @@ submake:
 clean distclean maintainer-clean: clean-lib
 	rm -f SPI.c Util.c $(OBJS) perlchunks.h plperl_opmask.h
 	rm -rf $(pg_regress_clean_files)
+ifeq (yes, $(generate_perl_implib))
+	rm -rf	$(perlwithver).def
+endif
 
 else # can't build
 



Re: [HACKERS] Changeset Extraction v7.6

2014-02-14 Thread Andres Freund
Hi,

On 2014-02-14 09:23:45 +0100, Erik Rijkers wrote:
 0001-wal_decoding-Introduce-logical-changeset-extraction.patch.gz159 k
 0002-wal_decoding-logical-changeset-extraction-walsender-.patch.gz   16 k
 0003-wal_decoding-pg_recvlogical-Introduce-pg_receivexlog.patch.gz   15 k
 0004-wal_decoding-Documentation-for-replication-slots-and.patch.gz   14 k
 0005-wal_decoding-Temporarily-add-logical-decoding-regres.patch.gz   1.8 k
 
 These don't apply...

Works here, could you give a bit more details about the problem you have
applying them? Note that they are compressed, so need to be gunzipped first...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread David Beck
Thanks for the reply. There are two things on which I think I've been misunderstood:

1. The point is to do the rewrite without, and before, catalog access.
2. I do want to push the join to the source, and it is equally important to
push the WHERE conditions there.

Best regards, David


On 13 Feb 2014, at 21:22, Tom Lane t...@sss.pgh.pa.us wrote:

 David Beck db...@starschema.net writes:
 I have table like data structures in the source system for the FDW I work on.
 These tables are sometimes too big and the source system is able to filter 
 and join them with limitations, thus it is not optimal to transfer the data 
 to Postgres.
 At the same time I want the users to think in terms of the original tables.
 
 The idea is to rewrite the SQL queries like this:
 
  “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987”
 
 to:
 
  “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”
 
 TBH this sounds like a spectacularly bad idea, especially in the place and
 way you propose to do it.  You can't even do catalog access safely where
 you've put that hook, not to mention that there are many other places
 where queries can be submitted.  But more generally, an FDW should not
 operate in the way you're describing.
 
 We do lack support for pushing joins to the foreign server, and that needs
 to be addressed; but we need to do it in the planner, not by kluging the
 query somewhere upstream of that.
 
   regards, tom lane





Re: [HACKERS] Changeset Extraction v7.6

2014-02-14 Thread Erik Rijkers
On Fri, February 14, 2014 10:13, Andres Freund wrote:
 Hi,

 On 2014-02-14 09:23:45 +0100, Erik Rijkers wrote:
 0001-wal_decoding-Introduce-logical-changeset-extraction.patch.gz   159 k
 0002-wal_decoding-logical-changeset-extraction-walsender-.patch.gz  16 k
 0003-wal_decoding-pg_recvlogical-Introduce-pg_receivexlog.patch.gz  15 k
 0004-wal_decoding-Documentation-for-replication-slots-and.patch.gz  14 k
 0005-wal_decoding-Temporarily-add-logical-decoding-regres.patch.gz  1.8 k

 These don't apply...

 Works here, could you give a bit more details about the problem you have
 applying them? Note that they are compressed, so need to be gunzipped first...


yeah, unzipping -- I thought of that :)  (no offense taken :))

I just fed it to my standard patch-applying/compiling shell script, which
does something like this, as always:

patch --dry-run -b -l -F 25 -p 1 \
  < /home/aardvark/download/pgpatches/0094/lcsg/20140213/0001-wal_decoding-Introduce-logical-changeset-extraction.patch \
  > patch.1.dry-run.1.out

(it loops to try out several -p values, none acceptable)

etc






Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-13 15:34:09 +0100, Florian Pflug wrote:
 On Feb10, 2014, at 17:38 , Andres Freund and...@2ndquadrant.com wrote:
  On 2014-02-10 11:11:28 -0500, Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
  So what we need to do is to acquire a write barrier between the
  assignments to lwWaitLink and lwWaiting, i.e.
	proc->lwWaitLink = NULL;
	pg_write_barrier();
	proc->lwWaiting = false;
  
  You didn't really explain why you think that ordering is necessary?
  Each proc being awoken will surely see both fields updated, and other
  procs won't be examining these fields at all, since we already delinked
  all these procs from the LWLock's queue.
  
  The problem is that one of the released backends could wake up concurrently
  because of an unrelated, or previous, PGSemaphoreUnlock(). It could see
  lwWaiting = false, and thus wakeup and acquire the lock, even if the
  store for lwWaitLink hasn't arrived (or performed, there's no guaranteed
  ordering here) yet.
  Now, it may well be that there's no practical consequence of that, but I
  am not prepared to bet on it.
 
 AFAICS there is a potential problem if three backends are involved, since
 by the time the waiting backend's lwWaitLink is set to NULL after the
 original lock holder released the lock, the waiting backend might already
 have acquired the lock (due to a spurious wakeup) *and* a third backend
 might have already enqueued behind it.
 
 The exact sequence for backends A,B,C that corrupts the wait queue is:
 
 A: Release lock, set B's lwWaiting to false
 B: Wakes up spuriously, takes the lock
 C: Enqueues behind B
 A: Sets B's lwWaitLink back to NULL, thereby truncating the queue and
causing C and anyone behind it to block indefinitely.

I don't think that can actually happen because the head of the wait list
isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
for a while...

So, right now I don't see problems without either the compiler reordering
stores or heavily out of order machines with speculative execution.

 I wonder whether LWLockRelease really needs to update lwWaitLink at all.
 We take the backends we awake off the queue by updating the queue's head and
 tail, so the contents of lwWaitLink should only matter once the backend is
 re-inserted into some wait queue. But when doing that, we reset lwWaitLink
 back to NULL anyway.

I don't like that, this stuff is hard to debug already, having stale
pointers around doesn't help. I think we should just add the barrier in
the releases with barrier.h and locally use a volatile var in the
branches before that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] walsender can ignore send failures in WalSndLoop

2014-02-14 Thread Andres Freund
Hi,

There's a small issue in abfd192b, namely one of the error cases wasn't
changed when WalSndLoop was changed to be able to return.

I don't think this is likely to have any grave consequences; we'll
likely error out again soon afterwards.

Patch attached.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 06b22e2..8750dd2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1287,7 +1287,7 @@ WalSndLoop(void)
 	ping_sent = true;
 	/* Try to flush pending output to the client */
 	if (pq_flush_if_writable() != 0)
-		break;
+		goto send_failure;
 }
 			}
 



Re: [HACKERS] gaussian distribution pgbench

2014-02-14 Thread Mitsumasa KONDO
I added an exponential-distribution random generator (and a little bit of
refactoring :) ).
I use the inverse transform method to create the distribution. It's a very
simple method: values are generated by -log(rand()). We can control the
slope of the distribution with a threshold parameter.
It is the same as the gaussian threshold.

usage example
  pgbench --exponential=NUM -S

The attached graph was created with exponential threshold = 5. We can see
the exponential distribution in the graphs. It supports the -S and -N
options and custom scripts, so if we put
\setexponential [var] [min] [max] [threshold] in a transaction pattern
file, the distribution we want appears.

We had no time to polish it very much... But I think almost all parts of
the patch are complete.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


gaussian_and_exponential_pgbench_v6.patch
Description: Binary data
attachment: exponential=5.png

gnuplot.sh
Description: Bourne shell script



[HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-14 Thread Andres Freund
Hi,

In WalSndLoop() we do:

wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
             WL_SOCKET_READABLE;

if (pq_is_send_pending())
    wakeEvents |= WL_SOCKET_WRITEABLE;
else if (wal_sender_timeout > 0 && !ping_sent)
{
    ...
    if (GetCurrentTimestamp() >= timeout)
        WalSndKeepalive(true);
    ...

I think those two ifs should simply be separate. There's no reason not
to ask for a ping while we're writing; on a busy server that might be
the case most of the time.
The if (pq_is_send_pending()) should also come *after* sending the
keepalive; after all, it might not have been flushed immediately,
unlikely as that is.

The attached patch is unsurprisingly simple.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 06b22e2..41db03d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1270,9 +1270,7 @@ WalSndLoop(void)
 			wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
 WL_SOCKET_READABLE;
 
-			if (pq_is_send_pending())
-				wakeEvents |= WL_SOCKET_WRITEABLE;
-			else if (wal_sender_timeout > 0 && !ping_sent)
+			if (wal_sender_timeout > 0 && !ping_sent)
 			{
 /*
  * If half of wal_sender_timeout has lapsed without receiving
@@ -1291,6 +1289,9 @@ WalSndLoop(void)
 }
 			}
 
+			if (pq_is_send_pending())
+				wakeEvents |= WL_SOCKET_WRITEABLE;
+
 			/* Determine time until replication timeout */
 			if (wal_sender_timeout > 0)
 			{



Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 11:45 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-13 15:34:09 +0100, Florian Pflug wrote:
 On Feb10, 2014, at 17:38 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-10 11:11:28 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 So what we need to do is to acquire a write barrier between the
 assignments to lwWaitLink and lwWaiting, i.e.
    proc->lwWaitLink = NULL;
    pg_write_barrier();
    proc->lwWaiting = false;
 
 You didn't really explain why you think that ordering is necessary?
 Each proc being awoken will surely see both fields updated, and other
 procs won't be examining these fields at all, since we already delinked
 all these procs from the LWLock's queue.
 
 The problem is that one of the released backends could wake up concurrently
 because of an unrelated, or previous, PGSemaphoreUnlock(). It could see
 lwWaiting = false, and thus wakeup and acquire the lock, even if the
 store for lwWaitLink hasn't arrived (or performed, there's no guaranteed
 ordering here) yet.
 Now, it may well be that there's no practical consequence of that, but I
 am not prepared to bet on it.
 
 AFAICS there is a potential problem if three backends are involved, since
 by the time the waiting backend's lwWaitLink is set to NULL after the
 original lock holder released the lock, the waiting backend might already
 have acquired the lock (due to a spurious wakeup) *and* a third backend
 might have already enqueued behind it.
 
 The exact sequence for backends A,B,C that corrupts the wait queue is:
 
 A: Release lock, set B's lwWaiting to false
 B: Wakes up spuriously, takes the lock
 C: Enqueues behind B
 A: Sets B's lwWaitLink back to NULL, thereby truncating the queue and
   causing C and anyone behind it to block indefinitely.
 
 I don't think that can actually happen because the head of the wait list
 isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
 for a while...

Hm, true, but does that protect us under all circumstances? If another
backend grabs the lock before B gets a chance to do so, B will become the
wait queue's head, and anyone who enqueues behind B will do so by updating
B's lwWaitLink. That is then undone by the delayed reset of B's lwWaitLink
by the original lock holder.

The specific sequence I have in mind is:

A: Take lock
B: Enqueue
A: Release lock, set B's lwWaiting to false
D: Acquire lock
B: Enqueue after spurious wakeup
   (lock->head := B)
C: Enqueues behind B
   (B->lwWaitLink := C, lock->tail := C)
A: Sets B's lwWaitLink back to NULL, thereby corrupting the queue
   (B->lwWaitLink := NULL)
D: Release lock, dequeue and wakeup B
   (lock->head := B->lwWaitLink, i.e. lock->head := NULL)
B: Take and release lock, queue appears empty!
   C blocks indefinitely.

 I wonder whether LWLockRelease really needs to update lwWaitLink at all.
 We take the backends we awake off the queue by updating the queue's head and
 tail, so the contents of lwWaitLink should only matter once the backend is
 re-inserted into some wait queue. But when doing that, we reset lwWaitLink
 back to NULL anyway.
 
 I don't like that, this stuff is hard to debug already, having stale
 pointers around doesn't help. I think we should just add the barrier in
 the releases with barrier.h and locally use a volatile var in the
 branches before that.

I like the idea of updating lwWaiting and lwWaitLink while still holding the
spinlock better. The costs are probably negligible, and it'd make the code much
easier to reason about.

best regards,
Florian Pflug






Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-14 13:28:47 +0100, Florian Pflug wrote:
  I don't think that can actually happen because the head of the wait list
  isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
  for a while...
 
 Hm, true, but does that protect us under all circumstances? If another
 backend grabs the lock before B gets a chance to do so, B will become the
 wait queue's head, and anyone who enqueues behind B will do so by updating
 B's lwWaitLink. That is then undone by the delayed reset of B's lwWaitLink
 by the original lock holder.
 
 The specific sequence I have in mind is:
 
 A: Take lock
 B: Enqueue
 A: Release lock, set B's lwWaiting to false
 D: Acquire lock
 B: Enqueue after spurious wakeup
 (lock->head := B)
 C: Enqueues behind B
 (B->lwWaitLink := C, lock->tail := C)
 A: Sets B's lwWaitLink back to NULL, thereby corrupting the queue
 (B->lwWaitLink := NULL)
 D: Release lock, dequeue and wakeup B
 (lock->head := B->lwWaitLink, i.e. lock->head := NULL)
 B: Take and release lock, queue appears empty!
 C blocks indefinitely.

That's assuming either reordering by the compiler or an out-of-order
store architecture, right?

  I wonder whether LWLockRelease really needs to update lwWaitLink at all.
  We take the backends we awake off the queue by updating the queue's head 
  and
  tail, so the contents of lwWaitLink should only matter once the backend is
  re-inserted into some wait queue. But when doing that, we reset lwWaitLink
  back to NULL anyway.
  
  I don't like that, this stuff is hard to debug already, having stale
  pointers around doesn't help. I think we should just add the barrier in
  the releases with barrier.h and locally use a volatile var in the
  branches before that.
 
 I like the idea of updating lwWaiting and lwWaitLink while still holding the
 spinlock better. The costs are probably negligible, and it'd make the code 
 much
 easier to reason about.

I agree we should do that, but imo not in the backbranches. Anything
more than the minimal fix in that code should be avoided in the
stable branches, this stuff is friggin performance sensitive, and the
spinlock already is a *major* contention point in many workloads.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] issue with gininsert under very high load

2014-02-14 Thread Andres Freund
On 2014-02-14 08:06:40 +0100, Jesper Krogh wrote:
 On 14/02/14 00:49, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2014-02-13 16:15:42 -0500, Tom Lane wrote:
 Something like the attached?  Can somebody who's seen this problem confirm
 this improves matters?
Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
  I think we would at least need to stop using the fast tuple
  mechanism during gininsert() if it's already too big, and do plain
  inserts.
 No, because we've already got a process working on cleaning it out.
 
 In any case, this needs some testing to see if it's an improvement
 or not.
 
 Having some real-world experience with the fastupdate mechanism: under
 concurrent load it behaves really badly. Random processes waiting for
 cleanup (or competing with cleanup) are going to see latency spikes,
 because they magically hit that corner. Thus, reverting to plain inserts
 when it cannot add to the pending list will not remove the problem, but
 will make it hit only the process actually doing the cleanup.

Yea, this is only a part of fixing fastupdate. Limiting the size of the
fastupdate list to something more reasonable is pretty important as
well. Not competing around cleanup will make cleanup much faster though,
so I am not that concerned about the latency spikes it causes once it's
limited in size and done non-concurrently.

 The built-in mechanism, where cleanup is a cost paid by the process that
 happened to fill the pending list, is really hard to deal with in
 production. More control would be appreciated, perhaps even an explicit
 flush mechanism..  I'd like to batch up inserts during one transaction
 only and flush on commit.

That doesn't seem likely to work with a reasonable amount of effort. The
fastupdate list is shared across all processes, so one backend will
always pay the price for several others.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 13:36 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-14 13:28:47 +0100, Florian Pflug wrote:
 I don't think that can actually happen because the head of the wait list
  isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
 for a while...
 
 Hm, true, but does that protect us under all circumstances? If another
 backend grabs the lock before B gets a chance to do so, B will become the
 wait queue's head, and anyone who enqueues behind B will do so by updating
 B's lwWaitLink. That is then undone by the delayed reset of B's lwWaitLink
 by the original lock holder.
 
 The specific sequence I have in mind is:
 
 A: Take lock
 B: Enqueue
 A: Release lock, set B's lwWaiting to false
 D: Acquire lock
 B: Enqueue after spurious wakeup
   (lock->head := B)
 C: Enqueues behind B
   (B->lwWaitLink := C, lock->tail := C)
 A: Sets B's lwWaitLink back to NULL, thereby corrupting the queue
   (B->lwWaitLink := NULL)
 D: Release lock, dequeue and wakeup B
   (lock->head := B->lwWaitLink, i.e. lock->head := NULL)
 B: Take and release lock, queue appears empty!
   C blocks indefinitely.
 
 That's assuming either reordering by the compiler or an out-of-order
 store architecture, right?

Yes, it requires that a backend exits out of the PGSemaphoreLock loop in
LWLockAcquire before lwWaitLink has been reset to NULL by the previous lock
holder's LWLockRelease. Only if that happens there is a risk of the PGPROC
being on a wait queue by the time lwWaitLink is finally reset to NULL.

 I wonder whether LWLockRelease really needs to update lwWaitLink at all.
 We take the backends we awake off the queue by updating the queue's head and
 tail, so the contents of lwWaitLink should only matter once the backend is
 re-inserted into some wait queue. But when doing that, we reset lwWaitLink
 back to NULL anyway.
 
 I don't like that, this stuff is hard to debug already, having stale
 pointers around doesn't help. I think we should just add the barrier in
 the releases with barrier.h and locally use a volatile var in the
 branches before that.
 
 I like the idea of updating lwWaiting and lwWaitLink while still holding the
 spinlock better. The costs are probably negligible, and it'd make the code
 much easier to reason about.
 
 I agree we should do that, but imo not in the backbranches. Anything
 more than the minimal fix in that code should be avoided in the
 stable branches, this stuff is friggin performance sensitive, and the
 spinlock already is a *major* contention point in many workloads.

No argument there. But unless I missed something, there actually is a bug
there, and using volatile won't fix it. A barrier would, but what about the
back branches that don't have barrier.h? AFAICS the only other choices we have
on these branches are to either ignore the bug - it's probably *extremely*
unlikely - or to use a spinlock acquire/release cycle to create a barrier. The
former leaves me with a bit of an uneasy feeling, and the latter quite
certainly has a larger performance impact than moving the PGPROC updates
within the section protected by the spinlock and using an array to remember
the backends to wakeup.

best regards,
Florian Pflug





Re: [HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-14 Thread Greg Stark
On Fri, Feb 14, 2014 at 12:05 PM, Andres Freund and...@2ndquadrant.com wrote:
 There's no reason not
 to ask for a ping when we're writing.


Is there a reason to ask for a ping? The point of keepalives is to
ensure there's some traffic on idle connections so that if the
connection is dead it doesn't linger forever and so that any on-demand
links (or more recently NAT routers or stateful firewalls) don't time
out and disconnect and have to reconnect (or more recently just fail
outright).

By analogy, TCP doesn't send any keepalives if there is any traffic on
the link. On the other hand I happen to know (the hard way) that
typical PPPoE routers *do* send LCP pings even when the link is busy,
so there's precedent for either behaviour. I'm guessing it comes down to
why you want the keepalives.


-- 
greg




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Alvaro Herrera
Jerry Sievers wrote:

 The other thing that comes to mind, is that as opposed to
 postgresql.conf and the include scenario there... one can do show all or
 query from pg_stat_activity just to see what setting they ended up
 with. 
 
 I'm not aware of any way to probe what hba rules are loaded at runtime
 and thus, debugging hba config changes is not really possible.

Well, getting includes in postgresql.conf in an acceptable shape
required lots more work than just implementing the include directive;
one of the very first requirements was that the file path and line
number that each setting got its value from was available in
pg_settings.  I'm pretty sure that a patch that only added includes
without equivalent functionality would not get very far.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-14 Thread Andres Freund
On 2014-02-14 12:55:06 +, Greg Stark wrote:
 On Fri, Feb 14, 2014 at 12:05 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  There's no reason not
  to ask for a ping when we're writing.

 Is there a reason to ask for a ping? The point of keepalives is to
 ensure there's some traffic on idle connections so that if the
 connection is dead it doesn't linger forever and so that any on-demand
 links (or more recently NAT routers or stateful firewalls) don't time
 out and disconnect and have to reconnect (or more recently just fail
 outright).

This ain't TCP keepalives. The reason is that we want to kill walsenders
if they haven't responded to a ping inside wal_sender_timeout. That's
rather important e.g. for synchronous replication, so we can quickly fail
over to the next standby. In such scenarios you'll usually want a
timeout *far* below anything TCP provides.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] issue with gininsert under very high load

2014-02-14 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-02-14 08:06:40 +0100, Jesper Krogh wrote:

  The built-in mechanism, that cleanup is a cost paid by the process that
  happened to fill the pending list, is really hard to deal with in
  production. More control is appreciated, perhaps even an explicit
  flush mechanism. I'd like to batch up inserts during one transaction
  only and flush on commit.
 
 That doesn't seem likely to work with a reasonable amount of effort. The
 fastupdate list is shared across all processes, so one backend will
 always pay the price for several others.

Unless some other process does it, such as autovacuum.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:
  I agree we should do that, but imo not in the backbranches. Anything
  more than the minimal fix in that code should be avoided in the
  stable branches, this stuff is friggin performance sensitive, and the
  spinlock already is a *major* contention point in many workloads.
 
 No argument there. But unless I missed something, there actually is a bug
 there, and using volatile won't fix it. A barrier would, but what about the
 back branches that don't have barrier.h?

Yea, but I don't see a better alternative. Realistically the likelihood
of a problem without the compiler reordering stuff is minuscule on any
relevant platform that's supported by the !barrier.h branches. Newer
ARMs are the only realistic suspect, and the support for them in older
releases wasn't so good...

 The former
 leaves me with a bit of an uneasy feeling, and the latter quite certainly has
 a larger performance impact than moving the PGPROC updates within the section
 protected by the spinlock and using an array to remember the backends to wakeup.

I am not so sure, it adds a host of new cacheline references in a piece
of code that's already heavily affected by pipeline stalls, that can
influence performance. I am not saying it's super likely, just more than
I want to do for a theoretical problem in the back branches.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-14 Thread Heikki Linnakangas

On 02/14/2014 10:38 AM, Kyotaro HORIGUCHI wrote:

Finally, the attached patch changes only the styling mentioned
above relative to your last patch. This patch applies to current
HEAD and I confirmed that it fixes this issue, but I have not
checked the lastSourceFailed section. Simple file removal could
not lead there.


Ok, applied. Thanks!

- Heikki




Re: [HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-14 Thread Andres Freund
On 2014-02-14 13:58:59 +0100, Andres Freund wrote:
 On 2014-02-14 12:55:06 +, Greg Stark wrote:
  On Fri, Feb 14, 2014 at 12:05 PM, Andres Freund and...@2ndquadrant.com 
  wrote:
   There's no reason not
   to ask for a ping when we're writing.
 
  Is there a reason to ask for a ping? The point of keepalives is to
  ensure there's some traffic on idle connections so that if the
  connection is dead it doesn't linger forever and so that any on-demand
  links (or more recently NAT routers or stateful firewalls) don't time
  out and disconnect and have to reconnect (or more recently just fail
  outright).
 
  This ain't TCP keepalives. The reason is that we want to kill walsenders
  if they haven't responded to a ping inside wal_sender_timeout. That's
  rather important e.g. for synchronous replication, so we can quickly fail
  over to the next standby. In such scenarios you'll usually want a
  timeout *far* below anything TCP provides.

walreceiver sends pings every time it receives a 'w' message, so it's
probably not an issue there, but pg_receivexlog/basebackup don't; they
use their own configured interval. So this might be an explanation of
the latter two being disconnected too early. I've seen reports of
that...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 14:07 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:
 I agree we should do that, but imo not in the backbranches. Anything
 more than the minimal fix in that code should be avoided in the
 stable branches, this stuff is friggin performance sensitive, and the
 spinlock already is a *major* contention point in many workloads.
 
 No argument there. But unless I missed something, there actually is a bug
 there, and using volatile won't fix it. A barrier would, but what about the
 back branches that don't have barrier.h?
 
  Yea, but I don't see a better alternative. Realistically the likelihood
  of a problem without the compiler reordering stuff is minuscule on any
  relevant platform that's supported by the !barrier.h branches. Newer
  ARMs are the only realistic suspect, and the support for them in older
  releases wasn't so good...

Isn't POWER/PowerPC potentially affected by this? 

Also, there is still the alternative of not resetting lwWaitLink in
LWLockRelease. I can understand why you oppose that - it's certainly nicer
to not have stray pointers lying around. But since (at least as far as we know)

a) resetting lwWaitLink is not strictly necessary
b) resetting lwWaitLink introduces a race condition (however unlikely)

we'll trade correctness for cleanliness if we continue to reset lwWaitLink
without protecting against the race. That's a bit of a weird trade-off to make.

Another idea for a fix would be to conflate lwWaiting and lwWaitLink into one
field. We could replace lwWaiting by lwWaitLink != NULL everywhere it's
tested, and set lwWaitLink to some special non-NULL value (say 0x1) when we
enqueue a PGPROC, instead of setting it to NULL and setting lwWaiting to true.

We'd then depend on pointer-sized stores being atomic, which I think we depend
on in other places already.

best regards,
Florian Pflug





Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread David Beck
Let me rephrase this:

Let’s remove my motivations and use cases from this conversation….

Why is it a bad idea to rewrite the query before it reaches transform/analyze
(without ever accessing the catalog)?

If that flexibility is acceptable to you, where would be the best place to put
it in?

Thanks, David


2014.02.14. dátummal, 10:30 időpontban David Beck db...@starschema.net írta:

 Thanks for the reply. There are two points on which I think I've been misunderstood:
 
 1. The point is to do the rewrite without, and before, catalog access.
 2. I do want to push the join to the source and, equally important, to push
 the where conditions there.
 
 Best regards, David
 
 
 2014.02.13. dátummal, 21:22 időpontban Tom Lane t...@sss.pgh.pa.us írta:
 
 David Beck db...@starschema.net writes:
 I have table-like data structures in the source system for the FDW I work on.
 These tables are sometimes too big and the source system is able to filter
 and join them with limitations, thus it is not optimal to transfer the data
 to Postgres.
 At the same time I want the users to think in terms of the original tables.
 
 The idea is to rewrite the SQL queries like this:
 
 “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987”
 
 to:
 
 “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”
 
 TBH this sounds like a spectacularly bad idea, especially in the place and
 way you propose to do it.  You can't even do catalog access safely where
 you've put that hook, not to mention that there are many other places
 where queries can be submitted.  But more generally, an FDW should not
 operate in the way you're describing.
 
 We do lack support for pushing joins to the foreign server, and that needs
 to be addressed; but we need to do it in the planner, not by kluging the
 query somewhere upstream of that.
 
  regards, tom lane
 





Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Bruce Momjian
On Thu, Feb 13, 2014 at 11:28:45PM -0600, Jerry Sievers wrote:
  One issue with this is that pg_hba.conf is order sensitive, which could
  become a trap for the unwary if includes are used carelessly.
 
 Indeed.
 
 The other thing that comes to mind, is that as opposed to
 postgresql.conf and the include scenario there... one can do show all or
 query from pg_stat_activity just to see what setting they ended up
 with. 
 
  I'm not aware of any way to probe what hba rules are loaded at runtime
  and thus, debugging hba config changes is not really possible.

In an ideal world we would have a tool where you could plug in a
username, database, IP address, and test pg_hba.conf file and it would
report what line is matched.

 I presume that a simple scenario involving just 1 level of includes not
 too difficult to grok but nested includes sure might be a foot gun
 unless there was a way to dump the resulting configs somehow.
 
  Thus pasting hba files together externally is a more reliable approach.

You certainly would not have a visual idea of what line is matched
_first_.  We have the same problem with postgresql.conf includes, though
the last match wins there --- not sure if that makes it any easier.

I think one concern is that pg_hba.conf is more security-oriented than
postgresql.conf.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Bruce Momjian
On Fri, Feb 14, 2014 at 03:28:23AM -0500, Stephen Frost wrote:
 Bruce,
 
 * Bruce Momjian (br...@momjian.us) wrote:
  On Thu, Feb 13, 2014 at 08:24:27PM -0600, Jerry Sievers wrote:
   I'm aware of how a pg_hba.conf file can refer to other files for
   including @lists of users, etc.
   
   But there is currently no support for being able to pull in entire file
   segments as can be done for postgresql.conf via the include directive.
   
   In the environment that I'm managing, we are using a makefile to stick
   together a common header with a custom section for any of several
   clusters and may extend this further to permit additional includes for
   hba rules common to groupings of clusters.
   
   Anyway, please advise.  I don't recall hearing anything like this
   discussed.
   
   Has been proposed, discussed and voted down?  Or never mentioned?
  
  I have never heard of anyone request this.
 
 I've brought it up on various threads before, including both the ALTER
 SYSTEM thread and the postgresql.conf 'includes' thread, though I don't
 feel like going back and hunting down the specific emails right now.
 
 Having @include and directory.d-style capabilities for pg_hba.conf *and*
 pg_ident.conf would make managing larger environments much better.
 There has been some talk about providing those capabilities via tables
 in the catalog, but I'm not aware of anyone working on it and it'd
 certainly be quite a bit more work than adding include/dir.d options.

Do we want a TODO for this?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Fri, Feb 14, 2014 at 03:28:23AM -0500, Stephen Frost wrote:
  Having @include and directory.d-style capabilities for pg_hba.conf *and*
  pg_ident.conf would make managing larger environments much better.
  There has been some talk about providing those capabilities via tables
  in the catalog, but I'm not aware of anyone working on it and it'd
  certainly be quite a bit more work than adding include/dir.d options.
 
 Do we want a TODO for this?

Sure, and it should be a candidate for GSoC, imv...

Thanks,

Stephen




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Magnus Hagander
On Fri, Feb 14, 2014 at 3:32 PM, Bruce Momjian br...@momjian.us wrote:

 On Thu, Feb 13, 2014 at 11:28:45PM -0600, Jerry Sievers wrote:
   One issue with this is that pg_hba.conf is order sensitive, which could
   become a trap for the unwary if includes are used carelessly.
 
  Indeed.
 
  The other thing that comes to mind, is that as opposed to
  postgresql.conf and the include scenario there... one can do show all or
  query from pg_stat_activity just to see what setting they ended up
  with.
 
  I'm not aware of any way to probe what hba rules are loaded at runtime
  and thus, debugging hba config changes is not really possible.

 In an ideal world we would have a tool where you could plug in a
 username, database, IP address, and test pg_hba.conf file and it would
 report what line is matched.


I almost wrote a function you could call to do that a while back. I never
finished it though :)

It's not all that hard to do, but requires some minor refactoring of how
the HBA code works.

What would also be useful would be to be able to use such a function/tool
against a different file than the current HBA one, to verify *before* you
reload...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 In an ideal world we would have a tool where you could plug in a
 username, database, IP address, and test pg_hba.conf file and it would
 report what line is matched.

That's not a bad idea, but we don't expose the logic that figures that
out today.  It would, perhaps, not be horrible to duplicate it, but
then we'd need to make sure that we update both places if it ever
changes (not that it's changed much in oh-so-many-years).  Perhaps
another candidate to be a GSoC project.

Thanks,

Stephen




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Bruce Momjian
On Fri, Feb 14, 2014 at 09:34:59AM -0500, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  On Fri, Feb 14, 2014 at 03:28:23AM -0500, Stephen Frost wrote:
   Having @include and directory.d-style capabilities for pg_hba.conf *and*
   pg_ident.conf would make managing larger environments much better.
   There has been some talk about providing those capabilities via tables
   in the catalog, but I'm not aware of anyone working on it and it'd
   certainly be quite a bit more work than adding include/dir.d options.
  
  Do we want a TODO for this?
 
 Sure, and it should be a candidate for GSoC, imv...

OK, added.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Fabrízio de Royes Mello
On Fri, Feb 14, 2014 at 12:36 PM, Stephen Frost sfr...@snowman.net wrote:

 * Bruce Momjian (br...@momjian.us) wrote:
  In an ideal world we would have a tool where you could plug in a
  username, database, IP address, and test pg_hba.conf file and it would
  report what line is matched.

 That's not a bad idea, but we don't expose the logic that figures that
 out today.  It would, perhaps, not be horrible to duplicate it, but
 then we'd need to make sure that we update both places if it ever
 changes (not that it's changed much in oh-so-many-years).  Perhaps
 another candidate to be a GSoC project.


I would like to participate in the GSoC this year as a student.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 Having @include and directory.d-style capabilities for pg_hba.conf *and*
 pg_ident.conf would make managing larger environments much better.

I'm a little suspicious of this, mainly because pg_hba searching is
necessarily linear (and none too cheap per-entry).  I think anyone
who tries to use a set of entries large enough to really need multiple
files is going to have pain.

We already have various methods for making one pg_hba entry do the
work of many; for instance, IP-subnet entries, wildcards, and role
references.  And you can use database CONNECT privilege grants as
another substitute for fine-grained pg_hba entries.

I'd be interested to see a real use-case where those things aren't
an adequate substitute for a pg_hba rule set that's too large to
fit conveniently in one file.  Maybe we could identify another
pg_hba abstraction technique we need to support.

In short: I suspect this approach may be fixing the wrong thing.

regards, tom lane




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 Another idea for a fix would be to conflate lwWaiting and lwWaitLink into one
 field. We could replace lwWaiting by lwWaitLink != NULL everywhere it's
 tested, and set lwWaitLink to some special non-NULL value (say 0x1) when we
 enqueue a PGPROC, instead of setting it to NULL and setting lwWaiting to true.

 We'd then depend on pointer-sized stores being atomic, which I think we depend
 on in other places already.

I don't believe that's true; neither that we depend on it now, nor that
it would be safe to do so.

regards, tom lane




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Andres Freund
On 2014-02-14 10:19:30 -0500, Tom Lane wrote:
 Stephen Frost sfr...@snowman.net writes:
  Having @include and directory.d-style capabilities for pg_hba.conf *and*
  pg_ident.conf would make managing larger environments much better.
 
 I'm a little suspicious of this, mainly because pg_hba searching is
 necessarily linear (and none too cheap per-entry).  I think anyone
 who tries to use a set of entries large enough to really need multiple
 files is going to have pain.

I don't think big hba files should be the primary motivator for a
feature like this. What I have seen is that people want a couple entries
that grant local access to maintenance scripts and
root/postgres. Additionally they want to have a *few* for every
application that runs on the same cluster and another additional couple
of entries allowing access to a limited number of replicas.

It's not too hard to script this: simply have a pg_hba.d directory whose
contents are concatenated in alphabetical order, resulting in a combined
file. But that's clearly inferior to builtin support for that because
the error reporting obviously won't be as good and it's not possible to
nicely hook this into reloads and such.

I wonder if the linear search could be improved a bit by separating
database-specific entries from database-independent ones, as the setups
with large numbers of entries mostly had lots of database-specific
rules. It would probably be nontrivial to guarantee the sequential
lookup order, but it should be possible.

 I'd be interested to see a real use-case where those things aren't
 an adequate substitute for a pg_hba rule set that's too large to
 fit conveniently in one file.  Maybe we could identify another
 pg_hba abstraction technique we need to support.

That very well might be, additionally. The biggest things I have seen are
a) the inability to restrict access based on outside group membership
when using ldap, and b) the inability to grant membership in groups from
outside of pg.

Greetings,

Andres Freund




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-14 10:26:07 -0500, Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
  Another idea for a fix would be to conflate lwWaiting and lwWaitLink into
  one field. We could replace lwWaiting by lwWaitLink != NULL everywhere it's
  tested, and set lwWaitLink to some special non-NULL value (say 0x1) when we
  enqueue a PGPROC, instead of setting it to NULL and setting lwWaiting to
  true.
 
  We'd then depend on pointer-sized stores being atomic, which I think we
  depend on in other places already.
 
 I don't believe that's true; neither that we depend on it now, nor that
 it would be safe to do so.

Yea, we currently rely on 4 byte stores being atomic (most notably for
xids), but we don't rely on anything bigger. I am not sure if there are
architectures where aligned 64bit pointer stores aren't atomic? Even if
there are, that's certainly nothing that should be introduced when
backpatching.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 16:32 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-14 10:26:07 -0500, Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
  Another idea for a fix would be to conflate lwWaiting and lwWaitLink into
  one field. We could replace lwWaiting by lwWaitLink != NULL everywhere it's
  tested, and set lwWaitLink to some special non-NULL value (say 0x1) when we
  enqueue a PGPROC, instead of setting it to NULL and setting lwWaiting to
  true.
 
  We'd then depend on pointer-sized stores being atomic, which I think we
  depend on in other places already.
 
 I don't believe that's true; neither that we depend on it now, nor that
 it would be safe to do so.
 
  Yea, we currently rely on 4 byte stores being atomic (most notably for
  xids), but we don't rely on anything bigger. I am not sure if there are
  architectures where aligned 64bit pointer stores aren't atomic? Even if
  there are, that's certainly nothing that should be introduced when
  backpatching.

Hm, we could use 4-byte stores instead of 8-byte stores if we turned lwWaitLink
into an index into the proc array instead of a pointer to the PGPROC struct.

We could then use 0xFFFFFFFF in place of NULL to indicate not waiting,
and PROCARRAY_MAXPROCS to indicate waiting, but no successor in the queue.

best regards,
Florian Pflug





Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-14 Thread Tom Lane
Hiroshi Inoue in...@tpf.co.jp writes:
 (2014/02/12 8:30), Tom Lane wrote:
 Not very clear what's going on there; could this be a problem in
 narwhal's admittedly-ancient toolchain?

 The error doesn't occur here (un)fortunately.
 One thing I'm wondering about is that plperl is linking perlxx.lib
 not libperlxx.a. I made a patch following plpython and it also
 works here.
 Is it worth trying?

I hadn't noticed that part of plpython's Makefile before.  Man,
that's an ugly technique :-(.  Still, there's little about this
platform that isn't ugly.  Let's try it and see what happens.

regards, tom lane




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-14 15:03:16 +0100, Florian Pflug wrote:
 On Feb14, 2014, at 14:07 , Andres Freund and...@2ndquadrant.com wrote:
  On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:
  I agree we should do that, but imo not in the backbranches. Anything
  more than the minimal fix in that code should be avoided in the
  stable branches, this stuff is friggin performance sensitive, and the
  spinlock already is a *major* contention point in many workloads.
  
  No argument there. But unless I missed something, there actually is a bug
  there, and using volatile won't fix it. A barrier would, but what about the
  back branches that don't have barrier.h?
  
  Yea, but I don't see a better alternative. Realistically the likelihood
  of a problem without the compiler reordering stuff is minuscule on any
  relevant platform that's supported by the !barrier.h branches. Newer
  ARMs are the only realistic suspect, and the support for them in older
  releases wasn't so good...
 
 Isn't POWER/PowerPC potentially affected by this? 

I wouldn't consider it a major architecture... And I am not sure how
much out of order a CPU has to be to be affected by this, the read side
uses spinlocks, which in most of the spinlock implementations implies a
full memory barrier which in many cache coherency designs will cause
other CPUs to flush writes. And I think the control dependency in the
PGSemaphoreUnlock() loop will actually cause a flush on ppc...

 Also, there is still the alternative of not resetting lwWaitLink in 
 LWLockRelease.
 I can understand why you oppose that - it's certainly nicer to not have stray
 pointer lying around. But since (as least as far as we know)
 
 a) resetting lwWaitLink is not strictly necessary

I don't want to rely on that in the backbranches, that's an entirely new
assumption. I think anything more than minimal changes are hard to
justify for a theoretical issue that hasn't been observed in the field.

I think the biggest danger here is that writes are reordered by the
compiler, that we definitely need to protect against. Making a variable
volatile or introducing a memory barrier is pretty simple to analyze.

 b) resetting lwWaitLink introduces a race condition (however unlikely)
 
 we'll trade correctness for cleanliness if we continue to reset lwWaitLink
 without protecting against the race. That's a bit of a weird trade-off to 
 make.

It's not just cleanliness, it's being able to actually debug crashes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  Having @include and directory.d-style capabilities for pg_hba.conf *and*
  pg_ident.conf would make managing larger environments much better.
 
 I'm a little suspicious of this, mainly because pg_hba searching is
 necessarily linear (and none too cheap per-entry).  I think anyone
 who tries to use a set of entries large enough to really need multiple
 files is going to have pain.

It would depend on the connection rate- some of these environments have
low connection rate but quite a few users (eg: corporate database
server).  As expressed up-thread, and in my own experience, people are
already doing this by building up the files themselves, so I don't agree
that performance would be so bad that this isn't worth it.  Not everyone
is using PG to back their website.  Improving performance would always
be good too, or perhaps providing another mechanism which performs
better when there are more entries.

 We already have various methods for making one pg_hba entry do the
 work of many; for instance, IP-subnet entries, wildcards, and role
 references.  And you can use database CONNECT privilege grants as
 another substitute for fine-grained pg_hba entries.

Yes, but those entries all have to agree on the exact authentication
method, including any mappings or other options which are passed through
pg_hba.conf.  Also, CONNECT is only true/false, it doesn't have any
ability to consider source-IP, SSL, etc.

 I'd be interested to see a real use-case where those things aren't
 an adequate substitute for a pg_hba rule set that's too large to
 fit conveniently in one file.  Maybe we could identify another
 pg_hba abstraction technique we need to support.

Being able to 'fit' inside of one file is wholly in the eye of the
beholder and isn't terribly relevant either.  Larger installations will
be using puppet, chef, or one of the other configuration management
systems and will want to use even small/independent files rather than
having to write code which fools around with the one file approach.  The
files I have in /etc/cron.d/ aren't particularly large, but it's really
nice to be able to independently manage and deploy them without
templating them.

Also, all of the above ignores the pg_ident side of the house, which is
even worse as you need an entry for every user, period, if you're using
client-side SSL certificates or Kerberos/GSSAPI-based authentication
with full princ names.

 In short: I suspect this approach may be fixing the wrong thing.

I'm curious what you're thinking would be the right thing to fix here?

Thanks,

Stephen




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Andres Freund
On 2014-02-14 11:03:19 -0500, Stephen Frost wrote:
 Also, all of the above ignores the pg_ident side of the house, which is
 even worse as you need an entry for every user, period, if you're using
 client-side SSL certificates or Kerberos/GSSAPI-based authentication
 with full princ names.

Well, there's regexes for mappings, that can often enough take care of
most of that?
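For reference, a pg_ident.conf regex mapping looks like this (the map name and realm are made up); a system-username beginning with a slash is treated as a regular expression, and \1 is replaced by its first capture group:

```
# MAPNAME   SYSTEM-USERNAME             PG-USERNAME
krbmap      /^([a-z]+)@EXAMPLE\.COM$    \1
```

That covers principals that differ from the database user name only by the realm suffix.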

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-02-14 11:03:19 -0500, Stephen Frost wrote:
  Also, all of the above ignores the pg_ident side of the house, which is
  even worse as you need an entry for every user, period, if you're using
  client-side SSL certificates or Kerberos/GSSAPI-based authentication
  with full princ names.
 
 Well, there's regexes for mappings, that can often enough take care of
 most of that?

In some cases, yes, but certainly not all.  Apologies for over-stating
the case, but I came from an environment where the Kerberos princs were
'm##', while the database users were all first-initial || last-name.
Also, the CN in an SSL certificate isn't likely to be what you want for
a username either, and a regexp isn't likely to help that either.

Thanks,

Stephen




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 In short: I suspect this approach may be fixing the wrong thing.

 I'm curious what you're thinking would be the right thing to fix here?

I was asking for use-cases so we could figure out what's the right thing ;-)

The argument about wanting to assemble a pg_hba file from separately
managed configuration pieces seems to have some merit, but the weak
spot there is how do you define the search order?  Or are you planning
to just cross your fingers and hope it doesn't matter too much?

regards, tom lane




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Andres Freund
On 2014-02-14 11:10:48 -0500, Tom Lane wrote:
 Stephen Frost sfr...@snowman.net writes:
  * Tom Lane (t...@sss.pgh.pa.us) wrote:
  In short: I suspect this approach may be fixing the wrong thing.
 
  I'm curious what you're thinking would be the right thing to fix here?
 
 I was asking for use-cases so we could figure out what's the right thing ;-)
 
 The argument about wanting to assemble a pg_hba file from separately
 managed configuration pieces seems to have some merit, but the weak
 spot there is how do you define the search order?  Or are you planning
 to just cross your fingers and hope it doesn't matter too much?

The usual solution is to prepend a numeric prefix guaranteeing the
search order. 00 is sysadmin stuff, 10 replication, 20 database specific
or somesuch. I think most admins using automated tools to manage bigger
configuration files by using some .d config directory already know how
to deal with that problem.
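As a concrete illustration of that convention (a hypothetical pg_hba.d/ directory; PostgreSQL has no native support for this, which is exactly what's being discussed, so sites assemble the file themselves today):

```shell
# Hypothetical pg_hba.d/ layout; the numeric prefixes fix the search order.
mkdir -p pg_hba.d
printf 'local all postgres peer\n'              > pg_hba.d/00-sysadmin.conf
printf 'host replication repl 10.0.0.0/8 md5\n' > pg_hba.d/10-replication.conf
printf 'host app app 192.168.1.0/24 md5\n'      > pg_hba.d/20-app.conf

# Concatenate in C-locale lexical order (run-parts style) so the
# resulting pg_hba.conf has a deterministic rule order.
cat $(cd pg_hba.d && LC_ALL=C ls *.conf | sed 's|^|pg_hba.d/|') > pg_hba.conf
head -n 1 pg_hba.conf   # the sysadmin rule comes first
```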

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread knizhnik

On 02/14/2014 07:51 PM, Andres Freund wrote:

On 2014-02-14 15:03:16 +0100, Florian Pflug wrote:

On Feb14, 2014, at 14:07 , Andres Freund and...@2ndquadrant.com wrote:

On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:

I agree we should do that, but imo not in the backbranches. Anything
more than the minimal fix in that code should be avoided in the
stable branches, this stuff is friggin performance sensitive, and the
spinlock already is a *major* contention point in many workloads.


No argument there. But unless I missed something, there actually is a bug
there, and using volatile won't fix it. A barrier would, but what about the
back branches that don't have barrier.h?


Yea, but I don't see a better alternative. Realistically the likelihood
of a problem without the compiler reordering stuff is minuscule on any
relevant platform that's supported by the !barrier.h branches. Newer
ARMs are the only realistic suspect, and the support for them in older
releases wasn't so good...


Isn't POWER/PowerPC potentially affected by this?


I wouldn't consider it a major architecture... And I am not sure how
much out of order a CPU has to be to be affected by this, the read side
uses spinlocks, which in most of the spinlock implementations implies a
full memory barrier which in many cache coherency designs will cause
other CPUs to flush writes. And I think the control dependency in the
PGSemaphoreUnlock() loop will actually cause a flush on ppc...


PGSemaphoreUnlock() should really introduce a memory barrier, but the problem 
can happen before PGSemaphoreUnlock() is called.
So the presence of PGSemaphoreUnlock() just reduces the probability of a race 
condition on non-TSO architectures (PPC, ARM, IA64, ...) but doesn't completely 
eliminate it.




Also, there is still the alternative of not resetting lwWaitLink in 
LWLockRelease.
I can understand why you oppose that - it's certainly nicer to not have stray
pointer lying around. But since (as least as far as we know)

a) resetting lwWaitLink is not strictly necessary


I don't want to rely on that in the backbranches, that's an entirely new
assumption. I think anything more than minimal changes are hard to
justify for a theoretical issue that hasn't been observed in the field.

I think the biggest danger here is that writes are reordered by the
compiler, that we definitely need to protect against. Making a variable
volatile or introducing a memory barrier is pretty simple to analyze.


b) resetting lwWaitLink introduces a race condition (however unlikely)

we'll trade correctness for cleanliness if we continue to reset lwWaitLink
without protecting against the race. That's a bit of a weird trade-off to make.


It's not just cleanliness, it's being able to actually debug crashes.



Frankly speaking, I do not understand why eliminating the resetting of 
lwWaitLink was considered to be a bad idea.
Resetting this pointer to NULL will not help much in analyzing crash dumps, 
because right now it is possible that lwWaitLink==NULL while a process is 
waiting for a lock and linked in the list
(if it is the last element of the list). So the fact that lwWaitLink==NULL 
actually gives little useful information.





Greetings,

Andres Freund







Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-14 20:23:32 +0400, knizhnik wrote:
 we'll trade correctness for cleanliness if we continue to reset lwWaitLink
 without protecting against the race. That's a bit of a weird trade-off to 
 make.
 
 It's not just cleanliness, it's being able to actually debug crashes.
 
 
 Frankly speaking, I do not understand why eliminating the resetting of 
 lwWaitLink was considered to be a bad idea.
 Resetting this pointer to NULL will not help much in analyzing crash dumps, 
 because right now it is possible that lwWaitLink==NULL while a process is 
 waiting for a lock and linked in the list
 (if it is the last element of the list). So the fact that lwWaitLink==NULL 
 actually gives little useful information.

At the moment if you connect to a live pg or a crash dump, investigating
the wait links allows you to somewhat sensibly determine which backends
are waiting for a lock and whether they are part of a queue. If they
aren't reset anymore that will tell you nothing, so you'll need to
connect to all pg instances to debug issues.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread knizhnik

On 02/14/2014 08:28 PM, Andres Freund wrote:

On 2014-02-14 20:23:32 +0400, knizhnik wrote:

we'll trade correctness for cleanliness if we continue to reset lwWaitLink
without protecting against the race. That's a bit of a weird trade-off to make.


It's not just cleanliness, it's being able to actually debug crashes.



Frankly speaking, I do not understand why eliminating the resetting of 
lwWaitLink was considered to be a bad idea.
Resetting this pointer to NULL will not help much in analyzing crash dumps, 
because right now it is possible that lwWaitLink==NULL while a process is 
waiting for a lock and linked in the list
(if it is the last element of the list). So the fact that lwWaitLink==NULL 
actually gives little useful information.


At the moment if you connect to a live pg or a crash dump, investigating
the wait links allows you to somewhat sensibly determine which backends
are waiting for a lock and whether they are part of a queue. If they
aren't reset anymore that will tell you nothing, so you'll need to
connect to all pg instances to debug issues.


Why is it not enough to inspect the lwWaiting flag?
In any case, resetting lwWaitLink can safely be done in the awakened process:


if (!proc->lwWaiting)
{
    proc->lwWaitLink = NULL;
    break;
}





Greetings,

Andres Freund







Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread Greg Stark
On Fri, Feb 14, 2014 at 2:28 PM, David Beck db...@starschema.net wrote:
 Why is it a bad idea to rewrite the query before it reaches
 transform/analyze (without ever accessing the catalog)?

 If that flexibility is acceptable to you, where would be the best place to 
 put it in?

Well if there are two foreign tables and the planner could push the
join work down to the fdw then the planner should be able to
accurately represent that plan and cost it without having the user
have to create any catalog structures. That's what the planner does
for every other type of plan node.

What you're describing would still be useful for materialized views.
In that case the user is creating the materialized view and it is a
real thing in the catalogs that won't disappear on the planner. Even
then it would be ideal if the planner could decide to use the
materialized view late enough that it can actually determine if it's
superior rather than rewriting the query before it gets to that point.
That would be much more flexible for users too who might not write the
query in a way that exactly matches the materialized view.

-- 
greg




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-02-14 11:10:48 -0500, Tom Lane wrote:
  Stephen Frost sfr...@snowman.net writes:
   * Tom Lane (t...@sss.pgh.pa.us) wrote:
   In short: I suspect this approach may be fixing the wrong thing.
  
   I'm curious what you're thinking would be the right thing to fix here?
  
  I was asking for use-cases so we could figure out what's the right thing ;-)

The specific use-cases that I've seen are primarily around managing the
sets of users who are allowed to connect from specific subnets.  As for
how the files might be independently managed, as Andres pointed out, all
systems would get a 'default' set which includes access for the local
'postgres' unix user, and then certain sets of systems would get
appropriate combinations of:

system-specific admin users (from anywhere)
normal users for that environment (from the desktop network)
application users for app servers (from the app servers)
ETL users for batch systems (from the ETL servers)
sales users for the *replica* (whose CONNECT privs can't be different..)

etc..  Would it be possible to have those all in one pg_hba file?  Sure,
in most cases, and you'd trust that the accounts simply wouldn't exist
or that they wouldn't have CONNECT privs on the database, but this isn't
how I run my systems (entries in pg_hba for users that don't exist?) and
I doubt I'm alone in that.  Being able to enforce based on source-IP is
*very* nice and is only available in pg_hba today.  Also, as noted
above, if you're running a replica that should have different access
rights (in particular, more 'open') then you can't depend on CONNECT
privs or account authentication (username/password differences) to
control access since that's all in the catalog and therefore must be
identical to the primary.

  The argument about wanting to assemble a pg_hba file from separately
  managed configuration pieces seems to have some merit, but the weak
  spot there is how do you define the search order?  Or are you planning
  to just cross your fingers and hope it doesn't matter too much?
 
 The usual solution is to prepend a numeric prefix guaranteeing the
 search order. 00 is sysadmin stuff, 10 replication, 20 database specific
 or somesuch. I think most admins using automated tools to manage bigger
 configuration files by using some .d config directory already know how
 to deal with that problem.

Indeed, and run-parts codified it as 'lexical sort order (according to
the C/POSIX locale)', under at least Debian and Ubuntu systems.  I'm
pretty confident that most admins are going to be familiar with handling
this issue as it's exactly how /etc/cron.daily and anything else called
through run-parts works.

Thanks,

Stephen




Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-14 Thread Tom Lane
I wrote:
 Hiroshi Inoue in...@tpf.co.jp writes:
 One thing I'm wondering about is that plperl is linking perlxx.lib
 not libperlxx.a. I made a patch following plpython and it also
 works here.
 Is it worth trying?

 I hadn't noticed that part of plpython's Makefile before.  Man,
 that's an ugly technique :-(.  Still, there's little about this
 platform that isn't ugly.  Let's try it and see what happens.

And what happens is this:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=narwhal&dt=2014-02-14%2017%3A00%3A02
namely, it gets through plperl now and then chokes with the same
symptoms on pltcl.  So I guess we need the same hack in pltcl.
The fun never stops ...

(BTW, narwhal is evidently not trying to build plpython.  I wonder
why not?)

regards, tom lane




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Jerry Sievers
Tom Lane t...@sss.pgh.pa.us writes:

 Stephen Frost sfr...@snowman.net writes:

 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 In short: I suspect this approach may be fixing the wrong thing.

 I'm curious what you're thinking would be the right thing to fix here?

 I was asking for use-cases so we could figure out what's the right thing ;-)

No doubt this environment I took over managing, of 90 clusters ranging
from tiny to ~3.5TB, some with hundreds of user accounts, could stand
some massive rejiggering in regards to users/roles and such to go towards
doing it the right way.

That said, trying to roll up the hba files, some with over 300 entries
and lots of high duplication among clusters belonging to some subset,
would be daunting and perhaps invasive.

I realize that gathering up those duplicates into a file common to
whatever group and then having that pulled in as an include is going to
result in some reordering of the rules, since they are not in a totally
predictable order presently.

And my Jr. DBAs are still handling requests daily to add more hba rules,
quite often to more than a single machine, but still along the same
groupings mentioned.

Even without retrofitting a good cleanup here, being able to include a
global section at top of all files and at least one group specific file
next then followed by legacy entries and/or items specific to a single
cluster, might save a lot of work.

Thus my idea of using a makefile for this, which then inspired the
question about includes.

Great, just getting the ball rolling, and I respect whatever consensus
opinion emerges.

Thx


 The argument about wanting to assemble a pg_hba file from separately
 managed configuration pieces seems to have some merit, but the weak
 spot there is how do you define the search order?  Or are you planning
 to just cross your fingers and hope it doesn't matter too much?

   regards, tom lane


-- 
Jerry Sievers
e: jerry.siev...@comcast.net
p: 312.241.7800




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 16:51 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-14 15:03:16 +0100, Florian Pflug wrote:
 On Feb14, 2014, at 14:07 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:
 I agree we should do that, but imo not in the backbranches. Anything
 more than the minimal fix in that code should be avoided in the
 stable branches, this stuff is friggin performance sensitive, and the
 spinlock already is a *major* contention point in many workloads.
 
 No argument there. But unless I missed something, there actually is a bug
 there, and using volatile won't fix it. A barrier would, but what about the
 back branches that don't have barrier.h?
 
 Yea, but I don't see a better alternative. Realistically the likelihood
 of a problem without the compiler reordering stuff is minuscule on any
 relevant platform that's supported by the !barrier.h branches. Newer
 ARMs are the only realistic suspect, and the support for them in older
 releases wasn't so good...
 
 Isn't POWER/PowerPC potentially affected by this? 
 
 I wouldn't consider it a major architecture... And I am not sure how
 much out of order a CPU has to be to be affected by this, the read side
 uses spinlocks, which in most of the spinlock implementations implies a
 full memory barrier which in many cache coherency designs will cause
 other CPUs to flush writes. And I think the control dependency in the
 PGSemaphoreUnlock() loop will actually cause a flush on ppc...

I guess it's sufficient for the store to lwWaitLink to be delayed.
As long as the CPU on which that store is executing hasn't managed to gain
exclusive access to the relevant cache line, memory barriers on the read
side won't help because the store won't be visible to other CPUs.

 Also, there is still the alternative of not resetting lwWaitLink in 
 LWLockRelease.
 I can understand why you oppose that - it's certainly nicer to not have stray
 pointer lying around. But since (as least as far as we know)
 
 a) resetting lwWaitLink is not strictly necessary
 
 I don't want to rely on that in the backbranches, that's an entirely new
 assumption. I think anything more than minimal changes are hard to
 justify for a theoretical issue that hasn't been observed in the field.
 
 I think the biggest danger here is that writes are reordered by the
 compiler, that we definitely need to protect against. Making a variable
 volatile or introducing a memory barrier is pretty simple to analyze.

Well, the assumption isn't all that new. We already have the situation that
a PGPROC may be not on any wait queue, yet its lwWaitLink may be non-NULL.
Currently, the process that took it off the queue is responsible for rectifying
that eventually, with the changed proposed below it'll be the backend who
owns the PGPROC. From the point of view of other backends, nothing really
changes.

 
 b) resetting lwWaitLink introduces a race condition (however unlikely)
 
 we'll trade correctness for cleanliness if we continue to reset lwWaitLink
 without protecting against the race. That's a bit of a weird trade-off to 
 make.
 
 It's not just cleanliness, it's being able to actually debug crashes.

We could still arrange for the stray lwWaitLink to be visible only
momentarily. If a backend is woken up and detects that lwWaiting is false,
it knows that it cannot be on any wait queue - it was just removed, and
hasn't added itself again yet. At that point, it's safe to reset lwWaitLink
back to NULL. The stray value would thus only be visible while a backend 
executes
the PGSemaphoreLock() loop, and whether or not this is the case can be deduced
from a stack trace. So I'm not convinced that this makes debugging any harder.

(knizhnik has just suggested the same)

It's interesting, BTW, that updating lwWaitLink after lwWaiting is OK here -
the crucial property is not that it's updated before lwWaiting, but rather that
it is reset before enqueuing the backend again. Which is something that backend
itself can guarantee far more easily than whoever happens to de-queue it due to
spurious wakeups.

best regards,
Florian Pflug





Re: [HACKERS] git-review: linking commits to review discussion in git

2014-02-14 Thread Peter C Rigby
On 2014-01-28 07:10:45 Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 01/27/2014 11:36 PM, Murtuza Mukadam wrote:

 Hello All,
We have linked peer review discussions on
 'pgsql-hackers' to their respective commits within the main
 postgresql.git repository. You can view the linked reviews from 2012
 until present in the GitHub repo at
 https://github.com/mmukadam/postgres/tree/review
[snip]
 I don't understand what this does. The repository at
 https://github.com/mmukadam/postgres looks like just a clone of the main
 PostgreSQL repository, with no extra links anywhere. And the repository at
 https://github.com/mmukadam/postgres/tree/review looks like a mailing list
 archive turned into a git repository, but I don't see any links to the
 commits in the main repository there.

 Am I missing something?

If you want to search for the reviews related to a commit, you can
type in the full hash code here:
http://users.encs.concordia.ca/~m_mukada/git-review-tracker.html

or append the commit hash to the following url:

https://github.com/mmukadam/postgres/tree/review/commit_hash


It's a bit hard to view the reviews on GitHub because the 'review' branch was
designed to be viewed on the command line using a set of simple scripts that
sit on top of the standard git commands.
For example,

to view all reviews Heikki is involved in you can do: git review
--log-reviewer heikki

to view the reviews of particular commit: git review --log
1a1832eb085e5bca198735e5d0e766a3cb61b8fc

You can also create new reviews on a commit and mail them as well as
import reviews from email.

The tool is beta, but it just creates a detached review branch, so it
won't affect master. To remove it all you do is delete the scripts and
the 'review' branch. More info available here:

Online man page: http://users.encs.concordia.ca/~m_mukada/git-review.html

Tutorial: http://users.encs.concordia.ca/~m_mukada/git-review-tutorial.html

Installation: 
http://users.encs.concordia.ca/~m_mukada/git-review-tutorial.html#Install

Cheers,
Peter

PS this is part of Murtuza's thesis, so any feedback is appreciated.
If you'd like us to extract reviews from another mailing list or repo,
please let us know.


Example output:

git review --log-reviewer heikki

Review: adfe85d4dd1844c1f09401d2d970f564d0dac61d
Commit Reviewed: 593a9631a7947ab95903e87e24786d7e469cc988
Author: Tom Lane t...@sss.pgh.pa.us
AuthorDate: Tue Feb 21 15:03:36 2012 -0500
Reviewer:   Heikki Linnakangas hlinnakan...@vmware.com
ReviewDate: Mon Jan 27 16:34:44 2014 +0200


Re: Race condition in b-tree page deletion

Review: e7bfe4323d2b83605df83174f11785c117ad3a53
Commit Reviewed: 56a57473a999b0497e63bde3e303beda5a3c0ff3
Author: Tom Lane t...@sss.pgh.pa.us
AuthorDate: Sat Jan 8 14:47:13 2011 -0500
Reviewer:   Heikki Linnakangas hlinnakan...@vmware.com
ReviewDate: Sat Jan 25 23:21:20 2014 +0200


Re: GIN improvements part2: fast scan



-- 
http://users.encs.concordia.ca/~pcr/




Re: [HACKERS] Release schedule for 9.3.3?

2014-02-14 Thread Josh Berkus
All,

Do the 9.3.3 replication fixes mean that users should reclone their
replicas, like 9.3.2 did?  Or not?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Release schedule for 9.3.3?

2014-02-14 Thread Andres Freund
Hi,

On 2014-02-14 13:08:34 -0500, Josh Berkus wrote:
 Do the 9.3.3 replication fixes mean that users should reclone their
 replicas, like 9.3.2 did?  Or not?

Which replication fixes are you referring to?
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ebde6c40148c9f811f7c6d35f67e7ea3ce2d9b34
?
If so, no, that doesn't require a reclone.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-14 18:49:33 +0100, Florian Pflug wrote:
  I wouldn't consider it a major architecture... And I am not sure how
  much out of order a CPU has to be to be affected by this, the read side
  uses spinlocks, which in most of the spinlock implementations implies a
  full memory barrier which in many cache coherency designs will cause
  other CPUs to flush writes. And I think the control dependency in the
  PGSemaphoreUnlock() loop will actually cause a flush on ppc...
 
 I guess it's sufficient for the store to lwWaitLink to be delayed.
 As long as the CPU on which that store is executing hasn't managed to gain
 exclusive access to the relevant cache line, memory barriers on the read
 side won't help because the store won't be visible to other CPUs.

The whole lwlock actually should be on the same cacheline on anything
with cachelines >= 32. As the woken up side is doing an atomic op on it
*before* modifying lwWaitLink I think we are actually guaranteed that
any incomplete store on the writer will have completed unless the compiler
reordered. So we are safe against out of order stores, but not out of
order execution. That might have been what prevented the issue from
being noticeable on some platforms.

 Well, the assumption isn't all that new. We already have the situation that
 a PGPROC may be not on any wait queue, yet its lwWaitLink may be non-NULL.
 Currently, the process who took it off the queue is responsible for rectifying
 that eventually, with the changed proposed below it'll be the backend who
 owns the PGPROC. From the point of view of other backends, nothing really
 changes.

That window is absolutely tiny tho.

  b) resetting lwWaitLink introduces a race condition (however unlikely)
  
  we'll trade correctness for cleanliness if we continue to reset lwWaitLink
  without protecting against the race. That's a bit of a weird trade-off to 
  make.
  
  It's not just cleanliness, it's being able to actually debug crashes.
 
 We could still arrange for the stray lwWaitLink from being visible only
 momentarily. If a backend is woken up and detects that lwWaiting is false,
 it knows that it cannot be on any wait queue - it was just removed, and
 hasn't added itself again yet. At that point, it's safe to reset lwWaitLink
 back to NULL. The stray value would thus only be visible while a backend 
 executes
 the PGSemaphoreLock() loop, and whether or not this is the case can be deduced
 from a stack trace. So I'm not convinced that this makes debugging any harder.

That's not actually safe on an out of order architecture afaics. Without
barriers the store to lwWaitLink in the woken up backend can preempt the
read for the next element in the PGSemaphoreUnlock() loop.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Release schedule for 9.3.3?

2014-02-14 Thread Josh Berkus
On 02/14/2014 01:11 PM, Andres Freund wrote:
 Hi,
 
 On 2014-02-14 13:08:34 -0500, Josh Berkus wrote:
 Do the 9.3.3 replication fixes mean that users should reclone their
 replicas, like 9.3.2 did?  Or not?
 
 Which replication fixes are you referring to?
 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ebde6c40148c9f811f7c6d35f67e7ea3ce2d9b34
 ?
 If so, no, that doesn't require a reclone.

Hmmm.  I thought there were also 9.3-only replication fixes in this
release?  No?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Release schedule for 9.3.3?

2014-02-14 Thread Andres Freund
On 2014-02-14 13:33:46 -0500, Josh Berkus wrote:
 On 02/14/2014 01:11 PM, Andres Freund wrote:
  Hi,
  
  On 2014-02-14 13:08:34 -0500, Josh Berkus wrote:
  Do the 9.3.3 replication fixes mean that users should reclone their
  replicas, like 9.3.2 did?  Or not?
  
  Which replication fixes are you referring to?
  http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ebde6c40148c9f811f7c6d35f67e7ea3ce2d9b34
  ?
  If so, no, that doesn't require a reclone.
 
 Hmmm.  I thought there were also 9.3-only replication fixes in this
 release?  No?

I don't know of any. There are further multixact fixes but AFAIK there's
nothing replication specific, and they shouldn't cause anything worse than
lost row locks in some edge cases.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Jeff Janes
On Fri, Feb 14, 2014 at 6:33 AM, Bruce Momjian br...@momjian.us wrote:

 On Fri, Feb 14, 2014 at 03:28:23AM -0500, Stephen Frost wrote:
  Bruce,

  Having @include and directory.d-style capabilities for pg_hba.conf *and*
  pg_ident.conf would make managing larger environments much better.
  There has been some talk about providing those capabilities via tables
  in the catalog, but I'm not aware of anyone working on it and it'd
  certainly be quite a bit more work than adding include/dir.d options.

 Do we want a TODO for this?


If we are assembling a wish-list, I've often wanted the opposite of an
include.  I want the ability to encapsulate the contents of pg_hba.conf
directly into postgresql.conf.  So instead of giving a filename to
hba_file, optionally give a multi-lined string with some kind of here-doc
like mechanism, or something like that.

When I set up a forked dev environment and then eventually want to compare
the diverged dev setup back to production, I often forget to compare the
pg_hba.conf file.

Cheers,

Jeff


Re: [HACKERS] Release schedule for 9.3.3?

2014-02-14 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-02-14 13:33:46 -0500, Josh Berkus wrote:
  On 02/14/2014 01:11 PM, Andres Freund wrote:
   Hi,
   
   On 2014-02-14 13:08:34 -0500, Josh Berkus wrote:
   Do the 9.3.3 replication fixes mean that users should reclone their
   replicas, like 9.3.2 did?  Or not?
   
   Which replication fixes are you referring to?
   http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ebde6c40148c9f811f7c6d35f67e7ea3ce2d9b34
   ?
   If so, no, that doesn't require a reclone.
  
  Hmmm.  I thought there were also 9.3-only replication fixes in this
  release?  No?
 
 I don't know of any. There are further multixact fixes but AFAIK there's
 nothing replication specific, and they shouldn't cause anything worse than
 lost row locks in some edge cases.

There is one issue that might cause foreign keys to go unchecked.  In
cases where applications update referenced tuples and then delete them
in the same transaction, it might be wise to recheck foreign keys.  This
is the relevant commit:

commit db1014bc46de21a6de1751b807ea084e607104ad
Author: Alvaro Herrera alvhe...@alvh.no-ip.org
Date:   Wed Dec 18 13:31:27 2013 -0300

Don't ignore tuple locks propagated by our updates

If a tuple was locked by transaction A, and transaction B updated it,
the new version of the tuple created by B would be locked by A, yet
visible only to B; due to an oversight in HeapTupleSatisfiesUpdate, the
lock held by A wouldn't get checked if transaction B later deleted (or
key-updated) the new version of the tuple.  This might cause referential
integrity checks to give false positives (that is, allow deletes that
should have been rejected).

This is an easy oversight to have made, because prior to improved tuple
locks in commit 0ac5ad5134f it wasn't possible to have tuples created by
our own transaction that were also locked by remote transactions, and so
locks weren't even considered in that code path.

It is recommended that foreign keys be rechecked manually in bulk after
installing this update, in case some referenced rows are missing with
some referencing row remaining.

Per bug reported by Daniel Wood in
CAPweHKe5QQ1747X2c0tA=5zf4yns2xcvgf13opd-1mq24rf...@mail.gmail.com

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-14 Thread Greg Stark
Going over this I think this is still a potential issue:

On 31 Jan 2014 15:56, Andres Freund and...@2ndquadrant.com wrote:


 I am not sure that explains the issue, but I think the redo action for
 truncation is not safe across crashes.  A XLOG_SMGR_TRUNCATE will just
 do a smgrtruncate() (and then mdtruncate) which will iterate over the
 segments starting at 0 till mdnblocks()/segment_size and *truncate* but
 not delete individual segment files that are not needed anymore, right?
 If we crash in the midst of that a new mdtruncate() will be issued, but
 it will get a shorter value back from mdnblocks().

 Am I missing something?


I'm not too familiar with md.c but my reading of the code is that we
truncate the files in reverse order? In which case I think the code is safe
*iff* the filesystem guarantees ordered meta data writes, which I think ext3
does (I think in all the journal modes). Most filesystems' meta data writes
are synchronous so the truncates are safe for them too.

But we don't generally rely on meta data writes being ordered. I think the
correct thing to do is to record the nblocks prior to the truncate and
then have md.c expose a new function that takes that parameter and pokes
around looking for any segments it might need to clean up. But that would
involve lots of abstraction violations in md.c. I think using nblocks would
keep the violations within md.c but that still seems like a pain.


Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread David Beck
I think I’m gonna need to dig into the planner to fully understand your points. 
Thank you for the insights. I was more into putting the knowledge of the legacy 
system into an extension and my codebase. Now I see how better use of the
planner would help. Thank you.

What inspired me is the scriptable query rewrite in 
http://dev.mysql.com/downloads/mysql-proxy/ 
The hook I proposed would be a lot nicer in Postgres because the raw parsing is 
already done at this point while in mysql-proxy that has to be done manually.

Another point I liked in mysql is the possibility to write info schema plugins: 
http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
Like a virtual catalog. Is there anything similar in Postgres?

Thank you, David


On 14 Feb 2014, at 18:06, Greg Stark st...@mit.edu wrote:

 On Fri, Feb 14, 2014 at 2:28 PM, David Beck db...@starschema.net wrote:
 Why is that a bad idea of rewriting the query before it reaches 
 transform/analyze (without ever accessing the catalog)?
 
 If that flexibility is acceptable to you, where would be the best place to 
 put it in?
 
 Well if there are two foreign tables and the planner could push the
 join work down to the fdw then the planner should be able to
 accurately represent that plan and cost it without having the user
 have to create any catalog structures. That's what the planner does
 for every other type of plan node.
 
 What you're describing would still be useful for materialized views.
 In that case the user is creating the materialized view and it is a
 real thing in the catalogs that won't disappear on the planner. Even
 then it would be ideal if the planner could decide to use the
 materialized view late enough that it can actually determine if it's
 superior rather than rewriting the query before it gets to that point.
 That would be much more flexible for users too who might not write the
 query in a way that exactly matches the materialized view.
 
 -- 
 greg





Re: [HACKERS] [SQL] Comparison semantics of CHAR data type

2014-02-14 Thread Bruce Momjian
On Thu, Feb 13, 2014 at 09:47:01PM -0500, Bruce Momjian wrote:
 On Wed, Oct 16, 2013 at 02:17:11PM -0400, Bruce Momjian wrote:
You can see the UTF8 case is fine because \n is considered greater
than space, but in the C locale, where \n is less than space, the
false return value shows the problem with
internal_bpchar_pattern_compare() trimming the string and first
comparing on lengths.  This is exactly the problem you outline, where
space trimming assumes everything is less than a space.
   
   For collations other than C some of those issues that have to do with
   string comparisons might simply be hidden, depending on how strcoll()
    handles inputs of different lengths: If strcoll() applies implicit
   space padding to the shorter value, there won't be any visible
   difference in ordering between bpchar and varchar values.  If strcoll()
   does not apply such space padding, the right-trimming of bpchar values
   causes very similar issues even in a en_US collation.
 
 I have added the attached C comment to explain the problem, and added a
 TODO item to fix it if we ever break binary upgrading.
 
 Does anyone think this warrants a doc mention?

I have done some more thinking on this and I found a way to document
this, which reduces our need to actually fix it some day.  I am afraid
the behavioral change needed to fix this might break so many
applications that the fix will never be done, though I will keep the
TODO item until I get more feedback on that.  Patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index 30fd9bb..9635004
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*** SELECT '52093.89'::money::numeric::float
*** 1072,1081 
 para
  Values of type typecharacter/type are physically padded
  with spaces to the specified width replaceablen/, and are
! stored and displayed that way.  However, the padding spaces are
! treated as semantically insignificant.  Trailing spaces are
! disregarded when comparing two values of type typecharacter/type,
! and they will be removed when converting a typecharacter/type value
  to one of the other string types.  Note that trailing spaces
  emphasisare/ semantically significant in
  typecharacter varying/type and typetext/type values, and
--- 1072,1084 
 para
  Values of type typecharacter/type are physically padded
  with spaces to the specified width replaceablen/, and are
! stored and displayed that way.  However, trailing spaces are treated as
! semantically insignificant and disregarded when comparing two values
! of type typecharacter/type.  In collations where whitespace
! is significant, this behavior can produce unexpected results,
! e.g. commandSELECT 'a '::CHAR(2) collate C < E'a\n'::CHAR(2)
! returns true.
! Trailing spaces are removed when converting a typecharacter/type value
  to one of the other string types.  Note that trailing spaces
  emphasisare/ semantically significant in
  typecharacter varying/type and typetext/type values, and
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
new file mode 100644
index 284b5d1..502ca44
*** a/src/backend/utils/adt/varchar.c
--- b/src/backend/utils/adt/varchar.c
*** bpcharcmp(PG_FUNCTION_ARGS)
*** 846,863 
  len2;
  	int			cmp;
  
- 	/*
- 	 * Trimming trailing spaces off of both strings can cause a string
- 	 * with a character less than a space to compare greater than a
- 	 * space-extended string, e.g. this returns false:
- 	 *		SELECT E'ab\n'::CHAR(10) < E'ab '::CHAR(10);
- 	 * even though '\n' is less than the space if CHAR(10) was
- 	 * space-extended.  The correct solution would be to trim only
- 	 * the longer string to be the same length of the shorter, if
- 	 * possible, then do the comparison.  However, changing this
- 	 * might break existing indexes, breaking binary upgrades.
- 	 * For details, see http://www.postgresql.org/message-id/CAK+WP1xdmyswEehMuetNztM4H199Z1w9KWRHVMKzyyFM+hV=z...@mail.gmail.com
- 	 */
  	len1 = bcTruelen(arg1);
  	len2 = bcTruelen(arg2);
  
--- 846,851 



Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-02-14 Thread Alvaro Herrera
Haribabu Kommi escribió:

 I changed the balance cost calculations a little bit to give priority to
 the user provided per table autovacuum parameters.
 If any user specified per table vacuum parameters exists and those are
 different with guc vacuum parameters then the
 balance cost calculations will not include that worker in calculation. Only
 the cost is distributed between other workers
 with specified guc vacuum cost parameter.
 
 The problem in this calculation is if the user provides same guc values to
 the per table values also then it doesn't consider them in calculation.

I think this is a strange approach to the problem, because if you
configure the backends just so, they are completely ignored instead of
being adjusted.  And this would have action-at-a-distance consequences
because if you change the defaults in postgresql.conf you end up with
completely different behavior on the tables for which you have carefully
tuned the delay so that they are ignored in rebalance calculations.

I think that rather than ignoring some backends completely, we should be
looking at how to weight the balancing calculations among all the
backends in some smart way that doesn't mean they end up with the
default values of limit, which AFAIU is what happens now -- which is
stupid.  Not real sure how to do that, perhaps base it on the
globally-configured ratio of delay/limit vs. the table-specific ratio.

What I mean is that perhaps the current approach is all wrong and we
need to find a better algorithm to suit this case and more generally.
Of course, I don't mean to say that it should behave completely
differently than now in normal cases, only that it shouldn't give
completely stupid results in corner cases such as this one.

As an example, suppose that global limit=200 and global delay=20 (the
defaults).  Then we have a global ratio of 10.  If all three tables being
vacuumed currently are using the default values, then they all have
ratio=10 and therefore all should have the same limit and delay settings
applied after rebalance.  Now, if two tables have ratio=10 and one table
has been configured to have a very fast vacuum, that is limit=10000,
then the ratio for that table is 10000/20=500.  Therefore that table should
be configured, after rebalance, to have a limit and delay that are 50
times faster than the settings for the other two tables.  (And there is
a further constraint that the total delay per limit unit should be
so-and-so to accommodate getting the correct total delay per limit unit.)

I haven't thought about how to code that, but I don't think it should be
too difficult.  Want to give it a try?  I think it makes sense to modify
both the running delay and the running limit to achieve whatever ratio
we come up with, except that delay should probably not go below 10ms
because, apparently, some platforms have that much of sleep granularity
and it wouldn't really work to have a smaller delay.

Am I making sense?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster

2014-02-14 Thread Kevin Grittner
We have had a case where a production cluster was accidentally shut
down by a customer who used Ctrl+C in the same sh session in which
they had (long before) run pg_ctl start.  We have only seen this in
sh on Solaris.  Other shells on Solaris don't behave this way, nor
does sh on tested versions of Linux.  Nevertheless, the problem is
seen on the default shell for a supported OS.

Analysis suggests that this is because the postmaster retains the
process group ID of the original parent (in this case pg_ctl).  If
pg_ctl is run through the setpgrp command a subsequent Ctrl+C in
the sh session does not shut down the PostgreSQL cluster.

On my development Linux machine:

$ ps axfo pid,ppid,pgid,command
  PID  PPID  PGID COMMAND
[ ... ]
 8416 1  8412 /home/kgrittn/pg/master/Debug/bin/postgres -D Debug/data
 8418  8416  8418  \_ postgres: checkpointer process
 8419  8416  8419  \_ postgres: writer process
 8420  8416  8420  \_ postgres: wal writer process
 8421  8416  8421  \_ postgres: autovacuum launcher process
 8422  8416  8422  \_ postgres: stats collector process
 8427  8416  8427  \_ postgres: kgrittn test [local] idle

All of the PPID values seem correct, and while the PGID values for
backends might initially seem surprising, the commit notes and C
comments here explain why each backend sets up its own process
group:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3ad0728

What is surprising is that the postmaster doesn't set up its own
process group when it is running as a daemon.  We probably don't
want to change that when postgres is run directly from a command
line for development or diagnostic purposes, but Noah suggested
perhaps we should add a --daemonize option which pg_ctl should use
when launching the postmaster, which would cause it to create its
own session group.

Although this is arguably a bug, it seems like it is very rarely
hit and has several workarounds, and any fix would either change
things in a way which might break existing user scripts or would
require a new command-line option; so back-patching a fix to stable
branches doesn't seem appropriate.  I would argue for including a
fix in 9.4 on the basis of it being a bug fix and there being time
to mention it in the release change notes; but I understand the
counter-arguments and realize this is a judgment call.

Thoughts?

If the new option seems reasonable, I can draft a patch to
implement that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster

2014-02-14 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 What is surprising is that the postmaster doesn't set up its own
 process group when it is running as a daemon.  We probably don't
 want to change that when postgres is run directly from a command
 line for development or diagnostic purposes, but Noah suggested
 perhaps we should add a --daemonize option which pg_ctl should use
 when launching the postmaster, which would cause it to create its
 own session group.

We intentionally removed the daemonization support that used to
be there; see commit f7ea6beaf4ca02b8e6dc576255e35a5b86035cb9.
One of the things it did was exactly this.  I'm a bit disinclined
to put that back.

If this is, as it sounds to be, a Solaris shell bug, doesn't it
affect other daemons too?

regards, tom lane




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread Greg Stark
On Fri, Feb 14, 2014 at 9:16 PM, David Beck db...@starschema.net wrote:
 Another point I liked in mysql is the possibility to write info schema 
 plugins: 
 http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
 Like a virtual catalog. Is there anything similar in Postgres?

The documentation you linked to describes how to provide
information_schema plugins but not why you would want to do such a
thing. I'm not seeing why this would be useful. The information_schema
schema is described by the standard so creating new views in it isn't
needed very often and the schema for the existing views doesn't change
very often. I can see why a plugin might want to add rows to the views
but that doesn't seem to be what this feature is about.


-- 
greg




Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-02-14 Thread Haribabu Kommi
On Feb 15, 2014 9:19 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Haribabu Kommi escribió:

  I changed the balance cost calculations a little bit to give priority to
  the user provided per table autovacuum parameters.
  If any user specified per table vacuum parameters exists and those are
  different with guc vacuum parameters then the
  balance cost calculations will not include that worker in calculation.
Only
  the cost is distributed between other workers
  with specified guc vacuum cost parameter.
 
  The problem in this calculation is if the user provides same guc values
to
  the per table values also then it doesn't consider them in calculation.

 I think this is a strange approach to the problem, because if you
 configure the backends just so, they are completely ignored instead of
 being adjusted.  And this would have action-at-a-distance consequences
 because if you change the defaults in postgresql.conf you end up with
 completely different behavior on the tables for which you have carefully
 tuned the delay so that they are ignored in rebalance calculations.

 I think that rather than ignoring some backends completely, we should be
 looking at how to weight the balancing calculations among all the
 backends in some smart way that doesn't mean they end up with the
 default values of limit, which AFAIU is what happens now -- which is
 stupid.  Not real sure how to do that, perhaps base it on the
 globally-configured ratio of delay/limit vs. the table-specific ratio.

 What I mean is that perhaps the current approach is all wrong and we
 need to find a better algorithm to suit this case and more generally.
 Of course, I don't mean to say that it should behave completely
 differently than now in normal cases, only that it shouldn't give
 completely stupid results in corner cases such as this one.

 As an example, suppose that global limit=200 and global delay=20 (the
 defaults).  Then we have a global ratio of 10.  If all three tables being
 vacuumed currently are using the default values, then they all have
 ratio=10 and therefore all should have the same limit and delay settings
 applied after rebalance.  Now, if two tables have ratio=10 and one table
 has been configured to have a very fast vacuum, that is limit=10000,
 then the ratio for that table is 10000/20=500.  Therefore that table should
 be configured, after rebalance, to have a limit and delay that are 50
 times faster than the settings for the other two tables.  (And there is
 a further constraint that the total delay per limit unit should be
 so-and-so to accommodate getting the correct total delay per limit unit.)

 I haven't thought about how to code that, but I don't think it should be
 too difficult.  Want to give it a try?  I think it makes sense to modify
 both the running delay and the running limit to achieve whatever ratio
 we come up with, except that delay should probably not go below 10ms
 because, apparently, some platforms have that much of sleep granularity
 and it wouldn't really work to have a smaller delay.

 Am I making sense?

Yes, that makes sense, and adjusting the delay parameter rather than leaving
it as-is is a good approach too. Thanks, I will give it a try.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Fri, Feb 14, 2014 at 9:16 PM, David Beck db...@starschema.net wrote:
 Another point I liked in mysql is the possibility to write info schema 
 plugins: 
 http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
 Like a virtual catalog. Is there anything similar in Postgres?

 The documentation you linked to describes how to provide
 information_schema plugins but not why you would want to do such a
 thing. I'm not seeing why this would be useful. The information_schema
 schema is described by the standard so creating new views in it isn't
 needed very often and the schema for the existing views doesn't change
 very often.

IIRC, mysql has a liberal view about what they can do to the
information_schema, so for them this isn't as insane as it sounds to us.

regards, tom lane




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread Greg Stark
On Fri, Feb 14, 2014 at 9:16 PM, David Beck db...@starschema.net wrote:
 What inspired me is the scriptable query rewrite in 
 http://dev.mysql.com/downloads/mysql-proxy/
 The hook I proposed would be a lot nicer in Postgres because the raw parsing 
 is already done at this point while in mysql-proxy that has to be done 
 manually.

There are a few similar things in the Postgres world such as pgbouncer
and plproxy but as you note they are limited in how powerful they can
be by the complexity of parsing and analyzing SQL.

I think Postgres tends to take a different tack regarding
extensibility than MySQL. Rather than have one system then hooks to
allow external code to modify or replace it, in Postgres modules
usually are treated similarly to the internal code. This is a pretty
broad generalization and there are certainly exceptions. But for
example new data types, functions, even whole new index types are all
treated almost identically to the internal data types, functions, and
index types. The planner then considers them all more or less on equal
basis.

Obviously there are limits. Btree indexes are kind of special because
they represent Postgres's basic concept of ordering. And we don't have
a pluggable recovery system which makes any externally provided index
types non-crash-safe. And we don't have a generalized concept of table
storage -- the closest thing we have is FDWs which is more like
MySQL's style of extensibility where the extension is a special case.

But this is why our instinct is that if you want to be able to push
down joins the way to do it is to extend the FDW api so that the
planner can make those decisions just like it makes the decisions
about internal joins rather than have an external module that takes
over part of the planner's work.

-- 
greg




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-14 Thread Andres Freund
On 2014-02-14 20:46:01 +, Greg Stark wrote:
 Going over this I think this is still a potential issue:
 
 On 31 Jan 2014 15:56, Andres Freund and...@2ndquadrant.com wrote:
 
 
  I am not sure that explains the issue, but I think the redo action for
  truncation is not safe across crashes.  A XLOG_SMGR_TRUNCATE will just
  do a smgrtruncate() (and then mdtruncate) which will iterate over the
  segments starting at 0 till mdnblocks()/segment_size and *truncate* but
  not delete individual segment files that are not needed anymore, right?
  If we crash in the midst of that a new mdtruncate() will be issued, but
  it will get a shorter value back from mdnblocks().
 
  Am I missing something?
 
 
 I'm not too familiar with md.c but my reading of the code is that we
 truncate the files in reverse order?

That's what I had assumed as well, but it doesn't look that way:

v = mdopen(reln, forknum, EXTENSION_FAIL);

priorblocks = 0; /* <- initial value */
while (v != NULL)
{
    MdfdVec    *ov = v;

    if (priorblocks > nblocks)
    {
        /* truncate entire segment */
    }
    else if (priorblocks + ((BlockNumber) RELSEG_SIZE) > nblocks)
    {
        /* truncate this segment partially */
    }
    else
    {
        /* nothing to do, all needed */
    }
    priorblocks += RELSEG_SIZE;
}

So, according to that we start at segment 0, right?
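To see the consequence concretely, here is a hypothetical stand-alone model of that loop. The function name and the simulation are invented for illustration (the real code walks MdfdVec segment files); it only demonstrates that, because priorblocks starts at 0 and only grows, segments are visited in ascending order starting at segment 0:

```c
#include <assert.h>

#define RELSEG_SIZE 131072          /* default blocks per segment file */

/* Hypothetical model of the mdtruncate() walk quoted above: return the
 * number of the first segment the loop would actually touch (truncate
 * fully or partially) for a target length of nblocks.  Segments are
 * always considered in ascending order, segment 0 first. */
static int first_touched_segment(unsigned long nblocks)
{
    unsigned long priorblocks = 0;
    int segno = 0;

    for (;;)
    {
        if (priorblocks > nblocks)
            return segno;                   /* whole segment dropped */
        if (priorblocks + RELSEG_SIZE > nblocks)
            return segno;                   /* partial truncation here */
        priorblocks += RELSEG_SIZE;         /* segment fully needed */
        segno++;
    }
}
```

So a crash mid-loop can leave low-numbered segments truncated while higher-numbered ones still exist, which is exactly the window being discussed.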

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-14 Thread Hiroshi Inoue
(2014/02/15 2:32), Tom Lane wrote:
 I wrote:
 Hiroshi Inoue in...@tpf.co.jp writes:
 One thing I'm wondering about is that plperl is linking perlxx.lib
 not libperlxx.a. I made a patch following plpython and it also
 works here.
 Is it worth trying?
 
 I hadn't noticed that part of plpython's Makefile before.  Man,
 that's an ugly technique :-(.  Still, there's little about this
 platform that isn't ugly.  Let's try it and see what happens.
 
 And what happens is this:
 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=narwhal&dt=2014-02-14%2017%3A00%3A02
 namely, it gets through plperl now and then chokes with the same
 symptoms on pltcl.  So I guess we need the same hack in pltcl.
 The fun never stops ...

There seems to be some risk in linking an MSVC import library.
Linking perlxx.lib or tcl.lib works here, but linking pythonxx.lib
doesn't work even on a newer machine.

 (BTW, narwhal is evidently not trying to build plpython.  I wonder
 why not?)

Due to an *initializer element is not constant* error, which can also be
seen on my old machine.
http://www.postgresql.org/message-id/CA+OCxozr=0wkQDF7kfd2n-bJQOwdSUs0Myohg29pA_U5=2p...@mail.gmail.com


regards,
Hiroshi Inoue






Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster

2014-02-14 Thread Greg Stark
On 14 Feb 2014 23:07, Tom Lane t...@sss.pgh.pa.us wrote:

 If this is, as it sounds to be, a Solaris shell bug, doesn't it
 affect other daemons too?

This is something I never exactly followed, but I think if the shell
doesn't support job control it's expected behaviour, not a bug. Only
shells that support job control create new process groups for every
backgrounded command.

I would have expected that if I run postgres myself it would be attached
to the terminal and die when I C-c it, but if it's started by pg_ctl I
would have thought it was running independently of my terminal and shell.


Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?

2014-02-14 Thread Michael Paquier
On Thu, Feb 6, 2014 at 11:26 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Here are updated patches to use pg_lsn instead of pglsn...
Should I register this patch somewhere to avoid having it lost in the void?
Regards,
-- 
Michael




Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-14 Thread Tom Lane
Hiroshi Inoue in...@tpf.co.jp writes:
 (2014/02/15 2:32), Tom Lane wrote:
 (BTW, narwhal is evidently not trying to build plpython.  I wonder
 why not?)

 Due to an *initializer element is not constant* error, which can also be
  seen on my old machine.

Hmm, isn't that one of the symptoms of inadequacies in
--enable-auto-import?  Wonder if the disable change fixed it.

regards, tom lane




[HACKERS] Re: [BUGS] BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding

2014-02-14 Thread digoal
Hi,
   Thanks very much.
We now use dblink or foreign tables to migrate data instead of pg_dump,
which resolves the bad-data load problem.

--
Public service is a lifelong endeavor. I'm Digoal, Just Do It.




At 2014-02-14 04:49:08, Tom Lane t...@sss.pgh.pa.us wrote:
dig...@126.com writes:
 select t, t::bytea from convert_from('\xeec1', 'sql_ascii') as g(t);
 [ fails to check that string is valid in database encoding ]

Hm, yeah.  Normal input to the database goes through pg_any_to_server(),
which will apply a validation step if the source encoding is SQL_ASCII
and the destination encoding is something else.  However, pg_convert and
some other places call pg_do_encoding_conversion() directly, and that
function will just quietly do nothing if either encoding is SQL_ASCII.

The minimum-refactoring solution to this would be to tweak
pg_do_encoding_conversion() so that if the src_encoding is SQL_ASCII but
the dest_encoding isn't, it does pg_verify_mbstr() rather than nothing.

I'm not sure if this would break anything we need to have work,
though.  Thoughts?  Do we want to back-patch such a change?
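As a rough illustration of the minimum-refactoring idea, here is a hypothetical stand-alone sketch. The encoding IDs, function names, and the simplified validator are all invented for the example (the real code would call pg_verify_mbstr() inside pg_do_encoding_conversion()); the validator below accepts only ASCII plus complete 2- and 3-byte UTF-8 sequences, which is enough to catch the '\xeec1' string from the report:

```c
#include <assert.h>

#define PG_SQL_ASCII 0      /* illustrative encoding IDs, not pg_wchar.h */
#define PG_UTF8 6

/* Simplified stand-in for pg_verify_mbstr(): accepts ASCII plus complete
 * 2- and 3-byte UTF-8 sequences; anything else is rejected. */
static int verify_utf8(const unsigned char *s, int len)
{
    int i = 0;

    while (i < len)
    {
        unsigned char c = s[i];

        if (c < 0x80)
            i += 1;
        else if ((c & 0xE0) == 0xC0 && i + 1 < len
                 && (s[i + 1] & 0xC0) == 0x80)
            i += 2;
        else if ((c & 0xF0) == 0xE0 && i + 2 < len
                 && (s[i + 1] & 0xC0) == 0x80
                 && (s[i + 2] & 0xC0) == 0x80)
            i += 3;
        else
            return 0;       /* invalid or truncated sequence */
    }
    return 1;
}

/* Sketch of the proposed guard: when the source claims SQL_ASCII but the
 * destination does not, validate the bytes instead of quietly doing
 * nothing.  Returns 1 if the input is acceptable. */
static int convert_sketch(const unsigned char *s, int len,
                          int src_encoding, int dest_encoding)
{
    if (src_encoding == PG_SQL_ASCII && dest_encoding != PG_SQL_ASCII)
        return verify_utf8(s, len);  /* real code: pg_verify_mbstr() */
    return 1;                        /* actual conversion elided */
}
```

With this guard, the '\xeec1' input would be rejected on the way into a UTF8 database instead of being stored unvalidated.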

   regards, tom lane


Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 19:21 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-14 18:49:33 +0100, Florian Pflug wrote:
 Well, the assumption isn't all that new. We already have the situation that
 a PGPROC may be not on any wait queue, yet its lwWaitLink may be non-NULL.
 Currently, the process who took it off the queue is responsible for 
 rectifying
 that eventually, with the changed proposed below it'll be the backend who
 owns the PGPROC. From the point of view of other backends, nothing really
 changes.
 
 That window is absolutely tiny tho.

True, but it's there, so if anything counts on that never being the case, it's
still already broken.

 b) resetting lwWaitLink introduces a race condition (however unlikely)
 
 we'll trade correctness for cleanliness if we continue to reset lwWaitLink
 without protecting against the race. That's a bit of a weird trade-off to 
 make.
 
 It's not just cleanliness, it's being able to actually debug crashes.
 
 We could still arrange for the stray lwWaitLink from being visible only
 momentarily. If a backend is woken up and detects that lwWaiting is false,
 it knows that it cannot be on any wait queue - it was just removed, and
 hasn't added itself again yet. At that point, it's safe to reset lwWaitLink
 back to NULL. The stray value would thus only be visible while a backend 
 executes
 the PGSemaphoreLock() loop, and whether or not this is the case can be 
 deduced
 from a stack trace. So I'm not convinced that this makes debugging any 
 harder.
 
 That's not actually safe on an out of order architecture afaics. Without
 barriers the store to lwWaitLink in the woken up backend can preempt the
 read for the next element in the PGSemaphoreUnlock() loop.

Hm, true. I guess we could prevent that by introducing an artificial dependency
between the read and the write - something like

  proc = head;
  head = proc->lwWaitLink;
  /*
   * We don't ever expect to actually PANIC here, but the test forces the
   * load to execute before the store to lwWaiting. This prevents a race
   * between reading lwWaitLink here and resetting it back to zero in the
   * awoken backend (Note that backends can wake up spuriously, so just
   * reading it before doing PGSemaphoreUnlock is insufficient)
   */
  if (head != MyProc)
    proc->lwWaiting = false;
  else
    elog(PANIC, ...)
  PGSemaphoreUnlock(&proc->sem);

(This assumes that proc is a volatile pointer)

Another idea would be to do as you suggest and only mark the PGPROC
pointers volatile, but to additionally add a check for queue corruption
somewhere. We should be able to detect that - if we ever hit this issue,
LWLockRelease should find a PGPROC while traversing the queue whose
lwWaitLink is NULL but which isn't equal to lock->tail. If that ever
happens, we'd simply PANIC.
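That corruption check can be sketched stand-alone. The structs and function below are invented stand-ins (the real PGPROC and LWLock live in the backend, and the failure path would be elog(PANIC, ...) rather than a return code); the sketch only shows the detection rule: a NULL lwWaitLink on a proc that is not the recorded tail means the queue was clobbered.

```c
#include <stddef.h>

/* Minimal stand-ins for the relevant PGPROC / LWLock wait-queue fields;
 * field names mirror the real ones but the structs are illustrative. */
typedef struct Proc
{
    struct Proc *lwWaitLink;
} Proc;

typedef struct
{
    Proc *head;
    Proc *tail;
} WaitQueue;

/* Traverse the wait queue and return its length, or -1 on the corruption
 * described above: a proc whose lwWaitLink is NULL but which is not the
 * queue's tail (where the real code would PANIC). */
static int check_wait_queue(WaitQueue *q)
{
    int n = 0;
    Proc *p;

    for (p = q->head; p != NULL; p = p->lwWaitLink)
    {
        n++;
        if (p->lwWaitLink == NULL && p != q->tail)
            return -1;      /* stray NULL link mid-queue: corrupted */
    }
    return n;
}
```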

best regards,
Florian Pflug







Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-14 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-02-14 20:46:01 +0000, Greg Stark wrote:
 Going over this I think this is still a potential issue:
 On 31 Jan 2014 15:56, Andres Freund and...@2ndquadrant.com wrote:
 I am not sure that explains the issue, but I think the redo action for
 truncation is not safe across crashes.  A XLOG_SMGR_TRUNCATE will just
 do a smgrtruncate() (and then mdtruncate) which will iterate over the
 segments starting at 0 till mdnblocks()/segment_size and *truncate* but
 not delete individual segment files that are not needed anymore, right?
 If we crash in the midst of that a new mdtruncate() will be issued, but
 it will get a shorter value back from mdnblocks().

 I'm not too familiar with md.c but my reading of the code is that we
 truncate the files in reverse order?

 That's what I had assumed as well, but it doesn't look that way:

No, it's deleting forward.

We could probably fix things so it deleted backwards; it'd be a tad
tedious because the list structure isn't organized that way, but we
could do it.  Not sure if that's good enough though.  If you don't
want to assume the filesystem metadata is coherent after a crash,
we might have nonzero-size segments after zero-size ones, even if
the truncate calls had been issued in the right order.
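The backwards-deletion idea can be sketched in isolation. This is a hypothetical model, not md.c: seg_blocks[] stands in for per-segment file sizes, and the assignment stands in for ftruncate()/unlink(). It shows the intended property that a crash partway through leaves the surviving low-numbered segments untouched (with Tom's caveat above that incoherent filesystem metadata after a crash can still defeat this):

```c
/* Hypothetical sketch of deleting segment files in reverse order:
 * the highest-numbered segment is dropped first, so at every point the
 * remaining segments form a contiguous prefix of the relation. */
static void truncate_backwards(int *seg_blocks, int nsegs, int keep)
{
    int segno;

    for (segno = nsegs - 1; segno >= keep; segno--)
        seg_blocks[segno] = 0;  /* real code: ftruncate() + unlink() */
}
```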

Another possibility is to keep opening and truncating files until
we don't find the next segment in sequence, looking directly at the
filesystem not at the mdfd chain.  I don't think this would be
appropriate in normal operation, but we could do it if InRecovery
(and maybe even only if we don't think the database is consistent?)

regards, tom lane




[HACKERS] Small psql memory fix

2014-02-14 Thread Bruce Momjian
The attached tiny patch fixes a small leak in psql's \gset command and
simplifies memory freeing in two places.
 
-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
new file mode 100644
index 764534a..e80528d
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
*** exec_command(const char *cmd,
*** 746,753 
  		{
  			expand_tilde(fname);
  			pset.gfname = pg_strdup(fname);
  		}
- 		free(fname);
  		status = PSQL_CMD_SEND;
  	}
  
--- 746,753 
  		{
  			expand_tilde(fname);
  			pset.gfname = pg_strdup(fname);
+ 			free(fname);
  		}
  		status = PSQL_CMD_SEND;
  	}
  
*** exec_command(const char *cmd,
*** 757,762 
--- 757,764 
  		char	   *prefix = psql_scan_slash_option(scan_state,
  	OT_NORMAL, NULL, false);
  
+ 		if (pset.gset_prefix)
+ 			free(pset.gset_prefix);
  		if (prefix)
  			pset.gset_prefix = prefix;
  		else
*** exec_command(const char *cmd,
*** 1152,1159 
  success = false;
  			}
  			free(newval);
  		}
- 		free(opt0);
  	}
  
  
--- 1154,1161 
  success = false;
  			}
  			free(newval);
+ 			free(opt0);
  		}
  	}
  
  



Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication

2014-02-14 Thread Robert Haas
On Wed, Feb 12, 2014 at 8:00 AM, Christian Kruse
christ...@2ndquadrant.com wrote:
 On Wednesday 12 February 2014 11:14:56 Andres Freund wrote:
 But they do take up shared memory without really needing to. I
 personally don't find that too bad, it's not much memory. If we want to
 avoid it we could have a LocalPgBackendStatus that includes the normal
 PgBackendStatus. Since pgstat_read_current_status() copies all the data
 locally, that'd be a sensible point to fill it. While that will cause a
 bit of churn, I'd guess we can use the infrastructure in the not too far
 away future for other parts.

 That's a good idea. Attached you will find a patch implementing it
 that way; is this how you pictured it?

 Although I'm not sure if this shouldn't be done in two patches, one
 for the changes needed for LocalPgBackendStatus and one for the
 xid/xmin changes.

Well, this version of the patch reveals a mighty interesting point: a
lot of the people who are calling pgstat_fetch_stat_beentry() don't
need this additional information and might prefer not to pay the cost
of fetching it.  None of pg_stat_get_backend_pid,
pg_stat_get_backend_dbid, pg_stat_get_backend_userid,
pg_stat_get_backend_activity, pg_stat_get_backend_activity,
pg_stat_get_backend_waiting, pg_stat_get_backend_activity_start,
pg_stat_get_backend_xact_start, pg_stat_get_backend_start,
pg_stat_get_backend_client_addr, pg_stat_get_backend_client_port,
pg_stat_get_backend_client_port, and pg_stat_get_db_numbackends
actually need this new information; it's only ever used in one place.
So it seems like it might be wise to have pgstat_fetch_stat_beentry
continue to return the PgBackendStatus * and add a new function
pgstat_fetch_stat_local_beentry to fetch the LocalPgBackendStatus *;
then most of these call sites wouldn't need to change.

It would still be the case that pgstat_read_current_status() pays the
price of fetching this information even if pg_stat_get_activity is
never called.  But since that's probably by far the most commonly-used
API for this information, that's probably OK.
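The suggested split can be sketched with invented stand-ins (the real structs live in pgstat.h and the table is filled by pgstat_read_current_status(); names here are abbreviated and the array is a placeholder). The point is that the existing accessor keeps its return type by delegating to the new one, so callers that only need the shared fields compile unchanged:

```c
/* Illustrative stand-ins: LocalPgBackendStatus embeds the shared struct
 * plus the locally-gathered xid/xmin, as discussed upthread. */
typedef struct
{
    int st_procpid;
} PgBackendStatus;

typedef struct
{
    PgBackendStatus backendStatus;  /* shared-memory snapshot, embedded */
    unsigned int backend_xid;       /* extra locally-gathered fields */
    unsigned int backend_xmin;
} LocalPgBackendStatus;

static LocalPgBackendStatus localTable[8];   /* filled elsewhere */

/* New accessor: returns the extended, locally-enriched entry. */
static LocalPgBackendStatus *fetch_stat_local_beentry(int beid)
{
    return &localTable[beid - 1];            /* beid is 1-based */
}

/* Existing accessor keeps its signature, so the pg_stat_get_backend_*
 * callers that only need the shared fields need no change. */
static PgBackendStatus *fetch_stat_beentry(int beid)
{
    return &fetch_stat_local_beentry(beid)->backendStatus;
}
```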

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Small psql memory fix

2014-02-14 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 The attached tiny patch fixes a small leak in psql's \gset command and
 simplifies memory freeing in two places.

The first and third hunks look to me like they introduce memory
leaks, not fix them.  The second hunk is probably safe enough,
although I'm not sure the case can actually occur --- gset should
free the prefix before any new backslash command can be executed.

regards, tom lane




Re: [HACKERS] Small psql memory fix

2014-02-14 Thread Bruce Momjian
On Fri, Feb 14, 2014 at 11:28:49PM -0500, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  The attached tiny patch fixes a small leak in psql's \gset command and
  simplifies memory freeing in two places.
 
 The first and third hunks look to me like they introduce memory
 leaks, not fix them.  The second hunk is probably safe enough,

The first and third just move the free into blocks where we have already
tested the value is not null.  It just more clearly matches the
surrounding code.  Do you see something different?

 although I'm not sure the case can actually occur --- gset should
 free the prefix before any new backslash command can be executed.

Oh, interesting idea.  Updated patch attached.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
new file mode 100644
index 764534a..8225159
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
*** exec_command(const char *cmd,
*** 746,762 
  		{
  			expand_tilde(fname);
  			pset.gfname = pg_strdup(fname);
  		}
- 		free(fname);
  		status = PSQL_CMD_SEND;
  	}
  
  	/* \gset [prefix] -- send query and store result into variables */
  	else if (strcmp(cmd, "gset") == 0)
  	{
! 		char	   *prefix = psql_scan_slash_option(scan_state,
! 	OT_NORMAL, NULL, false);
  
  		if (prefix)
  			pset.gset_prefix = prefix;
  		else
--- 746,766 
  		{
  			expand_tilde(fname);
  			pset.gfname = pg_strdup(fname);
+ 			free(fname);
  		}
  		status = PSQL_CMD_SEND;
  	}
  
  	/* \gset [prefix] -- send query and store result into variables */
  	else if (strcmp(cmd, "gset") == 0)
  	{
! 		char	   *prefix;
! 
! 		if (pset.gset_prefix)
! 			free(pset.gset_prefix);
  
+ 		prefix = psql_scan_slash_option(scan_state,
+ 		OT_NORMAL, NULL, false);
  		if (prefix)
  			pset.gset_prefix = prefix;
  		else
*** exec_command(const char *cmd,
*** 1152,1159 
  success = false;
  			}
  			free(newval);
  		}
- 		free(opt0);
  	}
  
  
--- 1156,1163 
  success = false;
  			}
  			free(newval);
+ 			free(opt0);
  		}
  	}
  
  



Re: [HACKERS] Small psql memory fix

2014-02-14 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Fri, Feb 14, 2014 at 11:28:49PM -0500, Tom Lane wrote:
 The first and third hunks look to me like they introduce memory
 leaks, not fix them.  The second hunk is probably safe enough,

 The first and third just move the free into blocks where we have already
 tested the value is not null.  It just more clearly matches the
 surrounding code.  Do you see something different?

[ looks closer... ]  OK, they're not actually broken, but I question
whether they're improvements.  The existing coding is clear enough
that the result of psql_scan_slash_option will be freed.  What you're
doing is conflating that requirement with some other processing.

 although I'm not sure the case can actually occur --- gset should
 free the prefix before any new backslash command can be executed.

 Oh, interesting idea.  Updated patch attached.

I don't think that's an improvement at all.  My point was that I
didn't think gset_prefix would ever be set at entry to this code,
because the setting should never survive for more than one backslash
command execution cycle.

If you want to add probably-useless code to free it, go ahead, but
this isn't a clearer way to do that than your previous version.

regards, tom lane




Re: [HACKERS] Small psql memory fix

2014-02-14 Thread Bruce Momjian
On Fri, Feb 14, 2014 at 11:52:42PM -0500, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Fri, Feb 14, 2014 at 11:28:49PM -0500, Tom Lane wrote:
  The first and third hunks look to me like they introduce memory
  leaks, not fix them.  The second hunk is probably safe enough,
 
  The first and third just move the free into blocks where we have already
  tested the value is not null.  It just more clearly matches the
  surrounding code.  Do you see something different?
 
 [ looks closer... ]  OK, they're not actually broken, but I question
 whether they're improvements.  The existing coding is clear enough
 that the result of psql_scan_slash_option will be freed.  What you're
 doing is conflating that requirement with some other processing.
 
  although I'm not sure the case can actually occur --- gset should
  free the prefix before any new backslash command can be executed.
 
  Oh, interesting idea.  Updated patch attached.
 
 I don't think that's an improvement at all.  My point was that I
 didn't think gset_prefix would ever be set at entry to this code,
 because the setting should never survive for more than one backslash
 command execution cycle.
 
 If you want to add probably-useless code to free it, go ahead, but
 this isn't a clearer way to do that than your previous version.

If you think this code isn't helping, I will just discard it.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2014-02-14 Thread Amit Kapila
On Fri, Feb 14, 2014 at 8:27 AM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jan  7, 2014 at 12:33:33PM +0530, Amit Kapila wrote:
 On Thu, Oct 31, 2013 at 8:58 AM, Amit Kapila amit.kapil...@gmail.com wrote:
  On Wed, Oct 16, 2013 at 1:44 AM, Amit Kapila amit.kapil...@gmail.com 
  wrote:
  On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander mag...@hagander.net 
  wrote:
  I agree we'll probably want to work around it in the end, but I still
  think it should be put to Microsoft PSS if we can. The usual - have we
  actually produced a self-contained example that does just this (and
  doesn't include the full postgres support) and submitted it to
  *microsoft* for comments?

 Further update on this issue:

 Microsoft has suggested a workaround for stat API. Their suggestion
 is to use 'GetFileAttributesEx' instead of stat, when I tried their
 suggestion, it also gives me same problem as stat.

 Still they have not told anything about other API's
 (rmdir, RemoveDirectory) which has same problem.

 Where are we on this?

So far we have not received from Microsoft any workaround that fixes
this problem. From the discussion over the support ticket with them,
it seems the problem is in their kernel; changing the code for it
might not be straightforward for them, nor do they have any clear
alternative.

 Is there a check we should add in our code?

We can possibly solve this problem in one of the below ways:

1. Resolve the symbolic link to the actual path in code whenever we try
to access it.

2. Add a check in code (initdb and CREATE TABLESPACE) to disallow paths
longer than ~120 characters on Windows.

Approach-1 has the benefit that it can support the actual MAX_PATH, and
even if MS doesn't resolve the problem, PostgreSQL will not face it.

Approach-2 is straightforward to implement. If we go with Approach-2,
we can raise the Windows path limit in the future whenever there is
a fix for it.
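Approach-2 amounts to a simple length guard at tablespace-creation time. The sketch below is hypothetical: the constant name and function are invented, and the ~120 figure is the rough margin from this discussion, not an official Windows constant.

```c
#include <string.h>

/* Hypothetical guard for Approach-2: refuse tablespace locations long
 * enough to push symlinked relation file paths past the problematic
 * Windows limit.  120 is the approximate margin mentioned above. */
#define MAX_SAFE_LOCATION_LEN 120

static int tablespace_location_ok(const char *location)
{
    return strlen(location) <= MAX_SAFE_LOCATION_LEN;
}
```

CREATE TABLESPACE (and initdb, for the data directory) would call this and raise an error instead of creating a path that later hangs in stat()/rmdir().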


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

