Re: [HACKERS] SCRAM in the PG 10 release notes

2017-09-13 Thread Heikki Linnakangas

On 09/12/2017 04:09 AM, Noah Misch wrote:

On Wed, May 10, 2017 at 10:50:51PM -0400, Bruce Momjian wrote:

On Mon, May  1, 2017 at 08:12:51AM -0400, Robert Haas wrote:

On Tue, Apr 25, 2017 at 10:16 PM, Bruce Momjian  wrote:

Well, we could add "MD5 users are encouraged to switch to
SCRAM-SHA-256".  The question is whether we want to list this as something on
the SCRAM-SHA-256 description, mention it as an incompatibility, or put it
under Migration.  I am not clear that MD5 is in such terrible shape that
this is warranted.


I think it's warranted.  The continuing use of MD5 has been a headache
for some EnterpriseDB customers who have compliance requirements which
they must meet.  It's not that they themselves necessarily know or
care whether MD5 is secure, although in some cases they do; it's that
if they use it, they will be breaking laws or regulations to which
their business or agency is subject.  I imagine customers of other
PostgreSQL companies have similar issues.  But leaving that aside, the
advantage of SCRAM isn't merely that it uses a better algorithm to
hash the password.  It has other advantages also, like not being
vulnerable to replay attacks.  If you're doing password
authentication, you should really be using SCRAM, and encouraging
people to move to SCRAM after upgrading is a good idea.

That having been said, SCRAM is a wire protocol break.  You will not
be able to upgrade to SCRAM unless and until the drivers you use to
connect to the database add support for it.  The only such driver
that's part of core is libpq; other drivers that have reimplemented the
PostgreSQL wire protocol will have to be updated with SCRAM support
before it will be possible to use SCRAM with those drivers.  I think
this should be mentioned in the release notes, too.  I also think it
would be great if somebody would put together a wiki page listing all
the popular drivers and (1) whether they use libpq or reimplement the
wire protocol, and (2) if the latter, the status of any efforts to
implement SCRAM, and (3) if those efforts have been completed, the
version from which they support SCRAM.  Then, I think we should reach
out to the maintainers of those drivers that aren't
moving to support SCRAM and encourage them to do so.


I have added this as an open item because we will have to wait to see
where we are with driver support as the release gets closer.


With the release near, I'm promoting this to the regular open issues section.


Thanks.

I updated the list of drivers on the wiki 
(https://wiki.postgresql.org/wiki/List_of_drivers), adding a column for 
whether the driver supports SCRAM authentication. Currently, the only 
non-libpq driver that has implemented SCRAM is the JDBC driver. I 
submitted a patch for the Go driver, but it hasn't been committed yet.


As for a recommendation in the release notes, maybe something like 
"Installations using MD5 authentication are encouraged to switch to 
SCRAM-SHA-256, unless using older client programs or drivers that don't 
support it yet."


- Heikki




[HACKERS] Binary search in fmgr_isbuiltin() is a bottleneck.

2017-09-13 Thread Andres Freund
Hi,

Surprising myself I discovered that in workloads that do a large number
of fmgr_info* lookups, fmgr_isbuiltin() is actually quite the
bottleneck.

In my development build we have 2765 builtin functions, stored in an 88KB
array. Apparently the ~12 binary-search steps are quite noticeable. Generally
binary searches have quite a poor memory access pattern...

In the workload I'm playing around with right now (producing this wall of
performance fixes), the main source of lookups is
printtup_prepare_info(), which does an fmgr_info() for every column. If you
have a large number of columns ...

I think we could conceivably deduplicate the output functions for
printtup to one FmgrInfo per type? I'd assume that it'd be good for
things besides reducing the overhead - allowing the respective function
to reuse fn_extra might be quite good.  I've not implemented that idea
at this point; I'm not 100% sure what the best way to do so is without also
causing slowdowns.

Another idea would be to have an array of FmgrBuiltin * that we index by
oid. That wouldn't be super small though, given that the space of function
oids is sparse.

What I've instead done is replace the binary search in
fmgr_isbuiltin() with a simplehash.h-style hashtable. After that the
lookup is still visible in the profile, but far less prominent.
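
To give an idea of the shape (a hypothetical sketch only - the names
fmgrhash and FmgrBuiltinHashEntry are made up here and are not necessarily
what the attached patch uses), the table can be declared with the same
simplehash.h macro pattern as the pagetable definition in the tidbitmap.c
hunk further down:

/* needs "utils/fmgrtab.h" for FmgrBuiltin and "utils/hashutils.h" for murmurhash32() */
typedef struct FmgrBuiltinHashEntry
{
	Oid			foid;			/* builtin function's oid, the hash key */
	const FmgrBuiltin *builtin;	/* points into fmgr_builtins[] */
	char		status;			/* entry status, required by simplehash.h */
} FmgrBuiltinHashEntry;

#define SH_PREFIX fmgrhash
#define SH_ELEMENT_TYPE FmgrBuiltinHashEntry
#define SH_KEY_TYPE Oid
#define SH_KEY foid
#define SH_HASH_KEY(tb, key) murmurhash32(key)
#define SH_EQUAL(tb, a, b) ((a) == (b))
#define SH_SCOPE static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

/*
 * On first use, insert one entry per fmgr_builtins[] element; after that,
 * fmgr_isbuiltin() is a single fmgrhash_lookup() by oid instead of a
 * binary search over the whole array.
 */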

I'd like to move the hash creation out of fmgr_isbuiltin (to avoid
having to check whether it's already created), but there's currently no
convenient place to create the hash from.   Now that we don't rely on
the sortedness of fmgrtab.c we could remove a few lines from
Gen_fmgrtab.pl, but I don't quite see the advantage. If we were
interested in a faster by-name lookup we could sort it by name, but
that'd be better solved by another hashtable...


I was wondering about also replacing the C function hash with a
simplehash, but given that I've not seen this in profiles, I've not
bothered so far.

Greetings,

Andres Freund
>From 2b3e06380d5a339efc94e748aa57985d3bb80223 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 13 Sep 2017 18:43:46 -0700
Subject: [PATCH 4/8] Add inline murmurhash32(int32) function.

The function already existed in tidbitmap.c but more users requiring
fast hashing of 32bit ints are coming up.
---
 src/backend/nodes/tidbitmap.c | 20 ++--
 src/include/utils/hashutils.h | 18 ++
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index c4e53adb0c..01d6bc5c11 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -45,6 +45,7 @@
 #include "nodes/tidbitmap.h"
 #include "storage/lwlock.h"
 #include "utils/dsa.h"
+#include "utils/hashutils.h"
 
 /*
  * The maximum number of tuples per page is not large (typically 256 with
@@ -237,30 +238,13 @@ static int	tbm_comparator(const void *left, const void *right);
 static int tbm_shared_comparator(const void *left, const void *right,
 	  void *arg);
 
-/*
- * Simple inline murmur hash implementation for the exact width required, for
- * performance.
- */
-static inline uint32
-hash_blockno(BlockNumber b)
-{
-	uint32		h = b;
-
-	h ^= h >> 16;
-	h *= 0x85ebca6b;
-	h ^= h >> 13;
-	h *= 0xc2b2ae35;
-	h ^= h >> 16;
-	return h;
-}
-
 /* define hashtable mapping block numbers to PagetableEntry's */
 #define SH_USE_NONDEFAULT_ALLOCATOR
 #define SH_PREFIX pagetable
 #define SH_ELEMENT_TYPE PagetableEntry
 #define SH_KEY_TYPE BlockNumber
 #define SH_KEY blockno
-#define SH_HASH_KEY(tb, key) hash_blockno(key)
+#define SH_HASH_KEY(tb, key) murmurhash32(key)
 #define SH_EQUAL(tb, a, b) a == b
 #define SH_SCOPE static inline
 #define SH_DEFINE
diff --git a/src/include/utils/hashutils.h b/src/include/utils/hashutils.h
index 56b7bfc9cb..35281689e8 100644
--- a/src/include/utils/hashutils.h
+++ b/src/include/utils/hashutils.h
@@ -20,4 +20,22 @@ hash_combine(uint32 a, uint32 b)
 	return a;
 }
 
+
+/*
+ * Simple inline murmur hash implementation hashing a 32 bit integer, for
+ * performance.
+ */
+static inline uint32
+murmurhash32(uint32 data)
+{
+	uint32		h = data;
+
+	h ^= h >> 16;
+	h *= 0x85ebca6b;
+	h ^= h >> 13;
+	h *= 0xc2b2ae35;
+	h ^= h >> 16;
+	return h;
+}
+
 #endif			/* HASHUTILS_H */
-- 
2.14.1.536.g6867272d5b.dirty

>From 703ddd56fb484692c84101d1731e60f9ea81193f Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 13 Sep 2017 19:43:02 -0700
Subject: [PATCH 5/8] Replace binary search in fmgr_isbuiltin with hashtable.

Turns out we have enough functions that the binary search is quite
noticeable in profiles.

It'd be nice if there were a better place to build the hashtable than
the first pass through fmgr_isbuiltin() but there's currently none.
---
 src/backend/utils/fmgr/fmgr.c | 63 ---
 1 file changed, 47 insertions(+), 16 deletions(-)

diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index a7b07827e0..87867f2183 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/

Re: [HACKERS] WAL logging problem in 9.4.3?

2017-09-13 Thread Kyotaro HORIGUCHI
At Wed, 13 Sep 2017 17:42:39 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170913.174239.25978735.horiguchi.kyot...@lab.ntt.co.jp>
> filterdiff seems to did something wrong..

# to did...

The patch was broken by filterdiff, so I'm sending a new patch made
directly with git format-patch. I confirmed that a build completes
with this applied.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 7086b5855080065f73de4d099cbaab09511f01fc Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 12 Sep 2017 13:01:33 +0900
Subject: [PATCH] Fix WAL logging problem

---
 src/backend/access/heap/heapam.c| 113 +---
 src/backend/access/heap/pruneheap.c |   3 +-
 src/backend/access/heap/rewriteheap.c   |   4 +-
 src/backend/access/heap/visibilitymap.c |   3 +-
 src/backend/access/transam/xact.c   |   7 +
 src/backend/catalog/storage.c   | 318 +---
 src/backend/commands/copy.c |  13 +-
 src/backend/commands/createas.c |   9 +-
 src/backend/commands/matview.c  |   6 +-
 src/backend/commands/tablecmds.c|   8 +-
 src/backend/commands/vacuumlazy.c   |   6 +-
 src/backend/storage/buffer/bufmgr.c |  40 +++-
 src/backend/utils/cache/relcache.c  |  13 ++
 src/include/access/heapam.h |   8 +-
 src/include/catalog/storage.h   |   5 +-
 src/include/storage/bufmgr.h|   2 +
 src/include/utils/rel.h |   8 +
 17 files changed, 476 insertions(+), 90 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d20f038..e40254d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -34,6 +34,28 @@
  *	  the POSTGRES heap access method used for all POSTGRES
  *	  relations.
  *
+ * WAL CONSIDERATIONS
+ *	  All heap operations are normally WAL-logged, but there are a few
+ *	  exceptions. Temporary and unlogged relations never need to be
+ *	  WAL-logged, but we can also skip WAL-logging for a table that was
+ *	  created in the same transaction, if we don't need WAL for PITR or
+ *	  WAL archival purposes (i.e. if wal_level=minimal), and we fsync()
+ *	  the file to disk at COMMIT instead.
+ *
+ *	  The same-relation optimization is not employed automatically on all
+ *	  updates to a table that was created in the same transaction, because
+ *	  for a small number of changes, it's cheaper to just create the WAL
+ *	  records than fsyncing() the whole relation at COMMIT. It is only
+ *	  worthwhile for (presumably) large operations like COPY, CLUSTER,
+ *	  or VACUUM FULL. Use heap_register_sync() to initiate such an
+ *	  operation; it will cause any subsequent updates to the table to skip
+ *	  WAL-logging, if possible, and cause the heap to be synced to disk at
+ *	  COMMIT.
+ *
+ *	  To make that work, all modifications to heap must use
+ *	  HeapNeedsWAL() to check if WAL-logging is needed in this transaction
+ *	  for the given block.
+ *
  *-
  */
 #include "postgres.h"
@@ -56,6 +78,7 @@
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
+#include "catalog/storage.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -2373,12 +2396,6 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * The new tuple is stamped with current transaction ID and the specified
  * command ID.
  *
- * If the HEAP_INSERT_SKIP_WAL option is specified, the new tuple is not
- * logged in WAL, even for a non-temp relation.  Safe usage of this behavior
- * requires that we arrange that all new tuples go into new pages not
- * containing any tuples from other transactions, and that the relation gets
- * fsync'd before commit.  (See also heap_sync() comments)
- *
  * The HEAP_INSERT_SKIP_FSM option is passed directly to
  * RelationGetBufferForTuple, which see for more info.
  *
@@ -2409,6 +2426,7 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * TID where the tuple was stored.  But note that any toasting of fields
  * within the tuple data is NOT reflected into *tup.
  */
+extern HTAB *pendingSyncs;
 Oid
 heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 			int options, BulkInsertState bistate)
@@ -2482,7 +2500,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
-	if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))
+	if (BufferNeedsWAL(relation, buffer))
 	{
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
@@ -2681,12 +2699,10 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	int			ndone;
 	char	   *scratch = NULL;
 	Page		page;
-	bool		needwal;
 	Size		saveFreeSpace;
 	bool		need_tuple_data = RelationIsLogicallyLogged(relation);
 	bool		need_cids = RelationIsAccessibleInLogicalDecoding(relation);
 
-	needwal = !(options & HEAP_INSERT_SKIP_WAL) && 

[HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns

2017-09-13 Thread Andres Freund
Hi,

When running workloads that include fast queries with a lot of columns,
SendRowDescriptionMessage(), and the routines it calls, become a
bottleneck.  Besides syscache lookups (see [1] and [2]), a major cost of
that is manipulation of the StringInfo buffer and the version-specific
branches in the per-attribute loop.  As so often, the performance
differential of this patch gets bigger when the other performance
patches are applied.

The issues in SendRowDescriptionMessage() are the following:

1) All the pq_sendint calls, and also the pq_sendstring, are
   expensive. The amount of computation done for a single 2/4-byte
   addition to the stringbuffer (particularly enlargeStringInfo()) is
   problematic, as are the reallocations themselves.

   I addressed this by adding pq_send*_pre() wrappers, implemented as
   inline functions, that require that memory is pre-allocated.
   Combining that with doing an enlargeStringInfo() in
   SendRowDescriptionMessage() that pre-allocates the maximum required
   memory yields a pretty good speedup.

   I'm not yet super sure about the implementation. For one, I'm not
   sure this shouldn't instead be stringinfo.h functions, with very very
   tiny pqformat.h wrappers. But conversely I think it'd make a lot of
   sense for the pqformat integer functions to get rid of the
   continually maintained trailing null-byte - I was hoping the compiler
   could optimize that away, but alas, no luck.  As soon as a single
   integer is sent, you can't rely on 0 terminated strings anyway.

2) It creates a new StringInfo in every iteration. That causes
   noticeable memory management overhead, and resets the buffer to its
   initial size every time. When the description is relatively large, that
   leads to a number of reallocations for every
   SendRowDescriptionMessage() call.

   My solution here was to create a persistent StringInfo for
   SendRowDescriptionMessage() that never gets deallocated (just
   reset). That, in combination with new versions of
   pq_beginmessage/endmessage that keep the buffer alive, yields a nice
   speedup (see the usage sketch after this list).

   Currently I'm using a static variable to allocate a string buffer for
   the function. It'd probably be better to manage that outside of a single
   function - I'm also wondering why we're allocating a good number of
   stringinfos in various places, rather than reusing them. Most of them
   can't be entered recursively, and even if that's a concern, we could
   have one reusable per portal or such.

3) The v2/v3 branches in the attribute loop are noticeable (others too,
   but well...).  I solved that by splitting out the v2 and v3
   per-attribute loops into separate functions.  Imo also a good chunk
   more readable.
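
As a usage sketch of how 1) and 2) fit together (the buffer name, the
once-only initialization and the worst-case size estimate are mine for
illustration; pq_beginmessage_pre()/pq_endmessage_keep() are the new
functions from the attached patch):

static StringInfoData row_description_buf;	/* reused across calls, never freed */

static void
send_row_description_sketch(TupleDesc typeinfo)
{
	int			natts = typeinfo->natts;

	/* allocate the buffer once, in a long-lived context, then keep it */
	if (row_description_buf.data == NULL)
	{
		MemoryContext oldcontext = MemoryContextSwitchTo(TopMemoryContext);

		initStringInfo(&row_description_buf);
		MemoryContextSwitchTo(oldcontext);
	}

	/* reset the existing allocation and stash the message type */
	pq_beginmessage_pre(&row_description_buf, 'T');

	/* one worst-case enlargeStringInfo(), so the per-field sends can skip the check */
	enlargeStringInfo(&row_description_buf, 2 + natts * (NAMEDATALEN + 18));

	/*
	 * ... per-attribute pq_sendstring()/pq_sendint() calls go here, using
	 * the pre-allocating _pre variants described in 1) ...
	 */

	/* hand the message to the frontend, but keep the buffer for next time */
	pq_endmessage_keep(&row_description_buf);
}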

Comments?

Greetings,

Andres Freund

[1] 
http://archives.postgresql.org/message-id/ca+tgmobj72e_tg6w98h0oubccumoc4urmjocypbnwc2rxya...@mail.gmail.com
[2] 
http://archives.postgresql.org/message-id/20170914061207.zxotvyopetm7lrrp%40alap3.anarazel.de
>From 672cbfd0660e4d2b2cc6980f3f5c2af27e692404 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 13 Sep 2017 18:39:24 -0700
Subject: [PATCH 2/8] Add more efficient functions to pqformat API.

New inline functions allow to add data to a stringbuf in a more
efficient manner by pre-allocating ahead of time, and
pq_beginmessage_pre/pq_endmessage_keep allow reuse of a stringbuffer.
---
 src/backend/libpq/pqformat.c   | 37 +
 src/backend/utils/mb/mbutils.c | 11 --
 src/include/libpq/pqformat.h   | 47 ++
 src/include/mb/pg_wchar.h  | 11 ++
 4 files changed, 95 insertions(+), 11 deletions(-)

diff --git a/src/backend/libpq/pqformat.c b/src/backend/libpq/pqformat.c
index c8cf67c041..6e40ee087c 100644
--- a/src/backend/libpq/pqformat.c
+++ b/src/backend/libpq/pqformat.c
@@ -97,6 +97,28 @@ pq_beginmessage(StringInfo buf, char msgtype)
 	buf->cursor = msgtype;
 }
 
+/* 
+
+ *		pq_beginmessage_pre - initialize for sending a message, reuse buffer
+ *
+ * This requires the buffer to be allocated in a sufficiently long-lived
+ * memory context.
+ * 
+ */
+void
+pq_beginmessage_pre(StringInfo buf, char msgtype)
+{
+	resetStringInfo(buf);
+
+	/*
+	 * We stash the message type into the buffer's cursor field, expecting
+	 * that the pq_sendXXX routines won't touch it.  We could alternatively
+	 * make it the first byte of the buffer contents, but this seems easier.
+	 */
+	buf->cursor = msgtype;
+}
+
+
 /* 
  *		pq_sendbyte		- append a raw byte to a StringInfo buffer
  * 
@@ -350,6 +372,21 @@ pq_endmessage(StringInfo buf)
 	buf->data = NULL;
 }
 
+/* 
+ *		pq_endmessage_keep	- send the completed message to the frontend
+ *
+ * The data buffer is *not* freed, allowing reuse of the buffer with
+ * pq_beginmessage_pre.
+ 
+ */
+
+void
+p

[HACKERS] Improve catcache/syscache performance.

2017-09-13 Thread Andres Freund
Hi,

There are plenty of workloads where SearchSysCache()/SearchCatCache() shows
up as a major runtime factor, primarily workloads with very fast
queries.

A fair while ago, before I had my commit bit, I'd posted [1]. Looking at
the profiles/benchmarks I was easily able to confirm that it still
helps, but that there's also still a lot left on the table.

Attached is a patch that tries to improve sys/catcache performance,
going further than the patch referenced earlier.

This primarily includes four pieces:

1) Avoidance of FunctionCallInfo based function calls, replaced by
   more efficient functions with a native C argument interface.
2) Only initializing the ScanKey when necessary, i.e. on catcache misses,
   which reduces unnecessary CPU cache misses.
3) Allowing the compiler to specialize the critical SearchCatCache code for
   a specific number of attributes, which lets it unroll loops and avoid
   other nkeys-dependent initialization (see the sketch after this list).
4) Split of the heap lookup from the hash lookup, reducing stack
   allocations etc. in the common case.
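
As a minimal sketch of what 3) means (the names and signatures here are
illustrative, not lifted from the patch):

static inline HeapTuple
SearchCatCacheInternal(CatCache *cache, int nkeys,
					   Datum v1, Datum v2, Datum v3, Datum v4)
{
	/*
	 * The shared lookup logic (hash the keys, walk the bucket, fall back
	 * to the heap scan on a miss) would live here; elided in this sketch.
	 * Because every caller below passes a constant nkeys, the compiler can
	 * unroll the per-key loops and drop setup for the unused key slots.
	 */
	return NULL;
}

HeapTuple
SearchCatCache1(CatCache *cache, Datum v1)
{
	return SearchCatCacheInternal(cache, 1, v1, (Datum) 0, (Datum) 0, (Datum) 0);
}

HeapTuple
SearchCatCache2(CatCache *cache, Datum v1, Datum v2)
{
	return SearchCatCacheInternal(cache, 2, v1, v2, (Datum) 0, (Datum) 0);
}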

There's further potential:
- replace open coded hash with simplehash - the list walk right now
  shows up in profiles.
- As oid is the only system column supported, avoid the use of
  heap_getsysattr(), by adding an explicit branch for
  ObjectIdAttributeNumber. This shows up in profiles.
- move cache initialization out of the search path
- add more proper functions, rather than macros for
  SearchSysCacheCopyN etc., but right now they don't show up in profiles.

The reason the macro wrappers for syscache.c/h have to be changed,
rather than just catcache, is that doing otherwise would require
exposing the SysCache array to the outside.  That might be a good idea
anyway, but it's for another day.

This patch gives me a roughly 8% speedup in a workload that consists of
a fast query that returns a lot of columns.  If I apply a few
other performance patches, this patch itself starts to make a bigger
difference, of around 11%.

Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/20130905191323.gc490...@alap2.anarazel.de
>From 2b3e06380d5a339efc94e748aa57985d3bb80223 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 13 Sep 2017 18:43:46 -0700
Subject: [PATCH 4/8] Add inline murmurhash32(int32) function.

The function already existed in tidbitmap.c but more users requiring
fast hashing of 32bit ints are coming up.
---
 src/backend/nodes/tidbitmap.c | 20 ++--
 src/include/utils/hashutils.h | 18 ++
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index c4e53adb0c..01d6bc5c11 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -45,6 +45,7 @@
 #include "nodes/tidbitmap.h"
 #include "storage/lwlock.h"
 #include "utils/dsa.h"
+#include "utils/hashutils.h"
 
 /*
  * The maximum number of tuples per page is not large (typically 256 with
@@ -237,30 +238,13 @@ static int	tbm_comparator(const void *left, const void *right);
 static int tbm_shared_comparator(const void *left, const void *right,
 	  void *arg);
 
-/*
- * Simple inline murmur hash implementation for the exact width required, for
- * performance.
- */
-static inline uint32
-hash_blockno(BlockNumber b)
-{
-	uint32		h = b;
-
-	h ^= h >> 16;
-	h *= 0x85ebca6b;
-	h ^= h >> 13;
-	h *= 0xc2b2ae35;
-	h ^= h >> 16;
-	return h;
-}
-
 /* define hashtable mapping block numbers to PagetableEntry's */
 #define SH_USE_NONDEFAULT_ALLOCATOR
 #define SH_PREFIX pagetable
 #define SH_ELEMENT_TYPE PagetableEntry
 #define SH_KEY_TYPE BlockNumber
 #define SH_KEY blockno
-#define SH_HASH_KEY(tb, key) hash_blockno(key)
+#define SH_HASH_KEY(tb, key) murmurhash32(key)
 #define SH_EQUAL(tb, a, b) a == b
 #define SH_SCOPE static inline
 #define SH_DEFINE
diff --git a/src/include/utils/hashutils.h b/src/include/utils/hashutils.h
index 56b7bfc9cb..35281689e8 100644
--- a/src/include/utils/hashutils.h
+++ b/src/include/utils/hashutils.h
@@ -20,4 +20,22 @@ hash_combine(uint32 a, uint32 b)
 	return a;
 }
 
+
+/*
+ * Simple inline murmur hash implementation hashing a 32 bit integer, for
+ * performance.
+ */
+static inline uint32
+murmurhash32(uint32 data)
+{
+	uint32		h = data;
+
+	h ^= h >> 16;
+	h *= 0x85ebca6b;
+	h ^= h >> 13;
+	h *= 0xc2b2ae35;
+	h ^= h >> 16;
+	return h;
+}
+
 #endif			/* HASHUTILS_H */
-- 
2.14.1.536.g6867272d5b.dirty

>From 0feaa0c4a39b3e0e995cd5897cfd3ebba6a92c48 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 13 Sep 2017 19:58:43 -0700
Subject: [PATCH 6/8] Add pg_noinline macro to c.h.

Forcing a function not to be inlined can be useful if it's the
slow-path of a performance critical function, or should be visible in
profiles to allow for proper cost attribution.

Author: Andres Freund
Discussion: https://postgr.es/m/
---
 src/include/c.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/src/include/c.h b/src/include/c.h
index 630dfbfc41..087b1eac

Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-13 Thread Andres Freund
On 2017-09-12 00:19:48 -0700, Andres Freund wrote:
> Hi,
> 
> I've recently seen a benchmark in which pg_mbcliplen() showed up
> prominently. Which it will basically in any benchmark with longer query
> strings, but fast queries. That's not that uncommon.
> 
> I wonder if we could avoid the cost of pg_mbcliplen() from within
> pgstat_report_activity(), by moving some of the cost to the read
> side. pgstat values are obviously read far less frequently in nearly all
> cases that are performance relevant.
> 
> Therefore I wonder if we couldn't just store a querystring that's
> essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the
> read side.  I think that should work because all *server side* encodings
> store character lengths in the *first* byte of a multibyte character
> (at least one clientside encoding, gb18030, doesn't behave that way).
> 
> That'd necessitate an added memory copy in pg_stat_get_activity(), but
> that seems fairly harmless.
> 
> Faults in my thinking?

Here's a patch that implements that idea.  Seems to work well.  I'm a
bit loath to add proper regression tests for this; it seems awfully
dependent on specific track_activity_query_size settings.  I did confirm
using gdb that I see incomplete characters before
pgstat_clip_activity(), but not after.
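
For reference, the read-side correction amounts to roughly the following
(a minimal sketch assuming pg_mbcliplen()'s existing signature; it is not
the patch's pgstat_clip_activity() copied verbatim):

static char *
clip_activity_sketch(const char *raw_activity)
{
	int			rawlen = strlen(raw_activity);
	int			cliplen;

	/*
	 * The stored string may end in a partial multi-byte character, because
	 * pgstat_report_activity() now just memcpy()s a byte-limited prefix.
	 * pg_mbcliplen() never cuts into the middle of a character, so clipping
	 * to rawlen drops any trailing incomplete character.
	 */
	cliplen = pg_mbcliplen(raw_activity, rawlen, rawlen);

	return pnstrdup(raw_activity, cliplen);
}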

I've renamed st_activity to st_activity_raw to increase the likelihood
that potential external users of st_activity notice and adapt. That increases
the noise, but imo to a very bearable amount. I don't feel strongly about it,
though.

Greetings,

Andres Freund
>From 9c9503f0dfe1babb21e81c1955e996ad06c93608 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 13 Sep 2017 19:25:34 -0700
Subject: [PATCH 1/8] Speedup pgstat_report_activity by moving mb-aware
 truncation to read side.

Previously multi-byte aware truncation was done on every
pgstat_report_activity() call - proving to be a bottleneck for
workloads with long query strings that execute quickly.

Instead move the truncation to the read side, which is commonly
executed far less frequently. That's possible because all server
encodings allow to determine the length of a multi-byte string from
the first byte.

Author: Andres Freund
Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkvie...@alap3.anarazel.de
---
 src/backend/postmaster/pgstat.c | 63 -
 src/backend/utils/adt/pgstatfuncs.c | 17 +++---
 src/include/pgstat.h| 12 +--
 3 files changed, 72 insertions(+), 20 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index accf302cf7..ccb528e627 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2701,7 +2701,7 @@ CreateSharedBackendStatus(void)
 		buffer = BackendActivityBuffer;
 		for (i = 0; i < NumBackendStatSlots; i++)
 		{
-			BackendStatusArray[i].st_activity = buffer;
+			BackendStatusArray[i].st_activity_raw = buffer;
 			buffer += pgstat_track_activity_query_size;
 		}
 	}
@@ -2922,11 +2922,11 @@ pgstat_bestart(void)
 #endif
 	beentry->st_state = STATE_UNDEFINED;
 	beentry->st_appname[0] = '\0';
-	beentry->st_activity[0] = '\0';
+	beentry->st_activity_raw[0] = '\0';
 	/* Also make sure the last byte in each string area is always 0 */
 	beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
 	beentry->st_appname[NAMEDATALEN - 1] = '\0';
-	beentry->st_activity[pgstat_track_activity_query_size - 1] = '\0';
+	beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
 	beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
 	beentry->st_progress_command_target = InvalidOid;
 
@@ -3017,7 +3017,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			pgstat_increment_changecount_before(beentry);
 			beentry->st_state = STATE_DISABLED;
 			beentry->st_state_start_timestamp = 0;
-			beentry->st_activity[0] = '\0';
+			beentry->st_activity_raw[0] = '\0';
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
@@ -3034,8 +3034,12 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	start_timestamp = GetCurrentStatementStartTimestamp();
 	if (cmd_str != NULL)
 	{
-		len = pg_mbcliplen(cmd_str, strlen(cmd_str),
-		   pgstat_track_activity_query_size - 1);
+		/*
+		 * Compute length of to-be-stored string unaware of multi-byte
+		 * characters. For speed reasons that'll get corrected on read, rather
+		 * than computed every write.
+		 */
+		len = Min(strlen(cmd_str), pgstat_track_activity_query_size - 1);
 	}
 	current_timestamp = GetCurrentTimestamp();
 
@@ -3049,8 +3053,8 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 
 	if (cmd_str != NULL)
 	{
-		memcpy((char *) beentry->st_activity, cmd_str, len);
-		beentry->st_activity[len] = '\0';
+		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
+		beentry->st_activity_raw[len] = '\0';
 		beentry->st_activity_start_timestamp =

[HACKERS] pspg - Postgres Pager

2017-09-13 Thread Pavel Stehule
Hi

I am sorry for the spam - this is just info for a not-too-wide community (I did
a pre-release), so I am using this mailing list.

I wrote a new pager, designed primarily for use as a psql pager. It is tested
only on Linux, but it should work on any ncurses-ready platform. It is an
experiment, more or less - there are not many specialised pagers, so
there can be bugs and issues - but I hope it can be useful for somebody.

I invite any cooperation, comments, testing, patches, ...

https://github.com/okbob/pspg

Regards

Pavel


Re: [HACKERS] <> join selectivity estimate question

2017-09-13 Thread Ashutosh Bapat
On Thu, Sep 14, 2017 at 4:30 AM, Tom Lane  wrote:
> Thomas Munro  writes:
>> On Wed, Sep 6, 2017 at 11:14 PM, Ashutosh Bapat
>>  wrote:
>>> I added some "stable" tests to your patch taking inspiration from the
>>> test SQL file. I think those will be stable across machines and runs.
>>> Please let me know if those look good to you.
>
>> Hmm.  But they show actual rows, not plan->plan_rows, and although the
>> former is interesting as a sanity check the latter is the thing under
>> test here.  It seems like we don't have fine enough control of
>> EXPLAIN's output to show estimated rows but not cost.  I suppose we
>> could try to capture EXPLAIN's output somehow (plpgsql dynamic
>> execution or spool output from psql?) and then pull out just the row
>> estimates, maybe with extra rounding to cope with instability.
>
> Don't have time to think about the more general question right now,
> but as far as the testing goes, there's already precedent for filtering
> EXPLAIN output --- see explain_sq_limit() in subselect.sql.  But I'm
> dubious whether the rowcount estimate could be relied on to be perfectly
> machine-independent, even if you were hiding costs successfully.
>

Are you referring to rounding errors? We should probably add some fuzz
factor to cover the rounding errors, and cause a diff when the difference
between the expected and reported plan rows is beyond that fuzz factor.



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] <> join selectivity estimate question

2017-09-13 Thread Ashutosh Bapat
On Thu, Sep 14, 2017 at 4:19 AM, Thomas Munro
 wrote:
> On Wed, Sep 6, 2017 at 11:14 PM, Ashutosh Bapat
>  wrote:
>> On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro
>>  wrote:
>>> That just leaves the question of whether we should try to handle the
>>> empty RHS and single-value RHS cases using statistics.  My intuition
>>> is that we shouldn't, but I'll be happy to change my intuition and
>>> code that up if that is the feedback from planner gurus.
>>
>> Empty RHS can result from dummy relations also, which are produced by
>> constraint exclusion, so may be that's an interesting case. Single
>> value RHS may be interesting with partitioned table with all rows in a
>> given partition end up with the same partition key value. But may be
>> those are just different patches. I am not sure.
>
> Can you elaborate on the constraint exclusion case?  We don't care
> about the selectivity of an excluded relation, do we?
>

I meant that an empty RHS case doesn't necessarily need an empty table; it
could happen because of a relation excluded by constraints (see
relation_excluded_by_constraints()). So that's not as obscure as we
might think. But it's not very frequent either, and I think we should
deal with it as a separate patch. This patch improves the estimate
for some cases without degrading it in other cases. So I think
we can leave the other cases for a later patch.

> Any other views on the empty and single value special cases, when
> combined with [NOT] EXISTS (SELECT ... WHERE r.something <>
> s.something)?  Looking at this again, my feeling is that they're too
> obscure to spend time on, but others may disagree.
>
>>> Please find attached a new version, and a test script I used, which
>>> shows a bunch of interesting cases.  I'll add this to the commitfest.
>>
>> I added some "stable" tests to your patch taking inspiration from the
>> test SQL file. I think those will be stable across machines and runs.
>> Please let me know if those look good to you.
>
> Hmm.  But they show actual rows, not plan->plan_rows, and although the
> former is interesting as a sanity check the latter is the thing under
> test here.

I missed this point while adopting the tests. Sorry.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] psql: new help related to variables are not too readable

2017-09-13 Thread Fabien COELHO


Hello,


Personally I'm fine with a pager, so vertical spacing is fine. I just do
not like paging horizontally.


-1 [...]

If I were going to try and read it like a book I'd want the extra
white-space to make doing so easier (white-space gives the eye a breather
when done with a particular concept) - and the length wouldn't really
matter since I'd just make a single pass and be done with it.  But the
planned usage is for quick lookup of options that you know (or at least
suspect) exist and which you probably have an approximate idea of how they
are spelled.  The all-caps and left-justified block headers are distinct
enough to scan down - though I'd consider indenting 4 spaces instead of 2
to make that even easier (less effort to ignore the indented lines since
ignoring nothing is easier than ignoring something).  Having more fit on
one screen makes that vertical skimming considerably easier as well (no
break and re-acquire when scrolling in a new page).


Interesting and fine arguments!


So I'll agree that in an absolute sense reading the whole of the content in
its condensed form is more difficult than if there were blank lines in
between each block, but usability for the intended purpose is better in the
current form.


As far as usability is concerned, I most often use the "/" pager search
feature, or page down to scan everything. Neither use is really
hampered by skipping lines, but I can live with that as well.


Help formatting could be an option, but that would require more coding and 
I'm not sure of the i18n aspect.


--
Fabien.


Re: [HACKERS] Is it time to kill support for very old servers?

2017-09-13 Thread Andres Freund
Hi,

On 2017-09-13 23:39:21 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Re-upping this topic.
> 
> > On 2016-10-07 10:06:07 -0400, Tom Lane wrote:
> >> In the same line, maybe we should kill libpq's support for V2 protocol
> >> (which would make the cutoff 7.4).  And maybe the server's support too,
> >> though that wouldn't save very much code.  The argument for cutting this
> >> isn't so much that we would remove lots of code as that we're removing
> >> code that never gets tested, at least not by us.
> 
> > I'd like to do this in the not too far away future for at least the
> > backend. There's enough not particularly pretty code to deal with v2
> > that that'd be worthwhile.
> 
> Hm, I don't recall that there's very much on the server side that could be
> saved --- what's incurring your ire, exactly?

In this specific instance it's SendRowDescriptionMessage() for queries
returning a lot of columns that execute fast. The additional branches
due to the if (proto >= 3) conditionals are noticeable. It's easy enough
to fix by having two for() loops for the two cases. The added code is
annoying but bearable; what actually concerns me is that it's really hard
to test the v2 support.
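
I.e. roughly this kind of restructuring, hoisting the protocol branch out
of the per-attribute loop (emit_v2_attr()/emit_v3_attr() are placeholders
of mine for the per-attribute sends, not real functions):

static void emit_v2_attr(StringInfo buf, TupleDesc typeinfo, int i);	/* placeholder */
static void emit_v3_attr(StringInfo buf, TupleDesc typeinfo, int i);	/* placeholder */

static void
send_attrs_sketch(StringInfo buf, TupleDesc typeinfo)
{
	int			natts = typeinfo->natts;
	int			i;

	if (PG_PROTOCOL_MAJOR(FrontendProtocol) >= 3)
	{
		/* one loop for v3, no per-attribute version check */
		for (i = 0; i < natts; i++)
			emit_v3_attr(buf, typeinfo, i);
	}
	else
	{
		/* and a separate loop for the legacy v2 layout */
		for (i = 0; i < natts; i++)
			emit_v2_attr(buf, typeinfo, i);
	}
}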


> >> One small problem with cutting libpq's V2 support is that the server's
> >> report_fork_failure_to_client() function still sends a V2-style message.
> 
> > We should really fix that so it reports the error as a v3 message,
> > independent of ripping out libpq-fe support for v2.
> 
> It might be reasonable to do that, but libpq would have to be prepared
> for the other case for many years to come :-(

Right. But there seems to be pretty much no way to get around that. At least
none that I can see?  It might be worthwhile and more testable to add
special-case V2 handling for the oom-fork case, but still.


> The real problem in this area, to my mind, is that we're not testing that
> code --- either end of it --- in any systematic way.  If it's broken it
> could take us quite a while to notice.

Yea, that concerns me a lot too (see above).

Greetings,

Andres Freund




Re: [HACKERS] Optimise default partition scanning while adding new partition

2017-09-13 Thread Amit Langote
Hi Jeevan,

On 2017/09/12 18:22, Jeevan Ladhe wrote:
> Commit 6f6b99d1335be8ea1b74581fc489a97b109dd08a introduced default
> partitioning support. This commit added a new function,
> check_default_allows_bound(), which checks if there exists a row in the
> default partition that would belong to the new partition being added. If it
> finds one, it throws an error. Before taking the decision to scan the
> default partition, this function checks if there are existing constraints
> on the default partition that would imply the new partition constraints;
> if yes, it skips scanning the default partition, otherwise it scans the
> default partition and its children (if any). But, while doing so, the
> current code misses the fact that there can be constraints on the children
> of the default partition such that they would imply the constraints of the
> new partition being added, and hence individual child scans can also be
> skipped.
> Attached is the patch which does this.
> 
> This was previously discussed in the default partitioning thread[1], and the
> decision was made that we can take this as a separate patch rather than as a
> part of the default partitioning support.

Patch looks fine to me.  By the way, why don't we just say "Can we skip
scanning part_rel?" in the comment before the newly added call to
PartConstraintImpliedByRelConstraint()?  We don't need to repeat the
explanation of what it does at every place we call it.

> Amit Langote has a similar patch[2] for scanning the children of a
> partitioned
> table which is being attached as partition of another partitioned table.

I just posted my rebased patch there [1].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/a83a0899-19f5-594c-9aac-3ba0f16989a1%40lab.ntt.co.jp





Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

2017-09-13 Thread Amit Langote
On 2017/08/07 11:05, Amit Langote wrote:
> By the way, bulk of 0004 is refactoring which it seems is what Jeevan's
> default partition patch set also includes as one of the patches [1].  It
> got a decent amount review from Ashutosh.  I broke it down into a separate
> patch, so that the patch to add the new feature is its own tiny patch.
> 
> I also spotted a couple of comments referring to attachRel that we just
> recently renamed.
> 
> So, attached are:
> 
> 0001: s/attachRel/attachrel/g
> 0002: Refactoring to introduce a PartConstraintImpliedByRelConstraint
> 0003: Add the feature to skip the scan of individual leaf partitions

Since Jeevan Ladhe mentioned this patch [1] earlier this week, sending the
rebased patches here for consideration.  Actually there are only 2 patches
now, because 0002 above is rendered unnecessary by ecfe59e50fb [2].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAOgcT0MWwG8WBw8frFMtRYHAgDD=tpt6u7wcso_l2k0kypm...@mail.gmail.com

[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ecfe59e50fb
From 55e1e14a821de541c2d24c152c193bf57eb91d43 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 7 Aug 2017 10:45:39 +0900
Subject: [PATCH 1/2] Typo: attachRel is now attachrel

---
 src/backend/commands/tablecmds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 96354bdee5..563bcda30c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13779,7 +13779,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 * Prevent circularity by seeing if rel is a partition of attachrel. (In
 * particular, this disallows making a rel a partition of itself.)
 *
-* We do that by checking if rel is a member of the list of attachRel's
+* We do that by checking if rel is a member of the list of attachrel's
 * partitions provided the latter is partitioned at all.  We want to avoid
 * having to construct this list again, so we request the strongest lock
 * on all partitions.  We need the strongest lock, because we may decide
-- 
2.11.0

From a7d0d781bd9e3730f90d902d0e09abf79962f872 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 7 Aug 2017 10:51:47 +0900
Subject: [PATCH 2/2] Teach ATExecAttachPartition to skip validation in more
 cases

In cases where the table being attached is a partitioned table and
the table itself does not have constraints that would allow validation
on the whole table to be skipped, we can still skip the validations
of individual partitions if they each happen to have the requisite
constraints.

Per an idea of Robert Haas', with code refactoring suggestions from
Ashutosh Bapat.
---
 src/backend/commands/tablecmds.c  | 10 ++
 src/test/regress/expected/alter_table.out | 13 +
 src/test/regress/sql/alter_table.sql  | 10 ++
 3 files changed, 33 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 563bcda30c..901eea7fe2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13678,6 +13678,16 @@ ValidatePartitionConstraints(List **wqueue, Relation scanrel,
 			/* There can never be a whole-row reference here */
 			if (found_whole_row)
 				elog(ERROR, "unexpected whole-row reference found in partition key");
+
+			/* Check if we can skip scanning this part_rel. */
+			if (PartConstraintImpliedByRelConstraint(part_rel, my_partconstr))
+			{
+				ereport(INFO,
+						(errmsg("partition constraint for table \"%s\" is implied by existing constraints",
+								RelationGetRelationName(part_rel))));
+				heap_close(part_rel, NoLock);
+				continue;
+			}
 		}
 
 		/* Grab a work queue entry. */
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 0478a8ac60..e3415837b6 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3464,6 +3464,19 @@ ERROR:  updated partition constraint for default partition would be violated by
 -- should be ok after deleting the bad row
 DELETE FROM part5_def_p1 WHERE b = 'y';
 ALTER TABLE part_5 ATTACH PARTITION part5_p1 FOR VALUES IN ('y');
+-- If the partitioned table being attached does not have a constraint that
+-- would allow validation scan to be skipped, but an individual partition
+-- does, then the partition's validation scan is skipped.  Note that the
+-- following leaf partition only allows rows that have a = 7 (and b = 'b' but
+-- that's irrelevant).

Re: [HACKERS] utility commands benefiting from parallel plan

2017-09-13 Thread Rafia Sabih
On Wed, Sep 13, 2017 at 2:29 PM, Haribabu Kommi
 wrote:
>
>
> On Wed, Sep 13, 2017 at 4:17 PM, Rafia Sabih 
> wrote:
>>
>> On Fri, Sep 1, 2017 at 12:31 PM, Haribabu Kommi
>>  wrote:
>> >
>> > Hi All,
>> >
>> > Attached a rebased patch that supports parallelism for the queries
>> > that are underneath of some utility commands such as CREATE TABLE AS
>> > and CREATE MATERIALIZED VIEW.
>> >
>> > Note: This patch doesn't make the utility statement (insert operation)
>> > to run in parallel. It only allows the select query to be parallel if
>> > the
>> > query
>> > is eligible for parallel.
>> >
>>
>> Here is my feedback fro this patch,
>>
>> - The patch is working as expected, all regression tests are passing
>
>
> Thanks for the review.
>
>>
>> - I agree with Dilip that having similar mechanism for 'insert into
>> select...' statements would add more value to the patch, but even then
>> this looks like a good idea to extend parallelism for atleast a few of
>> the write operations
>
>
> Yes, I also agree that supporting 'insert into select' will provide more
> benefit. I already tried to support that in [1], but it has many
> drawbacks, especially with triggers. To properly support parallel
> DML queries, I feel the logic of ParallelMode needs an update, to
> avoid the errors from the PreventCommandIfParallelMode() function, to
> identify whether it is a nested query operation that should execute
> only in the backend, etc.
>
> As the current patch falls into the DDL category that gets benefited from
> parallel query, I didn't add the 'insert into select'
> support into this patch. Even without that support, it provides a
> benefit.
> I will work on supporting DML writes with parallel query as a
> separate patch.
>
Sounds sensible. In that case, I'll be marking this patch ready for committer.
>
> [1] -
> https://www.postgresql.org/message-id/CAJrrPGfo58TrYxnqwnFAo4%2BtYr8wUH-oC0dJ7V9x7gAOZeaz%2BQ%40mail.gmail.com
>
>

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] GatherMerge misses to push target list

2017-09-13 Thread Amit Kapila
On Wed, Sep 13, 2017 at 5:30 PM, Rushabh Lathia
 wrote:
> On Wed, Sep 6, 2017 at 10:04 AM, Amit Kapila 
> wrote:
>>
>
>
> This seems like a good optimization. I tried to simulate the test given
> in the mail; initially I wasn't able to generate the exact test, as index
> creation is missing in the shared test.
>

Oops.

> I also won't consider this a bug, but it's definitely a good optimization
> for GatherMerge.
>
>>
>>
>> Note - If we agree on the problems and fix, then I can add regression
>> tests to cover above cases in the patch.
>
>
> Sure, once you do that - I will review the patch.
>

The attached patch contains regression test as well.

Thanks for looking into it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pushdown_target_gathermerge_v2.patch
Description: Binary data



Re: [HACKERS] [PATCH] Call RelationDropStorage() for broader range of object drops.

2017-09-13 Thread Hadi Moshayedi
To provide more context: in cstore_fdw, creating the storage is easy; we
only need to hook into CREATE FOREIGN TABLE using event triggers. Removing
the storage is not that easy. For DROP FOREIGN TABLE we can use event
triggers, but when we do DROP EXTENSION the event triggers don't get fired
(because they have already been dropped), so to handle DROP EXTENSION we
need to hook into the process utility hook. To implement this, (1) we
get a list of all cstore tables, (2) call postgres's utility hook, and (3) if #2
succeeds, clean up all cstore tables' storage. But when #3 happens the
relation isn't there anymore, so we create a pseudo-relation [1] and call
RelationDropStorage().

Implementing all of this seemed messy, so we thought maybe postgres could
try to clean up storage for every relation it assigns a relfilenode to. We
are open to ideas.

[1]
https://github.com/citusdata/cstore_fdw/blob/store_data_in_internal_storage/cstore_fdw.c#L907


[HACKERS] Warnings "unrecognized node type" for some DDLs with log_statement = 'ddl'

2017-09-13 Thread Michael Paquier
Hi all,

While reviewing another patch, I bumped into a couple of failures
when running installcheck with log_statement = 'ddl'. This produces
regression failures in 4 tests: object_address, alter_generic,
alter_operator and stats_ext, involving the commands CREATE STATISTICS and
ALTER OPERATOR.

You can as well reproduce the failures using simply that:
=# create table aa (a int, b int);
CREATE TABLE
=# CREATE STATISTICS aa_stat ON a, b FROM aa;
WARNING:  01000: unrecognized node type: 332
LOCATION:  GetCommandLogLevel, utility.c:3357
ERROR:  42P17: extended statistics require at least 2 columns
LOCATION:  CreateStatistics, statscmds.c:220
=# ALTER OPERATOR = (boolean, boolean) SET (RESTRICT = NONE);
WARNING:  01000: unrecognized node type: 294
LOCATION:  GetCommandLogLevel, utility.c:3357
ALTER OPERATOR

Attached is a patch to fix all the failures I have spotted. As CREATE
STATISTICS is new in PG10, I am adding an open item, as the problem comes
from 7b504eb2. The ALTER OPERATOR problems were introduced in 9.6.
Still, I would suggest fixing everything at the same time. The problem
comes from GetCommandLogLevel(), which forgot to add T_CreateStatsStmt
and T_AlterOperatorStmt. I have noticed as well that
T_AlterCollationStmt is missing.
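
The shape of the fix is just adding the missing DDL arms; illustrative
sketch only (in the real patch these case labels go into the existing
switch over node tags in GetCommandLogLevel() in utility.c):

static LogStmtLevel
ddl_log_level_sketch(NodeTag tag)
{
	LogStmtLevel lev = LOGSTMT_ALL;

	switch (tag)
	{
		case T_CreateStatsStmt:
		case T_AlterOperatorStmt:
		case T_AlterCollationStmt:
			lev = LOGSTMT_DDL;	/* these are DDL, so log them at the 'ddl' level */
			break;
		default:
			break;
	}

	return lev;
}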

Thanks,
-- 
Michael


log-statement-ddl-fix.patch
Description: Binary data



Re: [HACKERS] [PATCH] Call RelationDropStorage() for broader range of object drops.

2017-09-13 Thread Hadi Moshayedi
On Wed, Sep 13, 2017 at 12:12 AM, Michael Paquier  wrote:
>
> Foreign tables do not have physical storage assigned to by default. At
> least heap_create() tells so, create_storage being set to false for a
> foreign table. So there is nothing to clean up normally. Or is
> cstore_fdw using directly heap_create with its own relfilenode set,
> creating a physical storage?
>

cstore_fdw (in store_data_in_internal_storage branch) calls
RelationCreateStorage() after CREATE FOREIGN TABLE completes [1]. Later it
also creates the FSM fork and uses it for storing some metadata.

[1]
https://github.com/citusdata/cstore_fdw/blob/store_data_in_internal_storage/cstore_fdw.c#L237


Re: [HACKERS] Is it time to kill support for very old servers?

2017-09-13 Thread Tom Lane
Andres Freund  writes:
> Re-upping this topic.

> On 2016-10-07 10:06:07 -0400, Tom Lane wrote:
>> In the same line, maybe we should kill libpq's support for V2 protocol
>> (which would make the cutoff 7.4).  And maybe the server's support too,
>> though that wouldn't save very much code.  The argument for cutting this
>> isn't so much that we would remove lots of code as that we're removing
>> code that never gets tested, at least not by us.

> I'd like to do this in the not too far away future for at least the
> backend. There's enough not particularly pretty code to deal with v2
> that that'd be worthwhile.

Hm, I don't recall that there's very much on the server side that could be
saved --- what's incurring your ire, exactly?

>> One small problem with cutting libpq's V2 support is that the server's
>> report_fork_failure_to_client() function still sends a V2-style message.

> We should really fix that so it reports the error as a v3 message,
> independent of ripping out libpq-fe support for v2.

It might be reasonable to do that, but libpq would have to be prepared
for the other case for many years to come :-(

The real problem in this area, to my mind, is that we're not testing that
code --- either end of it --- in any systematic way.  If it's broken it
could take us quite a while to notice.

regards, tom lane




Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-13 Thread Amit Kapila
On Wed, Sep 13, 2017 at 9:39 AM, Amit Kapila  wrote:
> On Tue, Sep 12, 2017 at 5:47 PM, Amit Khandekar  
> wrote:
>> On 5 September 2017 at 14:04, Amit Kapila  wrote:
>>
>> I started with a quick review ... a couple of comments below :
>>
>> - * If this is a baserel, consider gathering any partial paths we may have
>> - * created for it.  (If we tried to gather inheritance children, we could
>> + * If this is a baserel and not the only rel, consider gathering any
>> + * partial paths we may have created for it.  (If we tried to gather
>>
>>   /* Create GatherPaths for any useful partial paths for rel */
>> -  generate_gather_paths(root, rel);
>> +  if (lev < levels_needed)
>> + generate_gather_paths(root, rel, NULL);
>>
>> I think at the above two places, and may be in other place also, it's
>> better to mention the reason why we should generate the gather path
>> only if it's not the only rel.
>>
>
> I think the comment you are looking for is present where we are calling
> generate_gather_paths in grouping_planner. Instead of adding the same or a
> similar comment at multiple places, how about if we just say something
> like "See in grouping_planner where we generate gather paths" at all the
> other places?
>
>> --
>>
>> -   if (rel->reloptkind == RELOPT_BASEREL)
>> -   generate_gather_paths(root, rel);
>> +   if (rel->reloptkind == RELOPT_BASEREL &&
>> root->simple_rel_array_size > 2)
>> +   generate_gather_paths(root, rel, NULL);
>>
>> Above, in case it's a partitioned table, root->simple_rel_array_size
>> includes the child rels. So even if it's a simple select without a
>> join rel, simple_rel_array_size would be > 2, and so gather path would
>> be generated here for the root table, and again in grouping_planner().
>>
>
> Yeah, that could be a problem.  I think we should ensure that there is
> no append rel list by checking root->append_rel_list.  Can you think
> of a better way to handle it?
>

The attached patch fixes both the review comments as discussed above.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_paths_include_tlist_cost_v3.patch
Description: Binary data



Re: [HACKERS] proposal: psql: check env variable PSQL_PAGER

2017-09-13 Thread Pavel Stehule
2017-09-13 23:36 GMT+02:00 Thomas Munro :

> On Wed, Sep 6, 2017 at 4:12 AM, Pavel Stehule 
> wrote:
> > 2017-09-05 18:06 GMT+02:00 Tom Lane :
> >> Pushed, with some fooling with the documentation (notably,
> >> re-alphabetizing relevant lists).
> >>
> > Thank you very much
>
> I've started setting PSQL_PAGER="~/bin/pspg -s0" to try your new
> column-aware pager from https://github.com/okbob/pspg for my regular
> work.  Wow!  It could use some warning clean-up but it's a clever idea
> and so far it works really well.  Thanks for making this.
>

:) Thank you. It is a really fresh project - I started it three months ago,
so probably there are a lot of issues. It is, more or less, a little-bit-cleaned-up
and enhanced prototype. But I hope it works very well together with psql.

Yesterday I cleaned up all the warnings and prepared autoconf and rpmbuild. I
plan to release it next week.

Regards

Pavel


> --
> Thomas Munro
> http://www.enterprisedb.com
>


Re: [HACKERS] PG 10 release notes

2017-09-13 Thread Tsunakawa, Takayuki
It's embarrassing to ask about such a trivial thing, but I noticed that the
following line is missing from the latest release notes, although it was originally
on Bruce's website:

Remove documented restriction about using large shared buffers on Windows 
(Takayuki Tsunakawa)

Is this intended?

Regards
Takayuki Tsunakawa






Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-13 Thread Amit Langote
On 2017/09/14 7:43, Robert Haas wrote:
> On Wed, Sep 13, 2017 at 12:56 PM, Ashutosh Bapat
>  wrote:
>> I debugged what happens in case of query "select 1 from t1 union all
>> select 2 from t1;" with the current HEAD (without multi-level
>> expansion patch attached). It doesn't set partitioned_rels in Append
>> path that gets converted into Append plan. Remember t1 is a
>> multi-level partitioned table here with t1p1 as its immediate
>> partition and t1p1p1 as partition of t1p1. So, the
>> set_append_rel_pathlist() recurses once as shown in the following
>> stack trace.
> 
> Nice debugging.

+1.

> I spent some time today looking at this and I think
> it's a bug in v10, and specifically in add_paths_to_append_rel(),
> which only sets partitioned_rels correctly when the appendrel is a
> partitioned rel, and not when it's a subquery RTE with one or more
> partitioned queries beneath it.
> 
> Attached are two patches either one of which will fix it.  First, I
> wrote mechanical-partrels-fix.patch, which just mechanically
> propagates partitioned_rels lists from accumulated subpaths into the
> list used to construct the parent (Merge)AppendPath.  I wasn't entirely
> happy with that, because it ends up building multiple partitioned_rels
> lists for the same RelOptInfo.  That seems silly, but there's no
> principled way to avoid it; avoiding it amounts to hoping that all the
> paths for the same relation carry the same partitioned_rels list,
> which is uncomfortable.
> 
> So then I wrote pcinfo-for-subquery.patch.  That patch notices when an
> RTE_SUBQUERY appendrel is processed and accumulates the
> partitioned_rels of its immediate children; in case there can be
> multiple nested levels of subqueries before we get down to the actual
> partitioned rel, it also adds a PartitionedChildRelInfo for the
> subquery RTE, so that there's no need to walk the whole tree to build
> the partitioned_rels list at higher levels, just the immediate
> children.  I find this fix a lot more satisfying.  It adds less code
> and does no extra work in the common case.

I very much like pcinfo-for-subquery.patch, although I'm not sure if we
need to create PartitionedChildRelInfo for the sub-query parent RTE as the
patch teaches add_paths_to_append_rel() to do.  ISTM, nested UNION ALL
subqueries are flattened way before we get to add_paths_to_append_rel();
if it could not be flattened, there wouldn't be a call to
add_paths_to_append_rel() in the first place, because no AppendRelInfos
would be generated.  See what happens when is_simple_union_all_recurse()
returns false to flatten_simple_union_all() -- no AppendRelInfos will be
generated and added to root->append_rel_list in that case.

IOW, there won't be nested AppendRelInfos for nested UNION ALL sub-queries
like we're setting out to build for multi-level partitioned tables.

So, as things stand today, there can be at most one recursive call of
add_paths_to_append_rel() for a sub-query parent RTE, that is, if its child
sub-queries contain partitioned tables, but not more.  The other patch
(multi-level expansion of partitioned tables) will change that, but even
then we won't need the sub-query's own PartitionedChildRelInfo.

> Notice that the choice of fix we adopt has consequences for your
> 0001-Multi-level-partitioned-table-expansion.patch -- with
> mechanical-partrels-fix.patch, that patch could either associate all
> partitioned_rels with the top-parent or it could work level by level
> and everything would get properly assembled later.  But with
> pcinfo-for-subquery.patch, we need everything associated with the
> top-parent.  That doesn't seem like a problem to me, but it's
> something to note.

I think it's fine.

With 0001-Multi-level-partitioned-table-expansion.patch,
get_partitioned_child_rels() will get called even for non-root partitioned
tables, for which it won't find a valid pcinfo.  I think that patch must
also change its callers to stop Asserting that a valid pcinfo is returned.

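For what it's worth, a rough sketch of what the relaxed function could look
like (assuming get_partitioned_child_rels() keeps its current shape; this is
an illustration, not the attached patch):

List *
get_partitioned_child_rels(PlannerInfo *root, Index rti)
{
    List       *result = NIL;
    ListCell   *l;

    foreach(l, root->pcinfo_list)
    {
        PartitionedChildRelInfo *pc = (PartitionedChildRelInfo *) lfirst(l);

        if (pc->parent_relid == rti)
        {
            result = pc->child_rels;
            break;
        }
    }

    /*
     * With multi-level expansion, this can be reached for an intermediate
     * partitioned table that has no pcinfo of its own, so a NIL result is
     * legitimate; callers must be prepared for that instead of Asserting.
     */
    return result;
}
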
Spotted a typo in pcinfo-for-subquery.patch:

+ * A plain relation will alread have

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b

2017-09-13 Thread Noah Misch
On Wed, Sep 13, 2017 at 12:06:40PM -0400, Robert Haas wrote:
> On Wed, Sep 13, 2017 at 10:54 AM, Dean Rasheed  
> wrote:
> > Oracle, MySQL and DB2 all use MINVALUE/MAXVALUE. Actually, Oracle and
> > MySQL only use MAXVALUE, not MINVALUE, because they don't allow gaps
> > between partitions and the first partition implicitly starts at
> > MINVALUE, so the bounds that we currently support are a strict
> > superset of those supported by Oracle and MySQL.
> >
> > Both Oracle and MySQL allow finite values after MAXVALUE (usually
> > listed as "0" in code examples, e.g. see [1]). Oracle explicitly
> > documents the fact that values after MAXVALUE are irrelevant in [1].
> > I'm not sure if MySQL explicitly documents that, but it does behave
> > the same.
> >
> > Also, both Oracle and MySQL store what the user entered (they do not
> > canonicalise), as can be seen by looking at ALL_TAB_PARTITIONS in
> > Oracle, or "show create table" in MySQL.
> 
> OK, thanks.  I still don't really like allowing this, but I can see
> that compatibility with other systems has some value here, and if
> nobody else is rejecting these cases, maybe we shouldn't either.  So
> I'll hold my nose and change my vote to canonicalizing rather than
> rejecting outright.

I vote for rejecting it.  DDL compatibility is less valuable than other
compatibility.  The hypothetical affected application can change its DDL to
placate PostgreSQL and use that modified DDL for all other databases, too.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-09-13 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Ashutosh Sharma
> I have once again tested the latest patch (v14 patch) on Windows and the
> results looked fine to me. Basically I have repeated the test cases which
> I had done earlier on the v8 patch. For more details on the tests that I have
> re-executed, please refer to [1]. Thanks.

Thanks so much.  I'm relieved to know that the patch still works.

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Savepoint-related statements terminates connection

2017-09-13 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
> The originally reported bug is fixed.  Not making any claims about other
> bugs ...

I'm sorry I couldn't reply to you.  I've recently been in a situation where I 
can't use my time for development.  I think I'll be able to rejoin the 
community activity soon.

I confirmed your patch fixed the problem.  And the code looks perfect.  Thank 
you very much.

Regards
Takayuki Tsunakawa





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] generated columns

2017-09-13 Thread David Fetter
On Wed, Sep 13, 2017 at 10:09:37AM +0200, Andreas Karlsson wrote:
> On 09/13/2017 04:04 AM, Simon Riggs wrote:
> >On 31 August 2017 at 05:16, Peter Eisentraut
> > wrote:
> >>- index support (and related constraint support)
> >
> >Presumably you can't index a VIRTUAL column. Or at least I don't
> >think its worth spending time trying to make it work.
> 
> I think end users would be surprised if one can index STORED columns
> and expressions but not VIRTUAL columns. So unless it is a huge
> project I would say it is worth it.

So long as the expression on the normal columns is immutable, it's
fit for an expressional index, as is any immutable function composed
with it.

What am I missing?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-13 Thread Peter Eisentraut
On 9/13/17 09:56, Alvaro Herrera wrote:
> Tom Lane wrote:
>> Peter Eisentraut  writes:
> 
>>> - Disallow DROP SUBSCRIPTION in a transaction under certain
>>> circumstances, for example if a transaction has previously manipulated
>>> the same subscription.
> 
>> ISTM the second of those (refuse to drop an in-use subscription) is
>> by far the least surprising behavior.
> 
> +1 for that option.  IIRC this has precedent for other object types such
> as tables, where we refuse some action if we have already operated on
> the table in the same transaction.

What are some examples of such behavior?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expanding inheritance in partition bound order

2017-09-13 Thread Amit Langote
On 2017/09/14 1:42, Robert Haas wrote:
> On Wed, Sep 13, 2017 at 6:02 AM, Amit Langote
>  wrote:
>> It seems to me we don't really need the first patch all that much.  That
>> is, let's keep PartitionDispatchData the way it is for now, since we don't
>> really have any need for it beside tuple-routing (EIBO as committed didn't
>> need it for one).  So, let's forget about "decoupling
>> RelationGetPartitionDispatchInfo() from the executor" thing for now and
>> move on.
>>
>> So, attached is just the patch to make RelationGetPartitionDispatchInfo()
>> traverse the partition tree in depth-first manner to be applied on HEAD.
> 
> I like this patch.  Not only does it improve the behavior, but it's
> actually less code than we have now, and in my opinion, the new code
> is easier to understand, too.
> 
> A few suggestions:

Thanks for the review.

> - I think get_partition_dispatch_recurse() should get a check_stack_depth()
> call just like expand_partitioned_rtentry() did, and for the same
> reasons: if the catalog contents are corrupted so that we have an
> infinite loop in the partitioning hierarchy, we want to error out, not
> crash.

Ah, missed that.  Done.

> - I think we should add a comment explaining that we're careful to do
> this in the same order as expand_partitioned_rtentry() so that callers
> can assume that the N'th entry in the leaf_part_oids array will also
> be the N'th child of an Append node.

Done.  Since the Append/ModifyTable may skip some leaf partitions due to
pruning, I added a note about that too.

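For readers following along, the overall shape being discussed is roughly the
following (a simplified sketch only; the real function also builds the
PartitionDispatch entries and the parent/key information):

static void
get_partition_dispatch_recurse(Relation rel, Relation parent,
                               List **pds, List **leaf_part_oids)
{
    PartitionDesc partdesc = RelationGetPartitionDesc(rel);
    int         i;

    /* Guard against corrupted catalogs describing a circular hierarchy. */
    check_stack_depth();

    /* Visit partitions in bound order, as expand_partitioned_rtentry() does. */
    for (i = 0; i < partdesc->nparts; i++)
    {
        Oid         partrelid = partdesc->oids[i];

        if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE)
            *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid);
        else
        {
            /* Depth-first recursion into the sub-partitioned table. */
            Relation    partrel = heap_open(partrelid, NoLock);

            get_partition_dispatch_recurse(partrel, rel, pds, leaf_part_oids);
        }
    }
}
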
> + * For every partitioned table other than root, we must store a
> 
> other than the root
> 
> + * partitioned table.  The value multiplied back by -1 is returned as the
> 
> multiplied by -1, not multiplied back by -1
> 
> + * tables in the tree, using which, search is continued further down the
> + * partition tree.
> 
> Period after "in the tree".  Then continue: "This value is used to
> continue the search in the next level of the partition tree."

Fixed.

Attached updated patch.

Thanks,
Amit
From c2599d52267cc362e059452efe69ddd09b94c083 Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 8 Sep 2017 17:35:10 +0900
Subject: [PATCH] Make RelationGetPartitionDispatch expansion order depth-first

This is so that it matches what the planner does when expanding
partitioning inheritance.  Matching the planner's order makes it easier
to match the executor's per-partition objects with the planner-created
per-partition nodes.
---
 src/backend/catalog/partition.c | 252 +---
 1 file changed, 109 insertions(+), 143 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 73eff17202..36f52ddb98 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -147,6 +147,8 @@ static int32 partition_bound_cmp(PartitionKey key,
 static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
void *probe, bool 
probe_is_bound, bool *is_equal);
+static void get_partition_dispatch_recurse(Relation rel, Relation parent,
+  List **pds, List 
**leaf_part_oids);
 
 /*
  * RelationBuildPartitionDesc
@@ -1192,21 +1194,6 @@ get_partition_qual_relid(Oid relid)
 }
 
 /*
- * Append OIDs of rel's partitions to the list 'partoids' and for each OID,
- * append pointer rel to the list 'parents'.
- */
-#define APPEND_REL_PARTITION_OIDS(rel, partoids, parents) \
-   do\
-   {\
-   int i;\
-   for (i = 0; i < (rel)->rd_partdesc->nparts; i++)\
-   {\
-   (partoids) = lappend_oid((partoids), 
(rel)->rd_partdesc->oids[i]);\
-   (parents) = lappend((parents), (rel));\
-   }\
-   } while(0)
-
-/*
  * RelationGetPartitionDispatchInfo
  * Returns information necessary to route tuples down a partition 
tree
  *
@@ -1222,151 +1209,130 @@ PartitionDispatch *
 RelationGetPartitionDispatchInfo(Relation rel,
 int 
*num_parted, List **leaf_part_oids)
 {
+   List   *pdlist;
PartitionDispatchData **pd;
-   List   *all_parts = NIL,
-  *all_parents = NIL,
-  *parted_rels,
-  *parted_rel_parents;
-   ListCell   *lc1,
-  *lc2;
-   int i,
-   k,
-   offset;
+   ListCell *lc;
+   int i;
 
-   /*
-* We rely on the relcache to traverse the partition tree to build both
-* the leaf partition OIDs list and the array of PartitionDispatch 
objects
-* for the partitioned tables in the tree.  That means every partitioned
-* table in t

Re: [HACKERS] domain type smashing is expensive

2017-09-13 Thread Andres Freund
Hi,

On 2017-09-12 14:28:51 -0400, Robert Haas wrote:
> On Tue, Sep 12, 2017 at 1:37 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> On short-running queries that return a lot of columns,
> >> SendRowDescriptionMessage's calls to getBaseTypeAndTypmod() are a
> >> noticeable expense.
> >
> > Yeah, I was never very happy with the way that the original domain
> > patch dealt with that.  I think you're not even focusing on the
> > worst part, which is all the getBaseType calls in the parser.
> > I do not have a good idea about how to get rid of them though.
> 
> Well, I'm focusing on the part that shows up in the profile.  Prepared
> queries don't get re-parsed repeatedly, so the calls in the parser
> don't matter in that context.  I'm not saying it wouldn't be nice to
> get rid of them, but it only helps people who aren't preparing their
> queries.

In my experience syscache lookups aren't particularly prominent in
profiles of non-prepared workloads. That's commonly the memory allocator
(due to all the lists), the raw parser, and then parse analysis (with a
small proportion spent in the syscaches), leaving the executor side aside.


> >> +   if (typid < FirstBootstrapObjectId)
> >> +   break;
> >
> > I'm really unwilling to buy into an assumption that we'll never
> > have any built-in domains just to support such a crock as this.

I'm not super happy about that solution either, but it has the big
advantage of being simple and consisting of very little code. Adding a
couple comments here and a type_sanity.sql check seems to buy a good
chunk of performance for little effort.  Adding additional hashtable
searches is far from free.

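For context, the shortcut being debated sits in the domain-walking loop of
getBaseTypeAndTypmod(), roughly as sketched below; the marked line is the
proposal, and it relies on the assumption that no bootstrap-era type is a
domain:

Oid
getBaseTypeAndTypmod(Oid typid, int32 *typmod)
{
    /* Walk down a stack of domains until we reach a non-domain type. */
    for (;;)
    {
        HeapTuple   tup;
        Form_pg_type typTup;

        /* Proposed shortcut: built-in types are assumed never to be domains. */
        if (typid < FirstBootstrapObjectId)
            break;

        tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
        if (!HeapTupleIsValid(tup))
            elog(ERROR, "cache lookup failed for type %u", typid);
        typTup = (Form_pg_type) GETSTRUCT(tup);
        if (typTup->typtype != TYPTYPE_DOMAIN)
        {
            ReleaseSysCache(tup);
            break;
        }
        typid = typTup->typbasetype;
        *typmod = typTup->typtypmod;
        ReleaseSysCache(tup);
    }

    return typid;
}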

> I more or less expected that reaction, but I think it's a bit
> short-sighted.  If somebody wanted to define a domain type in
> pg_type.h, they'd have to write any domain constraint out in
> pg_constraint.h in nodeToString() form, and it seems to me that the
> chances that we'd accept a patch are pretty much nil, because it would
> be a maintenance nuisance.  Now, maybe you could argue that somebody
> might want to define a constraint-less domain in pg_type.h, but I
> can't recall any proposal to do such a thing and don't see why
> anybody'd want to do it.

Due to that reason we'd probably create such domain types outside of
bootstrap, and therefore in a separate oid range...



> > You'd need to dig around in the archives from around that time.  But
> > my hazy recollection is that the argument was that clients would be
> > much more likely to know what to do with a built-in type than with
> > some domain over it.  psql, for example, knows about right-justifying
> > the built-in numeric types, but it'd fail to do so for domains.
> 
> Mmm, that's a good point.

Yea, I don't think we want to revise that just because of this
performance issue - it'd likely cause some subtle breakage. I'm far from
convinced that this "downcasting" is a good idea on a semantical basis,
but that seems like a separate discussion and I can't recall complaints.


> >> 2. Precompute the list of types to be sent to the client during
> >> planning instead of during execution.  The point of prepared
> >> statements is supposed to be to do as much of the work as possible at
> >> prepare time so that bind/execute is as fast as possible, but we're
> >> not really adhering to that design philosophy here.  However, I don't
> >> have a clear idea of exactly how to do that.
> >
> > That'd help for prepared statements, but not for simple query execution.

> Sure, but that's kinda my point.  We've got to send a RowDescription
> message for every query, and if that requires smashing domain types to
> base types, we have to do it.  What we don't have to do is repeat that
> work for every execution of a prepared query.

We also have done a bunch of those lookups in the planner already, so if
we'd move it there it might still be an advantage performance-wise
even for the single execution case.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-09-13 Thread Robert Haas
On Mon, Jun 26, 2017 at 4:46 AM, Kyotaro HORIGUCHI
 wrote:
> The cause is that NO INHERIT doesn't take an exclusive lock on the
> parent. This allows expand_inherited_rtentry to add the child
> relation into the appendrel after it has been removed from the
> inheritance hierarchy but still exists.
>
> I see two ways to fix this.
>
> The first patch adds a recheck of the inheritance relationship if the
> corresponding attribute is missing in the child in
> make_inh_translation_list(). The recheck is a bit complex but it
> is not performed unless the sequence above happens. It checks
> duplication of relid (or cycles in inheritance) following
> find_all_inheritors (but doing it a bit differently), but I'm not sure
> it is really useful.
>
>
> The second patch lets ALTER TABLE NO INHERIT acquire locks on
> the parent first.
>
> Since the latter has a larger impact on the current behavior and
> we already treat the "DROP TABLE child" case in a similar way, I
> suppose that the first approach would be preferable.
>
>
> Any comments or thoughts?

I agree that the second has less user impact, but I wonder whether we can
assume that will really fix the bug completely, or more generally whether it
will fix all of the bugs that come from ALTER TABLE .. NO INHERIT not
locking the parent.  I have a sneaking suspicion that may be wishful
thinking.

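For reference, the second approach quoted above amounts to something like the
sketch below; the function and helper names and the AccessExclusiveLock level
are purely illustrative here, not taken from Kyotaro's patch:

static void
drop_inherit_locking_parent_first(Relation child_rel, RangeVar *parent_name)
{
    Relation    parent_rel;

    /*
     * Lock the parent strongly enough to conflict with concurrent planning
     * before touching the inheritance link (lock level shown is illustrative).
     */
    parent_rel = heap_openrv(parent_name, AccessExclusiveLock);

    /* Stand-in for whatever actually deletes the pg_inherits entry. */
    RemoveInheritance(child_rel, parent_rel);

    /* Keep the lock until commit, as is usual for DDL. */
    heap_close(parent_rel, NoLock);
}
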
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is it time to kill support for very old servers?

2017-09-13 Thread Andres Freund
Hi,

Re-upping this topic.

On 2016-10-07 10:06:07 -0400, Tom Lane wrote:
> In the same line, maybe we should kill libpq's support for V2 protocol
> (which would make the cutoff 7.4).  And maybe the server's support too,
> though that wouldn't save very much code.  The argument for cutting this
> isn't so much that we would remove lots of code as that we're removing
> code that never gets tested, at least not by us.

I'd like to do this in the not-too-distant future, at least for the
backend. There's enough not-particularly-pretty code dealing with v2
that it'd be worthwhile.


> One small problem with cutting libpq's V2 support is that the server's
> report_fork_failure_to_client() function still sends a V2-style message.
> We could change that in HEAD, certainly, but we don't really want modern
> libpq unable to parse such a message from an older server.  Possibly we
> could handle that specific case with a little special-purpose code and
> still be able to take out most of fe-protocol2.c.

We should really fix that so it reports the error as a v3 message,
independent of ripping out libpq-fe support for v2.

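For the curious, reporting that failure as a v3 message basically means
hand-building an ErrorResponse packet before the backend is fully up; a rough
sketch (field choices and the function name are illustrative, not a
worked-out patch):

static void
report_fork_failure_to_client_v3(Port *port, int errnum)
{
    char        packet[1200];
    uint32      len;
    int         off = 0;

    packet[off++] = 'E';        /* ErrorResponse message type */
    off += 4;                   /* leave room for the length word */

    packet[off++] = 'S';        /* severity field */
    off += snprintf(packet + off, sizeof(packet) - off, "FATAL") + 1;
    packet[off++] = 'M';        /* human-readable message field */
    off += snprintf(packet + off, sizeof(packet) - off,
                    "could not fork new process for connection: %s",
                    strerror(errnum)) + 1;
    packet[off++] = '\0';       /* end of fields */

    /* The length word counts itself but not the leading type byte. */
    len = htonl(off - 1);
    memcpy(packet + 1, &len, sizeof(len));

    (void) send(port->sock, packet, off, 0);
}
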
Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] generated columns

2017-09-13 Thread Robert Haas
On Tue, Sep 12, 2017 at 10:04 PM, Simon Riggs  wrote:
> I think an option to decide whether the default is STORED or VIRTUAL
> would be useful.

That seems like it could be a bit of a foot-gun.  For example, an
extension author who uses generated columns will have to be careful to
always specify one or the other, because they don't know what the
default will be on the system where it's deployed.  Similarly for an
author of a portable application.  I think it'll create fewer
headaches if we just pick a default and stick with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-09-13 Thread Thomas Munro
On Thu, Sep 14, 2017 at 12:51 AM, Prabhat Sahu
 wrote:
> With lower "shared_buffers" and "work_mem" settings as below, the query
> crashes, but I am able to see the explain plan.

Thanks Prabhat.  A small thinko in the batch reset code means that it
sometimes thinks the shared skew hash table is present and tries to
probe it after batch 1.  I have a fix for that and I will post a new
patch set just as soon as I have a good regression test figured out.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: new help related to variables are not too readable

2017-09-13 Thread Tom Lane
"David G. Johnston"  writes:
> If I was going to try and read it like a book I'd want the extra
> white-space to make doing so easier (white-space gives the eye a breather
> when done with a particular concept) - and the length wouldn't really
> matter since I'd just make a single pass and be done with it.  But the
> planned usage is for quick lookup of options that you know (or at least
> suspect) exist and which you probably have an approximate idea of how they
> are spelled.  The all-caps and left-justified block headers are distinct
> enough to scan down - though I'd consider indenting 4 spaces instead of 2
> to make that even easier (less effort to ignore the indented lines since
> ignoring nothing is easier than ignoring something).  Having more fit on
> one screen makes that vertical skimming considerably easier as well (no
> break and re-acquire when scrolling in a new page).

Hmm, indenting the descriptions a couple more spaces might be a workable
compromise.  Anyone want to try that and see what it looks like?
Preferably somebody who's not happy with the current layout ;-)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: new help related to variables are not too readable

2017-09-13 Thread David G. Johnston
On Wed, Sep 13, 2017 at 12:46 PM, Fabien COELHO  wrote:

>
> Hello Tom,
>
> Probably it needs some rebase after Tom committed result status variables.
>>>
>>
>> As it is a style thing, ISTM that the patch is ready if most people agree
>>> that it is better this way and there is no strong veto against.
>>>
>>
>> FWIW, I think it's a bad idea.  We already nearly-doubled the vertical
>> space required for this variable list.  That was a heavy cost --- and we
>> already got at least one complaint about it --- but it seemed warranted
>> to avoid having to deal with very constrained variable descriptions.
>> This proposes to make the vertical space nearly triple what it was in v10.
>> In a typical-size window that's going to have a pretty severe impact on
>> how much of the list you can see at once.  And the readability gain is
>> (at least to my eyes) very marginal.
>>
>
> Ok, you do not like it. As Pavel said, it is subjective. When it is a
> matter of taste, people tend to differ, someone will always complain, one
> way or another, and they are neither right nor wrong.
>
> So, is it a -1 or a veto?
>
> If it is the latter, the patch can be marked as "Rejected" and everybody
> will get more time for other things :-)
>
> If it is not a veto, people can continue to give their opinions.
> Personally I'm fine with a pager, so vertical spacing is fine. I just do
> not like paging horizontally.


-1

I don't fully buy the "it is subjective" argument - I'm by no means an
expert, but there are many people out there who have done considerable
research on human perception, so there is at least room for non-subjective
evaluation.  Below is my attempt.

If I was going to try and read it like a book I'd want the extra
white-space to make doing so easier (white-space gives the eye a breather
when done with a particular concept) - and the length wouldn't really
matter since I'd just make a single pass and be done with it.  But the
planned usage is for quick lookup of options that you know (or at least
suspect) exist and which you probably have an approximate idea of how they
are spelled.  The all-caps and left-justified block headers are distinct
enough to scan down - though I'd consider indenting 4 spaces instead of 2
to make that even easier (less effort to ignore the indented lines since
ignoring nothing is easier than ignoring something).  Having more fit on
one screen makes that vertical skimming considerably easier as well (no
break and re-acquire when scrolling in a new page).

So I'll agree that in an absolute sense reading the whole of the content in
its condensed form is more difficult than if there were blank lines in
between each block, but usability for the intended purpose is better in the
current form.

David J.


Re: [HACKERS] <> join selectivity estimate question

2017-09-13 Thread Tom Lane
Thomas Munro  writes:
> On Wed, Sep 6, 2017 at 11:14 PM, Ashutosh Bapat
>  wrote:
>> I added some "stable" tests to your patch taking inspiration from the
>> test SQL file. I think those will be stable across machines and runs.
>> Please let me know if those look good to you.

> Hmm.  But they show actual rows, not plan->plan_rows, and although the
> former is interesting as a sanity check the latter is the thing under
> test here.  It seems like we don't have fine enough control of
> EXPLAIN's output to show estimated rows but not cost.  I suppose we
> could try to capture EXPLAIN's output somehow (plpgsql dynamic
> execution or spool output from psql?) and then pull out just the row
> estimates, maybe with extra rounding to cope with instability.

Don't have time to think about the more general question right now,
but as far as the testing goes, there's already precedent for filtering
EXPLAIN output --- see explain_sq_limit() in subselect.sql.  But I'm
dubious whether the rowcount estimate could be relied on to be perfectly
machine-independent, even if you were hiding costs successfully.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] <> join selectivity estimate question

2017-09-13 Thread Thomas Munro
On Wed, Sep 6, 2017 at 11:14 PM, Ashutosh Bapat
 wrote:
> On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro
>  wrote:
>> That just leaves the question of whether we should try to handle the
>> empty RHS and single-value RHS cases using statistics.  My intuition
>> is that we shouldn't, but I'll be happy to change my intuition and
>> code that up if that is the feedback from planner gurus.
>
> Empty RHS can result from dummy relations also, which are produced by
> constraint exclusion, so maybe that's an interesting case. A single-value
> RHS may be interesting with a partitioned table where all rows in a
> given partition end up with the same partition key value. But maybe
> those are just different patches. I am not sure.

Can you elaborate on the constraint exclusion case?  We don't care
about the selectivity of an excluded relation, do we?

Any other views on the empty and single value special cases, when
combined with [NOT] EXISTS (SELECT ... WHERE r.something <>
s.something)?  Looking at this again, my feeling is that they're too
obscure to spend time on, but others may disagree.

>> Please find attached a new version, and a test script I used, which
>> shows a bunch of interesting cases.  I'll add this to the commitfest.
>
> I added some "stable" tests to your patch taking inspiration from the
> test SQL file. I think those will be stable across machines and runs.
> Please let me know if those look good to you.

Hmm.  But they show actual rows, not plan->plan_rows, and although the
former is interesting as a sanity check the latter is the thing under
test here.  It seems like we don't have fine enough control of
EXPLAIN's output to show estimated rows but not cost.  I suppose we
could try to capture EXPLAIN's output somehow (plpgsql dynamic
execution or spool output from psql?) and then pull out just the row
estimates, maybe with extra rounding to cope with instability.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-09-13 Thread Henry
I was just reading the PostgreSQL 11 roadmap and it mentions native graph
support. I would be interested in following the design work for this.

Would this require the new pluggable storage which is currently in
development, or would the existing storage engine be sufficient? I am just
wondering if there are any rough design/plans for this...

https://wiki.postgresql.org/wiki/Fujitsu_roadmap#Multi-model_database

   - *graph: Natively support graph data model. Implement Cypher and/or
   Gremlin as the query language through UDFs.*


Thank you,
Henry


On Sun, Sep 3, 2017 at 1:14 PM MauMau  wrote:

> From: Henry M
> > This may be interesting... they implement cypher (unfortunately they
> had to fork in order to have cypher be a first class query language
> with SQL).
> >
> > https://github.com/bitnine-oss/agensgraph
>
> I'm sorry for my very late reply.
>
> Thanks for the information.  AgensGraph is certainly interesting, but
> the problem is that it's a fork of PostgreSQL as you mentioned.  I
> wish the data models, including query languages, to be pluggable
> extensions, so that various people (especially database researchers?)
> can develop them flexibly.  Of course, I want various data models to
> be incorporated in the core as early as possible, but I'm afraid it's
> not easy.  If new data models can be added as extensions, they can be
> developed outside the PostgreSQL community process, get popular and
> mature, and then be embraced in core like GiST/SP-Gist indexes and
> full text search did.
>
>
> Regards
> MauMau
>
>


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-13 Thread Robert Haas
On Wed, Sep 13, 2017 at 12:56 PM, Ashutosh Bapat
 wrote:
> I debugged what happens in case of query "select 1 from t1 union all
> select 2 from t1;" with the current HEAD (without multi-level
> expansion patch attached). It doesn't set partitioned_rels in Append
> path that gets converted into Append plan. Remember t1 is a
> multi-level partitioned table here with t1p1 as its immediate
> partition and t1p1p1 as partition of t1p1. So, the
> set_append_rel_pathlist() recurses once as shown in the following
> stack trace.

Nice debugging.  I spent some time today looking at this and I think
it's a bug in v10, and specifically in add_paths_to_append_rel(),
which only sets partitioned_rels correctly when the appendrel is a
partitioned rel, and not when it's a subquery RTE with one or more
partitioned queries beneath it.

Attached are two patches either one of which will fix it.  First, I
wrote mechanical-partrels-fix.patch, which just mechanically
propagates partitioned_rels lists from accumulated subpaths into the
list used to construct the parent (Merge)AppendPath.  I wasn't entirely
happy with that, because it ends up building multiple partitioned_rels
lists for the same RelOptInfo.  That seems silly, but there's no
principled way to avoid it; avoiding it amounts to hoping that all the
paths for the same relation carry the same partitioned_rels list,
which is uncomfortable.

So then I wrote pcinfo-for-subquery.patch.  That patch notices when an
RTE_SUBQUERY appendrel is processed and accumulates the
partitioned_rels of its immediate children; in case there can be
multiple nested levels of subqueries before we get down to the actual
partitioned rel, it also adds a PartitionedChildRelInfo for the
subquery RTE, so that there's no need to walk the whole tree to build
the partitioned_rels list at higher levels, just the immediate
children.  I find this fix a lot more satisfying.  It adds less code
and does no extra work in the common case.

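Very roughly, the subquery-appendrel part of that idea looks like the
fragment below inside add_paths_to_append_rel() (a sketch of the description
above, using the rel/rte already available in that function; it is not the
attached patch):

    List       *partitioned_rels = NIL;
    ListCell   *l;

    if (rte->rtekind == RTE_RELATION &&
        rte->relkind == RELKIND_PARTITIONED_TABLE)
    {
        /* A partitioned table: its own pcinfo already carries the list. */
        partitioned_rels = get_partitioned_child_rels(root, rel->relid);
    }
    else if (rte->rtekind == RTE_SUBQUERY)
    {
        /*
         * A subquery parent (e.g. UNION ALL): accumulate the partitioned_rels
         * of the immediate children, and record them in a new
         * PartitionedChildRelInfo for this RTE (not shown) so that upper
         * levels need not walk the whole tree again.
         */
        foreach(l, root->append_rel_list)
        {
            AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);

            if (appinfo->parent_relid != rel->relid)
                continue;
            partitioned_rels = list_concat(partitioned_rels,
                                           get_partitioned_child_rels(root,
                                                                      appinfo->child_relid));
        }
    }
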
Notice that the choice of fix we adopt has consequences for your
0001-Multi-level-partitioned-table-expansion.patch -- with
mechanical-partrels-fix.patch, that patch could either associate all
partitioned_rels with the top-parent or it could work level by level
and everything would get properly assembled later.  But with
pcinfo-for-subquery.patch, we need everything associated with the
top-parent.  That doesn't seem like a problem to me, but it's
something to note.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


mechanical-partrels-fix.patch
Description: Binary data


pcinfo-for-subquery.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] uninterruptible state in 10beta4

2017-09-13 Thread Tom Lane
Andres Freund  writes:
> Indeed that seems plausible. I guess something like the attached should
> fix the issue?

Ah, I see you came to the same conclusion I did.  But see comment
about adding a comment.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] uninterruptible state in 10beta4

2017-09-13 Thread Tom Lane
Jeff Janes  writes:
> In 10beta4 and 11dev, If I run the below it enters an uninterruptible
> state.  After the insert starts, I give 15 or seconds or so until the
> memory usage starts to grow due to enqueued triggers checks. Then I can't
> interrupt it with either ctrl-C in psql or kill -15  from another
> terminal.

Hm, I suspect the culprit is that the fast path out of ExecScan()
fails to include a CHECK_FOR_INTERRUPTS.  It might be best to take
the CHECK_FOR_INTERRUPTS at line 160 and put it into ExecScanFetch
instead (but if so, that function's comment could use adjustment).
Andres?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] uninterruptible state in 10beta4

2017-09-13 Thread Jeff Janes
On Wed, Sep 13, 2017 at 2:41 PM, Andres Freund  wrote:

> Hi,
>
> On 2017-09-13 14:28:34 -0700, Jeff Janes wrote:
> > In 10beta4 and 11dev, if I run the below it enters an uninterruptible
> > state.  After the insert starts, I give it 15 seconds or so until the
> > memory usage starts to grow due to enqueued trigger checks. Then I can't
> > interrupt it with either ctrl-C in psql or kill -15  from another
> > terminal.
> >
> > I have to do kill -9 
> >
> > create table foo  (x int);
> > create or replace function notice () returns trigger as $$ begin raise
> > notice 'asdfsdf'; return NEW; END;$$ language plpgsql;
> > create trigger foobar after insert on foo for each row execute procedure
> > notice();
> > insert into foo select * from generate_series(1,1);
> >
> > Git bisect lays the blame here which certainly seems plausible:
> >
> > commit d47cfef7116fb36349949f5c757aa2112c249804
> > Author: Andres Freund 
> > Date:   Tue Jul 25 17:37:17 2017 -0700
> >
> > Move interrupt checking from ExecProcNode() to executor nodes.
>
> Indeed that seems plausible. I guess something like the attached should
> fix the issue?
>

Yep, that fixes it.

Thanks,

Jeff


Re: [HACKERS] uninterruptible state in 10beta4

2017-09-13 Thread Andres Freund
Hi,

On 2017-09-13 14:28:34 -0700, Jeff Janes wrote:
> In 10beta4 and 11dev, if I run the below it enters an uninterruptible
> state.  After the insert starts, I give it 15 seconds or so until the
> memory usage starts to grow due to enqueued trigger checks. Then I can't
> interrupt it with either ctrl-C in psql or kill -15  from another
> terminal.
> 
> I have to do kill -9 
> 
> create table foo  (x int);
> create or replace function notice () returns trigger as $$ begin raise
> notice 'asdfsdf'; return NEW; END;$$ language plpgsql;
> create trigger foobar after insert on foo for each row execute procedure
> notice();
> insert into foo select * from generate_series(1,1);
> 
> Git bisect lays the blame here which certainly seems plausible:
> 
> commit d47cfef7116fb36349949f5c757aa2112c249804
> Author: Andres Freund 
> Date:   Tue Jul 25 17:37:17 2017 -0700
> 
> Move interrupt checking from ExecProcNode() to executor nodes.

Indeed that seems plausible. I guess something like the attached should
fix the issue?

Greetings,

Andres Freund
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 47a34a044a..3fec198589 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -40,6 +40,8 @@ ExecScanFetch(ScanState *node,
 {
 	EState	   *estate = node->ps.state;
 
+	CHECK_FOR_INTERRUPTS();
+
 	if (estate->es_epqTuple != NULL)
 	{
 		/*
@@ -157,8 +159,6 @@ ExecScan(ScanState *node,
 	{
 		TupleTableSlot *slot;
 
-		CHECK_FOR_INTERRUPTS();
-
 		slot = ExecScanFetch(node, accessMtd, recheckMtd);
 
 		/*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: psql: check env variable PSQL_PAGER

2017-09-13 Thread Thomas Munro
On Wed, Sep 6, 2017 at 4:12 AM, Pavel Stehule  wrote:
> 2017-09-05 18:06 GMT+02:00 Tom Lane :
>> Pushed, with some fooling with the documentation (notably,
>> re-alphabetizing relevant lists).
>>
> Thank you very much

I've started setting PSQL_PAGER="~/bin/pspg -s0" to try your new
column-aware pager from https://github.com/okbob/pspg for my regular
work.  Wow!  It could use some warning clean-up but it's a clever idea
and so far it works really well.  Thanks for making this.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] uninterruptible state in 10beta4

2017-09-13 Thread Jeff Janes
In 10beta4 and 11dev, if I run the below it enters an uninterruptible
state.  After the insert starts, I give it 15 seconds or so until the
memory usage starts to grow due to enqueued trigger checks. Then I can't
interrupt it with either ctrl-C in psql or kill -15  from another
terminal.

I have to do kill -9 

create table foo  (x int);
create or replace function notice () returns trigger as $$ begin raise
notice 'asdfsdf'; return NEW; END;$$ language plpgsql;
create trigger foobar after insert on foo for each row execute procedure
notice();
insert into foo select * from generate_series(1,1);

Git bisect lays the blame here which certainly seems plausible:

commit d47cfef7116fb36349949f5c757aa2112c249804
Author: Andres Freund 
Date:   Tue Jul 25 17:37:17 2017 -0700

Move interrupt checking from ExecProcNode() to executor nodes.



#0  0x003e4e8db7d0 in __write_nocancel () at
../sysdeps/unix/syscall-template.S:82
#1  0x004ef3e1 in XLogFileInit (logsegno=189,
use_existent=0x7ffe52b505df "\001", use_lock=1 '\001') at xlog.c:3222
#2  0x004f15e8 in XLogWrite (WriteRqst=..., flexible=0 '\000') at
xlog.c:2408
#3  0x004f1d89 in AdvanceXLInsertBuffer (upto=3175088128,
opportunistic=) at xlog.c:2114
#4  0x004f1e99 in GetXLogBuffer (ptr=3175088128) at xlog.c:1879
#5  0x004f79fe in CopyXLogRecordToWAL (rdata=0xc921f0,
fpw_lsn=, flags=1 '\001') at xlog.c:1498
#6  XLogInsertRecord (rdata=0xc921f0, fpw_lsn=,
flags=1 '\001') at xlog.c:1073
#7  0x004fb40b in XLogInsert (rmid=10 '\n', info=0 '\000') at
xloginsert.c:462
#8  0x004af8fd in heap_insert (relation=0x7f1456e7be58,
tup=0x7f14568e12d8, cid=, options=,
bistate=) at heapam.c:2537
#9  0x0060db9f in ExecInsert (node=0x14f44b8) at
nodeModifyTable.c:601
#10 ExecModifyTable (node=0x14f44b8) at nodeModifyTable.c:1741
#11 0x005f4ef8 in ExecProcNode (node=0x14f44b8) at
execProcnode.c:422
#12 0x005f1d76 in ExecutePlan (queryDesc=0x1442bb8,
direction=, count=0, execute_once=-72 '\270') at
execMain.c:1693
#13 standard_ExecutorRun (queryDesc=0x1442bb8, direction=, count=0, execute_once=-72 '\270') at execMain.c:362
#14 0x0071c95b in ProcessQuery (plan=,
sourceText=0x149d018 "insert into foo select * from
generate_series(1,1);",
params=0x0, queryEnv=0x0, dest=,
completionTag=0x7ffe52b50cc0 "") at pquery.c:161
#15 0x0071cb95 in PortalRunMulti (portal=0x14caea8, isTopLevel=1
'\001', setHoldSnapshot=0 '\000', dest=0x14f0ca8, altdest=0x14f0ca8,
completionTag=0x7ffe52b50cc0 "") at pquery.c:1286
#16 0x0071d260 in PortalRun (portal=0x14caea8,
count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001',
dest=0x14f0ca8,
altdest=0x14f0ca8, completionTag=0x7ffe52b50cc0 "") at pquery.c:799
#17 0x00719987 in exec_simple_query (query_string=0x149d018 "insert
into foo select * from generate_series(1,1);") at postgres.c:1099
#18 0x0071a8c9 in PostgresMain (argc=,
argv=, dbname=0x14473c0 "jjanes", username=)
at postgres.c:4090
#19 0x006aff4a in BackendRun (argc=,
argv=) at postmaster.c:4357
#20 BackendStartup (argc=, argv=)
at postmaster.c:4029
#21 ServerLoop (argc=, argv=) at
postmaster.c:1753
#22 PostmasterMain (argc=, argv=)
at postmaster.c:1361
#23 0x00631410 in main (argc=1, argv=0x141c2f0) at main.c:228



Cheers,

Jeff


Re: [HACKERS] psql: new help related to variables are not too readable

2017-09-13 Thread Fabien COELHO


Hello Tom,


Probably it needs some rebase after Tom committed result status variables.



As it is a style thing, ISTM that the patch is ready if most people agree
that it is better this way and there is no strong veto against.


FWIW, I think it's a bad idea.  We already nearly-doubled the vertical
space required for this variable list.  That was a heavy cost --- and we
already got at least one complaint about it --- but it seemed warranted
to avoid having to deal with very constrained variable descriptions.
This proposes to make the vertical space nearly triple what it was in v10.
In a typical-size window that's going to have a pretty severe impact on
how much of the list you can see at once.  And the readability gain is
(at least to my eyes) very marginal.


Ok, you do not like it. As Pavel said, it is subjective. When it is a 
matter of taste, people tend to differ, someone will always complain, one 
way or another, and they are neither right nor wrong.


So, is it a -1 or a veto?

If it is the latter, the patch can be marked as "Rejected" and everybody
will get more time for other things :-)


If it is not a veto, people can continue to give their opinions.
Personally I'm fine with a pager, so vertical spacing is fine. I just do
not like paging horizontally.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small patch for pg_basebackup argument parsing

2017-09-13 Thread Pierre Ducroquet
On Wednesday, September 13, 2017 2:06:50 AM CEST Daniel Gustafsson wrote:
> > On 05 Jul 2017, at 08:32, Michael Paquier 
> > wrote:> 
> > On Wed, Jul 5, 2017 at 2:57 PM, Ryan Murphy  wrote:
> >> I tried to apply your patch to test it (though reading Robert's last
> >> comment it seems we wish to have it adjusted before committing)... but
> >> in any case I was not able to apply your patch to the tip of the master
> >> branch (my git apply failed).  I'm setting this to Waiting On Author for
> >> a new patch, and I also agree with Robert that the test can be simpler
> >> and can go in the other order.  If you don't have time to make another
> >> patch, let me know, I may be able to make one.> 
> > git is unhappy even if forcibly applied with patch -p1. You should
> > check for whitespaces at the same time:
> > $ git diff --check
> > src/bin/pg_basebackup/pg_receivewal.c:483: indent with spaces.
> > +char   *strtol_endptr = NULL
> > And there are more than this one.
> 
> Like Michael said above, this patch no longer applies and has some
> whitespace issues.  The conflicts in applying are quite trivial though, so
> it should be easy to create a new version.  Do you plan to work on this
> during this Commitfest so we can move this patch forward?

Yes, my bad. Attached to this email is the new version, which now applies on
master.

Sorry for the delay :(
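
As a general note on the approach, a fully checked integer-option parser
usually also watches errno for range errors; a minimal sketch of such a
helper (not part of the attached patch) could be:

#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse arg as a base-10 integer, rejecting empty input, trailing garbage
 * and out-of-range values.  Returns true and stores the value on success.
 */
static bool
parse_long_option(const char *arg, long *result)
{
    char       *endptr;

    errno = 0;
    *result = strtol(arg, &endptr, 10);

    if (endptr == arg || *endptr != '\0')
        return false;           /* no digits at all, or trailing junk */
    if (errno == ERANGE)
        return false;           /* value out of range for long */
    return true;
}
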
>From 7efc1573c1bcc07c0eaa80912e6e035f2e0d203d Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Wed, 13 Sep 2017 19:51:09 +0200
Subject: [PATCH] Port most calls of atoi(optarg) to strtol

atoi does not allow any kind of data check. If the input data is invalid,
the result will be silently truncated to the valid part of input, or just
0 if no digit was found at the beginning of input.
---
 src/bin/pg_basebackup/pg_basebackup.c  | 11 ++-
 src/bin/pg_basebackup/pg_receivewal.c  | 11 ++-
 src/bin/pg_basebackup/pg_recvlogical.c | 11 ++-
 src/bin/pg_ctl/pg_ctl.c|  9 -
 src/bin/pg_dump/pg_dump.c  | 12 +---
 src/bin/pg_dump/pg_restore.c   |  8 +++-
 src/bin/pg_upgrade/option.c|  5 +++--
 src/bin/pgbench/pgbench.c  | 33 +
 8 files changed, 62 insertions(+), 38 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 51509d150e..b51b62cc21 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -87,7 +87,7 @@ static IncludeWal includewal = STREAM_WAL;
 static bool fastcheckpoint = false;
 static bool writerecoveryconf = false;
 static bool do_sync = true;
-static int	standby_message_timeout = 10 * 1000;	/* 10 sec = default */
+static long int	standby_message_timeout = 10 * 1000;	/* 10 sec = default */
 static pg_time_t last_progress_report = 0;
 static int32 maxrate = 0;		/* no limit by default */
 static char *replication_slot = NULL;
@@ -2072,6 +2072,7 @@ BaseBackup(void)
 int
 main(int argc, char **argv)
 {
+	char *strtol_endptr = NULL;
 	static struct option long_options[] = {
 		{"help", no_argument, NULL, '?'},
 		{"version", no_argument, NULL, 'V'},
@@ -2212,8 +2213,8 @@ main(int argc, char **argv)
 #endif
 break;
 			case 'Z':
-compresslevel = atoi(optarg);
-if (compresslevel < 0 || compresslevel > 9)
+compresslevel = strtol(optarg, &strtol_endptr, 10);
+if (compresslevel < 0 || compresslevel > 9 || (strtol_endptr != optarg + strlen(optarg)))
 {
 	fprintf(stderr, _("%s: invalid compression level \"%s\"\n"),
 			progname, optarg);
@@ -2251,8 +2252,8 @@ main(int argc, char **argv)
 dbgetpassword = 1;
 break;
 			case 's':
-standby_message_timeout = atoi(optarg) * 1000;
-if (standby_message_timeout < 0)
+standby_message_timeout = strtol(optarg, &strtol_endptr, 10) * 1000;
+if ((standby_message_timeout < 0) || (strtol_endptr != optarg + strlen(optarg)))
 {
 	fprintf(stderr, _("%s: invalid status interval \"%s\"\n"),
 			progname, optarg);
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 710a33ab4d..c1651961b5 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -494,6 +494,7 @@ main(int argc, char **argv)
 	int			c;
 	int			option_index;
 	char	   *db_name;
+	char	   *strtol_endptr = NULL;
 	uint32		hi, lo;
 
 	progname = get_progname(argv[0]);
@@ -529,7 +530,7 @@ main(int argc, char **argv)
 dbhost = pg_strdup(optarg);
 break;
 			case 'p':
-if (atoi(optarg) <= 0)
+if ((strtol(optarg, &strtol_endptr, 10) <= 0) || (strtol_endptr != optarg + strlen(optarg)))
 {
 	fprintf(stderr, _("%s: invalid port number \"%s\"\n"),
 			progname, optarg);
@@ -547,8 +548,8 @@ main(int argc, char **argv)
 dbgetpassword = 1;
 break;
 			case 's':
-standby_message_timeout = atoi(optarg) * 1000;
-if (standby_message_timeout < 0)
+standby_mess

Re: [HACKERS] Clarification in pg10's pgupgrade.html step 10 (upgrading standby servers)

2017-09-13 Thread Michael Banck
On Tue, Sep 12, 2017 at 07:38:40PM -0400, Stephen Frost wrote:
> Further, really, I think we should provide a utility to do all of the
> above instead of using rsync - and that utility should do some additional
> things, such as:
> 
> - Check that the control file on the primary and replica show that they
>   reached the same point prior to the pg_upgrade.  If they didn't, then
>   things could go badly as there's unplayed WAL that the primary got
>   through and the replica didn't.
> 
> - Not copy over unlogged data, or any other information that shouldn't
>   be copied across.
> 
> - Allow the directory structures to be more different between the
>   primary and the replica than rsync allows (wouldn't have to have a
>   common subdirectory on the replica).
> 
> - Perhaps other validation checks or similar.
> 
> Unfortunately, this is a bit annoying as it necessarily involves running
> things on both the primary and the replica from the same tool, without
> access to PG, meaning we'd have to work through something else (such as
> SSH, like rsync does, but then what would we do for Windows...?).

Maybe pg_rewind's mechanism could be partially reused for this as it
seems to accomplish something vaguely similar AIUI?

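As an illustration of the first check in the quoted list, comparing the two
control files might look roughly like this (a sketch only; the exact
get_controlfile() signature and the shutdown-state handling are assumptions):

#include "postgres_fe.h"
#include "catalog/pg_control.h"
#include "common/controldata_utils.h"

/*
 * Sketch: verify that the primary and the replica reached the same point
 * before pg_upgrade was run.  A real tool would also insist on clean
 * shutdown states (DB_SHUTDOWNED on the primary, DB_SHUTDOWNED_IN_RECOVERY
 * on the replica) and report details instead of returning a bare bool.
 */
static bool
clusters_stopped_at_same_point(const char *primary_datadir,
                               const char *replica_datadir,
                               const char *progname)
{
    ControlFileData *primary = get_controlfile(primary_datadir, progname);
    ControlFileData *replica = get_controlfile(replica_datadir, progname);

    /* "Latest checkpoint location" must match on both sides. */
    return primary->checkPoint == replica->checkPoint;
}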

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug with pg_basebackup and 'shared' tablespace

2017-09-13 Thread Pierre Ducroquet
On Wednesday, September 13, 2017 6:01:43 PM CEST you wrote:
> > On 15 May 2017, at 07:26, Michael Paquier 
> > wrote:> 
> > On Tue, May 2, 2017 at 2:28 AM, Pierre Ducroquet  
wrote:
> >> I will submit this patch in the current commit fest.
> > 
> > I have not spotted any flaws in the refactored logic.
> 
> This patch no longer applies, could you take a look at it and submit a new
> version rebased on top of HEAD?
> 
> cheers ./daniel

Hi

All my apologies for the shockingly long time with no answer on this topic.
Attached to this email is the new version of the patch, checked against HEAD 
and REL_10_STABLE, with the small change required by Michael (assigning true/
false to the boolean instead of 1/0) applied.
I will do my best to help review some patches in the current CF.

 Pierre

>From cfb47eb5db398c1a30c5f83c680b59b4aa3d196a Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Wed, 13 Sep 2017 19:09:32 +0200
Subject: [PATCH] Allow a pg_basebackup when a tablespace is shared between two
 versions

When a tablespace folder is shared between two PostgreSQL versions,
pg_basebackup fails because its tablespace folder checking is stricter
than what is done in the server.
That behaviour makes it possible to create clusters that will then be
complicated to replicate without playing with symlinks.

This patch fixes this by delaying the tablespace folder verification.
The folder name uses the PG catalog version, which cannot be
obtained from the server. It is a compile-time constant that is not
exposed publicly.
The fix is thus to simply delay the checking of folders and use the
folder name from the tablespace tarball.
---
 src/bin/pg_basebackup/pg_basebackup.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index dfb9b5ddcb..080468d846 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1310,6 +1310,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 	pgoff_t		current_len_left = 0;
 	int			current_padding = 0;
 	bool		basetablespace;
+	bool		firstfile = 1;
 	char	   *copybuf = NULL;
 	FILE	   *file = NULL;
 
@@ -1403,7 +1404,15 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 	 * Directory
 	 */
 	filename[strlen(filename) - 1] = '\0';	/* Remove trailing slash */
-	if (mkdir(filename, S_IRWXU) != 0)
+	if (firstfile && !basetablespace)
+	{
+		/*
+		 * The first file in the tablespace is its main folder, whose name can not be guessed (PG_MAJORVER_CATVER)
+		 * So we must check here that this folder can be created or is empty.
+		 */
+		verify_dir_is_empty_or_create(filename, &made_tablespace_dirs, &found_tablespace_dirs);
+	}
+	else if (mkdir(filename, S_IRWXU) != 0)
 	{
 		/*
 		 * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
@@ -1534,6 +1543,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 continue;
 			}
 		}		/* continuing data in existing file */
+		firstfile = 0;	/* mark that we are done with the first file of the tarball */
 	}			/* loop over all data blocks */
 	progress_report(rownum, filename, true);
 
@@ -1848,18 +1858,6 @@ BaseBackup(void)
 	for (i = 0; i < PQntuples(res); i++)
 	{
 		totalsize += atol(PQgetvalue(res, i, 2));
-
-		/*
-		 * Verify tablespace directories are empty. Don't bother with the
-		 * first once since it can be relocated, and it will be checked before
-		 * we do anything anyway.
-		 */
-		if (format == 'p' && !PQgetisnull(res, i, 1))
-		{
-			char	   *path = (char *) get_tablespace_mapping(PQgetvalue(res, i, 1));
-
-			verify_dir_is_empty_or_create(path, &made_tablespace_dirs, &found_tablespace_dirs);
-		}
 	}
 
 	/*
-- 
2.14.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-13 Thread Ashutosh Bapat
On Wed, Sep 13, 2017 at 12:51 PM, Ashutosh Bapat
 wrote:
> On Wed, Sep 13, 2017 at 12:39 AM, Robert Haas  wrote:
>> On Tue, Sep 12, 2017 at 3:46 AM, Amit Langote
>>  wrote:
>>> In this case, AcquireExecutorLocks will lock all the relations in
>>> PlannedStmt.rtable, which must include all partitioned tables of all
>>> partition trees involved in the query.  Of those, it will lock the tables
>>> whose RT indexes appear in PlannedStmt.nonleafResultRelations with
>>> RowExclusiveLock mode.  PlannedStmt.nonleafResultRelations is a global
>>> list of all partitioned table RT indexes obtained by concatenating
>>> partitioned_rels lists of all ModifyTable nodes involved in the query
>>> (set_plan_refs does that).  We need to distinguish nonleafResultRelations,
>>> because we need to take the stronger lock on a given table before any
>>> weaker one if it happens to appear in the query as a non-result relation
>>> too, to avoid lock strength upgrade deadlock hazard.
>>
>> Hmm.  The problem with this theory in my view is that it doesn't
>> explain why InitPlan() and ExecOpenScanRelation() lock the relations
>> instead of just assuming that they are already locked either by
>> AcquireExecutorLocks or by planning.  If ExecLockNonLeafAppendTables()
>> doesn't really need to take locks, then ExecOpenScanRelation() must
>> not need to do it either.  We invented ExecLockNonLeafAppendTables()
>> on the occasion of removing the scans of those tables which would
>> previously have caused ExecOpenScanRelation() to be invoked, so as to
>> keep the locking behavior unchanged.
>>
>> AcquireExecutorLocks() looks like an odd bit of code to me.  The
>> executor itself locks result tables in InitPlan() and then everything
>> else during InitPlan() and all of the others later on while walking
>> the plan tree -- comments in InitPlan() say that this is to avoid a
>> lock upgrade hazard if a result rel is also a source rel.  But
>> AcquireExecutorLocks() has no such provision; it just locks everything
>> in RTE order.  In theory, that's a deadlock hazard of another kind, as
>> we just talked about in the context of EIBO.  In fact, expanding in
>> bound order has made the situation worse: before, expansion order and
>> locking order were the same, so maybe having AcquireExecutorLocks()
>> work in RTE order coincidentally happened to give the same result as
>> the executor code itself as long as there are no result relations.
>> But this is certainly not true any more.  I'm not sure it's worth
>> expending a lot of time on this -- it's evidently not a problem in
>> practice, or somebody probably would've complained before now.
>>
>> But that having been said, I don't think we should assume that all the
>> locks taken from the executor are worthless because plancache.c will
>> always do the job for us.  I don't know of a case where we execute a
>> saved plan without going through the plan cache, but that doesn't mean
>> that there isn't one or that there couldn't be one in the future.
>> It's not the job of these partitioning patches to whack around the way
>> we do locking in general -- they should preserve the existing behavior
>> as much as possible.  If we want to get rid of the locking in the
>> executor altogether, that's a separate discussion where, I have a
>> feeling, there will prove to be better reasons for the way things are
>> than we are right now supposing.
>>
>
> I agree that it's not the job of these patches to change the locking
> or even get rid of partitioned_rels. In order to continue returning
> partitioned_rels in Append paths esp. in the case of queries involving
> set operations and partitioned table e.g "select 1 from t1 union all
> select 2 from t1;" in which t1 is multi-level partitioned table, we
> need a fix in add_paths_to_append_rels(). The fix provided in [1] is
> correct but we will need a longer explanation of why we have to
> involve RTE_SUBQUERY with RELKIND_PARTITIONED_TABLE. The explanation
> is complicated. If we get rid of partitioned_rels, we don't need to
> fix that code in add_paths_to_append_rel().
>
> I suggested that [2]
> -- (excerpt from [2])
>
> Actually, the original problem that caused this discussion started
> with an assertion failure in get_partitioned_child_rels() as
> Assert(list_length(result) >= 1);
>
> This assertion fails if result is NIL when an intermediate partitioned
> table is passed. May be we should assert (result == NIL ||
> list_length(result) == 1) and allow that function to be called even
> for intermediate partitioned partitions for which the function will
> return NIL. That will leave the code in add_paths_to_append_rel()
> simple. Thoughts?
> --
>
> Amit Langote agrees with this. It kind of makes the assertion lame but
> keeps the code sane. What do you think?

I debugged what happens in case of query "select 1 from t1 union all
select 2 from t1;" with the current HEAD (without multi-level
expansion patch attached). It doesn't set partitioned_rels in Append
path th
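
For readers following along, here is a minimal sketch (hypothetical names,
PostgreSQL 10 syntax) of a multi-level partitioned table against which the
query quoted above can be run:

    CREATE TABLE t1 (a int, b int) PARTITION BY RANGE (a);

    -- intermediate partitioned table (second level)
    CREATE TABLE t1_p1 PARTITION OF t1
        FOR VALUES FROM (0) TO (100) PARTITION BY RANGE (b);
    CREATE TABLE t1_p1_p1 PARTITION OF t1_p1 FOR VALUES FROM (0) TO (100);

    -- leaf partition at the first level
    CREATE TABLE t1_p2 PARTITION OF t1 FOR VALUES FROM (100) TO (200);

    -- the set-operation query from the discussion
    SELECT 1 FROM t1 UNION ALL SELECT 2 FROM t1;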

Re: [HACKERS] expanding inheritance in partition bound order

2017-09-13 Thread Robert Haas
On Wed, Sep 13, 2017 at 6:02 AM, Amit Langote
 wrote:
> It seems to me we don't really need the first patch all that much.  That
> is, let's keep PartitionDispatchData the way it is for now, since we don't
> really have any need for it beside tuple-routing (EIBO as committed didn't
> need it for one).  So, let's forget about "decoupling
> RelationGetPartitionDispatchInfo() from the executor" thing for now and
> move on.
>
> So, attached is just the patch to make RelationGetPartitionDispatchInfo()
> traverse the partition tree in depth-first manner to be applied on HEAD.

I like this patch.  Not only does it improve the behavior, but it's
actually less code than we have now, and in my opinion, the new code
is easier to understand, too.

A few suggestions:

- I think get_partition_dispatch_recurse() should get a check_stack_depth()
call just like expand_partitioned_rtentry() did, and for the same
reasons: if the catalog contents are corrupted so that we have an
infinite loop in the partitioning hierarchy, we want to error out, not
crash.

- I think we should add a comment explaining that we're careful to do
this in the same order as expand_partitioned_rtentry() so that callers
can assume that the N'th entry in the leaf_part_oids array will also
be the N'th child of an Append node.

+ * For every partitioned table other than root, we must store a

other than the root

+ * partitioned table.  The value multiplied back by -1 is returned as the

multiplied by -1, not multiplied back by -1

+ * tables in the tree, using which, search is continued further down the
+ * partition tree.

Period after "in the tree".  Then continue: "This value is used to
continue the search in the next level of the partition tree."

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clarification in pg10's pgupgrade.html step 10 (upgrading standby servers)

2017-09-13 Thread Stephen Frost
Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
> I have applied the attached patch to show examples of using rsync on
> PGDATA and tablespaces, documented that rsync is only useful when in
> link mode, and explained more clearly how rsync handles links.  You can
> see the results here:
> 
>   http://momjian.us/pgsql_docs/pgupgrade.html
> 
> Any more improvements?

First off, I'd strongly suggest that we make "Step 1" in the pg_upgrade
process be "take a full backup and verify that you're able to restore it
successfully and without corruption."

I don't particularly care for how this seems to imply that the Rsync
method is "the" method to use when --link mode is used with pg_upgrade.

I'd reword the section title to be along these lines:

If you have streaming replicas or log-shipping standby servers then they
will also need to be updated.  The simplest way to accomplish this is to
simply rebuild the replicas from scratch once the primary is back
online.  Unfortunately, that can take a long time for larger systems as
the data has to be copied from the primary to each replica in the
environment.  If --link mode was used with pg_upgrade, the Latest
checkpoint location matches between the primary and the replica(s) (as
discussed in Step 8), the rsync utility is available, and the existing
data directory and new data directory on the replica are able to be in a
common directory on the same filesystem (as is required on the primary
for --link mode to be used), then an alternative method may be used to
update the replicas using rsync which will generally require much less
time.

Note that this method will end up needlessly copying across temporary
files and unlogged tables.  If these make up a large portion of your
database, then rebuilding the replicas from scratch may be a better
option.

With this method, you will not be running pg_upgrade on the standby
servers, but rather rsync on the primary to sync the replicas to match
the results of the pg_upgrade on the primary.  Do not start any servers
yet.  If you did not use link mode, skip the instructions in this
section and simply recreate the standby servers.

This method requires that the *old* data directory on the replica be in
place as rsync will be creating a hard-link tree between the old data
files on the replica and the new data directory on the replica (as was
done by pg_upgrade on the primary).

a. Install the new PostgreSQL binaries on standby servers.

...

b. Make sure the new standby data directories do not exist

If initdb was run on the replica to create a new data directory, remove
that new data directory (the rsync will recreate it).  Do *not* remove
the existing old data directory.

c. Install custom shared object files

 ** I would probably move this up to be step 'b' instead, and make step
 'b' be step 'c' instead.

d. Stop standby servers

...

*new*
e. Verify/re-verify that Latest checkpoint location in pg_controldata
   on the replica matches that of the primary (from before the primary
   was upgraded with pg_upgrade).

f. Save configuration files

  ** this should have a caveat that it's only necessary if the config
  files are in the data directory.

g. Run rsync

  ** I am having a hard time figuring out why --delete makes sense here.
  There shouldn't be anything in the new data directory, and we don't
  actually need to delete anything in the old data directory on the
  replica, so what are we doing suggesting --delete be used?  Strikes me
  as unnecessarily adding risk, should someone end up doing the wrong
  command.  Also, again, if I was doing this, I'd absolutely run rsync
  with --dry-run for starters and review what it is going to do and make
  sure that's consistent with what I'd expect.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b

2017-09-13 Thread Robert Haas
On Wed, Sep 13, 2017 at 10:54 AM, Dean Rasheed  wrote:
> Oracle, MySQL and DB2 all use MINVALUE/MAXVALUE. Actually, Oracle and
> MySQL only use MAXVALUE, not MINVALUE, because they don't allow gaps
> between partitions and the first partition implicitly starts at
> MINVALUE, so the bounds that we currently support are a strict
> superset of those supported by Oracle and MySQL.
>
> Both Oracle and MySQL allow finite values after MAXVALUE (usually
> listed as "0" in code examples, e.g. see [1]). Oracle explicitly
> documents the fact that values after MAXVALUE are irrelevant in [1].
> I'm not sure if MySQL explicitly documents that, but it does behave
> the same.
>
> Also, both Oracle and MySQL store what the user entered (they do not
> canonicalise), as can be seen by looking at ALL_TAB_PARTITIONS in
> Oracle, or "show create table" in MySQL.

OK, thanks.  I still don't really like allowing this, but I can see
that compatibility with other systems has some value here, and if
nobody else is rejecting these cases, maybe we shouldn't either.  So
I'll hold my nose and change my vote to canonicalizing rather than
rejecting outright.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug with pg_basebackup and 'shared' tablespace

2017-09-13 Thread Daniel Gustafsson
> On 15 May 2017, at 07:26, Michael Paquier  wrote:
> 
> On Tue, May 2, 2017 at 2:28 AM, Pierre Ducroquet  wrote:
> 
>> I will submit this patch in the current commit fest.
> 
> I have not spotted any flaws in the refactored logic.

This patch no longer applies, could you take a look at it and submit a new
version rebased on top of HEAD?

cheers ./daniel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Surjective functional indexes

2017-09-13 Thread Konstantin Knizhnik



On 13.09.2017 14:00, Simon Riggs wrote:

On 13 September 2017 at 11:30, Konstantin Knizhnik
 wrote:


The only reason of all this discussion about terms is that I need to choose
name for correspondent index option.
Simon think that we do not need this option at all. In this case we should
not worry about right term.
 From my point of view, "projection" is quite clear notion and not only for
mathematics. It is also widely used in IT and especially in DBMSes.

If we do have an option it won't be using fancy mathematical
terminology at all, it would be described in terms of its function,
e.g. recheck_on_update

Yes, I'd rather not have an option at all, just some simple code with
useful effect, like we have in many other places.


Yehhh,
After more thinking I found out that my idea to use table/index 
statistics (particularly the number of distinct values) to determine 
projection functions was wrong.
Consider the case of a column bookinfo of jsonb type and an index expression 
(bookinfo->'ISBN').
Both can be considered unique. But this is an obvious example of a 
projection function whose value is not changed if we update other 
information related to this book.
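
A minimal SQL sketch of that example (hypothetical table and index names; the
proposed recheck_on_update-style option is not shown, since its name and
syntax were still under discussion):

    CREATE TABLE books (id serial PRIMARY KEY, bookinfo jsonb);
    CREATE INDEX books_isbn_idx ON books ((bookinfo -> 'ISBN'));

    -- Updating unrelated parts of bookinfo leaves bookinfo -> 'ISBN'
    -- unchanged, so comparing the old and new values of the index
    -- expression would show that the index entry need not be touched.
    UPDATE books
       SET bookinfo = jsonb_set(bookinfo, '{price}', '42')
     WHERE id = 1;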


So this approach doesn't work. It looks like the only thing we can do to 
autotune is to collect our own statistics: how frequently a change of the 
attribute(s) doesn't affect the result of the function.
By default we can consider the function a projection and perform a 
comparison of the old/new function results.
If after some number of comparisons the fraction of hits (cases where the 
value of the function is not changed) is smaller than some threshold 
(0.5?, 0.9?, ...) then we can mark the index as non-projective
and eliminate these checks in the future. But that will require extending 
the index statistics. Do we really need/want it?


Despite the possibility of implementing autotune, I still think that we 
should have a manual switch, no matter how it is named.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-09-13 Thread Stephen Frost
Tom, all,

* Stephen Frost (sfr...@snowman.net) wrote:
> Alright, here's an updated patch which cleans things up a bit and adds
> comments to explain what's going on.  I also updated the comments in
> acl.h to explain that ordering actually does matter.

Getting back to this, here's rebased patches for master/v10 and 9.6
(which only had whitespace differences, probably pgindent to blame
there..).

I'm going to push these later today unless there's other comments on it.

Thanks!

Stephen
From ac86eb5451492bcb72f25cceab4ae467716ea073 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Thu, 3 Aug 2017 21:23:37 -0400
Subject: [PATCH] Fix ordering in pg_dump of GRANTs

The order in which GRANTs are output is important as GRANTs which have
been GRANT'd by individuals via WITH GRANT OPTION GRANTs have to come
after the GRANT which included the WITH GRANT OPTION.  This happens
naturally in the backend during normal operation as we only change
existing ACLs in-place, only add new ACLs to the end, and when removing
an ACL we remove any which depend on it also.

Also, adjust the comments in acl.h to make this clear.

Unfortunately, the updates to pg_dump to handle initial privileges
involved pulling apart ACLs and then combining them back together and
could end up putting them back together in an invalid order, leading to
dumps which wouldn't restore.

Fix this by adjusting the queries used by pg_dump to ensure that the
ACLs are rebuilt in the same order in which they were originally.

Back-patch to 9.6 where the changes for initial privileges were done.
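
To illustrate the ordering dependency described above with a minimal,
hypothetical example: the second GRANT below is only possible because the
first one was issued WITH GRANT OPTION, so a dump has to replay them in
exactly this order.

    GRANT SELECT ON mytable TO alice WITH GRANT OPTION;

    SET ROLE alice;
    GRANT SELECT ON mytable TO bob;   -- depends on alice's grant existing first
    RESET ROLE;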
---
 src/bin/pg_dump/dumputils.c | 51 -
 src/include/utils/acl.h | 14 ++---
 2 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
index dfc611848b..e4c95feb63 100644
--- a/src/bin/pg_dump/dumputils.c
+++ b/src/bin/pg_dump/dumputils.c
@@ -722,21 +722,36 @@ buildACLQueries(PQExpBuffer acl_subquery, PQExpBuffer racl_subquery,
 	 * We always perform this delta on all ACLs and expect that by the time
 	 * these are run the initial privileges will be in place, even in a binary
 	 * upgrade situation (see below).
+	 *
+	 * Finally, the order in which privileges are in the ACL string (the order
+	 * they been GRANT'd in, which the backend maintains) must be preserved to
+	 * ensure that GRANTs WITH GRANT OPTION and subsequent GRANTs based on
+	 * those are dumped in the correct order.
 	 */
-	printfPQExpBuffer(acl_subquery, "(SELECT pg_catalog.array_agg(acl) FROM "
-	  "(SELECT pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) AS acl "
-	  "EXCEPT "
-	  "SELECT pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s as foo)",
+	printfPQExpBuffer(acl_subquery,
+	  "(SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM "
+	  "(SELECT acl, row_n FROM "
+	  "pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) "
+	  "WITH ORDINALITY AS perm(acl,row_n) "
+	  "WHERE NOT EXISTS ( "
+	  "SELECT 1 FROM "
+	  "pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) "
+	  "AS init(init_acl) WHERE acl = init_acl)) as foo)",
 	  acl_column,
 	  obj_kind,
 	  acl_owner,
 	  obj_kind,
 	  acl_owner);
 
-	printfPQExpBuffer(racl_subquery, "(SELECT pg_catalog.array_agg(acl) FROM "
-	  "(SELECT pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) AS acl "
-	  "EXCEPT "
-	  "SELECT pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s as foo)",
+	printfPQExpBuffer(racl_subquery,
+	  "(SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM "
+	  "(SELECT acl, row_n FROM "
+	  "pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) "
+	  "WITH ORDINALITY AS initp(acl,row_n) "
+	  "WHERE NOT EXISTS ( "
+	  "SELECT 1 FROM "
+	  "pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) "
+	  "AS permp(orig_acl) WHERE acl = orig_acl)) as foo)",
 	  obj_kind,
 	  acl_owner,
 	  acl_column,
@@ -761,19 +776,25 @@ buildACLQueries(PQExpBuffer acl_subquery, PQExpBuffer racl_subquery,
 	{
 		printfPQExpBuffer(init_acl_subquery,
 		  "CASE WHEN privtype = 'e' THEN "
-		  "(SELECT pg_catalog.array_agg(acl) FROM "
-		  "(SELECT pg_catalog.unnest(pip.initprivs) AS acl "
-		  "EXCEPT "
-		  "SELECT pg_catalog.unnest(pg_catalog.acldefault(%s,%s))) as foo) END",
+		  "(SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM "
+		  "(SELECT acl, row_n FROM pg_catalog.unnest(pip.initprivs) "
+		  "WITH ORDINALITY AS initp(acl,row_n) "
+		  "WHERE NOT EXISTS ( "
+		  "SELECT 1 FROM "
+		  "pg_catalog.unnest(pg_catalog.acldefault(%s,%s)) "
+		  "AS privm(orig_acl) WHERE acl = orig_acl)) as foo) END",
 		  obj_kind,
 		  acl_owner);
 
 		printfPQExpBuffer(init_racl_subquery,
 		  "CASE WHEN privtype = 'e' THEN "
 		  "(SE

Re: [HACKERS] Inconsistencies between pg_settings and postgresql.conf

2017-09-13 Thread Adrian Escoms
Hi,

I realized that the parameter 'shared_preload_libraries' used to belong to
category 'Resource Usage / Kernel Resources' but since postgresql version
9.4 it was changed in pg_settings to 'Client Connection Defaults / Shared
Library Preloading' but in postgresql.conf it remains unchanged.
I attach the updated postgresql.conf.sample.diff with this change.
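
For reference, a quick way to check the category (and the subcategory,
splitting on '/' as described earlier in the thread) that pg_settings reports
for this parameter:

    SELECT name,
           trim(split_part(category, '/', 1)) AS category,
           trim(split_part(category, '/', 2)) AS subcategory
      FROM pg_settings
     WHERE name = 'shared_preload_libraries';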

Regards

Adrián Escoms

On Wed, Sep 13, 2017 at 11:51 AM, Adrian Escoms  wrote:

> Hello,
>
> We are working with postgresql.conf configuration file and we have found
> some discrepancies between it and pg_settings in terms of categories and
> subcategories (we have split the field category in pg_settings by '/', the
> first part being 'category', the second 'subcategory')
> We suggest to change the postgresql.conf.sample file with the attached
> diff to resolve these inconsistencies.
> We think that these changes could also be backported to previous
> versions.
>
> Looking forward to your comments,
>
>
> Adrián Escoms
>
diff --git a/src/backend/utils/misc/postgresql.conf.sample 
b/src/backend/utils/misc/postgresql.conf.sample
index 53aa006..49d8b91 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -105,7 +105,7 @@
 
 
 #--
-# RESOURCE USAGE (except WAL)
+# RESOURCE USAGE
 #--
 
 # - Memory -
@@ -138,11 +138,10 @@
 #temp_file_limit = -1  # limits per-process temp file space
# in kB, or -1 for no limit
 
-# - Kernel Resource Usage -
+# - Kernel Resource -
 
 #max_files_per_process = 1000  # min 25
# (change requires restart)
-#shared_preload_libraries = '' # (change requires restart)
 
 # - Cost-Based Vacuum Delay -
 
@@ -172,7 +171,7 @@
 
 
 #--
-# WRITE AHEAD LOG
+# WRITE-AHEAD LOG
 #--
 
 # - Settings -
@@ -228,7 +227,7 @@
 # REPLICATION
 #--
 
-# - Sending Server(s) -
+# - Sending Servers -
 
 # Set these on the master and on any standby that will send replication data.
 
@@ -336,7 +335,7 @@
 
 
 #--
-# ERROR REPORTING AND LOGGING
+# REPORTING AND LOGGING
 #--
 
 # - Where to Log -
@@ -471,8 +470,9 @@
# -1 disables, 0 logs all temp files
 #log_timezone = 'GMT'
 
-
-# - Process Title -
+#--
+# PROCESS TITLE
+#--
 
 #cluster_name = '' # added to process titles if nonempty
# (change requires restart)
@@ -480,10 +480,10 @@
 
 
 #--
-# RUNTIME STATISTICS
+# STATISTICS
 #--
 
-# - Query/Index Statistics Collector -
+# - Query and Index Statistics Collector -
 
 #track_activities = on
 #track_counts = on
@@ -493,7 +493,7 @@
 #stats_temp_directory = 'pg_stat_tmp'
 
 
-# - Statistics Monitoring -
+# - Monitoring -
 
 #log_parser_stats = off
 #log_planner_stats = off
@@ -502,7 +502,7 @@
 
 
 #--
-# AUTOVACUUM PARAMETERS
+# AUTOVACUUM
 #--
 
 #autovacuum = on   # Enable autovacuum subprocess?  'on'
@@ -587,12 +587,16 @@
 # default configuration for text search
 #default_text_search_config = 'pg_catalog.simple'
 
-# - Other Defaults -
+# - Shared Library Preloading -
 
-#dynamic_library_path = '$libdir'
+#shared_preload_libraries = '' # (change requires restart)
 #local_preload_libraries = ''
 #session_preload_libraries = ''
 
+# - Other Defaults -
+
+#dynamic_library_path = '$libdir'
+
 
 #--
 # LOCK MANAGEMENT
@@ -610,7 +614,7 @@
 
 
 #--
-# VERSION/PLATFORM COMPATIBILITY
+# VERSION AND PLATFORM COMPATIBILITY
 #--
 
 # - Previous PostgreSQL Versions -

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-13 Thread Masahiko Sawada
On Thu, Sep 14, 2017 at 12:04 AM, Masahiko Sawada  wrote:
> On Wed, Sep 13, 2017 at 8:00 PM, Arseny Sher  wrote:
>> Peter Eisentraut  writes:
>>> We can break this in any number of ways:
>>>
>>> - (your patch) Kill workers right away after ALTER SUBSCRIPTION DISABLE,
>>> thus breaking the appearance of transactional DDL somewhat.
>>> ...
>>> - Have DROP SUBSCRIPTION attempt to kill workers if the subscription is
>>> disabled (and possibly, was changed in the same transaction), which
>>> would address this scenario very narrowly.
>>
>> Actually, my patch is closer to the last variant. I proposed to kill the
>> workers in DROP SUBSCRIPTION, and only if we are dropping replication
>> origin (which is probably always the case, though). I agree that it is
>> somewhat narrow and still slightly violates transactionality of DROP
>> SUBSCRIPTION, meaning that its changes (stopped workers) are seen before
>> the commit.
>>
>> However, do we care much about that? Is there any chance that users will
>> rely on living apply workers after DROP SUBSCRIPTION, but before the
>> transaction commit? In which situation this might be useful?
>>
>> On the other hand, forbidding to execute disable sub and drop sub in one
>> transaction makes it impossible e.g. to drop subscription in trigger as
>> long as Postgres doesn't have autonomous transactions.
>>
>>
>> Tom Lane  writes:
>>> ISTM the second of those (refuse to drop an in-use subscription) is
>>> by far the least surprising behavior.  However, I wonder if there aren't
>>> race conditions involved here.  What if we haven't yet committed a
>>> DROP SUBSCRIPTION, and some new worker starts up after we look for
>>> workers?
>>
>> We hold a lock on subscription till the end of transaction, so workers
>> won't start.
>>
>>> If there aren't variants of that that will break all four options,
>>> it's not very obvious why not.
>>
>> I see it this way:
>> * We want effect of drop sub invisible till commit, so we can't stop
>>   workers before commit.
>> * Drop of replication origin needs to be executed in one transaction with
>>   drop sub, it writes to WAL and so must be executed before commit.
>> * Apply worker needs RO for its work, it owns origin for the whole
>>   lifetime.
>>
>> Something should be given up here. One more idea that was not yet
>> mentioned is to abandon attempts to drop subs and ROs simultenously and
>> just garbage-collect replication origins periodically, but that doesn't
>> sound as an elegant solution.
>>
>>
>> Masahiko Sawada  writes:
>>
 I don't think this is reliable -- what if worker suddenly dies without
 accomplishing the job?
>>>
>>> The apply worker will be launched by the launcher later. If DROP
>>> SUBSCRIPTION is issued before the apply worker launches again, DROP
>>> SUBSCRIPTION itself can remove the replication origin.
>>
>> Why launcher would restart the worker if we already destroyed the
>> subscription?
>
> Ah, the apply worker will not launch in that case.
>
>> Consider the sequence of actions:
>>
>> * We check in DROP SUBSCRIPTION that worker alive and don't remove RO.
>> * DROP SUBSCRIPTION commits.
>> * Worker is killed by some villain before it had the chance to drop RO.
>>   It might be killed even before drop sub commit, but after the check,
>>   we are left with orphan RO anyway.
>
> Hmm yes, we could left with orphan the replication origin. It's not a
> good solution.
>
> The second option makes subscription management more complex for
> users. Whenever user wants to drop subscription in use, they have to
> disable it before dropping the subscription, while CREATE SUBSCRIPTION
> starts the logical replication without ALTER SUBSCRIPTION ENABLE. I
> guess disallowing DROP SUBSCRIPTION in a transaction would rather be
> more useful for users because there would not be a lot of users who
> want to manage subscriptions transactionally.
>

Sorry, "The second option" meant the second option of options listed by Peter.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: new help related to variables are not too readable

2017-09-13 Thread Tom Lane
Alvaro Herrera  writes:
> Why is it that we're not opening the pager automatically when this help
> is invoked via psql --help=variables?  "\? variables" already does that.

Hm, given that output from a -c option does get paginated (I just
checked), maybe that should happen.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: distinguish selectivity of < from <= and > from >=

2017-09-13 Thread Tom Lane
Kuntal Ghosh  writes:
> On Tue, Sep 12, 2017 at 9:47 PM, Tom Lane  wrote:
>> Aleksander Alekseev  writes:
>>> The following review has been posted through the commitfest application:
>>> make installcheck-world:  tested, passed
>>> Implements feature:   tested, passed
>>> Spec compliant:   tested, passed
>>> Documentation:tested, passed
>>> Also I didn't manage to find any typos or obvious mistakes in the code.

> I've performed the regression tests for the same. It passed all the
> test cases. Also, I've verified the feature implementation using the
> queries posted by you earlier and some of my own test cases. It is
> working as expected.

Pushed, thanks for reviewing!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-13 Thread Masahiko Sawada
On Wed, Sep 13, 2017 at 8:00 PM, Arseny Sher  wrote:
> Peter Eisentraut  writes:
>> We can break this in any number of ways:
>>
>> - (your patch) Kill workers right away after ALTER SUBSCRIPTION DISABLE,
>> thus breaking the appearance of transactional DDL somewhat.
>> ...
>> - Have DROP SUBSCRIPTION attempt to kill workers if the subscription is
>> disabled (and possibly, was changed in the same transaction), which
>> would address this scenario very narrowly.
>
> Actually, my patch is closer to the last variant. I proposed to kill the
> workers in DROP SUBSCRIPTION, and only if we are dropping replication
> origin (which is probably always the case, though). I agree that it is
> somewhat narrow and still slightly violates transactionality of DROP
> SUBSCRIPTION, meaning that its changes (stopped workers) are seen before
> the commit.
>
> However, do we care much about that? Is there any chance that users will
> rely on living apply workers after DROP SUBSCRIPTION, but before the
> transaction commit? In which situation this might be useful?
>
> On the other hand, forbidding to execute disable sub and drop sub in one
> transaction makes it impossible e.g. to drop subscription in trigger as
> long as Postgres doesn't have autonomous transactions.
>
>
> Tom Lane  writes:
>> ISTM the second of those (refuse to drop an in-use subscription) is
>> by far the least surprising behavior.  However, I wonder if there aren't
>> race conditions involved here.  What if we haven't yet committed a
>> DROP SUBSCRIPTION, and some new worker starts up after we look for
>> workers?
>
> We hold a lock on subscription till the end of transaction, so workers
> won't start.
>
>> If there aren't variants of that that will break all four options,
>> it's not very obvious why not.
>
> I see it this way:
> * We want effect of drop sub invisible till commit, so we can't stop
>   workers before commit.
> * Drop of replication origin needs to be executed in one transaction with
>   drop sub, it writes to WAL and so must be executed before commit.
> * Apply worker needs RO for its work, it owns origin for the whole
>   lifetime.
>
> Something should be given up here. One more idea that was not yet
> mentioned is to abandon attempts to drop subs and ROs simultenously and
> just garbage-collect replication origins periodically, but that doesn't
> sound as an elegant solution.
>
>
> Masahiko Sawada  writes:
>
>>> I don't think this is reliable -- what if worker suddenly dies without
>>> accomplishing the job?
>>
>> The apply worker will be launched by the launcher later. If DROP
>> SUBSCRIPTION is issued before the apply worker launches again, DROP
>> SUBSCRIPTION itself can remove the replication origin.
>
> Why launcher would restart the worker if we already destroyed the
> subscription?

Ah, the apply worker will not launch in that case.

> Consider the sequence of actions:
>
> * We check in DROP SUBSCRIPTION that worker alive and don't remove RO.
> * DROP SUBSCRIPTION commits.
> * Worker is killed by some villain before it had the chance to drop RO.
>   It might be killed even before drop sub commit, but after the check,
>   we are left with orphan RO anyway.

Hmm yes, we could be left with an orphaned replication origin. It's not a
good solution.

The second option makes subscription management more complex for
users. Whenever a user wants to drop a subscription that is in use, they have
to disable it before dropping the subscription, while CREATE SUBSCRIPTION
starts the logical replication without ALTER SUBSCRIPTION ENABLE. I
guess disallowing DROP SUBSCRIPTION in a transaction would be
more useful for users, because there would not be a lot of users who
want to manage subscriptions transactionally.
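
For readers following the thread, a rough sketch (hypothetical subscription
name, slot handling omitted) of the sequence whose behavior is being debated:

    BEGIN;
    ALTER SUBSCRIPTION mysub DISABLE;
    DROP SUBSCRIPTION mysub;   -- the problematic case: the apply worker can
                               -- keep running, and it holds the replication
                               -- origin, until the transaction commits
    COMMIT;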

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b

2017-09-13 Thread Dean Rasheed
On 13 September 2017 at 14:53, Robert Haas  wrote:
> On Wed, Sep 13, 2017 at 4:51 AM, Dean Rasheed  
> wrote:
>> A drawback to doing this is that we lose compatibility with syntaxes
>> supported by other databases, which was part of the reason for
>> choosing the terms MINVALUE and MAXVALUE in the first place.
>>
>> So thinking about this afresh, my preference would actually be to just
>> canonicalise the values stored rather than erroring out.
>
> Can you be more specific about what other databases do here?  Which
> other systems support MINVALUE/MAXVALUE, and what are their respective
> behaviors in this situation?
>

Oracle, MySQL and DB2 all use MINVALUE/MAXVALUE. Actually, Oracle and
MySQL only use MAXVALUE, not MINVALUE, because they don't allow gaps
between partitions and the first partition implicitly starts at
MINVALUE, so the bounds that we currently support are a strict
superset of those supported by Oracle and MySQL.

Both Oracle and MySQL allow finite values after MAXVALUE (usually
listed as "0" in code examples, e.g. see [1]). Oracle explicitly
documents the fact that values after MAXVALUE are irrelevant in [1].
I'm not sure if MySQL explicitly documents that, but it does behave
the same.

Also, both Oracle and MySQL store what the user entered (they do not
canonicalise), as can be seen by looking at ALL_TAB_PARTITIONS in
Oracle, or "show create table" in MySQL.
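
A sketch in PostgreSQL 10 syntax (hypothetical names) may help.  This form is
uncontroversial:

    CREATE TABLE measurements (a int, b int) PARTITION BY RANGE (a, b);
    CREATE TABLE measurements_p1 PARTITION OF measurements
        FOR VALUES FROM (1, MINVALUE) TO (10, MAXVALUE);

The contested form is a bound such as TO (10, MAXVALUE, 0) on a three-column
key: the trailing 0 can never affect which rows match, and the question in
this thread is whether to store it as written, canonicalise it to MAXVALUE,
or reject it outright.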

I have not used DB2.

Regards,
Dean

[1] https://docs.oracle.com/cd/E18283_01/server.112/e16541/part_admin001.htm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: new help related to variables are not too readable

2017-09-13 Thread Alvaro Herrera
Most of the time I suppose you'd search (using the pager's search
function) whatever you're looking for, rather than read the whole
page from top to bottom.

Why is it that we're not opening the pager automatically when this help
is invoked via psql --help=variables?  "\? variables" already does that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clarification in pg10's pgupgrade.html step 10 (upgrading standby servers)

2017-09-13 Thread Andreas Joseph Krogh
On Wednesday 13 September 2017 at 15:26:27, Bruce Momjian (br...@momjian.us) wrote:
On Wed, Sep 13, 2017 at 01:35:17AM +0200, Andreas Joseph Krogh wrote:
 [snip]
 > I know I'm being a little nitty-gritty here, but if it helps me understand 
it
 > might help others.

 I have applied the attached patch to show examples of using rsync on
 PGDATA and tablespaces, documented that rsync is only useful when in
 link mode, and explained more clearly how rsync handles links.  You can
 see the results here:

 http://momjian.us/pgsql_docs/pgupgrade.html

 Any more improvements?
 
Very nice!
 
For the sake of completeness I think an example of running rsync when 
pg_wal is located outside the data directories would be helpful, especially 
an example of upgrading from 9.6 to 10 because of the name change of pg_xlog -> 
pg_wal.

 --
 Andreas Joseph Krogh


Re: [HACKERS] OpenFile() Permissions Refactor

2017-09-13 Thread David Steele
Hi Peter,

Here's a new patch based on your review.  Where I had a question I made
a choice as described below:

On 9/1/17 1:58 PM, David Steele wrote:
> On 9/1/17 1:15 PM, Peter Eisentraut wrote:
>> On 8/29/17 12:15, David Steele wrote:
>>
>> I wonder whether we even need that much flexibility.  We already set a
>> global umask, so we could just open all files with 0666 and let the
>> umask sort it out.  Then we don't need all the *Perm() variants.
> 
> Well, there's one instance where the *Perm is used:
> 
> diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
> - fd = OpenTransientFile(fnamebuf, O_CREAT | O_WRONLY | O_TRUNC | 
> PG_BINARY,
> -S_IRUSR | S_IWUSR | S_IRGRP 
> | S_IROTH);
> + fd = OpenTransientFilePerm(fnamebuf, O_CREAT | O_WRONLY | O_TRUNC |
> PG_BINARY,
> +S_IRUSR | S_IWUSR | 
> S_IRGRP | S_IROTH);
> 
> I also think it's worth leaving the variants for extensions to use.
> Even though there are no calls in the core extensions it's hard to say
> what might be out there in the field.

These have been left in.

>> I don't like the function-like macros in fd.h.  We can use proper functions.
> 
> I had them as functions originally but thought macros might be
> friendlier with compilers that don't inline.  I'm happy to change them back.

Macros have been converted to functions.

>> I also wonder whether the umask save-and-restore code in copy.c and
>> be-fsstubs.c is sound.  If the AllocateFile() call in between errors
>> out, then there is nothing that restores the original umask.  This might
>> need a TRY/CATCH block, or maybe just a chmod() afterwards.
> 
> Unless I'm mistaken this is a preexisting issue.  Would you prefer I
> submit a different patch for that or combine it into this patch?
> 
> The chmod() implementation seems the safer option to me and produces
> fewer code paths.  It also prevents partially-written files from being
> accessible to any user but postgres.

I went with chmod().  The fix is incorporated in this patch but if you
want it broken out let me know.

>> The mkdir() calls could perhaps use some further refactoring so you
>> don't have to specify the mode everywhere.
> 
> I thought about that but feared it would be considered an overreach.
> Does fd.c seem like a good place for the new function?

New functions MakeDirectory() and MakeDirectoryPerm() have been added to
fd.c.

MakeDirectoryPerm() is used in ipc.c.

>> This kind of code:
>>
>> -   if (stat_buf.st_mode & (S_IRWXG | S_IRWXO))
>> +   if (stat_buf.st_mode & PG_MODE_MASK_DEFAULT)
>> ereport(FATAL,
>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>  errmsg("data directory \"%s\" has group or world access",
>> DataDir),
>> -errdetail("Permissions should be u=rwx (0700).")));
>> +errdetail("Permissions should be (%04o).",
>> PG_DIR_MODE_DEFAULT)));
>>
>> can be problematic, because we are hoping that PG_MODE_MASK_DEFAULT,
>> PG_DIR_MODE_DEFAULT, and the wording in the error message can stay
>> consistent.
> 
> Well, the eventual goal is to make the mask/mode configurable - at least
> to the extent that group access is allowed.  However, I'm happy to leave
> that discussion for another day.

Changes to postmaster.c have been reverted (except to rename mkdir to
MakeDirectory).

Patch v2 is attached.

-- 
-David
da...@pgmasters.net
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index fa409d72b7..3ab1fd2db4 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1869,8 +1869,7 @@ qtext_store(const char *query, int query_len,
*query_offset = off;
 
/* Now write the data into the successfully-reserved part of the file */
-   fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDWR | O_CREAT | PG_BINARY,
-  S_IRUSR | S_IWUSR);
+   fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDWR | O_CREAT | PG_BINARY);
if (fd < 0)
goto error;
 
@@ -1934,7 +1933,7 @@ qtext_load_file(Size *buffer_size)
int fd;
struct stat stat;
 
-   fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDONLY | PG_BINARY, 0);
+   fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDONLY | PG_BINARY);
if (fd < 0)
{
if (errno != ENOENT)
diff --git a/src/backend/access/heap/rewriteheap.c 
b/src/backend/access/heap/rewriteheap.c
index bd560e47e1..f93c194e18 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -1013,8 +1013,7 @@ logical_rewrite_log_mapping(RewriteState state, 
TransactionId xid,
src->off = 0;
memcpy(src->path, path, sizeof(path));
src->vfd = PathNameOpenFile(

Re: [HACKERS] psql: new help related to variables are not too readable

2017-09-13 Thread Pavel Stehule
2017-09-13 16:11 GMT+02:00 Tom Lane :

> Fabien COELHO  writes:
> >> I'll assign this patch to next commitfest
>
> > Probably it needs some rebase after Tom committed result status
> variables.
>
> > As it is a style thing, ISTM that the patch is ready if most people agree
> > that it is better this way and there is no strong veto against.
>
> FWIW, I think it's a bad idea.  We already nearly-doubled the vertical
> space required for this variable list.  That was a heavy cost --- and we
> already got at least one complaint about it --- but it seemed warranted
> to avoid having to deal with very constrained variable descriptions.
> This proposes to make the vertical space nearly triple what it was in v10.
> In a typical-size window that's going to have a pretty severe impact on
> how much of the list you can see at once.  And the readability gain is
> (at least to my eyes) very marginal.
>

Unfortunately, readability is subjective. Right now it is really hard to read
for me. I understand the text is long enough already, and with a lot of empty
lines it will be longer.

Because we cannot use colors or other effects (bold, ...), I am not
comfortable when I read it.

I proposed it because the readability of the current design is not too good, but
it is not a major topic for me, so I should not push it too much.
Somebody can read it better, somebody cannot - it is subjective.

Regards

Pavel

>
> regards, tom lane
>


Re: [HACKERS] SCRAM protocol documentation

2017-09-13 Thread Peter Eisentraut
On 8/11/17 09:27, Peter Eisentraut wrote:
> On 8/11/17 09:06, Álvaro Hernández Tortosa wrote:
>>  Strictly speaking the RFC assumes that the username is at least 1 
>> character. I understand this was precisely Peter's original comment.
> 
> Well, my main point was that the documentation, the code, and the code
> comments all say slightly different things.

To conclude this thread for now, I have removed the offending sentence
from the documentation.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-13 Thread Jesper Pedersen

Hi Amul,

On 09/08/2017 08:40 AM, amul sul wrote:

Rebased 0002 against this commit & renamed to 0001, PFA.



This patch needs a rebase.

Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: new help related to variables are not too readable

2017-09-13 Thread Tom Lane
Fabien COELHO  writes:
>> I'll assign this patch to next commitfest

> Probably it needs some rebase after Tom committed result status variables.

> As it is a style thing, ISTM that the patch is ready if most people agree
> that it is better this way and there is no strong veto against.

FWIW, I think it's a bad idea.  We already nearly-doubled the vertical
space required for this variable list.  That was a heavy cost --- and we
already got at least one complaint about it --- but it seemed warranted
to avoid having to deal with very constrained variable descriptions.
This proposes to make the vertical space nearly triple what it was in v10.
In a typical-size window that's going to have a pretty severe impact on
how much of the list you can see at once.  And the readability gain is
(at least to my eyes) very marginal.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b

2017-09-13 Thread Robert Haas
On Wed, Sep 13, 2017 at 5:05 AM, Amit Langote
 wrote:
>> So thinking about this afresh, my preference would actually be to just
>> canonicalise the values stored rather than erroring out.
>
> Coincidentally, I just wrote the patch for canonicalizing stored values,
> instead of erroring out.  Please see attached if that's what you were
> thinking too.

Coincidentally, I wrote a patch for this too, but mine goes back to
rejecting MINVALUE or MAXVALUE followed by anything else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


enforce-minmaxvalue-consistency.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-13 Thread Alvaro Herrera
Tom Lane wrote:
> Peter Eisentraut  writes:

> > - Disallow DROP SUBSCRIPTION in a transaction under certain
> > circumstances, for example if a transaction has previously manipulated
> > the same subscription.

> ISTM the second of those (refuse to drop an in-use subscription) is
> by far the least surprising behavior.

+1 for that option.  IIRC this has precedent for other object types such
as tables, where we refuse some action if we have already operated on
the table in the same transaction.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Cache data in GetSnapshotData()

2017-09-13 Thread Jesper Pedersen

Hi,

On 08/29/2017 05:04 AM, Mithun Cy wrote:

Test Setting:
=
Server configuration:
./postgres -c shared_buffers=8GB -N 300 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c
maintenance_work_mem=1GB -c checkpoint_completion_target=0.9 -c
wal_buffers=256MB &

pgbench configuration:
scale_factor = 300
./pgbench -c $threads -j $threads -T $time_for_reading -M prepared -S  postgres

The machine has 64 cores with this patch I can see server starts
improvement after 64 clients. I have tested up to 256 clients. Which
shows performance improvement nearly max 39% [1]. This is the best
case for the patch where once computed snapshotData is reused further.

The worst case seems to be small, quick write transactions which
frequently invalidate the cached SnapshotData before it is reused by
any next GetSnapshotData call. As of now, I tested simple update
tests: pgbench -M Prepared -N on the same machine with the above
server configuration. I do not see much change in TPS numbers.

All TPS are median of 3 runs
Clients   TPS-With Patch 05   TPS-Base        %Diff
1         752.461117          755.186777      -0.3%
64        32171.296537        31202.153576    +3.1%
128       41059.660769        40061.929658    +2.49%

I will do some profiling and find out why this case is not costing us
some performance due to caching overhead.



I have done a run with this patch on a 2S/28C/56T/256Gb w/ 2 x RAID10 
SSD machine.


Both for -M prepared, and -M prepared -S I'm not seeing any improvements 
(1 to 375 clients); e.g. +-1%.


Although the -M prepared -S case should improve, I'm not sure that the 
extra overhead in the -M prepared case is worth the added code complexity.


Best regards,
 Jesper


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b

2017-09-13 Thread Robert Haas
On Wed, Sep 13, 2017 at 4:51 AM, Dean Rasheed  wrote:
> A drawback to doing this is that we lose compatibility with syntaxes
> supported by other databases, which was part of the reason for
> choosing the terms MINVALUE and MAXVALUE in the first place.
>
> So thinking about this afresh, my preference would actually be to just
> canonicalise the values stored rather than erroring out.

Can you be more specific about what other databases do here?  Which
other systems support MINVALUE/MAXVALUE, and what are their respective
behaviors in this situation?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clarification in pg10's pgupgrade.html step 10 (upgrading standby servers)

2017-09-13 Thread Bruce Momjian
On Wed, Sep 13, 2017 at 01:35:17AM +0200, Andreas Joseph Krogh wrote:
> On Wednesday 13 September 2017 at 01:00:20, Bruce Momjian
> (br...@momjian.us) wrote:
> (I know this isn't exactly -hackers food, but it seems natural to end this
> thread here)
>  
> Ok, thanks.
> It is clearer what happens now that you've explained that there's a clever
> "rsync-trick" involving 2 directories and making rsync preserving
> hard-links that way on the destination-server. Maybe it's because I'm not a
> native English speaker but it wasn't obvious to me...
>  
> I have my tablespaces laid out like this:
> /storage/fast_ssd/9.6/tablespaces/
> which you correctly say that in practice means that 9.6 files are (I see now
> that I don't need the pg-version in my directory-structure):
> /storage/fast_ssd/9.6/tablespaces//PG_9.6_201608131
>  
> I understand, I hope, that without link-mode rsyncing tablespaces would be 
> like
> this:
> rsync --archive /path/to/tablespace_basedir 
> standby:/path/to/tablespace_basedir
>  
> What would the equivalent be in link-mode, for transferring most efficiently?
> The reason I ask is that it's not immediately obvious to me what "old_datadir"
> and "new_datadir" when rsync'ing tablespaces and pg_wal dirs outside the
> "pg-dirs".
>  
> Speaking of pg_wal, how should this be rsynced now that it's changed its name
> (from pg_xlog), just rsync pg_xlog and rename it?
>  
> I know I'm being a little nitty-gritty here, but if it helps me understand it
> might help others.

I have applied the attached patch to show examples of using rsync on
PGDATA and tablespaces, documented that rsync is only useful when in
link mode, and explained more clearly how rsync handles links.  You can
see the results here:

http://momjian.us/pgsql_docs/pgupgrade.html

Any more improvements?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
new file mode 100644
index f8d9630..60011d8
*** a/doc/src/sgml/ref/pgupgrade.sgml
--- b/doc/src/sgml/ref/pgupgrade.sgml
*** pg_upgrade.exe
*** 421,432 
  Upgrade Streaming Replication and Log-Shipping standby 
servers
  
  
!  If you have Streaming Replication (see ) or Log-Shipping (see ) standby servers, follow these steps to
!  upgrade them.  You will not be running pg_upgrade
!  on the standby servers, but rather rsync on the
!  primary.  Do not start any servers yet.
  
  
  
--- 421,434 
  Upgrade Streaming Replication and Log-Shipping standby 
servers
  
  
!  If you used link mode and have Streaming Replication (see ) or Log-Shipping (see ) standby servers, follow these steps to
!  upgrade them.  You will not be running pg_upgrade on
!  the standby servers, but rather rsync on the primary.
!  Do not start any servers yet.  If you did not use link
!  mode, skip the instructions in this section and simply recreate the
!  standby servers.
  
  
  
*** pg_upgrade.exe
*** 482,490 
Run rsync
  

!From a directory on the primary server that is above the old and
!new database cluster directories, run this on the
!primary for each standby server:
  
  
  rsync --archive --delete --hard-links --size-only old_pgdata new_pgdata 
remote_dir
--- 484,494 
Run rsync
  

!When using link mode, standby servers can be quickly upgraded using
!rsync.  To accomplish this, from a directory on
!the primary server that is above the old and new database cluster
!directories, run this on the primary for each standby
!server:
  
  
  rsync --archive --delete --hard-links --size-only old_pgdata new_pgdata 
remote_dir
*** rsync --archive --delete --hard-links --
*** 492,521 
  
 where old_pgdata and new_pgdata are relative
 to the current directory on the primary, and remote_dir
!is above the old and new cluster directories on
!the standby.  The old and new relative cluster paths
!must match on the primary and standby server.  Consult the
 rsync manual page for details on specifying the
!remote directory, e.g. standbyhost:/opt/PostgreSQL/.

  

!What rsync does is to copy files from the
!primary to the standby, and, if pg_upgrade's
!--link mode was used, link files from the old to
!new clusters on the standby.  It links the same files that
!pg_upgrade linked in the primary old and new
!clusters.  (Of course, linking speeds up rsync.)
!Unfortunately, rsync needlessly copies files
!associated with temporary and unlogged tables because these files
!don't normally exist on standby servers.

Re: [HACKERS] Hooks to track changed pages for backup purposes

2017-09-13 Thread Tomas Vondra


On 09/13/2017 07:53 AM, Andrey Borodin wrote:
>> * I see there are conditions like this:
>>
>>    if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
>>
>> Why is it enough to restrict the block-tracking code to main fork?
>> Aren't we interested in all relation forks?
> fsm, vm and others are small enough to take them 
> 

That seems like an optimization specific to your backup solution, not
necessarily to others and/or to other possible use cases.
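
For reference, the sizes of the individual forks of a relation can be
inspected like this (hypothetical table name):

    SELECT pg_relation_size('some_table', 'main') AS main_fork,
           pg_relation_size('some_table', 'fsm')  AS free_space_map,
           pg_relation_size('some_table', 'vm')   AS visibility_map;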

>> I guess you'll have to explain
>> what the implementation of the hooks is supposed to do, and why these
>> locations for hook calls are the right ones. It's damn impossible to
>> validate the patch without that information.
>>
>> Assuming you still plan to use the hook approach ...
> Yes, I still think hooking is good idea, but you are right - I need
> prototype first. I'll mark patch as Returned with feedback before
> prototype implementation.
> 

OK

>>
 There
 are no arguments fed to this hook, so modules would not be able to
 analyze things in this context, except shared memory and process
 state?
>>>

 Those hooks are put in hot code paths, and could impact performance of
 WAL insertion itself.
>>> I do not think sending few bytes to cached array is comparable to disk
>> write of XLog record. Checking the func ptr is even cheaper with correct
>> branch prediction.
>>>
>>
>> That seems somewhat suspicious, for two reasons. Firstly, I believe we
>> only insert the XLOG records into WAL buffer here, so why should there
>> be any disk write related? Or do you mean the final commit?
> Yes, I mean finally we will be waiting for disk. Hundred empty ptr
> checks are neglectable in comparision with disk.

Aren't we doing these calls while holding XLog locks? IIRC there was
quite a significant performance improvement after Heikki reduced the
amount of code executed while holding the locks.

>>
>> But more importantly, doesn't this kind of information require some
>> durability guarantees? I mean, if it gets lost during server crashes or
>> restarts, doesn't that mean the incremental backups might miss some
>> buffers? I'd guess the hooks will have to do some sort of I/O, to
>> achieve that, no?
> We need durability only on the level of one segment. If we do not have
> info from segment we can just rescan it.
> If we send segment to S3 as one file, we are sure in it's integrity. But
> this IO can by async.
> 
> PTRACK in it's turn switch bits in fork's buffers which are written in
> checkpointer and..well... recovered during recovery. By usual WAL replay
> of recovery.
> 

But how do you do that from the hooks, if they only store the data into
a buffer in memory? Let's say you insert ~8MB of WAL into a segment, and
then the system crashes and reboots. How do you know you have incomplete
information from the WAL segment?

Although, that's probably what wal_switch_hook() might do - sync the
data whenever the WAL segment is switched. Right?

> 
>> From this POV, the idea to collect this information on the backup system
>> (WAL archive) by pre-processing the arriving WAL segments seems like the
>> most promising. It moves the work to another system, the backup system
>> can make it as durable as the WAL segments, etc.
> 
> Well, in some not-so-rare cases users encrypt backups and send them to S3.
> And there is no system with CPUs that can handle that WAL parsing.
> Currently, I'm considering mocking up a prototype for wal-g, which works
> exactly this way.
> 

Why couldn't there be a system with enough CPU power? Sure, if you want
to do this, you'll need a more powerful system, but regular CPUs can do
>1GB/s in AES-256-GCM thanks to AES-NI. Or you could do it on the
database as part of archive_command, before the encryption, of course.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hooks to track changed pages for backup purposes

2017-09-13 Thread Andrey Borodin
Hi!
Thank you for your interest and experiment results.
> 13 сент. 2017 г., в 15:43, Ants Aasma  написал(а):
> 
> On Thu, Aug 31, 2017 at 9:02 AM, Andrey Borodin  wrote:
>> When we have accumulated diff blocknumbers for most of segments we can 
>> significantly speed up method of WAL scanning. If we have blocknumbers for 
>> all segments we can skip WAL scanning at all.
> 
> Have you measured that the WAL scanning is actually a significant
> issue? As a quick experiment I hacked up pg_waldump to just dump block
> references to stdout in binary format. It scanned 2.8GB of WAL in 3.17
> seconds, outputting 9.3M block refs per second. WAL was generated with
> pgbench, synchronous commit off, using 4 cores for 10 minutes - making
> the ratio of work from generating WAL to parsing it be about 750:1.
> 

No, I had not done this measurement myself. Sure, parsing WAL, when it is in
RAM, is not very expensive. Though it can be even cheaper before formatting
the WAL.
I just want to figure out what the best place for this is, if the backup
executable is sharing CPUs with the postmaster.

Best regards, Andrey Borodin.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-09-13 Thread Prabhat Sahu
Hi Thomas,

With lower "shared_buffers" and "work_mem" settings as below, the query
crashes, but the explain plan is still visible.

shared_buffers = 1MB
work_mem = 1MB
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
enable_mergejoin = off
enable_nestloop = off
enable_hashjoin = on
force_parallel_mode = on
seq_page_cost = 0.1
random_page_cost = 0.1
effective_cache_size = 128MB
parallel_tuple_cost = 0
parallel_setup_cost = 0
parallel_synchronization_cost = 0

CREATE TABLE t1 (a int, b text);
INSERT INTO t1 (SELECT x%2, x%2||'_b' FROM
generate_series(1,20) x);
ANALYZE;

postgres=# explain select * from t1, t1 t2 where t1.a = t2.a;
   QUERY PLAN


-
 Gather  (cost=2852.86..16362.74 rows=2069147 width=22)
   Workers Planned: 1
   ->  Parallel Hash Join  (cost=2852.86..16362.74 rows=1217145 width=22)
 Hash Cond: (t1.a = t2.a)
 ->  Parallel Seq Scan on t1  (cost=0.00..1284.57 rows=117647
width=11)
 ->  Parallel Hash  (cost=1284.57..1284.57 rows=117647 width=11)
   ->  Parallel Seq Scan on t1 t2  (cost=0.00..1284.57
rows=117647 width=11)
(7 rows)

postgres=# select * from t1, t1 t2 where t1.a = t2.a;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>


-- After assigning more "shared_buffers" (10MB) and "work_mem" (10MB), the
query executes successfully.

Kindly check if you can reproduce this at your end.

*Thanks & Regards,*

*Prabhat Kumar Sahu*
Mob: 7758988455
Skype ID: prabhat.sahu1984

www.enterprisedb.com


On Wed, Sep 13, 2017 at 12:34 PM, Prabhat Sahu <
prabhat.s...@enterprisedb.com> wrote:

>
> On Thu, Aug 31, 2017 at 6:23 PM, Thomas Munro <
> thomas.mu...@enterprisedb.com> wrote:
>
>> Here's a new rebased and debugged patch set.
>
>
> Hi Thomas,
>
> I have applied the recent patch (v19) and started testing on this feature,
> and I got a crash with the below testcase.
>
> with default setting on "postgres.conf" file
>
> create table tab1 (a int, b text);
> create table tab2 (a int, b text);
> insert into tab1 (select x, x||'_b' from generate_series(1,20) x);
> insert into tab2 (select x%2, x%2||'_b' from
> generate_series(1,20) x);
> ANALYZE;
> select * from tab1 t1, tab2 t2, tab1 t3 where t1.a = t2.a and  t2.b = t3.b
> order by 1;
>
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>
> Kindly check if you can reproduce this at your end.
>
>
> *Thanks & Regards,*
>
> *Prabhat Kumar Sahu*
> Mob: 7758988455
> Skype ID: prabhat.sahu1984
>
> www.enterprisedb.com
> 
>
>


Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-13 Thread Tomas Vondra
Hi Aleksander,

On 09/13/2017 11:49 AM, Aleksander Alekseev wrote:
> Hi Tomas,
> 
> I appreciate your feedback, although it doesn't seem to be completely
> fair. Particularly:
> 
>> You gave everyone about 4 hours to object
> 
> This is not quite accurate since my proposal was sent 2017-09-11
> 09:41:32 and this thread started - 2017-09-12 14:14:55.
> 

Understood. I didn't really consider the first message to be a proposal
with a deadline, as it starts with "here's a crazy idea" and it's not
immediately clear that you intend to proceed with it right away and that
you expect people to object.

The message I referenced is much clearer on this.

>> You just changed the status of 10-15% open patches. I'd expect
>> things like this to be consulted with the CF manager, yet I don't see
>> any comments from Daniel.
> 
> Agreed, that was clearly a mistake; I should have added Daniel to CC. Sorry I
> didn't do that. I've returned all affected patches back to "Needs
> Review". On the bright side, while doing this I've noticed that many
> patches were already updated by their authors.
> 

Yeah.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Logical replication support for initial data copy

2017-09-13 Thread Peter Eisentraut
On 3/30/17 14:04, Fujii Masao wrote:
> On Thu, Mar 23, 2017 at 9:59 PM, Peter Eisentraut  wrote:
>> Logical replication support for initial data copy
> 
> + case T_SQLCmd:
> + if (MyDatabaseId == InvalidOid)
> + ereport(ERROR,
> + (errmsg("not connected to database")));
> 
> This error message doesn't seem to follow the error message style in docs.

I have committed an improved message.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-09-13 Thread Magnus Hagander
On Wed, Sep 13, 2017 at 3:41 AM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> Hi Thomas, Magnus
>
> From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Thomas Munro
> > Since it only conflicts with c7b8998e because of pgindent whitespace
> > movement, I applied it with "patch -p1 --ignore-whitespace" and created
> > a new patch.  See attached.
>
> Thanks, Thomas.  I've added your name in the CF entry so that your name
> will also be listed on the release note, because my patch is originally
> based on your initial try.  Please remove your name just in case you mind
> it.  BTW, your auto-reviewer looks very convenient.  Thank you again for
> your great work.
>
> Magnus, I would be grateful if you could review and commit the patch
> while your memory is relatively fresh.
>
> I've been in a situation which keeps me from doing development recently,
> but I think I can gradually rejoin the community activity soon.
>
>
Hi!

It's my plan to get to this patch during this commitfest. I've been
travelling for open and some 24/7 work so far, but hope to get CFing soon.



-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Getting error message with latest PG source on Windows.

2017-09-13 Thread Peter Eisentraut
On 9/13/17 06:39, Ashutosh Sharma wrote:
> Error 1 error C2065: 'LDAP_NO_ATTRS' : undeclared identifier
> C:\Users\ashu\pgsql\src\backend\libpq\auth.c 2468

 Googling around I see some indications that the macro may not be
 defined in all implementations and that some other projects test if
 it's defined:
>>>
>>> Does this work for you Ashutosh?
>>>
> 
> Thanks for the patch. Yes, it works for me.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-13 Thread Daniel Gustafsson
> On 13 Sep 2017, at 11:49, Aleksander Alekseev  
> wrote:
> 
> Hi Tomas,
> 
> I appreciate your feedback, although it doesn't seem to be completely
> fair.

I would like to stress one thing (and I am speaking only for myself here): this
has been feedback, not criticism.  Your (and everyone else involved in this)
initiative is great, and automation will no doubt make the CF process better.
We just need to fine-tune it a little to make it work for, as well as with, the
community.

cheers ./daniel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GatherMerge misses to push target list

2017-09-13 Thread Rushabh Lathia
On Wed, Sep 6, 2017 at 10:04 AM, Amit Kapila 
wrote:

> During my recent work on costing of parallel paths [1], I noticed that
> we fail to push the target list below GatherMerge in some simple
> cases like below.
>
> Test preparation
> -
> create or replace function simple_func(var1 integer) returns integer
> as $$
> begin
> return var1 + 10;
> end;
> $$ language plpgsql PARALLEL SAFE;
>
> create table t1(c1 int, c2 char(5));
> insert into t1 values(generate_series(1,50), 'aaa');
> set parallel_setup_cost=0;
> set parallel_tuple_cost=0;
> set min_parallel_table_scan_size=0;
> set max_parallel_workers_per_gather=4;
> set cpu_operator_cost=0; set min_parallel_index_scan_size=0;
>
> Case-1
> -
> postgres=# explain (costs off, verbose) select c1,simple_func(c1) from
> t1 where c1 < 1 order by c1;
>  QUERY PLAN
> -
>  Gather Merge
>Output: c1, simple_func(c1)
>Workers Planned: 4
>->  Parallel Index Scan using idx_t1 on public.t1
>  Output: c1
>  Index Cond: (t1.c1 < 1)
> (6 rows)
>
> In the above case, I don't see any reason why we can't push the target
> list expression (simple_func(c1)) down to workers.
>
> Case-2
> --
> set enable_indexonlyscan=off;
> set enable_indexscan=off;
> postgres=# explain (costs off, verbose) select c1,simple_func(c1) from
> t1 where c1 < 1 order by c1;
>  QUERY PLAN
> 
>  Gather Merge
>Output: c1, simple_func(c1)
>Workers Planned: 4
>->  Sort
>  Output: c1
>  Sort Key: t1.c1
>  ->  Parallel Bitmap Heap Scan on public.t1
>Output: c1
>Recheck Cond: (t1.c1 < 1)
>->  Bitmap Index Scan on idx_t1
>  Index Cond: (t1.c1 < 1)
> (11 rows)
>
> It is different from above case (Case-1) because sort node can't
> project, but I think adding a Result node on top of it can help to
> project and serve our purpose.
>
> The attached patch allows pushing the target list expression in both
> the cases. I think we should handle GatherMerge case in
> apply_projection_to_path similar to Gather so that target list can be
> pushed.  Another problem was during the creation of ordered paths
> where we don't allow target expressions to be pushed below gather
> merge.
>
> Plans after patch:
>
> Case-1
> --
> postgres=# explain (costs off, verbose) select c1,simple_func(c1) from
> t1 where c1 < 1 order by c1;
> QUERY PLAN
> --
>  Gather Merge
>Output: c1, (simple_func(c1))
>Workers Planned: 4
>->  Parallel Index Only Scan using idx_t1 on public.t1
>  Output: c1, simple_func(c1)
>  Index Cond: (t1.c1 < 1)
> (6 rows)
>
> Case-2
> ---
> postgres=# explain (costs off, verbose) select c1,simple_func(c1) from
> t1 where c1 < 1 order by c1;
> QUERY PLAN
> --
>  Gather Merge
>Output: c1, (simple_func(c1))
>Workers Planned: 4
>->  Result
>  Output: c1, simple_func(c1)
>  ->  Sort
>Output: c1
>Sort Key: t1.c1
>->  Parallel Bitmap Heap Scan on public.t1
>  Output: c1
>  Recheck Cond: (t1.c1 < 1)
>  ->  Bitmap Index Scan on idx_t1
>Index Cond: (t1.c1 < 1)
> (13 rows)
>
> Note, that simple_func() is pushed down to workers in both the cases.
>
> Thoughts?
>

This seems like a good optimization. I tried to simulate the test given
in the mail, but initially wasn't able to reproduce the exact test, as the
index creation step is missing from the test shared.
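
Judging from the plans in the quoted mail ("Parallel Index Scan using idx_t1
... Index Cond: (t1.c1 < 1)"), the missing step is presumably just an index
on t1.c1, something like:

create index idx_t1 on t1 (c1);

though that is a guess reconstructed from the plans rather than the original
script.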

I also wouldn't consider this a bug, but it's definitely a good optimization
for GatherMerge.


>
> Note - If we agree on the problems and fix, then I can add regression
> tests to cover above cases in the patch.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1Ji_
> 0pgrjFotAyvvfxGikxJQEKcxD863VQ-xYtfQBy0uQ%40mail.gmail.com


Sure, once you do that - I will review the patch.

Thanks,


>
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 
Rushabh Lathia


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-13 Thread Arseny Sher
Peter Eisentraut  writes:
> We can break this in any number of ways:
>
> - (your patch) Kill workers right away after ALTER SUBSCRIPTION DISABLE,
> thus breaking the appearance of transactional DDL somewhat.
> ...
> - Have DROP SUBSCRIPTION attempt to kill workers if the subscription is
> disabled (and possibly, was changed in the same transaction), which
> would address this scenario very narrowly.

Actually, my patch is closer to the last variant. I proposed to kill the
workers in DROP SUBSCRIPTION, and only if we are dropping replication
origin (which is probably always the case, though). I agree that it is
somewhat narrow and still slightly violates transactionality of DROP
SUBSCRIPTION, meaning that its changes (stopped workers) are seen before
the commit.

However, do we care much about that? Is there any chance that users will
rely on living apply workers after DROP SUBSCRIPTION, but before the
transaction commit? In which situation this might be useful?

On the other hand, forbidding the execution of disable sub and drop sub in one
transaction makes it impossible, e.g., to drop a subscription in a trigger as
long as Postgres doesn't have autonomous transactions.
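
For reference, the scenario from the subject line is roughly the following
(the subscription name is made up, and this assumes the slot was already
dissociated with ALTER SUBSCRIPTION ... SET (slot_name = NONE), since
otherwise DROP SUBSCRIPTION refuses to run inside a transaction block):

BEGIN;
ALTER SUBSCRIPTION mysub DISABLE;
DROP SUBSCRIPTION mysub;  -- currently blocks waiting for the apply worker
COMMIT;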


Tom Lane  writes:
> ISTM the second of those (refuse to drop an in-use subscription) is
> by far the least surprising behavior.  However, I wonder if there aren't
> race conditions involved here.  What if we haven't yet committed a
> DROP SUBSCRIPTION, and some new worker starts up after we look for
> workers?

We hold a lock on subscription till the end of transaction, so workers
won't start.

> If there aren't variants of that that will break all four options,
> it's not very obvious why not.

I see it this way:
* We want the effect of drop sub to be invisible till commit, so we can't
  stop workers before commit.
* The drop of the replication origin needs to be executed in one transaction
  with drop sub; it writes to WAL and so must be executed before commit.
* The apply worker needs the RO for its work; it owns the origin for its
  whole lifetime.

Something has to be given up here. One more idea that was not yet
mentioned is to abandon attempts to drop subs and ROs simultaneously and
just garbage-collect replication origins periodically, but that doesn't
sound like an elegant solution.


Masahiko Sawada  writes:

>> I don't think this is reliable -- what if worker suddenly dies without
>> accomplishing the job?
>
> The apply worker will be launched by the launcher later. If DROP
> SUBSCRIPTION is issued before the apply worker launches again, DROP
> SUBSCRIPTION itself can remove the replication origin.

Why would the launcher restart the worker if we already destroyed the
subscription? Consider the sequence of actions:

* We check in DROP SUBSCRIPTION that the worker is alive and don't remove the RO.
* DROP SUBSCRIPTION commits.
* The worker is killed by some villain before it had the chance to drop the RO.
  It might be killed even before the drop sub commit but after the check;
  we are left with an orphan RO anyway.

--
Arseny Sher


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Surjective functional indexes

2017-09-13 Thread Simon Riggs
On 13 September 2017 at 11:30, Konstantin Knizhnik
 wrote:

> The only reason for all this discussion about terms is that I need to choose
> a name for the corresponding index option.
> Simon thinks that we do not need this option at all. In that case we should
> not worry about the right term.
> From my point of view, "projection" is quite a clear notion, and not only in
> mathematics. It is also widely used in IT and especially in DBMSes.

If we do have an option it won't be using fancy mathematical
terminology at all, it would be described in terms of its function,
e.g. recheck_on_update

Yes, I'd rather not have an option at all, just some simple code with
useful effect, like we have in many other places.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-09-13 Thread amul sul
On Sun, Sep 10, 2017 at 8:47 AM, Amit Kapila 
wrote:

> On Fri, Sep 8, 2017 at 4:51 PM, amul sul  wrote:
> > On Thu, May 18, 2017 at 9:13 AM, Amit Kapila 
> > wrote:
> >>
> >>  On Wed, May 17, 2017 at 5:17 PM, Robert Haas 
> >> wrote:
> >> > On Wed, May 17, 2017 at 6:29 AM, Amit Kapila  >
> >> > wrote:
> >> >> I think we can do this even without using an additional infomask bit.
> >> >> As suggested by Greg up thread, we can set InvalidBlockId in ctid to
> >> >> indicate such an update.
> >> >
> >> > Hmm.  How would that work?
> >> >
> >>
> >> We can pass a flag say row_moved (or require_row_movement) to
> >> heap_delete which will in turn set InvalidBlockId in ctid instead of
> >> setting it to self. Then the ExecUpdate needs to check for the same
> >> and return an error when heap_update is not successful (result !=
> >> HeapTupleMayBeUpdated).  Can you explain what difficulty are you
> >> envisioning?
> >>
> >
> > Attaching WIP patch incorporates the above logic, although I am yet to
> check
> > all the code for places which might be using ip_blkid.  I have got a
> small
> > query here,
> > do we need an error on HeapTupleSelfUpdated case as well?
> >
>
> No, because that case is anyway a no-op (or error depending on whether
> is updated/deleted by same command or later command).  Basically, even
> if the row wouldn't have been moved to another partition, we would not
> have allowed the command to proceed with the update.  This handling is
> to make commands fail rather than a no-op where otherwise (when the
> tuple is not moved to another partition) the command would have
> succeeded.
>
> ​
Thank you.

I've rebased the patch against Amit Khandekar's latest patch (v17_rebased_2).
Also added an ip_blkid validation check in heap_get_latest_tid(),
rewrite_heap_tuple() & rewrite_heap_tuple() function, because the
ItemPointerEquals() check alone is no longer sufficient after this patch.
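
For context, the kind of statement this machinery is for is an UPDATE that
moves a row across partitions. A made-up sketch (table and partition names
are hypothetical):

create table prt (a int, b text) partition by range (a);
create table prt_p1 partition of prt for values from (0) to (100);
create table prt_p2 partition of prt for values from (100) to (200);
insert into prt values (50, 'x');
-- Without the patch set this update errors out; with Amit Khandekar's patch
-- the row is moved from prt_p1 to prt_p2, and the ip_blkid marking discussed
-- above lets a concurrent updater detect that the tuple it found has moved.
update prt set a = 150 where a = 50;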

Regards,
Amul


0002-invalidate_ctid-ip_blkid-WIP_2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hooks to track changed pages for backup purposes

2017-09-13 Thread Ants Aasma
On Thu, Aug 31, 2017 at 9:02 AM, Andrey Borodin  wrote:
> When we have accumulated diff blocknumbers for most of segments we can 
> significantly speed up method of WAL scanning. If we have blocknumbers for 
> all segments we can skip WAL scanning at all.

Have you measured that the WAL scanning is actually a significant
issue? As a quick experiment I hacked up pg_waldump to just dump block
references to stdout in binary format. It scanned 2.8GB of WAL in 3.17
seconds, outputting 9.3M block refs per second. WAL was generated with
pgbench, synchronous commit off, using 4 cores for 10 minutes - making
the ratio of work from generating WAL to parsing it be about 750:1.

Regards,
Ants Aasma


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting error message with latest PG source on Windows.

2017-09-13 Thread Ashutosh Sharma
Hi,

On Wed, Sep 13, 2017 at 3:15 PM, Ashutosh Sharma  wrote:
> Hi Thomas,
>
> On Wed, Sep 13, 2017 at 2:57 PM, Thomas Munro
>  wrote:
>> On Wed, Sep 13, 2017 at 9:11 PM, Thomas Munro
>>  wrote:
>>> On Wed, Sep 13, 2017 at 8:58 PM, Ashutosh Sharma  
>>> wrote:
 I am getting the following error message when trying to build latest
 PG source on Windows,

 Error 1 error C2065: 'LDAP_NO_ATTRS' : undeclared identifier
 C:\Users\ashu\pgsql\src\backend\libpq\auth.c 2468
>>>
>>> Googling around I see some indications that the macro may not be
>>> defined in all implementations and that some other projects test if
>>> it's defined:
>>
>> Does this work for you Ashutosh?
>>

Thanks for the patch. Yes, it works for me.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Surjective functional indexes

2017-09-13 Thread Konstantin Knizhnik



On 13.09.2017 13:14, Christoph Berg wrote:

Re: Konstantin Knizhnik 2017-09-13 
<2393c4b3-2ec4-dc68-4ea9-670597b56...@postgrespro.ru>


On 13.09.2017 10:51, Christoph Berg wrote:

Re: Konstantin Knizhnik 2017-09-01 


+   Functional index is based on on projection function: function which 
extract subset of its argument.
+   In mathematic such functions are called non-injective. For injective 
function if any attribute used in the indexed
+   expression is changed, then value of index expression is also changed.

This is Just Wrong. I still think what you are doing here doesn't have
anything to do with the function being injective or not.

Sorry, can you please explain what is wrong?

I don't get why you are reasoning about "projection" ->
"non-injective" -> "injective". Can't you try to explain what this
functionality is about without abusing math terms that just mean
something else in the rest of the world?


I tried to explain it in my previous e-mail. In most cases (it is just
my feeling, maybe it is wrong), functional indexes are built for some
complex types, like JSON, arrays, structs,...
and the index expression extracts some component of this compound value. It
means that even if the underlying column is changed, there is a good chance
that the value of the index function is not changed. So there is no need to
update the index and we can use HOT. This can increase performance several
times.


The only reason for all this discussion about terms is that I need to
choose a name for the corresponding index option.
Simon thinks that we do not need this option at all. In that case we
should not worry about the right term.
From my point of view, "projection" is quite a clear notion, and not only
in mathematics. It is also widely used in IT and especially in DBMSes.


--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expanding inheritance in partition bound order

2017-09-13 Thread Amit Khandekar
On 13 September 2017 at 15:32, Amit Langote
 wrote:
> On 2017/09/11 18:56, Amit Langote wrote:
>> Attached updated patch does it that way for both partitioned table indexes
>> and leaf partition indexes.  Thanks for pointing it out.
>
> It seems to me we don't really need the first patch all that much.  That
> is, let's keep PartitionDispatchData the way it is for now, since we don't
> really have any need for it beside tuple-routing (EIBO as committed didn't
> need it for one).  So, let's forget about "decoupling
> RelationGetPartitionDispatchInfo() from the executor" thing for now and
> move on.
>
> So, attached is just the patch to make RelationGetPartitionDispatchInfo()
> traverse the partition tree in depth-first manner to be applied on HEAD.
>
> Thoughts?

+1. If at all we need the decoupling later for some reason, we can do
that incrementally.

Will review your latest patch by tomorrow.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Surjective functional indexes

2017-09-13 Thread Christoph Berg
Re: Konstantin Knizhnik 2017-09-13 
<2393c4b3-2ec4-dc68-4ea9-670597b56...@postgrespro.ru>
> 
> 
> On 13.09.2017 10:51, Christoph Berg wrote:
> > Re: Konstantin Knizhnik 2017-09-01 
> > 
> > > +   Functional index is based on on projection function: function 
> > > which extract subset of its argument.
> > > +   In mathematic such functions are called non-injective. For 
> > > injective function if any attribute used in the indexed
> > > +   expression is changed, then value of index expression is also 
> > > changed.
> > This is Just Wrong. I still think what you are doing here doesn't have
> > anything to do with the function being injective or not.
> 
> Sorry, can you please explain what is wrong?

I don't get why you are reasoning about "projection" ->
"non-injective" -> "injective". Can't you try to explain what this
functionality is about without abusing math terms that just mean
something else in the rest of the world?

Christoph


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Surjective functional indexes

2017-09-13 Thread Konstantin Knizhnik



On 13.09.2017 10:51, Christoph Berg wrote:

Re: Konstantin Knizhnik 2017-09-01 


+   Functional index is based on on projection function: function which 
extract subset of its argument.
+   In mathematic such functions are called non-injective. For injective 
function if any attribute used in the indexed
+   expression is changed, then value of index expression is also changed.

This is Just Wrong. I still think what you are doing here doesn't have
anything to do with the function being injective or not.


Sorry, can you please explain what is wrong?
The problem I am trying to solve comes from a particular use case: a
functional index on part of a JSON column.
Usually such an index is built on persistent attributes, which are rarely
changed, like ISBN...
Right now any update of the JSON column disables HOT updates, even if the
update doesn't really affect the index.
So instead of disabling HOT just based on the mask of modified attributes, I
suggest comparing the old and new values of the index expression.
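
A minimal sketch of that use case (table, column and key names are made up):

create table docs (id int primary key, info jsonb);
create index docs_isbn_idx on docs ((info ->> 'isbn'));

-- The indexed expression (info ->> 'isbn') is unchanged here, so if we
-- compared old and new expression values the update could remain HOT:
update docs set info = jsonb_set(info, '{pages}', '320') where id = 1;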


Such behavior can significantly (several times) increase performance,
but only for "projection" functions.
There was a long discussion in this thread about the right notion for such
functions (surjective, non-injective, projection).

But I think the criterion is quite obvious.

Simon proposes to eliminate the "projection" property and use autotuning to
determine the optimal behavior.
I still think that such an option would be useful, but we can indeed use
statistics to compare the number of unique values for the index function and
for its argument(s).
If they are similar, then most likely the function is injective, i.e. it
produces a different result for different argument values.
In that case there is no sense in spending extra CPU time calculating the old
and new values of the function.
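
For example, such a comparison could be done with a query along these lines
(just a sketch, reusing the hypothetical names from above; statistics for an
expression index appear in pg_stats under the index name):

select tablename, attname, n_distinct
from pg_stats
where (tablename = 'docs' and attname = 'info')
   or tablename = 'docs_isbn_idx';

If the n_distinct of the indexed expression is close to that of the column
itself, the function is probably injective and the extra check is not worth
the CPU time.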

This is what I am going to implement now.

So I would be pleased if you could explain your concerns and
suggestions (if you have any) more precisely.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-09-13 Thread Ashutosh Sharma
On Wed, Sep 13, 2017 at 7:11 AM, Tsunakawa, Takayuki
 wrote:
> Hi Thomas, Magnus
>
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Thomas Munro
>> Since it only conflicts with c7b8998e because of pgindent whitespace
>> movement, I applied it with "patch -p1 --ignore-whitespace" and created
>> a new patch.  See attached.
>
> Thanks, Thomas.  I've added your name in the CF entry so that your name will 
> also be listed on the release note, because my patch is originally based on 
> your initial try.  Please remove your name just in case you mind it.  BTW, 
> your auto-reviewer looks very convenient.  Thank you again for your great 
> work.
>
> Magnus, I would be grateful if you could review and commit the patch while
> your memory is relatively fresh.
>
> I've been in a situation which keeps me from doing development recently, but 
> I think I can gradually rejoin the community activity soon.
>

I have once again tested the latest patch (v14) on Windows and
the results look fine to me. Basically, I have repeated the test
cases which I had done earlier on the v8 patch. For more details on the
tests that I have re-executed, please refer to [1]. Thanks.

[1]-   
https://www.postgresql.org/message-id/CAE9k0Pkz%2BtOiPmx2LrVePM7cZydTLNbQ6R3GqgeivurfsXyZ5w%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expanding inheritance in partition bound order

2017-09-13 Thread Amit Langote
On 2017/09/11 18:56, Amit Langote wrote:
> Attached updated patch does it that way for both partitioned table indexes
> and leaf partition indexes.  Thanks for pointing it out.

It seems to me we don't really need the first patch all that much.  That
is, let's keep PartitionDispatchData the way it is for now, since we don't
really have any need for it beside tuple-routing (EIBO as committed didn't
need it for one).  So, let's forget about "decoupling
RelationGetPartitionDispatchInfo() from the executor" thing for now and
move on.

So, attached is just the patch to make RelationGetPartitionDispatchInfo()
traverse the partition tree in depth-first manner to be applied on HEAD.

Thoughts?

Thanks,
Amit
From 1e99c776eda30c29fdb0e48570d6b3acd6b9a05d Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 8 Sep 2017 17:35:10 +0900
Subject: [PATCH] Make RelationGetPartitionDispatch expansion order depth-first

This is so that it matches what the planner does with partitioned table
inheritance expansion.  Matching the planner's order helps because it
eases matching the executor's per-partition objects with the
planner-created per-partition nodes.
---
 src/backend/catalog/partition.c | 242 
 1 file changed, 99 insertions(+), 143 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 73eff17202..ddb46a80cb 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -147,6 +147,8 @@ static int32 partition_bound_cmp(PartitionKey key,
 static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
void *probe, bool 
probe_is_bound, bool *is_equal);
+static void get_partition_dispatch_recurse(Relation rel, Relation parent,
+  List **pds, List 
**leaf_part_oids);
 
 /*
  * RelationBuildPartitionDesc
@@ -1192,21 +1194,6 @@ get_partition_qual_relid(Oid relid)
 }
 
 /*
- * Append OIDs of rel's partitions to the list 'partoids' and for each OID,
- * append pointer rel to the list 'parents'.
- */
-#define APPEND_REL_PARTITION_OIDS(rel, partoids, parents) \
-   do\
-   {\
-   int i;\
-   for (i = 0; i < (rel)->rd_partdesc->nparts; i++)\
-   {\
-   (partoids) = lappend_oid((partoids), 
(rel)->rd_partdesc->oids[i]);\
-   (parents) = lappend((parents), (rel));\
-   }\
-   } while(0)
-
-/*
  * RelationGetPartitionDispatchInfo
  * Returns information necessary to route tuples down a partition 
tree
  *
@@ -1222,151 +1209,120 @@ PartitionDispatch *
 RelationGetPartitionDispatchInfo(Relation rel,
 int 
*num_parted, List **leaf_part_oids)
 {
+   List   *pdlist;
PartitionDispatchData **pd;
-   List   *all_parts = NIL,
-  *all_parents = NIL,
-  *parted_rels,
-  *parted_rel_parents;
-   ListCell   *lc1,
-  *lc2;
-   int i,
-   k,
-   offset;
+   ListCell *lc;
+   int i;
 
-   /*
-* We rely on the relcache to traverse the partition tree to build both
-* the leaf partition OIDs list and the array of PartitionDispatch 
objects
-* for the partitioned tables in the tree.  That means every partitioned
-* table in the tree must be locked, which is fine since we require the
-* caller to lock all the partitions anyway.
-*
-* For every partitioned table in the tree, starting with the root
-* partitioned table, add its relcache entry to parted_rels, while also
-* queuing its partitions (in the order in which they appear in the
-* partition descriptor) to be looked at later in the same loop.  This 
is
-* a bit tricky but works because the foreach() macro doesn't fetch the
-* next list element until the bottom of the loop.
-*/
-   *num_parted = 1;
-   parted_rels = list_make1(rel);
-   /* Root partitioned table has no parent, so NULL for parent */
-   parted_rel_parents = list_make1(NULL);
-   APPEND_REL_PARTITION_OIDS(rel, all_parts, all_parents);
-   forboth(lc1, all_parts, lc2, all_parents)
+   Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+
+   *num_parted = 0;
+   *leaf_part_oids = NIL;
+
+   get_partition_dispatch_recurse(rel, NULL, &pdlist, leaf_part_oids);
+   *num_parted = list_length(pdlist);
+   pd = (PartitionDispatchData **) palloc(*num_parted *
+   
   sizeof(PartitionDispatchData *));
+   i = 0;
+   foreach (lc, pdlist)
 

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-13 Thread Amit Khandekar
On 13 September 2017 at 13:05, Ashutosh Bapat
 wrote:
> On Wed, Sep 13, 2017 at 12:32 PM, Amit Khandekar  
> wrote:
>> Hi,
>>
>> Rafia had done some testing on TPCH queries using Partition-wise join
>> patch along with Parallel Append patch.
>>
>> There, we had observed that for query 4, even though the partition
>> wise joins are under a Parallel Append, the join are all non-partial.
>>
>> Specifically, the partition-wise join has non-partial nested loop
>> joins when actually it was expected to have partial nested loop joins.
>> (The difference can be seen by the observation that the outer relation
>> of that join is scanned by non-parallel Bitmap Heap scan when it
>> should have used Parallel Bitmap Heap Scan).
>>
>> Here is the detailed analysis , including where I think is the issue :
>>
>> https://www.postgresql.org/message-id/CAJ3gD9cZms1ND3p%3DNN%3DhDYDFt_SeKq1htMBhbj85bOmvJwY5fg%40mail.gmail.com
>>
>> All the TPCH results are posted in the same above mail thread.
>
> Can you please check if the attached patch fixes the issue.

Thanks Ashutosh. Yes, it does fix the issue. Partial Nested Loop joins
are generated now. If I see any unexpected differences in the
estimated or actual costs, I will report that in the Parallel Append
thread. As far as Partition-wise join is concerned, this issue is
solved, because Partial nested loop join does get created.

>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Inconsistencies between pg_settings and postgresql.conf

2017-09-13 Thread Adrian Escoms
Hello,

We are working with the postgresql.conf configuration file and have found
some discrepancies between it and pg_settings in terms of categories and
subcategories (we split the category field in pg_settings on '/', the
first part being the 'category', the second the 'subcategory').
We suggest changing the postgresql.conf.sample file with the attached diff
to resolve these inconsistencies.
We think that these changes could also be backported to previous versions.
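
For illustration, the split described above can be reproduced with a query
along these lines (just a sketch; pg_settings separates the two parts of the
category with ' / '):

SELECT name,
       split_part(category, ' / ', 1) AS category,
       split_part(category, ' / ', 2) AS subcategory
FROM pg_settings
ORDER BY 2, 3, 1;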

Looking forward to your comments,


Adrián Escoms
diff --git a/src/backend/utils/misc/postgresql.conf.sample 
b/src/backend/utils/misc/postgresql.conf.sample
index 53aa006..99ec156 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -105,7 +105,7 @@
 
 
 #--
-# RESOURCE USAGE (except WAL)
+# RESOURCE USAGE
 #--
 
 # - Memory -
@@ -138,7 +138,7 @@
 #temp_file_limit = -1  # limits per-process temp file space
# in kB, or -1 for no limit
 
-# - Kernel Resource Usage -
+# - Kernel Resource -
 
 #max_files_per_process = 1000  # min 25
# (change requires restart)
@@ -172,7 +172,7 @@
 
 
 #--
-# WRITE AHEAD LOG
+# WRITE-AHEAD LOG
 #--
 
 # - Settings -
@@ -228,7 +228,7 @@
 # REPLICATION
 #--
 
-# - Sending Server(s) -
+# - Sending Servers -
 
 # Set these on the master and on any standby that will send replication data.
 
@@ -336,7 +336,7 @@
 
 
 #--
-# ERROR REPORTING AND LOGGING
+# REPORTING AND LOGGING
 #--
 
 # - Where to Log -
@@ -471,8 +471,9 @@
# -1 disables, 0 logs all temp files
 #log_timezone = 'GMT'
 
-
-# - Process Title -
+#--
+# PROCESS TITLE
+#--
 
 #cluster_name = '' # added to process titles if nonempty
# (change requires restart)
@@ -480,10 +481,10 @@
 
 
 #--
-# RUNTIME STATISTICS
+# STATISTICS
 #--
 
-# - Query/Index Statistics Collector -
+# - Query and Index Statistics Collector -
 
 #track_activities = on
 #track_counts = on
@@ -493,7 +494,7 @@
 #stats_temp_directory = 'pg_stat_tmp'
 
 
-# - Statistics Monitoring -
+# - Monitoring -
 
 #log_parser_stats = off
 #log_planner_stats = off
@@ -502,7 +503,7 @@
 
 
 #--
-# AUTOVACUUM PARAMETERS
+# AUTOVACUUM
 #--
 
 #autovacuum = on   # Enable autovacuum subprocess?  'on'
@@ -610,7 +611,7 @@
 
 
 #--
-# VERSION/PLATFORM COMPATIBILITY
+# VERSION AND PLATFORM COMPATIBILITY
 #--
 
 # - Previous PostgreSQL Versions -

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-13 Thread Aleksander Alekseev
Hi Tomas,

I appreciate your feedback, although it doesn't seem to be completely
fair. Particularly:

> You gave everyone about 4 hours to object

This is not quite accurate since my proposal was sent 2017-09-11
09:41:32 and this thread started - 2017-09-12 14:14:55.

> You just changed the status of 10-15% open patches. I'd expect
> things like this to be consulted with the CF manager, yet I don't see
> any comments from Daniel.

Agreed, that was clearly a mistake; I should have added Daniel to CC. Sorry I
didn't do that. I've returned all affected patches back to "Needs
Review". On the bright side, while doing this I've noticed that many
patches were already updated by their authors.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: [HACKERS] Getting error message with latest PG source on Windows.

2017-09-13 Thread Ashutosh Sharma
Hi Thomas,

On Wed, Sep 13, 2017 at 2:57 PM, Thomas Munro
 wrote:
> On Wed, Sep 13, 2017 at 9:11 PM, Thomas Munro
>  wrote:
>> On Wed, Sep 13, 2017 at 8:58 PM, Ashutosh Sharma  
>> wrote:
>>> I am getting the following error message when trying to build latest
>>> PG source on Windows,
>>>
>>> Error 1 error C2065: 'LDAP_NO_ATTRS' : undeclared identifier
>>> C:\Users\ashu\pgsql\src\backend\libpq\auth.c 2468
>>
>> Googling around I see some indications that the macro may not be
>> defined in all implementations and that some other projects test if
>> it's defined:
>
> Does this work for you Ashutosh?
>
> --

I am currently stuck with some other task on Windows. Is it okay if I
let you know the results in another 30-40 minutes? Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-13 Thread Fabien COELHO



I have a serious, serious dislike for tests that seem to work until
they're run on a heavily loaded machine.



I'm not that sure the error message was because of that.


No, this particular failure (probably) wasn't.  But now that I've realized
that this test case is timing-sensitive, I'm worried about what will
happen when it's run on a sufficiently slow or loaded machine.


I would not necessarily object to doing something in the code that
would guarantee that, though.



Hmmm. Interesting point.


It could be as simple as putting the check-for-done at the bottom of the
loop not the top, perhaps.


I agree that it is best if tests work in all reasonable conditions,
including a somewhat overloaded host...


I'm going to think about it, but I'm not sure of the best approach. In the
meantime, ISTM that the issue has not been encountered (yet), so it is not
pressing. Maybe with -T > --aggregate-interval pgbench could go on over the
time limit if the log file has not been written at all, but that would be
some kind of kludge for this specific test...


Note that getting test coverage for -T while having to assume that a loaded
host might not be able to generate even a one-line log every second during
that time is kind of a hard assumption...


Maybe some tests could be "warnings", i.e. it could be acceptable for them to
fail once in a while under specific conditions, if this is rare enough
and documented. ISTM that there is such a test for random output.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

