Re: generating bootstrap entries for array types

2018-09-20 Thread Dagfinn Ilmari Mannsåker
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

The v2 patch is IMO ready to commit.

The documentation in the form of source comments is updated,
and having Catalog::ParseData() do the expansion instead of
genbki.pl is more in line with what bki.sgml says: 

  If you want to add a new method of making the data representation
  smaller, you must implement it
  in reformat_dat_file.pl and also
  teach Catalog::ParseData() how to expand the
  data back into the full representation.

- ilmari

The new status of this patch is: Ready for Committer


Re: when set track_commit_timestamp on, database system abort startup

2018-09-20 Thread Masahiko Sawada
On Wed, Sep 19, 2018 at 12:12 PM, Kyotaro HORIGUCHI
 wrote:
> At Sat, 15 Sep 2018 19:26:39 +0900, Masahiko Sawada  
> wrote in 
>> >> To fix that maybe we can disable commitTs if
>> >> controlFile->track_commit_timestamp == false and the
>> >> track_commit_timestamp == true even in crash recovery.
>> >
>> > Hmm, so keep it off while crash recovery runs, and once it's out of that
>> > then enable it automatically?
>>
>> Yes. The attached patch changes it to check
>> controlFile->track_commit_timestamp even in the crash recovery case. If
>> track_commit_timestamp is set to true in the config file, it's enabled
>> at the end of recovery.
>>
>> > That might work -- by definition we don't
>> > care about the commit TSs of the transaction replayed during crash
>> > recovery, since they were executed in the primary that didn't have
>> > commitTS enable anyway.
>> >
>> > It seems like the first thing we need is TAP cases that reproduce these
>> > two crash scenarios.
>>
>> I attached a TAP test that reproduces this issue. We can reproduce it
>> even with a single server, by making postgres replay a commit WAL record
>> during crash recovery after transactions have been consumed and
>> track_commit_timestamp has been enabled.
>
> The fix looks good to me.  The TAP test works fine.

Thank you for looking at this patch.

>
> In the TAP test:
>
> 
> The test script lacks a general description about its objective.
>
> 
> +$node->append_conf('postgresql.conf',
> + "track_commit_timestamp = off");
> +$node->start;
> +
> +# When we start firstly from the initdb the PARAMETER_CHANGES
> +# is emitted at end of the recovery, which disables the
> +# track_commit_timestamp if the crash recovery replay that
> +# WAL. Therefore we restart the server so that we can recovery
> +# from the point where doesn't contain that WAL.
> +$node->restart;
>
> The first startup actually doesn't emit a PARAMETER_CHANGES record. If
> track_commit_timestamp were set to on, we would get one. However, I
> agree that it is reasonable to eliminate the possibility of being
> affected by the record. How about something like this?
>
> +# We don't want to replay PARAMETER_CHANGES record while the
> +# crash recovery test below. It is not expected to be emitted
> +# thus far, but we restart the server to get rid of it just in
> +# case.
>
>
> 
> +# During the crash recovery we replay the commit WAL that sets
> +# the commit timestamp to a new page.
> +$node->start;
>
> The comment mentions the unexpected symptom. Shouldn't it describe
> the desired behavior?
>
> +# During crash recovery server will replay COMMIT records
> +# emitted while commit timestamp was off. The server can start
> +# if we correctly avoid processing commit timestamp for the
> +# records.
>

I agreed with all your review comments. Attached is the updated patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/test/modules/commit_ts/t/005_recovery.pl b/src/test/modules/commit_ts/t/005_recovery.pl
new file mode 100644
index 000..80d52fc
--- /dev/null
+++ b/src/test/modules/commit_ts/t/005_recovery.pl
@@ -0,0 +1,53 @@
+# Testing of recovery from a state where commit timestamps were off
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf',
+  "track_commit_timestamp = off");
+$node->start;
+
+# We don't want to replay PARAMETER_CHANGES record while the
+# crash recovery test below. It is not expected to be emitted
+# thus far, but we restart the server to get rid of it just in
+# case.
+$node->restart;
+
+# Consume 2000 XIDs to get beyond the commitTS page boundary.
+$node->safe_psql(
+	'postgres',
+	qq(
+CREATE PROCEDURE consume_xid(cnt int)
+AS \$\$
+DECLARE
+	i int;
+BEGIN
+	FOR i in 1..cnt LOOP
+		EXECUTE 'SELECT txid_current()';
+		COMMIT;
+	END LOOP;
+END;
+\$\$
+LANGUAGE plpgsql;
+));
+$node->safe_psql('postgres', 'CALL consume_xid(2000)');
+
+$node->teardown_node;
+
+# Enable track_commit_timestamp
+$node->append_conf('postgresql.conf',
+  "track_commit_timestamp = on");
+
+# During crash recovery server will replay COMMIT records
+# emitted while commit timestamp was off. The server can start
+# if we correctly avoid processing commit timestamp for the
+# records.
+$node->start;
+
+# Check that the server has started.
+is($node->psql('postgres', qq(SELECT 1)), 0,
+   'started from the crash recovery');
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4754e75..0ece043 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6833,11 +6833,12 @@ StartupXLOG(void)
 	StartupMultiXact();
 
 	/*
-	 * Ditto commit timestamps.  In a standby, we do it if setting is enabled
-	 * in ControlFile; in a master we base the decision on the GUC itself.
+	 * Ditto commit timestamps.  In both a standby and a master, we do it if
+	 * setting is enabl

Re: [HACKERS] Bug in to_timestamp().

2018-09-20 Thread Alexander Korotkov
On Thu, Sep 20, 2018 at 6:09 AM amul sul  wrote:
> On Thu, Sep 20, 2018, 3:22 AM Alexander Korotkov  
> wrote:
>> It's hard to understand whether it was expected, because it wasn't
>> properly documented.  More importantly, it's the same behavior we
>> had before cf984672, and the purpose of cf984672 was not to change this.
>>
>> But from the code comments, it's intentional. If you put digits or
>> text characters into the format string, you can skip non-separator
>> characters in the input string.  For instance you may do:
>>
>> # SELECT to_date('y18y12y2011', 'xDDxMMxYYYY');
>>   to_date
>> ------------
>>  2011-12-18
>> (1 row)
>>
>> It's even more interesting that letters and digits are handled
>> differently.
>>
>> # SELECT to_date('01801202011', 'xDDxMMxYYYY');
>> ERROR:  date/time field value out of range: "01801202011"
>> Time: 0,453 ms
>>
>> # SELECT to_date('01801202011', '9DD9MM9YYYY');
>>   to_date
>> ------------
>>  2011-12-18
>> (1 row)
>>
>> So, letters in the format string don't allow you to extract fields at
>> particular positions of a digit sequence, but digits in the format
>> string do.  That's rather undocumented, but from the code you can tell
>> that it's intentional.  Thus, I think it would be nice to improve the
>> documentation here.  But I still propose to commit the patch I posted,
>> which reverts the unintentional behavior change in cf984672.
>
> Agreed, thanks for working on this.

Pushed, thanks.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Query is over 2x slower with jit=on

2018-09-20 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-09-19 23:26:52 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > JIT:
> > >   Functions: 2
> > >   Generation Time: 0.680 ms
> > >   Inlining: true
> > >   Inlining Time: 7.591 ms
> > >   Optimization: true
> > >   Optimization Time: 20.522 ms
> > >   Emission Time: 14.607 ms
[...]
> > > How about making that:
> > > JIT:
> > >   Functions: 2
> 
> FWIW, not that I want to do that now, but at some point it might make
> sense to sub-divide this into things like number of "expressions",
> "tuple deforming", "plans", ...  Just mentioning that if somebody wants
> to comment on reformatting this as well, if we're tinkering anyway.

I'd actually think we'd maybe want some kind of 'verbose' mode which
shows exactly what got JIT'd and what didn't- one of the questions that
I think people will be asking is "why didn't X get JIT'd?" and I don't
think that's very easy to figure out currently.

> > >   Options: Inlining, Optimization
> > >   Times (Total, Generation, Inlining, Optimization, Emission): 43.4 ms, 
> > > 0.680 ms, 7.591 ms, 20.522 ms, 14.607 ms
> > 
> > > or something similar?
> > 
> > That's going in the right direction.  Personally I'd make the last line
> > more like
> > 
> > Times: generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms, 
> > emission 14.607 ms, total 43.4 ms
> 
> Yea, that's probably easier to read.

I tend to agree that it's easier to read but I'm not sure we need to
quite go that far in reducing the number of rows.

> > (total at the end seems more natural to me, YMMV).

I agree with this..

> I kind of think doing it first is best, because that's usually the first
> thing one wants to know.

and this, so what about:

Times:
  generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms, emission 14.607 ms
  Total: 43.4 ms

Gets the total out there quick on the left where you're scanning down
while keeping the detailed info above for reviewing after.

> > Also, the "options" format you suggest here seems a bit too biased
> > towards binary on/off options --- what happens when there's a
> > three-way option?  So maybe that line should be like
> > 
> > Options: inlining on, optimization on
> > 
> > though I'm less sure about that part.
> 
> I'm pretty certain you're right :).  There's already arguments around
> making optimization more gradual (akin to O1,2,3).

That certainly sounds like it'd be very neat to have though I wonder how
well we'll be able to automatically plan out which optimization level to
use when..

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: PostgreSQL 11 {Beta 4, RC1} Release: 2018-09-20

2018-09-20 Thread Jonathan S. Katz
On 9/18/18 12:15 PM, Jonathan S. Katz wrote:
> On 9/13/18 12:51 PM, Jonathan S. Katz wrote:
>> Hi,
>>
>> We are planning to have another release of PostgreSQL 11, either Beta 4
>> or RC1, next week on Thursday, 2018-09-20. The version will be
>> determined based on the state of the open items list[1] around the time
>> of stamping.
> 
> PostgreSQL 11 was stamped Beta 4[1] yesterday evening for release on
> 2018-09-20.

PostgreSQL 11 Beta 4 has been released[1].

Thanks!

Jonathan

[1] https://www.postgresql.org/about/news/1890/



signature.asc
Description: OpenPGP digital signature


Re: fast default vs triggers

2018-09-20 Thread Tomas Vondra
On 09/19/2018 10:35 PM, Andrew Dunstan wrote:
On 09/18/2018 03:36 PM, Andrew Dunstan wrote:
Tomas Vondra has pointed out to me that there's an issue with triggers 
not getting expanded tuples for columns with fast defaults. Here is an 
example that shows the issue:
   andrew=# create table blurfl (id int);
   CREATE TABLE
   andrew=# insert into blurfl select x from generate_series(1,5) x;
   INSERT 0 5
   andrew=# alter table blurfl add column x int default 100;
   ALTER TABLE
   andrew=# create or replace function showmej() returns trigger
   language plpgsql as $$ declare j json; begin j := to_json(old);
   raise notice 'old x: %', j->>'x'; return new; end; $$;
   CREATE FUNCTION
   andrew=# create trigger show_x before update on blurfl for each row
   execute procedure showmej();
   CREATE TRIGGER
   andrew=# update blurfl set id = id where id = 1;
   NOTICE:  old x: 
   UPDATE 1
   andrew=# update blurfl set id = id where id = 1;
   NOTICE:  old x: 100
   UPDATE 1
   andrew=#


The error is fixed with this patch:


   diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
   index 2436692..f34a72a 100644
   --- a/src/backend/commands/trigger.c
   +++ b/src/backend/commands/trigger.c
   @@ -3396,7 +3396,11 @@ ltrmark:;
                LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
        }

   -    result = heap_copytuple(&tuple);
   +    if (HeapTupleHeaderGetNatts(tuple.t_data) < relation->rd_att->natts)
   +        result = heap_expand_tuple(&tuple, relation->rd_att);
   +    else
   +        result = heap_copytuple(&tuple);
   +
        ReleaseBuffer(buffer);

        return result;

I'm going to re-check the various places that this might have been 
missed. I guess it belongs on the open items list.

I haven't found anything further that is misbehaving. I paid particular 
attention to indexes and index-only scans.


I propose to commit this along with an appropriate regression test.



Seems reasonable to me.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Proposal for Signal Detection Refactoring

2018-09-20 Thread Chris Travers
After the patch we did for the race condition here, one point that was
brought up, which I thought was quite relevant, was that checking
global flags feels a little ugly.
I would like to set up a patch which refactors this as follows:

1.  We create an enum of cancellation levels:
    NO_INTERRUPT
    INTERRUPT
    QUERY_CANCEL
    PROCESS_EXIT

2.  We create a macro called PENDING_INTERRUPT_LEVEL() which returns the
highest pending interrupt level.

3.  We check on whether the output of this is greater or equal to the level
we need to handle and work on that basis.

This allows us to handle cases where we just want to check the interrupt
level for return or cleanup purposes.  It does not apply to cases where we
need to change interrupt handling temporarily as we do in a couple of
cases.  I think it is cleaner than checking the QueryCancelPending and
ProcDiePending global flags directly.
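
To make that concrete, here is a minimal standalone sketch of what the
interface could look like.  The flag variables stand in for the existing
InterruptPending/QueryCancelPending/ProcDiePending globals; everything else
is illustrative only, not the patch itself:

/* Standalone sketch: ordered interrupt levels plus a "highest pending" macro. */
#include <stdbool.h>
#include <stdio.h>

typedef enum CancellationLevel
{
	NO_INTERRUPT = 0,
	INTERRUPT,
	QUERY_CANCEL,
	PROCESS_EXIT
} CancellationLevel;

/* Stand-ins for the existing global flags. */
static volatile bool InterruptPending = false;
static volatile bool QueryCancelPending = false;
static volatile bool ProcDiePending = false;

/* Highest pending interrupt level, checked from most to least severe. */
#define PENDING_INTERRUPT_LEVEL() \
	(ProcDiePending ? PROCESS_EXIT : \
	 QueryCancelPending ? QUERY_CANCEL : \
	 InterruptPending ? INTERRUPT : NO_INTERRUPT)

int
main(void)
{
	QueryCancelPending = true;

	/* Callers compare against the level they care about. */
	if (PENDING_INTERRUPT_LEVEL() >= QUERY_CANCEL)
		printf("clean up and return early\n");
	return 0;
}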

So here's a small patch.  I will add it for the next commit fest unless
anyone has any reason I shouldn't.

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


sig_refactor.patch
Description: Binary data


Re: [patch] Support LLVM 7

2018-09-20 Thread Christoph Berg
Re: To Andres Freund 2018-09-20 <20180920081044.ga16...@msg.df7cb.de>
> > > 2018-09-15 10:49:25.052 UTC [26458] DETAIL:  Failed process was running: 
> > > SELECT '' AS tf_12, BOOLTBL1.*, BOOLTBL2.*
> > >  FROM BOOLTBL1, BOOLTBL2
> > >  WHERE BOOLTBL2.f1 <> BOOLTBL1.f1;
> > > 2018-09-15 10:49:25.052 UTC [26458] LOG:  terminating any other active 
> > > server processes
> > 
> > Hm. Is there any chance to get a backtrace for this one?  This could,
> > although I think less likely so, also be a postgres issue
> > (e.g. generating code for the wrong microarch).
> 
> I'll see if I can find a porterbox to get a backtrace.

32-bit powerpc, 11~beta3-2:

postgres=# set jit = off;
SET
postgres=# SELECT
postgres-# ARRAY(SELECT f.i FROM (
postgres(# (SELECT d + g.i FROM generate_series(4, 30, 3) d 
ORDER BY 1)
postgres(# UNION ALL
postgres(# (SELECT d + g.i FROM generate_series(0, 30, 5) d 
ORDER BY 1)
postgres(# ) f(i)
postgres(# ORDER BY f.i LIMIT 10)
postgres-# FROM generate_series(1, 3) g(i);
            array
------------------------------
 {1,5,6,8,11,11,14,16,17,20}
 {2,6,7,9,12,12,15,17,18,21}
 {3,7,8,10,13,13,16,18,19,22}
(3 rows)

postgres=# set jit = on;
SET
postgres=# SELECT
  ARRAY(SELECT f.i FROM (
    (SELECT d + g.i FROM generate_series(4, 30, 3) d ORDER BY 1)
    UNION ALL
    (SELECT d + g.i FROM generate_series(0, 30, 5) d ORDER BY 1)
  ) f(i)
  ORDER BY f.i LIMIT 10)
FROM generate_series(1, 3) g(i);


Program received signal SIGSEGV, Segmentation fault.
0xf4a20c18 in ?? ()
(gdb) bt f
#0  0xf4a20c18 in ?? ()
No symbol table info available.
#1  0xf4a20bc8 in ?? ()
No symbol table info available.
#2  0xf4a41b90 in ExecRunCompiledExpr (state=0x1a7515c, econtext=0x1a73dd0, 
isNull=0xffe17c2b)
at ./build/../src/backend/jit/llvm/llvmjit_expr.c:2591
cstate = 
func = 0xf4a20b5c
#3  0x00c2d39c in ExecEvalExprSwitchContext (isNull=0xffe17c2b, 
econtext=, state=0x1a7515c)
at ./build/../src/include/executor/executor.h:303
retDatum = 
oldContext = 0x19fe830
retDatum = 
oldContext = 
#4  ExecProject (projInfo=0x1a75158) at 
./build/../src/include/executor/executor.h:337
econtext = 
state = 0x1a7515c
slot = 0x1a750c0
isnull = 252
econtext = 
state = 
slot = 
isnull = 
#5  ExecScan (node=, accessMtd=accessMtd@entry=0xc3ce50 
,
recheckMtd=recheckMtd@entry=0xc3cdf0 ) at 
./build/../src/backend/executor/execScan.c:201
slot = 
econtext = 
qual = 0x0
projInfo = 0x1a75158
#6  0x00c3ce3c in ExecFunctionScan (pstate=) at 
./build/../src/backend/executor/nodeFunctionscan.c:270
node = 
#7  0x00c2b280 in ExecProcNodeFirst (node=0x1a73d48) at 
./build/../src/backend/executor/execProcnode.c:445
No locals.
#8  0x00c23058 in ExecProcNode (node=0x1a73d48) at 
./build/../src/include/executor/executor.h:237
No locals.
#9  ExecutePlan (execute_once=, dest=0x1a1c218, 
direction=, numberTuples=,
sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x1a73d48, estate=0x19fe8c0)
at ./build/../src/backend/executor/execMain.c:1721
slot = 
current_tuple_count = 0
slot = 
current_tuple_count = 
#10 standard_ExecutorRun (queryDesc=0x1962250, direction=, 
count=, execute_once=)
at ./build/../src/backend/executor/execMain.c:362
estate = 0x19fe8c0
operation = CMD_SELECT
dest = 0x1a1c218
sendTuples = 
oldcontext = 0x19621c0
__func__ = "standard_ExecutorRun"
#11 0x00c23284 in ExecutorRun (queryDesc=queryDesc@entry=0x1962250, 
direction=direction@entry=ForwardScanDirection,
count=, execute_once=) at 
./build/../src/backend/executor/execMain.c:305
No locals.
#12 0x00dcd3a0 in PortalRunSelect (portal=portal@entry=0x198d290, 
forward=forward@entry=true, count=0, count@entry=2147483647,
dest=dest@entry=0x1a1c218) at ./build/../src/backend/tcop/pquery.c:932
queryDesc = 0x1962250
direction = 
nprocessed = 
__func__ = "PortalRunSelect"
#13 0x00dcee7c in PortalRun (portal=portal@entry=0x198d290, 
count=c

v11 transaction semantics inside procedures

2018-09-20 Thread Dave Cramer
Is there somewhere that the transaction semantics inside a procedure are
documented ? From what I can tell transactions start from the first DML
statement and end implicitly when the procedure exits. Commit or Rollback
can be called anytime inside the transaction and this implicitly starts
another transaction.

Is there anything else I am missing ? Does DDL get applied after the
transaction ends ?

I do find this somewhat surprising as Postgres typically requires a BEGIN
statement to start a transaction block.

Thanks
Dave Cramer


Re: generating bootstrap entries for array types

2018-09-20 Thread Tom Lane
ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>> Doing this in genbki.pl makes DBD::Pg lose its array type information,
>> since it uses Catalog::ParseData() to get it
>> (https://github.com/bucardo/dbdpg/blob/master/types.c#L439).  May I
>> suggest moving gen_array_types() to Catalog.pm and calling it from
>> ParseData() if the input file is pg_type.dat, so that it always returns
>> complete data?

> Attached is a proposed revision of this patch that does the above.  It
> passes check-world, reformat_dat_file.pl still works, and DBD::Pg is
> still able to get the array type information.

I find the way you did that (making the call dependent on
$preserve_formatting) to be a mighty ugly kluge.  It causes ParseData
to know far more than it should about what callers are likely to do with
the data.

I'm inclined to think the right way is to do the expansion always,
and teach reformat_dat_file.pl to drop autogenerated array types
on the way out, making this work more like the existing facilities
for default/computed fields.

The easiest way to make that happen seems to be to invent another, purely
internal metadata field, along the lines of "is_autogenerated_entry".
Fooling with that now ...

regards, tom lane



Re: v11 transaction semantics inside procedures

2018-09-20 Thread Pavel Stehule
Hi

On Thu, 20 Sep 2018 at 17:55, Dave Cramer  wrote:

> Is there somewhere that the transaction semantics inside a procedure are
> documented ? From what I can tell transactions start from the first DML
> statement and end implicitly when the procedure exits. Commit or Rollback
> can be called anytime inside the transaction and this implicitly starts
> another transaction.
>
> Is there anything else I am missing ? Does DDL get applied after the
> transaction ends ?
>

The CALL statement starts a possible chain of transactions. You can check
pg_stat_activity - a transaction is started by the CALL command.


>
> I do find this somewhat surprising as Postgres typically requires a BEGIN
> statement to start a transaction block.
>

When a SELECT is not executed inside an explicitly started transaction,
a transaction is started implicitly before the SELECT command is executed.

The behavior is different here - a SELECT is always executed inside exactly
one transaction, without exception. A procedure looks like a client batch
executed on the server: it can be a sequence of transactions.

Regards

Pavel


> Thanks
> Dave Cramer
>


Re: generating bootstrap entries for array types

2018-09-20 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>>> Doing this in genbki.pl makes DBD::Pg lose its array type information,
>>> since it uses Catalog::ParseData() to get it
>>> (https://github.com/bucardo/dbdpg/blob/master/types.c#L439).  May I
>>> suggest moving gen_array_types() to Catalog.pm and calling it from
>>> ParseData() if the input file is pg_type.dat, so that it always returns
>>> complete data?
>
>> Attached is a proposed revision of this patch that does the above.  It
>> passes check-world, reformat_dat_file.pl still works, and DBD::Pg is
>> still able to get the array type information.
>
> I find the way you did that (making the call dependent on
> $preserve_formatting) to be a mighty ugly kluge.  It causes ParseData
> to know far more than it should about what callers are likely to do with
> the data.
>
> I'm inclined to think the right way is to do the expansion always,
> and teach reformat_dat_file.pl to drop autogenerated array types
> on the way out, making this work more like the existing facilities
> for default/computed fields.
>
> The easiest way to make that happen seems to be to invent another, purely
> internal metadata field, along the lines of "is_autogenerated_entry".
> Fooling with that now ...

Something like this, on top of the v2 patch?

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 12b142928a..7286c5b673 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -297,7 +297,7 @@ sub ParseData
 
# Generate array types unless we're just reformatting the file
Catalog::GenArrayTypes($schema, $data)
-   if $catname eq 'pg_type' and !$preserve_formatting;
+   if $catname eq 'pg_type';
return $data;
 }
 
@@ -485,6 +485,7 @@ sub GenArrayTypes
 
foreach my $elem_type (@$types)
{
+   next if ref $elem_type ne 'HASH';
next if !exists $elem_type->{array_type_oid};
my %array_type;
 
@@ -493,6 +494,9 @@ sub GenArrayTypes
$array_type{typname} = '_' . $elem_type->{typname};
$array_type{typelem} = $elem_type->{typname};
 
+   # Stops this entry being output when reformatting the .dat files
+   $array_type{is_auto_generated_entry} = 1;
+
# Arrays require INT alignment, unless the element type requires
# DOUBLE alignment.
	$array_type{typalign} = $elem_type->{typalign} eq 'd' ? 'd' : 'i';
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 053d1ee00d..3859d46043 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -396,7 +396,8 @@ EOM
  || $key eq "oid_symbol"
  || $key eq "descr"
  || $key eq "line_number"
- || $key eq "array_type_oid";
+ || $key eq "array_type_oid"
+ || $key eq "is_auto_generated_entry";
		die sprintf "unrecognized field name \"%s\" in %s.dat line %s\n",
  $key, $catname, $bki_values{line_number}
  if (!exists($attnames{$key}));
diff --git a/src/include/catalog/reformat_dat_file.pl b/src/include/catalog/reformat_dat_file.pl
index 24d9d18637..78c0f3e103 100755
--- a/src/include/catalog/reformat_dat_file.pl
+++ b/src/include/catalog/reformat_dat_file.pl
@@ -132,6 +132,7 @@ foreach my $catname (@catnames)
if (ref $data eq 'HASH')
{
my %values = %$data;
+   next if $values{is_auto_generated_entry};
 

			# At this point we have the full tuple in memory as a hash

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law



Re: Query is over 2x slower with jit=on

2018-09-20 Thread Andres Freund
On 2018-09-20 09:07:21 -0400, Stephen Frost wrote:
> Greetings,
> 
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2018-09-19 23:26:52 -0400, Tom Lane wrote:
> > > Andres Freund  writes:
> > > > JIT:
> > > >   Functions: 2
> > > >   Generation Time: 0.680 ms
> > > >   Inlining: true
> > > >   Inlining Time: 7.591 ms
> > > >   Optimization: true
> > > >   Optimization Time: 20.522 ms
> > > >   Emission Time: 14.607 ms
> [...]
> > > > How about making that:
> > > > JIT:
> > > >   Functions: 2
> > 
> > FWIW, not that I want to do that now, but at some point it might make
> > sense to sub-divide this into things like number of "expressions",
> > "tuple deforming", "plans", ...  Just mentioning that if somebody wants
> > to comment on reformatting this as well, if we're tinkering anyway.
> 
> I'd actually think we'd maybe want some kind of 'verbose' mode which
> shows exactly what got JIT'd and what didn't- one of the questions that
> I think people will be asking is "why didn't X get JIT'd?" and I don't
> think that's very easy to figure out currently.

That seems largely a separate discussion / feature though, right?  I'm
not entirely clear what precisely you mean with "why didn't X get
JIT'd?" - currently that's a whole query decision.

> > I'm pretty certain you're right :).  There's already arguments around
> > making optimization more gradual (akin to O1,2,3).
> 
> That certainly sounds like it'd be very neat to have though I wonder how
> well we'll be able to automatically plan out which optimization level to
> use when..

Well, that's not really different from having to decide whether to use
JIT or not.  I suspect that once / if we get caching and/or background
JIT compilation, we can get a lot more creative around this.

Greetings,

Andres Freund



Re: [patch] Support LLVM 7

2018-09-20 Thread Andres Freund
On 2018-09-20 15:18:14 +0200, Christoph Berg wrote:
> Re: To Andres Freund 2018-09-20 <20180920081044.ga16...@msg.df7cb.de>
> > > > 2018-09-15 10:49:25.052 UTC [26458] DETAIL:  Failed process was 
> > > > running: SELECT '' AS tf_12, BOOLTBL1.*, BOOLTBL2.*
> > > >FROM BOOLTBL1, BOOLTBL2
> > > >WHERE BOOLTBL2.f1 <> BOOLTBL1.f1;
> > > > 2018-09-15 10:49:25.052 UTC [26458] LOG:  terminating any other active 
> > > > server processes
> > > 
> > > Hm. Is there any chance to get a backtrace for this one?  This could,
> > > although I think less likely so, also be a postgres issue
> > > (e.g. generating code for the wrong microarch).
> > 
> > I'll see if I can find a porterbox to get a backtrace.

Hm, this is pretty helpful.  Sorry to ask, but could you a) turn on
jit_debugging_support (connection start) b) jit_dump_bitcode.

Then reproduce again.  After that, it'd be helpful to get:
1) /proc/cpuinfo
2) the "newest" *.bc file from the data directory
3) a backtrace
4) gdb disassemble at the point of the error.

Greetings,

Andres Freund



Re: [patch] Support LLVM 7

2018-09-20 Thread Andres Freund
Hi,

On 2018-09-20 10:10:44 +0200, Christoph Berg wrote:
> In the meantime, there's a third architecture where llvm itself
> compiled, but explodes with PG11 - x32:
> 
>   SELECT '' AS tf_12, BOOLTBL1.*, BOOLTBL2.*
>  FROM BOOLTBL1, BOOLTBL2
>  WHERE BOOLTBL2.f1 <> BOOLTBL1.f1;
> ! FATAL:  fatal llvm error: Cannot select: 0x57a7ae60: ch,glue = X86ISD::CALL 
> 0x57a7add0, 0x57a7af38, Register:i32 $edi, RegisterMask:Untyped, 0x57a7add0:1
> !   0x57a7af38: i32 = X86ISD::Wrapper TargetGlobalAddress:i32 (%struct.TupleTableSlot*)* @deform_0_1> 0
> ! 0x57a7aef0: i32 = TargetGlobalAddress @deform_0_1> 0
> !   0x57a7ad88: i32 = Register $edi
> !   0x57a7ae18: Untyped = RegisterMask
> !   0x57a7add0: ch,glue = CopyToReg 0x57a7ad40, Register:i32 $edi, 0x57a7acb0
> ! 0x57a7ad88: i32 = Register $edi
> ! 0x57a7acb0: i32,ch = CopyFromReg 0x57a367ac, Register:i32 %27
> !   0x57a7ac68: i32 = Register %27
> ! In function: evalexpr_0_0
> ! server closed the connection unexpectedly
> ! This probably means the server terminated abnormally
> ! before or while processing the request.
> ! connection to server was lost
> 
> https://buildd.debian.org/status/fetch.php?pkg=postgresql-11&arch=x32&ver=11~beta3-2&stamp=1537286634&raw=0

That's pretty clearly an LLVM bug. Could you enable jit_dump_bitcode and
send the bitcode files (I assume there should be something like
..bc and the same with .optimized.bc) from the
data directory?

Not that I think x32 is a particularly popular database platform, but
LLVM clearly needs to be fixed independent of PG...

Greetings,

Andres Freund



Re: Query is over 2x slower with jit=on

2018-09-20 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-09-20 09:07:21 -0400, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > FWIW, not that I want to do that now, but at some point it might make
> > > sense to sub-divide this into things like number of "expressions",
> > > "tuple deforming", "plans", ...  Just mentioning that if somebody wants
> > > to comment on reformatting this as well, if we're tinkering anyway.
> > 
> > I'd actually think we'd maybe want some kind of 'verbose' mode which
> > shows exactly what got JIT'd and what didn't- one of the questions that
> > I think people will be asking is "why didn't X get JIT'd?" and I don't
> > think that's very easy to figure out currently.
> 
> That seems largely a separate discussion / feature though, right?  I'm
> not entirely clear what precisely you mean with "why didn't X get
> JIT'd?" - currently that's a whole query decision.

There are things that can't be JIT'd currently and, I'm thinking anyway,
we'd JIT what we are able to and report out what was JIT'd in some
fashion which would inform a user what was able to be JIT'd and what
wasn't.  I'll admit, perhaps this is more of a debugging tool than
something that an end-user would find useful, unless there's possibly
some way that a user could affect what happens.

> > > I'm pretty certain you're right :).  There's already arguments around
> > > making optimization more gradual (akin to O1,2,3).
> > 
> > That certainly sounds like it'd be very neat to have though I wonder how
> > well we'll be able to automatically plan out which optimization level to
> > use when..
> 
> Well, that's not really different from having to decide whether to use
> JIT or not.  I suspect that once / if we get caching and/or background
> JIT compilation, we can get a lot more creative around this.

Yeah, I definitely understand that; caching, in particular, I would
expect to play a very large role.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: shared-memory based stats collector

2018-09-20 Thread Andres Freund
Hi,

On 2018-09-20 09:55:27 +0200, Antonin Houska wrote:
> I've spent some time reviewing this version.
> 
> Design
> --
> 
> 1. Even with your patch the stats collector still uses a UDP socket to
>receive data. Now that the shared memory API is there, shouldn't the
>messages be sent via shared memory queue? [1] That would increase the
>reliability of message delivery.
> 
>I can actually imagine backends inserting data into the shared hash tables
>themselves, but that might make them wait if the same entries are accessed
>by another backend. It should be much cheaper just to insert message into
>the queue and let the collector process it. In future version the collector
>can launch parallel workers so that writes by backends do not get blocked
>due to full queue.

I don't think either of these is right. I think it's crucial to get rid
of the UDP socket, but I think using a shmem queue is the wrong
approach. Not just because postgres' shm_mq is single-reader/writer, but
also because it's plainly unnecessary.  Backends should attempt to
update the shared hashtable, but acquire the necessary lock
conditionally, and leave the pending updates of the shared hashtable to
a later time if they cannot acquire the lock.
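
To illustrate the pattern (not the patch itself): pthread_mutex_trylock below
stands in for a conditional LWLock acquisition, and the stats struct and
counter names are made up for the example.

/* Sketch: flush stats under a conditionally acquired lock, else defer them. */
#include <pthread.h>
#include <stdio.h>

typedef struct SharedTableStats
{
	pthread_mutex_t lock;
	long            tuples_inserted;
} SharedTableStats;

static SharedTableStats shared = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Per-backend counts that could not be flushed yet. */
static long pending_tuples_inserted = 0;

static void
report_tuples_inserted(long n)
{
	pending_tuples_inserted += n;

	/* Try to flush, but never block a normal backend on stats. */
	if (pthread_mutex_trylock(&shared.lock) == 0)
	{
		shared.tuples_inserted += pending_tuples_inserted;
		pending_tuples_inserted = 0;
		pthread_mutex_unlock(&shared.lock);
	}
	/* else: keep the delta in pending_tuples_inserted for a later attempt */
}

int
main(void)
{
	report_tuples_inserted(10);
	report_tuples_inserted(5);
	printf("shared=%ld pending=%ld\n",
		   shared.tuples_inserted, pending_tuples_inserted);
	return 0;
}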

Greetings,

Andres Freund



Re: generating bootstrap entries for array types

2018-09-20 Thread John Naylor
On 9/20/18, Dagfinn Ilmari Mannsåker  wrote:
> Hi again John,

Hi Ilmari,
My apologies -- your messages ended up in my spam folder. I only saw
this thread again when Tom replied. Thanks for the review! I'll keep a
look out for any future messages.

-John Naylor



Re: generating bootstrap entries for array types

2018-09-20 Thread Tom Lane
ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
> Tom Lane  writes:
>> I'm inclined to think the right way is to do the expansion always,
>> and teach reformat_dat_file.pl to drop autogenerated array types
>> on the way out, making this work more like the existing facilities
>> for default/computed fields.
>> 
>> The easiest way to make that happen seems to be to invent another, purely
>> internal metadata field, along the lines of "is_autogenerated_entry".
>> Fooling with that now ...

> Something like this, on top of the v2 patch?

Yeah, that's pretty close to what I came up with, except that I thought
it'd be good if "reformat_dat_file.pl --full-tuples" would print
autogenerated entries; seems useful for debug purposes if nothing else.
So that requires also teaching ParseData to ignore autogenerated entries
on read-in, else you end up with duplicates.

I did some other minor hacking (mostly, fixing the documentation)
and pushed it.

regards, tom lane



Re: generating bootstrap entries for array types

2018-09-20 Thread Tom Lane
I wrote:
> I did some other minor hacking (mostly, fixing the documentation)
> and pushed it.

BTW, I hadn't thought to measure before, but this patch cut the
size of pg_type.dat by almost 40%.  Nice!

regards, tom lane



Unclear error message

2018-09-20 Thread Laurenz Albe
In backend/commands/tablecmds.c, function ATAddForeignKeyConstraint, I find:

    if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
    {
        if (!recurse)
            ereport(ERROR,
                    (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                     errmsg("foreign key referencing partitioned table \"%s\" must not be ONLY",
                            RelationGetRelationName(pkrel))));

That message is wrong, because "rel" and not "pkrel" is the partitioned table.
I think it should be

    ereport(ERROR,
            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
             errmsg("foreign key cannot be defined on ONLY \"%s\" for a partitioned table",
                    RelationGetRelationName(rel))));

Yours,
Laurenz Albe




Re: [patch] Support LLVM 7

2018-09-20 Thread Christoph Berg
Re: Andres Freund 2018-09-20 <20180920173009.ywi5grbotl7um...@alap3.anarazel.de>
> Hm, this is pretty helpful.  Sorry to ask, but could you a) turn on
> jit_debugging_support (connection start) b) jit_dump_bitcode.
> 
> Then reproduce again.  After that, it'd be helpful to get:
> 1) /proc/cpuinfo
> 2) the "newest" *.bc file from the data directory
> 3) a backtrace
> 4) gdb disassemble at the point of the error.

postgres=# show jit_debugging_support ;
 jit_debugging_support
-----------------------
 on
(1 row)

postgres=# set jit = on;
SET
postgres=# set jit_dump_bitcode = on;
SET
postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
          14414
(1 row)

postgres=# SELECT
  ARRAY(SELECT f.i FROM (
    (SELECT d + g.i FROM generate_series(4, 30, 3) d ORDER BY 1)
    UNION ALL
    (SELECT d + g.i FROM generate_series(0, 30, 5) d ORDER BY 1)
  ) f(i)
  ORDER BY f.i LIMIT 10)
FROM generate_series(1, 3) g(i);

(gdb) f 0
#0  0xf4a20c14 in evalexpr_0_15 ()
(gdb) disassemble
Dump of assembler code for function evalexpr_0_15:
   0xf4a20b58 <+0>: mflr r0
   0xf4a20b5c <+4>: stw r0,4(r1)
   0xf4a20b60 <+8>: stwu r1,-48(r1)
   0xf4a20b64 <+12>:mr  r6,r4
   0xf4a20b68 <+16>:mr  r7,r3
   0xf4a20b6c <+20>:addi r8,r3,8
   0xf4a20b70 <+24>:addi r9,r3,5
   0xf4a20b74 <+28>:lwz r4,4(r4)
   0xf4a20b78 <+32>:lwz r3,12(r3)
   0xf4a20b7c <+36>:lwz r10,28(r3)
   0xf4a20b80 <+40>:lwz r3,32(r3)
   0xf4a20b84 <+44>:stw r4,44(r1)
   0xf4a20b88 <+48>:stw r9,40(r1)
   0xf4a20b8c <+52>:stw r5,36(r1)
   0xf4a20b90 <+56>:stw r6,32(r1)
   0xf4a20b94 <+60>:stw r7,28(r1)
   0xf4a20b98 <+64>:stw r8,24(r1)
   0xf4a20b9c <+68>:stw r10,20(r1)
   0xf4a20ba0 <+72>:stw r3,16(r1)
   0xf4a20ba4 <+76>:b   0xf4a20ba8 
   0xf4a20ba8 <+80>:lwz r3,44(r1)
   0xf4a20bac <+84>:lwz r4,24(r3)
   0xf4a20bb0 <+88>:cmplwi  r4,0
   0xf4a20bb4 <+92>:bne 0xf4a20bc8 
   0xf4a20bb8 <+96>:b   0xf4a20bbc 
   0xf4a20bbc <+100>:   lwz r3,44(r1)
   0xf4a20bc0 <+104>:   bl  0xf4a20c50 
   0xf4a20bc4 <+108>:   b   0xf4a20bc8 
   0xf4a20bc8 <+112>:   lis r3,423
   0xf4a20bcc <+116>:   ori r4,r3,16744
   0xf4a20bd0 <+120>:   lwz r3,28(r1)
   0xf4a20bd4 <+124>:   lwz r5,32(r1)
   0xf4a20bd8 <+128>:   stfd f5,1(r16)
   0xf4a20bdc <+132>:   b   0xf4a20be0 
   0xf4a20be0 <+136>:   lwz r3,24(r1)
   0xf4a20be4 <+140>:   lwz r3,0(r3)
   0xf4a20be8 <+144>:   lwz r4,40(r1)
   0xf4a20bec <+148>:   lbz r5,0(r4)
   0xf4a20bf0 <+152>:   lwz r6,20(r1)
   0xf4a20bf4 <+156>:   lwz r7,16(r1)
   0xf4a20bf8 <+160>:   stb r5,0(r7)
   0xf4a20bfc <+164>:   cmplwi  r5,0
   0xf4a20c00 <+168>:   stw r3,12(r1)
   0xf4a20c04 <+172>:   stw r6,8(r1)
   0xf4a20c08 <+176>:   bne 0xf4a20c24 
   0xf4a20c0c <+180>:   b   0xf4a20c10 
   0xf4a20c10 <+184>:   lwz r3,12(r1)
=> 0xf4a20c14 <+188>:   .long 0xae81
   0xf4a20c18 <+192>:   lwz r4,8(r1)
   0xf4a20c1c <+196>:   stw r3,0(r4)
   0xf4a20c20 <+200>:   b   0xf4a20c24 
   0xf4a20c24 <+204>:   lwz r3,24(r1)
   0xf4a20c28 <+208>:   lwz r3,0(r3)
   0xf4a20c2c <+212>:   lwz r4,40(r1)
   0xf4a20c30 <+216>:   lbz r5,0(r4)
   0xf4a20c34 <+220>:   clrlwi  r5,r5,31
   0xf4a20c38 <+224>:   lwz r6,36(r1)
   0xf4a20c3c <+228>:   stb r5,0(r6)
   0xf4a20c40 <+232>:   lwz r0,52(r1)
   0xf4a20c44 <+236>:   addi r1,r1,48
   0xf4a20c48 <+240>:   mtlr r0
   0xf4a20c4c <+244>:   blr
End of assembler dump.

Program received signal SIGSEGV, Segmentation fault.
0xf4a20c14 in evalexpr_0_15 ()
(gdb) bt f
#0  0xf4a20c14 in evalexpr_0_15 ()
No symbol table info available.
#1  0xf4a41b90 in ExecRunCompiledExpr (state=0x1a740bc, econtext=0x1a72e60, 
isNull=0xffe17c2b)
at ./build/../src/backend/jit/llvm/llvmjit_expr.c:2591
cstate = 
func = 0xf4a20b58 
#2  0x00c2d39c in ExecEvalExprSwitchContext (isNull=0xffe17c2b, 
econtext=,
state=0x1a740bc) at ./build/../src/include/executor/executor.h:303
retDatum = 
oldContext = 0x1a06cc0
retDatum = 
oldContext = 
#3  ExecProject (projInfo=0x1a740b8) at 
./build/../src/include/executor/executor.h:337
econtext = 
state = 0x1a740bc
slot = 0x1a74020
isnull = 252
econtext = 
state = 
slot 

Re: Allow parallelism for deferrable serializable txns

2018-09-20 Thread Kevin Grittner
On Thu, Sep 6, 2018 at 2:14 PM Andres Freund  wrote:

> afaict for deferrable READ ONLY DEFERRABLE transactions we could
> trivially allow parallelism?  Am I missing something?

I think you are right.  What's more, DEFERRABLE transactions seem very
likely to be cases where parallelism would be especially beneficial.

Will take care of it.

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



Re: [patch] Support LLVM 7

2018-09-20 Thread Christoph Berg
Re: Andres Freund 2018-09-20 <20180920173238.f5idtzdlpkjsu...@alap3.anarazel.de>
> That's pretty clearly an LLVM bug. Could you enable jit_dump_bitcode and
> send the bitcode files (I assume there should be something like
> ..bc and the same with .optimized.bc) from the
> data directory?
> 
> Not that I think x32 is a particularly popular database platform, but
> LLVM clearly needs to be fixed independent of PG...

$ PGOPTIONS="-c jit_debugging_support=on" psql
psql (11beta4 (Debian 11~beta4-2))
postgres=# set jit=on;
postgres=# set jit_dump_bitcode = on;
postgres=# \i /home/cbe/postgresql/debian/11/src/test/regress/sql/boolean.sql 

FATAL:  fatal llvm error: Cannot select: 0x580bfdf0: ch,glue = X86ISD::CALL 
0x580bfd60, 0x580bfec8, Register:i32 $edi, RegisterMask:Untyped, 0x580bfd60:1
  0x580bfec8: i32 = X86ISD::Wrapper TargetGlobalAddress:i32 0
0x580bfe80: i32 = TargetGlobalAddress 0
  0x580bfd18: i32 = Register $edi
  0x580bfda8: Untyped = RegisterMask
  0x580bfd60: ch,glue = CopyToReg 0x580bfcd0, Register:i32 $edi, 0x580bfc40
0x580bfd18: i32 = Register $edi
0x580bfc40: i32,ch = CopyFromReg 0x5807eeec, Register:i32 %27
  0x580bfbf8: i32 = Register %27
In function: evalexpr_0_0
server closed the connection unexpectedly

gdb reports "exited with code 01" at that point.

Christoph



Re: [patch] Support LLVM 7

2018-09-20 Thread Christoph Berg
Re: To Andres Freund 2018-09-20 <20180920210315.gb21...@msg.df7cb.de>
> server closed the connection unexpectedly

Something ate the attachments. Sorry.

FATAL:  fatal llvm error: Cannot select: 0x57e61d40: ch,glue = X86ISD::CALL 
0x57e61cb0, 0x57e61e18, Register:i32 $edi, RegisterMask:Untyped, 0x57e61cb0:1
  0x57e61e18: i32 = X86ISD::Wrapper TargetGlobalAddress:i32 0
0x57e61dd0: i32 = TargetGlobalAddress 0
  0x57e61c68: i32 = Register $edi
  0x57e61cf8: Untyped = RegisterMask
  0x57e61cb0: ch,glue = CopyToReg 0x57e61c20, Register:i32 $edi, 0x57e61b90
0x57e61c68: i32 = Register $edi
0x57e61b90: i32,ch = CopyFromReg 0x57e1fd3c, Register:i32 %27
  0x57e61b48: i32 = Register %27
In function: evalexpr_0_0
server closed the connection unexpectedly

Christoph


19323.0.bc.gz
Description: application/gzip


19323.0.optimized.bc.gz
Description: application/gzip


Re: [PATCH] Tab completion for ALTER DATABASE … SET TABLESPACE

2018-09-20 Thread Tom Lane
Arthur Zakirov  writes:
> On Thu, Aug 30, 2018 at 05:54:23PM +0100, Dagfinn Ilmari Mannsåker wrote:
>> And just after hitting send, I noticed I'd typed ALTER TABLE instead of
>> ALTER DATABASE, including in the commit message :(
>> Fixed patch attached.

> The patch seems reasonable. It fixes the lack of tab completion for
> ALTER DATABASE ... SET TABLESPACE ... .

LGTM too, pushed.

regards, tom lane



Re: [PATCH] Tab completion for ALTER DATABASE … SET TABLESPACE

2018-09-20 Thread Thomas Munro
On Fri, Sep 21, 2018 at 9:22 AM Tom Lane  wrote:
> Arthur Zakirov  writes:
> > On Thu, Aug 30, 2018 at 05:54:23PM +0100, Dagfinn Ilmari Mannsåker wrote:
> >> And just after hitting send, I noticed I'd typed ALTER TABLE instead of
> >> ALTER DATABASE, including in the commit message :(
> >> Fixed patch attached.
>
> > The patch seems reasonable. It fixes the lack of tab completion for
> > ALTER DATABASE ... SET TABLESPACE ... .
>
> LGTM too, pushed.

+   else if (Matches5("ALTER", "DATABASE", MatchAny, "SET", "TABLESPACE"))

. o O ( hmm, we now have variadic macros )

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2018-09-20 Thread Peter Geoghegan
On Wed, Sep 19, 2018 at 9:56 PM, Andrey Lepikhov
 wrote:
> Note that the interface of the _bt_moveright() and _bt_binsrch() functions,
> with the combination of scankey, scantid and nextkey parameters, is
> semantically overloaded.
> Every time I read the code I spend time remembering what these functions do
> exactly.
> Maybe the comments need to be rewritten.

I think that it might be a good idea to create a "BTInsertionScankey"
struct, or similar, since keysz, nextkey, the scankey array and now
scantid are all part of that, and are all common to these 4 or so
functions. It could have a flexible array at the end, so that we still
only need a single palloc(). I'll look into that.
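
As a rough standalone sketch of that layout (the struct and field names are
hypothetical, malloc stands in for palloc, and the real
ScanKeyData/ItemPointerData types are reduced to stubs):

/* Sketch: header plus flexible array of keys in one allocation. */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct ScanKeyStub { int attno; long argument; } ScanKeyStub;
typedef struct ItemPointerStub { unsigned block; unsigned offset; } ItemPointerStub;

typedef struct BTInsertionScankey
{
	int              keysz;      /* number of entries in scankeys[] */
	bool             nextkey;    /* move right on equal keys? */
	ItemPointerStub *scantid;    /* heap TID, or NULL when not used */
	ScanKeyStub      scankeys[]; /* flexible array member, keysz entries */
} BTInsertionScankey;

static BTInsertionScankey *
make_insertion_scankey(int keysz)
{
	/* A single allocation covers both the header and the key array. */
	BTInsertionScankey *key = malloc(offsetof(BTInsertionScankey, scankeys) +
									 keysz * sizeof(ScanKeyStub));

	key->keysz = keysz;
	key->nextkey = false;
	key->scantid = NULL;
	memset(key->scankeys, 0, keysz * sizeof(ScanKeyStub));
	return key;
}

int
main(void)
{
	BTInsertionScankey *key = make_insertion_scankey(3);

	key->scankeys[0].attno = 1;
	free(key);
	return 0;
}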

> What do you think about submitting the patch to the next CF?

Clearly the project that you're working on is a difficult one. It's
easy for me to understand why you might want to take an iterative
approach, with lots of prototyping. Your patch needs attention to
advance, and IMV the CF is the best way to get that attention. So, I
think that it would be fine to go submit it now.

I must admit that I didn't even notice that your patch lacked a CF
entry. Everyone has a different process, perhaps.

-- 
Peter Geoghegan



Re: FETCH FIRST clause PERCENT option

2018-09-20 Thread Thomas Munro
On Fri, Aug 31, 2018 at 11:42 PM Surafel Temesgen  wrote:
> On Tue, Aug 28, 2018 at 7:33 PM Erik Rijkers  wrote:
>>
>> ;
>>
>> TRAP: FailedAssertion("!(slot != ((void *)0))", File: "execTuples.c",
>> Line: 42
>
> The attache patch include a fix for the crash .can you check it again?

FYI this fails[1] in src/test/modules/test_ddl_deparse's create_table
test because of the keyword:

  CREATE TABLE stud_emp (
  percent int4
  ) INHERITS (emp, student);
! ERROR:  syntax error at or near "percent"

[1] https://travis-ci.org/postgresql-cfbot/postgresql/builds/430777123

--
Thomas Munro
http://www.enterprisedb.com



Unreadable, unmaintainable code in tab completion

2018-09-20 Thread Tom Lane
At some point, somebody had the bright idea that various queries
in tab-complete.c should be named like this:

Query_for_list_of_tsvmf
Query_for_list_of_tmf
Query_for_list_of_tpm
Query_for_list_of_tm

Even assuming that you can fill in the unstated "relations of these
relkinds", I challenge anybody to know right offhand what those sets
correspond to semantically.  Or to figure out what should change if
a new relkind gets added.  Don't expect to get any help from the
comments, because there are none.

It did not help any when somebody else decided to cram the new
partitioned-table relkind into some (not all) of these queries without
renaming them, so that even this crummy naming principle no longer
applies at all.

Meanwhile, immediately adjacent to those we have a far better design
example:

/* Relations supporting INSERT, UPDATE or DELETE */
static const SchemaQuery Query_for_list_of_updatables = {

Here we have something where it's actually possible to decide in
a principled fashion whether a new relkind belongs in the query's
relkind set, and where we aren't faced with a need to rename the
query to reflect such a change.  And it's way more readable, if you
ask me.

So I think we ought to rename these query variables on that model.
Looking through the references to each of these, it appears that
what they are really doing is:

Query_for_list_of_tsvmf    SELECT, TABLE, GRANT/REVOKE [table], \dp
Query_for_list_of_tmf      ANALYZE
Query_for_list_of_tpm      CREATE INDEX ON
Query_for_list_of_tm       VACUUM, CLUSTER, REINDEX TABLE

Here's what I propose:

* Rename Query_for_list_of_tsvmf to Query_for_list_of_selectables,
and then add

/* Currently, selectable rels are exactly the ones you can GRANT on */
#define Query_for_list_of_grantables Query_for_list_of_selectables

and use those two names as appropriate at the call sites.  If the
relkinds for those two behaviors ever diverge, we can replace the
#define with a separate constant, but we needn't do so today.

* Rename Query_for_list_of_tmf to Query_for_list_of_analyzables.

* Rename Query_for_list_of_tpm to Query_for_list_of_indexables.

* Rename Query_for_list_of_tm to Query_for_list_of_vacuumables,
and add a comment saying that this currently covers CLUSTER as well.
(Or we could add a #define like that for the GRANT case.)

* Change the reference in REINDEX TABLE to reference
Query_for_list_of_indexables not Query_for_list_of_vacuumables.

The last item will have the effect that partitioned tables are offered
for REINDEX TABLE.  Now, if you actually try that, you get

WARNING:  REINDEX of partitioned tables is not yet implemented, skipping 
"mlparted"
NOTICE:  table "mlparted" has no indexes
REINDEX

but I think tab-complete.c has exactly no business knowing that
that isn't implemented yet.  Are we going to remember to change
it when that does get implemented?  Are we going to put in a
server-version-specific completion rule?  I'd say the answers
are "maybe not" and "definitely not", so we might as well just
allow it now and not have to fix this code when it gets changed.

I haven't written the actual patch yet, but the above sketch
seems sufficient to explain it.  Any objections, or bikeshedding
on the names?

regards, tom lane



Re: Re: [PATCH] Tab completion for ALTER DATABASE … SET TABLESPACE

2018-09-20 Thread Tom Lane
Thomas Munro  writes:
> . o O ( hmm, we now have variadic macros )

hmmm ... but even with variadic, C's macro facility is so weak that
I'm not sure we can reimplement these with it.  What would the
expansion look like?

(It constantly annoys me that C's so weak here.  In the language
I used for my first for-pay programming work, Bliss/11, the macro
facility could have done that easily.  That was 45 years ago.)

regards, tom lane



Re: [PATCH] Tab completion for ALTER DATABASE … SET TABLESPACE

2018-09-20 Thread Andres Freund
On 2018-09-20 18:38:36 -0400, Tom Lane wrote:
> Thomas Munro  writes:
> > . o O ( hmm, we now have variadic macros )
> 
> hmmm ... but even with variadic, C's macro facility is so weak that
> I'm not sure we can reimplement these with it.  What would the
> expansion look like?

There's a dirty hack to count arguments in vararg macros:
https://groups.google.com/d/msg/comp.std.c/d-6Mj5Lko_s/5fW1bP6T3RIJ

Afaict that ought to be enough to implement something like the current
macros? Whether that's too ugly or not, I don't know.


> It constantly annoys me that C's so weak here.

Yea. Weak and fragile, both.


Greetings,

Andres Freund



Re: [PATCH] Tab completion for ALTER DATABASE … SET TABLESPACE

2018-09-20 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-20 18:38:36 -0400, Tom Lane wrote:
>> hmmm ... but even with variadic, C's macro facility is so weak that
>> I'm not sure we can reimplement these with it.  What would the
>> expansion look like?

> There's a dirty hack to count arguments in vararg macros:
> https://groups.google.com/d/msg/comp.std.c/d-6Mj5Lko_s/5fW1bP6T3RIJ

Doesn't seem to help for this case.  What we really want is to expand
a given pattern once for each variadic argument, and I don't see how
to get there from here.

Although maybe I'm thinking too much inside-the-box.  The expansion
doesn't necessarily have to be identical to the code the macros
generate today.  In fact, that code is kinda bulky.  I wonder if
we could go over to something involving a variadic function, or
maybe an array of string-pointer constants?

regards, tom lane



On-disk compatibility for nbtree-unique-key enhancement

2018-09-20 Thread Peter Geoghegan
I would like to get some feedback on dealing with on-disk
compatibility for my nbtree patch, which is in the current CF [1]. I
thought I'd start a new thread on these questions, since it's probably
the most sensitive aspect of the project, and can be discussed in a
fairly contained way.

My patch to make nbtree index tuple keys uniquely identifiable by
using heap TID as a differentiating last attribute/part of the key
space complicates space management within routines like
_bt_findsplitloc(). Though, ideally, not very much. In general, we may
need to make new "pivot" tuples [2] (all of which originate as the new
high key added on the left side of a leaf page split) have an
additional, explicit representation of heap TID. I'm MAXALIGN()'ing
ItemPointerData to fit the "extra" heap TID attribute once an actual
page split in underway, so that's an extra 8 bytes that I might have
to append to a nonpivot leaf tuple when a new high key/pivot is
generated during a leaf page split. This 8 bytes needs to be accounted
for.

I want to make a general pessimistic assumption that it's always
necessary to have an extra 8 bytes to fit a heap TID, stolen from the
"1/3 of a buffer" limit that we already enforce as the largest
possible nbtree tuple size. Prior to the current  _bt_findsplitloc()
coding (circa 1999), there were bugs about a failure to find a split
point, and I'm keen to avoid more problems along those lines.
Actually, I think that it's essential to keep the space management
logic as simple as possible, with everything ideally confined to a
single page (and not reliant on things that are true across pages). So
I feel reasonably confident that the best thing to do is impose
certain theoretical costs on users in the short term, rather than
doing something fancy and brittle that might be much harder to
stabilize in the long term.

More concretely:

* My patch changes the definition of BTMaxItemSize() for indexes on
the proposed new BTREE_VERSION, v4. We insist that new tuples must be
less than or equal to 2704 bytes in size, including all IndexTuple
overhead, rather than the current 2712 bytes.

* Under this scheme, the "internal" limit will remain 2712 bytes, so
that pivot tuples that are already at the proposed new user-visible
maximum of 2704 can safely have another 8 bytes added.

* This means that there is a compatibility issue for anyone that is
already right on the threshold -- we *really* don't want to see a
REINDEX fail, but that seems like a possibility that we need to talk
about now.

We've documented that there is this 1/3 of a page restriction, but
specify that it's "approximate", so I don't think our documentation as
written would need to be changed (though a release note item would
certainly be in order). Also, since the vagaries of how well TOAST can
compress are very hard to understand, I have to imagine that almost
nobody relies on having the last 8 bytes -- if they do, then they're
playing with fire.

Not being able to REINDEX is still really awful, however unrealistic
the scenario may be. I can imagine ways of avoiding it, but I'd really
rather not go there, since the cure is worse than the disease. I'll
still throw out some ideas for a cure, though:

* Maybe we could rely on the fact that internal pages are the only
pages that can contain more than one maybe-extra-sized 2720 byte pivot
tuple, since in practice they also always have one "negative infinity"
item. Those are always 16 bytes, so it works out.

We've done something a bit like this before [3], though I can see big
downsides to going further with it. It seems pretty grotty to me. I
actually would like to *not* truncate outside of the leftmost page on
the level, so that I can use the not-actually-negative-infinity first
item in internal pages as a "low key". This would make low-overhead
prefix compression within internal pages work. (This is not to be
confused with leaf level prefix compression, which the DBA would have
to enable.)

* Alternatively, we could lift the general restriction on out-of-line
storage within indexes, but I think that that would probably break
certain things in a way that would go undetected for a long time.

It would break amcheck's heapallindexed check, for one thing, and it
might have implications for buffer locking protocols. In general,
out-of-line storage within indexes seems like it works against what an
index is supposed to do. And, it's a lot of effort for a small to
nonexistent reward.

Does it seem like I have the right general idea here?

[1] https://commitfest.postgresql.org/19/1787/
[2] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/README;h=3680e69b89a8458d58f6c3361eb612788fa33786;hb=df8b5f3eb8a7c477156d0ad9d83e7297912cfe79#l612
[3] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/access/itup.h;h=bd3a702380954bc69c9536876094249430cb7c72;hb=df8b5f3eb8a7c477156d0ad9d83e7297912cfe79#l139
-- 
Peter Geoghegan



Re: [PATCH] Tab completion for ALTER DATABASE … SET TABLESPACE

2018-09-20 Thread Andres Freund
On 2018-09-20 19:03:16 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-09-20 18:38:36 -0400, Tom Lane wrote:
> >> hmmm ... but even with variadic, C's macro facility is so weak that
> >> I'm not sure we can reimplement these with it.  What would the
> >> expansion look like?
> 
> > There's a dirty hack to count arguments in vararg macros:
> > https://groups.google.com/d/msg/comp.std.c/d-6Mj5Lko_s/5fW1bP6T3RIJ
> 
> Doesn't seem to help for this case.  What we really want is to expand
> a given pattern once for each variadic argument, and I don't see how
> to get there from here.

Depends on whether your goal is to simplify *using* the macro, or the
infrastructure for the macro.  Afaict it should be relatively
straightforward to use a, possibly simplified, macro like referenced
above to have TailMatches(...), TailMatchesCS(...), Matches(...) that
then expand to the current *N macros.


> Although maybe I'm thinking too much inside-the-box.  The expansion
> doesn't necessarily have to be identical to the code the macros
> generate today.  In fact, that code is kinda bulky.  I wonder if
> we could go over to something involving a variadic function, or
> maybe an array of string-pointer constants?

Yea, that might be a way to simplify both the macros and the use of the
macros.  Assuming we have something like PP_NARG, ISTM it should be
relatively straightforward to define something like

#define Matches(...) CheckMatchesFor(PP_NARG(__VA_ARGS__), __VA_ARGS__)

and then have a CheckMatchesFor() first check previous_words_count, and
then just have a simple for loop through previous_words[] - afaict the
number of arguments ought to suffice to make that possible?
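
For completeness, a cut-down sketch of that argument-counting trick (the
version in the linked post goes up to 64 arguments; like the original it
does not handle the zero-argument case):

    #define PP_NARG(...) PP_NARG_(__VA_ARGS__, 8, 7, 6, 5, 4, 3, 2, 1)
    #define PP_NARG_(_1, _2, _3, _4, _5, _6, _7, _8, N, ...) N

With that, PP_NARG(a, b, c) expands to 3, which is exactly what a
CheckMatchesFor()-style function would receive as its argument count.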

Greetings,

Andres Freund



transction_timestamp() inside of procedures

2018-09-20 Thread Bruce Momjian
This function shows that only clock_timestamp() advances inside a
procedure, not statement_timestamp() or transaction_timestamp():

CREATE OR REPLACE PROCEDURE test_timestamp () AS $$
DECLARE
        str TEXT;
BEGIN
        WHILE TRUE LOOP
                -- clock_timestamp() is updated on every loop
                SELECT clock_timestamp() INTO str;
                RAISE NOTICE 'clock   %', str;
                SELECT statement_timestamp() INTO str;
                RAISE NOTICE 'statement   %', str;
                SELECT transaction_timestamp() INTO str;
                RAISE NOTICE 'transaction %', str;
                COMMIT;

                PERFORM pg_sleep(2);
        END LOOP;
END
$$ LANGUAGE plpgsql;

CALL test_timestamp();
NOTICE:  clock   2018-09-20 19:38:22.575794-04
NOTICE:  statement   2018-09-20 19:38:22.575685-04
NOTICE:  transaction 2018-09-20 19:38:22.575685-04

--> NOTICE:  clock   2018-09-20 19:38:24.578027-04
NOTICE:  statement   2018-09-20 19:38:22.575685-04
NOTICE:  transaction 2018-09-20 19:38:22.575685-04

This surprised me since I expected a new timestamp after commit.  Is
this something we want to change or document?  Are there other
per-transaction behaviors we should adjust?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Changing the setting of wal_sender_timeout per standby

2018-09-20 Thread Michael Paquier
On Wed, Sep 19, 2018 at 06:21:52AM +, Tsunakawa, Takayuki wrote:
> How embarrassing...  I'm sorry to cause you trouble to point out a
> silly mistake like this (I thought I would write as you suggested, but
> it seems that I was not who I usually am.)  The revised patch
> attached. 

Thanks for the new version.  Per my comments up-thread here, you cannot
actually use PGC_BACKEND:
https://www.postgresql.org/message-id/20180919061303.gb19...@paquier.xyz

This would break the case where this parameter is reloaded when a
session does not use a custom value for wal_sender_timeout.  I have also
looked at all the code paths using wal_sender_timeout, and the change
looks safe to me.  Please find attached an update, simplified, version.
Does that look fine to you?
--
Michael
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e1073ac6d3..29bd0eca8c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3127,10 +3127,8 @@ include_dir 'conf.d'
 Terminate replication connections that are inactive longer
 than the specified number of milliseconds. This is useful for
 the sending server to detect a standby crash or network outage.
-A value of zero disables the timeout mechanism.  This parameter
-can only be set in
-the postgresql.conf file or on the server command line.
-The default value is 60 seconds.
+A value of zero disables the timeout mechanism. The default value
+is 60 seconds.

   
  
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 77662aff7f..e9f542cfed 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2539,7 +2539,7 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
-		{"wal_sender_timeout", PGC_SIGHUP, REPLICATION_SENDING,
+		{"wal_sender_timeout", PGC_USERSET, REPLICATION_SENDING,
 			gettext_noop("Sets the maximum time to wait for WAL replication."),
 			NULL,
 			GUC_UNIT_MS


signature.asc
Description: PGP signature


Re: FETCH FIRST clause PERCENT option

2018-09-20 Thread Mark Dilger



> On Aug 16, 2018, at 7:34 AM, Andres Freund  wrote:
> 
> Hi,
> 
> On 2018-08-16 17:27:45 +0300, Surafel Temesgen wrote:
>> FETCH FIRST with PERCENT option is SQL standard that state limit count to
>> be specified in a percentage in addition to specify it in exact count and
>> listed as one of Major features simply not implemented yet in recent wiki
>> page [1].
>> 
>> I implemented it by executing the query plan twice. One without limit
>> filter to find the total number of row that will be feed to limit node so
>> the exact limit count can be calculated and the query plan execute again
>> with limit filter with newly calculated exact count .

Surafel, there are no regression tests that I can see in your patch.  It
would help if you added some, as then I could see precisely what behavior you
are expecting.  As it is, I'm just guessing here, but here goes ...


> Won't that have rather massive issues with multiple evaluations of
> clauses in the query?  Besides being really expensive?
> 
> I think you'd have to instead spill the query results into a tuplestore
> (like we do for FOR HOLD queries at end of xact), and then do the
> computations based on that.


I should think that spilling anything to a tuplestore would only be needed
if the query contains an ORDER BY expression.  If you query

FETCH FIRST 50 PERCENT * FROM foo;

you should just return every other row, discarding the rest, right?  It's
only when an explicit ordering is given that the need to store the results
arises.  Even with

FETCH FIRST 50 PERCENT name FROM foo ORDER BY name;

you can return one row for every two rows that you get back from the
sort node, reducing the maximum number you need to store at any time to
no more than 25% of all rows.
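
For reference, the standard spelling of the clause would presumably look
like this (sketch only; stock PostgreSQL does not accept PERCENT here yet,
and the final syntax is up to the patch):

    SELECT * FROM foo FETCH FIRST 50 PERCENT ROWS ONLY;
    SELECT name FROM foo ORDER BY name FETCH FIRST 50 PERCENT ROWS ONLY;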

Or am I missing something?

mark



Re: FETCH FIRST clause PERCENT option

2018-09-20 Thread Andres Freund
Hi,

On 2018-09-20 17:06:36 -0700, Mark Dilger wrote:
> I should think that spilling anything to a tuplestore would only be needed
> if the query contains an ORDER BY expression.  If you query
> 
>   FETCH FIRST 50 PERCENT * FROM foo;
> 
> you should just return every other row, discarding the rest, right?  It's
> only when an explicit ordering is given that the need to store the results
> arises.  Even with
> 
>   FETCH FIRST 50 PERCENT name FROM foo ORDER BY name;
> 
> you can return one row for every two rows that you get back from the
> sort node, reducing the maximum number you need to store at any time to
> no more than 25% of all rows.

I'm doubtful about the validity of these optimizations, particularly
given how surprising the resulting behavior could be. But I think more
importantly, we should focus on the basic implementation that's needed
anyway.

Greetings,

Andres Freund



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-20 Thread Andres Freund
Hi,

On 2018-09-13 14:29:59 -0700, Andres Freund wrote:
> On 2018-09-04 17:51:30 -0700, Andres Freund wrote:
> > My current proposal is thus to do add a check that does
> > #if defined(__clang__) && defined(__i386__) && !defined(__SSE2_MATH__)
> > something-that-fails
> > #endif
> > in an autoconf test, and have configure complain if that
> > fails. Something roughly along the lines of
> > "Compiling PostgreSQL with clang, on 32bit x86, requires SSE2 support. Use 
> > -msse2 or use gcc."
> 
> Here's a patch along those lines.

And pushed. I assume we should note this somewhat prominently in the
minor release notes.

- Andres



RE: Changing the setting of wal_sender_timeout per standby

2018-09-20 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz]
> Thanks for the new version.  Per my comments up-thread here, you cannot
> actually use PGC_BACKEND:
> https://www.postgresql.org/message-id/20180919061303.GB19808@paquier.x
> yz
> 
> This would break the case where this parameter is reloaded when a session
> does not use a custom value for wal_sender_timeout.  I have also looked
> at all the code paths using wal_sender_timeout, and the change looks safe
> to me.  Please find attached an update, simplified, version.
> Does that look fine to you?

Thank you, PGC_USERSET is of course fine to me.  I thought PGC_BACKEND 
included the capability of PGC_SIGHUP, because PGC_BACKEND is placed after 
PGC_SIGHUP in the definition of enum GucContext.  That was a pitfall.  I should 
have paid more attention.

I find it more user-friendly to include a description somewhere saying that the 
user can tune the timeout per standby, like the tip I added in the description of 
wal_sender_timeout.  I'm afraid users won't know whether and how to tune the 
setting per standby, as libpq's options parameter doesn't seem well known in my 
experience.


Regards
Takayuki Tsunakawa






Strange failure in LWLock on skink in REL9_5_STABLE

2018-09-20 Thread Thomas Munro
Hello,

Andres pinged me off-list to point out this failure after my commit fb389498be:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-09-20%2005%3A24%3A34

Change Set for this build:
fb389498be Tue Sep 18 11:19:22 2018 UTC  Allow DSM allocation to be interrupted.

The failure looks like this:

! FATAL:  semop(id=332464133) failed: Invalid argument
! CONTEXT:  SQL statement "CREATE TEMP TABLE brin_result (cid tid)"
! PL/pgSQL function inline_code_block line 22 at SQL statement
! PANIC:  queueing for lock while waiting on another one
! server closed the connection unexpectedly
! This probably means the server terminated abnormally
! before or while processing the request.
! connection to server was lost

I don't immediately see any connection between that particular commit,
which relates to the treatment of signals while allocating a DSM
segment, and the location of the first failure, which is in a
statement that is creating a temporary table.  On the other hand skink
has been very stable lately.  I'm also not sure how the FATAL error
and the PANIC are related (LWLockQueueSelf() has discovered that
MyProc->lwWaiting is already set).  Though it's possible that the root
problem was something happening in any of the other parallel tests
running, I don't see how any of those (lock security_label tablesample
object_address rowsecurity collate spgist privileges matview
replica_identity brin gin gist groupingsets) would reach code touched
by that commit in 9.5, but I don't currently have any other ideas
about what happened here.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: logical decoding bug when mapped relation with toast contents is rewritten repeatedly

2018-09-20 Thread Andres Freund
Hi,

On 2018-09-14 16:13:46 +0200, Tomas Vondra wrote:
> > 
> > I suspect the proper fix would be to have a new HEAP_INSERT_NO_LOGICAL
> > option, and specify that in raw_heap_insert() iff
> > RelationIsLogicallyLogged(state->rs_old_rel) or something like that.
> > 
> > Attached is a *prototype* patch of that approach.  Without the code
> > level changes the addition to test_decoding's rewrite.sql trigger the
> > bug, after it they're fixed.
> > 
> > 
> > The only reason the scenario I was debugging hit this was that there was
> > a cluster wide VACUUM FULL a couple times a day, and replication was
> > several hours behind due to slow network / receiving side.
> > 
> > 
> > Now I'm having a beer outside.

> Yeah, that seems like a bad idea. That error already caught a couple of
> bugs (including da10d6a8a9 and this one), and I have a hunch those are
> not the last ones.

One problem with this is that that means upgrading won't fix an existing
instance of the problem, but turning the ERROR into a WARNING would. I
personally think that's *NOT* enough justification for relaxing the
error, given that recreating the slot would fix the issue, but I see how
other people can reasonably differ.  I can't really see a reasonably
complex approach that solves the issue in a "cake but have it too"
way...


> After discarding 30 theories? Have two.

I will neither confirm nor deny. ;)


Greetings,

Andres Freund



Re: Strange failure in LWLock on skink in REL9_5_STABLE

2018-09-20 Thread Tom Lane
Thomas Munro  writes:
> Andres pinged me off-list to point out this failure after my commit 
> fb389498be:

> ! FATAL:  semop(id=332464133) failed: Invalid argument

I was just looking at that, and my guess is that it was caused by
something doing an ipcrm or equivalent, and is unrelated to your patch.
Especially since skink has succeeded with that patch in several other
branches.

If it's repeatable, then it would be time to get excited.

regards, tom lane



Re: Strange failure in LWLock on skink in REL9_5_STABLE

2018-09-20 Thread Thomas Munro
On Fri, Sep 21, 2018 at 2:59 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > Andres pinged me off-list to point out this failure after my commit 
> > fb389498be:
>
> > ! FATAL:  semop(id=332464133) failed: Invalid argument
>
> I was just looking at that, and my guess is that it was caused by
> something doing an ipcrm or equivalent, and is unrelated to your patch.
> Especially since skink has succeeded with that patch in several other
> branches.
>
> If it's repeatable, then it would be time to get excited.

I found this case from January:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-01-05%2000%3A10%3A03

! FATAL:  semop(id=1313210374) failed: Invalid argument
! LINE 1: CREATE TABLE as_select1 AS SELECT * FROM pg_class WHERE relk...

   EINVAL The semaphore set doesn't exist, or semid is less than zero,
  or nsops has a nonpositive value.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Strange failure in LWLock on skink in REL9_5_STABLE

2018-09-20 Thread Tom Lane
Thomas Munro  writes:
> On Fri, Sep 21, 2018 at 2:59 PM Tom Lane  wrote:
>> I was just looking at that, and my guess is that it was caused by
>> something doing an ipcrm or equivalent, and is unrelated to your patch.
>> Especially since skink has succeeded with that patch in several other
>> branches.

> I found this case from January:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-01-05%2000%3A10%3A03
> ! FATAL:  semop(id=1313210374) failed: Invalid argument

Uh-huh.

https://www.postgresql.org/docs/devel/static/kernel-resources.html#SYSTEMD-REMOVEIPC

regards, tom lane



Re: Strange failure in LWLock on skink in REL9_5_STABLE

2018-09-20 Thread Andres Freund
On 2018-09-20 22:59:29 -0400, Tom Lane wrote:
> Thomas Munro  writes:
> > Andres pinged me off-list to point out this failure after my commit 
> > fb389498be:
> 
> > ! FATAL:  semop(id=332464133) failed: Invalid argument
> 
> I was just looking at that, and my guess is that it was caused by
> something doing an ipcrm or equivalent, and is unrelated to your patch.
> Especially since skink has succeeded with that patch in several other
> branches.

I'm (hopefully) the only person with access to that machine, and I
certainly didn't do so. Nor are there script I know of that'd do
so. There's not been a lot of instability on skink, so it's certainly
quite weird.

I'm quite suspicious of the logic around:

	/*
	 * If we received a query cancel or termination signal, we will have
	 * EINTR set here.  If the caller said that errors are OK here, check
	 * for interrupts immediately.
	 */
	if (errno == EINTR && elevel >= ERROR)
		CHECK_FOR_INTERRUPTS();

because it seems far from guaranteed to do anything meaningful as I
don't see a guarantee that interrupts are active at that point (e.g. it
seems quite reasonable to hold an lwlock while resizing).

Afaict that might cause problems at a later stage, because at that point
we've not adjusted the actual mapping, but *have* ftruncate()ed it. If
there's actual data in the mapping, that certainly could cause trouble.

In fact, while this commit has expanded the size of the problem, I fail
to see how the error handling for resizing is correct. It's fine to fail
in the ftruncate() itself - at that point no changes have been made -,
but I don't think it's currently ok for posix_fallocate() to ever error
out.

It's not clear to me how that'd be problematic in 9.5 of all releases
however.

> If it's repeatable, then it would be time to get excited.

Yea, I guess we'll have to wait :/.

Greetings,

Andres Freund



Re: Strange failure in LWLock on skink in REL9_5_STABLE

2018-09-20 Thread Andres Freund
On 2018-09-20 23:15:45 -0400, Tom Lane wrote:
> Thomas Munro  writes:
> > On Fri, Sep 21, 2018 at 2:59 PM Tom Lane  wrote:
> >> I was just looking at that, and my guess is that it was caused by
> >> something doing an ipcrm or equivalent, and is unrelated to your patch.
> >> Especially since skink has succeeded with that patch in several other
> >> branches.
> 
> > I found this case from January:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-01-05%2000%3A10%3A03
> > ! FATAL:  semop(id=1313210374) failed: Invalid argument

> Uh-huh.
> 
> https://www.postgresql.org/docs/devel/static/kernel-resources.html#SYSTEMD-REMOVEIPC

That shouldn't be relevant here - I'm not running the buildfarm from an
interactive session and then logging out. So that code shouldn't
trigger.  I've made sure that the setting is off now however, I'm not
trusting the related logic very much...

Greetings,

Andres Freund



Re: Strange failure in LWLock on skink in REL9_5_STABLE

2018-09-20 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-20 23:15:45 -0400, Tom Lane wrote:
>> https://www.postgresql.org/docs/devel/static/kernel-resources.html#SYSTEMD-REMOVEIPC

> That shouldn't be relevant here - I'm not running the buildfarm from an
> interactive session and then logging out. So that code shouldn't
> trigger.

Well, the reason that systemd behavior is so brain-dead is exactly that
it will trigger against processes that you didn't launch interactively.
IIUC, you just have to log in and out of that account, and anything that
was running in the background (eg cron jobs) under the same userID gets
clobbered.

> I've made sure that the setting is off now however, I'm not
> trusting the related logic very much...

Was it on before?

regards, tom lane



Re: [WIP] [B-Tree] Retail IndexTuple deletion

2018-09-20 Thread Andrey Lepikhov
Here is the v6 version of quick vacuum, which utilizes the amtargetdelete() 
interface for retail index tuple deletion.

It is now simpler and more concise.
It must be applied after Retail-IndexTuple-Deletion-Access-Method.patch.

BENCHMARKS:
---

Initial script:
pgbench -i -s $scale # initial DB generation
"CREATE INDEX pgbench_accounts_ext ON public.pgbench_accounts USING btree (abalance);" # additional index


Comparison with lazy vacuum:

script:
"DELETE FROM pgbench_accounts WHERE (random() < $factor);" # delete a part of tuples for cleaning strategies comparison
"VACUUM pgbench_accounts;" # check time of vacuum process by bash 'date +%s%N | cut -b1-13' command


Results:
        |  $scale=10  |  $scale=100   |
$factor | QVAC | LVAC | QVAC  | LVAC  |
1E-6    | -    | -    | 284   | 979   |
1E-5    | 78   | 144  | 288   | 1423  |
1E-4    | 72   | 280  | 388   | 3304  |
1E-3    | 189  | 609  | 2294  | 6029  |
1E-2    | 443  | 783  | 54232 | 67884 |
1E-1    | 1593 | 1237 | 83092 | 86104 |

where QVAC means forced use of quick vacuum and LVAC means lazy vacuum for 
index cleanup.  $factor corresponds to the fraction of vacuumed tuples.  For 
example, $scale=10 with $factor=1E-1 means roughly 10^5 tuples vacuumed.  Times 
are measured in ms.


So the quick strategy can be used effectively in a vacuum process for up to 
1-2% of DEAD tuples in a relation.


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From a5664f6fa42f615d1fb46250ac61aa9ac8a9d32d Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 20 Sep 2018 13:04:26 +0500
Subject: [PATCH] Quick Vacuum Strategy

---
 src/backend/access/heap/heapam.c|  31 +++
 src/backend/access/heap/pruneheap.c |  41 --
 src/backend/catalog/index.c | 122 
 src/backend/commands/vacuumlazy.c   |  32 ++--
 src/include/access/heapam.h |   2 +
 src/include/catalog/index.h |   4 +
 6 files changed, 220 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 56f1d82f96..0f0949bef3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -9412,6 +9412,36 @@ heap_sync(Relation rel)
 	}
 }
 
+/*
+ * Mask DEAD tuples at a HEAP page
+ *
+ * We introduce this function for wal_consistency_checking satisfaction at
+ * Hot Standby node.
+ *
+ * DEAD tuple on a master node has t_cid and t_infomask, which can not be
+ * obtained by WAL records applying on a Hot Standby node.
+ */
+static void
+mask_dead_tuples(Page page)
+{
+	OffsetNumber	offnum;
+
+	for (offnum = FirstOffsetNumber; offnum <= PageGetMaxOffsetNumber(page); offnum = OffsetNumberNext(offnum))
+	{
+		ItemId	lp = PageGetItemId(page, offnum);
+		char*	htup_begin;
+
+		if (!ItemIdIsDead(lp))
+			continue;
+
+		if (!ItemIdHasStorage(lp))
+			continue;
+
+		htup_begin = (char *) PageGetItem(page, lp);
+		memset(htup_begin, MASK_MARKER, ItemIdGetLength(lp));
+	}
+}
+
 /*
  * Mask a heap page before performing consistency checks on it.
  */
@@ -9425,6 +9455,7 @@ heap_mask(char *pagedata, BlockNumber blkno)
 
 	mask_page_hint_bits(page);
 	mask_unused_space(page);
+	mask_dead_tuples(page);
 
 	for (off = 1; off <= PageGetMaxOffsetNumber(page); off++)
 	{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c2f5343dac..f66e5d0260 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,6 +43,9 @@ typedef struct
 	bool		marked[MaxHeapTuplesPerPage + 1];
 } PruneState;
 
+/* Parameter for target deletion strategy in lazy vacuum */
+double target_index_deletion_factor = 0.01;
+
 /* Local functions */
 static int heap_prune_chain(Relation relation, Buffer buffer,
  OffsetNumber rootoffnum,
@@ -405,10 +408,9 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 			if (HeapTupleSatisfiesVacuum(&tup, OldestXmin, buffer)
 == HEAPTUPLE_DEAD && !HeapTupleHeaderIsHotUpdated(htup))
 			{
-heap_prune_record_unused(prstate, rootoffnum);
+heap_prune_record_dead(prstate, rootoffnum);
 HeapTupleHeaderAdvanceLatestRemovedXid(htup,
 	   &prstate->latestRemovedXid);
-ndeleted++;
 			}
 
 			/* Nothing more to do */
@@ -580,8 +582,10 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 		 */
 		for (i = 1; (i < nchain) && (chainitems[i - 1] != latestdead); i++)
 		{
-			heap_prune_record_unused(prstate, chainitems[i]);
+			if (chainitems[i] == latestdead)
+continue;
 			ndeleted++;
+			heap_prune_record_unused(prstate, chainitems[i]);
 		}
 
 		/*
@@ -598,9 +602,28 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 		 * redirect the root to the correct chain member.
 		 */
 		if (i >= nchain)
+		{
+			if (rootoffnum != latestdead)
+			{
+if (ItemIdIsNormal(rootlp))
+	heap_prune_record_unused(prstate, latestdead);
+else
+{
+	/*
+	 * We allow overlapping of redirected and dead 

Re: generating bootstrap entries for array types

2018-09-20 Thread John Naylor
On 9/21/18, Tom Lane  wrote:
> I did some other minor hacking (mostly, fixing the documentation)
> and pushed it.

Thank you.

-John Naylor



Re: Strange failure in LWLock on skink in REL9_5_STABLE

2018-09-20 Thread Thomas Munro
On Fri, Sep 21, 2018 at 3:21 PM Andres Freund  wrote:
> I'm quite suspicious of the logic around:
>
> 	/*
> 	 * If we received a query cancel or termination signal, we will have
> 	 * EINTR set here.  If the caller said that errors are OK here, check
> 	 * for interrupts immediately.
> 	 */
> 	if (errno == EINTR && elevel >= ERROR)
> 		CHECK_FOR_INTERRUPTS();
>
> because it seems far from guaranteed to do anything meaningful as I
> don't see a guarantee that interrupts are active at that point (e.g. it
> seems quite reasonable to hold an lwlock while resizing).

Right, in that case CFI does nothing and you get the following
ereport() instead, so the user sees an unsightly "Interrupted system
call" message (or however your strerror() spells it).  That would
actually happen in the dsa.c case for example (something that should
be improved especially now that dsm_create() is so slow on Linux,
independently of all this).  That's probably not a great user
experience, but I'm not sure if the fact that it "works around" the
suppression of interrupts while LWLocks are held by converting them
into errors is a more serious problem or not.  The caller has to be
ready for errors to be raised here in any case.

> Afaict that might cause problems at a later stage, because at that point
> we've not adjusted the actual mapping, but *have* ftruncate()ed it. If
> there's actual data in the mapping, that certainly could cause trouble.
>
> In fact, while this commit has expanded the size of the problem, I fail
> to see how the error handling for resizing is correct. It's fine to fail
> in the ftruncate() itself - at that point no changes have been made -,
> but I don't think it's currently ok for posix_fallocate() to ever error
> out.

Right, independently of this commit, dsm_resize() might have done the
actual truncation when the error is reported.  That's not good if the
caller plans to do anything other than abandon ship.  None of this
applies to dsm_create() though, which uses the same code path but
knows how to clean up.  There may be ways to fix the dsm_resize() path
based on the observation that you don't need to fallocate() if you
made the mapping smaller, and if you made it bigger then you could
always undo that on error (or not) and you haven't thrown away any
data.  Hmm... I note that there are actually no callers of
dsm_resize(), and it's not implemented on Windows or SystemV.

--
Thomas Munro
http://www.enterprisedb.com



Re: Problem while setting the fpw with SIGHUP

2018-09-20 Thread Michael Paquier
On Wed, Sep 19, 2018 at 02:20:34PM +0900, Michael Paquier wrote:
> On Tue, Sep 18, 2018 at 04:15:42PM +0900, Kyotaro HORIGUCHI wrote:
>> My latest patch tries to remove the window by imposing all
>> responsibility to apply config file changes to the shared FPW
>> flag on the checkpointer. RecoveryInProgress() is changed to be
>> called prior to UpdateFullPageWrites on the way doing that.
> 
> I still need to look at that in details.  That may be better than what I
> am proposing.  At quick glance what I propose is more simple, and does
> not need enforcing a checkpoint using SIGINT.

I have finally looked at the patch set from Horiguchi-san here:
https://www.postgresql.org/message-id/20180828.193436.253621888.horiguchi.kyot...@lab.ntt.co.jp

And actually this is very close to what my proposal does, except for a
couple of caveats.  Here are my notes:
1) The issue with palloc() happening in critical sections goes away, but
at the cost of making the logic behind UpdateFullPageWrites() more
complicated than necessary.  With the proposed patch,
UpdateFullPageWrites() never gets called by the checkpointer until the
startup process has done its business.  I would have thought that keeping
the check on RecoveryInProgress() directly in UpdateFullPageWrites() keeps
the logic simpler.  That's actually what my proposal does.  With your
patch, it would even be possible to add an assertion so that this never
gets called while in recovery.
2) At the end of recovery, if there is a crash before the checkpointer
is able to update the shared parameters it needs to work on, then no
XLOG_PARAMETER_CHANGE record is generated.  This is not a problem for
normal cases, but there is one scenario where it is: at the end of
recovery, if the bgwriter is not started, then the startup process
creates a checkpoint by itself, and it would miss the record generation.
3) SIGINT is abused, in such a way that the checkpointer may generate
two checkpoints where only one is needed post-recovery.
4) SyncRepUpdateSyncStandbysDefined() would get called even without
SIGHUP being received.  This also feels like a future trap waiting to
bite.

When I look at your patch, I actually see the same kind of logic as what
I propose, which can be summarized in two points, and that's a good thing:
a) Avoid the allocation in the critical section.
b) Avoid having two processes call UpdateFullPageWrites at the same time.

Now the points mentioned above make what you are proposing weaker in my
opinion, and 2) is an actual bug, a side effect of the proposed patch.
--
Michael


signature.asc
Description: PGP signature


Re: Strange failure in LWLock on skink in REL9_5_STABLE

2018-09-20 Thread Tom Lane
Thomas Munro  writes:
> ... There may be ways to fix the dsm_resize() path
> based on the observation that you don't need to fallocate() if you
> made the mapping smaller, and if you made it bigger then you could
> always undo that on error (or not) and you haven't thrown away any
> data.  Hmm... I note that there are actually no callers of
> dsm_resize(), and it's not implemented on Windows or SystemV.

Why would we fix it rather than just removing it?

regards, tom lane



Re: Strange failure in LWLock on skink in REL9_5_STABLE

2018-09-20 Thread Andres Freund
On 2018-09-20 23:46:54 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I've made sure that the setting is off now however, I'm not
> > trusting the related logic very much...
> 
> Was it on before?

Yes.



Re: Strange failure in LWLock on skink in REL9_5_STABLE

2018-09-20 Thread Thomas Munro
On Fri, Sep 21, 2018 at 4:06 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > ... There may be ways to fix the dsm_resize() path
> > based on the observation that you don't need to fallocate() if you
> > made the mapping smaller, and if you made it bigger then you could
> > always undo that on error (or not) and you haven't thrown away any
> > data.  Hmm... I note that there are actually no callers of
> > dsm_resize(), and it's not implemented on Windows or SystemV.

Erm, actually you probably only need to do ftruncate() *or*
posix_fallocate(), depending on the direction of the resize.  Doing
both is redundant and introduces this theoretical hazard (in practice
I'd be surprised if fallocate() really can fail after you shrank a
file that was already fully allocated).

> Why would we fix it rather than just removing it?

I assumed we wouldn't remove an extern C function extension code
somewhere might use.  Though admittedly I'd be surprised if anyone
used this one.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Changing the setting of wal_sender_timeout per standby

2018-09-20 Thread Michael Paquier
On Fri, Sep 21, 2018 at 02:28:07AM +, Tsunakawa, Takayuki wrote:
> I find it more user friendly to include a description somewhere that
> the user can tune the timeout per standby, like I added a tip in the
> description of wal_sender_timeout.  I'm afraid users won't know
> whether and how to tune the setting per standby, as libpq's options
> parameter doesn't seem well-known in my experience. 

But that does not apply to this single parameter, no?  I would think
that a section on recovery.conf would be a better fit.  I can see that
the patch I proposed up-thread could be more precise though, so why not
add an extra paragraph?  Here is an idea:
"For a cluster distributed across multiple geographic locations, using
a different value per location brings more flexibility in the cluster
management.  A smaller value is useful for faster failure detection with
a standby having a low connection latency, and a larger value helps in
judging better the health of a standby if located in a remote location,
with a longer connection latency."
--
Michael


signature.asc
Description: PGP signature


Re: Strange failure in LWLock on skink in REL9_5_STABLE

2018-09-20 Thread Tom Lane
Thomas Munro  writes:
> On Fri, Sep 21, 2018 at 4:06 PM Tom Lane  wrote:
>> Why would we fix it rather than just removing it?

> I assumed we wouldn't remove an extern C function extension code
> somewhere might use.  Though admittedly I'd be surprised if anyone
> used this one.

Unless it looks practical to support this behavior in the Windows
and SysV cases, I think we should get rid of it rather than expend
effort on supporting it for just some platforms.

regards, tom lane



Re: Proposal for Signal Detection Refactoring

2018-09-20 Thread Michael Paquier
On Thu, Sep 20, 2018 at 03:08:34PM +0200, Chris Travers wrote:
> So here's a small patch.  I will add it for the next commit fest unless
> anyone has any reason I shouldn't.

-   return InterruptPending && (QueryCancelPending || ProcDiePending);
+   return PENDING_INTERRUPT_LEVEL() >= QUERY_CANCEL;

This is pretty similar to lock levels, where it is pretty hard to impose a
strict monotone hierarchy on such interruptions.  The new code does not
seem like an improvement either: in the code mentioned above, for example,
you can see directly which actions are involved, which is not the case
with the new code style.
--
Michael


signature.asc
Description: PGP signature


Re: Proposal for Signal Detection Refactoring

2018-09-20 Thread Andres Freund
On 2018-09-21 13:46:11 +0900, Michael Paquier wrote:
> On Thu, Sep 20, 2018 at 03:08:34PM +0200, Chris Travers wrote:
> > So here's a small patch.  I will add it for the next commit fest unless
> > anyone has any reason I shouldn't.
> 
> -   return InterruptPending && (QueryCancelPending || ProcDiePending);
> +   return PENDING_INTERRUPT_LEVEL() >= QUERY_CANCEL;
> 
> This is pretty similar to lock levels, where it is pretty hard to put a
> strict monotone hierarchy when it comes to such interruptions.

+1

Greetings,

Andres Freund



Re: transction_timestamp() inside of procedures

2018-09-20 Thread Michael Paquier
On Thu, Sep 20, 2018 at 07:40:40PM -0400, Bruce Momjian wrote:
> This surprised me since I expected a new timestamp after commit.  Is
> this something we want to change or document?  Are there other
> per-transaction behaviors we should adjust?

I don't quite follow your argument here.  clock_timestamp is known to be
volatile, while the two others are stable, so its value can change
within a transaction. 
--
Michael


signature.asc
Description: PGP signature


Re: transction_timestamp() inside of procedures

2018-09-20 Thread Andres Freund
Hi,

On 2018-09-21 13:55:36 +0900, Michael Paquier wrote:
> On Thu, Sep 20, 2018 at 07:40:40PM -0400, Bruce Momjian wrote:
> > This surprised me since I expected a new timestamp after commit.  Is
> > this something we want to change or document?  Are there other
> > per-transaction behaviors we should adjust?
> 
> I don't quite follow your argument here.  clock_timestamp is known to be
> volatile, while the two others are stable, so its value can change
> within a transaction.

Isn't the point that transaction_timestamp() does *not* currently change
its value, even though the transaction (although not the outermost
statement) has finished?

I think Bruce has quite the point here.

Greetings,

Andres Freund



Re: Unclear error message

2018-09-20 Thread Michael Paquier
On Thu, Sep 20, 2018 at 09:45:09PM +0200, Laurenz Albe wrote:
> That message is wrong, because "rel" and not "pkrel" is the partitioned table.
> I think it should be
> 
> ereport(ERROR,
>         (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>          errmsg("foreign key cannot be defined on ONLY \"%s\" for a partitioned table",
>                 RelationGetRelationName(rel))));

Hmm...  There are no test cases for this error message, nor for the one
below "cannot add NOT VALID foreign key to relation foo", which is a bad
idea.

The referenced relation has to be RELKIND_RELATION.  Here is another
idea for the error message:
cannot use ONLY on partitioned table "foo" with foreign key
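
A test case would presumably be something along these lines (table names
made up):
    CREATE TABLE fkpart_pk (a int PRIMARY KEY);
    CREATE TABLE fkpart (a int) PARTITION BY LIST (a);
    ALTER TABLE ONLY fkpart ADD FOREIGN KEY (a) REFERENCES fkpart_pk;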
--
Michael


signature.asc
Description: PGP signature


Re: Function to promote standby servers

2018-09-20 Thread Michael Paquier
On Thu, Sep 20, 2018 at 07:59:02AM +0200, Laurenz Albe wrote:
> Providing SQL access for administrative tasks seems to be a
> good thing, see ALTER SYSTEM and pg_reload_conf().
> 
> In that vein, I propose a function pg_promote() to promote
> physical standby servers.

No fundamental issues from me regarding the concept of being able to
trigger a promotion remotely, so +1.  Do we want this capability as well
for fallback_promote?  My gut tells me no, and that we ought to drop
this option at some point in the future, still that's worth mentioning.

You may want to move PROMOTE_SIGNAL_FILE in a position where xlogfuncs.c
could use it.
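
From SQL, the call would presumably be as simple as (sketch only; the
exact signature is up to the patch):
    SELECT pg_promote();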
--
Michael


signature.asc
Description: PGP signature


Re: transction_timestamp() inside of procedures

2018-09-20 Thread Michael Paquier
On Thu, Sep 20, 2018 at 10:12:06PM -0700, Andres Freund wrote:
> Isn't the point that transaction_timestamp() does *not* currently change
> its value, even though the transaction (although not the outermost
> statement) has finished?

Ouch, yes.  I see the point now.  Indeed that's strange to not have a
new transaction timestamp after commit within the DO block..

I need a break of a couple of minutes.
--
Michael


signature.asc
Description: PGP signature


RE: Changing the setting of wal_sender_timeout per standby

2018-09-20 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz]
> But that does not apply to this single parameter, no?  I would think that
> a section in recovery.conf is more adapted.  I can see that the patch I
> proposed up-thread could be more precise though, so why not adding at an
> extra paragraph?  Here is an idea:
> "For a cluster distributed across multiple geographic locations, using a
> different value per location brings more flexibility in the cluster
> management.  A smaller value is useful for faster failure detection with
> a standby having a low connection latency, and a larger value helps
> judging better the health of a standby if located in a remote location,
> with a longer connection latency."

That paragraph looks cool, thanks.  Regarding where the paragraph should be, 
there are three candidate locations:

(1) 19.6.1. Sending Servers, wal_sender_timeout description
https://www.postgresql.org/docs/devel/static/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SENDER

(2) 26.2.4. Setting Up a Standby Server
https://www.postgresql.org/docs/devel/static/warm-standby.html#STANDBY-SERVER-SETUP

(3) 27.3. Standby Server Settings, primary_conninfo description
https://www.postgresql.org/docs/devel/static/standby-settings.html

I think all of these are almost equally good.  I chose (1) at first, and you 
chose (3).  But (2) may be the best, because it's the natural place the user 
will see when configuring the standby, and it already contains an example of 
recovery.conf.  We can add "options=''-c wal_sender_timeout=5000''" or 
something in that example.  I'm OK with any place, but I recommend adding an 
example of how to specify wal_sender_timeout in primary_conninfo, because 
libpq's options parameter may not be well known, and it's a bit difficult to 
figure out the need to enclose the value in doubled single quotes.
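
For example, the recovery.conf entry could end up looking like this (host 
and user are placeholders; note the doubled single quotes around the 
options value):
    primary_conninfo = 'host=primary.example.com user=replicator options=''-c wal_sender_timeout=5000'''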


Regards
Takayuki Tsunakawa









Re: [PATCH] Tab completion for ALTER DATABASE … SET TABLESPACE

2018-09-20 Thread Andres Freund
On 2018-09-20 16:19:26 -0700, Andres Freund wrote:
> On 2018-09-20 19:03:16 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2018-09-20 18:38:36 -0400, Tom Lane wrote:
> > >> hmmm ... but even with variadic, C's macro facility is so weak that
> > >> I'm not sure we can reimplement these with it.  What would the
> > >> expansion look like?
> > 
> > > There's a dirty hack to count arguments in vararg macros:
> > > https://groups.google.com/d/msg/comp.std.c/d-6Mj5Lko_s/5fW1bP6T3RIJ
> > 
> > Doesn't seem to help for this case.  What we really want is to expand
> > a given pattern once for each variadic argument, and I don't see how
> > to get there from here.
> 
> Depends on whether your goal is to simplify *using* the macro, or the
> infrastructure for the macro.  Afaict it should be relatively
> straightforward to use a, possibly simplified, macro like referenced
> above to have TailMatches(...), TailMatchesCS(...), Matches(...) that
> then expand to the current *N macros.
> 
> 
> > Although maybe I'm thinking too much inside-the-box.  The expansion
> > doesn't necessarily have to be identical to the code the macros
> > generate today.  In fact, that code is kinda bulky.  I wonder if
> > we could go over to something involving a variadic function, or
> > maybe an array of string-pointer constants?
> 
> Yea, that might be a way to simplify both the macros and the use of the
> macros.  Assuming we have something like PP_NARG, ISTM it should be
> relatively straightforward to define something like
> 
> #define Matches(...) CheckMatchesFor(PP_NARG(__VA_ARGS__), __VA_ARGS__)
> 
> and then have a CheckMatchesFor() first check previous_words_count, and
> then just have a simple for loop through previous_words[] - afaict the
> number of arguments ought to suffice to make that possible?

Here's a very quick-and-dirty implementation of this approach. Some very
very brief testing seems to indicate it works, although I'm sure not
perfectly.

The current duplication of the new functions doing the actual checking
(like CheckMatchesFor(), CheckTailMatchesFor()) would probably need to
be reduced. But I don't want to invest actual time before we decide that
this could be something we'd actually want to pursue.

Greetings,

Andres Freund
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 47a1a19688f..21a4a807d93 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1220,6 +1220,110 @@ ends_with(const char *s, char c)
 	return (length > 0 && s[length - 1] == c);
 }
 
+static bool
+CheckTailMatchesFor(int previous_words_count, char **previous_words, int narg, ...)
+{
+	va_list		args;
+
+	if (previous_words_count < narg)
+		return false;
+
+	va_start(args, narg);
+
+	for (int argno = 0; argno < narg; argno++)
+	{
+		const char *arg = va_arg(args, const char *);
+
+		if (!word_matches(arg, previous_words[narg - argno - 1]))
+		{
+			va_end(args);
+			return false;
+		}
+	}
+
+	va_end(args);
+
+	return true;
+}
+
+static bool
+CheckMatchesFor(int previous_words_count, char **previous_words, int narg, ...)
+{
+	va_list		args;
+
+	if (previous_words_count != narg)
+		return false;
+
+	va_start(args, narg);
+
+	for (int argno = 0; argno < narg; argno++)
+	{
+		const char *arg = va_arg(args, const char *);
+
+		if (!word_matches(arg, previous_words[narg - argno - 1]))
+		{
+			va_end(args);
+			return false;
+		}
+	}
+
+	va_end(args);
+
+	return true;
+}
+
+static bool
+CheckTailMatchesCSFor(int previous_words_count, char **previous_words, int narg, ...)
+{
+	va_list		args;
+
+	if (previous_words_count < narg)
+		return false;
+
+	va_start(args, narg);
+
+	for (int argno = 0; argno < narg; argno++)
+	{
+		const char *arg = va_arg(args, const char *);
+
+		if (!word_matches_cs(arg, previous_words[narg - argno - 1]))
+		{
+			va_end(args);
+			return false;
+		}
+	}
+
+	va_end(args);
+
+	return true;
+}
+
+static bool
+CheckHeadMatchesFor(int previous_words_count, char **previous_words, int narg, ...)
+{
+	va_list		args;
+
+	if (previous_words_count < narg)
+		return false;
+
+	va_start(args, narg);
+
+	for (int argno = 0; argno < narg; argno++)
+	{
+		const char *arg = va_arg(args, const char *);
+
+		if (!word_matches(arg, previous_words[previous_words_count - (argno + 1)]))
+		{
+			va_end(args);
+			return false;
+		}
+	}
+
+	va_end(args);
+
+	return true;
+}
+
 /*
  * The completion function.
  *
@@ -1260,149 +1364,23 @@ psql_completion(const char *text, int start, int end)
 #define prev8_wd  (previous_words[7])
 #define prev9_wd  (previous_words[8])
 
+		/* Macros for matching the last N words before point, case-insensitively. */
+#define TailMatches(...) CheckTailMatchesFor(previous_words_count, previous_words, PP_NARG(__VA_ARGS__), __VA_ARGS__)
+
 	/* Macros for matching the last N words before point, case-insensitively. */
-#define TailMatches1(p1) \
-	(previous_words_count >= 1 && \
-	 word_matches(p1, prev_wd))
-
-#define TailMatches2(p2, p1) \
-	(previous_words_cou

Re: Changing the setting of wal_sender_timeout per standby

2018-09-20 Thread Michael Paquier
On Fri, Sep 21, 2018 at 05:37:42AM +, Tsunakawa, Takayuki wrote:
> I think all of these are almost equally good.  I chose (1) at first,
> and you chose (3).  But (2) may be the best, because it's the natural
> place the user will see when configuring the standby, and it already
> contains an example of recovery.conf.  We can add "options=''-c
> wal_sender_timeout=5000''" or something in that example.  I'm OK with
> anyplace, but I recommend adding how to specify wal_sender_timeout in
> primary_conninfo, because libpq's options parameter may not be
> well-konown, and it's a bit difficult to figure out the need to
> enclose the value with double single-quotes. 

I think that the description of wal_sender_timeout and its properties
should remain where the parameter is defined, so (3) is not a good
option in my opinion.  (2) has a point with the use of quotes actually,
so why not just mention options=''-c wal_sender_timeout=5000'' in the
example of recovery.conf as you suggest and call it a day, but keep the
paragraph I suggested in (1)?
--
Michael


signature.asc
Description: PGP signature


RE: Changing the setting of wal_sender_timeout per standby

2018-09-20 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz]
> I think that the description of wal_sender_timeout and its properties should
> remain where the parameter is defined, so (3) is not a good option in my
> opinion.  (2) has a point with the use of quotes actually, so why not just
> mention options=''-c wal_sender_timeout=5000'' in the example of
> recovery.conf as you suggest and call it a day, but keep the paragraph I
> suggested in (1)?

Agreed.  Sorry for making you spend this much time on such a tiny patch...


Regards
Takayuki Tsunakawa





Re: Pluggable Storage - Andres's take

2018-09-20 Thread Haribabu Kommi
On Mon, Sep 10, 2018 at 5:42 PM Haribabu Kommi 
wrote:

> On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi 
> wrote:
>
>>
>> On Tue, Sep 4, 2018 at 10:33 AM Andres Freund  wrote:
>>
>>> Hi,
>>>
>>> Thanks for the patches!
>>>
>>> On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:
>>> > I found couple of places where the zheap is using some extra logic in
>>> > verifying
>>> > whether it is zheap AM or not, based on that it used to took some extra
>>> > decisions.
>>> > I am analyzing all the extra code that is done, whether any callbacks
>>> can
>>> > handle it
>>> > or not? and how? I can come back with more details later.
>>>
>>> Yea, I think some of them will need to stay (particularly around
>>> integrating undo) and som other ones we'll need to abstract.
>>>
>>
>> OK. I will list all the areas that I found with my observation of how to
>> abstract or leaving it and then implement around it.
>>
>
> The following are the change where the code is specific to checking whether
> it is a zheap relation or not?
>
> Overall I found that It needs 3 new API's at the following locations.
> 1. RelationSetNewRelfilenode
> 2. heap_create_init_fork
> 3. estimate_rel_size
> 4. Facility to provide handler options like (skip WAL and etc).
>

While porting the Fujitsu in-memory columnar store on top of pluggable
storage, I found that the callers of "heap_beginscan" expect that the
returned data always contains all the records.

For example, in a sequential scan the heap returns the slot with the
tuple, or with a value array of all the columns, and then the data gets
filtered and the unnecessary columns are removed later by projection.
This works fine for row-based storage.  For columnar storage, if the
storage knows that the upper layers need only particular columns, it can
directly return just the specified columns and there is no need for the
projection step.  This would also help columnar storage return the proper
columns in a faster way.

Would it be good to pass the plan to the storage, so that it can find out
which columns need to be returned?  And if the projection can be handled
in the storage itself for some scenarios, the callers need to be informed
that there is no need to perform the projection again.

Comments?

Regards,
Haribabu Kommi
Fujitsu Australia


impact of SPECTRE and MELTDOWN patches

2018-09-20 Thread ROS Didier
Hi,
   I would like to know what the recommendations are for addressing the 
performance impact after applying the SPECTRE and MELTDOWN patches.
   Do we have to add more CPUs?

   NB: I have tested on one of our production databases and I see 
an impact of ~25%... Do you see similar results?

   Thanks in advance.

Best Regards


Didier ROS
Expertise SGBD
DS IT/IT DMA/Solutions Groupe EDF/Expertise Applicative - SGBD
Nanterre Picasso - E2 565D (aile nord-est)
32 Avenue Pablo Picasso
92000 Nanterre
didier@edf.fr









Re: shared-memory based stats collector

2018-09-20 Thread Antonin Houska
I've spent some time reviewing this version.

Design
--

1. Even with your patch the stats collector still uses a UDP socket to
   receive data. Now that the shared memory API is there, shouldn't the
   messages be sent via shared memory queue? [1] That would increase the
   reliability of message delivery.

   I can actually imagine backends inserting data into the shared hash tables
   themselves, but that might make them wait if the same entries are accessed
   by another backend. It should be much cheaper just to insert message into
   the queue and let the collector process it. In future version the collector
   can launch parallel workers so that writes by backends do not get blocked
   due to full queue.

2. I think the access to the shared hash tables introduces more contention
   than necessary. For example, pgstat_recv_tabstat() retrieves "dbentry" and
   leaves the containing hash table partition locked *exclusively* even if it
   changes only the containing table entries, while changes of the containing
   dbentry are done.

   It appears that the shared hash tables are only modified by the stats
   collector. The unnecessary use of the exclusive lock might be a bigger
   issue in the future if the stats collector will use parallel
   workers. Monitoring functions and autovacuum are affected by the locking
   now.

   (I see that it's not trivial to get a just-created entry locked in shared
   mode: it may need a loop in which we release the exclusive lock and acquire
   the shared lock, unless the entry was already removed.)

3. Data in both shared_archiverStats and shared_globalStats is mostly accessed
   w/o locking. Is that ok? I'd expect the StatsLock to be used for these.

Coding
--

* git apply v4-0003-dshash-based-stats-collector.patch needed manual
  resolution of one conflict.

* pgstat_quickdie_handler() appears to be the only "quickdie handler" that
  calls on_exit_reset(), although the comments are almost copy & pasted from
  such a handler of other processes. Can you please explain what's specific
  about pgstat.c?

* the variable name "area" would be sufficient if it was local to some
  function, otherwise I think the name is too generic.

* likewise db_stats is too generic for a global variable. How about
  "snapshot_db_stats_local"?

* backend_get_db_entry() passes 0 for handle to snapshot_statentry(). How
  about DSM_HANDLE_INVALID ?

* I only see one call of snapshot_statentry_all() and it receives 0 for
  handle. Thus the argument can be removed and the function does not have to
  attach / detach to / from the shared hash table.

* backend_snapshot_global_stats() switches to TopMemoryContext before it calls
  pgstat_attach_shared_stats(), but the latter takes care of the context
  itself.

* pgstat_attach_shared_stats() - header comment should explain what the return
  value means.

* reset_dbentry_counters() does more than just resetting the counters. Name
  like initialize_dbentry() would be more descriptive.

* typos:

** backend_snapshot_global_stats(): "liftime" -> "lifetime"

** snapshot_statentry(): "entriy" -> "entry"

** backend_get_func_etnry(): "onshot" -> "oneshot"

** snapshot_statentry_all(): "Returns a local hash contains ..." -> 
"Returns a local hash containing ..."


[1] 
https://www.postgresql.org/message-id/20180711000605.sqjik3vqe5opq...@alap3.anarazel.de

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at



Re: [patch] Support LLVM 7

2018-09-20 Thread Christoph Berg
Re: Andres Freund 2018-09-20 <20180919222600.myk5nec6unhrj...@alap3.anarazel.de>
> > I did an upload of postgresql-11 beta3 with llvm 7 enabled on the
> > architectures where it is available (or supposed to become available),
> > that is, on !alpha !hppa !hurd-i386 !ia64 !kfreebsd-amd64 !kfreebsd-i386 
> > !m68k !sh4.
> 
> Cool.  No idea why kfreebsd is on that list, but that's not really a
> postgres relevant concern...

Because https://buildd.debian.org/status/package.php?p=llvm-toolchain-7
 -> cmake missing on kfreebsd-*
 -> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=905138
 -> maintainer says this should be fixed by cmake upstream first

> > powerpc (the old 32-bit variant) has a lot of "server closed the
> > connection unexpectedly" in the regression logs, and one SIGILL:
> > 
> > 2018-09-15 10:49:25.052 UTC [26458] LOG:  server process (PID 26527) was 
> > terminated by signal 4: Illegal instruction
> > 2018-09-15 10:49:25.052 UTC [26458] DETAIL:  Failed process was running: 
> > SELECT '' AS tf_12, BOOLTBL1.*, BOOLTBL2.*
> >FROM BOOLTBL1, BOOLTBL2
> >WHERE BOOLTBL2.f1 <> BOOLTBL1.f1;
> > 2018-09-15 10:49:25.052 UTC [26458] LOG:  terminating any other active 
> > server processes
> 
> Hm. Is there any chance to get a backtrace for this one?  This could,
> although I think less likely so, also be a postgres issue
> (e.g. generating code for the wrong microarch).

I'll see if I can find a porterbox to get a backtrace.


In the meantime, there's a third architecture where llvm itself
compiled, but explodes with PG11 - x32:

  SELECT '' AS tf_12, BOOLTBL1.*, BOOLTBL2.*
 FROM BOOLTBL1, BOOLTBL2
 WHERE BOOLTBL2.f1 <> BOOLTBL1.f1;
! FATAL:  fatal llvm error: Cannot select: 0x57a7ae60: ch,glue = X86ISD::CALL 
0x57a7add0, 0x57a7af38, Register:i32 $edi, RegisterMask:Untyped, 0x57a7add0:1
!   0x57a7af38: i32 = X86ISD::Wrapper TargetGlobalAddress:i32 0
! 0x57a7aef0: i32 = TargetGlobalAddress 0
!   0x57a7ad88: i32 = Register $edi
!   0x57a7ae18: Untyped = RegisterMask
!   0x57a7add0: ch,glue = CopyToReg 0x57a7ad40, Register:i32 $edi, 0x57a7acb0
! 0x57a7ad88: i32 = Register $edi
! 0x57a7acb0: i32,ch = CopyFromReg 0x57a367ac, Register:i32 %27
!   0x57a7ac68: i32 = Register %27
! In function: evalexpr_0_0
! server closed the connection unexpectedly
!   This probably means the server terminated abnormally
!   before or while processing the request.
! connection to server was lost

https://buildd.debian.org/status/fetch.php?pkg=postgresql-11&arch=x32&ver=11~beta3-2&stamp=1537286634&raw=0

Christoph



Re: [HACKERS] proposal: schema variables

2018-09-20 Thread Pavel Stehule
Hi

On Wed, Sep 19, 2018 at 13:23, Arthur Zakirov 
wrote:

> Hello,
>
> On Wed, Sep 19, 2018 at 10:30:31AM +0200, Pavel Stehule wrote:
> > Hi
> >
> > new update:
> >
> > I fixed pg_restore, and I cleaned a code related to transaction
> processing
> >
> > There should be a full functionality now.
>
> I reviewed a little bit the patch. I have a few comments.
>
> > pg_views Columns
>
> I think there is a typo here. It should be "pg_variable".
>

fixed


> > - ONCOMMIT_DROP   /* ON COMMIT DROP */
> > + ONCOMMIT_DROP,  /* ON COMMIT DROP */
> > } OnCommitAction;
>
> There is the extra comma here after ONCOMMIT_DROP.
>

fixed

Thank you for comments

attached updated patch




> 1 - https://www.postgresql.org/docs/current/static/sql-createtable.html
>
> --
> Arthur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
>


schema-variables-20180920-01.patch.gz
Description: application/gzip


Re: Is it possible for postgres_fdw to push down queries on co-located tables?

2018-09-20 Thread Etsuro Fujita

(2018/09/18 23:20), Jinhua Luo wrote:

Sorry, the example was not a proper one. My point is that even for a
simple case, e.g. joining two co-located tables, the planner should be
able to push the join down. Can you confirm that PostgreSQL can detect
co-located tables on the same foreign server and push down queries on
them? Could you give an actual example or point out the relevant
source code paths for reference?

(Let me clarify the context of this question, if the planner supports
co-located push-down, then it's meaningful for manual sharding via
partitioning to remote tables, where it's mostly necessary to join two
or more co-located parent tables in complex queries. If not, the
postgresql instance on which the parent tables are placed (let's say
it's a coordinator node) would be likely the bottleneck.)


You might want to check partitionwise join functionality as well, which 
we have in PG11 [1].


Best regards,
Etsuro Fujita

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f49842d1ee31b976c681322f76025d7732e860f3




Re: Size and size_t in dsa API

2018-09-20 Thread Thomas Munro
On Thu, Sep 20, 2018 at 6:28 PM Ideriha, Takeshi
 wrote:
> As a non-expert developer's opinion, I think mixing of Size and size_t makes 
> difficult to understand source code.

Agreed.  Let's change them all to size_t and back-patch that to keep
future back-patching easy.  Patch attached.
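
For context, src/include/c.h simply has "typedef size_t Size;", so the
substitution is mechanical and ABI-neutral.  A minimal illustration (the
prototypes below are hypothetical, not from the patch):

    #include <stddef.h>

    /* src/include/c.h defines Size as a plain alias for size_t: */
    typedef size_t Size;

    /*
     * Because the two names denote the same type, rewriting the first
     * prototype as the second changes neither behaviour nor ABI -- which
     * is all the attached patch does throughout dsa.c/dsa.h.
     */
    extern void *alloc_chunk_before(Size nbytes);
    extern void *alloc_chunk_after(size_t nbytes);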

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Use-size_t-consistently-in-dsa.-ch.patch
Description: Binary data


Segfault when creating partition with a primary key and sql_drop trigger exists

2018-09-20 Thread Marco Slot
We're seeing a segmentation fault when creating a partition of a
partitioned table with a primary key when there is a sql_drop trigger on
Postgres 11beta4.

We discovered it because the Citus extension creates a sql_drop trigger,
but it's otherwise unrelated to the Citus extension:
https://github.com/citusdata/citus/issues/2390

To reproduce:

CREATE OR REPLACE FUNCTION on_drop()
RETURNS event_trigger AS $ondrop$
BEGIN
  RAISE NOTICE 'drop_trigger';
END;
$ondrop$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER fail_drop_table ON sql_drop
EXECUTE PROCEDURE on_drop();

CREATE TABLE collections_list (
  key bigint,
  ts timestamptz,
  collection_id integer,
  value numeric,
  PRIMARY KEY(collection_id)
) PARTITION BY LIST ( collection_id );

CREATE TABLE collections_list_1
PARTITION OF collections_list (key, ts, collection_id, value)
FOR VALUES IN (1);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Marco


RE: Size and size_t in dsa API

2018-09-20 Thread Ideriha, Takeshi
>> As a non-expert developer's opinion, I think mixing of Size and size_t makes 
>> difficult
>to understand source code.
>
>Agreed.  Let's change them all to size_t and back-patch that to keep future
>back-patching easy.  Patch attached.

Thank you for the quick action. I'm happy now.
I confirmed that the English word "Size" in comments is kept as it is.


Takeshi Ideriha
Fujitsu Limited


Re: doc - add missing documentation for "acldefault"

2018-09-20 Thread Joe Conway
On 09/19/2018 12:30 PM, John Naylor wrote:
> On 9/19/18, Tom Lane  wrote:
>> However, I don't object to documenting any function that has its
>> own pg_description string.
> 
> Speaking of, that description string seems to have been neglected.
> I've attached a remedy for that.

Thanks, will take care of it.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: PostgreSQL 11 {Beta 4, RC1} Release: 2018-09-20

2018-09-20 Thread Jonathan S. Katz
On 9/20/18 1:13 AM, Haroon wrote:
> On Tue, 18 Sep 2018 at 21:15, Jonathan S. Katz  > wrote:
> 
> On 9/13/18 12:51 PM, Jonathan S. Katz wrote:
> > Hi,
> >
> > We are planning to have another release of PostgreSQL 11, either
> Beta 4
> > or RC1, next week on Thursday, 2018-09-20. The version will be
> > determined based on the state of the open items list[1] around the
> time
> > of stamping.
> 
> PostgreSQL 11 was stamped Beta 4[1] yesterday evening for release on
> 2018-09-20.
> 
> There is a copy of the Beta 4 release notes draft here[2]. I would
> greatly appreciate any review and feedback on them, particularly around
> technical accuracy and any omissions that should be included.
> 
> 
> Under Upgrading to PostgreSQL 11 Beta 4 section:
> 
> s/you will to use a strategy/you will need to use a strategy/

Thanks! I've made the change.

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: generating bootstrap entries for array types

2018-09-20 Thread Dagfinn Ilmari Mannsåker
Hi again John,

ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:

> Doing this in genbki.pl makes DBD::Pg lose its array type information,
> since it uses Catalog::ParseData() to get it
> (https://github.com/bucardo/dbdpg/blob/master/types.c#L439).  May I
> suggest moving gen_array_types() to Catalog.pm and calling it from
> ParseData() if the input file is pg_type.dat, so that it always returns
> complete data?

Attached is a proposed revision of this patch that does the above.  It
passes check-world, reformat_dat_file.pl still works, and DBD::Pg is
still able to get the array type information.

Thanks,

- ilmari
-- 
"I use RMS as a guide in the same way that a boat captain would use
 a lighthouse.  It's good to know where it is, but you generally
 don't want to find yourself in the same spot." - Tollef Fog Heen

>From 947e5102d37219a3b008272be2caae3151bbd641 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Thu, 20 Sep 2018 12:43:35 +0100
Subject: [PATCH v2] Generate bootstrap entries for array types

Add a new metadata field called 'array_type_oid', which instructs
Catalog::ParseData to generate an array type from the given element type.
This allows most array type entries to be deleted from pg_type.dat.
The lone exception is _record, since it's a pseudotype.
---
 src/backend/catalog/Catalog.pm   |  57 +++
 src/backend/catalog/genbki.pl|   3 +-
 src/include/catalog/genbki.h |   2 +
 src/include/catalog/pg_type.dat  | 471 ---
 src/include/catalog/pg_type.h|  24 +-
 src/include/catalog/reformat_dat_file.pl |   2 +-
 6 files changed, 154 insertions(+), 405 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index ae5b499b6a..12b142928a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -191,6 +191,10 @@ sub ParseHeader
 	{
 		$column{default} = $1;
 	}
+	elsif ($attopt =~ /BKI_ARRAY_TYPE\(['"]?([^'"]+)['"]?\)/)
+	{
+		$column{array} = $1;
+	}
 	elsif ($attopt =~ /BKI_LOOKUP\((\w+)\)/)
 	{
 		$column{lookup} = $1;
@@ -290,6 +294,10 @@ sub ParseData
 		}
 	}
 	close $ifd;
+
+	# Generate array types unless we're just reformatting the file
+	Catalog::GenArrayTypes($schema, $data)
+		if $catname eq 'pg_type' and !$preserve_formatting;
 	return $data;
 }
 
@@ -448,6 +456,8 @@ sub FindAllOidsFromHeaders
 			foreach my $row (@$catdata)
 			{
 push @oids, $row->{oid} if defined $row->{oid};
+push @oids, $row->{array_type_oid}
+	if defined $row->{array_type_oid};
 			}
 		}
 
@@ -464,4 +474,51 @@ sub FindAllOidsFromHeaders
 	return \@oids;
 }
 
+# If the type specifies an array type OID, generate an entry for the array
+# type using values from the element type, plus some values that are needed
+# for all array types.
+sub GenArrayTypes
+{
+	my $pgtype_schema = shift;
+	my $types = shift;
+	my @array_types;
+
+	foreach my $elem_type (@$types)
+	{
+		next if !exists $elem_type->{array_type_oid};
+		my %array_type;
+
+		# Specific values derived from the element type.
+		$array_type{oid} = $elem_type->{array_type_oid};
+		$array_type{typname} = '_' . $elem_type->{typname};
+		$array_type{typelem} = $elem_type->{typname};
+
+		# Arrays require INT alignment, unless the element type requires
+		# DOUBLE alignment.
+		$array_type{typalign} = $elem_type->{typalign} eq 'd' ? 'd' : 'i';
+
+		# Fill in the rest of the values.
+		foreach my $column (@$pgtype_schema)
+		{
+			my $attname = $column->{name};
+
+			# Skip if we already set it above.
+			next if defined $array_type{$attname};
+
+			# If we have a value needed for all arrays, use it, otherwise
+			# copy the value from the element type.
+			if (defined $column->{array})
+			{
+$array_type{$attname} = $column->{array};
+			}
+			else
+			{
+$array_type{$attname} = $elem_type->{$attname};
+			}
+		}
+		push @array_types, \%array_type;
+	}
+	push @$types, @array_types;
+}
+
 1;
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 9be51d28b0..053d1ee00d 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -395,7 +395,8 @@ EOM
 			  if $key eq "oid"
 			  || $key eq "oid_symbol"
 			  || $key eq "descr"
-			  || $key eq "line_number";
+			  || $key eq "line_number"
+			  || $key eq "array_type_oid";
 			die sprintf "unrecognized field name \"%s\" in %s.dat line %s\n",
 			  $key, $catname, $bki_values{line_number}
 			  if (!exists($attnames{$key}));
diff --git a/src/include/catalog/genbki.h b/src/include/catalog/genbki.h
index b1e2cbdb62..a5fc03f3f4 100644
--- a/src/include/catalog/genbki.h
+++ b/src/include/catalog/genbki.h
@@ -34,6 +34,8 @@
 #define BKI_FORCE_NOT_NULL
 /* Specifies a default value for a catalog field */
 #define BKI_DEFAULT(value)
+/* Specifies a value needed for array types */
+#define BKI_ARRAY_TYPE(value)
 /* Indicates how to perform name lookups for

impact of SPECTRE and MELTDOWN patches

2018-09-20 Thread ROS Didier
Hi
   I would like to know what the recommendations are for dealing with the
performance impact after applying the SPECTRE and MELTDOWN patches.
   Do we have to add more CPUs?

   NB: I have tested on one of our production databases and I see
an impact of ~25%... Have you seen similar results?
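
A quick way to put a number on the part of the cost that comes from the
Meltdown/KPTI mitigation is a syscall micro-benchmark run on a kernel booted
with and without the mitigations (the current status is visible under
/sys/devices/system/cpu/vulnerabilities/).  This is only a rough sketch, not
an official recommendation:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int
    main(void)
    {
        const long      iters = 5 * 1000 * 1000;
        struct timespec start, end;
        double          elapsed;
        long            i;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (i = 0; i < iters; i++)
            syscall(SYS_getpid);    /* force a real kernel entry each time */
        clock_gettime(CLOCK_MONOTONIC, &end);

        elapsed = (end.tv_sec - start.tv_sec) +
                  (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("%.1f ns per syscall\n", elapsed / iters * 1e9);
        return 0;
    }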

   Thanks in advance.

Best Regards


Didier ROS
Expertise SGBD
DS IT/IT DMA/Solutions Groupe EDF/Expertise Applicative - SGBD
Nanterre Picasso - E2 565D (aile nord-est)
32 Avenue Pablo Picasso
92000 Nanterre
didier@edf.fr









This message and any attachments (the 'Message') are intended solely for the 
addressees. The information contained in this Message is confidential. Any use 
of information contained in this Message not in accord with its purpose, any 
dissemination or disclosure, either whole or partial, is prohibited except 
formal approval.

If you are not the addressee, you may not copy, forward, disclose or use any 
part of it. If you have received this message in error, please delete it and 
all copies from your system and notify the sender immediately by return message.

E-mail communication cannot be guaranteed to be timely secure, error or 
virus-free.



Re: when set track_commit_timestamp on, database system abort startup

2018-09-20 Thread Masahiko Sawada
On Wed, Sep 19, 2018 at 12:12 PM, Kyotaro HORIGUCHI
 wrote:
> At Sat, 15 Sep 2018 19:26:39 +0900, Masahiko Sawada  
> wrote in 
>> >> To fix that maybe we can disable commitTs if
>> >> controlFile->track_commit_timestamp == false and the
>> >> track_commit_timestamp == true even in crash recovery.
>> >
>> > Hmm, so keep it off while crash recovery runs, and once it's out of that
>> > then enable it automatically?
>>
>> Yes. The attached patch changes it to check
>> controlFile->track_commit_timestamp even the crash recover case. If
>> track_commit_timestamp is set to true in the config file, it's enabled
>> at end of the recovery.
>>
>> > That might work -- by definition we don't
>> > care about the commit TSs of the transaction replayed during crash
>> > recovery, since they were executed in the primary that didn't have
>> > commitTS enable anyway.
>> >
>> > It seems like the first thing we need is TAP cases that reproduce these
>> > two crash scenarios.
>>
>> I attached TAP test that reproduces this issue. We can reproduce it
>> even with single server; making postgres replay a commit WAL in the
>> crash recovery after consumed transactions and enabled
>> track_commit_timestamp.
>
> The fix looks good to me.  The TAP test works fine.

Thank you for looking at this patch.

>
> In the TAP test:
>
> 
> The test script lacks a general description about its objective.
>
> 
> +$node->append_conf('postgresql.conf',
> + "track_commit_timestamp = off");
> +$node->start;
> +
> +# When we start firstly from the initdb the PARAMETER_CHANGES
> +# is emitted at end of the recovery, which disables the
> +# track_commit_timestamp if the crash recovery replay that
> +# WAL. Therefore we restart the server so that we can recovery
> +# from the point where doesn't contain that WAL.
> +$node->restart;
>
> The first startup actually doesn't emit a PARAMETER_CHAGES. If
> track_commit_timestamp were set to on, we get one. However, I
> agree that it is reasonable to eliminate the possiblity of being
> affected by the record. How about something like this?
>
> +# We don't want to replay PARAMETER_CHANGES record while the
> +# crash recovery test below. It is not expected to be emitted
> +# thus far, but we restart the server to get rid of it just in
> +# case.
>
>
> 
> +# During the crash recovery we replay the commit WAL that sets
> +# the commit timestamp to a new page.
> +$node->start;
>
> The comment is mentioning the unexpected symptom. Shouldn't it be
> the desired behavior?
>
> +# During crash recovery server will replay COMMIT records
> +# emitted while commit timestamp was off. The server can start
> +# if we correctly avoid processing commit timestamp for the
> +# records.
>

I agreed with all of your review comments. Attached is the updated patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/test/modules/commit_ts/t/005_recovery.pl b/src/test/modules/commit_ts/t/005_recovery.pl
new file mode 100644
index 000..80d52fc
--- /dev/null
+++ b/src/test/modules/commit_ts/t/005_recovery.pl
@@ -0,0 +1,53 @@
+# Testing of recovery from where commit timestamps was off
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf',
+  "track_commit_timestamp = off");
+$node->start;
+
+# We don't want to replay PARAMETER_CHANGES record while the
+# crash recovery test below. It is not expected to be emitted
+# thus far, but we restart the server to get rid of it just in
+# case.
+$node->restart;
+
+# Consume 2000 XIDs to move beyond the commitTS page boundary.
+$node->safe_psql(
+	'postgres',
+	qq(
+CREATE PROCEDURE comsume_xid(cnt int)
+AS \$\$
+DECLARE
+	i int;
+BEGIN
+	FOR i in 1..cnt LOOP
+		EXECUTE 'SELECT txid_current()';
+		COMMIT;
+	END LOOP;
+END;
+\$\$
+LANGUAGE plpgsql;
+));
+$node->safe_psql('postgres', 'CALL comsume_xid(2000)');
+
+$node->teardown_node;
+
+# Enable track_commit_timestamp
+$node->append_conf('postgresql.conf',
+  "track_commit_timestamp = on");
+
+# During crash recovery server will replay COMMIT records
+# emitted while commit timestamp was off. The server can start
+# if we correctly avoid processing commit timestamp for the
+# records.
+$node->start;
+
+# Check if the server launched.
+is($node->psql('postgres', qq(SELECT 1)), 0,
+   'started from the crash recovery');
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4754e75..0ece043 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6833,11 +6833,12 @@ StartupXLOG(void)
 	StartupMultiXact();
 
 	/*
-	 * Ditto commit timestamps.  In a standby, we do it if setting is enabled
-	 * in ControlFile; in a master we base the decision on the GUC itself.
+	 * Ditto commit timestamps.  In both a standby and a master, we do it if
+	 * setting is enabl

Re: [HACKERS] Bug in to_timestamp().

2018-09-20 Thread Alexander Korotkov
On Thu, Sep 20, 2018 at 6:09 AM amul sul  wrote:
> On Thu, Sep 20, 2018, 3:22 AM Alexander Korotkov  
> wrote:
>> It's hard to understand whether it was expected, because it wasn't
>> properly documented.  More important that it's the same behavior we
>> have before cf984672, and purpose of cf984672 was not to change this.
>>
>> But from the code comments, it's intentional. If you put digits or
>> text characters into format string, you can skip non-separator input
>> string characters.  For instance you may do.
>>
>> # SELECT to_date('y18y12y2011', 'xDDxMMx');
>>   to_date
>> 
>>  2011-12-18
>> (1 row)
>>
>> It's even more interesting that letters and digits are handled in
>> different manner.
>>
>> # SELECT to_date('01801202011', 'xDDxMMx');
>> ERROR:  date/time field value out of range: "01801202011"
>> Time: 0,453 ms
>>
>> # SELECT to_date('01801202011', '9DD9MM9');
>>   to_date
>> 
>>  2011-12-18
>> (1 row)
>>
>> So, letters in format string doesn't allow you to extract fields at
>> particular positions of digit sequence, but digits in format string
>> allows you to.  That's rather undocumented, but from the code you can
>> get that it's intentional.  Thus, I think it would be nice to improve
>> the documentation here.  But I still propose to commit the patch I
>> propose to bring back unintentional behavior change in cf984672.
>
> Agreed, thanks for working on this.

Pushed, thanks.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Query is over 2x slower with jit=on

2018-09-20 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-09-19 23:26:52 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > JIT:
> > >   Functions: 2
> > >   Generation Time: 0.680 ms
> > >   Inlining: true
> > >   Inlining Time: 7.591 ms
> > >   Optimization: true
> > >   Optimization Time: 20.522 ms
> > >   Emission Time: 14.607 ms
[...]
> > > How about making that:
> > > JIT:
> > >   Functions: 2
> 
> FWIW, not that I want to do that now, but at some point it might make
> sense to sub-divide this into things like number of "expressions",
> "tuple deforming", "plans", ...  Just mentioning that if somebody wants
> to comment on reformatting this as well, if we're tinkering anyway.

I'd actually think we'd maybe want some kind of 'verbose' mode which
shows exactly what got JIT'd and what didn't- one of the questions that
I think people will be asking is "why didn't X get JIT'd?" and I don't
think that's very easy to figure out currently.

> > >   Options: Inlining, Optimization
> > >   Times (Total, Generation, Inlining, Optimization, Emission): 43.4 ms, 
> > > 0.680 ms, 7.591 ms, 20.522 ms, 14.607 ms
> > 
> > > or something similar?
> > 
> > That's going in the right direction.  Personally I'd make the last line
> > more like
> > 
> > Times: generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms, 
> > emission 14.607 ms, total 43.4 ms
> 
> Yea, that's probably easier to read.

I tend to agree that it's easier to read but I'm not sure we need to
quite go that far in reducing the number of rows.

> > (total at the end seems more natural to me, YMMV).

I agree with this..

> I kind of think doing it first is best, because that's usually the first
> thing one wants to know.

and this, so what about:

Times:
  generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms, emission 
14.607 ms
  Total: 43.4 ms

Gets the total out there quick on the left where you're scanning down
while keeping the detailed info above for reviewing after.

> > Also, the "options" format you suggest here seems a bit too biased
> > towards binary on/off options --- what happens when there's a
> > three-way option?  So maybe that line should be like
> > 
> > Options: inlining on, optimization on
> > 
> > though I'm less sure about that part.
> 
> I'm pretty certain you're right :).  There's already arguments around
> making optimization more gradual (akin to O1,2,3).

That certainly sounds like it'd be very neat to have though I wonder how
well we'll be able to automatically plan out which optimization level to
use when..

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: PostgreSQL 11 {Beta 4, RC1} Release: 2018-09-20

2018-09-20 Thread Jonathan S. Katz
On 9/18/18 12:15 PM, Jonathan S. Katz wrote:
> On 9/13/18 12:51 PM, Jonathan S. Katz wrote:
>> Hi,
>>
>> We are planning to have another release of PostgreSQL 11, either Beta 4
>> or RC1, next week on Thursday, 2018-09-20. The version will be
>> determined based on the state of the open items list[1] around the time
>> of stamping.
> 
> PostgreSQL 11 was stamped Beta 4[1] yesterday evening for release on
> 2018-09-20.

PostgreSQL 11 Beta 4 has been released[1].

Thanks!

Jonathan

[1] https://www.postgresql.org/about/news/1890/



signature.asc
Description: OpenPGP digital signature


Re: fast default vs triggers

2018-09-20 Thread Tomas Vondra




On 09/19/2018 10:35 PM, Andrew Dunstan wrote:



On 09/18/2018 03:36 PM, Andrew Dunstan wrote:



Tomas Vondra has pointed out to me that there's an issue with triggers 
not getting expanded tuples for columns with fast defaults. Here is an 
example that shows the issue:



   andrew=# create table blurfl (id int);
   CREATE TABLE
   andrew=# insert into blurfl select x from generate_series(1,5) x;
   INSERT 0 5
   andrew=# alter table blurfl add column x int default 100;
   ALTER TABLE
   andrew=# create or replace function showmej() returns trigger
   language plpgsql as $$ declare j json; begin j := to_json(old);
   raise notice 'old x: %', j->>'x'; return new; end; $$;
   CREATE FUNCTION
   andrew=# create trigger show_x before update on blurfl for each row
   execute procedure showmej();
   CREATE TRIGGER
   andrew=# update blurfl set id = id where id = 1;
   NOTICE:  old x: 
   UPDATE 1
   andrew=# update blurfl set id = id where id = 1;
   NOTICE:  old x: 100
   UPDATE 1
   andrew=#


The error is fixed with this patch:


   diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
   index 2436692..f34a72a 100644
   --- a/src/backend/commands/trigger.c
   +++ b/src/backend/commands/trigger.c
   @@ -3396,7 +3396,11 @@ ltrmark:;
               LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
            }
    
   -        result = heap_copytuple(&tuple);
   +        if (HeapTupleHeaderGetNatts(tuple.t_data) < relation->rd_att->natts)
   +            result = heap_expand_tuple(&tuple, relation->rd_att);
   +        else
   +            result = heap_copytuple(&tuple);
   +
            ReleaseBuffer(buffer);
    
            return result;

I'm going to re-check the various places that this might have been 
missed. I guess it belongs on the open items list.








I haven't found anything further that is misbehaving. I paid particular 
attention to indexes and index-only scans.


I propose to commit this along with an appropriate regression test.



Seems reasonable to me.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Proposal for Signal Detection Refactoring

2018-09-20 Thread Chris Travers
After the patch we did for the race condition here, one point that was
brought up, and which I thought was quite relevant, is that checking the
global flags directly feels a little ugly.

I would like to set up a patch which refactors this as follows:

1.  We create an enum of cancellation levels:
NO_INTERRUPT
INTERRUPT
QUERY_CANCEL
PROCESS_EXIT

2.  We create a macro called PENDING_INTERRUPT_LEVEL() which returns the
highest pending interrupt level.

3.  We check whether the output of this is greater than or equal to the
level we need to handle, and act on that basis.

This allows us to handle cases where we just want to check the interrupt
level for return or cleanup purposes.  It does not apply to cases where we
need to change interrupt handling temporarily as we do in a couple of
cases.  I think it is cleaner than checking the QueryCancelPending and
ProcDiePending global flags directly.
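
For illustration only (this is not the attached patch): a minimal sketch of
what the proposed interface could look like.  ProcDiePending,
QueryCancelPending and InterruptPending are the existing globals declared in
miscadmin.h; everything else here is hypothetical.

    #include <signal.h>
    #include <stdbool.h>

    /* These three flags already exist in src/include/miscadmin.h. */
    extern volatile sig_atomic_t InterruptPending;
    extern volatile sig_atomic_t QueryCancelPending;
    extern volatile sig_atomic_t ProcDiePending;

    typedef enum CancellationLevel
    {
        NO_INTERRUPT = 0,
        INTERRUPT,
        QUERY_CANCEL,
        PROCESS_EXIT
    } CancellationLevel;

    /* Highest pending interrupt level, derived from the existing flags. */
    #define PENDING_INTERRUPT_LEVEL() \
        (ProcDiePending ? PROCESS_EXIT : \
         QueryCancelPending ? QUERY_CANCEL : \
         InterruptPending ? INTERRUPT : NO_INTERRUPT)

    /* A call site then compares against the level it cares about ... */
    static bool
    query_should_be_abandoned(void)
    {
        /* ... instead of testing QueryCancelPending/ProcDiePending directly. */
        return PENDING_INTERRUPT_LEVEL() >= QUERY_CANCEL;
    }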

So here's a small patch.  I will add it for the next commit fest unless
anyone has any reason I shouldn't.

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


sig_refactor.patch
Description: Binary data


Re: [patch] Support LLVM 7

2018-09-20 Thread Christoph Berg
Re: To Andres Freund 2018-09-20 <20180920081044.ga16...@msg.df7cb.de>
> > > 2018-09-15 10:49:25.052 UTC [26458] DETAIL:  Failed process was running: 
> > > SELECT '' AS tf_12, BOOLTBL1.*, BOOLTBL2.*
> > >  FROM BOOLTBL1, BOOLTBL2
> > >  WHERE BOOLTBL2.f1 <> BOOLTBL1.f1;
> > > 2018-09-15 10:49:25.052 UTC [26458] LOG:  terminating any other active 
> > > server processes
> > 
> > Hm. Is there any chance to get a backtrace for this one?  This could,
> > although I think less likely so, also be a postgres issue
> > (e.g. generating code for the wrong microarch).
> 
> I'll see if I can find a porterbox to get a backtrace.

32-bit powerpc, 11~beta3-2:

postgres=# set jit = off;
SET
postgres=# SELECT
postgres-# ARRAY(SELECT f.i FROM (
postgres(# (SELECT d + g.i FROM generate_series(4, 30, 3) d 
ORDER BY 1)
postgres(# UNION ALL
postgres(# (SELECT d + g.i FROM generate_series(0, 30, 5) d 
ORDER BY 1)
postgres(# ) f(i)
postgres(# ORDER BY f.i LIMIT 10)
postgres-# FROM generate_series(1, 3) g(i);
array
--
 {1,5,6,8,11,11,14,16,17,20}
 {2,6,7,9,12,12,15,17,18,21}
 {3,7,8,10,13,13,16,18,19,22}
(3 Zeilen)

postgres=# set jit = on;
SET
postgres=# SELECT
             ARRAY(SELECT f.i FROM (
                     (SELECT d + g.i FROM generate_series(4, 30, 3) d ORDER BY 1)
                     UNION ALL
                     (SELECT d + g.i FROM generate_series(0, 30, 5) d ORDER BY 1)
                   ) f(i)
                   ORDER BY f.i LIMIT 10)
           FROM generate_series(1, 3) g(i);


Program received signal SIGSEGV, Segmentation fault.
0xf4a20c18 in ?? ()
(gdb) bt f
#0  0xf4a20c18 in ?? ()
No symbol table info available.
#1  0xf4a20bc8 in ?? ()
No symbol table info available.
#2  0xf4a41b90 in ExecRunCompiledExpr (state=0x1a7515c, econtext=0x1a73dd0, 
isNull=0xffe17c2b)
at ./build/../src/backend/jit/llvm/llvmjit_expr.c:2591
cstate = 
func = 0xf4a20b5c
#3  0x00c2d39c in ExecEvalExprSwitchContext (isNull=0xffe17c2b, 
econtext=, state=0x1a7515c)
at ./build/../src/include/executor/executor.h:303
retDatum = 
oldContext = 0x19fe830
retDatum = 
oldContext = 
#4  ExecProject (projInfo=0x1a75158) at 
./build/../src/include/executor/executor.h:337
econtext = 
state = 0x1a7515c
slot = 0x1a750c0
isnull = 252
econtext = 
state = 
slot = 
isnull = 
#5  ExecScan (node=, accessMtd=accessMtd@entry=0xc3ce50 
,
recheckMtd=recheckMtd@entry=0xc3cdf0 ) at 
./build/../src/backend/executor/execScan.c:201
slot = 
econtext = 
qual = 0x0
projInfo = 0x1a75158
#6  0x00c3ce3c in ExecFunctionScan (pstate=) at 
./build/../src/backend/executor/nodeFunctionscan.c:270
node = 
#7  0x00c2b280 in ExecProcNodeFirst (node=0x1a73d48) at 
./build/../src/backend/executor/execProcnode.c:445
No locals.
#8  0x00c23058 in ExecProcNode (node=0x1a73d48) at 
./build/../src/include/executor/executor.h:237
No locals.
#9  ExecutePlan (execute_once=, dest=0x1a1c218, 
direction=, numberTuples=,
sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x1a73d48, estate=0x19fe8c0)
at ./build/../src/backend/executor/execMain.c:1721
slot = 
current_tuple_count = 0
slot = 
current_tuple_count = 
#10 standard_ExecutorRun (queryDesc=0x1962250, direction=, 
count=, execute_once=)
at ./build/../src/backend/executor/execMain.c:362
estate = 0x19fe8c0
operation = CMD_SELECT
dest = 0x1a1c218
sendTuples = 
oldcontext = 0x19621c0
__func__ = "standard_ExecutorRun"
#11 0x00c23284 in ExecutorRun (queryDesc=queryDesc@entry=0x1962250, 
direction=direction@entry=ForwardScanDirection,
count=, execute_once=) at 
./build/../src/backend/executor/execMain.c:305
No locals.
#12 0x00dcd3a0 in PortalRunSelect (portal=portal@entry=0x198d290, 
forward=forward@entry=true, count=0, count@entry=2147483647,
dest=dest@entry=0x1a1c218) at ./build/../src/backend/tcop/pquery.c:932
queryDesc = 0x1962250
direction = 
nprocessed = 
__func__ = "PortalRunSelect"
#13 0x00dcee7c in PortalRun (portal=portal@entry=0x198d290, 
count=c
