Re: Joins on TID

2018-12-21 Thread Tom Lane
BTW, if we're to start taking joins on TID seriously, we should also
add the missing hash opclass for TID, so that you can do hash joins
when dealing with a lot of rows.

(In principle this also enables things like hash aggregation, though
I'm not very clear on a use-case for grouping by TID.)
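
As an illustration, with the opclass and the oprcanhash marking in place,
an equality join on ctid becomes hashable.  A hypothetical example (table
and column names made up):

-- Capture some TIDs, then join them back against the table.  The planner
-- may now consider a Hash Join for the ctid equality join below.
CREATE TEMP TABLE captured AS
  SELECT ctid AS old_ctid, id FROM big_table WHERE needs_work;

EXPLAIN
SELECT b.*
FROM big_table b
JOIN captured c ON b.ctid = c.old_ctid;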

regards, tom lane

diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c
index 41d540b..7b25947 100644
*** a/src/backend/utils/adt/tid.c
--- b/src/backend/utils/adt/tid.c
***
*** 20,25 
--- 20,26 
  #include 
  #include 
  
+ #include "access/hash.h"
  #include "access/heapam.h"
  #include "access/sysattr.h"
  #include "catalog/namespace.h"
*** tidsmaller(PG_FUNCTION_ARGS)
*** 239,244 
--- 240,272 
  	PG_RETURN_ITEMPOINTER(ItemPointerCompare(arg1, arg2) <= 0 ? arg1 : arg2);
  }
  
+ Datum
+ hashtid(PG_FUNCTION_ARGS)
+ {
+ 	ItemPointer key = PG_GETARG_ITEMPOINTER(0);
+ 
+ 	/*
+ 	 * While you'll probably have a lot of trouble with a compiler that
+ 	 * insists on appending pad space to struct ItemPointerData, we can at
+ 	 * least make this code work, by not using sizeof(ItemPointerData).
+ 	 * Instead rely on knowing the sizes of the component fields.
+ 	 */
+ 	return hash_any((unsigned char *) key,
+ 	sizeof(BlockIdData) + sizeof(OffsetNumber));
+ }
+ 
+ Datum
+ hashtidextended(PG_FUNCTION_ARGS)
+ {
+ 	ItemPointer key = PG_GETARG_ITEMPOINTER(0);
+ 	uint64		seed = PG_GETARG_INT64(1);
+ 
+ 	/* As above */
+ 	return hash_any_extended((unsigned char *) key,
+ 			 sizeof(BlockIdData) + sizeof(OffsetNumber),
+ 			 seed);
+ }
+ 
  
  /*
   *	Functions to get latest tid of a specified tuple.
diff --git a/src/include/catalog/pg_amop.dat b/src/include/catalog/pg_amop.dat
index e689c9b..436f1bd 100644
*** a/src/include/catalog/pg_amop.dat
--- b/src/include/catalog/pg_amop.dat
***
*** 1013,1018 
--- 1013,1022 
  { amopfamily => 'hash/cid_ops', amoplefttype => 'cid', amoprighttype => 'cid',
amopstrategy => '1', amopopr => '=(cid,cid)', amopmethod => 'hash' },
  
+ # tid_ops
+ { amopfamily => 'hash/tid_ops', amoplefttype => 'tid', amoprighttype => 'tid',
+   amopstrategy => '1', amopopr => '=(tid,tid)', amopmethod => 'hash' },
+ 
  # text_pattern_ops
  { amopfamily => 'hash/text_pattern_ops', amoplefttype => 'text',
amoprighttype => 'text', amopstrategy => '1', amopopr => '=(text,text)',
diff --git a/src/include/catalog/pg_amproc.dat b/src/include/catalog/pg_amproc.dat
index bbcee26..8ddb699 100644
*** a/src/include/catalog/pg_amproc.dat
--- b/src/include/catalog/pg_amproc.dat
***
*** 340,345 
--- 340,349 
amprocrighttype => 'cid', amprocnum => '1', amproc => 'hashint4' },
  { amprocfamily => 'hash/cid_ops', amproclefttype => 'cid',
amprocrighttype => 'cid', amprocnum => '2', amproc => 'hashint4extended' },
+ { amprocfamily => 'hash/tid_ops', amproclefttype => 'tid',
+   amprocrighttype => 'tid', amprocnum => '1', amproc => 'hashtid' },
+ { amprocfamily => 'hash/tid_ops', amproclefttype => 'tid',
+   amprocrighttype => 'tid', amprocnum => '2', amproc => 'hashtidextended' },
  { amprocfamily => 'hash/text_pattern_ops', amproclefttype => 'text',
amprocrighttype => 'text', amprocnum => '1', amproc => 'hashtext' },
  { amprocfamily => 'hash/text_pattern_ops', amproclefttype => 'text',
diff --git a/src/include/catalog/pg_opclass.dat b/src/include/catalog/pg_opclass.dat
index 5178d04..c451d36 100644
*** a/src/include/catalog/pg_opclass.dat
--- b/src/include/catalog/pg_opclass.dat
***
*** 167,172 
--- 167,174 
opcintype => 'xid' },
  { opcmethod => 'hash', opcname => 'cid_ops', opcfamily => 'hash/cid_ops',
opcintype => 'cid' },
+ { opcmethod => 'hash', opcname => 'tid_ops', opcfamily => 'hash/tid_ops',
+   opcintype => 'tid' },
  { opcmethod => 'hash', opcname => 'text_pattern_ops',
opcfamily => 'hash/text_pattern_ops', opcintype => 'text',
opcdefault => 'f' },
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 2abd531..e8452e1 100644
*** a/src/include/catalog/pg_operator.dat
--- b/src/include/catalog/pg_operator.dat
***
*** 204,212 
oprrest => 'eqsel', oprjoin => 'eqjoinsel' },
  
  { oid => '387', oid_symbol => 'TIDEqualOperator', descr => 'equal',
!   oprname => '=', oprcanmerge => 't', oprleft => 'tid', oprright => 'tid',
!   oprresult => 'bool', oprcom => '=(tid,tid)', oprnegate => '<>(tid,tid)',
!   oprcode => 'tideq', oprrest => 'eqsel', oprjoin => 'eqjoinsel' },
  { oid => '402', descr => 'not equal',
oprname => '<>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
oprcom => '<>(tid,tid)', oprnegate => '=(tid,tid)', oprcode => 'tidne',
--- 204,213 
oprrest => 'eqsel', oprjoin => 'eqjoinsel' },
  
  { oid => '387', oid_symbol => 'TIDEqualOperator', descr => 'equal',
!   oprname => '=', oprcanmerge => 't', oprcanhash => 't', oprleft => 'tid',
!   oprright => 

Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()

2018-12-21 Thread Alexander Korotkov
On Fri, Dec 21, 2018 at 7:09 PM Tom Lane  wrote:
> Alvaro Herrera  writes:
> >> Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this
> >> change.  Otherwise, what is going to happen to an unpatched standby (of
> >> released versions) that receives the new WAL record from a patched
> >> primary?
>
> Oh, and if the answer to your question is not "it fails with an
> intelligible error about an unrecognized WAL record type", then we
> need to adjust what is emitted so that that will be what happens.
> Crashing, or worse silently misprocessing the record, will not do.

Please note that the backpatched version takes special care not to
introduce a new WAL record type.  An unpatched standby applies the WAL
stream of a patched primary without any errors, but ignores conflicts
(as it did before) [1].  A patched standby applies the same WAL stream
with conflict handling.  I briefly mentioned that in the commit message.

"On stable releases we've to be tricky to keep WAL compatibility.
Information required for conflict processing is just appended to data
of XLOG_GIST_PAGE_UPDATE record.  So, PostgreSQL version, which
doesn't know about conflict processing, will just ignore that."

The thing we can mention in the release notes is that both primary and
standby should be upgraded to get conflict handling.  If one of them
is not upgraded, conflicts will still be missed.

Links:
1. 
https://www.postgresql.org/message-id/CAPpHfdsKS0K8q1sJ-XyMrU%3DL%2Be6XSAOgS09NXp1bQDQts%2Bqz%2Bg%40mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Change pgarch_readyXlog() to return .history files first

2018-12-21 Thread Michael Paquier
On Fri, Dec 21, 2018 at 08:17:12AM +0200, David Steele wrote:
> I thought about doing that, but wanted to focus on the task at hand.  It
> does save a strcpy and a bit of stack space, so seems like a win.
> 
> Overall, the patch looks good to me.  I think breaking up the if does make
> the code more readable.

Thanks for looking.  I can see that the patch applies without
conflicts down to 9.4, and back-patching this stuff is the consensus on
this thread, based on input from Kyotaro Horiguchi and David Steele (I
don't mind much myself).  So, any objections from others to doing so?
--
Michael




Re: [PATCH] Improve tab completion for CREATE TABLE

2018-12-21 Thread Michael Paquier
On Fri, Dec 21, 2018 at 03:14:36PM +, Dagfinn Ilmari Mannsåker wrote:
> Here's a patch that does this (and in passing alphabetises the list of
> options).

Cool, thanks.  The position of the option list is fine.  However,
list_TABLEOPTIONS is not a name consistent with the surroundings, so
could we just use table_level_option instead?  Reordering them is a good
idea, log_autovacuum_min_duration being the offending entry.
--
Michael




Re: Clean up some elog messages and comments for do_pg_stop_backup and do_pg_start_backup

2018-12-21 Thread Michael Paquier
On Fri, Dec 21, 2018 at 10:43:57AM -0300, Alvaro Herrera wrote:
> errhint("Check that your archive_command is executing properly.  "
> +   "Backup can be canceled safely, "
> "but the database backup will not be usable without all the WAL 
> segments.")))
> 
> I think repeating the same term in the third line is not great.  Some
> ideas:
> 
> Backups can be canceled safely, but they will not be usable without all the 
> WAL segments.
> The backup can be canceled safely, but it will not be usable without all the 
> WAL segments.
> Database backups can be canceled safely, but the current backup will not be 
> usable without all the WAL segments.
> Database backups can be canceled safely, but no backup will be usable without 
> all the WAL segments.

Yes, I agree that repeating the word backup twice is not great.
What about the following then?  This is your second proposal, except
that the sentence refers to the currently running backup using "this",
which conveys the context better in my opinion:
"This backup can be canceled safely, but it will not be usable without
all the WAL segments."
--
Michael




Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-21 Thread John Naylor
On 12/20/18, Tom Lane  wrote:
> I'd be inclined to put the script in src/tools, I think.  IMO src/common
> is for code that actually gets built into our executables.

Done.

>> which takes
>> pl_unreserved_kwlist.h as input and outputs
>> pl_unreserved_kwlist_offset.h and pl_unreserved_kwlist_string.h.
>
> I wonder whether we'd not be better off producing just one output
> file, in which we have the offsets emitted as PG_KEYWORD macros
> and then the giant string emitted as a macro definition, ie
> something like
>
> #define PG_KEYWORD_STRING \
>   "absolute\0" \
>   "alias\0" \
>   ...
>
> That simplifies the Makefile-hacking, at least, and it possibly gives
> callers more flexibility about what they actually want to do with the
> string.

Okay, I tried that. Since the script is turning one header into
another, I borrowed the "*_d.h" nomenclature from the catalogs. Using
a single file required some #ifdef hacks in the output file. Maybe
there's a cleaner way to do this, but I don't know what it is.

Using a single file also gave me another idea: Take value and category
out of ScanKeyword, and replace them with an index into another array
containing those, which will only be accessed in the event of a hit.
That would shrink ScanKeyword to 4 bytes (offset, index), further
increasing locality of reference. Might not be worth it, but I can try
it after moving on to the core scanner.
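
A rough sketch of that layout (names and field widths are mine, not the
patch's; the int16 offset assumes the combined keyword string stays under
32kB):

/* Hot array, binary-searched during lookup: 4 bytes per keyword. */
typedef struct ScanKeywordOffset
{
	int16		offset;			/* offset into the combined keyword string */
	int16		index;			/* index into the aux array below */
} ScanKeywordOffset;

/* Cold data, consulted only after a successful lookup. */
typedef struct ScanKeywordAux
{
	int16		value;			/* grammar's token code */
	int16		category;		/* see codes in keywords.h */
} ScanKeywordAux;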

> I'm for "not change things unnecessarily".  People might well be
> scraping the keyword list out of parser/kwlist.h for other purposes
> right now --- indeed, it's defined the way it is exactly to let
> people do that.  I don't see a good reason to force them to redo
> whatever tooling they have that depends on that.  So let's build
> kwlist_offsets.h alongside that, but not change kwlist.h itself.

Done.

>> I used the global .gitignore, but maybe that's an abuse of it.
>
> Yeah, I'd say it is.

Moved.

>> +# TODO: Error out if the keyword names are not in ASCII order.
>
> +many for including such a check.

Done.

> Also note that we don't require people to have Perl installed when
> building from a tarball.  Therefore, these derived headers must get
> built during "make distprep" and removed by maintainer-clean but
> not distclean.  I think this also has some implications for VPATH
> builds, but as long as you follow the pattern used for other
> derived header files (e.g. fmgroids.h), you should be fine.

Done. I also blindly added support for MSVC.

-John Naylor
 src/common/keywords.c |  55 
 src/include/common/keywords.h |  14 +++
 src/pl/plpgsql/src/.gitignore |   1 +
 src/pl/plpgsql/src/Makefile   |   9 +-
 src/pl/plpgsql/src/pl_scanner.c   | 119 ++
 src/pl/plpgsql/src/pl_unreserved_kwlist.h | 107 +++
 src/tools/gen_keywords.pl | 136 ++
 src/tools/msvc/Solution.pm|  10 +++
 8 files changed, 356 insertions(+), 95 deletions(-)

diff --git a/src/common/keywords.c b/src/common/keywords.c
index 0c0c794c68..0fb14a0372 100644
--- a/src/common/keywords.c
+++ b/src/common/keywords.c
@@ -112,3 +112,58 @@ ScanKeywordLookup(const char *text,
 
 	return NULL;
 }
+
+/* Like ScanKeywordLookup, but uses offsets into a keyword string. */
+const ScanKeywordOffset *
+ScanKeywordLookupOffset(const char *text,
+		const ScanKeywordOffset *keywords,
+		int num_keywords,
+		const char *keyword_string)
+{
+	int			len,
+i;
+	char		word[NAMEDATALEN];
+	const ScanKeywordOffset *low;
+	const ScanKeywordOffset *high;
+
+	len = strlen(text);
+	/* We assume all keywords are shorter than NAMEDATALEN. */
+	if (len >= NAMEDATALEN)
+		return NULL;
+
+	/*
+	 * Apply an ASCII-only downcasing.  We must not use tolower() since it may
+	 * produce the wrong translation in some locales (eg, Turkish).
+	 */
+	for (i = 0; i < len; i++)
+	{
+		char		ch = text[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		word[i] = ch;
+	}
+	word[len] = '\0';
+
+	/*
+	 * Now do a binary search using plain strcmp() comparison.
+	 */
+	low = keywords;
+	high = keywords + (num_keywords - 1);
+	while (low <= high)
+	{
+		const ScanKeywordOffset *middle;
+		int			difference;
+
+		middle = low + (high - low) / 2;
+		difference = strcmp(keyword_string + middle->offset, word);
+		if (difference == 0)
+			return middle;
+		else if (difference < 0)
+			low = middle + 1;
+		else
+			high = middle - 1;
+	}
+
+	return NULL;
+}
diff --git a/src/include/common/keywords.h b/src/include/common/keywords.h
index 0b31505b66..7cd7bf4461 100644
--- a/src/include/common/keywords.h
+++ b/src/include/common/keywords.h
@@ -28,11 +28,20 @@ typedef struct ScanKeyword
 	int16		category;		/* see codes above */
 } ScanKeyword;
 
+typedef struct ScanKeywordOffset
+{
+	int32		offset;			/* offset into a keyword string */
+	int16		value;			/* grammar's token code */
+	int16		category;		/* see codes 

Re: could recovery_target_timeline=latest be the default in standby mode?

2018-12-21 Thread Michael Paquier
On Fri, Dec 21, 2018 at 01:54:20PM +0300, Sergei Kornilov wrote:
> I am +1 for recovery_target_timeline=latest by default. This is
> common case in my opinion.

I agree also that switching to the latest timeline should be the
default.  People get confused because of the current default.
--
Michael




Re: [PATCH] Improve tab completion for CREATE TABLE

2018-12-21 Thread Michael Paquier
On Fri, Dec 21, 2018 at 01:57:40PM +, Dagfinn Ilmari Mannsåker wrote:
> Yeah, because of that we can't do the obvious HeadMatches("CREATE",
> "TABLE") && (TailMatches(...) || TailMatches(...) || ...).  I believe
> this would require extending the match syntax with regex-like grouping,
> alternation and repetition operators, which I'm not volunteering to
> do.

That seems to be a lot of work for little benefit as few queries are
able to work within CREATE SCHEMA, so I would not take this road.
--
Michael




Joins on TID

2018-12-21 Thread Tom Lane
I decided to spend an afternoon seeing exactly how much work would be
needed to support parameterized TID scans, ie nestloop-with-inner-TID-
scan joins, as has been speculated about before, most recently here:

https://www.postgresql.org/message-id/flat/CAMqTPq%3DhNg0GYFU0X%2BxmuKy8R2ARk1%2BA_uQpS%2BMnf71MYpBKzg%40mail.gmail.com

It turns out it's not that bad, less than 200 net new lines of code
(all of it in the planner; the executor seems to require no work).
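
To give an idea of the plan shape this enables (hypothetical table names),
an equality join on ctid can now produce a nested loop whose inner side is
a TID scan parameterized by the outer row, e.g.

-- audit_log stores the ctid of each row in t that was touched.
-- EXPLAIN can now show something along the lines of:
--   Nested Loop
--     ->  Seq Scan on audit_log a
--     ->  Tid Scan on t
--           TID Cond: (t.ctid = a.target_ctid)
EXPLAIN
SELECT t.*
FROM audit_log a
JOIN t ON t.ctid = a.target_ctid;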

Much of the code churn is because tidpath.c is so ancient and crufty.
It was mostly ignoring the RestrictInfo infrastructure, in particular
emitting the list of tidquals as just bare clauses not RestrictInfos.
I had to change that in order to avoid inefficiencies in some places.

I haven't really looked at how much of a merge problem there'll be
with Edmund Horner's work for TID range scans.  My feeling about it
is that we might be best off treating that as a totally separate
code path, because the requirements are significantly different (for
instance, a range scan needs AND semantics not OR semantics for the
list of quals to apply).

regards, tom lane

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 480fd25..88780c0 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*** cost_tidscan(Path *path, PlannerInfo *ro
*** 1202,1216 
  	ntuples = 0;
  	foreach(l, tidquals)
  	{
! 		if (IsA(lfirst(l), ScalarArrayOpExpr))
  		{
  			/* Each element of the array yields 1 tuple */
! 			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) lfirst(l);
  			Node	   *arraynode = (Node *) lsecond(saop->args);
  
  			ntuples += estimate_array_length(arraynode);
  		}
! 		else if (IsA(lfirst(l), CurrentOfExpr))
  		{
  			/* CURRENT OF yields 1 tuple */
  			isCurrentOf = true;
--- 1202,1219 
  	ntuples = 0;
  	foreach(l, tidquals)
  	{
! 		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
! 		Expr	   *qual = rinfo->clause;
! 
! 		if (IsA(qual, ScalarArrayOpExpr))
  		{
  			/* Each element of the array yields 1 tuple */
! 			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) qual;
  			Node	   *arraynode = (Node *) lsecond(saop->args);
  
  			ntuples += estimate_array_length(arraynode);
  		}
! 		else if (IsA(qual, CurrentOfExpr))
  		{
  			/* CURRENT OF yields 1 tuple */
  			isCurrentOf = true;
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 3bb5b8d..335de0a 100644
*** a/src/backend/optimizer/path/tidpath.c
--- b/src/backend/optimizer/path/tidpath.c
***
*** 12,29 
   * this allows
   *		WHERE ctid IN (tid1, tid2, ...)
   *
   * We also support "WHERE CURRENT OF cursor" conditions (CurrentOfExpr),
   * which amount to "CTID = run-time-determined-TID".  These could in
   * theory be translated to a simple comparison of CTID to the result of
   * a function, but in practice it works better to keep the special node
   * representation all the way through to execution.
   *
-  * There is currently no special support for joins involving CTID; in
-  * particular nothing corresponding to best_inner_indexscan().  Since it's
-  * not very useful to store TIDs of one table in another table, there
-  * doesn't seem to be enough use-case to justify adding a lot of code
-  * for that.
-  *
   *
   * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
--- 12,28 
   * this allows
   *		WHERE ctid IN (tid1, tid2, ...)
   *
+  * As with indexscans, our definition of "pseudoconstant" is pretty liberal:
+  * we allow anything that doesn't involve a volatile function or a Var of
+  * the relation under consideration.  Vars belonging to other relations of
+  * the query are allowed, giving rise to parameterized TID scans.
+  *
   * We also support "WHERE CURRENT OF cursor" conditions (CurrentOfExpr),
   * which amount to "CTID = run-time-determined-TID".  These could in
   * theory be translated to a simple comparison of CTID to the result of
   * a function, but in practice it works better to keep the special node
   * representation all the way through to execution.
   *
   *
   * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
***
*** 44,125 
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
  #include "optimizer/restrictinfo.h"
  
  
! static bool IsTidEqualClause(OpExpr *node, int varno);
! static bool IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno);
! static List *TidQualFromExpr(Node *expr, int varno);
! static List *TidQualFromBaseRestrictinfo(RelOptInfo *rel);
! 
  
  /*
!  * Check to see if an opclause is of the form
   *		CTID = pseudoconstant
   * or
   *		pseudoconstant = CTID
!  *
!  * We check that the CTID Var belongs to relation "varno".  That is 

Re: Offline enabling/disabling of data checksums

2018-12-21 Thread Michael Paquier
On Fri, Dec 21, 2018 at 09:16:16PM +0100, Michael Banck wrote:
> It adds an (now mandatory) --action parameter that takes either verify,
> enable or disable as argument.

There are two discussion points which deserve attention here:
1) Do we want to rename pg_verify_checksums to something else, like
pg_checksums?  I would like it a lot if we did a simple renaming of the
tool, which should be the first step taken.
2) Which kind of interface do we want to use?  When I did my own
flavor of pg_checksums, I used an --action switch able to take the
following values:
- enable
- disable
- verify
The switch cannot be specified twice (perhaps we could make the last
value win, as other binaries in the tree do; I am not sure if that's
appropriate here).  A second type of interface is to use one switch per
action.  For both interfaces, if no action is specified then the tool
fails.  Vote is open.
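
For reference, the two styles of invocation look roughly like this (a
sketch of the proposed interfaces, not anything released):

# single --action switch, as in the attached patch
pg_verify_checksums --action=verify  -D /path/to/pgdata
pg_verify_checksums --action=enable  -D /path/to/pgdata
pg_verify_checksums --action=disable -D /path/to/pgdata

# alternative under discussion: one switch per action, e.g.
# pg_checksums --enable -D /path/to/pgdata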

> This is basically meant as a stop-gap measure in case online activation
> of checksums won't make it for v12, but maybe it is independently
> useful?

I think that this is independently useful.  I have made this part of an
upgrade workflow where the user is ready to accept some extra one-time
offline time so that checksums get enabled.

> Things I have not done so far:
> 
> 1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
> only verify checksums.

Check.  That sounds right to me.

> 2. Rename the scan_* functions (Michael renamed them to operate_file and
> operate_directory, but I am not sure it is worth it).

The renaming makes sense, as scan implies only reading while enabling
checksums causes a write.

> 3. Once that patch is in, there would be a way to disable checksums so
> there'd be a case to also change the initdb default to enabled, but that
> requires further discussion (and maybe another round of benchmarks).

Perhaps, that's unrelated to this thread though.  I am not sure that
all users would be ready to pay the extra cost of checksums enabled by
default.
--
Michael




Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-12-21 Thread David Rowley
On Fri, 21 Dec 2018 at 10:05, Robert Haas  wrote:
> On Thu, Dec 20, 2018 at 3:58 PM Alvaro Herrera  
> wrote:
> > Namely: how does this handle the case of partition pruning structure
> > being passed from planner to executor, if an attach happens in the
> > middle of it and puts a partition in between existing partitions?  Array
> > indexes of any partitions that appear later in the partition descriptor
> > will change.
> >
> > This is the reason I used the query snapshot rather than EState.
>
> I didn't handle that.  If partition pruning relies on nothing changing
> between planning and execution, isn't that broken regardless of any of
> this?  It's true that with the simple query protocol we'll hold locks
> continuously from planning into execution, and therefore with the
> current locking regime we couldn't really have a problem.  But unless
> I'm confused, with the extended query protocol it's quite possible to
> generate a plan, release locks, and then reacquire locks at execution
> time.  Unless we have some guarantee that a new plan will always be
> generated if any DDL has happened in the middle, I think we've got
> trouble, and I don't think that is guaranteed in all cases.

Today the plan would be invalidated if a partition was ATTACHED or
DETACHED. The newly built plan would get the updated list of
partitions.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Compiling on Termux

2018-12-21 Thread Thomas Munro
On Sat, Dec 22, 2018 at 9:19 AM David Fetter  wrote:
> On Fri, Dec 21, 2018 at 04:24:20PM -0500, Andrew Dunstan wrote:
> > On 12/21/18 4:04 PM, Thomas Munro wrote:
> > > On Sat, Dec 22, 2018 at 7:32 AM David Fetter  wrote:
> > >> On Sat, Dec 22, 2018 at 07:03:32AM +1100, Thomas Munro wrote:
> > >>> That talks about using -D__ANDROID_API__=23 (or presumably higher) to
> > >>> make sure that sigtimedwait is exposed by signal.h.  Something similar
> > >>> may be afoot here.
> > >> That worked perfectly...to expose the next issue, namely that at least
> > >> part of the System-V IPC mechanisms have been left out.  Do we support
> > >> other systems where this is true?
> > > Interesting, they left that stuff out deliberately, basically
> > > because it all sucks:
> > >
> > > https://android.googlesource.com/platform/ndk/+/4e159d95ebf23b5f72bb707b0cb1518ef96b3d03/docs/system/libc/SYSV-IPC.TXT
> > >
> > > PostgreSQL currently needs SysV shm just for the small segment that's
> > > used for preventing multiple copies running in the same pgdata.  You'd
> > > need to find a new way to do that.  We don't use SysV semaphores
> > > (though we can) or message queues.
> >
> > That doc says:
> >
> > Killing processes automatically to make room for new ones is an
> > important part of Android's application lifecycle implementation.
> >
> > How's that going to play with Postgres? Sounds like the OOM killer on
> > steroids.
>
> I don't know precisely how it's going to play with Postgres, but
> Termux does supply a Postgres in its native packages.  That package
> appears to work, at least in the single-connection case, so they're
> doing something somehow to get it up and running.
>
> It could be that Android won't be a platform we recommend for high
> workloads, at least as long as these things remain true.

They use libandroid-shmem which emulates SysV shmem.

https://github.com/termux/termux-packages/blob/master/packages/postgresql/src-backend-Makefile.patch
https://github.com/termux/libandroid-shmem

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Compiling on Termux

2018-12-21 Thread David Fetter
On Fri, Dec 21, 2018 at 04:24:20PM -0500, Andrew Dunstan wrote:
> On 12/21/18 4:04 PM, Thomas Munro wrote:
> > On Sat, Dec 22, 2018 at 7:32 AM David Fetter  wrote:
> >> On Sat, Dec 22, 2018 at 07:03:32AM +1100, Thomas Munro wrote:
> >>> That talks about using -D__ANDROID_API__=23 (or presumably higher) to
> >>> make sure that sigtimedwait is exposed by signal.h.  Something similar
> >>> may be afoot here.
> >> That worked perfectly...to expose the next issue, namely that at least
> >> part of the System-V IPC mechanisms have been left out.  Do we support
> >> other systems where this is true?
> > Interesting, they left that stuff out deliberately, basically
> > because it all sucks:
> >
> > https://android.googlesource.com/platform/ndk/+/4e159d95ebf23b5f72bb707b0cb1518ef96b3d03/docs/system/libc/SYSV-IPC.TXT
> >
> > PostgreSQL currently needs SysV shm just for the small segment that's
> > used for preventing multiple copies running in the same pgdata.  You'd
> > need to find a new way to do that.  We don't use SysV semaphores
> > (though we can) or message queues.
> 
> That doc says:
> 
> Killing processes automatically to make room for new ones is an
> important part of Android's application lifecycle implementation.
> 
> How's that going to play with Postgres? Sounds like the OOM killer on
> steroids.

I don't know precisely how it's going to play with Postgres, but
Termux does supply a Postgres in its native packages.  That package
appears to work, at least in the single-connection case, so they're
doing something somehow to get it up and running.

It could be that Android won't be a platform we recommend for high
workloads, at least as long as these things remain true.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Use an enum for RELKIND_*?

2018-12-21 Thread Greg Stark
Out of curiosity I built with -Wswitch-enum to see if it would be
possible to just enable it.  It looks like the main culprits are the
node types, and if those were switched to #defines it might be feasible
to do so, though it would still be a lot of hassle to add case labels
all over the place.

But I had a second look to see if the output pointed out any actual
bugs. I found one though it's pretty minor:

lockfuncs.c:234:3: warning: enumeration value
‘LOCKTAG_SPECULATIVE_TOKEN’ not handled in switch [-Wswitch-enum]
   switch ((LockTagType) instance->locktag.locktag_type)
   ^~

It just causes speculative locks to be printed wrong in the
pg_lock_status view.  And speculative locks are only held transiently,
so unless you're doing a bulk load using ON CONFLICT (and also care
about pg_lock_status, perhaps in some monitoring tool) you wouldn't
even notice this:

 postgres***=# select * from pg_lock_status() where locktype =
'speculative token';
-[ RECORD 1 ]------+------------------
locktype           | speculative token
database           | 634
relation           |
page               |
tuple              |
virtualxid         |
transactionid      |
classid            | 1
objid              | 0
objsubid           | 0
virtualtransaction | 4/32
pid                | 18652
mode               | ExclusiveLock
granted            | t
fastpath           | f

The speculative lock is actually on a transaction ID and a "speculative
lock token", so the "database", "objid", and "objsubid" values shown
here are bogus.


Re: [HACKERS] ArrayLists instead of List (for some things)

2018-12-21 Thread David Rowley
On Fri, 3 Nov 2017 at 03:17, Tom Lane  wrote:
>
> David Rowley  writes:
> > Comments on the design are welcome, but I was too late to the
> > commitfest, so there are other priorities. However, if you have a
> > strong opinion, feel free to voice it.
>
> I do not like replacing Lists piecemeal; that's a recipe for ongoing
> API breakage and back-patching pain.  Plus we'll then have *four*
> different linked-list implementations in the backend, which sure
> seems like too many.
>
> We've jacked up the List API and driven a new implementation underneath
> once before.  Maybe it's time to do that again.

(reviving this old thread as it's still on my list of things to work on)

How would you feel if I submitted a patch that changed all locations
where we test for an empty list with: if (list) or if (list == NIL) to
a more standard form of if (list_is_empty(list)) ?

The idea here is that I'd like a way to preallocate memory for a list
to a given size before we start populating it, and that flies in the
face of how we currently test for list emptiness.

Such a macro could be backpatched to assist extensions.
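
A minimal sketch of what I have in mind, assuming the current List struct
with its length field (the name and placement are of course up for debate):

/*
 * Today an empty list is always NIL, but once a list header can exist
 * with preallocated space and zero elements, emptiness has to be tested
 * explicitly rather than by comparing against NIL.
 */
#define list_is_empty(l)	((l) == NIL || (l)->length == 0)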

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Performance issue in foreign-key-aware join estimation

2018-12-21 Thread David Rowley
On Sat, 22 Dec 2018 at 09:28, David Rowley  wrote:
> Going by my profiler this drops match_eclasses_to_foreign_key_col()
> down to just 10% of total planner time for this query. The new
> bms_is_member() call is pretty hot inside that function though.

I should have said 28% instead of 10%. 10% is the time spent
exclusively just in that function (not in a function called by that
function). The bms_is_member() call I mention above is about 20% of
the total time, which likely includes some other call sites too.

Back in [1], I mentioned that I'd like to start moving away from our
linked-list-based implementation of List and start using an array-based
version instead.  If we had this, we could easily further improve this
fkey matching code so that it doesn't even look at equivalence classes
that don't contain members for the relations in question, with a design
something like the following (a rough code sketch appears after the list):

1. Make PlannerInfo->eq_classes an array list instead of a List; this
will significantly improve the performance of list_nth().
2. Have a Bitmapset per relation that indexes which items in
eq_classes have ec_members for the given relation.
3. In match_eclasses_to_foreign_key_col() perform a bms_overlap() on
the eq_classes index bitmapsets for the relations at either side of
the foreign key then perform a bms_next_member() type loop over the
result of that in order to skip over the eq_classes items that can't
match.
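
A rough sketch of step 3, assuming a per-RelOptInfo Bitmapset (called
eclass_indexes here purely for illustration) recording which items of
PlannerInfo->eq_classes have members for that relation:

/*
 * Visit only the equivalence classes with members on both sides of the
 * foreign key, instead of walking all of root->eq_classes.
 */
Bitmapset  *candidates = bms_intersect(rel1->eclass_indexes,
									   rel2->eclass_indexes);
int			i = -1;

while ((i = bms_next_member(candidates, i)) >= 0)
{
	EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);

	/* ... existing matching logic from match_eclasses_to_foreign_key_col ... */
}
bms_free(candidates);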

Since match_foreign_keys_to_quals() calls
match_eclasses_to_foreign_key_col() once for each item in
PlannerInfo->fkey_list (167815 items in this case), and again for each
foreign key column in those keys, this should significantly reduce the
effort required, since today we make a pass over *every* equivalence
class each time match_eclasses_to_foreign_key_col() gets called.

This array list implementation is something I had wanted to get done
for PG12.  The bar for doing this is pretty high given what was
mentioned in [2].

[1] 
https://www.postgresql.org/message-id/CAKJS1f_2SnXhPVa6eWjzy2O9A%3Docwgd0Cj-LQeWpGtrWqbUSDA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/21592.1509632225%40sss.pgh.pa.us

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Compiling on Termux

2018-12-21 Thread Thomas Munro
On Sat, Dec 22, 2018 at 7:32 AM David Fetter  wrote:
> On Sat, Dec 22, 2018 at 07:03:32AM +1100, Thomas Munro wrote:
> > That talks about using -D__ANDROID_API__=23 (or presumably higher) to
> > make sure that sigtimedwait is exposed by signal.h.  Something similar
> > may be afoot here.
>
> That worked perfectly...to expose the next issue, namely that at least
> part of the System-V IPC mechanisms have been left out.  Do we support
> other systems where this is true?

Interesting, they left that stuff out deliberately, basically
because it all sucks:

https://android.googlesource.com/platform/ndk/+/4e159d95ebf23b5f72bb707b0cb1518ef96b3d03/docs/system/libc/SYSV-IPC.TXT

PostgreSQL currently needs SysV shm just for the small segment that's
used for preventing multiple copies running in the same pgdata.  You'd
need to find a new way to do that.  We don't use SysV semaphores
(though we can) or message queues.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Compiling on Termux

2018-12-21 Thread David Fetter
On Sat, Dec 22, 2018 at 07:03:32AM +1100, Thomas Munro wrote:
> On Sat, Dec 22, 2018 at 5:56 AM David Fetter  wrote:
> >
> > Folks,
> >
> > I'm trying to compile master (c952eae52a33069e2e92d34f217b43d0eca3d7de)
> > on Termux, using the supplied settings, as follows.
> >
> > pg_config --configure | xargs ./configure > configure.out 2>configure.err
> > make -j 4 > make.out 2> make.err
> >
> > There appears to be some confusion somewhere about sync_file_range,
> > namely that it's found by ./configure and then not found in make.
> >
> > What should I be poking at to make this work?
> 
> Apparently your libc (or something else) defines the function so the
> configure test passes, but your <fcntl.h> doesn't declare it so we
> can't use it.  I guess Termux supplies the headers but your Android
> supplies the libraries, so there may be sync issues.  I'd try hunting
> around for declarations with something like find /usr/include -name
> '*.h' | xargs grep sync_file_range.  Here's an interesting similar
> case:
> 
> https://github.com/termux/termux-packages/issues/899
> 
> That talks about using -D__ANDROID_API__=23 (or presumably higher) to
> make sure that sigtimedwait is exposed by signal.h.  Something similar
> may be afoot here.

That worked perfectly...to expose the next issue, namely that at least
part of the System-V IPC mechanisms have been left out.  Do we support
other systems where this is true?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Performance issue in foreign-key-aware join estimation

2018-12-21 Thread David Rowley
On Fri, 21 Dec 2018 at 06:44, Tom Lane  wrote:
> I was distressed to discover via perf that 69% of the runtime of this
> test now goes into match_eclasses_to_foreign_key_col().  That seems
> clearly unacceptable.

Agreed. That's pretty terrible.

I looked at this a bit and see that
match_eclasses_to_foreign_key_col() is missing any smarts to skip
equivalence classes that don't have ec_relids bits for both rels. With
that added the run-time is reduced pretty dramatically.

I've only tested with a debug build as of now, but I get:

Unpatched:

$ pgbench -n -T 60 -f query.sql postgres
latency average = 18411.604 ms

Patched:
latency average = 8748.177 ms

Going by my profiler this drops match_eclasses_to_foreign_key_col()
down to just 10% of total planner time for this query. The new
bms_is_member() call is pretty hot inside that function though.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


speed_up_matching_fkeys_to_eclasses.patch
Description: Binary data


Offline enabling/disabling of data checksums

2018-12-21 Thread Michael Banck
Hi,

the attached patch adds offline enabling/disabling of checksums to
pg_verify_checksums. It is based on independent work both Michael
(Paquier) and me did earlier this year and takes changes from both, see
https://github.com/credativ/pg_checksums and
https://github.com/michaelpq/pg_plugins/tree/master/pg_checksums

It adds an (now mandatory) --action parameter that takes either verify,
enable or disable as argument.

This is basically meant as a stop-gap measure in case online activation
of checksums won't make it for v12, but maybe it is independently
useful?

Things I have not done so far:

1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
only verify checksums.

2. Rename the scan_* functions (Michael renamed them to operate_file and
operate_directory, but I am not sure it is worth it).

3. Once that patch is in, there would be a way to disable checksums so
there'd be a case to also change the initdb default to enabled, but that
requires further discussion (and maybe another round of benchmarks).


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 6444fc9ca4..65d6195509 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -1,7 +1,7 @@
 /*
  * pg_verify_checksums
  *
- * Verifies page level checksums in an offline cluster
+ * Verifies/enables/disables page level checksums in an offline cluster
  *
  *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
  *
@@ -13,15 +13,16 @@
 #include 
 #include 
 
+#include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
-
 
 static int64 files = 0;
 static int64 blocks = 0;
@@ -31,16 +32,32 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
 
+typedef enum
+{
+	PG_ACTION_NONE,
+	PG_ACTION_DISABLE,
+	PG_ACTION_ENABLE,
+	PG_ACTION_VERIFY
+} ChecksumAction;
+
+/* Filename components */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+static ChecksumAction action = PG_ACTION_NONE;
+
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables/disables/verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -A, --action   action to take on the cluster, can be set as\n"));
+	printf(_(" \"verify\", \"enable\" and \"disable\"\n"));
 	printf(_("  -v, --verbose  output verbose messages\n"));
 	printf(_("  -r RELFILENODE check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
@@ -80,6 +97,80 @@ skipfile(const char *fn)
 }
 
 static void
+updateControlFile(char *DataDir, ControlFileData *ControlFile)
+{
+	int			fd;
+	char		buffer[PG_CONTROL_FILE_SIZE];
+	char		ControlFilePath[MAXPGPATH];
+
+	Assert(action == PG_ACTION_ENABLE ||
+		   action == PG_ACTION_DISABLE);
+
+	/*
+	 * For good luck, apply the same static assertions as in backend's
+	 * WriteControlFile().
+	 */
+#if PG_VERSION_NUM >= 10
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
+	 "pg_control is too large for atomic disk writes");
+#endif
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
+	 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
+
+	/* Recalculate CRC of control file */
+	INIT_CRC32C(ControlFile->crc);
+	COMP_CRC32C(ControlFile->crc,
+(char *) ControlFile,
+offsetof(ControlFileData, crc));
+	FIN_CRC32C(ControlFile->crc);
+
+	/*
+	 * Write out PG_CONTROL_FILE_SIZE bytes into pg_control by zero-padding
+	 * the excess over sizeof(ControlFileData), to avoid premature EOF related
+	 * errors when reading it.
+	 */
+	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
+	memcpy(buffer, ControlFile, sizeof(ControlFileData));
+
+	snprintf(ControlFilePath, sizeof(ControlFilePath), "%s/%s", DataDir, XLOG_CONTROL_FILE);
+
+	unlink(ControlFilePath);
+
+	fd = open(ControlFilePath,
+			  

Re: Compiling on Termux

2018-12-21 Thread Thomas Munro
On Sat, Dec 22, 2018 at 5:56 AM David Fetter  wrote:
>
> Folks,
>
> I'm trying to compile master (c952eae52a33069e2e92d34f217b43d0eca3d7de)
> on Termux, using the supplied settings, as follows.
>
> pg_config --configure | xargs ./configure > configure.out 2>configure.err
> make -j 4 > make.out 2> make.err
>
> There appears to be some confusion somewhere about sync_file_range,
> namely that it's found by ./configure and then not found in make.
>
> What should I be poking at to make this work?

Apparently your libc (or something else) defines the function so the
configure test passes, but your <fcntl.h> doesn't declare it so we
can't use it.  I guess Termux supplies the headers but your Android
supplies the libraries, so there may be sync issues.  I'd try hunting
around for declarations with something like find /usr/include -name
'*.h' | xargs grep sync_file_range.  Here's an interesting similar
case:

https://github.com/termux/termux-packages/issues/899

That talks about using -D__ANDROID_API__=23 (or presumably higher) to
make sure that sigtimedwait is exposed by signal.h.  Something similar
may be afoot here.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: slow queries over information schema.tables

2018-12-21 Thread Greg Stark
Just brainstorming here. Another crazy idea would be to get rid of
"name" data type, at least from the user-visible planner point of
view. It would probably have to be stored as a fixed length data type
like today but with a one-byte length header. That would make it
possible for the planner to use as if it was just another varchar.



Re: Tid scan improvements

2018-12-21 Thread Tom Lane
BTW, with respect to this bit in 0001:

@@ -1795,6 +1847,15 @@ nulltestsel(PlannerInfo *root, NullTestType 
nulltesttype, Node *arg,
 return (Selectivity) 0; /* keep compiler quiet */
 }
 }
+else if (vardata.var && IsA(vardata.var, Var) &&
+ ((Var *) vardata.var)->varattno == SelfItemPointerAttributeNumber)
+{
+/*
+ * There are no stats for system columns, but we know CTID is never
+ * NULL.
+ */
+selec = (nulltesttype == IS_NULL) ? 0.0 : 1.0;
+}
 else
 {
 /*

I'm not entirely sure why you're bothering; surely nulltestsel is
unrelated to what this patch is about?  And would anybody really
write "WHERE ctid IS NULL"?

However, if we do think it's worth adding code to cover this case,
I wouldn't make it specific to CTID.  *All* system columns can be
assumed not null, see heap_getsysattr().
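
If the case is kept, a sketch of the generalized test might look like this
(all system columns have negative attribute numbers):

else if (vardata.var && IsA(vardata.var, Var) &&
		 ((Var *) vardata.var)->varattno < 0)
{
	/*
	 * There are no stats for system columns, but all system columns can
	 * be assumed non-null; see heap_getsysattr().
	 */
	selec = (nulltesttype == IS_NULL) ? 0.0 : 1.0;
}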

regards, tom lane



Re: A few new options for vacuumdb

2018-12-21 Thread Bossart, Nathan
On 12/21/18, 10:51 AM, "Robert Haas"  wrote:
> On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan  wrote:
>> Either way, we'll still have to decide whether to fail or to silently
>> skip the option if you do something like specify --min-mxid-age for a
>> 9.4 server.
>
> +1 for fail.

Sounds good.  I'll keep all the version checks and will fail for
unsupported options in the next patch set.

Nathan



Re: [suggestion]support UNICODE host variables in ECPG

2018-12-21 Thread Tom Lane
"Nagaura, Ryohei"  writes:
> Tsunakawa-san
>> * What's the benefit of supporting UTF16 in host variables?

> 1) As byte per character is constant in UTF16 encoding, it can process 
> strings more efficiently than other encodings.

I don't think I buy that argument; it falls down as soon as you consider
characters above U+FFFF.  I worry that by supporting UTF16, we'd basically
be encouraging users to write code that fails on such characters, which
doesn't seem like good project policy.

regards, tom lane



Re: Tid scan improvements

2018-12-21 Thread Tom Lane
Edmund Horner  writes:
> Ok.  I think that will simplify things.  So if I follow you correctly,
> we should do:

> 1. If has_useful_pathkeys is true: generate pathkeys (for CTID ASC),
> and use truncate_useless_pathkeys on them.
> 2. If we have tid quals or pathkeys, emit a TID scan path.

Check.

> For the (optional) backwards scan support patch, should we separately
> emit another path, in the reverse direction?

What indxpath.c does is, if has_useful_pathkeys is true, to generate
pathkeys both ways and then build paths if the pathkeys get past
truncate_useless_pathkeys.  That seems sufficient in this case too.
There are various heuristics about whether it's really useful to
consider both sort directions, but that intelligence is already
built into truncate_useless_pathkeys.  tid quals with no pathkeys
would be reason to generate a forward path, but not reason to
generate a reverse path, because then that would be duplicative.

regards, tom lane



Re: A few new options for vacuumdb

2018-12-21 Thread Robert Haas
On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan  wrote:
> Either way, we'll still have to decide whether to fail or to silently
> skip the option if you do something like specify --min-mxid-age for a
> 9.4 server.

+1 for fail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improve selectivity estimate for range queries

2018-12-21 Thread Tom Lane
"Yuzuko Hosoya"  writes:
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
>> At Thu, 20 Dec 2018 17:21:29 +0900, "Yuzuko Hosoya" 
>>  wrote in
>> <008701d4983d$02e731c0$08b59540$@lab.ntt.co.jp>
>>> To handle such cases I've thought up of an idea based on a previous
>>> commit[1] which solved a similar problem related to
>>> DEFAULT_NUM_DISTINCT.  Just like this modification, I added a flag
>>> which shows whether the selectivity

>> The commit [1] added almost no complexity, but this does.  It affects
>> many functions to introduce tighter coupling between operator-selectivity
>> functions and clause selectivity functions.  isdefault travels alone
>> too long just to bind remote functions.  We might need a more principled
>> way to handle that.

Yeah, I don't think that this approach is a reasonable tradeoff to
eliminate a narrow case.  In particular, what's been done here to
clause_selectivity is totally unprincipled and poorly thought out:
you can't add an output parameter that's set in only some cases.
Yet, there's no good way to decide how to set it in many cases.
For instance, what would you do for an AND or OR where some of
the branches are default estimates and some not?

>> Around the [1], there was a discussion about introducing the notion of
>> uncertainty to selectivity.  The isdefualt is a kind of uncertainty
>> value indicating '0/100% uncertain'. So my feeling is saying that
>> it's better that Selectivity has certinty component then building a
>> selectivity arithmetics involving uncertainty, though I don't have a
>> clear picture.

> I see.  Indeed, if we changed Selectivity so that it included
> information about default/non-default, the problems I mentioned
> would be solved.  But I'm afraid that this modification would have
> a big impact, since there are lots of functions that calculate
> selectivity.  I also don't have a concrete idea, so I will think
> about it.  Besides that, I'd like to ask the community whether such
> a modification would be acceptable or not.

The planner definitely has a lot of problems that could be addressed
with some sort of uncertainty framework ... but it'd be a huge
undertaking, which is why nobody's tried yet.

A smaller-footprint way to fix the immediate problem might be to
change the values of DEFAULT_INEQ_SEL and friends so that they're
even less likely to be matched by accident.  That is, instead of
0.3333333333333333, use 0.342 or some such.  It might
seem that that's just moving the problem around, but I think it
might be possible to show that such a value couldn't be computed
by scalarltsel given a histogram with no more than 10000 members.
(I haven't tried to actually prove that, but it seems intuitive
that the set of possible results would be quantized with no more
than about 5 digits precision.)

regards, tom lane



Re: Remove Deprecated Exclusive Backup Mode

2018-12-21 Thread Robert Haas
On Fri, Dec 21, 2018 at 1:18 AM David Steele  wrote:
> On 12/21/18 2:01 AM, Michael Paquier wrote:
> > On Thu, Dec 20, 2018 at 12:29:48PM +0200, David Steele wrote:
> >> Cannot move patch to the same commitfest, and multiple future commitfests
> >> exist!
> >
> > I am not sure what it means either.  What if you just mark the existing
> > entry as returned with feedback, and create a new one ahead?
>
> Yeah, I can do that, but it would be nice to know why this is not
> working.  It would also be nice to preserve the history.
>
> Magnus?

I think what it means is that it doesn't know which CommitFest is the
"next" one.  If the patch were currently in the in-progress
CommitFest, the Open one (if any) would be the next one.  Otherwise,
if there were only one Future CommitFest, that would be the next one.
But right now you're moving a patch that is already in the Open
CommitFest and there are two Future CommitFests, so it doesn't know
which one to pick.

I think the old CommitFest application let you set the CommitFest via
the Edit screen also, so you could pick a specific CommitFest to
target.  But that doesn't seem to exist in Magnus's version.

If you just wait until the Open CommitFest is marked In Progress and
the next one is marked Open, it should work in any case.

I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-12-21 Thread Robert Haas
On Thu, Dec 20, 2018 at 4:38 PM Robert Haas  wrote:
> Lowering the lock level might also make something that was previously
> safe into something unsafe, because now there's no longer a guarantee
> that invalidation messages are received soon enough. With
> AccessExclusiveLock, we'll send invalidation messages before releasing
> the lock, and other processes will acquire the lock and then
> AcceptInvalidationMessages().  But with ShareUpdateExclusiveLock the
> locks can coexist, so now there might be trouble.  I think this is an
> area where we need to do some more investigation.

So there are definitely problems here.  With my patch:

create table tab (a int, b text) partition by range (a);
create table tab1 partition of tab for values from (0) to (10);
prepare t as select * from tab;
begin;
explain execute t; -- seq scan on tab1
execute t; -- no rows

Then, in another session:

alter table tab detach partition tab1;
insert into tab1 values (300, 'oops');

Back to the first session:

execute t; -- shows (300, 'oops')
explain execute t; -- still planning to scan tab1
commit;
explain execute t; -- now it got the memo, and plans to scan nothing
execute t; -- no rows

Well, that's not good.  We're showing a value that was never within
the partition bounds of any partition of tab.  The problem is that,
since we already have locks on all relevant objects, nothing triggers
the second 'explain execute' to process invalidation messages, so we
don't update the plan.  Generally, any DDL with less than
AccessExclusiveLock has this issue.  On another thread, I was
discussing with Tom and Peter the possibility of trying to rejigger
things so that we always AcceptInvalidationMessages() at least once
per command, but I think that just turns this into a race: if a
concurrent commit happens after 'explain execute t' decides not to
re-plan but before it begins executing, we have the same problem.

This example doesn't involve partition pruning, and in general I don't
think that the problem is confined to partition pruning.  It's rather
that if there's no conflict between the lock that is needed to change
the set of partitions and the lock that is needed to run a query, then
there's no way to guarantee that the query runs with the same set of
partitions for which it was planned. Unless I'm mistaken, which I
might be, this is also a problem with your approach -- if you repeat
the same prepared query in the same transaction, the transaction
snapshot will be updated, and thus the PartitionDesc will be expanded
differently at execution time, but the plan will not have changed,
because invalidation messages have not been processed.

Anyway, I think the only fix here is likely to be making the executor
resilient against concurrent changes in the PartitionDesc. I don't
think there's going to be any easy  way to compensate for added
partitions without re-planning, but maybe we could find a way to flag
detached partitions so that they return no rows without actually
touching the underlying relation.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: START/END line number for COPY FROM

2018-12-21 Thread Tom Lane
Surafel Temesgen  writes:
> Currently we can skip the header line on COPY FROM, but the ability to
> skip and stop copying at any line could be used to divide a long copy
> operation, copy only a subset of the file, and skip a footer.  Attached
> is a patch for it.

I do not think this is a good idea.  We have resisted attempts to add
ETL-like features to COPY on the grounds that it would add complexity
and cost performance, and that that's not what COPY is for.   This
seems to fall squarely in the domain of something you should be doing
with another tool.

regards, tom lane



Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()

2018-12-21 Thread Alvaro Herrera
On 2018-Dec-21, Tom Lane wrote:

> Alvaro Herrera  writes:
> > Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this
> > change.  Otherwise, what is going to happen to an unpatched standby (of
> > released versions) that receives the new WAL record from a patched
> > primary?
> 
> We can't change XLOG_PAGE_MAGIC in released branches, surely.
> 
> I think the correct thing is just for the release notes to warn people
> to upgrade standby servers first.

You're right.  My memory is playing tricks on me.  I recalled that we
had done it to prevent replay of WAL in nonpatched standbys in
some backpatched commit, but I can't find any evidence of this :-(
The commit message for 8e9a16ab8f7f (in 9.3 branch after it was
released) says:

In replication scenarios using the 9.3 branch, standby servers must be
upgraded before their master, so that they are prepared to deal with the
new WAL record once the master is upgraded; failure to do so will cause
WAL replay to die with a PANIC message.  Later upgrade of the standby
will allow the process to continue where it left off, so there's no
disruption of the data in the standby in any case.  Standbys know how to
deal with the old WAL record, so it's okay to keep the master running
the old code for a while.

Stupidly, I checked the 9.4 version of that commit (then the master
branch) and it does indeed contain the XLOG_PAGE_MAGIC change, but the
9.3 commit doesn't.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()

2018-12-21 Thread Tom Lane
Alvaro Herrera  writes:
>> Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this
>> change.  Otherwise, what is going to happen to an unpatched standby (of
>> released versions) that receives the new WAL record from a patched
>> primary?

Oh, and if the answer to your question is not "it fails with an
intelligible error about an unrecognized WAL record type", then we
need to adjust what is emitted so that that will be what happens.
Crashing, or worse silently misprocessing the record, will not do.

regards, tom lane



Re: Changes to pg_dump/psql following collation "C" in the catalog

2018-12-21 Thread Tom Lane
"Daniel Verite"  writes:
> One consequence of using the "C" collation in the catalog versus
> the db collation is that pg_dump -t with a regexp may not find
> the same tables as before. It happens when these conditions are
> all met:
> - the collation of the database is not "C"
> - the regexp has locale-dependant parts
> - the names to match include characters that are sensitive to
> locale-dependant matching

Hm, interesting.

> It seems that to fix that, we could qualify the references to columns such
> as "relname" and "schema_name" with COLLATE "default" clauses in the
> queries that use pattern-matching in client-side tools, AFAICS
> pg_dump and psql.

Seems reasonable.  I was initially worried that this might interfere with
query optimization, but some experimentation says that the planner
successfully derives prefix index clauses anyway (which is correct,
because matching a fixed regex prefix doesn't depend on locale).

It might be better to attach the COLLATE clause to the pattern constant
instead of the column name; that'd be less likely to break if sent to
an older server.
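
As a rough sketch of the two placements being discussed (the query shape is
only illustrative, not the exact one psql or pg_dump emit), using the pattern
from Daniel's report:

-- COLLATE attached to the column reference:
SELECT c.relname FROM pg_catalog.pg_class c
WHERE c.relname COLLATE "default" ~ '^(rapport_\w+_\d+)$';

-- COLLATE attached to the pattern constant instead; the column reference is
-- left untouched, so an older server has less to trip over:
SELECT c.relname FROM pg_catalog.pg_class c
WHERE c.relname ~ ('^(rapport_\w+_\d+)$' COLLATE "default");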

> Before going any further with this idea, is there agreement that it's an
> issue to address and does this look like the best way to do that?

That is a question worth asking.  We're going to be forcing people to get
used to this when working directly in SQL, so I don't know if masking it
in a subset of tools is really a win or not.

regards, tom lane



Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()

2018-12-21 Thread Alvaro Herrera
Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this
change.  Otherwise, what is going to happen to an unpatched standby (of
released versions) that receives the new WAL record from a patched
primary?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: tickling the lesser contributor's withering ego

2018-12-21 Thread Erik Rijkers

On 2018-12-21 16:17, Alvaro Herrera wrote:
> On 2018-Dec-21, Tom Lane wrote:
>> Alvaro Herrera  writes:
>>> I propose the following patch, which will make those links stable --
>>> then we can add the following links to the contributors page:
>>> https://www.postgresql.org/docs/10/release-10.html#RELEASE-10-ACKNOWLEDGEMENTS
>>> https://www.postgresql.org/docs/11/release-11.html#RELEASE-11-ACKNOWLEDGEMENTS
>>
>> Seems reasonable, but note the lag time --- unless somebody does
>> something out of the ordinary, those pages won't actually have
>> such tags till after the February minor releases.
>
> Good point.  That seems acceptable to me.
>
> Erik, are you willing to propose a patch for the pgweb side of things?

Yes, sounds reasonable.  I think I can cobble something together.




Re: tickling the lesser contributor's withering ego

2018-12-21 Thread Alvaro Herrera
On 2018-Dec-21, Tom Lane wrote:

> Alvaro Herrera  writes:
> > I propose the following patch, which will make those links stable --
> > then we can add the following links to the contributors page:
> > https://www.postgresql.org/docs/10/release-10.html#RELEASE-10-ACKNOWLEDGEMENTS
> > https://www.postgresql.org/docs/11/release-11.html#RELEASE-11-ACKNOWLEDGEMENTS
> 
> Seems reasonable, but note the lag time --- unless somebody does
> something out of the ordinary, those pages won't actually have
> such tags till after the February minor releases.

Good point.  That seems acceptable to me.

Erik, are you willing to propose a patch for the pgweb side of things?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Changes to pg_dump/psql following collation "C" in the catalog

2018-12-21 Thread Daniel Verite
 Hi,

One consequence of using the "C" collation in the catalog versus
the db collation is that pg_dump -t with a regexp may not find
the same tables as before. It happens when these conditions are
all met:
- the collation of the database is not "C"
- the regexp has locale-dependant parts
- the names to match include characters that are sensitive to
locale-dependant matching

For instance a table named "rapport_journée_12" in an fr_FR.UTF-8 db
used to be found by pg_dump -t 'rapport_\w+_\d+', and is now missed
in the devel version.
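
Roughly, the behavior can be reproduced directly (results as reported above;
attaching COLLATE "C" to the pattern stands in for the new per-column catalog
collation):

-- in an fr_FR.UTF-8 database:
SELECT 'rapport_journée_12' ~ 'rapport_\w+_\d+';                -- true: é counts as \w
SELECT 'rapport_journée_12' ~ ('rapport_\w+_\d+' COLLATE "C");  -- false: under "C", é is not \w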

It seems that to fix that, we could qualify the references to columns such
as "relname" and "schema_name" with COLLATE "default" clauses in the
queries that use pattern-matching in client-side tools, AFAICS
pg_dump and psql.

Before going any further with this idea, is there agreement that it's an
issue to address and does this look like the best way to do that?


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [PATCH] Improve tab completion for CREATE TABLE

2018-12-21 Thread Dagfinn Ilmari Mannsåker
I wrote:
> Another omission I just realised is that it doesn't complete the list
> of table storage options after "WITH (".  That should be fairly
> easy to add (we already have the list for completing after ALTER TABLE
>  SET|RESET), but it's getting late here now.

Here's a patch that does this (and in passing alphabetises the list of
options).

- ilmari
-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.   - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.  - Calle Dybedahl

From 5f948f2ebff03a89d18ab621bc21d03cfaa07529 Mon Sep 17 00:00:00 2001
From: Dagfinn Ilmari Mannsåker
Date: Fri, 21 Dec 2018 15:03:19 +
Subject: [PATCH] Add completion for CREATE TABLE ... WITH options

Move the list of options from the "ALTER TABLE  SET|RESET (" block
to the top-level, and reuse it after "CREATE TABLE  ( ... ) WITH (".

In passing, alphabetise the option list properly.
---
 src/bin/psql/tab-complete.c | 74 ++---
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5ba6ffba8c..074c88378b 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1005,6 +1005,40 @@ static const pgsql_thing_t words_after_create[] = {
 	{NULL}		/* end of list */
 };
 
+static const char *const list_TABLEOPTIONS[] = {
+	"autovacuum_analyze_scale_factor",
+	"autovacuum_analyze_threshold",
+	"autovacuum_enabled",
+	"autovacuum_freeze_max_age",
+	"autovacuum_freeze_min_age",
+	"autovacuum_freeze_table_age",
+	"autovacuum_multixact_freeze_max_age",
+	"autovacuum_multixact_freeze_min_age",
+	"autovacuum_multixact_freeze_table_age",
+	"autovacuum_vacuum_cost_delay",
+	"autovacuum_vacuum_cost_limit",
+	"autovacuum_vacuum_scale_factor",
+	"autovacuum_vacuum_threshold",
+	"fillfactor",
+	"log_autovacuum_min_duration",
+	"parallel_workers",
+	"toast.autovacuum_enabled",
+	"toast.autovacuum_freeze_max_age",
+	"toast.autovacuum_freeze_min_age",
+	"toast.autovacuum_freeze_table_age",
+	"toast.autovacuum_multixact_freeze_max_age",
+	"toast.autovacuum_multixact_freeze_min_age",
+	"toast.autovacuum_multixact_freeze_table_age",
+	"toast.autovacuum_vacuum_cost_delay",
+	"toast.autovacuum_vacuum_cost_limit",
+	"toast.autovacuum_vacuum_scale_factor",
+	"toast.autovacuum_vacuum_threshold",
+	"toast.log_autovacuum_min_duration",
+	"toast_tuple_target",
+	"user_catalog_table",
+	NULL
+};
+
 
 /* Forward declaration of functions */
 static char **psql_completion(const char *text, int start, int end);
@@ -1904,44 +1938,7 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("(");
 	/* ALTER TABLE  SET|RESET ( */
 	else if (Matches("ALTER", "TABLE", MatchAny, "SET|RESET", "("))
-	{
-		static const char *const list_TABLEOPTIONS[] =
-		{
-			"autovacuum_analyze_scale_factor",
-			"autovacuum_analyze_threshold",
-			"autovacuum_enabled",
-			"autovacuum_freeze_max_age",
-			"autovacuum_freeze_min_age",
-			"autovacuum_freeze_table_age",
-			"autovacuum_multixact_freeze_max_age",
-			"autovacuum_multixact_freeze_min_age",
-			"autovacuum_multixact_freeze_table_age",
-			"autovacuum_vacuum_cost_delay",
-			"autovacuum_vacuum_cost_limit",
-			"autovacuum_vacuum_scale_factor",
-			"autovacuum_vacuum_threshold",
-			"fillfactor",
-			"parallel_workers",
-			"log_autovacuum_min_duration",
-			"toast_tuple_target",
-			"toast.autovacuum_enabled",
-			"toast.autovacuum_freeze_max_age",
-			"toast.autovacuum_freeze_min_age",
-			"toast.autovacuum_freeze_table_age",
-			"toast.autovacuum_multixact_freeze_max_age",
-			"toast.autovacuum_multixact_freeze_min_age",
-			"toast.autovacuum_multixact_freeze_table_age",
-			"toast.autovacuum_vacuum_cost_delay",
-			"toast.autovacuum_vacuum_cost_limit",
-			"toast.autovacuum_vacuum_scale_factor",
-			"toast.autovacuum_vacuum_threshold",
-			"toast.log_autovacuum_min_duration",
-			"user_catalog_table",
-			NULL
-		};
-
 		COMPLETE_WITH_LIST(list_TABLEOPTIONS);
-	}
 	else if (Matches("ALTER", "TABLE", MatchAny, "REPLICA", "IDENTITY", "USING", "INDEX"))
 	{
 		completion_info_charp = prev5_wd;
@@ -2439,6 +2436,9 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches("CREATE", "TEMP|TEMPORARY", "TABLE", MatchAny, "(*)"))
 		COMPLETE_WITH("INHERITS (", "ON COMMIT", "PARTITION BY",
 	  "TABLESPACE", "WITH (");
+	else if (TailMatches("CREATE", "TABLE", MatchAny, "(*)", "WITH", "(") ||
+			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "(*)", "WITH", "("))
+		COMPLETE_WITH_LIST(list_TABLEOPTIONS);
 	/* Complete CREATE TABLE ON COMMIT with actions */
 	else if (TailMatches("CREATE", "TEMP|TEMPORARY", "TABLE", MatchAny, "(*)", "ON", "COMMIT"))
 		COMPLETE_WITH("DELETE ROWS", "DROP", "PRESERVE ROWS");
-- 
2.20.1



Re: tickling the lesser contributor's withering ego

2018-12-21 Thread Tom Lane
Alvaro Herrera  writes:
> I propose the following patch, which will make those links stable --
> then we can add the following links to the contributors page:
> https://www.postgresql.org/docs/10/release-10.html#RELEASE-10-ACKNOWLEDGEMENTS
> https://www.postgresql.org/docs/11/release-11.html#RELEASE-11-ACKNOWLEDGEMENTS

Seems reasonable, but note the lag time --- unless somebody does
something out of the ordinary, those pages won't actually have
such tags till after the February minor releases.

regards, tom lane



Re: tickling the lesser contributor's withering ego

2018-12-21 Thread Alvaro Herrera
On 2018-Nov-04, Erik Rijkers wrote:

> I wouldn't mind if this page:
>   https://www.postgresql.org/community/contributors/
> 
> contained a link to (contributors v11):
> https://www.postgresql.org/docs/11/static/release-11.html#id-1.11.6.5.6
> 
> and to (contributors v10)
> https://www.postgresql.org/docs/current/static/release-11.html#id-1.11.6.5.6

I propose the following patch, which will make those links stable --
then we can add the following links to the contributors page:

https://www.postgresql.org/docs/10/release-10.html#RELEASE-10-ACKNOWLEDGEMENTS
https://www.postgresql.org/docs/11/release-11.html#RELEASE-11-ACKNOWLEDGEMENTS

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/release-10.sgml b/doc/src/sgml/release-10.sgml
index aacdd360e0..c72b4c931e 100644
--- a/doc/src/sgml/release-10.sgml
+++ b/doc/src/sgml/release-10.sgml
@@ -8990,7 +8990,7 @@ This was disabled in the PG 9.6 branch so there is no commit here.
 
   
 
-  
+  
Acknowledgments
 

diff --git a/doc/src/sgml/release-11.sgml b/doc/src/sgml/release-11.sgml
index f35b0d8cc9..faa5835de4 100644
--- a/doc/src/sgml/release-11.sgml
+++ b/doc/src/sgml/release-11.sgml
@@ -3617,7 +3617,7 @@ same commits as above
 
   
 
-  
+  
Acknowledgments
 



Re: [PATCH] Improve tab completion for CREATE TABLE

2018-12-21 Thread Dagfinn Ilmari Mannsåker
Michael Paquier  writes:

> On Thu, Dec 20, 2018 at 12:02:52AM +, Dagfinn Ilmari Mannsåker wrote:
>> Point, fixed in the attached v4.  OTOH, as I mentioned in my other
>> email, that runs into the problem that it won't complete the actions
>> after e.g.  "CREATE TEMP TABLE FOO () WITH () ON COMMIT".
>
> I am fine to do that later on if that's adapted, one complication being
> that the completion is dependent on the order of the clauses for CREATE
> TABLE as we need something compatible with CREATE TABLE commands bundled
> with CREATE SCHEMA calls so only tail checks can be done.

Yeah, because of that we can't do the obvious HeadMatches("CREATE",
"TABLE") && (TailMatches(...) || TailMatches(...) || ...).  I believe
this would require extending the match syntax with regex-like grouping,
alternation and repetition operators, which I'm not volunteering to do.

> So committed your last version after some comment tweaks and reordering
> the entries in alphabetical order.

Thanks!

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law



Re: Clean up some elog messages and comments for do_pg_stop_backup and do_pg_start_backup

2018-12-21 Thread Alvaro Herrera
On 2018-Dec-21, Michael Paquier wrote:

> The thing is that the current messages are actually misleading, because
> for base backups taken by the replication protocol pg_stop_backup is
> never called, which I think is confusing.  While looking around I have
> also noticed that the top comments of do_pg_start_backup and
> do_pg_stop_backup do not mention that they are also used with BASE_BACKUP.
> 
> Attached is a patch to reduce the confusion and improve the related
> comments and messages.

Looks like good cleanup.

This phrase: "Backup can be canceled safely, " seems a bit odd.  It
would work either in plural "Backups can" or maybe be specific to the
current backup, "The backup can".  But looking at the bigger picture,

 errhint("Check that your archive_command is executing 
properly.  "
+"Backup can be canceled safely, "
 "but the database backup will not be usable 
without all the WAL segments.")))

I think repeating the same term in the third line is not great.  Some
ideas:

Backups can be canceled safely, but they will not be usable without all the WAL segments.
The backup can be canceled safely, but it will not be usable without all the WAL segments.
Database backups can be canceled safely, but the current backup will not be usable without all the WAL segments.
Database backups can be canceled safely, but no backup will be usable without all the WAL segments.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: could recovery_target_timeline=latest be the default in standby mode?

2018-12-21 Thread Sergei Kornilov
Hello

I am +1 for recovery_target_timeline=latest by default. This is the common
case, in my opinion, and the first release without recovery.conf is a
reasonable time to change the default value.

But I doubt we can ignore recovery_target_timeline in standby mode and always
use latest. I have no use case for this, but I prefer changing only the
default value.

regards, Sergei



pg_upgrade: Pass -j down to vacuumdb

2018-12-21 Thread Jesper Pedersen

Hi Hackers,

Here is a patch that passes the -j option from pg_upgrade down to 
vacuumdb if supported.
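
For example (paths and job count invented for illustration), with the patch an
invocation like

  pg_upgrade -j 8 -b /usr/pg11/bin -B /usr/pg12/bin -d /srv/pg11 -D /srv/pg12

would make the generated analyze_new_cluster.sh use vacuumdb's parallel mode
when the new cluster is 9.5 or later, roughly:

  "/usr/pg12/bin/vacuumdb" -j 8 --all --analyze-in-stages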


I'll add it to the January 'Fest.

Thanks for considering!

Best regards,
 Jesper
From ea941f942830589469281e0d5c17740469c6aebc Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Pass the -j option down to vacuumdb if supported.

Author: Jesper Pedersen 
---
 src/bin/pg_upgrade/check.c | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index f2215b7095..102221a201 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,15 +495,26 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1 || GET_MAJOR_VERSION(new_cluster.major_version) < 905)
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+new_cluster.bindir, user_specification.data,
+		/* Did we copy the free space files? */
+(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+"--analyze-only" : "--analyze", ECHO_QUOTE);
+	else
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s-j %d --all %s%s\n", ECHO_QUOTE,
+new_cluster.bindir, user_specification.data, user_opts.jobs,
+		/* Did we copy the free space files? */
+(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+"--analyze-only" : "--analyze", ECHO_QUOTE);
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
-			new_cluster.bindir, user_specification.data);
+	if (user_opts.jobs <= 1 || GET_MAJOR_VERSION(new_cluster.major_version) < 905)
+		fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+new_cluster.bindir, user_specification.data);
+	else
+		fprintf(script, "\"%s/vacuumdb\" %s-j %d --all --analyze-in-stages\n",
+new_cluster.bindir, user_specification.data, user_opts.jobs);
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2



Re: Progress reporting for pg_verify_checksums

2018-12-21 Thread Michael Banck
Hi,

On Wed, Oct 03, 2018 at 11:50:36AM +1300, Thomas Munro wrote:
> On Sat, Sep 29, 2018 at 1:07 AM Michael Banck  
> wrote:
> Windows doesn't like sigaction:
> 
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15189

Thanks for the report and sorry for taking so long to reply.
 
> I'm not sure if we classify this as a "frontend" program.  Should it
> be using pqsignal() from src/port/pqsignal.c?  Or perhaps just
> sigaction as you have it (pqsignal.c says that we require sigaction on
> all Unices), but #ifndef WIN32 around that stuff, since SIGUSR1 is
> never going to work anyway.

I've used pqsignal now, and disabled that feature on Windows.
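
For reference, the signal registration itself is not visible in the excerpt
below (the diff is truncated); presumably it is something along these lines in
main(), as a sketch:

#ifndef WIN32
	/* toggle progress reporting on SIGUSR1; not available on Windows */
	pqsignal(SIGUSR1, toggle_progress_report);
#endif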

I've also updated one comment and added an additional regression test.

V5 attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
index 905b8f1222..b470c19c08 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -80,6 +80,19 @@ PostgreSQL documentation
  
 
  
+  -P
+  --progress
+  
+   
+Enable progress reporting. Turning this on will deliver an approximate
+progress report during the checksum verification. Sending the
+SIGUSR1 signal will toggle progress reporting
+on or off during the verification run (not available on Windows).
+   
+  
+ 
+
+ 
-V
--version

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 6444fc9ca4..b049ea17b3 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -10,6 +10,8 @@
 #include "postgres_fe.h"
 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -30,9 +32,18 @@ static ControlFileData *ControlFile;
 
 static char *only_relfilenode = NULL;
 static bool verbose = false;
+static bool show_progress = false;
 
 static const char *progname;
 
+/*
+ * Progress status information.
+ */
+int64		total_size = 0;
+int64		current_size = 0;
+pg_time_t	last_progress_update;
+pg_time_t	scan_started;
+
 static void
 usage(void)
 {
@@ -43,6 +54,7 @@ usage(void)
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
 	printf(_("  -v, --verbose  output verbose messages\n"));
 	printf(_("  -r RELFILENODE check only relation with specified relfilenode\n"));
+	printf(_("  -P, --progress show progress information\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
 	printf(_("  -?, --help show this help, then exit\n"));
 	printf(_("\nIf no data directory (DATADIR) is specified, "
@@ -67,6 +79,69 @@ static const char *const skip[] = {
 	NULL,
 };
 
+static void
+toggle_progress_report(int signum)
+{
+
+	/* we handle SIGUSR1 only, and toggle the value of show_progress */
+	if (signum == SIGUSR1)
+		show_progress = !show_progress;
+
+}
+
+/*
+ * Report current progress status. Parts borrowed from
+ * PostgreSQLs' src/bin/pg_basebackup.c
+ */
+static void
+report_progress(bool force)
+{
+	pg_time_t	now = time(NULL);
+	int			total_percent = 0;
+
+	char		totalstr[32];
+	char		currentstr[32];
+	char		currspeedstr[32];
+
+	/* Make sure we report at most once a second */
+	if ((now == last_progress_update) && !force)
+		return;
+
+	/* Save current time */
+	last_progress_update = now;
+
+	/*
+	 * Calculate current percent done, based on KiB...
+	 */
+	total_percent = total_size ? (int64) ((current_size / 1024) * 100 / (total_size / 1024)) : 0;
+
+	/* Don't display larger than 100% */
+	if (total_percent > 100)
+		total_percent = 100;
+
+	/* If the scanned size overshoots the estimated total, adjust the total */
+	if (current_size > total_size)
+		total_size = current_size;
+
+	snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
+			 total_size / 1024);
+	snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
+			 current_size / 1024);
+	snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
+			 (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started)));
+	fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)",
+			currentstr, totalstr, total_percent, currspeedstr);
+
+	/*
+	 * If we are reporting to a terminal, send a carriage return so that we
+	 * stay on the same line.  If not, send a newline.
+	 */
+	if (isatty(fileno(stderr)))
+		fprintf(stderr, "\r");
+	else
+		fprintf(stderr, "\n");
+}
+
 static bool
 skipfile(const char *fn)
 {
@@ -117,6 +192,7 @@ scan_file(const char *fn, BlockNumber segmentno)
 			

Re: Online verification of checksums

2018-12-21 Thread Michael Banck
Hi,

On Thu, Dec 20, 2018 at 04:19:11PM +0100, Michael Banck wrote:
> Yeah, new rebased version attached.

By the way, one thing that this patch also fixes is checksum
verification on basebackups (as pointed out the other day by my
colleague Bernd Helmele):

postgres@kohn:~$ initdb -k data
postgres@kohn:~$ pg_ctl -D data -l logfile start
waiting for server to start done
server started
postgres@kohn:~$ pg_verify_checksums -D data
pg_verify_checksums: cluster must be shut down to verify checksums
postgres@kohn:~$ pg_basebackup -h /tmp -D backup1
postgres@kohn:~$ pg_verify_checksums -D backup1
pg_verify_checksums: cluster must be shut down to verify checksums
postgres@kohn:~$ pg_checksums -c -D backup1
Checksum scan completed
Files scanned:  1094
Blocks scanned: 2867
Bad checksums:  0
Data checksum version: 1

Where pg_checksums has the online verification patch applied.

As I don't think many people will take down their production servers in
order to verify checksums, verifying them on basebackups looks like a
useful use-case that is currently broken with pg_verify_checksums.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



could recovery_target_timeline=latest be the default in standby mode?

2018-12-21 Thread Peter Eisentraut
Is there ever a reason why you would *not* want
recovery_target_timeline=latest in standby mode?

pg_basebackup -R doesn't set it.  That seems suboptimal.
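
As a rough sketch of the current state (host names and paths invented), a
standby created with -R today ends up with something like this, and the
timeline setting still has to be added by hand:

  $ pg_basebackup -h primary.example.com -D /srv/standby -R
  $ cat /srv/standby/postgresql.auto.conf     # -R also creates standby.signal
  primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
  $ echo "recovery_target_timeline = 'latest'" >> /srv/standby/postgresql.auto.conf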

Perhaps this could be the default in standby mode, or even the implicit
default, ignoring the recovery_target_timeline setting altogether.

How could we make things simpler here?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Speeding up creating UPDATE/DELETE generic plan for partitioned table into a lot

2018-12-21 Thread Amit Langote
Kato-san,

On 2018/12/21 15:36, Kato, Sho wrote:
> Hi,
> I want to speed up the creation of UPDATE/DELETE generic plans for tables
> partitioned into many partitions.
> 
> Currently, when creating a generic plan for UPDATE/DELETE on such a table,
> the planner creates a plan that scans all partitions, so it takes a very
> long time.  I tried with a table partitioned into 8192 partitions, and it
> took 12 seconds.
>
> In most cases, since the partitions to access are partial, I think planner 
> does not need to create a Scan path for every partition.

What do you mean by "since the partitions to access are partial"?

> Is there any better way? For example, can the planner create generic plans
> from the parameters specified for EXECUTE?

Well, a generic plan is, by definition, *not* specific to the values of
parameters, so it's not clear what you're suggesting here.
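
For what it's worth, the difference is easy to see with plan_cache_mode; a
sketch only, assuming a range-partitioned table like the test.accounts table
described in the benchmark message:

PREPARE upd (int) AS
  UPDATE test.accounts SET abalance = abalance + 1 WHERE aid = $1;

SET plan_cache_mode = force_custom_plan;   -- planned for the actual value, prunes to one partition
EXPLAIN (COSTS OFF) EXECUTE upd(42);

SET plan_cache_mode = force_generic_plan;  -- must be valid for any value, so every partition is planned
EXPLAIN (COSTS OFF) EXECUTE upd(42);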

Thanks,
Amit




Performance of SELECT in a table partitioned into a lot

2018-12-21 Thread Kato, Sho
Hi,

I compared INSERT/UPDATE/DELETE/SELECT throughput between PostgreSQL and
another DBMS.
For INSERT/DELETE/UPDATE, PostgreSQL's performance is superior, but for
SELECT, PostgreSQL's performance is slightly lower than the other DBMS's.

Some information may be missing, so this may be difficult to answer, but do
you know the reason?
Also, if you know where I should look to find out, please let me know.

*table information (e.g. 8192 partitions, each partition has 1,000 records)*

testdb=# \d test.accounts
                         Partitioned table "test.accounts"
  Column  |  Type   | Collation | Nullable |                  Default
----------+---------+-----------+----------+--------------------------------------------
 aid      | integer |           | not null | nextval('test.accounts_aid_seq'::regclass)
 abalance | integer |           |          |
Partition key: RANGE (aid)
Indexes:
    "accounts_ix" btree (aid)
Number of partitions: 8192 (Use \d+ to list them.)

*interface*
JDBC(postgresql-42.2.4.jar)
Use PreparedStatement

*database tuning*
plan_cache_mode = force_custom_plan
max_worker_processes = 0
max_parallel_workers_per_gather = 0
max_parallel_workers = 0

*SQL*
SELECT abalance FROM test.accounts WHERE aid = $1;

$1 is random(npart * 1000)

*Other setting*
Benchmark is executed with a single session.

*Benchmark results*
I used master (commit 71a05b2232, Wed Dec 5) + the v8 patch[1] + the v1 patch[2].

npart  PostgreSQL  another DBMS
-----  ----------  ------------
    0      6314.7        6580.3
    2      5761.9        6390.6
    4      5916          6279.3
    8      5884.1        6000.7
   16      5887.7        6296
   32      5868.3        6274.4
   64      5826.5        6248.6
  128      5807.4        6208.9
  256      5748.7        6241.4
  512      5699.8        6204.6
 1024      5625.9        6174.1
 2048      5540.5        6159.3
 4096      5393.3        6060
 8192      5251.3        6093.4

[1]:https://www.postgresql.org/message-id/9d7c5112-cb99-6a47-d3be-cf1ee6862...@lab.ntt.co.jp
[2]:https://www.postgresql.org/message-id/CAKJS1f-=fnmqmqp6qitkd+xeddxw22yslp-0xfk3jaqux2y...@mail.gmail.com

regards,

Sho Kato




RE: [suggestion]support UNICODE host variables in ECPG

2018-12-21 Thread Nagaura, Ryohei
Matsumura-san, Tsunakawa-san

Thank you for reply.

Tsunakawa-san
> * What's the benefit of supporting UTF16 in host variables?
There are two benefits.
1) Since the number of bytes per character is constant in the UTF16 encoding,
strings can be processed more efficiently than with other encodings.
2) This enables C programmers to use wide characters.

> * Does your proposal comply with the SQL standard?  If not, what does the
> SQL standard say about support for UTF16?
I referred to the standard document, but I could not find anything about it.
Does anyone know about this?

> * Why only Windows?
It should be implemented in other OS if needed.

Matsumura-san
> I understand that the previous discussion pointed that the feature had
> better be implemented more simply or step-by-step and description about
> implementation is needed more.
> I also think it prevented the discussion to reach to the detail of feature.
> What is your opinion about it?
I wanted to discuss the necessity first, so I did not answer.
I'm very sorry for not having mentioned it.
If it is judged that this function is necessary, I'll remake the design.

Best regards,
-
Ryohei Nagaura




RE: Improve selectivity estimate for range queries

2018-12-21 Thread Yuzuko Hosoya
Hi,

Thanks for the comments.
I attach the v2 patch.

> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> Sent: Friday, December 21, 2018 12:25 PM
> 
> Hello.
> 
> At Thu, 20 Dec 2018 17:21:29 +0900, "Yuzuko Hosoya" 
>  wrote in
> <008701d4983d$02e731c0$08b59540$@lab.ntt.co.jp>
> > In my environment, the selectivity for id > 0 was 0.1,
> > and the selectivity for id < 1 was 0.1. Then, the
> > value of rqlist->hibound and rqlist->lobound are set respectively.
> > On the other hand, DEFAULT_INEQ_SEL is 0..  As a
> > result, the final selectivity is computed according to
> > DEFAULT_RANGE_INEQ_SEL, since the condition of the second if statement
> > was satisfied. However, these selectivities were computed according to
> > the statistics, so the final selectivity had to be calculated from
> > rqlist->hibound + rqlist->lobound - 1.0.
> > My test case might be uncommon, but I think it would cause some problems.
> 
> Yeah, it's known behavior, as the comment just above says. If in your
> example query the table were very large and had an index, it could faultily
> run an index scan over a fair amount of tuples. But I'm not sure how much it
> matters, and your example doesn't show that.
> 
> The behavior discussed here was introduced by this commit.
> 
> | commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a
> | Author: Tom Lane 
> | Date:   Tue Nov 9 00:34:46 2004 +
> |
> | Use a hopefully-more-reliable method of detecting default selectivity
> | estimates when combining the estimates for a range query.  As pointed 
> out
> | by Miquel van Smoorenburg, the existing check for an impossible combined
> | result would quite possibly fail to detect one default and one 
> non-default
> | input.  It seems better to use the default range query estimate in such
> | cases.  To do so, add a check for an estimate of exactly 
> DEFAULT_INEQ_SEL.
> | This is a bit ugly because it introduces additional coupling between
> | clauselist_selectivity and scalarltsel/scalargtsel, but it's not like
> | there wasn't plenty already...
> 
> > To handle such cases I've thought up an idea based on a previous
> > commit[1] which solved a similar problem related to
> > DEFAULT_NUM_DISTINCT.  Just like this modification, I added a flag
> > which shows whether the selectivity
> 
> The commit [1] added almost no complexity, but this does. It affects many
> functions to introduce tighter coupling between operator-selectivity
> functions and clause-selectivity functions. isdefault travels alone too long
> just to bind remote functions. We might need a more principled way to handle
> that.
>
Yes, indeed.  But I have not found a better way than this approach yet.
 
> > was calculated according to the statistics or not to
> > clauselist_selectivity, and used it as a condition of the if statement
> > instead of
> > rqlist->hibound == DEFAULT_INEQ_SEL and rqlist->lobound == DEFAULT_INEQ_SEL.
> > I am afraid that changes were more than I had expected, but we can
> > estimate selectivity correctly.
> >
> > Patch attached.  Do you have any thoughts?
> 
> I've just run over the patch, but I have some comments.
> 
> + if (isdefault)
> + rqelem->lobound_isdefault = true;
> 
> This is taking the disjunction of the lobounds ever added; I suppose it
> should be the last lobound's isdefault.
> 
Indeed.  I fixed it.

> As you see, four other instances of "selec ==/!= DEFAULT_*" exist in
> mergejoinsel. Don't they lead to a similar kind of problem?
>
I didn't encounter similar problems, but as you said, such problems could
occur due to the condition selec != DEFAULT_INEQ_SEL.  I changed these
conditions to "!isdefault".

> 
> 
>  	Selectivity  lobound;            /* Selectivity of a var > something clause */
>  	Selectivity  hibound;            /* Selectivity of a var < something clause */
> +	bool         lobound_isdefault;  /* lobound is a default selectivity? */
> +	bool         hibound_isdefault;  /* hibound is a default selectivity? */
>  } RangeQueryClause;
> 
> Around [1], there was a discussion about introducing the notion of
> uncertainty to selectivity. The isdefault flag is a kind of uncertainty
> value indicating '0/100% uncertain'. So my feeling is that it would be
> better for Selectivity to have a certainty component, and then to build
> selectivity arithmetic involving uncertainty, though I don't have a clear
> picture.
>
I see.  Indeed, if we changed Selectivity so that it includes information
about default/non-default, the problems I mentioned would be solved.  But I'm
afraid that this modification would have a big impact, since there are lots
of functions that calculate selectivity.  I also don't have a concrete idea
yet, so I will think about it.  Besides that, I'd like to ask the community
whether such a modification would be acceptable or not.

Best regards,

Yuzuko Hosoya
NTT 

Re: Tid scan improvements

2018-12-21 Thread Edmund Horner
On Fri, 21 Dec 2018 at 16:31, Tom Lane  wrote:
>
> Edmund Horner  writes:
> > For the forward scan, I seem to recall, from your merge join example,
> > that it's useful to set the pathkeys even when there are no
> > query_pathkeys.  We just have to unconditionally set them so that the
> > larger plan can make use of them.
>
> No.  Look at indxpath.c: it does not worry about pathkeys unless
> has_useful_pathkeys is true, and it definitely does not generate
> pathkeys that don't get past truncate_useless_pathkeys.  Those
> functions are responsible for worrying about whether mergejoin
> can use the pathkeys.  It's not tidpath.c's job to outthink them.

Ok.  I think that will simplify things.  So if I follow you correctly,
we should do:

1. If has_useful_pathkeys is true: generate pathkeys (for CTID ASC),
and use truncate_useless_pathkeys on them.
2. If we have tid quals or pathkeys, emit a TID scan path.
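
A rough sketch of that shape (not the actual patch: BuildCTIDPathkeys() and
the extra pathkeys argument to create_tidscan_path() are assumptions about
what the patched tidpath.c would provide; the other calls are existing
planner routines):

	List	   *pathkeys = NIL;
	Relids		required_outer = rel->lateral_relids;

	/* 1. Only build pathkeys if the rest of the query can use them. */
	if (has_useful_pathkeys(root, rel))
	{
		pathkeys = BuildCTIDPathkeys(root, rel);	/* hypothetical helper */
		pathkeys = truncate_useless_pathkeys(root, rel, pathkeys);
	}

	/* 2. Emit a TID scan path if we have either tid quals or useful pathkeys. */
	if (tidquals != NIL || pathkeys != NIL)
		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
												   pathkeys,	/* assumed extra argument */
												   required_outer));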

For the (optional) backwards scan support patch, should we separately
emit another path, in the reverse direction?  (My current patch only
creates one path, and tries to decide what the best direction is by
looking at query_pathkeys.  This doesn't fit into the above
algorithm.)