Re: pgbench - use pg logging capabilities

2020-01-09 Thread Fabien COELHO


Bonjour Michaël,


TBH, my recommendation would be to drop *all* of these likely()
and unlikely() calls.  What evidence have you got that those are
meaningfully improving the quality of the generated code?  And if
they're buried inside macros, they certainly aren't doing anything
useful in terms of documenting the code.


Yes.  I am wondering if we should not rework this part of the logging
with something like the attached.  My 2c, thoughts welcome.


ISTM that the intent is to minimise the performance impact of ignored 
pg_log calls, especially debug calls, which are the most likely to be 
ignored AND may sit in performance-critical places.


Compared to dealing with the level inside the call, the use of the level 
variable avoids a call-test-return cycle in this case, and the unlikely 
should help the compiler reorder instructions so that no actual branch is 
taken under the common case.


So I think that the current situation is a good thing at least for debug.

For other levels, they are on by default AND would not be placed at 
critical performance points, so the whole effort of avoiding the call is 
moot.


I agree with Tom that the __pg_log_level variable name violates C's reserved-identifier rules.

ISTM that switching the variable to an explicitly global name solves the issue, 
and that possibly the level test can be moved to inside the function for 
all but the debug level. See attached, which reprises some of your idea 
but keeps the outside filtering for the debug level.


--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 39c1a243d5..e7b33bb8e0 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3295,7 +3295,7 @@ executeMetaCommand(CState *st, instr_time *now)
 	argc = command->argc;
 	argv = command->argv;
 
-	if (unlikely(__pg_log_level <= PG_LOG_DEBUG))
+	if (unlikely(pg_log_current_level <= PG_LOG_DEBUG))
 	{
 		PQExpBufferData	buf;
 
diff --git a/src/common/logging.c b/src/common/logging.c
index c78ae793b8..ce12593e32 100644
--- a/src/common/logging.c
+++ b/src/common/logging.c
@@ -13,7 +13,7 @@
 
 #include "common/logging.h"
 
-enum pg_log_level __pg_log_level;
+enum pg_log_level pg_log_current_level;
 
 static const char *progname;
 static int	log_flags;
@@ -45,7 +45,7 @@ pg_logging_init(const char *argv0)
 	setvbuf(stderr, NULL, _IONBF, 0);
 
 	progname = get_progname(argv0);
-	__pg_log_level = PG_LOG_INFO;
+	pg_log_current_level = PG_LOG_INFO;
 
 	if (pg_color_env)
 	{
@@ -107,7 +107,7 @@ pg_logging_config(int new_flags)
 void
 pg_logging_set_level(enum pg_log_level new_level)
 {
-	__pg_log_level = new_level;
+	pg_log_current_level = new_level;
 }
 
 void
@@ -142,6 +142,9 @@ pg_log_generic_v(enum pg_log_level level, const char *pg_restrict fmt, va_list a
 	size_t		required_len;
 	char	   *buf;
 
+	if (unlikely(pg_log_current_level > level))
+		return;
+
 	Assert(progname);
 	Assert(level);
 	Assert(fmt);
diff --git a/src/include/common/logging.h b/src/include/common/logging.h
index 028149c7a1..c73c4ee76c 100644
--- a/src/include/common/logging.h
+++ b/src/include/common/logging.h
@@ -55,7 +55,7 @@ enum pg_log_level
 	PG_LOG_OFF,
 };
 
-extern enum pg_log_level __pg_log_level;
+extern enum pg_log_level pg_log_current_level;
 
 /*
  * Kind of a hack to be able to produce the psql output exactly as required by
@@ -72,24 +72,16 @@ void		pg_logging_set_locus_callback(void (*cb) (const char **filename, uint64 *l
 void		pg_log_generic(enum pg_log_level level, const char *pg_restrict fmt,...) pg_attribute_printf(2, 3);
 void		pg_log_generic_v(enum pg_log_level level, const char *pg_restrict fmt, va_list ap) pg_attribute_printf(2, 0);
 
-#define pg_log_fatal(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_FATAL)) pg_log_generic(PG_LOG_FATAL, __VA_ARGS__); \
-	} while(0)
+#define pg_log_fatal(...) pg_log_generic(PG_LOG_FATAL, __VA_ARGS__)
 
-#define pg_log_error(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_ERROR)) pg_log_generic(PG_LOG_ERROR, __VA_ARGS__); \
-	} while(0)
+#define pg_log_error(...) pg_log_generic(PG_LOG_ERROR, __VA_ARGS__)
 
-#define pg_log_warning(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_WARNING)) pg_log_generic(PG_LOG_WARNING, __VA_ARGS__); \
-	} while(0)
+#define pg_log_warning(...) pg_log_generic(PG_LOG_WARNING, __VA_ARGS__)
 
-#define pg_log_info(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_INFO)) pg_log_generic(PG_LOG_INFO, __VA_ARGS__); \
-	} while(0)
+#define pg_log_info(...) pg_log_generic(PG_LOG_INFO, __VA_ARGS__)
 
 #define pg_log_debug(...) do { \
-		if (unlikely(__pg_log_level <= PG_LOG_DEBUG)) pg_log_generic(PG_LOG_DEBUG, __VA_ARGS__); \
+		if (unlikely(pg_log_current_level <= PG_LOG_DEBUG)) pg_log_generic(PG_LOG_DEBUG, __VA_ARGS__); \
 	} while(0)
 
 #endif			/* COMMON_LOGGING_H */


Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after dropping temp table schema

2020-01-09 Thread Mahendra Singh Thalor
On Thu, 9 Jan 2020 at 09:36, Michael Paquier  wrote:
>
> On Wed, Jan 08, 2020 at 10:56:01AM +0900, Michael Paquier wrote:
> > And done this way as per the attached.  I am of course open to
> > objections or better ideas, though this looks formulation looks pretty
> > good to me.  Robert?
>
> Just to be clear here, I would like to commit this patch and backpatch
> with the current formulation in the error strings in the follow-up
> days.  In 9.4~10, the error cannot be reached, but that feels safer if
> we begin to work again on this portion of the autovacuum code.  So if
> you would like to object, that's the moment..
> --

Hi,
I reviewed and tested the patch. After applying the patch, I am getting another
assert failure.

postgres=# CREATE TEMPORARY TABLE temp (a int);
CREATE TABLE
postgres=# \d
  List of relations
  Schema   | Name | Type  |  Owner
-----------+------+-------+----------
 pg_temp_3 | temp | table | mahendra
(1 row)

postgres=# drop schema pg_temp_3 cascade ;
NOTICE:  drop cascades to table temp
DROP SCHEMA
postgres=# \d
Did not find any relations.
postgres=# CREATE TEMPORARY TABLE temp (a int);
CREATE TABLE
postgres=# \d
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
postgres=#

*Stack trace:*
(gdb) bt
#0  0x7f7d749bd277 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f7d749be968 in __GI_abort () at abort.c:90
#2  0x00eca3c4 in ExceptionalCondition (conditionName=0x114cc08
"relation->rd_backend != InvalidBackendId", errorType=0x114ca8b
"FailedAssertion",
fileName=0x114c8b0 "relcache.c", lineNumber=1123) at assert.c:67
#3  0x00eaacb9 in RelationBuildDesc (targetRelId=16392,
insertIt=true) at relcache.c:1123
#4  0x00eadf61 in RelationIdGetRelation (relationId=16392) at
relcache.c:2021
#5  0x0049f370 in relation_open (relationId=16392, lockmode=8) at
relation.c:59
#6  0x0064ccda in heap_drop_with_catalog (relid=16392) at
heap.c:1890
#7  0x006435f3 in doDeletion (object=0x2d623c0, flags=21) at
dependency.c:1360
#8  0x00643180 in deleteOneObject (object=0x2d623c0,
depRel=0x7ffcb9636290, flags=21) at dependency.c:1261
#9  0x00640d97 in deleteObjectsInList (targetObjects=0x2dce438,
depRel=0x7ffcb9636290, flags=21) at dependency.c:271
#10 0x00640ed6 in performDeletion (object=0x7ffcb96363b0,
behavior=DROP_CASCADE, flags=21) at dependency.c:356
#11 0x00aebc3d in do_autovacuum () at autovacuum.c:2269
#12 0x00aea478 in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1693
#13 0x00ae9cf9 in StartAutoVacWorker () at autovacuum.c:1487
#14 0x00b13cdc in StartAutovacuumWorker () at postmaster.c:5562
#15 0x00b131b5 in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:5279
#16 
#17 0x7f7d74a7cc53 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:81
#18 0x00b09fc9 in ServerLoop () at postmaster.c:1691
#19 0x00b09544 in PostmasterMain (argc=3, argv=0x2ce2290) at
postmaster.c:1400
#20 0x00974b43 in main (argc=3, argv=0x2ce2290) at main.c:210

I think, before committing the 1st patch, we should also fix this crash, and
then we should commit all the patches.

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


Re: CREATE ROUTINE MAPPING

2020-01-09 Thread Tom Lane
[ I wasn't paying much attention to this thread at the time, but
  Kyotaro-san just drew my attention to it again ]

Masahiko Sawada  writes:
> I agree that this feature covers A and B as the first step. But I'm
> concerned that for D (and maybe for C?) the volatility of mapped
> function could be changed. That is, currently we allow to push down
> only immutable functions but they might break it. Also, can the
> replacing a function with any expression be a risk of sql injections?

Yes, I'm afraid that there's serious security issues here that
the SQL standard fails to address.  Let's assume that user A has
created a function F(), and user B has created a foreign server S.
What privileges should be required for user C to create a mapping
for F on S?  AFAICS, the spec only requires C to have USAGE on S.
This means that all C needs is USAGE on S to create a trojan horse
that will execute arbitrary code when any other user D executes
a query using F on S.  C doesn't need to have any privilege on F
at all, and USAGE is not exactly strong privilege on S --- you
certainly wouldn't expect that "USAGE" translates to "I can
backdoor anybody else's usage of this server".
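
To make the scenario concrete, a rough sketch (CREATE ROUTINE MAPPING is
not implemented in Postgres; the statement below only mimics the general
shape of the spec's syntax, and all names are made up):

    -- user A owns an innocuous function:
    CREATE FUNCTION f(int) RETURNS int LANGUAGE sql AS 'SELECT $1';
    -- user B owns a foreign server and grants mere USAGE to user C:
    CREATE SERVER s FOREIGN DATA WRAPPER postgres_fdw;
    GRANT USAGE ON FOREIGN SERVER s TO c;
    -- user C, with only USAGE on s and no privilege on f at all,
    -- redirects f to arbitrary remote code:
    CREATE ROUTINE MAPPING m FOR f(int) SERVER s
        OPTIONS (remote_routine 'evil_function');
    -- any query by user D that pushes f() down to s now runs evil_function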

I see that SQL:2011's access rules for "CREATE ROUTINE MAPPING" are

1) The applicable privileges shall include the USAGE privilege on
the foreign server identified by FSN.

2) Additional privileges, if any, necessary to execute  are implementation-defined.

It seems to me that (2) should be read as "we know we blew it here,
but we're leaving it up to implementors to fix this".

Some of the alternatives that were discussed upthread basically replace
this whole idea with attaching properties to the original function F.
I think I like that a lot better from a security perspective.  If you are
calling F in your query, you are already placing trust in F's owner.
Better to let F's owner define how it maps to functions on remote servers
than to let random third parties define that.

regards, tom lane




Re: [Logical Replication] TRAP: FailedAssertion("rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT || rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL || rel->rd_rel->relreplident == REPLICA_IDE

2020-01-09 Thread Dilip Kumar
On Fri, Jan 10, 2020 at 10:31 AM Michael Paquier  wrote:
>
> On Fri, Jan 10, 2020 at 07:30:34AM +0530, Dilip Kumar wrote:
> > On Thu, 9 Jan 2020 at 10:43 PM, Andres Freund  wrote:
> >> There's not much point in having this assert, right? Given that it
> >> covers all choices? Seems better to just drop it.
> >
> > Yeah right!
>
> Refreshing my mind on that...  The two remaining assertions still make
> sense for update and delete changes per the restrictions in place in
> CheckCmdReplicaIdentity(),

Right

> and there is a gap with the regression
> tests.  So combining all that I get the attached patch (origin point
> is 665d1fa).  Thoughts?

LGTM

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: remove some STATUS_* symbols

2020-01-09 Thread Michael Paquier
On Thu, Jan 09, 2020 at 11:15:08AM +0100, Peter Eisentraut wrote:
> You mean put the subsequent GrantLock() calls into LockCheckConflicts()? That
> would technically save some duplicate code, but it seems weird, because
> LockCheckConflicts() is notionally a read-only function that shouldn't
> change state.

Nah.  I was thinking about the first part of this "if" clause that
LockCheckConflicts is part of here:

    if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
        status = STATUS_FOUND;
    else
        status = LockCheckConflicts(lockMethodTable, lockmode,
                                    lock, proclock);

But now that I look at it closely it messes up heavily with
ProcSleep() ;)
--
Michael


signature.asc
Description: PGP signature


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-01-09 Thread Dilip Kumar
On Mon, Dec 30, 2019 at 3:43 PM Amit Kapila  wrote:
>
> On Sun, Dec 29, 2019 at 1:34 PM Dilip Kumar  wrote:
> >
> > I have observed some more issues
> >
> > 1. Currently, in ReorderBufferCommit, it is always expected that
> > whenever we get REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM, we must
> > have already got REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT and in
> > SPEC_CONFIRM we send the tuple we got in SPEC_INSERT.  But, now those
> > two messages can be in different streams.  So we need to find a way to
> > handle this.  Maybe once we get SPEC_INSERT then we can remember the
> > tuple and then if we get the SPEC_CONFIRM in the next stream we can
> > send that tuple?
> >
>
> Your suggestion makes sense to me.  So, we can try it.

I have implemented this and attached it as a separate patch in my
latest patch set [1].
>
> > 2. During commit time in DecodeCommit we check whether we need to skip
> > the changes of the transaction or not by calling
> > SnapBuildXactNeedsSkip but since now we support streaming so it's
> > possible that before we decode the commit WAL, we might have already
> > sent the changes to the output plugin even though we could have
> > skipped those changes.  So my question is instead of checking at the
> > commit time can't we check before adding to ReorderBuffer itself
> >
>
> I think if we can do that then the same will be true for current code
> irrespective of this patch.  I think it is possible that we can't take
> that decision while decoding because we haven't assembled a consistent
> snapshot yet.  I think we might be able to do that while we try to
> stream the changes.  I think we need to take care of all the
> conditions during streaming (when the logical_decoding_workmem limit
> is reached) as we do in DecodeCommit.  This needs a bit more study.

I have analyzed this further and I think we cannot decide all the
conditions even while streaming.  IMHO, once we reach
SNAPBUILD_FULL_SNAPSHOT we can add the changes to the reorder buffer,
so that we can send them if we get the commit of the transaction after
we have reached SNAPBUILD_CONSISTENT.  However, if we get the commit
before we reach SNAPBUILD_CONSISTENT then we need to ignore this
transaction.  So, even if we have SNAPBUILD_FULL_SNAPSHOT we may stream
changes which later get dropped, and that we cannot decide while
streaming.

[1] 
https://www.postgresql.org/message-id/CAFiTN-snMb%3D53oqkM8av8Lqfxojjm4OBwCNxmFssgLCceY_zgg%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-01-09 Thread Tom Lane
Noah Misch  writes:
> On Thu, Jan 09, 2020 at 12:45:41AM -0500, Tom Lane wrote:
>> (1) Why did we get a crash and not some more-decipherable out-of-resources
>> error?  Can we improve that experience?

> By default, 32-bit AIX binaries have maxdata:0x00000000.  Specifying
> maxdata:0x10000000 provides the same 256M of RAM, yet it magically changes the
> SIGSEGV to ENOMEM:
> ...
> We could add -Wl,-bmaxdata:0x10000000 (or a higher value) to LDFLAGS when
> building for 32-bit AIX.

+1, seems like that would improve matters considerably on that platform.

>> (2) Should we be dialing back the resource consumption of this test?
>> Even on machines where it doesn't fail outright, I'd imagine that it's
>> costing a lot of buildfarm cycles.  Is it actually worth that?

> The test's resource usage, being quite low, should not be a factor in the
> test's fate.  On my usual development machine, the entire
> 006_logical_decoding.pl file takes just 3s and ~250 MiB of RAM.

Yeah, as I noted downthread, it appears that initdb itself can't
succeed with less than ~250MB these days.  My old-school self
feels like that's excessive, but I must admit I'm not motivated
to go reduce it right now.  But I think it's a clear win to fail
with "out of memory" rather than "SIGSEGV", so I think we ought
to adjust the AIX build options as you suggest.

regards, tom lane




Re: [HACKERS] Block level parallel vacuum

2020-01-09 Thread Masahiko Sawada
On Thu, 9 Jan 2020 at 19:33, Amit Kapila  wrote:
>
> On Thu, Jan 9, 2020 at 10:41 AM Masahiko Sawada
>  wrote:
> >
> > On Wed, 8 Jan 2020 at 22:16, Amit Kapila  wrote:
> > >
> > >
> > > What do you think of the attached?  Sawada-san, kindly verify the
> > > changes and let me know your opinion.
> >
> > I agreed to not include both the FAST option patch and
> > DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus
> > on the main part and we can discuss and add them later if want.
> >
> > I've looked at the latest version patch you shared. Overall it looks
> > good and works fine. I have a few small comments:
> >
>
> I have addressed all your comments and slightly change nearby comments
> and ran pgindent.  I think we can commit the first two preparatory
> patches now unless you or someone else has any more comments on those.

Yes.

I'd like to briefly summarize the
v43-0002-Allow-vacuum-command-to-process-indexes-in-parallel patch for
other reviewers who want to start reviewing it:

Introduce the PARALLEL option to the VACUUM command. Parallel vacuum is
enabled by default. The number of parallel workers is determined based
on the number of indexes that support parallel index vacuum when the
user doesn't specify the parallel degree or the PARALLEL option is
omitted. Specifying PARALLEL 0 disables parallel vacuum.
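
For example, usage looks roughly like this (just a sketch based on the
description above; the exact spelling is whatever the committed patch
ends up with):

    VACUUM (PARALLEL 4) mytab;  -- use up to 4 workers for index vacuuming
    VACUUM (PARALLEL 0) mytab;  -- disable parallel vacuum
    VACUUM mytab;               -- degree chosen from the number of eligible indexes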

In parallel vacuum with this patch, only the leader process does the heap
scan and collects dead tuple TIDs in the DSM segment. Before starting
index vacuum or index cleanup the leader launches the parallel workers
and performs the work together with them. Each index is processed by one
vacuum worker process, so parallel vacuum can be used when the table has
at least 2 indexes (the leader always takes one index). After launching
the parallel workers, the leader process first vacuums the indexes that
don't support parallel index vacuum. The parallel workers process the
indexes that do support parallel index vacuum, and the leader joins them
as a worker after completing its indexes. Once all indexes are processed
the parallel worker processes exit.  After that, the leader process
re-initializes the parallel context so that it can use the same DSM for
multiple passes of index vacuum and for performing index cleanup.  For
updating the index statistics, we need to update the system table, and
since updates are not allowed during parallel mode we update the index
statistics after exiting from parallel mode.

When the vacuum cost-based delay is enabled, even parallel vacuum is
throttled. The basic idea of a cost-based vacuum delay for parallel
index vacuuming is to allow all parallel vacuum workers including the
leader process to have a shared view of cost related parameters
(mainly VacuumCostBalance). We allow each worker to update it as and
when it has incurred any cost and then based on that decide whether it
needs to sleep.  We allow the worker to sleep proportional to the work
done and reduce the VacuumSharedCostBalance by the amount which is
consumed by the current worker (VacuumCostBalanceLocal).  This avoids
making workers that have done little or no I/O sleep as much as other
workers, and therefore ensures that workers doing more I/O get
throttled more.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Block level parallel vacuum

2020-01-09 Thread Mahendra Singh Thalor
On Thu, 9 Jan 2020 at 17:31, Sergei Kornilov  wrote:
>
> Hello
>
> I noticed that parallel vacuum uses the min_parallel_index_scan_size GUC to skip
> small indexes, but this is not mentioned in the documentation for either the
> VACUUM command or the GUC itself.
>
> +   /* Determine the number of parallel workers to launch */
> +   if (lps->lvshared->for_cleanup)
> +   {
> +   if (lps->lvshared->first_time)
> +   nworkers = lps->nindexes_parallel_cleanup +
> +   lps->nindexes_parallel_condcleanup - 1;
> +   else
> +   nworkers = lps->nindexes_parallel_cleanup - 1;
> +
> +   }
> +   else
> +   nworkers = lps->nindexes_parallel_bulkdel - 1;

v43-0001-Introduce-IndexAM-fields-for-parallel-vacuum and
v43-0001-Introduce-IndexAM-fields-for-parallel-vacuum patches look
fine to me.

Below are some review comments for v43-0002 patch.

1.
+integer
+
+ 
+  Specifies a positive integer value passed to the selected option.
+  The integer value can
+  also be omitted, in which case the value is decided by the command
+  based on the option used.
+ 
+http://www.enterprisedb.com




Re: [Logical Replication] TRAP: FailedAssertion("rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT || rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL || rel->rd_rel->relreplident == REPLICA_IDE

2020-01-09 Thread Michael Paquier
On Fri, Jan 10, 2020 at 07:30:34AM +0530, Dilip Kumar wrote:
> On Thu, 9 Jan 2020 at 10:43 PM, Andres Freund  wrote:
>> There's not much point in having this assert, right? Given that it
>> covers all choices? Seems better to just drop it.
>
> Yeah right!

Refreshing my mind on that...  The two remaining assertions still make
sense for update and delete changes per the restrictions in place in
CheckCmdReplicaIdentity(), and there is a gap with the regression
tests.  So combining all that I get the attached patch (origin point
is 665d1fa).  Thoughts? 
--
Michael
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index dcf7c08c18..3c6d0cd171 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -143,10 +143,6 @@ logicalrep_write_insert(StringInfo out, Relation rel, HeapTuple newtuple)
 {
 	pq_sendbyte(out, 'I');		/* action INSERT */
 
-	Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT ||
-		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
-		   rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX);
-
 	/* use Oid as relation identifier */
 	pq_sendint32(out, RelationGetRelid(rel));
 
diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl
index 77a1560b23..b9608d21f2 100644
--- a/src/test/subscription/t/001_rep_changes.pl
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 22;
+use Test::More tests => 23;
 
 # Initialize publisher node
 my $node_publisher = get_new_node('publisher');
@@ -34,6 +34,10 @@ $node_publisher->safe_psql('postgres',
 $node_publisher->safe_psql('postgres',
 	"CREATE TABLE tab_include (a int, b text, CONSTRAINT covering PRIMARY KEY(a) INCLUDE(b))"
 );
+# Let this table with REPLICA IDENTITY NOTHING, allowing only INSERT changes.
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_nothing (a int)");
+$node_publisher->safe_psql('postgres',
+	"ALTER TABLE tab_nothing REPLICA IDENTITY NOTHING");
 
 # Setup structure on subscriber
 $node_subscriber->safe_psql('postgres', "CREATE TABLE tab_notrep (a int)");
@@ -42,6 +46,7 @@ $node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
 $node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
 $node_subscriber->safe_psql('postgres',
 	"CREATE TABLE tab_rep (a int primary key)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_nothing (a int)");
 
 # different column count and order than on publisher
 $node_subscriber->safe_psql('postgres',
@@ -59,7 +64,7 @@ $node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub");
 $node_publisher->safe_psql('postgres',
 	"CREATE PUBLICATION tap_pub_ins_only WITH (publish = insert)");
 $node_publisher->safe_psql('postgres',
-	"ALTER PUBLICATION tap_pub ADD TABLE tab_rep, tab_full, tab_full2, tab_mixed, tab_include"
+	"ALTER PUBLICATION tap_pub ADD TABLE tab_rep, tab_full, tab_full2, tab_mixed, tab_include, tab_nothing"
 );
 $node_publisher->safe_psql('postgres',
 	"ALTER PUBLICATION tap_pub_ins_only ADD TABLE tab_ins");
@@ -97,6 +102,9 @@ $node_publisher->safe_psql('postgres', "UPDATE tab_rep SET a = -a");
 $node_publisher->safe_psql('postgres',
 	"INSERT INTO tab_mixed VALUES (2, 'bar', 2.2)");
 
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab_nothing VALUES (generate_series(1,20))");
+
 $node_publisher->safe_psql('postgres',
 	"INSERT INTO tab_include SELECT generate_series(1,50)");
 $node_publisher->safe_psql('postgres',
@@ -117,6 +125,10 @@ $result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab_mixed");
 is( $result, qq(local|1.1|foo|1
 local|2.2|bar|2), 'check replicated changes with different column order');
 
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*) FROM tab_nothing");
+is( $result, qq(20), 'check replicated changes with REPLICA IDENTITY NOTHING');
+
 $result = $node_subscriber->safe_psql('postgres',
 	"SELECT count(*), min(a), max(a) FROM tab_include");
 is($result, qq(20|-20|-1),


signature.asc
Description: PGP signature


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-01-09 Thread Dilip Kumar
On Sat, Jan 4, 2020 at 4:07 PM Amit Kapila  wrote:
>
> On Mon, Dec 30, 2019 at 3:11 PM Dilip Kumar  wrote:
> >
> > On Thu, Dec 12, 2019 at 9:44 AM Dilip Kumar  wrote:

> > > > 0002-Issue-individual-invalidations-with-wal_level-log
> > > > 
> > > > 1.
> > > > xact_desc_invalidations(StringInfo buf,
> > > > {
> > > > ..
> > > > + else if (msg->id == SHAREDINVALSNAPSHOT_ID)
> > > > + appendStringInfo(buf, " snapshot %u", msg->sn.relId);
> > > >
> > > > You have removed logging for the above cache but forgot to remove its
> > > > reference from one of the places.  Also, I think you need to add a
> > > > comment somewhere in inval.c to say why you are writing for WAL for
> > > > some types of invalidations and not for others?
> > Done
> >
>
> I don't see any new comments as asked by me.
Done

> I think we should also
> consider WAL logging at each command end instead of doing piecemeal as
> discussed in another email [1], which will have lesser code changes
> and maybe better in performance.  You might want to evaluate the
> performance of both approaches.

Still pending, will work on this.
>
> > > > 0005-Gracefully-handle-concurrent-aborts-of-uncommitte
> > > > --
> > > > 1.
> > > > @@ -1877,6 +1877,7 @@ ReorderBufferCommit(ReorderBuffer *rb, 
> > > > TransactionId xid,
> > > >   PG_CATCH();
> > > >   {
> > > >   /* TODO: Encapsulate cleanup
> > > > from the PG_TRY and PG_CATCH blocks */
> > > > +
> > > >   if (iterstate)
> > > >   ReorderBufferIterTXNFinish(rb, iterstate);
> > > >
> > > > Spurious line change.
> > > >
> > Done
>
> + /*
> + * We don't expect direct calls to heap_getnext with valid
> + * CheckXidAlive for regular tables. Track that below.
> + */
> + if (unlikely(TransactionIdIsValid(CheckXidAlive) &&
> + !(IsCatalogRelation(scan->rs_base.rs_rd) ||
> +   RelationIsUsedAsCatalogTable(scan->rs_base.rs_rd
> + elog(ERROR, "improper heap_getnext call");
>
> Earlier, I thought we don't need to check if it is a regular table in
> this check, but it is required because output plugins can try to do
> that and if they do so during decoding (with historic snapshots), the
> same should be not allowed.
>
> How about changing the error message to "unexpected heap_getnext call
> during logical decoding" or something like that?
Done
>
> > > > 2. The commit message of this patch refers to Prepared transactions.
> > > > I think that needs to be changed.
> > > >
> > > > 0006-Implement-streaming-mode-in-ReorderBuffer
> > > > -
>
> Few comments on v4-0018-Review-comment-fix-and-refactoring:
> 1.
> + if (streaming)
> + {
> + /*
> + * Set the last last of the stream as the final lsn before calling
> + * stream stop.
> + */
> + txn->final_lsn = prev_lsn;
> + rb->stream_stop(rb, txn);
> + }
>
> Shouldn't we try to set final_lsn as is done by Vignesh's patch [2]?
We already agreed on the current implementation.
>
> 2.
> + if (streaming)
> + {
> + /*
> + * Set the CheckXidAlive to the current (sub)xid for which this
> + * change belongs to so that we can detect the abort while we are
> + * decoding.
> + */
> + CheckXidAlive = change->txn->xid;
> +
> + /* Increment the stream count. */
> + streamed++;
> + }
>
> Is the variable 'streamed' used anywhere?
Removed
>
> 3.
> + /*
> + * Destroy the (relfilenode, ctid) hashtable, so that we don't leak
> + * any memory. We could also keep the hash table and update it with
> + * new ctid values, but this seems simpler and good enough for now.
> + */
> + ReorderBufferDestroyTupleCidHash(rb, txn);
>
> Won't this be required only when we are streaming changes?
Fixed
>
> As per my understanding apart from the above comments, the known
> pending work for this patchset is as follows:
> a. The two open items agreed to you in the email [3].
> b. Complete the handling of schema_sent as discussed above [4].
> c. Few comments by Vignesh and the response on the same by me [5][6].
> d. WAL overhead and performance testing for additional WAL logging by
> this patchset.
> e. Some way to see the tuple for streamed transactions by decoding API
> as speculated by you [7].
>
> Have I missed anything?
I have worked on most of these items; I will reply to them separately.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench - use pg logging capabilities

2020-01-09 Thread Michael Paquier
On Thu, Jan 09, 2020 at 08:09:29PM -0500, Tom Lane wrote:
> TBH, my recommendation would be to drop *all* of these likely()
> and unlikely() calls.  What evidence have you got that those are
> meaningfully improving the quality of the generated code?  And if
> they're buried inside macros, they certainly aren't doing anything
> useful in terms of documenting the code.

Yes.  I am wondering if we should not rework this part of the logging
with something like the attached.  My 2c, thoughts welcome.
--
Michael
diff --git a/src/include/common/logging.h b/src/include/common/logging.h
index 028149c7a1..dd54dc57d0 100644
--- a/src/include/common/logging.h
+++ b/src/include/common/logging.h
@@ -55,8 +55,6 @@ enum pg_log_level
 	PG_LOG_OFF,
 };
 
-extern enum pg_log_level __pg_log_level;
-
 /*
  * Kind of a hack to be able to produce the psql output exactly as required by
  * the regression tests.
@@ -66,6 +64,7 @@ extern enum pg_log_level __pg_log_level;
 void		pg_logging_init(const char *argv0);
 void		pg_logging_config(int new_flags);
 void		pg_logging_set_level(enum pg_log_level new_level);
+enum pg_log_level pg_logging_get_level(void);
 void		pg_logging_set_pre_callback(void (*cb) (void));
 void		pg_logging_set_locus_callback(void (*cb) (const char **filename, uint64 *lineno));
 
@@ -73,23 +72,23 @@ void		pg_log_generic(enum pg_log_level level, const char *pg_restrict fmt,...) p
 void		pg_log_generic_v(enum pg_log_level level, const char *pg_restrict fmt, va_list ap) pg_attribute_printf(2, 0);
 
 #define pg_log_fatal(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_FATAL)) pg_log_generic(PG_LOG_FATAL, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_FATAL, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_error(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_ERROR)) pg_log_generic(PG_LOG_ERROR, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_ERROR, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_warning(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_WARNING)) pg_log_generic(PG_LOG_WARNING, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_WARNING, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_info(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_INFO)) pg_log_generic(PG_LOG_INFO, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_INFO, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_debug(...) do { \
-		if (unlikely(__pg_log_level <= PG_LOG_DEBUG)) pg_log_generic(PG_LOG_DEBUG, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_DEBUG, __VA_ARGS__); \
 	} while(0)
 
 #endif			/* COMMON_LOGGING_H */
diff --git a/src/common/logging.c b/src/common/logging.c
index c78ae793b8..18b3c94fe2 100644
--- a/src/common/logging.c
+++ b/src/common/logging.c
@@ -13,7 +13,7 @@
 
 #include "common/logging.h"
 
-enum pg_log_level __pg_log_level;
+static enum pg_log_level __pg_log_level;
 
 static const char *progname;
 static int	log_flags;
@@ -110,6 +110,12 @@ pg_logging_set_level(enum pg_log_level new_level)
 	__pg_log_level = new_level;
 }
 
+enum pg_log_level
+pg_logging_get_level(void)
+{
+	return __pg_log_level;
+}
+
 void
 pg_logging_set_pre_callback(void (*cb) (void))
 {
@@ -127,6 +133,9 @@ pg_log_generic(enum pg_log_level level, const char *pg_restrict fmt,...)
 {
 	va_list		ap;
 
+	if (level < __pg_log_level)
+		return;
+
 	va_start(ap, fmt);
 	pg_log_generic_v(level, fmt, ap);
 	va_end(ap);
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 39c1a243d5..5e63e1f51a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3295,7 +3295,7 @@ executeMetaCommand(CState *st, instr_time *now)
 	argc = command->argc;
 	argv = command->argv;
 
-	if (unlikely(__pg_log_level <= PG_LOG_DEBUG))
+	if (pg_logging_get_level() <= PG_LOG_DEBUG)
 	{
 		PQExpBufferData	buf;
 


signature.asc
Description: PGP signature


Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-01-09 Thread Amit Kapila
On Fri, Jan 10, 2020 at 6:10 AM Tom Lane  wrote:
>
> I wrote:
> >   ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 
> > chunks); 223295504 used
>
> > The test case is only inserting 50K fairly-short rows, so this seems
> > like an unreasonable amount of memory to be consuming for that; and
> > even if you think it's reasonable, it clearly isn't going to scale
> > to large production transactions.
>
> > Now, the good news is that v11 and later get through
> > 006_logical_decoding.pl just fine under the same restriction.
> > So we did something in v11 to fix this excessive memory consumption.
> > However, unless we're willing to back-port whatever that was, this
> > test case is clearly consuming excessive resources for the v10 branch.
>
> I dug around a little in the git history for backend/replication/logical/,
> and while I find several commit messages mentioning memory leaks and
> faulty spill logic, they all claim to have been back-patched as far
> as 9.4.
>
> It seems reasonably likely to me that this result is telling us about
> an actual bug, ie, faulty back-patching of one or more of those fixes
> into v10 and perhaps earlier branches.
>

I think it would be good to narrow down this problem, but it seems we
can do this separately.  To avoid forgetting about this, can we track it
somewhere as an open issue (in the Older Bugs section of PostgreSQL 12
Open Items, or some other place)?

It seems to me that this test has found a problem in back-branches, so
we might want to keep it after removing the max_files_per_process
restriction.  However, unless we narrow down this memory leak it is
not a good idea to keep it, at least not in v10.  So, we have the below
options:
(a) remove this test entirely from all branches and, once we find the
memory leak problem in back-branches, consider adding it again
without the max_files_per_process restriction.
(b) keep this test without the max_files_per_process restriction down to
v11, and once the memory leak issue in v10 is found, back-patch it to
v10 as well.

Suggestions?

>  If I have to do so to prove my point, I will set up a buildfarm member
>  that uses USE_NAMED_POSIX_SEMAPHORES, and then insist that the patch
>  cope with that.
>

Shall we document that under USE_NAMED_POSIX_SEMAPHORES we consume an
additional fd?  I thought about it because the minimum limit for
max_files_per_process is 25, and with that the system won't even start
on a platform where USE_NAMED_POSIX_SEMAPHORES is enabled.
Also, if this had been explicitly mentioned, I think this test
wouldn't have tried to be so optimistic about
max_files_per_process.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: base backup client as auxiliary backend process

2020-01-09 Thread Masahiko Sawada
On Fri, 22 Nov 2019 at 19:22, Peter Eisentraut
 wrote:
>
> On 2019-11-15 14:52, Sergei Kornilov wrote:
> >> I looked into this. It seems trivial to make walsender create and use a
> >> temporary replication slot by default if no permanent replication slot
> >> is specified. This is basically the logic that pg_basebackup has but
> >> done server-side. See attached patch for a demonstration. Any reason
> >> not to do that?
> > Seems this would break pg_basebackup --no-slot option?
>
> After thinking about this a bit more, doing the temporary slot stuff on
> the walsender side might lead to too many complications in practice.
>
> Here is another patch set that implements the temporary slot use on the
> walreceiver side, essentially mirroring what pg_basebackup already does.
>
> I think this patch set might be useful on its own, even without the base
> backup stuff to follow.
>

I agree that these patches are useful on their own, and the 0001 and
0002 patches look good to me. For the 0003 patch,

+  linkend="guc-primary-slot-name"/>.  Otherwise, the WAL receiver may use
+  a temporary replication slot (determined by ), but these are not shown
+  here.

I think it's better to show the temporary slot name in the
pg_stat_wal_receiver view. Otherwise the user would have no idea which
temporary slot the WAL receiver is using.
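
For instance, this is where one would naturally look (a sketch using the
view's existing columns; with a temporary slot the slot_name column would
currently just be left empty):

    SELECT pid, status, slot_name, conninfo FROM pg_stat_wal_receiver;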

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Avoiding hash join batch explosions with extreme skew and weird stats

2020-01-09 Thread Melanie Plageman
On Tue, Jan 7, 2020 at 4:14 PM Thomas Munro  wrote:

> * I am uneasy about BarrierArriveExplicitAndWait() (a variant of
> BarrierArriveAndWait() that lets you skip directly to a given phase?);
> perhaps you only needed that for a circular phase system, which you
> could do with modular phase numbers, like PHJ_GROW_BATCHES_PHASE?  I
> tried to make the barrier interfaces look like the libraries in other
> parallel programming environments, and I'd be worried that the
> explicit phase thing could easily lead to bugs.
>

So, I actually use it to circle back up to the first phase while
skipping the last phase.
So I couldn't do it with modular phase numbers and a loop.
The last phase detaches from the chunk barrier. I don't want to detach
from the chunk barrier if there are more chunks.
I basically need a way to only attach to the chunk barrier at the
beginning of the first chunk and only detach at the end of the last
chunk--not in between chunks. I will return from the function and
re-enter between chunks -- say between chunk 2 and chunk 3 of 5.

However, could this be solved by having more than one chunk
barrier?
A worker would attach to one chunk barrier and then when it moves to
the next chunk it would attach to the other chunk barrier and then
switch back when it switches to the next chunk. Then it could detach
and attach each time it enters/leaves the function.


> * I'm not sure it's OK to wait at the end of each loop, as described
> in the commit message:
>
> Workers probing a fallback batch will wait until all workers have
> finished probing before moving on so that an elected worker can read
> and combine the outer match status files into a single bitmap and use
> it to emit unmatched outer tuples after all chunks of the inner side
> have been processed.
>
> Maybe I misunderstood completely, but that seems to break the
> programming rule described in nodeHashjoin.c's comment beginning "To
> avoid deadlocks, ...".  To recap: (1) When you emit a tuple, the
> program counter escapes to some other node, and maybe that other node
> waits for thee, (2) Maybe the leader is waiting for you but you're
> waiting for it to drain its queue so you can emit a tuple (I learned a
> proper name for this: "flow control deadlock").  That's why the
> current code only ever detaches (a non-waiting operation) after it's
> begun emitting tuples (that is, the probing phase).  It just moves
> onto another batch.  That's not a solution here: you can't simply move
> to another loop, loops are not independent of each other like batches.
> It's possible that barriers are not the right tool for this part of
> the problem, or that there is a way to use a barrier that you don't
> remain attached to while emitting, or that we should remove the
> deadlock risks another way entirely[1] but I'm not sure.  Furthermore,
> the new code in ExecParallelHashJoinNewBatch() appears to break the
> rule even in the non-looping case (it calls BarrierArriveAndWait() in
> ExecParallelHashJoinNewBatch(), where the existing code just
> detaches).
>

Yea, I think I'm totally breaking that rule.
Just to make sure I understand the way in which I am breaking that
rule:

In my patch, while attached to a chunk_barrier, worker1 emits a
matched tuple (control leaves the current node).  Meanwhile, worker2
has finished probing the chunk and is waiting on the chunk_barrier for
worker1.
How though could worker1 be waiting for worker2?

Is this only a problem when one of the barrier participants is the
leader and is reading from the tuple queue? (reading your tuple queue
deadlock hazard example in the thread [1] you referred to).
Basically is my deadlock hazard a tuple queue deadlock hazard?

I thought maybe this could be a problem with nested HJ nodes, but I'm
not sure.

As I understand it, this isn't a problem with current master with
batch barriers because while attached to a batch_barrier, a worker can
emit tuples. No other workers will wait on the batch barrier once they
have started probing.

I need to think more about the suggestions you provided in [1] about
nixing the tuple queue deadlock hazard.

However, hypothetically, if we decide we don't want to break the no
emitting tuples while attached to a barrier rule, how can we still
allow workers to coordinate while probing chunks of the batch
sequentially (1 chunk at a time)?

I could think of two options (both sound slow and bad):

Option 1:
Stash away the matched tuples in a tuplestore and emit them at the end
of the batch (incurring more writes).

Option 2:
Degenerate to 1 worker for fallback batches

Any other ideas?


>
> >   - Rename "chunk" (as in chunks of inner side) to something that is
> > not already used in the context of memory chunks and, more
> > importantly, SharedTuplestoreChunk
>
> +1.  Fragments?  Loops?  Blocks (from
> https://en.wikipedia.org/wiki/Block_nested_loop, though, no, strike
> that, blocks are also super overloaded).
>

Hmmm. I think loop is kinda 

Re: [Logical Replication] TRAP: FailedAssertion("rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT || rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL || rel->rd_rel->relreplident == REPLICA_IDE

2020-01-09 Thread Dilip Kumar
On Thu, 9 Jan 2020 at 10:43 PM, Andres Freund  wrote:

> Hi,
>
> On 2020-01-09 13:17:59 +0530, Dilip Kumar wrote:
> > I am able to reproduce the failure,  I think the assert in the
> > 'logicalrep_write_insert' is not correct.  IMHO even if the replica
> > identity is set to NOTHING we should be able to replicate INSERT?
> >
> > This will fix the issue.
> >
> > diff --git a/src/backend/replication/logical/proto.c
> > b/src/backend/replication/logical/proto.c
> > index dcf7c08..471461c 100644
> > --- a/src/backend/replication/logical/proto.c
> > +++ b/src/backend/replication/logical/proto.c
> > @@ -145,7 +145,8 @@ logicalrep_write_insert(StringInfo out, Relation
> > rel, HeapTuple newtuple)
> >
> > Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT ||
> >rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
> > -  rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX);
> > +  rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX ||
> > +  rel->rd_rel->relreplident ==
> REPLICA_IDENTITY_NOTHING);
> >
> > /* use Oid as relation identifier */
> > pq_sendint32(out, RelationGetRelid(rel));
>
> There's not much point in having this assert, right? Given that it
> covers all choices? Seems better to just drop it.

Yeah right!
-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-01-09 Thread Noah Misch
On Thu, Jan 09, 2020 at 12:45:41AM -0500, Tom Lane wrote:
> Noah Misch  writes:
> > Even so, a web search for "extend_brk" led to the answer.  By default, 
> > 32-bit
> > AIX binaries get only 256M of RAM for stack and sbrk.  The new regression 
> > test
> > used more than that, hence this crash.
> 
> Hm, so
> 
> (1) Why did we get a crash and not some more-decipherable out-of-resources
> error?  Can we improve that experience?

By default, 32-bit AIX binaries have maxdata:0x00000000.  Specifying
maxdata:0x10000000 provides the same 256M of RAM, yet it magically changes the
SIGSEGV to ENOMEM:

$ OBJECT_MODE=32 gcc maxdata.c && ./a.out
Segmentation fault
$ OBJECT_MODE=32 gcc -Wl,-bmaxdata:0x00000000 maxdata.c && ./a.out
Segmentation fault
$ OBJECT_MODE=32 gcc -Wl,-bmaxdata:0x10000000 maxdata.c && ./a.out
done at 255 MiB: Not enough space

We could add -Wl,-bmaxdata:0x10000000 (or a higher value) to LDFLAGS when
building for 32-bit AIX.

> (2) Should we be dialing back the resource consumption of this test?
> Even on machines where it doesn't fail outright, I'd imagine that it's
> costing a lot of buildfarm cycles.  Is it actually worth that?

The test's resource usage, being quite low, should not be a factor in the
test's fate.  On my usual development machine, the entire
006_logical_decoding.pl file takes just 3s and ~250 MiB of RAM.




Re: table partitioning and access privileges

2020-01-09 Thread Fujii Masao
On Tue, Jan 7, 2020 at 5:15 PM Amit Langote  wrote:
>
> On Fri, Dec 27, 2019 at 4:26 AM Tom Lane  wrote:
> > Fujii Masao  writes:
> > > My customer reported to me that queries through a partitioned table
> > > ignore each partition's SELECT, INSERT, UPDATE, and DELETE privileges;
> > > on the other hand, only the TRUNCATE privilege specified for each partition
> > > is applied. I'm not sure if this behavior is expected or not. But anyway
> > > is it better to document that? For example,
> >
> > > Access privileges may be defined and removed separately for each 
> > > partition.
> > > But note that queries through a partitioned table ignore each 
> > > partition's
> > > SELECT, INSERT, UPDATE and DELETE privileges, and apply only TRUNCATE 
> > > one.
> >
> > I believe it's intentional that we only check access privileges on
> > the table explicitly named in the query.  So I'd say SELECT etc
> > are doing the right thing, and if TRUNCATE isn't in step with them
> > that's a bug to fix, not something to document.
>
> I tend to agree that TRUNCATE's permission model for inheritance
> should be consistent with that for the other commands.  How about the
> attached patch toward that end?

Thanks for the patch!

The patch basically looks good to me.

+GRANT SELECT (f1, fz), UPDATE (fz) ON atestc TO regress_priv_user2;
+REVOKE TRUNCATE ON atestc FROM regress_priv_user2;

These seem not to be necessary for the test.

BTW, I found that LOCK TABLE on the parent table checks the permissions
of its child tables. Does this also need to be fixed (as a separate patch)?
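
For reference, a minimal sketch of the behaviors being discussed (all
names are made up):

    CREATE ROLE some_user LOGIN;
    CREATE TABLE parent (a int) PARTITION BY RANGE (a);
    CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (100);
    GRANT SELECT, TRUNCATE ON parent TO some_user;  -- nothing granted on child
    -- as some_user:
    SELECT * FROM parent;  -- allowed: only the parent's privileges are checked
    TRUNCATE parent;       -- before the patch this fails, because TRUNCATE
                           -- also checked the child's privilege
    BEGIN;
    LOCK TABLE parent;     -- likewise checks the child's privileges, per above
    ROLLBACK;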

Regards,

-- 
Fujii Masao




Re: backup manifests

2020-01-09 Thread David Steele

Hi Robert,

On 1/7/20 6:33 PM, Stephen Frost wrote:

> These are issues that we've thought
> about and worried about over the years of pgbackrest and with that
> experience we've come down on the side that a JSON-based format would be
> an altogether better design.  That's why we're advocating for it, not
> because it requires more code or so that it delays the efforts here, but
> because we've been there, we've used other formats, we've dealt with
> user complaints when we do break things, this is all history for us
> that's helped us learn- for PG, it looks like the future with a static
> format, and I get that the future is hard to predict and pg_basebackup
> isn't pgbackrest and yeah, I could be completely wrong because I don't
> actually have a crystal ball, but this starting point sure looks really
> familiar.

For example, have you considered what will happen if you have a file in 
the cluster with a tab in the name?  This is perfectly valid in Posix 
filesystems, at least.  You may already be escaping tabs but the simple 
code snippet you provided earlier isn't going to work so well either 
way.  It gets complicated quickly.


I know users should not be creating weird files in PGDATA, but it's 
amazing how often this sort of thing pops up.  We currently have an open 
issue because = in file names breaks our file format.  Tab is surely 
less common but it's amazing what users will do.


Another fun one is 03849840 which fixes the handling of \ characters in 
the code which checksums the manifest.  The file is not fully JSON but 
the checksums are and that was initially missed in the C migration.  The 
bug never got released but it easily could have been.


In short, using a quick-and-dirty homegrown format seemed great at first 
but has caused many headaches.  Because we don't change the repo format 
across releases we are kind of stuck with past sins until we create a 
new repo format and write update/compatability code.  Users are 
understandably concerned if new versions of the software won't work with 
their repo, some of which contain years of backups (really).


This doesn't even get into the work everyone else will need to do to 
read a custom format.  I do appreciate your offer of contributing parser 
code to pgBackRest, but honestly I'd rather it were not necessary. 
Though of course I'd still love to see a contribution of some sort from you!


Hard experience tells me that using a standard format where all these 
issues have been worked out is the way to go.


There are a few MIT-licensed JSON projects that are implemented in a 
single file.  cJSON is very capable while JSMN is very minimal. Is is 
possible that one of those (or something like it) would be acceptable? 
It looks like the one requirement we have is that the JSON can be 
streamed rather than just building up one big blob?  Even with that 
requirement there are a few tricks that can be used.  JSON nests rather 
nicely after all so the individual file records can be transmitted 
independently of the overall file format.


Your first question may be why didn't pgBackRest use one of those 
parsers?  The answer is that JSON parsing/rendering is pretty trivial. 
Memory management and a (datum-like) type system are the hard parts and 
pgBackRest already had those.


Would it be acceptable to bring in JSON code with a compatible license 
to use in libcommon?  If so I'm willing to help adapt that code for use 
in Postgres.  It's possible that the pgBackRest code could be adapted 
similarly, but it might make more sense to start from one of these 
general purpose parsers.


Thoughts?

--
-David
da...@pgmasters.net




RE: [PATCH] Resolve Parallel Hash Join Performance Issue

2020-01-09 Thread Deng, Gang
Regarding the reason why setting the bit is not cheap anymore in a parallel join:
as I explained in my original mail, it is because of 'false sharing' cache 
coherence. In short, setting the bit makes the whole cache line (64 bytes) 
dirty, so all CPU cores that hold the cache line have to load it again, which 
wastes much CPU time. The article 
https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads
 explains this in more detail.

-Original Message-
From: Tom Lane  
Sent: Thursday, January 9, 2020 10:43 PM
To: Thomas Munro 
Cc: Deng, Gang ; pgsql-hack...@postgresql.org
Subject: Re: [PATCH] Resolve Parallel Hash Join Performance Issue

Thomas Munro  writes:
> Right, I see.  The funny thing is that the match bit is not even used 
> in this query (it's used for right and full hash join, and those 
> aren't supported for parallel joins yet).  Hmm.  So, instead of the 
> test you proposed, an alternative would be to use if (!parallel).
> That's a value that will be constant-folded, so that there will be no 
> branch in the generated code (see the pg_attribute_always_inline 
> trick).  If, in a future release, we need the match bit for parallel 
> hash join because we add parallel right/full hash join support, we 
> could do it the way you showed, but only if it's one of those join 
> types, using another constant parameter.

Can we base the test off the match type today, and avoid leaving something that 
will need to be fixed later?

I'm pretty sure that the existing coding is my fault, and that it's like that 
because I reasoned that setting the bit was too cheap to justify having a 
test-and-branch around it.  Apparently that's not true anymore in a parallel 
join, but I have to say that it's unclear why.  In any case, the reasoning 
probably still holds good in non-parallel cases, so it'd be a shame to 
introduce a run-time test if we can avoid it.

regards, tom lane




Re: pgbench - use pg logging capabilities

2020-01-09 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Jan-09, Fabien COELHO wrote:
>> -if (unlikely(__pg_log_level <= PG_LOG_DEBUG))
>> +if (pg_log_debug_level)
>>  {

> Umm ... I find the original exceedingly ugly, but the new line is
> totally impenetrable.

So, I had not been paying any attention to this thread, but that
snippet is already enough to set off alarm bells.

1. (problem with already-committed code, evidently)  The C standard is
quite clear that

 -- All  identifiers  that  begin  with  an  underscore and
either an uppercase letter or  another  underscore  are
always reserved for any use.

 -- All  identifiers  that  begin  with  an  underscore are
always reserved for use as identifiers with file  scope
in both the ordinary and tag name spaces.

"Reserved" in this context appears to mean "reserved for use by
system headers and/or compiler special behaviors".

Declaring our own global variables with double-underscore prefixes is not
just asking for trouble, it's waving a red flag in front of a bull.


2. (problem with proposed patch) I share Alvaro's allergy for replacing
uses of a common variable with a bunch of macros, especially macros that
don't look like macros.  That's not reducing the reader's cognitive
burden.  I'd even say it's actively misleading the reader, because what
the new code *looks* like it's doing is referencing several independent
global variables.  We don't need our code to qualify as an entry for
the Obfuscated C Contest.

The notational confusion could be solved perhaps by writing the macros
with function-like parentheses, but it still doesn't seem like an
improvement.  In particular, the whole point here is to have a common
idiom for logging, but I'm unconvinced that every frontend program
should be using unlikely() in this particular way.  Maybe it's unlikely
for pgbench's usage that verbose logging would be turned on, but why
should we build in an assumption that that's universally the case?

TBH, my recommendation would be to drop *all* of these likely()
and unlikely() calls.  What evidence have you got that those are
meaningfully improving the quality of the generated code?  And if
they're buried inside macros, they certainly aren't doing anything
useful in terms of documenting the code.

regards, tom lane




RE: [PATCH] Resolve Parallel Hash Join Performance Issue

2020-01-09 Thread Deng, Gang
Thank you for the comment. Yes, I agree with the alternative of using '(!parallel)', 
so that there is no need to test the bit. Will someone submit a patch for it 
accordingly?

-Original Message-
From: Thomas Munro  
Sent: Thursday, January 9, 2020 6:04 PM
To: Deng, Gang 
Cc: pgsql-hack...@postgresql.org
Subject: Re: [PATCH] Resolve Parallel Hash Join Performance Issue

On Thu, Jan 9, 2020 at 10:04 PM Deng, Gang  wrote:
> Attached is a patch to resolve parallel hash join performance issue. This is 
> my first time to contribute patch to PostgreSQL community, I referred one of 
> previous thread as template to report the issue and patch. Please let me know 
> if need more information of the problem and patch.

Thank you very much for investigating this and for your report.

> HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
>
> changed to:
>
> if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
> {
>     HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
> }
>
> Compared with the original code, the modified code avoids an unnecessary write 
> to memory/cache.

Right, I see.  The funny thing is that the match bit is not even used in this 
query (it's used for right and full hash join, and those aren't supported for 
parallel joins yet).  Hmm.  So, instead of the test you proposed, an 
alternative would be to use if (!parallel).
That's a value that will be constant-folded, so that there will be no branch in 
the generated code (see the pg_attribute_always_inline trick).  If, in a future 
release, we need the match bit for parallel hash join because we add parallel 
right/full hash join support, we could do it the way you showed, but only if 
it's one of those join types, using another constant parameter.
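For readers unfamiliar with the trick being referenced, here is a standalone sketch (illustrative names, not the actual executor code) of how a constant argument plus a forced-inline helper lets the compiler drop the dead branch entirely:

#include <stdbool.h>
#include <stdio.h>

#if defined(__GNUC__)
#define pg_attribute_always_inline __attribute__((always_inline)) inline
#else
#define pg_attribute_always_inline inline
#endif

/* "parallel" is a compile-time constant at every call site below */
static pg_attribute_always_inline int
scan_bucket(int tuple, bool parallel)
{
	if (!parallel)
		return tuple + 1;		/* serial path: the test is folded away */
	return tuple;				/* parallel path */
}

static int
scan_bucket_serial(int tuple)
{
	return scan_bucket(tuple, false);
}

static int
scan_bucket_parallel(int tuple)
{
	return scan_bucket(tuple, true);
}

int
main(void)
{
	printf("%d %d\n", scan_bucket_serial(41), scan_bucket_parallel(41));
	return 0;
}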

> D. Result
>
> With the modified code, performance of the hash join operation scales better 
> with the number of threads. Here is the result of query02 after the patch. For 
> example, performance improved ~2.5x when running with 28 threads.
>
> number of thread:    1      4      8      16     28
> time used(sec):      465.1  193.1  97.9   55.9   41

Wow.  That is a very nice improvement.


Re: pgbench - use pg logging capabilities

2020-01-09 Thread Michael Paquier
On Thu, Jan 09, 2020 at 09:27:42PM -0300, Alvaro Herrera wrote:
> On 2020-Jan-09, Fabien COELHO wrote:
>> -if (unlikely(__pg_log_level <= PG_LOG_DEBUG))
>> +if (pg_log_debug_level)
>>  {
> 
> Umm ... I find the original exceedingly ugly, but the new line is
> totally impenetrable.

Maybe just a pg_logging_get_level() for consistency with the
_set_level() one, and then compare the returned result with
PG_LOG_DEBUG in pgbench?
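A standalone sketch of what that accessor could look like (mock enum values and names, not the actual src/common/logging.c code):

#include <stdio.h>

/* mock of the enum in common/logging.h; values illustrative */
enum pg_log_level
{
	PG_LOG_NOTSET = 0,
	PG_LOG_DEBUG,
	PG_LOG_INFO,
	PG_LOG_WARNING,
	PG_LOG_ERROR,
	PG_LOG_FATAL,
	PG_LOG_OFF
};

static enum pg_log_level current_log_level = PG_LOG_INFO;

static void
pg_logging_set_level(enum pg_log_level new_level)
{
	current_log_level = new_level;
}

static enum pg_log_level
pg_logging_get_level(void)
{
	return current_log_level;
}

int
main(void)
{
	pg_logging_set_level(PG_LOG_DEBUG);

	/* pgbench would then test the level without touching the raw variable */
	if (pg_logging_get_level() <= PG_LOG_DEBUG)
		puts("debug tracing enabled");
	return 0;
}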
--
Michael




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-01-09 Thread Tom Lane
I wrote:
>   ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 
> chunks); 223295504 used

> The test case is only inserting 50K fairly-short rows, so this seems
> like an unreasonable amount of memory to be consuming for that; and
> even if you think it's reasonable, it clearly isn't going to scale
> to large production transactions.

> Now, the good news is that v11 and later get through
> 006_logical_decoding.pl just fine under the same restriction.
> So we did something in v11 to fix this excessive memory consumption.
> However, unless we're willing to back-port whatever that was, this
> test case is clearly consuming excessive resources for the v10 branch.

I dug around a little in the git history for backend/replication/logical/,
and while I find several commit messages mentioning memory leaks and
faulty spill logic, they all claim to have been back-patched as far
as 9.4.

It seems reasonably likely to me that this result is telling us about
an actual bug, ie, faulty back-patching of one or more of those fixes
into v10 and perhaps earlier branches.

I don't know this code well enough to take point on looking for the
problem, though.

regards, tom lane




Re: pgbench - use pg logging capabilities

2020-01-09 Thread Alvaro Herrera
On 2020-Jan-09, Fabien COELHO wrote:

> - if (unlikely(__pg_log_level <= PG_LOG_DEBUG))
> + if (pg_log_debug_level)
>   {

Umm ... I find the original exceedingly ugly, but the new line is
totally impenetrable.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2020-01-09 Thread Alvaro Herrera
I looked at this a little while and was bothered by the perl changes; it
seems out of place to have RecursiveCopy be thinking about tablespaces,
which is way out of its league.  So I rewrote that to use a callback:
the PostgresNode code passes a callback that's in charge of handling the
case of a symlink.  Things look much more in place with that.  I didn't
verify that all places that should use this are filled.

In 0002 I found adding a new function unnecessary: we can keep backwards
compat by checking 'ref' of the third argument.  With that we don't have
to add a new function.  (POD changes pending.)

I haven't reviewed 0003.

v8 of all these patches attached.

"git am" told me your 0001 was in unrecognized format.  It applied fine
with "patch".  I suggest that if you're going to submit a series with
commit messages and all, please use "git format-patch" with the same
"-v" argument (9 in this case) for all patches.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From a81e747f0bfa90af8021e2399e196e177a93f62c Mon Sep 17 00:00:00 2001
From: Asim R P 
Date: Fri, 20 Sep 2019 17:31:25 +0530
Subject: [PATCH v8 1/3] Support node initialization from backup with
 tablespaces

User defined tablespaces appear as symlinks in the backup.  This
commit tweaks recursive copy subroutine to allow for symlinks specific
to tablespaces.

Authored by Kyotaro
---
 src/test/perl/PostgresNode.pm  | 29 +++-
 src/test/perl/RecursiveCopy.pm | 40 --
 2 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 2e0cf4a2f3..3cae483ddb 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -593,6 +593,32 @@ sub backup_fs_cold
 	return;
 }
 
+sub _srcsymlink
+{
+	my ($srcpath, $destpath) = @_;
+
+	croak "Cannot operate on symlink \"$srcpath\""
+		if ($srcpath !~ qr{/(pg_tblspc/[0-9]+)$});
+
+	# We have mapped tablespaces. Copy them individually
+	my $tmpdir = TestLib::tempdir;
+	my $dstrealdir = TestLib::perl2host($tmpdir);
+	my $srcrealdir = readlink($srcpath);
+
+	opendir(my $dh, $srcrealdir);
+	while (readdir $dh)
+	{
+		next if (/^\.\.?$/);
+		my $spath = "$srcrealdir/$_";
+		my $dpath = "$dstrealdir/$_";
+		RecursiveCopy::copypath($spath, $dpath);
+	}
+	closedir $dh;
+
+	symlink $dstrealdir, $destpath;
+
+	return 1;
+}
 
 # Common sub of backup_fs_hot and backup_fs_cold
 sub _backup_fs
@@ -680,7 +706,8 @@ sub init_from_backup
 
 	my $data_path = $self->data_dir;
 	rmdir($data_path);
-	RecursiveCopy::copypath($backup_path, $data_path);
+	RecursiveCopy::copypath($backup_path, $data_path,
+			srcsymlinkfn => \&_srcsymlink);
 	chmod(0700, $data_path);
 
 	# Base configuration for this node
diff --git a/src/test/perl/RecursiveCopy.pm b/src/test/perl/RecursiveCopy.pm
index baf5d0ac63..715edcdedd 100644
--- a/src/test/perl/RecursiveCopy.pm
+++ b/src/test/perl/RecursiveCopy.pm
@@ -66,6 +66,7 @@ sub copypath
 {
 	my ($base_src_dir, $base_dest_dir, %params) = @_;
 	my $filterfn;
+	my $srcsymlinkfn;
 
 	if (defined $params{filterfn})
 	{
@@ -80,31 +81,55 @@ sub copypath
 		$filterfn = sub { return 1; };
 	}
 
+	if (defined $params{srcsymlinkfn})
+	{
+		croak "if specified, srcsymlinkfn must be a subroutine reference"
+			unless defined(ref $params{srcsymlinkfn})
+			and (ref $params{srcsymlinkfn} eq 'CODE');
+
+		$srcsymlinkfn = $params{srcsymlinkfn};
+	}
+	else
+	{
+		$srcsymlinkfn = undef;
+	}
+
 	# Complain if original path is bogus, because _copypath_recurse won't.
 	croak "\"$base_src_dir\" does not exist" if !-e $base_src_dir;
 
 	# Start recursive copy from current directory
-	return _copypath_recurse($base_src_dir, $base_dest_dir, "", $filterfn);
+	return _copypath_recurse($base_src_dir, $base_dest_dir, "", $filterfn, $srcsymlinkfn);
 }
 
 # Recursive private guts of copypath
 sub _copypath_recurse
 {
-	my ($base_src_dir, $base_dest_dir, $curr_path, $filterfn) = @_;
+	my ($base_src_dir, $base_dest_dir, $curr_path, $filterfn,
+		$srcsymlinkfn) = @_;
 	my $srcpath  = "$base_src_dir/$curr_path";
 	my $destpath = "$base_dest_dir/$curr_path";
 
 	# invoke the filter and skip all further operation if it returns false
 	return 1 unless &$filterfn($curr_path);
 
-	# Check for symlink -- needed only on source dir
-	# (note: this will fall through quietly if file is already gone)
-	croak "Cannot operate on symlink \"$srcpath\"" if -l $srcpath;
-
 	# Abort if destination path already exists.  Should we allow directories
 	# to exist already?
 	croak "Destination path \"$destpath\" already exists" if -e $destpath;
 
+	# Check for symlink -- needed only on source dir
+	# If caller provided us with a callback, call it; otherwise we're out.
+	if (-l $srcpath)
+	{
+		if (defined $srcsymlinkfn)
+		{
+			return &$srcsymlinkfn($srcpath, $destpath);
+		}
+		else
+		{
+			croak "Cannot operate on symlink 

Re: pgbench - use pg logging capabilities

2020-01-09 Thread Michael Paquier
On Thu, Jan 09, 2020 at 10:28:21AM +0100, Fabien COELHO wrote:
> Yep, I thought of it, but I was not very keen on having a malloc/free cycle
> just for one debug message. However under debug this is probably not an
> issue.

Consistency is more important here IMO, so applied.

> Your patch works for me. It can avoid some of the format interpretation
> overhead by switching to the Char/Str functions, see first attachment.

I kept both grouped to avoid any unnecessary churn with the
manipulation of PQExpBufferData.
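For reference, the "Char/Str functions" mentioned here are the non-formatting PQExpBuffer append variants; an illustrative fragment (not the applied hunk):

	/* goes through printf-style format parsing */
	appendPQExpBuffer(&buf, "%s", argv[i]);

	/* plain string / single-character appends, no format interpretation */
	appendPQExpBufferStr(&buf, argv[i]);
	appendPQExpBufferChar(&buf, ' ');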

> The other point is the test on __pg_log_level, see second attached.

May be better to discuss that on a separate thread as that's not only
related to pgbench.
--
Michael




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-01-09 Thread Tom Lane
Amit Kapila  writes:
> On Thu, Jan 9, 2020 at 11:15 AM Tom Lane  wrote:
>> Noah Misch  writes:
>>> Even so, a web search for "extend_brk" led to the answer.  By default, 
>>> 32-bit
>>> AIX binaries get only 256M of RAM for stack and sbrk.  The new regression 
>>> test
>>> used more than that, hence this crash.

>> Hm, so
>> (1) Why did we get a crash and not some more-decipherable out-of-resources
>> error?  Can we improve that experience?
>> (2) Should we be dialing back the resource consumption of this test?

> In HEAD, we have a guc variable 'logical_decoding_work_mem' by which
> we can control the memory usage of changes and we have used that, but
> for back branches, we don't have such a control.

I poked into this a bit more by running the src/test/recovery tests under
restrictive ulimit settings.  I used

ulimit -s 1024
ulimit -v 25

(At least on my 64-bit RHEL6 box, reducing ulimit -v much below this
causes initdb to fail, apparently because the post-bootstrap process
tries to load all our tsearch and encoding conversion shlibs at once,
and it hasn't got enough VM space to do so.  Someday we may have to
improve that.)

I did not manage to duplicate Noah's crash this way.  What I see in
the v10 branch is that the new 006_logical_decoding.pl test fails,
but with a clean "out of memory" error.  The memory map dump that
that produces fingers the culprit pretty unambiguously:

...
  ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 chunks); 
223295504 used
ReorderBufferByXid: 24576 total in 2 blocks; 11888 free (3 chunks); 
12688 used
Slab: TXN: 8192 total in 1 blocks; 5208 free (21 chunks); 2984 used
Slab: Change: 2170880 total in 265 blocks; 2800 free (35 chunks); 
2168080 used
...
Grand total: 226714720 bytes in 27327 blocks; 590888 free (785 chunks); 
226123832 used

The test case is only inserting 50K fairly-short rows, so this seems
like an unreasonable amount of memory to be consuming for that; and
even if you think it's reasonable, it clearly isn't going to scale
to large production transactions.

Now, the good news is that v11 and later get through
006_logical_decoding.pl just fine under the same restriction.
So we did something in v11 to fix this excessive memory consumption.
However, unless we're willing to back-port whatever that was, this
test case is clearly consuming excessive resources for the v10 branch.

We're not out of the woods either.  I also observe that v12 and HEAD
fall over, under these same test conditions, with a stack-overflow
error in the 012_subtransactions.pl test.  This seems to be due to
somebody's decision to use a heavily recursive function to generate a
bunch of subtransactions.  Is there a good reason for hs_subxids() to
use recursion instead of a loop?  If there is, what's the value of
using 201 levels rather than, say, 10?

Anyway it remains unclear why Noah's machine got a crash instead of
something more user-friendly.  But the reason why it's only in the
v10 branch seems non-mysterious.

regards, tom lane




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2020-01-09 Thread cary huang
On Mon, Jan 6, 2020 at 4:43 AM Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:

> On Sat, 4 Jan 2020 at 15:11, cary huang  wrote:
> >>
> >> Hello Sawada and all
> >>
> >> I would like to elaborate more on Sehrope and Sawada's discussion on
> passing NULL IV in "pg_cipher_encrypt/decrypt" functions during
> kmgr_wrap_key and kmgr_unwrap_key routines in kmgr_utils.c. Openssl
> implements key wrap according to RFC3394 as Sawada mentioned and passing
> NULL will make openssl to use default IV, which equals to A6A6A6A6A6A6A6A6.
> I have confirmed this on my side; A key wrapped with "NULL" IV can be
> unwrapped successfully with IV=A6A6A6A6A6A6A6A6, and unwrap will fail if IV
> is set to anything else other than NULL or A6A6A6A6A6A6A6A6.
> >>
>
> >Sehrope also suggested me not to use the fixed IV in order to avoid
> >getting the same result from the same value. I'm researching it now.
> >Also, currently it's using key wrap algorithm[1] but it accepts only
> >multiple of 8 bytes as input. Since it's not good for some cases it's
> >better to use key wrap with padding algorithm[2] instead, which seems
> >available in OpenSSL 1.1.0 or later.
>

Since the current kmgr only supports AES128 and AES256, the master key
generated by kmgr during bootstrap will always be a multiple of 8 bytes. I
believe AES keys in general must always be a multiple of 8 bytes. I have not
done enough research into which encryption algorithms would involve keys that
are not a multiple of 8, so I think the current key_wrap_algorithm is fine.
With the key wrap algorithm defined in RFC3394, the IV is used only as an
"initial value" and it has to be static; either we randomize one or we use
the default A6A6A6... by passing NULL. It is different from the CTR block
cipher mode, which has been selected to encrypt WAL and buffers: in CTR mode,
each block requires a different and unique IV as input in order to be secure,
and we have agreed to use segment IDs as IVs. For this reason, I think the
current key wrap implementation is fine. The least we can do is to generate
an IV during bootstrap and store it in the control file, and this generated
IV would then be used for all key wrapping / unwrapping purposes instead of
the default one.
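As a concrete illustration of the IV behaviour described above, a standalone sketch (assumes OpenSSL 1.0.2 or later; error handling omitted for brevity) of RFC 3394 key wrap through the EVP interface:

#include <openssl/evp.h>
#include <stdio.h>

int
main(void)
{
	unsigned char kek[32] = {0};	/* key-encryption key (demo value) */
	unsigned char key[32] = {1};	/* key material to be wrapped */
	unsigned char wrapped[40];
	int			len, total;

	EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

	/* the wrap ciphers refuse to run unless this flag is set */
	EVP_CIPHER_CTX_set_flags(ctx, EVP_CIPHER_CTX_FLAG_WRAP_ALLOW);

	/* NULL IV here means "use the RFC 3394 default A6A6A6A6A6A6A6A6" */
	EVP_EncryptInit_ex(ctx, EVP_aes_256_wrap(), NULL, kek, NULL);
	EVP_EncryptUpdate(ctx, wrapped, &len, key, sizeof(key));
	total = len;
	EVP_EncryptFinal_ex(ctx, wrapped + total, &len);
	total += len;

	/* unwrapping succeeds only with NULL or that same default IV */
	printf("wrapped %d bytes\n", total);

	EVP_CIPHER_CTX_free(ctx);
	return 0;
}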

Best regards
Cary Huang
HighGo Software Canada


Re: Setting min/max TLS protocol in clientside libpq

2020-01-09 Thread Daniel Gustafsson
Thanks for review everyone!  A v2 of the patch which I believe addresses all
concerns raised is attached.

> On 6 Jan 2020, at 07:01, Michael Paquier  wrote:
> 
> On Thu, Jan 02, 2020 at 09:46:44PM +, cary huang wrote:
>> I agree with Arthur that it makes sense to check the validity of
>> "conn->sslmaxprotocolversion" first before checking if it is larger
>> than "conn->sslminprotocolversion"
> 
> Here I don't agree.  Why not just let OpenSSL handle things with
> SSL_CTX_set_min_proto_version?  We don't bother about that in the
> backend code for that reason on top of keeping the code more simple
> with less error handling.  And things are cleaner when it comes to
> this libpq patch by giving up with the INT_MIN hack.

I looked into this and it turns out that OpenSSL does nothing to prevent the
caller from setting a nonsensical protocol range like min=tlsv1.3,max=tlsv1.1.
Thus, it's quite easy to screw up the backend server config in a way that still
lets it start properly, only to produce quite unrelated error messages on connection.

Since I think this needs to be dealt with for both backend and frontend (if
this is accepted), I removed it from this patch to return to it in a separate
thread.
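For illustration, the kind of range check being deferred to that separate thread could look like the standalone sketch below (assumes OpenSSL 1.1.0+ for TLS_method() and the set_min/max_proto_version macros; TLS1_3_VERSION needs 1.1.1); OpenSSL itself accepts min > max without complaint:

#include <openssl/ssl.h>
#include <stdio.h>

static int
set_proto_range(SSL_CTX *ctx, int min_ver, int max_ver)
{
	/* reject a nonsensical range before OpenSSL ever sees it */
	if (min_ver != 0 && max_ver != 0 && min_ver > max_ver)
	{
		fprintf(stderr, "invalid TLS protocol range: minimum is newer than maximum\n");
		return 0;
	}
	return SSL_CTX_set_min_proto_version(ctx, min_ver) &&
		SSL_CTX_set_max_proto_version(ctx, max_ver);
}

int
main(void)
{
	SSL_CTX    *ctx = SSL_CTX_new(TLS_method());

	/* e.g. min=TLSv1.3, max=TLSv1.1 should fail the check */
	if (!set_proto_range(ctx, TLS1_3_VERSION, TLS1_1_VERSION))
		fprintf(stderr, "range rejected\n");

	SSL_CTX_free(ctx);
	return 0;
}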

>> In the patch, if the client does not supply "sslminprotocolversion",
>> it will run to "else" statement and sets TLS min version to "INT_MIN",
>> which is a huge negative number and of course openssl won't set
>> it. I think this else statement can be enhanced a little to set
>> "sslminprotocolversion" to TLSv1.2 by default to match the server
>> and provide some warning message that TLS minimum has defaulted to
>> TLSv1.2. 
> 
> In this patch fe-secure-openssl.c has just done a copy-paste of
> SSL_CTX_set_min_proto_version and SSL_CTX_set_max_proto_version
> present in be-secure-openssl.c.  That's not good.  Could you refactor
> that please as a separate file?

Done.  I opted for a more generic header to make usage of the code easier, not
sure if that's ok.

One thing I noticed when looking at it is that we now have sha2_openssl.c and
openssl_protocol.c in src/common.  For easier visual grouping of OpenSSL
functionality, it makes sense to me to rename sha2_openssl.c to openssl_sha2.c,
but that might just be pointless churn.

> The patch should have tests in src/test/ssl/, like for invalid values,
> incorrect combinations leading to failures, etc.

Also done.

cheers ./daniel



libpq_minmaxproto_v2.patch
Description: Binary data




Re: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2020-01-09 Thread Andrew Dunstan
On Fri, Jan 10, 2020 at 8:32 AM Tom Lane  wrote:
>
> Andrew Dunstan  writes:
> > On Fri, Jan 10, 2020 at 1:21 AM Robert Haas  wrote:
> >> I share the concern about the security issue here. I can't testify to
> >> whether Christoph's whole analysis is here, but as a general point,
> >> non-superusers can't be allowed to do things that cause the server to
> >> access arbitrary local files.
>
> > It's probably fairly easy to do (c.f. 6136e94dcb). I'm not (yet)
> > convinced that there is any significant security threat here. This
> > doesn't give the user or indeed any postgres code any access to the
> > contents of these files. But if there is a consensus to restrict this
> > I'll do it.
>
> Well, even without access to the file contents, the mere ability to
> probe the existence of a file is something we don't want unprivileged
> users to have.  And (I suppose) this is enough for that, by looking
> at what error you get back from trying it.
>


OK, that's convincing enough. Will do it before long.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgbench - rework variable management

2020-01-09 Thread Fabien COELHO



Patch v4 is just a rebase.


Patch v5 is a rebase with some adjustments.

--
Fabien.

diff --git a/src/bin/pgbench/Makefile b/src/bin/pgbench/Makefile
index f402fe7b91..e1d3ef9bb3 100644
--- a/src/bin/pgbench/Makefile
+++ b/src/bin/pgbench/Makefile
@@ -10,6 +10,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
 	$(WIN32RES) \
 	exprparse.o \
+	symbol_table.o \
 	pgbench.o
 
 override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 85d61caa9f..ca21ee012e 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -211,7 +211,7 @@ make_variable(char *varname)
 	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
 
 	expr->etype = ENODE_VARIABLE;
-	expr->u.variable.varname = varname;
+	expr->u.variable.varnum = getOrCreateSymbolId(symbol_table, varname);
 	return expr;
 }
 
diff --git a/src/bin/pgbench/exprscan.l b/src/bin/pgbench/exprscan.l
index 430bff38a6..a3c5ea348d 100644
--- a/src/bin/pgbench/exprscan.l
+++ b/src/bin/pgbench/exprscan.l
@@ -207,19 +207,19 @@ notnull			[Nn][Oo][Tt][Nn][Uu][Ll][Ll]
 {digit}+		{
	if (!strtoint64(yytext, true, &yylval->ival))
 		expr_yyerror_more(yyscanner, "bigint constant overflow",
-		  strdup(yytext));
+		  pg_strdup(yytext));
 	return INTEGER_CONST;
 }
 {digit}+(\.{digit}*)?([eE][-+]?{digit}+)?	{
	if (!strtodouble(yytext, true, &yylval->dval))
 		expr_yyerror_more(yyscanner, "double constant overflow",
-		  strdup(yytext));
+		  pg_strdup(yytext));
 	return DOUBLE_CONST;
 }
 \.{digit}+([eE][-+]?{digit}+)?	{
	if (!strtodouble(yytext, true, &yylval->dval))
 		expr_yyerror_more(yyscanner, "double constant overflow",
-		  strdup(yytext));
+		  pg_strdup(yytext));
 	return DOUBLE_CONST;
 }
 {alpha}{alnum}*	{
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ee1134aea2..22ca43738a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -251,24 +251,32 @@ const char *progname;
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
 /*
- * Variable definitions.
+ * Variable contents.
+ *
+ * Variable names are managed in symbol_table with a number.
  *
  * If a variable only has a string value, "svalue" is that value, and value is
- * "not set".  If the value is known, "value" contains the value (in any
- * variant).
+ * not set.
+ *
+ * If the value is known, "value" contains the value (in any variant).
  *
  * In this case "svalue" contains the string equivalent of the value, if we've
- * had occasion to compute that, or NULL if we haven't.
+ * had occasion to compute that, or an empty string if we haven't.
+ *
 * The string length is arbitrarily limited to benefit from static allocation.
  */
+#define SVALUE_SIZE	128
 typedef struct
 {
-	char	   *name;			/* variable's name */
-	char	   *svalue;			/* its value in string form, if known */
-	PgBenchValue value;			/* actual variable's value */
+	int			number;/* variable symbol number, -1 if unset */
+	char			svalue[SVALUE_SIZE];	/* value in string form, if known */
+	PgBenchValue	value;	/* actual value, if known */
 } Variable;
 
+#define undefined_variable_value(v)		\
+	v.svalue[0] == '\0' && v.value.type == PGBT_NO_VALUE
+
 #define MAX_SCRIPTS		128		/* max number of SQL scripts allowed */
-#define SHELL_COMMAND_SIZE	256 /* maximum size allowed for shell command */
 
 /*
  * Simple data structure to keep stats about something.
@@ -414,7 +422,6 @@ typedef struct
 	/* client variables */
 	Variable   *variables;		/* array of variable definitions */
 	int			nvariables;		/* number of variables */
-	bool		vars_sorted;	/* are variables sorted by name? */
 
 	/* various times about current transaction */
 	int64		txn_scheduled;	/* scheduled start time of transaction (usec) */
@@ -427,6 +434,9 @@ typedef struct
 	/* per client collected stats */
 	int64		cnt;			/* client transaction count, for -t */
 	int			ecnt;			/* error count */
+
+	/* next buffer should be at thread level, but it would change functions... */
+	PQExpBufferData	buf;		/* persistent buffer to avoid malloc/free cycles */
 } CState;
 
 /*
@@ -512,6 +522,7 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
  * varprefix 	SQL commands terminated with \gset have this set
  *to a non NULL value.  If nonempty, it's used to prefix the
  *variable name that receives the value.
+ * set_varnum	variable symbol number for \set and \setshell.
  * expr			Parsed expression, if needed.
  * stats		Time spent in this command.
  */
@@ -524,6 +535,7 @@ typedef struct Command
 	int			argc;
 	char	   *argv[MAX_ARGS];
 	char	   *varprefix;
+	int			set_varnum;
 	PgBenchExpr *expr;
 	SimpleStats stats;
 } Command;
@@ -540,6 +552,9 @@ static ParsedScript sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int64 total_weight = 0;
 
+#define 

Re: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2020-01-09 Thread Tom Lane
Andrew Dunstan  writes:
> On Fri, Jan 10, 2020 at 1:21 AM Robert Haas  wrote:
>> I share the concern about the security issue here. I can't testify to
>> whether Christoph's whole analysis is here, but as a general point,
>> non-superusers can't be allowed to do things that cause the server to
>> access arbitrary local files.

> It's probably fairly easy to do (c.f. 6136e94dcb). I'm not (yet)
> convinced that there is any significant security threat here. This
> doesn't give the user or indeed any postgres code any access to the
> contents of these files. But if there is a consensus to restrict this
> I'll do it.

Well, even without access to the file contents, the mere ability to
probe the existence of a file is something we don't want unprivileged
users to have.  And (I suppose) this is enough for that, by looking
at what error you get back from trying it.

regards, tom lane




Re: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2020-01-09 Thread Daniel Gustafsson
> On 9 Jan 2020, at 22:38, Andrew Dunstan  
> wrote:

> I'm not (yet)
> convinced that there is any significant security threat here. This
> doesn't give the user or indeed any postgres code any access to the
> contents of these files. But if there is a consensus to restrict this
> I'll do it.

I've seen successful exploits made from parts that I in my wildest imagination
couldn't think would be useful, so FWIW +1 for adding belts to suspenders and
restricting this.

cheers ./daniel



Re: pgsql: Add basic TAP tests for psql's tab-completion logic.

2020-01-09 Thread Tom Lane
I wrote:
> Peter Eisentraut  writes:
>> I think he means something like
>>  make check with_readline=no
>> not for the actual build.

> Oh, I see.  I'd rather not codify that though, because it risks
> problems if that symbol ever gets used any other way.  I was
> thinking of making the test script check for some independent
> environment variable, say SKIP_READLINE_TESTS.

I thought of another problem with the with_readline=no method,
which is that it requires the user to be issuing "make check"
directly; it wouldn't be convenient for a buildfarm owner, say.
(*Perhaps* it'd work to set with_readline=no throughout a
buildfarm run, but I think that's asking for trouble with the
build part.)  I pushed a patch using SKIP_READLINE_TESTS.
Christoph should be able to set that for the Ubuntu branches
where the test is failing.

regards, tom lane




Re: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2020-01-09 Thread Andrew Dunstan
On Fri, Jan 10, 2020 at 1:21 AM Robert Haas  wrote:
>
> On Thu, Jan 9, 2020 at 5:30 AM Christoph Berg  wrote:
> > I have some concerns about security, though. It's true that the
> > sslcert/sslkey options can only be set/modified by superusers when
> > "password_required" is set. But when password_required is not set, any
> > user can create user mappings that reference arbitrary files on the
> > server filesystem. I believe the options are still used in that case
> > for creating connections, even when that means the remote server isn't
> > set up for cert auth, which needs password_required=false to succeed.
> >
> > In short, I believe these options need explicit superuser checks.
>
> I share the concern about the security issue here. I can't testify to
> whether Christoph's whole analysis is here, but as a general point,
> non-superusers can't be allowed to do things that cause the server to
> access arbitrary local files.


It's probably fairly easy to do (c.f. 6136e94dcb). I'm not (yet)
convinced that there is any significant security threat here. This
doesn't give the user or indeed any postgres code any access to the
contents of these files. But if there is a consensus to restrict this
I'll do it.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Coding in WalSndWaitForWal

2020-01-09 Thread Tom Lane
Alvaro Herrera  writes:
> However, we already have a large number of proc_exit() calls in switch
> blocks that are not followed by breaks.  In fact, the majority are
> already like that.

Oh, hmm ... consistency is good.

regards, tom lane




Re: our checks for read-only queries are not great

2020-01-09 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jan 9, 2020 at 3:07 PM Tom Lane  wrote:
>> You could argue about exactly how to extend that to non-spec
>> utility commands, but for the most part allowing them seems
>> to make sense if DML is allowed.

> But I think we allow them on all tables, not just temp tables, so I
> don't think I understand this argument.

Oh, I misunderstood your concern.

Peter might remember more clearly, but I have a feeling that we
concluded that the intent of the spec was for read-only-ness to
disallow globally-visible changes in the visible database contents.
VACUUM, for example, does not cause any visible change, so it
should be admissible.  REINDEX ditto.  (We ignore here the possibility
of such things causing, say, a change in the order in which rows are
returned, since that's beneath the spec's notice to begin with.)
ANALYZE ditto, except to the extent that if you look at pg_stats
you might see something different --- but again, system catalog
contents are outside the spec's purview.

You could extend this line of argument, perhaps, far enough to justify
ALTER SYSTEM SET as well.  But I don't like that because some GUCs have
visible effects on the results that an ordinary query minding its own
business can get.  Timezone is perhaps the poster child there, or
search_path.  If we were to subdivide the GUCs into "affects
implementation details only" vs "can affect query semantics",
I'd hold still for allowing ALTER SYSTEM SET on the former group.
Doubt it's worth the trouble to distinguish, though.

regards, tom lane




Re: our checks for read-only queries are not great

2020-01-09 Thread Robert Haas
On Thu, Jan 9, 2020 at 3:37 PM Robert Haas  wrote:
> > You could argue about exactly how to extend that to non-spec
> > utility commands, but for the most part allowing them seems
> > to make sense if DML is allowed.
>
> But I think we allow them on all tables, not just temp tables, so I
> don't think I understand this argument.

Oh, wait: I'm conflating two things. The current behavior extends the
spec behavior to COPY in a logical way.

But it also allows CLUSTER, REINDEX, and VACUUM on any table. The spec
presumably has no view on that, nor does the passage you quoted seem
to apply here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: our checks for read-only queries are not great

2020-01-09 Thread Robert Haas
On Thu, Jan 9, 2020 at 3:07 PM Tom Lane  wrote:
> Robert Haas  writes:
> > Maybe the SQL standard has something to say about this?
>
> [ pokes around ... ]  Yeah, it does, and I'd say it's pretty clearly
> in agreement with what Peter did, so far as DML ops go.  For instance,
> this bit from SQL99's description of DELETE:
>
>  1) If the access mode of the current SQL-transaction or the access
> mode of the branch of the current SQL-transaction at the current
> SQL-connection is read-only, and T is not a temporary table,
> then an exception condition is raised: invalid transaction state
> - read-only SQL-transaction.
>
> UPDATE and INSERT say the same.  (I didn't look at later spec versions,
> since Peter's 2003 commit was probably based on SQL99.)

OK. That's good to know.

> You could argue about exactly how to extend that to non-spec
> utility commands, but for the most part allowing them seems
> to make sense if DML is allowed.

But I think we allow them on all tables, not just temp tables, so I
don't think I understand this argument.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-09 Thread Robert Haas
On Thu, Jan 9, 2020 at 3:18 PM Tom Lane  wrote:
> * ISTM that that's assuming that the DBA and the sysadmin are the same
> person (or at least hold identical views on this subject).  In many
> installations it'd only be root who has control over what's in that
> directory, and I don't think it's unreasonable for the DBA to wish
> to be able to exercise additional filtering.

An emphatic +1 from me. This is what I've been trying to argue over
and over, apparently rather unclearly.

> * The point of a default role would be for the DBA to be able to
> control which database users can install extensions.  Even if the
> DBA has full authority over the extension library, that would not
> provide control over who can install, only over what is available
> for any of them to install.

I agree with that, too. I guess you could decide that the answer to
the question "who can install extensions?" must be the same as the
answer to the question "who owns a database?" but having the
flexibility to make the answers to those questions different seems
better than forcing them to always be the same.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR

2020-01-09 Thread Dent John
> On 9 Jan 2020, at 17:43, Daniel Verite  wrote:
> 
> […]
> (using plain text instead of HTML messages might help with that).

Thanks. I’ll do that next time.

> […]
> * unnest-refcursor-v3.patch needs a slight rebase because this chunk
> in the Makefile fails
> - regexp.o regproc.o ri_triggers.o rowtypes.o ruleutils.o \
> + refcursor.o regexp.o regproc.o ri_triggers.o rowtypes.o ruleutils.o \

Likewise I’ll make that rebase in the next version.

> * I'm under the impression that refcursor_unnest() is functionally
> equivalent to a plpgsql implementation like this:
> 
> […]
> 
> but it would differ in performance, because internally a materialization step
> could be avoided, but only when the other patch "Allow FunctionScans to
> pipeline results"  gets in?

Yes. That’s at least true if unnest(x) is used in the FROM. If it’s used in the 
SELECT, actually it can get the performance benefit right away. However, in the 
SELECT case, there’s a bit of a gotcha because anonymous records can’t easily 
be manipulated because they have no type information available. So to make a 
useful performance contribution, it does need to be combined with another 
change: either making FROM-clause function scans pipeline results, as in my other 
patch, or perhaps enabling anonymous record types to be cast or otherwise manipulated.

> […]
> /*
> - * UNNEST
> + * UNNEST (array)
>  */
> 
> This chunk looks unnecessary?

It was for the purpose of disambiguating. But indeed it is unnecessary. Choosing a 
different name would avoid the need for it.

> * some user-facing doc would be needed.

Indeed. I fully intend that. I figured I’d get the concept on a firmer footing 
first.

> * Is it good to overload "unnest" rather than choosing a specific
> function name?

Yeah. I wondered about that. A couple of syntactically obvious ideas were:

SELECT … FROM TABLE (x) (which is what I think Oracle does, but is reserved)

SELECT … FROM CURSOR (x) (which seems likely to confuse, but, surprisingly, 
isn’t actually reserved)

SELECT … FROM FETCH (x) (which I quite like, but is reserved)

SELECT … FROM ROWS_FROM (x) (is okay, but conflicts with our ROWS FROM 
construct)

So I kind of landed on UNNEST(x) for lack of a better idea. EXPAND(x) could be 
an option. Actually ROWS_IN(x) or ROWS_OF(x) might work.

Do you have any preference or suggestion?

Thanks a lot for the feedback.

denty.



Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-09 Thread Tom Lane
Stephen Frost  writes:
> I am not particularly concerned about that backwards compatibility issue
> and I don’t intend to base my argument on it, but I would use that case to
> point out that we have long had the ability to install trusted C functions
> into the backend as a DB owner, without complaint from either users or
> security pedants,

Right, which is why my patch proposes generalizing that feature for
trusted PLs into a general feature for other extensions.  I'd be
much leerier of that if we'd had any pushback on it for trusted PLs.

> ... and that backs up my position that we are setting up this
> privilege at the wrong level by using a default role which a superuser must
> grant independently from DB ownership.

Don't see how this follows.  It's somewhat accidental I think that
the existing behavior is tied to DB ownership.  That's just because
at the time, that's the only sort of privilege we had that seemed
intermediate between superuser and Joe User.  If we were designing
the behavior today, with default roles already a done deal for
handing out possibly-dangerous privileges, I think there's no
question that we'd be setting up this privilege as a default role
rather than tying it to DB ownership.  We don't make DB ownership
a prerequisite to creating other sorts of functions, yet other
functions can be just as dangerous in some cases as C functions.

regards, tom lane




Re: Coding in WalSndWaitForWal

2020-01-09 Thread Alvaro Herrera
On 2020-Jan-09, Tom Lane wrote:

> Alvaro Herrera  writes:
> > In modern times, we define pg_attribute_noreturn() like this:
> 
> > /* GCC, Sunpro and XLC support aligned, packed and noreturn */
> > #if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
> > #define pg_attribute_noreturn() __attribute__((noreturn))
> > #define HAVE_PG_ATTRIBUTE_NORETURN 1
> > #else
> > #define pg_attribute_noreturn()
> > #endif
> 
> > I suppose this will cause warnings in compilers other than those, but
> > I'm not sure if we care.  What about MSVC for example?
> 
> Yeah, the lack of coverage for MSVC seems like the main reason not
> to assume this works "everywhere of interest".

That would be easy to add as __declspec(noreturn) ... except that in MSVC
the decoration goes *before* the prototype rather than after it, so this
seems difficult to achieve without invasive surgery.
https://docs.microsoft.com/en-us/cpp/cpp/noreturn?view=vs-2015
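For illustration, the placement difference amounts to the following (declaration sketch only, not a proposed patch):

#if defined(_MSC_VER)
/* MSVC: the decoration has to come before the declaration */
__declspec(noreturn) extern void proc_exit(int code);
#else
/* GCC and friends: the attribute trails the declaration */
extern void proc_exit(int code) __attribute__((noreturn));
#endif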

> > With the attached patch, everything compiles cleanly in my setup, no
> > warnings, but then it's GCC.
> 
> Meh ... I'm not really convinced that any of those changes are
> improvements.  Particularly not the removals of switch-case breaks.

However, we already have a large number of proc_exit() calls in switch
blocks that are not followed by breaks.  In fact, the majority are
already like that.

I can easily leave this well enough alone, though.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-09 Thread Stephen Frost
Greetings,

On Thu, Jan 9, 2020 at 14:48 Tom Lane  wrote:

> Robert Haas  writes:
> > So, if I understand correctly, the patch you are proposing to commit
> > has a new system role, and if you've got that system role, then you
> > can install extensions.
>
> Install *trusted* extensions, correct.  The patch as it stands also
> allows DB owners to install trusted extensions.
>
> > I thought that part of the earlier debate was
> > whether DB owners should also be able to install trusted extensions
> > even without that role, and I thought it would be cleaner if the
> > answer was "no," because then the superuser could decide whether to
> > grant that role or not in particular cases. But I'm not clear whether
> > you agreed with that, what Stephen thought about it, or whether that's
> > still what you are proposing to commit.
>
> I agree that if we dropped the proviso about DB owners, it would be
> a cleaner design.  I included that only for backwards compatibility
> with the existing behavior that DB owners can install trusted PLs.
> If we can agree that we're willing to lose that behavior, I'd be
> perfectly fine with removing the special case for DB owners.
> However, I'm unsure whether that compatibility cost is acceptable.
> It's definitely likely that it would cause an upgrade headache
> for some installations.
>
> One idea for working around the upgrade problem would be to teach
> pg_dumpall to automatically issue "GRANT pg_install_trusted_extension"
> to each DB-owner role, when dumping from a pre-v13 database.  There's
> room to object to that, because it would end with more privilege than
> before (that is, an owner of some DB could now install extensions
> even in DBs she doesn't own, as long as she can connect to them).
> So maybe it's a bad idea.  But it would probably reduce the number
> of complaints --- and I think a lot of installations would end up
> making such grants anyway, because otherwise their DB owners can't
> do things they expect to be able to do.
>
> I should not put words into Stephen's mouth, but perhaps his
> concern about having some DB-level privilege here is to alleviate
> the problem that there's no exact equivalent to the old level of
> privilege that DB ownership afforded, ie you can install in your
> own DB but not others.  It's not clear to me whether that behavior
> is critical to preserve.


I am not particularly concerned about that backwards compatibility issue
and I don’t intend to base my argument on it, but I would use that case to
point out that we have long had the ability to install trusted C functions
into the backend as a DB owner, without complaint from either users or
security pedants, and that backs up my position that we are setting up this
privilege at the wrong level by using a default role which a superuser must
grant independently from DB ownership.

Thanks,

Stephen



Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-09 Thread Tom Lane
Stephen Frost  writes:
> So I'm at a loss for why there is this insistence on a default role and
> a superuser-explicit-granting based approach that goes beyond "is it
> installed on the filesystem?" and "is it marked as trusted?".

Okay, so it seems like we're down to just this one point of contention.
You feel that the superuser can control what is in the extension library
directory and that that ought to be sufficient control.  I disagree
with that, for two reasons:

* ISTM that that's assuming that the DBA and the sysadmin are the same
person (or at least hold identical views on this subject).  In many
installations it'd only be root who has control over what's in that
directory, and I don't think it's unreasonable for the DBA to wish
to be able to exercise additional filtering.

* The point of a default role would be for the DBA to be able to
control which database users can install extensions.  Even if the
DBA has full authority over the extension library, that would not
provide control over who can install, only over what is available
for any of them to install.

regards, tom lane




Re: our checks for read-only queries are not great

2020-01-09 Thread Tom Lane
Robert Haas  writes:
> Maybe the SQL standard has something to say about this?

[ pokes around ... ]  Yeah, it does, and I'd say it's pretty clearly
in agreement with what Peter did, so far as DML ops go.  For instance,
this bit from SQL99's description of DELETE:

 1) If the access mode of the current SQL-transaction or the access
mode of the branch of the current SQL-transaction at the current
SQL-connection is read-only, and T is not a temporary table,
then an exception condition is raised: invalid transaction state
- read-only SQL-transaction.

UPDATE and INSERT say the same.  (I didn't look at later spec versions,
since Peter's 2003 commit was probably based on SQL99.)

You could argue about exactly how to extend that to non-spec
utility commands, but for the most part allowing them seems
to make sense if DML is allowed.

regards, tom lane




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-09 Thread Robert Haas
On Thu, Jan 9, 2020 at 2:48 PM Tom Lane  wrote:
> I agree that if we dropped the proviso about DB owners, it would be
> a cleaner design.  I included that only for backwards compatibility
> with the existing behavior that DB owners can install trusted PLs.
> If we can agree that we're willing to lose that behavior, I'd be
> perfectly fine with removing the special case for DB owners.
> However, I'm unsure whether that compatibility cost is acceptable.
> It's definitely likely that it would cause an upgrade headache
> for some installations.

I was assuming that installing extensions was fairly infrequent and
that it probably gets done mostly by superusers anyway, so probably
most people won't care if, after upgrading, they needed an extra GRANT
to get things working again. That might be wrong, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-09 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Jan 9, 2020 at 1:35 PM Tom Lane  wrote:
> > Robert Haas  writes:
> > > Again, as I said upthread, Tom had the exact feature about which I am
> > > talking in the first version of the patch. That is a strong argument
> > > in favor of it being practical. It's also a pretty good argument that
> > > it is at least potentially useful, because Tom doesn't usually do
> > > useless things for no reason.
> >
> > To try to clarify that a bit: I think there is certainly some value
> > in allowing superusers to control which extensions could be installed
> > by non-superusers, further restricting what we may think is trustworthy.
> 
> Cool.

I'm arguing for the position that superusers/admins have the ability to
control which extensions exist on the filesystem, and that plus the
'trusted' marking is sufficient flexibility.

> > However, I felt at the time that my GUC-based implementation of that
> > was ugly, and then Peter raised some concrete points against it,
> > so I took it out.  I don't want to put it back in the same form.
> > I think we could leave designing a replacement for later, because it's
> > pretty optional, especially if we aren't aggressive about promoting
> > contrib modules to "trusted" status.
> 
> Agreed.

Also agreed- which is why I figured we weren't really discussing that
any more.

> > I don't agree that the lack of
> > such a feature is a reason not to commit what I've got.
> 
> I said the same in
> http://postgr.es/m/ca+tgmoygwgs_rnmoooczzcgrzfqtfngshaq2gu7lm5skxrf...@mail.gmail.com
> - penultimate paragraph, last sentence.

I also agree that we don't need the "who can install what extension"
flexibility that the original GUC-based approach contemplated, but
that's because I don't think we are likely to ever need it.  If we do
and someone comes up with a good design for it, that'd be fine too.

> > In any case, AFAICT most of the heat-vs-light in this thread has not
> > been about which extensions are trustworthy, but about which users
> > should be allowed to install extensions, which seems like a totally
> > independent discussion.
> 
> I agree it's independent. It wasn't really the main point of what *I*
> was trying to talk about, but the heat-vs-light problem seems to have
> totally obscured what I *was* trying to talk about.

I'm entirely confused about what you were trying to talk about then.

Most of the back-and-forth, as I saw it anyway, was points being raised
to say "we can't let the right of installing extensions be allowed to DB
owners", which I don't agree with and which I've yet to see an actual
justification for beyond "well, we think it should require some explicit
superuser privilege-granting, beyond the granting that the superuser
does when they create a database owned by a given user."

> > And controlling that is also a feature that
> > we don't have today, so I'd rather get a minimal feature committed
> > for v13 and then later consider whether we need more functionality.
> >
> > The idea of a DB-level INSTALL privilege addresses the second
> > point not the first, unless I'm totally misunderstanding it.  As
> > I said before, I'm not terribly comfortable with handing control
> > of that over to non-superuser DB owners, and I sure don't see why
> > doing so should be a required part of the minimal feature.
> 
> So, if I understand correctly, the patch you are proposing to commit
> has a new system role, and if you've got that system role, then you
> can install extensions. I thought that part of the earlier debate was
> whether DB owners should also be able to install trusted extensions
> even without that role, and I thought it would be cleaner if the
> answer was "no," because then the superuser could decide whether to
> grant that role or not in particular cases. But I'm not clear whether
> you agreed with that, what Stephen thought about it, or whether that's
> still what you are proposing to commit.

I do *not* agree with having a default role for this, at all.  This
looks just like the right to CREATE tables or functions inside a schema,
except with a DB-level object (an extension) instead of a schema-level
object, and that is the purview of the DB owner.

The arguments raised about $SCARYEXTENSION and security concerns make a
lot of sense; I agree that those are things we should discuss and make
sure that allowing a DB owner this privilege won't pave a way for them
to get superuser access, but, imv anyway, we discussed those and didn't
actually come up with any cases where it'd be an issue, in part thanks
to Tom's design where the objects end up owned by the bootstrap
superuser except in specific cases.

So I'm at a loss for why there is this insistence on a default role and
a superuser-explicit-granting based approach that goes beyond "is it
installed on the filesystem?" and "is it marked as trusted?".

Thanks,

Stephen




Re: our checks for read-only queries are not great

2020-01-09 Thread Robert Haas
On Thu, Jan 9, 2020 at 2:24 PM Tom Lane  wrote:
> Robert Haas  writes:
> > I'd be really interested to hear if anyone knows the history behind
> > allowing CLUSTER, REINDEX, VACUUM, and some operations on temp tables.
> > It seems to have been that way for a long time. I wonder if it was a
> > deliberate choice or something that just happened semi-accidentally.
>
> Within a "read-only" xact you mean?  I believe that allowing DML writes
> was intentional.  As for the utility commands, I suspect that it was in
> part accidental (error of omission?), and then if anyone thought hard
> about it they decided that allowing DML writes to temp tables justifies
> those operations too.
>
> Have you tried excavating in our git history to see when the relevant
> permission tests originated?

check_xact_readonly() with a long list of command tags originated in
the same commit that added read-only transactions. CLUSTER, REINDEX,
and VACUUM weren't included in the list of prohibited operations then,
either, but it's unclear whether that was a deliberate omission or an
oversight. That commit also thought that COPY FROM - and queries -
should allow temp tables. But there's nothing in the commit that seems
to explain why, unless the commit message itself is a hint:

commit b65cd562402ed9d3206d501cc74dc38bc421b2ce
Author: Peter Eisentraut 
Date:   Fri Jan 10 22:03:30 2003 +

Read-only transactions, as defined in SQL.

Maybe the SQL standard has something to say about this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Coding in WalSndWaitForWal

2020-01-09 Thread Tom Lane
Alvaro Herrera  writes:
> In modern times, we define pg_attribute_noreturn() like this:

> /* GCC, Sunpro and XLC support aligned, packed and noreturn */
> #if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
> #define pg_attribute_noreturn() __attribute__((noreturn))
> #define HAVE_PG_ATTRIBUTE_NORETURN 1
> #else
> #define pg_attribute_noreturn()
> #endif

> I suppose this will cause warnings in compilers other than those, but
> I'm not sure if we care.  What about MSVC for example?

Yeah, the lack of coverage for MSVC seems like the main reason not
to assume this works "everywhere of interest".

> With the attached patch, everything compiles cleanly in my setup, no
> warnings, but then it's GCC.

Meh ... I'm not really convinced that any of those changes are
improvements.  Particularly not the removals of switch-case breaks.

regards, tom lane




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-09 Thread Tom Lane
Robert Haas  writes:
> So, if I understand correctly, the patch you are proposing to commit
> has a new system role, and if you've got that system role, then you
> can install extensions.

Install *trusted* extensions, correct.  The patch as it stands also
allows DB owners to install trusted extensions.

> I thought that part of the earlier debate was
> whether DB owners should also be able to install trusted extensions
> even without that role, and I thought it would be cleaner if the
> answer was "no," because then the superuser could decide whether to
> grant that role or not in particular cases. But I'm not clear whether
> you agreed with that, what Stephen thought about it, or whether that's
> still what you are proposing to commit.

I agree that if we dropped the proviso about DB owners, it would be
a cleaner design.  I included that only for backwards compatibility
with the existing behavior that DB owners can install trusted PLs.
If we can agree that we're willing to lose that behavior, I'd be
perfectly fine with removing the special case for DB owners.
However, I'm unsure whether that compatibility cost is acceptable.
It's definitely likely that it would cause an upgrade headache
for some installations.

One idea for working around the upgrade problem would be to teach
pg_dumpall to automatically issue "GRANT pg_install_trusted_extension"
to each DB-owner role, when dumping from a pre-v13 database.  There's
room to object to that, because it would end with more privilege than
before (that is, an owner of some DB could now install extensions
even in DBs she doesn't own, as long as she can connect to them).
So maybe it's a bad idea.  But it would probably reduce the number
of complaints --- and I think a lot of installations would end up
making such grants anyway, because otherwise their DB owners can't
do things they expect to be able to do.

I should not put words into Stephen's mouth, but perhaps his
concern about having some DB-level privilege here is to alleviate
the problem that there's no exact equivalent to the old level of
privilege that DB ownership afforded, ie you can install in your
own DB but not others.  It's not clear to me whether that behavior
is critical to preserve.

regards, tom lane




Re: Coding in WalSndWaitForWal

2020-01-09 Thread Alvaro Herrera
On 2019-Nov-12, Andres Freund wrote:

> We still have the curious
>   proc_exit(0);
>   abort();/* keep the compiler 
> quiet */
> 
> pattern in WalSndShutdown() - wouldn't the right approach to instead
> proc_exit() with pg_attribute_noreturn()?

proc_exit() is already marked noreturn ... and has been marked as such
since commit eeece9e60984 (2012), which is the same commit that added abort()
after some proc_exit calls as well as other routines that were already
marked noreturn, such as WalSenderMain().  However, back then we were
using the GCC-specific notation of __attribute__((noreturn)), so perhaps
the reason we kept the abort() (and a few breaks, etc) after proc_exit()
was to satisfy compilers other than GCC.

In modern times, we define pg_attribute_noreturn() like this:

/* GCC, Sunpro and XLC support aligned, packed and noreturn */
#if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
#define pg_attribute_noreturn() __attribute__((noreturn))
#define HAVE_PG_ATTRIBUTE_NORETURN 1
#else
#define pg_attribute_noreturn()
#endif

I suppose this will cause warnings in compilers other than those, but
I'm not sure if we care.  What about MSVC for example?

With the attached patch, everything compiles cleanly in my setup, no
warnings, but then it's GCC.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index bfc629c753..e202ade4b9 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -304,7 +304,6 @@ AuxiliaryProcessMain(int argc, char *argv[])
 write_stderr("Try \"%s --help\" for more information.\n",
 			 progname);
 proc_exit(1);
-break;
 		}
 	}
 
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index c2250d7d4e..d26868e578 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -89,8 +89,8 @@ StartupProcShutdownHandler(SIGNAL_ARGS)
 
 	if (in_restore_command)
 		proc_exit(1);
-	else
-		shutdown_requested = true;
+
+	shutdown_requested = true;
 	WakeupRecovery();
 
 	errno = save_errno;
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 77360f1524..45b552cdf2 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -208,7 +208,6 @@ WalReceiverMain(void)
 		case WALRCV_STOPPED:
 			SpinLockRelease(&walrcv->mutex);
 			proc_exit(1);
-			break;
 
 		case WALRCV_STARTING:
 			/* The usual case */
@@ -616,8 +615,8 @@ WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)
 		SpinLockRelease(&walrcv->mutex);
 		if (state == WALRCV_STOPPING)
 			proc_exit(0);
-		else
-			elog(FATAL, "unexpected walreceiver state");
+
+		elog(FATAL, "unexpected walreceiver state");
 	}
 	walrcv->walRcvState = WALRCV_WAITING;
 	walrcv->receiveStart = InvalidXLogRecPtr;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9c063749b6..99f897cb32 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -337,7 +337,6 @@ WalSndShutdown(void)
 		whereToSendOutput = DestNone;
 
 	proc_exit(0);
-	abort();	/* keep the compiler quiet */
 }
 
 /*


Re: our checks for read-only queries are not great

2020-01-09 Thread Tom Lane
Robert Haas  writes:
> I'd be really interested to hear if anyone knows the history behind
> allowing CLUSTER, REINDEX, VACUUM, and some operations on temp tables.
> It seems to have been that way for a long time. I wonder if it was a
> deliberate choice or something that just happened semi-accidentally.

Within a "read-only" xact you mean?  I believe that allowing DML writes
was intentional.  As for the utility commands, I suspect that it was in
part accidental (error of omission?), and then if anyone thought hard
about it they decided that allowing DML writes to temp tables justifies
those operations too.

Have you tried excavating in our git history to see when the relevant
permission tests originated?

regards, tom lane




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-09 Thread Robert Haas
On Thu, Jan 9, 2020 at 1:35 PM Tom Lane  wrote:
> Robert Haas  writes:
> > Again, as I said upthread, Tom had the exact feature about which I am
> > talking in the first version of the patch. That is a strong argument
> > in favor of it being practical. It's also a pretty good argument that
> > it is at least potentially useful, because Tom doesn't usually do
> > useless things for no reason.
>
> To try to clarify that a bit: I think there is certainly some value
> in allowing superusers to control which extensions could be installed
> by non-superusers, further restricting what we may think is trustworthy.

Cool.

> However, I felt at the time that my GUC-based implementation of that
> was ugly, and then Peter raised some concrete points against it,
> so I took it out.  I don't want to put it back in the same form.
> I think we could leave designing a replacement for later, because it's
> pretty optional, especially if we aren't aggressive about promoting
> contrib modules to "trusted" status.

Agreed.

> I don't agree that the lack of
> such a feature is a reason not to commit what I've got.

I said the same in
http://postgr.es/m/ca+tgmoygwgs_rnmoooczzcgrzfqtfngshaq2gu7lm5skxrf...@mail.gmail.com
- penultimate paragraph, last sentence.

> In any case, AFAICT most of the heat-vs-light in this thread has not
> been about which extensions are trustworthy, but about which users
> should be allowed to install extensions, which seems like a totally
> independent discussion.

I agree it's independent. It wasn't really the main point of what *I*
was trying to talk about, but the heat-vs-light problem seems to have
totally obscured what I *was* trying to talk about.

> And controlling that is also a feature that
> we don't have today, so I'd rather get a minimal feature committed
> for v13 and then later consider whether we need more functionality.
>
> The idea of a DB-level INSTALL privilege addresses the second
> point not the first, unless I'm totally misunderstanding it.  As
> I said before, I'm not terribly comfortable with handing control
> of that over to non-superuser DB owners, and I sure don't see why
> doing so should be a required part of the minimal feature.

So, if I understand correctly, the patch you are proposing to commit
has a new system role, and if you've got that system role, then you
can install extensions. I thought that part of the earlier debate was
whether DB owners should also be able to install trusted extensions
even without that role, and I thought it would be cleaner if the
answer was "no," because then the superuser could decide whether to
grant that role or not in particular cases. But I'm not clear whether
you agreed with that, what Stephen thought about it, or whether that's
still what you are proposing to commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: our checks for read-only queries are not great

2020-01-09 Thread Robert Haas
On Wed, Jan 8, 2020 at 5:57 PM Stephen Frost  wrote:
> Yeah, I don't like the WEAKLY_READ_ONLY idea either- explicitly having
> COMMAND_OK_IN_X seems cleaner.

Done that way. v2 attached.

> Would we be able to make a rule of "can't change on-disk stuff, except
> for things in temporary schemas" and have it stick without a lot of
> complaints?  Seems like that would address Tom's ALTER SYSTEM SET
> concern, and would mean CLUSTER/REINDEX/VACUUM are disallowed in a
> backwards-incompatible way (though I think I'm fine with that..), and
> SET would still be allowed (which strikes me as correct too).  I'm not
> quite sure how I feel about LISTEN, but that it could possibly actually
> be used post-promotion and doesn't change on-disk stuff makes me feel
> like it actually probably should be allowed.

I think we can make any rule we like, but I think we should have some
measure of broad agreement on it. I'd like to go ahead with this for
now and then further changes can continue to be discussed and debated.
Hopefully we'll get a few more people to weigh in, too, because
deciding something like this based on what a handful of people think
doesn't seem like a good idea to me.

I'd be really interested to hear if anyone knows the history behind
allowing CLUSTER, REINDEX, VACUUM, and some operations on temp tables.
It seems to have been that way for a long time. I wonder if it was a
deliberate choice or something that just happened semi-accidentally.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v2-0001-Fix-problems-with-read-only-query-checks-and-refa.patch
Description: Binary data


Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-09 Thread Tom Lane
Robert Haas  writes:
> Again, as I said upthread, Tom had the exact feature about which I am
> talking in the first version of the patch. That is a strong argument
> in favor of it being practical. It's also a pretty good argument that
> it is at least potentially useful, because Tom doesn't usually do
> useless things for no reason.

To try to clarify that a bit: I think there is certainly some value
in allowing superusers to control which extensions could be installed
by non-superusers, further restricting what we may think is trustworthy.

However, I felt at the time that my GUC-based implementation of that
was ugly, and then Peter raised some concrete points against it,
so I took it out.  I don't want to put it back in the same form.
I think we could leave designing a replacement for later, because it's
pretty optional, especially if we aren't aggressive about promoting
contrib modules to "trusted" status.  I don't agree that the lack of
such a feature is a reason not to commit what I've got.

In any case, AFAICT most of the heat-vs-light in this thread has not
been about which extensions are trustworthy, but about which users
should be allowed to install extensions, which seems like a totally
independent discussion.  And controlling that is also a feature that
we don't have today, so I'd rather get a minimal feature committed
for v13 and then later consider whether we need more functionality.

The idea of a DB-level INSTALL privilege addresses the second
point not the first, unless I'm totally misunderstanding it.  As
I said before, I'm not terribly comfortable with handing control
of that over to non-superuser DB owners, and I sure don't see why
doing so should be a required part of the minimal feature.

regards, tom lane




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-09 Thread Robert Haas
On Thu, Jan 9, 2020 at 11:30 AM Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Thu, Jan 9, 2020 at 10:09 AM Stephen Frost  wrote:
> > > [ wall of text ]
>
> This really isn't helpful.

Sorry.

That being said, I'm pretty tired of writing emails that say the same
thing over and over again and having you write long responses that
don't seem to actually respond to the points being raised in the
email.

Like:

> > I don't see anything in here I really disagree with, but nor do I
> > understand why any of it means that giving superusers the ability to
> > customize which extensions are database-owner-installable would be a
> > bad thing.
>
> Alright, I think there's definitely something we need to sort through.
>
> If you agree that the approach I advocated means less code for hosting
> providers to have to change in their fork, and that they don't want to
> give users superuser, and that they want non-superusers to be able to
> install extensions, and that they really don't want to modify things
> post-install, then I don't get why you're against the DB-level privilege
> that I've been advocating for except that it's "not as customizable."

What I was writing about in the quoted paragraph and what you are
writing about in the response are two different things. I said
*nothing* about a DB-level privilege in the paragraph I wrote, and yet
somehow your response to that paragraph says that I'm opposing a
DB-level privilege.

> Are you saying that in order to have something here that we must make it
> so that a superuser is able to specifiy, individually, which extensions
> can be installed by which users?  You keep coming back to this point of
> saying that you want this to be 'customizable' but I really don't see
> any justification for the level of customization you're asking for- but
> I see an awful lot of work involved.  When there's a lot of work
> involved for a use-case that no one is actually asking for, I'm really
> skeptical.  The use-cases that you've presented, at least thus far,
> certainly haven't swayed me into thinking that you're right that there's
> a justifiable use-case here for this level of complicated privileges.

I set forth my exact views in
http://postgr.es/m/CA+TgmoZK=5EC2O13J3sfOUCvYtvjGtxUKg=wq11q-wy4sc4...@mail.gmail.com

Everything since then has been trying to somehow clarify what I wrote
in that email, which has resulted in me repeating everything I said
there several times in different words. I would like to stop doing
that now. It appears to be clarifying nothing, failing to advance the
patch, and irritating you.

> I'm also not convinced that such a design would even be practical- ...

Again, as I said upthread, Tom had the exact feature about which I am
talking in the first version of the patch. That is a strong argument
in favor of it being practical. It's also a pretty good argument that
it is at least potentially useful, because Tom doesn't usually do
useless things for no reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add pg_file_sync() to adminpack

2020-01-09 Thread Tom Lane
Julien Rouhaud  writes:
> On Thu, Jan 9, 2020 at 6:16 PM Stephen Frost  wrote:
>> Why would you expect that when it isn't the case for the filesystem
>> itself..?

> Just a usual habit with durable property.

I tend to agree with Stephen on this, mainly because the point of
these adminpack functions is to expose filesystem access.  If these
functions were more "database-y" and less "filesystem-y", I'd agree
with trying to impose database-like consistency requirements.

We don't have to expose every wart of the filesystem semantics
--- for example, it would be reasonable to make pg_file_sync()
Do The Right Thing when applied to a directory, even if the
particular platform we're on doesn't behave sanely for that.
But having fsync separated from write is a pretty fundamental part
of most filesystems' semantics, so we ought not try to hide that.
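
(For illustration only, and not taken from the proposed patch: a minimal sketch
of what such a function might look like if it reused the backend's
fsync_fname() helper, which already knows how to sync a directory where the
platform allows it.  Permission checks and the usual adminpack path-validation
plumbing are elided.)

#include "postgres.h"

#include <sys/stat.h>

#include "fmgr.h"
#include "storage/fd.h"
#include "utils/builtins.h"

PG_FUNCTION_INFO_V1(pg_file_sync);

Datum
pg_file_sync(PG_FUNCTION_ARGS)
{
	char	   *filename = text_to_cstring(PG_GETARG_TEXT_PP(0));
	struct stat fst;

	if (stat(filename, &fst) < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not stat file \"%s\": %m", filename)));

	/* let fsync_fname() pick the right flavor for files vs. directories */
	fsync_fname(filename, S_ISDIR(fst.st_mode));

	PG_RETURN_VOID();
}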

regards, tom lane




Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR

2020-01-09 Thread Daniel Verite
Dent John wrote:

> I’ve made a revision of this patch. 

Some comments:

* the commitfest app did not extract the patch from the mail,
possibly because it's buried in the MIME structure of the mail
(using plain text instead of HTML messages might help with that).
The patch has no status in http://commitfest.cputube.org/
probably because of this too.


* unnest-refcursor-v3.patch needs a slight rebase because this chunk
in the Makefile fails
-   regexp.o regproc.o ri_triggers.o rowtypes.o ruleutils.o \
+   refcursor.o regexp.o regproc.o ri_triggers.o rowtypes.o ruleutils.o \


* I'm under the impression that refcursor_unnest() is functionally
equivalent to a plpgsql implementation like this:

create function unnest(x refcursor) returns setof record as $$
declare
  r record;
begin
  loop
    fetch x into r;
    exit when not found;
    return next r;
  end loop;
end $$ language plpgsql;

but it would differ in performance, because internally a materialization step
could be avoided, but only when the other patch "Allow FunctionScans to
pipeline results"  gets in?
Or are performance benefits expected right away with this patch?

*
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -5941,7 +5941,7 @@ array_fill_internal(ArrayType *dims, ArrayType *lbs,
 
 
 /*
- * UNNEST
+ * UNNEST (array)
  */

This chunk looks unnecessary?

* some user-facing doc would be needed.

* Is it good to overload "unnest" rather than choosing a specific
function name?


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Add pg_file_sync() to adminpack

2020-01-09 Thread Julien Rouhaud
On Thu, Jan 9, 2020 at 6:16 PM Stephen Frost  wrote:
>
> Greetings,
>
> * Julien Rouhaud (rjuju...@gmail.com) wrote:
> > On Thu, Jan 9, 2020 at 7:43 AM Fujii Masao  wrote:
> > > On Wed, Dec 25, 2019 at 11:11 PM Julien Rouhaud  
> > > wrote:
> > > > On Wed, Dec 25, 2019 at 2:01 PM Fujii Masao  
> > > > wrote:
> > > > > I'd like to propose to add pg_file_sync() function into 
> > > > > contrib/adminpack.
> > > > > This function fsyncs the specified file or directory named by its 
> > > > > argument.
> > > > > IMO this is useful, for example, when you want to fsync the file that
> > > > > pg_file_write() writes out or that COPY TO exports the data into,
> > > > > for durability. Thought?
> > > >
> > > > +1, that seems like a useful wrapper.  Looking at existing functions,
> > > > I see that there's a pg_file_rename() in adminpack, but it doesn't use
> > > > durable_rename nor does it try to perform any fsync.  Same for
> > > > pg_file_unlink vs. durable_unlink.  It's probably worth fixing that at
> > > > the same time?
> > >
> > > I don't think that's a bug. I'm not sure if every users of those functions
> > > need durable rename and unlink at the expense of performance.
> > > So IMO it's better to add new argument like "durable" to those functions
> > > and durable_rename or _unlink is used only if it's true.
> >
> > It's probably a POLA violation.  I'm pretty sure that most people
> > using those functions would expect that a successful call to
> > pg_file_unlink() means that the file cannot rise from the dead even
> > under certain unlikely circumstances, at least I'd expect so.  If
> > performance is a problem here, I'd rather have a new wrapper with a
> > sync flag that defaults to true so it's possible to disable it if
> > needed instead of calling a different function.  That being said, I
> > agree with Arthur, it should be handled in a different patch.
>
> Why would you expect that when it isn't the case for the filesystem
> itself..?

Just a usual habit with durable property.

>  I agree with Fujii on this- you should have to explicitly ask
> for us to do more than the equivalent filesystem-level operation.  We
> shouldn't be forcing that on you.

I just checked other somewhat related cases and saw that, for instance,
COPY TO doesn't flush data either, so it's indeed the expected
behavior.




Re: Make autovacuum sort tables in descending order of xid_age

2020-01-09 Thread Robert Haas
On Thu, Dec 12, 2019 at 2:26 PM David Fetter  wrote:
> > I wonder if you might add information about table size, table changes,
> > and bloat to your RelFrozenXidAge struct and modify rfxa_comparator to
> > use a heuristic to cost the (age, size, bloat, changed) grouping and
> > sort on that cost, such that really large bloated tables with old xids
> > might get vacuumed before smaller, less bloated tables that have
> > even older xids. Sorting the tables based purely on xid_age seems to
> > ignore other factors that are worth considering.  I do not have a
> > formula for how those four factors should be weighted in the heuristic,
> > but you are implicitly assigning three of them a weight of zero in
> > your current patch.
>
> I think it's vastly premature to come up with complex sorting systems
> right now.  Just sorting in descending order of age should either have
> or not have positive effects.

A lot of previous efforts to improve autovacuum scheduling have fallen
down precisely because they did something that was so simple that it
was doomed to regress as many cases as it improved, so I wouldn't be
too quick to dismiss Mark's suggestion. In general, sorting by XID age
seems like it should be better, but it's not hard to come up with a
counterexample: suppose table T1 is going to wrap around in 4 hours
and takes 4 hours to vacuum, but table T2 is going to wrap around in 2
hours and takes 1 hour to vacuum. Your algorithm will prioritize T2,
but it's better to prioritize T1. A second autovacuum worker may
become available for this database later and still get T2 done before
we run into trouble, but if we don't start T1 right now, we're hosed.
The current algorithm gets this right if T1 was defined before T2 and
thus appears earlier in pg_class; your algorithm gets it wrong
regardless.

I've had the thought for a while now that perhaps we ought to try to
estimate the rate of XID consumption, because without that it's really
hard to make smart decisions. In the above example, if the rate of XID
consumption is 4x slower, then it might be smarter to vacuum T2 first,
especially if T2 is very heavily updated compared to T1 and might
bloat if we don't deal with it right away. At the lower rate of XID
consumption, T1 is an urgent problem, but not yet an emergency.
However, I've noticed that most people who complain about unexpected
wraparound vacuums have them hit in peak periods, which when you think
about it, makes a lot of sense. If you consume XIDs 10x as fast during
your busy time as your non-busy times, then the XID that triggers the
wraparound scan on any given table is very likely to occur during a
busy period. So the *current* rate of XID consumption might not be
very informative, which makes figuring out what to do here awfully
tricky.

I think Mark's suggestion of some kind of formula that takes into
account the XID age as well as table size and bloat is probably a
pretty good one. We'll probably need to make some of the parameters of
that formula configurable.  Ideally, they'll be easy enough to
understand that users can say "oh, I'm using XIDs more or less quickly
than normal here, so I need to change parameter X" and even figure out
-- without using a calculator -- what sort of value for X might be
appropriate.
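
(A purely illustrative sketch, not from any patch: one way Mark's (age, size,
bloat) suggestion could be folded into a single sortable score.  The struct,
function and weight names below are invented; presumably the weights would
become the configurable parameters discussed above.)

typedef struct RelVacScoreInput
{
	double		xid_age;		/* age(relfrozenxid), ideally clamped to what is advanceable */
	double		total_bytes;	/* heap plus indexes */
	double		dead_frac;		/* rough bloat estimate, 0..1 */
} RelVacScoreInput;

/* hypothetical knobs */
static double vac_age_weight = 1.0;
static double vac_bloat_weight = 1e-9;	/* per estimated dead byte */

static double
relvac_score(const RelVacScoreInput *r)
{
	/* weigh wraparound urgency against absolute bloat (dead bytes) */
	return vac_age_weight * r->xid_age +
		vac_bloat_weight * (r->dead_frac * r->total_bytes);
}

/* qsort() comparator: highest score first */
static int
relvac_score_cmp(const void *a, const void *b)
{
	double		sa = relvac_score((const RelVacScoreInput *) a);
	double		sb = relvac_score((const RelVacScoreInput *) b);

	return (sa > sb) ? -1 : ((sa < sb) ? 1 : 0);
}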

When there's a replication slot or prepared transaction or open
transaction holding back xmin, you can't advance the relfrozenxid of
that table past that point no matter how aggressively you vacuum it,
so it would probably be a good idea to set up the formula so that the
weight is based on the amount by which we think we'll be able to
advance relfrozenxid rather than, say, the age relative to the last
XID assigned.

The dominant cost of vacuuming a table is often the number and size of
the indexes rather than the size of the heap, particularly because the
visibility map may permit skipping a lot of the heap. So you have N
indexes that need to be read completely and 1 heap that needs to be
read only partially. So, whatever portion of the score comes from
estimating the cost of vacuuming that table ought to factor in the
size of the indexes. Perhaps it should also consider the contents of
the visibility map, although I'm less sure about that.

One problem with the exponential in Mark's formula is that it might
treat small XID differences between old tables as more important than
they really are.  I wonder if it might be a better idea to compute
several different quantities and use the maximum from among them as
the prioritization. We can model the priority of vacuuming a
particular table as the benefit of vacuuming that table multiplied by
the effort. The effort is easy to model: just take the size of the
table and its indexes. The benefit is trickier, because there are four
different possible benefits: relfrozenxid advancement, relminmxid
advancement, dead tuple removal, and marking pages all-visible. So,
suppose we model each benefit by a separate equation. For XID
advancement, figure 

Re: Add pg_file_sync() to adminpack

2020-01-09 Thread Stephen Frost
Greetings,

* Julien Rouhaud (rjuju...@gmail.com) wrote:
> On Thu, Jan 9, 2020 at 7:43 AM Fujii Masao  wrote:
> > On Wed, Dec 25, 2019 at 11:11 PM Julien Rouhaud  wrote:
> > > On Wed, Dec 25, 2019 at 2:01 PM Fujii Masao  wrote:
> > > > I'd like to propose to add pg_file_sync() function into 
> > > > contrib/adminpack.
> > > > This function fsyncs the specified file or directory named by its 
> > > > argument.
> > > > IMO this is useful, for example, when you want to fsync the file that
> > > > pg_file_write() writes out or that COPY TO exports the data into,
> > > > for durability. Thought?
> > >
> > > +1, that seems like a useful wrapper.  Looking at existing functions,
> > > I see that there's a pg_file_rename() in adminpack, but it doesn't use
> > > durable_rename nor does it try to perform any fsync.  Same for
> > > pg_file_unlink vs. durable_unlink.  It's probably worth fixing that at
> > > the same time?
> >
> > I don't think that's a bug. I'm not sure if every users of those functions
> > need durable rename and unlink at the expense of performance.
> > So IMO it's better to add new argument like "durable" to those functions
> > and durable_rename or _unlink is used only if it's true.
> 
> It's probably a POLA violation.  I'm pretty sure that most people
> using those functions would expect that a successful call to
> pg_file_unlink() means that the file cannot rise from the dead even
> under certain unlikely circumstances, at least I'd expect so.  If
> performance is a problem here, I'd rather have a new wrapper with a
> sync flag that defaults to true so it's possible to disable it if
> needed instead of calling a different function.  That being said, I
> agree with Arthur, it should be handled in a different patch.

Why would you expect that when it isn't the case for the filesystem
itself..?  I agree with Fujii on this- you should have to explicitly ask
for us to do more than the equivalent filesystem-level operation.  We
shouldn't be forcing that on you.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [Logical Replication] TRAP: FailedAssertion("rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT || rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL || rel->rd_rel->relreplident == REPLICA_IDE

2020-01-09 Thread Andres Freund
Hi,

On 2020-01-09 13:17:59 +0530, Dilip Kumar wrote:
> I am able to reproduce the failure,  I think the assert in the
> 'logicalrep_write_insert' is not correct.  IMHO even if the replica
> identity is set to NOTHING we should be able to replicate INSERT?
> 
> This will fix the issue.
> 
> diff --git a/src/backend/replication/logical/proto.c
> b/src/backend/replication/logical/proto.c
> index dcf7c08..471461c 100644
> --- a/src/backend/replication/logical/proto.c
> +++ b/src/backend/replication/logical/proto.c
> @@ -145,7 +145,8 @@ logicalrep_write_insert(StringInfo out, Relation
> rel, HeapTuple newtuple)
> 
> Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT ||
>rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
> -  rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX);
> +  rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX ||
> +  rel->rd_rel->relreplident == REPLICA_IDENTITY_NOTHING);
> 
> /* use Oid as relation identifier */
> pq_sendint32(out, RelationGetRelid(rel));

There's not much point in having this assert, right? Given that it
covers all choices? Seems better to just drop it.

Greetings,

Andres Freund




Re: [HACKERS] pg_shmem_allocations view

2020-01-09 Thread Andres Freund
Hi,

On 2020-01-09 11:13:46 -0500, Robert Haas wrote:
> On Wed, Jan 8, 2020 at 2:30 AM Kyotaro Horiguchi
>  wrote:
> > I agree to the patch as-is. Thanks for the explanatin.
> 
> OK. Thanks for the review, and for your thoughtful consideration of my 
> comments.
> 
> I went ahead and committed this.

Wee!

Greetings,

Andres Freund




Re: [Proposal] Global temporary tables

2020-01-09 Thread Tomas Vondra

On Thu, Jan 09, 2020 at 02:17:08PM +0300, Konstantin Knizhnik wrote:



On 06.01.2020 8:04, 曾文旌(义从) wrote:

In the previous communication

1 we agreed on the general direction
1.1 gtt use local (private) buffer
1.2 no replica access in first version

2 We feel that gtt needs to maintain statistics, but there is no agreement on 
how it will be done.

3 Still no one has commented on GTT's transaction information processing, which includes
3.1 Does gtt's frozenxid need to be taken care of?
3.2 gtt’s clog clean
3.3 How to deal with "too old" gtt data

I suggest we discuss further, reach an agreement, and merge the two patches to 
one.



I also hope that we can come to a common solution for GTT.
If we do not try to address parallel execution issues and access to
temp tables at replicas (and I agree that they should be avoided in the
first version of the patch), then the GTT patch becomes quite small.




Well, that was kinda my goal - making the patch as small as possible by
eliminating bits that are contentious or where we don't know the
solution (like planning for parallel queries).

The most complex and challenging task is to support GTT for all kinds of
indexes. Unfortunately I cannot propose a good universal solution for it.
Just patching all existing index implementations seems to be the only
choice.




I haven't looked at the indexing issue closely, but IMO we need to
ensure that every session sees/uses only indexes on GTT that were
defined before the session started using the table.

Can't we track which indexes a particular session sees, somehow?


Statistic is another important case.
But once again I do not completely understand why we want to address 
all this issues with statistic in first version of the patch?


I think the question is which "issues with statistic" you mean. I'm sure
we can ignore some of them, e.g. the one with parallel workers not
having any stats (assuming we consider functions using GTT to be
parallel restricted).


It contradicts to the idea to make this patch as small as possible.


Well, there's "making patch as small as possible" vs. "patch behaving
correctly" trade-off ;-)

Also it seems to me that everybody agreed that users very rarely 
create indexes for temp tables and explicitly analyze them.


I certainly *disagree* with this.

We often see temporary tables as a fix or misestimates in complex
queries, and/or as a replacement for CTEs with statistics/indexes. In
fact it's a pretty valuable tool when helping customers with complex
queries affected by poor estimates.

So I think GTT will be useful even with limited support for statistics.
In my version, statistics for GTT are provided by pushing the corresponding
information into the backend's cache for the pg_statistic table.


I think someone pointed out pushing stuff directly into the cache is
rather problematic, but I don't recall the details.

Also I provided a pg_temp_statistic view so users can inspect it. The
idea of making pg_statistic a view which combines statistics for normal
and temporary tables is overkill from my point of view.


I do not understand why we need to maintain a hash with some extra
information for GTT in backend memory (as was done in Wenjing's patch).
Also the idea of using CREATE EXTENSION for accessing this information
seems dubious.




I think the extension was more a PoC rather than a final solution.


regards
--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Recognizing superuser in pg_hba.conf

2020-01-09 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> But, again, we already *have* a way of solving this problem: use
> quotes. As Simon pointed out, your proposed solution isn't really a
> solution at all, because & can appear in role names. It probably
> won't, but there probably also won't be a role name that matches
> either of these keywords, so it's just six of one, half a dozen of the
> other. The thing that really solves it is quoting.

I really just can't agree with the idea that:

"superuser"

and

superuser

in pg_hba.conf should mean materially different things and have far
reaching security differences.  Depending on quoting in pg_hba.conf for
this distinction is an altogether bad idea.

> Now I admit that if we decide pg_hba.conf keywords have to start with
> "pg_" and prevent names beginning with "pg_" from being used as object
> names, then we'd have TWO ways of distinguishing between a keyword and
> an object name. But I don't think TMTOWTDI is the right design
> principle here.

There is a *really* big difference here though which makes this not "two
ways to do the same thing"- you *can't* create a user starting with
"pg_".  You *can* create a user with an '&' in it.  If we prevented you
from being able to create users with '&' in it then I'd be more open to
the idea of using '&' to mean something special in pg_hba, and then it
really would be two different ways to do the same thing, but that's not
actually what's being proposed here.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Recognizing superuser in pg_hba.conf

2020-01-09 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jan 9, 2020 at 11:06 AM Tom Lane  wrote:
>> The problem is that we keep deciding that okay, it probably won't hurt
>> anybody if this particular thing-that-ought-to-be-a-reserved-word isn't
>> really reserved.

> But, again, we already *have* a way of solving this problem: use
> quotes. As Simon pointed out, your proposed solution isn't really a
> solution at all, because & can appear in role names.

I'm not sure that the pg_hba.conf parser allows that without quotes,
but in any case it's irrelevant to the proposal to use a pg_ prefix.
We don't allow non-built-in role names to be spelled that way,
quoted or not.

> Now I admit that if we decide pg_hba.conf keywords have to start with
> "pg_" and prevent names beginning with "pg_" from being used as object
> names, then we'd have TWO ways of distinguishing between a keyword and
> an object name. But I don't think TMTOWTDI is the right design
> principle here.

The principle I'm concerned about is not letting a configuration file
that was perfectly fine in version N silently become a security hazard
in version N+1.  The only way I will accept your proposal is if we
change the pg_hba.conf parser to *require* quotes around every
role and DB name that's not meant to be a keyword, so that people get
used to that requirement.  But I doubt that idea will fly.

regards, tom lane




Re: [Proposal] Global temporary tables

2020-01-09 Thread Tomas Vondra

On Thu, Jan 09, 2020 at 06:07:46PM +0300, Konstantin Knizhnik wrote:



On 06.01.2020 14:01, Tomas Vondra wrote:

On Mon, Jan 06, 2020 at 01:04:15PM +0800, 曾文旌(义从) wrote:

In the previous communication

1 we agreed on the general direction 1.1 gtt use local (private)
buffer 1.2 no replica access in first version



OK, good.


2 We feel that gtt needs to maintain statistics, but there is no
agreement on how it will be done.



I certainly agree GTT needs to maintain statistics, otherwise it'll
lead to poor query plans. AFAIK the current patch stores the info in a
hash table in a backend private memory, and I don't see how else to do
that (e.g. storing it in a catalog would cause catalog bloat).

FWIW this is one reason why I think just using shared buffers (instead
of local ones) is not sufficient to support parallel queries, as
proposed by Alexander. The workers would not know the stats, breaking
planning of queries in PARALLEL SAFE plpgsql functions etc.



I do not think that the "all or nothing" approach is as good for software
development as it is for database transactions.


Well, sure. I'm not saying we need to have a perfect solution in v1. I'm
saying if we have two choices:

(1) Use shared buffers even if it means the parallel query plan may be
arbitrarily bad.

(2) Use private buffers, even if it means no parallel queries with temp
tables.

Then I'm voting for (2) because it's less likely to break down. I can
imagine allowing parallel queries with GTT when there's no risk of
having to plan in the worker, but that's not there yet.

If we can come up with a reasonable solution for the parallel case, we
can enable it later.

Yes, if we have a function in PL/pgSQL which performs queries on
temporary tables, then parallel workers may build inefficient plans for
these queries due to lack of statistics.


IMHO that's a pretty awful deficiency, because it essentially means
users may need to disable parallelism for such queries. Which means
we'll get complaints from users, and we'll have to come up with some
sort of solution. I'd rather not be in that position.

From my point of view this is not a pitfall of GTT but a result of the
lack of a global plan cache in Postgres. And it should be fixed, but not
at the GTT level.




That doesn't give us free pass to just ignore the issue. Even if it
really was due to a lack of global plan cache, the fact is we don't have
that feature, so we have a problem. I mean, if you need infrastructure
that is not available, you either have to implement that infrastructure
or make it work properly without it.

Also I have never seen real use cases with such functions, even in
systems which make heavy use of temporary tables and stored procedures.
But there are many other real problems with temp tables (besides those
already mentioned in this thread).


Oh, I'm sure there are pretty large plpgsql applications, and I'd be
surprised if at least some of those were not affected. And I'm sure
there are apps using UDF to do all sorts of stuff (e.g. I wonder if
PostGIS would have this issue - IIRC it's using SPI etc.).

The question is whether we should consider existing apps affected,
because they are using the regular temporary tables and not GTT. So
unless they switch to GTT there is no regression ...

But even in that case I don't think it's a good idea to accept this as
an acceptable limitation. I admit one of the reasons why I think that
may be that statistics and planning are my areas of interest, so I'm not
quite willing to accept incomplete stuff as OK.


In PgPro/EE we have fixes for some of them, for example:

1. Do not reserve space in the file for temp relations. Right now
extending a relation causes a zero page to be written to disk by mdextend.
This causes useless disk IO for temp tables, which in most cases fit in
memory and should not be written to disk.


2. Implicitly perform an analyze of a temp table immediately after
storing data in it. Usually tables are analyzed by autovacuum in the
background, but that doesn't work for temp tables, which are not
processed by autovacuum and are accessed immediately after being filled
with data; the lack of statistics may then cause a very inefficient plan
to be built. We have an online_analyze extension which forces an analyze
of the table after a bulk of data has been appended to it. It can be used
for normal tables, but most of all it is useful for temp relations.


Unlike the hypothetical example of a parallel-safe function working with
temp tables, these are real problems observed by some of our customers.
They are applicable both to local and global temp tables, and this is why
I do not want to discuss them in the context of GTT.




I think those are both interesting issues worth fixing, but I don't
think it makes the issue discussed here less important.





3 Still no one has commented on GTT's transaction information
processing, which includes

3.1 Does gtt's frozenxid need to be taken care of?
3.2 gtt’s clog clean
3.3 How to deal with "too old" gtt data



No idea what to do about this.



I wonder 

Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-09 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Jan 9, 2020 at 10:09 AM Stephen Frost  wrote:
> > [ wall of text ]

This really isn't helpful.

> I don't see anything in here I really disagree with, but nor do I
> understand why any of it means that giving superusers the ability to
> customize which extensions are database-owner-installable would be a
> bad thing.

Alright, I think there's definitely something we need to sort through.

If you agree that the approach I advocated means less code for hosting
providers to have to change in their fork, and that they don't want to
give users superuser, and that they want non-superusers to be able to
install extensions, and that they really don't want to modify things
post-install, then I don't get why you're against the DB-level privilege
that I've been advocating for except that it's "not as customizable."

Are you saying that in order to have something here that we must make it
so that a superuser is able to specify, individually, which extensions
can be installed by which users?  You keep coming back to this point of
saying that you want this to be 'customizable' but I really don't see
any justification for the level of customization you're asking for- but
I see an awful lot of work involved.  When there's a lot of work
involved for a use-case that no one is actually asking for, I'm really
skeptical.  The use-cases that you've presented, at least thus far,
certainly haven't swayed me into thinking that you're right that there's
a justifiable use-case here for this level of complicated privileges.

I'm also not convinced that such a design would even be practical- we
don't know all of the extensions that a given PG install will be able to
have when it's first installed.  If postgis isn't on the filesystem when
someone installs PG, how do I, as a superuser, say that $user is allowed
to install postgis?  Or do we always have to have this two-step "install
on filesystem", "grant privs to $user to install" process?  What if that
extension is then uninstalled from the filesystem?  Do we have to clean
up the GRANT that was done?

> > > I don't think changing what's in contrib helps much. Even if we rm
> > > -rf'd it, there's the same problem with out-of-core extensions. Joe
> > > Extensionman may think his extension ought to be trusted, and package
> > > it as such, but Paula Skepticaldba is entitled to think Joe's view of
> > > the security risks originating from his code is overly rosy.
> >
> > Out of core extensions have to get installed on to the system though,
> > they don't just show up magically, and lots and lots of folks out there
> > from corporate infrastructure groups to hosting providers have got lots
> > of experience with deciding what they'll allow to be installed on a
> > system and what they won't, what repositories of code they'll trust and
> > which they won't.
> 
> You seem to be ignoring the actual point of that example, which is
> that someone may want to install the extension but have a different
> view than the packager about whether it should be trusted.

Why would someone want to install something that isn't trusted?  You're
implying that's what is happening here, but it doesn't make any sense to
me and without it making sense I can't agree that it's a sensible enough
use-case to demand a lot of work be put into it.

> You seem to think that that hosting providers and system
> administrators will be thrilled to accept the judgement of developers
> about which extensions should be trusted in their environment. Great!

Huh?  Hosting providers are the ones that choose what gets installed on
the filesystem, certainly not developers, so I am baffled how you came
to the conclusion that I'm suggesting administrators are trusting the
judgement of developers.  That's just not at all the case.

> Evidently you disagree, and that's fine, even if I don't understand
> why. Given some of the development projects you've done in the past, I
> find it extremely surprising to hear you now taking the position that
> fine-grained security controls are, in this case, unnecessary and
> useless, but you don't have to like it everywhere just because you
> like it for some things.

I'm all for fine-grained control- where it makes sense.  I'm still
*very* much of the opinion that we should be able to let DB owners and
schema owners control what kind of objects users are allowed to create
in their DBs/schemas.  I want a "GRANT CREATE FUNCTION ON SCHEMA mine TO
you;" ability.  I'm not clamouring for a way to say "GRANT CREATE
THISSPECIFICFUNCTION ON SCHEMA mine TO you;" or something like "GRANT
CREATE FUNCTION MATCHING REGEXP 'abc_*' ON SCHEMA mine TO you;".

In the end, superusers are, in fact, the ones who grant out ALL access,
the question is what level of access they want to grant out and to whom.
When it comes to trusted objects, we've historically said that it's the
DB owner who gets to say who can grant out that access and all I'm
trying 

Re: Recognizing superuser in pg_hba.conf

2020-01-09 Thread Robert Haas
On Thu, Jan 9, 2020 at 11:06 AM Tom Lane  wrote:
> The problem is that we keep deciding that okay, it probably won't hurt
> anybody if this particular thing-that-ought-to-be-a-reserved-word isn't
> really reserved.  Your exercise in justifying that for "superuser" is
> not unlike every other previous argument about this.  Sooner or later
> that's going to fail, and somebody's going to have a security problem
> because they didn't know that a particular name has magic properties
> in a particular context.  (Which, indeed, maybe it didn't have when
> they chose it.)  Claiming they should have known better isn't where
> I want to be when that happens.

But, again, we already *have* a way of solving this problem: use
quotes. As Simon pointed out, your proposed solution isn't really a
solution at all, because & can appear in role names. It probably
won't, but there probably also won't be a role name that matches
either of these keywords, so it's just six of one, half a dozen of the
other. The thing that really solves it is quoting.

Now I admit that if we decide pg_hba.conf keywords have to start with
"pg_" and prevent names beginning with "pg_" from being used as object
names, then we'd have TWO ways of distinguishing between a keyword and
an object name. But I don't think TMTOWTDI is the right design
principle here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgsql: Add basic TAP tests for psql's tab-completion logic.

2020-01-09 Thread Tom Lane
Peter Eisentraut  writes:
> On 2020-01-09 16:50, Tom Lane wrote:
>> That would disable psql's tab completion, command editing, and history
>> altogether, which I doubt is what you want for production builds.
>> If we conclude we can't work around the testing issues for ancient
>> libedit, probably the right answer is to provide a switch to
>> disable just the test.  I've been trying to dance around that
>> conclusion, but maybe we should just do it and move on.

> I think he means something like
>   make check with_readline=no
> not for the actual build.

Oh, I see.  I'd rather not codify that though, because it risks
problems if that symbol ever gets used any other way.  I was
thinking of making the test script check for some independent
environment variable, say SKIP_READLINE_TESTS.

regards, tom lane




Re: [HACKERS] pg_shmem_allocations view

2020-01-09 Thread Robert Haas
On Wed, Jan 8, 2020 at 2:30 AM Kyotaro Horiguchi
 wrote:
> I agree to the patch as-is. Thanks for the explanatin.

OK. Thanks for the review, and for your thoughtful consideration of my comments.

I went ahead and committed this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgsql: Add basic TAP tests for psql's tab-completion logic.

2020-01-09 Thread Peter Eisentraut

On 2020-01-09 16:50, Tom Lane wrote:

Christoph Berg  writes:

Fwiw if libedit in xenial is Just Broken and fixing the tests would
just codify the brokenness without adding any value, I'll just disable
that test in that particular build. It looks like setting
with_readline=no on the make command line should work.


That would disable psql's tab completion, command editing, and history
altogether, which I doubt is what you want for production builds.
If we conclude we can't work around the testing issues for ancient
libedit, probably the right answer is to provide a switch to
disable just the test.  I've been trying to dance around that
conclusion, but maybe we should just do it and move on.


I think he means something like

make check with_readline=no

not for the actual build.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Recognizing superuser in pg_hba.conf

2020-01-09 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jan 9, 2020 at 10:06 AM Tom Lane  wrote:
>> What I'm basically objecting to is the pseudo-reservedness.  If we
>> don't want to dodge that with special syntax, we should dodge it
>> by making sure the keywords are actually reserved names.

> ...
> But I think I'm coming around to the view that we're making what ought
> to be a simple change complicated, and we should just not do that.

The problem is that we keep deciding that okay, it probably won't hurt
anybody if this particular thing-that-ought-to-be-a-reserved-word isn't
really reserved.  Your exercise in justifying that for "superuser" is
not unlike every other previous argument about this.  Sooner or later
that's going to fail, and somebody's going to have a security problem
because they didn't know that a particular name has magic properties
in a particular context.  (Which, indeed, maybe it didn't have when
they chose it.)  Claiming they should have known better isn't where
I want to be when that happens.

I don't want to keep going down that path.  These things are effectively
reserved words, and they need to act like reserved words, so that you
get an error if you misuse them.  Silently doing something else than
what (one reasonable reading of) a pg_hba.conf entry seems to imply
is *bad*.

regards, tom lane




Re: pgsql: Add basic TAP tests for psql's tab-completion logic.

2020-01-09 Thread Tom Lane
Christoph Berg  writes:
> Fwiw if libedit in xenial is Just Broken and fixing the tests would
> just codify the brokenness without adding any value, I'll just disable
> that test in that particular build. It looks like setting
> with_readline=no on the make command line should work.

That would disable psql's tab completion, command editing, and history
altogether, which I doubt is what you want for production builds.
If we conclude we can't work around the testing issues for ancient
libedit, probably the right answer is to provide a switch to
disable just the test.  I've been trying to dance around that
conclusion, but maybe we should just do it and move on.

regards, tom lane




Re: Patch to document base64 encoding

2020-01-09 Thread Karl O. Pinc
On Thu, 9 Jan 2020 08:27:28 +0100 (CET)
Fabien COELHO  wrote:

> > Another option would be to use "bytes bytea".  
> 
> > (The current patch uses "string bytea".)
> > This would probably also require some re-wording throughout.  

> I like it, but this is only my own limited opinion, and I'm not a
> native English speaker.

Per your request for consistency I made this change throughout 
the entire binary string section.

New patch attached: doc_base64_v13.patch

This required surprisingly little re-wording.
Added word "binary" into the descriptions of convert(),
substring(), convert_from(), and convert_to().

I also added data types to the call syntax of set_bit() 
and set_byte().

And this patch adds hyperlinks from the get_bit(), get_byte(),
set_bit(), and set_byte() descriptions to the note
that offsets are zero-based.

I also removed the hyperlinked asterisks about the hash
function results and instead hyperlinked the word "hash"
in the descriptions.  (Links to the note about md5()
returning hex text and the others returning bytea and how
to convert between the two.)

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b42f12862..3bd7232814 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1459,6 +1459,13 @@
 natively for the bit-string types.

 
+   
+ Functions which convert, both to and from, strings and
+ the bytea type
+ are documented
+ separately.
+   
+

 SQL defines some string functions that use
 key words, rather than commas, to separate
@@ -1820,101 +1827,6 @@
abcde,2,22
   
 
-  
-   
-
- convert
-
-convert(string bytea,
-src_encoding name,
-dest_encoding name)
-   
-   bytea
-   
-Convert string to dest_encoding.  The
-original encoding is specified by
-src_encoding. The
-string must be valid in this encoding.
-Conversions can be defined by CREATE CONVERSION.
-Also there are some predefined conversions. See  for available conversions.
-   
-   convert('text_in_utf8', 'UTF8', 'LATIN1')
-   text_in_utf8 represented in Latin-1
-   encoding (ISO 8859-1)
-  
-
-  
-   
-
- convert_from
-
-convert_from(string bytea,
-src_encoding name)
-   
-   text
-   
-Convert string to the database encoding.  The original encoding
-is specified by src_encoding. The
-string must be valid in this encoding.
-   
-   convert_from('text_in_utf8', 'UTF8')
-   text_in_utf8 represented in the current database encoding
-  
-
-  
-   
-
- convert_to
-
-convert_to(string text,
-dest_encoding name)
-   
-   bytea
-   
-Convert string to dest_encoding.
-   
-   convert_to('some text', 'UTF8')
-   some text represented in the UTF8 encoding
-  
-
-  
-   
-
- decode
-
-decode(string text,
-format text)
-   
-   bytea
-   
-Decode binary data from textual representation in string.
-Options for format are same as in encode.
-   
-   decode('MTIzAAE=', 'base64')
-   \x3132330001
-  
-
-  
-   
-
- encode
-
-encode(data bytea,
-format text)
-   
-   text
-   
-Encode binary data into a textual representation.  Supported
-formats are: base64, hex, escape.
-escape converts zero bytes and high-bit-set bytes to
-octal sequences (\nnn) and
-doubles backslashes.
-   
-   encode('123\000\001', 'base64')
-   MTIzAAE=
-  
-
   

 
@@ -1982,19 +1894,6 @@
4
   
 
-  
-   length(string bytea,
-encoding name )
-   int
-   
-Number of characters in string in the given
-encoding. The string
-must be valid in this encoding.
-   
-   length('jose', 'UTF8')
-   4
-  
-
   

 
@@ -2044,8 +1943,8 @@

text

-Calculates the MD5 hash of string,
-returning the result in hexadecimal
+Calculates the MD5 hash
+of string, returning the result in hexadecimal

md5('abc')
900150983cd24fb0 d6963f7d28e17f72
@@ -2358,6 +2257,66 @@
test
   
 
+  
+   
+
+ sha224
+
+sha224(string)
+   
+   bytea
+   
+SHA-224 hash
+   
+   sha224('abc')
+   \x23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
+  
+
+  
+   
+
+ sha256
+
+sha256(string)
+   
+   bytea
+   
+SHA-256 hash
+   
+   sha256('abc')
+   

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-01-09 Thread Alvaro Herrera
On 2020-Jan-09, Tom Lane wrote:

> Alvaro Herrera  writes:
> > Hmm, so why not revert the test only in the back branches, given that
> > it's not so onerous in master?
> 
> I grow tired of repeating myself, but: it's purely accidental that this
> test passes in master for the existing set of buildfarm members.

Oh, I forgot we had that problem.  I agree with reverting the test,
rather than building all the extra functionality needed to make it more
stable, in that case.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Libpq support to connect to standby server as priority

2020-01-09 Thread Alvaro Herrera
On 2020-Jan-06, tsunakawa.ta...@fujitsu.com wrote:

> Let me check my understanding.  Are you proposing these?
> 
> * The canonical libpq connection parameter is target_session_attr = {primary 
> | standby | prefer-standby}.  Leave and document read-write as a synonym for 
> primary.
> 
> * When the server version is 13 or later, libpq just checks in_recovery, not 
> checking transaction_read_only or sending SHOW transaction_read_only.
> 
> * When the server version is before 13, libpq sends SHOW 
> transaction_read_only as before.

Yes, that sounds good to me.

> Personally, 100% agreed, considering what we really wanted to do when 
> target_session_attr was introduced is to tell if the server is primary or 
> standby.  The questions are:
> 
> Q1: Should we continue to use the name target_session_attr, or rename it to 
> target_server_type and make target_session_attr a synonym for it?  I'm in 
> favor of the latter.

I'm not 100% sure about this.  I think part of the reason of making it
target_session_attrs (note plural) is that the user could be able to
specify more than one attribute (a comma-separated list, like the
DateStyle GUC), if we supported some hypothetical attributes in the
future that are independent of the existing ones.  I'm not inclined to
break that, unless the authors of the original feature agree to that.

Maybe one possible improvement would be to add target_server_type as an
additional one, that only accepts a single item 
(primary/standby/prefer-standby),
as a convenience, while target_session_attrs retains its ability to
receive more than one value.  The two would be somewhat redundant but
not exact synonyms.
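
For illustration (connection strings invented here, not taken from the
thread), the first line below is the sort of thing that works against
existing releases, while the second shows the kind of spelling being
discussed:

    psql "host=pg1,pg2 target_session_attrs=read-write"
    psql "host=pg1,pg2 target_server_type=prefer-standby"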

> Q2: Can we accept the subtle incompatibility that
> target_session_attr=read-write and target_server_type=primary are not
> the same, when default_transaction_read_only is on?  (I'd like to hear
> yes)

... on servers versions 12 and older, yes.  (If I understand correctly,
we wouldn't have such a difference in version 13).

> Q3: Can we go without supporting standby and prefer-standby for older
> servers?  (I think yes because we can say that it's a new feature
> effective for new servers.)

Yes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-01-09 Thread Tom Lane
Alvaro Herrera  writes:
> Hmm, so why not revert the test only in the back branches, given that
> it's not so onerous in master?

I grow tired of repeating myself, but: it's purely accidental that this
test passes in master for the existing set of buildfarm members.

If I have to do so to prove my point, I will set up a buildfarm member
that uses USE_NAMED_POSIX_SEMAPHORES, and then insist that the patch
cope with that.

But the real issue is that the test is abusing max_files_per_process
to do something it was never intended for.  What it was intended for,
and works well at, is to constrain the total FD consumption of a
collection of backends.  It doesn't work well to constrain the maximum
allocatedDescs consumption, because there's too much variability in
our demand for other FDs.  If we feel that we should have a test that
is constraining that, then we need to invent some other mechanism to
do it with.  If we're not willing to invent an appropriate mechanism
to support the test, then we should drop the test, because a
half-baked test is worse than none.

An appropriate mechanism, perhaps, would be some way to constrain
max_safe_fds directly, without any platform- or environment-dependent
effects in the way.  It could be as simple as


/*
 * Take off the FDs reserved for system() etc.
 */
max_safe_fds -= NUM_RESERVED_FDS;

+   /*
+* Apply debugging limit, if defined.
+*/
+#ifdef MAX_SAFE_FDS_LIMIT
+   max_safe_fds = Min(max_safe_fds, MAX_SAFE_FDS_LIMIT);
+#endif
+
/*
 * Make sure we still have enough to get by.
 */

and then somebody who was concerned about this could run a buildfarm
member with "-DMAX_SAFE_FDS_LIMIT=10" or so.

regards, tom lane




Re: Recognizing superuser in pg_hba.conf

2020-01-09 Thread Robert Haas
On Thu, Jan 9, 2020 at 10:06 AM Tom Lane  wrote:
> What I'm basically objecting to is the pseudo-reservedness.  If we
> don't want to dodge that with special syntax, we should dodge it
> by making sure the keywords are actually reserved names.

You know, as I was reading this email, I got to thinking: aren't we
engineering a solution to a problem for which we already have a
solution?

The documentation says:

"Quoting one of the keywords in a database, user, or address field
(e.g., all or replication) makes the word lose its special character,
and just match a database, user, or host with that name."

So if you're writing a pg_hba.conf file that contains a specific role
name, and you want to make sure it doesn't get confused with a current
or future keyword, just quote it. If you don't quote it, make sure to
RTFM at the time and when upgrading.
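
For instance (an illustrative pg_hba.conf fragment, not from any real
setup):

    # unquoted "all" is the keyword and matches every role
    host    mydb    all      127.0.0.1/32    scram-sha-256
    # quoted "all" matches only a role literally named "all"
    host    mydb    "all"    127.0.0.1/32    scram-sha-256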

If you want to argue that this isn't the cleanest possible solution to
the problem, I think I would agree. If we were doing this over again,
we could probably design a better syntax for pg_hba.conf, perhaps one
where all specific role names have to be quoted and anything that's
not quoted is expected to be a keyword. But, as it is, nothing's
really broken here, and practical confusion is unlikely. If someone
has a role named "superuser", then it's probably a superuser account,
and the worst that will happen is that we'll match all superuser
accounts rather than only that one. If someone has a non-superuser
account called "superuser", or if they have an account named
"nonsuperuser," then, uh, that's lame, and if this patch causes them
to improve their choice of role names, that's good. If it causes them
to use quotes, that's also good.

But I think I'm coming around to the view that we're making what ought
to be a simple change complicated, and we should just not do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Recognizing superuser in pg_hba.conf

2020-01-09 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> What I'm basically objecting to is the pseudo-reservedness.  If we
> don't want to dodge that with special syntax, we should dodge it
> by making sure the keywords are actually reserved names.  In other
> words, add a "pg_" prefix, as somebody else suggested upthread.

Yes, that was my suggestion, and it was also my change a few major
versions ago that actually reserved the "pg_" prefix for roles.

> BTW, although that solution works for the immediate need of
> keywords that have to be distinguished from role names, it doesn't
> currently scale to keywords for the database column, because we
> don't treat "pg_" as a reserved prefix for database names:
> 
> regression=# create role pg_zit;
> ERROR:  role name "pg_zit" is reserved
> DETAIL:  Role names starting with "pg_" are reserved.
> regression=# create database pg_zit;
> CREATE DATABASE
> 
> Should we do so, or wait till there's an immediate need to?

I seem to recall (but it was years ago, so I might be wrong) advocating
that we should reserve the 'pg_' prefix for *all* object types.  All I
can recall is that there wasn't much backing for the idea (though I also
don't recall any specific objection, and it's also quite possible that
there was simply no response to the idea).

For my 2c, I'd *much* rather we reserve it across the board, and sooner
rather than later, since that would hopefully reduce the impact on people.  The
only justification for *not* reserving it is if we KNOW that we'll never
need a special one of those, but, well, we're well past that for
database names already; look at the fact that we've got a "replication"
one, for example.  Maybe we can't ever un-reserve that, but I like the
idea of reserving "pg_" for database names and then having
"pg_replication" be allowed to mean replication connections and then
encouraging users to use that instead.

Thanks,

Stephen




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-09 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 8, 2020 at 6:09 PM Stephen Frost  wrote:
> > > To me, this seems more accidental than the natural fallout of good design.
> >
> > I disagree that the privilege design for FDWs was accidental.
> 
> That seems like a stronger statement than the one I made...

Perhaps I misunderstood what you were referring to then.

> > What I see differently is that the purview of those decisions should be
> > that of the DB owner and not the superuser.
> 
> Yeah, I don't agree with that at all. If I'm a hosting provider, I
> want to tell my customers what they're allowed to run and what they're
> not allowed to run, but I don't want them to have to call me when they
> want to run an extension I've decided is OK.

Now I'm really floored because I've also been contemplating exactly the
situation of a hosting provider.  In my experience, there's a few pretty
common themes among the ones that I've played with:

- Users definitely don't get true PG superuser
- They certainly like users to be able to install trusted extensions
- They'd generally prefer to minimize the size of their fork

I've been feeling like the solution I'm pushing for would, in the end,
*reduce* the amount of code they have that's different from PG, which I
believe they'd see as an entirely good thing.  The approach proposed in
this thread seems likely to either increase the size of the fork or, at
best, be about the same size.

Now the question boils down to "what's trusted?" and if I'm a hosting
provider, it'd sure be nice to foist off the responsibility of figuring
that out on a community that I can trust that will do so at no cost to
me, and who will address any bugs or security holes in those trusted
extensions for me.

Realistically, we pretty much already do that for contrib and that's
really all that's relevant here; anything else would need to be
physically installed on the system in order for a user to be able to
install it, and I expect hosting providers to be pretty happy with a
solution that boils down to just having to install RPMs, or not (or the
equivalent with regard to putting files into the right places and such).

Now, if we're talking about defining repositories or allowing users to
upload their *own* C code from their systems in to PG and install that
as an extension, then, sure, that's gotta be limited to a superuser (and
wouldn't be allowed by hosting providers).  If the extension is
installed on the system though, and it's marked as trusted, then why do
we need a superuser to take some additional action to allow it to be
installed..?
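
To make that concrete, here is a sketch of the flow I have in mind,
assuming the proposed "trusted" marker in an extension's control file
(hstore is just a stand-in for whatever contrib extension we decide to
vet):

    # in hstore.control, as shipped with the installed package
    trusted = true

    -- then, as a non-superuser database owner:
    CREATE EXTENSION hstore;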

When it comes to the hosting provider case, I'd argue that what we're
missing here is a way for them to give their "not-quite-a-superuser
role" the ability to have certain capabilities (at install time, of
course)- a good example of that would be "allowed to make outbound
network connections", with a way to then be able to delegate out that
ability to others.  Then they'd actually be able to use things like
postgres_fdw and dblink after they're installed and without having to
get a superuser to grant them that ability post-install (and that order
of operations issue is a pretty key one- it's far simpler to set things
up and hand them over to the user than to have some operation that has
to happen as an actual superuser after the installation is done).

Even if we marked postgres_fdw as trusted, and used this default-role
based approach, we're almost certainly going to have the FDW itself be
owned by the bootstrap superuser and therefore whatever non-superuser
role that installs it wouldn't be able to actually use it without some
other changes.  I'd love to get to a point where you could have an
initially set up system with a not-quite-superuser role that would be
able to actually install AND use postgres_fdw, without having to fork
PG.

> > As it relates to things in contrib that could be classified as 'a pile
> > of crap' or 'too experimental'- that's *our* failing, and one which we
> > should accept and address instead of punting on it.
> 
> I don't think changing what's in contrib helps much. Even if we rm
> -rf'd it, there's the same problem with out-of-core extensions. Joe
> Extensionman may think his extension ought to be trusted, and package
> it as such, but Paula Skepticaldba is entitled to think Joe's view of
> the security risks originating from his code is overly rosy.

Out of core extensions have to get installed on to the system though,
they don't just show up magically, and lots and lots of folks out there
from corporate infrastructure groups to hosting providers have got lots
of experience with deciding what they'll allow to be installed on a
system and what they won't, what repositories of code they'll trust and
which they won't.

Of course, when it comes to contrib extensions, if we don't feel
comfortable with them and we don't want to spend the time to vet them,
we can certainly just leave them marked as untrusted.

Re: [Proposal] Global temporary tables

2020-01-09 Thread Konstantin Knizhnik




On 06.01.2020 14:01, Tomas Vondra wrote:

On Mon, Jan 06, 2020 at 01:04:15PM +0800, 曾文旌(义从) wrote:

In the previous communication

1 we agreed on the general direction
1.1 gtt use local (private) buffer
1.2 no replica access in first version



OK, good.


2 We feel that gtt needs to maintain statistics, but there is no
agreement on how it will be done.



I certainly agree GTT needs to maintain statistics, otherwise it'll lead
to poor query plans. AFAIK the current patch stores the info in a hash
table in backend-private memory, and I don't see how else to do that
(e.g. storing it in a catalog would cause catalog bloat).

FWIW this is one reason why I think just using shared buffers (instead of
local ones) is not sufficient to support parallel queries as proposed
by Alexander. The workers would not know the stats, breaking planning of
queries in PARALLEL SAFE plpgsql functions etc.



I do not think that an "all or nothing" approach is as good for software
development as it is for database transactions.
Yes, if we have a function in PL/pgSQL which performs queries on temporary
tables, then parallel workers may build inefficient plans for these queries
due to lack of statistics.
From my point of view this is not a pitfall of GTT but a result of the lack
of a global plan cache in Postgres, and it should not be fixed at the GTT level.
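
(For illustration only, a sketch of the scenario in question, assuming
the proposed global temporary table syntax from this thread; the names
are made up:

    CREATE GLOBAL TEMPORARY TABLE gtt_orders (id int, amount numeric);
    INSERT INTO gtt_orders SELECT g, g * 1.5 FROM generate_series(1, 100000) g;
    ANALYZE gtt_orders;  -- statistics stay in this backend's local cache

    CREATE FUNCTION total_amount() RETURNS numeric
    LANGUAGE sql PARALLEL SAFE
    AS $$ SELECT sum(amount) FROM gtt_orders $$;

    -- a parallel worker running total_amount() would plan the inner query
    -- without access to the backend-local statistics for gtt_orders
)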


Also, I have never seen real use cases with such functions, even in systems
which heavily use temporary tables and stored procedures.
But there are many other real problems with temp tables (besides those already
mentioned in this thread).

In PgPro/EE we have fixes for some of them, for example:

1. Do not reserve space in the file for temp relations. Right now, appending
to a relation causes a zero page to be written to disk by mdextend.
This causes useless disk IO for temp tables, which in most cases fit in
memory and should not be written to disk.


2. Implicitly analyze a temp table immediately after storing data in it.
Usually tables are analyzed by autovacuum in the background, but that
doesn't work for temp tables, which are not processed by autovacuum and
are accessed immediately after being filled with data; the lack of
statistics may cause a very inefficient plan to be built. We have an
online_analyze extension which forces an analyze of the table after
appending some bulk of data to it. It can be used for normal tables,
but it is most useful for temp relations.


Unlike the hypothetical example of a parallel-safe function working with
temp tables, these are real problems observed by some of our customers.
They apply both to local and global temp tables, and this is why
I do not want to discuss them in the context of GTT.





3 Still no one commented on GTT's transaction information processing, 
they include

3.1 Does gtt's frozenxid need to be cared about?
3.2 gtt’s clog clean
3.3 How to deal with "too old" gtt data



No idea what to do about this.



I wonder what is specific to GTT here?
The same problem exists for normal (local) temp tables, doesn't it?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Recognizing superuser in pg_hba.conf

2020-01-09 Thread Tom Lane
Simon Riggs  writes:
> On Wed, 8 Jan 2020 at 23:55, Vik Fearing 
> wrote:
>> What is being proposed is what is in the Subject and the original
>> patch.  The other patch is because Tom didn't like "the continuing creep
>> of pseudo-reserved database and user names" so I wrote a patch to mark
>> such reserved names and rebased my original patch on top of it.  Only
>> the docs changed in the rebase.  The original patch (or its rebase) is
>> what I am interested in.

> Hopefully there will be no danger of me gaining access if I have a crafted
> rolename?
> postgres=# create role "";
> CREATE ROLE

Well, the existence of such a role name wouldn't by itself cause any
change in the way that pg_hba.conf is parsed.  If you could then
persuade a superuser to insert a pg_hba.conf line that is trying
to match your username, the line might do something else than what the
superuser expected, which is bad.  But the *exact* same hazard applies
to proposals based on inventing pseudo-reserved keywords (by which
I mean things that look like names, but aren't reserved words, so that
somebody could create a role name matching them).  Either way, an
uninformed superuser could be tricked.

What I'm basically objecting to is the pseudo-reservedness.  If we
don't want to dodge that with special syntax, we should dodge it
by making sure the keywords are actually reserved names.  In other
words, add a "pg_" prefix, as somebody else suggested upthread.
I don't personally find that prettier than a punctuation prefix,
but I can live with it if other people do.

BTW, although that solution works for the immediate need of
keywords that have to be distinguished from role names, it doesn't
currently scale to keywords for the database column, because we
don't treat "pg_" as a reserved prefix for database names:

regression=# create role pg_zit;
ERROR:  role name "pg_zit" is reserved
DETAIL:  Role names starting with "pg_" are reserved.
regression=# create database pg_zit;
CREATE DATABASE

Should we do so, or wait till there's an immediate need to?

regards, tom lane




Re: pgsql: Add basic TAP tests for psql's tab-completion logic.

2020-01-09 Thread Christoph Berg
Re: Robert Haas 2020-01-09 

> > So that raises the question: why does xenial's version of libedit
> > not match either its documentation or the distributed source code?
> > Because it doesn't.
> 
> The level of effort you've put into this is extremely impressive, but
> I can't shake the feeling that you're going to keep finding issues in
> the test setup, the operating system, or the upstream libraries,
> rather than bugs in PostgreSQL. Maybe this is all just one-time
> stabilization effort, but...

Fwiw if libedit in xenial is Just Broken and fixing the tests would
just codify the brokenness without adding any value, I'll just disable
that test in that particular build. It looks like setting
with_readline=no on the make command line should work.

Christoph




Re: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2020-01-09 Thread Robert Haas
On Thu, Jan 9, 2020 at 5:30 AM Christoph Berg  wrote:
> I have some concerns about security, though. It's true that the
> sslcert/sslkey options can only be set/modified by superusers when
> "password_required" is set. But when password_required is not set, any
> user can create user mappings that reference arbitrary files on the
> server filesystem. I believe the options are still used in that case
> for creating connections, even when that means the remote server isn't
> set up for cert auth, which needs password_required=false to succeed.
>
> In short, I believe these options need explicit superuser checks.

I share the concern about the security issue here. I can't testify to
whether Christoph's whole analysis is correct here, but as a general point,
non-superusers can't be allowed to do things that cause the server to
access arbitrary local files.
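
Concretely, the worry is something along these lines (a hypothetical
example, assuming a foreign server with password_required set to false
and a non-superuser who has USAGE on it):

    CREATE USER MAPPING FOR CURRENT_USER SERVER remote_srv
        OPTIONS (sslcert '/some/local/file', sslkey '/some/local/key');

which would make the local server read those files when the connection
is attempted.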

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgsql: Add basic TAP tests for psql's tab-completion logic.

2020-01-09 Thread Robert Haas
On Tue, Jan 7, 2020 at 5:29 PM Tom Lane  wrote:
> So that raises the question: why does xenial's version of libedit
> not match either its documentation or the distributed source code?
> Because it doesn't.

The level of effort you've put into this is extremely impressive, but
I can't shake the feeling that you're going to keep finding issues in
the test setup, the operating system, or the upstream libraries,
rather than bugs in PostgreSQL. Maybe this is all just one-time
stabilization effort, but...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-01-09 Thread Alvaro Herrera
On 2020-Jan-09, Amit Kapila wrote:

> In HEAD, we have a guc variable 'logical_decoding_work_mem' by which
> we can control the memory usage of changes and we have used that, but
> for back branches, we don't have such a control.

> After the latest changes by Noah, the tern and mandrill both are
> green.   I will revert the test added by this patch unless there is
> some strong argument to keep it.

Hmm, so why not revert the test only in the back branches, given that
it's not so onerous in master?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Resolve Parallel Hash Join Performance Issue

2020-01-09 Thread Tom Lane
Thomas Munro  writes:
> Right, I see.  The funny thing is that the match bit is not even used
> in this query (it's used for right and full hash join, and those
> aren't supported for parallel joins yet).  Hmm.  So, instead of the
> test you proposed, an alternative would be to use if (!parallel).
> That's a value that will be constant-folded, so that there will be no
> branch in the generated code (see the pg_attribute_always_inline
> trick).  If, in a future release, we need the match bit for parallel
> hash join because we add parallel right/full hash join support, we
> could do it the way you showed, but only if it's one of those join
> types, using another constant parameter.

Can we base the test off the match type today, and avoid leaving
something that will need to be fixed later?

I'm pretty sure that the existing coding is my fault, and that it's
like that because I reasoned that setting the bit was too cheap
to justify having a test-and-branch around it.  Apparently that's
not true anymore in a parallel join, but I have to say that it's
unclear why.  In any case, the reasoning probably still holds good
in non-parallel cases, so it'd be a shame to introduce a run-time
test if we can avoid it.

regards, tom lane




Re: pgsql: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2020-01-09 Thread Christoph Berg
Re: Robert Haas 2020-01-09 

> Does this mean that a non-superuser can induce postgres_fdw to read an
> arbitrary file from the local filesystem?

Yes, see my comments in the "Allow 'sslkey' and 'sslcert' in
postgres_fdw user mappings" thread.

Christoph




Re: Fixing parallel make of libpq

2020-01-09 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> My thoughts about the patch:
>
> 1) Changing from an "|"-style dependency to a plain dependency seems
> like a semantics change.  I've never been totally clear on the
> difference though.  I think Peter introduced our use of the "|" style,
> so maybe he can comment.

Makefile.shlib puts $(SHLIB_PREREQS) after the "|":

$ grep SHLIB_PREREQS src/Makefile.shlib
# SHLIB_PREREQS Order-only prerequisites for library build target
$(stlib): $(OBJS) | $(SHLIB_PREREQS)
$(shlib): $(OBJS) | $(SHLIB_PREREQS)
$(shlib): $(OBJS) | $(SHLIB_PREREQS)
$(shlib): $(OBJS) | $(SHLIB_PREREQS)
$(shlib): $(OBJS) | $(SHLIB_PREREQS)
$(shlib): $(OBJS) $(DLL_DEFFILE) | $(SHLIB_PREREQS)


> 2) The same coding pattern is used in a bunch of other places, so if
> this spot is broken, there probably are a lot of others that need a
> similar change.  On the other hand, there may not be that many
> directories that are likely places to start a parallel build from,
> so maybe we don't care elsewhere.

Grepping the Makefiles for ':.*submake-' shows that they are on the
actual build artifact target, libpq was just the outlier having it on
the phony "all" target.  For example pg_basebackup:

pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
    $(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)

pg_receivewal: pg_receivewal.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
    $(CC) $(CFLAGS) pg_receivewal.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)

pg_recvlogical: pg_recvlogical.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
    $(CC) $(CFLAGS) pg_recvlogical.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)

- ilmari
-- 
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen




Re: pgsql: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2020-01-09 Thread Robert Haas
On Thu, Jan 9, 2020 at 3:11 AM Andrew Dunstan  wrote:
> Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings
>
> This allows different users to authenticate with different certificates.
>
> Author: Craig Ringer
>
> https://git.postgresql.org/pg/commitdiff/f5fd995a1a24e6571d26b1e29c4dc179112b1003

Does this mean that a non-superuser can induce postgres_fdw to read an
arbitrary file from the local filesystem?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Fixing parallel make of libpq

2020-01-09 Thread Tom Lane
ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes:
> Michael Paquier  writes:
>> Hmm.  That logically makes sense.  Isn't that a side effect of 7143b3e
>> then?  Now, FWIW, I am not able to reproduce it here, after trying on
>> two different machines, various parallel job numbers (up to 32), and a
>> couple of dozen attempts.  Perhaps somebody else can see the failures?

> It fails reliably for me on Debian Buster, with make 4.2.1-1.2and -j4.

Yeah, it's also reliable for me on Fedora 30:

$ make -s clean
$ make -s -j4 -C src/interfaces/libpq
/usr/bin/ld: cannot find -lpgcommon_shlib
/usr/bin/ld: cannot find -lpgport_shlib
collect2: error: ld returned 1 exit status
make: *** [../../../src/Makefile.shlib:293: libpq.so.5.13] Error 1
make: *** Waiting for unfinished jobs

On a RHEL6 box, the same test only draws a complaint about
-lpgcommon_shlib, so it does seem like there's some make version
dependency in here.  And of course the whole thing is a race condition
anyway, so naturally it's going to be pretty context-sensitive.

My thoughts about the patch:

1) Changing from an "|"-style dependency to a plain dependency seems
like a semantics change.  I've never been totally clear on the
difference though.  I think Peter introduced our use of the "|" style,
so maybe he can comment.
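
(If I recall correctly, an order-only prerequisite, i.e. anything to the
right of the "|", is only built before the target; unlike a plain
prerequisite, it never causes the target to be rebuilt when it is newer
than the target.  Roughly:

    target: normal-prereq | order-only-prereq

But someone should confirm that.)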

2) The same coding pattern is used in a bunch of other places, so if
this spot is broken, there probably are a lot of others that need a
similar change.  On the other hand, there may not be that many
directories that are likely places to start a parallel build from,
so maybe we don't care elsewhere.

regards, tom lane




Re: Add pg_file_sync() to adminpack

2020-01-09 Thread Julien Rouhaud
On Thu, Jan 9, 2020 at 7:31 AM Fujii Masao  wrote:
>
> On Mon, Jan 6, 2020 at 3:42 PM Michael Paquier  wrote:
> >
> > On Mon, Jan 06, 2020 at 03:20:13PM +0900, Arthur Zakirov wrote:
> > > That isn't the case if a file doesn't exist. But if there are no permissions on
> > > the file:
> > >
> > > PANIC:  could not open file "testfile": Permission denied
> > > server closed the connection unexpectedly
> > >
> > > It could be fixed by implementing a function like pg_file_sync_internal() 
> > > or
> > > by making the function fsync_fname_ext() external.
> >
> > The patch uses stat() to make sure that the file exists and has no
> > issues.  Though it could be a problem with any kind of TOCTOU-like
> > issues (looking at you, Windows, for ENOPERM), so I agree that it
> > would make more sense to use pg_fsync() here with a fd opened first.
>
> I agree that it's not good for pg_file_sync() to cause a PANIC.
> I updated the patch so that pg_file_sync() uses fsync_fname_ext()
> instead of fsync_fname() as Arthur suggested.
>
> It's one idea to make pg_file_sync() open the file and directly call
> pg_fsync(). But fsync_fname_ext() already has such code and
> I'd like to avoid the code duplication.

This looks good to me.

> Attached is the updated version of the patch.

+
+ pg_catalog.pg_file_sync(filename text)
+ boolean
+ 
+  Sync a file or directory
+ 
+

"Flush to disk" looks better than "sync" here.

+/* 
+ * pg_file_sync
+ *
+ * We REVOKE EXECUTE on the function from PUBLIC.
+ * Users can then grant access to it based on their policies.
+ */

I think that pg_write_server_files should be allowed to call that
function by default.
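
Something along these lines in the installation script, perhaps
(illustrative only):

    REVOKE EXECUTE ON FUNCTION pg_catalog.pg_file_sync(text) FROM PUBLIC;
    GRANT EXECUTE ON FUNCTION pg_catalog.pg_file_sync(text) TO pg_write_server_files;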




Re: Add pg_file_sync() to adminpack

2020-01-09 Thread Julien Rouhaud
On Thu, Jan 9, 2020 at 7:43 AM Fujii Masao  wrote:
>
> On Wed, Dec 25, 2019 at 11:11 PM Julien Rouhaud  wrote:
> >
> > On Wed, Dec 25, 2019 at 2:01 PM Fujii Masao  wrote:
> > >
> > > Hi,
> > >
> > > I'd like to propose to add pg_file_sync() function into contrib/adminpack.
> > > This function fsyncs the specified file or directory named by its 
> > > argument.
> > > IMO this is useful, for example, when you want to fsync the file that
> > > pg_file_write() writes out or that COPY TO exports the data into,
> > > for durability. Thought?
> >
> > +1, that seems like a useful wrapper.  Looking at existing functions,
> > I see that there's a pg_file_rename() in adminpack, but it doesn't use
> > durable_rename nor does it try to perform any fsync.  Same for
> > pg_file_unlink vs. durable_unlink.  It's probably worth fixing that at
> > the same time?
>
> I don't think that's a bug. I'm not sure if all users of those functions
> need durable rename and unlink at the expense of performance.
> So IMO it's better to add a new argument like "durable" to those functions
> so that durable_rename or durable_unlink is used only if it's true.

It's probably a POLA violation.  I'm pretty sure that most people
using those functions would expect that a successful call to
pg_file_unlink() means that the file cannot rise from the dead even
under certain unlikely circumstances; at least I'd expect so.  If
performance is a problem here, I'd rather have a new wrapper with a
sync flag that defaults to true, so it's possible to disable it if
needed, instead of calling a different function.  That being said, I
agree with Arthur: it should be handled in a different patch.
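
(To spell out the signature I have in mind, hypothetically:

    pg_file_unlink(filename text, sync boolean DEFAULT true) returns boolean

rather than adding a separate durable variant of the function.)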




Re: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2020-01-09 Thread Christoph Berg
Re: To Andrew Dunstan 2020-01-09 <20200109103014.ga4...@msg.df7cb.de>
> I believe the options are still used in that case
> for creating connections, even when that means the remote server isn't
> set up for cert auth, which needs password_required=false to succeed.

They are indeed:

stat("/var/lib/postgresql/.postgresql/root.crt", 0x7ffcff3e2bb0) = -1 ENOENT 
(Datei oder Verzeichnis nicht gefunden)
stat("/foo", 0x7ffcff3e2bb0)= -1 ENOENT (Datei oder Verzeichnis 
nicht gefunden)
   sslcert

I'm not sure if that could be exploited in any way, but let's just
forbid it.

Christoph




Re: Error message inconsistency

2020-01-09 Thread MBeena Emerson
Hi Mahendra,

Thanks for the patch.
I am not sure but maybe the relation name should also be added to the
following test case?

create table t4 (id int);
insert into t4 values (1);
ALTER TABLE t4 ADD CONSTRAINT c1 CHECK (id > 10) NOT VALID; -- succeeds
ALTER TABLE t4 VALIDATE CONSTRAINT c1;
*ERROR:  check constraint "c1" is violated by some row*
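
Perhaps it should report something along the lines of the other
constraint messages, e.g.:

*ERROR:  check constraint "c1" of relation "t4" is violated by some row*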

On Mon, 6 Jan 2020 at 18:31, Mahendra Singh Thalor 
wrote:

> On Sat, 6 Jul 2019 at 09:53, Amit Kapila  wrote:
> >
> > On Mon, Jul 1, 2019 at 10:05 PM Alvaro Herrera 
> wrote:
> > >
> > > Do we have an actual patch here?
> > >
> >
> > We have a patch, but it needs some more work like finding similar
> > places and change all of them at the same time and then change the
> > tests to adapt the same.
> >
>
> Hi all,
> Based on the above discussion, I tried to find all the places where we
> need to change the error for "not null constraint".  Amit Kapila pointed out
> one place; I changed the error there and am attaching the modified patch.
>
>
> *What does this patch do?*
> Before this patch, the "not-null constraint" error was not displaying the
> relation name in some cases, so the attached patch adds the relation
> name to the "not-null constraint" error in two places. I didn't change the
> expected output files of the test suite as we haven't finalized the error messages.
>
> I also verified Robert's point about partitioned tables. With the error, we
> are adding the relation name of the child table, and I think that is correct.
>
> Please review attached patch and let me know feedback.
>
> Thanks and Regards
> Mahendra Singh Thalor
> EnterpriseDB: http://www.enterprisedb.com
>


-- 
--
M Beena Emerson


Re: [HACKERS] Block level parallel vacuum

2020-01-09 Thread Sergei Kornilov
Hello

I noticed that parallel vacuum uses the min_parallel_index_scan_size GUC to skip
small indexes, but this is not mentioned in the documentation for either the
VACUUM command or the GUC itself.
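
For example (a sketch, assuming the patch's VACUUM (PARALLEL n) syntax):

    SET min_parallel_index_scan_size = '512kB';
    VACUUM (PARALLEL 2) some_table;
    -- indexes smaller than 512kB are not handed to parallel workers

so I think both the VACUUM page and the GUC description should mention this.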

+   /* Determine the number of parallel workers to launch */
+   if (lps->lvshared->for_cleanup)
+   {
+   if (lps->lvshared->first_time)
+   nworkers = lps->nindexes_parallel_cleanup +
+   lps->nindexes_parallel_condcleanup - 1;
+   else
+   nworkers = lps->nindexes_parallel_cleanup - 1;
+
+   }
+   else
+   nworkers = lps->nindexes_parallel_bulkdel - 1;

(lazy_parallel_vacuum_indexes)
Perhaps we need to add a comment for future readers explaining why we reduce
the number of workers by 1. Maybe this would be cleaner:

+   /* Determine the number of parallel workers to launch */
+   if (lps->lvshared->for_cleanup)
+   {
+   if (lps->lvshared->first_time)
+   nworkers = lps->nindexes_parallel_cleanup +
+   lps->nindexes_parallel_condcleanup;
+   else
+   nworkers = lps->nindexes_parallel_cleanup;
+
+   }
+   else
+   nworkers = lps->nindexes_parallel_bulkdel;
+
+   /* The leader process will participate */
+   nworkers--;

I have no more comments after reading the patches.

regards, Sergei




Re: [Proposal] Global temporary tables

2020-01-09 Thread Konstantin Knizhnik




On 06.01.2020 8:04, 曾文旌(义从) wrote:

In the previous communication

1 we agreed on the general direction
1.1 gtt use local (private) buffer
1.2 no replica access in first version

2 We feel that gtt needs to maintain statistics, but there is no agreement on
how it will be done.

3 Still no one commented on GTT's transaction information processing, they 
include
3.1 Does gtt's frozenxid need to be cared about?
3.2 gtt’s clog clean
3.3 How to deal with "too old" gtt data

I suggest we discuss further, reach an agreement, and merge the two patches to 
one.



I also hope that we can come to a common solution for GTT.
If we do not try to address parallel execution issues and access to temp
tables on replicas (and I agreed
that this should be avoided in the first version of the patch), then the
GTT patch becomes quite small.


The most complex and challenging task is supporting GTT for all kinds of
indexes. Unfortunately I cannot propose a good universal solution
for it.
Just patching all existing index implementations seems to be the only
choice.


Statistics are another important case.
But once again I do not completely understand why we want to address all
these statistics issues in the first version of the patch. It contradicts
the idea of making this patch as small as possible.
Also it seems to me that everybody agreed that users very rarely create
indexes for temp tables and explicitly analyze them.
So I think GTT will be useful even with limited statistics support. In
my version, statistics for GTT are provided by pushing the corresponding
information into the backend's cache for the pg_statistic table.
I also provided a pg_temp_statistic view so users can inspect it. The
idea of making pg_statistic a view which combines statistics for normal and
temporary tables is overkill from my point of view.


I do not understand why we need to maintain a hash with some extra
information for GTT in backend memory (as was done in Wenjing's patch).
Also the idea of using CREATE EXTENSION for accessing this information
seems dubious.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





  1   2   >