Re: pg_dump multi VALUES INSERT
On Wed, 23 Jan 2019 at 02:13, Surafel Temesgen wrote:
> okay .thank you for explaining. i attach a patch corrected as such

I did a bit of work on this to fix a bunch of things:

1. Docs for --rows-per-insert didn't mention anything about a parameter.
2. You'd not followed the alphabetical order in which the parameters are documented.
3. Various parts of the docs claimed that --inserts just inserted 1 row per statement. Those needed to be updated.
4. New options were out of order in --help. The rest were in alphabetical order.
5. The DumpOptions struct variable was not in the right place. It was grouped in with some parameterless options.
6. The code in dumpTableData_insert() was convoluted. I'm not sure what you had added end_of_statement for, or why you were checking PQntuples(res) == 0. You'd also made the number_of_row variable 1-based and set it to 1 when we had added 0 rows. You then checked for the existence of 1 row by checking the variable was > 1... That made very little sense to me. I've pretty much put back the code that I had sent to you previously, just without the part where I split the row-building code out into another function.
7. A comment in dumpTableData_insert() claimed that the insertStmt would end in "VALUES(", but it'll end in "VALUES ". I had updated that in my last version, but you must have missed it.
8. I've made --rows-per-insert imply --inserts. This is aligned with how --column-inserts behaves.

I changed a few other things. I simplified the condition that raises an error when someone uses --on-conflict-do-nothing without the --inserts option. There was no need to check for the other options that imply --inserts, since --inserts will already be enabled if one of those options was given. I also removed most of the error checking you'd added to ensure that the --rows-per-insert parameter was a number. I'd have kept this, but I saw that we do nothing that special for the compression option, and I didn't see why --rows-per-insert was any more special.
It was quite a bit of code for very little reward.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

pg_dump-rows-per-insert-option_v10.patch
Description: Binary data
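The row-batching logic under discussion (a 0-based row counter, a statement flushed every rows-per-insert rows, and no statement at all for an empty table) can be pictured outside of pg_dump's C code. This is a minimal Python sketch of that logic, not pg_dump's implementation; the function name is hypothetical:

```python
def make_insert_statements(table, rows, rows_per_insert=100):
    """Batch rows into multi-row INSERT statements: flush a statement
    every rows_per_insert rows, plus one final statement for any
    leftover rows.  An empty table emits no statement at all."""
    statements = []
    values = []
    for row in rows:
        values.append("(" + ", ".join(str(v) for v in row) + ")")
        if len(values) == rows_per_insert:
            statements.append("INSERT INTO %s VALUES %s;" % (table, ", ".join(values)))
            values = []
    if values:  # partial final batch
        statements.append("INSERT INTO %s VALUES %s;" % (table, ", ".join(values)))
    return statements
```

The 0-based counter (here, `len(values)`) is what makes the empty-table and partial-final-batch cases fall out naturally, which is the simplification described above.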
Re: Protect syscache from bloating with negative cache entries
At Mon, 21 Jan 2019 17:22:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190121.172255.226467552.horiguchi.kyot...@lab.ntt.co.jp>:

> An option is an additional PGPROC member and interface functions.
>
> struct PGPROC
> {
> ...
> int syscahe_usage_track_interval; /* track interval, 0 to disable */
>
> =# select syscahce_usage_track_add(, [, ]);
> =# select syscahce_usage_track_remove(2134);
>
> Or, just provide an one-shot triggering function.
>
> =# select syscahce_take_usage_track();
>
> This can use both a similar PGPROC variable or SendProcSignal()
> but the former doesn't fire while idle time unless using timer.

Attached is a revised version of this patchset, in which the third patch is the remote setting feature. It uses static shared memory.

=# select pg_backend_catcache_stats(, );

This activates or changes the catcache stats feature on the backend with the given PID. (The name should be changed to .._syscache_stats, though.) It is far smaller than the remote-GUC feature. (It contains a part that should be in the previous patch. I will fix that later.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From 067f8ad60f259453271d2bf8323505beb5b9e0a9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Tue, 16 Oct 2018 13:04:30 +0900
Subject: [PATCH 1/3] Remove entries that haven't been used for a certain time

Catcache entries can be left alone for several reasons, and it is not desirable that they eat up memory. This patch adds consideration of removing entries that haven't been used for a certain time before enlarging the hash array.
--- doc/src/sgml/config.sgml | 38 ++ src/backend/access/transam/xact.c | 5 + src/backend/utils/cache/catcache.c| 166 -- src/backend/utils/misc/guc.c | 23 src/backend/utils/misc/postgresql.conf.sample | 2 + src/include/utils/catcache.h | 28 - 6 files changed, 254 insertions(+), 8 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b6f5822b84..af3c52b868 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1662,6 +1662,44 @@ include_dir 'conf.d' + + syscache_memory_target (integer) + + syscache_memory_target configuration parameter + + + + +Specifies the maximum amount of memory to which syscache is expanded +without pruning. The value defaults to 0, indicating that pruning is +always considered. After exceeding this size, syscache pruning is +considered according to +. If you need to keep +certain amount of syscache entries with intermittent usage, try +increase this setting. + + + + + + syscache_prune_min_age (integer) + + syscache_prune_min_age configuration parameter + + + + +Specifies the minimum amount of unused time in seconds at which a +syscache entry is considered to be removed. -1 indicates that syscache +pruning is disabled at all. The value defaults to 600 seconds +(10 minutes). The syscache entries that are not +used for the duration can be removed to prevent syscache bloat. This +behavior is suppressed until the size of syscache exceeds +. 
+ + + + max_stack_depth (integer) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 18467d96d2..dbffec8067 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -733,7 +733,12 @@ void SetCurrentStatementStartTimestamp(void) { if (!IsParallelWorker()) + { stmtStartTimestamp = GetCurrentTimestamp(); + + /* Set this timestamp as aproximated current time */ + SetCatCacheClock(stmtStartTimestamp); + } else Assert(stmtStartTimestamp != 0); } diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index 8152f7e21e..ee40093553 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -72,9 +72,24 @@ #define CACHE6_elog(a,b,c,d,e,f,g) #endif +/* + * GUC variable to define the minimum size of hash to cosider entry eviction. + * This variable is shared among various cache mechanisms. + */ +int cache_memory_target = 0; + +/* GUC variable to define the minimum age of entries that will be cosidered to + * be evicted in seconds. This variable is shared among various cache + * mechanisms. + */ +int cache_prune_min_age = 600; + /* Cache management header --- pointer is NULL until created */ static CatCacheHeader *CacheHdr = NULL; +/* Timestamp used for any operation on caches. */ +TimestampTz catcacheclock = 0; + static inline HeapTuple SearchCatCacheInternal(CatCache *cache, int nkeys, Datum v1, Datum v2, @@ -491,6 +506,7 @@ CatCacheRemoveCTup(CatCache *cache, CatCTup *ct
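The pruning policy the patch's GUCs describe (consider eviction only once the cache exceeds syscache_memory_target, evict entries unused for longer than syscache_prune_min_age, and disable pruning entirely at -1) can be modeled in a few lines. This is a toy Python sketch with hypothetical names, not the catcache.c implementation; the real patch tracks bytes rather than entry counts and reuses the statement-start timestamp as a cheap approximate clock:

```python
import time

class PrunableCache:
    """Toy model of age-based cache pruning: entries carry a last-access
    timestamp; prune() evicts entries unused for longer than
    prune_min_age seconds, but only once the cache holds more than
    memory_target entries, and not at all when prune_min_age is -1."""

    def __init__(self, memory_target=0, prune_min_age=600):
        self.memory_target = memory_target
        self.prune_min_age = prune_min_age
        self.entries = {}                    # key -> (value, last_access)

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.entries[key] = (value, now)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        value, _ = self.entries[key]
        self.entries[key] = (value, now)     # bump the access clock
        return value

    def prune(self, now=None):
        """Would be called before enlarging the hash table."""
        now = time.monotonic() if now is None else now
        if self.prune_min_age < 0 or len(self.entries) <= self.memory_target:
            return 0
        stale = [k for k, (_, last) in self.entries.items()
                 if now - last > self.prune_min_age]
        for k in stale:
            del self.entries[k]
        return len(stale)
```

The point of checking the size threshold first is the one the docs above make: a workload that intermittently touches a known set of entries can keep them resident by raising the target, while pruning still bounds growth past it.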
Analyze all plans
Hi,

I'm working on an extension which analyzes all possible plans generated by the planner. I believe this extension would be useful for benchmarking the planner (e.g. the performance of the estimation and the cost model) and for better understanding the cases where the planner makes a suboptimal move.

Here are my procedures:

1. Enumerate all the plans
   1.1 Create a hook for add_path so that it won't discard the expensive paths from the planner's point of view.
   1.2 Collect all the paths from final_rel->pathlist, and turn them into PlannedStmt nodes by patching standard_planner.
   1.3 Mark the cheapest path from the planner's point of view.
2. Explain all the plans
   2.1 First explain the cheapest plan
       2.1.1 If analyzing, collect the execution time and use it to set a timer interrupt.
   2.2 Explain the remaining plans
       2.2.1 The other plans can be disastrous; they may never finish in a reasonable amount of time. If analyzing, the timer interrupt shall stop the executor.
       2.2.2 Move on to the next plan.

Are those procedures reasonable? I'm able to implement all the steps except for 2.2.1.

- Attempt 1: Our most common way of handling timeouts is to kill the process. However, that would terminate the entire explain statement while there are still plans to be explained.

- Attempt 2: Fork before explain, so it would be possible to kill the child process without disrupting the explain statement. However, simply allocating shared memory for the QueryDesc does not work (PANIC: failed to re-find shared lock object). To me this plan is more doable, but it seems there are other mechanisms I need to be aware of in order to execute a plan in the child process and report the result in the parent process.

What do you think? I will appreciate any feedback.

Thank you,
Donald Dong
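One way to picture the timer-interrupt idea in steps 2.1 and 2.2 is an alarm-driven loop: time the cheapest plan first, derive a budget from it, and let an alarm signal abort any other plan that overruns. The sketch below is Python with hypothetical names, Unix-only (it relies on signal.alarm), and is not the actual extension code; note that an in-process timer cannot interrupt code stuck in uninterruptible native code, which is one argument for the fork approach of Attempt 2:

```python
import signal
import time

class PlanTimeout(Exception):
    pass

def explain_all(plans, run_plan, budget_factor=10.0):
    """Run plans[0] (the planner's cheapest) untimed, then give every
    remaining plan a SIGALRM budget derived from the cheapest runtime.
    A plan that overruns is recorded as None and we move on (2.2.2)."""
    def on_alarm(signum, frame):
        raise PlanTimeout

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    results = {}
    try:
        start = time.monotonic()
        results[0] = run_plan(plans[0])              # 2.1: cheapest plan
        elapsed = time.monotonic() - start
        budget = max(1, int(elapsed * budget_factor))  # alarm() wants whole seconds
        for i, plan in enumerate(plans[1:], start=1):
            signal.alarm(budget)                     # 2.2.1: arm the timer
            try:
                results[i] = run_plan(plan)          # may be disastrous
            except PlanTimeout:
                results[i] = None                    # never finished in budget
            finally:
                signal.alarm(0)                      # disarm before next plan
    finally:
        signal.signal(signal.SIGALRM, old_handler)
    return results
```

The budget_factor and the one-second floor are arbitrary illustration choices, not anything from the proposal.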
yet another comment typo patch
An assorted set of comment typos is fixed in the attached patch.

-- 
Fabien.

diff --git a/contrib/pgcrypto/pgp-decrypt.c b/contrib/pgcrypto/pgp-decrypt.c index 7d31e5354b..96c344c30b 100644 --- a/contrib/pgcrypto/pgp-decrypt.c +++ b/contrib/pgcrypto/pgp-decrypt.c @@ -132,7 +132,7 @@ pgp_parse_pkt_hdr(PullFilter *src, uint8 *tag, int *len_p, int allow_ctx) int res; uint8 *p; - /* EOF is normal here, thus we dont use GETBYTE */ + /* EOF is normal here, thus we don't use GETBYTE */ res = pullf_read(src, 1, &p); if (res < 0) return res; diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index d85a83abe9..961838ed93 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -2771,7 +2771,7 @@ estimate_path_cost_size(PlannerInfo *root, run_cost += cpu_tuple_cost * numGroups; run_cost += ptarget->cost.per_tuple * numGroups; - /* Accout for the eval cost of HAVING quals, if any */ + /* Account for the eval cost of HAVING quals, if any */ if (root->parse->havingQual) { QualCost remote_cost; diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c index fb37e2b6c4..b18ae2b3ed 100644 --- a/src/backend/access/gin/ginget.c +++ b/src/backend/access/gin/ginget.c @@ -442,7 +442,7 @@ restartScanEntry: /* * Lock the entry leaf page. This is more coarse-grained than * necessary, because it will conflict with any insertions that - * land on the same leaf page, not only the exacty key we searched + * land on the same leaf page, not only the exact key we searched * for. But locking an individual tuple would require updating * that lock whenever it moves because of insertions or vacuums, * which seems too complicated. 
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index d793efdd9c..5a7206f588 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -1873,7 +1873,7 @@ CheckOpSlotCompatibility(ExprEvalStep *op, TupleTableSlot *slot) /* * Should probably fixed at some point, but for now it's easier to allow - * buffer and heap tuples to be used interchangably. + * buffer and heap tuples to be used interchangeably. */ if (slot->tts_ops == &TTSOpsBufferHeapTuple && op->d.fetch.kind == &TTSOpsHeapTuple) diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c index d6cfd28ddc..ceea3d6d72 100644 --- a/src/backend/executor/execParallel.c +++ b/src/backend/executor/execParallel.c @@ -1049,7 +1049,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate, MemoryContextAllocZero(planstate->state->es_query_cxt, sizeof(JitInstrumentation)); combined = planstate->state->es_jit_worker_instr; - /* Accummulate all the workers' instrumentations. */ + /* Accumulate all the workers' instrumentations. */ for (n = 0; n < shared_jit->num_workers; ++n) InstrJitAgg(combined, &shared_jit->jit_instr[n]); diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 0c0891b33e..e773f20d9f 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -2243,7 +2243,7 @@ check_log_duration(char *msec_str, bool was_logged) /* * Do not log if log_statement_sample_rate = 0. Log a sample if - * log_statement_sample_rate <= 1 and avoid unecessary random() call + * log_statement_sample_rate <= 1 and avoid unnecessary random() call * if log_statement_sample_rate = 1. But don't compute any of this * unless needed. 
*/ diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c index ba28611d8c..f09e3a9aff 100644 --- a/src/backend/utils/cache/inval.c +++ b/src/backend/utils/cache/inval.c @@ -8,7 +8,7 @@ * When a tuple is updated or deleted, our standard visibility rules * consider that it is *still valid* so long as we are in the same command, * ie, until the next CommandCounterIncrement() or transaction commit. - * (See acces/heap/heapam_visibility.c, and note that system catalogs are + * (See access/heap/heapam_visibility.c, and note that system catalogs are * generally scanned under the most current snapshot available, rather than * the transaction snapshot.) At the command boundary, the old tuple stops * being valid and the new version, if any, becomes valid. Therefore, diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c index bca2dc118d..2fb7dd3b8e 100644 --- a/src/backend/utils/cache/typcache.c +++ b/src/backend/utils/cache/typcache.c @@ -151,7 +151,7 @@ typedef struct RecordCacheEntry /* * To deal with non-anonymous record types that are exchanged by backends - * involved in a parallel query, we also need a shared verion of the above. + * involved in a parallel query, we also need a shared version of the above. */ struct SharedRecordTypmodRegistry { diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pg
Re: Analyze all plans
On Wed, Jan 23, 2019 at 9:44 AM Donald Dong wrote:
>
> 1. Enumerate all the plans
>

Not sure this is going to work, because the total number of possible plans is somewhere around O(n!), if I'm not mistaken: the number of joined relations, times the available access methods, times the possible join strategies. So enumerating all possible plans stops being practical for even slightly complicated queries.

Regards,

-- 
Alex
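The combinatorial blow-up Alex describes is easy to check with a back-of-envelope calculation. A small Python sketch (the bushy-tree count n! * Catalan(n-1) = (2n-2)!/(n-1)! is a standard result; this is before even multiplying by access methods and join strategies):

```python
from math import factorial

def left_deep_orders(n):
    """Join orders when only left-deep trees are considered: n!."""
    return factorial(n)

def bushy_orders(n):
    """All ordered binary (bushy) join trees over n relations:
    n! * Catalan(n-1) = (2n-2)! / (n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)

for n in (2, 4, 8, 12):
    print(n, left_deep_orders(n), bushy_orders(n))
```

Already at 8 relations there are over 17 million bushy join orderings, which is why exhaustive enumeration stops being practical so quickly.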
Re: pg_dump multi VALUES INSERT
Hello David & Surafel,

About this v10: the patch applies and compiles cleanly, and local & global "make check" are ok.

A few comments, possibly redundant with some already in the thread.

The rows-per-insert option is out of abc-order in the getopt struct.

The help string does not specify the expected argument.

I still think that the added rows_per_insert field is useless: ISTM that "int dump_inserts" can be reused, and you could also drop the boolean got_rows_per_insert.

The feature is not tested anywhere. I still think that there should be a test on empty/small/larger-than-rows-per-insert tables, possibly added to existing TAP tests.

-- 
Fabien.
inherited primary key misbehavior
Hi,

It seems that ATExecDetachPartition fails to mark a child's primary key constraint entry (if any) as local (conislocal=true, coninhcount=0), which causes this:

create table p (a int primary key) partition by list (a);
create table p2 partition of p for values in (2);
alter table p detach partition p2;
alter table p2 drop constraint p2_pkey;
ERROR:  cannot drop inherited constraint "p2_pkey" of relation "p2"

select conname, conislocal, coninhcount from pg_constraint where conrelid = 'p2'::regclass;
 conname │ conislocal │ coninhcount
─────────┼────────────┼─────────────
 p2_pkey │ f          │           1
(1 row)

How about the attached patch?

Thanks,
Amit

From 42cb636ce6acee717fa19f6b69bf40aacc402852 Mon Sep 17 00:00:00 2001
From: amit
Date: Wed, 23 Jan 2019 18:33:09 +0900
Subject: [PATCH] Detach primary key constraint when detaching partition

--- src/backend/commands/tablecmds.c | 11 +++ src/test/regress/expected/indexing.out | 9 + src/test/regress/sql/indexing.sql | 9 + 3 files changed, 29 insertions(+) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 28a137bb53..897f74119d 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -15094,6 +15094,7 @@ ATExecDetachPartition(Relation rel, RangeVar *name) foreach(cell, indexes) { Oid idxid = lfirst_oid(cell); + Oid constrOid; Relationidx; if (!has_superclass(idxid)) @@ -15106,6 +15107,16 @@ ATExecDetachPartition(Relation rel, RangeVar *name) IndexSetParentIndex(idx, InvalidOid); update_relispartition(classRel, idxid, false); index_close(idx, NoLock); + + /* Detach any associated constraint as well. 
*/ + if (!idx->rd_index->indisprimary) + continue; + constrOid = get_relation_idx_constraint_oid(RelationGetRelid(partRel), + idxid); + if (!OidIsValid(constrOid)) + elog(ERROR, "missing pg_constraint entry of index %u of %u", +idxid, RelationGetRelid(partRel)); + ConstraintSetParentConstraint(constrOid, InvalidOid); } table_close(classRel, RowExclusiveLock); diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out index 118f2c78df..ebf86d7bfc 100644 --- a/src/test/regress/expected/indexing.out +++ b/src/test/regress/expected/indexing.out @@ -1414,3 +1414,12 @@ DETAIL: Key (a)=(4) already exists. create unique index on covidxpart (b) include (a); -- should fail ERROR: insufficient columns in UNIQUE constraint definition DETAIL: UNIQUE constraint on table "covidxpart" lacks column "a" which is part of the partition key. +-- check that detaching a partition also detaches the primary key constraint +create table parted_pk_detach_test (a int primary key) partition by list (a); +create table parted_pk_detach_test1 partition of parted_pk_detach_test for values in (1); +alter table parted_pk_detach_test detach partition parted_pk_detach_test1; +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey; +alter table parted_pk_detach_test attach partition parted_pk_detach_test1 for values in (1); +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey;-- should fail +ERROR: cannot drop inherited constraint "parted_pk_detach_test1_pkey" of relation "parted_pk_detach_test1" +drop table parted_pk_detach_test; diff --git a/src/test/regress/sql/indexing.sql b/src/test/regress/sql/indexing.sql index d4a64c18c7..c110f0bdb8 100644 --- a/src/test/regress/sql/indexing.sql +++ b/src/test/regress/sql/indexing.sql @@ -757,3 +757,12 @@ alter table covidxpart attach partition covidxpart4 for values in (4); insert into covidxpart values (4, 1); insert into covidxpart values (4, 1); create unique index on 
covidxpart (b) include (a); -- should fail + +-- check that detaching a partition also detaches the primary key constraint +create table parted_pk_detach_test (a int primary key) partition by list (a); +create table parted_pk_detach_test1 partition of parted_pk_detach_test for values in (1); +alter table parted_pk_detach_test detach partition parted_pk_detach_test1; +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey; +alter table parted_pk_detach_test attach partition parted_pk_detach_test1 for values in (1); +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey;-- should fail +drop table parted_pk_detach_test; -- 2.11.0
Re: Proposal for Signal Detection Refactoring
Attached is a new signal handling patch. The path is corrected and moved. The documentation is slightly streamlined in some places and expanded in others. Feedback requested.

-- 
Best Regards,
Chris Travers
Head of Database
Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

signal_readme_patch.diff
Description: Binary data
Almost bug in COPY FROM processing of GB18030 encoded input
Hi,

I happened to notice that when CopyReadLineText() calls mblen(), it passes only the first byte of multi-byte characters. However, pg_gb18030_mblen() looks at the first and the second byte. CopyReadLineText() always passes \0 as the second byte, so pg_gb18030_mblen() will incorrectly report the length of 4-byte encoded characters as 2. It works out fine, though, because the second half of a 4-byte encoded character always looks like another 2-byte encoded character in GB18030. CopyReadLineText() is looking for delimiter and escape characters and newlines, and only single-byte characters are supported for those, so treating a 4-byte character as two 2-byte characters is harmless.

Attached is a patch to explain that in the comments. Grepping for mblen(), I didn't find any other callers that used mblen() like that.

- Heikki

>From b0a09c240a2b4d1120433828ca052a2542ff362d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Wed, 23 Jan 2019 13:04:05 +0200
Subject: [PATCH 1/1] Fix comments that claimed that mblen() only looks at first byte.

GB18030's mblen() function looks at the first and the second byte of the multibyte character, to determine its length. copy.c had made the assumption that mblen() only looks at the first byte, but it turns out to work out fine, because of the way the GB18030 encoding works. COPY will see a 4-byte encoded character as two 2-byte encoded characters, which is enough for COPY's purposes. It cannot mix those up with delimiter or escaping characters, because only single-byte characters are supported as delimiters or escape characters. 
--- src/backend/commands/copy.c | 7 ++- src/backend/utils/mb/wchar.c | 32 +--- 2 files changed, 31 insertions(+), 8 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index a61a628471..5074ce0daa 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -4073,9 +4073,14 @@ not_end_of_copy: { int mblen; + /* + * It is enough to look at the first byte in all our encodings, to + * get the length. (GB18030 is a bit special, but still works for + * our purposes; see comment in pg_gb18030_mblen()) + */ mblen_str[0] = c; - /* All our encodings only read the first byte to get the length */ mblen = pg_encoding_mblen(cstate->file_encoding, mblen_str); + IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(mblen - 1); IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1); raw_buf_ptr += mblen - 1; diff --git a/src/backend/utils/mb/wchar.c b/src/backend/utils/mb/wchar.c index a5fdda456e..8e5116dfc1 100644 --- a/src/backend/utils/mb/wchar.c +++ b/src/backend/utils/mb/wchar.c @@ -15,16 +15,23 @@ /* - * conversion to pg_wchar is done by "table driven." - * to add an encoding support, define mb2wchar_with_len(), mblen(), dsplen() - * for the particular encoding. Note that if the encoding is only - * supported in the client, you don't need to define - * mb2wchar_with_len() function (SJIS is the case). + * Operations on multi-byte encodings are driven by a table of helper + * functions. + * + * To add an encoding support, define mblen(), dsplen() and verifier() for + * the encoding. For server-encodings, also define mb2wchar() and wchar2mb() + * conversion functions. * * These functions generally assume that their input is validly formed. * The "verifier" functions, further down in the file, have to be more - * paranoid. We expect that mblen() does not need to examine more than - * the first byte of the character to discover the correct length. + * paranoid. 
+ * + * We expect that mblen() does not need to examine more than the first byte + * of the character to discover the correct length. GB18030 is an exception + * to that rule, though, as it also looks at second byte. But even that + * behaves in a predictable way, if you only pass the first byte: it will + * treat 4-byte encoded characters as two 2-byte encoded characters, which is + * good enough for all current uses. * * Note: for the display output of psql to work properly, the return values * of the dsplen functions must conform to the Unicode standard. In particular @@ -1073,6 +1080,17 @@ pg_uhc_dsplen(const unsigned char *s) * GB18030 * Added by Bill Huang , */ + +/* + * Unlike all other mblen() functions, this also looks at the second byte of + * the input. However, if you only pass the first byte of a multi-byte + * string, and \0 as the second byte, this still works in a predictable way: + * a 4-byte character will be reported as two 2-byte characters. That's + * enough for all current uses, as a client-only encoding. It works that + * way, because in any valid 4-byte GB18030-encoded character, the third and + * fourth byte look like a 2-byte encoded character, when looked at + * separately. + */ static int pg_gb18030_mblen(const unsigned char *s) { -- 2.20.1
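The property the comments describe can be demonstrated with Python's own gb18030 codec. This toy re-implementation of the length rule is a sketch, not the C function; it follows the encoding rule that a lead byte below 0x81 is single-byte, and an 0x81-0xFE lead byte starts a 4-byte character exactly when the second byte is a digit 0x30-0x39:

```python
def gb18030_mblen(s: bytes) -> int:
    """Toy version of the rule pg_gb18030_mblen() follows.  When the
    caller supplies only the first byte (COPY effectively passes \\0 as
    the second byte), a 4-byte character is reported as 2 bytes."""
    if s[0] < 0x81:
        return 1
    second = s[1] if len(s) > 1 else 0   # missing second byte acts like \0
    return 4 if 0x30 <= second <= 0x39 else 2

# Any character outside the BMP takes 4 bytes in GB18030.
c = "\U0001D11E".encode("gb18030")
assert len(c) == 4
assert gb18030_mblen(c) == 4         # with both bytes visible: correct
assert gb18030_mblen(c[:1]) == 2     # first byte only: reported as 2...
assert gb18030_mblen(c[2:3]) == 2    # ...and the second half as another 2
```

Stepping 2 + 2 bytes lands on the start of the next character either way, which is why COPY's single-byte delimiter and escape scanning stays correct.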
Re: postgres_fdw: oddity in costing aggregate pushdown paths
(2019/01/04 20:33), Etsuro Fujita wrote:
> Here is a new version of the patch.

Attached is an updated version of the patch. Changes:

* Fixed a stupid bug in the case when use_remote_estimate
* Fixed a typo in a comment I added
* Modified comments I added a little bit further
* Added the commit message

If there are no objections, I'll push that.

Best regards,
Etsuro Fujita

>From 3ff65a6f05a330140bf5de94fbcf6ab2a04f040b Mon Sep 17 00:00:00 2001
From: Etsuro Fujita
Date: Wed, 23 Jan 2019 20:30:30 +0900
Subject: [PATCH] postgres_fdw: Account for tlist eval costs in estimate_path_cost_size()

Previously, estimate_path_cost_size() didn't account for tlist eval costs, except when costing a foreign-grouping path using local statistics, but such costs should be accounted for when costing that path using remote estimates, because some of the tlist expressions might be evaluated locally. Also, such costs should be accounted for in the case of a foreign-scan or foreign-join path, because currently, postgres_fdw evaluates PlaceHolderVars (if any) locally. This also fixes an oversight in my commit f8f6e44676ef38fee7a5bbe4f256a34ea7799ac1. Like that commit, apply this to HEAD only to avoid destabilizing existing plan choices. 
Author: Etsuro Fujita Discussion: https://postgr.es/m/5BFD3EAD.2060301%40lab.ntt.co.jp --- contrib/postgres_fdw/postgres_fdw.c | 41 +++-- 1 file changed, 33 insertions(+), 8 deletions(-) diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index d85a83abe9..f460fd2ec4 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -2499,6 +2499,9 @@ estimate_path_cost_size(PlannerInfo *root, Cost total_cost; Cost cpu_per_tuple; + /* Make sure the core code has set up the relation's reltarget */ + Assert(foreignrel->reltarget); + /* * If the table or the server is configured to use remote estimates, * connect to the foreign server and execute EXPLAIN to estimate the @@ -2576,6 +2579,24 @@ estimate_path_cost_size(PlannerInfo *root, cost_qual_eval(&local_cost, local_param_join_conds, root); startup_cost += local_cost.startup; total_cost += local_cost.per_tuple * retrieved_rows; + + /* + * Add in tlist eval cost for each output row. In case of an + * aggregate, some of the tlist expressions such as grouping + * exressions will be evaluated remotely, so adjust the costs. 
+ */ + startup_cost += foreignrel->reltarget->cost.startup; + total_cost += foreignrel->reltarget->cost.startup; + total_cost += foreignrel->reltarget->cost.per_tuple * rows; + if (IS_UPPER_REL(foreignrel)) + { + QualCost tlist_cost; + + cost_qual_eval(&tlist_cost, fdw_scan_tlist, root); + startup_cost -= tlist_cost.startup; + total_cost -= tlist_cost.startup; + total_cost -= tlist_cost.per_tuple * rows; + } } else { @@ -2674,19 +2695,19 @@ estimate_path_cost_size(PlannerInfo *root, nrows = clamp_row_est(nrows * fpinfo->joinclause_sel); run_cost += nrows * remote_conds_cost.per_tuple; run_cost += fpinfo->local_conds_cost.per_tuple * retrieved_rows; + + /* Add in tlist eval cost for each output row */ + startup_cost += foreignrel->reltarget->cost.startup; + run_cost += foreignrel->reltarget->cost.per_tuple * rows; } else if (IS_UPPER_REL(foreignrel)) { PgFdwRelationInfo *ofpinfo; - PathTarget *ptarget = foreignrel->reltarget; AggClauseCosts aggcosts; double input_rows; int numGroupCols; double numGroups = 1; - /* Make sure the core code set the pathtarget. */ - Assert(ptarget != NULL); - /* * This cost model is mixture of costing done for sorted and * hashed aggregates in cost_agg(). We are not sure which @@ -2750,26 +2771,22 @@ estimate_path_cost_size(PlannerInfo *root, * Startup cost includes: * 1. Startup cost for underneath input relation * 2. Cost of performing aggregation, per cost_agg() - * 3. Startup cost for PathTarget eval *- */ startup_cost = ofpinfo->rel_startup_cost; startup_cost += aggcosts.transCost.startup; startup_cost += aggcosts.transCost.per_tuple * input_rows; startup_cost += (cpu_operator_cost * numGroupCols) * input_rows; - startup_cost += ptarget->cost.startup; /*- * Run time cost includes: * 1. Run time cost of underneath input relation * 2. Run time cost of performing aggregation, per cost_agg() - * 3. 
PathTarget eval cost for each output row *- */ run_cost = ofpinfo->rel_total_cost - ofpinfo->rel_startup_cost; run_cost += aggcosts.finalCost * numGroups; run_cost += cpu_tuple_cost * numGroups; - run_cost += ptarget->cost.per_tuple * numGroups; /* Accout for the eval cost of HAVING quals, if any */ if (root->parse->havingQual) @@ -2784,6 +2801,10 @@ estimate_path_cost_size(PlannerInfo *root, startup_cost += fpinfo->local_conds_cost.startup; run_cost +
Re: yet another comment typo patch
On Wed, Jan 23, 2019 at 2:18 PM Fabien COELHO wrote:
>
> An assorted set of comment typos is fixed in the attached patch.

The patch looks good to me. I will take another pass over it and commit.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: yet another comment typo patch
On 23/01/2019 13:51, Amit Kapila wrote:
> On Wed, Jan 23, 2019 at 2:18 PM Fabien COELHO wrote:
>> An assorted set of comment typos is fixed in the attached patch.
>
> The patch looks good to me. I will take another pass over it and commit.

I just committed this. Sorry, I didn't see your mail until just afterwards.

- Heikki
Re: yet another comment typo patch
On Wed, Jan 23, 2019 at 5:28 PM Heikki Linnakangas wrote:
> On 23/01/2019 13:51, Amit Kapila wrote:
> > On Wed, Jan 23, 2019 at 2:18 PM Fabien COELHO wrote:
> >>
> >> An assorted set of comment typos is fixed in the attached patch.
> >
> > The patch looks good to me. I will take another pass over it and commit.
>
> I just committed this. Sorry, I didn't see your mail until just afterwards.

No problem, thanks!

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Sun, Jan 20, 2019 at 5:19 AM John Naylor wrote:
>
> I have a test for in-range and out-of-range for each relation fork.
>

I think the first two patches, (a) the removal of dead code in bootstrap and (b) the core patch to avoid creation of the FSM file for small tables, are good now. I have prepared the patches along with commit messages. There is no change except for some changes in the README and the commit message of the second patch. Kindly let me know what you think about them.

I think these two patches can go in even without the upgrade patch (during pg_upgrade, conditionally skip transfer of FSMs), which is still under discussion. However, I am not in a hurry if you or others think that the upgrade patch must be committed along with the second patch. I think the upgrade patch is generally on track but might need some more review.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

v02-0001-In-bootstrap-mode-don-t-allow-the-creation-of-files-.patch
Description: Binary data

v17-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data
Re: postgres on a non-journaling filesystem
On 23/01/2019 01:03, maayan mordehai wrote:
> hello, I'm Maayan, I'm in a DBA team that uses postgresql. I saw in the
> documentation on WALs: https://www.postgresql.org/docs/10/wal-intro.html
> in the tip box that it's better not to use a journaling filesystem, and I
> wanted to ask how it works? can't we get corruption that we can't recover
> from? I mean, what if postgres is in the middle of a write to a WAL and
> there is a crash, and it didn't finish? I'm assuming it will detect it
> when we start postgres and write that it was rolled back, am I right?

Yep, any half-written transactions will be rolled back.

> and how does it work at the data level? if some of the 8k block is
> written but not all of it, and then there is a crash, how does postgres
> deal with it?

The first time a block is modified after a checkpoint, a copy of the block is written to the WAL. At crash recovery, the block is restored from the WAL. This mechanism is called "full page writes". The WAL works just like the journal in a journaling filesystem. That's why it's not necessary to have journaling at the filesystem level.

- Heikki
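The full-page-writes mechanism Heikki describes can be modeled as a tiny replay loop. This is a toy Python sketch of the idea only (record shapes and names are hypothetical, not PostgreSQL's WAL format): the first record touching a page after a checkpoint carries a full-page image, so a torn (half-written) page on disk is simply overwritten during recovery.

```python
def replay(pages, wal):
    """Toy WAL replay: each record is either ('fpi', page_no, image),
    a full-page image written on the page's first modification after a
    checkpoint, or ('update', page_no, offset, data), an incremental
    change applied on top of it."""
    for rec in wal:
        if rec[0] == "fpi":
            _, page_no, image = rec
            pages[page_no] = bytearray(image)          # whole-page restore
        else:
            _, page_no, offset, data = rec
            pages[page_no][offset:offset + len(data)] = data
    return pages
```

Because recovery never reads the possibly-torn on-disk contents of a page whose first post-checkpoint record is a full-page image, the filesystem's journal adds nothing for data-page consistency.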
Re: pg_dump multi VALUES INSERT
On Wed, 23 Jan 2019 at 22:08, Fabien COELHO wrote:
> Out of abc-order rows-per-inserts option in getopt struct.

I missed that. Thanks. Fixed in attached.

> help string does not specify the expected argument.

Also fixed.

> I still think that the added rows_per_insert field is useless, ISTM that
> "int dump_inserts" can be reused, and you could also drop boolean
> got_rows_per_insert.

I thought about this and looked into it, but I decided it didn't look like a smart thing to do. The reason is that if --inserts sets dump_inserts to 1 and then --rows-per-insert sets it to something else, that's likely fine; but if that happens in the opposite order, then the --rows-per-insert value gets overwritten with 1. The bad news is that the order in which that happens is defined by the order of the command-line args. It might be possible to make it work by having --inserts set some other variable, then set dump_inserts to 1 if it's set to 0 and the other variable is set to >= 1... but that requires another variable, which is what you want to avoid... I think it's best to have a variable per argument.

I could get rid of the got_rows_per_insert variable, but it would require setting the default value for rows_per_insert in the main() function rather than in InitDumpOptions(). I thought InitDumpOptions() looked like just the place to do this, so went with that option. To make it work without got_rows_per_insert, rows_per_insert would have to be 0 by default and we'd know we saw a --rows-per-insert command-line arg by the fact that rows_per_insert was non-zero. Would you rather have it that way?

> The feature is not tested anywhere. I still think that there should be a
> test on empty/small/larger-than-rows-per-insert tables, possibly added to
> existing TAP-tests.

I was hoping to get away with not having to do that... mainly because I've no idea how. Please have at it if you know how it's done.
FWIW I looked at 002_pg_dump.pl and did add a test, but I was unable to make it pass because I couldn't really figure out how the regex matching is meant to work. It does not seem to be explained very well in the comments and I lack patience for Perl. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services pg_dump-rows-per-insert-option_v11.patch Description: Binary data
Re: postgres on a non-journaling filesystem
Thank you!!

On Wed, Jan 23, 2019, 2:20 PM Heikki Linnakangas wrote:
> [...]
Re: Analyze all plans
Oleksandr Shulgin writes: > On Wed, Jan 23, 2019 at 9:44 AM Donald Dong wrote: >> 1. Enumerate all the plans > So enumerating all possible plans stops being practical for even slightly > complicated queries. Yes. You can *not* disable the planner's aggressive pruning of losing paths and subpaths without ending up with something that's completely impractical for anything beyond toy queries. That's just from the standpoint of planner runtime. Adding on the cost of actually creating a finished plan and then running it for long enough to get a reliable runtime for each case would ... well, let's just say you'd be safely dead before you got any interesting answers. regards, tom lane
Re: ArchiveEntry optional arguments refactoring
> On 2019-01-17 09:29:04 -0800, Andres Freund wrote:
> On 2019-01-17 10:23:39 -0500, Tom Lane wrote:
> > I don't buy the argument that this would move the goalposts in terms
> > of how much work it is to add a new argument. You'd still end up
> > touching every call site.
>
> Why? A lot of arguments that'd be potentially added or removed would not
> be set by each callsite.
>
> If you e.g. look at [...] you can see that a lot of changes were like
>
>     ArchiveEntry(fout, nilCatalogId, createDumpId(),
>                  "pg_largeobject", NULL, NULL, "",
>     -            false, "pg_largeobject", SECTION_PRE_DATA,
>     +            "pg_largeobject", SECTION_PRE_DATA,
>                  loOutQry->data, "", NULL,
>                  NULL, 0,
>                  NULL, NULL);
>
> i.e. just removing a 'false' argument. In like 70+ callsites. With the
> above scheme, we'd instead just have removed a single .withoids = true
> from dumpTableSchema()'s ArchiveEntry() call.

To make this discussion a bit more specific, I've created a patch showing how it could look. All the arguments except Archive, CatalogId, and DumpId I've moved into the ArchiveOpts structure. Not all of them could be empty before, but anyway it seems better for consistency and readability. Some of the arguments had an empty string as a default value; I haven't changed anything here yet (although this mixture of NULL and "" in ArchiveEntry looks a bit confusing). As Andres mentioned above, for 578b229718e / the WITH OIDS removal and the pg_dump modification from the pluggable storage thread, this patch reduces the number of ArchiveEntry-related changes from 70+ to just one.

0001-ArchiveOpts-structure.patch Description: Binary data
Re: pg_dump multi VALUES INSERT
Hello David,

> I thought about this and looked into it, but I decided it didn't look
> like a smart thing to do. The reason is that if --inserts sets
> dump_inserts to 1 then --rows-per-insert sets it to something else
> that's likely fine, but if that happens in the opposite order then the
> --rows-per-insert gets overwritten with 1.

You can test before doing that!

    case X:
        if (opt.dump_inserts == 0)
            opt.dump_inserts = 1;
        // otherwise option is already set
        break;

> The bad news is the order that happens is defined by the order of the
> command line args. It might be possible to make it work by having
> --inserts set some other variable,

ISTM that it is enough to test whether the variable is zero.

> then set dump_inserts to 1 if it's set to 0 and the other variable is
> set to >= 1... but that requires another variable, which is what you
> want to avoid...

I still do not understand the need for another variable.

    int ninserts = 0;  // default is to use copy
    while (getopt...)
    {
        switch (...)
        {
            case "--inserts":
                if (ninserts == 0)
                    ninserts = 1;
                break;
            case "--rows-per-insert":
                ninserts = arg_value;
                checks...
                break;
            ...

> I think it's best to have a variable per argument.

I disagree, because it adds complexity where none is needed: here the new option is an extension of a previous one, so the previous one just becomes a particular case. It seems simpler to manage it as the particular case it is rather than as a special case; with two variables you create the need to check their consistency.

> I could get rid of the got_rows_per_insert variable, but it would
> require setting the default value for rows_per_insert in the main()
> function rather than in InitDumpOptions(). I thought InitDumpOptions()
> looked like just the place to do this, so went with that option. To
> make it work without got_rows_per_insert, rows_per_insert would have to
> be 0 by default and we'd know we saw a --rows-per-insert command line
> arg by the fact that rows_per_insert was non-zero. Would you rather
> have it that way?

Yep, esp. as rows_per_insert & dump_inserts could be the same.

> > The feature is not tested anywhere. I still think that there should be
> > a test on empty/small/larger-than-rows-per-insert tables, possibly
> > added to existing TAP-tests.
>
> I was hoping to get away with not having to do that... mainly because
> I've no idea how.

Hmmm. That is another question! Maybe someone will help.

-- Fabien.
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 23, 2019 at 7:09 AM Amit Kapila wrote:
> I think the first two patches (a) removal of dead code in bootstrap
> and (b) the core patch to avoid creation of FSM file for the small
> table are good now. I have prepared the patches along with commit
> message. There is no change except for some changes in README and
> commit message of the second patch. Kindly let me know what you think
> about them?

Good to hear! The additional language is fine. In "Once the FSM is created for heap", I would just change that to "...for a heap".

> I think these two patches can go even without the upgrade patch
> (during pg_upgrade, conditionally skip transfer of FSMs.) which is
> still under discussion. However, I am not in a hurry if you or other
> thinks that upgrade patch must be committed along with the second
> patch. I think the upgrade patch is generally going on track but
> might need some more review.

The pg_upgrade piece is a nice-to-have feature and not essential, so can go in later. Additional review is also welcome.

-- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: postgres on a non-journaling filesystem
On 2019-01-23 14:20:52 +0200, Heikki Linnakangas wrote:
> On 23/01/2019 01:03, maayan mordehai wrote:
> > hello,
> >
> > I'm Maayan, I'm in a DBA team that uses postgresql.
> > I saw in the documentation on wals:
> > https://www.postgresql.org/docs/10/wal-intro.html
> > In the tip box that, it's better not to use a journaling filesystem. and I
> > wanted to ask how it works?
> > can't we get corruption that we can't recover from?
> > I mean what if postgres in the middle of a write to a wal and there is a
> > crash, and it didn't finish.
> > I'm assuming it will detect it when we will start postgres and write that
> > it was rolled back, am I right?
>
> Yep, any half-written transactions will be rolled back.
>
> > and how does it work in the data level? if some of the 8k block is written
> > but not all of it, and then there is a crash, how postgres deals with it?
>
> The first time a block is modified after a checkpoint, a copy of the block
> is written to the WAL. At crash recovery, the block is restored from the
> WAL. This mechanism is called "full page writes".
>
> The WAL works just like the journal in a journaling filesystem. That's why
> it's not necessary to have journaling at the filesystem level.

But note that not having journaling at the FS level often makes the OS start *painfully* slowly after a crash, because fsck or similar will be run, and that's often necessary for the internal consistency of the FS. Note also that even with journaling enabled, most filesystems by default don't journal data, so you can get those partial writes anyway.

Greetings, Andres Freund
Re: Typo: llvm*.cpp files identified as llvm*.c
Hi, On 2019-01-23 12:01:10 +0900, Michael Paquier wrote: > On Tue, Jan 22, 2019 at 05:49:46PM -0800, Andres Freund wrote: > > On 2019-01-23 14:43:15 +1300, Thomas Munro wrote: > >> The function name comments are similar, though less consistent so I'm > >> too lazy to write a script to find one that is actually wrong (with > >> which to trigger Andres's let's-delete-them-all response :-D). > > I am not sure if anybody uses them for anything automatically, still I > find myself from time to time looking at them to remember on which > path the file is located when opened in emacs. I often want to know that too - isn't it easier to just meta-x f and see the directory displayed? Because that doesn't require jumping to the head of the file... Greetings, Andres Freund
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
On 10/12/2018 23:37, Dmitry Dolgov wrote:
> On Thu, Nov 29, 2018 at 6:48 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > On Tue, Oct 2, 2018 at 4:53 AM Michael Paquier wrote:
> > > On Mon, Aug 06, 2018 at 06:00:54PM +0900, Yoshimi Ichiyanagi wrote:
> > > > libpmem's pmem_map_file() supports 2M/1G (huge page size) alignment,
> > > > since it can reduce the number of page faults. In addition, libpmem's
> > > > pmem_memcpy_nodrain() is a function that copies data using
> > > > single-instruction, multiple-data (SIMD) instructions and NT store
> > > > instructions (MOVNT). As a result, using these APIs is faster than
> > > > using the old mmap()/memcpy(). Please see the PGCon2018
> > > > presentation[1] for the details.
> > > > [1] https://www.pgcon.org/2018/schedule/attachments/507_PGCon2018_Introducing_PMDK_into_PostgreSQL.pdf
> > >
> > > So you say that this represents a 3% gain based on the presentation?
> > > That may be interesting to dig into. Could you provide fresher
> > > performance numbers? I am moving this patch to the next CF 2018-10 for
> > > now, waiting for input from the author.
> >
> > Unfortunately, the patch has some conflicts now, so probably not only
> > fresher performance numbers are necessary, but also a rebased version.
>
> I believe the idea behind this patch is quite important (thanks to CMU DG
> for the inspiring lectures), so I decided to put in some effort and rebase
> it to keep it from rotting. At the same time I have a vague impression
> that the patch itself suggests a rather narrow way of using PMDK.

Thanks. To re-iterate what I said earlier in this thread, I think the next step here is to write a patch that modifies xlog.c to use plain old mmap()/msync() to memory-map the WAL files, to replace the WAL buffers. Let's see what the performance of that is, with or without NVM hardware. I think that might actually make the code simpler. There's a bunch of really hairy code around locking the WAL buffers, which could be made simpler if each backend memory-mapped the WAL segment files independently.
One thing to watch out for, is that if you read() a file, and there's an I/O error, you have a chance to ereport() it. If you try to read from a memory-mapped file, and there's an I/O error, the process is killed with SIGBUS. So I think we have to be careful with using memory-mapped I/O for reading files. But for writing WAL files, it seems like a good fit. Once we have a reliable mmap()/msync() implementation running, it should be straightforward to change it to use MAP_SYNC and the special CPU instructions for the flushing. - Heikki
Re: ArchiveEntry optional arguments refactoring
Hi, On 2019-01-23 16:12:15 +0100, Dmitry Dolgov wrote: > To make this discussion a bit more specific, I've created a patch of how it > can > look like. Thanks. > All the arguments, except Archive, CatalogId and DumpId I've moved > into the ArchiveOpts structure. Not all of them could be empty before, but > anyway it seems better for consistency and readability. Some of the arguments > had empty string as a default value, I haven't changed anything here yet > (although this mixture of NULL and "" in ArchiveEntry looks a bit confusing). Probably worth changing at the same time, if we decide to go for it. To me this does look like it'd be more maintainable going forward. Greetings, Andres Freund
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
Hi, On 2019-01-23 18:45:42 +0200, Heikki Linnakangas wrote: > To re-iterate what I said earlier in this thread, I think the next step here > is to write a patch that modifies xlog.c to use plain old mmap()/msync() to > memory-map the WAL files, to replace the WAL buffers. Let's see what the > performance of that is, with or without NVM hardware. I think that might > actually make the code simpler. There's a bunch of really hairy code around > locking the WAL buffers, which could be made simpler if each backend > memory-mapped the WAL segment files independently. > > One thing to watch out for, is that if you read() a file, and there's an I/O > error, you have a chance to ereport() it. If you try to read from a > memory-mapped file, and there's an I/O error, the process is killed with > SIGBUS. So I think we have to be careful with using memory-mapped I/O for > reading files. But for writing WAL files, it seems like a good fit. > > Once we have a reliable mmap()/msync() implementation running, it should be > straightforward to change it to use MAP_SYNC and the special CPU > instructions for the flushing. FWIW, I don't think we should go there as the sole implementation. I'm fairly convinced that we're going to need to go to direct-IO in more cases here, and that'll not work well with mmap. I think this'd be a worthwhile experiment, but I'm doubtful it'd end up simplifying our code. Greetings, Andres Freund
Re: Typo: llvm*.cpp files identified as llvm*.c
Andres Freund writes: > On 2019-01-23 12:01:10 +0900, Michael Paquier wrote: >> I am not sure if anybody uses them for anything automatically, still I >> find myself from time to time looking at them to remember on which >> path the file is located when opened in emacs. > I often want to know that too - isn't it easier to just meta-x f and see > the directory displayed? Because that doesn't require jumping to the > head of the file... My own workflow often involves editing copies that are not in the original directory, so that answer doesn't work for me. In general it won't work anytime you are looking at a file out-of-context. regards, tom lane
Re: ArchiveEntry optional arguments refactoring
Hello

On 2019-Jan-23, Andres Freund wrote:
> > All the arguments, except Archive, CatalogId and DumpId I've moved
> > into the ArchiveOpts structure. Not all of them could be empty before,
> > but anyway it seems better for consistency and readability. Some of the
> > arguments had empty string as a default value, I haven't changed
> > anything here yet (although this mixture of NULL and "" in ArchiveEntry
> > looks a bit confusing).
>
> Probably worth changing at the same time, if we decide to go for it.
>
> To me this does look like it'd be more maintainable going forward.

It does. How does pgindent behave with it? I'd use ArchiveEntryOpts as the struct name; ArchiveOpts sounds wrong. Also, the struct members could use better names -- "defn" for example could perhaps be "createStmt" (to match dropStmt/copyStmt), and "desc" could be expanded to "description".

-- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ArchiveEntry optional arguments refactoring
On 1/23/19 10:12 AM, Dmitry Dolgov wrote: > To make this discussion a bit more specific, I've created a patch of how > it can look like. A little bit of vararg-macro action can make such a design look even tidier, cf. [1]. Or are compilers without vararg macros still in the supported mix? -Chap [1] https://github.com/NetBSD/src/blob/trunk/sys/sys/midiio.h#L709 The macros in [1] are not defined to create a function call, but only the argument structure because there might be several functions to pass it to, so a call would be written like func(&SEQ_MK_CHN(NOTEON, ...)). In ArchiveEntry's case, if there's only one function involved, there'd be no reason not to have a macro produce the whole call.
Re: ArchiveEntry optional arguments refactoring
Chapman Flack writes: > Or are compilers without vararg macros still in the supported mix? No. regards, tom lane
Re: ArchiveEntry optional arguments refactoring
Hi, On 2019-01-23 13:58:07 -0300, Alvaro Herrera wrote: > I'd use ArchiveEntryOpts as struct name; ArchiveOpts sounds wrong. Brevity would be of some advantage IMO, because it'll probably determine how pgindent indents the arguments, because the struct name will be in the arguments. > Also, the struct members could use better names -- "defn" for example > could perhaps be "createStmt" (to match dropStmt/copyStmt), and expand > "desc" to "description". True. Greetings, Andres Freund
Re: ArchiveEntry optional arguments refactoring
Hi, On 2019-01-23 12:05:10 -0500, Chapman Flack wrote: > On 1/23/19 10:12 AM, Dmitry Dolgov wrote: > > To make this discussion a bit more specific, I've created a patch of how > > it can look like. > A little bit of vararg-macro action can make such a design look > even tidier, cf. [1]. > [1] https://github.com/NetBSD/src/blob/trunk/sys/sys/midiio.h#L709 > > The macros in [1] are not defined to create a function call, but only > the argument structure because there might be several functions to pass > it to, so a call would be written like func(&SEQ_MK_CHN(NOTEON, ...)). > > In ArchiveEntry's case, if there's only one function involved, there'd > be no reason not to have a macro produce the whole call. I'm not really seeing this being more than obfuscation in this case. The only point of the macro is to set the .tag and .op elements to something without adding redundancies due to the struct name. Which we'd not have. Greetings, Andres Freund
Re: Referential Integrity Checks with Statement-level Triggers
Attached is a patch that refactors DELETE triggers to fire at the statement level. I chose delete triggers partly out of simplicity, and partly because there is some before/after row linkage in the ON UPDATE CASCADE cases where statement-level triggers might not be feasible as we have currently implemented them. After having done the work, I think INSERT triggers would be similarly straightforward, but I wanted to limit scope. Also, after having stripped the delete cases out of the update-or-delete functions, it became obvious that the on-update-set-null and on-update-set-default cases differed by only 3-4 lines, so those functions were combined. On a vagrant VM running on my desktop machine, I'm seeing a speed-up of about 25% in the benchmark provided. I think that figure is cloudy and below my expectations. Perhaps we'd get a much better picture of whether or not this is worth it on a bare-metal machine, or at least a VM better suited to benchmarking. Currently 4 make-check tests are failing. Two of them appear to be false positives (the tests make assumptions about triggers that are no longer true), and the other two are outside the scope of this benchmark, so I'll revisit them if we go forward. ri-set-logic.sql is an edited benchmark script adapted from Kevin Grittner's benchmark that he ran against hand-rolled triggers and posted on 2016-11-02. ri_test.out is a copy-paste of two runs of the benchmark script. Many thanks to everyone who helped, often despite their own objections to the overall reasoning behind the endeavor. I'm aware that a large contingent of highly experienced people would very much like to replace our entire trigger architecture, or at least divorce RI checks from triggers. Maybe this patch spurs on that change. Even if nothing comes of it, it's been a great learning experience.
On Sat, Dec 22, 2018 at 11:28 AM Emre Hasegeli wrote:
> > It is far from a premature optimization IMO, it is super useful and
> > something I was hoping would happen ever since I heard about transition
> > tables being worked on.
>
> Me too. Never-ending DELETEs are a common pain point especially for
> people migrated from MySQL which creates indexes for foreign keys
> automatically.

From 8a73f9233211076421a565b5c90ecd029b5e6581 Mon Sep 17 00:00:00 2001
From: vagrant
Date: Wed, 23 Jan 2019 16:59:17 +
Subject: [PATCH] Change Delete RI triggers to Statement-Level Triggers

---
 src/backend/commands/tablecmds.c    |   9 +-
 src/backend/commands/trigger.c      |   2 +
 src/backend/utils/adt/ri_triggers.c | 779
 3 files changed, 566 insertions(+), 224 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 28a137bb53..21f5bf94a4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8954,6 +8954,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr
 {
 	CreateTrigStmt *fk_trigger;
 
+	TriggerTransition *del = makeNode(TriggerTransition);
+	del->name = "pg_deleted_transition_table";
+	del->isNew = false;
+	del->isTable = true;
+
 	/*
 	 * Build and execute a CREATE CONSTRAINT TRIGGER statement for the ON
 	 * DELETE action on the referenced table.
@@ -8961,11 +8966,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr
 	fk_trigger = makeNode(CreateTrigStmt);
 	fk_trigger->trigname = "RI_ConstraintTrigger_a";
 	fk_trigger->relation = NULL;
-	fk_trigger->row = true;
+	fk_trigger->row = false;
 	fk_trigger->timing = TRIGGER_TYPE_AFTER;
 	fk_trigger->events = TRIGGER_TYPE_DELETE;
 	fk_trigger->columns = NIL;
-	fk_trigger->transitionRels = NIL;
+	fk_trigger->transitionRels = list_make1(del);
 	fk_trigger->whenClause = NULL;
 	fk_trigger->isconstraint = true;
 	fk_trigger->constrrel = NULL;
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 7ffaeaffc6..080587215f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -510,7 +510,9 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 	 *
 	 * Currently this is enforced by the grammar, so just Assert here.
 	 */
+	/*
 	Assert(!stmt->isconstraint);
+	*/
 
 	if (tt->isNew)
 	{
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index e1aa3d0044..6f89ab4c77 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -194,9 +194,10 @@ static int ri_constraint_cache_valid_count = 0;
 static bool ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 			HeapTuple old_row,
 			const RI_ConstraintInfo *riinfo);
-static Datum ri_restrict(TriggerData *trigdata, bool is_no_action);
-static Datum ri_setnull(TriggerData *trigdata);
-static Datum ri_setdefault(TriggerData *trigdata);
+static Datum ri_on_update_restrict(TriggerData *trigdata, bool is_no_action);
+static Datum ri_on_delete_restrict(TriggerData *trigdata, bool is_no_action);
+static Datu
Re: ArchiveEntry optional arguments refactoring
On 1/23/19 12:10 PM, Andres Freund wrote:
> On 2019-01-23 12:05:10 -0500, Chapman Flack wrote:
> > [1] https://github.com/NetBSD/src/blob/trunk/sys/sys/midiio.h#L709
>
> I'm not really seeing this being more than obfuscation in this case. The
> only point of the macro is to set the .tag and .op elements to something
> without adding redundancies due to the struct name. Which we'd not have.

Granted, that example is more elaborate than this case, but writing

    ArchiveEntry(fout, dbCatId, dbDumpId, .tag = datname, .owner = dba,
                 .desc = "DATABASE", .section = SECTION_PRE_DATA,
                 .defn = creaQry->data, .dropStmt = delQry->data);

instead of

    ArchiveEntry(fout, dbCatId, dbDumpId, &(ArchiveOpts){.tag = datname,
                 .owner = dba, .desc = "DATABASE",
                 .section = SECTION_PRE_DATA, .defn = creaQry->data,
                 .dropStmt = delQry->data});

would be easy, and still save a bit of visual noise.

Regards, -Chap
Re: ArchiveEntry optional arguments refactoring
On 2019-01-23 13:58:07 -0300, Alvaro Herrera wrote:
> Hello
>
> On 2019-Jan-23, Andres Freund wrote:
> > > All the arguments, except Archive, CatalogId and DumpId I've moved
> > > into the ArchiveOpts structure. Not all of them could be empty before,
> > > but anyway it seems better for consistency and readability. Some of the
> > > arguments had empty string as a default value, I haven't changed
> > > anything here yet (although this mixture of NULL and "" in ArchiveEntry
> > > looks a bit confusing).
> >
> > Probably worth changing at the same time, if we decide to go for it.
> >
> > To me this does look like it'd be more maintainable going forward.
>
> It does. How does pgindent behave with it?

It craps out:

    Error@3649: Unbalanced parens
    Warning@3657: Extra )

But that can be worked around with something like

    te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
                      ARCHIVE_ARGS(.tag = tbinfo->dobj.name,
                                   .namespace = tbinfo->dobj.namespace->dobj.name,
                                   .owner = tbinfo->rolname,
                                   .desc = "TABLE DATA",
                                   .section = SECTION_DATA,
                                   .copyStmt = copyStmt,
                                   .deps = &(tbinfo->dobj.dumpId),
                                   .nDeps = 1,
                                   .dumpFn = dumpFn,
                                   .dumpArg = tdinfo,
                                   ));

which looks mildly simpler too.

Greetings, Andres Freund
Re: ArchiveEntry optional arguments refactoring
On 2019-01-23 12:22:23 -0500, Chapman Flack wrote: > On 1/23/19 12:10 PM, Andres Freund wrote: > > On 2019-01-23 12:05:10 -0500, Chapman Flack wrote: > >> [1] https://github.com/NetBSD/src/blob/trunk/sys/sys/midiio.h#L709 > > > I'm not really seeing this being more than obfuscation in this case. The > > only point of the macro is to set the .tag and .op elements to something > > without adding redundancies due to the struct name. Which we'd not have. > > Granted, that example is more elaborate than this case, but writing > > > ArchiveEntry(fout, dbCatId, dbDumpId, .tag = datname, .owner = dba, > .desc = "DATABASE", .section = SECTION_PRE_DATA, > .defn = creaQry->data, .dropStmt = delQry->data); > > instead of > > ArchiveEntry(fout, dbCatId, dbDumpId, &(ArchiveOpts){.tag = datname, > .owner = dba, .desc = "DATABASE", > .section = SECTION_PRE_DATA, .defn = creaQry->data, > .dropStmt = delQry->data}); > > would be easy, and still save a bit of visual noise. IDK, it'd be harder to parse correctly as a C programmer though. I'm up with a wrapper macro like #define ARCHIVE_ARGS(...) &(ArchiveOpts){__VA_ARGS__} but weirdly mixing struct arguments and normal function arguments seems quite confusing. Greetings, Andres Freund
Re: ArchiveEntry optional arguments refactoring
Andres Freund writes: > On 2019-01-23 13:58:07 -0300, Alvaro Herrera wrote: >> It does. How does pgindent behave with it? > It craps out: > Error@3649: Unbalanced parens > Warning@3657: Extra ) > But that can be worked around with something like > te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId, > ARCHIVE_ARGS(.tag = tbinfo->dobj.name, >.namespace = > tbinfo->dobj.namespace->dobj.name, >.owner = tbinfo->rolname, >.desc = "TABLE DATA", >.section = SECTION_DATA, >.copyStmt = copyStmt, >.deps = &(tbinfo->dobj.dumpId), >.nDeps = 1, >.dumpFn = dumpFn, >.dumpArg = tdinfo, >)); > which looks mildly simpler too. That looks fairly reasonable from here, but I'd suggest ARCHIVE_OPTS rather than ARCHIVE_ARGS. Can we omit the initial dots if we use a wrapper macro? Would it be a good idea to do so (I'm not really sure)? regards, tom lane
Re: ArchiveEntry optional arguments refactoring
On 2019-01-23 12:32:06 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2019-01-23 13:58:07 -0300, Alvaro Herrera wrote:
> > > It does. How does pgindent behave with it?
> >
> > It craps out:
> >     Error@3649: Unbalanced parens
> >     Warning@3657: Extra )
> >
> > But that can be worked around with something like
> >
> >     te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
> >                       ARCHIVE_ARGS(.tag = tbinfo->dobj.name,
> >                                    .namespace = tbinfo->dobj.namespace->dobj.name,
> >                                    .owner = tbinfo->rolname,
> >                                    .desc = "TABLE DATA",
> >                                    .section = SECTION_DATA,
> >                                    .copyStmt = copyStmt,
> >                                    .deps = &(tbinfo->dobj.dumpId),
> >                                    .nDeps = 1,
> >                                    .dumpFn = dumpFn,
> >                                    .dumpArg = tdinfo,
> >                                    ));
> >
> > which looks mildly simpler too.
>
> That looks fairly reasonable from here, but I'd suggest ARCHIVE_OPTS
> rather than ARCHIVE_ARGS.

WFM. Seems quite possible that we'd grow a few more of these over time, so establishing some common naming seems good. Btw, do you have an opinion on keeping catId / dumpId outside/inside the argument struct?

> Can we omit the initial dots if we use a wrapper macro? Would it be a
> good idea to do so (I'm not really sure)?

Not easily, if at all, I think. We'd have to do a fair bit of weird macro magic (and then still end up with limitations) to "process" each argument individually. And even if it were easy, I don't think it's particularly advantageous.

Greetings, Andres Freund
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Fri, Jan 18, 2019 at 9:01 PM Vik Fearing wrote: > I don't want a situation like this: > CREATE INDEX CONCURRENTLY ... > DROP INDEX CONCURRENTLY ... > REINDEX INDEX (CONCURRENTLY) ... > > All three should be the same, and my suggestion is to add the > parenthesized version to CREATE and DROP and not add the unparenthesized > version to REINDEX. +1 for all three being the same. I could see allowing only the unparenthesized format for all three, or allowing both forms for all three, but I think having only one form for each and having them not agree will be too confusing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Wed, Jan 23, 2019, 19:17 Robert Haas wrote: > On Fri, Jan 18, 2019 at 9:01 PM Vik Fearing > wrote: > > I don't want a situation like this: > > CREATE INDEX CONCURRENTLY ... > > DROP INDEX CONCURRENTLY ... > > REINDEX INDEX (CONCURRENTLY) ... > > > > All three should be the same, and my suggestion is to add the > > parenthesized version to CREATE and DROP and not add the unparenthesized > > version to REINDEX. > > +1 for all three being the same. I could see allowing only the > unparenthesized format for all three, or allowing both forms for all > three, but I think having only one form for each and having them not > agree will be too confusing. > +1 Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Re: ArchiveEntry optional arguments refactoring
Andres Freund writes: > Btw, do you have an opionion on keeping catId / dumpId outside/inside > the argument struct? I'd go for outside, since they're not optional. Not dead set on that though. regards, tom lane
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Hi, On 2019-01-23 13:17:26 -0500, Robert Haas wrote: > On Fri, Jan 18, 2019 at 9:01 PM Vik Fearing > wrote: > > I don't want a situation like this: > > CREATE INDEX CONCURRENTLY ... > > DROP INDEX CONCURRENTLY ... > > REINDEX INDEX (CONCURRENTLY) ... > > > > All three should be the same, and my suggestion is to add the > > parenthesized version to CREATE and DROP and not add the unparenthesized > > version to REINDEX. > > +1 for all three being the same. I could see allowing only the > unparenthesized format for all three, or allowing both forms for all > three, but I think having only one form for each and having them not > agree will be too confusing. It seems quite unnecessarily confusing to me to require parens for REINDEX CONCURRENTLY when we've historically not required that for CREATE/DROP INDEX CONCURRENTLY. Besides that, training people that it's the correct form to use parens for CIC/DIC creates an unnecessary version dependency. I think it actually makes sense to see the CONCURRENTLY versions as somewhat separate types of statements from the non-concurrent versions. They have significantly different transactional behaviour (like not being able to be run within one, and leaving gunk behind in case of error). For me it semantically makes sense to have that denoted at the toplevel; it's a related but different type of DDL statement. Greetings, Andres Freund
Re: ArchiveEntry optional arguments refactoring
On 1/23/19 12:25 PM, Andres Freund wrote: > On 2019-01-23 12:22:23 -0500, Chapman Flack wrote: >> ArchiveEntry(fout, dbCatId, dbDumpId, .tag = datname, .owner = dba, >> .desc = "DATABASE", .section = SECTION_PRE_DATA, >> .defn = creaQry->data, .dropStmt = delQry->data); > IDK, it'd be harder to parse correctly as a C programmer though. ... > weirdly mixing struct arguments and normal function arguments seems > quite confusing. Hmm, I guess the rubric I think with goes something like "is a C programmer who encounters this in a source file for the first time likely to guess wrong about what it means?", and in the case above, I can scarcely imagine it. ISTM that these days, many people are familiar with several languages that allow a few mandatory, positional parameters followed by optional named ones, and so a likely reaction would be "hey look, somebody used a macro here to make C look more like ." On 1/23/19 12:32 PM, Tom Lane wrote: > Can we omit the initial dots if we use a wrapper macro? That, I think, is hard. Getting to the form above is downright easy; making the dots go away, even if achievable, seems way further down the path of diminishing returns. Regards, -Chap
Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives
On Mon, Jan 14, 2019 at 2:37 PM Peter Geoghegan wrote: > The fix here must be to normalize index tuples that are compressed > within amcheck, both during initial fingerprinting, and during > subsequent probes of the Bloom filter in bt_tuple_present_callback(). I happened to talk to Andres about this in person yesterday. He thought that there was reason to be concerned about the need for logical normalization beyond TOAST issues. Expression indexes were a particular concern, because they could in principle have a change in the on-disk representation without a change of logical values -- false positives could result. He suggested that the long term solution was to bring hash operator class hash functions into Bloom filter hashing, at least where available. I wasn't very enthused about this idea, because it will be expensive and complicated for an uncertain benefit. There are hardly any btree operator classes that can ever have bitwise distinct datums that are equal, anyway (leaving aside issues with TOAST). For the cases that do exist (e.g. numeric_ops display scale), we may not really want to normalize the differences away. Having an index tuple with a numeric_ops datum containing the wrong display scale but with everything else correct still counts as corruption. It now occurs to me that if we wanted to go further than simply normalizing away TOAST differences, my pending nbtree patch could enable a simpler and more flexible way of doing that than bringing hash opclasses into it, at least on the master branch. We could simply do an index look-up for the exact tuple of interest in the event of a Bloom filter probe indicating its apparent absence (corruption) -- even heap TID can participate in the search. In addition, that would cover the whole universe of logical differences, known and unknown (e.g. it wouldn't matter if somebody initialized alignment padding to something non-zero, since that doesn't cause wrong answers to queries). 
We might even want to offer an option where the Bloom filter is bypassed (we go straight to probing the indexes) some proportion of the time, or when a big misestimation when sizing the Bloom filter is detected (i.e. almost all bits in the Bloom filter bitset are set at the time we start probing the filter). -- Peter Geoghegan
Re: Analyze all plans
On Jan 23, 2019, at 6:46 AM, Tom Lane wrote: > > Oleksandr Shulgin writes: >> On Wed, Jan 23, 2019 at 9:44 AM Donald Dong wrote: >>> 1. Enumerate all the plans > >> So enumerating all possible plans stops being practical for even slightly >> complicated queries. > > Yes. You can *not* disable the planner's aggressive pruning of losing > paths and subpaths without ending up with something that's completely > impractical for anything beyond toy queries. That's just from the > standpoint of planner runtime. Adding on the cost of actually creating > a finished plan and then running it for long enough to get a reliable > runtime for each case would ... well, let's just say you'd be safely dead > before you got any interesting answers. Thank you for the feedback. I didn't think about this carefully. I thought the planner would use GEQO when the number of joins is large, but indeed it's still n! for normal path selection. Now I keep the top-k cheapest subplans, where k is configured by users. If k is set properly, setting a timeout may not be necessary. However, if I do want to set a timeout, I would run into the issues I had in step 2. I'm looking forward to hearing more thoughts :) Thanks again!
Re: inherited primary key misbehavior
Hello On 2019-Jan-23, Amit Langote wrote: > It seems that ATExecDetachPartition misses to mark a child's primary key > constraint entry (if any) as local (conislocal=true, coninhcount=0), which > causes this: > > create table p (a int primary key) partition by list (a); > create table p2 partition of p for values in (2); > alter table p detach partition p2; > alter table p2 drop constraint p2_pkey ; > ERROR: cannot drop inherited constraint "p2_pkey" of relation "p2" Ugh. I suppose unique keys would have the same problem, so the check for indisprimary is wrong. Other than that, looks correct. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: problems with foreign keys on partitioned tables
On 2019-Jan-22, Amit Langote wrote: > On 2019/01/22 8:30, Alvaro Herrera wrote: > > Hi Amit, > > > > Will you please rebase 0002? Please add your proposed tests cases to > > it, too. > > Done. See the attached patches for HEAD and PG 11. I'm not quite sure I understand why the one in DefineIndex needs the deps but nothing else does. I fear that you added that one just to appease the existing test that breaks otherwise, and I worry that with that addition we're papering over some other, more fundamental bug. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 21, 2019 at 6:32 AM Amit Kapila wrote: > Also, another case to think in this regard is the upgrade for standby > servers, if you read below paragraph from the user manual [1], you > will see what I am worried about? > > "What this does is to record the links created by pg_upgrade's link > mode that connect files in the old and new clusters on the primary > server. It then finds matching files in the standby's old cluster and > creates links for them in the standby's new cluster. Files that were > not linked on the primary are copied from the primary to the standby. > (They are usually small.)" > > [1] - https://www.postgresql.org/docs/devel/pgupgrade.html I am still not able to get the upgraded standby to go into recovery without resorting to pg_basebackup, but in another attempt to investigate your question I tried the following (data1 = old cluster, data2 = new cluster): mkdir -p data1 data2 standby echo 'heap' > data1/foo echo 'fsm' > data1/foo_fsm # simulate streaming replication rsync --archive data1 standby # simulate pg_upgrade, skipping FSM ln data1/foo -t data2/ rsync --archive --delete --hard-links --size-only --no-inc-recursive data1 data2 standby # result ls standby/data1 ls standby/data2 The result is that foo_fsm is not copied to standby/data2, contrary to what the docs above imply for other unlinked files. Can anyone shed light on this? -- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Thread-unsafe coding in ecpg
On 1/22/19 12:50 PM, Andrew Dunstan wrote: > On 1/21/19 10:00 PM, Andrew Dunstan wrote: >> On 1/21/19 3:25 PM, Tom Lane wrote: >>> "Joshua D. Drake" writes: On 1/21/19 12:05 PM, Tom Lane wrote: > Is there a newer version of mingw that does have this functionality? Apparently this can be done with the 64-bit version: https://stackoverflow.com/questions/33647271/how-to-use-configthreadlocale-in-mingw >>> Hmm, the followup question makes it sound like it still didn't work :-(. >>> >>> However, since the mingw build is autoconf-based, seems like we can >>> install a configure check instead of guessing. Will make it so. >>> >>> Task for somebody else: run a MinGW64 buildfarm member. >>> >>> >> I could set that up in just about two shakes of a lamb's tail - I have a >> script to do so all tested on vagrant/aws within the last few months. >> >> >> What I don't have is resources. My Windows resources are pretty much >> tapped out. I would need either some modest hardware or a Windows VM >> somewhere in the cloud - could be anywhere but I'm most at home on AWS. >> > > Incidentally, jacana *is* running mingw64, but possibly not a young > enough version. I just ran a test on an up to date version and it found > the config setting happily, and passed the make stage. > > > I can look at upgrading jacana to a more modern compiler. The version > it's running is 4.9.1, apparently built around Oct 2014, so it's not > that old. > > I have just spent a large amount of time testing the committed fix with a number of versions of gcc. It blows up on any compiler modern enough to know about _configthreadlocale. Essentially what happens is this: == running regression test queries == test compat_informix/dec_test ... ok test compat_informix/charfuncs ... ok test compat_informix/rfmtdate ... ok test compat_informix/rfmtlong ... ok test compat_informix/rnull ... stdout stderr FAILED test compat_informix/sqlda ...
stdout stderr FAILED (test process exited with exit code 1) test compat_informix/describe ... stdout stderr FAILED (test process exited with exit code 1) test compat_informix/test_informix ... At this point the regression tests hang and the disk starts filling up rapidly. This has been tested with versions 5.4.0, 6.4.0, 7.3.0, 8.1.0 and 8.2.1. The first 4 are downloaded direct from the Mingw64 repo, the last is from Msys2's mingw64 toolchain. I'm trying to find out more info, but what we have right now is pretty broken. Reverting commits ee27584c4a and 8eb4a9312 cures[*] the problem. cheers andrew [*] FSVO "cure". I am still getting intermittent errors but no hangs. -- Andrew Dunstan  https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Thread-unsafe coding in ecpg
Andrew Dunstan writes: > I have just spent a large amount of time testing the committed fix with > a number of versions of gcc. It blows up on any compiler modern enough > to know about _configthreadlocale Bleah. Since the regular Windows buildfarm members seem happy, this evidently means that MinGW's _configthreadlocale is broken in some way. I suppose we could just remove the autoconf test and build without _configthreadlocale on MinGW, but that's kind of sad ... Perhaps there's some sort of setup that MinGW's version needs that we're not doing? regards, tom lane
Re: index_build does not need its isprimary argument
On Tue, Jan 22, 2019 at 05:08:52PM +0900, Michael Paquier wrote: > While working on REINDEX CONCURRENTLY, I have noticed that > index_build() does not need its argument isprimary. Perhaps it is > not worth bothering, but for the REINDEX CONCURRENTLY business this > removes the need to open an index when triggering a concurrent > build. > > The flag was introduced in 3fdeb189, but f66e8bf actually forgot to > finish the cleanup as index_update_stats() has simplified its > interface. And committed as of 289198c, and back to the real deal.. -- Michael signature.asc Description: PGP signature
Re: Protect syscache from bloating with negative cache entries
On Wed, Jan 23, 2019 at 05:35:02PM +0900, Kyotaro HORIGUCHI wrote: > At Mon, 21 Jan 2019 17:22:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI > wrote in > <20190121.172255.226467552.horiguchi.kyot...@lab.ntt.co.jp> > > An option is an additional PGPROC member and interface functions. > > > > struct PGPROC > > { > > ... > > int syscache_usage_track_interval; /* track interval, 0 to disable */ > > > > =# select syscache_usage_track_add(, [, ]); > > =# select syscache_usage_track_remove(2134); > > > > > > Or, just provide a one-shot triggering function. > > > > =# select syscache_take_usage_track(); > > > > This can use either a similar PGPROC variable or SendProcSignal() > > but the former doesn't fire while idle time unless using timer. > > The attached is a revised version of this patchset, where the third > patch is the remote setting feature. It uses static shared memory. > > =# select pg_backend_catcache_stats(, ); > > Activates or changes catcache stats feature on the backend with > PID. (The name should be changed to .._syscache_stats, though.) > It is far smaller than the remote-GUC feature. (It contains a > part that should be in the previous patch. I will fix it later.) I have a few questions to make sure we have not made the API too complex. First, for syscache_prune_min_age, that is the minimum age that we prune, and entries could last twice that long. Is there any value to doing the scan at 50% of the age so that the syscache_prune_min_age is the max age? For example, if our age cutoff is 10 minutes, we could scan every 5 minutes so 10 minutes would be the maximum age kept. Second, when would you use syscache_memory_target != 0? If you had syscache_prune_min_age really fast, e.g. 10 seconds? What is the use-case for this? You have a query that touches 10k objects, and then the connection stays active but doesn't touch many of those 10k objects, and you want it cleaned up in seconds instead of minutes?
(I can't see why you would not clean up all unreferenced objects after _minutes_ of disuse, but removing them after seconds of disuse seems undesirable.) What are the odds you would retain the entries you want with a fast target? What is the value of being able to change a specific backend's stat interval? I don't remember any other setting having this ability. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: pg_dump multi VALUES INSERT
On Thu, 24 Jan 2019 at 04:45, Fabien COELHO wrote: > I still do not understand the need for another variable. > >int ninserts = 0; // default is to use copy >while (getopt...) >{ > switch (...) { >case "--inserts": > if (ninserts == 0) ninserts = 1; > break; >case "--rows-per-insert": > ninserts = arg_value; > checks... > break; > ... I didn't think of that. Attached is a version that changes it to work along those lines. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services pg_dump-rows-per-insert-option_v12.patch Description: Binary data
Re: Thread-unsafe coding in ecpg
On 1/23/19 6:01 PM, Tom Lane wrote: > Andrew Dunstan writes: >> I have just spent a large amount of time testing the committed fix with >> a number of versions of gcc. It blows up on any compiler modern enough >> to know about _configthreadlocale > Bleah. Since the regular Windows buildfarm members seem happy, this > evidently means that MinGW's _configthreadlocale is broken in some way. > > I suppose we could just remove the autoconf test and build without > _configthreadlocale on MinGW, but that's kind of sad ... > > Perhaps there's some sort of setup that MinGW's version needs that > we're not doing? This seems to suggest something worse: https://reviews.llvm.org/D40181 Not sure I fully understand what's happening here, though. cheers andrew -- Andrew Dunstan  https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Delay locking partitions during INSERT and UPDATE
On Tue, Jan 22, 2019 at 4:31 PM David Rowley wrote: > Thanks. I've attached a rebased version. Hmm, now instead of an 85x speedup over master in the 10k partition case, I only get 20x. Anyone else see this? -- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Delay locking partitions during INSERT and UPDATE
On Thu, 24 Jan 2019 at 13:38, John Naylor wrote: > > On Tue, Jan 22, 2019 at 4:31 PM David Rowley > wrote: > > Thanks. I've attached a rebased version. > > Hmm, now instead of an 85x speedup over master in the 10k partition > case, I only get 20x. Anyone else see this? What's it like with fsync off? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: inherited primary key misbehavior
On 2019/01/24 6:10, Alvaro Herrera wrote: > Hello > > On 2019-Jan-23, Amit Langote wrote: > >> It seems that ATExecDetachPartition misses to mark a child's primary key >> constraint entry (if any) as local (conislocal=true, coninhcount=0), which >> causes this: >> >> create table p (a int primary key) partition by list (a); >> create table p2 partition of p for values in (2); >> alter table p detach partition p2; >> alter table p2 drop constraint p2_pkey ; >> ERROR: cannot drop inherited constraint "p2_pkey" of relation "p2" > > Ugh. > > I suppose unique keys would have the same problem, so the check for > indisprimary is wrong. Fixed in the attached. We don't support inheriting EXCLUSION constraints yet, so no need to consider their indexes. Thanks, Amit From d794906e8d2a0839f548f20be5817f306b833b15 Mon Sep 17 00:00:00 2001 From: amit Date: Wed, 23 Jan 2019 18:33:09 +0900 Subject: [PATCH v2] Detach primary key constraint when detaching partition --- src/backend/commands/tablecmds.c | 16 src/test/regress/expected/indexing.out | 15 +++ src/test/regress/sql/indexing.sql | 14 ++ 3 files changed, 45 insertions(+) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 28a137bb53..e3baf73e7b 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -15094,6 +15094,7 @@ ATExecDetachPartition(Relation rel, RangeVar *name) foreach(cell, indexes) { Oid idxid = lfirst_oid(cell); + Oid constrOid; Relationidx; if (!has_superclass(idxid)) @@ -15106,6 +15107,21 @@ ATExecDetachPartition(Relation rel, RangeVar *name) IndexSetParentIndex(idx, InvalidOid); update_relispartition(classRel, idxid, false); index_close(idx, NoLock); + + /* +* Detach any constraints associated with the index too. Only UNIQUE +* and PRIMARY KEY index constraints can be inherited. 
+*/ + if (!idx->rd_index->indisprimary && !idx->rd_index->indisunique) + continue; + + constrOid = get_relation_idx_constraint_oid(RelationGetRelid(partRel), + idxid); + if (!OidIsValid(constrOid)) + elog(ERROR, "missing pg_constraint entry of index %u of %u", +idxid, RelationGetRelid(partRel)); + + ConstraintSetParentConstraint(constrOid, InvalidOid); } table_close(classRel, RowExclusiveLock); diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out index 118f2c78df..b344351ee7 100644 --- a/src/test/regress/expected/indexing.out +++ b/src/test/regress/expected/indexing.out @@ -1414,3 +1414,18 @@ DETAIL: Key (a)=(4) already exists. create unique index on covidxpart (b) include (a); -- should fail ERROR: insufficient columns in UNIQUE constraint definition DETAIL: UNIQUE constraint on table "covidxpart" lacks column "a" which is part of the partition key. +-- check that detaching a partition also detaches the primary key constraint +create table parted_pk_detach_test (a int primary key) partition by list (a); +create table parted_pk_detach_test1 partition of parted_pk_detach_test for values in (1); +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey;-- should fail +ERROR: cannot drop inherited constraint "parted_pk_detach_test1_pkey" of relation "parted_pk_detach_test1" +alter table parted_pk_detach_test detach partition parted_pk_detach_test1; +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey; +drop table parted_pk_detach_test, parted_pk_detach_test1; +create table parted_uniq_detach_test (a int unique) partition by list (a); +create table parted_uniq_detach_test1 partition of parted_uniq_detach_test for values in (1); +alter table parted_uniq_detach_test1 drop constraint parted_uniq_detach_test1_a_key; -- should fail +ERROR: cannot drop inherited constraint "parted_uniq_detach_test1_a_key" of relation "parted_uniq_detach_test1" +alter table parted_uniq_detach_test detach 
partition parted_uniq_detach_test1; +alter table parted_uniq_detach_test1 drop constraint parted_uniq_detach_test1_a_key; +drop table parted_uniq_detach_test, parted_uniq_detach_test1; diff --git a/src/test/regress/sql/indexing.sql b/src/test/regress/sql/indexing.sql index d4a64c18c7..3d43a57da2 100644 --- a/src/test/regress/sql/indexing.sql +++ b/src/test/regress/sql/indexing.sql @@ -757,3 +757,17 @@ alter table covidxpart attach partition covidxpart4 for values in (4); insert into covidxpart values (4, 1); insert into covidxpart values (4, 1); create unique index on c
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Wed, 23 Jan 2019 at 12:46, David Rowley wrote: > (Stopped in statext_mcv_build(). Need to take a break) Continuing... 27. statext_mcv_build() could declare the int j,k variables in the scope that they're required in. 28. "an array" * build array of SortItems for distinct groups and counts matching items 29. No need to set isnull to false in statext_mcv_load() 30. Wondering about the reason in statext_mcv_serialize() that you're not passing the collation to sort the array. You have: ssup[dim].ssup_collation = DEFAULT_COLLATION_OID; should it not be: ssup[dim].ssup_collation = stats[dim]->attrcollid; ? 31. uint32 should use %u, not %d: if (mcvlist->magic != STATS_MCV_MAGIC) elog(ERROR, "invalid MCV magic %d (expected %d)", mcvlist->magic, STATS_MCV_MAGIC); and if (mcvlist->type != STATS_MCV_TYPE_BASIC) elog(ERROR, "invalid MCV type %d (expected %d)", mcvlist->type, STATS_MCV_TYPE_BASIC); and ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), errmsg("invalid length (%d) item array in MCVList", mcvlist->nitems))); I don't think %ld is the correct format for VARSIZE_ANY_EXHDR. %u or %d seem more suited. I see that value is quite often assigned to int, so probably can't argue much with %d. elog(ERROR, "invalid MCV size %ld (expected %zu)", VARSIZE_ANY_EXHDR(data), expected_size); 32. I think the format is wrong here too: elog(ERROR, "invalid MCV size %ld (expected %ld)", VARSIZE_ANY_EXHDR(data), expected_size); I'd expect "invalid MCV size %d (expected %zu)" 33. How do you allocate a single chunk non-densely? * Allocate one large chunk of memory for the intermediate data, needed * only for deserializing the MCV list (and allocate densely to minimize * the palloc overhead). 34. 
I thought I saw a few issues with pg_stats_ext_mcvlist_items() so tried to test it: create table ab (a int, b int); insert into ab select x,x from generate_serieS(1,10)x; create statistics ab_ab_stat (mcv) on a,b from ab; analyze ab; select pg_mcv_list_items(stxmcv) from pg_Statistic_ext where stxmcv is not null; ERROR: cache lookup failed for type 2139062143 The issues I saw were: You do: appendStringInfoString(&itemValues, "{"); appendStringInfoString(&itemNulls, "{"); but never append '}' after building the string. (can use appendStringInfoChar() BTW) also: if (i == 0) { appendStringInfoString(&itemValues, ", "); appendStringInfoString(&itemNulls, ", "); } I'd have expected you to append the ", " only when i > 0. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Block level parallel vacuum
On Tue, Jan 22, 2019 at 9:59 PM Haribabu Kommi wrote: > > > Thanks for the latest patch. I have some more minor comments. Thank you for reviewing the patch. > > + Execute index vacuum and cleanup index in parallel with > > Better to use vacuum index and cleanup index? This is in same with > the description of vacuum phases. It is better to follow same notation > in the patch. Agreed. I've changed it to "Vacuum index and cleanup index in parallel with ...". > > > + dead_tuples = lazy_space_alloc(lvstate, nblocks, parallel_workers); > > With the change, the lazy_space_alloc takes care of initializing the > parallel vacuum, can we write something related to that in the comments. > Agreed. > > + initprog_val[2] = dead_tuples->max_dead_tuples; > > dead_tuples variable may need rename for better reading? > I might not get your comment correctly but I've tried to fix it. Please review it. > > > + if (lvshared->indstats[idx].updated) + result = &(lvshared->indstats[idx].stats); + else + copy_result = true; > > > I don't see a need for copy_result variable, how about directly using > the updated flag to decide whether to copy or not? Once the result is > copied update the flag. > You're right. Fixed. > > +use Test::More tests => 34; > > I don't find any new tests are added in this patch. Fixed. > > I am thinking of performance penalty if we use the parallel option of > vacuum on a small sized table? Hm, unlike other parallel operations, the parallel vacuum executes multiple index vacuums simultaneously and can therefore avoid contention. I think there is a performance penalty but it would not be big. Attached the latest patches.
Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center From 53a21a6983d6ef2787bc6e104d0ce62ed905a4d3 Mon Sep 17 00:00:00 2001 From: Masahiko Sawada Date: Tue, 18 Dec 2018 14:48:34 +0900 Subject: [PATCH v13 1/2] Add parallel option to VACUUM command In parallel vacuum, we do both index vacuum and cleanup vacuum in parallel with parallel worker processes if the table has more than one index. All processes including the leader process process indexes one by one. Parallel vacuum can be performed by specifying like VACUUM (PARALLEL 2) tbl, meaning that performing vacuum with 2 parallel worker processes. Or the setting parallel_workers reloption more than 0 invokes parallel vacuum. The parallel vacuum degree is limited by both the number of indexes the table has and max_parallel_maintenance_workers. --- doc/src/sgml/config.sgml | 24 +- doc/src/sgml/ref/vacuum.sgml | 28 ++ src/backend/access/heap/vacuumlazy.c | 892 +- src/backend/access/transam/parallel.c | 4 + src/backend/commands/vacuum.c | 78 +-- src/backend/nodes/equalfuncs.c| 6 +- src/backend/parser/gram.y | 73 ++- src/backend/postmaster/autovacuum.c | 13 +- src/backend/tcop/utility.c| 4 +- src/include/access/heapam.h | 5 +- src/include/commands/vacuum.h | 2 +- src/include/nodes/parsenodes.h| 19 +- src/test/regress/expected/vacuum.out | 2 + src/test/regress/sql/vacuum.sql | 3 + 14 files changed, 945 insertions(+), 208 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b6f5822..b77a2bd 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2185,18 +2185,18 @@ include_dir 'conf.d' Sets the maximum number of parallel workers that can be - started by a single utility command. Currently, the only - parallel utility command that supports the use of parallel - workers is CREATE INDEX, and only when - building a B-tree index. Parallel workers are taken from the - pool of processes established by , limited by . 
Note that the requested - number of workers may not actually be available at run time. - If this occurs, the utility operation will run with fewer - workers than expected. The default value is 2. Setting this - value to 0 disables the use of parallel workers by utility - commands. + started by a single utility command. Currently, the parallel + utility commands that support the use of parallel workers are + CREATE INDEX and VACUUM + without FULL option, and only when building + a B-tree index. Parallel workers are taken from the pool of + processes established by , + limited by . + Note that the requested number of workers may not actually be + available at run time. If this occurs, the utility operation + will run with fewer workers than expected. The default value + is 2. Setting this value to 0 disables the use of parallel + workers by utility commands. diff --git a/doc/src/sgml/ref/vacuum
Re: inherited primary key misbehavior
On 2019-Jan-24, Amit Langote wrote: > Fixed in the attached. We don't support inheriting EXCLUSION constraints > yet, so no need to consider their indexes. Looks good, pushed. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Thread-unsafe coding in ecpg
Andrew Dunstan writes: > On 1/23/19 6:01 PM, Tom Lane wrote: >> Perhaps there's some sort of setup that MinGW's version needs that >> we're not doing? > This seems to suggest something worse: https://reviews.llvm.org/D40181 > Not sure I fully understand what's happening here, though. Me either, but I noted a couple of interesting extracts from that page: Normal mingw that uses msvcrt.dll doesn't have per-thread locales so it won't really work in any case (but I think it does define some sort of dummy functions that at least will allow it to build). Nowadays, mingw can be built to target ucrtbase.dll as well though, and there it should be possible to make it work just like for MSVC although it might need some patches. ... Looked into MinGW-w64 sources and this is indeed the case. _configthreadlocale will return -1 and will not do anything. This suggests that, rather than throwing up our hands if the initial _configthreadlocale call returns -1, we should act as though the function doesn't exist, and just soldier on the same as before. The code in there assumes that -1 is a can't-happen case and doesn't try to recover, but apparently that's over-optimistic. regards, tom lane
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 23, 2019 at 9:18 PM John Naylor wrote: > > On Wed, Jan 23, 2019 at 7:09 AM Amit Kapila wrote: > > I think the first two patches (a) removal of dead code in bootstrap > > and (b) the core patch to avoid creation of FSM file for the small > > table are good now. I have prepared the patches along with commit > > message. There is no change except for some changes in README and > > commit message of the second patch. Kindly let me know what you think > > about them? > > Good to hear! The additional language is fine. In "Once the FSM is > created for heap", I would just change that to "...for a heap". > Sure, apart from this I have run pgindent on the patches and made some changes accordingly. Latest patches attached (only the second patch has changes). I will take one more pass on Monday morning (28th Jan) and will commit unless you or others see any problem. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v02-0001-In-bootstrap-mode-don-t-allow-the-creation-of-files-.patch Description: Binary data v18-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch Description: Binary data
Re: Thread-unsafe coding in ecpg
I wrote: > This suggests that, rather than throwing up our hands if the initial > _configthreadlocale call returns -1, we should act as though the function > doesn't exist, and just soldier on the same as before. The code in there > assumes that -1 is a can't-happen case and doesn't try to recover, > but apparently that's over-optimistic. I pushed a patch to fix that. It looks to me like the reason that the ecpg tests went into an infinite loop is that compat_informix/test_informix.pgc has not considered the possibility of repeated statement failures:

	while (1)
	{
		$fetch forward c into :i, :j, :c;
		if (sqlca.sqlcode == 100) break;
		else if (sqlca.sqlcode != 0) printf ("Error: %ld\n", sqlca.sqlcode);

		if (risnull(CDECIMALTYPE, (char *)&j))
			printf("%d NULL\n", i);
		else
		{
			int a;

			dectoint(&j, &a);
			printf("%d %d \"%s\"\n", i, a, c);
		}
	}

I know zip about ecpg coding practices, but wouldn't it be a better idea to break out of the loop on seeing an error? regards, tom lane
Re: inherited primary key misbehavior
On 2019/01/24 12:03, Alvaro Herrera wrote: > On 2019-Jan-24, Amit Langote wrote: > >> Fixed in the attached. We don't support inheriting EXCLUSION constraints >> yet, so no need to consider their indexes. > > Looks good, pushed. Thank you. Regards, Amit
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 24, 2019 at 3:39 AM John Naylor wrote: > > On Mon, Jan 21, 2019 at 6:32 AM Amit Kapila wrote: > > Also, another case to think in this regard is the upgrade for standby > > servers, if you read below paragraph from the user manual [1], you > > will see what I am worried about? > > > > "What this does is to record the links created by pg_upgrade's link > > mode that connect files in the old and new clusters on the primary > > server. It then finds matching files in the standby's old cluster and > > creates links for them in the standby's new cluster. Files that were > > not linked on the primary are copied from the primary to the standby. > > (They are usually small.)" > > > > [1] - https://www.postgresql.org/docs/devel/pgupgrade.html > > I am still not able to get the upgraded standby to go into recovery > without resorting to pg_basebackup, but in another attempt to > investigate your question I tried the following (data1 = old cluster, > data2 = new cluster): > > > mkdir -p data1 data2 standby > > echo 'heap' > data1/foo > echo 'fsm' > data1/foo_fsm > > # simulate streaming replication > rsync --archive data1 standby > > # simulate pg_upgrade, skipping FSM > ln data1/foo -t data2/ > > rsync --archive --delete --hard-links --size-only --no-inc-recursive > data1 data2 standby > > # result > ls standby/data1 > ls standby/data2 > > > The result is that foo_fsm is not copied to standby/data2, contrary to > what the docs above imply for other unlinked files. Can anyone shed > light on this? > Is foo_fsm present in standby/data1? I think what doc means to say is that it copies any unlinked files present in primary's new cluster (which in your case will be data2). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Thread-unsafe coding in ecpg
Oh, I just noticed something else: several of the ecpg test programs contain

#ifdef WIN32
#ifdef _MSC_VER			/* requires MSVC */
	_configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
#endif
#endif

Surely this is a kluge that we could now remove? We've protected the only setlocale calls that the ecpg tests could reach in threaded mode. If there is still a problem, it'd behoove us to find it, because ecpglib should not expect that the calling app has done this for it. regards, tom lane
Re: RTLD_GLOBAL (& JIT inlining)
Hello. At Mon, 26 Feb 2018 23:50:41 +0200, Ants Aasma wrote in > On Mon, Feb 26, 2018 at 11:28 PM, Andres Freund wrote: > > So RTLD_LOCAL is out of the question, but I think we can get a good bit > > of the benefit by either specifying -Wl,-Bsymbolic at shlib build time, > > or RTLD_DEEPBIND at dlopen() time. Either leads to the opened shared > > library effectively being put at the beginning of the search path, > > therefore avoiding the issue that an earlier loaded shared library or > > symbols from the main binary can accidentally overwrite things in the > > shared library itself. Which incidentally also makes loading a bit > > faster. > > I think this would also fix oracle_fdw crashing when postgres is > compiled with --with-ldap. At least RTLD_DEEPBIND helped. [1] > > [1] > https://www.postgresql.org/message-id/CA%2BCSw_tPDYgnzCYW0S4oU0mTUoUhZ9pc7MRBPXVD-3Zbiwni9w%40mail.gmail.com I have seen several cases of this misbinding, specifically among extensions using different versions of the same library, or where part of one extension was simply transplanted from another. Now I'm seeing a nearby case again and found this thread. Some PL libraries (plpython does, but plperl doesn't for me) require their symbols to be exported for external modules. So RTLD_GLOBAL is needed for such extensions, but not in most other cases. We need a means to indicate whether a module wants to expose its symbols or not. The extension control file doesn't seem to be the place; the most appropriate way seems to be another piece of magic data. With the attached patch, external modules are loaded with RTLD_LOCAL by default. PG_MODULE_EXPORT_SYMBOL in one of a module's source files causes it to be loaded with RTLD_GLOBAL. I suppose this is the change with the least impact on existing modules, because I believe most of them don't need to export symbols. I think this works for all platforms that have dlopen or shl_load. Windows doesn't have an equivalent, but there we should explicitly declare exports with dllexport anyway. Any opinions or thoughts? regards. 
-- Kyotaro Horiguchi NTT Open Source Software Center >From 114692655060bf74d01e0f5452e89f3bd332dfa1 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 24 Jan 2019 13:35:19 +0900 Subject: [PATCH] Add control on wheter symbols in external module is exported We load exteral modlues with RTLD_GLOBAL, which is problematic when two or more modules share the same symbol. On the other hand some modules mandates to export its symbols. This patch enables modules to decide whether to export them or not. They is defaultly hidden. plpython module is modified along with as it needs to export symbols. --- doc/src/sgml/xfunc.sgml| 8 src/backend/utils/fmgr/dfmgr.c | 13 - src/include/fmgr.h | 8 src/pl/plpython/plpy_main.c| 1 + 4 files changed, 29 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml index e18272c33a..9a65e76eda 100644 --- a/doc/src/sgml/xfunc.sgml +++ b/doc/src/sgml/xfunc.sgml @@ -1875,6 +1875,14 @@ PG_MODULE_MAGIC; (Presently, unloads are disabled and will never occur, but this may change in the future.) + + The dynamic loader loads a file as its symbols are hidden from external + modules by default. A file is loaded exporting the symbols by writhing + this in one of the module source files. 
+ +PG_MODULE_EXPORT_SYMBOL; + + diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c index 456297a531..98ac1cc438 100644 --- a/src/backend/utils/fmgr/dfmgr.c +++ b/src/backend/utils/fmgr/dfmgr.c @@ -236,7 +236,7 @@ internal_load_library(const char *libname) #endif file_scanner->next = NULL; - file_scanner->handle = dlopen(file_scanner->filename, RTLD_NOW | RTLD_GLOBAL); + file_scanner->handle = dlopen(file_scanner->filename, RTLD_NOW | RTLD_LOCAL); if (file_scanner->handle == NULL) { load_error = dlerror(); @@ -281,6 +281,17 @@ internal_load_library(const char *libname) errhint("Extension libraries are required to use the PG_MODULE_MAGIC macro."))); } + /* Check if the module wants to export symbols */ + if (dlsym(file_scanner->handle, PG_MODULE_EXPORT_SYMBOL_NAME_STRING)) + { + elog(DEBUG3, + "loaded dynamic-link library \"%s\" with RTLD_GLOBAL", libname); + /* want to export, reload it the way */ + dlclose(file_scanner->handle); + file_scanner->handle = dlopen(file_scanner->filename, + RTLD_NOW | RTLD_GLOBAL); + } + /* * If the library has a _PG_init() function, call it. */ diff --git a/src/include/fmgr.h b/src/include/fmgr.h index ead17f0e44..c072850646 100644 --- a/src/include/fmgr.h +++ b/src/include/fmgr.h @@ -456,6 +456,14 @@ PG_MAGIC_FUNCTION_NAME(void) \ } \ extern int no_such_variable +#define PG_MODULE_EXPORT_SYMBOL_NAME Pg_module_exrpot_symbols +#define PG_MODULE_EXPORT_SYMBOL_NAME_STRING "Pg_module_exrpot_symbols" + +#define PG_MODULE_EXPORT_SYMBOL \ +extern PGDLLEXPORT void PG_MODULE_EXPORT_SYMBOL
Re: RTLD_GLOBAL (& JIT inlining)
Kyotaro HORIGUCHI writes: > With the attached patch, external modules are loaded RTLD_LOCAL > by default. PG_MODULE_EXPORT_SYMBOL in a module source files let > it loaded RTLD_GLOBAL. I suppose this is the change with the > least impact on existing modules because I believe most of them > don't need to export symbols. This seems really hacky :-( In the PL cases, at least, it's not so much the PL itself that wants to export symbols as the underlying language library (libpython, libperl, etc). You're assuming that an action on the PL .so will propagate to the underlying language .so, which seems like a pretty shaky assumption. I wonder also whether closing and reopening the DL handle will really do anything at all for already-loaded libraries ... regards, tom lane
RE: speeding up planning with partitions
On Wed, Jan 23, 2019 at 1:35 AM, Amit Langote wrote: > Rebased due to the heap_open/close() -> table_open/close() change. Maybe there are not many things I can point out through reviewing the patch, so I ran a performance test against the v17 patches instead of reviewing the code. There are already a lot of tests covering the partition pruning case, and we confirmed performance improves in those cases, so this time I tested the case of accessing all partitions. I tested with master, master + 0001, master + 0001 + 0002, ..., master + 0001 + 0002 + 0003 + 0004. I ran pgbench 3 times in each test case, and the results below are the average of those runs.

[postgresql.conf]
max_parallel_workers = 0
max_parallel_workers_per_gather = 0

[partition table definitions (8192 partitions case)]
create table rt (a int, b int, c int) partition by range (a);
create table rt_1 partition of rt for values from (1) to (2);
...
create table rt_8192 partition of rt for values from (8191) to (8192);

[pgbench commands]
pgbench -n -f update.sql -T 30 postgres

[update.sql (updating partkey case)]
update rt set a = 1;

[update.sql (updating non-partkey case)]
update rt set b = 1;

[results]
updating partkey case:

part-num  master   0001     0002     0003     0004
1         8215.34  7924.99  7931.15  8407.40  8475.65
2         7137.49  7026.45  7128.84  7583.08  7593.73
4         5880.54  5896.47  6014.82  6405.33  6398.71
8         4222.96  4446.40  4518.54  4802.43  4785.82
16        2634.91  2891.51  2946.99  3085.81  3087.91
32         935.12  1125.28  1169.17  1199.44  1202.04
64         352.37   405.27   417.09   425.78   424.53
128        236.26   310.01   307.70   315.29   312.81
256         65.36    86.84    87.67    84.39    89.27
512         18.34    24.84    23.55    23.91    23.91
1024         4.83     6.93     6.51     6.45     6.49

updating non-partkey case:

part-num  master   0001     0002     0003     0004
1         8862.58  8421.49  8575.35  9843.71  10065.30
2         7715.05  7575.78  7654.28  8800.84   8720.60
4         6249.95  6321.32  6470.26  7278.14   7280.10
8         4514.82  4730.48  4823.37  5382.93   5341.10
16        2815.21  3123.27  3162.51  3422.36   3393.94
32         968.45  1702.47  1722.38  1809.89   1799.88
64         364.17   420.48   432.87   440.20    435.31
128        119.94   148.77   150.47   152.18    143.35
256         45.09    46.35    46.93    48.30     45.85
512          8.74    10.59    10.23    10.27     10.13
1024         2.28     2.60     2.56     2.57      2.51

Looking at the results, if we only apply 0001 or 0001 + 0002, and if the number of partitions is small, like 1 or 2, performance degrades compared to master (the maximum reduction is about 5%, which is 8863 -> 8421). In all other cases, performance improves compared to master. -- Yoshikazu Imai
Re: postgres_fdw: oddity in costing aggregate pushdown paths
(2019/01/23 20:35), Etsuro Fujita wrote: > Attached is an updated version of the patch. > > Changes: > * Fixed a stupid bug in the case when use_remote_estimate > * Fixed a typo in a comment I added > * Modified comments I added a little bit further > * Added the commit message > > If there are no objections, I'll push that. Pushed after fixing another typo in a comment. Best regards, Etsuro Fujita
It was quite a bit of code for very little reward. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services pg_dump-rows-per-insert-option_v10.patch Description: Binary data
Re: Protect syscache from bloating with negative cache entries
At Mon, 21 Jan 2019 17:22:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190121.172255.226467552.horiguchi.kyot...@lab.ntt.co.jp>

> An option is an additional PGPROC member and interface functions.
>
> struct PGPROC
> {
> ...
>   int syscahe_usage_track_interval; /* track interval, 0 to disable */
>
> =# select syscahce_usage_track_add(, [, ]);
> =# select syscahce_usage_track_remove(2134);
>
> Or, just provide an one-shot triggering function.
>
> =# select syscahce_take_usage_track();
>
> This can use both a similar PGPROC variable or SendProcSignal()
> but the former doesn't fire while idle time unless using timer.

The attached is a revised version of this patchset, where the third patch is the remote setting feature. It uses static shared memory.

=# select pg_backend_catcache_stats(, );

Activates or changes the catcache stats feature on the backend with PID. (The name should be changed to .._syscache_stats, though.) It is far smaller than the remote-GUC feature. (It contains a part that should be in the previous patch. I will fix it later.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 067f8ad60f259453271d2bf8323505beb5b9e0a9 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 16 Oct 2018 13:04:30 +0900 Subject: [PATCH 1/3] Remove entries that haven't been used for a certain time Catcache entries can be left alone for several reasons. It is not desirable that they eat up memory. With this patch, This adds consideration of removal of entries that haven't been used for a certain time before enlarging the hash array. 
--- doc/src/sgml/config.sgml | 38 ++ src/backend/access/transam/xact.c | 5 + src/backend/utils/cache/catcache.c| 166 -- src/backend/utils/misc/guc.c | 23 src/backend/utils/misc/postgresql.conf.sample | 2 + src/include/utils/catcache.h | 28 - 6 files changed, 254 insertions(+), 8 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b6f5822b84..af3c52b868 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1662,6 +1662,44 @@ include_dir 'conf.d' + + syscache_memory_target (integer) + + syscache_memory_target configuration parameter + + + + +Specifies the maximum amount of memory to which syscache is expanded +without pruning. The value defaults to 0, indicating that pruning is +always considered. After exceeding this size, syscache pruning is +considered according to +. If you need to keep +certain amount of syscache entries with intermittent usage, try +increase this setting. + + + + + + syscache_prune_min_age (integer) + + syscache_prune_min_age configuration parameter + + + + +Specifies the minimum amount of unused time in seconds at which a +syscache entry is considered to be removed. -1 indicates that syscache +pruning is disabled at all. The value defaults to 600 seconds +(10 minutes). The syscache entries that are not +used for the duration can be removed to prevent syscache bloat. This +behavior is suppressed until the size of syscache exceeds +. 
+ + + + max_stack_depth (integer) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 18467d96d2..dbffec8067 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -733,7 +733,12 @@ void SetCurrentStatementStartTimestamp(void) { if (!IsParallelWorker()) + { stmtStartTimestamp = GetCurrentTimestamp(); + + /* Set this timestamp as aproximated current time */ + SetCatCacheClock(stmtStartTimestamp); + } else Assert(stmtStartTimestamp != 0); } diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index 8152f7e21e..ee40093553 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -72,9 +72,24 @@ #define CACHE6_elog(a,b,c,d,e,f,g) #endif +/* + * GUC variable to define the minimum size of hash to cosider entry eviction. + * This variable is shared among various cache mechanisms. + */ +int cache_memory_target = 0; + +/* GUC variable to define the minimum age of entries that will be cosidered to + * be evicted in seconds. This variable is shared among various cache + * mechanisms. + */ +int cache_prune_min_age = 600; + /* Cache management header --- pointer is NULL until created */ static CatCacheHeader *CacheHdr = NULL; +/* Timestamp used for any operation on caches. */ +TimestampTz catcacheclock = 0; + static inline HeapTuple SearchCatCacheInternal(CatCache *cache, int nkeys, Datum v1, Datum v2, @@ -491,6 +506,7 @@ CatCacheRemoveCTup(CatCache *cache, CatCTup *ct
Analyze all plans
Hi, I'm working on an extension which analyzes all possible plans generated by the planner. I believe this extension would become useful for benchmarking the planner (e.g. the performance of the estimation and the cost model) and for better understanding the cases where the planner would make a suboptimal move.

Here are my procedures:

1. Enumerate all the plans
  1.1 Create a hook for add_path so that it won't discard the expensive paths from the planner's point of view.
  1.2 Collect all the paths from final_rel->pathlist, and turn them into PlannedStmt nodes by patching standard_planner.
  1.3 Mark the cheapest path from the planner's point of view.
2. Explain all the plans
  2.1 First explain the cheapest plan
    2.1.1 If analyzing, collect the execution time and use it to set a timer interrupt.
  2.2 Explain the remaining plans
    2.2.1 The other plans can be disastrous; the plans may never finish in a reasonable amount of time. If analyzing, the timer interrupt shall stop the executor.
    2.2.2 Move on to the next plan

Are those procedures reasonable? I'm able to implement all the steps except for 2.2.1.

- Attempt 1
Our most common way of handling timeouts is to kill the process. However, that would terminate the entire explain statement, yet there are still plans to be explained.

- Attempt 2
Fork before explain so it would be possible to kill the child process without disrupting the explain statement. However, simply allocating shared memory for the QueryDesc would not work (PANIC: failed to re-find shared lock object). To me, this plan is more doable, but it seems there exist other mechanisms I need to be aware of to execute a plan in the child process and report the result in the parent process.

What do you think? I will appreciate any feedback. Thank you, Donald Dong
yet another comment typo patch
An assorted set of comment typos is fixed in the attached patch. -- Fabien.

diff --git a/contrib/pgcrypto/pgp-decrypt.c b/contrib/pgcrypto/pgp-decrypt.c index 7d31e5354b..96c344c30b 100644 --- a/contrib/pgcrypto/pgp-decrypt.c +++ b/contrib/pgcrypto/pgp-decrypt.c @@ -132,7 +132,7 @@ pgp_parse_pkt_hdr(PullFilter *src, uint8 *tag, int *len_p, int allow_ctx) int res; uint8 *p; - /* EOF is normal here, thus we dont use GETBYTE */ + /* EOF is normal here, thus we don't use GETBYTE */ res = pullf_read(src, 1, &p); if (res < 0) return res; diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index d85a83abe9..961838ed93 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -2771,7 +2771,7 @@ estimate_path_cost_size(PlannerInfo *root, run_cost += cpu_tuple_cost * numGroups; run_cost += ptarget->cost.per_tuple * numGroups; - /* Accout for the eval cost of HAVING quals, if any */ + /* Account for the eval cost of HAVING quals, if any */ if (root->parse->havingQual) { QualCost remote_cost; diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c index fb37e2b6c4..b18ae2b3ed 100644 --- a/src/backend/access/gin/ginget.c +++ b/src/backend/access/gin/ginget.c @@ -442,7 +442,7 @@ restartScanEntry: /* * Lock the entry leaf page. This is more coarse-grained than * necessary, because it will conflict with any insertions that - * land on the same leaf page, not only the exacty key we searched + * land on the same leaf page, not only the exact key we searched * for. But locking an individual tuple would require updating * that lock whenever it moves because of insertions or vacuums, * which seems too complicated. 
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index d793efdd9c..5a7206f588 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -1873,7 +1873,7 @@ CheckOpSlotCompatibility(ExprEvalStep *op, TupleTableSlot *slot) /* * Should probably fixed at some point, but for now it's easier to allow - * buffer and heap tuples to be used interchangably. + * buffer and heap tuples to be used interchangeably. */ if (slot->tts_ops == &TTSOpsBufferHeapTuple && op->d.fetch.kind == &TTSOpsHeapTuple) diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c index d6cfd28ddc..ceea3d6d72 100644 --- a/src/backend/executor/execParallel.c +++ b/src/backend/executor/execParallel.c @@ -1049,7 +1049,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate, MemoryContextAllocZero(planstate->state->es_query_cxt, sizeof(JitInstrumentation)); combined = planstate->state->es_jit_worker_instr; - /* Accummulate all the workers' instrumentations. */ + /* Accumulate all the workers' instrumentations. */ for (n = 0; n < shared_jit->num_workers; ++n) InstrJitAgg(combined, &shared_jit->jit_instr[n]); diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 0c0891b33e..e773f20d9f 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -2243,7 +2243,7 @@ check_log_duration(char *msec_str, bool was_logged) /* * Do not log if log_statement_sample_rate = 0. Log a sample if - * log_statement_sample_rate <= 1 and avoid unecessary random() call + * log_statement_sample_rate <= 1 and avoid unnecessary random() call * if log_statement_sample_rate = 1. But don't compute any of this * unless needed. 
*/ diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c index ba28611d8c..f09e3a9aff 100644 --- a/src/backend/utils/cache/inval.c +++ b/src/backend/utils/cache/inval.c @@ -8,7 +8,7 @@ * When a tuple is updated or deleted, our standard visibility rules * consider that it is *still valid* so long as we are in the same command, * ie, until the next CommandCounterIncrement() or transaction commit. - * (See acces/heap/heapam_visibility.c, and note that system catalogs are + * (See access/heap/heapam_visibility.c, and note that system catalogs are * generally scanned under the most current snapshot available, rather than * the transaction snapshot.) At the command boundary, the old tuple stops * being valid and the new version, if any, becomes valid. Therefore, diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c index bca2dc118d..2fb7dd3b8e 100644 --- a/src/backend/utils/cache/typcache.c +++ b/src/backend/utils/cache/typcache.c @@ -151,7 +151,7 @@ typedef struct RecordCacheEntry /* * To deal with non-anonymous record types that are exchanged by backends - * involved in a parallel query, we also need a shared verion of the above. + * involved in a parallel query, we also need a shared version of the above. */ struct SharedRecordTypmodRegistry { diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pg
Re: Analyze all plans
On Wed, Jan 23, 2019 at 9:44 AM Donald Dong wrote: > > 1. Enumerate all the plans > Not sure this is going to work, because the total number of possible plans is somewhere around O(n!), if I'm not mistaken, in terms of the number of joined relations times the available access methods times the possible join strategies. So enumerating all possible plans stops being practical for even slightly complicated queries. Regards, -- Alex
Re: pg_dump multi VALUES INSERT
Hello David & Surafel, About this v10: the patch applies and compiles cleanly, local & global "make check" ok. A few comments, possibly redundant with some already in the thread. The rows-per-insert option is out of abc-order in the getopt struct. The help string does not specify the expected argument. I still think that the added rows_per_insert field is useless: ISTM that "int dump_inserts" can be reused, and you could also drop the boolean got_rows_per_insert. The feature is not tested anywhere. I still think that there should be a test on empty/small/larger-than-rows-per-insert tables, possibly added to existing TAP tests. -- Fabien.
inherited primary key misbehavior
Hi, It seems that ATExecDetachPartition misses to mark a child's primary key constraint entry (if any) as local (conislocal=true, coninhcount=0), which causes this: create table p (a int primary key) partition by list (a); create table p2 partition of p for values in (2); alter table p detach partition p2; alter table p2 drop constraint p2_pkey ; ERROR: cannot drop inherited constraint "p2_pkey" of relation "p2" select conname, conislocal, coninhcount from pg_constraint where conrelid = 'p2'::regclass; conname │ conislocal │ coninhcount ─┼┼─ p2_pkey │ f │ 1 (1 row) How about the attached patch? Thanks, Amit From 42cb636ce6acee717fa19f6b69bf40aacc402852 Mon Sep 17 00:00:00 2001 From: amit Date: Wed, 23 Jan 2019 18:33:09 +0900 Subject: [PATCH] Detach primary key constraint when detaching partition --- src/backend/commands/tablecmds.c | 11 +++ src/test/regress/expected/indexing.out | 9 + src/test/regress/sql/indexing.sql | 9 + 3 files changed, 29 insertions(+) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 28a137bb53..897f74119d 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -15094,6 +15094,7 @@ ATExecDetachPartition(Relation rel, RangeVar *name) foreach(cell, indexes) { Oid idxid = lfirst_oid(cell); + Oid constrOid; Relationidx; if (!has_superclass(idxid)) @@ -15106,6 +15107,16 @@ ATExecDetachPartition(Relation rel, RangeVar *name) IndexSetParentIndex(idx, InvalidOid); update_relispartition(classRel, idxid, false); index_close(idx, NoLock); + + /* Detach any associated constraint as well. 
*/ + if (!idx->rd_index->indisprimary) + continue; + constrOid = get_relation_idx_constraint_oid(RelationGetRelid(partRel), + idxid); + if (!OidIsValid(constrOid)) + elog(ERROR, "missing pg_constraint entry of index %u of %u", +idxid, RelationGetRelid(partRel)); + ConstraintSetParentConstraint(constrOid, InvalidOid); } table_close(classRel, RowExclusiveLock); diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out index 118f2c78df..ebf86d7bfc 100644 --- a/src/test/regress/expected/indexing.out +++ b/src/test/regress/expected/indexing.out @@ -1414,3 +1414,12 @@ DETAIL: Key (a)=(4) already exists. create unique index on covidxpart (b) include (a); -- should fail ERROR: insufficient columns in UNIQUE constraint definition DETAIL: UNIQUE constraint on table "covidxpart" lacks column "a" which is part of the partition key. +-- check that detaching a partition also detaches the primary key constraint +create table parted_pk_detach_test (a int primary key) partition by list (a); +create table parted_pk_detach_test1 partition of parted_pk_detach_test for values in (1); +alter table parted_pk_detach_test detach partition parted_pk_detach_test1; +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey; +alter table parted_pk_detach_test attach partition parted_pk_detach_test1 for values in (1); +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey;-- should fail +ERROR: cannot drop inherited constraint "parted_pk_detach_test1_pkey" of relation "parted_pk_detach_test1" +drop table parted_pk_detach_test; diff --git a/src/test/regress/sql/indexing.sql b/src/test/regress/sql/indexing.sql index d4a64c18c7..c110f0bdb8 100644 --- a/src/test/regress/sql/indexing.sql +++ b/src/test/regress/sql/indexing.sql @@ -757,3 +757,12 @@ alter table covidxpart attach partition covidxpart4 for values in (4); insert into covidxpart values (4, 1); insert into covidxpart values (4, 1); create unique index on 
covidxpart (b) include (a); -- should fail + +-- check that detaching a partition also detaches the primary key constraint +create table parted_pk_detach_test (a int primary key) partition by list (a); +create table parted_pk_detach_test1 partition of parted_pk_detach_test for values in (1); +alter table parted_pk_detach_test detach partition parted_pk_detach_test1; +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey; +alter table parted_pk_detach_test attach partition parted_pk_detach_test1 for values in (1); +alter table parted_pk_detach_test1 drop constraint parted_pk_detach_test1_pkey;-- should fail +drop table parted_pk_detach_test; -- 2.11.0
Re: Proposal for Signal Detection Refactoring
Attached is a new signal handling patch. The patch is corrected and moved. The documentation is slightly streamlined in some places and expanded in others. Feedback requested. -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin signal_readme_patch.diff Description: Binary data
Almost bug in COPY FROM processing of GB18030 encoded input
Hi, I happened to notice that when CopyReadLineText() calls mblen(), it passes only the first byte of the multi-byte characters. However, pg_gb18030_mblen() looks at the first and the second byte. CopyReadLineText() always passes \0 as the second byte, so pg_gb18030_mblen() will incorrectly report the length of 4-byte encoded characters as 2. It works out fine, though, because the second half of the 4-byte encoded character always looks like another 2-byte encoded character, in GB18030. CopyReadLineText() is looking for delimiter and escape characters and newlines, and only single-byte characters are supported for those, so treating a 4-byte character as two 2-byte characters is harmless. Attached is a patch to explain that in the comments. Grepping for mblen(), I didn't find any other callers that used mblen() like that. - Heikki >From b0a09c240a2b4d1120433828ca052a2542ff362d Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 23 Jan 2019 13:04:05 +0200 Subject: [PATCH 1/1] Fix comments that claimed that mblen() only looks at first byte. GB18030's mblen() function looks at the first and the second byte of the multibyte character, to determine its length. copy.c had made the assumption that mblen() only looks at the first byte, but it turns out to work out fine, because of the way the GB18030 encoding works. COPY will see a 4-byte encoded character as two 2-byte encoded characters, which is enough for COPY's purposes. It cannot mix those up with delimiter or escaping characters, because only single-byte characters are supported as delimiters or escape characters. 
--- src/backend/commands/copy.c | 7 ++- src/backend/utils/mb/wchar.c | 32 +--- 2 files changed, 31 insertions(+), 8 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index a61a628471..5074ce0daa 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -4073,9 +4073,14 @@ not_end_of_copy: { int mblen; + /* + * It is enough to look at the first byte in all our encodings, to + * get the length. (GB18030 is a bit special, but still works for + * our purposes; see comment in pg_gb18030_mblen()) + */ mblen_str[0] = c; - /* All our encodings only read the first byte to get the length */ mblen = pg_encoding_mblen(cstate->file_encoding, mblen_str); + IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(mblen - 1); IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1); raw_buf_ptr += mblen - 1; diff --git a/src/backend/utils/mb/wchar.c b/src/backend/utils/mb/wchar.c index a5fdda456e..8e5116dfc1 100644 --- a/src/backend/utils/mb/wchar.c +++ b/src/backend/utils/mb/wchar.c @@ -15,16 +15,23 @@ /* - * conversion to pg_wchar is done by "table driven." - * to add an encoding support, define mb2wchar_with_len(), mblen(), dsplen() - * for the particular encoding. Note that if the encoding is only - * supported in the client, you don't need to define - * mb2wchar_with_len() function (SJIS is the case). + * Operations on multi-byte encodings are driven by a table of helper + * functions. + * + * To add an encoding support, define mblen(), dsplen() and verifier() for + * the encoding. For server-encodings, also define mb2wchar() and wchar2mb() + * conversion functions. * * These functions generally assume that their input is validly formed. * The "verifier" functions, further down in the file, have to be more - * paranoid. We expect that mblen() does not need to examine more than - * the first byte of the character to discover the correct length. + * paranoid. 
+ * + * We expect that mblen() does not need to examine more than the first byte + * of the character to discover the correct length. GB18030 is an exception + * to that rule, though, as it also looks at the second byte. But even that + * behaves in a predictable way, if you only pass the first byte: it will + * treat 4-byte encoded characters as two 2-byte encoded characters, which is + * good enough for all current uses. * * Note: for the display output of psql to work properly, the return values * of the dsplen functions must conform to the Unicode standard. In particular @@ -1073,6 +1080,17 @@ pg_uhc_dsplen(const unsigned char *s) * GB18030 * Added by Bill Huang , */ + +/* + * Unlike all other mblen() functions, this also looks at the second byte of + * the input. However, if you only pass the first byte of a multi-byte + * string, and \0 as the second byte, this still works in a predictable way: + * a 4-byte character will be reported as two 2-byte characters. That's + * enough for all current uses, as a client-only encoding. It works that + * way, because in any valid 4-byte GB18030-encoded character, the third and + * fourth byte look like a 2-byte encoded character, when looked at + * separately. + */ static int pg_gb18030_mblen(const unsigned char *s) { -- 2.20.1
Re: postgres_fdw: oddity in costing aggregate pushdown paths
(2019/01/04 20:33), Etsuro Fujita wrote: > Here is a new version of the patch. Attached is an updated version of the patch. Changes: * Fixed a stupid bug in the case when use_remote_estimate * Fixed a typo in a comment I added * Modified comments I added a little bit further * Added the commit message If there are no objections, I'll push that. Best regards, Etsuro Fujita >From 3ff65a6f05a330140bf5de94fbcf6ab2a04f040b Mon Sep 17 00:00:00 2001 From: Etsuro Fujita Date: Wed, 23 Jan 2019 20:30:30 +0900 Subject: [PATCH] postgres_fdw: Account for tlist eval costs in estimate_path_cost_size() Previously, estimate_path_cost_size() didn't account for tlist eval costs, except when costing a foreign-grouping path using local statistics, but such costs should be accounted for when costing that path using remote estimates, because some of the tlist expressions might be evaluated locally. Also, such costs should be accounted for in the case of a foreign-scan or foreign-join path, because currently, postgres_fdw evaluates PlaceHolderVars (if any) locally. This also fixes an oversight in my commit f8f6e44676ef38fee7a5bbe4f256a34ea7799ac1. Like that commit, apply this to HEAD only to avoid destabilizing existing plan choices. 
Author: Etsuro Fujita Discussion: https://postgr.es/m/5BFD3EAD.2060301%40lab.ntt.co.jp --- contrib/postgres_fdw/postgres_fdw.c | 41 +++-- 1 file changed, 33 insertions(+), 8 deletions(-) diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index d85a83abe9..f460fd2ec4 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -2499,6 +2499,9 @@ estimate_path_cost_size(PlannerInfo *root, Cost total_cost; Cost cpu_per_tuple; + /* Make sure the core code has set up the relation's reltarget */ + Assert(foreignrel->reltarget); + /* * If the table or the server is configured to use remote estimates, * connect to the foreign server and execute EXPLAIN to estimate the @@ -2576,6 +2579,24 @@ estimate_path_cost_size(PlannerInfo *root, cost_qual_eval(&local_cost, local_param_join_conds, root); startup_cost += local_cost.startup; total_cost += local_cost.per_tuple * retrieved_rows; + + /* + * Add in tlist eval cost for each output row. In case of an + * aggregate, some of the tlist expressions such as grouping + * expressions will be evaluated remotely, so adjust the costs. 
+ */ + startup_cost += foreignrel->reltarget->cost.startup; + total_cost += foreignrel->reltarget->cost.startup; + total_cost += foreignrel->reltarget->cost.per_tuple * rows; + if (IS_UPPER_REL(foreignrel)) + { + QualCost tlist_cost; + + cost_qual_eval(&tlist_cost, fdw_scan_tlist, root); + startup_cost -= tlist_cost.startup; + total_cost -= tlist_cost.startup; + total_cost -= tlist_cost.per_tuple * rows; + } } else { @@ -2674,19 +2695,19 @@ estimate_path_cost_size(PlannerInfo *root, nrows = clamp_row_est(nrows * fpinfo->joinclause_sel); run_cost += nrows * remote_conds_cost.per_tuple; run_cost += fpinfo->local_conds_cost.per_tuple * retrieved_rows; + + /* Add in tlist eval cost for each output row */ + startup_cost += foreignrel->reltarget->cost.startup; + run_cost += foreignrel->reltarget->cost.per_tuple * rows; } else if (IS_UPPER_REL(foreignrel)) { PgFdwRelationInfo *ofpinfo; - PathTarget *ptarget = foreignrel->reltarget; AggClauseCosts aggcosts; double input_rows; int numGroupCols; double numGroups = 1; - /* Make sure the core code set the pathtarget. */ - Assert(ptarget != NULL); - /* * This cost model is mixture of costing done for sorted and * hashed aggregates in cost_agg(). We are not sure which @@ -2750,26 +2771,22 @@ estimate_path_cost_size(PlannerInfo *root, * Startup cost includes: * 1. Startup cost for underneath input relation * 2. Cost of performing aggregation, per cost_agg() - * 3. Startup cost for PathTarget eval *- */ startup_cost = ofpinfo->rel_startup_cost; startup_cost += aggcosts.transCost.startup; startup_cost += aggcosts.transCost.per_tuple * input_rows; startup_cost += (cpu_operator_cost * numGroupCols) * input_rows; - startup_cost += ptarget->cost.startup; /*- * Run time cost includes: * 1. Run time cost of underneath input relation * 2. Run time cost of performing aggregation, per cost_agg() - * 3. 
PathTarget eval cost for each output row *- */ run_cost = ofpinfo->rel_total_cost - ofpinfo->rel_startup_cost; run_cost += aggcosts.finalCost * numGroups; run_cost += cpu_tuple_cost * numGroups; - run_cost += ptarget->cost.per_tuple * numGroups; /* Accout for the eval cost of HAVING quals, if any */ if (root->parse->havingQual) @@ -2784,6 +2801,10 @@ estimate_path_cost_size(PlannerInfo *root, startup_cost += fpinfo->local_conds_cost.startup; run_cost +
Re: yet another comment typo patch
On Wed, Jan 23, 2019 at 2:18 PM Fabien COELHO wrote: > > > An assorted set of command typos is fixed in the attached patch. > The patch looks good to me. I will take another pass over it and commit. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: yet another comment typo patch
On 23/01/2019 13:51, Amit Kapila wrote: On Wed, Jan 23, 2019 at 2:18 PM Fabien COELHO wrote: An assorted set of command typos is fixed in the attached patch. The patch looks good to me. I will take another pass over it and commit. I just committed this. Sorry, I didn't see your mail until just afterwards. - Heikki
Re: yet another comment typo patch
On Wed, Jan 23, 2019 at 5:28 PM Heikki Linnakangas wrote: > > On 23/01/2019 13:51, Amit Kapila wrote: > > On Wed, Jan 23, 2019 at 2:18 PM Fabien COELHO wrote: > >> > >> > >> An assorted set of command typos is fixed in the attached patch. > > > > The patch looks good to me. I will take another pass over it and commit. > > I just committed this. Sorry, I didn't see your mail until just afterwards. > No problem, thanks! -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Sun, Jan 20, 2019 at 5:19 AM John Naylor wrote: > > I have a test for in-range and out-of-range for each relation fork. > I think the first two patches (a) removal of dead code in bootstrap and (b) the core patch to avoid creation of FSM file for the small table are good now. I have prepared the patches along with commit message. There is no change except for some changes in README and commit message of the second patch. Kindly let me know what you think about them. I think these two patches can go even without the upgrade patch (during pg_upgrade, conditionally skip transfer of FSMs.) which is still under discussion. However, I am not in a hurry if you or others think that the upgrade patch must be committed along with the second patch. I think the upgrade patch is generally on track but might need some more review. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v02-0001-In-bootstrap-mode-don-t-allow-the-creation-of-files-.patch Description: Binary data v17-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch Description: Binary data
Re: postgres on a non-journaling filesystem
On 23/01/2019 01:03, maayan mordehai wrote: hello, I'm Maayan, I'm in a DBA team that uses postgresql. I saw in the documentation on wals: https://www.postgresql.org/docs/10/wal-intro.html In the tip box that, it's better not to use a journaling filesystem. and I wanted to ask how it works? can't we get corruption that we can't recover from? I mean what if postgres in the middle of a write to a wal and there is a crash, and it didn't finish. I'm assuming it will detect it when we will start postgres and write that it was rolled back, am I right? Yep, any half-written transactions will be rolled back. and how does it work in the data level? if some of the 8k block is written but not all of it, and then there is a crash, how postgres deals with it? The first time a block is modified after a checkpoint, a copy of the block is written to the WAL. At crash recovery, the block is restored from the WAL. This mechanism is called "full page writes". The WAL works just like the journal in a journaling filesystem. That's why it's not necessary to have journaling at the filesystem level. - Heikki
Re: pg_dump multi VALUES INSERT
On Wed, 23 Jan 2019 at 22:08, Fabien COELHO wrote: > Out of abc-order rows-per-inserts option in getopt struct. I missed that. Thanks. Fixed in attached. > help string does not specify the expected argument. Also fixed. > I still think that the added rows_per_insert field is useless, ISTM That > "int dump_inserts" can be reused, and you could also drop boolean > got_rows_per_insert. I thought about this and looked into it, but I decided it didn't look like a smart thing to do. The reason is that if --inserts sets dump_inserts to 1 then --rows-per-insert sets it to something else that's likely fine, but if that happens in the opposite order then the --rows-per-insert gets overwritten with 1. The bad news is the order that happens is defined by the order of the command line args. It might be possible to make it work by having --inserts set some other variable, then set dump_inserts to 1 if it's set to 0 and the other variable is set to >= 1... but that requires another variable, which is what you want to avoid... I think it's best to have a variable per argument. I could get rid of the got_rows_per_insert variable, but it would require setting the default value for rows_per_insert in the main() function rather than in InitDumpOptions(). I thought InitDumpOptions() looked like just the place to do this, so went with that option. To make it work without got_rows_per_insert, rows_per_insert would have to be 0 by default and we'd know we saw a --rows-per-insert command line arg by the fact that rows_per_insert was non-zero. Would you rather have it that way? > The feature is not tested anywhere. I still think that there should be a > test on empty/small/larger-than-rows-per-insert tables, possibly added to > existing TAP-tests. I was hoping to get away with not having to do that... mainly because I've no idea how. Please have at it if you know how it's done. 
FWIW I looked at 002_pg_dump.pl and did add a test, but I was unable to make it pass because I couldn't really figure out how the regex matching is meant to work. It does not seem to be explained very well in the comments and I lack patience for Perl. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services pg_dump-rows-per-insert-option_v11.patch Description: Binary data
Re: postgres on a non-journaling filesystem
Thank you!! On Wed, Jan 23, 2019, 2:20 PM Heikki Linnakangas On 23/01/2019 01:03, maayan mordehai wrote: > > hello, > > > > I'm Maayan, I'm in a DBA team that uses postgresql. > > I saw in the documentation on wals: > > https://www.postgresql.org/docs/10/wal-intro.html > > In the tip box that, it's better not to use a journaling filesystem. > and I > > wanted to ask how it works? > > can't we get corruption that we can't recover from? > > I mean what if postgres in the middle of a write to a wal and there is a > > crash, and it didn't finish. > > I'm assuming it will detect it when we will start postgres and write that > > it was rolled back, am I right? > > Yep, any half-written transactions will be rolled back. > > > and how does it work in the data level? if some of the 8k block is > written > > but not all of it, and then there is a crash, how postgres deals with it? > > The first time a block is modified after a checkpoint, a copy of the > block is written to the WAL. At crash recovery, the block is restored > from the WAL. This mechanism is called "full page writes". > > The WAL works just like the journal in a journaling filesystem. That's > why it's not necessary to have journaling at the filesystem level. > > - Heikki >
Re: Analyze all plans
Oleksandr Shulgin writes: > On Wed, Jan 23, 2019 at 9:44 AM Donald Dong wrote: >> 1. Enumerate all the plans > So enumerating all possible plans stops being practical for even slightly > complicated queries. Yes. You can *not* disable the planner's aggressive pruning of losing paths and subpaths without ending up with something that's completely impractical for anything beyond toy queries. That's just from the standpoint of planner runtime. Adding on the cost of actually creating a finished plan and then running it for long enough to get a reliable runtime for each case would ... well, let's just say you'd be safely dead before you got any interesting answers. regards, tom lane
Re: ArchiveEntry optional arguments refactoring
> On 2019-01-17 09:29:04 -0800, Andres Freund wrote: > On 2019-01-17 10:23:39 -0500, Tom Lane wrote: > > I don't buy the argument that this would move the goalposts in terms > > of how much work it is to add a new argument. You'd still end up > > touching every call site. > > Why? A lot of arguments that'd be potentially added or removed would not > be set by each callsite. > > If you e.g. look at > > you can see that a lot of changes were like > ArchiveEntry(fout, nilCatalogId, createDumpId(), > "pg_largeobject", NULL, NULL, "", > -false, "pg_largeobject", SECTION_PRE_DATA, > +"pg_largeobject", SECTION_PRE_DATA, > loOutQry->data, "", NULL, > NULL, 0, > NULL, NULL); > > i.e. just removing a 'false' argument. In like 70+ callsites. With the > above scheme, we'd instead just have removed a single .withoids = true, > from dumpTableSchema()'s ArchiveEntry() call. To make this discussion a bit more specific, I've created a patch showing how it could look. All the arguments except Archive, CatalogId and DumpId I've moved into the ArchiveOpts structure. Not all of them could be empty before, but anyway it seems better for consistency and readability. Some of the arguments had an empty string as a default value; I haven't changed anything here yet (although this mixture of NULL and "" in ArchiveEntry looks a bit confusing). As Andres mentioned above, for 578b229718e / the WITH OID removal and pg_dump modification from the pluggable storage thread, this patch reduces the number of changes related to ArchiveEntry from 70+ to just one. 0001-ArchiveOpts-structure.patch Description: Binary data
Re: pg_dump multi VALUES INSERT
Hello David,

> I thought about this and looked into it, but I decided it didn't look
> like a smart thing to do. The reason is that if --inserts sets
> dump_inserts to 1 then --rows-per-insert sets it to something else
> that's likely fine, but if that happens in the opposite order then the
> --rows-per-insert gets overwritten with 1.

You can test before doing that!

    case X:
        if (opt.dump_inserts == 0)
            opt.dump_inserts = 1;
        // otherwise option is already set

> The bad news is the order that happens is defined by the order of the
> command line args. It might be possible to make it work by having
> --inserts set some other variable,

ISTM that it is enough to test whether the variable is zero.

> then set dump_inserts to 1 if it's set to 0 and the other variable is
> set to >= 1... but that requires another variable, which is what you
> want to avoid...

I still do not understand the need for another variable.

    int ninserts = 0; // default is to use copy
    while (getopt...)
    {
        switch (...)
        {
            case "--inserts":
                if (ninserts == 0)
                    ninserts = 1;
                break;
            case "--rows-per-insert":
                ninserts = arg_value;
                checks...
                break;
            ...

> I think it's best to have a variable per argument.

I disagree, because it adds complexity where none is needed: here the new option is an extension of a previous one, thus the previous one just becomes a particular case, so it seems simpler to manage it as the particular case it is rather than a special case, creating the need for checking the consistency and so if two variables are used.

> I could get rid of the got_rows_per_insert variable, but it would
> require setting the default value for rows_per_insert in the main()
> function rather than in InitDumpOptions(). I thought InitDumpOptions()
> looked like just the place to do this, so went with that option. To
> make it work without got_rows_per_insert, rows_per_insert would have to
> be 0 by default and we'd know we saw a --rows-per-insert command line
> arg by the fact that rows_per_insert was non-zero. Would you rather
> have it that way?
Yep, esp as rows_per_insert & dump_inserts could be the same.

>> The feature is not tested anywhere. I still think that there should be
>> a test on empty/small/larger-than-rows-per-insert tables, possibly
>> added to existing TAP-tests.

> I was hoping to get away with not having to do that... mainly because
> I've no idea how.

Hmmm. That is another question! Maybe someone will help.

-- Fabien.
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 23, 2019 at 7:09 AM Amit Kapila wrote: > I think the first two patches (a) removal of dead code in bootstrap > and (b) the core patch to avoid creation of FSM file for the small > table are good now. I have prepared the patches along with commit > message. There is no change except for some changes in README and commit > message of the second patch. Kindly let me know what you think > about them. Good to hear! The additional language is fine. In "Once the FSM is created for heap", I would just change that to "...for a heap". > I think these two patches can go even without the upgrade patch > (during pg_upgrade, conditionally skip transfer of FSMs.) which is > still under discussion. However, I am not in a hurry if you or others > think that the upgrade patch must be committed along with the second > patch. I think the upgrade patch is generally on track but > might need some more review. The pg_upgrade piece is a nice-to-have feature and not essential, so can go in later. Additional review is also welcome. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: postgres on a non-journaling filesystem
On 2019-01-23 14:20:52 +0200, Heikki Linnakangas wrote: > On 23/01/2019 01:03, maayan mordehai wrote: > > hello, > > > > I'm Maayan, I'm in a DBA team that uses postgresql. > > I saw in the documentation on wals: > > https://www.postgresql.org/docs/10/wal-intro.html > > In the tip box that, it's better not to use a journaling filesystem. and I > > wanted to ask how it works? > > can't we get corruption that we can't recover from? > > I mean what if postgres in the middle of a write to a wal and there is a > > crash, and it didn't finish. > > I'm assuming it will detect it when we will start postgres and write that > > it was rolled back, am I right? > > Yep, any half-written transactions will be rolled back. > > > and how does it work in the data level? if some of the 8k block is written > > but not all of it, and then there is a crash, how postgres deals with it? > > The first time a block is modified after a checkpoint, a copy of the block > is written to the WAL. At crash recovery, the block is restored from the > WAL. This mechanism is called "full page writes". > > The WAL works just like the journal in a journaling filesystem. That's why > it's not necessary to have journaling at the filesystem level. But note not having journaling on the FS level often makes the OS start *painfully* slowly after a crash, because fsck or similar will be run. And that's often necessary for the internal FS consistency. Note that even with journaling enabled, most filesystems by default don't journal data, so you can get those partial writes anyway. Greetings, Andres Freund
Re: Typo: llvm*.cpp files identified as llvm*.c
Hi, On 2019-01-23 12:01:10 +0900, Michael Paquier wrote: > On Tue, Jan 22, 2019 at 05:49:46PM -0800, Andres Freund wrote: > > On 2019-01-23 14:43:15 +1300, Thomas Munro wrote: > >> The function name comments are similar, though less consistent so I'm > >> too lazy to write a script to find one that is actually wrong (with > >> which to trigger Andres's let's-delete-them-all response :-D). > > I am not sure if anybody uses them for anything automatically, still I > find myself from time to time looking at them to remember on which > path the file is located when opened in emacs. I often want to know that too - isn't it easier to just meta-x f and see the directory displayed? Because that doesn't require jumping to the head of the file... Greetings, Andres Freund
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
On 10/12/2018 23:37, Dmitry Dolgov wrote: On Thu, Nov 29, 2018 at 6:48 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: On Tue, Oct 2, 2018 at 4:53 AM Michael Paquier wrote: On Mon, Aug 06, 2018 at 06:00:54PM +0900, Yoshimi Ichiyanagi wrote: The libpmem's pmem_map_file() supported 2M/1G(the size of huge page) alignment, since it could reduce the number of page faults. In addition, libpmem's pmem_memcpy_nodrain() is the function to copy data using single instruction, multiple data(SIMD) instructions and NT store instructions(MOVNT). As a result, using these APIs is faster than using old mmap()/memcpy(). Please see the PGCon2018 presentation[1] for the details. [1] https://www.pgcon.org/2018/schedule/attachments/507_PGCon2018_Introducing_PMDK_into_PostgreSQL.pdf So you say that this represents a 3% gain based on the presentation? That may be interesting to dig into it. Could you provide fresher performance numbers? I am moving this patch to the next CF 2018-10 for now, waiting for input from the author. Unfortunately, the patch has some conflicts now, so probably not only fresher performance numbers are necessary, but also a rebased version. I believe the idea behind this patch is quite important (thanks to CMU DG for inspiring lectures), so I decided to put some efforts and rebase it to prevent from rotting. At the same time I have a vague impression that the patch itself suggests quite narrow way of using of PMDK. Thanks. To re-iterate what I said earlier in this thread, I think the next step here is to write a patch that modifies xlog.c to use plain old mmap()/msync() to memory-map the WAL files, to replace the WAL buffers. Let's see what the performance of that is, with or without NVM hardware. I think that might actually make the code simpler. There's a bunch of really hairy code around locking the WAL buffers, which could be made simpler if each backend memory-mapped the WAL segment files independently. 
One thing to watch out for, is that if you read() a file, and there's an I/O error, you have a chance to ereport() it. If you try to read from a memory-mapped file, and there's an I/O error, the process is killed with SIGBUS. So I think we have to be careful with using memory-mapped I/O for reading files. But for writing WAL files, it seems like a good fit. Once we have a reliable mmap()/msync() implementation running, it should be straightforward to change it to use MAP_SYNC and the special CPU instructions for the flushing. - Heikki
Re: ArchiveEntry optional arguments refactoring
Hi, On 2019-01-23 16:12:15 +0100, Dmitry Dolgov wrote: > To make this discussion a bit more specific, I've created a patch showing > how it could look. Thanks. > All the arguments, except Archive, CatalogId and DumpId I've moved > into the ArchiveOpts structure. Not all of them could be empty before, but > anyway it seems better for consistency and readability. Some of the arguments > had empty string as a default value, I haven't changed anything here yet > (although this mixture of NULL and "" in ArchiveEntry looks a bit confusing). Probably worth changing at the same time, if we decide to go for it. To me this does look like it'd be more maintainable going forward. Greetings, Andres Freund
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
Hi, On 2019-01-23 18:45:42 +0200, Heikki Linnakangas wrote: > To re-iterate what I said earlier in this thread, I think the next step here > is to write a patch that modifies xlog.c to use plain old mmap()/msync() to > memory-map the WAL files, to replace the WAL buffers. Let's see what the > performance of that is, with or without NVM hardware. I think that might > actually make the code simpler. There's a bunch of really hairy code around > locking the WAL buffers, which could be made simpler if each backend > memory-mapped the WAL segment files independently. > > One thing to watch out for, is that if you read() a file, and there's an I/O > error, you have a chance to ereport() it. If you try to read from a > memory-mapped file, and there's an I/O error, the process is killed with > SIGBUS. So I think we have to be careful with using memory-mapped I/O for > reading files. But for writing WAL files, it seems like a good fit. > > Once we have a reliable mmap()/msync() implementation running, it should be > straightforward to change it to use MAP_SYNC and the special CPU > instructions for the flushing. FWIW, I don't think we should go there as the sole implementation. I'm fairly convinced that we're going to need to go to direct-IO in more cases here, and that'll not work well with mmap. I think this'd be a worthwhile experiment, but I'm doubtful it'd end up simplifying our code. Greetings, Andres Freund
Re: Typo: llvm*.cpp files identified as llvm*.c
Andres Freund writes: > On 2019-01-23 12:01:10 +0900, Michael Paquier wrote: >> I am not sure if anybody uses them for anything automatically, still I >> find myself from time to time looking at them to remember on which >> path the file is located when opened in emacs. > I often want to know that too - isn't it easier to just meta-x f and see > the directory displayed? Because that doesn't require jumping to the > head of the file... My own workflow often involves editing copies that are not in the original directory, so that answer doesn't work for me. In general it won't work anytime you are looking at a file out-of-context. regards, tom lane
Re: ArchiveEntry optional arguments refactoring
Hello On 2019-Jan-23, Andres Freund wrote: > > All the arguments, except Archive, CatalogId and DumpId I've moved > > into the ArchiveOpts structure. Not all of them could be empty before, but > > anyway it seems better for consistency and readability. Some of the > > arguments > > had empty string as a default value, I haven't changed anything here yet > > (although this mixture of NULL and "" in ArchiveEntry looks a bit > > confusing). > > Probably worth changing at the same time, if we decide to go for it. > > To me this does look like it'd be more maintainable going forward. It does. How does pgindent behave with it? I'd use ArchiveEntryOpts as struct name; ArchiveOpts sounds wrong. Also, the struct members could use better names -- "defn" for example could perhaps be "createStmt" (to match dropStmt/copyStmt), and expand "desc" to "description". -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ArchiveEntry optional arguments refactoring
On 1/23/19 10:12 AM, Dmitry Dolgov wrote: > To make this discussion a bit more specific, I've created a patch showing > how it could look. A little bit of vararg-macro action can make such a design look even tidier, cf. [1]. Or are compilers without vararg macros still in the supported mix? -Chap [1] https://github.com/NetBSD/src/blob/trunk/sys/sys/midiio.h#L709 The macros in [1] are not defined to create a function call, but only the argument structure because there might be several functions to pass it to, so a call would be written like func(&SEQ_MK_CHN(NOTEON, ...)). In ArchiveEntry's case, if there's only one function involved, there'd be no reason not to have a macro produce the whole call.
Re: ArchiveEntry optional arguments refactoring
Chapman Flack writes: > Or are compilers without vararg macros still in the supported mix? No. regards, tom lane