Re: Assertion failure when ATTACH partition followed by CREATE PARTITION.

2020-10-26 Thread Amul Sul
Hi,

On Mon, Oct 19, 2020 at 4:58 PM Amul Sul  wrote:
>
> Hi,
>
> Assertion added in commits 6b2c4e59d016 is failing with following test:
>
> CREATE TABLE sales
> (
>   prod_id   int,
>   prod_quantity int,
>   sold_month    date
> ) PARTITION BY RANGE(sold_month);
>
> CREATE TABLE public.sales_p1 PARTITION OF public.sales
> FOR VALUES FROM (MINVALUE) TO ('2019-02-15');
>
> CREATE TABLE sales_p2(like sales including all);
> ALTER TABLE sales ATTACH PARTITION sales_p2
> FOR VALUES FROM ('2019-02-15') TO ('2019-03-15');
>
> CREATE TABLE fail PARTITION OF public.sales
> FOR VALUES FROM ('2019-01-15') TO ('2019-02-15');
>

The reported issue has nothing to do with the ATTACH PARTITION statement; it
is also reproducible with the following CREATE statements:

CREATE TABLE sales
(
  prod_id   int,
  prod_quantity int,
  sold_month    date
) PARTITION BY RANGE(sold_month);

CREATE TABLE sales_p1 PARTITION OF sales
FOR VALUES FROM (MINVALUE) TO ('2019-02-15');

CREATE TABLE sales_p2 PARTITION OF sales
FOR VALUES FROM ('2019-02-15') TO ('2019-03-15');

CREATE TABLE fail PARTITION OF sales
FOR VALUES FROM ('2019-01-15') TO ('2019-02-15');
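-- "fail" overlaps sales_p1, whose range (MINVALUE, '2019-02-15') already
-- covers ('2019-01-15', '2019-02-15'); the expected outcome here is an
-- overlap error, not an assertion failure.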

AFAICU, the assumption behind the assert is not correct. In the attached
patch, I have removed that assert and the related comment, and made minor
adjustments so that the code fetches the correct datum.

Regards,
Amul

>
> Here is the backtrace:
>
> (gdb) bt
> #0  0x7fa37277 in raise () from /lib64/libc.so.6
> #1  0x7fa373334968 in abort () from /lib64/libc.so.6
> #2  0x00abecdc in ExceptionalCondition (conditionName=0xc5de6d
> "cmpval >= 0", errorType=0xc5cf03 "FailedAssertion", fileName=0xc5d03e
> "partbounds.c", lineNumber=3092) at assert.c:69
> #3  0x0086189c in check_new_partition_bound
> (relname=0x7fff225f5ef0 "fail", parent=0x7fa3744868a0, spec=0x2e9,
> pstate=0x2e905e8) at partbounds.c:3092
> #4  0x006b44dc in DefineRelation (stmt=0x2e83198, relkind=114
> 'r', ownerId=10, typaddress=0x0, queryString=0x2dc07c0 "CREATE TABLE
> fail PARTITION OF public.sales \nFOR VALUES FROM ('2019-01-15') TO
> ('2019-02-15');") at tablecmds.c:1011
> #5  0x00941430 in ProcessUtilitySlow (pstate=0x2e83080,
> pstmt=0x2dc19b8, queryString=0x2dc07c0 "CREATE TABLE fail PARTITION OF
> public.sales \nFOR VALUES FROM ('2019-01-15') TO ('2019-02-15');",
> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
> dest=0x2dc1aa8, qc=0x7fff225f67c0) at utility.c:1163
> #6  0x0094123e in standard_ProcessUtility (pstmt=0x2dc19b8,
> queryString=0x2dc07c0 "CREATE TABLE fail PARTITION OF public.sales
> \nFOR VALUES FROM ('2019-01-15') TO ('2019-02-15');",
> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
> dest=0x2dc1aa8, qc=0x7fff225f67c0) at utility.c:1071
> #7  0x00940349 in ProcessUtility (pstmt=0x2dc19b8,
> queryString=0x2dc07c0 "CREATE TABLE fail PARTITION OF public.sales
> \nFOR VALUES FROM ('2019-01-15') TO ('2019-02-15');",
> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
> dest=0x2dc1aa8, qc=0x7fff225f67c0) at utility.c:524
> #8  0x0093f163 in PortalRunUtility (portal=0x2e22ab0,
> pstmt=0x2dc19b8, isTopLevel=true, setHoldSnapshot=false,
> dest=0x2dc1aa8, qc=0x7fff225f67c0) at pquery.c:1159
> #9  0x0093f380 in PortalRunMulti (portal=0x2e22ab0,
> isTopLevel=true, setHoldSnapshot=false, dest=0x2dc1aa8,
> altdest=0x2dc1aa8, qc=0x7fff225f67c0) at pquery.c:1305
> #10 0x0093e882 in PortalRun (portal=0x2e22ab0,
> count=9223372036854775807, isTopLevel=true, run_once=true,
> dest=0x2dc1aa8, altdest=0x2dc1aa8, qc=0x7fff225f67c0) at pquery.c:779
> #11 0x009389e8 in exec_simple_query (query_string=0x2dc07c0
> "CREATE TABLE fail PARTITION OF public.sales \nFOR VALUES FROM
> ('2019-01-15') TO ('2019-02-15');") at postgres.c:1239
>
> Regards,
> Amul Sul
From 983c950ec26ef88127c1c349f6fa34f754a239f0 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Tue, 27 Oct 2020 00:45:35 -0400
Subject: [PATCH] Fix assertion.

---
 src/backend/partitioning/partbounds.c  | 10 +++---
 src/test/regress/expected/create_table.out |  4 
 src/test/regress/sql/create_table.sql  |  1 +
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 66c42b58986..c8a42b4970d 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3061,7 +3061,7 @@ check_new_partition_bound(char *relname, Relation parent,
  * datums list.
  */
 PartitionRangeDatum *datum =
-list_nth(spec->upperdatums, -cmpval - 1);
+list_nth(spec->upperdatums, Abs(cmpval) - 1);
 
 /*
  * The new partition overlaps with the
@@ -3084,14 +3084,10 @@ check_new_partition_bound(char *relname, Relation parent,
 
 		/*
 		 * Fetch the problematic key from the lower datums
-		 * list.  Given the way partition_range_bsearch()
-		 * works, 

Re: PATCH: Report libpq version and configuration

2020-10-26 Thread Craig Ringer
On Tue, Oct 27, 2020 at 12:56 AM Tom Lane  wrote:
>
> Alvaro Herrera  writes:
> > On 2020-Oct-26, Craig Ringer wrote:
> >> also adds PQlibInfoPrint() which dumps PQlibInfo() keys/values to stdout.
>
> > Sounds useful. I'd have PQlibInfoPrint(FILE *) instead, so you can pass
> > stdout or whichever fd you want.
>
> +1.  Are we concerned about translatability of these strings?  I think
> I'd vote against, as it would complicate applications, but it's worth
> thinking about it now not later.


It's necessary not to translate the key names; they are identifiers,
not descriptive text. I don't object to having translations too, but
the translation teams have quite enough to do already with user-facing
text that will get regularly seen. So while it'd be potentially
interesting to expose translated versions too, I'm not entirely
convinced. It's a bit like translating macro names. You could, but ...
why?

> >> Patch 0002 exposes LIBPQ_VERSION_STR, LIBPQ_VERSION_NUM and
> >> LIBPQ_CONFIGURE_ARGS symbols in the dynamic symbol table. These can be
> >> accessed by a debugger even when the library cannot be loaded or executed,
> >> and unlike macros are available even in a stripped executable. So they can
> >> be used to identify a libpq binary found in the wild. Their storage is
> >> shared with PQlibInfo()'s static data, so they only cost three symbol table
> >> entries.
>
> > Interesting.  Is this real-world useful?
>
> -1, I think this is making way too many assumptions about the content
> and format of a shlib.


I'm not sure I understand what assumptions you're concerned about or
their consequences. On any ELF it should be just fine, and Mach-O
should be too. I do need to check that MSVC generates direct symbols
for WIN32 PE, not indirect thunked data symbols.

It doesn't help that I failed to supply the final revision of this
patch, which does this:

-const char * const LIBPQ_VERSION_STR = PG_VERSION_STR;
+const char LIBPQ_VERSION_STR[] = PG_VERSION_STR;

-const char * const LIBPQ_CONFIGURE_ARGS = CONFIGURE_ARGS;
+const char LIBPQ_CONFIGURE_ARGS[] = CONFIGURE_ARGS;

... to properly ensure the string symbols go into the read-only data section:

$ eu-nm --defined-only -D $LIBPQ | grep LIBPQ_
LIBPQ_CONFIGURE_ARGS   |00028640|GLOBAL|OBJECT
|00e4|  libpq-version.c:74|.rodata
LIBPQ_VERSION_NUM  |00028620|GLOBAL|OBJECT
|0004|  libpq-version.c:75|.rodata
LIBPQ_VERSION_STR  |00028740|GLOBAL|OBJECT
|006c|  libpq-version.c:73|.rodata

I don't propose these to replace information functions or macros; I'm
suggesting we add them as an aid to tooling and for debugging. I have
had quite enough times when I've faced a mystery libpq, and it's not
always practical in a given target environment to just compile a tool
to print the version.

In addition to easy binary identification, having symbolic references
to the version info is useful for dynamic tracing tools like perf and
systemtap - they cannot execute functions directly in the target
address space, but they can read data symbols. I actually want to
expose matching symbols in postgres itself, for the use of dynamic
tracing utilities, so they can autodetect the target postgres at
runtime even without -ggdb3 level debuginfo with macros, and correctly
adapt to version specifics of the target postgres.

In terms of standard tooling, here are some different ways you can get
this information symbolically.

$ LIBPQ=/path/to/libpq.so

$ gdb -batch -ex 'p (int) LIBPQ_VERSION_NUM' -ex 'p (const char *)
LIBPQ_VERSION_STR' $LIBPQ
$1 = 14
$2 = "PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (GCC)
10.2.1 20200723 (Red Hat 10.2.1-1), 64-bit"

$ perl getpqver.pl $LIBPQ
LIBPQ_VERSION_NUM=14
LIBPQ_VERSION_STR=PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled
by gcc (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1), 64-bit

I've attached getpqver.pl. It uses eu-nm from elfutils to get symbol
offset and length, which is pretty standard stuff. And it's quite
simple to adapt it to use legacy binutils "nm" by invoking

nm --dynamic --defined -S $LIBPQ

and tweaking the reader.

If you really want something strings-able, I'm sure that's reasonably
feasible, but I don't think it's particularly unreasonable to expect
to be able to inspect the symbol table using appropriate platform
tools or a simple debugger command.

> Again, I'm not exactly excited about this.  I do not one bit like
> patches that assume that x64 linux is the universe, or at least
> all of it that need be catered to.  Reminds me of people who thought
> Windows was the universe, not too many years ago.

Yeah. I figured you'd say that, and don't disagree. It's why I split
this patch out - it's kind of a sacrificial patch.

I actually wrote this part first.

Then I wrote PQlibInfo() when I realised that there was no sensible
pre-existing way to get the information I wanted to dump from libpq at
the API level, and adapted the 

Re: Add important info about ANALYZE after create Functional Index

2020-10-26 Thread Nikolay Samokhvalov
On Mon, Oct 26, 2020 at 7:03 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Monday, October 26, 2020, Nikolay Samokhvalov 
> wrote:
>>
>> Although, this triggers a question – should ANALYZE be automated in, say,
>> pg_restore as well?
>>
>
> Independent concern.
>

It's the same class of issue – after we create some objects, we lack
statistics and want to automate collecting them. If the approach is
automated in one case, it should be automated in the others, for
consistency.

And another question: how ANALYZE needs to be run? If it's under the
>> user's control, there is an option to use vacuumdb --analyze and benefit
>> from using -j to parallelize the work (and, in some cases, benefit from
>> using --analyze-in-stages). If we had ANALYZE as a part of building indexes
>> on expressions, should it be parallelized to the same extent as index
>> creation (controlled by max_parallel_maintenance_workers)?
>>
>
> None of that seems relevant here.  The only relevant parameter I see is
> what to specify for “table_and_columns”.
>

I'm not sure I follow.

Thanks,
Nik


Patch to fix FK-related selectivity estimates with constants

2020-10-26 Thread Tom Lane
Over in the thread at [1] it's discussed how our code for making
selectivity estimates using knowledge about FOREIGN KEY constraints
is busted in the face of EquivalenceClasses including constants.

That is, if fktab(a,b) is a 2-column FK reference to pktab(a,b)
and we have a query like

... where fktab.a = pktab.a and fktab.b = pktab.b

then we understand that any given fktab row can match at most one
pktab row (and this estimate is often a lot better than we'd get
from assuming that the a and b conditions are independent).

However, if the query is like

... where fktab.a = pktab.a and fktab.b = pktab.b
  and fktab.a = 1

then this suddenly breaks down and we go back to non-FK-aware
estimates.  The reason is that get_foreign_key_join_selectivity()
is looking for join clauses that equate the two sides of the FK
constraint ... and in this example, it will not see any such
join clause for column "a".  That's because equivclass.c decided
to replace the given clauses with "fktab.a = 1 and pktab.a = 1",
which can be enforced at the scan level, leaving nothing to be
done for column "a" at the join level.
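
For concreteness, the shape being discussed is roughly this (a purely
illustrative schema; table and column names are made up):

CREATE TABLE pktab (a int, b int, PRIMARY KEY (a, b));
CREATE TABLE fktab (a int, b int,
                    FOREIGN KEY (a, b) REFERENCES pktab (a, b));

-- the FK-aware join selectivity applies here:
SELECT * FROM fktab, pktab
 WHERE fktab.a = pktab.a AND fktab.b = pktab.b;

-- ... but once the constant joins the "a" equivalence class, the join
-- clause on "a" is replaced by fktab.a = 1 AND pktab.a = 1 at scan level
-- and the FK-aware estimate currently degrades:
SELECT * FROM fktab, pktab
 WHERE fktab.a = pktab.a AND fktab.b = pktab.b AND fktab.a = 1;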

We can fix that by detecting which EquivalenceClasses are marked
"ec_has_const", since that's the property that dictates whether
equivclass.c uses this strategy.  However, that's only a partial
fix; if you try it, you soon find that the selectivity estimates
are still off.  The reason is that because the two "a = 1" conditions
are already factored into the size estimates for the join input
relations, we're essentially double-counting the "fktab.a = 1"
condition's selectivity if we use the existing FK selectivity
estimation rule.  If we treated the constant condition as only
relevant to the PK side, then the FK selectivity rule could work
normally.  But we don't want to drop the ability to enforce the
restriction at the scan level.  So what we have to do is cancel
the FK side's condition's selectivity out of the FK selectivity.

Attached is a patch series that attacks it that way.  For ease of
review I split it into two steps:

0001 refactors process_implied_equality() so that it can pass
back the new RestrictInfo to its callers in equivclass.c.
I think this is a good change on its own merits, because it means
that when generating a derived equality, we don't have to use
initialize_mergeclause_eclasses() to set up the new RestrictInfo's
left_ec and right_ec pointers.  The equivclass.c caller knows
perfectly darn well which EquivalenceClass the two sides of the
clause belong to, so it can just assign that value, saving a couple
of potentially-not-cheap get_eclass_for_sort_expr() searches.
This does require process_implied_equality() to duplicate some of
the steps in distribute_qual_to_rels(), but on the other hand we
get to remove some complexity from distribute_qual_to_rels() because
it no longer has to deal with any is_deduced cases.  Anyway, the
end goal of this step is that we can save away all the generated
"x = const" clauses in the EC's ec_derives list.  0001 doesn't
do anything with that information, but ...

0002 actually fixes the bug.  Dealing with the first part of the
problem just requires counting how many of the ECs we matched to
an FK constraint are ec_has_const.  To deal with the second part,
we dig out the scan-level "x = const" clause that the EC generated
for the FK column and see what selectivity it has got.  This beats
other ways of reconstructing the scan-clause selectivity because
(at least in all normal cases) that selectivity would have been
cached in the RestrictInfo.  Thus we not only save cycles but can be
sure we are cancelling out exactly the right amount of selectivity.

I would not propose back-patching this, but it seems OK for HEAD.
Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/AM6PR02MB5287A0ADD936C1FA80973E72AB190%40AM6PR02MB5287.eurprd02.prod.outlook.com

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 690b753369..26c98198b5 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -840,10 +840,8 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
  * scanning of the quals and before Path construction begins.
  *
  * We make no attempt to avoid generating duplicate RestrictInfos here: we
- * don't search ec_sources for matches, nor put the created RestrictInfos
- * into ec_derives.  Doing so would require some slightly ugly changes in
- * initsplan.c's API, and there's no real advantage, because the clauses
- * generated here can't duplicate anything we will generate for joins anyway.
+ * don't search ec_sources or ec_derives for matches.  It doesn't really
+ * seem worth the trouble to do so.
  */
 void
 generate_base_implied_equalities(PlannerInfo *root)
@@ -969,6 +967,7 @@ generate_base_implied_equalities_const(PlannerInfo *root,
 	{
 		EquivalenceMember *cur_em = 

Re: Resetting spilled txn statistics in pg_stat_replication

2020-10-26 Thread Amit Kapila
On Tue, Oct 27, 2020 at 8:51 AM Justin Pryzby  wrote:
>
> On Fri, Oct 23, 2020 at 10:45:34AM +0530, Amit Kapila wrote:
> > On Fri, Oct 23, 2020 at 8:59 AM Amit Kapila  wrote:
> >
> > While updating the streaming stats patch, it occurred to me that we
> > can write a better description spill_bytes as well. Attached contains
> > the update to spill_bytes description.
>
> +This and other spill
> +counters can be used to gauge the I/O occurred during logical 
> decoding
> +and accordingly can tune 
> logical_decoding_work_mem.
>
> "gauge the IO occurred" is wrong.
> Either: I/O *which* occurred, or I/O occurring, or occurs.
>
> "can tune" should say "allow tuning".
>
> Like:
> +This and other spill
> +counters can be used to gauge the I/O which occurs during logical 
> decoding
> +and accordingly allow tuning of 
> logical_decoding_work_mem.
>
> -Number of times transactions were spilled to disk. Transactions
> +Number of times transactions were spilled to disk while performing
> +decoding of changes from WAL for this slot. Transactions
>
> What about: "..while decoding changes.." (remove "performing" and "of").
>

All of your suggestions sound good to me. Find the patch attached to
update the docs accordingly.

-- 
With Regards,
Amit Kapila.


0001-Minor-improvements-in-description-of-spilled-counter.patch
Description: Binary data


Re: Parallel INSERT (INTO ... SELECT ...)

2020-10-26 Thread Greg Nancarrow
On Fri, Oct 16, 2020 at 9:26 PM Amit Kapila  wrote:
>
>
> Cool, let me try to explain my thoughts a bit more. The idea is first
> (in standard_planner) we check if there is any 'parallel_unsafe'
> function/expression (via max_parallel_hazard) in the query tree. If we
> don't find anything 'parallel_unsafe' then we mark parallelModeOk. At
> this stage, the patch is checking whether there is any
> 'parallel_unsafe' or 'parallel_restricted' expression/function in the
> target relation and if there is none then we mark parallelModeOK as
> true. So, if there is anything 'parallel_restricted' then we will mark
> parallelModeOK as false which doesn't seem right to me.
>
> Then later in the planner during set_rel_consider_parallel, we
> determine if a particular relation can be scanned from within a
> worker, then we consider that relation for parallelism. Here, we
> determine if certain things are parallel-restricted then we don't
> consider this for parallelism. Then we create partial paths for the
> relations that are considered for parallelism. I think we don't need
> to change anything for the current patch in these later stages because
> we anyway are not considering Insert to be pushed into workers.
> However, in the second patch where we are thinking to push Inserts in
> workers, we might need to do something to filter parallel-restricted
> cases during this stage of the planner.
>

Posting an updated Parallel INSERT patch which (mostly) addresses
previously-identified issues and suggestions.

More work needs to be done in order to support parallel UPDATE and
DELETE (even after application of Thomas Munro's combo-cid
parallel-support patch), but it is getting closer.

Regards,
Greg Nancarrow
Fujitsu Australia


v5-0001-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch
Description: Binary data


[PATCH] remove deprecated v8.2 containment operators

2020-10-26 Thread Justin Pryzby
Forking this thread:
https://www.postgresql.org/message-id/fd93f1c5-7818-a02c-01e5-1075ac0d4...@iki.fi

They have been deprecated for a Long Time, so finally remove them for v14.
Four fewer exclamation marks makes the documentation less exciting, which is a
good thing.
From 7868fee24f92fb5150735f1f9507cfe9a6ab212c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 11 Apr 2020 22:57:06 -0500
Subject: [PATCH] remove deprecated v8.2 containment operators

See also:
ba920e1c9182eac55d5f1327ab0d29b721154277
684ad6a92fcc33adebdab65c4e7d72a68ba05408
---
 contrib/cube/cube.c|  4 --
 contrib/hstore/hstore.h|  1 -
 contrib/hstore/hstore_gist.c   |  3 +-
 contrib/intarray/_int_gin.c|  4 --
 contrib/intarray/_int_gist.c   |  2 -
 contrib/intarray/_intbig_gist.c|  2 -
 contrib/intarray/intarray--1.2.sql |  6 --
 contrib/intarray/intarray--1.3--1.4.sql|  7 ---
 contrib/seg/seg.c  |  4 --
 doc/src/sgml/cube.sgml |  8 ---
 doc/src/sgml/func.sgml |  9 ---
 doc/src/sgml/hstore.sgml   | 10 
 doc/src/sgml/intarray.sgml |  8 ---
 doc/src/sgml/seg.sgml  |  8 ---
 src/backend/access/brin/brin_inclusion.c   |  2 -
 src/backend/access/gist/gistproc.c |  4 --
 src/include/access/stratnum.h  |  2 -
 src/include/catalog/pg_amop.dat| 16 --
 src/include/catalog/pg_operator.dat| 65 --
 src/test/regress/expected/create_am.out| 12 ++--
 src/test/regress/expected/create_index.out | 24 
 src/test/regress/expected/opr_sanity.out   |  7 +--
 src/test/regress/sql/create_am.sql |  6 +-
 src/test/regress/sql/create_index.sql  | 12 ++--
 24 files changed, 28 insertions(+), 198 deletions(-)

diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c
index 6f810b26c5..57b89d91b8 100644
--- a/contrib/cube/cube.c
+++ b/contrib/cube/cube.c
@@ -634,11 +634,9 @@ g_cube_leaf_consistent(NDBOX *key,
 			retval = (cube_cmp_v0(key, query) == 0);
 			break;
 		case RTContainsStrategyNumber:
-		case RTOldContainsStrategyNumber:
 			retval = cube_contains_v0(key, query);
 			break;
 		case RTContainedByStrategyNumber:
-		case RTOldContainedByStrategyNumber:
 			retval = cube_contains_v0(query, key);
 			break;
 		default:
@@ -661,11 +659,9 @@ g_cube_internal_consistent(NDBOX *key,
 			break;
 		case RTSameStrategyNumber:
 		case RTContainsStrategyNumber:
-		case RTOldContainsStrategyNumber:
 			retval = (bool) cube_contains_v0(key, query);
 			break;
 		case RTContainedByStrategyNumber:
-		case RTOldContainedByStrategyNumber:
 			retval = (bool) cube_overlap_v0(key, query);
 			break;
 		default:
diff --git a/contrib/hstore/hstore.h b/contrib/hstore/hstore.h
index bf4a565ed9..8713a22dcc 100644
--- a/contrib/hstore/hstore.h
+++ b/contrib/hstore/hstore.h
@@ -181,7 +181,6 @@ extern Pairs *hstoreArrayToPairs(ArrayType *a, int *npairs);
 #define HStoreExistsStrategyNumber		9
 #define HStoreExistsAnyStrategyNumber	10
 #define HStoreExistsAllStrategyNumber	11
-#define HStoreOldContainsStrategyNumber 13	/* backwards compatibility */
 
 /*
  * defining HSTORE_POLLUTE_NAMESPACE=0 will prevent use of old function names;
diff --git a/contrib/hstore/hstore_gist.c b/contrib/hstore/hstore_gist.c
index 102c9cea72..10c9906fa5 100644
--- a/contrib/hstore/hstore_gist.c
+++ b/contrib/hstore/hstore_gist.c
@@ -517,8 +517,7 @@ ghstore_consistent(PG_FUNCTION_ARGS)
 
 	sign = GETSIGN(entry);
 
-	if (strategy == HStoreContainsStrategyNumber ||
-		strategy == HStoreOldContainsStrategyNumber)
+	if (strategy == HStoreContainsStrategyNumber)
 	{
 		HStore	   *query = PG_GETARG_HSTORE_P(1);
 		HEntry	   *qe = ARRPTR(query);
diff --git a/contrib/intarray/_int_gin.c b/contrib/intarray/_int_gin.c
index b7958d8eca..72cfbc2fd7 100644
--- a/contrib/intarray/_int_gin.c
+++ b/contrib/intarray/_int_gin.c
@@ -78,7 +78,6 @@ ginint4_queryextract(PG_FUNCTION_ARGS)
 *searchMode = GIN_SEARCH_MODE_DEFAULT;
 break;
 			case RTContainedByStrategyNumber:
-			case RTOldContainedByStrategyNumber:
 /* empty set is contained in everything */
 *searchMode = GIN_SEARCH_MODE_INCLUDE_EMPTY;
 break;
@@ -89,7 +88,6 @@ ginint4_queryextract(PG_FUNCTION_ARGS)
 	*searchMode = GIN_SEARCH_MODE_INCLUDE_EMPTY;
 break;
 			case RTContainsStrategyNumber:
-			case RTOldContainsStrategyNumber:
 if (*nentries > 0)
 	*searchMode = GIN_SEARCH_MODE_DEFAULT;
 else			/* everything contains the empty set */
@@ -127,7 +125,6 @@ ginint4_consistent(PG_FUNCTION_ARGS)
 			res = true;
 			break;
 		case RTContainedByStrategyNumber:
-		case RTOldContainedByStrategyNumber:
 			/* we will need recheck */
 			*recheck = true;
 			/* at least one element in check[] is true, so result = true */
@@ -148,7 +145,6 @@ ginint4_consistent(PG_FUNCTION_ARGS)
 			}
 			break;
 		case 

Re: Resetting spilled txn statistics in pg_stat_replication

2020-10-26 Thread Justin Pryzby
On Fri, Oct 23, 2020 at 10:45:34AM +0530, Amit Kapila wrote:
> On Fri, Oct 23, 2020 at 8:59 AM Amit Kapila  wrote:
> >
> > On Fri, Oct 23, 2020 at 7:42 AM Masahiko Sawada
> >  wrote:
> > >
> > > On Thu, 22 Oct 2020 at 20:34, Amit Kapila  wrote:
> > > >
> > > > > > I have modified the description of spill_count and spill_txns to 
> > > > > > make
> > > > > > things clear. Any suggestions?
> > > > >
> > > > > Thank you for the patch.
> > > > >
> > > > > -logical decoding exceeds
> > > > > logical_decoding_work_mem. The
> > > > > -counter gets incremented both for toplevel transactions and
> > > > > -subtransactions.
> > > > > +logical decoding of changes from WAL for this exceeds
> > > > > +logical_decoding_work_mem. The counter 
> > > > > gets
> > > > > +incremented both for toplevel transactions and 
> > > > > subtransactions.
> > > > >
> > > > > What is the word "this" in the above change referring to?
> > > > >
> > > >
> > > > 'slot'. The word *slot* is missing in the sentence.
> > > >
> > > > > How about
> > > > > something like:
> > > > >
> > > >
> > > > > Number of transactions spilled to disk after the memory used by
> > > > > logical decoding of changes from WAL exceeding
> > > > > logical_decoding_work_mem. The counter gets incremented both for
> > > > > toplevel transactions and subtransactions.
> > > > >
> > > >
> > > > /exceeding/exceeds. I am fine with your proposed text as well but if
> > > > you like the above after correction that would be better because it
> > > > would be more close to spill_count description.
> > >
> > > yeah, I agree with the correction.
> > >
> >
> > Okay, thanks, attached is an updated patch.
> >
> 
> While updating the streaming stats patch, it occurred to me that we
> can write a better description spill_bytes as well. Attached contains
> the update to spill_bytes description.

+This and other spill
+counters can be used to gauge the I/O occurred during logical decoding
+and accordingly can tune logical_decoding_work_mem.

"gauge the IO occurred" is wrong.
Either: I/O *which* occurred, or I/O occurring, or occurs.

"can tune" should say "allow tuning".

Like:
+This and other spill
+counters can be used to gauge the I/O which occurs during logical 
decoding
+and accordingly allow tuning of 
logical_decoding_work_mem.

-Number of times transactions were spilled to disk. Transactions
+Number of times transactions were spilled to disk while performing
+decoding of changes from WAL for this slot. Transactions

What about: "..while decoding changes.." (remove "performing" and "of").

-- 
Justin




Re: pg_upgrade: fail early if a tablespace dir already exists for new cluster version

2020-10-26 Thread Bruce Momjian
On Wed, Oct 21, 2020 at 06:54:17AM -0500, Justin Pryzby wrote:
> On Tue, Oct 20, 2020 at 09:17:22PM -0400, Tom Lane wrote:
> > Justin Pryzby  writes:
> > > I wonder if pg_upgrade should try to rmdir() the tablespace dirs before
> > > restoring global objects, allowing it to succeed, rather than just 
> > > "failing
> > > early".
> > 
> > I'm a little confused about that.  If the directories aren't empty,
> > that will fail,
> 
> You mean rmdir() will fail, returning -1, which my patch will ignore, and the
> pg_upgrade will fail, same as it would have before.  The goal of the patch is
> to improve the "empty" case, only.
> 
> > but if they are, shouldn't the upgrade just work?
> 
> It fails in "Restoring global objects", which runs "CREATE TABLESPACE".
> | errmsg("directory \"%s\" already in use as a tablespace",
> 
> I considered the possibility of changing that, but it seems like this is
> specific to pg_upgrade.  I wouldn't want to change the core server just for
> that, and it has a good reason for failing in that case:
> | * The creation of the version directory prevents more than one tablespace
> | * in a single location.

I think it is best to report an error --- pg_upgrade rarely tries to fix
things because that adds complexity, and perhaps bugs.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Commitfest 2020-11

2020-10-26 Thread Julien Rouhaud
On Tue, Oct 27, 2020 at 8:21 AM Michael Paquier  wrote:
>
> On Mon, Oct 26, 2020 at 09:09:17PM +0300, Anastasia Lubennikova wrote:
> > November commitfest will start just in a few days.
> > I'm happy to volunteer to be the CFM for this one. With a help of Georgios
> > Kokolatos [1].
>
> Thanks to both of you for volunteering.

Thanks a lot to both of you!




Re: Add important info about ANALYZE after create Functional Index

2020-10-26 Thread David G. Johnston
On Monday, October 26, 2020, Nikolay Samokhvalov 
wrote:
>
>
> Although, this triggers a question – should ANALYZE be automated in, say,
> pg_restore as well?
>

Independent concern.


>
> And another question: how ANALYZE needs to be run? If it's under the
> user's control, there is an option to use vacuumdb --analyze and benefit
> from using -j to parallelize the work (and, in some cases, benefit from
> using --analyze-in-stages). If we had ANALYZE as a part of building indexes
> on expressions, should it be parallelized to the same extent as index
> creation (controlled by max_parallel_maintenance_workers)?
>

None of that seems relevant here.  The only relevant parameter I see is
what to specify for “table_and_columns”.

David J.


Re: Add important info about ANALYZE after create Functional Index

2020-10-26 Thread Nikolay Samokhvalov
On Mon, Oct 26, 2020 at 3:46 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> It would seem preferable to call the lack of auto-analyzing after these
> operations a bug and back-patch a fix that injects an analyze side-effect
> just before their completion.  It doesn't have to be smart either,
> analyzing things even if the created (or newly validated) index doesn't
> have statistics of its own isn't a problem in my book.
>

+1 to considering it a major problem of CREATE INDEX [CONCURRENTLY] for
indexes on expressions; it's very easy to forget, as I've observed many
times.

Although, this triggers a question – should ANALYZE be automated in, say,
pg_restore as well?

And another question: how ANALYZE needs to be run? If it's under the
user's control, there is an option to use vacuumdb --analyze and benefit
from using -j to parallelize the work (and, in some cases, benefit from
using --analyze-in-stages). If we had ANALYZE as a part of building indexes
on expressions, should it be parallelized to the same extent as index
creation (controlled by max_parallel_maintenance_workers)?

Thanks,
Nik


Re: PATCH: Report libpq version and configuration

2020-10-26 Thread Craig Ringer
On Tue, Oct 27, 2020 at 12:41 AM Alvaro Herrera 
wrote:

> On 2020-Oct-26, Craig Ringer wrote:
>
> > Patch 0001 adds PQlibInfo(), which returns an array of key/value
> > description items reporting on configuration like the full version
> string,
> > SSL support, gssapi support, thread safety, default port and default unix
> > socket path. This is for application use and application diagnostics. It
> > also adds PQlibInfoPrint() which dumps PQlibInfo() keys/values to stdout.
> > See the commit message in patch 0001 for details.
>
> Sounds useful. I'd have PQlibInfoPrint(FILE *) instead, so you can pass
> stdout or whichever fd you want.
>

The decision not to do so was deliberate. On any platform where a shared
library could be linked to a different C runtime library than the main
executable or other libraries, it is not safe to pass a FILE*. This is most
common on Windows.

I figured it's just a trivial wrapper anyway, so people can just write or
copy it if they really care.

> Patch 0002 exposes LIBPQ_VERSION_STR, LIBPQ_VERSION_NUM and
> > LIBPQ_CONFIGURE_ARGS symbols in the dynamic symbol table. These can be
> > accessed by a debugger even when the library cannot be loaded or
> executed,
> > and unlike macros are available even in a stripped executable. So they
> can
> > be used to identify a libpq binary found in the wild. Their storage is
> > shared with PQlibInfo()'s static data, so they only cost three symbol
> table
> > entries.
>
> Interesting.  Is this real-world useful?  I'm thinking most of the time
> I'd just run the library, but maybe you know of cases where that doesn't
> work?
>

It was prompted by a support conversation about how to identify a libpq.
So I'd say yes.

In that case the eventual approach used was to use Python's ctypes to
dynamically load libpq then call PQlibVersion().

> Patch 0003 allows libpq.so to be executed directly from the command line
> to
> > print its version, configure arguments etc exactly as PQlibInfoPrint()
> > would output them. This is only enabled on x64 linux for now but can be
> > extended to other targets quite simply.
>
> +1 --- to me this is the bit that would be most useful, I expect.
>

It's also kinda cool.

But it's using a bit of a platform quirk that's not supported by the
toolchain as well as I'd really like - annoyingly, when you pass a
--entrypoint to GNU ld or to LLVM's ld.lld, it should really emit the
default .interp section to point to /bin/ld.so.2 or
/lib64/ld-linux-x86-64.so.2 as appropriate. But when building -shared they
don't seem to want to, nor do they expose a sensible macro that lets you
get the default string yourself.

So I thought there was a moderate to high chance that this patch would trip
someone's "yuck" meter.


Re: Add Information during standby recovery conflicts

2020-10-26 Thread Masahiko Sawada
On Tue, 20 Oct 2020 at 22:02, Drouvot, Bertrand  wrote:
>
> Hi,
>
> On 10/15/20 9:15 AM, Masahiko Sawada wrote:
> >
> > On Thu, 15 Oct 2020 at 14:52, Kyotaro Horiguchi  
> > wrote:
> >> At Thu, 15 Oct 2020 14:28:57 +0900, Masahiko Sawada 
> >>  wrote in
>  ereport(..(errmsg("%s", _("hogehoge" results in
>  fprintf((translated("%s")), translate("hogehoge")).
> 
>  So your change (errmsg("%s", gettext_noop("hogehoge")) results in
> 
>  fprintf((translated("%s")), DONT_translate("hogehoge")).
> 
>  which leads to a translation problem.
> 
>  (errmsg(gettext_noop("hogehoge"))
> >>> This seems equivalent to (errmsg("hogehoge")), right?
> >> Yes and no.  However eventually the two works the same way,
> >> "(errmsg(gettext_noop("hogehoge"))" is a shorthand of
> >>
> >> 1: char *msg = gettext_noop("hogehoge");
> >> ...
> >> 2: .. (errmsg(msg));
> >>
> >> That is, the line 1 only registers a message id "hogehoge" and doesn't
> >> translate. The line 2 tries to translate the content of msg and it
> >> finds the translation for the message id "hogehoge".
> > Understood.
> >
> >>> I think I could understand translation stuff. Given we only report the
> >>> const string returned from get_recovery_conflict_desc() without
> >>> placeholders, the patch needs to use errmsg_internal() instead while
> >>> not changing _() part. (errmsg(get_recovery_conflict_desc())) is not
> >>> good (warned by -Wformat-security).
> >> Ah, right. we get a complain if no value parameters added. We can
> >> silence it by adding a dummy parameter to errmsg, but I'm not sure
> >> which is preferable.
> > Okay, I'm going to use errmsg_internal() for now until a better idea comes.
> >
> > I've attached the updated patch that fixed the translation part.
>
> Thanks for reviewing and helping on this patch!
>
> The patch tester bot is currently failing due to:
>
> "proc.c:1290:5: error: ‘standbyWaitStart’ may be used uninitialized in
> this function [-Werror=maybe-uninitialized]"
>
> I've attached a new version with the minor change to fix it.
>

Thank you for updating the patch!

I've looked at the patch and revised a bit the formatting etc.

After some thought, I think it might be better to report the waiting
time as well. It would help users, and there is no particular reason
for logging the report only once. It also helps make the patch clean
by reducing the variables such as recovery_conflict_logged. I’ve
implemented it in the v8 patch. The log message is now like:

 LOG:  recovery is still waiting recovery conflict on lock after 1062.601 ms
 DETAIL:  Conflicting processes: 35116, 35115, 35114.
 CONTEXT:  WAL redo at 0/328 for Standby/LOCK: xid 510 db 13185 rel 16384

LOG:  recovery is still waiting recovery conflict on lock after 2065.682 ms
DETAIL:  Conflicting processes: 35115, 35114.
CONTEXT:  WAL redo at 0/328 for Standby/LOCK: xid 510 db 13185 rel 16384

LOG:  recovery is still waiting recovery conflict on lock after 3087.926 ms
DETAIL:  Conflicting process: 35114.
CONTEXT:  WAL redo at 0/328 for Standby/LOCK: xid 510 db 13185 rel 16384

What do you think?

Regards,

-- 
Masahiko Sawada    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v8-0001-Log-the-standby-recovery-conflict-waits.patch
Description: Binary data


Re: Commitfest 2020-11

2020-10-26 Thread Michael Paquier
On Mon, Oct 26, 2020 at 09:09:17PM +0300, Anastasia Lubennikova wrote:
> November commitfest will start just in a few days.
> I'm happy to volunteer to be the CFM for this one. With a help of Georgios
> Kokolatos [1].

Thanks to both of you for volunteering.
--
Michael


signature.asc
Description: PGP signature


Re: Non-configure build of thread_test has been broken for awhile

2020-10-26 Thread Bruce Momjian
On Tue, Oct 20, 2020 at 12:25:48PM -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
> > On 2020-Oct-18, Tom Lane wrote:
> >> It doesn't really seem sane to me to support two different build
> >> environments for thread_test, especially when one of them is so
> >> little-used that it can be broken for years before we notice.
> >> So I'd be inclined to rip out the Makefile and just consider
> >> that thread_test.c is *only* meant to be used by configure.
> >> If we wish to resurrect the standalone build method, we could
> >> probably do so by adding LIBS to the Makefile's link command
> >> ... but what's the point, and what will keep it from getting
> >> broken again later?
> 
> > Standalone usage of that program is evidently non-existant, so +1 for
> > removing the Makefile and just keep the configure compile path for it.
> 
> I concluded that if thread_test.c will only be used by configure,
> then we should stick it under $(SRCDIR)/config/ and nuke the
> src/test/thread/ subdirectory altogether.  See attached.

Sounds good.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Wrong example in the bloom documentation

2020-10-26 Thread Bruce Momjian
On Mon, Oct 26, 2020 at 05:04:09PM -0400, Bruce Momjian wrote:
> On Sat, Oct 17, 2020 at 01:50:26PM +, Daniel Westermann (DWE) wrote:
> > On Fri, Oct  9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:
> > >This is not applying to PG 12 or earlier because the patch mentions JIT,
> > >which was only mentioned in the PG bloom docs in PG 13+.
> > 
> > Does that mean we need separate patches for each release starting with 10? 
> > As I am not frequently writing patches, I would need some help here.
> 
> I can regenerate the output for older versions using your patch.
> However, I am confused about the parallelism you are seeing.  Your patch
> shows:
> 
>  Without the two indexes being created, a parallel sequential scan 
> will happen for the query below:
>   ---
>   
>   =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 
> 123451;
>   QUERY PLAN
>   
> ---
>Seq Scan on tbloom  (cost=0.00..214.00 rows=1 width=24) (actual 
> time=2.729..2.731 rows=0 loops=1)
>  Filter: ((i2 = 898732) AND (i5 = 123451))
>  Rows Removed by Filter: 1
>Planning Time: 0.257 ms
>Execution Time: 2.764 ms
>   (5 rows)
> 
> However, I don't see any parallelism in this output.  Also, I don't see
> any parallelism once the indexes are created.  What PG version is this?
> and what config settings did you use?  Thanks.

I figured it out --- you have to use the larger generate_series value to
get the parallel output.  I have adjusted all the docs back to 9.6 to
show accurate output for that version, and simplified the query
ordering --- patch to master attached.  The other releases are similar. 
Daniel, please let me know if I have left out any details.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

diff --git a/doc/src/sgml/bloom.sgml b/doc/src/sgml/bloom.sgml
new file mode 100644
index 285b67b..d1cf9ac
*** a/doc/src/sgml/bloom.sgml
--- b/doc/src/sgml/bloom.sgml
*** CREATE INDEX bloomidx ON tbloom USING bl
*** 110,184 
 FROM
generate_series(1,1000);
  SELECT 1000
- =# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
- CREATE INDEX
- =# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
-  pg_size_pretty
- 
-  153 MB
- (1 row)
- =# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
- CREATE INDEX
- =# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
-  pg_size_pretty
- 
-  387 MB
- (1 row)
  
  

 A sequential scan over this large table takes a long time:
  
  =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
!  QUERY PLAN
! 
!  Seq Scan on tbloom  (cost=0.00..213694.08 rows=1 width=24) (actual time=1445.438..1445.438 rows=0 loops=1)
 Filter: ((i2 = 898732) AND (i5 = 123451))
!Rows Removed by Filter: 1000
!  Planning time: 0.177 ms
!  Execution time: 1445.473 ms
  (5 rows)
  

  

!So the planner will usually select an index scan if possible.
!With a btree index, we get results like this:
  
  =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
!QUERY PLAN
! 
!  Index Only Scan using btreeidx on tbloom  (cost=0.56..298311.96 rows=1 width=24) (actual time=445.709..445.709 rows=0 loops=1)
!Index Cond: ((i2 = 898732) AND (i5 = 123451))
!Heap Fetches: 0
!  Planning time: 0.193 ms
!  Execution time: 445.770 ms
  (5 rows)
  

  

!Bloom is better than btree in handling this type of search:
  
  =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
! QUERY PLAN
! ---
!  Bitmap Heap Scan on tbloom  (cost=178435.39..178439.41 rows=1 width=24) (actual time=76.698..76.698 rows=0 loops=1)
 Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
!Rows Removed by Index Recheck: 2439
!Heap Blocks: exact=2408
!-  Bitmap Index Scan on bloomidx  (cost=0.00..178435.39 rows=1 width=0) (actual time=72.455..72.455 rows=2439 loops=1)
   Index Cond: ((i2 = 898732) AND (i5 = 123451))
!  Planning time: 0.475 ms
!  Execution time: 76.778 ms
  (8 rows)
  
-Note the relatively large number of false 

Re: Add important info about ANALYZE after create Functional Index

2020-10-26 Thread David G. Johnston
On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:

> Hi all,
>
> As you all already know Postgres supports functions in index expressions
> (marked as immutable ofc) and for this special index the ANALYZE command
> creates some statistics (new pg_statistic entry) about it.
>
> The problem is just after creating a new index or rebuilding concurrently
> (using the new REINDEX .. CONCURRENTLY or the old manner creating new one
> and then swapping) we need to run ANALYZE to update statistics but we don't
> mention it in any part of our documentation.
>
> Last weekend Gitlab went down because the lack of an ANALYZE after
> rebuilding concurrently a functional index and they followed the
> recommendation we have into our documentation [1] about how to rebuild it
> concurrently, but we don't warn users about the ANALYZE after.
>
> Would be nice if add some information about it into our docs but not sure
> where. I'm thinking about:
> - doc/src/sgml/ref/create_index.sgml
> - doc/src/sgml/maintenance.sgml (routine-reindex)
>
> Thoughts?
>
> [1]
> https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499
>

It would seem preferable to call the lack of auto-analyzing after these
operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion.  It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

David J.


Re: Improve pg_dump dumping publication tables

2020-10-26 Thread John Hsu
Hi Cary,

Thanks for taking a look. I agree there's a risk, since there's more memory
usage on the client side, and if you're dumping, say, millions of
publicationTables, that could be problematic.
I'd like to think this isn't any riskier than existing pg_dump code such as
getTables(...), where presumably we would run into similar problems.

Cheers,
John H

Add important info about ANALYZE after create Functional Index

2020-10-26 Thread Fabrízio de Royes Mello
Hi all,

As you all already know, Postgres supports functions in index expressions
(marked as immutable, ofc), and for such an index the ANALYZE command
creates statistics about the expression (a new pg_statistic entry).

The problem is that just after creating a new index, or rebuilding one
concurrently (using the new REINDEX .. CONCURRENTLY, or the old way of
creating a new index and then swapping), we need to run ANALYZE to update
the statistics, but we don't mention this anywhere in our documentation.
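
For example, something along these lines (an illustrative sketch; the
table, column and index names are made up):

CREATE TABLE t (payload text);
CREATE INDEX t_lower_idx ON t (lower(payload));
-- or, when rebuilding: REINDEX INDEX CONCURRENTLY t_lower_idx;

-- The statistics for the lower(payload) expression live in a pg_statistic
-- entry attached to the index and are only populated by ANALYZE (or
-- autoanalyze), so run it explicitly afterwards:
ANALYZE t;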

Last weekend Gitlab went down because of the lack of an ANALYZE after
concurrently rebuilding a functional index. They followed the
recommendation in our documentation [1] about how to rebuild it
concurrently, but we don't warn users about the ANALYZE afterwards.

It would be nice to add some information about this to our docs, but I'm
not sure where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Thoughts?

[1]
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499

-- 
   Fabrízio de Royes Mello
   PostgreSQL Developer at OnGres Inc. - https://ongres.com


Re: Wrong example in the bloom documentation

2020-10-26 Thread Bruce Momjian
On Sat, Oct 17, 2020 at 01:50:26PM +, Daniel Westermann (DWE) wrote:
> On Fri, Oct  9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:
> >This is not applying to PG 12 or earlier because the patch mentions JIT,
> >which was only mentioned in the PG bloom docs in PG 13+.
> 
> Does that mean we need separate patches for each release starting with 10? 
> As I am not frequently writing patches, I would need some help here.

I can regenerate the output for older versions using your patch.
However, I am confused about the parallelism you are seeing.  Your patch
shows:

   Without the two indexes being created, a parallel sequential scan 
will happen for the query below:
---

=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 
123451;
QUERY PLAN

---
 Seq Scan on tbloom  (cost=0.00..214.00 rows=1 width=24) (actual 
time=2.729..2.731 rows=0 loops=1)
   Filter: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Filter: 1
 Planning Time: 0.257 ms
 Execution Time: 2.764 ms
(5 rows)

However, I don't see any parallelism in this output.  Also, I don't see
any parallelism once the indexes are created.  What PG version is this?
and what config settings did you use?  Thanks.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Commitfest 2020-11

2020-10-26 Thread Fabrízio de Royes Mello
On Mon, Oct 26, 2020 at 3:09 PM Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote:
>
> Hello, hackers!
>
> November commitfest will start just in a few days.
> I'm happy to volunteer to be the CFM for this one. With a help of
> Georgios Kokolatos [1].
>
> It's time to register your patch in the commitfest, if not yet.
>
> If you already have a patch in the commitfest, update its status and
> make sure it still applies and that the tests pass. Check the state at
> http://cfbot.cputube.org/
>
> If there is a long-running stale discussion, please send a short summary
> update about its current state, open questions, and TODOs. I hope, it
> will encourage reviewers to pay more attention to the thread.
>

Awesome!

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: PATCH: Batch/pipelining support for libpq

2020-10-26 Thread Alvaro Herrera
I started reading this patch.  As a starting point I decided to post an
updated copy (v22), wherein I reverted the overwriting of
src/include/port/linux.h with the win32.h contents (?) and the inclusion
of  in libpq-fe.h.  If those are needed, some different
solution will have to be found.

Trivial other changes (pgindent etc); nothing of substance.  With the
PQtrace() patch I posted at [1] the added test program crashes -- I
don't know if the fault lies in this patch or the other one.

[1] https://postgr.es/m/20201026162313.GA22502@alvherre.pgsql
>From 4d4e86c817599c7370b44c75e3caca9b4d3fea2d Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 26 Oct 2020 13:24:24 -0300
Subject: [PATCH v22] libpq batch

---
 doc/src/sgml/libpq.sgml   |  468 
 doc/src/sgml/lobj.sgml|4 +
 .../libpqwalreceiver/libpqwalreceiver.c   |3 +
 src/interfaces/libpq/exports.txt  |4 +
 src/interfaces/libpq/fe-connect.c |   27 +
 src/interfaces/libpq/fe-exec.c|  630 ++-
 src/interfaces/libpq/fe-protocol2.c   |6 +
 src/interfaces/libpq/fe-protocol3.c   |   16 +-
 src/interfaces/libpq/libpq-fe.h   |   22 +-
 src/interfaces/libpq/libpq-int.h  |   46 +-
 src/test/modules/Makefile |1 +
 src/test/modules/test_libpq/.gitignore|5 +
 src/test/modules/test_libpq/Makefile  |   25 +
 src/test/modules/test_libpq/README|1 +
 .../modules/test_libpq/t/001_libpq_async.pl   |   27 +
 src/test/modules/test_libpq/testlibpqbatch.c  | 1008 +
 src/tools/msvc/Mkvcbuild.pm   |3 +-
 src/tools/pgindent/typedefs.list  |3 +
 18 files changed, 2253 insertions(+), 46 deletions(-)
 create mode 100644 src/test/modules/test_libpq/.gitignore
 create mode 100644 src/test/modules/test_libpq/Makefile
 create mode 100644 src/test/modules/test_libpq/README
 create mode 100644 src/test/modules/test_libpq/t/001_libpq_async.pl
 create mode 100644 src/test/modules/test_libpq/testlibpqbatch.c

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index de60281fcb..a768feffd3 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -4819,6 +4819,465 @@ int PQflush(PGconn *conn);
 
  
 
+ 
+  Batch mode and query pipelining
+
+  
+   libpq
+   batch mode
+  
+
+  
+   libpq
+   pipelining
+  
+
+  
+   libpq supports queueing up queries into
+   a pipeline to be executed as a batch on the server. Batching queries allows
+   applications to avoid a client/server round-trip after each query to get
+   the results before issuing the next query.
+  
+
+  
+   When to use batching
+
+   
+Much like asynchronous query mode, there is no performance disadvantage to
+using batching and pipelining. It increases client application complexity
+and extra caution is required to prevent client/server deadlocks but
+can sometimes offer considerable performance improvements.
+   
+
+   
+Batching is most useful when the server is distant, i.e. network latency
+(ping time) is high, and when many small operations are being performed in
+rapid sequence. There is usually less benefit in using batches when each
+query takes many multiples of the client/server round-trip time to execute.
+A 100-statement operation run on a server 300ms round-trip-time away would take
+30 seconds in network latency alone without batching; with batching it may spend
+as little as 0.3s waiting for results from the server.
+   
+
+   
+Use batches when your application does lots of small
+INSERT, UPDATE and
+DELETE operations that can't easily be transformed into
+operations on sets or into a
+COPY operation.
+   
+
+   
+Batching is not useful when information from one operation is required by the
+client before it knows enough to send the next operation. The client must
+introduce a synchronisation point and wait for a full client/server
+round-trip to get the results it needs. However, it's often possible to
+adjust the client design to exchange the required information server-side.
+Read-modify-write cycles are especially good candidates; for example:
+
+ BEGIN;
+ SELECT x FROM mytable WHERE id = 42 FOR UPDATE;
+ -- result: x=2
+ -- client adds 1 to x:
+ UPDATE mytable SET x = 3 WHERE id = 42;
+ COMMIT;
+
+could be much more efficiently done with:
+
+ UPDATE mytable SET x = x + 1 WHERE id = 42;
+
+   
+
+   
+
+ The batch API was introduced in PostgreSQL 14.0, but clients using the PostgreSQL 14.0 version of libpq can
+ use batches on server versions 7.4 and newer. Batching works on any server
+ that supports the v3 extended query protocol.
+
+   
+
+  
+
+  
+   Using batch mode
+
+   
+To issue batches the application must switch
+a connection into batch mode. Enter 

Commitfest 2020-11

2020-10-26 Thread Anastasia Lubennikova

Hello, hackers!

November commitfest will start just in a few days.
I'm happy to volunteer to be the CFM for this one, with the help of
Georgios Kokolatos [1].


It's time to register your patch in the commitfest, if not yet.

If you already have a patch in the commitfest, update its status and 
make sure it still applies and that the tests pass. Check the state at  
http://cfbot.cputube.org/


If there is a long-running stale discussion, please send a short summary 
update about its current state, open questions, and TODOs. I hope, it 
will encourage reviewers to pay more attention to the thread.


[1] 
https://www.postgresql.org/message-id/AxH0n_zLwwJ0MBN3uJpHfYDkV364diOGhtpLAv0OC0qHLN8ClyPsbRi1fSUAJLJZzObZE_y1qc-jqGravjIMoxVrdtLm74HmTUeIPWWkmSg%3D%40pm.me


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Supporting = operator in gin/gist_trgm_ops

2020-10-26 Thread Alexander Korotkov
On Mon, Oct 26, 2020 at 7:38 AM Julien Rouhaud  wrote:
> Ah, yes this might lead to bad performance if the "fake wildcard"
> matches too many rows, but this shouldn't be a very common use case,
> and the only alternative for that might be to create trigrams for non
> alphanumerics characters.  I didn't try to do that because it would
> mean meaningful overhead for mainstream usage of pg_trgm, and would
> also mean on-disk format break.  In my opinion supporting = should be
> a best effort, especially for such corner cases.

It would be more efficient to generate trigrams for the equality operator
using generate_trgm() instead of generate_wildcard_trgm().  In some
cases it would generate more trigrams.  For instance, generate_trgm()
would generate '__a', '_ab', 'ab_' for '%ab%' while
generate_wildcard_trgm() would generate nothing.
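
For reference, those padded trigrams can be inspected with pg_trgm's
show_trgm(), which is backed by generate_trgm() (output from a stock
installation; the blanks are literal spaces):

SELECT show_trgm('ab');
-- returns {"  a"," ab","ab "}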

Also, I wonder how our costing would work if there are multiple indexes
on the same column.  We should clearly prefer btree over pg_trgm
gist/gin, and I believe our costing provides this.  But we should also
prefer btree_gist/btree_gin over pg_trgm gist/gin, and I'm not sure
our costing provides this, especially for gist.
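
An illustrative setup for that question (assuming the proposed = support
is in place; all object names here are made up):

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE TABLE words (w text);
CREATE INDEX words_w_btree ON words (w);
CREATE INDEX words_w_trgm ON words USING gin (w gin_trgm_ops);

-- With = supported by gin_trgm_ops, both indexes could in principle
-- serve this query; the costing should keep preferring the btree one:
EXPLAIN SELECT * FROM words WHERE w = 'foo';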

--
Regards,
Alexander Korotkov




Re: PATCH: Report libpq version and configuration

2020-10-26 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Oct-26, Craig Ringer wrote:
>> also adds PQlibInfoPrint() which dumps PQlibInfo() keys/values to stdout.

> Sounds useful. I'd have PQlibInfoPrint(FILE *) instead, so you can pass
> stdout or whichever fd you want.

+1.  Are we concerned about translatability of these strings?  I think
I'd vote against, as it would complicate applications, but it's worth
thinking about it now not later.

>> Patch 0002 exposes LIBPQ_VERSION_STR, LIBPQ_VERSION_NUM and
>> LIBPQ_CONFIGURE_ARGS symbols in the dynamic symbol table. These can be
>> accessed by a debugger even when the library cannot be loaded or executed,
>> and unlike macros are available even in a stripped executable. So they can
>> be used to identify a libpq binary found in the wild. Their storage is
>> shared with PQlibInfo()'s static data, so they only cost three symbol table
>> entries.

> Interesting.  Is this real-world useful?

-1, I think this is making way too many assumptions about the content
and format of a shlib.

>> Patch 0003 allows libpq.so to be executed directly from the command line to
>> print its version, configure arguments etc exactly as PQlibInfoPrint()
>> would output them. This is only enabled on x64 linux for now but can be
>> extended to other targets quite simply.

> +1 --- to me this is the bit that would be most useful, I expect.

Again, I'm not exactly excited about this.  I do not one bit like
patches that assume that x64 linux is the universe, or at least
all of it that need be catered to.  Reminds me of people who thought
Windows was the universe, not too many years ago.

I'd rather try to set this up so that some fairly standard tooling
like "strings" + "grep" can be used to pull out the info.  Sure,
it would be less convenient, but honestly how often is this really
going to be necessary?

regards, tom lane




Re: PATCH: Report libpq version and configuration

2020-10-26 Thread Abhijit Menon-Sen
At 2020-10-26 20:56:57 +0800, craig.rin...@enterprisedb.com wrote:
>
> $  ./build/src/interfaces/libpq/libpq.so
> VERSION_NUM: 14
> VERSION: PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (GCC)
> 10.2.1 20200723 (Red Hat 10.2.1-1), 64-bit
> CONFIGURE_ARGS:  '--cache-file=config.cache-'
> '--prefix=/home/craig/pg/master' '--enable-debug' '--enable-cassert'
> '--enable-tap-tests' '--enable-dtrace' 'CC=/usr/lib64/ccache/gcc'
> 'CFLAGS=-Og -ggdb3' 'CPPFLAGS=' 'CPP=/usr/lib64/ccache/gcc -E'
> USE_SSL: 0
> ENABLE_GSS: 0
> ENABLE_THREAD_SAFETY: 1
> HAVE_UNIX_SOCKETS: 1
> DEFAULT_PGSOCKET_DIR: /tmp
> DEF_PGPORT: 5432

This is excellent.

-- Abhijit




Re: PATCH: Report libpq version and configuration

2020-10-26 Thread Alvaro Herrera
On 2020-Oct-26, Craig Ringer wrote:

> Patch 0001 adds PQlibInfo(), which returns an array of key/value
> description items reporting on configuration like the full version string,
> SSL support, gssapi support, thread safety, default port and default unix
> socket path. This is for application use and application diagnostics. It
> also adds PQlibInfoPrint() which dumps PQlibInfo() keys/values to stdout.
> See the commit message in patch 0001 for details.

Sounds useful. I'd have PQlibInfoPrint(FILE *) instead, so you can pass
stdout or whichever fd you want.
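
For illustration, a minimal sketch of the variant being suggested
(PQlibInfoPrint() is only what the proposed patch would add, and the
FILE * parameter is the change suggested here, so none of this exists in
libpq today):

#include <stdio.h>

/* Proposed signature: let the caller pick the output stream. */
extern void PQlibInfoPrint(FILE *stream);

/* An application's diagnostics code could then do, for example: */
static void
report_libpq_build(FILE *out)
{
    fprintf(out, "libpq build information:\n");
    PQlibInfoPrint(out);    /* dump the key/value pairs to that stream */
}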

> Patch 0002 exposes LIBPQ_VERSION_STR, LIBPQ_VERSION_NUM and
> LIBPQ_CONFIGURE_ARGS symbols in the dynamic symbol table. These can be
> accessed by a debugger even when the library cannot be loaded or executed,
> and unlike macros are available even in a stripped executable. So they can
> be used to identify a libpq binary found in the wild. Their storage is
> shared with PQlibInfo()'s static data, so they only cost three symbol table
> entries.

Interesting.  Is this real-world useful?  I'm thinking most of the time
I'd just run the library, but maybe you know of cases where that doesn't
work?

> Patch 0003 allows libpq.so to be executed directly from the command line to
> print its version, configure arguments etc exactly as PQlibInfoPrint()
> would output them. This is only enabled on x64 linux for now but can be
> extended to other targets quite simply.

+1 --- to me this is the bit that would be most useful, I expect.




Re: new heapcheck contrib module

2020-10-26 Thread Mark Dilger



> On Oct 26, 2020, at 9:12 AM, Tom Lane  wrote:
> 
> Robert Haas  writes:
>> On Wed, Oct 21, 2020 at 11:45 PM Mark Dilger
>>  wrote:
>>> Done that way in the attached, which also include Robert's changes from v19 
>>> he posted earlier today.
> 
>> Committed. Let's see what the buildfarm thinks.
> 
> Another thing that the buildfarm is pointing out is
> 
> [WARN] FOUserAgent - The contents of fo:block line 2 exceed the available 
> area in the inline-progression direction by more than 50 points. (See 
> position 148863:380)
> 
> This is coming from the sample output for verify_heapam(), which is too
> wide to fit in even a normal-size browser window, let alone A4 PDF.
> 
> While we could perhaps hack it up to allow more line breaks, or see
> if \x formatting helps, my own suggestion would be to just nuke the
> sample output altogether.

Ok.

> It doesn't look like it is any sort of
> representative real output,

It is not.  It came from artificially created corruption in the regression 
tests.  I may even have manually edited that, though I don't recall.

> and it is not useful enough to be worth
> spending time to patch up.

Ok.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: libpq debug log

2020-10-26 Thread Alvaro Herrera
Hello Aya Iwata

I've been hacking at this patch again.  There were a few things I wasn't
too clear about, so I reordered the code and renamed the routines to try
to make it easier to follow.

One thing I didn't like very much is that all the structures and enums
were exposed to the world in libpq-int.h.  Because some enum members have
pretty generic names, I didn't like that much, so I moved the whole
thing to fe-misc.c, and renamed the structs.  Also, the arrays don't
take space unless PQtrace() is called.  (This is not what I was talking
about in my previous message.  The idea I was trying to explain in my
previous message cannot possibly work, so I abandoned it.)

I also renamed functions to make it clear which handles frontend and
which handles backend.  With that, it was pretty obvious that we had an
"reset BE message" in the routine to handle FE, and some clearing of FE
in the code that handles BE.  I fixed things in a way that I think makes
the most sense.

I noticed that it's theoretically possible to have a FE message so large
(more than MAXPGPATH pieces) that it would overrun the array; so I added
a "print message" call after adding one piece, to avoid this.  Also, why
MAXPGPATH?  I added a new symbol MAX_FRONTEND_MSGS for this purpose.

There are some things still to do:

0. I added a XXX comment to pqFlush.  Because we're storing messages in
fe_msgs that point to the libpq buffer, is it possible to end up with
trace messages that are pointing into outBuffer bytes that are already
sent, and perhaps even overwritten with newer bytes?  Maybe not, but
it's unclear.  Should we do pqLogFrontendMsg() preventively to avoid
this problem?

1. Is the handling of protocol 2 correct?  Since it's completely
separate from protocol 3, I have not even looked at what it produces.
But the fact that pqLogMsgByte1 completely ignores the "commsource"
argument makes me suspect that it's not correct.
1a. How do we even test protocol 2 handling?

2. We need a mode to suppress printing of the time; this would be useful
for testing libpq at a low level.  Maybe instead of the time we can
print a monotonically increasing packet sequence number.  With this, we
could easily add tests for libpq itself.  What user interface do we want
for this?  Maybe we can add a "bits32 flags" parameter to PQtrace(),
with one bit for this use (see the sketch after this list).

3. COPY ... (FORMAT BINARY) emits "invalid protocol" ... not good.

4. Even in text format, COPY output is not very useful.  How can we
improve that?

5. Error messages are still printing the terminating zero byte.  I
suggest that it should be suppressed.

6. Let's redefine pqTraceMaybeBreakLine() (I renamed from
pqLogLineBreak):  If message is complete, print a newline; if message is
not complete, print a " ".  Then, remove the whitespace after printing
each element.  This should lead to lines that don't have a useless " "
at the end.

7. Why does it make sense to call pqTraceMaybeBreakLine when
commsource=FROM_FRONTEND?
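
To make item 2 a bit more concrete, here is roughly the shape I have in
mind; none of this exists yet, and both the PQtraceEx name and the
PQTRACE_SUPPRESS_TIMESTAMPS flag are made up for illustration:

/* Hypothetical flag bits for a trace variant that takes options */
#define PQTRACE_SUPPRESS_TIMESTAMPS 0x01  /* print a sequence number instead of the time */

/* Proposed signature; today's PQtrace() takes only (conn, stream) */
extern void PQtraceEx(PGconn *conn, FILE *stream, int flags);

/* A regression test could then produce stable, diffable output with:
 *     PQtraceEx(conn, trace_file, PQTRACE_SUPPRESS_TIMESTAMPS);
 */
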
>From 61e135006c3f48d6d5fb3e5a5400fca448e4cbc4 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 16 Sep 2020 13:13:45 -0300
Subject: [PATCH v9] libpq trace

---
 doc/src/sgml/libpq.sgml |   1 +
 src/interfaces/libpq/fe-connect.c   |  12 +-
 src/interfaces/libpq/fe-misc.c  | 576 ++--
 src/interfaces/libpq/fe-protocol3.c |   8 +
 src/interfaces/libpq/libpq-int.h|  15 +
 src/tools/pgindent/typedefs.list|   5 +
 6 files changed, 588 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index de60281fcb..bd6d89cf0f 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -5844,6 +5844,7 @@ PGContextVisibility PQsetErrorContextVisibility(PGconn *conn, PGContextVisibilit
 
  
   Enables  tracing of the client/server communication to a debugging file stream.
+  (Details of tracing contents appear in ).
 
 void PQtrace(PGconn *conn, FILE *stream);
 
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index b0ca37c2ed..856e011d9a 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -6809,7 +6809,17 @@ PQtrace(PGconn *conn, FILE *debug_port)
 	if (conn == NULL)
 		return;
 	PQuntrace(conn);
-	conn->Pfdebug = debug_port;
+	if (pqTraceInit(conn))
+	{
+		conn->Pfdebug = debug_port;
+		setlinebuf(conn->Pfdebug);
+	}
+	else
+	{
+		/* XXX report ENOMEM? */
+		fprintf(conn->Pfdebug, "Failed to initialize trace support\n");
+		fflush(conn->Pfdebug);
+	}
 }
 
 void
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 4ffc7f33fb..a6bb64dd5a 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -53,11 +53,154 @@
 #include "pg_config_paths.h"
 #include "port/pg_bswap.h"
 
+/* Log message source */
+typedef enum
+{
+	FROM_BACKEND,
+	FROM_FRONTEND
+} PGCommSource;
+
+/* Messages from backend */
+typedef enum PGLogState
+{
+	LOG_FIRST_BYTE,/* logging the 

Re: libpq compression

2020-10-26 Thread Konstantin Knizhnik

Rebased version of the patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/configure b/configure
index ace4ed5..deba608 100755
--- a/configure
+++ b/configure
@@ -700,6 +700,7 @@ LD
 LDFLAGS_SL
 LDFLAGS_EX
 with_zlib
+with_zstd
 with_system_tzdata
 with_libxslt
 XML2_LIBS
@@ -867,6 +868,7 @@ with_libxml
 with_libxslt
 with_system_tzdata
 with_zlib
+with_zstd
 with_gnu_ld
 enable_largefile
 '
@@ -8571,6 +8573,85 @@ fi
 
 
 
+#
+# ZStd
+#
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+  withval=$with_zstd;
+  case $withval in
+yes)
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  with_zstd=no
+
+fi
+
+
+
+
+if test "$with_zstd" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_zstd_ZSTD_compress=yes
+else
+  ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+  LIBS="-lzstd $LIBS"
+
+else
+  as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
+
 
 #
 # Zlib
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index de60281..4a1f49c 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1225,6 +1225,20 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  compression
+  
+  
+Request compression of libpq traffic. The client sends the server the list of compression algorithms supported by the client library.
+If the server supports one of these algorithms, it acknowledges use of this algorithm and then all libpq messages sent from client to server and
+vice versa will be compressed. If the server supports none of the suggested algorithms, it replies with an 'n' (no compression)
+message and it is up to the client whether to continue working without compression or report an error.
+Supported compression algorithms are chosen at configure time. Right now two libraries are supported: zlib (default) and zstd (if Postgres was
+configured with the --with-zstd option). In both cases streaming mode is used.
+  
+  
+ 
+
  
   client_encoding
   
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 3a64db6..24f2c70 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -92,6 +92,15 @@
such as COPY.
   
 
+  
+It is possible to compress protocol data to reduce traffic and speed up client-server interaction.
+Compression is especially useful for importing/exporting data to/from the database using the COPY command
+and for replication (both physical and logical). Compression can also reduce server response time
+for queries returning large amounts of data (for example JSON, BLOBs, text, ...).
+Right now two libraries are supported: zlib (default) and zstd (if Postgres was
+configured with the --with-zstd option). In both cases streaming mode is used.
+  
+
  
   Messaging Overview
 
@@ -263,6 +272,21 @@
  
 
  
+  CompressionAck
+  
+   
+ The server acknowledges the use of compression for the client-server communication protocol.
+ Compression can be requested by the client by including the "compression" option in the connection string.
+ The client sends the server the list of compression algorithms supported by the client library
+ (each compression algorithm is identified by one letter: 'f' - Facebook zstd, 'z' - zlib, ...).
+ If the server supports one of these algorithms, it acknowledges use of this algorithm and all subsequent libpq messages sent from client to server and
+ vice versa will be compressed. If the server supports none of the suggested algorithms, it replies with the 'n' (no compression)
+ algorithm identifier and it is up to the client whether to continue work without 

Re: new heapcheck contrib module

2020-10-26 Thread Tom Lane
Robert Haas  writes:
> On Wed, Oct 21, 2020 at 11:45 PM Mark Dilger
>  wrote:
>> Done that way in the attached, which also include Robert's changes from v19 
>> he posted earlier today.

> Committed. Let's see what the buildfarm thinks.

Another thing that the buildfarm is pointing out is

[WARN] FOUserAgent - The contents of fo:block line 2 exceed the available area 
in the inline-progression direction by more than 50 points. (See position 
148863:380)

This is coming from the sample output for verify_heapam(), which is too
wide to fit in even a normal-size browser window, let alone A4 PDF.

While we could perhaps hack it up to allow more line breaks, or see
if \x formatting helps, my own suggestion would be to just nuke the
sample output altogether.  It doesn't look like it is any sort of
representative real output, and it is not useful enough to be worth
spending time to patch up.

regards, tom lane




Re: MultiXact\SLRU buffers configuration

2020-10-26 Thread Andrey Borodin



> On 26 Oct 2020, at 06:05, Alexander Korotkov 
> wrote:
> 
> Hi!
> 
> On Mon, Sep 28, 2020 at 5:41 PM Andrey M. Borodin  
> wrote:
>>> On 28 Aug 2020, at 23:08, Anastasia Lubennikova
>>> wrote:
>>> 
>>> 1) The first patch is sensible and harmless, so I think it is ready for 
>>> committer. I haven't tested the performance impact, though.
>>> 
>>> 2) I like the initial proposal to make various SLRU buffers configurable, 
>>> however, I doubt if it really solves the problem, or just moves it to 
>>> another place?
>>> 
>>> The previous patch you sent was based on some version that contained 
>>> changes for other slru buffers numbers: 'multixact_offsets_slru_buffers' 
>>> and 'multixact_members_slru_buffers'. Have you just forgot to attach them? 
>>> The patch message "[PATCH v2 2/4]" hints that you had 4 patches)
>>> Meanwhile, I attach the rebased patch to calm down the CFbot. The changes 
>>> are trivial.
>>> 
>>> 2.1) I think that both min and max values for this parameter are too 
>>> extreme. Have you tested them?
>>> 
>>> +   &multixact_local_cache_entries,
>>> +   256, 2, INT_MAX / 2,
>>> 
>>> 2.2) MAX_CACHE_ENTRIES is not used anymore, so it can be deleted.
>>> 
>>> 3) No changes for third patch. I just renamed it for consistency.
>> 
>> Thank you for your review.
>> 
>> Indeed, I had 4th patch with tests, but these tests didn't work well: I 
>> still did not manage to stress SLRUs to reproduce problem from production...
>> 
>> You are absolutely correct in point 2: I did only tests with sane values. 
>> And observed extreme performance degradation with values ~ 64 megabytes. I 
>> do not know which highest values should we pick? 1Gb? Or highest possible 
>> functioning value?
>> 
>> I greatly appreciate your review, sorry for so long delay. Thanks!
> 
> I took a look at this patchset.
> 
> The 1st and 3rd patches look good to me.  I made just minor improvements.
> 1) There is still a case when SimpleLruReadPage_ReadOnly() relocks the
> SLRU lock, which is already taken in exclusive mode.  I've evaded this
> by passing the lock mode as a parameter to
> SimpleLruReadPage_ReadOnly().
> 3) CHECK_FOR_INTERRUPTS() is not needed anymore, because it's called
> inside ConditionVariableSleep() if needed.  Also, no current wait
> events use slashes, and I don't think we should introduce slashes
> here.  Even if we should, then we should also rename existing wait
> events to be consistent with a new one.  So, I've renamed the new wait
> event to remove the slash.
> 
> Regarding the patch 2.  I see the current documentation in the patch
> doesn't explain to the user how to set the new parameter.  I think we
> should give users an idea what workloads need high values of
> multixact_local_cache_entries parameter and what doesn't.  Also, we
> should explain the negative aspects of high values
> multixact_local_cache_entries.  Ideally, we should get the advantage
> without overloading users with new nontrivial parameters, but I don't
> have a particular idea here.
> 
> I'd like to propose committing 1 and 3, but leave 2 for further review.

Thanks for your review, Alexander! 
+1 for avoiding double locking in SimpleLruReadPage_ReadOnly().
Other changes seem correct to me too.


I've tried to find the optimal value for the cache size, and it seems to me that it 
affects multixact scalability much less than the sizes of the offsets\members buffers. 
I concur that patch 2 of the patchset does not seem sufficiently documented.

Best regards, Andrey Borodin.



Re: automatic analyze: readahead - add "IO read time" log message

2020-10-26 Thread Stephen Frost
Greetings,

* Jakub Wartak (jakub.war...@tomtom.com) wrote:
> I have I hope interesting observation (and nano patch proposal) on system 
> where statistics freshness is a critical factor. Autovacuum/autogathering 
> statistics was tuned to be pretty very aggressive:
> autovacuum_vacuum_cost_delay=0 (makes autovacuum_vacuum_cost_limit irrelevant)
> autovacuum_naptime=1s
> autovacuum_max_workers=4
> 
> Some critical table partitions are configured with:  
> autovacuum_analyze_scale_factor=0.001, autovacuum_analyze_threshold=5 to 
> force auto-analyze jobs pretty often. The interesting logs are like this:
> automatic analyze of table "t1" system usage: CPU: user: 37.52 s, system: 
> 23.01 s, elapsed: 252.14 s
> automatic analyze of table "t2" system usage: CPU: user: 38.70 s, system: 
> 22.63 s, elapsed: 317.33 s
> automatic analyze of table "t2" system usage: CPU: user: 39.38 s, system: 
> 21.43 s, elapsed: 213.58 s
> automatic analyze of table "t1" system usage: CPU: user: 37.91 s, system: 
> 24.49 s, elapsed: 229.45 s

That's certainly pretty aggressive. :)

> and this is root-cause of my question.  As you can see there is huge 3x-4x 
> time discrepancy between "elapsed" and user+system which is strange at least 
> for me as there should be no waiting (autovacuum_vacuum_cost_delay=0). 
> According to various tools it is true: Time was wasted somewhere else, but 
> not in the PostgreSQL analyze. The ps(1) and pidstat(1) also report the same 
> for the worker:

The user/system time there is time-on-CPU (hence the 'CPU: ' prefix).

> 06:56:12 AM       PID    %usr %system  %guest    %CPU   CPU  Command
> 06:56:13 AM    114774    8.00   10.00    0.00   18.00    18  postgres
> 06:56:14 AM    114774    8.00   11.00    0.00   19.00    15  postgres
> 06:56:15 AM    114774    5.00   13.00    0.00   18.00    18  postgres
> 
> 06:56:17 AM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
> 06:56:18 AM    114774  63746.53      0.00      0.00  postgres
> 06:56:19 AM    114774  62896.00      0.00      0.00  postgres
> 06:56:20 AM    114774  62920.00      0.00      0.00  postgres
> 
> One could argue that such autoanalyze worker could perform 5x more of work 
> (%CPU -> 100%) here. The I/O system is not performing a lot (total = 242MB/s 
> reads@22k IOPS, 7MB/s writes@7k IOPS, with service time 0.04ms), although 
> reporting high utilization I'm pretty sure it could push much more. There can 
> be up to 3x-4x of such 70-80MB/s analyzes in parallel (let's say 300MB/s for 
> statistics collection alone).

The analyze is doing more-or-less random i/o since it's skipping through
the table picking out select blocks, not doing regular sequential i/o.

> According to various gdb backtraces, a lot of time is spent here:
> #0  0x7f98cdfc9f73 in __pread_nocancel () from /lib64/libpthread.so.0
> #1  0x00741a16 in pread (__offset=811253760, __nbytes=8192, 
> __buf=0x7f9413ab7280, __fd=<optimized out>) at /usr/include/bits/unistd.h:84
> #2  FileRead (file=<optimized out>, buffer=0x7f9413ab7280 "\037\005", 
> amount=8192, offset=811253760, wait_event_info=167772173) at fd.c:1883
> #3  0x00764b8f in mdread (reln=<optimized out>, forknum=<optimized out>, blocknum=19890902, buffer=0x7f9413ab7280 "\037\005") at md.c:596
> #4  0x0073d69b in ReadBuffer_common (smgr=<optimized out>, 
> relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=19890902, 
> mode=RBM_NORMAL, strategy=0x1102278, hit=0x7fffba7e2d4f)
> at bufmgr.c:897
> #5  0x0073e27e in ReadBufferExtended (reln=0x7f98c0c9ded0, 
> forkNum=MAIN_FORKNUM, blockNum=19890902, mode=<optimized out>, 
> strategy=<optimized out>) at bufmgr.c:665
> #6  0x004c2e2f in heapam_scan_analyze_next_block (scan=<optimized out>, 
> blockno=<optimized out>, bstrategy=<optimized out>) at 
> heapam_handler.c:998
> #7  0x00597de1 in table_scan_analyze_next_block (bstrategy=<optimized out>, blockno=<optimized out>, scan=0x10c8098) at 
> ../../../src/include/access/tableam.h:1462
> #8  acquire_sample_rows (onerel=0x7f98c0c9ded0, elevel=13, rows=0x1342e08, 
> targrows=150, totalrows=0x7fffba7e3160, totaldeadrows=0x7fffba7e3158) at 
> analyze.c:1048
> #9  0x00596a50 in do_analyze_rel (onerel=0x7f98c0c9ded0, 
> params=0x10744e4, va_cols=0x0, acquirefunc=0x597ca0 <acquire_sample_rows>, 
> relpages=26763525, inh=false,
> in_outer_xact=false, elevel=13) at analyze.c:502
> [..]
> #12 0x006e76b4 in autovacuum_do_vac_analyze (bstrategy=0x1102278, 
> tab=<optimized out>) at autovacuum.c:3118
> [..]

Sure, we're blocked on a read call trying to get the next block.
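
Just to illustrate the general direction such a discussion tends to go
(this is not what analyze does today, and the block size and look-ahead
constants are arbitrary): since acquire_sample_rows knows the sorted
block list up front, the kernel could in principle be told about
upcoming blocks so the reads overlap with processing.  A plain
POSIX-level sketch:

#include <fcntl.h>

#define SAMPLE_BLCKSZ   8192
#define PREFETCH_AHEAD  32      /* arbitrary look-ahead window */

/* Hint the kernel about blocks we will sample shortly. */
static void
prefetch_sample_blocks(int fd, const unsigned int *blocks, int nblocks, int current)
{
    int     stop = current + 1 + PREFETCH_AHEAD;

    for (int i = current + 1; i < nblocks && i < stop; i++)
        (void) posix_fadvise(fd,
                             (off_t) blocks[i] * SAMPLE_BLCKSZ,
                             SAMPLE_BLCKSZ,
                             POSIX_FADV_WILLNEED);
}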

> The interesting thing that targrows=1.5mlns and that blocks are accessed (as 
> expected) in sorted order:
> 
> Breakpoint 1, heapam_scan_analyze_next_block (scan=0x10c8098, 
> blockno=19890910, bstrategy=0x1102278) at heapam_handler.c:984
> Breakpoint 1, heapam_scan_analyze_next_block (scan=0x10c8098, 
> blockno=19890912, bstrategy=0x1102278) at heapam_handler.c:984
> Breakpoint 1, heapam_scan_analyze_next_block (scan=0x10c8098, 
> blockno=19890922, bstrategy=0x1102278) at heapam_handler.c:984
> Breakpoint 1, heapam_scan_analyze_next_block (scan=0x10c8098, 
> 

Re: new heapcheck contrib module

2020-10-26 Thread Andres Freund
Hi,

On October 26, 2020 7:13:15 AM PDT, Tom Lane  wrote:
>Robert Haas  writes:
>> On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger
>>  wrote:
 hornet is still unhappy even after
 321633e17b07968e68ca5341429e2c8bbf15c331?
>
>>> That appears to be a failed test for pg_surgery rather than for
>amcheck.  Or am I reading the log wrong?
>
>> Oh, yeah, you're right. I don't know why it just failed now, though:
>> there are a bunch of successful runs preceding it. But I guess it's
>> unrelated to this thread.
>
>pg_surgery's been unstable since it went in.  I believe Andres is
>working on a fix.

I posted one a while ago - was planning to push a cleaned up version soon if 
nobody comments in the near future.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Internal key management system

2020-10-26 Thread Stephen Frost
Greetings,

* Craig Ringer (craig.rin...@enterprisedb.com) wrote:
> On Mon, Oct 19, 2020 at 11:16 AM Masahiko Sawada 
>  wrote:
> > The patch introduces only key management infrastructure but with no
> > key. Currently, there is no interface to dynamically add a new
> > encryption key.
> 
> I'm a bit confused by the exact intent and use cases behind this patch.
> https://www.postgresql.org/message-id/17156d2e419.12a27f6df87825.436300492203108132%40highgo.ca
> that was somewhat helpful but not entirely clear.
> 
> The main intent of this proposal seems to be to power TDE-style encryption
> of data at rest, with a single master key for the entire cluster. Has any
> consideration been given to user- or role-level key management as part of
> this, or is that expected to be done separately and protected by the master
> key supplied by this patch?

I've not been following very closely, but I definitely agree with the
general feedback here (more on that below).  To this point- I do
believe that was the intent, or at least I sure hope that it was.  Being
able to have user/role keys would certainly be good.  Having a way for a
user to log in and unlock their key would also be really nice.

> If so, what if I have a HSM (or virtualised or paravirt or network proxied
> HSM) that I want to use to manage my database keys, such that the database
> master key is protected by the HSM? Say I want to put my database key in a
> smartcard, my machine's TPM, a usb HSM, a virtual HSM provided by my
> VM/cloud platform, etc?
> 
> As far as I can tell with the current design I'd have to encrypt my unlock
> passphrase and put it in the cluster_passphrase_command script or its
> arguments. The script would ask the HSM to decrypt the key passphrase and
> write that over stdio where Pg would read it and use it to decrypt the
> master key(s). That would work - but it should not be necessary and it
> weakens the protection offered by the HSM considerably.

Yeah, I do think this is how you'd need to do it and I agree that it'd
be better to offer an option that can go to the HSM directly.  That
said- I don't think we necessarily want to throw out the command-based
option, as users may wish to use a vaulting solution or similar instead
of an HSM.  What I am curious about though- what are the thoughts around
using a vaulting solution's command-line tool vs. writing code to work
with an API?  Between these various options, what are the risks of
having a script vs. using an API and would one or the other weaken the
overall solution?  Or is what's really needed here is a way to tell us
if it's a passphrase we're getting or a proper key, regardless of the
method being used to fetch it?

> I suggest we allow the user to supply their own KEK via a
> cluster_encryption_key GUC. If set, Pg would create an SSLContext with the
> supplied key and use that SSLContext to decrypt the application keys - with
> no intermediate KEK-derivation based on cluster_passphrase_command
> performed. cluster_encryption_key could be set to an openssl engine URI, in
> which case OpenSSL would transparently use the supplied engine (usually a
> HSM) to decrypt the application keys. We'd install the
> cluster_passphrase_command as an openssl askpass callback so that if the
> HSM requires an unlock password it can be provided - like how it's done for
> libpq in Pg 13. Some thought is required for how to do key rotation here,
> though it matters a great deal less when a HSM is managing key escrow.

This really locks us into OpenSSL for this, which I don't particularly
like.  If we do go down this route, we should definitely make it clear
that this is for use when PG has been built with OpenSSL, ie:
openssl_cluster_encryption_key as the parameter name, or such.

> For example if I want to lock my database with a YubiHSM I would configure
> something like:
> 
> cluster_encryption_key = 'pkcs11:token=YubiHSM;id=0:0001;type=private'
> 
> The DB would be encrypted and decrypted using application keys unlocked by
> the HSM. Backups of the database, stolen disk images, etc, would be
> unreadable unless you have access to another HSM with the same key loaded.

Well, you would surely just need the key: since you could change the PG
config to fetch the key from wherever you have it, you wouldn't need an
actual HSM.

> If cluster_encryption_key is unset, Pg would perform its own KEK derivation
> based on cluster_passphrase_command as currently implemented.

To what I was suggesting above- what if we just had a GUC that's
"kek_method" with options 'passphrase' and 'direct', where passphrase
goes through the KEK and 'direct' doesn't, which just changes how we treat
the results of calling cluster_passphrase_command?

> I really don't think we should be adopting something that doesn't consider
> platform based hardware key escrow and protection.

I agree that we should consider platform based hardware key escrow and
protection.  I'm generally supportive of trying to do so in a way 

Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?

2020-10-26 Thread Zhenghua Lyu
Hi,
    when grouping by multiple columns, the planner multiplies all the per-column 
distinct estimates together, and if one column is entirely NULL,
it still contributes 200 to the final estimate; if the product exceeds 
the relation size, it is clamped.

So the estimated size of the aggregate rel is not correct, which impacts the 
upper paths' cost estimates and does not
lead to a good plan.

I debugged some other queries and found this issue, but I'm not sure whether it 
is the root cause of my problem;
I'm just opening a thread here for discussion.

From: Tom Lane 
Sent: Monday, October 26, 2020 10:37 PM
To: Zhenghua Lyu 
Cc: pgsql-hackers@lists.postgresql.org 
Subject: Re: Should the function get_variable_numdistinct consider the case 
when stanullfrac is 1.0?

Zhenghua Lyu  writes:
>It seems the function `get_variable_numdistinct` ignore the case when 
> stanullfrac is 1.0:

I don't like this patch at all.  What's the argument for having a special
case for this value?  When would we ever get exactly 1.0 in practice?

regards, tom lane


Re: Additional Chapter for Tutorial

2020-10-26 Thread David G. Johnston
Removing -docs as moderation won’t let me cross-post.

On Monday, October 26, 2020, David G. Johnston 
wrote:

> On Monday, October 26, 2020, Jürgen Purtz  wrote:
>
>> On 21.10.20 22:33, David G. Johnston wrote:
>>
>>
>> Two, I find the amount of detail being provided here to be on the
>> too-much side.  A bit more judicious use of links into the appropriate
>> detail chapters seems warranted.
>>
>> The patch is intended to give every interested person an overall
>> impression of the chapter within its new position. Because it has moved
>> from part 'Tutorial' to 'Internals' the text should be very accurate
>> concerning technical issues - like all the other chapters in this part. A
>> tutorial chapter has a more superficial nature.
>>
> Haven’t reviewed the patches yet but...
>
> I still think that my comment applies even with the move to internals.
> The value here is putting together a coherent narrative and making deeper
> implementation details accessible.  If those details are already covered
> elsewhere in the documentation (not source code) links should be given
> serious consideration.
>
> David J.
>
>


Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?

2020-10-26 Thread Tom Lane
Zhenghua Lyu  writes:
>It seems the function `get_variable_numdistinct` ignore the case when 
> stanullfrac is 1.0:

I don't like this patch at all.  What's the argument for having a special
case for this value?  When would we ever get exactly 1.0 in practice?

regards, tom lane




Re: default result formats setting

2020-10-26 Thread Tom Lane
Peter Eisentraut  writes:
> The conceptual solution is to allow a client to register for a session 
> which types it wants to always get in binary, unless it says otherwise. 

OK.

> In the discussion in [0], I pondered a new protocol message for that, 
> but after further thought, a GUC setting would do just as well.

I think a GUC is conceptually the wrong level ...

> In order to be able to test this via libpq, I had to add a little hack. 

... which is part of the reason why you have to kluge this.  I'm not
entirely certain which levels of the client stack need to know about
this, but surely libpq is one.

I'm also quite worried about failures (maybe even security problems)
arising from the "wrong level" of the client stack setting the GUC.

Independently of that, how would you implement "says otherwise" here,
ie do a single-query override of the session's prevailing setting?
Maybe the right thing for that is to define -1 all the way down to the
protocol level as meaning "use the session's per-type default", and
then if you don't want that you can pass 0 or 1.  An advantage of that
is that you couldn't accidentally break an application that wasn't
ready for this feature, because it would not be the default to use it.
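
For illustration, from libpq's existing per-query interface that could
look roughly like this (PQexecParams() is real; the -1 value is only the
proposal above, not something libpq accepts today):

#include <libpq-fe.h>

/*
 * resultFormat = 0 (text) and 1 (binary) work today; under the proposal,
 * -1 would mean "use the session's per-type default formats".
 */
static PGresult *
fetch_row(PGconn *conn)
{
    return PQexecParams(conn,
                        "SELECT now(), 42",
                        0, NULL, NULL, NULL, NULL,
                        1 /* or -1 under the proposed scheme */);
}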

regards, tom lane




Re: libpq compression

2020-10-26 Thread Konstantin Knizhnik

Remove redundant processed parameter from zpq_read function.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/configure b/configure
index 071d050..12ae2dd 100755
--- a/configure
+++ b/configure
@@ -700,6 +700,7 @@ LD
 LDFLAGS_SL
 LDFLAGS_EX
 with_zlib
+with_zstd
 with_system_tzdata
 with_libxslt
 XML2_LIBS
@@ -867,6 +868,7 @@ with_libxml
 with_libxslt
 with_system_tzdata
 with_zlib
+with_zstd
 with_gnu_ld
 enable_largefile
 '
@@ -8571,6 +8573,85 @@ fi
 
 
 
+#
+# ZStd
+#
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+  withval=$with_zstd;
+  case $withval in
+yes)
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  with_zstd=no
+
+fi
+
+
+
+
+if test "$with_zstd" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_zstd_ZSTD_compress=yes
+else
+  ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+  LIBS="-lzstd $LIBS"
+
+else
+  as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
+
 
 #
 # Zlib
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index de60281..4a1f49c 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1225,6 +1225,20 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  compression
+  
+  
+Request compression of libpq traffic. The client sends the server the list of compression algorithms supported by the client library.
+If the server supports one of these algorithms, it acknowledges use of this algorithm and then all libpq messages sent from client to server and
+vice versa will be compressed. If the server supports none of the suggested algorithms, it replies with an 'n' (no compression)
+message and it is up to the client whether to continue working without compression or report an error.
+Supported compression algorithms are chosen at configure time. Right now two libraries are supported: zlib (default) and zstd (if Postgres was
+configured with the --with-zstd option). In both cases streaming mode is used.
+  
+  
+ 
+
  
   client_encoding
   
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a46cb72..6ff5777 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -92,6 +92,15 @@
such as COPY.
   
 
+  
+It is possible to compress protocol data to reduce traffic and speed up client-server interaction.
+Compression is especially useful for importing/exporting data to/from the database using the COPY command
+and for replication (both physical and logical). Compression can also reduce server response time
+for queries returning large amounts of data (for example JSON, BLOBs, text, ...).
+Right now two libraries are supported: zlib (default) and zstd (if Postgres was
+configured with the --with-zstd option). In both cases streaming mode is used.
+  
+
  
   Messaging Overview
 
@@ -263,6 +272,21 @@
  
 
  
+  CompressionAck
+  
+   
+ The server acknowledges the use of compression for the client-server communication protocol.
+ Compression can be requested by the client by including the "compression" option in the connection string.
+ The client sends the server the list of compression algorithms supported by the client library
+ (each compression algorithm is identified by one letter: 'f' - Facebook zstd, 'z' - zlib, ...).
+ If the server supports one of these algorithms, it acknowledges use of this algorithm and all subsequent libpq messages sent from client to server and
+ vice versa will be compressed. If the server supports none of the suggested algorithms, it replies with the 'n' (no compression)
+ algorithm identifier and it is up to the client 

Re: new heapcheck contrib module

2020-10-26 Thread Tom Lane
Robert Haas  writes:
> On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger
>  wrote:
>>> hornet is still unhappy even after
>>> 321633e17b07968e68ca5341429e2c8bbf15c331?

>> That appears to be a failed test for pg_surgery rather than for amcheck.  Or 
>> am I reading the log wrong?

> Oh, yeah, you're right. I don't know why it just failed now, though:
> there are a bunch of successful runs preceding it. But I guess it's
> unrelated to this thread.

pg_surgery's been unstable since it went in.  I believe Andres is
working on a fix.

regards, tom lane




Re: new heapcheck contrib module

2020-10-26 Thread Mark Dilger



> On Oct 26, 2020, at 7:00 AM, Robert Haas  wrote:
> 
> On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger
>  wrote:
>> Much of the test in 0002 could be ported to work without committing the rest 
>> of 0002, if the pg_amcheck command line utiilty is not wanted.
> 
> How much consensus do we think we have around 0002 at this point? I
> think I remember a vote in favor and no votes against, but I haven't
> been paying a whole lot of attention.

My sense over the course of the thread is that people were very much in favor 
of having heap checking functionality, but quite vague on whether they wanted 
the command line interface.  I think the interface is useful, but I'd rather 
hear from others on this list whether it is useful enough to justify 
maintaining it.  As the author of it, I'm biased.  Hopefully others with a more 
objective view of the matter will read this and vote?

I don't recall patches 0003 through 0005 getting any votes.  0003 and 0004, 
which create and use a non-throwing interface to clog, were written in response 
to Andrey's request, so I'm guessing that's kind of a vote in favor.  0005 was 
factored out of of 0001 in response to a lack of agreement about whether 
verify_heapam should have acl checks.  You seemed in favor, and Peter against, 
but I don't think we heard other opinions.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PoC] Non-volatile WAL buffer

2020-10-26 Thread Heikki Linnakangas
I had a new look at this thread today, trying to figure out where we 
are. I'm a bit confused.


One thing we have established: mmap()ing WAL files performs worse than 
the current method, if pg_wal is not on a persistent memory device. This 
is because the kernel faults in existing content of each page, even 
though we're overwriting everything.


That's unfortunate. I was hoping that mmap() would be a good option even 
without persistent memory hardware. I wish we could tell the kernel to 
zero the pages instead of reading them from the file. Maybe clear the 
file with ftruncate() before mmapping it?
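
Something along these lines, I mean; plain POSIX calls, not tied to the
current WAL code, and whether the kernel really avoids faulting in the
old contents after the truncate dance is exactly what would need to be
measured:

#include <stddef.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map a WAL-sized segment for writing, discarding its previous contents
 * first so the kernel has nothing to read back in on first touch.
 */
static char *
map_zeroed_segment(const char *path, size_t segsize)
{
    int     fd = open(path, O_RDWR | O_CREAT, 0600);
    char   *p;

    if (fd < 0)
        return NULL;
    if (ftruncate(fd, 0) < 0 ||                 /* throw away the old blocks */
        ftruncate(fd, (off_t) segsize) < 0)     /* re-extend the file as a hole */
    {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, segsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    return (p == MAP_FAILED) ? NULL : p;
}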


That should not be a problem with a real persistent memory device, however 
(or when emulating it with DRAM). With DAX, the storage is memory-mapped 
directly and there is no page cache, and no pre-faulting.


Because of that, I'm baffled by what the 
v4-0002-Non-volatile-WAL-buffer.patch does. If I understand it 
correctly, it puts the WAL buffers in a separate file, which is stored 
on the NVRAM. Why? I realize that this is just a Proof of Concept, but 
I'm very much not interested in anything that requires the DBA to manage 
a second WAL location. Did you test the mmap() patches with persistent 
memory hardware? Did you compare that with the pmem patchset, on the 
same hardware? If there's a meaningful performance difference between 
the two, what's causing it?


- Heikki




Re: Internal key management system

2020-10-26 Thread Craig Ringer
On Mon, Oct 19, 2020 at 11:16 AM Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:

The patch introduces only key management infrastructure but with no
> key. Currently, there is no interface to dynamically add a new
> encryption key.


I'm a bit confused by the exact intent and use cases behind this patch.
https://www.postgresql.org/message-id/17156d2e419.12a27f6df87825.436300492203108132%40highgo.ca
was somewhat helpful but not entirely clear.

The main intent of this proposal seems to be to power TDE-style encryption
of data at rest, with a single master key for the entire cluster. Has any
consideration been given to user- or role-level key management as part of
this, or is that expected to be done separately and protected by the master
key supplied by this patch?

If so, what if I have a HSM (or virtualised or paravirt or network proxied
HSM) that I want to use to manage my database keys, such that the database
master key is protected by the HSM? Say I want to put my database key in a
smartcard, my machine's TPM, a usb HSM, a virtual HSM provided by my
VM/cloud platform, etc?

As far as I can tell with the current design I'd have to encrypt my unlock
passphrase and put it in the cluster_passphrase_command script or its
arguments. The script would ask the HSM to decrypt the key passphrase and
write that over stdio where Pg would read it and use it to decrypt the
master key(s). That would work - but it should not be necessary and it
weakens the protection offered by the HSM considerably.

I suggest we allow the user to supply their own KEK via a
cluster_encryption_key GUC. If set, Pg would create an SSLContext with the
supplied key and use that SSLContext to decrypt the application keys - with
no intermediate KEK-derivation based on cluster_passphrase_command
performed. cluster_encryption_key could be set to an openssl engine URI, in
which case OpenSSL would transparently use the supplied engine (usually a
HSM) to decrypt the application keys. We'd install the
cluster_passphrase_command as an openssl askpass callback so that if the
HSM requires an unlock password it can be provided - like how it's done for
libpq in Pg 13. Some thought is required for how to do key rotation here,
though it matters a great deal less when a HSM is managing key escrow.

For example if I want to lock my database with a YubiHSM I would configure
something like:

cluster_encryption_key = 'pkcs11:token=YubiHSM;id=0:0001;type=private'

The DB would be encrypted and decrypted using application keys unlocked by
the HSM. Backups of the database, stolen disk images, etc, would be
unreadable unless you have access to another HSM with the same key loaded.
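
To sketch what I mean by transparently using the supplied engine (this
uses OpenSSL's generic ENGINE API; the "pkcs11" engine name assumes
something like engine_pkcs11 is installed, and real code would obviously
need proper error reporting and the askpass hookup):

#include <openssl/engine.h>
#include <openssl/evp.h>

/* Load the KEK from a PKCS#11 engine instead of deriving it from a passphrase. */
static EVP_PKEY *
load_kek_from_engine(const char *key_uri)
{
    ENGINE     *e;
    EVP_PKEY   *kek = NULL;

    ENGINE_load_builtin_engines();
    e = ENGINE_by_id("pkcs11");     /* assumes engine_pkcs11 is available */
    if (e && ENGINE_init(e))
    {
        kek = ENGINE_load_private_key(e, key_uri, NULL, NULL);
        ENGINE_finish(e);
    }
    if (e)
        ENGINE_free(e);
    return kek;
}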

If cluster_encryption_key is unset, Pg would perform its own KEK derivation
based on cluster_passphrase_command as currently implemented.

I really don't think we should be adopting something that doesn't consider
platform based hardware key escrow and protection.


Re: new heapcheck contrib module

2020-10-26 Thread Robert Haas
On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger
 wrote:
> Much of the test in 0002 could be ported to work without committing the rest 
> of 0002, if the pg_amcheck command line utiilty is not wanted.

How much consensus do we think we have around 0002 at this point? I
think I remember a vote in favor and no votes against, but I haven't
been paying a whole lot of attention.

> > Thanks for committing (and adjusting) the patches for the existing
> > buildfarm failures. If I understand the buildfarm results correctly,
> > hornet is still unhappy even after
> > 321633e17b07968e68ca5341429e2c8bbf15c331?
>
> That appears to be a failed test for pg_surgery rather than for amcheck.  Or 
> am I reading the log wrong?

Oh, yeah, you're right. I don't know why it just failed now, though:
there are a bunch of successful runs preceding it. But I guess it's
unrelated to this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: new heapcheck contrib module

2020-10-26 Thread Mark Dilger



> On Oct 26, 2020, at 6:37 AM, Robert Haas  wrote:
> 
> On Fri, Oct 23, 2020 at 2:04 PM Tom Lane  wrote:
>> Seems to work, so I pushed it (after some compulsive fooling
>> about with whitespace and perltidy-ing).  It appears to me that
>> the code coverage for verify_heapam.c is not very good though,
>> only circa 50%.  Do we care to expend more effort on that?
> 
> There are two competing goods here. On the one hand, more test
> coverage is better than less. On the other hand, finicky tests that
> have platform-dependent results or fail for strange reasons not
> indicative of actual problems with the code are often judged not to be
> worth the trouble. An early version of this patch set had a very
> extensive chunk of Perl code in it that actually understood the page
> layout and, if we adopt something like that, it would probably be
> easier to test a whole bunch of scenarios. The downside is that it was
> a lot of code that basically duplicated a lot of backend logic in
> Perl, and I was (and am) afraid that people will complain about the
> amount of code and/or the difficulty of maintaining it. On the other
> hand, having all that code might allow better testing not only of this
> particular patch but also other scenarios involving corrupted pages,
> so maybe it's wrong to view all that code as a burden that we have to
> carry specifically to test this; or, alternatively, maybe it's worth
> carrying even if we only use it for this. On the third hand, as Mark
> points out, if we get 0002 committed, that will help somewhat with
> test coverage even if we do nothing else.

Much of the test in 0002 could be ported to work without committing the rest of 
0002, if the pg_amcheck command line utiilty is not wanted.

> 
> Thanks for committing (and adjusting) the patches for the existing
> buildfarm failures. If I understand the buildfarm results correctly,
> hornet is still unhappy even after
> 321633e17b07968e68ca5341429e2c8bbf15c331?

That appears to be a failed test for pg_surgery rather than for amcheck.  Or am 
I reading the log wrong?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: new heapcheck contrib module

2020-10-26 Thread Robert Haas
On Fri, Oct 23, 2020 at 2:04 PM Tom Lane  wrote:
> Seems to work, so I pushed it (after some compulsive fooling
> about with whitespace and perltidy-ing).  It appears to me that
> the code coverage for verify_heapam.c is not very good though,
> only circa 50%.  Do we care to expend more effort on that?

There are two competing goods here. On the one hand, more test
coverage is better than less. On the other hand, finicky tests that
have platform-dependent results or fail for strange reasons not
indicative of actual problems with the code are often judged not to be
worth the trouble. An early version of this patch set had a very
extensive chunk of Perl code in it that actually understood the page
layout and, if we adopt something like that, it would probably be
easier to test a whole bunch of scenarios. The downside is that it was
a lot of code that basically duplicated a lot of backend logic in
Perl, and I was (and am) afraid that people will complain about the
amount of code and/or the difficulty of maintaining it. On the other
hand, having all that code might allow better testing not only of this
particular patch but also other scenarios involving corrupted pages,
so maybe it's wrong to view all that code as a burden that we have to
carry specifically to test this; or, alternatively, maybe it's worth
carrying even if we only use it for this. On the third hand, as Mark
points out, if we get 0002 committed, that will help somewhat with
test coverage even if we do nothing else.

Thanks for committing (and adjusting) the patches for the existing
buildfarm failures. If I understand the buildfarm results correctly,
hornet is still unhappy even after
321633e17b07968e68ca5341429e2c8bbf15c331?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




PATCH: Report libpq version and configuration

2020-10-26 Thread Craig Ringer
The attached patches propose new interfaces for exposing more configuration
and versioning information from libpq at runtime. They are to be used by
applications to obtain finer-grained information about libpq's
configuration (SSL, GSSAPI, etc.), to identify libpq binaries, and by
applications that use libpq to report diagnostic information.


Patch 0001 adds PQlibInfo(), which returns an array of key/value
description items reporting on configuration like the full version string,
SSL support, gssapi support, thread safety, default port and default unix
socket path. This is for application use and application diagnostics. It
also adds PQlibInfoPrint() which dumps PQlibInfo() keys/values to stdout.
See the commit message in patch 0001 for details.


Patch 0002 exposes LIBPQ_VERSION_STR, LIBPQ_VERSION_NUM and
LIBPQ_CONFIGURE_ARGS symbols in the dynamic symbol table. These can be
accessed by a debugger even when the library cannot be loaded or executed,
and unlike macros are available even in a stripped executable. So they can
be used to identify a libpq binary found in the wild. Their storage is
shared with PQlibInfo()'s static data, so they only cost three symbol table
entries.

$ cp ./build/src/interfaces/libpq/libpq.so libpq.so.stripped
$ strip libpq.so.stripped
$ gdb -batch -ex 'p (int)LIBPQ_VERSION_NUM' -ex 'p (const char
*)LIBPQ_VERSION_STR' -ex 'p (const char *)LIBPQ_CONFIGURE_ARGS'
./libpq.so.stripped
$1 = 14
$2 = 0x285f0 "PostgreSQL 14devel on x86_64-pc-linux-gnu, "
$3 = 0x28660 " '--cache-file=config.cache-'
'--prefix=/home/craig/pg/master' '--enable-debug' '--enable-cassert'
'--enable-tap-tests' '--enable-dtrace' 'CC=/usr/lib64/ccache/gcc'
'CFLAGS=-Og -ggdb3' ..."



Patch 0003 allows libpq.so to be executed directly from the command line to
print its version, configure arguments etc exactly as PQlibInfoPrint()
would output them. This is only enabled on x64 linux for now but can be
extended to other targets quite simply.

$  ./build/src/interfaces/libpq/libpq.so
VERSION_NUM: 14
VERSION: PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (GCC)
10.2.1 20200723 (Red Hat 10.2.1-1), 64-bit
CONFIGURE_ARGS:  '--cache-file=config.cache-'
'--prefix=/home/craig/pg/master' '--enable-debug' '--enable-cassert'
'--enable-tap-tests' '--enable-dtrace' 'CC=/usr/lib64/ccache/gcc'
'CFLAGS=-Og -ggdb3' 'CPPFLAGS=' 'CPP=/usr/lib64/ccache/gcc -E'
USE_SSL: 0
ENABLE_GSS: 0
ENABLE_THREAD_SAFETY: 1
HAVE_UNIX_SOCKETS: 1
DEFAULT_PGSOCKET_DIR: /tmp
DEF_PGPORT: 5432
From a4741725a9596b25a9b7308b6a47e85b889d4557 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Mon, 26 Oct 2020 20:43:24 +0800
Subject: [PATCH v1 2/3] Add libpq version information to dynamic symbol table

Add three new data symbols LIBPQ_VERSION_STR, LIBPQ_VERSION_NUM and
LIBPQ_CONFIGURE_ARGS to libpq's symbol table. These make it much easier to
identify a given libpq without debuginfo, even if it cannot be executed (e.g.
due to dependency problems or mismatched platforms).

$ strip libpq.so
$ gdb -batch -ex 'p (int)LIBPQ_VERSION_NUM' \
 -ex 'p (const char *)LIBPQ_VERSION_STR' \
 -ex 'p (const char *)LIBPQ_CONFIGURE_ARGS' \
 ./libpq.so.stripped
$1 = 14
$2 = 0x285f0 "PostgreSQL 14devel on x86_64 "
$3 = 0x28660 " '--enable-debug' '--enable-cassert' "
---
 src/interfaces/libpq/exports.txt |  3 +++
 src/interfaces/libpq/libpq-version.c | 32 +---
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1d8600f0d8..64fb62c802 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -181,3 +181,6 @@ PQgetSSLKeyPassHook_OpenSSL 178
 PQdefaultSSLKeyPassHook_OpenSSL 179
 PQlibInfo   180
 PQlibInfoPrint  181
+LIBPQ_VERSION_STR   182
+LIBPQ_VERSION_NUM   183
+LIBPQ_CONFIGURE_ARGS184
diff --git a/src/interfaces/libpq/libpq-version.c b/src/interfaces/libpq/libpq-version.c
index 51a5c608c8..bab0263848 100644
--- a/src/interfaces/libpq/libpq-version.c
+++ b/src/interfaces/libpq/libpq-version.c
@@ -47,21 +47,47 @@
 
 #define libinfo_boolstr(x) (x) == 1 ? gettext_noop("1") : gettext_noop("0")
 
+/*
+ * Store postgres version information in const symbols in the read-only data
+ * section of the libpq binary so they are exposed in the export symbol table,
+ * not just as macros.
+ *
+ * This way the exact version of a given libpq.so / libpq.dll can be determined
+ * even if it was built without macros in debuginfo or if it is stripped, and
+ * without necessarily being able to execute the binary (e.g. due to linker
+ * issues).
+ *
+ * We need to store these for PGlibInfoEntries anyway, this just makes them
+ * individually addressable for the use of trace/diagnostic tools and
+ * debuggers.
+ *
+ * The linker will put these in read-only data 

Re: Collation versioning

2020-10-26 Thread Julien Rouhaud
On Sun, Oct 25, 2020 at 7:13 PM Julien Rouhaud  wrote:
>
> On Sun, Oct 25, 2020 at 10:36 AM Thomas Munro  wrote:
> >
> > On Fri, Oct 23, 2020 at 2:07 AM Julien Rouhaud  wrote:
> > > While reviewing the changes, I found a couple of minor issues
> > > (inherited from the previous versions).  It's missing a
> > > free_objects_addresses in recordDependencyOnCollations, and there's a
> > > small typo.  Inline diff:
> >
> > Thanks, fixed.
> >
> > I spent the past few days trying to simplify this patch further.
> > Here's what I came up with:
>
> Thanks!
>
> >
> > 1.  Drop the function dependencyExists() and related code, which in
> > earlier versions tried to avoid creating duplicate pg_depends rows by
> > merging with existing rows.  This was rather complicated, and there
> > are not many duplicates anyway, and it's easier to suppress duplicate
> > warnings at warning time (see do_collation_version_check()).  (I'm not
> > against working harder to make pg_depend rows unique, but it's not
> > required for this and I didn't like that way of doing it.)
>
> I didn't review all the changes yet, so I'll probably post a deeper
> review tomorrow.  I'm not opposed to this new approach, as it indeed
> saves a lot of code.  However, looking at
> do_collation_version_check(), it seems that you're saving the
> collation in context->checked_calls even if it didn't raise a WARNING.
> Since you can now have indexes with dependencies on a same collation
> with both version tracked and untracked (see for instance
> icuidx00_val_pattern_where in the regression tests), can't this hide
> corruption warning reports if the untracked version is found first?
> That can be easily fixed, so no objection to that approach of course.

I finished looking at the rest of the patches.  I don't have much to
say; it all looks good and I quite like how much useless code you got
rid of!




automatic analyze: readahead - add "IO read time" log message

2020-10-26 Thread Jakub Wartak
Greetings hackers,

I have what I hope is an interesting observation (and a nano patch proposal) on a 
system where statistics freshness is a critical factor. Autovacuum/statistics 
autogathering was tuned to be very aggressive:
autovacuum_vacuum_cost_delay=0 (makes autovacuum_vacuum_cost_limit irrelevant)
autovacuum_naptime=1s
autovacuum_max_workers=4

Some critical table partitions are configured with
autovacuum_analyze_scale_factor=0.001 and autovacuum_analyze_threshold=5 to
force auto-analyze jobs to run pretty often. The interesting log entries look like this:
automatic analyze of table "t1" system usage: CPU: user: 37.52 s, system: 23.01 
s, elapsed: 252.14 s
automatic analyze of table "t2" system usage: CPU: user: 38.70 s, system: 22.63 
s, elapsed: 317.33 s
automatic analyze of table "t2" system usage: CPU: user: 39.38 s, system: 21.43 
s, elapsed: 213.58 s
automatic analyze of table "t1" system usage: CPU: user: 37.91 s, system: 24.49 
s, elapsed: 229.45 s

and this is the root cause of my question.  As you can see, there is a huge
3x-4x discrepancy between "elapsed" and user+system time, which is strange,
at least to me, as there should be no waiting (autovacuum_vacuum_cost_delay=0).
According to various tools that is indeed the case: the time was spent
somewhere else, not inside PostgreSQL's analyze work itself. ps(1) and
pidstat(1) also report the same for the worker:

06:56:12 AM       PID    %usr %system  %guest    %CPU   CPU  Command
06:56:13 AM    114774    8.00   10.00    0.00   18.00    18  postgres
06:56:14 AM    114774    8.00   11.00    0.00   19.00    15  postgres
06:56:15 AM    114774    5.00   13.00    0.00   18.00    18  postgres

06:56:17 AM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
06:56:18 AM    114774  63746.53      0.00      0.00  postgres
06:56:19 AM    114774  62896.00      0.00      0.00  postgres
06:56:20 AM    114774  62920.00      0.00      0.00  postgres

One could argue that such an autoanalyze worker could perform 5x more work
(%CPU -> 100%) here. The I/O system is not doing all that much (total =
242 MB/s reads @ 22k IOPS, 7 MB/s writes @ 7k IOPS, with 0.04 ms service
time); although it reports high utilization, I'm pretty sure it could push
much more. There can be up to 3x-4x such 70-80 MB/s analyzes running in
parallel (say 300 MB/s for statistics collection alone).

According to various gdb backtraces, a lot of time is spent here:
#0  0x7f98cdfc9f73 in __pread_nocancel () from /lib64/libpthread.so.0
#1  0x00741a16 in pread (__offset=811253760, __nbytes=8192, 
__buf=0x7f9413ab7280, __fd=) at /usr/include/bits/unistd.h:84
#2  FileRead (file=, buffer=0x7f9413ab7280 "\037\005", 
amount=8192, offset=811253760, wait_event_info=167772173) at fd.c:1883
#3  0x00764b8f in mdread (reln=, forknum=, blocknum=19890902, buffer=0x7f9413ab7280 "\037\005") at md.c:596
#4  0x0073d69b in ReadBuffer_common (smgr=, 
relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=19890902, 
mode=RBM_NORMAL, strategy=0x1102278, hit=0x7fffba7e2d4f)
at bufmgr.c:897
#5  0x0073e27e in ReadBufferExtended (reln=0x7f98c0c9ded0, 
forkNum=MAIN_FORKNUM, blockNum=19890902, mode=, 
strategy=) at bufmgr.c:665
#6  0x004c2e2f in heapam_scan_analyze_next_block (scan=, 
blockno=, bstrategy=) at heapam_handler.c:998
#7  0x00597de1 in table_scan_analyze_next_block (bstrategy=, blockno=, scan=0x10c8098) at 
../../../src/include/access/tableam.h:1462
#8  acquire_sample_rows (onerel=0x7f98c0c9ded0, elevel=13, rows=0x1342e08, 
targrows=150, totalrows=0x7fffba7e3160, totaldeadrows=0x7fffba7e3158) at 
analyze.c:1048
#9  0x00596a50 in do_analyze_rel (onerel=0x7f98c0c9ded0, 
params=0x10744e4, va_cols=0x0, acquirefunc=0x597ca0 , 
relpages=26763525, inh=false,
in_outer_xact=false, elevel=13) at analyze.c:502
[..]
#12 0x006e76b4 in autovacuum_do_vac_analyze (bstrategy=0x1102278, 
tab=) at autovacuum.c:3118
[..]

The interesting thing is that targrows is ~1.5 million and that blocks are
accessed (as expected) in sorted order:

Breakpoint 1, heapam_scan_analyze_next_block (scan=0x10c8098, blockno=19890910, 
bstrategy=0x1102278) at heapam_handler.c:984
Breakpoint 1, heapam_scan_analyze_next_block (scan=0x10c8098, blockno=19890912, 
bstrategy=0x1102278) at heapam_handler.c:984
Breakpoint 1, heapam_scan_analyze_next_block (scan=0x10c8098, blockno=19890922, 
bstrategy=0x1102278) at heapam_handler.c:984
Breakpoint 1, heapam_scan_analyze_next_block (scan=0x10c8098, blockno=19890935, 
bstrategy=0x1102278) at heapam_handler.c:984
Breakpoint 1, heapam_scan_analyze_next_block (scan=0x10c8098, blockno=19890996, 
bstrategy=0x1102278) at heapam_handler.c:984

And this probably explains the discrepancy: perf with CPU usage reporting shows
a lot of frames waiting on I/O in readaheads done by ext4 (trimmed for clarity):

# Children      Self       sys       usr  Command   Shared Object      Symbol
    63.64%     0.00%     0.00%     0.00%  postgres  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
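
The nano patch proposal from the subject is essentially about surfacing this
I/O wait in the autoanalyze log line. A simplified sketch of what I mean (not
the exact patch; it assumes track_io_timing is enabled so that
pgBufferUsage.blk_read_time is populated, and the placement inside
do_analyze_rel() is only illustrative):

    /* Sketch only: capture the buffer read time accumulated by this backend
     * while ANALYZE samples the table, and report it next to the existing
     * "system usage" line.  blk_read_time is only maintained when
     * track_io_timing is on; otherwise the reported value stays at zero. */
    instr_time  io_start = pgBufferUsage.blk_read_time;

    /* ... acquire_sample_rows() and statistics computation happen here ... */

    if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
    {
        instr_time  io_read = pgBufferUsage.blk_read_time;

        INSTR_TIME_SUBTRACT(io_read, io_start);
        ereport(LOG,
                (errmsg("automatic analyze of table \"%s\" IO read time: %.3f s, system usage: %s",
                        RelationGetRelationName(onerel),
                        INSTR_TIME_GET_DOUBLE(io_read),
                        pg_rusage_show(&ru0))));
    }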

Re: pgbench - add pseudo-random permutation function

2020-10-26 Thread Muhammad Usama
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

There are a few "Stripping trailing CRs from the patch" warnings and one "Hunk
succeeded offset -2 lines" warning.
Other than that, the patch applies successfully and works as it claims.

The new status of this patch is: Ready for Committer


Re: POC: GROUP BY optimization

2020-10-26 Thread Dmitry Dolgov
> On Mon, Oct 26, 2020 at 01:28:59PM +0400, Pavel Borisov wrote:
> > Thanks for your interest! FYI there is a new thread about this topic [1]
> > with the next version of the patch and more commentaries (I've created
> > it for visibility purposes, but probably it also created some confusion,
> > sorry for that).
> >
> > Thanks!
>
> I took a very quick look at your updates and noticed that the patch is
> intended to be simple and that some parts of the code were removed because
> they had little test coverage. I'd propose the opposite: increase test
> coverage so we can enjoy more precise cost calculation and probably partial
> grouping.
>
> Or maybe it's worth benchmarking both patches and then re-deciding whether
> we want the more complicated or the simpler version.
>
> Good to know that this feature is not stuck anymore and that we have more
> than one proposal.
> Thanks!

Just to clarify, the patch that I've posted in the other thread mentioned
above is not an alternative proposal, but a continuation of the same patch I
had posted in this thread. As mentioned in [1], the reduction in
functionality is an attempt to narrow the scope, and the removed parts will
be brought back as soon as the base functionality looks good enough.

[1]: 
https://www.postgresql.org/message-id/flat/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com




Re: POC: GROUP BY optimization

2020-10-26 Thread Pavel Borisov
> Thanks for your interest! FYI there is a new thread about this topic [1]
> with the next version of the patch and more commentaries (I've created
> it for visibility purposes, but probably it also created some confusion,
> sorry for that).
>
> Thanks!

I took a very quick look at your updates and noticed that the patch is
intended to be simple and that some parts of the code were removed because
they had little test coverage. I'd propose the opposite: increase test
coverage so we can enjoy more precise cost calculation and probably partial
grouping.

Or maybe it's worth benchmarking both patches and then re-deciding whether
we want the more complicated or the simpler version.

Good to know that this feature is not stuck anymore and that we have more
than one proposal.
Thanks!

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: select_common_typmod

2020-10-26 Thread Heikki Linnakangas

On 20/10/2020 11:58, Peter Eisentraut wrote:

While working on another patch, I figured adding a
select_common_typmod() to go along with select_common_type() and
select_common_collation() would be handy.  Typmods were previously
combined using hand-coded logic in several places, and not at all in
other places.  The logic in select_common_typmod() isn't very exciting,
but it makes the code more compact and readable in a few locations, and
in the future we can perhaps do more complicated things if desired.


Makes sense.


There might have been a tiny bug in transformValuesClause() because
while consolidating the typmods it does not take into account whether
the types are actually the same (as more correctly done in
transformSetOperationTree() and buildMergedJoinVar()).


Hmm, it seems so, but I could not come up with a test case to reach that 
codepath. I think you'd need to create two types in the same 
typcategory, but with different and incompatible typmod formats.


The patch also adds typmod resolution for hypothetical ordered-set 
aggregate arguments. I couldn't come up with a test case that would 
tickle that codepath either, but it seems like a sensible change. You 
might want to mention it in the commit message though.


- Heikki




Re: POC: GROUP BY optimization

2020-10-26 Thread Dmitry Dolgov
> On Mon, Oct 26, 2020 at 12:44:58PM +0400, Pavel Borisov wrote:
> >
> >
> > I wonder if anyone has plans to try again with this optimization in v14
> > cycle? The patches no longer apply thanks to the incremental sort patch,
> > but I suppose fixing that should not be extremely hard.
> >
> > I found this feature interesting and I'd like to join.
> PFA the updated version of the patch made fully compatible with changes in
> v13 and I suppose it is ready to be reviewed for v14.
>
> Main things changed:
> 1. Patch is made compatible with new lists representation ( 1cff1b95ab6d )
> 2. Patch is made compatible with the incremental sort ( d2d8a229bc58 and
> ba3e76cc571e )
> 3. Tests are changed to represent changes in keys numbering and
> naming convention in the plan ( 55a1954da16 and 6ef77cf46e8 )
>
> As always your ideas are very much welcome!

Hi,

Thanks for your interest! FYI there is a new thread about this topic [1]
with the next version of the patch and more commentaries (I've created
it for visibility purposes, but probably it also created some confusion,
sorry for that).

[1]: 
https://www.postgresql.org/message-id/flat/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com




Re: default result formats setting

2020-10-26 Thread Pavel Stehule
On Mon, Oct 26, 2020 at 9:31 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> During the discussion on dynamic result sets[0], it became apparent that
> the current way binary results are requested in the extended query
> protocol is too cumbersome for some practical uses, and keeping that
> style around would also make the proposed protocol extensions very
> complicated.
>
> The premise here is that a client library has hard-coded knowledge on
> how to deal with binary format for certain, but not all, data types.
> (Most client libraries process everything in text, and some client
> libraries process everything in binary.  Neither of these extremes are
> of concern here.)  Such a client always has to request a result row
> description (Describe statement) before sending a Bind message, in order
> to be able to pick out the result columns it should request in binary.
> The feedback was that this extra round trip is often not worth it in
> terms of performance, and so it is not done and binary format is not
> used when it could be.
>
> The conceptual solution is to allow a client to register for a session
> which types it wants to always get in binary, unless it says otherwise.
> In the discussion in [0], I pondered a new protocol message for that,
> but after further thought, a GUC setting would do just as well.
>
> The attached patch implements this.  For example, to get int2, int4,
> int8 in binary by default, you could set
>
> SET default_result_formats = '21=1,23=1,20=1';
>

Using a SET statement for this case looks very obscure :/

This is a protocol-related issue, and it should be solved by extending the
protocol. I don't think the SQL level is the right place for it.

Moreover, this format is not practical for custom types, and the list can
get pretty long.
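
For example, for a custom type the client would first have to look up its
installation-specific OID and then splice it into the list (the OID below is
made up, just to illustrate):

    SELECT oid FROM pg_type WHERE typname = 'hstore';
    SET default_result_formats = '21=1,23=1,20=1,16385=1';  -- 16385 assumed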


> This is a list of oid=format pairs.
>
> I think this format satisfies the current requirements of the JDBC
> driver.  But the format could also be extended in the future to allow
> type names to be listed or some other ways of identifying the types.
>
> In order to be able to test this via libpq, I had to add a little hack.
> Currently, PQexecParams() and similar functions can only pass exactly
> one result format code, which per protocol is then applied to all result
> columns.  There is no support for sending zero result format codes to
> make the session default apply.  I enabled this by allowing -1 to be
> passed as the format code.  I'm not sure if we want to make this part of
> the official API, but it would be useful to have something like this
> somehow.
>

+1 to this feature, but -1 for the design. It should be solved at the protocol
level.

Regards

Pavel

>
>
> [0]:
>
> https://www.postgresql.org/message-id/flat/6e747f98-835f-2e05-cde5-86ee444a7140%402ndquadrant.com
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: POC: GROUP BY optimization

2020-10-26 Thread Pavel Borisov
>
>
> I wonder if anyone has plans to try again with this optimization in v14
> cycle? The patches no longer apply thanks to the incremental sort patch,
> but I suppose fixing that should not be extremely hard.
>
> I found this feature interesting and I'd like to join.
PFA the updated version of the patch made fully compatible with changes in
v13 and I suppose it is ready to be reviewed for v14.

Main things changed:
1. Patch is made compatible with new lists representation ( 1cff1b95ab6d )
2. Patch is made compatible with the incremental sort ( d2d8a229bc58 and
ba3e76cc571e )
3. Tests are changed to represent changes in keys numbering and
naming convention in the plan ( 55a1954da16 and 6ef77cf46e8 )

As always your ideas are very much welcome!

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


v11-0001-GROUP-BY-optimization-made-compatible-with-chang.patch
Description: Binary data


Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?

2020-10-26 Thread Zhenghua Lyu
Hi hackers,

   It seems the function `get_variable_numdistinct` ignores the case where
stanullfrac is 1.0:

# create table t(a int, b int);
CREATE TABLE
# insert into t select i from generate_series(1, 1)i;
INSERT 0 1
gpadmin=# analyze t;
ANALYZE
# explain analyze select b, count(1) from t group by b;
QUERY PLAN
---
 HashAggregate  (cost=195.00..197.00 rows=200 width=12) (actual 
time=5.928..5.930 rows=1 loops=1)
   Group Key: b
   Batches: 1  Memory Usage: 40kB
   ->  Seq Scan on t  (cost=0.00..145.00 rows=1 width=4) (actual 
time=0.018..1.747 rows=1 loops=1)
 Planning Time: 0.237 ms
 Execution Time: 5.983 ms
(6 rows)

So it gives the estimate using the default value: 200.


I have added some lines of code to take the `stanullfrac == 1.0` case into
account. With the attached patch, we now get:

# explain analyze select b, count(1) from t group by b;
QUERY PLAN
---
 HashAggregate  (cost=195.00..195.01 rows=1 width=12) (actual time=6.163..6.164 
rows=1 loops=1)
   Group Key: b
   Batches: 1  Memory Usage: 24kB
   ->  Seq Scan on t  (cost=0.00..145.00 rows=1 width=4) (actual 
time=0.024..1.823 rows=1 loops=1)
 Planning Time: 0.535 ms
 Execution Time: 6.344 ms
(6 rows)

I am not sure how valuable this change is in practical environments, but it
seems to be a step in the right direction.
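
The shape of the change is roughly the following (a simplified sketch; the
attached patch may differ in details). In get_variable_numdistinct(), after
fetching the statistics tuple:

    Form_pg_statistic stats;

    if (HeapTupleIsValid(vardata->statsTuple))
    {
        stats = (Form_pg_statistic) GETSTRUCT(vardata->statsTuple);

        /* If ANALYZE saw only NULLs, there is effectively a single group. */
        if (stats->stanullfrac >= 1.0)
        {
            *isdefault = false;
            return 1.0;
        }
    }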

Any comments on this are appreciated.




0001-Consider-the-case-when-stanullfrac-is-1.0-in-get_var.patch
Description:  0001-Consider-the-case-when-stanullfrac-is-1.0-in-get_var.patch


RE: pgbench: option delaying queries till connections establishment?

2020-10-26 Thread kuroda.hay...@fujitsu.com
Dear Fabien, Andres

I think your idea is good, so I have put together some comments as a reviewer.
I focus only on the Linux code because I'm not familiar with Windows.
Sorry.

[For patch A]

Please complete the documentation fixes. At least the following sentence
should be updated:
```
The last two lines report the number of transactions per second, figured with 
and without counting the time to start database sessions.
```

> -starting vacuum...end.

I think other options should remain disabled in the example, so please leave
this line in.

> +   /* explicitly initialize the state machines */
> +   for (int i = 0; i < nstate; i++)
> +   {
> +   state[i].state = CSTATE_CHOOSE_SCRIPT;
> +   }

I'm not sure, but I think the braces around a single statement should be removed per our coding conventions.

Other changes are being reviewed now.

[For patch B]

> +   /* GO */
> +   pthread_barrier_wait();

The current implementation is too simple. If nthreads >= 2 and the connection
fails in one thread, the other threads will wait forever.
Some special treatment is needed in the `done` code block and here.
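
Something along these lines would avoid the hang (a rough sketch only; the
names are illustrative, not taken from the actual patch):

    static pthread_barrier_t start_barrier;     /* initialized for nthreads */

    static void *
    threadRun(void *arg)
    {
        PGconn     *con = doConnect();

        if (con == NULL)
        {
            /*
             * Still reach the barrier before bailing out, otherwise the
             * surviving threads block in pthread_barrier_wait() forever.
             */
            pthread_barrier_wait(&start_barrier);
            goto done;
        }

        /* GO */
        pthread_barrier_wait(&start_barrier);

        /* ... run transactions ... */

    done:
        return NULL;
    }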


[others]

> > (that is, it can be disabled)
> 
> On reflection, I'm not sure I see a use case for not running the barrier 
> if it is available.

If the starting point changes and there is no way to disable this feature,
backward compatibility is completely broken.
It means that tps figures cannot be compared with older versions under the
same conditions, and that may hide performance-related issues.
I don't think that's good.


Best regards,
Hayato Kuroda
FUJITSU LIMITED

-Original Message-
From: Fabien COELHO  
Sent: Saturday, July 4, 2020 3:34 PM
To: Daniel Gustafsson 
Cc: Andres Freund ; pgsql-hack...@postgresql.org
Subject: Re: pgbench: option delaying queries till connections establishment?


>> * First patch reworks time measurements in pgbench.
>> * Second patch adds a barrier before starting the bench
>>
>> It applies on top of the previous one. The initial imbalance due to 
>> thread creation times is smoothed.
>
> The usecs patch fails to apply to HEAD, can you please submit a rebased 
> version
> of this patchset.  Also, when doing so, can you please rename the patches such
> that they sort alphabetically in the order in which they are intended to be
> applied.
> The CFBot patch tester will otherwise try to apply them out of order, which
> causes errors.

Ok. Attached.

-- 
Fabien.




default result formats setting

2020-10-26 Thread Peter Eisentraut
During the discussion on dynamic result sets[0], it became apparent that 
the current way binary results are requested in the extended query 
protocol is too cumbersome for some practical uses, and keeping that 
style around would also make the proposed protocol extensions very 
complicated.


The premise here is that a client library has hard-coded knowledge on 
how to deal with binary format for certain, but not all, data types. 
(Most client libraries process everything in text, and some client 
libraries process everything in binary.  Neither of these extremes are 
of concern here.)  Such a client always has to request a result row 
description (Describe statement) before sending a Bind message, in order 
to be able to pick out the result columns it should request in binary. 
The feedback was that this extra round trip is often not worth it in 
terms of performance, and so it is not done and binary format is not 
used when it could be.


The conceptual solution is to allow a client to register for a session 
which types it wants to always get in binary, unless it says otherwise. 
In the discussion in [0], I pondered a new protocol message for that, 
but after further thought, a GUC setting would do just as well.


The attached patch implements this.  For example, to get int2, int4, 
int8 in binary by default, you could set


SET default_result_formats = '21=1,23=1,20=1';

This is a list of oid=format pairs.

I think this format satisfies the current requirements of the JDBC 
driver.  But the format could also be extended in the future to allow 
type names to be listed or some other ways of identifying the types.


In order to be able to test this via libpq, I had to add a little hack. 
Currently, PQexecParams() and similar functions can only pass exactly 
one result format code, which per protocol is then applied to all result 
columns.  There is no support for sending zero result format codes to 
make the session default apply.  I enabled this by allowing -1 to be 
passed as the format code.  I'm not sure if we want to make this part of 
the official API, but it would be useful to have something like this 
somehow.
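
A sketch of how a client could use the combination (the -1 convention is the
patch's experimental behavior, not an official libpq API):

    /* register the session's binary preferences once, e.g. right after
     * connecting */
    PGresult *res = PQexec(conn, "SET default_result_formats = '21=1,23=1,20=1'");
    PQclear(res);

    /* resultFormat = -1: send no result-format codes in Bind, so the
     * session default applies */
    res = PQexecParams(conn,
                       "SELECT 42::int4, 'hello'::text",
                       0, NULL, NULL, NULL, NULL,
                       -1);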



[0]: 
https://www.postgresql.org/message-id/flat/6e747f98-835f-2e05-cde5-86ee444a7140%402ndquadrant.com


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From fa0e6968985cbe1f100746ea562b7f7712baf646 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 26 Oct 2020 09:10:43 +0100
Subject: [PATCH v1] Add default_result_formats setting

---
 doc/src/sgml/config.sgml   |  42 ++
 doc/src/sgml/libpq.sgml|   3 +
 doc/src/sgml/protocol.sgml |   2 +-
 src/backend/tcop/pquery.c  | 100 -
 src/backend/utils/misc/guc.c   |  12 
 src/include/tcop/pquery.h  |   5 ++
 src/interfaces/libpq/fe-exec.c |  14 -
 7 files changed, 173 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f043433e31..6009e13899 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8530,6 +8530,48 @@ Statement Behavior
   
  
 
+ 
+  default_result_formats (string)
+  
+   default_result_formats
+   configuration parameter
+  
+  
+  
+   
+This parameter specifies the default result formats by data type for
+rows returned in the extended query protocol when no result formats
+are specified in the Bind message.  It is intended to be used by
+client libraries that prefer to handle certain data types in binary
+format.  The typical usage would be that the client library sets this
+value when it starts a connection.  (A client library that wants to
+handle all types in binary doesn't need to use
+this because it can just specify the format code for all types at once
+in the protocol message.)
+   
+   
+The value is a list of
+typeoid=format,
+separated by commas.  typeoid is the OID of
+a type, from the pg_type system catalog.
+format is the format code, currently 0 for
+text and 1 for binary.  0 is the default for all types, so only 1
+needs to be specified explicitly.  For example, if you want to
+automatically get values of the types int2,
+int4, and int8 in binary while leaving the
+rest in text, an appropriate setting would be
+21=1,23=1,20=1.
+   
+   
+Invalid format codes are an error.  Nonexistent type OIDs are not
+diagnosed.  This setting applies only to result rows from the extended
+query protocol, so it does not affect usage of
+psql or pg_dump
+for example.  Also, it does not affect the format of query parameters.
+   
+  
+ 
+
  
 
  
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 

Re: [HACKERS] logical decoding of two-phase transactions

2020-10-26 Thread Peter Smith
Hi Ajin.

I checked to see how my previous review comments (on v10) were
addressed by the latest patches (currently v12).

There are a couple of remaining items.

---


v12-0001. File: doc/src/sgml/logicaldecoding.sgml


COMMENT
Section 49.6.1
Says:
An output plugin may also define functions to support streaming of
large, in-progress transactions. The stream_start_cb, stream_stop_cb,
stream_abort_cb, stream_commit_cb, stream_change_cb, and
stream_prepare_cb are required, while stream_message_cb and
stream_truncate_cb are optional.

An output plugin may also define functions to support two-phase
commits, which are decoded on PREPARE TRANSACTION. The prepare_cb,
commit_prepared_cb and rollback_prepared_cb callbacks are required,
while filter_prepare_cb is optional.
~
I was not sure how the paragraphs are organised, e.g. the first seems to be
about streaming and the second about two-phase commit. But they are not
mutually exclusive, so I thought it was odd that stream_prepare_cb was not
mentioned in the second paragraph.

Or maybe it is OK as-is?


v12-0002. File: contrib/test_decoding/expected/two_phase.out


COMMENT
Line 26
PREPARE TRANSACTION 'test_prepared#1';
--
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'two-phase-commit', '1', 'include-xids', '0',
'skip-empty-xacts', '1');
~
It seems a comment is missing to explain what that SELECT is expected to show.
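
For example, something along these lines (the wording is just a suggestion):

    -- With the two-phase-commit option enabled, the changes of
    -- 'test_prepared#1' (BEGIN .. INSERT .. PREPARE) are expected to be
    -- decoded already at PREPARE time, even though the transaction has not
    -- been committed yet.
    SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
        NULL, 'two-phase-commit', '1', 'include-xids', '0',
        'skip-empty-xacts', '1');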

---

COMMENT
Line 80
-- The insert should show the newly altered column.
~
Do you also need to mention something about the DDL not being present
in the decoding?


v12-0002. File: src/backend/replication/logical/reorderbuffer.c


COMMENT
Line 1807
/* Here we are streaming and part of the PREPARE of a two-phase commit
* The full cleanup will happen as part of the COMMIT PREPAREDs, so now
* just truncate txn by removing changes and tuple_cids
*/
~
Something seems strange about the first sentence of that comment

---

COMMENT
Line 1944
/* Discard the changes that we just streamed.
 * This can only be called if streaming and not part of a PREPARE in
 * a two-phase commit, so set prepared flag as false.
 */
~
Since this comment asserts various things, I thought they should also
actually be written as code Asserts.
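
Something like this is what I had in mind (the exact flag names available in
that function may differ):

    Assert(streaming);
    Assert(!rbtxn_prepared(txn));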

---

COMMENT
Line 2401
/*
 * We are here due to one of the 3 scenarios:
 * 1. As part of streaming in-progress transactions
 * 2. Prepare of a two-phase commit
 * 3. Commit of a transaction.
 *
 * If we are streaming the in-progress transaction then discard the
 * changes that we just streamed, and mark the transactions as
 * streamed (if they contained changes), set prepared flag as false.
 * If part of a prepare of a two-phase commit set the prepared flag
 * as true so that we can discard changes and cleanup tuplecids.
 * Otherwise, remove all the
 * changes and deallocate the ReorderBufferTXN.
 */
~
The above comment is beyond my understanding. Anything you could do to
simplify it would be good.

For example, when viewing this function in isolation, I have never
understood why the streaming flag and the rbtxn_prepared(txn) flag cannot
be set at the same time.

Perhaps the code is relying on just internal knowledge of how this
helper function gets called? And if it is just that, then IMO there
really should be some Asserts in the code to give more assurance about
that. (Or maybe use completely different flags to represent those 3
scenarios instead of bending the meanings of the existing flags)
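
If nothing else, even a minimal guard would help readers (again, the exact
names may differ from the patch):

    Assert(!(streaming && rbtxn_prepared(txn)));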


v12-0003. File: src/backend/access/transam/twophase.c


COMMENT
Line 557
@@ -548,6 +548,33 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
 }

 /*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+ bool found = false;
~
The alignment of the variable declarations in the LookupGXact function is off.
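
That is, pgindent-style alignment along the lines of:

    int         i;
    bool        found = false;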

---

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour

2020-10-26 Thread Heikki Linnakangas

On 25/10/2020 23:56, Justin Pryzby wrote:

On Fri, Oct 23, 2020 at 11:09:26PM +0300, Heikki Linnakangas wrote:

Findings in detail follow.


Are you working on a patch for these?


I pushed the patch I included in that email now, to remove the clearest 
cases. I'm not planning to do anything more right now.



Otherwise, since I started something similar in April, I could put something
together based on comments you've gotten here.


That'd be great, thanks!

- Heikki