Re: Libpq support to connect to standby server as priority

2019-06-02 Thread Haribabu Kommi
On Thu, Apr 11, 2019 at 9:13 AM Haribabu Kommi 
wrote:

> I fixed all the comments that you have raised above and attached the
> updated
> patches.
>

Rebased patches are attached.

Regards,
Haribabu Kommi
Fujitsu Australia


0001-New-pg_basebackup-g-option-to-control-the-group-acce.patch
Description: Binary data


0002-New-TargetSessionAttrsType-enum.patch
Description: Binary data


0003-Make-transaction_read_only-as-GUC_REPORT-varaible.patch
Description: Binary data


0004-New-prefer-read-target_session_attrs-type.patch
Description: Binary data


0005-New-read-only-target_session_attrs-type.patch
Description: Binary data


0006-Primary-prefer-standby-and-standby-options.patch
Description: Binary data


0007-New-function-to-rejecting-the-checked-write-connecti.patch
Description: Binary data


0008-Server-recovery-mode-handling.patch
Description: Binary data


Re: Print baserestrictinfo for varchar fields

2019-06-02 Thread Tom Lane
Donald Dong  writes:
> I noticed that debug_print_rel outputs "unknown expr" when the fields
> in baserestrictinfo are typed as varchar.
> ...
> I wonder if this is a proper way of fixing it?

It's hard to muster much enthusiasm for extending print_expr(),
considering how incomplete and little-used it is.  I'd rather
spend effort on ripping it out in favor of using the far more
complete, and better-tested, code in ruleutils.c.

regards, tom lane




Re: [PATCH] Simple typos fix

2019-06-02 Thread Michael Paquier
On Sun, Jun 02, 2019 at 04:42:57PM -0500, Justin Pryzby wrote:
> Thanks for finding these; I think a few hunks are false positives and should
> be removed.

Yes, some of them are the changes in imath.c and snowball/, which we
include in Postgres but in reality are independent projects, so these
should be fixed in upstream instead, and Postgres will include those
fixes when merging with newer versions.  If we were to fix those
issues ourselves, then we would likely create conflicts when merging
with newer versions of the upstream modules.

> A few more are debatable and could be correct either way:
> 
> alloced - not an English word, but a technical one;

Indeed.  The current wording is fine by me.

> delink - "unlink" is better for the filesystem operation, but I
> don't think "delink" is wrong for a list operation.
> dependees (?)

These terms could be used in programming.

> This'd
> define'd

I don't think it's a big deal to keep these as well.
"invokable" can be used in programming, and "cachable" is an alternate
spelling of "cacheable" based on some research.

> On Tue, May 28, 2019 at 08:17:18PM +0200, Andrea Gelmini wrote:
>> diff --git a/contrib/amcheck/verify_nbtree.c 
>> b/contrib/amcheck/verify_nbtree.c
>> index de0a98f6d9..ff13b0c9e7 100644
>> --- a/contrib/amcheck/verify_nbtree.c
>> +++ b/contrib/amcheck/verify_nbtree.c
>> @@ -1278,7 +1278,7 @@ bt_right_page_check_scankey(BtreeCheckState *state)
>>   * Routines like _bt_search() don't require *any* page split interlock
>>   * when descending the tree, including something very light like a 
>> buffer
>>   * pin. That's why it's okay that we don't either.  This avoidance of 
>> any
>> - * need to "couple" buffer locks is the raison d' etre of the Lehman & 
>> Yao
>> + * need to "couple" buffer locks is the reason d'etre of the Lehman & 
>> Yao
> 
> I think this is wrong.  The French phrase is "raison d'etre".

French here.  Note that an accent is missing on the first 'e' (être)
but we don't want non-ASCII characters in the code.  So the current
wording is fine in my opinion.

> I think this is wrong.  It should say "though".  Or perhaps:
>  * store at segment to which its start lsn belongs, but don't 
> split over
>  * multiple segments

I would replace it by "though", "tho" is not incorrect tho ;)

>> @@ -907,7 +907,7 @@ write_relmap_file(bool shared, RelMapFile *newmap,
>>   * Make sure that the files listed in the map are not deleted if the 
>> outer
>>   * transaction aborts.  This had better be within the critical section
>>   * too: it's not likely to fail, but if it did, we'd arrive at 
>> transaction
>> - * abort with the files still vulnerable.  PANICing will leave things 
>> in a
>> + * abort with the files still vulnerable.  Panicking will leave things 
>> in a
> 
> Wrong ?

Yes, the suggestion is wrong.  The comment refers to the elog level.

The original patch proposed 63 diffs.  After the false positives are
removed, 21 remain, which I have now committed.  You have done good
work in catching all these, by the way.  Thanks for taking the time to
do so.
--
Michael


signature.asc
Description: PGP signature


Re: psql completion bugs with access methods

2019-06-02 Thread Michael Paquier
On Sat, Jun 01, 2019 at 12:44:05PM -0700, Andres Freund wrote:
> "I'm fine with adding those for 12", so no, I'm not strongly opposed.

OK, fixed this one for now.
--
Michael


signature.asc
Description: PGP signature


Re: initdb recommendations

2019-06-02 Thread Noah Misch
On Tue, May 28, 2019 at 12:15:35PM -0400, Magnus Hagander wrote:
> On Fri, May 24, 2019 at 11:24 AM Noah Misch  wrote:
> > On Thu, May 23, 2019 at 06:56:49PM +0200, Magnus Hagander wrote:
> > > On Thu, May 23, 2019, 18:54 Peter Eisentraut 
> > >  wrote:
> > > > To recap, the idea here was to change the default authentication methods
> > > > that initdb sets up, in place of "trust".
> > > >
> > > > I think the ideal scenario would be to use "peer" for local and some
> > > > appropriate password method (being discussed elsewhere) for host.
> > > >
> > > > Looking through the buildfarm, I gather that the only platforms that
> > > > don't support peer are Windows, AIX, and HP-UX.  I think we can probably
> > > > figure out some fallback or alternative default for the latter two
> > > > platforms without anyone noticing.  But what should the defaults be on
> > > > Windows?  It doesn't have local sockets, so the lack of peer wouldn't
> > > > matter.  But is it OK to default to a password method, or would that
> > > > upset people particularly?
> > >
> > > I'm sure password would be fine there. It's what "everybody else" does
> > > (well sqlserver also does integrated security, but people are used to it).
> > 
> > Our sspi auth is a more-general version of peer auth, and it works over TCP.
> > It would be a simple matter of programming to support "peer" on Windows,
> > consisting of sspi auth with an implicit pg_ident map.  Nonetheless, I agree
> > password would be fine.
>
> I hope you don't mean "make peer use sspi on windows". I think that's a
> really bad idea from a confusion perspective.

I don't mean "make peer an alias for SSPI", but I do mean "implement peer on
Windows as a special case of sspi, using the same Windows APIs".  To the
client, "peer" would look like "sspi".  If that's confusion-prone, what's
confusing about it?

> However, what we could do there is have the default pg_hba.conf file contain
> a "reasonable setup using sspi"; that's a different story.

That's another way to do it.  Currently, to behave like "peer" behaves, one
hard-codes the machine's SSPI realm into pg_ident.conf.  If we introduced
pg_ident.conf syntax to remove that need (e.g. %MACHINE_REALM%), that approach
would work.
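
To illustrate what that hard-coding looks like today, here is a rough sketch
(not a recommended default; MYREALM.EXAMPLE stands in for the machine's
actual realm, and the map name is arbitrary):

   # pg_hba.conf
   host    all    all    127.0.0.1/32    sspi    map=winusers

   # pg_ident.conf:  MAPNAME    SYSTEM-USERNAME              PG-USERNAME
   winusers    /^(.*)@MYREALM\.EXAMPLE$    \1

With something like %MACHINE_REALM%, that pg_ident.conf line could ship
unmodified instead of being edited per machine.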

> But I wonder if that isn't better implemented at the installer level. I
> think we're better off doing something like scram as the config when you
> build from source, and then encourage installers to do other things based on
> the fact that they know more information about the setup (such as usernames
> actually used).

If initdb has the information needed to configure the recommended
authentication, that's the best place to do it, since there's one initdb and
many installers.  So far, I haven't seen a default auth configuration proposal
involving knowledge of OS usernames or other information initdb lacks.




Re: [PATCH] Simple typos fix

2019-06-02 Thread Justin Pryzby
Thanks for finding these; I think a few hunks are false positives and should
be removed.  A few more are debatable and could be correct either way:

Kazakstan
alloced - not an English word, but a technical one;
delink - "unlink" is better for the filesystem operation, but I don't think 
"delink" is wrong for a list operation.
dependees (?)
This'd
define'd

On Tue, May 28, 2019 at 08:17:18PM +0200, Andrea Gelmini wrote:
> diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
> index de0a98f6d9..ff13b0c9e7 100644
> --- a/contrib/amcheck/verify_nbtree.c
> +++ b/contrib/amcheck/verify_nbtree.c
> @@ -1278,7 +1278,7 @@ bt_right_page_check_scankey(BtreeCheckState *state)
>* Routines like _bt_search() don't require *any* page split interlock
>* when descending the tree, including something very light like a 
> buffer
>* pin. That's why it's okay that we don't either.  This avoidance of 
> any
> -  * need to "couple" buffer locks is the raison d' etre of the Lehman & 
> Yao
> +  * need to "couple" buffer locks is the reason d'etre of the Lehman & 
> Yao

I think this is wrong.  The French phrase is "raison d'etre".

> diff --git a/src/backend/replication/logical/reorderbuffer.c 
> b/src/backend/replication/logical/reorderbuffer.c
> index e7c32f2a13..20bb928016 100644
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
> @@ -2279,7 +2279,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, 
> ReorderBufferTXN *txn)
>  
>   /*
>* store in segment in which it belongs by start lsn, don't 
> split over
> -  * multiple segments tho
> +  * multiple segments to

I think this is wrong.  It should say "though".  Or perhaps:
 * store at segment to which its start lsn belongs, but don't 
split over
 * multiple segments

> diff --git a/src/backend/utils/cache/relmapper.c 
> b/src/backend/utils/cache/relmapper.c
> index 3b4f21bc54..403435df52 100644
> --- a/src/backend/utils/cache/relmapper.c
> +++ b/src/backend/utils/cache/relmapper.c
> @@ -146,7 +146,7 @@ static void perform_relmap_update(bool shared, const 
> RelMapFile *updates);
>  /*
>   * RelationMapOidToFilenode
>   *
> - * The raison d' etre ... given a relation OID, look up its filenode.
> + * The reason d'etre... given a relation OID, look up its filenode.

Wrong

> @@ -907,7 +907,7 @@ write_relmap_file(bool shared, RelMapFile *newmap,
>* Make sure that the files listed in the map are not deleted if the 
> outer
>* transaction aborts.  This had better be within the critical section
>* too: it's not likely to fail, but if it did, we'd arrive at 
> transaction
> -  * abort with the files still vulnerable.  PANICing will leave things 
> in a
> +  * abort with the files still vulnerable.  Panicking will leave things 
> in a

Wrong ?

Thanks,
Justin




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-06-02 Thread Tomas Vondra

Hi James,

On Fri, May 31, 2019 at 03:51:57PM -0400, James Coleman wrote:

I've rebased the patch on master and confirmed make check world passes.


Thanks for the rebase! I think the patch is in pretty good shape - I'm
sure we'll find ways to make it more efficient etc. but IMO that's fine
and should not prevent getting it committed.

The planning/costing logic may need further discussion and improvements,
though. IIRC this was the main reason why the patch missed PG11, because
at that point it simply used incremental sort whenever the input was
already presorted by a pathkey prefix, but that may be slower than regular
sort in some cases (unexpectedly large groups, etc.).

I see the current patch partially improves this by creating both paths
(full sort and incremental sort). That's good, because it means the decision
is cost-based (as it should be). The question, however, is how accurate the
costing model is - and per discussion in the thread, it may need some
improvements to handle skewed distributions better.

Currently, the costing logic (cost_incremental_sort) assumes all groups
have the same size, which is fine for uniform distributions. But for
skewed distributions, that may be an issue.

Consider for example a table with 1M rows, two columns, 100 groups in each
column, and an index on the first one.

   CREATE TABLE t (a INT, b INT);
   CREATE INDEX ON t (a);

   INSERT INTO t SELECT 100*random(), 100*random()
     FROM generate_series(1,1000000) s(i);

Now, let's do a simple limit query to find the first row:

   SELECT * FROM t ORDER BY a, b LIMIT 1;

In this case the current costing logic is fine - the groups are close to
average size, and we only need to sort the first group, i.e. 1% of data.

Now, let's say the first group is much larger:


   INSERT INTO t SELECT 0, 100*random()
     FROM generate_series(1,900000) s(i);

   INSERT INTO t SELECT 100*random(), 100*random()
     FROM generate_series(1,100000) s(i);

That is, the first group is roughly 90% of the data, while the number of
groups stays the same. We now have to scan 90% of the data before producing
the first tuple, yet the average group size is unchanged, so the current
cost model is oblivious to this.

But I think we can improve this by looking at the MCV lists (either
per-column or multi-column) to see if some groups are much larger, and
considering that when computing the costs.

In particular, I think we should estimate the size of the first group,
because that's important for startup cost - we need to process the whole
first group before producing the first tuple, and that matters for LIMIT
queries etc.

For example, let's say the first (already sorted) column has an MCV list.
Then we can see how large the first group (first by value, not by frequency)
is, and use that instead of the average group size. E.g. in the above example
we'd know the first group is ~90%.
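
If we wanted to eyeball that estimate by hand, the per-column MCV list is
already visible in pg_stats. A rough sketch (this is not the planner code,
just a way to look at the statistics the costing could use):

   SELECT most_common_vals, most_common_freqs
     FROM pg_stats
    WHERE tablename = 't' AND attname = 'a';

After ANALYZE on the skewed table above, the frequency attached to the value
0 should be close to 0.9, which is the "first group by value" size we'd want
the startup cost to reflect.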

And we could do the same for multiple columns, either by looking at
multi-column MCV lists (if there's one), or by using the minimum from each
per-column MCV list.

Of course, these are only the groups that made it to the MCV list, and
there may be other (smaller) groups before this large one. For example
there could be a group with "-1" value and a single row.

For a moment I thought we could/should look at the histogram, because that
could tell us if there are groups "before" the first MCV one, but I don't
think we should do that, for two reasons. Firstly, rare values may not get
to the histogram anyway, which makes this rather unreliable and might
introduce sudden plan changes, because the cost would vary wildly
depending on whether we happened to sample the rare row or not. And
secondly, the rare row may be easily filtered out by a WHERE condition or
something, at which point we'll have to deal with the large group anyway.

So I think we should look at the MCV list, and use that information when
computing the startup/total cost. I think using the first/largest group to
compute the startup cost, and the average group size for total cost would
do the trick.

I don't think we can do much better than this during planning. There will
inevitably be cases where the costing model will push us to do the wrong
thing, in either direction. The question is how serious an issue that is,
and whether we could remedy it during execution somehow.

When we "incorrectly" pick full sort (when the incremental sort would be
faster), that's obviously sad but I think it's mostly fine because it's
not a regression.

The opposite direction (picking incremental sort, while full sort being
faster) is probably more serious, because it's a regression between
releases.

I don't think we can fully fix that by refining the cost model. We have
two basic options:

1) Argue/show that this is not an issue in practice, because (a) such
cases are very rare, and/or (b) the regression is limited. In short, the
benefits of the patch outweigh the losses.

2) Provide some fallback at execution time. For example, we might watch
the size of the group, and if we run into an unexpectedly large one, fall
back to a regular full sort at run time.

unused_oids script should advertise reserved OID range

2019-06-02 Thread Peter Geoghegan
The unused_oids script has gone from being something of interest to
everybody that wants to write a patch that creates a new catalog
entry, to something that patch authors could do without in many cases.
I think that its output should prominently advertise that patches
should use random OIDs in the range 8000 - 9999. Commit
a6417078c4140e51cfd717448430f274b449d687 established that this should
be standard practice for patch authors.

Actually, maybe it should even suggest a *particular* random OID in
that range, so that the choice of OID is reliably random -- why even
require patch authors to pick a number at random?
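
The idea is trivial to express; for instance (just a sketch of the concept,
not the actual script, and the suggestion would presumably be filtered
against OIDs already in use):

   SELECT 8000 + (random() * 1999)::int AS suggested_oid;

i.e. print one value chosen from the reserved 8000-9999 range alongside the
existing output, so authors can simply copy it.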

It also looks like pg_proc.dat should be updated, since it still
mentions the old custom of trying to use contiguous OIDs. It also
discourages the practice of picking OIDs at random, which is almost
the opposite of what it should say.

-- 
Peter Geoghegan




Residual cpluspluscheck issues

2019-06-02 Thread Tom Lane
cpluspluscheck's expanded coverage is now passing cleanly for me on
the macOS laptop I was testing it with at PGCon.  But on returning
home, I find there are still some issues on some other boxes:

* On Linux (at least Fedora and RHEL), I get variants of this:

/usr/include/arpa/inet.h:84: error: declaration of 'char* inet_net_ntop(int, 
const void*, int, char*, size_t) throw ()' throws different exceptions
/home/postgres/pgsql/src/include/port.h:506: error: from previous declaration 
'char* inet_net_ntop(int, const void*, int, char*, size_t)'

That's because /usr/include/arpa/inet.h declares it as

extern char *inet_net_ntop (int __af, const void *__cp, int __bits,
char *__buf, size_t __len) __THROW;

and of course when a C++ compiler reads that, __THROW will expand as
something nonempty.

One possible fix for that is to teach configure to test whether
arpa/inet.h provides a declaration, and not compile our own declaration
when it does.  This would require being sure that we include arpa/inet.h
anywhere we use that function, but there are few enough callers that
that's not much of a hardship.

Alternatively, we could rename our function to pg_inet_net_ntop to
dodge the conflict.  This might be a good idea anyway to avoid
confusion, since our function doesn't necessarily recognize the same
address-family codes that libc would.

* On FreeBSD 12, I get

/home/tgl/pgsql/src/include/utils/hashutils.h:23:23: warning: 'register' storage
  class specifier is deprecated and incompatible with C++17
  [-Wdeprecated-register]
extern Datum hash_any(register const unsigned char *k, register int keylen);
  ^
/home/tgl/pgsql/src/include/utils/hashutils.h:23:56: warning: 'register' storage
  class specifier is deprecated and incompatible with C++17
  [-Wdeprecated-register]
extern Datum hash_any(register const unsigned char *k, register int keylen);
   ^
/home/tgl/pgsql/src/include/utils/hashutils.h:24:32: warning: 'register' storage
  class specifier is deprecated and incompatible with C++17
  [-Wdeprecated-register]
extern Datum hash_any_extended(register const unsigned char *k,
   ^
/home/tgl/pgsql/src/include/utils/hashutils.h:25:11: warning: 'register' storage
  class specifier is deprecated and incompatible with C++17
  [-Wdeprecated-register]
   register int ...
   ^

which I'm inclined to think means we should drop those register keywords.

(The FreeBSD box shows another couple of complaints too, but I think
the fixes for those are uncontroversial.)

Comments?

regards, tom lane




Optimize partial TOAST decompression

2019-06-02 Thread Binguo Bao
Hi, hackers!
I'm a student participating in GSoC 2019 and my project is related to TOAST
slices.
While getting familiar with the PostgreSQL codebase, I found that
PG_DETOAST_DATUM_SLICE, when run on a compressed TOAST entry, fetches all
compressed data chunks and then extracts the relevant slice. Obviously, this
is unnecessary; we only need to fetch the data chunks we need.

The patch optimizes partial TOAST decompression.
As an example of the possible improvement, consider this trivial case:
-
create table slicingtest (
id serial primary key,
a text
);

insert into slicingtest (a) select
repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 100) as a from
generate_series(1,100);
\timing
select sum(length(substr(a, 0, 20))) from slicingtest;
-
environment: Linux 4.15.0-33-generic #36~16.04.1-Ubuntu x86_64 GNU/Linux
On master, I get
Time: 28.123 ms (average of ten runs)
With the patch, I get
Time: 2.306 ms (average of ten runs)

This is roughly a 10x improvement. If the number of TOAST data chunks is
larger, I believe the patch can help even more; in this case there are about
200 related TOAST data chunks for each entry.
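
For reference, the per-value chunk count can be checked directly in the
table's TOAST relation (pg_toast_16384 below is just a placeholder; the real
name comes from pg_class.reltoastrelid), which makes it easy to verify how
many chunk fetches the patch avoids:

   SELECT chunk_id, count(*) AS chunks
     FROM pg_toast.pg_toast_16384
    GROUP BY chunk_id
    ORDER BY chunks DESC;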

Related discussion:
https://www.postgresql.org/message-id/flat/CACowWR07EDm7Y4m2kbhN_jnys%3DBBf9A6768RyQdKm_%3DNpkcaWg%40mail.gmail.com

Best regards, Binguo Bao.
From a7c99439ffe309526b57fe26ab367e4b7bf62f39 Mon Sep 17 00:00:00 2001
From: BBG 
Date: Sun, 2 Jun 2019 19:18:46 +0800
Subject: [PATCH] Optimize partial TOAST decompression

---
 src/backend/access/heap/tuptoaster.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..7d30538 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -273,8 +273,11 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 		if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
 			return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
 
-		/* fetch it back (compressed marker will get set automatically) */
-		preslice = toast_fetch_datum(attr);
+		/*
+		 * Be sure to get enough compressed slice
+		 * and compressed marker will get set automatically
+		 */
+		preslice = toast_fetch_datum_slice(attr, 0, sliceoffset + slicelength + 1);
 	}
 	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
 	{
@@ -2031,7 +2034,8 @@ toast_fetch_datum(struct varlena *attr)
  *	Reconstruct a segment of a Datum from the chunks saved
  *	in the toast relation
  *
- *	Note that this function only supports non-compressed external datums.
+ *	Note that this function supports non-compressed external datums
+ *	and compressed external datum slices at the start of the object.
  * --
  */
 static struct varlena *
@@ -2072,10 +2076,9 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 
 	/*
-	 * It's nonsense to fetch slices of a compressed datum -- this isn't lo_*
-	 * we can't return a compressed datum which is meaningful to toast later
+	 * It's meaningful to fetch slices at the start of a compressed datum.
 	 */
-	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
+	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) || 0 == sliceoffset);
 
 	attrsize = toast_pointer.va_extsize;
 	totalchunks = ((attrsize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;
@@ -2091,7 +2094,11 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 
 	result = (struct varlena *) palloc(length + VARHDRSZ);
 
-	SET_VARSIZE(result, length + VARHDRSZ);
+	if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer)) {
+		SET_VARSIZE_COMPRESSED(result, length + VARHDRSZ);
+	} else {
+		SET_VARSIZE(result, length + VARHDRSZ);
+	}
 
 	if (length == 0)
 		return result;			/* Can save a lot of work at this point! */
-- 
2.7.4