Bizarre behavior in libpq's searching of ~/.pgpass

2018-07-27 Thread Tom Lane
I noticed that there's some strange coding in libpq's choice of
what hostname to use for searching ~/.pgpass for a password.
Historically (pre-v10), it just used the pghost parameter:

    conn->pgpass = PasswordFromFile(conn->pghost, conn->pgport,
                                    conn->dbName, conn->pguser);

no ifs, ands, or buts, except for the fact that PasswordFromFile
replaces its hostname parameter with "localhost" if it's null or
matches the default socket directory.  This is per the documentation
(see sections 34.1.2 and 34.15).

Since v10 we've got this:

    char       *pwhost = conn->connhost[i].host;

    if (conn->connhost[i].type == CHT_HOST_ADDRESS &&
        conn->connhost[i].host != NULL &&
        conn->connhost[i].host[0] != '\0')
        pwhost = conn->connhost[i].hostaddr;

    conn->connhost[i].password =
        passwordFromFile(pwhost,
                         conn->connhost[i].port,
                         conn->dbName,
                         conn->pguser,
                         conn->pgpassfile);

Now that's just bizarre on its face: take hostaddr if it's specified,
but only if host is also specified?  And it certainly doesn't match
the documentation.

It's easy to demonstrate by testing that if you specify both host and
hostaddr, the search behavior is different now than it was pre-v10.
Given the lack of documentation change to match, that seems like a bug.

Digging in the git history, it seems this was inadvertently broken by
commit 274bb2b38, which allowed multiple entries in pghost but not
pghostaddr, with some rather inconsistent redefinitions of what the
internal values were.  Commit bdac9836d tried to repair it, but it was
dependent on said inconsistency, and got broken again in commit 7b02ba62e
which allowed multiple hostaddrs and removed the inconsistent internal
representation.  At no point did anyone change the docs.

So my first thought was that we should go back to the pre-v10 behavior
of considering only the host parameter, which it looks like would only
require removing the "if" bit above.

But on second thought, I'm not clear that the pre-v10 behavior is really
all that sane either.  What it means is that if you specify only hostaddr,
the code will happily grab your localhost password and send it off to
whatever server hostaddr references.  This is unlikely to be helpful,
and it could even be painted as a security breach --- the remote server
could ask for your password in plaintext and then capture it.

What seems like a saner definition is "use host if it's specified
(nonempty), else use hostaddr if it's specified (nonempty), else
fall back to localhost".  That avoids sending a password somewhere
it doesn't belong, and allows a useful ~/.pgpass lookup in cases
where only hostaddr is given -- you just need to make an entry
with the numeric IP address in the host column.
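
As a minimal stand-alone sketch of that rule (illustrative only, not the
actual libpq code), the selection would look something like:

#include <stdio.h>

/*
 * Sketch of the proposed ~/.pgpass lookup rule: prefer host, then
 * hostaddr, then fall back to "localhost".  Illustrative only.
 */
static const char *
pgpass_lookup_host(const char *host, const char *hostaddr)
{
    if (host != NULL && host[0] != '\0')
        return host;
    if (hostaddr != NULL && hostaddr[0] != '\0')
        return hostaddr;
    return "localhost";
}

int
main(void)
{
    printf("%s\n", pgpass_lookup_host("db.example.com", "192.0.2.10")); /* db.example.com */
    printf("%s\n", pgpass_lookup_host(NULL, "192.0.2.10"));             /* 192.0.2.10 */
    printf("%s\n", pgpass_lookup_host(NULL, NULL));                     /* localhost */
    return 0;
}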

I think it's not too late to make v11 work that way, but I wonder
what we ought to do in v10.  Comments?

regards, tom lane



Re: Removing useless \. at the end of copy in pgbench

2018-07-27 Thread Tatsuo Ishii
> Tatsuo Ishii  writes:
>> While populating the pgbench_accounts table using COPY FROM STDIN,
>> pgbench sends out "\." at the end of the copy data. However, this is
>> only necessary in version 2 of the frontend/backend protocol (i.e. the
>> version 3 protocol does not need it). I think we can safely remove the
>> code to save a few CPU cycles, since we only support back to PostgreSQL
>> 9.3 and the version 3 protocol has been supported since 7.4.
> 
> What exactly is the benefit of breaking compatibility with old servers?
> Saving a few cycles during initial table population seems like it could
> not be of interest to anyone.

Then we could do protocol version checking so as not to break backward
compatibility. I believe this is what psql does, so this would also have
the benefit of making our clients more consistent.
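
As a rough sketch (illustrative only, not an actual pgbench patch), the
end-of-copy path could check libpq's PQprotocolVersion() and keep the
terminator for V2 servers:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/*
 * Illustrative sketch: finish the COPY, sending the "\." terminator
 * only when the connection speaks protocol version 2.
 */
static void
end_copy(PGconn *con)
{
    if (PQprotocolVersion(con) < 3)
    {
        /* the V2 protocol needs an explicit end-of-copy marker */
        if (PQputline(con, "\\.\n"))
        {
            fprintf(stderr, "very last PQputline failed\n");
            exit(1);
        }
    }
    if (PQendcopy(con))
    {
        fprintf(stderr, "PQendcopy failed\n");
        exit(1);
    }
}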

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Speeding up INSERTs and UPDATEs to partitioned tables

2018-07-27 Thread David Rowley
On 27 July 2018 at 19:11, Amit Langote  wrote:
> I've attached a delta patch to make the above changes.  I'm leaving the
> hash table rename up to you though.

Thanks for the delta patch. I took all of it, just rewrote a comment slightly.

I also renamed the hash table to your suggestion and changed a few more things.

Attached a delta based on v2 and the full v3 patch.

This includes another small change to make
PartitionDispatchData->indexes an array that's allocated in the same
memory as the PartitionDispatchData. This will save a palloc() call
and also should be a bit more cache friendly.
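
For anyone following along, here is a stand-alone sketch of that layout
(field names simplified, not the actual struct): a C99 flexible array
member lets a single allocation hold both the struct and its indexes
array, so the data stays adjacent in memory.

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

/* Simplified stand-in for PartitionDispatchData (illustrative only). */
typedef struct PartitionDispatchSketch
{
    int         nparts;
    int         indexes[];      /* C99 flexible array member */
} PartitionDispatchSketch;

int
main(void)
{
    int         nparts = 8;
    PartitionDispatchSketch *pd;

    /* one allocation (palloc in the real code, malloc here) covers both */
    pd = malloc(offsetof(PartitionDispatchSketch, indexes) +
                sizeof(int) * nparts);
    pd->nparts = nparts;
    for (int i = 0; i < nparts; i++)
        pd->indexes[i] = -1;    /* "no mapping yet" */

    printf("one allocation, %d index slots\n", pd->nparts);
    free(pd);
    return 0;
}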

This also required a rebase on master again due to 3e32109049.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v2-delta2.patch
Description: Binary data


v3-0001-Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch
Description: Binary data


Re: Removing useless \. at the end of copy in pgbench

2018-07-27 Thread Tom Lane
Tatsuo Ishii  writes:
> While populating the pgbench_accounts table using COPY FROM STDIN,
> pgbench sends out "\." at the end of the copy data. However, this is
> only necessary in version 2 of the frontend/backend protocol (i.e. the
> version 3 protocol does not need it). I think we can safely remove the
> code to save a few CPU cycles, since we only support back to PostgreSQL
> 9.3 and the version 3 protocol has been supported since 7.4.

What exactly is the benefit of breaking compatibility with old servers?
Saving a few cycles during initial table population seems like it could
not be of interest to anyone.

regards, tom lane



Removing useless \. at the end of copy in pgbench

2018-07-27 Thread Tatsuo Ishii
While populating the pgbench_accounts table using COPY FROM STDIN,
pgbench sends out "\." at the end of the copy data. However, this is
only necessary in version 2 of the frontend/backend protocol (i.e. the
version 3 protocol does not need it). I think we can safely remove the
code to save a few CPU cycles, since we only support back to PostgreSQL
9.3 and the version 3 protocol has been supported since 7.4.

If we want to support pre-7.4 backends (which only speak the version
2 protocol), we could test the current protocol version and decide
whether to send out "\." or not, but I doubt it's necessary.

Comments?

(patch to remove the unnecessary code attached)
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 41b756c..54b7182 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3686,11 +3686,6 @@ initGenerateData(PGconn *con)
 		}
 
 	}
-	if (PQputline(con, "\\.\n"))
-	{
-		fprintf(stderr, "very last PQputline failed\n");
-		exit(1);
-	}
 	if (PQendcopy(con))
 	{
 		fprintf(stderr, "PQendcopy failed\n");


Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-27 Thread Michael Paquier
On Fri, Jul 27, 2018 at 02:40:42PM +, Bossart, Nathan wrote:
> On 7/26/18, 11:16 PM, "Michael Paquier"  wrote:
> I think I'm essentially suggesting what you have in 0002 but without
> the new RangeVarGetRelidExtended() callback.  I've attached a modified
> version of 0002 that seems to fix the originally reported issue.  (I
> haven't looked into any extra handling needed for ANALYZE or
> partitioned tables.)  Running the same checks for all VACUUMs would
> keep things simple and provide a more uniform user experience.

Okay, let me check that.  Your patch has at least one error in
get_all_vacuum_rels(), where toast relations cannot be skipped.

>> The docs mentioned that shared catalogs are processed, so I did not
>> bother, but visibly your comment is that we could be more precise about
>> the ownership in this case?  An attempt is attached.
> 
> Sorry, I should have been clearer.  But yes, your update is what I was
> thinking.

No problem.  If there are no objections, I am going to fix the REINDEX
issue first and back-patch.  Its patch is the least invasive of the
set.
--
Michael


signature.asc
Description: PGP signature


Re: add verbosity to pg_basebackup for sync

2018-07-27 Thread Michael Paquier
On Fri, Jul 27, 2018 at 11:58:42AM -0400, Jeff Janes wrote:
> But it was really waiting for the syncs of the new -D dir to finish.  The
> attached patch adds a -v notice that it is starting to do the sync, with
> the wording taken from initdb's equivalent message.

This is a good idea.  Would we want that back-patched?  I am sure that
nobody is going to complain about an extra informational log.
--
Michael


signature.asc
Description: PGP signature


Re: Locking B-tree leafs immediately in exclusive mode

2018-07-27 Thread Alexander Korotkov
On Fri, Jul 27, 2018 at 6:11 PM Alexander Korotkov
 wrote:
> On Fri, Jul 27, 2018 at 12:30 AM Simon Riggs  wrote:
> > On 26 July 2018 at 20:59, Alexander Korotkov  
> > wrote:
> >
> > > Great, thank you!  So, I think the regression is demystified.  We can
> > > now conclude that on our benchmarks this patch doesn't cause
> > > performance regression larger than measurement error.  But in some
> > > cases it shows huge performance benefit.
> > >
> > > So, I'm going to commit this, if no objections.
> >
> > +1 to commit.
> >
> > What will the commit message be?
> >
> > For me, this is about reducing contention on index leaf page hotspots,
> > while at the same time reducing the path length of lock acquisition on
> > leaf pages
>
> So, reducing the path length of lock acquisition is the particular
> technical change made, while reducing contention on index leaf pages
> is the result.  I think that the reduced lock-acquisition path length
> should be mentioned in the title of the commit message, while the
> contention reduction should be mentioned in the body, because it's the
> motivation for this commit.  If we have a release notes item for this
> commit, it should also mention the contention reduction, because that's
> the user-visible effect of this commit.

So, I've pushed it.  Thanks to everybody for review.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Segfault logical replication PG 10.4

2018-07-27 Thread Alvaro Herrera
On 2018-Jul-20, Minh-Quan Tran wrote:

> Something like this?

Can you provide a reproducer for this problem?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: patch to allow disable of WAL recycling

2018-07-27 Thread Jerry Jelinek
I've set up FreeBSD 11.1 in a VM and set up a ZFS filesystem to use for the
Postgres DB. I ran the following simple benchmark.

pgbench -M prepared -c 4 -j 4 -T 60 postgres

Since it is in a VM and I can't control what else might be happening on the
box, I ran this several times at different times of the day and averaged
the results. Here is the average TPS and latency with WAL recycling on (the
default) and off.

recycling on
avg tps: 407.4
avg lat: 9.8 ms

recycling off
avg tps: 425.7
avg lat: 9.4 ms

Given my uncertainty about what else is running on the box, I think it is
reasonable to say these are essentially equal, but I can collect more data
across more different times if necessary. I'm also happy to collect more
data if people have suggestions for different parameters on the pgbench run.

Thanks,
Jerry


On Fri, Jul 20, 2018 at 4:04 PM, Jerry Jelinek 
wrote:

> Thomas,
>
> Thanks for your offer to run some tests on different OSes and filesystems
> that you have. Anything you can provide here would be much appreciated. I
> don't have anything other than our native SmartOS/ZFS based configurations,
> but I might be able to setup some VMs and get results that way. I should be
> able to setup a VM running FreeBSD. If you have a chance to collect some
> data, just let me know the exact benchmarks you ran and I'll run the same
> things on the FreeBSD VM. Obviously you're under no obligation to do any of
> this, so if you don't have time, just let me know and I'll see what I can
> do on my own.
>
> Thanks again,
> Jerry
>
>
> On Tue, Jul 17, 2018 at 2:47 PM, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
>
>> On 07/17/2018 09:12 PM, Peter Eisentraut wrote:
>> > On 17.07.18 00:04, Jerry Jelinek wrote:
>> >> There have been quite a few comments since last week, so at this point
>> I
>> >> am uncertain how to proceed with this change. I don't think I saw
>> >> anything concrete in the recent emails that I can act upon.
>> >
>> > The outcome of this could be multiple orthogonal patches that affect the
>> > WAL file allocation behavior somehow.  I think your original idea of
>> > skipping recycling on a COW file system is sound.  But I would rather
>> > frame the option as "preallocating files is obviously useless on a COW
>> > file system" rather than "this will make things mysteriously faster with
>> > uncertain trade-offs".
>> >
>>
>> Makes sense, I guess. But I think many claims made in this thread are
>> mostly just assumptions at this point, based on our beliefs about how CoW or
>> non-CoW filesystems work. The results from ZFS (showing positive impact)
>> are an exception, but that's about it. I'm sure those claims are based
>> on real-world experience and are likely true, but it'd be good to have
>> data from a wider range of filesystems / configurations etc. so that we
>> can give better recommendations to users, for example.
>>
>> That's something I can help with, assuming we agree on what tests we
>> want to do. I'd say the usual battery of write-only pgbench tests with
>> different scales (fits into s_b, fits into RAM, larger than RAM) on
>> common Linux filesystems (ext4, xfs, btrfs) and zfsonlinux, and
>> different types of storage would be enough. I don't have any freebsd box
>> available, unfortunately.
>>
>>
>> regards
>>
>> --
>> Tomas Vondra  http://www.2ndQuadrant.com
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>


Re: [HACKERS] WAL logging problem in 9.4.3?

2018-07-27 Thread Andrew Dunstan




On 07/18/2018 10:58 AM, Heikki Linnakangas wrote:

On 18/07/18 16:29, Robert Haas wrote:
On Wed, Jul 18, 2018 at 9:06 AM, Michael Paquier 
 wrote:

What's wrong with the approach proposed in
http://postgr.es/m/55afc302.1060...@iki.fi ?


For back-branches that's very invasive so that seems risky to me
particularly seeing the low number of complaints on the matter.


Hmm. I think that if you disable the optimization, you're betting that
people won't mind losing performance in this case in a maintenance
release.  If you back-patch Heikki's approach, you're betting that the
committed version doesn't have any bugs that are worse than the status
quo.  Personally, I'd rather take the latter bet.  Maybe the patch
isn't all there yet, but that seems like something we can work
towards.  If we just give up and disable the optimization, we won't
know how many people we ticked off or how badly until after we've done
it.


Yeah. I'm not happy about backpatching a big patch like what I
proposed and Kyotaro developed further. But I think it's the least
bad option we have; the other options discussed seem even worse.


One way to review the patch is to look at what it changes, when 
wal_level is *not* set to minimal, i.e. what risk or overhead does it 
pose to users who are not affected by this bug? It seems pretty safe 
to me.


The other aspect is, how confident are we that this actually fixes the 
bug, with least impact to users using wal_level='minimal'? I think 
it's the best shot we have so far. All the other proposals either 
don't fully fix the bug, or hurt performance in some legit cases.


I'd suggest that we continue based on the patch that Kyotaro posted at 
https://www.postgresql.org/message-id/20180330.100646.86008470.horiguchi.kyotaro%40lab.ntt.co.jp.






I have just spent some time reviewing Kyotaro's patch. I'm a bit
nervous, too, given the size. But I'm also nervous about leaving things 
as they are. I suspect the reason we haven't heard more about this is 
that these days use of "wal_level = minimal" is relatively rare.


I like the fact that this is closer to being a real fix rather than just 
throwing out the optimization. Like Heikki I've come round to the view 
that something like this is the least bad option.


The code looks good to me - some comments might be helpful in 
heap_xlog_update()


Do we want to try this on HEAD and then backpatch it? Do we want to add 
some testing along the lines Michael suggested?


cheers

andrew

--
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Avoid extra Sort nodes between WindowAggs when sorting can be reused

2018-07-27 Thread Alexander Kuzmenkov

Daniel,

Thanks for the update.


On 07/25/2018 01:37 AM, Daniel Gustafsson wrote:



Hmm, this is missing the eqop fields of SortGroupClause. I haven't
tested yet, but does a similar degradation happen if two
SortGroupClauses have different eqop and the same other values?

I wasn’t able to construct a case showing this, but I also think you’re right.
Do you have an idea of a query that can trigger a regression?  The attached
patch adds a stab at this, but I’m not sure if it’s the right approach.


To trigger that, in your test example you could order by empno::int8 for
one window, and by empno::int2 for another. But I don't think you have
to compare eqop here, because if eqop differs, sortop will differ too. I
removed the comparison from the patch. I also clarified (I hope) the
comments, and did the optimization I mentioned earlier: using an array
instead of a list for active clauses. Please see the attached v6.
Otherwise I think the patch is good.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From e6006da62d7d4c0c54750260e864e52e93e2dba9 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson 
Date: Tue, 12 Jun 2018 14:13:48 +0200
Subject: [PATCH] Order windows on partition/ordering prefix to reuse Sort
 nodes

By ordering the Windows on partitioning/ordering prefix, the result
from Sort nodes can be reused by multiple WindowAgg nodes and thus
we can avoid Sort nodes.
---
 src/backend/optimizer/plan/planner.c | 126 ++-
 src/test/regress/expected/window.out |  60 +
 src/test/regress/sql/window.sql  |  16 +
 3 files changed, 144 insertions(+), 58 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index df4ec44..299149f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -110,6 +110,17 @@ typedef struct
 	int		   *tleref_to_colnum_map;
 } grouping_sets_data;
 
+/*
+ * Temporary structure for use during WindowClause reordering in order to be
+ * be able to sort WindowClauses on partitioning/ordering prefix.
+ */
+typedef struct
+{
+	WindowClause *wc;
+	List	   *uniqueOrder;	/* A List of unique ordering/partitioning
+   clauses per Window */
+} WindowClauseSortNode;
+
 /* Local functions */
 static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
 static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -237,6 +248,7 @@ static void create_partitionwise_grouping_paths(PlannerInfo *root,
 static bool group_by_has_partkey(RelOptInfo *input_rel,
 	 List *targetList,
 	 List *groupClause);
+static int common_prefix_cmp(const void *a, const void *b);
 
 
 /*
@@ -5253,68 +5265,92 @@ postprocess_setop_tlist(List *new_tlist, List *orig_tlist)
 static List *
 select_active_windows(PlannerInfo *root, WindowFuncLists *wflists)
 {
-	List	   *result;
-	List	   *actives;
+	List	   *result = NIL;
 	ListCell   *lc;
+	WindowClauseSortNode *actives = palloc(sizeof(WindowClauseSortNode)
+	* list_length(root->parse->windowClause));
+	int nActive = 0;
 
 	/* First, make a list of the active windows */
-	actives = NIL;
-	foreach(lc, root->parse->windowClause)
+	foreach (lc, root->parse->windowClause)
 	{
 		WindowClause *wc = lfirst_node(WindowClause, lc);
 
 		/* It's only active if wflists shows some related WindowFuncs */
 		Assert(wc->winref <= wflists->maxWinRef);
-		if (wflists->windowFuncs[wc->winref] != NIL)
-			actives = lappend(actives, wc);
+		if (wflists->windowFuncs[wc->winref] == NIL)
+			continue;
+
+		actives[nActive].wc = wc;
+		actives[nActive].uniqueOrder = list_concat_unique(
+			list_copy(wc->partitionClause), wc->orderClause);
+		nActive++;
 	}
 
 	/*
-	 * Now, ensure that windows with identical partitioning/ordering clauses
-	 * are adjacent in the list.  This is required by the SQL standard, which
-	 * says that only one sort is to be used for such windows, even if they
-	 * are otherwise distinct (eg, different names or framing clauses).
-	 *
-	 * There is room to be much smarter here, for example detecting whether
-	 * one window's sort keys are a prefix of another's (so that sorting for
-	 * the latter would do for the former), or putting windows first that
-	 * match a sort order available for the underlying query.  For the moment
-	 * we are content with meeting the spec.
-	 */
-	result = NIL;
-	while (actives != NIL)
-	{
-		WindowClause *wc = linitial_node(WindowClause, actives);
-		ListCell   *prev;
-		ListCell   *next;
-
-		/* Move wc from actives to result */
-		actives = list_delete_first(actives);
-		result = lappend(result, wc);
-
-		/* Now move any matching windows from actives to result */
-		prev = NULL;
-		for (lc = list_head(actives); lc; lc = next)
-		{
-			WindowClause *wc2 = lfirst_node(WindowClause, lc);
+	 * Sort windows by their 

Re: My Skype account (korotkovae) was hacked

2018-07-27 Thread Robert Haas
On Fri, Jul 27, 2018 at 7:18 AM, Alexander Korotkov
 wrote:
> Some community members did communicate with me using my Skype account
> (korotkovae).  Today this account was hacked.  You should know that
> any requests to send money sent from my account are fraud.
> Sorry for inconvenience.  I've send request to restore my Skype
> account, it will be under consideration in next 24 hours.

Hmm, that explains why I got my first-ever message from you on Skype
today... and why it was in Russian.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Deprecating, and scheduling removal of, pg_dump's tar format.

2018-07-27 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >> Why did we invent "custom" format dumps instead of using a standard
 >> container-file format like tar/cpio/zip/whatever?

 Andres> Because they're either not all that simple, or don't allow random
 Andres> read access inside. But that's just a guess, not fact.

A more significant factor is that tar (like most file archive formats)
doesn't allow streamed _write_ access - you need to know the size of
each archive member in advance, hence why -Ft has to copy each table to
a temp file and then copy that into the archive.
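
As a stand-alone illustration of that constraint (simplified, checksum
handling omitted): the 512-byte ustar header precedes each member's data
and already carries the member's size, so a writer has to know the size
before it can emit the first data byte.

#include <stdio.h>
#include <string.h>

struct ustar_header             /* POSIX ustar header, 512 bytes total */
{
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];              /* octal ASCII; fixed before the data follows */
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char magic[6];
    char version[2];
    char uname[32];
    char gname[32];
    char devmajor[8];
    char devminor[8];
    char prefix[155];
    char pad[12];
};

static void
write_member_header(FILE *out, const char *name, unsigned long size)
{
    struct ustar_header h;

    memset(&h, 0, sizeof(h));
    snprintf(h.name, sizeof(h.name), "%s", name);
    snprintf(h.size, sizeof(h.size), "%011lo", size);   /* size known up front */
    fwrite(&h, sizeof(h), 1, out);                      /* header first, then data */
}

int
main(void)
{
    write_member_header(stdout, "pgbench_accounts.dat", 1048576UL);
    return 0;
}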

-- 
Andrew.



Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-07-27 Thread Robert Haas
On Fri, Jul 27, 2018 at 3:17 AM, Ashutosh Bapat
 wrote:
> Apart from the complexity, there's also a possibility that this
> skipping will actually reduce efficiency in normal cases. Consider
> a case where A and B have exactly matching partitions. The current
> partition matching algorithm compares any given range/bound only once,
> so it is O(n). If we use a binary search, however, for every range
> comparison, the algorithm becomes O(n log n), with unnecessary
> comparisons made during the binary search. The current algorithm is
> always O(n), whereas binary search would be O(n log n) with a
> possibility of sub-O(n) complexity in some cases. I would go for an
> algorithm which has a consistent time complexity and is efficient in
> normal cases, rather than worrying about some cases which are not
> practical.

Yeah, I think that's a good point.  The normal case here will be that
the partition bounds are equal, or that there are a few extra
partitions on one side that don't exist on the other.  We don't want
other cases to be crazily inefficient, but I suspect in practice that
if the partitioning bounds aren't pretty close to matching up exactly,
we're going to end up failing to be able to do a partition-wise join
at all.  It's not very likely that somebody happens to have a case
where they've partitioned two tables in randomly different ways, but
then they decide to join them anyway, but then it turns out that the
partition bounds happen to be compatible enough that this algorithm
works.
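
For illustration, here is a stand-alone sketch (not the actual PostgreSQL
code) of the merge-style walk defended in the quoted text: each comparison
advances at least one cursor, so matching two sorted bound lists stays
O(n) overall.

#include <stdio.h>

static void
match_bounds(const int *a, int na, const int *b, int nb)
{
    int         i = 0,
                j = 0;

    while (i < na && j < nb)
    {
        if (a[i] == b[j])
        {
            printf("bound %d exists on both sides\n", a[i]);
            i++;
            j++;
        }
        else if (a[i] < b[j])
            i++;                /* bound only exists on side A */
        else
            j++;                /* bound only exists on side B */
    }
}

int
main(void)
{
    int         a[] = {10, 20, 30, 40};
    int         b[] = {10, 20, 40, 50};

    match_bounds(a, 4, b, 4);
    return 0;
}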

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Deprecating, and scheduling removal of, pg_dump's tar format.

2018-07-27 Thread Tom Lane
Jeff Janes  writes:
> But he isn't proposing getting rid of -Fp, just -Ft.  Isn't -Ft just as
> PostgreSQL-specific as -Fd is?

No.  The point about -Ft format is that you can extract files that contain
SQL text and COPY data, using nothing but standard Unix tools (i.e. tar).
So just as with a plain-text dump, you'd have some work to do to get your
data into some other RDBMS, but it'd be mostly about SQL-compatibility
problems, not "what the heck is this binary file format".

I was thinking before that -Fd had basically the same payload files as
an -Ft archive, but it doesn't: we don't emit anything corresponding to
the "restore.sql" member of an -Ft archive.  This means that -Fd still
leaves you needing PG-specific tools to interpret the toc.dat file,
so it's not a plausible answer if you would like to have something that's
more structured than a plain-text dump but will still be of use if your
PG tools are not available.

The -Ft format certainly has got its problems, and I wouldn't complain
if we decided to, say, extend -Fd format so that you could also get
info out of it without using pg_restore.  But I do not think we should
just drop -Ft as long as it's our only nonproprietary structured
dump format.

regards, tom lane



Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Chris Travers
On Fri, Jul 27, 2018, 19:01 Andres Freund  wrote:

> Hi,
>
> On 2018-07-27 11:15:00 -0500, Nico Williams wrote:
> > Even assuming you can't change the PG license, you could still:
> >
> >  - require disclosure in contributions
>
> That really has no upsides, except to poison the area.  Either we reject
> the patch and people doing so can reasonably be expected to know about
> the patents, making further contributions by them worse.  Or we accept
> the patch, and the listed patents make the commercial offerings harder
> and further development more dubious, and readers can reasonably be
> concerned about being considered to know about the patents in
> independent projects.
>

What about just requiring a patent license that grants all rights necessary
to fully enjoy the copyright license? That avoids the need to change the
license formally.  And it would just clarify that we expect that patents do
NOT change the license.  Not that I expect there would be takers.

>
> Greetings,
>
> Andres Freund
>


Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Chapman Flack
On 07/27/2018 01:42 PM, Andres Freund wrote:
> On 2018-07-27 13:33:28 -0400, Chapman Flack wrote:
>> On 07/27/2018 01:01 PM, Andres Freund wrote:
>>
>>> the patch and people doing so can reasonably be expected to know about
>>> the patents, making further contributions by them worse.
>>
>> I'm not sure this line of thinking, which seems rooted in notions of
>> tainted or cleanroom development from the copyright world, has the
>> same force wrt patents.
> 
> Yes, violations made with knowledge triples damages in the US.

Nobody suggested violating with knowledge. The point is to use the
knowledge to /not/ violate.

In the domain of copyright, that's nonsensical of course.
In the domain of patent, it isn't, and can be smarter than
forging ahead blindly in the hope that you're just happening
to skirt the claims of a patent you'd rather not know about.

-Chap



Re: Deprecating, and scheduling removal of, pg_dump's tar format.

2018-07-27 Thread Robert Haas
On Fri, Jul 27, 2018 at 1:05 PM, Andres Freund  wrote:
> My point is more that it forces users to make choices whenever they use
> pg_dump. And the tar format has plenty downsides that aren't immediately
> apparent.  By keeping something with only a small upside around, we
> force users to waste time.

Yeah, I admit that's a valid argument.

>> Why did we invent "custom" format dumps instead of using a standard
>> container-file format like tar/cpio/zip/whatever?
>
> Because they're either not all that simple, or don't allow random read
> access inside. But that's just a guess, not fact.

Mmm.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Explain buffers wrong counter with parallel plans

2018-07-27 Thread Jonathan S. Katz

> On Jul 27, 2018, at 8:31 AM, Amit Kapila  wrote:
> 
> On Thu, Jul 26, 2018 at 7:31 PM, Jonathan S. Katz
>  wrote:
>> 
>>> On Jul 25, 2018, at 10:25 PM, Amit Kapila  wrote:
>>> 
>>> 
>>> You mean to say the number (Buffers: shared read=442478) in HEAD,
>>> right?  If so, yeah, I am also wondering why the results of the patch
>>> are different in HEAD and 11beta2.  So, if I read correctly, the
>>> numbers in 11beta2 appears correct, but on HEAD it is not correct, it
>>> should have divided the buffers read by loops.
> 
> I want to correct myself here, the numbers on HEAD are correct, but
> not on PG11beta2.  Is there any chance that either you forgot to apply
> the patch or maybe it is not using correct binaries in case of
> 11beta2.

I need to correct myself too as I was unclear - that was an *unpatched*
11beta2, sorry for the confusion.

>>> I will figure that
>>> out, but this is just to clarify that both of us are seeing the
>>> results in the same way.
>> 
>> I’ll try it again but patch it against 11beta2 and see what results I get.
>> 
>>> 
 and there appears to be a performance hit.
 
>>> 
>>> Do you mean to say the performance of the same query in 11beta2 and
>>> HEAD or something else?
>> 
>> Correct. But per your advice let me try running it against a patched
>> version of 11beta2 and see what happens.
>> 
> 
> Yeah, that would be better.  Today, I have tried the patch on both
> Head and PG11 and I am getting same and correct results.

I have applied the patch to PG11beta2 and tested. I received
similar numbers to the patched HEAD tests, e.g.:

=> EXPLAIN (analyze,buffers,timing off,costs off) SELECT count(*) FROM t1;
QUERY PLAN
--
 Finalize Aggregate (actual rows=1 loops=1)
   Buffers: shared hit=224 read=442254
   ->  Gather (actual rows=7 loops=1)
 Workers Planned: 6
 Workers Launched: 6
 Buffers: shared hit=224 read=442254
 ->  Partial Aggregate (actual rows=1 loops=7)
   Buffers: shared hit=224 read=442254
   ->  Parallel Seq Scan on t1 (actual rows=14285714 loops=7)
 Buffers: shared hit=224 read=442254
 Planning Time: 0.104 ms
 Execution Time: 5308.140 ms


I ran the tests a few more times and I understand why the execution
times vary (with an explanation from Andres) so I am satisfied.

Thanks for working on this!

Jonathan


signature.asc
Description: Message signed with OpenPGP


Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Andres Freund
On 2018-07-27 13:33:28 -0400, Chapman Flack wrote:
> On 07/27/2018 01:01 PM, Andres Freund wrote:
> 
> > the patch and people doing so can reasonably be expected to know about
> > the patents, making further contributions by them worse.
> 
> I'm not sure this line of thinking, which seems rooted in notions of
> tainted or cleanroom development from the copyright world, has the
> same force wrt patents.

Yes, violations made with knowledge triples damages in the US.


> Sometimes a good understanding of a patented technique, including
> just what aspects of it are claimed or not in the patent's claims
> section, will be just what you need in order to be confident that
> an alternative approach you've devised really is different in the
> ways that matter. I don't think it automatically casts a cloud on
> the work as it would in the copyright case.

There's no way we can do that without extensive lawyerly input in each
case.

Greetings,

Andres Freund



Re: Deprecating, and scheduling removal of, pg_dump's tar format.

2018-07-27 Thread Jeff Janes
On Fri, Jul 27, 2018 at 12:51 PM, Robert Haas  wrote:

> On Thu, Jul 26, 2018 at 11:46 PM, Tom Lane  wrote:
> > Andres Freund  writes:
> >> Is there any real reason to retain it?
> >
> > As I recall, the principal argument for having it to begin with was
> > that it's a "non proprietary" format that could be read without any
> > PG-specific tools.  Perhaps the directory format could be said to
> > serve that purpose too, but if you were to try to collapse a directory
> > dump into one file for transportation, you'd have ... a tar dump.
> >
> > I think a more significant question is what we'd get by removing it?
> > If you want to look around for features that are slightly less used
> > than other arguably-equivalent things, we must have hundreds of those.
> > Doesn't mean that those features have no user constituency.
>
> Yeah.  I don't mind removing really marginal features to ease
> maintenance, but I'm not sure that this one is all that marginal or
> that we'd save that much maintenance by eliminating it.  I used
> text-format dumps for years primarily because I figured that no matter
> what happened, I'd always be able to find some way of getting my data
> out of a text file.  Ideally the PostgreSQL tools will always work,
> but if they don't work and you have a text file, you have
> alternatives.  If they don't work and you have a format in some
> PostgreSQL-specific format, then what?
>

But he isn't proposing getting rid of -Fp, just -Ft.  Isn't -Ft just as
PostgreSQL-specific as -Fd is?

Cheers,

Jeff


Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Chapman Flack
On 07/27/2018 01:01 PM, Andres Freund wrote:

> the patch and people doing so can reasonably be expected to know about
> the patents, making further contributions by them worse.

I'm not sure this line of thinking, which seems rooted in notions of
tainted or cleanroom development from the copyright world, has the
same force wrt patents.

Sometimes a good understanding of a patented technique, including
just what aspects of it are claimed or not in the patent's claims
section, will be just what you need in order to be confident that
an alternative approach you've devised really is different in the
ways that matter. I don't think it automatically casts a cloud on
the work as it would in the copyright case.

-Chap



Re: Deprecating, and scheduling removal of, pg_dump's tar format.

2018-07-27 Thread Joshua D. Drake

On 07/27/2018 10:05 AM, Andres Freund wrote:



Yeah.  I don't mind removing really marginal features to ease
maintenance, but I'm not sure that this one is all that marginal or
that we'd save that much maintenance by eliminating it.

My point is more that it forces users to make choices whenever they use
pg_dump. And the tar format has plenty downsides that aren't immediately
apparent.  By keeping something with only a small upside around, we
force users to waste time.


Correct. Sometimes it is best to limit choices. Someone may choose tar
because it is a command they have used before, without fully understanding
what that means within the context of PostgreSQL. Then something will
happen, they will ask for help either on the lists or from a consulting
firm, and the first thing either will say is "Don't use the tar format",
or at least "You should be using one of the other formats".


Why invite the overhead?

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own.   *




Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Nico Williams
On Fri, Jul 27, 2018 at 10:01:40AM -0700, Andres Freund wrote:
> On 2018-07-27 11:15:00 -0500, Nico Williams wrote:
> > Even assuming you can't change the PG license, you could still:
> > 
> >  - require disclosure in contributions
> 
> That really has no upsides, except to poison the area.  [...]

Sure it does: a) it allows PG to reject such contributions unless they
come with suitable grants, and b) it gives PG users a legal defense
should they ever be sued for infringement on a patent that was not
disclosed to PG at the time of the contribution.  (b) is a big deal.

Nico
-- 



Re: Deprecating, and scheduling removal of, pg_dump's tar format.

2018-07-27 Thread Andres Freund
Hi,

On 2018-07-27 12:51:17 -0400, Robert Haas wrote:
> On Thu, Jul 26, 2018 at 11:46 PM, Tom Lane  wrote:
> > Andres Freund  writes:
> >> Is there any real reason to retain it?
> >
> > As I recall, the principal argument for having it to begin with was
> > that it's a "non proprietary" format that could be read without any
> > PG-specific tools.  Perhaps the directory format could be said to
> > serve that purpose too, but if you were to try to collapse a directory
> > dump into one file for transportation, you'd have ... a tar dump.
> >
> > I think a more significant question is what we'd get by removing it?
> > If you want to look around for features that are slightly less used
> > than other arguably-equivalent things, we must have hundreds of those.
> > Doesn't mean that those features have no user constituency.
> 
> Yeah.  I don't mind removing really marginal features to ease
> maintenance, but I'm not sure that this one is all that marginal or
> that we'd save that much maintenance by eliminating it.

My point is more that it forces users to make choices whenever they use
pg_dump. And the tar format has plenty downsides that aren't immediately
apparent.  By keeping something with only a small upside around, we
force users to waste time.


> Why did we invent "custom" format dumps instead of using a standard
> container-file format like tar/cpio/zip/whatever?

Because they're either not all that simple, or don't allow random read
access inside. But that's just a guess, not fact.

Greetings,

Andres Freund



Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Andres Freund
Hi,

On 2018-07-27 11:15:00 -0500, Nico Williams wrote:
> Even assuming you can't change the PG license, you could still:
> 
>  - require disclosure in contributions

That really has no upsides, except to poison the area.  Either we reject
the patch and people doing so can reasonably be expected to know about
the patents, making further contributions by them worse.  Or we accept
the patch, and the listed patents make the commercial offerings harder
and further development more dubious, and readers can reasonably be
concerned about being considered to know about the patents in
independent projects.

Greetings,

Andres Freund



Re: Upper limit arguments of pg_logical_slot_xxx_changes functions accept invalid values

2018-07-27 Thread Robert Haas
On Wed, Jul 11, 2018 at 8:58 PM, Masahiko Sawada  wrote:
> While reading the replication slot codes, I found a wrong assignment
> in pg_logical_slot_get_changes_guts() function as follows.
>
> if (PG_ARGISNULL(2))
>     upto_nchanges = InvalidXLogRecPtr;
> else
>     upto_nchanges = PG_GETARG_INT32(2);
>
> Since upto_nchanges is an integer value, we should set it to 0 (meaning
> unlimited) instead of InvalidXLogRecPtr. Since InvalidXLogRecPtr is
> actually 0, this function has worked fine so far.

If somebody changes InvalidXLogRecPtr to (uint64)-1, then it breaks as
the code is written.  On the other hand, if somebody reverted
0ab9d1c4b31622e9176472b4276f3e9831e3d6ba, it would keep working as
written but break under your proposal.

I am not prepared to spend much time arguing about it, but I think we
should just leave this the way it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Deprecating, and scheduling removal of, pg_dump's tar format.

2018-07-27 Thread Robert Haas
On Thu, Jul 26, 2018 at 11:46 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> Is there any real reason to retain it?
>
> As I recall, the principal argument for having it to begin with was
> that it's a "non proprietary" format that could be read without any
> PG-specific tools.  Perhaps the directory format could be said to
> serve that purpose too, but if you were to try to collapse a directory
> dump into one file for transportation, you'd have ... a tar dump.
>
> I think a more significant question is what we'd get by removing it?
> If you want to look around for features that are slightly less used
> than other arguably-equivalent things, we must have hundreds of those.
> Doesn't mean that those features have no user constituency.

Yeah.  I don't mind removing really marginal features to ease
maintenance, but I'm not sure that this one is all that marginal or
that we'd save that much maintenance by eliminating it.  I used
text-format dumps for years primarily because I figured that no matter
what happened, I'd always be able to find some way of getting my data
out of a text file.  Ideally the PostgreSQL tools will always work,
but if they don't work and you have a text file, you have
alternatives.  If they don't work and you have a format in some
PostgreSQL-specific format, then what?

I probably wouldn't be as nervous about this now as I was then, seeing
how careful we've been about this stuff.  But I can certainly
understand somebody wanting a standard format rather than a
PostgreSQL-specific one.  Why did we invent "custom" format dumps
instead of using a standard container-file format like
tar/cpio/zip/whatever?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Making "COPY partitioned_table FROM" faster

2018-07-27 Thread Robert Haas
On Wed, Jul 25, 2018 at 10:30 PM, David Rowley
 wrote:
> I agree RANGE partition is probably the most likely case to benefit
> from this optimisation, but I just don't think that HASH could never
> benefit and LIST probably sits somewhere in the middle.
>
> HASH partitioning might be useful in cases like partitioning by
> "sensor_id".  It does not seem that unreasonable that someone might
> want to load all the data for an entire sensor at once.

Another case might be restoring a dump with --load-via-partition-root.
Granted, you probably shouldn't specify that option unless you expect
rows to end up in a different partition than they occupied in the original
cluster, but it's possible somebody might do it out of an abundance of
caution if, say, they are doing an automated dump restore on a machine
that may or may not have different endian-ness than the original.

I think making it adaptive, as you've done, is definitely better than
a heuristic based on the partitioning type.
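
The adaptive scheme itself isn't spelled out in this excerpt, so purely as
a hypothetical illustration of the general idea (the names and threshold
below are invented, not taken from the patch): track how many consecutive
rows land in the same partition, and once the streak is long enough, try
that partition first and only fall back to the full search on a miss.

#include <stdbool.h>
#include <stdio.h>

#define CACHE_ENABLE_THRESHOLD 100      /* invented value, for illustration */

/* Stand-ins for the real tuple-routing machinery. */
static int  find_partition(int key)        { return key / 1000; }
static bool row_fits(int target, int key)  { return find_partition(key) == target; }

typedef struct RouteState
{
    int         last_target;        /* partition chosen for the previous row */
    int         consecutive_hits;   /* current streak of rows hitting it */
    bool        use_cached;         /* true once the streak is long enough */
} RouteState;

static int
route_row(RouteState *state, int key)
{
    int         target;

    /* Cheap path: once the input looks ordered, try the cached partition
     * first and only do the full search on a miss. */
    if (state->use_cached && row_fits(state->last_target, key))
    {
        state->consecutive_hits++;
        return state->last_target;
    }

    target = find_partition(key);

    if (target == state->last_target)
    {
        if (++state->consecutive_hits >= CACHE_ENABLE_THRESHOLD)
            state->use_cached = true;
    }
    else
    {
        state->last_target = target;        /* streak broken: reset */
        state->consecutive_hits = 0;
        state->use_cached = false;
    }

    return target;
}

int
main(void)
{
    RouteState  st = {-1, 0, false};

    for (int key = 0; key < 5000; key++)    /* ordered input: caching kicks in */
        (void) route_row(&st, key);

    printf("cached routing active at end: %s\n", st.use_cached ? "yes" : "no");
    return 0;
}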

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Early WIP/PoC for inlining CTEs

2018-07-27 Thread Andrew Gierth
> "Andreas" == Andreas Karlsson  writes:

 >> WITH ctename AS [[NOT] MATERIALIZED] (query)

 Andreas> I think "NOT MATERIALIZED" would be a bit misleading since the
 Andreas> planner may choose to materialize anyway,

It would certainly be possible to make an explicit NOT MATERIALIZED
override such a choice by the planner, or produce an error if it is
actually impossible to do so (e.g. using NOT MATERIALIZED on a wCTE)

-- 
Andrew (irc:RhodiumToad)



Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Nico Williams
Even assuming you can't change the PG license, you could still:

 - require disclosure in contributions
 - require a wide grant in contributions
 - document all such grants separately from the copyright license

Putting the grants in the license is convenient, but it's not required
to include patent language in order to get the desired effect.  The
license is just a copyright license -- it could stay just that.

You could also require a CLA in the case where disclosure is made, or
even in all cases.  But a CLA for all cases would be a bit too
burdensome.

Nico
-- 



request for new parameter for disable promote (slave only mode)

2018-07-27 Thread Ioseph Kim
Hi.

I want to build a one-master, multi-slave environment using physical
replication. The slave nodes have lower hardware specs, so I reduced the
max_connections server parameter and tried to start a slave node.
But I could not start the slave nodes, because the
CheckRequiredParameterValues function (in
src/backend/access/transam/xlog.c) rejects values lower than the
master's.

Presumably this is because the design assumes any slave node can be
promoted to master. In my enterprise environment, the read-only slave
nodes are never used as a master. For high availability of the master
node I'm using non-shared-disk failover (e.g. pacemaker), so my slave
nodes are always read-only.

I suggest adding a new server parameter such as allow_promote = on|off.
If allow_promote is set to off, then CheckRequiredParameterValues would
skip checking the slave node's parameters.

If there is a better plan, it is welcome too.

Regards, ioseph. 




Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Nico Williams
On Fri, Jul 27, 2018 at 09:30:45AM -0400, Robert Haas wrote:
> If you think that the lack of a CLA and a patent grant never causes
> extensive conversations with legal, I am quite certain that you are
> incorrect.  I know of multiple instances where this has been a
> concern.
> 
> Other open source projects even more prominent and successful than
> PostgreSQL have done these things.  [...]

FWIW, some have done it poorly.  I had trouble getting an employer to
sign the Oracle contributor agreement, for example, because the OCA did
not allow the employer to specify who could make the contribution (only
the director who signs the OCA) and also did not allow the
employer to narrow the scope of contributions to those relevant to the
OpenJDK.

I also recommend adding to the PG code of conduct text requiring
disclosure of relevant patent rights that the contributor knows about.

Nico
-- 



Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Robert Haas
On Fri, Jul 27, 2018 at 10:44 AM, Tom Lane  wrote:
> However, that does *not* mean that adding patent-related qualifiers to the
> license is going to be OK with everybody.  An easy counterexample is that
> we get code from some of those selfsame companies with private forks,
> which then feeds back into their forks (EDB, for instance).  What if the
> license change interferes with their own IP situation?

Maybe I'm missing something, but if it interferes with their IP
situation, that must mean that their IP situation involves potentially
wanting to sue users of PostgreSQL or derivate works for patent
infringement.  If that is the case, interfering with that situation is
a feature, not a bug.

In other words, if we say to various companies "hey, you can't
contribute to PostgreSQL any more unless you are willing to have your
contributions include the right to any patents they implicate" and one
of those companies says "well, in that case we're not going to
contribute any more," we have got to really wonder about the
motivations of that company.

> You're just attacking a straw man.

I can't agree with that statement.

> Nobody is claiming that that would
> give us entire safety.  We *are* saying that not doing that would expose
> us to more legal risk than if we do do it (ie refuse any known-patent-
> encumbered code).  In any case we will always have to be prepared to
> remove patent-covered code when someone brings it to our notice.
> The latter is a fact of life that no amount of license-tweaking or
> contributor-agreement-making is going to alter.

I agree with all this.

> But what we can do,
> to reduce the risk of future trouble, is avoid planting already-known
> patent time bombs in our code.  That prevents issues if the patent owner
> (or a successor) has a change of heart;

No, that's exactly backwards.  Right now, you are relying on people
not having a change of heart; if you made them promise to license any
patents they have, you would be relying on the law.  Today, if an
employee of company A contributes a patch to PostgreSQL which happens
to include patented technology, and they happen not to mention it
because their present management has no intention of asserting that
patent against PostgreSQL, or because they just want to get the patch
accepted for whatever reasons, they can later decide that they'd like
to assert that patent after all.  If they have agreed to a license
that includes an explicit patent grant, then they can't just change
their mind.  There's a lot of speculation on this thread which
suggests that there may be situations such as bankruptcy in which
companies can go back on their contracts and/or promises; regardless
of whether that is the case, it must be far easier for a company to
reverse course when it hasn't contracted or promised anything to begin
with.

> and I think also that having a
> clear no-patents policy will help establish lack of intent to infringe
> if things ever get sticky with an accidental infringement.

We don't have a clear contribution policy of any kind, about patents
or anything else.  The entire policy around contributions consists of
a corpus of remarks you've made on the mailing list and *maybe* a wiki
page someplace.  (I couldn't find one in a quick search, though.)
There's nothing that anybody needs to agree to in order to contribute,
and there are no checks that anyone is aware of, let alone intending
to adhere to, any particular set of rules.  If we actually did have a
clear policy, though, I agree that it might help.

On the other hand, how much difference is there between a policy that
says "we don't want code that is known to be covered by patents of any
kind" and a similar policy that says "...unless a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable patent
license allowing use for any purpose is conveyed along with the code"?
 If you operate under the theory that someone can revoke a license
that has "irrevocable" in the legal language, then there's a big
difference, but I don't think that's how the law works.  My
understanding, rather, is that once a technique is patented by company
or individual A, it cannot later be patented by somebody else.  So if
you know who the patent-holder is, and you know they've given a
license, you're actually safer than if there is no patent at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



add verbosity to pg_basebackup for sync

2018-07-27 Thread Jeff Janes
On some recent testing, pg_basebackup -Fp was taking an annoying amount of
time to finish once the it was done copying the data.  Using -v seemed to
blame this on "waiting for background process to finish streaming", based
on that being the last message on display while the delay was happening.

But it was really waiting for the syncs of the new -D dir to finish.  The
attached patch adds a -v notice that it is starting to do the sync, with
the wording taken from initdb's equivalent message.

I think -P should report everything -v does, just with a carriage return
rather than a newline, but that is a larger issue.

Cheers,

Jeff


pg_basebackup_v_sync.patch
Description: Binary data


Re: Auditing via logical decoding

2018-07-27 Thread Stephen Frost
Greetings,

* from_postg...@safetyphil.com (from_postg...@safetyphil.com) wrote:
> >> We have been using our own trigger-based audit system at my firm 
> >> successfully for some years, but the performance penalty is starting to 
> >> grate a bit and so I have been tasked with seeing if we can make use of 
> >> the new logical decoding functions to achieve the same thing. I thought 
> >> that someone must already have written something that would satisfy our 
> >> use-case but my internet searches have come up short so far so I am 
> >> considering writing a logical decoding plugin to do what we want.
> 
> > Have you checked pgaudit [1]? I haven't checked if it matches all your 
> > requirements, but considering it's an extension aimed at auditing use 
> > cases it might. And it's already available, of course.
> 
> Actually no, I hadn't come across this before, thanks for the heads up. It is 
> important for us to be able to get the audit data back into a different 
> database, but it looks like I could scrape the logs and do that. At the very 
> least it shows that it is possible to hook into postgres in the right places 
> to emit extra logical messages if that turns out to be the better way to do 
> it.
> 
> It doesn't appear to support application_name, nor the connection details of
> the user doing the update - but perhaps it would be just as sensible to add
> that here as in a logical output plugin.

I've not been following this very closely but when it comes to pgAudit,
the log lines will include whatever is in log_line_prefix, where you can
include information like the application_name, login user, etc, which is
why you don't explicitly see options for those in pgAudit itself.

This is because pgAudit logs messages the same way PG itself does (which
also means it goes to the same place, such as to a CSV log that can then
be imported into a PG database).

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Auditing via logical decoding

2018-07-27 Thread Tomas Vondra




On 07/27/2018 04:43 PM, from_postg...@safetyphil.com wrote:
We have been using our own trigger-based audit system at my firm
successfully for some years, but the performance penalty is starting
to grate a bit and so I have been tasked with seeing if we can make
use of the new logical decoding functions to achieve the same thing.
I thought that someone must already have written something that would
satisfy our use-case but my internet searches have come up short so
far so I am considering writing a logical decoding plugin to do what
we want.



Have you checked pgaudit [1]? I haven't checked if it matches all
your requirements, but considering it's an extension aimed at
auditing use cases it might. And it's already available, of
course.


Actually no, I hadn't come across this before, thanks for the heads
up. It is important for us to be able to get the audit data back into
a different database, but it looks like I could scrape the logs and
do that. At the very least it shows that it is possible to hook into
postgres in the right places to emit extra logical messages if that
turns out to be the better way to do it.

It doesn't appear to support application_name, nor the connection
details of the user doing the update - but perhaps it would be just
as sensible to add that here as in a logical output plugin.

It's a bit tricky coming to a big codebase like postgres and trying
to decide the best route of doing something; I don't have much of a
mental model about how complicated the various systems are :)



It may not support everything, but when a feature can be done in an
extension, it's generally faster to deliver it that way. One of the
reasons is the ad hoc release cycle (compared to the once-per-year cycle
of PostgreSQL). So I'd suggest investigating this option first.


It may not meet all your requirements immediately (or at all), but well, 
no solution probably will.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-27 Thread Alexander Korotkov
On Fri, Jul 27, 2018 at 12:30 AM Simon Riggs  wrote:
> On 26 July 2018 at 20:59, Alexander Korotkov  
> wrote:
>
> > Great, thank you!  So, I think the regression is demystified.  We can
> > now conclude that on our benchmarks this patch doesn't cause
> > performance regression larger than measurement error.  But in some
> > cases it shows huge performance benefit.
> >
> > So, I'm going to commit this, if no objections.
>
> +1 to commit.
>
> What will the commit message be?
>
> For me, this is about reducing contention on index leaf page hotspots,
> while at the same time reducing the path length of lock acquisition on
> leaf pages

So, reducing the path length of lock acquisition is the particular
technical change made, while reducing contention on index leaf pages
is the result.  I think that the reduced lock-acquisition path length
should be mentioned in the title of the commit message, while the
contention reduction should be mentioned in the body, because it's the
motivation for this commit.  If we have a release notes item for this
commit, it should also mention the contention reduction, because that's
the user-visible effect of this commit.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: grammar - src/backend/access/heap/README.tuplock

2018-07-27 Thread Alvaro Herrera
I agree with all you say here; these were all my mistakes.  Pushed
patch, thanks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Auditing via logical decoding

2018-07-27 Thread Pavel Stehule
2018-07-27 16:43 GMT+02:00 :

> >> We have been using our own trigger-based audit system at my firm
> >> successfully for some years, but the performance penalty is starting to
> >> grate a bit and so I have been tasked with seeing if we can make use of
> >> the new logical decoding functions to achieve the same thing. I thought
> >> that someone must already have written something that would satisfy our
> >> use-case but my internet searches have come up short so far so I am
> >> considering writing a logical decoding plugin to do what we want.
>
> > Have you checked pgaudit [1]? I haven't checked if it matches all your
> > requirements, but considering it's an extension aimed at auditing use
> > cases it might. And it's already available, of course.
>
> Actually no, I hadn't come across this before, thanks for the heads up. It
> is important for us to be able to get the audit data back into a different
> database, but it looks like I could scrape the logs and do that. At the
> very least it shows that it is possible to hook into postgres in the right
> places to emit extra logical messages if that turns out to be the better
> way to do it.
>
> It doesn't appear to support application_name, nor the connection details
> of the user doing the update - but perhaps that would be just as sensible
> to add it here as in a logical output plugin.
>

It is very simple to show the application name or any other info.

A few years ago I customized pgAudit, and almost all of the time it was not
hard work.

Regards

Pavel


> It's a bit tricky coming to a big codebase like postgres and trying to
> decide the best route of doing something; I don't have much of a mental
> model about how complicated the various systems are :)
>
> Kind Regards,
>
> Phil
>
>
>


Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jul 26, 2018 at 11:00 PM, Stephen Frost  wrote:
>> This is a killer point here- clearly the people who have been
>> contributing to PG aren't going to complain about their contributions
>> being released as part of some other work which has a different license
>> or they'd have gone after the many, many, many closed-source and
>> proprietary and commercial forks of PG.

> Yes.  Tom's argument doesn't work at all, for this reason among
> others.  Other projects have done things like this, I'm pretty sure,
> so we could, too.

No, it doesn't follow.  Anyone who's contributed code to PG in the last
twenty years has known, if they were paying any attention, that it'd
also end up in proprietary forks.  The ground rules for the community
are that we don't care about that; somebody who did care would probably
look for a GPL-licensed project to contribute to.

However, that does *not* mean that adding patent-related qualifiers to the
license is going to be OK with everybody.  An easy counterexample is that
we get code from some of those selfsame companies with private forks,
which then feeds back into their forks (EDB, for instance).  What if the
license change interferes with their own IP situation?

> I agree.  The argument that is being made by Tom, Bruce, Dave, and
> David Fetter -- not to conflate them all, but they seem to be broadly
> on the same wavelength -- is that we should just reject contributions
> if the contributor says that there is a patent on anything in that
> contribution, or if we find out that this is the case.  Then, they
> allege, we will have no problem with patents.  I am not a lawyer, but
> I don't think it works that way.

You're just attacking a straw man.  Nobody is claiming that that would
give us entire safety.  We *are* saying that not doing that would expose
us to more legal risk than if we do do it (ie refuse any known-patent-
encumbered code).  In any case we will always have to be prepared to
remove patent-covered code when someone brings it to our notice.
The latter is a fact of life that no amount of license-tweaking or
contributor-agreement-making is going to alter.  But what we can do,
to reduce the risk of future trouble, is avoid planting already-known
patent time bombs in our code.  That prevents issues if the patent owner
(or a successor) has a change of heart; and I think also that having a
clear no-patents policy will help establish lack of intent to infringe
if things ever get sticky with an accidental infringement.

regards, tom lane



RE: Auditing via logical decoding

2018-07-27 Thread from_postgres
>> We have been using our own trigger-based audit system at my firm 
>> successfully for some years, but the performance penalty is starting to 
>> grate a bit and so I have been tasked with seeing if we can make use of 
>> the new logical decoding functions to achieve the same thing. I thought 
>> that someone must already have written something that would satisfy our 
>> use-case but my internet searches have come up short so far so I am 
>> considering writing a logical decoding plugin to do what we want.

> Have you checked pgaudit [1]? I haven't checked if it matches all your 
> requirements, but considering it's an extension aimed at auditing use 
> cases it might. And it's already available, of course.

Actually no, I hadn't come across this before, thanks for the heads up. It is 
important for us to be able to get the audit data back into a different 
database, but it looks like I could scrape the logs and do that. At the very 
least it shows that it is possible to hook into postgres in the right places to 
emit extra logical messages if that turns out to be the better way to do it.

It doesn't appear to support application_name, nor the connection details of 
the user doing the update - but perhaps that would be just as sensible to add 
it here as in a logical output plugin.

It's a bit tricky coming to a big codebase like postgres and trying to decide 
the best route of doing something; I don't have much of a mental model about 
how complicated the various systems are :)

Kind Regards,

Phil




Re: Alter index rename concurrently to

2018-07-27 Thread Robert Haas
On Wed, Jul 18, 2018 at 5:20 AM, Peter Eisentraut
 wrote:
> In the 2012 thread, it was mentioned several times that some thought
> that renaming without an exclusive lock was unsafe, but without any
> further reasons.  I think that was before MVCC catalog snapshots were
> implemented, so at that time, any catalog change without an exclusive
> lock would have been risky.  After that was changed, the issue hasn't
> been mentioned again in the 2013 and beyond thread, and it seems that
> everyone assumed that it was OK.
>
> If we think that it is safe, then I think we should just lower the lock
> level of RENAME by default.
>
> In your patch, the effect of the CONCURRENTLY keyword is just to change
> the lock level, without any further changes.  That doesn't make much
> sense.  If we think the lower lock level is OK, then we should just use
> it always.

Right.  I think that, in the world of MVCC catalog scans, there are
basically two main concerns around reducing the lock levels for DDL.
The first is the way that the shared invalidation queue works.  If
people reading a certain piece of information take a lock X, and
people changing it take a lock Y which conflicts with X, then the
shared invalidation machinery guarantees that everyone always sees the
latest version.  If not, some backends may continue to use the old
version for an unbounded period of time -- there doesn't seem to be a
guarantee that AcceptInvalidationMessages() even runs once per
command, if say the transaction already holds all the locks the query
needs.  Moreover, there is no predictability to when the new
information may become visible -- it may happen just by chance that
the changes become visible almost at once.  I think it would be
worthwhile for somebody to consider adjusting or maybe even
significantly rewriting the invalidation code to provide some
less-unpredictable behavior in this area.  It would be much more
palatable to make non-critical changes under lesser lock levels if you
could describe in an understandable way when those changes would take
effect.  If you could say something like "normally within a second" or
"no later than the start of the next query" it would be a lot better.

The second problem is the relation cache.  Lots of things cache
pointers to the RelationData structure or its subsidiary structures,
and if any of that gets rebuilt, the pointer addresses may change,
leaving those pointers pointing to garbage.  This is handled in
different ways in different places.  One approach is -- if we do a
rebuild of something, say the partition descriptor, and find that the
new one is identical to the old one, then free the new one and keep
the old one.  This means that it's safe to rely on the pointer not
changing underneath you provided you hold a lock strong enough to
prevent any DDL that might change the partition descriptor.  But note
that something like this is essential for any concurrent DDL to work
at all.  Without this, concurrent DDL that changes some completely
unrelated property of the table (e.g. the reloptions) would cause a
relcache rebuild that would invalidate cached pointers to the
partition descriptor, leading to crashes.

With respect to this particular patch, I don't know whether there are
any hazards of the second type.  What I'd check, if it were me, is
what structures in the index's relcache entry are going to get rebuilt
when the index name changes.  If any of those look like things that
something that somebody could hold a pointer to during the course of
query execution, or more generally be relying on not to change during
the course of query execution, then you've got a problem.  As to the
first category, I suspect it's possible to construct cases where the
fact that the index's name has changed fails to get noticed for an
alarmingly long period of time after the change has happened.  I also
suspect that an appropriate fix might be to ensure that
AcceptInvalidationMessages() is run at least once at the beginning of
parse analysis.  I don't think that will make this kind of thing
race-free, but if you can't demonstrate a case where the new name
fails to be noticed immediately without resorting to setting
breakpoints with a debugger, it's probably good enough.  We might need
to worry about whether the extra call to AcceptInvalidationMessages()
costs enough to be noticeable, but I hope it doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PartitionDispatch's partdesc field

2018-07-27 Thread Robert Haas
On Wed, Jul 25, 2018 at 10:42 PM, Amit Langote
 wrote:
>> Another alternative, which I think might make more sense, is to make
>> use pd->key and pd->partdesc in preference to pd->reldesc->rd_partkey
>> and pd->reldesc->rd_partdesc.  It seems to me that the idea of the
>> PartitionDispatch structure is that it gathers together all of the
>> information that we need for tuple routing, so it might make sense for
>> the tuple routing code ought to get the information from there rather
>> than referring back to the RelationDesc.  See attached
>> pd-partdesc-use.patch.
>
> +1 to pd-partdesc-use.patch.

OK, that makes 2 votes for that alternative and 0 for everything else
combined, so I've committed that version.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Auditing via logical decoding

2018-07-27 Thread Tomas Vondra




On 07/27/2018 12:44 PM, Philip Scott wrote:

Hi Postgres Hackers,

We have been using our own trigger-based audit system at my firm 
successfully for some years, but the performance penalty is starting to 
grate a bit and so I have been tasked with seeing if we can make use of 
the new logical decoding functions to achieve the same thing. I thought 
that someone must already have written something that would satisfy our 
use-case but my internet searches have come up short so far so I am 
considering writing a logical decoding plugin to do what we want.


I thought I would run the idea past you all here just in case my plan is 
crazy; I’ve browsed around the postgres source code a bit before but 
I’ve never really gotten my hands dirty and am a little bit nervous 
about putting my own C code into the heart of our DBMS so if this comes 
to anything I would like to offer my code up for review and/or possible 
inclusion as a contributed module.


A quick summary of requirements:

We want to log (to a separate, remote database)
   - One row for every transaction that changes the state of the database.
     We call this table ‘audit_entry’; it contains the xid, transaction
timestamp, username, client hostname, and application name of the
session that caused the change.
   - One row for each change made by each transaction, which records the
state of the tuple before the change.
     We call this table ‘audit_detail’; it contains the xid, statement
timestamp, table name & schema, event_type, primary_key (hstore),
old_row (hstore), and the text of the query that was responsible for the
change.




Have you checked pgaudit [1]? I haven't checked if it matches all your 
requirements, but considering it's an extension aimed at auditing use 
cases it might. And it's already available, of course.


[1] https://www.pgaudit.org/

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Robert Haas
On Wed, Jul 25, 2018 at 10:53 AM, David Fetter  wrote:
> What made PostgreSQL attractive to those companies in the first place
> was a known lack of need to have Extensive Conversations with Legal™
> about licensing and other financial/IP matters.

If you think that the lack of a CLA and a patent grant never causes
extensive conversations with legal, I am quite certain that you are
incorrect.  I know of multiple instances where this has been a
concern.

Other open source projects even more prominent and successful than
PostgreSQL have done these things.  You (and some others) seem
confident that we know better, an attitude that I just don't
understand.  It's not as if we have some blue ribbon panel coming up
with best practices which we then implement.  We're just doing the
same thing that we have been doing for 20+ years because that's the
way we've always done it.  It's probably not a bad process, because
it's gotten us this far and this is a pretty cool place to be, but
processes can evolve and get better, just as the code itself has
done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] logical decoding of two-phase transactions

2018-07-27 Thread Nikhil Sontakke
>> PFA, latest patchset, which completely removes the earlier
>> LogicalLock/LogicalUnLock implementation using groupDecode stuff and
>> uses the newly suggested approach of checking the currently decoded
>> XID for abort in systable_* API functions. Much simpler to code and
>> easier to test as well.
>
> So, leaving the fact that it might not actually be correct aside ;), you
> seem to be ok with the approach?
>

;-)

Yes, I do like the approach. Do you think there are locations other
than the systable_* APIs which might need such checks?


>> There's an additional test case in
>> 0005-Additional-test-case-to-demonstrate-decoding-rollbac.patch which
>> uses a sleep in the "change" plugin API to allow a concurrent rollback
>> on the 2PC being currently decoded. Andres generally doesn't like this
>> approach :-), but there are no timing/interlocking issues now, and the
>> sleep just helps us do a concurrent rollback, so it might be ok now,
>> all things considered. Anyways, it's an additional patch for now.
>
> Yea, I still don't think it's ok. The tests won't be reliable.  There's
> ways to make this reliable, e.g. by forcing a lock to be acquired that's
> externally held or such. Might even be doable just with a weird custom
> datatype.
>

Ok, I will look at ways to do away with the sleep.


>> diff --git a/src/backend/access/index/genam.c 
>> b/src/backend/access/index/genam.c
>> index 9d08775687..67c5810bf7 100644
>> --- a/src/backend/access/index/genam.c
>> +++ b/src/backend/access/index/genam.c
>> @@ -423,6 +423,16 @@ systable_getnext(SysScanDesc sysscan)
>>   else
>>   htup = heap_getnext(sysscan->scan, ForwardScanDirection);
>>
>> + /*
>> +  * If CheckXidAlive is valid, then we check if it aborted. If it did, 
>> we
>> +  * error out
>> +  */
>> + if (TransactionIdIsValid(CheckXidAlive) &&
>> + TransactionIdDidAbort(CheckXidAlive))
>> + ereport(ERROR,
>> + (errcode(ERRCODE_TRANSACTION_ROLLBACK),
>> +  errmsg("transaction aborted during system 
>> catalog scan")));
>> +
>>   return htup;
>>  }
>
> Don't we have to check TransactionIdIsInProgress() first? C.f. header
> comments in tqual.c.  Note this is also not guaranteed to be correct
> after a crash (where no clog entry will exist for an aborted xact), but
> we probably shouldn't get here in that case - but better be safe.
>
> I suspect it'd be better reformulated as
>   TransactionIdIsValid(CheckXidAlive) &&
>   !TransactionIdIsInProgress(CheckXidAlive) &&
>   !TransactionIdDidCommit(CheckXidAlive)
>
> What do you think?
>

tqual.c does seem to mention this for a non-MVCC snapshot, so we might as
well do it this way. The caching of the fetched XID should not make these
checks too expensive anyway.

>
> I think it'd also be good to add assertions to codepaths not going
> through systable_* asserting that
> !TransactionIdIsValid(CheckXidAlive). Alternatively we could add an
> if (unlikely(TransactionIdIsValid(CheckXidAlive)) && ...)
> branch to those too.
>

I was wondering if anything else would be needed for user-defined
catalog tables..

>
>
>> From 80fc576bda483798919653991bef6dc198625d90 Mon Sep 17 00:00:00 2001
>> From: Nikhil Sontakke 
>> Date: Wed, 13 Jun 2018 16:31:15 +0530
>> Subject: [PATCH 4/5] Teach test_decoding plugin to work with 2PC
>>
>> Includes a new option "enable_twophase". Depending on this option's
>> value, PREPARE TRANSACTION will either be decoded or treated as
>> a single phase commit later.
>
> FWIW, I don't think I'm ok with doing this on a per-plugin-option basis.
> I think this is something that should be known to the outside of the
> plugin. More similar to how binary / non-binary support works.  Should
> also be able to inquire the output plugin whether it's supported (cf
> previous similarity).
>

Hmm, lemme see if we can do it outside of the plugin. But note that a
plugin might want to decode some 2PC transactions at prepare time and
others at "commit prepared" time.

We also need to take care to not break logical replication if the
other node is running non-2PC enabled code. We tried to optimize the
COMMIT/ABORT handling by adding sub flags to the existing protocol. I
will test that as well.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Robert Haas
On Thu, Jul 26, 2018 at 11:00 PM, Stephen Frost  wrote:
> This is a killer point here- clearly the people who have been
> contributing to PG aren't going to complain about their contributions
> being released as part of some other work which has a different license
> or they'd have gone after the many, many, many closed-source and
> proprietary and commercial forks of PG.

Yes.  Tom's argument doesn't work at all, for this reason among
others.  Other projects have done things like this, I'm pretty sure,
so we could, too.

> I have to say that I do like the idea of the explicit patent grant which
> is included in the Apache 2 license.  I'm a bit concerned about how it's
> just a heck of a lot more text for some, but I doubt there's a simpler
> way to get there with a license that's been actually reviewed by modern
> lawyers and which covers what we'd like.

I agree.  The argument that is being made by Tom, Bruce, Dave, and
David Fetter -- not to conflate them all, but they seem to be broadly
on the same wavelength -- is that we should just reject contributions
if the contributor says that there is a patent on anything in that
contribution, or if we find out that this is the case.  Then, they
allege, we will have no problem with patents.  I am not a lawyer, but
I don't think it works that way.  Even if PostgreSQL never accepts a
contribution that is known to be covered by a patent, somebody can
still find something in there that they think is covered by some
patent they own and start suing people.  The point of the Apache 2
license (and similar ones) is to make that sort of thing less likely
to happen and less likely to be successful.  It is not bullet-proof,
of course, but it is intended to help.

Saying "as long as we don't handle any patents we won't have any
trouble with patents" is roughly as credible as the same sentence
would be if you replaced "patents" with "nuclear weapons" or "knives"
or "guns" or "toxic waste".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Explain buffers wrong counter with parallel plans

2018-07-27 Thread Amit Kapila
On Thu, Jul 26, 2018 at 7:31 PM, Jonathan S. Katz
 wrote:
>
>> On Jul 25, 2018, at 10:25 PM, Amit Kapila  wrote:
>>
>>
>> You mean to say the number (Buffers: shared read=442478) in HEAD,
>> right?  If so, yeah, I am also wondering why the results of the patch
>> are different in HEAD and 11beta2.  So, if I read correctly, the
>> numbers in 11beta2 appears correct, but on HEAD it is not correct, it
>> should have divided the buffers read by loops.

I want to correct myself here: the numbers on HEAD are correct, but
not on PG11beta2.  Is there any chance that you either forgot to apply
the patch, or that it is not using the correct binaries in the case of
11beta2?

>>  I will figure that
>> out, but this is just to clarify that both of us are seeing the
>> results in the same way.
>
> I’ll try it again but patch it against 11beta2 and see what results I get.
>
>>
>>> and there appears to be a performance hit.
>>>
>>
>> Do you mean to say the performance of the same query in 11beta2 and
>> HEAD or something else?
>
> Correct. But per your advice let me try running it against a patched
> version of 11beta2 and see what happens.
>

Yeah, that would be better.  Today, I have tried the patch on both
HEAD and PG11 and I am getting the same, correct results.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Removing unneeded self joins

2018-07-27 Thread Alexander Kuzmenkov
Here is a current version of the patch, still rather experimental. Since 
the previous version, I fixed some bugs and added the possibility to 
remove a relation even when it is mentioned in target lists. I have to 
rewrite all references to the removed relation in targetlists and the 
equivalence classes, so that they point to the remaining relation. I 
change RestrictInfos in place, and update attr_needed and reltarget of 
the remaining relation. I also update equivalence members, and delete 
equivalence classes that become single-member.


I'm posting it as a single file now, because all the meaningful changes are
in analyzejoins.c anyway.
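
For anyone catching up on the thread, this is the shape of query the patch
targets (a sketch): an inner self join on all columns of a unique index,
here the primary key, which can be collapsed into a single scan carrying
both sides' restrictions.

    CREATE TABLE t (id int PRIMARY KEY, payload text);

    EXPLAIN (COSTS OFF)
    SELECT a.payload
    FROM t a
    JOIN t b ON a.id = b.id
    WHERE b.payload LIKE 'x%';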


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 2f2be5d5efa0fcef38a0d6dd1225c73a377f88b8 Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov 
Date: Wed, 13 Jun 2018 20:56:03 +0300
Subject: [PATCH] Remove unique self joins.

Remove inner joins of a relation to itself on all columns of some unique
index. Such a join can be replaced with a scan that uses the combined
restrictions from both sides. We can build the required proofs of
uniqueness using the existing innerrel_is_unique machinery with slight
modifications.
---
 src/backend/nodes/equalfuncs.c|   7 +-
 src/backend/optimizer/path/indxpath.c |  23 +-
 src/backend/optimizer/path/joinpath.c |   6 +-
 src/backend/optimizer/plan/analyzejoins.c | 819 +++---
 src/backend/optimizer/plan/planmain.c |   7 +-
 src/backend/optimizer/util/pathnode.c |   3 +-
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/include/nodes/relation.h  |   6 +-
 src/include/optimizer/pathnode.h  |   5 +
 src/include/optimizer/paths.h |  14 +-
 src/include/optimizer/planmain.h  |  10 +-
 src/test/regress/expected/join.out|  84 ++-
 src/test/regress/sql/join.sql |  34 ++
 13 files changed, 937 insertions(+), 107 deletions(-)

diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 6a971d0..fd02dc8 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -166,9 +166,10 @@ _equalVar(const Var *a, const Var *b)
 	COMPARE_SCALAR_FIELD(vartypmod);
 	COMPARE_SCALAR_FIELD(varcollid);
 	COMPARE_SCALAR_FIELD(varlevelsup);
-	COMPARE_SCALAR_FIELD(varnoold);
-	COMPARE_SCALAR_FIELD(varoattno);
-	COMPARE_LOCATION_FIELD(location);
+	/*
+	 * Parse location and old varno/varattno may differ even when
+	 * the variables are logically the same.
+	 */
 
 	return true;
 }
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index f295558..4daf3bf 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2964,7 +2964,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
  * relation_has_unique_index_for
  *	  Determine whether the relation provably has at most one row satisfying
  *	  a set of equality conditions, because the conditions constrain all
- *	  columns of some unique index.
+ *	  columns of some unique index. If index_info is not null, it is set to
+ *	  point to a new UniqueIndexInfo containing the index and conditions.
  *
  * The conditions can be represented in either or both of two ways:
  * 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -2985,7 +2986,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
-			  List *exprlist, List *oprlist)
+			  List *exprlist, List *oprlist,
+			  UniqueIndexInfo **index_info)
 {
 	ListCell   *ic;
 
@@ -3041,6 +3043,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List *matched_restrictlist = NIL;
 
 		/*
 		 * If the index is not unique, or not immediately enforced, or if it's
@@ -3089,6 +3092,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 if (match_index_to_operand(rexpr, c, ind))
 {
 	matched = true; /* column is unique */
+	matched_restrictlist = lappend(matched_restrictlist, rinfo);
 	break;
 }
 			}
@@ -3131,7 +3135,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 
 		/* Matched all columns of this index? */
 		if (c == ind->ncolumns)
+		{
+			if (index_info != NULL)
+			{
+/* This may be called in GEQO memory context. */
+MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+*index_info = palloc(sizeof(UniqueIndexInfo));
+(*index_info)->index = ind;
+(*index_info)->clauses = list_copy(matched_restrictlist);
+MemoryContextSwitchTo(oldContext);
+			}
+			if (matched_restrictlist)
+pfree(matched_restrictlist);
 			return true;
+		}
+		if (matched_restrictlist)
+			pfree(matched_restrictlist);
 	}
 
 	return false;
diff --git 

Re: partition tree inspection functions

2018-07-27 Thread Jesper Pedersen

Hi Amit,

On 07/26/2018 10:33 PM, Amit Langote wrote:

Optional parameter sounds good, so made it get_partition_level(regclass [
, regclass ]) in the updated patch.  Although, adding that argument is not
without possible surprises that its result might evoke.  Like, what happens if
you try to find the level of the root table by passing a leaf partition
oid for the root table argument, or pass a totally unrelated table for the
root table argument.  For now, I've made the function return 0 for such cases.



As 0 is a valid return value for root nodes, I think we should use -1
instead for these cases.
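
To illustrate (purely hypothetical usage, based on the proposed signature
and the suggestion above):

    CREATE TABLE parted (a int) PARTITION BY LIST (a);
    CREATE TABLE parted_1 PARTITION OF parted FOR VALUES IN (1);

    SELECT get_partition_level('parted');              -- 0: parted is a root
    SELECT get_partition_level('parted_1', 'parted');  -- 1: one level below the root
    SELECT get_partition_level('parted', 'parted_1');  -- inverted/unrelated pair:
                                                       -- 0 today, -1 as suggested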


Otherwise looks good.

Best regards,
 Jesper



Re: pgbench - very minor bug fix on hash() missing argument

2018-07-27 Thread Fabien COELHO


Hello Michaël,


Thanks, committed and back-patched.


Ok.


I have added some tests for least() and greatest() on the way.


Good!

Thanks,

--
Fabien.

Inconsistent error message in bt_page_items_bytea().

2018-07-27 Thread Ashutosh Sharma
Hi All,

All the pageinspect functions dealing with raw pages have the error
message "must be superuser to use raw page function"; however, that's
not true for bt_page_items_bytea(), which has "must be superuser to use
pageinspect functions". This seems to me like a copy-paste error that
got carried over from bt_page_items (the function that doesn't deal with
a raw page).
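
For reference, the variant in question is the one reached through a raw
page, e.g. (the index name and block number are placeholders):

    SELECT * FROM bt_page_items(get_raw_page('some_index', 1));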

Attached is the patch with the correct error message.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 90acf6a..184ac62 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -429,7 +429,7 @@ bt_page_items_bytea(PG_FUNCTION_ARGS)
 	if (!superuser())
 		ereport(ERROR,
 (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- (errmsg("must be superuser to use pageinspect functions";
+ (errmsg("must be superuser to use raw page functions";
 
 	if (SRF_IS_FIRSTCALL())
 	{


My Skype account (korotkovae) was hacked

2018-07-27 Thread Alexander Korotkov
Hi!

Some community members have communicated with me using my Skype account
(korotkovae).  Today this account was hacked.  You should know that
any requests to send money coming from my account are fraud.
Sorry for the inconvenience.  I've sent a request to restore my Skype
account; it will be under consideration in the next 24 hours.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Auditing via logical decoding

2018-07-27 Thread Jeremy Finzel
On Fri, Jul 27, 2018 at 5:41 AM Philip Scott 
wrote:

> Hi Postgres Hackers,
>
> We have been using our own trigger-based audit system at my firm
> successfully for some years, but the performance penalty is starting to
> grate a bit and so I have been tasked with seeing if we can make use of
> the new logical decoding functions to achieve the same thing. I thought
> that someone must already have written something that would satisfy our
> use-case but my internet searches have come up short so far so I am
> considering writing a logical decoding plugin to do what we want.
>
> I thought I would run the idea past you all here just in case my plan is
> crazy; I’ve browsed around the postgres source code a bit before but
> I’ve never really gotten my hands dirty and am a little bit nervous
> about putting my own C code into the heart of our DBMS so if this comes
> to anything I would like to offer my code up for review and/or possible
> inclusion as a contributed module.
>
> A quick summary of requirements:
>
> We want to log (to a separate, remote database)
>- One row for every transaction that changes the state of the
> database.
>  We call this table ‘audit_entry’; it contains the xid, transaction
> timestamp, username, client hostname, and application name of the
> session that caused the change.
>- One row for each change made by each transaction which records the
> state of the tuple before the change.
>  We call this table ‘audit_detail’; it contains the xid, statement
> timestamp, table name & schema, event_type, primary_key (hstore),
> old_row (hstore), and the text of the query that was responsible for the
> change.
>
> A lot of that information is available already by listening to the
> pgoutput decoding, and my first thought was that I could just write a
> receiver for that. However, application name, username, client hostname
> and current_query() are not available. This is understandable as they
> aren’t useful for logical replication.
>
> I was about to give up, when I discovered pg_logical_emit_message.
>
> My current thoughts are to:
>- Write this extra data into a logical message while the transaction
> is still in progress
>  Either with a deferred trigger per table or, perhaps better,
> find some global commit-time (or xid-assignment time) hook and emit it
> there
>
>- Then get the information out of the database:
>  Either modify the existing pgoutput plugin & protocol to forward
> such messages in its stream,
>  Or write a dedicated ‘audit’ decoding plugin with its own protocol
>
>- Then get the information into the ‘auditing’ database:
>  Either with some standalone process that connects to both, consumes
> the output created above, translates it to SQL to run in the auditing
> DB.
>  Or figure out how to create a proper postgres background process to do
> it, in a similar fashion to the logical replication worker
>
> Any input you folks have would be very much appreciated.
>
> Kind Regards,
>
> Philip
>
> PS: If there is someone out there who is willing & able to build this
> for less than my company will have to pay me to do it, please drop me a
> line ☺


All I can say is +1: this would be an awesome feature to have, and I hope to
see it someday.


Auditing via logical decoding

2018-07-27 Thread Philip Scott

Hi Postgres Hackers,

We have been using our own trigger-based audit system at my firm 
successfully for some years, but the performance penalty is starting to 
grate a bit and so I have been tasked with seeing if we can make use of 
the new logical decoding functions to achieve the same thing. I thought 
that someone must already have written something that would satisfy our 
use-case but my internet searches have come up short so far so I am 
considering writing a logical decoding plugin to do what we want.


I thought I would run the idea past you all here just in case my plan is 
crazy; I’ve browsed around the postgres source code a bit before but 
I’ve never really gotten my hands dirty and am a little bit nervous 
about putting my own C code into the heart of our DBMS so if this comes 
to anything I would like to offer my code up for review and/or possible 
inclusion as a contributed module.


A quick summary of requirements:

We want to log (to a separate, remote database)
  - One row for every transaction that changes the state of the
database.
    We call this table ‘audit_entry’; it contains the xid, transaction
timestamp, username, client hostname, and application name of the
session that caused the change.
  - One row for each change made by each transaction, which records the
state of the tuple before the change.
    We call this table ‘audit_detail’; it contains the xid, statement
timestamp, table name & schema, event_type, primary_key (hstore),
old_row (hstore), and the text of the query that was responsible for the
change.
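
As a rough DDL sketch of the above (the column types are my assumptions,
nothing is settled):

    CREATE EXTENSION IF NOT EXISTS hstore;

    CREATE TABLE audit_entry (
        xid              bigint      PRIMARY KEY,
        txn_timestamp    timestamptz NOT NULL,
        username         text,
        client_hostname  text,
        application_name text
    );

    CREATE TABLE audit_detail (
        xid              bigint      NOT NULL REFERENCES audit_entry (xid),
        stmt_timestamp   timestamptz NOT NULL,
        table_schema     text        NOT NULL,
        table_name       text        NOT NULL,
        event_type       text        NOT NULL,
        primary_key      hstore,
        old_row          hstore,
        query_text       text
    );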


A lot of that information is available already by listening to the 
pgoutput decoding, and my first thought was that I could just write a 
receiver for that. However, application name, username, client hostname 
and current_query() are not available. This is understandable as they 
aren’t useful for logical replication.


I was about to give up, when I discovered pg_logical_emit_message.
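
A sketch of what I mean, emitting the session metadata as a transactional
logical message so that a decoding plugin sees it alongside the changes of
the same transaction:

    SELECT pg_logical_emit_message(
        true,      -- transactional
        'audit',   -- prefix chosen by us
        json_build_object(
            'user',             session_user,
            'application_name', current_setting('application_name'),
            'client_addr',      inet_client_addr()::text
        )::text
    );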

My current thoughts are to:
  - Write this extra data into a logical message while the transaction
is still in progress

    Either with a deferred trigger per table or, perhaps better,
    find some global commit-time (or xid-assignment time) hook and emit it there

  - Then get the information out of the database:
Either modify the existing pgoutput plugin & protocol to forward 
such messages in its stream,

Or write a dedicated ‘audit’ decoding plugin with its own protocol

  - Then get the information into the ‘auditing’ database:
Either with some standalone process that connects to both, consumes 
the output created above, translates it to SQL to run in the auditing 
DB.
Or figure out how to create a proper postgres background process to do
it, in a similar fashion to the logical replication worker


Any input you folks have would be very much appreciated.

Kind Regards,

Philip

PS: If there is someone out there who is willing & able to build this 
for less than my company will have to pay me to do it, please drop me a 
line ☺





Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2018-07-27 Thread Ashutosh Bapat
On Thu, Jul 26, 2018 at 10:30 PM, Dean Rasheed  wrote:
> On 26 July 2018 at 07:12, Ashutosh Bapat
>  wrote:
>> In the patch clauselist_selectivity() gets called repeatedly for every
>> new qual added to the clause list. Instead, if we try to combine the
>> cost/row estimation with order_qual_clauses() or
>> clauselist_selectivity(), we might be able to what the current patch
>> does in O(n). clauselist_selectivity() calculates combined selectivity
>> of all the given quals in O(n). We should do something similar to that
>> in this case.
>
> Duplicating the logic of clauselist_selectivity() seems like quite a
> lot of effort to go to just for improved filter cost estimates. Note
> also that clauselist_selectivity() might get a good deal more complex
> with multivariate stats.

I am not suggesting duplicating the code in clauselist_selectivity().
Instead I am suggesting that we incorporate costing into that function.
I don't know how feasible that is.

>
> Perhaps a reasonable simplification would be to just treat the clauses
> as independent when estimating the filter cost, and so use the
> following as a "good enough" estimate for the filter cost:
>
>   cost(A) + sel(A) * cost(B) + sel(A) * sel(B) * cost(C) + ...
>
> That would probably be an improvement over what we currently have. It
> would be O(n) to compute, and it would probably use the cached
> selectivity estimates for the clauses.

That looks like a good trade-off. But looking at the following comment
in clause_selectivity(), I doubt whether this will work in all cases:
/*
 * If possible, cache the result of the selectivity calculation for
 * the clause.  We can cache if varRelid is zero or the clause
 * contains only vars of that relid --- otherwise varRelid will affect
 * the result, so mustn't cache.  Outer join quals might be examined
 * with either their join's actual jointype or JOIN_INNER, so we need
 * two cache variables to remember both cases.  Note: we assume the
 * result won't change if we are switching the input relations or
 * considering a unique-ified case, so we only need one cache variable
 * for all non-JOIN_INNER cases.
 */


>
> Note also that with this simplified expression for the filter cost, it
> would be possible to improve order_qual_clauses() to take into account
> the clause selectivities when sorting the clauses, and minimise the
> cost of the filter by evaluating more selective clauses sooner, if
> they're not too expensive. I believe that amounts to sorting by
> cost/(1-sel) rather than just cost, except for clauses with sel==1,
> which it makes sense to move to the end, and just sort by cost.

+1, if we could do that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



RE: Temporary tables prevent autovacuum, leading to XID wraparound

2018-07-27 Thread Tsunakawa, Takayuki
Thank you, Michael and Horiguchi-san,

From: Michael Paquier [mailto:mich...@paquier.xyz]
> autovacuum.c is a pretty bad place for this stuff, as namespace.c holds all the
> logic related to temporary namespaces, so I renamed the routine to
> isTempNamespaceInUse and moved it there.

I don't have a strong opinion, but I wonder which of namespace.c or 
autovacuum.c is suitable, because isTempNamespaceInUse is specific to 
autovacuum.


Regards
Takayuki Tsunakawa
 






Re: [PATCH] Improve geometric types

2018-07-27 Thread Kyotaro HORIGUCHI
Thank you for taking this.

At Thu, 26 Jul 2018 17:12:50 +0200, Tomas Vondra  
wrote in <672f4c42-6742-c1ec-e9a4-1994b815a...@2ndquadrant.com>
> On 07/11/2018 07:13 PM, Emre Hasegeli wrote:
> > New versions are attached after the  patch got in.  I noticed
> > tests failing on Windows [1] and added alternative .out file.
> > [1]
> > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.5235
> > 
> 
> The version posted about two weeks ago is slightly broken - AFAICS the
> float.h includes in geo_ops.c and gistproc.c need to be part of 0002,
> not 0003. Attached is a version fixing that.
> 
> Barring objections, I'll get this committed over the next few days,
> once I review all the individual parts once more. I'm planning to
> commit the 6 parts separately, as submitted. No backpatching, as
> discussed before.

+1 for no backpatching, and not merging into single big patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Early WIP/PoC for inlining CTEs

2018-07-27 Thread David Fetter
On Fri, Jul 27, 2018 at 02:55:26PM +1200, Thomas Munro wrote:
> On Thu, Jul 26, 2018 at 7:14 AM, David Fetter  wrote:
> > Please find attached the next version, which passes 'make check'.
> 
> ... but not 'make check-world' (contrib/postgres_fdw's EXPLAIN is different).

Please find attached a patch that does.

It doesn't always pass make installcheck-world, but I need to sleep
rather than investigate that at the moment.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From f2c2b80653056058b7fb799bb8b9d2c0443ddd5f Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 24 Jul 2018 22:09:55 -0700
Subject: [PATCH] Inlining CTEs v0004
To: pgsql-hack...@postgresql.org

---
 .../postgres_fdw/expected/postgres_fdw.out|  22 +--
 src/backend/nodes/copyfuncs.c |   1 +
 src/backend/nodes/equalfuncs.c|   1 +
 src/backend/nodes/outfuncs.c  |   1 +
 src/backend/nodes/readfuncs.c |   1 +
 src/backend/optimizer/plan/subselect.c| 152 ++
 src/include/nodes/parsenodes.h|   1 +
 src/test/regress/expected/rowsecurity.out |  92 +--
 src/test/regress/expected/rowtypes.out|  13 +-
 src/test/regress/expected/rules.out   |   6 +-
 10 files changed, 209 insertions(+), 81 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f5498c62bd..8bf011df46 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1889,21 +1889,15 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
 WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
- QUERY PLAN  
--
+QUERY PLAN
+--
  Limit
-   Output: t.c1_1, t.c2_1, t.c1_3
-   CTE t
- ->  Foreign Scan
-   Output: t1.c1, t1.c3, t2.c1
-   Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-   Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1"
-   ->  Sort
- Output: t.c1_1, t.c2_1, t.c1_3
- Sort Key: t.c1_3, t.c1_1
- ->  CTE Scan on t
-   Output: t.c1_1, t.c2_1, t.c1_3
-(12 rows)
+   Output: t1.c1, t2.c1, t1.c3
+   ->  Foreign Scan
+ Output: t1.c1, t2.c1, t1.c3
+ Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+ Remote SQL: SELECT r2."C 1", r3."C 1", r2.c3 FROM ("S 1"."T 1" r2 INNER JOIN "S 1"."T 1" r3 ON (((r2."C 1" = r3."C 1" ORDER BY r2.c3 ASC NULLS LAST, r2."C 1" ASC NULLS LAST
+(6 rows)
 
 WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  c1_1 | c2_1 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 17b650b8cb..8ba2fc1aed 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2530,6 +2530,7 @@ _copyCommonTableExpr(const CommonTableExpr *from)
 	COPY_NODE_FIELD(ctequery);
 	COPY_LOCATION_FIELD(location);
 	COPY_SCALAR_FIELD(cterecursive);
+	COPY_SCALAR_FIELD(ctematerialized);
 	COPY_SCALAR_FIELD(cterefcount);
 	COPY_NODE_FIELD(ctecolnames);
 	COPY_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 378f2facb8..2a04506d0d 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2793,6 +2793,7 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
 	COMPARE_NODE_FIELD(ctequery);
 	COMPARE_LOCATION_FIELD(location);
 	COMPARE_SCALAR_FIELD(cterecursive);
+	COMPARE_SCALAR_FIELD(ctematerialized);
 	COMPARE_SCALAR_FIELD(cterefcount);
 	COMPARE_NODE_FIELD(ctecolnames);
 	COMPARE_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index a6454ce28b..0ea6b7125a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3080,6 +3080,7 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
 	WRITE_NODE_FIELD(ctequery);
 	

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-07-27 Thread Ashutosh Bapat
On Thu, Jul 26, 2018 at 8:37 PM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> On Mon, 23 Jul 2018 at 10:38, Ashutosh Bapat 
>>  wrote:
>>
>> On Fri, Jul 20, 2018 at 3:13 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> >
>> > It's of course wrong, it's going to be O(max(m, n)) as you said, but
>> > the point is still valid - if we have partitions A1, A2 from one side
>> > and B1, ..., BN on another side, we can skip the unnecessary
>> > partitions from B that are between e.g. A1 and A2 faster with
>> > binary search.
>>
>> That's possible only when there is no default partition and the join
>> is an inner join. For an outer join, we need to include all the
>> partitions on the outer side, so we can't just skip over some
>> partitions. In case of a default partition, it can take place of a
>> missing partition, so we can't skip partitions using binary search.
>
> But in this case I described above (when we have a partition A3 on one side,
> and another matching partition B3 from other side, but separated by some other
> partitions, e.g. B1, B2) it means that we will merge A3 with a default
> partition from other side without actually needing that, am I right? In this
> sense it's an overhead out of nothing.

No. We will join A3 with B3 since they have matching bounds. We will
compare B1's and B2's bounds with A3's (assuming that there are no bounds
before A3). Since they won't be compatible, we will match the default
partition of A to B1 and B2. That will of course fail, since we will try
to match multiple partitions to a single partition, but that's not the
point of your question, I think. You are right that we could skip
comparing A3 with B1 and B2 and directly land on B3. Any partitions
skipped in between will be matched with A's default partition. But as I
have said, this would be rare, and complicating the logic for a rare case
doesn't seem practical at this stage.

Apart from the complexity, there's also a possibility that this
skipping will actually reduce efficiency in normal cases. Consider
a case where A and B have exactly matching partitions. The current
partition matching algorithm compares any given range/bound only once,
so we have an O(n) algorithm. If we use a binary search, however, for
every range comparison, it becomes an O(n log n) algorithm. There
will be unnecessary comparisons during the binary search. The current
algorithm is always O(n), whereas binary search would be O(n log(n))
with a possibility of having sub-O(n) complexity in some cases. I
would go for an algorithm which has a consistent time complexity
and which is efficient in normal cases, rather than worrying
about some cases which are not practical.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Speeding up INSERTs and UPDATEs to partitioned tables

2018-07-27 Thread Amit Langote
On 2018/07/27 1:19, David Rowley wrote:
> On 18 July 2018 at 20:29, Amit Langote  wrote:
>> Let me know what you think of the code in the updated patch.
> 
> Thanks for sending the updated patch.
> 
> I looked over it tonight and made a number of changes:
> 
> 1) Got rid of PARTITION_ROUTING_MAXSIZE. The code using this was
> useless since the int would have wrapped long before it reached
> UINT_MAX.

Oops, you're right.

> There's no shortage of other code doubling the size of an
> array by multiplying it by 2 unconditionally without considering
> overflowing an int. Unsure why you considered this more risky.

Just ill-informed paranoia on my part.  Let's just drop it as you say,
given also Tom's comment that repalloc would fail anyway for requests
over 1GB.

> 2) Fixed a series of bugs regarding the size of the arrays in
> PartitionTupleRouting. The map arrays and the partitions array could
> differ in size despite your comment that claimed
> child_parent_tupconv_maps was the same size as 'partitions' when
> non-NULL. The map arrays being a different size than the partitions
> array caused the following two cases to segfault. I've included two
> cases as it was two seperate bugs that caused them.
> 
> -- case 1

[  ]

> -- case 2

Indeed, there were some holes in the logic that led me to come up with
that code.

> 3) Got rid of ExecUseUpdateResultRelForRouting. I started to change
> this to remove references to UPDATE in order to make it more friendly
> towards other possible future node types that it would get used for
> (aka MERGE). In the end, I found that performance could regress when
> in cases like:
> 
> drop table listp;
> create table listp (a int) partition by list(a);
> \o /dev/null
> \timing off
> select 'create table listp'||x::Text||' partition of listp for values
> in('||x::Text||');' from generate_series(1,1000) x;
> \gexec
> \o
> insert into listp select x from generate_series(1,999) x;
> \timing on
> update listp set a = a+1;
> 
> It's true that UPDATEs with a large number of subplans performance is
> quite terrible today in the planner, but this code made the
> performance of planning+execution a bit worse. If we get around to
> fixing the inheritance planner then I think
> ExecUseUpdateResultRelForRouting() could easily appear in profiles.
>
> I ended up rewriting it to just get called once and build a hash table
> by Oid storing a ResultRelInfo pointer.  This also gets rid of the
> slow nested loop in the cleanup operation inside
> ExecCleanupTupleRouting().

OK, looks neat, although I'd name the hash table subplan_resultrel_hash
(like join_rel_hash in PlannerInfo), instead of subplan_partition_table.

> 4) Did some tuning work in ExecFindPartition() getting rid of a
> redundant check after the loop completion. Also added some likely()
> and unlikely() decorations around some conditions.

All changes seem good.

> 5) Updated some newly out-dated comments since your patch in execPartition.h.
> 
> 6) Replaced the palloc0() in ExecSetupPartitionTupleRouting() with a
> palloc() updating the few fields that were not initialised. This might
> save a few TPS (at least once we get rid of the all partition locking)
> in the single-row INSERT case, but I've not tested the performance of
> this yet.
> 
> 7) Also moved and edited some comments above
> ExecSetupPartitionTupleRouting() that I felt explained a little too
> much about some internal implementation details.

Thanks, changes look good.

> One thing that I thought of, but didn't do was just having
> ExecFindPartition() return the ResultRelInfo. I think it would be much
> nicer in both call sites to not have to check the ->partitions array
> to get that.  The copy.c call site would need a few modifications
> around the detection code to see if the partition has changed, but it
> all looks quite possible to change. I left this for now as I have
> another patch which touches all that code that I feel is closer to
> commit than this patch is.

I had wondered about that too, but gave up on doing anything about it
because the callers of ExecFindPartition want to access other fields of
PartitionTupleRouting using the returned index.  Maybe, we could make it
return a ResultRelInfo * and also return the index itself using a separate
output argument.  Seems like a cosmetic improvement that can be made later.

> I've attached a delta of the changes I made since your v2 delta and
> also a complete updated patch.

Thanks.  Here are some other minor comments on the complete v2 patch.

-    tuple = ConvertPartitionTupleSlot(proute->parent_child_tupconv_maps[leaf_part_index],
+    tuple = ConvertPartitionTupleSlot(proute->parent_child_tupconv_maps ?
+                                      proute->parent_child_tupconv_maps[leaf_part_index] :
+                                      NULL,

This piece of code that's present in both ExecPrepareTupleRouting and
CopyFrom can be written as:

if (proute->parent_child_tupconv_maps)

Re: print_path is missing GatherMerge and CustomScan support

2018-07-27 Thread Michael Paquier
On Fri, Jul 27, 2018 at 03:40:48PM +0900, Etsuro Fujita wrote:
> (I think that at least currently, there is no need for the Gather and
> GatherMerge cases in reparameterize_path_by_child, but I don't object to
> keeping those as-is there.)

Let's keep them.  As far as my understanding goes, which is way lower
than any of yours by the way, those don't hurt and would automatically
help.
--
Michael




Re: print_path is missing GatherMerge and CustomScan support

2018-07-27 Thread Etsuro Fujita

(2018/07/27 4:50), Robert Haas wrote:

On Thu, Jul 26, 2018 at 1:14 AM, Etsuro Fujita
  wrote:

because we currently don't consider gathering partial child-scan or
child-join paths.  I think we might consider that in future, though.


You generally want to put the Gather node as high up in the plan tree
as possible.  I think the only case in which this is beneficial is if
you can't put the Gather or Gather Merge node above the Append because
only some of the children are parallel-safe.  In that case, a separate
Gather per child can be better than no parallelism at all.  It's a
rare case, but it can happen. Actually, I thought we had code for this
already: see the end of apply_scanjoin_target_to_paths().


Agreed.  Thanks for the explanation!

(I think that at least currently, there is no need for the Gather and 
GatherMerge cases in reparameterize_path_by_child, but I don't object to 
keeping those as-is there.)


Best regards,
Etsuro Fujita



Re: Temporary tables prevent autovacuum, leading to XID wraparound

2018-07-27 Thread Michael Paquier
On Thu, Jul 26, 2018 at 07:05:11PM +0900, Kyotaro HORIGUCHI wrote:
> At Wed, 18 Jul 2018 07:34:10 +, "Tsunakawa, Takayuki" 
>  wrote in 
> <0A3221C70F24FB45833433255569204D1FA538FD@G01JPEXMBYT05>
>> From: Michael Paquier [mailto:mich...@paquier.xyz]
>>> +   /* Does the backend own the temp schema? */
>>> +   if (proc->tempNamespaceId != namespaceID)
>>> +   return false;
>>> I have a very hard time believing that this is safe lock-less, and a spin
>>> lock would be enough it seems.
>> 
>> The lwlock in BackendIdGetProc() flushes the CPU cache so that
>> proc->tempNamespaceId reflects the latest value.  Or, do you mean
>> another spinlock elsewhere? 
> 
> It seems to be allowed that the series of checks on *proc results
> in a false positive, which is the safer side for this usage, even if it
> is not atomically updated. Actually ->databaseId is written
> without taking a lock.

Well, from postinit.c there are a couple of assumptions in this case, so
it is neither white nor black:
/*
 * Now we can mark our PGPROC entry with the database ID.
 *
 * We assume this is an atomic store so no lock is needed; though actually
 * things would work fine even if it weren't atomic.  Anyone searching the
 * ProcArray for this database's ID should hold the database lock, so they
 * would not be executing concurrently with this store.  A process looking
 * for another database's ID could in theory see a chance match if it read
 * a partially-updated databaseId value; but as long as all such searches
 * wait and retry, as in CountOtherDBBackends(), they will certainly see
 * the correct value on their next try.
 */
MyProc->databaseId = MyDatabaseId;

Anyway, I have spent some time on this patch, and it does a rather bad job
of explaining why it is fine to assume that the update can happen
lock-less.  The central argument seems to be that autovacuum cannot see
the newly created temporary schema, or any objects in it, when it scans
pg_class until the creating transaction commits, and the tracking field in
PGPROC is updated before that commit.

> backend_uses_temp_namespace takes two parameters, but the first one is
> always derived from the second.  backendID doesn't seem to be needed
> outside the function, so a single namespaceID parameter should be
> enough.

Yes, that reduces the number of calls to GetTempNamespaceBackendId.
autovacuum.c is a pretty bad place for this logic, as namespace.c holds
all the logic related to temporary namespaces, so I have renamed the
routine to isTempNamespaceInUse and moved it there.

The patch I have now is attached.  I have not been able to test it much,
particularly with orphaned temp tables and autovacuum, which is something
I'll try to look at this weekend or perhaps at the beginning of next week,
heading toward a commit if I am comfortable enough with it.  Please feel
free to look at it in the meantime.
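
For readers following along, here is a rough sketch of how the autovacuum
side could consult the new routine while scanning pg_class.  The attached
diff only shows the namespace.c hunks, so the surrounding names below
(classForm, relid, orphan_oids) are placeholders for illustration, not the
patch's actual autovacuum.c code:

    /* Illustrative only: inside do_autovacuum()'s scan of pg_class. */
    if (classForm->relpersistence == RELPERSISTENCE_TEMP)
    {
        if (isTempNamespaceInUse(classForm->relnamespace))
            continue;       /* owning backend is alive and still attached */

        /* Otherwise treat the relation as orphaned and remember it. */
        orphan_oids = lappend_oid(orphan_oids, relid);
    }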

Thanks,
--
Michael
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 0f67a122ed..0630032148 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -47,7 +47,7 @@
 #include "parser/parse_func.h"
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
-#include "storage/sinval.h"
+#include "storage/sinvaladt.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/catcache.h"
@@ -3204,6 +3204,42 @@ isOtherTempNamespace(Oid namespaceId)
 	return isAnyTempNamespace(namespaceId);
 }
 
+/*
+ * isTempNamespaceInUse - is the given namespace owned and actively used
+ * by a backend?
+ *
+ * Note: this is used by autovacuum to detect the presence of orphaned
+ * temporary tables, based on the active namespace tracked by PGPROC entries.
+ */
+bool
+isTempNamespaceInUse(Oid namespaceId)
+{
+	PGPROC	   *proc;
+	int			backendId;
+
+	backendId = GetTempNamespaceBackendId(namespaceId);
+
+	if (backendId == InvalidBackendId ||
+		backendId == MyBackendId)
+		return false;
+
+	/* Is the backend alive? */
+	proc = BackendIdGetProc(backendId);
+	if (proc == NULL)
+		return false;
+
+	/* Is the backend connected to the database being vacuumed? */
+	if (proc->databaseId != MyDatabaseId)
+		return false;
+
+	/* Does the backend own the temporary namespace? */
+	if (proc->tempNamespaceId != namespaceId)
+		return false;
+
+	/* all good to go */
+	return true;
+}
+
 /*
  * GetTempNamespaceBackendId - if the given namespace is a temporary-table
  * namespace (either my own, or another backend's), return the BackendId
@@ -3893,6 +3929,15 @@ InitTempTableNamespace(void)
 	myTempNamespace = namespaceId;
 	myTempToastNamespace = toastspaceId;
 
+	/*
+	 * Mark the proc entry as owning this namespace; autovacuum uses this to
+	 * decide whether a temporary table can be treated as orphaned or not.
+	 * Even though this store is done lock-less, it is safe: autovacuum
+	 * cannot yet see any temporary relations associated with this namespace
+	 * when scanning pg_class, as the creating transaction has not committed.
+	 */
+	MyProc->tempNamespaceId = namespaceId;
+
 	/* It should not be done already. */