Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-29 Thread Tom Lane
Alvaro Herrera  writes:
> ...  However, when you create an index, you can
> indicate which operator class to use, and it may not be the default one.
> If a different one is chosen at index creation time, then a query using
> COUNT(distinct) will do the wrong thing, because DISTINCT will select
> an equality operator using the type's default operator class, not the
> equality that belongs to the operator class used to create the index.

> That's wrong: DISTINCT should use the equality operator that corresponds
> to the index's operator class, not the default one.

Uh, what?  Surely the semantics of count(distinct x) *must not* vary
depending on what indexes happen to be available.

I think what you meant to say is that the planner may only choose an
optimization of this sort when the index's opclass matches the one
DISTINCT will use, ie the default for the data type.
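
(To illustrate with an assumed example; the type and opclass names below are
hypothetical, since few built-in opclasses actually differ in equality:)

    CREATE TABLE pk (id mytype);
    -- a non-default opclass chosen at index creation time:
    CREATE UNIQUE INDEX pk_id_alt ON pk (id mytype_alt_ops);
    -- count(DISTINCT id) still compares with mytype's default opclass
    -- equality, so the planner may use pk_id_alt for this optimization
    -- only if the two opclasses agree on equality.
    SELECT count(DISTINCT id) FROM pk;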

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

2017-07-29 Thread Tom Lane
I wrote:
> So the question is, does anyone care?  I wouldn't except that our
> documentation appears to claim that we work with Perl "5.8 or later".
> And the lack of field complaints suggests strongly that nobody else
> cares.  So I'm inclined to think we just need to be more specific
> about the minimum Perl version --- but what exactly?

I've done some more research on this.  It seems to me there are four
distinct components to any claim about whether we work with a particular
Perl version:

1. Can it run the build-related Perl scripts needed to build PG from
a bare git checkout, on non-Windows platforms?
2. Can it run our MSVC build scripts?
3. Does pl/perl compile (and pass its regression tests) against it?
4. Can it run our TAP tests?

I have no info to offer about point #2, but I did some tests with
ancient 5.8.x Perl versions on my buildfarm hosts prairiedog and
gaur.  (I would have liked to use something faster, but these Perl
versions failed to build at all on more recent platforms, and
I thought it rather pointless to try to hack them enough to build.)

I find that perl 5.8.0 and later seem to succeed at point #1,
but you need at least 5.8.1 to compile plperl.  Also, on prairiedog
5.8.1 fails plperl's regression tests because of some seemingly
utf8-related bug.  That may be an ancient-macOS problem more than
anything else, so I didn't poke at it too hard.

The really surprising thing I found out is that you can't run the
TAP tests on anything older than 5.8.3, because that's when they
added "prove".  I'm bemused by the idea that such a fundamental
facility would get added in a third-digit minor release, but there
you have it.  (Now, in fairness, this amounted to importing a feature
that already existed on CPAN, but still...)

5.8.3 does appear to succeed at points 1, 3, and 4.  Now, to get through
the TAP tests you need to install IPC::Run, and you also need to
update Test::More because the version shipped with 5.8.3 is too old.
But at least the failures that you get from lacking these are pretty
clear.

I am unable to confirm our claim that we work with Test::More 0.82,
because CPAN has only versions from a year or three back.  (Anyone
know of a more, er, comprehensive archive than CPAN?)  It's also
interesting to speculate about how old a version of IPC::Run is new
enough, but I see no easy way to get much data on that either.

Anyway, pending some news about compatibility of the MSVC scripts,
I think we ought to adjust our docs to state that 5.8.3 is the
minimum supported Perl version.

Also, I got seriously confused at one point during these tests because,
while our configure script carefully sets PERL to an absolute path name,
it's content to set PROVE to "prove", paying no attention to whether
that version of "prove" matches "perl".  Is it really okay to run the
TAP tests with a different perl version than we selected for other
purposes?  I think it'd be a good idea to insist that "prove" be in
the same directory we found "perl" in.
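
(A sketch of the kind of check meant, in shell; how configure would wire
this up is assumed:)

    # insist that "prove" lives in the same directory as the selected "perl"
    perldir=`dirname "$PERL"`
    if test -x "$perldir/prove"; then
        PROVE="$perldir/prove"
    else
        echo "no usable prove found alongside $PERL" >&2
    fi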

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-07-29 Thread Jeevan Ladhe
Hi Ashutosh,

0003 patch

> +parentRel = heap_open(parentOid, AccessExclusiveLock);
> In [2], Amit Langote has given a reason as to why heap_drop_with_catalog()
> should not heap_open() the parent relation. But this patch still calls
> heap_open() without giving any counter argument. Also I don't see
> get_default_partition_oid() using Relation anywhere. If you remove that
> heap_open() please remove following heap_close().
>

I think patch 0004 does exactly what you have said here, i.e. it gets
rid of the heap_open() and heap_close().
The question might be why I kept patch 0004 separate, and the answer is
that I wanted to make it easier to review, and also that keeping it that
way makes it a bit easier to work on a different approach if needed.

About this: "Also I don't see get_default_partition_oid() using Relation
anywhere."
get_default_partition_oid() uses the parent relation to retrieve the
PartitionDesc of the parent.
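
(Presumably something like this; a sketch, not the actual patch code:)

    /* the parent Relation gives access to its cached partition descriptor */
    PartitionDesc partdesc = RelationGetPartitionDesc(parentRel);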

Kindly let me know if you think I am still missing anything.

Regards,
Jeevan Ladhe


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-29 Thread Alvaro Herrera
Mark Rofail wrote:
> These are limitations of the patch ordered by importance:
> 
> ✗ presupposes that count(distinct y) has exactly the same notion of
> equality that the PK unique index has. In reality, count(distinct) will
> fall back to the default btree opclass for the array element type.

Operators are classified in operator classes; each data type may have
more than one operator class for a particular access method.  Exactly
one operator class for some access method can be designated as the
default one for a type.  However, when you create an index, you can
indicate which operator class to use, and it may not be the default one.
If a different one is chosen at index creation time, then a query using
COUNT(distinct) will do the wrong thing, because DISTINCT will select
an equality operator using the type's default operator class, not the
equality that belongs to the operator class used to create the index.

That's wrong: DISTINCT should use the equality operator that corresponds
to the index's operator class, not the default one.

I hope that made sense.

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: In-place index updates and HOT (Was: [HACKERS] Patch: Write Amplification Reduction Method (WARM))

2017-07-29 Thread Peter Geoghegan

Claudio Freire  wrote:

>> README.HOT says that that cost is not worth the benefit of
>> preventing a new index write, but I think that it ought to take into
>> account that not all index writes are equal. There is an appreciable
>> difference between inserting a new tuple, and updating one in-place. We
>> can remove the cost (hurting new snapshots by making them go through old
>> heap pages) while preserving most of the benefits (no logically
>> unnecessary index bloat).


> It's a neat idea.


Thanks.

I think it's important to both prevent index bloat, and to make sure
that only the latest version is pointed to within indexes. There are
only so many ways that that can be done. I've tried to come up with a
way of doing those two things that breaks as little of heapam.c as
possible. As a bonus, some kind of super-pruning of many linked HOT
chains may be enabled, which is something that an asynchronous process
can do when triggered by a regular prune within a user backend.

This is a kind of micro-vacuum that is actually much closer to VACUUM
than the kill_prior_tuple stuff, or traditional pruning, in that it
potentially kills index entries (just those that were not subsequently
updated in place, because the new values for the index differed), and
then kills heap tuples, all together, without even keeping around a stub
itemId in the heap. And, chaining together HOT chains also lets us chain
together pruning. Retail index tuple deletion from pruning needs to be
crash safe, unlike LP_DEAD setting.


> And, well, now that you mention, you don't need to touch indexes at all.
>
> You can create the new chain, and "update" the index to point to it,
> without ever touching the index itself, since you can repoint the old
> HOT chain's start line pointer to point to the new HOT chain, create
> a new pointer for the old one and point to it in the new HOT chain's
> t_tid.
>
> Existing index tuples thus now point to the right HOT chain without
> having to go into the index and make any changes.
>
> You do need the new HOT chain to live in the same page for this,
> however.


That seems complicated. The idea that I'm trying to preserve here is the
idea that the beginning of a HOT-chain (a definition that includes a
"potential HOT chain" -- a single heap tuple that could later receive a
HOT UPDATE) unambiguously signals a need for physical changes to indexes
in all cases. The idea that I'm trying to move away from is that those
physical changes need to be new index insertions (new insertions should
only happen when it is logically necessary, because indexed values
changed).

Note that this can preserve the kill_prior_tuple stuff, I think, because
if everything is dead within a single HOT chain (a HOT chain by our
current definition -- not a chain of HOT chains) then nobody can need
the index tuple. This does require adding complexity around aborted
transactions, whose new (potential) HOT chain t_tid "backpointer" is
still needed; we must revise the definition of a HOT chain being
all_dead to accommodate that. But for the most part, we preserve HOT
chains as a thing that garbage collection can independently reason
about, process with single page atomic operations while still being
crash safe, etc.

As far as microvacuum style garbage collection goes, at a high level,
HOT chains seem like a good choke point to do clean-up of both heap
tuples (pruning) and index tuples. The complexity of doing that seems
manageable. And by chaining together HOT chains, you can really
aggressively microvacuum many HOT chains on many pages within an
asynchronous process as soon as the long running transaction goes away.
We lean on temporal locality for garbage collection.

There are numerous complications that I haven't really acknowledged but
am at least aware of. For one, when I say "update in place", I don't
necessarily mean it literally. It's probably possible to literally
update in place with unique indexes. For secondary indexes, which should
still have heap TID as part of their keyspace (once you go implement
that, Claudio), we probably need an index insertion immediately followed
by an index deletion, often within the same leaf page.

I hope that this design, such as it is, will be reviewed as a thought
experiment. What would be good or bad about a design like this in the
real world, particularly as compared to alternatives that we know about?
Is *some* "third way" design desirable and achievable, if not this one?
By "third way" design, I mean a design that is much less invasive than
adopting UNDO for MVCC, that still addresses the issues that we
currently have with certain types of UPDATE-heavy workloads, especially
when there are long running transactions, etc. I doubt that WARM meets
this standard, unfortunately, because it doesn't do anything for cases
that suffer only due to a long running xact.

I don't accept that there is a rigid dichotomy between Postgres style
MVCC, and using UNDO for MVCC, and I most certainly don't accept that
garbage 

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-29 Thread Mark Rofail
These are limitations of the patch ordered by importance:

✗ presupposes that count(distinct y) has exactly the same notion of
equality that the PK unique index has. In reality, count(distinct) will
fall back to the default btree opclass for the array element type.

- Supported actions:
 ✔ NO ACTION
 ✔ RESTRICT
 ✗ CASCADE
 ✗ SET NULL
 ✗ SET DEFAULT

✗ coercion is unsupported, i.e. a numeric column can't reference int8

✗ Only one "ELEMENT" column allowed in a multi-column key

✗ undesirable dependency on default opclass semantics in the patch: it
supposes it can use array_eq() to detect whether or not the referencing
column has changed.  But I think that can be fixed without undue pain by
providing a refactored version of array_eq() that can be told which
element-comparison function to use

✗ cross-type FKs are unsupported

-- Resolved limitations --

✔ fatal performance issues.  If you issue any UPDATE or DELETE against the
PK table, you get a query like this for checking to see if the RI
constraint would be violated:
SELECT 1 FROM ONLY fktable x WHERE $1 = ANY (fkcol) FOR SHARE OF x;
/* Changed into:
   SELECT 1 FROM ONLY fktable x WHERE $1 @> fkcol FOR SHARE OF x; */
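
For what it's worth, an assumed illustration of why the containment form
helps: the default GIN opclass for arrays supports the containment
operators, so the rewritten check can use an index like the one below,
whereas "= ANY (fkcol)" cannot:

CREATE INDEX fktable_fkcol_idx ON fktable USING gin (fkcol);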

-- 

Can someone help me understand the first limitation?


Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-29 Thread Tom Lane
Andres Freund  writes:
> [ 0002-Move-ExecProcNode-from-dispatch-to-function-pointer-.patch ]

Here's a reviewed version of this patch.

I added dummy ExecProcNodeMtd functions to the various node types that
lacked them because they expect to be called through MultiExecProcNode
instead.  In the old coding, trying to call ExecProcNode on one of those
node types would have led to a useful error message; as you had it,
it'd have dumped core, which is not an improvement.
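
(A sketch of what such a dummy looks like; the exact function and message
wording here are assumptions:)

    static TupleTableSlot *
    ExecHash(PlanState *pstate)
    {
        elog(ERROR, "Hash node does not support ExecProcNode call convention");
        return NULL;    /* keep compiler quiet */
    }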

Also, I removed the ExecReScan stanza from ExecProcNodeFirst; that
should surely be redundant, because we should only get to that function
through ExecProcNode().  If somehow it's not redundant, please add a
comment explaining why not.

Some other minor cosmetic changes, mostly comment wordsmithing.

I think this (and the previous one) are committable.

regards, tom lane

diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 20fd9f8..d338cfe 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -17,15 +17,10 @@
  *-
  */
 /*
- *	 INTERFACE ROUTINES
- *		ExecInitNode	-		initialize a plan node and its subplans
- *		ExecProcNode	-		get a tuple by executing the plan node
- *		ExecEndNode		-		shut down a plan node and its subplans
- *
  *	 NOTES
  *		This used to be three files.  It is now all combined into
- *		one file so that it is easier to keep ExecInitNode, ExecProcNode,
- *		and ExecEndNode in sync when new nodes are added.
+ *		one file so that it is easier to keep the dispatch routines
+ *		in sync when new nodes are added.
  *
  *	 EXAMPLE
  *		Suppose we want the age of the manager of the shoe department and
@@ -122,6 +117,10 @@
 #include "miscadmin.h"
 
 
+static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
+static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
+
+
 /* 
  *		ExecInitNode
  *
@@ -149,6 +148,13 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 	if (node == NULL)
 		return NULL;
 
+	/*
+	 * Make sure there's enough stack available. Need to check here, in
+	 * addition to ExecProcNode() (via ExecProcNodeFirst()), to ensure the
+	 * stack isn't overrun while initializing the node tree.
+	 */
+	check_stack_depth();
+
 	switch (nodeTag(node))
 	{
 			/*
@@ -365,6 +371,13 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 	}
 
 	/*
+	 * Add a wrapper around the ExecProcNode callback that checks stack depth
+	 * during the first execution.
+	 */
+	result->ExecProcNodeReal = result->ExecProcNode;
+	result->ExecProcNode = ExecProcNodeFirst;
+
+	/*
 	 * Initialize any initPlans present in this node.  The planner put them in
 	 * a separate list for us.
 	 */
@@ -388,195 +401,51 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 }
 
 
-/* 
- *		ExecProcNode
- *
- *		Execute the given node to return a(nother) tuple.
- * 
+/*
+ * ExecProcNode wrapper that performs some one-time checks, before calling
+ * the relevant node method (possibly via an instrumentation wrapper).
  */
-TupleTableSlot *
-ExecProcNode(PlanState *node)
+static TupleTableSlot *
+ExecProcNodeFirst(PlanState *node)
 {
-	TupleTableSlot *result;
-
-	if (node->chgParam != NULL) /* something changed */
-		ExecReScan(node);		/* let ReScan handle this */
+	/*
+	 * Perform stack depth check during the first execution of the node.  We
+	 * only do so the first time round because it turns out to not be cheap on
+	 * some common architectures (eg. x86).  This relies on an assumption that
+	 * ExecProcNode calls for a given plan node will always be made at roughly
+	 * the same stack depth.
+	 */
+	check_stack_depth();
 
+	/*
+	 * If instrumentation is required, change the wrapper to one that just
+	 * does instrumentation.  Otherwise we can dispense with all wrappers and
+	 * have ExecProcNode() directly call the relevant function from now on.
+	 */
 	if (node->instrument)
-		InstrStartNode(node->instrument);
-
-	switch (nodeTag(node))
-	{
-			/*
-			 * control nodes
-			 */
-		case T_ResultState:
-			result = ExecResult((ResultState *) node);
-			break;
-
-		case T_ProjectSetState:
-			result = ExecProjectSet((ProjectSetState *) node);
-			break;
-
-		case T_ModifyTableState:
-			result = ExecModifyTable((ModifyTableState *) node);
-			break;
-
-		case T_AppendState:
-			result = ExecAppend((AppendState *) node);
-			break;
-
-		case T_MergeAppendState:
-			result = ExecMergeAppend((MergeAppendState *) node);
-			break;
-
-		case T_RecursiveUnionState:
-			result = ExecRecursiveUnion((RecursiveUnionState *) node);
-			break;
-
-			/* BitmapAndState does not yield tuples */
-
-			/* BitmapOrState does not yield tuples */
-
-			/*
-			 * scan nodes
-			 */
-	

Re: [HACKERS] Causal reads take II

2017-07-29 Thread Dmitry Dolgov
> On 12 July 2017 at 23:45, Thomas Munro wrote:
> I renamed it to "synchronous replay", because "causal reads" seemed a bit
> too arcane.

Hi

I looked through the code of `synchronous-replay-v1.patch` a bit and ran a
few tests. I didn't manage to break anything, except one mysterious error
that I've got only once on one of my replicas, but I couldn't reproduce it
yet. The interesting thing is that this error did not affect the other
replica or the primary. Just in case, here is the log for this error
(maybe you can see something obvious that I've not noticed):

```
LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211/47732":
Directory not empty
CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211":
Directory not empty
CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211/47732":
Directory not empty
CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211":
Directory not empty
CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
LOG:  directories for tablespace 47733 could not be removed
HINT:  You can remove the directories manually if necessary.
CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
FATAL:  could not create directory "pg_tblspc/47734/PG_10_201707211/47732":
File exists
CONTEXT:  WAL redo at 0/125F5768 for Storage/CREATE:
pg_tblspc/47734/PG_10_201707211/47732/47736
LOG:  startup process (PID 8034) exited with exit code 1
LOG:  terminating any other active server processes
LOG:  database system is shut down
```

And speaking about the code, so far I have just a few notes (some of them
merely questions):

* In general the idea behind this patch sounds interesting to me, but it
  relies heavily on time synchronization. As mentioned in the documentation:
  "Current hardware clocks, NTP implementations and public time servers are
  unlikely to allow the system clocks to differ more than tens or hundreds of
  milliseconds, and systems synchronized with dedicated local time servers
  may be considerably more accurate." But as far as I remember from my own
  experience, sometimes it may be not that trivial on something like AWS
  because of virtualization. Maybe it's an unreasonable fear, but is it
  possible to address this problem somehow?

* Also I noticed that some time-related values are hardcoded (e.g. the
  50%/25% time shift when we're dealing with leases). Does it make sense to
  move them out and make them configurable?

* Judging from the `SyncReplayPotentialStandby` function, it's possible to
  have `synchronous_replay_standby_names = "*, server_name"`, which is
  basically equivalent to just `*`, but it looks confusing. Is it worth
  preventing this behaviour?
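
  To spell out the confusing case (using the GUC syntax described above):

```
synchronous_replay_standby_names = '*'               # matches every standby
synchronous_replay_standby_names = '*, server_name'  # same effect, confusing
```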

* In the same function `SyncReplayPotentialStandby` there is this code:

```
if (!SplitIdentifierString(rawstring, ',', &elemlist))
{
	/* syntax error in list */
	pfree(rawstring);
	list_free(elemlist);
	/* GUC machinery will have already complained - no need to do again */
	return false;
}
```

  Am I right that ideally this (a situation when, at this point in the code,
  `synchronous_replay_standby_names` has an incorrect value) should not
  happen, because the GUC machinery will prevent it? If yes, then it seems to
  me that it still makes sense to put a log message here, just to give more
  information in a potentially weird situation.

* In the function `SyncRepReleaseWaiters` there is a commentary:

```
	/*
	 * If the number of sync standbys is less than requested or we aren't
	 * managing a sync standby or a standby in synchronous replay state that
	 * blocks then just leave.
	 */
	if ((!got_recptr || !am_sync) && !walsender_sr_blocker)
```

  Is this commentary correct? If I understand everything right, `!got_recptr`
  means the number of sync standbys is less than requested (a), `!am_sync`
  means we aren't managing a sync standby (b), and `walsender_sr_blocker`
  means a standby in synchronous replay state blocks (c). But the condition
  looks like `(a or b) and not c`.

* In the function `ProcessStandbyReplyMessage` there is code that implements
  this:

```
 * If the standby reports that it has fully replayed the WAL for at least
 * 10 seconds, then let's clear the lag times that were measured when it
 * last wrote/flushed/applied a WAL record.  This way we avoid displaying
 * stale lag data until more WAL traffic arrives.
```

  but I never found any mention of these 10 seconds in the documentation. Is
  it not that important? Also, question 2 is related to this one.

* In the function `WalSndGetSyncReplayStateString` all the states are in
  lower case except `UNKNOWN`; is there any particular reason for that?

There are also a few more, not that important, notes (mostly about some
typos and a few confusing names), but I'm going to do another round of
review and testing anyway, so I'll just send them all 

Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-29 Thread Andres Freund
On 2017-07-29 14:20:32 -0400, Tom Lane wrote:
> Here's a reviewed version of this patch.

Thanks!

> * I think you put ExecScan's CFI in the wrong place; AFAICT yours
> only covers its fast path.

Sure - but the old path already has a CFI? And it has to be inside the
loop, because well, the loop ;).


> * I think ExecAgg needs a CFI at the head, just to be sure it's hit
> in any path through that.

Yep, makes sense.


> * I agree that putting CFI inside ExecHashJoin's state machine loop
> is a good idea, because it might have to trawl through quite a lot of
> a batch file before finding a returnable tuple.  But I think in merge
> and nestloop joins it's sufficient to put one CFI at the head.  Neither
> of those cases can do very much processing without invoking a child
> node, where a CFI will happen.

Ok, I can live with that.


> * You missed ExecLockRows altogether.

Well, it directly calls the next ExecProcNode(), so I didn't think it
was necessary. One of the aforementioned judgement calls. But I'm
perfectly happy to have one there.

Thanks,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-29 Thread Tom Lane
Andres Freund  writes:
> On 2017-07-26 16:28:38 -0400, Tom Lane wrote:
>> It's certainly possible that there are long-running loops not involving
>> any ExecProcNode recursion at all, but that would be a bug independent
>> of this issue.  The CFI in ExecProcNode itself can be replaced exactly
>> either by asking all callers to do it, or by asking all callees to do it.
>> I think the latter is going to be more uniform and harder to screw up.

> Looks a bit better.  Still a lot of judgement-y calls tho, e.g. when one
> node function just calls the next, or when there's loops etc.   I found
> a good number of missing CFIs...

> What do you think?

Here's a reviewed version of this patch.  Differences from yours:

* I think you put ExecScan's CFI in the wrong place; AFAICT yours
only covers its fast path.

* I think ExecAgg needs a CFI at the head, just to be sure it's hit
in any path through that.

* I agree that putting CFI inside ExecHashJoin's state machine loop
is a good idea, because it might have to trawl through quite a lot of
a batch file before finding a returnable tuple.  But I think in merge
and nestloop joins it's sufficient to put one CFI at the head.  Neither
of those cases can do very much processing without invoking a child
node, where a CFI will happen.

* You missed ExecLockRows altogether.

* I added some comments and cosmetic tweaks.

regards, tom lane

diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 294ad2c..20fd9f8 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -399,8 +399,6 @@ ExecProcNode(PlanState *node)
 {
 	TupleTableSlot *result;
 
-	CHECK_FOR_INTERRUPTS();
-
 	if (node->chgParam != NULL) /* something changed */
 		ExecReScan(node);		/* let ReScan handle this */
 
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 4f131b3..6fde7cd 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -126,6 +126,8 @@ ExecScan(ScanState *node,
 	ExprState  *qual;
 	ProjectionInfo *projInfo;
 
+	CHECK_FOR_INTERRUPTS();
+
 	/*
 	 * Fetch data from node
 	 */
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index de9a18e..377916d 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -677,6 +677,8 @@ fetch_input_tuple(AggState *aggstate)
 
 	if (aggstate->sort_in)
 	{
+		/* make sure we check for interrupts in either path through here */
+		CHECK_FOR_INTERRUPTS();
 		if (!tuplesort_gettupleslot(aggstate->sort_in, true, false,
 	aggstate->sort_slot, NULL))
 			return NULL;
@@ -1414,6 +1416,8 @@ process_ordered_aggregate_multi(AggState *aggstate,
 	while (tuplesort_gettupleslot(pertrans->sortstates[aggstate->current_set],
   true, true, slot1, &newAbbrevVal))
 	{
+		CHECK_FOR_INTERRUPTS();
+
 		/*
 		 * Extract the first numTransInputs columns as datums to pass to the
 		 * transfn.  (This will help execTuplesMatch too, so we do it
@@ -2100,6 +2104,8 @@ ExecAgg(AggState *node)
 {
 	TupleTableSlot *result = NULL;
 
+	CHECK_FOR_INTERRUPTS();
+
 	if (!node->agg_done)
 	{
 		/* Dispatch based on strategy */
@@ -2563,6 +2569,8 @@ agg_retrieve_hash_table(AggState *aggstate)
 		TupleTableSlot *hashslot = perhash->hashslot;
 		int			i;
 
+		CHECK_FOR_INTERRUPTS();
+
 		/*
 		 * Find the next entry in the hash table
 		 */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index aae5e3f..58045e0 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,6 +59,7 @@
 
 #include "executor/execdebug.h"
 #include "executor/nodeAppend.h"
+#include "miscadmin.h"
 
 static bool exec_append_initialize_next(AppendState *appendstate);
 
@@ -204,6 +205,8 @@ ExecAppend(AppendState *node)
 		PlanState  *subnode;
 		TupleTableSlot *result;
 
+		CHECK_FOR_INTERRUPTS();
+
 		/*
 		 * figure out which subplan we are currently processing
 		 */
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7e0ba03..cf109d5 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -41,6 +41,7 @@
 #include "access/transam.h"
 #include "executor/execdebug.h"
 #include "executor/nodeBitmapHeapscan.h"
+#include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/predicate.h"
@@ -192,6 +193,8 @@ BitmapHeapNext(BitmapHeapScanState *node)
 		Page		dp;
 		ItemId		lp;
 
+		CHECK_FOR_INTERRUPTS();
+
 		/*
 		 * Get next page of results if needed
 		 */
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index 69e2704..fc15974 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -15,6 +15,7 @@
 #include "executor/nodeCustom.h"
 #include "nodes/execnodes.h"
 #include "nodes/plannodes.h"
+#include "miscadmin.h"
 #include 

Re: [HACKERS] Memory leak

2017-07-29 Thread Fan Yang
You are right. When I added that code, it really gave me a strong bad smell.
It's not worth the effort.

Thanks for your reply and suggestion.

--
Sincerely
Fan Yang


Re: [HACKERS] Memory leak

2017-07-29 Thread Tom Lane
fan yang  writes:
> - src/port/quotes.c
> At line 38, in function "escape_single_quotes_ascii",
> "malloc" is used to get some memory, and the pointer
> returned by "malloc" is returned to the caller.
> So any caller of this function should free this memory.
> - /src/bin/initdb/initdb.c
> At line 327, the function "escape_quotes"
> uses the function "escape_single_quotes_ascii"
> from the file above.
> But in this file (/src/bin/initdb/initdb.c), there are many places
> that use "escape_quotes" without freeing the pointer it returns,
> thus causing a memory leak.

By and large, we intentionally don't worry about memory leaks in initdb
(or any other program with a limited runtime).  It's not worth the
maintenance effort to save a few bytes, at least not where it requires
code contortions like these.

Doing a quick check with valgrind says that a run of initdb, in HEAD,
leaks about 560KB over its lifespan.  That's well below the threshold
of pain on any machine capable of running modern Postgres reasonably.

For fun, I tried to see whether your patch moved that number appreciably,
but patch(1) couldn't make any sense of it at all.  I think that probably
your mail program munged the whitespace in the patch.  Many of us have
found that the most reliable way to forward patches through email is to
add them as attachments rather than pasting them into the text in-line.

Poking at it a bit harder with valgrind, it seems that the vast majority
of what it reports as leaks is caused by replace_token().  If we wanted to
reduce memory wastage during initdb, that would be the place to work on.
But I'm dubious that it's worth any effort.
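
(The sort of invocation involved, with paths assumed:)

    valgrind --leak-check=full --log-file=initdb.vg initdb -D /tmp/vgtest
    grep "definitely lost" initdb.vg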

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Memory leak

2017-07-29 Thread fan yang
Hi all,

While reading the code, I found some code that causes a memory leak:

- src/port/quotes.c
At line 38, in function "escape_single_quotes_ascii",
"malloc" is used to get some memory, and the pointer
returned by "malloc" is returned to the caller.
So any caller of this function should free this memory.
- /src/bin/initdb/initdb.c
At line 327, the function "escape_quotes"
uses the function "escape_single_quotes_ascii"
from the file above.
But in this file (/src/bin/initdb/initdb.c), there are many places
that use "escape_quotes" without freeing the pointer it returns,
thus causing a memory leak.

I have fixed this bug and made a patch here:



diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 7303bbe..4e5429e 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1010,6 +1010,7 @@ setup_config(void)
 char		path[MAXPGPATH];
 const char *default_timezone;
 char   *autoconflines[3];
+char   *escaped;

 fputs(_("creating configuration files ... "), stdout);
 fflush(stdout);
@@ -1043,20 +1044,28 @@ setup_config(void)
 conflines = replace_token(conflines, "#port = 5432", repltok);
 #endif

+escaped = escape_quotes(lc_messages);
 snprintf(repltok, sizeof(repltok), "lc_messages = '%s'",
- escape_quotes(lc_messages));
+ escaped);
+free(escaped);
 conflines = replace_token(conflines, "#lc_messages = 'C'", repltok);

+escaped = escape_quotes(lc_monetary);
 snprintf(repltok, sizeof(repltok), "lc_monetary = '%s'",
- escape_quotes(lc_monetary));
+ escaped);
+free(escaped);
 conflines = replace_token(conflines, "#lc_monetary = 'C'", repltok);

+escaped = escape_quotes(lc_numeric);
 snprintf(repltok, sizeof(repltok), "lc_numeric = '%s'",
- escape_quotes(lc_numeric));
+ escaped);
+free(escaped);
 conflines = replace_token(conflines, "#lc_numeric = 'C'", repltok);

+escaped = escape_quotes(lc_time);
 snprintf(repltok, sizeof(repltok), "lc_time = '%s'",
- escape_quotes(lc_time));
+ escaped);
+free(escaped);
 conflines = replace_token(conflines, "#lc_time = 'C'", repltok);

 switch (locale_date_order(lc_time))
@@ -1074,9 +1083,11 @@ setup_config(void)
 }
 conflines = replace_token(conflines, "#datestyle = 'iso, mdy'",
repltok);

+escaped = escape_quotes(default_text_search_config);
 snprintf(repltok, sizeof(repltok),
  "default_text_search_config = 'pg_catalog.%s'",
- escape_quotes(default_text_search_config));
+ escaped);
+free(escaped);
 conflines = replace_token(conflines,
   "#default_text_search_config = 'pg_catalog.simple'",
   repltok);
@@ -1084,11 +1095,13 @@ setup_config(void)
 default_timezone = select_default_timezone(share_path);
 if (default_timezone)
 {
+escaped = escape_quotes(default_timezone);
 snprintf(repltok, sizeof(repltok), "timezone = '%s'",
- escape_quotes(default_timezone));
+ escaped);
 conflines = replace_token(conflines, "#timezone = 'GMT'", repltok);
 snprintf(repltok, sizeof(repltok), "log_timezone = '%s'",
- escape_quotes(default_timezone));
+ escaped);
+free(escaped);
 conflines = replace_token(conflines, "#log_timezone = 'GMT'",
repltok);
 }

@@ -1289,6 +1302,7 @@ bootstrap_template1(void)
 char  **bki_lines;
 char		headerline[MAXPGPATH];
 char		buf[64];
+char   *escaped;

 printf(_("running bootstrap script ... "));
 fflush(stdout);
@@ -1330,13 +1344,19 @@ bootstrap_template1(void)
 bki_lines = replace_token(bki_lines, "FLOAT8PASSBYVAL",
   FLOAT8PASSBYVAL ? "true" : "false");

-bki_lines = replace_token(bki_lines, "POSTGRES", escape_quotes(username));
+escaped = escape_quotes(username);
+bki_lines = replace_token(bki_lines, "POSTGRES", escaped);
+free(escaped);

 bki_lines = replace_token(bki_lines, "ENCODING", encodingid);

-bki_lines = replace_token(bki_lines, "LC_COLLATE", escape_quotes(lc_collate));
+escaped = escape_quotes(lc_collate);
+bki_lines = replace_token(bki_lines, "LC_COLLATE", escaped);
+free(escaped);

-bki_lines = replace_token(bki_lines, "LC_CTYPE", escape_quotes(lc_ctype));
+escaped = escape_quotes(lc_ctype);
+bki_lines = replace_token(bki_lines, "LC_CTYPE", escaped);
+free(escaped);

 /*
  * Pass correct LC_xxx environment to bootstrap.
@@ -1391,13 +1411,16 @@ setup_auth(FILE *cmdfd)
 "REVOKE ALL on pg_authid FROM public;\n\n",
 NULL
 };
 char	   *escaped;

 for (line = pg_authid_setup; *line != NULL; line++)
 PG_CMD_PUTS(*line);

+escaped = escape_quotes(superuser_password);
 if