Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-02-01 Thread Michael Paquier
On Thu, Feb 2, 2017 at 1:26 AM, Fujii Masao  wrote:
> I'm afraid that many WAL segments would start with a continuation record
> when there is a workload of short transactions (e.g., from pgbench),
> which would make restart_lsn fall behind very much. No?

I don't quite understand this argument. Even if there are many small
transactions, that would cause restart_lsn to just be late by one
segment, all the time.

> The discussion on this thread just makes me think that restart_lsn should
> indicate the replay location instead of flush location. This seems safer.

That would penalize WAL retention on the primary for standbys using
recovery_min_apply_delay with a slot, for example...

We can attempt to address this problem in two ways. The proposed patch
(ugly, by the way, and it has two typos!) does it in the WAL sender, by
not making restart_lsn jump to the next segment if a continuation
record is found. Or we could have the standby request the next
segment instead, if the record it wants to replay from is at a segment
boundary and it has the beginning of the record locally, which it does
because it already confirmed to the primary that it flushed up to the
next segment. Not sure which fix is better, though.
-- 
Michael




Re: [HACKERS] Speedup twophase transactions

2017-02-01 Thread Nikhil Sontakke
>> Yeah. Was thinking about this yesterday. How about adding entries in
>> TwoPhaseState itself (which become valid later)? Only if it does not
>> cause a lot of code churn.
>
> That's possible as well, yes.

PFA a patch which does the above. It re-uses the TwoPhaseState gxact
entries to track 2PC PREPARE/COMMIT in shared memory. The advantage is
that CheckPointTwoPhase() becomes the only place where the fsync of
2PC files happens.

A minor annoyance in the patch is the duplicated code for the second
while loop that goes through these shared-memory entries in
PrescanPreparedTransactions, RecoverPreparedTransactions and
StandbyRecoverPreparedTransactions.
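
For illustration, the loop in question has roughly this shape (the test
used to pick out the shared-memory entries is a placeholder name here,
not what the patch actually uses):

    for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
    {
        GlobalTransaction gxact = TwoPhaseState->prepXacts[i];

        /* placeholder: only entries recorded in shared memory during redo */
        if (!GxactCameFromRedo(gxact))
            continue;

        /* ... same per-transaction processing as the existing loop ... */
    }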

Other than this, I ran the TAP tests and they pass as expected.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


twophase_recovery_shmem_020217.patch
Description: Binary data



Re: [HACKERS] [Bug fix] PQsendQuery occurs error when target_session_attrs is set to read-write

2017-02-01 Thread Higuchi, Daisuke
From: Michael Paquier [mailto:michael.paqu...@gmail.com] 
>This has not been added yet to the next CF. As Ashutosh mentioned
>things tend to be easily forgotten. So I have added it here:
>https://commitfest.postgresql.org/13/982/
Thank you for adding this problem to CF. 

> I have noticed this typo on the way => s/requisted/requested/:
>--- a/src/interfaces/libpq/fe-connect.c
>+++ b/src/interfaces/libpq/fe-connect.c
>@@ -2908,7 +2908,7 @@ keep_going:   /* We will
>come back to here until there is
>}
>/*
>-* If a read-write connection is requisted check for same.
>+* If a read-write connection is requested check for same.
> */
>if (conn->target_session_attrs != NULL &&
>strcmp(conn->target_session_attrs, "read-write") == 0)
I added this fix to the updated patch.
It is based on the patch Ashutosh updated.

Regards, 
Daisuke, Higuchi



PQsendQuery_for_target_session_attrs_v4.patch
Description: PQsendQuery_for_target_session_attrs_v4.patch



[HACKERS] Provide list of subscriptions and publications in psql's completion

2017-02-01 Thread Michael Paquier
Hi all,

While testing the logical replication facility a bit, I have bumped into
the fact that psql's completion does not show the list of objects
already created. Attached is a patch.
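
The completion lists can be pulled from the new catalogs with queries
along these lines (simplified; the actual completion queries in the patch
may differ):

    SELECT subname FROM pg_subscription;
    SELECT pubname FROM pg_publication;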
Thanks,
-- 
Michael


subs-psql-completion.patch
Description: Binary data



Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION

2017-02-01 Thread Michael Paquier
On Thu, Feb 2, 2017 at 2:13 PM, Tom Lane  wrote:
> Kyotaro HORIGUCHI  writes:
>> Then, the reason for the TRY-CATCH clause is that I found that
>> some functions called from there can throw exceptions.
>
> Yes, but all LWLocks should be released by normal error recovery.
> It should not be necessary for this code to clean that up by hand.
> If it were necessary, there would be TRY-CATCH around every single
> LWLockAcquire in the backend, and we'd have an unreadable and
> unmaintainable system.  Please don't add a TRY-CATCH unless it's
> *necessary* -- and you haven't explained why this one is.

Putting my hands on the code and the problem, I can see that
dropping a subscription on a node makes it unresponsive when a stop is
attempted. And that's just because calls to LWLockRelease are missing, as
the attached patch shows. A try/catch block should not be necessary.
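
The shape of the fix is simply to add the releases that were missing on
the early-return and normal exit paths, roughly (a sketch; see the
attached patch for the actual change):

    LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE);
    ...
    if (!stmt->drop_slot)
    {
        LWLockRelease(LogicalRepLauncherLock);  /* was missing */
        heap_close(rel, NoLock);
        return;
    }
    ...
    LWLockRelease(LogicalRepLauncherLock);      /* was missing */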
-- 
Michael


drop-subs-locks.patch
Description: Binary data



Re: [HACKERS] [Bug fix] PQsendQuery occurs error when target_session_attrs is set to read-write

2017-02-01 Thread Higuchi, Daisuke
From: Ashutosh Bapat [mailto:ashutosh.ba...@enterprisedb.com] 
>Sorry, attached wrong patch. Here's the right one.
> The code expects that there will be two PQgetResult() calls required.
> One to fetch the result of SHOW command and the other to extract NULL.
> If we require more calls or unexpected results, we should throw an
> error. The patch just checks the first result and consumes the
> remaining without verifying them. Also, it looks like we can not clear
> result of PQgetResult() before using the values or copying them
> somewhere else [1]. Here's updated patch which tries to do that. 
> Please let me know if this looks good to you.
Oh, I had a basic misunderstanding. Thank you for correcting me.
There is no problem in the patch you attached. I agree with you.

Regards, 
Daisuke, Higuchi




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-01 Thread Pavan Deolasee
On Thu, Feb 2, 2017 at 3:49 AM, Tom Lane  wrote:

> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> I think what we ought to do about this is invent additional API
> >> functions, say
> >>
> >> Oid CatalogTupleInsertWithInfo(Relation heapRel, HeapTuple tup,
> >> CatalogIndexState indstate);
> >> void CatalogTupleUpdateWithInfo(Relation heapRel, ItemPointer otid,
> >> HeapTuple tup, CatalogIndexState indstate);
> >>
> >> and use these in place of simple_heap_foo plus CatalogIndexInsert
> >> in the places where this optimization had been applied.
>
> > This looks reasonable enough to me.
>
> Done.
>
>
Thanks for taking care of this. It's a shame that I missed it, because I'd
specifically noted the special-casing for large objects etc. But it looks
like, while changing 180+ call sites, I forgot my notes.
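
For reference, a call site changes roughly like this under the quoted API
(a sketch based on the signatures above, not the actual committed hunk):

    /* before: heap change plus separate index maintenance */
    oid = simple_heap_insert(rel, tup);
    CatalogIndexInsert(indstate, tup);

    /* after: one call that reuses the pre-built CatalogIndexState */
    oid = CatalogTupleInsertWithInfo(rel, tup, indstate);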

Thanks again,
Pavan


-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION

2017-02-01 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> Then, the reason for the TRY-CATCH clause is that I found that
> some functions called from there can throw exceptions.

Yes, but all LWLocks should be released by normal error recovery.
It should not be necessary for this code to clean that up by hand.
If it were necessary, there would be TRY-CATCH around every single
LWLockAcquire in the backend, and we'd have an unreadable and
unmaintainable system.  Please don't add a TRY-CATCH unless it's
*necessary* -- and you haven't explained why this one is.

regards, tom lane




Re: [HACKERS] Should `pg_upgrade --check` check relation filenodes are present?

2017-02-01 Thread Peter Eisentraut
On 1/31/17 4:57 PM, Craig de Stigter wrote:
> However, this problem was not caught by the `--check` command. I'm
> looking at the source code and it appears that pg_upgrade does not
> attempt to verify relation filenodes actually exist before proceeding,
> whether using --check or not.

The purpose of --check is to see if there is anything in your database
that pg_upgrade cannot upgrade.  Its purpose is not to detect general
damage in a database.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [Bug fix] PQsendQuery occurs error when target_session_attrs is set to read-write

2017-02-01 Thread Michael Paquier
On Thu, Feb 2, 2017 at 1:34 PM, Ashutosh Bapat
 wrote:
> On Thu, Feb 2, 2017 at 10:04 AM, Ashutosh Bapat
>  wrote:
>> [1] https://www.postgresql.org/docs/devel/static/libpq-exec.html: 
>> PQgetvalue().
> The code expects that there will be two PQgetResult() calls required.
> One to fetch the result of SHOW command and the other to extract NULL.
> If we require more calls or unexpected results, we should throw an
> error. The patch just checks the first result and consumes the
> remaining without verifying them. Also, it looks like we can not clear
> result of PQgetResult() before using the values or copying them
> somewhere else [1]. Here's updated patch which tries to do that.
> Please let me know if this looks good to you.

This has not been added yet to the next CF. As Ashutosh mentioned
things tend to be easily forgotten. So I have added it here:
https://commitfest.postgresql.org/13/982/

I have noticed this typo on the way => s/requisted/requested/:
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2908,7 +2908,7 @@ keep_going:   /* We will
come back to here until there is
}

/*
-* If a read-write connection is requisted check for same.
+* If a read-write connection is requested check for same.
 */
if (conn->target_session_attrs != NULL &&
strcmp(conn->target_session_attrs, "read-write") == 0)

Looking at the patch, I agree with Ashutosh. Any fix should be located
in the code path of CONNECTION_CHECK_WRITABLE which is the one
consuming the results.
-- 
Michael




Re: [HACKERS] Enabling replication connections by default in pg_hba.conf

2017-02-01 Thread Peter Eisentraut
On 1/22/17 10:29 PM, Michael Paquier wrote:
> As now wal_level = replica has become the default for Postgres 10,
> could we consider as well making replication connections enabled by
> default in pg_hba.conf? This requires just uncommenting a couple of
> lines in pg_hba.conf.sample.

Yes, I think this makes sense, as one of the reasons for these changes is
to simplify the use of pg_basebackup.
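
For reference, these are roughly the lines in question (the exact wording
in pg_hba.conf.sample may differ):

    # Allow replication connections from localhost, by a user with the
    # replication privilege.
    local   replication     all                                     trust
    host    replication     all             127.0.0.1/32            trust
    host    replication     all             ::1/128                 trust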

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Parallel bitmap heap scan

2017-02-01 Thread Dilip Kumar
On Wed, Feb 1, 2017 at 11:09 PM, Robert Haas  wrote:

Hi Robert,

Thanks for the review.

> When Andres wrote in
> https://www.postgresql.org/message-id/20161017201558.cr34stc6zvxy3...@alap3.anarazel.de
> that you should try to share only the iteration arrays, I believe that
> he meant the results of tbm_begin_iterate() -- that is,
> iterator->spageptr, chunkptr, schunkbit, spages, and schunks.  I'm
> perhaps putting words into his mouth, but what he said was that we
> could avoid sharing "the whole underlying hash".  But the patch set
> you've implemented here doesn't behave that way.  Instead, you've got
> the array containing the hash table elements shared (which is good)

Here is my analysis of why we selected this instead of just sharing the iterator:

The problem is that each entry of the iteration array is just a pointer to
a hash entry. So if we only shared the iterator, the iterator elements would
point into the local memory of other workers. I thought of one more option:
during tbm_begin_iterate, instead of keeping a pointer to the hash entry, we
could make a copy of it in the DSA area, but that would keep two copies of
each hash table element for a short duration, which is not good. I think
what we are doing currently is better than that.

> The work of tbm_begin_iterate() should be done before we
> begin the shared scan and the results of that work should be shared.
> What happens right now (it appears) is that every backend does that
> work based on the same hash table and we just assume they all get the
> same answer.

Actually, tbm_begin_iterate processes each element of the hash table and
converts it to the chunk and page arrays; currently that is done by each
worker and it's pretty cheap. Suppose I try to do this only in the first
worker; then the implementation would look something like this:

1. First, we need to create the tbm->spages and tbm->schunks arrays, and
both should be arrays of dsa_pointers.
2. Then each worker will process these arrays and convert them to arrays
of its local pointers.
3. With the current solution, where all hash elements are stored in one
large DSA chunk, how are we going to divide them into multiple
dsa_pointers?

I will work on remaining comments.

> And we really do assume that, because
> pbms_parallel_iterate() assumes it can shuttle private state back and
> forth between iterator in different backends and nothing will break;
> but those backends aren't actually using the same iteration arrays.
> They are using different iteration arrays that should have the same
> contents because they were all derived from the same semi-shared hash
> table.  That's pretty fragile, and a waste of CPU cycles if the hash
> table is large (every backend does its own sort).
>
> On a related note, I think it's unacceptable to make the internal
> details of TBMIterator public.  You've moved it from tidbitmap.c to
> tidbitmap.h so that nodeBitmapHeapScan.c can tinker with the guts of
> the TBMIterator, but that's not OK.  Those details need to remain
> private to tidbitmap.c. pbms_parallel_iterate() is a nasty kludge; we
> need some better solution.  The knowledge of how a shared iterator
> should iterate needs to be private to tidbitmap.c, not off in the
> executor someplace.  And I think the entries need to actually be
> updated in shared memory directly, not copied back and forth between a
> backend-private iterator and a shared iterator.
>
> Also, pbms_parallel_iterate() can't hold a spinlock around a call to
> tbm_iterate().  Note the coding rules for spinlocks mentioned in
> spin.h and src/backend/storage/lmgr/README.  I think the right thing
> to do here is to use an LWLock in a new tranche (add an entry to
> BuiltinTrancheIds).
>
> In 0002, you have this:
>
> +tb->alloc = palloc(sizeof(SH_ALLOCATOR));
>
> This should be using MemoryContextAlloc(ctx, ...) rather than palloc.
> Otherwise the allocator object is in a different context from
> everything else in the hash table.  But TBH, I don't really see why we
> want this to be a separate object.  Why not just put
> HashAlloc/HashFree/args into SH_TYPE directly?  That avoids some
> pointer chasing and doesn't really seem to cost anything (except that
> SH_CREATE will grow a slightly longer argument sequence).
>
> I think maybe we should rename the functions to element_allocate,
> element_free, and element_allocator_ctx, or something like that.  The
> current names aren't capitalized consistently with other things in
> this module, and putting the word "element" in there would make it
> more clear what the purpose of this thing is.
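
On the palloc point, I understand the suggestion to be roughly this
(assuming ctx is the memory context passed to SH_CREATE):

    tb->alloc = MemoryContextAlloc(ctx, sizeof(SH_ALLOCATOR));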


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] TRAP: FailedAssertion("!(hassrf)", File: "nodeProjectSet.c", Line: 180)

2017-02-01 Thread Andres Freund
On 2017-02-01 23:27:36 -0500, Tom Lane wrote:
> I wrote:
> > Andres Freund  writes:
> >> Tom, do you have an opinion?
> 
> > Yes, it's broken.  split_pathtarget_at_srfs seems to be doing the right
> > thing, but then something later is recombining the last two steps.
> 
> Ah, no, I take that back: split_pathtarget_at_srfs is doing the wrong
> thing.

Yea, that's what I thought.


> Nothing's broken quite yet, but when we get to setrefs.c, it replaces
> *both* occurrences of g(x) with Vars referencing the g(x) output from
> the next level down.  So now we have the tlist of the upper ProjectSet
> node as "f(Var), Var" and ProjectSet complains that it has no SRF to
> evaluate.

But I'd missed part of that subtlety.


> I think the appropriate fix is that, once split_pathtarget_at_srfs() has
> computed a tentative list of SRFs it needs to evaluate, it has to make a
> second pass to see if any of them match expressions that were assigned to
> the next level down.

> This is pretty annoying, but we'd only have to do it
> if target_contains_srfs and context.nextlevel_contains_srfs are both true,
> which will be a negligibly small fraction of queries in practice.

Hm.  Can't really come up with something better, but I'm kinda tired
too...


> Or we could take out that Assert in nodeProjectSet.c.  But that seems
> like a hack not a fix.

Yea, I don't like that much.


> I'm pretty tired but I'll work on a fix tomorrow.

Cool.

Greetings,

Andres Freund




Re: [HACKERS] Commit fest 2017-01 will begin soon!

2017-02-01 Thread Michael Paquier
On Thu, Feb 2, 2017 at 1:23 PM, Amit Kapila  wrote:
> Many thanks to you for running the show.  I think it might be okay if
> one consolidated mail is sent for all the patches that are marked
> "Returned with Feedback" or "Rejected" or "Moved to next CF".  OTOH,
> there is some value in sending a separate mail for each patch so that
> people can argue on the decision by CFM for one particular patch, but
> not sure if it worth.

People tend to look at emails they are directly CC'd on; that's why I
prefer sending individual emails.
-- 
Michael




Re: [HACKERS] [Bug fix] PQsendQuery occurs error when target_session_attrs is set to read-write

2017-02-01 Thread Ashutosh Bapat
Sorry, attached wrong patch. Here's the right one.

On Thu, Feb 2, 2017 at 10:04 AM, Ashutosh Bapat
 wrote:
> On Thu, Feb 2, 2017 at 8:11 AM, Higuchi, Daisuke
>  wrote:
>> From: Ashutosh Bapat [mailto:ashutosh.ba...@enterprisedb.com]
>>> Per the documentation [1], "PQgetResult must be called repeatedly
>>> until it returns a null pointer, indicating that the command is
>>> done.". The code in PQgetResult() under CONNECTION_CHECK_WRITABLE
>>> case, violates that. The patch fixes it. The patch however jumps to
>>> keep_going without changing conn->status, which means that it will end
>>> up again in the same case. While that's fine, may be we should use a
>>> local loop on PQgetResult() to keep the code readable.
>> Thank you for reviewing the patch.
>> I created it with reference to pqSetenvPoll() in 
>> interfaces/libpq/fe-protocol2.c,
>> but I certainly thought that readability is not good.
>> I updated the patch, so I will add this to the next commitfest.
>
> Thanks for the patch.
>
> The code expects that there will be two PQgetResult() calls required.
> One to fetch the result of SHOW command and the other to extract NULL.
>> If we require more calls or unexpected results, we should throw an
>> error. The patch just checks the first result and consumes the
> remaining without verifying them. Also, it looks like we can not clear
> result of PQgetResult() before using the values or copying them
> somewhere else [1]. Here's updated patch which tries to do that.
> Please let me know if this looks good to you.
>
>
> [1] https://www.postgresql.org/docs/devel/static/libpq-exec.html: 
> PQgetvalue().
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


PQsendQuery_for_target_session_attrs_v3.patch
Description: Binary data



Re: [HACKERS] [Bug fix] PQsendQuery occurs error when target_session_attrs is set to read-write

2017-02-01 Thread Ashutosh Bapat
On Thu, Feb 2, 2017 at 8:11 AM, Higuchi, Daisuke
 wrote:
> From: Ashutosh Bapat [mailto:ashutosh.ba...@enterprisedb.com]
>> Per the documentation [1], "PQgetResult must be called repeatedly
>> until it returns a null pointer, indicating that the command is
>> done.". The code in PQgetResult() under CONNECTION_CHECK_WRITABLE
>> case, violates that. The patch fixes it. The patch however jumps to
>> keep_going without changing conn->status, which means that it will end
>> up again in the same case. While that's fine, may be we should use a
>> local loop on PQgetResult() to keep the code readable.
> Thank you for reviewing the patch.
> I created it with reference to pqSetenvPoll() in 
> interfaces/libpq/fe-protocol2.c,
> but I certainly thought that readability is not good.
> I updated the patch, so I will add this to the next commitfest.

Thanks for the patch.

The code expects that there will be two PQgetResult() calls required.
One to fetch the result of SHOW command and the other to extract NULL.
If we require more calls or unexpected results, we should throw an
error. The patch just checks the first result and consumes the
remaining without verifying them. Also, it looks like we can not clear
result of PQgetResult() before using the values or copying them
somewhere else [1]. Here's updated patch which tries to do that.
Please let me know if this looks good to you.


[1] https://www.postgresql.org/docs/devel/static/libpq-exec.html: PQgetvalue().
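
For reference, the general pattern the documentation describes, draining
PQgetResult() until it returns NULL, looks like this in a minimal
standalone client (just an illustration, not the patch itself):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres");
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        if (!PQsendQuery(conn, "SHOW transaction_read_only"))
        {
            fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* PQgetResult() must be called until it returns NULL. */
        while ((res = PQgetResult(conn)) != NULL)
        {
            /* use the values before clearing the result */
            if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
                printf("transaction_read_only = %s\n", PQgetvalue(res, 0, 0));
            PQclear(res);
        }

        PQfinish(conn);
        return 0;
    }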
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


PQsendQuery_for_target_session_attrs_v3.patch
Description: Binary data



Re: [HACKERS] TRAP: FailedAssertion("!(hassrf)", File: "nodeProjectSet.c", Line: 180)

2017-02-01 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> Tom, do you have an opinion?

> Yes, it's broken.  split_pathtarget_at_srfs seems to be doing the right
> thing, but then something later is recombining the last two steps.

Ah, no, I take that back: split_pathtarget_at_srfs is doing the wrong
thing.  It's generating the desired list of PathTargets, but it's
mistakenly concluding that the last one contains_srfs, which leads to
making a ProjectSet plan node for it instead of Result.  The problem
is specific to targetlists like

f(g(x)), g(x)

where g() is a SRF and f() can be any sort of non-set-returning
expression.  split_pathtarget_at_srfs() examines the f() node,
sees that it's not a SRF, and calls split_pathtarget_walker which
finds the g(x) subexpression and correctly assigns that to the next
level down.  But then split_pathtarget_at_srfs finds the top-level
g(x) occurrence and concludes that this level contains_srfs.
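
For example, a targetlist of that shape could come from a query like this
(an illustration, not necessarily the original report):

    SELECT abs(generate_series(-2, 2)), generate_series(-2, 2);

with abs() playing the role of f() and generate_series() that of g().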

Nothing's broken quite yet, but when we get to setrefs.c, it replaces
*both* occurrences of g(x) with Vars referencing the g(x) output from
the next level down.  So now we have the tlist of the upper ProjectSet
node as "f(Var), Var" and ProjectSet complains that it has no SRF to
evaluate.

I think the appropriate fix is that, once split_pathtarget_at_srfs() has
computed a tentative list of SRFs it needs to evaluate, it has to make a
second pass to see if any of them match expressions that were assigned to
the next level down.  This is pretty annoying, but we'd only have to do it
if target_contains_srfs and context.nextlevel_contains_srfs are both true,
which will be a negligibly small fraction of queries in practice.

Or we could take out that Assert in nodeProjectSet.c.  But that seems
like a hack not a fix.

I'm pretty tired but I'll work on a fix tomorrow.

regards, tom lane




Re: [HACKERS] Commit fest 2017-01 will begin soon!

2017-02-01 Thread Amit Kapila
On Thu, Feb 2, 2017 at 8:50 AM, Michael Paquier
 wrote:
> On Tue, Jan 31, 2017 at 4:01 PM, Michael Paquier
>  wrote:
>> The CF finishing soon, I have done a first pass on the patches
>> remaining on it, moving patches to the next CF or updating them. There
>> are still 48 patches still on the tracker when writing this email:
>> Needs review: 32.
>> Ready for Committer: 16.
>> This requires a certain amount of energy to classify, so if you would
>> like spare a bit of HP/MP to the CFM, feel free to cleanup your
>> patches or the ones you are reviewing.
>>
>> @all: If I have updated your patch in a state that you think is not
>> appropriate, feel free to complain on the thread of the patch or here.
>> That's fine.
>
> After 3 days and 90 patches spamming everybody on this -hackers, the
> CF is now *closed*. Here is the final score of this session:
> Committed: 58.
> Moved to next CF: 70.
> Rejected: 4.
> Returned with Feedback: 23.
> Total: 155.
>
> It is nice to see that many people have registered as reviewers and
> that much has been done. I have come away with the impression that
> there has been more commitment in this area compared to the past.
> Thanks to all the people involved!
>

Many thanks to you for running the show.  I think it might be okay if
one consolidated mail is sent for all the patches that are marked
"Returned with Feedback" or "Rejected" or "Moved to next CF".  OTOH,
there is some value in sending a separate mail for each patch, so that
people can argue about the CFM's decision on one particular patch; but I'm
not sure if it's worth it.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-02-01 Thread Thomas Munro
On Thu, Feb 2, 2017 at 4:57 PM, Rafia Sabih
 wrote:
> Apart from the previously reported regression, there appears to be one more
> issue in this set of patches. At times, a query using parallel hash hangs,
> and all the workers including the master show the following backtrace:

Ugh, thanks.  Investigating.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-02-01 Thread Rafia Sabih
On Thu, Feb 2, 2017 at 1:19 AM, Thomas Munro
 wrote:
> On Thu, Feb 2, 2017 at 3:34 AM, Rafia Sabih
>  wrote:
>> 9 | 62928.88   | 59077.909
>
> Thanks Rafia.  At first glance this plan is using the Parallel Shared
> Hash in one place where it should pay off, that is loading the orders
> table, but the numbers are terrible.  I noticed that it uses batch
> files and then has to increase the number of batch files, generating a
> bunch of extra work, even though it apparently overestimated the
> number of rows, though that's only ~9 seconds of ~60.  I am
> investigating.

Hi Thomas,
Apart from the previously reported regression, there appears to be one more
issue in this set of patches. At times, a query using parallel hash hangs,
and all the workers including the master show the following backtrace:

#0  0x3fff880c7de8 in __epoll_wait_nocancel () from /lib64/power8/libc.so.6
#1  0x104e2718 in WaitEventSetWaitBlock (set=0x100157bde90,
cur_timeout=-1, occurred_events=0x3fffdbe69698, nevents=1) at
latch.c:998
#2  0x104e255c in WaitEventSetWait (set=0x100157bde90,
timeout=-1, occurred_events=0x3fffdbe69698, nevents=1,
wait_event_info=134217745) at latch.c:950
#3  0x10512970 in ConditionVariableSleep (cv=0x3ffd736e05a4,
wait_event_info=134217745) at condition_variable.c:132
#4  0x104dbb1c in BarrierWaitSet (barrier=0x3ffd736e0594,
new_phase=1, wait_event_info=134217745) at barrier.c:97
#5  0x104dbb9c in BarrierWait (barrier=0x3ffd736e0594,
wait_event_info=134217745) at barrier.c:127
#6  0x103296a8 in ExecHashShrink (hashtable=0x3ffd73747dc0) at
nodeHash.c:1075
#7  0x1032c46c in dense_alloc_shared
(hashtable=0x3ffd73747dc0, size=40, shared=0x3fffdbe69eb8,
respect_work_mem=1 '\001') at nodeHash.c:2618
#8  0x1032a2f0 in ExecHashTableInsert
(hashtable=0x3ffd73747dc0, slot=0x100158f9e90, hashvalue=2389907270)
at nodeHash.c:1476
#9  0x10327fd0 in MultiExecHash (node=0x100158f9800) at nodeHash.c:296
#10 0x10306730 in MultiExecProcNode (node=0x100158f9800) at
execProcnode.c:577

The issue is not deterministic or straightforwardly reproducible; sometimes
(after make clean, etc.) queries run, and sometimes they hang again. I
wanted to bring this to your notice hoping you might be faster than me in
pinning down the exact reason behind this anomaly.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] parallelize queries containing subplans

2017-02-01 Thread Amit Kapila
On Wed, Feb 1, 2017 at 10:04 PM, Robert Haas  wrote:
> On Tue, Jan 31, 2017 at 6:01 AM, Amit Kapila  wrote:
>> Moved this patch to next CF.
>
> So here is what seems to be the key hunk from this patch:
>
>  /*
> - * Since we don't have the ability to push subplans down to workers at
> - * present, we treat subplan references as parallel-restricted.  We need
> - * not worry about examining their contents; if they are unsafe, we would
> - * have found that out while examining the whole tree before reduction of
> - * sublinks to subplans.  (Really we should not see SubLink during a
> - * max_interesting == restricted scan, but if we do, return true.)
> + * We can push the subplans only if they don't contain any parallel-aware
> + * node as we don't support multi-level parallelism (parallel workers
> + * invoking another set of parallel workers).
>   */
> -else if (IsA(node, SubLink) ||
> - IsA(node, SubPlan) ||
> - IsA(node, AlternativeSubPlan))
> +else if (IsA(node, SubPlan))
> +return !((SubPlan *) node)->parallel_safe;
> +else if (IsA(node, AlternativeSubPlan))
>  {
> -if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> -return true;
> +AlternativeSubPlan *asplan = (AlternativeSubPlan *) node;
> +ListCell   *lc;
> +
> +foreach(lc, asplan->subplans)
> +{
> +SubPlan*splan = (SubPlan *) lfirst(lc);
> +
> +Assert(IsA(splan, SubPlan));
> +
> +if (max_parallel_hazard_walker((Node *) splan, context))
> +return true;
> +}
> +
> +return false;
>  }
>
> I don't see the reason for having this new code that makes
> AlternativeSubPlan walk over the subplans; expression_tree_walker
> already does that.
>

It is done this way mainly to avoid recursion/nested calls, but I
think you would prefer to handle it via expression_tree_walker so that the
code looks cleaner.  Another probable reason is that there is always a
chance that we miss handling some expression of a particular node
(in this case AlternativeSubPlan), but if that is the case then there
are other places in the code which do operate on the individual subplans
in an AlternativeSubPlan list.

>  On the flip side I don't see the reason for
> removing the max_parallel_hazard_test() call for AlternativeSubPlan or
> for SubLink.

If we don't remove the current max_parallel_hazard_test() check for
AlternativeSubPlan, then an AlternativeSubPlan node will be considered
parallel-restricted, so why would we want that after this patch?  For
SubLinks, I think they would have been converted to SubPlans by the time we
reach this function for the parallel-restricted check.  Can you
elaborate on what you have in mind for the handling of AlternativeSubPlan
and SubLink?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2017-02-01 Thread David Rowley
On 2 February 2017 at 00:13, Heikki Linnakangas  wrote:
> Ok, I'll drop the second patch for now. I committed the first patch after
> fixing the things you and Michael pointed out. Thanks for the review!

dbd69118 caused a small compiler warning for me.

The attached fixed it.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


encrypt_password_warning_fix.patch
Description: Binary data



Re: [HACKERS] Logical Replication and Character encoding

2017-02-01 Thread Euler Taveira
On 01-02-2017 00:05, Kyotaro HORIGUCHI wrote:
> - Subscriber connects with client_encoding specification and the
>   output plugin pgoutput decide whether it accepts the encoding
>   or not. If the subscriber doesn't, pgoutput send data without
>   conversion.
> 
I don't think storing data without conversion is an acceptable approach. We
should provide options to users, such as ignoring the tuple or storing NULL
for the column(s) with a conversion problem. I wouldn't consider storing
data without conversion because it silently shows incorrect data, and we
historically aren't flexible with conversion routines.


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: [HACKERS] TRAP: FailedAssertion("!(hassrf)", File: "nodeProjectSet.c", Line: 180)

2017-02-01 Thread Tom Lane
Andres Freund  writes:
> On 2017-02-02 00:09:03 +0100, Erik Rijkers wrote:
>> Something is broken in HEAD:

> Hm. Indeed.

> The issue is that we're generating ProjectSet nodes instead of Result
> for the top-level nodes - but we currently assume that ProjectSet nodes
> actually contain sets.   I'm tentatively in favor of generating proper
> Result nodes in this case, rather than allowing this in ProjectSet - on
> the other hand, it works after removing that Assert().

> Tom, do you have an opinion?

Yes, it's broken.  split_pathtarget_at_srfs seems to be doing the right
thing, but then something later is recombining the last two steps.
I should be able to find it soon.

Does it really work without the Assert?  I'd think you'd get srf-
where-not-supported errors if the SRF isn't at top level.

regards, tom lane




Re: [HACKERS] Commit fest 2017-01 will begin soon!

2017-02-01 Thread Michael Paquier
On Tue, Jan 31, 2017 at 4:01 PM, Michael Paquier
 wrote:
> The CF finishing soon, I have done a first pass on the patches
> remaining on it, moving patches to the next CF or updating them. There
> are still 48 patches still on the tracker when writing this email:
> Needs review: 32.
> Ready for Committer: 16.
> This requires a certain amount of energy to classify, so if you would
> like spare a bit of HP/MP to the CFM, feel free to cleanup your
> patches or the ones you are reviewing.
>
> @all: If I have updated your patch in a state that you think is not
> appropriate, feel free to complain on the thread of the patch or here.
> That's fine.

After 3 days and 90 patches spamming everybody on this -hackers, the
CF is now *closed*. Here is the final score of this session:
Committed: 58.
Moved to next CF: 70.
Rejected: 4.
Returned with Feedback: 23.
Total: 155.

It is nice to see that many people have registered as reviewers and
that much has been done. I have come away with the impression that
there has been more commitment in this area compared to the past.
Thanks to all the people involved!
-- 
Michael




Re: [HACKERS] Logical Replication and Character encoding

2017-02-01 Thread Kyotaro HORIGUCHI
Hello.

This version makes the subscriber automatically set client_encoding to its
database encoding (if not explicitly set), and changes the behavior when
pg_server_to_client returns NULL from sending the original string to
raising an ERROR.

At Wed, 1 Feb 2017 08:39:41 +, "Shinoda, Noriyoshi" 
 wrote in 

> Thank you for creating patches.
> I strongly hope that your patch will be merged into the new
> version.  Since not all databases are based on UTF-8 yet, I
> think conversion of character codes is still necessary.

Thanks.

> -Original Message-
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp] 
> Sent: Wednesday, February 01, 2017 3:31 PM
> To: Shinoda, Noriyoshi 
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Logical Replication and Character encoding
> 
> At Wed, 01 Feb 2017 12:13:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20170201.121304.267734380.horiguchi.kyot...@lab.ntt.co.jp>
> > > >  I tried a committed Logical Replication environment. I found  
> > > > that replication between databases of different encodings did  not 
> > > > convert encodings in character type columns. Is this  behavior 
> > > > correct?
> > > 
> > > The output plugin for subscription is pgoutput and it currently 
> > > doesn't consider encoding but would easiliy be added if desired 
> > > encoding is informed.
> > > 
> > > The easiest (but somewhat seems fragile) way I can guess is,
> > > 
> > > - Subscriber connects with client_encoding specification and the
> > >   output plugin pgoutput decide whether it accepts the encoding
> > >   or not. If the subscriber doesn't, pgoutput send data without
> > >   conversion.
> > > 
> > > The attached small patch does this and works with the following 
> > > CREATE SUBSCRIPTION.
> > 
> > Oops. It forgets to care conversion failure. It is amended in the 
> > attached patch.
> > 
> > > CREATE SUBSCRIPTION sub1 CONNECTION 'host=/tmp port=5432 
> > > dbname=postgres client_encoding=EUC_JP' PUBLICATION pub1;
> > > 
> > > 
> > > Also we may have explicit negotiation on, for example, 
> > > CREATE_REPLICATION_SLOT.
> > > 
> > >  'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput ENCODING EUC_JP'
> > > 
> > > Or output plugin may take options.
> > > 
> > >  'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput OPTIONS(encoding EUC_JP)'
> > > 
> > > 
> > > Any opinions?
> 
> This patch chokes replication when the publisher finds an inconvertible
> character in a tuple to be sent. In that case, dropping and recreating the
> subscription is necessary to go forward.

I think this behavior is right and inevitable in order to protect
subscribers from broken strings.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From e3af51822d1318743e554e24163390b74b254751 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 2 Feb 2017 09:05:20 +0900
Subject: [PATCH] Enable logical-replication between databases with different
 encodings

Different from physical replication, logical replication may run
between databases with different encodings. This patch makes
subscriber set client_encoding to database encoding and publisher
follow it.
---
 .../libpqwalreceiver/libpqwalreceiver.c| 35 +-
 src/backend/replication/logical/proto.c| 17 +++
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 44a89c7..ef38af7 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -24,6 +24,7 @@
 #include "access/xlog.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "mb/pg_wchar.h"
 #include "replication/logicalproto.h"
 #include "replication/walreceiver.h"
 #include "storage/proc.h"
@@ -112,11 +113,43 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
  char **err)
 {
 	WalReceiverConn *conn;
+	const char *myconninfo = conninfo;
 	const char *keys[5];
 	const char *vals[5];
 	int			i = 0;
 
 	/*
+	 * For logical replication, set database encoding as client_encoding if
+	 * not specified in conninfo
+	 */
+	if (logical)
+	{
+		PQconninfoOption   *opts = NULL;
+
+		opts = PQconninfoParse(conninfo, NULL);
+
+		if (opts != NULL)
+		{
+			while (opts->keyword != NULL &&
+   strcmp(opts->keyword, "client_encoding") != 0)
+opts++;
+
+			/* add client_encoding to conninfo if not set */
+			if (opts->keyword == NULL || opts->val == NULL)
+			{
+StringInfoData s;
+
+/* Assuming that the memory context here is properly set */
+initStringInfo(&s);
+appendStringInfoString(&s, conninfo);
+appendStringInfo(&s, " client_encoding=\"%s\"",
+ 

Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-02-01 Thread Michael Paquier
On Tue, Jan 31, 2017 at 9:13 AM, David Rowley
 wrote:
> On 31 January 2017 at 13:10, David Rowley  
> wrote:
>> I've attached a patch which implements this.
>
> Please disregards previous patch. (I forgot git commit before git diff
> to make the patch)
>
> I've attached the correct patch.

Moved to CF 2017-03. (You are the last one.)
-- 
Michael




Re: [HACKERS] Parallel Index Scans

2017-02-01 Thread Michael Paquier
On Wed, Feb 1, 2017 at 10:20 PM, Amit Kapila  wrote:
> makes sense, so changed accordingly.

I have moved this patch to CF 2017-03.
-- 
Michael




Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.

2017-02-01 Thread Michael Paquier
On Tue, Jan 24, 2017 at 8:36 PM, Kuntal Ghosh
 wrote:
> Nothing else to add from my side. I'm marking this 'Ready for committer'.

Moved to CF 2017-03 with the same status.
-- 
Michael




Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds

2017-02-01 Thread Michael Paquier
On Tue, Jan 24, 2017 at 3:32 AM, Kevin Grittner  wrote:
> I will need some time to consider that

Moved to CF 2017-03 for now. The last patch sent still applies.
-- 
Michael




Re: [HACKERS] autonomous transactions

2017-02-01 Thread Michael Paquier
On Sat, Jan 7, 2017 at 5:30 PM, Andrey Borodin  wrote:
> The new status of this patch is: Ready for Committer

I don't think that this thread has reached a conclusion yet. From what
I can see the last patch does not apply, so I have moved the patch to
next CF with "waiting on author".
-- 
Michael




Re: [HACKERS] pg_background contrib module proposal

2017-02-01 Thread Michael Paquier
On Fri, Jan 27, 2017 at 11:38 PM, Robert Haas  wrote:
> On Fri, Jan 27, 2017 at 9:14 AM, Peter Eisentraut
>  wrote:
>> On 1/19/17 12:47 PM, Andrey Borodin wrote:
>>> 4. There is some controversy on where implemented feature shall be: in 
>>> separate extension (as in this patch), in db_link, in some PL API, in FDW 
>>> or somewhere else. I think that new extension is an appropriate place for 
>>> the feature. But I’m not certain.
>>
>> I suppose we should decide first whether we want pg_background as a
>> separate extension or rather pursue extending dblink as proposed elsewhere.
>>
>> I don't know if pg_background allows any use case that dblink can't
>> handle (yet).
>
> For the record, I have no big problem with extending dblink to allow
> this instead of adding pg_background.  But I think we should try to
> get one or the other done in time for this release.

Moved to CF 2017-03 as the discussion is not over yet.
-- 
Michael




Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION

2017-02-01 Thread Kyotaro HORIGUCHI
At Thu, 2 Feb 2017 08:46:11 +0900, Michael Paquier  
wrote in 
> On Thu, Feb 2, 2017 at 2:14 AM, Fujii Masao  wrote:
> > The lwlock would be released when an exception occurs, so I don't think
> > that TRY-CATCH is necessary here. Or it's necessary for another reason?
> 
> +PG_CATCH();
> +{
> +LWLockRelease(LogicalRepLauncherLock);
> +PG_RE_THROW();
> +}
> +PG_END_TRY();
> Just to do that, a TRY/CATCH block looks like an overkill to me. Why
> not just call LWLockRelease in the ERROR and return code paths?

I thought the same at first. The modification at the "if (wrconn =="
is a remnant of that. It is reverted in the attached patch.

Then, the reason for the TRY-CATCH clause is that I found that
some functions called from there can throw exceptions.

logicalrep_worker_stop and replorigin_drop have ereport in their paths.
load_library apparently can throw an exception.
(The walrcv_ (libpq_) functions don't seem to.)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From d0ca653bb2aa776742a2e7a697b02794b1ad66d9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 2 Feb 2017 11:33:40 +0900
Subject: [PATCH] Fix DROP SUBSCRIPTION's lock leak.

DROP SUBSCRIPTION acquires a lock on LogicalRepLauncherLock but never
releases. This fixes it.
---
 src/backend/commands/subscriptioncmds.c | 86 ++---
 1 file changed, 48 insertions(+), 38 deletions(-)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 5de..223eea4 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -511,52 +511,62 @@ DropSubscription(DropSubscriptionStmt *stmt)
 	/* Protect against launcher restarting the worker. */
 	LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE);
 
-	/* Kill the apply worker so that the slot becomes accessible. */
-	logicalrep_worker_stop(subid);
-
-	/* Remove the origin tracking if exists. */
-	snprintf(originname, sizeof(originname), "pg_%u", subid);
-	originid = replorigin_by_name(originname, true);
-	if (originid != InvalidRepOriginId)
-		replorigin_drop(originid);
-
-	/* If the user asked to not drop the slot, we are done mow.*/
-	if (!stmt->drop_slot)
-	{
-		heap_close(rel, NoLock);
-		return;
-	}
-
 	/*
-	 * Otherwise drop the replication slot at the publisher node using
-	 * the replication connection.
+	 * Some functions called here can throw exceptions. we must release
+	 * LogicalRepLauncherLock for the case.
 	 */
-	load_file("libpqwalreceiver", false);
+	PG_TRY();
+	{
+		/* Kill the apply worker so that the slot becomes accessible. */
+		logicalrep_worker_stop(subid);
 
-	initStringInfo(&cmd);
-	appendStringInfo(&cmd, "DROP_REPLICATION_SLOT \"%s\"", slotname);
+		/* Remove the origin tracking if exists. */
+		snprintf(originname, sizeof(originname), "pg_%u", subid);
+		originid = replorigin_by_name(originname, true);
+		if (originid != InvalidRepOriginId)
+			replorigin_drop(originid);
 
-	wrconn = walrcv_connect(conninfo, true, subname, &err);
-	if (wrconn == NULL)
-		ereport(ERROR,
-(errmsg("could not connect to publisher when attempting to "
-		"drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err)));
+		/* Do the follwoing only if the user asked to actually drop the slot */
+		if (stmt->drop_slot)
+		{
+			/*
+			 * Drop the replication slot at the publisher node using the
+			 * replication connection.
+			 */
+			load_file("libpqwalreceiver", false);
 
-	if (!walrcv_command(wrconn, cmd.data, &err))
-		ereport(ERROR,
-(errmsg("could not drop the replication slot \"%s\" on publisher",
-		slotname),
- errdetail("The error was: %s", err)));
-	else
-		ereport(NOTICE,
-(errmsg("dropped replication slot \"%s\" on publisher",
-		slotname)));
+			initStringInfo(&cmd);
+			appendStringInfo(&cmd, "DROP_REPLICATION_SLOT \"%s\"", slotname);
+
+			wrconn = walrcv_connect(conninfo, true, subname, &err);
+			if (wrconn == NULL)
+ereport(ERROR,
+		(errmsg("could not connect to publisher when attempting to "
+"drop the replication slot \"%s\"", slotname),
+		 errdetail("The error was: %s", err)));
 
-	walrcv_disconnect(wrconn);
+			else if (!walrcv_command(wrconn, cmd.data, &err))
+ereport(ERROR,
+		(errmsg("could not drop the replication slot \"%s\" on publisher",
+slotname),
+		 errdetail("The error was: %s", err)));
+
+			ereport(NOTICE,
+	(errmsg("dropped replication slot \"%s\" on publisher",
+			slotname)));
 
-	pfree(cmd.data);
+			walrcv_disconnect(wrconn);
+			pfree(cmd.data);
+		}
+	}
+	PG_CATCH();
+	{
+		LWLockRelease(LogicalRepLauncherLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
 
+	LWLockRelease(LogicalRepLauncherLock);
 	heap_close(rel, NoLock);
 }
 
-- 
2.9.2



Re: [HACKERS] sequence data type

2017-02-01 Thread Michael Paquier
On Wed, Feb 1, 2017 at 10:02 AM, Michael Paquier
 wrote:
> On Wed, Feb 1, 2017 at 1:11 AM, Peter Eisentraut
>  wrote:
>> And here is a rebased patch for the original feature.  I think this
>> addresses all raised concerns and suggestions now.
>
> Thanks for the new version. That looks good to me after an extra lookup.

Moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-02-01 Thread Jim Nasby

On 2/1/17 4:27 PM, Andres Freund wrote:
> On 2017-02-02 09:22:46 +0900, Michael Paquier wrote:
>> On Thu, Feb 2, 2017 at 9:17 AM, Jim Nasby  wrote:
>>> Speaking of which... I have a meeting in 15 minutes to discuss moving to a
>>> server with 4TB of memory. With current limits shared buffers maxes at 16TB,
>>> which isn't all that far in the future. While 16TB of shared buffers might
>>> not be a good idea, it's not going to be terribly long before we start
>>> getting questions about it.
>>
>> Time for int64 GUCs?
>
> I don't think the GUC bit is the hard part.  We'd possibly need some
> trickery (like not storing bufferid in BufferDesc anymore) to avoid
> increasing memory usage.

Before doing that the first thing to look at would be why the limit is
currently INT_MAX / 2 instead of INT_MAX.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] [Bug fix] PQsendQuery occurs error when target_session_attrs is set to read-write

2017-02-01 Thread Higuchi, Daisuke
From: Ashutosh Bapat [mailto:ashutosh.ba...@enterprisedb.com] 
> Per the documentation [1], "PQgetResult must be called repeatedly
> until it returns a null pointer, indicating that the command is
> done.". The code in PQgetResult() under CONNECTION_CHECK_WRITABLE
> case, violates that. The patch fixes it. The patch however jumps to
> keep_going without changing conn->status, which means that it will end
> up again in the same case. While that's fine, may be we should use a
> local loop on PQgetResult() to keep the code readable.
Thank you for reviewing the patch. 
I created it with reference to pqSetenvPoll() in 
interfaces/libpq/fe-protocol2.c, 
but I certainly agree that the readability is not good.
I updated the patch, so I will add this to the next commitfest. 

Regards, 
Daisuke, Higuchi



PQsendQuery_for_target_session_attrs_v2.patch
Description: PQsendQuery_for_target_session_attrs_v2.patch



Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-02-01 Thread Kyotaro HORIGUCHI
Thank you for the comment.

At Thu, 2 Feb 2017 01:26:03 +0900, Fujii Masao  wrote in 

Re: [HACKERS] WAL consistency check facility

2017-02-01 Thread Robert Haas
On Tue, Jan 31, 2017 at 9:31 PM, Michael Paquier
 wrote:
> On Wed, Feb 1, 2017 at 1:06 AM, Robert Haas  wrote:
>> On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
>>  wrote:
>>> I've attached the patch with the modified changes. PFA.
>>
>> Can this patch check contrib/bloom?
>
> Only full pages are applied at redo by the generic WAL facility. So
> you would finish by comparing a page with itself in generic_redo.

generic_redo is more complicated than just restoring an FPI.  I admit
that generic_redo *probably* has no bugs, but I don't see why it would
hurt to allow it to be checked.  It's not IMPOSSIBLE that restoring
the page delta could go wrong somehow.

>> +/*
>> + * For a speculative tuple, the content of t_ctid is conflicting
>> + * between the backup page and current page. Hence, we set it
>> + * to the current block number and current offset.
>> + */
>>
>> Why does it differ?  Is that a bug?
>
> This has been discussed twice in this thread, once by me, once by Alvaro:
> https://www.postgresql.org/message-id/CAM3SWZQC8nUgp8SjKDY3d74VLpdf9puHc7-n3zf4xcr_bghPzg%40mail.gmail.com
> https://www.postgresql.org/message-id/CAB7nPqQTLGvn_XePjS07kZMMw46kS6S7LfsTocK%2BgLpTN0bcZw%40mail.gmail.com

Sorry, I missed/forgot about that.  I think this is evidence that the
comment isn't really good enough.  It says "hey, this might differ"
... with no real explanation of why it might differ or why that's OK.
If there were an explanation there, I wouldn't have flagged it.

>> +/*
>> + * Leave if no masking functions defined, this is possible in the case
>> + * resource managers generating just full page writes, comparing an
>> + * image to itself has no meaning in those cases.
>> + */
>> +if (RmgrTable[rmid].rm_mask == NULL)
>> +return;
>>
>> ...and also...
>>
>> +/*
>> + * This feature is enabled only for the resource managers where
>> + * a masking function is defined.
>> + */
>> +for (i = 0; i <= RM_MAX_ID; i++)
>> +{
>> +if (RmgrTable[i].rm_mask != NULL)
>>
>> Why do we assume that the feature is only enabled for RMs that have a
>> mask function?  Why not instead assume that if there's no masking
>> function, no masking is required?
>
> Not all RMs work on full pages. Tracking in smgr.h the list of RMs
> that are no-ops when masking things is easier than having empty
> routines declared all over the code base. Don't you think so?

Sure, but I don't think that's what the code is doing.  If an RM is a
no-op when masking things, it must define an empty function.  If it
defines no function, checking is disabled.  I think we want to be able
to check any RM.  No?
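
A minimal sketch of the alternative being suggested, assuming two local page
buffers (replay_image and primary_image) that are not in the original patch:

	/* mask only when the RM provides a masking routine */
	if (RmgrTable[rmid].rm_mask != NULL)
	{
		RmgrTable[rmid].rm_mask(replay_image, blkno);
		RmgrTable[rmid].rm_mask(primary_image, blkno);
	}

	/* compare unconditionally; a missing mask just means "compare as-is" */
	if (memcmp(replay_image, primary_image, BLCKSZ) != 0)
		elog(FATAL, "inconsistent page found, blkno %u", blkno);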

>> +void(*rm_mask) (char *page, BlockNumber blkno);
>>
>> Could the page be passed as type "Page" rather than a "char *" to make
>> things more convenient for the masking functions?  If not, could those
>> functions at least do something like "Page page = (Page) pagebytes;"
>> rather than "Page page_norm = (Page) page;"?
>
> xlog_internal.h is used as well by frontends, this makes the header
> dependencies cleaner. (I have looked at using Page when hacking this
> stuff, but the header dependencies are not worth it, I don't recall
> all the details though this was a couple of months back).

OK.  I still think page_norm is a lame variable name.
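
For illustration, a mask callback under the char * signature discussed above
could look like this (a sketch with an illustrative name, not code taken from
the patch); the cast to Page is purely local:

static void
foo_mask(char *pagedata, BlockNumber blkno)
{
	Page		page = (Page) pagedata;

	/* hide LSN differences between the two page images being compared */
	PageSetLSN(page, (XLogRecPtr) 0);

	/* ... any other RM-specific masking (hint bits, unused space, ...) ... */
}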

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-02-01 Thread Andres Freund
On 2017-02-01 20:38:58 -0500, Robert Haas wrote:
> On Wed, Feb 1, 2017 at 8:35 PM, Andres Freund  wrote:
> > On 2017-02-01 20:30:30 -0500, Robert Haas wrote:
> >> On Wed, Feb 1, 2017 at 7:28 PM, Andres Freund  wrote:
> >> > On 2016-11-28 11:40:53 -0800, Jim Nasby wrote:
> >> >> With current limits, the most bgwriter can do (with 8k pages) is 1000 
> >> >> pages
> >> >> * 100 times/sec = 780MB/s. It's not hard to exceed that with modern
> >> >> hardware. Should we increase the limit on bgwriter_lru_maxpages?
> >> >
> >> > FWIW, I think working on replacing bgwriter (e.g. by working on the
> >> > patch I send with a POC replacement) wholesale is a better approach than
> >> > spending time increasing limits.
> >>
> >> I'm happy to see it replaced, but increasing the limits is about three
> >> orders of magnitude less work than replacing it, so let's not block
> >> this on the theory that the other thing would be better.
> >
> > I seriously doubt you can meaningfully exceed 780MB/s with the current
> > bgwriter. So it's not like the limits are all that relevant right now.
> 
> Sigh.  The patch is harmless and there are 4 or 5 votes in favor of
> it, one of which clearly states that the person involved has seen
> it be a problem in real workloads.  Do we really have to argue about
> this?

I don't mind increasing the limit, it's harmless. I just seriously doubt
it actually addresses any sort of problem.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-02-01 Thread Robert Haas
On Wed, Feb 1, 2017 at 8:35 PM, Andres Freund  wrote:
> On 2017-02-01 20:30:30 -0500, Robert Haas wrote:
>> On Wed, Feb 1, 2017 at 7:28 PM, Andres Freund  wrote:
>> > On 2016-11-28 11:40:53 -0800, Jim Nasby wrote:
>> >> With current limits, the most bgwriter can do (with 8k pages) is 1000 
>> >> pages
>> >> * 100 times/sec = 780MB/s. It's not hard to exceed that with modern
>> >> hardware. Should we increase the limit on bgwriter_lru_maxpages?
>> >
>> > FWIW, I think working on replacing bgwriter (e.g. by working on the
>> > patch I send with a POC replacement) wholesale is a better approach than
>> > spending time increasing limits.
>>
>> I'm happy to see it replaced, but increasing the limits is about three
>> orders of magnitude less work than replacing it, so let's not block
>> this on the theory that the other thing would be better.
>
> I seriously doubt you can meaningfully exceed 780MB/s with the current
> bgwriter. So it's not like the limits are all that relevant right now.

Sigh.  The patch is harmless and there are 4 or 5 votes in favor of
it, one of which clearly states that the person involved has seen
it be a problem in real workloads.  Do we really have to argue about
this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-02-01 Thread Andres Freund
On 2017-02-01 20:30:30 -0500, Robert Haas wrote:
> On Wed, Feb 1, 2017 at 7:28 PM, Andres Freund  wrote:
> > On 2016-11-28 11:40:53 -0800, Jim Nasby wrote:
> >> With current limits, the most bgwriter can do (with 8k pages) is 1000 pages
> >> * 100 times/sec = 780MB/s. It's not hard to exceed that with modern
> >> hardware. Should we increase the limit on bgwriter_lru_maxpages?
> >
> > FWIW, I think working on replacing bgwriter (e.g. by working on the
> > patch I send with a POC replacement) wholesale is a better approach than
> > spending time increasing limits.
> 
> I'm happy to see it replaced, but increasing the limits is about three
> orders of magnitude less work than replacing it, so let's not block
> this on the theory that the other thing would be better.

I seriously doubt you can meaningfully exceed 780MB/s with the current
bgwriter. So it's not like the limits are all that relevant right now.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-02-01 Thread Robert Haas
On Wed, Feb 1, 2017 at 7:28 PM, Andres Freund  wrote:
> On 2016-11-28 11:40:53 -0800, Jim Nasby wrote:
>> With current limits, the most bgwriter can do (with 8k pages) is 1000 pages
>> * 100 times/sec = 780MB/s. It's not hard to exceed that with modern
>> hardware. Should we increase the limit on bgwriter_lru_maxpages?
>
> FWIW, I think working on replacing bgwriter (e.g. by working on the
> patch I send with a POC replacement) wholesale is a better approach than
> spending time increasing limits.

I'm happy to see it replaced, but increasing the limits is about three
orders of magnitude less work than replacing it, so let's not block
this on the theory that the other thing would be better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-02-01 Thread Andres Freund
On 2016-11-28 11:40:53 -0800, Jim Nasby wrote:
> With current limits, the most bgwriter can do (with 8k pages) is 1000 pages
> * 100 times/sec = 780MB/s. It's not hard to exceed that with modern
> hardware. Should we increase the limit on bgwriter_lru_maxpages?
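
(For reference, the ceiling quoted above works out as follows, assuming the
default 8 kB block size and the 10 ms minimum bgwriter_delay, i.e. 100
bgwriter rounds per second:

    1000 pages/round x 8 kB/page x 100 rounds/s = 800,000 kB/s, roughly 780 MiB/s.)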

FWIW, I think working on replacing bgwriter (e.g. by working on the
patch I send with a POC replacement) wholesale is a better approach than
spending time increasing limits.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-02-01 Thread Andres Freund
On 2017-02-02 09:22:46 +0900, Michael Paquier wrote:
> On Thu, Feb 2, 2017 at 9:17 AM, Jim Nasby  wrote:
> > Speaking of which... I have a meeting in 15 minutes to discuss moving to a
> > server with 4TB of memory. With current limits shared buffers maxes at 16TB,
> > which isn't all that far in the future. While 16TB of shared buffers might
> > not be a good idea, it's not going to be terribly long before we start
> > getting questions about it.
> 
> Time for int64 GUCs?

I don't think the GUC bit is the hard part.  We'd possibly need some
trickery (like not storing bufferid in BufferDesc anymore) to avoid
increasing memory usage.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-02-01 Thread Michael Paquier
On Thu, Feb 2, 2017 at 9:17 AM, Jim Nasby  wrote:
> Speaking of which... I have a meeting in 15 minutes to discuss moving to a
> server with 4TB of memory. With current limits shared buffers maxes at 16TB,
> which isn't all that far in the future. While 16TB of shared buffers might
> not be a good idea, it's not going to be terribly long before we start
> getting questions about it.

Time for int64 GUCs?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-02-01 Thread Jim Nasby

On 2/1/17 3:36 PM, Michael Paquier wrote:

On Thu, Feb 2, 2017 at 7:01 AM, Jim Nasby  wrote:

On 2/1/17 10:27 AM, Robert Haas wrote:

This looks fine to me.

This could go without the comments, they are likely going to be
forgotten if any updates happen in the future.


I'm confused... I put the comments in there so if max shared buffers 
ever changed the other one would hopefully be updated as well.


Speaking of which... I have a meeting in 15 minutes to discuss moving to 
a server with 4TB of memory. With current limits shared buffers maxes at 
16TB, which isn't all that far in the future. While 16TB of shared 
buffers might not be a good idea, it's not going to be terribly long 
before we start getting questions about it.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION

2017-02-01 Thread Michael Paquier
On Thu, Feb 2, 2017 at 2:14 AM, Fujii Masao  wrote:
> The lwlock would be released when an exception occurs, so I don't think
> that TRY-CATCH is necessary here. Or it's necessary for another reason?

+PG_CATCH();
+{
+LWLockRelease(LogicalRepLauncherLock);
+PG_RE_THROW();
+}
+PG_END_TRY();
Just to do that, a TRY/CATCH block looks like an overkill to me. Why
not just call LWLockRelease in the ERROR and return code paths?
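
A minimal sketch of that alternative (hypothetical lookup and variable names,
not the actual patch): release the lock explicitly before erroring out or
returning, instead of wrapping the work in PG_TRY/PG_CATCH just to release it:

LogicalRepWorker *worker;		/* hypothetical */

LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE);

worker = logicalrep_worker_find(subid);		/* hypothetical lookup */
if (worker == NULL)
{
	LWLockRelease(LogicalRepLauncherLock);
	ereport(ERROR,
			(errmsg("logical replication worker for subscription %u not found",
					subid)));
}

/* ... do the actual work, releasing the lock on any early return ... */

LWLockRelease(LogicalRepLauncherLock);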
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TRAP: FailedAssertion("!(hassrf)", File: "nodeProjectSet.c", Line: 180)

2017-02-01 Thread Andres Freund
Hi,

On 2017-02-02 00:09:03 +0100, Erik Rijkers wrote:
> Something is broken in HEAD:

Hm. Indeed.

> drop table if exists t;
> create table t(c text);
> insert into t (c) values ( 'abc' ) ;
> 
> select
>   regexp_split_to_array(
>   regexp_split_to_table(
> c
>   , chr(13) || chr(10)  )
>   , '","' )
>   as a
>  ,
>   regexp_split_to_table(
>   c
> , chr(13) || chr(10)
>  )
>   as rw
> from t
> ;
> 
> TRAP: FailedAssertion("!(hassrf)", File: "nodeProjectSet.c", Line: 180)

> I realise the regexp* functions aren't doing anything particularly useful
> anymore here; they did in the more complicated original (which I had used
> for years).

The issue is that we're generating ProjectSet nodes instead of Result
for the top-level nodes - but we currently assume that ProjectSet nodes
actually contain sets.   I'm tentatively in favor of generating proper
Result nodes in this case, rather than allowing this in ProjectSet - on
the other hand, it works after removing that Assert().

Tom, do you have an opinion?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-02-01 Thread Michael Paquier
On Thu, Feb 2, 2017 at 7:01 AM, Jim Nasby  wrote:
> On 2/1/17 10:27 AM, Robert Haas wrote:
>> This looks fine to me.

This could go without the comments, they are likely going to be
forgotten if any updates happen in the future.

> If someone wants to proactively commit this, the CF entry is
> https://commitfest.postgresql.org/13/979/.
> (BTW, the Jan. CF is still showing as in-progress...)

WIP.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] TRAP: FailedAssertion("!(hassrf)", File: "nodeProjectSet.c", Line: 180)

2017-02-01 Thread Erik Rijkers

Something is broken in HEAD:


drop table if exists t;
create table t(c text);
insert into t (c) values ( 'abc' ) ;

select
  regexp_split_to_array(
  regexp_split_to_table(
c
  , chr(13) || chr(10)  )
  , '","' )
  as a
 ,
  regexp_split_to_table(
  c
, chr(13) || chr(10)
 )
  as rw
from t
;

TRAP: FailedAssertion("!(hassrf)", File: "nodeProjectSet.c", Line: 180)


I realise the regexp* functions aren't doing anything particularly 
useful anymore here; they did in the more complicated original (which I 
had used for years).


thanks,

Erik Rijkers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cast jsonb to numeric, int, float, bool

2017-02-01 Thread Nikita Glukhov

On 02.02.2017 01:07, Jim Nasby wrote:

On 2/1/17 8:26 AM, Nikita Glukhov wrote:

Some comments about the code: I think it would be better to
 * add function for extraction of scalars from pseudo-arrays
 * iterate until WJB_DONE to pfree iterator


I'm not sure what your intent here is, but if the idea is that a json array
would magically cast to a bool or a number data type I think that's a bad idea.


My idea, of course, is not about casting any json array to a scalar.  It is only
about a helper subroutine for extracting top-level jsonb scalars, which are always
stored as one-element pseudo-arrays with the special flag F_SCALAR in the array
header.
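
A minimal sketch of that kind of helper (hypothetical name and shape, not the
proposed patch), which also iterates to WJB_DONE so the iterator memory is
released:

static bool
jsonb_extract_scalar(Jsonb *jb, JsonbValue *result)
{
	JsonbIterator *it;
	JsonbIteratorToken tok;
	JsonbValue	tmp;

	if (!JB_ROOT_IS_SCALAR(jb))
		return false;

	it = JsonbIteratorInit(&jb->root);

	tok = JsonbIteratorNext(&it, &tmp, true);	/* WJB_BEGIN_ARRAY */
	Assert(tok == WJB_BEGIN_ARRAY);

	tok = JsonbIteratorNext(&it, result, true);	/* WJB_ELEM: the scalar itself */
	Assert(tok == WJB_ELEM);

	/* consume WJB_END_ARRAY and WJB_DONE so the iterator is freed */
	while ((tok = JsonbIteratorNext(&it, &tmp, true)) != WJB_DONE)
		;

	return true;
}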

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-02-01 Thread Claudio Freire
On Wed, Feb 1, 2017 at 6:13 PM, Masahiko Sawada  wrote:
> On Wed, Feb 1, 2017 at 10:02 PM, Claudio Freire  
> wrote:
>> On Wed, Feb 1, 2017 at 5:47 PM, Masahiko Sawada  
>> wrote:
>>> Thank you for updating the patch.
>>>
>>> Whole patch looks good to me except for the following one comment.
>>> This is the final comment from me.
>>>
>>> /*
>>>  *  lazy_tid_reaped() -- is a particular tid deletable?
>>>  *
>>>  *  This has the right signature to be an IndexBulkDeleteCallback.
>>>  *
>>>  *  Assumes dead_tuples array is in sorted order.
>>>  */
>>> static bool
>>> lazy_tid_reaped(ItemPointer itemptr, void *state)
>>> {
>>> LVRelStats *vacrelstats = (LVRelStats *) state;
>>>
>>> You might want to update the comment of lazy_tid_reaped() as well.
>>
>> I don't see the mismatch with reality there (if you consider
>> "dead_tples array" in the proper context, that is, the multiarray).
>>
>> What in particular do you find out of sync there?
>
> The current lazy_tid_reaped just finds a tid in a single tid array using
> bsearch, but in your patch lazy_tid_reaped handles multiple tid arrays
> and the processing becomes more complicated. So I thought it would be better
> to update the description of this function.

Alright, updated with some more remarks that seemed relevant
From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Mon, 12 Sep 2016 23:36:42 -0300
Subject: [PATCH] Vacuum: allow using more than 1GB work mem

Turn the dead_tuples array into a structure composed of several
exponentially bigger arrays, to enable usage of more than 1GB
of work mem during vacuum and thus reduce the number of full
index scans necessary to remove all dead tids when the memory is
available.
---
 src/backend/commands/vacuumlazy.c| 318 ---
 src/test/regress/expected/vacuum.out |   8 +
 src/test/regress/sql/vacuum.sql  |  10 ++
 3 files changed, 271 insertions(+), 65 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 005440e..4a6895c 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -12,11 +12,24 @@
  *
  * We are willing to use at most maintenance_work_mem (or perhaps
  * autovacuum_work_mem) memory space to keep track of dead tuples.  We
- * initially allocate an array of TIDs of that size, with an upper limit that
+ * initially allocate an array of TIDs of 128MB, or an upper limit that
  * depends on table size (this limit ensures we don't allocate a huge area
- * uselessly for vacuuming small tables).  If the array threatens to overflow,
- * we suspend the heap scan phase and perform a pass of index cleanup and page
- * compaction, then resume the heap scan with an empty TID array.
+ * uselessly for vacuuming small tables). Additional arrays of increasingly
+ * large sizes are allocated as they become necessary.
+ *
+ * The TID array is thus represented as a list of multiple segments of
+ * varying size, beginning with the initial size of up to 128MB, and growing
+ * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
+ * is used up.
+ *
+ * Lookup in that structure proceeds sequentially in the list of segments,
+ * and with a binary search within each segment. Since segment's size grows
+ * exponentially, this retains O(N log N) lookup complexity.
+ *
+ * If the array threatens to overflow, we suspend the heap scan phase and
+ * perform a pass of index cleanup and page compaction, then resume the heap
+ * scan with an array of logically empty but already preallocated TID segments
+ * to be refilled with more dead tuple TIDs.
  *
  * If we're processing a table with no indexes, we can just vacuum each page
  * as we go; there's no need to save up multiple tuples to minimize the number
@@ -93,6 +106,14 @@
 #define LAZY_ALLOC_TUPLES		MaxHeapTuplesPerPage
 
 /*
+ * Minimum (starting) size of the dead_tuples array segments. Will allocate
+ * space for 128MB worth of tid pointers in the first segment, further segments
+ * will grow in size exponentially. Don't make it too small or the segment list
+ * will grow bigger than the sweetspot for search efficiency on big vacuums.
+ */
+#define LAZY_MIN_TUPLES		Max(MaxHeapTuplesPerPage, (128<<20) / sizeof(ItemPointerData))
+
+/*
  * Before we consider skipping a page that's marked as clean in
  * visibility map, we must've seen at least this many clean pages.
  */
@@ -104,6 +125,27 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)
 
+typedef struct DeadTuplesSegment
+{
+	int			num_dead_tuples;	/* # of entries in the segment */
+	int			max_dead_tuples;	/* # of entries allocated in the segment */
+	ItemPointerData last_dead_tuple;	/* Copy of the last dead tuple (unset
+		 * until the segment is fully
+		 * populated) */
+	unsigned short padding;
+	ItemPointer dt_tids;	/* Array of dead tuples */

Re: [HACKERS] multivariate statistics (v19)

2017-02-01 Thread Alvaro Herrera
Still looking at 0002.

pg_ndistinct_in disallows input, claiming that pg_node_tree does the
same thing.  But pg_node_tree does it for security reasons: you could
crash the backend if you supplied a malicious value.  I don't think that
applies to pg_ndistinct_in.  Perhaps it will be useful to inject fake
stats at some point, so why not allow it?  It shouldn't be complicated
(though it does require writing some additional code, so perhaps that's
one reason we don't want to allow input of these values).

The comment on top of pg_ndistinct_out is missing the "_out"; also it
talks about histograms, which is not what this is about.

In the same function, a trivial point you don't need to pstrdup() the
.data out of a stringinfo; it's already palloc'ed in the right context
-- just PG_RETURN_CSTRING(str.data) and forget about "ret".  Saves you
one line.
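
For example (a sketch of the simplification, with the body of pg_ndistinct_out
abbreviated):

Datum
pg_ndistinct_out(PG_FUNCTION_ARGS)
{
	StringInfoData str;

	initStringInfo(&str);
	/* ... append the text representation of the ndistinct coefficients ... */

	/* str.data is already palloc'd in the current context: no pstrdup() */
	PG_RETURN_CSTRING(str.data);
}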

Nearby, some auxiliary functions such as n_choose_k and num_combinations
are not documented.  What is it that they do?  I'd move these to the end
of the file, keeping the important entry points at the top of the file.

I see this patch has an estimate_ndistinct() which claims to be a re-
implementation of code already in analyze.c, but it is actually a lot
simpler than what analyze.c does.  I've been wondering if it'd be a good
idea to use some of this code so that some routines are moved out of
analyze.c; good implementations of statistics-related functions would
live in src/backend/statistics/ where they can be used both by analyze.c
and your new mvstats stuff.  (More generally I am beginning to wonder if
the new directory should be just src/backend/statistics.)

common.h does not belong in src/backend/utils/mvstats; IMO it should be
called src/include/utils/mvstat.h.  Also, it must not include
postgres.h, and it probably doesn't need most of the #includes it has;
those are better put into whatever include it.  It definitely needs a
guarding #ifdef MVSTATS_H around its whole content too.  An include file
is not just a way to avoid #includes in other files; it is supposed to
be a minimally invasive way of exporting the structs and functions
implemented in some file into other files.  So it must be kept minimal.

psql/tab-complete.c compares the wrong version number (9.6 instead of
10).

Is it important to have a cast from pg_ndistinct to bytea?  I think
it's odd that outputting it as bytea yields something completely
different than as text.  (The bytea is not human readable and cannot be
used for future input, so what is the point?)


In another subthread you seem to have surrendered to the opinion that
the new catalog should be called pg_statistics_ext, just in case in the
future we come up with additional things to put on it.  However, given
its schema, with a "starelid / stakeys", is it sensible to think that
we're going to get anything other than something that involves multiple
variables?  Maybe it should just be "pg_statistics_multivar" and if
something else comes along we create another catalog with an appropriate
schema.  Heck, how does this catalog serve the purpose of cross-table
statistics in the first place, given that it has room to record a single
relid only?  Are you thinking that in the future you'd change starelid
into an oidvector column?

The comment in gram.y about the CREATE STATISTICS is at odds with what
is actually allowed by the grammar.

I think the name of a statistics is only useful to DROP/ALTER it, right?
I wonder why it's useful that statistics belongs in a schema.  Perhaps
it should be a global object?  I suppose the name collisions would
become bothersome if you have many mvstats.  

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cast jsonb to numeric, int, float, bool

2017-02-01 Thread Jim Nasby

On 2/1/17 8:26 AM, Nikita Glukhov wrote:

If you find it useful, I can also add support of json and other types,
such as smallint and bigint.


Yes, I'd like to have support for other types and maybe for json.


There's been previous discussion about something similar, which would be 
better supporting "casting an unknown to smallint". IIRC there's some 
non-trivial problem with trying to support that.



Some comments about the code: I think it would be better to
 * add function for extraction of scalars from pseudo-arrays
 * iterate until WJB_DONE to pfree iterator


I'm not sure what your intent here is, but if the idea is that a json 
array would magically cast to a bool or a number data type I think 
that's a bad idea.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-01 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> I think what we ought to do about this is invent additional API
>> functions, say
>> 
>> Oid CatalogTupleInsertWithInfo(Relation heapRel, HeapTuple tup,
>> CatalogIndexState indstate);
>> void CatalogTupleUpdateWithInfo(Relation heapRel, ItemPointer otid,
>> HeapTuple tup, CatalogIndexState indstate);
>> 
>> and use these in place of simple_heap_foo plus CatalogIndexInsert
>> in the places where this optimization had been applied.

> This looks reasonable enough to me.

Done.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-02-01 Thread Jim Nasby

On 2/1/17 10:27 AM, Robert Haas wrote:

On Tue, Jan 31, 2017 at 5:07 PM, Jim Nasby  wrote:

On 11/29/16 9:58 AM, Jeff Janes wrote:

Considering a single SSD can do 70% of that limit, I would say
yes.

Next question becomes... should there even be an upper limit?


Where the contortions needed to prevent calculation overflow become
annoying?

I'm not a big fan of nannyism in general, but the limits on this
parameter seem particularly pointless.  You can't write out more buffers
than exist in the dirty state, nor more than implied
> by bgwriter_lru_multiplier.  So what is really the worst that can happen
if you make it too high?



Attached is a patch that ups the limit to INT_MAX / 2, which is the same as
shared_buffers.


This looks fine to me.


If someone wants to proactively commit this, the CF entry is 
https://commitfest.postgresql.org/13/979/. (BTW, the Jan. CF is still 
showing as in-progress...)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of replication commands using printsimple

2017-02-01 Thread Michael Paquier
On Thu, Feb 2, 2017 at 4:05 AM, Robert Haas  wrote:
> On Tue, Jan 31, 2017 at 8:26 PM, Michael Paquier
>  wrote:
>> pq_sendcountedtext() does some encoding conversion, which is why I
>> haven't used because we deal only with integers in this patch... Now
>> if you wish to switch to that I have really no arguments against.
>
> OK, I see.  Well, I guess it's sensible either way, but I've committed
> this version.  Thanks.

Thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-01 Thread Jim Nasby

On 2/1/17 12:03 PM, Fabien COELHO wrote:

I'm unsure whether it is a good idea, I like terse interfaces, but this
is just an opinion.


I think the issue here is that the original case for this is a user 
accidentally getting into an \if and then having no clue what's going 
on. That's similar to what happens when you miss a quote or a semicolon. 
We handle those cases with %R, and I think %R needs to support if as well.


Perhaps there's value to providing more info (active branch, etc), but 
ISTM trying to do that will just confuse the original (%R) case.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Make psql's \set display variables in alphabetical order.

2017-02-01 Thread David Fetter
On Wed, Feb 01, 2017 at 04:38:59PM -0500, Robert Haas wrote:
> On Wed, Feb 1, 2017 at 1:08 PM, Andres Freund  wrote:
> > On 2017-02-01 12:59:36 -0500, Tom Lane wrote:
> >> David Fetter  writes:
> >> > On Wed, Feb 01, 2017 at 04:25:25PM +, Tom Lane wrote:
> >> >> Make psql's \set display variables in alphabetical order.
> >>
>> > This is a substantial usability improvement which is nevertheless hard
>> > to imagine changing things that scripts relied on. I think it's worth
> >> > back-patching.
> >>
> >> I'm not that excited about it personally, but I agree it would be unlikely
> >> to break anything.  Other votes?
> >
> > -0.5 - I see very little reason to backpatch this over dozens of other
> > changes.
> 
> I'll vote -1. I don't think this is a bug fix.

I withdraw my suggestion.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-02-01 Thread Jim Nasby

On 1/27/17 4:14 AM, Greg Stark wrote:

On 25 January 2017 at 20:06, Jim Nasby  wrote:

GUCs support SET LOCAL, but that's not the same as local scoping because the
setting stays in effect unless the subtransaction aborts. What I'd like is the
ability to set a GUC in a plpgsql block *and have the setting revert on
block exit*.


I'm wondering which GUCs you have in mind to use this with.


It's been quite some time since I messed with this; the only case I 
remember right now is wanting to do a temporary SET ROLE in an "exec" 
function:


SELECT tools.exec( 'some sql;', role := 'superuser_role' );

To do that, exec has to remember what the current role is and then set 
it back (as well as remembering to do SET LOCAL in case an error happens).



Because what you're describing is dynamic scoping and I'm wondering if
what you're really looking for is lexical scoping. That would be more
in line with how PL/PgSQL variables are scoped and with how #pragmas
usually work. But it would probably not be easy to reconcile with how
GUCs work.


Right, because GUCs aren't even simply dynamically scoped; they're 
dynamically scoped with transaction support.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Make psql's \set display variables in alphabetical order.

2017-02-01 Thread Robert Haas
On Wed, Feb 1, 2017 at 1:08 PM, Andres Freund  wrote:
> On 2017-02-01 12:59:36 -0500, Tom Lane wrote:
>> David Fetter  writes:
>> > On Wed, Feb 01, 2017 at 04:25:25PM +, Tom Lane wrote:
>> >> Make psql's \set display variables in alphabetical order.
>>
>> > This is a substantial usability improvement which is nevertheless hard
>> > to imagine changing things that scripts relied on. I think it's worth
>> > back-patching.
>>
>> I'm not that excited about it personally, but I agree it would be unlikely
>> to break anything.  Other votes?
>
> -0.5 - I see very little reason to backpatch this over dozens of other
> changes.

I'll vote -1. I don't think this is a bug fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical decoding of two-phase transactions

2017-02-01 Thread Craig Ringer
On 2 Feb. 2017 08:32, "Tom Lane"  wrote:

Robert Haas  writes:
> Also, including the GID in the WAL for each COMMIT/ABORT PREPARED
> doesn't seem inordinately expensive to me.

I'm confused ... isn't it there already?  If not, how do we handle
reconstructing 2PC state from WAL at all?


Right. Per my comments upthread I don't see why we need to add anything
more to WAL here.

Stas was concerned about what happens in logical decoding if we crash
between PREPARE TRANSACTION and COMMIT PREPARED. But we'll always go back
and decode the whole txn again anyway so it doesn't matter.

We can just track it on ReorderBufferTxn when we see it at PREPARE
TRANSACTION time.


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-01 Thread Alvaro Herrera
Tom Lane wrote:

> The source of both of those problems is that in some places, we
> did CatalogOpenIndexes and then used the CatalogIndexState for
> multiple tuple inserts/updates before doing CatalogCloseIndexes.
> The patch dealt with these either by not touching them, just
> leaving the simple_heap_insert/update calls in place (thus failing
> to create any abstraction), or by blithely ignoring the optimization
> and doing s/simple_heap_insert/CatalogTupleInsert/ anyway.  For example,
> in inv_api.c we are now doing a CatalogOpenIndexes/CatalogCloseIndexes
> cycle for each chunk of the large object ... and just to add insult to
> injury, the now-useless open/close calls outside the loop are still there.

Ouch.  You're right, I missed that.

> I think what we ought to do about this is invent additional API
> functions, say
> 
> Oid CatalogTupleInsertWithInfo(Relation heapRel, HeapTuple tup,
>CatalogIndexState indstate);
> void CatalogTupleUpdateWithInfo(Relation heapRel, ItemPointer otid,
> HeapTuple tup, CatalogIndexState indstate);
> 
> and use these in place of simple_heap_foo plus CatalogIndexInsert
> in the places where this optimization had been applied.

This looks reasonable enough to me.

> An alternative but much more complicated fix would be to get rid of
> the necessity for callers to worry about this at all, by caching
> a CatalogIndexState in the catalog's relcache entry.  That might be
> worth doing eventually (because it would allow sharing index info
> collection across unrelated operations) but I don't want to do it today.

Hmm, interesting idea.  No disagreement on postponing.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU integration

2017-02-01 Thread Tom Lane
I wrote:
> Evidently collateral damage from 2f5c9d9c9.  But I'd suggest waiting
> to fix it until you can also do s/simple_heap_delete/CatalogTupleDelete/
> as I proposed in
> https://www.postgresql.org/message-id/462.1485902...@sss.pgh.pa.us
> I'll go make that happen right now, now that I realize there are patch(es)
> waiting on it.

Done.  There's some more loose ends but they won't affect very many
call sites, so you should be able to rebase now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-02-01 Thread Masahiko Sawada
On Wed, Feb 1, 2017 at 10:02 PM, Claudio Freire  wrote:
> On Wed, Feb 1, 2017 at 5:47 PM, Masahiko Sawada  wrote:
>> Thank you for updating the patch.
>>
>> Whole patch looks good to me except for the following one comment.
>> This is the final comment from me.
>>
>> /*
>>  *  lazy_tid_reaped() -- is a particular tid deletable?
>>  *
>>  *  This has the right signature to be an IndexBulkDeleteCallback.
>>  *
>>  *  Assumes dead_tuples array is in sorted order.
>>  */
>> static bool
>> lazy_tid_reaped(ItemPointer itemptr, void *state)
>> {
>> LVRelStats *vacrelstats = (LVRelStats *) state;
>>
>> You might want to update the comment of lazy_tid_reaped() as well.
>
> I don't see the mismatch with reality there (if you consider
> "dead_tples array" in the proper context, that is, the multiarray).
>
> What in particular do you find out of sync there?

The current lazy_tid_reaped just finds a tid in a single tid array using
bsearch, but in your patch lazy_tid_reaped handles multiple tid arrays
and the processing becomes more complicated. So I thought it would be better
to update the description of this function.
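
The lookup being discussed is roughly: sequential over the segment list, then
a binary search inside one segment. A sketch only (the segment fields follow
the DeadTuplesSegment fragment posted upthread; everything else is assumed):

static bool
tid_reaped_sketch(ItemPointer itemptr,
				  DeadTuplesSegment *segments, int nsegments)
{
	int			i;

	for (i = 0; i < nsegments; i++)
	{
		DeadTuplesSegment *seg = &segments[i];

		/* segments hold ascending, disjoint tid ranges */
		if (ItemPointerCompare(itemptr, &seg->last_dead_tuple) > 0)
			continue;			/* past this segment, look at the next one */

		/* binary search within the one segment that can contain the tid */
		return bsearch(itemptr, seg->dt_tids, seg->num_dead_tuples,
					   sizeof(ItemPointerData), vac_cmp_itemptr) != NULL;
	}
	return false;
}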

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-02-01 Thread Robert Haas
On Mon, Jan 2, 2017 at 7:32 AM, Ashutosh Bapat
 wrote:
> PFA the patch (pg_dp_join_v6.patch) with some bugs fixed and rebased
> on the latest code.

Maybe not surprisingly given how fast things are moving around here
these days, this needs a rebase.

Apart from that, my overall comment on this patch is that it's huge:

 37 files changed, 7993 insertions(+), 287 deletions(-)

Now, more than half of that is regression test cases and their output,
which you will certainly be asked to pare down in any version of this
intended for commit. But even excluding those, it's still a fairly large
patch:

 30 files changed, 2783 insertions(+), 272 deletions(-)

I think the reason this is so large is because there's a fair amount
of refactoring work that has been done as a precondition of the actual
meat of the patch, and no attempt has been made to separate the
refactoring work from the main body of the patch.  I think that's
something that needs to be done.  If you look at the way Amit Langote
submitted the partitioning patches and the follow-up bug fixes, he had
a series of patches 0001-blah, 0002-quux, etc. generated using
format-patch.  Each patch had its own commit message written by him
explaining the purpose of that patch, links to relevant discussion,
etc.  If you can separate this into more digestible chunks it will be
easier to get committed.

Other questions/comments:

Why does find_partition_scheme need to copy the partition bound
information instead of just pointing to it?  Amit went to some trouble
to make sure that this can't change under us while we hold a lock on
the relation, and we'd better hold a lock on the relation if we're
planning a query against it.

I think the PartitionScheme stuff should live in the optimizer rather
that src/backend/catalog/partition.c.  Maybe plancat.c?  Perhaps we
eventually need a new file in the optimizer just for partitioning
stuff, but I'm not sure about that yet.

The fact that set_append_rel_size needs to reopen the relation to
extract a few more bits of information is not desirable.  You need to
fish this information through in some other way; for example, you
could have get_relation_info() stash the needed bits in the
RelOptInfo.

+* For two partitioned tables with the same
partitioning scheme, it is
+* assumed that the Oids of matching partitions from
both the tables
+* are placed at the same position in the array of
partition oids in

Rather than saying that we assume this, you should say why it has to
be true.  (If it doesn't have to be true, we shouldn't assume it.)

+* join relations. Partition tables should have same
layout as the
+* parent table and hence should not need any
translation. But rest of

The same attributes have to be present with the same types, but they
can be rearranged.  This comment seems to imply the contrary.

FRACTION_PARTS_TO_PLAN seems like it should be a GUC.

+   /*
+* Add this relation to the list of samples ordered by
the increasing
+* number of rows at appropriate place.
+*/
+   foreach (lc, ordered_child_nos)
+   {
+   int child_no = lfirst_int(lc);
+   RelOptInfo *other_childrel = rel->part_rels[child_no];
+
+   /*
+* Keep track of child with lowest number of
rows but higher than the
+* that of the child being inserted. Insert
the child before a
+* child with highest number of rows lesser than it.
+*/
+   if (child_rel->rows <= other_childrel->rows)
+   insert_after = lc;
+   else
+   break;
+   }

Can we use quicksort instead of a hand-coded insertion sort?
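
A sketch of that suggestion (variable names are illustrative, not from the
patch): collect the children in an array and qsort() it with a row-count
comparator:

static int
cmp_child_rel_rows(const void *a, const void *b)
{
	const RelOptInfo *rel_a = *(RelOptInfo *const *) a;
	const RelOptInfo *rel_b = *(RelOptInfo *const *) b;

	if (rel_a->rows < rel_b->rows)
		return -1;
	if (rel_a->rows > rel_b->rows)
		return 1;
	return 0;
}

/* then, wherever the ordered list was being built: */
qsort(child_rels, nchildren, sizeof(RelOptInfo *), cmp_child_rel_rows);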

+   if (bms_num_members(outer_relids) > 1)

Seems like bms_get_singleton_member could be used.

+* Partitioning scheme in join relation indicates a possibilty that the

Spelling.

There seems to be no reason for create_partition_plan to be separated
from create_plan_recurse.  You can just add another case for the new
path type.

Why does create_partition_join_path need to be separate from
create_partition_join_path_with_pathkeys?  Couldn't that be combined
into a single function with a pathkeys argument that might sometimes
be NIL?  I assume most of the logic is common.

From a sort of theoretical standpoint, the biggest danger of this
patch seems to be that by deferring path creation until a later stage
than normal, we could miss some important processing.
subquery_planner() does a lot of stuff after
expand_inherited_tables(); if any of those things, especially the ones
that happen AFTER path generation, have an effect on the paths, then
this code needs to compensate for those changes somehow.  It 

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-02-01 Thread Claudio Freire
On Wed, Feb 1, 2017 at 5:47 PM, Masahiko Sawada  wrote:
> Thank you for updating the patch.
>
> Whole patch looks good to me except for the following one comment.
> This is the final comment from me.
>
> /*
>  *  lazy_tid_reaped() -- is a particular tid deletable?
>  *
>  *  This has the right signature to be an IndexBulkDeleteCallback.
>  *
>  *  Assumes dead_tuples array is in sorted order.
>  */
> static bool
> lazy_tid_reaped(ItemPointer itemptr, void *state)
> {
> LVRelStats *vacrelstats = (LVRelStats *) state;
>
> You might want to update the comment of lazy_tid_reaped() as well.

I don't see the mismatch with reality there (if you consider
"dead_tples array" in the proper context, that is, the multiarray).

What in particular do you find out of sync there?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-02-01 Thread Pavel Stehule
Hi

2017-01-24 21:33 GMT+01:00 Pavel Stehule :

>
>
>
>>
>>> Perhaps that's as simple as renaming all the existing _ns_* functions to
>>> _block_ and then adding support for pragmas...
>>>
>>> Since you're adding cursor_options to PLpgSQL_expr it should probably be
>>> removed as an option to exec_*.
>>>
>>
>> I have to recheck it. Some cursor options going from dynamic cursor
>> variables and are related to dynamic query - not query that creates query
>> string.
>>
>
> hmm .. so current state is better due using options like
> CURSOR_OPT_PARALLEL_OK
>
>  if (expr->plan == NULL)
> exec_prepare_plan(estate, expr, (parallelOK ?
>   CURSOR_OPT_PARALLEL_OK : 0) |
> expr->cursor_options);
>
> This options is not permanent feature of expression - and then I cannot to
> remove cursor_option argument from exec_*
>
> I did minor cleaning - remove cursor_options from plpgsql_var
>
> Regards
>
> Pavel
>


+ basic doc

Regards

Pavel
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index d3272e1..97c59db 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -802,6 +802,32 @@ $$ LANGUAGE plpgsql;
 happen in a plain SQL command.

   
+
+  
+   Block level PRAGMA
+
+   
+PRAGMA
+in PL/pgSQL
+   
+
+   
+The block level PRAGMA allows changing some
+PL/pgSQL compiler behavior. Currently
+only PRAGMA PLAN_CACHE is supported.
+
+
+CREATE FUNCTION enforce_fresh_plan(_id text) RETURNS boolean AS $$
+DECLARE
+  PRAGMA PLAN_CACHE(force_custom_plan);
+BEGIN
+  -- in this block every embedded query uses one shot plan
+  RETURN EXISTS(SELECT * FROM tab WHERE id = _id);
+END;
+$$ LANGUAGE plpgsql;
+
+   
+  
   
 
   
@@ -4649,6 +4675,43 @@ $$ LANGUAGE plpgsql;
  use of the now() function would still be a better idea.
 
 
+
+
+ PRAGMA PLAN_CACHE
+
+ 
+  The plan cache behavior can be controlled with PRAGMA PLAN_CACHE.
+  This PRAGMA can be used at function or at block level (per
+  function, per block). The following options are possible:
+  DEFAULT - default PL/pgSQL
+  implementation - the system tries to decide between a custom plan and a generic
+  plan after five query executions, FORCE_CUSTOM_PLAN
+  - the execution plan is a one-shot plan - it is specific to every set of
+  used parameters, FORCE_GENERIC_PLAN - the query plan
+  is generic from the start.
+ 
+
+ 
+  
+   PRAGMA PLAN_CACHE
+   in PL/pgSQL
+  
+  The plan for the INSERT is generic from the beginning. The 
+  PRAGMA PLAN_CACHE here applies to the whole function - i.e. every command
+  in this function will use a generic plan.
+
+CREATE FUNCTION logfunc2(logtxt text) RETURNS void AS $$
+PRAGMA PLAN_CACHE(FORCE_GENERIC_PLAN);
+DECLARE
+curtime timestamp;
+BEGIN
+curtime := 'now';
+INSERT INTO logtable VALUES (logtxt, curtime);
+END;
+$$ LANGUAGE plpgsql;
+
+ 
+
   
 
   
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index b25b3f1..304fc91 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -51,6 +51,8 @@ bool		plpgsql_check_syntax = false;
 
 PLpgSQL_function *plpgsql_curr_compile;
 
+PLpgSQL_directives *plpgsql_directives;
+
 /* A context appropriate for short-term allocs during compilation */
 MemoryContext plpgsql_compile_tmp_cxt;
 
@@ -83,6 +85,11 @@ static const ExceptionLabelMap exception_label_map[] = {
 	{NULL, 0}
 };
 
+PLpgSQL_directives default_directives = {
+	NULL,
+	true,		/* is_function_scope */
+	0			/* no special cursor option */
+};
 
 /* --
  * static prototypes
@@ -374,6 +381,9 @@ do_compile(FunctionCallInfo fcinfo,
 	plpgsql_DumpExecTree = false;
 	plpgsql_start_datums();
 
+	/* set default compile directives */
+	plpgsql_directives = &default_directives;
+
 	switch (function->fn_is_trigger)
 	{
 		case PLPGSQL_NOT_TRIGGER:
@@ -852,6 +862,9 @@ plpgsql_compile_inline(char *proc_source)
 	plpgsql_DumpExecTree = false;
 	plpgsql_start_datums();
 
+	/* set default compile directives */
+	plpgsql_directives = &default_directives;
+
 	/* Set up as though in a function returning VOID */
 	function->fn_rettype = VOIDOID;
 	function->fn_retset = false;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 6fc3db0..0a01bbe 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2335,7 +2335,7 @@ exec_stmt_forc(PLpgSQL_execstate *estate, PLpgSQL_stmt_forc *stmt)
 	Assert(query);
 
 	if (query->plan == NULL)
-		exec_prepare_plan(estate, query, curvar->cursor_options);
+		exec_prepare_plan(estate, query, query->cursor_options);
 
 	/*
 	 * Set up short-lived ParamListInfo
@@ -3625,7 +3625,7 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
 	{
 		ListCell   *l;
 
-		exec_prepare_plan(estate, expr, 0);
+		exec_prepare_plan(estate, expr, expr->cursor_options);
 		stmt->mod_stmt = false;
 		foreach(l, 

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-02-01 Thread Masahiko Sawada
On Tue, Jan 31, 2017 at 3:05 AM, Claudio Freire  wrote:
> On Mon, Jan 30, 2017 at 5:51 AM, Masahiko Sawada  
> wrote:
>> 
>>  * We are willing to use at most maintenance_work_mem (or perhaps
>>  * autovacuum_work_mem) memory space to keep track of dead tuples.  We
>>  * initially allocate an array of TIDs of that size, with an upper limit that
>>  * depends on table size (this limit ensures we don't allocate a huge area
>>  * uselessly for vacuuming small tables).  If the array threatens to 
>> overflow,
>>
>> I think that we need to update the above paragraph comment at top of
>> vacuumlazy.c file.
>
> Indeed, I missed that one. Fixing.
>
>>
>> 
>> +   numtuples = Max(numtuples,
>> MaxHeapTuplesPerPage);
>> +   numtuples = Min(numtuples, INT_MAX / 2);
>> +   numtuples = Min(numtuples, 2 *
>> pseg->max_dead_tuples);
>> +   numtuples = Min(numtuples,
>> MaxAllocSize / sizeof(ItemPointerData));
>> +   seg->dt_tids = (ItemPointer)
>> palloc(sizeof(ItemPointerData) * numtuples);
>>
>> Why numtuples is limited to "INT_MAX / 2" but not INT_MAX?
>
> I forgot to mention this one in the OP.
>
> Googling around, I found out that some implementations of bsearch break with
> array sizes beyond INT_MAX/2 [1] (they'd overflow when computing the
> midpoint).
>
> Before this patch, this bsearch call had no way of reaching that size.
> An initial version of the patch (the one that allocated a big array
> with huge allocation) could reach that point, though, so I reduced the
> limit to play it safe. This latest version is back to the starting
> point, since it cannot allocate segments bigger than 1GB, but I opted
> to keep playing it safe and leave the reduced limit just in case.
>

Thanks, I understood.

>> 
>> @@ -1376,35 +1411,43 @@ lazy_vacuum_heap(Relation onerel, LVRelStats
>> *vacrelstats)
>> pg_rusage_init();
>> npages = 0;
>>
>> -   tupindex = 0;
>> -   while (tupindex < vacrelstats->num_dead_tuples)
>> +   segindex = 0;
>> +   tottuples = 0;
>> +   for (segindex = tupindex = 0; segindex <=
>> vacrelstats->dead_tuples.last_seg; tupindex = 0, segindex++)
>> {
>> -   BlockNumber tblk;
>> -   Buffer  buf;
>> -   Pagepage;
>> -   Sizefreespace;
>>
>> This is a minute thing but tupindex can be define inside of for loop.
>
> Right, changing.
>
>>
>> 
>> @@ -1129,10 +1159,13 @@ lazy_scan_heap(Relation onerel, int options,
>> LVRelStats *vacrelstats,
>>   * instead of doing a second scan.
>>   */
>>  if (nindexes == 0 &&
>> -vacrelstats->num_dead_tuples > 0)
>> +vacrelstats->dead_tuples.num_entries > 0)
>>  {
>>  /* Remove tuples from heap */
>> -lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
>> +Assert(vacrelstats->dead_tuples.last_seg == 0);/*
>> Should not need more
>> + *
>> than one segment per
>> + * page */
>>
>> I'm not sure we need to add Assert() here but it seems to me that the
>> comment and code is not properly correspond and the comment for
>> Assert() should be wrote above of Assert() line.
>
> Well, that assert is the one that found the second bug in
> lazy_clear_dead_tuples, so clearly it's not without merit.
>
> I'll rearrange the comments as you ask though.
>
>
> Updated and rebased v7 attached.
>
>
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=776671

Thank you for updating the patch.

Whole patch looks good to me except for the following one comment.
This is the final comment from me.

/*
 *  lazy_tid_reaped() -- is a particular tid deletable?
 *
 *  This has the right signature to be an IndexBulkDeleteCallback.
 *
 *  Assumes dead_tuples array is in sorted order.
 */
static bool
lazy_tid_reaped(ItemPointer itemptr, void *state)
{
LVRelStats *vacrelstats = (LVRelStats *) state;

You might want to update the comment of lazy_tid_reaped() as well.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-01 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> However, the patch misses an
>> important part of such an abstraction layer by not also converting
>> catalog-related simple_heap_delete() calls into some sort of
>> CatalogTupleDelete() operation.  It is certainly a peculiarity of
>> PG heaps that deletions don't require any immediate index work --- most
>> other storage engines would need that.
>> I propose that we should finish the job by inventing CatalogTupleDelete(),
>> which for the moment would be a trivial wrapper around
>> simple_heap_delete(), maybe just a macro for it.
>> 
>> If there's no objections I'll go make that happen in a day or two.

> Sounds good.

So while I was working on this I got quite unhappy with the
already-committed patch: it's a leaky abstraction in more ways than
this, and it's created a possibly-serious performance regression
for large objects (and maybe other places).

The source of both of those problems is that in some places, we
did CatalogOpenIndexes and then used the CatalogIndexState for
multiple tuple inserts/updates before doing CatalogCloseIndexes.
The patch dealt with these either by not touching them, just
leaving the simple_heap_insert/update calls in place (thus failing
to create any abstraction), or by blithely ignoring the optimization
and doing s/simple_heap_insert/CatalogTupleInsert/ anyway.  For example,
in inv_api.c we are now doing a CatalogOpenIndexes/CatalogCloseIndexes
cycle for each chunk of the large object ... and just to add insult to
injury, the now-useless open/close calls outside the loop are still there.

I think what we ought to do about this is invent additional API
functions, say

Oid CatalogTupleInsertWithInfo(Relation heapRel, HeapTuple tup,
   CatalogIndexState indstate);
void CatalogTupleUpdateWithInfo(Relation heapRel, ItemPointer otid,
HeapTuple tup, CatalogIndexState indstate);

and use these in place of simple_heap_foo plus CatalogIndexInsert
in the places where this optimization had been applied.
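
A sketch of how such a call site would then look (loop contents abbreviated;
variable names are illustrative, not taken from inv_api.c):

CatalogIndexState indstate;

indstate = CatalogOpenIndexes(catrel);

for (i = 0; i < nchunks; i++)
{
	HeapTuple	ntup;

	/* ... form the next chunk's tuple as before ... */
	ntup = heap_form_tuple(RelationGetDescr(catrel), values, nulls);

	CatalogTupleInsertWithInfo(catrel, ntup, indstate);
	heap_freetuple(ntup);
}

CatalogCloseIndexes(indstate);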

An alternative but much more complicated fix would be to get rid of
the necessity for callers to worry about this at all, by caching
a CatalogIndexState in the catalog's relcache entry.  That might be
worth doing eventually (because it would allow sharing index info
collection across unrelated operations) but I don't want to do it today.

Objections, better naming ideas?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Konstantin Knizhnik
On 02/01/2017 08:28 PM, Heikki Linnakangas wrote:
>
> But if there's no pressing reason to change it, let's leave it alone. It's 
> not related to the problem at hand, right?
>

Yes, I agree with you: we should better leave it as it is.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Konstantin Knizhnik
On 02/01/2017 08:30 PM, REIX, Tony wrote:
>
> Hi Konstantin,
>
> Please run: */opt/IBM/xlc/13.1.3/bin/xlc -qversion* so that I know your exact 
> XLC v13 version.
>
IBM XL C/C++ for AIX, V13.1.3 (5725-C72, 5765-J07)

> I'm building on Power7 and not giving any architecture flag to XLC.
>
> I'm not using *-qalign=natural*. Thus, by default, XLC uses -qalign=power, 
> which is close to natural, as explained at:
>  
> https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.0/com.ibm.xlc131.aix.doc/compiler_ref/opt_align.html
> Why are you using this flag ?
>

Because otherwise double type is aligned on 4 bytes.

> Thanks for info about *pgbench*. PostgreSQL web-site contains a lot of old 
> information...
>
> If you could *share scripts or instructions about the tests you are doing with 
> pgbench*, I would reproduce them here.
>

You do not need any script.
Just two simple commands.
One to initialize database:

pgbench -i -s 1000

And another to run benchmark itself:

pgbench -c 100 -j 20 -P 1 -T 10
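
(The first command initializes the pgbench tables at scale factor 1000, i.e.
100 million rows in pgbench_accounts; the second runs 100 client connections
on 20 pgbench threads for 10 seconds, reporting progress every second.)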


> I have no "real" application. My job consists in porting OpenSource packages 
> on AIX. Many packages. Erlang, Go, these days. I just want to make PostgreSQL 
> RPMs as good as possible... within the limited amount of time I can give to 
> this package, before
> moving to another one.
>
> About the *zombie* issue, I've discussed it with my colleagues. It looks like the 
> process stays a zombie until the parent looks at its status. However, though I 
> did that several times, I do not remember the details well. And that should 
> not be specific to AIX.
> I'll discuss with another colleague, tomorrow, who should understand this 
> better than me.
>

1. The process is not in a zombie state (according to ps). It is in  state...
It is something AIX-specific; I have not seen processes in this state on Linux.
2. I have implemented a simple test - a fork bomb. It creates 1000 children and then
waits for them. It is about ten times slower than on Intel/Linux, but still much
faster than 100 seconds. So there is some difference between a Postgres backend
and a dummy process doing nothing - one that just terminates immediately after
returning from fork().
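
For reference, a minimal standalone sketch of that kind of test (not the exact program used here) could look like this:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int
    main(void)
    {
        int         nchildren = 1000;

        for (int i = 0; i < nchildren; i++)
        {
            pid_t       pid = fork();

            if (pid < 0)
            {
                perror("fork");
                exit(1);
            }
            if (pid == 0)
                _exit(0);       /* child terminates immediately */
        }

        /* parent reaps every child, so none should stay a zombie */
        for (int i = 0; i < nchildren; i++)
            wait(NULL);

        return 0;
    }
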
>
> > *Patch for Large Files*: When building PostgreSQL, I found it necessary to use
> > the following patch so that PostgreSQL works with large files. I do not
> > remember the details. Do you agree with such a patch ? 1st version (new-...)
> > shows the exact places where
> >   #define _LARGE_FILES 1  is required.  2nd version (new2-...) is simpler.
>
> I'm now experimenting with your patch for dead lock. However, that should be 
> invisible with the  "check-world" tests I guess.
>
> Regards,
>
> Tony
>
>
> Le 01/02/2017 à 16:59, Konstantin Knizhnik a écrit :
>> Hi Tony,
>>
>> On 01.02.2017 18:42, REIX, Tony wrote:
>>>
>>> Hi Konstantin
>>>
>>> *XLC.*
>>>
>>> I'm on AIX 7.1 for now.
>>>
>>> I'm using this version of XLC v13:
>>>
>>> # xlc -qversion
>>> IBM XL C/C++ for AIX, V13.1.3 (5725-C72, 5765-J07)
>>> Version: 13.01.0003.0003
>>>
>>> With this version, I have (at least, since I tested with "check" and not 
>>> "check-world" at that time) 2 failing tests: create_aggregate , aggregates .
>>>
>>>
>>> With the following *XLC v12* version, I have NO test failure:
>>>
>>> # /usr/vac/bin/xlc -qversion
>>> IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)
>>> Version: 12.01..0016
>>>
>>>
>>> So maybe you are not using XLC v13.1.3.3, rather another sub-version. 
>>> Unless you are using more options for the configure ?
>>>
>>>
>>> *Configure*.
>>>
>>> What are the options that you give to the configure ?
>>>
>>>
>> export CC="/opt/IBM/xlc/13.1.3/bin/xlc"
>> export CFLAGS="-qarch=pwr8 -qtune=pwr8 -O2 -qalign=natural -q64 "
>> export LDFLAGS="-Wl,-bbigtoc,-b64"
>> export AR="/usr/bin/ar -X64"
>> export LD="/usr/bin/ld -b64 "
>> export NM="/usr/bin/nm -X64"
>> ./configure --prefix="/opt/postgresql/xlc-debug/9.6"
>>
>>
>>> *Hard load & 64 cores ?* OK. That clearly explains why I do not see this 
>>> issue.
>>>
>>>
>>> *pgbench ?* I wanted to run it. However, I'm still looking where to get it 
>>> plus a guide for using it for testing.
>>>
>>
>> pgbench is part of the Postgres distribution (src/bin/pgbench)
>>
>>
>>> I would add such tests when building my PostgreSQL RPMs on AIX. So any help 
>>> is welcome !
>>>
>>>
>>> *Performance*.
>>>
>>> - Also, I'd like to compare PostgreSQL performance on AIX vs Linux/PPC64. 
>>> Any idea how I should proceed ? Any PostgreSQL performance benchmark that I 
>>> could find and use ? pgbench ?
>>>
>> pgbench is most widely used tool simulating OLTP workload. Certainly it is 
>> quite primitive and its results are rather artificial. TPC-C seems to be 
>> better choice.
>> But the best case is to implement your own benchmark simulating actual 
>> workload of your real application.
>>
>>> - I'm interested in any information for improving the performance & quality 
>>> of my PostgreSQL RPMs on AIX. (As I already said, BullFreeware RPMs for AIX 
>>> are free and can be 

Re: [HACKERS] logical decoding of two-phase transactions

2017-02-01 Thread Konstantin Knizhnik
On 02/01/2017 10:32 PM, Tom Lane wrote:
> Robert Haas  writes:
>> Also, including the GID in the WAL for each COMMIT/ABORT PREPARED
>> doesn't seem inordinately expensive to me.
> I'm confused ... isn't it there already?  If not, how do we handle
> reconstructing 2PC state from WAL at all?
>
>   regards, tom lane
>
>
Right now logical decoding ignores prepare and takes into account only "commit
prepared":

/*
 * Currently decoding ignores PREPARE TRANSACTION and will just
 * decode the transaction when the COMMIT PREPARED is sent or
 * throw away the transaction's contents when a ROLLBACK PREPARED
 * is received. In the future we could add code to expose prepared
 * transactions in the changestream allowing for a kind of
 * distributed 2PC.
 */

For some scenarios it works well, but if we really need the prepared state at the
replica (as in the multimaster case), then it is not enough.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-01 Thread Fabien COELHO


Hello,


We could just issue interactive-only warnings when:
- A user types a branching condition command which sets the branch inactive
- A user types a command or query when the current branch is inactive.

The warnings could be specific about state, something like:

psql session is now in an inactive \if branch. No queries will be executed
and only branching commands (\if, \elif, \else, \endif) will be evaluated.
[...]


My 0.02€: it looks too verbose, should stay on one short line. Maybe:

  (in|)active (\if|\elif|\else), (execut|ignor)ing commands

Although there is some redundancy...

   calvin=>\if true
 active \if: executing commands
   calvin=(t)> \echo ok
 ok
   calvin=(t)> \else
 inactive \else: skipping commands
   calvin=(f)> ...

Maybe it could be controlled, say based on VERBOSITY setting (which really 
controls verbosity of error reports) or some other.


I'm unsure whether it is a good idea, I like terse interfaces, but this is 
just an opinion.


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-02-01 Thread Thomas Munro
On Thu, Feb 2, 2017 at 3:34 AM, Rafia Sabih
 wrote:
> 9 | 62928.88   | 59077.909

Thanks Rafia.  At first glance this plan is using the Parallel Shared
Hash in one place where it should pay off, that is loading the orders
table, but the numbers are terrible.  I noticed that it uses batch
files and then has to increase the number of batch files, generating a
bunch of extra work, even though it apparently overestimated the
number of rows, though that's only ~9 seconds of ~60.  I am
investigating.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical decoding of two-phase transactions

2017-02-01 Thread Tom Lane
Robert Haas  writes:
> Also, including the GID in the WAL for each COMMIT/ABORT PREPARED
> doesn't seem inordinately expensive to me.

I'm confused ... isn't it there already?  If not, how do we handle
reconstructing 2PC state from WAL at all?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU integration

2017-02-01 Thread Tom Lane
Pavel Stehule  writes:
> This patch does not compile on today's master
> commands/collationcmds.o: In function `AlterCollation':
> /home/pavel/src/postgresql/src/backend/commands/collationcmds.c:297:
> undefined reference to `CatalogUpdateIndexes'

Evidently collateral damage from 2f5c9d9c9.  But I'd suggest waiting
to fix it until you can also do s/simple_heap_delete/CatalogTupleDelete/
as I proposed in
https://www.postgresql.org/message-id/462.1485902...@sss.pgh.pa.us

I'll go make that happen right now, now that I realize there are patch(es)
waiting on it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multi-level partitions and partition-wise joins

2017-02-01 Thread Robert Haas
On Fri, Dec 23, 2016 at 12:54 AM, Ashutosh Bapat
 wrote:
> Another question: do we want to commit the code for creating
> unflattened hierarchy separately or along with partition-wise join?

Probably separately.  I suggest posting a series of two (or perhaps
more) patches on the same thread.  'git format-patch' is a useful way
to produce a patch series for posting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-02-01 Thread Robert Haas
On Mon, Jan 30, 2017 at 2:30 AM, Masahiko Sawada  wrote:
> "txn" can be used for abbreviation of "Transaction", so for example
> pg_fdw_txn_resolver?
> I'm also fine to change the module and function name.

If we're judging the relative clarity of various ways of abbreviating
the word "transaction", "txn" surely beats "x".

To repeat my usual refrain, is there any merit to abbreviating at all?
 Could we call it, say, "fdw_transaction_resolver" or
"fdw_transaction_manager"?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU integration

2017-02-01 Thread Pavel Stehule
Hi

2017-01-24 18:44 GMT+01:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> On 1/15/17 5:53 AM, Pavel Stehule wrote:
> > the regress test fails
> >
> >  Program received signal SIGSEGV, Segmentation fault.
> > 0x007bbc2b in pattern_char_isalpha (locale_is_c=0 '\000',
> > locale=0x1a73220, is_multibyte=1 '\001', c=97 'a') at selfuncs.c:5291
> > 5291return isalpha_l((unsigned char) c, locale->lt);
>
> Here is an updated patch that fixes this crash and is rebased on top of
> recent changes.
>

This patch does not compile on today's master

commands/collationcmds.o: In function `AlterCollation':
/home/pavel/src/postgresql/src/backend/commands/collationcmds.c:297:
undefined reference to `CatalogUpdateIndexes'
collect2: error: ld returned 1 exit status

Regards

Pavel



>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] logical decoding of two-phase transactions

2017-02-01 Thread Robert Haas
On Tue, Jan 31, 2017 at 9:05 PM, Michael Paquier
 wrote:
>> Personally I don't think lack of access to the GID justifies blocking 2PC
>> logical decoding. It can be added separately. But it'd be nice to have
>> especially if it's cheap.
>
> I think it should be added reading this thread.

+1.  If on the logical replication master the user executes PREPARE
TRANSACTION 'mumble', isn't it sensible to want the logical replica to
prepare the same set of changes with the same GID?  To me, that not
only seems like *a* sensible thing to want to do but probably the
*most* sensible thing to want to do.  And then, when the eventual
COMMIT PREPARED 'mumble' comes along, you want to have the replica
run the same command.  If you don't do that, then the alternative is
that the replica has to make up new names based on the master's XID.
But that kinda sucks, because now if replication stops due to a
conflict or whatever and you have to disentangle things by hand, all
the names on the replica are basically meaningless.

Also, including the GID in the WAL for each COMMIT/ABORT PREPARED
doesn't seem inordinately expensive to me.  For that to really add up
to a significant cost, wouldn't you need to be doing LOTS of 2PC
transactions, each with very little work, so that the commit/abort
prepared records weren't swamped by everything else?  That seems like
an unlikely scenario, but if it does happen, that's exactly when
you'll be most grateful for the GID tracking.  I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of replication commands using printsimple

2017-02-01 Thread Robert Haas
On Tue, Jan 31, 2017 at 8:26 PM, Michael Paquier
 wrote:
> pq_sendcountedtext() does some encoding conversion, which is why I
> haven't used it, because we deal only with integers in this patch... Now
> if you wish to switch to that I really have no arguments against it.

OK, I see.  Well, I guess it's sensible either way, but I've committed
this version.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add tab completion for DEALLOCATE

2017-02-01 Thread Tom Lane
ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
> While playing with prepared statements in psql, I noticed that EXECUTE
> tab-completes the list of active prepared statements, but DEALLOCATE
> does not.
> Attached is a patch to fix this.

Good idea, but I think it would be better to give DEALLOCATE its own
entry in the list, so it could be placed in alphabetical order.
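
Presumably that would mean something along these lines in tab-complete.c, placed in alphabetical order among the other command entries (a sketch, not the committed form):

    /* DEALLOCATE */
    else if (Matches1("DEALLOCATE"))
        COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements);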

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2017-02-01 Thread Tom Lane
I wrote:
> Attached is a finished version that includes hooks for all the variables
> for which they were clearly sensible.  (FETCH_COUNT doesn't seem to really
> need one, and I didn't do anything with HISTSIZE or IGNOREEOF either.
> It might be worth bringing the latter two into the hooks paradigm, but
> that seems like fit material for a separate patch.)

I got more excited about doing that after noticing that not only would
it clean up the behavior of those particular variables, but we could get
rid of some code.  First, we'd not need the rather warty GetVariableNum()
anymore, and second, we'd then be almost at the point where every control
variable has a hook, and therefore we could drop tab-complete.c's private
list of known variable names.  That was only ever needed to cover the
possibility of important variables not being present in the variables
list.

So the attached proposed patch does these things:

1. Modify FETCH_COUNT to always have a defined value, like other control
variables, mainly so it will always appear in "\set" output.

2. Add hooks to force HISTSIZE to be defined and require it to have an
integer value.  (I don't see any point in allowing it to be set to
non-integral values.)

3. Add hooks to force IGNOREEOF to be defined and require it to have an
integer value.  Unlike the other cases, here we're trying to be
bug-compatible with a rather bogus externally-defined behavior, so I think
we need to continue to allow "\set IGNOREEOF whatever".  What I propose is
that the substitution hook silently replace non-numeric values with "10",
so that the stored value always reflects what we're really doing (a sketch of
this follows the list below).

4. Add a dummy assign hook for HISTFILE, just so it's always in
variables.c's list.  We can't require it to be defined always, because
that would break the interaction with the PSQL_HISTORY environment
variable, so there isn't any change in visible behavior here.

5. Remove tab-complete.c's private list of known variable names.  Given
the other changes, there are no control variables it won't show anyway.
This does mean that if for some reason you've unset one of the status
variables (DBNAME, HOST, etc), that variable would not appear in tab
completion for \set.  But I think that's fine, for at least two reasons:
we shouldn't be encouraging people to use those variables as regular
variables, and if someone does do so anyway, why shouldn't it act just
like a regular variable?

6. Remove no-longer-used-anywhere GetVariableNum().
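
To make point 3 above concrete, here is a minimal sketch of that substitution
behavior; the hook signature and the helpers used are illustrative, not psql's
actual hook API:

    #include <stdlib.h>
    #include <string.h>

    /*
     * Illustrative IGNOREEOF substitute hook: unset or non-numeric values
     * are silently stored as "10", so \set always shows the effective value.
     */
    static char *
    ignoreeof_substitute_hook(char *newval)
    {
        char       *end;

        if (newval == NULL)
            return strdup("10");

        (void) strtol(newval, &end, 10);
        if (end == newval || *end != '\0')
        {
            free(newval);
            return strdup("10");
        }

        return newval;
    }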

Any objections?

regards, tom lane

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index b9c8fcc..ae58708 100644
*** a/doc/src/sgml/ref/psql-ref.sgml
--- b/doc/src/sgml/ref/psql-ref.sgml
*** bar
*** 3247,3258 
  fail after having already displayed some rows.
  
  
- 
- FETCH_COUNT is ignored if it is unset or does not
- have a positive value.  It cannot be set to a value that is not
- syntactically an integer.
- 
- 
  
  
  Although you can use any output format with this feature,
--- 3247,3252 
*** bar
*** 3316,3325 
  HISTSIZE
  
  
! The maximum number of commands to store in the command history.
! If unset, at most 500 commands are stored by default.
! If set to a value that is negative or not an integer, no limit is
! applied.
  
  
  
--- 3310,3317 
  HISTSIZE
  
  
! The maximum number of commands to store in the command history
! (default 500).  If set to a negative value, no limit is applied.
  
  
  
*** bar
*** 3345,3357 
  IGNOREEOF
  
  
!  If unset, sending an EOF character (usually
   ControlD)
   to an interactive session of psql
!  will terminate the application. If set to a numeric value,
!  that many EOF characters are ignored before the
!  application terminates.  If the variable is set but not to a
!  numeric value, the default is 10.
  
  
  
--- 3337,3349 
  IGNOREEOF
  
  
!  If set to 1 or less, sending an EOF character (usually
   ControlD)
   to an interactive session of psql
!  will terminate the application.  If set to a larger numeric value,
!  that many consecutive EOF characters must be typed to
!  make an interactive session terminate.  If the variable is set to a
!  non-numeric value, it is interpreted as 10.
  
  
  
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 5365629..3e3cab4 100644
*** a/src/bin/psql/help.c
--- b/src/bin/psql/help.c
*** helpVariables(unsigned short int pager)
*** 348,356 
  	  " (default: 0=unlimited)\n"));
  	

Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-02-01 Thread Robert Haas
On Tue, Jan 31, 2017 at 5:07 PM, Jim Nasby  wrote:
> On 11/29/16 9:58 AM, Jeff Janes wrote:
>> Considering a single SSD can do 70% of that limit, I would say
>> yes.
>>
>> Next question becomes... should there even be an upper limit?
>>
>>
>> Where the contortions needed to prevent calculation overflow become
>> annoying?
>>
>> I'm not a big fan of nannyism in general, but the limits on this
>> parameter seem particularly pointless.  You can't write out more buffers
>> than exist in the dirty state, nor more than implied
>> by bgwriter_lru_multiplier.  So what is really the worse that can happen
>> if you make it too high?
>
>
> Attached is a patch that ups the limit to INT_MAX / 2, which is the same as
> shared_buffers.

This looks fine to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-01 Thread Corey Huinker
>
>
> You can also run "make check" directly in the "src/bin/psql" directory.


Previously, that didn't do anything, but now that I've created a TAP test
it does. It doesn't, however, run the psql regress tests. But at least I
can use the two commands in combination and not have to run *all* TAP tests
outside of psql.


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-01 Thread Corey Huinker
On Wed, Feb 1, 2017 at 8:53 AM, Corey Huinker 
wrote:

>
>> Run   make check-world   (as opposed to  just  make check )
>
>
> I'll give that a shot.
>

That was it. Tests don't run if you don't invoke them. Thanks.


Re: [HACKERS] [COMMITTERS] pgsql: Make psql's \set display variables in alphabetical order.

2017-02-01 Thread Andres Freund
On 2017-02-01 12:59:36 -0500, Tom Lane wrote:
> David Fetter  writes:
> > On Wed, Feb 01, 2017 at 04:25:25PM +, Tom Lane wrote:
> >> Make psql's \set display variables in alphabetical order.
> 
> > This is a substantial usability improvement which nevertheless is hard
> > to imagine changes things that scripts relied on. I think it's worth
> > back-patching.
> 
> I'm not that excited about it personally, but I agree it would be unlikely
> to break anything.  Other votes?

-0.5 - I see very little reason to backpatch this over dozens of other
changes.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2017-02-01 Thread Robert Haas
On Sat, Jan 28, 2017 at 9:09 PM, Ashutosh Sharma  wrote:
> okay. Thanks. I have done changes on top of this patch.

+   ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
+   Assert(ptr <= uargs->page + BLCKSZ);

I think this should be promoted to an ereport(); these functions can
accept an arbitrary bytea.
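
Something like the following is presumably what that would amount to; the errcode and message wording here are just illustrative:

    ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
    if (ptr > uargs->page + BLCKSZ)
        ereport(ERROR,
                (errcode(ERRCODE_INDEX_CORRUPTED),
                 errmsg("index item points past end of page")));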

+   if (opaque->hasho_flag & LH_BUCKET_PAGE)
+   stat->hasho_prevblkno = InvalidBlockNumber;
+   else
+   stat->hasho_prevblkno = opaque->hasho_prevblkno;

I think we should return the raw value here.  Mithun's patch to cache
the metapage hasn't gone in yet, but even if it does, let's assume
anyone using contrib/pageinspect wants to see the data that's
physically present, not our gloss on it.

Other than that, I don't think I have any other comments on this.  The
tests that got added look a little verbose to me (do we really need to
check pages 1-4 separately in each case when they're all hash pages?
if hash_bitmap_info rejects one, surely it will reject the others) but
I'm not going to fight too hard if Peter wants it that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Make psql's \set display variables in alphabetical order.

2017-02-01 Thread Tom Lane
David Fetter  writes:
> On Wed, Feb 01, 2017 at 04:25:25PM +, Tom Lane wrote:
>> Make psql's \set display variables in alphabetical order.

> This is a substantial usability improvement which is nevertheless unlikely
> to change anything that scripts relied on. I think it's worth
> back-patching.

I'm not that excited about it personally, but I agree it would be unlikely
to break anything.  Other votes?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-01 Thread Corey Huinker
>
> However I would say yes, it should provide some feedback... This means
> probably adding a new prompt substitution "%". In the worst
> case, the prompt should reflect the current stack, or at least the top of
> the stack...
>

We could just issue interactive-only warnings when:
- A user types a branching condition command which sets the branch inactive
- A user types a command or query when the current branch is inactive.

The warnings could be specific about state, something like:

psql session is now in an inactive \if branch. No queries will be executed
and only branching commands (\if, \elif, \else, \endif) will be evaluated.

psql session is now in an inactive \elif branch. No queries will be
executed and only branching commands (\if, \elif, \else, \endif) will be
evaluated.

psql session is now in an inactive \else branch. No queries will be
executed and only branching commands (\if, \endif) will be evaluated.



This could of course be done in addition to prompt changes, and is
orthogonal to the CTRL-c  option. I'd like more input before moving forward
on either of those, as they have a good chance to clobber other expected
behaviors.


[HACKERS] [PATCH] Add tab completion for DEALLOCATE

2017-02-01 Thread Dagfinn Ilmari Mannsåker
Hi hackers,

While playing with prepared statements in psql, I noticed that EXECUTE
tab-completes the list of active prepared statements, but DEALLOCATE
does not.

Attached is a patch to fix this.

Cheers,

Ilmari

-- 
"I use RMS as a guide in the same way that a boat captain would use
 a lighthouse.  It's good to know where it is, but you generally
 don't want to find yourself in the same spot." - Tollef Fog Heen

>From ac727e99d6f06a82d6ba9a7927fed8ce179dac3f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Wed, 1 Feb 2017 17:39:43 +
Subject: [PATCH] Add tab completion for DEALLOCATE

EXECUTE already tab-completes the list of prepared statements, but
DEALLOCATE did not.
---
 src/bin/psql/tab-complete.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index d6fffcf42f..4a7bfe844a 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2575,8 +2575,8 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches5("DROP", "RULE", MatchAny, "ON", MatchAny))
 		COMPLETE_WITH_LIST2("CASCADE", "RESTRICT");
 
-/* EXECUTE */
-	else if (Matches1("EXECUTE"))
+/* EXECUTE && DEALLOCATE */
+	else if (Matches1("EXECUTE|DEALLOCATE"))
 		COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements);
 
 /* EXPLAIN */
-- 
2.11.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel bitmap heap scan

2017-02-01 Thread Robert Haas
On Tue, Jan 31, 2017 at 6:05 AM, Dilip Kumar  wrote:
>> 0002-hash-support-alloc-free-v14.patch:
>>
>>
>> + if (tb->alloc)
>> + {
>>
>> The memory for tb->alloc is allocated always, is the if check still
>> required?
>
> In the parallel case, only the first worker will call SH_CREATE; the other
> workers will only palloc the pagetable and copy the reference from the main
> worker, so they will not have an allocator.

When Andres wrote in
https://www.postgresql.org/message-id/20161017201558.cr34stc6zvxy3...@alap3.anarazel.de
that you should try to share only the iteration arrays, I believe that
he meant the results of tbm_begin_iterate() -- that is,
iterator->spageptr, chunkptr, schunkbit, spages, and schunks.  I'm
perhaps putting words into his mouth, but what he said was that we
could avoid sharing "the whole underlying hash".  But the patch set
you've implemented here doesn't behave that way.  Instead, you've got
the array containing the hash table elements shared (which is good)
plus there's a sort of dummy hash table in every worker which copies
some but not all of the members of the original hash table, leading to
the otherwise-unnecessary if-test that Haribabu is complaining about.
So the hash table is kinda-shared-kinda-not-shared, which I don't
*think* is really what Andres had in mind.

In short, I think SH_COPY (implemented here as pagetable_copy) needs
to go away.  The work of tbm_begin_iterate() should be done before we
begin the shared scan and the results of that work should be shared.
What happens right now (it appears) is that every backend does that
work based on the same hash table and we just assume they all get the
same answer.  And we really do assume that, because
pbms_parallel_iterate() assumes it can shuttle private state back and
forth between iterator in different backends and nothing will break;
but those backends aren't actually using the same iteration arrays.
They are using different iteration arrays that should have the same
contents because they were all derived from the same semi-shared hash
table.  That's pretty fragile, and a waste of CPU cycles if the hash
table is large (every backend does its own sort).

On a related note, I think it's unacceptable to make the internal
details of TBMIterator public.  You've moved it from tidbitmap.c to
tidbitmap.h so that nodeBitmapHeapScan.c can tinker with the guts of
the TBMIterator, but that's not OK.  Those details need to remain
private to tidbitmap.c. pbms_parallel_iterate() is a nasty kludge; we
need some better solution.  The knowledge of how a shared iterator
should iterate needs to be private to tidbitmap.c, not off in the
executor someplace.  And I think the entries need to actually be
updated in shared memory directly, not copied back and forth between a
backend-private iterator and a shared iterator.

Also, pbms_parallel_iterate() can't hold a spinlock around a call to
tbm_iterate().  Note the coding rules for spinlocks mentioned in
spin.h and src/backend/storage/lmgr/README.  I think the right thing
to do here is to use an LWLock in a new tranche (add an entry to
BuiltinTrancheIds).
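
In rough outline, the iteration step would then look something like this (the
shared-state struct and lock field are hypothetical names, not anything in the
patch):

    TBMIterateResult *tbmres;

    /* illustrative sketch only: pstate->iterator_lock is a hypothetical LWLock */
    LWLockAcquire(&pstate->iterator_lock, LW_EXCLUSIVE);
    tbmres = tbm_iterate(iterator);
    LWLockRelease(&pstate->iterator_lock);

which keeps tbm_iterate() out from under a spinlock and lets the lock be held
across the non-trivial work of advancing the shared position.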

In 0002, you have this:

+tb->alloc = palloc(sizeof(SH_ALLOCATOR));

This should be using MemoryContextAlloc(ctx, ...) rather than palloc.
Otherwise the allocator object is in a different context from
everything else in the hash table.  But TBH, I don't really see why we
want this to be a separate object.  Why not just put
HashAlloc/HashFree/args into SH_TYPE directly?  That avoids some
pointer chasing and doesn't really seem to cost anything (except that
SH_CREATE will grow a slightly longer argument sequence).

I think maybe we should rename the functions to element_allocate,
element_free, and element_allocator_ctx, or something like that.  The
current names aren't capitalized consistently with other things in
this module, and putting the word "element" in there would make it
more clear what the purpose of this thing is.
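
Roughly, that would mean the generated struct carries the callbacks itself,
something like this (field names are illustrative; the real simplehash.h struct
has more members):

    typedef struct SH_TYPE
    {
        uint64      size;           /* allocated bucket array size */
        uint32      members;        /* number of stored elements */
        MemoryContext ctx;          /* context the table lives in */
        SH_ELEMENT_TYPE *data;      /* bucket array */

        /* element allocator, formerly a separately palloc'd SH_ALLOCATOR */
        void       *(*element_allocate) (struct SH_TYPE *tb, Size size, void *args);
        void        (*element_free) (struct SH_TYPE *tb, void *ptr, void *args);
        void       *element_allocator_args;
    } SH_TYPE;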

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread REIX, Tony
Hi Konstantin,

Please run: /opt/IBM/xlc/13.1.3/bin/xlc -qversion  so that I know your exact 
XLC v13 version.

I'm building on Power7 and not giving any architecture flag to XLC.

I'm not using -qalign=natural. Thus, by default, XLC uses -qalign=power, which
is close to natural, as explained at:
 
https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.0/com.ibm.xlc131.aix.doc/compiler_ref/opt_align.html
Why are you using this flag ?

Thanks for info about pgbench. PostgreSQL web-site contains a lot of old 
information...

If you could share scripts or instructions about the tests you are doing with 
pgbench, I would reproduce here.
I have no "real" application. My job consists in porting OpenSource packages on 
AIX. Many packages. Erlang, Go, these days. I just want to make PostgreSQL RPMs 
as good as possible... within the limited amount of time I can give to this 
package, before moving to another one.

About the zombie issue, I've discussed it with my colleagues. It looks like the
process stays a zombie until the parent looks at its status. However, though I did
that several times, I do not remember the details well. And that should not be
specific to AIX. I'll discuss it with another colleague tomorrow, who should
understand this better than me.

Patch for Large Files: When building PostgreSQL, I found it necessary to use the
following patch so that PostgreSQL works with large files. I do not remember
the details. Do you agree with such a patch ? 1st version (new-...) shows the
exact places where   #define _LARGE_FILES 1  is required.  2nd version
(new2-...) is simpler.

I'm now experimenting with your patch for dead lock. However, that should be 
invisible with the  "check-world" tests I guess.

Regards,

Tony

Le 01/02/2017 à 16:59, Konstantin Knizhnik a écrit :
Hi Tony,

On 01.02.2017 18:42, REIX, Tony wrote:

Hi Konstantin

XLC.

I'm on AIX 7.1 for now.

I'm using this version of XLC v13:

# xlc -qversion
IBM XL C/C++ for AIX, V13.1.3 (5725-C72, 5765-J07)
Version: 13.01.0003.0003

With this version, I have (at least, since I tested with "check" and not 
"check-world" at that time) 2 failing tests: create_aggregate , aggregates .


With the following XLC v12 version, I have NO test failure:

# /usr/vac/bin/xlc -qversion
IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)
Version: 12.01..0016


So maybe you are not using XLC v13.1.3.3, rather another sub-version. Unless 
you are using more options for the configure ?


Configure.

What are the options that you give to the configure ?


export CC="/opt/IBM/xlc/13.1.3/bin/xlc"
export CFLAGS="-qarch=pwr8 -qtune=pwr8 -O2 -qalign=natural -q64 "
export LDFLAGS="-Wl,-bbigtoc,-b64"
export AR="/usr/bin/ar -X64"
export LD="/usr/bin/ld -b64 "
export NM="/usr/bin/nm -X64"
./configure --prefix="/opt/postgresql/xlc-debug/9.6"



Hard load & 64 cores ? OK. That clearly explains why I do not see this issue.


pgbench ? I wanted to run it. However, I'm still looking where to get it plus a 
guide for using it for testing.

pgbench is part of the Postgres distribution (src/bin/pgbench)



I would add such tests when building my PostgreSQL RPMs on AIX. So any help is 
welcome !


Performance.

- Also, I'd like to compare PostgreSQL performance on AIX vs Linux/PPC64. Any 
idea how I should proceed ? Any PostgreSQL performance benchmark that I could 
find and use ? pgbench ?

pgbench is most widely used tool simulating OLTP workload. Certainly it is 
quite primitive and its results are rather artificial. TPC-C seems to be better 
choice.
But the best case is to implement your own benchmark simulating actual workload 
of your real application.


- I'm interested in any information for improving the performance & quality of 
my PostgreSQL RPMs on AIX. (As I already said, BullFreeware RPMs for AIX are 
free and can be used by anyone, like Perzl RPMs are. My company (ATOS/Bull) 
sells IBM Power machines under the Escala brand since ages (25 years this 
year)).


How to help ?

How could I help for improving the quality and performance of PostgreSQL on AIX 
?

We still have one open issue at AIX: see 
https://www.mail-archive.com/pgsql-hackers@postgresql.org/msg303094.html
It will be great if you can somehow help to fix this problem.




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--- src/include/postgres.h.ORIGIN	2017-02-01 07:32:04 -0600
+++ src/include/postgres.h	2017-02-01 07:32:29 -0600
@@ -44,6 +44,10 @@
 #ifndef POSTGRES_H
 #define POSTGRES_H
 
+#ifdef _AIX
+#define _LARGE_FILES 1
+#endif
+
 #include "c.h"
 #include "utils/elog.h"
 #include "utils/palloc.h"
--- src/pl/plpython/plpy_cursorobject.c.ORIGIN	2017-02-01 02:59:08 -0600
+++ src/pl/plpython/plpy_cursorobject.c	2017-02-01 03:00:20 -0600
@@ -4,6 +4,10 @@
  * src/pl/plpython/plpy_cursorobject.c
  */
 
+#ifdef _AIX
+#define _LARGE_FILES 1
+#endif
+
 #include "postgres.h"
 
 #include 
--- src/pl/plpython/plpy_elog.c.ORIGIN	2017-02-01 02:59:08 -0600
+++ 

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Heikki Linnakangas

On 02/01/2017 04:12 PM, Konstantin Knizhnik wrote:

On 01.02.2017 15:39, Heikki Linnakangas wrote:

On 02/01/2017 01:07 PM, Konstantin Knizhnik wrote:

Attached please find my patch for XLC/AIX.
The most critical fix is adding __sync to pg_atomic_fetch_add_u32_impl.
The comment in this file says that:

   * __fetch_and_add() emits a leading "sync" and trailing "isync",
thereby
   * providing sequential consistency.  This is undocumented.

But it is not true any more (I checked generated assembler code in
debugger).
This is why I have added __sync() to this function. Now pgbench working
normally.


Seems like it was not so much undocumented, but an implementation
detail that was not guaranteed after all..

Does __fetch_and_add emit a trailing isync there either? Seems odd if
__compare_and_swap requires it, but __fetch_and_add does not. Unless
we can find conclusive documentation on that, I think we should assume
that an __isync() is required, too.

There was a long thread on these things the last time this was
changed:
https://www.postgresql.org/message-id/20160425185204.jrvlghn3jxulsb7i%40alap3.anarazel.de.
I couldn't find an explanation there of why we thought that
fetch_and_add implicitly performs sync and isync.
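
For what it's worth, the shape of the fix being discussed for generic-xlc.h would presumably be along these lines (an illustrative sketch, not the attached patch; whether the trailing __isync() is needed is exactly the open question above):

    static inline uint32
    pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
    {
        uint32      ret;

        /* don't rely on __fetch_and_add() emitting "sync"/"isync" itself */
        __sync();
        ret = __fetch_and_add((volatile int *) &ptr->value, add_);
        __isync();

        return ret;
    }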


Also there is mysterious disappearance of assembler section function
with sync instruction from pg_atomic_compare_exchange_u32_impl.
I have fixed it by using __sync() built-in function instead.


__sync() seems more appropriate there, anyway. We're using intrinsics
for all the other things in generic-xlc.h. But it sure is scary that
the "asm" sections just disappeared.

In arch-ppc.h, shouldn't we have #ifdef __IBMC__ guards for the
__sync() and __lwsync() intrinsics? Those are an xlc compiler-specific
thing, right? Or if they are expected to work on any ppc compiler,
then we should probably use them always, instead of the asm sections.

In summary, I came up with the attached. It's essentially your patch,
with tweaks for the above-mentioned things. I don't have a powerpc
system to test on, so there are probably some silly typos there.


Why do you prefer to use _check_lock instead of __check_lock_mp ?
> The first one is not even mentioned in the XLC compiler manual:
http://www-01.ibm.com/support/docview.wss?uid=swg27046906=7
or
http://scv.bu.edu/computation/bluegene/IBMdocs/compiler/xlc-8.0/html/compiler/ref/bif_sync.htm


Googling around, it seems that they do more or less the same thing. I 
would guess that they actually produce the same assembly code, but I 
have no machine to test on. If I understand correctly, the difference is 
that __check_lock_mp() is an xlc compiler intrinsic, while _check_lock() 
is a libc function. The libc function presumably does __check_lock_mp() 
or __check_lock_up() depending on whether the system is a multi- or 
uni-processor system.


So I think if we're going to change this, the use of __check_lock_mp() 
needs to be in an #ifdef block to check that you're on the XLC compiler, 
as it's a *compiler* intrinsic, while the current code that uses 
_check_lock() are in an "#ifdef _AIX" block, which is correct for 
_check_lock() because it's defined in libc, not by the compiler.


But if there's no pressing reason to change it, let's leave it alone. 
It's not related to the problem at hand, right?


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Make psql's \set display variables in alphabetical order.

2017-02-01 Thread David Fetter
On Wed, Feb 01, 2017 at 04:25:25PM +, Tom Lane wrote:
> Make psql's \set display variables in alphabetical order.

This is a substantial usability improvement which is nevertheless unlikely
to change anything that scripts relied on. I think it's worth
back-patching.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-01 Thread Fabien COELHO




Hello Corey,


There is a spurious empty line added at the very end of "mainloop.h":

  +
   #endif   /* MAINLOOP_H */


Not in my diff, but that's been coming and going in your diff reviews.


Strange. Maybe this is linked to the warning displayed with "git apply" when 
I apply the diff.



I would suggest to add a short one line comment before each test to
explain what is being tested, like "-- test \elif execution", "-- test
\else execution"...


Where are you suggesting this?


In "regress/sql/psql.sql", in front of each group which starts a test.


Debatable suggestion about "psql_branch_empty":


The name isn't great. Maybe psql_branch_stack_empty()?


 Yep, maybe, or "empty_stack" or "stack_is_empty" or IDK...


"psql_branch_end_state": it is a pop, it could be named "psql_branch_pop"

Yeah, we either need to go fully with telling the programmer that it's a
stack (push/pop/empty) or (begin_branch/end_branch/not_branching). I'm
inclined to go full-stack, as it were.


Anything consistent is ok. I'm fine with calling a stack a stack:-)

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION

2017-02-01 Thread Fujii Masao
On Wed, Feb 1, 2017 at 5:36 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, while looking at another bug, I found that the standby cannot
> shut down after DROP SUBSCRIPTION.
>
> standby=# CREATE SUBSCRPTION sub1 ...
> standby=# 
> standby=# DROP SUBSCRIPTION sub1;
>
> Ctrl-C to the standby fails to work. ApplyLauncherMain is waiting
> LogicalRepLauncherLock forever.
>
> The culprit is DropSubscription. It acquires
> LogicalRepLauncherLock but never releases it.
>
> The attached patch fixes it. Most of the function is now
> enclosed in PG_TRY/PG_CATCH since some functions can throw
> exceptions.

The lwlock would be released when an exception occurs, so I don't think
that TRY-CATCH is necessary here. Or is it necessary for another reason?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new autovacuum criterion for visible pages

2017-02-01 Thread Vik Fearing
On Sun, Jan 22, 2017 at 4:45 PM, Stephen Frost  wrote:

> Amit,
>
> * Amit Kapila (amit.kapil...@gmail.com) wrote:
> > On Sun, Jan 22, 2017 at 3:27 AM, Stephen Frost 
> wrote:
> > > * Simon Riggs (si...@2ndquadrant.com) wrote:
> > >> On 12 August 2016 at 01:01, Tom Lane  wrote:
> > >> > Michael Paquier  writes:
> > >> >> In short, autovacuum will need to scan by itself the VM of each
> > >> >> relation and decide based on that.
> > >> >
> > >> > That seems like a worthwhile approach to pursue.  The VM is
> supposed to be
> > >> > small, and if you're worried it isn't, you could sample a few pages
> of it.
> > >> > I do not think any of the ideas proposed so far for tracking the
> > >> > visibility percentage on-the-fly are very tenable.
> > >>
> > >> Sounds good, but we can't scan the VM for every table, every minute.
> > >> We need to record something that will tell us how many VM bits have
> > >> been cleared, which will then allow autovac to do a simple SELECT to
> > >> decide what needs vacuuming.
> > >>
> > >> Vik's proposal to keep track of the rows inserted seems like the best
> > >> approach to this issue.
> > >
> > > I tend to agree with Simon on this.  I'm also worried that an approach
> > > which was based off of a metric like "% of table not all-visible" might
> > > result in VACUUM running over and over on a table because it isn't able
> > > to actually make any progress towards improving that percentage.  We'd
> > > have to have some kind of "cool-off" period or something.
> > >
> > > Tracking INSERTs and then kicking off a VACUUM based on them seems to
> > > address that in a natural way and also seems like something that users
> > > would generally understand as it's very similar to what we do for
> > > UPDATEs and DELETEs.
> > >
> > > Tracking the INSERTs as a reason to VACUUM is also very natural when
> you
> > > consider the need to update BRIN indexes.
> >
> > Another possible advantage of tracking INSERTs is for hash indexes
> > where after split we need to remove tuples from buckets that underwent
> > split recently.
>
> That's a good point also, and for clearing GIN pending lists too, now
> that I think about it.
>

As much as I want my patch to go in, this is not an argument for it. The
GIN pending lists will be handled by autoanalyze.


> We really need to get this fixed for PG10.
>

I'm not sure what more I'm supposed to be doing, if anything.
-- 

Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr  PostgreSQL : Expertise, Formation et Support


Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread REIX, Tony
Hi Konstantin

XLC.

I'm on AIX 7.1 for now.

I'm using this version of XLC v13:

# xlc -qversion
IBM XL C/C++ for AIX, V13.1.3 (5725-C72, 5765-J07)
Version: 13.01.0003.0003

With this version, I have (at least, since I tested with "check" and not 
"check-world" at that time) 2 failing tests: create_aggregate , aggregates .


With the following XLC v12 version, I have NO test failure:

# /usr/vac/bin/xlc -qversion
IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)
Version: 12.01..0016


So maybe you are not using XLC v13.1.3.3, rather another sub-version. Unless 
you are using more options for the configure ?


Configure.

What are the options that you give to the configure ?


Hard load & 64 cores ? OK. That clearly explains why I do not see this issue.


pgbench ? I wanted to run it. However, I'm still looking where to get it plus a 
guide for using it for testing. I would add such tests when building my 
PostgreSQL RPMs on AIX. So any help is welcome !


Performance.

- Also, I'd like to compare PostgreSQL performance on AIX vs Linux/PPC64. Any 
idea how I should proceed ? Any PostgreSQL performance benchmark that I could 
find and use ? pgbench ?

- I'm interested in any information for improving the performance & quality of 
my PostgreSQL RPMs on AIX. (As I already said, BullFreeware RPMs for AIX are 
free and can be used by anyone, like Perzl RPMs are. My company (ATOS/Bull) 
sells IBM Power machines under the Escala brand since ages (25 years this 
year)).


How to help ?

How could I help for improving the quality and performance of PostgreSQL on AIX 
?
I may have access to very big machines for even more deeply testing of 
PostgreSQL. I just need to know how to run tests.


Thanks!

Regards,

Tony


Le 01/02/2017 à 14:48, Konstantin Knizhnik a écrit :
Hi,

We are using version 13.1.3 of XLC. All tests pass.
Please note that this is a synchronization bug which can be reproduced only under
hard load.
Our server has 64 cores and it is necessary to run pgbench with 100 connections
for several minutes to reproduce the problem.
So maybe you just didn't notice it ;)



On 01.02.2017 16:29, REIX, Tony wrote:

Hi,

I'm now working on the port of PostgreSQL on AIX.
(RPMs can be found, as free OpenSource work, at 
 http://bullfreeware.com/ .
 http://bullfreeware.com/search.php?package=postgresql )

I was not aware of any issue with XLC v12 on AIX for atomic operations.
(XLC v13 generates at least 2 test failures)

For now, with version 9.6.1, all tests "check-world", plus numeric_big test, 
are OK, in both 32 & 64bit versions.

Am I missing something ?

I configure the build of PostgreSQL with (in 64bits):

 ./configure
--prefix=/opt/freeware
--libdir=/opt/freeware/lib64
--mandir=/opt/freeware/man
--with-perl
--with-tcl
--with-tclconfig=/opt/freeware/lib
--with-python
--with-ldap
--with-openssl
--with-libxml
--with-libxslt
--enable-nls
--enable-thread-safety
--sysconfdir=/etc/sysconfig/postgresql

Am I missing some option for more optimization on AIX ?

Thanks

Regards,

Tony

Le 01/02/2017 à 12:07, Konstantin Knizhnik a écrit :
Attached please find my patch for XLC/AIX.
The most critical fix is adding __sync to pg_atomic_fetch_add_u32_impl.
The comment in this file says that:

  * __fetch_and_add() emits a leading "sync" and trailing "isync",
thereby
  * providing sequential consistency.  This is undocumented.

But it is not true any more (I checked generated assembler code in
debugger).
This is why I have added __sync() to this function. Now pgbench working
normally.

Also there is mysterious disappearance of assembler section function
with sync instruction from pg_atomic_compare_exchange_u32_impl.
I have fixed it by using __sync() built-in function instead.


Thanks to everybody who helped me to locate and fix this problem.

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


ATOS WARNING !
This message contains attachments that could potentially harm your computer.
Please make sure you open ONLY attachments from senders you know, trust and is 
in an e-mail that you are expecting.
