Re: calling procedures is slow and consumes extra much memory against calling function

2020-07-10 Thread Pavel Stehule
On Thu, 9 Jul 2020 at 8:28, Amit Khandekar wrote:

> On Wed, 17 Jun 2020 at 13:54, Pavel Stehule wrote:
> >
> >
> >
> > On Wed, 17 Jun 2020 at 7:52, Amit Khandekar wrote:
> >>
> >> On Wed, 10 Jun 2020 at 17:12, Pavel Stehule wrote:
> >> > On Wed, 10 Jun 2020 at 12:26, Amit Khandekar <amitdkhan...@gmail.com> wrote:
> >> >> Could you show an example testcase that tests this recursive scenario,
> >> >> with which your earlier patch fails the test, and this v2 patch passes
> >> >> it? I am trying to understand the recursive scenario and the re-use
> >> >> of expr->plan.
> >> >
> >> >
> >> > it hangs on the plpgsql tests. So you can apply the first version of the patch
> >> >
> >> > and run "make check"
> >>
> >> I could not reproduce the make check hang with the v1 patch. But I
> >> could see a crash with the below testcase. So I understand the purpose
> >> of the plan_owner variable that you introduced in v2.
> >>
> >> Consider this recursive test :
> >>
> >> create or replace procedure p1(in r int) as $$
> >> begin
> >>RAISE INFO 'r : % ', r;
> >>if r < 3 then
> >>   call p1(r+1);
> >>end if;
> >> end
> >> $$ language plpgsql;
> >>
> >> do $$
> >> declare r int default 1;
> >> begin
> >> call p1(r);
> >> end;
> >> $$;
> >>
> >> In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
> >> consider this code of exec_stmt_call() with your v2 patch applied:
> >> if (expr->plan && !expr->plan->saved)
> >> {
> >>if (plan_owner)
> >>   SPI_freeplan(expr->plan);
> >>expr->plan = NULL;
> >> }
> >>
> >> Here, plan_owner is false. So SPI_freeplan() will not be called, and
> >> expr->plan is set to NULL. Now I have observed that the stmt pointer
> >> and expr pointer are shared between the p1() execution at this r=2
> >> level and the p1() execution at the r=1 level. So after the above code
> >> is executed at r=2, when the upper level (r=1) exec_stmt_call() lands on
> >> the same code snippet above, it gets the same expr pointer, but its
> >> expr->plan is already set to NULL without having been freed. From this
> >> logic, it looks like the plan won't get freed whenever the expr/stmt
> >> pointers are shared across recursive levels, since expr->plan is set
> >> to NULL at the lowermost level? Basically, the handle to the plan is
> >> lost, so no one at the upper recursion level can explicitly free it
> >> using SPI_freeplan(), right? This looks the same as the main issue
> >> where the plan does not get freed for non-recursive calls. I haven't
> >> had a chance to check whether we can develop a testcase for this, similar
> >> to your testcase where the memory keeps on increasing.
> >
> >
> > This is a good consideration.
> >
> > I am sending updated patch
>
> Checked the latest patch. Using a local plan rather than the
> expr->plan pointer for doing the checks does seem to resolve the issue
> I raised. That made me think of another scenario:
>
> Now we are checking the plan value and then nullifying the expr->plan
> value. What if expr->plan is different from plan? Is that possible? I
> was thinking of such scenarios, but couldn't find one. As long as a
> plan is always created with saved=true for all levels, or with
> saved=false for all levels, we are ok. If we can have a mix of saved
> and unsaved plans at different recursion levels, then expr->plan can
> be different from the outer local plan, because then the expr->plan
> will not be set to NULL in the inner level, while the outer level may
> have created its own plan. But I think a mix of saved and unsaved
> plans is not possible. If you agree, then I think we should at least
> have an assert that looks like:
>
> if (plan && !plan->saved)
> {
>     if (plan_owner)
>         SPI_freeplan(plan);
>
>     /* If expr->plan is present, it must be the same plan that we allocated */
>     Assert(!expr->plan || plan == expr->plan);
>
>     expr->plan = NULL;
> }
>
> Other than this, I have no other issues. I understand that we have to
> do this special handling only for this exec_stmt_call() because it is
> only here that we call exec_prepare_plan() with keep_plan = false, so
> doing special handling for freeing the plan seems to make sense.
>

Attached is a patch with the assert.

All regression tests passed. I think this short patch can be applied to older
releases as a bugfix.

This weekend I'll try a different strategy: save the plan and
release it at the end of the transaction.
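
For anyone who wants to reproduce the memory growth, here is a minimal sketch
(an illustration only, not the original reproducer from this thread): execute
the recursive procedure repeatedly in one session and watch the backend's
resident memory from outside (e.g. with top); if the plan built for the inner
CALL is never freed, memory keeps climbing.

do $$
begin
  for i in 1 .. 100000 loop
    call p1(1);   -- p1 as defined earlier in this thread
  end loop;
end;
$$;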

Regards

Pavel
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d4a3d58daa..22c0376795 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2150,15 +2150,16 @@ static int
 exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 {
 	PLpgSQL_expr *expr = stmt->expr;
+	volatile SPIPlanPtr	plan = expr->plan;
 	volatile LocalTransactionId before_lxid;
 	LocalTransactionId after_lxid;
 	volatile bool pushed_active_snap = false;
 	

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 14:02, Peter Geoghegan  wrote:
>
> On Fri, Jul 10, 2020 at 6:19 PM David Rowley  wrote:
> > If we have to have a new GUC, my preference would be hashagg_mem,
> > where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
> > would mean use that value.  We'd need some sort of check hook to
> > disallow 0-63. I really am just failing to comprehend why we're
> > contemplating changing the behaviour of Hash Join here.
>
> I don't understand why partitionwise hash join consumes work_mem in
> the way it does. I assume that the reason is something like "because
> that behavior was the easiest to explain", or perhaps "people that use
> partitioning ought to be able to tune their database well". Or even
> "this design avoids an epic pgsql-hackers thread, because of course
> every hash table should get its own work_mem".

hmm yeah. It's unfortunate, but I'm not sure how I'd have implemented
it differently.  The problem is made worse by the fact that we'll only
release the memory for the hash table during ExecEndHashJoin(). If the
planner had some ability to provide the executor with knowledge that
the node would never be rescanned, then the executor could release the
memory for the hash table after the join is complete. For now, we'll
need to live with the fact that an Append containing many children
doing hash joins will mean holding onto all that memory until the
executor is shut down :-(

There's room to make improvements there, for sure, but not for PG13.

> > Of course, I
> > understand that that node type also uses a hash table, but why does
> > that give it the right to be involved in a change that we're making to
> > try and give users the ability to avoid possible regressions with Hash
> > Agg?
>
> It doesn't, exactly. The idea of hash_mem came from similar settings
> in another database system that you'll have heard of, that affect all
> nodes that use a hash table. I read about this long ago, and thought
> that it might make sense to do something similar as a way of improving
> work_mem

It sounds interesting, but it also sounds like a new feature
post-beta. Perhaps it's better we minimise the scope of the change to
be a minimal fix just for the behaviour we predict some users might
not like.

David




Re: proposal: schema variables

2020-07-10 Thread Pavel Stehule
On Mon, 6 Jul 2020 at 10:17, Pavel Stehule wrote:

>
>
> On Sun, 5 Jul 2020 at 15:33, Pavel Stehule wrote:
>
>>
>>
> >> On Thu, 21 May 2020 at 14:49, Pavel Stehule wrote:
>>
>>>
>>>
> >>> On Thu, 21 May 2020 at 13:34, Amit Kapila wrote:
>>>
 On Thu, May 21, 2020 at 3:41 PM Pavel Stehule 
 wrote:
 >
 > Hi
 >
 > just rebase without any other changes
 >

 You seem to forget attaching the rebased patch.

>>>
>>> I am sorry
>>>
>>> attached.
>>>
>>
>> fresh rebase
>>
>
> fix test build
>

rebase

Pavel


> Regards
>
> Pavel
>
>
>> Regards
>>
>> Pavel
>>
>>
>>> Pavel
>>>
>>>
 --
 With Regards,
 Amit Kapila.
 EnterpriseDB: http://www.enterprisedb.com

>>>


schema-variables-20200711.patch.gz
Description: application/gzip


Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

2020-07-10 Thread Michael Paquier
On Wed, Jul 08, 2020 at 10:00:47PM +0900, Michael Paquier wrote:
> There are two patches on this thread v2-0001 being much smaller than
> v2-0002.  I have looked at 0001 for now, and, like Alvaro, this
> renaming makes sense to me.  Those commands work on objects that are
> relkinds, except for one OBJECT_TYPE.  So, let's get 0001 patch
> merged.  Any objections from others?

I have been through this one again and applied it as cc35d89.
--
Michael


signature.asc
Description: PGP signature


Re: Stale external URL in doc?

2020-07-10 Thread Thomas Munro
On Sat, Jul 11, 2020 at 9:56 AM Daniel Gustafsson  wrote:
> > On 10 Jul 2020, at 23:47, Tom Lane  wrote:
> > Thomas Munro  writes:
> >> The others required minor manual sleuthing to correct; I hope I found
> >> the correct ISN ranges page.  Please see attached.
> >
> > I didn't actually check any of these, but they look like sane changes.
>
> +1, looks good, thanks!

Is it OK that I see the following warning many times when running
"make" under src/backend/utils/mb/Unicode?  It looks like this code is
from commit 1de9cc0d.  Horiguchi-san, do you think something changed
(input data format, etc) since you wrote it, or maybe some later
changes just made our perl scripts more picky about warnings?

  Use of uninitialized value $val in printf at convutils.pm line 612.




Re: Does TupleQueueReaderNext() really need to copy its result?

2020-07-10 Thread Thomas Munro
On Sat, Jul 11, 2020 at 1:37 PM Soumyadeep Chakraborty
 wrote:
> +1 to the idea! I ran some experiments on both of your patches.

Hi Soumyadeep,

Thanks for testing!

> I could reproduce the speed gain that you saw for a plan with a simple
> parallel sequential scan. However, I got no gain at all for a parallel
> hash join and parallel agg query.

Right, it's not going to make a difference when you only send one
tuple through the queue, like COUNT(*) does.

> As for gather merge, is it possible to have a situation where the slot
> input to tqueueReceiveSlot() is a heap slot (as would be the case for a
> simple select *)? If yes, in those scenarios, we would be incurring an
> extra call to minimal_tuple_from_heap_tuple() because of the extra call
> to ExecFetchSlotMinimalTuple() inside tqueueReceiveSlot() in your patch.
> And since, in a gather merge, we can't avoid the copy on the leader side
> (heap_copy_minimal_tuple() inside gm_readnext_tuple()), we would be
> doing extra work in that scenario. I couldn't come up with a plan that
> creates a scenario like this however.

Hmm.  I wish we had a way to do an "in-place" copy-to-minimal-tuple
where the caller supplies the memory, with some fast protocol to get
the size right.  We could use that for copying tuples into shm queues,
hash join tables etc without an extra palloc()/pfree() and double
copy.




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 6:19 PM David Rowley  wrote:
> If we get hash_mem
> or some variant that is a multiplier of work_mem, then that user is in
> exactly the same situation for that plan, i.e. there's no ability to
> increase the memory allowances for Hash Agg alone.

That's true, of course.

> If we have to have a new GUC, my preference would be hashagg_mem,
> where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
> would mean use that value.  We'd need some sort of check hook to
> disallow 0-63. I really am just failing to comprehend why we're
> contemplating changing the behaviour of Hash Join here.

I don't understand why partitionwise hash join consumes work_mem in
the way it does. I assume that the reason is something like "because
that behavior was the easiest to explain", or perhaps "people that use
partitioning ought to be able to tune their database well". Or even
"this design avoids an epic pgsql-hackers thread, because of course
every hash table should get its own work_mem".

> Of course, I
> understand that that node type also uses a hash table, but why does
> that give it the right to be involved in a change that we're making to
> try and give users the ability to avoid possible regressions with Hash
> Agg?

It doesn't, exactly. The idea of hash_mem came from similar settings
in another database system that you'll have heard of, that affect all
nodes that use a hash table. I read about this long ago, and thought
that it might make sense to do something similar as a way of improving
work_mem (without replacing it with something completely different to
enable things like the "hash teams" design, which should be the long
term goal). It's unusual that it took this hashaggs-that-spill issue
to make the work_mem situation come to a head, and it's unusual that
the proposal on the table doesn't just target hash agg. But it's not
*that* unusual.

I believe that it makes sense on balance to lump together hash
aggregate and hash join, with the expectation that the user might want
to tune them for the system as a whole. This is not an escape hatch --
it's something that adds granularity to how work_mem can be tuned in a
way that makes sense (but doesn't make perfect sense). It doesn't
reflect reality, but I think that it comes closer to reflecting
reality than other variations that I can think of, including your
hashagg_mem compromise proposal (which is still much better than plain
work_mem). In short, hash_mem is relatively conceptually clean, and
doesn't unduly burden the user.

I understand that you only want to add an escape hatch, which is what
hashagg_mem still amounts to. There are negative consequences to the
setting affecting hash join, which I am not unconcerned about. On the
other hand, hashagg_mem is an escape hatch, and that's ugly in a way
that hash_mem isn't. I'm also concerned about that.

In the end, I think that the "hash_mem vs. hashagg_mem" question is
fundamentally a matter of opinion.

-- 
Peter Geoghegan




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David G. Johnston
On Fri, Jul 10, 2020 at 6:43 PM David Rowley  wrote:

> On Sat, 11 Jul 2020 at 13:36, David G. Johnston
>  wrote:
> > If we add a setting that defaults to work_mem then the benefit is
> severely reduced.  You still have to modify individual queries, but the
> change can simply be more targeted than changing work_mem alone.
>
> I think the idea is that this is an escape hatch to allow users to get
> something closer to what PG12 did, but only if they really need it.  I
> can't quite understand why we need to leave the escape hatch open and
> push them halfway through it.  I find escape hatches are best left
> closed until you really have no choice but to use them.
>
>
The escape hatch dynamic is "the user senses a problem, goes into their
query, and modifies some GUCs to make the problem go away".  As a user
I'd much rather have the odds of my needing to use that escape hatch
reduced - especially if that reduction can be done without risk and without
any action on my part.

It's like having someone in a box right now, and then turning up the heat.
We can give them an opening to get out of the box if they need it but we
can also give them A/C.  For some the A/C may be unnecessary but also not
harmful, a smaller group will stay within the margin, and for the others
it's not enough and they'll use the opening (which they would have done
anyway without the A/C).

David J.


Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 13:36, David G. Johnston
 wrote:
> If we add a setting that defaults to work_mem then the benefit is severely 
> reduced.  You still have to modify individual queries, but the change can 
> simply be more targeted than changing work_mem alone.

I think the idea is that this is an escape hatch to allow users to get
something closer to what PG12 did, but only if they really need it.  I
can't quite understand why we need to leave the escape hatch open and
push them halfway through it.  I find escape hatches are best left
closed until you really have no choice but to use them.

David




Re: Does TupleQueueReaderNext() really need to copy its result?

2020-07-10 Thread Soumyadeep Chakraborty
Hi Thomas,

+1 to the idea! I ran some experiments on both of your patches.

I could reproduce the speed gain that you saw for a plan with a simple
parallel sequential scan. However, I got no gain at all for a parallel
hash join and parallel agg query.

---

select pg_prewarm('lineitem');
-- lineitem is 17G. (TPCH scale = 20). shared_buffers = 30G
explain analyze select * from lineitem;

[w/o any patch]   99s
[w/ first patch]  89s
[w/ last minimal tuple patch] 79s

---

select pg_prewarm('lineitem');
-- lineitem is 17G. (TPCH scale = 20). shared_buffers = 30G
explain analyze select count(*) from lineitem;

[w/o any patch]   10s
[w/ first patch]  10s
[w/ last minimal tuple patch] 10s

---

select pg_prewarm('lineitem');
select pg_prewarm('orders');
-- lineitem is 17G, orders is 4G. (TPCH scale = 20). shared_buffers = 30G

explain analyze select count(*)
  from lineitem
  join orders on l_orderkey = o_orderkey
 where o_totalprice > 5.00;

[w/o any patch]   54s
[w/ first patch]  53s
[w/ last minimal tuple patch] 56s

---

Maybe I'm missing something, since there should be improvements with
anything that has a gather?

As for gather merge, is it possible to have a situation where the slot
input to tqueueReceiveSlot() is a heap slot (as would be the case for a
simple select *)? If yes, in those scenarios, we would be incurring an
extra call to minimal_tuple_from_heap_tuple() because of the extra call
to ExecFetchSlotMinimalTuple() inside tqueueReceiveSlot() in your patch.
And since, in a gather merge, we can't avoid the copy on the leader side
(heap_copy_minimal_tuple() inside gm_readnext_tuple()), we would be
doing extra work in that scenario. I couldn't come up with a plan that
creates a scenario like this however.
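
One way that might produce such a plan (an untested sketch, reusing the
lineitem table from above and assuming an index on l_orderkey exists): zero
out the parallel cost thresholds and ask for ordered output, so the planner
can pick Gather Merge directly on top of a Parallel Index Scan, whose output
slots are buffer/heap slots rather than minimal tuples.

set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_index_scan_size = 0;
explain select * from lineitem order by l_orderkey;
-- if this comes out as Gather Merge -> Parallel Index Scan, the workers
-- feed heap tuples into tqueueReceiveSlot()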

Regards,
Soumyadeep (VMware)




Re: pg_regress cleans up tablespace twice.

2020-07-10 Thread Michael Paquier
On Fri, Jul 10, 2020 at 09:35:56AM -0400, Tom Lane wrote:
> Should now be possible to undo whatever hack you had to use ...

Yes, I have also opened an issue on github:
https://github.com/macdice/cfbot/issues/11/
--
Michael


signature.asc
Description: PGP signature


Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David G. Johnston
On Fri, Jul 10, 2020 at 6:19 PM David Rowley  wrote:

> If we have to have a new GUC, my preference would be hashagg_mem,
> where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
> would mean use that value.  We'd need some sort of check hook to
> disallow 0-63. I really am just failing to comprehend why we're
> contemplating changing the behaviour of Hash Join here.


If we add a setting that defaults to work_mem then the benefit is severely
reduced.  You still have to modify individual queries, but the change can
simply be more targeted than changing work_mem alone.  I'd really like
whatever we do to provide that ability, as well as a default value that is
greater than the current work_mem value - which in v12 was being ignored,
and thus production usage saw memory consumption greater than work_mem.
Only a multiplier does this.  A multiplier-only solution fixes the problem
at hand.  A multiplier-or-memory solution adds complexity but provides
flexibility.  If adding that flexibility is straightforward I don't see
any serious downside other than the complexity of having the meaning of a
single GUC's value dependent upon its magnitude.

Of course, I
> understand that that node type also uses a hash table, but why does
> that give it the right to be involved in a change that we're making to
> try and give users the ability to avoid possible regressions with Hash
> Agg?
>

If Hash Join isn't affected by the "was allowed to use unlimited amounts of
execution memory but now isn't" change then it probably should continue to
consult work_mem instead of being changed to use the calculated value
(work_mem x multiplier).

David J.


Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 12:47, David G. Johnston
 wrote:
> The multiplier seems strictly better than "rely on work_mem alone, i.e., do 
> nothing"; the detracting factor being one more GUC.  Even if one wants to 
> argue the solution is ugly or imperfect the current state seems worse and a 
> more perfect option doesn't seem worth waiting for.  The multiplier won't 
> make every single upgrade a non-event but it provides a more than sufficient 
> amount of control and in the worst case can be effectively ignored by setting 
> it to 1.0.

My argument wasn't about whether the new GUC should be a multiplier of
work_mem or an absolute amount of memory. The point I was trying to
make was that the solution of adding a GUC to let users increase the
memory for Hash Join and Hash Agg, for plans which don't contain any other
node types that use work_mem, is the same as doing nothing. As of
today, those people could just increase work_mem. If we get hash_mem
or some variant that is a multiplier of work_mem, then that user is in
exactly the same situation for that plan, i.e. there's no ability to
increase the memory allowances for Hash Agg alone.

If we have to have a new GUC, my preference would be hashagg_mem,
where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
would mean use that value.  We'd need some sort of check hook to
disallow 0-63. I really am just failing to comprehend why we're
contemplating changing the behaviour of Hash Join here. Of course, I
understand that that node type also uses a hash table, but why does
that give it the right to be involved in a change that we're making to
try and give users the ability to avoid possible regressions with Hash
Agg?

David




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David G. Johnston
On Fri, Jul 10, 2020 at 5:16 PM David Rowley  wrote:

> Stephen mentions in [1] that:
> > Users who are actually hit by this in a negative way
> > have an option- increase work_mem to reflect what was actually happening
> > already.
>
> Peter is not a fan of that idea, which can only be due to the fact
> that it will also increase the maximum memory consumption allowed by
> other nodes in the plan.


That isn't the only reason for me - the main advantage of hash_mem is that
we get to set a default to some multiple greater than 1.0 so that an
upgrade to v13 has a region where behavior similar to v12 is effectively
maintained.  I have no feel for whether that should be 2.0, 4.0, or
something else, but 2.0 seemed small and I chose to use a power of 2.

My concern is that if we do hash_mem and
> have that control the memory allowances for Hash Joins and Hash Aggs,
> then that solution is just as good as Stephen's idea when the plan
> only contains Hash Joins and Hash Aggs.
>
> As much as I do want to see us get something to allow users some
> reasonable way to get the same performance as they're used to, I'm
> concerned that giving users something that works for many of the use
> cases is not really going to be as good as giving them something that
> works in all their use cases.   A user who has a partitioned table
> with a good number of partitions and partition-wise joins enabled
> might not like it if their Hash Join plan suddenly consumes hash_mem *
> nPartitions when they've set hash_mem to 10x of work_mem due to some
> other plan that requires that to maintain PG12's performance in PG13.
>

I don't know enough about the hash join dynamic to comment there, but if an
admin goes in and changes the system default to 10x in lieu of a targeted
fix for a query that actually needs work_mem increased to 10 times
its current value to work properly, I'd say that would be a poor decision.
Absent hash_mem they wouldn't update work_mem on their system to 10x its
current value in order to upgrade to v13, they'd set work_mem for that
query specifically.  The same should happen here.
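
For example, the targeted per-query adjustment that already works today looks
like this (the value, table, and column names are just illustrative):

begin;
set local work_mem = '256MB';   -- only affects this transaction
select col, count(*) from some_table group by col;   -- the affected query
commit;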

Frankly, if admins are on top of their game and measuring and monitoring
query performance and memory consumption, they would be able to operate in
our "do nothing" mode by setting the default for hash_mem to 1.0 and just
doling out memory via work_mem as they have always done.  Though setting
hash_mem to 10x for that single query would reduce their risk of OOM (none
of the work_mem-consulting nodes would be increased), so having the GUC
would be a net win should they avail themselves of it.

The multiplier seems strictly better than "rely on work_mem alone, i.e., do
nothing"; the detracting factor being one more GUC.  Even if one wants to
argue the solution is ugly or imperfect the current state seems worse and a
more perfect option doesn't seem worth waiting for.  The multiplier won't
make every single upgrade a non-event but it provides a more than
sufficient amount of control and in the worst case can be effectively
ignored by setting it to 1.0.

Is there some reason to think that having this multiplier with a
conservative default of 2.0 would cause an actual problem - and would that
scenario have likely caused an OOM anyway in v12?  Given that "work_mem can
be used many many times in a single query" I'm having trouble imagining
such a problem.

David J.


Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions

2020-07-10 Thread Zhenghua Lyu
Hi,
Thanks for your reply.

I ran into the problem in a distributed database based on Postgres (Greenplum).
In a distributed database there may be distributed tables:
every single node contains only a subpart of the data, and combining them
all yields the full data set.

I think it may also be a problem for Postgres's parallel computing:
1. What does the Postgres planner do for a parallel scan of a table joined to a
generate_series() function scan?
2. What does the Postgres planner do for a parallel scan of a table joined to a
generate_series() function scan with a volatile filter?

Thus, for the SQL in the above case: a generate_series() function scan can
be taken as producing the same data everywhere,
and a generate_series joined to another generate_series also has this property: the data
is complete on every single node. This property
is very helpful in a distributed join: a distributed table joined to a
generate_series function scan can simply be joined on every local node, with the
result then gathered back to a single node.

But things are different when there are volatile functions: volatile
functions may appear in the WHERE clause, the target list, or elsewhere.

That is why I came up with the above case and am asking here.

To be honest, I do not care about the pushdown so much. It is not normal usage
to write volatile functions in a WHERE clause.
I just find that it loses the property.
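
For illustration, a minimal example of the pushdown being discussed (relation
names are hypothetical stand-ins): the qual below references only x and y, so
the planner attaches it to their join level, and it may be evaluated before the
join to z, and a different number of times than once per final row, even though
random() is volatile.

select *
  from x
  join y on x.a = y.a
  join z on y.b = z.b
 where x.c + y.c > random();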

Best,
Zhenghua Lyu







From: Tom Lane 
Sent: Friday, July 10, 2020 10:10 PM
To: Zhenghua Lyu 
Cc: pgsql-hackers@lists.postgresql.org 
Subject: Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile 
functions

Zhenghua Lyu  writes:
> The where clause is "pushed down to the x,y" because it only 
> references these two relations.

Yeah.  I agree that it's somewhat unprincipled, but changing it doesn't
seem like a great idea.  There are a lot of users out there who aren't
terribly careful about marking their UDFs as non-volatile, but would be
unhappy if the optimizer suddenly crippled their queries because of
being picky about this.

Also, we specifically document that order of evaluation in WHERE clauses
is not guaranteed, so I feel no need to make promises about how often
volatile functions there will be evaluated.  (Volatiles in SELECT lists
are a different story.)

This behavior has stood for a couple of decades with few user complaints,
so why are you concerned about changing it?

regards, tom lane


Re: jsonpath versus NaN

2020-07-10 Thread Alexander Korotkov
On Thu, Jul 9, 2020 at 4:04 AM Alexander Korotkov  wrote:
> I understand both patches as fixes and propose to backpatch them to 12
> if no objections.

Both patches are pushed.

--
Regards,
Alexander Korotkov




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 10:00, Tom Lane  wrote:
>
> Stephen Frost  writes:
> > I don't see hash_mem as being any kind of proper fix- it's just punting
> > to the user saying "we can't figure this out, how about you do it" and,
> > worse, it's in conflict with how we already ask the user that question.
> > Turning it into a multiplier doesn't change that either.
>
> Have you got a better proposal that is reasonably implementable for v13?
> (I do not accept the argument that "do nothing" is a better proposal.)
>
> I agree that hash_mem is a stopgap, whether it's a multiplier or no,
> but at this point it seems difficult to avoid inventing a stopgap.
> Getting rid of the process-global work_mem setting is a research project,
> and one I wouldn't even count on having results from for v14.  In the
> meantime, it seems dead certain that there are applications for which
> the current behavior will be problematic.  hash_mem seems like a cleaner
> and more useful stopgap than the "escape hatch" approach, at least to me.

If we're going to end up going down the route of something like
hash_mem for PG13, wouldn't it be better to have something more like
hashagg_mem that adjusts the memory limits for Hash Agg only?

Stephen mentions in [1] that:
> Users who are actually hit by this in a negative way
> have an option- increase work_mem to reflect what was actually happening
> already.

Peter is not a fan of that idea, which can only be due to the fact
that it will also increase the maximum memory consumption allowed by
other nodes in the plan. My concern is that if we do hash_mem and
have that control the memory allowances for Hash Joins and Hash Aggs,
then that solution is just as good as Stephen's idea when the plan
only contains Hash Joins and Hash Aggs.

As much as I do want to see us get something to allow users some
reasonable way to get the same performance as they're used to, I'm
concerned that giving users something that works for many of the use
cases is not really going to be as good as giving them something that
works in all their use cases.   A user who has a partitioned table
with a good number of partitions and partition-wise joins enabled
might not like it if their Hash Join plan suddenly consumes hash_mem *
nPartitions when they've set hash_mem to 10x of work_mem due to some
other plan that requires that to maintain PG12's performance in PG13.
 If that user is unable to adjust hash_mem due to that then they're
not going to be very satisfied that we've added hash_mem to allow
their query to perform as well as it did in PG12. They'll be at the
same OOM risk that they were exposed to in PG12 if they were to
increase hash_mem here.
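
To make the partition-wise case concrete, here is a rough sketch (table names
invented, and realistic data volumes would be needed before the children
actually become hash joins): with partition-wise join enabled, each child join
builds its own hash table, so whatever per-hash allowance applies is
effectively multiplied by the number of matched partition pairs.

create table t1 (id int, payload text) partition by hash (id);
create table t2 (id int, payload text) partition by hash (id);
create table t1_p0 partition of t1 for values with (modulus 2, remainder 0);
create table t1_p1 partition of t1 for values with (modulus 2, remainder 1);
create table t2_p0 partition of t2 for values with (modulus 2, remainder 0);
create table t2_p1 partition of t2 for values with (modulus 2, remainder 1);
set enable_partitionwise_join = on;
explain select count(*) from t1 join t2 using (id);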

David

[1] 
https://www.postgresql.org/message-id/20200710143415.gj12...@tamriel.snowman.net




Re: min_safe_lsn column in pg_replication_slots view

2020-07-10 Thread Alvaro Herrera
On 2020-Jul-09, Alvaro Herrera wrote:

> I think we should define InvalidXLogSegNo to be ~((uint64)0) and add a
> macro to test for that.

That's overkill really.  I just used zero.  Running
contrib/test_decoding under valgrind, this now passes.

I think I'd rather do away with the comparison to zero, and initialize to
something else in GetWALAvailability, though.  What we're doing seems
unclean and unclear.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 07546b71190b27433e673ff7d248e0f98215abb1 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 10 Jul 2020 11:27:55 -0400
Subject: [PATCH] fix uninitialized value

---
 src/backend/access/transam/xlog.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 28daf72a50..3d828cc56f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9530,6 +9530,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
 	 * considering wal_keep_segments and max_slot_wal_keep_size
 	 */
 	XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+	oldestSlotSeg = 0;
 	KeepLogSeg(currpos, &oldestSlotSeg);
 
 	/*
@@ -9624,7 +9625,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 	}
 
 	/* don't delete WAL segments newer than the calculated segment */
-	if (XLogRecPtrIsInvalid(*logSegNo) || segno < *logSegNo)
+	if (*logSegNo == 0 || segno < *logSegNo)
 		*logSegNo = segno;
 }
 
-- 
2.20.1



Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 2:50 PM Stephen Frost  wrote:
> Nothing of what you've said thus far has shown me that there were
> material bits of the discussion that I've missed.

Maybe that's just because you missed those bits too?

> No, that other people
> feel differently or have made comments supporting one thing or another
> isn't what I would consider material- I'm as allowed my opinions as much
> as others, even when I disagree with the majority (or so claimed anyhow-
> I've not gone back to count, but I don't claim it to be otherwise
> either).

You are of course entitled to your opinion.

The problem we're trying to address here is paradoxical, in a certain
sense. The HashAggs-that-spill patch is somehow not at fault on the
one hand, but on the other hand has created this urgent need to
ameliorate what is for all intents and purposes a regression.
Everything is intertwined. Yes -- this *is* weird! And, I admit that
the hash_mem proposal is unorthodox, even ugly -- in fact, I've said
words to that effect on perhaps a dozen occasions at this point. This
is also weird.

I pointed out that my hash_mem proposal was popular because it seemed
like it might save time. When I see somebody I know proposing
something strange, my first thought is "why are they proposing that?".
I might only realize some time later that there are special
circumstances that make the proposal much more reasonable than it
seemed at first (maybe even completely reasonable). There is no
inherent reason why other people supporting the proposal makes it more
valid, but in general it does suggest that special circumstances might
apply. It guides me in the direction of looking for and understanding
what they might be sooner.

-- 
Peter Geoghegan




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2020-07-10 Thread Alvaro Herrera
On 2019-Sep-09, Amit Kapila wrote:

> Thanks for picking this up.  However, I noticed that previously
> Horiguchi-San has given some comments on this patch [1] which doesn't
> seem to be addressed or at least not all of them are addressed.  It is
> possible that you would have already addressed those, but in that
> case, it would be good if you respond to his email as well.  If those
> are not addressed, then it will be good to address those.

Totally unasked for, here's a rebase of this patch series.  I didn't do
anything other than rebasing to current master, solving a couple of very
trivial conflicts, fixing some whitespace complaints from git apply, and
running tests to verify everything works.

I don't foresee working on this at all, so if anyone is interested in
seeing this feature in, I encourage them to read and address
Horiguchi-san's feedback.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From c2de4e2a2cbc301b48a7946a9ee0b9acceae7233 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 10 Jul 2020 11:29:11 -0400
Subject: [PATCH v18 1/2] libpq batch support

---
 doc/src/sgml/libpq.sgml   | 502 +++
 doc/src/sgml/lobj.sgml|   4 +
 .../libpqwalreceiver/libpqwalreceiver.c   |   3 +
 src/interfaces/libpq/exports.txt  |   5 +
 src/interfaces/libpq/fe-connect.c |  27 +
 src/interfaces/libpq/fe-exec.c| 597 --
 src/interfaces/libpq/fe-protocol2.c   |   6 +
 src/interfaces/libpq/fe-protocol3.c   |  15 +-
 src/interfaces/libpq/libpq-fe.h   |  24 +-
 src/interfaces/libpq/libpq-int.h  |  47 +-
 10 files changed, 1186 insertions(+), 44 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d1ccaa775a..ad168727c7 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -4826,6 +4826,500 @@ int PQflush(PGconn *conn);
 
  
 
+ 
+  Batch mode and query pipelining
+
+  
+   libpq
+   batch mode
+  
+
+  
+   libpq
+   pipelining
+  
+
+  
+   libpq supports queueing up queries into
+   a pipeline to be executed as a batch on the server. Batching queries allows
+   applications to avoid a client/server round-trip after each query to get
+   the results before issuing the next query.
+  
+
+  
+   When to use batching
+
+   
+Much like asynchronous query mode, there is no performance disadvantage to
+using batching and pipelining. It increases client application complexity
+and extra caution is required to prevent client/server deadlocks but
+can sometimes offer considerable performance improvements.
+   
+
+   
+Batching is most useful when the server is distant, i.e. network latency
+(ping time) is high, and when many small operations are being performed in
+rapid sequence. There is usually less benefit in using batches when each
+query takes many multiples of the client/server round-trip time to execute.
+A 100-statement operation run on a server 300ms round-trip-time away would take
+30 seconds in network latency alone without batching; with batching it may spend
+as little as 0.3s waiting for results from the server.
+   
+
+   
+Use batches when your application does lots of small
+INSERT, UPDATE and
+DELETE operations that can't easily be transformed into
+operations on sets or into a
+COPY operation.
+   
+
+   
+Batching is not useful when information from one operation is required by the
+client before it knows enough to send the next operation. The client must
+introduce a synchronisation point and wait for a full client/server
+round-trip to get the results it needs. However, it's often possible to
+adjust the client design to exchange the required information server-side.
+Read-modify-write cycles are especially good candidates; for example:
+
+ BEGIN;
+ SELECT x FROM mytable WHERE id = 42 FOR UPDATE;
+ -- result: x=2
+ -- client adds 1 to x:
+ UPDATE mytable SET x = 3 WHERE id = 42;
+ COMMIT;
+
+could be much more efficiently done with:
+
+ UPDATE mytable SET x = x + 1 WHERE id = 42;
+
+   
+
+   
+
+ The batch API was introduced in PostgreSQL 10.0, but clients using the PostgreSQL 10.0 version of libpq can
+ use batches on server versions 7.4 and newer. Batching works on any server
+ that supports the v3 extended query protocol.
+
+   
+
+  
+
+  
+   Using batch mode
+
+   
+To issue batches the application must switch
+a connection into batch mode. Enter batch mode with PQenterBatchMode(conn) or test
+whether batch mode is active with PQbatchStatus(conn). In batch mode only asynchronous operations are permitted, and
+COPY is not recommended as it most likely will trigger failure in batch processing. 
+Using any synchronous command execution 

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Tom Lane
Stephen Frost  writes:
> I don't see hash_mem as being any kind of proper fix- it's just punting
> to the user saying "we can't figure this out, how about you do it" and,
> worse, it's in conflict with how we already ask the user that question.
> Turning it into a multiplier doesn't change that either.

Have you got a better proposal that is reasonably implementable for v13?
(I do not accept the argument that "do nothing" is a better proposal.)

I agree that hash_mem is a stopgap, whether it's a multiplier or no,
but at this point it seems difficult to avoid inventing a stopgap.
Getting rid of the process-global work_mem setting is a research project,
and one I wouldn't even count on having results from for v14.  In the
meantime, it seems dead certain that there are applications for which
the current behavior will be problematic.  hash_mem seems like a cleaner
and more useful stopgap than the "escape hatch" approach, at least to me.

regards, tom lane




Re: Stale external URL in doc?

2020-07-10 Thread Daniel Gustafsson
> On 10 Jul 2020, at 23:47, Tom Lane  wrote:
> Thomas Munro  writes:

>> The others required minor manual sleuthing to correct; I hope I found
>> the correct ISN ranges page.  Please see attached.
> 
> I didn't actually check any of these, but they look like sane changes.

+1, looks good, thanks!

cheers ./daniel




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Peter Geoghegan  writes:
> > It now seems likely that the hash_mem/hash_mem_multiplier proposal has
> > the support it needs to get into Postgres 13. Assuming that the
> > proposal doesn't lose momentum, then it's about time to return to the
> > original question you posed at the start of the thread:
> 
> > What should we do with the hashagg_avoid_disk_plan GUC (formerly known
> > as the enable_hashagg_disk GUC), if anything?
> 
> > I myself think that there is a case to be made for removing it
> > entirely.
> 
> +0.5 or so for removing it.  It seems too confusing and dubiously
> useful.

I agree that it shouldn't exist.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Fri, Jul 10, 2020 at 7:17 AM Stephen Frost  wrote:
> > > The hash_mem design (as it stands) would affect both hash join and
> > > hash aggregate. I believe that it makes most sense to have hash-based
> > > nodes under the control of a single GUC. I believe that this
> > > granularity will cause the least problems. It certainly represents a
> > > trade-off.
> >
> > So, now this has moved from being a hack to deal with a possible
> > regression for a small number of users due to new behavior in one node,
> > to a change that has impacts on other nodes that hadn't been changed,
> > all happening during beta.
> 
> The common goal is ameliorating or avoiding predictable negative
> consequences for our users. One proposal is an ambitious and
> comprehensive way of dealing with that, that certainly has unique
> risks. The other is much less ambitious, and clearly a kludge -- but
> it's also much less risky. The discussion hasn't really moved at all.

Neither seems to be going in a direction which looks appropriate for a
beta-time change, particularly given that none of this is *actually* new
territory.  Having to increase work_mem to get a HashAgg or HashJoin
that hasn't got a bunch of batches is routine and while it'd be nicer if
PG had a way to, overall, manage memory usage to stay within some
particular value, we don't.  When we start going in that direction it'll
be interesting to discuss how much we should favor trying to do an
in-memory HashAgg by reducing the amount of memory allocated to a Sort
node.

> > I'm still not thrilled with the 'hash_mem' kind of idea as it's going in
> > the wrong direction because what's actually needed is a way to properly
> > consider and track overall memory usage- a redesign of work_mem (or some
> > new parameter, but it wouldn't be 'hash_mem') as you say, but all of
> > this discussion should be targeting v14.
> 
> It's certainly possible that hash_mem is too radical, and yet not
> radical enough -- in any timeframe (i.e. a total redesign of work_mem
> is the only thing that will be acceptable). I don't understand why you
> refuse to engage with the idea at all, though. The mere fact that
> hash_mem could in theory fix this problem comprehensively *usefully
> frames the problem*. This is the kind of issue where developing a
> shared understanding is very important.

I don't see hash_mem as being any kind of proper fix- it's just punting
to the user saying "we can't figure this out, how about you do it" and,
worse, it's in conflict with how we already ask the user that question.
Turning it into a multiplier doesn't change that either.

> Andres said to me privately that hash_mem could be a good idea, even
> though he opposes it as a fix to the open item for Postgres 13. I
> understand that proposing such a thing during beta is controversial,
> whatever the specifics are. It is a proposal made in the spirit of
> trying to move things forward. Hand wringing about ignoring the
> community's process is completely counterproductive.

I disagree that caring about the fact that we're in beta is
counterproductive.  Saying that we should ignore that we're in beta
isn't appropriate and I could say that's counterproductive- though that
hardly seems to be helpful, so I wonder why that comment was chosen.

> There are about 3 general approaches to addressing this problem, and
> hash_mem is one of them. Am I not allowed to point that out? I have
> been completely open and honest about the risks.

I don't think I said at any point that you weren't allowed to suggest
something.  I do think, and continue to feel, that not enough
consideration is being given to the fact that we're well past the point
where this kind of development should be happening- and commenting on
how leveling that concern at your proposed solution is mere 'hand
wringing' certainly doesn't reduce my feeling that we're being far too
cavalier with this.

> > > Like I said, the escape hatch GUC is not my preferred solution. But at
> > > least it acknowledges the problem. I don't think that anyone (or
> > > anyone else) believes that work_mem doesn't have serious limitations.
> >
> > work_mem obviously has serious limitations, but that's not novel or new
> > or unexpected by anyone.
> 
> In your other email from this morning, you wrote:
> 
> "And those workloads would be addressed by increasing work_mem, no?
> Why are we inventing something new here for something that'll only
> impact a small fraction of users in a small fraction of cases and
> where there's already a perfectly workable way to address the issue?"
> 
> Which one is it?

Uh, it's clearly both.  Those two statements are not contradictory at
all- I agree that work_mem isn't good, and it has limitations, but this
isn't one of those- people can increase work_mem and get the same
HashAgg they got before and have it use all that memory just as it did
before if they want to.

> > > > I'm really rather 

Re: Stale external URL in doc?

2020-07-10 Thread Tom Lane
Thomas Munro  writes:
> The Microsoft one is OK, it's a redirect, but the redirect target
> looks like a more permanent URL to me so maybe we should change it.

+1

> The others required minor manual sleuthing to correct; I hope I found
> the correct ISN ranges page.  Please see attached.

I didn't actually check any of these, but they look like sane changes.

regards, tom lane




Re: "tuple concurrently updated" in pg_restore --jobs

2020-07-10 Thread Justin Pryzby
On Fri, Jul 10, 2020 at 05:36:28PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Fri, Jul 10, 2020 at 04:54:40PM -0400, Tom Lane wrote:
> >> This works about 99% of the time, in fact.  It falls down in the --clean
> 
> > Note that this fails for me (sometimes) even without --clean.
> 
> Oh, I was thinking that REVOKE would only be issued in the --clean
> case, but apparently that's not so.  Doesn't really affect the fix
> proposal though.  I just finished a patch for HEAD, as attached.
> 
> (I flushed the "CatalogId objCatId" argument of dumpACL, which was
> not used.)
> 
> I'm not sure how far to back-patch it -- I think the parallel restore
> of ACLs behavior is not very old, but we might want to teach older
> pg_dump versions to insert the extra dependency anyway, for safety.

Yes, and the test case in David's patch on the other thread [0] can't be
backpatched any further than this patch.  A variant on his test case could just
as well be included in this patch (with pg_dump writing to a seekable FD) and
then amended later to also test writing to an unseekable FD.

[0] https://commitfest.postgresql.org/28/2568/




Re: Stale external URL in doc?

2020-07-10 Thread Thomas Munro
On Fri, Jul 10, 2020 at 10:07 AM Daniel Gustafsson  wrote:
> (but I didn't test all of them)

Cave-person shell script time:

for url in ` git grep 'url="http' | sed 's/.*url="//;s/".*//' | sort | uniq `
do
  if ! curl --output /dev/null --silent --head --fail "$url"
  then
echo "bad URL: $url"
  fi
done

bad URL: https://mingw-w64.org/
bad URL: https://msdn.microsoft.com/en-us/library/aa380493%28VS.85%29.aspx
bad URL: https://ssl.icu-project.org/icu-bin/locexp
bad URL: https://www.ismn-international.org/ranges.html

The Microsoft one is OK, it's a redirect, but the redirect target
looks like a more permanent URL to me so maybe we should change it.
The others required minor manual sleuthing to correct; I hope I found
the correct ISN ranges page.  Please see attached.

Looking at the ICU URL, I found a couple like that in our source tree,
and fixed those too, including one used by
src/backend/utils/mb/Unicode/Makefile to fetch source data which has
moved (http://site.icu-project.org/repository says "Announcement
07/16/2018: The ICU source code repository has been migrated from
Subversion to Git, and is now hosted on GitHub.").
From f27aed63697d7ed844dd32f46182d4a8b0edb903 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 11 Jul 2020 09:33:10 +1200
Subject: [PATCH] Fix some more broken URLs.

Discussion: https://postgr.es/m/20200709.161226.204639179120026914.horikyota@gmail.com
---
 doc/src/sgml/acronyms.sgml| 2 +-
 doc/src/sgml/charset.sgml | 2 +-
 doc/src/sgml/installation.sgml| 2 +-
 doc/src/sgml/isn.sgml | 2 +-
 src/backend/utils/mb/Unicode/Makefile | 2 +-
 src/common/encnames.c | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml
index f638665dc9..d0265dfb93 100644
--- a/doc/src/sgml/acronyms.sgml
+++ b/doc/src/sgml/acronyms.sgml
@@ -649,7 +649,7 @@
 SSPI
 
  
-  https://msdn.microsoft.com/en-us/library/aa380493%28VS.85%29.aspx;>Security
+  https://docs.microsoft.com/en-us/windows/win32/secauthn/sspi;>Security
   Support Provider Interface
  
 
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 4b4563c5b9..ccd39a55f5 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -841,7 +841,7 @@ CREATE COLLATION german (provider = libc, locale = 'de_DE');
 subtag) can be found in
 the https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml;>CLDR
 repository.
-The https://ssl.icu-project.org/icu-bin/locexp;>ICU Locale
+The https://icu-project.org/icu-bin/locexp;>ICU Locale
 Explorer can be used to check the details of a particular locale
 definition.  The examples using the k* subtags require
 at least ICU version 54.
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 552303e211..617eb1b044 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -2430,7 +2430,7 @@ xcodebuild -version -sdk macosx Path
 

  To build 64 bit binaries using MinGW, install the 64 bit tool set
- from https://mingw-w64.org/;>, put its bin
+ from https://www.mingw-w64.org/;>, put its bin
  directory in the PATH, and run
  configure with the
  --host=x86_64-w64-mingw32 option.
diff --git a/doc/src/sgml/isn.sgml b/doc/src/sgml/isn.sgml
index e1ea209ff1..bdab4eba34 100644
--- a/doc/src/sgml/isn.sgml
+++ b/doc/src/sgml/isn.sgml
@@ -400,7 +400,7 @@ SELECT isbn13(id) FROM test;
 https://en.wikipedia.org/wiki/List_of_ISBN_identifier_groups;>
 https://www.isbn-international.org/content/isbn-users-manual;>
 https://en.wikipedia.org/wiki/International_Standard_Music_Number;>
-https://www.ismn-international.org/ranges.html;>
+https://www.ismn-international.org/ranges/ranges;>

 
Care was taken during the creation of the algorithms and they
diff --git a/src/backend/utils/mb/Unicode/Makefile b/src/backend/utils/mb/Unicode/Makefile
index 9084f03009..da307d8eb9 100644
--- a/src/backend/utils/mb/Unicode/Makefile
+++ b/src/backend/utils/mb/Unicode/Makefile
@@ -122,7 +122,7 @@ euc-jis-2004-std.txt sjis-0213-2004-std.txt:
 	$(DOWNLOAD) http://x0213.org/codetable/$(@F)
 
 gb-18030-2000.xml windows-949-2000.xml:
-	$(DOWNLOAD) https://ssl.icu-project.org/repos/icu/data/trunk/charset/data/xml/$(@F)
+	$(DOWNLOAD) https://raw.githubusercontent.com/unicode-org/icu-data/master/charset/data/xml/$(@F)
 
 GB2312.TXT:
 	$(DOWNLOAD) 'http://trac.greenstone.org/browser/trunk/gsdl/unicode/MAPPINGS/EASTASIA/GB/GB2312.TXT?rev=1842=txt'
diff --git a/src/common/encnames.c b/src/common/encnames.c
index 14cf1b39e9..2404822619 100644
--- a/src/common/encnames.c
+++ b/src/common/encnames.c
@@ -408,7 +408,7 @@ const pg_enc2gettext pg_enc2gettext_tbl[] =
 /*
  * Table of encoding names for ICU (currently covers backend encodings only)
  *
- * Reference: 

Re: Stale external URL in doc?

2020-07-10 Thread Daniel Gustafsson
> On 10 Jul 2020, at 18:28, Tom Lane  wrote:
> Alvaro Herrera  writes:

>> Um, the comp.ai.genetic FAQ can still be found, eg. 
>> http://www.faqs.org/faqs/ai-faq/genetic/part1/
> 
> So it is, although that also shows it hasn't been updated since 2001.

Ah, I missed the alternative source.

> I'll go update that pointer and remove the other links per Daniel's
> patch.

Thanks for the fixup.

cheers ./daniel




Re: "tuple concurrently updated" in pg_restore --jobs

2020-07-10 Thread Tom Lane
Justin Pryzby  writes:
> On Fri, Jul 10, 2020 at 04:54:40PM -0400, Tom Lane wrote:
>> This works about 99% of the time, in fact.  It falls down in the --clean

> Note that this fails for me (sometimes) even without --clean.

Oh, I was thinking that REVOKE would only be issued in the --clean
case, but apparently that's not so.  Doesn't really affect the fix
proposal though.  I just finished a patch for HEAD, as attached.

(I flushed the "CatalogId objCatId" argument of dumpACL, which was
not used.)

I'm not sure how far to back-patch it -- I think the parallel restore
of ACLs behavior is not very old, but we might want to teach older
pg_dump versions to insert the extra dependency anyway, for safety.

regards, tom lane

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 4ea59f5494..08239dde4f 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -28,7 +28,7 @@
  */
 static DumpableObject **dumpIdMap = NULL;
 static int	allocedDumpIds = 0;
-static DumpId lastDumpId = 0;
+static DumpId lastDumpId = 0;	/* Note: 0 is InvalidDumpId */
 
 /*
  * Variables for mapping CatalogId to DumpableObject
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index b17c9dbb8b..1017abbbe5 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -233,6 +233,12 @@ typedef struct
 
 typedef int DumpId;
 
+#define InvalidDumpId 0
+
+/*
+ * Function pointer prototypes for assorted callback methods.
+ */
+
 typedef int (*DataDumperPtr) (Archive *AH, void *userArg);
 
 typedef void (*SetupWorkerPtrType) (Archive *AH);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 8bca147a33..e758b5c50a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -220,11 +220,11 @@ static void dumpUserMappings(Archive *fout,
 			 const char *owner, CatalogId catalogId, DumpId dumpId);
 static void dumpDefaultACL(Archive *fout, DefaultACLInfo *daclinfo);
 
-static void dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
-	const char *type, const char *name, const char *subname,
-	const char *nspname, const char *owner,
-	const char *acls, const char *racls,
-	const char *initacls, const char *initracls);
+static DumpId dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
+	  const char *type, const char *name, const char *subname,
+	  const char *nspname, const char *owner,
+	  const char *acls, const char *racls,
+	  const char *initacls, const char *initracls);
 
 static void getDependencies(Archive *fout);
 static void BuildArchiveDependencies(Archive *fout);
@@ -2992,7 +2992,7 @@ dumpDatabase(Archive *fout)
 	 * Dump ACL if any.  Note that we do not support initial privileges
 	 * (pg_init_privs) on databases.
 	 */
-	dumpACL(fout, dbCatId, dbDumpId, "DATABASE",
+	dumpACL(fout, dbDumpId, InvalidDumpId, "DATABASE",
 			qdatname, NULL, NULL,
 			dba, datacl, rdatacl, "", "");
 
@@ -3477,7 +3477,7 @@ dumpBlob(Archive *fout, BlobInfo *binfo)
 
 	/* Dump ACL if any */
 	if (binfo->blobacl && (binfo->dobj.dump & DUMP_COMPONENT_ACL))
-		dumpACL(fout, binfo->dobj.catId, binfo->dobj.dumpId, "LARGE OBJECT",
+		dumpACL(fout, binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
 binfo->dobj.name, NULL,
 NULL, binfo->rolname, binfo->blobacl, binfo->rblobacl,
 binfo->initblobacl, binfo->initrblobacl);
@@ -10155,7 +10155,7 @@ dumpNamespace(Archive *fout, NamespaceInfo *nspinfo)
 	 nspinfo->dobj.catId, 0, nspinfo->dobj.dumpId);
 
 	if (nspinfo->dobj.dump & DUMP_COMPONENT_ACL)
-		dumpACL(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId, "SCHEMA",
+		dumpACL(fout, nspinfo->dobj.dumpId, InvalidDumpId, "SCHEMA",
 qnspname, NULL, NULL,
 nspinfo->rolname, nspinfo->nspacl, nspinfo->rnspacl,
 nspinfo->initnspacl, nspinfo->initrnspacl);
@@ -10439,7 +10439,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
 	 tyinfo->dobj.catId, 0, tyinfo->dobj.dumpId);
 
 	if (tyinfo->dobj.dump & DUMP_COMPONENT_ACL)
-		dumpACL(fout, tyinfo->dobj.catId, tyinfo->dobj.dumpId, "TYPE",
+		dumpACL(fout, tyinfo->dobj.dumpId, InvalidDumpId, "TYPE",
 qtypname, NULL,
 tyinfo->dobj.namespace->dobj.name,
 tyinfo->rolname, tyinfo->typacl, tyinfo->rtypacl,
@@ -10565,7 +10565,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
 	 tyinfo->dobj.catId, 0, tyinfo->dobj.dumpId);
 
 	if (tyinfo->dobj.dump & DUMP_COMPONENT_ACL)
-		dumpACL(fout, tyinfo->dobj.catId, tyinfo->dobj.dumpId, "TYPE",
+		dumpACL(fout, tyinfo->dobj.dumpId, InvalidDumpId, "TYPE",
 qtypname, NULL,
 tyinfo->dobj.namespace->dobj.name,
 tyinfo->rolname, tyinfo->typacl, tyinfo->rtypacl,
@@ -10637,7 +10637,7 @@ dumpUndefinedType(Archive *fout, TypeInfo *tyinfo)
 	 tyinfo->dobj.catId, 0, tyinfo->dobj.dumpId);
 
 	if (tyinfo->dobj.dump & DUMP_COMPONENT_ACL)
-		dumpACL(fout, tyinfo->dobj.catId, tyinfo->dobj.dumpId, "TYPE",
+		dumpACL(fout, tyinfo->dobj.dumpId, InvalidDumpId, "TYPE",
 qtypname, 

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Tom Lane
Peter Geoghegan  writes:
> It now seems likely that the hash_mem/hash_mem_multiplier proposal has
> the support it needs to get into Postgres 13. Assuming that the
> proposal doesn't lose momentum, then it's about time to return to the
> original question you posed at the start of the thread:

> What should we do with the hashagg_avoid_disk_plan GUC (formerly known
> as the enable_hashagg_disk GUC), if anything?

> I myself think that there is a case to be made for removing it
> entirely.

+0.5 or so for removing it.  It seems too confusing and dubiously
useful.

regards, tom lane




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Jul-10, Peter Geoghegan wrote:
>> * The maximum allowable value is 100.0, to protect users from
>> accidentally setting hash_mem_multiplier to a value intended to work
>> like a work_mem-style KB value (you can't provide an absolute value
>> like that directly). This maximum is absurdly high.

> I'm not sure about this bit; sounds a bit like what has been qualified
> as "nannyism" elsewhere.  Suppose I want to give a hash table 2GB of
> memory for whatever reason.  If my work_mem is default (4MB) then I
> cannot possibly achieve that without altering both settings.
> So I propose that maybe we do want a maximum value, but if so it should
> be higher than what you propose.  I think 10000 is acceptable in that it
> doesn't get in the way.

I was kind of thinking 1000 as the limit ;-).  In any case, the code
will need to internally clamp the product to not exceed whatever the
work_mem physical limit is these days.

regards, tom lane




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 2:10 PM Alvaro Herrera  wrote:
> I'm not sure about this bit; sounds a bit like what has been qualified
> as "nannyism" elsewhere.  Suppose I want to give a hash table 2GB of
> memory for whatever reason.  If my work_mem is default (4MB) then I
> cannot possibly achieve that without altering both settings.
>
> So I propose that maybe we do want a maximum value, but if so it should
> be higher than what you propose.  I think 10000 is acceptable in that it
> doesn't get in the way.

That's a good point.

I amend my proposal: the maximum allowable value of
hash_mem_multiplier should be 10000.0 (i.e., ten thousand times
whatever work_mem is set to, which is subject to the existing work_mem
sizing restrictions).

-- 
Peter Geoghegan




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 11:34 AM Jeff Davis  wrote:
> On Fri, 2020-07-10 at 13:46 -0400, Tom Lane wrote:
> > I looked over Peter's patch in [1], and it seems generally pretty
> > sane to me, though I concur with the idea that it'd be better to
> > define the GUC as a multiplier for work_mem.  (For one thing, we
> > could
> > then easily limit it to be at least 1.0, ensuring sanity; also, if
> > work_mem does eventually become more dynamic than it is now, we might
> > still be able to salvage this knob as something useful.  Or if not,
> > we just rip it out.)  So my vote is for moving in that direction.
>
> In that case, I will hold off on my "escape-hatch" GUC.

It now seems likely that the hash_mem/hash_mem_multiplier proposal has
the support it needs to get into Postgres 13. Assuming that the
proposal doesn't lose momentum, then it's about time to return to the
original question you posed at the start of the thread:

What should we do with the hashagg_avoid_disk_plan GUC (formerly known
as the enable_hashagg_disk GUC), if anything?

I myself think that there is a case to be made for removing it
entirely. But if we keep it then we should also not change the
default. In other words, by default the planner should *not* try to
avoid hash aggs that spill. AFAICT there is no particular reason to be
concerned about that now, since nobody has expressed any concerns
about any of the possibly-relevant cost models. That said, I don't
feel strongly about this hashagg_avoid_disk_plan question. It seems
*much* less important.

--
Peter Geoghegan




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Alvaro Herrera
On 2020-Jul-10, Peter Geoghegan wrote:

> * The maximum allowable value is 100.0, to protect users from
> accidentally setting hash_mem_multiplier to a value intended to work
> like a work_mem-style KB value (you can't provide an absolute value
> like that directly). This maximum is absurdly high.
> 
> I think that it's possible that a small number of users will find it
> useful to set the value of hash_mem_multiplier as high as 5.0. That is
> a very aggressive value, but one that could still make sense with
> certain workloads.

I'm not sure about this bit; sounds a bit like what has been qualified
as "nannyism" elsewhere.  Suppose I want to give a hash table 2GB of
memory for whatever reason.  If my work_mem is default (4MB) then I
cannot possibly achieve that without altering both settings.

So I propose that maybe we do want a maximum value, but if so it should
be higher than what you propose.  I think 10000 is acceptable in that it
doesn't get in the way.
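
As a rough worked example of that constraint (the 2GB and 4MB figures are the
ones from the scenario above; the arithmetic can be checked directly in SQL):

-- with work_mem at the default 4MB, a 2GB hash table needs a multiplier of
-- 2GB / 4MB = 512, which a cap of 100.0 cannot reach without raising work_mem
SELECT pg_size_bytes('2GB') / pg_size_bytes('4MB') AS multiplier_needed;  -- 512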

Another point is that if you specify a unit for the multiplier (which is
what users are likely to do for larger values), it'll fail anyway, so
I'm not sure this is such terrible a problem.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: "tuple concurrently updated" in pg_restore --jobs

2020-07-10 Thread Justin Pryzby
On Fri, Jul 10, 2020 at 04:54:40PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > I hit this issue intermittently (roughly half the time) while working with a
> > patch David submitted, and finally found a recipe to reproduce it on an
> > unpatched v12 instance.
> 
> > I was surprised to see pg_restore -j2 is restoring ACLs in pre-data in
> > parallel.
> 
> It's not pre-data.  But it's true that pg_restore figures it can restore
> ACLs in parallel during the ACL-restoring pass, on the theory that pg_dump
> will not emit two different ACL entries for the same object, so that we
> can do all the catalog updates in parallel without conflicts.
> 
> This works about 99% of the time, in fact.  It falls down in the --clean

Note that this fails for me (sometimes) even without --clean.

$ pg_restore pg_dump.out -j2 -d pryzbyj -v --section pre-data

pg_restore: entering main parallel loop
pg_restore: launching item 3395 ACL TABLE pg_proc
pg_restore: launching item 3396 ACL COLUMN pg_proc.proname
pg_restore: creating ACL "pg_catalog.TABLE pg_proc"
pg_restore: creating ACL "pg_catalog.COLUMN pg_proc.proname"
pg_restore: finished item 3395 ACL TABLE pg_proc
pg_restore: launching item 3397 ACL COLUMN pg_proc.pronamespace
pg_restore: creating ACL "pg_catalog.COLUMN pg_proc.pronamespace"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 3396; 0 0 ACL COLUMN pg_proc.proname postgres
pg_restore: error: could not execute query: ERROR:  tuple concurrently updated
Command was: GRANT SELECT(proname) ON TABLE pg_catalog.pg_proc TO PUBLIC;


-- 
Justin




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 10:47 AM Tom Lane  wrote:
> I looked over Peter's patch in [1], and it seems generally pretty
> sane to me, though I concur with the idea that it'd be better to
> define the GUC as a multiplier for work_mem.  (For one thing, we could
> then easily limit it to be at least 1.0, ensuring sanity; also, if
> work_mem does eventually become more dynamic than it is now, we might
> still be able to salvage this knob as something useful.  Or if not,
> we just rip it out.)  So my vote is for moving in that direction.

Cool. I agree that it makes sense to constrain the effective value to
be at least work_mem in all cases.

With that in mind, I propose that this new GUC have the following
characteristics:

* It should be named "hash_mem_multiplier", a floating point GUC
(somewhat like bgwriter_lru_multiplier).

* The default value is 2.0.

* The minimum allowable value is 1.0, to protect users from
accidentally giving less memory to hash-based nodes.

* The maximum allowable value is 100.0, to protect users from
accidentally setting hash_mem_multiplier to a value intended to work
like a work_mem-style KB value (you can't provide an absolute value
like that directly). This maximum is absurdly high.

I think that it's possible that a small number of users will find it
useful to set the value of hash_mem_multiplier as high as 5.0. That is
a very aggressive value, but one that could still make sense with
certain workloads.
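
As a hypothetical usage sketch, assuming the GUC lands with the name and
semantics proposed above (it was not in any released version at the time of
this thread):

SET work_mem = '64MB';
SET hash_mem_multiplier = 2.0;   -- hash-based nodes could then use up to ~128MB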

Thoughts?
-- 
Peter Geoghegan




Re: "tuple concurrently updated" in pg_restore --jobs

2020-07-10 Thread Tom Lane
Justin Pryzby  writes:
> I hit this issue intermittently (roughly half the time) while working with a
> patch David submitted, and finally found a recipe to reproduce it on an
> unpatched v12 instance.

> I was surprised to see pg_restore -j2 is restoring ACLs in pre-data in
> parallel.

It's not pre-data.  But it's true that pg_restore figures it can restore
ACLs in parallel during the ACL-restoring pass, on the theory that pg_dump
will not emit two different ACL entries for the same object, so that we
can do all the catalog updates in parallel without conflicts.

This works about 99% of the time, in fact.  It falls down in the --clean
case if we have to revoke existing table permissions, because in that case
the REVOKE at table level is required to clear the table's per-column ACLs
as well, so that that ACL entry involves touching the same catalog rows
that the per-column ACLs want to touch.

I think the right fix is to give the per-column ACL entries dependencies
on the per-table ACL, if there is one.  This will not fix the problem
for the case of restoring from an existing pg_dump archive that lacks
such dependency links --- but given the lack of field complaints, I'm
okay with that.

This looks straightforward, if somewhat tedious because we'll have to
change the API of pg_dump's dumpACL() function, which is called by
a lot of places.  Barring objections, I'll go do that.

regards, tom lane




Re: Wait profiling

2020-07-10 Thread Alvaro Herrera
On 2020-Jul-10, Daniel Wood wrote:

> After nearly 5 years does something like the following yet exist?
> https://www.postgresql.org/message-id/559d4729.9080...@postgrespro.ru

Yes, we have pg_stat_activity.wait_events which implement pretty much
what Ildus describes there.
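
For reference, the columns in question are wait_event_type and wait_event; a
quick way to sample them is:

SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE state = 'active';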

> 1) An option to "explain" to produce a wait events profile.
> postgres=# explain (analyze, waitprofile) update pgbench_accounts set 
> bid=bid+1 where aid < 200;
> ...
> Execution time: 23111.231 ms

There's an out-of-core extension, search for pg_wait_sampling.  I
haven't tested it yet ...
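
A minimal sketch of how that extension is typically used, going by its own
documentation (the view and column names, and the fact that it must be loaded
via shared_preload_libraries, are assumptions here, not something verified in
this thread):

CREATE EXTENSION pg_wait_sampling;
SELECT event_type, event, count
FROM pg_wait_sampling_profile
ORDER BY count DESC
LIMIT 10;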

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM

2020-07-10 Thread Michail Nikolaev
Hello, Peter.

Thanks for the update.

Yes, it is the right decision.
I have started to spot that bug only while working on a faster scan
using hint bits on replicas [1], so it is unlikely to hit it in
production at the moment.

Thanks,
Michail.

[1]: 
https://www.postgresql.org/message-id/CANtu0ojmkN_6P7CQWsZ%3DuEgeFnSmpCiqCxyYaHnhYpTZHj7Ubw%40mail.gmail.com




Wait profiling

2020-07-10 Thread Daniel Wood
After nearly 5 years does something like the following yet exist?
https://www.postgresql.org/message-id/559d4729.9080...@postgrespro.ru

I feel that it would be useful to have the following two things.  One PG 
enhancement and one standard extension.

1) An option to "explain" to produce a wait events profile.
postgres=# explain (analyze, waitprofile) update pgbench_accounts set bid=bid+1 
where aid < 200;
...
Execution time: 23111.231 ms

62.6% BufFileRead
50.0% CPU
9.3% LWLock

It uses a PG timer to do this.

2) An extension based function like: select pg_wait_profile(pid, nSeconds, 
timerFrequency) to return the same thing for an already running query.  Useful 
if you want examine some already long running query that is taking too long.

Neither of these would be doing the heavy weight pg_stat_activity but directly 
poll the wait event in PROC.  I've already coded the EXPLAIN option.

Furthermore, can't we just remove the following "IF" test from 
pgstat_report_wait_{start,end}?
if (!pgstat_track_activities || !proc)
return;
Just do the assignment of wait_event_info always.  We should use a dummy PGPROC 
assigned to MyProc until we assign the one in the procarray in shared memory.  
That way we don't need the "!proc" test.
About the only thing I'd want to verify is whether wait_event_info is on the 
same cache lines as anything else having to do with snapshots.

If I recall correctly, the blank lines above that I've used to make this more 
readable will disappear.  :-(

- Dan Wood

Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'

2020-07-10 Thread Tom Lane
I wrote:
> Mark Lorenz  writes:
>> But, nevertheless, what about adding the function to accept the DAY, D 
>> (and maybe the Q) patterns for to_date() - in this case, of course, in 
>> the uncorrelated version? to_char() handles them properly. And, from my 
>> point of view, there is no reason why they should give "1" instead the 
>> real day number. What do you think?

> The trick is to produce something sane.  I think that a reasonable
> precedent for this would be what to_date does with ISO-week fields:
> you can ask it to parse IYYY-IW-ID but you can't mix that with regular
> month/day/year fields.  So for example, it seems like it'd be possible
> to reconstruct a date from YYYY-WW-D, because that's enough to uniquely
> identify a day.  The D field isn't monotonically increasing within a
> week, but nonetheless there's exactly one day in each YYYY-WW week that
> has a particular D value.  However you probably don't want to allow
> inconsistent mixtures like YYYY-WW-ID, because that's just a mess (and
> more than likely, it's a mistake).  And I would not be in favor of
> allowing YYYY-Q either, because that would not be enough to uniquely
> identify a day, so there's really no point in allowing Q to enter into
> to_date's considerations at all.
> Whether there is actually any field demand for such a feature is
> not clear to me.  AFAICT Oracle doesn't support it.
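
For a quick illustration of the fields being discussed: to_char() already
accepts them today, and the proposal is about making to_date() accept the
reverse direction.

SELECT to_char(date '2020-07-10', 'YYYY-WW-D');   -- 2020-28-6 (week 28, Friday = 6)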

Since we're certainly not going to commit these patches as-presented,
and nothing has happened on this thread since early April, I've marked
both the CF entries as Returned With Feedback.  If you do write a patch
to make to_date work as above, please file a new CF entry.

(BTW, having two CF entries pointing at the same email thread is
pretty confusing to our not-that-bright tools.  It's probably
better to have just one entry per thread in the future.)

regards, tom lane




Re: Remove Deprecated Exclusive Backup Mode

2020-07-10 Thread David Steele

On 7/2/20 7:12 AM, David Steele wrote:


None of this really solves the problem of what happens when the user 
dumps the backup_label into the data directory. With traditional backup 
software that's pretty much going to be the only choice. Is telling them 
not to do it and washing our hands of it really enough?


After some thought I believe there are several recommendations we could 
give in the docs about what to do with the backup_label file.


This patch, though, looks to be a dead end. I think the best thing to do 
is withdraw the patch and end the thread.


I'll start a new thread when I have a patch/design that implements the 
new ideas we have discussed.


Regards,
--
-David
da...@pgmasters.net




Re: [Patch] Add missing libraries to Libs.private of libpq.pc

2020-07-10 Thread Peter Eisentraut

On 2020-04-08 11:38, Sandro Mani wrote:

The following patch, which we added to build mingw-postgresql on Fedora,
adds some missing libraries to Libs.private of libpq.pc, discovered when
attempting to statically link with libpq:

-lz: is required by -lcrypto


I think the correct fix for that would be to add libssl to libpq's 
Requires.private.



-liconv: is required by -lintl (though possibly depends on whether
gettext was compiled with iconv support)


Yeah, in both of these cases it depends on what libssl or libintl 
variant you actually got.  It could be the OS one or a separately 
installed one, it could be one with or without pkg-config support.  I'm 
not sure what a robust solution would be.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Jeff Davis
On Fri, 2020-07-10 at 13:46 -0400, Tom Lane wrote:
> I looked over Peter's patch in [1], and it seems generally pretty
> sane to me, though I concur with the idea that it'd be better to
> define the GUC as a multiplier for work_mem.  (For one thing, we
> could
> then easily limit it to be at least 1.0, ensuring sanity; also, if
> work_mem does eventually become more dynamic than it is now, we might
> still be able to salvage this knob as something useful.  Or if not,
> we just rip it out.)  So my vote is for moving in that direction.

In that case, I will hold off on my "escape-hatch" GUC.

Regards,
Jeff Davis






Re: Binary support for pgoutput plugin

2020-07-10 Thread Tom Lane
Daniel Gustafsson  writes:
> Thanks for the update!  Do note that my patch included a new file which is
> missing from this patchset:
>   src/test/subscription/t/014_binary.pl
> This is, IMO, the most interesting test of this feature so it would be good to
> be included.  It's a basic test and can no doubt be extended to be even more
> relevant, but it's a start.

I was about to complain that the latest patchset includes no meaningful
test cases, but I assume that this missing file contains those.

>> I did not do the macro for updated, inserted, deleted, will give that a go 
>> tomorrow.

> This might not be a blocker, but personally I think it would make the
> code more readable. Anyone else have an opinion on this?

+1 for using macros.

> Reading through the new patch, and running the tests, I'm marking this as 
> Ready
> for Committer.  It does need some cosmetic TLC, quite possibly just from
> pg_indent but I didn't validate if it will take care of everything, and 
> comment
> touchups (there is still a TODO comment around wording that needs to be
> resolved).  However, I think it's in good enough shape for consideration at
> this point.

I took a quick look through the patch and had some concerns:

* Please strip out the PG_VERSION_NUM and USE_INTEGER_DATETIMES checks.
Those are quite dead so far as a patch for HEAD is concerned --- in fact,
since USE_INTEGER_DATETIMES hasn't even been defined since v10 or so,
the patch is actively doing the wrong thing there.  Not that it matters.
This code will never appear in any branch where float timestamps could
be a thing.

* I doubt that the checks on USE_FLOAT4/8_BYVAL, sizeof(int), endiannness,
etc, make any sense either.  Those surely do not affect the on-the-wire
representation of values --- or if they do, we've blown it somewhere else.
I'd just take out all those checks and assume that the binary
representation is sufficiently portable.  (If it's not, it's more or less
the user's problem, just as in binary COPY.)

* Please also remove debugging hacks such as enabling WAL_DEBUG.

* It'd likely be wise for the documentation to point out that binary
mode only works if all types to be transferred have send/receive
functions.
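
For context, the option under review is set at subscription creation time; a
rough usage sketch, with the option name taken from the patch and placeholder
connection and publication names, would be:

CREATE SUBSCRIPTION mysub
  CONNECTION 'host=publisher dbname=postgres'
  PUBLICATION mypub
  WITH (binary = true);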


BTW, while it's not the job of this patch to fix it, I find it quite
distressing that we're apparently repeating the lookups of the type
I/O functions for every row transferred.

I'll set this back to WoA, but I concur with Daniel's opinion that
it doesn't seem that far from committability.

regards, tom lane




Re: Report error position in partition bound check

2020-07-10 Thread Alexandra Wang
> On 2 July 2020, at 06:39, Daniel Gustafsson  wrote:
> > On 10 Apr 2020, at 23:50, Alexandra Wang  wrote:
>
> > On Fri, Apr 10, 2020 at 8:37 AM Ashutosh Bapat <ashutosh.ba...@2ndquadrant.com> wrote:
> > > for a multi-key value the ^
> > > points to the first column and the reader may think that that's the
> > > problematic column. Should it instead point to ( ?
> >
> > I attached a v2 of Amit's 0002 patch to also report the exact column
> > for the partition overlap errors.
>
> This patch fails to apply to HEAD due to conflicts in the create_table expected
> output.  Can you please submit a rebased version?  I'm marking the CF entry
> Waiting on Author in the meantime.

Thank you Daniel. Here's the rebased patch. I also squashed the two
patches into one so it's easier to review.

-- 
Alexandra Wang
From 572f9086fc9d6e3e7cf1e58c3a6ee21dd8cd9f1b Mon Sep 17 00:00:00 2001
From: Alexandra Wang 
Date: Fri, 10 Jul 2020 10:28:49 -0700
Subject: [PATCH] Improve check new partition bound error position report

We have been passing a dummy ParseState to ereport(). Without the source
text in the ParseState, ereport does not report the error position even
if an error location is supplied. This patch passes a ParseState to
check_new_partition_bound() when it is available.

-- Create parent table
create table foo (a int, b int, c date) partition by range (b,c);

-- Before:
create table foo_part_1 partition of foo for values from (1, date '2007-01-01') to (1, date '2006-01-01');
ERROR:  empty range bound specified for partition "foo_part_1"
DETAIL:  Specified lower bound (1, '2007-01-01') is greater than or equal to upper bound (1, '2006-01-01').

-- After:
create table foo_part_1 partition of foo for values from (1, date '2007-01-01') to (1, date '2006-01-01');
ERROR:  empty range bound specified for partition "foo_part_1"
LINE 1: ...e foo_part_1 partition of foo for values from (1, date '2007...
 ^
DETAIL:  Specified lower bound (1, '2007-01-01') is greater than or equal to upper bound (1, '2006-01-01').

Co-authored-by: Ashwin Agrawal 
Co-authored-by: Amit Langote 
---
 src/backend/commands/tablecmds.c   | 15 --
 src/backend/parser/parse_utilcmd.c |  3 ++
 src/backend/partitioning/partbounds.c  | 63 ++
 src/include/partitioning/partbounds.h  |  4 +-
 src/test/regress/expected/alter_table.out  | 10 
 src/test/regress/expected/create_table.out | 30 +++
 6 files changed, 98 insertions(+), 27 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 26700da278..8de7473ba7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -541,7 +541,8 @@ static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partPa
 static void CreateInheritance(Relation child_rel, Relation parent_rel);
 static void RemoveInheritance(Relation child_rel, Relation parent_rel);
 static ObjectAddress ATExecAttachPartition(List **wqueue, Relation rel,
-		   PartitionCmd *cmd);
+		   PartitionCmd *cmd,
+		   AlterTableUtilityContext * context);
 static void AttachPartitionEnsureIndexes(Relation rel, Relation attachrel);
 static void QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
 			   List *partConstraint,
@@ -1005,7 +1006,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		 * Check first that the new partition's bound is valid and does not
 		 * overlap with any of existing partitions of the parent.
 		 */
-		check_new_partition_bound(relname, parent, bound);
+		check_new_partition_bound(relname, parent, bound, pstate);
 
 		/*
 		 * If the default partition exists, its partition constraints will
@@ -4646,7 +4647,8 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	  cur_pass, context);
 			Assert(cmd != NULL);
 			if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-ATExecAttachPartition(wqueue, rel, (PartitionCmd *) cmd->def);
+ATExecAttachPartition(wqueue, rel, (PartitionCmd *) cmd->def,
+	  context);
 			else
 ATExecAttachPartitionIdx(wqueue, rel,
 		 ((PartitionCmd *) cmd->def)->name);
@@ -16186,7 +16188,8 @@ QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
  * Return the address of the newly attached partition.
  */
 static ObjectAddress
-ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
+ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd,
+	  AlterTableUtilityContext * context)
 {
 	Relation	attachrel,
 catalog;
@@ -16201,6 +16204,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 	const char *trigger_name;
 	Oid			defaultPartOid;
 	List	   *partBoundConstraint;
+	ParseState *pstate = make_parsestate(NULL);
 
 	/*
 	 * We must lock the default partition if one exists, because attaching 

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Jul 10, 2020 at 7:17 AM Stephen Frost  wrote:
>> Due to the fact that we're in beta and now is not the time to be
>> redesigning this feature.

> Did you read the discussion?

Beta is when we fix problems that testing exposes in new features.
Obviously, we'd rather not design new APIs at this point, but if it's
the only reasonable way to resolve a problem, that's what we've got
to do.  I don't think anyone is advocating for reverting the hashagg
spill feature, and "do nothing" is not an attractive option either.
On the other hand, it's surely too late to engage in any massive
redesigns such as some of this thread has speculated about.

I looked over Peter's patch in [1], and it seems generally pretty
sane to me, though I concur with the idea that it'd be better to
define the GUC as a multiplier for work_mem.  (For one thing, we could
then easily limit it to be at least 1.0, ensuring sanity; also, if
work_mem does eventually become more dynamic than it is now, we might
still be able to salvage this knob as something useful.  Or if not,
we just rip it out.)  So my vote is for moving in that direction.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 7:17 AM Stephen Frost  wrote:
> > The hash_mem design (as it stands) would affect both hash join and
> > hash aggregate. I believe that it makes most sense to have hash-based
> > nodes under the control of a single GUC. I believe that this
> > granularity will cause the least problems. It certainly represents a
> > trade-off.
>
> So, now this has moved from being a hack to deal with a possible
> regression for a small number of users due to new behavior in one node,
> to a change that has impacts on other nodes that hadn't been changed,
> all happening during beta.

The common goal is ameliorating or avoiding predictable negative
consequences for our users. One proposal is an ambitious and
comprehensive way of dealing with that, that certainly has unique
risks. The other is much less ambitious, and clearly a kludge -- but
it's also much less risky. The discussion hasn't really moved at all.

> I'm still not thrilled with the 'hash_mem' kind of idea as it's going in
> the wrong direction because what's actually needed is a way to properly
> consider and track overall memory usage- a redesign of work_mem (or some
> new parameter, but it wouldn't be 'hash_mem') as you say, but all of
> this discussion should be targeting v14.

It's certainly possible that hash_mem is too radical, and yet not
radical enough -- in any timeframe (i.e. a total redesign of work_mem
is the only thing that will be acceptable). I don't understand why you
refuse to engage with the idea at all, though. The mere fact that
hash_mem could in theory fix this problem comprehensively *usefully
frames the problem*. This is the kind of issue where developing a
shared understanding is very important.

Andres said to me privately that hash_mem could be a good idea, even
though he opposes it as a fix to the open item for Postgres 13. I
understand that proposing such a thing during beta is controversial,
whatever the specifics are. It is a proposal made in the spirit of
trying to move things forward. Hand wringing about ignoring the
community's process is completely counterproductive.

There are about 3 general approaches to addressing this problem, and
hash_mem is one of them. Am I not allowed to point that out? I have
been completely open and honest about the risks.

> > Like I said, the escape hatch GUC is not my preferred solution. But at
> > least it acknowledges the problem. I don't think that anyone (or
> > anyone else) believes that work_mem doesn't have serious limitations.
>
> work_mem obviously has serious limitations, but that's not novel or new
> or unexpected by anyone.

In your other email from this morning, you wrote:

"And those workloads would be addressed by increasing work_mem, no?
Why are we inventing something new here for something that'll only
impact a small fraction of users in a small fraction of cases and
where there's already a perfectly workable way to address the issue?"

Which one is it?

> > > I'm really rather astounded at the direction this has been going in.
> >
> > Why?
>
> Due to the fact that we're in beta and now is not the time to be
> redesigning this feature.

Did you read the discussion?

-- 
Peter Geoghegan




Re: SQL-standard function body

2020-07-10 Thread Tom Lane
Thomas Munro  writes:
> On Wed, Jul 1, 2020 at 5:49 AM Peter Eisentraut
>  wrote:
>> - More test coverage is needed.  Surprisingly, there wasn't actually any
>> test AFAICT that just creates and SQL function and runs it.  Most of
>> that code is tested incidentally, but there is very little or no
>> targeted testing of this functionality.

> FYI cfbot showed a sign of some kind of error_context_stack corruption
> while running "DROP TABLE functest3 CASCADE;".

BTW, it occurs to me after answering bug #16534 that
contrib/earthdistance's SQL functions would be great candidates for this
new syntax.  Binding their references at creation time is really exactly
what we want.

I still feel that we can't just replace the existing implementation,
though, as that would kill too many use-cases where late binding is
helpful.

regards, tom lane




Re: expose parallel leader in CSV and log_line_prefix

2020-07-10 Thread Alvaro Herrera
On 2020-Jul-10, Justin Pryzby wrote:
> On Fri, Jul 10, 2020 at 12:45:29PM -0400, Alvaro Herrera wrote:

> > I think it's overly verbose; all non-parallel backends are going to get
> > their own PID twice, and I'm not sure this is going to be great to
> > parse.  I think it would be more sensible that if the process does not
> > have a parent (leader), %P expands to empty.
> 
> That's what's done.
> 
> + Process ID of the parallel group leader if this process was
> + at some point involved in parallel query, otherwise null.  For a
> + parallel group leader itself, this field is set to its own process
> + ID.

Oh, okay by me then.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: factorial function/phase out postfix operators?

2020-07-10 Thread Mark Dilger


> On Jun 30, 2020, at 2:47 PM, Mark Dilger  wrote:
> 
> 
> 
>> On May 19, 2020, at 4:47 PM, Tom Lane  wrote:
>> 
>> I wrote:
>>> However, we do have to have a benefit to show those people whose
>>> queries we break.  Hence my insistence on having a working AS fix
>>> (or some other benefit) before not after.
>> 
>> I experimented with this a bit more, and came up with the attached.
>> It's not a working patch, just a set of grammar changes that Bison
>> is happy with.  (Getting to a working patch would require fixing the
>> various build infrastructure that knows about the keyword classification,
>> which seems straightforward but tedious.)
> 
> I built a patch on top of yours that does much of that tedious work.
> 
>> As Robert theorized, it works to move a fairly-small number of unreserved
>> keywords into a new slightly-reserved category.  However, as the patch
>> stands, only the remaining fully-unreserved keywords can be used as bare
>> column labels.  I'd hoped to be able to also use col_name keywords in that
>> way (which'd make the set of legal bare column labels mostly the same as
>> ColId).  The col_name keywords that cause problems are, it appears,
>> only PRECISION, CHARACTER, and CHAR_P.  So in principle we could move
>> those three into yet another keyword category and then let the remaining
>> col_name keywords be included in BareColLabel.  I kind of think that
>> that's more complication than it's worth, though.
> 
> By my count, 288 more keywords can be used as column aliases without the AS 
> keyword after the patch.  That exactly matches what Robert said upthread.
> 
> Tom and Álvaro discussed upthread:
> 
>> Would it make sense (and possible) to have a keyword category that is
>> not disjoint wrt. the others?  Maybe that ends up being easier than
>> a solution that ends up with six or seven categories.

Version 2, attached, follows this design, increasing the number of keywords 
that can be used as column aliases without the AS keyword up to 411, with only 
39 keywords still requiring an explicit preceding AS.



v2-0001-Allow-most-keywords-to-be-used-as-implicit-column.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





RE: Index Skip Scan (new UniqueKeys)

2020-07-10 Thread Floris Van Nee
Hi Dmitry,

Also took another look at the patch now, and found a case of incorrect data. It 
looks related to the new way of creating the paths, as I can't recall seeing 
this in earlier versions.

create table t1 as select a,b,b%5 as c, random() as d from generate_series(1, 
10) a, generate_series(1,100) b;
create index on t1 (a,b,c);

postgres=# explain select distinct on (a) * from t1 order by a,b desc,c;
  QUERY PLAN   
---
 Sort  (cost=2.92..2.94 rows=10 width=20)
   Sort Key: a, b DESC, c
   ->  Index Scan using t1_a_b_c_idx on t1  (cost=0.28..2.75 rows=10 width=20)
 Skip scan: true
(4 rows)


With the 'order by a, b desc, c' we expect the value of column 'b' to always be 
100. With index_skipscan enabled, it always gives 1 though. It's not correct 
that the planner chooses a skip scan followed by sort in this case.
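
One way to cross-check the expected result is to disable the patch's skip scan
and rerun the query (the enable_indexskipscan name is taken from the patch
series; it is not a core GUC):

SET enable_indexskipscan = off;
SELECT DISTINCT ON (a) * FROM t1 ORDER BY a, b DESC, c;   -- every row should have b = 100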

-Floris





Re: expose parallel leader in CSV and log_line_prefix

2020-07-10 Thread Justin Pryzby
On Fri, Jul 10, 2020 at 12:45:29PM -0400, Alvaro Herrera wrote:
> On 2020-Mar-18, Justin Pryzby wrote:
> 
> > On Sun, Mar 15, 2020 at 12:49:33PM +0100, Julien Rouhaud wrote:
> 
> > template1=# SET log_temp_files=0; explain analyze SELECT a,COUNT(1) FROM t 
> > a JOIN t b USING(a) GROUP BY 1;
> > 2020-03-15 21:20:47.288 CDT [55375537]LOG:  statement: SET 
> > log_temp_files=0;
> > SET
> > 2020-03-15 21:20:47.289 CDT [55375537]LOG:  statement: explain 
> > analyze SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
> > 2020-03-15 21:20:51.253 CDT [56275537]LOG:  temporary file: path 
> > "base/pgsql_tmp/pgsql_tmp5627.0", size 6094848
> > 2020-03-15 21:20:51.253 CDT [56275537]STATEMENT:  explain analyze 
> > SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
> > 2020-03-15 21:20:51.254 CDT [56265537]LOG:  temporary file: path 
> > "base/pgsql_tmp/pgsql_tmp5626.0", size 6103040
> > 2020-03-15 21:20:51.254 CDT [56265537]STATEMENT:  explain analyze 
> > SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
> > 2020-03-15 21:20:51.263 CDT [55375537]LOG:  temporary file: path 
> > "base/pgsql_tmp/pgsql_tmp5537.1.sharedfileset/o15of16.p0.0", size 557056
> 
> I think it's overly verbose; all non-parallel backends are going to get
> their own PID twice, and I'm not sure this is going to be great to
> parse.  I think it would be more sensible that if the process does not
> have a parent (leader), %P expands to empty.

That's what's done.

+ Process ID of the parallel group leader if this process was
+ at some point involved in parallel query, otherwise null.  For a
+ parallel group leader itself, this field is set to its own process
+ ID.

2020-07-10 11:53:32.304 CDT [16699 ]LOG:  statement: SELECT 1;
2020-07-10 11:53:32.304 CDT,"pryzbyj","postgres",16699,"[local]",5f089d0b.413b,1,"idle",2020-07-10 11:53:31 CDT,3/4,0,LOG,0,"statement: SELECT 1;","psql","client backend",

-- 
Justin




Re: expose parallel leader in CSV and log_line_prefix

2020-07-10 Thread Alvaro Herrera
On 2020-Mar-18, Justin Pryzby wrote:

> On Sun, Mar 15, 2020 at 12:49:33PM +0100, Julien Rouhaud wrote:

> template1=# SET log_temp_files=0; explain analyze SELECT a,COUNT(1) FROM t a 
> JOIN t b USING(a) GROUP BY 1;
> 2020-03-15 21:20:47.288 CDT [55375537]LOG:  statement: SET 
> log_temp_files=0;
> SET
> 2020-03-15 21:20:47.289 CDT [55375537]LOG:  statement: explain 
> analyze SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
> 2020-03-15 21:20:51.253 CDT [56275537]LOG:  temporary file: path 
> "base/pgsql_tmp/pgsql_tmp5627.0", size 6094848
> 2020-03-15 21:20:51.253 CDT [56275537]STATEMENT:  explain analyze 
> SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
> 2020-03-15 21:20:51.254 CDT [56265537]LOG:  temporary file: path 
> "base/pgsql_tmp/pgsql_tmp5626.0", size 6103040
> 2020-03-15 21:20:51.254 CDT [56265537]STATEMENT:  explain analyze 
> SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
> 2020-03-15 21:20:51.263 CDT [55375537]LOG:  temporary file: path 
> "base/pgsql_tmp/pgsql_tmp5537.1.sharedfileset/o15of16.p0.0", size 557056

I think it's overly verbose; all non-parallel backends are going to get
their own PID twice, and I'm not sure this is going to be great to
parse.  I think it would be more sensible that if the process does not
have a parent (leader), %P expands to empty.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: expose parallel leader in CSV and log_line_prefix

2020-07-10 Thread Julien Rouhaud
On Fri, Jul 10, 2020 at 11:11:15AM -0500, Justin Pryzby wrote:
> On Fri, Jul 10, 2020 at 05:13:26PM +0200, Julien Rouhaud wrote:
> > There's a thinko in the padding handling.  It should be dones whether MyProc
> > and/or lockGroupLeader is NULL or not, and only if padding was asked, like 
> > it's
> > done for case 'd' for instance.
> > 
> > Also, the '%k' escape sounds a bit random.  Is there any reason why we don't
> > use any uppercase character for log_line_prefix?  %P could be a better
> > alternative, otherwise maybe %g, as GroupLeader/Gather?
> 
> Thanks for looking.  %P is a good idea - it's consistent with ps and pkill and
> probably other %commands.  I also amended the docs.

Thanks!

So for the leader == NULL case, the appendStringInfoSpaces call is a no-op if no
padding was asked for, so it's probably not worth adding extra code to make it any
more obvious.

It all looks good to me, I'm marking the patch a ready for committer!




Re: Stale external URL in doc?

2020-07-10 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Jul-10, Daniel Gustafsson wrote:
>> Taking a look at other links to external resources, most links seemed to
>> resolve still (but I didn't test all of them).  I did find another one on the
>> GEQO page which is now dead without the content available elsewhere, as well 
>> as
>> a larger problem with the AIX references.

> Um, the comp.ai.genetic FAQ can still be found, eg. 
> http://www.faqs.org/faqs/ai-faq/genetic/part1/

So it is, although that also shows it hasn't been updated since 2001.

OTOH, that vintage of info is probably just fine for understanding GEQO.

I'll go update that pointer and remove the other links per Daniel's
patch.

regards, tom lane




Re: expose parallel leader in CSV and log_line_prefix

2020-07-10 Thread Justin Pryzby
On Fri, Jul 10, 2020 at 05:13:26PM +0200, Julien Rouhaud wrote:
> There's a thinko in the padding handling.  It should be done whether MyProc
> and/or lockGroupLeader is NULL or not, and only if padding was asked, like it's
> done for case 'd' for instance.
> 
> Also, the '%k' escape sounds a bit random.  Is there any reason why we don't
> use any uppercase character for log_line_prefix?  %P could be a better
> alternative, otherwise maybe %g, as GroupLeader/Gather?

Thanks for looking.  %P is a good idea - it's consistent with ps and pkill and
probably other %commands.  I also amended the docs.

-- 
Justin
>From 68a892d0023e915cc48aea1bb77eae5deabcecec Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 13 Mar 2020 22:03:06 -0500
Subject: [PATCH v2] Include the leader PID in logfile

See also: b025f32e0b, which adds the leader PID to pg_stat_activity
---
 doc/src/sgml/config.sgml  | 11 ++-
 src/backend/utils/error/elog.c| 30 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b353c61683..e60403d6a5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6611,6 +6611,14 @@ local0.*/var/log/postgresql
  Process ID
  no
 
+
+ %P
+ Process ID of the parallel group leader if this process was
+ at some point involved in parallel query, otherwise null.  For a
+ parallel group leader itself, this field is set to its own process
+ ID.
+ no
+
 
  %t
  Time stamp without milliseconds
@@ -6943,7 +6951,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 character count of the error position therein,
 location of the error in the PostgreSQL source code
 (if log_error_verbosity is set to verbose),
-application name, and backend type.
+application name, backend type, and leader PID.
 Here is a sample table definition for storing CSV-format log output:
 
 
@@ -6973,6 +6981,7 @@ CREATE TABLE postgres_log
   location text,
   application_name text,
   backend_type text,
+  leader_pid integer,
   PRIMARY KEY (session_id, session_line_num)
 );
 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e4b717c79a..dea098faf1 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,6 +77,7 @@
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
 #include "storage/proc.h"
+#include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2448,6 +2449,25 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 else
 	appendStringInfo(buf, "%d", MyProcPid);
 break;
+
+			case 'P':
+if (MyProc)
+{
+	PGPROC *leader = MyProc->lockGroupLeader;
+	if (leader == NULL)
+		/* padding only */
+		appendStringInfoSpaces(buf,
+padding > 0 ? padding : -padding);
+	else if (padding != 0)
+		appendStringInfo(buf, "%*d", padding, leader->pid);
+	else
+		appendStringInfo(buf, "%d", leader->pid);
+}
+else if (padding != 0)
+	appendStringInfoSpaces(buf,
+		   padding > 0 ? padding : -padding);
+break;
+
 			case 'l':
 if (padding != 0)
 	appendStringInfo(buf, "%*ld", padding, log_line_number);
@@ -2836,6 +2856,16 @@ write_csvlog(ErrorData *edata)
 	else
 		appendCSVLiteral(&buf, GetBackendTypeDesc(MyBackendType));
 
+	appendStringInfoChar(&buf, ',');
+
+	/* leader PID */
+	if (MyProc)
+	{
+		PGPROC *leader = MyProc->lockGroupLeader;
+		if (leader)
+			appendStringInfo(&buf, "%d", leader->pid);
+	}
+
 	appendStringInfoChar(&buf, '\n');
 
 	/* If in the syslogger process, try to write messages direct to file */
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e430e33c7b..48a50fb394 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -533,6 +533,7 @@
 	#   %h = remote host
 	#   %b = backend type
 	#   %p = process ID
+	#   %P = leader PID
 	#   %t = timestamp without milliseconds
 	#   %m = timestamp with milliseconds
 	#   %n = timestamp with milliseconds (as a Unix epoch)
-- 
2.17.0



Re: GSSENC'ed connection stalls while reconnection attempts.

2020-07-10 Thread Tom Lane
Kyotaro Horiguchi  writes:
> If psql connected using GSSAPI auth and server restarted, reconnection
> sequence stalls and won't return.

Yeah, reproduced here.  (I wonder if there's any reasonable way to
exercise this scenario in src/test/kerberos/.)

> I found that psql(libpq) sends startup packet via gss
> encryption. conn->gssenc should be reset when encryption state is
> freed.

Actually, it looks to me like the GSS support was wedged in by somebody
who was paying no attention to how SSL is managed, or else we forgot
to pay attention to GSS the last time we rearranged SSL support.  It's
completely broken for the multiple-host-addresses scenario as well,
because try_gss is being set and cleared in the wrong places altogether.
conn->gcred is not being handled correctly either I think --- we need
to make sure that it's dropped in pqDropConnection.

The attached patch makes this all act more like the way SSL is handled,
and for me it resolves the reconnection problem.

> The reason that psql doesn't notice the error is pqPacketSend returns
> STATUS_OK when write error occurred.  That behavior contradicts to the
> comment of the function. The function is used only while making
> connection so it's ok to error out immediately by write failure.  I
> think other usage of pqFlush while making a connection should report
> write failure the same way.

I'm disinclined to mess with that, because (a) I don't think it's the
actual source of the problem, and (b) it would affect way more than
just GSS mode.

> Finally, It's user-friendly if psql shows error message for error on
> reset attempts. (This perhaps should be arguable.)

Meh, that's changing fairly longstanding behavior that I don't think
we've had many complaints about.

regards, tom lane

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27c9bb46ee..7bee9dd201 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -477,6 +477,11 @@ pqDropConnection(PGconn *conn, bool flushInput)
 	{
 		OM_uint32	min_s;
 
+		if (conn->gcred != GSS_C_NO_CREDENTIAL)
+		{
+			gss_release_cred(&min_s, &conn->gcred);
+			conn->gcred = GSS_C_NO_CREDENTIAL;
+		}
 		if (conn->gctx)
 			gss_delete_sec_context(&min_s, &conn->gctx, GSS_C_NO_BUFFER);
 		if (conn->gtarg_nam)
@@ -496,6 +501,7 @@ pqDropConnection(PGconn *conn, bool flushInput)
 			free(conn->gss_ResultBuffer);
 			conn->gss_ResultBuffer = NULL;
 		}
+		conn->gssenc = false;
 	}
 #endif
 #ifdef ENABLE_SSPI
@@ -2027,11 +2033,6 @@ connectDBStart(PGconn *conn)
 	 */
 	resetPQExpBuffer(&conn->errorMessage);
 
-#ifdef ENABLE_GSS
-	if (conn->gssencmode[0] == 'd') /* "disable" */
-		conn->try_gss = false;
-#endif
-
 	/*
 	 * Set up to try to connect to the first host.  (Setting whichhost = -1 is
 	 * a bit of a cheat, but PQconnectPoll will advance it to 0 before
@@ -2468,6 +2469,9 @@ keep_going:		/* We will come back to here until there is
 		conn->allow_ssl_try = (conn->sslmode[0] != 'd');	/* "disable" */
 		conn->wait_ssl_try = (conn->sslmode[0] == 'a'); /* "allow" */
 #endif
+#ifdef ENABLE_GSS
+		conn->try_gss = (conn->gssencmode[0] != 'd');	/* "disable" */
+#endif
 
 		reset_connection_state_machine = false;
 		need_new_connection = true;
@@ -3349,12 +3353,8 @@ keep_going:		/* We will come back to here until there is
 	 */
 	if (conn->gssenc && conn->gssencmode[0] == 'p')
 	{
-		OM_uint32	minor;
-
 		/* postmaster expects us to drop the connection */
 		conn->try_gss = false;
-		conn->gssenc = false;
-		gss_delete_sec_context(&minor, &conn->gctx, NULL);
 		pqDropConnection(conn, true);
 		conn->status = CONNECTION_NEEDED;
 		goto keep_going;
@@ -3906,9 +3906,6 @@ makeEmptyPGconn(void)
 	conn->verbosity = PQERRORS_DEFAULT;
 	conn->show_context = PQSHOW_CONTEXT_ERRORS;
 	conn->sock = PGINVALID_SOCKET;
-#ifdef ENABLE_GSS
-	conn->try_gss = true;
-#endif
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -4065,22 +4062,6 @@ freePGconn(PGconn *conn)
 		free(conn->gsslib);
 	if (conn->connip)
 		free(conn->connip);
-#ifdef ENABLE_GSS
-	if (conn->gcred != GSS_C_NO_CREDENTIAL)
-	{
-		OM_uint32	minor;
-
-		gss_release_cred(, >gcred);
-		gss_release_cred(&minor, &conn->gcred);
-	}
-	if (conn->gctx)
-	{
-		OM_uint32	minor;
-
-		gss_delete_sec_context(&minor, &conn->gctx, GSS_C_NO_BUFFER);
-		conn->gctx = NULL;
-	}
-#endif
 	/* Note that conn->Pfdebug is not ours to close or free */
 	if (conn->last_query)
 		free(conn->last_query);


Re: expose parallel leader in CSV and log_line_prefix

2020-07-10 Thread Julien Rouhaud
On Thu, Jul 09, 2020 at 09:20:23PM -0500, Justin Pryzby wrote:
> On Fri, Jul 10, 2020 at 11:09:40AM +0900, Michael Paquier wrote:
> > On Thu, Jul 09, 2020 at 01:53:39PM +0200, Julien Rouhaud wrote:
> > > Sure!  I've been quite busy with internal work duties recently but
> > > I'll review this patch shortly.  Thanks for the reminder!
> > 
> > Hmm.  In which cases would it be useful to have this information in
> > the logs knowing that pg_stat_activity lets us know the link between
> > both the leader and its workers?
> 
> PSA is an instantaneous view whereas the logs are a record.  That's important
> for shortlived processes (like background workers) or in the case of an ERROR
> or later crash.
> 
> Right now, the logs fail to include that information, which is deficient.  
> Half
> the utility is in showing *that* the log is for a parallel worker, which is
> otherwise not apparent.

Yes, I agree that this is a nice thing to have and another small step toward
parallel query monitoring.
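
For reference, the pg_stat_activity linkage mentioned in the quoted question
can be queried like this (leader_pid is the column added by commit b025f32e0b,
which the patch below also references):

SELECT pid, leader_pid, backend_type, query
FROM pg_stat_activity
WHERE leader_pid IS NOT NULL;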

About the patch:

+   case 'k':
+   if (MyProc)
+   {
+   PGPROC *leader = MyProc->lockGroupLeader;
+   if (leader == NULL)
+   /* padding only */
+   appendStringInfoSpaces(buf,
+   padding > 0 ? padding : -padding);
+   else if (padding != 0)
+   appendStringInfo(buf, "%*d", padding, leader->pid);
+   else
+   appendStringInfo(buf, "%d", leader->pid);
+   }
+   break;

There's a thinko in the padding handling.  It should be done whether MyProc
and/or lockGroupLeader is NULL or not, and only if padding was asked, like it's
done for case 'd' for instance.

Also, the '%k' escape sounds a bit random.  Is there any reason why we don't
use any uppercase character for log_line_prefix?  %P could be a better
alternative, otherwise maybe %g, as GroupLeader/Gather?




Re: vs formatting in the docs

2020-07-10 Thread Peter Eisentraut

On 2020-06-21 18:57, Dagfinn Ilmari Mannsåker wrote:

Attached are two patches: the first adds the missing  tags,
the second adds  to all the SQL commands (specifically anything
with 7).


I have committed the first one.

I have some concerns about the second one.  If you look at the diff of 
the source of the man pages before and after, it creates a bit of a 
mess, even though the man pages look okay when rendered.  I'll need to 
think about this a bit more.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




libpq: Request Pipelining/Batching status ?

2020-07-10 Thread Matthieu Garrigues
Hi all,

Do you know what is the status of Request Pipelining and/or Batching in
libpq ?

I could see that I'm not the first one to think about it, I see an item in
the todolist:
https://web.archive.org/web/20200125013930/https://wiki.postgresql.org/wiki/Todo

And a thread here:
https://www.postgresql-archive.org/PATCH-Batch-pipelining-support-for-libpq-td5904551i80.html

And a patch here:
https://2ndquadrant.github.io/postgres/libpq-batch-mode.html

It seems like this boosts performance a lot; drogon, a C++ framework,
outperforms all other web frameworks thanks to this fork:
https://www.techempower.com/benchmarks/#section=data-r19=ph=update
https://github.com/TechEmpower/FrameworkBenchmarks/issues/5502

It would be nice to have it in the official libpq so we don't have to use
an outdated fork to get this feature.
Is anybody working on it? Is there a lot of work left to finalize this patch?

Thanks in advance,
Matthieu


Re: WIP: BRIN multi-range indexes

2020-07-10 Thread Sascha Kuhl
Tomas Vondra  schrieb am Fr., 10. Juli 2020,
14:09:

> On Fri, Jul 10, 2020 at 06:01:58PM +0900, Masahiko Sawada wrote:
> >On Fri, 3 Jul 2020 at 09:58, Tomas Vondra 
> wrote:
> >>
> >> On Sun, Apr 05, 2020 at 08:01:50PM +0300, Alexander Korotkov wrote:
> >> >On Sun, Apr 5, 2020 at 8:00 PM Tomas Vondra
> >> > wrote:
> >> ...
> >> >> >
> >> >> >Assuming we're not going to get 0001-0003 into v13, I'm not so
> >> >> >inclined to rush on these three as well.  But you're willing to
> commit
> >> >> >them, you can count round of review on me.
> >> >> >
> >> >>
> >> >> I have no intention to get 0001-0003 committed. I think those changes
> >> >> are beneficial on their own, but the primary reason was to support
> the
> >> >> new opclasses (which require those changes). And those parts are not
> >> >> going to make it into v13 ...
> >> >
> >> >OK, no problem.
> >> >Let's do this for v14.
> >> >
> >>
> >> Hi Alexander,
> >>
> >> Are you still interested in reviewing those patches? I'll take a look at
> >> 0001-0003 to check that your previous feedback was addressed. Do you
> >> have any comments about 0004 / 0005, which I think are the more
> >> interesting parts of this series?
> >>
> >>
> >> Attached is a rebased version - I realized I forgot to include 0005 in
> >> the last update, for some reason.
> >>
> >
> >I've done a quick test with this patch set. I wonder if we can improve
> >brin_page_items() SQL function in pageinspect as well. Currently,
> >brin_page_items() is hard-coded to support only normal brin indexes.
> >When we pass brin-bloom or brin-multi-range to that function the
> >binary values are shown in 'value' column but it seems not helpful for
> >users. For instance, here is an output of brin_page_items() with a
> >brin-multi-range index:
> >
> >postgres(1:12801)=# select * from brin_page_items(get_raw_page('mul',
> >2), 'mul');
>-[ RECORD 1 ]--------------------------------------------------------------
> >itemoffset  | 1
> >blknum  | 0
> >attnum  | 1
> >allnulls| f
> >hasnulls| f
> >placeholder | f
>value   | {\x01001b002100e570e670e770e870e970ea70eb70ec70ed70ee70ef70f070f170f270f370f470f570f670f770f870f970fa70fb70fc70fd70fe70ff700071}
> >
>
> Hmm. I'm not sure we can do much better, without making the function
> much more complicated. I mean, even with regular BRIN indexes we don't
> really know if the value is plain min/max, right?
>
You can be sure with the next node. A value reported as "in" can be a false
positive; a value reported as "out" is definite. You can detect the change
between in and out.

>
>
> >Also, I got an assertion failure when setting false_positive_rate
> reloption:
> >
> >postgres(1:12448)=# create index blm on t using brin (c int4_bloom_ops
> >(false_positive_rate = 1));
> >TRAP: FailedAssertion("(false_positive_rate > 0) &&
> >(false_positive_rate < 1.0)", File: "brin_bloom.c", Line: 300)
> >
> >I'll look at the code in depth and let you know if I find a problem.
> >
>
> Yeah, the assert should say (f_p_r <= 1.0).
>
> But I'm not convinced we should allow values up to 1.0, really. The
> f_p_r is the fraction of the table that will get matched always, so 1.0
> would mean we get to scan the whole table. Seems kinda pointless. So
> maybe we should cap it to something like 0.1 or so, but I agree the
> value seems kinda arbitrary.
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>


Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings,

* Jeff Davis (pg...@j-davis.com) wrote:
> In principle, Stephen is right: the v12 behavior is a bug, lots of
> people are unhappy about it, it causes real problems, and it would not
> be acceptable if proposed today. Otherwise I wouldn't have spent the
> time to fix it. 
> 
> Similarly, potential regressions are not the "fault" of my feature --
> they are the fault of the limitations of work_mem, the limitations of
> the planner, the wrong expectations from customers, or just
> happenstance. 

Exactly.

> But at a certain point, I have to weigh the potential anger of
> customers hitting regressions versus the potential anger of hackers
> seeing a couple extra GUCs. I have to say that I am more worried about
> the former.

We work, quite intentionally, to avoid having a billion knobs that
people have to understand and to tune.  Yes, we could create a bunch of
new GUCs to change all kinds of behavior, and we could add hints while
we're at it, but there's been quite understandable and good pressure
against doing so because much of the point of this database system is
that it should be figuring out the best plan on its own and within the
constraints that users have configured.

> If there is some more serious consequence of adding a GUC that I missed
> in this thread, please let me know. Otherwise, I intend to commit a new
> GUC shortly that will enable users to bypass work_mem for HashAgg, just
> as in v12.

I don't think this thread has properly considered that every new GUC,
every additional knob that we create, increases the complexity of the
system for users to have to deal with and, in some sense, creates a
failure of ours to be able to just figure out what the right answer
is.  For such a small set of users, who somehow have a problem with a
Sort taking up more memory but are fine with HashAgg doing so, I don't
think the requirement is met that this is a large enough issue to
warrant a new GUC.  Users who are actually hit by this in a negative way
have an option- increase work_mem to reflect what was actually happening
already.  I seriously doubt that we'd get tons of users complaining
about that answer or asking us to have something separate from that, and
we'd avoid adding some new GUC that has to be explained to every new
user to the system and complicate the documentation that explains how
work_mem works.

Thanks,

Stephen




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Thu, Jul 9, 2020 at 5:08 PM Stephen Frost  wrote:
> > I didn't, and don't, think it particularly relevant to the discussion,
> > but if you don't like the comparison to Sort then we could compare it to
> > a HashJoin instead- the point is that, yes, if you are willing to give
> > more memory to a given operation, it's going to go faster, and the way
> > users tell us that they'd like the query to use more memory is already
> > well-defined and understood to be through work_mem.  We do not do them a
> > service by ignoring that.
> 
> The hash_mem design (as it stands) would affect both hash join and
> hash aggregate. I believe that it makes most sense to have hash-based
> nodes under the control of a single GUC. I believe that this
> granularity will cause the least problems. It certainly represents a
> trade-off.

So, now this has moved from being a hack to deal with a possible
regression for a small number of users due to new behavior in one node,
to a change that has impacts on other nodes that hadn't been changed,
all happening during beta.

No, I don't agree with this.  Now is not the time for designing new
features for v13.

> work_mem is less of a problem with hash join, primarily because join
> estimates are usually a lot better than grouping estimates. But it is
> nevertheless something that it makes sense to put in the same
> conceptual bucket as hash aggregate, pending a future root and branch
> redesign of work_mem.

I'm still not thrilled with the 'hash_mem' kind of idea as it's going in
the wrong direction because what's actually needed is a way to properly
consider and track overall memory usage- a redesign of work_mem (or some
new parameter, but it wouldn't be 'hash_mem') as you say, but all of
this discussion should be targeting v14.

> Like I said, the escape hatch GUC is not my preferred solution. But at
> least it acknowledges the problem. I don't think that anyone (or
> anyone else) believes that work_mem doesn't have serious limitations.

work_mem obviously has serious limitations, but that's not novel or new
or unexpected by anyone.

> > We have a parameter which already drives this and which users are
> > welcome to (and quite often do) tune.  I disagree that anything further
> > is either essential or particularly desirable.
> 
> This is a user hostile attitude.

I don't find that argument convincing, at all.

> > I'm really rather astounded at the direction this has been going in.
> 
> Why?

Due to the fact that we're in beta and now is not the time to be
redesigning this feature.  What Jeff implemented was done in a way that
follows the existing structure for how all of the other nodes work and
how HashAgg was *intended* to work (as in- if we thought the HashAgg
would go over work_mem, we wouldn't pick it and would do a GroupAgg
instead).  If there's bugs in his implementation (which I doubt, but it
can happen, of course) then that'd be useful to discuss and look at
fixing, but this discussion isn't appropriate for beta.

Thanks,

Stephen




Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions

2020-07-10 Thread Tom Lane
Zhenghua Lyu  writes:
> The where clause is "pushed down to the x,y" because it only 
> references these two relations.

Yeah.  I agree that it's somewhat unprincipled, but changing it doesn't
seem like a great idea.  There are a lot of users out there who aren't
terribly careful about marking their UDFs as non-volatile, but would be
unhappy if the optimizer suddenly crippled their queries because of
being picky about this.

Also, we specifically document that order of evaluation in WHERE clauses
is not guaranteed, so I feel no need to make promises about how often
volatile functions there will be evaluated.  (Volatiles in SELECT lists
are a different story.)
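
For illustration, a minimal sketch of the behavior in question (the tables
x, y, z and the function f are hypothetical, following the example upthread;
f is deliberately marked VOLATILE):

    CREATE FUNCTION f(int) RETURNS int AS 'SELECT $1' LANGUAGE sql VOLATILE;

    EXPLAIN
    SELECT *
    FROM x
    JOIN y ON y.id = x.id
    JOIN z ON z.id = x.id
    WHERE f(x.a) > y.b;
    -- The qual references only x and y, so it is attached to the x/y join
    -- level and may be evaluated before (and independently of) the join to z.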

This behavior has stood for a couple of decades with few user complaints,
so why are you concerned about changing it?

regards, tom lane




Re: Implement UNLOGGED clause for COPY FROM

2020-07-10 Thread Tom Lane
"osumi.takami...@fujitsu.com"  writes:
>> Aside from that, though, how does this improve upon the existing capability 
>> to copy into an unlogged temporary table?

> [>] First of all, an unlogged temporary table can't be carried over across sessions.

Unlogged tables don't have to be temporary.

> And an unlogged table's contents need to be reloaded after a server crash,
> because they are truncated at startup.

Indeed, and your proposed feature would extend that un-safety to tables
that are NOT marked unlogged, which is not good.

AFAICS, we can already accomplish basically the same thing as what you
want to do like this:

alter table foo set unlogged;
copy foo from ...;
alter table foo set logged;

The mechanics of that are already well worked out.  It's expensive,
no doubt about that, but I think you're just fooling yourself to
imagine that any shortcuts are possible.  A mix of unlogged and logged
data is never going to work safely.
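
For reference, a minimal sketch of the other existing approach mentioned
upthread, loading through an unlogged staging table (table and file names
are hypothetical; the final INSERT is fully WAL-logged, so the target stays
crash-safe):

    CREATE UNLOGGED TABLE foo_stage (LIKE foo INCLUDING DEFAULTS);
    COPY foo_stage FROM '/tmp/foo.csv' (FORMAT csv);  -- data not WAL-logged
    INSERT INTO foo SELECT * FROM foo_stage;          -- fully WAL-logged
    DROP TABLE foo_stage;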

> To achieve this, we have to consider a new idea: for example, loaded data
> would be appended after all the other pages and detached if the server
> crashes during the UNLOGGED loading.

You keep on ignoring the indexes... not to mention replication.

regards, tom lane




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings,

* Justin Pryzby (pry...@telsasoft.com) wrote:
> On Thu, Jul 09, 2020 at 06:58:40PM -0400, Stephen Frost wrote:
> > * Peter Geoghegan (p...@bowt.ie) wrote:
> > > On Thu, Jul 9, 2020 at 7:03 AM Stephen Frost  wrote:
> > > It makes more sense than simply ignoring what our users will see as a
> > > simple regression. (Though I still lean towards fixing the problem by
> > > introducing hash_mem, which at least tries to fix the problem head
> > > on.)
> > 
> > The presumption that this will always end up resulting in a regression
> > really doesn't seem sensible to me.
> 
> Nobody said "always" - we're concerned about a fraction of workloads which
> regress, badly affecting only a small fraction of users.

And those workloads would be addressed by increasing work_mem, no?  Why
are we inventing something new here for something that'll only impact a
small fraction of users in a small fraction of cases and where there's
already a perfectly workable way to address the issue?

> Maybe pretend that Jeff implemented something called CashAgg, which does
> everything HashAgg does but implemented from scratch.  Users would be able to
> tune it or disable it, and we could talk about removing HashAgg for the next 3
> years.  But instead we decide to remove HashAgg right now since it's 
> redundant.
> That's a bad way to transition from an old behavior to a new one.  It's scary
> because it imposes a burden, rather than offering a new option without also
> taking away the old one.

We already have enable_hashagg.  Users are free to disable it.  This
makes it also respect work_mem- allowing users to tune that value to
adjust how much memory HashAgg actually uses.
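
To spell that out, a sketch of the existing knobs being referred to (the
table and column are hypothetical and the values are examples only):

    SET work_mem = '256MB';    -- let HashAgg (and Sort, etc.) use more memory
    SET enable_hashagg = off;  -- or steer the planner away from HashAgg entirely

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT some_col, count(*) FROM some_table GROUP BY some_col;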

> > > That's not the only justification. The other justification is that
> > > it's generally reasonable to prefer giving hash aggregate more memory.
> > 
> > Sure, and it's generally reasonably to prefer giving Sorts more memory
> > too... as long as you've got it available.
> 
> I believe he meant:
> "it's generally reasonable to prefer giving hash aggregate more memory THAN 
> OTHER NODES"

If we were developing a wholistic view of memory usage, with an overall
cap on how much memory is allowed to be used for a query, then that
would be an interesting thing to consider and discuss.  That's not what
any of this is.

Thanks,

Stephen




RE: Implement UNLOGGED clause for COPY FROM

2020-07-10 Thread osumi.takami...@fujitsu.com
Hi David Johnston

Thank you for your comment.
> Aside from that, though, how does this improve upon the existing capability
> to copy into an unlogged temporary table?

[>] First of all, an unlogged temporary table can't be carried over across sessions.
And an unlogged table's contents need to be reloaded after a server crash,
because they are truncated at startup.
If you hold massive data in an unlogged table,
you'd be forced to spend a lot of time recovering it. This isn't good.

So I'm thinking that COPY UNLOGGED would provide a more flexible way for
COPY FROM users to keep their data.

I'm considering a feature that gives them a choice: during ordinary operation
you keep WAL logging for the target table, and when you need high-speed loading
you bypass WAL generation for it.

To achieve this, we have to consider a new idea: for example, loaded data
would be appended after all the other pages and detached if the server
crashes during the UNLOGGED loading.

By the way, "ALTER TABLE tbl SET UNLOGGED" is supported by Postgres.
You may think it's OK to switch a LOGGED table to UNLOGGED with this command.
But it actually copies the whole relation once. (This isn't written in the
manual.)
So the command becomes slow if the table it is applied to contains a lot of
data.
Thus switching a table between UNLOGGED and LOGGED is also costly at the
moment, and I think this copy is an obstacle to switching a table's status.

The discussion of the reason is written in the url below.
https://www.postgresql.org/message-id/flat/CAFcNs%2Bpeg3VPG2%3Dv6Lu3vfCDP8mt7cs6-RMMXxjxWNLREgSRVQ%40mail.gmail.com

Best
Takamichi Osumi



Re: pgbench - add pseudo-random permutation function

2020-07-10 Thread Fabien COELHO



Attached v19 is a rebase, per cfbot.


Attached v20 fixes a doc xml typo, per cfbot again.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 9f3bb5fce6..d4a604c6fa 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1033,7 +1033,7 @@ pgbench  options  d
 
   
 default_seed 
-   seed used in hash functions by default
+   seed used in hash and pseudo-random permutation functions by default
   
 
   
@@ -1850,6 +1850,20 @@ SELECT 4 AS four \; SELECT 5 AS five \aset

   
 
+  
+   
+pr_perm ( i, size [, seed ] )
+integer
+   
+   
+pseudo-random permutation in [0,size)
+   
+   
+pr_perm(0, 4)
+an integer between 0 and 3
+   
+  
+
   

 random ( lb, ub )
@@ -1936,6 +1950,20 @@ SELECT 4 AS four \; SELECT 5 AS five \aset
 shape of the distribution.

 
+   
+
+  When designing a benchmark which selects rows non-uniformly, be aware
+  that using pgbench non-uniform random functions
+  directly leads to undue correlations: rows with close ids come out with
+  similar probability, skewing performance measures because they also reside
+  in close pages.
+
+
+  You must use pr_perm to shuffle the selected ids, or
+  some other additional step with similar effect, to avoid these skewing impacts.
+
+   
+

 
  
@@ -2030,24 +2058,54 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
 hash_fnv1a accept an input value and an optional seed parameter.
 In case the seed isn't provided the value of :default_seed
 is used, which is initialized randomly unless set by the command-line
--D option. Hash functions can be used to scatter the
-distribution of random functions such as random_zipfian or
-random_exponential. For instance, the following pgbench
-script simulates possible real world workload typical for social media and
+-D option.
+  
+
+  
+Function pr_perm implements a pseudo-random permutation
+on an arbitrary size.
+It allows shuffling the output of non-uniform random functions so that
+values drawn more often are not trivially correlated.
+It permutes integers in [0, size) using a seed by applying rounds of
+simple invertible functions, similarly to an encryption function,
+although beware that it is not at all cryptographically secure.
+Compared to hash functions discussed above, the function
+ensures that a perfect permutation is applied: there are neither collisions
+nor holes in the output values.
+Values outside the interval are interpreted modulo the size.
+The function raises an error if size is not positive.
+If no seed is provided, :default_seed is used.
+
+For instance, the following pgbench script
+simulates possible real world workload typical for social media and
 blogging platforms where few accounts generate excessive load:
 
 
-\set r random_zipfian(0, 1, 1.07)
-\set k abs(hash(:r)) % 100
+\set size 100
+\set r random_zipfian(1, :size, 1.07)
+\set k 1 + pr_perm(:r, :size)
 
 
 In some cases several distinct distributions are needed which don't correlate
 with each other and this is when implicit seed parameter comes in handy:
 
 
-\set k1 abs(hash(:r, :default_seed + 123)) % 100
-\set k2 abs(hash(:r, :default_seed + 321)) % 100
+\set k1 1 + pr_perm(:r, :size, :default_seed + 123)
+\set k2 1 + pr_perm(:r, :size, :default_seed + 321)
 
+
+A similar behavior can also be approximated with
+hash:
+
+
+\set size 100
+\set r random_zipfian(1, 100 * :size, 1.07)
+\set k 1 + abs(hash(:r)) % :size
+
+
+As this hash-modulo version generates collisions, some
+ids would not be reachable and others would come up more often
+than the target distribution.
   
 
   
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 85d61caa9f..0ea968c94b 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -19,6 +19,7 @@
 #define PGBENCH_NARGS_VARIABLE	(-1)
 #define PGBENCH_NARGS_CASE		(-2)
 #define PGBENCH_NARGS_HASH		(-3)
+#define PGBENCH_NARGS_PRPERM	(-4)
 
 PgBenchExpr *expr_parse_result;
 
@@ -370,6 +371,9 @@ static const struct
 	{
 		"hash_fnv1a", PGBENCH_NARGS_HASH, PGBENCH_HASH_FNV1A
 	},
+	{
+		"pr_perm", PGBENCH_NARGS_PRPERM, PGBENCH_PRPERM
+	},
 	/* keep as last array element */
 	{
 		NULL, 0, 0
@@ -482,6 +486,19 @@ make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList *args)
 			}
 			break;
 
+		/* pseudo-random permutation function with optional seed argument */
+		case PGBENCH_NARGS_PRPERM:
+			if (len < 2 || len > 3)
+expr_yyerror_more(yyscanner, "unexpected number of arguments",
+  PGBENCH_FUNCTIONS[fnumber].fname);
+
+			if (len == 2)
+			{
+PgBenchExpr *var = make_variable("default_seed");
+args = make_elist(var, 

Re: pg_regress cleans up tablespace twice.

2020-07-10 Thread Tom Lane
Thomas Munro  writes:
> On Thu, Jun 18, 2020 at 1:42 PM Michael Paquier  wrote:
>> Thanks, applied this part to HEAD then after more testing.

> Hmm, somehow this (well I guess it's this commit based on timing and
> the area it touches, not sure exactly why) made cfbot's Windows build
> fail, like this:

Should now be possible to undo whatever hack you had to use ...

regards, tom lane




Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-07-10 Thread Tom Lane
Peter Eisentraut  writes:
> We normally don't compile with -O3, so very few users would get the 
> benefit of this.

Yeah.  I don't think changing that baseline globally would be a wise move.

> We have CFLAGS_VECTOR for the checksum code.  I 
> suppose if we are making the numeric code vectorizable as well, we 
> should apply this there also.

> I think we need a bit of a policy decision from the group here.

I'd vote in favor of applying CFLAGS_VECTOR to specific source files
that can benefit.  We already have experience with that and we've not
detected any destabilization potential.

(I've not looked at this patch, so don't take this as a +1 for this
specific patch.)

regards, tom lane




Re: Default gucs for EXPLAIN

2020-07-10 Thread Peter Eisentraut

On 2020-05-23 11:14, Vik Fearing wrote:

Here is a patch to provide default gucs for EXPLAIN options.

I have two goals with this patch.  The first is that I personally
*always* want BUFFERS turned on, so this would allow me to do it without
typing it every time.


There was a lot of opposition to the approach taken by this patch, but
there was a lot of support for turning BUFFERS on by default.  Would you
like to submit a patch for that?


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-07-10 Thread Peter Eisentraut

On 2020-06-10 14:15, Amit Khandekar wrote:

Well, how do we make sure we keep it that way?  How do we prevent some
random rearranging of the code or some random compiler change to break
this again?

I believe the compiler rearranges the code segments w.r.t. one another
when those are independent of each other. I guess the compiler is able
to identify that. With our case, it's the for loop. There are some
variables used inside it, and that would prevent it from moving the
for loop. Even if the compiler finds it safe to move relative to
surrounding code, it would not split the for loop contents themselves,
so the for loop will remain intact, and so would the vectorization,
although it seems to keep an unrolled version of the loop in case it
is called with smaller iteration counts. But yes, if someone in the
future tries to change the for loop, it would possibly break the
auto-vectorization. So, we should have appropriate comments (patch has
those). Let me know if you find any possible reasons due to which the
compiler would in the future stop the vectorization even when there is
no change in the for loop.


We normally don't compile with -O3, so very few users would get the 
benefit of this.  We have CFLAGS_VECTOR for the checksum code.  I 
suppose if we are making the numeric code vectorizable as well, we 
should apply this there also.


I think we need a bit of a policy decision from the group here.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: WIP: BRIN multi-range indexes

2020-07-10 Thread Tomas Vondra

On Fri, Jul 10, 2020 at 06:01:58PM +0900, Masahiko Sawada wrote:

On Fri, 3 Jul 2020 at 09:58, Tomas Vondra  wrote:


On Sun, Apr 05, 2020 at 08:01:50PM +0300, Alexander Korotkov wrote:
>On Sun, Apr 5, 2020 at 8:00 PM Tomas Vondra
> wrote:
...
>> >
>> >Assuming we're not going to get 0001-0003 into v13, I'm not so
>inclined to rush on these three as well.  But if you're willing to commit
>them, you can count on a round of review from me.
>> >
>>
>> I have no intention to get 0001-0003 committed. I think those changes
>> are beneficial on their own, but the primary reason was to support the
>> new opclasses (which require those changes). And those parts are not
>> going to make it into v13 ...
>
>OK, no problem.
>Let's do this for v14.
>

Hi Alexander,

Are you still interested in reviewing those patches? I'll take a look at
0001-0003 to check that your previous feedback was addressed. Do you
have any comments about 0004 / 0005, which I think are the more
interesting parts of this series?


Attached is a rebased version - I realized I forgot to include 0005 in
the last update, for some reason.



I've done a quick test with this patch set. I wonder if we can improve
brin_page_items() SQL function in pageinspect as well. Currently,
brin_page_items() is hard-coded to support only normal brin indexes.
When we pass brin-bloom or brin-multi-range to that function the
binary values are shown in 'value' column but it seems not helpful for
users. For instance, here is an output of brin_page_items() with a
brin-multi-range index:

postgres(1:12801)=# select * from brin_page_items(get_raw_page('mul',
2), 'mul');
-[ RECORD 1 ]------------------------------------------------------------
itemoffset  | 1
blknum      | 0
attnum      | 1
allnulls    | f
hasnulls    | f
placeholder | f
value       | {\x01001b002100e570e670e770e870e970ea70eb70ec70ed70ee70ef70f070f170f270f370f470f570f670f770f870f970fa70fb70fc70fd70fe70ff700071}



Hmm. I'm not sure we can do much better, without making the function
much more complicated. I mean, even with regular BRIN indexes we don't
really know if the value is plain min/max, right?



Also, I got an assertion failure when setting false_positive_rate reloption:

postgres(1:12448)=# create index blm on t using brin (c int4_bloom_ops
(false_positive_rate = 1));
TRAP: FailedAssertion("(false_positive_rate > 0) &&
(false_positive_rate < 1.0)", File: "brin_bloom.c", Line: 300)

I'll look at the code in depth and let you know if I find a problem.



Yeah, the assert should say (f_p_r <= 1.0).

But I'm not convinced we should allow values up to 1.0, really. The
f_p_r is the fraction of the table that will get matched always, so 1.0
would mean we get to scan the whole table. Seems kinda pointless. So
maybe we should cap it to something like 0.1 or so, but I agree the
value seems kinda arbitrary.
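
For anyone reproducing this, a sketch based on the example above (it assumes
the WIP int4_bloom_ops opclass from this patch set):

    CREATE TABLE t (c int);

    -- accepted: strictly between 0 and 1
    CREATE INDEX blm ON t USING brin (c int4_bloom_ops (false_positive_rate = 0.01));

    -- with the assertion corrected to allow values up to 1.0, this would be
    -- accepted too, even though it means matching (nearly) the whole table:
    -- CREATE INDEX blm2 ON t USING brin (c int4_bloom_ops (false_positive_rate = 1));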


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: TAP tests and symlinks on Windows

2020-07-10 Thread Andrew Dunstan

On 7/8/20 9:54 AM, Andrew Dunstan wrote:
>
>
>
> Then, with a little more sprinkling of perl2host the pg_basebackup tests
> can be made to work on msys2.
>
>
> I'm going to prepare patches along these lines.
>
>



After much frustration and gnashing of teeth here's a patch that allows
almost all the TAP tests involving symlinks to work as expected on all
Windows build environments, without requiring an additional Perl module.
I have tested this on a system that is very similar to that running
drongo and fairywren, with both msys2 and  MSVC builds.


I didn't change the name of perl2host - Sufficient unto the day is the
evil thereof. But I did modify it  a) to allow use of cygpath if
available and b) to allow it to succeed if the grandparent directory
exists when cygpath isn't available.


cheers


andrew



-- 
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index d98187c970..e6e2b2762d 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -775,6 +775,7 @@ make check PROVE_TESTS='t/001_test1.pl t/003_test3.pl'

 The TAP tests require the Perl module IPC::Run.
 This module is available from CPAN or an operating system package.
+On Windows, Win32API::File is also required.

 

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 208df557b8..38b2a35c3b 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -211,154 +211,168 @@ $node->command_fails(
 	'pg_basebackup tar with long name fails');
 unlink "$pgdata/$superlongname";
 
-# The following tests test symlinks. Windows doesn't have symlinks, so
-# skip on Windows.
+# The following tests are for symlinks.
+
+# Move pg_replslot out of $pgdata and create a symlink to it.
+$node->stop;
+
+# Set umask so test directories and files are created with group permissions
+umask(0027);
+
+# Enable group permissions on PGDATA
+chmod_recursive("$pgdata", 0750, 0640);
+
+rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+  or BAIL_OUT "could not move $pgdata/pg_replslot";
+dir_symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+  or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+$node->start;
+
+# Create a temporary directory in the system location and symlink it
+# to our physical temp location.  That way we can use shorter names
+# for the tablespace directories, which hopefully won't run afoul of
+# the 99 character length limit.
+my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+dir_symlink "$tempdir", $shorter_tempdir;
+
+mkdir "$tempdir/tblspc1";
+my $realTsDir   = TestLib::perl2host("$shorter_tempdir/tblspc1");
+my $real_tempdir = TestLib::perl2host($tempdir);
+$node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc1 LOCATION '$realTsDir';");
+$node->safe_psql('postgres',
+ "CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+$node->command_ok([ 'pg_basebackup', '-D', "$real_tempdir/tarbackup2", '-Ft' ],
+  'tar format with tablespaces');
+ok(-f "$tempdir/tarbackup2/base.tar", 'backup tar was created');
+my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
+is(scalar(@tblspc_tars), 1, 'one tablespace tar was created');
+rmtree("$tempdir/tarbackup2");
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;'
+);
+
+my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+		   q{select pg_relation_filepath('tblspc1_unlogged')});
+
+# Make sure main and init forks exist
+ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+	'unlogged init fork in tablespace');
+ok(-f "$pgdata/$tblspc1UnloggedPath", 'unlogged main fork in tablespace');
+
+# Create files that look like temporary relations to ensure they are ignored
+# in a tablespace.
+my @tempRelationFiles = qw(t888_888 t88_88_vm.1);
+my $tblSpc1Id = basename(
+	dirname(
+		dirname(
+			$node->safe_psql(
+'postgres', q{select pg_relation_filepath('test1')};
+
+foreach my $filename (@tempRelationFiles)
+{
+	append_to_file(
+		"$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+		'TEMP_RELATION');
+}
+
+$node->command_fails(
+	[ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp' ],
+	'plain format with tablespaces fails without tablespace mapping');
+
+$node->command_ok(
+	[
+		'pg_basebackup', '-D', "$tempdir/backup1", '-Fp',
+		"-T$realTsDir=$real_tempdir/tbackup/tblspc1"
+	   ],
+	'plain format with tablespaces succeeds with tablespace mapping');
+ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
+
+# This symlink check is not supported on Windows as -l
+# doesn't work with junctions
 SKIP:
 {
-	skip "symlinks not supported on Windows", 18 if ($windows_os);
-
-	# Move pg_replslot out of 

Re: posgres 12 bug (partitioned table)

2020-07-10 Thread Amit Langote
Reading my own words, I think I must fix an ambiguity:

On Fri, Jul 10, 2020 at 3:23 PM Amit Langote  wrote:
> So even if an AM's table_tuple_insert() itself doesn't populate the
> transaction info into the slot handed to it, maybe as an optimization,
> it does not sound entirely unreasonable to expect that the AM's
> slot_getsysattr() callback returns it correctly when projecting a
> target list containing system columns.

The "maybe as an optimization" refers to the part of the sentence that
comes before it.  That is, I mean table_tuple_insert() may choose to
not populate the transaction info in the slot as an optimization.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM

2020-07-10 Thread Peter Eisentraut

On 2020-07-08 13:45, Rémi Lapeyre wrote:

Hi, here's a new version of the patch that should apply cleanly. I'll monitor
the status on http://cfbot.cputube.org/


It's hard to find an explanation of what this patch actually does.  I don't
want to have to go through threads dating back 4 months to determine 
what was discussed and what was actually implemented.  Since you're 
already using git format-patch, just add something to the commit message.


It appears that these are really two separate features, so perhaps they 
should be two patches.


Also, the new header matching mode could probably use more than one line 
of documentation.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-07-10 Thread Dilip Kumar
On Sat, Jul 4, 2020 at 11:35 AM Amit Kapila  wrote:
>
> On Tue, Jun 30, 2020 at 5:20 PM Amit Kapila  wrote:
> >
> > Let me know what you think about the above changes.
> >
>
> I went ahead and made few changes in
> 0005-Implement-streaming-mode-in-ReorderBuffer which are explained
> below.  I have few questions and suggestions for the patch as well
> which are also covered in below points.
>
> 1.
> + if (prev_lsn == InvalidXLogRecPtr)
> + {
> + if (streaming)
> + rb->stream_start(rb, txn, change->lsn);
> + else
> + rb->begin(rb, txn);
> + stream_started = true;
> + }
>
> I don't think we want to move begin callback here that will change the
> existing semantics, so it is better to move begin at its original
> position. I have made the required changes in the attached patch.
>
> 2.
> ReorderBufferTruncateTXN()
> {
> ..
> + dlist_foreach_modify(iter, &txn->changes)
> + {
> + ReorderBufferChange *change;
> +
> + change = dlist_container(ReorderBufferChange, node, iter.cur);
> +
> + /* remove the change from it's containing list */
> + dlist_delete(&change->node);
> +
> + ReorderBufferReturnChange(rb, change);
> + }
> ..
> }
>
> I think here we can add an Assert that we're not mixing changes from
> different transactions.  See the changes in the patch.
>
> 3.
> SetupCheckXidLive()
> {
> ..
> + /*
> + * setup CheckXidAlive if it's not committed yet. We don't check if the xid
> + * aborted. That will happen during catalog access.  Also, reset the
> + * bsysscan flag.
> + */
> + if (!TransactionIdDidCommit(xid))
> + {
> + CheckXidAlive = xid;
> + bsysscan = false;
> ..
> }
>
> What is the need to reset bsysscan flag here if we are already
> resetting on error (like in the previous patch sent by me)?
>
> 4.
> ReorderBufferProcessTXN()
> {
> ..
> ..
> + /* Reset the CheckXidAlive */
> + if (streaming)
> + CheckXidAlive = InvalidTransactionId;
> ..
> }
>
> Similar to the previous point, we don't need this as well because
> AbortCurrentTransaction would have taken care of this.
>
> 5.
> + * XXX Do we need to check if the transaction has some changes to stream
> + * (maybe it got streamed right before the commit, which attempts to
> + * stream it again before the commit)?
> + */
> +static void
> +ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
>
> The above comment doesn't make much sense to me, so I have removed it.
> Basically, if there are no changes before commit, we still need to
> send commit and anyway if there are no more changes
> ReorderBufferProcessTXN will not do anything.
>
> 6.
> +ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
> {
> ..
> if (txn->snapshot_now == NULL)
> + {
> + dlist_iter subxact_i;
> +
> + /* make sure this transaction is streamed for the first time */
> + Assert(!rbtxn_is_streamed(txn));
> +
> + /* at the beginning we should have invalid command ID */
> + Assert(txn->command_id == InvalidCommandId);
> +
> + dlist_foreach(subxact_i, &txn->subtxns)
> + {
> + ReorderBufferTXN *subtxn;
> +
> + subtxn = dlist_container(ReorderBufferTXN, node, subxact_i.cur);
> + ReorderBufferTransferSnapToParent(txn, subtxn);
> + }
> ..
> }
>
> Here, it is possible that there is no base_snapshot for txn, so we
> need a check for that similar to ReorderBufferCommit.
>
> 7.  Apart from the above, I made few changes in comments and ran pgindent.
>
> 8. We can't stream the transaction before we reach the
> SNAPBUILD_CONSISTENT state because some other output plugin can apply
> those changes unlike what we do with pgoutput plugin (which writes to
> file). And, I think applying the transactions without reaching a
> consistent state would be anyway wrong.  So, we should avoid that and
> if do that then we should have an Assert for streamed txns rather than
> sending abort for them in ReorderBufferForget.

I was analyzing this point.  Currently, we only enable streaming in
StartReplicationSlot, so in CreateReplicationSlot streaming will always be
off, because by that time the plugin has not yet started up; that happens
only in StartReplicationSlot.  See the snippet from patch 0007 below.
However, I agree that during StartReplicationSlot we might decode some
extra WAL of a transaction for which we already got the commit
confirmation, and we must have a way to avoid that.  But I think we don't
need to do anything for the CONSISTENT snapshot point.  What's your thought
on this?

@@ -1016,6 +1016,12 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
  WalSndPrepareWrite, WalSndWriteData,
  WalSndUpdateProgress);

+ /*
+ * Make sure streaming is disabled here - we may have the methods,
+ * but we don't have anywhere to send the data yet.
+ */
+ ctx->streaming = false;
+

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Resetting spilled txn statistics in pg_stat_replication

2020-07-10 Thread Amit Kapila
On Fri, Jul 10, 2020 at 7:23 AM Masahiko Sawada
 wrote:
>
> On Thu, 9 Jul 2020 at 12:11, Amit Kapila  wrote:
> >
> > On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada
> >  wrote:
> > >
> > >
> > > I think that using oids has another benefit that we don't need to send
> > > slot name to the stats collector along with the stats. Since the
> > > maximum size of slot name is NAMEDATALEN and we don't support the
> > > pgstat message larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the
> > > user wants to increase NAMEDATALEN they might not be able to build.
> > >
> >
> > I think NAMEDATALEN is used for many other objects as well and I don't
> > think we want to change it in foreseeable future, so that doesn't
> > sound to be a good reason to invent OIDs for slots.  OTOH, I do
> > understand it would be better to send OIDs than names for slots but I
> > am just not sure if it is a good idea to invent a new way to generate
> > OIDs (which is different from how we do it for other objects in the
> > system) for this purpose.
>
> I'm concerned that there might be users who are using custom
> PostgreSQL that increased NAMEDATALEN for some reason. But indeed, I
> also agree with your concerns. So perhaps we can go with the current
> PoC patch approach as the first version (i.g., sending slot drop
> message to stats collector). When we need such a unique identifier
> also for other purposes, we will be able to change this feature so
> that it uses that identifier for this statistics reporting purpose.
>

Okay, feel free to submit the version atop my revert patch.  I think you
might want to remove the indexing stuff you have added for faster
search as discussed above.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Resetting spilled txn statistics in pg_stat_replication

2020-07-10 Thread Amit Kapila
On Fri, Jul 10, 2020 at 7:19 AM Masahiko Sawada
 wrote:
>
> On Thu, 9 Jul 2020 at 16:09, Amit Kapila  wrote:
> >
> >
> > Fair enough.  The attached patch reverts the commits related to these
> > stats.  Sawada-San, can you please once see if I have missed anything
> > apart from catversion bump which I will do before commit?
>
> Thank you for the patch!
>
> Do we remove the corresponding line in the release note by another
> commit?
>

Yes, I will do that as well.

> For the rest, it looks good to me.
>

Thanks, will wait for a day or so and then push it early next week.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [Proposal] Global temporary tables

2020-07-10 Thread wenjing zeng
Hi all,

I started using my personal email to respond to community issues.



> On 7 July 2020, at 6:05 PM, Pavel Stehule wrote:
> 
> Hi
>  
> GTT Merge the latest PGMaster and resolves conflicts.
> 
> 
> 
> I tested it and it looks fine. I think it is very usable in current form, but 
> still there are some issues:
> 
> postgres=# create global temp table foo(a int);
> CREATE TABLE
> postgres=# insert into foo values(10);
> INSERT 0 1
> postgres=# alter table foo add column x int;
> ALTER TABLE
> postgres=# analyze foo;
> WARNING:  reloid 16400 not support update attstat after add colunm
> WARNING:  reloid 16400 not support update attstat after add colunm
> ANALYZE
This is a limitation that we can completely eliminate.

> 
> Please, can you summarize what is done, what limits there are, what is hard
> to implement, and what can be implemented easily?
Sure.

The current version of the GTT implementation supports all regular table 
operations.
1 What is done:
1.1 INSERT/UPDATE/DELETE on GTT.
1.2 GTT supports all types of indexes, and queries can use GTT indexes to
speed up reads of GTT data.
1.3 GTT keeps a local copy of its statistics, which is provided to the
optimizer to choose the best query plan.
1.4 ANALYZE and VACUUM on GTT.
1.5 TRUNCATE and CLUSTER on GTT.
1.6 All DDL on GTT.
1.7 A GTT can use a GTT sequence or a regular sequence.
1.8 Support for creating views on GTT.
1.9 Support for creating views on foreign key.
1.10 Support for global temp partitions.

I feel like this covers all the necessary GTT requirements.

For CLUSTER on GTT, I think it's complicated.
I'm not sure the current implementation is quite reasonable. Maybe you can help
review it.


> 
> 
> 
> I found one open question - how table locks can be implemented - because data
> is physically separated, so we don't need table locks as protection against
> race conditions.
Yes, but DML and DDL on GTT still require table locks.
1 DML requires a table lock (RowExclusiveLock) to ensure that the
definition does not change at run time (DDL may modify or delete it).
This part of the implementation does not actually change the code,
because DML on GTT does not block between sessions.

2 TRUNCATE/ANALYZE/VACUUM/REINDEX/CLUSTER on GTT are now like DML:
they only modify local data and do not modify the GTT definition.
So I lowered the table lock level held on the GTT; only RowExclusiveLock
is needed.

3 DDL that modifies the GTT definition (DROP GTT, ALTER GTT)
requires an exclusive table lock (AccessExclusiveLock),
as is the case for a regular table.
This part of the implementation also does not actually change the code.

Summary: what I have done is adjust the GTT lock levels for the different types
of statements based on the above thinking.
For example, for TRUNCATE on a GTT I reduced the table lock level held to
RowExclusiveLock,
so we can truncate data in the same GTT from different sessions at the same
time.
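
A sketch of what that lowered lock level is meant to allow (syntax as in
Pavel's example upthread; the table name is hypothetical):

    -- Session 1
    CREATE GLOBAL TEMP TABLE gtt_demo (a int);
    BEGIN;
    TRUNCATE gtt_demo;   -- takes only RowExclusiveLock on the GTT

    -- Session 2, concurrently
    TRUNCATE gtt_demo;   -- not blocked: it only removes this session's data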

What do you think about table locks on GTT?


Wenjing


> 
> Now, table locks are implemented on a global level. So an exclusive lock on a
> GTT in one session blocks insertion in a second session. Is that the expected
> behaviour? It is safe, but maybe it is too strict.
> 
> We should define what table lock is meaning on GTT.
> 
> Regards
> 
> Pavel
>  
> Pavel
> 
> 
>> With Regards,
>> Prabhat Kumar Sahu
>> EnterpriseDB: http://www.enterprisedb.com 
> 
> 



Re: WIP: BRIN multi-range indexes

2020-07-10 Thread Masahiko Sawada
On Fri, 3 Jul 2020 at 09:58, Tomas Vondra  wrote:
>
> On Sun, Apr 05, 2020 at 08:01:50PM +0300, Alexander Korotkov wrote:
> >On Sun, Apr 5, 2020 at 8:00 PM Tomas Vondra
> > wrote:
> ...
> >> >
> >> >Assuming we're not going to get 0001-0003 into v13, I'm not so
> >> >inclined to rush on these three as well.  But if you're willing to commit
> >> >them, you can count on a round of review from me.
> >> >
> >>
> >> I have no intention to get 0001-0003 committed. I think those changes
> >> are beneficial on their own, but the primary reason was to support the
> >> new opclasses (which require those changes). And those parts are not
> >> going to make it into v13 ...
> >
> >OK, no problem.
> >Let's do this for v14.
> >
>
> Hi Alexander,
>
> Are you still interested in reviewing those patches? I'll take a look at
> 0001-0003 to check that your previous feedback was addressed. Do you
> have any comments about 0004 / 0005, which I think are the more
> interesting parts of this series?
>
>
> Attached is a rebased version - I realized I forgot to include 0005 in
> the last update, for some reason.
>

I've done a quick test with this patch set. I wonder if we can improve
brin_page_items() SQL function in pageinspect as well. Currently,
brin_page_items() is hard-coded to support only normal brin indexes.
When we pass brin-bloom or brin-multi-range to that function the
binary values are shown in 'value' column but it seems not helpful for
users. For instance, here is an output of brin_page_items() with a
brin-multi-range index:

postgres(1:12801)=# select * from brin_page_items(get_raw_page('mul',
2), 'mul');
-[ RECORD 1 ]------------------------------------------------------------
itemoffset  | 1
blknum      | 0
attnum      | 1
allnulls    | f
hasnulls    | f
placeholder | f
value       | {\x01001b002100e570e670e770e870e970ea70eb70ec70ed70ee70ef70f070f170f270f370f470f570f670f770f870f970fa70fb70fc70fd70fe70ff700071}

Also, I got an assertion failure when setting false_positive_rate reloption:

postgres(1:12448)=# create index blm on t using brin (c int4_bloom_ops
(false_positive_rate = 1));
TRAP: FailedAssertion("(false_positive_rate > 0) &&
(false_positive_rate < 1.0)", File: "brin_bloom.c", Line: 300)

I'll look at the code in depth and let you know if I find a problem.

Regards,

--
Masahiko Sawada   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




GSSENC'ed connection stalls while reconnection attempts.

2020-07-10 Thread Kyotaro Horiguchi
Hello.

If psql is connected using GSSAPI auth and the server restarts, the
reconnection sequence stalls and never returns.

I found that psql (libpq) sends the startup packet via GSS
encryption. conn->gssenc should be reset when the encryption state is
freed.

The reason psql doesn't notice the error is that pqPacketSend returns
STATUS_OK when a write error occurred.  That behavior contradicts the
comment on the function. The function is used only while making a
connection, so it's OK to error out immediately on write failure.  I
think other uses of pqFlush while making a connection should report
write failure the same way.

Finally, it would be user-friendly if psql showed the error message when a
reset attempt fails. (This is perhaps arguable.)

The attached patch does the above. Any thoughts and/or opinions are welcome.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 8cc5663f31744b384cf013f86062ea28b431baa6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 10 Jul 2020 17:30:11 +0900
Subject: [PATCH] Fix reconnection failure of GSSENC connections

A connection using GSS encryption fails to reconnect and freezes. Fix
that by resetting GSS encryption state on dropping a connection. On
the way fixing that, fix the behavior for write failure while
connection attempts.  Also let psql report the cause of reset attempt
failure.
---
 src/bin/psql/common.c |  2 +-
 src/interfaces/libpq/fe-auth.c|  7 ++-
 src/interfaces/libpq/fe-connect.c | 13 ++---
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 6323a35c91..d5ab1d8ada 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -310,7 +310,7 @@ CheckConnection(void)
 		OK = ConnectionUp();
 		if (!OK)
 		{
-			fprintf(stderr, _("Failed.\n"));
+			fprintf(stderr, _("Failed: %s\n"), PQerrorMessage(pset.db));
 
 			/*
 			 * Transition to having no connection.  Keep this bit in sync with
diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index 9f5403d74c..3664ee10a0 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -588,7 +588,12 @@ pg_SASL_init(PGconn *conn, int payloadlen)
 	}
 	if (pqPutMsgEnd(conn))
 		goto error;
-	if (pqFlush(conn))
+
+	/*
+	 * We must not fail write here.  Write failure needs to be checked
+	 * explicitly. See pqSendSome.
+	 */
+	if (pqFlush(conn) || conn->write_failed)
 		goto error;
 
 	termPQExpBuffer(_buf);
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27c9bb46ee..f897003f42 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -496,6 +496,8 @@ pqDropConnection(PGconn *conn, bool flushInput)
 			free(conn->gss_ResultBuffer);
 			conn->gss_ResultBuffer = NULL;
 		}
+
+		conn->gssenc = false;
 	}
 #endif
 #ifdef ENABLE_SSPI
@@ -3455,8 +3457,10 @@ keep_going:		/* We will come back to here until there is
  * Just make sure that any data sent by pg_fe_sendauth is
  * flushed out.  Although this theoretically could block, it
  * really shouldn't since we don't send large auth responses.
+ * Write failure needs to be checked explicitly. See
+ * pqSendSome.
  */
-if (pqFlush(conn))
+if (pqFlush(conn) || conn->write_failed)
 	goto error_return;
 
 if (areq == AUTH_REQ_OK)
@@ -4546,8 +4550,11 @@ pqPacketSend(PGconn *conn, char pack_type,
 	if (pqPutMsgEnd(conn))
 		return STATUS_ERROR;
 
-	/* Flush to ensure backend gets it. */
-	if (pqFlush(conn))
+	/*
+	 * Flush to ensure backend gets it. Write failure needs to be checked
+	 * explicitly. See pqSendSome.
+	 */
+	if (pqFlush(conn) || conn->write_failed)
 		return STATUS_ERROR;
 
 	return STATUS_OK;
-- 
2.18.4



Re: Creating a function for exposing memory usage of backend process

2020-07-10 Thread torikoshia

On 2020-07-09 02:03, Andres Freund wrote:

Hi,

I think this is an incredibly useful feature.


Thanks for your kind comments and suggestion!



On 2020-07-07 22:02:10 +0900, torikoshia wrote:

> There can be multiple memory contexts with the same name. So I'm afraid
> that it's difficult to identify the actual parent memory context from
> this
> "parent" column. This is ok when logging memory contexts by calling
> MemoryContextStats() via gdb. Because child memory contexts are printed
> just under their parent, with indents. But this doesn't work in the
> view.
> To identify the actual parent memory or calculate the memory contexts
> tree
> from the view, we might need to assign unique ID to each memory context
> and display it. But IMO this is overkill. So I'm fine with current
> "parent"
> column. Thought? Do you have any better idea?

Indeed.
I also feel it's not usual to assign a unique ID, which
can vary every time the view is displayed.


Hm. I wonder if we just could include the address of the context
itself. There might be reasons not to do so (e.g. security concerns
about leaked pointers making attacks easier), but I think it's worth
considering.



I tried exposing the address of each context and of its parent.
Attached a poc patch.

  =# SELECT name, address, parent_address, total_bytes FROM 
pg_backend_memory_contexts ;


   name   |  address  | parent_address | total_bytes
  --+---++-
   TopMemoryContext | 0x1280da0 ||   80800
   TopTransactionContext| 0x1309040 | 0x1280da0  |8192
   Prepared Queries | 0x138a480 | 0x1280da0  |   16384
   Type information cache   | 0x134b8c0 | 0x1280da0  |   24624
   ...
   CacheMemoryContext   | 0x12cb390 | 0x1280da0  | 1048576
   CachedPlanSource | 0x13c47f0 | 0x12cb390  |4096
   CachedPlanQuery  | 0x13c9ae0 | 0x13c47f0  |4096
   CachedPlanSource | 0x13c7310 | 0x12cb390  |4096
   CachedPlanQuery  | 0x13c1230 | 0x13c7310  |4096
   ...


Now it's possible to identify the actual parent memory context even when
there are multiple memory contexts with the same name.
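
As a quick illustration, assuming the PoC's column names shown above and
that the top-level context has a NULL parent_address, the context tree can
be reconstructed with a recursive CTE (ordering is left arbitrary here):

    WITH RECURSIVE ctx AS (
        SELECT name, address, parent_address, total_bytes, 0 AS level
        FROM pg_backend_memory_contexts
        WHERE parent_address IS NULL
      UNION ALL
        SELECT c.name, c.address, c.parent_address, c.total_bytes, ctx.level + 1
        FROM pg_backend_memory_contexts c
        JOIN ctx ON c.parent_address = ctx.address
    )
    SELECT repeat('  ', level) || name AS context, total_bytes
    FROM ctx;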

I'm not sure, but I'm also worried that this might incur some
security-related problems.

I'd like to hear more opinions about:

- whether information for identifying the parent-child relation is necessary
or overkill
- if it is necessary, whether the memory address is suitable or whether other
means like assigning unique numbers are required




+/*
+ * PutMemoryContextsStatsTupleStore
+ * One recursion level for pg_get_backend_memory_contexts.
+ */
+static void
+PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
+   TupleDesc 
tupdesc, MemoryContext context,
+   MemoryContext 
parent, int level)
+{
+#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS9
+   Datum   values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+   boolnulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+   MemoryContextCounters stat;
+   MemoryContext child;
+   const char *name = context->name;
+   const char *ident = context->ident;
+
+   if (context == NULL)
+   return;
+
+   /*
+* To be consistent with logging output, we label dynahash contexts
+* with just the hash table name as with MemoryContextStatsPrint().
+*/
+   if (ident && strcmp(name, "dynahash") == 0)
+   {
+   name = ident;
+   ident = NULL;
+   }
+
+   /* Examine the context itself */
+   memset(&stat, 0, sizeof(stat));
+   (*context->methods->stats) (context, NULL, (void *) &level, &stat);
+
+   memset(values, 0, sizeof(values));
+   memset(nulls, 0, sizeof(nulls));
+
+   values[0] = CStringGetTextDatum(name);
+
+   if (ident)
+   {
+   int idlen = strlen(ident);
+   char
clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+   /*
+* Some identifiers such as SQL query string can be very long,
+* truncate oversize identifiers.
+*/
+   if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+			idlen = pg_mbcliplen(ident, idlen, 
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);

+


Why?


As described in [1], overly long messages caused problems in the past, and
MemoryContextStatsPrint() now truncates ident, so I decided to truncate it
here as well.

Do you think it's not necessary here?

[1] https://www.postgresql.org/message-id/12319.1521999...@sss.pgh.pa.us


Regards,


--
Atsushi Torikoshi
NTT DATA CORPORATION

From 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 10 Jul 2020 17:01:01 +0900
Subject: [PATCH] Add a function 

Re: Creating a function for exposing memory usage of backend process

2020-07-10 Thread torikoshia

On 2020-07-08 22:12, Fujii Masao wrote:

Thanks for updating the patch! It basically looks good to me.

+  
+   backend memory contexts
+  

Do we need this indexterm?


Thanks! it's not necessary. I remove this indexterm.




+{ oid => '2282', descr => 'statistics: information about all memory
contexts of local backend',

Isn't it better to remove "statistics: " from the above description?


Yeah, it's my oversight.



+ 
+  role="column_definition">

+   parent text

There can be multiple memory contexts with the same name. So I'm 
afraid
that it's difficult to identify the actual parent memory context from 
this

"parent" column. This is ok when logging memory contexts by calling
MemoryContextStats() via gdb. Because child memory contexts are 
printed
just under their parent, with indents. But this doesn't work in the 
view.
To identify the actual parent memory or calculate the memory contexts 
tree
from the view, we might need to assign unique ID to each memory 
context
and display it. But IMO this is overkill. So I'm fine with current 
"parent"

column. Thought? Do you have any better idea?


Indeed.
I also feel it's not usual to assign a unique ID, which
can vary every time the view is displayed.


Agreed. Displaying such ID would be more confusing to users.
Ok, let's leave the code as it is.

Another comment about the parent column: can a dynahash context be a parent?
If yes, should its ident instead of its name be displayed in the parent
column?


I'm not sure yet, but considering the changes in the future, it seems
better to do so.

But if we add information for identifying the parent-child relation, like the
memory address suggested by Andres, it seems unnecessary.

So I'd like to go back to this point.



Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION





Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Jeff Davis
On Thu, 2020-07-09 at 19:18 -0500, Justin Pryzby wrote:
> Maybe pretend that Jeff implemented something called CashAgg, which
> does
> everything HashAgg does but implemented from scratch.  Users would be
> able to
> tune it or disable it, and we could talk about removing HashAgg for
> the next 3
> years.

That's kind of what we'd have if we had the two escape-hatch GUCs.
Default gives new behavior, changing the GUCs would give the v12
behavior.

In principle, Stephen is right: the v12 behavior is a bug, lots of
people are unhappy about it, it causes real problems, and it would not
be acceptable if proposed today. Otherwise I wouldn't have spent the
time to fix it. 

Similarly, potential regressions are not the "fault" of my feature --
they are the fault of the limitations of work_mem, the limitations of
the planner, the wrong expectations from customers, or just
happenstance. 

But at a certain point, I have to weigh the potential anger of
customers hitting regressions versus the potential anger of hackers
seeing a couple extra GUCs. I have to say that I am more worried about
the former.

If there is some more serious consequence of adding a GUC that I missed
in this thread, please let me know. Otherwise, I intend to commit a new
GUC shortly that will enable users to bypass work_mem for HashAgg, just
as in v12.

Regards,
Jeff Davis






Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

2020-07-10 Thread Kasahara Tatsuhito
Hi,

On Wed, Jul 8, 2020 at 9:40 PM Bharath Rupireddy
 wrote:
> One way, we could solve the above problem is that, upon firing the new
> foreign query from local backend using the cached connection,
> (assuming the remote backend that was cached in the local backed got
> killed for some reasons), instead of failing the query in the local
> backend, upon detecting error from the remote backend, we could just
> delete the cached old entry and try getting another connection to
> remote backend, cache it and proceed to submit the query. This has to
> happen only at the beginning of remote xact.
+1.

I think this is a very useful feature.
In an environment with connection pooling on the local side, if a remote
server has a failover or switchover,
this feature would prevent unexpected errors in local queries after the
remote server recovers.

I haven't looked at the code in detail yet, some comments here.

1. To keep the previous behavior (and performance), how about allowing
the user to specify
   whether or not to retry, as a GUC parameter or as a FOREIGN SERVER option
   (see the hypothetical sketch below)?

2. How about emitting a LOG message when a retry succeeds, to let us know
   that the retry feature worked or how often retries happened?
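
Purely as a hypothetical sketch of (1) - neither the server option nor the
GUC below exists in postgres_fdw today, and the names are made up:

    ALTER SERVER foreign_server OPTIONS (ADD connection_retry 'on');
    -- or, as a GUC:
    SET postgres_fdw.connection_retry = on;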

> I couldn't think of adding a test case to the existing postgres_fdw
> regression test suite with an automated scenario of the remote backend
> getting killed.

Couldn't you confirm this by adding a test case like the following?
===
BEGIN;
-- Generate a connection to remote
SELECT * FROM ft1 LIMIT 1;

-- Retrieve the pid of the remote backend serving the postgres_fdw
-- connection and kill it.  A more unique application_name
-- (e.g. fdw_retry_check rather than postgres_fdw) could be used here.
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE
backend_type = 'client backend' AND application_name = 'postgres_fdw';

-- COMMIT, so the next query should succeed if connection retry works
COMMIT;
SELECT * FROM ft1 LIMIT 1;
===

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Postgres is not able to handle more than 4k tables!?

2020-07-10 Thread Laurenz Albe
On Thu, 2020-07-09 at 12:47 -0400, Stephen Frost wrote:
> I realize this is likely to go over like a lead balloon, but the churn
> in pg_class from updating reltuples/relpages has never seemed all that
> great to me when just about everything else is so rarely changed, and
> only through some user DDL action - and I agree that it seems like those
> particular columns are more 'statistics' type of info and less info
> about the definition of the relation.  Other columns that do get changed
> regularly are relfrozenxid and relminmxid.  I wonder if it's possible to
> move all of those elsewhere- perhaps some to the statistics tables as
> you seem to be alluding to, and the others to $somewhereelse that is
> dedicated to tracking that information which VACUUM is primarily
> concerned with.

Perhaps we could create pg_class with a fillfactor less than 100
so we get HOT updates there.
That would be less invasive.

Yours,
Laurenz Albe





Re: Postgres is not able to handle more than 4k tables!?

2020-07-10 Thread Konstantin Knizhnik



On 09.07.2020 22:16, Grigory Smolkin wrote:


On 7/8/20 11:41 PM, Konstantin Knizhnik wrote:


So it looks like the NUM_LOCK_PARTITIONS and MAXNUMMESSAGES constants have
to be replaced with GUCs.
To avoid division, we can specify the log2 of these values, so a shift
(or mask) can be used instead.
And MAX_SIMUL_LWLOCKS should be defined as NUM_LOCK_PARTITIONS + 
NUM_INDIVIDUAL_LWLOCKS + NAMED_LWLOCK_RESERVE.


Because I was involved in this particular case and don't want it to
become a habit, I'm volunteering to test whatever patch this
discussion produces.



You are welcome:)
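
For illustration, a minimal sketch of the log2-style GUC idea quoted
above; the variable name, its default, and the macro redefinitions are
hypothetical, not a proposed patch:

/* GUC storing log2 of the partition count (2^4 = 16, today's constant). */
int	log2_num_lock_partitions = 4;

#define NUM_LOCK_PARTITIONS	(1 << log2_num_lock_partitions)

/*
 * With the count guaranteed to be a power of two, the partition lookup
 * can use a mask instead of a "%" division on a runtime value.
 */
#define LockHashPartition(hashcode) \
	((hashcode) & (NUM_LOCK_PARTITIONS - 1))

The diff below is the separate experimental patch for the autovacuum
side of the problem.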

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 68effcaed6..27f6328dca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -607,7 +607,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 	pgstat_report_vacuum(RelationGetRelid(onerel),
 		 onerel->rd_rel->relisshared,
 		 new_live_tuples,
-		 vacrelstats->new_dead_tuples);
+		 vacrelstats->new_dead_tuples,
+		 OldestXmin);
 	pgstat_progress_end_command();
 
 	/* and log the action if appropriate */
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 9c7d4b0c60..0eed10fd65 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -92,6 +92,7 @@
 #include "storage/lmgr.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
+#include "storage/procarray.h"
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/smgr.h"
@@ -2992,6 +2993,9 @@ relation_needs_vacanalyze(Oid relid,
 	TransactionId xidForceLimit;
 	MultiXactId multiForceLimit;
 
+	TransactionId oldestXmin;
+	Relation	rel;
+
 	AssertArg(classForm != NULL);
 	AssertArg(OidIsValid(relid));
 
@@ -3076,6 +3080,10 @@ relation_needs_vacanalyze(Oid relid,
 		instuples = tabentry->inserts_since_vacuum;
 		anltuples = tabentry->changes_since_analyze;
 
+		rel = RelationIdGetRelation(relid);
+		oldestXmin = TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM), rel);
+		RelationClose(rel);
+
 		vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
 		vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples;
 		anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
@@ -3095,9 +3103,20 @@ relation_needs_vacanalyze(Oid relid,
  vactuples, vacthresh, anltuples, anlthresh);
 
 		/* Determine if this table needs vacuum or analyze. */
-		*dovacuum = force_vacuum || (vactuples > vacthresh) ||
-			(vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
-		*doanalyze = (anltuples > anlthresh);
+		if (tabentry->autovac_oldest_xmin != oldestXmin)
+		{
+			*dovacuum = force_vacuum || (vactuples > vacthresh) ||
+(vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
+			*doanalyze = (anltuples > anlthresh);
+			elog(DEBUG1, "Consider relation %d tabentry=%p, tabentry->autovac_oldest_xmin=%d, oldestXmin=%d, dovacuum=%d, doanalyze=%d",
+ relid, tabentry, tabentry->autovac_oldest_xmin, oldestXmin, *dovacuum, *doanalyze);
+		}
+		else
+		{
+			elog(DEBUG1, "Skip autovacuum of relation %d", relid);
+			*dovacuum = force_vacuum;
+			*doanalyze = false;
+		}
 	}
 	else
 	{
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c022597bc0..99db7884de 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1461,7 +1461,8 @@ pgstat_report_autovac(Oid dboid)
  */
 void
 pgstat_report_vacuum(Oid tableoid, bool shared,
-	 PgStat_Counter livetuples, PgStat_Counter deadtuples)
+	 PgStat_Counter livetuples, PgStat_Counter deadtuples,
+	 TransactionId oldestXmin)
 {
 	PgStat_MsgVacuum msg;
 
@@ -1475,6 +1476,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	msg.m_vacuumtime = GetCurrentTimestamp();
 	msg.m_live_tuples = livetuples;
 	msg.m_dead_tuples = deadtuples;
+	msg.m_oldest_xmin = oldestXmin;
 	pgstat_send(&msg, sizeof(msg));
 }
 
@@ -4838,6 +4840,7 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
 		result->analyze_count = 0;
 		result->autovac_analyze_timestamp = 0;
 		result->autovac_analyze_count = 0;
+		result->autovac_oldest_xmin = InvalidTransactionId;
 	}
 
 	return result;
@@ -6007,6 +6010,7 @@ pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
 			tabentry->analyze_count = 0;
 			tabentry->autovac_analyze_timestamp = 0;
 			tabentry->autovac_analyze_count = 0;
+			tabentry->autovac_oldest_xmin = InvalidTransactionId;
 		}
 		else
 		{
@@ -6288,6 +6292,8 @@ pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
 	tabentry->n_live_tuples = msg->m_live_tuples;
 	tabentry->n_dead_tuples = msg->m_dead_tuples;
 
+	tabentry->autovac_oldest_xmin = msg->m_oldest_xmin;
+
 	/*
 	 * It is quite possible that a non-aggressive VACUUM ended up skipping
 	 * various pages, however, we'll zero the insert counter here regardless.
diff --git a/src/include/pgstat.h 

Re: Yet another fast GiST build (typo)

2020-07-10 Thread Andrey M. Borodin


> On 10 Jul 2020, at 10:53, Thomas Munro  wrote:
> 
> On Tue, Jul 7, 2020 at 7:03 PM Andrey M. Borodin  wrote:
>> Oops. I've mismerged the docs and did not notice this with check-world. PFA v8
>> with fixed docs.
> 
> It looks like point_zorder_internal() has the check for NaN in the wrong 
> place.
Thanks! Fixed.

Best regards, Andrey Borodin.


v9-0001-Add-sort-support-for-point-gist_point_sortsupport.patch
Description: Binary data


v9-0002-Implement-GiST-build-using-sort-support.patch
Description: Binary data


Re: SQL-standard function body

2020-07-10 Thread Thomas Munro
On Wed, Jul 1, 2020 at 5:49 AM Peter Eisentraut
 wrote:
> - More test coverage is needed.  Surprisingly, there wasn't actually any
> test AFAICT that just creates an SQL function and runs it.  Most of
> that code is tested incidentally, but there is very little or no
> targeted testing of this functionality.

FYI cfbot showed a sign of some kind of error_context_stack corruption
while running "DROP TABLE functest3 CASCADE;".




Re: Log the location field before any backtrace

2020-07-10 Thread Peter Eisentraut

On 2020-07-10 04:04, Michael Paquier wrote:

On Thu, Jul 09, 2020 at 12:31:38PM -0400, Alvaro Herrera wrote:

On 2020-Jul-09, Daniel Gustafsson wrote:

On 9 Jul 2020, at 11:17, Peter Eisentraut  
wrote:


In PG13, we added the ability to add backtraces to the log
output. After some practical experience with it, I think the
order in which the BACKTRACE and the LOCATION fields are printed
is wrong.  I propose we put the LOCATION field before the
BACKTRACE field, not after.  This makes more sense because the
location is effectively at the lowest level of the backtrace.


Makes sense, +1


Likewise


+1.


committed

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: posgres 12 bug (partitioned table)

2020-07-10 Thread Amit Langote
Hi Soumyadeep,

On Fri, Jul 10, 2020 at 2:56 AM Soumyadeep Chakraborty
 wrote:
>
> Hey Amit,
>
> On Thu, Jul 9, 2020 at 12:16 AM Amit Langote  wrote:
>
> > By the way, what happens today if you do INSERT INTO a_zedstore_table
> > ... RETURNING xmin?  Do you get an error "xmin is unrecognized" or
> > some such in slot_getsysattr() when trying to project the RETURNING
> > list?
> >
> We get garbage values for xmin and cmin. If we request cmax/xmax, we get
> an ERROR from slot_getsystattr()->tts_zedstore_getsysattr():
> "zedstore tuple table slot does not have system attributes (except xmin
> and cmin)"
>
> A ZedstoreTupleTableSlot only stores xmin and cmin. Also,
> zedstoream_insert(), which is the tuple_insert() implementation, does
> not supply the xmin/cmin, thus making those values garbage.
>
> For context, Zedstore has its own UNDO log implementation to act as
> storage for transaction information. (which is intended to be replaced
> with the upstream UNDO log in the future).
>
> The above behavior is not just restricted to INSERT..RETURNING right now.
> If we select the system columns from foo in Zedstore, the behavior is
> the same.  The transaction information is never returned from Zedstore
> in tableam calls that don't demand transactional information be
> used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
> it will use the transactional information correctly. It will also
> populate TM_FailureData, which contains xmax and cmax, in the APIs where
> it is demanded.
>
> I really wonder what other AMs are doing about these issues.
>
> I think we should either:
>
> 1. Demand transactional information from AMs for all APIs that involve
> a projection of transactional information.
>
> 2. Have some other component of Postgres supply the transactional
> information. This is what I think the upstream UNDO log can probably
> provide.

So even if an AM's table_tuple_insert() itself doesn't populate the
transaction info into the slot handed to it, perhaps as an optimization,
it does not sound entirely unreasonable to expect the AM's
slot_getsysattr() callback to return it correctly when projecting a
target list containing system columns.  We shouldn't really need any
new core code to fetch the transaction-related system columns while
there exists a perfectly reasonable channel for them to arrive through
-- TupleTableSlots.  I suppose there's a reason why we allow AMs to
provide their own slot callbacks.
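
As a rough sketch of such a callback (the slot struct and function names
below are hypothetical, not Zedstore's actual code; only the
TupleTableSlotOps.getsysattr signature and the system attribute numbers
come from core headers):

#include "postgres.h"
#include "access/sysattr.h"
#include "executor/tuptable.h"

/* Hypothetical slot type for an AM that tracks xmin/cmin in the slot. */
typedef struct MyAMSlot
{
	TupleTableSlot base;
	TransactionId xmin;
	CommandId	cmin;
} MyAMSlot;

/* Sketch of a getsysattr callback returning AM-tracked transaction info. */
static Datum
myam_slot_getsysattr(TupleTableSlot *slot, int attnum, bool *isnull)
{
	MyAMSlot   *mslot = (MyAMSlot *) slot;

	*isnull = false;
	switch (attnum)
	{
		case MinTransactionIdAttributeNumber:
			return TransactionIdGetDatum(mslot->xmin);
		case MinCommandIdAttributeNumber:
			return CommandIdGetDatum(mslot->cmin);
		default:
			elog(ERROR, "system attribute %d is not available from this slot type",
				 attnum);
			return (Datum) 0;	/* keep compiler quiet */
	}
}

With something along these lines, projecting xmin/cmin in a RETURNING
list would go through the AM's own callback, with no new core code.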

Whether an AM uses UNDO log or something else to manage the
transaction info is up to the AM, so I don't see why the AMs
themselves shouldn't be in charge of returning that info, because only
they know where it is.

> 3. (Least elegant) Transform tuple table slots into heap tuple table
> slots (since it is the only kind of tuple storage that can supply
> transactional info) and explicitly fill in the transactional values
> depending on the context, whenever transactional information is
> projected.
>
> For this bug report, I am not sure what is right. Perhaps, to stop the
> bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
> that the AM needs to provide the transactional info in the respective
> insert AM API calls,

As long as the AM's slot_getsysattr() callback returns the correct
value, this works.

> as well as demand a heap slot for partition roots
> and interior nodes.

It would be a compromise on the core's part to use "heap" slots for
partitioned tables, because they don't have a valid table AM.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-07-10 Thread Dilip Kumar
On Fri, Jul 10, 2020 at 11:01 AM Ajin Cherian  wrote:
>
>
>
> On Fri, Jul 10, 2020 at 3:11 PM Dilip Kumar  wrote:
>>
>> With your changes, if it cannot pick the largest top txn for streaming
>> (which sometimes happens due to incomplete toast changes), it will hang
>> forever in the while loop; in that case, it should go for spilling.
>>
>> while (rb->size >= logical_decoding_work_mem * 1024L)
>> {
>> /*
>> * Pick the largest transaction (or subtransaction) and evict it from
>> * memory by streaming, if supported. Otherwise, spill to disk.
>> */
>> if (ReorderBufferCanStream(rb) &&
>> (txn = ReorderBufferLargestTopTXN(rb)) != NULL)
>>
>>
>
> What is this condition (of not picking the largest top txn)? Wouldn't
> ReorderBufferLargestTopTXN then return NULL? If not, is there a way to know
> that a transaction cannot be streamed, so there can be an exit condition for
> the while loop?


Okay, I see: if ReorderBufferLargestTopTXN returns NULL, you are
breaking the loop.  I had not seen that other part of the patch, but I
agree that it will not go into an infinite loop.
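
For reference, the overall shape being discussed would be roughly as
follows (a sketch only; ReorderBufferCanStream(),
ReorderBufferLargestTopTXN() and ReorderBufferStreamTXN() are names from
the in-progress patch, and the spill branch is simplified):

while (rb->size >= logical_decoding_work_mem * 1024L)
{
	ReorderBufferTXN *txn;

	if (ReorderBufferCanStream(rb) &&
		(txn = ReorderBufferLargestTopTXN(rb)) != NULL)
	{
		/* Evict the largest top-level transaction by streaming it out. */
		ReorderBufferStreamTXN(rb, txn);
	}
	else if ((txn = ReorderBufferLargestTXN(rb)) != NULL)
	{
		/* Streaming not possible; spill the largest transaction to disk. */
		ReorderBufferSerializeTXN(rb, txn);
	}
	else
	{
		/* Nothing evictable is left -- stop rather than loop forever. */
		break;
	}
}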


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com