Re: calling procedures is slow and consumes extra much memory against calling function

2020-07-10 Thread Pavel Stehule
On Thu, 9 Jul 2020 at 8:28, Amit Khandekar wrote:

> On Wed, 17 Jun 2020 at 13:54, Pavel Stehule wrote:
> >
> >
> >
> > On Wed, 17 Jun 2020 at 7:52, Amit Khandekar wrote:
> >>
> >> On Wed, 10 Jun 2020 at 17:12, Pavel Stehule wrote:
> >> > On Wed, 10 Jun 2020 at 12:26, Amit Khandekar <amitdkhan...@gmail.com> wrote:
> >> >> Could you show an example testcase that tests this recursive scenario,
> >> >> with which your earlier patch fails the test, and this v2 patch passes
> >> >> it? I am trying to understand the recursive scenario and the re-use
> >> >> of expr->plan.
> >> >
> >> >
> >> > it hangs on the plpgsql tests. So you can apply the first version of the patch
> >> >
> >> > and run "make check"
> >>
> >> I could not reproduce the make check hang with the v1 patch. But I
> >> could see a crash with the below testcase. So I understand the purpose
> >> of the plan_owner variable that you introduced in v2.
> >>
> >> Consider this recursive test :
> >>
> >> create or replace procedure p1(in r int) as $$
> >> begin
> >>RAISE INFO 'r : % ', r;
> >>if r < 3 then
> >>   call p1(r+1);
> >>end if;
> >> end
> >> $$ language plpgsql;
> >>
> >> do $$
> >> declare r int default 1;
> >> begin
> >> call p1(r);
> >> end;
> >> $$;
> >>
> >> In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
> >> consider this code of exec_stmt_call() with your v2 patch applied:
> >> if (expr->plan && !expr->plan->saved)
> >> {
> >>if (plan_owner)
> >>   SPI_freeplan(expr->plan);
> >>expr->plan = NULL;
> >> }
> >>
> >> Here, plan_owner is false. So SPI_freeplan() will not be called, and
> >> expr->plan is set to NULL. Now I have observed that the stmt pointer
> >> and expr pointer are shared between the p1() execution at this r=2
> >> level and the p1() execution at the r=1 level. So after the above code
> >> is executed at r=2, when the upper level (r=1) exec_stmt_call() lands on
> >> the same code snippet above, it gets the same expr pointer, but its
> >> expr->plan is already set to NULL without having been freed. From this
> >> logic, it looks like the plan won't get freed whenever the expr/stmt
> >> pointers are shared across recursive levels, since expr->plan is set
> >> to NULL at the lowermost level? Basically, the handle to the plan is
> >> lost, so no one at the upper recursion level can explicitly free it
> >> using SPI_freeplan(), right? This looks the same as the main issue
> >> where the plan does not get freed for non-recursive calls. I haven't
> >> had a chance to check whether we can develop a testcase for this, similar
> >> to your testcase where the memory keeps on increasing.
> >
> >
> > This is a good consideration.
> >
> > I am sending updated patch
>
> Checked the latest patch. Using a local plan rather than the
> expr->plan pointer for doing the checks does seem to resolve the issue
> I raised. That made me think of another scenario:
>
> Now we are checking the plan value and then nullifying the expr->plan
> value. What if expr->plan is different from plan? Is that possible? I
> was thinking of such scenarios, but couldn't find one. As long as a
> plan is always created with saved=true for all levels, or with
> saved=false for all levels, we are ok. If we can have a mix of saved
> and unsaved plans at different recursion levels, then expr->plan can
> be different from the outer local plan, because then the expr->plan
> will not be set to NULL in the inner level, while the outer level may
> have created its own plan. But I think a mix of saved and unsaved
> plans is not possible. If you agree, then I think we should at least
> have an assert that looks like:
>
> if (plan && !plan->saved)
> {
>     if (plan_owner)
>         SPI_freeplan(plan);
>
>     /* If expr->plan is present, it must be the same plan that we allocated */
>     Assert(!expr->plan || plan == expr->plan);
>
>     expr->plan = NULL;
> }
>
> Other than this, I have no other issues. I understand that we have to
> do this special handling only for this exec_stmt_call() because it is
> only here that we call exec_prepare_plan() with keep_plan = false, so
> doing special handling for freeing the plan seems to make sense.
>

Attached is a patch with the assert.

All regression tests passed. I think this short patch can be applied to older
releases as a bugfix.

This weekend I'll try a different strategy: save the plan and
release it at the end of the transaction.
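
For anyone who wants to reproduce the memory growth, here is a minimal sketch
(an illustration only, not the original reproducer from this thread): execute
the recursive procedure repeatedly in one session and watch the backend's
resident memory from outside (e.g. with top); if the plan built for the inner
CALL is never freed, memory keeps climbing.

do $$
begin
  for i in 1 .. 100000 loop
    call p1(1);   -- p1 as defined earlier in this thread
  end loop;
end;
$$;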

Regards

Pavel
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d4a3d58daa..22c0376795 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2150,15 +2150,16 @@ static int
 exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 {
 	PLpgSQL_expr *expr = stmt->expr;
+	volatile SPIPlanPtr	plan = expr->plan;
 	volatile LocalTransactionId before_lxid;
 	LocalTransactionId after_lxid;
 	volatile bool pushed_active_snap = false;
 	

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 14:02, Peter Geoghegan  wrote:
>
> On Fri, Jul 10, 2020 at 6:19 PM David Rowley  wrote:
> > If we have to have a new GUC, my preference would be hashagg_mem,
> > where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
> > would mean use that value.  We'd need some sort of check hook to
> > disallow 0-63. I really am just failing to comprehend why we're
> > contemplating changing the behaviour of Hash Join here.
>
> I don't understand why partitionwise hash join consumes work_mem in
> the way it does. I assume that the reason is something like "because
> that behavior was the easiest to explain", or perhaps "people that use
> partitioning ought to be able to tune their database well". Or even
> "this design avoids an epic pgsql-hackers thread, because of course
> every hash table should get its own work_mem".

hmm yeah. It's unfortunate, but I'm not sure how I'd have implemented
it differently.  The problem is made worse by the fact that we'll only
release the memory for the hash table during ExecEndHashJoin(). If the
planner had some ability to provide the executor with knowledge that
the node would never be rescanned, then the executor could release the
memory for the hash table after the join is complete. For now, we'll
need to live with the fact that an Append containing many children
doing hash joins will mean holding onto all that memory until the
executor is shut down :-(

There's room to make improvements there, for sure, but not for PG13.

> > Of course, I
> > understand that that node type also uses a hash table, but why does
> > that give it the right to be involved in a change that we're making to
> > try and give users the ability to avoid possible regressions with Hash
> > Agg?
>
> It doesn't, exactly. The idea of hash_mem came from similar settings
> in another database system that you'll have heard of, that affect all
> nodes that use a hash table. I read about this long ago, and thought
> that it might make sense to do something similar as a way of improving
> work_mem

It sounds interesting, but it also sounds like a new feature
post-beta. Perhaps it's better we minimise the scope of the change to
be a minimal fix just for the behaviour we predict some users might
not like.

David




Re: proposal: schema variables

2020-07-10 Thread Pavel Stehule
On Mon, 6 Jul 2020 at 10:17, Pavel Stehule wrote:

>
>
> On Sun, 5 Jul 2020 at 15:33, Pavel Stehule wrote:
>
>>
>>
> >> On Thu, 21 May 2020 at 14:49, Pavel Stehule wrote:
>>
>>>
>>>
> >>> On Thu, 21 May 2020 at 13:34, Amit Kapila wrote:
>>>
 On Thu, May 21, 2020 at 3:41 PM Pavel Stehule 
 wrote:
 >
 > Hi
 >
 > just rebase without any other changes
 >

 You seem to forget attaching the rebased patch.

>>>
>>> I am sorry
>>>
>>> attached.
>>>
>>
>> fresh rebase
>>
>
> fix test build
>

rebase

Pavel


> Regards
>
> Pavel
>
>
>> Regards
>>
>> Pavel
>>
>>
>>> Pavel
>>>
>>>
 --
 With Regards,
 Amit Kapila.
 EnterpriseDB: http://www.enterprisedb.com

>>>


schema-variables-20200711.patch.gz
Description: application/gzip


Re: Towards easier AMs: Cleaning up inappropriate use of name "relkind"

2020-07-10 Thread Michael Paquier
On Wed, Jul 08, 2020 at 10:00:47PM +0900, Michael Paquier wrote:
> There are two patches on this thread v2-0001 being much smaller than
> v2-0002.  I have looked at 0001 for now, and, like Alvaro, this
> renaming makes sense to me.  Those commands work on objects that are
> relkinds, except for one OBJECT_TYPE.  So, let's get 0001 patch
> merged.  Any objections from others?

I have been through this one again and applied it as cc35d89.
--
Michael


signature.asc
Description: PGP signature


Re: Stale external URL in doc?

2020-07-10 Thread Thomas Munro
On Sat, Jul 11, 2020 at 9:56 AM Daniel Gustafsson  wrote:
> > On 10 Jul 2020, at 23:47, Tom Lane  wrote:
> > Thomas Munro  writes:
> >> The others required minor manual sleuthing to correct; I hope I found
> >> the correct ISN ranges page.  Please see attached.
> >
> > I didn't actually check any of these, but they look like sane changes.
>
> +1, looks good, thanks!

Is it OK that I see the following warning many times when running
"make" under src/backend/utils/mb/Unicode?  It looks like this code is
from commit 1de9cc0d.  Horiguchi-san, do you think something changed
(input data format, etc) since you wrote it, or maybe some later
changes just made our perl scripts more picky about warnings?

  Use of uninitialized value $val in printf at convutils.pm line 612.




Re: Does TupleQueueReaderNext() really need to copy its result?

2020-07-10 Thread Thomas Munro
On Sat, Jul 11, 2020 at 1:37 PM Soumyadeep Chakraborty
 wrote:
> +1 to the idea! I ran some experiments on both of your patches.

Hi Soumyadeep,

Thanks for testing!

> I could reproduce the speed gain that you saw for a plan with a simple
> parallel sequential scan. However, I got no gain at all for a parallel
> hash join and parallel agg query.

Right, it's not going to make a difference when you only send one
tuple through the queue, like COUNT(*) does.

> As for gather merge, is it possible to have a situation where the slot
> input to tqueueReceiveSlot() is a heap slot (as would be the case for a
> simple select *)? If yes, in those scenarios, we would be incurring an
> extra call to minimal_tuple_from_heap_tuple() because of the extra call
> to ExecFetchSlotMinimalTuple() inside tqueueReceiveSlot() in your patch.
> And since, in a gather merge, we can't avoid the copy on the leader side
> (heap_copy_minimal_tuple() inside gm_readnext_tuple()), we would be
> doing extra work in that scenario. I couldn't come up with a plan that
> creates a scenario like this however.

Hmm.  I wish we had a way to do an "in-place" copy-to-minimal-tuple
where the caller supplies the memory, with some fast protocol to get
the size right.  We could use that for copying tuples into shm queues,
hash join tables etc without an extra palloc()/pfree() and double
copy.




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 6:19 PM David Rowley  wrote:
> If we get hash_mem
> or some variant that is a multiplier of work_mem, then that user is in
> exactly the same situation for that plan, i.e. there's no ability to
> increase the memory allowances for Hash Agg alone.

That's true, of course.

> If we have to have a new GUC, my preference would be hashagg_mem,
> where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
> would mean use that value.  We'd need some sort of check hook to
> disallow 0-63. I really am just failing to comprehend why we're
> contemplating changing the behaviour of Hash Join here.

I don't understand why partitionwise hash join consumes work_mem in
the way it does. I assume that the reason is something like "because
that behavior was the easiest to explain", or perhaps "people that use
partitioning ought to be able to tune their database well". Or even
"this design avoids an epic pgsql-hackers thread, because of course
every hash table should get its own work_mem".

> Of course, I
> understand that that node type also uses a hash table, but why does
> that give it the right to be involved in a change that we're making to
> try and give users the ability to avoid possible regressions with Hash
> Agg?

It doesn't, exactly. The idea of hash_mem came from similar settings
in another database system that you'll have heard of, that affect all
nodes that use a hash table. I read about this long ago, and thought
that it might make sense to do something similar as a way of improving
work_mem (without replacing it with something completely different to
enable things like the "hash teams" design, which should be the long
term goal). It's unusual that it took this hashaggs-that-spill issue
to make the work_mem situation come to a head, and it's unusual that
the proposal on the table doesn't just target hash agg. But it's not
*that* unusual.

I believe that it makes sense on balance to lump together hash
aggregate and hash join, with the expectation that the user might want
to tune them for the system as a whole. This is not an escape hatch --
it's something that adds granularity to how work_mem can be tuned in a
way that makes sense (but doesn't make perfect sense). It doesn't
reflect reality, but I think that it comes closer to reflecting
reality than other variations that I can think of, including your
hashagg_mem compromise proposal (which is still much better than plain
work_mem). In short, hash_mem is relatively conceptually clean, and
doesn't unduly burden the user.

I understand that you only want to add an escape hatch, which is what
hashagg_mem still amounts to. There are negative consequences to the
setting affecting hash join, which I am not unconcerned about. On the
other hand, hashagg_mem is an escape hatch, and that's ugly in a way
that hash_mem isn't. I'm also concerned about that.

In the end, I think that the "hash_mem vs. hashagg_mem" question is
fundamentally a matter of opinion.

-- 
Peter Geoghegan




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David G. Johnston
On Fri, Jul 10, 2020 at 6:43 PM David Rowley  wrote:

> On Sat, 11 Jul 2020 at 13:36, David G. Johnston
>  wrote:
> > If we add a setting that defaults to work_mem then the benefit is
> severely reduced.  You still have to modify individual queries, but the
> change can simply be more targeted than changing work_mem alone.
>
> I think the idea is that this is an escape hatch to allow users to get
> something closer to what PG12 did, but only if they really need it.  I
> can't quite understand why we need to leave the escape hatch open and
> push them halfway through it.  I find escape hatches are best left
> closed until you really have no choice but to use them.
>
>
The escape hatch dynamic is "the user senses a problem, goes into their
query, and modifies some GUCs to make the problem go away".  As a user
I'd much rather have the odds of my needing to use that escape hatch
reduced - especially if that reduction can be done without risk and without
any action on my part.

It's like having someone in a box right now, and then turning up the heat.
We can give them an opening to get out of the box if they need it but we
can also give them A/C.  For some the A/C may be unnecessary but also not
harmful, a smaller group will stay within the margin, and for the others
it's not enough and they'll use the opening (which they would have done
anyway without the A/C).

David J.


Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 13:36, David G. Johnston
 wrote:
> If we add a setting that defaults to work_mem then the benefit is severely 
> reduced.  You still have to modify individual queries, but the change can 
> simply be more targeted than changing work_mem alone.

I think the idea is that this is an escape hatch to allow users to get
something closer to what PG12 did, but only if they really need it.  I
can't quite understand why we need to leave the escape hatch open and
push them halfway through it.  I find escape hatches are best left
closed until you really have no choice but to use them.

David




Re: Does TupleQueueReaderNext() really need to copy its result?

2020-07-10 Thread Soumyadeep Chakraborty
Hi Thomas,

+1 to the idea! I ran some experiments on both of your patches.

I could reproduce the speed gain that you saw for a plan with a simple
parallel sequential scan. However, I got no gain at all for a parallel
hash join and parallel agg query.

---

select pg_prewarm('lineitem');
-- lineitem is 17G. (TPCH scale = 20). shared_buffers = 30G
explain analyze select * from lineitem;

[w/o any patch]   99s
[w/ first patch]  89s
[w/ last minimal tuple patch] 79s

---

select pg_prewarm('lineitem');
-- lineitem is 17G. (TPCH scale = 20). shared_buffers = 30G
explain analyze select count(*) from lineitem;

[w/o any patch]   10s
[w/ first patch]  10s
[w/ last minimal tuple patch] 10s

---

select pg_prewarm('lineitem');
select pg_prewarm('orders');
-- lineitem is 17G, orders is 4G. (TPCH scale = 20). shared_buffers = 30G

explain analyze select count(*)
  from lineitem
  join orders on l_orderkey = o_orderkey
 where o_totalprice > 5.00;

[w/o any patch]   54s
[w/ first patch]  53s
[w/ last minimal tuple patch] 56s

---

Maybe I'm missing something, since there should be improvements with
anything that has a gather?

As for gather merge, is it possible to have a situation where the slot
input to tqueueReceiveSlot() is a heap slot (as would be the case for a
simple select *)? If yes, in those scenarios, we would be incurring an
extra call to minimal_tuple_from_heap_tuple() because of the extra call
to ExecFetchSlotMinimalTuple() inside tqueueReceiveSlot() in your patch.
And since, in a gather merge, we can't avoid the copy on the leader side
(heap_copy_minimal_tuple() inside gm_readnext_tuple()), we would be
doing extra work in that scenario. I couldn't come up with a plan that
creates a scenario like this however.
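
One way that might produce such a plan (an untested sketch, reusing the
lineitem table from above and assuming an index on l_orderkey exists): zero
out the parallel cost thresholds and ask for ordered output, so the planner
can pick Gather Merge directly on top of a Parallel Index Scan, whose output
slots are buffer/heap slots rather than minimal tuples.

set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_index_scan_size = 0;
explain select * from lineitem order by l_orderkey;
-- if this comes out as Gather Merge -> Parallel Index Scan, the workers
-- feed heap tuples into tqueueReceiveSlot()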

Regards,
Soumyadeep (VMware)




Re: pg_regress cleans up tablespace twice.

2020-07-10 Thread Michael Paquier
On Fri, Jul 10, 2020 at 09:35:56AM -0400, Tom Lane wrote:
> Should now be possible to undo whatever hack you had to use ...

Yes, I have also opened an issue on github:
https://github.com/macdice/cfbot/issues/11/
--
Michael


signature.asc
Description: PGP signature


Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David G. Johnston
On Fri, Jul 10, 2020 at 6:19 PM David Rowley  wrote:

> If we have to have a new GUC, my preference would be hashagg_mem,
> where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
> would mean use that value.  We'd need some sort of check hook to
> disallow 0-63. I really am just failing to comprehend why we're
> contemplating changing the behaviour of Hash Join here.


If we add a setting that defaults to work_mem then the benefit is severely
reduced.  You still have to modify individual queries, but the change can
simply be more targeted than changing work_mem alone.  I'd really like
whatever we do to provide that ability, as well as a default value that is
greater than the current work_mem value - which in v12 was being ignored,
and thus production usage saw memory consumption greater than work_mem.
Only a multiplier does this.  A multiplier-only solution fixes the problem
at hand.  A multiplier-or-memory solution adds complexity but provides
flexibility.  If adding that flexibility is straightforward I don't see
any serious downside other than the complexity of having the meaning of a
single GUC's value dependent upon its magnitude.

Of course, I
> understand that that node type also uses a hash table, but why does
> that give it the right to be involved in a change that we're making to
> try and give users the ability to avoid possible regressions with Hash
> Agg?
>

If Hash Join isn't affected by the "was allowed to use unlimited amounts of
execution memory but now isn't" change then it probably should continue to
consult work_mem instead of being changed to use the calculated value
(work_mem x multiplier).

David J.


Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 12:47, David G. Johnston
 wrote:
> The multiplier seems strictly better than "rely on work_mem alone, i.e., do 
> nothing"; the detracting factor being one more GUC.  Even if one wants to 
> argue the solution is ugly or imperfect the current state seems worse and a 
> more perfect option doesn't seem worth waiting for.  The multiplier won't 
> make every single upgrade a non-event but it provides a more than sufficient 
> amount of control and in the worst case can be effectively ignored by setting 
> it to 1.0.

My argument wasn't about whether the new GUC should be a multiplier of
work_mem or an absolute amount of memory. The point I was trying to
make was that the solution of adding a GUC to let users increase the
memory for Hash Join and Hash Agg, for plans which don't contain any other
node types that use work_mem, is the same as doing nothing. As of
today, those people could just increase work_mem. If we get hash_mem
or some variant that is a multiplier of work_mem, then that user is in
exactly the same situation for that plan, i.e. there's no ability to
increase the memory allowances for Hash Agg alone.

If we have to have a new GUC, my preference would be hashagg_mem,
where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
would mean use that value.  We'd need some sort of check hook to
disallow 0-63. I really am just failing to comprehend why we're
contemplating changing the behaviour of Hash Join here. Of course, I
understand that that node type also uses a hash table, but why does
that give it the right to be involved in a change that we're making to
try and give users the ability to avoid possible regressions with Hash
Agg?

David




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David G. Johnston
On Fri, Jul 10, 2020 at 5:16 PM David Rowley  wrote:

> Stephen mentions in [1] that:
> > Users who are actually hit by this in a negative way
> > have an option- increase work_mem to reflect what was actually happening
> > already.
>
> Peter is not a fan of that idea, which can only be due to the fact
> that it will also increase the maximum memory consumption allowed by
> other nodes in the plan.


That isn't the only reason for me - the main advantage of hash_mem is that
we get to set a default to some multiple greater than 1.0 so that an
upgrade to v13 has a region where behavior similar to v12 is effectively
maintained.  I have no feel for whether that should be 2.0, 4.0, or
something else, but 2.0 seemed small and I chose to use a power of 2.

My concern is that if we do hash_mem and
> have that control the memory allowances for Hash Joins and Hash Aggs,
> then that solution is just as good as Stephen's idea when the plan
> only contains Hash Joins and Hash Aggs.
>
> As much as I do want to see us get something to allow users some
> reasonable way to get the same performance as they're used to, I'm
> concerned that giving users something that works for many of the use
> cases is not really going to be as good as giving them something that
> works in all their use cases.   A user who has a partitioned table
> with a good number of partitions and partition-wise joins enabled
> might not like it if their Hash Join plan suddenly consumes hash_mem *
> nPartitions when they've set hash_mem to 10x of work_mem due to some
> other plan that requires that to maintain PG12's performance in PG13.
>

I don't know enough about the hash join dynamic to comment there, but if an
admin goes in and changes the system default to 10x in lieu of a targeted
fix for a query that actually needs work_mem increased to 10 times
its current value to work properly, I'd say that would be a poor decision.
Absent hash_mem they wouldn't update work_mem on their system to 10x its
current value in order to upgrade to v13, they'd set work_mem for that
query specifically.  The same should happen here.
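
For example, the targeted per-query adjustment that already works today looks
like this (the value, table, and column names are just illustrative):

begin;
set local work_mem = '256MB';   -- only affects this transaction
select col, count(*) from some_table group by col;   -- the affected query
commit;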

Frankly, if admins are on top of their game and measuring and monitoring
query performance and memory consumption, they would be able to operate in
our "do nothing" mode by setting the default for hash_mem to 1.0 and just
doling out memory via work_mem as they have always done.  Though setting
hash_mem to 10x for that single query would reduce their risk of OOM (none
of the work_mem-consulting nodes would be increased), so having the GUC
would be a net win should they avail themselves of it.

The multiplier seems strictly better than "rely on work_mem alone, i.e., do
nothing"; the detracting factor being one more GUC.  Even if one wants to
argue the solution is ugly or imperfect the current state seems worse and a
more perfect option doesn't seem worth waiting for.  The multiplier won't
make every single upgrade a non-event but it provides a more than
sufficient amount of control and in the worst case can be effectively
ignored by setting it to 1.0.

Is there some reason to think that having this multiplier with a
conservative default of 2.0 would cause an actual problem - and would that
scenario have likely caused an OOM anyway in v12?  Given that "work_mem can
be used many many times in a single query" I'm having trouble imagining
such a problem.

David J.


Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions

2020-07-10 Thread Zhenghua Lyu
Hi,
Thanks for your reply.

I ran into the problem in a distributed database based on Postgres (Greenplum).
In a distributed database there may be distributed tables:
every single node contains only a subpart of the data, and combining them
all yields the full data set.

I think it may also be a problem for Postgres's parallel computing:
1. What does the Postgres planner do for a parallel scan of a table joined to a
generate_series() function scan?
2. What does the Postgres planner do for a parallel scan of a table joined to a
generate_series() function scan with a volatile filter?

Thus, for the SQL in the above case: a generate_series() function scan can
be taken as producing the same data everywhere,
and a generate_series joined to another generate_series also has this property: the data
is complete on every single node. This property
is very helpful in a distributed join: a distributed table joined to a
generate_series function scan can simply be joined on every local node, with the
result then gathered back to a single node.

But things are different when there are volatile functions: volatile
functions may appear in the WHERE clause, the target list, or elsewhere.

That is why I came up with the above case and am asking here.

To be honest, I do not care about the pushdown so much. It is not normal usage
to write volatile functions in a WHERE clause.
I just find that it loses the property.
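
For illustration, a minimal example of the pushdown being discussed (relation
names are hypothetical stand-ins): the qual below references only x and y, so
the planner attaches it to their join level, and it may be evaluated before the
join to z, and a different number of times than once per final row, even though
random() is volatile.

select *
  from x
  join y on x.a = y.a
  join z on y.b = z.b
 where x.c + y.c > random();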

Best,
Zhenghua Lyu







From: Tom Lane 
Sent: Friday, July 10, 2020 10:10 PM
To: Zhenghua Lyu 
Cc: pgsql-hackers@lists.postgresql.org 
Subject: Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile 
functions

Zhenghua Lyu  writes:
> The where clause is "pushed down to the x,y" because it only 
> references these two relations.

Yeah.  I agree that it's somewhat unprincipled, but changing it doesn't
seem like a great idea.  There are a lot of users out there who aren't
terribly careful about marking their UDFs as non-volatile, but would be
unhappy if the optimizer suddenly crippled their queries because of
being picky about this.

Also, we specifically document that order of evaluation in WHERE clauses
is not guaranteed, so I feel no need to make promises about how often
volatile functions there will be evaluated.  (Volatiles in SELECT lists
are a different story.)

This behavior has stood for a couple of decades with few user complaints,
so why are you concerned about changing it?

regards, tom lane


Re: jsonpath versus NaN

2020-07-10 Thread Alexander Korotkov
On Thu, Jul 9, 2020 at 4:04 AM Alexander Korotkov  wrote:
> I understand both patches as fixes and propose to backpatch them to 12
> if no objections.

Both patches are pushed.

--
Regards,
Alexander Korotkov




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 10:00, Tom Lane  wrote:
>
> Stephen Frost  writes:
> > I don't see hash_mem as being any kind of proper fix- it's just punting
> > to the user saying "we can't figure this out, how about you do it" and,
> > worse, it's in conflict with how we already ask the user that question.
> > Turning it into a multiplier doesn't change that either.
>
> Have you got a better proposal that is reasonably implementable for v13?
> (I do not accept the argument that "do nothing" is a better proposal.)
>
> I agree that hash_mem is a stopgap, whether it's a multiplier or no,
> but at this point it seems difficult to avoid inventing a stopgap.
> Getting rid of the process-global work_mem setting is a research project,
> and one I wouldn't even count on having results from for v14.  In the
> meantime, it seems dead certain that there are applications for which
> the current behavior will be problematic.  hash_mem seems like a cleaner
> and more useful stopgap than the "escape hatch" approach, at least to me.

If we're going to end up going down the route of something like
hash_mem for PG13, wouldn't it be better to have something more like
hashagg_mem that adjusts the memory limits for Hash Agg only?

Stephen mentions in [1] that:
> Users who are actually hit by this in a negative way
> have an option- increase work_mem to reflect what was actually happening
> already.

Peter is not a fan of that idea, which can only be due to the fact
that it will also increase the maximum memory consumption allowed by
other nodes in the plan. My concern is that if we do hash_mem and
have that control the memory allowances for Hash Joins and Hash Aggs,
then that solution is just as good as Stephen's idea when the plan
only contains Hash Joins and Hash Aggs.

As much as I do want to see us get something to allow users some
reasonable way to get the same performance as they're used to, I'm
concerned that giving users something that works for many of the use
cases is not really going to be as good as giving them something that
works in all their use cases.   A user who has a partitioned table
with a good number of partitions and partition-wise joins enabled
might not like it if their Hash Join plan suddenly consumes hash_mem *
nPartitions when they've set hash_mem to 10x of work_mem due to some
other plan that requires that to maintain PG12's performance in PG13.
 If that user is unable to adjust hash_mem due to that then they're
not going to be very satisfied that we've added hash_mem to allow
their query to perform as well as it did in PG12. They'll be at the
same OOM risk that they were exposed to in PG12 if they were to
increase hash_mem here.
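
To make the partition-wise case concrete, here is a rough sketch (table names
invented, and realistic data volumes would be needed before the children
actually become hash joins): with partition-wise join enabled, each child join
builds its own hash table, so whatever per-hash allowance applies is
effectively multiplied by the number of matched partition pairs.

create table t1 (id int, payload text) partition by hash (id);
create table t2 (id int, payload text) partition by hash (id);
create table t1_p0 partition of t1 for values with (modulus 2, remainder 0);
create table t1_p1 partition of t1 for values with (modulus 2, remainder 1);
create table t2_p0 partition of t2 for values with (modulus 2, remainder 0);
create table t2_p1 partition of t2 for values with (modulus 2, remainder 1);
set enable_partitionwise_join = on;
explain select count(*) from t1 join t2 using (id);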

David

[1] 
https://www.postgresql.org/message-id/20200710143415.gj12...@tamriel.snowman.net




Re: min_safe_lsn column in pg_replication_slots view

2020-07-10 Thread Alvaro Herrera
On 2020-Jul-09, Alvaro Herrera wrote:

> I think we should define InvalidXLogSegNo to be ~((uint64)0) and add a
> macro to test for that.

That's overkill really.  I just used zero.  Running
contrib/test_decoding under valgrind, this now passes.

I think I'd rather do away with the comparison to zero, and initialize to
something else in GetWALAvailability, though.  What we're doing seems
unclean and unclear.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 07546b71190b27433e673ff7d248e0f98215abb1 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 10 Jul 2020 11:27:55 -0400
Subject: [PATCH] fix uninitialized value

---
 src/backend/access/transam/xlog.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 28daf72a50..3d828cc56f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9530,6 +9530,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
 	 * considering wal_keep_segments and max_slot_wal_keep_size
 	 */
 	XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+	oldestSlotSeg = 0;
 	KeepLogSeg(currpos, &oldestSlotSeg);
 
 	/*
@@ -9624,7 +9625,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 	}
 
 	/* don't delete WAL segments newer than the calculated segment */
-	if (XLogRecPtrIsInvalid(*logSegNo) || segno < *logSegNo)
+	if (*logSegNo == 0 || segno < *logSegNo)
 		*logSegNo = segno;
 }
 
-- 
2.20.1



Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 2:50 PM Stephen Frost  wrote:
> Nothing of what you've said thus far has shown me that there were
> material bits of the discussion that I've missed.

Maybe that's just because you missed those bits too?

> No, that other people
> feel differently or have made comments supporting one thing or another
> isn't what I would consider material- I'm as allowed my opinions as much
> as others, even when I disagree with the majority (or so claimed anyhow-
> I've not gone back to count, but I don't claim it to be otherwise
> either).

You are of course entitled to your opinion.

The problem we're trying to address here is paradoxical, in a certain
sense. The HashAggs-that-spill patch is somehow not at fault on the
one hand, but on the other hand has created this urgent need to
ameliorate what is for all intents and purposes a regression.
Everything is intertwined. Yes -- this *is* weird! And, I admit that
the hash_mem proposal is unorthodox, even ugly -- in fact, I've said
words to that effect on perhaps a dozen occasions at this point. This
is also weird.

I pointed out that my hash_mem proposal was popular because it seemed
like it might save time. When I see somebody I know proposing
something strange, my first thought is "why are they proposing that?".
I might only realize some time later that there are special
circumstances that make the proposal much more reasonable than it
seemed at first (maybe even completely reasonable). There is no
inherent reason why other people supporting the proposal makes it more
valid, but in general it does suggest that special circumstances might
apply. It guides me in the direction of looking for and understanding
what they might be sooner.

-- 
Peter Geoghegan




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2020-07-10 Thread Alvaro Herrera
On 2019-Sep-09, Amit Kapila wrote:

> Thanks for picking this up.  However, I noticed that previously
> Horiguchi-San has given some comments on this patch [1] which doesn't
> seem to be addressed or at least not all of them are addressed.  It is
> possible that you would have already addressed those, but in that
> case, it would be good if you respond to his email as well.  If those
> are not addressed, then it will be good to address those.

Totally unasked for, here's a rebase of this patch series.  I didn't do
anything other than rebasing to current master, solving a couple of very
trivial conflicts, fixing some whitespace complaints from git apply, and
running tests to verify everything works.

I don't foresee working on this at all, so if anyone is interested in
seeing this feature in, I encourage them to read and address
Horiguchi-san's feedback.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From c2de4e2a2cbc301b48a7946a9ee0b9acceae7233 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 10 Jul 2020 11:29:11 -0400
Subject: [PATCH v18 1/2] libpq batch support

---
 doc/src/sgml/libpq.sgml   | 502 +++
 doc/src/sgml/lobj.sgml|   4 +
 .../libpqwalreceiver/libpqwalreceiver.c   |   3 +
 src/interfaces/libpq/exports.txt  |   5 +
 src/interfaces/libpq/fe-connect.c |  27 +
 src/interfaces/libpq/fe-exec.c| 597 --
 src/interfaces/libpq/fe-protocol2.c   |   6 +
 src/interfaces/libpq/fe-protocol3.c   |  15 +-
 src/interfaces/libpq/libpq-fe.h   |  24 +-
 src/interfaces/libpq/libpq-int.h  |  47 +-
 10 files changed, 1186 insertions(+), 44 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d1ccaa775a..ad168727c7 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -4826,6 +4826,500 @@ int PQflush(PGconn *conn);
 
  
 
+ 
+  Batch mode and query pipelining
+
+  
+   libpq
+   batch mode
+  
+
+  
+   libpq
+   pipelining
+  
+
+  
+   libpq supports queueing up queries into
+   a pipeline to be executed as a batch on the server. Batching queries allows
+   applications to avoid a client/server round-trip after each query to get
+   the results before issuing the next query.
+  
+
+  
+   When to use batching
+
+   
+Much like asynchronous query mode, there is no performance disadvantage to
+using batching and pipelining. It increases client application complexity
+and extra caution is required to prevent client/server deadlocks but
+can sometimes offer considerable performance improvements.
+   
+
+   
+Batching is most useful when the server is distant, i.e. network latency
+(ping time) is high, and when many small operations are being performed in
+rapid sequence. There is usually less benefit in using batches when each
+query takes many multiples of the client/server round-trip time to execute.
+A 100-statement operation run on a server 300ms round-trip-time away would take
+30 seconds in network latency alone without batching; with batching it may spend
+as little as 0.3s waiting for results from the server.
+   
+
+   
+Use batches when your application does lots of small
+INSERT, UPDATE and
+DELETE operations that can't easily be transformed into
+operations on sets or into a
+COPY operation.
+   
+
+   
+Batching is not useful when information from one operation is required by the
+client before it knows enough to send the next operation. The client must
+introduce a synchronisation point and wait for a full client/server
+round-trip to get the results it needs. However, it's often possible to
+adjust the client design to exchange the required information server-side.
+Read-modify-write cycles are especially good candidates; for example:
+
+ BEGIN;
+ SELECT x FROM mytable WHERE id = 42 FOR UPDATE;
+ -- result: x=2
+ -- client adds 1 to x:
+ UPDATE mytable SET x = 3 WHERE id = 42;
+ COMMIT;
+
+could be much more efficiently done with:
+
+ UPDATE mytable SET x = x + 1 WHERE id = 42;
+
+   
+
+   
+
+ The batch API was introduced in PostgreSQL 10.0, but clients using the PostgreSQL 10.0 version of libpq can
+ use batches on server versions 7.4 and newer. Batching works on any server
+ that supports the v3 extended query protocol.
+
+   
+
+  
+
+  
+   Using batch mode
+
+   
+To issue batches the application must switch
+a connection into batch mode. Enter batch mode with PQenterBatchMode(conn) or test
+whether batch mode is active with PQbatchStatus(conn). In batch mode only asynchronous operations are permitted, and
+COPY is not recommended as it most likely will trigger failure in batch processing. 
+Using any synchronous command execution 

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Tom Lane
Stephen Frost  writes:
> I don't see hash_mem as being any kind of proper fix- it's just punting
> to the user saying "we can't figure this out, how about you do it" and,
> worse, it's in conflict with how we already ask the user that question.
> Turning it into a multiplier doesn't change that either.

Have you got a better proposal that is reasonably implementable for v13?
(I do not accept the argument that "do nothing" is a better proposal.)

I agree that hash_mem is a stopgap, whether it's a multiplier or no,
but at this point it seems difficult to avoid inventing a stopgap.
Getting rid of the process-global work_mem setting is a research project,
and one I wouldn't even count on having results from for v14.  In the
meantime, it seems dead certain that there are applications for which
the current behavior will be problematic.  hash_mem seems like a cleaner
and more useful stopgap than the "escape hatch" approach, at least to me.

regards, tom lane




Re: Stale external URL in doc?

2020-07-10 Thread Daniel Gustafsson
> On 10 Jul 2020, at 23:47, Tom Lane  wrote:
> Thomas Munro  writes:

>> The others required minor manual sleuthing to correct; I hope I found
>> the correct ISN ranges page.  Please see attached.
> 
> I didn't actually check any of these, but they look like sane changes.

+1, looks good, thanks!

cheers ./daniel




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Peter Geoghegan  writes:
> > It now seems likely that the hash_mem/hash_mem_multiplier proposal has
> > the support it needs to get into Postgres 13. Assuming that the
> > proposal doesn't lose momentum, then it's about time to return to the
> > original question you posed at the start of the thread:
> 
> > What should we do with the hashagg_avoid_disk_plan GUC (formerly known
> > as the enable_hashagg_disk GUC), if anything?
> 
> > I myself think that there is a case to be made for removing it
> > entirely.
> 
> +0.5 or so for removing it.  It seems too confusing and dubiously
> useful.

I agree that it shouldn't exist.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Fri, Jul 10, 2020 at 7:17 AM Stephen Frost  wrote:
> > > The hash_mem design (as it stands) would affect both hash join and
> > > hash aggregate. I believe that it makes most sense to have hash-based
> > > nodes under the control of a single GUC. I believe that this
> > > granularity will cause the least problems. It certainly represents a
> > > trade-off.
> >
> > So, now this has moved from being a hack to deal with a possible
> > regression for a small number of users due to new behavior in one node,
> > to a change that has impacts on other nodes that hadn't been changed,
> > all happening during beta.
> 
> The common goal is ameliorating or avoiding predictable negative
> consequences for our users. One proposal is an ambitious and
> comprehensive way of dealing with that, that certainly has unique
> risks. The other is much less ambitious, and clearly a kludge -- but
> it's also much less risky. The discussion hasn't really moved at all.

Neither seems to be going in a direction which looks appropriate for a
beta-time change, particularly given that none of this is *actually* new
territory.  Having to increase work_mem to get a HashAgg or HashJoin
that hasn't got a bunch of batches is routine and while it'd be nicer if
PG had a way to, overall, manage memory usage to stay within some
particular value, we don't.  When we start going in that direction it'll
be interesting to discuss how much we should favor trying to do an
in-memory HashAgg by reducing the amount of memory allocated to a Sort
node.

> > I'm still not thrilled with the 'hash_mem' kind of idea as it's going in
> > the wrong direction because what's actually needed is a way to properly
> > consider and track overall memory usage- a redesign of work_mem (or some
> > new parameter, but it wouldn't be 'hash_mem') as you say, but all of
> > this discussion should be targeting v14.
> 
> It's certainly possible that hash_mem is too radical, and yet not
> radical enough -- in any timeframe (i.e. a total redesign of work_mem
> is the only thing that will be acceptable). I don't understand why you
> refuse to engage with the idea at all, though. The mere fact that
> hash_mem could in theory fix this problem comprehensively *usefully
> frames the problem*. This is the kind of issue where developing a
> shared understanding is very important.

I don't see hash_mem as being any kind of proper fix- it's just punting
to the user saying "we can't figure this out, how about you do it" and,
worse, it's in conflict with how we already ask the user that question.
Turning it into a multiplier doesn't change that either.

> Andres said to me privately that hash_mem could be a good idea, even
> though he opposes it as a fix to the open item for Postgres 13. I
> understand that proposing such a thing during beta is controversial,
> whatever the specifics are. It is a proposal made in the spirit of
> trying to move things forward. Hand wringing about ignoring the
> community's process is completely counterproductive.

I disagree that caring about the fact that we're in beta is
counterproductive.  Saying that we should ignore that we're in beta
isn't appropriate and I could say that's counterproductive- though that
hardly seems to be helpful, so I wonder why that comment was chosen.

> There are about 3 general approaches to addressing this problem, and
> hash_mem is one of them. Am I not allowed to point that out? I have
> been completely open and honest about the risks.

I don't think I said at any point that you weren't allowed to suggest
something.  I do think, and continue to feel, that not enough
consideration is being given to the fact that we're well past the point
where this kind of development should be happening- and commenting on
how leveling that concern at your proposed solution is mere 'hand
wringing' certainly doesn't reduce my feeling that we're being far too
cavalier with this.

> > > Like I said, the escape hatch GUC is not my preferred solution. But at
> > > least it acknowledges the problem. I don't think that anyone (or
> > > anyone else) believes that work_mem doesn't have serious limitations.
> >
> > work_mem obviously has serious limitations, but that's not novel or new
> > or unexpected by anyone.
> 
> In your other email from this morning, you wrote:
> 
> "And those workloads would be addressed by increasing work_mem, no?
> Why are we inventing something new here for something that'll only
> impact a small fraction of users in a small fraction of cases and
> where there's already a perfectly workable way to address the issue?"
> 
> Which one is it?

Uh, it's clearly both.  Those two statements are not contradictory at
all- I agree that work_mem isn't good, and it has limitations, but this
isn't one of those- people can increase work_mem and get the same
HashAgg they got before and have it use all that memory just as it did
before if they want to.

> > > > I'm really rather 

Re: Stale external URL in doc?

2020-07-10 Thread Tom Lane
Thomas Munro  writes:
> The Microsoft one is OK, it's a redirect, but the redirect target
> looks like a more permanent URL to me so maybe we should change it.

+1

> The others required minor manual sleuthing to correct; I hope I found
> the correct ISN ranges page.  Please see attached.

I didn't actually check any of these, but they look like sane changes.

regards, tom lane




Re: "tuple concurrently updated" in pg_restore --jobs

2020-07-10 Thread Justin Pryzby
On Fri, Jul 10, 2020 at 05:36:28PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Fri, Jul 10, 2020 at 04:54:40PM -0400, Tom Lane wrote:
> >> This works about 99% of the time, in fact.  It falls down in the --clean
> 
> > Note that this fails for me (sometimes) even without --clean.
> 
> Oh, I was thinking that REVOKE would only be issued in the --clean
> case, but apparently that's not so.  Doesn't really affect the fix
> proposal though.  I just finished a patch for HEAD, as attached.
> 
> (I flushed the "CatalogId objCatId" argument of dumpACL, which was
> not used.)
> 
> I'm not sure how far to back-patch it -- I think the parallel restore
> of ACLs behavior is not very old, but we might want to teach older
> pg_dump versions to insert the extra dependency anyway, for safety.

Yes, and the test case in David's patch on the other thread [0] can't be
backpatched any further than this patch.  A variant on his test case could just
as well be included in this patch (with pg_dump writing to a seekable FD) and
then amended later to also test writing to an unseekable FD.

[0] https://commitfest.postgresql.org/28/2568/




Re: Stale external URL in doc?

2020-07-10 Thread Thomas Munro
On Fri, Jul 10, 2020 at 10:07 AM Daniel Gustafsson  wrote:
> (but I didn't test all of them)

Cave-person shell script time:

for url in ` git grep 'url="http' | sed 's/.*url="//;s/".*//' | sort | uniq `
do
  if ! curl --output /dev/null --silent --head --fail "$url"
  then
echo "bad URL: $url"
  fi
done

bad URL: https://mingw-w64.org/
bad URL: https://msdn.microsoft.com/en-us/library/aa380493%28VS.85%29.aspx
bad URL: https://ssl.icu-project.org/icu-bin/locexp
bad URL: https://www.ismn-international.org/ranges.html

The Microsoft one is OK, it's a redirect, but the redirect target
looks like a more permanent URL to me so maybe we should change it.
The others required minor manual sleuthing to correct; I hope I found
the correct ISN ranges page.  Please see attached.

Looking at the ICU URL, I found a couple like that in our source tree,
and fixed those too, including one used by
src/backend/utils/mb/Unicode/Makefile to fetch source data which has
moved (http://site.icu-project.org/repository says "Announcement
07/16/2018: The ICU source code repository has been migrated from
Subversion to Git, and is now hosted on GitHub.").
From f27aed63697d7ed844dd32f46182d4a8b0edb903 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 11 Jul 2020 09:33:10 +1200
Subject: [PATCH] Fix some more broken URLs.

Discussion: https://postgr.es/m/20200709.161226.204639179120026914.horikyota@gmail.com
---
 doc/src/sgml/acronyms.sgml| 2 +-
 doc/src/sgml/charset.sgml | 2 +-
 doc/src/sgml/installation.sgml| 2 +-
 doc/src/sgml/isn.sgml | 2 +-
 src/backend/utils/mb/Unicode/Makefile | 2 +-
 src/common/encnames.c | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml
index f638665dc9..d0265dfb93 100644
--- a/doc/src/sgml/acronyms.sgml
+++ b/doc/src/sgml/acronyms.sgml
@@ -649,7 +649,7 @@
 SSPI
 
  
-  https://msdn.microsoft.com/en-us/library/aa380493%28VS.85%29.aspx;>Security
+  https://docs.microsoft.com/en-us/windows/win32/secauthn/sspi;>Security
   Support Provider Interface
  
 
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 4b4563c5b9..ccd39a55f5 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -841,7 +841,7 @@ CREATE COLLATION german (provider = libc, locale = 'de_DE');
 subtag) can be found in
 the https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml;>CLDR
 repository.
-The https://ssl.icu-project.org/icu-bin/locexp;>ICU Locale
+The https://icu-project.org/icu-bin/locexp;>ICU Locale
 Explorer can be used to check the details of a particular locale
 definition.  The examples using the k* subtags require
 at least ICU version 54.
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 552303e211..617eb1b044 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -2430,7 +2430,7 @@ xcodebuild -version -sdk macosx Path
 

  To build 64 bit binaries using MinGW, install the 64 bit tool set
- from https://mingw-w64.org/;>, put its bin
+ from https://www.mingw-w64.org/;>, put its bin
  directory in the PATH, and run
  configure with the
  --host=x86_64-w64-mingw32 option.
diff --git a/doc/src/sgml/isn.sgml b/doc/src/sgml/isn.sgml
index e1ea209ff1..bdab4eba34 100644
--- a/doc/src/sgml/isn.sgml
+++ b/doc/src/sgml/isn.sgml
@@ -400,7 +400,7 @@ SELECT isbn13(id) FROM test;
 https://en.wikipedia.org/wiki/List_of_ISBN_identifier_groups;>
 https://www.isbn-international.org/content/isbn-users-manual;>
 https://en.wikipedia.org/wiki/International_Standard_Music_Number;>
-https://www.ismn-international.org/ranges.html;>
+https://www.ismn-international.org/ranges/ranges;>

 
Care was taken during the creation of the algorithms and they
diff --git a/src/backend/utils/mb/Unicode/Makefile b/src/backend/utils/mb/Unicode/Makefile
index 9084f03009..da307d8eb9 100644
--- a/src/backend/utils/mb/Unicode/Makefile
+++ b/src/backend/utils/mb/Unicode/Makefile
@@ -122,7 +122,7 @@ euc-jis-2004-std.txt sjis-0213-2004-std.txt:
 	$(DOWNLOAD) http://x0213.org/codetable/$(@F)
 
 gb-18030-2000.xml windows-949-2000.xml:
-	$(DOWNLOAD) https://ssl.icu-project.org/repos/icu/data/trunk/charset/data/xml/$(@F)
+	$(DOWNLOAD) https://raw.githubusercontent.com/unicode-org/icu-data/master/charset/data/xml/$(@F)
 
 GB2312.TXT:
 	$(DOWNLOAD) 'http://trac.greenstone.org/browser/trunk/gsdl/unicode/MAPPINGS/EASTASIA/GB/GB2312.TXT?rev=1842=txt'
diff --git a/src/common/encnames.c b/src/common/encnames.c
index 14cf1b39e9..2404822619 100644
--- a/src/common/encnames.c
+++ b/src/common/encnames.c
@@ -408,7 +408,7 @@ const pg_enc2gettext pg_enc2gettext_tbl[] =
 /*
  * Table of encoding names for ICU (currently covers backend encodings only)
  *
- * Reference: 

Re: Stale external URL in doc?

2020-07-10 Thread Daniel Gustafsson
> On 10 Jul 2020, at 18:28, Tom Lane  wrote:
> Alvaro Herrera  writes:

>> Um, the comp.ai.genetic FAQ can still be found, eg. 
>> http://www.faqs.org/faqs/ai-faq/genetic/part1/
> 
> So it is, although that also shows it hasn't been updated since 2001.

Ah, I missed the alternative source.

> I'll go update that pointer and remove the other links per Daniel's
> patch.

Thanks for the fixup.

cheers ./daniel




Re: "tuple concurrently updated" in pg_restore --jobs

2020-07-10 Thread Tom Lane
Justin Pryzby  writes:
> On Fri, Jul 10, 2020 at 04:54:40PM -0400, Tom Lane wrote:
>> This works about 99% of the time, in fact.  It falls down in the --clean

> Note that this fails for me (sometimes) even without --clean.

Oh, I was thinking that REVOKE would only be issued in the --clean
case, but apparently that's not so.  Doesn't really affect the fix
proposal though.  I just finished a patch for HEAD, as attached.

(I flushed the "CatalogId objCatId" argument of dumpACL, which was
not used.)

I'm not sure how far to back-patch it -- I think the parallel restore
of ACLs behavior is not very old, but we might want to teach older
pg_dump versions to insert the extra dependency anyway, for safety.

regards, tom lane

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 4ea59f5494..08239dde4f 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -28,7 +28,7 @@
  */
 static DumpableObject **dumpIdMap = NULL;
 static int	allocedDumpIds = 0;
-static DumpId lastDumpId = 0;
+static DumpId lastDumpId = 0;	/* Note: 0 is InvalidDumpId */
 
 /*
  * Variables for mapping CatalogId to DumpableObject
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index b17c9dbb8b..1017abbbe5 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -233,6 +233,12 @@ typedef struct
 
 typedef int DumpId;
 
+#define InvalidDumpId 0
+
+/*
+ * Function pointer prototypes for assorted callback methods.
+ */
+
 typedef int (*DataDumperPtr) (Archive *AH, void *userArg);
 
 typedef void (*SetupWorkerPtrType) (Archive *AH);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 8bca147a33..e758b5c50a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -220,11 +220,11 @@ static void dumpUserMappings(Archive *fout,
 			 const char *owner, CatalogId catalogId, DumpId dumpId);
 static void dumpDefaultACL(Archive *fout, DefaultACLInfo *daclinfo);
 
-static void dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
-	const char *type, const char *name, const char *subname,
-	const char *nspname, const char *owner,
-	const char *acls, const char *racls,
-	const char *initacls, const char *initracls);
+static DumpId dumpACL(Archive *fout, DumpId objDumpId, DumpId altDumpId,
+	  const char *type, const char *name, const char *subname,
+	  const char *nspname, const char *owner,
+	  const char *acls, const char *racls,
+	  const char *initacls, const char *initracls);
 
 static void getDependencies(Archive *fout);
 static void BuildArchiveDependencies(Archive *fout);
@@ -2992,7 +2992,7 @@ dumpDatabase(Archive *fout)
 	 * Dump ACL if any.  Note that we do not support initial privileges
 	 * (pg_init_privs) on databases.
 	 */
-	dumpACL(fout, dbCatId, dbDumpId, "DATABASE",
+	dumpACL(fout, dbDumpId, InvalidDumpId, "DATABASE",
 			qdatname, NULL, NULL,
 			dba, datacl, rdatacl, "", "");
 
@@ -3477,7 +3477,7 @@ dumpBlob(Archive *fout, BlobInfo *binfo)
 
 	/* Dump ACL if any */
 	if (binfo->blobacl && (binfo->dobj.dump & DUMP_COMPONENT_ACL))
-		dumpACL(fout, binfo->dobj.catId, binfo->dobj.dumpId, "LARGE OBJECT",
+		dumpACL(fout, binfo->dobj.dumpId, InvalidDumpId, "LARGE OBJECT",
 binfo->dobj.name, NULL,
 NULL, binfo->rolname, binfo->blobacl, binfo->rblobacl,
 binfo->initblobacl, binfo->initrblobacl);
@@ -10155,7 +10155,7 @@ dumpNamespace(Archive *fout, NamespaceInfo *nspinfo)
 	 nspinfo->dobj.catId, 0, nspinfo->dobj.dumpId);
 
 	if (nspinfo->dobj.dump & DUMP_COMPONENT_ACL)
-		dumpACL(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId, "SCHEMA",
+		dumpACL(fout, nspinfo->dobj.dumpId, InvalidDumpId, "SCHEMA",
 qnspname, NULL, NULL,
 nspinfo->rolname, nspinfo->nspacl, nspinfo->rnspacl,
 nspinfo->initnspacl, nspinfo->initrnspacl);
@@ -10439,7 +10439,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
 	 tyinfo->dobj.catId, 0, tyinfo->dobj.dumpId);
 
 	if (tyinfo->dobj.dump & DUMP_COMPONENT_ACL)
-		dumpACL(fout, tyinfo->dobj.catId, tyinfo->dobj.dumpId, "TYPE",
+		dumpACL(fout, tyinfo->dobj.dumpId, InvalidDumpId, "TYPE",
 qtypname, NULL,
 tyinfo->dobj.namespace->dobj.name,
 tyinfo->rolname, tyinfo->typacl, tyinfo->rtypacl,
@@ -10565,7 +10565,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
 	 tyinfo->dobj.catId, 0, tyinfo->dobj.dumpId);
 
 	if (tyinfo->dobj.dump & DUMP_COMPONENT_ACL)
-		dumpACL(fout, tyinfo->dobj.catId, tyinfo->dobj.dumpId, "TYPE",
+		dumpACL(fout, tyinfo->dobj.dumpId, InvalidDumpId, "TYPE",
 qtypname, NULL,
 tyinfo->dobj.namespace->dobj.name,
 tyinfo->rolname, tyinfo->typacl, tyinfo->rtypacl,
@@ -10637,7 +10637,7 @@ dumpUndefinedType(Archive *fout, TypeInfo *tyinfo)
 	 tyinfo->dobj.catId, 0, tyinfo->dobj.dumpId);
 
 	if (tyinfo->dobj.dump & DUMP_COMPONENT_ACL)
-		dumpACL(fout, tyinfo->dobj.catId, tyinfo->dobj.dumpId, "TYPE",
+		dumpACL(fout, tyinfo->dobj.dumpId, InvalidDumpId, "TYPE",
 qtypname, 

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Tom Lane
Peter Geoghegan  writes:
> It now seems likely that the hash_mem/hash_mem_multiplier proposal has
> the support it needs to get into Postgres 13. Assuming that the
> proposal doesn't lose momentum, then it's about time to return to the
> original question you posed at the start of the thread:

> What should we do with the hashagg_avoid_disk_plan GUC (formerly known
> as the enable_hashagg_disk GUC), if anything?

> I myself think that there is a case to be made for removing it
> entirely.

+0.5 or so for removing it.  It seems too confusing and dubiously
useful.

regards, tom lane




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Jul-10, Peter Geoghegan wrote:
>> * The maximum allowable value is 100.0, to protect users from
>> accidentally setting hash_mem_multiplier to a value intended to work
>> like a work_mem-style KB value (you can't provide an absolute value
>> like that directly). This maximum is absurdly high.

> I'm not sure about this bit; sounds a bit like what has been qualified
> as "nannyism" elsewhere.  Suppose I want to give a hash table 2GB of
> memory for whatever reason.  If my work_mem is default (4MB) then I
> cannot possibly achieve that without altering both settings.
> So I propose that maybe we do want a maximum value, but if so it should
> be higher than what you propose.  I think 10000 is acceptable in that it
> doesn't get in the way.

I was kind of thinking 1000 as the limit ;-).  In any case, the code
will need to internally clamp the product to not exceed whatever the
work_mem physical limit is these days.

regards, tom lane




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 2:10 PM Alvaro Herrera  wrote:
> I'm not sure about this bit; sounds a bit like what has been qualified
> as "nannyism" elsewhere.  Suppose I want to give a hash table 2GB of
> memory for whatever reason.  If my work_mem is default (4MB) then I
> cannot possibly achieve that without altering both settings.
>
> So I propose that maybe we do want a maximum value, but if so it should
> be higher than what you propose.  I think 10000 is acceptable in that it
> doesn't get in the way.

That's a good point.

I amend my proposal: the maximum allowable value of
hash_mem_multiplier should be 10000.0 (i.e., ten thousand times
whatever work_mem is set to, which is subject to the existing work_mem
sizing restrictions).

-- 
Peter Geoghegan




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 11:34 AM Jeff Davis  wrote:
> On Fri, 2020-07-10 at 13:46 -0400, Tom Lane wrote:
> > I looked over Peter's patch in [1], and it seems generally pretty
> > sane to me, though I concur with the idea that it'd be better to
> > define the GUC as a multiplier for work_mem.  (For one thing, we
> > could
> > then easily limit it to be at least 1.0, ensuring sanity; also, if
> > work_mem does eventually become more dynamic than it is now, we might
> > still be able to salvage this knob as something useful.  Or if not,
> > we just rip it out.)  So my vote is for moving in that direction.
>
> In that case, I will hold off on my "escape-hatch" GUC.

It now seems likely that the hash_mem/hash_mem_multiplier proposal has
the support it needs to get into Postgres 13. Assuming that the
proposal doesn't lose momentum, then it's about time to return to the
original question you posed at the start of the thread:

What should we do with the hashagg_avoid_disk_plan GUC (formerly known
as the enable_hashagg_disk GUC), if anything?

I myself think that there is a case to be made for removing it
entirely. But if we keep it then we should also not change the
default. In other words, by default the planner should *not* try to
avoid hash aggs that spill. AFAICT there is no particular reason to be
concerned about that now, since nobody has expressed any concerns
about any of the possibly-relevant cost models. That said, I don't
feel strongly about this hashagg_avoid_disk_plan question. It seems
*much* less important.

--
Peter Geoghegan




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Alvaro Herrera
On 2020-Jul-10, Peter Geoghegan wrote:

> * The maximum allowable value is 100.0, to protect users from
> accidentally setting hash_mem_multiplier to a value intended to work
> like a work_mem-style KB value (you can't provide an absolute value
> like that directly). This maximum is absurdly high.
> 
> I think that it's possible that a small number of users will find it
> useful to set the value of hash_mem_multiplier as high as 5.0. That is
> a very aggressive value, but one that could still make sense with
> certain workloads.

I'm not sure about this bit; sounds a bit like what has been qualified
as "nannyism" elsewhere.  Suppose I want to give a hash table 2GB of
memory for whatever reason.  If my work_mem is default (4MB) then I
cannot possibly achieve that without altering both settings.

So I propose that maybe we do want a maximum value, but if so it should
be higher than what you propose.  I think 10000 is acceptable in that it
doesn't get in the way.
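
As a rough worked example of that constraint (the 2GB and 4MB figures are the
ones from the scenario above; the arithmetic can be checked directly in SQL):

-- with work_mem at the default 4MB, a 2GB hash table needs a multiplier of
-- 2GB / 4MB = 512, which a cap of 100.0 cannot reach without raising work_mem
SELECT pg_size_bytes('2GB') / pg_size_bytes('4MB') AS multiplier_needed;  -- 512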

Another point is that if you specify a unit for the multiplier (which is
what users are likely to do for larger values), it'll fail anyway, so
I'm not sure this is such terrible a problem.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: "tuple concurrently updated" in pg_restore --jobs

2020-07-10 Thread Justin Pryzby
On Fri, Jul 10, 2020 at 04:54:40PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > I hit this issue intermittently (roughly half the time) while working with a
> > patch David submitted, and finally found a recipe to reproduce it on an
> > unpatched v12 instance.
> 
> > I was surprised to see pg_restore -j2 is restoring ACLs in pre-data in
> > parallel.
> 
> It's not pre-data.  But it's true that pg_restore figures it can restore
> ACLs in parallel during the ACL-restoring pass, on the theory that pg_dump
> will not emit two different ACL entries for the same object, so that we
> can do all the catalog updates in parallel without conflicts.
> 
> This works about 99% of the time, in fact.  It falls down in the --clean

Note that this fails for me (sometimes) even without --clean.

$ pg_restore pg_dump.out -j2 -d pryzbyj -v --section pre-data

pg_restore: entering main parallel loop
pg_restore: launching item 3395 ACL TABLE pg_proc
pg_restore: launching item 3396 ACL COLUMN pg_proc.proname
pg_restore: creating ACL "pg_catalog.TABLE pg_proc"
pg_restore: creating ACL "pg_catalog.COLUMN pg_proc.proname"
pg_restore: finished item 3395 ACL TABLE pg_proc
pg_restore: launching item 3397 ACL COLUMN pg_proc.pronamespace
pg_restore: creating ACL "pg_catalog.COLUMN pg_proc.pronamespace"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 3396; 0 0 ACL COLUMN pg_proc.proname postgres
pg_restore: error: could not execute query: ERROR:  tuple concurrently updated
Command was: GRANT SELECT(proname) ON TABLE pg_catalog.pg_proc TO PUBLIC;


-- 
Justin




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 10:47 AM Tom Lane  wrote:
> I looked over Peter's patch in [1], and it seems generally pretty
> sane to me, though I concur with the idea that it'd be better to
> define the GUC as a multiplier for work_mem.  (For one thing, we could
> then easily limit it to be at least 1.0, ensuring sanity; also, if
> work_mem does eventually become more dynamic than it is now, we might
> still be able to salvage this knob as something useful.  Or if not,
> we just rip it out.)  So my vote is for moving in that direction.

Cool. I agree that it makes sense to constrain the effective value to
be at least work_mem in all cases.

With that in mind, I propose that this new GUC have the following
characteristics:

* It should be named "hash_mem_multiplier", a floating point GUC
(somewhat like bgwriter_lru_multiplier).

* The default value is 2.0.

* The minimum allowable value is 1.0, to protect users from
accidentally giving less memory to hash-based nodes.

* The maximum allowable value is 100.0, to protect users from
accidentally setting hash_mem_multiplier to a value intended to work
like a work_mem-style KB value (you can't provide an absolute value
like that directly). This maximum is absurdly high.

I think that it's possible that a small number of users will find it
useful to set the value of hash_mem_multiplier as high as 5.0. That is
a very aggressive value, but one that could still make sense with
certain workloads.
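
As a hypothetical usage sketch, assuming the GUC lands with the name and
semantics proposed above (it was not in any released version at the time of
this thread):

SET work_mem = '64MB';
SET hash_mem_multiplier = 2.0;   -- hash-based nodes could then use up to ~128MB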

Thoughts?
-- 
Peter Geoghegan




Re: "tuple concurrently updated" in pg_restore --jobs

2020-07-10 Thread Tom Lane
Justin Pryzby  writes:
> I hit this issue intermittently (roughly half the time) while working with a
> patch David submitted, and finally found a recipe to reproduce it on an
> unpatched v12 instance.

> I was surprised to see pg_restore -j2 is restoring ACLs in pre-data in
> parallel.

It's not pre-data.  But it's true that pg_restore figures it can restore
ACLs in parallel during the ACL-restoring pass, on the theory that pg_dump
will not emit two different ACL entries for the same object, so that we
can do all the catalog updates in parallel without conflicts.

This works about 99% of the time, in fact.  It falls down in the --clean
case if we have to revoke existing table permissions, because in that case
the REVOKE at table level is required to clear the table's per-column ACLs
as well, so that that ACL entry involves touching the same catalog rows
that the per-column ACLs want to touch.

I think the right fix is to give the per-column ACL entries dependencies
on the per-table ACL, if there is one.  This will not fix the problem
for the case of restoring from an existing pg_dump archive that lacks
such dependency links --- but given the lack of field complaints, I'm
okay with that.

This looks straightforward, if somewhat tedious because we'll have to
change the API of pg_dump's dumpACL() function, which is called by
a lot of places.  Barring objections, I'll go do that.

regards, tom lane




Re: Wait profiling

2020-07-10 Thread Alvaro Herrera
On 2020-Jul-10, Daniel Wood wrote:

> After nearly 5 years does something like the following yet exist?
> https://www.postgresql.org/message-id/559d4729.9080...@postgrespro.ru

Yes, we have pg_stat_activity.wait_events which implement pretty much
what Ildus describes there.
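
For reference, the columns in question are wait_event_type and wait_event; a
quick way to sample them is:

SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE state = 'active';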

> 1) An option to "explain" to produce a wait events profile.
> postgres=# explain (analyze, waitprofile) update pgbench_accounts set 
> bid=bid+1 where aid < 200;
> ...
> Execution time: 23111.231 ms

There's an out-of-core extension, search for pg_wait_sampling.  I
haven't tested it yet ...
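
A minimal sketch of how that extension is typically used, going by its own
documentation (the view and column names, and the fact that it must be loaded
via shared_preload_libraries, are assumptions here, not something verified in
this thread):

CREATE EXTENSION pg_wait_sampling;
SELECT event_type, event, count
FROM pg_wait_sampling_profile
ORDER BY count DESC
LIMIT 10;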

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM

2020-07-10 Thread Michail Nikolaev
Hello, Peter.

Thanks for the update.

Yes, it is the right decision.
I have started to spot that bug only while working on a faster scan
using hint bits on replicas [1], so it is unlikely to hit it in
production at the moment.

Thanks,
Michail.

[1]: 
https://www.postgresql.org/message-id/CANtu0ojmkN_6P7CQWsZ%3DuEgeFnSmpCiqCxyYaHnhYpTZHj7Ubw%40mail.gmail.com




Wait profiling

2020-07-10 Thread Daniel Wood
After nearly 5 years does something like the following yet exist?
https://www.postgresql.org/message-id/559d4729.9080...@postgrespro.ru

I feel that it would be useful to have the following two things.  One PG 
enhancement and one standard extension.

1) An option to "explain" to produce a wait events profile.
postgres=# explain (analyze, waitprofile) update pgbench_accounts set bid=bid+1 
where aid < 200;
...
Execution time: 23111.231 ms

62.6% BufFileRead
50.0% CPU
9.3% LWLock

It uses a PG timer to do this.

2) An extension based function like: select pg_wait_profile(pid, nSeconds, 
timerFrequency) to return the same thing for an already running query.  Useful 
if you want examine some already long running query that is taking too long.

Neither of these would be doing the heavy weight pg_stat_activity but directly 
poll the wait event in PROC.  I've already coded the EXPLAIN option.

Furthermore, can't we just remove the following "IF" test from 
pgstat_report_wait_{start,end}?
if (!pgstat_track_activities || !proc)
return;
Just do the assignment of wait_event_info always.  We should use a dummy PGPROC 
assigned to MyProc until we assign the one in the procarray in shared memory.  
That way we don't need the "!proc" test.
About the only thing I'd want to verify is whether wait_event_info is on the 
same cache lines as anything else having to do with snapshots.

If I recall correctly, the blank lines above that I've used to make this more 
readable will disappear.  :-(

- Dan Wood

Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'

2020-07-10 Thread Tom Lane
I wrote:
> Mark Lorenz  writes:
>> But, nevertheless, what about adding the function to accept the DAY, D 
>> (and maybe the Q) patterns for to_date() - in this case, of course, in 
>> the uncorrelated version? to_char() handles them properly. And, from my 
>> point of view, there is no reason why they should give "1" instead the 
>> real day number. What do you think?

> The trick is to produce something sane.  I think that a reasonable
> precedent for this would be what to_date does with ISO-week fields:
> you can ask it to parse IYYY-IW-ID but you can't mix that with regular
> month/day/year fields.  So for example, it seems like it'd be possible
> to reconstruct a date from YYYY-WW-D, because that's enough to uniquely
> identify a day.  The D field isn't monotonically increasing within a
> week, but nonetheless there's exactly one day in each YYYY-WW week that
> has a particular D value.  However you probably don't want to allow
> inconsistent mixtures like YYYY-WW-ID, because that's just a mess (and
> more than likely, it's a mistake).  And I would not be in favor of
> allowing YYYY-Q either, because that would not be enough to uniquely
> identify a day, so there's really no point in allowing Q to enter into
> to_date's considerations at all.
> Whether there is actually any field demand for such a feature is
> not clear to me.  AFAICT Oracle doesn't support it.
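
For a quick illustration of the fields being discussed: to_char() already
accepts them today, and the proposal is about making to_date() accept the
reverse direction.

SELECT to_char(date '2020-07-10', 'YYYY-WW-D');   -- 2020-28-6 (week 28, Friday = 6)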

Since we're certainly not going to commit these patches as-presented,
and nothing has happened on this thread since early April, I've marked
both the CF entries as Returned With Feedback.  If you do write a patch
to make to_date work as above, please file a new CF entry.

(BTW, having two CF entries pointing at the same email thread is
pretty confusing to our not-that-bright tools.  It's probably
better to have just one entry per thread in the future.)

regards, tom lane




Re: Remove Deprecated Exclusive Backup Mode

2020-07-10 Thread David Steele

On 7/2/20 7:12 AM, David Steele wrote:


None of this really solves the problem of what happens when the user 
dumps the backup_label into the data directory. With traditional backup 
software that's pretty much going to be the only choice. Is telling them 
not to do it and washing our hands of it really enough?


After some thought I believe there are several recommendations we could 
give in the docs about what to do with the backup_label file.


This patch, though, looks to be a dead end. I think the best thing to do 
is withdraw the patch and end the thread.


I'll start a new thread when I have a patch/design that implements the 
new ideas we have discussed.


Regards,
--
-David
da...@pgmasters.net




Re: [Patch] Add missing libraries to Libs.private of libpq.pc

2020-07-10 Thread Peter Eisentraut

On 2020-04-08 11:38, Sandro Mani wrote:

The following patch, which we added to build mingw-postgresql on Fedora,
adds some missing libraries to Libs.private of libpq.pc, discovered when
attempting to statically link with libpq:

-lz: is required by -lcrypto


I think the correct fix for that would be to add libssl to libpq's 
Requires.private.



-liconv: is required by -lintl (though possibly depends on whether
gettext was compiled with iconv support)


Yeah, in both of these cases it depends on what libssl or libintl 
variant you actually got.  It could be the OS one or a separately 
installed one, it could be one with or without pkg-config support.  I'm 
not sure what a robust solution would be.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Jeff Davis
On Fri, 2020-07-10 at 13:46 -0400, Tom Lane wrote:
> I looked over Peter's patch in [1], and it seems generally pretty
> sane to me, though I concur with the idea that it'd be better to
> define the GUC as a multiplier for work_mem.  (For one thing, we
> could
> then easily limit it to be at least 1.0, ensuring sanity; also, if
> work_mem does eventually become more dynamic than it is now, we might
> still be able to salvage this knob as something useful.  Or if not,
> we just rip it out.)  So my vote is for moving in that direction.

In that case, I will hold off on my "escape-hatch" GUC.

Regards,
Jeff Davis






Re: Binary support for pgoutput plugin

2020-07-10 Thread Tom Lane
Daniel Gustafsson  writes:
> Thanks for the update!  Do note that my patch included a new file which is
> missing from this patchset:
>   src/test/subscription/t/014_binary.pl
> This is, IMO, the most interesting test of this feature so it would be good to
> be included.  It's a basic test and can no doubt be extended to be even more
> relevant, but it's a start.

I was about to complain that the latest patchset includes no meaningful
test cases, but I assume that this missing file contains those.

>> I did not do the macro for updated, inserted, deleted, will give that a go 
>> tomorrow.

> This might not be a blocker, but personally I think it would make the
> code more readable. Anyone else have an opinion on this?

+1 for using macros.

> Reading through the new patch, and running the tests, I'm marking this as 
> Ready
> for Committer.  It does need some cosmetic TLC, quite possibly just from
> pg_indent but I didn't validate if it will take care of everything, and 
> comment
> touchups (there is still a TODO comment around wording that needs to be
> resolved).  However, I think it's in good enough shape for consideration at
> this point.

I took a quick look through the patch and had some concerns:

* Please strip out the PG_VERSION_NUM and USE_INTEGER_DATETIMES checks.
Those are quite dead so far as a patch for HEAD is concerned --- in fact,
since USE_INTEGER_DATETIMES hasn't even been defined since v10 or so,
the patch is actively doing the wrong thing there.  Not that it matters.
This code will never appear in any branch where float timestamps could
be a thing.

* I doubt that the checks on USE_FLOAT4/8_BYVAL, sizeof(int), endiannness,
etc, make any sense either.  Those surely do not affect the on-the-wire
representation of values --- or if they do, we've blown it somewhere else.
I'd just take out all those checks and assume that the binary
representation is sufficiently portable.  (If it's not, it's more or less
the user's problem, just as in binary COPY.)

* Please also remove debugging hacks such as enabling WAL_DEBUG.

* It'd likely be wise for the documentation to point out that binary
mode only works if all types to be transferred have send/receive
functions.
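
For context, the option under review is set at subscription creation time; a
rough usage sketch, with the option name taken from the patch and placeholder
connection and publication names, would be:

CREATE SUBSCRIPTION mysub
  CONNECTION 'host=publisher dbname=postgres'
  PUBLICATION mypub
  WITH (binary = true);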


BTW, while it's not the job of this patch to fix it, I find it quite
distressing that we're apparently repeating the lookups of the type
I/O functions for every row transferred.

I'll set this back to WoA, but I concur with Daniel's opinion that
it doesn't seem that far from committability.

regards, tom lane




Re: Report error position in partition bound check

2020-07-10 Thread Alexandra Wang
> On 2 July 2020, at 06:39, Daniel Gustafsson  wrote:
> > On 10 Apr 2020, at 23:50, Alexandra Wang  wrote:
>
> > On Fri, Apr 10, 2020 at 8:37 AM Ashutosh Bapat <ashutosh.ba...@2ndquadrant.com> wrote:
> > > for a multi-key value the ^
> > > points to the first column and the reader may think that that's the
> > > problematic column. Should it instead point to ( ?
> >
> > I attached a v2 of Amit's 0002 patch to also report the exact column
> > for the partition overlap errors.
>
> This patch fails to apply to HEAD due to conflicts in the create_table expected
> output.  Can you please submit a rebased version?  I'm marking the CF entry
> Waiting on Author in the meantime.

Thank you Daniel. Here's the rebased patch. I also squashed the two
patches into one so it's easier to review.

-- 
Alexandra Wang
From 572f9086fc9d6e3e7cf1e58c3a6ee21dd8cd9f1b Mon Sep 17 00:00:00 2001
From: Alexandra Wang 
Date: Fri, 10 Jul 2020 10:28:49 -0700
Subject: [PATCH] Improve check new partition bound error position report

We have been passing a dummy ParseState to ereport(). Without the source
text in the ParseState, ereport does not report the error position even
if an error location is supplied. This patch passes a ParseState to
check_new_partition_bound() when it is available.

-- Create parent table
create table foo (a int, b int, c date) partition by range (b,c);

-- Before:
create table foo_part_1 partition of foo for values from (1, date '2007-01-01') to (1, date '2006-01-01');
ERROR:  empty range bound specified for partition "foo_part_1"
DETAIL:  Specified lower bound (1, '2007-01-01') is greater than or equal to upper bound (1, '2006-01-01').

-- After:
create table foo_part_1 partition of foo for values from (1, date '2007-01-01') to (1, date '2006-01-01');
ERROR:  empty range bound specified for partition "foo_part_1"
LINE 1: ...e foo_part_1 partition of foo for values from (1, date '2007...
 ^
DETAIL:  Specified lower bound (1, '2007-01-01') is greater than or equal to upper bound (1, '2006-01-01').

Co-authored-by: Ashwin Agrawal 
Co-authored-by: Amit Langote 
---
 src/backend/commands/tablecmds.c   | 15 --
 src/backend/parser/parse_utilcmd.c |  3 ++
 src/backend/partitioning/partbounds.c  | 63 ++
 src/include/partitioning/partbounds.h  |  4 +-
 src/test/regress/expected/alter_table.out  | 10 
 src/test/regress/expected/create_table.out | 30 +++
 6 files changed, 98 insertions(+), 27 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 26700da278..8de7473ba7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -541,7 +541,8 @@ static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partPa
 static void CreateInheritance(Relation child_rel, Relation parent_rel);
 static void RemoveInheritance(Relation child_rel, Relation parent_rel);
 static ObjectAddress ATExecAttachPartition(List **wqueue, Relation rel,
-		   PartitionCmd *cmd);
+		   PartitionCmd *cmd,
+		   AlterTableUtilityContext * context);
 static void AttachPartitionEnsureIndexes(Relation rel, Relation attachrel);
 static void QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
 			   List *partConstraint,
@@ -1005,7 +1006,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		 * Check first that the new partition's bound is valid and does not
 		 * overlap with any of existing partitions of the parent.
 		 */
-		check_new_partition_bound(relname, parent, bound);
+		check_new_partition_bound(relname, parent, bound, pstate);
 
 		/*
 		 * If the default partition exists, its partition constraints will
@@ -4646,7 +4647,8 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	  cur_pass, context);
 			Assert(cmd != NULL);
 			if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-ATExecAttachPartition(wqueue, rel, (PartitionCmd *) cmd->def);
+ATExecAttachPartition(wqueue, rel, (PartitionCmd *) cmd->def,
+	  context);
 			else
 ATExecAttachPartitionIdx(wqueue, rel,
 		 ((PartitionCmd *) cmd->def)->name);
@@ -16186,7 +16188,8 @@ QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
  * Return the address of the newly attached partition.
  */
 static ObjectAddress
-ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
+ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd,
+	  AlterTableUtilityContext * context)
 {
 	Relation	attachrel,
 catalog;
@@ -16201,6 +16204,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 	const char *trigger_name;
 	Oid			defaultPartOid;
 	List	   *partBoundConstraint;
+	ParseState *pstate = make_parsestate(NULL);
 
 	/*
 	 * We must lock the default partition if one exists, because attaching 

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Jul 10, 2020 at 7:17 AM Stephen Frost  wrote:
>> Due to the fact that we're in beta and now is not the time to be
>> redesigning this feature.

> Did you read the discussion?

Beta is when we fix problems that testing exposes in new features.
Obviously, we'd rather not design new APIs at this point, but if it's
the only reasonable way to resolve a problem, that's what we've got
to do.  I don't think anyone is advocating for reverting the hashagg
spill feature, and "do nothing" is not an attractive option either.
On the other hand, it's surely too late to engage in any massive
redesigns such as some of this thread has speculated about.

I looked over Peter's patch in [1], and it seems generally pretty
sane to me, though I concur with the idea that it'd be better to
define the GUC as a multiplier for work_mem.  (For one thing, we could
then easily limit it to be at least 1.0, ensuring sanity; also, if
work_mem does eventually become more dynamic than it is now, we might
still be able to salvage this knob as something useful.  Or if not,
we just rip it out.)  So my vote is for moving in that direction.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 7:17 AM Stephen Frost  wrote:
> > The hash_mem design (as it stands) would affect both hash join and
> > hash aggregate. I believe that it makes most sense to have hash-based
> > nodes under the control of a single GUC. I believe that this
> > granularity will cause the least problems. It certainly represents a
> > trade-off.
>
> So, now this has moved from being a hack to deal with a possible
> regression for a small number of users due to new behavior in one node,
> to a change that has impacts on other nodes that hadn't been changed,
> all happening during beta.

The common goal is ameliorating or avoiding predictable negative
consequences for our users. One proposal is an ambitious and
comprehensive way of dealing with that, that certainly has unique
risks. The other is much less ambitious, and clearly a kludge -- but
it's also much less risky. The discussion hasn't really moved at all.

> I'm still not thrilled with the 'hash_mem' kind of idea as it's going in
> the wrong direction because what's actually needed is a way to properly
> consider and track overall memory usage- a redesign of work_mem (or some
> new parameter, but it wouldn't be 'hash_mem') as you say, but all of
> this discussion should be targeting v14.

It's certainly possible that hash_mem is too radical, and yet not
radical enough -- in any timeframe (i.e. a total redesign of work_mem
is the only thing that will be acceptable). I don't understand why you
refuse to engage with the idea at all, though. The mere fact that
hash_mem could in theory fix this problem comprehensively *usefully
frames the problem*. This is the kind of issue where developing a
shared understanding is very important.

Andres said to me privately that hash_mem could be a good idea, even
though he opposes it as a fix to the open item for Postgres 13. I
understand that proposing such a thing during beta is controversial,
whatever the specifics are. It is a proposal made in the spirit of
trying to move things forward. Hand wringing about ignoring the
community's process is completely counterproductive.

There are about 3 general approaches to addressing this problem, and
hash_mem is one of them. Am I not allowed to point that out? I have
been completely open and honest about the risks.

> > Like I said, the escape hatch GUC is not my preferred solution. But at
> > least it acknowledges the problem. I don't think that anyone (or
> > anyone else) believes that work_mem doesn't have serious limitations.
>
> work_mem obviously has serious limitations, but that's not novel or new
> or unexpected by anyone.

In your other email from this morning, you wrote:

"And those workloads would be addressed by increasing work_mem, no?
Why are we inventing something new here for something that'll only
impact a small fraction of users in a small fraction of cases and
where there's already a perfectly workable way to address the issue?"

Which one is it?

> > > I'm really rather astounded at the direction this has been going in.
> >
> > Why?
>
> Due to the fact that we're in beta and now is not the time to be
> redesigning this feature.

Did you read the discussion?

-- 
Peter Geoghegan




Re: SQL-standard function body

2020-07-10 Thread Tom Lane
Thomas Munro  writes:
> On Wed, Jul 1, 2020 at 5:49 AM Peter Eisentraut
>  wrote:
>> - More test coverage is needed.  Surprisingly, there wasn't actually any
>> test AFAICT that just creates and SQL function and runs it.  Most of
>> that code is tested incidentally, but there is very little or no
>> targeted testing of this functionality.

> FYI cfbot showed a sign of some kind of error_context_stack corruption
> while running "DROP TABLE functest3 CASCADE;".

BTW, it occurs to me after answering bug #16534 that
contrib/earthdistance's SQL functions would be great candidates for this
new syntax.  Binding their references at creation time is really exactly
what we want.

I still feel that we can't just replace the existing implementation,
though, as that would kill too many use-cases where late binding is
helpful.

regards, tom lane




Re: expose parallel leader in CSV and log_line_prefix

2020-07-10 Thread Alvaro Herrera
On 2020-Jul-10, Justin Pryzby wrote:
> On Fri, Jul 10, 2020 at 12:45:29PM -0400, Alvaro Herrera wrote:

> > I think it's overly verbose; all non-parallel backends are going to get
> > their own PID twice, and I'm not sure this is going to be great to
> > parse.  I think it would be more sensible that if the process does not
> > have a parent (leader), %P expands to empty.
> 
> That's what's done.
> 
> + Process ID of the parallel group leader if this process was
> + at some point involved in parallel query, otherwise null.  For a
> + parallel group leader itself, this field is set to its own process
> + ID.

Oh, okay by me then.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: factorial function/phase out postfix operators?

2020-07-10 Thread Mark Dilger


> On Jun 30, 2020, at 2:47 PM, Mark Dilger  wrote:
> 
> 
> 
>> On May 19, 2020, at 4:47 PM, Tom Lane  wrote:
>> 
>> I wrote:
>>> However, we do have to have a benefit to show those people whose
>>> queries we break.  Hence my insistence on having a working AS fix
>>> (or some other benefit) before not after.
>> 
>> I experimented with this a bit more, and came up with the attached.
>> It's not a working patch, just a set of grammar changes that Bison
>> is happy with.  (Getting to a working patch would require fixing the
>> various build infrastructure that knows about the keyword classification,
>> which seems straightforward but tedious.)
> 
> I built a patch on top of yours that does much of that tedious work.
> 
>> As Robert theorized, it works to move a fairly-small number of unreserved
>> keywords into a new slightly-reserved category.  However, as the patch
>> stands, only the remaining fully-unreserved keywords can be used as bare
>> column labels.  I'd hoped to be able to also use col_name keywords in that
>> way (which'd make the set of legal bare column labels mostly the same as
>> ColId).  The col_name keywords that cause problems are, it appears,
>> only PRECISION, CHARACTER, and CHAR_P.  So in principle we could move
>> those three into yet another keyword category and then let the remaining
>> col_name keywords be included in BareColLabel.  I kind of think that
>> that's more complication than it's worth, though.
> 
> By my count, 288 more keywords can be used as column aliases without the AS 
> keyword after the patch.  That exactly matches what Robert said upthread.
> 
> Tom and Álvaro discussed upthread:
> 
>> Would it make sense (and possible) to have a keyword category that is
>> not disjoint wrt. the others?  Maybe that ends up being easier than
>> a solution that ends up with six or seven categories.

Version 2, attached, follows this design, increasing the number of keywords 
that can be used as column aliases without the AS keyword up to 411, with only 
39 keywords still requiring an explicit preceding AS.



v2-0001-Allow-most-keywords-to-be-used-as-implicit-column.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





RE: Index Skip Scan (new UniqueKeys)

2020-07-10 Thread Floris Van Nee
Hi Dmitry,

Also took another look at the patch now, and found a case of incorrect data. It 
looks related to the new way of creating the paths, as I can't recall seeing 
this in earlier versions.

create table t1 as select a,b,b%5 as c, random() as d from generate_series(1, 
10) a, generate_series(1,100) b;
create index on t1 (a,b,c);

postgres=# explain select distinct on (a) * from t1 order by a,b desc,c;
  QUERY PLAN   
---
 Sort  (cost=2.92..2.94 rows=10 width=20)
   Sort Key: a, b DESC, c
   ->  Index Scan using t1_a_b_c_idx on t1  (cost=0.28..2.75 rows=10 width=20)
 Skip scan: true
(4 rows)


With the 'order by a, b desc, c' we expect the value of column 'b' to always be 
100. With index_skipscan enabled, it always gives 1 though. It's not correct 
that the planner chooses a skip scan followed by sort in this case.
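
One way to cross-check the expected result is to disable the patch's skip scan
and rerun the query (the enable_indexskipscan name is taken from the patch
series; it is not a core GUC):

SET enable_indexskipscan = off;
SELECT DISTINCT ON (a) * FROM t1 ORDER BY a, b DESC, c;   -- every row should have b = 100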

-Floris





Re: expose parallel leader in CSV and log_line_prefix

2020-07-10 Thread Justin Pryzby
On Fri, Jul 10, 2020 at 12:45:29PM -0400, Alvaro Herrera wrote:
> On 2020-Mar-18, Justin Pryzby wrote:
> 
> > On Sun, Mar 15, 2020 at 12:49:33PM +0100, Julien Rouhaud wrote:
> 
> > template1=# SET log_temp_files=0; explain analyze SELECT a,COUNT(1) FROM t 
> > a JOIN t b USING(a) GROUP BY 1;
> > 2020-03-15 21:20:47.288 CDT [55375537]LOG:  statement: SET 
> > log_temp_files=0;
> > SET
> > 2020-03-15 21:20:47.289 CDT [55375537]LOG:  statement: explain 
> > analyze SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
> > 2020-03-15 21:20:51.253 CDT [56275537]LOG:  temporary file: path 
> > "base/pgsql_tmp/pgsql_tmp5627.0", size 6094848
> > 2020-03-15 21:20:51.253 CDT [56275537]STATEMENT:  explain analyze 
> > SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
> > 2020-03-15 21:20:51.254 CDT [56265537]LOG:  temporary file: path 
> > "base/pgsql_tmp/pgsql_tmp5626.0", size 6103040
> > 2020-03-15 21:20:51.254 CDT [56265537]STATEMENT:  explain analyze 
> > SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
> > 2020-03-15 21:20:51.263 CDT [55375537]LOG:  temporary file: path 
> > "base/pgsql_tmp/pgsql_tmp5537.1.sharedfileset/o15of16.p0.0", size 557056
> 
> I think it's overly verbose; all non-parallel backends are going to get
> their own PID twice, and I'm not sure this is going to be great to
> parse.  I think it would be more sensible that if the process does not
> have a parent (leader), %P expands to empty.

That's what's done.

+ Process ID of the parallel group leader if this process was
+ at some point involved in parallel query, otherwise null.  For a
+ parallel group leader itself, this field is set to its own process
+ ID.

2020-07-10 11:53:32.304 CDT [16699 ]LOG:  statement: SELECT 1;
2020-07-10 11:53:32.304 CDT,"pryzbyj","postgres",16699,"[local]",5f089d0b.413b,1,"idle",2020-07-10 11:53:31 CDT,3/4,0,LOG,0,"statement: SELECT 1;","psql","client backend",

-- 
Justin




Re: expose parallel leader in CSV and log_line_prefix

2020-07-10 Thread Alvaro Herrera
On 2020-Mar-18, Justin Pryzby wrote:

> On Sun, Mar 15, 2020 at 12:49:33PM +0100, Julien Rouhaud wrote:

> template1=# SET log_temp_files=0; explain analyze SELECT a,COUNT(1) FROM t a 
> JOIN t b USING(a) GROUP BY 1;
> 2020-03-15 21:20:47.288 CDT [55375537]LOG:  statement: SET 
> log_temp_files=0;
> SET
> 2020-03-15 21:20:47.289 CDT [55375537]LOG:  statement: explain 
> analyze SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
> 2020-03-15 21:20:51.253 CDT [56275537]LOG:  temporary file: path 
> "base/pgsql_tmp/pgsql_tmp5627.0", size 6094848
> 2020-03-15 21:20:51.253 CDT [56275537]STATEMENT:  explain analyze 
> SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
> 2020-03-15 21:20:51.254 CDT [56265537]LOG:  temporary file: path 
> "base/pgsql_tmp/pgsql_tmp5626.0", size 6103040
> 2020-03-15 21:20:51.254 CDT [56265537]STATEMENT:  explain analyze 
> SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
> 2020-03-15 21:20:51.263 CDT [55375537]LOG:  temporary file: path 
> "base/pgsql_tmp/pgsql_tmp5537.1.sharedfileset/o15of16.p0.0", size 557056

I think it's overly verbose; all non-parallel backends are going to get
their own PID twice, and I'm not sure this is going to be great to
parse.  I think it would be more sensible that if the process does not
have a parent (leader), %P expands to empty.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: expose parallel leader in CSV and log_line_prefix

2020-07-10 Thread Julien Rouhaud
On Fri, Jul 10, 2020 at 11:11:15AM -0500, Justin Pryzby wrote:
> On Fri, Jul 10, 2020 at 05:13:26PM +0200, Julien Rouhaud wrote:
> > There's a thinko in the padding handling.  It should be dones whether MyProc
> > and/or lockGroupLeader is NULL or not, and only if padding was asked, like 
> > it's
> > done for case 'd' for instance.
> > 
> > Also, the '%k' escape sounds a bit random.  Is there any reason why we don't
> > use any uppercase character for log_line_prefix?  %P could be a better
> > alternative, otherwise maybe %g, as GroupLeader/Gather?
> 
> Thanks for looking.  %P is a good idea - it's consistent with ps and pkill and
> probably other %commands.  I also amended the docs.

Thanks!

So for the leader == NULL case, the appendStringInfoSpaces call is a no-op if no
padding was asked for, so it's probably not worth adding extra code to make it any
more obvious.

It all looks good to me, I'm marking the patch a ready for committer!




Re: Stale external URL in doc?

2020-07-10 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Jul-10, Daniel Gustafsson wrote:
>> Taking a look at other links to external resources, most links seemed to
>> resolve still (but I didn't test all of them).  I did find another one on the
>> GEQO page which is now dead without the content available elsewhere, as well 
>> as
>> a larger problem with the AIX references.

> Um, the comp.ai.genetic FAQ can still be found, eg. 
> http://www.faqs.org/faqs/ai-faq/genetic/part1/

So it is, although that also shows it hasn't been updated since 2001.

OTOH, that vintage of info is probably just fine for understanding GEQO.

I'll go update that pointer and remove the other links per Daniel's
patch.

regards, tom lane




Re: expose parallel leader in CSV and log_line_prefix

2020-07-10 Thread Justin Pryzby
On Fri, Jul 10, 2020 at 05:13:26PM +0200, Julien Rouhaud wrote:
> There's a thinko in the padding handling.  It should be done whether MyProc
> and/or lockGroupLeader is NULL or not, and only if padding was asked, like it's
> done for case 'd' for instance.
> 
> Also, the '%k' escape sounds a bit random.  Is there any reason why we don't
> use any uppercase character for log_line_prefix?  %P could be a better
> alternative, otherwise maybe %g, as GroupLeader/Gather?

Thanks for looking.  %P is a good idea - it's consistent with ps and pkill and
probably other %commands.  I also amended the docs.

-- 
Justin
>From 68a892d0023e915cc48aea1bb77eae5deabcecec Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 13 Mar 2020 22:03:06 -0500
Subject: [PATCH v2] Include the leader PID in logfile

See also: b025f32e0b, which adds the leader PID to pg_stat_activity
---
 doc/src/sgml/config.sgml  | 11 ++-
 src/backend/utils/error/elog.c| 30 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b353c61683..e60403d6a5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6611,6 +6611,14 @@ local0.*/var/log/postgresql
  Process ID
  no
 
+
+ %P
+ Process ID of the parallel group leader if this process was
+ at some point involved in parallel query, otherwise null.  For a
+ parallel group leader itself, this field is set to its own process
+ ID.
+ no
+
 
  %t
  Time stamp without milliseconds
@@ -6943,7 +6951,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 character count of the error position therein,
 location of the error in the PostgreSQL source code
 (if log_error_verbosity is set to verbose),
-application name, and backend type.
+application name, backend type, and leader PID.
 Here is a sample table definition for storing CSV-format log output:
 
 
@@ -6973,6 +6981,7 @@ CREATE TABLE postgres_log
   location text,
   application_name text,
   backend_type text,
+  leader_pid integer,
   PRIMARY KEY (session_id, session_line_num)
 );
 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e4b717c79a..dea098faf1 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,6 +77,7 @@
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
 #include "storage/proc.h"
+#include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2448,6 +2449,25 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 else
 	appendStringInfo(buf, "%d", MyProcPid);
 break;
+
+			case 'P':
+if (MyProc)
+{
+	PGPROC *leader = MyProc->lockGroupLeader;
+	if (leader == NULL)
+		/* padding only */
+		appendStringInfoSpaces(buf,
+padding > 0 ? padding : -padding);
+	else if (padding != 0)
+		appendStringInfo(buf, "%*d", padding, leader->pid);
+	else
+		appendStringInfo(buf, "%d", leader->pid);
+}
+else if (padding != 0)
+	appendStringInfoSpaces(buf,
+		   padding > 0 ? padding : -padding);
+break;
+
 			case 'l':
 if (padding != 0)
 	appendStringInfo(buf, "%*ld", padding, log_line_number);
@@ -2836,6 +2856,16 @@ write_csvlog(ErrorData *edata)
 	else
 		appendCSVLiteral(&buf, GetBackendTypeDesc(MyBackendType));
 
+	appendStringInfoChar(&buf, ',');
+
+	/* leader PID */
+	if (MyProc)
+	{
+		PGPROC *leader = MyProc->lockGroupLeader;
+		if (leader)
+			appendStringInfo(&buf, "%d", leader->pid);
+	}
+
 	appendStringInfoChar(&buf, '\n');
 
 	/* If in the syslogger process, try to write messages direct to file */
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e430e33c7b..48a50fb394 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -533,6 +533,7 @@
 	#   %h = remote host
 	#   %b = backend type
 	#   %p = process ID
+	#   %P = leader PID
 	#   %t = timestamp without milliseconds
 	#   %m = timestamp with milliseconds
 	#   %n = timestamp with milliseconds (as a Unix epoch)
-- 
2.17.0



Re: GSSENC'ed connection stalls while reconnection attempts.

2020-07-10 Thread Tom Lane
Kyotaro Horiguchi  writes:
> If psql connected using GSSAPI auth and server restarted, reconnection
> sequence stalls and won't return.

Yeah, reproduced here.  (I wonder if there's any reasonable way to
exercise this scenario in src/test/kerberos/.)

> I found that psql(libpq) sends startup packet via gss
> encryption. conn->gssenc should be reset when encryption state is
> freed.

Actually, it looks to me like the GSS support was wedged in by somebody
who was paying no attention to how SSL is managed, or else we forgot
to pay attention to GSS the last time we rearranged SSL support.  It's
completely broken for the multiple-host-addresses scenario as well,
because try_gss is being set and cleared in the wrong places altogether.
conn->gcred is not being handled correctly either I think --- we need
to make sure that it's dropped in pqDropConnection.

The attached patch makes this all act more like the way SSL is handled,
and for me it resolves the reconnection problem.

> The reason that psql doesn't notice the error is pqPacketSend returns
> STATUS_OK when write error occurred.  That behavior contradicts to the
> comment of the function. The function is used only while making
> connection so it's ok to error out immediately by write failure.  I
> think other usage of pqFlush while making a connection should report
> write failure the same way.

I'm disinclined to mess with that, because (a) I don't think it's the
actual source of the problem, and (b) it would affect way more than
just GSS mode.

> Finally, It's user-friendly if psql shows error message for error on
> reset attempts. (This perhaps should be arguable.)

Meh, that's changing fairly longstanding behavior that I don't think
we've had many complaints about.

regards, tom lane

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27c9bb46ee..7bee9dd201 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -477,6 +477,11 @@ pqDropConnection(PGconn *conn, bool flushInput)
 	{
 		OM_uint32	min_s;
 
+		if (conn->gcred != GSS_C_NO_CREDENTIAL)
+		{
+			gss_release_cred(&min_s, &conn->gcred);
+			conn->gcred = GSS_C_NO_CREDENTIAL;
+		}
 		if (conn->gctx)
 			gss_delete_sec_context(&min_s, &conn->gctx, GSS_C_NO_BUFFER);
 		if (conn->gtarg_nam)
@@ -496,6 +501,7 @@ pqDropConnection(PGconn *conn, bool flushInput)
 			free(conn->gss_ResultBuffer);
 			conn->gss_ResultBuffer = NULL;
 		}
+		conn->gssenc = false;
 	}
 #endif
 #ifdef ENABLE_SSPI
@@ -2027,11 +2033,6 @@ connectDBStart(PGconn *conn)
 	 */
 	resetPQExpBuffer(&conn->errorMessage);
 
-#ifdef ENABLE_GSS
-	if (conn->gssencmode[0] == 'd') /* "disable" */
-		conn->try_gss = false;
-#endif
-
 	/*
 	 * Set up to try to connect to the first host.  (Setting whichhost = -1 is
 	 * a bit of a cheat, but PQconnectPoll will advance it to 0 before
@@ -2468,6 +2469,9 @@ keep_going:		/* We will come back to here until there is
 		conn->allow_ssl_try = (conn->sslmode[0] != 'd');	/* "disable" */
 		conn->wait_ssl_try = (conn->sslmode[0] == 'a'); /* "allow" */
 #endif
+#ifdef ENABLE_GSS
+		conn->try_gss = (conn->gssencmode[0] != 'd');	/* "disable" */
+#endif
 
 		reset_connection_state_machine = false;
 		need_new_connection = true;
@@ -3349,12 +3353,8 @@ keep_going:		/* We will come back to here until there is
 	 */
 	if (conn->gssenc && conn->gssencmode[0] == 'p')
 	{
-		OM_uint32	minor;
-
 		/* postmaster expects us to drop the connection */
 		conn->try_gss = false;
-		conn->gssenc = false;
-		gss_delete_sec_context(&minor, &conn->gctx, NULL);
 		pqDropConnection(conn, true);
 		conn->status = CONNECTION_NEEDED;
 		goto keep_going;
@@ -3906,9 +3906,6 @@ makeEmptyPGconn(void)
 	conn->verbosity = PQERRORS_DEFAULT;
 	conn->show_context = PQSHOW_CONTEXT_ERRORS;
 	conn->sock = PGINVALID_SOCKET;
-#ifdef ENABLE_GSS
-	conn->try_gss = true;
-#endif
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -4065,22 +4062,6 @@ freePGconn(PGconn *conn)
 		free(conn->gsslib);
 	if (conn->connip)
 		free(conn->connip);
-#ifdef ENABLE_GSS
-	if (conn->gcred != GSS_C_NO_CREDENTIAL)
-	{
-		OM_uint32	minor;
-
-		gss_release_cred(, >gcred);
-		gss_release_cred(&minor, &conn->gcred);
-	}
-	if (conn->gctx)
-	{
-		OM_uint32	minor;
-
-		gss_delete_sec_context(&minor, &conn->gctx, GSS_C_NO_BUFFER);
-		conn->gctx = NULL;
-	}
-#endif
 	/* Note that conn->Pfdebug is not ours to close or free */
 	if (conn->last_query)
 		free(conn->last_query);


Re: expose parallel leader in CSV and log_line_prefix

2020-07-10 Thread Julien Rouhaud
On Thu, Jul 09, 2020 at 09:20:23PM -0500, Justin Pryzby wrote:
> On Fri, Jul 10, 2020 at 11:09:40AM +0900, Michael Paquier wrote:
> > On Thu, Jul 09, 2020 at 01:53:39PM +0200, Julien Rouhaud wrote:
> > > Sure!  I've been quite busy with internal work duties recently but
> > > I'll review this patch shortly.  Thanks for the reminder!
> > 
> > Hmm.  In which cases would it be useful to have this information in
> > the logs knowing that pg_stat_activity lets us know the link between
> > both the leader and its workers?
> 
> PSA is an instantaneous view whereas the logs are a record.  That's important
> for shortlived processes (like background workers) or in the case of an ERROR
> or later crash.
> 
> Right now, the logs fail to include that information, which is deficient.  
> Half
> the utility is in showing *that* the log is for a parallel worker, which is
> otherwise not apparent.

Yes, I agree that this is a nice thing to have and another small step toward
parallel query monitoring.
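
For reference, the pg_stat_activity linkage mentioned in the quoted question
can be queried like this (leader_pid is the column added by commit b025f32e0b,
which the patch below also references):

SELECT pid, leader_pid, backend_type, query
FROM pg_stat_activity
WHERE leader_pid IS NOT NULL;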

About the patch:

+   case 'k':
+   if (MyProc)
+   {
+   PGPROC *leader = MyProc->lockGroupLeader;
+   if (leader == NULL)
+   /* padding only */
+   appendStringInfoSpaces(buf,
+   padding > 0 ? padding : -padding);
+   else if (padding != 0)
+   appendStringInfo(buf, "%*d", padding, leader->pid);
+   else
+   appendStringInfo(buf, "%d", leader->pid);
+   }
+   break;

There's a thinko in the padding handling.  It should be done whether MyProc
and/or lockGroupLeader is NULL or not, and only if padding was asked, like it's
done for case 'd' for instance.

Also, the '%k' escape sounds a bit random.  Is there any reason why we don't
use any uppercase character for log_line_prefix?  %P could be a better
alternative, otherwise maybe %g, as GroupLeader/Gather?




Re: vs formatting in the docs

2020-07-10 Thread Peter Eisentraut

On 2020-06-21 18:57, Dagfinn Ilmari Mannsåker wrote:

Attached are two patches: the first adds the missing  tags,
the second adds  to all the SQL commands (specifically anything
with 7).


I have committed the first one.

I have some concerns about the second one.  If you look at the diff of 
the source of the man pages before and after, it creates a bit of a 
mess, even though the man pages look okay when rendered.  I'll need to 
think about this a bit more.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




libpq: Request Pipelining/Batching status ?

2020-07-10 Thread Matthieu Garrigues
Hi all,

Do you know what is the status of Request Pipelining and/or Batching in
libpq ?

I could see that I'm not the first one to think about it, I see an item in
the todolist:
https://web.archive.org/web/20200125013930/https://wiki.postgresql.org/wiki/Todo

And a thread here:
https://www.postgresql-archive.org/PATCH-Batch-pipelining-support-for-libpq-td5904551i80.html

And a patch here:
https://2ndquadrant.github.io/postgres/libpq-batch-mode.html

It seems like this boosts performance a lot; drogon, a C++ framework,
outperforms all other web frameworks thanks to this fork:
https://www.techempower.com/benchmarks/#section=data-r19=ph=update
https://github.com/TechEmpower/FrameworkBenchmarks/issues/5502

It would be nice to have it in the official libpq so we don't have to use
an outdated fork to get this feature.
Is anybody working on it? Is there a lot of work left to finalize this patch?

Thanks in advance,
Matthieu


Re: WIP: BRIN multi-range indexes

2020-07-10 Thread Sascha Kuhl
Tomas Vondra  schrieb am Fr., 10. Juli 2020,
14:09:

> On Fri, Jul 10, 2020 at 06:01:58PM +0900, Masahiko Sawada wrote:
> >On Fri, 3 Jul 2020 at 09:58, Tomas Vondra 
> wrote:
> >>
> >> On Sun, Apr 05, 2020 at 08:01:50PM +0300, Alexander Korotkov wrote:
> >> >On Sun, Apr 5, 2020 at 8:00 PM Tomas Vondra
> >> > wrote:
> >> ...
> >> >> >
> >> >> >Assuming we're not going to get 0001-0003 into v13, I'm not so
> >> >> >inclined to rush on these three as well.  But you're willing to
> commit
> >> >> >them, you can count round of review on me.
> >> >> >
> >> >>
> >> >> I have no intention to get 0001-0003 committed. I think those changes
> >> >> are beneficial on their own, but the primary reason was to support
> the
> >> >> new opclasses (which require those changes). And those parts are not
> >> >> going to make it into v13 ...
> >> >
> >> >OK, no problem.
> >> >Let's do this for v14.
> >> >
> >>
> >> Hi Alexander,
> >>
> >> Are you still interested in reviewing those patches? I'll take a look at
> >> 0001-0003 to check that your previous feedback was addressed. Do you
> >> have any comments about 0004 / 0005, which I think are the more
> >> interesting parts of this series?
> >>
> >>
> >> Attached is a rebased version - I realized I forgot to include 0005 in
> >> the last update, for some reason.
> >>
> >
> >I've done a quick test with this patch set. I wonder if we can improve
> >brin_page_items() SQL function in pageinspect as well. Currently,
> >brin_page_items() is hard-coded to support only normal brin indexes.
> >When we pass brin-bloom or brin-multi-range to that function the
> >binary values are shown in 'value' column but it seems not helpful for
> >users. For instance, here is an output of brin_page_items() with a
> >brin-multi-range index:
> >
> >postgres(1:12801)=# select * from brin_page_items(get_raw_page('mul',
> >2), 'mul');
>-[ RECORD 1 ]--------------------------------------------------------------
> >itemoffset  | 1
> >blknum  | 0
> >attnum  | 1
> >allnulls| f
> >hasnulls| f
> >placeholder | f
>value   | {\x01001b002100e570e670e770e870e970ea70eb70ec70ed70ee70ef70f070f170f270f370f470f570f670f770f870f970fa70fb70fc70fd70fe70ff700071}
> >
>
> Hmm. I'm not sure we can do much better, without making the function
> much more complicated. I mean, even with regular BRIN indexes we don't
> really know if the value is plain min/max, right?
>
You can be sure with the next node. A value reported as "in" can be a false
positive; a value reported as "out" is definite. You can detect the change
between in and out.

>
>
> >Also, I got an assertion failure when setting false_positive_rate
> reloption:
> >
> >postgres(1:12448)=# create index blm on t using brin (c int4_bloom_ops
> >(false_positive_rate = 1));
> >TRAP: FailedAssertion("(false_positive_rate > 0) &&
> >(false_positive_rate < 1.0)", File: "brin_bloom.c", Line: 300)
> >
> >I'll look at the code in depth and let you know if I find a problem.
> >
>
> Yeah, the assert should say (f_p_r <= 1.0).
>
> But I'm not convinced we should allow values up to 1.0, really. The
> f_p_r is the fraction of the table that will get matched always, so 1.0
> would mean we get to scan the whole table. Seems kinda pointless. So
> maybe we should cap it to something like 0.1 or so, but I agree the
> value seems kinda arbitrary.
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>


Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings,

* Jeff Davis (pg...@j-davis.com) wrote:
> In principle, Stephen is right: the v12 behavior is a bug, lots of
> people are unhappy about it, it causes real problems, and it would not
> be acceptable if proposed today. Otherwise I wouldn't have spent the
> time to fix it. 
> 
> Similarly, potential regressions are not the "fault" of my feature --
> they are the fault of the limitations of work_mem, the limitations of
> the planner, the wrong expectations from customers, or just
> happenstance. 

Exactly.

> But at a certain point, I have to weigh the potential anger of
> customers hitting regressions versus the potential anger of hackers
> seeing a couple extra GUCs. I have to say that I am more worried about
> the former.

We work, quite intentionally, to avoid having a billion knobs that
people have to understand and to tune.  Yes, we could create a bunch of
new GUCs to change all kinds of behavior, and we could add hints while
we're at it, but there's been quite understandable and good pressure
against doing so because much of the point of this database system is
that it should be figuring out the best plan on its own and within the
constraints that users have configured.

> If there is some more serious consequence of adding a GUC that I missed
> in this thread, please let me know. Otherwise, I intend to commit a new
> GUC shortly that will enable users to bypass work_mem for HashAgg, just
> as in v12.

I don't think this thread has properly considered that every new GUC,
every additional knob that we create, increases the complexity of the
system for users to have to deal with and, in some sense, creates a
failure of ours to be able to just figure out what the right answer
is.  For such a small set of users, who somehow have a problem with a
Sort taking up more memory but are fine with HashAgg doing so, I don't
think the requirement is met that this is a large enough issue to
warrant a new GUC.  Users who are actually hit by this in a negative way
have an option- increase work_mem to reflect what was actually happening
already.  I seriously doubt that we'd get tons of users complaining
about that answer or asking us to have something separate from that, and
we'd avoid adding some new GUC that has to be explained to every new
user to the system and complicate the documentation that explains how
work_mem works.

Thanks,

Stephen




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Thu, Jul 9, 2020 at 5:08 PM Stephen Frost  wrote:
> > I didn't, and don't, think it particularly relevant to the discussion,
> > but if you don't like the comparison to Sort then we could compare it to
> > a HashJoin instead- the point is that, yes, if you are willing to give
> > more memory to a given operation, it's going to go faster, and the way
> > users tell us that they'd like the query to use more memory is already
> > well-defined and understood to be through work_mem.  We do not do them a
> > service by ignoring that.
> 
> The hash_mem design (as it stands) would affect both hash join and
> hash aggregate. I believe that it makes most sense to have hash-based
> nodes under the control of a single GUC. I believe that this
> granularity will cause the least problems. It certainly represents a
> trade-off.

So, now this has moved from being a hack to deal with a possible
regression for a small number of users due to new behavior in one node,
to a change that has impacts on other nodes that hadn't been changed,
all happening during beta.

No, I don't agree with this.  Now is not the time for designing new
features for v13.

> work_mem is less of a problem with hash join, primarily because join
> estimates are usually a lot better than grouping estimates. But it is
> nevertheless something that it makes sense to put in the same
> conceptual bucket as hash aggregate, pending a future root and branch
> redesign of work_mem.

I'm still not thrilled with the 'hash_mem' kind of idea as it's going in
the wrong direction because what's actually needed is a way to properly
consider and track overall memory usage- a redesign of work_mem (or some
new parameter, but it wouldn't be 'hash_mem') as you say, but all of
this discussion should be targeting v14.

> Like I said, the escape hatch GUC is not my preferred solution. But at
> least it acknowledges the problem. I don't think that anyone (or
> anyone else) believes that work_mem doesn't have serious limitations.

work_mem obviously has serious limitations, but that's not novel or new
or unexpected by anyone.

> > We have a parameter which already drives this and which users are
> > welcome to (and quite often do) tune.  I disagree that anything further
> > is either essential or particularly desirable.
> 
> This is a user hostile attitude.

I don't find that argument convincing, at all.

> > I'm really rather astounded at the direction this has been going in.
> 
> Why?

Due to the fact that we're in beta and now is not the time to be
redesigning this feature.  What Jeff implemented was done in a way that
follows the existing structure for how all of the other nodes work and
how HashAgg was *intended* to work (as in- if we thought the HashAgg
would go over work_mem, we wouldn't pick it and would do a GroupAgg
instead).  If there's bugs in his implementation (which I doubt, but it
can happen, of course) then that'd be useful to discuss and look at
fixing, but this discussion isn't appropriate for beta.

Thanks,

Stephen




Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions

2020-07-10 Thread Tom Lane
Zhenghua Lyu  writes:
> The where clause is "pushed down to the x,y" because it only 
> references these two relations.

Yeah.  I agree that it's somewhat unprincipled, but changing it doesn't
seem like a great idea.  There are a lot of users out there who aren't
terribly careful about marking their UDFs as non-volatile, but would be
unhappy if the optimizer suddenly crippled their queries because of
being picky about this.

Also, we specifically document that order of evaluation in WHERE clauses
is not guaranteed, so I feel no need to make promises about how often
volatile functions there will be evaluated.  (Volatiles in SELECT lists
are a different story.)
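
For illustration, a minimal sketch of the behavior in question (the tables
x, y, z and the function f are hypothetical, following the example upthread;
f is deliberately marked VOLATILE):

    CREATE FUNCTION f(int) RETURNS int AS 'SELECT $1' LANGUAGE sql VOLATILE;

    EXPLAIN
    SELECT *
    FROM x
    JOIN y ON y.id = x.id
    JOIN z ON z.id = x.id
    WHERE f(x.a) > y.b;
    -- The qual references only x and y, so it is attached to the x/y join
    -- level and may be evaluated before (and independently of) the join to z.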

This behavior has stood for a couple of decades with few user complaints,
so why are you concerned about changing it?

regards, tom lane




Re: Implement UNLOGGED clause for COPY FROM

2020-07-10 Thread Tom Lane
"osumi.takami...@fujitsu.com"  writes:
>> Aside from that, though, how does this improve upon the existing capability 
>> to copy into an unlogged temporary table?

> [>] First of all, an unlogged temporary table can't be carried over across sessions.

Unlogged tables don't have to be temporary.

> And an unlogged table's contents need to be reloaded after a server crash,
> because they are truncated at startup.

Indeed, and your proposed feature would extend that un-safety to tables
that are NOT marked unlogged, which is not good.

AFAICS, we can already accomplish basically the same thing as what you
want to do like this:

alter table foo set unlogged;
copy foo from ...;
alter table foo set logged;

The mechanics of that are already well worked out.  It's expensive,
no doubt about that, but I think you're just fooling yourself to
imagine that any shortcuts are possible.  A mix of unlogged and logged
data is never going to work safely.
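
For reference, a minimal sketch of the other existing approach mentioned
upthread, loading through an unlogged staging table (table and file names
are hypothetical; the final INSERT is fully WAL-logged, so the target stays
crash-safe):

    CREATE UNLOGGED TABLE foo_stage (LIKE foo INCLUDING DEFAULTS);
    COPY foo_stage FROM '/tmp/foo.csv' (FORMAT csv);  -- data not WAL-logged
    INSERT INTO foo SELECT * FROM foo_stage;          -- fully WAL-logged
    DROP TABLE foo_stage;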

> To achieve this, we have to consider a new idea: for example, loaded data
> would be appended after all the other pages and detached if the server
> crashes during the UNLOGGED loading.

You keep on ignoring the indexes... not to mention replication.

regards, tom lane




Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings,

* Justin Pryzby (pry...@telsasoft.com) wrote:
> On Thu, Jul 09, 2020 at 06:58:40PM -0400, Stephen Frost wrote:
> > * Peter Geoghegan (p...@bowt.ie) wrote:
> > > On Thu, Jul 9, 2020 at 7:03 AM Stephen Frost  wrote:
> > > It makes more sense than simply ignoring what our users will see as a
> > > simple regression. (Though I still lean towards fixing the problem by
> > > introducing hash_mem, which at least tries to fix the problem head
> > > on.)
> > 
> > The presumption that this will always end up resulting in a regression
> > really doesn't seem sensible to me.
> 
> Nobody said "always" - we're concerned about a fraction of workloads which
> regress, badly affecting only a small fraction of users.

And those workloads would be addressed by increasing work_mem, no?  Why
are we inventing something new here for something that'll only impact a
small fraction of users in a small fraction of cases and where there's
already a perfectly workable way to address the issue?

> Maybe pretend that Jeff implemented something called CashAgg, which does
> everything HashAgg does but implemented from scratch.  Users would be able to
> tune it or disable it, and we could talk about removing HashAgg for the next 3
> years.  But instead we decide to remove HashAgg right now since it's 
> redundant.
> That's a bad way to transition from an old behavior to a new one.  It's scary
> because it imposes a burden, rather than offering a new option without also
> taking away the old one.

We already have enable_hashagg.  Users are free to disable it.  This
makes it also respect work_mem- allowing users to tune that value to
adjust how much memory HashAgg actually uses.
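
To spell that out, a sketch of the existing knobs being referred to (the
table and column are hypothetical and the values are examples only):

    SET work_mem = '256MB';    -- let HashAgg (and Sort, etc.) use more memory
    SET enable_hashagg = off;  -- or steer the planner away from HashAgg entirely

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT some_col, count(*) FROM some_table GROUP BY some_col;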

> > > That's not the only justification. The other justification is that
> > > it's generally reasonable to prefer giving hash aggregate more memory.
> > 
> > Sure, and it's generally reasonably to prefer giving Sorts more memory
> > too... as long as you've got it available.
> 
> I believe he meant:
> "it's generally reasonable to prefer giving hash aggregate more memory THAN 
> OTHER NODES"

If we were developing a wholistic view of memory usage, with an overall
cap on how much memory is allowed to be used for a query, then that
would be an interesting thing to consider and discuss.  That's not what
any of this is.

Thanks,

Stephen




RE: Implement UNLOGGED clause for COPY FROM

2020-07-10 Thread osumi.takami...@fujitsu.com
Hi David Johnston

Thank you for your comment.
> Aside from that, though, how does this improve upon the existing capability
> to copy into an unlogged temporary table?

[>] First of all, an unlogged temporary table can't be carried over across sessions.
And an unlogged table's contents need to be reloaded after a server crash,
because they are truncated at startup.
If you hold massive data in an unlogged table,
you'd be forced to spend a lot of time recovering it. This isn't good.

So I'm thinking that COPY UNLOGGED would provide a more flexible way for
COPY FROM users to keep their data.

I'm considering a feature that gives them a choice: during ordinary operation
you keep WAL logging for the target table, and when you need high-speed loading
you bypass WAL generation for it.

To achieve this, we have to consider a new idea: for example, loaded data
would be appended after all the other pages and detached if the server
crashes during the UNLOGGED loading.

By the way, "ALTER TABLE tbl SET UNLOGGED" is supported by Postgres.
You may think it's OK to switch a LOGGED table to UNLOGGED with this command.
But it actually copies the whole relation once. (This isn't written in the
manual.)
So the command becomes slow if the table it is applied to contains a lot of
data.
Thus switching a table between UNLOGGED and LOGGED is also costly at the
moment, and I think this copy is an obstacle to switching a table's status.

The discussion of the reason is written in the url below.
https://www.postgresql.org/message-id/flat/CAFcNs%2Bpeg3VPG2%3Dv6Lu3vfCDP8mt7cs6-RMMXxjxWNLREgSRVQ%40mail.gmail.com

Best
Takamichi Osumi



Re: pgbench - add pseudo-random permutation function

2020-07-10 Thread Fabien COELHO



Attached v19 is a rebase, per cfbot.


Attached v20 fixes a doc xml typo, per cfbot again.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 9f3bb5fce6..d4a604c6fa 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1033,7 +1033,7 @@ pgbench  options  d
 
   
 default_seed 
-   seed used in hash functions by default
+   seed used in hash and pseudo-random permutation functions by default
   
 
   
@@ -1850,6 +1850,20 @@ SELECT 4 AS four \; SELECT 5 AS five \aset

   
 
+  
+   
+pr_perm ( i, size [, seed ] )
+integer
+   
+   
+pseudo-random permutation in [0,size)
+   
+   
+pr_perm(0, 4)
+an integer between 0 and 3
+   
+  
+
   

 random ( lb, ub )
@@ -1936,6 +1950,20 @@ SELECT 4 AS four \; SELECT 5 AS five \aset
 shape of the distribution.

 
+   
+
+  When designing a benchmark which selects rows non-uniformly, be aware
+  that using pgbench non-uniform random functions
+  directly leads to undue correlations: rows with close ids come out with
+  similar probability, skewing performance measures because they also reside
+  in close pages.
+
+
+  You must use pr_perm to shuffle the selected ids, or
+  some other additional step with similar effect, to avoid these skewing impacts.
+
+   
+

 
  
@@ -2030,24 +2058,54 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
 hash_fnv1a accept an input value and an optional seed parameter.
 In case the seed isn't provided the value of :default_seed
 is used, which is initialized randomly unless set by the command-line
--D option. Hash functions can be used to scatter the
-distribution of random functions such as random_zipfian or
-random_exponential. For instance, the following pgbench
-script simulates possible real world workload typical for social media and
+-D option.
+  
+
+  
+Function pr_perm implements a pseudo-random permutation
+on an arbitrary size.
+It allows shuffling the output of non-uniform random functions so that
+values drawn more often are not trivially correlated.
+It permutes integers in [0, size) using a seed by applying rounds of
+simple invertible functions, similarly to an encryption function,
+although beware that it is not at all cryptographically secure.
+Compared to hash functions discussed above, the function
+ensures that a perfect permutation is applied: there are neither collisions
+nor holes in the output values.
+Values outside the interval are interpreted modulo the size.
+The function raises an error if size is not positive.
+If no seed is provided, :default_seed is used.
+
+For instance, the following pgbench script
+simulates possible real world workload typical for social media and
 blogging platforms where few accounts generate excessive load:
 
 
-\set r random_zipfian(0, 1, 1.07)
-\set k abs(hash(:r)) % 100
+\set size 100
+\set r random_zipfian(1, :size, 1.07)
+\set k 1 + pr_perm(:r, :size)
 
 
 In some cases several distinct distributions are needed which don't correlate
 with each other and this is when implicit seed parameter comes in handy:
 
 
-\set k1 abs(hash(:r, :default_seed + 123)) % 100
-\set k2 abs(hash(:r, :default_seed + 321)) % 100
+\set k1 1 + pr_perm(:r, :size, :default_seed + 123)
+\set k2 1 + pr_perm(:r, :size, :default_seed + 321)
 
+
+A similar behavior can also be approximated with
+hash:
+
+
+\set size 100
+\set r random_zipfian(1, 100 * :size, 1.07)
+\set k 1 + abs(hash(:r)) % :size
+
+
+As this hash-modulo version generates collisions, some
+ids would not be reachable and others would come up more often
+than the target distribution.
   
 
   
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 85d61caa9f..0ea968c94b 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -19,6 +19,7 @@
 #define PGBENCH_NARGS_VARIABLE	(-1)
 #define PGBENCH_NARGS_CASE		(-2)
 #define PGBENCH_NARGS_HASH		(-3)
+#define PGBENCH_NARGS_PRPERM	(-4)
 
 PgBenchExpr *expr_parse_result;
 
@@ -370,6 +371,9 @@ static const struct
 	{
 		"hash_fnv1a", PGBENCH_NARGS_HASH, PGBENCH_HASH_FNV1A
 	},
+	{
+		"pr_perm", PGBENCH_NARGS_PRPERM, PGBENCH_PRPERM
+	},
 	/* keep as last array element */
 	{
 		NULL, 0, 0
@@ -482,6 +486,19 @@ make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList *args)
 			}
 			break;
 
+		/* pseudo-random permutation function with optional seed argument */
+		case PGBENCH_NARGS_PRPERM:
+			if (len < 2 || len > 3)
+expr_yyerror_more(yyscanner, "unexpected number of arguments",
+  PGBENCH_FUNCTIONS[fnumber].fname);
+
+			if (len == 2)
+			{
+PgBenchExpr *var = make_variable("default_seed");
+args = make_elist(var, 

Re: pg_regress cleans up tablespace twice.

2020-07-10 Thread Tom Lane
Thomas Munro  writes:
> On Thu, Jun 18, 2020 at 1:42 PM Michael Paquier  wrote:
>> Thanks, applied this part to HEAD then after more testing.

> Hmm, somehow this (well I guess it's this commit based on timing and
> the area it touches, not sure exactly why) made cfbot's Windows build
> fail, like this:

Should now be possible to undo whatever hack you had to use ...

regards, tom lane




Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-07-10 Thread Tom Lane
Peter Eisentraut  writes:
> We normally don't compile with -O3, so very few users would get the 
> benefit of this.

Yeah.  I don't think changing that baseline globally would be a wise move.

> We have CFLAGS_VECTOR for the checksum code.  I 
> suppose if we are making the numeric code vectorizable as well, we 
> should apply this there also.

> I think we need a bit of a policy decision from the group here.

I'd vote in favor of applying CFLAGS_VECTOR to specific source files
that can benefit.  We already have experience with that and we've not
detected any destabilization potential.

(I've not looked at this patch, so don't take this as a +1 for this
specific patch.)

regards, tom lane




Re: Default gucs for EXPLAIN

2020-07-10 Thread Peter Eisentraut

On 2020-05-23 11:14, Vik Fearing wrote:

Here is a patch to provide default gucs for EXPLAIN options.

I have two goals with this patch.  The first is that I personally
*always* want BUFFERS turned on, so this would allow me to do it without
typing it every time.


There was a lot of opposition to the approach taken by this patch, but
there was a lot of support for turning BUFFERS on by default.  Would you
like to submit a patch for that?


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-07-10 Thread Peter Eisentraut

On 2020-06-10 14:15, Amit Khandekar wrote:

Well, how do we make sure we keep it that way?  How do we prevent some
random rearranging of the code or some random compiler change to break
this again?

I believe the compiler rearranges the code segments w.r.t. one another
when those are independent of each other. I guess the compiler is able
to identify that. With our case, it's the for loop. There are some
variables used inside it, and that would prevent it from moving the
for loop. Even if the compiler finds it safe to move relative to
surrounding code, it would not split the for loop contents themselves,
so the for loop will remain intact, and so would the vectorization,
although it seems to keep an unrolled version of the loop in case it
is called with smaller iteration counts. But yes, if someone in the
future tries to change the for loop, it would possibly break the
auto-vectorization. So, we should have appropriate comments (patch has
those). Let me know if you find any possible reasons due to which the
compiler would in the future stop the vectorization even when there is
no change in the for loop.


We normally don't compile with -O3, so very few users would get the 
benefit of this.  We have CFLAGS_VECTOR for the checksum code.  I 
suppose if we are making the numeric code vectorizable as well, we 
should apply this there also.


I think we need a bit of a policy decision from the group here.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: WIP: BRIN multi-range indexes

2020-07-10 Thread Tomas Vondra

On Fri, Jul 10, 2020 at 06:01:58PM +0900, Masahiko Sawada wrote:

On Fri, 3 Jul 2020 at 09:58, Tomas Vondra  wrote:


On Sun, Apr 05, 2020 at 08:01:50PM +0300, Alexander Korotkov wrote:
>On Sun, Apr 5, 2020 at 8:00 PM Tomas Vondra
> wrote:
...
>> >
>> >Assuming we're not going to get 0001-0003 into v13, I'm not so
>inclined to rush on these three as well.  But if you're willing to commit
>them, you can count on a round of review from me.
>> >
>>
>> I have no intention to get 0001-0003 committed. I think those changes
>> are beneficial on their own, but the primary reason was to support the
>> new opclasses (which require those changes). And those parts are not
>> going to make it into v13 ...
>
>OK, no problem.
>Let's do this for v14.
>

Hi Alexander,

Are you still interested in reviewing those patches? I'll take a look at
0001-0003 to check that your previous feedback was addressed. Do you
have any comments about 0004 / 0005, which I think are the more
interesting parts of this series?


Attached is a rebased version - I realized I forgot to include 0005 in
the last update, for some reason.



I've done a quick test with this patch set. I wonder if we can improve
brin_page_items() SQL function in pageinspect as well. Currently,
brin_page_items() is hard-coded to support only normal brin indexes.
When we pass brin-bloom or brin-multi-range to that function the
binary values are shown in 'value' column but it seems not helpful for
users. For instance, here is an output of brin_page_items() with a
brin-multi-range index:

postgres(1:12801)=# select * from brin_page_items(get_raw_page('mul',
2), 'mul');
-[ RECORD 1 ]------------------------------------------------------------
itemoffset  | 1
blknum      | 0
attnum      | 1
allnulls    | f
hasnulls    | f
placeholder | f
value       | {\x01001b002100e570e670e770e870e970ea70eb70ec70ed70ee70ef70f070f170f270f370f470f570f670f770f870f970fa70fb70fc70fd70fe70ff700071}



Hmm. I'm not sure we can do much better, without making the function
much more complicated. I mean, even with regular BRIN indexes we don't
really know if the value is plain min/max, right?



Also, I got an assertion failure when setting false_positive_rate reloption:

postgres(1:12448)=# create index blm on t using brin (c int4_bloom_ops
(false_positive_rate = 1));
TRAP: FailedAssertion("(false_positive_rate > 0) &&
(false_positive_rate < 1.0)", File: "brin_bloom.c", Line: 300)

I'll look at the code in depth and let you know if I find a problem.



Yeah, the assert should say (f_p_r <= 1.0).

But I'm not convinced we should allow values up to 1.0, really. The
f_p_r is the fraction of the table that will get matched always, so 1.0
would mean we get to scan the whole table. Seems kinda pointless. So
maybe we should cap it to something like 0.1 or so, but I agree the
value seems kinda arbitrary.
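
For anyone reproducing this, a sketch based on the example above (it assumes
the WIP int4_bloom_ops opclass from this patch set):

    CREATE TABLE t (c int);

    -- accepted: strictly between 0 and 1
    CREATE INDEX blm ON t USING brin (c int4_bloom_ops (false_positive_rate = 0.01));

    -- with the assertion corrected to allow values up to 1.0, this would be
    -- accepted too, even though it means matching (nearly) the whole table:
    -- CREATE INDEX blm2 ON t USING brin (c int4_bloom_ops (false_positive_rate = 1));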


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: TAP tests and symlinks on Windows

2020-07-10 Thread Andrew Dunstan

On 7/8/20 9:54 AM, Andrew Dunstan wrote:
>
>
>
> Then, with a little more sprinkling of perl2host the pg_basebackup tests
> can be made to work on msys2.
>
>
> I'm going to prepare patches along these lines.
>
>



After much frustration and gnashing of teeth here's a patch that allows
almost all the TAP tests involving symlinks to work as expected on all
Windows build environments, without requiring an additional Perl module.
I have tested this on a system that is very similar to that running
drongo and fairywren, with both msys2 and  MSVC builds.


I didn't change the name of perl2host - Sufficient unto the day is the
evil thereof. But I did modify it  a) to allow use of cygpath if
available and b) to allow it to succeed if the grandparent directory
exists when cygpath isn't available.


cheers


andrew



-- 
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index d98187c970..e6e2b2762d 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -775,6 +775,7 @@ make check PROVE_TESTS='t/001_test1.pl t/003_test3.pl'

 The TAP tests require the Perl module IPC::Run.
 This module is available from CPAN or an operating system package.
+On Windows, Win32API::File is also required.

 

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 208df557b8..38b2a35c3b 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -211,154 +211,168 @@ $node->command_fails(
 	'pg_basebackup tar with long name fails');
 unlink "$pgdata/$superlongname";
 
-# The following tests test symlinks. Windows doesn't have symlinks, so
-# skip on Windows.
+# The following tests are for symlinks.
+
+# Move pg_replslot out of $pgdata and create a symlink to it.
+$node->stop;
+
+# Set umask so test directories and files are created with group permissions
+umask(0027);
+
+# Enable group permissions on PGDATA
+chmod_recursive("$pgdata", 0750, 0640);
+
+rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+  or BAIL_OUT "could not move $pgdata/pg_replslot";
+dir_symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+  or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+$node->start;
+
+# Create a temporary directory in the system location and symlink it
+# to our physical temp location.  That way we can use shorter names
+# for the tablespace directories, which hopefully won't run afoul of
+# the 99 character length limit.
+my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+dir_symlink "$tempdir", $shorter_tempdir;
+
+mkdir "$tempdir/tblspc1";
+my $realTsDir   = TestLib::perl2host("$shorter_tempdir/tblspc1");
+my $real_tempdir = TestLib::perl2host($tempdir);
+$node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc1 LOCATION '$realTsDir';");
+$node->safe_psql('postgres',
+ "CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+$node->command_ok([ 'pg_basebackup', '-D', "$real_tempdir/tarbackup2", '-Ft' ],
+  'tar format with tablespaces');
+ok(-f "$tempdir/tarbackup2/base.tar", 'backup tar was created');
+my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
+is(scalar(@tblspc_tars), 1, 'one tablespace tar was created');
+rmtree("$tempdir/tarbackup2");
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;'
+);
+
+my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+		   q{select pg_relation_filepath('tblspc1_unlogged')});
+
+# Make sure main and init forks exist
+ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+	'unlogged init fork in tablespace');
+ok(-f "$pgdata/$tblspc1UnloggedPath", 'unlogged main fork in tablespace');
+
+# Create files that look like temporary relations to ensure they are ignored
+# in a tablespace.
+my @tempRelationFiles = qw(t888_888 t88_88_vm.1);
+my $tblSpc1Id = basename(
+	dirname(
+		dirname(
+			$node->safe_psql(
+'postgres', q{select pg_relation_filepath('test1')};
+
+foreach my $filename (@tempRelationFiles)
+{
+	append_to_file(
+		"$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+		'TEMP_RELATION');
+}
+
+$node->command_fails(
+	[ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp' ],
+	'plain format with tablespaces fails without tablespace mapping');
+
+$node->command_ok(
+	[
+		'pg_basebackup', '-D', "$tempdir/backup1", '-Fp',
+		"-T$realTsDir=$real_tempdir/tbackup/tblspc1"
+	   ],
+	'plain format with tablespaces succeeds with tablespace mapping');
+ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
+
+# This symlink check is not supported on Windows as -l
+# doesn't work with junctions
 SKIP:
 {
-	skip "symlinks not supported on Windows", 18 if ($windows_os);
-
-	# Move pg_replslot out of 

Re: posgres 12 bug (partitioned table)

2020-07-10 Thread Amit Langote
Reading my own words, I think I must fix an ambiguity:

On Fri, Jul 10, 2020 at 3:23 PM Amit Langote  wrote:
> So even if an AM's table_tuple_insert() itself doesn't populate the
> transaction info into the slot handed to it, maybe as an optimization,
> it does not sound entirely unreasonable to expect that the AM's
> slot_getsysattr() callback returns it correctly when projecting a
> target list containing system columns.

The "maybe as an optimization" refers to the part of the sentence that
comes before it.  That is, I mean table_tuple_insert() may choose to
not populate the transaction info in the slot as an optimization.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM

2020-07-10 Thread Peter Eisentraut

On 2020-07-08 13:45, Rémi Lapeyre wrote:

Hi, here's a new version of the patch that should apply cleanly. I'll monitor
the status on http://cfbot.cputube.org/


It's hard to find an explanation of what this patch actually does.  I don't
want to have to go through threads dating back 4 months to determine 
what was discussed and what was actually implemented.  Since you're 
already using git format-patch, just add something to the commit message.


It appears that these are really two separate features, so perhaps they 
should be two patches.


Also, the new header matching mode could probably use more than one line 
of documentation.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-07-10 Thread Dilip Kumar
On Sat, Jul 4, 2020 at 11:35 AM Amit Kapila  wrote:
>
> On Tue, Jun 30, 2020 at 5:20 PM Amit Kapila  wrote:
> >
> > Let me know what you think about the above changes.
> >
>
> I went ahead and made few changes in
> 0005-Implement-streaming-mode-in-ReorderBuffer which are explained
> below.  I have few questions and suggestions for the patch as well
> which are also covered in below points.
>
> 1.
> + if (prev_lsn == InvalidXLogRecPtr)
> + {
> + if (streaming)
> + rb->stream_start(rb, txn, change->lsn);
> + else
> + rb->begin(rb, txn);
> + stream_started = true;
> + }
>
> I don't think we want to move begin callback here that will change the
> existing semantics, so it is better to move begin at its original
> position. I have made the required changes in the attached patch.
>
> 2.
> ReorderBufferTruncateTXN()
> {
> ..
> + dlist_foreach_modify(iter, &txn->changes)
> + {
> + ReorderBufferChange *change;
> +
> + change = dlist_container(ReorderBufferChange, node, iter.cur);
> +
> + /* remove the change from it's containing list */
> + dlist_delete(&change->node);
> +
> + ReorderBufferReturnChange(rb, change);
> + }
> ..
> }
>
> I think here we can add an Assert that we're not mixing changes from
> different transactions.  See the changes in the patch.
>
> 3.
> SetupCheckXidLive()
> {
> ..
> + /*
> + * setup CheckXidAlive if it's not committed yet. We don't check if the xid
> + * aborted. That will happen during catalog access.  Also, reset the
> + * bsysscan flag.
> + */
> + if (!TransactionIdDidCommit(xid))
> + {
> + CheckXidAlive = xid;
> + bsysscan = false;
> ..
> }
>
> What is the need to reset bsysscan flag here if we are already
> resetting on error (like in the previous patch sent by me)?
>
> 4.
> ReorderBufferProcessTXN()
> {
> ..
> ..
> + /* Reset the CheckXidAlive */
> + if (streaming)
> + CheckXidAlive = InvalidTransactionId;
> ..
> }
>
> Similar to the previous point, we don't need this as well because
> AbortCurrentTransaction would have taken care of this.
>
> 5.
> + * XXX Do we need to check if the transaction has some changes to stream
> + * (maybe it got streamed right before the commit, which attempts to
> + * stream it again before the commit)?
> + */
> +static void
> +ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
>
> The above comment doesn't make much sense to me, so I have removed it.
> Basically, if there are no changes before commit, we still need to
> send commit and anyway if there are no more changes
> ReorderBufferProcessTXN will not do anything.
>
> 6.
> +ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
> {
> ..
> if (txn->snapshot_now == NULL)
> + {
> + dlist_iter subxact_i;
> +
> + /* make sure this transaction is streamed for the first time */
> + Assert(!rbtxn_is_streamed(txn));
> +
> + /* at the beginning we should have invalid command ID */
> + Assert(txn->command_id == InvalidCommandId);
> +
> + dlist_foreach(subxact_i, &txn->subtxns)
> + {
> + ReorderBufferTXN *subtxn;
> +
> + subtxn = dlist_container(ReorderBufferTXN, node, subxact_i.cur);
> + ReorderBufferTransferSnapToParent(txn, subtxn);
> + }
> ..
> }
>
> Here, it is possible that there is no base_snapshot for txn, so we
> need a check for that similar to ReorderBufferCommit.
>
> 7.  Apart from the above, I made few changes in comments and ran pgindent.
>
> 8. We can't stream the transaction before we reach the
> SNAPBUILD_CONSISTENT state because some other output plugin can apply
> those changes unlike what we do with pgoutput plugin (which writes to
> file). And, I think applying the transactions without reaching a
> consistent state would be anyway wrong.  So, we should avoid that and
> if do that then we should have an Assert for streamed txns rather than
> sending abort for them in ReorderBufferForget.

I was analyzing this point.  Currently, we only enable streaming in
StartReplicationSlot, so in CreateReplicationSlot streaming will always be
off, because by that time the plugin has not yet started up; that happens
only in StartReplicationSlot.  See the snippet from patch 0007 below.
However, I agree that during StartReplicationSlot we might decode some
extra WAL of a transaction for which we already got the commit
confirmation, and we must have a way to avoid that.  But I think we don't
need to do anything for the CONSISTENT snapshot point.  What's your thought
on this?

@@ -1016,6 +1016,12 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
  WalSndPrepareWrite, WalSndWriteData,
  WalSndUpdateProgress);

+ /*
+ * Make sure streaming is disabled here - we may have the methods,
+ * but we don't have anywhere to send the data yet.
+ */
+ ctx->streaming = false;
+

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Resetting spilled txn statistics in pg_stat_replication

2020-07-10 Thread Amit Kapila
On Fri, Jul 10, 2020 at 7:23 AM Masahiko Sawada
 wrote:
>
> On Thu, 9 Jul 2020 at 12:11, Amit Kapila  wrote:
> >
> > On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada
> >  wrote:
> > >
> > >
> > > I think that using oids has another benefit that we don't need to send
> > > slot name to the stats collector along with the stats. Since the
> > > maximum size of slot name is NAMEDATALEN and we don't support the
> > > pgstat message larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the
> > > user wants to increase NAMEDATALEN they might not be able to build.
> > >
> >
> > I think NAMEDATALEN is used for many other objects as well and I don't
> > think we want to change it in foreseeable future, so that doesn't
> > sound to be a good reason to invent OIDs for slots.  OTOH, I do
> > understand it would be better to send OIDs than names for slots but I
> > am just not sure if it is a good idea to invent a new way to generate
> > OIDs (which is different from how we do it for other objects in the
> > system) for this purpose.
>
> I'm concerned that there might be users who are using custom
> PostgreSQL that increased NAMEDATALEN for some reason. But indeed, I
> also agree with your concerns. So perhaps we can go with the current
> PoC patch approach as the first version (i.g., sending slot drop
> message to stats collector). When we need such a unique identifier
> also for other purposes, we will be able to change this feature so
> that it uses that identifier for this statistics reporting purpose.
>

Okay, feel free to submit the version atop my revert patch.  I think you
might want to remove the indexing stuff you have added for faster
search as discussed above.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Resetting spilled txn statistics in pg_stat_replication

2020-07-10 Thread Amit Kapila
On Fri, Jul 10, 2020 at 7:19 AM Masahiko Sawada
 wrote:
>
> On Thu, 9 Jul 2020 at 16:09, Amit Kapila  wrote:
> >
> >
> > Fair enough.  The attached patch reverts the commits related to these
> > stats.  Sawada-San, can you please once see if I have missed anything
> > apart from catversion bump which I will do before commit?
>
> Thank you for the patch!
>
> Do we remove the corresponding line in the release note by another
> commit?
>

Yes, I will do that as well.

> For the rest, it looks good to me.
>

Thanks, will wait for a day or so and then push it early next week.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [Proposal] Global temporary tables

2020-07-10 Thread wenjing zeng
Hi all,

I started using my personal email to respond to community issues.



> On 7 July 2020, at 6:05 PM, Pavel Stehule wrote:
> 
> Hi
>  
> GTT Merge the latest PGMaster and resolves conflicts.
> 
> 
> 
> I tested it and it looks fine. I think it is very usable in current form, but 
> still there are some issues:
> 
> postgres=# create global temp table foo(a int);
> CREATE TABLE
> postgres=# insert into foo values(10);
> INSERT 0 1
> postgres=# alter table foo add column x int;
> ALTER TABLE
> postgres=# analyze foo;
> WARNING:  reloid 16400 not support update attstat after add colunm
> WARNING:  reloid 16400 not support update attstat after add colunm
> ANALYZE
This is a limitation that we can completely eliminate.

> 
> Please, can you summarize what is done, what limits there are, what is hard
> to implement, and what can be implemented easily?
Sure.

The current version of the GTT implementation supports all regular table 
operations.
1 What is done:
1.1 INSERT/UPDATE/DELETE on GTT.
1.2 GTT supports all types of indexes, and queries can use GTT indexes to
speed up reads of GTT data.
1.3 GTT keeps a local copy of its statistics, which is provided to the
optimizer to choose the best query plan.
1.4 ANALYZE and VACUUM on GTT.
1.5 TRUNCATE and CLUSTER on GTT.
1.6 All DDL on GTT.
1.7 A GTT can use a GTT sequence or a regular sequence.
1.8 Support for creating views on GTT.
1.9 Support for creating views on foreign key.
1.10 Support for global temp partitions.

I feel like this covers all the necessary GTT requirements.

For CLUSTER on GTT, I think it's complicated.
I'm not sure the current implementation is quite reasonable. Maybe you can help
review it.


> 
> 
> 
> I found one open question - how table locks can be implemented - because data
> is physically separated, so we don't need table locks as protection against
> race conditions.
Yes, but DML and DDL on GTT still require table locks.
1 DML requires a table lock (RowExclusiveLock) to ensure that the
definition does not change at run time (DDL may modify or delete it).
This part of the implementation does not actually change the code,
because DML on GTT does not block between sessions.

2 TRUNCATE/ANALYZE/VACUUM/REINDEX/CLUSTER on GTT are now like DML:
they only modify local data and do not modify the GTT definition.
So I lowered the table lock level held on the GTT; only RowExclusiveLock
is needed.

3 DDL that modifies the GTT definition (DROP GTT, ALTER GTT)
requires an exclusive table lock (AccessExclusiveLock),
as is the case for a regular table.
This part of the implementation also does not actually change the code.

Summary: what I have done is adjust the GTT lock levels for the different types
of statements based on the above thinking.
For example, for TRUNCATE on a GTT I reduced the table lock level held to
RowExclusiveLock,
so we can truncate data in the same GTT from different sessions at the same
time.
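
A sketch of what that lowered lock level is meant to allow (syntax as in
Pavel's example upthread; the table name is hypothetical):

    -- Session 1
    CREATE GLOBAL TEMP TABLE gtt_demo (a int);
    BEGIN;
    TRUNCATE gtt_demo;   -- takes only RowExclusiveLock on the GTT

    -- Session 2, concurrently
    TRUNCATE gtt_demo;   -- not blocked: it only removes this session's data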

What do you think about table locks on GTT?


Wenjing


> 
> Now, table locks are implemented on a global level. So an exclusive lock on a
> GTT in one session blocks insertion in a second session. Is that the expected
> behaviour? It is safe, but maybe it is too strict.
> 
> We should define what table lock is meaning on GTT.
> 
> Regards
> 
> Pavel
>  
> Pavel
> 
> 
>> With Regards,
>> Prabhat Kumar Sahu
>> EnterpriseDB: http://www.enterprisedb.com 
> 
> 



Re: WIP: BRIN multi-range indexes

2020-07-10 Thread Masahiko Sawada
On Fri, 3 Jul 2020 at 09:58, Tomas Vondra  wrote:
>
> On Sun, Apr 05, 2020 at 08:01:50PM +0300, Alexander Korotkov wrote:
> >On Sun, Apr 5, 2020 at 8:00 PM Tomas Vondra
> > wrote:
> ...
> >> >
> >> >Assuming we're not going to get 0001-0003 into v13, I'm not so
> >> >inclined to rush on these three as well.  But if you're willing to commit
> >> >them, you can count on a round of review from me.
> >> >
> >>
> >> I have no intention to get 0001-0003 committed. I think those changes
> >> are beneficial on their own, but the primary reason was to support the
> >> new opclasses (which require those changes). And those parts are not
> >> going to make it into v13 ...
> >
> >OK, no problem.
> >Let's do this for v14.
> >
>
> Hi Alexander,
>
> Are you still interested in reviewing those patches? I'll take a look at
> 0001-0003 to check that your previous feedback was addressed. Do you
> have any comments about 0004 / 0005, which I think are the more
> interesting parts of this series?
>
>
> Attached is a rebased version - I realized I forgot to include 0005 in
> the last update, for some reason.
>

I've done a quick test with this patch set. I wonder if we can improve
brin_page_items() SQL function in pageinspect as well. Currently,
brin_page_items() is hard-coded to support only normal brin indexes.
When we pass brin-bloom or brin-multi-range to that function the
binary values are shown in 'value' column but it seems not helpful for
users. For instance, here is an output of brin_page_items() with a
brin-multi-range index:

postgres(1:12801)=# select * from brin_page_items(get_raw_page('mul',
2), 'mul');
-[ RECORD 1 ]------------------------------------------------------------
itemoffset  | 1
blknum      | 0
attnum      | 1
allnulls    | f
hasnulls    | f
placeholder | f
value       | {\x01001b002100e570e670e770e870e970ea70eb70ec70ed70ee70ef70f070f170f270f370f470f570f670f770f870f970fa70fb70fc70fd70fe70ff700071}

Also, I got an assertion failure when setting false_positive_rate reloption:

postgres(1:12448)=# create index blm on t using brin (c int4_bloom_ops
(false_positive_rate = 1));
TRAP: FailedAssertion("(false_positive_rate > 0) &&
(false_positive_rate < 1.0)", File: "brin_bloom.c", Line: 300)

I'll look at the code in depth and let you know if I find a problem.

Regards,

--
Masahiko Sawada   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




GSSENC'ed connection stalls while reconnection attempts.

2020-07-10 Thread Kyotaro Horiguchi
Hello.

If psql is connected using GSSAPI auth and the server restarts, the
reconnection sequence stalls and never returns.

I found that psql (libpq) sends the startup packet via GSS
encryption. conn->gssenc should be reset when the encryption state is
freed.

The reason psql doesn't notice the error is that pqPacketSend returns
STATUS_OK when a write error occurred.  That behavior contradicts the
comment on the function. The function is used only while making a
connection, so it's OK to error out immediately on write failure.  I
think other uses of pqFlush while making a connection should report
write failure the same way.

Finally, it would be user-friendly if psql showed the error message when a
reset attempt fails. (This is perhaps arguable.)

The attached patch does the above. Any thoughts and/or opinions are welcome.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 8cc5663f31744b384cf013f86062ea28b431baa6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 10 Jul 2020 17:30:11 +0900
Subject: [PATCH] Fix reconnection failure of GSSENC connections

A connection using GSS encryption fails to reconnect and freezes. Fix
that by resetting GSS encryption state on dropping a connection. On
the way fixing that, fix the behavior for write failure while
connection attempts.  Also let psql report the cause of reset attempt
failure.
---
 src/bin/psql/common.c |  2 +-
 src/interfaces/libpq/fe-auth.c|  7 ++-
 src/interfaces/libpq/fe-connect.c | 13 ++---
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 6323a35c91..d5ab1d8ada 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -310,7 +310,7 @@ CheckConnection(void)
 		OK = ConnectionUp();
 		if (!OK)
 		{
-			fprintf(stderr, _("Failed.\n"));
+			fprintf(stderr, _("Failed: %s\n"), PQerrorMessage(pset.db));
 
 			/*
 			 * Transition to having no connection.  Keep this bit in sync with
diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index 9f5403d74c..3664ee10a0 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -588,7 +588,12 @@ pg_SASL_init(PGconn *conn, int payloadlen)
 	}
 	if (pqPutMsgEnd(conn))
 		goto error;
-	if (pqFlush(conn))
+
+	/*
+	 * We must not fail write here.  Write failure needs to be checked
+	 * explicitly. See pqSendSome.
+	 */
+	if (pqFlush(conn) || conn->write_failed)
 		goto error;
 
 	termPQExpBuffer(_buf);
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27c9bb46ee..f897003f42 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -496,6 +496,8 @@ pqDropConnection(PGconn *conn, bool flushInput)
 			free(conn->gss_ResultBuffer);
 			conn->gss_ResultBuffer = NULL;
 		}
+
+		conn->gssenc = false;
 	}
 #endif
 #ifdef ENABLE_SSPI
@@ -3455,8 +3457,10 @@ keep_going:		/* We will come back to here until there is
  * Just make sure that any data sent by pg_fe_sendauth is
  * flushed out.  Although this theoretically could block, it
  * really shouldn't since we don't send large auth responses.
+ * Write failure needs to be checked explicitly. See
+ * pqSendSome.
  */
-if (pqFlush(conn))
+if (pqFlush(conn) || conn->write_failed)
 	goto error_return;
 
 if (areq == AUTH_REQ_OK)
@@ -4546,8 +4550,11 @@ pqPacketSend(PGconn *conn, char pack_type,
 	if (pqPutMsgEnd(conn))
 		return STATUS_ERROR;
 
-	/* Flush to ensure backend gets it. */
-	if (pqFlush(conn))
+	/*
+	 * Flush to ensure backend gets it. Write failure needs to be checked
+	 * explicitly. See pqSendSome.
+	 */
+	if (pqFlush(conn) || conn->write_failed)
 		return STATUS_ERROR;
 
 	return STATUS_OK;
-- 
2.18.4



Re: Creating a function for exposing memory usage of backend process

2020-07-10 Thread torikoshia

On 2020-07-09 02:03, Andres Freund wrote:

Hi,

I think this is an incredibly useful feature.


Thanks for your kind comments and suggestion!



On 2020-07-07 22:02:10 +0900, torikoshia wrote:

> There can be multiple memory contexts with the same name. So I'm afraid
> that it's difficult to identify the actual parent memory context from
> this
> "parent" column. This is ok when logging memory contexts by calling
> MemoryContextStats() via gdb. Because child memory contexts are printed
> just under their parent, with indents. But this doesn't work in the
> view.
> To identify the actual parent memory or calculate the memory contexts
> tree
> from the view, we might need to assign unique ID to each memory context
> and display it. But IMO this is overkill. So I'm fine with current
> "parent"
> column. Thought? Do you have any better idea?

Indeed.
I also feel it's not usual to assign a unique ID, which
can vary every time the view is displayed.


Hm. I wonder if we just could include the address of the context
itself. There might be reasons not to do so (e.g. security concerns
about leaked pointers making attacks easier), but I think it's worth
considering.



I tried exposing the address of each context and of its parent.
Attached a poc patch.

  =# SELECT name, address, parent_address, total_bytes FROM 
pg_backend_memory_contexts ;


   name   |  address  | parent_address | total_bytes
  --+---++-
   TopMemoryContext | 0x1280da0 ||   80800
   TopTransactionContext| 0x1309040 | 0x1280da0  |8192
   Prepared Queries | 0x138a480 | 0x1280da0  |   16384
   Type information cache   | 0x134b8c0 | 0x1280da0  |   24624
   ...
   CacheMemoryContext   | 0x12cb390 | 0x1280da0  | 1048576
   CachedPlanSource | 0x13c47f0 | 0x12cb390  |4096
   CachedPlanQuery  | 0x13c9ae0 | 0x13c47f0  |4096
   CachedPlanSource | 0x13c7310 | 0x12cb390  |4096
   CachedPlanQuery  | 0x13c1230 | 0x13c7310  |4096
   ...


Now it's possible to identify the actual parent memory context even when
there are multiple memory contexts with the same name.
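
As a quick illustration, assuming the PoC's column names shown above and
that the top-level context has a NULL parent_address, the context tree can
be reconstructed with a recursive CTE (ordering is left arbitrary here):

    WITH RECURSIVE ctx AS (
        SELECT name, address, parent_address, total_bytes, 0 AS level
        FROM pg_backend_memory_contexts
        WHERE parent_address IS NULL
      UNION ALL
        SELECT c.name, c.address, c.parent_address, c.total_bytes, ctx.level + 1
        FROM pg_backend_memory_contexts c
        JOIN ctx ON c.parent_address = ctx.address
    )
    SELECT repeat('  ', level) || name AS context, total_bytes
    FROM ctx;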

I'm not sure, but I'm also worried that this might incur some
security-related problems.

I'd like to hear more opinions about:

- whether information for identifying the parent-child relation is necessary
or overkill
- if it is necessary, whether the memory address is suitable or whether other
means like assigning unique numbers are required




+/*
+ * PutMemoryContextsStatsTupleStore
+ * One recursion level for pg_get_backend_memory_contexts.
+ */
+static void
+PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
+   TupleDesc 
tupdesc, MemoryContext context,
+   MemoryContext 
parent, int level)
+{
+#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS9
+   Datum   values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+   boolnulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+   MemoryContextCounters stat;
+   MemoryContext child;
+   const char *name = context->name;
+   const char *ident = context->ident;
+
+   if (context == NULL)
+   return;
+
+   /*
+* To be consistent with logging output, we label dynahash contexts
+* with just the hash table name as with MemoryContextStatsPrint().
+*/
+   if (ident && strcmp(name, "dynahash") == 0)
+   {
+   name = ident;
+   ident = NULL;
+   }
+
+   /* Examine the context itself */
+   memset(&stat, 0, sizeof(stat));
+   (*context->methods->stats) (context, NULL, (void *) &level, &stat);
+
+   memset(values, 0, sizeof(values));
+   memset(nulls, 0, sizeof(nulls));
+
+   values[0] = CStringGetTextDatum(name);
+
+   if (ident)
+   {
+   int idlen = strlen(ident);
+   char
clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+   /*
+* Some identifiers such as SQL query string can be very long,
+* truncate oversize identifiers.
+*/
+   if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+			idlen = pg_mbcliplen(ident, idlen, 
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);

+


Why?


As described in [1], overly long messages caused problems in the past, and
MemoryContextStatsPrint() now truncates ident, so I decided to truncate it
here as well.

Do you think it's not necessary here?

[1] https://www.postgresql.org/message-id/12319.1521999...@sss.pgh.pa.us


Regards,


--
Atsushi Torikoshi
NTT DATA CORPORATION

From 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 10 Jul 2020 17:01:01 +0900
Subject: [PATCH] Add a function 

Re: Creating a function for exposing memory usage of backend process

2020-07-10 Thread torikoshia

On 2020-07-08 22:12, Fujii Masao wrote:

Thanks for updating the patch! It basically looks good to me.

+  
+   backend memory contexts
+  

Do we need this indexterm?


Thanks! it's not necessary. I remove this indexterm.




+{ oid => '2282', descr => 'statistics: information about all memory
contexts of local backend',

Isn't it better to remove "statistics: " from the above description?


Yeah, it's my oversight.



+ 
+  role="column_definition">

+   parent text

There can be multiple memory contexts with the same name. So I'm 
afraid
that it's difficult to identify the actual parent memory context from 
this

"parent" column. This is ok when logging memory contexts by calling
MemoryContextStats() via gdb. Because child memory contexts are 
printed
just under their parent, with indents. But this doesn't work in the 
view.
To identify the actual parent memory or calculate the memory contexts 
tree
from the view, we might need to assign unique ID to each memory 
context
and display it. But IMO this is overkill. So I'm fine with current 
"parent"

column. Thought? Do you have any better idea?


Indeed.
I also feel it's not usual to assign a unique ID, which
can vary every time the view is displayed.


Agreed. Displaying such ID would be more confusing to users.
Ok, let's leave the code as it is.

Another comment about the parent column: can a dynahash context be a parent?
If yes, should its ident instead of its name be displayed in the parent
column?


I'm not sure yet, but considering the changes in the future, it seems
better to do so.

But if we add information for identifying the parent-child relation, like the
memory address suggested by Andres, it seems unnecessary.

So I'd like to go back to this point.



Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION





Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Jeff Davis
On Thu, 2020-07-09 at 19:18 -0500, Justin Pryzby wrote:
> Maybe pretend that Jeff implemented something called CashAgg, which
> does
> everything HashAgg does but implemented from scratch.  Users would be
> able to
> tune it or disable it, and we could talk about removing HashAgg for
> the next 3
> years.

That's kind of what we'd have if we had the two escape-hatch GUCs.
Default gives new behavior, changing the GUCs would give the v12
behavior.

In principle, Stephen is right: the v12 behavior is a bug, lots of
people are unhappy about it, it causes real problems, and it would not
be acceptable if proposed today. Otherwise I wouldn't have spent the
time to fix it. 

Similarly, potential regressions are not the "fault" of my feature --
they are the fault of the limitations of work_mem, the limitations of
the planner, the wrong expectations from customers, or just
happenstance. 

But at a certain point, I have to weigh the potential anger of
customers hitting regressions versus the potential anger of hackers
seeing a couple extra GUCs. I have to say that I am more worried about
the former.

If there is some more serious consequence of adding a GUC that I missed
in this thread, please let me know. Otherwise, I intend to commit a new
GUC shortly that will enable users to bypass work_mem for HashAgg, just
as in v12.

Regards,
Jeff Davis






Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

2020-07-10 Thread Kasahara Tatsuhito
Hi,

On Wed, Jul 8, 2020 at 9:40 PM Bharath Rupireddy
 wrote:
> One way, we could solve the above problem is that, upon firing the new
> foreign query from local backend using the cached connection,
> (assuming the remote backend that was cached in the local backed got
> killed for some reasons), instead of failing the query in the local
> backend, upon detecting error from the remote backend, we could just
> delete the cached old entry and try getting another connection to
> remote backend, cache it and proceed to submit the query. This has to
> happen only at the beginning of remote xact.
+1.

I think this is a very useful feature.
In an environment with connection pooling on the local side, if a remote
server has a failover or switchover,
this feature would prevent unexpected errors in local queries after the
remote server recovers.

I haven't looked at the code in detail yet, some comments here.

1. To keep the previous behavior (and performance), how about allowing
the user to specify
   whether or not to retry, as a GUC parameter or as a FOREIGN SERVER option
   (see the hypothetical sketch below)?

2. How about emitting a LOG message when a retry succeeds, to let us know
   that the retry feature worked or how often retries happened?
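
Purely as a hypothetical sketch of (1) - neither the server option nor the
GUC below exists in postgres_fdw today, and the names are made up:

    ALTER SERVER foreign_server OPTIONS (ADD connection_retry 'on');
    -- or, as a GUC:
    SET postgres_fdw.connection_retry = on;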

> I couldn't think of adding a test case to the existing postgres_fdw
> regression test suite with an automated scenario of the remote backend
> getting killed.

Couldn't you confirm this by adding a test case like the following?
===
BEGIN;
-- Generate a connection to remote
SELECT * FROM ft1 LIMIT 1;

-- Retrieve the pid of the remote backend serving the postgres_fdw
-- connection and kill it.  A more unique application_name
-- (e.g. fdw_retry_check rather than postgres_fdw) could be used here.
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE
backend_type = 'client backend' AND application_name = 'postgres_fdw';

-- COMMIT, so the next query should succeed if connection retry works
COMMIT;
SELECT * FROM ft1 LIMIT 1;
===

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Postgres is not able to handle more than 4k tables!?

2020-07-10 Thread Laurenz Albe
On Thu, 2020-07-09 at 12:47 -0400, Stephen Frost wrote:
> I realize this is likely to go over like a lead balloon, but the churn
> in pg_class from updating reltuples/relpages has never seemed all that
> great to me when just about everything else is so rarely changed, and
> only through some user DDL action - and I agree that it seems like those
> particular columns are more 'statistics' type of info and less info
> about the definition of the relation.  Other columns that do get changed
> regularly are relfrozenxid and relminmxid.  I wonder if it's possible to
> move all of those elsewhere- perhaps some to the statistics tables as
> you seem to be alluding to, and the others to $somewhereelse that is
> dedicated to tracking that information which VACUUM is primarily
> concerned with.

Perhaps we could create pg_class with a fillfactor less than 100
so we get HOT updates there.
That would be less invasive.

Yours,
Laurenz Albe





Re: Postgres is not able to handle more than 4k tables!?

2020-07-10 Thread Konstantin Knizhnik



On 09.07.2020 22:16, Grigory Smolkin wrote:


On 7/8/20 11:41 PM, Konstantin Knizhnik wrote:


So it looks like the NUM_LOCK_PARTITIONS and MAXNUMMESSAGES constants have
to be replaced with GUCs.
To avoid division, we can specify the log2 of these values, so a shift
(or mask) can be used instead.
And MAX_SIMUL_LWLOCKS should be defined as NUM_LOCK_PARTITIONS + 
NUM_INDIVIDUAL_LWLOCKS + NAMED_LWLOCK_RESERVE.


Because I was involved in this particular case and don't want it to
become a habit, I'm volunteering to test whatever patch this
discussion produces.



You are welcome:)
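
For illustration, a minimal sketch of the log2-style GUC idea quoted
above; the variable name, its default, and the macro redefinitions are
hypothetical, not a proposed patch:

/* GUC storing log2 of the partition count (2^4 = 16, today's constant). */
int	log2_num_lock_partitions = 4;

#define NUM_LOCK_PARTITIONS	(1 << log2_num_lock_partitions)

/*
 * With the count guaranteed to be a power of two, the partition lookup
 * can use a mask instead of a "%" division on a runtime value.
 */
#define LockHashPartition(hashcode) \
	((hashcode) & (NUM_LOCK_PARTITIONS - 1))

The diff below is the separate experimental patch for the autovacuum
side of the problem.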

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 68effcaed6..27f6328dca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -607,7 +607,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 	pgstat_report_vacuum(RelationGetRelid(onerel),
 		 onerel->rd_rel->relisshared,
 		 new_live_tuples,
-		 vacrelstats->new_dead_tuples);
+		 vacrelstats->new_dead_tuples,
+		 OldestXmin);
 	pgstat_progress_end_command();
 
 	/* and log the action if appropriate */
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 9c7d4b0c60..0eed10fd65 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -92,6 +92,7 @@
 #include "storage/lmgr.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
+#include "storage/procarray.h"
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/smgr.h"
@@ -2992,6 +2993,9 @@ relation_needs_vacanalyze(Oid relid,
 	TransactionId xidForceLimit;
 	MultiXactId multiForceLimit;
 
+	TransactionId oldestXmin;
+	Relation	rel;
+
 	AssertArg(classForm != NULL);
 	AssertArg(OidIsValid(relid));
 
@@ -3076,6 +3080,10 @@ relation_needs_vacanalyze(Oid relid,
 		instuples = tabentry->inserts_since_vacuum;
 		anltuples = tabentry->changes_since_analyze;
 
+		rel = RelationIdGetRelation(relid);
+		oldestXmin = TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM), rel);
+		RelationClose(rel);
+
 		vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
 		vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples;
 		anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
@@ -3095,9 +3103,20 @@ relation_needs_vacanalyze(Oid relid,
  vactuples, vacthresh, anltuples, anlthresh);
 
 		/* Determine if this table needs vacuum or analyze. */
-		*dovacuum = force_vacuum || (vactuples > vacthresh) ||
-			(vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
-		*doanalyze = (anltuples > anlthresh);
+		if (tabentry->autovac_oldest_xmin != oldestXmin)
+		{
+			*dovacuum = force_vacuum || (vactuples > vacthresh) ||
+(vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
+			*doanalyze = (anltuples > anlthresh);
+			elog(DEBUG1, "Consider relation %d tabentry=%p, tabentry->autovac_oldest_xmin=%d, oldestXmin=%d, dovacuum=%d, doanalyze=%d",
+ relid, tabentry, tabentry->autovac_oldest_xmin, oldestXmin, *dovacuum, *doanalyze);
+		}
+		else
+		{
+			elog(DEBUG1, "Skip autovacuum of relation %d", relid);
+			*dovacuum = force_vacuum;
+			*doanalyze = false;
+		}
 	}
 	else
 	{
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c022597bc0..99db7884de 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1461,7 +1461,8 @@ pgstat_report_autovac(Oid dboid)
  */
 void
 pgstat_report_vacuum(Oid tableoid, bool shared,
-	 PgStat_Counter livetuples, PgStat_Counter deadtuples)
+	 PgStat_Counter livetuples, PgStat_Counter deadtuples,
+	 TransactionId oldestXmin)
 {
 	PgStat_MsgVacuum msg;
 
@@ -1475,6 +1476,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	msg.m_vacuumtime = GetCurrentTimestamp();
 	msg.m_live_tuples = livetuples;
 	msg.m_dead_tuples = deadtuples;
+	msg.m_oldest_xmin = oldestXmin;
 	pgstat_send(&msg, sizeof(msg));
 }
 
@@ -4838,6 +4840,7 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
 		result->analyze_count = 0;
 		result->autovac_analyze_timestamp = 0;
 		result->autovac_analyze_count = 0;
+		result->autovac_oldest_xmin = InvalidTransactionId;
 	}
 
 	return result;
@@ -6007,6 +6010,7 @@ pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
 			tabentry->analyze_count = 0;
 			tabentry->autovac_analyze_timestamp = 0;
 			tabentry->autovac_analyze_count = 0;
+			tabentry->autovac_oldest_xmin = InvalidTransactionId;
 		}
 		else
 		{
@@ -6288,6 +6292,8 @@ pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
 	tabentry->n_live_tuples = msg->m_live_tuples;
 	tabentry->n_dead_tuples = msg->m_dead_tuples;
 
+	tabentry->autovac_oldest_xmin = msg->m_oldest_xmin;
+
 	/*
 	 * It is quite possible that a non-aggressive VACUUM ended up skipping
 	 * various pages, however, we'll zero the insert counter here regardless.
diff --git a/src/include/pgstat.h 

Re: Yet another fast GiST build (typo)

2020-07-10 Thread Andrey M. Borodin


> On 10 Jul 2020, at 10:53, Thomas Munro  wrote:
> 
> On Tue, Jul 7, 2020 at 7:03 PM Andrey M. Borodin  wrote:
>> Oops. I've mismerged the docs and did not notice this with check-world. PFA v8
>> with fixed docs.
> 
> It looks like point_zorder_internal() has the check for NaN in the wrong 
> place.
Thanks! Fixed.

Best regards, Andrey Borodin.


v9-0001-Add-sort-support-for-point-gist_point_sortsupport.patch
Description: Binary data


v9-0002-Implement-GiST-build-using-sort-support.patch
Description: Binary data


Re: SQL-standard function body

2020-07-10 Thread Thomas Munro
On Wed, Jul 1, 2020 at 5:49 AM Peter Eisentraut
 wrote:
> - More test coverage is needed.  Surprisingly, there wasn't actually any
> test AFAICT that just creates an SQL function and runs it.  Most of
> that code is tested incidentally, but there is very little or no
> targeted testing of this functionality.

FYI cfbot showed a sign of some kind of error_context_stack corruption
while running "DROP TABLE functest3 CASCADE;".




Re: Log the location field before any backtrace

2020-07-10 Thread Peter Eisentraut

On 2020-07-10 04:04, Michael Paquier wrote:

On Thu, Jul 09, 2020 at 12:31:38PM -0400, Alvaro Herrera wrote:

On 2020-Jul-09, Daniel Gustafsson wrote:

On 9 Jul 2020, at 11:17, Peter Eisentraut  
wrote:


In PG13, we added the ability to add backtraces to the log
output. After some practical experience with it, I think the
order in which the BACKTRACE and the LOCATION fields are printed
is wrong.  I propose we put the LOCATION field before the
BACKTRACE field, not after.  This makes more sense because the
location is effectively at the lowest level of the backtrace.


Makes sense, +1


Likewise


+1.


committed

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: posgres 12 bug (partitioned table)

2020-07-10 Thread Amit Langote
Hi Soumyadeep,

On Fri, Jul 10, 2020 at 2:56 AM Soumyadeep Chakraborty
 wrote:
>
> Hey Amit,
>
> On Thu, Jul 9, 2020 at 12:16 AM Amit Langote  wrote:
>
> > By the way, what happens today if you do INSERT INTO a_zedstore_table
> > ... RETURNING xmin?  Do you get an error "xmin is unrecognized" or
> > some such in slot_getsysattr() when trying to project the RETURNING
> > list?
> >
> We get garbage values for xmin and cmin. If we request cmax/xmax, we get
> an ERROR from slot_getsystattr()->tts_zedstore_getsysattr():
> "zedstore tuple table slot does not have system attributes (except xmin
> and cmin)"
>
> A ZedstoreTupleTableSlot only stores xmin and cmin. Also,
> zedstoream_insert(), which is the tuple_insert() implementation, does
> not supply the xmin/cmin, thus making those values garbage.
>
> For context, Zedstore has its own UNDO log implementation to act as
> storage for transaction information. (which is intended to be replaced
> with the upstream UNDO log in the future).
>
> The above behavior is not just restricted to INSERT..RETURNING right now.
> If we select the system columns from foo in Zedstore, the behavior is
> the same.  The transaction information is never returned from Zedstore
> in tableam calls that don't demand transactional information be
> used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
> it will use the transactional information correctly. It will also
> populate TM_FailureData, which contains xmax and cmax, in the APIs where
> it is demanded.
>
> I really wonder what other AMs are doing about these issues.
>
> I think we should either:
>
> 1. Demand transactional information from AMs for all APIs that involve
> a projection of transactional information.
>
> 2. Have some other component of Postgres supply the transactional
> information. This is what I think the upstream UNDO log can probably
> provide.

So even if an AM's table_tuple_insert() itself doesn't populate the
transaction info into the slot handed to it, perhaps as an optimization,
it does not sound entirely unreasonable to expect the AM's
slot_getsysattr() callback to return it correctly when projecting a
target list containing system columns.  We shouldn't really need any
new core code to fetch the transaction-related system columns while
there exists a perfectly reasonable channel for them to arrive through
-- TupleTableSlots.  I suppose there's a reason why we allow AMs to
provide their own slot callbacks.
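
As a rough sketch of such a callback (the slot struct and function names
below are hypothetical, not Zedstore's actual code; only the
TupleTableSlotOps.getsysattr signature and the system attribute numbers
come from core headers):

#include "postgres.h"
#include "access/sysattr.h"
#include "executor/tuptable.h"

/* Hypothetical slot type for an AM that tracks xmin/cmin in the slot. */
typedef struct MyAMSlot
{
	TupleTableSlot base;
	TransactionId xmin;
	CommandId	cmin;
} MyAMSlot;

/* Sketch of a getsysattr callback returning AM-tracked transaction info. */
static Datum
myam_slot_getsysattr(TupleTableSlot *slot, int attnum, bool *isnull)
{
	MyAMSlot   *mslot = (MyAMSlot *) slot;

	*isnull = false;
	switch (attnum)
	{
		case MinTransactionIdAttributeNumber:
			return TransactionIdGetDatum(mslot->xmin);
		case MinCommandIdAttributeNumber:
			return CommandIdGetDatum(mslot->cmin);
		default:
			elog(ERROR, "system attribute %d is not available from this slot type",
				 attnum);
			return (Datum) 0;	/* keep compiler quiet */
	}
}

With something along these lines, projecting xmin/cmin in a RETURNING
list would go through the AM's own callback, with no new core code.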

Whether an AM uses UNDO log or something else to manage the
transaction info is up to the AM, so I don't see why the AMs
themselves shouldn't be in charge of returning that info, because only
they know where it is.

> 3. (Least elegant) Transform tuple table slots into heap tuple table
> slots (since it is the only kind of tuple storage that can supply
> transactional info) and explicitly fill in the transactional values
> depending on the context, whenever transactional information is
> projected.
>
> For this bug report, I am not sure what is right. Perhaps, to stop the
> bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
> that the AM needs to provide the transactional info in the respective
> insert AM API calls,

As long as the AM's slot_getsysattr() callback returns the correct
value, this works.

> as well as demand a heap slot for partition roots
> and interior nodes.

It would be a compromise on the core's part to use "heap" slots for
partitioned tables, because they don't have a valid table AM.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-07-10 Thread Dilip Kumar
On Fri, Jul 10, 2020 at 11:01 AM Ajin Cherian  wrote:
>
>
>
> On Fri, Jul 10, 2020 at 3:11 PM Dilip Kumar  wrote:
>>
>> With your changes, if it cannot pick the largest top txn for streaming
>> (which sometimes happens due to incomplete toast changes), it will hang
>> forever in the while loop; in that case, it should go for spilling.
>>
>> while (rb->size >= logical_decoding_work_mem * 1024L)
>> {
>> /*
>> * Pick the largest transaction (or subtransaction) and evict it from
>> * memory by streaming, if supported. Otherwise, spill to disk.
>> */
>> if (ReorderBufferCanStream(rb) &&
>> (txn = ReorderBufferLargestTopTXN(rb)) != NULL)
>>
>>
>
> What is this condition (of not picking the largest top txn)? Wouldn't
> ReorderBufferLargestTopTXN then return NULL? If not, is there a way to know
> that a transaction cannot be streamed, so there can be an exit condition for
> the while loop?


Okay, I see: if ReorderBufferLargestTopTXN returns NULL, you are
breaking the loop.  I had not seen that other part of the patch, but I
agree that it will not go into an infinite loop.
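
For reference, the overall shape being discussed would be roughly as
follows (a sketch only; ReorderBufferCanStream(),
ReorderBufferLargestTopTXN() and ReorderBufferStreamTXN() are names from
the in-progress patch, and the spill branch is simplified):

while (rb->size >= logical_decoding_work_mem * 1024L)
{
	ReorderBufferTXN *txn;

	if (ReorderBufferCanStream(rb) &&
		(txn = ReorderBufferLargestTopTXN(rb)) != NULL)
	{
		/* Evict the largest top-level transaction by streaming it out. */
		ReorderBufferStreamTXN(rb, txn);
	}
	else if ((txn = ReorderBufferLargestTXN(rb)) != NULL)
	{
		/* Streaming not possible; spill the largest transaction to disk. */
		ReorderBufferSerializeTXN(rb, txn);
	}
	else
	{
		/* Nothing evictable is left -- stop rather than loop forever. */
		break;
	}
}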


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com