Re: Refactor parse analysis of EXECUTE command

2019-11-03 Thread Peter Eisentraut

On 2019-11-02 16:00, Tom Lane wrote:

Peter Eisentraut  writes:

This patch moves the parse analysis component of ExecuteQuery() and
EvaluateParams() into a new transformExecuteStmt() that is called from
transformStmt().


Uhmm ... no actual patch attached?


Oops, here it is.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 7c5bc30a02ec34646c8e49af1499fe4113bc9e5e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 31 Oct 2019 09:54:07 +0100
Subject: [PATCH] Refactor parse analysis of EXECUTE command

Move the parse analysis component of ExecuteQuery() and
EvaluateParams() into a new transformExecuteStmt() that is called from
transformStmt().  This makes EXECUTE behave more like other utility
commands.  It also allows error messages to have position information,
and it allows using external parameters in the arguments of the
EXECUTE command.
---
 src/backend/commands/createas.c   |  2 +-
 src/backend/commands/prepare.c| 72 ++
 src/backend/parser/analyze.c  | 89 +++
 src/backend/tcop/utility.c|  2 +-
 src/include/commands/prepare.h|  2 +-
 src/test/regress/expected/prepare.out | 25 
 src/test/regress/sql/prepare.sql  | 21 +++
 7 files changed, 143 insertions(+), 70 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index b7d220699f..e4244f84e2 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -271,7 +271,7 @@ ExecCreateTableAs(CreateTableAsStmt *stmt, const char *queryString,
ExecuteStmt *estmt = castNode(ExecuteStmt, query->utilityStmt);
 
Assert(!is_matview);/* excluded by syntax */
-   ExecuteQuery(estmt, into, queryString, params, dest, completionTag);
+   ExecuteQuery(estmt, into, params, dest, completionTag);
 
/* get object address that intorel_startup saved for us */
address = ((DR_intorel *) dest)->reladdr;
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 7e0a041fab..0aba6a7b00 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -47,7 +47,7 @@ static HTAB *prepared_queries = NULL;
 
 static void InitQueryHashTable(void);
 static ParamListInfo EvaluateParams(PreparedStatement *pstmt, List *params,
-   const char *queryString, EState *estate);
+   EState *estate);
 static Datum build_regtype_array(Oid *param_types, int num_params);
 
 /*
@@ -189,16 +189,10 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
  * indicated by passing a non-null intoClause.  The DestReceiver is already
  * set up correctly for CREATE TABLE AS, but we still have to make a few
  * other adjustments here.
- *
- * Note: this is one of very few places in the code that needs to deal with
- * two query strings at once.  The passed-in queryString is that of the
- * EXECUTE, which we might need for error reporting while processing the
- * parameter expressions.  The query_string that we copy from the plan
- * source is that of the original PREPARE.
  */
 void
 ExecuteQuery(ExecuteStmt *stmt, IntoClause *intoClause,
-const char *queryString, ParamListInfo params,
+ParamListInfo params,
 DestReceiver *dest, char *completionTag)
 {
PreparedStatement *entry;
@@ -229,8 +223,7 @@ ExecuteQuery(ExecuteStmt *stmt, IntoClause *intoClause,
 */
estate = CreateExecutorState();
estate->es_param_list_info = params;
-   paramLI = EvaluateParams(entry, stmt->params,
-   queryString, estate);
+   paramLI = EvaluateParams(entry, stmt->params, estate);
}
 
/* Create a new portal to run the query in */
@@ -316,7 +309,6 @@ ExecuteQuery(ExecuteStmt *stmt, IntoClause *intoClause,
  *
  * pstmt: statement we are getting parameters for.
  * params: list of given parameter expressions (raw parser output!)
- * queryString: source text for error messages.
  * estate: executor state to use.
  *
  * Returns a filled-in ParamListInfo -- this can later be passed to
@@ -324,72 +316,19 @@ ExecuteQuery(ExecuteStmt *stmt, IntoClause *intoClause,
  * during query execution.
  */
 static ParamListInfo
-EvaluateParams(PreparedStatement *pstmt, List *params,
-  const char *queryString, EState *estate)
+EvaluateParams(PreparedStatement *pstmt, List *params, EState *estate)
 {
Oid*param_types = pstmt->plansource->param_types;
int num_params = pstmt->plansource->num_params;
-   int

Re: cost based vacuum (parallel)

2019-11-03 Thread Komяpa
>
>
> This is somewhat similar to a memory usage problem with a
> parallel query where each worker is allowed to use up to work_mem of
> memory.  We can say that the users using parallel operation can expect
> more system resources to be used as they want to get the operation
> done faster, so we are fine with this.  However, I am not sure if that
> is the right thing, so we should try to come up with some solution for
> it and if the solution is too complex, then probably we can think of
> documenting such behavior.
>

In cloud environments (Amazon + gp2) there's a budget on input/output
operations. If you cross it for a long time, everything starts looking like
you are working with a floppy disk.

For ease of configuration, I would need a "max_vacuum_disk_iops" setting that
would limit the number of input/output operations by all of the vacuums in the
system. If I set it to less than the budget refill rate, I can be sure
that no vacuum runs fast enough to impact any sibling query.

There's also value in a non-throttled VACUUM for smaller tables. On gp2 such
runs will be consumed out of the surge budget, and its size is known to the
sysadmin. Let's call the setting "max_vacuum_disk_surge_iops": if a relation has
fewer blocks than this value and the situation is blocking in any way
(anti-wraparound, interactive console, ...), go ahead and run without
throttling.

For how to balance the cost: if we know the number of vacuum processes that
were running in the previous second, we can just divide the slot for this
iteration by that previous number.

To correct for overshoots, we can subtract the previous second's overshoot
from the next one's. That would also allow accounting for surge budget usage
and letting it refill, pausing all autovacuum for some time after a manual one.
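
Something like this minimal sketch is the accounting I have in mind; the GUC
and function names here are all hypothetical, just to illustrate the idea:

	/* All names are hypothetical -- a sketch of the per-second accounting,
	 * not existing PostgreSQL code. */
	static long
	vacuum_iops_slot(long max_vacuum_disk_iops,	/* proposed per-second budget */
					 int vacuums_last_second,	/* vacuums seen in the last second */
					 long overshoot_last_second) /* how far we overshot last time */
	{
		long		slot;

		if (vacuums_last_second < 1)
			vacuums_last_second = 1;

		/* split this second's budget by last second's number of vacuums */
		slot = max_vacuum_disk_iops / vacuums_last_second;

		/* pay back last second's overshoot before doing new I/O */
		slot -= overshoot_last_second;

		return (slot > 0) ? slot : 0;
	}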

Accounting that limits the operation count more often than once per
second isn't beneficial for this use case.

Please don't forget that processing one page can become several iops (read,
write, wal).

Does this make sense? :)


Re: [HACKERS] Block level parallel vacuum

2019-11-03 Thread Masahiko Sawada
On Mon, 4 Nov 2019 at 14:02, Amit Kapila  wrote:
>
> On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada  wrote:
> >
> > I think the two approaches make parallel vacuum workers wait in
> > different ways: in approach (a) the vacuum delay works as if the vacuum is
> > performed by a single process, while in approach (b) the
> > vacuum delay works for each worker independently.
> >
> > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > the cost per block is 10, the cost limit is 200 and the sleep time is 5
> > ms. In a single-process vacuum the total sleep time is 2,500ms (=
> > (10,000 * 10 / 200) * 5). Approach (a) is the same, 2,500ms,
> > because all parallel vacuum workers use the shared balance value and a
> > worker sleeps once the balance value exceeds the limit. In
> > approach (b), since the cost limit is divided evenly, the limit for each
> > worker is 40 (e.g. with parallel degree 5). And supposing each worker
> > processes blocks evenly, the total sleep time of all workers is
> > 12,500ms (= (2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > compute the sleep time of approach (b) by dividing the total value by
> > the number of parallel workers.
> >
> > IOW, approach (b) makes the parallel vacuum delay much more than a normal
> > vacuum and a parallel vacuum with approach (a), even with the same
> > settings. Which behavior do we expect?
> >
>
> Yeah, this is an important thing to decide.  I don't think that the
> conclusion you are drawing is correct, because if that is true then the
> same applies to the current autovacuum work division where we divide
> the cost_limit among workers but the cost_delay is the same (see
> autovac_balance_cost).  Basically, if we consider the delay time of
> each worker independently, then it would appear that the parallel vacuum
> delay with approach (b) is more, but that is true only if the workers
> run serially, which is not the case.
>
> > I thought the vacuum delay for
> > parallel vacuum should work as if it's a single-process vacuum, as we
> > did for memory usage. I might be missing something. If we prefer
> > approach (b) I should change the patch so that the leader process
> > divides the cost limit evenly.
> >
>
> I am also not completely sure which approach is better but I slightly
> lean towards approach (b).

Can we get the same sleep time as approach (b) if we divide the cost
limit by the number of workers and keep the shared cost balance (i.e.
approach (a) with a divided cost limit)? Currently approach (b)
seems better, but I'm concerned that it might unnecessarily delay
vacuum if some indexes are very small or if bulk-deletion of an index
does almost nothing, as with brin.

>
>   I think we need input from some other
> people as well.  I will start a separate thread to discuss this and
> see if that helps to get the input from others.

+1

--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-11-03 Thread Andrey Borodin
Hi Tels!
Thanks for your interest in fast decompression.

> On 3 Nov 2019, at 12:24, Tels  wrote:
> 
> I wonder if you agree and what would happen if you try this variant on your 
> corpus tests.

I've tried some different optimizations for literals, for example loop
unrolling[0] and bulk-copying of literals.
These approaches were bringing some performance improvement, but with noise.
Statistically they were better in some places and worse in others: a net win,
but that "net win" depends on which data and which platforms we consider
important.

The proposed patch clearly makes decompression faster on any dataset and platform.
I believe improving pglz further is viable, but optimizations like a common data
prefix seem more promising to me.
Also, I think we actually need real codecs like lz4, zstd and brotli instead of
our own reinvented wheel.

If you have some spare time, pull requests to test_pglz are welcome; let's
benchmark more micro-optimizations, it brings a lot of fun :)


--
Andrey Borodin
Open source RDBMS development team leader
Yandex.Cloud

[0] https://github.com/x4m/test_pglz/blob/master/pg_lzcompress_hacked.c#L166



cost based vacuum (parallel)

2019-11-03 Thread Amit Kapila
For parallel vacuum [1], we were discussing what is the best way to
divide the cost among parallel workers, but we didn't get much input
apart from people who are very actively involved in patch development.
I feel that we need some more input before we finalize anything, so I'm
starting a new thread.

The initial version of the patch has a very rudimentary way of doing
it, which means each parallel vacuum worker operates independently
w.r.t. vacuum delay and cost.  This will lead to more I/O in the system
than the user intended.  Assume that the overall I/O allowed for the
vacuum operation is X, after which it will sleep for some time,
reset the balance and continue.  In the patch, each worker is
allowed to perform X before it sleeps, and there is also no
coordination with the master backend, which would have done
some I/O for the heap.  So, in the worst-case scenario, there can be n
times more I/O where n is the number of workers doing the parallel
operation.  This is somewhat similar to a memory usage problem with a
parallel query where each worker is allowed to use up to work_mem of
memory.  We can say that the users using parallel operation can expect
more system resources to be used as they want to get the operation
done faster, so we are fine with this.  However, I am not sure if that
is the right thing, so we should try to come up with some solution for
it and if the solution is too complex, then probably we can think of
documenting such behavior.

The two approaches to solve this problem being discussed in that
thread [1] are as follows:
(a) Allow the parallel workers and master backend to have a shared
view of vacuum cost related parameters (mainly VacuumCostBalance) and
allow each worker to update it and then based on that decide whether
it needs to sleep.  Sawada-San has done the POC for this approach.
See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
drawback of this approach could be that we allow the worker to sleep
even though the I/O has been performed by some other worker.

(b) The other idea could be that we split the I/O among workers
something similar to what we do for auto vacuum workers (see
autovac_balance_cost).  The basic idea would be that before launching
workers, we need to compute the remaining I/O (heap operation would
have used something) after which we need to sleep and split it equally
across workers.  Here, we are primarily thinking of dividing
VacuumCostBalance and VacuumCostLimit parameters.  Once the workers
are finished, they need to let the master backend know how much I/O they
have consumed, and then the master backend can add it to its current I/O
consumed.  I think we also need to rebalance the cost of the remaining
workers once some of the workers exit.  Dilip has prepared a POC
patch for this, see 0002-POC-divide-vacuum-cost-limit in email [3].

I think approach-2 is better in throttling the system as it doesn't
have the drawback of the first approach, but it might be a bit tricky
to implement.
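
To make the arithmetic being debated concrete, here is a tiny standalone model
of the two schemes, using the numbers from Sawada-san's example in the other
thread.  This is an illustration only, not code from either POC, and it assumes
the work and the limit are split perfectly evenly:

	/* toy_model.c -- toy model of the two throttling schemes:
	 * 10,000 blocks, cost 10 per block, limit 200, delay 5 ms, 5 workers */
	#include <stdio.h>

	int
	main(void)
	{
		const long	blocks = 10000;
		const long	cost_per_block = 10;
		const long	cost_limit = 200;
		const long	delay_ms = 5;
		const long	nworkers = 5;

		/* (a) shared balance: the total cost accumulates in one shared
		 * counter, so the number of sleeps matches a single-process vacuum */
		long		sleeps_shared = (blocks * cost_per_block) / cost_limit;

		/* (b) divided limit: each worker gets cost_limit/nworkers and (here)
		 * an equal share of the blocks, and throttles on its own counter */
		long		sleeps_per_worker = ((blocks / nworkers) * cost_per_block) /
										(cost_limit / nworkers);

		printf("(a) total sleep: %ld ms\n", sleeps_shared * delay_ms);
		printf("(b) per-worker sleep: %ld ms (x%ld workers, running concurrently)\n",
			   sleeps_per_worker * delay_ms, nworkers);
		printf("(b) summed sleep: %ld ms\n",
			   sleeps_per_worker * delay_ms * nworkers);
		return 0;
	}

This prints 2,500 ms for (a), and for (b) 2,500 ms per worker (12,500 ms summed
across the five concurrent workers).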

As of now, the POC for both approaches has been developed and we
see similar results for both, but we have only tested simpler
cases where each worker has a similar amount of I/O to perform.

Thoughts?


[1] - https://commitfest.postgresql.org/25/1774/
[2] - 
https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com
[3] - 
https://www.postgresql.org/message-id/CAFiTN-thU-z8f04jO7xGMu5yUUpTpsBTvBrFW6EhRf-jGvEz%3Dg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Block level parallel vacuum

2019-11-03 Thread Dilip Kumar
On Mon, Nov 4, 2019 at 10:32 AM Amit Kapila  wrote:
>
> On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada  wrote:
> >
> > I think the two approaches make parallel vacuum workers wait in
> > different ways: in approach (a) the vacuum delay works as if the vacuum is
> > performed by a single process, while in approach (b) the
> > vacuum delay works for each worker independently.
> >
> > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > the cost per block is 10, the cost limit is 200 and the sleep time is 5
> > ms. In a single-process vacuum the total sleep time is 2,500ms (=
> > (10,000 * 10 / 200) * 5). Approach (a) is the same, 2,500ms,
> > because all parallel vacuum workers use the shared balance value and a
> > worker sleeps once the balance value exceeds the limit. In
> > approach (b), since the cost limit is divided evenly, the limit for each
> > worker is 40 (e.g. with parallel degree 5). And supposing each worker
> > processes blocks evenly, the total sleep time of all workers is
> > 12,500ms (= (2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > compute the sleep time of approach (b) by dividing the total value by
> > the number of parallel workers.
> >
> > IOW, approach (b) makes the parallel vacuum delay much more than a normal
> > vacuum and a parallel vacuum with approach (a), even with the same
> > settings. Which behavior do we expect?
> >
>
> Yeah, this is an important thing to decide.  I don't think that the
> conclusion you are drawing is correct, because if that is true then the
> same applies to the current autovacuum work division where we divide
> the cost_limit among workers but the cost_delay is the same (see
> autovac_balance_cost).  Basically, if we consider the delay time of
> each worker independently, then it would appear that the parallel vacuum
> delay with approach (b) is more, but that is true only if the workers
> run serially, which is not the case.
>
> > I thought the vacuum delay for
> > parallel vacuum should work as if it's a single-process vacuum, as we
> > did for memory usage. I might be missing something. If we prefer
> > approach (b) I should change the patch so that the leader process
> > divides the cost limit evenly.
> >
>
> I am also not completely sure which approach is better but I slightly
> lean towards approach (b).  I think we need input from some other
> people as well.  I will start a separate thread to discuss this and
> see if that helps to get the input from others.

+1


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Block level parallel vacuum

2019-11-03 Thread Dilip Kumar
On Mon, Nov 4, 2019 at 10:45 AM Amit Kapila  wrote:
>
> On Sun, Nov 3, 2019 at 9:49 AM Dilip Kumar  wrote:
> >
> > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada  
> > wrote:
> > >
> > >
> > > I think the two approaches make parallel vacuum workers wait in
> > > different ways: in approach (a) the vacuum delay works as if the vacuum is
> > > performed by a single process, while in approach (b) the
> > > vacuum delay works for each worker independently.
> > >
> > > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > > the cost per block is 10, the cost limit is 200 and the sleep time is 5
> > > ms. In a single-process vacuum the total sleep time is 2,500ms (=
> > > (10,000 * 10 / 200) * 5). Approach (a) is the same, 2,500ms,
> > > because all parallel vacuum workers use the shared balance value and a
> > > worker sleeps once the balance value exceeds the limit. In
> > > approach (b), since the cost limit is divided evenly, the limit for each
> > > worker is 40 (e.g. with parallel degree 5). And supposing each worker
> > > processes blocks evenly, the total sleep time of all workers is
> > > 12,500ms (= (2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > > compute the sleep time of approach (b) by dividing the total value by
> > > the number of parallel workers.
> > >
> > > IOW, approach (b) makes the parallel vacuum delay much more than a normal
> > > vacuum and a parallel vacuum with approach (a), even with the same
> > > settings. Which behavior do we expect? I thought the vacuum delay for
> > > parallel vacuum should work as if it's a single-process vacuum, as we
> > > did for memory usage. I might be missing something. If we prefer
> > > approach (b) I should change the patch so that the leader process
> > > divides the cost limit evenly.
> > >
> > I have repeated the same test (test1 and test2)[1] with a higher
> > shared buffer (1GB).  Currently, I have used the same formula for
> > computing the total delay:
> > heap scan delay + index vacuuming delay / workers.  Because, in my
> > opinion, multiple workers are doing I/O here, the total delay should
> > also be a multiple of the number of workers.  So if we want to compare
> > the delay with the sequential vacuum then we should divide the total
> > delay by the number of workers.  But I am not sure whether computing
> > the total delay is the right way to compute the I/O throttling or not.
> > Still, I support approach (b) for dividing the I/O limit because
> > autovacuum workers are already operating with this approach.
> >
> > test1:
> > normal: stats delay 1348.16, hit 68952, miss 2, dirty 10063, total 79017
> > 1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146,
> > total 79102 (cost divide patch)
> > 2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036,
> > total 78994 (cost divide patch)
> > 1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066,
> > total 92252 (share cost patch)
> > 2 worker: stats delay 904.37, hit 86482, miss 2, dirty 17806,
> > total 104290 (share cost patch)
> >
> > test2:
> > normal: stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472
> > 1 worker: stats delay 530.70, hit 36984, miss 2, dirty 3527, total
> > 40513 (cost divide patch)
> > 2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total
> > 40518 (cost divide patch)
> > 1 worker: stats delay 490.57, hit 39090, miss 2, dirty 3497, total
> > 42589 (share cost patch)
> > 2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total
> > 42871 (share cost patch)
> >
> > So with higher shared buffers, I can see that with approach (b) we get
> > the same total delay.  With approach (a) I can see a bit less
> > total delay.  But a point to be noted is that I have used the same
> > formula for computing the total delay for both approaches, and
> > Sawada-san explained in the above mail that it may not be the right
> > way of computing the total delay for approach (a).  Still, my take is
> > that whether we are working with a shared cost or we are dividing the
> > cost, the delay must be divided by the number of workers in the parallel
> > phase.
> >
>
> Why do you think so?  I think with approach (b) if all the workers are
> doing equal amount of I/O, they will probably sleep at the same time
> whereas with approach (a) each of them will sleep at different times.
> So, probably dividing the delay in approach (b) makes more sense.

Just to be clear, I did not mean that we divide the sleep time for
each worker.  Actually, I meant how to project the total delay in the
test patch.  So if we directly want to compare the sleep time
of sequential vs. parallel vacuum, it's not fair to just compare the
total sleep time; when multiple workers are working in parallel,
shouldn't we consider their average sleep time instead?
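
For example, with the numbers from Sawada-san's example upthread, and assuming
the five workers really do run concurrently:

	summed sleep, approach (b)  = 5 workers * 2,500 ms = 12,500 ms
	average sleep per worker    = 12,500 ms / 5        =  2,500 ms

i.e. the same 2,500 ms a single-process vacuum would sleep, so the summed
number alone overstates the throttling.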

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Block level parallel vacuum

2019-11-03 Thread Amit Kapila
On Sun, Nov 3, 2019 at 9:49 AM Dilip Kumar  wrote:
>
> On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada  wrote:
> >
> >
> > I think the two approaches make parallel vacuum workers wait in
> > different ways: in approach (a) the vacuum delay works as if the vacuum is
> > performed by a single process, while in approach (b) the
> > vacuum delay works for each worker independently.
> >
> > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > the cost per block is 10, the cost limit is 200 and the sleep time is 5
> > ms. In a single-process vacuum the total sleep time is 2,500ms (=
> > (10,000 * 10 / 200) * 5). Approach (a) is the same, 2,500ms,
> > because all parallel vacuum workers use the shared balance value and a
> > worker sleeps once the balance value exceeds the limit. In
> > approach (b), since the cost limit is divided evenly, the limit for each
> > worker is 40 (e.g. with parallel degree 5). And supposing each worker
> > processes blocks evenly, the total sleep time of all workers is
> > 12,500ms (= (2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > compute the sleep time of approach (b) by dividing the total value by
> > the number of parallel workers.
> >
> > IOW, approach (b) makes the parallel vacuum delay much more than a normal
> > vacuum and a parallel vacuum with approach (a), even with the same
> > settings. Which behavior do we expect? I thought the vacuum delay for
> > parallel vacuum should work as if it's a single-process vacuum, as we
> > did for memory usage. I might be missing something. If we prefer
> > approach (b) I should change the patch so that the leader process
> > divides the cost limit evenly.
> >
> I have repeated the same test (test1 and test2)[1] with a higher
> shared buffer (1GB).  Currently, I have used the same formula for
> computing the total delay:
> heap scan delay + index vacuuming delay / workers.  Because, in my
> opinion, multiple workers are doing I/O here, the total delay should
> also be a multiple of the number of workers.  So if we want to compare
> the delay with the sequential vacuum then we should divide the total
> delay by the number of workers.  But I am not sure whether computing
> the total delay is the right way to compute the I/O throttling or not.
> Still, I support approach (b) for dividing the I/O limit because
> autovacuum workers are already operating with this approach.
> auto vacuum workers are already operating with this approach.
>
> test1:
> normal: stats delay 1348.16, hit 68952, miss 2, dirty 10063, total 79017
> 1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146,
> total 79102 (cost divide patch)
> 2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036,
> total 78994 (cost divide patch)
> 1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066,
> total 92252 (share cost patch)
> 2 worker: stats delay 904.37, hit 86482, miss 2, dirty 17806,
> total 104290 (share cost patch)
>
> test2:
> normal: stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472
> 1 worker: stats delay 530.70, hit 36984, miss 2, dirty 3527, total
> 40513 (cost divide patch)
> 2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total
> 40518 (cost divide patch)
> 1 worker: stats delay 490.57, hit 39090, miss 2, dirty 3497, total
> 42589 (share cost patch)
> 2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total
> 42871 (share cost patch)
>
> So with higher shared buffers, I can see that with approach (b) we get
> the same total delay.  With approach (a) I can see a bit less
> total delay.  But a point to be noted is that I have used the same
> formula for computing the total delay for both approaches, and
> Sawada-san explained in the above mail that it may not be the right
> way of computing the total delay for approach (a).  Still, my take is
> that whether we are working with a shared cost or we are dividing the
> cost, the delay must be divided by the number of workers in the parallel
> phase.
>

Why do you think so?  I think with approach (b) if all the workers are
doing equal amount of I/O, they will probably sleep at the same time
whereas with approach (a) each of them will sleep at different times.
So, probably dividing the delay in approach (b) makes more sense.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Can avoid list_copy in recomputeNamespacePath() conditionally?

2019-11-03 Thread amul sul
On Sat, Nov 2, 2019 at 8:01 PM Tom Lane  wrote:

> amul sul  writes:
> > I wondered can we have a shortcut somewhat similar to following POC
> > in recomputeNamespacePath () when the recomputed path is the same as the
> > previous baseSearchPath/activeSearchPath :
> > +   /* TODO: POC */
> > +   if (equal(oidlist, baseSearchPath))
> > +   return;
>
> There's an awful lot missing from that sketch; all of the remaining
> steps still need to be done:
>
>
You are correct, but that was intentionally skipped to avoid a longer post
description for the initial discussion. Sorry for being a little lazy.


> baseCreationNamespace = firstNS;
> baseTempCreationPending = temp_missing;
>
> /* Mark the path valid. */
> baseSearchPathValid = true;
> namespaceUser = roleid;
>
> /* And make it active. */
> activeSearchPath = baseSearchPath;
> activeCreationNamespace = baseCreationNamespace;
> activeTempCreationPending = baseTempCreationPending;
>
> /* Clean up. */
> pfree(rawname);
> list_free(namelist);
> list_free(oidlist);
>
> More to the point, I think the onus would be on the patch submitter
> to prove that the extra complexity had some measurable benefit.
> I really doubt that it would, since the list_copy is surely trivial
> compared to the catalog lookup work we had to do to compute the OID
> list above here.
>

Agree.


> It'd likely be more useful to see if you could reduce the number of
> places where we have to invalidate the path in the first place.
>

Understood, let me check.

Regards,
Amul


Re: [HACKERS] Block level parallel vacuum

2019-11-03 Thread Amit Kapila
On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada  wrote:
>
> I think the two approaches make parallel vacuum workers wait in
> different ways: in approach (a) the vacuum delay works as if the vacuum is
> performed by a single process, while in approach (b) the
> vacuum delay works for each worker independently.
>
> Suppose that the total number of blocks to vacuum is 10,000 blocks,
> the cost per block is 10, the cost limit is 200 and the sleep time is 5
> ms. In a single-process vacuum the total sleep time is 2,500ms (=
> (10,000 * 10 / 200) * 5). Approach (a) is the same, 2,500ms,
> because all parallel vacuum workers use the shared balance value and a
> worker sleeps once the balance value exceeds the limit. In
> approach (b), since the cost limit is divided evenly, the limit for each
> worker is 40 (e.g. with parallel degree 5). And supposing each worker
> processes blocks evenly, the total sleep time of all workers is
> 12,500ms (= (2,000 * 10 / 40) * 5 * 5). I think that's why we can
> compute the sleep time of approach (b) by dividing the total value by
> the number of parallel workers.
>
> IOW, approach (b) makes the parallel vacuum delay much more than a normal
> vacuum and a parallel vacuum with approach (a), even with the same
> settings. Which behavior do we expect?
>

Yeah, this is an important thing to decide.  I don't think that the
conclusion you are drawing is correct, because if that is true then the
same applies to the current autovacuum work division where we divide
the cost_limit among workers but the cost_delay is the same (see
autovac_balance_cost).  Basically, if we consider the delay time of
each worker independently, then it would appear that the parallel vacuum
delay with approach (b) is more, but that is true only if the workers
run serially, which is not the case.

> I thought the vacuum delay for
> parallel vacuum should work as if it's a single-process vacuum, as we
> did for memory usage. I might be missing something. If we prefer
> approach (b) I should change the patch so that the leader process
> divides the cost limit evenly.
>

I am also not completely sure which approach is better but I slightly
lean towards approach (b).  I think we need input from some other
people as well.  I will start a separate thread to discuss this and
see if that helps to get the input from others.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Allow superuser to grant passwordless connection rights on postgres_fdw

2019-11-03 Thread Stephen Frost
Greetings,

* Andrew Dunstan (andrew.duns...@2ndquadrant.com) wrote:
> On 11/1/19 12:58 PM, Robert Haas wrote:
> > On Thu, Oct 31, 2019 at 4:58 PM Andrew Dunstan
> >  wrote:
> >> This patch allows the superuser to grant passwordless connection rights
> >> in postgres_fdw user mappings.
> > This is clearly something that we need, as the current code seems
> > woefully ignorant of the fact that passwords are not the only
> > authentication method supported by PostgreSQL, nor even the most
> > secure.
> >
> > But, I do wonder a bit if we ought to think harder about the overall
> > authentication model for FDW. Like, maybe we'd take a different view
> > of how to solve this particular piece of the problem if we were
> > thinking about how FDWs could do LDAP authentication, SSL
> > authentication, credentials forwarding...
> 
> I'm certainly open to alternatives.

I've long felt that the way to handle this kind of requirement is to
have a "trusted remote server" kind of option, where the local server
authenticates to the remote server as a *server* and then says "this is
the user on this server, and this is the user that this user wishes to
be", and the remote server is then able to decide whether it accepts
that or not.

To be specific, there would be some kind of 'trust' established between
the servers, only if there is some kind of server-level
authentication, e.g. dual TLS auth or dual GSSAPI auth; and then a
mapping is defined for that server, which specifies which remote user is
allowed to log in as which local user.

This would be a server-to-server auth arrangement, and is quite
different from credential forwarding, or similar.  I am certainly also a
huge fan of the idea that we support Kerberos/GSSAPI credential
forwarding / delegation, where a client willingly forwards to the PG
server a set of credentials which then allow the PG server to
authenticate as that user to another system (eg: through an FDW to
another PG server).

Of course, as long as we're talking pie-in-the-sky ideas, I would
certainly be entirely for supporting both. ;)

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Collation versioning

2019-11-03 Thread Thomas Munro
On Fri, Nov 1, 2019 at 2:21 AM Julien Rouhaud  wrote:
> Are you planning to continue working on it?  For the record, that's
> something needed to be able to implement a filter in REINDEX command
> [1].

Bonjour Julien,

Unfortunately I haven't had time to work on it seriously, but here's a
quick rebase to get the proof-of-concept back into working shape.
It's nice to see progress in other bits of the problem-space.  I hope
to have time to look at this patch set again soon, but if you or
someone else would like hack on or think about it too, please feel
free!

Yes indeed this is exactly the same problem that you're trying to
solve, approached from a different starting point.

Here are some problems to think about:

* We'd need to track dependencies on the default collation once we
have versioning for that (see
https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com).
That is how most people actually consume collations out there in real
life, and yet we don't normally track dependencies on the default
collation and I don't know if that's simply a matter of ripping out
all the code that looks like "xxx != DEFAULT_COLLATION_ID" in the
dependency analysis code or if there's more to it.
* Andres mentioned off-list that pg_depend rows might get blown away
and recreated in some DDL circumstances.  We need to look into that.
* Another is that pg_upgrade won't preserve pg_depend rows, so you'd
need some catalog manipulation (direct or via new DDL) to fix that.
* Some have expressed doubt that pg_depend is the right place for
this; let's see if any counter-proposals appear.

> # reindex table t1;
> WARNING:  01000: index "t1_val_idx" depends on collation 13330 version
> "a153.97.35.8", but the current version is "153.97.35.8"
> DETAIL:  The index may be corrupted due to changes in sort order.
> HINT:  REINDEX to avoid the risk of corruption.
> LOCATION:  index_check_collation_version, index.c:1263

Duh.  Yeah, that's stupid and needs to be fixed somehow.


0001-Remove-pg_collation.collversion-v2.patch
Description: Binary data


0002-Add-pg_depend.refobjversion-v2.patch
Description: Binary data


0003-Track-collation-versions-for-indexes-v2.patch
Description: Binary data


TAP tests aren't using the magic words for Windows file access

2019-11-03 Thread Tom Lane
Buildfarm member drongo has been failing the pg_ctl regression test
pretty often.  I happened to look closer at what's happening, and
it's this:

could not read 
"C:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_ctl/tmp_check/t_004_logrotate_primary_data/pgdata/current_logfiles":
 Permission denied at C:/prog/bf/root/HEAD/pgsql.build/src/test/perl/TestLib.pm 
line 397.

That is, TestLib::slurp_file is failing to read a file.  Almost
certainly, "permission denied" doesn't really mean a permissions
problem, but failure to specify the file-opening flags needed to
allow concurrent access on Windows.  We fixed this in pg_ctl
itself in commit 0ba06e0bf ... but we didn't fix the TAP
infrastructure.  Is there an easy way to get Perl on board
with that?

regards, tom lane




Re: Binary support for pgoutput plugin

2019-11-03 Thread Thomas Munro
On Thu, Oct 31, 2019 at 3:03 AM Dave Cramer  wrote:
> Ok, I've rebased and reverted logicalrep_read_insert

Hi Dave,

From the code style police (actually just from cfbot, which is set up
to complain about declarations after statements, a bit of C99 we
aren't ready for):

proto.c:557:6: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
  int len = pq_getmsgint(in, 4); /* read length */
  ^
proto.c:573:6: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
  int len = pq_getmsgint(in, 4); /* read length */
  ^
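
(In case it saves a round trip: the usual fix is just to hoist the declaration
to the top of the block, sketched here with the surrounding statements elided,
i.e.

	int			len;

	...
	len = pq_getmsgint(in, 4);		/* read length */

rather than declaring len at the point of use.)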




Re: New SQL counter statistics view (pg_stat_sql)

2019-11-03 Thread Thomas Munro
On Thu, Oct 17, 2019 at 3:22 PM Smith, Peter  wrote:
> We have resurrected this 2 year old "SQL statements statistics counter" 
> proposal from Hari.
>
> The attached patch addresses the previous review issues.

Hi,

No comment on the patch but I noticed that the documentation changes
don't build.  Please make sure you can "make docs" successfully,
having installed the documentation tools[1].

[1] https://www.postgresql.org/docs/devel/docguide-toolsets.html




Re: [HACKERS] Block level parallel vacuum

2019-11-03 Thread Amit Kapila
On Mon, Oct 28, 2019 at 1:52 PM Dilip Kumar  wrote:
>
> On Mon, Oct 28, 2019 at 12:20 PM Amit Kapila  wrote:
> >
> > On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar  wrote:
> > >
> > > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > >
> > > I haven't yet read the new set of the patch.  But I have noticed one
> > > thing: we are getting the size of the statistics using the AM
> > > routine, but we are copying those statistics from local memory to
> > > the shared memory directly using memcpy.  Wouldn't it be a good
> > > idea to have an AM-specific routine to get it copied from the local
> > > memory to the shared memory?  I am not sure whether it is worth it, but
> > > my thought behind this point is that it would allow the AM to keep its
> > > local stats in any form (like storing a pointer in them) and
> > > serialize that while copying to the shared stats.  And later, when the
> > > shared stats are passed back to the AM, it can deserialize them into its
> > > local form and use them.
> > >
> >
> > You have a point, but after changing the gist index, we don't have any
> > current usage for indexes that need something like that. So, on one
> > side there is some value in having an API to copy the stats, but on
> > the other side without having clear usage of an API, it might not be
> > good to expose a new API for the same.   I think we can expose such an
> > API in the future if there is a need for the same.
> I agree with the point.  But, the current patch exposes an API for
> estimating the size for the statistics.  So IMHO, either we expose
> both APIs for estimating the size of the stats and copy the stats or
> none.  Am I missing something here?
>

I think the first one is a must as things stand today because
otherwise, we won't be able to copy the stats.  The second one (exposing
an API to copy the stats) is good to have, but there is no usage for it
immediately.  We could expose the second API considering future needs,
but as there is no valid case as of now, it will be difficult to test,
and we are also not sure whether any IndexAM will require
such an API in the future.
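
(Just to make the two APIs concrete, they would be roughly of this shape;
the callback names and their exact placement are invented here purely for
illustration, they are not what the patch actually adds:

	/* 1. estimate how much shared memory the index's stats need */
	Size		(*amestimatestatssize) (Relation indexRelation);

	/* 2. copy/serialize the local stats into that shared area */
	void		(*amcopystats) (IndexBulkDeleteResult *stats, void *shared_dest);

The first is the kind of thing the patch already needs; the second is the one
with no immediate user.)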

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: SimpleLruTruncate() mutual exclusion

2019-11-03 Thread Thomas Munro
On Thu, Aug 1, 2019 at 6:51 PM Noah Misch  wrote:
> vac_truncate_clog() instance 1 starts, considers segment ABCD eligible to 
> unlink
> vac_truncate_clog() instance 2 starts, considers segment ABCD eligible to 
> unlink
> vac_truncate_clog() instance 1 unlinks segment ABCD
> vac_truncate_clog() instance 1 calls SetTransactionIdLimit()
> vac_truncate_clog() instance 1 finishes
> some backend calls SimpleLruZeroPage(), creating segment ABCD
> vac_truncate_clog() instance 2 unlinks segment ABCD
>
> Serializing vac_truncate_clog() fixes that.

I've wondered before (in a -bugs thread[1] about unexplained pg_serial
wraparound warnings) if we could map 64 bit xids to wide SLRU file
names that never wrap around and make this class of problem go away.
Unfortunately multixacts would need 64 bit support too...

[1] 
https://www.postgresql.org/message-id/flat/CAEBTBzuS-01t12GGVD6qCezce8EFD8aZ1V%2Bo_3BZ%3DbuVLQBtRg%40mail.gmail.com




Re: fe-utils - share query cancellation code

2019-11-03 Thread Thomas Munro
On Sat, Nov 2, 2019 at 10:38 AM Fabien COELHO  wrote:
> Attached patch v4 does it.

Hi Fabien,

It looks like sigint_interrupt_jmp and sigint_interrupt_enabled are not
defined on Windows, yet they are still declared and
referenced?

command.obj : error LNK2001: unresolved external symbol
sigint_interrupt_enabled [C:\projects\postgresql\psql.vcxproj]
copy.obj : error LNK2001: unresolved external symbol
sigint_interrupt_enabled [C:\projects\postgresql\psql.vcxproj]
input.obj : error LNK2001: unresolved external symbol
sigint_interrupt_enabled [C:\projects\postgresql\psql.vcxproj]
mainloop.obj : error LNK2001: unresolved external symbol
sigint_interrupt_enabled [C:\projects\postgresql\psql.vcxproj]
command.obj : error LNK2001: unresolved external symbol
sigint_interrupt_jmp [C:\projects\postgresql\psql.vcxproj]
copy.obj : error LNK2001: unresolved external symbol
sigint_interrupt_jmp [C:\projects\postgresql\psql.vcxproj]
mainloop.obj : error LNK2001: unresolved external symbol
sigint_interrupt_jmp [C:\projects\postgresql\psql.vcxproj]
.\Release\psql\psql.exe : fatal error LNK1120: 2 unresolved externals
[C:\projects\postgresql\psql.vcxproj]
0 Warning(s)

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.64074
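
Presumably the definitions just need to remain visible to the Windows build as
well (or the references need to be guarded).  For reference, in psql today they
are simply the following, modulo wherever the patch moves them:

	volatile bool	sigint_interrupt_enabled = false;

	sigjmp_buf	sigint_interrupt_jmp;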




Re: Log statement sample - take two

2019-11-03 Thread Tomas Vondra

On Sat, Oct 19, 2019 at 05:02:01PM +0200, Adrien Nayrat wrote:

Hello,

This patch proposes a new way to sample statements for logging.

As a reminder, this feature was committed in PG12[1] and then reverted[2] after
the proposal of log_statement_sample_limit[3].

The first implementation added a new GUC to sample statements logged by
log_min_duration_statement. Then we wanted to add the ability to log all
statements whose duration exceeds log_statement_sample_limit.

This was confusing because log_min_duration behaves as a minimum to enable
sampling, while log_statement_sample_limit behaves as a maximum to disable it.[4]

Tomas Vondra proposed to use two minimum thresholds:


1) log_min_duration_sample - enables sampling of commands, using the
existing GUC log_statement_sample_rate

2) log_min_duration_statement - logs all commands exceeding this


This patch implements this idea.

PS: I noticed I forgot to mention "Only superusers can change this setting" in
the log_transaction_sample_rate documentation. I attached a second patch to fix
this.



Seems fine to me, mostly. I think the docs should explain how
log_min_duration_statement interacts with log_min_duration_sample.
Attached is a patch doing that, by adding one para to each GUC, along
with some minor rewordings. I think the docs are mixing "sampling"
vs. "logging" and "durations" vs. "statements", but I'm not sure.

I also think the two new sampling GUCs (log_min_duration_sample and
log_statement_sample_rate) should be next to each other. We're not
ordering the GUCs alphabetically anyway.

I plan to make those changes and push in a couple days.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
From 185caaf3b1956300d87bdf48cdd91a7f89b515c3 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 4 Nov 2019 02:00:26 +0100
Subject: [PATCH 1/2] fix log_min_transaction_sample_rate docs

---
 doc/src/sgml/config.sgml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0191ec84b1..48d7939d2d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5988,6 +5988,7 @@ local0.*/var/log/postgresql
  logs all statements for all transactions.
  log_transaction_sample_rate is helpful to track a
  sample of transaction.
+ Only superusers can change this setting.
 

 
-- 
2.21.0

From ca4a8b558742f247e3c9e5d388dacd4ef918e154 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 4 Nov 2019 01:57:45 +0100
Subject: [PATCH 2/2] log_min_duration_sample rework

---
 doc/src/sgml/config.sgml  | 85 +++
 src/backend/tcop/postgres.c   | 40 ++---
 src/backend/utils/misc/guc.c  | 27 +-
 src/backend/utils/misc/postgresql.conf.sample | 11 +++
 src/include/utils/guc.h   |  2 +
 5 files changed, 153 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 48d7939d2d..e5372d933c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5930,6 +5930,55 @@ local0.*/var/log/postgresql
   
  
 
+ 
+  log_min_duration_sample (integer)
+  
+   log_min_duration_sample configuration 
parameter
+  
+  
+   
+
+ Allows to sample the logging of the duration of each completed
+ statement if the statement ran for at least the specified amount of
+ time. If this value is specified without units, it is taken as 
milliseconds.
+ Setting this to zero samples all statement durations.
+ Minus-one (the default) disables sampling statement durations.
+ For example, if you set it to 250ms
+ then all SQL statements that run 250ms or longer will be considered
+ for sampling, with sample rate is controlled by . 
+ Enabling this parameter can be helpful when the traffic too high to
+ sample all queries.
+ Only superusers can change this setting.
+
+
+
+ This option has lower priority than ,
+ which means setting this to a value higher than 
+ means the sample rate is ignored and all queries will be logged.
+
+
+
+ For clients using extended query protocol, durations of the Parse,
+ Bind, and Execute steps are logged independently.
+
+
+   
+
+ When using this option together with
+ ,
+ the text of statements that are logged because of
+ log_statement will not be repeated in the
+ duration log message.
+ If you are not using syslog, it is 
recommended
+ that you log the PID or session ID using
+ 
+ so that you can link the statement message to the later
+ duration message using the process ID or session ID.
+
+   
+   
+ 
+
 

Re: let's make the list of reportable GUCs configurable (was Re: Add %r substitution for psql prompts to show recovery status)

2019-11-03 Thread Thomas Munro
On Wed, Oct 16, 2019 at 6:49 PM Dave Cramer  wrote:
> Here's an updated patch that addresses some of Andres' concerns specifically 
> does not use strtok.

Hi Dave,

I think you need to s/strncasecmp/pg_strncasecmp/ to make this build on Windows.

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.63963




Re: Getting psql to redisplay command after \e

2019-11-03 Thread Tom Lane
Fabien COELHO  writes:
>> I did experiment with trying to do that, but I couldn't get it to work, 
>> even with the single version of libreadline I had at hand.  It appears 
>> to me that readline() starts by clearing the internal buffer.  Even if 
>> we could persuade it to work in a particular readline version, I think 
>> the odds of making it portable across all the libreadline and libedit 
>> versions that are out there aren't very good.  And there's definitely no 
>> chance of being remotely compatible with that behavior when using the 
>> bare tty drivers (psql -n).

> This suggests that readline cannot be used to edit simply a known string? 
> :-( "rl_insert_text" looked promising, although probably not portable, and 
> I tried to make it work without much success anyway. Maybe I'll try to 
> investigate more deeply later.

I think that rl_insert_text and friends can probably only be used from
readline callback functions.  So in principle maybe you could make it
work by having an rl_startup_hook that injects text if there is any
to inject.  There would remain the issues of (a) is it portable across
a wide range of readline and libedit versions, (b) will the prompting
behavior be nice, and (c) do we really want this to work fundamentally
differently when readline is turned off?
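
Roughly along these lines, I suppose (untested sketch; pending_text is just a
placeholder for wherever \e would stash the edited query):

	static char *pending_text = NULL;

	static int
	insert_pending_text(void)
	{
		if (pending_text)
		{
			rl_insert_text(pending_text);
			pending_text = NULL;
		}
		return 0;
	}

	...
	rl_startup_hook = insert_pending_text;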

(Pavel's code cited nearby seems to me to be a fine example of what
we do *not* want to do.  Getting in bed with libreadline to that
extent is inevitably going to misbehave in some places.)

>> In practice, if you decide that you don't like what you're looking at,
>> you're probably going to go back into the editor to fix it, ie issue
>> another \e.  So I'm not sure that it's worth such pushups to get the
>> data into readline's buffer.

> For me \e should mean edit, not edit-and-execute, so I should be back at the
> prompt, which is the crux of my unease with how the feature behaves,
> because it combines two functions that IMO shouldn't be combined.

I don't understand that complaint at all.  My proposed patch does not
change the behavior to force execution, and it does display a prompt ---
one that reflects whether you've given a complete command.

regards, tom lane




Re: Parallel leader process info in EXPLAIN

2019-11-03 Thread Thomas Munro
On Mon, Nov 4, 2019 at 12:11 PM Thomas Munro  wrote:
> I guess I thought of that as a debugging feature and took it out
> because it was too verbose, but maybe it just needs to be controlled
> by the VERBOSE switch.  Do you think we should put that back?

By which I mean: would you like to send a patch?  :-)

Here is a new version of the "Leader:" patch, because cfbot told me
that gcc didn't like it as much as clang.


0001-Show-parallel-leader-stats-in-EXPLAIN-output-v2.patch
Description: Binary data


0002-Improve-EXPLAIN-of-Sort-in-parallel-queries-v2.patch
Description: Binary data


Re: Parallel leader process info in EXPLAIN

2019-11-03 Thread Thomas Munro
On Thu, Oct 31, 2019 at 5:24 AM Melanie Plageman
 wrote:
> On Wed, Oct 23, 2019 at 12:30 AM Thomas Munro  wrote:
>> Of course there are some more things that could be reported in a
>> similar way eventually, such as filter counters and hash join details.
>
> This made me think about other explain wishlist items.
> For parallel hashjoin, I would find it useful to know which batches
> each worker participated in (maybe just probing to start with, but
> loading would be great too).
>
> I'm not sure anyone else (especially users) would care about this,
> though.

Yeah, I think that'd be interesting.  At some point in the patch set
when I was working on the batch load balancing strategy I showed the
number of tuples hashed and number of batches probed by each process
(not the actual batch numbers, since that seems a bit over the top):

https://www.postgresql.org/message-id/CAEepm%3D0th8Le2SDCv32zN7tMyCJYR9oGYJ52fXNYJz1hrpGW%2BQ%40mail.gmail.com

I guess I thought of that as a debugging feature and took it out
because it was too verbose, but maybe it just needs to be controlled
by the VERBOSE switch.  Do you think we should put that back?




Re: Consolidate 'unique array values' logic into a reusable function?

2019-11-03 Thread Thomas Munro
On Tue, Sep 10, 2019 at 11:43 AM Thomas Munro  wrote:
> Rebased due to bitrot.  Spotted one more place to use this, in
> src/backend/utils/adt/txid.c.

Rebased.  I'm planning to commit this soon.


0001-Consolidate-code-that-makes-a-sorted-array-unique-v3.patch
Description: Binary data


Re: Should we add xid_current() or a int8->xid cast?

2019-11-03 Thread Thomas Munro
On Tue, Oct 29, 2019 at 5:23 PM btfujiitkp  wrote:
> > Thomas Munro  writes:
> >> On Sun, Sep 1, 2019 at 5:04 PM Thomas Munro 
> >> wrote:
> >>> Adding to CF.
> >
> >> Rebased.  An OID clashed so re-roll the dice.  Also spotted a typo.
> >
>
> I have some questions in this code.

Thanks for looking at the patch.

> First,
> "FullTransactionIdPrecedes(xmax, val)" is not equal to "val >= xmax" of
> the previous code.  "FullTransactionIdPrecedes(xmax, val)" expresses
> "val > xmax". Is it all right?
>
> @@ -384,15 +324,17 @@ parse_snapshot(const char *str)
> while (*str != '\0')
> {
> /* read next value */
> -   val = str2txid(str, );
> +   val = FullTransactionIdFromU64(pg_strtouint64(str, , 
> 10));
> str = endp;
>
> /* require the input to be in order */
> -   if (val < xmin || val >= xmax || val < last_val)
> +   if (FullTransactionIdPrecedes(val, xmin) ||
> +   FullTransactionIdPrecedes(xmax, val) ||
> +   FullTransactionIdPrecedes(val, last_val))
>
> In addition, as to the current TransactionId (not FullTransactionId)
> comparison, when we express ">=" for TransactionId we use
> "TransactionIdFollowsOrEquals", and this method is referred to in several places.
> On the other hand, FullTransactionIdFollowsOrEquals has not been implemented
> yet. So, how about implementing this method?

Good idea.  I added the missing variants:

+#define FullTransactionIdPrecedesOrEquals(a, b) ((a).value <= (b).value)
+#define FullTransactionIdFollows(a, b) ((a).value > (b).value)
+#define FullTransactionIdFollowsOrEquals(a, b) ((a).value >= (b).value)

> Second,
> about the naming rule: the "8" of xid8 means 8 bytes, but "8" has a different
> meaning in each situation. For example, int8 in PostgreSQL means 8
> bytes, while int8 in C means 8 bits. If 64 is used, it just means 64
> bits. How about xid64()?

In C, the typenames use bits, by happy coincidence similar to the C99
stdint.h typenames (int32_t etc) that we should perhaps eventually
switch to.

In SQL, the types have names based on the number of bytes: int2, int4,
int8, float4, float8, not conforming to any standard but established
over 3 decades ago and also understood by a few other SQL systems.

That's unfortunate, but I can't see that ever changing.  I thought
that it would make most sense for the SQL type to be called xid8,
though admittedly it doesn't quite fit the pattern because xid is not
called xid4.  There is another example a bit like that: macaddr (6
bytes) and macaddr8 (8 bytes).  As for the C type, we use
TransactionId and FullTransactionId (rather than, say, xid32 and
xid64).

In the attached I also took Tom's advice and used unused_oids script
to pick random OIDs >= 8000 for all new objects (ignoring nearby
comments about the range of OIDs used in different sections of the
file).


0001-Add-SQL-type-xid8-to-expose-FullTransactionId-to--v3.patch
Description: Binary data


0002-Introduce-xid8-variants-of-the-txid_XXX-fmgr-func-v3.patch
Description: Binary data


Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes

2019-11-03 Thread Tom Lane
[ redirecting to -hackers ]

I wrote:
> Yeah, it seems like a bad idea to override the user's choice to write
> a quote, even if one is not formally necessary.  I propose the attached.

After further experimentation, I concluded that that patch is a bad idea;
it breaks a lot of cases that used to work before.  It turns out that
Readline has a bunch of behaviors for filename completion that occur
outside of the rl_filename_completion_function function proper, and they
all assume that what's passed back from that function is plain unquoted
filename(s).  Notably, completion of a path that includes directory names
just doesn't work well at all anymore with that patch ... nor did it
work well before, if the path contained characters that we thought we
should quote.

The right way to do things, seemingly, is to let
rl_filename_completion_function be invoked without any interference,
and instead put our SQL-aware quoting/dequoting logic into the hooks
Readline provides for that purpose, rl_filename_quoting_function and
rl_filename_dequoting_function.  (It appears that somebody tried to do
that before, way back around the turn of the century, but gave up on it.
Too bad, because it's the right thing.)
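
For reference, the wiring amounts to something like this (the two psql-side
function names are invented here; see the patch for the real ones):

	/* let Readline call back into SQL-aware quoting/dequoting */
	rl_filename_quoting_function = psql_quote_filename;		/* wraps in '...' */
	rl_filename_dequoting_function = psql_dequote_filename;	/* strips it */
	rl_completer_quote_characters = "'\"";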

Of course, this only works if we have those hooks :-(.  So far as
I can tell, all libreadline variants that might still exist in the wild
do have them; but libedit doesn't, at least not in the version Apple
is shipping.  Hence, what the attached patch does is to make configure
probe for the existence of the hook variables; if they're not there,
fall back to what I proposed previously.  The behavior on libedit is
a bit less nice than one could wish, but it's better than what we have
now.

I've tested this on the oldest and newest readline versions I have at
hand (4.2a and 8.0), as well as the oldest and newest versions of
Apple's libedit derivative; but I haven't tried it on whatever the
BSDen are shipping as libedit.

There's enough moving parts here that this probably needs to go through
a full review cycle, so I'll add it to the next commitfest.  Some notes
for anybody wanting to review:

* The patch now always quotes completed filenames, so quote_if_needed()
is misnamed and overcomplicated for this use-case.  I left the extra
generality in place for possible future use.  On the other hand, this
is the *only* use-case, so you could also argue that we should just
simplify the function's API.  I have no strong opinion about that.

* In addition to the directly-related-to-filenames changes, it turns out
to be necessary to set rl_completer_quote_characters to include at least
single quotes, else Readline doesn't understand that a quoted filename
is quoted.  The patch sets it to include double quotes as well.  This
is probably the aspect of the patch that most needs testing.  The general
effect of this change is that Readline now understands that quoted
strings are single entities, plus it will try to complete the contents
of a string if you ask it.  The side-effects I've noticed seem to be
all beneficial -- for example, if you do

select * from "foo

it now correctly completes table names starting with "foo", which it
did not before.  But there might be some bad effects as well.  Also,
although libedit has this variable, setting it doesn't have that effect
there; I've not really found that the variable does anything at all there.

* The behavior of quote_file_name is directly modeled on what Readline's
default implementation rl_quote_filename does, except that it uses
SQL-aware quoting rules.  The business of passing back the final quote
mark separately is their idea.

* An example of the kind of case that's interesting is that if you type

\lo_import /usr/i

then what you get on readline (with this patch) is

\lo_import '/usr/include/

while libedit produces

\lo_import '/usr/include' (with a space after the trailing quote)

That is, readline knows that the completion-so-far is a directory and
assumes that's not all you want, whereas libedit doesn't know that.
So you typically now have to back up two characters, type slash, and
resume completing.  That's kind of a pain, but I'm not sure we can
make it better very easily.  Anyway, libedit did something close to
that before, too.

* There are also some interesting cases around special characters in
the filename.  It seems to work well for embedded spaces, not so well
for embedded single quotes, though that may well vary across readline
versions.  Again, there seems to be a limited amount we can do about
that, given how much of the relevant logic is buried where we can't
modify it.  And I'm not sure how much I care about that case, anyway.

regards, tom lane

diff --git a/config/programs.m4 b/config/programs.m4
index 90ff944..e5bf980 100644
--- a/config/programs.m4
+++ b/config/programs.m4
@@ -209,11 +209,12 @@ fi
 
 
 
-# PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER
-# ---
+# PGAC_READLINE_VARIABLES
+# 

Re: [PATCH] Include triggers in EXPLAIN

2019-11-03 Thread Tom Lane
Josef Šimánek  writes:
> Recently I got a few times into a situation where I was trying to find out
> what was blocking DELETE queries. Running EXPLAIN (even the VERBOSE one)
> wasn't useful, since the reason was a slow trigger (missing index on a
> foreign key column). I had to create a testing entry and run EXPLAIN
> ANALYZE DELETE to get this information.

> It would be really valuable for me to show triggers in EXPLAIN output,
> since it would make clear whether any trigger will be "activated" during
> execution of the DELETE query, and that can be the reason for a slow DELETE.

I don't really see the point of this patch?  You do get the trigger
times during EXPLAIN ANALYZE, and I don't believe that a plain EXPLAIN
is going to have full information about what triggers might fire or
not fire.

regards, tom lane




Re: 64 bit transaction id

2019-11-03 Thread Thomas Munro
On Sat, Nov 2, 2019 at 6:15 AM Tomas Vondra
 wrote:
> On Fri, Nov 01, 2019 at 12:05:12PM +0300, Павел Ерёмин wrote:
> >   Hi.
> >   sorry for my English.
> >   I want to once again open the topic of 64 bit transaction ids. I did
> >   not manage to find in the archives the option that I want to discuss,
> >   so I am writing. If I searched poorly, then please forgive me.
> >   The idea is not very original and probably has already been considered;
> >   again I repeat - I did not find it. Therefore, please do not scold me
> >   severely.
> >   In discussions of 64-bit transaction ids, I did not find mention of an
> >   algorithm for storing them as it was done, for example, in MS SQL Server.
> >   What if, instead of 2 fields (xmin and xmax) with a total length of 64
> >   bits, we used 1 field (let's call it xid) with a length of 64 bits in the
> >   tuple header, and stored in it the xid of the transaction that created
> >   the version? In that case a new transaction, in order to understand
> >   whether the version it reads is suitable for it or not, would have to
> >   read the next version as well. That is, the downside of such a decision
> >   is of course an increase in I/O: transactions would have to read one
> >   additional version. On the plus side, the tuple header remains the same
> >   length.
> >
>
> I think that assumes we can easily identify the next version of a tuple,
> and I don't think we can do that. We may be able to do that for HOT
> chains, but that only works when the next version fits onto the same
> page (and does not update indexed columns). But when we store the new
> version on a separate page, we don't have any link between those tuples.
> And adding it may easily mean more overhead than the 8B we'd save by
> only storing a single XID.
>
> IMO the most promising solution to this is the "page epoch" approach
> discussed some time ago (1-2 years?).

There have been so many discussions of this topic that it's hard to search for.

Since we have in fact begun to take some baby steps towards using 64
bit transaction IDs in a few places, I decided to create a new wiki
page to try to keep track of the various discussions.  If you know
where to find the 'best' discussions (for example the one where, if I
recall correctly, it was Heikki who proposed a 'reference'
FullTransactionId on the page header) and any proposals that came with
patches, then I'd be grateful if you could add links to them to this
wiki page!

https://wiki.postgresql.org/wiki/FullTransactionId




function calls optimization

2019-11-03 Thread Andrzej Barszcz
Hi

The main goal of this patch is to avoid repeated calls of immutable/stable
functions.
This patch is against version 10.10.
I guess the same logic could be implemented up to version 12.
--- src/include/nodes/execnodes.h	2019-08-05 23:16:54.0 +0200
+++ src/include/nodes/execnodes.h	2019-11-03 20:05:34.338305825 +0100
@@ -882,6 +883,39 @@ typedef struct PlanState
 	TupleTableSlot *ps_ResultTupleSlot; /* slot for my result tuples */
 	ExprContext *ps_ExprContext;	/* node's expression-evaluation context */
 	ProjectionInfo *ps_ProjInfo;	/* info for doing tuple projection */
+#ifdef OPTFUNCALLS
+	/* was_called - list of  ExprEvalStep* or FuncExpr* depending on execution stage
+	 * 
+	 * Stage I. ExecInitExprRec()
+	 *	List gathers all not volatile, not set returning, not window FuncExpr*,
+	 *	equal nodes occupy one position in the list. Position in this list ( counting from 1 )
+	 *	and planstate are remembered in actual ExprEvalStep*
+	 *
+	 * 	For query: select f(n),f(n) from t  - was_called->length will be 1 and ptr_value 
+	 *		 will be FuncExpr* node of f(n)
+	 *
+	 * 	For query: select f(n),g(n),f(n) from t - list->length == 2
+	 *
+	 * Stage II. ExecProcnode()
+	 *	For every planstate the was_called list changes its interpretation - from now on
+	 *	it is a list of ExprEvalStep*. Before executing the real execProcnode,
+	 *	every element of this list ( ptr_value ) is set to NULL. We don't know which
+	 *	function will be called first.
+	 *
+	 * Stage III. ExecInterpExpr() case EEOP_FUNCEXPR
+	 *	ExprEvalStep.position > 0 means that planstate->was_called may hold either
+	 *	an ExprEvalStep* that has already been executed, or NULL.
+	 *
+	 *	NULL means that the eval step is entered for the first time and:
+	 *		1. the real function must be called
+	 *		2. the ExprEvalStep has to be remembered in planstate->was_called at
+	 *		position step->position - 1
+	 *
+	 *	NOT NULL means that planstate->was_called holds an ExprEvalStep* with a
+	 *	ready result, so there is no need to call the function again
+	 */
+	List *was_called;
+#endif
 } PlanState;

 /* 
--- src/include/executor/execExpr.h	2019-08-05 23:16:54.0 +0200
+++ src/include/executor/execExpr.h	2019-11-03 20:04:03.739025142 +0100
@@ -561,6 +561,10 @@ typedef struct ExprEvalStep
 			AlternativeSubPlanState *asstate;
 		}			alternative_subplan;
 	}			d;
+#ifdef OPTFUNCALLS
+	PlanState *planstate;	/* parent PlanState for this expression */
+	int position;		/* position in planstate->was_called counted from 1 */
+#endif
 } ExprEvalStep;


--- src/backend/executor/execProcnode.c	2019-08-05 23:16:54.0 +0200
+++ src/backend/executor/execProcnode.c	2019-11-03 19:54:28.071672386 +0100
@@ -120,6 +120,17 @@
 static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
 static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
 
+#ifdef OPTFUNCALLS
+static TupleTableSlot *execReal(PlanState *node)
+{
+	/* Before each scan step, node->was_called elements must be set to NULL */
+	ListCell *item;
+	foreach(item,node->was_called)
+		item->data.ptr_value = NULL;
+
+	return node->ExecProcNodeReal(node);
+}
+#endif

 /* 
  *		ExecInitNode
@@ -425,8 +436,11 @@ ExecProcNodeFirst(PlanState *node)
 	if (node->instrument)
 		node->ExecProcNode = ExecProcNodeInstr;
 	else
+#ifndef OPTFUNCALLS
 		node->ExecProcNode = node->ExecProcNodeReal;
-
+#else
+		node->ExecProcNode = execReal;
+#endif
 	return node->ExecProcNode(node);
 }

@@ -442,9 +456,11 @@ ExecProcNodeInstr(PlanState *node)
 	TupleTableSlot *result;
 
 	InstrStartNode(node->instrument);
-
+#ifndef OPTFUNCALLS
 	result = node->ExecProcNodeReal(node);
-
+#else
+	result = execReal(node);
+#endif
 	InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);

 	return result;
--- src/backend/executor/execExpr.c	2019-08-05 23:16:54.0 +0200
+++ src/backend/executor/execExpr.c	2019-11-03 19:57:21.994249398 +0100
@@ -45,7 +45,13 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/typcache.h"
-
+#ifdef OPTFUNCALLS
+#include "catalog/pg_proc.h"
+#include "utils/syscache.h"
+#include "access/htup_details.h"
+static bool isNotVolatile(Oid funcid);
+static int findFuncExpr(FuncExpr* node,PlanState* parent);
+#endif

 typedef struct LastAttnumInfo
 {
@@ -806,7 +812,40 @@ ExecInitExprRec(Expr *node, PlanState *p
ExecInitFunc(&scratch, node,
 			 func->args, func->funcid, func->inputcollid,
 			 parent, state);
+#ifdef OPTFUNCALLS
+scratch.position = 0;
+scratch.planstate = parent;
+if( parent )
+{
+	/* Build/extend the list of non volatile functions for this PlanState node.
+	 * Try to find func ( equal node ) in parent->was_called list at first
+	 */
+	int pos = findFuncExpr(func,parent);
+	if( !pos && isNotVolatile(func->funcid ))
+	{
+		/* Function is not in the list yet but it's maybe useful for optimizing
+		 * repeated calls -  register function in parent and remember its position
+		 

Re: Getting psql to redisplay command after \e

2019-11-03 Thread Pavel Stehule
On Sun, 3 Nov 2019 at 20:58, Fabien COELHO  wrote:

>
> Hello Tom,
>
> >> I was suggesting something much simpler than rethinking readline
> handling.
> >> Does not mean that it is a good idea, but while testing the patch I
> would
> >> have liked the unfinished line to be in the current editing buffer,
> >> basically as if I had not typed .
> >
> > I did experiment with trying to do that, but I couldn't get it to work,
> > even with the single version of libreadline I had at hand.  It appears
> > to me that readline() starts by clearing the internal buffer.  Even if
> > we could persuade it to work in a particular readline version, I think
> > the odds of making it portable across all the libreadline and libedit
> > versions that are out there aren't very good.  And there's definitely no
> > chance of being remotely compatible with that behavior when using the
> > bare tty drivers (psql -n).
>
> Argh, too bad.
>
> This suggests that readline cannot be used to edit simply a known string?
> :-( "rl_insert_text" looked promising, although probably not portable, and
> I tried to make it work without much success anyway. Maybe I'll try to
> investigate more deeply later.
>

pspg uses rl_insert_text

https://github.com/okbob/pspg/blob/59d115cd55926ab1886fc0dedbbc6ce0577b0cb3/src/pspg.c#L2522
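
For illustration, the basic pattern looks roughly like this (a minimal
sketch assuming GNU readline; the names are invented and this is neither
pspg's nor psql's actual code - whether libedit honours rl_startup_hook
is exactly the portability question here):

#include <stdio.h>
#include <readline/readline.h>

static const char *pending_initial_text = NULL;

/* one-shot startup hook: pre-load the edit buffer before the prompt */
static int
preload_buffer_hook(void)
{
	if (pending_initial_text != NULL)
	{
		rl_insert_text(pending_initial_text);
		pending_initial_text = NULL;
	}
	rl_startup_hook = NULL;		/* fire only once */
	return 0;
}

static char *
readline_with_initial_text(const char *prompt, const char *initial)
{
	pending_initial_text = initial;
	rl_startup_hook = preload_buffer_hook;
	return readline(prompt);
}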

Pavel


> Note that replacing the current buffer is exactly what history does. So
> maybe that could be exploited by appending the edited line into history
> (add_history) and tell readline to move there (could not find how to do
> that automatically, though)? Or some other history handling…
>
> > In practice, if you decide that you don't like what you're looking at,
> > you're probably going to go back into the editor to fix it, ie issue
> > another \e.  So I'm not sure that it's worth such pushups to get the
> > data into readline's buffer.
>
> For me \e should mean edit, not edit-and-execute, so I should be back to
> prompt, which is the crux of my unease with how the feature behaves,
> because it combines two functions that IMO shouldn't.
>
> Anyway the submitted patch is an improvement to the current status.
>
> --
> Fabien.


Re: Getting psql to redisplay command after \e

2019-11-03 Thread Fabien COELHO


Hello Tom,


>> I was suggesting something much simpler than rethinking readline handling.
>> Does not mean that it is a good idea, but while testing the patch I would
>> have liked the unfinished line to be in the current editing buffer,
>> basically as if I had not typed .


> I did experiment with trying to do that, but I couldn't get it to work,
> even with the single version of libreadline I had at hand.  It appears
> to me that readline() starts by clearing the internal buffer.  Even if
> we could persuade it to work in a particular readline version, I think
> the odds of making it portable across all the libreadline and libedit
> versions that are out there aren't very good.  And there's definitely no
> chance of being remotely compatible with that behavior when using the
> bare tty drivers (psql -n).


Argh, too bad.

This suggests that readline cannot be used to edit simply a known string? 
:-( "rl_insert_text" looked promising, although probably not portable, and 
I tried to make it work without much success anyway. Maybe I'll try to 
investigate more deeply later.


Note that replacing the current buffer is exactly what history does. So 
maybe that could be exploited by appending the edited line into history 
(add_history) and tell readline to move there (could not find how to do 
that automatically, though)? Or some other history handling…
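
For what it's worth, the add_history part itself is trivial (sketch only,
names invented, not tested against libedit); it is the automatic jump onto
that history entry that I could not find a portable call for:

#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>

/* Sketch: make the edited-but-unsent text reachable with one Up-arrow. */
static void
stash_edited_buffer(const char *edited)
{
	if (edited != NULL && *edited != '\0')
		add_history(edited);
}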



> In practice, if you decide that you don't like what you're looking at,
> you're probably going to go back into the editor to fix it, ie issue
> another \e.  So I'm not sure that it's worth such pushups to get the
> data into readline's buffer.


For me \e should mean edit, not edit-and-execute, so I should be back to 
prompt, which is the crux of my unease with how the feature behaves, 
because it combines two functions that IMO shouldn't.


Anyway the submitted patch is an improvement to the current status.

--
Fabien.

[PATCH] Include triggers in EXPLAIN

2019-11-03 Thread Josef Šimánek
Hello!

Recently I got a few times into a situation where I was trying to find out
what was blocking DELETE queries. Running EXPLAIN (even the VERBOSE one)
wasn't useful, since the reason was a slow trigger (missing index on a
foreign key column). I had to create a testing entry and run EXPLAIN ANALYZE
DELETE to get this information.

It would be really valuable for me to show triggers in EXPLAIN output, since
it would make clear whether any trigger will be "activated" during execution
of the DELETE query, and that can be the reason for a slow DELETE.

I have seen the initial discussion at
https://www.postgresql.org/message-id/flat/20693.732761%40sss.pgh.pa.us
to show time spent in triggers in EXPLAIN ANALYZE, including a quick
discussion about possibly showing triggers during plain EXPLAIN. Anyway,
since it doesn't show any additional cost and just informs about the
possibilities, I still consider this feature useful. This is probably an
implementation of the idea mentioned at
https://www.postgresql.org/message-id/21221.736869%40sss.pgh.pa.us by
Tom Lane.

After an initial discussion with Pavel Stěhule and Tomáš Vondra on the Czech
PostgreSQL mailing list (
https://groups.google.com/forum/#!topic/postgresql-cz/Dq1sT7huVho) I was
able to prepare an initial version of this patch. I have added an EXPLAIN
option called TRIGGERS, enabled by default. There's already an auto_explain
property for this. I understand it is not possible to show only the triggers
which will really be activated unless the query is actually executed.
EXPLAIN ANALYZE remains untouched by this patch.

- patch with examples can be found at
https://github.com/simi/postgres/pull/2
- DIFF format https://github.com/simi/postgres/pull/2.diff
- PATCH format (also attached) https://github.com/simi/postgres/pull/2.patch

All regression tests passed with this change locally on latest git master.
I would like to cover this patch with more regression tests, but I wasn't
sure where to place them, since there's no "EXPLAIN" related test "group".
Is "src/test/regress/sql/triggers.sql" the best place to add tests related
to this change?

PS: This is my first try to contribute to postgresql codebase. The quality
of patch is probably not the best, but I will be more than happy to do any
requested change if needed.

Regards,
Josef Šimánek
From 18578e3d07aa159631e0903abae496a6482fa279 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Sat, 27 Jul 2019 09:27:41 +0200
Subject: [PATCH] Show possible triggers in EXPLAIN.

---
 doc/src/sgml/ref/explain.sgml |  9 
 src/backend/commands/explain.c| 43 +++
 src/include/commands/explain.h|  1 +
 src/test/regress/expected/foreign_key.out |  6 ++-
 src/test/regress/expected/insert_conflict.out |  4 +-
 src/test/regress/expected/updatable_views.out | 10 -
 6 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 385d10411fa0..ba0d8df88c19 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -43,6 +43,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] statementboolean ]
 TIMING [ boolean ]
 SUMMARY [ boolean ]
+TRIGGERS [ boolean ]
 FORMAT { TEXT | XML | JSON | YAML }
 
  
@@ -224,6 +225,14 @@ ROLLBACK;
 

 
+   
+TRIGGERS
+
+ 
+   TODO
+ 
+
+   

 FORMAT
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 62fb3434a32f..93845bcaad36 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -149,6 +149,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
 	ListCell   *lc;
 	bool		timing_set = false;
 	bool		summary_set = false;
+	bool		triggers_set = false;
 
 	/* Parse options list. */
 	foreach(lc, stmt->options)
@@ -175,6 +176,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
 			summary_set = true;
 			es->summary = defGetBoolean(opt);
 		}
+		else if (strcmp(opt->defname, "triggers") == 0)
+		{
+			triggers_set = true;
+			es->triggers = defGetBoolean(opt);
+		}
 		else if (strcmp(opt->defname, "format") == 0)
 		{
 			char	   *p = defGetString(opt);
@@ -210,6 +216,9 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
 	/* if the timing was not set explicitly, set default value */
 	es->timing = (timing_set) ? es->timing : es->analyze;
 
+	/* if the triggers was not set explicitly, set default value */
+	es->triggers = (triggers_set) ? es->triggers : true;
+
 	/* check that timing is used with EXPLAIN ANALYZE */
 	if (es->timing && !es->analyze)
 		ereport(ERROR,
@@ -556,7 +565,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	}
 
 	/* Print info about runtime of triggers */
-	if (es->analyze)
+	if (es->triggers)
 		ExplainPrintTriggers(es, queryDesc);
 
 	/*
@@ -911,17 +920,23 @@ report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es)
 {
 	

Re: Index Skip Scan

2019-11-03 Thread Dmitry Dolgov
> On Wed, Sep 25, 2019 at 2:33 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> v27-0001-Index-skip-scan.patch
>
> Some random thoughts on this:

Thanks a lot for the commentaries!

> * Is _bt_scankey_within_page() really doing the right thing within empty 
> pages?
>
> It looks like you're accidentally using the high key when the leaf
> page is empty with forward scans (assuming that the leaf page isn't
> rightmost). You'll need to think about empty pages for both forward
> and backward direction scans there.

Yes, you're right, that's an issue I need to fix.

> Actually, using the high key in some cases may be desirable, once the
> details are worked out -- the high key is actually very helpful with
> low cardinality indexes. If you populate an index with retail
> insertions (i.e. don't just do a CREATE INDEX after the table is
> populated), and use low cardinality data in the indexed columns then
> you'll see this effect.

Can you please elaborate a bit more? I see that using the high key will help
forward index scans to access the minimum number of leaf pages, but I'm not
following how it is connected to _bt_scankey_within_page. Or is this
commentary related to the whole implementation in general?

> * The extra scankeys that you are using in most of the new nbtsearch.c
> code is an insertion scankey -- not a search style scankey. I think
> that you should try to be a bit clearer on that distinction in
> comments. It is already confusing now, but at least there is only zero
> or one insertion scankeys per scan (for the initial positioning).
>
> * There are two _bt_skip() prototypes in nbtree.h (actually, there is
> a btskip() and a _bt_skip()). I understand that the former is a public
> wrapper of the latter, but I find the naming a little bit confusing.
> Maybe rename _bt_skip() to something that is a little bit more
> suggestive of its purpose.
>
> * Suggest running pgindent on the patch.

Sure, I'll incorporate mentioned improvements into the next patch
version (hopefully soon).

> And now some more:
>
> * I'm confused about this code in _bt_skip():
>
Yeah, it shouldn't be there, but rather before _bt_next, which expects an
unlocked buffer. Will fix.

> * It also seems a bit odd that you assume that the scan is
> "scan->xs_want_itup", but then check that condition many times. Why
> bother?
>
> * Similarly, why bother using _bt_drop_lock_and_maybe_pin() at all,
> rather than just unlocking the buffer directly? We'll only drop the
> pin for a scan that is "!scan->xs_want_itup", which is never the case
> within _bt_skip().
>
> I think that the macros and stuff that manage pins and buffer locks in
> nbtsearch.c is kind of a disaster anyway [1]. Maybe there is some
> value in trying to be consistent with existing nbtsearch.c code in
> ways that aren't strictly necessary.

Yep, I've seen this thread, but tried to be consistent with the
surrounding core style. Probably it indeed doesn't make sense.

> * Not sure why you need this code after throwing an error:
>
> > else
> > {
> > elog(ERROR, "Could not read closest index tuples: %d", 
> > offnum);
> > pfree(so->skipScanKey);
> > so->skipScanKey = NULL;
> > return false;
> > }

Unfortunately this is just a leftover from a previous version. Sorry for
that, will get rid of it.




Re: [HACKERS] proposal: schema variables

2019-11-03 Thread Pavel Stehule
On Thu, 10 Oct 2019 at 11:41, Pavel Stehule  wrote:

> Hi
>
> minor change - replace heap_tuple_fetch_attr by detoast_external_attr.
>
>
A similar update - heap_open and heap_close were replaced by table_open and
table_close.

Regards

Pavel


schema_variables-20191103.patch.gz
Description: application/gzip


Re: updating unaccent.rules for Arabic letters

2019-11-03 Thread Tom Lane
kerbrose khaled  writes:
> I would like to update the unaccent.rules file to support Arabic letters, so
> could someone help me or tell me how I could add such a contribution? I
> attached the file including the modifications, only the last 4 lines.

Hi!  I've got no objection to including Arabic in the set of covered
languages, but handing us a new unaccent.rules file isn't the way to
do it, because that's a generated file.  The adjacent script
generate_unaccent_rules.py generates it from the official Unicode
source data (see comments in that script).  What we need, ultimately,
is a patch to that script so it will emit these additional translations.
Past commits that might be useful sources of inspiration include

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=456e3718e7b72efe4d2639437fcbca2e4ad83099
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5e8d670c313531c0dca245943fb84c94a477ddc4
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ec0a69e49bf41a37b5c2d6f6be66d8abae00ee05

If you're not good with Python, maybe you could just explain to us
how to recognize these characters from Unicode character properties.

regards, tom lane




Re: [PATCH] contrib/seg: Fix PG_GETARG_SEG_P definition

2019-11-03 Thread Tom Lane
ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
> I just noticed that when contrib/seg was converted to V1 calling
> convention (commit 389bb2818f4), the PG_GETARG_SEG_P() macro got defined
> in terms of PG_GETARG_POINTER().  But it itself calls DatumGetPointer(),
> so shouldn't it be using PG_GETARG_DATUM()?

Yup, I agree.  Pushed.

regards, tom lane




Re: The flinfo->fn_extra question, from me this time.

2019-11-03 Thread Dent John
> On 3 Nov 2019, at 13:33, Pavel Stehule  wrote:
> 
> It would be nice if the patch had some regression tests - it is good for
> refreshing one's memory of what the target of the patch is.

With a suitably small work_mem constraint, it is possible to show the absence 
of buffers resulting from the tuplestore. It’ll need some commentary explaining 
what is being looked for, and why. But it’s a good idea.

I’ll take a look.

denty.



Re: The flinfo->fn_extra question, from me this time.

2019-11-03 Thread Pavel Stehule
Hi

On Sun, 3 Nov 2019 at 12:51, Dent John  wrote:

> (And here’s aforementioned attachment… doh.)
>

It would be nice if the patch had some regression tests - it is good for
refreshing one's memory of what the target of the patch is.

Regards

Pavel


Re: The flinfo->fn_extra question, from me this time.

2019-11-03 Thread Dent John
(And here’s aforementioned attachment… doh.)



pipeline-functionscan-v2.patch
Description: Binary data



Re: 64 bit transaction id

2019-11-03 Thread Павел Ерёмин
I completely agree with all of the above. Therefore, the proposed mechanism
may entail larger improvements (and not only in VACUUM).

I can offer the following solution. For VACUUM, create a hash table. When
VACUUM, scanning the table, sees that a version (tuple1) has t_ctid filled
and pointing to the address of tuple2, it creates a structure into which it
writes the address of tuple1, tuple1.xid, the length of tuple1 (well, and
other information that is needed), and puts this structure into the hash
table keyed by the address of tuple2. When VACUUM reaches tuple2, it looks
up the address of tuple2 in the hash table - if it finds it, it evaluates
the link between the two and makes a decision about cleaning.

regards

03.11.2019, 02:20, "Tomas Vondra" :

> On Sat, Nov 02, 2019 at 11:35:09PM +0300, Павел Ерёмин wrote:
> >   The proposed option is not much different from what it is now.
> >   We are not trying to save some space - we will reuse the existing one. We
> >   just work in 64 bit transaction counters. Correct me if I'm wrong - the
> >   address of the next version of the row is stored in the 6 byte field
> >   t_cid in the tuple header - which is not attached to the current page in
> >   any way - and can be stored anywhere in the table. Nothing changes.
>
> I think you mean t_ctid, not t_cid (which is a 4-byte CommandId, not any
> sort of item pointer).
>
> I think this comment from htup_details.h explains the issue:
>
>  * ... Beware however that VACUUM might
>  * erase the pointed-to (newer) tuple before erasing the pointing (older)
>  * tuple.  Hence, when following a t_ctid link, it is necessary to check
>  * to see if the referenced slot is empty or contains an unrelated tuple.
>  * Check that the referenced tuple has XMIN equal to the referencing tuple's
>  * XMAX to verify that it is actually the descendant version and not an
>  * unrelated tuple stored into a slot recently freed by VACUUM.  If either
>  * check fails, one may assume that there is no live descendant version.
>
> Now, imagine you have a tuple that gets updated repeatedly (say, 3x) and
> each version gets to a different page. Say, pages #1, #2, #3. And then
> VACUUM happens on the "middle" page (this may happen when trying
> to fit a new row onto a page to allow HOT, but it might happen even during
> regular VACUUM).
>
> So we started with 3 tuples on pages #1, #2, #3, but now we have this
>
>   #1 - tuple exists, points to tuple on page #2
>   #2 - tuple no longer exists, cleaned up by vacuum
>   #3 - tuple exists
>
> The scheme you proposed requires existence of all the tuples in the
> chain to determine visibility. When tuple #2 no longer exists, it's
> impossible to decide whether the tuple on page #1 is visible or not.
>
> This also significantly increases the amount of random I/O, pretty much
> by a factor of 2, because whenever you look at a row, you also have to
> look at the "next version" which may be on another page. That's pretty
> bad, both for I/O and cache hit ratio. I don't think that's a reasonable
> trade-off (at least compared to simply making the XIDs 64bit).
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: pglz performance

2019-11-03 Thread Tels

Hello Andrey,

On 2019-11-02 12:30, Andrey Borodin wrote:
On 1 Nov 2019, at 18:48, Alvaro Herrera  wrote:

PFA two patches:
v4-0001-Use-memcpy-in-pglz-decompression.patch (known as 'hacked' in
test_pglz extension)
v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch (known
as 'hacked8')


Looking at the patches, it seems only the case of a match is changed. 
But when we observe a literal byte, this is copied byte-by-byte with:


 else
  {
  /*
   * An unset control bit means LITERAL BYTE. So we just
   * copy one from INPUT to OUTPUT.
   */
  *dp++ = *sp++;
  }

Maybe we can optimize this, too. For instance, you could just increase a 
counter:


 else
  {
  /*
  * An unset control bit means LITERAL BYTE. We count
  * these and copy them later.
  */
  literal_bytes ++;
  }

and in the case of:

  if (ctrl & 1)
{
/* First copy all the literal bytes */
if (literal_bytes > 0)
  {
  memcpy( dp, sp, literal_bytes);
  sp += literal_bytes;
  dp += literal_bytes;
  literal_bytes = 0;
  }

(Code untested!)

The same would need to be done at the very end, if the input ends 
without any new CTRL-byte.
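
For what it's worth, here is a rough, untested sketch of the whole loop with
that batching (the tag decoding follows the pglz format as I understand it
from pg_lzcompress.c: 4-bit length, 12-bit offset, optional extension byte;
the error handling of the real code is trimmed).  Note that the pending run
also has to be flushed before reading each new control byte, since that byte
sits between the literals in the input, not only at a match and at the end:

#include <string.h>

static void
sketch_pglz_decompress(const unsigned char *sp, const unsigned char *srcend,
                       unsigned char *dp, unsigned char *destend)
{
	size_t		literal_bytes = 0;	/* pending literals start at sp */

	/* copy the pending run of literal bytes in one go */
#define FLUSH_LITERALS() \
	do { \
		if (literal_bytes > 0) \
		{ \
			memcpy(dp, sp, literal_bytes); \
			sp += literal_bytes; \
			dp += literal_bytes; \
			literal_bytes = 0; \
		} \
	} while (0)

	while (sp + literal_bytes < srcend && dp + literal_bytes < destend)
	{
		unsigned char ctrl;
		int			ctrlc;

		FLUSH_LITERALS();		/* the control byte follows the literals */
		ctrl = *sp++;

		for (ctrlc = 0; ctrlc < 8 &&
			 sp + literal_bytes < srcend && dp + literal_bytes < destend;
			 ctrlc++)
		{
			if (ctrl & 1)
			{
				int			len;
				int			off;

				FLUSH_LITERALS();	/* the tag bytes follow the literals */
				len = (sp[0] & 0x0f) + 3;
				off = ((sp[0] & 0xf0) << 4) | sp[1];
				sp += 2;
				if (len == 18)
					len += *sp++;
				/* matches stay byte-wise: source may overlap destination */
				while (len-- > 0 && dp < destend)
				{
					*dp = dp[-off];
					dp++;
				}
			}
			else
			{
				/* literal byte: just remember it for a later memcpy */
				literal_bytes++;
			}
			ctrl >>= 1;
		}
	}
	FLUSH_LITERALS();
#undef FLUSH_LITERALS
}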


Whether that gains us anything depends on how common literal bytes are. 
It might be that highly compressible input has almost none, while input 
that is a mix of incompressible strings and compressible ones might have 
longer stretches. One example would be something like an SHA-256 hash that 
is repeated twice. The first instance would be incompressible, the 
second one would be just a copy. This might not happen that often in 
practical inputs, though.


I wonder if you agree and what would happen if you try this variant on 
your corpus tests.


Best regards,

Tels




Re: [PATCH][DOC] Fix for PREPARE TRANSACTION doc and postgres_fdw message.

2019-11-03 Thread Gilles Darold
On 02/11/2019 at 08:31, Michael Paquier wrote:
> On Fri, Nov 01, 2019 at 05:29:23PM +0100, Gilles Darold wrote:
>> I have attached a patch to the documentation that adds remote tables to
>> the list of objects where any operation prevent using a prepared
>> transaction, currently it is just notified "operations involving
>> temporary tables or the session's temporary namespace".
> Perhaps we had better use foreign tables for the error message and the
> docs?
> --
> Michael


Agreed, attached is a new version of the patches that replaces the word
remote with foreign.

--

Gilles

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 7cd69cc709..7d0f9ed72a 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -725,7 +725,7 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 	 */
 	ereport(ERROR,
 			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			 errmsg("cannot prepare a transaction that modified remote tables")));
+			 errmsg("cannot prepare a transaction that has operated on foreign tables")));
 	break;
 case XACT_EVENT_PARALLEL_COMMIT:
 case XACT_EVENT_COMMIT:
diff --git a/doc/src/sgml/ref/prepare_transaction.sgml b/doc/src/sgml/ref/prepare_transaction.sgml
index 5016ca287e..d7bf2026f7 100644
--- a/doc/src/sgml/ref/prepare_transaction.sgml
+++ b/doc/src/sgml/ref/prepare_transaction.sgml
@@ -99,8 +99,8 @@ PREPARE TRANSACTION transaction_id
   
It is not currently allowed to PREPARE a transaction that
has executed any operations involving temporary tables or the session's
-   temporary namespace, created any cursors WITH HOLD, or
-   executed LISTEN, UNLISTEN, or
+   temporary namespace or foreign tables, created any cursors WITH HOLD,
+   or executed LISTEN, UNLISTEN, or
NOTIFY.
Those features are too tightly
tied to the current session to be useful in a transaction to be prepared.


updating unaccent.rules for Arabic letters

2019-11-03 Thread kerbrose khaled
Hello Folks

I would like to update the unaccent.rules file to support Arabic letters, so
could someone help me or tell me how I could add such a contribution? I
attached the file including the modifications, only the last 4 lines.

thank you




unaccent.rules
Description: unaccent.rules