Re: Refactor parse analysis of EXECUTE command
On 2019-11-02 16:00, Tom Lane wrote:
> Peter Eisentraut writes:
>> This patch moves the parse analysis component of ExecuteQuery() and
>> EvaluateParams() into a new transformExecuteStmt() that is called
>> from transformStmt().
> Uhmm ... no actual patch attached?

Oops, here it is.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 7c5bc30a02ec34646c8e49af1499fe4113bc9e5e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Thu, 31 Oct 2019 09:54:07 +0100
Subject: [PATCH] Refactor parse analysis of EXECUTE command

Move the parse analysis component of ExecuteQuery() and
EvaluateParams() into a new transformExecuteStmt() that is called from
transformStmt().  This makes EXECUTE behave more like other utility
commands.  It also allows error messages to have position information,
and it allows using external parameters in the arguments of the
EXECUTE command.
---
 src/backend/commands/createas.c       |  2 +-
 src/backend/commands/prepare.c        | 72 ++
 src/backend/parser/analyze.c          | 89 +++
 src/backend/tcop/utility.c            |  2 +-
 src/include/commands/prepare.h        |  2 +-
 src/test/regress/expected/prepare.out | 25
 src/test/regress/sql/prepare.sql      | 21 +++
 7 files changed, 143 insertions(+), 70 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index b7d220699f..e4244f84e2 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -271,7 +271,7 @@ ExecCreateTableAs(CreateTableAsStmt *stmt, const char *queryString,
 		ExecuteStmt *estmt = castNode(ExecuteStmt, query->utilityStmt);

 		Assert(!is_matview);	/* excluded by syntax */
-		ExecuteQuery(estmt, into, queryString, params, dest, completionTag);
+		ExecuteQuery(estmt, into, params, dest, completionTag);

 		/* get object address that intorel_startup saved for us */
 		address = ((DR_intorel *) dest)->reladdr;
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 7e0a041fab..0aba6a7b00 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -47,7 +47,7 @@ static HTAB *prepared_queries = NULL;

 static void InitQueryHashTable(void);
 static ParamListInfo EvaluateParams(PreparedStatement *pstmt, List *params,
-									const char *queryString, EState *estate);
+									EState *estate);
 static Datum build_regtype_array(Oid *param_types, int num_params);

 /*
@@ -189,16 +189,10 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
  * indicated by passing a non-null intoClause.  The DestReceiver is already
  * set up correctly for CREATE TABLE AS, but we still have to make a few
  * other adjustments here.
- *
- * Note: this is one of very few places in the code that needs to deal with
- * two query strings at once.  The passed-in queryString is that of the
- * EXECUTE, which we might need for error reporting while processing the
- * parameter expressions.  The query_string that we copy from the plan
- * source is that of the original PREPARE.
  */
 void
 ExecuteQuery(ExecuteStmt *stmt, IntoClause *intoClause,
-			 const char *queryString, ParamListInfo params,
+			 ParamListInfo params,
 			 DestReceiver *dest, char *completionTag)
 {
 	PreparedStatement *entry;
@@ -229,8 +223,7 @@ ExecuteQuery(ExecuteStmt *stmt, IntoClause *intoClause,
 		 */
 		estate = CreateExecutorState();
 		estate->es_param_list_info = params;
-		paramLI = EvaluateParams(entry, stmt->params,
-								 queryString, estate);
+		paramLI = EvaluateParams(entry, stmt->params, estate);
 	}

 	/* Create a new portal to run the query in */
@@ -316,7 +309,6 @@ ExecuteQuery(ExecuteStmt *stmt, IntoClause *intoClause,
 *
 * pstmt: statement we are getting parameters for.
 * params: list of given parameter expressions (raw parser output!)
- * queryString: source text for error messages.
 * estate: executor state to use.
 *
 * Returns a filled-in ParamListInfo -- this can later be passed to
@@ -324,72 +316,19 @@ ExecuteQuery(ExecuteStmt *stmt, IntoClause *intoClause,
 * during query execution.
 */
 static ParamListInfo
-EvaluateParams(PreparedStatement *pstmt, List *params,
-			   const char *queryString, EState *estate)
+EvaluateParams(PreparedStatement *pstmt, List *params, EState *estate)
 {
 	Oid		   *param_types = pstmt->plansource->param_types;
 	int			num_params = pstmt->plansource->num_params;
-	int
Re: cost based vacuum (parallel)
> This is somewhat similar to a memory usage problem with a parallel query where each worker is allowed to use up to work_mem of memory. We can say that the users using parallel operation can expect more system resources to be used as they want to get the operation done faster, so we are fine with this. However, I am not sure if that is the right thing, so we should try to come up with some solution for it and if the solution is too complex, then probably we can think of documenting such behavior.

In cloud environments (Amazon + gp2) there is a budget on input/output operations. If you cross it for a long time, everything starts to feel like you are working with a floppy disk.

For ease of configuration, I would need a "max_vacuum_disk_iops" that would limit the number of input/output operations by all of the vacuums in the system. If I set it to less than the budget refill rate, I can be sure that no vacuum runs fast enough to impact any sibling query.

There is also value in a non-throttled VACUUM for smaller tables. On gp2 such things will be consumed out of the surge budget, and its size is known to the sysadmin. Let's call it "max_vacuum_disk_surge_iops": if a relation has fewer blocks than this value and the situation is blocking in any way (anti-wraparound, interactive console, ...), go ahead and run without throttling.

As for how to balance the cost: if we know the number of vacuum processes that were running in the previous second, we can simply divide the slot for this iteration by that previous number. To correct for overshoot, we can subtract the previous second's overshoot from the next one's slot. That would also allow accounting for surge budget usage and letting it refill, pausing all autovacuum for some time after a manual one. Accounting precisely enough to limit the operation count more often than once a second isn't beneficial for this use case.

Please don't forget that processing one page can become several IOPS (read, write, WAL).

Does this make sense? :)
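The balancing rule sketched above can be made concrete with a small sketch. The function name and bookkeeping here are hypothetical illustrations of the proposal, not existing PostgreSQL GUCs or code:

```python
def next_second_quota(max_vacuum_disk_iops, prev_vacuum_count, prev_overshoot):
    """Per-vacuum IOPS slot for the coming second (illustrative only).

    Divide the global budget by the number of vacuums observed in the
    previous second, then subtract the previous second's overshoot,
    clamping at zero so an overshoot can only pause, not go negative.
    """
    per_vacuum = max_vacuum_disk_iops / max(prev_vacuum_count, 1)
    return max(per_vacuum - prev_overshoot, 0)

# With a 300 IOPS budget, 3 vacuums running last second, and 20 IOPS of
# overshoot to pay back, each vacuum gets 80 IOPS this second:
print(next_second_quota(300, 3, 20))  # 80.0
```

One design consequence of the clamp: a large enough overshoot (e.g. from a non-throttled surge VACUUM) zeroes the quota for a while, which is exactly the "pause autovacuum after a manual one" behavior described above.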
Re: [HACKERS] Block level parallel vacuum
On Mon, 4 Nov 2019 at 14:02, Amit Kapila wrote: > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada wrote: > > > > I think that two approaches make parallel vacuum worker wait in > > different way: in approach(a) the vacuum delay works as if vacuum is > > performed by single process, on the other hand in approach(b) the > > vacuum delay work for each workers independently. > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > > ms. In single process vacuum the total sleep time is 2,500ms (= > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > > Because all parallel vacuum workers use the shared balance value and a > > worker sleeps once the balance value exceeds the limit. In > > approach(b), since the cost limit is divided evenly the value of each > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > > processes blocks evenly, the total sleep time of all workers is > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > > compute the sleep time of approach(b) by dividing the total value by > > the number of parallel workers. > > > > IOW the approach(b) makes parallel vacuum delay much more than normal > > vacuum and parallel vacuum with approach(a) even with the same > > settings. Which behaviors do we expect? > > > > Yeah, this is an important thing to decide. I don't think that the > conclusion you are drawing is correct because it that is true then the > same applies to the current autovacuum work division where we divide > the cost_limit among workers but the cost_delay is same (see > autovac_balance_cost). Basically, if we consider the delay time of > each worker independently, then it would appear that a parallel vacuum > delay with approach (b) is more, but that is true only if the workers > run serially which is not true. 
> > > I thought the vacuum delay for > > parallel vacuum should work as if it's a single process vacuum as we > > did for memory usage. I might be missing something. If we prefer > > approach(b) I should change the patch so that the leader process > > divides the cost limit evenly. > > > > I am also not completely sure which approach is better but I slightly > lean towards approach (b). Can we get the same sleep time as approach (b) if we divide the cost limit by the number of workers and have the shared cost balance (i.e. approach (a) with dividing the cost limit)? Currently approach (b) seems better, but I'm concerned that it might unnecessarily delay vacuum if some indexes are very small or bulk-deletion of indexes does almost nothing, as with brin. > > I think we need input from some other > people as well. I will start a separate thread to discuss this and > see if that helps to get the input from others. +1 -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
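As a sanity check, the arithmetic in the quoted example works out as follows (a quick sketch using the thread's numbers; this is illustration, not the patch code):

```python
# Worked example from this thread: 10,000 blocks, cost 10 per block,
# cost limit 200, one 5 ms sleep each time the balance crosses the limit.
blocks = 10_000
cost_per_block = 10
cost_limit = 200
sleep_ms = 5

# Single-process vacuum -- and approach (a), where all workers feed one
# shared balance, so the operation as a whole sleeps at the same rate.
single_total_sleep = (blocks * cost_per_block / cost_limit) * sleep_ms
print(single_total_sleep)  # 2500.0 (ms)

# Approach (b): the limit is divided evenly among 5 workers (40 each),
# each worker processes 2,000 blocks; sum the sleep over all workers.
workers = 5
per_worker_limit = cost_limit / workers    # 40
per_worker_blocks = blocks / workers       # 2000
b_total_sleep = (per_worker_blocks * cost_per_block / per_worker_limit) \
                * sleep_ms * workers
print(b_total_sleep)  # 12500.0 (ms)
```

As Amit points out, though, the summed worker sleep time is only directly comparable to the single-process sleep time if the workers run serially; when they sleep concurrently, the wall-clock delay can be similar.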
Re: pglz performance
Hi Tels! Thanks for your interest in fast decompression.

> On 3 Nov 2019, at 12:24, Tels wrote:
>
> I wonder if you agree and what would happen if you try this variant on your corpus tests.

I've tried several different optimizations for literals, for example loop unrolling[0] and bulk-copying of literals. These approaches were bringing some performance improvement, but with noise: statistically they were better in some places and worse in others, a net win overall, but that "net win" depends on which data and which platforms we consider important. The proposed patch makes decompression clearly faster on any dataset and platform. I believe improving pglz further is viable, but optimizations like a common data prefix seem more promising to me. Also, I think we actually need real codecs like lz4, zstd and brotli instead of our own reinvented wheel.

If you have some spare time, pull requests to test_pglz are welcome; let's benchmark more micro-optimizations, it brings a lot of fun :)

-- Andrey Borodin Open source RDBMS development team leader Yandex.Cloud

[0] https://github.com/x4m/test_pglz/blob/master/pg_lzcompress_hacked.c#L166
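For context on why the literal path is where these experiments focus: a pglz-style back-reference may overlap its own output, forcing a byte-at-a-time copy, while literals come from the input and can be copied in bulk. A toy sketch of the overlapping expansion (not the actual pg_lzcompress.c code):

```python
def copy_match(out: bytearray, off: int, length: int) -> None:
    """Expand a back-reference: copy `length` bytes starting `off` bytes
    back in `out`.  When length > off the source and destination overlap,
    so the copy must proceed byte by byte -- each byte copied may itself
    have been produced earlier in this same loop."""
    start = len(out) - off
    for i in range(length):
        out.append(out[start + i])

buf = bytearray(b"ab")
copy_match(buf, 2, 6)    # overlapping copy: "ab" repeated three more bytes' worth
print(buf.decode())      # abababab
```

Since literal runs never overlap the output being built, they can be bulk-copied with memcpy-style code, which is presumably what makes the unrolling and bulk-copy experiments mentioned above attractive.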
cost based vacuum (parallel)
For parallel vacuum [1], we were discussing what is the best way to divide the cost among parallel workers, but we didn't get many inputs apart from people who are very actively involved in the patch development. I feel that we need some more inputs before we finalize anything, so I am starting a new thread. The initial version of the patch has a very rudimentary way of doing it, which means each parallel vacuum worker operates independently w.r.t. vacuum delay and cost. This will lead to more I/O in the system than the user has intended. Assume that the overall I/O allowed for the vacuum operation is X, after which it will sleep for some time, reset the balance and continue. In the patch, each worker is allowed to perform X before it sleeps, and there is also no coordination with the master backend, which would have done some I/O for the heap. So, in the worst-case scenario, there can be n times more I/O, where n is the number of workers doing the parallel operation. This is somewhat similar to a memory usage problem with a parallel query, where each worker is allowed to use up to work_mem of memory. We can say that users of parallel operations can expect more system resources to be used as they want to get the operation done faster, so we are fine with this. However, I am not sure if that is the right thing, so we should try to come up with some solution for it, and if the solution is too complex, then probably we can think of documenting such behavior. The two approaches to solve this problem being discussed in that thread [1] are as follows: (a) Allow the parallel workers and master backend to have a shared view of vacuum cost related parameters (mainly VacuumCostBalance) and allow each worker to update it and then, based on that, decide whether it needs to sleep. Sawada-san has done the POC for this approach. See v32-0004-PoC-shared-vacuum-cost-balance in email [2]. 
One drawback of this approach could be that we allow a worker to sleep even though the I/O has been performed by some other worker. (b) The other idea could be that we split the I/O among workers, similar to what we do for autovacuum workers (see autovac_balance_cost). The basic idea would be that before launching workers, we need to compute the remaining I/O (the heap operation would have used something) after which we need to sleep, and split it equally across workers. Here, we are primarily thinking of dividing the VacuumCostBalance and VacuumCostLimit parameters. Once the workers are finished, they need to let the master backend know how much I/O they have consumed, and then the master backend can add it to its current I/O consumed. I think we also need to rebalance the cost of the remaining workers once some of the workers exit. Dilip has prepared a POC patch for this, see 0002-POC-divide-vacuum-cost-limit in email [3]. I think approach (b) is better at throttling the system as it doesn't have the drawback of the first approach, but it might be a bit tricky to implement. As of now, POCs for both approaches have been developed and we see similar results for both, but we have tested simpler cases where each worker has a similar amount of I/O to perform. Thoughts? [1] - https://commitfest.postgresql.org/25/1774/ [2] - https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com [3] - https://www.postgresql.org/message-id/CAFiTN-thU-z8f04jO7xGMu5yUUpTpsBTvBrFW6EhRf-jGvEz%3Dg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
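The two approaches above can be contrasted with a toy simulation. The structure and names here are illustrative assumptions, not the POC patches; it only counts sleeps and ignores concurrency:

```python
def vacuum_sleeps(blocks, cost_per_block, cost_limit, shared=True, workers=1):
    """Count sleeps for `workers` workers splitting `blocks` evenly.

    shared=True  ~ approach (a): one shared cost balance against one limit.
    shared=False ~ approach (b): the cost limit is divided evenly per worker.
    """
    per_worker = blocks // workers
    if shared:
        # Every block processed by any worker feeds the same balance.
        balance, sleeps = 0, 0
        for _ in range(per_worker * workers):
            balance += cost_per_block
            if balance >= cost_limit:
                sleeps += 1
                balance = 0
        return sleeps
    # Each worker throttles independently against its own, smaller limit.
    limit, sleeps = cost_limit / workers, 0
    for _ in range(workers):
        balance = 0
        for _ in range(per_worker):
            balance += cost_per_block
            if balance >= limit:
                sleeps += 1
                balance = 0
    return sleeps

# Shared balance keeps the whole operation at the single-process rate,
# while dividing the limit multiplies the summed sleep count by the
# worker count (matching the arithmetic discussed in the other thread).
print(vacuum_sleeps(10_000, 10, 200, shared=True, workers=5))   # 500
print(vacuum_sleeps(10_000, 10, 200, shared=False, workers=5))  # 2500
```

The simulation also makes the drawback of (a) visible: whichever worker happens to tip the shared balance over the limit takes the sleep, even if other workers did most of the I/O.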
Re: [HACKERS] Block level parallel vacuum
On Mon, Nov 4, 2019 at 10:32 AM Amit Kapila wrote: > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada wrote: > > > > I think that two approaches make parallel vacuum worker wait in > > different way: in approach(a) the vacuum delay works as if vacuum is > > performed by single process, on the other hand in approach(b) the > > vacuum delay work for each workers independently. > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > > ms. In single process vacuum the total sleep time is 2,500ms (= > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > > Because all parallel vacuum workers use the shared balance value and a > > worker sleeps once the balance value exceeds the limit. In > > approach(b), since the cost limit is divided evenly the value of each > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > > processes blocks evenly, the total sleep time of all workers is > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > > compute the sleep time of approach(b) by dividing the total value by > > the number of parallel workers. > > > > IOW the approach(b) makes parallel vacuum delay much more than normal > > vacuum and parallel vacuum with approach(a) even with the same > > settings. Which behaviors do we expect? > > > > Yeah, this is an important thing to decide. I don't think that the > conclusion you are drawing is correct because it that is true then the > same applies to the current autovacuum work division where we divide > the cost_limit among workers but the cost_delay is same (see > autovac_balance_cost). Basically, if we consider the delay time of > each worker independently, then it would appear that a parallel vacuum > delay with approach (b) is more, but that is true only if the workers > run serially which is not true. 
> > > I thought the vacuum delay for > > parallel vacuum should work as if it's a single process vacuum as we > > did for memory usage. I might be missing something. If we prefer > > approach(b) I should change the patch so that the leader process > > divides the cost limit evenly. > > > > I am also not completely sure which approach is better but I slightly > lean towards approach (b). I think we need input from some other > people as well. I will start a separate thread to discuss this and > see if that helps to get the input from others. +1 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Block level parallel vacuum
On Mon, Nov 4, 2019 at 10:45 AM Amit Kapila wrote: > > On Sun, Nov 3, 2019 at 9:49 AM Dilip Kumar wrote: > > > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada > > wrote: > > > > > > > > > I think that two approaches make parallel vacuum worker wait in > > > different way: in approach(a) the vacuum delay works as if vacuum is > > > performed by single process, on the other hand in approach(b) the > > > vacuum delay work for each workers independently. > > > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > > > ms. In single process vacuum the total sleep time is 2,500ms (= > > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > > > Because all parallel vacuum workers use the shared balance value and a > > > worker sleeps once the balance value exceeds the limit. In > > > approach(b), since the cost limit is divided evenly the value of each > > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > > > processes blocks evenly, the total sleep time of all workers is > > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > > > compute the sleep time of approach(b) by dividing the total value by > > > the number of parallel workers. > > > > > > IOW the approach(b) makes parallel vacuum delay much more than normal > > > vacuum and parallel vacuum with approach(a) even with the same > > > settings. Which behaviors do we expect? I thought the vacuum delay for > > > parallel vacuum should work as if it's a single process vacuum as we > > > did for memory usage. I might be missing something. If we prefer > > > approach(b) I should change the patch so that the leader process > > > divides the cost limit evenly. > > > > > I have repeated the same test (test1 and test2)[1] with a higher > > shared buffer (1GB). 
Currently, I have used the same formula for > > computing the total delay > > heap scan delay + index vacuuming delay / workers. Because, In my > > opinion, multiple workers are doing I/O here so the total delay should > > also be in multiple > > of the number of workers. So if we want to compare the delay with the > > sequential vacuum then we should divide total delay by the number of > > workers. But, I am not > > sure whether computing the total delay is the right way to compute the > > I/O throttling or not. But, I support the approach (b) for dividing > > the I/O limit because > > auto vacuum workers are already operating with this approach. > > > > test1: > > normal: stats delay 1348.16, hit 68952, miss 2, dirty 10063, total 79017 > > 1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146, > > total 79102 (cost divide patch) > > 2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036, > > total 78994 (cost divide patch) > > 1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066, > > total 92252 (share cost patch) > > 2 worker: stats delay 904.37, hit 86482, miss 2, dirty 17806, > > total 104290 (share cost patch) > > > > test2: > > normal: stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472 > > 1 worker: stats delay 530.70, hit 36984, miss 2, dirty 3527, total > > 40513 (cost divide patch) > > 2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total > > 40518 (cost divide patch) > > 1 worker: stats delay 490.57, hit 39090, miss 2, dirty 3497, total > > 42589 (share cost patch) > > 2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total > > 42871 (share cost patch) > > > > So with higher, shared buffers, I can see with approach (b) we can > > see the same total delay. With approach (a) I can see a bit less > > total delay. But, a point to be noted that I have used the same > > formulae for computing the total delay for both the approaches. 
But, > > Sawada-san explained in the above mail that it may not be the right > > way to computing the total delay for the approach (a). But my take is > > that whether we are working with shared cost or we are dividing the > > cost, the delay must be divided by number of workers in the parallel > > phase. > > > > Why do you think so? I think with approach (b) if all the workers are > doing equal amount of I/O, they will probably sleep at the same time > whereas with approach (a) each of them will sleep at different times. > So, probably dividing the delay in approach (b) makes more sense. Just to be clear, I did not mean that we should divide the sleep time for each worker. Actually, I meant how to project the total delay in the test patch. So I think if we directly want to compare the sleep time of sequential vs parallel, then it's not fair to just compare the total sleep time; when multiple workers are working in parallel, shouldn't we consider their average sleep time? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Block level parallel vacuum
On Sun, Nov 3, 2019 at 9:49 AM Dilip Kumar wrote: > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada wrote: > > > > > > I think that two approaches make parallel vacuum worker wait in > > different way: in approach(a) the vacuum delay works as if vacuum is > > performed by single process, on the other hand in approach(b) the > > vacuum delay work for each workers independently. > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > > ms. In single process vacuum the total sleep time is 2,500ms (= > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > > Because all parallel vacuum workers use the shared balance value and a > > worker sleeps once the balance value exceeds the limit. In > > approach(b), since the cost limit is divided evenly the value of each > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > > processes blocks evenly, the total sleep time of all workers is > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > > compute the sleep time of approach(b) by dividing the total value by > > the number of parallel workers. > > > > IOW the approach(b) makes parallel vacuum delay much more than normal > > vacuum and parallel vacuum with approach(a) even with the same > > settings. Which behaviors do we expect? I thought the vacuum delay for > > parallel vacuum should work as if it's a single process vacuum as we > > did for memory usage. I might be missing something. If we prefer > > approach(b) I should change the patch so that the leader process > > divides the cost limit evenly. > > > I have repeated the same test (test1 and test2)[1] with a higher > shared buffer (1GB). Currently, I have used the same formula for > computing the total delay > heap scan delay + index vacuuming delay / workers. 
Because, In my > opinion, multiple workers are doing I/O here so the total delay should > also be in multiple > of the number of workers. So if we want to compare the delay with the > sequential vacuum then we should divide total delay by the number of > workers. But, I am not > sure whether computing the total delay is the right way to compute the > I/O throttling or not. But, I support the approach (b) for dividing > the I/O limit because > auto vacuum workers are already operating with this approach. > > test1: > normal: stats delay 1348.16, hit 68952, miss 2, dirty 10063, total 79017 > 1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146, > total 79102 (cost divide patch) > 2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036, > total 78994 (cost divide patch) > 1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066, > total 92252 (share cost patch) > 2 worker: stats delay 904.37, hit 86482, miss 2, dirty 17806, > total 104290 (share cost patch) > > test2: > normal: stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472 > 1 worker: stats delay 530.70, hit 36984, miss 2, dirty 3527, total > 40513 (cost divide patch) > 2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total > 40518 (cost divide patch) > 1 worker: stats delay 490.57, hit 39090, miss 2, dirty 3497, total > 42589 (share cost patch) > 2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total > 42871 (share cost patch) > > So with higher, shared buffers, I can see with approach (b) we can > see the same total delay. With approach (a) I can see a bit less > total delay. But, a point to be noted that I have used the same > formulae for computing the total delay for both the approaches. But, > Sawada-san explained in the above mail that it may not be the right > way to computing the total delay for the approach (a). 
But my take is > that whether we are working with shared cost or we are dividing the > cost, the delay must be divided by number of workers in the parallel > phase. > Why do you think so? I think with approach (b) if all the workers are doing equal amount of I/O, they will probably sleep at the same time whereas with approach (a) each of them will sleep at different times. So, probably dividing the delay in approach (b) makes more sense. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Can avoid list_copy in recomputeNamespacePath() conditionally?
On Sat, Nov 2, 2019 at 8:01 PM Tom Lane wrote: > amul sul writes: > > I wondered can we have a shortcut somewhat similar to following POC > > in recomputeNamespacePath () when the recomputed path is the same as the > > previous baseSearchPath/activeSearchPath : > > + /* TODO: POC */ > > + if (equal(oidlist, baseSearchPath)) > > + return; > > There's an awful lot missing from that sketch; all of the remaining > steps still need to be done: > > You are correct, but that was intentionally skipped to keep the initial post short. Sorry for being a little lazy. > baseCreationNamespace = firstNS; > baseTempCreationPending = temp_missing; > > /* Mark the path valid. */ > baseSearchPathValid = true; > namespaceUser = roleid; > > /* And make it active. */ > activeSearchPath = baseSearchPath; > activeCreationNamespace = baseCreationNamespace; > activeTempCreationPending = baseTempCreationPending; > > /* Clean up. */ > pfree(rawname); > list_free(namelist); > list_free(oidlist); > > More to the point, I think the onus would be on the patch submitter > to prove that the extra complexity had some measurable benefit. > I really doubt that it would, since the list_copy is surely trivial > compared to the catalog lookup work we had to do to compute the OID > list above here. > Agreed. > It'd likely be more useful to see if you could reduce the number of > places where we have to invalidate the path in the first place. > Understood, let me check. Regards, Amul
Re: [HACKERS] Block level parallel vacuum
On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada wrote: > > I think that two approaches make parallel vacuum worker wait in > different way: in approach(a) the vacuum delay works as if vacuum is > performed by single process, on the other hand in approach(b) the > vacuum delay work for each workers independently. > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > ms. In single process vacuum the total sleep time is 2,500ms (= > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > Because all parallel vacuum workers use the shared balance value and a > worker sleeps once the balance value exceeds the limit. In > approach(b), since the cost limit is divided evenly the value of each > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > processes blocks evenly, the total sleep time of all workers is > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > compute the sleep time of approach(b) by dividing the total value by > the number of parallel workers. > > IOW the approach(b) makes parallel vacuum delay much more than normal > vacuum and parallel vacuum with approach(a) even with the same > settings. Which behaviors do we expect? > Yeah, this is an important thing to decide. I don't think that the conclusion you are drawing is correct because if that is true then the same applies to the current autovacuum work division where we divide the cost_limit among workers but the cost_delay is the same (see autovac_balance_cost). Basically, if we consider the delay time of each worker independently, then it would appear that a parallel vacuum delay with approach (b) is more, but that is true only if the workers run serially, which is not true. > I thought the vacuum delay for > parallel vacuum should work as if it's a single process vacuum as we > did for memory usage. I might be missing something. 
If we prefer > approach(b) I should change the patch so that the leader process > divides the cost limit evenly. > I am also not completely sure which approach is better but I slightly lean towards approach (b). I think we need input from some other people as well. I will start a separate thread to discuss this and see if that helps to get the input from others. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Allow superuser to grant passwordless connection rights on postgres_fdw
Greetings, * Andrew Dunstan (andrew.duns...@2ndquadrant.com) wrote: > On 11/1/19 12:58 PM, Robert Haas wrote: > > On Thu, Oct 31, 2019 at 4:58 PM Andrew Dunstan > > wrote: > >> This patch allows the superuser to grant passwordless connection rights > >> in postgres_fdw user mappings. > > This is clearly something that we need, as the current code seems > > woefully ignorant of the fact that passwords are not the only > > authentication method supported by PostgreSQL, nor even the most > > secure. > > > > But, I do wonder a bit if we ought to think harder about the overall > > authentication model for FDW. Like, maybe we'd take a different view > > of how to solve this particular piece of the problem if we were > > thinking about how FDWs could do LDAP authentication, SSL > > authentication, credentials forwarding... > > I'm certainly open to alternatives. I've long felt that the way to handle this kind of requirement is to have a "trusted remote server" kind of option- where the local server authenticates to the remote server as a *server* and then says "this is the user on this server, and this is the user that this user wishes to be" and the remote server is then able to decide if they accept that, or not. To be specific, there would be some kind of 'trust' established between the servers and only if there is some kind of server-level authentication, eg: dual TLS auth, or dual GSSAPI auth; and then, a mapping is defined for that server, which specifies what remote user is allowed to log in as what local user. This would be a server-to-server auth arrangement, and is quite different from credential forwarding, or similar. I am certainly also a huge fan of the idea that we support Kerberos/GSSAPI credential forwarding / delegation, where a client willingly forwards to the PG server a set of credentials which then allow the PG server to authenticate as that user to another system (eg: through an FDW to another PG server). 
Of course, as long as we're talking pie-in-the-sky ideas, I would certainly be entirely for supporting both. ;) Thanks, Stephen
Re: Collation versioning
On Fri, Nov 1, 2019 at 2:21 AM Julien Rouhaud wrote: > Are you planning to continue working on it? For the record, that's > something needed to be able to implement a filter in REINDEX command > [1]. Bonjour Julien, Unfortunately I haven't had time to work on it seriously, but here's a quick rebase to get the proof-of-concept back into working shape. It's nice to see progress in other bits of the problem-space. I hope to have time to look at this patch set again soon, but if you or someone else would like hack on or think about it too, please feel free! Yes indeed this is exactly the same problem that you're trying to solve, approached from a different starting point. Here are some problems to think about: * We'd need to track dependencies on the default collation once we have versioning for that (see https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com). That is how most people actually consume collations out there in real life, and yet we don't normally track dependencies on the default collation and I don't know if that's simply a matter of ripping out all the code that looks like "xxx != DEFAULT_COLLATION_ID" in the dependency analysis code or if there's more to it. * Andres mentioned off-list that pg_depend rows might get blown away and recreated in some DDL circumstances. We need to look into that. * Another is that pg_upgrade won't preserve pg_depend rows, so you'd need some catalog manipulation (direct or via new DDL) to fix that. * Some have expressed doubt that pg_depend is the right place for this; let's see if any counter-proposals appear. > # reindex table t1; > WARNING: 01000: index "t1_val_idx" depends on collation 13330 version > "a153.97.35.8", but the current version is "153.97.35.8" > DETAIL: The index may be corrupted due to changes in sort order. > HINT: REINDEX to avoid the risk of corruption. > LOCATION: index_check_collation_version, index.c:1263 Duh. 
Yeah, that's stupid and needs to be fixed somehow. 0001-Remove-pg_collation.collversion-v2.patch Description: Binary data 0002-Add-pg_depend.refobjversion-v2.patch Description: Binary data 0003-Track-collation-versions-for-indexes-v2.patch Description: Binary data
TAP tests aren't using the magic words for Windows file access
Buildfarm member drongo has been failing the pg_ctl regression test pretty often. I happened to look closer at what's happening, and it's this: could not read "C:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_ctl/tmp_check/t_004_logrotate_primary_data/pgdata/current_logfiles": Permission denied at C:/prog/bf/root/HEAD/pgsql.build/src/test/perl/TestLib.pm line 397. That is, TestLib::slurp_file is failing to read a file. Almost certainly, "permission denied" doesn't really mean a permissions problem, but failure to specify the file-opening flags needed to allow concurrent access on Windows. We fixed this in pg_ctl itself in commit 0ba06e0bf ... but we didn't fix the TAP infrastructure. Is there an easy way to get Perl on board with that? regards, tom lane
Re: Binary support for pgoutput plugin
On Thu, Oct 31, 2019 at 3:03 AM Dave Cramer wrote: > Ok, I've rebased and reverted logicalrep_read_insert Hi Dave, From the code style police (actually just from cfbot, which is set up to complain about declarations after statements, a bit of C99 we aren't ready for): proto.c:557:6: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] int len = pq_getmsgint(in, 4); /* read length */ ^ proto.c:573:6: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] int len = pq_getmsgint(in, 4); /* read length */ ^
Re: New SQL counter statistics view (pg_stat_sql)
On Thu, Oct 17, 2019 at 3:22 PM Smith, Peter wrote: > We have resurrected this 2 year old "SQL statements statistics counter" > proposal from Hari. > > The attached patch addresses the previous review issues. Hi, No comment on the patch but I noticed that the documentation changes don't build. Please make sure you can "make docs" successfully, having installed the documentation tools[1]. [1] https://www.postgresql.org/docs/devel/docguide-toolsets.html
Re: [HACKERS] Block level parallel vacuum
On Mon, Oct 28, 2019 at 1:52 PM Dilip Kumar wrote: > > On Mon, Oct 28, 2019 at 12:20 PM Amit Kapila wrote: > > > > On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar wrote: > > > > > > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada > > > wrote: > > > > > > > > > > > I haven't yet read the new set of the patch. But, I have noticed one > > > thing. That we are getting the size of the statistics using the AM > > > routine. But, we are copying those statistics from local memory to > > > the shared memory directly using the memcpy. Wouldn't it be a good > > > idea to have an AM specific routine to get it copied from the local > > > memory to the shared memory? I am not sure it is worth it or not but > > > my thought behind this point is that it will give AM to have local > > > stats in any form ( like they can store a pointer in that ) but they > > > can serialize that while copying to shared stats. And, later when > > > shared stats are passed back to the Am then it can deserialize in its > > > local form and use it. > > > > > > > You have a point, but after changing the gist index, we don't have any > > current usage for indexes that need something like that. So, on one > > side there is some value in having an API to copy the stats, but on > > the other side without having clear usage of an API, it might not be > > good to expose a new API for the same. I think we can expose such an > > API in the future if there is a need for the same. > I agree with the point. But, the current patch exposes an API for > estimating the size for the statistics. So IMHO, either we expose > both APIs for estimating the size of the stats and copy the stats or > none. Am I missing something here? > I think the first one is a must as the things stand today because otherwise, we won't be able to copy the stats. The second one (expose an API to copy stats) is good to have but there is no usage of it immediately. 
We can expose the second API considering the future need but as there is no valid case as of now, it will be difficult to test and we are also not sure whether in future any IndexAM will require such an API. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: SimpleLruTruncate() mutual exclusion
On Thu, Aug 1, 2019 at 6:51 PM Noah Misch wrote: > vac_truncate_clog() instance 1 starts, considers segment ABCD eligible to > unlink > vac_truncate_clog() instance 2 starts, considers segment ABCD eligible to > unlink > vac_truncate_clog() instance 1 unlinks segment ABCD > vac_truncate_clog() instance 1 calls SetTransactionIdLimit() > vac_truncate_clog() instance 1 finishes > some backend calls SimpleLruZeroPage(), creating segment ABCD > vac_truncate_clog() instance 2 unlinks segment ABCD > > Serializing vac_truncate_clog() fixes that. I've wondered before (in a -bugs thread[1] about unexplained pg_serial wraparound warnings) if we could map 64 bit xids to wide SLRU file names that never wrap around and make this class of problem go away. Unfortunately multixacts would need 64 bit support too... [1] https://www.postgresql.org/message-id/flat/CAEBTBzuS-01t12GGVD6qCezce8EFD8aZ1V%2Bo_3BZ%3DbuVLQBtRg%40mail.gmail.com
Re: fe-utils - share query cancellation code
On Sat, Nov 2, 2019 at 10:38 AM Fabien COELHO wrote: > Attached patch v4 does it. Hi Fabien, It looks like you don't define sigint_interrupt_jmp and sigint_interrupt_enabled on Windows, yet they are still declared and referenced? command.obj : error LNK2001: unresolved external symbol sigint_interrupt_enabled [C:\projects\postgresql\psql.vcxproj] copy.obj : error LNK2001: unresolved external symbol sigint_interrupt_enabled [C:\projects\postgresql\psql.vcxproj] input.obj : error LNK2001: unresolved external symbol sigint_interrupt_enabled [C:\projects\postgresql\psql.vcxproj] mainloop.obj : error LNK2001: unresolved external symbol sigint_interrupt_enabled [C:\projects\postgresql\psql.vcxproj] command.obj : error LNK2001: unresolved external symbol sigint_interrupt_jmp [C:\projects\postgresql\psql.vcxproj] copy.obj : error LNK2001: unresolved external symbol sigint_interrupt_jmp [C:\projects\postgresql\psql.vcxproj] mainloop.obj : error LNK2001: unresolved external symbol sigint_interrupt_jmp [C:\projects\postgresql\psql.vcxproj] .\Release\psql\psql.exe : fatal error LNK1120: 2 unresolved externals [C:\projects\postgresql\psql.vcxproj] 0 Warning(s) https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.64074
Re: Log statement sample - take two
On Sat, Oct 19, 2019 at 05:02:01PM +0200, Adrien Nayrat wrote: Hello, This patch proposes a new way to sample statements to logs. As a reminder, this feature was committed in PG12[1] then reverted[2] after the proposition of log_statement_sample_limit[3] The first implementation added a new GUC to sample statements logged by log_min_duration_statement. Then, we wanted to add the ability to log all statements whose duration exceeds log_statement_sample_limit. This was confusing because log_min_duration behaves as a minimum to enable sampling, while log_statement_sample_limit behaves as a maximum to disable it.[4] Tomas Vondra proposed to use two minimum thresholds: 1) log_min_duration_sample - enables sampling of commands, using the existing GUC log_statement_sample_rate 2) log_min_duration_statement - logs all commands exceeding this This patch implements this idea. PS: I noticed I forgot to mention "Only superusers can change this setting" in the log_transaction_sample_rate documentation. I attached a second patch to fix this. Seems fine to me, mostly. I think the docs should explain how log_min_duration_statement interacts with log_min_duration_sample. Attached is a patch doing that, by adding one para to each GUC, along with some minor rewordings. I think the docs are mixing "sampling" vs. "logging" and "durations" vs. "statements", not sure. I also think the two new sampling GUCs (log_min_duration_sample and log_statement_sample_rate) should be next to each other. We're not ordering the GUCs alphabetically anyway. I plan to make those changes and push in a couple days.
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 185caaf3b1956300d87bdf48cdd91a7f89b515c3 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 4 Nov 2019 02:00:26 +0100 Subject: [PATCH 1/2] fix log_min_transaction_sample_rate docs --- doc/src/sgml/config.sgml | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 0191ec84b1..48d7939d2d 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5988,6 +5988,7 @@ local0.*/var/log/postgresql logs all statements for all transactions. log_transaction_sample_rate is helpful to track a sample of transaction. + Only superusers can change this setting. -- 2.21.0 >From ca4a8b558742f247e3c9e5d388dacd4ef918e154 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 4 Nov 2019 01:57:45 +0100 Subject: [PATCH 2/2] log_min_duration_sample rework --- doc/src/sgml/config.sgml | 85 +++ src/backend/tcop/postgres.c | 40 ++--- src/backend/utils/misc/guc.c | 27 +- src/backend/utils/misc/postgresql.conf.sample | 11 +++ src/include/utils/guc.h | 2 + 5 files changed, 153 insertions(+), 12 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 48d7939d2d..e5372d933c 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5930,6 +5930,55 @@ local0.*/var/log/postgresql + + log_min_duration_sample (integer) + + log_min_duration_sample configuration parameter + + + + + Allows to sample the logging of the duration of each completed + statement if the statement ran for at least the specified amount of + time. If this value is specified without units, it is taken as milliseconds. + Setting this to zero samples all statement durations. + Minus-one (the default) disables sampling statement durations. + For example, if you set it to 250ms + then all SQL statements that run 250ms or longer will be considered + for sampling, with sample rate is controlled by . 
+ Enabling this parameter can be helpful when the traffic is too high to + sample all queries. + Only superusers can change this setting. + + + + This option has lower priority than , + which means setting this to a value higher than + means the sample rate is ignored and all queries will be logged. + + + + For clients using extended query protocol, durations of the Parse, + Bind, and Execute steps are logged independently. + + + + + When using this option together with + , + the text of statements that are logged because of + log_statement will not be repeated in the + duration log message. + If you are not using syslog, it is recommended + that you log the PID or session ID using + + so that you can link the statement message to the later + duration message using the process ID or session ID. + + + + +
Re: let's make the list of reportable GUCs configurable (was Re: Add %r substitution for psql prompts to show recovery status)
On Wed, Oct 16, 2019 at 6:49 PM Dave Cramer wrote: > Here's an updated patch that addresses some of Andres' concerns specifically > does not use strtok. Hi Dave, I think you need to s/strncasecmp/pg_strncasecmp/ to make this build on Windows. https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.63963
Re: Getting psql to redisplay command after \e
Fabien COELHO writes: >> I did experiment with trying to do that, but I couldn't get it to work, >> even with the single version of libreadline I had at hand. It appears >> to me that readline() starts by clearing the internal buffer. Even if >> we could persuade it to work in a particular readline version, I think >> the odds of making it portable across all the libreadline and libedit >> versions that are out there aren't very good. And there's definitely no >> chance of being remotely compatible with that behavior when using the >> bare tty drivers (psql -n). > This suggests that readline cannot be used to edit simply a known string? > :-( "rl_insert_text" looked promising, although probably not portable, and > I tried to make it work without much success anyway. Maybe I'll try to > investigate more deeply later. I think that rl_insert_text and friends can probably only be used from readline callback functions. So in principle maybe you could make it work by having an rl_startup_hook that injects text if there is any to inject. There would remain the issues of (a) is it portable across a wide range of readline and libedit versions, (b) will the prompting behavior be nice, and (c) do we really want this to work fundamentally differently when readline is turned off? (Pavel's code cited nearby seems to me to be a fine example of what we do *not* want to do. Getting in bed with libreadline to that extent is inevitably going to misbehave in some places.) >> In practice, if you decide that you don't like what you're looking at, >> you're probably going to go back into the editor to fix it, ie issue >> another \e. So I'm not sure that it's worth such pushups to get the >> data into readline's buffer. > For me \e should mean edit, not edit-and-execute, so I should be back to > prompt, which is the crux of my unease with how the feature behaves, > because it combines two functions that IMO shouldn't. I don't understand that complaint at all. 
My proposed patch does not change the behavior to force execution, and it does display a prompt --- one that reflects whether you've given a complete command. regards, tom lane
Re: Parallel leader process info in EXPLAIN
On Mon, Nov 4, 2019 at 12:11 PM Thomas Munro wrote: > I guess I thought of that as a debugging feature and took it out > because it was too verbose, but maybe it just needs to be controlled > by the VERBOSE switch. Do you think we should put that back? By which I mean: would you like to send a patch? :-) Here is a new version of the "Leader:" patch, because cfbot told me that gcc didn't like it as much as clang. 0001-Show-parallel-leader-stats-in-EXPLAIN-output-v2.patch Description: Binary data 0002-Improve-EXPLAIN-of-Sort-in-parallel-queries-v2.patch Description: Binary data
Re: Parallel leader process info in EXPLAIN
On Thu, Oct 31, 2019 at 5:24 AM Melanie Plageman wrote: > On Wed, Oct 23, 2019 at 12:30 AM Thomas Munro wrote: >> Of course there are some more things that could be reported in a >> similar way eventually, such as filter counters and hash join details. > > This made me think about other explain wishlist items. > For parallel hashjoin, I would find it useful to know which batches > each worker participated in (maybe just probing to start with, but > loading would be great too). > > I'm not sure anyone else (especially users) would care about this, > though. Yeah, I think that'd be interesting. At some point in the patch set when I was working on the batch load balancing strategy I showed the number of tuples hashed and number of batches probed by each process (not the actual batch numbers, since that seems a bit over the top): https://www.postgresql.org/message-id/CAEepm%3D0th8Le2SDCv32zN7tMyCJYR9oGYJ52fXNYJz1hrpGW%2BQ%40mail.gmail.com I guess I thought of that as a debugging feature and took it out because it was too verbose, but maybe it just needs to be controlled by the VERBOSE switch. Do you think we should put that back?
Re: Consolidate 'unique array values' logic into a reusable function?
On Tue, Sep 10, 2019 at 11:43 AM Thomas Munro wrote: > Rebased due to bitrot. Spotted one more place to use this, in > src/backend/utils/adt/txid.c. Rebased. I'm planning to commit this soon. 0001-Consolidate-code-that-makes-a-sorted-array-unique-v3.patch Description: Binary data
Re: Should we add xid_current() or a int8->xid cast?
On Tue, Oct 29, 2019 at 5:23 PM btfujiitkp wrote: > > Thomas Munro writes: > >> On Sun, Sep 1, 2019 at 5:04 PM Thomas Munro > >> wrote: > >>> Adding to CF. > > > >> Rebased. An OID clashed so re-roll the dice. Also spotted a typo. > > > > I have some questions in this code. Thanks for looking at the patch. > First, > "FullTransactionIdPrecedes(xmax, val)" is not equal to "val >= xmax" of > the previous code. "FullTransactionIdPrecedes(xmax, val)" expresses > "val > xmax". Is it all right? > > @@ -384,15 +324,17 @@ parse_snapshot(const char *str) > while (*str != '\0') > { > /* read next value */ > - val = str2txid(str, ); > + val = FullTransactionIdFromU64(pg_strtouint64(str, , > 10)); > str = endp; > > /* require the input to be in order */ > - if (val < xmin || val >= xmax || val < last_val) > + if (FullTransactionIdPrecedes(val, xmin) || > + FullTransactionIdPrecedes(xmax, val) || > + FullTransactionIdPrecedes(val, last_val)) > > In addition to it, as to current TransactionId(not FullTransactionId) > comparison, when we express ">=" of TransactionId, we use > "TransactionIdFollowsOrEquals". this method is referred by some methods. > On the other hand, FullTransactionIdFollowsOrEquals has not implemented > yet. So, how about implementing this method? Good idea. I added the missing variants: +#define FullTransactionIdPrecedesOrEquals(a, b) ((a).value <= (b).value) +#define FullTransactionIdFollows(a, b) ((a).value > (b).value) +#define FullTransactionIdFollowsOrEquals(a, b) ((a).value >= (b).value) > Second, > About naming rule, "8" of xid8 means 8 bytes, but "8" has different > meaning in each situation. For example, int8 of PostgreSQL means 8 > bytes, int8 of C language means 8 bits. If 64 is used, it just means 64 > bits. how about xid64()? In C, the typenames use bits, by happy coincidence similar to the C99 stdint.h typenames (int32_t etc) that we should perhaps eventually switch to. 
In SQL, the types have names based on the number of bytes: int2, int4, int8, float4, float8, not conforming to any standard but established over 3 decades ago and also understood by a few other SQL systems. That's unfortunate, but I can't see that ever changing. I thought that it would make most sense for the SQL type to be called xid8, though admittedly it doesn't quite fit the pattern because xid is not called xid4. There is another example a bit like that: macaddr (6 bytes) and macaddr8 (8 bytes). As for the C type, we use TransactionId and FullTransactionId (rather than, say, xid32 and xid64). In the attached I also took Tom's advice and used unused_oids script to pick random OIDs >= 8000 for all new objects (ignoring nearby comments about the range of OIDs used in different sections of the file). 0001-Add-SQL-type-xid8-to-expose-FullTransactionId-to--v3.patch Description: Binary data 0002-Introduce-xid8-variants-of-the-txid_XXX-fmgr-func-v3.patch Description: Binary data
Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes
[ redirecting to -hackers ] I wrote: > Yeah, it seems like a bad idea to override the user's choice to write > a quote, even if one is not formally necessary. I propose the attached. After further experimentation, I concluded that that patch is a bad idea; it breaks a lot of cases that used to work before. It turns out that Readline has a bunch of behaviors for filename completion that occur outside of the rl_filename_completion_function function proper, and they all assume that what's passed back from that function is plain unquoted filename(s). Notably, completion of a path that includes directory names just doesn't work well at all anymore with that patch ... nor did it work well before, if the path contained characters that we thought we should quote. The right way to do things, seemingly, is to let rl_filename_completion_function be invoked without any interference, and instead put our SQL-aware quoting/dequoting logic into the hooks Readline provides for that purpose, rl_filename_quoting_function and rl_filename_dequoting_function. (It appears that somebody tried to do that before, way back around the turn of the century, but gave up on it. Too bad, because it's the right thing.) Of course, this only works if we have those hooks :-(. So far as I can tell, all libreadline variants that might still exist in the wild do have them; but libedit doesn't, at least not in the version Apple is shipping. Hence, what the attached patch does is to make configure probe for the existence of the hook variables; if they're not there, fall back to what I proposed previously. The behavior on libedit is a bit less nice than one could wish, but it's better than what we have now. I've tested this on the oldest and newest readline versions I have at hand (4.2a and 8.0), as well as the oldest and newest versions of Apple's libedit derivative; but I haven't tried it on whatever the BSDen are shipping as libedit. 
There's enough moving parts here that this probably needs to go through a full review cycle, so I'll add it to the next commitfest. Some notes for anybody wanting to review: * The patch now always quotes completed filenames, so quote_if_needed() is misnamed and overcomplicated for this use-case. I left the extra generality in place for possible future use. On the other hand, this is the *only* use-case, so you could also argue that we should just simplify the function's API. I have no strong opinion about that. * In addition to the directly-related-to-filenames changes, it turns out to be necessary to set rl_completer_quote_characters to include at least single quotes, else Readline doesn't understand that a quoted filename is quoted. The patch sets it to include double quotes as well. This is probably the aspect of the patch that most needs testing. The general effect of this change is that Readline now understands that quoted strings are single entities, plus it will try to complete the contents of a string if you ask it. The side-effects I've noticed seem to be all beneficial -- for example, if you do select * from "foo it now correctly completes table names starting with "foo", which it did not before. But there might be some bad effects as well. Also, although libedit has this variable, setting it doesn't have that effect there; I've not really found that the variable does anything at all there. * The behavior of quote_file_name is directly modeled on what Readline's default implementation rl_quote_filename does, except that it uses SQL-aware quoting rules. The business of passing back the final quote mark separately is their idea. 
* An example of the kind of case that's interesting is that if you type \lo_import /usr/i then what you get on readline (with this patch) is \lo_import '/usr/include/ while libedit produces \lo_import '/usr/include' (with a space after the trailing quote) That is, readline knows that the completion-so-far is a directory and assumes that's not all you want, whereas libedit doesn't know that. So you typically now have to back up two characters, type slash, and resume completing. That's kind of a pain, but I'm not sure we can make it better very easily. Anyway, libedit did something close to that before, too. * There are also some interesting cases around special characters in the filename. It seems to work well for embedded spaces, not so well for embedded single quotes, though that may well vary across readline versions. Again, there seems to be a limited amount we can do about that, given how much of the relevant logic is buried where we can't modify it. And I'm not sure how much I care about that case, anyway. regards, tom lane diff --git a/config/programs.m4 b/config/programs.m4 index 90ff944..e5bf980 100644 --- a/config/programs.m4 +++ b/config/programs.m4 @@ -209,11 +209,12 @@ fi -# PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER -# --- +# PGAC_READLINE_VARIABLES +#
Re: [PATCH] Include triggers in EXPLAIN
=?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?= writes: > Recently I got few times into situation where I was trying to find out what > is blocking DELETE queries. Running EXPLAIN (even VERBOSE one) wasn't > useful, since the reason was slow trigger (missing index on foreign key > column). I had to create testing entry and run EXPLAIN ANALYZE DELETE to > get this information. > It will be really valuable for me to show triggers in EXPLAIN query since > it will make clear for me there will be any trigger "activated" during > execution of DELETE query and that can be the reason for slow DELETE. I don't really see the point of this patch? You do get the trigger times during EXPLAIN ANALYZE, and I don't believe that a plain EXPLAIN is going to have full information about what triggers might fire or not fire. regards, tom lane
Re: 64 bit transaction id
On Sat, Nov 2, 2019 at 6:15 AM Tomas Vondra wrote: > On Fri, Nov 01, 2019 at 12:05:12PM +0300, Павел Ерёмин wrote: > > Hi. > > sorry for my English. > > I want to once again open the topic of 64 bit transaction id. I did not > > manage to find in the archive of the option that I want to discuss, so I > > write. If I searched poorly, then please forgive me. > > The idea is not very original and probably has already been considered, > > again I repeat - I did not find it. Therefore, please do not scold me > > severely. > > In discussions of 64-bit transaction id, I did not find mention of an > > algorithm for storing them, as it was done, for example, in MS SQL Server. > > What if instead of 2 fields (xmin and xmax) with a total length of 64 bits > > - use 1 field (let's call it xid) with a length of 64 bits in tuple > > header? In this field store the xid of the transaction that created the > > version. In this case, the new transaction, in order to understand whether > > the read version is suitable for it or not, will have to read the next > > version as well. That is, the downside of such a decision is of course an > > increase in I/O. Transactions will have to read the +1 version. On the > > plus side, the tuple header remains the same length. > > > > I think that assumes we can easily identify the next version of a tuple, > and I don't think we can do that. We may be able to do that for HOT > chains, but that only works when the next version fits onto the same > page (and does not update indexed columns). But when we store the new > version on a separate page, we don't have any link between those tuples. > And adding it may easily mean more overhead than the 8B we'd save by > only storing a single XID. > > IMO the most promising solution to this is the "page epoch" approach > discussed some time ago (1-2 years?). There have been so many discussions of this topic that it's hard to search for.
Since we have in fact begun to take some baby steps towards using 64 bit transaction IDs in a few places, I decided to create a new wiki page to try to keep track of the various discussions. If you know where to find the 'best' discussions (for example the one where, if I recall correctly, it was Heikki who proposed a 'reference' FullTranasctionId on the page header) and any proposals that came with patches, then I'd be grateful if you could add links to it to this wiki page! https://wiki.postgresql.org/wiki/FullTransactionId
function calls optimization
Hi,

The main goal of this patch is to avoid repeated calls of immutable/stable functions. This patch is against version 10.10; I guess the same logic could be implemented up through version 12.

--- src/include/nodes/execnodes.h	2019-08-05 23:16:54.0 +0200
+++ src/include/nodes/execnodes.h	2019-11-03 20:05:34.338305825 +0100
@@ -882,6 +883,39 @@ typedef struct PlanState
 	TupleTableSlot *ps_ResultTupleSlot; /* slot for my result tuples */
 	ExprContext *ps_ExprContext;	/* node's expression-evaluation context */
 	ProjectionInfo *ps_ProjInfo;	/* info for doing tuple projection */
+#ifdef OPTFUNCALLS
+	/* was_called - list of ExprEvalStep* or FuncExpr* depending on execution stage
+	 *
+	 * Stage I. ExecInitExprRec()
+	 * The list gathers all non-volatile, non-set-returning, non-window FuncExpr*;
+	 * equal nodes occupy one position in the list. The position in this list
+	 * (counting from 1) and the planstate are remembered in the actual ExprEvalStep*.
+	 *
+	 * For the query: select f(n),f(n) from t - was_called->length will be 1 and
+	 * ptr_value will be the FuncExpr* node of f(n)
+	 *
+	 * For the query: select f(n),g(n),f(n) from t - list->length == 2
+	 *
+	 * Stage II. ExecProcNode()
+	 * For every planstate, the was_called list changes its interpretation - from
+	 * now on it is a list of ExprEvalStep*. Before executing the real ExecProcNode,
+	 * every element of this list (ptr_value) is set to NULL; we don't know which
+	 * function will be called first.
+	 *
+	 * Stage III. ExecInterpExpr(), case EEOP_FUNCEXPR
+	 * ExprEvalStep.position > 0 means that planstate->was_called may hold an
+	 * ExprEvalStep* that has already been executed, or NULL.
+	 *
+	 * NULL means that the eval step is entered for the first time, so:
+	 * 1. the real function must be called
+	 * 2. the ExprEvalStep has to be remembered in planstate->was_called at
+	 *    position step->position - 1
+	 *
+	 * NOT NULL means that planstate->was_called holds an ExprEvalStep* with a
+	 * ready result, so there is no need to call the function.
+	 */
+	List	   *was_called;
+#endif
 } PlanState;

 /*
--- src/include/executor/execExpr.h	2019-08-05 23:16:54.0 +0200
+++ src/include/executor/execExpr.h	2019-11-03 20:04:03.739025142 +0100
@@ -561,6 +561,10 @@ typedef struct ExprEvalStep
 			AlternativeSubPlanState *asstate;
 		}			alternative_subplan;
 	}			d;
+#ifdef OPTFUNCALLS
+	PlanState  *planstate;		/* parent PlanState for this expression */
+	int			position;		/* position in planstate->was_called counted from 1 */
+#endif
 } ExprEvalStep;
--- src/backend/executor/execProcnode.c	2019-08-05 23:16:54.0 +0200
+++ src/backend/executor/execProcnode.c	2019-11-03 19:54:28.071672386 +0100
@@ -120,6 +120,17 @@
 static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
 static TupleTableSlot *ExecProcNodeInstr(PlanState *node);

+#ifdef OPTFUNCALLS
+static TupleTableSlot *
+execReal(PlanState *node)
+{
+	/* Before each scan step, node->was_called elements must be set to NULL */
+	ListCell   *item;
+
+	foreach(item, node->was_called)
+		item->data.ptr_value = NULL;
+
+	return node->ExecProcNodeReal(node);
+}
+#endif

 /*
  * ExecInitNode
@@ -425,8 +436,11 @@ ExecProcNodeFirst(PlanState *node)
 	if (node->instrument)
 		node->ExecProcNode = ExecProcNodeInstr;
 	else
+#ifndef OPTFUNCALLS
 		node->ExecProcNode = node->ExecProcNodeReal;
-
+#else
+		node->ExecProcNode = execReal;
+#endif
 	return node->ExecProcNode(node);
 }
@@ -442,9 +456,11 @@ ExecProcNodeInstr(PlanState *node)
 	TupleTableSlot *result;

 	InstrStartNode(node->instrument);
-
+#ifndef OPTFUNCALLS
 	result = node->ExecProcNodeReal(node);
-
+#else
+	result = execReal(node);
+#endif
 	InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);

 	return result;
--- src/backend/executor/execExpr.c	2019-08-05 23:16:54.0 +0200
+++ src/backend/executor/execExpr.c	2019-11-03 19:57:21.994249398 +0100
@@ -45,7 +45,13 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/typcache.h"
-
+#ifdef OPTFUNCALLS
+#include "catalog/pg_proc.h"
+#include "utils/syscache.h"
+#include "access/htup_details.h"
+static bool isNotVolatile(Oid funcid);
+static int	findFuncExpr(FuncExpr *node, PlanState *parent);
+#endif

 typedef struct LastAttnumInfo
 {
@@ -806,7 +812,40 @@ ExecInitExprRec(Expr *node, PlanState *p
 				ExecInitFunc(&scratch, node, func->args, func->funcid,
 							 func->inputcollid, parent, state);
+#ifdef OPTFUNCALLS
+				scratch.position = 0;
+				scratch.planstate = parent;
+				if (parent)
+				{
+					/* Build/extend the list of non-volatile functions for this
+					 * PlanState node. Try to find func (an equal node) in the
+					 * parent->was_called list first.
+					 */
+					int			pos = findFuncExpr(func, parent);
+
+					if (!pos && isNotVolatile(func->funcid))
+					{
+						/* The function is not in the list yet but it may be useful
+						 * for optimizing repeated calls - register the function in
+						 * parent and remember its position
+
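The staged bookkeeping described in the patch comments can be illustrated with a small sketch (Python used purely for illustration; the real implementation lives in the C executor, and all names below are invented for the example). A per-node cache is reset before each row, and repeated references to an "equal" stable call reuse the first result:

```python
calls = {"count": 0}

def stable_func(x):
    """Stands in for an expensive stable SQL function: within one scan,
    the same argument always yields the same result."""
    calls["count"] += 1
    return x * x

def eval_row(row, exprs, was_called):
    """Evaluate all expressions for one row.  `was_called` plays the role
    of the patch's per-PlanState list: it is reset before each row
    (stage II) and filled in on first evaluation (stage III)."""
    was_called.clear()
    out = []
    for func in exprs:
        key = (func.__name__, row)   # "equal nodes" share one cache slot
        if key not in was_called:    # first time this step is entered
            was_called[key] = func(row)
        out.append(was_called[key])
    return out

# select f(n), f(n) from t  -- two references, one call per row
cache = {}
rows = [1, 2, 3]
results = [eval_row(r, [stable_func, stable_func], cache) for r in rows]
```

With three rows and two references to f(n), the function body runs only three times instead of six, which is the whole point of the patch.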
Re: Getting psql to redisplay command after \e
On Sun, Nov 3, 2019 at 8:58 PM Fabien COELHO wrote:
>
> Hello Tom,
>
> >> I was suggesting something much simpler than rethinking readline handling.
> >> Does not mean that it is a good idea, but while testing the patch I would
> >> have liked the unfinished line to be in the current editing buffer,
> >> basically as if I had not typed .
> >
> > I did experiment with trying to do that, but I couldn't get it to work,
> > even with the single version of libreadline I had at hand. It appears
> > to me that readline() starts by clearing the internal buffer. Even if
> > we could persuade it to work in a particular readline version, I think
> > the odds of making it portable across all the libreadline and libedit
> > versions that are out there aren't very good. And there's definitely no
> > chance of being remotely compatible with that behavior when using the
> > bare tty drivers (psql -n).
>
> Argh, too bad.
>
> This suggests that readline cannot be used to simply edit a known string?
> :-( "rl_insert_text" looked promising, although probably not portable, and
> I tried to make it work without much success anyway. Maybe I'll try to
> investigate more deeply later.

pspg uses rl_insert_text
https://github.com/okbob/pspg/blob/59d115cd55926ab1886fc0dedbbc6ce0577b0cb3/src/pspg.c#L2522

Pavel

> Note that replacing the current buffer is exactly what history does. So
> maybe that could be exploited by appending the edited line into history
> (add_history) and telling readline to move there (could not find how to do
> that automatically, though)? Or some other history handling…
>
> > In practice, if you decide that you don't like what you're looking at,
> > you're probably going to go back into the editor to fix it, i.e. issue
> > another \e. So I'm not sure that it's worth such pushups to get the
> > data into readline's buffer.
>
> For me \e should mean edit, not edit-and-execute, so I should be back to
> the prompt, which is the crux of my unease with how the feature behaves,
> because it combines two functions that IMO shouldn't be combined.
>
> Anyway the submitted patch is an improvement to the current status.
>
> --
> Fabien.
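For reference, the rl_insert_text trick that pspg uses can also be reproduced from Python's readline bindings, which expose the same pair of entry points as the C API (set_pre_input_hook wrapping rl_pre_input_hook, insert_text wrapping rl_insert_text). This is only a sketch of the idea with made-up names and, as discussed above, behaviour may differ across libreadline/libedit versions:

```python
import readline
import sys

def prefill_input(prompt, text):
    """Prompt for a line with `text` already sitting in the editing
    buffer, so the user can edit it rather than retype it."""
    def hook():
        readline.insert_text(text)
        readline.redisplay()
    readline.set_pre_input_hook(hook)
    try:
        return input(prompt)
    finally:
        readline.set_pre_input_hook(None)  # remove the hook again

if sys.stdin.isatty():
    # interactive demo only; does nothing when run non-interactively
    print("got:", prefill_input("edited> ", "SELECT 1;"))
```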
Re: Getting psql to redisplay command after \e
Hello Tom,

>> I was suggesting something much simpler than rethinking readline handling.
>> Does not mean that it is a good idea, but while testing the patch I would
>> have liked the unfinished line to be in the current editing buffer,
>> basically as if I had not typed .

> I did experiment with trying to do that, but I couldn't get it to work,
> even with the single version of libreadline I had at hand. It appears
> to me that readline() starts by clearing the internal buffer. Even if
> we could persuade it to work in a particular readline version, I think
> the odds of making it portable across all the libreadline and libedit
> versions that are out there aren't very good. And there's definitely no
> chance of being remotely compatible with that behavior when using the
> bare tty drivers (psql -n).

Argh, too bad.

This suggests that readline cannot be used to simply edit a known string? :-( "rl_insert_text" looked promising, although probably not portable, and I tried to make it work without much success anyway. Maybe I'll try to investigate more deeply later.

Note that replacing the current buffer is exactly what history does. So maybe that could be exploited by appending the edited line into history (add_history) and telling readline to move there (could not find how to do that automatically, though)? Or some other history handling…

> In practice, if you decide that you don't like what you're looking at,
> you're probably going to go back into the editor to fix it, i.e. issue
> another \e. So I'm not sure that it's worth such pushups to get the
> data into readline's buffer.

For me \e should mean edit, not edit-and-execute, so I should be back to the prompt, which is the crux of my unease with how the feature behaves, because it combines two functions that IMO shouldn't be combined.

Anyway the submitted patch is an improvement to the current status.

--
Fabien.
[PATCH] Include triggers in EXPLAIN
Hello!

Recently I have gotten into a situation a few times where I was trying to find out what was blocking DELETE queries. Running EXPLAIN (even a VERBOSE one) wasn't useful, since the reason was a slow trigger (a missing index on a foreign key column). I had to create a testing entry and run EXPLAIN ANALYZE DELETE to get this information.

It would be really valuable for me to show triggers in EXPLAIN output, since it would make clear whether any trigger will be "activated" during execution of the DELETE query, which can be the reason for a slow DELETE.

I have seen the initial discussion at https://www.postgresql.org/message-id/flat/20693.732761%40sss.pgh.pa.us about showing time spent in triggers in EXPLAIN ANALYZE, including a quick discussion of possibly showing triggers during plain EXPLAIN. Anyway, since it doesn't show any additional cost and just informs about the possibilities, I still consider this feature useful. This is probably an implementation of the idea mentioned at https://www.postgresql.org/message-id/21221.736869%40sss.pgh.pa.us by Tom Lane.

After an initial discussion with Pavel Stěhule and Tomáš Vondra on the Czech PostgreSQL mailing list (https://groups.google.com/forum/#!topic/postgresql-cz/Dq1sT7huVho) I was able to prepare an initial version of this patch. I have added an EXPLAIN option called TRIGGERS, enabled by default. (There's already an auto_explain property for this.) I understand it is not possible to show only the triggers that will really be activated unless the query is actually executed; EXPLAIN ANALYZE remains untouched by this patch.

- patch with examples can be found at https://github.com/simi/postgres/pull/2
- DIFF format: https://github.com/simi/postgres/pull/2.diff
- PATCH format (also attached): https://github.com/simi/postgres/pull/2.patch

All regression tests passed with this change locally on latest git master. I would like to cover this patch with more regression tests, but I wasn't sure where to place them, since there's no "EXPLAIN"-related test "group".
Is "src/test/regress/sql/triggers.sql" the best place to add tests related to this change?

PS: This is my first try at contributing to the PostgreSQL codebase. The quality of the patch is probably not the best, but I will be more than happy to make any requested changes if needed.

Regards,
Josef Šimánek

From 18578e3d07aa159631e0903abae496a6482fa279 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?=
Date: Sat, 27 Jul 2019 09:27:41 +0200
Subject: [PATCH] Show possible triggers in EXPLAIN.
---
 doc/src/sgml/ref/explain.sgml                 |  9
 src/backend/commands/explain.c                | 43 +++
 src/include/commands/explain.h                |  1 +
 src/test/regress/expected/foreign_key.out     |  6 ++-
 src/test/regress/expected/insert_conflict.out |  4 +-
 src/test/regress/expected/updatable_views.out | 10 -
 6 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 385d10411fa0..ba0d8df88c19 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -43,6 +43,7 @@
 EXPLAIN [ ANALYZE ] [ VERBOSE ] statementboolean ]
     TIMING [ boolean ]
     SUMMARY [ boolean ]
+    TRIGGERS [ boolean ]
     FORMAT { TEXT | XML | JSON | YAML }
@@ -224,6 +225,14 @@ ROLLBACK;
+
+    TRIGGERS
+
+
+      TODO
+
+
+
     FORMAT
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 62fb3434a32f..93845bcaad36 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -149,6 +149,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
 	ListCell   *lc;
 	bool		timing_set = false;
 	bool		summary_set = false;
+	bool		triggers_set = false;

 	/* Parse options list.
 */
 	foreach(lc, stmt->options)
@@ -175,6 +176,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
 			summary_set = true;
 			es->summary = defGetBoolean(opt);
 		}
+		else if (strcmp(opt->defname, "triggers") == 0)
+		{
+			triggers_set = true;
+			es->triggers = defGetBoolean(opt);
+		}
 		else if (strcmp(opt->defname, "format") == 0)
 		{
 			char	   *p = defGetString(opt);
@@ -210,6 +216,9 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
 	/* if the timing was not set explicitly, set default value */
 	es->timing = (timing_set) ? es->timing : es->analyze;

+	/* if the triggers was not set explicitly, set default value */
+	es->triggers = (triggers_set) ? es->triggers : true;
+
 	/* check that timing is used with EXPLAIN ANALYZE */
 	if (es->timing && !es->analyze)
 		ereport(ERROR,
@@ -556,7 +565,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	}

 	/* Print info about runtime of triggers */
-	if (es->analyze)
+	if (es->triggers)
 		ExplainPrintTriggers(es, queryDesc);

 	/*
@@ -911,17 +920,23 @@ report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es)
 {
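For what it's worth, the option-defaulting rules in the hunks above (TIMING follows ANALYZE unless given, the new TRIGGERS defaults to on, and TIMING without ANALYZE is an error) can be condensed into a few lines. This is a Python sketch with invented names, not the actual C code:

```python
def resolve_explain_options(opts):
    """Mirror the defaulting logic from ExplainQuery() with the TRIGGERS
    patch applied: TIMING defaults to the value of ANALYZE, TRIGGERS
    defaults to on, and TIMING without ANALYZE raises an error."""
    analyze = opts.get("analyze", False)
    timing = opts.get("timing", analyze)
    triggers = opts.get("triggers", True)
    if timing and not analyze:
        raise ValueError("EXPLAIN option TIMING requires ANALYZE")
    return {"analyze": analyze, "timing": timing, "triggers": triggers}
```

So a bare EXPLAIN would still print the trigger list (triggers=True), while EXPLAIN (TRIGGERS off) restores the old output.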
Re: Index Skip Scan
> On Wed, Sep 25, 2019 at 2:33 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> v27-0001-Index-skip-scan.patch
>
> Some random thoughts on this:

Thanks a lot for the comments!

> * Is _bt_scankey_within_page() really doing the right thing within empty
> pages?
>
> It looks like you're accidentally using the high key when the leaf
> page is empty with forward scans (assuming that the leaf page isn't
> rightmost). You'll need to think about empty pages for both forward
> and backward direction scans there.

Yes, you're right, that's an issue I need to fix.

> Actually, using the high key in some cases may be desirable, once the
> details are worked out -- the high key is actually very helpful with
> low cardinality indexes. If you populate an index with retail
> insertions (i.e. don't just do a CREATE INDEX after the table is
> populated), and use low cardinality data in the indexed columns then
> you'll see this effect.

Can you please elaborate a bit more? I see that using the high key will help forward index scans access the minimum number of leaf pages, but I'm not following how that is connected to _bt_scankey_within_page. Or is this commentary related in general to the whole implementation?

> * The extra scankeys that you are using in most of the new nbtsearch.c
> code is an insertion scankey -- not a search style scankey. I think
> that you should try to be a bit clearer on that distinction in
> comments. It is already confusing now, but at least there is only zero
> or one insertion scankeys per scan (for the initial positioning).
>
> * There are two _bt_skip() prototypes in nbtree.h (actually, there is
> a btskip() and a _bt_skip()). I understand that the former is a public
> wrapper of the latter, but I find the naming a little bit confusing.
> Maybe rename _bt_skip() to something that is a little bit more
> suggestive of its purpose.
>
> * Suggest running pgindent on the patch.

Sure, I'll incorporate the mentioned improvements into the next patch version (hopefully soon).

> And now some more:
>
> * I'm confused about this code in _bt_skip():

Yeah, it shouldn't be there, but rather before _bt_next, which expects an unlocked buffer. Will fix.

> * It also seems a bit odd that you assume that the scan is
> "scan->xs_want_itup", but then check that condition many times. Why
> bother?
>
> * Similarly, why bother using _bt_drop_lock_and_maybe_pin() at all,
> rather than just unlocking the buffer directly? We'll only drop the
> pin for a scan that is "!scan->xs_want_itup", which is never the case
> within _bt_skip().
>
> I think that the macros and stuff that manage pins and buffer locks in
> nbtsearch.c is kind of a disaster anyway [1]. Maybe there is some
> value in trying to be consistent with existing nbtsearch.c code in
> ways that aren't strictly necessary.

Yep, I've seen this thread, but tried to be consistent with the surrounding core style. Probably it indeed doesn't make sense.

> * Not sure why you need this code after throwing an error:
>
> > else
> > {
> >     elog(ERROR, "Could not read closest index tuples: %d",
> >          offnum);
> >     pfree(so->skipScanKey);
> >     so->skipScanKey = NULL;
> >     return false;
> > }

Unfortunately this is just a leftover from a previous version. Sorry for that, will get rid of it.
Re: [HACKERS] proposal: schema variables
On Thu, Oct 10, 2019 at 11:41 AM Pavel Stehule wrote:
> Hi
>
> minor change - replace heap_tuple_fetch_attr by detoast_external_attr.

A similar update - heap_open and heap_close were replaced by table_open and table_close.

Regards

Pavel

schema_variables-20191103.patch.gz
Description: application/gzip
Re: updating unaccent.rules for Arabic letters
kerbrose khaled writes:
> I would like to update the unaccent.rules file to support Arabic letters, so
> could someone help me or tell me how I could add such a contribution? I
> attached the file including the modifications, only the last 4 lines.

Hi! I've got no objection to including Arabic in the set of covered languages, but handing us a new unaccent.rules file isn't the way to do it, because that's a generated file. The adjacent script generate_unaccent_rules.py generates it from the official Unicode source data (see comments in that script). What we need, ultimately, is a patch to that script so it will emit these additional translations.

Past commits that might be useful sources of inspiration include

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=456e3718e7b72efe4d2639437fcbca2e4ad83099
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5e8d670c313531c0dca245943fb84c94a477ddc4
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ec0a69e49bf41a37b5c2d6f6be66d8abae00ee05

If you're not good with Python, maybe you could just explain to us how to recognize these characters from Unicode character properties.

regards, tom lane
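Since the attachment isn't quoted inline, here is the kind of derivation the script relies on, sketched in Python (illustrative only, not the actual generate_unaccent_rules.py): precomposed Arabic letters such as alef with madda or hamza carry canonical decompositions in the Unicode data, so NFD normalization plus dropping combining marks (category Mn) yields the base letter:

```python
import unicodedata

def unaccent_pairs(chars):
    """Derive (precomposed, base) rule pairs by canonical decomposition:
    normalize to NFD and drop combining marks (category Mn)."""
    pairs = []
    for ch in chars:
        base = "".join(c for c in unicodedata.normalize("NFD", ch)
                       if unicodedata.category(c) != "Mn")
        if base and base != ch:
            pairs.append((ch, base))
    return pairs

# Alef with madda (U+0622), alef with hamza above/below (U+0623, U+0625)
# and waw with hamza (U+0624) all reduce to the bare base letters:
arabic_rules = unaccent_pairs("\u0622\u0623\u0625\u0624")
```

The same helper works unchanged for Latin accented letters, which is why hooking these characters into the existing script should be straightforward.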
Re: [PATCH] contrib/seg: Fix PG_GETARG_SEG_P definition
ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes: > I just noticed that when contrib/seg was converted to V1 calling > convention (commit 389bb2818f4), the PG_GETARG_SEG_P() macro got defined > in terms of PG_GETARG_POINTER(). But it itself calls DatumGetPointer(), > so shouldn't it be using PG_GETARG_DATUM()? Yup, I agree. Pushed. regards, tom lane
Re: The flinfo->fn_extra question, from me this time.
> On 3 Nov 2019, at 13:33, Pavel Stehule wrote:
>
> It would be nice if the patch had some regression tests - they are good
> for refreshing one's memory of what the target of the patch is.

With a suitably small work_mem constraint, it is possible to show the absence of buffers resulting from the tuplestore. It'll need some commentary explaining what is being looked for, and why. But it's a good idea. I'll take a look.

denty.
Re: The flinfo->fn_extra question, from me this time.
Hi

On Sun, Nov 3, 2019 at 12:51 PM Dent John wrote:
> (And here's aforementioned attachment… doh.)

It would be nice if the patch had some regression tests - they are good for refreshing one's memory of what the target of the patch is.

Regards

Pavel
Re: The flinfo->fn_extra question, from me this time.
(And here’s aforementioned attachment… doh.) pipeline-functionscan-v2.patch Description: Binary data
Re: 64 bit transaction id
I completely agree with all of the above. Therefore, the proposed mechanism may entail larger improvements (and not only to VACUUM).

I can offer the following solution. For VACUUM, create a hash table. When VACUUM, scanning the table, sees that a version (tuple1) has t_ctid filled in and referring to the address of tuple2, it creates a structure into which it writes the address of tuple1, tuple1.xid, the length of tuple1 (and whatever other information is needed), and puts this structure into the hash table keyed by the address of tuple2. When VACUUM reaches tuple2, it looks up the address of tuple2 in the hash table - if it finds it, it evaluates the connection between the two and makes a decision about cleanup.

regards

03.11.2019, 02:20, "Tomas Vondra" :

On Sat, Nov 02, 2019 at 11:35:09PM +0300, Павел Ерёмин wrote:
> The proposed option is not much different from what it is now. We are not
> trying to save some space - we will reuse the existing one. We just work
> in 64 bit transaction counters. Correct me if I'm wrong - the address of
> the next version of the line is stored in the 6 byte field t_cid in the
> tuple header - which is not attached to the current page in any way - and
> can be stored anywhere in the table. Nothing changes.

I think you mean t_ctid, not t_cid (which is a 4-byte CommandId, not any sort of item pointer).

I think this comment from htup_details.h explains the issue:

 * ... Beware however that VACUUM might
 * erase the pointed-to (newer) tuple before erasing the pointing (older)
 * tuple. Hence, when following a t_ctid link, it is necessary to check
 * to see if the referenced slot is empty or contains an unrelated tuple.
 * Check that the referenced tuple has XMIN equal to the referencing tuple's
 * XMAX to verify that it is actually the descendant version and not an
 * unrelated tuple stored into a slot recently freed by VACUUM. If either
 * check fails, one may assume that there is no live descendant version.

Now, imagine you have a tuple that gets updated repeatedly (say, 3x) and each version gets to a different page. Say, pages #1, #2, #3. And then VACUUM happens on one of the "middle" pages (this may happen when trying to fit a new row onto a page to allow HOT, but it might happen even during regular VACUUM).

So we started with 3 tuples on pages #1, #2, #3, but now we have this:

  #1 - tuple exists, points to tuple on page #2
  #2 - tuple no longer exists, cleaned up by vacuum
  #3 - tuple exists

The scheme you proposed requires the existence of all the tuples in the chain to determine visibility. When tuple #2 no longer exists, it's impossible to decide whether the tuple on page #1 is visible or not.

This also significantly increases the amount of random I/O, pretty much by a factor of 2, because whenever you look at a row, you also have to look at the "next version", which may be on another page. That's pretty bad, both for I/O and cache hit ratio. I don't think that's a reasonable trade-off (at least compared to simply making the XIDs 64-bit).

regards

--
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
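The htup_details.h rule quoted above - a t_ctid link only identifies a live descendant if the target slot still holds a tuple whose XMIN equals the referencing tuple's XMAX - can be sketched as follows (Python stand-ins with invented names, not PostgreSQL code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tuple:
    xmin: int
    xmax: int
    ctid: Optional[int]  # slot holding the newer version, if any

def live_descendant(slots, tup):
    """Follow the t_ctid link one step.  The referenced slot may have
    been emptied by VACUUM, or even reused for an unrelated tuple, so
    it only counts if its xmin matches the referencing tuple's xmax."""
    if tup.ctid is None:
        return None
    nxt = slots.get(tup.ctid)            # None: slot freed by VACUUM
    if nxt is None or nxt.xmin != tup.xmax:
        return None                      # no live descendant version
    return nxt

# Version chain over three pages; VACUUM has removed the middle version.
slots = {1: Tuple(xmin=100, xmax=101, ctid=2),
         3: Tuple(xmin=101, xmax=102, ctid=None)}
```

Here live_descendant(slots, slots[1]) returns None even though a newer version still physically exists in slot 3, which is exactly why a visibility scheme that must walk the whole chain breaks down once VACUUM removes a middle link.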
Re: pglz performance
Hello Andrey,

On 2019-11-02 12:30, Andrey Borodin wrote:
>> On Nov 1, 2019, at 18:48, Alvaro Herrera wrote:
>
> PFA two patches:
> v4-0001-Use-memcpy-in-pglz-decompression.patch (known as 'hacked' in test_pglz extension)
> v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch (known as 'hacked8')

Looking at the patches, it seems only the case of a match is changed. But when we observe a literal byte, this is copied byte-by-byte with:

	else
	{
		/*
		 * An unset control bit means LITERAL BYTE. So we just
		 * copy one from INPUT to OUTPUT.
		 */
		*dp++ = *sp++;
	}

Maybe we can optimize this, too. For instance, you could just increase a counter:

	else
	{
		/*
		 * An unset control bit means LITERAL BYTE. We count
		 * these and copy them later.
		 */
		literal_bytes++;
	}

and in the case of:

	if (ctrl & 1)
	{
		/* First copy all the literal bytes */
		if (literal_bytes > 0)
		{
			memcpy(dp, sp, literal_bytes);
			sp += literal_bytes;
			dp += literal_bytes;
			literal_bytes = 0;
		}

(Code untested!)

The same would need to be done at the very end, if the input ends without any new CTRL byte.

Whether that gains us anything depends on how common literal bytes are. It might be that highly compressible input has almost none, while input that is a mix of incompressible strings and compressible ones might have longer stretches. One example would be something like an SHA-256 that is repeated twice: the first instance would be incompressible, the second one would be just a copy. This might not happen that often in practical inputs, though.

I wonder if you agree, and what would happen if you try this variant on your corpus tests.

Best regards,

Tels
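To make the suggestion concrete, here is a toy decompressor in that spirit (Python, with a simplified tag format invented for the example - real pglz packs offset and length into the tag bytes differently): runs of literal bytes under one control byte are copied with a single slice operation instead of one byte at a time, while matches still have to be copied bytewise because they may overlap their own output:

```python
def decompress(src):
    """Toy pglz-style decompressor.  Each control byte governs the next
    eight items: a clear bit is one literal byte, a set bit is a match
    encoded (for this toy) as <length, 2-byte big-endian offset>.
    Consecutive literal bits are batched into one slice copy."""
    out = bytearray()
    i = 0
    while i < len(src):
        ctrl = src[i]
        i += 1
        bit = 0
        while bit < 8 and i < len(src):
            if ctrl & (1 << bit):
                length = src[i]
                off = int.from_bytes(src[i + 1:i + 3], "big")
                i += 3
                for _ in range(length):      # may overlap: stay bytewise
                    out.append(out[-off])
                bit += 1
            else:
                run = 0                      # batch consecutive literals
                while (bit + run < 8 and i + run < len(src)
                       and ((ctrl >> (bit + run)) & 1) == 0):
                    run += 1
                out += src[i:i + run]        # one copy for the whole run
                i += run
                bit += run
    return bytes(out)
```

Whether the batching wins in C depends, as noted above, on how long the literal runs actually are in typical inputs; with at most 8 literals per control byte the run length is bounded, which limits the payoff.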
Re: [PATCH][DOC] Fix for PREPARE TRANSACTION doc and postgres_fdw message.
On 02/11/2019 at 08:31, Michael Paquier wrote:
> On Fri, Nov 01, 2019 at 05:29:23PM +0100, Gilles Darold wrote:
>> I have attached a patch to the documentation that adds remote tables to
>> the list of objects where any operation prevents using a prepared
>> transaction; currently it just says "operations involving temporary
>> tables or the session's temporary namespace".
> Perhaps we had better use foreign tables for the error message and the
> docs?
> --
> Michael

Agreed. Attached is a new version of the patches that replaces the word remote with foreign.

--
Gilles

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 7cd69cc709..7d0f9ed72a 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -725,7 +725,7 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 				 */
 				ereport(ERROR,
 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("cannot prepare a transaction that modified remote tables")));
+						 errmsg("cannot prepare a transaction that has operated on foreign tables")));
 				break;
 			case XACT_EVENT_PARALLEL_COMMIT:
 			case XACT_EVENT_COMMIT:
diff --git a/doc/src/sgml/ref/prepare_transaction.sgml b/doc/src/sgml/ref/prepare_transaction.sgml
index 5016ca287e..d7bf2026f7 100644
--- a/doc/src/sgml/ref/prepare_transaction.sgml
+++ b/doc/src/sgml/ref/prepare_transaction.sgml
@@ -99,8 +99,8 @@ PREPARE TRANSACTION transaction_id
    It is not currently allowed to PREPARE a transaction that
    has executed any operations involving temporary tables or the session's
-   temporary namespace, created any cursors WITH HOLD, or
-   executed LISTEN, UNLISTEN, or
+   temporary namespace or foreign tables, created any cursors WITH HOLD,
+   or executed LISTEN, UNLISTEN, or
    NOTIFY. Those features are too tightly tied to the current
    session to be useful in a transaction to be prepared.
updating unaccent.rules for Arabic letters
Hello folks,

I would like to update the unaccent.rules file to support Arabic letters, so could someone help me or tell me how I could add such a contribution? I attached the file including the modifications (only the last 4 lines changed).

Thank you

unaccent.rules
Description: unaccent.rules